bfgn.data_management package¶
Submodules¶
bfgn.data_management.apply_model_to_data module¶
-
bfgn.data_management.apply_model_to_data.
apply_model_to_site
(cnn, data_container, feature_files, destination_basename, output_format='GTiff', creation_options=[], CNN_MODE=False, exclude_feature_nodata=False)[source]¶ Apply a trained model to a raster file.
- Parameters
cnn (Model) – Pre-trained Keras CNN model.
data_container (DataContainer) – Holds info like scalers.
feature_files (List[str]) – Per-site feature files to apply the model to.
destination_basename (str) – Base of the output file (will get appropriate extension).
output_format (str) – A viable GDAL output data format.
creation_options (List[str]) – GDAL creation options to pass for the output file, e.g.: ['TILED=YES', 'COMPRESS=DEFLATE'].
CNN_MODE (bool) – Should the model be applied in CNN mode?
exclude_feature_nodata (bool) – Flag to remove all pixels in features without data from the applied model.
- Returns
None
- Return type
None
-
bfgn.data_management.apply_model_to_data.
maximum_likelihood_classification
(likelihood_file, data_container, destination_basename, output_format='GTiff', creation_options=[])[source]¶ Convert an n-band map of probabilities to a classified image using maximum likelihood.
- Parameters
likelihood_file (str) – File with per-class likelihoods.
data_container (DataContainer) – Holds info like scalers.
destination_basename (str) – Base of the output file (will get appropriate extension).
creation_options (List[str]) – GDAL creation options to pass for the output file, e.g.: ['TILED=YES', 'COMPRESS=DEFLATE'].
- Returns
None
- Return type
None
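The classification step described above can be sketched without GDAL. This is a minimal illustration, not the library's implementation: it assumes the likelihoods are already in memory as nested lists indexed [band][row][col], and the helper name classify_max_likelihood is hypothetical. Each pixel is assigned the index of the band with the highest likelihood.

```python
from typing import List


def classify_max_likelihood(likelihoods: List[List[List[float]]]) -> List[List[int]]:
    """Return, per pixel, the index of the band (class) with the highest likelihood.

    `likelihoods` is indexed [band][row][col] (an assumption made for this sketch).
    """
    n_rows = len(likelihoods[0])
    n_cols = len(likelihoods[0][0])
    classified = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            per_class = [band[r][c] for band in likelihoods]
            # max() keeps the first index when two classes tie
            row.append(max(range(len(per_class)), key=per_class.__getitem__))
        classified.append(row)
    return classified


likelihoods = [
    [[0.7, 0.2], [0.1, 0.5]],  # class 0
    [[0.2, 0.5], [0.3, 0.4]],  # class 1
    [[0.1, 0.3], [0.6, 0.1]],  # class 2
]
classes = classify_max_likelihood(likelihoods)
```

With real rasters the same idea is normally a single argmax over the band axis of an array.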
bfgn.data_management.common_io module¶
-
bfgn.data_management.common_io.
get_all_interior_extent_subset_pixel_locations
(gdal_datasets, window_radius, inner_window_radius=None, shuffle=True, return_xy_size=False)[source]¶
-
bfgn.data_management.common_io.
read_map_subset
(datafiles, upper_lefts, window_diameter, mask=None, nodata_value=None, lower_bound=None, upper_bound=None, reference_geotransform=None)[source]¶
-
bfgn.data_management.common_io.
get_boundary_sets_from_boundary_files
(config)[source]¶
- Return type
List[Dataset]
-
bfgn.data_management.common_io.
rasterize_vector
(vector_file, geotransform, output_shape)[source]¶ Rasterizes an input vector directly into a numpy array.
-
bfgn.data_management.common_io.
read_mask_chunk
(_site, upper_left, window_diameter, reference_geotransform, config)[source]¶
- Return type
numpy array
-
bfgn.data_management.common_io.
convert_envi_file
(original_file, destination_basename, output_format, cleanup=False, creation_options=[])[source]¶ Convert an ENVI file to another output format with a gdal_translate call.
- Parameters
original_file (str) – Source ENVI file.
destination_basename (str) – Base of the output file (will get appropriate extension).
output_format (str) – A viable GDAL output data format.
cleanup (bool) – Whether or not to clean up the original ENVI files.
creation_options (List) – GDAL creation options to pass for the output file, e.g.: ['TILED=YES', 'COMPRESS=DEFLATE'].
- Return type
None
- Returns
None
-
bfgn.data_management.common_io.
read_chunk_by_row
(datasets, pixel_upper_lefts, x_size, y_size, line_offset, nodata_value=None)[source]¶ Read a chunk of multiple datasets line-by-line.
- Parameters
datasets (List[Dataset]) – Each feature dataset to read.
pixel_upper_lefts (List[List[int]]) – Upper left hand pixel of each dataset.
x_size (int) – Size of x data to read.
y_size (int) – Size of y data to read.
line_offset (int) – Line offset from the UL of each set to start reading at.
nodata_value (Optional[float]) – Value to encode to np.nan.
- Returns
feature array
- Return type
feature_array
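The line-by-line read can be pictured with in-memory grids standing in for GDAL datasets. The sketch below is an illustration under several assumptions that the reference does not confirm: each dataset is a single-band grid indexed [row][col], each upper left is given as [x, y], and the result is indexed [row][col][dataset]. The function name read_chunk_by_row_sketch is hypothetical.

```python
import math
from typing import List, Optional


def read_chunk_by_row_sketch(
    datasets: List[List[List[float]]],
    pixel_upper_lefts: List[List[int]],
    x_size: int,
    y_size: int,
    line_offset: int,
    nodata_value: Optional[float] = None,
) -> List[List[List[float]]]:
    """Read y_size lines of x_size pixels from every grid, one line at a time."""
    chunk = []
    for line in range(y_size):
        out_row: List[List[float]] = [[] for _ in range(x_size)]
        for grid, (ul_x, ul_y) in zip(datasets, pixel_upper_lefts):
            # Start line_offset lines below this grid's upper left corner
            src_row = grid[ul_y + line_offset + line]
            for col in range(x_size):
                value = float(src_row[ul_x + col])
                if nodata_value is not None and value == nodata_value:
                    value = math.nan  # encode nodata as NaN
                out_row[col].append(value)
        chunk.append(out_row)
    return chunk


grid_a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
grid_b = [[10, 20, 30], [40, -9999, 60], [70, 80, 90]]
chunk = read_chunk_by_row_sketch(
    [grid_a, grid_b], [[0, 0], [1, 0]], x_size=2, y_size=2, line_offset=1, nodata_value=-9999
)
```

Giving each dataset its own upper left lets files with different extents be read over the same ground window.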
bfgn.data_management.data_core module¶
-
class
bfgn.data_management.data_core.
DataContainer
(config)[source]¶ Bases:
object
A container class that holds data objects that will need to be passed around for modeling, reporting, and application.
-
features
= []¶
-
responses
= []¶
-
weights
= []¶
-
training_sequence
= None¶
-
validation_sequence
= None¶
-
feature_band_types
= None¶
-
response_band_types
= None¶
-
feature_raw_band_types
= None¶
-
response_raw_band_types
= None¶
-
feature_per_band_encoded_values
= None¶
-
response_per_band_encoded_values
= None¶
-
feature_scaler
= []¶
-
response_scaler
= []¶
-
train_folds
= None¶
-
config
= None¶
-
logger
= None¶ Root logger for DataContainer. Available if user wants to directly modify the log formatting, handling, or other behavior.
- Type
-
build_or_load_rawfile_data
(rebuild=False)[source]¶ If rawfile data has previously been built as described by the config, load it back up (essentially free operation, only data shells will be loaded). If rawfile data does not yet exist, build it as described by the config.
- Parameters
rebuild (bool) – Flag used to rebuild data from scratch, even if it already exists. Defaults to False.
- Return type
None
-
build_or_load_scalers
(rebuild=False)[source]¶ If scalers have previously been built as described by the config, load them back up. If scalers do not yet exist, build them as described by the config. Requires data to have already been built.
- Parameters
rebuild – Flag used to refit scalers, even if they already exist. Defaults to False.
-
load_sequences
()[source]¶ Create and attach sequences to self. Requires data to already be built and any scalers to have been fit.
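The three methods above depend on each other: scalers need built data, and sequences need both. A small helper can make that order explicit. This is a sketch only: prepare_data_container and the RecordingContainer stand-in are hypothetical names used so the example runs without bfgn installed, and only the three method names come from the reference above.

```python
def prepare_data_container(data_container, rebuild=False):
    """Run the DataContainer preparation steps in the order their docs require."""
    # 1. Build raw data, or load it if it already exists.
    data_container.build_or_load_rawfile_data(rebuild=rebuild)
    # 2. Fit or load scalers; this needs the data from step 1.
    data_container.build_or_load_scalers(rebuild=rebuild)
    # 3. Attach sequences; this needs both data and fitted scalers.
    data_container.load_sequences()
    return data_container


class RecordingContainer:
    """Stand-in exposing the same three method names, used only to show call order."""

    def __init__(self):
        self.calls = []

    def build_or_load_rawfile_data(self, rebuild=False):
        self.calls.append(("build_or_load_rawfile_data", rebuild))

    def build_or_load_scalers(self, rebuild=False):
        self.calls.append(("build_or_load_scalers", rebuild))

    def load_sequences(self):
        self.calls.append(("load_sequences", None))


container = prepare_data_container(RecordingContainer())
```

In real use the argument would be a DataContainer constructed from a config, as shown in the class signature above.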
-
-
bfgn.data_management.data_core.
get_log_filepath
(config)[source]¶ Get the default log path for data builds.
- Parameters
config (Config) – Configuration file.
- Returns
Filepath to built data log.
- Return type
log_filepath
-
bfgn.data_management.data_core.
get_temporary_data_filepaths
(config, filename_suffix)[source]¶ - Return type
bfgn.data_management.ooc_functions module¶
-
bfgn.data_management.ooc_functions.
one_hot_encode_array
(raw_band_types, array, memmap_file=None, per_band_encoding=None)[source]¶ One-hot encode an array of mixed real and categorical variables.
- Parameters
raw_band_types (List[str]) – Band types for the given array, either 'R' for real or 'C' for categorical.
array (numpy array) – Array to encode.
memmap_file (Optional[str]) – File to use to do things out-of-core.
per_band_encoding (Optional[List[numpy array]]) – If None, this will be calculated and returned. If not None, these will be used to encode the array.
- Returns
array: the input array, now one-hot-encoded.
band_types: the one-hot-encoded version of the band types.
return_band_encoding: the encoding used on a per-categorical-band basis if per_band_encoding was None when provided, otherwise None.
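The encoding described above can be sketched on a plain list of pixels. The details here are assumptions made for illustration rather than the library's behavior: pixels are lists of band values, the encoding for a categorical band is the sorted list of its unique values, and every expanded column keeps the type 'C'. The name one_hot_encode_pixels is hypothetical.

```python
from typing import List, Optional, Tuple


def one_hot_encode_pixels(
    raw_band_types: List[str],
    pixels: List[List[float]],
    per_band_encoding: Optional[List[List[float]]] = None,
) -> Tuple[List[List[float]], List[str], Optional[List[List[float]]]]:
    """Expand each categorical ('C') band into one 0/1 column per category."""
    categorical_bands = [b for b, t in enumerate(raw_band_types) if t == "C"]
    return_encoding = None
    if per_band_encoding is None:
        # No encoding supplied: derive one per categorical band and hand it back
        per_band_encoding = [sorted({p[b] for p in pixels}) for b in categorical_bands]
        return_encoding = per_band_encoding
    encoding_for_band = dict(zip(categorical_bands, per_band_encoding))

    band_types: List[str] = []
    for b, t in enumerate(raw_band_types):
        band_types.extend(["C"] * len(encoding_for_band[b]) if t == "C" else ["R"])

    encoded = []
    for pixel in pixels:
        row: List[float] = []
        for b, t in enumerate(raw_band_types):
            if t == "C":
                row.extend(1.0 if pixel[b] == v else 0.0 for v in encoding_for_band[b])
            else:
                row.append(float(pixel[b]))  # real bands pass through unchanged
        encoded.append(row)
    return encoded, band_types, return_encoding


pixels = [[0.5, 3], [1.5, 7], [2.5, 3]]
encoded, band_types, encoding = one_hot_encode_pixels(["R", "C"], pixels)
reused, _, no_encoding = one_hot_encode_pixels(["R", "C"], [[9.0, 7]], per_band_encoding=encoding)
```

Passing the returned encoding back in, as in the last line, keeps the column layout identical between training data and data the model is later applied to.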
bfgn.data_management.scalers module¶
-
bfgn.data_management.scalers.
get_available_scalers
()[source]¶ Gets list of available scaler names.
- Return type
List[str]
- Returns
List of available scaler names.
-
bfgn.data_management.scalers.
get_scaler
(scaler_name, scaler_options)[source]¶ Gets scaler matching the provided name.
- Parameters
- Return type
- Returns
Scaler matching the provided name.
-
class
bfgn.data_management.scalers.
BaseGlobalScaler
(savename_base=None)[source]¶ Bases:
object
Scalers handle the process of transforming data prior to fitting or predicting using the neural network, as well as inverse transforming the data for applications or review afterwards. In this case, we use readily available scalers from the scikit-learn package to handle the nitty-gritty of the transform and inverse transform, and we use the Scaler class to handle the nitty-gritty of reshaping and otherwise handling the image arrays.
-
scaler_name
= None¶
-
savename
= None¶
-
-
class
bfgn.data_management.scalers.
BaseSklearnScaler
(savename_base)[source]¶ Bases:
bfgn.data_management.scalers.BaseGlobalScaler
-
scaler
= None¶
-
-
class
bfgn.data_management.scalers.
ConstantScaler
(savename_base, constant_scaler=None, constant_offset=None)[source]¶ Bases:
bfgn.data_management.scalers.BaseGlobalScaler
-
constant_scaler
= None¶
-
constant_offset
= None¶
-
-
class
bfgn.data_management.scalers.
RobustScaler
(savename_base, quantile_range=(10.0, 90.0))[source]¶
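The quantile_range default of (10.0, 90.0) suggests the usual robust scaling recipe: center on the median and divide by the spread between two quantiles. The sketch below shows that recipe on a flat list using linearly interpolated percentiles. It illustrates the technique as scikit-learn's RobustScaler defines it; whether this class reshapes image arrays and delegates to scikit-learn in exactly this way is an assumption, and all function names here are hypothetical.

```python
from typing import List, Sequence, Tuple


def percentile(sorted_values: Sequence[float], q: float) -> float:
    """Linearly interpolated percentile of an already sorted sequence."""
    position = (len(sorted_values) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * (position - lower)


def fit_robust(values: Sequence[float], quantile_range: Tuple[float, float] = (10.0, 90.0)):
    """Return (center, scale): the median and the spread between the two quantiles."""
    ordered = sorted(values)
    center = percentile(ordered, 50.0)
    scale = percentile(ordered, quantile_range[1]) - percentile(ordered, quantile_range[0])
    return center, scale or 1.0  # avoid dividing by zero for constant data


def transform(values: Sequence[float], center: float, scale: float) -> List[float]:
    return [(v - center) / scale for v in values]


def inverse_transform(values: Sequence[float], center: float, scale: float) -> List[float]:
    return [v * scale + center for v in values]


data = [float(v) for v in range(11)]
center, scale = fit_robust(data)
scaled = transform(data, center, scale)
restored = inverse_transform(scaled, center, scale)
```

Because the center and scale come from quantiles rather than the mean and standard deviation, a few extreme pixel values have little effect on the fitted transform.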
bfgn.data_management.sequences module¶
-
class
bfgn.data_management.sequences.
BaseSequence
(feature_scaler, response_scaler, batch_size, apply_random_transforms=False, nan_replacement_value=None)[source]¶ Bases:
keras.utils.data_utils.Sequence
-
feature_scaler
= None¶
-
response_scaler
= None¶
-
apply_random_transforms
= None¶
-
bfgn.data_management.single_image_scaling module¶
-
bfgn.data_management.single_image_scaling.
scale_vector
(dat, flag, nodata_value=-9999)[source]¶ Scale a 1-d numpy array in a specified manner, ignoring nodata values.
Arguments:
dat - input vector to be scaled
flag - an indicator of the chosen scaling option
Keyword Arguments:
nodata_value - value to be ignored, None if no nodata value is specified
Return: The offset and gain scaling factors, in a two-value list form.
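An offset and gain pair can be computed as sketched below. The accepted flag values are not listed in this reference, so the strings 'mean_std', 'mean', and 'minmax' are hypothetical, as is the convention that the factors are applied as (value - offset) / gain. The function name scale_vector_sketch is also an assumption.

```python
import statistics
from typing import List, Optional, Sequence


def scale_vector_sketch(dat: Sequence[float], flag: Optional[str], nodata_value=-9999) -> List[float]:
    """Return [offset, gain] computed from the values that are not nodata."""
    valid = [v for v in dat if nodata_value is None or v != nodata_value]
    if flag is None or not valid:
        return [0.0, 1.0]  # identity scaling
    if flag == "mean_std":
        offset, gain = statistics.mean(valid), statistics.pstdev(valid)
    elif flag == "mean":
        offset, gain = statistics.mean(valid), 1.0
    elif flag == "minmax":
        offset, gain = min(valid), max(valid) - min(valid)
    else:
        raise ValueError(f"unknown scaling flag: {flag!r}")
    return [offset, gain or 1.0]  # a constant vector would otherwise give gain 0


dat = [2.0, -9999, 4.0, 6.0]
offset, gain = scale_vector_sketch(dat, "minmax")
scaled = [(v - offset) / gain for v in dat if v != -9999]
```

Returning the factors instead of the scaled vector lets the same offset and gain be reapplied to other data from the same band.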
-
bfgn.data_management.single_image_scaling.
scale_image
(image, flag, nodata_value=-9999)[source]¶ Scale an image based on a preset flag.
Arguments:
image - 3d array with assumed dimensions y,x,band
flag - scaling flag to use (None if no scaling)
Return: An image matching the input image dimensions with scaling applied to it.
-
bfgn.data_management.single_image_scaling.
scale_image_mean_std
(image, nodata_value=-9999)[source]¶ Mean center and standard-deviation normalize an image.
Arguments:
image - 3d array with assumed dimensions y,x,band
Keyword Arguments:
nodata_value - value to be ignored, None if no nodata value is specified
Return: Image with per-band mean centering and std normalization applied.
-
bfgn.data_management.single_image_scaling.
scale_image_mean
(image, nodata_value=-9999)[source]¶ Mean center an image.
Arguments:
image - 3d array with assumed dimensions y,x,band
Keyword Arguments:
nodata_value - value to be ignored, None if no nodata value is specified
Return: Image with per-band mean centering applied.
-
bfgn.data_management.single_image_scaling.
scale_image_minmax
(image, nodata_value=-9999)[source]¶ Scale an image based on local mins and maxes.
Arguments:
image - 3d array with assumed dimensions y,x,band
Keyword Arguments:
nodata_value - value to be ignored, None if no nodata value is specified
Return: Image with per-band minmax scaling applied.
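Per-band minmax scaling can be sketched on a nested list indexed [y][x][band]. This is an illustration under assumptions: the output range is taken to be [0, 1], nodata pixels are left unchanged, and the name scale_image_minmax_sketch is hypothetical.

```python
from typing import List


def scale_image_minmax_sketch(image: List[List[List[float]]], nodata_value=-9999):
    """Scale each band to [0, 1] using that band's own min and max, skipping nodata."""
    n_bands = len(image[0][0])
    scaled = [[list(pixel) for pixel in row] for row in image]  # copy; input is untouched
    for b in range(n_bands):
        valid = [
            pixel[b]
            for row in image
            for pixel in row
            if nodata_value is None or pixel[b] != nodata_value
        ]
        if not valid:
            continue  # band is entirely nodata
        low = min(valid)
        span = (max(valid) - low) or 1.0  # constant band: avoid dividing by zero
        for row in scaled:
            for pixel in row:
                if nodata_value is None or pixel[b] != nodata_value:
                    pixel[b] = (pixel[b] - low) / span
    return scaled


image = [
    [[0.0, 10.0], [5.0, -9999]],
    [[10.0, 30.0], [2.5, 20.0]],
]
scaled_image = scale_image_minmax_sketch(image)
```

The min and max are taken from the image itself, so two images of the same scene type can end up with different scale factors.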
-
bfgn.data_management.single_image_scaling.
fill_nearest_neighbor
(image, nodata=-9999)[source]¶ Fill in missing values in an image using a nearest neighbor approach.
Arguments:
image - 3d array with assumed dimensions y,x,band
Keyword Arguments:
nodata - value to be ignored, None if no nodata value is specified
Return: Image with nodata values filled in with their nearest neighbors.
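A nearest neighbor fill can be sketched with a brute-force search, which is adequate for small images. The details are assumptions for illustration: bands are filled independently, distance is Euclidean in pixel units, ties go to the first valid pixel in row-major order, and the name fill_nearest_neighbor_sketch is hypothetical. The library's own search strategy is not described in this reference.

```python
from typing import List


def fill_nearest_neighbor_sketch(image: List[List[List[float]]], nodata=-9999):
    """Replace each nodata value with the value of the closest valid pixel in that band."""
    filled = [[list(pixel) for pixel in row] for row in image]  # copy; input is untouched
    n_bands = len(image[0][0])
    for b in range(n_bands):
        valid = [
            (y, x)
            for y, row in enumerate(image)
            for x, pixel in enumerate(row)
            if pixel[b] != nodata
        ]
        if not valid:
            continue  # nothing to fill from in this band
        for y, row in enumerate(image):
            for x, pixel in enumerate(row):
                if pixel[b] == nodata:
                    # Squared Euclidean distance is enough to rank candidates
                    ny, nx = min(valid, key=lambda p: (p[0] - y) ** 2 + (p[1] - x) ** 2)
                    filled[y][x][b] = image[ny][nx][b]
    return filled


image = [[[1.0, -9999], [-9999, -9999], [-9999, 7.0], [9.0, 8.0]]]
filled_image = fill_nearest_neighbor_sketch(image)
```

The search cost grows with the product of missing and valid pixel counts, so large rasters usually call for a distance transform or a spatial index instead.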
bfgn.data_management.training_data module¶
-
bfgn.data_management.training_data.
build_training_data_ordered
(config, feature_raw_band_types, response_raw_band_types)[source]¶
-
bfgn.data_management.training_data.
build_training_data_from_response_points
(config, feature_raw_band_types, response_raw_band_types)[source]¶
-
bfgn.data_management.training_data.
get_proj
(fname)[source]¶ Get the projection of a raster/vector dataset.
- Parameters
fname (str) – Name of the input file.
- Returns
The projection of the input fname.
- Return type
-
bfgn.data_management.training_data.
check_projections
(f_files, r_files, b_files=None)[source]¶
- Return type
List[str]
-
bfgn.data_management.training_data.
check_resolutions
(f_files, r_files, b_files=None)[source]¶
- Return type
List[str]
-
bfgn.data_management.training_data.
calculate_categorical_weights
(responses, weights, config, batch_size=100)[source]¶
- Return type
List[numpy array]
-
bfgn.data_management.training_data.
read_labeling_chunk
(_site, offset_from_ul, config, reference_geotransform)[source]¶
- Return type
numpy array
-
bfgn.data_management.training_data.
read_segmentation_chunk
(_site, all_file_upper_lefts, offset_from_ul, config, reference_geotransform, sample_index)[source]¶ - Return type