bfgn.data_management package

Submodules

bfgn.data_management.apply_model_to_data module

bfgn.data_management.apply_model_to_data.apply_model_to_site(cnn, data_container, feature_files, destination_basename, output_format='GTiff', creation_options=[], CNN_MODE=False, exclude_feature_nodata=False)[source]

Apply a trained model to a raster file.

Parameters
  • cnn (Model) – Pre-trained keras CNN model

  • data_container (DataContainer) – Holds info like scalers

  • feature_files (List[str]) – Per-site feature files to apply the model to

  • destination_basename (str) – Base of the output file (will get appropriate extension)

  • output_format (str) – A viable gdal output data format.

  • creation_options (List[str]) – GDAL creation options to pass for output file, e.g.: [‘TILED=YES’, ‘COMPRESS=DEFLATE’]

  • CNN_MODE (bool) – Should the model be applied in CNN mode?

  • exclude_feature_nodata (bool) – Flag to remove all pixels in features without data from applied model

Returns

None

Return type

None
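Example (a minimal sketch; cnn and data_container are assumed to come from a prior bfgn training run, and the feature file and output basename are hypothetical):

    from bfgn.data_management import apply_model_to_data

    # Apply the trained model to one site's feature stack and write a GeoTIFF.
    apply_model_to_data.apply_model_to_site(
        cnn,
        data_container,
        feature_files=['site_1_features.tif'],
        destination_basename='site_1_applied',
        output_format='GTiff',
        creation_options=['TILED=YES', 'COMPRESS=DEFLATE'],
    )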

bfgn.data_management.apply_model_to_data.maximum_likelihood_classification(likelihood_file, data_container, destination_basename, output_format='GTiff', creation_options=[])[source]

Convert an n-band map of probabilities to a classified image using maximum likelihood.

Parameters
  • likelihood_file (str) – File with per-class likelihoods

  • data_container (DataContainer) – Holds info like scalers

  • destination_basename (str) – Base of the output file (will get appropriate extension)

  • output_format (str) – A viable gdal output data format.

  • creation_options (List[str]) – GDAL creation options to pass for output file, e.g.: [‘TILED=YES’, ‘COMPRESS=DEFLATE’]

Returns

None

Return type

None
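Example (a minimal sketch; likelihood_file would typically be the per-class probability raster written by apply_model_to_site, and the file paths are hypothetical):

    from bfgn.data_management import apply_model_to_data

    # Collapse per-class likelihoods into a single classified raster.
    apply_model_to_data.maximum_likelihood_classification(
        likelihood_file='site_1_applied.tif',
        data_container=data_container,
        destination_basename='site_1_classified',
        output_format='GTiff',
        creation_options=['TILED=YES', 'COMPRESS=DEFLATE'],
    )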

bfgn.data_management.common_io module

bfgn.data_management.common_io.upper_left_pixel(trans, interior_x, interior_y)[source]
bfgn.data_management.common_io.get_overlapping_extent(dataset_list_of_lists)[source]
bfgn.data_management.common_io.get_overlapping_extent_coordinates(dataset_list_of_lists)[source]
bfgn.data_management.common_io.get_all_interior_extent_subset_pixel_locations(gdal_datasets, window_radius, inner_window_radius=None, shuffle=True, return_xy_size=False)[source]
bfgn.data_management.common_io.read_map_subset(datafiles, upper_lefts, window_diameter, mask=None, nodata_value=None, lower_bound=None, upper_bound=None, reference_geotransform=None)[source]
bfgn.data_management.common_io.get_boundary_sets_from_boundary_files(config)[source]
Return type

List[Dataset]

bfgn.data_management.common_io.get_site_boundary_set(config, _site)[source]
Return type

Dataset

bfgn.data_management.common_io.get_site_boundary_vector_file(config, _site)[source]
Return type

str

bfgn.data_management.common_io.rasterize_vector(vector_file, geotransform, output_shape)[source]

Rasterizes an input vector directly into a numpy array.

Parameters
  • vector_file (str) – Input vector file to be rasterized

  • geotransform (List[float]) – A gdal style geotransform

  • output_shape (Tuple) – The shape of the output file to be generated

Returns

A rasterized 2-d numpy array

Return type

mask
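Example (a minimal sketch; the file names are hypothetical, and the (y, x) ordering of output_shape is an assumption):

    from osgeo import gdal

    from bfgn.data_management import common_io

    # Use the geotransform and shape of an existing raster as the rasterization target.
    raster = gdal.Open('site_1_features.tif')
    geotransform = raster.GetGeoTransform()
    output_shape = (raster.RasterYSize, raster.RasterXSize)

    mask = common_io.rasterize_vector('site_1_boundary.shp', geotransform, output_shape)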

bfgn.data_management.common_io.read_mask_chunk(_site, upper_left, window_diameter, reference_geotransform, config)[source]
Return type

np.array

bfgn.data_management.common_io.noerror_open(filename, file_handle=0)[source]
Return type

Dataset

bfgn.data_management.common_io.convert_envi_file(original_file, destination_basename, output_format, cleanup=False, creation_options=[])[source]

Convert an ENVI file to another output format with a gdal_translate call

Parameters
  • original_file (str) – Source envi file

  • destination_basename (str) – Base of the output file (will get appropriate extension)

  • output_format (str) – A viable gdal output data format.

  • cleanup (bool) – Whether to clean up the original ENVI files after conversion

  • creation_options (List[str]) – GDAL creation options to pass for output file, e.g.: [‘TILED=YES’, ‘COMPRESS=DEFLATE’]

Return type

None

Returns

None
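Example (a minimal sketch; the input and output basenames are hypothetical):

    from bfgn.data_management import common_io

    # Convert an ENVI-format output to a tiled, compressed GeoTIFF.
    common_io.convert_envi_file(
        original_file='site_1_applied',
        destination_basename='site_1_applied_gtiff',
        output_format='GTiff',
        cleanup=False,
        creation_options=['TILED=YES', 'COMPRESS=DEFLATE'],
    )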

bfgn.data_management.common_io.read_chunk_by_row(datasets, pixel_upper_lefts, x_size, y_size, line_offset, nodata_value=None)[source]

Read a chunk of multiple datasets line-by-line.

Parameters
  • datasets (List[Dataset]) – each feature dataset to read

  • pixel_upper_lefts (List[List[int]]) – upper left hand pixel of each dataset

  • x_size (int) – size of x data to read

  • y_size (int) – size of y data to read

  • line_offset (int) – line offset from UL of each set to start reading at

  • nodata_value (Optional[float]) – value to encode to np.nan

Returns

feature array

Return type

feature_array
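Example (a minimal sketch; the file names and pixel offsets are hypothetical, and the rasters are assumed to be co-registered):

    from osgeo import gdal

    from bfgn.data_management import common_io

    # Read a 256 x 256 window from the upper-left corner of two datasets.
    datasets = [gdal.Open('site_1_features.tif'), gdal.Open('site_1_responses.tif')]
    chunk = common_io.read_chunk_by_row(
        datasets,
        pixel_upper_lefts=[[0, 0], [0, 0]],
        x_size=256,
        y_size=256,
        line_offset=0,
        nodata_value=-9999,
    )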

bfgn.data_management.data_core module

class bfgn.data_management.data_core.DataContainer(config)[source]

Bases: object

A container class that holds data objects that will need to be passed around for modeling, reporting, and application.

features = []
responses = []
weights = []
training_sequence = None
validation_sequence = None
feature_band_types = None
response_band_types = None
feature_raw_band_types = None
response_raw_band_types = None
feature_per_band_encoded_values = None
response_per_band_encoded_values = None
feature_scaler = []
response_scaler = []
train_folds = None
config = None
logger = None

Root logger for DataContainer. Available if the user wants to directly modify the log formatting, handling, or other behavior.

Type

logging.Logger

build_or_load_rawfile_data(rebuild=False)[source]

If rawfile data has previously been built as described by the config, load it back up (an essentially free operation; only data shells will be loaded). If rawfile data does not yet exist, build it as described by the config.

Parameters

rebuild (bool) – Flag used to rebuild data from scratch, even if it already exists. Defaults to False.

Return type

None

build_or_load_scalers(rebuild=False)[source]

If scalers have previously been built as described by the config, load them back up. If scalers do not yet exist, build them as described by the config. Requires data to have already been built.

Parameters

rebuild – Flag used to refit scalers, even if they already exist. Defaults to False.

load_sequences()[source]

Create and attach sequences to self. Requires data to already be built and any scalers to have been fit.
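A typical build sequence (a minimal sketch; config is assumed to be a bfgn Config object created elsewhere in the workflow):

    from bfgn.data_management import data_core

    # Build or reload raw data, fit or reload scalers, then attach sequences.
    data_container = data_core.DataContainer(config)
    data_container.build_or_load_rawfile_data(rebuild=False)
    data_container.build_or_load_scalers(rebuild=False)
    data_container.load_sequences()

    # The attached sequences drive training and validation.
    training_sequence = data_container.training_sequence
    validation_sequence = data_container.validation_sequence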

check_band_types(file_list, band_types)[source]

Check the format of the band types config parameter.

Parameters
  • file_list – List of list of input files

  • band_types – List of list of band types, corresponding to the first site in the file_list

Returns

errors: List of errors

Return type

List[str]

get_band_types(file_list, band_types)[source]

Get the band types from the band types config parameter.

Parameters
  • file_list – List of list of input files

  • band_types – List of list of band types, corresponding to the first site in the file_list

Returns

Raw output band types.

Return type

band_types

bfgn.data_management.data_core.create_built_data_output_directory(config)[source]
Return type

None

bfgn.data_management.data_core.get_log_filepath(config)[source]

Get the default log path for data builds.

Parameters

config (Config) – Configuration file.

Returns

Filepath to built data log.

Return type

log_filepath

bfgn.data_management.data_core.get_temporary_features_filepath(config)[source]
Return type

str

bfgn.data_management.data_core.get_temporary_responses_filepath(config)[source]
Return type

str

bfgn.data_management.data_core.get_temporary_weights_filepath(config)[source]
Return type

str

bfgn.data_management.data_core.get_temporary_data_filepaths(config, filename_suffix)[source]
Return type

str

bfgn.data_management.data_core.get_built_features_filepaths(config)[source]
Return type

List[str]

bfgn.data_management.data_core.get_built_responses_filepaths(config)[source]
Return type

List[str]

bfgn.data_management.data_core.get_built_weights_filepaths(config)[source]
Return type

List[str]

bfgn.data_management.data_core.get_built_data_config_filepath(config)[source]
Return type

str

bfgn.data_management.data_core.get_built_data_filepaths(config, filename_suffix)[source]
Return type

List[str]

bfgn.data_management.data_core.get_memmap_basename(config)[source]
Return type

str

bfgn.data_management.data_core.get_built_data_container_filepath(config)[source]
Return type

str

bfgn.data_management.ooc_functions module

bfgn.data_management.ooc_functions.one_hot_encode_array(raw_band_types, array, memmap_file=None, per_band_encoding=None)[source]

One hot encode an array of mixed real and categorical variables.

Parameters
  • raw_band_types (List[str]) – Band types for given array, either ‘R’ for real or ‘C’ for categorical.

  • array (np.array) – Array to encode

  • memmap_file (Optional[str]) – File to use for out-of-core processing

  • per_band_encoding (Optional[List[np.array]]) – If None, the encoding will be calculated and returned. If not None, these values will be used to encode the array

Returns

array: the now one-hot-encoded array

band_types: the one-hot-encoded version of the band types

return_band_encoding: the encoding used on a per-categorical-band basis if per_band_encoding was None when provided, otherwise None

Return type

array
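Example (a minimal sketch; the (samples, y, x, bands) array shape is an assumption for illustration, and the three-value unpacking follows the Returns description above):

    import numpy as np

    from bfgn.data_management import ooc_functions

    # Two real bands and one categorical band with three classes.
    raw_band_types = ['R', 'R', 'C']
    array = np.random.rand(10, 16, 16, 3)
    array[..., 2] = np.random.randint(0, 3, size=(10, 16, 16))

    encoded, band_types, band_encoding = ooc_functions.one_hot_encode_array(
        raw_band_types, array, memmap_file=None, per_band_encoding=None
    )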

bfgn.data_management.ooc_functions.permute_array(source, source_filename, permutation)[source]
Return type

np.array

bfgn.data_management.scalers module

bfgn.data_management.scalers.get_available_scalers()[source]

Gets list of available scaler names.

Return type

List[str]

Returns

List of available scaler names.

bfgn.data_management.scalers.get_scaler(scaler_name, scaler_options)[source]

Gets scaler matching the provided name.

Parameters
  • scaler_name (str) – Scaler name from available scalers.

  • scaler_options (dict) – Configuration for requested scaler.

Return type

BaseGlobalScaler

Returns

Scaler matching the provided name.
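Example (a minimal sketch; it assumes ‘MinMaxScaler’ appears among the names returned by get_available_scalers() and that the scaler_options keys map onto the scaler constructor arguments, with a hypothetical savename_base path):

    from bfgn.data_management import scalers

    # List the scalers the package knows about, then build one by name.
    print(scalers.get_available_scalers())

    feature_scaler = scalers.get_scaler(
        'MinMaxScaler', {'savename_base': 'built_data/feature_scaler'}
    )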

class bfgn.data_management.scalers.BaseGlobalScaler(savename_base=None)[source]

Bases: object

Scalers handle the process of transforming data prior to fitting or predicting with the neural network, as well as inverse transforming the data for application or review afterwards. Here we use readily available scalers from the scikit-learn package to handle the details of the transform and inverse transform, while the Scaler class handles reshaping and otherwise managing the image arrays.

scaler_name = None
savename = None
fit(image_array)[source]
inverse_transform(image_array)[source]
transform(image_array)[source]
fit_transform(image_array)[source]
save()[source]
load()[source]
class bfgn.data_management.scalers.BaseSklearnScaler(savename_base)[source]

Bases: bfgn.data_management.scalers.BaseGlobalScaler

scaler = None
inverse_transform(image_array)[source]
transform(image_array)[source]
save()[source]
load()[source]
class bfgn.data_management.scalers.NullScaler(savename_base)[source]

Bases: bfgn.data_management.scalers.BaseGlobalScaler

inverse_transform(image_array)[source]
transform(image_array)[source]
save()[source]
load()[source]
class bfgn.data_management.scalers.ConstantScaler(savename_base, constant_scaler=None, constant_offset=None)[source]

Bases: bfgn.data_management.scalers.BaseGlobalScaler

constant_scaler = None
constant_offset = None
inverse_transform(image_array)[source]
transform(image_array)[source]
save()[source]
load()[source]
class bfgn.data_management.scalers.StandardScaler(savename_base)[source]

Bases: bfgn.data_management.scalers.BaseSklearnScaler

class bfgn.data_management.scalers.MinMaxScaler(savename_base, feature_range=(0, 1))[source]

Bases: bfgn.data_management.scalers.BaseSklearnScaler

class bfgn.data_management.scalers.RobustScaler(savename_base, quantile_range=(10.0, 90.0))[source]

Bases: bfgn.data_management.scalers.BaseSklearnScaler

class bfgn.data_management.scalers.PowerScaler(savename_base, method='box-cox')[source]

Bases: bfgn.data_management.scalers.BaseSklearnScaler

class bfgn.data_management.scalers.QuantileUniformScaler(savename_base, output_distribution='uniform')[source]

Bases: bfgn.data_management.scalers.BaseSklearnScaler
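A round-trip example (a minimal sketch; the (samples, y, x, bands) array shape and the savename_base path are assumptions for illustration):

    import numpy as np

    from bfgn.data_management import scalers

    # Fit a MinMaxScaler on an image array, transform it, and invert the transform.
    image_array = np.random.rand(8, 16, 16, 4)
    scaler = scalers.MinMaxScaler(savename_base='built_data/feature_scaler')
    scaler.fit(image_array)
    transformed = scaler.transform(image_array)
    restored = scaler.inverse_transform(transformed)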

bfgn.data_management.sequences module

class bfgn.data_management.sequences.BaseSequence(feature_scaler, response_scaler, batch_size, apply_random_transforms=False, nan_replacement_value=None)[source]

Bases: keras.utils.data_utils.Sequence

feature_scaler = None
response_scaler = None
apply_random_transforms = None
get_raw_and_transformed_sample(index)[source]
Return type

Tuple[Tuple[List[np.array], List[np.array]], Tuple[List[np.array], List[np.array]]]

class bfgn.data_management.sequences.MemmappedSequence(features, responses, weights, feature_scaler, response_scaler, batch_size, apply_random_transforms, feature_mean_centering, nan_replacement_value)[source]

Bases: bfgn.data_management.sequences.BaseSequence
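For inspection, a single batch can be pulled from an attached sequence (a minimal sketch; data_container is assumed to come from the DataContainer workflow shown earlier, the unpacking follows the return annotation of get_raw_and_transformed_sample, and the raw-then-transformed ordering is inferred from the method name):

    # training_sequence is attached by DataContainer.load_sequences().
    (raw_features, raw_responses), (trans_features, trans_responses) = \
        data_container.training_sequence.get_raw_and_transformed_sample(0)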

bfgn.data_management.single_image_scaling module

bfgn.data_management.single_image_scaling.scale_vector(dat, flag, nodata_value=-9999)[source]

Scale a 1-d numpy array in a specified manner, ignoring nodata values.

Parameters
  • dat – Input vector to be scaled

  • flag – An indicator of the chosen scaling option

  • nodata_value – Value to be ignored; None if no nodata value is specified

Returns

The offset and gain scaling factors, as a two-value list.

bfgn.data_management.single_image_scaling.scale_image(image, flag, nodata_value=-9999)[source]

Scale an image based on a preset flag.

Parameters
  • image – 3-d array with assumed dimensions y, x, band

  • flag – Scaling flag to use (None if no scaling)

  • nodata_value – Value to be ignored

Returns

An image matching the input image dimensions, with scaling applied.

bfgn.data_management.single_image_scaling.scale_image_mean_std(image, nodata_value=-9999)[source]

Mean center and standard-deviation normalize an image.

Parameters
  • image – 3-d array with assumed dimensions y, x, band

  • nodata_value – Value to be ignored; None if no nodata value is specified

Returns

Image with per-band mean centering and standard-deviation normalization applied.

bfgn.data_management.single_image_scaling.scale_image_mean(image, nodata_value=-9999)[source]

Mean center an image.

Parameters
  • image – 3-d array with assumed dimensions y, x, band

  • nodata_value – Value to be ignored; None if no nodata value is specified

Returns

Image with per-band mean centering applied.

bfgn.data_management.single_image_scaling.scale_image_minmax(image, nodata_value=-9999)[source]

Scale an image based on local minimums and maximums.

Parameters
  • image – 3-d array with assumed dimensions y, x, band

  • nodata_value – Value to be ignored; None if no nodata value is specified

Returns

Image with per-band minmax scaling applied.

bfgn.data_management.single_image_scaling.fill_nearest_neighbor(image, nodata=-9999)[source]

Fill in missing values in an image using a nearest-neighbor approach.

Parameters
  • image – 3-d array with assumed dimensions y, x, band

  • nodata – Value to be ignored; None if no nodata value is specified

Returns

Image with nodata values filled in from their nearest neighbors.
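Example (a minimal sketch; the image values, shape, and nodata patch are illustrative only):

    import numpy as np

    from bfgn.data_management import single_image_scaling

    # A small (y, x, band) image with a patch of nodata values.
    image = np.random.rand(64, 64, 3)
    image[:4, :4, :] = -9999

    # Fill the missing pixels, then apply per-band mean/std scaling.
    filled = single_image_scaling.fill_nearest_neighbor(image, nodata=-9999)
    scaled = single_image_scaling.scale_image_mean_std(filled, nodata_value=-9999)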

bfgn.data_management.training_data module

bfgn.data_management.training_data.build_training_data_ordered(config, feature_raw_band_types, response_raw_band_types)[source]
Return type

Tuple[List[np.array], List[np.array], List[np.array], List[str], List[str], List[np.array], List[np.array]]

bfgn.data_management.training_data.build_training_data_from_response_points(config, feature_raw_band_types, response_raw_band_types)[source]
Return type

Tuple[List[np.array], List[np.array], List[np.array], List[str], List[str], List[np.array], List[np.array]]

bfgn.data_management.training_data.get_proj(fname)[source]

Get the projection of a raster/vector dataset.

Parameters

fname (str) – Name of input file

Returns

The projection of the input fname

Return type

str
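Example (a minimal sketch; the file names are hypothetical):

    from bfgn.data_management import training_data

    # Compare the projections of a feature raster and a response raster.
    feature_proj = training_data.get_proj('site_1_features.tif')
    response_proj = training_data.get_proj('site_1_responses.tif')
    assert feature_proj == response_proj, 'Feature and response projections differ'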

bfgn.data_management.training_data.check_projections(f_files, r_files, b_files=None)[source]
Return type

List[str]

bfgn.data_management.training_data.check_resolutions(f_files, r_files, b_files=None)[source]
Return type

List[str]

bfgn.data_management.training_data.calculate_categorical_weights(responses, weights, config, batch_size=100)[source]
Return type

List[np.array]

bfgn.data_management.training_data.read_labeling_chunk(_site, offset_from_ul, config, reference_geotransform)[source]
Return type

np.array

bfgn.data_management.training_data.read_segmentation_chunk(_site, all_file_upper_lefts, offset_from_ul, config, reference_geotransform, sample_index)[source]
Return type

bool

bfgn.data_management.training_data.load_built_data_files(config, writeable=False)[source]
Return type

Tuple[List[np.array], List[np.array], List[np.array]]

bfgn.data_management.training_data.check_built_data_files_exist(config)[source]
Return type

bool
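Example (a minimal sketch; config is assumed to be a bfgn Config object, and the three-list unpacking follows the return annotation of load_built_data_files above):

    from bfgn.data_management import training_data

    # Reload built data only if it already exists on disk.
    if training_data.check_built_data_files_exist(config):
        features, responses, weights = training_data.load_built_data_files(
            config, writeable=False
        )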

Module contents