bfgn.data_management package

Submodules

bfgn.data_management.apply_model_to_data module

bfgn.data_management.apply_model_to_data.apply_model_to_site(cnn, data_container, feature_files, destination_basename, output_format='GTiff', creation_options=[], CNN_MODE=False, exclude_feature_nodata=False)[source]

Apply a trained model to a raster file.

Parameters
  • cnn (Model) – Pre-trained keras CNN model

  • data_container (DataContainer) – Holds info like scalers

  • feature_files (List[str]) – Per-site feature files to apply the model to

  • destination_basename (str) – Base of the output file (will get appropriate extension)

  • output_format (str) – A viable gdal output data format.

  • creation_options (List[str]) – GDAL creation options to pass for output file, e.g.: [‘TILED=YES’, ‘COMPRESS=DEFLATE’]

  • CNN_MODE (bool) – Should the model be applied in CNN mode?

  • exclude_feature_nodata (bool) – Flag to remove all pixels in features without data from applied model

Returns

None

Return type

None
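Example (a minimal sketch; cnn and data_container are assumed to come from a prior bfgn training run, and the feature file and output basename are hypothetical):

    from bfgn.data_management import apply_model_to_data

    # Apply the trained model to one site's feature stack and write a GeoTIFF.
    apply_model_to_data.apply_model_to_site(
        cnn,
        data_container,
        feature_files=['site_1_features.tif'],
        destination_basename='site_1_applied',
        output_format='GTiff',
        creation_options=['TILED=YES', 'COMPRESS=DEFLATE'],
    )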

bfgn.data_management.apply_model_to_data.maximum_likelihood_classification(likelihood_file, data_container, destination_basename, output_format='GTiff', creation_options=[])[source]

Convert an n-band map of probabilities to a classified image using maximum likelihood.

Parameters
  • likelihood_file (str) – File with per-class likelihoods

  • data_container (DataContainer) – Holds info like scalers

  • destination_basename (str) – Base of the output file (will get appropriate extension)

  • output_format (str) – A viable gdal output data format.

  • creation_options (List[str]) – GDAL creation options to pass for output file, e.g.: [‘TILED=YES’, ‘COMPRESS=DEFLATE’]

Returns

None

Return type

None
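Example (a minimal sketch; likelihood_file would typically be the per-class probability raster written by apply_model_to_site, and the file paths are hypothetical):

    from bfgn.data_management import apply_model_to_data

    # Collapse per-class likelihoods into a single classified raster.
    apply_model_to_data.maximum_likelihood_classification(
        likelihood_file='site_1_applied.tif',
        data_container=data_container,
        destination_basename='site_1_classified',
        output_format='GTiff',
        creation_options=['TILED=YES', 'COMPRESS=DEFLATE'],
    )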

bfgn.data_management.common_io module

bfgn.data_management.common_io.upper_left_pixel(trans, interior_x, interior_y)[source]
bfgn.data_management.common_io.get_overlapping_extent(dataset_list_of_lists)[source]
bfgn.data_management.common_io.get_overlapping_extent_coordinates(dataset_list_of_lists)[source]
bfgn.data_management.common_io.get_all_interior_extent_subset_pixel_locations(gdal_datasets, window_radius, inner_window_radius=None, shuffle=True, return_xy_size=False)[source]
bfgn.data_management.common_io.read_map_subset(datafiles, upper_lefts, window_diameter, mask=None, nodata_value=None, lower_bound=None, upper_bound=None, reference_geotransform=None)[source]
bfgn.data_management.common_io.get_boundary_sets_from_boundary_files(config)[source]
Return type

List[Dataset]

bfgn.data_management.common_io.get_site_boundary_set(config, _site)[source]
Return type

Dataset

bfgn.data_management.common_io.get_site_boundary_vector_file(config, _site)[source]
Return type

str

bfgn.data_management.common_io.rasterize_vector(vector_file, geotransform, output_shape)[source]

Rasterizes an input vector directly into a numpy array.

Parameters
  • vector_file (str) – Input vector file to be rasterized

  • geotransform (List[float]) – A gdal style geotransform

  • output_shape (Tuple) – The shape of the output file to be generated

Returns

A rasterized 2-d numpy array

Return type

mask
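Example (a minimal sketch; the file names are hypothetical, and the (y, x) ordering of output_shape is an assumption):

    from osgeo import gdal

    from bfgn.data_management import common_io

    # Use the geotransform and shape of an existing raster as the rasterization target.
    raster = gdal.Open('site_1_features.tif')
    geotransform = raster.GetGeoTransform()
    output_shape = (raster.RasterYSize, raster.RasterXSize)

    mask = common_io.rasterize_vector('site_1_boundary.shp', geotransform, output_shape)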

bfgn.data_management.common_io.read_mask_chunk(_site, upper_left, window_diameter, reference_geotransform, config)[source]
Return type

np.array

bfgn.data_management.common_io.noerror_open(filename, file_handle=0)[source]
Return type

Dataset

bfgn.data_management.common_io.convert_envi_file(original_file, destination_basename, output_format, cleanup=False, creation_options=[])[source]

Convert an ENVI file to another output format with a gdal_translate call

Parameters
  • original_file (str) – Source envi file

  • destination_basename (str) – Base of the output file (will get appropriate extension)

  • output_format (str) – A viable gdal output data format.

  • cleanup (bool) – Whether to clean up the original ENVI files after conversion

  • creation_options (List[str]) – GDAL creation options to pass for output file, e.g.: [‘TILED=YES’, ‘COMPRESS=DEFLATE’]

Return type

None

Returns

None
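Example (a minimal sketch; the input and output basenames are hypothetical):

    from bfgn.data_management import common_io

    # Convert an ENVI-format output to a tiled, compressed GeoTIFF.
    common_io.convert_envi_file(
        original_file='site_1_applied',
        destination_basename='site_1_applied_gtiff',
        output_format='GTiff',
        cleanup=False,
        creation_options=['TILED=YES', 'COMPRESS=DEFLATE'],
    )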

bfgn.data_management.common_io.read_chunk_by_row(datasets, pixel_upper_lefts, x_size, y_size, line_offset, nodata_value=None)[source]

Read a chunk of multiple datasets line-by-line.

Parameters
  • datasets (List[Dataset]) – each feature dataset to read

  • pixel_upper_lefts (List[List[int]]) – upper left hand pixel of each dataset

  • x_size (int) – size of x data to read

  • y_size (int) – size of y data to read

  • line_offset (int) – line offset from UL of each set to start reading at

  • nodata_value (Optional[float]) – value to encode to np.nan

Returns

feature array

Return type

feature_array
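Example (a minimal sketch; the file names and pixel offsets are hypothetical, and the rasters are assumed to be co-registered):

    from osgeo import gdal

    from bfgn.data_management import common_io

    # Read a 256 x 256 window from the upper-left corner of two datasets.
    datasets = [gdal.Open('site_1_features.tif'), gdal.Open('site_1_responses.tif')]
    chunk = common_io.read_chunk_by_row(
        datasets,
        pixel_upper_lefts=[[0, 0], [0, 0]],
        x_size=256,
        y_size=256,
        line_offset=0,
        nodata_value=-9999,
    )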

bfgn.data_management.data_core module

class bfgn.data_management.data_core.DataContainer(config)[source]

Bases: object

A container class that holds data objects that will need to be passed around for modeling, reporting, and application.

features = []
responses = []
weights = []
training_sequence = None
validation_sequence = None
feature_band_types = None
response_band_types = None
feature_raw_band_types = None
response_raw_band_types = None
feature_per_band_encoded_values = None
response_per_band_encoded_values = None
feature_scaler = []
response_scaler = []
train_folds = None
config = None
logger = None

Root logger for DataContainer. Available if the user wants to directly modify the log formatting, handling, or other behavior.

Type

logging.Logger

build_or_load_rawfile_data(rebuild=False)[source]

If rawfile data has previously been built as described by the config, load it back up (an essentially free operation; only data shells will be loaded). If rawfile data does not yet exist, build it as described by the config.

Parameters

rebuild (bool) – Flag used to rebuild data from scratch, even if it already exists. Defaults to False.

Return type

None

build_or_load_scalers(rebuild=False)[source]

If scalers have previously been built as described by the config, load them back up. If scalers do not yet exist, build them as described by the config. Requires data to have already been built.

Parameters

rebuild – Flag used to refit scalers, even if they already exist. Defaults to False.

load_sequences()[source]

Create and attach sequences to self. Requires data to already be built and any scalers to have been fit.
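A typical build sequence (a minimal sketch; config is assumed to be a bfgn Config object created elsewhere in the workflow):

    from bfgn.data_management import data_core

    # Build or reload raw data, fit or reload scalers, then attach sequences.
    data_container = data_core.DataContainer(config)
    data_container.build_or_load_rawfile_data(rebuild=False)
    data_container.build_or_load_scalers(rebuild=False)
    data_container.load_sequences()

    # The attached sequences drive training and validation.
    training_sequence = data_container.training_sequence
    validation_sequence = data_container.validation_sequence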

check_band_types(file_list, band_types)[source]

Check the format of the band types config parameter.

Parameters
  • file_list – List of list of input files

  • band_types – List of list of band types, corresponding to the first site in the file_list

Returns

errors: List of errors

Return type

List[str]

get_band_types(file_list, band_types)[source]

Get the band types from the band types config parameter.

Parameters
  • file_list – List of list of input files

  • band_types – List of list of band types, corresponding to the first site in the file_list

Returns

Raw output band types.

Return type

band_types

bfgn.data_management.data_core.create_built_data_output_directory(config)[source]
Return type

None

bfgn.data_management.data_core.get_log_filepath(config)[source]

Get the default log path for data builds.

Parameters

config (Config) – Configuration file.

Returns

Filepath to built data log.

Return type

log_filepath

bfgn.data_management.data_core.get_temporary_features_filepath(config)[source]
Return type

str

bfgn.data_management.data_core.get_temporary_responses_filepath(config)[source]
Return type

str

bfgn.data_management.data_core.get_temporary_weights_filepath(config)[source]
Return type

str

bfgn.data_management.data_core.get_temporary_data_filepaths(config, filename_suffix)[source]
Return type

str

bfgn.data_management.data_core.get_built_features_filepaths(config)[source]
Return type

List[str]

bfgn.data_management.data_core.get_built_responses_filepaths(config)[source]
Return type

List[str]

bfgn.data_management.data_core.get_built_weights_filepaths(config)[source]
Return type

List[str]

bfgn.data_management.data_core.get_built_data_config_filepath(config)[source]
Return type

str

bfgn.data_management.data_core.get_built_data_filepaths(config, filename_suffix)[source]
Return type

List[str]

bfgn.data_management.data_core.get_memmap_basename(config)[source]
Return type

str

bfgn.data_management.data_core.get_built_data_container_filepath(config)[source]
Return type

str

bfgn.data_management.ooc_functions module

bfgn.data_management.ooc_functions.one_hot_encode_array(raw_band_types, array, memmap_file=None, per_band_encoding=None)[source]

One hot encode an array of mixed real and categorical variables.

Parameters
  • raw_band_types (List[str]) – Band types for given array, either ‘R’ for real or ‘C’ for categorical.

  • array (np.array) – Array to encode

  • memmap_file (Optional[str]) – File to use for out-of-core processing

  • per_band_encoding (Optional[List[np.array]]) – If None, the encoding will be calculated and returned. If not None, these values will be used to encode the array

Returns

array: the now one-hot-encoded array

band_types: the one-hot-encoded version of the band types

return_band_encoding: the encoding used on a per-categorical-band basis if per_band_encoding was None when provided, otherwise None

Return type

array
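Example (a minimal sketch; the (samples, y, x, bands) array shape is an assumption for illustration, and the three-value unpacking follows the Returns description above):

    import numpy as np

    from bfgn.data_management import ooc_functions

    # Two real bands and one categorical band with three classes.
    raw_band_types = ['R', 'R', 'C']
    array = np.random.rand(10, 16, 16, 3)
    array[..., 2] = np.random.randint(0, 3, size=(10, 16, 16))

    encoded, band_types, band_encoding = ooc_functions.one_hot_encode_array(
        raw_band_types, array, memmap_file=None, per_band_encoding=None
    )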

bfgn.data_management.ooc_functions.permute_array(source, source_filename, permutation)[source]
Return type

np.array

bfgn.data_management.scalers module

bfgn.data_management.scalers.get_available_scalers()[source]

Gets list of available scaler names.

Return type

List[str]

Returns

List of available scaler names.

bfgn.data_management.scalers.get_scaler(scaler_name, scaler_options)[source]

Gets scaler matching the provided name.

Parameters
  • scaler_name (str) – Scaler name from available scalers.

  • scaler_options (dict) – Configuration for requested scaler.

Return type

BaseGlobalScaler

Returns

Scaler matching the provided name.
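Example (a minimal sketch; it assumes ‘MinMaxScaler’ appears among the names returned by get_available_scalers() and that the scaler_options keys map onto the scaler constructor arguments, with a hypothetical savename_base path):

    from bfgn.data_management import scalers

    # List the scalers the package knows about, then build one by name.
    print(scalers.get_available_scalers())

    feature_scaler = scalers.get_scaler(
        'MinMaxScaler', {'savename_base': 'built_data/feature_scaler'}
    )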

class bfgn.data_management.scalers.BaseGlobalScaler(savename_base=None)[source]

Bases: object

Scalers handle the process of transforming data prior to fitting or predicting with the neural network, as well as inverse transforming the data for application or review afterwards. Here we use readily available scalers from the scikit-learn package to handle the details of the transform and inverse transform, while the Scaler class handles reshaping and otherwise managing the image arrays.

scaler_name = None
savename = None
fit(image_array)[source]
inverse_transform(image_array)[source]
transform(image_array)[source]
fit_transform(image_array)[source]
save()[source]
load()[source]
class bfgn.data_management.scalers.BaseSklearnScaler(savename_base)[source]

Bases: bfgn.data_management.scalers.BaseGlobalScaler

scaler = None
inverse_transform(image_array)[source]
transform(image_array)[source]
save()[source]
load()[source]
class bfgn.data_management.scalers.NullScaler(savename_base)[source]

Bases: bfgn.data_management.scalers.BaseGlobalScaler

inverse_transform(image_array)[source]
transform(image_array)[source]
save()[source]
load()[source]
class bfgn.data_management.scalers.ConstantScaler(savename_base, constant_scaler=None, constant_offset=None)[source]

Bases: bfgn.data_management.scalers.BaseGlobalScaler

constant_scaler = None
constant_offset = None
inverse_transform(image_array)[source]
transform(image_array)[source]
save()[source]
load()[source]
class bfgn.data_management.scalers.StandardScaler(savename_base)[source]

Bases: bfgn.data_management.scalers.BaseSklearnScaler

class bfgn.data_management.scalers.MinMaxScaler(savename_base, feature_range=(0, 1))[source]

Bases: bfgn.data_management.scalers.BaseSklearnScaler

class bfgn.data_management.scalers.RobustScaler(savename_base, quantile_range=(10.0, 90.0))[source]

Bases: bfgn.data_management.scalers.BaseSklearnScaler

class bfgn.data_management.scalers.PowerScaler(savename_base, method='box-cox')[source]

Bases: bfgn.data_management.scalers.BaseSklearnScaler

class bfgn.data_management.scalers.QuantileUniformScaler(savename_base, output_distribution='uniform')[source]

Bases: bfgn.data_management.scalers.BaseSklearnScaler
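A round-trip example (a minimal sketch; the (samples, y, x, bands) array shape and the savename_base path are assumptions for illustration):

    import numpy as np

    from bfgn.data_management import scalers

    # Fit a MinMaxScaler on an image array, transform it, and invert the transform.
    image_array = np.random.rand(8, 16, 16, 4)
    scaler = scalers.MinMaxScaler(savename_base='built_data/feature_scaler')
    scaler.fit(image_array)
    transformed = scaler.transform(image_array)
    restored = scaler.inverse_transform(transformed)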

bfgn.data_management.sequences module

class bfgn.data_management.sequences.BaseSequence(feature_scaler, response_scaler, batch_size, apply_random_transforms=False, nan_replacement_value=None)[source]

Bases: keras.utils.data_utils.Sequence

feature_scaler = None
response_scaler = None
apply_random_transforms = None
get_raw_and_transformed_sample(index)[source]
Return type

Tuple[Tuple[List[np.array], List[np.array]], Tuple[List[np.array], List[np.array]]]

class bfgn.data_management.sequences.MemmappedSequence(features, responses, weights, feature_scaler, response_scaler, batch_size, apply_random_transforms, feature_mean_centering, nan_replacement_value)[source]

Bases: bfgn.data_management.sequences.BaseSequence
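For inspection, a single batch can be pulled from an attached sequence (a minimal sketch; data_container is assumed to come from the DataContainer workflow shown earlier, the unpacking follows the return annotation of get_raw_and_transformed_sample, and the raw-then-transformed ordering is inferred from the method name):

    # training_sequence is attached by DataContainer.load_sequences().
    (raw_features, raw_responses), (trans_features, trans_responses) = \
        data_container.training_sequence.get_raw_and_transformed_sample(0)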

bfgn.data_management.single_image_scaling module

bfgn.data_management.single_image_scaling.scale_vector(dat, flag, nodata_value=-9999)[source]

Scale a 1-d numpy array in a specified manner, ignoring nodata values.

Parameters
  • dat – Input vector to be scaled

  • flag – An indicator of the chosen scaling option

  • nodata_value – Value to be ignored; None if no nodata value is specified

Returns

The offset and gain scaling factors, as a two-value list.

bfgn.data_management.single_image_scaling.scale_image(image, flag, nodata_value=-9999)[source]

Scale an image based on a preset flag.

Parameters
  • image – 3-d array with assumed dimensions y, x, band

  • flag – Scaling flag to use (None if no scaling)

  • nodata_value – Value to be ignored

Returns

An image matching the input image dimensions, with scaling applied.

bfgn.data_management.single_image_scaling.scale_image_mean_std(image, nodata_value=-9999)[source]

Mean center and standard-deviation normalize an image.

Parameters
  • image – 3-d array with assumed dimensions y, x, band

  • nodata_value – Value to be ignored; None if no nodata value is specified

Returns

Image with per-band mean centering and standard-deviation normalization applied.

bfgn.data_management.single_image_scaling.scale_image_mean(image, nodata_value=-9999)[source]

Mean center an image.

Parameters
  • image – 3-d array with assumed dimensions y, x, band

  • nodata_value – Value to be ignored; None if no nodata value is specified

Returns

Image with per-band mean centering applied.

bfgn.data_management.single_image_scaling.scale_image_minmax(image, nodata_value=-9999)[source]

Scale an image based on local minimums and maximums.

Parameters
  • image – 3-d array with assumed dimensions y, x, band

  • nodata_value – Value to be ignored; None if no nodata value is specified

Returns

Image with per-band minmax scaling applied.

bfgn.data_management.single_image_scaling.fill_nearest_neighbor(image, nodata=-9999)[source]

Fill in missing values in an image using a nearest-neighbor approach.

Parameters
  • image – 3-d array with assumed dimensions y, x, band

  • nodata – Value to be ignored; None if no nodata value is specified

Returns

Image with nodata values filled in from their nearest neighbors.
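Example (a minimal sketch; the image values, shape, and nodata patch are illustrative only):

    import numpy as np

    from bfgn.data_management import single_image_scaling

    # A small (y, x, band) image with a patch of nodata values.
    image = np.random.rand(64, 64, 3)
    image[:4, :4, :] = -9999

    # Fill the missing pixels, then apply per-band mean/std scaling.
    filled = single_image_scaling.fill_nearest_neighbor(image, nodata=-9999)
    scaled = single_image_scaling.scale_image_mean_std(filled, nodata_value=-9999)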

bfgn.data_management.training_data module

bfgn.data_management.training_data.build_training_data_ordered(config, feature_raw_band_types, response_raw_band_types)[source]
Return type

Tuple[List[np.array], List[np.array], List[np.array], List[str], List[str], List[np.array], List[np.array]]

bfgn.data_management.training_data.build_training_data_from_response_points(config, feature_raw_band_types, response_raw_band_types)[source]
Return type

Tuple[List[np.array], List[np.array], List[np.array], List[str], List[str], List[np.array], List[np.array]]

bfgn.data_management.training_data.get_proj(fname)[source]

Get the projection of a raster/vector dataset.

Parameters

fname (str) – Name of input file

Returns

The projection of the input fname

Return type

str
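Example (a minimal sketch; the file names are hypothetical):

    from bfgn.data_management import training_data

    # Compare the projections of a feature raster and a response raster.
    feature_proj = training_data.get_proj('site_1_features.tif')
    response_proj = training_data.get_proj('site_1_responses.tif')
    assert feature_proj == response_proj, 'Feature and response projections differ'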

bfgn.data_management.training_data.check_projections(f_files, r_files, b_files=None)[source]
Return type

List[str]

bfgn.data_management.training_data.check_resolutions(f_files, r_files, b_files=None)[source]
Return type

List[str]

bfgn.data_management.training_data.calculate_categorical_weights(responses, weights, config, batch_size=100)[source]
Return type

List[np.array]

bfgn.data_management.training_data.read_labeling_chunk(_site, offset_from_ul, config, reference_geotransform)[source]
Return type

np.array

bfgn.data_management.training_data.read_segmentation_chunk(_site, all_file_upper_lefts, offset_from_ul, config, reference_geotransform, sample_index)[source]
Return type

bool

bfgn.data_management.training_data.load_built_data_files(config, writeable=False)[source]
Return type

Tuple[List[np.array], List[np.array], List[np.array]]

bfgn.data_management.training_data.check_built_data_files_exist(config)[source]
Return type

bool
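Example (a minimal sketch; config is assumed to be a bfgn Config object, and the three-list unpacking follows the return annotation of load_built_data_files above):

    from bfgn.data_management import training_data

    # Reload built data only if it already exists on disk.
    if training_data.check_built_data_files_exist(config):
        features, responses, weights = training_data.load_built_data_files(
            config, writeable=False
        )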

Module contents