Utils Module

`utils`

Classes:

Name	Description
`GeoInterface`	A class that provides a unified interface for working with geospatial data from various sources.

Functions:

Name	Description
`sample_raster_nearest`	Sample a raster file at specific coordinates, taking the nearest pixel.
`reproject_crop_raster`	Reproject and crop a raster file.
`copy_file`	Copy a file from source to destination, optionally creating a symbolic link instead.
`parallel_executor`	Executes a function across multiple processes or threads and collects the results.
`read_gdb_layer`	Reads selected columns from a GDB layer and returns them in a pandas DataFrame.

`GeoInterface`

A class that provides a unified interface for working with geospatial data from various sources.

This class can load and process data from raster files (.tif/.tiff), CSV files, shapefiles, or pandas DataFrames. It provides methods for finding nearest neighbors using haversine distances between geographic coordinates.

Attributes:

Name	Type	Description
`df`	`DataFrame`	The loaded data, containing at minimum 'lat' and 'lon' columns.
`points_rad`	`ndarray`	The latitude/longitude points converted to radians.
`tree`	`BallTree`	A BallTree structure for efficient nearest neighbor queries.
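
The attributes above imply a two-step pattern: convert coordinates to radians, then rank candidates by haversine distance. The following is a minimal sketch of that idea using a brute-force scan in pure Python rather than a BallTree. The names `haversine_rad` and `nearest_index` are hypothetical and are not part of `GeoInterface`.

```python
import math


def haversine_rad(lat1, lon1, lat2, lon2):
    """Great-circle distance, in radians, between two points given in radians."""
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    # Clamp to 1.0 to guard against rounding error for near-antipodal points.
    return 2 * math.asin(min(1.0, math.sqrt(a)))


def nearest_index(points_deg, lat, lon):
    """Index of the (lat, lon) pair in points_deg (degrees) closest to the query."""
    # Mirrors the documented `points_rad` attribute: degrees converted to radians.
    points_rad = [(math.radians(p_lat), math.radians(p_lon)) for p_lat, p_lon in points_deg]
    q_lat, q_lon = math.radians(lat), math.radians(lon)
    distances = [haversine_rad(q_lat, q_lon, p_lat, p_lon) for p_lat, p_lon in points_rad]
    # Brute-force scan; a BallTree answers the same query without visiting every point.
    return min(range(len(distances)), key=distances.__getitem__)


# Hypothetical sample data: Paris, London, New York.
points = [(48.8566, 2.3522), (51.5074, -0.1278), (40.7128, -74.0060)]
idx = nearest_index(points, 50.8503, 4.3517)  # query point near Brussels
```

Multiplying the returned angle by an Earth radius (about 6371 km) converts it to a surface distance.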

Parameters:

Name	Type	Description	Default
`data_source`	`str or DataFrame`	The input data source: a path to a raster file (.tif/.tiff), a path to a CSV file (.csv), a path to a shapefile (.shp), or a pandas DataFrame with 'lat' and 'lon' columns.	required

Raises:

Type	Description
`ValueError`	If the data source format is unsupported or required columns are missing.

Methods:

Name	Description
`find_nearest`	Find the `k` nearest data points for each query point, with latitudes and longitudes passed as separate lists.
`lookup`	Find the nearest data point to a single latitude and longitude.

`find_nearest(lats, lons, k=1)`

Find the `k` nearest data points for each query point, with latitudes and longitudes passed as separate lists.

Parameters:

Name	Type	Description	Default
`lats`	`list of float`	A list of latitudes.	required
`lons`	`list of float`	A list of longitudes.	required
`k`	`int`	Number of nearest neighbors to find.	`1`

Returns:

Type	Description
`list or pandas.DataFrame`	Depending on `k`, a DataFrame or a list of DataFrames with the nearest points.

`lookup(lat, lon)`

Find the nearest data point to a single latitude and longitude.

Parameters:

Name	Type	Description	Default
`lat`	`float`	Latitude of the query point.	required
`lon`	`float`	Longitude of the query point.	required

Returns:

Type	Description
`pandas.Series`	The row from the DataFrame corresponding to the nearest point.

`sample_raster_nearest(raster_file, coords, crs='EPSG:4326')`

Sample a raster file at specific coordinates, taking the nearest pixel.

Parameters:

Name	Type	Description	Default
`raster_file`	`str`	Path to the raster file.	required
`coords`	`list of tuples`	List of (x, y)/(lon, lat) tuples.	required
`crs`	`str`	The CRS the coords are in.	`'EPSG:4326'`

Returns:

Type	Description
`dict`	A dictionary with band names as keys and lists of pixel values at the given coordinates as values.
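
To illustrate what "taking the nearest pixel" means, here is a minimal sketch that maps coordinates to row and column indices in a north-up grid and reads the value there. It assumes an axis-aligned raster described by its top-left corner and pixel size, and it uses a plain nested list instead of a raster file. The names `nearest_pixel` and `sample_nearest` are hypothetical and do not describe how `sample_raster_nearest` is implemented.

```python
import math


def nearest_pixel(x, y, origin_x, origin_y, pixel_width, pixel_height):
    """Return (row, col) of the pixel containing (x, y) in a north-up raster.

    origin_x, origin_y: coordinates of the raster's top-left corner.
    pixel_width, pixel_height: positive pixel sizes in CRS units.
    """
    col = math.floor((x - origin_x) / pixel_width)
    row = math.floor((origin_y - y) / pixel_height)  # rows grow as y decreases
    return row, col


def sample_nearest(grid, coords, origin_x, origin_y, pixel_width, pixel_height):
    """Sample a 2D list at each (x, y); coordinates outside the grid yield None."""
    values = []
    for x, y in coords:
        row, col = nearest_pixel(x, y, origin_x, origin_y, pixel_width, pixel_height)
        if 0 <= row < len(grid) and 0 <= col < len(grid[0]):
            values.append(grid[row][col])
        else:
            values.append(None)
    return values


# Hypothetical 2x3 single-band grid with its top-left corner at (10.0, 50.0)
# and 1.0 x 1.0 pixels.
grid = [[1, 2, 3],
        [4, 5, 6]]
coords = [(10.5, 49.5), (12.2, 48.1), (9.0, 49.5)]
values = sample_nearest(grid, coords, 10.0, 50.0, 1.0, 1.0)
```

The pixel that contains a point is also the one whose center is closest to it, so no interpolation between neighboring pixels is involved.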

`reproject_crop_raster(src, dst, out_epsg, min_coords, max_coords)`

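Reproject and crop a raster file.

Parameters:

Name	Description	Default
`src`	Source file path.	required
`dst`	Destination file path.	required
`out_epsg`	Output coordinate system as an EPSG code.	required
`min_coords`	Minimum corner of the bounding box (min_lon, min_lat).	required
`max_coords`	Maximum corner of the bounding box (max_lon, max_lat).	required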

`copy_file(src, dest, symlink=False)`

Copy a file from source to destination, optionally creating a symbolic link instead.

Parameters:

Name	Type	Description	Default
`src`	`str`	Path to the source file.	required
`dest`	`str`	Path to the destination file or link.	required
`symlink`	`bool`	Whether to create a symbolic link instead of copying.	`False`

Returns:

Type	Description
`str or None`	Path to the destination file if successful; None if the source doesn't exist.

Note

If symlink is True and the destination already exists, it will be removed first.
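
The documented behavior (return None for a missing source, replace an existing destination before linking) can be sketched with the standard library alone. This is an illustrative approximation, not the module's implementation. The name `copy_file_sketch`, the use of `shutil.copy2`, and linking to the absolute source path are all assumptions.

```python
import os
import shutil
import tempfile


def copy_file_sketch(src, dest, symlink=False):
    """Copy src to dest, or symlink dest to src; return dest, or None if src is missing."""
    if not os.path.exists(src):
        return None
    if symlink:
        # lexists also detects a dangling link left at dest; remove it first.
        if os.path.lexists(dest):
            os.remove(dest)
        os.symlink(os.path.abspath(src), dest)
    else:
        shutil.copy2(src, dest)  # copies content and file metadata
    return dest


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "source.txt")
    with open(src, "w") as fh:
        fh.write("hello")

    copied = copy_file_sketch(src, os.path.join(tmp, "copy.txt"))
    linked = copy_file_sketch(src, os.path.join(tmp, "link.txt"), symlink=True)
    # Linking again to the same destination replaces the existing link.
    relinked = copy_file_sketch(src, os.path.join(tmp, "link.txt"), symlink=True)
    missing = copy_file_sketch(os.path.join(tmp, "absent.txt"), os.path.join(tmp, "x.txt"))

    with open(copied) as fh:
        copied_text = fh.read()
    link_is_symlink = os.path.islink(linked)
```

Creating symbolic links may require extra privileges on some platforms, which is one reason a plain copy is the default.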

`parallel_executor(func, args, method='Process', max_workers=10, return_value=False, bar=True, timeout=None, verbose_errors=False)`

Executes a function across multiple processes or threads and collects the results.

Parameters:

Name	Description	Default
`func`	The function to execute.	required
`args`	An iterable of arguments to pass to the function.	required
`method`	Execution backend, either `'Process'` or `'Thread'`.	`'Process'`
`max_workers`	The maximum number of workers (processes or threads) to use.	`10`
`return_value`	A boolean indicating whether the function returns a value.	`False`
`timeout`	Number of seconds to wait for a worker to complete.	`None`
`verbose_errors`	A boolean indicating whether to print the full error traceback or just the exception.	`False`

Returns:

Name	Type	Description
`results`		If `return_value` is True, a list of results from the function executions, sorted according to […]. If `return_value` is False, an empty list is returned.
`failed_indices`		A list of indices of arguments for which the function execution failed.
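
The return contract above (a list of results plus the indices of failed arguments) can be sketched with `concurrent.futures`. This is a simplified illustration, not the module's code. The name `parallel_executor_sketch` is hypothetical, each element of `args` is assumed to be passed as a single positional argument, results are assumed to come back in input order, and the sketch defaults to threads so it runs without a `__main__` guard.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def parallel_executor_sketch(func, args, method="Thread", max_workers=4,
                             return_value=False, timeout=None):
    """Run func over args in a pool; return (results, failed_indices)."""
    pool_cls = ProcessPoolExecutor if method == "Process" else ThreadPoolExecutor
    results = []
    failed_indices = []
    with pool_cls(max_workers=max_workers) as pool:
        # Submit everything first so the tasks run concurrently.
        futures = [pool.submit(func, arg) for arg in args]
        # Collect in submission order, so results follow the order of args.
        for index, future in enumerate(futures):
            try:
                # timeout bounds the wait for this result; it does not cancel the task.
                value = future.result(timeout=timeout)
            except Exception:
                failed_indices.append(index)
                continue
            if return_value:
                results.append(value)
    return results, failed_indices


def reciprocal(x):
    return 1 / x


# The third argument raises ZeroDivisionError, so index 2 is reported as failed.
results, failed = parallel_executor_sketch(reciprocal, [1, 2, 0, 4], return_value=True)
ignored, none_failed = parallel_executor_sketch(reciprocal, [1, 2], return_value=False)
```

Because failures are reported by index, a caller can retry only the failed inputs by indexing back into the original argument list.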

`read_gdb_layer(gdb_data, layer_name, columns=None, names=None)`

Reads selected columns from a GDB layer and returns them in a pandas DataFrame.

Parameters:

Name	Type	Description	Default
`gdb_data`	`gdb`	The GDB file opened by ogr.	required
`layer_name`	`str`	The name of the layer to read.	required
`columns`	`list`	List of column indices to read. If None, all columns are read.	`None`
`names`	`list`	List of column names corresponding to the indices in `columns`. If None, all column names are inferred from the layer definition.	`None`

Returns:

Type	Description
`pd.DataFrame`	The resulting DataFrame.