Utils Module

utils

Classes:

Name Description
GeoInterface

A class that provides a unified interface for working with geospatial data from various sources.

Functions:

Name Description
sample_raster_nearest

Sample a raster file at specific coordinates, taking the nearest pixel.

reproject_crop_raster

Reproject and crop a raster file.

copy_file

Copy a file from source to destination, optionally creating a symbolic link instead.

parallel_executor

Executes a function across multiple processes and collects the results.

read_gdb_layer

Reads selected columns from a GDB layer and returns them in a pandas DataFrame.

GeoInterface

A class that provides a unified interface for working with geospatial data from various sources.

This class can load and process data from raster files (.tif/.tiff), CSV files, shapefiles, or pandas DataFrames. It provides methods for finding nearest neighbors using haversine distances between geographic coordinates.

Attributes:

Name Type Description
df DataFrame

The loaded data, containing at minimum 'lat' and 'lon' columns.

points_rad ndarray

The latitude/longitude points converted to radians.

tree BallTree

A BallTree structure for efficient nearest neighbor queries.

Parameters:

Name Type Description Default
data_source str or DataFrame

The input data source. One of: a path to a raster file (.tif/.tiff); a path to a CSV file (.csv); a path to a shapefile (.shp); or a pandas DataFrame with 'lat' and 'lon' columns.

required

Raises:

Type Description
ValueError

If the data source format is unsupported or required columns are missing.

Methods:

Name Description
find_nearest

Find the nearest 'k' data points for each latitude and longitude provided separately.

lookup

Find the nearest data point to a single latitude and longitude.

find_nearest(lats, lons, k=1)

Find the nearest 'k' data points for each latitude and longitude provided separately.

Parameters:

Name Type Description Default
lats list of float

A list of latitudes.

required
lons list of float

A list of longitudes.

required
k int

Number of nearest neighbors to find.

1

Returns:

Type Description

list or pandas.DataFrame: Depending on 'k', returns a DataFrame or a list of DataFrames with the nearest points.

lookup(lat, lon)

Find the nearest data point to a single latitude and longitude.

Parameters:

Name Type Description Default
lat float

Latitude of the query point.

required
lon float

Longitude of the query point.

required

Returns:

Type Description

pandas.Series: The row from the DataFrame corresponding to the nearest point.
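
To make the haversine-based lookup concrete, here is a minimal brute-force sketch of the same idea. It is not the class's implementation: the class is documented as using a BallTree, whereas this stand-in simply sorts every row by distance. The names `haversine_km`, `nearest_rows`, and `stations`, the list-of-dicts input, and the Earth radius constant are all assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption for this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against rounding error near antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def nearest_rows(rows, lat, lon, k=1):
    """Return the k rows closest to (lat, lon), nearest first.

    `rows` is a list of dicts with 'lat' and 'lon' keys, standing in for
    the DataFrame held by the class (hypothetical simplification).
    """
    ranked = sorted(rows, key=lambda r: haversine_km(lat, lon, r["lat"], r["lon"]))
    return ranked[:k]


# Hypothetical sample data.
stations = [
    {"name": "A", "lat": 0.0, "lon": 0.0},
    {"name": "B", "lat": 10.0, "lon": 10.0},
    {"name": "C", "lat": -5.0, "lon": 20.0},
]

closest = nearest_rows(stations, 1.0, 1.0)[0]
print(closest["name"])
```

A tree-based index answers the same query without scanning every row, which matters once the data holds many points; the brute-force version is only meant to show what "nearest by haversine distance" means.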

sample_raster_nearest(raster_file, coords, crs='EPSG:4326')

Sample a raster file at specific coordinates, taking the nearest pixel.

Parameters:

Name Type Description Default
raster_file str

Path to the raster file.

required
coords list of tuples

List of (x, y)/(lon, lat) tuples.

required
crs str

The CRS the coords are in.

'EPSG:4326'

Returns:

Name Type Description
dict

A dictionary with band names as keys and lists of pixel values at the given coordinates as values.
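
The nearest-pixel idea can be sketched without any raster library. The example below assumes a north-up grid with square pixels, held as plain nested lists, and assumes the coordinates are already in the grid's CRS, so the `crs` conversion step is left out. The function name `sample_bands_nearest` and its parameters are hypothetical; only the shape of the returned dictionary follows the description above.

```python
import math


def sample_bands_nearest(bands, origin_x, origin_y, pixel_size, coords):
    """Sample each band at (x, y) points by picking the pixel containing the point.

    bands: dict mapping band name -> list of rows (row 0 is the northernmost).
    origin_x, origin_y: map coordinates of the grid's top-left corner.
    pixel_size: edge length of a square pixel in map units.
    Points outside the grid yield None.
    """
    sampled = {}
    for name, grid in bands.items():
        values = []
        for x, y in coords:
            # On a regular grid, the pixel containing the point is also the
            # pixel whose centre is nearest to it.
            col = math.floor((x - origin_x) / pixel_size)
            row = math.floor((origin_y - y) / pixel_size)  # y decreases going down
            if 0 <= row < len(grid) and 0 <= col < len(grid[0]):
                values.append(grid[row][col])
            else:
                values.append(None)
        sampled[name] = values
    return sampled


# Hypothetical 2 x 3 single-band grid whose top-left corner is at (10, 50).
bands = {"band_1": [[1, 2, 3], [4, 5, 6]]}
points = [(10.5, 49.5), (12.2, 48.1), (20.0, 20.0)]
sampled = sample_bands_nearest(bands, 10.0, 50.0, 1.0, points)
print(sampled)
```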

reproject_crop_raster(src, dst, out_epsg, min_coords, max_coords)

Reproject and crop a raster file.

src: Source file path.
dst: Destination file path.
out_epsg: Output coordinate system as an EPSG code.
min_coords, max_coords: Bounding box corners, described in the docstring as min_lon, min_lat and max_lon, max_lat.

copy_file(src, dest, symlink=False)

Copy a file from source to destination, optionally creating a symbolic link instead.

Parameters:

Name Type Description Default
src str

Path to the source file

required
dest str

Path to the destination file/link

required
symlink bool

Whether to create a symbolic link instead of copying. Defaults to False.

False

Returns:

Type Description

str | None: Path to the destination file if successful, None if the source doesn't exist.

Note

If symlink is True and the destination already exists, it will be removed first.
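
The documented behaviour can be approximated with the standard library alone. The sketch below is an assumption about how such a helper might look, not the module's actual code: `copy_file_sketch` is a hypothetical name, and the choice of `shutil.copy2` and an absolute link target are illustrative decisions.

```python
import os
import shutil
import tempfile


def copy_file_sketch(src, dest, symlink=False):
    """Copy src to dest, or make dest a symbolic link to src when symlink is True."""
    if not os.path.exists(src):
        return None  # documented behaviour: a missing source yields None
    if symlink:
        # Documented note: an existing destination is removed before linking.
        if os.path.lexists(dest):
            os.remove(dest)
        os.symlink(os.path.abspath(src), dest)
    else:
        shutil.copy2(src, dest)
    return dest


# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "a.txt")
    dest = os.path.join(tmp, "b.txt")
    with open(src, "w") as fh:
        fh.write("hello")

    missing_result = copy_file_sketch(os.path.join(tmp, "missing.txt"), dest)
    copied_path = copy_file_sketch(src, dest)
    with open(dest) as fh:
        copied_text = fh.read()

    # dest already exists here, so the symlink branch removes it first.
    copy_file_sketch(src, dest, symlink=True)
    dest_is_link = os.path.islink(dest)
```

Creating symbolic links may need extra privileges on some platforms, so the symlink branch is best treated as POSIX-oriented.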

parallel_executor(func, args, method='Process', max_workers=10, return_value=False, bar=True, timeout=None, verbose_errors=False)

Executes a function across multiple processes and collects the results.

Parameters:

Name Type Description Default
func

The function to execute.

required
method

Execution backend, given as the string 'Process' or 'Thread'.

'Process'
args

An iterable of arguments to pass to the function.

required
max_workers

The maximum number of worker processes (or threads) to use.

10
return_value

A boolean indicating whether the function returns a value.

False
timeout

Number of seconds to wait for a process to complete.

None
verbose_errors

A boolean indicating whether to print full error traceback or just the exception

False

Returns:

Name Type Description
results

If return_value is True, a list of results from the function executions, sorted according to … If return_value is False, an empty list is returned.

failed_indices

A list of indices of arguments for which the function execution failed.
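
The two return values are easier to picture with a small stand-in built on `concurrent.futures`. This is a sketch under stated assumptions rather than the module's implementation: `run_parallel` and `reciprocal` are hypothetical names, only a thread pool is shown, the progress bar and error printing are omitted, and ordering results by input position is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def run_parallel(func, args, max_workers=4, return_value=False, timeout=None):
    """Run func over args in a thread pool and return (results, failed_indices)."""
    results = []
    failed_indices = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(func, arg) for arg in args]
        # Walk the futures in submission order so results follow input order
        # (an assumption of this sketch).
        for index, future in enumerate(futures):
            try:
                value = future.result(timeout=timeout)
            except Exception:
                # Any exception, including a timeout, marks this argument as
                # failed. A timed-out thread still runs to completion.
                failed_indices.append(index)
                continue
            if return_value:
                results.append(value)
    return results, failed_indices


def reciprocal(x):
    """Hypothetical worker that fails for x == 0."""
    return 1 / x


results, failed = run_parallel(reciprocal, [1, 2, 0, 4], return_value=True)
print(results, failed)
```

Here the third argument raises `ZeroDivisionError`, so its index is reported in the failure list while the remaining results are still collected.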

read_gdb_layer(gdb_data, layer_name, columns=None, names=None)

Reads selected columns from a GDB layer and returns them in a pandas DataFrame.

Parameters:

Name Type Description Default
gdb_data gdb

The GDB file opened by ogr.

required
layer_name str

The name of the layer to read.

required
columns list

List of column indices to read. If None, all columns are read.

None
names list

List of column names corresponding to the indices in columns. If None, all column names are inferred from the layer definition.

None

Returns:

Type Description

pd.DataFrame: The resulting dataframe.