Utils Module
utils
Classes:
Name | Description |
---|---|
GeoInterface | A class that provides a unified interface for working with geospatial data from various sources. |
Functions:
Name | Description |
---|---|
sample_raster_nearest | Sample a raster file at specific coordinates, taking the nearest pixel. |
reproject_crop_raster | Reproject and crop a raster file. |
copy_file | Copy a file from source to destination, optionally creating a symbolic link instead. |
parallel_executor | Executes a function across multiple processes or threads and collects the results. |
read_gdb_layer | Reads selected columns from a GDB layer and returns them in a pandas DataFrame. |
GeoInterface
A class that provides a unified interface for working with geospatial data from various sources.
This class can load and process data from raster files (.tif/.tiff), CSV files, shapefiles, or pandas DataFrames. It provides methods for finding nearest neighbors using haversine distances between geographic coordinates.
Attributes:
Name | Type | Description |
---|---|---|
df | DataFrame | The loaded data, containing at minimum 'lat' and 'lon' columns. |
points_rad | ndarray | The latitude/longitude points converted to radians. |
tree | BallTree | A BallTree structure for efficient nearest neighbor queries. |
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data_source | str or DataFrame | The input data source. One of: a path to a raster file (.tif/.tiff), a path to a CSV file (.csv), a path to a shapefile (.shp), or a pandas DataFrame with 'lat' and 'lon' columns. | required |
Raises:
Type | Description |
---|---|
ValueError | If the data source format is unsupported or required columns are missing. |
Methods:
Name | Description |
---|---|
find_nearest | Find the nearest 'k' data points for each latitude and longitude provided separately. |
lookup | Find the nearest data point to a single latitude and longitude. |
find_nearest(lats, lons, k=1)
Find the nearest 'k' data points for each latitude and longitude provided separately.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
lats | list of float | A list of latitudes. | required |
lons | list of float | A list of longitudes. | required |
k | int | Number of nearest neighbors to find. | 1 |
Returns:
Type | Description |
---|---|
list or pandas.DataFrame | Depending on 'k', returns a DataFrame or a list of DataFrames with the nearest points. |
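The nearest-neighbor search described above relies on haversine (great-circle) distances. The following is a minimal standard-library sketch of that idea, not the class's implementation: it swaps the BallTree for a brute-force scan, and the names `haversine_km`, `nearest_k`, and `EARTH_RADIUS_KM` are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius, used only in this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_k(points, lat, lon, k=1):
    """Return the indices of the k points closest to (lat, lon), nearest first.

    Brute-force scan over all points; a BallTree answers the same query
    without comparing against every point.
    """
    order = sorted(range(len(points)), key=lambda i: haversine_km(lat, lon, *points[i]))
    return order[:k]


# (lat, lon) pairs: Paris, London, New York
points = [(48.8566, 2.3522), (51.5074, -0.1278), (40.7128, -74.0060)]
print(nearest_k(points, 50.8503, 4.3517, k=2))  # query point near Brussels
```

A tree-based index pays off once the dataset is large or many queries are issued; for a handful of points the scan above is usually adequate.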
lookup(lat, lon)
Find the nearest data point to a single latitude and longitude.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
lat | float | Latitude of the query point. | required |
lon | float | Longitude of the query point. | required |
Returns:
Type | Description |
---|---|
pandas.Series | The row from the DataFrame corresponding to the nearest point. |
sample_raster_nearest(raster_file, coords, crs='EPSG:4326')
Sample a raster file at specific coordinates, taking the nearest pixel.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
raster_file | str | Path to the raster file. | required |
coords | list of tuples | List of (x, y)/(lon, lat) tuples. | required |
crs | str | The CRS the coords are in. | 'EPSG:4326' |
Returns:
Type | Description |
---|---|
dict | A dictionary with band names as keys and lists of pixel values at the given coordinates as values. |
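To illustrate what nearest-pixel sampling means, here is a minimal sketch that works on in-memory grids rather than a raster file. It assumes a north-up grid whose origin is the upper-left corner of the upper-left pixel, and it skips any CRS conversion. The function `sample_nearest` and its parameters are hypothetical and are not part of this module.

```python
def sample_nearest(bands, origin_x, origin_y, pixel_w, pixel_h, coords):
    """Sample each band at the pixel containing each (x, y) coordinate.

    bands: dict mapping band name -> 2D list indexed as grid[row][col].
    Coordinates that fall outside the grid yield None.
    """
    out = {name: [] for name in bands}
    for x, y in coords:
        # Column grows with x; row grows as y decreases (north-up grid).
        col = int((x - origin_x) // pixel_w)
        row = int((origin_y - y) // pixel_h)
        for name, grid in bands.items():
            inside = 0 <= row < len(grid) and 0 <= col < len(grid[0])
            out[name].append(grid[row][col] if inside else None)
    return out


bands = {"band_1": [[1, 2, 3], [4, 5, 6]]}
values = sample_nearest(bands, 10.0, 50.0, 1.0, 1.0, [(10.5, 49.5), (12.2, 48.1), (20.0, 20.0)])
print(values)  # {'band_1': [1, 6, None]}
```

The returned dictionary mirrors the documented shape: one key per band, each holding one value per input coordinate.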
reproject_crop_raster(src, dst, out_epsg, min_coords, max_coords)
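Reproject and crop a raster file.
Parameters:
Name | Description |
---|---|
src | Source file path. |
dst | Destination file path. |
out_epsg | Output coordinate system as an EPSG code. |
min_coords | Minimum corner of the bounding box (min_lon, min_lat). |
max_coords | Maximum corner of the bounding box (max_lon, max_lat). |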
copy_file(src, dest, symlink=False)
Copy a file from source to destination, optionally creating a symbolic link instead.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
src | str | Path to the source file. | required |
dest | str | Path to the destination file/link. | required |
symlink | bool | Whether to create a symbolic link instead of copying. | False |
Returns:
Type | Description |
---|---|
str or None | Path to the destination file if successful, None if the source doesn't exist. |
Note
If symlink is True and the destination already exists, it will be removed first.
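The documented behavior can be sketched with the standard library as follows. This is an illustration under stated assumptions, not the module's source: the name `copy_or_link` is hypothetical, and using `shutil.copy2` for the copy and an absolute link target are choices made for this sketch.

```python
import os
import shutil


def copy_or_link(src, dest, symlink=False):
    """Copy src to dest, or create a symbolic link at dest pointing to src.

    Returns dest on success, or None if src does not exist.
    """
    if not os.path.exists(src):
        return None
    if symlink:
        # Remove an existing destination first; lexists also catches broken links.
        if os.path.lexists(dest):
            os.remove(dest)
        os.symlink(os.path.abspath(src), dest)
    else:
        shutil.copy2(src, dest)
    return dest
```

Checking the return value for None lets a caller distinguish a missing source from a successful copy without catching exceptions.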
parallel_executor(func, args, method='Process', max_workers=10, return_value=False, bar=True, timeout=None, verbose_errors=False)
Executes a function across multiple processes or threads and collects the results.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
func | | The function to execute. | required |
method | str | Execution backend, either 'Process' or 'Thread'. | 'Process' |
args | | An iterable of arguments to pass to the function. | required |
max_workers | | The maximum number of worker processes or threads to use. | 10 |
return_value | bool | Whether the function returns a value. | False |
timeout | | Number of seconds to wait for a worker to complete. | None |
verbose_errors | bool | Whether to print the full error traceback or just the exception. | False |
Returns:
Name | Type | Description |
---|---|---|
results | list | If return_value is True, a list of results from the function executions sorted according to [...]. If return_value is False, an empty list is returned. |
failed_indices | list | A list of indices of arguments for which the function execution failed. |
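The pattern described above (run a function over many arguments, collect results, and record which inputs failed) can be sketched with `concurrent.futures`. This is a minimal illustration, not the module's implementation. The name `run_parallel` is hypothetical, and three behaviors are assumptions of the sketch: each item of `args` is passed as a single positional argument, results keep the order of `args`, and both lists are returned as a tuple.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def run_parallel(func, args, method="Thread", max_workers=4, return_value=False, timeout=None):
    """Run func over args in a pool; return (results, failed_indices)."""
    pool_cls = ProcessPoolExecutor if method == "Process" else ThreadPoolExecutor
    results, failed_indices = [], []
    with pool_cls(max_workers=max_workers) as pool:
        futures = [pool.submit(func, arg) for arg in args]
        # Walk futures in submission order so results follow the order of args.
        for index, future in enumerate(futures):
            try:
                value = future.result(timeout=timeout)
            except Exception:
                # Covers exceptions raised by func as well as timeouts.
                failed_indices.append(index)
                continue
            if return_value:
                results.append(value)
    return results, failed_indices


def reciprocal(x):
    return 1 / x


results, failed = run_parallel(reciprocal, [1, 2, 0, 4], return_value=True)
print(results, failed)  # [1.0, 0.5, 0.25] [2]
```

With `method="Process"` the function and its arguments must be picklable, and on platforms that spawn worker processes the call should sit under an `if __name__ == "__main__":` guard.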
read_gdb_layer(gdb_data, layer_name, columns=None, names=None)
Reads selected columns from a GDB layer and returns them in a pandas DataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
gdb_data | gdb | The GDB file opened by ogr. | required |
layer_name | str | The name of the layer to read. | required |
columns | list | List of column indices to read. If None, all columns are read. | None |
names | list | List of column names corresponding to the indices in | None |
|
Returns:
Type | Description |
---|---|
pd.DataFrame | The resulting dataframe. |