max_p_functions.py

lib.max_p_functions.correct_neighbors_in_shapefile(param, combined_file, existing_neighbors)

This function corrects the neighbor relations extracted from the shapefile. The neighbor detection used for max-p can miss adjacencies, so that some clusters that physically touch are not registered as neighbors. This function repairs those cases.

It creates a small buffer around each polygon. If the enlarged polygons intersect, and the area of the intersection exceeds a threshold, then the polygons are considered neighbors, and the dictionary of neighbors is updated.

Parameters
  • param (dict) – The dictionary of parameters including the coordinate reference system CRS and the resolution of input rasters res_desired.

  • combined_file (str) – The path to the shapefile of polygons to be clustered.

  • existing_neighbors (dict) – The dictionary of neighbors as extracted from the shapefile, before any correction.

Return neighbors_corrected

The dictionary of neighbors after correction (equivalent to an adjacency matrix).

Return type

dict
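
The buffer-and-intersect test described for correct_neighbors_in_shapefile can be sketched as follows. This is a minimal illustration, not the library's implementation: it assumes axis-aligned boxes instead of real shapefile polygons, and the names buffer_box, intersection_area, correct_neighbors, buffer_distance and min_area are hypothetical.

```python
from itertools import combinations


def buffer_box(box, distance):
    """Enlarge an axis-aligned box (minx, miny, maxx, maxy) on every side."""
    minx, miny, maxx, maxy = box
    return (minx - distance, miny - distance, maxx + distance, maxy + distance)


def intersection_area(a, b):
    """Area of overlap between two axis-aligned boxes (0.0 if disjoint)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    if width <= 0 or height <= 0:
        return 0.0
    return width * height


def correct_neighbors(boxes, existing_neighbors, buffer_distance, min_area):
    """Return a copy of existing_neighbors with missed adjacencies added."""
    corrected = {key: set(values) for key, values in existing_neighbors.items()}
    for i, j in combinations(sorted(boxes), 2):
        overlap = intersection_area(
            buffer_box(boxes[i], buffer_distance),
            buffer_box(boxes[j], buffer_distance),
        )
        # Only a sufficiently large overlap counts as a shared border.
        if overlap > min_area:
            corrected.setdefault(i, set()).add(j)
            corrected.setdefault(j, set()).add(i)
    return {key: sorted(values) for key, values in corrected.items()}


# Hypothetical data: boxes 0 and 1 are separated by a 0.01 sliver,
# box 2 is far away, and box 3 only meets box 1 at a corner.
boxes = {
    0: (0.0, 0.0, 1.0, 1.0),
    1: (1.01, 0.0, 2.0, 1.0),
    2: (5.0, 5.0, 6.0, 6.0),
    3: (2.0, 1.0, 3.0, 2.0),
}
existing = {0: [], 1: [], 2: [], 3: []}
corrected = correct_neighbors(boxes, existing, buffer_distance=0.05, min_area=0.05)
```

In this sketch the area threshold also keeps boxes that only meet at a corner (1 and 3) from being linked, because their buffered overlap is a tiny square rather than a strip along a shared edge.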

lib.max_p_functions.eq_solver(coef, ll_point, ul_point, ur_point)

This function is the solver used to find the coefficients A, B and C of the function that calculates the threshold.

Parameters
  • coef (dict) – The coefficients which are calculated.

  • ll_point (tuple(int, int)) – Coordinates of lower left point.

  • ul_point (tuple(int, int)) – Coordinates of upper left point.

  • ur_point (tuple(int, int)) – Coordinates of upper right point.

Return f

Coefficient values for A, B and C in a numpy array. A is f[0], B is f[1] and C is f[2].

Return type

numpy array

lib.max_p_functions.get_coefficients(paths)

This function gets the coefficients A, B and C for solving the 3 equations which will lead to the calculation of the threshold in the max-p algorithm.

Parameters

paths (dict) – The dictionary of paths including the one to non_empty_rasters.

Return coef

The coefficient values for A, B and C returned as a dictionary. The expected structure is similar to this dictionary: {'A': 0.55, 'B': 2.91, 'C': 0.61}.

Return type

dict

lib.max_p_functions.max_p_clustering(paths, param)

This function applies the max-p algorithm to the obtained polygons. Depending on the number of clusters in the whole map after k-means, it decides whether to run max-p clustering multiple times (for each part, then for the whole map) or once (for the whole map). If you already have results for each part, you can skip the per-part runs by setting use_results_of_maxp_parts to 1.

Parameters
  • paths (dict) – Dictionary of paths pointing to polygonized_clusters and max_p_combined.

  • param (dict) – Dictionary of parameters including max-p related parameters (maximum_number and use_results_of_maxp_parts), and, if applicable, the compression_ratio for the first round of max-p clustering.

Returns

This function returns nothing itself; the outputs are generated by the called functions max_p_parts and max_p_whole_map.

Return type

None
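
The decision described for max_p_clustering can be sketched as a small planning function. This is only an illustration under stated assumptions: the source does not give the exact comparison, so the rule "run per part only when the polygon count exceeds maximum_number" is assumed, and plan_max_p_runs and n_polygons are hypothetical names.

```python
def plan_max_p_runs(n_polygons, maximum_number, use_results_of_maxp_parts):
    """Return the ordered list of steps for the given settings."""
    if n_polygons <= maximum_number:
        # Assumed rule: few enough polygons, so one run on the whole map,
        # starting from the polygonized k-means clusters.
        return ["max_p_whole_map(polygonized_clusters)"]
    steps = []
    if not use_results_of_maxp_parts:
        # No stored per-part results: cluster each part first.
        steps.append("max_p_parts")
    # Second round on the combined per-part results.
    steps.append("max_p_whole_map(max_p_combined)")
    return steps


small_map = plan_max_p_runs(50, 100, 0)
large_map = plan_max_p_runs(500, 100, 0)
large_map_cached = plan_max_p_runs(500, 100, 1)
```

The returned strings only name the steps; they stand in for the actual calls, which need the paths and param dictionaries described above.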

lib.max_p_functions.max_p_parts(paths, param)

This function applies the max-p algorithm on each part. It identifies the neighbors from the shapefile of polygons for that part. If there are disconnected parts, it assumes that they are neighbors with the closest polygon.

The max-p algorithm aggregates polygons to a maximum number of regions that fulfil a certain condition. That condition is formulated as a minimum share of the sum of data values, thr. The threshold is set differently for each part, so that the largest and most diverse part keeps a large number of polygons, and the smallest and least diverse is aggregated into one region. This is determined by the function get_coefficients.

After the clusters are assigned to the polygons, the polygons are dissolved by cluster, and the values of each property are aggregated using the aggregation functions of the inputs, stored in agg.

Parameters
  • paths (dict) – Dictionary containing the paths to the folder of inputs and polygons, to the CSV non_empty_rasters, to the output folder parts_max_p and to the output file max_p_combined.

  • param (dict) – Dictionary of parameters containing the raster_names and their weights and aggregation methods agg, the compression_ratio of the polygonized k-means clusters, and the CRS to be used for the shapefiles.

Returns

The results of the clustering are shapefiles for each part, saved in the folder parts_max_p, and a combination of these for the whole map max_p_combined.

Return type

None
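
The handling of disconnected parts in max_p_parts (linking an island to its closest polygon) can be sketched as follows. This is a simplified illustration, not the library's code: it assumes distances are measured between centroids, and connected_components, connect_islands and the sample data are hypothetical.

```python
from math import dist


def connected_components(neighbors):
    """Group polygon ids into connected components of the neighbor graph."""
    seen, components = set(), []
    for start in neighbors:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node])
        seen |= component
        components.append(component)
    return components


def connect_islands(neighbors, centroids):
    """Link each isolated component to the closest polygon outside it."""
    linked = {key: set(values) for key, values in neighbors.items()}
    while True:
        components = connected_components(linked)
        if len(components) <= 1:
            return {key: sorted(values) for key, values in linked.items()}
        island = min(components, key=len)
        outside = set(linked) - island
        # Closest pair (island polygon, outside polygon) by centroid distance.
        a, b = min(
            ((i, j) for i in island for j in outside),
            key=lambda pair: dist(centroids[pair[0]], centroids[pair[1]]),
        )
        linked[a].add(b)
        linked[b].add(a)


# Hypothetical data: polygons 0-1 and 2-3 form two separate groups.
neighbors = {0: [1], 1: [0], 2: [3], 3: [2]}
centroids = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (5.0, 0.0), 3: (6.0, 0.0)}
connected = connect_islands(neighbors, centroids)
```

Here polygons 1 and 2 are the closest pair across the two groups, so they become neighbors and the graph ends up as a single connected component.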

lib.max_p_functions.max_p_whole_map(paths, param, combined_file)

This function runs the max-p algorithm for the whole map, either on the results obtained from max_p_parts, or on those obtained from polygonize_after_k_means, depending on the number of polygons after k-means clustering.

It identifies the neighbors from the shapefile of polygons. If there are disconnected components (islands of polygons), it assumes that they are neighbors with the closest polygon. It also checks that the neighbors were identified properly and, if necessary, corrects them using correct_neighbors_in_shapefile.

The max-p algorithm aggregates polygons to a maximum number of regions that fulfil a certain condition. That condition is formulated as a minimum share of the sum of data values, thr. The threshold for the whole map is set as a function of the number of polygons before clustering, and the desired number of polygons at the end. However, that number may not be matched exactly. The user may wish to adjust the threshold manually until the desired number is reached (increase the threshold to reduce the number of regions, and vice versa).

After the clusters are assigned to the polygons, the polygons are dissolved by cluster, and the values of each property are aggregated using the aggregation functions of the inputs, stored in agg.

Parameters
  • paths (dict) – Dictionary containing the paths to the folder of inputs and to the output file output.

  • param (dict) – Dictionary of parameters containing the raster_names and their weights and aggregation methods agg, the desired number of features at the end final_number, and the CRS to be used for the shapefiles.

  • combined_file (str) – Path to the shapefile to use as input. It is either the result obtained from max_p_parts, or the one obtained from polygonize_after_k_means.

Returns

The result of the clustering is one shapefile for the whole map saved directly in output.

Return type

None
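
The effect of the threshold thr on the number of regions can be illustrated with a toy example. This is not the max-p algorithm: it is a one-dimensional greedy stand-in that groups consecutive values until each group holds at least the share thr of the total. The function greedy_regions and the sample values are hypothetical.

```python
def greedy_regions(values, thr):
    """Group consecutive values so each region holds >= thr of the total."""
    floor_value = thr * sum(values)
    regions, current, current_sum = [], [], 0.0
    for index, value in enumerate(values):
        current.append(index)
        current_sum += value
        if current_sum >= floor_value:
            # The region satisfies the minimum share: close it.
            regions.append(current)
            current, current_sum = [], 0.0
    if current:
        # Leftover polygons cannot form a valid region on their own,
        # so they are merged into the last region.
        if regions:
            regions[-1].extend(current)
        else:
            regions.append(current)
    return regions


values = [4, 1, 3, 2, 5, 1, 2, 2]  # hypothetical data values, total 20
counts = {thr: len(greedy_regions(values, thr)) for thr in (0.1, 0.25, 0.5)}
```

For these sample values the sketch yields 6, 4 and 2 regions for thresholds of 0.1, 0.25 and 0.5. With positive data values, a minimum share of thr per region also caps the number of regions at 1/thr, which is why raising the threshold reduces the number of regions.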