hdx.scraper.geonode.geonodetohdx
GeoNode Utilities:
Reads from GeoNode servers and creates datasets.
create_dataset_showcase
def create_dataset_showcase(dataset: Dataset, showcase: Showcase,
**kwargs: Any) -> None
Create dataset and showcase
Arguments:
dataset
Dataset - Dataset to create
showcase
Showcase - Showcase to create
**kwargs
- Args to pass to dataset create_in_hdx call
Returns:
None
delete_from_hdx
def delete_from_hdx(dataset: Dataset) -> None
Delete dataset and any associated showcases
Arguments:
dataset
Dataset - Dataset to delete
Returns:
None
GeoNodeToHDX Objects
class GeoNodeToHDX()
Utilities to bring GeoNode data into HDX. hdx_geonode_config_yaml points to a YAML file that overrides base values and is in this format:
ignore_data:
  - deprecated
category_mapping:
  Elevation: 'elevation - topography - altitude'
  'Inland Waters': river
titleabstract_mapping:
  bridges:
    - bridges
    - transportation
    - 'facilities and infrastructure'
  idp:
    camp:
      - 'displaced persons locations - camps - shelters'
      - 'internally displaced persons - idp'
    else:
      - 'internally displaced persons - idp'
Arguments:
geonode_url
str - GeoNode server URL
downloader
Download - Download object from HDX Python Utilities
hdx_geonode_config_yaml
Optional[str] - Configuration file for scraper
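To make the titleabstract_mapping part of the configuration format concrete, here is a minimal sketch of how such a nested mapping could be turned into tags. The helper tags_for_text is hypothetical, and the matching rules (case-insensitive substring match, with the else entry used when no nested term matches) are assumptions made for illustration; the library's actual logic may differ.

```python
from typing import Dict, List, Union

# Same structure as the titleabstract_mapping example in the class description.
TITLEABSTRACT_MAPPING: Dict[str, Union[Dict, List]] = {
    "bridges": ["bridges", "transportation", "facilities and infrastructure"],
    "idp": {
        "camp": [
            "displaced persons locations - camps - shelters",
            "internally displaced persons - idp",
        ],
        "else": ["internally displaced persons - idp"],
    },
}


def tags_for_text(text: str, mapping: Dict[str, Union[Dict, List]]) -> List[str]:
    """Hypothetical helper: collect tags for terms found in a title or abstract."""
    lowered = text.lower()
    tags: List[str] = []
    for term, value in mapping.items():
        if term not in lowered:
            continue
        if isinstance(value, list):
            candidates = value
        else:
            # Nested mapping: use the first nested term that also occurs,
            # falling back to the "else" entry (assumed semantics).
            candidates = value.get("else", [])
            for subterm, subtags in value.items():
                if subterm != "else" and subterm in lowered:
                    candidates = subtags
                    break
        for tag in candidates:
            if tag not in tags:
                tags.append(tag)
    return tags
```

Under these assumptions, a title mentioning both "idp" and "camp" picks up the two camp tags, while a title mentioning only "idp" falls through to the else entry.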
get_ignore_data
def get_ignore_data() -> List[str]
Get terms in the abstract that mean that the dataset should not be added to HDX
Returns:
List[str]
- List of terms in the abstract that mean that the dataset should not be added to HDX
get_category_mapping
def get_category_mapping() -> Dict[str, str]
Get mappings from the category field category__gn_description to HDX metadata tags
Returns:
Dict[str,str]
- Dictionary of mappings from the category field category__gn_description to HDX metadata tags
get_titleabstract_mapping
def get_titleabstract_mapping() -> Dict[str, Union[Dict, List]]
Get mappings from terms in the title or abstract to HDX metadata tags
Returns:
Dict[str,Union[Dict,List]]
- Dictionary of mappings from terms in the title or abstract to HDX metadata tags
get_countries
def get_countries(use_count: bool = True) -> List[Dict]
Get countries from GeoNode
Arguments:
use_count
bool - Whether to use null count metadata to exclude countries. Defaults to True.
Returns:
List[Dict]
- List of countries in form (iso3 code, name)
get_layers
def get_layers(countryiso: Optional[str] = None) -> List[Dict]
Get layers from GeoNode optionally for a particular country
Arguments:
countryiso
Optional[str] - ISO 3 code of country from which to get layers. Defaults to None (all countries).
Returns:
List[Dict]
- List of layers
get_orgname
@staticmethod
def get_orgname(metadata: Dict, orgclass: Type = Organization) -> str
Get orgname from the metadata dictionary if available; otherwise use orgid from the dictionary to look up the organisation name
Arguments:
metadata
Dict - Dictionary containing keys: maintainerid, orgid, updatefreq, subnational
orgclass
Type - Class to use for look up. Defaults to Organization.
Returns:
str
- Organisation name
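The orgclass parameter makes the lookup replaceable, which, for example, lets a test avoid contacting HDX. The sketch below mirrors the documented behaviour with stand-in code. Both resolve_orgname and FakeOrganization are hypothetical. It is also an assumption, not confirmed by this reference, that the name is stored under an orgname key and that the lookup class exposes a read_from_hdx class method returning a dict-like object with a name field.

```python
from typing import Dict, Type


class FakeOrganization:
    """Stand-in lookup class (hypothetical) that avoids any network call."""

    _names = {"org-123": "Example Organisation"}

    @classmethod
    def read_from_hdx(cls, identifier: str) -> Dict[str, str]:
        # Assumed interface: return a dict-like record for the given id.
        return {"id": identifier, "name": cls._names[identifier]}


def resolve_orgname(metadata: Dict, orgclass: Type = FakeOrganization) -> str:
    """Return metadata['orgname'] if set, else look the name up by orgid."""
    orgname = metadata.get("orgname")
    if not orgname:
        organisation = orgclass.read_from_hdx(metadata["orgid"])
        orgname = organisation["name"]
    return orgname
```

With this shape, a caller that already knows the organisation name pays no lookup cost, and a caller that only has orgid still gets a name back.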
generate_dataset_and_showcase
def generate_dataset_and_showcase(
countryiso: str,
layer: Dict,
metadata: Dict,
get_date_from_title: bool = False,
process_dataset_name: Callable[[str], str] = lambda x: x,
dataset_codlevel_mapping: Dict[str, List] = dict(),
dataset_tags_mapping: Dict[str, List] = dict()
) -> Tuple[Optional[Dataset], Optional[List], Optional[Showcase]]
Generate dataset and showcase for GeoNode layer
Arguments:
countryiso
str - ISO 3 code of country
layer
Dict - Data about layer from GeoNode
metadata
Dict - Dictionary containing keys: maintainerid, orgid, updatefreq, subnational
get_date_from_title
bool - Whether to remove dates from title. Defaults to False.
process_dataset_name
Callable[[str], str] - Function to change the dataset name. Defaults to lambda x: x.
dataset_codlevel_mapping
Dict[str, List] - Mapping from dataset name to cod levels. Defaults to empty dictionary.
dataset_tags_mapping
Dict[str, List] - Mapping from dataset name to additional tags. Defaults to empty dictionary.
Returns:
Tuple[Optional[Dataset],Optional[List],Optional[Showcase]]
- Dataset, date ranges found in the dataset title, and Showcase, or (None, None, None)
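process_dataset_name accepts any function from str to str. As an illustration, the hypothetical function below shortens names by removing a fixed prefix. The prefix, the sample dataset name, and the tags are made up for the example.

```python
from typing import Dict, List


def strip_geonode_prefix(name: str) -> str:
    """Hypothetical name processor: drop a leading 'geonode-' if present."""
    prefix = "geonode-"
    return name[len(prefix):] if name.startswith(prefix) else name


# Hypothetical mapping from dataset name to additional tags.
extra_tags: Dict[str, List[str]] = {"example-roads": ["roads", "transportation"]}
```

These would be passed as process_dataset_name=strip_geonode_prefix and dataset_tags_mapping=extra_tags. Whether the mapping keys are matched against the name before or after processing is not stated in this reference, so check that against the library before relying on it.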
generate_datasets_and_showcases
def generate_datasets_and_showcases(
metadata: Dict,
create_dataset_showcase: Callable[[Dataset, Showcase, Any],
None] = create_dataset_showcase,
use_count: bool = True,
countrydata: Dict[str, Optional[str]] = None,
get_date_from_title: bool = False,
process_dataset_name: Callable[[str], str] = lambda x: x,
dataset_codlevel_mapping: Dict[str, List] = dict(),
dataset_tags_mapping: Dict[str, List] = dict(),
**kwargs: Any) -> List[str]
Generate datasets and showcases for all GeoNode layers
Arguments:
metadata
Dict - Dictionary containing keys: maintainerid, orgid, updatefreq, subnational
create_dataset_showcase
Callable[[Dataset, Showcase, Any], None] - Function to call to create dataset and showcase
use_count
bool - Whether to use null count metadata to exclude countries. Defaults to True.
countrydata
Dict[str, Optional[str]] - Dictionary of countrydata. Defaults to None (read from GeoNode).
get_date_from_title
bool - Whether to remove dates from title. Defaults to False.
process_dataset_name
Callable[[str], str] - Function to change the dataset name. Defaults to lambda x: x.
dataset_codlevel_mapping
Dict[str, List] - Mapping from dataset name to cod levels. Defaults to empty dictionary.
dataset_tags_mapping
Dict[str, List] - Mapping from dataset name to additional tags. Defaults to empty dictionary.
**kwargs
- Args to pass to dataset create_in_hdx call
Returns:
List[str]
- List of names of datasets added or updated
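Because create_dataset_showcase is a parameter, the step that writes to HDX can be swapped out. A minimal sketch of a dry-run replacement follows. make_dry_run_creator is hypothetical, plain dicts with a name key stand in for Dataset and Showcase objects so the example is self-contained, and example_option is an arbitrary keyword argument used only for illustration.

```python
from typing import Any, Callable, Dict, List, Tuple

Call = Tuple[str, str, Dict[str, Any]]


def make_dry_run_creator() -> Tuple[Callable[..., None], List[Call]]:
    """Return a create function that records calls instead of touching HDX."""
    calls: List[Call] = []

    def dry_run_create(dataset: Any, showcase: Any, **kwargs: Any) -> None:
        # Record the dataset name, showcase name and forwarded keyword args.
        calls.append((dataset["name"], showcase["name"], dict(kwargs)))

    return dry_run_create, calls


create, recorded = make_dry_run_creator()
create(
    {"name": "example-roads"},
    {"name": "example-roads-showcase"},
    example_option=True,
)
```

A function of this shape could be supplied as the create_dataset_showcase argument to inspect what would be created before running against a live HDX site.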
delete_other_datasets
def delete_other_datasets(
datasets_to_keep: List[str],
metadata: Dict,
delete_from_hdx: Callable[[Dataset], None] = delete_from_hdx) -> None
Delete all GeoNode datasets and associated showcases in HDX where layers have been deleted from the GeoNode server.
Arguments:
datasets_to_keep
List[str] - List of dataset names that are to be kept (they were added or updated)
metadata
Dict - Dictionary containing keys: maintainerid, orgid, updatefreq, subnational
delete_from_hdx
Callable[[Dataset], None] - Function to call to delete dataset
Returns:
None
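The clean-up step amounts to a set difference: the GeoNode datasets currently in HDX, minus those that were just added or updated. The sketch below shows that idea with stand-in data. names_to_delete, delete_others, and the hard-coded list of existing names are hypothetical, and how the library actually finds existing datasets is not described in this reference.

```python
from typing import Callable, Iterable, List


def names_to_delete(existing: Iterable[str], datasets_to_keep: Iterable[str]) -> List[str]:
    """Return, in sorted order, the existing names that are not being kept."""
    keep = set(datasets_to_keep)
    return sorted(name for name in existing if name not in keep)


def delete_others(
    existing: Iterable[str],
    datasets_to_keep: Iterable[str],
    delete: Callable[[str], None],
) -> None:
    """Call the injected delete function once per dataset that should go."""
    for name in names_to_delete(existing, datasets_to_keep):
        delete(name)


deleted: List[str] = []
delete_others(
    existing=["example-roads", "example-rivers", "example-old-layer"],
    datasets_to_keep=["example-roads", "example-rivers"],
    delete=deleted.append,  # stand-in for a real delete function
)
```

In the documented workflow, the list returned by generate_datasets_and_showcases (names of datasets added or updated) is presumably what gets passed as datasets_to_keep.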