HerdingCATs

A project to speed up how data analysts explore and interact with open data sources.

uv add HerdingCats

Explore Open Data

Navigate the open data ecosystem with support for CKAN, OpenDataSoft, and other bespoke data catalogue APIs.

Find the Data You Need

Search across multiple data catalogues with a unified interface. Access datasets from government portals, energy providers, and humanitarian sources.

Transform & Load Data

Convert open datasets to Pandas or Polars DataFrames, or load directly to cloud storage for further analysis with minimal effort.

Try it Yourself

Getting started with HerdingCATs is simple. The complete example below looks up a dataset on the London Datastore, extracts its resource URLs, and uploads one resource to AWS S3.

example.py
import HerdingCats as hc

def main():
    # Create a session with a predefined catalogue
    with hc.CatSession(hc.CkanDataCatalogues.LONDON_DATA_STORE) as session:
        # Create an explorer for the catalogue
        explorer = hc.CkanCatExplorer(session)

        # Create a data loader
        data_loader = hc.CkanLoader()

        # Show package info to get more details
        package = explorer.show_package_info("use-of-force")

        # Extract the resource URLs
        extracted_data = explorer.extract_resource_url(package)

        # Take the 8th resource from the list
        data_to_load = extracted_data[7]

        # Upload the data to AWS S3
        # This uses the "raw" format; you can specify "parquet" instead
        data_loader.upload_data(
            data_to_load,
            "your-bucket-name",
            "your-custom-name",
            "raw",
            "s3"
        )

if __name__ == "__main__":
    main()
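
For readers curious what the package lookup and resource extraction steps correspond to on a CKAN portal, here is a minimal standalone sketch. It is not HerdingCATs' implementation. It only assumes the standard CKAN Action API, where `package_show` returns a package whose `resources` each carry a `url` and a `format`. The base URL, the helper names (`package_show_url`, `resource_urls`, `pick_by_format`), and the sample payload are illustrative assumptions, and nothing is fetched over the network.

```python
import json
from urllib.parse import urlencode


def package_show_url(base_url: str, package_id: str) -> str:
    """Build the CKAN Action API URL that returns one package's metadata."""
    query = urlencode({"id": package_id})
    return f"{base_url.rstrip('/')}/api/3/action/package_show?{query}"


def resource_urls(payload: dict) -> list[tuple[str, str, str]]:
    """Return (name, format, url) for each resource in a package_show response."""
    if not payload.get("success"):
        raise ValueError("CKAN reported an unsuccessful request")
    return [
        (res.get("name", ""), res.get("format", "").lower(), res["url"])
        for res in payload["result"].get("resources", [])
    ]


def pick_by_format(resources: list[tuple[str, str, str]], wanted: str) -> list[tuple[str, str, str]]:
    """Select resources by file format instead of by list position."""
    return [res for res in resources if res[1] == wanted.lower()]


# Hypothetical, heavily trimmed response body (not real London Datastore data).
SAMPLE_RESPONSE = json.loads("""
{
  "success": true,
  "result": {
    "name": "use-of-force",
    "resources": [
      {"name": "Sample A", "format": "CSV", "url": "https://example.org/a.csv"},
      {"name": "Sample B", "format": "XLSX", "url": "https://example.org/b.xlsx"}
    ]
  }
}
""")

if __name__ == "__main__":
    # Assumed base URL for the portal; adjust for the catalogue you use.
    print(package_show_url("https://data.london.gov.uk", "use-of-force"))
    for name, fmt, url in pick_by_format(resource_urls(SAMPLE_RESPONSE), "csv"):
        print(name, fmt, url)
```

Selecting by format, as `pick_by_format` does, is one way to avoid relying on a fixed list position such as `extracted_data[7]`, which can shift when the publisher adds or reorders resources.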