CKAN Extension Data Extractor

Keitaro
Feb 26, 2024

Ckanext-dataextractor is a powerful extension for CKAN that streamlines data extraction, manipulation, and integration processes. With customizable configurations, comprehensive documentation, and support resources, users can efficiently retrieve and refine datasets hosted on CKAN instances, enabling seamless integration into their data workflows. This extension empowers organizations to unlock valuable insights and make informed decisions based on the data extracted from CKAN.

This CKAN extension offers a simple installation process, and we will guide you through the steps to set it up and start leveraging its powerful data extraction capabilities within your CKAN environment.

Let’s start!

Requirements

The supported CKAN versions are not listed for this extension. The configuration settings below also assume access to an Azure storage account and container.

Installation

To install ckanext-dataextractor:

Activate your CKAN virtual environment, for example:

. /usr/lib/ckan/default/bin/activate

Install the ckanext-dataextractor Python package into your virtual environment:

pip install ckanext-dataextractor

Add dataextractor to the ckan.plugins setting in your CKAN config file (by default the config file is located at /etc/ckan/default/production.ini).

Restart CKAN. For example, if you’ve deployed CKAN with Apache on Ubuntu:

sudo service apache2 reload
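
The plugin step above (adding dataextractor to ckan.plugins) can also be scripted. The following is a minimal sketch, not part of the extension: the helper name enable_plugin is hypothetical, and it assumes the ckan.plugins value sits on a single line of the config file, so it does not handle values continued across several lines.

```python
def enable_plugin(ini_text, plugin="dataextractor"):
    """Return ini_text with `plugin` appended to the ckan.plugins line.

    Hypothetical helper: edits only the first single-line ckan.plugins
    entry and leaves every other line (including comments) untouched.
    """
    lines = ini_text.splitlines()
    for index, line in enumerate(lines):
        key, separator, value = line.partition("=")
        if separator and key.strip() == "ckan.plugins":
            plugins = value.split()
            if plugin not in plugins:  # keep the edit idempotent
                plugins.append(plugin)
            lines[index] = "ckan.plugins = " + " ".join(plugins)
            return "\n".join(lines) + "\n"
    raise ValueError("no ckan.plugins line found")


sample = "[app:main]\nckan.plugins = stats datastore\n"
print(enable_plugin(sample))
```

Running the helper twice on the same text leaves it unchanged the second time, which makes it safe to use in a provisioning script.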

Config Settings

Add Azure storage account settings:

ckanext.dataextractor.azure_storage_account_name = ...
ckanext.dataextractor.azure_storage_account_key = ...
ckanext.dataextractor.azure_storage_container_name = ...

Set the blob expiration period, in days:

ckanext.dataextractor.blob_expiration_days = ...

Set the maximum number of resource rows shown per page (defaults to 10):

ckanext.dataextractor.resource_rows_limit = ...

Set the maximum number of pagination pages shown (defaults to 6):

ckanext.dataextractor.pagination_limit = ...

Limit the number of records shown when using the datastore_resource_search action (defaults to 10000):

ckanext.dataextractor.default_search_limit = ...

Set the query timeout limit (in milliseconds) for the datastore read-only account (defaults to 60000):

ckanext.dataextractor.query_timeout = ...

Override default search limit and retrieve/download all data for a given resource (defaults to False):

ckanext.dataextractor.enable_full_download = ...

Change the datastore root URL shown in the examples in the Data API window:

ckanext.dataextractor.datastore_root_url = ...
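
To see the documented defaults in one place, here is a small sketch that renders them as config lines. The names DOCUMENTED_DEFAULTS and render_settings are hypothetical, the values are copied from the settings list above, and "default max" is read as the default value, which is an assumption. Settings with no stated default (the Azure credentials, blob expiration, and root URL) are left out.

```python
# Defaults as stated in the settings list above; nothing here is read
# from the extension itself.
DOCUMENTED_DEFAULTS = {
    "ckanext.dataextractor.resource_rows_limit": 10,
    "ckanext.dataextractor.pagination_limit": 6,
    "ckanext.dataextractor.default_search_limit": 10000,
    "ckanext.dataextractor.query_timeout": 60000,  # milliseconds
    "ckanext.dataextractor.enable_full_download": False,
}


def render_settings(overrides=None):
    """Return config lines for the documented defaults plus any overrides."""
    merged = {**DOCUMENTED_DEFAULTS, **(overrides or {})}
    return "\n".join(f"{key} = {value}" for key, value in merged.items())


# Example: halve the query timeout and keep every other default.
print(render_settings({"ckanext.dataextractor.query_timeout": 30000}))
```

The printed lines can be pasted into the CKAN config file next to the Azure settings.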

Development Installation

To install ckanext-dataextractor for development, activate your CKAN virtualenv and run:

git clone https://github.com/viderumglobal/ckanext-dataextractor.git
cd ckanext-dataextractor
python setup.py develop
pip install -r dev-requirements.txt

Documentation

To view the documentation for all API actions, open documentation/index.html.
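
As a rough illustration of calling one of these actions, the sketch below builds a request for the datastore_resource_search action mentioned in the config settings. It relies on CKAN's standard Action API route (/api/3/action/<name>). The site URL, the helper name build_action_request, and the resource_id and limit parameters are assumptions modelled on CKAN's core datastore_search, so check the generated documentation for the real parameter names. The request is only built, not sent.

```python
import json
import urllib.request


def build_action_request(ckan_url, action, payload):
    """Build (but do not send) a JSON POST request for a CKAN action."""
    url = f"{ckan_url.rstrip('/')}/api/3/action/{action}"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_action_request(
    "https://ckan.example.org",          # hypothetical CKAN site
    "datastore_resource_search",         # action named in the settings
    {"resource_id": "<resource-id>", "limit": 100},  # assumed parameters
)
print(request.full_url)
# To send it: urllib.request.urlopen(request)
```

Private datasets would also need an Authorization header carrying a CKAN API token.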

If you want to update or rebuild the documentation, see the guide for writing documentation.

Running the Tests

To run the tests, do:

nosetests --nologcapture --with-pylons=test.ini

To run the tests and produce a coverage report, first make sure you have coverage installed in your virtualenv (pip install coverage), then run:

nosetests --nologcapture --with-pylons=test.ini --with-coverage --cover-package=ckanext.dataextractor --cover-inclusive --cover-erase --cover-tests

Registering ckanext-dataextractor on PyPI

ckanext-dataextractor should be available on PyPI at https://pypi.python.org/pypi/ckanext-dataextractor. If that link doesn’t work, you can register the project on PyPI for the first time by following these steps:

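  • Create a source distribution of the project:
python setup.py sdist
  • Upload the source distribution to PyPI. The first upload registers the project, so the older python setup.py register step is no longer needed, and twine (pip install twine) replaces the deprecated python setup.py upload command:
twine upload dist/*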
  • Tag the first release of the project on GitHub with the version number from the setup.py file. For example, if the version number in setup.py is 0.0.1, run:
git tag 0.0.1
git push --tags

Releasing a New Version of ckanext-dataextractor

ckanext-dataextractor is available on PyPI at https://pypi.python.org/pypi/ckanext-dataextractor. To publish a new version to PyPI, follow these steps:

  • Update the version number in the setup.py file. See PEP 440 for how to choose version numbers.
  • Create a source distribution of the new version:
python setup.py sdist
  • Upload the source distribution to PyPI with twine (pip install twine), which replaces the deprecated python setup.py upload command:
twine upload dist/*
  • Tag the new release of the project on GitHub with the version number from the setup.py file. For example, if the version number in setup.py is 0.0.2, run:
git tag 0.0.2
git push --tags
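
For the version bump in the first step, a tiny sketch of incrementing the patch number is shown below. The helper name bump_patch is hypothetical, and it handles only plain MAJOR.MINOR.PATCH strings such as the 0.0.1 and 0.0.2 used above, not the full range of versions that PEP 440 allows (pre-releases, post-releases, and so on).

```python
def bump_patch(version):
    """Return `version` with its last numeric component increased by one.

    Hypothetical helper for simple three-part versions only.
    """
    major, minor, patch = (int(part) for part in version.split("."))
    return f"{major}.{minor}.{patch + 1}"


print(bump_patch("0.0.1"))  # 0.0.2
```

The result is the string to write into setup.py and to pass to git tag.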

Conclusion

Ckanext-dataextractor simplifies data extraction from CKAN, enabling users to efficiently retrieve and manipulate datasets. With its seamless integration, customizable configurations, and support resources, ckanext-dataextractor helps organizations unlock their CKAN-hosted data’s full potential.

Keitaro

Keitaro is a Linux and Open-source software company that develops solutions empowering organizations and enterprises around the world.