Integrating Webstore and CKAN

CKAN can be configured to automatically upload certain datasets into Webstore so that they can take advantage of Webstore's functionality. Datasets are selected for upload based on the content type of the data; the primary expected use is for CSV and Excel files.

How does it work

File uploads were previously handled by the ckanext-storage extension, which used OFS to pass uploaded files to the relevant storage engine. This functionality has now been integrated into CKAN core. File uploads are configured in nearly the same way, but the ckanext-storage extension is no longer needed for uploads that are stored locally using pairtree (which has been added to the CKAN requirements files).

To enable the core file upload functionality, it is only necessary to add the following two settings to the CKAN ini file:

 ckan.storage.bucket = ckan   
 ckan.storage.directory = /tmp   

ckan.storage.directory should be an existing folder that will be used for storing the uploaded files; CKAN will ensure that it is marked as a valid pairtree repository. Once a file has been uploaded using the new StorageController, ckanext-archiver will run as an asynchronous task to archive the file. As part of this archiving it checks whether the file should be uploaded to webstore, using both the content type of the file and the archiver settings. The content type is determined by the archiver when it downloads the file.
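
The decision described above can be sketched as follows. This is a minimal illustration, not the ckanext-archiver code: the function name, its parameters, and the exact set of content types are assumptions chosen to match the CSV and Excel use case.

```python
# Hypothetical helper; the name and the content-type list are assumptions,
# not part of the ckanext-archiver API.
WEBSTORE_CONTENT_TYPES = {
    "text/csv",
    "application/vnd.ms-excel",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
}


def should_upload_to_webstore(content_type, upload_enabled):
    """Return True when the archived file looks suitable for webstore."""
    # The archiver setting acts as a global on/off switch.
    if not upload_enabled or not content_type:
        return False
    # Drop parameters such as "; charset=utf-8" before comparing.
    base_type = content_type.split(";", 1)[0].strip().lower()
    return base_type in WEBSTORE_CONTENT_TYPES
```

For example, should_upload_to_webstore("text/csv; charset=utf-8", True) passes the check, while a PDF or a disabled UPLOAD_TO_WEBSTORE setting does not.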

The archiver settings specify whether uploading to webstore is on or off, where the webstore is, and whether retries are enabled. It is best to leave retries disabled during development.

The uploaded resource will be stored in webstore at the url /{user}/{resource-id}/data, unless it is an Excel file, in which case one table will be created for each sheet that contains data. The user in the url is the username of the person doing the upload, and the resource-id is the id of the resource that contains the original file. When the resource is updated, the webstore-specific data is stored in its webstore_url and webstore_last_updated properties.
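
As an illustration of the url layout, the sketch below builds the address for a resource's table. The function name and the table parameter are hypothetical; only the /{user}/{resource-id}/data pattern comes from the description above, and how tables are named for individual Excel sheets is not specified here.

```python
from urllib.parse import quote


def webstore_table_url(webstore_url, user, resource_id, table="data"):
    """Build {webstore_url}/{user}/{resource-id}/{table} (hypothetical helper)."""
    # Percent-encode each path segment so unusual usernames stay valid.
    parts = [quote(part, safe="") for part in (user, resource_id, table)]
    return webstore_url.rstrip("/") + "/" + "/".join(parts)


url = webstore_table_url("http://localhost:50002", "alice", "abc-123")
```

Here "alice" and "abc-123" are placeholder values; the host matches the example WEBSTORE_URL used elsewhere on this page.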


Configuring CKAN and ckanext-archiver

CKAN's .ini file should contain two new values: the bucket name and the directory where files will be stored. The directory is the more important of the two, because uploads will not be enabled if ckan.storage.directory is not set.

 ckan.storage.bucket = ckan 
 ckan.storage.directory = /tmp     
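
A quick way to sanity-check these two values before starting CKAN is sketched below. It is not part of CKAN: the function name is hypothetical, it assumes the settings live in the [app:main] section of the ini file, and it treats a missing or non-existent directory as "upload not enabled".

```python
import configparser
import os


def storage_settings(ini_text, section="app:main"):
    """Read the two storage settings from ini text (hypothetical helper)."""
    # Interpolation is disabled so values containing '%' are read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    bucket = parser.get(section, "ckan.storage.bucket", fallback=None)
    directory = parser.get(section, "ckan.storage.directory", fallback=None)
    # Assumption: uploads only make sense if the directory is set and exists.
    enabled = bool(directory) and os.path.isdir(directory)
    return {"bucket": bucket, "directory": directory, "upload_enabled": enabled}
```

Calling storage_settings(open("development.ini").read()) would report whether the configured directory is usable; the file name here is only an example.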

The ckanext-archiver must also be configured in its settings.py to specify whether uploading to webstore is enabled, where the webstore is, and whether to retry failed uploads at a later time.

 UPLOAD_TO_WEBSTORE = True 
 WEBSTORE_URL = "http://localhost:50002" # The http(s) and host part of the url 
 RETRIES = False   

Configuring webstore

The only changes required to webstore are configuration changes to support authorisation via CKAN. These should be added to your settings.py and should be as follows:

 AUTH_FUNCTION = 'ckan'   
 CKAN_DB_URI = 'postgresql://user@host/database'