Managing Datasets

Do I just register a dataset, or can I upload and store data online?

CKAN now allows you both to register links to datasets and to store data directly online. For more information, please see this blog post.

I've spotted some spam on CKAN

This happens occasionally. To alert the CKAN admins to a dataset that is spam or has been spammed, edit the dataset and add the tag meta.spam.

A CKAN system administrator can delete the spam revisions from the history page (/revision) and then purge them at /ckan-admin/trash, which is provided by the admin extension.

How Do I Install the CKAN Bookmarklet?

Want to automate creating a dataset from the information in a webpage? You can use the CKAN bookmarklet. It automatically extracts the page's URL, title and description (if the page provides one) and sends them to the new dataset form, where you can edit them further.

To install the bookmarklet just create a new bookmark with the following text as the location:

javascript:(function(){f='http://thedatahub.org/dataset/new?url='+encodeURIComponent(window.location.href)+'&title='+encodeURIComponent(document.title);if((n=document.getElementsByName('description')[0])&&(d=n.content)){f+='&notes='+encodeURIComponent(d);}a=function(){if(!window.open(f)){location.href=f;}};if(/Firefox/.test(navigator.userAgent)){setTimeout(a,0)}else{a()}})()

You could title the bookmark 'Add to CKAN'.

For more details on installing bookmarklets see the del.icio.us help page.
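
The bookmarklet above is minified, so it may help to see the URL it builds written out in readable form. The following is only a sketch: the base URL and the url, title and notes parameters are copied from the bookmarklet, while the helper name buildNewDatasetUrl and the example page values are assumptions made for illustration.

```javascript
// Readable sketch of the URL that the bookmarklet constructs.
// buildNewDatasetUrl is a hypothetical helper name, not part of CKAN.
function buildNewDatasetUrl(pageUrl, pageTitle, description) {
  // Base form URL and parameter names are taken from the bookmarklet.
  let target = 'http://thedatahub.org/dataset/new?url=' +
    encodeURIComponent(pageUrl) +
    '&title=' + encodeURIComponent(pageTitle);

  // The bookmarklet only adds "notes" when the page has a non-empty
  // <meta name="description"> element; mirror that here.
  if (description) {
    target += '&notes=' + encodeURIComponent(description);
  }
  return target;
}

// Example values are made up for illustration.
console.log(buildNewDatasetUrl('http://example.org/data', 'Example Data', 'A sample page'));
```

In the real bookmarklet the three values come from window.location.href, document.title and the page's description meta element, and the resulting URL is opened in a new window.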

What's the Difference Between Deletion and Purging?

'Deleting' a dataset is not permanent (it can be resurrected), whereas 'purging' a dataset loses it forever. The recommended policy is to delete datasets that are wrongly added or duplicates, and purge datasets which are spam.

How Do I Delete a Dataset?

NB: You need privileges to do this: you must be either an administrator for the dataset or a CKAN system administrator. If you created the dataset while logged in, you are automatically an administrator for it and so can delete it.

  1. When viewing the dataset, click the "Edit" tab.
  2. Near the bottom of the form there should be a field called "State". Change its value from "active" to "deleted".
  3. Hit save.

Alternatively, you can mark the dataset for deletion by editing it and adding a suitable tag, e.g. meta.duplicate or meta.not-data. A system administrator should then clean it up in time (or you can ask us on the CKAN IRC channel or the ckan-discuss email list).

When a dataset is in the "deleted" state it is still visible to its admin and to sysadmins, but for general users it is neither listed nor searchable.

How Do Deleted Datasets Work?

In CKAN, changes are "revisioned" so that the history of changes can be viewed and any change can be reversed if desired. 'Deleting' a dataset actually means changing the dataset's 'state' value from 'active' to 'deleted'; changing it back resurrects the dataset.

The average user will not be able to browse or search for a 'deleted' dataset, but the dataset's administrator, or a system administrator can. These particular users should check the 'state' field when viewing the dataset, to see if it is deleted or not.
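
The state and visibility rules described above can be sketched as a small model. This is an illustration only, not CKAN's implementation: the object shape, the admins and sysadmin fields, and the function names withState and isVisibleTo are all hypothetical, and the sketch does not model revision history.

```javascript
// Hypothetical in-memory model; field names are illustrative, not CKAN's schema.

// Return a copy of the dataset with a new state ('active' or 'deleted').
function withState(dataset, state) {
  if (state !== 'active' && state !== 'deleted') {
    throw new Error('unknown state: ' + state);
  }
  return { ...dataset, state };
}

// Active datasets are visible to everyone; deleted ones only to the
// dataset's administrators and to system administrators.
function isVisibleTo(dataset, user) {
  if (dataset.state === 'active') return true;
  return user.sysadmin === true || dataset.admins.includes(user.name);
}

const dataset = { name: 'example-dataset', state: 'active', admins: ['alice'] };
const deleted = withState(dataset, 'deleted');   // "delete" the dataset
const restored = withState(deleted, 'active');   // resurrect it

console.log(isVisibleTo(deleted, { name: 'alice', sysadmin: false }));  // true
console.log(isVisibleTo(deleted, { name: 'bob', sysadmin: false }));    // false
console.log(isVisibleTo(deleted, { name: 'carol', sysadmin: true }));   // true
console.log(isVisibleTo(restored, { name: 'bob', sysadmin: false }));   // true
```

Because deletion is only a change of state, nothing is lost: setting the state back to 'active' makes the dataset visible to everyone again.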

How Do I Purge a Dataset?

(ideal for a dataset created by a spammer - you get rid of its revision history)

NB You need to be a CKAN system administrator to do this

If you are a system administrator:

How Do I Purge a Revision?

You need to be a CKAN system administrator. If you are a system administrator:

How Do I Purge a Group?

(ideal for a group created by a spammer - you get rid of its revision history)


NB You need to be a CKAN system administrator.

If you are a system administrator:

How Do I Deal with Duplicate Datasets?


What about Spam and Permissions?

We favour a low barrier to editing in CKAN, to encourage and benefit from many casual additions and improvements, just as Wikipedia does. However, we tend to set up CKAN to require users to register or log in before creating or editing datasets, because this is a defence against automated spam. If spam does get through, it is easy to remove it (or any other bad edit) because the database is versioned.

On theDataHub.org we have made two conditions:

Users who log in to CKAN can set edit permissions on the datasets and groups that they create, if it is necessary to 'lock down' or 'open up' a particular dataset.

Dataset Notes Markup

In addition to markdown syntax, dataset notes support CKAN-specific markup for linking to datasets, tags, and groups:

Tag Conventions

What Are Tag Families?

There is a convention of using the '.' character to create groupings of tags, for example the meta group of tags described in the next item. Current known tag families are:

NB: At an earlier stage there was a convention of using '-' as the grouping separator. However, '-' is also used to separate words in multi-word tags, so this practice is now discouraged.
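
As a rough illustration of the '.' convention, the sketch below extracts the family prefix from a tag and groups tags by family. The function names tagFamily and groupByFamily are hypothetical, and 'open-data' is a made-up tag used to show that '-' is treated as a word separator rather than a family separator.

```javascript
// Return the family prefix of a tag (the part before the first '.'),
// or null if the tag does not belong to a family.
function tagFamily(tag) {
  const dot = tag.indexOf('.');
  return dot > 0 ? tag.slice(0, dot) : null;
}

// Group tags by family, skipping tags that have no family prefix.
function groupByFamily(tags) {
  const groups = {};
  for (const tag of tags) {
    const family = tagFamily(tag);
    if (family === null) continue;
    (groups[family] = groups[family] || []).push(tag);
  }
  return groups;
}

console.log(tagFamily('meta.spam'));      // meta
console.log(tagFamily('meta.not-data'));  // meta
console.log(tagFamily('open-data'));      // null
console.log(groupByFamily(['meta.spam', 'meta.duplicate', 'meta.not-data', 'open-data']));
```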

What Are Meta Tags?

Meta tags have the prefix "meta" and are especially useful for "house-keeping" activities around datasets, such as marking datasets as spam. Current standard meta tags are:

Dataset Resources

What Form Should the Hash Field Have?

Options:

What Hash Function Should I Use?

We recommend using sha1.
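
For illustration, a SHA-1 digest of a resource's contents could be computed as follows. This is a minimal sketch using Node.js's built-in crypto module; the helper name sha1Hex is hypothetical, and the choice of a lowercase hex string as the output form is an assumption, since this page does not specify the format expected in the hash field.

```javascript
const crypto = require('crypto');

// Compute the SHA-1 digest of a string or Buffer as a lowercase hex string.
// Hex output is an assumption; the expected form of the hash field is not
// specified here.
function sha1Hex(data) {
  return crypto.createHash('sha1').update(data).digest('hex');
}

// 'abc' is the standard SHA-1 test vector.
console.log(sha1Hex('abc'));  // a9993e364706816aba3e25717850c26c9cd0d89d
```

For a resource stored as a file, the same function could be applied to the file's bytes, for example sha1Hex(fs.readFileSync(path)).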
