Purpose

Discussion about the Purpose of CKAN.net, CKAN software and the CKAN community.

Original Vision

Below is the basic outline, created in 2006-2007, of CKAN and its place in the wider 'Debian of Data'. See also: http://blog.okfn.org/writings/componentization-and-open-data/ and http://blog.okfn.org/2010/02/23/introducing-datapkg/

In (open) knowledge development, we stand where software developers stood almost 30 years ago. When one freely distributes one's work, whether it is a database, a learning module or a scientific paper, it is often done in a manner which impedes re-use. Significant effort must be expended to extract and re-format the material before it is useful for the purposes of others.

In software development, similar problems were encountered, and a major way they were addressed was an increased focus on 'componentization': developing discrete 'packages' (or libraries) of code that could be easily re-combined. Today, tools such as the Linux 'apt-get' and 'yum' package managers demonstrate the enormous potential of componentization: thousands of interdependent packages can be easily located, acquired and installed. We need to realize that potential for open data.

We need to deliver for data what systems like Debian or CPAN do for code: provide an (automatable) mechanism for discovery, indexing and 'installation' of open data "packages" – a 'dataset package' in this context denoting a collection of material (datasets, documents, images etc) substantial enough, and with sufficient potential for reuse, to warrant distribution as a unit (for example a large collection of photos, a database, a complete set of Shakespeare's works).

To this end we have been developing the CKAN (Comprehensive Knowledge Archive Network) system together with its tools and protocols such as datapkg.
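As a rough illustration of what "discovery, indexing and 'installation'" of dataset packages might look like, here is a minimal in-memory sketch. Everything in it is hypothetical: `DatasetPackage`, `Registry`, `search`, `install_order` and the example package names are invented for this illustration and are not the CKAN or datapkg API.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetPackage:
    """Hypothetical metadata record for one dataset package."""
    name: str
    title: str
    tags: set = field(default_factory=set)
    depends: list = field(default_factory=list)  # names of other packages


class Registry:
    """Toy in-memory index of dataset packages (illustration only)."""

    def __init__(self):
        self._packages = {}

    def register(self, pkg):
        """Indexing: record a package under its name."""
        self._packages[pkg.name] = pkg

    def search(self, term):
        """Discovery: names of packages whose title or tags mention the term."""
        term = term.lower()
        return sorted(
            p.name for p in self._packages.values()
            if term in p.title.lower() or term in {t.lower() for t in p.tags}
        )

    def install_order(self, name):
        """'Installation': list a package after everything it depends on."""
        order, visiting = [], set()

        def visit(n):
            if n in order:
                return
            if n in visiting:
                raise ValueError(f"dependency cycle at {n!r}")
            visiting.add(n)
            for dep in self._packages[n].depends:
                visit(dep)
            visiting.discard(n)
            order.append(n)

        visit(name)
        return order


registry = Registry()
registry.register(DatasetPackage("country-codes", "ISO country codes", {"reference"}))
registry.register(DatasetPackage("shakespeare", "Complete works of Shakespeare", {"literature"}))
registry.register(DatasetPackage("world-gdp", "GDP by country", {"economics"}, ["country-codes"]))

print(registry.search("country"))
print(registry.install_order("world-gdp"))
```

The dependency walk is the part borrowed from software package managers: asking for one dataset package also pulls in the packages it builds on, in an order where each dependency comes first.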

ckan-vision.png

Autumn 2010 Discussion

Copied from http://ckan.okfnpad.org/meta

The Purpose

What is the ultimate aim (of CKAN)?

Making it easy to get, use, share data in a *scalable* way.

The scalability here is not fundamentally about scaling a specific service (e.g. number of users or size of database) but about the way we work with data, specifically supporting a decentralized, collaborative, componentized approach.

The "Debian" of data: distributed, collaborative data development using a componentized approach (individual 'dataset packages' that we plug together as we do with software libraries).

Features:

Richard Cyganiak

A directory of all machine-readable datasets available on the web, supporting a number of functions:

  1. finding datasets
  2. informing about the data: what others have done with it, what problems they found, etc.
  3. connecting people who work with that data
  4. collecting resources/scripts/applications/documentation related to the datasets
  5. automation of common tasks, such as notification, archival, updates, provision of converted versions

It already does a decent job at 1 and 2.

(Alternative) Naming

Tagline

Rufus Pollock

A place to find, reuse and share data

Get, use and share data

A data hub - get, use and share data

The data hub - find, collaborate, share

Find, share and reuse open content and data

Find, share and reuse

The data directory with a difference

Raw data now

Find, share, reuse (in a machine-automatable way)

An open wiki-like registry for datasets " " for data hackers

Original: Knowledge and data will increasingly be provided in packaged form, to be re-used and combined in a manner similar to software. We need a registry to keep track of what open knowledge projects and packages exist.

Rob Myers

It needs to capture the fact that these are a) datasets, b) on all subjects, c) that you can download and use freely, d) rather than any half-assed non-commercial APIs, etc.

"The world's knowledge at your fingers"

"Datasets to download and use freely"

Words

Hub Collaborate Clean Find Share Reuse Get

CKAN software (Tagline)

The data directory software with a difference

Metaphors

Things to think about

  1. Community coordinator
  2. Project information
  3. Division between ckan.net and CKAN the software
  4. meta.ckan.net
  5. ckan.org -> should be a proper development community (and be trac)
  6. Move main ckan repo