Purpose
Discussion about the Purpose of CKAN.net, CKAN software and the CKAN community.
Original Vision
Below is the basic outline, written in 2006-2007, of CKAN and its place in the wider 'Debian of Data'. See also: http://blog.okfn.org/writings/componentization-and-open-data/ http://blog.okfn.org/2010/02/23/introducing-datapkg/
In (open) knowledge development, we stand where software developers stood almost 30 years ago. If one freely distributes one's work - whether it is a database, a learning module or a scientific paper - it is often done in a manner which impedes re-use. Significant effort must be expended to extract and re-format material so that it is useful for the purposes of others.
In software development, similar problems were encountered, and a major way they were addressed was by an increased focus on 'componentization': developing discrete 'packages' (or libraries) of code that could be easily re-combined. Today, tools such as the Linux 'apt-get' and 'yum' package managers demonstrate the enormous potential of componentization: thousands of interdependent packages can be easily located, acquired and installed. We need to realize that potential for open data.
We need to deliver for data what systems like Debian or CPAN do for code: provide an (automatable) mechanism for discovery, indexing and 'installation' of open data "packages" – a 'dataset package' in this context denoting a collection of material (datasets, documents, images etc) substantial enough, and with sufficient potential for reuse, to warrant distribution as a unit (for example a large collection of photos, a database, a complete set of Shakespeare's works).
To this end we have been developing the CKAN (Comprehensive Knowledge Archive Network) system together with its tools and protocols such as datapkg.
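As a rough illustration of the discovery, indexing and 'installation' mechanism described above, the following Python sketch models a toy in-memory registry of dataset packages. It is only a sketch under stated assumptions: the names DatasetPackage, Registry, search and install_order, the metadata fields, and the example.org URLs are all hypothetical and are not part of the CKAN or datapkg APIs.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetPackage:
    """Hypothetical 'dataset package' metadata; not a real CKAN schema."""
    name: str
    title: str
    license: str
    download_url: str
    depends: list = field(default_factory=list)  # names of other packages


class Registry:
    """Toy in-memory registry; a real one would be a network service."""

    def __init__(self):
        self._packages = {}

    def register(self, package):
        # Indexing: record that the package exists, keyed by its name.
        self._packages[package.name] = package

    def search(self, term):
        # Discovery: case-insensitive match on name or title.
        term = term.lower()
        return sorted(
            p.name for p in self._packages.values()
            if term in p.name.lower() or term in p.title.lower()
        )

    def install_order(self, name):
        # 'Installation': list the package and everything it depends on,
        # with dependencies first (depth-first traversal).
        order, visiting = [], set()

        def visit(pkg_name):
            if pkg_name in order:
                return
            if pkg_name in visiting:
                raise ValueError(f"dependency cycle at {pkg_name!r}")
            if pkg_name not in self._packages:
                raise KeyError(f"unknown package {pkg_name!r}")
            visiting.add(pkg_name)
            for dep in self._packages[pkg_name].depends:
                visit(dep)
            visiting.discard(pkg_name)
            order.append(pkg_name)

        visit(name)
        return order


# Example data (invented for illustration only).
registry = Registry()
registry.register(DatasetPackage(
    "shakespeare", "Complete Works of Shakespeare",
    "Public Domain", "http://example.org/shakespeare.zip"))
registry.register(DatasetPackage(
    "shakespeare-wordcount", "Word counts for Shakespeare",
    "CC0", "http://example.org/wordcount.csv",
    depends=["shakespeare"]))

print(registry.search("shakespeare"))
print(registry.install_order("shakespeare-wordcount"))
```

The design point the sketch tries to capture is that each package carries its own license and download location, and that packages can declare dependencies on one another, so that a tool can work out automatically what needs to be fetched and in what order.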
Autumn 2010 Discussion
Copied from http://ckan.okfnpad.org/meta
The Purpose
What is the ultimate aim (of CKAN)?
Making it easy to get, use, share data in a *scalable* way.
The scalability here is not fundamentally about scaling a specific service (e.g. number of users or size of database) but about the way we work with data, specifically supporting a decentralized, collaborative, componentized approach.
The "Debian" of data - distributed, collaborative data development using a componentized approach (individual 'dataset packages' that we plug together as we do with software libraries).
Features:
- Catalogue - Place to register the existence of data and content, especially that which is open
- Make sure information about license and download URLs is recorded -- these are key components of openness in opendefinition.org
- Really needs to be 'social'. How do I share data, snippets, etc. with my friends?
- Catalogue - find (going beyond Google, related datasets for this one, previews, search on specific attributes, clear license info)
- Archiving and storage
- Community - share knowledge and knowhow about datasets (the wiki-like features)
Richard Cyganiak
A directory of all machine-readable datasets available on the web, supporting a number of functions:
1. finding datasets
2. informing about the data -- what have others done with it, what problems did they find etc.
3. connecting people who work with that data
4. collecting resources/scripts/applications/documentation related to the datasets
5. automation of common tasks, such as notification, archival, updates, provision of converted versions
It already does a decent job at 1 and 2.
(Alternative) Naming
- datadirectory
- getthedata.org
- dataquestions.org
- "The Open Knowledge Directory". [Rob Myers]
- "The Open Data Directory".
- "The Open Dataset Directory"
- Open Data Archive [Neil McEvoy]
- Open Database Catalogue [Emil Dambauskas]
- Open Database List
Tagline
Rufus Pollock
A place to find, reuse and share data
Get, use and share data
A data hub - get, use and share data
The data hub - find, collaborate, share
Find, share and reuse open content and data
Find, share and reuse
The data directory with a difference
Raw data now
Find, share, reuse (in a machine-automatable way)
An open wiki-like registry for datasets
An open wiki-like registry for data hackers
Original: Knowledge and data will increasingly be provided in packaged form to be re-used and combined in a manner similar to software. Need a registry to keep track of what open knowledge projects and packages exist.
Rob Myers
It needs to capture the fact that these are a) datasets b) on all subjects c) that you can download and use freely d) rather than any half-assed non-commercial APIs, etc.
"The world's knowledge at your fingers"
"Datasets to download and use freely"
Words
Hub, Collaborate, Clean, Find, Share, Reuse, Get
CKAN software (Tagline)
The data directory software with a difference
Metaphors
- ecosystem
- mining / refining
- pieces loosely joined
- connecting different pieces
- mapping datasets
- finding datasets
- library, archive, resource, store
- raw material
Things to think about
- Community coordinator
- Project information
- Division between ckan.net and CKAN the software
- meta.ckan.net
- ckan.org -> should be a proper development community (and be Trac)
- Move main CKAN repo