Related Software
This page lists software that is related to CKAN in some way, e.g. useful tools and similar software.
See Also
Working with Tabular Data
- Messytables (by CKAN team)
- Often, tabular data is prepared in a way that does not lend itself to easy machine extraction: random header columns, encoding issues, and incorrect or missing type information are just some of the common pitfalls. messytables is a library that accepts input in various formats and applies heuristics to guess a correct way of accessing the contained information.
- Tablib: Pythonic Tabular Datasets
- Tablib is an MIT-licensed, format-agnostic tabular dataset library written in Python. It allows you to import, export, and manipulate tabular data sets. Advanced features include segregation, dynamic columns, tags and filtering, and seamless format import and export.
- Tabular - Tabular data container and associated convenience routines in Python
- Pandas
- pandas is a library providing, among other things, a set of convenient and powerful data structures for working with labeled statistical (financial, economic, econometric) data sets. Such data is referred to as time series and cross-sectional (or longitudinal) data, which are common terms in statistics and econometrics.
- csvkit
- csvkit is a suite of utilities for converting to and working with CSV, written in Python and released under the MIT licence.
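To make the kind of heuristics described above for messy tabular data more concrete, here is a minimal sketch using only the Python standard library. It is not the messytables API; the function names (`guess_type`, `find_header_row`, `load_messy_csv`) and the sample data are hypothetical, and the rules are deliberately crude: the header is assumed to be the first row with no empty cells, and a column's type is assumed to be the narrowest of int, float, or str that fits every value.

```python
import csv
import io


def guess_type(values):
    """Return 'int', 'float', or 'str' for a column of raw strings."""
    non_empty = [v.strip() for v in values if v.strip()]
    if not non_empty:
        return "str"  # nothing to inspect, so fall back to text
    for name, cast in (("int", int), ("float", float)):
        try:
            for v in non_empty:
                cast(v)
            return name  # every value parsed with this cast
        except ValueError:
            continue  # try the next, wider type
    return "str"


def find_header_row(rows):
    """Crude guess: the header is the first row in which every cell is non-empty."""
    for index, row in enumerate(rows):
        if row and all(cell.strip() for cell in row):
            return index
    return 0


def load_messy_csv(text):
    """Skip junk rows above the header and guess a type for each column."""
    rows = list(csv.reader(io.StringIO(text)))
    start = find_header_row(rows)
    headers = [cell.strip() for cell in rows[start]]
    # Keep only data rows that have the same width as the header.
    body = [row for row in rows[start + 1:] if len(row) == len(headers)]
    columns = list(zip(*body)) if body else [() for _ in headers]
    types = {h: guess_type(col) for h, col in zip(headers, columns)}
    return headers, body, types


# Hypothetical sample: a title line and a blank row precede the real header.
SAMPLE = """Quarterly report,,
,,
city,population,area_km2
Berlin,3645000,891.7
Hamburg,1841000,755.2
"""

headers, body, types = load_messy_csv(SAMPLE)
print(headers)
print(types)
```

Real-world files need considerably more than this (encoding detection, merged header rows, dates), which is the gap that the libraries listed above aim to fill.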
Data Catalogues and Data Platforms Software (Open-Source)
- Catalog denotes that the system links-to and provides metadata about data hosted elsewhere rather than acting as a repository for the data itself.
- Storage denotes that the system includes facilities for hosting data.
- CKAN Data Hub Software - CKAN Wiki - CKAN.net community instance
- Catalog and Storage
- National Data Catalog (source, source v2)
- Catalog
- OpenGov Sweden (source)
- Catalog
- CivicApps Drupal (codebase being opened) - Based on Drupal and developed by a third-party company. It only provides download links and does not host the datasets.
- Catalog
- PANDA - PANDA wants to be your newsroom data appliance. It provides a place for you to store data, search it, and share it with the rest of your newsroom.
- Storage
Data Catalogue and Platform Services
- CKAN (Cloud version) - CKAN has been regularly deployed in a SaaS model in the cloud
- Figshare -- aimed at scientists, to encourage sharing of rarely-published data such as negative results and unpublished figures
- Buzzdata -- unreleased (as of May 2011). Startup offering "github for data mungers" (my words, not theirs!)
- Socrata
- Microsoft OGDI on Azure (source code)
- Infochimps - Gives access to some of the actual datasets through an API. Data is also available for purchase.
Data Request Services
- DataTO (source) - also powers DataOTT
- http://isitopendata.org/
Miscellaneous
DAS
DAS = Distributed Annotation System (for genome data)
- distributed annotation (federated data sources...)
- DAS client (scrolls like Google Maps and is written entirely in JavaScript, including the handling of indexed binary files): http://www.biodalliance.org/
- DAS registry (more than 1,800 sources are registered): http://www.dasregistry.org/