Data Formats
THIS PAGE IS DEPRECATED and should no longer be used. Its content has been copied to, and wholly or partially superseded by, material on http://dataprotocols.org/
A listing of formats for data exchange and related activities (e.g. query) that may be relevant to CKAN.
Formats - General
RDF and Linked Data
- Overview: http://www.w3.org/wiki/SweoIG/TaskForces/CommunityProjects/LinkingOpenData
- Triple format developed by W3C
Google Visualization API
- Metadata: http://code.google.com/apis/visualization/documentation/dev/implementing_data_source.html#jsondatatable
- Data: http://code.google.com/apis/visualization/documentation/reference.html#dataparam. Three attributes:
- cols: define types of cols
- rows: a list of rows
- p: arbitrary key/value pairs.
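As a rough illustration of the three attributes above, the following sketch builds a table in the cols/rows/p layout as a plain Python dict and serialises it to JSON. The helper name `to_datatable`, the column ids and the sample values are all made up for this example; only the attribute names (`cols`, `rows`, `p`) and the per-cell `c`/`v` keys come from the Google Visualization JSON format.

```python
import json


def to_datatable(columns, records, properties=None):
    """Build a dict in the cols/rows/p layout.

    columns: list of (id, label, type) tuples
    records: list of row tuples, one value per column
    properties: optional dict of arbitrary key/value pairs
    """
    table = {
        # cols: one entry per column, declaring its type
        "cols": [
            {"id": cid, "label": label, "type": ctype}
            for cid, label, ctype in columns
        ],
        # rows: each row holds a list of cells under "c", each cell a value under "v"
        "rows": [{"c": [{"v": value} for value in record]} for record in records],
    }
    if properties:
        # p: arbitrary key/value pairs attached to the table
        table["p"] = properties
    return table


# Hypothetical sample data, for illustration only.
table = to_datatable(
    [("name", "Name", "string"), ("count", "Count", "number")],
    [("A", 1), ("B", 2)],
    {"note": "example"},
)
print(json.dumps(table, indent=2))
```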
CKAN Data API
The CKAN Data API is provided by the Webstore.
OData
- Overview: http://odata.org/
- Data format and protocol originally developed by Microsoft
- Based on XML and Atom
DSPL - Dataset Publishing Language
- http://code.google.com/intl/it-IT/apis/publicdata/
- Used by Google Public Data Explorer
- Based on multiple CSV files with additional metadata
SQL
- Standard ANSI SQL
SQLite
- http://www.sqlite.org/
- The SQLite binary file format, not just SQL text. Not specified by anyone in particular, but suggested by several people and now used by ScraperWiki.
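To make the "binary format, not just SQL" point concrete, here is a minimal sketch using Python's standard `sqlite3` module. It writes a small table to a file, then shows that the file is a self-contained binary database (it starts with the fixed `SQLite format 3` header) that can be handed to another party and queried directly. The table name, column names and sample rows are hypothetical.

```python
import os
import sqlite3
import tempfile


def write_sqlite_file(path, rows):
    """Create a one-table SQLite database file at `path`."""
    conn = sqlite3.connect(path)
    try:
        conn.execute("CREATE TABLE data (name TEXT, value INTEGER)")
        conn.executemany("INSERT INTO data VALUES (?, ?)", rows)
        conn.commit()
    finally:
        conn.close()


def read_header(path):
    """Return the first 16 bytes of the file, which identify the format."""
    with open(path, "rb") as f:
        return f.read(16)


# Hypothetical sample data, written to a temporary directory.
db_path = os.path.join(tempfile.mkdtemp(), "example.sqlite")
write_sqlite_file(db_path, [("a", 1), ("b", 2)])

header = read_header(db_path)

# The recipient opens the same file and queries it with ordinary SQL.
conn = sqlite3.connect(db_path)
rows_back = conn.execute("SELECT name, value FROM data ORDER BY name").fetchall()
conn.close()

print(header)
print(rows_back)
```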
SODA - Socrata Open Data API
Metaweb Object Model
- Generic 'triple/graph' format used for Freebase
Formats - Tabular
This section was put together primarily to inform the development of tabular formats in relation to the DataExplorer.
General characteristics
Most systems have a model that looks something like:
Dataset
- headers: list of Columns
- data: RowSet
- total (total_rows in CouchDB, count in SQL-style systems): number of rows in RowSet
Column:
- name
- label
RowSet - list of rows:
- getLength
- getRow(i): returns row
Row:
- list of cells
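The outline above can be sketched as a handful of small Python classes. This is only one possible reading of the model: the snake_case method names, the use of dataclasses, and the choice to store `total` as a separate field (rather than computing it from the rows) are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Column:
    name: str   # machine-readable identifier
    label: str  # human-readable title


@dataclass
class RowSet:
    rows: List[List[Any]] = field(default_factory=list)  # each row is a list of cells

    def get_length(self) -> int:
        """Number of rows held in this RowSet (getLength in the outline)."""
        return len(self.rows)

    def get_row(self, i: int) -> List[Any]:
        """Return the i-th row (getRow(i) in the outline)."""
        return self.rows[i]


@dataclass
class Dataset:
    headers: List[Column]
    data: RowSet
    total: int  # row count as reported by the backend


# Hypothetical sample values.
rowset = RowSet([["a", 1], ["b", 2]])
dataset = Dataset(
    headers=[Column("name", "Name"), Column("value", "Value")],
    data=rowset,
    total=rowset.get_length(),
)
print(dataset.total, dataset.data.get_row(1))
```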
R (Data Frames)
TODO: Need more info ...
Tablib
- Tablib: http://docs.tablib.org/
- Tablib Core: https://github.com/kennethreitz/tablib/blob/develop/tablib/core.py
Model:
- Dataset - core object
- dict: list of Rows (can instantiate with list of arrays/tuples)
- headers: header fields
- Row: list of fields
- Databook: list of Datasets (e.g. spreadsheet workbook)
SlickGrid
JavaScript component for presenting tabular data.
- SlickGrid: https://github.com/mleibman/SlickGrid
- SlickGrid.Data.DataView: https://github.com/mleibman/SlickGrid/blob/master/slick.dataview.js
Model:
- Two arguments: data, columns
- Data: an array of dicts or a Model object
- Model: an object implementing three methods (see the sample implementation in SlickGrid.Data.DataView):
- model.getItem(i) // Returns the ith row
- model.getLength() // Returns the number of items
- model.getItemMetadata(i) // not sure about this ...
- Columns: at least id, name (label) and field attributes. See https://github.com/mleibman/SlickGrid/wiki/Column-Options
JS Data
Model:
- Data.Hash (A sortable Hash data-structure)
- Data.Graph (A data abstraction for all kinds of linked data)
- Data.Collection (A simplified interface for tabular data that uses a Data.Graph internally)
- Persistence Layer for Data.Graphs
See Also
Querying
Unstructured Query Language
- http://www.unqlspec.org/display/UnQL/Home
- UnQL means Unstructured Query Language. It's an open query language for JSON, semi-structured and document databases.
HTSQL
- http://htsql.org/
- A database query language based on SQL
- HTSQL is a URI-based high-level query language for relational databases. HTSQL wraps your database with a web service layer, translating HTTP requests into SQL and returning results as HTML, JSON, etc.
URI Fragment Identifiers for the text/csv Media Type
- Method for addressing (and hence possibly querying) into CSV documents
- http://tools.ietf.org/html/draft-hausenblas-csv-fragment-00
- Status: draft
- Published: 26 April 2011
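As a rough sketch of what addressing into a CSV document could look like, the function below resolves a row-based fragment against CSV text. The `row=N` and `row=N-M` syntax (1-indexed, inclusive, with the header line counted as row 1) is taken from RFC 7111, the later published form of this draft; the 2011 draft may differ in detail. The function name `select_rows` and the sample data are hypothetical, and column and cell selections are deliberately left out.

```python
import csv
import io


def select_rows(csv_text, fragment):
    """Return the rows addressed by a 'row=N' or 'row=N-M' fragment.

    Rows are 1-indexed and ranges are inclusive. Only row selections
    are handled in this sketch.
    """
    key, _, spec = fragment.lstrip("#").partition("=")
    if key != "row":
        raise ValueError("only row= selections are handled in this sketch")
    start, _, end = spec.partition("-")
    first = int(start)
    last = int(end) if end else first
    rows = list(csv.reader(io.StringIO(csv_text)))
    # Convert the 1-indexed inclusive range to a Python slice.
    return rows[first - 1:last]


# Hypothetical sample document.
sample = "id,name\n1,a\n2,b\n3,c\n"
print(select_rows(sample, "#row=1"))
print(select_rows(sample, "#row=2-3"))
```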