UX
UX Reworking Plan Summer 2011 - implementation in ticket: http://trac.ckan.org/ticket/1294
(Emerging) Consensus
Main pages:
- Front page for non-logged in
- Get started and global activity stream
- Dashboard
- account page but for you (when logged in)
- Account page for a given account
- Collections, Resources, and Activity
- Dataset page
- Resources
- Resource page
Advanced data catalogue use cases
- As a data publisher, I want to point people at a list of datasets that I have released.
- As a data publisher, I want a ticket to be raised and a notification to be sent when a resource I have published becomes unavailable. After resolving the issue, I want to report back and close the issue.
- As a data wrangler, I want to upload resources to a (human-)named location and retrieve them by name. Once I have produced a derivative file, I want to upload it to a new, named location and document both the relation and the steps performed to create the change.
- As a data wrangler, I want to get an overview of the contents of a table by seeing column names, example values, type guesses and the distinct & null values count for each column.
- As a data wrangler, I want to follow the activities of my peers through activity views on their profile pages, their resource pages and - after following them - on my profile page.
- As a data user, I want to file tickets against a data source, specifying either availability, formatting or content issues. I want to group similar reports (e.g. 500 broken rows in a 20k rows table), set a priority and comment on an issue.
- As a data wrangler, I want to find a well-documented API for my data after uploading it.
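The table-overview use case above (column names, example values, type guesses, distinct and null counts) could be sketched roughly as follows. This is only an illustration, not an existing CKAN feature: the function names `profile_table` and `guess_type`, the set of null markers, and the three type labels are all assumptions made for the example.

```python
# Hypothetical sketch: profile a table given as a list of dicts of strings.
NULL_MARKERS = {"", "null", "none", "na", "n/a"}  # assumed null spellings


def guess_type(values):
    """Guess a column type from its non-null string values."""
    def all_parse(cast):
        try:
            for value in values:
                cast(value)
            return True
        except ValueError:
            return False

    if not values:
        return "unknown"
    if all_parse(int):
        return "integer"
    if all_parse(float):
        return "float"
    return "string"


def profile_table(rows, examples=3):
    """Return per-column type guess, example values, distinct and null counts."""
    columns = list(rows[0].keys()) if rows else []
    profile = {}
    for col in columns:
        cells = [row.get(col) for row in rows]
        non_null = [
            c for c in cells
            if c is not None and c.strip().lower() not in NULL_MARKERS
        ]
        profile[col] = {
            "type_guess": guess_type(non_null),
            # dict.fromkeys keeps first-seen order while dropping duplicates
            "examples": list(dict.fromkeys(non_null))[:examples],
            "distinct": len(set(non_null)),
            "nulls": len(cells) - len(non_null),
        }
    return profile
```

Rows could come from `csv.DictReader`; a real implementation would probably sample large files rather than read every row.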
Near-term UX Issues
- Partial/in-place package editing
- Sort groups by size, present in box grid not table listing (possibly associate image)
- Page response time < 250 ms for all pages
- Remove comments from home page. Have proper comments.
- Remove tags navigation
- Fundamentally rework history & change display
Front-page redesign draft
PublicData.eu
http://publicdata.eu/ already exemplifies lots of good changes we should make:
- Consolidated and smaller 'masthead' section including basic menu
- Consolidated subnav menu including things like RSS feed
- better focus on main elements (such as title)
- (basic) integration of apps and ideas
RP Brain Dump
- Rename package to dataset (one day package may make sense but not yet)
- Move main 'menu' to top right (freeing up space below)
Dataset Page
- Simplify page
- Title + project url at top (as currently)
- Summary snippet next (with read more link)
- Resources next (just call it data or have no title)
- Notes (title as description or without a title)
- Move RSS to bottom of page (or remove)
- Put follow at right hand side in submenu bar area
- Move actions to ??
- Call to action at top of dataset pages
- No data -- help us get some
- Not open -- use isitopendata.org
Other stuff:
- Quick add/edit for a resource
- Quick tag
Sidebar
- share this, comments, getthedata, ...
Integrate Forum / Q&A (getthedata.org)
- Register on ckan from GTD
- CKAN sidebar with latest q&a
- Run a daily cron job to extract from getthedata and register on CKAN
- Show snippet from item (how?) + then show full item in jquery dialog
- (?) integrate getthedata at: /forum/
- have '/requestdata/' leading to a specially tagged item in the forums
- requires login integration ...
- could proxy in osqa using deliverance ...?
Front page
- Front page: 2 (3) actions which are very bold:
- Get Data - search (no search results should lead to option to: request data)
- May wish to allow request straight off so people do not get lost in searching
- Share (Publish) Data
- Upload (dataset not already public)
- Register (dataset already online)
Bigger Items
- {user|org}/{dataset-name} - all datasets and resources owned by a user / organization
- Make user home / profile page much nicer and real place for user
Brain Dump from some user (Pablo)
- In "Add Package" (http://ckan.net/package/new), collapsing stuff under "Add more information (Groups, authors etc)" makes it hard to explain to people where to go to add information. It also makes it impossible to use ctrl+f to jump to "Extras", for example. Since the subsections Groups, Details and Extras are not even that verbose, maybe they should just stay open all the time.
- By the way, why isn't Author just within the basic information?
- Other methods of contact: some people are uncomfortable with exposing their e-mail addresses online. CKAN provides some disguising of the address via char encoding, but some people are still uncomfortable with that. However, knowing whom to talk to about the data, where to report errors, etc. is important. Maybe we could allow a dropdown where people can choose to provide WebID, Twitter, Google+, etc?
- It's unclear when I should have a package with many "resources" or one package per downloadable resource. Many users create multiple packages and link them together from the description field with a "package:X" link. Something similar could be supported, where you indicate a "parent dataset" to avoid duplication. In the other direction, people could add resources that are already packages, so they don't have to provide a download link twice, for example. The linkage information in itself could be exploited in many interesting ways.
- What if the dataset description was organized in very simple terms of: What, Who, When, How much, Why, Where to find?
- What: package name, title, URL, Tags (topic of dataset, relevant formats, etc.)
- How open: put the licensing information in prominent place.
- Who: author, maintainer, is data authoritative (published by producer), maybe also who can use the dataset (link to how open).
- When: version, last updated
- How much: size of dataset in number of records, megabytes, etc.
- Where to find: an example record, downloads, query interfaces, etc.
- Why: description of why people should care about this dataset
- Against spam: for edits that change URLs we should require a captcha. I'm seeing more and more spam creeping into CKAN.
- Reporting spam: we should be able to flag users that are abusing the website.
- Broken links: a bot that checks which resources are still alive and flags them on CKAN could be a nice idea. We'd be happy to implement bots for the formats we're interested in, but in order to add that to CKAN it would be nice to have a "broken?" flag to set for each download link.
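The broken-link bot suggested above might look something like the sketch below. It is only an illustration under stated assumptions: the `broken` and `check_detail` keys are the hypothetical flag the note asks for, not existing CKAN fields, and resources are modelled as plain dicts with a `url` key. The `opener` parameter is there so the check can be exercised without network access.

```python
import urllib.error
import urllib.request


def check_url(url, opener=urllib.request.urlopen, timeout=10):
    """Return (is_alive, detail) for one resource URL using a HEAD request."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return response.status < 400, "HTTP %d" % response.status
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses
        return False, "HTTP %d" % err.code
    except (urllib.error.URLError, OSError) as err:
        # DNS failures, refused connections, timeouts, ...
        return False, str(err)


def flag_broken(resources, opener=urllib.request.urlopen):
    """Set the hypothetical 'broken' flag on each resource dict."""
    for resource in resources:
        alive, detail = check_url(resource["url"], opener=opener)
        resource["broken"] = not alive
        resource["check_detail"] = detail
    return resources
```

Some servers reject HEAD requests, so a real bot would likely fall back to a GET and retry before flagging a link as broken.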
Friedrich Brain Dump
http://gettingreal.37signals.com/ch04_Make_Opinionated_Software.php
Things I want to do in CKAN but can't
For some of these it's unclear whether they're within scope, but this is itself an interesting discussion. The purpose here is to be specific rather than overgeneralize.
- Manage my packages: see what resources I manage, and a short status indicator for each ('ready to load', 'needs wrangling', 'stub')
- Browse all packages that Rufus manages, see what he last edited.
- Facet users by package categories, tags - contact a community of practice.
- Sign up to packages; naming the skills I have to offer with regards to them.
- Receive data pushes - I currently have 12 messages in Inbox that are really data pushes, plus many more in the "OpenSpending" group on CKAN that people want me/us to look at.
- Have my own little N-step pipeline for dealing with these, or simply tickets to track progress.
- Organize a data hunt: quickly create a set of stub packages and assign them to team members/friends. See an activity stream and interact with failing efforts.
- Prioritize data: given a set of data packages, prioritize them as a group (e.g. for a data hunt but also for advocacy - the scary "So what kind of things do you want to have released?" moment)
- Offer a virtual currency bounty for a dataset and see a list of all bounties within my social network.
- Learn about which data actions are available for a given package based on licensing and formats ('can't fork', 'can't use for organization X which is marked commercial', 'can't bulk download')
- Keep track of the various scrapers and transformation scripts I've set up on servers around the net, see which ones are failing and when each last ran, what artifacts each produced.
- Cron-schedule a transformation script to be executed on a managed server by CKAN with inputs and outputs as resources. Pay via Paypal or by entering my AWS credentials directly.
- See what applications and processing scripts are available for each dataset, for an application see the datasets upon which it is based (double-linked apps catalogue).
- See data advisories or warnings people have made for this data, ranging from "this should be type-converted" to "nazi propaganda". See specific locations within the data this refers to.
- Highlight potential stories for data journalists, see this list in full or per country/topic.
- This is functionality we also need in OpenSpending but since it is about data - CKAN should store it.
- Have a canonical location for data (not metadata) which I can trust to be persistent. I don't trust OKF for this so I want either archive.org or Google. Upload data to this location from the command line, and have various versions of the dataset around.
- Deal with the data in superficial ways:
- Get an overview of the data structure (column headers and sheet names, graph structure for JSON or RDF, de-facto schema in XML) and data quality (column types, empty values, out-of-order values, file format corruptions).
- Perform basic operations on tabular data, such as selecting a subset of columns, rows etc. Highlight foreign key relations across different sheets, resources or data sets (including fuzzy foreign keys/reconciliation).
- Get an impression of the shape and dimensions of a shapefile or KML set.
Proposed solutions
GitHub-Style User/Organization Model
- Each package has one (and only one) owner.
- Packages can be forked and merged.
Reduce to four conceptual entities
- A user, as an actor. Every other entity has an owner, but may also be edited by others.
- A reference (formerly Resource); basically a link with a title, note and short metadata
- Determine MIME type etc. automatically
- A collection (formerly Package or Group); grouping of references that *adds metadata* to all of them (i.e. the metadata is thought to apply directly to each reference).
- This is many-to-many with references.
- Collections may have queues where users suggest possible references.
- (Are collections in collections?)
- DSPL-exportable.
- An update on either a reference or a collection; can be a notice from a crawler, a link to a derived data set, an advisory, a story pointer, a bounty, a preview etc.
- Updates are typed, and each type may refer to a template for rendering the update; we may pull these in remotely.
- They are not immutable and may be updated when their message changes.
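As a rough illustration of the four entities described above, the following sketch models them as Python dataclasses. All class, field and method names are assumptions made for the example and do not describe CKAN's actual schema; the MIME type guess uses the standard library's `mimetypes` module as a stand-in for "determine MIME type etc. automatically".

```python
from dataclasses import dataclass, field
import mimetypes


@dataclass
class User:
    name: str


@dataclass
class Reference:
    """A link with a title, note and short metadata (formerly Resource)."""
    title: str
    url: str
    owner: User
    note: str = ""
    mime_type: str = ""

    def __post_init__(self):
        # Guess the MIME type from the URL when none is given.
        if not self.mime_type:
            guessed, _encoding = mimetypes.guess_type(self.url)
            self.mime_type = guessed or "application/octet-stream"


@dataclass
class Collection:
    """A grouping whose metadata applies to every reference in it."""
    name: str
    owner: User
    metadata: dict = field(default_factory=dict)
    references: list = field(default_factory=list)
    queue: list = field(default_factory=list)  # suggested references

    def suggest(self, reference):
        self.queue.append(reference)

    def accept(self, reference):
        self.queue.remove(reference)
        self.references.append(reference)


@dataclass
class Update:
    """A typed, mutable notice attached to a reference or a collection."""
    kind: str      # e.g. "advisory", "bounty", "crawler-notice"
    target: object
    author: User
    message: str
```

Because a collection only holds a list of references, the same reference can appear in several collections, which gives the many-to-many relation mentioned above.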
Type metadata
- Each type needs a human-readable name, a machine name, perhaps a namespace, and other metadata.
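A minimal sketch of such type metadata, assuming it describes the update types mentioned earlier, is given below. The class name `UpdateType`, the default namespace `ckan`, the `template_url` field and the `namespace:machine_name` key format are all hypothetical choices for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class UpdateType:
    machine_name: str          # stable identifier used in code and URLs
    human_name: str            # label shown in the UI
    namespace: str = "ckan"    # assumed default namespace
    template_url: str = ""     # optional (possibly remote) rendering template
    extra: dict = field(default_factory=dict)  # any other metadata

    @property
    def qualified_name(self):
        return "%s:%s" % (self.namespace, self.machine_name)


def register(registry, update_type):
    """Add a type to a registry dict keyed by its qualified name."""
    if update_type.qualified_name in registry:
        raise ValueError("duplicate type: " + update_type.qualified_name)
    registry[update_type.qualified_name] = update_type
    return registry
```

Keying the registry by namespace plus machine name lets two parties define a type with the same machine name without colliding.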
Insert mode


