Solr Search
This page gives additional information on running CKAN with an Apache Solr search backend. Basic installation details are at: http://readthedocs.org/docs/ckan/en/latest/install-from-source.html
Deployment
This documents the install under Ubuntu LTS (as required for a Debian install of CKAN).
- Install solr-jetty and python-virtualenv via apt.
- Set up a virtualenv in /etc/ckan/common with a pip requirements list at /etc/ckan/pip-common.txt, e.g.:
-e hg+https://bitbucket.org/okfn/ckanext-solr#egg=ckanext-solr
- Enable Jetty startup in /etc/default/jetty, set HOST to 127.0.0.1 and set the -Xmx value in JAVA_OPTS to something reasonable (128m is the working minimum, 1g is better).
- Symlink /etc/ckan/common/src/ckanext-solr/schema.xml into /var/lib/jetty/webapps/solr/conf.
- Edit the CKAN config in /etc/ckan/$INSTANCE/$INSTANCE.ini and set the search backend:
search_backend = solr
solr_url = http://localhost:8983/solr
#solr_user = <NOT REQUIRED>
#solr_password = <NOT REQUIRED>
- Start jetty and restart apache2.
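Once Jetty is running, it can be worth checking that Solr answers before pointing CKAN at it. The following is a minimal sketch, assuming the stock solrconfig.xml registers the ping handler at /admin/ping and that curl is installed; the function names solr_ping_url and solr_is_up are made up for this example.

```shell
#!/bin/sh
# Build the ping URL for a Solr base URL (a trailing slash is stripped).
# Assumption: the ping handler is registered at /admin/ping.
solr_ping_url() {
    printf '%s/admin/ping\n' "${1%/}"
}

# Return 0 if Solr answers the ping request within 5 seconds, non-zero otherwise.
solr_is_up() {
    curl --silent --fail --max-time 5 "$(solr_ping_url "$1")" > /dev/null
}
```

For example, `solr_is_up http://localhost:8983/solr && echo "Solr is reachable"` should print the message once the backend is up.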
Config settings
The extension has the following options:
- solr_url sets the URL of the Solr core. If you are running against a single-core instance, this is usually xxxx:8983/solr.
- solr_user and solr_password are HTTP basic authentication credentials sent with requests to the backend. Solr itself does not support HTTP basic auth, but password protection is needed when CKAN reaches the server over public IP addresses. In that case Solr can be run behind a reverse proxy that enforces the password.
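To test a password-protected reverse proxy from the command line, a request along the following lines may help. This is only a sketch: the environment variables SOLR_USER and SOLR_PASSWORD and the function name solr_select are assumptions for this example (they are not read by CKAN), and the query simply asks the standard select handler for all documents without returning any rows.

```shell
#!/bin/sh
# Send a match-all query with rows=0 to the Solr select handler.
# HTTP basic credentials are sent only when SOLR_USER is set.
# $1: Solr URL, i.e. the value used for solr_url
solr_select() {
    url="${1%/}/select?q=*:*&rows=0"
    if [ -n "${SOLR_USER:-}" ]; then
        curl --silent --fail --user "$SOLR_USER:${SOLR_PASSWORD:-}" "$url"
    else
        curl --silent --fail "$url"
    fi
}
```

If the proxy is set up as intended, the call should fail without credentials and return a Solr response once SOLR_USER and SOLR_PASSWORD are exported.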
Schema updates
Each time the schema is updated, the index data needs to be dropped and regenerated. This can be done by running the following commands:
rm -rf /var/lib/solr/data/*
paster --plugin=ckan search-index --config=/etc/ckan/$INSTANCE/$INSTANCE.ini rebuild
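The two commands can be wrapped in a small function so that the instance name is given only once. This is a sketch, not part of the extension: the function name rebuild_solr_index and the PASTER override (useful for dry runs) are assumptions, and the data directory defaults to the path used above.

```shell
#!/bin/sh
# Drop the Solr index files and ask CKAN to rebuild the index.
# $1: CKAN instance name
# $2: Solr data directory (optional, defaults to /var/lib/solr/data)
rebuild_solr_index() {
    instance="$1"
    data_dir="${2:-/var/lib/solr/data}"
    if [ -z "$instance" ]; then
        echo "usage: rebuild_solr_index INSTANCE [DATA_DIR]" >&2
        return 2
    fi
    if [ ! -d "$data_dir" ]; then
        echo "no such directory: $data_dir" >&2
        return 1
    fi
    # Remove the index contents but keep the data directory itself.
    # ${data_dir:?} aborts instead of expanding to "/*" if the variable is empty.
    rm -rf "${data_dir:?}"/*
    # PASTER is a hypothetical override; by default the paster on PATH is used.
    "${PASTER:-paster}" --plugin=ckan search-index \
        --config="/etc/ckan/$instance/$instance.ini" rebuild
}
```

A call such as `rebuild_solr_index myinstance` would then clear /var/lib/solr/data and rebuild the index for the instance configured in /etc/ckan/myinstance/myinstance.ini (the name myinstance is only a placeholder).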
Multiple cores
Solr supports multi-core operation, which is convenient for running multiple services against the same server. It is not required for separating multiple CKAN installs, however: each install filters every request by ckan.site_id, so several installs can share one core without trouble. To set up multicore, follow these steps:
- Move any existing Solr core config in /var/lib/jetty/webapps/solr/conf to /var/lib/jetty/webapps/solr/ckan/conf.
- Move data in /var/lib/solr/data/ to /var/lib/solr/data/ckan. Make sure this is writeable by Jetty.
- Create the XML file /var/lib/jetty/webapps/solr/solr.xml with the following basic structure:
<solr persistent="true" sharedLib="lib">
  <cores adminPath="/admin/cores">
    <core name="ckan" instanceDir="ckan">
      <property name="dataDir" value="/var/lib/solr/data/ckan" />
    </core>
  </cores>
</solr>
- Adapt the CKAN solr_url to point at xxxx:8983/solr/ckan and reload apache2.
- Restart jetty.
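After the restart, the CoreAdmin handler declared by adminPath="/admin/cores" can be asked whether the core was loaded. The sketch below assumes curl is installed; the function names are made up for this example. Note that the first argument is the Solr root URL (ending in /solr), not the per-core URL used for solr_url.

```shell
#!/bin/sh
# Build the CoreAdmin STATUS URL for one core.
# $1: Solr root URL (without the core name), $2: core name
solr_core_status_url() {
    printf '%s/admin/cores?action=STATUS&core=%s\n' "${1%/}" "$2"
}

# Fetch the status document for one core; fails on HTTP errors.
solr_core_status() {
    curl --silent --fail "$(solr_core_status_url "$1" "$2")"
}
```

For instance, `solr_core_status http://localhost:8983/solr ckan` should return an XML status document that mentions the ckan core if the setup worked.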
To add a further core, copy the conf directory into a newly named subfolder of /var/lib/jetty/webapps/solr/, replace the schema.xml and create an empty data directory in /var/lib/solr/data that the jetty process can write to. Then add a matching core section to the XML file and restart jetty.
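The file operations for a further core could be scripted roughly as follows. This is a sketch under several assumptions: the existing core lives in the ckan subfolder as set up above, the function name add_solr_core is hypothetical, and making the data directory writeable for Jetty (for example via chown to whatever user Jetty runs as) is left to the administrator. The function prints a core section to paste into solr.xml rather than editing the file itself.

```shell
#!/bin/sh
# Prepare the directories for an additional Solr core.
# $1: new core name
# $2: path to the schema.xml for the new core
# $3: Solr home (optional, defaults to /var/lib/jetty/webapps/solr)
# $4: data root (optional, defaults to /var/lib/solr/data)
add_solr_core() {
    name="$1"
    schema="$2"
    solr_home="${3:-/var/lib/jetty/webapps/solr}"
    data_root="${4:-/var/lib/solr/data}"
    [ -n "$name" ] && [ -f "$schema" ] || {
        echo "usage: add_solr_core NAME SCHEMA [SOLR_HOME] [DATA_ROOT]" >&2
        return 2
    }
    [ -d "$solr_home/ckan/conf" ] || { echo "missing $solr_home/ckan/conf" >&2; return 1; }
    [ ! -e "$solr_home/$name" ] || { echo "core directory already exists" >&2; return 1; }
    mkdir -p "$solr_home/$name" "$data_root/$name" || return 1
    # Copy the existing core config as a starting point.
    cp -R "$solr_home/ckan/conf" "$solr_home/$name/conf" || return 1
    # Remove the copied schema first: if it is a symlink, copying over it
    # would overwrite the file the link points to.
    rm -f "$solr_home/$name/conf/schema.xml"
    cp "$schema" "$solr_home/$name/conf/schema.xml" || return 1
    # Print the section to add inside <cores> in solr.xml.
    printf '<core name="%s" instanceDir="%s">\n  <property name="dataDir" value="%s" />\n</core>\n' \
        "$name" "$name" "$data_root/$name"
}
```

After running it, the data directory still has to be made writeable for the jetty process, the printed section added to solr.xml, and jetty restarted.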
Extension info
- Extension info on ckanext-solr is at: List_of_Extensions#Apache_Solr_search_backend