Solr

From Wikitech

Solr is a Lucene-based search engine. WMF used it for translation memory and spatial searches, but has since switched to Elasticsearch.

Puppet

modules/solr + manifests/role/solr.pp

Operations

How to poke it with a stick

Checking if it's OK:

maxsem@fenari:~$ curl 'http://solr1001:8983/solr/admin/cores?action=STATUS'
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">2</int></lst> (loads of XML)
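
For scripted checks, the status value in the responseHeader can be pulled out of the XML instead of being read by eye. A minimal sketch, assuming the response has the shape shown above; the function names solr_status_code and check_solr are made up for illustration, and the default host is simply the one used in the example:

```shell
#!/bin/sh
# Read Solr XML on stdin and print the first responseHeader status value.
# A value of 0 means the request succeeded.
solr_status_code() {
    grep -o '<int name="status">[0-9]*</int>' | head -n 1 | sed 's/[^0-9]//g'
}

# Hypothetical wrapper: returns success only if the cores STATUS call
# reports status 0. The host argument defaults to solr1001.
check_solr() {
    host="${1:-solr1001}"
    status=$(curl -s "http://${host}:8983/solr/admin/cores?action=STATUS" | solr_status_code)
    [ "$status" = "0" ]
}
```

Usage would be along the lines of: check_solr solr1001 && echo OK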

Checking if search works:

$ curl 'http://solr1001:8983/solr/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on'
(loads of XML)

Checking replication status:

$ curl 'http://solr1001:8983/solr/replication?command=details'
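
To see whether a slave has caught up, the index version on the master and on the slave can be compared. A rough sketch using the replication handler's indexversion command; it assumes the response contains a `<long name="indexversion">` element, and the function names and the slave host name in the usage note are hypothetical:

```shell
#!/bin/sh
# Read a replication handler response on stdin and print the index version.
# Assumes an element of the form <long name="indexversion">N</long>.
solr_index_version() {
    grep -o '<long name="indexversion">[0-9]*</long>' | head -n 1 | sed 's/[^0-9]//g'
}

# Hypothetical helper: fetch the index version from one host.
fetch_index_version() {
    curl -s "http://$1:8983/solr/replication?command=indexversion" | solr_index_version
}

# Hypothetical helper: succeeds if both hosts report the same index version.
same_index_version() {
    [ "$(fetch_index_version "$1")" = "$(fetch_index_version "$2")" ]
}
```

For example: same_index_version solr1001 some-slave-host && echo "in sync"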

Get schema remotely:

$ curl 'http://solr1001:8983/solr/admin/file/?contentType=text/xml;charset=utf-8&file=schema.xml'

Restart

# service jetty restart

Upgrading schema

  • Delete existing documents:
$ curl 'http://localhost:8983/solr/update?commit=true&stream.body=%3Cdelete%3E%3Cquery%3E*%3A*%3C%2Fquery%3E%3C%2Fdelete%3E'
  • Stop the server:
# service jetty stop
  • Update schema.xml e.g. by forcing a puppet run:
# puppetd -tv
  • Reindex the data
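
The stream.body parameter in the delete step is just a URL-encoded XML delete command. The sketch below decodes it for reference and sends the same command as a POST body instead, which avoids the escaping. The function names are illustrative only, and the decoder handles only the four escapes that occur in that URL:

```shell
#!/bin/sh
# The delete-all command in plain XML.
DELETE_ALL='<delete><query>*:*</query></delete>'

# Decode the few percent-escapes used in the stream.body value above.
# This is not a general URL decoder.
decode_stream_body() {
    sed -e 's/%3C/</g' -e 's/%3E/>/g' -e 's/%3A/:/g' -e 's/%2F/\//g'
}

# Send the delete-all command as an XML POST body and commit.
# The base URL defaults to the local instance used in the steps above.
delete_all_documents() {
    base_url="${1:-http://localhost:8983/solr}"
    curl -s "${base_url}/update?commit=true" \
        -H 'Content-Type: text/xml; charset=utf-8' \
        --data-binary "$DELETE_ALL"
}
```

Like the original command, delete_all_documents wipes the whole index, so it only makes sense right before a full reindex.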

Logs

  • When jetty is restarted, it moves the current day's logs to *.<some number>

In /var/log/jetty:

  • request.log - requests
  • stderrout.log - errors and debug information

These logs are prefixed with "year_month_day.".

Examples:

  • 2012_12_19.stderrout.log.100139764
  • 2012_12_20.request.log
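
Since the file names follow a fixed pattern, the path of a given day's log can be built with date. A small sketch; jetty_log_path is a made-up helper name, and the directory is the one given above:

```shell
#!/bin/sh
# Print the path of a jetty log file.
#   $1: log kind, "request" or "stderrout"
#   $2: optional date prefix in year_month_day form (default: today)
jetty_log_path() {
    kind="$1"
    day="${2:-$(date +%Y_%m_%d)}"
    printf '/var/log/jetty/%s.%s.log\n' "$day" "$kind"
}
```

For example, tail -f "$(jetty_log_path stderrout)" follows today's error log, and ls "$(jetty_log_path stderrout)".* lists copies moved aside by restarts.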

GeoData

Disabling updates

GeoData is updated every 30 minutes by a cron job on terbium. To disable updates, create /tmp/disable-update-geodata on that server; deleting the file re-enables them.
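
The flag file can be handled with plain touch and rm. A minimal sketch to run on terbium; the function names and the FLAG variable are illustrative, and only the file path comes from the text above:

```shell
#!/bin/sh
# Path of the flag file; overridable through the environment for testing.
FLAG="${FLAG:-/tmp/disable-update-geodata}"

# Create the flag file so updates are skipped.
disable_geodata_updates() { touch "$FLAG"; }

# Remove the flag file so updates resume.
enable_geodata_updates() { rm -f "$FLAG"; }

# Succeeds if updates are currently disabled.
geodata_updates_disabled() { [ -e "$FLAG" ]; }
```

Creating the file does not stop an update that is already in progress, so allow for a run that started before the flag was set.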

Switching master

  • Disable updates (see above). Wait a while in case the update cron job was already running.
  • Change replication_master in manifests/role/solr.pp. When deploying this change, it's preferable to start with the new master.
  • Update $wgGeoDataSolrMaster and $wgGeoDataSolrHosts in CommonSettings.php. Remember to reduce the weight of the new master so that load spikes during replication don't get too high.

Packages

We use our own backports:

$ dpkg -l '*solr*'
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                       Version                    Description
+++-==========================-==========================-====================================================================
ii  libsolr-java               3.6.0+dfsg-1               Enterprise search server based on Lucene - Java libraries
ii  solr-common                3.6.0+dfsg-1               Enterprise search server based on Lucene3 - common files
ii  solr-jetty                 3.6.0+dfsg-1               Enterprise search server based on Lucene3 - Jetty integration

Links