Solr
This page contains historical information. It may be outdated or unreliable.
Solr is a Lucene-based search engine. WMF used it for translation memory and spatial searches, but switched to Elasticsearch.
Puppet
modules/solr + manifests/role/solr.pp
Operations
How to poke it with a stick
Checking if it's OK:
maxsem@fenari:~$ curl http://solr1001:8983/solr/admin/cores?action=STATUS
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">2</int></lst>
(loads of XML)
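For scripted monitoring, the status check can be reduced to an exit code. This is only a minimal sketch: the helper name `solr_status_ok` is hypothetical, and it assumes a healthy response carries `<int name="status">0</int>` in its header, as in the sample output.

```shell
#!/bin/sh
# Hypothetical helper: reads a Solr XML response on stdin and succeeds
# only if the response header reports status 0.
solr_status_ok() {
    grep -q '<int name="status">0</int>'
}

# Example use (host and port taken from the check above):
#   curl -s 'http://solr1001:8983/solr/admin/cores?action=STATUS' | solr_status_ok \
#       && echo "solr OK" || echo "solr NOT OK"
```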
Checking if search works:
$ curl 'http://solr1001:8983/solr/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on'
(loads of XML)
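When only the size of the index matters (for example, after a reindex), the document count can be pulled out of the response instead of reading the XML by eye. A rough sketch, assuming the XML response contains a `<result ... numFound="N" ...>` element; the helper name `solr_num_found` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: extracts the numFound attribute from a Solr XML
# select response read on stdin. Prints nothing if the attribute is absent.
solr_num_found() {
    sed -n 's/.*numFound="\([0-9][0-9]*\)".*/\1/p' | head -n 1
}

# Example use: rows=0 asks for the count only, without any documents.
#   curl -s 'http://solr1001:8983/solr/select/?q=*%3A*&rows=0' | solr_num_found
```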
Checking replication status:
$ curl http://solr1001:8983/solr/replication?command=details
Get schema remotely:
$ curl 'http://solr1001:8983/solr/admin/file/?contentType=text/xml;charset=utf-8&file=schema.xml'
Restart
# service jetty restart
Upgrading schema
- Delete existing documents:
$ curl 'http://localhost:8983/solr/update?commit=true&stream.body=%3Cdelete%3E%3Cquery%3E*%3A*%3C%2Fquery%3E%3C%2Fdelete%3E'
- Stop the server:
# service jetty stop
- Update schema.xml e.g. by forcing a puppet run:
# puppetd -tv
- Reindex the data
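The `stream.body` parameter in the delete step is the percent-encoded form of `<delete><query>*:*</query></delete>`. As a sketch, the same command can be sent as an XML POST body, which avoids encoding it by hand. This assumes the update handler accepts XML over POST, as a stock Solr 3.x configuration should; both function names are hypothetical.

```shell
#!/bin/sh
# XML command that deletes every document (decoded form of the
# stream.body parameter used in the delete step).
delete_all_body() {
    printf '%s' '<delete><query>*:*</query></delete>'
}

# Hypothetical wrapper: posts the delete command and commits.
# $1 is the Solr base URL; defaults to the local instance.
solr_delete_all() {
    base_url=${1:-http://localhost:8983/solr}
    curl -s "${base_url}/update?commit=true" \
        -H 'Content-Type: text/xml; charset=utf-8' \
        --data-binary "$(delete_all_body)"
}
```

Run `solr_delete_all` on the Solr host itself, or pass another base URL as the first argument.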
Logs
- If jetty was restarted, it moves today's logs to *.<some number>
In /var/log/jetty:
- request.log - requests
- stderrout.log - errors and debug information
These logs are prefixed with "year_month_day.".
Examples:
- 2012_12_19.stderrout.log.100139764
- 2012_12_20.request.log
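Given the naming scheme, the path of the current log can be built from the date. A small sketch; the helper name `jetty_log_path` is hypothetical, and it only covers the current file, not logs that were renamed with a numeric suffix after a restart.

```shell
#!/bin/sh
# Hypothetical helper: builds the path of a jetty log file.
# $1: log kind ("request" or "stderrout")
# $2: date prefix in year_month_day form; defaults to today
jetty_log_path() {
    kind=$1
    day=${2:-$(date +%Y_%m_%d)}
    printf '%s\n' "/var/log/jetty/${day}.${kind}.log"
}

# Example use: follow today's error log.
#   tail -f "$(jetty_log_path stderrout)"
```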
GeoData
Disabling updates
GeoData is updated every 30 minutes by a cron job on terbium. To disable updates, create /tmp/disable-update-geodata on that server; deleting the file re-enables them.
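The flag file can be wrapped in a few helpers so that the current state is easy to check before and after maintenance. This is only a sketch: the function names and the `GEODATA_FLAG` variable are hypothetical, and the only thing taken from the actual setup is the flag file path.

```shell
#!/bin/sh
# Flag file whose presence disables GeoData updates (path from the text).
# Overridable here only so the sketch can be tried outside terbium.
GEODATA_FLAG=${GEODATA_FLAG:-/tmp/disable-update-geodata}

geodata_disable_updates() { touch "$GEODATA_FLAG"; }
geodata_enable_updates()  { rm -f "$GEODATA_FLAG"; }

# Prints "disabled" if the flag file exists, "enabled" otherwise.
geodata_update_state() {
    if [ -e "$GEODATA_FLAG" ]; then echo disabled; else echo enabled; fi
}
```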
Switching master
- Disable updates (see above). Wait a while in case the update cron job was already running.
- Change replication_master in manifests/role/solr.pp. When deploying this change, it's preferable to start with the new master.
- Update $wgGeoDataSolrMaster and $wgGeoDataSolrHosts in CommonSettings.php. Don't forget to reduce the weight of the new master so that spikes during replication don't get too high.
Packages
We use our own backports:
$ dpkg -l *solr*
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                       Version                    Description
+++-==========================-==========================-====================================================================
ii  libsolr-java               3.6.0+dfsg-1               Enterprise search server based on Lucene - Java libraries
ii  solr-common                3.6.0+dfsg-1               Enterprise search server based on Lucene3 - common files
ii  solr-jetty                 3.6.0+dfsg-1               Enterprise search server based on Lucene3 - Jetty integration
Links
- Documentation (pretty crappy)