Discovery/Analytics/Glent
Glent is a search platform project that generates query suggestions from search logs in a batch process. The suggestions are regenerated once a week and shipped to the production CirrusSearch clusters, where they are presented to users whose search query is similar to a query glent has suggestions for. This documentation is limited to the analytics portion of the glent suggestions pipeline. Querying the glent data and presenting suggestions to users is implemented in the CirrusSearch MediaWiki extension.
Development Environment
You will need Java and Maven for the analytics portion of glent.
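A quick way to confirm both tools are available before building is sketched below. The helper name have_tool is hypothetical, and the sketch assumes the Maven executable is called mvn on your PATH.

```shell
#!/bin/sh
# have_tool is a hypothetical helper: succeeds if the named command is on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Report whether each required tool can be found (assumes Maven is installed as "mvn").
for tool in java mvn; do
    if have_tool "$tool"; then
        echo "$tool: found"
    else
        echo "$tool: missing"
    fi
done
```

If either tool is reported as missing, install it before running the build commands in the sections that follow.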
Code
The source code is in the gerrit project search/glent. To start working with glent, clone the repository:
git clone https://gerrit.wikimedia.org/r/search/glent
Build
You can build the distribution package by running:
./mvnw package
and the package will be in the target/ directory.
Release
All deployments of glent code must be accompanied by a release to make the jars available to production services. Releases are handled by the maven-release-plugin and can be run in a two-stage process:
First, run
./mvnw release:prepare
in the source repository, which updates the version numbers. If your system username is different from the one in gerrit, use the -Dusername=... option.
Then run
./mvnw release:perform
in the source repository, which deploys the artifacts to archiva.
Note that for the above you will need archiva deployment credentials in your ~/.m2/settings.xml file for the archiva.releases and archiva.snapshots repositories.
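As a minimal sketch, the relevant part of ~/.m2/settings.xml could look like the fragment below. It assumes the standard Maven servers section, with one server entry per repository id named above. The username and password values are placeholders, not real credentials.

```xml
<settings>
  <servers>
    <!-- Credentials used when deploying release artifacts -->
    <server>
      <id>archiva.releases</id>
      <username>YOUR_ARCHIVA_USERNAME</username> <!-- placeholder -->
      <password>YOUR_ARCHIVA_PASSWORD</password> <!-- placeholder -->
    </server>
    <!-- Credentials used when deploying snapshot artifacts -->
    <server>
      <id>archiva.snapshots</id>
      <username>YOUR_ARCHIVA_USERNAME</username> <!-- placeholder -->
      <password>YOUR_ARCHIVA_PASSWORD</password> <!-- placeholder -->
    </server>
  </servers>
</settings>
```

Maven matches each server id against the repository id it is deploying to, so the ids must be spelled exactly as the repository names.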
If preparing the release fails, the local repository can be reset with:
./mvnw release:rollback
Deployment
The analytics portion of glent is run by an Airflow DAG in the gerrit project wikimedia/discovery/analytics. Once a new version of glent has been released, you will need to update the deployed jars.
Once the jars are updated, deploy the jars to Airflow production.