Discovery/Analytics/Glent

Glent is a search platform project that generates query suggestions from search logs in a batch process. The suggestions are regenerated once a week and shipped to the production CirrusSearch clusters, where they are presented to users whose search query is similar to a query glent has suggestions for. This documentation is limited to the analytics portion of the glent suggestions pipeline. Querying the glent data and presenting suggestions to users is implemented in the CirrusSearch MediaWiki extension.

Development Environment

You will need Java and Maven for the analytics portion of glent.
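
A quick way to confirm both tools are installed is to look them up on the PATH. The helper below is a minimal sketch; the function name missing_tools is hypothetical and not part of glent, and the command names java and mvn are assumed to be the usual ones for these tools.

```shell
# Hypothetical helper (not part of glent): report which of the named
# commands cannot be found on PATH. Returns 0 only if all are present.
missing_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            status=1
        fi
    done
    return "$status"
}

# Usage (assumes the standard command names):
#   missing_tools java mvn && echo "ready to build"
```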

Code

The source code is in the Gerrit project search/glent. To start working with glent, clone the repository:

 git clone https://gerrit.wikimedia.org/r/search/glent

Build

You can build the distribution package by running:

 ./mvnw package

and the package will be in the target/ directory.

Release

All deployments of glent code must be accompanied by a release to make the jars available to production services. Releases are handled by the maven-release-plugin and run as a two-stage process:

  1. ./mvnw release:prepare in the source repository, which updates the version numbers. If your system username is different from the one in Gerrit, use the -Dusername=... option.
  2. ./mvnw release:perform in the source repository, which deploys the artifacts to Archiva.

Note that both steps require Archiva deployment credentials in your ~/.m2/settings.xml file for the archiva.releases and archiva.snapshots repositories.
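
Maven reads such credentials from server entries in settings.xml, each identified by an id element. The sketch below is a crude textual check that an id appears in the file; the function name has_server_id is hypothetical, and it does not validate the credentials or confirm that the id sits inside a server entry.

```shell
# Hypothetical helper: return 0 if the settings file contains an
# <id> element with exactly the given value. This is a plain text
# match, not an XML parse, so treat a positive result as a hint only.
has_server_id() {
    settings_file="$1"
    server_id="$2"
    # -F treats the pattern literally, so the dots in the id are not wildcards.
    grep -F -q "<id>${server_id}</id>" "$settings_file"
}

# Usage:
#   for id in archiva.releases archiva.snapshots; do
#       has_server_id "$HOME/.m2/settings.xml" "$id" || echo "no entry for $id"
#   done
```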

If there is a problem while preparing the release, the local repository can be reset with:

 ./mvnw release:rollback
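
To reduce the chance of needing a rollback, the prepare step can be rehearsed first. The sketch below relies on the maven-release-plugin's dryRun parameter and its release:clean goal; the wrapper function release_dry_run and the MVNW variable are hypothetical conveniences, not part of glent.

```shell
# Hypothetical wrapper (not part of glent). MVNW defaults to the Maven
# wrapper in the source repository and can be overridden for testing.
MVNW="${MVNW:-./mvnw}"

release_dry_run() {
    # -DdryRun=true makes release:prepare simulate the version changes
    # without committing or tagging; extra arguments are passed through.
    "$MVNW" release:prepare -DdryRun=true "$@" || return 1
    # release:clean removes the files the rehearsal leaves behind.
    "$MVNW" release:clean
}

# Usage, from the root of the source repository:
#   release_dry_run
#   release_dry_run -Dusername=...
```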

Deployment

The analytics portion of glent is run by an Airflow DAG in the Gerrit project wikimedia/discovery/analytics. Once a new version of glent has been released, you will need to update the deployed jars.

Once the jars are updated, deploy them to Airflow production.