Help talk:Toolforge/Elasticsearch


Some notes on access control

This is a condensation of an IRC chat I had with WMF cloud admins in Jan 2022. While Elasticsearch in general provides fine-grained access control, the open-source version running in Toolforge does not. This is what is meant by "Elasticsearch does not offer multi-tenant access control in its open source version." The practical consequence is that any tool with write access to Elasticsearch has write access to every other tool's data, including the ability to delete all data in the database.

The limited access control implemented in Toolforge consists of an HTTP proxy in front of the Elasticsearch server, which filters based on the HTTP request method (i.e. GET vs POST).
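As a rough model of that filtering (not the actual proxy configuration, which is not shown here), the decision depends only on the request method and on whether the caller has write credentials. The function name `is_allowed` and the set of methods treated as read-only are assumptions made for illustration:

```python
# Hypothetical model of a method-based filter; NOT the real Toolforge proxy config.
READ_ONLY_METHODS = {"GET", "HEAD"}  # assumption: which methods count as reads


def is_allowed(method: str, has_write_access: bool) -> bool:
    """Return True if a request with this HTTP method would pass the filter."""
    if method.upper() in READ_ONLY_METHODS:
        return True
    # Anything else (POST, PUT, DELETE, ...) is treated as a write.
    return has_write_access


# A caller without write credentials: only read-style methods get through.
for method in ("GET", "POST", "DELETE"):
    print(method, is_allowed(method, has_write_access=False))
```

Because a filter like this looks only at the method, it cannot tell a search sent as POST apart from a genuine write.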

Further complicating this, some Elasticsearch clients (for example, opensearch 1.0 for Python) use POST requests to perform queries. Presumably this is to get around the URL length limit that applies when a query is encoded into a GET request. The implication is that every Python client using opensearch needs write access, even if it only performs read-only operations such as search queries. Thus:

curl -s http://elasticsearch.svc.tools.eqiad1.wikimedia.cloud:80/spi-tools-dev-es-index/_count 

will succeed:

{
  "count" : 121302,
  "_shards" : {
     "successful" : 1,
     "failed" : 0,
     "total" : 1,
     "skipped" : 0
  }
}

but the equivalent operation in Python

from opensearchpy import OpenSearch

es = OpenSearch('elasticsearch.svc.tools.eqiad1.wikimedia.cloud:80')
print(es.count(index='/spi-tools-dev-es-index'))

will raise an AuthorizationException due to a 403 Forbidden HTTP return code. RoySmith (talk) 15:35, 31 January 2022 (UTC)
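
For read-only scripts, one way to stay within GET is to skip the client library and issue the request directly, mirroring the curl command above. This is a minimal sketch using only the Python standard library. It assumes the proxy lets unauthenticated GET requests through, as the curl example suggests; `build_count_request` and `count_documents` are hypothetical helper names:

```python
import json
import urllib.request

BASE_URL = "http://elasticsearch.svc.tools.eqiad1.wikimedia.cloud:80"


def build_count_request(index: str) -> urllib.request.Request:
    """Build an explicit GET request for the index's _count endpoint."""
    return urllib.request.Request(f"{BASE_URL}/{index}/_count", method="GET")


def count_documents(index: str) -> int:
    """Send the GET request and return the "count" field of the JSON reply."""
    with urllib.request.urlopen(build_count_request(index)) as response:
        return json.load(response)["count"]
```

Calling `count_documents("spi-tools-dev-es-index")` from inside Toolforge should return the same count as the curl command, provided the proxy behaves as described.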