Help talk:Toolforge/Elasticsearch
Some notes on access control
This is a condensation of an IRC chat I had with WMF cloud admins in Jan 2022. While Elasticsearch in general provides fine-grained access control, the open-source version running in Toolforge does not. This is what is meant by "Elasticsearch does not offer multi-tenant access control in its open source version." The practical consequence is that any tool with write access to Elasticsearch has write access to every other tool's data, including the ability to delete all data in the database.
The limited access control implemented in Toolforge consists of an HTTP proxy in front of the Elasticsearch server, which filters based on the HTTP request method (i.e. GET vs POST).
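To make the filtering idea concrete, here is a minimal sketch of what a method-based check could look like. This is purely illustrative: the function name, the set of read-only methods, and the credentials flag are my assumptions, not the actual Toolforge proxy configuration, which was not spelled out in the chat.

```python
# Hypothetical sketch of a method-based filter. The real Toolforge proxy
# rules are not described in this thread; names and rules are assumptions.
READ_ONLY_METHODS = {"GET", "HEAD"}  # assumption: HEAD is treated like GET


def is_request_allowed(method: str, has_write_credentials: bool) -> bool:
    """Return True if a proxy using this rule would forward the request."""
    if method.upper() in READ_ONLY_METHODS:
        return True
    # Everything else (POST, PUT, DELETE, ...) counts as a write,
    # regardless of what the request body actually asks for.
    return has_write_credentials


print(is_request_allowed("GET", False))   # True
print(is_request_allowed("POST", False))  # False
```

Note that a rule like this only looks at the method, so a read-only query sent as POST is indistinguishable from a write.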
Further complicating this, some Elasticsearch clients (for example, opensearch 1.0 for Python) use POST requests to perform queries. Presumably this is to get around the URL length limits that apply to GET requests. The implication is that every Python client using opensearch needs write access, even if it only performs read-only operations such as search queries. Thus:
curl -s http://elasticsearch.svc.tools.eqiad1.wikimedia.cloud:80/spi-tools-dev-es-index/_count
will succeed:
{ "count" : 121302, "_shards" : { "successful" : 1, "failed" : 0, "total" : 1, "skipped" : 0 } }
but the equivalent operation in Python
from opensearchpy import OpenSearch
es = OpenSearch('elasticsearch.svc.tools.eqiad1.wikimedia.cloud:80')
print(es.count(index='/spi-tools-dev-es-index'))
will raise an AuthorizationException due to a 403 Forbidden HTTP return code. RoySmith (talk) 15:35, 31 January 2022 (UTC)
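Since the curl request above succeeds as a plain GET, one possible way to stay read-only from Python is to issue the GET directly with the standard library instead of going through the client. This is only a sketch: the helper names build_count_request and count_documents are made up for illustration, it has not been run against the Toolforge cluster, and it assumes the response has the same JSON shape as the curl output shown above.

```python
import json
import urllib.request

# Host taken from the curl example above.
BASE_URL = "http://elasticsearch.svc.tools.eqiad1.wikimedia.cloud:80"


def build_count_request(index: str) -> urllib.request.Request:
    """Build an explicit GET request for the index's _count endpoint."""
    url = f"{BASE_URL}/{index.strip('/')}/_count"
    return urllib.request.Request(url, method="GET")


def count_documents(index: str) -> int:
    """Send the GET request and return the "count" field of the JSON reply."""
    with urllib.request.urlopen(build_count_request(index)) as response:
        return json.load(response)["count"]
```

From inside Toolforge, count_documents("spi-tools-dev-es-index") would send the same request as the curl command. Queries that need a large request body would still run into the GET vs POST issue described above.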