OpenSearch on K8s (WIP)
Intended audience: SREs who operate the OpenSearch on K8s platform.
Note: This platform is not yet in production. See T408586 (☂️ OpenSearch on K8s: Ensure platform is ready for production ☂️) for the latest on production status.
If you are a service owner or are interested in deploying on the platform, you may be more interested in this page.
Dashboards and Alerts
See our OpenSearch on K8s dashboard. The dashboard contains a number of metrics we use to gauge health. Of particular interest are cluster state and thread pools. FIXME: Add more data
You can find alerts in the WMF's alert repo. FIXME: Update with alert details once alerts actually exist.
Deploying a New OpenSearch on K8s Cluster
- Begin by following the general instructions for deploying a new Kubernetes service.
- Create/merge a patch that increases the default resources for your new namespace (example patch). This is needed because the version of the operator chart we use (2.7.0) does not allow changing the resources allotted to the bootstrap pod, which needs at least 2 GB of RAM to stand up the cluster.
- Create and add secrets. By default, you will need the following secrets:
| Secret Name | Details |
|---|---|
| username | Used to access the OpenSearch REST API via basic auth. Set the username to the k8s namespace unless the service owner wants something else. |
| password | Used to access the OpenSearch REST API via basic auth. |
| hashed_password | Used to populate the OpenSearch security YAML config, which in turn is pushed to the OpenSearch Security API via securityadmin.sh. This must be a bcrypt hash. You can generate it with htpasswd, Python's bcrypt library, etc. |
- Create and merge a patch that tells the OpenSearch operator to watch your new namespace (example patch).
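The hashed_password secret above must be a bcrypt hash. A minimal sketch of generating all three secrets, assuming openssl is available and htpasswd (from apache2-utils / httpd-tools) for the hashing; the username value here is a hypothetical placeholder:

```shell
# Hypothetical: set the username to your k8s namespace unless the
# service owner wants something else.
USERNAME="my-namespace"
# 24 random bytes -> 32-character base64 password.
PASSWORD=$(openssl rand -base64 24)

# bcrypt hash for the hashed_password secret (skipped if htpasswd is absent).
if command -v htpasswd >/dev/null 2>&1; then
  HASHED_PASSWORD=$(htpasswd -nbBC 10 "" "$PASSWORD" | tr -d ':\n')
  echo "hashed_password: $HASHED_PASSWORD"
fi

echo "username: $USERNAME"
echo "password: $PASSWORD"
```

The `htpasswd -nbBC 10 "" …` idiom hashes with bcrypt (cost 10) for an empty user, and the `tr` strips the leading colon and trailing newline, leaving just the hash.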
Changing Resources on a Live Cluster
The chart has several places to change requests/limits.
As far as I can tell, the one that actually takes effect is under:
opensearchCluster:
  nodePools:
    - component: masters
      roles:
        - "master"
        - "data"
      resources:
        requests:
          memory: "4Gi"
          cpu: "2000m"
        limits:
          memory: "4Gi"
          cpu: "2000m"
As of this writing, that configuration is applied via this file in our deployment-charts repo. That means if you want to override it, you need to add it to the values.yaml file specific to your deployment and/or helmfile release.
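For example, an override in your deployment-specific values file might look like the following sketch (the memory/CPU figures are placeholders; note that Helm replaces a list like nodePools wholesale rather than merging it, so repeat the full nodePool entry):

```yaml
# Hypothetical override in a deployment-specific values.yaml /
# helmfile release values file.
opensearchCluster:
  nodePools:
    - component: masters
      roles:
        - "master"
        - "data"
      resources:
        requests:
          memory: "8Gi"
          cpu: "4000m"
        limits:
          memory: "8Gi"
          cpu: "4000m"
```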
Again, as of this writing, the operator does not detect changes to the resources. As such, you'll have to delete the pods one by one to apply the resource changes. Take your time and ensure the cluster is healthy before moving on to the next pod!
1. Check the pod's current resource settings:
   $ k get po -o yaml opensearch-ipoid-test-masters-0 | grep -i memory
     memory: 2Gi
     memory: 2Gi
2. Delete the pod so the operator recreates it with the new resources:
   $ k delete po opensearch-ipoid-test-masters-0
3. Once the replacement pod is running, verify the new settings:
   $ k get po -o yaml opensearch-ipoid-test-masters-0 | grep -i memory
     memory: 4Gi
     memory: 4Gi
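The steps above can be sketched as a loop. The pod names below extend the ipoid-test example, and the three-pod count is an assumption; adjust both for your deployment:

```shell
# Hypothetical pod list based on the ipoid-test example.
PODS="opensearch-ipoid-test-masters-0 opensearch-ipoid-test-masters-1 opensearch-ipoid-test-masters-2"

if ! command -v kubectl >/dev/null 2>&1; then
  echo "kubectl not found; run this from a deployment server"
else
  for pod in $PODS; do
    kubectl delete po "$pod"
    # Wait for the operator to recreate the pod and for it to become Ready.
    kubectl wait --for=condition=Ready "pod/$pod" --timeout=15m
    # Before moving on, also confirm cluster health (GET /_cluster/health)
    # is green, per the warning above.
  done
fi
```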
The operator should be handling this automatically. We'll continue investigating, but it's likely this will be fixed when we move to a newer version of the operator and/or Helm chart.
API Calls
Audit Logging
Search an audit log index (${PW} is assumed to hold the basic-auth credentials as user:password):

PT=43885; curl -sk -H 'Content-Type: application/json' -u "${PW}" "https://localhost:${PT}/security-auditlog-2025.10.31/_search?pretty"
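A slightly richer sketch that filters the audit log by category; the audit_category field and FAILED_LOGIN value come from the OpenSearch security plugin's audit schema, and the port and ${PW} credentials (user:password) are the same assumptions as above. The reachability check lets the snippet no-op when no cluster is listening locally:

```shell
PT=43885
# Skip when nothing is listening locally (e.g. outside the cluster).
if curl -sk --max-time 2 -o /dev/null "https://localhost:${PT}"; then
  curl -sk -u "${PW}" -H 'Content-Type: application/json' \
    "https://localhost:${PT}/security-auditlog-*/_search?pretty" \
    -d '{"query": {"term": {"audit_category": "FAILED_LOGIN"}}, "size": 10}'
else
  echo "OpenSearch not reachable on localhost:${PT}"
fi
```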