Data Platform/Systems/OpenSearch-on-K8s/Administration
OpenSearch on K8s (WIP)
Intended audience: SREs who operate the OpenSearch on K8s platform.
Note: This platform is not yet in production. See T408586 ☂️ OpenSearch on K8s: Ensure platform is ready for production ☂️ for the latest on production status.
If you are a service owner, or are interested in deploying on the platform, you may be more interested in this page.
Dashboards and Alerts
See our OpenSearch on K8s dashboard. The dashboard contains a number of metrics we use to gauge health. Of particular interest are cluster state and thread pools. FIXME: Add more data
You can find alerts in the Wikimedia Foundation's alert repo.
Deploying a New OpenSearch on K8s Cluster
- Begin by following the general instructions for deploying a new Kubernetes service.
- Create/merge a patch that adds your new namespace to the list of tenantNamespaces in the deployment-charts repo (example CR). OpenSearch uses RBD, so you want to change the helmfile.d/admin_ng/values/dse-k8s-eqiad/ceph-csi-rbd-values.yaml file.
- After merging, you will need to deploy admin_ng for both dse-k8s clusters.
- Create/merge a patch that increases the default resources for your new namespace (example patch). This is needed because the version of the operator chart we use (2.7.0) does not allow changing the resources allotted to the bootstrap pod, and it needs at least 2 GB of RAM to stand up the cluster.
- Create and add secrets. By default, you will need the following secrets:
| Secret Name | Details |
|---|---|
| username | Used to access the OpenSearch REST API via basic auth. Set the username to the k8s namespace unless the service owner wants something else. |
| password | Used to access the OpenSearch REST API via basic auth. |
| hashed_password | Used to populate the OpenSearch security YAML config, which in turn is pushed to the OpenSearch Security API via securityadmin.sh. This must be a bcrypt hash. You can generate it with htpasswd, Python's bcrypt library, etc. |
- Create and merge a patch that tells the OpenSearch operator to watch your new namespace (example patch).
Phases of Deployment
1. The bootstrap, first master, and security pods are created:
NAME READY STATUS RESTARTS AGE
cluster-bootstrap-0 1/1 Running 0 61s
cluster-masters-0 0/1 Running 0 61s
cluster-securityconfig-update-f7wmp 1/1 Running 0 61s
During this phase, the securityconfig pod (which creates the .opendistro_security index, somewhat akin to the MySQL users table) continually polls the cluster, waiting for it to reach yellow status (in this context, when the bootstrap node and initial master node form a cluster). You will see a lot of Waiting to connect to the cluster messages in the securityconfig pod's logs; this is normal for the first few minutes of a deploy.
If you look at the bootstrap pod's logs, you might see logs complaining that the .opendistro_security index doesn't exist. This is also normal for the first few minutes of a deploy.
About the Bootstrap Node
The bootstrap node is a full-fledged OpenSearch cluster node, as opposed to a lightweight initContainer that just runs a script. So what makes it different from the other OpenSearch cluster nodes?
A) It starts before any other pods. I believe this is to prevent race conditions and ensure a predictable cluster formation process.
B) Its resources cannot be controlled directly within the chart, so we have to set them via namespace-specific container defaults (see this CR for an example). The bootstrap pod needs at least 2 GB of RAM to do its job.
2. The cluster has formed, but not all the masters have deployed:
kubectl get po
NAME READY STATUS RESTARTS AGE
cluster-bootstrap-0 1/1 Running 0 10m
cluster-masters-0 1/1 Running 0 10m
cluster-masters-1 0/1 Running 0 8m
cluster-securityconfig-update-f7wmp 0/1 Completed 0 10m
This phase begins once the securityconfig pod creates the .opendistro_security index. As shown above, this pod's status will change to Completed and the operator will create any additional masters.
3. The cluster is fully deployed:
kubectl get po
NAME READY STATUS RESTARTS AGE
cluster-masters-0 1/1 Running 0 63m
cluster-masters-1 1/1 Running 0 62m
cluster-masters-2 1/1 Running 0 60m
cluster-securityconfig-update-f7wmp 0/1 Completed 0 63m
This is what the cluster looks like after a successful deploy. Note that the operator has removed the bootstrap pod.
Testing
Pane 1
kubectl port-forward --address 0.0.0.0 cluster-masters-0 :9200
Forwarding from 0.0.0.0:45679 -> 9200
Pane 2
Set variables
You can find the username and password in /srv/git/private/hieradata/role/common/deployment_server/kubernetes.yaml on the Puppet server.
PW=user:password  # used for basic auth with cURL
PT=45679          # connect to the same port number from pane 1
with authentication
curl -ks -u ${PW} https://0:${PT}/_prometheus/metrics
without authentication
curl -ks https://0:${PT}/_cat/nodes
curl -ks https://0:${PT}/_cat/health
curl -ks https://0:${PT}/_prometheus/metrics
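When scripting these checks, it can help to gate on the health output rather than eyeballing it. A minimal sketch: health_is_green is a hypothetical helper that reads a _cat/health line and exits 0 only when the status column is green.

```shell
# Sketch: gate a script on cluster health. Reads one _cat/health line on
# stdin, e.g.:
#   1770324073 20:41:13 opensearch-ipoid green 3 3 true 132 63 0 0 0 0 - 100.0%
# and succeeds only when the status column (field 4) is "green".
health_is_green() {
  awk '{ exit ($4 == "green") ? 0 : 1 }'
}

# Usage (assumes PW/PT are set as above):
# curl -ks -u "${PW}" "https://0:${PT}/_cat/health" | health_is_green && echo healthy
```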
Deploying on an existing cluster
Changing Resources on a Live Cluster
The chart has several places to change requests/limits.
So far as I can tell, the one that actually applies is under
opensearchCluster:
nodePools:
- component: masters
roles:
- "master"
- "data"
resources:
requests:
memory: "4Gi"
cpu: "2000m"
limits:
memory: "4Gi"
cpu: "2000m"
As of this writing, that configuration is applied via this file in our deployment-charts repo. This means that if you want to override it, you need to add it to the values.yaml file specific to your deployment and/or helmfile release.
Again, as of this writing, the operator does not detect changes to the resources or source image.
As such, you'll have to delete the pods one-by-one to apply the resource changes. Note that since the PersistentVolumes and their PersistentVolumeClaims are not deleted, the data is not at risk. In practice it feels a lot more like a very quick reboot (delete the pod, the operator replaces it immediately, the new pod is connected to the existing OpenSearch data). It typically only takes a minute or two after deletion for the cluster to get back into its healthy "green" state.
But you should still take your time and ensure the cluster is healthy before moving on to the next pod. You can use the following API calls to check cluster health:
curl -H 'Accept:Application/yaml' -XGET https://opensearch-test.discovery.wmnet:30443/_cluster/health
---
cluster_name: "opensearch-test"
status: "yellow"
timed_out: false
number_of_nodes: 3
number_of_data_nodes: 3
discovered_master: true
discovered_cluster_manager: true
active_primary_shards: 8
active_shards: 18
relocating_shards: 0
initializing_shards: 0
unassigned_shards: 2
delayed_unassigned_shards: 0
number_of_pending_tasks: 0
number_of_in_flight_fetch: 0
task_max_waiting_in_queue_millis: 0
active_shards_percent_as_number: 90.0
---
This cluster is yellow. If it stays yellow for more than a few minutes after launching a new pod, further investigation is required. Did the new pod join the cluster? Are shards moving to it? If there are under-replicated shards, are they on the pod you want to delete?
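To answer those questions quickly, you can pull just the relevant fields out of the _cluster/health YAML response. A sketch, with a heredoc standing in for the curl output above (in practice, pipe the curl output in instead):

```shell
# Sketch: extract the fields relevant to a stuck-yellow investigation from
# the YAML form of _cluster/health.
extract_health_fields() {
  grep -E '^(status|number_of_nodes|unassigned_shards|initializing_shards):'
}

# The heredoc stands in for: curl -H 'Accept:Application/yaml' .../_cluster/health
extract_health_fields <<'EOF'
cluster_name: "opensearch-test"
status: "yellow"
number_of_nodes: 3
number_of_data_nodes: 3
active_shards: 18
unassigned_shards: 2
initializing_shards: 0
EOF
```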
curl -XGET https://opensearch-test.discovery.wmnet:30443/_cat/nodes
10.67.29.29 8 56 1 0.80 1.39 1.72 dm cluster_manager,data - opensearch-test-masters-2
10.67.28.200 36 83 4 0.83 1.13 1.45 dm cluster_manager,data * opensearch-test-masters-0
This cluster is missing a pod; wait for the third pod to finish deploying before deleting another one.
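The delete-then-wait-for-green procedure described above can be sketched as a loop. The pod names and health URL are examples; the kubectl/curl lines are shown commented out since they require cluster access:

```shell
# Sketch of a rolling pod refresh: delete one pod, wait for green, repeat.
# Pod names and the health endpoint are examples from this page.
for pod in cluster-masters-0 cluster-masters-1 cluster-masters-2; do
  echo "would delete ${pod}, then poll _cluster/health until green"
  # kubectl delete po "${pod}"
  # until curl -s https://opensearch-test.discovery.wmnet:30443/_cat/health \
  #     | awk '{ exit ($4 == "green") ? 0 : 1 }'; do
  #   sleep 10
  # done
done
```

Note the check on field 4: _cat/health puts the status (green/yellow/red) in the fourth column.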
Step by Step
1.
$ k get po -o yaml opensearch-ipoid-test-masters-0 | grep -i memory
memory: 2Gi
memory: 2Gi
We have applied the change via helmfile, but the operator hasn't detected the changes.
2.
k delete po opensearch-ipoid-test-masters-0
We've deleted one of the pods to force its replacement.
3.
k get po -o yaml opensearch-ipoid-test-masters-0 | grep -i memory
memory: 4Gi
memory: 4Gi
After a few minutes, the pod has been replaced and it has the new specs.
The operator should be handling this automatically. We'll continue investigating, but it's likely this will be fixed when we move to a newer version of the operator and/or helm chart.
Renewing TLS Certificates
The current version of the OpenSearch operator does not support hot reloading of certificates. Note that kubectl get cert is *NOT* sufficient to see which certificate is presented by the cluster; you have to hit the endpoint directly:
gnutls-cli --print-cert opensearch-ipoid.svc.codfw.wmnet:30443 | grep expire
- subject `CN=opensearch-wmf', issuer `CN=discovery,OU=SRE Foundations,O=Wikimedia Foundation\, Inc,L=San Francisco,C=US', serial 0x7be936f0d3f3aaeb1514cb41a7c4b13343a5b2eb, RSA key 2048 bits, signed using ECDSA-SHA512, activated `2025-12-10 17:15:00 UTC', expires `2026-01-07 17:15:00 UTC', pin-sha256="AcCR/RPH1ogjJfEnPi91ltR4uCLEY7bFtwcVIgSFfCU="
- subject `CN=discovery,OU=SRE Foundations,O=Wikimedia Foundation\, Inc,L=San Francisco,C=US', issuer `CN=Wikimedia_Internal_Root_CA,OU=Cloud Services,O=Wikimedia Foundation\, Inc,L=San Francisco,ST=California,C=US', serial 0x715331115b69e7112b0e3c7f8c89ce15c51a4639, EC/ECDSA key 528 bits, signed using ECDSA-SHA512, activated `2021-05-04 13:54:00 UTC', expires `2026-05-03 13:54:00 UTC', pin-sha256="PbgfDlEHVB4Zw0a42zNqqnEQbcYF9jYp/dbT4eSdOb8="
*** Fatal error: Error in the certificate.
- Status: The certificate is NOT trusted. The certificate chain uses expired certificate.
To force the operator to regenerate certificates:
1) Prepare a patch that adds a dummy SAN (example).
2) From the deployment server, check the age of the `opensearch-wmf` Kubernetes certificate resource:
k get cert
NAME READY SECRET AGE
opensearch-wmf True opensearch-wmf 57d
3) As the ${NS}-deploy user, delete the current certificate:
k delete cert opensearch-wmf
Note that this won't break the current installation, as the certificates are already part of the pod data.
4) Pull down the patch you created and do a homedir deploy (full details on what this is and how to do it are in the link).
5) Check the health of the cluster
curl https://opensearch-ipoid.svc.codfw.wmnet:30443/_cat/health
1770324073 20:41:13 opensearch-ipoid green 3 3 true 132 63 0 0 0 0 - 100.0%
Assuming the health is green, perform a rolling refresh of the pods as described above.
6) Verify that your endpoint is serving the new certificate:
gnutls-cli --print-cert opensearch-ipoid.svc.codfw.wmnet:30443 | grep expire
- subject `CN=opensearch-wmf', issuer `CN=discovery,OU=SRE Foundations,O=Wikimedia Foundation\, Inc,L=San Francisco,C=US', serial 0x69ad031ab6ce85e4abf0a1be545c95e0d716daef, RSA key 2048 bits, signed using ECDSA-SHA512, activated `2026-02-05 19:35:00 UTC', expires `2026-03-05 19:35:00 UTC', pin-sha256="AcCR/RPH1ogjJfEnPi91ltR4uCLEY7bFtwcVIgSFfCU="
7) Confirm that all certificate alerts have cleared. The Blackbox probes that monitor certificate expiration come from outside K8s, so they could be hitting any one of the three pods. If the alerts don't all clear, you may have forgotten to cycle through all of the pods.
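To sanity-check an `expires` timestamp from the gnutls-cli output, you can convert it to days remaining. A sketch, assuming GNU date; the expiry value is an example copied from the output above:

```shell
# Sketch: days until certificate expiry, from a gnutls-cli "expires" value.
# Assumes GNU date (-d).
expiry='2026-03-05 19:35:00 UTC'  # example value from the gnutls-cli output
now=$(date +%s)
exp=$(date -d "$expiry" +%s)
echo "days until expiry: $(( (exp - now) / 86400 ))"
```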
API Calls
Audit Logging
PT=43885; curl -H 'Content-type: Application/json' -XPOST -k -u ${PW} https://0:${PT}/security-auditlog-2025.10.31/_search?pretty
Forcing Shard Reallocation
! You will need the `operator` OpenSearch user as opposed to the typical `opensearch` user!
Dynamic and Static Configuration settings
| Aspect | Static | Dynamic |
|---|---|---|
| Where to set | opensearch.yml. If using the opensearch-cluster helm chart, additionalConfig is added to opensearch.yml | Set via the OpenSearch API |
| Activating | You have to restart the OpenSearch service. Theoretically, this should be handled by the operator, but you may need a rollout restart if the operator does not apply the settings on its own | Settings are active as soon as OpenSearch accepts your API call |
| How to tell if setting is known to OpenSearch | You will NOT see the changes in configMaps or inside the pod's opensearch.yml, as the operator adds new settings as environment variables. You can run env inside the pod to check whether the setting exists. | Check via the OpenSearch API |
| How to tell if setting is applied | OpenSearch config follows Postel's law: it will accept invalid config (for example, see this change where I added config that OpenSearch accepted but that never really worked). You may need to do more investigation to verify the setting is 1) correct and 2) active. OpenSearch will ignore some settings even when they are correct and active if they would have detrimental effects on the cluster (for example, it will allow shards to land on a banned node if there is no other place to put them). | Same |
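As a sketch of applying a dynamic setting: the standard OpenSearch _cluster/settings endpoint takes a transient or persistent block. PW/PT are assumed to be set as in the Testing section, so the curl line is shown commented out; cluster.routing.allocation.enable is just one example of a dynamic setting.

```shell
# Sketch: apply a dynamic setting via the _cluster/settings API.
# The payload shape ("transient" vs "persistent") is standard OpenSearch.
PAYLOAD='{"transient": {"cluster.routing.allocation.enable": "primaries"}}'

# Assumes PW/PT are set as in the Testing section:
# curl -ks -u "${PW}" -H 'Content-Type: application/json' -XPUT \
#   "https://0:${PT}/_cluster/settings" --data "$PAYLOAD"
```

Transient settings are lost on a full cluster restart; use a persistent block if the setting should survive one.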
Disk Usage
How to Check
By default, the OpenSearch on K8s pods have 30 GB disks. You can check disk usage via the following API call:
curl -s https://opensearch-ipoid.svc.eqiad.wmnet:30443/_cat/allocation?v
shards disk.indices disk.used disk.avail disk.total disk.percent host ip node
21 8.3gb 8.3gb 20.9gb 29.3gb 28 opensearch-ipoid-masters-2 10.67.28.219 opensearch-ipoid-masters-2
21 4.6gb 4.6gb 24.6gb 29.3gb 15 opensearch-ipoid-masters-0 10.67.26.116 opensearch-ipoid-masters-0
22 13gb 13gb 16.2gb 29.3gb 44 opensearch-ipoid-masters-1 10.67.28.6 opensearch-ipoid-masters-1
Disk usage is also visible via this panel on the OpenSearch on K8s dashboard.
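When scripting a disk check, you can flag nodes above a usage threshold directly from the _cat/allocation?v output. A sketch: flag_full_nodes is a hypothetical helper, and the 75% threshold is arbitrary; per the header above, field 6 is disk.percent and field 9 is the node name.

```shell
# Sketch: print nodes whose disk.percent exceeds a threshold, given
# _cat/allocation?v output on stdin (NR > 1 skips the header row).
flag_full_nodes() {
  awk -v max=75 'NR > 1 && $6 + 0 >= max { print $9, $6 "%" }'
}

# Usage:
# curl -s https://opensearch-ipoid.svc.eqiad.wmnet:30443/_cat/allocation?v | flag_full_nodes
```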
How to Expand
1. Merge a change that raises the disk size. Note that in the current version of the opensearch-cluster helm chart (2.7.0), you will have to override the entire nodePools object within the opensearchCluster object, see this CR for an example.
2. Apply the change using our standard deployment process (helmfile apply).
3. The change should be visible in OpenSearch immediately. Verify by using the same call as above:
curl -s https://opensearch-ipoid.svc.eqiad.wmnet:30443/_cat/allocation?v
shards disk.indices disk.used disk.avail disk.total disk.percent host ip node
25 6.4gb 6.4gb 32.7gb 39.2gb 16 opensearch-ipoid-masters-0 10.67.29.37 opensearch-ipoid-masters-0
24 4.6gb 4.6gb 34.5gb 39.2gb 11 opensearch-ipoid-masters-2 10.67.30.227 opensearch-ipoid-masters-2
24 12.8gb 12.8gb 26.3gb 39.2gb 32 opensearch-ipoid-masters-1 10.67.28.5 opensearch-ipoid-masters-1
Now we have ~40GB of disk space per pod instead of the previous ~30GB!