
Data Platform/Systems/OpenSearch-on-K8s/Administration

From Wikitech

OpenSearch on K8s (WIP)

Intended audience: SREs who operate the OpenSearch on K8s platform.

Note: This platform is not yet in production. See T408586 ☂️ OpenSearch on K8s: Ensure platform is ready for production ☂️ for the latest on production status.

If you are a service owner, or are interested in deploying on the platform, you may be more interested in this page.

Dashboards and Alerts

See our OpenSearch on K8s dashboard. The dashboard contains a number of metrics we use to gauge health. Of particular interest are cluster state and thread pools. FIXME: Add more data

You can find alerts in the Wikimedia Foundation's alert repo.

Deploying a New OpenSearch on K8s Cluster

  • Begin by following the general instructions for deploying a new Kubernetes service.
  • Create/merge a patch that adds your new namespace to the list of tenantNamespaces in deployment-charts repo (example CR). OpenSearch uses RBD, so you want to change the helmfile.d/admin_ng/values/dse-k8s-eqiad/ceph-csi-rbd-values.yaml file.
  • After merging, you will need to deploy admin_ng for both dse-k8s clusters.
  • Create/merge a patch that increases default resources for your new namespace (example patch). This is needed because the version of the operator chart we use (2.7.0) does not allow changing the resources allotted to the bootstrap pod, which needs at least 2 GB of RAM to stand up the cluster.
  • Create and add secrets. By default, you will need the following secrets:
    username: Used to access the OpenSearch REST API via basic auth. Set the username to the k8s namespace unless the service owner wants something else.
    password: Used to access the OpenSearch REST API via basic auth.
    hashed_password: Used to populate the opensearch security YAML config, which in turn is pushed to the OpenSearch Security API via securityadmin.sh. This must be a bcrypt hash; you can generate it with htpasswd, Python's bcrypt library, etc.
  • Create and merge a patch that tells the OpenSearch operator to watch your new namespace. Example patch
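The hashed_password secret mentioned above can be generated in a few ways; here is one sketch using htpasswd from apache2-utils (the cost factor of 10 is an assumption, not a platform requirement), along with a shape check against an illustrative example hash:

```shell
# Generate a bcrypt hash for the hashed_password secret (a sketch; assumes
# htpasswd from apache2-utils is installed, and a cost factor of 10):
#   htpasswd -nbBC 10 "" "$PASSWORD" | tr -d ':\n'
# The result should look like $2y$<cost>$<22-char salt><31-char digest>.
# An illustrative example hash, checked against the expected bcrypt shape:
example='$2y$10$N9qo8uLOickgx2ZMRZoMyeIjZAgcfl7p92ldGxad68LJZdL17lhWy'
echo "$example" | grep -Eq '^\$2[aby]\$[0-9]{2}\$[./A-Za-z0-9]{53}$' && echo "looks like bcrypt"
```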

Phases of Deployment

1. The bootstrap, first master, and security pods are created:

NAME                                                              READY   STATUS    RESTARTS   AGE
cluster-bootstrap-0                                               1/1     Running   0          61s
cluster-masters-0                                                 0/1     Running   0          61s
cluster-securityconfig-update-f7wmp                               1/1     Running   0          61s

During this phase, the securityconfig pod (which creates the .opendistro_security index, somewhat akin to the MySQL users table) continually polls the cluster, waiting for it to reach yellow status (in this context, when the bootstrap node and initial master node form a cluster). You will see a lot of "Waiting to connect to the cluster" messages in the securityconfig pod's logs; this is normal for the first few minutes of a deploy.

If you look at the bootstrap pod's logs, you might see logs complaining that the .opendistro_security index doesn't exist. This is also normal for the first few minutes of a deploy.


2. The cluster has formed, but not all the masters have deployed:

kubectl get po
NAME                                                              READY   STATUS      RESTARTS   AGE
cluster-bootstrap-0                                               1/1     Running     0          10m
cluster-masters-0                                                 1/1     Running     0          10m
cluster-masters-1                                                 0/1     Running     0          8m
cluster-securityconfig-update-f7wmp                               0/1     Completed   0          10m

This phase begins once the securityconfig pod creates the .opendistro_security index. As shown above, this pod's status will change to Completed and the operator will create any additional masters.

3. The cluster is fully deployed:

kubectl get po
NAME                                                              READY   STATUS      RESTARTS   AGE
cluster-masters-0                                                 1/1     Running     0          63m
cluster-masters-1                                                 1/1     Running     0          62m
cluster-masters-2                                                 1/1     Running     0          60m
cluster-securityconfig-update-f7wmp                               0/1     Completed   0          63m

This is what the cluster looks like after a successful deploy. Note that the operator has removed the bootstrap pod.

Testing

Pane 1

kubectl port-forward --address 0.0.0.0 cluster-masters-0 :9200
Forwarding from 0.0.0.0:45679 -> 9200

Pane 2

set variables

You can find the username and password in /srv/git/private/hieradata/role/common/deployment_server/kubernetes.yaml on the Puppet server.

PW=user:password  # used for basic auth with cURL

PT=45679          # connect to the same port number from pane 1

with authentication

curl -ks -u ${PW} https://0:${PT}/_prometheus/metrics

without authentication

curl -ks https://0:${PT}/_cat/nodes
curl -ks https://0:${PT}/_cat/health
curl -ks https://0:${PT}/_prometheus/metrics

Deploying on an existing cluster

Changing Resources on a Live Cluster

The chart has several places to change requests/limits.

As far as I can tell, the one that actually takes effect is under:

opensearchCluster:
  nodePools:
    - component: masters
      roles:
        - "master"
        - "data"
      resources:
        requests:
          memory: "4Gi"
          cpu: "2000m"
        limits:
          memory: "4Gi"
          cpu: "2000m"


As of this writing, that configuration is applied via this file in our deployment-charts repo, which means that if you want to override it, you need to add it to the values.yaml file specific to your deployment and/or helmfile release.

Again, as of this writing, the operator does not detect changes to the resources or source image.

As such, you'll have to delete the pods one-by-one to apply the resource changes. Note that since the PersistentVolumes and their PersistentVolumeClaims are not deleted, the data is not at risk. In practice it feels a lot more like a very quick reboot (delete the pod, the operator replaces it immediately, the new pod is connected to the existing OpenSearch data). It typically only takes a minute or two after deletion for the cluster to get back into its healthy "green" state.
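The delete-and-wait cycle can be sketched as follows (the pod names and cluster URL mirror the examples on this page and are assumptions; the health check is demonstrated against a captured `_cat/health` line):

```shell
# Sketch of a rolling pod replacement: delete one pod, wait for green, repeat.
wait_for_green() {  # poll _cat/health until the status column reads "green"
  until curl -s "$1/_cat/health" | grep -q ' green '; do sleep 30; done
}
# In practice (names/URL are examples, adjust for your deployment):
#   for pod in cluster-masters-0 cluster-masters-1 cluster-masters-2; do
#     kubectl delete po "$pod"
#     wait_for_green https://opensearch-test.discovery.wmnet:30443
#   done
# The status check itself, demonstrated against a captured healthy line:
echo '1770324073 20:41:13 opensearch-test green 3 3 true 132 63 0 0 0 0 - 100.0%' \
  | grep -q ' green ' && echo ok
```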

But you should still take your time and ensure the cluster is healthy before moving on to the next pod. You can use the following API calls to check cluster health:

  curl -H 'Accept:Application/yaml' -XGET https://opensearch-test.discovery.wmnet:30443/_cluster/health
---
cluster_name: "opensearch-test"
status: "yellow"
timed_out: false
number_of_nodes: 3
number_of_data_nodes: 3
discovered_master: true
discovered_cluster_manager: true
active_primary_shards: 8
active_shards: 18
relocating_shards: 0
initializing_shards: 0
unassigned_shards: 2
delayed_unassigned_shards: 0
number_of_pending_tasks: 0
number_of_in_flight_fetch: 0
task_max_waiting_in_queue_millis: 0
active_shards_percent_as_number: 90.0
---

This cluster is yellow. If it stays in yellow for more than a few minutes after launching a new pod, further investigation is required. Did the new pod join the cluster? Are shards moving to it? If there are under-replicated shards, are they on the pod you want to delete?

 curl -XGET https://opensearch-test.discovery.wmnet:30443/_cat/nodes
10.67.29.29   8 56 1 0.80 1.39 1.72 dm cluster_manager,data - opensearch-test-masters-2
10.67.28.200 36 83 4 0.83 1.13 1.45 dm cluster_manager,data * opensearch-test-masters-0

This cluster is missing a pod; wait for the third pod to finish deploying before deleting another one.
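A quick way to spot a missing node is to compare the `_cat/nodes` line count with the expected pool size. A sketch using the output captured above (the here-document stands in for the real curl call):

```shell
# Count the nodes reported by _cat/nodes and compare with the expected pool
# size (3 here). In practice, replace the here-document with:
#   curl -s https://opensearch-test.discovery.wmnet:30443/_cat/nodes
expected=3
actual=$(wc -l <<'EOF'
10.67.29.29   8 56 1 0.80 1.39 1.72 dm cluster_manager,data - opensearch-test-masters-2
10.67.28.200 36 83 4 0.83 1.13 1.45 dm cluster_manager,data * opensearch-test-masters-0
EOF
)
if [ "$actual" -eq "$expected" ]; then
  echo "all nodes present"
else
  echo "missing $((expected - actual)) node(s)"
fi
```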

Step by Step

1.

k get po -o yaml opensearch-ipoid-test-masters-0 | grep -i memory
        memory: 2Gi
        memory: 2Gi

We have applied the change via helmfile, but the operator hasn't detected the changes.

2.

k delete po opensearch-ipoid-test-masters-0

We've deleted one of the pods to force its replacement.

3.

k get po -o yaml opensearch-ipoid-test-masters-0 | grep -i memory
        memory: 4Gi
        memory: 4Gi

After a few minutes, the pod has been replaced and it has the new specs.

The operator should be handling this automatically. We'll continue investigating, but it's likely this will be fixed when we move to a newer version of the operator and/or helm chart.

Renewing TLS Certificates

The current version of the OpenSearch operator does not support hot reloading of certificates. Note that kubectl get cert is *NOT* sufficient to see which certificate is presented by the cluster; you have to hit the endpoint directly:

gnutls-cli --print-cert opensearch-ipoid.svc.codfw.wmnet:30443 | grep expire
 - subject `CN=opensearch-wmf', issuer `CN=discovery,OU=SRE Foundations,O=Wikimedia Foundation\, Inc,L=San Francisco,C=US', serial 0x7be936f0d3f3aaeb1514cb41a7c4b13343a5b2eb, RSA key 2048 bits, signed using ECDSA-SHA512, activated `2025-12-10 17:15:00 UTC', expires `2026-01-07 17:15:00 UTC', pin-sha256="AcCR/RPH1ogjJfEnPi91ltR4uCLEY7bFtwcVIgSFfCU="
 - subject `CN=discovery,OU=SRE Foundations,O=Wikimedia Foundation\, Inc,L=San Francisco,C=US', issuer `CN=Wikimedia_Internal_Root_CA,OU=Cloud Services,O=Wikimedia Foundation\, Inc,L=San Francisco,ST=California,C=US', serial 0x715331115b69e7112b0e3c7f8c89ce15c51a4639, EC/ECDSA key 528 bits, signed using ECDSA-SHA512, activated `2021-05-04 13:54:00 UTC', expires `2026-05-03 13:54:00 UTC', pin-sha256="PbgfDlEHVB4Zw0a42zNqqnEQbcYF9jYp/dbT4eSdOb8="
*** Fatal error: Error in the certificate.
- Status: The certificate is NOT trusted. The certificate chain uses expired certificate.
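For scripting (e.g., a quick before/after comparison during a renewal), the expiry timestamps can be pulled out of the gnutls-cli output with sed. A sketch, demonstrated against a trimmed sample line:

```shell
# Extract the "expires" timestamps from gnutls-cli --print-cert output.
# In practice:
#   gnutls-cli --print-cert opensearch-ipoid.svc.codfw.wmnet:30443 < /dev/null \
#     | sed -n "s/.*expires \`\([^']*\)'.*/\1/p"
# Demonstrated here against a trimmed sample line:
sed -n "s/.*expires \`\([^']*\)'.*/\1/p" <<'EOF'
 - subject `CN=opensearch-wmf', activated `2025-12-10 17:15:00 UTC', expires `2026-01-07 17:15:00 UTC', pin-sha256="AcCR"
EOF
```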

To force the operator to regenerate certificates:

1) Prepare a patch that adds a dummy SAN (example).

2) From the deployment server, check the age of the `opensearch-wmf` Kubernetes certificate resource:

k get cert
NAME                   READY   SECRET                 AGE
opensearch-wmf         True    opensearch-wmf         57d


3) As the ${NS}-deploy user, delete the current certificate: k delete cert opensearch-wmf

Note that this won't break the current installation as the certificates are already part of the pod data.

4) Pull down the patch you created and do a homedir deploy (full details on what this is and how to do it are in the link).

5) Check the health of the cluster

curl https://opensearch-ipoid.svc.codfw.wmnet:30443/_cat/health
1770324073 20:41:13 opensearch-ipoid green 3 3 true 132 63 0 0 0 0 - 100.0%

Assuming the health is green, perform a rolling refresh of the pods as described above.

6) Verify that your endpoint is serving the new certificate:

gnutls-cli --print-cert opensearch-ipoid.svc.codfw.wmnet:30443 | grep expire
  - subject `CN=opensearch-wmf', issuer `CN=discovery,OU=SRE Foundations,O=Wikimedia Foundation\, Inc,L=San Francisco,C=US', serial 0x69ad031ab6ce85e4abf0a1be545c95e0d716daef, RSA key 2048 bits, signed using ECDSA-SHA512, activated `2026-02-05 19:35:00 UTC', expires `2026-03-05 19:35:00 UTC', pin-sha256="AcCR/RPH1ogjJfEnPi91ltR4uCLEY7bFtwcVIgSFfCU="

7) Confirm that all certificate alerts have cleared. The Blackbox probes that monitor certificate expiration come from outside K8s, so they could be hitting any one of the three pods. If the alerts don't all clear, you may have forgotten to cycle through all of the pods.

API Calls

Audit Logging

PT=43885; curl -H 'Content-type: Application/json' -XPOST -k -u ${PW}  https://0:${PT}/security-auditlog-2025.10.31/_search?pretty

Forcing Shard Reallocation

Note: these calls require the `operator` OpenSearch user, as opposed to the typical `opensearch` user.
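One common way to force shards off a node is allocation filtering via the standard OpenSearch `_cluster/settings` API. A sketch (the node name, port, and credential variables below are placeholders, not platform conventions):

```shell
# Ban a node by name so its shards drain elsewhere (standard OpenSearch
# allocation filtering; node name, port, and credentials are placeholders):
body='{"transient": {"cluster.routing.allocation.exclude._name": "opensearch-test-masters-1"}}'
#   curl -k -u "operator:${OPERATOR_PW}" -H 'Content-Type: application/json' \
#     -XPUT "https://0:${PT}/_cluster/settings" -d "$body"
# Remember to clear the exclusion (set it to null) once you are done.
# Sanity-check the payload locally before sending:
echo "$body" | python3 -m json.tool > /dev/null && echo valid-json
```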

Dynamic and Static Configuration settings

Where to set
  Static: opensearch.yml. If using the opensearch-cluster helm chart, additionalConfig is added to opensearch.yml.
  Dynamic: Set via the OpenSearch API.

Activating
  Static: You have to restart the OpenSearch service. Theoretically this should be handled by the operator, but you may need a rollout restart if the settings don't take effect.
  Dynamic: Settings are active as soon as OpenSearch accepts your API call.

How to tell if a setting is known to OpenSearch
  Static: You will NOT see the changes in configMaps or inside the pod's opensearch.yml, as the operator adds new settings as environment variables. You can run env inside the pod to check whether the setting exists.
  Dynamic: Check via the OpenSearch API.

How to tell if a setting is applied
  Static: OpenSearch config follows Postel's law: it will accept invalid config (for example, see this change where I added config that OpenSearch accepted but that never really worked). You may need to do more investigation to verify that the setting is 1) correct and 2) active. OpenSearch will also ignore some settings, even when they are correct and active, if they would have detrimental effects on the cluster (for example, it will allow shards to land on a banned node if there is no other place to put them).
  Dynamic: Same as static.

Disk Usage

How to Check

By default, the OpenSearch on K8s pods have 30 GB disks. You can check disk usage via the following API call:

curl -s  https://opensearch-ipoid.svc.eqiad.wmnet:30443/_cat/allocation?v
shards disk.indices disk.used disk.avail disk.total disk.percent host                       ip           node
    21        8.3gb     8.3gb     20.9gb     29.3gb           28 opensearch-ipoid-masters-2 10.67.28.219 opensearch-ipoid-masters-2
    21        4.6gb     4.6gb     24.6gb     29.3gb           15 opensearch-ipoid-masters-0 10.67.26.116 opensearch-ipoid-masters-0
    22         13gb      13gb     16.2gb     29.3gb           44 opensearch-ipoid-masters-1 10.67.28.6   opensearch-ipoid-masters-1

Disk usage is also visible via this panel from the OpenSearch on K8s dashboard
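The `_cat/allocation` output is easy to screen for pods running hot on disk. A sketch over the sample output above (the 40% threshold is an arbitrary example):

```shell
# Print nodes whose disk.percent (6th column) exceeds a threshold. In practice:
#   curl -s https://opensearch-ipoid.svc.eqiad.wmnet:30443/_cat/allocation?v \
#     | awk -v max=40 'NR > 1 && $6 + 0 > max {print $7}'
# Demonstrated against the captured sample:
awk -v max=40 'NR > 1 && $6 + 0 > max {print $7}' <<'EOF'
shards disk.indices disk.used disk.avail disk.total disk.percent host                       ip           node
    21        8.3gb     8.3gb     20.9gb     29.3gb           28 opensearch-ipoid-masters-2 10.67.28.219 opensearch-ipoid-masters-2
    21        4.6gb     4.6gb     24.6gb     29.3gb           15 opensearch-ipoid-masters-0 10.67.26.116 opensearch-ipoid-masters-0
    22         13gb      13gb     16.2gb     29.3gb           44 opensearch-ipoid-masters-1 10.67.28.6   opensearch-ipoid-masters-1
EOF
```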

How to Expand

1. Merge a change that raises the disk size. Note that in the current version of the opensearch-cluster helm chart (2.7.0), you will have to override the entire nodePools object within the opensearchCluster object, see this CR for an example.

2. Apply the change using our standard deployment process (helmfile apply).

3. The change should be visible in OpenSearch immediately. Verify by using the same call as above:

curl -s  https://opensearch-ipoid.svc.eqiad.wmnet:30443/_cat/allocation?v
shards disk.indices disk.used disk.avail disk.total disk.percent host                       ip           node
    25        6.4gb     6.4gb     32.7gb     39.2gb           16 opensearch-ipoid-masters-0 10.67.29.37  opensearch-ipoid-masters-0
    24        4.6gb     4.6gb     34.5gb     39.2gb           11 opensearch-ipoid-masters-2 10.67.30.227 opensearch-ipoid-masters-2
    24       12.8gb    12.8gb     26.3gb     39.2gb           32 opensearch-ipoid-masters-1 10.67.28.5   opensearch-ipoid-masters-1

Now we have ~40GB of disk space per pod instead of the previous ~30GB!