Data Platform/Systems/DataHub/Administration

From Wikitech

DataHub Service Details

datahub-next
| Attribute | Value |
| --- | --- |
| Owner | Data Platform SRE |
| Kubernetes Cluster | dse-k8s-eqiad |
| Kubernetes Namespace | datahub-next |
| Chart | https://gerrit.wikimedia.org/r/plugins/gitiles/operations/deployment-charts/+/master/charts/datahub/ |
| Helmfiles | https://gerrit.wikimedia.org/r/plugins/gitiles/operations/deployment-charts/+/master/helmfile.d/dse-k8s-services/datahub-next/ |
| Docker image | https://gitlab.wikimedia.org/repos/data-engineering/datahub |
| Internal service DNS | datahub-next.svc.eqiad.wmnet |
| Public service URL | datahub-next.wikimedia.org |
| Logs | datahub-next combined App Logs (Kubernetes) |
| Metrics | https://grafana.wikimedia.org/d/hyl18XgMk/kubernetes-container-details?orgId=1&var-datasource=thanos&var-site=eqiad&var-cluster=k8s-dse&var-namespace=datahub-next&var-container=All |
| Monitors | https://gerrit.wikimedia.org/r/plugins/gitiles/operations/alerts/+/refs/heads/master/team-data-platform/datahub-availability.yaml |
| Application documentation | https://datahubproject.io/docs/features |
| Paging | false |
| Deployment Phabricator ticket | T361185 |

datahub
| Attribute | Value |
| --- | --- |
| Owner | Data Platform SRE |
| Kubernetes Cluster | dse-k8s-eqiad |
| Kubernetes Namespace | datahub |
| Chart | https://gerrit.wikimedia.org/r/plugins/gitiles/operations/deployment-charts/+/master/charts/datahub/ |
| Helmfiles | https://gerrit.wikimedia.org/r/plugins/gitiles/operations/deployment-charts/+/master/helmfile.d/dse-k8s-services/datahub/ |
| Docker image | https://gitlab.wikimedia.org/repos/data-engineering/datahub |
| Internal service DNS | datahub.svc.eqiad.wmnet |
| Public service URL | datahub.wikimedia.org |
| Logs | datahub-frontend production App logs |
| Logs | datahub-gms production App logs |
| Logs | datahub-mce-consumer production App logs |
| Logs | datahub-mae-consumer production App logs |
| Metrics | https://grafana.wikimedia.org/d/hyl18XgMk/kubernetes-container-details?orgId=1&var-datasource=thanos&var-site=eqiad&var-cluster=k8s-dse&var-namespace=datahub&var-container=All |
| Monitors | https://gerrit.wikimedia.org/r/plugins/gitiles/operations/alerts/+/refs/heads/master/team-data-platform/datahub-availability.yaml |
| Application documentation | https://datahubproject.io/docs/features |
| Paging | false |
| Deployment Phabricator ticket | T361185 |

Deployment

As per Kubernetes/Deployments, we deploy these services from the deployment server, using helmfile.

datahub-next

kube-env datahub-next-deploy dse-k8s-eqiad
cd /srv/deployment-charts/helmfile.d/dse-k8s-services/datahub-next
helmfile -e dse-k8s-eqiad -i apply

datahub

kube-env datahub-deploy dse-k8s-eqiad
cd /srv/deployment-charts/helmfile.d/dse-k8s-services/datahub
helmfile -e dse-k8s-eqiad -i apply
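
Both deployments follow the same three-step pattern and differ only in the service name. As a rough illustration, the sketch below prints the commands for a given service so the pattern can be reviewed before anything is run. The helper name `print_deploy_commands` is hypothetical, not an existing tool, and it assumes the `<service>-deploy` user and the helmfile directory naming shown above.

```shell
# Hypothetical helper: prints (does not run) the deployment commands
# for a service, following the naming pattern used on this page.
print_deploy_commands() {
    service="$1"                     # e.g. datahub or datahub-next
    cluster="${2:-dse-k8s-eqiad}"    # assumed default, taken from this page
    printf 'kube-env %s-deploy %s\n' "$service" "$cluster"
    printf 'cd /srv/deployment-charts/helmfile.d/dse-k8s-services/%s\n' "$service"
    printf 'helmfile -e %s -i apply\n' "$cluster"
}

# Usage: print_deploy_commands datahub-next
```

Because the helper only prints, it is safe to run anywhere; the printed commands still need to be executed on the deployment server.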

Restore Indices

The MariaDB database is the canonical data store for DataHub, and the OpenSearch indices can be recreated from it. Occasionally it may be necessary to recreate these indices, either because of an upgrade or to recover from a problem.

There is a suspended CronJob that serves as a template for this purpose. We create a new Job from this template:

kube-env datahub-deploy dse-k8s-eqiad
kubectl create job --from=cronjob/datahub-production-restore-indices-job-template datahub-restore-indices-job

The logs of the job can be viewed with:

kubectl get pods
kubectl logs -f datahub-restore-indices-job-<hash> -c datahub-upgrade-job

The job should complete within about 20 minutes.
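
Instead of following the logs, it may be convenient to block until the job finishes and then remove it, so that the same job name can be reused next time. The sketch below is one possible way to do this. The function names are hypothetical, and the 30 minute default timeout is an assumption chosen to sit above the roughly 20 minutes mentioned above.

```shell
# Name of the Job created from the CronJob template above.
RESTORE_JOB="datahub-restore-indices-job"

# Block until the Job reports the Complete condition or the timeout expires.
# Note: this does not return early if the Job fails; it waits for the timeout.
wait_for_restore_job() {
    kubectl wait --for=condition=complete --timeout="${1:-30m}" "job/${RESTORE_JOB}"
}

# Delete the finished Job so the same name can be used for the next run.
cleanup_restore_job() {
    kubectl delete job "${RESTORE_JOB}"
}

# Usage (after kube-env datahub-deploy dse-k8s-eqiad):
#   wait_for_restore_job && cleanup_restore_job
```

If the wait times out, the job logs are the first place to look before deleting anything.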

Alerting

The app isn't running

If you're getting paged because the app isn't running, check whether something in the `datahub-gms-production` logs explains the crash. If the crash is recurring, the pod will be in the CrashLoopBackOff state in Kubernetes. To check whether this is the case, SSH to the deployment server and run the following commands:

kube-env datahub-deploy dse-k8s-eqiad
kubectl get pods
kubectl logs -f datahub-gms-production-<hash> -c datahub-gms-production

If no pod at all is displayed, re-deploy the app by following the deployment instructions above.
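
When many pods are listed, the check above can be narrowed down. The following is a minimal sketch, assuming the default `kubectl get pods` column layout (NAME, READY, STATUS, ...). The function names are hypothetical. The `--previous` flag of `kubectl logs` shows the output of the last terminated container, which is often where the reason for a crash loop appears.

```shell
# List the names of pods whose STATUS column is CrashLoopBackOff.
crashlooping_pods() {
    kubectl get pods --no-headers | awk '$3 == "CrashLoopBackOff" { print $1 }'
}

# Show logs from the previous (crashed) instance of the GMS container.
# $1 is the full pod name, e.g. datahub-gms-production-<hash>.
previous_gms_logs() {
    kubectl logs --previous "$1" -c datahub-gms-production
}

# Usage (after kube-env datahub-deploy dse-k8s-eqiad):
#   crashlooping_pods
#   previous_gms_logs datahub-gms-production-<hash>
```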