Event Platform/Stream Processing/Flink/FailureScenarios

From Wikitech

Kubernetes Operator

The Flink Kubernetes Operator runs as an HA pair. We have observed a scenario in which the active master loses sync with its resources. When this happens, API calls that update resources (that is, writes) are either ignored or hang indefinitely.

Workaround

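Deleting the Flink Operator's active master container forces a failover, which fixes the issue. To find the active master, inspect the leader-election lease (the current holder is reported in its spec.holderIdentity field): kubectl -n flink-operator get lease flink-operator-lease -o yaml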
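
The workaround can be sketched as a small shell helper. This is a minimal sketch, not a vetted runbook: the namespace and lease name are taken from the command above, while the function names lease_holder and force_operator_failover are hypothetical, and it assumes that the lease's holderIdentity value is the name of the active operator pod in the same namespace. Check the lease output by hand before deleting anything.

```shell
#!/bin/sh
# Sketch only. Namespace and lease name come from this page; the assumption
# that holderIdentity is a pod name should be verified before use.

# Print the holderIdentity value from a Lease manifest (YAML) read on stdin.
lease_holder() {
    awk '$1 == "holderIdentity:" { print $2; exit }'
}

# Look up the current lease holder and delete it to force a failover.
# Assumption: the holder identity is a pod name in the flink-operator namespace.
force_operator_failover() {
    ns="flink-operator"
    holder=$(kubectl -n "$ns" get lease flink-operator-lease -o yaml | lease_holder)
    if [ -z "$holder" ]; then
        echo "could not determine lease holder" >&2
        return 1
    fi
    echo "Deleting active operator pod: $holder"
    kubectl -n "$ns" delete pod "$holder"
}
```

After sourcing the file, run force_operator_failover, then re-run the lease query to confirm that the holder has changed. The parsing is kept in its own function so that it can be checked against a saved lease manifest without touching the cluster.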