Incidents/20150901-Elasticsearch
Summary
The Elasticsearch service (on elastic*.eqiad.wmnet nodes) backing the search functionality went red for a few minutes. We didn't lose any real data, but we failed to serve some searches for 10 minutes.
Timeline
- 05:28: dcausse pauses writes before the firewall rules are applied to the master (elastic1001)
- 05:32: chasemp applies the rules
- 05:32: master is starting to lose track of its nodes
- 05:33: cluster is red
- 05:33: chasemp reverts the rules
- 05:34: cluster is starting to recover
- 05:39: cluster is back to yellow
- 05:48: there's a 10-minute spike of "Pool errors"; dcausse and chasemp test some queries on enwiki and they all work
- 07:58: cluster is back to green
- 08:00: dcausse unfreezes the indices