As the mobile gateway at en.m becomes increasingly popular, it has become necessary to draft guidelines describing what our dedicated operations staff need in order to better support the service. These guides are meant to be a quick reference for anyone who needs to administer the service; anything more specific can go on separate pages.
At a bare minimum, each guide should cover the following:
- Software stack
  - OS, if relevant
    - e.g. scary Solaris
  Keep this brief: this is an operations guide, not a detailed service spec.
- Which hosts does the service live on?
  Link to the hardware pages of the servers.
  This can get out of date very quickly.
- Who are the clients?
- How can we easily identify the state of the application and whether it is meeting its SLAs?
- Can capacity be easily added and if so how?
- Clearly identify what other services this piece of software requires and what other services require it
- What happens if the service fails intermittently or goes down hard?
- How do we bring it up and take it down?
- How do we migrate quickly if the server is on fire?
- Where do the configuration files live, both on the servers and in source control?
- What (additional) actions are needed to make configuration changes go live?
- What commonly breaks?
  - And how are we fixing it?
    This should not be a section of "this sucks and we just live with it".
- How do I see the running activity of the service?
  - Process stats
  - Black magic
- How do I track the history of the service's lifetime?
- How are backups of any data handled, in addition to the configuration of the service itself?
- Who do I call when the problem is far more than I can handle?
- Pointers to more detailed installation/configuration instructions if needed
A template is available at Template:ServiceOperations.