User:Razzi/alertname: Icinga/Check correctness of the icinga configuration

Saw alert on alerts.wikimedia.org

alertname: Icinga/Check correctness of the icinga configuration
summary: Icinga configuration contains errors

Runbook at https://wikitech.wikimedia.org/wiki/Icinga

Has section https://wikitech.wikimedia.org/wiki/Icinga#Check_validity_of_the_Icinga's_config

so I ran

sudo /usr/sbin/icinga -v /etc/icinga/icinga.cfg

It returned

Total Warnings: 0
Total Errors:   2

Looking through the output for the errors:

Error: Service check command 'check_https_on_port:8090' specified in service 'puppetdb-api codfw port 8090/tcp - Puppetdb api microservice IPv4' for host 'puppetdb1002' (file '/etc/nagios/nagios_service.cfg', line 24535) not defined anywhere!
Error: Service check command 'check_https_on_port:8090' specified in service 'puppetdb-api eqiad port 8090/tcp - Puppetdb api microservice IPv4' for host 'puppetdb2002' (file '/etc/nagios/nagios_service.cfg', line 24557) not defined anywhere!
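The verification output is long, so it can help to reduce it to just the file and line number of each error. A minimal sketch, assuming the error lines keep the `(file '...', line N)` shape shown above; the function name `error_locations` is made up for this note:

```shell
# Read icinga verification output on stdin and print "file:line"
# for every line that starts with "Error:".
# Assumes the "(file '<path>', line <n>)" format seen in the errors above.
error_locations() {
    grep '^Error:' |
        sed -n "s/.*(file '\([^']*\)', line \([0-9]*\)).*/\1:\2/p"
}

# Usage, with the command from the runbook:
#   sudo /usr/sbin/icinga -v /etc/icinga/icinga.cfg | error_locations
```

The `^Error:` anchor keeps the `Total Errors:` summary line out of the result.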

Maybe I can find those lines in /etc/nagios/nagios_service.cfg.

Yes, the file is on icinga.wikimedia.org.

Here are the two service definitions around the reported lines:

define service {
       ## --PUPPET_NAME-- (called '_naginator_name' in the manifest)                alert1001 puppetdb1002_puppetdb-api
       active_checks_enabled          1
       check_command                  check_https_on_port:8090
       check_freshness                0
       check_interval                 1
       check_period                   24x7
       contact_groups                 admins
       host_name                      puppetdb1002
       is_volatile                    0
       max_check_attempts             3
       notes_url                      https://wikitech.wikimedia.org/wiki/Puppet#Micro_Service
       notification_interval          0
       notification_options           c,r,f
       notification_period            24x7
       notifications_enabled          1
       passive_checks_enabled         1
       retry_interval                 1
       service_description            puppetdb-api codfw port 8090/tcp - Puppetdb api microservice IPv4
       servicegroups                  lvs
}
define service {
       ## --PUPPET_NAME-- (called '_naginator_name' in the manifest)                alert1001 puppetdb2002_puppetdb-api
       active_checks_enabled          1
       check_command                  check_https_on_port:8090
       check_freshness                0
       check_interval                 1
       check_period                   24x7
       contact_groups                 admins
       host_name                      puppetdb2002
       is_volatile                    0
       max_check_attempts             3
       notes_url                      https://wikitech.wikimedia.org/wiki/Puppet#Micro_Service
       notification_interval          0
       notification_options           c,r,f
       notification_period            24x7
       notifications_enabled          1
       passive_checks_enabled         1
       retry_interval                 1
       service_description            puppetdb-api eqiad port 8090/tcp - Puppetdb api microservice IPv4
       servicegroups                  lvs
}
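To pull up definitions like these from a reported line number, printing a window of lines around it is usually enough. A rough sketch; `show_context` and its default radius of 25 lines are my own choices, not something from the runbook:

```shell
# Print the lines of FILE within RADIUS lines of LINE (default radius: 25).
show_context() {
    file=$1
    line=$2
    radius=${3:-25}
    start=$((line - radius))
    # Clamp to the first line of the file.
    if [ "$start" -lt 1 ]; then
        start=1
    fi
    end=$((line + radius))
    sed -n "${start},${end}p" "$file"
}

# Usage, with the file and line from the first error:
#   show_context /etc/nagios/nagios_service.cfg 24535
```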

This was patched: https://gerrit.wikimedia.org/r/c/operations/puppet/+/693496

At first I couldn't find the offending string in my local puppet checkout, because I hadn't pulled the latest puppet code and the error only existed in a newer revision of the tree!
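Next time, update the checkout before searching it. A small sketch, assuming a local clone of the puppet repository; the path `~/puppet` and the function name `search_checkout` are placeholders:

```shell
# Search the tracked files of a git checkout for a fixed string.
# REPO is the path to the clone, NEEDLE is matched literally (-F).
search_checkout() {
    repo=$1
    needle=$2
    git -C "$repo" grep -n -F -e "$needle"
}

# Usage (the repository path is an assumption):
#   git -C ~/puppet pull --ff-only
#   search_checkout ~/puppet 'check_https_on_port:8090'
```

`git grep` exits non-zero when nothing matches, which is a hint to check whether the checkout is stale before concluding the string does not exist.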