Obsolete:Ldap rename

From Wikitech

Several things are happening to our ldap server infrastructure:

  • Our current secondary ldap server, virt0, is about to be shut down as the Tampa datacenter is decommissioned.
  • A new ldap server is now running in the Dallas (codfw) datacenter on a host named labcontrol2001.

As part of this change (and to ease future transitions like this one), we're going to start referring to ldap servers by service-specific names rather than by server names.

  • The ldap server running on virt1000 will be known as ldap-eqiad.wikimedia.org
  • The ldap server running on labcontrol2001 will be known as ldap-codfw.wikimedia.org
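As a rough illustration of what the service names buy us: a client configured with these names never has to change when the host behind a name changes; only DNS does. The helper names and the ldap.conf-style `URI` line with the `ldaps://` scheme below are assumptions for illustration, not the actual puppet-generated client config.

```shell
#!/bin/sh
# Hypothetical helpers illustrating the naming convention:
# ldap-<datacenter>.wikimedia.org instead of a physical host name.

# ldap_service_host DC
# Prints the service name for a datacenter, e.g. ldap-eqiad.wikimedia.org.
ldap_service_host() {
    printf 'ldap-%s.wikimedia.org\n' "$1"
}

# ldap_uri_line PRIMARY_DC SECONDARY_DC
# Prints an OpenLDAP ldap.conf-style URI line, primary first.
# The ldaps:// scheme is an assumption.
ldap_uri_line() {
    printf 'URI ldaps://%s ldaps://%s\n' \
        "$(ldap_service_host "$1")" "$(ldap_service_host "$2")"
}
```

`ldap_uri_line eqiad codfw` prints `URI ldaps://ldap-eqiad.wikimedia.org ldaps://ldap-codfw.wikimedia.org`. Because neither host name appears in that line, moving ldap-eqiad from one box to another only changes DNS.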

Since no existing instances refer to it, ldap-codfw is already up and running and needs no additional work. Renaming virt1000 will be more complex, because many services already rely on it and refer to it by name. Renaming the service on virt1000 in place would cause several outages (long ones for labs instances with broken puppet). To avoid these outages, we've set up a third, transitional server, neptunium.wikimedia.org (https://rt.wikimedia.org/Ticket/Display.html?id=8417). Here's how this will work:

  1. As of 2014-09-24, ldap-eqiad points to neptunium. An ldap server is running there which replicates virt1000's data and should be identical in every way (save its name and ssl cert).
  2. Soon, Andrew will merge a puppet patch that renames the ldap servers (https://gerrit.wikimedia.org/r/#/c/162689). After this, all properly puppetized systems will use ldap-eqiad (neptunium) as the ldap primary and ldap-codfw (labcontrol2001) as the ldap secondary.
  3. Andrew will give a seven-day warning on labs-l: all non-puppetized instances must have two patches cherry-picked and applied, https://gerrit.wikimedia.org/r/#/c/159740/ and https://gerrit.wikimedia.org/r/#/c/162689
  4. Someone (Coren?) will generate a report of all instances not subject to active puppet control.
  5. At some point during the seven days (maybe right away), Andrew will start going through that list and applying the two needed patches. (Since some clusters, e.g. deployment-prep, use project-wide puppetmasters, this shouldn't be too painful.)
  6. At the end of the seven days, ldap on virt1000 will be modified to serve up the proper ldap-eqiad certificate.
    1. Add the new cert to the truststore: /usr/bin/keytool -importcert -trustcacerts -alias ldap-eqiad.wikimedia.org -file /etc/ssl/certs/ldap-eqiad.wikimedia.org.chained.pem -keystore /var/opendj/instance/config/truststore -storepass `cat /var/opendj/instance/config/keystore.pin` -noprompt
    2. Run /usr/opendj/bin/dsconfig as user 'opendj' and edit all connection handlers to use the cert nickname ldap-eqiad.wikimedia.org.
    3. Stop opendj, edit config.ldif to set ds-cfg-key-store-file: /var/opendj/instance/ldap-eqiad.wikimedia.org.p12, then restart opendj.
  7. ldap-eqiad will be changed in DNS to point to virt1000. Systems will move from ldap-eqiad (neptunium) to ldap-eqiad (virt1000) without noticing the difference.
  8. After a while, the ldap server on neptunium will be shut down and neptunium will be wiped/reclaimed/powered down.
    1. After that, 'virt1000' needs to be removed from the replication chain.
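Step 4 calls for a report of instances that are not under active puppet control. One rough per-instance signal is the age of puppet's last-run summary file. The sketch below assumes the Puppet 3 default location /var/lib/puppet/state/last_run_summary.yaml and a 24-hour default threshold; the function name puppet_is_stale is hypothetical, and running it across every instance to build the actual report is left out.

```shell
#!/bin/sh
# Hypothetical helper for the step 4 report.
# Assumed default path (Puppet 3 statedir); pass another path as $1 if needed.
SUMMARY_DEFAULT=/var/lib/puppet/state/last_run_summary.yaml

# puppet_is_stale [FILE] [MAX_AGE_SECONDS]
# Returns 0 ("stale") if FILE is missing or was last modified more than
# MAX_AGE_SECONDS ago (default 86400, i.e. 24 hours); returns 1 otherwise.
# A missing file is treated as "not under puppet control".
puppet_is_stale() {
    file=${1:-$SUMMARY_DEFAULT}
    max_age=${2:-86400}
    [ -f "$file" ] || return 0
    now=$(date +%s)
    mtime=$(stat -c %Y "$file")   # GNU stat: mtime in epoch seconds
    [ $((now - mtime)) -gt "$max_age" ]
}
```

On an instance, `puppet_is_stale && echo "$(hostname): puppet has not run in 24h"` flags it for the list. File age is only a proxy: a host whose puppet runs but fails every time still updates the summary, so this catches hosts where puppet is disabled or never runs, not every form of breakage.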
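Step 6.3 edits config.ldif by hand. If it is scripted instead, one way to avoid touching other key manager entries is to rewrite ds-cfg-key-store-file only inside a single, named entry. The awk sketch below does that. The function name set_keystore_file is hypothetical, and the DN used in the usage note (cn=PKCS12,cn=Key Manager Providers,cn=config) is an assumption about which entry holds the PKCS#12 keystore path, so check config.ldif before relying on it.

```shell
#!/bin/sh
# set_keystore_file LDIF DN NEW_PATH
# Prints LDIF to stdout with the ds-cfg-key-store-file value replaced only
# inside the entry whose "dn:" line equals "dn: DN"; every other line
# passes through unchanged. The input file itself is not modified.
# Limitation: LDIF line folding is not handled, so a DN wrapped onto a
# continuation line will not match.
set_keystore_file() {
    awk -v dn="dn: $2" -v val="$3" '
        /^dn: / { in_entry = ($0 == dn) }   # track which entry we are in
        in_entry && /^ds-cfg-key-store-file:/ {
            print "ds-cfg-key-store-file: " val
            next
        }
        { print }
    ' "$1"
}
```

For example, with opendj stopped: `set_keystore_file /var/opendj/instance/config/config.ldif 'cn=PKCS12,cn=Key Manager Providers,cn=config' /var/opendj/instance/ldap-eqiad.wikimedia.org.p12 > /tmp/config.ldif.new`, then diff the result against the original before copying it into place. The config.ldif path is assumed from the truststore path in step 6.1.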

Instances in Danger

The following instances are still using the old ldap servers, presumably because of puppet breakage:

  • wikidata-topicmaps: wikidata-topicmaps (mukil)
  • netflow: flow-localpuppet (jkrauska)
  • abusefilter-global: abusefilter-global-main (novaadmin)
  • openstack: nova-precise2 (novaadmin)
  • conventionextension: conventionextension-test (novaadmin)
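To check whether a given instance is still pointing at the old servers, one rough test is to search its ldap client configuration for the old host names virt0 and virt1000. The file list in the sketch (/etc/ldap.conf, /etc/ldap/ldap.conf, /etc/nslcd.conf) is an assumption about where an instance's client config lives, and uses_old_ldap is a hypothetical helper name.

```shell
#!/bin/sh
# Assumed client config locations; adjust per instance.
LDAP_CLIENT_FILES="/etc/ldap.conf /etc/ldap/ldap.conf /etc/nslcd.conf"

# uses_old_ldap FILE...
# Prints each readable FILE that mentions virt0 or virt1000 as a whole word
# (commented-out lines count too). Returns 0 if at least one file matched,
# 1 otherwise. Unreadable or missing files are skipped.
uses_old_ldap() {
    found=1
    for f in "$@"; do
        [ -r "$f" ] || continue
        if grep -Eqw 'virt0|virt1000' "$f"; then
            echo "$f"
            found=0
        fi
    done
    return "$found"
}
```

Run it as `uses_old_ldap $LDAP_CLIENT_FILES` (unquoted, so the list splits into separate paths); any file it prints still names an old server and needs the patches from step 3.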