User:BryanDavis/Scap3 in a Cloud VPS project

From Wikitech

Setting up a deploy server to use scap3 in a Cloud VPS project

  • Create a m1.small instance
  • Add the role classes, plus the dummy Hiera values that ::profile::mediawiki::deployment::server requires, to Hiera:$PROJECT/host/$HOST:
    ---
    classes:
        - profile::mediawiki::deployment::server
        - profile::keyholder::server
    profile::keyholder::server::require_encrypted_keys: nope
    scap::dsh::groups:
      mediawiki-installation:
        hosts:
        - 127.0.0.1
  • Add project-wide Hiera values that set the scap and scap3 master server in Hiera:$PROJECT:
    ---
    scap::deployment_server: <FQDN of project deploy server>
    profile::mediawiki::deployment::server::rsync_host: <FQDN of project deploy server>
    scap::wmflabs_master: <FQDN of project deploy server>
    deployment_server: <FQDN of project deploy server>
    network::allow_deployment_from_ips:
    - <FQDN of project deploy server>
    # A bunch of hiera settings that we have to stub out because the profiles
    # are not well factored for the scap3-only use case
    profile::rsyslog::kafka_shipper::kafka_brokers: []
    profile::mediawiki::php::enable_fpm: false
    profile::mediawiki::php::version: "7.2"
    profile::mediawiki::apc_shm_size: 128M
    has_lvs: false
    lvs::configuration::lvs_service_ips: {}
    lvs::configuration::lvs_services: {}
    
  • Force another puppet run!
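The project-wide block above repeats the deploy server's FQDN in five places, so a typo in one of them is easy to miss. The following is a minimal sketch, not part of the original setup: render_deploy_hiera is a hypothetical helper that prints only the FQDN-dependent keys from one argument. The stubbed-out settings still need to be pasted in unchanged.

```shell
# Hypothetical helper (not part of scap or Puppet): print the FQDN-dependent
# project-wide Hiera keys for a single deploy server.
render_deploy_hiera() {
    local fqdn="$1"
    cat <<EOF
scap::deployment_server: ${fqdn}
profile::mediawiki::deployment::server::rsync_host: ${fqdn}
scap::wmflabs_master: ${fqdn}
deployment_server: ${fqdn}
network::allow_deployment_from_ips:
- ${fqdn}
EOF
}

# Example, run on the deploy server itself:
#   render_deploy_hiera "$(hostname -f)"
# After saving the Hiera page, a Puppet run can usually be forced with:
#   sudo puppet agent --test
```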

Adding a scap3 project

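Declare a keyholder agent (here deploy-service, usable by members of the wikidev group) and a scap source (here striker/deploy, cloned from the labs/striker/deploy repository) in Hiera:

profile::keyholder::server::agents: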
    deploy-service:
        trusted_groups:
            - wikidev

scap::sources:
    striker/deploy:
        repository: labs/striker/deploy
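
Once Puppet has applied the settings above, it can help to confirm that each source was actually cloned. This is a sketch under one assumption: each scap::sources entry ends up under /srv/deployment/<name>, which matches the striker/deploy path used later on this page. check_scap_sources and DEPLOY_BASE are hypothetical names.

```shell
# Assumption: each scap::sources entry is cloned to $DEPLOY_BASE/<name>.
DEPLOY_BASE="${DEPLOY_BASE:-/srv/deployment}"

# Hypothetical helper: report whether each named source has a git checkout.
# Returns non-zero if any source is missing.
check_scap_sources() {
    local name missing=0
    for name in "$@"; do
        # -e rather than -d: .git can be a file in some checkouts
        if [ -e "$DEPLOY_BASE/$name/.git" ]; then
            echo "present $name"
        else
            echo "missing $name"
            missing=1
        fi
    done
    return "$missing"
}

# Example:
#   check_scap_sources striker/deploy
```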

Syncing with the cluster

  • Accept the SSH host keys of all target nodes as the user you will be deploying as (e.g. bd808 if you happen to be BryanDavis).
  • Arm Keyholder on your deploy server.
    • sudo keyholder status should list every key you need; if it does not, something went wrong.
    • To verify that everything works as expected, pick one of the deployment's target hosts and run the following; the ssh login should succeed: SSH_AUTH_SOCK=/run/keyholder/proxy.sock ssh -l <KEYHOLDER-IDENTITY> -oBatchMode=yes <TARGET-HOSTNAME>
    • Interesting corner case: the analytics team owns the keyholder identity analytics-deploy and needs to deploy to hosts as the user analytics. This is what I had to run to resolve ssh problems when using scap: SSH_AUTH_SOCK=/run/keyholder/proxy.sock ssh -l analytics-deploy -oBatchMode=yes analytics@hadoop-coordinator-2.analytics.eqiad.wmflabs
  • Deploy stuff!
    $ cd $MY_DEPLOY_DIR  # (e.g. /srv/deployment/striker/deploy)
    $ scap deploy
    21:14:10 Started Deploy: striker/deploy
    Entering 'public_html/staticfiles'
    Entering 'striker'
    Entering 'wheels'
    21:14:10
    == DEFAULT ==
    :* striker-uwsgi01.striker.eqiad.wmflabs
    striker/deploy: fetch stage(s): 100% (ok: 1; fail: 0; left: 0)
    striker/deploy: config_deploy stage(s): 100% (ok: 1; fail: 0; left: 0)
    striker/deploy: promote and restart_service stage(s):   0% (ok: 0; fail: 0; left: 1)
    striker/deploy: promote and restart_service stage(s): 100% (ok: 1; fail: 0; left: 0)
    striker/deploy: promote and restart_service stage(s): 100% (ok: 1; fail: 0; left: 0)
    21:14:13 Finished Deploy: striker/deploy (duration: 00m 02s)
    $ scap deploy-log
    -- Opening log file: '/srv/deployment/striker/deploy/scap/log/scap-sync-2016-07-28-0007.log'
    21:14:10 [striker-deploy03] Started Deploy: striker/deploy
    21:14:10 [striker-deploy03]
    == DEFAULT ==
    :* striker-uwsgi01.striker.eqiad.wmflabs
    21:14:11 [striker-uwsgi01.striker.eqiad.wmflabs] Revision directory already exists (use --force to override)
    21:14:12 [striker-uwsgi01.striker.eqiad.wmflabs] Starting new HTTP connection (1): striker-deploy03.striker.eqiad.wmflabs
    21:14:13 [striker-uwsgi01.striker.eqiad.wmflabs] /srv/deployment/striker/deploy-cache/revs/47ea97dc38677aab79bd0c89f1e3ffe7fdc2cbfb is already live (use --force to override)
    21:14:13 [striker-deploy03] Finished Deploy: striker/deploy (duration: 00m 02s)
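
With more than one target, the keyholder ssh check from the steps above can be repeated in a loop before running scap deploy. This is a rough sketch: check_targets, KEYHOLDER_SOCK and SSH_BIN are hypothetical names, and the only real command is the ssh invocation already shown, with true appended so that the session exits immediately.

```shell
# Hypothetical wrapper around the keyholder ssh check shown above.
KEYHOLDER_SOCK="${KEYHOLDER_SOCK:-/run/keyholder/proxy.sock}"
SSH_BIN="${SSH_BIN:-ssh}"   # overridable, e.g. for a dry run

# Usage: check_targets <KEYHOLDER-IDENTITY> <TARGET-HOSTNAME>...
# Prints one line per target and returns non-zero if any login failed.
check_targets() {
    local identity="$1" target failed=0
    shift
    for target in "$@"; do
        # BatchMode makes ssh fail instead of prompting for a password
        if SSH_AUTH_SOCK="$KEYHOLDER_SOCK" "$SSH_BIN" -l "$identity" \
               -oBatchMode=yes "$target" true </dev/null; then
            echo "ok   $target"
        else
            echo "FAIL $target"
            failed=1
        fi
    done
    return "$failed"
}

# Example:
#   check_targets deploy-service striker-uwsgi01.striker.eqiad.wmflabs
```

A target that prints FAIL usually means either that its host key has not been accepted yet or that keyholder is not armed with the right key.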
    

Errors I saw on initial provision

Done: https://gerrit.wikimedia.org/r/#/c/301403/

Error: Could not set 'file' on ensure: No such file or directory - /etc/firejail/mediawiki-imagemagick.profile20160726-18098-h7ugbr.lock at 45:/etc/puppet/modules/mediawiki/manifests/init.pp
Error: Could not set 'file' on ensure: No such file or directory - /etc/firejail/mediawiki-imagemagick.profile20160726-18098-h7ugbr.lock at 45:/etc/puppet/modules/mediawiki/manifests/init.pp
Wrapped exception:
No such file or directory - /etc/firejail/mediawiki-imagemagick.profile20160726-18098-h7ugbr.lock
Error: /Stage[main]/Mediawiki/File[/etc/firejail/mediawiki-imagemagick.profile]/ensure: change from absent to file failed: Could not set 'file' on ensure: No such file or directory - /etc/firejail/mediawiki-imagemagick.profile20160726-18098-h7ugbr.lock at 45:/etc/puppet/modules/mediawiki/manifests/init.pp

Done: https://gerrit.wikimedia.org/r/#/c/301404

Error: Could not set 'file' on ensure: No such file or directory - /etc/php5/apache2/php.ini20160726-18098-10h241o.lock at 21:/etc/puppet/modules/mediawiki/manifests/php.pp
Error: Could not set 'file' on ensure: No such file or directory - /etc/php5/apache2/php.ini20160726-18098-10h241o.lock at 21:/etc/puppet/modules/mediawiki/manifests/php.pp
Wrapped exception:
No such file or directory - /etc/php5/apache2/php.ini20160726-18098-10h241o.lock
Error: /Stage[main]/Mediawiki::Php/File[/etc/php5/apache2/php.ini]/ensure: change from absent to file failed: Could not set 'file' on ensure: No such file or directory - /etc/php5/apache2/php.ini20160726-18098-10h241o.lock at 21:/etc/puppet/modules/mediawiki/manifests/php.pp

Done: https://gerrit.wikimedia.org/r/#/c/301405

Error: Could not set 'file' on ensure: No such file or directory - /home/l10nupdate/.gitconfig20160726-18098-axu2js.lock at 81:/etc/puppet/modules/scap/manifests/l10nupdate.pp
Error: Could not set 'file' on ensure: No such file or directory - /home/l10nupdate/.gitconfig20160726-18098-axu2js.lock at 81:/etc/puppet/modules/scap/manifests/l10nupdate.pp
Wrapped exception:
No such file or directory - /home/l10nupdate/.gitconfig20160726-18098-axu2js.lock
Error: /Stage[main]/Scap::L10nupdate/File[/home/l10nupdate/.gitconfig]/ensure: change from absent to file failed: Could not set 'file' on ensure: No such file or directory - /home/l10nupdate/.gitconfig20160726-18098-axu2js.lock at 81:/etc/puppet/modules/scap/manifests/l10nupdate.pp

Done: https://gerrit.wikimedia.org/r/#/c/301408

Notice: /Stage[main]/Mediawiki::Scap/Exec[fetch_mediawiki]/returns: 22:25:55 pull failed: <CalledProcessError> Command '['sudo', '-u', 'mwdeploy', '-n', '--', '/usr/bin/rsync', '--archive', '--delete-delay', '--delay-updates', '--compress', '--delete', '--exclude=**/cache/l10n/*.cdb', '--exclude=*.swp', '--no-perms', '--exclude=**/.git', 'deployment-tin.eqiad.wmflabs::common', '/srv/mediawiki']' returned non-zero exit status 10
Notice: /Stage[main]/Mediawiki::Scap/Exec[fetch_mediawiki]/returns:
Error: /usr/bin/scap pull returned 70 instead of one of [0]
Error: /Stage[main]/Mediawiki::Scap/Exec[fetch_mediawiki]/returns: change from notrun to 0 failed: /usr/bin/scap pull returned 70 instead of one of [0]
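
The three File errors above share a pattern: Puppet could not write its temporary .lock file next to the target, which typically means the parent directory did not exist yet at that point in the run. A small filter can pull those directories out of a long Puppet log. This is a sketch only: missing_parents is a hypothetical helper, and its pattern is written against the error lines quoted on this page.

```shell
# Hypothetical helper: read Puppet agent output on stdin and print the parent
# directory of every File resource whose "ensure" failed, one per line.
missing_parents() {
    sed -n 's|^Error: /Stage\[main]/.*/File\[\(/.*\)/[^/]*]/ensure: .*|\1|p' | sort -u
}

# Example (--color=false keeps ANSI escapes out of the piped output):
#   sudo puppet agent --test --color=false 2>&1 | missing_parents
```

For the log above this would list /etc/firejail, /etc/php5/apache2 and /home/l10nupdate, which is consistent with the resource ordering problems noted in the next section.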

What's wrong here

  1. Trebuchet and all of its baggage, including a bunch of cloned repos
  2. Full MediaWiki scap setup
  3. l10nupdate (via MW scap)
  4. Full MW runtime setup (via MW scap)
  5. Why oh why is hiera('mediawiki::redis_servers::eqiad') embedded in role::memcached?
  6. Several resource ordering issues on initial provision