Homer
Homer (previously jnt) is our homemade network configuration manager.
It takes variables from Netbox and YAML files and runs them through Jinja templates to generate Juniper-compatible configuration.
Homer can then send those configurations to selected network devices, either to show a diff or to perform a safe commit.
The tool is written not to be Wikimedia-specific. It currently supports only Junos, but can easily be extended to other platforms.
Its documentation is available at https://doc.wikimedia.org/homer/master/
Its code is on Gerrit: https://gerrit.wikimedia.org/g/operations/software/homer
Bugs and feature requests are tracked on Phabricator: https://phabricator.wikimedia.org/tag/homer/
This page focuses on Wikimedia's deployment.
Deployment
Homer is deployed via Puppet and a cookbook to the cluster management hosts (cumin1002.eqiad.wmnet, cumin2002.codfw.wmnet).
Its deploy repository is at https://gerrit.wikimedia.org/g/operations/software/homer/deploy
Its Puppet module is at https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/production/modules/homer
It is also available on PyPI: https://pypi.org/project/homer/
Releasing a new version
If the only thing to release is the WMF plugin in the homer-deploy repository, you can do a minimal deploy without going through the whole process: merge the plugin changes in the deploy repository, then skip to the "In the deployment server" step below to pull the changes there and deploy.
In the homer repository
- Make a release patch updating the CHANGELOG file (see this example patch).
- Once CI passes, meaning that the documentation can be generated correctly, +2 it and let CI merge it.
- Update the local checkout and create a git tag, ideally a signed, annotated one (this requires a GPG key and git configured to use it, see the user.signingkey git option):
$ RELEASE=v0.1.0
$ git tag -s -a "${RELEASE}" -m "${RELEASE}" -m "[Release Notes](CHANGELOG.rst)"
- Push the generated tag:
git push origin "${RELEASE}"
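A malformed RELEASE value is easy to miss before tagging. The snippet below is a minimal sketch, not part of Homer: check_release is a hypothetical helper that assumes releases always follow the vX.Y.Z pattern used in the examples on this page, and it can be used to guard the tag command.

```shell
# Hypothetical helper: succeed only for vMAJOR.MINOR.PATCH (e.g. v0.1.0).
check_release() {
    if printf '%s\n' "$1" | grep -Eqx 'v[0-9]+\.[0-9]+\.[0-9]+'; then
        return 0
    fi
    echo "Invalid release '$1', expected vX.Y.Z" >&2
    return 1
}

# Usage:
#   RELEASE=v0.1.0
#   check_release "${RELEASE}" && git tag -s -a "${RELEASE}" -m "${RELEASE}" -m "[Release Notes](CHANGELOG.rst)"
```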
In the homer-deploy repository
Update the src/ submodule with the new code:
$ cd src/
$ git fetch
$ git checkout "${RELEASE}"
$ git log -1 # to check to be at the right commit
$ cd ..
# At this point git status would show that there is a diff for the 'src' path, indicating the different SHA1 of the git submodule
(If the git checkout doesn't work, you can try: git submodule update --init --recursive --remote)
Now generate the new wheels:
# Ensure that docker is running
# Follow the instructions in the README file
$ cat README.md
# Verify that the generated wheels are correct
# At this point the frozen-requirements-bullseye.txt file will most likely have some changes and the artifacts/artifacts.bullseye.tar.gz will be different
$ git add .
$ git commit -m "Release ${RELEASE}"
$ git review
- Once you have checked that it's OK, merge it on Gerrit (C+2, V+2, then submit).
In the deployment server
Now move to the deployment server and run the commands below. If you are only upgrading the WMF plugin, you just need to run git pull in the repository and verify that git status is clean; there is no need to touch the src directory in that case.
$ RELEASE=v0.0.1 # UPDATE IT TO THE CURRENT RELEASE
$ cd /srv/deployment/homer/deploy
$ git pull
$ cd src/
$ git fetch --tags # make sure the new release tag is available locally
$ git checkout "${RELEASE}"
$ cd ..
$ git status # It should be clean without local modification
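The two checks above (src/ on the right tag, clean git status) can be combined into one command. This is a sketch with a hypothetical verify_deploy_checkout helper, assuming the deploy checkout lives at /srv/deployment/homer/deploy and the release tag has been fetched into src/.

```shell
# Hypothetical helper: check that REPO/src is exactly at RELEASE and that
# REPO has no local modifications. Usage: verify_deploy_checkout REPO RELEASE
verify_deploy_checkout() {
    local repo="$1" release="$2" tag
    # --exact-match fails unless HEAD itself is tagged; --tags also accepts
    # lightweight tags.
    if ! tag="$(git -C "${repo}/src" describe --tags --exact-match 2>/dev/null)"; then
        echo "src/ is not at a tagged commit" >&2
        return 1
    fi
    if [ "$tag" != "$release" ]; then
        echo "src/ is at ${tag}, expected ${release}" >&2
        return 1
    fi
    # Any --porcelain output means modified or untracked files.
    if [ -n "$(git -C "$repo" status --porcelain)" ]; then
        echo "${repo} has local modifications" >&2
        return 1
    fi
}

# Usage:
#   verify_deploy_checkout /srv/deployment/homer/deploy "${RELEASE}" && echo OK
```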
On any cumin host
Now move to one of the Cumin hosts (cumin1002.eqiad.wmnet, cumin2002.codfw.wmnet) and deploy the code by running:
$ sudo cookbook sre.deploy.python-code -r 'Release vX.Y.Z' homer 'A:cumin'
- Run a full Homer diff to check that everything works fine:
homer "*" diff
Daily diffs
A cron job runs Homer every 12 hours (once every 24 hours on each cumin host) to compare the live network configuration with our intended state. Any discrepancies are emailed to the rancid-core alias.
Usage
Making changes to Homer is a 2-step process where we first commit the change, and then we deploy it.
Step 1: Making code changes
There are three different ways to modify Homer:
Case 1: Editing the private repository
Manually edit, then commit, the files in ssh://cumin1002.eqiad.wmnet:/srv/homer/private.
Git will sync them to the other cumin host and email a summary of the changes to SREs.
Make sure to mirror all your changes in the mock-private repository: https://gerrit.wikimedia.org/g/operations/homer/mock-private
This repository doesn't have CI, so please be extra careful.
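Because nothing checks the mirroring automatically, a quick comparison of file lists can catch a file you forgot to add to mock-private. A minimal sketch, assuming both checkouts share the same directory layout; missing_in_mock and the mock-private path in the usage line are hypothetical. It only reports missing files, not missing keys inside existing files.

```shell
# Hypothetical helper: print files present in the private checkout ($1)
# but absent from the mock-private checkout ($2), as ./relative/paths.
missing_in_mock() {
    (cd "$1" && find . -type f ! -path './.git/*') | sort |
        while IFS= read -r f; do
            [ -e "$2/$f" ] || printf '%s\n' "$f"
        done
}

# Usage:
#   missing_in_mock /srv/homer/private /path/to/mock-private
```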
Case 2: Editing the public repository
As with our other public repositories, send CRs to https://gerrit.wikimedia.org/g/operations/homer/public and try not to self-+2 your changes without another review. A +2 will automatically merge your change.
Note: Puppet deploys the public repository, so kick off a manual Puppet run on the cumin hosts to grab the latest code.
Its documentation is published at https://doc.wikimedia.org/homer-public/master/.
Case 3: Editing Netbox
Data is also pulled from Netbox; always make sure that Netbox is accurate before using Homer.
Step 2: Deploying the above changes
Note that Homer explicitly asks for confirmation before it modifies the live network configuration (Type "yes" to commit, "no" to abort.) and shows you a diff of the changes beforehand.
There are 2 ways to deploy changes:
- Either use our cluster management hosts (highly recommended)
- Use your local workstation (when you really know what you are doing)
Option 1: Running Homer from cluster management hosts (recommended)
Get familiar with the command line (https://doc.wikimedia.org/homer/master/homer.html); everything else is taken care of.
- Log into one of the cluster management hosts (cumin1002.eqiad.wmnet, cumin2002.codfw.wmnet).
- Kick off a manual Puppet run on the cluster management hosts to grab the latest code
- Run a diff (examples below) to check your changes
- Run a commit (examples below)
Some examples:
$ homer "*" diff                          # All devices
$ homer "cr*ams*" diff                    # esams and knams core routers
$ homer "mr*" commit "My commit message"  # All management routers
Note: When pushing configurations, homer will ssh to the network devices using the Homer user. You need to be in the ops group to be able to use its private key.
Option 2: Running Homer from your local machine (less recommended)
- Clone the public repo: https://gerrit.wikimedia.org/g/operations/homer/public
- Clone the private repo: ssh://cumin1002.eqiad.wmnet:/srv/homer/private
- Clone deploy repo: https://gerrit.wikimedia.org/g/operations/software/homer/deploy
- Install Homer with either:
  - pip install homer
  - or, if you live on the edge, a checkout of https://gerrit.wikimedia.org/g/operations/software/homer followed by python3 setup.py install
- Make the plugins included in the deploy repo available in the Python path:
  - If Homer's code is checked out, create a symlink named homer_plugins in the root of Homer's checkout, pointing to the plugins/ directory of the deploy repo. If both are checked out in the same parent directory, run from within Homer's checkout:
ln -s ../homer-deploy/plugins/ homer_plugins
  - If Homer is installed via pip, find the site-packages directory where Homer is installed, usually something like venv/lib/python3.X/site-packages/, and add a symlink to the plugins there:
ln -s /PATH_TO_DEPLOY_REPO/plugins/ homer_plugins
- Create your configuration file following https://doc.wikimedia.org/homer/master/configuration.html, including the plugin setup: homer_plugins.wmf-netbox
- Get familiar with the command line: https://doc.wikimedia.org/homer/master/homer.html
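For the pip case, finding site-packages and creating the symlink can be scripted. A sketch with a hypothetical link_homer_plugins helper: by default it asks the python3 on PATH (the activated virtualenv's, if any) for its purelib directory via sysconfig, which is where pip installs pure-Python packages; an explicit directory can be passed as a second argument instead. The path in the usage line is a placeholder.

```shell
# Hypothetical helper: link DEPLOY_REPO/plugins as "homer_plugins" into
# site-packages. Usage: link_homer_plugins DEPLOY_REPO [SITE_PACKAGES_DIR]
link_homer_plugins() {
    local deploy_repo site_packages
    # Absolute path, so the symlink works regardless of where it lives.
    deploy_repo="$(cd "$1" && pwd)" || return 1
    # Default: purelib (site-packages) of the python3 found on PATH.
    site_packages="${2:-$(python3 -c 'import sysconfig; print(sysconfig.get_paths()["purelib"])')}" || return 1
    # -n replaces an existing homer_plugins symlink instead of writing inside it.
    ln -sfn "${deploy_repo}/plugins" "${site_packages}/homer_plugins" || return 1
    printf '%s\n' "${site_packages}/homer_plugins"
}

# Usage (inside the activated virtualenv):
#   link_homer_plugins ~/src/homer-deploy
```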
When pushing configurations, your machine will ssh directly to the network devices, which means that you need an account there with the proper permissions.
It's common to test a change locally with the "diff" action. Once satisfied with the result, please merge your change on Gerrit before pushing it with the "commit" action.
Style guides
YAML files
We use JSON Schema both to prevent mistakes in the configuration and to document it. See https://doc.wikimedia.org/homer-public/master/
Templates
https://j2live.ttl255.com/ is a useful tool to test Jinja snippets.
Capirca (ACL generation)
Task: https://phabricator.wikimedia.org/T273865
Capirca is an actively maintained open source tool made by Google to generate multi-platform ACLs based on generic policy and definitions files.
How does it work?
- The user edits the relevant files (see below)
- The user also runs https://netbox.wikimedia.org/extras/scripts/capirca.GetHosts/
- The user runs Homer
- Homer pulls the host definitions from Netbox
- Homer executes Capirca for each relevant policy file (defined in homer-{public|private}/config/{devices|roles}.yaml)
- Capirca takes all the (hosts/services) definition files as input, as well as the policy files (following their includes), and generates the firewall rules in the proper format
- Homer adds the generated rules to the other parts of the generated configuration and pushes it to the device
Advantages
- IPv4 and IPv6 filters will be updated automatically as long as the hosts have both v4 and v6 records
- Limited blast radius if a mistake is made in a given .inc policy file
- Centralized services (ports) definitions in a text file
- Hosts definitions synced up from Netbox
- Same syntax for all platforms (Juniper and JuniperSRX in our case)
- Shading detection (eg. useless rules hidden behind a more generic one)
- Reduced operational complexity
- Easier to audit
Limitations
- Dependency on Netbox
- Increased setup complexity
- Doesn't distinguish between v4 and v6 prefix-lists, which means that to auto-generate both the v4 and v6 filters we have to specify both prefix-lists:
[edit firewall family inet filter loopback4 term return-tcp from source-prefix-list] wikimedia4 { ... } + wikimedia6;
- Use prefix-lists when the prefixes are also used in routing (eg. BGP) rules, so that they're only defined once
- Jinja2 can't be applied to Capirca files, so filters like ping-offload need to stay out of Capirca
- Netbox definitions need to be manually updated by running https://netbox.wikimedia.org/extras/scripts/capirca.GetHosts/
- Upstream issues:
How to use it?
Update an existing ACL
- Browse homer-{public|private}/policies
- Find the relevant .pol or .inc file (eg. cr-analytics.pol)
- Update it (use existing rules and the guidelines below as models)
Guidelines
term my-term {
comment:: "T123456" # Don't forget the quotes
destination-address:: foo # from either static.net or Netbox
destination-port:: bar # From the services.svc file
action:: deny
}
term allow_rest {
action:: accept # All our platforms have a default deny
}
- {source|destination}-port:: values are defined in the homer-public/definitions/services.svc file; add yours if it's not already there. Entries are ordered by port number.
- Most {source|destination}-address:: values are pulled from Netbox and grouped by their hostname prefix (eg. all aqs* hosts are under aqs_group).
- To update a group (eg. when provisioning or decommissioning a host), run https://netbox.wikimedia.org/extras/scripts/capirca.GetHosts/ then run Homer.
- For network prefixes and special IPs (eg. VIPs), add them to homer-public/definitions/static.net.
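When writing a term, it helps to guess the group name to reference for a host. The snippet below is illustrative only and assumes the naming shown above (short hostname without its trailing digits, plus _group); capirca_group_for_host is hypothetical, and the output of the capirca.GetHosts script remains the authoritative list of groups.

```shell
# Hypothetical helper approximating the Netbox grouping by hostname prefix.
capirca_group_for_host() {
    local short="${1%%.*}"    # drop the domain: aqs1010.eqiad.wmnet -> aqs1010
    # Replace trailing digits (possibly none) with the suffix: aqs1010 -> aqs_group
    printf '%s\n' "$short" | sed 's/[0-9]*$/_group/'
}

# Usage:
#   capirca_group_for_host aqs1010.eqiad.wmnet   # prints aqs_group
```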
Add a new ACL (firewall filter)
Most likely done by Netops.
- Create a .pol file in homer-{public|private}/policies (eg. my-filter.pol), see the guidelines below.
- Reference the above policy file in either:
  - homer-{public|private}/config/{devices|roles}.yaml (recommended):
    capirca:
      - my-filter  # The policy file name without the extension
  - another .pol file, with: #include 'my-filter.pol'
Guidelines
- Example Juniper headers:
header {
  comment:: "foobar"
  target:: juniper my-filter4 inet
  target:: juniper my-filter6 inet6
}
# juniper: platform (which final syntax to use)
# my-filter4: the Juniper filter name that will be generated
# inet: IP family to target (inet for IPv4, inet6 for IPv6)
- Example SRX security policy headers:
header {
  comment:: "Generated by Capirca"
  target:: srx from-zone production to-zone production address-book-global
}
# srx: platform
# from-zone/to-zone: security zones (they need to already exist)
# address-book-global: use the global address book (default everywhere in our infra)
- If you need different terms for the v4 and v6 policies, put all the common terms in a .inc file, then include it before/after the specific term. For example:
header {
  target:: juniper border-in4 inet
}
term offload-ping {
  verbatim:: juniper "term offload-ping4 {"
  verbatim:: juniper " filter offload-ping4;"
  verbatim:: juniper "}"
}
#include 'cr-border-in.inc'
header {
  target:: juniper border-in6 inet6
}
#include 'cr-border-in.inc'
- If a specific Juniper syntax is not supported by Capirca, use the verbatim:: keyword; its content is copied as-is.
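The v4/v6 .inc pattern above can be scaffolded for a new filter. This is a sketch with a hypothetical new_filter_policy helper (not part of Homer or Capirca) that writes NAME.pol with one Juniper header per address family, both including NAME.inc; the comment text is a placeholder to adapt.

```shell
# Hypothetical helper: scaffold DIR/NAME.pol (v4 + v6 headers) and DIR/NAME.inc.
# Usage: new_filter_policy NAME POLICIES_DIR
new_filter_policy() {
    local name="$1" dir="$2"
    local pol="${dir}/${name}.pol"
    if [ -e "$pol" ]; then
        echo "${pol} already exists" >&2
        return 1
    fi
    # One header per address family; both include the same shared terms.
    cat > "$pol" <<EOF
header {
  comment:: "${name} filter"
  target:: juniper ${name}4 inet
}
#include '${name}.inc'

header {
  target:: juniper ${name}6 inet6
}
#include '${name}.inc'
EOF
    # Shared terms go in NAME.inc; fill it with terms before running Homer.
    touch "${dir}/${name}.inc"
}

# Usage:
#   new_filter_policy my-filter homer-public/policies
```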
Common errors
Error parsing cr: No such service, foo
- There is a {source|destination}-port:: foo in cr.pol (or one of its child includes) that is not defined in services.svc.
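To check a service name before running Homer, a lookup in services.svc is enough. A minimal sketch, assuming the usual Capirca layout where each definition starts a line as NAME = port/protocol and continuation lines carry no "="; svc_defined is a hypothetical helper.

```shell
# Hypothetical helper: exit 0 if service $1 is defined in file $2.
svc_defined() {
    awk -v name="$1" '
        /^[ \t]*#/ { next }                     # skip comment lines
        index($0, "=") > 0 {
            key = substr($0, 1, index($0, "=") - 1)
            gsub(/[ \t]/, "", key)              # trim "NAME " -> "NAME"
            if (key == name) { found = 1; exit }
        }
        END { exit !found }
    ' "$2"
}

# Usage:
#   svc_defined foo homer-public/definitions/services.svc || echo "foo is not defined"
```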
Error parsing cr-analytics: UNDEFINED: puppetmaster
- There is a {source|destination}-address:: puppetmaster in cr-analytics.pol (or one of its child includes) that is not defined in static.net or Netbox.
- See https://netbox.wikimedia.org/extras/scripts/#script.GetHosts, "Last run", then the "output" tab to see the Netbox-generated definitions.
Error parsing cr: ERROR on "udp" (type STRING, line 38, Next 'destination-port')
Error parsing cr-analytics: ERROR on "T274951" (type STRING, line 312, Next 'destination-address')
- The most common cause is forgetting the double colon :: or the quotes around a comment.
- Note that the error shows the next line, and the line numbers don't always match if there are includes.
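A rough pre-flight check can flag both causes before Capirca does. This is a heuristic sketch, not a Capirca parser: the hypothetical lint_policy helper flags keyword lines that use a single colon and comment:: values that don't start with a double quote, and it may miss other syntax errors.

```shell
# Hypothetical helper: print "file:line: problem: text" for suspicious lines
# in the given policy files; exit 1 if anything was flagged.
lint_policy() {
    awk '
        /^[ \t]*#/ { next }                     # skip comments and include lines
        # A keyword followed by a single colon, e.g. destination-port: foo
        /^[ \t]*[a-z-]+:[^:]/ || /^[ \t]*[a-z-]+:$/ {
            print FILENAME ":" FNR ": missing \"::\": " $0; bad = 1
        }
        # comment:: whose value does not start with a double quote
        /comment::/ && !/comment::[ \t]*"/ {
            print FILENAME ":" FNR ": unquoted comment: " $0; bad = 1
        }
        END { exit bad }
    ' "$@"
}

# Usage:
#   lint_policy homer-public/policies/*.pol homer-public/policies/*.inc
```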
Multiple definitions found for service: git-ssh.eqiad
- The service is defined twice, either in services.svc or in the network definitions (static.net or Netbox).
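Duplicates in the on-disk definition files can be listed in one pass. A sketch under the same NAME = ... layout assumption as for services.svc; duplicate_definitions is hypothetical, and Netbox-generated definitions are not on disk, so they are not covered (check the GetHosts script output for those).

```shell
# Hypothetical helper: print every definition name that appears more than
# once across the given .svc/.net files.
duplicate_definitions() {
    awk '
        /^[ \t]*#/ { next }                     # skip comment lines
        index($0, "=") > 0 {
            key = substr($0, 1, index($0, "=") - 1)
            gsub(/[ \t]/, "", key)
            if (key != "") print key
        }
    ' "$@" | sort | uniq -d
}

# Usage:
#   duplicate_definitions homer-public/definitions/services.svc homer-public/definitions/static.net
```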
References
Capirca policy format (which keywords are accepted?)
Network configuration coverage
CR
TODO
chassis {} (partial)
routing-options {} # TODO: statics
protocols {
router-advertisement {}
bgp {} # TODO: confed. IXPs are out of scope (dedicated tool like peering-manager)
}
MR
TODO
routing-instances {}
CLOUDSW
TODO
bgp {}
routing-options {}
Common/known issues
(Almost) None.
- The "commit" action will not work on the first try with the mr1* devices, but homer will retry.