User:Eevans/Notes/JavaHeapAnalysis
Using the Eclipse Memory Analyzer
When analyzing a Java heap, you may need nearly as much available memory as the size of the heap dump. This can be a problem because heap sizes can run into the tens of GB (on the RESTBase cluster at least), which is definitely more than I have to spare on local workstations. To make matters worse, most of the available tools require some sort of GUI.
What follows is a recipe for running the Eclipse Memory Analyzer on a remote machine over VNC.
Note: If you are provisioning an instance in labs, you may need to request a quota increase for your project to accommodate it.
Preparing the remote machine
$ sudo apt-get update && sudo apt-get -y upgrade
...
$ sudo apt-get install xfce4 xfce4-goodies gnome-icon-theme tightvncserver xfonts-base
# Save as fabfile.py
# fab -H mat.services-testbed.eqiad.wmflabs setup
from fabric.api import sudo, env

env.use_ssh_config = True


def setup():
    # Install a JDK, a lightweight desktop, and the VNC server
    sudo("apt-get update")
    sudo("apt-get -y upgrade")
    sudo("apt-get -y install openjdk-8-jdk xfce4 xfce4-goodies gnome-icon-theme tightvncserver xfonts-base unzip")
    # Not idempotent: lvcreate fails if the volume already exists
    vg = sudo("vgdisplay |grep 'VG Name' |awk '{ print $3; }'")
    sudo("lvcreate -L 100G -n extra {}".format(vg))
    sudo("mkfs.ext4 /dev/mapper/{}-extra".format(vg))
    sudo("mount /dev/mapper/{}-extra /mnt".format(vg))
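The storage steps in the fabfile fail on a second run because the volume already exists. A minimal sketch of how they could be guarded is below. The function name `storage_commands` and its parameters are made up for illustration; it only decides which commands are still needed. How `lv_exists` and `mounted` get determined is left open (one assumption would be probing with `lvs` and `mountpoint -q` on the remote host).

```python
def storage_commands(vg, lv_exists, mounted, lv="extra", size="100G", mountpoint="/mnt"):
    """Return the shell commands still needed to get the extra volume mounted.

    vg         -- volume group name, e.g. "vd"
    lv_exists  -- True if the logical volume has already been created
    mounted    -- True if something is already mounted on mountpoint
    """
    device = "/dev/mapper/{}-{}".format(vg, lv)
    commands = []
    if not lv_exists:
        # Only create and format the volume once; mkfs would wipe existing data.
        commands.append("lvcreate -L {} -n {} {}".format(size, lv, vg))
        commands.append("mkfs.ext4 {}".format(device))
    if not mounted:
        commands.append("mount {} {}".format(device, mountpoint))
    return commands
```

In the fabfile, each returned string could be passed to `sudo()` in order; on a fully provisioned machine the list is empty and nothing runs.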
Starting VNC
$ vncserver
...
The first time you run the server, you will be prompted for a password (remember it; you will need it when connecting). By default, the server listens on port 5901.
Port-forward
If running from a labs instance, you won't be able to connect to VNC directly from your local machine, so SSH into the instance with port 5901 forwarded.
$ ssh -L 5901:localhost:5901 mat.services.eqiad.wmflabs
Profit
Connect using a VNC client to the locally forwarded port.
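If the VNC client refuses to connect, it can help to check whether anything is answering on the forwarded port at all. The following is a small sketch using only the standard library; the helper name `port_is_open` is hypothetical, and the defaults assume the tunnel from the previous step (localhost, port 5901).

```python
import socket


def port_is_open(host="localhost", port=5901, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError on refusal, timeout, or DNS failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A `False` result usually means either the SSH session with the port forward has gone away, or `vncserver` is not running on the remote end. A `True` result only shows that something accepts connections on the port, not that it is VNC.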
Shaving The Yak
In most instances, analyzing the heap in situ won't be an option, which means copying the dump elsewhere. The only ready-to-go way of doing this is rsync-over-ssh, and since agent forwarding is disabled on the WMF production network, this means copying the heap to your workstation over the WAN (of course, this won't be an option if the heap may contain PII). Once on your workstation, you can copy the heap dump over the WAN again to a Labs VM for analysis.
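The two-hop copy can be sketched as a pair of rsync-over-ssh invocations, built here as argv lists so they can be inspected before running. The function names, the production hostname, and the paths in the usage note are placeholders rather than real values; `--partial` is included on the assumption that keeping partially transferred files is worthwhile for multi-gigabyte transfers over the WAN.

```python
import posixpath


def rsync_argv(src, dest):
    """Build an rsync-over-ssh command line as an argv list."""
    # --partial keeps partially transferred files if the transfer is interrupted;
    # --progress prints transfer status.
    return ["rsync", "--archive", "--partial", "--progress", "-e", "ssh", src, dest]


def two_hop_copy(dump_path, prod_host, labs_host, local_dir, labs_dir):
    """Return the two commands: production -> workstation, workstation -> Labs."""
    local_dir = local_dir.rstrip("/")
    labs_dir = labs_dir.rstrip("/")
    local_path = posixpath.join(local_dir, posixpath.basename(dump_path))
    return [
        rsync_argv("{}:{}".format(prod_host, dump_path), local_dir + "/"),
        rsync_argv(local_path, "{}:{}/".format(labs_host, labs_dir)),
    ]
```

For example, `two_hop_copy("/path/to/heap.hprof", "prod-host.example", "mat.services.eqiad.wmflabs", "/tmp/dumps", "/mnt")` returns two lists that could each be handed to `subprocess.check_call` on the workstation.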
If your heap sizes are, say, 12G (as they are in the RESTBase environment), you'll need an m1.xlarge instance with 16G of memory. Chances are that you'll have to request an increase in quota to accommodate such a large instance. The Labs team has a process that requires some review, so if this occurs near a holiday, the request may be delayed accordingly. The increase will only be temporary, so make sure to let Labs know when you're done, so that they can revert the quota.
Storage
If you have more than one heap dump to analyze, get ready for some additional calisthenics, because an m1.xlarge instance only has 20G of storage. Fortunately, each VM has a 140G volume group available:
# vgdisplay
--- Volume group ---
VG Name vd
System ID
Format lvm2
Metadata Areas 1
Metadata Sequence No 2
VG Access read/write
VG Status resizable
MAX LV 0
Cur LV 1
Open LV 0
Max PV 0
Cur PV 1
Act PV 1
VG Size 140.50 GiB
PE Size 4.00 MiB
Total PE 35967
Alloc PE / Size 25600 / 100.00 GiB
Free PE / Size 10367 / 40.50 GiB
VG UUID tk94FM-NTas-Vixb-zeeI-j3ue-lEIt-Cc5CL1
You can create a new volume from this group, format, and mount it to create some additional space:
# lvcreate -L 100G -n extra vd
# mkfs.ext4 /dev/mapper/vd-extra
mke2fs 1.42.12 (29-Aug-2014)
Creating filesystem with 26214400 4k blocks and 6553600 inodes
Filesystem UUID: 85012615-cb21-4204-8e39-eed1f3d862b9
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424, 20480000, 23887872
Allocating group tables: done
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done
# mount /dev/mapper/vd-extra /mnt
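Before copying another dump onto the new volume, a quick free-space check can save a failed transfer. The sketch below uses `shutil.disk_usage`; the helper name `has_room_for` and the 1.5x headroom factor are assumptions, the latter chosen because the Memory Analyzer writes index files next to the dump when it parses it.

```python
import os
import shutil


def has_room_for(dump_path, target_dir, headroom=1.5):
    """Return True if target_dir has at least headroom * size(dump_path) bytes free."""
    needed = int(os.path.getsize(dump_path) * headroom)
    free = shutil.disk_usage(target_dir).free
    return free >= needed
```

Here `dump_path` is a dump already on the local machine and `target_dir` would be `/mnt` in the setup above.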
Irony avoidance
When working with large heaps, the Eclipse Memory Analyzer can easily OOM (resulting in another heap dump). To avoid this, you need to increase the max heap size, like so:
$ mat/MemoryAnalyzer -vmargs -Xmx12g
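The `-Xmx` value can be derived from the dump itself, following the rule of thumb from the top of this page (roughly as much memory as the size of the dump). This is only a sketch: the helper name `mat_argv` is hypothetical, rounding up to whole gigabytes is an arbitrary choice, and the result still has to fit within the instance's memory.

```python
import math

GIB = 1024 ** 3


def mat_argv(dump_size_bytes, launcher="mat/MemoryAnalyzer"):
    """Build a MemoryAnalyzer command line with -Xmx sized to the heap dump."""
    # Round up to a whole number of gigabytes, with a floor of 1g.
    xmx_gib = max(1, math.ceil(dump_size_bytes / GIB))
    return [launcher, "-vmargs", "-Xmx{}g".format(xmx_gib)]
```

For a 12G dump, `mat_argv(12 * GIB)` gives the same command line as above; the size would typically come from `os.path.getsize()` on the dump file.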