Nova Resource Talk:Mwoffliner

From Wikitech

This documentation describes how to set up a virtual machine (VM) that can create ZIM files from Wikimedia projects.

Virtual machine creation

  • Create a new VM with max CPU/Storage in the "mwoffliner" project
  • On wikitech, choose ‘configure instance’ and then select the labs::lvm::srv class. This will create /srv (if nothing happens, run "sudo puppet agent -tv")
  • Ask admin to configure a public IP
  • Create a new hostname "mwofflinerX.wmflabs.org" here

Setup locale

sudo locale-gen en_US.UTF-8
sudo update-locale LANG=en_US.UTF-8
sudo dpkg-reconfigure locales
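
To confirm that the locale was actually generated, the output of `locale -a` can be checked. This is a minimal sketch; `has_locale` is a hypothetical helper name, not part of any package:

```shell
# has_locale NAME: read a locale list (as printed by `locale -a`) on stdin
# and succeed if NAME is in it. `locale -a` usually prints "en_US.utf8",
# so both sides are lowercased and stripped of hyphens before comparing.
has_locale() {
    wanted=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | sed 's/-//g')
    tr '[:upper:]' '[:lower:]' | sed 's/-//g' | grep -qxF "$wanted"
}

# Usage on the VM:
#   locale -a | has_locale en_US.UTF-8 && echo "locale available"
```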

Optimize filesystem

Then reformat the /srv filesystem. This is necessary to get more inodes:

sudo umount /dev/mapper/vd-second--local--disk
sudo mkfs.ext4 -T news /dev/mapper/vd-second--local--disk
sudo mount /srv
sudo rm -rf /srv/lost+found/
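
The effect of `-T news` can be estimated with simple arithmetic. As far as I know, the stock /etc/mke2fs.conf defines the "news" usage type with inode_ratio = 4096 (one inode per 4 KiB), against a default of 16384; check the file on the VM before relying on these numbers. The helper name `estimate_inodes` and the 100 GiB volume size are assumptions for illustration only:

```shell
# estimate_inodes SIZE_BYTES BYTES_PER_INODE
# mke2fs allocates roughly one inode per BYTES_PER_INODE bytes of disk.
estimate_inodes() {
    echo $(( $1 / $2 ))
}

# Example for a hypothetical 100 GiB volume (107374182400 bytes):
estimate_inodes 107374182400 16384   # default ratio -> 6553600
estimate_inodes 107374182400 4096    # "news" ratio  -> 26214400

# On the VM, compare with the real figures:
#   df -i /srv
```

With these assumed ratios the "news" type yields about four times as many inodes, which matters because a dump consists of a very large number of small files.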

As root, put the following code in the crontab:

SHELL=/bin/sh
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
 
# m h  dom mon dow   command
@reboot umount /srv ; mount -o rw,errors=remount-ro,noatime,barrier=0,data=writeback,nobh /srv

Ubuntu basic setup

Perform a dist-upgrade:

sudo apt-get update
sudo apt-get dist-upgrade

Install mandatory packages:

sudo apt-get install bc nginx git htop screen rsync unzip g++ p7zip-full libzim-dev automake libtool pkg-config libmagic-dev \
     redis-server emacs imagemagick advancecomp gifsicle pngquant nscd liblog-log4perl-perl zip cmake uuid-dev \
     texinfo xapian-tools jpegoptim zlib1g-dev iotop libicu-dev icu-devtools

Increase standard system limits

Edit /etc/security/limits.conf and add:

*                -       nofile        unlimited
*                -       stack         unlimited

Edit /etc/sysctl.conf and add the following at the end of the file:

vm.overcommit_memory = 1
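
The sysctl setting can be read back from /proc once it has been applied (for example with `sudo sysctl -p`, or after the reboot at the end of this page). A small sketch; `describe_overcommit` is a hypothetical helper that only translates the numeric mode into words:

```shell
# describe_overcommit MODE: explain a vm.overcommit_memory value.
describe_overcommit() {
    case "$1" in
        0) echo "heuristic overcommit (kernel default)" ;;
        1) echo "always overcommit" ;;
        2) echo "strict accounting, no overcommit" ;;
        *) echo "unknown mode: $1"; return 1 ;;
    esac
}

# Usage on the VM:
#   describe_overcommit "$(cat /proc/sys/vm/overcommit_memory)"
#   ulimit -n    # open-file limit of the current shell
```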

Check out Kiwix code

Clone these online code repositories:

sudo git clone https://github.com/kiwix/kiwix.git /srv/kiwix
sudo git clone https://github.com/kiwix/mwoffliner.git /srv/mwoffliner
sudo git clone https://github.com/kiwix/kiwix-tools /srv/tools
sudo git clone https://github.com/kiwix/maintenance.git /srv/maintenance
sudo git clone https://gerrit.wikimedia.org/r/p/openzim.git /srv/openzim
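
A quick way to see whether every clone succeeded is to look for the `.git` directory in each target. This is only a sketch; `missing_repos` is a hypothetical helper name:

```shell
# missing_repos DIR...: print each DIR that is not a git working copy.
missing_repos() {
    for dir in "$@"; do
        [ -d "$dir/.git" ] || echo "$dir"
    done
}

# Usage on the VM (no output means all clones are present):
#   missing_repos /srv/kiwix /srv/mwoffliner /srv/tools /srv/maintenance
```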

Install Xapian 1.4.1

The latest version of Xapian is required:

wget http://download.kiwix.org/dev/xapian-core-1.4.1.tar.xz
tar -xvf xapian-core-1.4.1.tar.xz
cd xapian-core-1.4.1
./autogen.sh
./configure
make
sudo make install

Install Libgumbo

Gumbo is required by zimwriterfs:

git clone https://github.com/google/gumbo-parser.git
cd gumbo-parser
./autogen.sh
./configure
make
sudo make install

Install and compile zimwriterfs

zimwriterfs is called by mwoffliner.js and transforms an HTML directory into a ZIM file:

cd /srv/openzim/zimwriterfs/
./autogen.sh
./configure
make
sudo make install

Configure redis

Edit /etc/redis/redis.conf and set:

unixsocket /dev/shm/redis.sock
unixsocketperm 777
save ""
appendfsync no
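
After restarting Redis, the Unix socket configured above can be tested with `redis-cli -s`. A minimal sketch, assuming `redis-cli` is installed alongside the server; `redis_socket_ready` is a hypothetical helper name:

```shell
# redis_socket_ready PATH: succeed if PATH is a Unix socket and a Redis
# server answers PING on it.
redis_socket_ready() {
    [ -S "$1" ] || return 1
    [ "$(redis-cli -s "$1" ping 2>/dev/null)" = "PONG" ]
}

# Usage on the VM, after `sudo service redis-server restart`:
#   redis_socket_ready /dev/shm/redis.sock && echo "redis is up"
```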

Compile & install Node.js

mwoffliner.sh needs Node.js (for example 0.12):

cd /tmp
wget http://nodejs.org/dist/v0.12.2/node-v0.12.2.tar.gz
tar -xvf node-v0.12.2.tar.gz
cd node-v0.12.2/
./configure
make
sudo make install
cd /tmp
rm -rf node-v0.12.2*
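
To check that the installed Node.js is recent enough, the output of `node --version` can be compared with a minimum version. This sketch assumes purely numeric dotted versions such as "v0.12.2"; `version_ge` is a hypothetical helper name:

```shell
# version_ge HAVE WANT: succeed if version HAVE >= WANT.
# A leading "v" is stripped and missing fields count as 0.
version_ge() {
    have="${1#v}.0.0"
    want="${2#v}.0.0"
    for field in 1 2 3; do
        h=$(printf '%s' "$have" | cut -d. -f"$field")
        w=$(printf '%s' "$want" | cut -d. -f"$field")
        if [ "$h" -gt "$w" ]; then return 0; fi
        if [ "$h" -lt "$w" ]; then return 1; fi
    done
    return 0
}

# Usage on the VM:
#   version_ge "$(node --version)" 0.12 && echo "Node.js is recent enough"
```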

Install mwoffliner dependencies

mwoffliner is responsible for dumping the HTML from the MediaWiki/Parsoid API:

cd /srv/kiwix-other/mwoffliner/
sudo npm install -g node-gyp
export LINK=g++
npm install

Create download.kiwix.org mirror

This mirror is needed to avoid too much network traffic.

Create a directory (this has to be done only on "mwoffliner1"):

sudo mkdir -p /data/scratch/mwoffliner/download.kiwix.org/

Make a first sync (this has to be done only on "mwoffliner1"):

rsync -vzrlptD --delete download.kiwix.org::download.kiwix.org/dev download.kiwix.org::download.kiwix.org/bin download.kiwix.org::download.kiwix.org/src /data/scratch/mwoffliner/download.kiwix.org

Configure cron (this has to be done only on "mwoffliner1"):

SHELL=/bin/sh
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
 
# m h  dom mon dow   command
*/5 * * * * flock -n /tmp/download.kiwix.org.rsync.lock -c "rsync -vzrlptD --delete download.kiwix.org::download.kiwix.org/dev download.kiwix.org::download.kiwix.org/bin download.kiwix.org::download.kiwix.org/src /data/scratch/mwoffliner/download.kiwix.org"

Create the download.kiwix.org virtualhost configuration at /etc/nginx/sites-available/download_dev_mirror:

server {
  listen 127.0.0.1:80;
  server_name download_dev_mirror;
  location / {
    alias /data/scratch/mwoffliner/download.kiwix.org/;
    autoindex on;
  }
}

... and enable it:

cd /etc/nginx/sites-enabled/
sudo ln -s ../sites-available/download_dev_mirror .

Finally, update /etc/hosts by adding the following line:

127.0.0.1 download_dev_mirror
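
Before reloading nginx it can help to confirm that the hosts entry is in place. The following is a sketch only; `hosts_has_entry` is a hypothetical helper that parses a hosts-format file with awk:

```shell
# hosts_has_entry FILE IP NAME: succeed if FILE maps NAME to IP.
hosts_has_entry() {
    awk -v ip="$2" -v name="$3" '
        $1 ~ /^#/ { next }                   # skip comment lines
        $1 == ip {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break         # stop at a trailing comment
                if ($i == name) found = 1
            }
        }
        END { exit !found }
    ' "$1"
}

# Usage on the VM:
#   hosts_has_entry /etc/hosts 127.0.0.1 download_dev_mirror && echo "hosts entry ok"
#   sudo nginx -t                  # validate the nginx configuration
#   sudo service nginx reload
```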

Make a link (necessary for build_portable_package.sh):

sudo ln -s /data/scratch/mwoffliner/download.kiwix.org/ /srv/

Create upload directories

Prepare upload directories like this:

sudo mkdir -p /srv/upload/zim/
sudo mkdir -p /srv/upload/portable/
sudo mkdir -p /srv/upload/zim2index/wikipedia/
sudo mkdir -p /srv/upload/zim2index/wiktionary/
sudo mkdir -p /srv/upload/zim2index/wikiquote/
sudo mkdir -p /srv/upload/zim2index/wikibooks/
sudo mkdir -p /srv/upload/zim2index/wikisource/
sudo mkdir -p /srv/upload/zim2index/wikinews/
sudo mkdir -p /srv/upload/zim2index/wikiversity/
sudo mkdir -p /srv/upload/zim2index/wikispecies/
sudo mkdir -p /srv/upload/zim2index/wikivoyage/
sudo mkdir -p /srv/upload/tmp
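
The same directory tree can also be created with a loop, which is easier to extend when a project is added. A minimal sketch; `make_upload_dirs` is a hypothetical helper name and the base directory is passed in as an argument:

```shell
# make_upload_dirs BASE: create the upload tree below BASE.
make_upload_dirs() {
    base=$1
    mkdir -p "$base/zim" "$base/portable" "$base/tmp"
    for project in wikipedia wiktionary wikiquote wikibooks wikisource \
                   wikinews wikiversity wikispecies wikivoyage; do
        mkdir -p "$base/zim2index/$project"
    done
}

# Usage on the VM, from a root shell (matching the sudo calls above):
#   make_upload_dirs /srv/upload
```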

Configure rsync

Configure the rsync daemon by putting the following content in /etc/rsyncd.conf:

log file = /var/log/rsync.log
max connections = 15
timeout = 100

[mwofflinerX.wmflabs.org]
path = /srv/upload
comment = kiwix upload directory
list = no
uid = kelson
gid = wikidev
read only = false
hosts allow = 62.210.143.55

To run rsync as a daemon, enable it in /etc/default/rsync:

RSYNC_ENABLE=true

Compile & install kiwix-install

kiwix-install is needed to prepare portable packages:

cd /srv/kiwix-kiwix/
./autogen.sh
./configure --enable-compileall --enable-staticbins --disable-android
cd src/dependencies/
make
cd /srv/kiwix-kiwix
./configure --enable-compileall --enable-staticbins --disable-android
cd src/installer
make
sudo make install

Install kiwix-compact

kiwix-compact is a xapian-compact based tool that compacts the full-text search index. Install it:

cd /srv/kiwix-kiwix/kiwix/
sudo cp kiwix-compact /usr/local/bin/

Configure cron

These are the jobs that run periodically:

SHELL=/bin/sh
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
 
# m h  dom mon dow   command
*/6 * * * * flock -n /tmp/build_portable.lock -c "/srv/kiwix-maintenance/maintenance_tools/build_portable_packages.sh"

Reboot

The VM needs to be rebooted to apply all changes:

sudo reboot