User:Milimetric/Zarchive/Gobblin
Totally untested right now, just notes from when I set this up:
If using a stat box to build, put this in ~/.gradle/gradle.properties:
org.gradle.daemon=true
systemProp.https.proxyHost=webproxy.eqiad.wmnet
systemProp.https.proxyPort=8080
systemProp.http.proxyHost=webproxy.eqiad.wmnet
systemProp.http.proxyPort=8080
systemProp.http.nonProxyHosts=localhost|127.0.0.1
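A minimal sketch of writing those settings from the shell, in case that is easier than editing the file by hand. The function name write_gradle_props is made up for illustration, and the "skip if the proxy host is already there" behaviour is an assumption about what is wanted on reruns.

```shell
# Hypothetical helper: append the proxy settings above to a gradle.properties file.
# $1 is the target path (default: ~/.gradle/gradle.properties).
write_gradle_props() {
  props="${1:-$HOME/.gradle/gradle.properties}"
  mkdir -p "$(dirname "$props")"
  # Skip if the proxy host is already configured, so reruns do not duplicate lines.
  if [ -f "$props" ] && grep -q 'proxyHost=webproxy.eqiad.wmnet' "$props"; then
    return 0
  fi
  cat >> "$props" <<'EOF'
org.gradle.daemon=true
systemProp.https.proxyHost=webproxy.eqiad.wmnet
systemProp.https.proxyPort=8080
systemProp.http.proxyHost=webproxy.eqiad.wmnet
systemProp.http.proxyPort=8080
systemProp.http.nonProxyHosts=localhost|127.0.0.1
EOF
}
```

Calling write_gradle_props with no argument targets ~/.gradle/gradle.properties; passing a path writes somewhere else, which is handy for trying it out first.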
Clone and customize the gobblin flavor:
git clone https://github.com/apache/gobblin.git ~/gobblin
vim ~/gobblin/gobblin-distribution/gobblin-flavor-custom-kafka1.gradle
# make it look like this:
dependencies {
    compile project(':gobblin-example')
    compile project(':gobblin-modules:gobblin-crypto-provider')
    compile project(':gobblin-modules:gobblin-kafka-08')
    compile project(':gobblin-modules:gobblin-kafka-1')
    compile project(':gobblin-modules:google-ingestion')
    compile project(':gobblin-modules:gobblin-elasticsearch')
}
~/gobblin/gradlew -PgobblinFlavor=custom-kafka1 -PhadoopVersion=2.10.1 -x test -x rat -x javadoc build
Then extract the build on a stat box, make a pull file, and run:
# make a setup_gobblin file that looks like this:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/
export HADOOP_HOME=/usr/lib/hadoop
export HADOOP_CONF=/etc/hadoop/conf
source setup_gobblin
(s)cp ~/gobblin/build/gobblin-distribution/distributions/apache-gobblin-incubating-bin-0.15.0.tar.gz <to a stat box somewhere>
<go there>
tar -xvf apache-gobblin-incubating-bin-0.15.0.tar.gz
cd gobblin-dist
./bin/gobblin.sh cli run oneShot -appConf file:///your_pull_file.pull
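A small pre-flight check can catch a forgotten source setup_gobblin before gobblin.sh is run. This is only a sketch: check_gobblin_env is a made-up name, and the assumption that all three variables should point at existing directories is mine, not something Gobblin documents.

```shell
# Hypothetical pre-flight check: confirm the variables exported by setup_gobblin
# are set and point at existing directories before calling gobblin.sh.
check_gobblin_env() {
  status=0
  for name in JAVA_HOME HADOOP_HOME HADOOP_CONF; do
    # Indirect lookup of the variable whose name is in $name (POSIX-compatible).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "$name is not set" >&2
      status=1
    elif [ ! -d "$value" ]; then
      echo "$name=$value is not a directory" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example, check_gobblin_env && ./bin/gobblin.sh cli run oneShot -appConf file:///your_pull_file.pull only starts the job when the environment looks sane.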
# example pull file that works on page_move:
# add/change any classes you want to use as extractors etc. for testing
job.name=GobblinKafkaTest
job.group=GobblinKafka
job.description=Gobblin test for Kafka
job.lock.enabled=false
kafka.brokers=kafka-jumbo1001.eqiad.wmnet:9093
topic.whitelist=eqiad.mediawiki.page-move
source.class=org.apache.gobblin.source.extractor.extract.kafka.MilimetricKafkaDeserializerSource
kafka.deserializer.type=BYTE_ARRAY
gobblin.kafka.consumerClient.class=org.apache.gobblin.kafka.client.Kafka1ConsumerClient$Factory
source.kafka.value.deserializer=org.apache.kafka.common.serialization.ByteArrayDeserializer
source.kafka.security.protocol=SSL
source.kafka.ssl.cipher.suites=TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
# Other ssl settings
#source.kafka.ssl.ca.location=/var/lib/puppet/ssl/certs/ca.pem
#source.kafka.ssl.curves.list=P-256
#source.kafka.ssl.sigalgs.list=ECDSA+SHA256
#source.kafka.ssl.ca.location=/etc/ssl/certs/Puppet_Internal_CA.pem
extract.namespace=org.wikimedia.analytics.test
writer.builder.class=org.apache.gobblin.writer.SimpleDataWriterBuilder
simple.writer.delimiter=\n
writer.file.path.type=tablename
writer.destination.type=HDFS
writer.output.format=txt
data.publisher.type=org.apache.gobblin.publisher.BaseDataPublisher
metrics.reporting.file.enabled=true
metrics.log.dir=/user/milimetric/test_gobblin/metrics
metrics.reporting.file.suffix=txt
bootstrap.with.offset=earliest
fs.uri=hdfs://analytics-hadoop/
writer.fs.uri=hdfs://analytics-hadoop/
state.store.fs.uri=hdfs://analytics-hadoop/
mr.job.max.mappers=2
mr.job.root.dir=/user/milimetric/test_gobblin/working
state.store.dir=/user/milimetric/test_gobblin/state-store
task.data.root.dir=/user/milimetric/test_gobblin/task-data
data.publisher.final.dir=/user/milimetric/test_gobblin/GobblinKafkaTest
data.publisher.metadata.output.dir=/user/milimetric/test_gobblin/task-metadata
# weird jobFailure log towards the end of output, searched but wasn't obvious:
https://github.com/apache/gobblin/search?q=onJobFailure
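When adapting the example pull file, a quick check that the main keys survived the edit may save a failed run. The sketch below is an assumption-heavy illustration: check_pull_file is a made-up name, and the key list is simply picked from the example above, not an authoritative list of what Gobblin requires.

```shell
# Hypothetical check: report keys from the example pull file that are missing
# (or only present as comments) in the file given as $1.
check_pull_file() {
  file="$1"
  missing=0
  for key in job.name kafka.brokers topic.whitelist source.class \
             writer.builder.class data.publisher.final.dir; do
    # A key counts as present when the text before the first "=" equals it exactly,
    # so commented-out lines such as "#kafka.brokers=..." do not match.
    if ! awk -F= -v k="$key" '$1 == k { found = 1 } END { exit !found }' "$file"; then
      echo "missing: $key" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage would be something like check_pull_file your_pull_file.pull, which prints any missing keys to stderr and returns non-zero if there were any.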