Create external table in Hive

Problem

Given several partitioned Avro-formatted files, together with their AVSC schema, we want to create a table in Hive.

We have hundreds of files in a directory partitioned by year and month in the HDFS folder /data/mytable.db/mytable.
The folder structure is:
/data/mytable.db/mytable/Year=2018/month=11
/data/mytable.db/mytable/Year=2018/month=12
/data/mytable.db/mytable/Year=2019/month=1
/data/mytable.db/mytable/Year=2019/month=2
/data/mytable.db/mytable/Year=2019/month=3
/data/mytable.db/mytable/Year=2019/month=4
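
One possible way to approach this, sketched under assumptions: the two helper functions below only print Hive DDL, they do not run it. The table name, the integer partition columns year and month, and the schema location passed as the third argument are illustrative choices, not taken from the original setup.

```shell
# Print a CREATE statement for an external Avro table.
# $1 = table name, $2 = HDFS folder, $3 = URL of the AVSC file (hypothetical).
# Assumption: the columns come from the AVSC file via avro.schema.url.
create_table_ddl() {
  cat <<EOF
CREATE EXTERNAL TABLE IF NOT EXISTS $1
PARTITIONED BY (year INT, month INT)
STORED AS AVRO
LOCATION '$2'
TBLPROPERTIES ('avro.schema.url'='$3');
EOF
}

# Read partition folders from stdin, one per line, and print one
# ALTER TABLE statement per folder. Lines that do not end in
# .../Year=<n>/month=<n> are skipped.
# $1 = table name.
partition_ddl() {
  sed -n -E "s|^(.*/[Yy]ear=([0-9]+)/[Mm]onth=([0-9]+))/?\$|ALTER TABLE $1 ADD IF NOT EXISTS PARTITION (year=\2, month=\3) LOCATION '\1';|p"
}
```

For example, printf '%s\n' /data/mytable.db/mytable/Year=2018/month=11 | partition_ddl mytable prints a single ALTER TABLE statement for the partition year=2018, month=11. The printed statements could then be reviewed and passed to a Hive client.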

How to configure Jenkins without GUI

It is possible to create jobs in Jenkins using Groovy.
You can create a Groovy script file $JENKINS_HOME/init.groovy, or any .groovy file in the directory $JENKINS_HOME/init.groovy.d/, to run some additional things right after Jenkins starts up.

On Ubuntu, $JENKINS_HOME is /var/lib/jenkins/ by default.
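
As a minimal sketch of that mechanism, the function below drops a placeholder script into init.groovy.d. The function name, the file name hello.groovy, and the script body are assumptions for illustration; a real script would contain the job definitions.

```shell
# Install a placeholder init script into $JENKINS_HOME/init.groovy.d/.
# $1 = Jenkins home directory (defaults to the Ubuntu location).
install_init_script() {
  jenkins_home="${1:-/var/lib/jenkins}"
  mkdir -p "$jenkins_home/init.groovy.d"
  # The quoted 'EOF' keeps the shell from expanding ${...} in the Groovy code.
  cat > "$jenkins_home/init.groovy.d/hello.groovy" <<'EOF'
// Placeholder: runs once, right after Jenkins has started.
import jenkins.model.Jenkins
println "init.groovy.d hook ran on Jenkins ${Jenkins.VERSION}"
EOF
}
```

Calling install_init_script with no argument writes to /var/lib/jenkins/init.groovy.d/, which usually requires root or the jenkins user. The script is picked up on the next Jenkins start.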

Exporting Docker images

Sometimes you want to move Docker images from one machine to another and you don't have a proper Docker registry, or it is not available.

Docker provides commands for that:
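
The usual pair for this is docker save and docker load. The wrappers below are a small sketch; the function names and the argument order are illustrative choices.

```shell
# Write an image, with all its layers and history, to a tar archive.
# $1 = image name (for example myimage:latest), $2 = archive path.
save_image() {
  docker save -o "$2" "$1"
}

# Load an archive created by save_image on the target machine.
# $1 = archive path.
load_image() {
  docker load -i "$1"
}
```

For example, run save_image myimage:latest myimage.tar on the source machine, copy the archive over, then run load_image myimage.tar on the target.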

These images can be very large, though. An alternative is to save only the current container, not the image with all its history.
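
A sketch of the container-only variant, using docker export and docker import. The function names and argument order are again illustrative.

```shell
# Dump the filesystem of a container (not its image history) to a tar archive.
# $1 = container name or ID, $2 = archive path.
export_container() {
  docker export -o "$2" "$1"
}

# Create a new single-layer image from such an archive.
# $1 = archive path, $2 = name for the new image (for example myimage:flat).
import_image() {
  docker import "$1" "$2"
}
```

Note that an image created with docker import does not keep metadata such as CMD or ENTRYPOINT, so the command has to be given explicitly when starting a container from it.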

Remove line breaks in CSV

Problem:
You have a CSV file whose entries contain line breaks, and you need one line per entry.

Solution:

A simple solution could be:
sed ':a;N;$!ba;s/\r\n/ /g' myfile.csv > myfile_no_nline.csv
This works if the line breaks inside the CSV fields are Windows line breaks (\r\n) and the CSV entries themselves end with UNIX line breaks (\n). Each embedded \r\n is replaced by a space.
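
To see what the sed command does, it can be wrapped in a small filter and fed a two-entry sample. This is a sketch assuming GNU sed; the function name is made up for the example.

```shell
# Read CSV text from stdin and replace every embedded \r\n with a space.
# The sed script first joins all lines into one buffer (:a;N;$!ba),
# then substitutes only the \r\n pairs, so plain \n entry ends are kept.
join_crlf_breaks() {
  sed ':a;N;$!ba;s/\r\n/ /g'
}
```

For example, printf '1,"hello\r\nworld"\n2,"ok"\n' | join_crlf_breaks prints two lines: 1,"hello world" and 2,"ok".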

But normally you are not so lucky. If the line breaks in your CSV are inside quoted fields ("), you can use the following command (it needs gawk, because RT is a gawk extension):
cat myfile.csv | gawk -v RS='"' 'NR % 2 == 0 { gsub(/\n/, "") } { printf("%s%s", $0, RT) }' > myfile_no_nline.csv
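
The gawk command splits the input at every double quote, so even-numbered pieces are the text inside quotes. A sketch of the same command as a stdin filter follows; the function name is made up for the example.

```shell
# Read CSV text from stdin and delete line breaks that sit inside quotes.
# RS='"' makes every double quote a record separator, so records 2, 4, 6, ...
# are the quoted parts. RT holds the separator that ended the record, which
# puts the quotes back on output.
join_quoted_breaks() {
  gawk -v RS='"' 'NR % 2 == 0 { gsub(/\n/, "") } { printf("%s%s", $0, RT) }'
}
```

For example, printf '1,"hello\nworld"\n2,"ok"\n' | join_quoted_breaks prints two lines: 1,"helloworld" and 2,"ok". The line break is removed without a replacement character; use gsub(/\n/, " ") instead if a space is wanted.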