Ignoring updates in git

Sometimes you commit a file such as password.properties with fake values. Later you want to change the content of that file locally and be sure those changes are never pushed to the remote.

To do that, follow these steps:

  1. Create the fake password.properties with fake data.
  2. Commit the changes
  3. Add password.properties to your .gitignore
  4. Avoid future git updates with: git update-index --assume-unchanged password.properties
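
The steps above can be tried end to end in a throwaway repository. This is only a sketch: the temporary directory, the demo identity and the file contents are made up for illustration.

```shell
# Work in a scratch repository so nothing real is touched.
repo="$(mktemp -d)"
cd "$repo"
git init -q .
git config user.email "demo@example.com"   # placeholder identity
git config user.name "Demo"

# Steps 1 and 2: create the file with fake data and commit it.
printf 'password=fake\n' > password.properties
git add password.properties
git commit -q -m "Add placeholder password.properties"

# Step 3: add the file to .gitignore.
printf 'password.properties\n' >> .gitignore
git add .gitignore
git commit -q -m "Ignore password.properties"

# Step 4: tell git to assume the tracked file never changes.
git update-index --assume-unchanged password.properties

# A local edit with the real value is no longer reported by git.
printf 'password=real-secret\n' > password.properties
status_output="$(git status --porcelain)"
echo "git status reports: [${status_output}]"
```

To start tracking changes again, the flag can be reverted with git update-index --no-assume-unchanged password.properties.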

Create external table in Hive

Problem

Given several partitioned Avro files, together with their AVSC schema, we want to create a table in Hive.

We have hundreds of files in the HDFS folder /data/mytable.db/mytable, partitioned by year and month.
The folder structure is:
/data/mytable.db/mytable/Year=2018/month=11
/data/mytable.db/mytable/Year=2018/month=12
/data/mytable.db/mytable/Year=2019/month=1
/data/mytable.db/mytable/Year=2019/month=2
/data/mytable.db/mytable/Year=2019/month=3
/data/mytable.db/mytable/Year=2019/month=4
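
One possible way to register existing directories like these is an explicit ALTER TABLE ... ADD PARTITION statement per folder. The sketch below only generates those statements from two of the paths listed above. The table name mytable, the partition column names year and month, and the quoting of the values are assumptions that depend on how the table is actually declared.

```shell
# Hypothetical table name; adjust to the real table.
table="mytable"

# Two of the partition folders from the listing above.
paths="/data/mytable.db/mytable/Year=2018/month=11
/data/mytable.db/mytable/Year=2019/month=1"

# Build one ADD PARTITION statement per folder.
ddl="$(printf '%s\n' "$paths" | while IFS= read -r path; do
  month_dir="${path##*/}"     # e.g. month=11
  parent="${path%/*}"
  year_dir="${parent##*/}"    # e.g. Year=2018
  year="${year_dir#*=}"       # value after the equals sign
  month="${month_dir#*=}"
  printf "ALTER TABLE %s ADD IF NOT EXISTS PARTITION (year='%s', month='%s') LOCATION '%s';\n" \
    "$table" "$year" "$month" "$path"
done)"

printf '%s\n' "$ddl"
```

The generated statements could then be saved to a file and run with the Hive client once the external table itself exists.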


How to configure Jenkins without GUI

It is possible to create the jobs in Jenkins using Groovy.
You can create a Groovy script file $JENKINS_HOME/init.groovy, or any .groovy file in the directory $JENKINS_HOME/init.groovy.d/. Jenkins runs these scripts right after it starts up.

On Ubuntu, $JENKINS_HOME defaults to /var/lib/jenkins/
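
As a minimal sketch of the mechanism, the commands below drop a trivial hook script into init.groovy.d. The file name hello.groovy and its content are hypothetical, and JENKINS_HOME falls back to a temporary directory so the sketch can be tried without touching a real installation.

```shell
# Use the real JENKINS_HOME if it is set, otherwise a scratch directory.
JENKINS_HOME="${JENKINS_HOME:-$(mktemp -d)}"
hook_dir="$JENKINS_HOME/init.groovy.d"
mkdir -p "$hook_dir"

# Hypothetical hook script; it only prints a line to the Jenkins log.
cat > "$hook_dir/hello.groovy" <<'EOF'
// Placeholder: a real script would create or configure jobs here.
println 'init.groovy.d hook executed'
EOF

ls "$hook_dir"
```

On a real installation the script would take effect on the next Jenkins restart.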


Exporting Docker images

Sometimes you want to move Docker images from one machine to another and you don't have a proper Docker registry, or it is not available.

Docker provides the docker save and docker load commands for that.
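
A minimal sketch of moving an image through a tar archive, wrapped in two small helper functions. The function names, and the image and file names in the usage note, are placeholders.

```shell
# Write an image, with all its layers and history, to a tar archive.
# Usage: save_image <image[:tag]> <archive.tar>
save_image() {
  docker save -o "$2" "$1"
}

# Load an image from a tar archive created by docker save.
# Usage: load_image <archive.tar>
load_image() {
  docker load -i "$1"
}
```

On the source machine this would be called as save_image my-image:latest my-image.tar. After copying the archive to the target machine, load_image my-image.tar makes the image available there.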

But these images can be really big. An alternative is to save only the current container, not the image with all its history.
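
For the container-only alternative, Docker has the docker export and docker import commands. The sketch below wraps them in helper functions. The function names, and the container and image names in the usage note, are placeholders.

```shell
# Write the filesystem of a container, without layer history, to a tar archive.
# Usage: export_container <container> <archive.tar>
export_container() {
  docker export -o "$2" "$1"
}

# Create a new single-layer image from an exported archive.
# Usage: import_container <archive.tar> <image[:tag]>
import_container() {
  docker import "$1" "$2"
}
```

A call could look like export_container my-container my-container.tar, followed on the target machine by import_container my-container.tar my-image:flat. Note that an imported image does not carry over metadata such as the original CMD or ENTRYPOINT, so these may need to be set again.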


Remove line breaks in CSV

Problem:
You have a CSV file whose entries contain line breaks, and you need one line per entry.

Solution:

If the line breaks inside the CSV fields are Windows line breaks (\r\n) and the entries themselves end with UNIX line breaks (\n), a simple solution with GNU sed is:
sed ':a;N;$!ba;s/\r\n/ /g' myfile.csv > myfile_no_nline.csv
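
To see what the sed command does, it can be run on a small sample. The sample content and the temporary directory are made up for illustration, and GNU sed is assumed.

```shell
workdir="$(mktemp -d)"

# Sample file: entry 1 contains a Windows line break inside its second field,
# while the entries themselves end with UNIX line breaks.
printf 'id,comment\n1,first\r\nsecond\n2,plain\n' > "$workdir/myfile.csv"

# Read the whole file into the pattern space, then replace every \r\n with a space.
sed ':a;N;$!ba;s/\r\n/ /g' "$workdir/myfile.csv" > "$workdir/myfile_no_nline.csv"

cat "$workdir/myfile_no_nline.csv"
```

The broken entry comes out as the single line 1,first second, while the other lines are unchanged.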

But normally you are not so lucky. If the line breaks are inside quoted fields ("), you can use the following command instead:
gawk -v RS='"' 'NR % 2 == 0 { gsub(/\n/, "") } { printf("%s%s", $0, RT) }' myfile.csv > myfile_no_nline.csv