Create external table in Hive

Problem

Given several partitioned AVRO formatted files, together with the AVSC schema, we want to create a table in Hive.

We have hundreds of files in a directory partitioned by year and mont in the HDFS folder /data/mytable.db/mytable.
The folder structure is:
/data/mytable.db/mytable/Year=2018/month=11
/data/mytable.db/mytable/Year=2018/month=12
/data/mytable.db/mytable/Year=2019/month=1
/data/mytable.db/mytable/Year=2019/month=2
/data/mytable.db/mytable/Year=2019/month=3
/data/mytable.db/mytable/Year=2019/month=4

Solution

Assuming that the AVSC file is: /data/schemas/mytable_schema.avsc
and that the AVRO files are in: /data/mytable.db/mytable

We can create a Hive table in the mydb schema with reference to the former files:

CREATE EXTERNAL TABLE `mydb.mytable`(
`name` string COMMENT '',
`surname` string COMMENT '',
`phone` string COMMENT '',
`score` int COMMENT ''
)
PARTITIONED BY (
`year` int,
`month` int)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
WITH SERDEPROPERTIES (
'avro.schema.url'='hdfs:///data/schemas/mytable_schema.avsc')
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION
'hdfs:///data/mytable.db/mytable'
TBLPROPERTIES (
'transient_lastDdlTime'='1566997300');

After that you need to execute MSCK REPAIR TABLE:

Use this statement on Hadoop partitioned tables to identify partitions that were manually added to the distributed file system (DFS). This statement (a Hive command) adds metadata about the partitions to the Hive catalogs.

MSCK REPAIR TABLE mydb.mytable

If you are goint to use Impala you need to repair the metastore with:

invalidate metadata mydb.mytable;

Now you can execute queries:

select * from mydb.mytable limit 10

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s