How to skip the first line or header when reading a file in Hive?

Published by Big Data In Real World on March 1, 2021

Solution

This solution works for Hive version 0.13 and above.

Note the tblproperties clause below. We set skip.header.line.count to 1, which means the first line of each file behind the table is skipped when the table is read.

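-- Assumption: the files are comma-separated; adjust "fields terminated by" to match your data.
create external table employee (id int, name string)
row format delimited
fields terminated by ','
lines terminated by '\n'
location '/user/hirw/employees'
tblproperties ("skip.header.line.count"="1");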
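
As a quick check, suppose a file under the table location looks like the sample in the comments below. The file contents are made up for illustration, and comma-separated fields are an assumption. With the property set, a query should return only the data rows. Without it, the header line would typically show up as an extra row with a NULL id, since the text "id" cannot be parsed as an int.

```sql
-- Hypothetical file contents under /user/hirw/employees (illustrative data only):
--   id,name
--   1,Alice
--   2,Bob

-- With skip.header.line.count = 1, the "id,name" line is expected to be skipped,
-- so this should return only the two data rows.
select id, name from employee;

-- Confirm that the property is actually set on the table.
show tblproperties employee;
```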

Similarly, skip.footer.line.count skips the given number of lines at the end of each file behind the Hive table.
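
Here is a minimal sketch of both properties used together, plus a way to set the header property on a table that already exists. The table name employee_with_footer, its location, and the comma delimiter are hypothetical and used only for illustration.

```sql
-- Skip one header line and one trailing line in each file.
-- Table name, location and delimiter are assumptions for this example.
create external table employee_with_footer (id int, name string)
row format delimited
fields terminated by ','
lines terminated by '\n'
location '/user/hirw/employees_with_footer'
tblproperties ("skip.header.line.count"="1", "skip.footer.line.count"="1");

-- For a table that already exists, the property can be set after creation.
alter table employee set tblproperties ("skip.header.line.count"="1");
```

Because the table is external, changing these properties only affects how Hive reads the files. The files themselves are left untouched.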

Big Data In Real World

We are a group of Big Data engineers who are passionate about Big Data and related technologies. We have designed, developed, deployed and maintained Big Data applications ranging from batch to real-time streaming platforms. We have seen a wide range of real-world Big Data problems and implemented some innovative and complex (or simple, depending on how you look at it) solutions.
