How to skip the first line or header when reading a file in Hive? - Big Data In Real World

How to skip the first line or header when reading a file in Hive?


This is a common problem: data files exported from legacy systems often contain a header in the first row. This post shows a quick way to skip that first row when Hive reads the files.

Solution

This solution works for Hive version 0.13 and above.

Note the tblproperties clause below. Setting skip.header.line.count to 1 makes Hive skip the first line of every file behind the table.

create external table employee (id int, name string)
row format delimited
lines terminated by '\n'
location '/user/hirw/employees'
tblproperties ("skip.header.line.count"="1");
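If the table already exists, the same property can be set without recreating it. The statements below are a minimal sketch, assuming the employee table from the statement above has already been created over files that each start with one header line.

```sql
-- Set the property on an existing table instead of recreating it.
alter table employee set tblproperties ("skip.header.line.count"="1");

-- Confirm that the property is now recorded on the table.
show tblproperties employee;

-- The header line of each file should no longer come back as a data row.
select * from employee limit 10;
```

Because the property lives in the table metadata, the underlying files are not modified. Only the way Hive reads them changes.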

Similarly, skip.footer.line.count skips the given number of lines at the end of each file behind the Hive table.
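Both properties can be set on the same table. The following is a sketch only, assuming files that carry one header line and one trailing summary line. The table name employee_with_footer and its location are hypothetical and are not from the original example.

```sql
-- Hypothetical table: each file has one header line and one trailer line.
-- Both the first and the last line of every file are skipped on read.
create external table employee_with_footer (id int, name string)
row format delimited
lines terminated by '\n'
location '/user/hirw/employees_with_footer'
tblproperties (
  "skip.header.line.count"="1",
  "skip.footer.line.count"="1"
);
```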

