This is a common problem because most data files that come from legacy systems contain a header in the first row. This post provides a quick solution for skipping the first row of such files when Hive reads them.
Solution
This solution works for Hive version 0.13 and above.
Note the tblproperties clause below. We have set skip.header.line.count to 1, which means the first line of each file behind the table is skipped when the table is read.
create external table employee (id int, name string) row format delimited lines terminated by '\n' location '/user/hirw/employees' tblproperties ("skip.header.line.count"="1");
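If the table already exists, the same property can probably be set without recreating it. The sketch below assumes the employee table from this post is already defined; it uses Hive's alter table ... set tblproperties statement and then lists the table properties to confirm the change.

```sql
-- Assumes the employee table from this post already exists.
alter table employee set tblproperties ("skip.header.line.count"="1");

-- List the table properties to check that the setting was applied.
show tblproperties employee;
```

Because the property only affects how the files are read, the header line remains in the underlying files.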
Similarly, skip.footer.line.count skips the given number of lines at the end of each file behind the Hive table.
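As a minimal sketch, both properties can be combined in one table definition. The table name employee_with_footer, its location, and the footer length of 2 lines are hypothetical values chosen for illustration; adjust them to match your files.

```sql
-- Hypothetical table and path, for illustration only.
-- Skips 1 header line and 2 footer lines in each file behind the table.
create external table employee_with_footer (id int, name string)
row format delimited lines terminated by '\n'
location '/user/hirw/employees_with_footer'
tblproperties ("skip.header.line.count"="1", "skip.footer.line.count"="2");
```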