It is a pretty common use case to export the contents of a Hive table into a CSV file. It's pretty simple if you are using a recent version of Hive. In this post, we will see how to achieve this with both newer and older versions of Hive.
Hive version 0.11 or higher
If you are using Hive version 0.11 or higher, the command below will do the job. ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' dictates that the columns should be delimited by a comma.
INSERT OVERWRITE LOCAL DIRECTORY '/home/hirw/sales' ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' select * from sales_table;
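If you would rather run the whole thing from the shell, here is a minimal sketch; it assumes the same table and directory as the example above. Keep in mind that INSERT OVERWRITE LOCAL DIRECTORY writes one or more part files (with names like 000000_0) into the target directory, so you may want to concatenate them into a single CSV file afterwards.

# Run the export from the shell, then stitch the part files into one CSV file.
hive -e "INSERT OVERWRITE LOCAL DIRECTORY '/home/hirw/sales' ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' SELECT * FROM sales_table;"
cat /home/hirw/sales/* > /home/hirw/sales.csv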
Hive versions older than 0.11
By default, selecting from the Hive table and redirecting the output to a file, as shown below, produces a tab separated file. Of course this is not what you want, since you need a comma separated file.
hive -e 'select * from sales_table' > /home/hirw/sales.tsv
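If you want to confirm that the delimiter really is a tab, a quick check like the one below works on Linux; cat -A prints non-printing characters, rendering each tab as ^I. The file path simply follows the example above.

# Show the first two rows with non-printing characters visible; tabs appear as ^I.
head -2 /home/hirw/sales.tsv | cat -A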
With the command below, you select from the table and pipe the results to the sed command, passing a regular expression.
hive -e 'select * from sales_table' | sed 's/[\t]/,/g' > /home/hirw/sales.csv
The regular expression does a global match of the tab character [\t] and replaces each occurrence with a comma.
This approach is not preferred as you might get inconsistent results when dealing with huge datasets. Also, you are combining Hive and sed (Linux) tools to get the job done, which in our opinion is not a clean approach.
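To illustrate one way this can go wrong, consider a made-up row in which a column value itself contains a tab character. The global replacement cannot tell a delimiter tab from a tab inside the data, so the row ends up with an extra field. The data below is purely hypothetical.

# Hypothetical row with 3 columns: id, product name ("USB<tab>Cable" with an embedded tab), price.
printf '1001\tUSB\tCable\t19.99\n' | sed 's/[\t]/,/g'
# Prints 1001,USB,Cable,19.99 - four comma separated fields instead of three.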