It is a pretty common use case to export the contents of a Hive table into a CSV file. It's pretty simple if you are using a recent version of Hive. In this post, we will see how to achieve this with both newer and older versions of Hive.
Hive version 0.11 or higher
If you are using Hive version 0.11 or higher, the command below will do the job. ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' dictates that the columns should be delimited by a comma.
INSERT OVERWRITE LOCAL DIRECTORY '/home/hirw/sales' ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' select * from sales_table;
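If you would rather run the whole thing from the shell, here is a minimal sketch; it assumes the same table and directory as the example above. Keep in mind that INSERT OVERWRITE LOCAL DIRECTORY writes one or more part files (with names like 000000_0) into the target directory, so you may want to concatenate them into a single CSV file afterwards.

# Run the export from the shell, then stitch the part files into one CSV file.
hive -e "INSERT OVERWRITE LOCAL DIRECTORY '/home/hirw/sales' ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' SELECT * FROM sales_table;"
cat /home/hirw/sales/* > /home/hirw/sales.csv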
Hive versions older than 0.11
By default, selecting from the Hive table and redirecting the output to a file, as shown below, produces a tab separated file. Of course this is not what you want, since you need a comma separated file.
hive -e 'select * from sales_table' > /home/hirw/sales.tsv
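If you want to confirm that the delimiter really is a tab, a quick check like the one below works on Linux; cat -A prints non-printing characters, rendering each tab as ^I. The file path simply follows the example above.

# Show the first two rows with non-printing characters visible; tabs appear as ^I.
head -2 /home/hirw/sales.tsv | cat -A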
With the command below, you select from the table and pipe the results to the sed command, passing a regular expression.
hive -e 'select * from sales_table' | sed 's/[\t]/,/g' > /home/hirw/sales.csv
The regular expression does a global match of the tab character [\t] and replaces each occurrence with a comma.
This approach is not preferred as you might get inconsistent results when dealing with huge datasets. Also, you are combining Hive and sed (Linux) tools to get the job done, which in our opinion is not a clean approach.
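To illustrate one way this can go wrong, consider a made-up row in which a column value itself contains a tab character. The global replacement cannot tell a delimiter tab from a tab inside the data, so the row ends up with an extra field. The data below is purely hypothetical.

# Hypothetical row with 3 columns: id, product name ("USB<tab>Cable" with an embedded tab), price.
printf '1001\tUSB\tCable\t19.99\n' | sed 's/[\t]/,/g'
# Prints 1001,USB,Cable,19.99 - four comma separated fields instead of three.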