It is a very common use case to process data in Spark and then save the resulting dataframe directly into a Hive table.
There are a couple of ways to achieve this.
Solution 1
Create a HiveContext
import org.apache.spark.sql.hive.HiveContext;

// sc is the existing JavaSparkContext; sc.sc() exposes the underlying SparkContext
HiveContext sqlContext = new HiveContext(sc.sc());
df is the result dataframe you want to write to Hive. The statement below writes the contents of df to the sales table under the sample_db database. Since we are using SaveMode.Overwrite, any existing contents of the table are overwritten.
df.write().mode(SaveMode.Overwrite).saveAsTable("sample_db.sales");
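Putting Solution 1 together, a minimal sketch might look like the following. It assumes Spark 2.x, where HiveContext still works but is deprecated in favor of SparkSession; the input path, the application name and the read of a JSON file are placeholders standing in for whatever processing produced your dataframe.

import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.hive.HiveContext;

public class SaveToHiveExample {
    public static void main(String[] args) {
        // Spark context for the application; master and app name are illustrative
        JavaSparkContext sc = new JavaSparkContext("local[*]", "save-to-hive");

        // HiveContext gives the dataframe writer access to the Hive metastore
        HiveContext sqlContext = new HiveContext(sc.sc());

        // Placeholder for the dataframe produced by your own processing
        Dataset<Row> df = sqlContext.read().json("/path/to/processed/data.json");

        // Overwrite replaces the table contents if sample_db.sales already exists;
        // other modes include Append, Ignore and ErrorIfExists
        df.write().mode(SaveMode.Overwrite).saveAsTable("sample_db.sales");

        sc.stop();
    }
}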
Solution 2
Register the dataframe df as a temporary view named temp_table in Spark.
df.createOrReplaceTempView("temp_table");
The statement below creates a table named sales by selecting the contents of temp_table. The sales table will have the same structure as temp_table.
sqlContext.sql("create table sample_db.sales as select * from temp_table");