It is a very common use case to process data in Spark and then save the resulting dataframe directly into a Hive table.
There are a couple of ways to achieve this.
Solution 1
Create a HiveContext
import org.apache.spark.sql.hive.HiveContext;

// sc is the existing JavaSparkContext; sc.sc() exposes the underlying SparkContext
HiveContext sqlContext = new HiveContext(sc.sc());
df is the result dataframe you want to write to Hive. The statement below writes the contents of df to the sales table under the sample_db database. Since we are using SaveMode.Overwrite, any existing contents of the table are overwritten.
df.write().mode(SaveMode.Overwrite).saveAsTable("sample_db.sales");
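Putting Solution 1 together, a minimal sketch might look like the following. It assumes Spark 2.x, where HiveContext still works but is deprecated in favor of SparkSession; the input path, the application name and the read of a JSON file are placeholders standing in for whatever processing produced your dataframe.

import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.hive.HiveContext;

public class SaveToHiveExample {
    public static void main(String[] args) {
        // Spark context for the application; master and app name are illustrative
        JavaSparkContext sc = new JavaSparkContext("local[*]", "save-to-hive");

        // HiveContext gives the dataframe writer access to the Hive metastore
        HiveContext sqlContext = new HiveContext(sc.sc());

        // Placeholder for the dataframe produced by your own processing
        Dataset<Row> df = sqlContext.read().json("/path/to/processed/data.json");

        // Overwrite replaces the table contents if sample_db.sales already exists;
        // other modes include Append, Ignore and ErrorIfExists
        df.write().mode(SaveMode.Overwrite).saveAsTable("sample_db.sales");

        sc.stop();
    }
}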
Solution 2
Register the dataframe df as a temporary view named temp_table in Spark.
df.createOrReplaceTempView("temp_table");
The statement below creates a table named sales by selecting the contents of temp_table. The sales table will have the same structure as temp_table.
sqlContext.sql("create table sample_db.sales as select * from temp_table");