How to insert into a Hive table when the table has columns with complex types like Arrays? - Big Data In Real World

How to insert into a Hive table when the table has columns with complex types like Arrays?

How to see the current database user is on with Hive cli or Beeline?
January 19, 2022
What is the difference between get_json_object and json_tuple functions in Hive?
February 2, 2022
How to see the current database user is on with Hive cli or Beeline?
January 19, 2022
What is the difference between get_json_object and json_tuple functions in Hive?
February 2, 2022

Let’s consider the below table employee_depts  with 2 columns – ename and dept_list. dept_list is of type array and has the list of departments.

 CREATE TABLE IF NOT EXISTS employee_depts (ename string, dept_list Array<int>) 
 ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' 
 COLLECTION ITEMS TERMINATED BY ',' 
 STORED AS TEXTFILE; 

Problem

When we attempt to insert in to the employee_depts table with the below INSERT, the insert operation will fail with  TOK_FUNCTION not supported in insert/values SemanticException.

The below insert statement is syntactically correct but doesn’t with tables with complex columns like Arrays.

 INSERT INTO TABLE employee_depts (ename, dept_list) VALUES ('Tom', array(10, 20));
 INSERT INTO TABLE employee_depts (ename, dept_list) 
 VALUES ('Tom', array(10, 20));
 Error: Error while compiling statement: FAILED: SemanticException [Error 10293]: Unable to create temp file for insert values Expression of type TOK_FUNCTION not supported in insert/values (state=42000,code=10293) 

Solution

Change INSERT INTO table VALUES syntax to INSERT INTO table SELECT syntax like below

Here the employee table in SELECT could be any table with just one record. We are not selecting any column from the employee table. It is a dummy or a placeholder table with atleast one record.

 INSERT INTO employee_depts
 SELECT "Tom", array(10, 20)
 FROM employee LIMIT 1;
 INFO  : Session is already open
 INFO  : Dag name: INSERT INTO employee_depts
 SELECT "Tom",...1(Stage-1)
 INFO  : Status: Running (Executing on YARN cluster with App id application_1604763385917_0005)
 --------------------------------------------------------------------------------
         VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
 --------------------------------------------------------------------------------
 Map 1 ..........   SUCCEEDED      1          1        0        0       0       0
 Reducer 2 ......   SUCCEEDED      1          1        0        0       0       0
 --------------------------------------------------------------------------------
 VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 23.10 s
 --------------------------------------------------------------------------------
 INFO  : Loading data to table hirw.employee_depts from hdfs://ms1.hirw.com:8020/apps/hive/warehouse/hirw.db/employee_depts/.hive-staging_hive_2020-11-17_19-49-38_700_4071314832051767912-1/-ext-10000
 INFO  : Table hirw.employee_depts stats: [numFiles=1, numRows=1, totalSize=10, rawDataSize=9]
 No rows affected (13.132 seconds) 

As you can it worked perfectly and we can see the inserted record in our table employee_depts now.

 SELECT * from employee_depts;
 +-----------------------+---------------------------+--+
 | employee_depts.ename  | employee_depts.dept_list  |
 +-----------------------+---------------------------+--+
 | Tom                   | [10,20]                   |
 +-----------------------+---------------------------+--+
 1 row selected (0.469 seconds) 

Big Data In Real World
Big Data In Real World
We are a group of Big Data engineers who are passionate about Big Data and related Big Data technologies. We have designed, developed, deployed and maintained Big Data applications ranging from batch to real time streaming big data platforms. We have seen a wide range of real world big data problems, implemented some innovative and complex (or simple, depending on how you look at it) solutions.

Comments are closed.

How to insert into a Hive table when the table has columns with complex types like Arrays?
This website uses cookies to improve your experience. By using this website you agree to our Data Protection Policy.

Hadoop In Real World is now Big Data In Real World!

X