What is the difference between get_json_object and json_tuple functions in Hive?

How to insert into a Hive table when the table has columns with complex types like Arrays?

January 26, 2022

How to find out if a directory in HDFS is empty or not?

February 9, 2022

Published by Big Data In Real World at February 2, 2022

get_json_obect

get_json_object() is a UDF and can take a JSON string as input and you can pass in the key that you want to be extracted from the JSON.

Here we are selecting 2 values from the JSON – owner and course. And we ended up calling the get_json_object twice.

 SELECT get_json_object(course, '$.owner'), get_json_object(course, '$.course')  FROM hirw_courses;
 +-----------+-------------------------------------------------------------------------------------------------------------------------------------------+--+
 |    _c0    |                                                                    _c1                                                                    |
 +-----------+-------------------------------------------------------------------------------------------------------------------------------------------+--+
 | Big Data  | {"course1":{"name":"Hadoop Developer In Real World","type":"online"},"course2":{"name":"Spark Developer In Real World","type":"online"}}  |
 +-----------+-------------------------------------------------------------------------------------------------------------------------------------------+--+
 1 row selected (0.125 seconds)

json_tuple

json_tuple is a User Defined Table generating Function (UDTF). <<Check out the post, if you want know the difference between UDF, UDAF and UDTFs)

With json_tuple you can pass in the json string followed by a list of keys for which you want to extract the values from JSON.

Since json_tuple is an UDTF, you can also use it with LATERAL VIEW which can be quite powerful.

 SELECT json_tuple(course, 'owner', 'course')  FROM hirw_courses;
 +-----------+-------------------------------------------------------------------------------------------------------------------------------------------+--+
 |    c0     |                                                                    c1                                                                     |
 +-----------+-------------------------------------------------------------------------------------------------------------------------------------------+--+
 | Big Data  | {"course1":{"name":"Hadoop Developer In Real World","type":"online"},"course2":{"name":"Spark Developer In Real World","type":"online"}}  |
 +-----------+-------------------------------------------------------------------------------------------------------------------------------------------+--+
 1 row selected (0.177 seconds)

get_json_object vs. json_tuple – which is better?

With get_json_object, if we are trying to extract 5 values from a JSON string we would have to call the function 5 times and this means the same JSON data is parsed through 5 times.

With json_tuple, we parse the JSON data only once even when we need to parse or extract 5 values from JSON.

So always use json_tuple in place of get_json_object for better performance.

Big Data In Real World

We are a group of Big Data engineers who are passionate about Big Data and related Big Data technologies. We have designed, developed, deployed and maintained Big Data applications ranging from batch to real time streaming big data platforms. We have seen a wide range of real world big data problems, implemented some innovative and complex (or simple, depending on how you look at it) solutions.

Comments are closed.

What is the difference between get_json_object and json_tuple functions in Hive?

How to insert into a Hive table when the table has columns with complex types like Arrays?

How to find out if a directory in HDFS is empty or not?

How to insert into a Hive table when the table has columns with complex types like Arrays?

How to find out if a directory in HDFS is empty or not?

get_json_obect

json_tuple

get_json_object vs. json_tuple – which is better?

Big Data In Real World

Related posts

How to transpose or convert columns to rows in Hive?

How to fail a Hive script based on a condition?

How to delete duplicate data from the Hive table?