What is the difference between hivevar and hiveconf? - Big Data In Real World

What is the difference between hivevar and hiveconf?


In another post, we showed how to set and refer to variables at the Hive prompt and in Hive scripts. In that post we used the hivevar namespace, and it is the right thing to do. If you look at Hive scripts that were written when Hive first came out, you will see references to the hiveconf namespace.

When you look at hive help (hive -h) you will see hiveconf and hivevar doing similar things. 

 

--hiveconf <property=value> Use value for given property 

--hivevar <key=value> Variable substitution to apply to hive commands. e.g. --hivevar A=B

hivevar vs hiveconf

When you run a Hive script in production, there are two types of variables you need to set: properties that are specific to the execution of the Hive job, for example mapreduce.job.reduces, and variables that are specific to the Hive script you are executing.

When Hive first came out, it only had the hiveconf namespace, so both the properties specific to the execution of the Hive job and the user/script-specific variables were set in hiveconf. This is obviously not great, as it doesn’t give a good separation of variables, and hence the hivevar namespace was introduced in later versions.

Use the hiveconf namespace and --hiveconf when you set properties that affect the execution of the job.

Use the hivevar namespace and --hivevar when you set user variables that affect the execution of the Hive script.
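As a rough sketch of how the two options can sit side by side, here is a small script together with the command that might launch it. The script name daily_report.hql, the variable run_date, and the sales table with its sale_date column are made up for illustration. mapreduce.job.reduces is the Hadoop property for the number of reducers, and -f is the Hive CLI option for running a script file.

```sql
-- Hypothetical script daily_report.hql, launched from the shell as:
--   hive --hiveconf mapreduce.job.reduces=10 --hivevar run_date=2019-11-15 -f daily_report.hql
--
-- mapreduce.job.reduces is a job property, so the script never refers to it.
-- run_date is a script variable and is referenced through the hivevar namespace.
-- The sales table and its sale_date column are assumptions for this example.
SELECT COUNT(*)
FROM sales
WHERE sale_date = '${hivevar:run_date}';
```

With this split, the job tuning can change from run to run without touching the script, and the script only sees the variables meant for it.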

Which is the default: hivevar or hiveconf?

What is the default namespace when you don’t specify the namespace during the set and the retrieval of variables? Here is where it gets confusing. 

When you don’t specify a namespace the variable is set to the hiveconf namespace.

hive> set date-ymd = '2019-11-15';

So, to refer to the variable date-ymd, you would need to use ${hiveconf:date-ymd}.

hive> select ${hiveconf:date-ymd};

Since hiveconf is the default namespace when you set a variable, what happens when you skip the namespace when you refer to it?

hive> select ${date-ymd};

The above will give you an error that the variable doesn’t exist. Why is this the case? The answer is that when you refer to a variable, the default namespace is hivevar and not hiveconf.

hiveconf is the default namespace when you set variables.

hivevar is the default namespace when you refer to variables.

To conclude, always prefix your variables with the namespace to avoid this confusion. We recommend using the hivevar namespace when you intend to use the variables in Hive scripts.
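A minimal sketch of that recommendation at the Hive prompt could look like the following. The variable name date_ymd is illustrative (an underscore is used here instead of the hyphen in the earlier example), and the last statement relies on the rule described above, that an unqualified reference is looked up in hivevar.

```sql
-- Set the variable in the hivevar namespace explicitly.
set hivevar:date_ymd = '2019-11-15';

-- Fully qualified reference: unambiguous, and the form recommended here.
select ${hivevar:date_ymd};

-- Unqualified reference: looked up in hivevar, so it should find the same variable.
select ${date_ymd};
```

Naming the namespace on both the set and the reference means the two different defaults never come into play.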

Big Data In Real World
