In an earlier post, we showed how to set and refer to variables at the Hive prompt and in Hive scripts. That post used the hivevar namespace, which is the right choice. If you look at Hive scripts written when Hive first came out, however, you will see references to the hiveconf namespace instead.
The Hive help output (hive -h) lists both --hiveconf and --hivevar, and they appear to do similar things.
--hiveconf <property=value>   Use value for given property
--hivevar <key=value>         Variable substitution to apply to hive commands. e.g. --hivevar A=B
hivevar vs hiveconf
When you run a Hive script in production, there are two types of variables you need to set: properties that are specific to the execution of the Hive job, for example mapreduce.job.reduces, and variables that are specific to the Hive script you are executing.
When Hive first came out it had only the hiveconf namespace, so both job execution properties and user or script-specific variables were set in hiveconf. This gave no clear separation between the two kinds of variables, so the hivevar namespace was introduced in later versions.
Use the hiveconf namespace and --hiveconf when you set properties that affect the execution of the job.
Use the hivevar namespace and --hivevar when you set user variables that are used inside the Hive script.
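As a rough sketch of that split, a script might receive an execution property through --hiveconf and a user variable through --hivevar. The file name daily_report.hql, the variable run_date, the table sales and the column sale_date are hypothetical names chosen for illustration, and the command line is shown only as a comment.

```sql
-- daily_report.hql (hypothetical script name)
-- Invoked, for example, as:
--   hive --hiveconf hive.exec.parallel=true \
--        --hivevar run_date=2019-11-15 \
--        -f daily_report.hql

-- hive.exec.parallel is a job execution property, so it is passed with --hiveconf.
-- run_date is a user variable, so it is passed with --hivevar and
-- referenced with its namespace prefix.
SELECT *
FROM sales                                 -- hypothetical table
WHERE sale_date = '${hivevar:run_date}';   -- hypothetical column
```

The value is passed without quotes here, so the script adds the quotes around the reference itself.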
Which is the default: hivevar or hiveconf?
What is the default namespace when you don’t specify one while setting and retrieving variables? Here is where it gets confusing.
When you set a variable without specifying a namespace, it is stored in the hiveconf namespace.
hive> set date-ymd = '2019-11-15';
So to refer to the variable date-ymd, you need to use ${hiveconf:date-ymd}
hive> select ${hiveconf:date-ymd};
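To make the implicit namespace visible, the same assignment can be sketched with the prefix written out. This assumes the variable name date-ymd from the example above and is meant as an illustration of the rule, not as a different technique.

```sql
-- Without a prefix, set stores the variable in the hiveconf namespace ...
set date-ymd='2019-11-15';

-- ... so this explicit form targets the same namespace.
set hiveconf:date-ymd='2019-11-15';

-- Either way, the reference must carry the hiveconf prefix.
select ${hiveconf:date-ymd};
```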
Since hiveconf is the default namespace when setting a variable, what happens if you skip the namespace when you refer to it?
hive> select ${date-ymd};
The statement above fails with an error because the variable cannot be found. Why is this the case? When you refer to a variable without a namespace, Hive looks in the hivevar namespace, not in hiveconf.
hiveconf is the default namespace when you set variables
hivevar is the default namespace when you refer to variables
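A minimal sketch of the other side of this asymmetry, reusing the variable name date-ymd from the example: once the variable is set in the hivevar namespace explicitly, both the prefixed and the unprefixed reference should resolve.

```sql
-- Set the variable in the hivevar namespace explicitly.
set hivevar:date-ymd='2019-11-15';

-- Prefixed reference: looks up hivevar directly.
select ${hivevar:date-ymd};

-- Unprefixed reference: also resolves, because hivevar is the
-- default namespace when a variable is referenced.
select ${date-ymd};
```

The unprefixed form works here, but writing the prefix keeps the script readable for anyone who does not remember which default applies.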
To conclude: to avoid this confusion, always prefix your variables with the namespace. We recommend using the hivevar namespace when you intend to use the variables in Hive scripts.