By default, Hive stores all the files behind a Hive table under the warehouse directory. This location can be overridden during table creation or later when […]
In this post we will explain the architecture of Hive along with the various components involved and their functions. HiveServer2 HiveServer2 is an improved implementation of […]
LATERAL VIEW and EXPLODE are two different things in Hive. Lateral view is used in conjunction with user-defined table-generating functions such as explode(). Problem Let’s […]
Comparing two dates is quite common when you deal with data. Hive has the datediff function to help you compare two dates. Solution datediff function in Hive […]
Simple problem with a simple solution. Solution Order the records first and then apply the LIMIT clause to limit the number of records. SELECT * FROM […]
foreach() and foreachPartition() are actions, not transformations. Since both functions are actions, they don’t return an RDD. […]
Simple problem with a simple solution. Solution Use the powerful regexp_replace function to replace characters. regexp_replace(string INITIAL_STRING, string PATTERN, string REPLACEMENT): Replace tab in the string […]
Pretty simple problem with a simple solution. Solution CURRENT_DATE will give the current date and CURRENT_TIMESTAMP will give you the date and time. 0: jdbc:hive2://ms2.hirw.com:2181,wk1.hirw.co> SELECT […]
Both partitioning and bucketing are techniques in Hive to organize the data efficiently so that subsequent executions on the data perform optimally. Partitioning Let’s take […]
Accumulators are like global variables in a Spark application. In the real world, accumulators are used as counters to keep track of something at an […]