Both partitioning and bucketing are techniques in Hive for organizing data efficiently so that subsequent queries on the data run with optimal performance. Partitioning Let’s take […]
Apache Pig was created by Yahoo. Apache Hive was created by Facebook. Both tools aim to hide the complexities of writing MapReduce jobs. Pig is similar […]
Both explode and posexplode are User Defined Table generating Functions (UDTFs). UDTFs operate on single rows and produce multiple rows as output. explode() There are 2 flavors […]
HiveServer (or HiveServer1) was introduced when Hive first came out, and it had several limitations. HiveServer2 was later introduced with Hive 0.11 and aimed to solve […]
Hive has 3 different types of functions – User Defined Function (UDF), User Defined Aggregate Function (UDAF) and User Defined Table generating Function (UDTF). User Defined […]
Both INNER JOIN and LEFT SEMI JOIN return matching records between both tables with a subtle difference. Let’s consider 2 tables – employee and employee_department_mapping with […]
Hive stores metadata about the tables created in Hive in a relational database such as Derby or MySQL. The metadata information includes table name, structure of […]
So you were installing Hive and ran into the below issue when Hive was trying to set up the metastore database. Exception in thread "main" java.lang.RuntimeException: […]
Hive scripts that are scheduled to run in production always take in variables. These variables are set dynamically and you would need to pass the […]