Both explode and posexplode are User Defined Table generating Functions. UDTFs operate on single rows and produce multiple rows as output. explode() There are 2 flavors […]
We get this question a lot, so we wrote a short post to answer it. Spark leverages Hadoop’s InputFormat to read files […]
HiveServer (or HiveServer1) was introduced when Hive first came out, and it had several limitations. HiveServer2 was later introduced with Hive 0.11 and aimed to solve […]
Filtering on a range (greater than, less than, greater than or equal to, and so on) is a pretty common requirement when you work with data. In this post […]
Both map and mapPartitions are narrow transformation functions. Neither function triggers a shuffle. Let’s say our RDD has 5 partitions and 10 elements in each […]
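To make the 5-partition, 10-element setup concrete, here is a minimal sketch in plain Python rather than Spark itself. It models an RDD as a list of lists, and the helper names (`model_map`, `model_map_partitions`, `double`, `double_partition`) are hypothetical, introduced only for illustration. It counts how often the user function is invoked under each style: once per element for map, once per partition for mapPartitions.

```python
from typing import Callable, Iterator, List

# Assumption: an RDD is modelled as a plain list of partitions,
# 5 partitions with 10 elements each, as in the text above.
partitions: List[List[int]] = [list(range(p * 10, (p + 1) * 10)) for p in range(5)]

# Counts how many times each user function is invoked.
calls = {"map": 0, "mapPartitions": 0}


def double(x: int) -> int:
    """Element-level function, as passed to map."""
    calls["map"] += 1
    return x * 2


def double_partition(it: Iterator[int]) -> Iterator[int]:
    """Partition-level function, as passed to mapPartitions."""
    calls["mapPartitions"] += 1
    return (x * 2 for x in it)


def model_map(parts: List[List[int]], f: Callable[[int], int]) -> List[List[int]]:
    # map style: f is applied to every element of every partition.
    return [[f(x) for x in part] for part in parts]


def model_map_partitions(
    parts: List[List[int]], f: Callable[[Iterator[int]], Iterator[int]]
) -> List[List[int]]:
    # mapPartitions style: f receives an iterator over a whole partition.
    return [list(f(iter(part))) for part in parts]


mapped = model_map(partitions, double)
mapped_by_partition = model_map_partitions(partitions, double_partition)

print(calls)  # {'map': 50, 'mapPartitions': 5}
```

Both styles produce the same data; the difference is that per-partition invocation lets any expensive setup (a connection, a parser) be paid 5 times instead of 50.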
Hive has 3 different types of functions – User Defined Function (UDF), User Defined Aggregate Function (UDAF) and User Defined Table generating Function (UDTF). User Defined […]
A common question with a simple solution. Solution: use the YARN command below to list all applications that are running in YARN. yarn application -appStates RUNNING […]
In certain Hadoop distributions, it is pretty common to get the error below when you attempt to start the ResourceManager service or other services. ERROR [main] […]
reduceByKey() has the following properties. The result of the combination (e.g. a sum) is of the same type as the values. The operation when combined […]
We can specify conditions such as OR and AND in the query expression when searching in Elasticsearch. We have an index named account and in the index […]
Both reduceByKey and groupByKey are wide transformations, which means both trigger a shuffle operation. The key difference between reduceByKey and groupByKey is that reduceByKey does […]
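As a rough illustration of why the two differ in shuffle cost, the sketch below models them in plain Python rather than Spark. The data and the helper names (`model_group_by_key`, `model_reduce_by_key`) are hypothetical. It relies on the well-known behavior that reduceByKey combines values within each partition before the shuffle, while groupByKey sends every record across.

```python
from collections import defaultdict
from operator import add
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, int]

# Assumption: two partitions of (key, value) pairs stand in for a pair RDD.
partitions: List[List[Pair]] = [
    [("a", 1), ("b", 1), ("a", 1), ("a", 1)],
    [("b", 1), ("a", 1), ("b", 1), ("b", 1)],
]


def model_group_by_key(parts: List[List[Pair]]) -> Tuple[Dict[str, int], int]:
    # groupByKey style: every record crosses the shuffle, then values are summed.
    shuffled = [pair for part in parts for pair in part]
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in shuffled:
        grouped[key].append(value)
    return {key: sum(values) for key, values in grouped.items()}, len(shuffled)


def model_reduce_by_key(
    parts: List[List[Pair]], f: Callable[[int, int], int]
) -> Tuple[Dict[str, int], int]:
    # reduceByKey style: combine inside each partition first, so at most
    # one record per key per partition crosses the shuffle.
    shuffled: List[Pair] = []
    for part in parts:
        local: Dict[str, int] = {}
        for key, value in part:
            local[key] = f(local[key], value) if key in local else value
        shuffled.extend(local.items())
    result: Dict[str, int] = {}
    for key, value in shuffled:
        result[key] = f(result[key], value) if key in result else value
    return result, len(shuffled)


grouped_sums, grouped_shuffled = model_group_by_key(partitions)
reduced_sums, reduced_shuffled = model_reduce_by_key(partitions, add)

print(grouped_sums, grouped_shuffled)  # {'a': 4, 'b': 4} 8
print(reduced_sums, reduced_shuffled)  # {'a': 4, 'b': 4} 4
```

The totals are identical, but in this model the reduceByKey style moves 4 records across the shuffle instead of 8; the gap widens as the number of values per key grows.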