Kafka is a distributed, durable, and reliable message broker that can handle a high volume of real-time messages coming from real-time producers. Storage for real-time streaming data […]
In Hadoop you will find Hadoop-specific types in place of the basic types. For example, you will find Text instead of String and IntWritable instead of Integer. For all primitive […]
The Cartesian Product Join (a.k.a. Shuffle-and-Replication Nested Loop join) works very similarly to a Broadcast Nested Loop join, except that the dataset is not broadcast. Shuffle-and-Replication does not […]
Shuffle Sort Merge Join, as the name indicates, involves a sort operation. Shuffle Sort Merge Join has 3 phases. Shuffle Phase – both datasets are shuffled […]
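As a rough illustration of the shuffle, sort, and merge steps the name implies, here is a minimal single-process sketch in plain Python. It is not Spark's implementation: the function names (`shuffle`, `merge_sorted`, `sort_merge_join`), the `(key, value)` row layout, and the default of 4 partitions are all assumptions made for the example.

```python
from collections import defaultdict


def shuffle(rows, num_partitions):
    """Shuffle step: route each (key, value) row to a partition by key hash,
    so rows with the same key from both datasets land in the same partition."""
    partitions = defaultdict(list)
    for key, value in rows:
        partitions[hash(key) % num_partitions].append((key, value))
    return partitions


def merge_sorted(left, right):
    """Merge step: walk two key-sorted lists in lockstep and emit matches."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        left_key, right_key = left[i][0], right[j][0]
        if left_key < right_key:
            i += 1
        elif left_key > right_key:
            j += 1
        else:
            # Find the run of rows on the right that share this key.
            j_end = j
            while j_end < len(right) and right[j_end][0] == left_key:
                j_end += 1
            # Pair every left row with this key against that run.
            while i < len(left) and left[i][0] == left_key:
                for _, right_value in right[j:j_end]:
                    out.append((left_key, left[i][1], right_value))
                i += 1
            j = j_end
    return out


def sort_merge_join(left_rows, right_rows, num_partitions=4):
    """Inner equi-join on the key: shuffle, then sort and merge per partition."""
    left_parts = shuffle(left_rows, num_partitions)
    right_parts = shuffle(right_rows, num_partitions)
    result = []
    for p in range(num_partitions):
        # Sort step: order each partition by join key before merging.
        left_sorted = sorted(left_parts[p], key=lambda row: row[0])
        right_sorted = sorted(right_parts[p], key=lambda row: row[0])
        result.extend(merge_sorted(left_sorted, right_sorted))
    return result
```

Because both sides are partitioned with the same hash function, each partition can be sorted and merged independently, which is what would let the work spread across executors in a real cluster.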
Hive stores metadata about the tables created in Hive in a relational database such as Derby or MySQL. The metadata includes the table name, structure of […]
Broadcast Nested Loop join works by broadcasting one of the entire datasets and performing a nested loop to join the data. So essentially every record from […]
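The idea of broadcasting one dataset and looping over it can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not Spark code: `broadcast_nested_loop_join`, the list-of-lists stand-in for partitions, and the `condition` callback are hypothetical names chosen for the example.

```python
def broadcast_nested_loop_join(partitions, broadcast_rows, condition):
    """Join every row of the partitioned dataset against a full copy of the
    broadcast dataset, keeping the pairs that satisfy `condition`."""
    result = []
    for partition in partitions:          # in a cluster, one partition per task
        for left in partition:            # outer loop: rows of the large side
            for right in broadcast_rows:  # inner loop: the whole broadcast copy
                if condition(left, right):
                    result.append((left, right))
    return result


# Usage: a non-equi join, where a nested loop is a natural fit.
pairs = broadcast_nested_loop_join(
    partitions=[[1, 5], [10]],
    broadcast_rows=[3, 7],
    condition=lambda left, right: left > right,
)
```

Every row on the partitioned side is compared with every broadcast row, so the cost grows with the product of the two dataset sizes.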