Why does Hadoop need classes like Text instead of String? - Big Data In Real World

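In Hadoop you will find Hadoop-specific types standing in for basic Java types: for example, Text instead of String and IntWritable instead of Integer. Java's other primitive types have similar counterparts, such as LongWritable, FloatWritable, DoubleWritable, BooleanWritable and ByteWritable, and all of these types implement the Writable interface.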
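
The Writable interface declares two methods: write(DataOutput), which serializes an object's fields, and readFields(DataInput), which fills them back in. The sketch below is a minimal illustration that assumes nothing beyond the JDK. It re-declares an interface with those two methods locally, so it compiles without Hadoop on the classpath, and implements it in a hypothetical MyIntWritable class. MyIntWritable is a simplified stand-in for Hadoop's IntWritable, not the real class.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

public class WritableSketch {

    // Local stand-in with the same two methods as org.apache.hadoop.io.Writable,
    // declared here only so the example compiles without Hadoop.
    interface Writable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    // Hypothetical, simplified counterpart of IntWritable: a mutable box around an int.
    static class MyIntWritable implements Writable {
        private int value;

        MyIntWritable() { }                       // empty instance, to be filled by readFields
        MyIntWritable(int value) { this.value = value; }

        int get() { return value; }
        void set(int value) { this.value = value; }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeInt(value);                  // exactly 4 bytes, no class metadata
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            value = in.readInt();                 // overwrite this instance's state in place
        }
    }

    // Serialize any Writable into a byte array.
    static byte[] toBytes(Writable w) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            w.write(out);
        }
        return buffer.toByteArray();
    }

    // Populate an existing Writable from a byte array.
    static void fromBytes(byte[] bytes, Writable w) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            w.readFields(in);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = toBytes(new MyIntWritable(42));
        MyIntWritable copy = new MyIntWritable();
        fromBytes(bytes, copy);
        System.out.println(bytes.length + " bytes, value " + copy.get()); // 4 bytes, value 42
    }
}
```

The pairing of a no-argument constructor with readFields lets a caller create an empty object first and then populate it from the stream, which is the general pattern Writable types follow.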

Fast and compact

Serialization and deserialization are at the heart of Hadoop's implementation. Data in Hadoop is distributed across nodes and is transferred over the network many times, mainly during the shuffle. Data is serialized before it is sent over the network and deserialized when it is processed on the receiving side.
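
To make the serialize, transfer and deserialize cycle concrete, here is a small JDK-only sketch in which a byte array stands in for the network. The sender writes hypothetical (word, count) pairs with DataOutputStream, and the receiver reads them back with DataInputStream. The record layout (a record count followed by writeUTF strings and 4-byte ints) is an assumption made for illustration; it is not Hadoop's actual shuffle format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class ShuffleSketch {

    // Sender side: serialize every (word, count) pair into one flat byte buffer.
    static byte[] serialize(List<Map.Entry<String, Integer>> pairs) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(pairs.size());              // record count first
            for (Map.Entry<String, Integer> pair : pairs) {
                out.writeUTF(pair.getKey());         // 2-byte length + modified UTF-8 bytes
                out.writeInt(pair.getValue());       // 4-byte big-endian int
            }
        }
        return buffer.toByteArray();
    }

    // Receiver side: rebuild the pairs from the bytes that "crossed the network".
    static List<Map.Entry<String, Integer>> deserialize(byte[] bytes) throws IOException {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                String key = in.readUTF();
                int value = in.readInt();
                pairs.add(Map.entry(key, value));
            }
        }
        return pairs;
    }

    public static void main(String[] args) throws IOException {
        List<Map.Entry<String, Integer>> sent = List.of(Map.entry("hadoop", 3), Map.entry("text", 1));
        byte[] wire = serialize(sent);               // what would travel over the network
        List<Map.Entry<String, Integer>> received = deserialize(wire);
        System.out.println(wire.length + " bytes -> " + received); // 26 bytes -> [hadoop=3, text=1]
    }
}
```

Both sides must agree on the field order and types, because the bytes themselves carry no description of what they contain.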

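Plain Java serialization (java.io.Serializable objects written through ObjectOutputStream) is heavy and slow: every stream carries a header and class metadata, such as class names and field descriptions, alongside the actual values, so even a single Integer takes many times the 4 bytes of its value.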
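
The size overhead is easy to observe with the JDK alone. This sketch is a rough illustration rather than a benchmark (it measures bytes only, not speed): it serializes the same value once with ObjectOutputStream and once as a raw int with DataOutputStream, which is what a Writable's write method for an int typically does.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

public class SerializationSizeSketch {

    // Size of an Integer written with standard Java serialization.
    static int javaSerializedSize(Integer value) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);   // stream header + class descriptors + the value
        }
        return buffer.size();
    }

    // Size of the same value written as a raw int.
    static int rawIntSize(int value) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(value);      // just the 4 bytes of the int
        }
        return buffer.size();
    }

    public static void main(String[] args) throws IOException {
        System.out.println("ObjectOutputStream: " + javaSerializedSize(42) + " bytes");
        System.out.println("DataOutputStream:   " + rawIntSize(42) + " bytes");
    }
}
```

The raw int is always 4 bytes; the ObjectOutputStream result is several times larger because it includes the descriptors for java.lang.Integer and its superclass.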

To overcome the inefficiencies of Java serialization, Hadoop's creator Doug Cutting introduced the Writable interface together with a set of I/O classes (in the org.apache.hadoop.io package) that replace Java's primitive and String types. These classes serialize their values quickly and compactly.
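
Text applies the same idea to strings: it stores the string as UTF-8 bytes and serializes those bytes behind a length prefix. The hypothetical MyText class below is a simplified sketch of that idea using only the JDK. It is not Hadoop's Text: it writes a fixed 4-byte length prefix where Text writes a variable-length integer, and it skips the Writable interface to stay short, although it keeps the same method names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class TextSketch {

    // Hypothetical, simplified Text-like type: holds the UTF-8 bytes of a string.
    static class MyText {
        private byte[] bytes = new byte[0];

        MyText() { }
        MyText(String s) { set(s); }

        void set(String s) { bytes = s.getBytes(StandardCharsets.UTF_8); }

        @Override
        public String toString() { return new String(bytes, StandardCharsets.UTF_8); }

        // Length prefix followed by the raw UTF-8 bytes.
        // Simplification: a fixed 4-byte int prefix (Hadoop's Text uses a variable-length int).
        void write(DataOutput out) throws IOException {
            out.writeInt(bytes.length);
            out.write(bytes);
        }

        void readFields(DataInput in) throws IOException {
            int length = in.readInt();
            byte[] buf = new byte[length];
            in.readFully(buf);                    // read exactly 'length' bytes
            bytes = buf;
        }
    }

    static byte[] toBytes(MyText text) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            text.write(out);
        }
        return buffer.toByteArray();
    }

    static MyText fromBytes(byte[] data) throws IOException {
        MyText text = new MyText();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            text.readFields(in);
        }
        return text;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = toBytes(new MyText("Hadoop"));
        System.out.println(data.length + " bytes -> " + fromBytes(data)); // 10 bytes -> Hadoop
    }
}
```

ASCII characters take one byte each in UTF-8, so "Hadoop" costs 4 + 6 = 10 bytes here; a character such as "é" takes two.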

Big Data In Real World
We are a group of Big Data engineers who are passionate about Big Data and related technologies. We have designed, developed, deployed and maintained Big Data applications ranging from batch to real-time streaming platforms. We have seen a wide range of real-world Big Data problems and implemented some innovative and complex (or simple, depending on how you look at it) solutions.
