You are getting the error below during DataNode startup. This post explains how to fix it. 2013-04-11 16:25:50,515 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting […]
Changing the number of replicas on an existing topic is a three-step process. Get the current information about the topic. Configure the new replica assignment in […]
java.net.BindException is a common exception when Spark tries to initialize SparkContext. It is especially common when you run Spark locally. 16/01/04 […]
Downloading an entire bucket or a folder inside a bucket is quite straightforward with the AWS CLI. Install the AWS CLI from here if you don’t have it already. […]
NameNode: The NameNode is the heart of HDFS. It maintains the metadata of HDFS – files, list of blocks, directories, permissions, etc. The metadata is persisted on […]
Let’s redefine the question a little bit. Can multiple Kafka consumers from a consumer group read the same message from a partition? The short answer to […]
Broadcast variables are variables that are available on all executors running the Spark application. These variables are already cached and ready to be used by tasks […]
Apache Pig was created by Yahoo. Apache Hive was created by Facebook. Both tools aim to hide the complexities of writing MapReduce jobs. Pig is similar […]