How to get a few lines of data from a file in HDFS?


Published by Big Data In Real World on September 13, 2021

A few lines from the top of the file

Use the cat command and pipe its output to head to get the first few lines of a file.

[hirw@wk1 ~]$ hdfs dfs -cat /user/zeppelin/notebook/2CA587K77/note.json | head

{
  "paragraphs": [
    {
      "text": "%md\n\n## Exploring Spark SQL Module\n#### with an Airline Dataset\n\n**Level**: Beginner\n**Language**: Scala\n**Requirements**: \n- [HDP 2.6](http://hortonworks.com/products/sandbox/) (or later) or [HDCloud](https://hortonworks.github.io/hdp-aws/)\n- Spark 2.x\n\n**Author**: Robert Hryniewicz\n**Follow** [@RobertH8z](https://twitter.com/RobertH8z)",
      "user": "admin",
      "dateUpdated": "Feb 22, 2017 3:45:16 PM",
      "config": {
        "editorMode": "ace/mode/markdown",
        "colWidth": 12.0,
        "editorHide": true,
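head prints 10 lines by default. To pick a different count, pass -n to head. The sketch below wraps the pipeline in a small shell function. The function name hdfs_head is hypothetical, and the sketch assumes a bash-compatible shell with the hdfs client on the PATH.

```shell
# hdfs_head: print the first N lines of a file stored in HDFS.
# Usage: hdfs_head <hdfs-path> [N]   (N defaults to 10, like head itself)
# Hypothetical helper name; it only wraps "hdfs dfs -cat" and "head -n".
hdfs_head() {
  local src="$1" n="${2:-10}"
  # Stream the file to stdout and keep only the first n lines.
  hdfs dfs -cat "$src" | head -n "$n"
}
```

For example, hdfs_head /user/zeppelin/notebook/2CA587K77/note.json 5 would print the first five lines of the notebook file. Because head exits as soon as it has printed N lines, the pipeline usually stops reading the file early instead of streaming all of it.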

A few lines from the bottom of the file

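Use the tail command to get the end of the file. Note that hdfs dfs -tail prints the last kilobyte of the file rather than a fixed number of lines, so its output can begin partway through the content.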

[hirw@wk1 ~]$ hdfs dfs -tail /user/zeppelin/notebook/2CA587K77/note.json

      "progressUpdateIntervalMs": 500
    }
  ],
  "name": "Labs / Spark 2.x / Data Worker / Scala / 101 - Intro to SparkSQL",
  "id": "2CA587K77",
  "angularObjects": {
    "2C9J4X9BB:shared_process": [],
    "2C97XTJFE:shared_process": [],
    "2C9BD8WCX:shared_process": [],
    "2CBT85YD7:shared_process": [],
    "2C8RGTKC3:shared_process": [],
    "2CBQNWPMD:shared_process": [],
    "2C8JDGPHH:shared_process": [],
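Since hdfs dfs -tail works in bytes rather than lines, one way to get an exact number of trailing lines is to stream the file through the local tail command. This is a minimal sketch, assuming a bash-compatible shell with the hdfs client on the PATH; hdfs_tail_lines is a hypothetical name.

```shell
# hdfs_tail_lines: print the last N lines of a file stored in HDFS.
# Usage: hdfs_tail_lines <hdfs-path> [N]   (N defaults to 10)
# Hypothetical helper name. Unlike "hdfs dfs -tail", which returns the last
# kilobyte, this pipes the whole file through the local tail, so it always
# yields whole lines but has to read the entire file.
hdfs_tail_lines() {
  local src="$1" n="${2:-10}"
  hdfs dfs -cat "$src" | tail -n "$n"
}
```

On a large file this can be slow. A cheaper variant is hdfs dfs -tail <path> | tail -n N, which reads only the last kilobyte, but it returns all N lines only when they fit inside that kilobyte.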

