How to get a few lines of data from a file in HDFS? - Big Data In Real World

How to get a few lines of data from a file in HDFS?


Use hdfs dfs -cat piped to head to see a few lines from the top of a file, and hdfs dfs -tail to see the end of the file.

Few lines from top of the file

Use the cat command followed by head to get the top few lines from a file. By default, head prints the first 10 lines.

[hirw@wk1 ~]$ hdfs dfs -cat /user/zeppelin/notebook/2CA587K77/note.json | head

{
  "paragraphs": [
    {
      "text": "%md\n\n## Exploring Spark SQL Module\n#### with an Airline Dataset\n\n**Level**: 
Beginner\n**Language**: Scala\n**Requirements**: \n- [HDP 2.6](http://hortonworks.com/products/sandbox/) 
(or later) or [HDCloud](https://hortonworks.github.io/hdp-aws/)\n- Spark 2.x\n\n**Author**: 
Robert Hryniewicz\n**Follow** [@RobertH8z](https://twitter.com/RobertH8z)",
      "user": "admin",
      "dateUpdated": "Feb 22, 2017 3:45:16 PM",
      "config": {
        "editorMode": "ace/mode/markdown",
        "colWidth": 12.0,
        "editorHide": true,
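
If you need a specific number of lines rather than the default 10, head accepts a line count with -n. The snippet below is a minimal sketch of that idea; the function name hdfs_head is a hypothetical helper made up for illustration and is not part of Hadoop.

```shell
# Hypothetical helper (not part of Hadoop): print the first N lines
# of an HDFS file. N defaults to 10, the same default head uses.
# Usage: hdfs_head <hdfs-path> [line-count]
hdfs_head() {
  # head stops reading once it has printed N lines, which closes the
  # pipe, so the whole file does not have to be streamed to the client.
  hdfs dfs -cat "$1" | head -n "${2:-10}"
}

# Example call, using the path from this post:
# hdfs_head /user/zeppelin/notebook/2CA587K77/note.json 5
```

Because head exits early, this approach stays quick even on large files.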

Few lines from bottom of the file

Use the -tail option of hdfs dfs to see the end of the file. It prints the last kilobyte of the file rather than a fixed number of lines.

[hirw@wk1 ~]$ hdfs dfs -tail /user/zeppelin/notebook/2CA587K77/note.json

      "progressUpdateIntervalMs": 500
    }
  ],
  "name": "Labs / Spark 2.x / Data Worker / Scala / 101 - Intro to SparkSQL",
  "id": "2CA587K77",
  "angularObjects": {
    "2C9J4X9BB:shared_process": [],
    "2C97XTJFE:shared_process": [],
    "2C9BD8WCX:shared_process": [],
    "2CBT85YD7:shared_process": [],
    "2C8RGTKC3:shared_process": [],
    "2CBQNWPMD:shared_process": [],
    "2C8JDGPHH:shared_process": [],
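
Since -tail works in bytes, it cannot be told to return an exact number of lines. One possible workaround, sketched below, is to pipe cat into the regular tail command. The function name hdfs_tail_lines is a hypothetical helper made up for illustration and is not part of Hadoop.

```shell
# Hypothetical helper (not part of Hadoop): print the last N lines
# of an HDFS file. N defaults to 10, the same default tail uses.
# Usage: hdfs_tail_lines <hdfs-path> [line-count]
hdfs_tail_lines() {
  # Unlike hdfs dfs -tail, this streams the entire file through the
  # client before tail can print anything, so expect it to be slow
  # on large files.
  hdfs dfs -cat "$1" | tail -n "${2:-10}"
}

# Example call, using the path from this post:
# hdfs_tail_lines /user/zeppelin/notebook/2CA587K77/note.json 5
```

For very large files, hdfs dfs -tail is likely the better choice when the last kilobyte is enough, because it avoids reading the whole file.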

 

Big Data In Real World
We are a group of Big Data engineers who are passionate about Big Data and related technologies. We have designed, developed, deployed and maintained Big Data applications ranging from batch to real-time streaming platforms. We have seen a wide range of real-world big data problems and implemented some innovative and complex (or simple, depending on how you look at it) solutions.
