{"id":2005,"date":"2021-04-09T06:00:00","date_gmt":"2021-04-09T11:00:00","guid":{"rendered":"https:\/\/www.bigdatainrealworld.com\/?p=2005"},"modified":"2023-02-19T07:31:30","modified_gmt":"2023-02-19T13:31:30","slug":"how-to-solve-word-count-problem-in-hive","status":"publish","type":"post","link":"https:\/\/www.bigdatainrealworld.com\/how-to-solve-word-count-problem-in-hive\/","title":{"rendered":"How to solve word count problem in Hive?"},"content":{"rendered":"

If you have read about MapReduce, you know what the word count problem is: counting the number of occurrences of each word in a dataset. You probably also know how this problem is solved with MapReduce.<\/p>\n

In this post we are going to see how to solve the word count problem in Hive.<\/span><\/p>\n

We have a file with the following content.<\/span><\/p>\n

When different join strategy hints are specified on both sides of a join, Spark prioritizes the BROADCAST hint over the MERGE hint over the SHUFFLE_HASH hint over the SHUFFLE_REPLICATE_NL hint<\/span><\/p>\n
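
To follow along, the sample line needs to be available as a Hive table. A minimal setup sketch, assuming a single-column table named textfile with a string column named line (the names used by the query later in this post) and a hypothetical local path /tmp/wordcount.txt:

```sql
-- One STRING column: each line of the file becomes one row.
CREATE TABLE textfile (line STRING);

-- The path is a placeholder (an assumption); point it at the sample file.
LOAD DATA LOCAL INPATH '/tmp/wordcount.txt' INTO TABLE textfile;
```

With the default text format the line contains no field delimiter, so the whole line lands in the line column.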

Our solution should produce the output below, which is the number of occurrences of each word in the file.<\/span><\/p>\n

+-----------------------+------+--+\n|         words         | _c1  |\n+-----------------------+------+--+\n| BROADCAST             | 1    |\n| MERGE                 | 1    |\n| SHUFFLE_HASH          | 1    |\n| SHUFFLE_REPLICATE_NL  | 1    |\n| Spark                 | 1    |\n| When                  | 1    |\n| a                     | 1    |\n| are                   | 1    |\n| both                  | 1    |\n| different             | 1    |\n| hint                  | 4    |\n| hints                 | 1    |\n| join                  | 1    |\n| join,                 | 1    |\n| of                    | 1    |\n| on                    | 1    |\n| over                  | 3    |\n| prioritizes           | 1    |\n| sides                 | 1    |\n| specified             | 1    |\n| strategy              | 1    |\n| the                   | 4    |\n+-----------------------+------+--+<\/pre>\n

Solution<\/span><\/h2>\n

We will be using split(), explode() and lateral view to solve this problem.<\/span><\/p>\n

split()<\/span><\/h3>\n

Step 1 – we split each line of the file on the space character. split() turns each line of the file into an array of words.<\/span><\/p>\n
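
As a quick illustration of this step on its own, the query below returns one array per input line. It is only a sketch and assumes the textfile table and its line column used by the final query in this post; word_array is an illustrative alias.

```sql
-- split(str, pattern) returns array<string>; the pattern here is a single space.
SELECT split(line, ' ') AS word_array
FROM textfile;
```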

explode()<\/span><\/h3>\n

Step 2 – we apply the explode() function to the array of words. explode() is one of Hive's built-in<\/span> table-generating functions (UDTFs), which take in a single row and explode it into multiple rows.<\/span><\/p>\n

In this case, explode will take the array of words and explode each word into a row. If the array has 5 words, we will end up with 5 rows.<\/span><\/p>\n
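
The effect of explode() can be seen in isolation with a sketch like the one below, again assuming the textfile table and line column from the final query. When a table-generating function is used directly in the SELECT list like this, Hive does not allow other expressions alongside it, which is the limitation LATERAL VIEW removes.

```sql
-- Each element of the array produced by split() becomes its own row.
SELECT explode(split(line, ' ')) AS words
FROM textfile;
```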

LATERAL VIEW<\/span><\/h3>\n

Lateral view is used in conjunction with user-defined table generating functions such as explode()<\/span><\/span>.<\/span><\/p>\n

A lateral view first applies the UDTF to each row of the base table and then joins resulting output rows to the input rows to form a virtual table having the supplied table alias.<\/span><\/p>\n

LATERAL VIEW can\u2019t function alone. It needs to be used along with a UDTF. Here we use explode()<\/span> to first explode the array into individual rows, one per word. We name the resulting virtual table expl_words and its single column words.<\/span><\/p>\n

SELECT words, count(1)\nFROM textfile\nLATERAL VIEW EXPLODE(SPLIT(line, ' ')) expl_words AS words\nGROUP BY words;<\/pre>\n

LATERAL VIEW joins the exploded output rows to the input rows from textfile. In this case, we do not select the line column from textfile because we are not interested in that column.<\/span><\/p>\n
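
To make that join visible, the original line can be selected next to each exploded word. This is a sketch that reuses the textfile table and the expl_words alias from the query above, without the aggregation:

```sql
-- Every exploded word is paired with the input row it came from,
-- so the same line value repeats once per word.
SELECT line, words
FROM textfile
LATERAL VIEW explode(split(line, ' ')) expl_words AS words;
```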

SELECT words, count(1)\nFROM textfile\nLATERAL VIEW EXPLODE(SPLIT(line, ' ')) expl_words AS words\nGROUP BY words;\nINFO  : Session is already open\nINFO  : Dag name: SELECT words, count(1)\nFROM textfile...words(Stage-1)\nINFO  : Tez session was closed. Reopening...\nINFO  : Session re-established.\nINFO  : Status: Running (Executing on YARN cluster with App id application_1604763385917_0004)\n--------------------------------------------------------------------------------\n\n       VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED\n\n--------------------------------------------------------------------------------\nMap 1 ..........   SUCCEEDED      1          1        0        0       0       0\nReducer 2 ......   SUCCEEDED      1          1        0        0       0       0\n--------------------------------------------------------------------------------\nVERTICES: 02\/02  [==========================>>] 100%  ELAPSED TIME: 24.89 s\n--------------------------------------------------------------------------------\n+-----------------------+------+--+\n|         words         | _c1  |\n+-----------------------+------+--+\n| BROADCAST             | 1    |\n| MERGE                 | 1    |\n| SHUFFLE_HASH          | 1    |\n| SHUFFLE_REPLICATE_NL  | 1    |\n| Spark                 | 1    |\n| When                  | 1    |\n| a                     | 1    |\n| are                   | 1    |\n| both                  | 1    |\n| different             | 1    |\n| hint                  | 4    |\n| hints                 | 1    |\n| join                  | 1    |\n| join,                 | 1    |\n| of                    | 1    |\n| on                    | 1    |\n| over                  | 3    |\n| prioritizes           | 1    |\n| sides                 | 1    |\n| specified             | 1    |\n| strategy              | 1    |\n| the                   | 4    |\n+-----------------------+------+--+\n22 rows selected (27.572 seconds)\n\n\n<\/pre>\n
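
Note that the output counts join and join, as two different words, and that the count column gets the generated name _c1. One possible refinement, offered only as a sketch: strip punctuation before grouping and give the count an alias. The names word, cnt and cleaned are illustrative, and the character class is an assumption about what should count as part of a word.

```sql
SELECT word, count(*) AS cnt
FROM (
  -- Keep letters, digits and underscores; drop everything else (e.g. the comma in "join,").
  SELECT regexp_replace(words, '[^A-Za-z0-9_]', '') AS word
  FROM textfile
  LATERAL VIEW explode(split(line, ' ')) expl_words AS words
) cleaned
WHERE word <> ''   -- skip tokens that were punctuation only
GROUP BY word;
```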

 <\/p>\n","protected":false},"excerpt":{"rendered":"

If you have read about MapReduce you know what a word count problem is. Word count is simply counting the number of words in a dataset. [\u2026]<\/span><\/p>\n","protected":false},"author":3,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[12],"tags":[],"class_list":["post-2005","post","type-post","status-publish","format-standard","hentry","category-apache-hive"],"yoast_head":"\nHow to solve word count problem in Hive? - Big Data In Real World<\/title>\n<meta name=\"description\" content=\"Easy to follow, step by step instructions on how to solve word count problem with Hive with code.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.bigdatainrealworld.com\/how-to-solve-word-count-problem-in-hive\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"How to solve word count problem in Hive? 
- Big Data In Real World\" \/>\n<meta property=\"og:description\" content=\"Easy to follow, step by step instructions on how to solve word count problem with Hive with code.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.bigdatainrealworld.com\/how-to-solve-word-count-problem-in-hive\/\" \/>\n<meta property=\"og:site_name\" content=\"Big Data In Real World\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/bigdatainrealworld\" \/>\n<meta property=\"article:published_time\" content=\"2021-04-09T11:00:00+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2023-02-19T13:31:30+00:00\" \/>\n<meta name=\"author\" content=\"Big Data In Real World\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Big Data In Real World\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"5 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/www.bigdatainrealworld.com\/how-to-solve-word-count-problem-in-hive\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.bigdatainrealworld.com\/how-to-solve-word-count-problem-in-hive\/\"},\"author\":{\"name\":\"Big Data In Real World\",\"@id\":\"https:\/\/www.bigdatainrealworld.com\/#\/schema\/person\/24cab2292ef49c73053440c86515ef67\"},\"headline\":\"How to solve word count problem in Hive?\",\"datePublished\":\"2021-04-09T11:00:00+00:00\",\"dateModified\":\"2023-02-19T13:31:30+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.bigdatainrealworld.com\/how-to-solve-word-count-problem-in-hive\/\"},\"wordCount\":353,\"publisher\":{\"@id\":\"https:\/\/www.bigdatainrealworld.com\/#organization\"},\"articleSection\":[\"Apache 
Hive\"],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.bigdatainrealworld.com\/how-to-solve-word-count-problem-in-hive\/\",\"url\":\"https:\/\/www.bigdatainrealworld.com\/how-to-solve-word-count-problem-in-hive\/\",\"name\":\"How to solve word count problem in Hive? - Big Data In Real World\",\"isPartOf\":{\"@id\":\"https:\/\/www.bigdatainrealworld.com\/#website\"},\"datePublished\":\"2021-04-09T11:00:00+00:00\",\"dateModified\":\"2023-02-19T13:31:30+00:00\",\"description\":\"Easy to follow, step by step instructions on how to solve word count problem with Hive with code.\",\"breadcrumb\":{\"@id\":\"https:\/\/www.bigdatainrealworld.com\/how-to-solve-word-count-problem-in-hive\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.bigdatainrealworld.com\/how-to-solve-word-count-problem-in-hive\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.bigdatainrealworld.com\/how-to-solve-word-count-problem-in-hive\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.bigdatainrealworld.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"How to solve word count problem in Hive?\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.bigdatainrealworld.com\/#website\",\"url\":\"https:\/\/www.bigdatainrealworld.com\/\",\"name\":\"Big Data In Real World\",\"description\":\"Learn Big Data from experts!\",\"publisher\":{\"@id\":\"https:\/\/www.bigdatainrealworld.com\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.bigdatainrealworld.com\/?s={search_term_string}\"},\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.bigdatainrealworld.com\/#organization\",\"name\":\"Big Data In Real 
World\",\"url\":\"https:\/\/www.bigdatainrealworld.com\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.bigdatainrealworld.com\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2023\/02\/black.png\",\"contentUrl\":\"https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2023\/02\/black.png\",\"width\":500,\"height\":500,\"caption\":\"Big Data In Real World\"},\"image\":{\"@id\":\"https:\/\/www.bigdatainrealworld.com\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/bigdatainrealworld\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.bigdatainrealworld.com\/#\/schema\/person\/24cab2292ef49c73053440c86515ef67\",\"name\":\"Big Data In Real World\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.bigdatainrealworld.com\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/d332bc24fe9b3182f0a22135f163ac4e?s=96&d=retro&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/d332bc24fe9b3182f0a22135f163ac4e?s=96&d=retro&r=g\",\"caption\":\"Big Data In Real World\"},\"description\":\"We are a group of Big Data engineers who are passionate about Big Data and related Big Data technologies. We have designed, developed, deployed and maintained Big Data applications ranging from batch to real time streaming big data platforms. We have seen a wide range of real world big data problems, implemented some innovative and complex (or simple, depending on how you look at it) solutions.\",\"sameAs\":[\"https:\/\/www.bigdatainrealworld.com\/\"],\"url\":\"https:\/\/www.bigdatainrealworld.com\/author\/bigdatainrealworld\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"How to solve word count problem in Hive? 
- Big Data In Real World","description":"Easy to follow, step by step instructions on how to solve word count problem with Hive with code.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.bigdatainrealworld.com\/how-to-solve-word-count-problem-in-hive\/","og_locale":"en_US","og_type":"article","og_title":"How to solve word count problem in Hive? - Big Data In Real World","og_description":"Easy to follow, step by step instructions on how to solve word count problem with Hive with code.","og_url":"https:\/\/www.bigdatainrealworld.com\/how-to-solve-word-count-problem-in-hive\/","og_site_name":"Big Data In Real World","article_publisher":"https:\/\/www.facebook.com\/bigdatainrealworld","article_published_time":"2021-04-09T11:00:00+00:00","article_modified_time":"2023-02-19T13:31:30+00:00","author":"Big Data In Real World","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Big Data In Real World","Est. 
reading time":"5 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.bigdatainrealworld.com\/how-to-solve-word-count-problem-in-hive\/#article","isPartOf":{"@id":"https:\/\/www.bigdatainrealworld.com\/how-to-solve-word-count-problem-in-hive\/"},"author":{"name":"Big Data In Real World","@id":"https:\/\/www.bigdatainrealworld.com\/#\/schema\/person\/24cab2292ef49c73053440c86515ef67"},"headline":"How to solve word count problem in Hive?","datePublished":"2021-04-09T11:00:00+00:00","dateModified":"2023-02-19T13:31:30+00:00","mainEntityOfPage":{"@id":"https:\/\/www.bigdatainrealworld.com\/how-to-solve-word-count-problem-in-hive\/"},"wordCount":353,"publisher":{"@id":"https:\/\/www.bigdatainrealworld.com\/#organization"},"articleSection":["Apache Hive"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.bigdatainrealworld.com\/how-to-solve-word-count-problem-in-hive\/","url":"https:\/\/www.bigdatainrealworld.com\/how-to-solve-word-count-problem-in-hive\/","name":"How to solve word count problem in Hive? 
- Big Data In Real World","isPartOf":{"@id":"https:\/\/www.bigdatainrealworld.com\/#website"},"datePublished":"2021-04-09T11:00:00+00:00","dateModified":"2023-02-19T13:31:30+00:00","description":"Easy to follow, step by step instructions on how to solve word count problem with Hive with code.","breadcrumb":{"@id":"https:\/\/www.bigdatainrealworld.com\/how-to-solve-word-count-problem-in-hive\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.bigdatainrealworld.com\/how-to-solve-word-count-problem-in-hive\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/www.bigdatainrealworld.com\/how-to-solve-word-count-problem-in-hive\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.bigdatainrealworld.com\/"},{"@type":"ListItem","position":2,"name":"How to solve word count problem in Hive?"}]},{"@type":"WebSite","@id":"https:\/\/www.bigdatainrealworld.com\/#website","url":"https:\/\/www.bigdatainrealworld.com\/","name":"Big Data In Real World","description":"Learn Big Data from experts!","publisher":{"@id":"https:\/\/www.bigdatainrealworld.com\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.bigdatainrealworld.com\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.bigdatainrealworld.com\/#organization","name":"Big Data In Real World","url":"https:\/\/www.bigdatainrealworld.com\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.bigdatainrealworld.com\/#\/schema\/logo\/image\/","url":"https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2023\/02\/black.png","contentUrl":"https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2023\/02\/black.png","width":500,"height":500,"caption":"Big Data In Real 
World"},"image":{"@id":"https:\/\/www.bigdatainrealworld.com\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/bigdatainrealworld"]},{"@type":"Person","@id":"https:\/\/www.bigdatainrealworld.com\/#\/schema\/person\/24cab2292ef49c73053440c86515ef67","name":"Big Data In Real World","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.bigdatainrealworld.com\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/d332bc24fe9b3182f0a22135f163ac4e?s=96&d=retro&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/d332bc24fe9b3182f0a22135f163ac4e?s=96&d=retro&r=g","caption":"Big Data In Real World"},"description":"We are a group of Big Data engineers who are passionate about Big Data and related Big Data technologies. We have designed, developed, deployed and maintained Big Data applications ranging from batch to real time streaming big data platforms. We have seen a wide range of real world big data problems, implemented some innovative and complex (or simple, depending on how you look at it) 
solutions.","sameAs":["https:\/\/www.bigdatainrealworld.com\/"],"url":"https:\/\/www.bigdatainrealworld.com\/author\/bigdatainrealworld\/"}]}},"_links":{"self":[{"href":"https:\/\/www.bigdatainrealworld.com\/wp-json\/wp\/v2\/posts\/2005","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.bigdatainrealworld.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.bigdatainrealworld.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.bigdatainrealworld.com\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/www.bigdatainrealworld.com\/wp-json\/wp\/v2\/comments?post=2005"}],"version-history":[{"count":3,"href":"https:\/\/www.bigdatainrealworld.com\/wp-json\/wp\/v2\/posts\/2005\/revisions"}],"predecessor-version":[{"id":2044,"href":"https:\/\/www.bigdatainrealworld.com\/wp-json\/wp\/v2\/posts\/2005\/revisions\/2044"}],"wp:attachment":[{"href":"https:\/\/www.bigdatainrealworld.com\/wp-json\/wp\/v2\/media?parent=2005"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.bigdatainrealworld.com\/wp-json\/wp\/v2\/categories?post=2005"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.bigdatainrealworld.com\/wp-json\/wp\/v2\/tags?post=2005"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}