{"id":2284,"date":"2023-05-22T06:00:00","date_gmt":"2023-05-22T11:00:00","guid":{"rendered":"https:\/\/www.bigdatainrealworld.com\/?p=2284"},"modified":"2023-05-09T07:24:24","modified_gmt":"2023-05-09T12:24:24","slug":"how-to-add-total-count-of-dataframe-to-an-already-grouped-dataframe","status":"publish","type":"post","link":"https:\/\/www.bigdatainrealworld.com\/how-to-add-total-count-of-dataframe-to-an-already-grouped-dataframe\/","title":{"rendered":"How to add total count of DataFrame to an already grouped DataFrame?"},"content":{"rendered":"\n

<\/p>\n\n\n\n

Here is our data. We have an employee DataFrame with three columns: Project, Name and Cost_To_Project. An employee can be assigned to multiple projects, and each assignment carries a Cost_To_Project value.<\/p>\n\n\n\n

val data = Seq(\r\n      (\"Ingestion\", \"Jerry\", 1000), (\"Ingestion\", \"Arya\", 2000), (\"Ingestion\", \"Emily\", 3000),\r\n      (\"ML\", \"Riley\", 9000), (\"ML\", \"Patrick\", 1000), (\"ML\", \"Mickey\", 8000),\r\n      (\"Analytics\", \"Donald\", 1000), (\"Ingestion\", \"John\", 1000), (\"Analytics\", \"Emily\", 8000),\r\n      (\"Analytics\", \"Arya\", 10000), (\"BI\", \"Mickey\", 12000), (\"BI\", \"Martin\", 5000))\r\n\r\nimport spark.sqlContext.implicits._\r\n\r\nval df = data.toDF(\"Project\", \"Name\", \"Cost_To_Project\")\r\n\r\n\r\nscala> df.show()\r\n+---------+-------+---------------+\r\n|  Project|   Name|Cost_To_Project|\r\n+---------+-------+---------------+\r\n|Ingestion|  Jerry|           1000|\r\n|Ingestion|   Arya|           2000|\r\n|Ingestion|  Emily|           3000|\r\n|       ML|  Riley|           9000|\r\n|       ML|Patrick|           1000|\r\n|       ML| Mickey|           8000|\r\n|Analytics| Donald|           1000|\r\n|Ingestion|   John|           1000|\r\n|Analytics|  Emily|           8000|\r\n|Analytics|   Arya|          10000|\r\n|       BI| Mickey|          12000|\r\n|       BI| Martin|           5000|\r\n+---------+-------+---------------+\r\n<\/pre>\n\n\n\n

We want to group the dataset by Name and count the rows to see how many projects each employee is assigned to. In addition to that per-employee count, we also want a column holding the total row count of the DataFrame, as shown below.<\/p>\n\n\n\n

<\/p>\n\n\n\n

+-------+------------------+-----------+\r\n|   Name|number_of_projects|Total Count|\r\n+-------+------------------+-----------+\r\n| Mickey|                 2|         12|\r\n| Martin|                 1|         12|\r\n|  Jerry|                 1|         12|\r\n|  Riley|                 1|         12|\r\n| Donald|                 1|         12|\r\n|   John|                 1|         12|\r\n|Patrick|                 1|         12|\r\n|  Emily|                 2|         12|\r\n|   Arya|                 2|         12|\r\n+-------+------------------+-----------+\r\n\n<\/pre>\n\n\n\n

Solution<\/h2>\n\n\n\n

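This is simple to achieve. Count the rows of the original DataFrame with df.count and add the result to the grouped DataFrame as a literal column using lit. Note that df.count is an action, so Spark runs a separate job to compute it.<\/p>\n\n\n\n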

<\/p>\n\n\n\n

\/\/ count and lit come from Spark's SQL functions (spark-shell imports them automatically)\r\nimport org.apache.spark.sql.functions.{count, lit}\r\n\r\n\/\/ Count projects per employee, then attach the total row count as a constant column\r\nval groupBy = df.groupBy(\"Name\").agg(count(\"*\").alias(\"number_of_projects\")).withColumn(\"Total Count\", lit(df.count))\r\ngroupBy.show()\r\n\r\n+-------+------------------+-----------+\r\n|   Name|number_of_projects|Total Count|\r\n+-------+------------------+-----------+\r\n| Mickey|                 2|         12|\r\n| Martin|                 1|         12|\r\n|  Jerry|                 1|         12|\r\n|  Riley|                 1|         12|\r\n| Donald|                 1|         12|\r\n|   John|                 1|         12|\r\n|Patrick|                 1|         12|\r\n|  Emily|                 2|         12|\r\n|   Arya|                 2|         12|\r\n+-------+------------------+-----------+\r\n<\/pre>\n","protected":false},"excerpt":{"rendered":"

Here is our data. We have an employee DataFrame with 3 columns, name, project and cost_to_project. An employee can belong to multiple projects and for each [\u2026]<\/span><\/p>\n","protected":false},"author":3,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[13],"tags":[],"class_list":["post-2284","post","type-post","status-publish","format-standard","hentry","category-spark"],"yoast_head":"\nHow to add total count of DataFrame to an already grouped DataFrame? - Big Data In Real World<\/title>\n<meta name=\"description\" content=\"In this short post we demonstrate how to add a total count of DataFrame to a DataFrame which is already grouped.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.bigdatainrealworld.com\/how-to-add-total-count-of-dataframe-to-an-already-grouped-dataframe\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"How to add total count of DataFrame to an already grouped DataFrame? 
- Big Data In Real World\" \/>\n<meta property=\"og:description\" content=\"In this short post we demonstrate how to add a total count of DataFrame to a DataFrame which is already grouped.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.bigdatainrealworld.com\/how-to-add-total-count-of-dataframe-to-an-already-grouped-dataframe\/\" \/>\n<meta property=\"og:site_name\" content=\"Big Data In Real World\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/bigdatainrealworld\" \/>\n<meta property=\"article:published_time\" content=\"2023-05-22T11:00:00+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2023-05-09T12:24:24+00:00\" \/>\n<meta name=\"author\" content=\"Big Data In Real World\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Big Data In Real World\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"2 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/www.bigdatainrealworld.com\/how-to-add-total-count-of-dataframe-to-an-already-grouped-dataframe\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.bigdatainrealworld.com\/how-to-add-total-count-of-dataframe-to-an-already-grouped-dataframe\/\"},\"author\":{\"name\":\"Big Data In Real World\",\"@id\":\"https:\/\/www.bigdatainrealworld.com\/#\/schema\/person\/24cab2292ef49c73053440c86515ef67\"},\"headline\":\"How to add total count of DataFrame to an already grouped 
DataFrame?\",\"datePublished\":\"2023-05-22T11:00:00+00:00\",\"dateModified\":\"2023-05-09T12:24:24+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.bigdatainrealworld.com\/how-to-add-total-count-of-dataframe-to-an-already-grouped-dataframe\/\"},\"wordCount\":113,\"publisher\":{\"@id\":\"https:\/\/www.bigdatainrealworld.com\/#organization\"},\"articleSection\":[\"Spark\"],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.bigdatainrealworld.com\/how-to-add-total-count-of-dataframe-to-an-already-grouped-dataframe\/\",\"url\":\"https:\/\/www.bigdatainrealworld.com\/how-to-add-total-count-of-dataframe-to-an-already-grouped-dataframe\/\",\"name\":\"How to add total count of DataFrame to an already grouped DataFrame? - Big Data In Real World\",\"isPartOf\":{\"@id\":\"https:\/\/www.bigdatainrealworld.com\/#website\"},\"datePublished\":\"2023-05-22T11:00:00+00:00\",\"dateModified\":\"2023-05-09T12:24:24+00:00\",\"description\":\"In this short post we demonstrate how to add a total count of DataFrame to a DataFrame which is already grouped.\",\"breadcrumb\":{\"@id\":\"https:\/\/www.bigdatainrealworld.com\/how-to-add-total-count-of-dataframe-to-an-already-grouped-dataframe\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.bigdatainrealworld.com\/how-to-add-total-count-of-dataframe-to-an-already-grouped-dataframe\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.bigdatainrealworld.com\/how-to-add-total-count-of-dataframe-to-an-already-grouped-dataframe\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.bigdatainrealworld.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"How to add total count of DataFrame to an already grouped DataFrame?\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.bigdatainrealworld.com\/#website\",\"url\":\"https:\/\/www.bigdatainrealworld.com\/\",\"name\":\"Big Data In 
Real World\",\"description\":\"Learn Big Data from experts!\",\"publisher\":{\"@id\":\"https:\/\/www.bigdatainrealworld.com\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.bigdatainrealworld.com\/?s={search_term_string}\"},\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.bigdatainrealworld.com\/#organization\",\"name\":\"Big Data In Real World\",\"url\":\"https:\/\/www.bigdatainrealworld.com\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.bigdatainrealworld.com\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2023\/02\/black.png\",\"contentUrl\":\"https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2023\/02\/black.png\",\"width\":500,\"height\":500,\"caption\":\"Big Data In Real World\"},\"image\":{\"@id\":\"https:\/\/www.bigdatainrealworld.com\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/bigdatainrealworld\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.bigdatainrealworld.com\/#\/schema\/person\/24cab2292ef49c73053440c86515ef67\",\"name\":\"Big Data In Real World\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.bigdatainrealworld.com\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/d332bc24fe9b3182f0a22135f163ac4e?s=96&d=retro&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/d332bc24fe9b3182f0a22135f163ac4e?s=96&d=retro&r=g\",\"caption\":\"Big Data In Real World\"},\"description\":\"We are a group of Big Data engineers who are passionate about Big Data and related Big Data technologies. We have designed, developed, deployed and maintained Big Data applications ranging from batch to real time streaming big data platforms. 
We have seen a wide range of real world big data problems, implemented some innovative and complex (or simple, depending on how you look at it) solutions.\",\"sameAs\":[\"https:\/\/www.bigdatainrealworld.com\/\"],\"url\":\"https:\/\/www.bigdatainrealworld.com\/author\/bigdatainrealworld\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"How to add total count of DataFrame to an already grouped DataFrame? - Big Data In Real World","description":"In this short post we demonstrate how to add a total count of DataFrame to a DataFrame which is already grouped.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.bigdatainrealworld.com\/how-to-add-total-count-of-dataframe-to-an-already-grouped-dataframe\/","og_locale":"en_US","og_type":"article","og_title":"How to add total count of DataFrame to an already grouped DataFrame? - Big Data In Real World","og_description":"In this short post we demonstrate how to add a total count of DataFrame to a DataFrame which is already grouped.","og_url":"https:\/\/www.bigdatainrealworld.com\/how-to-add-total-count-of-dataframe-to-an-already-grouped-dataframe\/","og_site_name":"Big Data In Real World","article_publisher":"https:\/\/www.facebook.com\/bigdatainrealworld","article_published_time":"2023-05-22T11:00:00+00:00","article_modified_time":"2023-05-09T12:24:24+00:00","author":"Big Data In Real World","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Big Data In Real World","Est. 
reading time":"2 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.bigdatainrealworld.com\/how-to-add-total-count-of-dataframe-to-an-already-grouped-dataframe\/#article","isPartOf":{"@id":"https:\/\/www.bigdatainrealworld.com\/how-to-add-total-count-of-dataframe-to-an-already-grouped-dataframe\/"},"author":{"name":"Big Data In Real World","@id":"https:\/\/www.bigdatainrealworld.com\/#\/schema\/person\/24cab2292ef49c73053440c86515ef67"},"headline":"How to add total count of DataFrame to an already grouped DataFrame?","datePublished":"2023-05-22T11:00:00+00:00","dateModified":"2023-05-09T12:24:24+00:00","mainEntityOfPage":{"@id":"https:\/\/www.bigdatainrealworld.com\/how-to-add-total-count-of-dataframe-to-an-already-grouped-dataframe\/"},"wordCount":113,"publisher":{"@id":"https:\/\/www.bigdatainrealworld.com\/#organization"},"articleSection":["Spark"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.bigdatainrealworld.com\/how-to-add-total-count-of-dataframe-to-an-already-grouped-dataframe\/","url":"https:\/\/www.bigdatainrealworld.com\/how-to-add-total-count-of-dataframe-to-an-already-grouped-dataframe\/","name":"How to add total count of DataFrame to an already grouped DataFrame? 
- Big Data In Real World","isPartOf":{"@id":"https:\/\/www.bigdatainrealworld.com\/#website"},"datePublished":"2023-05-22T11:00:00+00:00","dateModified":"2023-05-09T12:24:24+00:00","description":"In this short post we demonstrate how to add a total count of DataFrame to a DataFrame which is already grouped.","breadcrumb":{"@id":"https:\/\/www.bigdatainrealworld.com\/how-to-add-total-count-of-dataframe-to-an-already-grouped-dataframe\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.bigdatainrealworld.com\/how-to-add-total-count-of-dataframe-to-an-already-grouped-dataframe\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/www.bigdatainrealworld.com\/how-to-add-total-count-of-dataframe-to-an-already-grouped-dataframe\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.bigdatainrealworld.com\/"},{"@type":"ListItem","position":2,"name":"How to add total count of DataFrame to an already grouped DataFrame?"}]},{"@type":"WebSite","@id":"https:\/\/www.bigdatainrealworld.com\/#website","url":"https:\/\/www.bigdatainrealworld.com\/","name":"Big Data In Real World","description":"Learn Big Data from experts!","publisher":{"@id":"https:\/\/www.bigdatainrealworld.com\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.bigdatainrealworld.com\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.bigdatainrealworld.com\/#organization","name":"Big Data In Real 
World","url":"https:\/\/www.bigdatainrealworld.com\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.bigdatainrealworld.com\/#\/schema\/logo\/image\/","url":"https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2023\/02\/black.png","contentUrl":"https:\/\/www.bigdatainrealworld.com\/wp-content\/uploads\/2023\/02\/black.png","width":500,"height":500,"caption":"Big Data In Real World"},"image":{"@id":"https:\/\/www.bigdatainrealworld.com\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/bigdatainrealworld"]},{"@type":"Person","@id":"https:\/\/www.bigdatainrealworld.com\/#\/schema\/person\/24cab2292ef49c73053440c86515ef67","name":"Big Data In Real World","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.bigdatainrealworld.com\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/d332bc24fe9b3182f0a22135f163ac4e?s=96&d=retro&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/d332bc24fe9b3182f0a22135f163ac4e?s=96&d=retro&r=g","caption":"Big Data In Real World"},"description":"We are a group of Big Data engineers who are passionate about Big Data and related Big Data technologies. We have designed, developed, deployed and maintained Big Data applications ranging from batch to real time streaming big data platforms. 
We have seen a wide range of real world big data problems, implemented some innovative and complex (or simple, depending on how you look at it) solutions.","sameAs":["https:\/\/www.bigdatainrealworld.com\/"],"url":"https:\/\/www.bigdatainrealworld.com\/author\/bigdatainrealworld\/"}]}},"_links":{"self":[{"href":"https:\/\/www.bigdatainrealworld.com\/wp-json\/wp\/v2\/posts\/2284","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.bigdatainrealworld.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.bigdatainrealworld.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.bigdatainrealworld.com\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/www.bigdatainrealworld.com\/wp-json\/wp\/v2\/comments?post=2284"}],"version-history":[{"count":1,"href":"https:\/\/www.bigdatainrealworld.com\/wp-json\/wp\/v2\/posts\/2284\/revisions"}],"predecessor-version":[{"id":2285,"href":"https:\/\/www.bigdatainrealworld.com\/wp-json\/wp\/v2\/posts\/2284\/revisions\/2285"}],"wp:attachment":[{"href":"https:\/\/www.bigdatainrealworld.com\/wp-json\/wp\/v2\/media?parent=2284"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.bigdatainrealworld.com\/wp-json\/wp\/v2\/categories?post=2284"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.bigdatainrealworld.com\/wp-json\/wp\/v2\/tags?post=2284"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}