www.bigdatainrealworld.com is now fully live. Hadoop In Real World is now Big Data In Real World! In case you missed our communication from last week, you can […]
Let’s say you have a Hive table that points at a location or directory with several subdirectories, and each subdirectory […]
Let’s say we have a DataFrame like below.
+---------+-------+---------------+
|  Project|   Name|Cost_To_Project|
+---------+-------+---------------+
|Ingestion|  Jerry|           1000|
|Ingestion|   Arya| […]
Deleting a single document is pretty straightforward in Elasticsearch. We can simply issue a DELETE on the document id and the document will be deleted from […]
How to parse information from a URL in Hive? Hive offers two functions to work with URLs – parse_url and parse_url_tuple. With both functions you can extract […]
The stack function in Spark takes the number of rows as its first argument, followed by expressions: stack(n, expr1, expr2, ..., exprk). The stack function will generate n rows by […]
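To make the row-generation behaviour concrete, here is a minimal plain-Python sketch of what Spark's stack does with its arguments. It is not Spark code: stack_rows is a hypothetical helper written for illustration, and None stands in for SQL NULL when the expressions do not divide evenly into n rows.

```python
def stack_rows(n, *exprs):
    """Mimic Spark's stack(n, expr1, ..., exprk) on plain Python values.

    Hypothetical helper for illustration only, not part of any Spark API.
    The expressions are split, in order, into n rows of equal width;
    missing trailing cells are filled with None (standing in for NULL).
    """
    if n <= 0:
        raise ValueError("n must be a positive number of rows")
    width = -(-len(exprs) // n)  # ceiling division: columns per row
    padded = list(exprs) + [None] * (n * width - len(exprs))
    return [tuple(padded[i * width:(i + 1) * width]) for i in range(n)]


# Four expressions spread over two rows: two columns per row.
even_rows = stack_rows(2, "Ingestion", 1000, "Reporting", 500)

# Three expressions over two rows: the last cell is padded with None.
padded_rows = stack_rows(2, 1, 2, 3)

print(even_rows)
print(padded_rows)
```

In Spark itself the equivalent call would be written in SQL or through a SQL expression, and the generated columns are named col0, col1, and so on by default.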
Both map and flatMap are transformation functions. When applied to an RDD, map and flatMap transform each element inside the RDD to something. Consider this simple […]
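The difference between the two transformations can be sketched in plain Python without a Spark cluster. The helpers rdd_map and rdd_flat_map below are hypothetical stand-ins for RDD.map and RDD.flatMap, operating on an ordinary list, and the sample lines are made up for illustration.

```python
from itertools import chain


def rdd_map(func, elements):
    """One output per input element, like map on an RDD (illustrative stand-in)."""
    return [func(x) for x in elements]


def rdd_flat_map(func, elements):
    """Apply func, then flatten the per-element results one level, like flatMap."""
    return list(chain.from_iterable(func(x) for x in elements))


# Made-up input: each element is one line of text.
lines = ["hello world", "big data"]

# map keeps one result per line, so we get a list of word lists.
mapped = rdd_map(lambda line: line.split(" "), lines)

# flatMap flattens those word lists into a single list of words.
flattened = rdd_flat_map(lambda line: line.split(" "), lines)

print(mapped)
print(flattened)
```

With map the output has exactly as many elements as the input; with flatMap the output can have more or fewer, because each input element may contribute zero or more results.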
Simple problem with a simple solution. Solution: Use hdfs dfs -count to get the count of files and directories inside the directory. [hirw@wk1 ~]$ hdfs dfs […]
Let’s consider the below table employee_depts with 2 columns – ename and dept_list. dept_list is of type array and has the list of departments. CREATE TABLE […]
The difference between static and dynamic partitioning exists only while partitions are being created; it lies in how the partitions are added to the table. Once […]