A quick answer that might come to mind is to call count() on the dataframe and check whether the count is greater than 0. However, count() on a dataframe with a lot of records is very inefficient.
count() runs a full Spark job: it counts the records in every partition of the dataframe and then adds the intermediate counts together to get the final total, so it has to scan all the data. You will find this approach very slow for big dataframes.
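As a quick illustration, here is what that naive check looks like. This is only a sketch that assumes a dataframe named df already exists, and it is shown to highlight the cost of the full count:

val notEmpty = df.count() > 0  // triggers a full Spark job: every partition is counted
                               // and the partial counts are summed into a single total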
Optimal way to check if dataframe is empty
Use the head function in place of count.
df.head(1).isEmpty
This is efficient because to determine whether a dataframe is empty, Spark only needs to know whether it has at least one record; it does not need to count them all.
Note that calling head() with no arguments on an empty dataframe throws a java.util.NoSuchElementException, so make sure to use head(1), which returns an array and is safe to call on an empty dataframe.
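For context, below is a minimal self-contained sketch of the head(1)-based check. The sample data, the helper name isDataFrameEmpty, and the local SparkSession setup are illustrative assumptions, not part of the original post:

import org.apache.spark.sql.{DataFrame, SparkSession}

object EmptyCheckExample {
  // Returns true when the dataframe has no rows; head(1) fetches at most one record.
  def isDataFrameEmpty(df: DataFrame): Boolean = df.head(1).isEmpty

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("empty-check-example")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val nonEmptyDf = Seq(1, 2, 3).toDF("id")
    val emptyDf = Seq.empty[Int].toDF("id")

    println(isDataFrameEmpty(nonEmptyDf)) // false
    println(isDataFrameEmpty(emptyDf))    // true

    spark.stop()
  }
}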