March 5, 2021
EBS, S3 and Glacier are different storage options available in Amazon Web Services. They differ in cost, use case and purpose. Here is a super quick summary.
EBS
- EBS is block storage
- Can be formatted and accessed as file system
- Tied to an Availability Zone and can be attached only to EC2 instances in that zone. But the data can be moved to another zone or region by copying a snapshot.
- You need an EC2 instance to access the data
- Snapshots can be taken for backups.
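As a rough illustration of the snapshot point above, here is a minimal sketch using boto3. It assumes you pass in boto3 EC2 clients created elsewhere (for example with `boto3.client("ec2", region_name=...)`); the helper names, volume ID and region names are hypothetical and only there for illustration.

```python
def snapshot_volume(ec2_client, volume_id, description="backup"):
    """Start a snapshot of an EBS volume and return the new snapshot ID.

    ec2_client is assumed to be a boto3 EC2 client bound to the
    region where the volume lives.
    """
    response = ec2_client.create_snapshot(
        VolumeId=volume_id,
        Description=description,
    )
    return response["SnapshotId"]


def copy_snapshot_to_region(dest_ec2_client, snapshot_id, source_region):
    """Copy a completed snapshot into the destination client's region.

    dest_ec2_client is assumed to be a boto3 EC2 client bound to the
    target region. The source snapshot must have finished before it
    can be copied.
    """
    response = dest_ec2_client.copy_snapshot(
        SourceRegion=source_region,
        SourceSnapshotId=snapshot_id,
    )
    return response["SnapshotId"]
```

Usage would look something like `snapshot_volume(ec2_us_east, "vol-0123456789abcdef0")` followed, once the snapshot has completed, by `copy_snapshot_to_region(ec2_eu_west, snap_id, "us-east-1")`. A new volume can then be created from the copied snapshot in the other region.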
S3
- S3 is object storage
- Can be accessed from any region and from the internet
- Great for storing large volumes of data
- Can be used for archiving data too
- S3 is not a filesystem, so file locking and filesystem-style permissions do not apply. Access is controlled with IAM and bucket policies instead
- Extremely reliable (designed for 99.999999999% durability of objects)
- Cheaper than EBS
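To make the "object storage" idea concrete, here is a small sketch, again assuming a boto3 S3 client is passed in (for example `boto3.client("s3")`). Objects are written and read whole, addressed by bucket and key, rather than opened and edited in place like files. The helper names, bucket and key are hypothetical.

```python
def save_text(s3_client, bucket, key, text):
    """Store a string as a single object under bucket/key."""
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=text.encode("utf-8"),
    )


def load_text(s3_client, bucket, key):
    """Fetch the whole object stored under bucket/key and decode it."""
    response = s3_client.get_object(Bucket=bucket, Key=key)
    # "Body" is a stream; read() returns the object's bytes.
    return response["Body"].read().decode("utf-8")
```

For example, `save_text(s3, "example-bucket", "notes/hello.txt", "hello")` followed by `load_text(s3, "example-bucket", "notes/hello.txt")`. Note that no EC2 instance is needed for this, unlike EBS.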
Glacier
- Cheapest of all 3 options
- Great for long term storage
- Not as high performance as S3. Retrieving archived data can take minutes to hours, depending on the retrieval option
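One common way S3 and Glacier are used together is an S3 lifecycle rule that moves older objects to the Glacier storage class. The sketch below builds such a rule as a plain dictionary in the shape boto3's `put_bucket_lifecycle_configuration` accepts. The helper names, rule ID, prefix and the 90 day threshold are illustrative assumptions, not recommendations.

```python
def glacier_lifecycle_config(prefix, days_until_archive):
    """Build a lifecycle configuration that archives objects to Glacier.

    Objects whose key starts with `prefix` are transitioned to the
    GLACIER storage class once they are `days_until_archive` days old.
    """
    return {
        "Rules": [
            {
                "ID": "archive-to-glacier",  # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": days_until_archive, "StorageClass": "GLACIER"}
                ],
            }
        ]
    }


def apply_lifecycle(s3_client, bucket, config):
    """Attach the configuration to a bucket (s3_client: boto3 S3 client)."""
    s3_client.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration=config,
    )
```

For example, `apply_lifecycle(s3, "example-bucket", glacier_lifecycle_config("logs/", 90))` would archive everything under `logs/` after 90 days, assuming the bucket exists and the caller has permission to change its lifecycle settings.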