March 19, 2021
In this post we will see how to enable versioning on a bucket and then how to get a specific version of a file or object from S3 using the AWS CLI.
Enable versioning
Versioning can be enabled on a bucket during creation or after creation. If your bucket is already created, open the bucket's Properties in the S3 console and make sure versioning is enabled.
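Versioning can also be turned on and checked from the command line. The following is a minimal sketch, assuming credentials that are allowed to change bucket settings; the helper names enable_versioning and versioning_status are made up for illustration and are not part of the AWS CLI.

```shell
# Turn versioning on for the bucket named in $1 (hypothetical helper name).
enable_versioning() {
  aws s3api put-bucket-versioning \
    --bucket "$1" \
    --versioning-configuration Status=Enabled
}

# Print only the Status field of the bucket's versioning configuration.
versioning_status() {
  aws s3api get-bucket-versioning \
    --bucket "$1" \
    --query Status \
    --output text
}
```

For the bucket used in this post, we would run enable_versioning hirw-bucket-versions and then versioning_status hirw-bucket-versions to confirm the change.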
Get specific version from S3
We first upload a file named version-test with the following text: "Hello, this is VERSION 1"
[osboxes@wk1 ~]$ aws s3 cp version-test s3://hirw-bucket-versions
upload: ./version-test to s3://hirw-bucket-versions/version-test
Let's now execute s3api list-object-versions with the name of the bucket. The output shows the Key of the object, which is the name of the file, and its VersionId. This is version 1 of the object.
[osboxes@wk1 ~]$ aws s3api list-object-versions --bucket hirw-bucket-versions
{
    "Versions": [
        {
            "ETag": "\"43e9d964d061d355ea04efcd5fee0b5d\"",
            "Size": 25,
            "StorageClass": "STANDARD",
            "Key": "version-test",
            "VersionId": "w6l8ijFxRK5JtEiwfcjeYYn5IGe8oTKp",
            "IsLatest": true,
            "LastModified": "2020-11-18T19:39:49+00:00",
            "Owner": {
                "DisplayName": "xxx",
                "ID": "90e716b806d3a58e2950b72c380166e57a0388d557bdc5365dd305838342930d"
            }
        }
    ]
}
We now change the content of the file version-test to "Hello, this is VERSION TWO" and upload it again with the same aws s3 cp command.
When we now run list-object-versions, we see two versions of the object. You can tell which version is the latest by looking at the IsLatest property.
[osboxes@wk1 ~]$ aws s3api list-object-versions --bucket hirw-bucket-versions
{
    "Versions": [
        {
            "ETag": "\"04cf95a672923b5c021568765b336c4f\"",
            "Size": 27,
            "StorageClass": "STANDARD",
            "Key": "version-test",
            "VersionId": "8LKeiP26.7WS_CUYUmY_BNmpIX0ljIxa",
            "IsLatest": true,
            "LastModified": "2020-11-18T19:40:35+00:00",
            "Owner": {
                "DisplayName": "xxx",
                "ID": "90e716b806d3a58e2950b72c380166e57a0388d557bdc5365dd305838342930d"
            }
        },
        {
            "ETag": "\"43e9d964d061d355ea04efcd5fee0b5d\"",
            "Size": 25,
            "StorageClass": "STANDARD",
            "Key": "version-test",
            "VersionId": "w6l8ijFxRK5JtEiwfcjeYYn5IGe8oTKp",
            "IsLatest": false,
            "LastModified": "2020-11-18T19:40:35+00:00",
            "Owner": {
                "DisplayName": "xxx",
                "ID": "90e716b806d3a58e2950b72c380166e57a0388d557bdc5365dd305838342930d"
            }
        }
    ]
}
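When a bucket holds many objects, the full JSON listing gets long quickly. The sketch below uses the AWS CLI's --query option (a JMESPath expression) to print only the version id, the IsLatest flag, and the timestamp for one key. The helper name list_versions is made up for illustration, and the sketch assumes the key contains no single quotes, since the key is placed inside the query string.

```shell
# Print one line per version of a single key: VersionId, IsLatest, LastModified.
# Usage: list_versions BUCKET KEY   (hypothetical helper name)
list_versions() {
  aws s3api list-object-versions \
    --bucket "$1" \
    --prefix "$2" \
    --query "Versions[?Key=='$2'].[VersionId, IsLatest, LastModified]" \
    --output text
}
```

Calling list_versions hirw-bucket-versions version-test should then give one short line per version, which makes it easier to copy the version id we want. The --prefix option only narrows the listing on the server side; the Key filter in the query is what drops other keys that merely start with the same text.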
When we get the object from S3 without specifying a version, S3 returns the most recent version. Here we write the object to /dev/stdout so that its content is printed in the terminal ahead of the response metadata.
[osboxes@wk1 ~]$ aws s3api get-object --bucket hirw-bucket-versions --key version-test --range bytes=0-10000 /dev/stdout | head
Hello, this is VERSION TWO
{
    "AcceptRanges": "bytes",
    "LastModified": "2020-11-18T19:40:35+00:00",
    "ContentLength": 27,
    "ETag": "\"04cf95a672923b5c021568765b336c4f\"",
    "VersionId": "8LKeiP26.7WS_CUYUmY_BNmpIX0ljIxa",
    "ContentRange": "bytes 0-26/27",
    "ContentType": "binary/octet-stream",
    "Metadata": {}
To get an older version, we need to specify the version id of the version we want. So to get the first version we pass --version-id "w6l8ijFxRK5JtEiwfcjeYYn5IGe8oTKp"
[osboxes@wk1 ~]$ aws s3api get-object --bucket hirw-bucket-versions --key version-test --version-id "w6l8ijFxRK5JtEiwfcjeYYn5IGe8oTKp" /dev/stdout | head
Hello, this is VERSION 1
{
    "AcceptRanges": "bytes",
    "LastModified": "2020-11-18T19:39:49+00:00",
    "ContentLength": 25,
    "ETag": "\"43e9d964d061d355ea04efcd5fee0b5d\"",
    "VersionId": "w6l8ijFxRK5JtEiwfcjeYYn5IGe8oTKp",
    "ContentType": "binary/octet-stream",
    "Metadata": {}
}
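Printing to /dev/stdout is handy for a quick look, but in practice we usually want the older version saved as a file. A minimal sketch, where the helper name get_version and the local file name are assumptions for illustration:

```shell
# Download one specific version of an object into a local file.
# Usage: get_version BUCKET KEY VERSION_ID OUTFILE   (hypothetical helper name)
get_version() {
  aws s3api get-object \
    --bucket "$1" \
    --key "$2" \
    --version-id "$3" \
    "$4"
}
```

For example, get_version hirw-bucket-versions version-test "w6l8ijFxRK5JtEiwfcjeYYn5IGe8oTKp" version-test.v1 writes the first version to version-test.v1, while the JSON metadata is still printed to the terminal.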