How to get a specific version of a file from S3 using AWS CLI? - Big Data In Real World

In this post we are going to see how to enable versioning on a bucket and then how to get a specific version of a file (object) from S3 using the AWS CLI.

Enable versioning

Versioning can be enabled on a bucket either when it is created or afterwards. If your bucket already exists, open its Properties tab in the S3 console and make sure Bucket Versioning is enabled.
[Screenshot: aws-s3-enable-versioning-edit, editing the Bucket Versioning setting in the S3 console]
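Versioning can also be turned on from the AWS CLI instead of the console. The sketch below is a minimal example, assuming the CLI is configured with credentials that are allowed to change the bucket's versioning configuration. The function name enable_bucket_versioning is hypothetical; put-bucket-versioning and get-bucket-versioning are standard s3api subcommands.

```shell
# Minimal sketch (hypothetical helper): enable versioning from the CLI.
# Assumes configured AWS credentials with permission to change the bucket's
# versioning configuration.
# Usage: enable_bucket_versioning BUCKET
enable_bucket_versioning() {
  # Switch versioning on; stop here if the call fails.
  aws s3api put-bucket-versioning \
    --bucket "$1" \
    --versioning-configuration Status=Enabled || return
  # Read the setting back; prints the status text, e.g. Enabled.
  aws s3api get-bucket-versioning --bucket "$1" --query Status --output text
}

# Example:
# enable_bucket_versioning hirw-bucket-versions
```

Keep in mind that once versioning has been enabled on a bucket it can be suspended, but the bucket cannot return to the unversioned state.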

Get a specific version from S3

We first upload a file named version-test containing the text “Hello, this is VERSION 1”.

[osboxes@wk1 ~]$ aws s3 cp version-test s3://hirw-bucket-versions

upload: ./version-test to s3://hirw-bucket-versions/version-test

Let’s now run aws s3api list-object-versions with the name of the bucket. The output shows the object’s key (the name of the file/object) and its version ID. This is version 1 of the object.

[osboxes@wk1 ~]$ aws s3api list-object-versions --bucket hirw-bucket-versions

{
    "Versions": [
        {
            "ETag": "\"43e9d964d061d355ea04efcd5fee0b5d\"",
            "Size": 25,
            "StorageClass": "STANDARD",
            "Key": "version-test",
            "VersionId": "w6l8ijFxRK5JtEiwfcjeYYn5IGe8oTKp",
            "IsLatest": true,
            "LastModified": "2020-11-18T19:39:49+00:00",
            "Owner": {
                "DisplayName": "xxx",
                "ID": "90e716b806d3a58e2950b72c380166e57a0388d557bdc5365dd305838342930d"
            }
        }
    ]
}
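On a bucket with many objects the full listing gets long. A minimal sketch, assuming you only care about one key: the hypothetical function below passes --prefix to narrow the listing and a JMESPath --query to keep exact key matches (a prefix like version-test would also match a key such as version-test-2), printing one line per version with its version ID, IsLatest flag and last-modified time. It assumes the key contains no single quotes, because the key is spliced into the query string.

```shell
# Minimal sketch (hypothetical helper): list the versions of a single key.
# Assumes the key contains no single quotes (it is embedded in the query).
# Usage: list_key_versions BUCKET KEY
list_key_versions() {
  # --prefix narrows the listing; the filter keeps only exact key matches.
  aws s3api list-object-versions \
    --bucket "$1" \
    --prefix "$2" \
    --query "Versions[?Key=='$2'].[VersionId,IsLatest,LastModified]" \
    --output text
}

# Example:
# list_key_versions hirw-bucket-versions version-test
```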


We now change the content of the file version-test to “Hello, this is VERSION TWO” and upload it again to the same key.

Running list-object-versions again now shows two versions of the object. The IsLatest property tells you which one is the latest.

[osboxes@wk1 ~]$ aws s3api list-object-versions --bucket hirw-bucket-versions

{
    "Versions": [
        {
            "ETag": "\"04cf95a672923b5c021568765b336c4f\"",
            "Size": 27,
            "StorageClass": "STANDARD",
            "Key": "version-test",
            "VersionId": "8LKeiP26.7WS_CUYUmY_BNmpIX0ljIxa",
            "IsLatest": true,
            "LastModified": "2020-11-18T19:40:35+00:00",
            "Owner": {
                "DisplayName": "xxx",
                "ID": "90e716b806d3a58e2950b72c380166e57a0388d557bdc5365dd305838342930d"
            }
        },

        {
            "ETag": "\"43e9d964d061d355ea04efcd5fee0b5d\"",
            "Size": 25,
            "StorageClass": "STANDARD",
            "Key": "version-test",
            "VersionId": "w6l8ijFxRK5JtEiwfcjeYYn5IGe8oTKp",
            "IsLatest": false,
            "LastModified": "2020-11-18T19:39:49+00:00",
            "Owner": {
                "DisplayName": "xxx",
                "ID": "90e716b806d3a58e2950b72c380166e57a0388d557bdc5365dd305838342930d"
            }
        }
    ]
}
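Instead of reading IsLatest by eye, you can let the CLI filter on it. This is a minimal sketch with a hypothetical function name and the same no-single-quotes assumption as above: it prints the version IDs of every version of a key that is not the latest one. For the two-version listing above it should print w6l8ijFxRK5JtEiwfcjeYYn5IGe8oTKp, the ID of version 1.

```shell
# Minimal sketch (hypothetical helper): print the version IDs of all
# non-latest versions of a key (candidates for --version-id).
# Assumes the key contains no single quotes (it is embedded in the query).
# Usage: previous_version_ids BUCKET KEY
previous_version_ids() {
  # The backticks are escaped so the shell passes the JMESPath literal
  # `false` through unchanged instead of running command substitution.
  aws s3api list-object-versions \
    --bucket "$1" \
    --prefix "$2" \
    --query "Versions[?Key=='$2' && IsLatest==\`false\`].VersionId" \
    --output text
}

# Example:
# previous_version_ids hirw-bucket-versions version-test
```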


When we get the object without specifying a version, S3 returns the most recent version by default.

[osboxes@wk1 ~]$ aws s3api get-object --bucket hirw-bucket-versions --key version-test --range bytes=0-10000 /dev/stdout | head

Hello, this is VERSION TWO

{
    "AcceptRanges": "bytes",
    "LastModified": "2020-11-18T19:40:35+00:00",
    "ContentLength": 27,
    "ETag": "\"04cf95a672923b5c021568765b336c4f\"",
    "VersionId": "8LKeiP26.7WS_CUYUmY_BNmpIX0ljIxa",
    "ContentRange": "bytes 0-26/27",
    "ContentType": "binary/octet-stream",
    "Metadata": {}

To get an older version, we need to specify the version ID of the version we want. So to get the first version, we pass --version-id "w6l8ijFxRK5JtEiwfcjeYYn5IGe8oTKp".

[osboxes@wk1 ~]$ aws s3api get-object --bucket hirw-bucket-versions --key version-test --version-id "w6l8ijFxRK5JtEiwfcjeYYn5IGe8oTKp" /dev/stdout | head

Hello, this is VERSION 1

{
    "AcceptRanges": "bytes",
    "LastModified": "2020-11-18T19:39:49+00:00",
    "ContentLength": 25,
    "ETag": "\"43e9d964d061d355ea04efcd5fee0b5d\"",
    "VersionId": "w6l8ijFxRK5JtEiwfcjeYYn5IGe8oTKp",
    "ContentType": "binary/octet-stream",
    "Metadata": {}
}
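The same call can write the old version to a regular file instead of /dev/stdout. Below is a minimal sketch as a hypothetical shell function; the argument order and the output file name in the example are assumptions for illustration. Since s3api get-object takes the output file as its last positional argument, the function refuses to run unless all four arguments are given.

```shell
# Minimal sketch (hypothetical helper): save one specific version of an
# object to a local file.
# Usage: get_object_version BUCKET KEY VERSION_ID OUTFILE
get_object_version() {
  # get-object needs an output file, so require all four arguments
  # rather than guessing a default.
  if [ "$#" -ne 4 ]; then
    echo "usage: get_object_version BUCKET KEY VERSION_ID OUTFILE" >&2
    return 2
  fi
  # Without --version-id, S3 would return the latest version instead.
  aws s3api get-object \
    --bucket "$1" \
    --key "$2" \
    --version-id "$3" \
    "$4"
}

# Example (values from the listing above):
# get_object_version hirw-bucket-versions version-test \
#   w6l8ijFxRK5JtEiwfcjeYYn5IGe8oTKp version-test.v1
```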


Big Data In Real World
We are a group of Big Data engineers who are passionate about Big Data and related Big Data technologies. We have designed, developed, deployed and maintained Big Data applications ranging from batch to real time streaming big data platforms. We have seen a wide range of real world big data problems, implemented some innovative and complex (or simple, depending on how you look at it) solutions.
