mt.base.s3transfer

Abstractions over S3’s upload/download operations.

This module provides high-level abstractions for efficient uploads and downloads. It handles several things for the user:

  • Automatically switching to multipart transfers when a file is over a specific size threshold

  • Uploading/downloading a file in parallel

  • Progress callbacks to monitor transfers

  • Retries. While botocore handles retries for streaming uploads, it is not possible for it to handle retries for streaming downloads. This module handles retries for both cases so you don’t need to implement any retry logic yourself.
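The multipart switch in the first bullet can be pictured as a small planning step. Below the threshold, a file goes out in a single request. At or above it, the file is cut into fixed-size parts that can then be transferred in parallel. The sketch below is illustrative only: `plan_transfer` is a hypothetical helper, and the 8 MiB defaults and the `>=` boundary are assumptions, not this module's code.

```python
from typing import List, Tuple

MiB = 1024 * 1024


def plan_transfer(file_size: int,
                  multipart_threshold: int = 8 * MiB,
                  chunksize: int = 8 * MiB) -> List[Tuple[int, int]]:
    """Return (offset, length) pairs describing how a file would be sent.

    Hypothetical helper: a single pair means one plain request; several
    pairs mean a multipart transfer, one pair per part.
    """
    if file_size < multipart_threshold:
        # Small file: one request covers the whole thing.
        return [(0, file_size)]
    # Large file (assumed boundary: at or above the threshold): fixed-size
    # parts, where the last part may be shorter than chunksize.
    return [(offset, min(chunksize, file_size - offset))
            for offset in range(0, file_size, chunksize)]
```

With these defaults, `plan_transfer(20 * MiB)` yields three parts: two of 8 MiB and a final part of 4 MiB.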

This module has a reasonable set of defaults. It also allows you to configure many aspects of the transfer process including:

  • Multipart threshold size

  • Maximum number of concurrent transfer threads

  • Socket timeouts

  • Number of retry attempts
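The download retries described above matter because a download's body is read after the client call has returned. An error midway through the stream therefore surfaces outside botocore's own retry handling, and the request and the read have to be retried together. Below is a minimal sketch of such a loop. The `with_retries` helper is hypothetical, and treating socket timeouts and connection errors as the retryable cases is an assumption, not this module's actual exception list.

```python
import socket
from typing import Callable, TypeVar

T = TypeVar("T")

# Assumed set of retryable errors, for this sketch only.
RETRYABLE_ERRORS = (socket.timeout, ConnectionError)


def with_retries(operation: Callable[[], T], num_attempts: int = 5) -> T:
    """Run operation (request *and* body read) up to num_attempts times."""
    if num_attempts < 1:
        raise ValueError("num_attempts must be at least 1")
    last_error = None
    for _ in range(num_attempts):
        try:
            return operation()
        except RETRYABLE_ERRORS as exc:
            # This sketch simply starts the whole operation over.
            last_error = exc
    raise RuntimeError(
        "gave up after %d attempts" % num_attempts) from last_error
```

Errors outside `RETRYABLE_ERRORS` propagate immediately. Only transient network failures consume attempts.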

There is no support for s3->s3 multipart copies at this time.

Usage

The simplest way to use this module is:

import boto3
from mt.base.s3transfer import S3Transfer

client = boto3.client('s3', 'us-west-2')
transfer = S3Transfer(client)
# Upload /tmp/myfile to s3://bucket/key
transfer.upload_file('/tmp/myfile', 'bucket', 'key')

# Download s3://bucket/key to /tmp/myfile
transfer.download_file('bucket', 'key', '/tmp/myfile')
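download_file hides the parallelism from the caller. Conceptually, a parallel download fetches byte ranges concurrently and writes each one at its own offset in the destination file. In the sketch below, `fetch_range(offset, length)` is a hypothetical callable standing in for a ranged GetObject request, and the defaults are illustrative. It demonstrates the technique only and is not this module's implementation.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

MiB = 1024 * 1024


def parallel_download(fetch_range: Callable[[int, int], bytes], size: int,
                      dest: str, chunksize: int = 8 * MiB,
                      max_concurrency: int = 10) -> None:
    """Fill dest with size bytes, fetching chunks concurrently."""
    # Pre-size the file so every worker can write at its own offset.
    with open(dest, "wb") as f:
        f.truncate(size)

    def fetch_and_write(offset: int) -> None:
        data = fetch_range(offset, min(chunksize, size - offset))
        # Each worker uses its own file handle, so seeks never interfere.
        with open(dest, "r+b") as f:
            f.seek(offset)
            f.write(data)

    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        # list() drains the iterator so any worker exception is re-raised here.
        list(pool.map(fetch_and_write, range(0, size, chunksize)))


if __name__ == "__main__":
    # Stand-in "object": an in-memory payload sliced by offset and length.
    payload = os.urandom(100_000)
    fd, path = tempfile.mkstemp()
    os.close(fd)
    parallel_download(lambda off, n: payload[off:off + n], len(payload), path,
                      chunksize=4096, max_concurrency=4)
    with open(path, "rb") as f:
        print(f.read() == payload)
    os.remove(path)
```

Writing each chunk at its own offset means the chunks can finish in any order without a reassembly step.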

The upload_file and download_file methods also accept an optional extra_args dictionary, whose entries are forwarded through to the corresponding client operation. Here are a few examples using upload_file:

# Making the object public
transfer.upload_file('/tmp/myfile', 'bucket', 'key',
                     extra_args={'ACL': 'public-read'})

# Setting metadata
transfer.upload_file('/tmp/myfile', 'bucket', 'key',
                     extra_args={'Metadata': {'a': 'b', 'c': 'd'}})

# Setting content type
transfer.upload_file('/tmp/myfile.json', 'bucket', 'key',
                     extra_args={'ContentType': "application/json"})
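Conceptually, extra_args are merged into the parameters of the underlying client call alongside Bucket and Key. The sketch below illustrates that merge. `build_upload_params` is a hypothetical helper, and its allow-list is hypothetical and deliberately short. The module's own validation rules, if any, may differ.

```python
from typing import Any, Dict, Optional

# Hypothetical allow-list, for illustration only.
ALLOWED_UPLOAD_ARGS = {"ACL", "ContentType", "Metadata", "CacheControl"}


def build_upload_params(bucket: str, key: str,
                        extra_args: Optional[Dict[str, Any]] = None
                        ) -> Dict[str, Any]:
    """Merge extra_args into the keyword arguments of a client call."""
    extra_args = extra_args or {}
    unknown = set(extra_args) - ALLOWED_UPLOAD_ARGS
    if unknown:
        raise ValueError("Invalid extra_args key(s): %s" % sorted(unknown))
    # Bucket and Key come from the positional arguments; everything else
    # passes through unchanged (e.g. to client.put_object(**params)).
    return {"Bucket": bucket, "Key": key, **extra_args}
```

Rejecting unknown keys early surfaces a typo such as `'ContentTyp'` before any data is sent.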

The S3Transfer class also supports progress callbacks so you can provide transfer progress to users. Both the upload_file and download_file methods take an optional callback parameter. Here’s an example of how to print a simple progress percentage to the user:

import os
import sys
import threading

import boto3


class ProgressPercentage(object):
    def __init__(self, filename):
        self._filename = filename
        self._size = float(os.path.getsize(filename))
        self._seen_so_far = 0
        self._lock = threading.Lock()

    def __call__(self, bytes_amount):
        # To simplify we'll assume this is hooked up
        # to a single filename.
        with self._lock:
            self._seen_so_far += bytes_amount
            percentage = (self._seen_so_far / self._size) * 100
            # "\r" moves the cursor back to the line start so the
            # progress line is redrawn in place.
            sys.stdout.write(
                "\r%s  %s / %s  (%.2f%%)" % (
                    self._filename, self._seen_so_far, self._size,
                    percentage))
            sys.stdout.flush()


transfer = S3Transfer(boto3.client('s3', 'us-west-2'))
# Upload /tmp/myfile to s3://bucket/key and print upload progress.
transfer.upload_file('/tmp/myfile', 'bucket', 'key',
                     callback=ProgressPercentage('/tmp/myfile'))
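ProgressPercentage reads the total size from the local file. That works for uploads, but not for downloads, where the local file does not exist yet. The variant below takes the total size up front instead. The class name `SizedProgress` is hypothetical, and the class exposes the percentage rather than printing it, so the result can be checked.

```python
import threading


class SizedProgress:
    """Thread-safe progress callback with an explicitly supplied total size."""

    def __init__(self, label: str, total_size: int):
        self._label = label
        self._size = float(total_size)
        self._seen_so_far = 0
        self._lock = threading.Lock()

    def __call__(self, bytes_amount: int) -> None:
        # Callbacks may arrive from several worker threads at once.
        with self._lock:
            self._seen_so_far += bytes_amount

    @property
    def percentage(self) -> float:
        with self._lock:
            if self._size == 0:
                return 100.0
            return self._seen_so_far / self._size * 100
```

For a download, the total could come from the ContentLength field returned by client.head_object(Bucket='bucket', Key='key'). The instance would then be passed as callback= to transfer.download_file.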

You can also pass a TransferConfig object to S3Transfer for more fine-grained control over the transfer. For example:

client = boto3.client('s3', 'us-west-2')
config = TransferConfig(
    multipart_threshold=8 * 1024 * 1024,
    max_concurrency=10,
    num_download_attempts=10,
)
transfer = S3Transfer(client, config)
transfer.upload_file('/tmp/foo', 'bucket', 'key')
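multipart_threshold decides when an upload switches to multipart. The size of each part must also respect Amazon S3's multipart upload limits: at most 10,000 parts, and each part between 5 MiB and 5 GiB, except that the last part may be smaller. The sketch below clamps a requested part size into those limits and doubles it until the file fits. `adjust_chunksize` is a hypothetical helper, and the doubling strategy is one plausible approach, not necessarily what this module does.

```python
MiB = 1024 * 1024
GiB = 1024 * MiB

# Amazon S3 multipart upload limits.
MAX_PARTS = 10_000
MIN_PART_SIZE = 5 * MiB
MAX_PART_SIZE = 5 * GiB


def adjust_chunksize(requested: int, file_size: int) -> int:
    """Hypothetical helper: clamp a part size into S3's limits and double it
    until the file fits in at most MAX_PARTS parts."""
    chunksize = min(max(requested, MIN_PART_SIZE), MAX_PART_SIZE)
    # Integer ceiling division avoids float rounding on very large files.
    while ((file_size + chunksize - 1) // chunksize > MAX_PARTS
           and chunksize * 2 <= MAX_PART_SIZE):
        chunksize *= 2
    return chunksize
```

With a 1 TiB file, an 8 MiB part size would need 131,072 parts, so the sketch doubles it to 128 MiB (8,192 parts).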