mt.base.s3transfer
Abstractions over S3’s upload/download operations.
This module provides high-level abstractions for efficient uploads/downloads. It handles several things for the user:
Automatically switching to multipart transfers when a file is over a specific size threshold
Uploading/downloading a file in parallel
Progress callbacks to monitor transfers
Retries. While botocore handles retries for streaming uploads, it is not possible for it to handle retries for streaming downloads. This module handles retries for both cases so you don’t need to implement any retry logic yourself.
This module has a reasonable set of defaults. It also allows you to configure many aspects of the transfer process including:
Multipart threshold size
Max parallel downloads
Socket timeouts
Retry amounts
There is no support for s3->s3 multipart copies at this time.
Usage
The simplest way to use this module is:
client = boto3.client('s3', 'us-west-2')
transfer = S3Transfer(client)
# Upload /tmp/myfile to s3://bucket/key
transfer.upload_file('/tmp/myfile', 'bucket', 'key')
# Download s3://bucket/key to /tmp/myfile
transfer.download_file('bucket', 'key', '/tmp/myfile')
The upload_file and download_file methods also accept an extra_args dictionary, whose entries are forwarded through to the corresponding client operation. Here are a few examples using upload_file:
# Making the object public
transfer.upload_file('/tmp/myfile', 'bucket', 'key',
                     extra_args={'ACL': 'public-read'})

# Setting metadata
transfer.upload_file('/tmp/myfile', 'bucket', 'key',
                     extra_args={'Metadata': {'a': 'b', 'c': 'd'}})

# Setting content type
transfer.upload_file('/tmp/myfile.json', 'bucket', 'key',
                     extra_args={'ContentType': "application/json"})
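The extra_args dictionary works the same way with download_file. Below is a small sketch, assuming this wrapper forwards extra_args to the underlying GetObject call the way s3transfer does; the VersionId value is only a placeholder:

# Downloading a specific version of an object (placeholder VersionId)
transfer.download_file('bucket', 'key', '/tmp/myfile',
                       extra_args={'VersionId': 'my-object-version-id'})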
The S3Transfer class also supports progress callbacks so you can provide transfer progress to users. Both the upload_file and download_file methods take an optional callback parameter. Here's an example of how to print a simple progress percentage to the user:
import os
import sys
import threading

class ProgressPercentage(object):
    def __init__(self, filename):
        self._filename = filename
        self._size = float(os.path.getsize(filename))
        self._seen_so_far = 0
        self._lock = threading.Lock()

    def __call__(self, bytes_amount):
        # To simplify we'll assume this is hooked up
        # to a single filename.
        with self._lock:
            self._seen_so_far += bytes_amount
            percentage = (self._seen_so_far / self._size) * 100
            sys.stdout.write(
                "\r%s  %s / %s  (%.2f%%)" % (
                    self._filename, self._seen_so_far, self._size, percentage))
            sys.stdout.flush()
transfer = S3Transfer(boto3.client('s3', 'us-west-2'))
# Upload /tmp/myfile to s3://bucket/key and print upload progress.
transfer.upload_file('/tmp/myfile', 'bucket', 'key',
                     callback=ProgressPercentage('/tmp/myfile'))
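The same callback mechanism works for download_file. Because the local file does not exist yet, the total size has to come from S3 rather than os.path.getsize. The sketch below (the DownloadProgressPercentage class is illustrative, not part of this module, and reuses the imports above) obtains the size with the client's head_object operation:

class DownloadProgressPercentage(object):
    def __init__(self, client, bucket, key):
        # Ask S3 for the object's size up front.
        self._size = float(
            client.head_object(Bucket=bucket, Key=key)['ContentLength'])
        self._seen_so_far = 0
        self._lock = threading.Lock()

    def __call__(self, bytes_amount):
        with self._lock:
            self._seen_so_far += bytes_amount
            percentage = (self._seen_so_far / self._size) * 100
            sys.stdout.write("\r%.2f%% downloaded" % percentage)
            sys.stdout.flush()

client = boto3.client('s3', 'us-west-2')
transfer = S3Transfer(client)
# Download s3://bucket/key to /tmp/myfile and print download progress.
transfer.download_file('bucket', 'key', '/tmp/myfile',
                       callback=DownloadProgressPercentage(client, 'bucket', 'key'))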
You can also provide a TransferConfig object to the S3Transfer object, which gives you more fine-grained control over the transfer. For example:
client = boto3.client('s3', 'us-west-2')
config = TransferConfig(
    multipart_threshold=8 * 1024 * 1024,
    max_concurrency=10,
    num_download_attempts=10,
)
transfer = S3Transfer(client, config)
transfer.upload_file('/tmp/foo', 'bucket', 'key')
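TransferConfig does not carry the socket timeouts mentioned earlier. If this wrapper follows the standard boto3/botocore setup, those timeouts are configured on the client itself through botocore's Config; a minimal sketch under that assumption (connect_timeout and read_timeout are in seconds):

from botocore.config import Config

client = boto3.client('s3', 'us-west-2',
                      config=Config(connect_timeout=5, read_timeout=60))
transfer = S3Transfer(client, TransferConfig(num_download_attempts=10))
transfer.download_file('bucket', 'key', '/tmp/myfile')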