Upload Files to S3 Using Python

Amazon S3 (Simple Storage Service) is a scalable object storage service provided by AWS. In this tutorial, we’ll learn how to upload files to S3 using Python. We’ll cover various methods and provide code examples along with descriptions.

Prerequisites

Before we begin, make sure you have the following:

  • Python installed on your system.
  • An AWS account and S3 bucket where you will upload the files.
  • AWS Access Key ID and Secret Access Key (see the note on credentials after this list).

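Hard-coding access keys as in the examples below is fine for a quick test, but Boto3 can also read credentials from environment variables or from the shared credentials file at ~/.aws/credentials (for example, after running aws configure). A minimal sketch, assuming the standard AWS environment variables are already set:

import boto3

# With no keys passed explicitly, Boto3 looks up AWS_ACCESS_KEY_ID and
# AWS_SECRET_ACCESS_KEY from the environment (or ~/.aws/credentials),
# so no secrets need to appear in the source code.
s3 = boto3.client('s3')
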
Methods of Uploading Files to S3

Boto3 Setup

To work with S3 in Python, we’ll use the Boto3 library, which is the Amazon Web Services (AWS) SDK for Python.

pip install boto3
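
After installing, you can confirm that the library imports correctly:

import boto3

# Prints the installed Boto3 version
print(boto3.__version__)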

Uploading a Single File

To upload a single file to an S3 bucket, you can use the following example:

import boto3

# Define your AWS credentials
aws_access_key_id = 'YOUR_ACCESS_KEY_ID'
aws_secret_access_key = 'YOUR_SECRET_ACCESS_KEY'

# Define the name of the S3 bucket and the file you want to upload
bucket_name = 'your_bucket_name'
file_name = 'path/to/your/file.txt'

# Create a Boto3 S3 client
s3 = boto3.client('s3',
                  aws_access_key_id=aws_access_key_id,
                  aws_secret_access_key=aws_secret_access_key)

# Upload the file to S3; here the local path is reused as the object key
s3.upload_file(file_name, bucket_name, file_name)

print(f'{file_name} uploaded successfully to {bucket_name}.')

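upload_file raises an exception if something goes wrong, such as a missing local file, a nonexistent bucket, or bad credentials, so production code usually wraps the call. A minimal sketch, reusing the s3 client, bucket_name, and file_name from above:

from boto3.exceptions import S3UploadFailedError
from botocore.exceptions import ClientError

try:
    # Attempt the upload with the same client and arguments as before
    s3.upload_file(file_name, bucket_name, file_name)
    print(f'{file_name} uploaded successfully to {bucket_name}.')
except FileNotFoundError:
    # The local file does not exist
    print(f'Local file not found: {file_name}')
except (S3UploadFailedError, ClientError) as error:
    # AWS-side failures such as a missing bucket or invalid credentials
    print(f'Upload failed: {error}')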
 

Uploading Multiple Files

For uploading multiple files, you can iterate through a directory and upload each file:

import boto3
import os

# Define your AWS credentials
aws_access_key_id = 'YOUR_ACCESS_KEY_ID'
aws_secret_access_key = 'YOUR_SECRET_ACCESS_KEY'

# Define the name of the S3 bucket
bucket_name = 'your_bucket_name'

# Define the directory containing the files you want to upload
directory_path = '/path/to/your/directory'

# Create a Boto3 S3 client
s3 = boto3.client('s3',
                  aws_access_key_id=aws_access_key_id,
                  aws_secret_access_key=aws_secret_access_key)

# Iterate over the files in the directory
for root, dirs, files in os.walk(directory_path):
    for file_name in files:
        # Construct the full local path to the file
        local_file_path = os.path.join(root, file_name)
        
        # Construct the S3 key (object key) from the path relative to the directory,
        # using forward slashes so the key is the same on every operating system
        s3_key = os.path.relpath(local_file_path, directory_path).replace(os.sep, '/')
        
        # Upload the file to S3
        s3.upload_file(local_file_path, bucket_name, s3_key)
        
        print(f'{local_file_path} uploaded successfully to {bucket_name} with key {s3_key}.')

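Uploading one file at a time means each call waits on the network before the next one starts. If the directory contains many files, one common option is to run the uploads on a thread pool (Boto3 clients are safe to share across threads). A minimal sketch, reusing the s3 client, bucket_name, and directory_path from above:

import os
from concurrent.futures import ThreadPoolExecutor

def upload_one(local_file_path):
    # Build the object key from the path relative to the directory, as above
    s3_key = os.path.relpath(local_file_path, directory_path).replace(os.sep, '/')
    s3.upload_file(local_file_path, bucket_name, s3_key)
    return s3_key

# Collect every file under the directory
all_files = [os.path.join(root, name)
             for root, _, files in os.walk(directory_path)
             for name in files]

# Upload up to 8 files concurrently
with ThreadPoolExecutor(max_workers=8) as executor:
    for s3_key in executor.map(upload_one, all_files):
        print(f'Uploaded {s3_key} to {bucket_name}.')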
 

Uploading Large Files

The following example uses Amazon S3’s multipart upload capability, which is designed for handling large files efficiently. By breaking a large file into smaller parts and uploading those parts in parallel, Boto3 improves reliability and throughput compared with a single, monolithic upload.

import boto3
import os
from boto3.s3.transfer import TransferConfig

# Define your AWS credentials
aws_access_key_id = 'YOUR_ACCESS_KEY_ID'
aws_secret_access_key = 'YOUR_SECRET_ACCESS_KEY'

# Define the name of the S3 bucket
bucket_name = 'your_bucket_name'

# Define the path to the large file you want to upload
file_path = '/path/to/your/large_file'

# Create a Boto3 S3 client
s3 = boto3.client('s3',
                  aws_access_key_id=aws_access_key_id,
                  aws_secret_access_key=aws_secret_access_key)

# Define TransferConfig with your desired settings
transfer_config = TransferConfig(
    multipart_threshold=8 * 1024 * 1024,  # switch to a multipart upload above 8 MB
    multipart_chunksize=8 * 1024 * 1024,  # upload the file in 8 MB parts
)

# Open the file in binary read mode
with open(file_path, 'rb') as f:
    # Upload the file to S3 with TransferConfig
    s3.upload_fileobj(
        Fileobj=f,
        Bucket=bucket_name,
        Key=os.path.basename(file_path),  # Use the base name of the file as the key
        Config=transfer_config
    )

print(f'{file_path} uploaded successfully to {bucket_name} using TransferConfig.')

A TransferConfig object specifies the multipart upload settings: the threshold at which Boto3 switches to a multipart upload and the size of each part. The file is then opened in binary read mode and passed to upload_fileobj, which uploads the file object to the S3 bucket using the configured transfer settings. This approach keeps uploads of large files efficient and resilient.

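For very large files it can also help to watch the upload progress. upload_fileobj accepts an optional Callback, which Boto3 calls with the number of bytes transferred as each chunk completes. A minimal sketch building on the example above (ProgressPercentage is just an illustrative name, not part of Boto3):

import os
import threading

class ProgressPercentage:
    def __init__(self, filename):
        self._filename = filename
        self._size = os.path.getsize(filename)
        self._seen_so_far = 0
        self._lock = threading.Lock()  # callbacks may arrive from multiple threads

    def __call__(self, bytes_amount):
        with self._lock:
            self._seen_so_far += bytes_amount
            percentage = (self._seen_so_far / self._size) * 100
            print(f'{self._filename}: {self._seen_so_far}/{self._size} bytes ({percentage:.1f}%)')

with open(file_path, 'rb') as f:
    s3.upload_fileobj(
        Fileobj=f,
        Bucket=bucket_name,
        Key=os.path.basename(file_path),
        Config=transfer_config,
        Callback=ProgressPercentage(file_path),
    )
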
Summary

We have explored different ways to upload files to S3 using Python. We started with setting up Boto3, proceeded to upload a single file, then multiple files, and finally looked at how to handle large files using multipart uploads. These methods can be integrated into your Python applications to effectively store your files in the cloud with Amazon S3.