Glacier

boto.glacier

boto.glacier.connect_to_region(region_name, **kw_params)
boto.glacier.regions()

Get all available regions for the Amazon Glacier service.

Return type:list
Returns:A list of boto.regioninfo.RegionInfo

boto.glacier.layer1

class boto.glacier.layer1.Layer1(aws_access_key_id=None, aws_secret_access_key=None, account_id='-', is_secure=True, port=None, proxy=None, proxy_port=None, proxy_user=None, proxy_pass=None, debug=0, https_connection_factory=None, path='/', provider='aws', security_token=None, suppress_consec_slashes=True, region=None, region_name='us-east-1')
Version = '2012-06-01'

Glacier API version.

abort_multipart_upload(vault_name, upload_id)

Call this to abort a multipart upload identified by the upload ID.

Parameters:
  • vault_name (str) – The name of the vault.
  • upload_id (str) – The unique ID associated with this upload operation.
complete_multipart_upload(vault_name, upload_id, sha256_treehash, archive_size)

Call this to inform Amazon Glacier that all of the archive parts have been uploaded and Amazon Glacier can now assemble the archive from the uploaded parts.

Parameters:
  • vault_name (str) – The name of the vault.
  • upload_id (str) – The unique ID associated with this upload operation.
  • sha256_treehash (str) – The SHA256 tree hash of the entire archive. It is the tree hash of the SHA256 tree hashes of the individual parts. If the value you specify in the request does not match the SHA256 tree hash of the final assembled archive as computed by Amazon Glacier, Amazon Glacier returns an error and the request fails.
  • archive_size (int) – The total size, in bytes, of the entire archive. This value should be the sum of all the sizes of the individual parts that you uploaded.
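The tree hash required here can be computed locally before calling complete_multipart_upload. A minimal sketch of the construction (hash each 1 MiB chunk with SHA256, then combine digests pairwise up to a single root; the helper names are illustrative, not part of boto):

```python
import hashlib

MEGABYTE = 1024 * 1024

def chunk_hashes(data):
    """SHA256 digest of each 1 MiB chunk of the payload."""
    if not data:
        return [hashlib.sha256(b"").digest()]
    return [hashlib.sha256(data[i:i + MEGABYTE]).digest()
            for i in range(0, len(data), MEGABYTE)]

def tree_hash(digests):
    """Combine leaf digests pairwise until a single root digest remains."""
    while len(digests) > 1:
        paired = []
        for i in range(0, len(digests) - 1, 2):
            paired.append(hashlib.sha256(digests[i] + digests[i + 1]).digest())
        if len(digests) % 2:  # an odd leftover leaf is carried up unchanged
            paired.append(digests[-1])
        digests = paired
    return digests[0]

data = b"x" * (3 * MEGABYTE)  # a 3 MiB payload
sha256_treehash = tree_hash(chunk_hashes(data)).hex()
```

Passing the resulting hex digest as sha256_treehash lets Amazon Glacier verify the assembled archive against your local computation.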
create_vault(vault_name)

This operation creates a new vault with the specified name. The name of the vault must be unique within a region for an AWS account. You can create up to 1,000 vaults per account. For information on creating more vaults, go to the Amazon Glacier product detail page.

You must use the following guidelines when naming a vault.

Names can be between 1 and 255 characters long.

Allowed characters are a–z, A–Z, 0–9, ‘_’ (underscore), ‘-’ (hyphen), and ‘.’ (period).

This operation is idempotent; you can send the same request multiple times, and it has no further effect after the first time Amazon Glacier creates the specified vault.

Parameters:vault_name (str) – The name of the new vault
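The naming guidelines above translate directly into a validation check. A hypothetical helper (not part of boto) that a caller might run before create_vault:

```python
import re

# 1-255 characters drawn from a-z, A-Z, 0-9, '_', '-', and '.'
VAULT_NAME_RE = re.compile(r'^[a-zA-Z0-9_.\-]{1,255}$')

def is_valid_vault_name(name):
    """Check a candidate vault name against the documented guidelines."""
    return bool(VAULT_NAME_RE.match(name))
```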
delete_archive(vault_name, archive_id)

This operation deletes an archive from a vault.

Parameters:
  • vault_name (str) – The name of the vault.
  • archive_id (str) – The ID for the archive to be deleted.
delete_vault(vault_name)

This operation deletes a vault. Amazon Glacier will delete a vault only if there are no archives in the vault as per the last inventory and there have been no writes to the vault since the last inventory. If either of these conditions is not satisfied, the vault deletion fails (that is, the vault is not removed) and Amazon Glacier returns an error.

This operation is idempotent; you can send the same request multiple times, and it has no further effect after the first time Amazon Glacier deletes the specified vault.

Parameters:vault_name (str) – The name of the vault to delete.
delete_vault_notifications(vault_name)

This operation deletes the notification-configuration subresource set on the vault.

Parameters:vault_name (str) – The name of the vault.
describe_job(vault_name, job_id)

This operation returns information about a job you previously initiated, including the job initiation date, the user who initiated the job, the job status code/message and the Amazon Simple Notification Service (Amazon SNS) topic to notify after Amazon Glacier completes the job.

Parameters:
  • vault_name (str) – The name of the vault.
  • job_id (str) – The ID of the job.
describe_vault(vault_name)

This operation returns information about a vault, including the vault Amazon Resource Name (ARN), the date the vault was created, the number of archives contained within the vault, and the total size of all the archives in the vault. The number of archives and their total size are as of the last vault inventory Amazon Glacier generated. Amazon Glacier generates vault inventories approximately daily. This means that if you add or remove an archive from a vault, and then immediately send a Describe Vault request, the response might not reflect the changes.

Parameters:vault_name (str) – The name of the vault.
get_job_output(vault_name, job_id, byte_range=None)

This operation downloads the output of the job you initiated using Initiate a Job. Depending on the job type you specified when you initiated the job, the output will be either the content of an archive or a vault inventory.

You can download all the job output or download a portion of the output by specifying a byte range. In the case of an archive retrieval job, depending on the byte range you specify, Amazon Glacier returns the checksum for the portion of the data. You can compute the checksum on the client and verify that the values match to ensure the portion you downloaded is the correct data.

Parameters:
  • vault_name (str) – The name of the vault.
  • job_id (str) – The ID of the job.
  • byte_range – A tuple of integers specifying the slice (in bytes) of the archive you want to receive.
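The byte_range tuple is inclusive on both ends; boto sends it to the service as an HTTP Range header. A sketch of the conversion (the helper name is illustrative, not part of boto):

```python
def range_header(byte_range):
    """Format an inclusive (start, end) byte tuple as a Range header value."""
    start, end = byte_range
    return 'bytes=%d-%d' % (start, end)

# First megabyte of the job output:
header = range_header((0, 1048575))
```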
get_vault_notifications(vault_name)

This operation retrieves the notification-configuration subresource set on the vault.

Parameters:vault_name (str) – The name of the vault.
initiate_job(vault_name, job_data)

This operation initiates a job of the specified type. Retrieving an archive or a vault inventory are asynchronous operations that require you to initiate a job. It is a two-step process:

  • Initiate a retrieval job.
  • After the job completes, download the bytes.

The retrieval is executed asynchronously. When you initiate a retrieval job, Amazon Glacier creates a job and returns a job ID in the response.

Parameters:
  • vault_name (str) – The name of the vault.
  • job_data (dict) –

    A Python dictionary containing the information about the requested job. The dictionary can contain the following attributes:

    • ArchiveId - The ID of the archive you want to retrieve. This field is required only if the Type is set to archive-retrieval.
    • Description - The optional description for the job.
    • Format - When initiating a job to retrieve a vault inventory, you can optionally add this parameter to specify the output format. Valid values are: CSV|JSON.
    • SNSTopic - The Amazon SNS topic ARN where Amazon Glacier sends a notification when the job is completed and the output is ready for you to download.
    • Type - The job type. Valid values are: archive-retrieval|inventory-retrieval
    • RetrievalByteRange - Optionally specify the range of bytes to retrieve.
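Putting those attributes together, job_data dictionaries for the two job types might look like this (the topic ARN and archive ID are placeholders, not real identifiers):

```python
# An archive-retrieval job description:
archive_job = {
    'Type': 'archive-retrieval',
    'ArchiveId': 'EXAMPLE-ARCHIVE-ID',  # placeholder; use a real archive ID
    'Description': 'Restore quarterly backup',
    'SNSTopic': 'arn:aws:sns:us-east-1:123456789012:glacier-jobs',  # placeholder ARN
}

# An inventory-retrieval job description:
inventory_job = {
    'Type': 'inventory-retrieval',
    'Format': 'JSON',
}
```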
initiate_multipart_upload(vault_name, part_size, description=None)

Initiate a multipart upload. Amazon Glacier creates a multipart upload resource and returns its ID. You use this ID in subsequent multipart upload operations.

Parameters:
  • vault_name (str) – The name of the vault.
  • description (str) – An optional description of the archive.
  • part_size (int) – The size of each part except the last, in bytes. The part size must be a megabyte (1024 KB) multiplied by a power of 2. The minimum allowable part size is 1 MB and the maximum is 4 GB.
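The part-size rule can be checked locally before initiating the upload. A hypothetical validator (not part of boto):

```python
MEGABYTE = 1024 * 1024

def is_valid_part_size(part_size):
    """True if part_size is 1 MB times a power of two, between 1 MB and 4 GB."""
    if part_size < MEGABYTE or part_size > 4096 * MEGABYTE:
        return False
    if part_size % MEGABYTE != 0:
        return False
    multiple = part_size // MEGABYTE
    # a power of two has exactly one bit set
    return multiple & (multiple - 1) == 0
```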
list_jobs(vault_name, completed=None, status_code=None, limit=None, marker=None)

This operation lists jobs for a vault including jobs that are in-progress and jobs that have recently finished.

Parameters:
  • vault_name (str) – The name of the vault.
  • completed (boolean) – Specifies the state of the jobs to return. If a value of True is passed, only completed jobs will be returned. If a value of False is passed, only uncompleted jobs will be returned. If no value is passed, all jobs will be returned.
  • status_code (string) – Specifies the type of job status to return. Valid values are: InProgress|Succeeded|Failed. If not specified, jobs with all status codes are returned.
  • limit (int) – The maximum number of items returned in the response. If you don’t specify a value, the List Jobs operation returns up to 1,000 items.
  • marker (str) – An opaque string used for pagination. marker specifies the job at which the listing of jobs should begin. Get the marker value from a previous List Jobs response. You need only include the marker if you are continuing the pagination of results started in a previous List Jobs request.
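The marker parameter supports the usual follow-the-marker pagination loop. A sketch of the pattern, exercised against a stand-in client so it runs without AWS credentials (it assumes the response dict carries JobList and Marker keys, as the List Jobs response body does):

```python
def list_all_jobs(layer1, vault_name):
    """Follow pagination markers until the service reports no more pages."""
    jobs, marker = [], None
    while True:
        response = layer1.list_jobs(vault_name, marker=marker)
        jobs.extend(response['JobList'])
        marker = response.get('Marker')
        if not marker:
            break
    return jobs

# A stand-in client so the pattern can be exercised locally:
class FakeLayer1:
    pages = {None: (['job-1', 'job-2'], 'page2'), 'page2': (['job-3'], None)}
    def list_jobs(self, vault_name, marker=None):
        job_list, next_marker = self.pages[marker]
        return {'JobList': job_list, 'Marker': next_marker}

all_jobs = list_all_jobs(FakeLayer1(), 'examplevault')
```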
list_multipart_uploads(vault_name, limit=None, marker=None)

Lists in-progress multipart uploads for the specified vault.

Parameters:
  • vault_name (str) – The name of the vault.
  • limit (int) – The maximum number of items returned in the response. If you don’t specify a value, the operation returns up to 1,000 items.
  • marker (str) – An opaque string used for pagination. marker specifies the item at which the listing should begin. Get the marker value from a previous response. You need only include the marker if you are continuing the pagination of results started in a previous request.
list_parts(vault_name, upload_id, limit=None, marker=None)

Lists the parts of an in-progress multipart upload for the specified vault.

Parameters:
  • vault_name (str) – The name of the vault.
  • upload_id (str) – The unique ID associated with this upload operation.
  • limit (int) – The maximum number of items returned in the response. If you don’t specify a value, the operation returns up to 1,000 items.
  • marker (str) – An opaque string used for pagination. marker specifies the item at which the listing should begin. Get the marker value from a previous response. You need only include the marker if you are continuing the pagination of results started in a previous request.
list_vaults(limit=None, marker=None)

This operation lists all vaults owned by the calling user’s account. The list returned in the response is ASCII-sorted by vault name.

By default, this operation returns up to 1,000 items. If there are more vaults to list, the marker field in the response body contains the vault Amazon Resource Name (ARN) at which to continue the list with a new List Vaults request; otherwise, the marker field is null. In your next List Vaults request you set the marker parameter to the value Amazon Glacier returned in the responses to your previous List Vaults request. You can also limit the number of vaults returned in the response by specifying the limit parameter in the request.

Parameters:
  • limit (int) – The maximum number of items returned in the response. If you don’t specify a value, the List Vaults operation returns up to 1,000 items.
  • marker (str) – A string used for pagination. marker specifies the vault ARN after which the listing of vaults should begin. (The vault specified by marker is not included in the returned list.) Get the marker value from a previous List Vaults response. You need to include the marker only if you are continuing the pagination of results started in a previous List Vaults request. Specifying an empty value (“”) for the marker returns a list of vaults starting from the first vault.
make_request(verb, resource, headers=None, data='', ok_responses=(200, ), params=None, sender=None, response_headers=None)
set_vault_notifications(vault_name, notification_config)

This operation sets the notification-configuration subresource on the vault.

Parameters:
  • vault_name (str) – The name of the vault.
  • notification_config (dict) –

    A Python dictionary containing an SNS Topic and events for which you want Amazon Glacier to send notifications to the topic. Possible events are:

    • ArchiveRetrievalCompleted - occurs when a job that was initiated for an archive retrieval is completed.
    • InventoryRetrievalCompleted - occurs when a job that was initiated for an inventory retrieval is completed.

    The format of the dictionary is:

    {'SNSTopic': 'mytopic',
     'Events': [event1, ...]}
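A concrete notification_config subscribing to both supported events might look like this (the topic ARN is a placeholder):

```python
notification_config = {
    'SNSTopic': 'arn:aws:sns:us-east-1:123456789012:glacier-notify',  # placeholder ARN
    'Events': ['ArchiveRetrievalCompleted', 'InventoryRetrievalCompleted'],
}
```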
upload_archive(vault_name, archive, linear_hash, tree_hash, description=None)

This operation adds an archive to a vault. For a successful upload, your data is durably persisted. In response, Amazon Glacier returns the archive ID in the x-amz-archive-id header of the response. You should save the archive ID returned so that you can access the archive later.

Parameters:
  • vault_name (str) – The name of the vault.
  • archive (bytes) – The data to upload.
  • linear_hash (str) – The SHA256 checksum (a linear hash) of the payload.
  • tree_hash (str) – The user-computed SHA256 tree hash of the payload. For more information on computing the tree hash, see http://goo.gl/u7chF.
  • description (str) – An optional description of the archive.
upload_part(vault_name, upload_id, linear_hash, tree_hash, byte_range, part_data)

Uploads a part of an archive in a multipart upload. The part is identified by the range of bytes it occupies in the final assembled archive.

Parameters:
  • vault_name (str) – The name of the vault.
  • linear_hash (str) – The SHA256 checksum (a linear hash) of the payload.
  • tree_hash (str) – The user-computed SHA256 tree hash of the payload. For more information on computing the tree hash, see http://goo.gl/u7chF.
  • upload_id (str) – The unique ID associated with this upload operation.
  • byte_range (tuple of ints) – Identifies the range of bytes in the assembled archive that will be uploaded in this part.
  • part_data (bytes) – The data to be uploaded for the part
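The byte_range values for consecutive parts follow from the archive size and part size. A hypothetical helper (not part of boto) that yields the inclusive ranges:

```python
def part_ranges(archive_size, part_size):
    """Inclusive (first_byte, last_byte) tuples for each part of the archive."""
    return [(start, min(start + part_size, archive_size) - 1)
            for start in range(0, archive_size, part_size)]

# A 10 MB archive uploaded in 4 MB parts (the last part is shorter):
ranges = part_ranges(archive_size=10 * 1024 * 1024, part_size=4 * 1024 * 1024)
```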

boto.glacier.layer2

class boto.glacier.layer2.Layer2(*args, **kwargs)

Provides a more Pythonic and friendly interface to Glacier, built on Layer1.

create_vault(name)

Creates a vault.

Parameters:name (str) – The name of the vault
Return type:boto.glacier.vault.Vault
Returns:A Vault object representing the vault.
delete_vault(name)

Delete a vault.

This operation deletes a vault. Amazon Glacier will delete a vault only if there are no archives in the vault as per the last inventory and there have been no writes to the vault since the last inventory. If either of these conditions is not satisfied, the vault deletion fails (that is, the vault is not removed) and Amazon Glacier returns an error.

This operation is idempotent; you can send the same request multiple times, and it has no further effect after the first time Amazon Glacier deletes the specified vault.

Parameters:vault_name (str) – The name of the vault to delete.
get_vault(name)

Get an object representing a named vault from Glacier. This operation does not check if the vault actually exists.

Parameters:name (str) – The name of the vault
Return type:boto.glacier.vault.Vault
Returns:A Vault object representing the vault.
list_vaults()

Return a list of all vaults associated with the account ID.

Return type:List of boto.glacier.vault.Vault
Returns:A list of Vault objects.

boto.glacier.vault

class boto.glacier.vault.Vault(layer1, response_data=None)
DefaultPartSize = 4194304
ResponseDataElements = (('VaultName', 'name', None), ('VaultARN', 'arn', None), ('CreationDate', 'creation_date', None), ('LastInventoryDate', 'last_inventory_date', None), ('SizeInBytes', 'size', 0), ('NumberOfArchives', 'number_of_archives', 0))
SingleOperationThreshold = 104857600
concurrent_create_archive_from_file(filename, description, **kwargs)

Create a new archive from a file and upload the given file.

This is a convenience method around the boto.glacier.concurrent.ConcurrentUploader class. This method will perform a multipart upload and upload the parts of the file concurrently.

Parameters:
  • filename (str) – A filename to upload
  • kwargs – Additional kwargs to pass through to boto.glacier.concurrent.ConcurrentUploader. You can pass any argument besides the api and vault_name param (these arguments are already passed to the ConcurrentUploader for you).
Raises:

boto.glacier.exceptions.UploadArchiveError if an error occurs during the upload process.

Return type:

str

Returns:

The archive id of the newly created archive

create_archive_from_file(filename=None, file_obj=None, description=None, upload_id_callback=None)

Create a new archive and upload the data from the given file or file-like object.

Parameters:
  • filename (str) – A filename to upload
  • file_obj (file) – A file-like object to upload
  • description (str) – An optional description for the archive.
  • upload_id_callback (function) – if set, call with the upload_id as the only parameter when it becomes known, to enable future calls to resume_archive_from_file in case resume is needed.
Return type:

str

Returns:

The archive id of the newly created archive

create_archive_writer(part_size=4194304, description=None)

Create a new archive and begin a multi-part upload to it. Returns a file-like object to which the data for the archive can be written. Once all the data is written, the file-like object should be closed; you can then call the get_archive_id method on it to get the ID of the created archive.

Parameters:
  • part_size (int) – The part size for the multipart upload.
  • description (str) – An optional description for the archive.
Return type:

boto.glacier.writer.Writer

Returns:

A Writer object to which the archive data should be written.

delete()

Deletes this vault. WARNING!

delete_archive(archive_id)

This operation deletes an archive from the vault.

Parameters:archive_id (str) – The ID for the archive to be deleted.
get_job(job_id)

Get an object representing a job in progress.

Parameters:job_id (str) – The ID of the job
Return type:boto.glacier.job.Job
Returns:A Job object representing the job.
list_all_parts(upload_id)

Automatically make and combine multiple calls to list_parts.

Call list_parts as necessary, combining the results in case multiple calls were required to get data on all available parts.

list_jobs(completed=None, status_code=None)

Return a list of Job objects related to this vault.

Parameters:
  • completed (boolean) – Specifies the state of the jobs to return. If a value of True is passed, only completed jobs will be returned. If a value of False is passed, only uncompleted jobs will be returned. If no value is passed, all jobs will be returned.
  • status_code (string) – Specifies the type of job status to return. Valid values are: InProgress|Succeeded|Failed. If not specified, jobs with all status codes are returned.
Return type:

list of boto.glacier.job.Job

Returns:

A list of Job objects related to this vault.

resume_archive_from_file(upload_id, filename=None, file_obj=None)

Resume upload of a file already part-uploaded to Glacier.

The resumption of an upload where the part-uploaded section is empty is a valid degenerate case that this function can handle.

One and only one of filename or file_obj must be specified.

Parameters:
  • upload_id (str) – existing Glacier upload id of upload being resumed.
  • filename (str) – file to open for resume
  • file_obj (file) – A file-like object containing local data to resume. This must read from the start of the entire upload, not just from the point being resumed. Use file_obj.seek(0) to achieve this if necessary.
Return type:

str

Returns:

The archive id of the newly created archive

retrieve_archive(archive_id, sns_topic=None, description=None)

Initiate an archive retrieval job to download the data from an archive. You will need to wait for the notification from Amazon (via SNS) before you can actually download the data; this takes around 4 hours.

Parameters:
  • archive_id (str) – The id of the archive
  • description (str) – An optional description for the job.
  • sns_topic (str) – The Amazon SNS topic ARN where Amazon Glacier sends notification when the job is completed and the output is ready for you to download.
Return type:

boto.glacier.job.Job

Returns:

A Job object representing the retrieval job.

retrieve_inventory(sns_topic=None, description=None)

Initiate an inventory retrieval job to list the items in the vault. You will need to wait for the notification from Amazon (via SNS) before you can actually download the data; this takes around 4 hours.

Parameters:
  • description (str) – An optional description for the job.
  • sns_topic (str) – The Amazon SNS topic ARN where Amazon Glacier sends notification when the job is completed and the output is ready for you to download.
Return type:

str

Returns:

The ID of the job

retrieve_inventory_job(**kwargs)

Identical to retrieve_inventory, but returns a Job instance instead of just the job ID.

Parameters:
  • description (str) – An optional description for the job.
  • sns_topic (str) – The Amazon SNS topic ARN where Amazon Glacier sends notification when the job is completed and the output is ready for you to download.
Return type:

boto.glacier.job.Job

Returns:

A Job object representing the retrieval job.

upload_archive(filename, description=None)

Adds an archive to a vault. For archives greater than 100 MB, a multipart upload is used.

Parameters:
  • filename (str) – A filename to upload
  • description (str) – An optional description for the archive.
Return type:

str

Returns:

The archive id of the newly created archive
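A sketch of the size check this method applies, using the 100 MB threshold exposed as Vault.SingleOperationThreshold (the helper name is illustrative, not part of boto):

```python
SINGLE_OPERATION_THRESHOLD = 104857600  # 100 MB, Vault.SingleOperationThreshold

def use_multipart(archive_size):
    """True when the archive is large enough to warrant a multipart upload."""
    return archive_size > SINGLE_OPERATION_THRESHOLD
```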

boto.glacier.job

class boto.glacier.job.Job(vault, response_data=None)
DefaultPartSize = 4194304
ResponseDataElements = (('Action', 'action', None), ('ArchiveId', 'archive_id', None), ('ArchiveSizeInBytes', 'archive_size', 0), ('Completed', 'completed', False), ('CompletionDate', 'completion_date', None), ('CreationDate', 'creation_date', None), ('InventorySizeInBytes', 'inventory_size', 0), ('JobDescription', 'description', None), ('JobId', 'id', None), ('SHA256TreeHash', 'sha256_treehash', None), ('SNSTopic', 'sns_topic', None), ('StatusCode', 'status_code', None), ('StatusMessage', 'status_message', None), ('VaultARN', 'arn', None))
download_to_file(filename, chunk_size=4194304, verify_hashes=True, retry_exceptions=(<class 'socket.error'>, ))

Download an archive to a file.

Parameters:
  • filename (str) – The name of the file where the archive contents will be saved.
  • chunk_size (int) – The chunk size to use when downloading the archive.
  • verify_hashes (bool) – Indicates whether or not to verify the tree hashes for each downloaded chunk.
get_output(byte_range=None, validate_checksum=False)

This operation downloads the output of the job. Depending on the job type you specified when you initiated the job, the output will be either the content of an archive or a vault inventory.

You can download all the job output or download a portion of the output by specifying a byte range. In the case of an archive retrieval job, depending on the byte range you specify, Amazon Glacier returns the checksum for the portion of the data. You can compute the checksum on the client and verify that the values match to ensure the portion you downloaded is the correct data.

Parameters:
  • byte_range – A tuple of integers specifying the slice (in bytes) of the archive you want to receive.
  • validate_checksum (bool) – Specify whether or not to validate the associated tree hash. If the response does not contain a TreeHash, then no checksum will be verified.

boto.glacier.writer

class boto.glacier.writer.Writer(vault, upload_id, part_size, chunk_size=1048576)

Presents a file-like object for writing to an Amazon Glacier archive. The data is written using the multi-part upload API.

close()
current_tree_hash

Returns the current tree hash for the data that’s been written so far.

Only once the writing is complete is the final tree hash returned.

current_uploaded_size

Returns the current uploaded size for the data that’s been written so far.

Only once the writing is complete is the final uploaded size returned.

get_archive_id()
upload_id
vault
write(data)
boto.glacier.writer.generate_parts_from_fobj(fobj, part_size)
boto.glacier.writer.resume_file_upload(vault, upload_id, part_size, fobj, part_hash_map, chunk_size=1048576)

Resume upload of a file already part-uploaded to Glacier.

The resumption of an upload where the part-uploaded section is empty is a valid degenerate case that this function can handle. In this case, part_hash_map should be an empty dict.

Parameters:
  • vault – boto.glacier.vault.Vault object.
  • upload_id – existing Glacier upload id of upload being resumed.
  • part_size – part size of existing upload.
  • fobj – file object containing local data to resume. This must read from the start of the entire upload, not just from the point being resumed. Use fobj.seek(0) to achieve this if necessary.
  • part_hash_map – {part_index: part_tree_hash, ...} of data already uploaded. Each supplied part_tree_hash will be verified and the part re-uploaded if there is a mismatch.
  • chunk_size – chunk size of tree hash calculation. This must be 1 MiB for Amazon.

boto.glacier.concurrent

class boto.glacier.concurrent.ConcurrentDownloader(job, part_size=4194304, num_threads=10)

Concurrently download an archive from glacier.

This class uses a thread pool to concurrently download an archive from glacier.

The thread pool is completely managed by this class and is transparent to the users of this class.

Parameters:
  • job – A layer2 job object for archive retrieval object.
  • part_size – The size, in bytes, of the chunks to use when downloading the archive parts. The part size must be a megabyte multiplied by a power of two.
download(filename)

Concurrently download an archive.

Parameters:filename (str) – The filename to download the archive to
class boto.glacier.concurrent.ConcurrentTransferer(part_size=4194304, num_threads=10)
class boto.glacier.concurrent.ConcurrentUploader(api, vault_name, part_size=4194304, num_threads=10)

Concurrently upload an archive to glacier.

This class uses a thread pool to concurrently upload an archive to glacier using the multipart upload API.

The thread pool is completely managed by this class and is transparent to the users of this class.

Parameters:
  • api (boto.glacier.layer1.Layer1) – A layer1 glacier object.
  • vault_name (str) – The name of the vault.
  • part_size (int) – The size, in bytes, of the chunks to use when uploading the archive parts. The part size must be a megabyte multiplied by a power of two.
  • num_threads (int) – The number of threads to spawn for the thread pool. The number of threads controls how many parts are uploaded concurrently.
upload(filename, description=None)

Concurrently create an archive.

The part_size value specified when the class was constructed will be used unless it is smaller than the minimum required part size needed for the size of the given file. In that case, the part size used will be the minimum part size required to properly upload the given file.

Parameters:
  • filename (str) – The filename to upload
  • description (str) – The description of the archive.
Return type:

str

Returns:

The archive id of the newly created archive.
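Glacier caps a multipart upload at 10,000 parts, so the adjustment described above amounts to doubling the part size until the file fits. A hypothetical sketch of that calculation (not boto's exact implementation):

```python
MEGABYTE = 1024 * 1024
MAXIMUM_NUMBER_OF_PARTS = 10000  # Glacier's multipart upload limit

def minimum_part_size(total_size, default_part_size=4 * MEGABYTE):
    """Smallest power-of-two multiple of the default part size that keeps
    the upload within the 10,000-part limit."""
    part_size = default_part_size
    while part_size * MAXIMUM_NUMBER_OF_PARTS < total_size:
        part_size *= 2
    return part_size
```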

class boto.glacier.concurrent.DownloadWorkerThread(job, worker_queue, result_queue, num_retries=5, time_between_retries=5, retry_exceptions=<type 'exceptions.Exception'>)

Individual download thread that downloads parts of the file from Glacier. The parts to download are stored in the work queue.

Parts are downloaded to a temporary directory, with each part in a separate file.

Parameters:
  • job – Glacier job object
  • worker_queue – A queue of tuples which include the part_number and part_size
  • result_queue – A priority queue of tuples which include the part_number and the path to the temp file that holds that part’s data.
class boto.glacier.concurrent.TransferThread(worker_queue, result_queue)
run()
class boto.glacier.concurrent.UploadWorkerThread(api, vault_name, filename, upload_id, worker_queue, result_queue, num_retries=5, time_between_retries=5, retry_exceptions=<type 'exceptions.Exception'>)

boto.glacier.exceptions

exception boto.glacier.exceptions.ArchiveError
exception boto.glacier.exceptions.DownloadArchiveError
exception boto.glacier.exceptions.TreeHashDoesNotMatchError
exception boto.glacier.exceptions.UnexpectedHTTPResponseError(expected_responses, response)
exception boto.glacier.exceptions.UploadArchiveError
