AWS S3 API continuation token
AWS S3 ListObjects 1000 per request
There is a common misunderstanding that AWS S3 ListObjects returns only 1000 results.
resp = s3.list_objects_v2(Bucket='gdc-mmrf-commpass-phs000748-2-open')
assert len(resp['Contents']) == 1000
This API and the documentation does not emphasize enough that 1000 results are per page/request.
We are expected to call the API multiple times while checking for NextContinuationToken and pass it to the next call.
continuation_token = None
while True:
api_kwargs = (
{'ContinuationToken': continuation_token} if continuation_token
else {}
)
resp = s3.list_objects_v2(
Bucket='gdc-mmrf-commpass-phs000748-2-open',
**api_kwargs,
)
print(resp['Contents'][-1]['Key'])
continuation_token = resp.get('NextContinuationToken')
if not continuation_token:
break
This continuation/next token also applies to other AWS APIs such as Athena.
Sample Code
# %%
import boto3
from botocore import UNSIGNED
from botocore.client import Config
s3 = boto3.client('s3', config=Config(signature_version=UNSIGNED))
# %%
resp = s3.list_objects_v2(Bucket='gdc-mmrf-commpass-phs000748-2-open')
print(len(resp['Contents'])) # 1000
# %%
print(resp['NextContinuationToken'])
# %%
continuation_token = None
while True:
api_kwargs = (
{'ContinuationToken': continuation_token} if continuation_token
else {}
)
resp = s3.list_objects_v2(
Bucket='gdc-mmrf-commpass-phs000748-2-open',
**api_kwargs,
)
print(resp['Contents'][-1]['Key'])
continuation_token = resp.get('NextContinuationToken')
if not continuation_token:
break
Last modified on 2024-08-22