Using a Custom IAM Role to Access S3
You may have data in S3 buckets that are not directly accessible to the Task Execution IAM role that tasks running on Outerbounds use by default. For instance, a bucket may reside in another AWS account.
In this scenario, you can grant tasks access to such a bucket by creating a custom IAM role, as described in the steps below:
1. Create an IAM role
In the account that hosts the S3 bucket, create an IAM role with the necessary Amazon S3 permissions.
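The necessary permissions depend on your use case. For read-only access, a minimal policy might look like the following sketch; the bucket name is a placeholder, and you may need additional actions such as s3:PutObject if tasks also write to the bucket:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::your-bucket-name",
        "arn:aws:s3:::your-bucket-name/*"
      ]
    }
  ]
}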
2. Add a trust policy
Add a trust policy to that role that allows the Outerbounds Task Execution Role to assume it.
Please contact us to retrieve the ARN specific to your account, and use it in place of OBP_PRINCIPAL below.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "OBP_PRINCIPAL"
      },
      "Action": ["sts:AssumeRole", "sts:SetSourceIdentity"],
      "Condition": {}
    }
  ]
}
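If you manage IAM with code, steps 1 and 2 can be scripted together. The following is a rough sketch using boto3, not a definitive recipe: the role name is a hypothetical placeholder, OBP_PRINCIPAL stands for the ARN provided by Outerbounds, and the inline policy mirrors the read-only example from step 1.

import json

import boto3

iam = boto3.client("iam")

# Trust policy from step 2; replace OBP_PRINCIPAL with the ARN
# provided by Outerbounds for your account.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": "OBP_PRINCIPAL"},
            "Action": ["sts:AssumeRole", "sts:SetSourceIdentity"],
            "Condition": {},
        }
    ],
}

# Create the role (step 1) in the account that hosts the bucket,
# with the trust policy attached from the start.
iam.create_role(
    RoleName="outerbounds-s3-access",  # hypothetical role name
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)

# Attach the S3 permissions from step 1 as an inline policy.
iam.put_role_policy(
    RoleName="outerbounds-s3-access",
    PolicyName="s3-read-access",
    PolicyDocument=json.dumps({
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [
                    "arn:aws:s3:::your-bucket-name",
                    "arn:aws:s3:::your-bucket-name/*",
                ],
            }
        ],
    }),
)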
3. Tag the role
Attach a tag to the role. The tag key should be outerbounds.com/accessible-by-deployment and its value should match the name of your Outerbounds deployment, e.g. speedyhawk.
Your deployment name can be inferred from the Outerbounds UI URL, which has the format ui.<deployment_name>.obp.outerbounds.com/dashboard.
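This step can likewise be scripted. A minimal sketch with boto3, assuming the hypothetical role name from the earlier sketch and speedyhawk as the deployment name:

import boto3

iam = boto3.client("iam")

# Tag the role so that tasks in the named Outerbounds deployment
# are allowed to assume it.
iam.tag_role(
    RoleName="outerbounds-s3-access",  # hypothetical role name
    Tags=[
        {
            "Key": "outerbounds.com/accessible-by-deployment",
            "Value": "speedyhawk",  # replace with your deployment name
        }
    ],
)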
Using the role
Once these steps are done, you should be able to use this role within your Metaflow flows to access data from your S3 bucket.
You can do this by passing the role argument to Metaflow's S3 client:
from metaflow import FlowSpec, S3, step


class CustomS3AccessFlow(FlowSpec):

    @step
    def start(self):
        import pandas as pd

        # Read some data from an S3 bucket in another account,
        # assuming the custom IAM role created in step 1.
        with S3(role='<your-custom-IAM-role-arn-from-step-1>') as s3:
            tmp_data_path = s3.get('s3://bucket-in-some-account/some-data')
            # Read the file while it is still available locally; the
            # temporary download is cleaned up when the block exits.
            df = pd.read_csv(tmp_data_path.path)
        self.summary = df.describe()
        self.next(self.end)

    @step
    def end(self):
        print(self.summary)


if __name__ == "__main__":
    CustomS3AccessFlow()
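Assuming the flow above is saved as, say, custom_s3_access_flow.py, you can run it as usual with python custom_s3_access_flow.py run. No other changes are needed; the role argument alone tells the S3 client to assume your custom role for those operations.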