Using a Custom IAM Role to Access S3
You may have data in S3 buckets that are not directly accessible to the Task Execution IAM role that tasks running on Outerbounds use by default. For instance, a bucket may reside in another AWS account.
In this scenario, you can grant tasks access to such a bucket by creating a custom IAM role, as described in the steps below:
1. Create an IAM role
In the account that hosts the S3 bucket, create an IAM role with the necessary Amazon S3 permissions.
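The necessary permissions depend on your use case. For read-only access, a minimal policy might look like the following sketch; the bucket name is a placeholder, and you may need additional actions such as s3:PutObject if tasks also write to the bucket:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::your-bucket-name",
        "arn:aws:s3:::your-bucket-name/*"
      ]
    }
  ]
}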
2. Add a trust policy
Add a trust policy to that role that allows the Outerbounds Task Execution Role to assume it.
Please contact us to retrieve the ARN specific to your account, and use it in place of OBP_PRINCIPAL below.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "OBP_PRINCIPAL"
      },
      "Action": ["sts:AssumeRole", "sts:SetSourceIdentity"],
      "Condition": {}
    }
  ]
}
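If you manage IAM with code, steps 1 and 2 can be scripted together. The following is a rough sketch using boto3, not a definitive recipe: the role name is a hypothetical placeholder, OBP_PRINCIPAL stands for the ARN provided by Outerbounds, and the inline policy mirrors the read-only example from step 1.

import json

import boto3

iam = boto3.client("iam")

# Trust policy from step 2; replace OBP_PRINCIPAL with the ARN
# provided by Outerbounds for your account.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": "OBP_PRINCIPAL"},
            "Action": ["sts:AssumeRole", "sts:SetSourceIdentity"],
            "Condition": {},
        }
    ],
}

# Create the role (step 1) in the account that hosts the bucket,
# with the trust policy attached from the start.
iam.create_role(
    RoleName="outerbounds-s3-access",  # hypothetical role name
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)

# Attach the S3 permissions from step 1 as an inline policy.
iam.put_role_policy(
    RoleName="outerbounds-s3-access",
    PolicyName="s3-read-access",
    PolicyDocument=json.dumps({
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [
                    "arn:aws:s3:::your-bucket-name",
                    "arn:aws:s3:::your-bucket-name/*",
                ],
            }
        ],
    }),
)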
3. Tag the role
Attach a tag to the role. The tag key should be outerbounds.com/accessible-by-deployment and its value should match the name of your Outerbounds deployment, e.g. speedyhawk.
Your deployment name can be inferred from the Outerbounds UI URL, which has the format ui.<deployment_name>.obp.outerbounds.com/dashboard.
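This step can likewise be scripted. A minimal sketch with boto3, assuming the hypothetical role name from the earlier sketch and speedyhawk as the deployment name:

import boto3

iam = boto3.client("iam")

# Tag the role so that tasks in the named Outerbounds deployment
# are allowed to assume it.
iam.tag_role(
    RoleName="outerbounds-s3-access",  # hypothetical role name
    Tags=[
        {
            "Key": "outerbounds.com/accessible-by-deployment",
            "Value": "speedyhawk",  # replace with your deployment name
        }
    ],
)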
Using the role
Once these steps are done, you should be able to use this role within your Metaflow flows to access data from your S3 bucket.
You can do this by passing the role argument to Metaflow's S3 client:
from metaflow import FlowSpec, S3, step


class CustomS3AccessFlow(FlowSpec):

    @step
    def start(self):
        import pandas as pd

        # Read some data from an S3 bucket in another account,
        # assuming the custom IAM role created in step 1.
        with S3(role='<your-custom-IAM-role-arn-from-step-1>') as s3:
            tmp_data_path = s3.get('s3://bucket-in-some-account/some-data')
            # Read the file while it is still available locally; the
            # temporary download is cleaned up when the block exits.
            df = pd.read_csv(tmp_data_path.path)
        self.summary = df.describe()
        self.next(self.end)

    @step
    def end(self):
        print(self.summary)


if __name__ == "__main__":
    CustomS3AccessFlow()
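Assuming the flow above is saved as, say, custom_s3_access_flow.py, you can run it as usual with python custom_s3_access_flow.py run. No other changes are needed; the role argument alone tells the S3 client to assume your custom role for those operations.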