Using a presigned URL to identify Amazon S3 usage by the requester
Many software-as-a-service (SaaS) products use pay-as-you-go pricing, charging customers only for the resources they consume. However, this pricing model is only viable if each customer's resource usage, such as compute capacity, storage, and network bandwidth, can be tracked accurately. Without this data, SaaS providers have no insight into each customer's resource usage and cannot charge based on it. This is just one example of why accurately tracking resource consumption is a critical need.
In this article, the authors describe how to track who has downloaded objects from your Amazon Simple Storage Service (S3) resources. Specifically, downloads are tracked using pre-signed URLs. The approach is to generate a pre-signed URL with a custom parameter using the Signature Version 4 (SigV4) process and then query the S3 server access logs with Amazon Athena to identify who made the requests. By attaching a custom parameter to the pre-signed URL, S3 resource owners can use that parameter as an identifier to track usage. This allows resource owners to charge users based on how often they download data and how much data they download.
Permissions required to create a pre-signed URL
When creating a pre-signed URL, keep permissions as narrow as possible. Anyone with a valid pre-signed URL can access the object as if they were the original signing user, which makes it necessary to lock down the permissions of the entity creating the pre-signed URL. Creating a pre-signed URL for an S3 object requires that the signing user has explicit permission to perform the specific action. For example, the signing user must have read permission on the S3 object to create a pre-signed URL for a GET request. With this in mind, give the signer only the permissions needed for the access you intend to grant.
Creating a pre-signed URL with a custom parameter
In this section, the authors demonstrate how to generate a pre-signed URL with a custom parameter for an object in a private S3 bucket. To do this, you must write code that signs the request using the SigV4 process. This step is necessary to provide authentication information in the request. The authors use the Java code sample provided in the Amazon S3 API reference documentation as a reference for the signature calculation process and make minor changes to the code to generate a pre-signed URL with custom parameters.
Follow these steps to download and navigate to the provided sample code.
1. Download the Java sample code.
2. Extract the zip file.
3. Navigate to com/amazonaws/services/s3/sample.
Once you reach the code folder, make changes to the three files:
RunAllSamples.java and PresignedUrlSample.java in the sample folder and AWS4SignerForQueryParameterAuth.java in the sample/auth folder.
For a quick overview of the files:
- RunAllSamples.java: This file runs the four example code snippets provided in the documentation.
- PresignedUrlSample.java: This file is responsible for collecting the relevant parameters, calculating the signature using the parameters, and generating the final, pre-signed URL. The file is hardcoded to generate the pre-signed URL for the ExampleObject.txt object. You can later change the name of this object in the file to suit your needs.
- AWS4SignerForQueryParameterAuth.java: This file contains code that calculates the signature using the query string parameters. It is used in PresignedUrlSample.java.
The RunAllSamples.java file requires four variables to run: awsAccessKey, awsSecretKey, bucketName, and regionName. For demonstration purposes, the authors use the existing code from the documentation, which uses static credentials. For a production implementation, passing temporary credentials is the recommended practice. This helps prevent hardcoded AWS access keys and secret keys from being accidentally committed to a code repository.
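As one way to keep keys out of source files, the values could be read from the process environment instead of being typed into RunAllSamples.java. The following is a minimal sketch, not part of the AWS sample code: the class name EnvCredentials is hypothetical, and it assumes the standard AWS environment variable names are set by whatever issues the credentials.

```java
import java.util.Map;

// Hypothetical helper: holds credentials read from an environment map.
class EnvCredentials {
    final String accessKey;
    final String secretKey;
    final String sessionToken; // null when long-term credentials are used

    EnvCredentials(String accessKey, String secretKey, String sessionToken) {
        this.accessKey = accessKey;
        this.secretKey = secretKey;
        this.sessionToken = sessionToken;
    }

    // Reads the standard AWS variable names; fails fast if a required one is missing.
    static EnvCredentials from(Map<String, String> env) {
        String accessKey = env.get("AWS_ACCESS_KEY_ID");
        String secretKey = env.get("AWS_SECRET_ACCESS_KEY");
        if (accessKey == null || secretKey == null) {
            throw new IllegalStateException("AWS credentials are not set in the environment");
        }
        return new EnvCredentials(accessKey, secretKey, env.get("AWS_SESSION_TOKEN"));
    }
}
```

A caller would use EnvCredentials.from(System.getenv()). Note that when temporary credentials are used, the session token also has to be sent with the request as the X-Amz-Security-Token query parameter and included in the signature calculation, so the sample signer would need a corresponding change.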
Once you have populated the four variables (awsAccessKey, awsSecretKey, bucketName, and regionName) in RunAllSamples.java, compile and run the code according to the instructions in the Amazon S3 API reference documentation. This should run four code samples, including the one that generates a pre-signed URL for the ExampleObject.txt file in the specified S3 bucket.
To include a custom parameter in the pre-signed URL, you must make the following changes to the Java code sample.
1. On lines 30 and 32 of PresignedUrlSample.java, change ExampleObject.txt to the name of the object in your S3 bucket.
2. On line 44 of PresignedUrlSample.java, add a query parameter. This is the custom parameter that will be added to the pre-signed URL. You can also add multiple parameters. In the example, the authors have added name/johndoe as a key-value pair.
3. In AWS4SignerForQueryParameterAuth.java, add another line so that the query parameter added in step 2 is appended to the pre-signed URL.
4. After completing the steps above, follow the same instructions in the Amazon S3 API reference documentation to compile and run the code that generates the pre-signed URL.
The previously added custom parameter name=johndoe is now appended to the end of the pre-signed URL. You can now retrieve the object from the pre-signed URL.
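To make the effect of these changes concrete, here is a minimal, self-contained sketch of the same idea using only the JDK (Java 10 or later). It is not the AWS sample code: the class and method names (PresignSketch, presign) are hypothetical, the caller is assumed to pass the bucket's host name and a timestamp in yyyyMMdd'T'HHmmss'Z' format, and only GET requests with static credentials are covered. The custom parameters take part in the signature and are then appended to the end of the URL, as described above.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Map;
import java.util.TreeMap;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

class PresignSketch {

    // RFC 3986 percent-encoding; URLEncoder alone produces form encoding.
    static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8)
                .replace("+", "%20").replace("*", "%2A").replace("%7E", "~");
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) sb.append(String.format("%02x", b));
        return sb.toString();
    }

    static byte[] hmac(byte[] key, String data) throws GeneralSecurityException {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(key, "HmacSHA256"));
        return mac.doFinal(data.getBytes(StandardCharsets.UTF_8));
    }

    // Joins parameters in map order; a TreeMap gives the sorted order SigV4 expects
    // for plain ASCII names (assumption: names need no percent-encoding).
    static String query(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) sb.append('&');
            sb.append(enc(e.getKey())).append('=').append(enc(e.getValue()));
        }
        return sb.toString();
    }

    static String presign(String accessKey, String secretKey, String region, String host,
                          String objectKey, String amzDate, int expiresSeconds,
                          Map<String, String> custom) {
        try {
            String day = amzDate.substring(0, 8);
            String scope = day + "/" + region + "/s3/aws4_request";
            TreeMap<String, String> auth = new TreeMap<>();
            auth.put("X-Amz-Algorithm", "AWS4-HMAC-SHA256");
            auth.put("X-Amz-Credential", accessKey + "/" + scope);
            auth.put("X-Amz-Date", amzDate);
            auth.put("X-Amz-Expires", Integer.toString(expiresSeconds));
            auth.put("X-Amz-SignedHeaders", "host");
            TreeMap<String, String> signed = new TreeMap<>(auth);
            signed.putAll(custom); // custom parameters are covered by the signature

            String uri = "/" + enc(objectKey).replace("%2F", "/");
            String canonical = "GET\n" + uri + "\n" + query(signed) + "\n"
                    + "host:" + host + "\n\n" + "host\n" + "UNSIGNED-PAYLOAD";
            String hash = hex(MessageDigest.getInstance("SHA-256")
                    .digest(canonical.getBytes(StandardCharsets.UTF_8)));
            String toSign = "AWS4-HMAC-SHA256\n" + amzDate + "\n" + scope + "\n" + hash;

            // Derive the signing key: date, then region, then service, then terminator.
            byte[] k = hmac(("AWS4" + secretKey).getBytes(StandardCharsets.UTF_8), day);
            k = hmac(k, region);
            k = hmac(k, "s3");
            k = hmac(k, "aws4_request");
            String signature = hex(hmac(k, toSign));

            String url = "https://" + host + uri + "?" + query(auth)
                    + "&X-Amz-Signature=" + signature;
            // Append the custom parameters last, so they sit at the end of the URL.
            return custom.isEmpty() ? url : url + "&" + query(new TreeMap<>(custom));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Calling presign with a custom map of name to johndoe yields a URL that ends in &name=johndoe. Because the parameter is part of the signed query string, changing or removing it afterwards invalidates the signature, which is what makes it usable as an identifier.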
Querying S3 server access logs to identify usage patterns
Now that you have generated a pre-signed URL with a custom parameter, the next step is to track URL usage. First, however, you must enable S3 server access logging for the S3 bucket. Once you do this, you can query the server access logs to identify the requesters. There are many ways to query the server access logs stored in an S3 bucket. For example, you can analyze the logs by streaming them to Amazon OpenSearch Service via AWS Lambda, or by using Pandas with the AWS SDK for Python. In this article, the authors use Amazon Athena to query the S3 server access logs and identify requests using the custom parameter. To do this, create an Athena table over the logs by following the documented steps. Once the table exists, you can start querying the S3 server access logs.
Running sample queries
First, run the following query to see what information is being logged.
SQL
SELECT * FROM s3_access_logs_db.mybucket_logs limit 10;
The previous query should return 24 columns, including the bucket name, HTTP status, requester IP address, and more. The custom parameter added in the exercise above appears in the request_uri column. If you were to charge clients based on the amount of data they downloaded from your S3 bucket, you could run a query that returns only the custom parameter from request_uri and the bytes sent for each successful GET request.
SQL
SELECT SPLIT_PART(SPLIT_PART(request_uri,'name=',2),' ',1), bytessent FROM s3_access_logs_db.mybucket_logs WHERE httpstatus='200' AND operation='REST.GET.OBJECT' AND request_uri LIKE '%name=%';
In the query results, the client name johndoe appears alongside the bytes sent for each matching request in the S3 server access logs. Knowing this, you can use the example code to generate pre-signed URLs dedicated to each client and track their usage.
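If the query results are exported for billing, the same extraction and a per-client total can also be done in application code. The sketch below is an illustration under assumptions rather than part of the original walkthrough: the class UsageSketch and its methods are hypothetical, and each input row is assumed to hold the request_uri and bytessent values as strings, with "-" where no bytes were logged.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class UsageSketch {

    // Extracts the value of the custom "name" parameter from a logged request URI,
    // similar to SPLIT_PART(SPLIT_PART(request_uri,'name=',2),' ',1) in the query,
    // but it also stops at '&' and ignores keys that merely end in "name".
    static String customerOf(String requestUri) {
        for (String marker : new String[] {"?name=", "&name="}) {
            int at = requestUri.indexOf(marker);
            if (at >= 0) {
                int start = at + marker.length();
                int end = start;
                while (end < requestUri.length()
                        && requestUri.charAt(end) != ' '
                        && requestUri.charAt(end) != '&') {
                    end++;
                }
                return requestUri.substring(start, end);
            }
        }
        return null; // no custom parameter present
    }

    // Sums bytes sent per client; each row is {request_uri, bytessent}.
    static Map<String, Long> bytesPerCustomer(List<String[]> rows) {
        Map<String, Long> totals = new LinkedHashMap<>();
        for (String[] row : rows) {
            String customer = customerOf(row[0]);
            if (customer == null) continue;
            long bytes = "-".equals(row[1]) ? 0L : Long.parseLong(row[1]);
            totals.merge(customer, bytes, Long::sum);
        }
        return totals;
    }
}
```

The totals map can then feed whatever invoicing logic applies. Doing the aggregation in Athena with SUM and GROUP BY would work equally well; the code form is mainly useful when the logs are post-processed outside Athena.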
Cleaning up
The only ongoing charges from this exercise are the cost of storing files in S3. If you do not wish to incur further charges, delete the S3 files created for testing purposes.
Conclusions
In this article, the authors discussed how to identify Amazon S3 usage per requester. They covered modifying existing sample code to add a custom parameter to a pre-signed URL and querying S3 server access logs to identify requests using that custom parameter. With this information, you no longer need to guess at the per-customer traffic in your S3 bucket and can offer your customers access to S3 objects under a pay-as-you-go pricing model. User access patterns can also be used to infer user needs and can help you create new and improved offerings for your customers.