11-05-2020, 01:18 PM
S3's default read and write consistency models are a big deal for developers like us, especially when you're building applications that depend heavily on reliable data storage. The default behavior you encounter can really impact how you architect your solutions and manage state.
To break it down, you should know that S3 offers strong read-after-write consistency for both PUTs of new objects and overwrites of existing objects. This means that if I upload a new object or overwrite an existing one, I can immediately read it back and I’ll get the up-to-date version. This is useful because you don’t have to worry about the kind of eventual consistency you see in other systems, where there’s a lag between writes and reads.
For example, if I were to upload a profile picture for a user in an application, and then almost right afterward, I try to retrieve that image, S3 makes sure that I see the newly uploaded picture right away. You can imagine how annoying it would be to upload a new image and then still see the old one if there was a delay in the consistency. It’s nice that I can have that immediate feedback loop during development.
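To make that concrete, here’s a minimal sketch of that upload-then-read loop with boto3. The bucket name, key, and local filename are placeholders I made up for illustration.

```python
import boto3

s3 = boto3.client("s3")

BUCKET = "my-app-uploads"                 # hypothetical bucket name
KEY = "profiles/user-123/avatar.png"      # hypothetical object key

# Upload (or overwrite) the profile picture.
with open("avatar.png", "rb") as f:
    s3.put_object(Bucket=BUCKET, Key=KEY, Body=f, ContentType="image/png")

# Read it back immediately. With strong read-after-write consistency,
# this GET returns the bytes we just wrote, not a stale prior version.
response = s3.get_object(Bucket=BUCKET, Key=KEY)
latest_bytes = response["Body"].read()
print(f"retrieved {len(latest_bytes)} bytes right after the PUT")
```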
Now, you might wonder how this manifests in a more complex scenario. If you’re working with dynamic web applications and users upload files frequently, you won’t have to implement extra logic to deal with any outdated reads. If your application were to use a different storage solution that didn’t provide strong consistency, your users could end up in a frustrating situation where they see stale data or uploads don’t reflect immediately. That can lead you down a rabbit hole of debugging and handling race conditions that could waste a lot of your time.
Let’s also talk about write consistency. There’s a lingering perception that S3 only guarantees strong consistency for certain kinds of writes, but it applies across all of S3: new PUTs, overwrites, and DELETEs alike, automatically and with no extra cost or performance penalty. If I write a new object and you try to read it right after, you will see the new data, unlike S3’s older model, where overwrites and deletes could take a moment to propagate before every read reflected them. That makes writing code that relies on the current state of your data much simpler, because you can rely on S3’s guarantees from the get-go.
Now, while we have this strong consistency, there’s something worth spelling out about listing objects. Under S3’s older eventual consistency model, LIST operations were a classic gotcha: you could upload three objects, immediately call a LIST, and not see all three in the response. With the strong consistency model, ListObjects results also reflect every write that has completed, so that particular surprise is gone. What can still catch you off guard is concurrency and churn: a LIST only shows what exists when the request runs, so objects other clients write while you’re paginating through results, or uploads that haven’t finished yet, won’t appear. If your application logic relies on LIST calls returning a complete picture of an actively changing bucket, that distinction matters.
You may want to implement a pattern that proactively verifies the objects you expect are actually present, especially if your workflow depends heavily on those lists. In that case, combining direct key lookups (a HEAD or GET per expected key) with LIST operations is a good way to get a more reliable outcome. That way, you’re not leaving anything up to chance.
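Here’s a rough sketch of that verification pattern with boto3; the bucket name, prefix, and helper names are placeholders of my own, not anything prescribed by the API.

```python
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")
BUCKET = "my-app-uploads"  # hypothetical bucket name


def key_exists(key):
    """Direct per-key check; HEAD requests are cheap and strongly consistent."""
    try:
        s3.head_object(Bucket=BUCKET, Key=key)
        return True
    except ClientError as err:
        if err.response["Error"]["Code"] in ("404", "NoSuchKey", "NotFound"):
            return False
        raise  # anything else (permissions, throttling) should surface


def find_missing(expected_keys, prefix):
    """Use a LIST for the broad view, then fall back to HEAD for anything not seen."""
    listed = set()
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET, Prefix=prefix):
        for obj in page.get("Contents", []):
            listed.add(obj["Key"])
    return [k for k in expected_keys if k not in listed and not key_exists(k)]


# e.g. find_missing(["reports/2020/a.csv", "reports/2020/b.csv"], "reports/2020/")
```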
Let me highlight another aspect of S3’s behavior during data ingestion. When I upload a large object, I’m usually using multipart uploads to break the file into manageable chunks, especially for videos or large images. The consistency model has an important nuance here. While the upload is in progress, I can track which parts have completed (a ListParts call reflects every part that has finished uploading), but I can’t read the parts’ data back, and the object as a whole doesn’t exist yet. Only once I call CompleteMultipartUpload and S3 assembles the parts does the object become readable, and at that point strong read-after-write consistency kicks in: the very first GET after completion returns the full, assembled object. That matters when you’re running an application that needs to display or process data as soon as it comes online.
You might find yourself reaching for multipart uploads often because they parallelize nicely and scale well for large files. The takeaway is that tracking part progress and reading the finished object are two different things: until all parts are uploaded and the upload is finalized, a read attempt for the complete object returns an error (or the previous version, if you’re overwriting an existing key). Having error-handling logic that catches and reports attempts to read unfinalized objects is essential.
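Here’s roughly what that flow looks like with boto3. The bucket, key, source file, and part size are assumptions for the sake of the example, and real code would also want an AbortMultipartUpload path for failures.

```python
import boto3

s3 = boto3.client("s3")
BUCKET, KEY = "my-video-bucket", "uploads/clip.mp4"   # hypothetical names
PART_SIZE = 8 * 1024 * 1024                           # every part except the last must be at least 5 MiB

upload_id = s3.create_multipart_upload(Bucket=BUCKET, Key=KEY)["UploadId"]

completed_parts = []
with open("clip.mp4", "rb") as f:
    part_number = 1
    while True:
        chunk = f.read(PART_SIZE)
        if not chunk:
            break
        resp = s3.upload_part(
            Bucket=BUCKET, Key=KEY, UploadId=upload_id,
            PartNumber=part_number, Body=chunk,
        )
        completed_parts.append({"ETag": resp["ETag"], "PartNumber": part_number})

        # Progress tracking: ListParts reflects every part uploaded so far,
        # even though the assembled object is not readable yet.
        parts_so_far = s3.list_parts(Bucket=BUCKET, Key=KEY, UploadId=upload_id)
        print(f"{len(parts_so_far.get('Parts', []))} part(s) uploaded")
        part_number += 1

# Only after this call does a GET on the key return the assembled object.
s3.complete_multipart_upload(
    Bucket=BUCKET, Key=KEY, UploadId=upload_id,
    MultipartUpload={"Parts": completed_parts},
)
obj = s3.get_object(Bucket=BUCKET, Key=KEY)  # strongly consistent read of the finished object
```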
One crucial part of working with S3, especially regarding its consistency guarantees, is how you implement your caching strategies. While developing, I often use local caches to store results from previous calls and shave latency off subsequent requests. Keep in mind, though, that S3’s strong consistency only covers S3 itself; layer a cache on top and you can reintroduce staleness at the application level. If you update data in S3, invalidate or update the corresponding cache entry, because the last thing you want is to serve stale data after S3 has already moved on. Implementing cache expiration or explicit invalidation after PUT operations helps me stay on top of the most current data.
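As a sketch of that idea, here’s a tiny in-process cache wrapper that drops an entry whenever the corresponding key is rewritten. The bucket and structure are hypothetical; a production setup would more likely use something like Redis with TTLs, but the invalidate-on-write principle is the same.

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "my-app-uploads"  # hypothetical bucket name
_cache = {}                # key -> bytes


def cached_get(key):
    """Serve from the local cache when possible, otherwise fetch from S3 and remember it."""
    if key not in _cache:
        _cache[key] = s3.get_object(Bucket=BUCKET, Key=key)["Body"].read()
    return _cache[key]


def put_and_invalidate(key, body):
    """Write to S3 first, then invalidate so the next read refetches the fresh data."""
    s3.put_object(Bucket=BUCKET, Key=key, Body=body)
    _cache.pop(key, None)
```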
Alright, let’s touch on integration. If your app uses Lambda functions that trigger on S3 events, strong consistency pays off there too. When a Lambda function fires for an object-created event, I can depend on the fact that a GET for the key in the event payload returns the object that triggered it, so the data I work with reflects the current state. I don’t have to build in delays or retry loops waiting for the newly written object to become readable – it just works. (The notification delivery itself is asynchronous, but by the time the handler runs, the object is there to read.)
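A bare-bones handler for that pattern might look like the sketch below. It assumes the bucket notification and the function’s IAM permissions are already wired up, and the print is just a stand-in for whatever processing you actually do.

```python
import urllib.parse
import boto3

s3 = boto3.client("s3")


def handler(event, context):
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in the event payload.
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

        # Strong read-after-write consistency means this GET returns the
        # object version that generated the notification.
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        print(f"processed {key} ({len(body)} bytes) from {bucket}")
```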
In scenarios where you’re facing strict performance needs and must ensure that your read operations reflect the latest data, the architecture benefits directly from S3’s strong consistency. You can design your workflows to keep user interactions timely and your resource load efficient. There are operational-cost implications as well: the guarantee comes at no additional charge and no performance penalty, and it removes the need for workaround infrastructure like tracking object state in a separate store just to know whether a write has landed. Combined with sensible caching, that saves real resources.
Considering where S3 fits into larger architectures, if you use it alongside other AWS services like DynamoDB or RDS, you can simplify consistency management across your entire stack. I tend to be deliberate with workflows that pull data from multiple services and clearly document the assumed state after each interaction with S3 or other storage. Developers often forget that mixing different consistency models – DynamoDB reads, for instance, are eventually consistent by default unless you request strongly consistent ones – can lead to unexpected behavior if not handled properly, especially when user experience is on the line.
At the end of the day, a solid understanding of S3’s consistency model will empower you to build robust and efficient applications. The mental load of reasoning about stale reads gets a lot lighter when you can lean on S3’s strong guarantees where they apply. As we build systems that handle varied workloads, thinking strategically about where those guarantees do and don’t hold leads to cleaner code and a more enjoyable development experience.
In conclusion, embracing S3’s behavior, especially concerning read-after-write consistency, offers a level of confidence that allows you to innovate without being shackled by concerns of stale data in the context of writes. Being aware of these details can help you build applications that not only perform effectively but also enhance user experience across the board. It’s fascinating how these architectural choices impact everything we build in the AWS ecosystem and even outside of it.