06-03-2024, 03:58 PM
You have to think about how S3 operates versus NFS or SMB when it comes to applications that require shared file system access. S3 is an object storage service, which means it's designed for storing and retrieving data as discrete units or objects rather than as a file system in the traditional sense. This fundamental difference can lead to complications when you're dealing with applications that require simultaneous access to files or directories by multiple users or systems.
With S3, you're dealing with objects stored in a flat namespace. Each object is addressed by a unique key (exposed via a URL), and the "directories" you see in tooling are just key prefixes, not real filesystem entries. When you try to manage file locks or coordinate access by multiple users or applications, you run into challenges. I can tell you from experience that when you design applications that expect file system semantics (directories, files, and their attributes), S3 gets frustrating. It lacks features that are inherent to a file system like NFS or SMB.
Think about locking mechanisms. In NFS or SMB, you can implement file locking to ensure that one user isn't overwriting changes made by another in real time. Advisory locks let cooperating processes coordinate reads and writes to a file, with the caveat that they only work when every process actually checks the lock; SMB goes further with mandatory share modes and oplocks. That makes it practical for collaborative editing or applications that need to read and write data concurrently, like database backups or documents opened in Word or Excel.
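To make the advisory-lock point concrete, here's a minimal sketch using POSIX `fcntl.flock` from Python. This assumes a Linux/POSIX host; a second file handle stands in for a second process, and whether `flock` actually propagates across an NFS mount depends on the server and its lock manager.

```python
import fcntl
import os
import tempfile

# Scratch file standing in for a file on an NFS/SMB mount.
path = os.path.join(tempfile.mkdtemp(), "shared.dat")
open(path, "w").close()

fd1 = open(path, "r+")
fd2 = open(path, "r+")  # separate open = separate file description

# First writer takes an exclusive advisory lock.
fcntl.flock(fd1, fcntl.LOCK_EX)

# A second handle that honors the protocol is refused (non-blocking
# attempt) while the exclusive lock is held.
blocked = False
try:
    fcntl.flock(fd2, fcntl.LOCK_EX | fcntl.LOCK_NB)
except OSError:
    blocked = True

fcntl.flock(fd1, fcntl.LOCK_UN)  # release; fd2 could now acquire it
fd1.close()
fd2.close()
print(blocked)  # True: the second handle was locked out
```

Note the word "advisory": a process that never calls `flock` can still write to the file, which is exactly why all participants have to play by the same rules.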
On the other hand, S3 doesn't natively support this type of locking. Imagine you're using S3 as a backend for an application that requires simultaneous updates to a file. You could end up in a race condition where two processes read the same object before either writes back its changes, and the second write silently overwrites the first. That's lost-update corruption you have to anticipate and handle in your application code. Versioning can mitigate some of the risk, since every overwrite at least leaves the prior version recoverable, but it adds complexity without solving the underlying problem: there's no built-in way to coordinate concurrent writers the way file locking does on NFS or SMB.
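To show the kind of guard your application layer ends up bolting on, here's a toy sketch of ETag-style optimistic concurrency. The in-memory dict is a stand-in for a bucket, and the `if_match` check is application code, something a plain S3 PUT hasn't traditionally offered.

```python
import hashlib

# In-memory stand-in for an S3 bucket: key -> (etag, body).
bucket = {}

def put(key, body, if_match=None):
    """Write only if the caller last saw `if_match` as the object's ETag.
    The guard is the application's, not the store's."""
    current = bucket.get(key)
    if if_match is not None and (current is None or current[0] != if_match):
        raise RuntimeError("etag mismatch: object changed under you")
    etag = hashlib.md5(body).hexdigest()
    bucket[key] = (etag, body)
    return etag

def get(key):
    return bucket[key]  # (etag, body)

put("report.csv", b"v1")
tag_a, _ = get("report.csv")   # reader A fetches the object...
tag_b, _ = get("report.csv")   # ...and so does reader B

# Writer A commits first; its guard still matches, so the write lands.
put("report.csv", b"v2-from-A", if_match=tag_a)

# Writer B's guard no longer matches: the lost-update race is caught
# instead of silently clobbering A's change.
try:
    put("report.csv", b"v2-from-B", if_match=tag_b)
    raced = False
except RuntimeError:
    raced = True
print(raced)  # True
```

On a real bucket you'd carry the ETag from `GetObject` and retry the read-modify-write loop on conflict; the sketch just isolates the compare-and-swap idea.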
You might also run into dated advice about consistency. For years, S3 offered only eventual consistency for overwrites and deletes, so an application could read a stale object right after another one updated it. That changed in December 2020: S3 now provides strong read-after-write consistency for PUTs and DELETEs, so a successful write is immediately visible to subsequent reads. What hasn't changed is coordination. If two writers race on the same key, the last write silently wins, and there's no built-in way to serialize them.

With NFS, a saved file is generally visible to other clients right away (subject to close-to-open consistency), and SMB manages access and updates at an even more granular level through oplocks and leases. If you're building something like a continuous integration pipeline, where multiple services write to and read from the same files, the lack of cross-writer coordination in S3 would make me quite nervous.
Another technical limitation to consider is performance. For workloads that require frequent small, random reads and writes, you'll notice a real gap between S3 and a traditional file system like NFS or SMB. Every interaction with S3 is an HTTPS request to a remote REST API, so each operation carries request and network overhead that a local mount largely hides, and that per-request cost becomes a bottleneck quickly when you're dealing with thousands of small files.
Take a file server scenario, for example. If you have a workload that pulls and pushes files constantly, NFS and SMB are built for that kind of efficiency: the protocols optimize data transfer for those operations and support client-side caching that you just don't get with S3. If you're running a virtual machine that needs rapid access to disk images, NFS or SMB can lean on caching and direct file operations, while S3 would force you to shuffle far more data around, resulting in poorer responsiveness.
Speaking of caching, consider the caching layer that NFS and SMB clients usually provide. They cache file data and metadata locally, significantly reducing the number of calls that have to go over the network. If I'm working on applications that repeatedly hit the same set of data, I want that data available locally, which isn't something S3 is optimized for. Each access is a fresh REST API call, and over time the sheer volume of round trips (each one adding latency, and each one billed) can seriously degrade performance.
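As a toy illustration of why that caching layer matters, here's a read-through cache in front of a simulated store. The dict with a call counter stands in for billable GET round-trips; a real version would also need TTLs and invalidation, which is exactly the complexity the file protocols absorb for you.

```python
# Simulated object store: each miss "costs" a network round-trip.
calls = {"get": 0}
store = {"config.json": b'{"retries": 3}'}

def s3_get(key):
    calls["get"] += 1          # count every remote fetch
    return store[key]

cache = {}

def cached_get(key):
    if key not in cache:       # only the first access hits the store
        cache[key] = s3_get(key)
    return cache[key]

# A hot path that reads the same object a thousand times...
for _ in range(1000):
    cached_get("config.json")

print(calls["get"])  # 1 -- not 1000
```

Without the cache, every one of those thousand reads would have been a separate HTTPS request; an NFS or SMB client gives you roughly this behavior transparently.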
On the security front, NFS and SMB also offer a layer of protection that is more aligned with the way file-level permissions work in traditional environments. You can set permissions at a per-file or per-directory level, allowing for stricter access control based on your organizational roles. In contrast, S3 has its own way of handling access permissions through bucket policies or IAM roles, which can sometimes feel clunky depending on your use case. You can easily get into situations where users end up either over-privileged or under-privileged because the permission model is less intuitive for shared access setups.
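For a flavor of the S3 side of that comparison, here's a sketch of a bucket policy that scopes one role to one key prefix, which is about as close as you get to a per-directory permission. The bucket name, account ID, and role name are hypothetical placeholders.

```python
import json

# Bucket policy granting a (hypothetical) analytics role read/write
# access only under the teams/analytics/ prefix -- the prefix plays
# the role a directory ACL would play on NFS or SMB.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "TeamPrefixOnly",
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::123456789012:role/analytics"},
        "Action": ["s3:GetObject", "s3:PutObject"],
        "Resource": "arn:aws:s3:::example-bucket/teams/analytics/*",
    }],
}

doc = json.dumps(policy, indent=2)
print("teams/analytics" in doc)  # True
```

Notice there's no notion of ownership or mode bits per object here; everything hangs off key patterns and principals, which is why mapping an existing role hierarchy onto buckets can feel clunky.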
You might also find that working with S3 adds complexity around metadata. If you're accustomed to file properties like timestamps, ownership, or permissions, then when you put or get objects in S3 you're left to either reinvent the wheel or lean on additional services to manage those attributes. A basic text file on NFS or SMB carries timestamps and ownership for free, which makes change tracking and auditing straightforward. In S3, you have to write logic around each object to manage its metadata, and user-defined metadata can't even be updated in place; changing it means copying the object over itself. That complicates bulk operations or any workflow that needs to carry those attributes along.
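As one way to carry those attributes along, here's a small sketch that packs POSIX `stat` fields into the string-to-string dict S3 accepts as user-defined metadata (surfaced as `x-amz-meta-*` headers at PUT time). No network calls here, just the packing logic your upload path would have to own.

```python
import os
import stat
import tempfile

def posix_metadata(path):
    """Pack the POSIX attributes an NFS/SMB server tracks for you into
    the string->string dict S3 accepts as user metadata. S3 stores but
    never maintains these; every update after that is on you."""
    st = os.stat(path)
    return {
        "mtime": str(int(st.st_mtime)),
        "uid": str(st.st_uid),
        "gid": str(st.st_gid),
        "mode": oct(stat.S_IMODE(st.st_mode)),
    }

# Stat a scratch file as if it were about to be uploaded.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"hello")
    path = f.name

meta = posix_metadata(path)
print(sorted(meta))  # ['gid', 'mode', 'mtime', 'uid']
```

With boto3 you'd pass a dict like this as the `Metadata` argument to `put_object`; the catch is that nothing keeps it in sync with reality after the upload, unlike a real file system.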
If real-time collaboration is part of your application's requirements, you'll find it very hard to build on S3 directly. NFS and SMB are built to handle concurrent access: if two users edit a file at the same time, byte-range and whole-file locks let their applications keep the writes from colliding. With S3, you'd need an entirely different architecture to accommodate that kind of workflow, likely incorporating additional services to handle merges or track changes efficiently.
When you're looking at these differences, keep in mind that the issues with S3 aren't necessarily insurmountable, but they do add significant overhead in terms of complexity and development effort. Depending on your context and requirements, you may need a lot of extra code to simulate the file system behaviors that you get almost "for free" with NFS or SMB.
If you're building something new or working within an existing architecture, consider what kind of file behaviors your applications expect. If you've got legacy applications or systems designed for shared file access, integrating them with S3 takes careful consideration of these differences. Ask yourself whether the trade-offs in flexibility, ease, and performance are worth it for your specific use case. If you can afford the upfront investment in architecture and code to handle these differences, S3 may still fit within your overall strategy. Just don't expect it to replace the flexibility you're used to with NFS and SMB right out of the box. It's a different ballgame, and you'll need to actively manage the challenges that come with object storage.