Whether you need managed file replication is an important question for anyone planning hosting infrastructure for high-capacity horizontal scaling. File replication is a fundamental component of load balanced environments: it keeps the files on every cluster member in sync so that the workload can be shared among them.
In this guide, we’ll be taking a look at managed file replication, what it is, and when you need it. NameHero offers Managed File Replication on our higher tier enterprise hosting packages if you’re interested in learning more.
What is File Replication?
File replication is the practice of tracking and replicating file system changes across two or more independent locations. It can be used locally to keep two separate target paths synchronized. Any changes made to one location are replicated to the other in either a one-way or two-way replication setup. More often, file replication is used over a private network as part of a load balanced server cluster. Load balanced infrastructure requires a synchronized document root so that visitors get an identical experience no matter which node in the cluster their connection happens to land on.
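To make the one-way case concrete, here is a minimal Python sketch that keeps a target directory in step with a source directory. It is an illustration only, not the tooling NameHero runs. The function name `replicate_one_way` is hypothetical, it rescans the whole tree on each call instead of tracking changes as they happen, and it does not propagate deletions.

```python
import filecmp
import shutil
from pathlib import Path


def replicate_one_way(source: Path, target: Path) -> list[str]:
    """Copy new or changed files from source to target; return the paths copied."""
    copied = []
    for src_file in sorted(source.rglob("*")):
        if not src_file.is_file():
            continue
        relative = src_file.relative_to(source)
        dst_file = target / relative
        # Copy when the file is missing on the target or its contents differ.
        if not dst_file.exists() or not filecmp.cmp(src_file, dst_file, shallow=False):
            dst_file.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src_file, dst_file)  # copy2 also preserves timestamps
            copied.append(relative.as_posix())
    return copied
```

Calling the function a second time with no changes in between copies nothing, which is the property a replication daemon relies on when it runs repeatedly.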
What is Managed File Replication?
Managed File Replication (MFR) is an enterprise-grade hosting service provided by NameHero. Our everyday heroes monitor and maintain file replication clusters, ensuring they remain online, healthy, and synchronized, which is a necessity for any website or application at scale. NameHero takes care of the setup, configuration, monitoring, and maintenance of the software daemons responsible for the replication process. Our heroes will act to correct any problems detected with the replication cluster automatically. MFR gives website owners peace of mind that their replication needs are being met without having to pay for development work to maintain the replication system themselves.
Managed File Replication Types
One-Way Replication
- Unidirectional
- Master/Slave Archetype
- Replication Delay
- Single Point of Failure
- 7th-layer Load Balancing
Two-Way Replication
- Bidirectional
- Node Equality
- Near Real-time Sync
- File Collisions
- 4th-layer Load Balancing
When To Use Managed File Replication?
File replication is necessary when load balancing web server clusters. Each node in the cluster must maintain an up-to-date copy of all file system changes made to the document root. Otherwise, one server will not be aware of changes made on the others, and visitors who land on a node with outdated files will be served a 404 or other HTTP error.
Example: An end-user uploads a picture to the site or app, and the upload lands on a single server in the cluster. That server is the only one aware of the new file. When visitors are assigned to a node other than the one that handled the upload, the file is missing, which results in a 404 error and a broken or missing picture.
Vanishing Image Phenomenon – A telltale sign that your website/application needs a Managed File Replication solution is the vanishing images phenomenon. When an image is uploaded to one node in a web cluster and not replicated to all other nodes, visitors will report images that appear and disappear between page clicks.
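The vanishing image effect is easy to reproduce in a toy model. The sketch below is a simulation under stated assumptions, not a real load balancer: the `Cluster` class and the node names are hypothetical, each document root is modeled as a set of file names, and requests are handed out round-robin.

```python
from itertools import cycle


class Cluster:
    """Toy cluster: each node's document root is a set of file names."""

    def __init__(self, names):
        self.doc_roots = {name: set() for name in names}
        self._next_node = cycle(names)  # round-robin load balancing

    def upload(self, filename):
        # The upload lands on whichever node the balancer picks.
        node = next(self._next_node)
        self.doc_roots[node].add(filename)
        return node

    def get(self, filename):
        # Return the HTTP status the chosen node would answer with.
        node = next(self._next_node)
        return 200 if filename in self.doc_roots[node] else 404

    def replicate(self):
        # Bring every document root up to date with every other one.
        everything = set().union(*self.doc_roots.values())
        for files in self.doc_roots.values():
            files.update(everything)


cluster = Cluster(["web1", "web2", "web3"])
cluster.upload("cat.jpg")                                # lands on web1
before = [cluster.get("cat.jpg") for _ in range(3)]      # web2, web3, web1
cluster.replicate()
after = [cluster.get("cat.jpg") for _ in range(3)]
print(before, after)  # [404, 404, 200] [200, 200, 200]
```

Before replication the picture is only visible on every third request; after replication every node serves it.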
Differences Between One-Way vs Two-Way Replication
As mentioned earlier, there are two basic MFR models to consider. Each can be used to synchronize your file system changes across any number of web server nodes in a cluster. We will go over the basics of these archetypes, their strengths, weaknesses, and intended use cases.
One-way File Replication
Unidirectional – Unidirectional replication restricts file system changes to a single primary (master) node in the cluster. That node is then responsible for replicating those changes to the other cluster members.
Master/Slave Archetype – The practice of utilizing a master or primary node in a cluster. The primary node in the cluster is responsible for handling all write-based and synchronization operations in the cluster. It is the authority for file system changes and will overwrite any changes made to secondary/slave members in the cluster.
Replication Delay – The one-way model introduces an inherent delay in replication, most noticeable when a large number of file changes happen in a short period of time. It takes time to both detect those changes and replicate them to every other node in the cluster. Most of the time, the delay is minimal and goes unnoticed by end-users.
Single Point of Failure – The presence of one primary node to rule them all means that the primary node becomes a single point of failure in the cluster. If there are problems with that specific node, all traffic that writes data will fail and throw errors due to the missing server.
7th-layer Load Balancing – One-way replication necessitates the use of 7th-layer load balancing techniques, such as Traffic Pinning, to ensure all write-based requests are properly funneled to the primary node in the replication cluster.
7th-Layer Load Balancing refers to the Application Layer, the seventh layer of the OSI Model. Load Balancing at the Application Layer requires the load balancer device to decrypt inbound HTTPS traffic. Decrypting traffic and inspecting HTTP headers to determine whether a request must be pinned to the primary node comes with a slightly heavier computational workload for the load balancer device.
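The routing decision behind traffic pinning can be sketched in a few lines of Python. This is a simplified illustration that assumes write requests can be recognized by their HTTP method alone; real deployments may also key on paths, cookies, or other headers. The names `make_router` and `WRITE_METHODS` are hypothetical.

```python
from itertools import cycle

# Assumption: these HTTP methods are the ones that modify files.
WRITE_METHODS = {"POST", "PUT", "PATCH", "DELETE"}


def make_router(primary, secondaries):
    """Return a function that picks a node for a request based on its HTTP method."""
    read_pool = cycle([primary, *secondaries])  # reads may go to any node

    def route(method):
        if method.upper() in WRITE_METHODS:
            return primary        # pin every write to the primary node
        return next(read_pool)    # spread reads round-robin

    return route


route = make_router("primary", ["node2", "node3"])
print(route("POST"))                      # primary
print([route("GET") for _ in range(4)])   # ['primary', 'node2', 'node3', 'primary']
```

Note that the router has to see the request method before it can decide, which is why this approach only works once the load balancer has decrypted the traffic.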
Two-way File Replication
Bidirectional – Replication that opens a two-way street between all cluster members in a file replication cluster. Each cluster member is able to synchronize its own file system changes over to every other member of the cluster.
Node Equality – Two-way replication does not rely on the master/slave archetype, so there are no primary or secondary node assignments. Each node in the cluster is equal to the others and runs its own daemon for tracking and synchronizing changes. Those changes are then immediately replicated to each member of the cluster over a private network.
Near Real-time Sync – Bidirectional synchronization reduces the overhead of having to rely on a single primary cluster member to track changes. This brings delays in file replication down further, making them appear close to real-time synchronization.
File Collisions – One of the more difficult hurdles to overcome with two-way replication is the reconciliation of file collisions. These occur in two-way replication systems when the same file is modified on more than one cluster member at the same time. Typically, file changes are prioritized by timestamp, with the most recent change taking precedence over any others. However, when the timestamps are identical, we reach a race condition where another means of reconciliation is necessary.
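A timestamp-based reconciliation rule can be sketched as follows. The `FileChange` record and the `resolve_collision` function are hypothetical, and the tie-breaker used here (the node name that sorts last wins) is only one possible choice, picked because it gives every node the same answer; it is not a description of how any particular replication daemon resolves ties.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileChange:
    path: str
    modified_at: float  # modification time, seconds since the epoch
    node: str           # cluster member that made the change
    content: bytes


def resolve_collision(changes):
    """Pick the winning change for one path.

    The most recent timestamp wins. On an exact tie, fall back to the node
    name so that every node independently reaches the same decision.
    """
    return max(changes, key=lambda c: (c.modified_at, c.node))


older = FileChange("index.html", 100.0, "web1", b"A")
newer = FileChange("index.html", 105.0, "web2", b"B")
tied = FileChange("index.html", 105.0, "web3", b"C")

print(resolve_collision([older, newer]).node)  # web2
print(resolve_collision([newer, tied]).node)   # web3
```

The important property is determinism: whichever order the competing changes arrive in, every cluster member keeps the same version of the file.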
4th-Layer Load Balancing – Two-way replication can take advantage of reduced computational overhead at the load balancer device by eliminating the need for 7th-Layer Load Balancing techniques.
4th-Layer Load Balancing refers to the Transport Layer in the seven layers of the OSI Model. Load Balancing at the Transport Layer can be achieved without the overhead of decryption.
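For contrast with application-layer routing, here is a minimal sketch of a transport-layer decision. It assumes the balancer chooses a backend by hashing the client's address and port, which is one common strategy among several; the function name `pick_backend` and the addresses are hypothetical.

```python
import hashlib


def pick_backend(client_ip, client_port, backends):
    """Choose a backend from connection details only, never from the payload."""
    key = f"{client_ip}:{client_port}".encode()
    digest = hashlib.sha256(key).digest()
    # Map the hash onto the list of backends.
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


backends = ["web1", "web2", "web3"]
print(pick_backend("203.0.113.7", 51000, backends))
```

Because the decision uses nothing beyond the address and port, the encrypted request can be passed straight through to the chosen node, and any node can accept it since every member of a two-way cluster handles writes.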
When To Use One-Way & Two-Way Replication?
The general consensus here is that one-way replication is for read-heavy workloads, while two-way replication is needed for write-heavy workloads. However, your mileage may vary.
Two-way replication is the superior form, as it can also easily handle read-heavy workloads without any degradation in performance. However, due to the tricky nature of file collisions and file system delete operations, two-way replication is considerably riskier: if not handled properly, it has the potential to delete all files in the file replication cluster.
One of the biggest factors that determines your replication needs is how well your application is able to handle the horizontal scaling model. Not all software is created equal. In fact, I’d argue that most software, such as WordPress and its themes and plugins, is not written with a load balanced cluster in mind. Each instance of these applications running on an individual server expects to be the only authority governing file system changes. Managed File Replication becomes a crutch in these scenarios, helping the application replicate changes across multiple backend nodes without the application having to be developed as cluster-aware.
Conclusion
We hope this article helps you learn a little bit more about managed file replication and whether it’s right for your website. Growing businesses and websites have to stay on top of these terms and concepts to ensure uptime and an optimal customer experience, but they can sometimes be a lot more complicated than what first meets the eye. If you need help figuring out whether your website needs Managed File Replication, please contact a member of the NameHero team about our enterprise hosting packages.
Jason Potter is a Senior Linux Systems Administrator & Technical Writer with more than 20 years of experience providing technical support to customers. He has a passion for writing competent and thorough technical documentation at all skill levels.