Why Today’s Cloud Storage has Inconsistent Performance—And How to Fix It

Trisha Winter

February 2, 2023

When data is routed through the internet, it passes through many intermediaries over a number of hops, any one of which could become a bottleneck. Typical cloud data storage built on data centers routes traffic through a hub-and-spoke model, which places performance at the mercy of the hub and spokes.

How a download from a data center traverses the hub and spokes of the internet

Any slowdown along the intended route may cause the data to be rerouted onto a longer or slower path, or to stay on its path at a lower speed. In a recent webinar comparing AWS performance to Storj, Jacob Willoughby, CTO of Storj, described common issues: “Some of these links get full, the ISPs don’t immediately go out and fix them, they go down or are not available at the time. The internet is really the wild west as far as how these things are connected and operated.”

Willoughby went on to say, “This problem is exacerbated if you have transatlantic cables or if you are storing your data in a single region and trying to serve it around the globe. Fundamentally, the longer the distance between the data center and the destination, the more inconsistent the download performance.”

As data distribution has become more global, replicating data to solve these problems has become incredibly costly. Decisions about cloud data storage now have a big impact on end-user experience, particularly where end users are located far from a regional data center. This article covers how companies can eliminate inconsistent performance and gives the high-level performance results from the on-demand webinar: Want reliable performance? How Storj is consistently faster than AWS.

How Storj avoids the problems of the hub and spoke pathways

Storj is decentralized and distributed cloud storage that utilizes underused storage capacity across the globe. Storj uses erasure coding to break data into small, encrypted segments and distributes those segments across its global network of tens of thousands of storage nodes. When data is requested, Storj requests segments from many nodes in parallel, but only needs the fastest 29 segments to rebuild the file.
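
To make the "fastest 29" idea concrete, here is a minimal sketch of the pattern: ask more nodes than you need, keep the first responses, and drop the stragglers. It is not Storj's client code. The node latencies are simulated with random delays, and N_REQUESTED (how many nodes get asked) is a hypothetical value picked for illustration; only the 29 comes from the text above.

```python
import asyncio
import random

K_NEEDED = 29     # from the article: the fastest 29 segments rebuild the file
N_REQUESTED = 40  # hypothetical: how many nodes are asked in parallel


async def fetch_segment(node_id: int, rng: random.Random) -> int:
    """Simulate one node's response; the delay is made up, not measured."""
    await asyncio.sleep(rng.uniform(0.001, 0.05))
    return node_id


async def fetch_fastest(n: int, k: int, seed: int = 0) -> list[int]:
    """Request from n nodes at once, keep the first k responses, cancel the rest."""
    rng = random.Random(seed)
    tasks = [asyncio.create_task(fetch_segment(i, rng)) for i in range(n)]
    winners: list[int] = []
    for finished in asyncio.as_completed(tasks):
        winners.append(await finished)
        if len(winners) == k:
            break
    for task in tasks:
        task.cancel()  # stragglers are no longer needed; no-op for finished tasks
    await asyncio.gather(*tasks, return_exceptions=True)
    return winners


if __name__ == "__main__":
    got = asyncio.run(fetch_fastest(N_REQUESTED, K_NEEDED))
    print(f"rebuilt from {len(got)} of {N_REQUESTED} requested segments")
```

The point of the design is that the download finishes when the 29th response lands, so one slow node or one congested link does not hold up the file.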

How a download works using Storj distributed cloud storage

So when a download is requested on Storj, the segments located closest to the destination are typically used to satisfy the request as illustrated in the graphic here. Storj can leverage wider interconnectedness, decreasing overall distance and dependency on individual routers and endpoints.

So how does Storj get great performance when the data is encrypted, erasure-coded and distributed over a global network of storage nodes? By being designed for it.

Here’s how Storj gets consistently fast performance for large files:

Optimizes for less bandwidth use - Erasure codes have a lower expansion factor than replication while keeping durability high, which reduces the amount of data that must be transmitted.
Minimizes coordination - Metadata storage is in a simple federated model, instead of coordination-heavy globally shared ledgers or similar.
Avoids long-tail variance - Uploading and downloading from excess nodes lets us care only about the fastest nodes in any set, turning variance into a strength.
Maximizes parallelism - Brings smaller segments of the data across many pathways simultaneously, and the fastest segments to arrive are used to build the file. Basically, the internet is big and we get to use all of it.
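
Two of the points above can be put into rough numbers. The sketch below compares the expansion factor of a k-of-n erasure code with replication that survives the same number of losses, and then simulates how much the tail latency shrinks when a download asks a few spare nodes. The 29 comes from the article; n = 80, the 10 spare nodes, and the lognormal latency model are all assumptions chosen for illustration, not Storj's published parameters or measurements.

```python
import random


def expansion_factor(k: int, n: int) -> float:
    """Bytes stored per original byte when any k of n segments rebuild the data."""
    return n / k


def copies_for_same_tolerance(k: int, n: int) -> int:
    """Full replicas needed to survive the same number of losses (n - k)."""
    return n - k + 1


def wait_for_kth(k: int, n: int, rng: random.Random) -> float:
    """Time until the k-th fastest of n parallel responses has arrived."""
    latencies = sorted(rng.lognormvariate(0.0, 1.0) for _ in range(n))
    return latencies[k - 1]


def p99(samples: list[float]) -> float:
    """99th percentile by simple index into the sorted samples."""
    ordered = sorted(samples)
    return ordered[int(0.99 * (len(ordered) - 1))]


K = 29         # from the article
N_STORED = 80  # assumed for illustration
SPARES = 10    # assumed number of extra nodes asked per download
TRIALS = 2000

rng = random.Random(42)
no_spares = [wait_for_kth(K, K, rng) for _ in range(TRIALS)]             # must wait for the slowest node
with_spares = [wait_for_kth(K, K + SPARES, rng) for _ in range(TRIALS)]  # slowest 10 are ignored

print(f"erasure code {K}-of-{N_STORED}: {expansion_factor(K, N_STORED):.2f}x stored")
print(f"replication with equal loss tolerance: {copies_for_same_tolerance(K, N_STORED)}x stored")
print(f"p99 wait without spares: {p99(no_spares):.2f}")
print(f"p99 wait with spares:    {p99(with_spares):.2f}")
```

Under these assumptions the slowest responses simply never matter, which is what "turning variance into a strength" amounts to.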

Storj vs. AWS - the results are in!

A benchmark was performed to compare S3 single-region storage with Storj distributed storage. Storj has more consistent throughput because it pulls from many nodes simultaneously, drastically increasing the probability of uncongested paths. S3 performance is affected by intermittent internet slowdowns. This is especially true when the S3 origin is far away from the download location.
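
The "probability of uncongested paths" argument can be sketched with a simple binomial model. This is a back-of-the-envelope illustration, not the benchmark's method: it assumes every path is congested independently with the same probability, which real networks do not guarantee, and both the 10% congestion rate and the 40 parallel paths are hypothetical. Only the 29 needed segments comes from the article.

```python
from math import comb


def p_at_least_k_clear(n: int, k: int, p_congested: float) -> float:
    """Probability that at least k of n independent paths are uncongested."""
    p_clear = 1.0 - p_congested
    return sum(
        comb(n, i) * p_clear**i * p_congested ** (n - i)
        for i in range(k, n + 1)
    )


P_CONGESTED = 0.10  # hypothetical per-path congestion probability

single_path = p_at_least_k_clear(1, 1, P_CONGESTED)   # one origin, one route
many_paths = p_at_least_k_clear(40, 29, P_CONGESTED)  # any 29 of 40 routes will do

print(f"single path clear:      {single_path:.4f}")
print(f"29 of 40 paths clear:   {many_paths:.4f}")
```

With a single origin, every congested moment is felt by the download. With many paths and spare capacity, a download only suffers when a large share of the paths are congested at the same time.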

AWS single region storage versus Storj distributed storage - download performance across distances

Benchmarks are great, but what do customers actually see?

Storj looked at 301,859 sampled download events from customers who opted in to telemetry. With the traffic grouped into 1,000 km distance bands measured from Portland, customers see consistent performance independent of distance.
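
For readers who want to run the same kind of analysis on their own download logs, here is a minimal sketch of grouping events into 1,000 km bands and taking the median speed per band. The function name and the (distance, speed) event shape are assumptions, and the sample values are invented purely to exercise the code; they are not Storj's telemetry.

```python
from collections import defaultdict
from statistics import median

BIN_KM = 1000  # band width used in the article


def median_speed_by_distance(events):
    """Group (distance_km, mbps) events into 1,000 km bands; return median mbps per band."""
    bins = defaultdict(list)
    for distance_km, mbps in events:
        band_start = int(distance_km // BIN_KM) * BIN_KM
        bins[band_start].append(mbps)
    return {start: median(speeds) for start, speeds in sorted(bins.items())}


# Invented sample events (distance from the reference city in km, speed in Mbps).
sample = [(120, 95.0), (870, 101.0), (1450, 98.0), (2300, 97.0), (2950, 103.0)]
print(median_speed_by_distance(sample))
```

A flat line across the bands is what "performance independent of distance" looks like in this view.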

Real customer download speeds across distances on Storj distributed storage

This performance was also validated independently by Dr. Antonin Portelli at the University of Edinburgh, who simulated distribution of high-energy particle physics datasets within DiRAC. Dr. Portelli’s study demonstrated enormous global throughput gains thanks to the parallelism used by Storj.

Excerpts from Univ. of Edinburgh study on Storj upload/download performance

Taking a holistic look at this data, it is clear that the architecture of Storj distributed storage is far more consistent in data transfer performance than AWS or any traditional cloud storage model using data centers.

AWS has a lot of money, can’t they make improvements?

Actually, no. The fundamental model of data center storage has clear limitations. Can they build more data centers? Absolutely! But this only increases the cost of storage as companies would need to replicate their data and store it in all the regional data centers to get global consistency in data transfer performance.

Realistically, the workaround that companies use today is to create multiple replicas of data at different origin servers in different geographies. Willoughby says this introduces a lot of inconsistency: “You have to wait for synchronization of that data between locations for multi-region storage. That means you have to code your application to handle asynchronous data to try to provide customers with consistent performance.”

Willoughby went on to explain that poor performance from an origin server requires a lot of caching at PoPs (points of presence). “AWS, as well as the other hyperscalers, are already utilizing caching to speed up performance as much as possible. They truly have no room for improvement in their model.”

The bottom line of this “band-aid” solution is a multiplicative cost for AWS users. For 1 TB of data, Storj costs $4 for storage and $7 for egress, while AWS costs $23 for storage and $90 for egress. When replication is added, AWS costs another $23 for every region you replicate to, plus another $10 in intra data center charges. That becomes incredibly expensive for AWS customers.
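
The arithmetic behind those figures is easy to check. The sketch below uses only the prices quoted above, and it makes two assumptions the article does not spell out: that the full terabyte is downloaded once, and that the $10 charge applies once per additional region. The article does not state the billing period, so the totals carry whatever period those prices assume.

```python
# Prices per TB as quoted in the article (USD).
STORJ_STORAGE, STORJ_EGRESS = 4, 7
AWS_STORAGE, AWS_EGRESS = 23, 90
AWS_REPLICA_STORAGE, AWS_REPLICA_TRANSFER = 23, 10


def storj_cost(tb: float = 1.0) -> float:
    """Storage plus one full download; no replication needed."""
    return tb * (STORJ_STORAGE + STORJ_EGRESS)


def aws_cost(tb: float = 1.0, extra_regions: int = 0) -> float:
    """Storage plus one full download, plus per-region replication charges.

    Assumes the transfer charge is paid once per extra region.
    """
    replication = extra_regions * (AWS_REPLICA_STORAGE + AWS_REPLICA_TRANSFER)
    return tb * (AWS_STORAGE + AWS_EGRESS + replication)


for regions in (0, 1, 2):
    print(f"AWS with {regions} extra region(s): ${aws_cost(extra_regions=regions):.0f}"
          f"  vs  Storj: ${storj_cost():.0f}")
```

Under these assumptions, each additional region adds $33 per terabyte on the AWS side, while the Storj figure does not change.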

The future is even faster with Storj distributed storage

The data clearly shows that Storj has great consistency in transfer speeds across geographies. But is that the best we can do? Willoughby says it's only going to get faster. “Storj has lots of ways to improve performance where we’ll be able to meet or exceed download speeds of those from requests made right near a data center, let alone those further away.”

Here are some of the ways Storj will continue to improve performance:

Performance and consistency naturally improve as the network grows
Eliminating excess protocol/database round-trips and unnecessary pipeline stalls
Better default erasure code parameters
Dynamic long tail cancellation
More intelligent node selection and data placement
Dynamic segment sizing
Dynamic erasure coding parameters based on object size and other characteristics
Caching hot content on additional closer nodes
Growth in the number of nodes and expansion of global footprint

Storj is more consistent today and will be even faster in the future. And it will do this at drastically lower costs than AWS and its ilk because no replication is needed and the model is significantly less expensive to operate. Consistent performance and low cost make for a bright future for cloud storage and data innovation.

The webinar concluded with lots of great questions about Storj parameters, erasure coding, and even IPFS and Filecoin. To get the full details from the webinar including customer case studies and the complete Q&A, watch the on-demand webinar here.
