Storj DCS Performance
Decentralized storage relies on parallelism, which has a significant effect on performance. As you will read below, we can apply parallelism at several layers to get the highest possible performance. This is made possible by the distributed nature of the network, which lets you retrieve your data from potentially hundreds or thousands of Nodes, at a speed limited only by your local compute or network resources.
Files -> Segments -> Pieces
- Files are the items you upload to the network. Files are broken down into segments.
- Files are represented by the green box in the diagram below.
- Files are made of one or more segments. Our segment size is 64MB, so a 256MB file, for example, is made up of four 64MB segments.
- Segments are represented by the blue box in the diagram below.
- Segments are made up of pieces. For each segment sent out to the network, 80 pieces are distributed to Nodes.
- Pieces are represented by the red box in the diagram below.
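To make the Files -> Segments -> Pieces breakdown concrete, here is a minimal sketch of the arithmetic. It uses only the 64MB segment size and the 80 pieces per segment mentioned above; the function names are made up for illustration, and the sketch ignores any special handling the network may apply to very small files.

```python
import math

SEGMENT_SIZE_MB = 64      # segment size stated above
PIECES_PER_SEGMENT = 80   # pieces distributed to Nodes per segment, as stated above


def segment_count(file_size_mb):
    """Number of segments for a file; a file always has at least one segment."""
    return max(1, math.ceil(file_size_mb / SEGMENT_SIZE_MB))


def stored_piece_count(file_size_mb):
    """Total pieces distributed to Nodes for the whole file."""
    return segment_count(file_size_mb) * PIECES_PER_SEGMENT


for size_mb in (10, 256, 512, 1024):
    print(f"{size_mb}MB -> {segment_count(size_mb)} segments, "
          f"{stored_piece_count(size_mb)} pieces")
```

A file that is not an exact multiple of 64MB simply ends with a shorter final segment, which is why the segment count is rounded up.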
A theme that will come up time and time again when we talk about performance is parallelism. In this guide, the terms parallelism and concurrency are used interchangeably. We’ll talk about base parallelism and our new segment parallelism, which enables new record speeds.
When you upload or download from the Storj network, file segments are sent one after another. However, the pieces that make up each segment are sent in parallel. Tremendous effort has gone into optimizing this process. For instance, when you download a file, we attempt to fetch 39 pieces when only 29 are required, which eliminates slow nodes. This technique is known as “long tail elimination”. This is our base parallelism, and it allows up to 10 nodes to respond slowly without affecting your download speed.
When uploading, we start sending 110 erasure-coded pieces per segment in parallel out to the world, but stop once 80 pieces have been uploaded. This has the same effect as above: slow nodes are eliminated by attempting more connections than are required.
Both examples above are our base parallelism occurring within segments.
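The effect of long tail elimination can be illustrated with a small simulation. This is only a sketch: the latencies are invented numbers, and `segment_ready_time` is a hypothetical helper, not part of any Storj tool. It compares waiting for the fastest 29 of 39 responses against requesting exactly 29 pieces and waiting for every one of them.

```python
import random

REQUESTED = 39  # pieces requested per segment on download, per the text above
REQUIRED = 29   # pieces needed to rebuild the segment


def segment_ready_time(latencies, required=REQUIRED):
    """Time at which the `required`-th fastest response has arrived."""
    if len(latencies) < required:
        raise ValueError("not enough pieces requested to rebuild the segment")
    return sorted(latencies)[required - 1]


rng = random.Random(0)  # fixed seed so the made-up numbers are repeatable
# Hypothetical latencies in seconds: 29 ordinary nodes plus 10 slow ones.
latencies = [rng.uniform(0.1, 0.3) for _ in range(29)]
latencies += [rng.uniform(2.0, 5.0) for _ in range(10)]
rng.shuffle(latencies)

# Long tail elimination: ask all 39 nodes, finish when the fastest 29 reply.
with_elimination = segment_ready_time(latencies)
# No elimination: ask only 29 nodes and wait for the slowest of them.
without_elimination = max(latencies[:REQUIRED])

print(f"with long tail elimination:    {with_elimination:.2f}s")
print(f"without long tail elimination: {without_elimination:.2f}s")
```

Because only the 29th fastest response matters, the 10 slow nodes in this made-up data set never delay the segment when all 39 are asked.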
In addition to the base parallelism, we now have the capability of transferring multiple segments in parallel, which we refer to as segment parallelism. Let’s look at an example for a 512MB file:
A 512MB file is made up of eight 64MB Segments (512/64=8). When downloading a File we use the fastest 29 Pieces. So a 512MB file is 8x 64MB segments of which you’ll retrieve a total of 232 Pieces (8*29=232). To download this file as fast as possible you would request each segment at the same time, in parallel, which results in downloading all 232 Pieces at the same time. This means, in theory, you can download your file eight times faster than if you requested each segment serially.
What if the file were 1GB? Using the math above, you could request up to 16 segments in parallel for peak performance, provided your computer and network can support the increased load, which we’ll explore in more depth in this article.
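The arithmetic above can be written out as a short sketch. It assumes an idealized model in which every segment takes the same time to download and your machine and network are never the bottleneck, so the numbers are an upper bound rather than a prediction. The `download_plan` function and its return layout are illustrative only.

```python
import math

SEGMENT_SIZE_MB = 64
PIECES_NEEDED_PER_SEGMENT = 29  # fastest pieces used per segment on download


def download_plan(file_size_mb, parallelism):
    """Return (segments, pieces, rounds, ideal_speedup) for a download.

    `rounds` is how many batches of segments are fetched one after another
    when at most `parallelism` segments are in flight at once.
    """
    if parallelism < 1:
        raise ValueError("parallelism must be at least 1")
    segments = math.ceil(file_size_mb / SEGMENT_SIZE_MB)
    pieces = segments * PIECES_NEEDED_PER_SEGMENT
    rounds = math.ceil(segments / parallelism)
    ideal_speedup = segments / rounds  # compared with one segment at a time
    return segments, pieces, rounds, ideal_speedup


print(download_plan(512, parallelism=8))    # the 512MB example above
print(download_plan(1024, parallelism=16))  # the 1GB example above
print(download_plan(1024, parallelism=4))   # same file, less parallelism
```

Under this model, the speedup stops growing once the parallelism reaches the number of segments in the file.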
Example of Segment Parallelism
Let’s say you have a 1GB file to upload to Storj DCS. Given that the block size for files on the network is 64MB, your upload will contain 16 such blocks (1024/64=16). Going forward, we’ll refer to these 64MB blocks as segments. With 16 segments to upload, you can send them all at once, provided you have enough bandwidth and memory. Sending multiple segments at a time is covered in detail below and is the key to realizing incredible speeds on the Storj DCS network.
Please note that parallelism as discussed in the paragraph above is ideal for a small number of large files, 64MB and above. If you wish to move a large number of smaller files, please follow our guidelines below, which focus on concurrent file transfer and are better suited to many smaller files.
There are a few ways to interact with the Storj DCS network. Our two main choices are native, where you run our uplink locally, and our hosted Gateway MT, where we run the uplink for you and you connect using the S3 standard. Below we offer greater detail and also provide step-by-step tutorials.
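As a rough sketch of what sending multiple segments at a time looks like, the snippet below splits a file into 64MB byte ranges and hands them to a bounded pool of workers. The `transfer_segment` function is a hypothetical stand-in that only pretends to send data; a real transfer would go through one of the tools described later in this post.

```python
from concurrent.futures import ThreadPoolExecutor

SEGMENT_SIZE = 64 * 1024 * 1024  # 64MB segments, as described above


def segment_ranges(file_size):
    """Yield (offset, length) byte ranges, one per segment."""
    for offset in range(0, file_size, SEGMENT_SIZE):
        yield offset, min(SEGMENT_SIZE, file_size - offset)


def transfer_segment(byte_range):
    """Hypothetical stand-in for sending one segment; returns bytes 'sent'."""
    offset, length = byte_range
    return length


def transfer_file(file_size, parallelism):
    """Transfer all segments, at most `parallelism` at a time."""
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        return sum(pool.map(transfer_segment, segment_ranges(file_size)))


one_gb = 1024 * 1024 * 1024
print(len(list(segment_ranges(one_gb))))                # 16
print(transfer_file(one_gb, parallelism=16) == one_gb)  # True
```

The `parallelism` argument is the knob to tune: a higher value keeps more segments in flight and asks for more bandwidth and memory, so a lower value is the safer choice on a constrained machine.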
- Native (the fastest method for downloading large files)
  - All the major steps, including encryption, erasure coding, and transfer, occur directly between your computer and the Nodes on our network. This method is ideal for downloading large files quickly.
  - Supports parallelism for downloads
  - Has a 2.68x upload multiplier and does not support segment parallelism for uploads
- Gateway MT (the best method for uploading large files)
  - Encryption, erasure coding, and upload to Nodes occur server-side
  - Supports parallelism for uploads and multi transfer for downloads
  - A 1GB upload results in 1GB of data being uploaded to Storage Nodes across the network
  - Based on the S3 standard
- HTTP (alternative for downloading)
  - Encryption, erasure coding, and upload to the Nodes occur server-side
  - An alternative method allowing download parallelism for large files
  - Not as fast as Gateway MT, but faster than no parallelism
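To see why the upload multiplier matters when choosing a method, here is a back-of-the-envelope sketch. The 2.68x figure comes from the list above. Treating Gateway MT as roughly 1x from your side, the 1000 Mbps uplink, and the conversion of 1GB to 8000 megabits are all assumptions made for illustration, and the timing ignores every other overhead.

```python
NATIVE_UPLOAD_MULTIPLIER = 2.68      # stated above for the native method
GATEWAY_MT_UPLOAD_MULTIPLIER = 1.0   # assumption: your machine sends about 1x


def client_upload_gb(file_size_gb, multiplier):
    """Approximate data leaving your machine for one upload, in GB."""
    return file_size_gb * multiplier


def upload_seconds(file_size_gb, multiplier, uplink_mbps):
    """Idealized transfer time, taking 1GB as 8000 megabits."""
    if uplink_mbps <= 0:
        raise ValueError("uplink_mbps must be positive")
    return client_upload_gb(file_size_gb, multiplier) * 8000 / uplink_mbps


UPLINK_MBPS = 1000  # hypothetical uplink speed
for name, multiplier in (("native", NATIVE_UPLOAD_MULTIPLIER),
                         ("Gateway MT", GATEWAY_MT_UPLOAD_MULTIPLIER)):
    sent = client_upload_gb(1, multiplier)
    seconds = upload_seconds(1, multiplier, UPLINK_MBPS)
    print(f"{name}: about {sent:.2f}GB sent, about {seconds:.1f}s")
```

When your uplink is the limiting factor, the method that sends less data from your machine finishes sooner, which is the reasoning behind recommending Gateway MT for large uploads.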
We refer to the components used to interact with the network as our uplink peer class.
There are several tools to interact with Storj DCS. Rclone is the only tool supporting both the Gateway MT and the native methods listed above. We’ll define each tool below and then guide you to your best options in the tuning guides to follow.
Rclone is a command-line program to manage files on cloud storage. It’s a feature-rich alternative to cloud vendors' web storage interfaces. When not using our native uplink, Rclone is our go-to tool. In Rclone, parallelism is known as concurrency. Rclone’s default concurrency for multipart uploads is four.
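As a small illustration of raising that concurrency, the sketch below builds (but does not run) an Rclone command for an S3-style remote such as Gateway MT. The `--s3-upload-concurrency` and `--s3-chunk-size` flags belong to Rclone's S3 backend. The remote name `storj-gw`, the bucket, the file path, and the 64M chunk size chosen to line up with the segment size are assumptions for this example.

```python
import shlex


def rclone_copy_command(source, destination, concurrency=4, chunk_size="64M"):
    """Build an rclone copy command for a multipart upload, without running it."""
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    return [
        "rclone", "copy",
        "--s3-upload-concurrency", str(concurrency),  # parts uploaded at once
        "--s3-chunk-size", chunk_size,                # size of each part
        source, destination,
    ]


# Hypothetical local file and remote; adjust both to your own setup.
cmd = rclone_copy_command("./big-file.bin", "storj-gw:my-bucket", concurrency=16)
print(shlex.join(cmd))
```

Keeping the command as a list makes it easy to pass to `subprocess.run` later, once the remote has been configured and the values have been tuned for your machine.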
The Uplink CLI is our client-side application that supports interacting with the Storj DCS network. This tool supports parallelism for downloads only.
The Storj DCS service allows you to host static objects and other web-delivered assets such as streaming multimedia and large file distribution.
Thanks for reading! This blog is one of a three-part series, so look out for more blogs to come.