Did you happen to miss the first two parts in this blog series? You should probably read them first, because, you know, they're a series. You can find them here:
Hot Rodding Decentralized Storage-Part 3
Download - Choose Your Own Adventure
Let’s start with what your needs are with a focus on the goal of best overall throughput. You likely fall somewhere in the gradient listed below:
- I want to download a single large file
- I want to download many smaller files
Because parallel actions give us great performance, we need to calculate how to get the most out of what you have. We will list a few examples below. Please note these examples focus on the maximum possible performance and can require significant compute resources. We address this in detail in the tuning section. Unlike uploads, where we exclusively use RCLONE, your download requirements will steer us to either the Storj Uplink CLI or RCLONE.
Single to Few Large Files (download)
In testing, we find that download performance can be fully realized with single files as small as 1.5GB. This means if you have the resources (memory and network) you can realize the best theoretical performance with the following figures:
- 1x 1.5GB file uploaded with a Chunk Size of 64M and parallelism of 24 using UPLINK CLI
Many Smaller Files (download)
If downloading many files that are 64MB or smaller, we don’t have the opportunity to use the uplink parallelism feature, as each file is made up of only a single segment. In this case, it is faster to use RCLONE to retrieve our files. RCLONE supports --transfers, which lets us realize parallelism across smaller files by grabbing many files at the same time. Uplink will support this in the future.
Reviewing what we just learned above:
- Files 64MB or smaller are a single segment and thus to push the network we need to move a bunch at the same time.
- Large files can benefit from parallel segment transfers.
- You will use Uplink CLI for large files and RCLONE for small files.
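The rule of thumb above can be sketched as a small shell helper. This is only an illustration: `pick_tool` is a hypothetical function (not part of either CLI), and treating "64MB" as 64 MiB (67108864 bytes) is an assumption.

```shell
#!/bin/sh
# Threshold from the text: files of 64MB or smaller are a single segment.
# Assumption: 64MB is interpreted here as 64 MiB.
SEGMENT_BYTES=$((64 * 1024 * 1024))

# pick_tool SIZE_IN_BYTES -> prints "rclone" or "uplink"
# Hypothetical helper for illustration only.
pick_tool() {
    if [ "$1" -le "$SEGMENT_BYTES" ]; then
        echo "rclone"   # single segment: parallelize across many files
    else
        echo "uplink"   # multiple segments: parallelize within the file
    fi
}

pick_tool 10485760      # a 10 MiB file
pick_tool 1610612736    # a 1.5 GiB file
```

Files that sit right around the threshold are worth testing with both tools, since the best choice also depends on how many of them you have.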
Download Tuning (large files)
UplinkCLI - Native
Native uplink is the best choice for large files as it bypasses our edge service layer and connects you directly to the nodes. This results in the best possible throughput. If downloading several large files, it is commonly faster to apply the parallelism within each file and download the files serially, one after the other.
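A minimal sketch of that serial pattern is shown below. `download_serially` is a hypothetical helper, and the `UPLINK` variable is an assumption added so the commands can be previewed (for example with `UPLINK=echo`) before anything is transferred.

```shell
#!/bin/sh
# Override UPLINK (e.g. UPLINK=echo) to preview commands without running them.
UPLINK="${UPLINK:-uplink}"

# download_serially PARALLELISM DEST_DIR OBJECT...
# Hypothetical helper: fetches each object one after the other,
# applying --parallelism within each individual file.
download_serially() {
    parallelism="$1"
    dest="$2"
    shift 2
    for object in "$@"; do
        # Keep the object's base name as the local file name.
        "$UPLINK" cp --parallelism "$parallelism" \
            "$object" "$dest/$(basename "$object")" || return 1
    done
}
```

With placeholder bucket and object names, a call might look like `download_serially 24 ./downloads sj://my-bucket/a.bin sj://my-bucket/b.bin`. The loop stops at the first failed download so that errors are not silently skipped.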
Download performance is CPU bound as the burden of encryption and erasure coding is handled by the client machine running Libuplink. Compute requirements are as follows:
- One CPU core (not thread) per concurrent operation
If you have a 4-core CPU presenting 8 threads via hyperthreading, you will hit near peak performance with a parallelism setting of 4. You can use higher figures, up to 50% over core count, to marginally increase performance. On a 24-core server, we found great performance at --parallelism 24, with a further 4.5% performance increase when set to --parallelism 32.
Use as many cores as you can. Using --parallelism 24 on 24 cores, we have observed speeds of over 2600Mb/s (325MB/s) with supportive compute and networks. This requires multi-gigabit IP transit.
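The core-count guidance can be turned into a quick calculation. The sketch below is an illustration, not an official sizing tool: `parallelism_range` is a hypothetical helper that prints a baseline (one operation per physical core) and a ceiling (50% over core count).

```shell
#!/bin/sh
# parallelism_range PHYSICAL_CORES -> prints "<baseline> <ceiling>"
#   baseline: one concurrent operation per physical core
#   ceiling:  50% over core count (integer arithmetic)
# Note: tools such as nproc usually count logical CPUs (threads),
# so pass the physical core count here instead.
parallelism_range() {
    cores="$1"
    echo "$cores $((cores + cores / 2))"
}

parallelism_range 4     # the 4-core / 8-thread example
parallelism_range 24    # the 24-core server
```

Start at the baseline and step toward the ceiling only while throughput keeps improving; the gains past core count are marginal.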
Example Commands - Prerequisites
Ultimately your compute, RAM, or internet connection is likely to be the gating factor. We advise that you perform a speed test to understand the capability of your network before testing.
Depending on the integration method, you will have either high or low compute utilization:
- Native
  - This will be the fastest with adequate compute. Increase --transfers until you have reached your throughput goal or limit of acceptable compute.
- Hosted Gateway MT
  - If you have lower compute capability this may be faster. It is worth testing both integration patterns and seeing what works best for you. Increase --transfers until you have reached your throughput goal.
The following command will download 10 files at the same time
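As a minimal sketch of such a command, the helper below wraps `rclone copy` with `--transfers`. The function name `download_many`, the `RCLONE` override variable, and the remote and path names in the usage note are all assumptions for illustration.

```shell
#!/bin/sh
# Override RCLONE (e.g. RCLONE=echo) to preview the command without running it.
RCLONE="${RCLONE:-rclone}"

# download_many TRANSFERS SOURCE DEST
# Hypothetical helper: copies SOURCE to DEST, fetching up to
# TRANSFERS files at the same time.
download_many() {
    "$RCLONE" copy --transfers "$1" "$2" "$3"
}
```

Assuming an rclone remote named `storj` has already been configured, `download_many 10 storj:my-bucket/small-files ./small-files` would download 10 files at a time. Raise the first argument gradually while watching CPU and network utilization.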
Download Tuning HTTP - (Single File)
It is possible to distribute files via the Storj Linkshare Service. It is also possible to use a multi-connection download utility to significantly improve transfer speeds. For our testing we used aria2c. This utility uses very little CPU and has a great effect between 4 and 8 connections. A single connection in testing delivered 18-20MB/s.
# 2 connections (29-35 MB/s)
# 4 connections (70-80 MB/s)
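A multi-connection aria2c download can be sketched as follows. The helper name `fetch_with_connections` and the `ARIA2C` override variable are assumptions for illustration; the two aria2c options are set together because the per-server connection limit and the split count each cap the number of connections actually used.

```shell
#!/bin/sh
# Override ARIA2C (e.g. ARIA2C=echo) to preview the command without running it.
ARIA2C="${ARIA2C:-aria2c}"

# fetch_with_connections CONNECTIONS URL
# Hypothetical helper: downloads URL using CONNECTIONS parallel
# connections to the same server (aria2c allows at most 16).
fetch_with_connections() {
    "$ARIA2C" --max-connection-per-server="$1" --split="$1" "$2"
}
```

With your own Linkshare download link stored in a variable, `fetch_with_connections 4 "$LINKSHARE_URL"` would request the file over 4 connections. `LINKSHARE_URL` is a placeholder, not a real address.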
If you have any questions or comments, feel free to jump into the forum and thanks for reading.