Accelerating Blockchain Node Synchronization with Decentralized Cloud Storage

Standing up new blockchain nodes can be expensive and time-consuming, especially when those nodes are spun up globally in different hosting environments from multiple hosting providers. Achieving a high-growth decentralized network of nodes adds many layers of complexity and cost compared to centralized architectures.

New nodes can take days or weeks to sync, so many organizations use a snapshot of chain state to jumpstart node synchronization, dramatically reducing sync time and accelerating node time-to-value. Chain state snapshots tend to be large files, often ranging from hundreds of GB to 10TB or more. Uploading, storing and transferring files of that size presents a number of challenges:

  • Snapshots are frequently stored on centralized providers like AWS S3 or Google Cloud Storage
  • Using a single data center to store snapshots tends to produce inconsistent download performance, with speed diminishing the further you get from that data center
  • Storage of snapshots can be expensive, for example $23 per TB per month on AWS
  • Snapshot downloads can be even more expensive at $90 per TB
  • For a large-scale node operator running thousands of nodes in many different hosting environments around the world, complexity, cost and process compound, increasing time to value

Storj DCS offers decentralized cloud storage that solves the challenges associated with blockchain node fast sync:

  • Ultra-fast transfer at multi Gb/s speeds from almost any point on the globe
  • Low cost storage that is only $4 per TB per month - 80% less than a single availability zone on AWS S3
  • Download bandwidth that is only $7 per TB - 90% less than egress bandwidth on AWS S3
  • In-stream archive extraction combined with high parallelism for extreme performance, eliminating the extra hard drive space otherwise required to first download and then extract
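
To make the pricing comparison concrete, here is a rough sketch of the arithmetic in shell. The per-TB prices are the ones quoted in the lists above; the function name `monthly_cost` and the example volumes (10 TB stored, 50 TB downloaded in a month) are assumptions chosen purely for illustration.

```shell
#!/usr/bin/env bash
# Rough monthly cost estimate: storage plus download, in whole US dollars.
# Usage: monthly_cost <stored_tb> <downloaded_tb> <storage_price_per_tb> <egress_price_per_tb>
monthly_cost() {
  local stored_tb=$1 downloaded_tb=$2 storage_price=$3 egress_price=$4
  echo $(( stored_tb * storage_price + downloaded_tb * egress_price ))
}

# Hypothetical workload: a 10 TB snapshot downloaded 5 times in a month (50 TB of egress).
aws_cost=$(monthly_cost 10 50 23 90)    # AWS prices quoted above
storj_cost=$(monthly_cost 10 50 4 7)    # Storj DCS prices quoted above

echo "AWS S3:    \$${aws_cost} per month"
echo "Storj DCS: \$${storj_cost} per month"
```

For this hypothetical workload the estimate comes to $4730 versus $390 per month. Note how download volume, not storage, dominates the total when a snapshot is pulled by many nodes.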

From large-scale operators like Ankr, Pocket Network, or C0d3r to blockchains like Harmony, Storj DCS enables node operators to synchronize new nodes in minutes or hours instead of days or weeks.

At the same time, node operators can be more efficient with resources allocated to nodes. Nodes no longer need extra hard drive space to first download a snapshot archive and then uncompress the snapshot to store the blockchain state. Nodes can download and uncompress instream during download, eliminating the need for provisioning double the amount of hard drive space actually needed by nodes.


Previous Solution


Downloading Was Fast But…

In the past, downloading archives (tar/zip) required a decision. If you wanted to go fast, you could use parallelism to accelerate the uplink download, and we have seen speeds north of 6,000Mb/s using high levels of parallelism. However, you had to download the archive first and then extract it separately, which required at least twice the hard drive space.

Limited Storage Space

Alternatively, if you had limited hard drive space, you could pipe STDOUT to Tar, but only with parallelism of up to two (the current segment and the next one queued), which could achieve 200Mb/s.


New Enhanced Solution


Now Downloading Goes Faster!

Uplink version 1.57 or later supports buffering segments in memory allowing parallelism > 2 while piping STDOUT to Tar or any other process. Now you can achieve multi Gb/s speeds when downloading and piping to archive software like Tar, requiring only the space to store the extracted archives.

If your environment has a significant amount of available bandwidth and compute, you can get extremely fast transfer rates with the Storj Uplink client. The best part is you only need to provision as much hard drive space on your node as you need to store the blockchain data.  
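
The pattern itself is ordinary shell piping, sketched below with local stand-ins. Here `cat` plays the role of the download step (an assumption made so the example runs without network access or a Storj account); in a real sync the producer would be the uplink download writing to STDOUT. The function name `stream_extract` and all paths are hypothetical.

```shell
#!/usr/bin/env bash
# Extract a tar stream from STDIN straight into a target directory,
# so the archive itself is never written to the local disk.
stream_extract() {
  local target_dir=$1
  mkdir -p "$target_dir"
  tar -xf - -C "$target_dir"
}

# Demo with local stand-ins (all paths and file names are hypothetical).
work_dir=$(mktemp -d)
mkdir -p "$work_dir/chain"
printf 'block-1\n' > "$work_dir/chain/000001.dat"
printf 'block-2\n' > "$work_dir/chain/000002.dat"
tar -cf "$work_dir/snapshot.tar" -C "$work_dir/chain" .

# `cat` stands in for the download step that writes the archive to STDOUT.
cat "$work_dir/snapshot.tar" | stream_extract "$work_dir/restored"
ls "$work_dir/restored"
```

Because the consumer reads from the pipe, the only disk space used on the receiving side is the extracted data.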

Sample Command

Objects can use a parallelism of up to 16 per GB of archive size. Based on the capability of the machine on which you’re running the uplink client, you can configure parallelism as high as 2x the available thread count. For example, if you had a file of 2GB or more and 16 threads, an ideal setting would be 32. Even settings as low as 8 will offer >1Gb/s performance, so there is a lot of flexibility to tune to your needs and environment.

uplink --access <access-grant> cp --parallelism 32 sj://chains/snapshot.tar - | tar -xz -C ~/.chain/data
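
One way to read the sizing guidance above is as the smaller of two limits: 16 per GB of archive size and twice the thread count. The sketch below encodes that interpretation. The function name `suggest_parallelism` is hypothetical and not part of the uplink CLI, and the result is only a starting point for tuning.

```shell
#!/usr/bin/env bash
# Suggest a --parallelism value as the smaller of:
#   16 per whole GB of archive size, and 2x the available thread count.
# Usage: suggest_parallelism <archive_size_gb> <thread_count>
suggest_parallelism() {
  local size_gb=$1 threads=$2
  local by_size=$(( size_gb * 16 ))
  local by_threads=$(( threads * 2 ))
  if (( by_size < by_threads )); then
    echo "$by_size"
  else
    echo "$by_threads"
  fi
}

suggest_parallelism 2 16     # 2 GB archive, 16 threads: prints 32
suggest_parallelism 500 4    # large archive on a small machine: prints 8
```

On Linux, `nproc` can supply the thread count, for example `suggest_parallelism 500 "$(nproc)"`.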

Important Integration Details

There are a number of integration details that can impact the success of your fast sync implementation. Here are a few important factors to keep in mind:

  • Blockchain nodes store chain state as a large number of small flat files on local storage, so it’s critical to archive those blockchain state flat files to an archive file like Tar
  • Decentralized node ecosystems have nodes running in a wide range of different hosting environments. The most cost-effective approach when managing the creation of node snapshots is to use an environment with low egress fees to create and upload the snapshot to Storj DCS
  • It’s feasible to create a backup of chain state to a certain block height, then create incremental backups, but it does introduce the additional complexity of sequencing, downloading and streaming multiple files when fast-syncing a node
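
As a rough sketch of the first point, the snippet below packs a directory of small files into a single tar archive. The function name `archive_state`, the file names, and the temporary directories are hypothetical, and the step that uploads the resulting archive is intentionally left out.

```shell
#!/usr/bin/env bash
# Pack every file under a state directory into one tar archive.
# -C makes the stored paths relative to the state directory, so the
# archive can later be extracted into any target directory.
# Usage: archive_state <state_dir> <archive_path>
archive_state() {
  local state_dir=$1 archive_path=$2
  tar -cf "$archive_path" -C "$state_dir" .
}

# Demo with stand-in data (hypothetical file names).
state_dir=$(mktemp -d)
out_dir=$(mktemp -d)
for i in 1 2 3 4 5; do
  printf 'state %s\n' "$i" > "$state_dir/file-$i.ldb"
done

archive_state "$state_dir" "$out_dir/snapshot.tar"

# List the archive and count the packed files.
tar -tf "$out_dir/snapshot.tar" | grep -c 'file-'
```

Storing relative paths is a deliberate choice here: it lets the receiving node pick its own data directory at extraction time with `tar -C`.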

Common Integration Patterns


Most Immutable / Private

  • Upload: The most secure implementation is to create a snapshot in an environment, archive the snapshot using Tar, and upload it to Storj DCS via the Uplink CLI or via the native rclone integration.
  • Download: Utilize our Uplink CLI tool with the maximum amount of parallelism and pipe the output to tar (“| tar”). You remain the custodian of your private keys in this process.

Fastest Upload

  • Upload: The most common and fastest implementation is to create a snapshot in an environment, archive the snapshot using Tar, and upload it to Storj DCS via the S3 compatible edge services.
  • Download: Utilize our Uplink CLI tool with the maximum amount of parallelism and pipe the output to tar (“| tar”). You remain the custodian of your private keys in this process.

Decentralized IaaS Advantage

What makes the Storj DCS service so well suited for high-throughput transfer of large objects is the inherently high parallelism in the distributed and decentralized network. While the service is great for blockchain fast sync, those same performance characteristics make the service equally good for other use cases like video streaming, software distribution, and storing large scientific research data sets.

You can try Storj DCS for free - we offer 150GB of storage and bandwidth per month in our free tier. Just sign up at https://storj.io/signup and get started!
