Architecting a Decentralized GitHub Backup

October 18, 2019

GitBackup is a tool that backs up and archives GitHub repositories. It is in the process of backing up the entirety of public GitHub, an estimated 1-2 PB of data, onto the Storj network. As of today, October 18, 2019, the tool has snapshotted 815,200 repositories across more than 150,000 users.

GitHub is the largest store of open source code in the world, with 20 million users and more than 28 million public repositories as of April 2017.  

We believe that this reservoir of free and open source code acts as a digital version of a public good, similar to a developers’ library—a library that empowers software engineers to access the collective knowledge around open source code, development patterns, and free software.

While GitHub is a wonderful service, it’s owned by an agenda-driven global corporation, which makes it a single point of failure that is prone to downtime, blockage, and censorship.

If we want to guarantee the preservation of the work of hundreds of thousands of open source developers, we need to act now! 

Let’s download it all!

We’re currently using to get a list of GitHub usernames that have had a public action since 2015. So far, the 815,200 repositories we’ve backed up constitute about 80 TB of data. We anticipate that the entirety of public GitHub repos is about 1-2 PB, so we still have a way to go.
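To put those numbers in perspective, here is a rough back-of-the-envelope sketch in Python. It uses only the figures quoted above (815,200 repositories, about 80 TB, and an estimated total of 1-2 PB). The function name and the use of decimal units (1 TB = 10^12 bytes) are our own assumptions for illustration, not part of GitBackup itself.

```python
# Back-of-the-envelope progress estimate from the figures in this post.
# Assumption: decimal units (1 TB = 10**12 bytes, 1 PB = 10**15 bytes).
TB = 10**12
PB = 10**15


def backup_progress(repos_done, bytes_done, total_low, total_high):
    """Return (average bytes per repo, lowest and highest fraction complete)."""
    avg_repo_bytes = bytes_done / repos_done
    frac_if_total_is_high = bytes_done / total_high  # pessimistic case
    frac_if_total_is_low = bytes_done / total_low    # optimistic case
    return avg_repo_bytes, frac_if_total_is_high, frac_if_total_is_low


avg, lo, hi = backup_progress(815_200, 80 * TB, 1 * PB, 2 * PB)
print(f"average repository size: {avg / 10**6:.1f} MB")
print(f"share of estimated total backed up: {lo:.0%} to {hi:.0%}")
```

By this estimate the average snapshotted repository is roughly 98 MB, and the 80 TB stored so far is somewhere between 4% and 8% of the expected total. The average may well shift as more repositories are added, so treat it as a rough guide only.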

If you want to back up your codebase’s repositories (or all of GitHub) to the decentralized cloud, check out the tool, found here:

GitBackup was built by Shawn Wilkinson in collaboration with a number of Storj Labs engineers and community members. The tool was demonstrated on October 11 at Devcon V (Osaka, Japan).
