At Storj, we are relentlessly pursuing the best system we can build. Of course, we've still got work to do. This document describes the known limitations of the product (with regard to security, performance, availability, ease of use, economics, and more) that we are working to resolve.
There are also some instances where we’ve made what we think are thoughtful compromises between competing goals (e.g., completeness vs. time to release, security vs. ease of use, decentralization vs. economics). Still, some users might disagree with those choices.
There are also some circumstances where, to support certain use cases or to work with outside systems (e.g., to support certain common S3 usage patterns), we’ve had to give users options, and some of those options represent compromises.
While we think we’ve made the right trade-offs, we don’t want to substitute our judgments for yours. So:
- We publish this document so that you understand our choices
- We’ll always try to make sure users can make well-informed decisions in the product
- We publish information on the state of our network, so users can make informed decisions (more info in section 7 below)
- We are open source, publish detailed whitepapers, and publish frequent technical updates, so you can see what it is we are doing. (We do not currently grant public access to Jira tickets, including bugs, or to the internal roadmap, although this is a matter of internal discussion.)
Read below for some important information on:
- Satellites as points of failure
- Inline Segments
- S3 Compatibility
- Data Disclosure
- We do not use 3rd party cookies, but we do (with user opt-in) use Segment.io to understand session paths.
- We use Customer.io for product emails, which contain tracking links from the Customer.io product. We use these links to understand the effectiveness of different marketing campaigns and emails, but we do not track individual data.
- For performance and maintenance reasons, uplink and Storage Nodes send trace information, but there is an option to disable this.
- We have removed Google Analytics from most of our sites and are in the process of removing it from some remaining sections.
- On our recently launched website and blog, a user would need to consent to a cookie for us to do session tracking through the proxy we set up, which then sends session data to our Segment data warehouse.
We have published extensive documentation on our use of encryption and the security features for access management to enable developers to build more secure and private applications. There are three main architectural choices for users when using Storj DCS Uplink:
1) Libuplink: Our native library and the partner tools that use it (e.g., Rclone, CLI, Filezilla)
2) Gateway ST: An S3 Gateway that you host yourself
3) Gateway MT: A multi-tenant gateway that Storj hosts. Gateway MT also underlies the Web UI Filebrowser and Shared URLs.
There are some important differences in the use of encryption between these options.
- Security aspects common to all three options
Regardless of whether you are using Libuplink, Gateway ST, or Gateway MT, data is encrypted before it goes to the Storage Nodes. Encrypted data is broken into segments. Those segments are divided into erasure-coded pieces. Pieces are distributed to Nodes (unless it is an inline segment, see below). The Nodes see only an encrypted piece, and 29 pieces are needed just to recreate an encrypted segment. [See Section 4.11 of the whitepaper for more information on encryption.]
We also utilize metadata masking to minimize the amount of metadata that anyone (including Storj) can use to compromise or mine your data. However, for usability’s sake, certain metadata is not masked (e.g., bucket names, size, created time). [See Section 4.9 of the whitepaper.]
- Encryption for Libuplink and Gateway ST
In the case of libuplink and Gateway ST, you generate the keys to encrypt your data at rest and only you hold those encryption keys. Storj never sees the keys. So, there is no way (even if we wanted to do so) to see your data or share it with a third party. This is called End-to-End Encryption (E2E), and it is by default the only option we support on Libuplink and Gateway ST.
- Encryption for Gateway MT
In Gateway MT, we use Server-Side Encryption (SSE). We do this for ease of use and to support certain S3 use cases where End-to-End Encryption is not supported by S3. However, this means that Storj is also involved in the generation of your encryption keys, and has temporary access to them, when you use Gateway MT. We take steps to make this a very secure option, and far more secure than centralized services:
- We use unique keys to encrypt segments of each file individually.
- We encrypt the access grant and secret key with your access key id, which we do not save, so we cannot access them when we are not using them, and (unlike the centralized services) ensure that data is encrypted at rest. We accomplish this by storing your encrypted keys in our database, using the hash of your access key as our identifier.
- We encrypt the data, as well as path and metadata by default, which most other services do not do.
- Even if you choose to share an object via a public URL, the underlying data and metadata are always encrypted on Storj DCS.
- We encrypt the channel as data moves from your system to the hosted gateway.
- We have a canary to let you know if we are ever compelled by a government to disclose any data.
This approach is the industry-standard approach to encryption with a cloud-hosted S3 compatible gateway. However, if you want E2E encryption instead of SSE, or if you want a truly trustless system, you may either i) use Libuplink or Gateway ST instead of Gateway MT, or ii) encrypt your files using other tools before sending them to Gateway MT.
We are currently working on an option that will allow you to locally generate encryption keys and encrypt data before sending it to Gateway MT, but that will require you to run some code locally. Stay tuned.
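The Gateway MT credential storage described in the list above (records identified by a hash of the access key, and encrypted with an access key id that is never saved) can be illustrated with a minimal sketch. This is not Storj's implementation: the function names and the in-memory `database` are hypothetical, SHA-256 is an assumed choice of hash, and the XOR keystream is a toy stand-in for real authenticated encryption that must not be used in production.

```python
import hashlib

# Hypothetical in-memory stand-in for the gateway's credential database.
database = {}


def record_id(access_key_id: str) -> str:
    """Identifier for the stored record: a hash of the access key id."""
    return hashlib.sha256(access_key_id.encode()).hexdigest()


def _toy_keystream(access_key_id: str, length: int) -> bytes:
    # TOY keystream derived from the access key id. NOT real encryption;
    # it only illustrates that decryption requires the access key id.
    return hashlib.shake_256(b"toy-stream:" + access_key_id.encode()).digest(length)


def seal(access_key_id: str, secret: bytes) -> bytes:
    """XOR the secret with the toy keystream (the same call also unseals)."""
    stream = _toy_keystream(access_key_id, len(secret))
    return bytes(a ^ b for a, b in zip(secret, stream))


def put(access_key_id: str, access_grant: bytes) -> None:
    """Store the sealed grant under the hash; the key id itself is not kept."""
    database[record_id(access_key_id)] = seal(access_key_id, access_grant)


def get(access_key_id: str) -> bytes:
    """Look up by hash and unseal with the key id supplied by the caller."""
    return seal(access_key_id, database[record_id(access_key_id)])
```

In this sketch the database holds only a hash and a sealed blob, so the stored record cannot be opened unless a request supplies the access key id again.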
- Sharing Data
Storj has implemented a very secure and highly flexible mechanism for sharing files, portions of folders, etc., relying on mechanisms such as macaroon-based access grants. An Access Grant is a security envelope that contains a Satellite address, a restricted authorization token, and a restricted path-based encryption key: everything an application needs to locate an object on the network, access that object, and decrypt it.
Access Grants coordinate two parallel constructs, encryption and authorization, in a way that makes it easy to share data without having to manage access control lists or use complex encryption tools. Together, these constructs provide a client-side access management framework that’s secure and private, as well as extremely flexible for application developers. These mechanisms are described in greater detail in our product documentation.
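To give a feel for how a macaroon-style token can be restricted by its holder without contacting the issuer, here is a minimal sketch of the general HMAC-chaining idea behind macaroons. It is an illustration only and not Storj's token format: the dictionary layout, function names, and caveat strings are hypothetical, and a real verifier would also check that each caveat is satisfied by the request.

```python
import hashlib
import hmac


def _chain(sig: bytes, data: bytes) -> bytes:
    """One chaining step: HMAC the data under the previous signature."""
    return hmac.new(sig, data, hashlib.sha256).digest()


def mint(root_key: bytes, identifier: bytes) -> dict:
    """Create an unrestricted token; only the root-key holder can do this."""
    return {"id": identifier, "caveats": [], "sig": _chain(root_key, identifier)}


def restrict(token: dict, caveat: bytes) -> dict:
    """Append a caveat. This needs only the token, not the root key."""
    return {
        "id": token["id"],
        "caveats": token["caveats"] + [caveat],
        "sig": _chain(token["sig"], caveat),
    }


def verify(root_key: bytes, token: dict) -> bool:
    """Recompute the chain from the root key and compare signatures.

    This checks integrity only; it does not evaluate the caveats.
    """
    sig = _chain(root_key, token["id"])
    for caveat in token["caveats"]:
        sig = _chain(sig, caveat)
    return hmac.compare_digest(sig, token["sig"])
```

Because each signature is derived from the previous one, a holder can add caveats (for example, a hypothetical `b"path=photos/"`) but cannot remove them: stripping a caveat would require recomputing the chain from the root key, which the holder does not have.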
As you might imagine, there are security implications in the choice of creating access grants in the command line interface vs. the browser interface without Gateway MT vs. the browser interface with Gateway MT. Again, if Gateway MT is involved, the Satellite does have temporary access to the access grant. This topic is addressed more fully here.
- Compliant Storage
We believe that a decentralized, know-nothing approach to data storage offers far better security than traditional, centralized approaches. However, many older data security and privacy standards (e.g., HIPAA, GDPR) require making specific representations about the physical location of data (e.g., requiring that data only be stored in certain countries or in data centers that meet certain standards). Therefore, if your application is subject to those standards, you should make sure that our solution is right for you. Our roadmap includes a project around geotagging (i.e., making sure that data only goes to nodes in certain countries out of the 96+ where we currently have nodes). Our node operators include both individuals and data centers. Once we reach a critical mass of node operators who run compliant data centers, we also intend to add an option for customers to store data only on nodes in compliant data centers, should they need to meet storage compliance requirements. However, neither of these two important capabilities for compliant storage is live as of the last publication date.
III. SATELLITES AS POINTS OF FAILURE
We designed our system to be extraordinarily resilient to data loss. The decentralized nature of our Nodes provides exceptional resilience against a lot of things that cause data loss in conventional storage systems (Node failure, fire, floods, power outages, etc.). [See Sections 2.5 and 3.2 of the whitepaper]
However, the Satellite peer class, which holds the metadata used to distribute and recover the data located on the Nodes, is currently less decentralized and distributed than our Nodes. [See Section 4.10 of the whitepaper and our product documentation.] Catastrophic Satellite failure could cause data loss or data unavailability.
Any “Satellite” is a collection of multiple servers, including multiple instances of our satellite API endpoints and multiple instances of our distributed database, CockroachDB. At this point, all production Satellites are run multi-region (i.e., every Satellite has redundancy across multiple data centers and geographic regions). Of course, we try to follow best practices in terms of backing up and snapshotting the metadata in satellites. Nevertheless:
- Satellites are not yet fully decentralized. Our next three goal posts for Satellite decentralization are to enable partners to run Satellites, the community to run Satellites, and then to further break apart Satellites themselves. While these are important goals for us, the happiness and experience of our Node Operators and customers come first, and we believe we have more work to do to improve the experience of storage node operators further before we expand the number of Satellites on the network. In particular, improving the payment experience and trust and management functionality for Satellite and storage node operator interactions are important next steps. See this post for a discussion of the current state and our plans to become more decentralized over time.
- If a Satellite were to become temporarily unavailable (i.e., all of the Satellite servers in multiple distinct regions were unavailable), then you would not be able to access data stored using that satellite until it came back online.
- If a Satellite were to experience a catastrophic failure, and we had to recreate the Satellite from backup/snapshots, then not only would some data become unavailable during the re-creation process, but any data stored between the last backup/snapshot and the Satellite failure could be lost.
- We currently do a full backup of our Satellites daily and do incremental backups (snapshots) hourly. These backups and snapshots are of your encrypted metadata and are stored on services and devices not run by Storj. We are also implementing backups/snapshots between different Satellites, which will allow us to store multiple copies of any Satellite’s backups and metadata both on our services and on third-party services, so Storj itself is not a single point of failure. Backups of Storj DCS on other Storj DCS Satellites are not live as of the last publication date of this document.
We also try to follow best practices for simulating catastrophic failures and recovering from them (e.g., simulating Chaos Monkey and Chaos Gorilla). We are also implementing a write-ahead log so that we will not be subject to data loss even between snapshots, but this is not live as of the last publication date of this document.
- To date, we have not lost a single file since our Alpha 3 in August of 2019. We will disclose if we ever lose a file. We aim to maintain above 99.95% data availability and did so for the 30 days prior to the last publication date of this document. We have a Service Level Agreement (see Section 9 of the Terms of Service) which covers going below those limits.
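To put the 99.95% availability target in concrete terms, the short calculation below converts it into a downtime budget over a 30-day window. The function name is hypothetical, and this is plain arithmetic rather than a statement of how the Service Level Agreement measures availability.

```python
def downtime_budget_minutes(availability: float, days: int) -> float:
    """Minutes of unavailability permitted by a given availability target."""
    total_minutes = days * 24 * 60
    return (1.0 - availability) * total_minutes


budget = downtime_budget_minutes(0.9995, 30)
print(f"{budget:.1f} minutes")  # prints "21.6 minutes"
```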
IV. INLINE SEGMENTS
Very small files (under 4 KB) are encrypted but follow a different erasure coding and storage scheme than normal files. [See Section 4.10 of the whitepaper].
V. S3 COMPATIBILITY
An important aspect of Storj DCS is ensuring that we are compatible with the existing, de facto standard for cloud object storage, S3. S3 is a vast, 15-year-old API. We have aimed to support the most important capabilities, which cover the overwhelming majority of cloud storage-based applications, but there is a long tail of lesser-used functions that are not yet supported. We have benefited from the efforts of Minio in building out S3 compatibility. Our next major effort involves supporting a range of server-side S3 functions for Gateway MT users. We listen to the community and users when prioritizing additional S3 functionality.
We aim to be the most economical service, with prices that are approximately 20% of those of the major cloud providers, with no hidden fees and with multi-region at no additional price. However, you should always make sure that you know our latest pricing, and we recognize that other providers’ prices may change as well and/or be less expensive for certain use cases. Our system works best when segments are larger (64 MB). So, if you choose to use the S3 default segment size of 5 MB rather than the Storj default segment size of 64 MB, you may incur some extra fees. See pricing for details. Pricing is subject to change.
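As a rough illustration of why the segment size matters, the sketch below counts how many segments the same file produces at the two sizes mentioned above. The function name and the 1024 MB example file are assumptions made for illustration, and no fee amounts are modeled here; see pricing for the actual charges.

```python
import math


def segment_count(file_size_mb: float, segment_size_mb: float) -> int:
    """Number of segments needed to hold a file of the given size."""
    return math.ceil(file_size_mb / segment_size_mb)


file_size_mb = 1024  # hypothetical example file
print(segment_count(file_size_mb, 64))  # prints 16
print(segment_count(file_size_mb, 5))   # prints 205
```

In this example, the 5 MB setting produces roughly thirteen times as many segments as the 64 MB default for the same file.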
We have tried to design our system to be very performant. However, the decentralized and encrypted nature of our service means that certain use cases are more performant than others. For example, you shouldn’t try to store a live database using decentralized storage. (Backups and archives of databases, on the other hand, are good use cases.) Like all services, we are constantly tuning performance. With Gateway MT (which is hosted in multiple Equinix locations), we should get much better throughput when moving data from locations that have peering relationships with Equinix (e.g., most centralized clouds, many enterprises). We also have a major project to reduce latency, which is scheduled to go live in Q2 of 2021.
When performing Reed-Solomon erasure coding client-side, there is an additional throughput impact, as the expansion factor results in roughly 80/30 ≈ 2.7x as much data being sent up. With Gateway MT, the Reed-Solomon erasure coding is server-side. The result is better performance, especially in environments with low available bandwidth. We are working on projects to enable Gateway MT performance with client-side encryption to get the best of both solutions, so that customers don’t have to choose between performance and trustless security.
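The upload overhead of client-side erasure coding can be estimated with a short calculation. This sketch uses the 80 and 30 figures quoted above; the function names and the 1 GB example are hypothetical, and real uploads may differ because of padding, metadata, and how uploads are terminated.

```python
def expansion_factor(pieces_uploaded: int, pieces_needed: int) -> float:
    """Ratio of data sent to original data for an erasure-coded upload."""
    return pieces_uploaded / pieces_needed


def uploaded_gb(file_gb: float, factor: float) -> float:
    """Approximate data sent over the wire for a file of the given size."""
    return file_gb * factor


factor = expansion_factor(80, 30)
print(f"{factor:.2f}x")                         # prints "2.67x"
print(f"{uploaded_gb(1.0, factor):.2f} GB")     # prints "2.67 GB"
```

Evaluating the same function with the 29-piece reconstruction threshold mentioned in the encryption section gives a factor of about 2.76, so either way a client-side upload sends a little under three times the original data.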
VIII. DATA DISCLOSURE
We know that customers, community members, and Node Operators alike want up-to-date information on things like the size of our network, the performance of our network, durability, availability, and token flows. We’ve tried to be good about publishing those stats periodically, e.g., in town halls and our quarterly token report. We also know that some community members have started publishing stats, which is awesome (but we can’t guarantee their accuracy). We have a project underway to make significantly more data available on a near real-time and programmatic basis. Stay tuned!