Data stored in Storj Decentralized Cloud Storage can be accessed in multiple ways:
- With the native “uplink” protocol, which connects directly to the nodes where the data is stored
- With the S3-compatible REST API, using an S3 gateway:
  - Either the hosted S3 gateway, operated by Storj Labs
  - Or your own, self-hosted S3 gateway
The easiest way is to use the hosted S3 gateway with any S3-compatible tool, but this approach also has disadvantages:
- The encryption keys are shared with the gateway
- All traffic is routed to the gateway before accessing the data on storage nodes
In a powerful server environment (with enough network and CPU capacity) it can be more reasonable to use the native protocol and access the storage nodes directly. However, the native protocol is not supported by as many tools as the S3 protocol.
Fortunately, in Kubernetes – thanks to the sidecar pattern – using the native protocol is almost as easy as using the shared gateway.
The smallest deployable unit in Kubernetes is a pod: a definition of one or more containers together with their attached volumes, resources, and network settings. Typically a pod contains only one container, but the sidecar pattern deploys an additional helper container in each pod.
Because the network namespace is shared inside the pod, the main container can use the features of the sidecar container.
To follow this pattern, we should deploy a sidecar container to each of our application pods.
Instead of using the hosted, multi-tenant version of the S3 gateway, we will start a single-tenant S3 gateway next to each of our services.
Let’s start with a simple example: we will create a Jupyter notebook which reads data from a Storj bucket for subsequent data science calculations.
A Jupyter notebook can be deployed with the following simplified deployment:
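A minimal sketch of such a deployment, assuming the public `jupyter/base-notebook` image and illustrative names:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: jupyter
spec:
  replicas: 1
  selector:
    matchLabels:
      app: jupyter
  template:
    metadata:
      labels:
        app: jupyter
    spec:
      containers:
        - name: jupyter
          image: jupyter/base-notebook
          ports:
            - containerPort: 8888
              hostPort: 8888   # for simplicity only; see the note below
```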
(Please note that we use hostPort here for simplicity. In a real cluster, a Service (LoadBalancer or NodePort) or an Ingress definition would be required, depending on the environment.)
After deploying this definition to a Kubernetes cluster, we can access the Jupyter notebook application and write our own notebooks.
To open the Jupyter web application we need the secret token which is printed out to the standard output of the container:
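For example (assuming the deployment is named `jupyter`, as in the sketch above):

```shell
# The notebook server prints a login URL containing ?token=... at startup
kubectl logs deployment/jupyter | grep token
```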
And now we can create a new notebook where we use the Storj data:
To read data via the S3 protocol, we need boto3, the Python S3 library, which can be added to the Docker image or installed as a first step in the notebook:
Next, we can read/use files directly from Storj:
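A sketch of reading an object through the hosted gateway. The endpoint URL, bucket, and object names are placeholders; check the Storj documentation for the current gateway address, and install the library first (e.g. `!pip install boto3` in a notebook cell):

```python
import boto3

# Client for the hosted, multi-tenant S3 gateway (endpoint is an assumption)
s3 = boto3.client(
    "s3",
    endpoint_url="https://gateway.storjshare.io",
    aws_access_key_id="<generated access key>",
    aws_secret_access_key="<generated secret key>",
)

# Read a file from the bucket directly into the notebook
response = s3.get_object(Bucket="my-bucket", Key="data.csv")
content = response["Body"].read()
```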
This approach uses the shared S3 gateway and requires an access key and secret credentials generated as documented here.
Activating the Sidecar
Let’s improve the previous example by using the sidecar pattern. First, we need to generate an access grant (instead of S3 credentials) to access the Storj data, and we should choose S3 credentials for our local, single-tenant S3 gateway (any values will do, since the gateway is used only by us):
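A rough sketch with the `uplink` CLI; the exact subcommands and flags depend on the CLI version, and the bucket name is a placeholder:

```shell
# Print a serialized access grant for the bucket
uplink share sj://my-bucket

# Choose arbitrary credentials for the single-tenant gateway
ACCESS_KEY=$(head -c 16 /dev/urandom | base64)
SECRET_KEY=$(head -c 32 /dev/urandom | base64)
```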
Let’s create a Kubernetes secret with all of these:
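For example (the secret name `storj-gateway` and the key names are illustrative):

```shell
kubectl create secret generic storj-gateway \
  --from-literal=access-grant='<access grant from uplink>' \
  --from-literal=access-key='<chosen S3 access key>' \
  --from-literal=secret-key='<chosen S3 secret key>'
```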
Now we can enhance our Kubernetes deployment by adding one more container (put it under spec/template/spec/containers):
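A sketch of the sidecar container, assuming the `storjlabs/gateway` image and the hypothetical `storj-gateway` secret from the previous step:

```yaml
- name: storj-gateway
  image: storjlabs/gateway
  args: ["run"]
  ports:
    - containerPort: 7777
  env:
    - name: STORJ_ACCESS
      valueFrom:
        secretKeyRef:
          name: storj-gateway
          key: access-grant
    - name: STORJ_MINIO_ACCESS_KEY
      valueFrom:
        secretKeyRef:
          name: storj-gateway
          key: access-key
    - name: STORJ_MINIO_SECRET_KEY
      valueFrom:
        secretKeyRef:
          name: storj-gateway
          key: secret-key
```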
This container is configured to access the Storj API (using the STORJ_ACCESS environment variable) and is secured by the STORJ_MINIO_ACCESS_KEY and STORJ_MINIO_SECRET_KEY credentials.
Now we can access any object in our Storj bucket, but we can also make the setup more secure by not hard-coding credentials during the initialization of the Python S3 client. We should add two more environment variables to the existing Jupyter container to make the credentials available to the client:
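boto3 reads the standard AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY variables automatically, so we can wire them to the same hypothetical `storj-gateway` secret:

```yaml
env:
  - name: AWS_ACCESS_KEY_ID
    valueFrom:
      secretKeyRef:
        name: storj-gateway
        key: access-key
  - name: AWS_SECRET_ACCESS_KEY
    valueFrom:
      secretKeyRef:
        name: storj-gateway
        key: secret-key
```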
With this approach, we can initialize the S3 client without hard-coding the credentials:
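A sketch, assuming the environment variables above are set on the container (bucket and object names are placeholders):

```python
import boto3

# boto3 picks up the credentials from the environment;
# only the endpoint of the sidecar gateway is specified.
s3 = boto3.client("s3", endpoint_url="http://localhost:7777")
response = s3.get_object(Bucket="my-bucket", Key="data.csv")
content = response["Body"].read()
```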
Note: http://localhost:7777 is the address of the single-tenant Storj gateway which is running in the same pod.
The sidecar pattern is an easy way to access our data in Storj Decentralized Cloud Storage with the power of the native protocol, even if our application is compatible only with the S3 REST API.