Archives for February 2018

How to run Google Cloud Datalab on your own Linux server

If you are into data analysis, you are probably already using Jupyter Notebooks. Some time ago Google developed its own flavour of Jupyter notebooks and released it as Google Cloud Datalab. If you are interested in running this variant, it is very easy to get up and running. You will need a Linux server with docker installed.

Verify docker works by running:

docker run hello-world

Then run the Google Cloud Datalab container:

docker run -it -p "<EXTERNAL_PORT>:8080" -v "${HOME}:/content"

The first time you run this command docker will pull the Datalab image from Google, so it might take some time.

Datalab listens for connections on port 8080. We need to tell the docker host to expose this port so we can connect to Datalab from our browser. You can either expose 8080 directly or map it to a different port. Personally, I think it's a good idea to map each instance to a different port (e.g. 8080 + N) just in case you find yourself running multiple Datalab containers.

Keep in mind that publishing the port without a bind address, as in the command above, tells the docker host to expose the Datalab port to everyone on the network. Datalab does not come with built-in authentication or encryption, so if your docker host is on an insecure network you may prefer to expose the Datalab port on the loopback interface only (127.0.0.1) and then use an ssh tunnel to forward the exposed port <EXTERNAL_PORT> to the system running your desktop browser.
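To make the loopback-plus-tunnel idea concrete, here is a minimal sketch. It makes several assumptions that are not in the text above: the instance number N=1, the image name gcr.io/cloud-datalab/datalab:local (the name I believe Google's documentation used for local runs), and the placeholders user and dockerhost for your ssh login. The script only prints the two commands so you can review them before running anything.

```shell
# Hypothetical instance number, chosen only for illustration.
N=1
EXTERNAL_PORT=$((8080 + N))

# Binding to 127.0.0.1 keeps the port off the network:
# only processes on the docker host itself can reach it.
PUBLISH="127.0.0.1:${EXTERNAL_PORT}:8080"

# Assumed image name; it is not given in the text above.
IMAGE="gcr.io/cloud-datalab/datalab:local"

# Print the commands instead of running them, so they can be checked first.
echo "On the docker host:"
echo "  docker run -it -p \"${PUBLISH}\" -v \"\${HOME}:/content\" ${IMAGE}"
echo "On the desktop (user and dockerhost are placeholders):"
echo "  ssh -N -L ${EXTERNAL_PORT}:127.0.0.1:${EXTERNAL_PORT} user@dockerhost"
```

With the tunnel open, the browser on the desktop would connect to http://localhost:8081/ rather than to the docker host directly, and the traffic travels inside the ssh connection.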

We also need to consider storage, as we want to preserve data if the Datalab container is stopped. Datalab is configured to read and write notebooks to /content in the running container. We can tell docker to map the user's home directory (i.e. $HOME) on the docker host to the running Datalab container's /content path. This way the notebooks are preserved when the container is stopped.
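If you would rather not hand the container your whole home directory, one possible variation is to mount a dedicated folder instead. The sketch below assumes a hypothetical directory name, datalab-notebooks; any path you own should work the same way.

```shell
# Hypothetical directory name; any path you own will do.
NOTEBOOK_DIR="${HOME}/datalab-notebooks"

# Create it up front so it belongs to your user; docker would otherwise
# create a missing bind-mount source directory owned by root.
mkdir -p "${NOTEBOOK_DIR}"

# Host path on the left, container path on the right.
VOLUME="${NOTEBOOK_DIR}:/content"
echo "Pass this to docker run: -v \"${VOLUME}\""
```

Anything Datalab writes under /content then lands in that folder on the docker host and survives the container being stopped or removed.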

With the container up and running, connect to the Datalab instance at http://<DOCKERHOST>:<EXTERNAL_PORT>/

You should see Google Cloud Datalab load in your browser.

At this point you can create a notebook and start analysing data. If you need to install or upgrade Python packages, the easiest way is to invoke shell commands from within the notebook using the following syntax:

!<command> <arguments>

For example, to install PySpark for Python 3, create a code cell in the notebook as below and run it:

!pip3 install --upgrade pyspark

Have fun!