Data analysis is not only about reports and visualization. Correctness and reproducibility also matter in scientific research, and a consistent environment is critical for reproducibility. There are several ways to achieve that, but I have found that with Docker I can repeat an experiment in the same environment at any time. It is also easy to scale up and to scale horizontally.
My Docker image is based on Ubuntu. It includes common data science tools such as Jupyter Notebook with Python 3 and R kernels. With the help of R Magic, I can run both Python and R in the same .ipynb file. To learn more about R Magic, you can click here. I also installed Nbextensions for Jupyter Notebook. For more information, you can click here.
You can find my Dockerfile in my GitHub Repository.
HOW TO USE MY DOCKERFILE
Install Docker
The Docker community has a detailed tutorial on how to install Docker. Please check here.
Build
In the terminal, change to the folder that contains the Dockerfile and run the following command:
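A minimal sketch of the build step, assuming Docker is already installed and the image is named data-analyst-notebook as in this post. The -t flag tags the image and the trailing dot tells Docker to use the current folder as the build context. The build_image wrapper is a hypothetical helper, used here only so the command can be defined once and called when needed:

```shell
# Hypothetical helper wrapping the build command.
build_image() {
    # -t sets the image name; "." is the build context (the folder with the Dockerfile).
    docker build -t data-analyst-notebook .
}

# Call it from the folder that contains the Dockerfile:
# build_image
```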
Don’t forget the “.” at the end. data-analyst-notebook is the name of the image. You can change it to whatever you prefer.
Start server
I use the following command to start the server:
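A rough sketch of such a command, under a few assumptions that may not match the actual image: Jupyter listens on its default port 8888 inside the container, and the notebooks live in a /workspace folder (a hypothetical mount point). The start_server wrapper and the variable names are illustrative only:

```shell
IMAGE_NAME="data-analyst-notebook"   # image name from the build step
HOST_PORT=8888                       # assumption: expose Jupyter on the same port on the host
CONTAINER_DIR="/workspace"           # hypothetical mount point inside the container

start_server() {
    # -it keeps the session interactive, --rm removes the container on exit,
    # -p publishes the notebook port, -v mounts the current folder into the container.
    docker run -it --rm \
        -p "${HOST_PORT}:8888" \
        -v "$(pwd):${CONTAINER_DIR}" \
        "$IMAGE_NAME"
}

# Call it from the folder that holds your notebooks:
# start_server
```

Mounting the current folder means notebooks saved inside the container remain on the host after the container is removed.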
More detailed instructions are available in the User Guide on ReadTheDocs.
If you feel that the command is too long to type, you can add an alias to your .bashrc file like this:
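As a sketch, an alias named dslab could look like the following. The docker run flags here are assumptions (port 8888, a hypothetical /workspace mount point), so substitute the exact command you use to start the server:

```shell
# Add this line to ~/.bashrc, then open a new terminal or run: source ~/.bashrc
# Single quotes delay $(pwd) until the alias is used, so the folder you are in
# at that moment is the one that gets mounted.
alias dslab='docker run -it --rm -p 8888:8888 -v "$(pwd)":/workspace data-analyst-notebook'
```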
Now you can use dslab in the terminal as a replacement for typing the long command.