Docker - a Love Story

How it all started

This spring I went to the rOpenSci Unconf. I was quite nervous going there, because so many great and super smart people would be there. The event was organized as a sort of hackathon where people worked on different projects they were interested in. In the weeks before the unconf, people posted ideas for what they would like to work on. For many of the topics I didn’t even understand what they were about.

The first project I decided to join was called "Tutorials for more advanced topics with R". As I consider myself an advanced R user, I felt like I could do that. The group decided to work on different topics, one of which was a Docker tutorial for R users. I did not know exactly what Docker was, just that it was similar to a virtual machine, but somehow cooler. I found it very intriguing and had been thinking I should learn about this Docker thing anyway. So I went ahead, joined the team and learned. What we produced is still a work in progress, but I am pretty proud to be part of it: Click here to get to the website and here to get to the GitHub page. Sean made a list of the other projects from the unconference on his blog.

What I use Docker for

For a broader explanation of what Docker is and why to use it, please see the first lesson of the tutorial. Here I will just tell a little bit about how I have been using Docker since I learned it.

Docker as a portable computer with fixed settings

I love writing this blog, but installing all the things to get it running was a pain. I will not have to worry about this again, though. I have a Dockerfile, which is just a plain text file describing how to build a Docker image. I copy it onto any computer (e.g. by checking out my git repository with my blog stuff), build the Docker image, start the container and have RStudio running in my browser in no time. And I know the stuff installed in this container is always the same. It just works.
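To give an idea of what such a Dockerfile can look like, here is a minimal sketch. It is not my actual file: the base image (rocker/rstudio, which ships R together with RStudio Server), the version tag, the package list and the image name my-blog are all assumptions for illustration.

```dockerfile
# Assumed base image: rocker/rstudio ships R plus RStudio Server.
# Pinning a version tag keeps the R version fixed across rebuilds.
FROM rocker/rstudio:4.3.1

# Hypothetical package list; replace with whatever the blog actually needs.
RUN install2.r --error knitr rmarkdown

# Build and start (run from the directory containing this Dockerfile);
# "my-blog" and the password are placeholders:
#   docker build -t my-blog .
#   docker run --rm -p 8787:8787 -e PASSWORD=changeme my-blog
# RStudio should then be reachable at http://localhost:8787 (user "rstudio").
```

The two commands in the comments are the "build the image, start the container" steps. The -p 8787:8787 part maps the container's RStudio Server port to the same port on the host, which is what makes RStudio show up in the browser.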

Docker images as assurance of reproducible research

In a previous blog post I wrote about my goal to make my research fully reproducible. With Docker I think I can get a step closer to the gold standard. I can do all my analyses, add them to a Docker image and distribute it for everyone to use. Check out how to do that here. And the great thing is that everyone will have the same settings and package versions as I had when I ran the analyses, and should thus get exactly the same results.
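As a rough sketch of the idea of adding an analysis to an image, a Dockerfile along the following lines could work. Everything specific in it is hypothetical: the base image and tag, the packages, the analysis/ folder, the script name run_analysis.R and the Docker Hub account myuser.

```dockerfile
# Assumed base image with a fixed R version (rocker/r-ver).
FROM rocker/r-ver:4.3.1

# Hypothetical packages used by the analysis; they are installed once
# at build time and then stay fixed inside the image.
RUN install2.r --error dplyr ggplot2

# Copy the (hypothetical) analysis folder into the image.
COPY analysis/ /home/analysis/
WORKDIR /home/analysis

# Re-run the analysis when the container starts.
CMD ["Rscript", "run_analysis.R"]

# Build and share, with "myuser" standing in for a Docker Hub account:
#   docker build -t myuser/my-analysis:1.0 .
#   docker push myuser/my-analysis:1.0
```

Because the packages are baked into the image when it is built, anyone who pulls that image gets the same R and package versions, rather than whatever happens to be current on their own machine.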

All in all, I am really happy that I learned Docker, and I can only recommend that researchers who work with R learn it, too.