My favorite tools for helping future me

Reproducible research is a topic that people like to talk about these days. Thinking about reproducible research and learning the important tools is what improved my work more than anything. Not in a sense that my results got better. More in a sense that my feeling about the work got better and my analyses got easier to understand for future me.

before after

So in the following I would like to give a list of things that are helping future Heidi most:

Of course there are many other things for me that are influencing the usefullness of these tools. First of all I use R for statistical analyses and run
Linux (Ubuntu) on my computer (and servers).

This post is not meant as an instruction to how to use the tools. Introductions are linked above. I want to tell how I am connecting the tools for projects.

How to use Literate Programming, Version Control and Makefiles together

knitr

As a statistician, before shipping a report, talk or paper usually many hours are spent with the cleaning and exploration of the data. In order to keep track of what I already did, I like to produce a PDF of all tables, figures and other analyses that I produce. Or at least those that are in any way usefull. That I do using knitr. With knitr I can also include my thoughts

Roxygen

I document (almost) all functions that I write for a project with Roxygen2. It is a fast and easy way to keep track of what your function does and what parameters are supposed to mean. And if you decide to make an R-package out of the code, it can easily be transformed into .Rd files.

SVN

To keep track of the changes and in order to be able to go back to old versions of my code or my latex files, I use SVN (or git). This is super nice when you work alone. It is even better when you work with other people. With SVN I control and save all my R-, Rnw-, tex- and other files that are important to keep in case my computer breaks or my office burns down. In this way it is also possible for me to work on different machines without any hassle. I just checkout the repository to the given machine.

Make

Imagine you write a paper or a report. You change something in the data cleaning code or in your analyses. Wouldn’t it be awesome to just do a single command and all codes that depend on your change are run and your paper/report automatically updated? It is possible by defining a simple text file (Makefile) and then just say make all. At least when you work in Linux.

The gold standard

My personal goal is

  1. To make my work so understandable that in some years I will still understand what I did or even that I could quit a project and someone else could keep working on it without major problems.
  2. To be able to go back to old thoughts and track changes that anyone did on the project.
  3. To have a “flow” in my projects where I can change things somewhere in the middle of my code and with a single command (make) I can update everything that is neccesary.
  4. If the worst happens and all the machines I work on brake, I want to be able to recreate everything without loosing much time.

Update (04.07.2016):

I now also use Docker. See my post on how I learned Docker.