Documenting Your Code

— Alex Reinhart

Documenting your code is essential. Choosing good names helps make your code understandable, as we described in the best practices lecture, but names alone are not always enough: sometimes code needs explicit documentation.

A very useful kind of documentation is comments explaining what each function or class does. Many programming languages even add extra features to make this documentation easy to write and easy to access.

R #

R has a built-in help system – you’ve probably used it many times. If you’re writing an R package, or even if you’re just writing ordinary code for yourself, it can be useful to write documentation so you and others understand your code.

R’s original system required you to write specially formatted separate files, called .Rd files, in a syntax that looks a lot like LaTeX. R would then use these to make the help pages you view.

Fortunately, the roxygen2 package makes this much easier, and you can write documentation right in your R files in ordinary comments.

An example template:

#' A short one-line description of this function.
#'
#' Sometimes a longer description is necessary. If so, leave a blank line after
#' the one-line summary. The next paragraph goes into the "Description" section
#' of the help page.
#'
#' After that paragraph, subsequent paragraphs are automatically put into the
#' "Details" section of the help page.
#'
#' @param bar A brief description of the bar parameter, as a complete sentence.
#' @param baz A description of the baz parameter. Descriptions should say what
#'     type the parameter is (list, vector, data frame, ...) and what it does.
#' @return A brief description of what the function returns.
#'
#' @examples
#' foo(4, 7)
foo <- function(bar, baz) {
    ## do stuff... (placeholder computation so the template runs as written)
    quux <- bar + baz

    return(quux)
}

Notice that the comment lines start with #', not just # – that’s how roxygen2 knows they are documentation comments. They go immediately before the function definition.

In ordinary code, these comments are just nicely formatted ordinary comments. In an R package, you can set up roxygen2 to automatically turn these comments into .Rd files distributed with your package, so users can use ?foo or help(foo) to look up the help page for foo.

Check out the roxygen2 vignette for more examples of how to format documentation for different types of R objects.

Python #

In Python, code is documented using docstrings. These are ordinary strings written using three quotation marks at each end, rather than one, and can span multiple lines. A triple-quoted string at the beginning of a function or a class is a docstring.
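As a quick check of the "at the beginning" rule, here is a minimal sketch using three made-up functions (documented, commented, and misplaced are hypothetical names, not part of any library). Only a string that is the first statement in the function body is stored as the docstring; an ordinary comment or a string placed later is not.

```python
def documented():
    """Return the answer."""
    return 42


def commented():
    # Return the answer.
    return 42


def misplaced():
    answer = 42
    """Return the answer."""
    return answer


# Python stores the docstring in the __doc__ attribute of the function.
print(documented.__doc__)  # Return the answer.
print(commented.__doc__)   # None
print(misplaced.__doc__)   # None
```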

An example template:

def foo(bar, baz):
    """A short one-line description of this function.

    Sometimes a longer description is necessary. If so, leave a blank line after
    the one-line summary, then write additional paragraphs as needed.
    Additionally, you can document the arguments individually if needed.

    Parameters:

    bar: A list of bars.
    baz: An integer indicating the amount of baz.
    """

    pass

Notice that the docstring is inside the function as the very first thing in the function. Docstrings can also be used inside classes. Here’s an example of documenting a class and its methods, demonstrating the style commonly used in Python:

class MultilayerFrobnicator:
    """A Penman's multi-layer frobnicator with adjustable layering.

    Instances have several attributes:

    layers: An integer number of layers.
    tortosity: The Hindenburg tortosity index of this frobnicator.
    """

    def __init__(self, layers=1):
        """Initialize an empty frobnicator."""

        self.layers = layers

        ## do other stuff...

    def add_layer(self, layer_def):
        """Add a new layer to the bottom of the layer stack.

        layer_def is a dictionary of layer parameters.
        """

        ## do stuff

        pass

    def get_layer(self, layer_no):
        """Return the layer definition of a layer."""

        # Placeholder: a real implementation would look up layer_no here.
        raise NotImplementedError

Notice several conventions:

  • Docstrings are sentences ending in periods. They describe the function as a command (“return the layer definition”, “add a new layer”), not as a description (“adds a layer”).
  • Docstrings don’t repeat what you already know by looking at the function definition (e.g. the names of all the arguments).
  • Docstrings can be one line long, when the function is easy to describe, or several lines if more description is needed.
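The conventions above might look like this in practice. This is only an illustrative sketch: count_layers, thickest_layer, and the "thickness" key are hypothetical names invented for the example. The first function is simple enough for a one-line docstring; the second needs a few extra lines to say what the argument must contain and what happens on bad input.

```python
def count_layers(layer_defs):
    """Return the number of layers in a list of layer definitions."""
    return len(layer_defs)


def thickest_layer(layer_defs):
    """Return the layer definition with the largest thickness.

    Each layer definition is a dictionary with a numeric "thickness" entry.
    Raise ValueError if the list is empty.
    """
    if not layer_defs:
        raise ValueError("layer_defs must not be empty")
    return max(layer_defs, key=lambda layer: layer["thickness"])


layers = [{"thickness": 1.5}, {"thickness": 3.0}]
print(count_layers(layers))    # 2
print(thickest_layer(layers))  # {'thickness': 3.0}
```

Both docstrings are phrased as commands ("Return ...") and end in periods, and neither simply lists the argument names again.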

Following these conventions has advantages. Docstrings are accessible through Python code: calling help(foo) prints out the docstring for foo, and tools like Visual Studio Code automatically look up docstrings so they can show useful help while you’re writing code.
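To see how code can read a docstring, here is a small sketch. The body of foo is made up for the example; the __doc__ attribute and inspect.getdoc are part of standard Python.

```python
import inspect


def foo(bar, baz):
    """Return bar repeated baz times.

    bar is a list of bars; baz is an integer repeat count.
    """
    return bar * baz


# The raw docstring, exactly as written, including source indentation.
raw = foo.__doc__

# inspect.getdoc returns the same text with that indentation cleaned up.
clean = inspect.getdoc(foo)

# By convention the first line is the one-line summary.
summary = clean.splitlines()[0]
print(summary)  # Return bar repeated baz times.
```

Calling help(foo) in an interactive session shows the same text along with the function signature.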

The full Python conventions on docstrings are described in PEP 257 – Docstring Conventions.

Other languages #