reticulate and rpy2

— Christopher Genovese and Alex Reinhart

reticulate #

The reticulate package for R provides a bridge between R and Python: it allows R code to call Python functions and load Python packages. You can even use Python code in an RMarkdown document in RStudio.

Calling Python code in R is a bit tricky. If I make an R data frame and want to give it to a Python function, how can the Python function manipulate the data frame? If a Python function returns a tuple, how does the R code access a tuple if tuples are not an R data type?

reticulate solves these problems with automatic conversions. An R data frame is given to Python code as a Pandas data frame; a named list (like list(a=4, b=2, c=3)) is passed to Python as a dictionary; an R matrix is turned into a NumPy array. The conversions work in the other direction as well: when Python returns a tuple, it’s turned into an R list; a Pandas data frame is turned back into an R data frame.
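To make the conversions concrete, here is a minimal sketch of what the Python side might look like. The function name summarize is hypothetical, not part of reticulate; the idea is that a function like this could be loaded into R (for example with reticulate's source_python()) and called with the named list from above. It would receive a dictionary and hand back a tuple, which R would see as a list.

```python
def summarize(values):
    """Hypothetical helper: takes a dict, returns a tuple.

    When called from R with list(a=4, b=2, c=3), `values` would arrive
    as a Python dict keyed by "a", "b", and "c".
    """
    total = sum(values.values())            # add up all the entries
    largest = max(values, key=values.get)   # key with the biggest value
    return total, largest                   # a tuple; an R list on the R side


# Plain-Python usage, with the same data as the R example:
print(summarize({"a": 4, "b": 2, "c": 3}))
```

The function itself knows nothing about R, which is the point: the conversions happen at the boundary, so ordinary Python code can be reused unchanged.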

The introductory vignette gives plenty of examples. Let’s try a couple.

Basic Usage #

With reticulate, we can import ordinary Python packages and do things with them:

library(reticulate)

os <- import("os")
os$listdir("/Users/alexreinhart/Desktop")

This is equivalent to:

import os

os.listdir("/Users/alexreinhart/Desktop")

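We can also call into larger libraries such as NumPy. The array that comes back is converted to an R array, so ordinary R arithmetic works on it: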

library(reticulate)

np <- import("numpy")

a <- np$arange(4)
a * 2

You can run entire Python scripts or bits of code and then access their results:

library(reticulate)

py_run_file("some_script.py")

py_run_string("x = 10")

py$x  #=> 10

RMarkdown #

You can seamlessly use Python and R inside the same RMarkdown document. For example, consider this file:

First we load `reticulate`:

```{r setup, include=FALSE}
library(reticulate)
```

Then we run some Python code in a Python code block:

```{python}
import pandas

flights = pandas.read_csv("flights.csv")
flights = flights[flights['dest'] == 'ORD']
flights = flights[['carrier', 'dep_delay', 'arr_delay']]
flights = flights.dropna()
```

Then we can read the `flights` variable in R by accessing `py$flights`:

```{r, fig.width=7, fig.height=3}
library(ggplot2)
ggplot(py$flights, aes(x=carrier, y=arr_delay)) + geom_jitter()
```

The vignette has more examples like this.

(You might need to update your knitr package to the latest version for this to work.)

Resources #

  • The reticulate package has documentation vignettes introducing it, showing how to use packages, and showing how to use Python in RMarkdown.

rpy2 #

Python is a great general-purpose language with libraries to do just about anything – connect to databases, use web services, make graphical interfaces, whatever. But R has CRAN, with every statistical method imaginable implemented in a package somewhere. If you’re writing Python code, it can be frustrating to find a CRAN package with no Python equivalent.

rpy2 is a Python library that lets you call R directly from inside Python. The documentation is incomplete and unclear, but the basics are straightforward.

Examples #

We can load R packages (like the built-in base and stats packages) easily:

import rpy2.robjects as robjects
from rpy2.robjects.packages import importr
base = importr("base")
stats = importr("stats")

We can run R code by calling robjects.r with a string; it returns the value of the last expression:

result = robjects.r("""
foo <- function(x, y) { x + y}
foo(3, 4)""")

If we want to create R data, we can do so explicitly:

xs = robjects.FloatVector([1, 2, 3, 4, 5])

but it’s often easier to write it as R code (note that 1:5 gives an integer vector rather than a floating-point one):

xs = robjects.r("1:5")

And you can call R functions directly:

robjects.r['sort'](xs, decreasing=True)

which will return an R vector.

There’s a lot more: you can use rpy2 to generate R graphics, access R objects and classes, work with data frames and types, use R packages, and so on. Check the documentation for more details.