reticulate #
The reticulate package for R provides a bridge between R and Python: it allows R code to call Python functions and load Python packages. You can even use Python code in an RMarkdown document in RStudio.
Calling Python code in R is a bit tricky. If I make an R data frame and want to give it to a Python function, how can the Python function manipulate the data frame? If a Python function returns a tuple, how does the R code access a tuple if tuples are not an R data type?
reticulate solves these problems with automatic conversions. An R data frame is
given to Python code as a pandas data frame; a named list (like list(a=4, b=2, c=3))
is passed to Python as a dictionary; an R matrix is turned into a NumPy
array. The conversions work in the other direction as well: when Python returns
a tuple, it’s turned into an R list; a pandas data frame is turned back into an
R data frame.
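To make the conversions concrete, here is a minimal sketch of the Python side. The module and function names (conversions_demo.py, describe, min_and_max) are made up for illustration and are not part of reticulate. Assuming the file is loaded from R with reticulate's source_python(), calling describe(list(a=4, b=2, c=3)) from R should report a dictionary, and the tuple returned by min_and_max should arrive back in R as a list, following the conversions described above.

```python
# Hypothetical module, e.g. saved as conversions_demo.py (the name is illustrative).


def describe(value):
    """Return the name of the Python type that `value` arrived as."""
    return type(value).__name__


def min_and_max(values):
    """Return a (min, max) tuple; on the R side this shows up as a two-element list."""
    return (min(values), max(values))


if __name__ == "__main__":
    # Called directly from Python, no conversion happens at all.
    print(describe({"a": 4, "b": 2, "c": 3}))
    print(min_and_max([3, 1, 2]))
```

Keeping the Python functions free of any R-specific logic means they can be exercised from plain Python first and called from R afterwards.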
The introductory vignette gives plenty of examples. Let’s try a couple.
Basic Usage #
With reticulate, we can import ordinary Python packages and do things with them:
library(reticulate)
os <- import("os")
os$listdir("/Users/alexreinhart/Desktop")
This is equivalent to:
import os
os.listdir("/Users/alexreinhart/Desktop")
We can do fancier things, like calling NumPy and doing arithmetic on the result in R:
library(reticulate)
np <- import("numpy")
a <- np$arange(4)
a * 2
You can run entire Python scripts or bits of code and then access their results:
library(reticulate)
py_run_file("some_script.py")
py_run_string("x = 10")
py$x #=> 10
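As a sketch of what a script run this way might contain, here is a hypothetical some_script.py. The variable and function names (delays, mean, mean_delay) are invented for illustration. py_run_file() runs the script in Python's main module, so after it finishes the top-level names should be reachable from R as py$delays, py$mean_delay, and py$mean.

```python
# Hypothetical contents for some_script.py; every name here is illustrative.

# Top-level variables become attributes of `py` on the R side.
delays = [12, -3, 45, 0, 7]


def mean(values):
    """Arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


# Computed when the script runs, then readable from R as py$mean_delay.
mean_delay = mean(delays)
```

Only names defined at the top level of the script are visible this way; variables local to a function are not.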
RMarkdown #
You can seamlessly use Python and R inside the same RMarkdown document. For example, consider this file:
First we load `reticulate`:
```{r setup, include=FALSE}
library(reticulate)
```
Then we run some Python code in a Python code block:
```{python}
import pandas
flights = pandas.read_csv("flights.csv")
flights = flights[flights['dest'] == 'ORD']
flights = flights[['carrier', 'dep_delay', 'arr_delay']]
flights = flights.dropna()
```
Then we can read the `flights` variable in R by accessing `py$flights`:
```{r, fig.width=7, fig.height=3}
library(ggplot2)
ggplot(py$flights, aes(x=carrier, y=arr_delay)) + geom_jitter()
```
The vignette has more examples like this.
(You might need to update your knitr package to the latest version for this to work.)
Resources #
- The reticulate package has documentation vignettes introducing it, showing how to use packages, and showing how to use Python in RMarkdown.
rpy2 #
Python is a great general-purpose language with libraries to do just about anything – connect to databases, use web services, make graphical interfaces, whatever. But R has CRAN, with every statistical method imaginable implemented in a package somewhere. If you’re writing Python code, it can be frustrating to find a CRAN package with no Python equivalent.
rpy2 is a Python library that lets you call R directly from inside Python. The documentation is incomplete and unclear, but the basics are straightforward.
Examples #
We can load R packages (like the built-in base and stats packages) easily:
import rpy2.robjects as robjects
from rpy2.robjects.packages import importr
base = importr("base")
stats = importr("stats")
We can run R code by calling robjects.r with a string of code, which returns the value of the last expression:
result = robjects.r("""
foo <- function(x, y) { x + y}
foo(3, 4)""")
If we want to create R data, we can do so explicitly:
xs = robjects.FloatVector([1, 2, 3, 4, 5])
but it’s often easier to let R build the data (note that 1:5 produces an integer vector rather than a float vector):
xs = robjects.r("1:5")
And you can call R functions directly:
robjects.r['sort'](xs, decreasing=True)
which will return an R vector.
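To get ordinary Python values back out of that R vector, one option is to wrap the call in a small helper. This is only a sketch: the function name sort_descending_in_r is made up for illustration, and rpy2 is imported inside the function so that the file can still be loaded on a machine where rpy2 or R is not installed.

```python
def sort_descending_in_r(values):
    """Sort numbers with R's sort() and return them as a plain Python list."""
    # Imported lazily: importing rpy2.robjects starts an embedded R session
    # and fails if R is not available.
    import rpy2.robjects as robjects

    xs = robjects.FloatVector(values)                  # Python numbers -> R numeric vector
    result = robjects.r["sort"](xs, decreasing=True)   # call R's sort() on it
    return list(result)                                # R vector -> list of Python floats


if __name__ == "__main__":
    print(sort_descending_in_r([3, 1, 2]))
```

Converting with list() at the boundary keeps the rest of the Python code independent of rpy2's vector types.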
There’s a lot more: you can use rpy2 to generate R graphics, access R objects and classes, work with data frames and types, use R packages, and so on. Check the documentation for more details.