Programming Language Resources

— Alex Reinhart and Christopher Genovese

Part of the joy of programming is realizing that there are many different programming languages, many of which have very different ways of expressing computational ideas. The structure of a language and the constraints it imposes affect how you think about and approach problems. Learning a new language, especially one very different from one you are familiar with, opens up new ways of thinking. (This is a technological version of the Sapir-Whorf hypothesis.)

We encourage you to experiment with an unfamiliar programming language in this course. There are a couple of ways to choose a language to try:

  1. Pick one that complements what you know. C++, for example, can seamlessly integrate with R using Rcpp, so you may want to learn it so you can write fast C++ code for the core of an algorithm to be used in R. JavaScript can be used for interactive data visualizations and apps on the Web, and may fit with data analysis you do in another language.
  2. Pick one that stretches your brain in a new and different direction. Any variant of Lisp, for example, is reputed to bring profound enlightenment; more pure functional languages like Haskell, OCaml, or F# bring a very different way of thinking than typical R or Python.

Here we provide some resources for those interested in different languages. Some of the books referenced below are part of the statistics student library, in the graduate student offices.

If you want to know what these languages (and many others) look like, look around at Rosetta Code, which presents solutions to programming problems in hundreds of different languages. (Don’t get carried away; many of the languages are esoteric and rarely used.)

#

The quintessential statistical programming language which you’ve probably already seen. Some resources:

  • R-Project.org
  • The Comprehensive R Archive Network, with packages for nearly any statistical task you’d ever want. Also check out the task views, detailed overviews of packages for specific subjects.
  • swirl provides interactive R tutorials from inside R, ranging from basic R syntax to statistical data analysis methods.
  • R for Data Science, by Garrett Grolemund and Hadley Wickham (online and in print), an introduction to using R for statistics.
  • The Art of R Programming, by Norman Matloff, an introduction to R that quickly advances to high-level concepts.
  • The Book of R, by Tilman Davies, a more basic comprehensive introduction to R.
  • Advanced R, by Hadley Wickham, covering the more esoteric and advanced features of R.
  • R Language for Programmers, a quick review of some of R’s features for those who already know another language.
  • The tidyverse style guide is Hadley Wickham’s guide to R style, formatting, and naming; following these rules will keep your code consistent with other popular R packages.
  • lintr analyzes your R code for style issues and common mistakes, following the tidyverse style. RStudio has its diagnostics built in as an option, and other text editors can be configured to use lintr too; see its documentation for details.
  • roxygen2 generates R documentation from specially formatted comments; the comment format, which is a good style to use even if you’re not using roxygen2, is specified here.

Python #

Another very popular dynamic language.

  • Python.org
  • The Python documentation is quite good, including the tutorial and the library reference.
  • Snakify, a nifty online interactive tutorial for the very basics.
  • PEP 8 is the standard Python style guide, explaining the conventions for naming and formatting; PEP 257 describes documentation conventions.
  • Common Gotchas explains some common mistakes made by new Python programmers. The same site also has a list of references for learning Python and some advice on style.
  • Jupyter, a project providing a web notebook interface to writing Python (and other) code.
  • NumPy, a library for n-dimensional arrays and matrices, plus linear algebra functions. The foundation of all other scientific packages for Python.
  • SciPy, an extensive collection of mathematical and scientific functions for Python.
  • pandas, a package providing high-performance data frames and time series tools.
  • statsmodels, a collection of statistical models implemented in Python.
  • scikit-learn, machine learning algorithms in Python.
  • Python Crash Course, by Eric Matthes, a well-regarded introduction to programming in Python.
  • Automate the Boring Stuff with Python, by Al Sweigart, an introduction to using Python for automating boring repeated tasks, like scraping data off websites, organizing files, or fiddling with images.

Julia #

A new language for scientific computing, intended to be as flexible and easy to use as R or Python, but without giving up speed close to C or Fortran.

Racket #

A descendant of the age-old Scheme programming language, with many modern features and a fairly extensive standard library. Provides many tools for building your own small programming languages to solve specific problems, like Datalog, a logic programming language.

  • Racket-lang.org
  • The Racket Guide
  • How to Design Programs, a textbook introducing Racket from scratch while also teaching the basic principles of program design.
  • Realm of Racket, by a host of authors, an introduction to Racket through games (and cartoons).
  • Matthew Butterick’s Beautiful Racket, an introduction to Racket by building a series of simple programming languages.
  • Matthew Butterick’s Why Racket? Why Lisp?, an elegant argument for the virtue of Racket, Lisp, and related languages (like Clojure, below).

Clojure #

A modern descendant of Lisp (from the same family as Scheme, and hence Racket), running on the industrial-strength Java Virtual Machine. (This language also runs on the web atop Javascript as ClojureScript.)

Like other Lisps (and Racket), Clojure uses a prefix style, so whereas you might right a function call in R or C as

func(arg1, arg2, arg3)

in Clojure, you just move the first parenthesis before function name, like

(func arg1, arg2, arg3)

and you can even drop the commas if you like (though you don’t have to)

(func arg1 arg2 arg3)

That small change makes Clojure (and other Lisp) syntax very easy to use. You see more parentheses, but after a short time, many advantages for working with code structured this way will become clear.

Resources:

  • Clojure.org, the main language site. See also ClojureScript.org for the version of the language that compiles to JavaScript and is outstanding for web apps.
  • Clojure for the Brave and True, by Daniel Higginbotham, an introduction to “the most powerful and fun programming language on the planet” via Vikings.
  • The Joy of Clojure, by Chris Houser and Michael Fogus, an introduction to programming in Clojure.

Haskell #

Lisp advocates believe Lisp is the path to enlightenment; Haskell advocates would beg to differ. A functional programming language heavily rooted in mathematics, with an advanced type system which allows the compiler to detect many types of bugs before the code even runs.

  • Learn You a Haskell for Great Good!, by Miran Lipovača, a slightly silly (but good!) introduction to Haskell.
  • Real World Haskell, by Bryan O’Sullivan, Don Stewart, and John Goerzen, a more serious in-depth book.
  • Category Theory for Programmers, a blog series by Bartosz Milewski, explaining the mathematical theory underpinning Haskell and many of its most advanced features, and also helping you understand why Haskell programmers make jokes like “A monad is just a monoid in the category of endofunctors, what’s the problem?”

JavaScript #

The language of the Web. JavaScript is understood by every web browser, so it’s widely used for interactive features on web pages, but also much more – server software, apps, and everything in between has been written in JavaScript.

Rust #

A low-level systems programming language, intended as a replacement for C and C++, which uses a strong type system to prevent crashes and security bugs. Fairly new and still being improved upon. If you’re thinking of writing high-performance servers, desktop and command-line applications, or highly parallel programs, Rust may be worth a look.