Writing Command-Line Programs

— Christopher Genovese and Alex Reinhart

In the lecture on the shell, we discussed the command-line shell and the ways it lets you connect together simple commands to do complicated things. We also discussed how writing your programs as commands – as scripts that can be run from the command line, with arguments controlling their behavior – makes it easy to automate your programs. An entire data analysis pipeline could be built to run at the command line.

Creating commands

Because of how commands are defined and found, we can create our own commands. Let’s start with commands that just run a file containing shell commands.

Here’s a goofy example. Create a file blah in your current directory containing the following four lines:

#!/bin/bash
echo The first argument is $1.
echo The files in the directory above the current one that start with g are:
ls -1 .. | grep --ignore-case '^g'

The first line is a special comment (delimited by #) that I’ll explain in a moment. What do you make of the rest?

At the shell prompt, type

chmod +x blah

(This might not work for Windows users, but don’t worry.)

Now try two commands (for Windows users only the second might work):

./blah first second third fourth

and

bash blah first second third fourth

Why did I include the ./ in the first command? What does all this mean?

The file blah is an example of a shell script, a list of shell commands that can itself be run as a command.

The first line #!/bin/bash is called a shebang. It is a special comment that tells the shell how to process the file as a command – what program will interpret these commands. In this case, it is the shell (bash) itself, so running ./blah essentially does what we spelled out by hand in the second command.

The command chmod +x blah makes the file blah executable, telling the shell that it is OK to use that file as a command. The second version of the command does not require that step, because there bash is the command being run and blah is just a file that it reads.
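
To make the shebang idea concrete, here is a minimal Python sketch that peeks at the first line of a file and reports which interpreter it names. The helper read_shebang is a hypothetical name invented for this illustration, and splitting the line on whitespace is a simplification: real systems differ in how they treat anything after the interpreter path.

```python
import os
import tempfile


def read_shebang(path):
    """Return the words after '#!' on a file's first line, or None if absent."""
    with open(path, "rb") as f:
        first_line = f.readline()
    if not first_line.startswith(b"#!"):
        return None  # no shebang, so the file does not name an interpreter
    # Drop the "#!" marker and split the rest on whitespace (a simplification).
    return first_line[2:].decode("utf-8", errors="replace").split()


# Demo: write a tiny script like blah to a temporary directory and inspect it.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "blah")
    with open(path, "w") as f:
        f.write("#!/bin/bash\necho The first argument is $1.\n")
    print(read_shebang(path))
```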

We can write scripts in other languages too. Most interactive programming languages provide a command-line tool that will run a program stored in a file. For example:

python3 batch.py
Rscript analyze.R
ruby frobnicate.rb
racket anagrams.rkt
julia maxSubSum.jl

You can always run scripts this way, but if you want, you can turn your script into its own command, so you can run ./batch.py instead of python3 batch.py.

To do so:

  1. Add a shebang, e.g., #!/usr/bin/env python3, to the first line of the file, specifying which program is used to run it.
  2. Change the file to executable, e.g., chmod +x foo.

Here’s a file foo after these steps:

#!/usr/bin/env python3

def frobnicate():
    pass

frobnicate()

Now you can run the script as ./foo when you are in the same directory, or type its full path, /Users/genovese/frobs/foo. More conveniently, you can put foo in a directory on your PATH and just type foo.

(Again, Windows is a bit stodgier here. It doesn’t understand shebangs, so you need to write python3 foo instead of just foo.)
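
The two steps above can also be carried out from Python, which may help show what chmod +x actually changes. This is only a sketch for Unix-like systems: make_executable is a hypothetical helper, and it turns on the execute bits unconditionally, which is roughly (not exactly) what chmod +x does.

```python
import os
import stat
import tempfile


def make_executable(path):
    """Turn on the execute permission bits, roughly like `chmod +x path`."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


# Step 1: the script text begins with a shebang naming its interpreter.
script = "#!/usr/bin/env python3\nprint('hello from foo')\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "foo")
    with open(path, "w") as f:
        f.write(script)
    # Step 2: mark the file as executable, then check the permission bit.
    make_executable(path)
    print(bool(os.stat(path).st_mode & stat.S_IXUSR))
```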

Reading standard input, output, and error

In the shell lecture, we discussed the three input and output streams: standard input, standard output, and standard error. We saw that these could be redirected in various ways.

Your programs can also use these streams.

Reading from standard input in a program is easy, though the details vary from language to language.

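In Python, the fileinput module provides an easy way to read from standard input, just as though you had opened a file. (If file names are given as command-line arguments, fileinput.input() reads those files instead; with no arguments, it reads standard input.)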

import fileinput

# a bunch of code goes here

for line in fileinput.input():
    do_stuff(line)
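
As a slightly fuller sketch of the same pattern, the hypothetical function below counts lines and words from any iterable of lines. An io.StringIO object stands in for standard input so that the example runs on its own; in a real script you would pass fileinput.input() (or sys.stdin) instead.

```python
import io


def count_lines_and_words(lines):
    """Count the lines and whitespace-separated words in an iterable of lines."""
    n_lines = 0
    n_words = 0
    for line in lines:
        n_lines += 1
        n_words += len(line.split())
    return n_lines, n_words


# Stand-in for standard input, used here only so the sketch is self-contained.
fake_stdin = io.StringIO("alpha beta\ngamma\n")
print(count_lines_and_words(fake_stdin))  # prints (2, 3)
```

Because the function only needs something it can loop over line by line, the same code works whether the lines come from a pipe, a redirected file, or a test.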

In R, stdin is a special file that you can open and read like any other:

#!/usr/bin/Rscript

## do stuff here

f <- file("stdin")
open(f)

lines <- readLines(f)
do_stuff_with(lines)

If you know that the standard input will be formatted as a CSV, you can even write read.csv("stdin").

Writing to standard output is easier: just write output like you normally do, with cat or print or println et cetera.

In R, the message and warning functions print to standard error by default. For example,

print(a_bunch_of_data)

message("Processed ", nrow(data), " data entries, ", num_errors, " failed")

The result of print will go to wherever we direct our output; the result of message to where we direct our errors. This is very useful.

In Python 3, we can similarly write

import sys

print("A big important warning message", file=sys.stderr)

All the redirection and pipeline features discussed in the shell lecture apply here. The standard output of one command can be piped into the standard input of another, and so on. We’ll often make use of these features.
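
Here is a small Python sketch of keeping results and diagnostics on separate streams, mirroring the R example above. The function process and its doubling of each number are invented purely for illustration; the point is only which stream each print call targets.

```python
import sys


def process(records, out=None, err=None):
    """Write doubled numbers to `out`; report problems and a summary on `err`."""
    out = sys.stdout if out is None else out
    err = sys.stderr if err is None else err
    num_errors = 0
    for record in records:
        try:
            value = float(record)
        except ValueError:
            # Diagnostics go to standard error so they never mix with results.
            num_errors += 1
            print(f"skipping bad record: {record!r}", file=err)
            continue
        print(value * 2, file=out)
    print(f"Processed {len(records)} data entries, {num_errors} failed", file=err)
    return num_errors


process(["1.5", "oops", "4"])
```

If this were saved as a script, redirecting with > would capture only the doubled numbers, while 2> would capture only the messages.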

Command-line arguments

As we discussed in the shell lecture, commands can take arguments. These arguments can be used by the command to decide what it should do.

For example, the new-homework command you use to submit your homework assignments takes several arguments. The most important argument is the name of the homework assignment you want to start, but it also takes an argument specifying the language you want to use, and various optional flags to adjust how it sets up the homework.

For historical reasons, commands you encounter often use different conventions and naming schemes for their arguments. Fortunately, most programming languages have libraries for handling command-line arguments, so you don’t need to worry about the details.

In Python, the argparse module is all you will ever need. Your code can specify the types of arguments it takes, and argparse handles parsing them and even printing error and help messages if the user provides the wrong arguments. It has a useful tutorial you can use to get started.
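
As a minimal sketch of what this looks like, the parser below accepts one required argument and two optional ones. The argument names are hypothetical, loosely inspired by the homework example above, and are not the real interface of new-homework. An explicit list is passed to parse_args so the example runs anywhere; calling parse_args() with no arguments reads the actual command line.

```python
import argparse


def build_parser():
    """Build a parser for a hypothetical homework-setup command."""
    parser = argparse.ArgumentParser(
        prog="homework-sketch",
        description="Illustrative only: set up a homework assignment.",
    )
    # A positional argument: required, identified by its position.
    parser.add_argument("assignment", help="name of the assignment to start")
    # An option that takes a value, restricted to a fixed set of choices.
    parser.add_argument("--language", choices=["python", "r"], default="python",
                        help="language to use (default: python)")
    # A flag: present means True, absent means False.
    parser.add_argument("-f", "--force", action="store_true",
                        help="overwrite an existing directory")
    return parser


args = build_parser().parse_args(["shell-basics", "--language", "r"])
print(args.assignment, args.language, args.force)  # prints shell-basics r False
```

With a parser like this, running the script with --help prints a usage message built from the help strings, and an invalid --language value produces an error message and a nonzero exit status.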

For R, the optigrab package is quite easy to use and requires less boilerplate code; check the Using Optigrab vignette for simple examples.

You can also get arguments directly as a list or vector. Try writing a program called arguments.py containing

import sys

print(sys.argv)

or an R script called arguments.R containing

print(commandArgs(TRUE))

and run these at the command line with toy arguments:

python3 arguments.py --some-argument foo bar

Rscript arguments.R --foo bar -b -a -z
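
In Python, the first element of sys.argv is the name of the script itself, so the arguments proper start at index 1; in R, commandArgs(TRUE) returns only the arguments that follow the script name. To see why a library is usually worth it, here is a deliberately naive sketch of sorting arguments by hand. The function split_arguments is hypothetical, and the list it is given mimics what sys.argv would hold for the first command above.

```python
def split_arguments(argv):
    """Naively separate dash-prefixed flags from everything else."""
    rest = argv[1:]  # skip argv[0], the script name
    flags = [arg for arg in rest if arg.startswith("-")]
    positionals = [arg for arg in rest if not arg.startswith("-")]
    return flags, positionals


# Mimics sys.argv for: python3 arguments.py --some-argument foo bar
print(split_arguments(["arguments.py", "--some-argument", "foo", "bar"]))
# prints (['--some-argument'], ['foo', 'bar'])
```

Notice that this approach cannot tell whether foo is the value of --some-argument or a separate positional argument. Resolving that kind of ambiguity is exactly what argument-parsing libraries do for you.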