In the lecture on the shell, we discussed the command-line shell and the ways it lets you connect together simple commands to do complicated things. We also discussed how writing your programs as commands – as scripts that can be run from the command line, with arguments controlling their behavior – makes it easy to automate your programs. An entire data analysis pipeline could be built to run at the command line.
Creating commands #
Because of how commands are defined and found, we can create our own commands. Let’s start with commands that just run a file containing shell commands.
Here’s a goofy example. Create a file `blah` in your current directory containing the following four lines:
```bash
#!/bin/bash
echo The first argument is $1.
echo The files in the directory above the current one that start with g are:
ls -1 .. | grep --ignore-case '^g'
```
The first line is a special comment (delimited by `#`) that I’ll explain in a moment. What do you make of the rest?
At the shell prompt, type

```bash
chmod +x blah
```

(This might not work for Windows users, but don’t worry.)
Now try two commands (for Windows users, only the second might work):

```bash
./blah first second third fourth
```

and

```bash
bash blah first second third fourth
```
Why did I include the `./` in the first command? What does all this mean?
The file `blah` is an example of a shell script, a list of shell commands that can itself be run as a command. The first line, `#!/bin/bash`, is called a shebang. It is a special comment that tells the shell how to process the file as a command – what program will interpret these commands. In this case, it is the shell (`bash`) itself. And the shell essentially does a version of what we do in the second command.
The command `chmod +x blah` makes the file `blah` executable, telling the shell that it is OK to use that file as a command. The second version of the command does not require that step.
We can write scripts in other languages too. Most interactive programming languages provide a command-line tool that will run a program. For example:
```shell
python3 batch.py
Rscript analyze.R
ruby frobnicate.rb
racket anagrams.rkt
julia maxSubSum.jl
```
You can always run scripts this way, but if you want, you can turn your script into its own command, so you can run `./batch.py` instead of `python3 batch.py`.
To do so:

- Add a shebang, e.g., `#!/usr/bin/env python3`, as the first line of the file, specifying which program is used to run it.
- Make the file executable, e.g., `chmod +x foo`.
Here’s a file `foo` after these steps:
```python
#!/usr/bin/env python3

def frobnicate():
    pass

frobnicate()
```
Now you can run `./foo` as the command when in the same directory, or, more conveniently, put `foo` in your path and just type `foo`, or type the full path `/Users/genovese/frobs/foo`.
(Again, Windows is a bit stodgier here. It doesn’t understand shebangs, so you need to write `python3 foo` instead of just `foo`.)
Reading standard input, output, and error #
In the shell lecture, we discussed the three input and output streams: standard input, standard output, and standard error. We saw that these could be redirected in various ways.
Your programs can also use these streams.
Reading from standard input in a program is easy, though the details vary from language to language.
In Python, the `fileinput` module provides an easy way to read from standard input, just as though you had opened a file:
```python
import fileinput

# a bunch of code goes here

for line in fileinput.input():
    do_stuff(line)
```
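As a sketch of how such a filter looks in full (the function name here is invented for illustration), the following numbers the lines it is given; in a real script you would feed it `fileinput.input()` or `sys.stdin`:

```python
import sys

def number_lines(lines, out=sys.stdout):
    """Write each line prefixed with its 1-based line number."""
    for i, line in enumerate(lines, start=1):
        out.write(f"{i}: {line}")

# In a real filter, `lines` would be fileinput.input() or sys.stdin;
# a plain list lets the sketch run on its own.
number_lines(["alpha\n", "beta\n"])
```

Saved as an executable file with a `#!/usr/bin/env python3` shebang and fed `fileinput.input()`, this behaves like a classic Unix filter: you could pipe `ls` into it, for instance.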
In R, `stdin` is a special file you can open and read like any other file:
```r
#!/usr/bin/Rscript

## do stuff here

f <- file("stdin")
open(f)
lines <- readLines(f)
do_stuff_with(lines)
```
If you know that the standard input will be formatted as a CSV, you can even write `read.csv("stdin")`.
Writing to standard output is easier: just write output like you normally do, with `cat` or `print` or `println`, et cetera.
In R, the `message` and `warning` functions print to standard error by default. For example,
```r
print(a_bunch_of_data)
message("Processed ", nrow(data), " data entries, ", num_errors, " failed")
```
The result of `print` will go to wherever we direct our output; the result of `message`, to wherever we direct our errors. This is very useful.
In Python 3, we can similarly write

```python
import sys

print("A big important warning message", file=sys.stderr)
```
All the redirection and pipeline features discussed in the shell lecture apply here. The standard output of one command can be piped into the standard input of another, and so on. We’ll often make use of these features.
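To see these conventions working together in one small program, here is a minimal Python sketch (the function name and the `ERROR` pattern are invented for illustration): the result goes to standard output, so it can be redirected or piped onward, while a diagnostic goes to standard error, so it still reaches the terminal.

```python
import sys

def count_matches(lines, pattern):
    """Count the lines containing pattern; return (matches, total)."""
    matches = total = 0
    for line in lines:
        total += 1
        if pattern in line:
            matches += 1
    return matches, total

# In a real filter you would pass sys.stdin as `lines`; a small list
# lets the sketch run on its own.
matches, total = count_matches(["ERROR: disk\n", "ok\n", "ERROR: net\n"], "ERROR")
print(matches)                                    # the result: standard output
print(f"scanned {total} lines", file=sys.stderr)  # a diagnostic: standard error
```

Piping this script's output into another command would pass along only the count; the "scanned" message would still appear on your screen.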
Command-line arguments #
As we discussed in the shell lecture, commands can take arguments. These arguments can be used by the command to decide what it should do.
For example, the `new-homework` command you use to submit your homework assignments takes several arguments. The most important argument is the name of the homework assignment you want to start, but it also takes an argument specifying the language you want to use, and various optional flags to adjust how it sets up the homework.
For historical reasons, commands you encounter often use different conventions and naming schemes for their arguments. Fortunately, most programming languages have libraries for handling command-line arguments, so you don’t need to worry about the details.
In Python, the `argparse` module is all you will ever need. Your code can specify the types of arguments it takes, and `argparse` handles parsing them and even printing error and help messages if the user provides the wrong arguments. It has a useful tutorial you can use to get started.
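The gist of `argparse` fits in a few lines. In this sketch, the argument names are invented for illustration (loosely echoing `new-homework`), not taken from any real tool, and we pass an explicit list to `parse_args` so the sketch runs on its own; in a real script you would call `parser.parse_args()` with no arguments to read `sys.argv`.

```python
import argparse

parser = argparse.ArgumentParser(description="Set up a homework assignment.")
parser.add_argument("assignment", help="name of the assignment to start")
parser.add_argument("--language", default="python",
                    help="language to set up (default: python)")
parser.add_argument("--verbose", action="store_true",
                    help="print extra progress messages")

# Normally parse_args() reads sys.argv; here we simulate a command line.
args = parser.parse_args(["hw1", "--language", "r", "--verbose"])
print(args.assignment, args.language, args.verbose)  # hw1 r True
```

If a user omits the required `assignment` argument or passes an unknown flag, `argparse` prints a usage message to standard error and exits, with no extra code on your part.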
For R, the `optigrab` package is quite easy to use and requires less boilerplate code; check the Using Optigrab vignette for simple examples.
You can also get arguments directly as a list or vector. Try writing a program called `arguments.py` containing

```python
import sys

print(sys.argv)
```
or an R script called `arguments.R` containing

```r
print(commandArgs(TRUE))
```
and run these at the command line with toy arguments:

```shell
python3 arguments.py --some-argument foo bar
Rscript arguments.R --foo bar -b -a -z
```