During your research careers, many of you will discover that your personal laptop is not enough to do your work. Perhaps you need to load a 50GB dataset into memory; perhaps you need to do thousands of time-consuming simulations; perhaps you need to run a process that will take many days to complete. None of these are well-suited to a typical laptop.
When this happens, you have several options:
- Buy a bigger laptop, or a desktop workstation.
- Run your code on a server maintained by the Department for research use.
- Run your code on a commercial cloud provider’s servers, such as Amazon AWS or Microsoft Azure.
- Run your code on a supercomputer, such as those at the Pittsburgh Supercomputing Center.
All of these options are available (and you will have heard about several at our seminars on departmental computing). But for the last three options, you need a way to remotely run software on a server.
Introducing the Secure Shell (SSH)
Earlier we introduced the shell, which gives you a way to run commands (including your own R or Python scripts) in a command-line environment.
The Secure Shell protocol, known as SSH, is a way to log in to a remote computer and work in a similar shell environment – except your commands run on the remote computer, not your own.
To use SSH, you need:
- A remote server to connect to
- SSH running on that server to receive your requests and run your commands
- An account on that server, so you can log in and have permission to run commands
- An SSH client that runs on your computer, to send your commands to the remote server
Fortunately, SSH is extremely common: just about anyone with a Linux server administers it through SSH. As a result, SSH clients are easy to come by. Those of you on Linux or Mac computers already have one installed; on Windows, Git Bash includes an SSH client (as do recent versions of Windows 10 and 11), or you can download PuTTY or Tectia for graphical clients.
The SSH protocol sets up an encrypted connection between your computer and the server. You type commands just as you would in a shell on your own computer, and the output of those commands is sent back to you as if they had run locally.
Making an SSH Connection
The ssh command creates an SSH connection. To connect, run
ssh yourusername@server.name.example.com
The part before the @ is your username on the server. The part after the @ is called the “hostname”; different servers have different hostnames just like different websites have different URLs.
(If you prefer PuTTY or Tectia, they have dialog boxes that have you enter your username and desired hostname.)
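If you connect to the same server often, the OpenSSH client can store the username and hostname under a short alias in its configuration file, normally ~/.ssh/config. The sketch below is a minimal illustration, assuming the OpenSSH command-line client. The alias "research" is made up for this example, and the entry is written to a temporary file rather than your real configuration, so running it changes nothing. The -G option only prints the settings SSH would use; it does not connect.

```shell
# Write a sample entry to a temporary file
# (in real use it would go in ~/.ssh/config).
config_file=$(mktemp)
cat > "$config_file" <<'EOF'
# "research" is a hypothetical alias chosen for this example
Host research
    HostName server.name.example.com
    User yourusername
EOF

# -F selects the configuration file to read;
# -G prints the resolved settings for the alias without connecting.
resolved=$(ssh -F "$config_file" -G research | grep -E '^(user|hostname) ')
echo "$resolved"

# Clean up the temporary file.
rm -f "$config_file"
```

With an entry like this in ~/.ssh/config, typing ssh research would stand in for the longer username-and-hostname form.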
The first time you connect to a particular server, you’ll get a message like this:
The authenticity of host 'server.name.example.com (127.0.0.1)' can't be established.
RSA key fingerprint is 30:9a:97:f3:19:4f:d1:6e:28:76:9e:e7:d1:df:2c:31.
Are you sure you want to continue connecting (yes/no)?
SSH uses public-key cryptography to authenticate the server. The “key fingerprint” is a short hash of the server’s public key; if someone told you the fingerprint in advance, you can compare it with the one shown to be sure you are reaching the right server and not an impostor sitting between you and it. If not, just type “yes” and cross your fingers. Either way, your SSH client remembers the server’s key and will warn you if it ever changes on a later connection.
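To see where a fingerprint comes from, you can compute one yourself. The sketch below assumes the OpenSSH tools (ssh-keygen) are installed. It generates a throwaway key pair in a temporary directory, purely for illustration, and prints the fingerprint of the public half. A server administrator could run the same ssh-keygen -l -f command on the server's own host key file to produce the fingerprint you compare against; where that file lives depends on the server (on many Linux systems it is under /etc/ssh/).

```shell
# Make a throwaway key pair in a temporary directory (illustration only).
# -q is quiet, -N "" sets an empty passphrase, -f names the output file.
tmpdir=$(mktemp -d)
ssh-keygen -q -t ed25519 -N "" -f "$tmpdir/demo_key"

# -l prints the fingerprint of the key stored in the file given by -f.
# Recent OpenSSH versions show a SHA256 fingerprint by default.
fingerprint=$(ssh-keygen -l -f "$tmpdir/demo_key.pub")
echo "$fingerprint"

# -E md5 requests the older colon-separated hexadecimal style.
md5_fingerprint=$(ssh-keygen -l -E md5 -f "$tmpdir/demo_key.pub")
echo "$md5_fingerprint"

# Remove the throwaway keys.
rm -rf "$tmpdir"
```

The fingerprint printed will differ on every run, because a new key is generated each time.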
You’ll then be asked for your password; note that when you type, no text will appear, but your password is still being received.
Then you’ll get something like this:
Linux server.name.example.com 4.19.0-17-amd64 #1 SMP Debian 4.19.194-3 (2021-07-18) x86_64
The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.
Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Mon Sep 20 09:39:37 2021 from [IP address]
yourusername@example:~$
That last line is a shell prompt: you can now type commands. The stuff above it is an introductory message that can vary from server to server.
To end an SSH connection, use the exit command.
yourusername@example:~$ exit
Connection to server.name.example.com closed.
Transferring Files
scp, short for “secure copy”, uses the SSH protocol to copy files from one computer to another.
For example, suppose you have a file foo.txt in your current directory.
scp foo.txt yourusername@server.name.example.com:~/some-directory/
This copies foo.txt into some-directory/ in your home directory on the server.
Symmetrically,
scp yourusername@server.name.example.com:~/some-directory/data.txt .
copies some-directory/data.txt from the server to your current directory (signified by .).
You can also use wildcards:
scp "yourusername@server.name.example.com:~/data/*.csv" data/
This copies all CSV files in the given directory on the server into the data/ directory on your computer (which must already exist). Notice the quotation marks: they prevent your local shell from trying to expand the wildcard * itself, so the server interprets it instead.
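The effect of the quotation marks can be seen without any server. This small demonstration runs entirely on your own computer, in a temporary directory with two made-up file names. It shows that an unquoted * is expanded by the local shell before the command ever sees it, while a quoted one is passed through unchanged.

```shell
# Create a scratch directory containing two example CSV files.
scratch=$(mktemp -d)
touch "$scratch/a.csv" "$scratch/b.csv"

# Unquoted wildcard: the local shell replaces the pattern with the
# names of the matching local files, one argument per file.
unquoted=$(printf '%s\n' "$scratch"/*.csv)
echo "$unquoted"

# Quoted wildcard: the shell leaves the pattern alone, so the command
# receives the literal text ending in *.csv as a single argument.
quoted=$(printf '%s\n' "$scratch/*.csv")
echo "$quoted"

# Clean up the scratch directory.
rm -rf "$scratch"
```

In the scp command, an unquoted remote pattern usually matches no local file, and what happens then depends on your shell: some pass the pattern through unchanged, others report an error. Quoting avoids relying on that behavior.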
Running and Multiplexing Commands
At the most basic level, running commands over SSH is just like running commands on your computer. Try running pwd or ls or anything like that.
But often you’ll want to run things that take a while. If you disconnect from SSH (such as by typing exit or having your WiFi go down), you will be logged out from the server, and the programs you started in that session will normally be killed.
You might also want to do several things at the same time. Maybe you want to edit some code, run some different code, and produce plots from some data, all (nearly) at the same time. (Maybe your code takes 20 minutes to run, so while it runs, you switch to tweaking some other code to make some plots.)
This is where you want a terminal multiplexer. A terminal multiplexer lets you run several shells at the same time and keep them running even after you log out.
I recommend tmux. In tmux, you create a “session”; inside that session you can have multiple “windows”, which are just separate shells. You can switch between them with various keyboard shortcuts. You can then “detach” from a session and log out of the server while everything inside it keeps running; when you return, you “attach” to the session again and pick up where you left off.
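A typical tmux workflow can be sketched as follows. The commands assume tmux is installed on the server; the session name "analysis" and the window names "code" and "plots" are arbitrary choices for this example. The session is started detached here so the commands can be run exactly as written, and the comments give the interactive equivalents using tmux's default key bindings.

```shell
# Start a new session named "analysis" whose first window is named "code".
# -d starts it detached; interactively you would omit -d to land inside it.
tmux new-session -d -s analysis -n code

# Add a second window (a separate shell) to the same session.
# Inside tmux, the default shortcut for a new window is Ctrl-b then c.
tmux new-window -t analysis: -n plots

# List the names of the windows in the session.
windows=$(tmux list-windows -t analysis -F '#{window_name}')
echo "$windows"

# To work inside the session:       tmux attach -t analysis
# To detach and leave it running:   press Ctrl-b then d
# To see your sessions after logging back in:  tmux list-sessions

# When you are completely finished, end the session and everything in it.
tmux kill-session -t analysis
```

In day-to-day use you would skip the final kill-session step until your long-running work is done, since the point of the session is that it outlives your SSH connection.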
See the Tmux Cheat Sheet for more information.