SSH Access to HPC4Health

The most basic way to interact with the HPC4Health compute cluster is to use Secure Shell (SSH) to open a remote terminal. SSH provides an encrypted tunnel to a remote server over which data and commands can be securely transferred.

An SSH utility should come preinstalled on Windows, macOS, and most Linux distributions. If for some reason you don't have one, you can install OpenSSH (the most widely used open-source SSH implementation) as follows:

Windows: Get started with OpenSSH for Windows
macOS: Homebrew OpenSSH
Ubuntu: OpenSSH Server for Ubuntu
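
Before installing anything, it may be worth confirming whether a client is already present. The following is a minimal check, assuming a POSIX-style shell (on Windows this means something like Git Bash or WSL; in PowerShell you can simply run `ssh -V`):

```shell
# Report whether an SSH client is available on the PATH.
if command -v ssh >/dev/null 2>&1; then
    ssh_status="installed"
    # ssh -V prints the client version; it writes to stderr, so redirect it
    ssh -V 2>&1
else
    ssh_status="missing"
fi
echo "SSH client: $ssh_status"
```

If the result is `missing`, follow the installation link for your operating system above.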

Remote Access Nodes

Here is a summary of the four main node types an H4H user can connect to remotely:

Node Type	Names	Description	Access	Notes
Login Node	login node	Provides 50 GB of personal space for scripts, metadata, local package installations, etc.	Personal directory only	No access to the group directory! Able to request resources and submit jobs via the Slurm scheduler
Data Node	`data node`	Access to the group directory, where datasets should be stored	Personal directory; group directory	No access to request resources or submit jobs!
Compute Node	node[1-100] (e.g. `node1`, `node2`, ...)	Where jobs are run and data is processed	Personal directory; group directory	No access to the internet! Can submit jobs via the Slurm scheduler. Name is determined after the node is assigned to you
Build Node	`uhnslurmbuildbox`	Where software is built and installed for your use	Personal directory only	No access to the group directory! Only for building software

Login Nodes

The Login Nodes are the primary point of access to the H4H cluster. From here, you can request additional compute resources (via Compute Nodes), which can work with large datasets in your lab's group directory.

You have only 50 GB of storage on the Login Node, so be mindful of the files you keep there. If you need to store data, put it in an appropriate folder on the Data Node and follow the H4H Data Management Plan (DMP).
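
To keep an eye on that allowance, a small helper along the following lines can report how much space a directory uses. This is only a sketch: `report_usage` is a hypothetical function name, the 50 GB default is the figure quoted above, and the way H4H itself accounts for usage may differ from what `du` reports.

```shell
# Print "<used> of <limit> GB used in <dir>" for a directory.
# Arguments: directory (default: $HOME), limit in GB (default: 50).
report_usage() {
    dir="${1:-$HOME}"
    limit_gb="${2:-50}"
    # du -sk prints the total size in 1024-byte units
    kb=$(du -sk "$dir" 2>/dev/null | awk '{print $1}')
    # Round up to whole gigabytes (1 GB = 1048576 KB here)
    used_gb=$(( (kb + 1048575) / 1048576 ))
    echo "${used_gb} of ${limit_gb} GB used in ${dir}"
}
```

Running `report_usage` with no arguments checks your home directory; `report_usage ~/scripts` checks a single folder.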

Data Node

The Data Node (also referred to as the data transfer node) can be used to upload or download small- to medium-sized files to or from H4H. It can also be mounted so that you can work with remote files as if they were on your local machine (see SSH file system).

This node has access to group directories which are shared between all members of a lab group. This is where you should store large data files associated with your project or analysis.

This node does not have access to additional compute resources, so you will need to submit jobs to the Compute Nodes to run analyses on the cluster.

Compute Nodes

Running analyses on the cluster requires submitting jobs (either batch or interactive) to run in a virtual machine on one of the cluster nodes. Jobs can be submitted only from the Login Nodes; once a job is running, your script has access to your group directory, where large data files associated with your project or analysis are stored.
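
As a rough illustration of what a batch job looks like, the sketch below writes a minimal Slurm job script to a temporary file and checks it for shell syntax errors. The job name and resource values are hypothetical, and no partition is specified because partition names are not given here; consult the Slurm tutorial for values appropriate to H4H.

```shell
# Write a minimal Slurm batch script to a temporary file.
job_script="$(mktemp "${TMPDIR:-/tmp}/h4h_job.XXXXXX")"
cat > "$job_script" <<'EOF'
#!/bin/bash
#SBATCH --job-name=hello_h4h
#SBATCH --time=00:05:00
#SBATCH --cpus-per-task=1
#SBATCH --mem=1G
#SBATCH --output=hello_h4h_%j.out

# Commands below run on the Compute Node that Slurm assigns to the job
echo "Running on $(hostname)"
EOF

# Parse the script without executing it to catch syntax errors
bash -n "$job_script" && echo "Job script OK: $job_script"

# From a Login Node, the script would then be submitted with:
#   sbatch "$job_script"
# An interactive session could be requested with, for example:
#   salloc --time=00:30:00 --cpus-per-task=1 --mem=1G
```

In the `--output` pattern, `%j` is replaced by Slurm with the job ID, so each run writes to its own log file.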

The Slurm tutorial will show you how to use Slurm to view information about available Nodes and Partitions on the cluster. This information is also available in the UHN Bioinformatics and HPC Core Intro document [3], which provides an overview of how to access H4H written by the cluster administrators.

Build Nodes

The Build Nodes are used to compile and install software packages that are not available on the cluster. See more information on Installing Packages.

Connecting from the Terminal

Connecting to H4H requires that you:

Are on the UHN intranet or connected remotely through the UHN VPN service (see VPN Setup)
Have an active H4H account to authenticate to the cluster (see Getting an HPC4Health Account)

The first step in using H4H for your analyses is to connect to the cluster from your local machine. If you have not already done so, it is recommended that you complete the Tutorial on Environment Setup to set up some useful environment variables on your local machine, which make it easier to connect to the cluster. Once connected, you have access to a wide range of compute resources, allowing you to scale up your analysis across multiple virtual machines, each with tens to hundreds of CPUs, large amounts of RAM, and the option to add GPU accelerators if appropriate for your use case.

A Guide to SSH

If you will be working with remote servers in your career, it is worth understanding the basics of SSH and taking advantage of the many features it offers. For a more detailed overview of SSH see the tutorial on SSH.

Login Node

As alluded to above, the Login Node is your home on the cluster. It can be used to submit jobs and generally interact with the available compute resources using the Slurm scheduler and command-line utilities.

The login node can be accessed via ssh using:

ssh -p "$H4HLOGIN_PORT" "<username>@$H4HLOGIN"

Where <username> is replaced with your H4H username. This command will prompt you to enter your password before making the SSH connection.
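
The command above relies on the `H4HLOGIN` and `H4HLOGIN_PORT` environment variables from the Environment Setup tutorial. The sketch below shows one way to guard against them being unset and to preview the command before running it. The host `login.example.org`, port `22`, username `jdoe`, and the helper name `h4h_login_cmd` are all placeholders for illustration, not the real H4H settings.

```shell
# Use placeholder values ONLY if the variables are not already set.
# Replace these with the real values from the Environment Setup tutorial.
: "${H4HLOGIN:=login.example.org}"
: "${H4HLOGIN_PORT:=22}"

# Print (rather than run) the SSH command for a given username.
# ${1:?...} stops with an error message if no username is supplied.
h4h_login_cmd() {
    user="${1:?usage: h4h_login_cmd <username>}"
    printf 'ssh -p %s %s@%s\n' "$H4HLOGIN_PORT" "$user" "$H4HLOGIN"
}

h4h_login_cmd jdoe
```

Printing the command first is a cheap way to confirm the variables expanded as expected before you are asked for a password.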

Data Node

The Data Node is primarily intended to allow users to upload and download small data files such as scripts, job configurations, or metadata to and from H4H. An example of transferring files via the Data Node can be found in examples/h4h_files_transfer.sh. Never transfer sensitive information such as PHI off the server!
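
For orientation, a single-file upload with `scp` might look like the sketch below. It is not taken from examples/h4h_files_transfer.sh, which remains the reference example; the host, port, username `jdoe`, file name, and the helper name `h4h_upload_cmd` are placeholders.

```shell
# Use placeholder values ONLY if the variables are not already set.
: "${H4HDATA:=data.example.org}"
: "${H4HDATA_PORT:=22}"

# Print an scp command that copies a local file to your H4H home directory.
# Note: scp takes the port with an uppercase -P (ssh uses a lowercase -p).
h4h_upload_cmd() {
    user="${1:?username required}"
    file="${2:?local file required}"
    printf 'scp -P %s %s %s@%s:~/\n' "$H4HDATA_PORT" "$file" "$user" "$H4HDATA"
}

h4h_upload_cmd jdoe analysis.sh
```

To download instead, swap the source and destination arguments so that the remote path comes first. The helper does not quote file names, so paths containing spaces would need quoting before the printed command is run.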

In addition to data transfers, access to the data node can be useful to make interacting with the cluster feel more like developing on your local machine. For details on how SSHFS can be used to mount the data drive to your local machine, please see the SSH Tutorial.

The data node can be accessed via ssh using:

ssh -p "$H4HDATA_PORT" "<username>@$H4HDATA"

Where <username> is replaced with your H4H username. This command will prompt you to enter your password before making the SSH connection.

Passwordless Access

SSH keys provide a secure way to prove your identity to a remote server. By generating an SSH key and uploading it to H4H, you can interact with the server without typing your password every time. SSH key authentication is also generally more secure than password authentication and is the preferred method for authenticating to H4H.

The ssh-keygen utility can be used to generate a public and private SSH key pair if you don't already have one. You can then use the ssh-copy-id command to upload your public key to the H4H Login Node, which enables passwordless access (just don't forget your password!).

Generate an SSH Key Pair

If you already use SSH to access other resources, you can reuse your existing SSH public key. Skip this section if this is the case.

# Follow the prompts
ssh-keygen
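
If you would rather see what the prompts correspond to, the options can also be given explicitly. The sketch below generates a throwaway key pair in a temporary directory so that nothing in `~/.ssh` is touched. The key type (`ed25519`), the comment `h4h-demo`, and the output path are illustrative choices rather than H4H requirements, and the empty passphrase is used only to keep the demonstration non-interactive.

```shell
# Demonstration only: create a throwaway key pair in a temporary directory.
demo_dir="$(mktemp -d "${TMPDIR:-/tmp}/h4h_key.XXXXXX")"
if command -v ssh-keygen >/dev/null 2>&1; then
    # -q       : suppress progress output
    # -t       : key type
    # -C       : comment stored with the public key to help identify it
    # -N ""    : empty passphrase (for this demo only)
    # -f       : path of the private key; the public key gets a .pub suffix
    ssh-keygen -q -t ed25519 -C "h4h-demo" -N "" -f "$demo_dir/id_ed25519"
    ls "$demo_dir"
else
    echo "ssh-keygen not found; install OpenSSH first"
fi
```

For a real key, omit `-N ""` and `-f` so that `ssh-keygen` prompts for a passphrase and offers its default location.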

Upload SSH Public Key to H4H

Once you have a public/private SSH key pair, you can upload your public key to H4H using:

# Change username to your own and enter password when prompted
ssh-copy-id -p "$H4HLOGIN_PORT" "<username>@$H4HLOGIN"

You should now be able to connect to H4H without entering your password using:

ssh -p "$H4HLOGIN_PORT" "<username>@$H4HLOGIN"
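
Once key-based access works, an entry in your local SSH configuration can shorten the command further. This is an optional convenience rather than part of the H4H instructions. In the sketch below the alias `h4h`, the username `jdoe`, the key path, and the fallback host and port are all placeholders, and the stanza is written to a temporary file so that your real `~/.ssh/config` is left untouched.

```shell
# Use placeholder values ONLY if the variables are not already set.
: "${H4HLOGIN:=login.example.org}"
: "${H4HLOGIN_PORT:=22}"
h4h_user="jdoe"   # hypothetical username

# Write an example host entry to a temporary file for review.
config_demo="$(mktemp "${TMPDIR:-/tmp}/h4h_ssh_config.XXXXXX")"
cat > "$config_demo" <<EOF
Host h4h
    HostName $H4HLOGIN
    Port $H4HLOGIN_PORT
    User $h4h_user
    IdentityFile ~/.ssh/id_ed25519
EOF
cat "$config_demo"
```

After reviewing the output and appending it to `~/.ssh/config`, connecting would reduce to `ssh h4h`.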
