Running Jupyter on H4H

Jupyter Notebook or Lab will be familiar to most people working on data science in the Python ecosystem. Similar to RMarkdown for R or Pluto.jl for Julia, Jupyter provides a single file to contain your end-to-end analysis, including code, visualizations, and markdown text. It serves as one of the primary methods for conducting an analysis and sharing results for data science, ML, and AI in Python.

Platforms like Google Colab or Databricks provide a Jupyter Notebook interface to the cloud, and allow access to powerful remote computers where a user can leverage GPUs or TPUs to accelerate their analysis and scale up to huge datasets. However, when working with sensitive PHI, it is often not possible to move data off your local HPC cluster.

As a solution, this tutorial will show you how to set up the familiar Jupyter interface on H4H and conduct your analysis with all the resources available via the cluster.

Jupyter Notebook or Lab

Running programs on non-Compute nodes

Running programs or analyses on the Login or Data node is strictly prohibited on H4H. This ensures that performance on these shared nodes remains consistent for all H4H users and is not degraded by a few people taking more than their fair share of the available RAM and CPU. As such, we need to run our notebook on a Compute node, but Compute nodes have no internet access!

There is no visual user interface on the Compute nodes, so we need to set up an SSH bridge that forwards the Jupyter port from the Compute node, through the Login node, to your local machine. Doing so allows you to open Jupyter Lab or Notebook in your local browser via localhost while maintaining access to the cluster.

Follow the tutorial below to learn how.
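If you repeat this workflow often, the port forwarding in step 6 below can also be written as an SSH config entry. This is a sketch only: `h4h-jupyter` is a hypothetical alias, and the placeholder values must be replaced with the literal values of your `$H4HLOGIN`, `$H4HLOGIN_PORT`, and username, since `~/.ssh/config` does not expand shell variables.

```
# ~/.ssh/config -- hypothetical entry; substitute your own literal values
Host h4h-jupyter
    HostName <login_node_address>                  # literal value of $H4HLOGIN
    Port <login_node_port>                         # literal value of $H4HLOGIN_PORT
    User <username>
    LocalForward <local_port> <node_name>:8888     # update <node_name> each time you are allocated a new Compute node
```

With this in place, `ssh -N h4h-jupyter` opens the same tunnel as the full command in step 6. Note that the allocated Compute node changes between sessions, so `<node_name>` must be edited per session.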

Try it yourself:

  1. SSH into the Login Node

    Solution
    ssh -p "$H4HLOGIN_PORT" "<username>@$H4HLOGIN"
    
  2. If you haven't already, install miniconda; follow the instructions in the Installing Packages Tutorial

    Solution
    salloc -p build -c 1 --mem=10G -t 0:30:0
    # Now on the build node
    wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
    bash Miniconda3-latest-Linux-x86_64.sh  # follow the prompt, say yes to conda init!
    source ~/.bashrc  # you should now see (base) prepended to your terminal prompt
    
  3. If you haven't already, install jupyter and/or jupyterlab, generate a config file, and set a default password

    Solution
salloc -p build -c 1 --mem=10G -t 0:30:0  # only needed if you exited the allocation from step 2!
    # Now on build node
    conda activate  # specify a custom environment if desired
    conda install -c conda-forge jupyterlab  # this will take a few minutes and also installs notebook
    jupyter notebook --generate-config  # replace notebook with lab for jupyterlab
jupyter notebook password  # set a remote-access password
    
  4. Allocate an interactive job on a compute node to configure Jupyter (note the node you were allocated!)

    Solution
    salloc -c 2 --mem=8G -t 0:30:0
    
  5. Launch Jupyter Lab or Notebook

    Solution
    jupyter notebook --ip="$(hostname -i)" --port=8888  # use lab for jupyter lab
    
  6. In a terminal on your local computer, set up SSH port forwarding from your local machine, through the Login node, to the Compute node

    Solution
ssh -N -p "$H4HLOGIN_PORT" -L "<local_port>:<node_name>:8888" "<username>@$H4HLOGIN"  # no output means it worked; the tunnel stays alive while the process runs; press CTRL/CMD + C to disconnect
    
  7. Navigate to the Jupyter Notebook at http://localhost:<local_port> and log in with your configured password. You should now have access to your Jupyter Notebook or Lab instance.
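For the `<local_port>` in steps 6 and 7, any free port above 1024 on your local machine will do (8888 works if nothing else is using it). If you want to find a free one programmatically, here is a small sketch that assumes `python3` is available on your local machine:

```shell
# Ask the OS for an unused TCP port by binding to port 0, then release it
local_port=$(python3 -c 'import socket; s = socket.socket(); s.bind(("", 0)); print(s.getsockname()[1]); s.close()')
echo "Using local port: $local_port"
```

You can then substitute `$local_port` for `<local_port>` in the ssh command from step 6 and in the browser URL from step 7.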