Running Jupyter on H4H
Jupyter Notebook or Lab will be familiar to most people working on data science in the Python ecosystem. Similar to RMarkdown for R or Pluto.jl for Julia, Jupyter provides a single file to contain your end-to-end analysis, including code, visualizations, and markdown text. It serves as one of the primary methods for conducting an analysis and sharing results for data science, ML, and AI in Python.
Platforms like Google Colab or Databricks provide a Jupyter Notebook interface to the cloud, and allow access to powerful remote computers where a user can leverage GPUs or TPUs to accelerate their analysis and scale up to huge datasets. However, when working with sensitive PHI, it is often not possible to move data off your local HPC cluster.
As a solution, this tutorial will show you how to set up the familiar Jupyter interface on H4H and conduct your analysis with all the resources available via the cluster.
Jupyter Notebook or Lab
Running programs on non-Compute nodes
Running programs or analyses on the Login or Data nodes is strictly prohibited on H4H. This ensures that performance on these shared nodes remains high for all H4H users and is not degraded by a few people taking more than their fair share of the available RAM and CPU.

As such, we need to run our Notebook on a Compute node, which has no internet access!
There is no graphical user interface on the Compute nodes, so we need to set up an SSH bridge that forwards the Jupyter port on the compute node through the login node to your local machine. Doing so allows you to open Jupyter Lab or Notebook in your local browser via `localhost` while maintaining access to the cluster.
Follow the tutorial below to learn how.
Try it yourself:
- SSH into the Login Node
- If you haven't already, install miniconda; follow the instructions in the Installing Packages Tutorial
Solution
```
salloc -p build -c 1 --mem=10G -t 0:30:0
# Now on the build node
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
# follow the prompt, say yes to conda init!
source ~/.bashrc
# you should now see (base) prepended to your terminal prompt
```
- If you haven't already, install jupyter and/or jupyterlab, generate a config file, and set a default password
Solution
```
salloc -p build -c 1 --mem=10G -t 0:30:0  # only needed if you exited from step 2!
# Now on the build node
conda activate  # specify a custom environment if desired
conda install -c conda-forge jupyterlab  # this will take a few minutes and also installs notebook
jupyter notebook --generate-config  # replace notebook with lab for jupyterlab
jupyter notebook password  # Set up a remote access password
```
- Allocate an interactive job on a compute node to configure Jupyter (note the node you were allocated!)
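A minimal sketch of this step, assuming H4H uses SLURM as the earlier `salloc` commands suggest; the CPU, memory, and walltime values below are example placeholders to adjust for your analysis, and you may need a `-p <partition>` flag depending on your site's configuration:

```shell
# Request an interactive session on a compute node
# (resource values are examples; adjust to your needs)
salloc -c 4 --mem=16G -t 3:00:0
# Once the job starts, print the node's hostname --
# you will need it later for SSH port forwarding
hostname
```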
- Launch Jupyter Lab or Notebook
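One way to launch it, assuming port 8888 is free on the node (pick another if it is taken); `--no-browser` suppresses the browser the compute node cannot open, and `--ip=0.0.0.0` makes the server reachable from the login node rather than only from localhost:

```shell
# On the compute node, with your conda environment active
conda activate
jupyter lab --no-browser --ip=0.0.0.0 --port=8888
# (use `jupyter notebook` instead of `jupyter lab` for the classic Notebook)
```

Note the port Jupyter actually reports on startup; it may differ from the one you requested if that port was already in use.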
- In a terminal on your local computer, set up SSH port forwarding from the compute node through the login node to your local computer
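A sketch of the forwarding command, assuming Jupyter is listening on port 8888 of the compute node; `<compute_node>`, `<user>`, and `<login_node_address>` are placeholders for your allocated node's hostname, your username, and the H4H login address:

```shell
# On your LOCAL machine: forward local port 8888 through the login node
# to port 8888 on the compute node running Jupyter
ssh -L 8888:<compute_node>:8888 <user>@<login_node_address>
```

Leave this SSH session open; the tunnel only exists while it is connected.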
- Navigate to the Jupyter Notebook at `http://localhost:<local_port>` and log in with your configured password. You should now have access to your Jupyter Notebook or Lab instance.