Overview

QHub HPC is an HPC deployment with JupyterHub. This section describes the services that run and how they are connected. The architecture follows the typical HPC setup of one master/login node with N worker nodes. The worker nodes are designed to have minimal dependencies, so most of the setup involves configuring the master node. At a high level there are three groups of services: monitoring, the job scheduler (Slurm), and JupyterHub with its related Python services.

Important urls:

  • <master node ip>:8000 jupyterhub server

  • <master node ip>:3000 grafana server with username admin and password admin
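As a small convenience, the two addresses above can be built from the master node IP. This is a minimal sketch: the function name `master_node_urls` is hypothetical, and the `http` scheme is an assumption, since the scheme is not stated here.

```python
# Default ports taken from the list of important urls above.
JUPYTERHUB_PORT = 8000
GRAFANA_PORT = 3000


def master_node_urls(master_ip, scheme="http"):
    """Return the JupyterHub and Grafana URLs for a master node IP.

    The scheme defaults to plain http, which is an assumption; change it
    if your deployment terminates TLS on these ports.
    """
    return {
        "jupyterhub": f"{scheme}://{master_ip}:{JUPYTERHUB_PORT}",
        "grafana": f"{scheme}://{master_ip}:{GRAFANA_PORT}",
    }
```

For example, `master_node_urls("10.0.0.5")["grafana"]` gives the address of the Grafana login page, where the default username and password are both admin.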

All Nodes

Services

Master Node

Services

Monitoring

  • grafana :: central place to view monitoring information (default port 3000)

  • prometheus :: metrics scraper (default port 9090)

  • slurm_exporter :: slurm metrics (default port 9341)

Slurm

  • slurmctld :: slurm central management daemon

  • slurmdbd :: slurm accounting

  • mysql :: database for slurm accounting

Python Ecosystem

  • jupyterhub :: scalable interactive compute (default port 8000)

  • nfs server :: shares conda environments and home directories between all users
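One way to confirm that the master node services listed above are up is to test whether their default ports accept TCP connections. The sketch below uses only the Python standard library. The names `MASTER_SERVICE_PORTS`, `is_port_open`, and `check_master_services` are hypothetical, and the ports are the defaults listed above, so a deployment that overrides them needs a different mapping. An open port only shows that something is listening, not that the service is healthy.

```python
import socket

# Default ports from the master node service list; adjust if overridden.
MASTER_SERVICE_PORTS = {
    "grafana": 3000,
    "prometheus": 9090,
    "slurm_exporter": 9341,
    "jupyterhub": 8000,
}


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_master_services(host, ports=MASTER_SERVICE_PORTS):
    """Map each service name to whether its port is reachable on host."""
    return {name: is_port_open(host, port) for name, port in ports.items()}
```

Calling `check_master_services("<master node ip>")` from a machine that can reach the master node returns a dictionary such as `{"grafana": True, ...}`, which is handy as a quick smoke test after provisioning.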

Worker Nodes

Services

Slurm

  • slurmd :: slurm agent that runs on all worker nodes
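Because every worker node runs slurmd and reports to slurmctld on the master node, the state of the workers can be read from the master node with Slurm's `sinfo` command. The following is a minimal sketch, assuming `sinfo` is on the PATH of the machine where it runs. The function names `parse_sinfo` and `worker_states` are hypothetical.

```python
import subprocess


def parse_sinfo(output):
    """Parse lines of '<node> <state>' into a {node: state} dictionary."""
    states = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) == 2:
            node, state = parts
            # A node in several partitions appears more than once; keep one entry.
            states[node] = state
    return states


def worker_states():
    """Ask Slurm for the state of each node.

    -N prints one line per node, -h drops the header, and the format
    string "%N %T" prints the node name followed by its state.
    """
    result = subprocess.run(
        ["sinfo", "-N", "-h", "-o", "%N %T"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_sinfo(result.stdout)
```

Keeping the parsing separate from the subprocess call makes the parser easy to test without a running cluster.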