Date: 20 November 2024 @ 17:00 - 18:00

Topic: "Survival guide for the upcoming GPU upgrades (more total power, but fewer GPUs)"

Speaker: Sergey Mashchenko, SHARCNET

Video link

---

In the coming months, national systems will be undergoing significant upgrades. In particular, older GPUs (P100, V100) will be replaced with the newest H100 GPUs from NVIDIA. The total GPU computing power of the upgraded systems will grow by a factor of 3.5, but the number of GPUs will go down significantly (from 3200 to 2100). This will present a significant challenge for our users, as "business as usual" (using a whole GPU for each process or MPI rank) will no longer be feasible in most cases.

Fortunately, NVIDIA provides two powerful technologies that can be used to mitigate this situation: MPS (Multi-Process Service) and MIG (Multi-Instance GPU). This presentation will walk you through both technologies and discuss the ways they can be used on our clusters. We will discuss how to figure out which of the approaches will work best for your code. A live demonstration will be given at the end.

---

The Compute Ontario Colloquia are weekly Zoom presentations on Advanced Research Computing, High Performance Computing, Research Data Management, and Research Software topics, delivered by staff from three Compute Ontario consortia (CAC, SciNet, SHARCNET) and guest speakers. The series began in January 2023 and superseded similar series previously delivered by individual consortia (e.g. General Interest Seminars by SHARCNET or User Group Meeting TechTalks by SciNet). The colloquia are one hour long and include time for questions. No registration is required. Presentations are usually recorded and uploaded to the hosting consortium's video channel (colloquia hosted by SHARCNET go to our YouTube channel).

Keywords: RDM, Research Data Management, GPU, HPC

Venue: online

