GPU Server

Service: GPU Server
Operational Status: Working
Location: Aula associazioni
Contact: [email protected]
Super Users: Ceres-c, Mrmoddom, Cammo
Owner: MuHack
Last Update: 2025-05-13
URIs: gpu.muhack

MUHACK GPU SERVER

A shared machine that gives all members access to serious GPU horsepower for AI/ML, rendering, data-science and other compute-intensive experiments.

What this box is for

  • Training / fine-tuning neural networks
  • CUDA / OpenCL development and compilation
  • 3-D rendering, video encoding or scientific workloads
  • General “I-need-48-GB-of-VRAM” tinkering

The host runs Proxmox VE, so we can either

  1. carve the GPU into vGPU slices for VMs, or
  2. pass the full card through to Debian 12 LXC containers (see the quick check below).
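
From inside your environment you can check which of the two you ended up with: nvidia-smi typically reports either the full card (in an LXC) or a vGPU profile (in a VM). A minimal check, nothing here is host-specific:

nvidia-smi -L                                     # lists the visible GPU(s)
nvidia-smi --query-gpu=name,memory.total --format=csv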

Fast facts

  • Host name  : gpu.muhack (reachable only locally or through the MuHack Tailscale network)
  • Management  : Proxmox web UI @ port 8006 (see the connectivity check below)
  • Sysadmins  : Ceres-c, Mrmoddom, Cammo
  • Contact  : [email protected], or in person every Tuesday evening at the MuHack meeting
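
A quick way to confirm you can actually reach the management UI over the tail-net (the hostname and port are the ones listed above; -k skips certificate verification, since Proxmox normally ships a self-signed certificate):

curl -k -s -o /dev/null -w '%{http_code}\n' https://gpu.muhack:8006/   # 200 means the Proxmox web UI answered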

Hardware

  • CPU  : Intel Core i7-4930K — 6 cores / 12 threads @ 3.40 GHz (3.90 GHz Turbo)
  • RAM  : 64 GB DDR3-1866
  • GPU-A  : NVIDIA Quadro RTX 8000 — 48 GB GDDR6 (compute card)
  • GPU-B  : NVIDIA GeForce GT 710 — emergency host console only
  • Disk  : 1 TB PNY SATA SSD + additional SATA ports free for expansion
  • NICs  :
    • Intel 82574L Gigabit
    • Intel 82579V Gigabit

Software stack

  • Proxmox VE 8.x (kernel 6.8)
  • “Merged” NVIDIA driver 550.90.07 (both vGPU and passthrough available; see the version check below)
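
To confirm that the userspace driver inside your container or VM matches the host driver listed above, you can query it directly (a one-line check; the exact version string may change after host updates):

nvidia-smi --query-gpu=driver_version --format=csv,noheader   # expected: 550.90.07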

Pre-built images

  • Debian 12 LXC (default; CUDA-ready; shared GPU)
  • Ubuntu 22.04 / Windows VM (on request, with a dedicated vGPU slice, only for specific needs)

Getting your own environment

  1. Make sure your laptop/PC is connected to the MuHack Tailscale tail-net and can ping gpu.vpn.muhack (see the check after this list).
  2. Talk to any sysadmin (email or in person on Tuesday). Tell us
    1. what you plan to run (ML, rendering, etc.)
    2. an estimate of how much GPU RAM / runtime you’ll need
  3. We’ll create a container (or VM) for you and hand back:
    1. ID (e.g. ct104 or vm203)
    2. a random initial password
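
For step 1, a quick sanity check from your own machine could look like this (a minimal sketch using standard Tailscale commands):

tailscale status                 # your device and the MuHack nodes should be listed
tailscale ping gpu.vpn.muhack    # checks the path over the tail-net
ping -c 3 gpu.vpn.muhack         # plain ICMP reachability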

Typical turnaround time: usually same day or next day, though it may take longer if the sysadmins are busy.

After the first login you have to log out and log in a second time so that Proxmox can set up your account's group correctly.

Using your Debian 12 LXC

Need CUDA toolkit? Run the helper script already placed in /root:

/root/install-cuda-toolkit.sh   # installs the CUDA toolkit matching the host driver
source ~/.bashrc                # reload your shell config so the new CUDA paths are picked up
nvcc --version                  # verify the compiler is available

You’ll get CUDA toolkit 12.4 that matches the host driver.
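
To verify that the toolkit and driver actually work together, a small smoke test such as the following should compile and run (a minimal sketch; the file name and /tmp paths are arbitrary):

cat > /tmp/hello.cu <<'EOF'
#include <cstdio>

// Each GPU thread prints its own index.
__global__ void hello() {
    printf("Hello from GPU thread %d\n", threadIdx.x);
}

int main() {
    hello<<<1, 4>>>();         // one block of four threads
    cudaDeviceSynchronize();   // wait for the kernel (and its printf) to finish
    return 0;
}
EOF
nvcc /tmp/hello.cu -o /tmp/hello
/tmp/hello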

Fair-share policy

  • Jobs longer than 12 h → announce them at the Tuesday meeting or in the Telegram channel.
  • Sysadmins may pause/stop workloads that block others
  • Crypto-mining strictly forbidden

Good-citizen checklist

  • Before logging out, run watch -n60 nvidia-smi and confirm your processes are gone and no longer hogging the GPU (see the example after this list).
  • Keep /tmp and your home directory tidy; disk space is shared
  • BACK UP YOUR OWN DATA: we make no data-persistence or backup guarantees
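
A convenient way to check for leftover processes before logging out (standard nvidia-smi query flags; this lists only processes that currently hold GPU memory):

nvidia-smi --query-compute-apps=pid,process_name,used_gpu_memory --format=csv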

Extras

  • vGPU-enabled Windows or Ubuntu VMs — ask if needed
  • Docker-inside-LXC supported (nesting + cgroupv2 enabled; example below)
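
If Docker is already set up in your container, a GPU smoke test would look roughly like this (the image tag is only an example, and it assumes the NVIDIA container toolkit is configured; ask a sysadmin if --gpus is rejected):

docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi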

Troubleshooting quick-ref

Problem → Fix

  • nvidia-smi shows no devices → the container lost its GPU mapping; tell a sysadmin
  • Driver mismatch error → the host driver was updated and the matching userspace driver needs to be reinstalled with --no-kernel-modules; tell a sysadmin
  • Out-of-memory in PyTorch → if you are in a VM, request a larger vGPU slice; if you are in a container, someone else may be holding part of the GPU memory, so check with nvidia-smi as shown below (a web GUI for this is planned)
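
For the out-of-memory case, this shows how much of the 48 GB is already in use before you start a large job (standard nvidia-smi query flags):

nvidia-smi --query-gpu=memory.used,memory.total --format=csv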

Questions / ideas → email [email protected], [email protected], or [email protected] — or just grab us at the Tuesday MuHack meeting.