GPU Server
From MuHack
| Service | GPU Server |
|---|---|
| Service URI | gpu.muhack |
| Location | Aula associazioni |
| Contact | webmaster@muhack.org |
| Operational Status | Working |
| Super Users | Ceres-c, Mrmoddom, Cammo |
| Owner | MuHack |
| Last Update | 2025-05-14 |
MuHack GPU Server
A shared machine that gives all members access to serious GPU horsepower for AI/ML, rendering, data science, and other compute-intensive experiments.
What this box is for
- Training / fine-tuning neural networks
- CUDA / OpenCL development and compilation
- 3-D rendering, video encoding or scientific workloads
- General “I-need-48-GB-of-VRAM” tinkering
The host runs Proxmox VE, so we can either:
- carve the GPU into vGPU slices for VMs, or
- pass the full card through to Debian 12 LXC containers.
Fast facts
- Host name : gpu.muhack (reachable only locally or through the MuHack Tailscale network)
- Management : Proxmox web UI on port 8006 (https://gpu.muhack:8006)
- Sysadmins : Ceres-c, Mrmoddom, Cammo
- Contact : webmaster@muhack.org, or in person every Tuesday evening at the MuHack meeting
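From a machine already on the MuHack tailnet, reachability can be sanity-checked like this (hostnames as above; assumes the Tailscale CLI is installed):

```shell
# Confirm the GPU server is visible on the tailnet
tailscale status | grep gpu

# Basic reachability test against the service hostname
ping -c 3 gpu.muhack
```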
Hardware
- CPU : Intel Core i7-4930K — 6 cores / 12 threads @ 3.40 GHz (3.90 GHz Turbo)
- RAM : 64 GB DDR3-1866
- GPU-A : NVIDIA Quadro RTX 8000 — 48 GB GDDR6 (compute card)
- GPU-B : NVIDIA GeForce GT 710 — emergency host console only
- Disk : 1 TB PNY SATA SSD + additional SATA ports free for expansion
- NICs :
- Intel 82574L Gigabit
- Intel 82579V Gigabit
Software stack
- Proxmox VE 8.x (kernel 6.8)
- “Merged” NVIDIA driver 550.90.07 (both vGPU and passthrough available)
Pre-built images
- Debian 12 LXC (default; CUDA-ready; shared GPU)
- Ubuntu 22.04 / Windows VM (on request, with a dedicated vGPU slice, only for specific needs)
Getting your own environment
- Make sure your laptop/PC is connected to the MuHack Tailscale tail-net and can ping gpu.muhack.
- Go to the Proxmox login page and log in once to create your account, then log out immediately.
- Talk to any sysadmin (email or in person on Tuesday). Tell us
- what you plan to run (ML, rendering, etc.)
- an estimate of how much GPU RAM / runtime you’ll need
- We’ll create a container (or VM) for you and hand back:
- ID (e.g. ct104 or vm203)
- a random initial password
Typical turnaround: same day or next day, though it may take longer if the sysadmins are busy.
Once your environment is created, log in to Proxmox a second time so it can set up your account's group correctly.
At this point you have two options for interacting with the console:
- Use the Proxmox web interface's integrated console
- Join the machine to a tailnet (your personal one or MuHack's) and connect through SSH -> remember to enable root SSH access in the sshd config file!
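For the SSH route, enabling root login inside the container might look like the following sketch (assumes the stock Debian 12 OpenSSH config path at /etc/ssh/sshd_config):

```shell
# Sketch: allow root SSH logins inside the container
# (assumes the default Debian 12 OpenSSH config location)
apt install -y openssh-server

# Replace any existing (possibly commented-out) PermitRootLogin line
sed -i 's/^#\?PermitRootLogin.*/PermitRootLogin yes/' /etc/ssh/sshd_config

# Apply the change
systemctl restart ssh
```

Key-based authentication is safer than password login for root, so consider adding your public key to /root/.ssh/authorized_keys as well.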
Using your Debian 12 LXC
Need CUDA toolkit? Run the following script:
#!/bin/bash
set -euo pipefail

# Add NVIDIA's CUDA apt repository for Debian 12 via its keyring package
wget https://developer.download.nvidia.com/compute/cuda/repos/debian12/x86_64/cuda-keyring_1.1-1_all.deb
apt install -y ./cuda-keyring_1.1-1_all.deb
apt update
rm cuda-keyring_1.1-1_all.deb

# Install the CUDA 12.4 toolkit (matches the host driver)
apt install -y cuda-toolkit-12-4

# Make the CUDA binaries and libraries visible in future shells
echo "export PATH=\${PATH}:/usr/local/cuda/bin" >> ~/.bashrc
echo "export LD_LIBRARY_PATH=\${LD_LIBRARY_PATH}:/usr/local/cuda/lib64" >> ~/.bashrc
After running the script, apply the environment changes by sourcing your `.bashrc`:
source ~/.bashrc
You’ll get CUDA toolkit 12.4 that matches the host driver.
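To sanity-check the install, something like this should work once the script above has completed and you have re-sourced `.bashrc`:

```shell
# The compiler should report the CUDA 12.4 toolkit
nvcc --version

# The driver should list the shared GPU (Quadro RTX 8000)
nvidia-smi
```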
- Jobs longer than 12 h → announce them at the Tuesday meeting or in the Telegram channel
- Sysadmins may pause or stop workloads that block others
- Crypto-mining is strictly forbidden
Good-citizen checklist
- Before logging out, run nvidia-smi (or watch -n60 nvidia-smi) and confirm your processes are gone and not hogging the GPU.
- Keep /tmp and your home directory tidy; disk space is shared
- BACK UP YOUR OWN DATA: we have no data persistency or backup guarantee
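One quick way to see exactly which processes are holding GPU memory, using nvidia-smi's standard query flags:

```shell
# List all compute processes currently using the GPU,
# with their PID, executable name, and memory footprint
nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv
```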
Extras
- vGPU-enabled Windows or Ubuntu VMs — ask if needed
- Docker-inside-LXC supported (nesting + cgroupv2 enabled)
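Running a GPU-enabled container inside your LXC might look like this minimal sketch (assumes Docker and the NVIDIA Container Toolkit are already installed in your container; the image tag is illustrative):

```shell
# Sketch: launch a CUDA container with access to the GPU
# and run nvidia-smi inside it to confirm the card is visible
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
```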
Troubleshooting quick-ref
Problem | Fix |
---|---|
nvidia-smi shows no devices | Container lost its GPU mapping—tell a sysadmin |
Driver mismatch error | Host driver was updated—reinstall matching userspace driver with --no-kernel-modules—tell a sysadmin |
Out-of-memory in PyTorch | VM: request a larger vGPU slice. Container: another user may be holding GPU memory; check with nvidia-smi (a GUI for this is planned). |
Questions / ideas → email ceres-c@muhack.com, mrmoddom@muhack.com, or cammo@muhack.com — or just grab us at the Tuesday MuHack meeting.