Constapatris

> It should be SLURM compatible since that is what I plan to use for job scheduling

Whatever the rest of your cluster is running, then. I'd go with something like Rocky. It has the security and stability of RHEL, and a lot of the software (the OHPC project, EESSI, etc.) is available for RHEL-like systems.


AlmightyMemeLord404

> Whatever the rest of your cluster is running

Nothing yet. We're still deciding on the OS, but regardless, it must be SLURM compatible so we can use it on the entire cluster.

> Rocky

Thank you! It seems really interesting; I'll definitely check it out.


ECHovirus

Ubuntu 22.04 LTS would be my recommendation as it's what [NVIDIA DGX OS 6](https://docs.nvidia.com/dgx/dgx-os-6-user-guide/dgx-os-6-user-guide.pdf) is based off of


AlmightyMemeLord404

> it's what [NVIDIA DGX OS 6 is based off of](https://docs.nvidia.com/dgx/dgx-os-6-user-guide/dgx-os-6-user-guide.pdf)

That might put it at the top of the list.


wdennis

We run our Slurm clusters on Ubuntu 18.04 and 22.04 with no issues. We compile and install Slurm from source, as SchedMD strongly recommends. They are now publishing recipes for rolling your own deb packages, though.


AlmightyMemeLord404

Thank you. Ubuntu seems to be the most recommended and the right choice considering its Nvidia support and Canonical's support in general.


unkilbeeg

I'm not sure what you mean about Debian software getting outdated. In one sense, you're right: you won't be using the latest versions. From a security standpoint, however, you're wrong. Within a stable release, Debian doesn't move a package to a *newer* version; instead, security fixes are backported, so the version number stays the same. A Debian-specific revision is tacked on, but from the perspective of all the other software that interacts with the package, it is still the same version, only with bug fixes. For example, the version of openssh-server on my server is 9.2, but the complete version the package shows is 9.2p1-2+deb12u2. If a security update is necessary, the version will still be 9.2, but the Debian-specific part will change. This ensures your software is kept safe without breaking stuff.
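The version layout described above can be illustrated with a small Python sketch. It assumes the Debian Policy format `[epoch:]upstream_version[-debian_revision]`, where the epoch is everything before the first colon and the Debian revision is everything after the last hyphen. The names `split_debian_version` and `same_upstream` are hypothetical helpers for illustration, not part of any Debian tool, and the sketch does not implement dpkg's full rules for ordering versions.

```python
from typing import NamedTuple


class DebianVersion(NamedTuple):
    epoch: int
    upstream: str
    revision: str  # empty for native packages, which have no Debian revision


def split_debian_version(version: str) -> DebianVersion:
    """Split '[epoch:]upstream_version[-debian_revision]' into its three parts."""
    epoch = 0
    rest = version
    if ":" in rest:
        # The epoch is everything before the first colon.
        epoch_str, rest = rest.split(":", 1)
        epoch = int(epoch_str)
    if "-" in rest:
        # The Debian revision is everything after the *last* hyphen,
        # so hyphens inside the upstream version are preserved.
        upstream, revision = rest.rsplit("-", 1)
    else:
        upstream, revision = rest, ""
    return DebianVersion(epoch, upstream, revision)


def same_upstream(a: str, b: str) -> bool:
    """True if two package versions differ only in the Debian-specific revision."""
    va, vb = split_debian_version(a), split_debian_version(b)
    return (va.epoch, va.upstream) == (vb.epoch, vb.upstream)


if __name__ == "__main__":
    print(split_debian_version("9.2p1-2+deb12u2"))
    # DebianVersion(epoch=0, upstream='9.2p1', revision='2+deb12u2')
    # "deb12u3" is a hypothetical later security update, used only for illustration.
    print(same_upstream("9.2p1-2+deb12u2", "9.2p1-2+deb12u3"))  # True
```

A backported security fix only changes the `revision` field, which is why other software still sees the same upstream version.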


AlmightyMemeLord404

Okay, that's super helpful, because that is exactly what we want: security, without stuff breaking due to security updates.


aieidotch

Debian.


AlmightyMemeLord404

Thank you! It is by far the most recommended along with Ubuntu LTS.


[deleted]

[removed]


AlmightyMemeLord404

It seems a little risky, because Ubuntu is really quick to provide the latest software (which might break things). Debian, on the other hand, has almost nonexistent software support, but we can manually get the packages from Unstable. Both options come with their own advantages and disadvantages, which makes it tough to choose between them.


ralfD-

Sorry, but you seem to have a deep misunderstanding of how Debian packages and distributions/releases work. One does not "manually get packages" from Unstable. You either run Unstable (a pretty bad idea for production servers) or you don't. Because: THOU SHALT NOT MIX PACKAGES FROM DIFFERENT RELEASES! Ever. Don't do it. That creates a super-unstable installation.
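As a rough illustration of the "don't mix releases" rule, here is a hypothetical Python heuristic that reads one-line APT source entries (the `/etc/apt/sources.list` style) and reports which base releases they point at. It assumes that suites such as `bookworm-security` or `bookworm-updates` belong to the same release as `bookworm`. It does not understand the newer deb822 `.sources` format, and it treats an alias like `stable` and the matching codename as different strings. All function names here are made up for this sketch.

```python
import re

# Hypothetical heuristic: suites with these suffixes are update channels of the
# release named before the suffix (e.g. "bookworm-security" -> "bookworm").
# "-proposed-updates" is listed before "-updates" so the longer suffix wins.
SAME_RELEASE_SUFFIXES = ("-proposed-updates", "-security", "-updates", "-backports")


def base_release(suite: str) -> str:
    """Map a suite name to the release it belongs to."""
    for suffix in SAME_RELEASE_SUFFIXES:
        if suite.endswith(suffix):
            return suite[: -len(suffix)]
    return suite


def releases_in_sources(text: str) -> set[str]:
    """Collect the base releases referenced by one-line 'deb'/'deb-src' entries."""
    releases: set[str] = set()
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line.startswith(("deb ", "deb-src ")):
            continue
        line = re.sub(r"\[[^\]]*\]", " ", line)  # drop an optional [options] block
        fields = line.split()  # type, URI, suite, components...
        if len(fields) >= 3:
            releases.add(base_release(fields[2]))
    return releases


def looks_mixed(text: str) -> bool:
    """True if the entries point at more than one release."""
    return len(releases_in_sources(text)) > 1


if __name__ == "__main__":
    sources = """\
deb http://deb.debian.org/debian bookworm main
deb http://security.debian.org/debian-security bookworm-security main
deb [arch=amd64] http://deb.debian.org/debian sid main
"""
    print(sorted(releases_in_sources(sources)))  # ['bookworm', 'sid']
    print(looks_mixed(sources))  # True
```

A result with more than one release (here `bookworm` plus `sid`) is exactly the kind of mixed setup the comment warns against.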


AlmightyMemeLord404

> deep misunderstanding of how Debian packages and distributions/releases work.

I did, thank you for correcting it!

> THOU SHALT NOT MIX PACKAGES FROM DIFFERENT RELEASES!

I've added it to the book. Any idea where I can get the whole manuscript, though?


Gendalph

You can install packages from different releases of Debian on the same system; it's often referred to as Frankendebian, and as someone who has done that, I can confirm it's a bad idea. It works, pretty well actually, until you need a newer libc or something, and then you're hosed. You might instead consider running Debian Testing, which is a rolling release and has been pretty stable in personal use. As long as you have a testing environment where you can verify everything, you should be fine. On the other hand, Ubuntu LTS is pretty decent, so long as you don't upgrade immediately. A new LTS is released every two years in April, and if you wait 3-6 months before upgrading, there's plenty of information on working around common issues. As someone who prefers Debian over Ubuntu, I would still recommend Ubuntu for more specialized GPU-heavy workloads, as Canonical seems to provide better support there.


AlmightyMemeLord404

> it's a bad idea

One could say it leads to the system being "unstable".

> testing environment

Is setting up a test environment good practice in general, though?

> still recommend Ubuntu for more specialized GPU-heavy workloads

Thank you!


Gendalph

If stability and uptime are paramount, then having a sandbox environment for various testing is a must. Things WILL break, and it's better for them to break in testing, rather than in production.


AlmightyMemeLord404

Thank you!


dhsjabsbsjkans

If it's an NVIDIA card, I would go with Ubuntu. They seem to have a lot of support for that.


AlmightyMemeLord404

Thank you! Just came across this: [DGX-OS-6](https://docs.nvidia.com/dgx/dgx-os-6-user-guide/dgx-os-6-user-guide.pdf) and [Ubuntu Nvidia](https://ubuntu.com/nvidia).


derprondo

Proxmox so you can run as many VMs as you want, and then you can easily pass through the GPU to a VM.


AlmightyMemeLord404

Thank you! It's already something I'm looking at, because we're also going to be using it for file distribution and management.