[Demelerlab] server slowness
Lukas Dobler
lukas.dobler at uni-konstanz.de
Thu Apr 2 03:39:34 MDT 2026
While investigating something else, I noticed the following output when
starting UltraScan applications:
libGL error: glx: failed to create dri3 screen
libGL error: failed to load driver: nouveau
I had not noticed this before. Running
journalctl | grep -iE "nouveau|libGL error"
shows that these errors have occurred since the last restart of
demeler9. On demeler2 I see the same output, but there the journal
entries go further back.
$ journalctl | grep -iE "nouveau|libGL error"
Mar 30 16:18:02 nrch.umt.edu dracut[56122]: -rw-r--r-- 2 root root 0 Sep 4 2024 etc/modprobe.d/nvidia-installer-disable-nouveau.conf
Mar 30 16:18:07 nrch.umt.edu dracut[56122]: -rw-r--r-- 2 root root 76 Sep 4 2024 usr/lib/modprobe.d/nvidia-installer-disable-nouveau.conf
Mar 30 16:18:07 nrch.umt.edu dracut[56122]: drwxr-xr-x 2 root root 0 Dec 17 21:52 usr/lib/modules/5.15.0-318.199.3.2.el8uek.x86_64/kernel/drivers/gpu/drm/nouveau
Mar 30 16:18:07 nrch.umt.edu dracut[56122]: -rw-r--r-- 1 root root 853360 Dec 17 21:52 usr/lib/modules/5.15.0-318.199.3.2.el8uek.x86_64/kernel/drivers/gpu/drm/nouveau/nouveau.ko.xz
Mar 30 16:21:12 nrch.umt.edu org.gnome.Shell.desktop[70407]: libGL error: glx: failed to create dri3 screen
Mar 30 16:21:12 nrch.umt.edu org.gnome.Shell.desktop[70407]: libGL error: failed to load driver: nouveau
Mar 30 16:21:12 nrch.umt.edu org.gnome.Shell.desktop[70407]: libGL error: glx: failed to create dri3 screen
Mar 30 16:21:12 nrch.umt.edu org.gnome.Shell.desktop[70407]: libGL error: failed to load driver: nouveau
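To check the timing from the log itself rather than by eye, here is a
minimal Python sketch (not part of UltraScan or any existing tool) that
applies the same filter as the grep above and measures the gap between
the first dracut entry and the first libGL error. It assumes journalctl's
default short output format, where each entry starts with
"Mon DD HH:MM:SS" and no year; the year 2026 is supplied by hand, and the
helper names parse_stamp and error_delay are hypothetical.

```python
import re
from datetime import datetime

# Same filter as: journalctl | grep -iE "nouveau|libGL error"
PATTERN = re.compile(r"nouveau|libGL error", re.IGNORECASE)


def parse_stamp(line, year=2026):
    """Parse the 'Mon DD HH:MM:SS' prefix of a journalctl short-format line.

    journalctl does not print the year, so it is passed in (2026 is an
    assumption taken from the date of this e-mail).
    """
    month, day, clock = line.split(maxsplit=3)[:3]
    return datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")


def error_delay(lines, year=2026):
    """Return the time from the first dracut entry mentioning nouveau to the
    first libGL error, or None if either is missing from the filtered lines."""
    hits = [line for line in lines if PATTERN.search(line)]
    dracut = next((l for l in hits if "dracut[" in l), None)
    libgl = next((l for l in hits if "libgl error" in l.lower()), None)
    if dracut is None or libgl is None:
        return None
    return parse_stamp(libgl, year) - parse_stamp(dracut, year)
```

On the server, the input could come from
subprocess.run(["journalctl", "--no-pager"], capture_output=True,
text=True).stdout.splitlines(); for the entries quoted above the gap is
3 minutes 10 seconds.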
So as soon as the driver changes were made, the errors started about
three minutes later (the dracut entries are at 16:18, the first libGL
errors at 16:21). Since this change, screen rendering is no longer
hardware accelerated; the CPU does it instead. This especially affects
VNC, on top of the network latency. It also matches the observation
that killing the MPI jobs reduced the slowness, because those jobs use
a lot of CPU and memory. Looking at the running processes, I also
noticed that gnome-shell spikes, especially when dragging something.
More notably, when you close a dialog box (for example the run details
in us_edit), the CPU load spikes right after closing because of the
render updates around the desktop manager.
btop runs over ssh so that it does not add to the load on gnome-shell.
What you see is me opening Firefox and going to YouTube. I currently
cannot watch videos there either; YouTube reports that the browser
cannot play the video.
To verify:
- The current NVIDIA driver 580.95.05 is installed
(grep "NVIDIA GLX Module" /var/log/Xorg.0.log or nvidia-smi -q)
- The open-source nouveau driver is blacklisted at the kernel level
(grep -r "nouveau" /etc/modprobe.d/ /usr/lib/modprobe.d/)
- The GPUs are not configured to drive the display output
(nvidia-smi -q | grep -A 2 "Display" gives "Display Active: Disabled"
for all GPUs)
- This forces the VNC session to use software rendering
(glxinfo -B | grep "OpenGL renderer" returns llvmpipe, the software
renderer)
- All gnome-shell sessions combined run as many llvmpipe threads as the
CPU has threads (list the gnome-shell subthreads with
ps -T -C gnome-shell | awk '{print $5}' | sort | uniq -c | sort -nr)
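The checks in the list above can also be scripted. Below is a minimal
Python sketch, assuming the outputs of glxinfo -B (run inside the VNC
session), nvidia-smi -q and ps -T -C gnome-shell have already been
captured as text. The function names are hypothetical; the thread count
assumes Mesa names its render threads llvmpipe-N and that ps prints its
default PID SPID TTY TIME CMD columns.

```python
import re


def opengl_renderer(glxinfo_text):
    """Return the 'OpenGL renderer string' value from `glxinfo -B` output."""
    m = re.search(r"OpenGL renderer string:\s*(.+)", glxinfo_text)
    return m.group(1).strip() if m else None


def is_software_renderer(renderer):
    """True if the renderer is one of Mesa's CPU-based renderers."""
    return renderer is not None and renderer.lower().startswith(("llvmpipe", "softpipe"))


def display_active_states(nvidia_smi_text):
    """Return one 'Display Active' value per GPU from `nvidia-smi -q` output."""
    return re.findall(r"^\s*Display Active\s*:\s*(\S+)", nvidia_smi_text, re.MULTILINE)


def count_llvmpipe_threads(ps_text):
    """Count threads named llvmpipe-N in `ps -T -C gnome-shell` output.

    Assumes the default column layout PID SPID TTY TIME CMD, so the thread
    name is the fifth field (the same field the awk command prints).
    """
    count = 0
    for line in ps_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) >= 5 and fields[4].startswith("llvmpipe"):
            count += 1
    return count
```

Comparing count_llvmpipe_threads(...) with os.cpu_count() then checks
the last point of the list.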
According to an AI assistant, the chain of cause and effect is:
Why it worked with Nouveau:
Nouveau is deeply integrated into the standard Linux kernel and the
open-source Mesa graphics stack. It fully supports Kernel Mode Setting
(KMS). Because of this deep integration, standard display servers (like
Xorg) can automatically detect and initialize nouveau to provide basic
2D and 3D hardware acceleration via standard generic interfaces, even
without physical monitors attached or an xorg.conf file present.
Why it fails with the Proprietary Driver:
The proprietary NVIDIA driver is closed-source and operates outside the
standard Linux KMS framework. It strictly relies on its own proprietary
modules (glxserver_nvidia).
By default, the proprietary driver expects a physical monitor to be
connected to initialize a rendering screen. Because your Tesla V100s are
headless compute cards, the proprietary driver sees zero monitors.
Without a physical monitor, and without an explicit xorg.conf file
instructing it to create a "Virtual" off-screen buffer, the NVIDIA
driver simply refuses to initialize the display engine.
Consequently, Xorg abandons the hardware acceleration attempt and
falls back to CPU software rendering.
As I understand it, VNC always used software rendering, but with the
default driver the rendering defaults, especially for OpenGL, seem to
have prevented the CPU-heavy rendering we see now. To verify this, I
tested the current main on Konstanz and ASTFVM, neither of which had
any GPU-related changes, and was not able to reproduce these issues
there.
Have a nice day
Lukas
*Lukas Dobler*, M.Sc.
Ph.D. student
Universität Konstanz
AG Prof. Cölfen
Fachbereich Chemie
Universitätsstraße 10, Box 714
78464 Konstanz
Raum L 1050
Tel. +49 (0)7531 88 2019
On 02.04.2026 00:30, Saeed Mortezazadeh wrote:
> us_mpi_analysis is updated!
> -Saeed
>
> On Wed, Apr 1, 2026 at 3:26 PM Borries Demeler via Demelerlab
> <demelerlab at biophysics.uleth.ca> wrote:
>
> I believe I found the culprit of the slowness. Killing all mpi
> jobs doing AUC analysis restored the regular speed. So
> us_mpi_analysis appears to be a problem.
> I did not have to reboot. We should perhaps recompile the mpi
> libraries and make sure it is all updated. Saeed, can you please
> take care of that? Let us all know when it is ready to go again,
> and perhaps Haben, Sophia, Sigang and Reece can retry their
> jobs then to see if it happens again?
> Thanks, and sorry for the inconveniences.
> -Borries
> _______________________________________________
> Demelerlab mailing list
> Demelerlab at biophysics.uleth.ca
> https://biophysics.uleth.ca/mailman/listinfo/demelerlab
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: otzd1GdqQWtcBwhz.png
Type: image/png
Size: 125475 bytes
Desc: not available
URL: <http://biophysics.uleth.ca/pipermail/demelerlab/attachments/20260402/f999c1ef/attachment-0001.png>