[Demelerlab] server slowness

Lukas Dobler lukas.dobler at uni-konstanz.de
Thu Apr 2 03:39:34 MDT 2026


While investigating something else, I noticed that starting UltraScan 
applications produces the following output:

libGL error: glx: failed to create dri3 screen

libGL error: failed to load driver: nouveau

I had never noticed this before. Running journalctl | grep -iE 
"nouveau|libGL error" shows that these errors have occurred since the 
last restart of demeler9. On demeler2 I observed the same output, but 
the journal entries go further back.

$ journalctl | grep -iE "nouveau|libGL error"

Mar 30 16:18:02 nrch.umt.edu dracut[56122]: -rw-r--r--   2 root  root             0 Sep  4  2024 etc/modprobe.d/nvidia-installer-disable-nouveau.conf
Mar 30 16:18:07 nrch.umt.edu dracut[56122]: -rw-r--r--   2 root  root            76 Sep  4  2024 usr/lib/modprobe.d/nvidia-installer-disable-nouveau.conf
Mar 30 16:18:07 nrch.umt.edu dracut[56122]: drwxr-xr-x   2 root  root             0 Dec 17 21:52 usr/lib/modules/5.15.0-318.199.3.2.el8uek.x86_64/kernel/drivers/gpu/drm/nouveau
Mar 30 16:18:07 nrch.umt.edu dracut[56122]: -rw-r--r--   1 root  root        853360 Dec 17 21:52 usr/lib/modules/5.15.0-318.199.3.2.el8uek.x86_64/kernel/drivers/gpu/drm/nouveau/nouveau.ko.xz
Mar 30 16:21:12 nrch.umt.edu org.gnome.Shell.desktop[70407]: libGL error: glx: failed to create dri3 screen
Mar 30 16:21:12 nrch.umt.edu org.gnome.Shell.desktop[70407]: libGL error: failed to load driver: nouveau
Mar 30 16:21:12 nrch.umt.edu org.gnome.Shell.desktop[70407]: libGL error: glx: failed to create dri3 screen
Mar 30 16:21:12 nrch.umt.edu org.gnome.Shell.desktop[70407]: libGL error: failed to load driver: nouveau
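
To check whether the errors really begin with a particular boot, the same 
search can be restricted to one boot at a time. The following is only a 
minimal sketch and was not run on demeler9 or demeler2. The helper names 
count_libgl_errors and first_libgl_error are made up for illustration, and 
both assume journal text on stdin with one entry per line, as journalctl 
prints it (not wrapped as in a mail).

```shell
#!/usr/bin/env bash
# Hypothetical helpers; both read journal text from stdin.

# Count the lines that report a libGL error (prints 0 if there are none).
count_libgl_errors() {
    grep -c 'libGL error' || true
}

# Print the timestamp (first three fields) of the first libGL error line.
first_libgl_error() {
    awk '/libGL error/ && !seen { print $1, $2, $3; seen = 1 }'
}

# Usage on a live system (needs read access to the journal):
#   journalctl -b 0  --no-pager | count_libgl_errors   # current boot
#   journalctl -b -1 --no-pager | count_libgl_errors   # previous boot
#   journalctl -b 0  --no-pager | first_libgl_error
```

A count of zero for the previous boot and a non-zero count for the current 
one would support the idea that the errors started with the last restart.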


So the errors started about 3 minutes after the driver changes were 
made. With this change, screen rendering is no longer hardware 
accelerated; the CPU does it instead. This affects VNC in particular, on 
top of the network latency. It also matches the observation that killing 
the MPI jobs reduced the slowness, because they use CPU and memory 
heavily. Looking at the running processes, I also noticed that 
gnome-shell spikes, especially when dragging something. More notably, 
when you close a dialog box (for example the run details in us_edit), 
the CPU load spikes afterwards because of the render updates around the 
desktop manager.
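
A rough way to put a number on the gnome-shell load is to sum the CPU 
percentages that ps reports for all gnome-shell processes. This is only a 
sketch under assumptions: sum_cpu is a hypothetical helper, and ps reports 
%CPU averaged over each process's lifetime, so short spikes while dragging 
a window are better watched live in btop or top.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sum a column of CPU percentages,
# one number per line on stdin, and print the total with one decimal.
sum_cpu() {
    awk '{ total += $1 } END { printf "%.1f\n", total }'
}

# Usage on a live system:
#   ps -C gnome-shell -o pcpu= | sum_cpu
```

Comparing the total before and after stopping the MPI jobs would show how 
much CPU the desktop sessions themselves take.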

btop runs over SSH so that it does not add to the load on gnome-shell. 
What you see in the attached screenshot is me opening Firefox and going 
to YouTube. I currently can't watch a video there either; YouTube says 
that the browser can't play it.

To verify:

- The current NVIDIA driver 580.95.05 is installed (grep "NVIDIA GLX 
Module" /var/log/Xorg.0.log or nvidia-smi -q)

- The open-source nouveau driver is blacklisted at kernel level (grep -r 
"nouveau" /etc/modprobe.d/ /usr/lib/modprobe.d/)

- The GPUs are not configured to contribute to the display output 
(nvidia-smi -q | grep -A 2 "Display" gives "Display Active: Disabled" 
for all GPUs)

- This forces the VNC session to use software rendering (glxinfo -B 
| grep "OpenGL renderer" returns llvmpipe, a software renderer)

- All gnome-shell sessions combined run as many llvmpipe threads as the 
CPU has threads (to list the subthreads of gnome-shell: ps -T -C 
gnome-shell | awk '{print $5}' | sort | uniq -c | sort -nr)
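
The renderer and thread checks above could be wrapped into two small 
helpers. This is a hedged sketch, not a tested tool: renderer_kind and 
count_llvmpipe_threads are hypothetical names, the list of software 
renderer names (llvmpipe, softpipe, swrast) is my assumption about what 
Mesa reports, and so is the assumption that the llvmpipe worker threads 
carry "llvmpipe" in their thread name.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; both read command output from stdin.

# Classify the "OpenGL renderer string" line of glxinfo -B output as
# software, hardware, or unknown (no renderer line found).
renderer_kind() {
    awk -F': ' '
        /OpenGL renderer string/ && !found {
            found = 1
            kind = (tolower($2) ~ /llvmpipe|softpipe|swrast/) ? "software" : "hardware"
        }
        END { print (found ? kind : "unknown") }
    '
}

# Count thread names containing "llvmpipe" (assumed Mesa naming).
count_llvmpipe_threads() {
    grep -c 'llvmpipe' || true
}

# Usage inside the VNC session:
#   glxinfo -B | renderer_kind
#   ps -T -C gnome-shell -o comm= | count_llvmpipe_threads
#   nproc    # CPU thread count to compare against
```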

According to AI, the chain of effects is:

Why it worked with Nouveau:
Nouveau is deeply integrated into the standard Linux kernel and the 
open-source Mesa graphics stack. It fully supports Kernel Mode Setting 
(KMS). Because of this deep integration, standard display servers (like 
Xorg) can automatically detect and initialize nouveau to provide basic 
2D and 3D hardware acceleration via standard generic interfaces, even 
without physical monitors attached or an xorg.conf file present.

Why it fails with the Proprietary Driver:
The proprietary NVIDIA driver is closed-source and operates outside the 
standard Linux KMS framework. It strictly relies on its own proprietary 
modules (glxserver_nvidia).

By default, the proprietary driver expects a physical monitor to be 
connected to initialize a rendering screen. Because your Tesla V100s are 
headless compute cards, the proprietary driver sees zero monitors. 
Without a physical monitor, and without an explicit xorg.conf file 
instructing it to create a "Virtual" off-screen buffer, the NVIDIA 
driver simply refuses to initialize the display engine.

Consequently, Xorg crashes out of the hardware acceleration attempt and 
falls back to CPU software rendering.
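
If one wanted to try the "Virtual" screen route described above, the 
nvidia-xconfig tool shipped with the proprietary driver can generate such 
a configuration. The sketch below only prints a candidate command and does 
not run it. headless_xconfig_cmd is a hypothetical helper, the resolution 
and the scratch output path are placeholders, and whether this restores 
acceleration on these machines, or matters at all for a VNC session, is 
not established here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (do not run) an nvidia-xconfig call that
# would write a headless candidate config to a scratch file.
#   $1  virtual screen size, WIDTHxHEIGHT (placeholder default)
#   $2  output file (placeholder default, deliberately not /etc/X11)
headless_xconfig_cmd() {
    local resolution="${1:-1920x1080}"
    local output="${2:-/tmp/xorg.conf.candidate}"
    printf 'nvidia-xconfig --allow-empty-initial-configuration --use-display-device=None --virtual=%s --output-xconfig=%s\n' \
        "$resolution" "$output"
}

# Usage: review the printed command before running it as root.
#   headless_xconfig_cmd 1920x1080 /tmp/xorg.conf.candidate
```

Writing to a scratch file first keeps the running X configuration 
untouched until the generated file has been inspected.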


As I understand it, VNC has always used software rendering, but with 
the default driver the defaults around rendering, and especially 
OpenGL, seem to have prevented the slowdown. To verify this, I tested 
the current main on Konstanz and ASTFVM, neither of which had any 
GPU-related changes, and was not able to observe the same issues there.


Have a nice day


Lukas


*Lukas Dobler*, M.Sc.
Ph.D. student
Universität Konstanz
AG Prof. Cölfen
Fachbereich Chemie
Universitätsstraße 10, Box 714
78464 Konstanz

Raum L 1050
Tel. +49 (0)7531 88 2019


On 02.04.2026 00:30, Saeed Mortezazadeh wrote:
> us_mpi_analysis is updated!
> -Saeed
>
> On Wed, Apr 1, 2026 at 3:26 PM Borries Demeler via Demelerlab 
> <demelerlab at biophysics.uleth.ca> wrote:
>
>     I believe I found the culprit of the slowness. Killing all mpi
>     jobs doing AUC analysis restored the regular speed. So
>     us_mpi_analysis appears to be a problem.
>     I did not have to reboot. We should perhaps recompile the mpi
>     libraries and make sure it is all updated. Saeed, can you please
>     take care of that? Let us all know when it is ready to go again,
>     and perhaps Haben, Sophia, Sigang and Reece can retry their
>     jobs then to see if it happens again?
>     Thanks, and sorry for the inconveniences.
>     -Borries
>     _______________________________________________
>     Demelerlab mailing list
>     Demelerlab at biophysics.uleth.ca
>     https://biophysics.uleth.ca/mailman/listinfo/demelerlab
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: otzd1GdqQWtcBwhz.png
Type: image/png
Size: 125475 bytes
Desc: not available
URL: <http://biophysics.uleth.ca/pipermail/demelerlab/attachments/20260402/f999c1ef/attachment-0001.png>