TL;DR - If you’re using USB-C displays with Intel 12th or 13th gen hardware and they lock up over time, it’s probably a DMC firmware issue in the i915 kernel module. Roll back your DMC blob to adlp_dmc_ver2_16.bin.

I’ve recently been building a machine for doing video production work at the robotics competitions I support, as well as other small events where a video mux is required. In my pursuit of ever smaller footprints to load in and load out, I switched late this year to a pair of portable USB-C monitors that accept DisplayPort over USB-C. To connect these to my machines of choice and supply them with enough power, I use a CalDigit Element 3 hub. I can speak very highly of the build quality of the CalDigit equipment in general. But on both a Framework 12th generation mainboard and an Asus ExpertCenter with a 13th generation Intel CPU, I noticed that over time my displays would lock up completely and I could no longer access the machine from its console.

For me, the lockups were characterized by the following messages in dmesg repeating consistently:

[ 3308.663655] i915 0000:00:02.0: [drm] *ERROR* [CONNECTOR:272:DP-3] commit wait timed out
[ 3318.903615] i915 0000:00:02.0: [drm] *ERROR* [CRTC:131:pipe B] flip_done timed out
[ 3359.351470] i915 0000:00:02.0: [drm] *ERROR* flip_done timed out
[ 3359.351475] i915 0000:00:02.0: [drm] *ERROR* [CRTC:131:pipe B] commit wait timed out
[ 3369.591431] i915 0000:00:02.0: [drm] *ERROR* flip_done timed out
[ 3369.591437] i915 0000:00:02.0: [drm] *ERROR* [CONNECTOR:272:DP-3] commit wait timed out
[ 3379.832395] i915 0000:00:02.0: [drm] *ERROR* [CRTC:131:pipe B] flip_done timed out
[ 3390.071475] i915 0000:00:02.0: [drm] *ERROR* flip_done timed out
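If you want a quick check for whether a machine is hitting the same failure, a rough sketch like the one below counts the matching lines in kernel log text. The function name count_i915_timeouts is made up for illustration, and the pattern is built only from the messages shown above, so other variants of the error may not match.

```shell
#!/bin/sh
# count_i915_timeouts is a hypothetical helper, not part of any tool.
# It reads kernel log text on stdin and prints how many lines look like
# the i915 "flip_done timed out" / "commit wait timed out" errors.
count_i915_timeouts() {
    # grep -c exits non-zero when the count is 0; "|| true" hides that
    # so the function is safe to use under "set -e".
    grep -c -E 'i915 .*\*ERROR\*.*(flip_done|commit wait) timed out' || true
}
```

Typical use would be: sudo dmesg | count_i915_timeouts. A count that keeps climbing while the displays are frozen suggests you are seeing the same problem.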

It’s important to note that the machine is still up and running even when the displays have locked up, and connecting via SSH still yields a running system. For a computer doing video work, however, not having stable video kind of defeats the purpose. After lots of debugging and chasing bugs, I stumbled upon this thread which suggests that, among other things, this is the result of the render pipeline getting into a bad state. The response from November 24th by user mkyral turned out to be the answer for my issue: setting the dmc_firmware_path parameter on the i915 module. To make this change I created a file at /etc/modprobe.d/i915.conf with the following content:

options i915 enable_guc=3 dmc_firmware_path=/lib/firmware/i915/adlp_dmc_ver2_16.bin

I also enabled some additional performance modes while I was editing this file: enable_guc=3 turns on GuC submission and HuC firmware loading. If you don’t need or want these, omit the enable_guc=3 token.
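Before rebooting, it may be worth confirming that the blob named in dmc_firmware_path is actually on disk. The sketch below assumes that some distributions ship firmware compressed with an .xz or .zst suffix, so it looks for those as well. The helper name find_dmc_blob is hypothetical, and the sketch only reports what exists; it makes no claim about whether the driver accepts a compressed file at that path.

```shell
#!/bin/sh
# find_dmc_blob is a hypothetical helper: print the first existing file among
# the given path and its .xz / .zst variants, or return 1 if none exist.
find_dmc_blob() {
    for candidate in "$1" "$1.xz" "$1.zst"; do
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}
```

For the configuration above that would be: find_dmc_blob /lib/firmware/i915/adlp_dmc_ver2_16.bin. If nothing is printed, the older blob is not installed and would need to come from your distribution's firmware package or the linux-firmware repository.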

Since this file needs to be present in the initrd to be loaded extremely early, I also created a configuration file for dracut at /etc/dracut.conf.d/i915.conf to include the file in the initrd:

install_items+=" /etc/modprobe.d/i915.conf "
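Regenerating the initrd and checking that the file made it in can be sketched roughly as follows. This assumes dracut's default image for the running kernel; the helper name initrd_lists_file is made up for illustration, and it does nothing more than search an lsinitrd listing for the path.

```shell
#!/bin/sh
# initrd_lists_file is a hypothetical helper: read an lsinitrd listing on
# stdin and succeed if it mentions the given path. The leading slash is
# stripped because paths inside the image are listed without it.
initrd_lists_file() {
    grep -q -F -- "${1#/}"
}

# Intended use (left as comments because both steps need root):
#   sudo dracut --force
#   sudo lsinitrd | initrd_lists_file /etc/modprobe.d/i915.conf && echo included
```

If the last command prints nothing, the dracut configuration was not picked up and the option will not be applied when i915 loads from the initrd.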

After creating these files and regenerating the initrd, I rebooted and verified that the path was picked up with systool:

$ sudo systool -m i915 -av | grep dmc_firmware_path
    dmc_firmware_path   = "/lib/firmware/i915/adlp_dmc_ver2_16.bin"
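If systool isn’t installed, the same value can be read straight from sysfs, which is where module parameters are exposed. The sketch below is only an illustration: read_i915_param is a hypothetical name, and its optional second argument exists purely so the function can be pointed at a different directory for testing.

```shell
#!/bin/sh
# read_i915_param is a hypothetical helper: print the current value of an
# i915 module parameter. $1 is the parameter name; $2 optionally overrides
# the parameters directory (defaults to the real sysfs location).
read_i915_param() {
    cat "${2:-/sys/module/i915/parameters}/$1"
}
```

The parameter file is typically readable only by root, so run read_i915_param dmc_firmware_path from a root shell, or use the one-line equivalent: sudo cat /sys/module/i915/parameters/dmc_firmware_path. It should print the same path that systool reported.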

My system has been stable for about a day and a half since setting this configuration. Since this was basically ungoogleable, this post is now up with the exact steps I took to resolve it until Intel can fix whatever’s wrong in the closed-source blob.