TL;DR - If you’re using USB-C displays with Intel 12th or 13th gen
hardeware, its probably a DRI firmware issue in the i915 kernel
module. Roll back your DMC blob to adlp_dmc_ver2_16.bin
.
I’ve recently been building a machine for doing video production work at the robotics competitions I support as well as other small events where a video mux is required. In my pursuit of ever smaller footprints to load in and load out, I switched late this year to using a pair of portable USB-C monitors that accept displayport over USB-C. In order to connect these to my machines of choice and supply them with enough power, I use a CalDigit Element 3 hub. I can speak very highly of the build quality of the CalDigit equipment in general, but one thing I noticed using both a Framework 12th generation mainboard and an Asus ExpertCenter with 13th generation Intel was that over time, my displays would lock up completely and I would no longer be able to access the machine.
For me, the lockups were characterized by the following messages in
dmesg
repeating consistently:
[ 3308.663655] i915 0000:00:02.0: [drm] *ERROR* [CONNECTOR:272:DP-3] commit wait timed out
[ 3318.903615] i915 0000:00:02.0: [drm] *ERROR* [CRTC:131:pipe B] flip_done timed out
[ 3359.351470] i915 0000:00:02.0: [drm] *ERROR* flip_done timed out
[ 3359.351475] i915 0000:00:02.0: [drm] *ERROR* [CRTC:131:pipe B] commit wait timed out
[ 3369.591431] i915 0000:00:02.0: [drm] *ERROR* flip_done timed out
[ 3369.591437] i915 0000:00:02.0: [drm] *ERROR* [CONNECTOR:272:DP-3] commit wait timed out
[ 3379.832395] i915 0000:00:02.0: [drm] *ERROR* [CRTC:131:pipe B] flip_done timed out
[ 3390.071475] i915 0000:00:02.0: [drm] *ERROR* flip_done timed out
Its important to note that the machine is still up and running even
when the displays have locked up, and connecting via SSH still yeilds
a running system. For a computer doing video work, however, not
having stable video kind of defeats the purpose. After lots of
debugging and chasing bugs, I stumbled upon this
thread
which suggests that among other things, this is the result of the
render pipeline getting into a bad state. The response from November
24th by user mkyral
turned out to be the answer for my issue, which
was setting the dmc_firmware_path
attribute on the i915 module. To
make this change I created a file at /etc/modprobe.d/i915.conf
with
the following content:
options i915 enable_guc=3 dmc_firmware_path=/lib/firmware/i915/adlp_dmc_ver2_16.bin
I also enabled some additional performance modes when I was editing
this file. If you don’t need or want these, omit the enable_guc=3
token.
Since this file needs to be present in the initrd to be loaded
extremely early, I also created a configuration file for dracut in
/etc/dracut.conf.d/i915.conf
to include the file to the initrd:
install_items+=/etc/modprobe.d/i915.conf
After creating these files and regenerating the initrd, I rebooted and
verified that the path was picked up with systool
:
$ sudo systool -m i915 -av | grep dmc_firmware_path
dmc_firmware_path = "/lib/firmware/i915/adlp_dmc_ver2_16.bin"
My system has been stable for about a day and a half since setting this configuration. Since this was basically ungoogleable, this post is now up with the exact steps I took to resolve this until Intel can fix whatever’s wrong in the closed source blob.