Touchpad, Interrupted

For two years I've been driving myself crazy trying to figure out the source of a driver problem on OpenBSD: interrupts never arrived for certain touchpad devices. While debugging an unrelated issue over the weekend, I finally solved it.

It's been a long journey and it's a technical tale, but here it is.

Windows Precision

In 2015, I purchased a Samsung ATIV Book 9 laptop. Its touchpad was different from those in most other laptops previously used with OpenBSD, and would be a model for most touchpads to come after it: a Windows Precision Touchpad connected over I2C.

Most other laptops had a touchpad connected through the 8042 (PS/2) controller along with the keyboard, emulating the historical design of PCs having two PS/2 ports for an external mouse and keyboard. These touchpads from Synaptics, Elan, and ALPS each spoke a proprietary protocol and were rather bandwidth-constrained in terms of how much finger data they could communicate back to the OS. That became a problem when multi-touch gestures became a thing in Windows.

For these devices, Microsoft produced its Windows Precision Touchpad specification and would handle the driver side of things. This allowed vendors to have touchpads that shared a common driver and worked in Windows out of the box, and allowed Microsoft to provide a better touchpad experience with gestures and palm rejection (though still not one able to rival what Apple does with the Broadcom touchpads on their MacBooks).

OpenBSD Support

In 2016, I finished writing drivers for these touchpads which required an I2C controller driver (dwiic), a HID-over-I2C driver (ihidev), a basic I2C-HID mouse driver (ims), then a full transport-agnostic HID driver implementing the Windows Precision Touchpad spec (hidmt), and finally an I2C touchpad driver (imt) to interface between ihidev and hidmt.

Shortly after, some laptops started showing up with their keyboard connected over I2C as well, requiring the ikbd driver. In 2018, I wrote umt to support USB-connected Windows Precision devices in use on some laptop touchscreens.

While all of this worked fairly well and somewhat modernized OpenBSD's non-ThinkPad laptop support (many ThinkPads up until some 2019 models still used a PS/2-connected touchpad and TrackPoint), there was one aspect that didn't work: on Broadwell chipsets, the touchpad would not wake up after an S3 suspend/resume.

Bug-Hunting

Later in 2016, I purchased a Chromebook Pixel and got OpenBSD running on it. The Pixel also had its touchpad and touchscreen connected over I2C, though being a Chromebook not running Windows, its touchpad did not conform to the Windows Precision Touchpad standard, which meant it needed a new driver (iatp).

The Chromebook Pixel also had a Broadwell chipset and this new driver had the same issue: communication with it failed after an S3 resume. Two different vendors of touchpads and two different drivers, but the same problem. The I2C controller (dwiic) worked fine after resume, but any time it tried to communicate with the touchpad device, everything would just time out.

After some months of debugging on Linux, I tracked down the fix to a single write to a register on the I2C controller device, found in Linux's Intel Low Power Subsystem (LPSS) driver for power gating.

Intel's LPSS is used by their I2C and SPI devices to limit power usage by quickly shutting off components when idle. The way this is implemented in Linux is kind of confusing, and even now, looking at their main LPSS driver, I can't see where the 0x800 register comes from that OpenBSD's driver writes to on the I2C controller in order to power up the I2C slave device. That Linux driver registers a clock (clk) device and the clk framework handles the register writing itself rather than calling back to a function in the LPSS driver, which is why it took me so long to find it in 2016.

Interrupting ihidev

In 2017, I purchased a Huawei Matebook X with a Kaby Lake chipset, which Intel refers to as the 100 Series. Intel's I2C controllers on this chipset now show up as actual PCI devices, which meant splitting up the dwiic driver to handle both PCI and ACPI attachments.

The dwiic driver fetches ACPI resource information for I2C slave devices that are connected to it, like the touchpad. That resource includes the I2C slave address and interrupt pin that it is connected to on the IOAPIC. ihidev then attaches and uses the standard methods in the OpenBSD kernel to program the ioapic device to register a callback to ihidev whenever that pin receives an interrupt.

Despite all of that being set up with the proper address and pin (which matched what Linux did), the IOAPIC would never receive an interrupt on that pin and ihidev would never have its interrupt handler called when the touchpad was touched. The touchpad was being properly powered up and would respond to I2C HID commands, and if polled after touching, there was finger data available to read. It just never generated an interrupt.

As with the S3 resume issue, I spent months trying to figure out what was happening with these missing interrupts. I attended the OpenBSD t2k17 Hackathon and spent nearly a week straight in a room full of OpenBSD developers as I tried tearing apart the Linux I2C, LPSS, IOAPIC, and ACPI code with no luck.

As I heard reports from other users and developers with Intel 100 Series machines with the same interrupt problem, I started to assume it was specific to these newer chipsets. I went digging through Intel documentation and I2C implementations in other OSes (such as Coreboot and Google's Zircon kernel) to find anything related to this specific hardware.

Growing weary and admitting defeat, I added an adaptive polling mechanism to ihidev so the kernel would poll the device every 200ms until there was touch data available, then poll at 10ms until shortly after it stopped receiving new data. This was enough to get touchpads working on these new laptops, but it was slow and wasted a bit of CPU time and battery power. Unfortunately that "temporary" polling mechanism had to be used for the next two years as no one could fix (or was not interested in fixing) this problem.
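The two-speed polling described above can be modelled as a small state machine. This is only a sketch, not the ihidev code: the class name, the method name, and the number of empty polls tolerated before slowing down are assumptions made for illustration. Only the 200ms and 10ms intervals come from the text.

```python
# Hypothetical model of adaptive polling. Only the 200 ms and 10 ms
# intervals come from the text; everything else is an assumption.
SLOW_POLL_MS = 200           # interval while the touchpad is idle
FAST_POLL_MS = 10            # interval while touch data is arriving
IDLE_POLLS_BEFORE_SLOW = 10  # assumed: empty fast polls before slowing down


class AdaptivePoller:
    def __init__(self):
        self.interval_ms = SLOW_POLL_MS
        self.empty_polls = 0

    def on_poll(self, got_data):
        """Record one poll result and return the delay until the next poll."""
        if got_data:
            # Activity seen: switch to (or stay in) fast polling.
            self.interval_ms = FAST_POLL_MS
            self.empty_polls = 0
        elif self.interval_ms == FAST_POLL_MS:
            # Nothing to read while polling fast: count it, and fall back
            # to slow polling once the device has been quiet long enough.
            self.empty_polls += 1
            if self.empty_polls >= IDLE_POLLS_BEFORE_SLOW:
                self.interval_ms = SLOW_POLL_MS
                self.empty_polls = 0
        return self.interval_ms
```

The cost is visible in the model: the first touch after an idle period can wait up to 200ms before it is noticed, and the device is still queried five times a second when nothing is happening.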

ACPI Node Walking

A few weeks ago, I purchased the 7th generation ThinkPad X1 Carbon. Getting OpenBSD installed and working on it has been quite a feat, as there were multiple bugs to fix. The first showstopper was a kernel panic shortly into booting the installer, caused by OpenBSD's AML parser reporting "Not Integer" when executing a particular method.

For some quick background: Linux and most smaller operating systems use an ACPI interpreter called ACPICA, which is written and maintained by Intel. OpenBSD and Windows each use their own custom-developed ACPI stacks. Presumably Microsoft has many engineers available to maintain their ACPI implementation (since they also wrote and maintain the official ACPI specification with Intel) and every other OS just re-imports the ACPICA code from Intel when it's updated. Unfortunately on OpenBSD, this means we have to fix bugs and implement new functionality required by the ACPI spec (now at version 6.3) when we encounter them on new hardware.

The "Not Integer" panic on the X1 was caused by this AML in an _INI method (ironically, on its touchpad device):

Method (_INI, 0, NotSerialized)  // _INI: Initialize
{
    GPDI = 0x64
    If ((OSYS < 0x07DC))
    {
        SRXO (GPDI, One)
    }
    
    INT1 = GNUM (GPDI)
    INT2 = INUM (GPDI)

    [..]

    If ((TPDT == 0x05))
    {
        If ((^^^LPCB.NFCD == Zero))
        {
            _HID = "SYNA8005"
        }
        Else
        {
            _HID = "SYNA8004"
        }

        ADBG (Concatenate ("TPD0 _HID:", ToHexString (_HID)))
        HID2 = 0x20
        BADR = 0x2C
        ADBG (Concatenate ("TPD0 _INI:BADR=", ToHexString (BADR)))
        Return (Zero)
    }

When ACPI is being initialized in ACPICA or OpenBSD's ACPI code, it walks the entire DSDT tree looking for any methods named _INI and executes them. This is how certain variables get initialized, interrupts get set up, and anything else the hardware vendor needs to do gets done.

At this point you may be thinking: maybe there's just an _INI function that OpenBSD is not executing that is needed to fix the touchpad interrupt problem. I checked this a long time ago and listed out all of the _INI method calls that OpenBSD did and compared it to Linux. The results were similar enough that I didn't investigate further.

The ToHexString operator in that _INI function is built into ACPI and is supposed to convert string or integer data into a string of hexadecimal characters. The way it was implemented in OpenBSD's AML parser 11 years ago was to only accept integer arguments, so anything passed to it that wasn't an integer (such as the _HID string above) would cause an AML panic. After reviewing the ACPI specification, the fix was just to allow passing other types to the ToHexString (and ToDecString) functions, since the underlying OpenBSD implementations already handled converting non-integer types.
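The shape of that change can be shown with a simplified model. This is not OpenBSD's AML code: the function names are hypothetical, and the output formatting (no 0x prefix, comma-separated buffer bytes) is illustrative rather than what OpenBSD or ACPICA actually emits. The handling of the three operand types follows the ACPI specification's description of ToHexString as best understood here.

```python
def to_hex_string_old(data):
    """Models the old behaviour: anything but an integer is an error."""
    if not isinstance(data, int):
        raise TypeError("Not Integer")
    return format(data, "X")


def to_hex_string_fixed(data):
    """Models the fix: integers, strings and buffers are all accepted."""
    if isinstance(data, int):
        return format(data, "X")
    if isinstance(data, str):
        # Already a string, so there is nothing to convert.
        return data
    if isinstance(data, (bytes, bytearray)):
        # A buffer becomes a list of hex byte values.
        return ",".join(format(b, "02X") for b in data)
    raise TypeError("unsupported operand type")
```

With the old model, passing a string such as "SYNA8005" raises the error. With the fixed model the string passes through unchanged, which is all the ADBG call in the touchpad's _INI method needs.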

However, while debugging that crash, I noticed something strange. The first conditional in that _INI method checks against OSYS, a global variable that most DSDTs set according to which version of which operating system they are running on. There's a long history related to _OSI that I won't go into, but basically every OS now claims to be Windows (except on Apple hardware, where we all claim to be Darwin), because it's easier for other OSes to behave like Windows and macOS than for hardware vendors to update their BIOS code when a driver issue in Linux is fixed.

  --== Eval Method [\_SB_.PCI0.I2C1.TPD0._INI, 0 args] to t ==--
  ===== Stack \_SB_.PCI0.I2C1.TPD0._INI:Method
  parsename: \GPDI 5
   write 00 6fb1847a 0020 [\GNVS]
  parsename: \OSYS 5
    read 00 6fb18000 0010 [\GNVS]
  aml_evalexpr: LLess 0 7dc = ffffffffffffffff
  quick: 203a8 [LLess] alloc return integer = 0xffffffffffffffff
  parse-if @ 203a6
  parsename: \_SB_.SRXO 8

The AML If ((OSYS < 0x07DC)) was being turned into a conditional LLess 0 7dc, but why was OSYS zero?

Looking elsewhere in the DSDT, OSYS is initialized like so:

Scope (_SB.PCI0)
{
    [...]
    Method (_INI, 0, Serialized)  // _INI: Initialize
    {
        TBPE = One
        OSYS = 0x03E8
        If (CondRefOf (\_OSI))
        {
            If (_OSI ("Windows 2001"))
            {
                WNTF = One
                WXPF = One
                WSPV = Zero
                OSYS = 0x07D1
            }

            If (_OSI ("Windows 2001 SP1"))
            {
                WSPV = One
                OSYS = 0x07D1
            }
            
            [...]
            
            If (_OSI ("Windows 2015"))
            {
                WIN8 = One
                OSYS = 0x07DF
            }

Basically for each newer version of Windows that the system reports it is compatible with (OpenBSD reports up to Windows 2015), OSYS is updated to a higher value. That OSYS variable is then used in various other DSDT methods related to setting up devices, basically to allow backwards compatibility if the machine is being used with older versions of Windows that may not be able to deal with a device set up in one particular way vs. another.
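A rough model of that _OSI chain makes the effect on the touchpad's conditional easier to see. The function names are hypothetical, and only the three _OSI strings visible in the excerpt are included; the versions elided by "[...]" are deliberately left out rather than guessed.

```python
# Only the _OSI strings and values visible in the excerpt are modelled.
OSI_LEVELS = [
    ("Windows 2001", 0x07D1),
    ("Windows 2001 SP1", 0x07D1),
    ("Windows 2015", 0x07DF),
]


def compute_osys(supported):
    """Mimic _SB.PCI0._INI: each later match overwrites the earlier value."""
    osys = 0x03E8  # default assigned before any _OSI query
    for name, value in OSI_LEVELS:
        if name in supported:  # stands in for _OSI (name) returning true
            osys = value
    return osys


def takes_legacy_path(osys):
    """The touchpad's If ((OSYS < 0x07DC)) test."""
    return osys < 0x07DC
```

An OS that answers yes to "Windows 2015" ends up with 0x07DF and skips the legacy branch. An OSYS that has never been written is zero, which is below 0x07DC, so the legacy branch is taken.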

So if OSYS is being initialized in _SB.PCI0._INI, why is it zero when doing the conditional in the touchpad's _INI method?

As it turns out, the way that OpenBSD's ACPI stack was walking the entire DSDT tree looking for _INI methods was slightly different from ACPICA's (and presumably Windows'). On OpenBSD, nodes were being walked in this order:

  \_SB_.PCI0.LPCB.EC0_._INI
  \_SB_.PCI0.LPCB.EC0_.ALSD._INI
  \_SB_.PCI0.XHC_._INI
  [...]
  \_SB_.PCI0.I2C1.TPL1._INI
  \_SB_.PCI0._INI

But in ACPICA, they were walked in this order:

  \_SB_.PCI0._INI
  \_SB_.PCI0.LPCB.EC0_._INI
  \_SB_.PCI0.LPCB.EC0_.ALSD._INI
  \_SB_.PCI0.XHC_._INI
  [...]
  \_SB_.PCI0.I2C1.TPL1._INI

That slight change in ordering was the entire cause of the interrupt problem.

Earlier I wrote that I checked the list of _INI calls in Linux vs. OpenBSD, but I didn't realize the order of them was so important and that they had interdependencies that weren't explicit.

When \_SB_.PCI0.I2C1.TPL1._INI was executed first, OSYS was still zero, meaning that the conditional mentioned earlier was returning true and executing SRXO (GPDI, One).

After that, \_SB_.PCI0._INI was being executed, properly initializing OSYS. When ihidev would attach later, it would call the touchpad device's _CRS method to retrieve the I2C slave address and interrupt information that was supposed to have been set up earlier in its _INI method:

  Method (_CRS, 0, NotSerialized)  // _CRS: Current Resource Settings
  {
      If ((OSYS < 0x07DC))
      {
          Return (SBFI) /* \_SB_.PCI0.I2C1.TPD0.SBFI */
      }
      [...]

By this time, OSYS was properly set, and it would return resource information saying that its interrupts were routing through the IOAPIC on a particular pin, and OpenBSD would try to configure the IOAPIC accordingly. However, that didn't match what the firmware was actually doing earlier when _INI was executed, because it was being told to route its interrupt through some other mechanism or perhaps it never activated anything.

The fix for this was to change the node walk algorithm to match ACPICA and execute a matching child node (_INI) of a device before recursing through its child devices. With that change in place, it now properly executes \_SB_.PCI0._INI before \_SB_.PCI0.I2C1.TPL1._INI, ensuring that OSYS is set before it's read.
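The difference between the two walks can be reproduced with a toy namespace. This is a sketch under assumptions, not the OpenBSD code: the function names are hypothetical, the old walk is assumed to have visited children strictly in declaration order, and the tree is laid out (with PCI0's _INI declared after its child devices) so that it reproduces the orderings listed earlier.

```python
# Toy namespace as nested dicts in declaration order (assumed layout).
# A value of None marks a method; a dict marks a device or scope.
NAMESPACE = {
    "PCI0": {
        "LPCB": {"EC0_": {"_INI": None, "ALSD": {"_INI": None}}},
        "XHC_": {"_INI": None},
        "I2C1": {"TPL1": {"_INI": None}},
        "_INI": None,  # declared after the child devices
    }
}


def walk_old(scope, path, out):
    """Visit children strictly in declaration order, running _INI when met."""
    for name, child in scope.items():
        full = path + "." + name
        if name == "_INI":
            out.append(full)
        elif isinstance(child, dict):
            walk_old(child, full, out)
    return out


def walk_fixed(scope, path, out):
    """Run this scope's own _INI first, then recurse into child devices."""
    if "_INI" in scope:
        out.append(path + "._INI")
    for name, child in scope.items():
        if isinstance(child, dict):
            walk_fixed(child, path + "." + name, out)
    return out


old_order = walk_old(NAMESPACE, "\\_SB_", [])
new_order = walk_fixed(NAMESPACE, "\\_SB_", [])
```

Both walks execute the same set of methods. Only the position of \_SB_.PCI0._INI changes, from last in old_order to first in new_order, which is why comparing the lists of _INI calls without regard to order did not reveal the problem.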

With that fix in place, I was happy to finally disable forced-polling in dwiic for ihidev.

In the end, the bug had nothing to do with the devices being Intel 100 Series, and was most likely affecting all of them similarly because their vendors all used the same DSDT template from Intel, which uses OSYS in device _INI methods without an explicit dependency on _SB_.PCI0._INI to initialize it.

These fixes are now in the OpenBSD tree and have been in recent snapshots, so if this bug affected you and you want to try it out with proper interrupts, try the most recent snapshot.
