SMP IRQ affinity

On multiprocessor systems, the APIC (Advanced Programmable Interrupt Controller) routes interrupts to processors/cores. As the name hints, the APIC can be programmed to do the routing as desired, to achieve optimal performance. By default on typical Linux systems, all hardware interrupts end up being serviced by CPU0, which can become a bottleneck and reduce performance. The solution is to use SMP affinity to assign interrupts to different cores — either manually, or automatically via the irqbalance daemon, which distributes interrupts across processors.

irqbalance analyzes the amount of work interrupts require on the system and balances interrupt handling across all of the system's CPUs in a fair manner, keeping system performance more predictable and consistent.

In Linux, /proc/irq/<irq_number>/smp_affinity controls which CPUs may handle a specific interrupt. The default for all interrupts is all cores (a hex bitmask of all “f”s), e.g.:

$ sudo cat /proc/irq/*/smp_affinity | uniq
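The smp_affinity value is a hexadecimal bitmask with one bit per CPU: bit 0 is CPU0, bit 1 is CPU1, and so on, so a mask of all “f”s selects every core. A minimal sketch of the mapping (the cpu_mask helper is my own name, not a standard tool):

```shell
# Each CPU corresponds to one bit in the affinity mask:
# mask = 1 << cpu_number, printed in hex.
cpu_mask() { printf '%x\n' $((1 << $1)); }

cpu_mask 0    # -> 1 (CPU0 only)
cpu_mask 1    # -> 2 (CPU1 only)
cpu_mask 3    # -> 8 (CPU3 only)

# CPUs 0-3 combined: 1|2|4|8 = f, the "all cores" default on a 4-CPU box
printf '%x\n' $(( (1 << 0) | (1 << 1) | (1 << 2) | (1 << 3) ))
```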

You would expect this setting to spread interrupts across all cores, but it doesn’t. Why? Because the APIC is configured by the BIOS and can operate in a couple of different modes, e.g.:

  • physical (all IRQs assigned to CPU0)
  • logical (round-robin distribution of IRQs, which in most cases would probably introduce performance degradation)

Therefore the behavior you get is dependent on the chipset capabilities and what the system vendor decided to set in the BIOS.

For testing purposes I’ve launched a cr1.8xlarge instance with RHEL 6.4 HVM in the AWS cloud, and there are a couple of things to check on boot to see if the system supports the Xen HVM callback (required in a virtualized environment) and which APIC mode is in use:

$ dmesg | egrep "Xen HVM callback|APIC routing"
Xen HVM callback vector for event delivery is enabled
Setting APIC routing to physical flat

So, APIC is in physical mode. Let’s see how NIC interrupts are handled:

$ cat /proc/interrupts | grep -E 'CPU|eth0' | less -S
            CPU0       CPU1       CPU2       CPU3 ... ... ...
1837:       2480          0          0          0 ... ... .... xen-dyn-event     eth0

OK, I want to bind this interrupt to CPU1:

$ sudo service irqbalance stop
$ sudo su -
$ echo 2 > /proc/irq/1837/smp_affinity
$ cat /proc/irq/1837/smp_affinity
$ cat /proc/interrupts | grep -E 'CPU|eth0' | less -S
            CPU0       CPU1       CPU2       CPU3 ... ... ...
1837:       4980        265          0          0 ... ... .... xen-dyn-event     eth0
$ cat /proc/interrupts | grep -E 'CPU|eth0' | less -S
            CPU0       CPU1       CPU2       CPU3 ... ... ...
1837:       4980        980          0          0 ... ... .... xen-dyn-event     eth0

I’ve stopped the irqbalance daemon (so that it doesn’t change my interrupt assignments), assigned IRQ 1837 to CPU1 using the interrupt mask, and checked /proc/interrupts twice to confirm that CPU1 is now handling the interrupts for eth0.
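The manual steps above can be wrapped in a small helper. This is just a sketch (pin_irq is a made-up name, not a standard tool), assuming irqbalance is already stopped and the function runs as root:

```shell
# Bind a single IRQ to a single CPU by writing the corresponding
# hex bitmask to /proc/irq/<irq>/smp_affinity (requires root).
pin_irq() {
    irq=$1
    cpu=$2
    printf '%x\n' $((1 << cpu)) > "/proc/irq/$irq/smp_affinity"
}

# Example: pin_irq 1837 1   # IRQ 1837 -> CPU1 (mask 2)
```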

If I wanted to run irqbalance for automatic interrupt distribution but keep IRQ 1837 assigned to CPU1, I would have to either run irqbalance with --banirq=1837 so that it leaves the affinity of IRQ 1837 alone, or edit IRQBALANCE_BANNED_CPUS in /etc/sysconfig/irqbalance to exclude CPUs from balancing.

Also, if I wanted to bind a process to a specific CPU, I would use the taskset utility (here the mask 8 selects CPU3):

$ sudo taskset -p 8 2470
pid 2470's current affinity mask: ffffffff
pid 2470's new affinity mask: 8
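The masks taskset prints use the same per-bit convention as smp_affinity: ffffffff means CPUs 0–31, and 8 means CPU3 only. A small decoding sketch (mask_to_cpus is a hypothetical helper of mine):

```shell
# Decode a hex affinity mask into the list of CPUs it selects.
mask_to_cpus() {
    mask=$((0x$1))
    cpu=0
    cpus=""
    while [ "$mask" -ne 0 ]; do
        # If the lowest bit is set, this CPU is in the mask.
        [ $((mask & 1)) -eq 1 ] && cpus="$cpus $cpu"
        mask=$((mask >> 1))
        cpu=$((cpu + 1))
    done
    echo "${cpus# }"
}

mask_to_cpus 8    # -> 3 (CPU3 only)
mask_to_cpus f    # -> 0 1 2 3
```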

As a final note, SMP affinity is commonly set on systems with multi-gigabit and multiple network cards, since multi-gigabit network traffic pushes the limits of current SMP systems. TCP/IP software implementations are known for their inability to scale well in general-purpose monolithic operating systems on SMP. One study showed that IRQ affinity alone provides a throughput gain of up to 25%, and that combined process and interrupt affinity can achieve gains of 30%, for bulk data transfers. [1]

Also, if a NIC supports MSI-X it can have multiple queues, with each queue getting its own interrupt (keeping in mind that TX affinity should match RX affinity)!

For these kinds of busy systems, it is also useful to run e.g. Munin to graph interrupts and CPU usage.

