Fast interrupt handling is important to system throughput and responsiveness. Xtensa processors already support rich configuration of sophisticated multi-level exception and interrupt priority levels, with additional state at each level to accelerate interrupt handler entry and exit. This flexible support provides specific capabilities for common shared interrupt handlers, medium priority C-coded interrupt handlers, lowest-latency high-priority interrupt handlers, and debug support for interrupt and exception handling. In some applications, even higher performance is needed, either to support very tight interrupt latency requirements, or to reduce total interrupt handling CPU load under high interrupt rate conditions.
This application note describes a method that uses existing Xtensa features and configuration options to support very fast interrupt handling. Xtensa now supports both a windowed register file ("windowed ABI") and a non-windowed register file ("call0 ABI") calling convention. The method described in this note exploits the fact that call0 uses only 16 registers for any context, yet Xtensa supports up to 64 physical register file entries. By using existing instructions to switch register banks, the cycles normally required for saving and restoring registers on a context switch can be reduced, sometimes dramatically. Accelerating the save and restore can have multiple benefits. First, it can reduce the latency of interrupt service, especially for interrupts with more complex code structure written in C - the processor can be executing code in the C interrupt handler as little as three instructions after the interrupt event. Second, it can reduce the total processor overhead for interrupt handling. The interrupt routine can execute as few as six instructions, not including the code of the C-level handler itself.
This note focuses on the interrupt handling case - i.e., the case where one task is running, but is preempted at some random point by an external or timer interrupt, which then performs an independent task. The interrupt routine has minimal register state of its own (it only needs a live stack pointer), so it only needs to save the register state of the original task on entry, then perform its function, often creating temporary state in registers, and finally restore the register state of the original task and allow it to resume. (The same technique could be extended to full context switching, in which there are 2-4 active tasks, each with a full set of live register state, each being switched at random points, but that demonstration requires development of a more complete RTOS-like scheduling infrastructure.)
These examples deal with the standard user state found in most configurations: the 16 live ARs, the SAR (Shift Amount Register) and the zero-overhead loop registers, if configured. They do not cover TIE state, which we assume is a) not configured, b) used solely by one task, so not switched, or c) switched by other software. Tensilica generally recommends that interrupt handlers not use TIE instructions that operate on TIE state. The Tensilica OSKit RTOS porting kit does include automatically-generated routines for save and restore of TIE state, which could be used to help implement interrupt handling with TIE state. Xtensa processors and the TIE language also define coprocessor enable flags (CPENABLE) that can be used for lazy TIE state save and restore.
The demonstration also takes advantage of the call0 ABI, in which registers a12 through a15 are "callee-saved", that is, each C-compiled procedure assumes that its caller has left some live values in these registers. If the callee needs to use these registers for temporary values, it must save the caller's value, use the registers, then restore the caller's value before returning. The availability of these four callee-saved registers allows the interrupt handler to keep some of the interrupted task's register state in the interrupt routine's ARs. That state will simply stay in the ARs until the interrupt routine needs the registers. This avoids or defers the save/restore to memory of these values. The method outlined here puts the interrupted task's SAR value in a15, and the three loop register values (LCOUNT, LBEG and LEND) in a12-a14, if zero-overhead loops are needed. In fact, the saving of the loop registers can be avoided unless loop instructions are used in both the interrupted task and the interrupt service routines. These examples also assume that the configuration or code does not use the MAC16 coprocessor, which may have live accumulator state.
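The stashing of special registers in callee-saved ARs can be illustrated with a small host-side C model. This is only a sketch under stated assumptions, not Xtensa code: the `Cpu` struct, `isr_entry`, `isr_exit` and `c_handler` are hypothetical names, and the particular assignment of LCOUNT, LBEG and LEND to a12, a13 and a14 is assumed for illustration (the text above only fixes SAR in a15).

```c
#include <stdint.h>

/* Toy model of the state visible to the interrupt routine (hypothetical). */
typedef struct {
    uint32_t ar[16];                    /* handler's visible a0..a15 */
    uint32_t sar, lbeg, lend, lcount;   /* special registers shared with the task */
} Cpu;

/* Entry stub: copy the task's special registers into callee-saved ARs.
 * On real hardware each line would be a read of a special register. */
void isr_entry(Cpu *c) {
    c->ar[12] = c->lcount;   /* assumed mapping: LCOUNT -> a12 */
    c->ar[13] = c->lbeg;     /* assumed mapping: LBEG   -> a13 */
    c->ar[14] = c->lend;     /* assumed mapping: LEND   -> a14 */
    c->ar[15] = c->sar;      /* SAR -> a15, as described in the text */
}

/* Stand-in for a compiled call0 handler: it may clobber SAR and the loop
 * registers freely, but must preserve a12..a15 across the call. */
void c_handler(Cpu *c) {
    uint32_t spill_a12 = c->ar[12];  /* callee spills a12 before using it */
    c->ar[12] = 0xABCDu;             /* a12 used as a temporary */
    c->sar = 7;                      /* a shift in the handler overwrites SAR */
    c->lbeg = c->lend = c->lcount = 0;
    c->ar[12] = spill_a12;           /* callee restores a12 before returning */
}

/* Exit stub: write the stashed values back to the special registers. */
void isr_exit(Cpu *c) {
    c->lcount = c->ar[12];
    c->lbeg   = c->ar[13];
    c->lend   = c->ar[14];
    c->sar    = c->ar[15];
}
```

Calling `isr_entry`, `c_handler` and `isr_exit` in that order leaves the task's SAR and loop registers unchanged, and memory is touched only if the handler actually needs a12-a15 as temporaries.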
The basic mechanism is described in the figure below. On entry into the interrupt handler, the code executes the ROTW 4 instruction, which moves the internal WINDOWBASE pointer to hide the 16 AR registers of the interrupted task and expose a new set of 16 AR registers. These new registers hold no live values, except for a1, a previously initialized stack pointer. Callee-saved registers a12-a15 can be used for holding other state of the interrupted task, since these will be automatically saved by any C code that needs to use those registers. Before exit from the timer interrupt, the code executes the ROTW -4 instruction, which restores the internal WINDOWBASE pointer to re-expose the 16 AR registers of the interrupted task.
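The effect of the two ROTW instructions can be checked with a small host-side C model of the register file. This is a sketch rather than target code: it assumes the 64-entry configuration, models WINDOWBASE as a count of 4-register units, and uses hypothetical names (`ArFile`, `ar_index`, `rotw`, `demo_bank_switch`) and an arbitrary stack pointer value.

```c
#include <stdint.h>

#define NUM_PHYS_AR 64   /* physical AR entries (assumed 64-entry configuration) */
#define WINDOW_UNIT 4    /* WINDOWBASE counts in units of 4 registers */

typedef struct {
    uint32_t phys[NUM_PHYS_AR];  /* physical register file */
    unsigned windowbase;         /* 0..15 for a 64-entry file */
} ArFile;

/* Map visible register aN (N = 0..15) to a physical index, wrapping at 64. */
unsigned ar_index(const ArFile *rf, unsigned n) {
    return (rf->windowbase * WINDOW_UNIT + n) % NUM_PHYS_AR;
}

uint32_t ar_read(const ArFile *rf, unsigned n) { return rf->phys[ar_index(rf, n)]; }
void ar_write(ArFile *rf, unsigned n, uint32_t v) { rf->phys[ar_index(rf, n)] = v; }

/* Model of ROTW: add a signed immediate to WINDOWBASE, modulo 16. */
void rotw(ArFile *rf, int imm) {
    int units = NUM_PHYS_AR / WINDOW_UNIT;
    rf->windowbase = (unsigned)((((int)rf->windowbase + imm) % units + units) % units);
}

/* Returns 1 if the task's 16 registers survive a handler that overwrites
 * every visible register except its pre-initialized stack pointer a1. */
int demo_bank_switch(void) {
    ArFile rf = {{0}, 0};
    unsigned n;

    rotw(&rf, 4);                          /* one-time setup of handler bank */
    ar_write(&rf, 1, 0x3FFF0000u);         /* hypothetical handler stack pointer */
    rotw(&rf, -4);

    for (n = 0; n < 16; n++)               /* interrupted task's live state */
        ar_write(&rf, n, 0x1000u + n);

    rotw(&rf, 4);                          /* handler entry: hide task's ARs */
    if (ar_read(&rf, 1) != 0x3FFF0000u) return 0;
    for (n = 0; n < 16; n++)               /* handler clobbers all but a1 */
        if (n != 1) ar_write(&rf, n, 0xDEAD0000u + n);
    rotw(&rf, -4);                         /* handler exit: re-expose task's ARs */

    for (n = 0; n < 16; n++)
        if (ar_read(&rf, n) != 0x1000u + n) return 0;
    return 1;
}
```

In this model a rotation by 4 units moves the visible window by exactly 16 physical entries, so the handler's a0-a15 never overlap the task's a0-a15 and no register needs to be copied to memory.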