Testing and Inducing Failures in ARM LDREX/STREX Atomic Operations

Understanding LDREX/STREX and the Challenge of Testing Failure Paths

The ARM architecture implements atomic read-modify-write sequences with the Load-Exclusive (LDREX) and Store-Exclusive (STREX) instructions. These instructions are the building blocks for synchronization primitives such as compare-and-swap (CAS), atomic increment and decrement, and semaphores or locks in multi-core, multi-threaded, or interrupt-driven environments. LDREX loads a value from memory and tags the address in an exclusive monitor. STREX then attempts to store a new value to the same address, and succeeds only if the monitor still holds that tag, that is, if nothing has written to the location or otherwise cleared the monitor since the LDREX. STREX reports the outcome in a status register: 0 on success and 1 on failure, in which case the whole LDREX/STREX sequence must be retried.
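The semantics described above can be illustrated with a small host-side model. This is only a sketch under stated assumptions: `monitor_t`, `model_ldrex`, `model_strex`, `model_clrex`, and `model_increment` are hypothetical names that imitate a single local exclusive monitor in portable C so the behavior can be explored on a development machine. A real monitor's granularity and clearing rules are implementation-dependent and are not captured here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical software model of one local exclusive monitor. */
typedef struct {
    bool exclusive;                 /* true between LDREX and the next STREX/CLREX */
    const volatile uint32_t *addr;  /* address tagged by the last LDREX */
} monitor_t;

/* Load a word and tag its address, as LDREX does. */
static uint32_t model_ldrex(monitor_t *m, const volatile uint32_t *p) {
    assert(m && p);
    m->exclusive = true;
    m->addr = p;
    return *p;
}

/* Store only if the tag is still held. Returns 0 on success, 1 on failure. */
static uint32_t model_strex(monitor_t *m, uint32_t value, volatile uint32_t *p) {
    assert(m && p);
    bool ok = m->exclusive && m->addr == p;
    m->exclusive = false;           /* every STREX leaves the monitor open */
    if (ok) {
        *p = value;
    }
    return ok ? 0u : 1u;
}

/* Drop the tag without storing, as CLREX does. */
static void model_clrex(monitor_t *m) {
    assert(m);
    m->exclusive = false;
}

/* Retry loop in the usual LDREX/STREX shape; returns the stored value. */
static uint32_t model_increment(monitor_t *m, volatile uint32_t *p) {
    uint32_t v;
    do {
        v = model_ldrex(m, p);
    } while (model_strex(m, v + 1u, p) != 0u);
    return v + 1u;
}
```

In this model, calling `model_clrex` between `model_ldrex` and `model_strex` makes the store return 1 and leaves memory unchanged, which is the failure path the rest of this article sets out to exercise on real hardware.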

The challenge lies in testing the failure paths of these atomic operations. Specifically, how can we reliably and reproducibly induce a failure in the STREX operation to ensure that the code handles such cases correctly? This is critical for verifying the robustness of synchronization mechanisms, especially in real-time systems where atomicity is paramount.

Causes of STREX Failures and Their Simulation

STREX failures can occur due to several reasons, including:

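Memory Access by Another Core or Context: If another processor core or context writes to the memory location between the LDREX and STREX operations, the exclusive store will fail.
Interrupts or Exceptions: An interrupt or exception taken between the LDREX and STREX can clear the exclusive state, causing the STREX to fail. Whether this happens automatically depends on the architecture profile: Cortex-M (ARMv7-M) cores clear the local monitor on exception entry, whereas on ARMv7-A/R cores the handler or context-switch code is typically expected to execute CLREX itself.
Explicit Clearing of Exclusive Access: The CLREX instruction clears the executing processor's local monitor, so the next STREX fails unless another LDREX is executed first.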

To simulate these conditions, we need to manipulate the system state in a controlled manner. Simply halting execution between LDREX and STREX or accessing memory via a debugger may not always result in a STREX failure, as the ARM architecture’s exclusive monitor behavior can be complex and implementation-dependent.

Techniques for Inducing and Testing STREX Failures

Using Wait-For-Interrupt (WFI) and Interrupts

One effective method to induce a STREX failure is to insert a Wait-For-Interrupt (WFI) instruction between the LDREX and STREX. The WFI instruction suspends execution in a low-power state until an interrupt or other wake-up event occurs, which holds the window between LDREX and STREX open for as long as needed. An interrupt taken during this window stands in for a context switch or for another core modifying the memory location, and causes the STREX to fail.

To implement this:

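Instrument the code to insert a WFI instruction between the LDREX and STREX, in test builds only and only on the first attempt, so that the retry can complete without waiting for another interrupt.
Configure an interrupt source (e.g., a one-shot timer interrupt) to fire shortly after the WFI is executed.
In the interrupt handler, modify the memory location that was loaded by the LDREX. On cores that do not clear the local monitor on exception entry, also execute CLREX in the handler.
Upon returning from the interrupt, the STREX should fail, allowing you to verify the failure handling logic.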

Using CLREX to Force a Failure

The CLREX instruction explicitly clears the exclusive access marker, ensuring that the next STREX fails until a new LDREX sets the marker again. This can be used to test the failure path without relying on external events like interrupts.

To implement this:

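Insert a CLREX instruction between the LDREX and STREX, guarded by a test flag or counter so that it executes only on selected attempts. An unconditional CLREX inside the retry loop makes every attempt fail, and the loop never terminates.
Execute the STREX, which should now fail due to the cleared exclusive access.
Verify that the code correctly handles the failure, retries the operation, and produces the same result as an uninterrupted run.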
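A minimal host-side sketch of bounded CLREX injection follows. Everything in it is illustrative: `ldrex_w`, `strex_w`, and `clrex` are hypothetical stand-ins that imitate the local monitor with a single flag so the retry logic can run on a development machine, and `inject_failures` is an assumed test knob rather than part of any real API. On a target they would be replaced by the toolchain's exclusive-access intrinsics (CMSIS-Core, for instance, provides `__LDREXW`, `__STREXW`, and `__CLREX`).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical host-side stand-ins for LDREX/STREX/CLREX. */
static bool monitor_exclusive;

static uint32_t ldrex_w(volatile uint32_t *p) {
    monitor_exclusive = true;
    return *p;
}

static uint32_t strex_w(uint32_t value, volatile uint32_t *p) {
    bool ok = monitor_exclusive;
    monitor_exclusive = false;
    if (ok) {
        *p = value;
    }
    return ok ? 0u : 1u;   /* 0 = stored, 1 = failed, as STREX reports */
}

static void clrex(void) {
    monitor_exclusive = false;
}

/* Test knob (assumed): how many upcoming STREX attempts to sabotage. */
static unsigned inject_failures;

/* Adds n to *p and returns the new value; *retries counts failed stores. */
static uint32_t atomic_add(volatile uint32_t *p, uint32_t n, unsigned *retries) {
    assert(p && retries);
    for (;;) {
        uint32_t old = ldrex_w(p);
        if (inject_failures > 0) {   /* bounded, so the loop can still finish */
            inject_failures--;
            clrex();
        }
        if (strex_w(old + n, p) == 0u) {
            return old + n;
        }
        (*retries)++;
    }
}
```

Setting `inject_failures` to 2 before a call should produce exactly two retries and the same final value as a call with no injection, which is the property the failure-handling test needs to check.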

Brute-Force Testing with High IRQ Load

While less deterministic, brute-force testing can be used to increase the likelihood of a STREX failure. This involves repeatedly calling the atomic functions while generating a high number of interrupts to increase the chances of an interrupt occurring between the LDREX and STREX.

To implement this:

Create a test loop that repeatedly calls the atomic functions.
Configure multiple interrupt sources (e.g., timers, peripherals) to fire at high frequencies.
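Count failed STREX attempts (for example, with a retry counter compiled into test builds) and check that the final value matches the number of operations performed. A run that records no failures has not exercised the failure path.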
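The bookkeeping for such a stress run can be sketched on a host as follows. This is a simulation under stated assumptions, not target code: the monitor is a single flag, `simulated_irq` is a hypothetical handler that increments the shared counter and clears the monitor (as exception entry does on Cortex-M), and `irq_pending` uses a simple linear congruential generator to decide when an interrupt lands inside the window. On hardware, real interrupt sources would take the place of these two functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static bool excl;                   /* stand-in for the local monitor */
static volatile uint32_t counter;   /* shared between main code and the "ISR" */
static uint32_t isr_hits;           /* interrupts that landed in the window */
static uint32_t strex_failures;     /* failed store attempts */
static uint32_t lcg_state = 1u;

/* Simulated IRQ: updates the counter and clears the monitor. */
static void simulated_irq(void) {
    counter = counter + 1u;
    isr_hits++;
    excl = false;
}

/* Deterministic pseudo-random arrival, roughly one window in eight. */
static bool irq_pending(void) {
    lcg_state = lcg_state * 1664525u + 1013904223u;
    return (lcg_state >> 29) == 0u;
}

static void atomic_increment(void) {
    for (;;) {
        excl = true;                    /* LDREX: tag and load */
        uint32_t v = counter;
        if (irq_pending()) {
            simulated_irq();            /* interrupt inside the window */
        }
        if (excl) {                     /* STREX succeeds only if still tagged */
            excl = false;
            counter = v + 1u;
            return;
        }
        strex_failures++;               /* STREX failed: reload and retry */
    }
}

/* Runs the stress loop; returns true when no update was lost. */
static bool stress(uint32_t iterations) {
    assert(iterations > 0u);
    counter = 0u;
    isr_hits = 0u;
    strex_failures = 0u;
    for (uint32_t i = 0u; i < iterations; i++) {
        atomic_increment();
    }
    return counter == iterations + isr_hits;
}
```

The invariant checked by `stress` is that every increment from both the main loop and the interrupt is reflected in the final count. If the retry logic were broken, an interrupt's update would be overwritten and the count would fall short. After a run, `strex_failures` should also be inspected to confirm that the failure path was actually taken.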

Debugger-Assisted Testing

For more precise control, a debugger can be used to manually manipulate the program counter or memory state between the LDREX and STREX. This method is less automated but allows for exact control over the test conditions.

To implement this:

Set a breakpoint at the instruction immediately following the LDREX.
When the breakpoint is hit, use the debugger to modify the memory location or force a context switch.
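Resume execution and check whether the STREX fails. As noted earlier, a debugger halt or memory access does not clear the exclusive monitor on every implementation, so confirm the outcome rather than assuming it.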

Combining Techniques for Comprehensive Testing

For thorough testing, a combination of the above techniques can be employed. For example, you could use WFI and interrupts to test the failure path under normal operating conditions, while also using CLREX to ensure that the failure handling logic is robust. Additionally, brute-force testing can be used to uncover edge cases that may not be easily reproducible with deterministic methods.

Example Code Snippet

Below is an example of how you might instrument your code to test STREX failures using WFI and interrupts:

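#include <stdint.h>
#include <arm_acle.h>

// Board-specific hook, assumed to be implemented elsewhere: arms a one-shot
// timer whose interrupt calls the given handler.
void configure_timer_interrupt(void (*handler)(void));

volatile uint32_t shared_variable = 0;
volatile uint32_t strex_retries = 0; // Number of failed STREX attempts

// Note: __ldrex, __strex, and __wfi are toolchain intrinsics; their names and
// headers vary between compilers.
void atomic_increment(volatile uint32_t *ptr) {
    uint32_t value;
    int inject_wait = 1; // Test-only: hold the window open on the first attempt
    for (;;) {
        value = __ldrex(ptr);
        if (inject_wait) {
            inject_wait = 0;
            __wfi(); // Wait for the timer interrupt
        }
        if (__strex(value + 1, ptr) == 0) {
            break; // Store succeeded
        }
        strex_retries++; // Store failed: reload and retry
    }
}

void timer_interrupt_handler(void) {
    shared_variable = 0xDEADBEEF; // Modify shared variable
}

int main(void) {
    // Configure timer interrupt
    configure_timer_interrupt(timer_interrupt_handler);

    // Perform atomic increment
    atomic_increment(&shared_variable);

    // Verify result: at least one retry, and the increment applied to the
    // value written by the handler
    if (strex_retries == 0 || shared_variable != 0xDEADBEEF + 1) {
        // Handle failure case
    }

    return 0;
}

In this example, the atomic_increment function uses WFI to wait for an interrupt between the LDREX and STREX, on its first attempt only. The interrupt handler modifies the shared variable, causing the STREX to fail and the loop to retry the operation. The retry runs without the WFI, so it completes and stores the handler's value plus one. If the WFI were executed on every pass, each retry would wait for another interrupt and fail again, and the loop would never exit. On cores that clear the local monitor on exception entry, such as Cortex-M, the interrupt alone is enough to make the STREX fail; on other cores the handler may also need to execute CLREX.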

Conclusion

Testing the failure paths of LDREX/STREX operations is essential for ensuring the robustness of atomic operations in ARM-based systems. By using techniques such as WFI with interrupts, CLREX, brute-force testing, and debugger-assisted manipulation, you can induce STREX failures deliberately and verify how the code responds. Combining these methods covers both deterministic and timing-dependent failures, which gives greater confidence that the retry logic is correct.
