ARM Cortex-M7 SVCall Handler Misbehavior and Hard Fault Escalation
The issue at hand involves an ARM Cortex-M7 processor (specifically the NXP i.MX RT1062 chip) where the SVCall (Supervisor Call) handler is misbehaving, resulting in either a return to an invalid address (0xdeadbeee) or escalation to a Hard Fault. The problem manifests during free-running execution but does not occur when stepping through the code in a debugger. This discrepancy suggests a subtle timing or state-related issue in the interaction between the SVCall handler, the processor’s exception handling mechanism, and the stack management.
The SVCall handler is implemented in assembly and C, with the assembly portion responsible for determining the stack pointer (MSP or PSP) and extracting the SVC number, while the C portion handles the actual SVC operation. The issue arises during the return from the SVCall handler, where the processor either fails to restore the correct context or encounters an invalid state, leading to the observed misbehavior.
Nested SVCall Execution and Handler Mode Context Mismanagement
The root cause of the issue lies in the improper handling of the processor’s execution context during the SVCall exception. Specifically, the problem is exacerbated by the following factors:
- Nested SVCall Execution: An SVC instruction raises a synchronous exception, so it cannot be left pending. Executing SVC while the current execution priority is equal to or higher than the SVCall priority (e.g., within the SVCall handler itself or within a higher-priority handler) results in a Hard Fault, because the processor cannot re-enter the SVCall handler at that point. The EXC_RETURN value of 0xFFFF_FFF1 indicates that the exception was taken from Handler mode (the return goes to Handler mode using the MSP), which is consistent with the observed behavior.
- Incorrect Stack Pointer Management: The assembly code in the SVCall handler determines whether the exception frame was stacked on the Main Stack Pointer (MSP) or the Process Stack Pointer (PSP) by testing bit 2 of the EXC_RETURN value. It uses MOVEQ r1, sp instead of MRSEQ r1, msp. The sp register is banked: it is always the MSP in Handler mode, but in Thread mode it is the MSP or the PSP depending on CONTROL.SPSEL. Inside the handler the two forms read the same register, yet sp gives the frame address only if nothing has been pushed since exception entry, and any later use of sp always addresses the main stack even when the frame is on the process stack. Either mistake leads to incorrect stack frame access and corrupted state restoration.
- Debugger Masking the Issue: When stepping through the code in a debugger, the timing and state of the processor are altered, which can mask the underlying issue. Interrupts are commonly held off while single-stepping, so preemption patterns differ from free-running execution. This is why the problem does not manifest during debugging: the debugger’s intervention can prevent the nested SVCall scenario or other timing-related issues from occurring.
- Invalid Return Address: The return address 0xdeadbeee is the well-known placeholder value 0xdeadbeef, often used to mark invalid or uninitialized memory, with bit 0 cleared. The PC always holds a halfword-aligned address, so a stacked value of 0xdeadbeef would most likely be reported as 0xdeadbeee. This suggests that the handler returns through a stack frame that is corrupted or was never a real frame, so a fill pattern is loaded into the Program Counter (PC) instead of a valid return address.
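The EXC_RETURN checks described in the list above can be sketched in C. This is a minimal illustration, assuming the ARMv7-M bit layout (bit 2 selects the stack that holds the frame, bit 3 selects the mode returned to, bit 4 is clear when floating-point state was stacked). The type and function names are hypothetical and not part of CMSIS or the original handler.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of an ARMv7-M EXC_RETURN value (the value held in LR on
   entry to an exception handler). Hypothetical type, for illustration. */
typedef struct {
    bool valid;           /* bits [31:5] are all ones */
    bool to_thread_mode;  /* bit 3: 1 = return to Thread, 0 = return to Handler */
    bool frame_on_psp;    /* bit 2: 1 = frame stacked on PSP, 0 = on MSP */
    bool basic_frame;     /* bit 4: 1 = 8-word frame, 0 = extended FP frame */
} exc_return_info;

/* Split an EXC_RETURN value into its fields. */
exc_return_info decode_exc_return(uint32_t exc_return)
{
    exc_return_info info;
    info.valid          = (exc_return & 0xFFFFFFE0u) == 0xFFFFFFE0u;
    info.to_thread_mode = (exc_return & (1u << 3)) != 0;
    info.frame_on_psp   = (exc_return & (1u << 2)) != 0;
    info.basic_frame    = (exc_return & (1u << 4)) != 0;
    return info;
}

/* True when the exception was taken from Handler mode, which is what
   0xFFFFFFF1 reports. */
bool returns_to_handler_mode(uint32_t exc_return)
{
    exc_return_info info = decode_exc_return(exc_return);
    return info.valid && !info.to_thread_mode;
}
```

With this decoding, 0xFFFFFFF1 reports a return to Handler mode with the frame on the MSP, 0xFFFFFFFD a return to Thread mode with the frame on the PSP, and a value such as 0xDEADBEEE is rejected as not being an EXC_RETURN value at all.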
Implementing Correct SVCall Handler and Context Management
To resolve the issue, the following steps should be taken to ensure proper SVCall handler implementation and context management:
- Prevent Nested SVCall Execution: The SVCall handler, and any code it calls, must not execute an SVC instruction. Code that can run in either mode can read IPSR (nonzero in Handler mode) before issuing an SVC and take a direct call path instead. Inside the handler, the EXC_RETURN value shows which mode the call came from. The handler should be designed to handle only one SVCall at a time, with any nested calls being rejected or deferred.
- Explicit Stack Pointer Management: The assembly code should use the MRS instruction to read the MSP or PSP by name, rather than relying on the banked sp register. For example, replace MOVEQ r1, sp with MRSEQ r1, msp so that both branches of the test name the stack they read. The frame pointer obtained this way should be read before anything is pushed and then kept for every later access to the stack frame, which prevents state corruption.
- Stack Frame Validation: Before returning from the SVCall handler, the stack frame should be validated to ensure that it contains plausible values for the registers and return address. At a minimum, the stacked return address should point into executable memory and the stacked xPSR should have the Thumb bit set.
- Debugging and Testing: To identify and resolve timing-related issues, the code should be tested under various conditions, including different interrupt priorities and system loads. Breakpoints and watchpoints in the debugger can help identify the exact point where the stack frame becomes corrupted or the invalid return address is loaded.
- Exception Handling Best Practices: The SVCall handler should follow best practices for exception handling, including proper use of the EXC_RETURN value, correct stack frame management, and avoidance of nested exceptions. The handler should also handle errors gracefully, for example by triggering a system reset or logging the error for further analysis.
By implementing these fixes, the SVCall handler can be made robust and reliable, preventing the return to an invalid address or escalation to a Hard Fault. The key is to ensure proper context management and avoid nested SVCall execution, which are the primary causes of the observed misbehavior.
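The stack frame validation step can be sketched as a small C check. This is an illustration only: the struct mirrors the basic eight-word frame that ARMv7-M hardware pushes on exception entry, while the function name and the code_start and code_end bounds are assumptions that would come from the linker script of the actual project.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Basic (non-FP) frame pushed by hardware on ARMv7-M exception entry,
   lowest address first. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
} exception_frame;

#define XPSR_T_BIT (1u << 24)  /* Thumb state bit; must be set on Cortex-M */

/* Hypothetical sanity check. code_start and code_end (end exclusive) are
   assumed bounds of executable memory, not values from the original. */
bool frame_looks_valid(const exception_frame *f,
                       uint32_t code_start, uint32_t code_end)
{
    if (f == NULL) {
        return false;
    }
    if (f->pc < code_start || f->pc >= code_end) {
        return false;  /* return address outside executable memory */
    }
    if ((f->xpsr & XPSR_T_BIT) == 0) {
        return false;  /* executing with T clear raises a UsageFault */
    }
    return true;
}
```

A check like this would reject a stacked return address of 0xdeadbeee as long as that address lies outside the configured code range. It cannot prove the frame is correct; it only catches obviously wrong values before the exception return uses them.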
Detailed Analysis of the SVCall Handler Code
The provided SVCall handler code has several areas that need attention:
- Stack Pointer Determination:

      TST LR, #4
      ITE EQ
      MOVEQ r1, sp
      MRSNE r1, psp

  This code tests bit 2 of the EXC_RETURN value to determine whether the exception frame is on the MSP or the PSP, and leaves the frame address in r1. The MOVEQ r1, sp form relies on sp being the MSP in Handler mode. It should be replaced with MRSEQ r1, msp to explicitly access the Main Stack Pointer.
- Return Value Handling:

      STMFD sp!, {LR}
      BL SVC_Handler_C
      LDMIA sp!, {LR}
      stm sp, {r0}
      BX LR

  The return value from SVC_Handler_C is stored on the stack, but the stack frame is not validated before returning. The store goes through sp, which is the MSP, rather than through the frame address computed earlier (r1 is not preserved across the call). When the SVC was issued from code running on the PSP, the value lands on the main stack instead of in the caller's frame, which can corrupt whatever is stored there.
- SVC Number Extraction:

      LDR r3, [r1, #0x18]
      LDRH r0, [r3,#-2]
      BIC r0, r0, #0xFF00

  This code loads the stacked return address (offset 0x18 in the frame), reads the 16-bit SVC instruction located just before it, and keeps the low byte, which is the SVC number. While this is correct, it assumes that r1 points to the right frame and that the stack frame is not corrupted.
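The extraction and return-value steps analyzed above can also be written on the C side, with the assembly stub passing the frame address it computed. The following is a minimal sketch under that assumption: svc_number_at, svc_dispatch, and the service numbers are hypothetical names for illustration, not the original SVC_Handler_C.

```c
#include <stdint.h>

/* Word indices into the hardware-stacked frame: r0, r1, r2, r3, r12, lr,
   pc, xPSR. */
enum { FRAME_R0 = 0, FRAME_R1 = 1, FRAME_PC = 6 };

/* The Thumb SVC encoding is 0xDF00 | imm8, stored little-endian, so the
   byte two below the return address is the SVC number. On the target,
   return_address would be (const uint8_t *)frame[FRAME_PC]. */
uint8_t svc_number_at(const uint8_t *return_address)
{
    return return_address[-2];
}

/* Hypothetical service table: 0 adds the caller's r0 and r1; every other
   number yields an error code. */
void svc_dispatch(uint32_t *frame, uint8_t svc_number)
{
    uint32_t result;

    switch (svc_number) {
    case 0:
        result = frame[FRAME_R0] + frame[FRAME_R1];
        break;
    default:
        result = 0xFFFFFFFFu;  /* unknown service */
        break;
    }

    /* Write into the stacked r0 of the frame the caller actually used;
       hardware reloads r0 from here on exception return. */
    frame[FRAME_R0] = result;
}
```

Because the result is written through the frame pointer rather than through sp, the same code works whether the caller was running on the MSP or the PSP.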
Conclusion
The issue of the SVCall handler returning to 0xdeadbeee or escalating to a Hard Fault is primarily caused by nested SVCall execution and improper stack pointer management. By preventing nested SVCall execution, explicitly managing the stack pointers, and validating the stack frame, the issue can be resolved. Additionally, thorough testing and debugging under various conditions are essential to ensure the robustness of the SVCall handler. Following these steps will lead to a reliable and efficient implementation of the SVCall handler on the ARM Cortex-M7 processor.