NEON Instruction Execution Failure in NSEL1 with HCR_EL2.ID and HCR_EL2.CD Set to 1
The core issue is that NEON instructions fail to execute in Non-Secure EL1 (NSEL1) when the ID (Stage 2 Instruction access cacheability disable) and CD (Stage 2 Data access cacheability disable) bits of the HCR_EL2 register are both set to 1. Specifically, the NEON instruction str q0, [x1, #0x60] triggers an exception when executed in NSEL1, while the same instruction executes successfully in EL3. The exception message includes SPSR 0x3c9, ELR 0x400, and ESR 0x82000010. When the NEON instruction is replaced with a non-NEON instruction such as str x0, [x1, #60], execution proceeds without issues. This behavior is unexpected because HCR_EL2.ID and HCR_EL2.CD are documented as cacheability controls for instruction and data accesses made from NSEL0 and NSEL1, with no direct mention of any impact on NEON operations.
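To make the register values quoted above easier to interpret, here is a minimal C sketch that splits an ESR_ELx value into its architectural fields (EC in bits [31:26], IL in bit [25], ISS in bits [24:0]). The names esr_fields, esr_decode, esr_ec_name and esr_fault_status are hypothetical helpers introduced for illustration, and the name table covers only a few exception classes.

```c
#include <stdint.h>

/* Architectural fields of an AArch64 ESR_ELx value. */
typedef struct {
    unsigned ec;   /* Exception Class, bits [31:26] */
    unsigned il;   /* Instruction Length bit, bit [25] */
    uint32_t iss;  /* Instruction Specific Syndrome, bits [24:0] */
} esr_fields;

/* Split a raw ESR_ELx value into EC, IL and ISS. */
esr_fields esr_decode(uint32_t esr)
{
    esr_fields f;
    f.ec  = (esr >> 26) & 0x3Fu;
    f.il  = (esr >> 25) & 0x1u;
    f.iss = esr & 0x01FFFFFFu;
    return f;
}

/* Name a few exception classes relevant to this problem (not exhaustive). */
const char *esr_ec_name(unsigned ec)
{
    switch (ec) {
    case 0x07: return "Trapped SIMD/FP access";
    case 0x0E: return "Illegal Execution state";
    case 0x20: return "Instruction Abort from a lower Exception level";
    case 0x21: return "Instruction Abort, same Exception level";
    case 0x24: return "Data Abort from a lower Exception level";
    case 0x25: return "Data Abort, same Exception level";
    default:   return "other (see the architecture reference manual)";
    }
}

/* For Instruction and Data Aborts, the fault status code is ISS[5:0]. */
unsigned esr_fault_status(uint32_t iss)
{
    return iss & 0x3Fu;
}
```

Applied to the reported value, esr_decode(0x82000010) gives EC 0x20, IL 1 and ISS 0x10. That is an Instruction Abort from a lower Exception level, and the fault status code 0b010000 denotes a synchronous External abort that did not occur on a translation table walk.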
The problem is further complicated by the fact that the MMU is disabled in both EL3 and EL1, ruling out MMU-related issues as the root cause. Additionally, the register x1 is confirmed to be 64-byte aligned, eliminating alignment issues as a potential cause. This leaves the HCR_EL2 settings as the primary suspect, despite the lack of explicit documentation linking these bits to NEON instruction execution.
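The alignment argument can be checked with simple arithmetic. The following C sketch computes the effective address of str q0, [x1, #0x60] and tests whether the 16-byte access is naturally aligned and stays within one 64-byte block. The function names are hypothetical, and the base address used in the usage note is an assumed example value, since the source does not give the actual contents of x1.

```c
#include <stdbool.h>
#include <stdint.h>

/* Effective address of "str q0, [base, #offset]". */
uint64_t strq_address(uint64_t base, uint64_t offset)
{
    return base + offset;
}

/* True if addr is a multiple of align (align must be a power of two). */
bool is_aligned(uint64_t addr, uint64_t align)
{
    return (addr & (align - 1)) == 0;
}

/* True if the access [addr, addr + size) lies inside one aligned block. */
bool within_block(uint64_t addr, uint64_t size, uint64_t block)
{
    return (addr / block) == ((addr + size - 1) / block);
}
```

For any 64-byte-aligned base, for instance an assumed 0x80000, the address is base + 0x60. It is 16-byte aligned and the 16-byte store falls entirely within the block starting at base + 0x40, which is consistent with ruling out alignment.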
HCR_EL2.ID and HCR_EL2.CD Impact on NEON Execution and Cache Coherency
The HCR_EL2 register (Hypervisor Configuration Register) is a central control register in the ARMv8-A architecture, governing virtualization behavior for Non-Secure EL0 and EL1. ID (bit 33) and CD (bit 32) do not switch the caches off outright. According to the architecture reference manual, when stage 2 translation is enabled (HCR_EL2.VM == 1) they force stage 2 translations to Normal memory to be Non-cacheable for the EL1&0 translation regime: ID for instruction accesses, CD for data accesses and translation table walks. With both bits set, accesses from NSEL0 and NSEL1 therefore behave as if uncached. While this behavior is well documented, the technical manuals do not describe any effect on NEON instruction execution.
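The bit positions above translate directly into masks. The following C sketch shows one way to test and clear the two bits on a saved HCR_EL2 value. The macro and function names are hypothetical; on the target, the value would be read with MRS and written back with MSR from EL2 or EL3, which this host-side sketch deliberately leaves out.

```c
#include <stdbool.h>
#include <stdint.h>

/* HCR_EL2 bit positions referred to in the text. */
#define HCR_EL2_VM (UINT64_C(1) << 0)   /* stage 2 translation enable */
#define HCR_EL2_CD (UINT64_C(1) << 32)  /* stage 2 data cacheability disable */
#define HCR_EL2_ID (UINT64_C(1) << 33)  /* stage 2 instruction cacheability disable */

/* True if both ID and CD are set in the given HCR_EL2 value. */
bool hcr_id_cd_both_set(uint64_t hcr)
{
    return (hcr & (HCR_EL2_ID | HCR_EL2_CD)) == (HCR_EL2_ID | HCR_EL2_CD);
}

/* True if stage 2 translation is enabled (HCR_EL2.VM == 1). */
bool hcr_stage2_enabled(uint64_t hcr)
{
    return (hcr & HCR_EL2_VM) != 0;
}

/* Return the HCR_EL2 value with ID and CD cleared, other bits unchanged. */
uint64_t hcr_clear_id_cd(uint64_t hcr)
{
    return hcr & ~(HCR_EL2_ID | HCR_EL2_CD);
}
```

Checking hcr_stage2_enabled alongside hcr_id_cd_both_set is worthwhile because the forced Non-cacheable behavior is defined in terms of stage 2 translations.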
One hypothesis is that the NEON load/store path depends on cacheable memory behavior, so that forcing accesses to be Non-cacheable exposes a fault that does not occur in EL3, where the HCR_EL2 settings have no effect. This remains speculative, since nothing in the architecture ties Advanced SIMD loads and stores to cacheable memory. The syndrome value also points elsewhere. ESR 0x82000010 does not indicate an "Illegal Execution State" exception, which would be EC 0x0E. It decodes to EC 0x20, an Instruction Abort from a lower Exception level, with IFSC 0b010000, a synchronous External abort not on a translation table walk. An Instruction Abort is reported for an instruction fetch rather than for the data access of the store, so the memory system's response to non-cacheable instruction fetches deserves as much scrutiny as the NEON unit itself.
Another potential cause is the interaction between the HCR_EL2 settings and the memory system. ARMv8-A defines a weakly ordered memory model rather than a strictly ordered one, and a 128-bit access such as str q0, [x1, #0x60] to Non-cacheable memory may be presented to the interconnect differently from a 64-bit access. If the target memory or bus does not accept that transfer, it can respond with an external abort, which would trigger an exception of the kind observed. This is implementation specific and would need to be confirmed against the documentation for the specific core and interconnect.
Resolving NEON Execution Issues by Adjusting HCR_EL2 Settings and Ensuring Cache Coherency
To address the NEON instruction execution failure in NSEL1, the following troubleshooting steps and solutions are recommended:
- Modify HCR_EL2 Settings: The simplest experiment is to clear HCR_EL2.ID and HCR_EL2.CD, removing the forced Non-cacheable behavior for NSEL0 and NSEL1. HCR_EL2 can only be written from EL2 or EL3, typically during system initialization: read it with MRS, clear bits 33 and 32, write it back, then issue an ISB. For example:
    BIC x0, x0, x2      // x0 = value read from HCR_EL2, x2 = (1 << 33) | (1 << 32)
    MSR HCR_EL2, x0     // Write back with ID and CD cleared, then execute ISB
  If the NEON store then succeeds in NSEL1, the cacheability override is implicated.
- Implement Cache Management Instructions: If the ID and CD bits must remain set, explicit cache maintenance can rule out stale cache contents. For example, DC CIVAC (Data Cache Clean and Invalidate by VA to PoC) can be applied to the target address before executing the NEON instruction. Similarly, IC IALLU (Instruction Cache Invalidate All to PoU) can be used to invalidate the instruction cache. IC IALLU requires EL1 or higher. Follow the maintenance operations with a DSB and an ISB so that they complete before execution continues.
- Use Data Synchronization Barriers: Data Synchronization Barriers (DSBs) and Instruction Synchronization Barriers (ISBs) enforce completion and ordering of memory operations. Placing a DSB before and after the NEON instruction ensures that earlier memory accesses have completed before the store is issued, and that the store has completed before execution continues. For example:
    DSB SY
    STR q0, [x1, #0x60]
    DSB SY
  Barriers do not change the memory type of the access, so treat this as a diagnostic step: if the exception persists with the barriers in place, ordering is unlikely to be the cause.
- Verify Memory Alignment and Permissions: Although the issue does not appear to be related to memory alignment or permissions, it is good practice to verify that the memory regions accessed by NEON instructions are properly aligned and accessible from NSEL1. Because the MMU is disabled here, there are no stage 1 translation tables to inspect. Check instead the stage 2 configuration (if HCR_EL2.VM is set) and any Secure versus Non-secure access restrictions that the memory system applies to the addresses involved.
- Debugging with Exception Registers: The exception syndrome register (ESR) provides valuable information about the cause of the exception. In this case, ESR 0x82000010 decodes to EC 0x20 (Instruction Abort from a lower Exception level), IL = 1, and IFSC 0b010000 (synchronous External abort, not on a translation table walk). It is not an "Illegal Execution State" exception, which would be EC 0x0E. Cross-reference the value with the ESR_ELx description in the ARMv8-A architecture reference manual, and read it together with ELR_ELx and FAR_ELx (valid here because the FnV bit, ESR bit 10, is 0) to identify which access failed.
- Testing with Simplified NEON Instructions: To isolate the issue, test the system with simplified NEON instructions that do not involve memory access. For example, use an arithmetic NEON instruction such as ADD V0.4S, V1.4S, V2.4S to verify that the NEON unit is functioning correctly in NSEL1. If such instructions execute successfully, the issue is likely related to memory access rather than to the NEON unit itself.
- Consult ARM Documentation and Errata: Review the ARM architecture reference manual and any applicable errata for the specific processor core (e.g., Cortex-A55). There may be undocumented behavior or known issues related to NEON execution and HCR_EL2 settings. ARM's technical support team can also provide additional insights and guidance.
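For the exception-register step in the list above, the saved SPSR is worth decoding alongside the ESR. The following C sketch extracts the source Exception level, stack pointer selection, and DAIF mask bits from an AArch64 SPSR_ELx value. The struct and function names are hypothetical, and the EL and stack pointer fields are only meaningful when the execution-state bit (bit 4) reads 0.

```c
#include <stdint.h>

/* Selected fields of an SPSR_ELx value saved on exception entry. */
typedef struct {
    unsigned aarch32;  /* M[4], bit [4]: 0 = taken from AArch64 */
    unsigned el;       /* M[3:2], bits [3:2]: Exception level taken from */
    unsigned sp_elx;   /* M[0], bit [0]: 1 = SP_ELx (h), 0 = SP_EL0 (t) */
    unsigned daif;     /* bits [9:6]: D, A, I, F mask bits */
} spsr_fields;

/* Split a raw SPSR_ELx value into the fields above. */
spsr_fields spsr_decode(uint32_t spsr)
{
    spsr_fields f;
    f.aarch32 = (spsr >> 4) & 0x1u;
    f.el      = (spsr >> 2) & 0x3u;
    f.sp_elx  = spsr & 0x1u;
    f.daif    = (spsr >> 6) & 0xFu;
    return f;
}
```

For the reported SPSR 0x3c9 this gives AArch64, EL2 with SP_ELx (EL2h), and all four DAIF bits set. An exception taken from EL1h with the same mask bits would save 0x3c5 instead, so it is worth confirming which Exception level the failing code was actually running at.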
Working through these steps should narrow down the cause of the NEON instruction execution failure in NSEL1. The key is to understand the interaction between the HCR_EL2 settings, memory cacheability, and the NEON unit, and to confirm each hypothesis against the exception registers before changing the system configuration.