ARM Cortex-A53 TLB Invalidation Limited to Local Cluster

The core issue is the inconsistent behavior of Translation Lookaside Buffer (TLB) invalidation instructions executed on an ARM Cortex-A53 core in a heterogeneous multi-core system, specifically the NXP i.MX8QM platform, which combines a dual-core Cortex-A72 cluster with a quad-core Cortex-A53 cluster. When a Cortex-A53 core running at Exception Level 2 (EL2) executes tlbi vae2is, the invalidation does not reach every core in the system: only the TLBs in the Cortex-A53 cluster are invalidated, and the Cortex-A72 cores can keep using stale entries. Executing the same instruction on a Cortex-A72 core behaves as expected and invalidates the matching entries in both clusters. Dropping the "is" suffix (tlbi vae2) restricts the invalidation to the executing core, which is exactly what the ARM architecture specifies for the non-broadcast form.

The tlbi alle2is instruction, by contrast, works as expected from either cluster, invalidating TLB entries on all cores. This discrepancy suggests either a problem in how the shareability domain is implemented or a hardware bug specific to the Cortex-A53 cluster or the system interconnect. Synchronization barriers are in place to ensure that the Cortex-A72 cores do not access the affected address range until the invalidation has completed, which rules out a simple ordering race; even with these barriers, a tlbi vae2is issued from the Cortex-A53 cluster still fails to invalidate the Cortex-A72 cores' entries.

Hardware Bug in NXP i.MX8QM Cortex-A53 Cluster

The root cause has been identified as a hardware bug documented in NXP's i.MX8QM errata sheet as ERR050104. The erratum covers the unreliable behavior of TLB invalidation instructions issued from the Cortex-A53 cluster: with tlbi vae2is, the invalidation request is not propagated to the Cortex-A72 cluster. Because the same instruction works when issued from the Cortex-A72 cluster, the fault lies in how broadcast TLB maintenance from the Cortex-A53 cluster is carried to the rest of the inner shareability domain, not in the instruction's architectural definition.

The shareability domain determines which cores a broadcast cache or TLB maintenance operation reaches. For TLB invalidation on ARMv8.0 cores such as the Cortex-A53 and Cortex-A72 there are two scopes: without a suffix, the operation affects only the executing core; with the "is" (inner shareable) suffix, it must affect every core in the same inner shareability domain as the executing core. There is no architectural "this cluster only" scope. On the i.MX8QM, the Cortex-A53 and Cortex-A72 clusters are expected to share one inner shareability domain, so an invalidation confined to the Cortex-A53 cluster is precisely the symptom of the hardware bug: the Cortex-A53 cluster does not correctly propagate the request to the Cortex-A72 cluster.
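To make the gap between the expected and the observed reach concrete, the following sketch models which cores each EL2 TLBI affects, expressed as a bitmask. It is a software model only, assuming a hypothetical core numbering (cores 0-3 in the Cortex-A53 cluster, 4-5 in the Cortex-A72 cluster); the names tlbi_reach and model_err050104 are illustrative, and the erratum branch simply encodes the behavior reported above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical core numbering for illustration: 0-3 = Cortex-A53, 4-5 = Cortex-A72. */
#define A53_CLUSTER_MASK 0x0Fu
#define A72_CLUSTER_MASK 0x30u
#define ALL_CORES_MASK   (A53_CLUSTER_MASK | A72_CLUSTER_MASK)

typedef enum { TLBI_VAE2, TLBI_VAE2IS, TLBI_ALLE2IS } tlbi_op;

static bool is_a53_core(unsigned core) { return (A53_CLUSTER_MASK >> core) & 1u; }

/* Returns the set of cores whose TLBs an EL2 TLBI issued on `core` reaches.
 * With model_err050104 == false this is the architectural behavior (both
 * clusters in one Inner Shareable domain); with true it encodes the behavior
 * observed on the i.MX8QM. */
static uint32_t tlbi_reach(unsigned core, tlbi_op op, bool model_err050104) {
    switch (op) {
    case TLBI_VAE2:
        return 1u << core;               /* no IS suffix: executing core only */
    case TLBI_ALLE2IS:
        return ALL_CORES_MASK;           /* observed to reach both clusters from either side */
    case TLBI_VAE2IS:
        if (model_err050104 && is_a53_core(core))
            return A53_CLUSTER_MASK;     /* erratum: broadcast stops at the A53 cluster */
        return ALL_CORES_MASK;           /* whole Inner Shareable domain */
    }
    return 0;
}
```

For example, tlbi_reach(0, TLBI_VAE2IS, true) yields 0x0F, leaving the two Cortex-A72 cores (mask 0x30) with stale entries, while the same call from core 4 yields 0x3F.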

Workarounds and Mitigation Strategies for TLB Invalidation Issues

To address this issue, several workarounds and mitigation strategies can be employed. The most straightforward approach is to avoid using the tlbi vae2is instruction from the Cortex-A53 cluster. Instead, TLB invalidation requests should be initiated from the Cortex-A72 cluster, where the instruction behaves as expected. This ensures that TLB entries are invalidated across all cores in the system. However, this approach may not always be feasible, especially in systems where the Cortex-A53 cluster is responsible for managing memory mappings.
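Routing invalidations to the Cortex-A72 cluster requires software to know which cluster it is running on. One way, sketched here as an assumption rather than as NXP's recommended method, is to decode MIDR_EL1: the Implementer field is bits [31:24] (0x41 for Arm) and the PartNum field is bits [15:4] (0xD03 for Cortex-A53, 0xD08 for Cortex-A72). The helper names are hypothetical, and the register read is compiled only for AArch64.

```c
#include <stdbool.h>
#include <stdint.h>

/* MIDR_EL1 fields: Implementer [31:24], PartNum [15:4]. */
#define MIDR_IMPLEMENTER(midr) (((midr) >> 24) & 0xFFu)
#define MIDR_PARTNUM(midr)     (((midr) >> 4) & 0xFFFu)

#define ARM_IMPLEMENTER    0x41u  /* 'A' = Arm Ltd. */
#define PARTNUM_CORTEX_A53 0xD03u
#define PARTNUM_CORTEX_A72 0xD08u

typedef enum { CORE_OTHER, CORE_CORTEX_A53, CORE_CORTEX_A72 } core_kind;

static core_kind classify_midr(uint64_t midr) {
    if (MIDR_IMPLEMENTER(midr) != ARM_IMPLEMENTER)
        return CORE_OTHER;
    switch (MIDR_PARTNUM(midr)) {
    case PARTNUM_CORTEX_A53: return CORE_CORTEX_A53;
    case PARTNUM_CORTEX_A72: return CORE_CORTEX_A72;
    default:                 return CORE_OTHER;
    }
}

/* Hypothetical policy for this workaround: only an A72 core issues
 * TLBI VAE2IS itself; an A53 core must delegate or use another workaround. */
static bool can_issue_vae2is_locally(uint64_t midr) {
    return classify_midr(midr) == CORE_CORTEX_A72;
}

#if defined(__aarch64__)
/* Reads this core's MIDR_EL1; intended for code running at EL1 or EL2. */
static inline uint64_t read_midr_el1(void) {
    uint64_t v;
    __asm__ volatile("mrs %0, midr_el1" : "=r"(v));
    return v;
}
#endif
```

Decoding the part number rather than hard-coding CPU indices keeps the check independent of how the boot firmware numbers the cores.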

An alternative workaround is to use the tlbi alle2is instruction, which invalidates all TLB entries for the EL2 translation regime. It is far less granular than tlbi vae2is, but it reaches all cores regardless of the cluster it is executed from. To limit the cost of discarding every EL2 entry, page-table updates can be batched so that a single tlbi alle2is covers many changes instead of one invalidation per modified mapping.
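A hypothetical policy combining the two workarounds might look like the following: a Cortex-A53 core never relies on tlbi vae2is and falls back to tlbi alle2is, while a Cortex-A72 core invalidates page by page unless the range is large enough that one full EL2 invalidation is cheaper. The threshold max_per_page_ops is an assumed tuning parameter, 4 KiB pages and an exclusive end address are assumed, and the function only chooses a plan; issuing the instructions is left to the caller.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    TLBI_PLAN_PER_PAGE_VAE2IS, /* one TLBI VAE2IS per 4 KiB page */
    TLBI_PLAN_ALLE2IS          /* one TLBI ALLE2IS for the whole EL2 regime */
} tlbi_plan;

/* Chooses how to invalidate the EL2 mappings for [start, end). */
static tlbi_plan choose_tlbi_plan(bool running_on_a53, uint64_t start, uint64_t end,
                                  uint64_t max_per_page_ops) {
    if (running_on_a53)
        return TLBI_PLAN_ALLE2IS;  /* VAE2IS would not reach the A72 cluster */
    /* Number of 4 KiB pages touched by the range. */
    uint64_t pages = ((end + 0xFFFu) >> 12) - (start >> 12);
    return pages > max_per_page_ops ? TLBI_PLAN_ALLE2IS : TLBI_PLAN_PER_PAGE_VAE2IS;
}
```

The size cut-off mirrors a common trade-off: past some number of pages, a loop of per-page invalidations costs more than refilling the TLB after one full invalidation.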

For systems where granular TLB invalidation is critical, a software-based solution can be implemented using inter-processor interrupts (IPIs) to coordinate TLB invalidation across clusters. When a Cortex-A53 core needs to invalidate a TLB entry, it sends an IPI to a Cortex-A72 core, which executes tlbi vae2is on its behalf and then signals completion; the Cortex-A53 core must wait for that acknowledgment before reusing the memory or relying on the new mapping. Because the instruction is issued from the Cortex-A72 cluster, the invalidation is propagated to all cores in the system. The IPI round trip adds latency, but it provides a reliable mechanism for maintaining TLB coherency in the presence of the hardware bug.
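The IPI handshake can be sketched as a shared mailbox with a request counter and a completion counter. This is a minimal illustration, assuming one outstanding request at a time and leaving out the platform-specific IPI send (for example, a GIC software-generated interrupt); struct tlbi_mailbox and its functions are hypothetical names. The requester is also assumed to have completed its page-table writes with a DSB before posting, since the architecture requires a DSB to make translation-table updates visible to table walks.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mailbox shared by an A53 requester and an A72 helper core.
 * Assumes one outstanding request at a time (serialize requesters with a lock). */
struct tlbi_mailbox {
    uint64_t va;               /* written by the requester before publishing req_seq */
    _Atomic uint64_t req_seq;  /* incremented by the requester for each new request */
    _Atomic uint64_t done_seq; /* set by the helper once its TLBI has completed */
};

typedef void (*tlbi_fn)(uint64_t va);

/* A53 side: publish the request and return a ticket to wait on.
 * The caller then sends the IPI to the helper core (platform-specific, not shown). */
static uint64_t tlbi_mailbox_post(struct tlbi_mailbox *mb, uint64_t va) {
    mb->va = va;
    uint64_t ticket = atomic_load_explicit(&mb->req_seq, memory_order_relaxed) + 1;
    atomic_store_explicit(&mb->req_seq, ticket, memory_order_release);
    return ticket;
}

/* A72 side, e.g. called from the IPI handler: perform the pending request, if any. */
static bool tlbi_mailbox_service(struct tlbi_mailbox *mb, tlbi_fn do_tlbi) {
    uint64_t ticket = atomic_load_explicit(&mb->req_seq, memory_order_acquire);
    if (ticket == atomic_load_explicit(&mb->done_seq, memory_order_relaxed))
        return false;          /* nothing pending */
    do_tlbi(mb->va);           /* on hardware: DSB ISHST; TLBI VAE2IS; DSB ISH */
    atomic_store_explicit(&mb->done_seq, ticket, memory_order_release);
    return true;
}

/* A53 side: poll until the helper reports completion of `ticket`. */
static bool tlbi_mailbox_done(struct tlbi_mailbox *mb, uint64_t ticket) {
    return atomic_load_explicit(&mb->done_seq, memory_order_acquire) >= ticket;
}

/* Host-side stand-in for the real TLBI so the handshake can be exercised off-target. */
static uint64_t last_invalidated_va;
static void record_tlbi(uint64_t va) { last_invalidated_va = va; }
```

In use, the Cortex-A53 core calls tlbi_mailbox_post, raises the IPI, and spins on tlbi_mailbox_done; the Cortex-A72 interrupt handler calls tlbi_mailbox_service with a function that issues the real barrier and TLBI sequence.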

In addition to these workarounds, the synchronization barriers around every TLB invalidation must be correct. The ARM architecture uses Data Synchronization Barriers (DSBs) and Instruction Synchronization Barriers (ISBs) for this: a DSB before the TLBI makes the preceding page-table writes visible, a DSB after it waits until the invalidation has completed on every core it reaches, and an ISB resynchronizes the executing core's instruction stream. On the i.MX8QM, however, the barriers only wait for the invalidation that was actually broadcast; they cannot extend a tlbi vae2is issued from the Cortex-A53 cluster to the Cortex-A72 cores, so they must be combined with one of the workarounds above rather than relied on alone.
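A sketch of the barrier sequence around a single-page tlbi vae2is follows, assuming a non-VHE hypervisor (HCR_EL2.E2H == 0) and 4 KiB pages. The register operand carries VA[55:12] in bits [43:0]; with E2H == 0 the upper bits are RES0, and the ARMv8.0 Cortex-A53 and Cortex-A72 have no TTL field. The function names are hypothetical, the instruction sequence is compiled only for AArch64 and must run at EL2, and on the i.MX8QM it should still be issued from a Cortex-A72 core, because the barriers do not fix the erratum.

```c
#include <stdint.h>

/* TLBI VAE2/VAE2IS operand for a non-VHE EL2 (E2H == 0): VA[55:12] in bits [43:0]. */
static inline uint64_t tlbi_vae2_operand(uint64_t va) {
    return (va >> 12) & ((1ULL << 44) - 1);
}

#if defined(__aarch64__)
/* Must execute at EL2. On the i.MX8QM, call it from a Cortex-A72 core. */
static inline void tlbi_vae2is_sync(uint64_t va) {
    uint64_t op = tlbi_vae2_operand(va);
    __asm__ volatile("dsb ishst" ::: "memory");                 /* page-table writes complete before the TLBI */
    __asm__ volatile("tlbi vae2is, %0" :: "r"(op) : "memory"); /* broadcast invalidate by VA, EL2 regime */
    __asm__ volatile("dsb ish" ::: "memory");                   /* wait until the invalidation has completed */
    __asm__ volatile("isb" ::: "memory");                       /* later instructions see the new translations */
}
#endif
```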

Finally, system designers should consult NXP's errata sheet for the i.MX8QM (ERR050104) and apply the workaround it recommends. NXP may provide firmware or software updates that mitigate the impact of the hardware bug, so it is worth checking whether the BSP, kernel, or hypervisor in use already includes one.

In conclusion, the inconsistent TLB invalidation behavior observed on the NXP i.MX8QM platform is caused by a hardware bug in the Cortex-A53 cluster. By understanding the limitations imposed by this bug and implementing appropriate workarounds, system designers can ensure reliable TLB coherency across all cores in the system. Whether through careful instruction selection, software-based coordination, or firmware updates, addressing this issue requires a thorough understanding of the ARM architecture and the specific implementation details of the i.MX8QM platform.
