Cortex-A9 MMU TEX Bit Configuration and L2 Cache Bypass Challenges
The Cortex-A9 processor, commonly used in heterogeneous multi-core systems, presents particular challenges when the Memory Management Unit (MMU) must be configured for both cache coherency and memory isolation. In systems where one core runs a full operating system such as Linux with access to both the L1 and L2 caches, and another core runs a real-time operating system such as FreeRTOS with access only to its L1 cache, the MMU configuration becomes critical. The primary issue revolves around the TEX (Type Extension) bits in the MMU translation table entries. Together with the C (Cacheable) and B (Bufferable) bits, the TEX[2:0] bits select the memory type (Strongly-ordered, Device, or Normal) and the cacheability of Normal memory, while the shareability of Normal memory is controlled by the separate S bit. With TEX remap disabled and TEX[2] set to 1, TEX[1:0] selects the outer (L2) cache policy and the C and B bits select the inner (L1) cache policy.
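To make the encoding concrete, the following is a minimal sketch of how a TEX/C/B combination could be decoded into inner and outer cache policies. It assumes the ARMv7-A short-descriptor format with TEX remap disabled; the type and function names (`cache_policy_t`, `mem_policy_t`, `decode_tex_cb`) are hypothetical and introduced only for illustration.

```c
/* Cache policy encodings used by the ARMv7-A short-descriptor format
 * when TEX[2] == 1 and TEX remap is disabled. */
typedef enum {
    POLICY_NON_CACHEABLE = 0, /* 0b00 */
    POLICY_WB_WA         = 1, /* 0b01: write-back, write-allocate */
    POLICY_WT_NO_WA      = 2, /* 0b10: write-through, no write-allocate */
    POLICY_WB_NO_WA      = 3  /* 0b11: write-back, no write-allocate */
} cache_policy_t;

/* Hypothetical helper type: one policy per cache level. */
typedef struct {
    cache_policy_t inner; /* applies to the L1 cache */
    cache_policy_t outer; /* exported to the L2 cache controller */
} mem_policy_t;

/* Decode TEX[2:0], C and B into inner/outer policies.
 * Returns 0 on success, or -1 if TEX[2] is clear, in which case the
 * entry uses a different encoding that this sketch does not cover. */
int decode_tex_cb(unsigned tex, unsigned c, unsigned b, mem_policy_t *out)
{
    if ((tex & 0x4u) == 0u) {
        return -1;
    }
    out->outer = (cache_policy_t)(tex & 0x3u);                 /* TEX[1:0] */
    out->inner = (cache_policy_t)(((c & 1u) << 1) | (b & 1u)); /* {C, B}   */
    return 0;
}
```

Under these assumptions, TEX[2:0] = 0b100 with C = 1 and B = 1 decodes to an outer policy of non-cacheable and an inner policy of write-back without write-allocate.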
In the described scenario, CPU0 (running Linux) has access to both the L1 and L2 caches, while CPU1 (running FreeRTOS) is intended to operate with only its L1 cache, bypassing the L2 cache entirely. The goal is to keep CPU1 isolated from the cache coherency mechanisms and to treat DDR memory as the single source of truth for shared data. The TEX[2:0] setting of 0b100, combined with C=1 and B=1, is intended to limit caching to L1 only. Because the L2 cache is physically present and has been enabled by CPU0, however, it is not obvious whether CPU1's memory accesses truly bypass the L2 cache or inadvertently interact with it.
Misconfigured TEX Bits and SMP Coherency Settings
The root cause of the issue lies in the interplay between the TEX bit configuration, the SMP (Symmetric Multiprocessing) bit in the ACTLR (Auxiliary Control Register), and the behavior of the L2 cache controller. The TEX[2:0] setting of 0b100 with C=1 and B=1 marks a region as Normal memory that is write-back cacheable in the inner (L1) cache and non-cacheable in the outer (L2) cache. This configuration assumes that the L2 cache controller honors the outer attributes it receives and does not allocate any lines for CPU1. In practice, the L2 cache controller may still cache data for CPU1 if the SMP bit in the ACTLR is set, which indicates that the core is part of the coherency domain.
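As an illustration of where these bits sit, here is a minimal sketch that builds a 1 MB section entry with TEX[2:0] = 0b100, C = 1 and B = 1. It assumes the ARMv7-A short-descriptor section format, full read/write access permissions, and a non-shareable mapping; the macro names and `make_l1_only_section` are hypothetical.

```c
#include <stdint.h>

/* Field positions in an ARMv7-A short-descriptor section entry. */
#define SECT_TYPE       0x2u                          /* bits [1:0] = 0b10 */
#define SECT_B          (1u << 2)                     /* B bit             */
#define SECT_C          (1u << 3)                     /* C bit             */
#define SECT_DOMAIN(d)  (((uint32_t)(d) & 0xFu) << 5) /* bits [8:5]        */
#define SECT_AP_RW      (0x3u << 10)                  /* AP[1:0] = 0b11    */
#define SECT_TEX(t)     (((uint32_t)(t) & 0x7u) << 12)/* bits [14:12]      */
#define SECT_BASE_MASK  0xFFF00000u                   /* 1 MB aligned base */

/* Build a section entry that is cacheable in L1 (inner write-back,
 * no write-allocate) and non-cacheable in L2 (outer non-cacheable).
 * The S bit is left clear, so the region is non-shareable. */
uint32_t make_l1_only_section(uint32_t phys_addr, unsigned domain)
{
    return (phys_addr & SECT_BASE_MASK)
         | SECT_TEX(0x4u)
         | SECT_AP_RW
         | SECT_DOMAIN(domain)
         | SECT_C
         | SECT_B
         | SECT_TYPE;
}
```

For a section at physical address 0x10000000 in domain 0, this yields 0x10004C0E. On a Xilinx standalone BSP, the attribute portion of such a value would typically be handed to a routine such as `Xil_SetTlbAttributes`; that usage is an assumption and should be checked against the BSP headers in use.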
The SMP bit, when set, makes the core participate in the cache coherency mechanisms that link the L1 caches of the cores and the shared L2 cache. If the SMP bit is not explicitly cleared for CPU1, the L2 cache controller may treat CPU1's memory accesses as cacheable even if the TEX bits suggest otherwise. The situation is further complicated by the fact that the Linux kernel running on CPU0 may have enabled SMP support, which could inadvertently affect CPU1's cache behavior. Additionally, the L2 cache controller's support for cache lockdown (a feature that allows specific cache ways to be allocated exclusively to one CPU) does not inherently prevent other CPUs from accessing those cache lines, which can lead to coherency issues.
Correcting TEX Bit Configuration and Disabling SMP Coherency
To resolve the issue, a multi-step approach is required so that CPU1's memory accesses bypass the L2 cache and remain isolated from the cache coherency mechanisms. First, the TEX[2:0] bits must be configured to enforce L1-only caching for CPU1. The recommended setting is TEX[2:0]=0b100 with C=1 and B=1, which in principle limits caching to L1. This setting alone is insufficient if the SMP bit in the ACTLR is set, so the SMP bit must also be explicitly cleared on CPU1 to disable its participation in cache coherency. On the Cortex-A9 the SMP bit is bit 6 of the ACTLR, which is accessed through CP15 (c1, c0, opcode 2 = 1), and it can be cleared during CPU1's initialization sequence.
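The ACTLR update can be sketched as follows. This is a minimal example rather than a complete initialization sequence: it assumes GCC-style inline assembly, a privileged mode that is permitted to write the ACTLR, and that it runs on CPU1 before CPU1's caches are enabled. The function names are hypothetical, and the hardware access is guarded so that the bit manipulation can also be compiled and checked on a host.

```c
#include <stdint.h>

#define ACTLR_SMP (1u << 6) /* Cortex-A9 ACTLR bit 6: SMP */

/* Pure helper: return the ACTLR value with the SMP bit cleared and
 * every other bit left unchanged. */
uint32_t actlr_without_smp(uint32_t actlr)
{
    return actlr & ~ACTLR_SMP;
}

#if defined(__arm__) && defined(__GNUC__)
/* Read-modify-write of the ACTLR (CP15 c1, c0, opcode 2 = 1).
 * Intended to be called on CPU1 only, early in its startup code. */
void cpu1_leave_coherency(void)
{
    uint32_t actlr;

    __asm__ volatile("mrc p15, 0, %0, c1, c0, 1" : "=r"(actlr));
    actlr = actlr_without_smp(actlr);
    __asm__ volatile("mcr p15, 0, %0, c1, c0, 1" : : "r"(actlr) : "memory");
    __asm__ volatile("dsb" : : : "memory"); /* complete the write       */
    __asm__ volatile("isb" : : : "memory"); /* resynchronize the core   */
}
#endif
```

Keeping the bit manipulation in a separate helper makes it easy to verify that only the SMP bit changes, since the other ACTLR bits control unrelated features.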
Next, the Linux kernel running on CPU0 must be configured to avoid enabling SMP support if it is not required. This involves setting CONFIG_SMP=n in the kernel configuration so that CPU0 does not inadvertently enable coherency mechanisms that could affect CPU1. Additionally, the L2 cache controller's behavior must be examined carefully to confirm that it respects the TEX bit settings and does not cache data for CPU1. If the L2 cache controller supports cache lockdown, this feature can be used to reserve specific cache ways for CPU0 and to prevent CPU1 from allocating into them. Lockdown governs allocation only, so it does not by itself stop CPU1 from hitting on lines that are already in the cache, and it requires careful configuration to avoid performance degradation on CPU0.
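The way-reservation idea can be sketched as a mask calculation. The example below assumes an ARM L2C-310 (PL310) style controller with 8 ways and lockdown-by-master registers in which a set bit means that no allocation occurs in the corresponding way. The register offsets, the way count, and the mapping of master numbers to CPUs are all assumptions to be checked against the controller's reference manual, and `lockdown_mask` is a hypothetical helper.

```c
#include <stdint.h>

#define L2_NUM_WAYS 8u /* assumption: 8-way L2 cache */

/* Assumed L2C-310 lockdown-by-master register offsets for master n. */
#define L2_D_LOCKDOWN(n) (0x900u + 8u * (uint32_t)(n)) /* data side        */
#define L2_I_LOCKDOWN(n) (0x904u + 8u * (uint32_t)(n)) /* instruction side */

/* Convert "ways this master may allocate into" (one bit per way) into
 * a lockdown register value, where a set bit blocks allocation. */
uint32_t lockdown_mask(uint32_t allowed_ways)
{
    uint32_t all_ways = (1u << L2_NUM_WAYS) - 1u;
    return ~allowed_ways & all_ways;
}
```

With this convention, `lockdown_mask(0xFF)` gives 0x00 for a master that may use every way, and `lockdown_mask(0x00)` gives 0xFF for a master that should not allocate at all. Writing these values to the lockdown registers would be done by whichever software owns the L2 controller, which in the scenario described here is CPU0.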
Finally, cache maintenance operations must be performed explicitly to keep CPU1's L1 cache and DDR memory consistent. Since CPU1 is not participating in cache coherency, any shared data it produces must be flushed from its L1 cache to DDR memory before CPU0 is notified that the data is available. Conversely, CPU1 must invalidate the corresponding L1 cache lines before reading data that CPU0 has written, otherwise it may read stale copies. This can be achieved using the cache maintenance functions provided by the Xilinx framework, taking care that only the L1 cache is flushed and the L2 cache is left untouched. By combining these steps, the system can achieve the desired isolation of CPU1 from the cache coherency mechanisms and operate reliably in a heterogeneous multi-core environment.
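Because cache maintenance acts on whole cache lines, a shared buffer is best described as a line-aligned range before it is flushed or invalidated. The sketch below assumes the 32-byte L1 line size of the Cortex-A9; `cache_range_t`, `align_to_cache_lines`, `publish_to_cpu0`, and the `USE_XILINX_BSP` guard are hypothetical names. The guarded part assumes the Xilinx standalone BSP header `xil_cache_l.h` and its L1-only routine `Xil_L1DCacheFlushRange`, which should be confirmed against the BSP version in use.

```c
#include <stdint.h>

#define L1_LINE_SIZE 32u /* Cortex-A9 L1 data cache line size in bytes */

typedef struct {
    uint32_t start; /* address rounded down to a line boundary */
    uint32_t len;   /* length rounded up to whole lines        */
} cache_range_t;

/* Expand [addr, addr + len) to whole cache lines. An empty input
 * yields an empty range. Address wrap-around is not handled. */
cache_range_t align_to_cache_lines(uint32_t addr, uint32_t len)
{
    cache_range_t r;
    uint32_t mask = L1_LINE_SIZE - 1u;

    r.start = addr & ~mask;
    r.len = (len == 0u) ? 0u : (((addr + len + mask) & ~mask) - r.start);
    return r;
}

#ifdef USE_XILINX_BSP
#include "xil_cache_l.h"

/* Write CPU1's data back from L1 to DDR before signalling CPU0.
 * Only the L1 routine is called, so the L2 cache is not touched. */
void publish_to_cpu0(uint32_t addr, uint32_t len)
{
    cache_range_t r = align_to_cache_lines(addr, len);
    Xil_L1DCacheFlushRange(r.start, r.len);
}
#endif
```

Placing shared buffers on 32-byte boundaries and sizing them in multiples of 32 bytes avoids the case where an invalidate on a partially shared line discards unrelated private data of CPU1.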