Cache Contents Mismatch with Main TLB After Enabling MMU and Configuring Memory Attributes
The issue is a mismatch between the contents of the L1 data cache and the main TLB after enabling the MMU and configuring memory attributes for a specific SRAM region (0x30000000-0x30200000). The user observed that the L1 data cache contains only addresses related to the page table (0xFXXXXXXX), while the main TLB contains entries for both the page table and the SRAM region. This raises two questions: (1) Is data from the SRAM region (0x30000000) not being cached? (2) If so, what is the root cause?
The user followed a sequence of steps to set up the system:
- Disabled the MMU and caches by clearing the I, C, and M bits in the SCTLR_EL3 register.
- Set up the page table, marking the entire address space as Device-nGnRnE.
- Enabled the MMU and caches by setting the I, C, and M bits in the SCTLR_EL3 register.
- Modified the memory attributes of the SRAM region (0x30000000-0x30200000) to Normal, Inner Shareable, Write-Through, and Read/Write Allocate, followed by TLB invalidation.
- Performed multiple read and write operations to the SRAM region.
Despite these steps, the L1 data cache did not reflect the SRAM region’s addresses, while the main TLB did. This suggests a potential issue with cacheability settings, TLB maintenance, or memory attribute configuration.
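The disable and enable steps above amount to clearing and setting three bits of SCTLR_EL3. The following is a minimal sketch of that bit manipulation only; the helper names are hypothetical, and on real hardware the value would be read with `mrs x0, sctlr_el3`, written back with `msr sctlr_el3, x0`, and followed by an `isb`.

```c
#include <stdint.h>

/* SCTLR_ELx control bits used in the steps above (ARMv8-A bit positions). */
#define SCTLR_M (UINT64_C(1) << 0)   /* MMU enable */
#define SCTLR_C (UINT64_C(1) << 2)   /* data and unified cache enable */
#define SCTLR_I (UINT64_C(1) << 12)  /* instruction cache enable */

/* Hypothetical helper: SCTLR value with the MMU and caches turned off. */
uint64_t sctlr_disable_mmu_caches(uint64_t sctlr)
{
    return sctlr & ~(SCTLR_M | SCTLR_C | SCTLR_I);
}

/* Hypothetical helper: SCTLR value with the MMU and caches turned on. */
uint64_t sctlr_enable_mmu_caches(uint64_t sctlr)
{
    return sctlr | SCTLR_M | SCTLR_C | SCTLR_I;
}
```

Note that setting the C bit only permits caching. Whether a given access actually allocates in the L1 data cache still depends on the memory attributes of the page it hits.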
Improper Memory Attribute Configuration and TLB Maintenance
The most likely root cause is a combination of unsuitable memory attributes and incomplete TLB maintenance. The following factors contribute to the observed behavior:
- Initial Page Table Configuration: The initial page table marked the entire address space as Device-nGnRnE, including the regions that hold code. Device memory is never cached and is intended for memory-mapped peripherals, not for instruction fetch. With every region mapped this way, instructions are fetched from Device memory, which the architecture does not guarantee to work.
- Write-Through Cache Policy: The user configured the SRAM region as Write-Through (WT). Many ARM cores do not implement Write-Through caching in the L1 data cache and treat Write-Through regions as Non-cacheable. On such a core, accesses to the SRAM region bypass the L1 data cache, which explains why no lines from 0x30000000 appear in it even though the TLB holds a valid translation for the region.
- TLB Invalidation Timing: Although the user performed TLB invalidation after modifying the memory attributes, the timing and scope of the invalidation might not have been sufficient. Changing the memory type or cacheability of a live mapping requires a break-before-make sequence, and each step must be separated by the barriers that the ARM Architecture Reference Manual specifies. Any deviation from these sequences can leave stale translations in use by the MMU.
- Cacheability and Shareability Settings: The memory attributes were set to Inner Shareable, which is correct for memory shared between cores. However, the combination of Inner Shareable with Write-Through might not be handled consistently across all ARM cores. In addition, the outer cacheability setting was not stated. If it differs from the inner setting, the inner and outer caches can behave differently for the same region.
- Execution from Device Memory: Instruction fetch from a Device region that is not marked execute-never is CONSTRAINED UNPREDICTABLE in ARMv8-A: an implementation may treat the fetch as a Normal Non-cacheable access or may raise a Permission fault. The initial page table configuration did not mark the execution region as Normal memory, which is a critical oversight.
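The attribute choices discussed above end up as 8-bit fields in MAIR_EL3, with the outer policy in bits [7:4] and the inner policy in bits [3:0]. The sketch below encodes the attributes mentioned in this section and checks whether a field uses Write-Through or is Write-Back at both levels. The macro and function names are illustrative, not taken from the user's code, and the checks ignore encodings that the architecture marks as UNPREDICTABLE.

```c
#include <stdbool.h>
#include <stdint.h>

/* MAIR_ELx attribute bytes: bits [7:4] are outer, bits [3:0] are inner. */
#define MAIR_DEVICE_nGnRnE 0x00u /* Device-nGnRnE */
#define MAIR_NORMAL_NC     0x44u /* Normal, Inner/Outer Non-cacheable */
#define MAIR_NORMAL_WT_RWA 0xBBu /* Normal, Write-Through, Read/Write-Allocate */
#define MAIR_NORMAL_WB_RWA 0xFFu /* Normal, Write-Back, Read/Write-Allocate */

/* An outer nibble of 0b0000 selects Device memory; anything else is Normal. */
bool mair_is_normal(uint8_t attr)
{
    return (attr >> 4) != 0;
}

/* Write-Through: 0b10RW (non-transient) or 0b00RW with RW != 00 (transient). */
static bool nibble_is_wt(uint8_t nib)
{
    return (nib & 0xCu) == 0x8u || ((nib & 0xCu) == 0x0u && (nib & 0x3u) != 0);
}

/* Write-Back: 0b11RW (non-transient) or 0b01RW with RW != 00 (transient). */
static bool nibble_is_wb(uint8_t nib)
{
    return (nib & 0xCu) == 0xCu || ((nib & 0xCu) == 0x4u && (nib & 0x3u) != 0);
}

/* True if either level of a Normal attribute is Write-Through. */
bool mair_uses_write_through(uint8_t attr)
{
    return mair_is_normal(attr) &&
           (nibble_is_wt(attr >> 4) || nibble_is_wt(attr & 0xFu));
}

/* True if both inner and outer levels of a Normal attribute are Write-Back. */
bool mair_is_fully_write_back(uint8_t attr)
{
    return mair_is_normal(attr) &&
           nibble_is_wb(attr >> 4) && nibble_is_wb(attr & 0xFu);
}

/* Place an 8-bit attribute into slot idx (0..7) of a MAIR value. */
uint64_t mair_set(uint64_t mair, unsigned idx, uint8_t attr)
{
    unsigned shift = (idx & 7u) * 8u;
    return (mair & ~(UINT64_C(0xFF) << shift)) | ((uint64_t)attr << shift);
}
```

A check such as `mair_uses_write_through()` on the value programmed for the SRAM region is a quick way to spot the configuration that many cores silently downgrade to Non-cacheable.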
Correcting Memory Attributes, Cache Policies, and TLB Maintenance
To resolve the issue, the following steps should be taken:
- Reconfigure the Initial Page Table: The initial page table should mark the execution region as Normal memory. This ensures that the processor can fetch and execute instructions without encountering undefined behavior. Device memory regions should be marked as non-executable (XN=1) to prevent speculative instruction fetches.
- Use Write-Back Cache Policy: Replace the Write-Through (WT) policy with Write-Back (WB) for the SRAM region. Write-Back is widely supported and allows the region to allocate in the L1 data cache. Check the processor’s Technical Reference Manual (TRM) to confirm which cache policies the core implements and how it treats the ones it does not.
- Ensure Proper TLB Maintenance: Follow the TLB maintenance sequences outlined in the ARM Architecture Reference Manual (e.g., section K13.5.3). After modifying the translation table entries, issue a DSB, invalidate the TLB for the translation regime in use (EL3 in this setup), then issue another DSB and an ISB so that the MMU uses the updated attributes.
- Validate Cacheability Settings: Ensure that the inner and outer cacheability settings are consistent. For example, if the inner cacheability is set to Write-Back, the outer cacheability should also be configured appropriately. Mismatched settings can lead to unexpected behavior.
- Test with Updated Configuration: After applying the above changes, repeat the read and write operations to the SRAM region. Verify that the L1 data cache now contains the SRAM region’s addresses and that the main TLB reflects the correct translations.
- Monitor Performance and Behavior: Use performance monitoring tools to observe the cache behavior and ensure that the SRAM region is being cached as expected. Check for any anomalies in the TLB and cache contents.
By addressing these factors, the system should exhibit the expected behavior, with the SRAM region’s data being cached and the TLB reflecting the correct translations. This approach not only resolves the immediate issue but also establishes best practices for configuring memory attributes and maintaining cache and TLB coherency in ARM-based systems.
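As an illustration of the corrected mapping, the sketch below builds a translation table entry for the SRAM region. It rests on several assumptions that are not stated in the original report: a 4 KiB granule, so that the 2 MiB region 0x30000000-0x30200000 fits exactly one level-2 block; a MAIR_EL3 index (1 here) that holds a Write-Back attribute; and the EL3 translation regime, where AP[1] is treated as one. The macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define DESC_BLOCK        UINT64_C(0x1)                /* bits [1:0] = 0b01: block entry */
#define DESC_ATTRINDX(i)  ((uint64_t)((i) & 7u) << 2)  /* index into MAIR_ELx */
#define DESC_AP1_RES1     (UINT64_C(1) << 6)           /* AP[1], treated as 1 at EL3 */
#define DESC_SH_INNER     (UINT64_C(3) << 8)           /* SH = 0b11: Inner Shareable */
#define DESC_AF           (UINT64_C(1) << 10)          /* Access Flag, avoids AF faults */
#define DESC_XN           (UINT64_C(1) << 54)          /* execute-never */
#define L2_BLOCK_SIZE     UINT64_C(0x200000)           /* 2 MiB with a 4 KiB granule */

/*
 * Build a read/write, Inner Shareable level-2 block descriptor.
 * attr_index selects the MAIR slot; execute_never sets XN for data-only regions.
 */
uint64_t make_l2_block(uint64_t phys, unsigned attr_index, int execute_never)
{
    assert((phys % L2_BLOCK_SIZE) == 0);   /* block output address must be 2 MiB aligned */

    uint64_t desc = phys | DESC_BLOCK | DESC_ATTRINDX(attr_index)
                  | DESC_AP1_RES1 | DESC_SH_INNER | DESC_AF;
    if (execute_never)
        desc |= DESC_XN;
    return desc;
}
```

For example, `make_l2_block(0x30000000, 1, 1)` yields 0x0040000030000745 under these assumptions. Marking the SRAM block execute-never is a choice made for this sketch on the assumption that the region holds only data.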
Detailed Explanation of Key Concepts
Memory Attributes and Cache Policies
Memory attributes define how the processor interacts with different memory regions. The key attributes include:
- Normal Memory: Used for typical RAM and ROM regions. Supports caching and speculative accesses.
- Device Memory: Used for memory-mapped I/O. Never cached, and not subject to speculative data accesses.
- Inner and Outer Cacheability: Defines how data is cached at different levels of the memory hierarchy.
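- Shareability: Defines the set of observers (Non-shareable, Inner Shareable, or Outer Shareable) across which hardware keeps accesses to the region coherent, for example multiple cores or clusters.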
Cache policies, such as Write-Through and Write-Back, control how data is written to memory:
- Write-Through (WT): A write that hits in the cache updates the cache line and is also propagated to the next level of memory. Memory is always up to date, at the cost of more write traffic.
- Write-Back (WB): Data is written only to the cache initially. The write to main memory is deferred until the cache line is evicted. This improves performance but requires careful management to maintain coherency.
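The difference between the two policies can be seen in a toy model of a single cache line. This is not how any ARM cache is implemented; it only counts how many writes reach backing memory under each policy, and all names are invented for the illustration.

```c
#include <stdbool.h>
#include <stdint.h>

enum toy_policy { TOY_WRITE_THROUGH, TOY_WRITE_BACK };

struct toy_line {
    enum toy_policy policy;
    bool dirty;           /* line holds data newer than memory */
    uint32_t cached;      /* value held in the cache line */
    uint32_t memory;      /* value held in backing memory */
    unsigned mem_writes;  /* number of writes that reached memory */
};

/* Store to the line: Write-Through updates memory now, Write-Back defers. */
void toy_write(struct toy_line *line, uint32_t value)
{
    line->cached = value;
    if (line->policy == TOY_WRITE_THROUGH) {
        line->memory = value;
        line->mem_writes++;
    } else {
        line->dirty = true;
    }
}

/* Evict the line: only a dirty Write-Back line needs to be written out. */
void toy_evict(struct toy_line *line)
{
    if (line->dirty) {
        line->memory = line->cached;
        line->mem_writes++;
        line->dirty = false;
    }
}

/* Perform n stores to the same line, evict it, and report the memory writes. */
unsigned toy_memory_writes(enum toy_policy policy, unsigned n)
{
    struct toy_line line = { .policy = policy };
    for (unsigned i = 0; i < n; i++)
        toy_write(&line, i);
    toy_evict(&line);
    return line.mem_writes;
}
```

Four stores to the same line cost four memory writes under Write-Through and one under Write-Back, which is why Write-Back needs explicit cache maintenance when another agent reads the memory directly.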
TLB Maintenance
The Translation Lookaside Buffer (TLB) caches virtual-to-physical address translations. When the page table is modified, the TLB must be invalidated to ensure that the MMU uses the updated translations. The ARM architecture provides specific instructions for TLB maintenance, such as:
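- TLBI ALLE3: Invalidate all TLB entries for the EL3 translation regime, which is the regime in use when the MMU is controlled through SCTLR_EL3. TLBI ALLE1 is the equivalent for the EL1&0 regime and does not remove EL3 entries.
- DSB ISH: Data Synchronization Barrier that waits for the TLB invalidation to complete. Follow it with an ISB so that subsequent instructions use the updated translations.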
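The ordering matters as much as the instructions themselves. The sketch below models a break-before-make update of a single descriptor and records each step in a trace so that the order can be inspected on any host. The barrier and invalidation steps are only recorded, not executed; the comments name the instruction each one stands for on hardware, and the EL3 form of the TLBI is an assumption based on the SCTLR_EL3 setup described earlier. All type and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

enum mmu_op { OP_WRITE_INVALID, OP_DSB, OP_TLBI, OP_WRITE_NEW, OP_ISB };

struct op_trace {
    enum mmu_op ops[8];
    size_t count;
};

/* Append one step to the trace, ignoring steps beyond its capacity. */
static void record(struct op_trace *trace, enum mmu_op op)
{
    if (trace->count < sizeof trace->ops / sizeof trace->ops[0])
        trace->ops[trace->count++] = op;
}

/* Break-before-make update of one descriptor, recording every step. */
void remap_entry(volatile uint64_t *entry, uint64_t new_desc,
                 struct op_trace *trace)
{
    *entry = 0;                       /* break: install an invalid entry */
    record(trace, OP_WRITE_INVALID);
    record(trace, OP_DSB);            /* hardware: dsb ishst */
    record(trace, OP_TLBI);           /* hardware: tlbi alle3 (EL3 regime) */
    record(trace, OP_DSB);            /* hardware: dsb ish, wait for the TLBI */
    *entry = new_desc;                /* make: install the new attributes */
    record(trace, OP_WRITE_NEW);
    record(trace, OP_DSB);            /* hardware: dsb ishst */
    record(trace, OP_ISB);            /* hardware: isb */
}
```

This sequence is only safe for a region that the running code does not depend on while the entry is invalid, which holds for the SRAM region here as long as code, stack, and page tables live elsewhere.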
Practical Considerations
When working with ARM processors, it is essential to:
- Consult the TRM: The Technical Reference Manual for the specific processor provides detailed information about supported features and recommended configurations.
- Use Architectural Guidelines: Follow the sequences and recommendations provided in the ARM Architecture Reference Manual.
- Validate Configurations: Use debugging and performance monitoring tools to verify that the system behaves as expected.
By adhering to these principles, developers can avoid common pitfalls and ensure robust and efficient system implementations.