ARM Cortex-A72 Cache Coherency Issues During SRAM Mapping
When mapping a Static Random-Access Memory (SRAM) area on an ARMv8-A Cortex-A72 system, a common issue is that the data cache must be explicitly flushed after the SRAM region is added to the page table. The same flush is not needed when mapping DDR (double data rate DRAM) regions, which leads to confusion about the underlying mechanisms. The core issue involves cache coherency, memory attributes, and the interaction between the Memory Management Unit (MMU) and the cache subsystem.
The ARM Cortex-A72 processor, like other ARMv8-A implementations, employs a multi-level caching mechanism to optimize memory access. When a new memory region is mapped, software must ensure that the cache contents are consistent with the newly established memory attributes. Failure to do so can result in stale data being served from the cache, leading to incorrect program behavior. The requirement to flush the data cache after mapping SRAM but not DDR suggests that the memory attributes and cache behavior differ between these two types of memory.
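The Memory Attribute Indirection Register (MAIR) plays a crucial role in defining memory attributes. It holds eight attribute encodings, each describing a memory type (Device or Normal) and, for Normal memory, its cacheability. A page table entry does not carry these attributes directly: its AttrIndx field selects one of the eight MAIR entries, while shareability comes from the entry's separate SH field. If a page table entry selects the wrong MAIR index, or MAIR is changed while mappings that use it are live, the cache may not behave as expected, necessitating a cache flush to restore coherency.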
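To make the MAIR layout concrete, the following is a minimal sketch of how a MAIR_EL1 value could be assembled from attribute bytes. The three attribute encodings are the architectural ones; the index assignment (0 for Device, 1 for Normal Non-cacheable, 2 for Normal Write-Back) and the helper names are assumptions chosen for illustration, not something the original system is known to use.

```c
#include <assert.h>
#include <stdint.h>

/* Attribute byte encodings defined by the ARMv8-A architecture. */
#define MAIR_ATTR_DEVICE_nGnRnE 0x00u /* Device, no gather/reorder/early-ack */
#define MAIR_ATTR_NORMAL_NC     0x44u /* Normal, Inner+Outer Non-cacheable   */
#define MAIR_ATTR_NORMAL_WB     0xFFu /* Normal, Inner+Outer Write-Back RA/WA */

/* Hypothetical index assignment; any consistent choice works. */
enum { IDX_DEVICE = 0, IDX_NORMAL_NC = 1, IDX_NORMAL_WB = 2 };

/* Place one 8-bit attribute into slot idx (0..7) of a MAIR value. */
static uint64_t mair_set(uint64_t mair, unsigned idx, uint8_t attr)
{
    assert(idx < 8);
    unsigned shift = idx * 8u;
    mair &= ~(0xFFull << shift);
    return mair | ((uint64_t)attr << shift);
}

/* Read back the attribute byte stored in slot idx. */
static uint8_t mair_get(uint64_t mair, unsigned idx)
{
    assert(idx < 8);
    return (uint8_t)(mair >> (idx * 8u));
}

/* Build the example MAIR_EL1 value used in this sketch. */
static uint64_t example_mair(void)
{
    uint64_t mair = 0;
    mair = mair_set(mair, IDX_DEVICE,    MAIR_ATTR_DEVICE_nGnRnE);
    mair = mair_set(mair, IDX_NORMAL_NC, MAIR_ATTR_NORMAL_NC);
    mair = mair_set(mair, IDX_NORMAL_WB, MAIR_ATTR_NORMAL_WB);
    return mair;
}
```

With this assignment, an SRAM mapping meant to be cached would use AttrIndx 2 and one meant to bypass the cache would use AttrIndx 1. Comparing the index used for SRAM against the one used for DDR is a quick first check when the two regions behave differently.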
Memory Attribute Mismatch and Cache Invalidation Timing
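One of the primary reasons for the observed behavior is a potential mismatch between the memory attributes used to write the page table and those used by the MMU to read it. The Translation Control Register (TCR) defines the cacheability and shareability of translation table walks through its IRGN, ORGN, and SH fields. If the page tables are written through a Write-Back cacheable mapping but the TCR marks table walks as Non-cacheable, a newly written descriptor can remain in the data cache where the table walker does not see it. In that case the mapping only takes effect once the cache is cleaned, which looks like a coherency problem with the mapped region itself.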
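The check described above can be sketched as a small decoder for the walk attributes. The field positions are those of TCR_EL1 for the TTBR0 region; the function names and the `tables_written_cacheable` flag are hypothetical, and the rule encoded here (clean the descriptor when walks are Non-cacheable but the tables were written through a cacheable mapping) is a simplified assumption rather than a complete statement of the architecture's requirements.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* TCR_EL1 field positions for the TTBR0 region. */
#define TCR_IRGN0_SHIFT 8u   /* inner cacheability of table walks */
#define TCR_ORGN0_SHIFT 10u  /* outer cacheability of table walks */
#define TCR_SH0_SHIFT   12u  /* shareability of table walks       */

#define TCR_RGN_NC 0u        /* 0b00: Normal, Non-cacheable */

/* Extract a 2-bit field from a TCR value. */
static unsigned tcr_field(uint64_t tcr, unsigned shift)
{
    assert(shift < 63);
    return (unsigned)((tcr >> shift) & 0x3u);
}

/* True when both inner and outer walk attributes are cacheable. */
static bool walks_are_cacheable(uint64_t tcr)
{
    return tcr_field(tcr, TCR_IRGN0_SHIFT) != TCR_RGN_NC &&
           tcr_field(tcr, TCR_ORGN0_SHIFT) != TCR_RGN_NC;
}

/*
 * Simplified rule (an assumption of this sketch): a descriptor written
 * through a cacheable mapping must be cleaned to memory before a
 * Non-cacheable table walk can be relied on to observe it.
 */
static bool descriptor_needs_clean(uint64_t tcr, bool tables_written_cacheable)
{
    return tables_written_cacheable && !walks_are_cacheable(tcr);
}
```

If `descriptor_needs_clean` returns true for the running configuration, the flush that appears necessary after mapping SRAM may really be acting on the page table memory rather than on the SRAM contents.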
The ARM Architecture Reference Manual (ARM ARM) provides detailed guidelines on the use of barriers to ensure correct ordering between data writes and MMU table walks. Data writes and MMU table walks are considered different observers, so a DSB is required after a descriptor write before the table walker is guaranteed to observe it, and an ISB is required before subsequent instructions are guaranteed to use the new translation. Without these barriers, an access may be translated with the old descriptor, even if the memory attributes are correctly specified.
Another factor to consider is the state of the Virtual Address (VA) before the SRAM mapping is added. In ARMv8-A, descriptors that generate a translation fault are not cached in the Translation Lookaside Buffer (TLB), so a VA that previously faulted needs only a DSB and ISB after the descriptor write, not a TLB invalidation. If the VA previously had a valid mapping, however, the old entry may still be held in the TLB and must be invalidated. Failure to do so can result in the processor continuing to use the old mapping and its old memory attributes, leading to cache coherency issues.
Implementing Cache Maintenance and Barrier Instructions
To resolve the cache coherency issues when mapping SRAM, a combination of cache maintenance operations and barrier instructions must be employed. The following steps outline the necessary actions to ensure that the cache is coherent with the newly mapped SRAM region:
- Cache Flush: After adding the SRAM region to the page table, perform a data cache flush so that any stale data in the cache is invalidated. This can be achieved using the DC CIVAC (Data Cache Clean and Invalidate by Virtual Address to Point of Coherency) instruction. DC CIVAC operates on the single cache line containing the specified virtual address, so it must be issued once per line across the region and followed by a DSB. Subsequent accesses to the SRAM region then fetch fresh data from memory.
- Barrier Instructions: Proper use of barrier instructions is essential to ensure correct ordering between data writes and MMU table walks. The DSB (Data Synchronization Barrier) instruction ensures that all previous memory accesses and maintenance operations have completed before execution continues. Additionally, the ISB (Instruction Synchronization Barrier) instruction flushes the pipeline so that the instructions that follow are fetched and executed using the updated page table entries.
- TLB Invalidation: If the VA previously had a valid mapping, the TLB must be invalidated to ensure that the new mapping is correctly recognized. This can be achieved using the TLBI (TLB Invalidate) instruction, which invalidates the TLB entries corresponding to the specified virtual address, followed by a DSB and an ISB. A VA that only ever generated translation faults has no TLB entry to remove.
- Memory Attribute Synchronization: Ensure that the memory attributes specified in the page table entries match what is intended for the region. This includes verifying that each entry's AttrIndx field selects the intended MAIR encoding, that its SH field carries the intended shareability, and that the table walk attributes in the TCR are consistent with the attributes used to write the page tables.
- Testing and Validation: After implementing the cache maintenance and barrier instructions, thorough testing and validation should be performed to ensure that the SRAM region is correctly mapped and that the cache is coherent with the memory. This includes verifying that data reads from the SRAM region return the expected values and that there are no coherency issues.
By following these steps, the cache coherency issues when mapping SRAM on an ARM Cortex-A72 processor can be effectively resolved. The key is to ensure that the cache is properly invalidated, that barrier instructions are used to enforce correct ordering, and that the memory attributes are correctly synchronized between the page table and the cache subsystem.
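The ordering of the descriptor write, barriers, and TLB invalidation can be sketched as follows. This is an illustration under stated assumptions, not a drop-in implementation: `TARGET_AARCH64_EL1` is a hypothetical build flag that selects real instructions, the `op_*` wrappers and `install_descriptor` are invented names, ASID 0 is assumed for the TLBI operand, and cache maintenance over the SRAM range itself is left out. Without the flag, the wrappers only record which operation was requested so the ordering can be inspected on a development host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DESC_VALID 0x1ull  /* bit 0 of a translation table descriptor */

#ifdef TARGET_AARCH64_EL1  /* hypothetical flag: build for the target at EL1 */
static void op_dsb_ishst(void) { __asm__ volatile("dsb ishst" ::: "memory"); }
static void op_dsb_ish(void)   { __asm__ volatile("dsb ish" ::: "memory"); }
static void op_isb(void)       { __asm__ volatile("isb" ::: "memory"); }
static void op_tlbi_vae1is(uint64_t va)
{
    /* Operand bits [43:0] hold VA[55:12]; ASID 0 is assumed here. */
    uint64_t arg = (va >> 12) & 0xFFFFFFFFFFFull;
    __asm__ volatile("tlbi vae1is, %0" :: "r"(arg) : "memory");
}
#else
/* Host stand-ins: append the name of each requested operation to a trace. */
static char op_trace[128];
static void trace(const char *s)
{
    strncat(op_trace, s, sizeof(op_trace) - strlen(op_trace) - 1);
}
static void op_dsb_ishst(void)          { trace("dsb_ishst;"); }
static void op_dsb_ish(void)            { trace("dsb_ish;"); }
static void op_isb(void)                { trace("isb;"); }
static void op_tlbi_vae1is(uint64_t va) { (void)va; trace("tlbi;"); }
#endif

/*
 * Install new_desc for va. If the old descriptor was valid, first
 * invalidate it and remove any TLB entry (break-before-make); if it was
 * invalid, no TLB invalidation is needed.
 */
static void install_descriptor(volatile uint64_t *pte, uint64_t va,
                               uint64_t new_desc)
{
    assert(pte != NULL);
    if (*pte & DESC_VALID) {
        *pte = 0;            /* break: make the entry invalid            */
        op_dsb_ishst();      /* make the write visible to the walker     */
        op_tlbi_vae1is(va);  /* drop any cached translation for this VA  */
        op_dsb_ish();        /* wait for the invalidation to complete    */
    }
    *pte = new_desc;         /* make: publish the new mapping            */
    op_dsb_ishst();          /* descriptor write visible to table walks  */
    op_isb();                /* later instructions use the new mapping   */
}
```

On a host build, installing a descriptor over an invalid entry records only a DSB and an ISB, while replacing a valid entry records the longer break-before-make sequence. That difference mirrors the distinction between mapping a previously faulting VA and remapping one that was already in use.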
Detailed Explanation of Cache Coherency Mechanisms
To fully understand the necessity of cache maintenance operations when mapping SRAM, it is essential to delve into the cache coherency mechanisms employed by the ARM Cortex-A72 processor. The processor uses a multi-level cache hierarchy, with each level having its own set of attributes and behaviors. The cache coherency protocol ensures that all caches in the system have a consistent view of memory.
Hardware coherency is not notified when a new memory region is mapped. It acts on the attributes attached to each individual access, and those attributes come from the page table entry in force when the access is made. This is particularly important for SRAM, which may be given different cacheability and shareability attributes compared to DDR. Cache lines allocated under an earlier mapping or attribute setting therefore stay in the cache until software removes them.
If the memory attributes are not correctly synchronized, the cache may continue to serve stale data, leading to coherency issues. This is why a cache flush is necessary after mapping SRAM. The cache flush ensures that any stale data in the cache is invalidated, and subsequent accesses to the SRAM region will fetch fresh data from memory.
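Because maintenance by virtual address works one cache line at a time, flushing a region means walking it in line-sized steps. The sketch below shows one way to derive the line size from a CTR_EL0 value and iterate over a range. The `TARGET_AARCH64_EL1` flag, the function names, and the recording stand-in are assumptions for illustration; on the target, the caller would still need to issue a DSB after the loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CTR_EL0.DminLine (bits [19:16]) is log2 of the line size in 4-byte words. */
static size_t dcache_line_bytes(uint64_t ctr_el0)
{
    return (size_t)4u << ((ctr_el0 >> 16) & 0xFu);
}

#ifdef TARGET_AARCH64_EL1  /* hypothetical flag: build for the target */
static void dc_civac_line(uintptr_t va)
{
    __asm__ volatile("dc civac, %0" :: "r"(va) : "memory");
}
#else
/* Host stand-in: remember the last line address that was requested. */
static uintptr_t last_line_touched;
static void dc_civac_line(uintptr_t va) { last_line_touched = va; }
#endif

/*
 * Apply line_op to every cache line overlapping [start, start + len).
 * Returns the number of lines visited. line must be a power of two.
 */
static size_t clean_invalidate_range(uintptr_t start, size_t len, size_t line,
                                     void (*line_op)(uintptr_t))
{
    assert(line != 0 && (line & (line - 1)) == 0);
    if (len == 0)
        return 0;

    uintptr_t addr = start & ~(uintptr_t)(line - 1); /* align down */
    uintptr_t end = start + len;
    size_t count = 0;

    for (; addr < end; addr += line) {
        line_op(addr);
        count++;
    }
    return count;
}
```

The Cortex-A72 data cache uses 64-byte lines, which corresponds to a DminLine value of 4. Reading the size from CTR_EL0 rather than hard-coding it keeps the loop correct if the code is later run on a core with a different line size.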
Impact of Memory Attributes on Cache Behavior
The memory attributes specified in the page table entries have a significant impact on the behavior of the cache. The ARM Cortex-A72 processor uses the MAIR to define the memory type and cacheability encodings that page table entries select through their AttrIndx field, while shareability is set by the SH field of each entry. Together, these attributes determine how the cache interacts with the memory region.
For example, if a memory region is marked as cacheable, the cache will store copies of the data from that region, allowing for faster access. If the memory region is marked as non-cacheable, the cache will not store copies of the data, and all accesses will go directly to memory. The shareability attribute determines whether the memory region is shared between multiple processors or cores, which affects how the cache coherency protocol operates.
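As an illustration of where these attributes live, the following sketch packs a stage 1 level-3 page descriptor for a 4KB granule. The bit positions are the architectural ones; the helper names are hypothetical, the access permission bits are left at zero (read/write at EL1 only), and execute-never bits are not set, so it is a starting point rather than a complete descriptor builder.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_VALID_PAGE  0x3ull                     /* bits [1:0]: valid page */
#define PTE_ATTRINDX(i) ((uint64_t)(i) << 2)       /* bits [4:2]: MAIR index */
#define PTE_SH_NON      (0x0ull << 8)              /* bits [9:8]: shareability */
#define PTE_SH_OUTER    (0x2ull << 8)
#define PTE_SH_INNER    (0x3ull << 8)
#define PTE_AF          (1ull << 10)               /* access flag */
#define PTE_ADDR_MASK   0x0000FFFFFFFFF000ull      /* output address [47:12] */

/* Build a page descriptor for physical address pa. */
static uint64_t make_page_desc(uint64_t pa, unsigned attr_idx, uint64_t sh)
{
    assert(attr_idx < 8);
    return (pa & PTE_ADDR_MASK) | PTE_AF | sh |
           PTE_ATTRINDX(attr_idx) | PTE_VALID_PAGE;
}

/* Extract the MAIR index a descriptor selects. */
static unsigned pte_attr_index(uint64_t desc)
{
    return (unsigned)((desc >> 2) & 0x7u);
}

/* Extract the 2-bit shareability field of a descriptor. */
static unsigned pte_shareability(uint64_t desc)
{
    return (unsigned)((desc >> 8) & 0x3u);
}
```

Decoding the descriptors actually used for the SRAM and DDR regions with helpers like `pte_attr_index` and `pte_shareability` makes it easy to see whether the two regions really select the same cacheability and shareability.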
When mapping SRAM, it is crucial to ensure that the memory attributes specified in the page table entries match those expected by the cache subsystem. If the attributes do not match, the cache may not behave as expected, leading to coherency issues. This is why a cache flush is necessary after mapping SRAM, to ensure that the cache is coherent with the new memory attributes.
Role of Barrier Instructions in Cache Coherency
Barrier instructions play a critical role in ensuring cache coherency when mapping new memory regions. The ARM Cortex-A72 processor uses barrier instructions to enforce correct ordering between data writes and MMU table walks. Without proper barriers, the cache may serve stale data, even if the memory attributes are correctly specified.
The DSB instruction ensures that all previous memory accesses, including the page table write, have completed before execution continues with the cache maintenance operations. This is important because the cache maintenance operations must be performed after the memory attributes have been updated in the page table. The ISB instruction flushes the pipeline so that subsequent instructions are fetched and executed under the updated page table entries, ensuring that the new memory attributes are correctly recognized.
By using barrier instructions, the processor can ensure that the cache is coherent with the newly mapped memory region. This is particularly important when mapping SRAM, as the memory attributes may differ significantly from those of DDR.
Conclusion
Mapping SRAM on an ARM Cortex-A72 processor requires careful attention to cache coherency, memory attributes, and barrier instructions. The necessity of a cache flush after mapping SRAM is due to the potential mismatch between the memory attributes specified in the page table and those expected by the cache subsystem. By performing a cache flush, using barrier instructions, and ensuring that the memory attributes are correctly synchronized, the cache coherency issues can be effectively resolved.
Understanding the underlying mechanisms of cache coherency and memory attributes is essential for optimizing the performance and reliability of embedded systems using ARM Cortex-A72 processors. By following the outlined steps and ensuring proper synchronization between the cache and memory subsystems, developers can avoid common pitfalls and ensure that their systems operate as intended.