SMMU-v3 Memory Attribute Configuration and Cache Coherency Challenges
The System Memory Management Unit version 3 (SMMU-v3) is a critical component in modern ARM-based systems, particularly in hypervisor environments where it manages memory translations and access permissions for devices. One of the most intricate aspects of SMMU-v3 configuration is the setup of memory attributes, specifically cacheability and shareability settings. These attributes determine how the SMMU interacts with the memory system, including translation tables and queues, and how it handles coherency with the CPU and other agents.
The SMMU_CR1 register and the Stream Table Entry (STE) are central to this configuration. SMMU_CR1 controls the cacheability and shareability attributes for the SMMU’s accesses to its in-memory tables and queues, while the STE fields S2IR0, S2OR0, and S2SH0 define these attributes for stage 2 translation table walks. Misconfigurations in these settings can lead to subtle but severe issues, such as data corruption, performance degradation, or complete system failure, especially when the SMMU does not support coherent access (SMMU_IDR0.COHACC = 0).
A key challenge lies in understanding which agent (CPU or SMMU) these attributes apply to and how they interact with the broader memory system. For instance, when the SMMU accesses translation tables or queues, are the cacheability and shareability settings referring to the SMMU’s view of memory or the CPU’s? Additionally, when the SMMU lacks coherent access support, how should these attributes be configured to ensure proper operation without violating memory consistency?
This post delves into the intricacies of SMMU-v3 memory attribute configuration, exploring the potential pitfalls and providing detailed guidance on resolving these issues. By the end, you will have a clear understanding of how to configure SMMU_CR1 and STE fields correctly, even in non-coherent SMMU implementations.
SMMU_CR1 Configuration: Cacheability and Shareability Ambiguities
The SMMU_CR1 register is responsible for defining the memory attributes for table and queue accesses. Specifically, it includes fields such as TABLE_SH (Table Shareability), TABLE_OC (Table Outer Cacheability), TABLE_IC (Table Inner Cacheability), and their counterparts for queue accesses (QUEUE_SH, QUEUE_OC, QUEUE_IC). These fields determine how the SMMU interacts with memory when accessing translation tables and queues.
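As a rough illustration of how these six fields sit in the register, the sketch below decodes a raw SMMU_CR1 value into its attribute fields. The bit positions and encodings are assumptions taken from the layout used by the Linux arm-smmu-v3 driver, and the names cr1_get, cr1_decode, and struct cr1_attrs are hypothetical; check everything against the SMMUv3 specification for your implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Bit offsets of the two-bit SMMU_CR1 fields (assumed layout, see lead-in). */
enum cr1_field_shift {
    CR1_QUEUE_IC = 0,  /* queue inner cacheability  */
    CR1_QUEUE_OC = 2,  /* queue outer cacheability  */
    CR1_QUEUE_SH = 4,  /* queue shareability        */
    CR1_TABLE_IC = 6,  /* table inner cacheability  */
    CR1_TABLE_OC = 8,  /* table outer cacheability  */
    CR1_TABLE_SH = 10  /* table shareability        */
};

/* Assumed encodings:
 *   cacheability: 0 = non-cacheable, 1 = write-back, 2 = write-through
 *   shareability: 0 = non-shareable, 2 = outer shareable, 3 = inner shareable */
struct cr1_attrs {
    unsigned table_sh, table_oc, table_ic;
    unsigned queue_sh, queue_oc, queue_ic;
};

/* Extract one two-bit field from a raw SMMU_CR1 value. */
static inline unsigned cr1_get(uint32_t cr1, enum cr1_field_shift shift)
{
    assert(shift <= CR1_TABLE_SH && (shift % 2) == 0);
    return (cr1 >> shift) & 0x3u;
}

/* Decode all six attribute fields at once (hypothetical helper). */
static inline struct cr1_attrs cr1_decode(uint32_t cr1)
{
    struct cr1_attrs a;
    a.table_sh = cr1_get(cr1, CR1_TABLE_SH);
    a.table_oc = cr1_get(cr1, CR1_TABLE_OC);
    a.table_ic = cr1_get(cr1, CR1_TABLE_IC);
    a.queue_sh = cr1_get(cr1, CR1_QUEUE_SH);
    a.queue_oc = cr1_get(cr1, CR1_QUEUE_OC);
    a.queue_ic = cr1_get(cr1, CR1_QUEUE_IC);
    return a;
}
```

Under these assumptions, a value of 0xD75 decodes to inner shareable, write-back inner and outer, for both tables and queues.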
A common point of confusion is whether these attributes apply to the CPU or the SMMU. For example, when TABLE_IC is set to write-back cacheable, does this mean the SMMU expects the translation tables to be cached in the CPU’s inner cache hierarchy, or does it refer to the SMMU’s own caching behavior? The answer lies in understanding the SMMU’s role as a memory access agent. The SMMU accesses memory on behalf of devices, and these attributes define how the SMMU expects the memory system to behave during such accesses. Therefore, the cacheability and shareability settings in SMMU_CR1 refer to the memory attributes as seen by the SMMU, not the CPU.
However, this interpretation becomes problematic when the SMMU does not support coherent access (COHACC = 0). In such cases, hardware does not keep the SMMU’s table and queue accesses coherent with the CPU’s caches, and marking those accesses cacheable in SMMU_CR1 does not change that. If SMMU_CR1 is configured with cacheable attributes (e.g., write-back) on a non-coherent SMMU, software may wrongly assume that the two agents share a consistent view of memory. The SMMU can then read stale table or queue entries from memory while newer data still sits in a CPU cache, or the CPU can read a stale cached copy of data the SMMU has written, leading to data corruption.
The Linux SMMU-v3 driver, for instance, configures SMMU_CR1 with write-back cacheable attributes regardless of the COHACC setting. While this works for coherent SMMUs, it raises questions about the correctness of this configuration for non-coherent SMMUs. If the SMMU cannot enforce coherency, should the memory attributes be set to non-cacheable to avoid inconsistencies? This ambiguity is a significant source of potential errors in SMMU-v3 implementations.
Stream Table Entries: Cacheability and Shareability for Stage 2 Translation Tables
The Stream Table Entry (STE) in SMMU-v3 includes the fields S2IR0 (inner cacheability), S2OR0 (outer cacheability), and S2SH0 (shareability) for stage 2 translation table accesses. These fields define the memory attributes the SMMU uses when it walks the stage 2 translation tables. As with SMMU_CR1, there is ambiguity about whether these attributes apply to the CPU or the SMMU.
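To make the three fields concrete, here is a minimal sketch that writes them into the third 64-bit word of an STE while leaving the other bits alone. The bit positions (S2IR0 at [41:40], S2OR0 at [43:42], S2SH0 at [45:44]) and the TCR-style encodings are assumptions based on the layout the Linux driver uses for the stage 2 VTCR fields; ste2_set_s2_attrs and the macro names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed positions inside STE 64-bit word 2; verify against the spec. */
#define STE2_S2IR0_SHIFT 40
#define STE2_S2OR0_SHIFT 42
#define STE2_S2SH0_SHIFT 44
#define STE2_S2ATTR_MASK (0x3FULL << STE2_S2IR0_SHIFT)

/* Assumed TCR-style cacheability encodings for S2IR0 and S2OR0. */
#define S2_RGN_NC   0x0ULL  /* non-cacheable                           */
#define S2_RGN_WBWA 0x1ULL  /* write-back, read/write-allocate         */
#define S2_RGN_WT   0x2ULL  /* write-through                           */
#define S2_RGN_WB   0x3ULL  /* write-back, no write-allocate           */

/* Assumed shareability encodings for S2SH0. */
#define S2_SH_NSH 0x0ULL
#define S2_SH_OSH 0x2ULL
#define S2_SH_ISH 0x3ULL

/* Return ste2 with its stage 2 walk attributes replaced; other bits kept. */
static inline uint64_t ste2_set_s2_attrs(uint64_t ste2, uint64_t ir0,
                                         uint64_t or0, uint64_t sh0)
{
    assert(ir0 <= 3 && or0 <= 3 && sh0 <= 3);
    ste2 &= ~STE2_S2ATTR_MASK;
    ste2 |= ir0 << STE2_S2IR0_SHIFT;
    ste2 |= or0 << STE2_S2OR0_SHIFT;
    ste2 |= sh0 << STE2_S2SH0_SHIFT;
    return ste2;
}
```

In a real driver the STE must also be written to memory in an order the SMMU tolerates and followed by the appropriate invalidation command; the sketch covers only the bit packing.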
When the SMMU accesses stage 2 translation tables, it does so on behalf of a device. The cacheability and shareability settings in the STE determine the attributes the SMMU attaches to these accesses. For example, if S2IR0 is set to write-back cacheable, the SMMU issues its stage 2 table walks as inner write-back cacheable accesses and expects the memory system to treat them accordingly. However, if the SMMU is non-coherent (COHACC = 0), this expectation can lead to inconsistencies, as hardware does not keep these accesses coherent with the CPU or other agents.
In hypervisor environments, the stage 2 translation tables are often created by the CPU. This raises the question of whether the cacheability and shareability settings in the STE should match those used by the CPU. If the CPU accesses the translation tables with certain cacheability attributes, should the SMMU be configured with the same attributes? The answer depends on the SMMU’s coherency capabilities. For coherent SMMUs, the attributes can match, as the hardware ensures coherency. For non-coherent SMMUs, however, the attributes may need to be adjusted to avoid inconsistencies.
For example, if the CPU accesses the translation tables with write-back cacheable attributes, but the SMMU is non-coherent, the SMMU should ideally access the tables with non-cacheable attributes to prevent caching stale data. This only guarantees that the SMMU reads the most up-to-date translation tables if the CPU also makes its updates visible in memory, either by cleaning the affected cache lines after each update or by mapping the tables as non-cacheable itself.
Resolving SMMU-v3 Memory Attribute Configuration Issues
To address the challenges associated with SMMU-v3 memory attribute configuration, a systematic approach is required. This involves understanding the SMMU’s coherency capabilities, carefully configuring the SMMU_CR1 and STE fields, and ensuring consistency between the SMMU and CPU memory attributes.
Step 1: Determine SMMU Coherency Support
The first step is to check the SMMU_IDR0.COHACC bit to determine whether the SMMU supports coherent access. If COHACC = 1, the SMMU can rely on hardware coherency mechanisms, and cacheable attributes can be safely used in SMMU_CR1 and STE fields. If COHACC = 0, the SMMU is non-coherent, and non-cacheable attributes should be used to avoid inconsistencies.
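A minimal sketch of this check follows. It assumes SMMU_IDR0 sits at offset 0x0 of the SMMU register page and that COHACC is bit 4, as in the Linux driver’s definitions; smmu_read32 and smmu_idr0_is_coherent are hypothetical helpers, and base stands for wherever your platform has mapped the SMMU registers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ARM_SMMU_IDR0 0x0u        /* assumed register offset */
#define IDR0_COHACC   (1u << 4)   /* assumed bit position    */

/* Read a 32-bit register at byte offset off from a mapped register base. */
static inline uint32_t smmu_read32(const volatile void *base, uint32_t off)
{
    assert(base);
    return *(const volatile uint32_t *)((const volatile uint8_t *)base + off);
}

/* True if the given IDR0 value advertises coherent table and queue access. */
static inline bool smmu_idr0_is_coherent(uint32_t idr0)
{
    return (idr0 & IDR0_COHACC) != 0;
}
```

A caller would typically evaluate smmu_idr0_is_coherent(smmu_read32(base, ARM_SMMU_IDR0)) once at probe time and keep the result for the later configuration steps.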
Step 2: Configure SMMU_CR1 Based on Coherency Support
For coherent SMMUs, the Linux SMMU-v3 driver’s approach of configuring SMMU_CR1 with write-back cacheable attributes is appropriate. However, for non-coherent SMMUs, the SMMU_CR1 register should be configured with non-cacheable attributes. This keeps the SMMU’s table and queue accesses from relying on caches that hardware does not keep coherent with the CPU, preventing potential data corruption or stale data issues.
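The choice described in this step could be sketched as below. The field layout (QUEUE_* in bits [5:0], TABLE_* in bits [11:6], each group ordered IC, OC, SH from the low bit) and the encodings are assumptions matching the Linux driver’s definitions. Using outer shareable for the non-cacheable case is also an assumption, on the basis that non-cacheable Normal memory is generally treated as outer shareable. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed SMMU_CR1 encodings. */
#define CR1_CACHE_NC 0u  /* non-cacheable        */
#define CR1_CACHE_WB 1u  /* write-back cacheable */
#define CR1_SH_OSH   2u  /* outer shareable      */
#define CR1_SH_ISH   3u  /* inner shareable      */

/* Apply one (sh, oc, ic) triple to both the queue and the table field groups. */
static inline uint32_t cr1_attrs(uint32_t sh, uint32_t oc, uint32_t ic)
{
    assert(sh <= 3 && oc <= 3 && ic <= 3);
    uint32_t group = (sh << 4) | (oc << 2) | ic; /* one 6-bit IC/OC/SH group */
    return (group << 6) | group;                 /* TABLE_* | QUEUE_*        */
}

/* Pick the SMMU_CR1 value from the coherency capability found in Step 1. */
static inline uint32_t smmu_cr1_for(bool coherent)
{
    return coherent ? cr1_attrs(CR1_SH_ISH, CR1_CACHE_WB, CR1_CACHE_WB)
                    : cr1_attrs(CR1_SH_OSH, CR1_CACHE_NC, CR1_CACHE_NC);
}
```

With these assumptions the coherent case yields 0xD75 and the non-coherent case 0x820. Whichever value is chosen, the CPU-side mappings of the same tables and queues need to be compatible with it.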
Step 3: Align STE Attributes with CPU and SMMU Coherency
When configuring the STE fields (S2IR0, S2OR0, S2SH0), the attributes should align with the SMMU’s coherency capabilities. For coherent SMMUs, the attributes can match those used by the CPU. For non-coherent SMMUs, non-cacheable attributes should be used to ensure that the SMMU always accesses the most up-to-date translation tables from memory.
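One way to express this rule in code is a small selector that returns the three stage 2 walk attributes for a given coherency capability. The encodings are assumed to be TCR-style (0 = non-cacheable, 1 = write-back read/write-allocate; shareability 2 = outer, 3 = inner), and struct s2_walk_attrs and s2_walk_attrs_for are hypothetical names. The coherent case assumes the CPU maps the tables as inner shareable write-back memory, which is typical but should be confirmed for your hypervisor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed encodings for S2IR0/S2OR0 (cacheability) and S2SH0 (shareability). */
enum s2_rgn { S2_NC = 0, S2_WBWA = 1 };
enum s2_sh  { S2_OSH = 2, S2_ISH = 3 };

struct s2_walk_attrs {
    uint8_t ir0;  /* value for STE.S2IR0 */
    uint8_t or0;  /* value for STE.S2OR0 */
    uint8_t sh0;  /* value for STE.S2SH0 */
};

/* Choose stage 2 walk attributes from the coherency capability. */
static inline struct s2_walk_attrs s2_walk_attrs_for(bool coherent)
{
    struct s2_walk_attrs a;
    if (coherent) {
        /* Match the CPU's assumed write-back, inner shareable mapping. */
        a.ir0 = S2_WBWA;
        a.or0 = S2_WBWA;
        a.sh0 = S2_ISH;
    } else {
        /* Bypass caches that are not kept coherent with the CPU. */
        a.ir0 = S2_NC;
        a.or0 = S2_NC;
        a.sh0 = S2_OSH;
    }
    assert(a.ir0 <= 3 && a.or0 <= 3 && a.sh0 <= 3);
    return a;
}
```

The returned values would then be packed into the STE alongside the other stage 2 configuration fields before the STE is made valid.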
Step 4: Validate Configuration with System-Level Testing
After configuring the SMMU_CR1 and STE fields, it is essential to validate the configuration through system-level testing. This includes stress testing the SMMU with various memory access patterns and verifying that no data corruption or inconsistencies occur. Tools such as memory analyzers and coherency checkers can be invaluable in this process.
Step 5: Document Configuration Guidelines
Finally, document the configuration guidelines for SMMU-v3 memory attributes, including the implications of COHACC and the recommended settings for coherent and non-coherent SMMUs. This documentation will serve as a reference for future implementations and troubleshooting efforts.
By following these steps, you can ensure that the SMMU-v3 memory attribute configuration is correct and optimized for your system’s coherency capabilities. This will prevent subtle hardware-software interaction issues and performance bottlenecks, leading to a more reliable and efficient system implementation.