TLB Coherence Expectations in Multi-PE Systems with ARM Neoverse N1 and CMN-600

In systems that use ARM Neoverse N1 cores connected through the CMN-600 mesh interconnect, keeping Translation Lookaside Buffers (TLBs) coherent across multiple Processing Elements (PEs) is critical for maintaining memory consistency and avoiding permission faults. The TLB is a cache used by the Memory Management Unit (MMU) to store recent virtual-to-physical address translations together with their attributes. When memory attributes, such as read-write permissions, are modified, the affected TLB entries must be invalidated so that every PE observes the updated attributes. TLBs are not kept coherent with the translation tables automatically: software must issue the invalidation, and the hardware is only expected to broadcast it. In multi-PE systems this is where things can go wrong, especially when the broadcast is assumed to reach every PE without further configuration.

The core issue arises when modifying memory attributes using the ARM Trusted Firmware-A (TF-A) library function xlat_change_mem_attributes(). This function changes the memory attributes and performs a TLB Invalidate (TLBI) operation followed by Data Synchronization Barrier (DSB) and Instruction Synchronization Barrier (ISB) sequences to ensure the changes are visible to the modifying core. However, in a multi-PE system, the expectation that the TLBI operation on one core will invalidate TLB entries on all other cores is not always met. This leads to permission faults when other PEs attempt to access the modified memory region, as their TLBs still contain stale entries.
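As a rough illustration of the sequence described above, the sketch below encodes a by-VA TLBI operand and issues an invalidate-and-synchronize sequence. The helper names tlbi_va_operand() and invalidate_va_el3_is() are hypothetical and this is not TF-A's actual code. The instruction path only builds for AArch64 and would have to run at EL3; on other hosts it compiles to nothing so the operand encoding can still be checked.

```c
#include <stdint.h>

/*
 * Hypothetical helper: a by-VA TLBI operand carries VA[55:12]
 * in bits [43:0] of the register passed to the instruction.
 */
static inline uint64_t tlbi_va_operand(uint64_t va)
{
    return (va >> 12) & 0x00000FFFFFFFFFFFULL;
}

/*
 * Hypothetical helper: invalidate one EL3 translation on all PEs in the
 * Inner Shareable domain, then wait for completion.
 */
static inline void invalidate_va_el3_is(uint64_t va)
{
    uint64_t arg = tlbi_va_operand(va);
#if defined(__aarch64__)
    __asm__ volatile(
        "dsb ishst\n\t"        /* descriptor write visible before the TLBI */
        "tlbi vae3is, %0\n\t"  /* broadcast within the Inner Shareable domain */
        "dsb ish\n\t"          /* wait for the invalidation to complete */
        "isb"                  /* resynchronize the issuing PE */
        :
        : "r"(arg)
        : "memory");
#else
    (void)arg; /* non-AArch64 host: no instruction to issue */
#endif
}
```

For example, a virtual address of 0x80001000 yields the operand 0x80001.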

TLBI Command Scope and Shareability Domain Mismatch

The root cause of the TLB coherence issue lies in the scope of the TLBI command and the shareability domain configuration of the system. The ARM architecture provides TLBI commands with different broadcast scopes: variants with no suffix, which affect only the issuing PE, Inner Shareable (IS) variants, and Outer Shareable (OS) variants. The suffix determines which PEs are affected by the TLB invalidation operation. In the case of the ARM Neoverse N1 cores connected via the CMN-600 interconnect, the system’s shareability domain configuration plays a crucial role in determining how TLB invalidations propagate across PEs.

The TLBI command used in the xlat_change_mem_attributes() function is TLBI VAE3IS, which targets the Inner Shareable domain. On CMN-600, broadcast TLB maintenance travels between PEs as Distributed Virtual Memory (DVM) messages. If the PEs span multiple shareability domains, or if the interconnect is not configured to deliver those messages to every PE, the invalidation may not reach all of them. Stale TLB entries then remain in some PEs and cause permission faults when those PEs access the modified memory region.
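The failure mode can be made concrete with a toy software model. This is only a sketch: it does not describe how Neoverse N1 or CMN-600 implement TLBs, and every name in it (struct toy_tlb, toy_fill, toy_invalidate, toy_count_stale) is invented for illustration. Each PE caches one translation, and an invalidation clears the entry only on the issuing PE and on the PEs the broadcast actually reaches.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model: each PE caches at most one translation. */
struct toy_tlb {
    bool     valid;
    uint64_t va_page;   /* virtual page number of the cached entry */
    bool     writable;  /* cached permission */
};

/* Every PE caches the same translation, as after touching a shared buffer. */
static void toy_fill(struct toy_tlb *tlbs, size_t n,
                     uint64_t va_page, bool writable)
{
    for (size_t i = 0; i < n; i++) {
        tlbs[i].valid = true;
        tlbs[i].va_page = va_page;
        tlbs[i].writable = writable;
    }
}

/* Invalidate on the issuer and on every PE that the broadcast reaches. */
static void toy_invalidate(struct toy_tlb *tlbs, size_t n, size_t issuer,
                           uint64_t va_page, const bool *reached)
{
    for (size_t i = 0; i < n; i++) {
        bool hit = tlbs[i].valid && tlbs[i].va_page == va_page;
        if (hit && (i == issuer || reached[i])) {
            tlbs[i].valid = false;
        }
    }
}

/* Count PEs that still hold an entry for the page after the invalidation. */
static size_t toy_count_stale(const struct toy_tlb *tlbs, size_t n,
                              uint64_t va_page)
{
    size_t stale = 0;
    for (size_t i = 0; i < n; i++) {
        if (tlbs[i].valid && tlbs[i].va_page == va_page) {
            stale++;
        }
    }
    return stale;
}
```

In this model, four PEs with a broadcast that reaches only the first two leave two stale entries, which corresponds to the permission faults seen on the PEs the invalidation missed.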

Additionally, the DynamIQ Shared Unit (DSU) configuration of the SoC can influence the shareability domain. Each DSU typically contains a set of cores that share certain resources, and the interconnect between DSUs may have different shareability characteristics. If the TLBI command does not account for these hierarchical shareability domains, the invalidation may not propagate correctly across the entire system.

Implementing Correct TLBI Commands and Shareability Domain Configuration

To resolve the TLB coherence issue, it is essential to use the appropriate TLBI command and ensure that the shareability domain configuration aligns with the system’s architecture. The TLBI VAE3OS command, which targets the Outer Shareable domain, is often more suitable for systems with complex interconnect topologies like the CMN-600, because it is intended to reach a broader range of PEs, including those in different DSUs or shareability domains. The Outer Shareable TLBI variants are an optional architecture feature (FEAT_TLBIOS, introduced with Armv8.4-A), so confirm that the target cores report support in ID_AA64ISAR0_EL1.TLB before relying on them.
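Before switching to an Outer Shareable variant, firmware can check whether the core implements it. The sketch below decodes the TLB field, bits [59:56], of ID_AA64ISAR0_EL1, where a non-zero value indicates that the Outer Shareable TLBI instructions are implemented. The function names are hypothetical, and the register value is passed in as a parameter rather than read with MRS so the decoding can be exercised on any host.

```c
#include <stdbool.h>
#include <stdint.h>

/* ID_AA64ISAR0_EL1.TLB occupies bits [59:56]. */
#define ID_AA64ISAR0_TLB_SHIFT 56
#define ID_AA64ISAR0_TLB_MASK  0xFULL

/* Hypothetical helper: extract the TLB field from a register value. */
static inline unsigned int isar0_tlb_field(uint64_t id_aa64isar0)
{
    return (unsigned int)((id_aa64isar0 >> ID_AA64ISAR0_TLB_SHIFT) &
                          ID_AA64ISAR0_TLB_MASK);
}

/*
 * Hypothetical helper: true when the Outer Shareable TLBI variants exist.
 * 0b0000 = not implemented, 0b0001 = OS variants, 0b0010 = OS and range.
 */
static inline bool has_tlbi_os(uint64_t id_aa64isar0)
{
    return isar0_tlb_field(id_aa64isar0) >= 1U;
}
```

If has_tlbi_os() returns false for the value read on the target core, TLBI VAE3OS is not available there and the Inner Shareable variant remains the widest by-VA broadcast.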

The following steps outline the troubleshooting and resolution process:

  1. Verify Shareability Domain Configuration: Determine the shareability domain configuration of the ARM Neoverse N1 cores and the CMN-600 interconnect. This includes identifying the DSU organization and the interconnect’s support for Inner and Outer Shareable domains. Consult the SoC documentation to understand how shareability domains are implemented and how TLBI commands propagate across the system.

  2. Update TLBI Command in TF-A: Modify the TLB invalidation performed by xlat_change_mem_attributes() to use TLBI VAE3OS instead of TLBI VAE3IS, provided the cores implement the Outer Shareable variants. The invalidation then targets the Outer Shareable domain, which is more likely to cover all PEs in a multi-DSU system.

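  3. Ensure Proper Barrier Sequences: Check that the barriers around the TLBI operation are present and use a matching scope. A DSB before the TLBI makes the updated translation table entry visible before the invalidation is issued. A DSB after the TLBI waits for the invalidation to complete, and it must cover the same shareability domain as the TLBI (DSB ISH for an IS variant, DSB OSH for an OS variant). The final ISB resynchronizes the issuing PE so that subsequent instructions do not use outdated translations.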

  4. Test and Validate: After implementing the changes, rerun the coherence tests to verify that all PEs observe the updated memory attributes without causing permission faults. This includes performing writes to the modified memory region from all PEs and checking for exceptions.

  5. Debugging and Profiling: If the issue persists, use debugging tools to profile the TLB behavior and identify any remaining stale entries. ARM CoreSight and other debugging tools can provide insights into how TLBI commands are being executed and propagated across the system.
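For the test and validation step, it helps to record the outcome per PE so that the PEs holding stale entries can be identified. The following is a minimal sketch of that bookkeeping only. The struct pe_result type and faulting_pe_mask() are hypothetical names, and how each PE detects and records its own fault is platform specific and not shown.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-PE record written by the test running on each PE. */
struct pe_result {
    unsigned int pe_id;   /* PE index, expected to be below 64 */
    int          faulted; /* non-zero if the write took a permission fault */
};

/* Return a bitmask with bit N set for every PE N that reported a fault. */
static uint64_t faulting_pe_mask(const struct pe_result *results, size_t n)
{
    uint64_t mask = 0;

    for (size_t i = 0; i < n; i++) {
        if (results[i].faulted && results[i].pe_id < 64U) {
            mask |= 1ULL << results[i].pe_id;
        }
    }
    return mask;
}
```

A mask of zero means every PE observed the new attributes. A non-zero mask shows which PEs the invalidation did not reach, which can then be compared against the DSU layout.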

By addressing the shareability domain configuration and using the correct TLBI command, the TLB coherence issue in ARM Neoverse N1 systems with CMN-600 interconnects can be effectively resolved. This ensures that all PEs observe consistent memory attributes, preventing permission faults and maintaining system stability.

Detailed Explanation of TLBI Commands and Shareability Domains

To understand the solution, it helps to look at TLBI commands and shareability domains in more detail. The ARMv8-A architecture defines many TLBI commands that differ in the translation regime they target and in how far they are broadcast. The Outer Shareable variants are a later addition (FEAT_TLBIOS, Armv8.4-A). The commands relevant here are:

  • TLBI VAE1IS: Invalidates TLB entries for the specified virtual address in the EL1&0 translation regime, on all PEs in the Inner Shareable domain.
  • TLBI VAE1OS: Invalidates TLB entries for the specified virtual address in the EL1&0 translation regime, on all PEs in the Outer Shareable domain.
  • TLBI VAE3IS: Invalidates TLB entries for the specified virtual address in the EL3 translation regime, on all PEs in the Inner Shareable domain.
  • TLBI VAE3OS: Invalidates TLB entries for the specified virtual address in the EL3 translation regime, on all PEs in the Outer Shareable domain.

The choice of TLBI command depends on the system’s shareability domain configuration and on the translation regime whose tables were changed. In the context of the ARM Neoverse N1 and CMN-600, TF-A’s runtime firmware executes at EL3 and manages the EL3 translation regime, so the VAE3 variants apply. If the interconnect spans multiple Inner Shareable domains, TLBI VAE3OS is the variant intended to reach all relevant PEs.
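The selection rule can be written down as a small lookup. This is an illustrative sketch: the enum and the function tlbi_va_mnemonic() are hypothetical, and the function only returns the instruction name for a given exception level and broadcast scope without issuing anything.

```c
#include <stddef.h>

/* Broadcast scope of a by-VA invalidation. */
enum tlbi_scope {
    TLBI_LOCAL = 0, /* issuing PE only (no suffix) */
    TLBI_INNER = 1, /* Inner Shareable domain (IS suffix) */
    TLBI_OUTER = 2  /* Outer Shareable domain (OS suffix, FEAT_TLBIOS) */
};

/*
 * Hypothetical helper: name the by-VA TLBI instruction for a translation
 * regime (el = 1, 2 or 3) and scope. Returns NULL for invalid input.
 */
static const char *tlbi_va_mnemonic(unsigned int el, enum tlbi_scope scope)
{
    static const char *const names[3][3] = {
        { "TLBI VAE1", "TLBI VAE1IS", "TLBI VAE1OS" },
        { "TLBI VAE2", "TLBI VAE2IS", "TLBI VAE2OS" },
        { "TLBI VAE3", "TLBI VAE3IS", "TLBI VAE3OS" },
    };

    if (el < 1U || el > 3U ||
        (unsigned int)scope > (unsigned int)TLBI_OUTER) {
        return NULL;
    }
    return names[el - 1U][scope];
}
```

For firmware running at EL3, tlbi_va_mnemonic(3, TLBI_INNER) names the instruction discussed at the start of this article and tlbi_va_mnemonic(3, TLBI_OUTER) names the proposed replacement.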

The shareability domain configuration is defined by the system’s memory system architecture and the interconnect topology. The ARM architecture defines three shareability domains:

  • Non-Shareable (NS): The memory region is not shared between PEs, and no hardware coherence is required.
  • Inner Shareable (IS): The memory region is shared between PEs within the same Inner Shareable domain. In the topology described here that is a single DSU, although the extent of the domain is an SoC design choice.
  • Outer Shareable (OS): The memory region is shared between PEs across multiple Inner Shareable domains, which in this topology means across multiple DSUs.

In a system with multiple DSUs connected via the CMN-600 interconnect, the Outer Shareable domain is often the appropriate choice for TLB coherence across all PEs. Because the Outer Shareable domain contains every Inner Shareable domain, an OS-scoped TLBI is the variant most likely to reach all PEs attached to the mesh.
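When checking how a region is actually mapped, the shareability attribute can be read from the translation table entry itself. The sketch below decodes the SH field, bits [9:8], of a stage 1 block or page descriptor in the standard VMSAv8-64 format. The function name is hypothetical, and the sketch assumes a descriptor layout in which those bits carry SH rather than output address bits.

```c
#include <stdint.h>

/*
 * Hypothetical helper: decode SH[1:0], bits [9:8], of a stage 1 block or
 * page descriptor. Encoding 0b01 is reserved.
 */
static const char *descriptor_shareability(uint64_t desc)
{
    switch ((desc >> 8) & 0x3U) {
    case 0x0:
        return "Non-shareable";
    case 0x2:
        return "Outer Shareable";
    case 0x3:
        return "Inner Shareable";
    default:
        return "Reserved";
    }
}
```

The shareability of a mapping and the broadcast scope of a TLBI are separate settings, so a region mapped Inner Shareable does not by itself make every PE receive the invalidation.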

Practical Considerations for Implementing TLB Coherence

When implementing TLB coherence in a multi-PE system, several practical considerations must be taken into account:

  1. System Configuration: Ensure that the system’s shareability domain configuration is correctly defined in the firmware and hardware. This includes setting up the DSUs and interconnect to support the desired shareability domains.

  2. Firmware Updates: Modify the firmware, such as TF-A, to use the appropriate TLBI commands for the system’s shareability domain. This may involve updating the xlat_change_mem_attributes() function or other relevant parts of the firmware.

  3. Testing and Validation: Thoroughly test the system to ensure that TLB coherence is maintained across all PEs. This includes running coherence tests, stress tests, and real-world workloads to validate the system’s behavior.

  4. Debugging Tools: Use debugging tools, such as ARM CoreSight, to monitor TLB behavior and identify any issues with coherence. These tools can provide valuable insights into how TLBI commands are being executed and propagated.

  5. Documentation and Best Practices: Document the system’s shareability domain configuration and the chosen TLBI commands. This ensures that future firmware updates and system modifications maintain TLB coherence.

By following these steps and understanding the underlying principles of TLBI commands and shareability domains, engineers can effectively address TLB coherence issues in ARM Neoverse N1 systems with CMN-600 interconnects.
