ARM N1 SDP L4 Cache Mismatch Between lstopo and CLIDR_EL1
The ARM Neoverse N1 System Development Platform (N1 SDP) is a development platform built around a Neoverse N1 SoC and aimed at high-performance computing workloads. It has a hierarchical cache architecture with L1, L2, and L3 caches, as documented in the N1 SDP technical reference manual (TRM). However, when running Linux on the N1 SDP, the lstopo command reports an 8MB L4 cache, while the CLIDR_EL1 register indicates that L3 is the highest inner cacheable level. This discrepancy raises questions about whether an L4 cache exists in the N1 SDP, how it is configured, and whether it is enabled.
The CLIDR_EL1 register (Cache Level ID Register) is a system register in ARM architectures that describes the cache hierarchy as seen by the core. The value 0xc3000123 read from CLIDR_EL1 decodes to separate instruction and data caches at L1, unified caches at L2 and L3, no cache type at L4 or above, and an Inner Cache Boundary (ICB) field of 3, meaning that L3 is the highest inner cacheable level. This contradicts the lstopo output. The inconsistency could stem from several factors, including misinterpretation of the cache hierarchy, misconfiguration of the cache system, or a discrepancy between the hardware implementation and software tools.
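The decoding of that register value can be checked with a short script. This is a minimal sketch that assumes the Armv8-A field layout of CLIDR_EL1 (seven 3-bit Ctype fields, followed by LoUIS, LoC, LoUU, and ICB); the function and dictionary names are illustrative and do not come from any tool.

```python
# Cache type encodings used by the CtypeN fields of CLIDR_EL1.
CTYPE_NAMES = {
    0b000: "no cache",
    0b001: "instruction cache only",
    0b010: "data cache only",
    0b011: "separate instruction and data caches",
    0b100: "unified cache",
}


def decode_clidr(value):
    """Split a CLIDR_EL1 value into its fields (illustrative helper)."""
    # Ctype1..Ctype7 occupy bits [20:0], three bits per cache level.
    ctypes = [(value >> (3 * i)) & 0b111 for i in range(7)]
    return {
        "ctype": ctypes,
        "louis": (value >> 21) & 0b111,  # Level of Unification Inner Shareable
        "loc": (value >> 24) & 0b111,    # Level of Coherence
        "louu": (value >> 27) & 0b111,   # Level of Unification Uniprocessor
        "icb": (value >> 30) & 0b111,    # Inner Cache Boundary
    }


fields = decode_clidr(0xC3000123)
for level, ctype in enumerate(fields["ctype"], start=1):
    if ctype:
        print(f"L{level}: {CTYPE_NAMES.get(ctype, 'reserved')}")
print(f"LoC={fields['loc']} ICB={fields['icb']}")
```

For 0xc3000123 this lists L1, L2, and L3 only, with LoC and ICB both equal to 3, which matches the reading that CLIDR_EL1 reports nothing at L4.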
The CMN-600 (Coherent Mesh Network) is the coherent interconnect in the N1 SDP and connects the system components, including the caches. Depending on the system design, the CMN-600 can include a system-level cache, which software may present as an additional cache level such as L4. However, the N1 SDP documentation does not explicitly mention an L4 cache, leading to confusion about its presence and functionality.
CMN-600 Configuration and Cache Hierarchy Misalignment
The CMN-600 is a highly configurable interconnect that supports multiple cache levels, including L4 caches, depending on the system design. In the N1 SDP, the CMN-600 is typically configured to support L1, L2, and L3 caches, as documented in the TRM. However, the presence of an L4 cache, as reported by lstopo, suggests that the CMN-600 might have been configured to include an additional cache level.
One possible cause of the discrepancy is that the L4 cache is part of the CMN-600 but is not enabled by default. The CMN-600 configuration registers (referred to here as CMN_CONFIG_REG; the actual register names and bit fields should be confirmed in the CMN-600 TRM) control the enablement and configuration of additional cache levels. If the L4 cache is not enabled in the CMN-600 configuration, the CLIDR_EL1 register would not reflect its presence, leading to the observed inconsistency.
Another potential cause is a misalignment between the hardware implementation and the software tools. The lstopo command relies on system topology information provided by the Linux kernel, which might interpret the CMN-600 configuration differently. On arm64 the kernel can take cache information from firmware descriptions (device tree or the ACPI PPTT) in addition to CLIDR_EL1, so a level that is absent from CLIDR_EL1 can still appear in the lstopo output. If the Linux kernel is not aware of the CMN-600 configuration, or if there is a bug in the kernel’s interpretation of the cache hierarchy, lstopo might incorrectly report the presence of an L4 cache.
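To see what the kernel itself exposes, independently of lstopo, the per-CPU cache entries in sysfs can be listed directly. The sketch below assumes the standard Linux layout under /sys/devices/system/cpu/cpu0/cache, where each index* directory contains level, type, and size files; the function names are illustrative.

```python
from pathlib import Path


def _read_field(path):
    """Return the stripped contents of a sysfs file, or 'unknown' if absent."""
    return path.read_text().strip() if path.exists() else "unknown"


def read_cache_levels(cpu_cache_dir):
    """Return sorted (level, type, size) tuples for each index* entry."""
    entries = []
    for index_dir in Path(cpu_cache_dir).glob("index*"):
        level = _read_field(index_dir / "level")
        entries.append((
            int(level) if level.isdigit() else -1,  # -1 marks an unreadable level
            _read_field(index_dir / "type"),
            _read_field(index_dir / "size"),
        ))
    return sorted(entries)


if __name__ == "__main__":
    # Prints nothing on systems that do not expose this directory.
    for level, ctype, size in read_cache_levels("/sys/devices/system/cpu/cpu0/cache"):
        print(f"L{level} {ctype}: {size}")
```

If an entry with level 4 shows up here, the L4 cache in the lstopo output originates from the kernel's cache information rather than from lstopo itself.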
Additionally, the CLIDR_EL1 register cannot be expected to track runtime changes. It is a read-only identification register whose value is fixed by the implementation, so it does not reflect changes to the cache hierarchy made after boot. If the L4 cache is enabled after boot, the CLIDR_EL1 register would still indicate that the L3 cache is the highest inner cacheable level.
Enabling and Verifying L4 Cache in ARM N1 SDP
To resolve the discrepancy between lstopo and CLIDR_EL1, the first step is to verify the CMN-600 configuration. The CMN-600 configuration registers should be checked to determine if the L4 cache is enabled. The CMN_CONFIG_REG register controls the enablement of additional cache levels, and its value should be examined to confirm whether the L4 cache is part of the system design.
If the L4 cache is not enabled, it can be enabled by setting the appropriate bits in the CMN_CONFIG_REG register. The exact bit fields for enabling the L4 cache can be found in the CMN-600 technical reference manual. Once the L4 cache is enabled, the system should be rebooted to ensure that the CLIDR_EL1 register is updated to reflect the new cache hierarchy.
After enabling the L4 cache, the CLIDR_EL1 register should be read again to verify that it now indicates the L4 cache as the highest inner cacheable level. The value of CLIDR_EL1 should be compared with the expected value based on the CMN-600 configuration. If the CLIDR_EL1 register still does not reflect the L4 cache, further investigation is required to determine if there is a hardware or software issue.
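The comparison described above can be sketched as a small check that lists the cache levels software reports but CLIDR_EL1 does not. It assumes the Armv8-A layout of the CtypeN fields; the function names and the list of reported levels are hypothetical inputs, for example levels copied by hand from the lstopo output.

```python
def clidr_levels(value):
    """Cache levels (1..7) for which CLIDR_EL1 reports a non-zero cache type."""
    return {n for n in range(1, 8) if (value >> (3 * (n - 1))) & 0b111}


def levels_missing_from_clidr(value, reported_levels):
    """Levels that software reports but that have no CtypeN entry in CLIDR_EL1."""
    return sorted(set(reported_levels) - clidr_levels(value))


# Levels 1-4 as reported by lstopo in this case, against the value read earlier.
print(levels_missing_from_clidr(0xC3000123, [1, 2, 3, 4]))  # [4]
```

An empty result would mean that the two views agree; any level in the result is the one that needs further investigation.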
The Linux kernel should also be updated to ensure that it correctly interprets the CMN-600 configuration and reports the cache hierarchy accurately. The kernel’s cache detection logic might need to be modified to account for the L4 cache, especially if it is dynamically enabled. The kernel source code should be reviewed to identify any potential issues with cache detection and reporting.
In addition to enabling and verifying the L4 cache, performance testing should be conducted to evaluate its impact. An L4 cache can improve performance for workloads whose working set exceeds the L3 capacity but fits within L4, because more accesses are served without going to main memory. Benchmarks should be run with and without the L4 cache enabled to quantify the difference.
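A with/without comparison can be summarized with a small timing helper. This is only a sketch: the helper names are illustrative, and any real workload is a placeholder to be replaced by an actual benchmark run once in each cache configuration. Interpreter overhead makes Python unsuitable for measuring cache latency directly, so the helpers are meant for timing an external or long-running workload.

```python
import statistics
import time


def median_runtime(workload, repeats=5):
    """Run a callable several times and return the median wall-clock time in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def relative_change(baseline_s, candidate_s):
    """Fractional change of candidate versus baseline; negative means faster."""
    return (candidate_s - baseline_s) / baseline_s


# Example with made-up timings: 2.0 s without the L4 cache, 1.5 s with it.
print(f"{relative_change(2.0, 1.5):+.0%}")  # -25%
```

Using the median rather than the mean keeps a single slow run, for example one disturbed by another process, from dominating the result.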
Finally, if the L4 cache is not required for the specific workload, it can be disabled to save power and reduce complexity. The CMN-600 configuration registers should be updated to disable the L4 cache, and the system should be rebooted to ensure that the cache hierarchy is correctly updated. The CLIDR_EL1 register should be read again to confirm that the L3 cache is now the highest inner cacheable level.
In conclusion, the discrepancy between lstopo and CLIDR_EL1 in the ARM N1 SDP can be resolved by carefully examining and configuring the CMN-600 interconnect. The L4 cache, if present, should be enabled and verified through the CMN-600 configuration registers and the CLIDR_EL1 register. The Linux kernel should also be updated to ensure accurate cache detection and reporting. Performance testing should be conducted to evaluate the impact of the L4 cache on system performance, and the cache should be disabled if not required. By following these steps, the cache hierarchy in the ARM N1 SDP can be correctly configured and verified, ensuring optimal system performance and functionality.