Cortex-A53 AXI Read Signal Anomalies with Non-Cacheable Memory
The issue revolves around unexpected behavior in the AXI read signals when accessing non-cacheable memory regions on a Kria KV260 board with a Cortex-A53 processor. Specifically, the AXI read transactions exhibit discrepancies in signal values, such as ARSIZE and ARLEN, when compared to the expected behavior for non-cacheable memory accesses. The problem manifests when performing memory copy operations between non-cacheable memory regions, where the read transactions appear to fetch more data than requested, despite the memory being explicitly marked as non-cacheable.
The Cortex-A53 processor, configured with a 16-byte (128-bit) bus width, initiates AXI transactions to a memory region mapped at 0xA0000000. Each burst transfers (AxLEN + 1) beats of 2^AxSIZE bytes. When the AXI bus width is set to 4 bytes, the write transactions (AWSIZE=2, AWLEN=1) transfer 2 × 4 = 8 bytes as expected, but the read transactions (ARSIZE=2, ARLEN=3) fetch 4 × 4 = 16 bytes rather than the expected 8. When the AXI bus width is increased to 16 bytes, the write transactions (AWSIZE=4, AWLEN=0) occupy a single 16-byte beat, of which the write strobes (WSTRB=FF00) mark only the upper 8 bytes as valid, so 8 bytes are written. The read transactions at this width are reported as ARSIZE=3, ARLEN=0 and as fetching 16 bytes. Note that ARSIZE=3 with ARLEN=0 encodes a single 8-byte beat, so this captured value is worth rechecking: a 16-byte read in one beat would appear as ARSIZE=4.
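The byte counts quoted above follow from the AXI burst encoding, in which a burst moves (AxLEN + 1) beats of 2^AxSIZE bytes each. A minimal sketch of that arithmetic follows; the function name axi_burst_bytes is hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Total bytes moved by one AXI burst: (AxLEN + 1) beats of 2^AxSIZE bytes.
 * axsize is the 3-bit AxSIZE field, axlen the 8-bit AxLEN field.
 * Hypothetical helper for illustration, not part of any vendor API. */
uint32_t axi_burst_bytes(uint8_t axsize, uint8_t axlen)
{
    assert(axsize <= 7);                      /* AxSIZE is a 3-bit field */
    uint32_t bytes_per_beat = 1u << axsize;   /* 1, 2, 4, ... 128 bytes */
    uint32_t beats = (uint32_t)axlen + 1u;    /* AxLEN counts beats minus one */
    return beats * bytes_per_beat;
}
```

With the values from the captures, axi_burst_bytes(2, 1) gives 8, axi_burst_bytes(2, 3) gives 16, axi_burst_bytes(4, 0) gives 16, and axi_burst_bytes(3, 0) gives 8.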
These observations look like a misconfiguration at first, but much of the discrepancy comes from how the AXI attributes are read. ARCACHE and AWCACHE are both 0011, which in the AXI4 encoding means Normal Non-cacheable Bufferable (Normal Non-cacheable Non-bufferable is 0010, and the Device types are 0000 and 0001). The transactions are therefore non-cacheable, as intended. The wider reads are also unlikely to be cache line fills: a Cortex-A53 cache line is 64 bytes, so a line fill would request 64 bytes, not 16. A more plausible explanation is the Normal memory type itself. Reads of Normal memory have no side effects, so the Arm architecture allows the core to merge adjacent accesses and to read more bytes than a single instruction strictly needs.
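To make the meaning of a captured ARCACHE or AWCACHE value explicit, the sketch below classifies the 4-bit field using the AXI4 bit meanings (bit 0 Bufferable, bit 1 Modifiable, bits 3:2 allocate hints). The enum and function names are hypothetical, and the classification is deliberately coarse: it does not distinguish the individual allocate variants.

```c
#include <assert.h>
#include <stdint.h>

/* Coarse AXI4 memory type classes (hypothetical names for illustration). */
enum axi_mem_type {
    AXI_DEVICE,               /* 0000, 0001 */
    AXI_NORMAL_NONCACHEABLE,  /* 0010, 0011 */
    AXI_CACHEABLE,            /* write-through or write-back encodings */
    AXI_RESERVED              /* allocate bits set without Modifiable */
};

/* Classify a 4-bit AxCACHE value. */
enum axi_mem_type axcache_classify(uint8_t axcache)
{
    assert(axcache <= 0xF);
    if ((axcache & 0xC) == 0)                 /* no allocate hints */
        return (axcache & 0x2) ? AXI_NORMAL_NONCACHEABLE : AXI_DEVICE;
    if ((axcache & 0x2) == 0)                 /* allocate without Modifiable */
        return AXI_RESERVED;
    return AXI_CACHEABLE;
}

/* Bit 0 of AxCACHE is the Bufferable attribute. */
int axcache_is_bufferable(uint8_t axcache)
{
    return (axcache & 0x1) != 0;
}
```

For the value seen on the bus, axcache_classify(0x3) returns AXI_NORMAL_NONCACHEABLE and axcache_is_bufferable(0x3) is true.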
Misconfigured AXI Signal Interpretation and Cache Coherency Settings
The root cause most likely lies in how the AXI signal values are interpreted, with misconfigured memory attributes as a second possibility. For Device memory with the non-Gathering attribute, the Cortex-A53 issues accesses of exactly the size each instruction requests. Normal Non-cacheable memory carries no such guarantee: the core may merge accesses or read a wider block, and a memory copy routine may itself use 16-byte load instructions. A 16-byte read for an 8-byte request is therefore not, by itself, evidence that the region is being treated as cacheable.
One possible cause is that the memory attributes are not what was intended. The observed ARCACHE and AWCACHE value of 0011 corresponds to Normal Non-cacheable memory, which is what a non-cacheable mapping should produce, so the attribute does appear to reach the AXI interface. The attributes for the region at 0xA0000000 are set with the Xil_SetTlbAttributes function, which updates the translation table entry covering that address. It is still worth confirming that the call covers the whole region being copied and that it runs before the first access, because any part of a buffer left with different attributes would behave differently.
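One way to confirm what attributes a mapping really carries is to decode them at the architectural level. The sketch below follows the Armv8-A encoding: a stage 1 descriptor selects one of eight 8-bit fields in MAIR_ELx through its AttrIndx bits [4:2], and that field encodes the memory type. All function and enum names are hypothetical, and the descriptor and MAIR values would have to be read from the running system.

```c
#include <assert.h>
#include <stdint.h>

/* Coarse memory kinds (hypothetical names for illustration). */
enum mem_kind { MEM_DEVICE, MEM_NORMAL_NONCACHEABLE, MEM_NORMAL_CACHEABLE, MEM_OTHER };

/* AttrIndx field: bits [4:2] of a stage 1 block or page descriptor. */
unsigned descriptor_attr_index(uint64_t descriptor)
{
    return (unsigned)((descriptor >> 2) & 0x7u);
}

/* Extract the 8-bit attribute field selected by index from a MAIR value. */
uint8_t mair_field(uint64_t mair, unsigned index)
{
    assert(index < 8);
    return (uint8_t)((mair >> (8u * index)) & 0xFFu);
}

/* Classify one 8-bit MAIR attribute field. */
enum mem_kind mair_attr_kind(uint8_t attr)
{
    uint8_t outer = (uint8_t)(attr >> 4);
    uint8_t inner = (uint8_t)(attr & 0xFu);

    if (outer == 0)                    /* 0b0000dd00 encodes the Device types */
        return ((inner & 0x3u) == 0) ? MEM_DEVICE : MEM_OTHER;
    if (inner == 0)                    /* Normal outer with inner 0000 is not valid */
        return MEM_OTHER;
    if (outer == 0x4u && inner == 0x4u)
        return MEM_NORMAL_NONCACHEABLE; /* inner and outer Non-cacheable */
    return MEM_NORMAL_CACHEABLE;        /* cacheable at one level or both */
}
```

An attribute byte of 0x44 classifies as Normal Non-cacheable, 0x00 and 0x04 as Device, and 0xFF as cacheable.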
Another potential cause is the difference between the AXI bus width and the memory access size. The Cortex-A53 has a 16-byte bus width, while the AXI interface is configured as 4 or 16 bytes wide. A transfer wider than the bus is carried as a burst of several beats, so on the 4-byte bus a 16-byte read appears as four 4-byte beats (ARSIZE=2, ARLEN=3), and on the 16-byte bus it can complete in a single beat. Width conversion changes the shape of the burst, not the number of bytes requested. The write strobes (WSTRB) exist only on the write channel: they let a write carry fewer valid bytes than the beat width, which is why the 16-byte write beat with WSTRB=FF00 transfers only 8 bytes. Reads have no equivalent, so a read returns every byte of every beat and the master discards what it does not need.
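The WSTRB value can be checked mechanically: each set bit marks one valid byte lane of the write beat. The sketch below counts the valid lanes and finds the lowest one; the function names are hypothetical and the helper assumes a data bus of at most 32 bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Count the byte lanes marked valid in one write beat.
 * bus_bytes is the data bus width in bytes (assumed <= 32 here). */
unsigned wstrb_valid_bytes(uint32_t wstrb, unsigned bus_bytes)
{
    assert(bus_bytes >= 1 && bus_bytes <= 32);
    unsigned count = 0;
    for (unsigned lane = 0; lane < bus_bytes; lane++)
        count += (wstrb >> lane) & 1u;
    return count;
}

/* Index of the lowest valid byte lane, or -1 if no lane is valid. */
int wstrb_first_lane(uint32_t wstrb, unsigned bus_bytes)
{
    assert(bus_bytes >= 1 && bus_bytes <= 32);
    for (unsigned lane = 0; lane < bus_bytes; lane++)
        if ((wstrb >> lane) & 1u)
            return (int)lane;
    return -1;
}
```

For the captured write, wstrb_valid_bytes(0xFF00, 16) is 8 and wstrb_first_lane(0xFF00, 16) is 8: the beat carries 8 valid bytes in lanes 8 through 15.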
Additionally, the memory protection unit (MPU) settings are sometimes suspected. The Xil_SetMPURegion function sets memory region attributes, including cacheability and shareability, but in the Xilinx standalone BSP it belongs to the Cortex-R5 support code. The Cortex-A53 has an MMU rather than an MPU, so on the A53 the translation table attributes are the ones that apply. If MPU-style calls or attributes appear in code intended for the A53, they should be replaced with their MMU equivalents.
Correcting AXI Signal Interpretation and Ensuring Cache Coherency
To resolve the issue, a systematic approach is required to ensure that the AXI signals are correctly interpreted and that the cache coherency settings are properly configured. The following steps outline the troubleshooting process and potential solutions:
-
Verify TLB Settings: Ensure that the translation table entries mark the memory region as non-cacheable. Review the Xil_SetTlbAttributes calls to confirm that the NORM_NONCACHE attribute is applied consistently across every memory region accessed through the AXI interface. Xil_SetMPURegion and the NORM_NSHARED_NCACHE attribute belong to the Cortex-R5 MPU code and do not apply to the Cortex-A53. If the logic behind the AXI interface requires reads of exactly the requested size, consider a Device memory attribute instead, with naturally aligned accesses, since non-Gathering Device accesses are not merged or widened.
-
Check ARCACHE and AWCACHE Signals: Monitor ARCACHE and AWCACHE during the memory accesses and decode them against the AXI4 memory type table. A value of 0011 is Normal Non-cacheable Bufferable, 0010 is Normal Non-cacheable Non-bufferable, and 0000 or 0001 indicates Device memory. Values with either of the upper two bits set indicate a cacheable type; if those appear, the memory attributes need to be reconfigured so that the processor treats the access as non-cacheable.
-
Account for AXI Bus Width: Compare the total bytes of each burst, (AxLEN + 1) × 2^AxSIZE, rather than AxSIZE or AxLEN alone. On a 4-byte bus a wide access is split into several beats, and on a 16-byte bus it can complete in a single beat, but the total is the same. For writes, the WSTRB signals show which byte lanes are valid, so a 16-byte beat with WSTRB=FF00 writes only 8 bytes. Reads have no strobes, so the slave must return full beats, and for Normal memory the total may legitimately exceed what the software asked for.
-
Review AXI Transaction Logs: Analyze the AXI transaction logs to identify any discrepancies in the signal values. The logs should be reviewed to confirm that the ARSIZE, ARLEN, AWSIZE, and AWLEN signals are correctly interpreted and that the transactions align with the expected behavior for non-cacheable memory accesses. Any anomalies in the logs should be investigated further to identify potential misconfigurations or misinterpretations.
-
Consult Xilinx Documentation and Support: Given that the issue is specific to the Kria KV260 board, it is recommended to consult the Xilinx documentation and support resources for additional guidance. The Xilinx AXI implementation might have specific requirements or constraints that need to be considered when configuring the AXI interface for non-cacheable memory accesses. The Xilinx support team might also provide insights into potential issues or bugs related to the AXI implementation on the Kria KV260 board.
-
Use Data Synchronization Barriers for Ordering: A barrier does not change the size of a transaction, so it will not stop the processor from reading more data than requested. The DSB (Data Synchronization Barrier) instruction ensures that all earlier memory accesses have completed before any later instruction executes, which is useful when the copied data must be visible to other logic before that logic is signalled.
-
Perform Simulation and Verification: Finally, perform simulation and verification to validate the AXI signal interpretation and cache coherency settings. The simulation environment should be configured to replicate the behavior observed on the Kria KV260 board, and the AXI transactions should be carefully monitored to ensure that they align with the expected behavior for non-cacheable memory accesses. Any discrepancies in the simulation results should be investigated and addressed before proceeding to hardware testing.
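When the access size on the bus matters, one option the steps above point toward is to replace a general memory copy with a loop that performs one 8-byte load and one 8-byte store per word, followed by a barrier. The sketch below is an assumption-laden illustration, not a confirmed fix: the names copy_words64 and data_sync_barrier are hypothetical, the inline assembly assumes a GCC or Clang compatible compiler, and volatile only fixes the size of the instructions the compiler emits. On Normal Non-cacheable memory the core may still merge or widen the accesses, so exact-size bus transactions would additionally require a Device mapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* DSB SY on AArch64; a compiler-only barrier on other targets so the
 * sketch can also be built on a host machine. */
static inline void data_sync_barrier(void)
{
#if defined(__aarch64__)
    __asm__ volatile("dsb sy" ::: "memory");
#else
    __asm__ volatile("" ::: "memory");
#endif
}

/* Copy n_words 64-bit words using one 8-byte load and one 8-byte store
 * per word. Both pointers must be 8-byte aligned. */
void copy_words64(volatile uint64_t *dst, const volatile uint64_t *src,
                  size_t n_words)
{
    assert(((uintptr_t)dst % 8u) == 0);
    assert(((uintptr_t)src % 8u) == 0);
    for (size_t i = 0; i < n_words; i++)
        dst[i] = src[i];
    data_sync_barrier();   /* wait for the accesses above to complete */
}
```

The barrier here only orders the copy with respect to later instructions; it plays no part in limiting how many bytes each read fetches.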
By following these steps, the issue of AXI read signal mismatches on the Kria KV260 board with a Cortex-A53 processor can be systematically addressed. The key is to ensure that the AXI signals are correctly interpreted, the cache coherency settings are properly configured, and the memory attributes are consistently applied across all memory regions accessed by the AXI interface.