Cortex-A15 Deadlock Due to Shared L2 Cache State in ACE Systems
The Cortex-A15 processor, particularly in revision r2p4, is susceptible to a rare but critical deadlock when it operates in an ACE (AXI Coherency Extensions) system. The deadlock arises under specific conditions involving L2 cache lines held in a shared state, multiple caching masters, and the interplay between ReadUnique requests and snoop operations. The issue is documented as ARM erratum 814169 and can lock up the system, particularly in configurations with several processor clusters or with external caching masters.
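To check whether a given part is the revision named above, the processor's Main ID Register (MIDR) can be decoded. The following is a minimal sketch in Python that assumes the standard ARMv7-A MIDR field layout. The sample value 0x412FC0F4 is constructed from that layout for illustration (implementer 0x41, variant 2, part 0xC0F, revision 4) and was not read from hardware. The helper names are hypothetical.

```python
from typing import NamedTuple

ARM_IMPLEMENTER = 0x41   # ASCII 'A'
CORTEX_A15_PART = 0xC0F  # primary part number of the Cortex-A15


class Midr(NamedTuple):
    implementer: int  # bits [31:24]
    variant: int      # bits [23:20], the "r" number
    part: int         # bits [15:4]
    revision: int     # bits [3:0], the "p" number


def decode_midr(value: int) -> Midr:
    """Split a 32-bit MIDR value into its fields."""
    return Midr(
        implementer=(value >> 24) & 0xFF,
        variant=(value >> 20) & 0xF,
        part=(value >> 4) & 0xFFF,
        revision=value & 0xF,
    )


def is_cortex_a15_rev(value: int, major: int, minor: int) -> bool:
    """True if the MIDR value identifies a Cortex-A15 at r<major>p<minor>."""
    m = decode_midr(value)
    return (
        m.implementer == ARM_IMPLEMENTER
        and m.part == CORTEX_A15_PART
        and m.variant == major
        and m.revision == minor
    )


sample = 0x412FC0F4  # illustrative value, see the note above
fields = decode_midr(sample)
print(f"r{fields.variant}p{fields.revision}", is_cortex_a15_rev(sample, 2, 4))
```

On an ARM Linux system the same fields are usually reported in /proc/cpuinfo as "CPU implementer", "CPU variant", "CPU part", and "CPU revision", so the check can often be done without privileged register access.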
The deadlock occurs when a series of store or PLDW (Preload Data with intent to Write) instructions hit L2 cache lines that are held in a shared state. Lines are typically in this state when another caching master, such as a second processor cluster or a coherent accelerator, also holds a copy. The erratum specifies that the deadlock is triggered when six ReadUnique requests are unable to complete in the interconnect until a snoop request to the same tag bank completes. This dependency between ReadUnique and snoop operations creates a circular wait, and the result is a deadlock.
Understanding the conditions that lead to this deadlock requires a look at the Cortex-A15’s cache coherency mechanisms, the role of the ACE system, and the interactions between caching masters. The following sections cover the root cause of the issue, the conditions under which it manifests, and strategies to mitigate it.
Conditions Leading to Shared L2 Cache State and Deadlock
The deadlock described in erratum 814169 depends on several specific conditions related to the Cortex-A15’s cache architecture and the ACE system: the presence of multiple caching masters, L2 cache lines held in the shared state, and the timing dependencies between ReadUnique requests and snoop operations.
Shared L2 Cache State and Multiple Caching Masters
An L2 cache line in the Cortex-A15 enters a shared state when another caching master may also hold a copy of it. In an ACE system, caching masters include not only Cortex-A15 clusters but also accelerators or other bus masters that cache data coherently. The shared state is part of the coherency protocol that gives all caching masters a consistent view of memory. It becomes problematic, however, when combined with specific sequences of store or PLDW instructions.
The erratum specifies that the deadlock occurs when a series of store or PLDW instructions hit L2 cache lines in the shared state. This implies that the lines in question have been accessed by another caching master and are marked as shared in the L2 cache. The shared state is a prerequisite for the deadlock: a write to a shared line requires a coherency transaction on the interconnect, and other caching masters can generate snoop requests for the same line.
ReadUnique Requests and Snoop Dependencies
The deadlock is triggered when six ReadUnique requests are unable to complete in the interconnect until a snoop request to the same tag bank completes. ReadUnique is an ACE coherency transaction that fetches a cache line and leaves the requesting master with the only valid cached copy. A master typically issues it when it intends to modify a line, so that any shared copies in other caches are invalidated first.
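The effect of a unique-ownership read on the other caches can be illustrated with a toy model. This is a deliberately simplified sketch, not the ACE state machine: it tracks only three states (invalid, shared, unique), ignores dirty data, and treats every snoop as completing instantly. The class and master names are hypothetical.

```python
from enum import Enum


class State(Enum):
    INVALID = "I"
    SHARED = "S"
    UNIQUE = "U"


class ToyInterconnect:
    """Tracks, per caching master, the state of each cached line address."""

    def __init__(self, masters):
        self.lines = {m: {} for m in masters}
        self.snoops = []  # log of (snooped_master, addr)

    def state(self, master, addr):
        return self.lines[master].get(addr, State.INVALID)

    def read_shared(self, master, addr):
        # A holder in UNIQUE is snooped and downgraded to SHARED.
        for other, held in self.lines.items():
            if other != master and held.get(addr) is State.UNIQUE:
                self.snoops.append((other, addr))
                held[addr] = State.SHARED
        # Simplification: the requester always ends up SHARED here.
        self.lines[master][addr] = State.SHARED

    def read_unique(self, master, addr):
        # Every other holder is snooped and its copy invalidated.
        for other, held in self.lines.items():
            if other != master and self.state(other, addr) is not State.INVALID:
                self.snoops.append((other, addr))
                held[addr] = State.INVALID
        self.lines[master][addr] = State.UNIQUE


ic = ToyInterconnect(["a15_cluster", "accelerator"])
ic.read_shared("accelerator", 0x1000)
ic.read_shared("a15_cluster", 0x1000)   # line is now SHARED in both caches
ic.read_unique("a15_cluster", 0x1000)   # store to a shared line
print(ic.state("a15_cluster", 0x1000), ic.state("accelerator", 0x1000))
```

The point of the model is the ordering constraint: the requester cannot be marked unique until the snoop to every other holder has been handled, which is the dependency the deadlock builds on.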
The dependency between ReadUnique requests and snoop operations arises from the way coherency is maintained. When a ReadUnique request is issued, the interconnect must ensure that any shared copies of the cache line are invalidated, which it does by sending snoop requests to the other caching masters. If a snoop request cannot complete (for example, because the target cache is busy or the interconnect is congested), the ReadUnique request stalls until it does.
In the case of erratum 814169, the deadlock occurs because the ReadUnique requests and snoop operations become mutually dependent. The ReadUnique requests cannot complete until the snoop requests complete, but the snoop requests may themselves be blocked by other operations in the system. This circular dependency results in a deadlock, causing the Cortex-A15 core to hang.
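A circular wait of this kind can be expressed as a cycle in a wait-for graph, where an edge from X to Y means "X cannot make progress until Y completes". The sketch below uses a generic depth-first search to find such a cycle. The graph itself is an assumption made for illustration: the six requests waiting on one snoop come from the description above, while the node names and the edges from the snoop back to a request (through a hypothetical "tag_bank" resource) are invented to close the loop and are not taken from the erratum.

```python
def find_cycle(waits_for):
    """Return one cycle in a wait-for graph as a list of nodes, or None."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    colour = {}
    path = []

    def visit(node):
        colour[node] = GREY
        path.append(node)
        for nxt in waits_for.get(node, ()):
            c = colour.get(nxt, WHITE)
            if c == GREY:
                # nxt is already on the current path: a cycle is closed.
                return path[path.index(nxt):] + [nxt]
            if c == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        colour[node] = BLACK
        return None

    for start in list(waits_for):
        if colour.get(start, WHITE) == WHITE:
            found = visit(start)
            if found:
                return found
    return None


# Six outstanding requests all wait on the same snoop.
waits_for = {f"ReadUnique{i}": ["snoop"] for i in range(6)}
# Assumed for illustration: the snoop needs a resource the requests hold.
waits_for["snoop"] = ["tag_bank"]
waits_for["tag_bank"] = ["ReadUnique0"]

print(find_cycle(waits_for))
```

Removing any single edge on the cycle (for instance, letting the snoop proceed without the held resource) makes find_cycle return None, which is the general shape of any fix or mitigation: break one of the dependencies.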
Role of the ACE System
The ACE system plays a critical role in the deadlock condition. ACE is an extension of the AXI protocol that adds support for cache coherency between multiple masters. In an ACE system, the interconnect is responsible for maintaining coherency by propagating snoop requests and ensuring that all masters have a consistent view of the memory.
The erratum states that there is no workaround for this issue in an ACE system with multiple caching masters. This is because the deadlock is fundamentally a result of the coherency mechanisms that such a system relies on. Disabling coherency or modifying its behavior would require significant changes to the system architecture and could introduce other coherency issues.
Mitigating Cortex-A15 Deadlock: Strategies and Solutions
While the erratum states that there is no direct workaround for the deadlock in an ACE system with multiple caching masters, several strategies can reduce the likelihood of the deadlock occurring or mitigate its impact. These strategies include optimizing cache usage, modifying software behavior, and implementing hardware-based solutions.
Optimizing Cache Usage
One approach to mitigating the deadlock is to minimize the conditions under which the L2 cache enters the shared state. This can be achieved by carefully managing cache usage and avoiding situations where multiple caching masters frequently access the same cache lines.
For example, software can be designed to use non-cacheable memory for data that is frequently shared between multiple masters. This avoids the need for cache coherency operations and reduces the likelihood of the L2 cache entering the shared state. Alternatively, software can use cache maintenance operations to explicitly invalidate or clean cache lines, ensuring that they are not left in the shared state for extended periods.
Modifying Software Behavior
Another strategy is to modify the behavior of the software running on the Cortex-A15 cores to avoid the specific sequences of store or PLDW instructions that trigger the deadlock. This requires a detailed understanding of the software’s memory access patterns and the ability to identify and modify problematic code sequences.
For example, software can be modified to use fewer store or PLDW instructions in sequences that access shared cache lines. Alternatively, software can use memory barriers or other synchronization mechanisms in an attempt to keep ReadUnique requests and snoop operations from becoming mutually dependent. Measures of this kind reduce risk; they are not a workaround in the sense of the erratum.
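Identifying problematic sequences usually starts from a memory access trace. The sketch below scans a trace for windows in which stores or PLDW accesses hit at least six distinct shared lines. Everything about the trace is an assumption: the tuple format (operation, address, L2 state) is hypothetical, as is the idea that a simulator or instrumented model can supply the L2 state per access. The threshold of six mirrors the number of ReadUnique requests mentioned in the text, and the 16-access window is an arbitrary choice.

```python
from collections import deque

LINE_BYTES = 64                 # Cortex-A15 cache line size
WRITE_OPS = {"store", "pldw"}   # accesses that need unique ownership


def flag_risky_windows(trace, threshold=6, window=16):
    """Return trace indices at which the last `window` accesses contain
    write-intent hits to at least `threshold` distinct shared lines."""
    recent = deque(maxlen=window)  # line number per access, or None
    flagged = []
    for i, (op, addr, l2_state) in enumerate(trace):
        hit = op in WRITE_OPS and l2_state == "S"
        recent.append(addr // LINE_BYTES if hit else None)
        distinct = {line for line in recent if line is not None}
        if len(distinct) >= threshold:
            flagged.append(i)
    return flagged


# Six stores to six different shared lines, back to back.
burst = [("store", 0x1000 + LINE_BYTES * i, "S") for i in range(6)]
print(flag_risky_windows(burst))
```

A flagged index is only a hint that the code around it deserves a closer look. Whether the hardware condition is actually reached depends on timing in the interconnect, which a software trace does not capture.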
Implementing Hardware-Based Solutions
In some cases, it may be possible to implement hardware-based solutions to mitigate the deadlock. For example, the system interconnect can be designed to prioritize snoop requests and ensure that they are completed in a timely manner. This reduces the likelihood of ReadUnique requests stalling due to incomplete snoop operations.
Another hardware-based solution is a change to the Cortex-A15’s own cache coherency logic so that the conditions leading to the deadlock cannot arise. This could involve changes to the way ReadUnique requests and snoop operations are handled, or additional checks to detect and resolve potential deadlocks. A change of this kind has to be made in the processor implementation itself, so it does not help systems built on existing silicon.
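The idea of prioritizing snoops can be shown with a toy arbiter: snoops are always granted before ordinary requests, and arrival order is preserved within each class. This is a conceptual sketch only. The class, the two priority levels, and the transaction names are hypothetical and do not describe any real interconnect.

```python
import heapq
from itertools import count

SNOOP, REQUEST = 0, 1  # lower value is granted first


class ToyArbiter:
    """Grants pending transactions by class priority, then arrival order."""

    def __init__(self):
        self._heap = []
        self._seq = count()  # tie-breaker that keeps FIFO order per class

    def submit(self, kind, name):
        heapq.heappush(self._heap, (kind, next(self._seq), name))

    def grant(self):
        """Return the next transaction name, or None if nothing is pending."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


arb = ToyArbiter()
arb.submit(REQUEST, "ReadUnique0")
arb.submit(REQUEST, "ReadUnique1")
arb.submit(SNOOP, "snoop0")
arb.submit(REQUEST, "ReadUnique2")
order = [arb.grant() for _ in range(4)]
print(order)
```

Strict priority like this can starve the lower class under heavy snoop traffic, so a real design would have to bound that; the sketch ignores the problem.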
Disabling ACE Coherency
As a last resort, it may be possible to disable the ACE coherency mechanisms entirely. This would prevent the deadlock from occurring but would also require software to manually manage cache coherency between multiple masters. This approach is not recommended for most systems, as it introduces significant complexity and can lead to other coherency issues.
Conclusion
The Cortex-A15 deadlock described in erratum 814169 is a complex issue that arises from the interaction between the processor’s cache coherency mechanisms and the ACE system. While there is no direct workaround in systems with multiple caching masters, careful optimization of cache usage, modification of software behavior, and hardware-based measures can reduce the risk of deadlock. In extreme cases, disabling ACE coherency may be necessary, but this approach should be used with caution because software must then manage coherency itself. Understanding the conditions that lead to the deadlock makes it possible to choose mitigations that fit the system and to improve the reliability of Cortex-A15-based designs.