ARM Cortex-M7 Atomic Operation Faults on Non-Cacheable Memory

Issue Overview: LDREX Bus Faults in Non-Cacheable Memory Regions

The ARM Cortex-M7 processor is a high-performance microcontroller core designed for real-time and embedded applications. One of its key features is the Memory Protection Unit (MPU), which allows developers to define memory regions with specific attributes such as cacheability, shareability, and access permissions. However, a critical issue arises when attempting to perform atomic operations, such as LDREX (Load Exclusive), on non-cacheable memory regions configured as shareable. Specifically, when a region of SRAM is marked as non-cacheable and shareable using the MPU, the first atomic operation on that memory region results in a bus fault. This issue is particularly problematic in scenarios where DMA buffers are placed in non-cacheable memory to ensure data consistency between the processor and peripherals.

The fault occurs regardless of the memory location, whether it is in DTCM (Data Tightly Coupled Memory) or SRAM1/SRAM2. The fault is consistently triggered by the LDREX instruction, which is used in atomic operations to implement synchronization primitives such as mutexes in RTOS environments. The issue is not limited to a specific chip manufacturer, as it has been observed in STM32H7 and i.MXRT1064 processors, both of which are based on the Cortex-M7 architecture.
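To make the trigger concrete, here is a minimal sketch of the kind of synchronization primitive that ends up issuing exclusive accesses. The `demo_lock_t` type and the `demo_lock_*` functions are hypothetical names for illustration and are not taken from any RTOS. The sketch uses C11 atomics, and it assumes a toolchain that lowers the test-and-set to a load-exclusive/store-exclusive retry loop on ARMv7-M targets, which common compilers typically do.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type for illustration only. */
typedef struct {
    atomic_flag taken;
} demo_lock_t;

#define DEMO_LOCK_INIT { ATOMIC_FLAG_INIT }

/* Try to take the lock without blocking; returns true on success.
 * Assumption: on an ARMv7-M target the compiler lowers this test-and-set
 * to an exclusive load/store retry loop on the address of `taken`, so the
 * MPU attributes of the region holding the lock decide whether it faults. */
bool demo_lock_try_acquire(demo_lock_t *lock)
{
    return !atomic_flag_test_and_set_explicit(&lock->taken,
                                              memory_order_acquire);
}

/* Release the lock so that another context can acquire it. */
void demo_lock_release(demo_lock_t *lock)
{
    atomic_flag_clear_explicit(&lock->taken, memory_order_release);
}
```

If a lock object like this is placed in a region with the attributes described above, for example next to a DMA buffer, the first call to `demo_lock_try_acquire` would be the point where the fault appears.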

The root cause of this issue lies in the interaction between the Cortex-M7’s memory system and the MPU configuration. The Cortex-M7 expects atomic operations to be performed on memory regions with specific attributes, and deviations from these expectations can lead to undefined behavior, including bus faults. This issue is exacerbated when the memory region is marked as shareable, as the Cortex-M7 does not support hardware coherency for non-cacheable memory regions.

Possible Causes: MPU Configuration and Memory Attribute Mismatch

The bus fault during atomic operations on non-cacheable memory regions can be attributed to several factors related to the MPU configuration and the Cortex-M7’s memory system. Below are the key factors contributing to this issue:

  1. Incorrect MPU Configuration for Atomic Operations: The ARMv7-M architecture specifies that LDREX and STREX (Store Exclusive) operations must be performed only on memory regions with the Normal memory attribute. If the memory region is configured with attributes that do not meet the processor's requirements for exclusive accesses, the processor may generate a bus fault. On the Cortex-M7, a region marked as non-cacheable and shareable (TEX=1, C=0, B=0, S=1) fails those requirements. Note that TEX=1, C=0, B=0 is itself a Normal, non-cacheable encoding, so the deciding factor is the S=1 bit rather than the memory type.

  2. Lack of Hardware Coherency Support: The Cortex-M7 does not support hardware coherency for non-cacheable memory regions. When a memory region is marked as shareable, the processor expects hardware coherency mechanisms to ensure data consistency between multiple bus masters. However, since the Cortex-M7 lacks this support, the LDREX instruction fails, resulting in a bus fault. This is particularly evident in multi-core systems, where the Cortex-M7 core faults while the Cortex-M4 core does not.

  3. Misalignment Between MPU and Memory System Expectations: The Cortex-M7’s memory system expects atomic operations to be performed on memory regions that are cacheable or non-cacheable but not shareable. When a memory region is configured as non-cacheable and shareable, the memory system cannot guarantee the atomicity of the operation, leading to a fault. This misalignment between the MPU configuration and the memory system’s expectations is a key contributor to the issue.

  4. DMA Buffer Placement and Cache Invalidation: In many embedded systems, DMA buffers are placed in non-cacheable memory to avoid cache coherency issues. However, if the MPU configuration for these buffers does not meet the requirements for atomic operations, any lock or counter that shares the region with the buffers will trigger a bus fault. Additionally, improper cache invalidation or cleaning before performing atomic operations can exacerbate the issue.
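The attribute combinations discussed in the list above can be checked mechanically. The following is a minimal sketch, assuming the standard ARMv7-M TEX/C/B encodings; the names `mem_type_t`, `classify_region`, and `matches_faulting_combination` are hypothetical. It decodes only the common encodings and flags the combination this article identifies as faulting (Normal, non-cacheable, S=1).

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified memory types; names are illustrative, not from CMSIS. */
typedef enum {
    MEM_STRONGLY_ORDERED,
    MEM_DEVICE,
    MEM_NORMAL_NONCACHEABLE,
    MEM_NORMAL_CACHEABLE,
    MEM_OTHER   /* reserved or implementation-defined, not decoded here */
} mem_type_t;

/* Decode the common ARMv7-M TEX/C/B encodings (TEX values 0 to 2 only). */
mem_type_t classify_region(uint8_t tex, bool c, bool b)
{
    if (tex == 0 && !c) {
        return b ? MEM_DEVICE : MEM_STRONGLY_ORDERED;
    }
    if (tex == 0 && c) {
        return MEM_NORMAL_CACHEABLE;       /* write-through or write-back */
    }
    if (tex == 1 && !c && !b) {
        return MEM_NORMAL_NONCACHEABLE;
    }
    if (tex == 1 && c && b) {
        return MEM_NORMAL_CACHEABLE;       /* write-back, write-allocate */
    }
    if (tex == 2 && !c && !b) {
        return MEM_DEVICE;                 /* non-shareable Device */
    }
    return MEM_OTHER;
}

/* True for the combination described above as faulting on LDREX:
 * Normal, non-cacheable memory with the shareable bit set. */
bool matches_faulting_combination(uint8_t tex, bool c, bool b, bool s)
{
    return s && classify_region(tex, c, b) == MEM_NORMAL_NONCACHEABLE;
}
```

For the configuration cited in cause 1, `matches_faulting_combination(1, false, false, true)` returns true, while the same encoding with S=0 returns false. A check like this could run in a debug build over the table of MPU regions before the MPU is enabled.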

Troubleshooting Steps, Solutions & Fixes: Implementing Correct MPU Configuration and Memory Management

To resolve the issue of bus faults during atomic operations on non-cacheable memory regions, developers must carefully configure the MPU and ensure that the memory attributes align with the Cortex-M7’s requirements. Below are the detailed steps and solutions to address this issue:

  1. Verify MPU Configuration for Atomic Operations: Ensure that the memory region used for atomic operations is configured with the correct attributes. The memory region should be marked as Normal memory, and the cacheability and shareability attributes should be set appropriately. For atomic operations, the memory region should not be marked as shareable (S=0). The following table summarizes the recommended MPU configuration for atomic operations:

    Attribute            Value for atomic operations
    TypeExtField (TEX)   1 (Normal memory)
    Bufferable (B)       0 (not bufferable)
    Cacheable (C)        0 (not cacheable)
    Shareable (S)        0 (not shareable)

    Example MPU configuration code (STM32 HAL):

    MPU_Region_InitTypeDef MPU_InitStruct = {0};

    HAL_MPU_Disable();                                        // Disable the MPU while regions are changed
    MPU_InitStruct.Enable = MPU_REGION_ENABLE;
    MPU_InitStruct.Number = MPU_REGION_NUMBER0;
    MPU_InitStruct.BaseAddress = 0x20000000;                  // Must be aligned to the region size
    MPU_InitStruct.Size = MPU_REGION_SIZE_1MB;
    MPU_InitStruct.SubRegionDisable = 0x00;
    MPU_InitStruct.TypeExtField = MPU_TEX_LEVEL1;             // TEX=1 with C=0, B=0: Normal, non-cacheable
    MPU_InitStruct.AccessPermission = MPU_REGION_FULL_ACCESS;
    MPU_InitStruct.DisableExec = MPU_INSTRUCTION_ACCESS_DISABLE;
    MPU_InitStruct.IsBufferable = MPU_ACCESS_NOT_BUFFERABLE;
    MPU_InitStruct.IsCacheable = MPU_ACCESS_NOT_CACHEABLE;
    MPU_InitStruct.IsShareable = MPU_ACCESS_NOT_SHAREABLE;    // S=0 so that LDREX/STREX do not fault
    HAL_MPU_ConfigRegion(&MPU_InitStruct);
    HAL_MPU_Enable(MPU_PRIVILEGED_DEFAULT);                   // Privileged code keeps the default memory map
    
  2. Avoid Shareable Attribute for Non-Cacheable Memory: Since the Cortex-M7 does not support hardware coherency for non-cacheable memory regions, the shareable attribute should be avoided for such regions. Instead, use software mechanisms to ensure data consistency between multiple bus masters. For example, use memory barriers or explicit cache management instructions to synchronize data access.

  3. Use Device Memory Attribute for DMA Buffers: If the non-cacheable memory region is used only for DMA buffers, consider marking the region as Device memory instead of Normal memory. In the ARMv7-M encoding, TEX=0, C=0, B=1 selects Shareable Device memory (the S bit is ignored for this encoding), while TEX=2, C=0, B=0 selects Non-shareable Device memory. Device memory is suited to peripheral-style access, and a region configured this way should not hold data that is the target of atomic operations. Use this approach with caution, as Device memory has different access characteristics from Normal memory.

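  4. Implement Data Synchronization Barriers: When performing atomic operations on non-cacheable memory regions, use barriers to enforce the required order of memory accesses. A data synchronization barrier (DSB) completes all outstanding memory accesses before execution continues. An instruction synchronization barrier (ISB) flushes the pipeline so that later instructions are fetched after the barrier, which matters after changing the MPU configuration. Barriers are particularly important in multi-core systems or when using DMA. Note that barriers only order accesses; they do not change a region's memory attributes. Example:

    __DSB(); // Wait until all previous memory accesses have completed
    __ISB(); // Flush the pipeline so that following instructions see the new state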
    
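  5. Validate Memory Alignment and Access Permissions: Ensure that the addresses used for atomic operations are naturally aligned: word (32-bit) aligned for LDREX and STREX, and halfword aligned for LDREXH and STREXH. A misaligned exclusive access raises a UsageFault rather than a bus fault, so check the fault status registers to tell the two cases apart. Additionally, verify that the MPU configuration grants the necessary access permissions (read/write) for the memory region.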

  6. Test and Debug with Minimal Configuration: Start with a minimal MPU configuration that only includes the memory regions required for atomic operations. Gradually add additional regions and test for bus faults. This approach helps isolate the issue and identify any misconfigurations.

  7. Consult Processor-Specific Documentation: Refer to the processor-specific documentation and application notes for additional guidance on MPU configuration and memory management. For example, the STM32H7 series has specific recommendations for MPU settings in its reference manual and application notes.
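Steps 1 and 5 above can be sketched as host-testable helpers. This is a minimal sketch, not vendor code: `region_attr_t`, `rasr_encode`, and `exclusive_access_aligned` are hypothetical names. The bit positions follow the ARMv7-M MPU_RASR register layout, which is the register that fields such as TEX, S, C, and B are ultimately written to.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* ARMv7-M MPU_RASR field positions. */
#define RASR_ENABLE_BIT  0u
#define RASR_SIZE_SHIFT  1u   /* SIZE field, bits [5:1]  */
#define RASR_B_SHIFT     16u
#define RASR_C_SHIFT     17u
#define RASR_S_SHIFT     18u
#define RASR_TEX_SHIFT   19u  /* TEX field, bits [21:19] */
#define RASR_AP_SHIFT    24u  /* AP field, bits [26:24]  */
#define RASR_XN_SHIFT    28u

/* Illustrative description of one MPU region. */
typedef struct {
    uint8_t tex;        /* 3-bit type extension */
    bool    c, b, s;    /* cacheable, bufferable, shareable */
    uint8_t ap;         /* 3-bit access permission, 3 = full access */
    bool    xn;         /* execute never */
    uint8_t size_log2;  /* region size is 2^size_log2 bytes (5..32) */
} region_attr_t;

/* Pack the attributes into an MPU_RASR value with the region enabled. */
uint32_t rasr_encode(const region_attr_t *r)
{
    uint32_t size_field = (uint32_t)r->size_log2 - 1u; /* SIZE = log2(bytes) - 1 */

    return ((uint32_t)r->xn          << RASR_XN_SHIFT)
         | ((uint32_t)(r->ap & 7u)   << RASR_AP_SHIFT)
         | ((uint32_t)(r->tex & 7u)  << RASR_TEX_SHIFT)
         | ((uint32_t)r->s           << RASR_S_SHIFT)
         | ((uint32_t)r->c           << RASR_C_SHIFT)
         | ((uint32_t)r->b           << RASR_B_SHIFT)
         | ((size_field & 0x1Fu)     << RASR_SIZE_SHIFT)
         | (1u << RASR_ENABLE_BIT);
}

/* Natural-alignment check for byte, halfword, and word exclusive accesses. */
bool exclusive_access_aligned(uintptr_t addr, size_t access_bytes)
{
    if (access_bytes != 1 && access_bytes != 2 && access_bytes != 4) {
        return false;
    }
    return (addr % access_bytes) == 0;
}
```

With the recommended attributes from step 1 (TEX=1, C=0, B=0, S=0, full access, execute never, 1 MB region), `rasr_encode` yields `0x13080027`. Comparing that value against the MPU_RASR register read back in a debugger is one way to confirm that the intended configuration actually reached the hardware.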

By following these steps and ensuring that the MPU configuration aligns with the Cortex-M7’s memory system requirements, developers can avoid bus faults during atomic operations on non-cacheable memory regions. Proper memory management and synchronization are critical to achieving reliable and efficient system performance in embedded applications.
