ARM Cortex-M Cache Coherency Problems During Kernel Startup with FAULTMASK Enabled

When initializing an ARM Cortex-M-based system, particularly during the kernel startup phase, developers often set FAULTMASK so that no interrupts disrupt the critical initialization process. On Armv7-M, setting FAULTMASK raises the execution priority to -1, which masks every exception except the NMI. This is a common practice in real-time operating systems (RTOS) to guarantee a deterministic startup sequence. However, a subtle and often overlooked issue arises when the Memory Protection Unit (MPU) is configured during this phase and the MPU_CTRL.HFNMIENA bit is not set. This bit controls whether the MPU stays active inside the HardFault and NMI handlers and while FAULTMASK is set. If MPU_CTRL.HFNMIENA is clear, code running at those priorities uses the default memory map instead of the MPU regions, so the cacheability attributes in effect can differ from the ones the MPU configuration intends, with unpredictable consequences for cache coherency.

The core issue manifests when the system initializes variables or configures memory regions in internal RAM while FAULTMASK is enabled. If the MPU is not properly configured to handle cacheability attributes during this phase, data written to memory may end up in the cache instead of being written directly to RAM. This becomes problematic when Direct Memory Access (DMA) is involved, as DMA transactions bypass the cache and operate directly on physical memory. Consequently, DMA operations may not see the data written by the CPU, leading to data integrity issues and system failures.
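
The failure mode described above can be reproduced on a host machine with a toy model. The sketch below is not target code: `line_model_t` and its helper functions are hypothetical names invented for illustration, and the model covers a single 32-byte write-back line rather than a real Cortex-M7 cache.

```c
#include <stdint.h>
#include <string.h>

#define LINE_SIZE 32u  /* Cortex-M7 data cache line size in bytes */

/* Toy model: one RAM line plus one write-back cache line covering it. */
typedef struct {
    uint8_t ram[LINE_SIZE];    /* what a DMA master reads */
    uint8_t cache[LINE_SIZE];  /* what the CPU sees while the line is valid */
    int valid;                 /* cache line holds a copy of the RAM line */
    int dirty;                 /* cache copy is newer than RAM */
} line_model_t;

/* CPU store with a write-back, write-allocate policy: only the cache changes. */
static void cpu_write(line_model_t *m, uint32_t off, uint8_t value)
{
    if (!m->valid) {                        /* allocate: fill the line from RAM */
        memcpy(m->cache, m->ram, LINE_SIZE);
        m->valid = 1;
    }
    m->cache[off % LINE_SIZE] = value;
    m->dirty = 1;
}

/* DMA read: bypasses the cache and sees RAM only. */
static uint8_t dma_read(const line_model_t *m, uint32_t off)
{
    return m->ram[off % LINE_SIZE];
}

/* Clean: write a dirty line back to RAM. */
static void cache_clean(line_model_t *m)
{
    if (m->valid && m->dirty) {
        memcpy(m->ram, m->cache, LINE_SIZE);
        m->dirty = 0;
    }
}

/* Invalidate: drop the line without writing it back; dirty data is lost. */
static void cache_invalidate(line_model_t *m)
{
    m->valid = 0;
    m->dirty = 0;
}
```

In this model a `cpu_write` followed by `dma_read` returns the old RAM contents until `cache_clean` runs, and a `cache_invalidate` on a dirty line throws the CPU's write away.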

This problem is particularly prevalent in systems like the i.MX RT1064, where the cache architecture and MPU configuration play a critical role in ensuring data consistency between the CPU and peripherals. The ARM Cortex-M manual explicitly states that the behavior is UNPREDICTABLE if memory regions are later mapped as non-cacheable while data resides in the cache. This unpredictability can result in silent data corruption, making it a challenging issue to diagnose and resolve.

Memory Barrier Omission and Cache Invalidation Timing During MPU Initialization

The root cause of this issue lies in the timing and sequence of cache management operations relative to MPU configuration and FAULTMASK usage. When FAULTMASK is set, the processor runs privileged code at execution priority -1: all interrupts and configurable exceptions are masked, and only the non-maskable interrupt (NMI) can still preempt. During this phase, if the MPU is enabled without MPU_CTRL.HFNMIENA set, the MPU regions are ignored and the default memory map applies, which may not match the intended cacheability attributes of the memory regions being accessed.
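
The rule for which memory map applies can be written as a small predicate. This is a host-side sketch for reasoning, not target code: the function name `mpu_regions_apply` and the `CTRL_*` macros are made up for this example, although the bit positions follow the Armv7-M MPU_CTRL layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* MPU_CTRL bit positions (Armv7-M). */
#define CTRL_ENABLE   (1u << 0)
#define CTRL_HFNMIENA (1u << 1)

/*
 * Returns true when the MPU regions are used for memory accesses and false
 * when the default memory map is used instead.
 * exec_priority: -2 for NMI, -1 for HardFault or FAULTMASK set, >= 0 otherwise.
 */
static bool mpu_regions_apply(uint32_t mpu_ctrl, int exec_priority)
{
    if ((mpu_ctrl & CTRL_ENABLE) == 0u) {
        return false;                        /* MPU off: default map everywhere */
    }
    if (exec_priority < 0) {
        return (mpu_ctrl & CTRL_HFNMIENA) != 0u;
    }
    return true;
}
```

With only ENABLE set, the predicate is true at priority 0 but false at priority -1, which is exactly the situation during a FAULTMASK-protected startup.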

One of the key contributors to this problem is the omission of memory barriers and cache maintenance steps during the initialization sequence. Memory barriers ensure that memory operations and MPU register writes take effect in the intended order. Cache invalidation discards stale cache lines so that the next read fetches from RAM, and cache cleaning writes dirty lines back to RAM. Without these steps, the following scenarios can occur:

  1. Cache Pollution: Data written to memory during initialization may be cached, even if the memory region is later mapped as non-cacheable. The newest data then sits in dirty cache lines while RAM, which is what DMA reads, still holds the old contents.
  2. DMA Data Inconsistency: DMA transactions that rely on data written by the CPU may fail because the data still resides in the cache and has not been cleaned (flushed) to physical memory.
  3. Unpredictable Memory Access: The ARM Cortex-M architecture does not guarantee the behavior of memory accesses when cacheability attributes are changed dynamically without proper synchronization.

Additionally, the timing of cache maintenance relative to MPU configuration is critical. If the cache is cleaned or invalidated after the MPU regions are programmed but before the new attributes actually take effect (that is, before the MPU is enabled and a DSB/ISB pair has executed), later writes can still allocate cache lines under the old attributes. Conversely, if the maintenance is performed too early, it does not cover lines that are allocated afterwards while the default memory map is still in use.

Implementing Data Synchronization Barriers and Cache Management for Reliable Startup

To address the cache coherency issues during kernel startup with FAULTMASK enabled, a systematic approach to data synchronization and cache management is required. The following steps outline a robust solution to ensure reliable system initialization:

Step 1: Configure MPU_CTRL.HFNMIENA Before Enabling FAULTMASK

Before setting FAULTMASK, ensure that the MPU_CTRL.HFNMIENA bit is set. This keeps the MPU active while FAULTMASK is set (and inside the HardFault and NMI handlers), preserving the intended memory map and cacheability attributes. Write MPU_CTRL only after the regions themselves have been programmed. The following code snippet demonstrates how to configure this bit:

// Enable MPU with HFNMIENA bit set
MPU->CTRL = MPU_CTRL_ENABLE_Msk | MPU_CTRL_HFNMIENA_Msk;

Step 2: Use Data Synchronization Barriers (DSB) and Instruction Synchronization Barriers (ISB)

Data Synchronization Barriers (DSB) and Instruction Synchronization Barriers (ISB) ensure that memory operations and instruction fetches are completed in the correct order. Insert DSB and ISB instructions after configuring the MPU and before enabling FAULTMASK to ensure that all memory operations are synchronized:

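// Configure MPU regions and attributes
MPU->RNR = 0; // Select region 0
MPU->RBAR = 0x20000000 & MPU_RBAR_ADDR_Msk; // Base address; VALID = 0, so RNR selects the region
MPU->RASR = MPU_RASR_ENABLE_Msk           // Enable the region
          | (18UL << MPU_RASR_SIZE_Pos)   // 2^(18+1) = 512 KB (example size, adjust to the target)
          | (3UL << MPU_RASR_AP_Pos)      // Full access (example permission)
          | (1UL << MPU_RASR_TEX_Pos);    // TEX=001, C=0, B=0: Normal, non-cacheable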

// Insert DSB and ISB to synchronize memory operations
__DSB();
__ISB();
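
The RASR value is easy to get wrong by hand, so it can help to compute it with a small helper and check the result on a host. The following is a sketch under assumptions: the helper names `rasr_size_field` and `rasr_normal_noncacheable` are hypothetical, the field positions follow the Armv7-M MPU_RASR layout, and the access permission (full access, execute-never) is only an example choice.

```c
#include <stdint.h>

/* MPU_RASR field positions (Armv7-M). */
#define RASR_ENABLE_POS 0u
#define RASR_SIZE_POS   1u
#define RASR_TEX_POS    19u
#define RASR_AP_POS     24u
#define RASR_XN_POS     28u

/*
 * SIZE field for a power-of-two region: region bytes = 2^(SIZE + 1).
 * Returns 0 for an invalid size (below 32 bytes or not a power of two);
 * the smallest valid SIZE value is 4.
 */
static uint32_t rasr_size_field(uint32_t bytes)
{
    uint32_t n = 0u;

    if (bytes < 32u || (bytes & (bytes - 1u)) != 0u) {
        return 0u;
    }
    while ((1u << n) < bytes) {
        n++;
    }
    return n - 1u;
}

/*
 * RASR value for an enabled Normal, non-cacheable region (TEX=001, C=0, B=0)
 * with full access and execute-never. Returns 0 (region disabled) when the
 * size is invalid.
 */
static uint32_t rasr_normal_noncacheable(uint32_t bytes)
{
    uint32_t size = rasr_size_field(bytes);

    if (size == 0u) {
        return 0u;
    }
    return (1u << RASR_XN_POS)
         | (3u << RASR_AP_POS)
         | (1u << RASR_TEX_POS)
         | (size << RASR_SIZE_POS)
         | (1u << RASR_ENABLE_POS);
}
```

For a 512 KB region this yields 0x13080025. On target, the result would be written to MPU->RASR after selecting the region through MPU->RNR.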

Step 3: Invalidate Cache Before Writing to Memory

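Before the CPU starts using memory regions that may be accessed by DMA, invalidate the cache so that no stale lines for those regions remain. The SCB_InvalidateDCache function, provided by CMSIS-Core for cores with a data cache, invalidates the entire data cache and discards dirty lines without writing them back. Call it only while the cache holds nothing that must be preserved; for a single buffer on a running system, use SCB_InvalidateDCache_by_Addr instead: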

// Invalidate data cache
SCB_InvalidateDCache();

Step 4: Flush Cache After Writing to Memory

After writing to memory regions that may be accessed by DMA, flush the cache (a "clean" in ARM terminology) so that the data is written to physical memory. Use the SCB_CleanDCache function, or SCB_CleanDCache_by_Addr to clean only the buffer in question:

// Flush data cache
SCB_CleanDCache();
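
When only one DMA buffer needs cleaning, the by-address maintenance functions operate on whole cache lines, so the range is best rounded out to 32-byte line boundaries first. The helpers below are a minimal host-checkable sketch; `cache_span_t`, `cache_span`, and `shares_line_with_neighbours` are hypothetical names, and the 32-byte line size is that of the Cortex-M7 data cache.

```c
#include <stdint.h>

#define DCACHE_LINE 32u  /* Cortex-M7 data cache line size in bytes */

typedef struct {
    uint32_t start;  /* line-aligned start address */
    uint32_t size;   /* size in bytes, a multiple of the line size */
} cache_span_t;

/* Round [addr, addr + len) outwards to whole cache lines. */
static cache_span_t cache_span(uint32_t addr, uint32_t len)
{
    cache_span_t s;
    uint32_t end = addr + len;

    s.start = addr & ~(DCACHE_LINE - 1u);
    s.size = ((end + DCACHE_LINE - 1u) & ~(DCACHE_LINE - 1u)) - s.start;
    return s;
}

/*
 * Non-zero when the buffer does not start and end on line boundaries, so
 * invalidating it would also discard unrelated data sharing those lines.
 */
static int shares_line_with_neighbours(uint32_t addr, uint32_t len)
{
    return (addr % DCACHE_LINE) != 0u || (len % DCACHE_LINE) != 0u;
}
```

A buffer for which `shares_line_with_neighbours` returns non-zero is safe to clean but risky to invalidate, which is one reason DMA buffers are usually aligned and padded to the line size.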

Step 5: Verify Cache Coherency with DMA

To verify that the cache coherency measures are effective, perform a test DMA transaction and compare the data transferred by the DMA with the data written by the CPU. This step ensures that the cache management operations are correctly synchronized with the MPU configuration.

Step 6: Disable FAULTMASK After Initialization

Once the initialization sequence is complete, disable FAULTMASK to restore normal interrupt handling. Ensure that all cache management and memory synchronization steps are completed before disabling FAULTMASK:

// Disable FAULTMASK
__set_FAULTMASK(0);
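
The ordering of the steps matters more than any single call, so it can be useful to express the sequence once and check it off target. The sketch below does that with a hook table: `startup_ops_t`, `kernel_startup`, and the recording stubs are all hypothetical, and on real hardware each hook would wrap the CMSIS call named in its comment.

```c
#include <stddef.h>

/* Hypothetical hook table; the comments name the on-target equivalent. */
typedef struct {
    void (*configure_mpu)(void);    /* program regions, then MPU->CTRL = ENABLE | HFNMIENA */
    void (*barriers)(void);         /* __DSB(); __ISB(); */
    void (*set_faultmask)(int on);  /* __set_FAULTMASK(on) */
    void (*init_data)(void);        /* kernel variable initialization */
    void (*clean_dcache)(void);     /* SCB_CleanDCache() */
} startup_ops_t;

/* Startup sequence in the order described in the steps above. */
static void kernel_startup(const startup_ops_t *ops)
{
    ops->configure_mpu();     /* MPU active even at priority -1 */
    ops->barriers();          /* new attributes take effect before any write */
    ops->set_faultmask(1);
    ops->init_data();
    ops->clean_dcache();      /* make CPU writes visible to DMA */
    ops->barriers();
    ops->set_faultmask(0);    /* only after all maintenance has completed */
}

/* Host-side stubs that record the call order as one character per call. */
static char trace[16];
static size_t trace_len;

static void rec(char c)
{
    if (trace_len < sizeof trace - 1u) {
        trace[trace_len++] = c;
    }
    trace[trace_len] = '\0';
}

static void stub_mpu(void)   { rec('M'); }
static void stub_bar(void)   { rec('B'); }
static void stub_fm(int on)  { rec(on ? 'F' : 'f'); }
static void stub_init(void)  { rec('I'); }
static void stub_clean(void) { rec('C'); }

static const startup_ops_t host_ops = {
    stub_mpu, stub_bar, stub_fm, stub_init, stub_clean
};
```

Calling `kernel_startup(&host_ops)` leaves the call order in `trace`, which a unit test can compare against the intended sequence before the code ever runs on hardware.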

By following these steps, developers can ensure reliable system initialization and avoid cache coherency issues during kernel startup. This approach leverages the ARM Cortex-M architecture’s features to maintain data integrity and synchronization between the CPU and peripherals, even in complex embedded systems.
