ARM Cortex-A Multi-Core Boot Failure During Initialization
The issue at hand is a failure to boot multiple cores on an ARM Cortex-A processor using the Fixed Virtual Platform (FVP) and the boot-wrapper-aarch64 software. The system boots successfully with a single core but fails when multiple cores are enabled. The key observation is that only the primary CPU (CPU 0) executes the initialization function cpu_init_bootwrapper() correctly, while the secondary cores appear to stall or fail to initialize.
The cpu_init_bootwrapper() function initializes the CPU cores one at a time. It uses a static variable, cpu_next, to decide whose turn it is. The primary core (CPU 0) first sets up the system by calling init_bootwrapper(), while the secondary cores wait for their turn by executing wfe() (Wait For Event) in a loop until cpu_next matches their logical ID. When a core finishes its initialization, it sets cpu_next to its own ID plus one and wakes the waiting cores with sev() (Send Event), so the next core in line can proceed.
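The ordering rule can be explored on a host machine with a small deterministic model. This is a sketch written for illustration, not boot-wrapper code: struct handoff, handoff_step(), handoff_run() and the four-core count are hypothetical. Each call to handoff_step() stands for one core getting a chance to run; a core whose turn has not come reports no progress, where the real code would still be looping on wfe().

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4  /* hypothetical core count for the model */

/* Host-side model of the cpu_next handoff (not boot-wrapper code). */
struct handoff {
    unsigned int cpu_next;        /* logical ID of the core allowed to run next */
    bool done[NR_CPUS];           /* which cores have finished initialization */
    unsigned int order[NR_CPUS];  /* logical IDs in the order they initialized */
    unsigned int count;           /* number of cores that have initialized */
};

/* Give 'cpu' one chance to run. Returns true if it made progress. */
static bool handoff_step(struct handoff *h, unsigned int cpu)
{
    assert(cpu < NR_CPUS);
    if (h->done[cpu] || h->cpu_next != cpu)
        return false;                 /* real code: still looping on wfe() */
    h->order[h->count++] = cpu;       /* stands in for the per-CPU init work */
    h->done[cpu] = true;
    h->cpu_next = cpu + 1;            /* real code: cpu_next = cpu + 1; dsb(sy); sev(); */
    return true;
}

/* Repeatedly visit every running core (bit N of 'running' set means CPU N
 * is executing), highest ID first, until no core can make progress.
 * Returns how many cores completed initialization. */
static unsigned int handoff_run(struct handoff *h, unsigned int running)
{
    bool progress = true;
    while (progress) {
        progress = false;
        for (int cpu = NR_CPUS - 1; cpu >= 0; cpu--)
            if (running & (1u << cpu))
                progress |= handoff_step(h, (unsigned int)cpu);
    }
    return h->count;
}
```

With all four cores running (mask 0xF) they initialize in the order 0, 1, 2, 3 even though the model visits them from the highest ID down. With CPU 1 absent (mask 0xD) only CPU 0 completes and CPUs 2 and 3 wait indefinitely, which matches the symptom described here: a single core that never takes its turn stalls every core after it.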
The failure suggests that the secondary cores are either not receiving the sev() event, not powered on at all, or running into problems during their own initialization sequence. Possible causes include misconfiguration of the boot-wrapper-aarch64 code, the behavior of the FVP model, or incorrect handling of Power State Coordination Interface (PSCI) requests.
SEV Signaling Failures and CPU Power-On Issues
The failure of the secondary cores to initialize could stem from several root causes. One is that sev() does not wake them. SEV signals an event to all cores, and a core waiting in WFE resumes when it receives that event. If the event never reaches the secondary cores, they remain in their wait loop and the boot process hangs.
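How WFE and SEV interact is easier to reason about with a simplified model of the per-core event register. In this sketch (struct pe, model_sev(), model_wfe() and try_wait_turn() are hypothetical names written for illustration), SEV sets the event register of the waiting cores, and WFE either consumes a pending event and continues at once or puts the core to sleep. Two consequences follow: an event sent before a core reaches WFE is latched rather than lost, and a core can be woken by an event that was not meant for it, so the wait must re-check cpu_next after every wake-up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified model of the per-core event register used by WFE/SEV. */
struct pe {
    bool event_register;
};

/* SEV: signal an event to the given cores by setting their event registers. */
static void model_sev(struct pe *pes, size_t n)
{
    for (size_t i = 0; i < n; i++)
        pes[i].event_register = true;
}

/* WFE: a pending event is consumed and execution continues immediately
 * (returns true); with no event pending the core would sleep (returns false). */
static bool model_wfe(struct pe *p)
{
    if (p->event_register) {
        p->event_register = false;
        return true;
    }
    return false;
}

/* The cpu_next wait loop against the model: returns true when 'cpu' may
 * proceed, false when it would go (back) to sleep. Every wake-up is followed
 * by a fresh check of cpu_next, so a stray event cannot let the wrong core through. */
static bool try_wait_turn(struct pe *p, const unsigned int *cpu_next, unsigned int cpu)
{
    assert(p != NULL && cpu_next != NULL);
    while (*cpu_next != cpu) {
        if (!model_wfe(p))
            return false;
    }
    return true;
}
```

Real WFE can also return for other reasons, such as interrupts, which is one more reason the condition must be re-checked in a loop rather than assumed after a single wake-up.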
Another possible cause is the power state of the secondary cores. On many ARM platforms, secondary cores stay powered off or parked until the primary core requests them, typically through a PSCI CPU_ON call. If the secondary cores are not running when the boot-wrapper starts, or if the boot-wrapper-aarch64 code does not handle the power-on request correctly, they remain inactive and never take part in the boot process.
Additionally, the FVP model itself may have limitations or bugs in handling PSCI requests or multi-core initialization. The FVP model is designed to emulate ARM processors, but it may not perfectly replicate the behavior of physical hardware, especially in complex scenarios involving multi-core boot sequences and power management.
Debugging CPU States, PSCI Requests, and Implementing Fixes
To diagnose and resolve the multi-core boot failure, a systematic approach is required. The following steps outline the debugging process, potential solutions, and fixes.
Tracing CPU States in the FVP Base Model
The first step in debugging the issue is to trace the state of each CPU core during the boot process. The FVP model provides various debugging features that can be used to monitor the state of the CPU cores. By enabling trace logging, you can capture detailed information about the execution flow, including the state of each core, the instructions being executed, and any exceptions or interrupts that occur.
To enable trace logging in FVP, load the Tarmac Trace plugin with the --plugin option. The plugin's path depends on where the FVP is installed:
FVP_Base_RevC-2xAEMvA --plugin <path-to-plugins>/TarmacTrace.so ...
The resulting trace records the instructions each core executes, so you can see where the boot process stops. A secondary core that never appears in the trace probably never left reset, while one that keeps executing WFE is waiting in the cpu_next loop.
Determining CPU Exception Levels (EL1 or EL3)
Another critical aspect of debugging the multi-core boot failure is determining the exception level (EL) of each CPU core. AArch64 defines four exception levels: EL0 for applications, EL1 for operating system kernels, EL2 for hypervisors, and EL3, the most privileged, for secure monitor firmware.
To determine the exception level of a CPU core, read the CurrentEL system register:
MRS X0, CurrentEL
The exception level is held in bits [3:2] of the result, so shift it right by two to obtain a value from 0 to 3. By reading this register on each core in the boot-wrapper-aarch64 code and logging the result, you can check whether the secondary cores reach the expected exception level.
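A minimal sketch of that check in C, assuming GCC or Clang inline assembly. read_current_el_reg() is only compiled for AArch64 and must run at EL1 or higher, because CurrentEL cannot be read at EL0; the decode step is kept separate so it can be checked on any host. The function names and the el_seen[] table are hypothetical. Recording the level per core, and printing the table from CPU 0 afterwards, avoids output from several cores interleaving on the same console.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 4  /* hypothetical core count */

/* Exception level recorded by each logical CPU (hypothetical debug table). */
static unsigned int el_seen[NR_CPUS];

/* CurrentEL holds the exception level in bits [3:2]; every other bit is RES0. */
static unsigned int current_el_from_reg(uint64_t currentel)
{
    assert((currentel & ~(uint64_t)0xc) == 0);
    return (unsigned int)((currentel >> 2) & 0x3);
}

#if defined(__aarch64__)
/* Must run at EL1 or higher: CurrentEL is not accessible at EL0. */
static uint64_t read_current_el_reg(void)
{
    uint64_t v;
    __asm__ volatile("mrs %0, CurrentEL" : "=r"(v));
    return v;
}
#endif

/* Store the decoded level for 'cpu'. On the target, pass read_current_el_reg(). */
static void record_el(unsigned int cpu, uint64_t currentel)
{
    assert(cpu < NR_CPUS);
    el_seen[cpu] = current_el_from_reg(currentel);
}
```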
Monitoring PSCI Requests for CPU Power-On/Off
The Power State Coordination Interface (PSCI) is a standard interface for managing the power states of CPU cores in ARM systems. During boot, the operating system running on the primary core issues a PSCI CPU_ON request for each secondary core. If these requests are not issued or not handled correctly, the secondary cores remain powered off and never initialize.
PSCI requests are made with an SMC (or HVC) instruction, with the function ID in W0 and the arguments in the following registers. You can therefore monitor them by looking for SMC instructions in an instruction trace, or by adding logging to the boot-wrapper's PSCI handler that records each function ID and its arguments.
By analyzing this log, you can confirm whether the primary core issues CPU_ON for each secondary core and whether any request returns an error.
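One way to log PSCI requests is from inside the PSCI handler itself. A minimal sketch, assuming a hook that is called at the top of the SMC handler with the caller's X0 to X3: psci_fn_name() and psci_log_call() are hypothetical, and printf() is used only so the sketch builds on a host (replace it with whatever console output the firmware provides). The function IDs are the SMC32 and SMC64 values from the PSCI specification; for CPU_ON, X1 to X3 (W1 to W3 for the SMC32 form) carry the target MPIDR, the entry point and the context ID.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* PSCI function IDs (SMC32 and, where they exist, SMC64 variants). */
#define PSCI_VERSION           0x84000000u
#define PSCI_CPU_SUSPEND_32    0x84000001u
#define PSCI_CPU_SUSPEND_64    0xC4000001u
#define PSCI_CPU_OFF           0x84000002u
#define PSCI_CPU_ON_32         0x84000003u
#define PSCI_CPU_ON_64         0xC4000003u
#define PSCI_AFFINITY_INFO_32  0x84000004u
#define PSCI_AFFINITY_INFO_64  0xC4000004u
#define PSCI_SYSTEM_OFF        0x84000008u
#define PSCI_SYSTEM_RESET      0x84000009u

/* Name of a PSCI function ID, or NULL if it is not one of the above. */
static const char *psci_fn_name(uint32_t fid)
{
    switch (fid) {
    case PSCI_VERSION:          return "PSCI_VERSION";
    case PSCI_CPU_SUSPEND_32:
    case PSCI_CPU_SUSPEND_64:   return "CPU_SUSPEND";
    case PSCI_CPU_OFF:          return "CPU_OFF";
    case PSCI_CPU_ON_32:
    case PSCI_CPU_ON_64:        return "CPU_ON";
    case PSCI_AFFINITY_INFO_32:
    case PSCI_AFFINITY_INFO_64: return "AFFINITY_INFO";
    case PSCI_SYSTEM_OFF:       return "SYSTEM_OFF";
    case PSCI_SYSTEM_RESET:     return "SYSTEM_RESET";
    default:                    return NULL;
    }
}

/* Log one incoming call. Returns 1 for a recognized PSCI function, else 0. */
static int psci_log_call(unsigned int cpu, uint64_t x0, uint64_t x1,
                         uint64_t x2, uint64_t x3)
{
    uint32_t fid = (uint32_t)x0;          /* the function ID is passed in W0 */
    const char *name = psci_fn_name(fid);

    if (name == NULL) {
        printf("cpu%u: non-PSCI call 0x%08" PRIx32 "\n", cpu, fid);
        return 0;
    }
    if (fid == PSCI_CPU_ON_32 || fid == PSCI_CPU_ON_64)
        printf("cpu%u: CPU_ON target=0x%" PRIx64 " entry=0x%" PRIx64
               " context=0x%" PRIx64 "\n", cpu, x1, x2, x3);
    else
        printf("cpu%u: %s\n", cpu, name);
    return 1;
}
```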
Implementing Data Synchronization Barriers and Cache Management
One potential issue in the boot-wrapper-aarch64 code is the lack of proper data synchronization barriers and cache management. ARM processors use a weakly ordered memory model, which means that memory operations can be reordered unless explicit synchronization instructions are used. This can lead to subtle bugs in multi-core systems, where one core may observe stale or inconsistent memory values.
To enforce ordering between cores, use the dsb() (Data Synchronization Barrier) and isb() (Instruction Synchronization Barrier) instructions. A DSB does not complete until all memory accesses before it have completed, and no later instruction executes until it does. An ISB flushes the processor pipeline, so instructions after it are fetched again and see the effect of earlier context-changing operations, such as system register writes.
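In the cpu_init_bootwrapper() function, a dsb(sy) is already issued after cpu_next is updated and before sev() is called. That is the ordering the handoff depends on: the DSB ensures that the new value of cpu_next is visible to all cores before the event wakes them, so a woken core does not read the old value and go back to sleep. Additional barriers are only needed if cores exchange other data through the handoff. Because the DSB comes after the store to cpu_next, it does not order earlier writes against that store, so a barrier before the store (for example, dmb ish) would also be required. In any case, check that the existing sequence is intact: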
cpu_next = cpu + 1;
dsb(sy); // Ensure the update to cpu_next is visible to all cores
sev(); // Signal the next core to wake up
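The same handoff can be written with explicit barrier helpers. This sketch is an illustration under stated assumptions, not the boot-wrapper's own source: the dsb(), dmb(), sev() and wfe() macros are hypothetical equivalents built on GCC/Clang inline assembly for AArch64, with portable C11 fences and no-ops on other hosts so it compiles anywhere. It adds a DMB before the store to cpu_next and after the wait loop, which matters only if cores pass other data through the handoff; the DSB before SEV is the part the existing code already has.

```c
#include <assert.h>
#include <stdatomic.h>

#if defined(__aarch64__)
/* Real barrier and event instructions; the "memory" clobber also stops the
 * compiler from moving memory accesses across them. */
#define dsb(opt) __asm__ volatile("dsb " #opt : : : "memory")
#define dmb(opt) __asm__ volatile("dmb " #opt : : : "memory")
#define sev()    __asm__ volatile("sev" : : : "memory")
#define wfe()    __asm__ volatile("wfe" : : : "memory")
#else
/* Host fallback: a full fence for each barrier; SEV/WFE become no-ops, so
 * the wait loop simply spins. */
#define dsb(opt) atomic_thread_fence(memory_order_seq_cst)
#define dmb(opt) atomic_thread_fence(memory_order_seq_cst)
#define sev()    ((void)0)
#define wfe()    ((void)0)
#endif

/* volatile makes every loop iteration reload cpu_next from memory. */
static volatile unsigned int cpu_next;

/* Waiter: WFE can return early, so the condition is re-checked each time. */
static void wait_for_turn(unsigned int cpu)
{
    while (cpu_next != cpu)
        wfe();
    dmb(ish);   /* order later reads after the observed value of cpu_next */
}

/* Publisher: order earlier writes before the store, make the store complete,
 * then wake every core waiting in WFE. */
static void pass_turn(unsigned int cpu)
{
    assert(cpu_next == cpu);   /* only the core whose turn it is may hand over */
    dmb(ish);
    cpu_next = cpu + 1;
    dsb(sy);
    sev();
}
```

Each core would call wait_for_turn() with its logical ID, do its initialization, and then call pass_turn() with the same ID.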
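Cache state matters as well. If cores access shared data with different cacheability, for example one running with the MMU and data cache enabled and another still running with them off, a core can read stale data unless the affected lines are cleaned or invalidated. (While the MMU is off, AArch64 treats data accesses as Device-nGnRnE, so they bypass the data cache.) Use the data-cache maintenance instructions, DC CVAC to clean, DC IVAC to invalidate, or DC CIVAC to do both, by virtual address to the Point of Coherency, and IC IALLU to invalidate the instruction cache after code has been written to memory.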
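A sketch of a clean-and-invalidate helper for a shared buffer, assuming AArch64 and GCC/Clang inline assembly. It reads the smallest data cache line size from CTR_EL0.DminLine (bits [19:16], the log2 of the line length in 4-byte words), issues DC CIVAC for every line the buffer touches, and finishes with a DSB so that the maintenance has completed before the core continues, for example before it calls sev(). The helper names are hypothetical; on other hosts the maintenance compiles to nothing and only the address arithmetic remains.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Smallest data cache line in bytes: CTR_EL0.DminLine (bits [19:16]) is
 * log2 of the line length in 4-byte words. */
static size_t dcache_line_bytes(uint64_t ctr_el0)
{
    return (size_t)4 << ((ctr_el0 >> 16) & 0xf);
}

/* Number of 'line'-sized cache lines touched by [start, start + size). */
static size_t lines_in_range(uintptr_t start, size_t size, size_t line)
{
    assert(line != 0 && (line & (line - 1)) == 0);  /* power of two */
    if (size == 0)
        return 0;
    uintptr_t mask = ~(uintptr_t)(line - 1);
    uintptr_t first = start & mask;
    uintptr_t end = (start + size + line - 1) & mask;
    return (size_t)((end - first) / line);
}

/* Clean and invalidate a buffer to the Point of Coherency so that a core
 * accessing it with different cacheability sees the latest contents. */
static void dcache_clean_inval_range(const void *addr, size_t size)
{
#if defined(__aarch64__)
    uint64_t ctr;
    __asm__ volatile("mrs %0, ctr_el0" : "=r"(ctr));
    size_t line = dcache_line_bytes(ctr);
    uintptr_t p = (uintptr_t)addr & ~(uintptr_t)(line - 1);
    size_t n = lines_in_range((uintptr_t)addr, size, line);
    for (size_t i = 0; i < n; i++, p += line)
        __asm__ volatile("dc civac, %0" : : "r"(p) : "memory");
    __asm__ volatile("dsb sy" : : : "memory");  /* wait for maintenance to finish */
#else
    (void)addr;   /* host fallback: no cache maintenance in this sketch */
    (void)size;
#endif
}
```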
Verifying PSCI Implementation in boot-wrapper-aarch64
Because boot-wrapper-aarch64 is used here without Trusted Firmware-A (TF-A), the boot-wrapper itself must handle PSCI requests; its PSCI support is enabled when it is configured (--enable-psci). It therefore needs to include the PSCI handlers that bring the secondary cores online. If the PSCI implementation is incomplete or incorrect, the secondary cores will not be powered on, leading to the observed boot failure.
To verify the PSCI implementation, review the boot-wrapper-aarch64 code and confirm that it provides handlers for CPU_ON, CPU_OFF, and any other PSCI functions the operating system calls. The CPU_ON handler is particularly important, because it is responsible for powering on the secondary cores.
The CPU_ON handler should perform the following steps:
- Validate the target CPU, passed as an MPIDR value, and check that it is not already on or in the process of turning on.
- Record the entry point and context ID for the target CPU.
- Issue the necessary platform-specific commands to power on the target CPU.
- Return a success or error code to the caller.
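The four steps can be mapped onto a handler roughly as follows. This is a host-compilable sketch, not the boot-wrapper's implementation: the CPU table, the power-state array and plat_power_on() are hypothetical stand-ins for platform code, only the return codes come from the PSCI specification, and rejecting a zero or misaligned entry point is an assumption that holds for an AArch64-only caller. A real handler would also need a lock around the shared state and, once the entry point is recorded, the dsb(sy) and sev() sequence that waiting cores rely on.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 4  /* hypothetical platform size */

/* Return codes defined by the PSCI specification. */
#define PSCI_RET_SUCCESS             0
#define PSCI_RET_INVALID_PARAMETERS (-2)
#define PSCI_RET_ALREADY_ON         (-4)
#define PSCI_RET_ON_PENDING         (-5)
#define PSCI_RET_INVALID_ADDRESS    (-9)

enum power_state { PWR_OFF, PWR_ON_PENDING, PWR_ON };

/* Hypothetical platform data: MPIDR affinity value of each logical CPU. */
static const uint64_t cpu_mpidr[NR_CPUS] = { 0x0, 0x1, 0x2, 0x3 };
static enum power_state cpu_power[NR_CPUS] = { PWR_ON };  /* CPU 0 is running */
static uint64_t cpu_entry[NR_CPUS];    /* entry point for each CPU */
static uint64_t cpu_context[NR_CPUS];  /* context_id each CPU receives in X0 */
static unsigned int power_requests;    /* bit N set: power-on requested for CPU N */

/* Hypothetical platform hook; real code would program the power controller. */
static void plat_power_on(unsigned int cpu)
{
    assert(cpu < NR_CPUS);
    power_requests |= 1u << cpu;
}

/* Map an MPIDR to a logical CPU index, comparing only Aff3..Aff0. */
static int mpidr_to_logical(uint64_t mpidr)
{
    const uint64_t aff_mask = 0xff00ffffffull;
    for (int i = 0; i < NR_CPUS; i++)
        if (cpu_mpidr[i] == (mpidr & aff_mask))
            return i;
    return -1;
}

static int psci_cpu_on(uint64_t target_mpidr, uint64_t entry, uint64_t context_id)
{
    /* 1. Validate the target CPU and its current power state. */
    int cpu = mpidr_to_logical(target_mpidr);
    if (cpu < 0)
        return PSCI_RET_INVALID_PARAMETERS;
    if (cpu_power[cpu] == PWR_ON)
        return PSCI_RET_ALREADY_ON;
    if (cpu_power[cpu] == PWR_ON_PENDING)
        return PSCI_RET_ON_PENDING;
    if (entry == 0 || (entry & 0x3) != 0)  /* assumption: AArch64 entry only */
        return PSCI_RET_INVALID_ADDRESS;

    /* 2. Record where the CPU starts and what it receives in X0. */
    cpu_entry[cpu] = entry;
    cpu_context[cpu] = context_id;
    cpu_power[cpu] = PWR_ON_PENDING;

    /* 3. Platform-specific power-on. */
    plat_power_on((unsigned int)cpu);

    /* 4. Report the result to the caller. */
    return PSCI_RET_SUCCESS;
}
```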
If the PSCI implementation is missing or incorrect, you will need to add or modify the necessary code to ensure that the secondary cores are properly powered on.
Testing with Different FVP Configurations
Finally, it is important to test the system with different FVP configurations to rule out any issues with the FVP model itself. The FVP model supports various configuration options that can affect the behavior of the system, including the number of CPU cores, the memory layout, and the presence of certain hardware features.
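You should test the system with different numbers of CPU cores to determine whether the issue is specific to a particular core count. In particular, check that the number of cores the model runs matches the set of CPUs the boot-wrapper expects: because the cpu_next handoff moves through the logical IDs in order, an expected core that never runs blocks every core after it. Additionally, test with different memory configurations to ensure that the system is not running out of memory or encountering memory-related issues during the boot process.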
If the issue persists across different FVP configurations, it is likely a bug in the boot-wrapper-aarch64 code or the FVP model itself. In this case, you may need to consult the ARM documentation or seek assistance from the ARM community to resolve the issue.
Conclusion
The multi-core boot failure in the ARM Cortex-A system using FVP and boot-wrapper-aarch64 is a complex issue that requires a thorough understanding of ARM architecture, multi-core boot sequences, and the FVP model. By systematically tracing the CPU states, verifying the PSCI implementation, and ensuring proper synchronization and cache management, you can identify and resolve the root cause of the failure. Additionally, testing with different FVP configurations can help rule out any issues with the FVP model itself. With careful analysis and debugging, you can achieve a successful multi-core boot and ensure the reliable operation of your ARM-based system.