ARM Cortex-A72 Performance Monitoring Unit (PMU) Access in User Mode
The ARM Cortex-A72, which implements the ARMv8-A architecture, includes a Performance Monitoring Unit (PMU) that provides detailed insights into system performance through a cycle counter and a set of event counters. These counters are invaluable for profiling and optimizing software, particularly in performance-critical applications. However, accessing these counters directly from user mode on a Linux-based system can be challenging due to the privilege levels enforced by the ARM architecture and the Linux kernel.
The PMU registers, including the cycle counter register (PMCCNTR_EL0), are by default accessible only at higher privilege levels (PL1 or PL2). ARMv8-A calls these exception levels (EL), so PL0, PL1, and PL2 correspond to EL0, EL1, and EL2:
- PL0 (EL0): User mode (unprivileged).
- PL1 (EL1): Supervisor mode (Linux kernel).
- PL2 (EL2): Hypervisor mode (if implemented).
When running a Linux-based system, user applications operate in PL0, while the kernel operates in PL1. An attempt to read a PMU register directly from a user-space application therefore traps to the kernel, which sends the process SIGILL, and the program terminates with an "Illegal Instruction" error.
The root of the issue lies in the fact that even with root privileges, user-space applications cannot elevate their privilege level to access PL1-only registers. Root privileges in Linux do not equate to CPU privilege-level escalation; they merely grant additional permissions within the user-space environment. Therefore, accessing the PMU registers requires either kernel-level intervention or a mechanism to temporarily elevate the privilege level.
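The failure mode described above can be detected at run time instead of crashing the program. The following is a minimal sketch, assuming a POSIX system: it installs a temporary SIGILL handler, attempts one read of PMCCNTR_EL0, and reports whether the read survived. The function name pmu_cycle_counter_readable is hypothetical, and on non-AArch64 builds the probe simply reports 0.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>

#if defined(__aarch64__)
static sigjmp_buf probe_env;

/* SIGILL handler: jump back into the probe instead of terminating. */
static void on_sigill(int signo)
{
    (void)signo;
    siglongjmp(probe_env, 1);
}
#endif

/* Returns 1 if PMCCNTR_EL0 can be read from user space, 0 otherwise. */
int pmu_cycle_counter_readable(void)
{
#if defined(__aarch64__)
    struct sigaction sa, old;
    volatile int ok = 0;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigill;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGILL, &sa, &old) != 0)
        return 0;

    /* sigsetjmp saves the signal mask so it is restored after the jump. */
    if (sigsetjmp(probe_env, 1) == 0) {
        uint64_t value;
        __asm__ volatile("mrs %0, pmccntr_el0" : "=r"(value));
        (void)value;
        ok = 1;   /* reached only if the read did not trap */
    }

    sigaction(SIGILL, &old, NULL);   /* restore the previous handler */
    return ok;
#else
    return 0;   /* not an AArch64 build, so the register does not exist */
#endif
}
```

A program could call this once at startup and fall back to another timing source when it returns 0.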
Privilege Level Restrictions and Kernel Module Requirements
The primary cause of the "Illegal Instruction" error when attempting to access the cycle counter (PMCCNTR_EL0) from user space is the privilege-level restriction enforced by the ARMv8-A architecture. Out of reset, the PMU registers can be accessed only at PL1 or higher, which is why user-space applications cannot directly interact with them until the kernel grants access.
To enable user-space access to the cycle counter, the following steps must be taken:
- Enable User-Space Access to PMU Registers: The ARMv8-A architecture provides the PMUSERENR_EL0 register, which controls user-space access to the PMU registers. Specifically, the PMUSERENR_EN bit (bit 0, the EN field) must be set to allow user-space applications to access the PMU registers. However, this register can be written only at PL1 or higher, meaning it must be configured by the kernel.
- Kernel Module for Privilege Escalation: Since user-space applications cannot modify the PMUSERENR_EL0 register, a kernel module must be developed to enable user-space access. This module will:
  - Set the PMUSERENR_EN bit in the PMUSERENR_EL0 register.
  - Optionally, configure the PMU to enable the cycle counter and reset it as needed.
  - Provide a mechanism for user-space applications to request cycle counter readings via system calls or shared memory.
- System Call Interface: The kernel side should expose an interface, such as a system call or an ioctl, that allows user-space applications to request cycle counter readings. This interface ensures that the privilege-level restrictions are respected while providing the necessary functionality to user-space applications.
- Memory Barriers and Synchronization: When accessing the cycle counter, it is essential to use barrier instructions (DSB and ISB) so that the counter readings are accurate and consistent. DSB waits for outstanding memory accesses to complete, and ISB keeps the processor from executing the counter read ahead of earlier instructions, so the cycle counter is read at the intended point in the program flow.
Implementing Kernel-Level PMU Access and User-Space Integration
To resolve the issue of accessing the cycle counter from user space on an ARM Cortex-A72 platform running Linux, the following steps must be implemented:
Step 1: Develop a Kernel Module to Enable PMU Access
The kernel module is responsible for configuring the PMU registers and enabling user-space access. The module should perform the following tasks:
- Set the PMUSERENR_EN bit in the PMUSERENR_EL0 register to allow user-space access to the PMU registers.
- Enable the cycle counter by setting the appropriate bits in the PMCR_EL0 and PMCNTENSET_EL0 registers.
- Provide a system call interface for user-space applications to request cycle counter readings.
The following code snippet demonstrates how to enable user-space access to the PMU registers in a kernel module:
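#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/smp.h>
#include <linux/types.h>
#include <asm/barrier.h>
#include <asm/sysreg.h>

#define PMUSERENR_EN  (1UL << 0)   // PMUSERENR_EL0.EN: allow user-space access
#define PMCR_E        (1UL << 0)   // PMCR_EL0.E: enable the counters
#define PMCNTENSET_C  (1UL << 31)  // PMCNTENSET_EL0.C: enable the cycle counter

// PMU registers are per-core, so this runs on every online CPU
static void enable_pmu_user_access(void *unused) {
    // Set the PMUSERENR_EN bit to enable user-space access
    write_sysreg(read_sysreg(pmuserenr_el0) | PMUSERENR_EN, pmuserenr_el0);
    // Enable the PMU and its cycle counter
    write_sysreg(read_sysreg(pmcr_el0) | PMCR_E, pmcr_el0);
    write_sysreg(PMCNTENSET_C, pmcntenset_el0);
    isb();
}

static void disable_pmu_user_access(void *unused) {
    // Clear the PMUSERENR_EN bit so user-space accesses trap again
    write_sysreg(read_sysreg(pmuserenr_el0) & ~PMUSERENR_EN, pmuserenr_el0);
    isb();
}

static int __init pmu_module_init(void) {
    // Run on all CPUs and wait for completion
    on_each_cpu(enable_pmu_user_access, NULL, 1);
    printk(KERN_INFO "PMU user-space access enabled\n");
    return 0;
}

static void __exit pmu_module_exit(void) {
    on_each_cpu(disable_pmu_user_access, NULL, 1);
    printk(KERN_INFO "PMU user-space access disabled\n");
}

module_init(pmu_module_init);
module_exit(pmu_module_exit);
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Your Name");
MODULE_DESCRIPTION("Kernel module to enable PMU user-space access");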
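The second task in the list, enabling the cycle counter through PMCR_EL0 and PMCNTENSET_EL0, can be sketched as pure bit arithmetic that is checkable in user space. The bit positions below follow the ARMv8-A PMUv3 register descriptions. The struct pmu_cycle_config and the function pmu_cycle_counter_config are hypothetical names used only for illustration, and setting LC is a choice made in this sketch rather than a requirement stated above.

```c
#include <stdint.h>

/* Selected PMCR_EL0 control bits (ARMv8-A PMUv3). */
#define PMCR_E   (UINT64_C(1) << 0)   /* enable all counters */
#define PMCR_P   (UINT64_C(1) << 1)   /* reset the event counters */
#define PMCR_C   (UINT64_C(1) << 2)   /* reset the cycle counter */
#define PMCR_LC  (UINT64_C(1) << 6)   /* overflow on the full 64-bit cycle count */

/* PMCNTENSET_EL0: bit 31 enables the cycle counter. */
#define PMCNTENSET_C (UINT64_C(1) << 31)

/* Hypothetical container for the two values a kernel module would write. */
struct pmu_cycle_config {
    uint64_t pmcr;        /* value for PMCR_EL0 */
    uint64_t pmcntenset;  /* value for PMCNTENSET_EL0 */
};

/* Builds the register values that turn on the cycle counter.
 * reset != 0 additionally requests a cycle counter reset. */
struct pmu_cycle_config pmu_cycle_counter_config(uint64_t pmcr_current, int reset)
{
    struct pmu_cycle_config cfg;

    cfg.pmcr = pmcr_current | PMCR_E | PMCR_LC;
    if (reset)
        cfg.pmcr |= PMCR_C;
    cfg.pmcntenset = PMCNTENSET_C;
    return cfg;
}
```

Because PMCNTENSET_EL0 is a set-only register, writing just the C bit leaves any event counters that are already enabled untouched.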
Step 2: Implement a System Call Interface
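The kernel side should provide an interface that allows user-space applications to request cycle counter readings. This can be achieved by defining a new system call or by using an existing mechanism such as ioctl. Note that a loadable module cannot register a new system call: the system call must be compiled into the kernel and added to the system call table, whereas an ioctl handler can live entirely in a module.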
The following code snippet demonstrates how to implement a system call interface for cycle counter readings:
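#include <linux/syscalls.h>
#include <linux/types.h>

// Must be compiled into the kernel and added to the system call table;
// a loadable module cannot register a new system call.
SYSCALL_DEFINE0(get_cycle_counter) {
    u64 cycle_counter;

    // ISB keeps the read from being executed ahead of earlier instructions
    asm volatile("ISB" : : : "memory");
    // Read the cycle counter
    asm volatile("MRS %0, PMCCNTR_EL0" : "=r" (cycle_counter));
    // The value is returned as a long; user space casts it back to 64-bit unsigned
    return (long)cycle_counter;
}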
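For the ioctl route mentioned above, the user-space side could look like the following minimal sketch. The request code PMU_IOC_READ_CYCLES, its magic character 'p', and the device it would be issued against are all hypothetical: a real module would define its own request code in a shared header and register its own device node.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/ioctl.h>

/* Hypothetical request code: "read a uint64_t from the driver". */
#define PMU_IOC_READ_CYCLES _IOR('p', 1, uint64_t)

/* Asks the (assumed) driver behind fd for the current cycle count.
 * Returns 0 on success, or -1 with errno set by ioctl(),
 * for example EBADF when fd is not an open descriptor. */
int read_cycles_ioctl(int fd, uint64_t *cycles)
{
    return ioctl(fd, PMU_IOC_READ_CYCLES, cycles) == 0 ? 0 : -1;
}
```

An application would open the device node once, keep the descriptor, and call read_cycles_ioctl around the code being measured.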
Step 3: Modify User-Space Application to Use the System Call
Once the kernel module is in place, the user-space application can be modified to use the system call for cycle counter readings. The following code snippet demonstrates how to call the get_cycle_counter system call from user space:
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>
#include <unistd.h>
#include <sys/syscall.h>

#define SYS_GET_CYCLE_COUNTER 333 // Replace with the actual system call number

// Invoke the custom system call; syscall() returns a long
uint64_t get_cycle_counter(void) {
    return (uint64_t)syscall(SYS_GET_CYCLE_COUNTER);
}

int main(void) {
    uint64_t start, end, delta;

    start = get_cycle_counter();
    // Perform the operation to be measured
    end = get_cycle_counter();

    delta = end - start;
    // PRIu64 is the portable format specifier for uint64_t
    printf("Cycle count: %" PRIu64 "\n", delta);
    return 0;
}
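One gap in a bare syscall() wrapper is that a failed call (for example, a kernel without the new system call) returns -1, which is indistinguishable from data once cast to an unsigned value. A minimal sketch of a wrapper that separates the status from the reading is shown below. The function name read_cycles_syscall is hypothetical, and the system call number is passed in because the real number depends on the kernel build.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Invokes an argument-less system call that returns a counter value.
 * Returns 0 and stores the value in *cycles on success.
 * Returns -1 with errno set on failure (ENOSYS if the call does not exist). */
int read_cycles_syscall(long nr, uint64_t *cycles)
{
    long ret;

    errno = 0;
    ret = syscall(nr);
    /* The C library reports kernel errors as -1 plus a non-zero errno. */
    if (ret == -1 && errno != 0)
        return -1;

    *cycles = (uint64_t)ret;
    return 0;
}
```

With this shape the caller checks the return code first and only then uses the counter value.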
Step 4: Ensure Proper Synchronization with Memory Barriers
When accessing the cycle counter, it is crucial to use memory barriers to ensure accurate and consistent readings. The following code snippet demonstrates how to use memory barriers in the user-space application:
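#include <inttypes.h> // for PRIu64
// Assumes get_cycle_counter() and the other headers from Step 3

// DSB waits for outstanding memory accesses; ISB flushes the pipeline.
// The "memory" clobber also stops the compiler from moving code across them.
static inline void arm_v8_memory_barrier(void) {
    asm volatile ("DSB SY" : : : "memory");
    asm volatile ("ISB" : : : "memory");
}

int main(void) {
    uint64_t start, end, delta;

    arm_v8_memory_barrier();
    start = get_cycle_counter();
    arm_v8_memory_barrier();

    // Perform the operation to be measured

    arm_v8_memory_barrier();
    end = get_cycle_counter();
    arm_v8_memory_barrier();

    delta = end - start;
    printf("Cycle count: %" PRIu64 "\n", delta);
    return 0;
}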
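When the kernel side has set PMUSERENR_EN, the counter can also be read in place with MRS, which puts the barrier directly next to the read. The sketch below assumes a hypothetical build flag, PMU_USER_ACCESS_ENABLED, that is defined only on systems where that access has been granted. Without it, or on other architectures, the code falls back to the POSIX monotonic clock, so the result is in nanoseconds rather than cycles. The function names are illustrative.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Returns the current cycle count, or nanoseconds in the fallback path. */
uint64_t read_cycles(void)
{
#if defined(__aarch64__) && defined(PMU_USER_ACCESS_ENABLED)
    uint64_t value;
    /* ISB on both sides pins the read in program order; the "memory"
     * clobber keeps the compiler from moving loads and stores across it. */
    __asm__ volatile("isb\n\tmrs %0, pmccntr_el0\n\tisb"
                     : "=r"(value) : : "memory");
    return value;
#else
    /* Assumption for portability: fall back to a monotonic clock. */
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * UINT64_C(1000000000) + (uint64_t)ts.tv_nsec;
#endif
}

/* Runs fn(arg) once and returns the elapsed count. */
uint64_t measure(void (*fn)(void *), void *arg)
{
    uint64_t start = read_cycles();
    fn(arg);
    uint64_t end = read_cycles();
    return end - start;   /* unsigned subtraction, same units as read_cycles() */
}

/* Example workload: spins for *(unsigned *)arg iterations. */
void busy_loop(void *arg)
{
    volatile unsigned sink = 0;
    unsigned n = *(unsigned *)arg;
    for (unsigned i = 0; i < n; i++)
        sink += i;
}
```

Calling measure(busy_loop, &n) several times and keeping the smallest result is a common way to reduce noise from interrupts and scheduling.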
Summary of Key Points
- The ARM Cortex-A72 PMU registers, including the cycle counter, are by default accessible only at PL1 or higher.
- User-space applications cannot directly access these registers, even with root privileges.
- A kernel module is required to enable user-space access to the PMU registers.
- The kernel module must configure the PMUSERENR_EL0 register and provide a system call interface for user-space applications.
- Memory barriers (DSB and ISB) must be used to ensure accurate cycle counter readings.
By following these steps, you can successfully access the cycle counter from user space on an ARM Cortex-A72 platform running Linux, enabling precise timing measurements for performance analysis and optimization.
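To turn a cycle delta into a time, the count has to be divided by the core clock frequency. The following is a minimal sketch of that arithmetic. The function name cycles_to_ns is hypothetical, and the frequency must be supplied by the caller, since it depends on the board and on any frequency scaling in effect during the measurement.

```c
#include <stdint.h>

/* Converts a cycle count to nanoseconds for a core clock of cpu_hz.
 * Whole seconds and the remainder are handled separately so that the
 * multiplication by 1e9 cannot overflow for realistic clock rates. */
uint64_t cycles_to_ns(uint64_t cycles, uint64_t cpu_hz)
{
    uint64_t seconds = cycles / cpu_hz;
    uint64_t remainder = cycles % cpu_hz;

    return seconds * UINT64_C(1000000000)
         + (remainder * UINT64_C(1000000000)) / cpu_hz;
}
```

For example, with an assumed 1.5 GHz clock, 3000 cycles correspond to 2000 ns. The result is only as accurate as the assumed frequency, so it is approximate whenever the governor changes the clock during the run.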