Standard C Library Functions Execution in External Flash Causing Performance Bottlenecks

In ARM-based SoC designs, the execution of standard C library functions such as memcpy, sin, and others in external flash memory can lead to significant performance degradation. This is primarily due to the slower access times and higher latency associated with external flash compared to on-chip RAM. When these functions are executed from external flash, the entire software system experiences delays, as the processor frequently stalls waiting for instructions and data to be fetched from the slower memory. This issue is particularly critical in real-time systems where deterministic performance is required.

The problem is exacerbated in systems where the standard C library functions are frequently called, as each call incurs the penalty of accessing external flash. This not only slows down the execution of the functions themselves but also impacts the overall system performance, as the processor is unable to execute other tasks while waiting for memory accesses to complete. In some cases, the performance bottleneck can be severe enough to cause the system to miss real-time deadlines, leading to functional failures.

To address this issue, it is necessary to relocate the execution of these standard C library functions to a specific sector of on-chip RAM. This relocation ensures that the functions are executed from faster memory, reducing access latency and improving overall system performance. However, achieving this relocation requires careful consideration of the memory map, linker script configuration, and the behavior of the standard C library functions themselves.

Linker Script Misconfiguration and Memory Map Constraints

One of the primary causes of standard C library functions executing from external flash is a misconfigured linker script. The linker script is responsible for defining the memory layout of the application, including the placement of code, data, and other sections in memory. If the linker script is not properly configured to place the standard C library functions in on-chip RAM, these functions will default to being placed in external flash, leading to the performance issues described earlier.

Another potential cause is the memory map constraints of the SoC. In some ARM-based SoCs, the on-chip RAM is limited in size, making it hard to allocate sufficient space for the standard C library functions. The linker does not fall back to another region on its own: if the functions do not fit, the link fails with a region overflow error, and the usual response is to move them back to external flash even though the script was otherwise correct. Additionally, parts of the RAM may be reserved for other purposes, such as DMA buffers, and the stack and heap compete for the same memory, further restricting the space available for the standard C library functions.

The way the standard C library is delivered also contributes to the problem. The functions arrive precompiled in an archive such as libc.a, so they cannot be tagged with a section attribute in source code the way application functions can. The compiler may also emit calls to memcpy implicitly, for example when copying a structure, so the function can be heavily used even if the application never calls it by name. Unless the linker script selects these archive members explicitly, they are matched by the generic .text pattern and end up in external flash, leading to performance bottlenecks.

Relocating Standard C Library Functions to On-Chip RAM

To resolve the issue of standard C library functions executing from external flash, the following steps can be taken to relocate these functions to on-chip RAM:

Step 1: Modify the Linker Script

The first step in relocating the standard C library functions to on-chip RAM is to modify the linker script to explicitly place these functions in the desired memory region. This involves defining a new section in the linker script for the standard C library functions and specifying that this section should be placed in on-chip RAM. The following example illustrates how this can be done:

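MEMORY
{
    /* Addresses are illustrative; use the mapped address of your flash. */
    FLASH (rx) : ORIGIN = 0x08000000, LENGTH = 1024K
    RAM (rwx)  : ORIGIN = 0x20000000, LENGTH = 256K
}

SECTIONS
{
    /* Listed before .text: an input section goes to the first pattern that matches it. */
    .libc_functions : {
        . = ALIGN(4);
        _libc_start = .;
        *libc.a:*(.text .text.*)
        *libc.a:*(.rodata .rodata.*)
        . = ALIGN(4);
        _libc_end = .;
    } > RAM AT > FLASH
    _libc_load = LOADADDR(.libc_functions);

    .text : {
        *(.text .text.*)
        *(.rodata .rodata.*)
        *(.glue_7)
        *(.glue_7t)
        *(.vfp11_veneer)
        *(.v4_bx)
    } > FLASH

    /* Initialized data runs from RAM but is stored in flash. */
    .data : {
        *(.data .data.*)
    } > RAM AT > FLASH

    .bss : {
        *(.bss .bss.*)
    } > RAM
}

In this example, the .libc_functions section collects all code and read-only data from the standard C library (libc.a). It is listed before .text because the linker assigns each input section to the first pattern that matches it; if it came after .text, the generic *(.text .text.*) pattern would already have claimed the library code. The "> RAM AT > FLASH" placement gives the section a run address in on-chip RAM and a load address in flash, so the startup code must copy it from _libc_load to _libc_start before any library function is called. A complete script would also keep the vector table at the start of flash, ahead of this section's load image. Note that math functions such as sin normally live in libm.a, and compiler helper routines in libgcc.a, so those archives need their own patterns if they are to be relocated too.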
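Code that is linked to run from RAM still has to be stored in non-volatile memory and copied into place after reset. The following is a minimal sketch of such a copy routine, not a definitive startup implementation. The symbol names _libc_load, _libc_start, and _libc_end are hypothetical and must match whatever the linker script actually exports.

```c
#include <stdint.h>

/*
 * Copy a section word by word from its load address (flash) to its run
 * address (RAM). A hand-written loop is used on purpose: memcpy may be
 * part of the section that has not been copied yet. The destination is
 * volatile so the compiler does not turn the loop back into a memcpy call.
 */
void copy_section(volatile uint32_t *dst, const uint32_t *src,
                  volatile uint32_t *dst_end)
{
    while (dst < dst_end) {
        *dst++ = *src++;
    }
}

/*
 * On the target, the reset handler would call this early, before main,
 * with linker-defined symbols (hypothetical names):
 *
 *   extern uint32_t _libc_load[], _libc_start[], _libc_end[];
 *   copy_section(_libc_start, _libc_load, _libc_end);
 */
```

The routine itself, and anything it calls, has to run from flash, since the RAM copy does not exist until it finishes.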

Step 2: Ensure Sufficient On-Chip RAM Allocation

After modifying the linker script, it is important to ensure that there is sufficient on-chip RAM available to accommodate the standard C library functions. This may involve analyzing the size of the standard C library functions and comparing it to the available on-chip RAM. If the available RAM is insufficient, it may be necessary to optimize the memory usage by removing unused functions or data, or by increasing the size of the on-chip RAM if possible.
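The comparison described above is simple arithmetic and can be kept as a small helper in a build or test script. The sketch below assumes the section sizes have been read from the linker map file or a size tool, and that a fixed reserve is set aside for the stack and heap; the function name and the reserve parameter are illustrative only.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true if all listed sections, plus a reserve for stack and heap,
 * fit into a RAM region of ram_size bytes. The comparison is written so
 * that it cannot overflow.
 */
bool ram_budget_ok(size_t ram_size, const size_t *section_sizes,
                   size_t count, size_t reserve)
{
    if (reserve > ram_size) {
        return false;
    }
    size_t used = reserve;
    for (size_t i = 0; i < count; i++) {
        if (section_sizes[i] > ram_size - used) {
            return false;
        }
        used += section_sizes[i];
    }
    return true;
}
```

For instance, sections of 0x8000, 0x4000, and 0x4000 bytes with an 0x2000-byte reserve fit comfortably in 256K of RAM but not in 64K.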

Step 3: Verify Function Placement and Performance

Once the linker script has been modified and sufficient on-chip RAM has been allocated, the next step is to verify that the standard C library functions are correctly placed in on-chip RAM and that the performance has improved. This can be done by examining the memory map generated by the linker and by running performance benchmarks to measure the execution time of the standard C library functions.

To verify the function placement, the memory map file generated by the linker can be inspected to ensure that the standard C library functions are located in the on-chip RAM region. The following is an example of what the memory map might look like:

Memory Map

FLASH (0x08000000 - 0x08100000)
    .text         0x08000000     0x10000
    .rodata       0x08010000     0x2000

RAM (0x20000000 - 0x20040000)
    .libc_functions 0x20000000   0x8000
    .data         0x20008000     0x4000
    .bss          0x2000C000     0x4000

In this example, the .libc_functions section is correctly placed in the RAM region starting at address 0x20000000.
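The same check can be made at run time, for example in a self-test that runs once after startup. The sketch below only tests whether an address lies inside a region; the helper name is hypothetical, and the region bounds are assumed to be those of the RAM region in the linker script.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Return true if addr lies inside [origin, origin + length). */
bool addr_in_region(uintptr_t addr, uintptr_t origin, size_t length)
{
    return addr >= origin && (addr - origin) < length;
}

/*
 * On the target this might be used as follows (converting a function
 * pointer to an integer is implementation-defined but works on common
 * ARM toolchains; the Thumb bit in bit 0 does not affect the result):
 *
 *   if (!addr_in_region((uintptr_t)&memcpy, 0x20000000u, 256u * 1024u)) {
 *       // memcpy is not running from on-chip RAM
 *   }
 */
```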

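To verify the performance improvement, benchmarks can be run to measure the execution time of the standard C library functions before and after the relocation. The results should show a reduction in execution time when the functions run from on-chip RAM rather than external flash. How large the reduction is depends on the speed of the flash interface and on whether an instruction cache or prefetch buffer was already hiding part of the flash latency.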
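A before-and-after measurement can be sketched as a small harness that reads a free-running counter around the code under test. The counter source is an assumption: on Cortex-M3 and later cores it is commonly the DWT cycle counter, but any monotonic hardware timer would do. The fake_ functions below are host-side stand-ins so the sketch can be tried off-target; they are not part of any real API.

```c
#include <stdint.h>

typedef uint32_t (*counter_fn)(void);
typedef void (*work_fn)(void);

/*
 * Return the number of counter ticks spent in work(). Unsigned
 * subtraction gives the right answer even if the 32-bit counter wraps
 * once during the measurement.
 */
uint32_t measure_ticks(counter_fn read_counter, work_fn work)
{
    uint32_t start = read_counter();
    work();
    return read_counter() - start;
}

/* Host-side stand-ins: a counter that is about to wrap, and "work"
 * that advances it by 0x20 ticks. */
uint32_t fake_ticks = 0xFFFFFFF0u;
uint32_t fake_counter(void) { return fake_ticks; }
void fake_work(void) { fake_ticks += 0x20u; }
```

Running the same workload once with the library in flash and once with it in RAM, and comparing the two tick counts, gives a direct measure of the gain.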

Step 4: Handle Function Dependencies and Initialization

Some standard C library functions may have dependencies on other functions or data that are also located in external flash. In such cases, it is important to ensure that these dependencies are also relocated to on-chip RAM to avoid performance bottlenecks. This may involve modifying the linker script to include additional sections or using custom initialization code to copy the required data from external flash to on-chip RAM at startup.

For example, if a standard C library function relies on a lookup table stored in external flash, the lookup table should also be relocated to on-chip RAM. This can be done by adding a new section in the linker script for the lookup table and using initialization code to copy the table from external flash to on-chip RAM during system startup.

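SECTIONS
{
    /* Listed before .text so these patterns match the library objects first. */
    .libc_functions : {
        . = ALIGN(4);
        _libc_start = .;
        *libc.a:*(.text .text.*)
        *libc.a:*(.rodata .rodata.*)   /* includes constant lookup tables */
        . = ALIGN(4);
        _libc_end = .;
    } > RAM AT > FLASH
    _libc_load = LOADADDR(.libc_functions);

    .text : {
        *(.text .text.*)
        *(.rodata .rodata.*)
        *(.glue_7)
        *(.glue_7t)
        *(.vfp11_veneer)
        *(.v4_bx)
    } > FLASH

    .libc_data : {
        *libc.a:*(.data .data.*)
    } > RAM AT > FLASH

    .data : {
        *(.data .data.*)
    } > RAM AT > FLASH

    .bss : {
        *(.bss .bss.*)
    } > RAM
}

In this example, the .libc_data section groups the initialized data of the standard C library (libc.a). Initialized data always runs from RAM, so this section mainly keeps the library's data next to its code; constant lookup tables live in .rodata and are covered by the .rodata pattern in .libc_functions. Every section placed with "> RAM AT > FLASH" has to be copied from its load address in flash to its run address in RAM by the startup code before it is used, which means .libc_data needs its own start, end, and load symbols alongside those of .libc_functions and .data.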

Step 5: Optimize for Power and Area Constraints

In some ARM-based SoC designs, there may be power and area constraints that limit the amount of on-chip RAM available for relocating standard C library functions. In such cases, it may be necessary to optimize the memory usage further by selectively relocating only the most frequently used functions to on-chip RAM, while leaving less frequently used functions in external flash. This can be done by analyzing the call graph of the application to identify the most critical functions and modifying the linker script to place only these functions in on-chip RAM.

For example, if the memcpy function is identified as one of the most frequently used functions, it can be selectively relocated to on-chip RAM while leaving other less frequently used functions in external flash. This can be achieved by modifying the linker script as follows:

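SECTIONS
{
    /* Listed before .text so memcpy is matched here first. */
    .libc_functions : {
        . = ALIGN(4);
        _libc_start = .;
        *libc.a:*memcpy*.o(.text .text.*)
        *libc.a:*memcpy*.o(.rodata .rodata.*)
        . = ALIGN(4);
        _libc_end = .;
    } > RAM AT > FLASH
    _libc_load = LOADADDR(.libc_functions);

    .text : {
        *(.text .text.*)
        *(.rodata .rodata.*)
        *(.glue_7)
        *(.glue_7t)
        *(.vfp11_veneer)
        *(.v4_bx)
    } > FLASH

    .data : {
        *(.data .data.*)
    } > RAM AT > FLASH

    .bss : {
        *(.bss .bss.*)
    } > RAM
}

In this example, only the memcpy function from the standard C library is placed in on-chip RAM, while other functions remain in external flash. The member pattern is deliberately a wildcard: object names inside libc.a vary between library builds (a prefixed name such as lib_a-memcpy.o is common), so the actual name should be checked with "ar t libc.a". As before, the section is loaded in flash and must be copied to RAM at startup. This approach allows for a balance between performance and memory usage, ensuring that the most critical functions are executed from faster memory while minimizing the impact on power and area constraints.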
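Choosing which functions earn a place in RAM can be treated as a small budgeting problem. The sketch below is one possible heuristic, not a prescribed method: it greedily picks the candidates with the most calls per byte of code until the RAM budget is used up. The structure, the function name, and the idea of using profiled call counts as the ranking metric are all assumptions for illustration; time spent per call may be a better metric in practice.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct candidate {
    const char *name;
    uint32_t calls;   /* profiled call count (hypothetical input) */
    uint32_t size;    /* code size in bytes, e.g. from the map file */
    bool in_ram;      /* result: set by select_for_ram() */
};

/*
 * Greedy selection: repeatedly pick the not-yet-selected candidate with
 * the highest calls-per-byte ratio that still fits in the remaining
 * budget. Returns the number of bytes of RAM used.
 */
uint32_t select_for_ram(struct candidate *c, size_t n, uint32_t budget)
{
    uint32_t used = 0;
    for (size_t i = 0; i < n; i++) {
        c[i].in_ram = false;
    }
    for (;;) {
        size_t best = n;
        for (size_t i = 0; i < n; i++) {
            if (c[i].in_ram || c[i].size == 0 || c[i].size > budget - used) {
                continue;
            }
            /* Compare calls/size ratios by cross-multiplying in 64 bits. */
            if (best == n ||
                (uint64_t)c[i].calls * c[best].size >
                (uint64_t)c[best].calls * c[i].size) {
                best = i;
            }
        }
        if (best == n) {
            break;
        }
        c[best].in_ram = true;
        used += c[best].size;
    }
    return used;
}
```

The functions marked in_ram would then be the ones listed by name in the .libc_functions section of the linker script.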

Step 6: Validate the Solution Across Different Toolchains and Compiler Versions

Finally, it is important to validate the solution across the toolchains and compiler versions the project supports. Library archive and member names differ between toolchains and configurations: for example, the newlib-nano variant shipped with the GNU Arm Embedded toolchain is linked as libc_nano.a, which a *libc.a: pattern does not match. Default section names and placement behavior can differ as well, so the linker script modifications must be checked in each configuration.

This validation can be done by building the application with different toolchains and compiler versions and verifying that the standard C library functions are correctly placed in on-chip RAM in each case. Any discrepancies should be addressed by further modifying the linker script or by using conditional compilation to handle toolchain-specific differences.
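Where a project keeps its own RAM-resident helpers alongside the relocated library code, conditional compilation can hide the toolchain-specific placement syntax behind one macro. The sketch below is an assumption-laden example: RAM_FUNC, ram_copy, and the .ramfunc section name are hypothetical, and the section must exist in the linker script. On any other compiler or host the macro expands to nothing, so the code still builds for off-target tests.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ICCARM__)
/* IAR: the __ramfunc keyword places the function in RAM. */
#define RAM_FUNC __ramfunc
#elif defined(__GNUC__) && defined(__arm__)
/* GCC and compatible compilers targeting 32-bit ARM: use a named
 * section (hypothetical name, must be placed by the linker script). */
#define RAM_FUNC __attribute__((section(".ramfunc"), noinline))
#else
/* Host builds and unknown toolchains: no special placement. */
#define RAM_FUNC
#endif

/*
 * Byte-wise copy intended to run from RAM. The destination pointer is
 * volatile so the compiler does not replace the loop with a call to the
 * library memcpy.
 */
RAM_FUNC void *ram_copy(void *dst, const void *src, size_t n)
{
    volatile uint8_t *d = (volatile uint8_t *)dst;
    const uint8_t *s = (const uint8_t *)src;
    while (n--) {
        *d++ = *s++;
    }
    return dst;
}
```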

Conclusion

Relocating standard C library functions to on-chip RAM is a critical optimization for ARM-based SoC designs, particularly in systems where performance is a key concern. By carefully modifying the linker script, ensuring sufficient on-chip RAM allocation, and validating the solution across different toolchains and compiler versions, it is possible to significantly improve the performance of the system while minimizing the impact on power and area constraints. This approach ensures that the most critical functions are executed from faster memory, leading to a more efficient and responsive system overall.
