Arm - Geek went Freak!


Ubuntu: Cross-compile baremetal Cortex-M assembly program

In this post, we will cross-compile a small baremetal program for an ARM Cortex-M processor on an Ubuntu machine.

ARM cross-compile toolchain

The first step is to install the ARM cross-compiler toolchain. Luckily, Ubuntu already has it in its software repository. Execute the following command in the terminal to install the ARM EABI compatible toolchain:

sudo apt install gcc-arm-none-eabi

Check the version of the installed compiler using the following command:

arm-none-eabi-gcc --version

Sample baremetal program

Now, we need a sample baremetal program to compile. I have chosen a very simple assembly program, saved as startup.S.


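  .syntax unified
  .thumb

  .global _start
_start:
  /* Cortex-M vector table: entries are addresses, not branch instructions */
  .word 0x20002000      /* Initial stack pointer (assumed top of the 8 KB SRAM) */
  .word _reset          /* Reset */
  .word _hang           /* NMI */
  .word _hang           /* HardFault */

  .thumb_func
_reset:
  mov r1, #10           /* r1 = 10 */
  ldr r0, =0x20000000   /* r0 = start of SRAM */
  str r1, [r0]          /* store 10 at 0x20000000 */
  ldr r2, [r0]          /* read it back into r2 */
  b .                   /* loop forever */

  .thumb_func
_hang:
  b .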


Let's assemble the file using the GNU assembler.

arm-none-eabi-as -mcpu=cortex-m3 -g startup.S -o startup.out


Finally, let's link the object file startup.out generated by the assembler.

arm-none-eabi-ld -Ttext=0x0 -o startup.elf startup.out

Note: Since the program is very simple, I haven’t used any linker script here.

The -Ttext=0x0 option instructs the linker to place the .text section (the program's instructions) starting at address 0x0.

Ubuntu: Emulate baremetal Cortex-M program

In this post, we will emulate a baremetal Cortex-M program on an Ubuntu PC.


We will need

  1. QEMU emulator for ARM
  2. GDB

Fortunately, both of them are available in the Ubuntu software repository.

Install them using the following command:

sudo apt install qemu-system-arm
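sudo apt install gdb-arm-none-eabi

Note: On some newer Ubuntu releases, the gdb-arm-none-eabi package is not available. The gdb-multiarch package provides a GDB that can debug ARM targets; launch it as gdb-multiarch instead of arm-none-eabi-gdb.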


QEMU emulates the Cortex-M board, and GDB connects to QEMU to control and inspect the running program.

Launch QEMU

qemu-system-arm -monitor stdio -machine lm3s811evb -cpu cortex-m3 -s -S -kernel startup.elf
  • -monitor stdio
    Access the QEMU monitor from the terminal
  • -machine lm3s811evb -cpu cortex-m3
    Select the lm3s811evb machine (Stellaris LM3S811 evaluation board) and the cortex-m3 CPU
  • -s
    Shorthand for -gdb tcp::1234: start a GDB server on TCP port 1234
  • -S
    Do not start the CPU at launch, so that execution can be started and controlled from GDB
  • -kernel startup.elf
    The ELF file to load into the emulated machine

Launch GDB client

arm-none-eabi-gdb startup.elf

You should now be in GDB interactive console.

Connect to QEMU

Let's connect the GDB client to the GDB server hosted by QEMU:

target remote localhost:1234

Run the program

continue

Press <Ctrl-c> to stop execution.

Check registers

The program loads 10 into r1 and 0x20000000 into r0, stores r1 to that address and reads it back into r2. So r0 should hold 0x20000000, while r1 and r2 should both hold 10.

info reg r0 r1 r2

Should print:

r0 0x20000000 536870912
r1 0xa 10
r2 0xa 10

Check memory

The program writes the value 10 to memory address 0x20000000. Let's check that it worked:

x/4wx 0x20000000

0x20000000: 0x0000000a 0x00000000 0x00000000 0x00000000

STM32F: Calculating APB clock frequency (PCLKx)

The APB clock frequency is determined by a long chain of clock-source selections and prescalers, as shown in the image below:

APB clock source flow

Note: In this post, SYSCLK is taken from the PLL, which is driven by the external oscillator (HSE).

Term     Explanation
HSE      External clock frequency
PLLM     PLL division factor
PLLN     PLL multiplication factor
PLLP     SYSCLK division factor
HPRE     AHB prescaler
PPREx    APBx prescaler


PLL output (VCO) clock

$$tex fVCO = \frac{HSE}{PLLM} * PLLN tex$$

System clock

$$tex SYSCLK = \frac{fVCO}{PLLP} tex$$

AHB clock

$$tex HCLK = \frac{SYSCLK}{HPRE} tex$$

APB clock

$$tex PCLKx = \frac{HCLK}{PPREx} tex$$

An example

Let's consider an external oscillator of frequency 16 MHz, and say we need both SYSCLK and HCLK at 168 MHz. Since HCLK = SYSCLK, the AHB prescaler must be:

>> HPRE = 1

This leaves us with,

$$tex \frac{PLLN}{PLLM * PLLP} = \frac{SYSCLK}{HSE} tex$$
$$tex \frac{PLLN}{PLLM * PLLP} = 10.5 tex$$

We can settle on the following values, which give fVCO = 16 MHz / 16 * 336 = 336 MHz and SYSCLK = 336 MHz / 2 = 168 MHz:

>> PLLN = 336
>> PLLM = 16
>> PLLP = 2  

Now, for a PCLKx of 42 MHz (168 MHz / 4), we can pick:

>> PPREx = 4  

STM32F: What is PCLK and fPCLK

Several peripherals, such as SPI and UART, derive their clock from fPCLK. So which clock is PCLK, and what frequency is fPCLK?

PCLKx is the clock of the corresponding APB bus x. For example:

Clock    Bus
PCLK1    APB1
PCLK2    APB2

Note: Similarly, HCLKx is the clock of the corresponding AHB bus x. For example:

Clock    Bus
HCLK1    AHB1
HCLK2    AHB2

So, when SPI2 says it derives its clock from fPCLK, it means the clock of the APB bus it is attached to. In STM32F407, SPI2 is on APB1, so its fPCLK is fPCLK1.

LPC810: UART baudrate configuration

This post is about UART baudrate configuration in LPC810.

The FRG (fractional rate generator) and the BRG (baud rate generator) are used together to derive the desired baud rate.

Block diagram

Setup BRG

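BRG should produce an output clock of 16 times the desired baud rate. The input clock to BRG is BASECLK, the output of the FRG. BRG divides BASECLK by BRGVAL, and the BRG register stores BRGVAL - 1. Because the FRG can only slow its input clock down, the code first picks BRGVAL from MAINCLK (rounded down) and then lets the FRG trim BASECLK to exactly 16 * Baudrate * BRGVAL.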

$$tex BRGVAL = \frac{BASECLK}{16 * Baudrate} tex$$
//Setup BRG (the BRG register holds the divider minus 1)
LPC_USART0->BRG = MAINCLK / (16 * aBaudRate) - 1;

Setup FRG

The FRG output clock is shared by all UART peripherals.

$$tex UARTFRGMUL = \frac{FRGINCLK*(UARTFRGDIV+1)}{16 * Baudrate * BRGVAL} - (UARTFRGDIV+1) tex$$

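It is easiest to set UARTFRGDIV to 255. Then UARTFRGDIV + 1 = 256 and the FRG divides its input by (1 + UARTFRGMULT / 256). The code below assumes the FRG input clock equals MAINCLK (UARTCLKDIV = 1).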

//Set up FRG (UARTFRGDIV + 1 = 256; BRG + 1 is the BRG divider)
LPC_SYSCON->UARTFRGDIV = 255;
LPC_SYSCON->UARTFRGMULT = ((MAINCLK / 16) * 256) / (aBaudRate * (LPC_USART0->BRG + 1))
    - 256;

Setup clock to FRG

$$tex UARTDIV = \frac{MAINCLK}{FRGINCLK} tex$$
//Setup clock to FRG


The pin and the protocol

SWD (Serial Wire Debug) is a low-pin-count debug and trace port. In its minimal configuration, SWD provides the same debug access as JTAG using just 2 wires.

These two wires are:

  1. SWCLK
  2. SWDIO

SWCLK is the clock for SWDIO, a synchronous, bi-directional, half-duplex channel between the host and the target. Through the SWDIO pin, the host accesses the DAP (Debug Access Port) as defined by the ADI (ARM Debug Interface) specification.

ARM cores provide advanced tracing through blocks such as the ITM (Instrumentation Trace Macrocell), ETM (Embedded Trace Macrocell) and DWT (Data Watchpoint and Trace). Their trace messages must be sent asynchronously from the processor to the host.

In addition to SWCLK and SWDIO, another wire can optionally be added to obtain trace functionality. This wire is called SWO (Serial Wire Output).

SWO is a unidirectional asynchronous pin; trace data flows only from the target to the host. When trace data is sent through the SWO pin, it is called SWV (Serial Wire Viewer). SWV data can be sent over the SWO pin using either UART (NRZ) or Manchester encoding.

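Trace data can also be sent over a parallel trace port (a trace clock plus several data pins). Both the parallel port and SWO are driven by the TPIU (Trace Port Interface Unit).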


In short, SWO is a pin of the SWD port, whereas SWV is the trace protocol and technology carried over the SWO pin.



CPS instruction: Difference between Cortex-M vs others


Cortex-A and Cortex-M families have different interrupt and exception models.

Cortex-A uses the traditional IRQ and FIQ interrupts, whereas Cortex-M uses a vector table supported by the NVIC (Nested Vectored Interrupt Controller).

In Cortex-A, IRQ and FIQ are enabled and disabled using the I and F flags in the CPSR register.

#Enable IRQ interrupt
cpsie i
#Enable FIQ interrupt
cpsie f

#Disable IRQ interrupt
cpsid i
#Disable FIQ interrupt
cpsid f

One can also directly manipulate the I and F flags in the CPSR register using the mrs and msr instructions.

I_BIT = 0x80
F_BIT = 0x40

#Disables IRQ and FIQ interrupts
mrs r0, cpsr
orr r0, r0, #I_BIT|F_BIT
msr cpsr_c, r0

#Enables IRQ and FIQ interrupts
mrs r0, cpsr
bic r0, r0, #I_BIT|F_BIT
msr cpsr_c, r0

In Cortex-M, there is no IRQ/FIQ split. Interrupts are disabled and enabled using the PRIMASK and FAULTMASK registers.

msr PRIMASK, r0
msr FAULTMASK, r0

Even though Cortex-M has neither a CPSR register nor I and F flags, it still has the cpsie and cpsid instructions to enable and disable interrupts and fault exceptions. On Cortex-M microcontrollers, the cps instructions affect PRIMASK and FAULTMASK instead of the CPSR register.

#Enable interrupts and configurable fault handlers (clear PRIMASK)
cpsie i
#Enable interrupts and fault handlers (clear FAULTMASK)
cpsie f

#Disable interrupts and configurable fault handlers (set PRIMASK)
cpsid i
#Disable interrupts and all fault handlers (set FAULTMASK)
cpsid f

Execution modes

Cortex-A processors have several execution modes. The current mode can be read and changed through the 5 least significant bits (M[4:0]) of the CPSR register.

C_BIT = 0x1F
USER_BITS = 0b10000

mrs r0, cpsr
bic r0, #C_BIT
orr r0, #USER_BITS
msr CPSR_c, r0

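Cortex-M processors have only two execution modes: Thread mode and Handler mode. The processor enters Handler mode when it takes an exception and returns to Thread mode on exception return, so software does not select the mode directly; IPSR reads as zero in Thread mode. What software can change through the least significant bit (nPRIV) of the CONTROL register is whether Thread mode runs privileged (0) or unprivileged (1).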

msr CONTROL, r0
isb

Cortex-M Program Status Register

Since M-profile has discarded the IRQ and FIQ exceptions and the Cortex-A style execution modes, the corresponding PSR bits are unnecessary. So, M-profile adopts a new PSR format and registers.

CPSR in non M-profile processors

Mode, I and F bits are meaningless in M-profile microcontrollers. Instead, M-profile adds other fields such as the exception (ISR) number and IT/ICI.

PSR in M-profile processors

It can be read in assembly using the following code:

mrs <rd>, PSR

These bits can also be accessed separately:

  1. APSR: Application Program Status Register
    • ALU flags
    • N, Z, C, V and Q flags
    • mrs <rd>, APSR
  2. IPSR: Interrupt Program Status Register
    • Interrupt/Exception number
    • mrs <rd>, IPSR
  3. EPSR: Exception Program Status Register
    • IT, ICI, T bits
    • mrs <rd>, EPSR (the execution state bits read as zero through mrs)

Cortex-M: Fixed memory map

To aid code portability between Cortex-M microcontrollers from different vendors, the M-profile architectures (ARMv6-M, ARMv7-M, ARMv7E-M, etc.) define a standard fixed memory map.

Area                     Address range            Notes                        STM32F407
Vendor specific          0xE0100000:0xFFFFFFFF
Private peripheral bus   0xE0000000:0xE00FFFFF    NVIC, SysTick, SCB, debug    Cortex-M4 core peripherals
External device          0xA0000000:0xDFFFFFFF    SD card, etc                 FSMC control registers
External RAM             0x60000000:0x9FFFFFFF    DDR, LCD, etc                FSMC banks (SRAM, NOR, NAND)
Peripheral               0x40000000:0x5FFFFFFF    AHB, APB peripherals         UART, Timers, ADC, etc
SRAM                     0x20000000:0x3FFFFFFF    SRAM/SDRAM/Data              Data memory
Code                     0x00000000:0x1FFFFFFF    ROM/Flash/Code               Code ROM memory

Maximum Flash and RAM size

Because of the fixed memory map, the Code (Flash) and SRAM regions each span 0x20000000 bytes (0x00000000:0x1FFFFFFF and 0x20000000:0x3FFFFFFF). This means a maximum size of 512 MB for each.

Cortex-M family

ARM Cortex-M family is one of my favorite micro-controller architectures.

Instruction set: A radical shift

The instruction sets of the ARMv6-M, ARMv7-M and ARMv7E-M architectures are radically different from other versions of the ARM architecture. The biggest difference is that these architectures drop the ARM instruction set altogether and execute only Thumb instructions.

The Thumb-1 instruction set was not a complete ISA; it relied on the ARM instruction set for certain functionality. Thumb-2 lifted this limitation by making Thumb a complete ISA, which allowed the ARM instruction mode to be dropped.

This reduces gate count and, thanks to the mix of 16-bit and 32-bit instructions, gives a large performance gain over the 16-bit-only Thumb-1 instruction set.

Interrupt handling


No shadow registers


No execution modes


Program status registers


No coprocessors


Fixed memory map

To aid code portability between Cortex-M microcontrollers from different vendors, the M-profile architectures (ARMv6-M, ARMv7-M, ARMv7E-M, etc.) define a standard fixed memory map. More information can be found in this post.


  1. Fixed memory map