C - Geek went Freak!


Ubuntu: Cross-compile baremetal Cortex-M assembly program

In this post, we will cross-compile a small baremetal program for an ARM processor on an Ubuntu machine.

ARM cross-compile toolchain

The first step is to install the ARM cross-compiler toolchain. Luckily, Ubuntu already has it in its software repository. Execute the following command in the terminal to install the ARM EABI compatible toolchain:

sudo apt install gcc-arm-none-eabi

Check the version of the installed compiler using the following command:

arm-none-eabi-gcc --version

Sample baremetal program

Now, we need a sample baremetal program to compile. I have chosen a very simple assembly program, saved as startup.S.


.global _start
_start:
  B _reset /* Reset */
  B . /* Undefined */
  B . /* SWI */
  B . /* Prefetch Abort */
  B . /* Data Abort */
  B . /* reserved */
  B . /* IRQ */
  B . /* FIQ */

_reset:
  mov r1, #10
  ldr r0, =0x20000000
  str r1, [r0]
  ldr r2, [r0]
  B .


Let's assemble the file using the GNU assembler.

arm-none-eabi-as -mcpu=cortex-m3 -g startup.S -o startup.out


Finally, let's link the object file startup.out generated by the assembler.

arm-none-eabi-ld -Ttext=0x0 -o startup.elf startup.out

Note: Since the program is very simple, I haven’t used any linker script here.

The -Ttext=0x0 option instructs the linker to place the text section (the instructions) at address 0x0.

Ubuntu: Emulate baremetal Cortex-M program

In this post, we will emulate a baremetal program for Cortex-M on an Ubuntu PC.


We will need

  1. QEMU emulator for ARM
  2. GDB

Fortunately, both of them are available through the Ubuntu software repository.

Install them using the following commands:

sudo apt install qemu-system-arm
sudo apt install gdb-arm-none-eabi


We will use QEMU for emulation. GDB is used to control and inspect QEMU.

Launch QEMU

qemu-system-arm -monitor stdio -machine lm3s811evb -cpu cortex-m3 -s -S -kernel startup.elf
  • -monitor stdio
    Access the QEMU monitor (HMP) from the terminal
  • -machine lm3s811evb -cpu cortex-m3
    Select the machine lm3s811evb and the CPU cortex-m3
  • -s
    Start a GDB server on localhost:1234
  • -S
    Don’t start the CPU at startup, so that we can start and control execution from GDB
  • -kernel startup.elf
    The ELF file to load and execute

Launch GDB client

arm-none-eabi-gdb startup.elf

You should now be in the GDB interactive console.

Connect to QEMU

Let's connect to the GDB server hosted by QEMU from the GDB client:

target remote localhost:1234

Run the program

continue

Press <Ctrl-c> to stop execution.

Check registers

The program loads 10 into r1 and the address 0x20000000 into r0, stores r1 to that address, and reads the value back into r2. So r0, r1 and r2 should hold 0x20000000, 10 and 10 respectively.

info reg r0 r1 r2

Should print:

r0 0x20000000 536870912
r1 0xa 10
r2 0xa 10

Check memory

We wrote the value 10 to memory address 0x20000000. Let's check that it worked correctly:

x/4wx 0x20000000

0x20000000: 0x0000000a 0x00000000 0x00000000 0x00000000

Windows: build C/C++ code from command line

To build C/C++ code from the command line, the cl.exe binary must be in the PATH environment variable.

Let's try to compile a simple program, saved as simple.c:

int add(int a, int b) {
  return a + b;
}

int main(int argc, char const *argv[]) {
  int temp = add(5, 15);
  return 0;
}

cl.exe simple.c

That would throw the following error:

LINK : fatal error LNK1104: cannot open file 'LIBCMT.lib'


The reason is that just having the cl binary in the path isn’t enough. cl also expects some configuration from environment variables (for example, where to find libraries such as LIBCMT.lib). This can be set up by executing the following command:

"C:\Program Files (x86)\Microsoft Visual Studio 14.0\Common7\Tools\VsDevCmd.bat"

Now, the compilation should be smooth:

cl.exe simple.c

Note: This only compiles to 32-bit binaries.

GCC: -fprofile-arcs vs -ftest-coverage

Two flags we use in gcc to enable code coverage are -fprofile-arcs and -ftest-coverage.

So, what exactly do these flags do?

Block graph and source location information

-ftest-coverage generates a .gcno file for each translation unit at compile time. These files hold the block graph and the source line numbers associated with each block, in a platform-independent format.

Profile statistics and counts

Code coverage is measured by instrumenting the target program: the compiler inserts extra code that counts how many times each statement or block has executed.

Functions in libgcov.a are responsible for writing these counts to a .gcda file when the program exits.

The flag -fprofile-arcs adds this instrumentation at compile time and links the libgcov.a library into the executable at link time.

Making sense of the information

Thus, -ftest-coverage generates the .gcno file at compile time, and the code added by -fprofile-arcs generates the .gcda file at run time. The gcov command uses these two files to map the counts in .gcda onto the block graph and source line information in .gcno.
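A small test subject makes the effect of the two flags easier to see. The sketch below is hypothetical (the file name coverage_demo.c and the functions sign_of and sum_signs are made up for illustration, they are not part of GCC). If it is compiled with both flags and driven only with non-negative values, gcov should report the x < 0 branch as never executed.

```c
/* coverage_demo.c: hypothetical test subject for gcov. */
#include <assert.h>
#include <stddef.h>

/* Returns -1, 0 or 1 depending on the sign of x. */
int sign_of(int x) {
    if (x < 0) {
        return -1; /* stays uncovered if no caller passes a negative value */
    }
    if (x == 0) {
        return 0;
    }
    return 1;
}

/* Sums sign_of() over an array of values. */
int sum_signs(const int *values, size_t count) {
    assert(values != NULL);
    int total = 0;
    for (size_t i = 0; i < count; i++) {
        total += sign_of(values[i]);
    }
    return total;
}
```

To try it, call sum_signs from a main function with an array such as {0, 1, 2}, build with -fprofile-arcs -ftest-coverage, run the program once, and then run gcov on the source file.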

Internals of CXCursor exposed by Clang's C interface

clang exposes a part of its functionality to C programs through the clang-c interface.

One of the important data structures in this library is CXCursor. CXCursor is used to navigate through the AST. The clang-c interface provides a lot of functions to interact with CXCursor instances.

Internally CXCursor is made of the following fields:

typedef struct {
   enum CXCursorKind kind;
   int xdata;
   const void *data[3];
} CXCursor;


kind holds the kind of the cursor.


xdata: I have no idea what it is doing.


data is an array of three pointers. Each pointer points to a specific C++ object that holds important information about the cursor.

Decl: data[0]

Decl provides valuable information about declaration statements. It is only available when the cursor is an element of a definition or declaration statement.

SourceLocation: data[1]

SourceLocation provides information about where in the source code the cursor is pointing to.

TranslationUnit: data[2]

The translation unit the cursor belongs to.

Differentiating function declaration vs definition using clang -ast-dump

#include "stdio.h"

extern void deinit();

void init() {
        //TODO initialization goes here
        printf("Initializing...");
}

void deinit() {
        //TODO Deinitialization goes here
        printf("Deinitializing...");
}

int main() {
        init();
        deinit();
}

The extern line only declares the function deinit; the later void deinit() { ... } block defines it.

If we run clang on it, we will get both of those instances in the AST.

clang -fsyntax-only -Xclang -ast-dump -Xclang -ast-dump-filter -Xclang deinit  main.c


Dumping deinit:
FunctionDecl 0x104806320  col:13 deinit 'void ()' extern

Dumping deinit:
FunctionDecl 0x1048065e0 prev 0x104806320  line:10:6 used deinit 'void ()'
`-CompoundStmt 0x1048067a0 
  `-CallExpr 0x104806740  'int'
    |-ImplicitCastExpr 0x104806728  'int (*)(const char *, ...)' 
    | `-DeclRefExpr 0x104806688  'int (const char *, ...)' Function 0x104003d30 'printf' 'int (const char *, ...)'
    `-ImplicitCastExpr 0x104806788  'const char *' 
      `-ImplicitCastExpr 0x104806770  'char *' 
        `-StringLiteral 0x1048066e8  'char [18]' lvalue "Deinitializing..."

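There is no way to query only the definition or only the declaration.

However, we can tell whether a node is a declaration or a definition by checking whether it has a sub-tree: in the dump above, the definition carries a CompoundStmt child for the function body, while the plain declaration has none.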
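The check can be automated on the text of the dump. The following is only a sketch under my own assumptions: the function name function_decl_is_definition is made up, and the parsing relies on the plain-text layout shown above (a FunctionDecl line at column 0, followed by child lines that start with `, | or a space). It looks specifically for a CompoundStmt child, because a declaration with parameters also has children (ParmVarDecl nodes) without being a definition.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if s begins with prefix. */
static int starts_with(const char *s, const char *prefix) {
    return strncmp(s, prefix, strlen(prefix)) == 0;
}

/* Returns a pointer to the start of the next line (or to the final NUL). */
static const char *next_line(const char *line) {
    const char *end = strchr(line, '\n');
    return end ? end + 1 : line + strlen(line);
}

/* Looks at the n-th (0-based) top-level FunctionDecl in an -ast-dump text.
 * Returns 1 if it has a CompoundStmt child (a definition), 0 if it has
 * none (a declaration), and -1 if there is no such FunctionDecl. */
int function_decl_is_definition(const char *dump, int n) {
    assert(dump != NULL && n >= 0);
    int seen = 0;
    for (const char *line = dump; *line != '\0'; line = next_line(line)) {
        if (!starts_with(line, "FunctionDecl")) continue;
        if (seen++ != n) continue;
        /* Walk the sub-tree lines that follow the FunctionDecl line. */
        for (line = next_line(line);
             *line == '`' || *line == '|' || *line == ' ';
             line = next_line(line)) {
            if (starts_with(line, "`-CompoundStmt") ||
                starts_with(line, "|-CompoundStmt")) {
                return 1;
            }
        }
        return 0;
    }
    return -1;
}
```

Fed the two deinit entries from the dump above, index 0 (the extern line) is reported as a declaration and index 1 as a definition.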

Refactoring using clang-rename command

Google has contributed a very useful tool called clang-rename to clang. It can be used to rename symbols in C/C++ files.

Let's consider a project with the following two files:


main.c:

#include "stdio.h"

void setup(void) {
        printf("Setting up...\n");
}

static void priv_init(void) {
        printf("Module init for main...\n");
}

void execute(void);

int main() {
        setup();

        return 0;
}

execute.c:

#include "stdio.h"

static void priv_setup() {
        printf("Module init for execute...\n");
}

void setup(void);

void execute() {
        while(1) {
                setup();
                priv_setup();
        }
}

We want to refactor two things in these files:

  1. Rename a global function setup to init in all files
  2. Rename a local function priv_setup in execute.c to execute_init in only that file

Rename global symbols

clang-rename -i -pl -pn -new-name=init -offset=26 execute.c main.c


clang-rename: renamed at: execute.c:7:6
clang-rename: renamed at: execute.c:11:3
clang-rename: renamed at: main.c:3:6
clang-rename: renamed at: main.c:14:2

Rename local symbols

clang-rename -i -pl -pn -new-name=execute_init -offset=33 main.c execute.c


clang-rename: renamed at: execute.c:3:13
clang-rename: renamed at: execute.c:12:3

It refactored the files like magic.

One slight problem is that it takes the offset as an absolute character offset from the start of the file, which can be a bit hard to work out. I would find a line:column position easier.

But here is a way to find the character offset using some trial and error:

head -c 33 execute.c

head -c N prints the first N bytes of a file, so you can adjust the number until the output ends at the symbol you want to rename.
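Since clang-rename reports positions as line:column, the conversion to an offset can also be computed instead of guessed. The helper below is a minimal sketch of that idea; offset_of is a name I made up, and it assumes that columns are counted in bytes (so a tab is one column) and that lines end with a single '\n'.

```c
#include <assert.h>
#include <stddef.h>

/* Converts a 1-based line and column into a 0-based byte offset into text.
 * Returns -1 if the line does not exist or the column runs past the line. */
long offset_of(const char *text, int line, int column) {
    assert(text != NULL);
    if (line < 1 || column < 1) {
        return -1;
    }
    int current_line = 1;
    long offset = 0;
    /* Skip whole lines until the requested one starts. */
    while (current_line < line) {
        if (text[offset] == '\0') {
            return -1;
        }
        if (text[offset] == '\n') {
            current_line++;
        }
        offset++;
    }
    /* Advance within the line, one byte per column. */
    for (int col = 1; col < column; col++) {
        if (text[offset] == '\0' || text[offset] == '\n') {
            return -1;
        }
        offset++;
    }
    return offset;
}
```

For a file that starts like execute.c, position 3:13 maps to offset 32, the first letter of priv_setup. The command above used 33, which still lands inside the same identifier.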

Generate AST using clang

clang provides an easy way to get the AST of a source file. Two flags print the AST to stdout. They are

  1. -ast-dump
  2. -ast-print

Let's consider the following source code, for example:

#include "stdio.h"

void init() {
	//TODO initialization goes here
	printf("Initializing...");
}

void deinit() {
	//TODO Deinitialization goes here
	printf("Deinitializing...");
}

int main() {
	init();
	deinit();
}

This command dumps the AST.

clang -fsyntax-only -Xclang -ast-dump main.c


TranslationUnitDecl 0x10302eac0 <> 
<< ... Some output has been cut out ... >>
|-FunctionDecl 0x1030d0320  line:3:6 used init 'void ()'
| `-CompoundStmt 0x1030d04d0 
|   `-CallExpr 0x1030d0470  'int'
|     |-ImplicitCastExpr 0x1030d0458  'int (*)(const char *, ...)' 
|     | `-DeclRefExpr 0x1030d03c8  'int (const char *, ...)' Function 0x1030bb930 'printf' 'int (const char *, ...)'
|     `-ImplicitCastExpr 0x1030d04b8  'const char *' 
|       `-ImplicitCastExpr 0x1030d04a0  'char *' 
|         `-StringLiteral 0x1030d03f0  'char [16]' lvalue "Initializing..."
|-FunctionDecl 0x1030d0510  line:8:6 used deinit 'void ()'
| `-CompoundStmt 0x1030d06d0 
|   `-CallExpr 0x1030d0670  'int'
|     |-ImplicitCastExpr 0x1030d0658  'int (*)(const char *, ...)' 
|     | `-DeclRefExpr 0x1030d05b8  'int (const char *, ...)' Function 0x1030bb930 'printf' 'int (const char *, ...)'
|     `-ImplicitCastExpr 0x1030d06b8  'const char *' 
|       `-ImplicitCastExpr 0x1030d06a0  'char *' 
|         `-StringLiteral 0x1030d0618  'char [18]' lvalue "Deinitializing..."
`-FunctionDecl 0x1030d0710  line:13:5 main 'int ()'
  `-CompoundStmt 0x1030d08b0 
    |-CallExpr 0x1030d0820  'void'
    | `-ImplicitCastExpr 0x1030d0808  'void (*)()' 
    |   `-DeclRefExpr 0x1030d07b8  'void ()' Function 0x1030d0320 'init' 'void ()'
    `-CallExpr 0x1030d0888  'void'
      `-ImplicitCastExpr 0x1030d0870  'void (*)()' 
        `-DeclRefExpr 0x1030d0848  'void ()' Function 0x1030d0510 'deinit' 'void ()'

That is very powerful output. It can be (and is) used in many IDEs to aid Go to definition and Go to declaration features.

One annoying thing about this output is that it also dumps a lot of less interesting declarations from the included standard headers. I haven’t yet found a way to turn this off. But clang does support filtering the output by name using the flag -ast-dump-filter.

clang -fsyntax-only -Xclang -ast-dump -Xclang -ast-dump-filter -Xclang init  main.c


Dumping init:
FunctionDecl 0x103079f20  line:3:6 used init 'void ()'
`-CompoundStmt 0x10307a0d0 
  `-CallExpr 0x10307a070  'int'
    |-ImplicitCastExpr 0x10307a058  'int (*)(const char *, ...)' 
    | `-DeclRefExpr 0x103079fc8  'int (const char *, ...)' Function 0x103035b30 'printf' 'int (const char *, ...)'
    `-ImplicitCastExpr 0x10307a0b8  'const char *' 
      `-ImplicitCastExpr 0x10307a0a0  'char *' 
        `-StringLiteral 0x103079ff0  'char [16]' lvalue "Initializing..."

Dumping deinit:
FunctionDecl 0x10307a110  line:8:6 used deinit 'void ()'
`-CompoundStmt 0x10307a2d0 
  `-CallExpr 0x10307a270  'int'
    |-ImplicitCastExpr 0x10307a258  'int (*)(const char *, ...)' 
    | `-DeclRefExpr 0x10307a1b8  'int (const char *, ...)' Function 0x103035b30 'printf' 'int (const char *, ...)'
    `-ImplicitCastExpr 0x10307a2b8  'const char *' 
      `-ImplicitCastExpr 0x10307a2a0  'char *' 
        `-StringLiteral 0x10307a218  'char [18]' lvalue "Deinitializing..."

As you can see, it filters by name only, and the match is a substring match: the filter init also matches deinit. There is no way to use a regular expression or to filter by type. This is annoying!

-ast-print builds an AST and then prints it back as source code. It can be used to extract a function or declaration for quick peeks.

clang -fsyntax-only -Xclang -ast-print -Xclang -ast-dump-filter -Xclang init  main.c


Printing init:
void init() {

Printing deinit:
void deinit() {

objdump: Disassemble specific function

objdump can be used to view and analyze disassembled code. Let's consider the following code:

#include <stdio.h>

int add(int a, int b) { return a + b; }

int main(void) {
  int lVal = add(5, 10);
  printf("%d\n", lVal);
  return 0;
}

Disassembled code can be viewed using the following command:

objdump -d a.out

The above command outputs:

a.out: file format mach-o-x86-64

Disassembly of section .text:

0000000100000f10 <_add>:
   100000f10: 55                      push   %rbp
   100000f11: 48 89 e5                mov    %rsp,%rbp
   …
   100000f22: 5d                      pop    %rbp
   100000f23: c3                      retq
   100000f24: 66 66 66 2e 0f 1f 84    data16 data16 nopw %cs:0x0(%rax,%rax,1)
   100000f2b: 00 00 00 00 00

0000000100000f30 <_main>:
   100000f30: 55                      push   %rbp
   100000f31: 48 89 e5                mov    %rsp,%rbp
   …
   100000f69: 48 83 c4 10             add    $0x10,%rsp
   100000f6d: 5d                      pop    %rbp
   100000f6e: c3                      retq

Disassembly of section TEXT.stubs:

0000000100000f70 <TEXT.stubs>:
   100000f70: ff 25 9a 00 00 00       jmpq   *0x9a(%rip)        # 100001010 <_main+0xe0>

Disassembly of section TEXT.stub_helper:

0000000100000f78 <TEXT.stub_helper>:
   100000f78: 4c 8d 1d 89 00 00 00    lea    0x89(%rip),%r11        # 100001008 <_main+0xd8>
   100000f7f: 41 53                   push   %r11
   100000f81: ff 25 79 00 00 00       jmpq   *0x79(%rip)        # 100001000 <_main+0xd0>
   100000f87: 90                      nop
   100000f88: 68 00 00 00 00          pushq  $0x0
   100000f8d: e9 e6 ff ff ff          jmpq   100000f78 <_main+0x48>

Disassembly of section TEXT.unwind_info:

0000000100000f98 <TEXT.unwind_info>:
   100000f98: 01 00                   add    %eax,(%rax)
   100000f9a: 00 00                   add    %al,(%rax)
   …
   100000fdc: 00 00                   add    %al,(%rax)
   100000fde: 00 01                   add    %al,(%rcx)

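But the problem is that objdump -d disassembles whole sections. At least in the version used here, it has no simple option to disassemble a single function by name. It can limit the output to an address range with --start-address and --stop-address, but then you first have to look up the addresses of the function. For example, there is no direct way to see only the disassembled code for the add function.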

To achieve this, gdb is a better option.

gdb provides disassemble command to print disassembled code.

gdb a.out
disassemble add

This opens gdb in interactive mode. To display disassembled code directly:

gdb --batch --ex 'file a.out' --ex 'disassemble add'

This provides only the requested function:

Dump of assembler code for function add:
   0x0000000100000f10 <+0>:     55          push   %rbp
   0x0000000100000f11 <+1>:     48 89 e5    mov    %rsp,%rbp
   0x0000000100000f14 <+4>:     89 7d fc    mov    %edi,-0x4(%rbp)
   0x0000000100000f17 <+7>:     89 75 f8    mov    %esi,-0x8(%rbp)
   0x0000000100000f1a <+10>:    8b 75 fc    mov    -0x4(%rbp),%esi
   0x0000000100000f1d <+13>:    03 75 f8    add    -0x8(%rbp),%esi
   0x0000000100000f20 <+16>:    89 f0       mov    %esi,%eax
   0x0000000100000f22 <+18>:    5d          pop    %rbp
   0x0000000100000f23 <+19>:    c3          retq
End of assembler dump.

GCC Linker script

In this post, we will find out how linker scripts handle RAM sections.

We know that RAM is volatile. Hence, neither the linker nor the flash programmer can store anything in RAM permanently. Anything that needs to be in RAM must be stored in persistent memory and loaded at OS/application startup.

In embedded systems, the persistent memory is usually flash. The initialization code copies the RAM data from the flash to the RAM before invoking main.

RAM usually contains 4 types of data:

  1. Initialized global and static variables (.data)
  2. Uninitialized or zero initialized global and static variables (.bss)
  3. Heap (.heap)
  4. Stack (.stack)

Both heap and stack are runtime oriented. The compiler doesn’t provide any guarantees about the initial contents of these sections. The .bss section is initialized to zero, so the linker doesn’t have to store any data for it. The initialization code simply clears the .bss section in RAM.
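The two startup steps described so far (copy .data from flash to RAM, clear .bss) can be sketched in C. This is only an illustration, not the startup code of any particular vendor: the function names copy_data and zero_bss are made up, and the addresses are passed in as parameters so the logic can be tried on a PC with ordinary arrays standing in for flash and RAM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copies the initial values of .data from their load address (flash)
 * to their run address (RAM), one 32-bit word at a time. */
void copy_data(const uint32_t *load_start, uint32_t *run_start,
               uint32_t *run_end) {
    assert(load_start != NULL && run_start != NULL && run_start <= run_end);
    while (run_start < run_end) {
        *run_start++ = *load_start++;
    }
}

/* Clears .bss so that zero-initialized variables really start at zero. */
void zero_bss(uint32_t *bss_start, uint32_t *bss_end) {
    assert(bss_start != NULL && bss_start <= bss_end);
    while (bss_start < bss_end) {
        *bss_start++ = 0;
    }
}
```

On a real target, the reset handler would call both functions before main, with the start and end addresses taken from symbols defined in the linker script.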

The .data section is the most interesting section. The data required to initialize this section is stored in flash and copied to RAM during startup. Let's see how this is set up in a linker script.

Consider the following memory map:

	  FLASH (rx) : ORIGIN = 0x08000000, LENGTH = 1024K
	  RAM (xrw) : ORIGIN = 0x20000000, LENGTH = 256K

The tricky part with the .data section is that the space it requires must be allocated in RAM, while the data that initializes it must be stored in flash. One could write linker script output sections like this:

.ramdata :

.data :
} >RAM

What this basically does is duplicate the input data sections in both flash (ramdata) and RAM (data). Even though this works, it creates a huge hex file. It might also cause problems if flash routines are used to write to RAM addresses during programming.

The GNU linker script language provides a neat way to achieve this. Enter the “AT” keyword! Using “AT”, we can ask the linker to allocate space in one place (typically RAM) and actually store the data in another place (typically flash). In linker terms, the section gets its run address (VMA) in RAM and a separate load address (LMA) in flash.

} >RAM

If you would like to specify a memory area instead of an address, this syntax can be used:

.ramdata :