Back in Cambridge, I decided to focus on assembly code generation, which is the last layer of compilation. Several things have to be done to create a new target backend.
# TODO
# In /asmcomp/
This directory contains, for each architecture, a sub-directory that implements the architecture-specific code. I created my own xtensa sub-directory, which contains the following files:
- emit.mlp: a pre-processed OCaml file (which syntax highlighting has a lot of trouble with, by the way). It implements asmcomp/emit.mli and translates Linearize.fundecl code into assembly. This is obviously architecture-specific, and I worked on it by roughly translating what was done for ARM.
- arch.ml: defines architecture-dependent values such as endianness and addressing modes.
- proc.ml: describes registers, calling conventions and the side effects of instructions on registers. Used by the register allocator.
- selection.ml: operation and addressing selection, overriding the default behavior. Useful because Xtensa doesn't have double-precision hardware floating point, for example.
- scheduling.ml: instruction timing hints.
- CSE.ml: common subexpression elimination. Set to default.
- reload.ml: instruction reloading. Set to default.
# In /asmrun/
- xtensa.S: architecture-specific, handwritten assembly code that acts as the glue between C and OCaml code. It also handles calls to the garbage collector.
# Progress
# Writing code
Last week I finished filling in emit.mlp and proc.ml so that I could start debugging. When linking failed, I realized that I had forgotten to fill in the assembly stubs in xtensa.S.
There are quite a few entry points to fill in:
- caml_call_gc: call the runtime garbage collector
- caml_alloc1: allocate 4 bytes
- caml_alloc2: allocate 8 bytes
- caml_allocN: allocate N-4 bytes, with N given in a register
- caml_c_call: call a C function
- caml_start_program: entry point after caml runtime startup
- caml_callback_exn: callback from C to OCaml with one argument
- caml_callback2_exn: callback from C to OCaml with two arguments
- caml_callback3_exn: callback from C to OCaml with three arguments
- trap_handler: callback from exception
- caml_raise_exn: raise an exception from OCaml
- caml_raise_exception: raise an exception from C
# Linking it
The process is not that straightforward, because compiling and linking for the ESP32 relies on Espressif's IoT Development Framework (ESP-IDF), which contains the linker script and the required libraries. The ~easiest~ way I have found so far to get some OCaml native code running on the ESP32 is the following:
- ocamlopt-esp32 test.ml -o main.o -S -dstartup will generate two assembly files and fail on linking:
  - main.s is the main source code
  - main.o.startup.s is the startup code, which will then call the main.s entry point.
- Create startup-c.c, which will be the glue between the ESP-IDF entry point app_main and the OCaml runtime entry point caml_main.
- Put all these files in an ESP-IDF component subdirectory of a project, for example hello_caml/main/.
- Put the library files generated by the compilation of ocaml-esp32 in a lib directory, hello_caml/lib/:
  - libasmrun.a
  - libstdlib.a
  - std_exit.o
- Create a relocatable object file startup-c.o from startup-c.c, main.s and main.o.startup.s.
- Add the libraries in the component Makefile through COMPONENT_ADD_LDFLAGS and COMPONENT_EXTRA_INCLUDES.
- make
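The startup-c.c glue mentioned in the steps above could look roughly like the sketch below. It is only an illustration under assumptions: the caml_main signature is the one I would expect from the OCaml 4.x runtime headers, the HAVE_OCAML_RUNTIME macro is a hypothetical switch, and the stand-in caml_main exists only so that the file also builds on a host without libasmrun.a.

```c
#include <stddef.h>

#ifdef HAVE_OCAML_RUNTIME   /* hypothetical macro, defined when linking libasmrun.a */
/* Provided by the OCaml runtime; signature assumed from the OCaml 4.x headers. */
extern void caml_main(char **argv);
#else
/* Host-only stand-in: records the call instead of starting a runtime. */
static int caml_main_calls = 0;
static char **caml_main_argv = NULL;

void caml_main(char **argv)
{
    caml_main_calls++;
    caml_main_argv = argv;
}
#endif

/* ESP-IDF calls app_main once the system has booted. */
void app_main(void)
{
    /* The runtime expects a NULL-terminated argv; the program name is arbitrary. */
    static char *argv[] = { "ocaml", NULL };
    caml_main(argv);
}
```

On the target only app_main and the extern declaration matter; caml_main then runs the startup code from main.o.startup.s, which in turn reaches the entry point in main.s.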
# Debugging stuff
- I use QEMU for debugging. This GitHub repository explains how to do it. It works out of the box with the gdb shipped with the repository.
- ESP32 WROVER kits have a JTAG interface, which will allow me to test my code on real hardware once it works on QEMU.
# Funny stuff encountered
# Conditional branches don't have legs
The conditional branch has a range of ±128 bytes. My generated code tried to jump further, which produced the error "jump target out of range; no usable trampoline found". As I often need to go far away, I had to put a jump instruction close to the conditional branch. The jump to a label has a range of ±131075 bytes. If that's not enough, I can address the whole space with a jump to an address held in a register.
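As a rough illustration of that workaround, here is a small C sketch that picks a branch form from a byte offset and prints the corresponding assembly. The two ranges are the figures quoted above, not values checked against the ISA manual; the function names, the a2/a3 operands and the skip label are hypothetical. A backend that does not know offsets at emission time would simply always use the second form.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define COND_BRANCH_RANGE 128L     /* bytes, figure quoted in the text */
#define JUMP_RANGE        131075L  /* bytes, figure quoted in the text */

enum branch_form {
    BRANCH_DIRECT,     /* one conditional branch reaches the target */
    BRANCH_OVER_JUMP,  /* inverted condition skips over a "j" to the target */
    BRANCH_OVER_JX     /* inverted condition skips over a register jump */
};

/* Pick the cheapest form able to cover the given byte offset. */
enum branch_form choose_branch_form(long offset)
{
    long distance = labs(offset);
    if (distance <= COND_BRANCH_RANGE) return BRANCH_DIRECT;
    if (distance <= JUMP_RANGE) return BRANCH_OVER_JUMP;
    return BRANCH_OVER_JX;
}

/* Write the assembly for "if (a2 == a3) goto target" into buf.
   skip_label is a fresh local label supplied by the caller (hypothetical).
   Returns the snprintf result, or -1 for the register-jump form, which
   this sketch does not emit. */
int emit_branch_eq(char *buf, size_t size, const char *target,
                   const char *skip_label, long offset)
{
    assert(buf != NULL && target != NULL && skip_label != NULL);
    switch (choose_branch_form(offset)) {
    case BRANCH_DIRECT:
        return snprintf(buf, size, "\tbeq\ta2, a3, %s\n", target);
    case BRANCH_OVER_JUMP:
        /* Branch on the opposite condition over an unconditional jump. */
        return snprintf(buf, size, "\tbne\ta2, a3, %s\n\tj\t%s\n%s:\n",
                        skip_label, target, skip_label);
    default:
        return -1;
    }
}
```

The cost of the second form is one extra instruction per conditional, which is the price of never hitting the range error.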
# Never look forward
The PC-relative load has a range of [-262141, -4]. Therefore the data must be placed before every load and store instruction that refers to it. The assembler handles this on its own when compiling a single file, but the linker doesn't seem to handle it well across files. I had to add extra symbols.
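The constraint can be written down as a one-line check. This is a minimal sketch that takes the range quoted above at face value and ignores any alignment rule the real instruction may apply; the function name and the example addresses are made up.

```c
#include <stdbool.h>
#include <stdint.h>

/* Offset range of the PC-relative load, as quoted in the text (bytes). */
#define LOAD_MIN_OFFSET (-262141LL)
#define LOAD_MAX_OFFSET (-4LL)

/* True if a literal stored at literal_addr can be reached by a
   PC-relative load located at load_addr. */
bool literal_reachable(uint32_t load_addr, uint32_t literal_addr)
{
    int64_t offset = (int64_t)literal_addr - (int64_t)load_addr;
    return offset >= LOAD_MIN_OFFSET && offset <= LOAD_MAX_OFFSET;
}
```

A literal placed after the load, at any positive offset, is never reachable, which is why the data has to come first.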
# What you see is not what you get
Xtensa processors can have a feature called "windowed registers". It lets a processor have a given number of physical registers (64) while only a contiguous subset of them (16) is visible at any given time.
On a call, you can ask the processor to move this window to the right by a number of registers: 0, 4, 8 or 12. There are special instructions that magically handle the fact that this window can overflow, by spilling registers to stack memory.
That makes the ABI a bit special, as the a8 register of the caller is the a0 register of the callee if the call8 instruction is used.
Using call4, call8 and call12 is compatible, as the entry instruction handles everything for you. However, call0 is not compatible with entry: as the documentation explains, it throws an IllegalInstruction exception. Guess what? I wanted to start with the call0 ABI as it's simpler to reason about, but C code is compiled against the call8 ABI.
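To make the renaming concrete, here is a toy C model of the window. It only captures the arithmetic described above (64 physical registers, 16 visible, a shift of 0, 4, 8 or 12); the function names are hypothetical, and spilling on overflow is left out entirely.

```c
#include <assert.h>

#define PHYSICAL_REGS 64  /* total registers, figure from the text */
#define VISIBLE_REGS  16  /* registers visible at any instant */

/* Physical register behind visible register a<n> when the window
   starts at physical index window_start. The file wraps around. */
unsigned physical_reg(unsigned window_start, unsigned n)
{
    assert(n < VISIBLE_REGS);
    return (window_start + n) % PHYSICAL_REGS;
}

/* Window start seen by the callee after a call that shifts by `shift`. */
unsigned rotate_window(unsigned window_start, unsigned shift)
{
    assert(shift == 0 || shift == 4 || shift == 8 || shift == 12);
    return (window_start + shift) % PHYSICAL_REGS;
}

/* Name under which the callee sees the caller's a<n>,
   or -1 if that register is hidden from the callee. */
int callee_name(unsigned n, unsigned shift)
{
    assert(n < VISIBLE_REGS);
    return n >= shift ? (int)(n - shift) : -1;
}
```

With a shift of 8, callee_name(8, 8) is 0: the caller's a8 and the callee's a0 are the same physical register, while the caller's a0 to a7 are out of the callee's sight.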