How I made RBA

Programming knowledge may be required!

What is RBA?

RBA, short for Really Bad Assembly, is an experimental programming language I wanted to develop. To make it easier for myself, I kept the instruction set very simple, which makes it feel almost like an “assembly language”.

Specification

The Really Bad Assembly specification is available on the GitHub page, but I will briefly cover it here as well.

All variables in RBA already exist, are set to 0, and are of type u64. A valid variable name is any set of lowercase characters. RBA has two types of constants: a number constant and a string pointer constant (like a C string). The & operator dereferences a value, returning the value at the memory address. Not confusing at all, right?
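The "already exists and starts at 0" rule can be modelled with a map that falls back to 0 on a miss. This is only a minimal sketch, not RBA's actual storage; the Vars type and its get and set methods are names made up for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical variable store: every name reads as 0 until it is written.
#[derive(Default)]
struct Vars {
    values: HashMap<String, u64>,
}

impl Vars {
    /// Reading a variable that was never written yields 0.
    fn get(&self, name: &str) -> u64 {
        self.values.get(name).copied().unwrap_or(0)
    }

    /// Writing a variable stores the new u64 value under its name.
    fn set(&mut self, name: &str, value: u64) {
        self.values.insert(name.to_string(), value);
    }
}
```

With this shape, a program never has to declare a variable: Vars::default().get("am") is simply 0.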

All C functions are automatically made available to the JIT compiler, and a few basic ones are supported by the interpreter.

Example programs

Why don’t we look at two example programs for this language? First up, the echo program.

INC io;

CALL malloc 1024 Z;
CALL read 2, Z, 40 am; 
CALL write 0, Z, am;

This program begins by including the IO module. It then allocates 1024 bytes and stores the pointer in variable Z. Next it calls the read function on pipe 2, which corresponds to stdin, and stores the number of bytes read in am. Finally it writes the content at Z to pipe 0, which corresponds to stdout.

Let’s look at a Fibonacci program.

INC io;

CALL malloc 64 Z;
CALL read 2, Z, 40;
CALL atol Z it;

ADD it 1;

MOV 1 X;
MOV 0 Z;

LABEL: loop;

MOV X Y;
MOV Z X;
ADD Z Y;

SUB it 1;
JNZ it loop;

OUT X;

This program begins by reading in a number the same way the echo program did. It then uses atol to convert the string to a number, which it stores in the variable it.

The program continues by setting X=1 and Z=0 (Y is already 0 by default) and establishing the label loop. Each pass through the loop performs the classic Fibonacci step and then subtracts one from the variable it, which holds the input number plus one. After the decrement, the program jumps back to the label loop if it is not 0. Because of the earlier ADD it 1, the loop runs one more time than the input number, which leaves the Fibonacci number at that index in X (counting from F(0) = 0), and OUT X prints it.
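To check the loop by hand, here is the same program transcribed line by line into Rust. It is a sketch for following the arithmetic, not part of RBA; the function name rba_fib is made up, the input parameter stands in for the value returned by atol, and wrapping addition is an assumption about how RBA treats overflow.

```rust
/// Mirrors the RBA Fibonacci program step by step.
/// `input` stands in for the number parsed by `atol`.
fn rba_fib(input: u64) -> u64 {
    let mut it = input + 1; // ADD it 1;
    let mut x: u64 = 1; // MOV 1 X;
    let mut z: u64 = 0; // MOV 0 Z;

    loop {
        // LABEL: loop;
        let y = x; // MOV X Y;
        x = z; // MOV Z X;
        z = z.wrapping_add(y); // ADD Z Y;  (wrapping is an assumption)
        it -= 1; // SUB it 1;
        if it == 0 {
            break; // JNZ it loop; falls through once `it` reaches 0
        }
    }
    x // OUT X;
}
```

For an input of 10 the loop runs 11 times and rba_fib(10) returns 55.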

How does the language work?

Step 1: Dividing into tokens

The language uses the parser to convert the text input into a list of instructions, each one a variant of the following enum.

#[derive(Clone, Debug)]
#[repr(u8)]
pub enum AsmIns {
    Include(Label),
    Move(Val, Var),
    Swap(Var, Var),
    Add(Var, Val),
    Sub(Var, Val),
    Mul(Var, Val),
    Div(Var, Val),
    Mod(Var, Val),
    Label(Label),
    JZ(Val, Label),
    JNz(Val, Label),
    TakeInput,
    CopyInput,
    Nop,
    Output(Val),
    Call(Label, Vec<Val>, Option<Var>),
    Function(Label, Vec<AsmIns>)
}

This enum is a complete description of every possible instruction in the language.
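To make the text-to-enum step concrete, here is a rough sketch of a parser for a handful of instructions, using the syntax seen in the Fibonacci example. It is not the real RBA parser: the Ins and Val types are simplified stand-ins for AsmIns, Val, and Var, and the functions parse_val, parse_ins, and parse_program are hypothetical names.

```rust
/// Simplified stand-in for the article's `Val` type.
#[derive(Debug, Clone, PartialEq)]
enum Val {
    Const(u64),
    Var(String),
}

/// Simplified stand-in for `AsmIns`, covering only a few instructions.
#[derive(Debug, Clone, PartialEq)]
enum Ins {
    Move(Val, String),
    Add(String, Val),
    Sub(String, Val),
    Label(String),
    JNz(Val, String),
    Output(Val),
}

/// Numbers become constants; anything else is treated as a variable name.
fn parse_val(tok: &str) -> Val {
    match tok.parse::<u64>() {
        Ok(n) => Val::Const(n),
        Err(_) => Val::Var(tok.to_string()),
    }
}

/// Parses one statement (without its `;`); returns `None` for unknown input.
fn parse_ins(stmt: &str) -> Option<Ins> {
    let toks: Vec<&str> = stmt.split_whitespace().collect();
    match toks.as_slice() {
        ["MOV", src, dst] => Some(Ins::Move(parse_val(src), dst.to_string())),
        ["ADD", dst, src] => Some(Ins::Add(dst.to_string(), parse_val(src))),
        ["SUB", dst, src] => Some(Ins::Sub(dst.to_string(), parse_val(src))),
        ["LABEL:", name] => Some(Ins::Label(name.to_string())),
        ["JNZ", cond, target] => Some(Ins::JNz(parse_val(cond), target.to_string())),
        ["OUT", src] => Some(Ins::Output(parse_val(src))),
        _ => None,
    }
}

/// Splits source text on `;` and parses each non-empty statement.
/// The whole program is rejected if any statement fails to parse.
fn parse_program(src: &str) -> Option<Vec<Ins>> {
    src.split(';')
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .map(parse_ins)
        .collect()
}
```

Under these assumptions, parse_program("MOV 1 X; OUT X;") yields a Move followed by an Output.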

Step 2: Interpreting the language

The language has preprocessing, so interpreting it is as simple as looping through the instructions and executing them one by one. Each instruction can return one of 3 results.

#[derive(Clone, Eq, PartialEq, Hash)]
enum InsResult {
    Failure,
    Success,
    Rewind(Label)
}

A failure instantly quits the program, a success has no effect, and a rewind jumps to one of the stored label positions in the program’s memory.

The program keeps 3 HashMaps in memory.

  • Label lookup hashmap
  • Function hashmap
  • Variable hashmap

Running a single instruction uses the following method.

unsafe fn run_ins(ins: &AsmIns, rgs: &mut HashMap<String, Word>, funcs: &HashMap<String, *const u8>) -> InsResult

This is the interpreting loop.

pub unsafe fn execute(ins: &[AsmIns], provider: impl ModuleProvider) {
    let mut regs = HashMap::new();

    let mut lookup = HashMap::new();
    let mut func = HashMap::new();
    func.insert(String::from("malloc"), libc::malloc as *const u8);
    func.insert(String::from("atol"), libc::atol as *const u8);

    // Code to add modules - hidden

    // Index of the next instruction to execute
    let mut idx = 0;
    loop {
        if idx >= ins.len() { break; }
        let ir = run_ins(&ins[idx], &mut regs, &func);

        match ir {
            InsResult::Rewind(pos) => { idx = *lookup.get(&pos).expect("Invalid jump"); }
            _ => { idx += 1; }
        }
    }
}
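For readers who want something they can run, here is a self-contained miniature of the same loop. It is a sketch under simplifying assumptions rather than the real interpreter: Op, Step, step, and run are hypothetical names, the instruction set is reduced to a few operations, there are no modules or C functions, wrapping arithmetic is assumed, and a failure is handled by breaking out of the loop.

```rust
use std::collections::HashMap;

/// Tiny, hypothetical instruction set for the sketch.
enum Op {
    Set(String, u64),       // variable = constant
    AddVar(String, String), // dst += src
    SubConst(String, u64),  // dst -= constant
    Label(String),
    JNz(String, String),    // jump to label if variable != 0
    Fail,                   // always fails, to show the early exit
}

/// Same three outcomes as the article's `InsResult`.
enum Step {
    Failure,
    Success,
    Rewind(String),
}

/// Executes one instruction; unset variables read as 0.
fn step(op: &Op, vars: &mut HashMap<String, u64>) -> Step {
    match op {
        Op::Set(dst, n) => {
            vars.insert(dst.clone(), *n);
            Step::Success
        }
        Op::AddVar(dst, src) => {
            let add = vars.get(src).copied().unwrap_or(0);
            let slot = vars.entry(dst.clone()).or_insert(0);
            *slot = slot.wrapping_add(add);
            Step::Success
        }
        Op::SubConst(dst, n) => {
            let slot = vars.entry(dst.clone()).or_insert(0);
            *slot = slot.wrapping_sub(*n);
            Step::Success
        }
        Op::Label(_) => Step::Success,
        Op::JNz(cond, target) => {
            if vars.get(cond).copied().unwrap_or(0) != 0 {
                Step::Rewind(target.clone())
            } else {
                Step::Success
            }
        }
        Op::Fail => Step::Failure,
    }
}

/// Runs a program and returns the final variable map.
fn run(program: &[Op]) -> HashMap<String, u64> {
    let mut vars = HashMap::new();

    // Pre-pass: remember the index of every label.
    let mut lookup: HashMap<String, usize> = HashMap::new();
    for (i, op) in program.iter().enumerate() {
        if let Op::Label(name) = op {
            lookup.insert(name.clone(), i);
        }
    }

    let mut idx = 0;
    while idx < program.len() {
        match step(&program[idx], &mut vars) {
            Step::Failure => break,
            Step::Success => idx += 1,
            Step::Rewind(label) => match lookup.get(&label) {
                Some(&pos) => idx = pos,
                None => break, // unknown label: stop instead of panicking
            },
        }
    }
    vars
}
```

A countdown that sets it to 3 and adds it into sum on every pass ends with sum equal to 6 and it equal to 0.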

Step 3: Module System

The module system of the language is quite dynamic and actually pretty easy to write in plain Rust! For example, this is how the basic std module is implemented.

#[module(std)]
impl Std {
    fn printc(val: Word) { println!("{val}") }
    fn printa(val: Word) { print!("{}", char::from_u32(val as u32).unwrap()) }
    fn top_8(val: Word) -> Word { val >> 56 }
    fn addr_8(val: Addr) -> Word {
        unsafe {
            let ptr: *mut u8 = std::mem::transmute(val);
            ptr.read() as Word
        }
    }
}

That was actually quite simple, but what exactly is happening here?

Every module must implement the module trait, which has only one required function: it returns an iterator of names and function pointers that the language can call. To save the time required to manually write out and cast each and every function, the crate has an attribute macro, #[module], that generates this for you! While the code for the macro is outside of the scope of this article, you can see the code here!
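To give a feel for what the macro saves you from writing, here is a hand-written version of such a module. Everything in it is an assumption made for illustration: the trait name Module, its functions method, the Bits module, the low_8 helper, the register function, the use of extern "C" for the callable functions, and Word being a u64. The real trait in the crate may look different.

```rust
use std::collections::HashMap;

type Word = u64; // assumption: Word is a 64-bit unsigned integer

/// Hypothetical shape of the module trait: one method that yields
/// (name, function pointer) pairs.
trait Module {
    fn functions(&self) -> Box<dyn Iterator<Item = (String, *const u8)>>;
}

/// Example module with two small functions.
struct Bits;

// `extern "C"` is an assumption so the pointers have a fixed calling convention.
extern "C" fn top_8(val: Word) -> Word {
    val >> 56
}

extern "C" fn low_8(val: Word) -> Word {
    val & 0xff
}

impl Module for Bits {
    fn functions(&self) -> Box<dyn Iterator<Item = (String, *const u8)>> {
        // This table is the boilerplate a macro would generate.
        let table: Vec<(String, *const u8)> = vec![
            (String::from("top_8"), top_8 as *const u8),
            (String::from("low_8"), low_8 as *const u8),
        ];
        Box::new(table.into_iter())
    }
}

/// Copies a module's functions into a function map like the interpreter's.
fn register(module: &dyn Module, funcs: &mut HashMap<String, *const u8>) {
    for (name, ptr) in module.functions() {
        funcs.insert(name, ptr);
    }
}
```

After register(&Bits, &mut funcs), the map holds both names, ready to be looked up by a CALL instruction.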

Step 4: JIT compiler

The JIT compiler uses cranelift to compile RBA to machine code. The cranelift code is generated by converting basic operations like add and subtract to their cranelift equivalents, injecting modules and C functions into the runtime, and creating blocks wherever JZ and JNz instructions are found.
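The block-creation idea can be shown without cranelift at all. The sketch below only computes where new blocks would begin in a list of instructions; it does not generate any code, and it is not the JIT's actual logic. The JitIns type and block_starts function are hypothetical, and starting a block at every label (so that jumps have somewhere to land) is an assumption on top of what is described above.

```rust
/// Hypothetical, payload-free instruction kinds: only what matters for splitting.
enum JitIns {
    Label,
    JZ,
    JNz,
    Other,
}

/// Returns the indices at which a new block starts:
/// the first instruction, every label, and every instruction after a jump.
fn block_starts(program: &[JitIns]) -> Vec<usize> {
    let mut starts = Vec::new();
    for (idx, ins) in program.iter().enumerate() {
        let after_jump =
            idx > 0 && matches!(program[idx - 1], JitIns::JZ | JitIns::JNz);
        if idx == 0 || after_jump || matches!(ins, JitIns::Label) {
            starts.push(idx);
        }
    }
    starts
}
```

For the sequence Other, Label, Other, JNz, Other this gives blocks starting at indices 0, 1, and 4.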

Result

The result is a fully functioning basic assembly language that doesn’t really assemble to anything. Here is a demo text editor program being run in the language.

Thanks for reading!