Making AP CSP code real
What is the CSP pseudocode?
For thousands of high school students (including me), the first computer science class they take is AP Computer Science Principles. To make the class language agnostic, the College Board tests students on questions written in a peculiar pseudocode format.
Around February 2023, I thought it would be a fun idea to implement an interpreter for the language. Armed with a small amount of previous knowledge, I attempted to create the CSP language.
However, it did not go so well. Despite having a basic functioning prototype, each time I attempted a more complex program, or tried to do something a little harder than what a CSP student would understand, I was met with thousands of parsing and runtime errors. Still, it was powerful enough to solve AP Classroom problems.
The CSP language
I’ve talked a lot about the language behind AP CSP; now let me show you what it looks like.
PROCEDURE suffix(num) {
IF (num < 0) { num <- num * -1 }
l <- num MOD 10
sm <- num MOD 100
IF (l = 1 AND sm != 11) {
RETURN("st")
}
IF (l = 2 AND sm != 12) {
RETURN("nd")
}
IF (l = 3 AND sm != 13) {
RETURN("rd")
}
RETURN("th")
}
PROCEDURE fibbonacci(n) {
x <- 0
y <- 0
z <- 1
REPEAT n TIMES {
x <- y
y <- z
z <- x + y
}
RETURN(x)
}
DISPLAY("You don't run Java in your mind.")
DISPLAY("So why should you for CSP?")
DISPLAY("Number: ")
inp <- PARSE(INPUT())
IF (inp >= 500000) {
DISPLAY("Woah, this is going to take awhile.")
}
fib <- fibbonacci(inp)
// Multiple arguments in DISPLAY is nonstandard / not in College Board material
DISPLAY("The ", inp, suffix(inp), " Fibbonacci number is: ", fib)
As you can see, despite being pseudocode, the language is somewhat defined and rigid.
Revisiting the interpreter
I found the old interpreter while looking through my old projects and decided, why not finish it?
I quickly laid out quite a simple plan: going from text to tokens, tokens to an abstract syntax tree, and interpreting the tree. I decided not to use a parsing library like I have in the past and just write the tokenizer myself.
Tokenizing
This is how tokens are represented:
#[derive(Debug, Copy, Clone, PartialEq)]
pub enum Token<'a> {
Ident(&'a str),
Flow(char),
ArgSep,
LineSep,
Operator(Operator),
Assign,
Boolean(bool),
String(&'a str),
Number(i64),
If,
Else,
Repeat,
Times,
Until,
For,
Each,
In,
Procedure,
}
To tokenize the input, I decided to build a relatively simple parser, one which is basically an extended iterator over the input string. While looping through each element, the code begins by checking for matching symbols. After checking for those, we look for a quotation mark to start a string, then an alphabetic character to start an identifier, and finally a number.
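As a rough illustration of that loop, here is a self-contained sketch. The `Tok` type and the exact branch order are simplifications of mine, not the real code: the real `Token<'a>` borrows string slices instead of allocating, and handles many more symbols.

```rust
// Simplified token type for illustration; the real Token<'a> borrows &str slices.
#[derive(Debug, PartialEq)]
enum Tok {
    Ident(String),
    Number(i64),
    Str(String),
    Symbol(char),
}

fn tokenize(input: &str) -> Vec<Tok> {
    let mut tokens = Vec::new();
    let mut chars = input.chars().peekable();
    while let Some(&c) = chars.peek() {
        if c.is_whitespace() {
            chars.next();
        } else if c == '"' {
            // A quotation mark starts a string literal.
            chars.next();
            let s: String = chars.by_ref().take_while(|&c| c != '"').collect();
            tokens.push(Tok::Str(s));
        } else if c.is_ascii_alphabetic() {
            // An alphabetic character starts an identifier (or keyword).
            let mut ident = String::new();
            while let Some(&c) = chars.peek() {
                if c.is_ascii_alphanumeric() { ident.push(c); chars.next(); } else { break; }
            }
            tokens.push(Tok::Ident(ident));
        } else if c.is_ascii_digit() {
            // Digits accumulate into a number.
            let mut n = 0i64;
            while let Some(&c) = chars.peek() {
                if let Some(d) = c.to_digit(10) { n = n * 10 + d as i64; chars.next(); } else { break; }
            }
            tokens.push(Tok::Number(n));
        } else {
            // Everything else is treated as a single-character symbol here.
            tokens.push(Tok::Symbol(c));
            chars.next();
        }
    }
    tokens
}
```

So `x <- 10` comes out as an identifier, two symbols, and a number; a later pass (or smarter symbol matching) turns `<` `-` into the assignment token.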
Tokens to Ast
This was one of the more painful parts of the code. After struggling to come up with an approach, I finally landed on one: iterate through each token and provide the rest of the iterator to a parsing function for that token. This implementation is actually how the code still works*.
pub fn create_ast<'a, 'b>(tokens: &'b [Token<'a>]) -> Result<Vec<Ast<'a>>, AstError<'b>> {
let mut block = Vec::new();
let mut iterator = PeekableSliceIterator::from(tokens);
while let Some(token) = iterator.peek() {
let ast = match token {
Token::Ident(_) => parse_basic(&mut iterator, Token::LineSep),
Token::If => parse_if(&mut iterator),
Token::Repeat => parse_repeat(&mut iterator),
Token::For => parse_for(&mut iterator),
Token::Procedure => parse_procedure(&mut iterator),
Token::LineSep => {
iterator.next();
continue;
}
token => return Err(AstError::StartingToken(token)),
};
block.push(ast?);
}
Ok(block)
}
The astute among you may realize I used a PeekableSliceIterator, which was actually necessary for implementing the code.
Most of these methods are pretty simple: parse_if, parse_repeat, parse_for, and parse_procedure are all implemented pretty much the same way. They make sure the required tokens are present, parse the condition using parse_basic, and parse the block with parse_block.
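Here is a hedged sketch of that shared shape. The `Tok`/`Ast` types and the stand-in `parse_basic`/`parse_block` are hypothetical stubs of mine; the real versions work over `Token<'a>`/`Ast<'a>` with a PeekableSliceIterator and much richer errors.

```rust
// Hypothetical, heavily simplified shapes for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum Tok { If, Open, Close, LBrace, RBrace, Ident(String) }

#[derive(Debug, PartialEq)]
enum Ast { If(Box<Ast>, Vec<Ast>), Ident(String) }

// Consume the next token and check it is the one we require.
fn expect(iter: &mut std::slice::Iter<Tok>, want: &Tok) -> Result<(), String> {
    match iter.next() {
        Some(t) if t == want => Ok(()),
        other => Err(format!("expected {:?}, found {:?}", want, other)),
    }
}

// Stand-in for the real parse_basic: here a condition is just an identifier.
fn parse_basic(iter: &mut std::slice::Iter<Tok>) -> Result<Ast, String> {
    match iter.next() {
        Some(Tok::Ident(s)) => Ok(Ast::Ident(s.clone())),
        other => Err(format!("expected expression, found {:?}", other)),
    }
}

// Stand-in for parse_block: statements between { and }.
fn parse_block(iter: &mut std::slice::Iter<Tok>) -> Result<Vec<Ast>, String> {
    expect(iter, &Tok::LBrace)?;
    let mut stmts = Vec::new();
    while let Some(t) = iter.clone().next() {
        if *t == Tok::RBrace { break; }
        stmts.push(parse_basic(iter)?);
    }
    expect(iter, &Tok::RBrace)?;
    Ok(stmts)
}

// The pattern shared by parse_if/parse_repeat/parse_for/parse_procedure:
// check the required tokens, parse the condition, parse the block.
fn parse_if(iter: &mut std::slice::Iter<Tok>) -> Result<Ast, String> {
    expect(iter, &Tok::If)?;
    expect(iter, &Tok::Open)?;
    let cond = parse_basic(iter)?;
    expect(iter, &Tok::Close)?;
    let block = parse_block(iter)?;
    Ok(Ast::If(Box::new(cond), block))
}
```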
parse_basic
The idea behind parse_basic was pretty simple: take in all the tokens in a line and run a quick check. If the length was 0, return a Nop; if it was 1, parse a literal (string, int); otherwise, search for an operator (-, <-, MOD, etc.), split on it, parse both sides as basic tokens, and return that operation.
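A toy version of that check might look like this. The `T`/`Expr` types are hypothetical, operators are single characters, and parenthesis nesting is ignored; the real parse_basic handles all of that over `Token<'a>`.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum T { Num(i64), Op(char) }

#[derive(Debug, PartialEq)]
enum Expr { Nop, Lit(i64), Bin(char, Box<Expr>, Box<Expr>) }

fn parse_basic(tokens: &[T]) -> Expr {
    match tokens {
        // Length 0: empty line, nothing to do.
        [] => Expr::Nop,
        // Length 1: a lone literal.
        [T::Num(n)] => Expr::Lit(*n),
        // Otherwise: split at the *last* operator (for left-to-right math)
        // and parse both sides recursively.
        _ => {
            let pos = tokens
                .iter()
                .rposition(|t| matches!(t, T::Op(_)))
                .expect("no operator in line");
            let T::Op(op) = tokens[pos] else { unreachable!() };
            Expr::Bin(
                op,
                Box::new(parse_basic(&tokens[..pos])),
                Box::new(parse_basic(&tokens[pos + 1..])),
            )
        }
    }
}
```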
Hurdle #1
parse_basic requires operators to be parsed in reverse order, as the first parsed operators wrap the later ones. This led to me discovering I had accidentally parsed all math as right-to-left instead of left-to-right.
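To see why the split direction matters, here is a toy example using chains of subtraction (where associativity is visible). Both functions are illustrative stand-ins of mine, assuming a non-empty list of operands:

```rust
// Left-associative (correct): the last operator is the top of the tree,
// so everything to its left is evaluated first.
fn eval_left(nums: &[i64]) -> i64 {
    match nums {
        [n] => *n,
        _ => eval_left(&nums[..nums.len() - 1]) - nums[nums.len() - 1],
    }
}

// Right-associative (the bug): splitting at the first operator makes
// everything to its right evaluate first.
fn eval_right(nums: &[i64]) -> i64 {
    match nums {
        [n] => *n,
        _ => nums[0] - eval_right(&nums[1..]),
    }
}
```

For `8 - 3 - 2`, the left-associative version gives `(8 - 3) - 2 = 3`, while the buggy right-associative one gives `8 - (3 - 2) = 7`, which is why the splitter below searches from the right with `rposition`.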
The code which splits on operators looks like this:
// I love lifetimes I love lifetimes I love lifetimes I love lifetimes
// this function used to have a lifetime 'c as well
fn split_any<'a, 'b, const N: usize>(
slice: &'b [Token<'a>],
break_at: [Token<'static>; N],
) -> Option<(&'b Token<'a>, &'b [Token<'a>], &'b [Token<'a>])> {
// I hate my life 'r'
let mut nest = 0;
let pos = slice.iter().rposition(|x| {
if x == &Token::Flow('(') {
nest += 1;
} else if x == &Token::Flow(')') {
nest -= 1;
}
nest == 0 && break_at.contains(x)
})?;
Some((slice.get(pos)?, slice.get(..pos)?, slice.get(pos + 1..)?))
}
As you can tell, I was not enjoying the lifetimes; however, this was pretty tame compared to some of the borrow checker issues I had to deal with later (async is hard). You might also notice the nest variable, which is important for making sure parentheses match, and something which I forgot in every single parenthesis- and bracket-matching check throughout the program. All this code produced the following abstract syntax tree:
type Ref<'a> = Box<Ast<'a>>;
// A much more powerful representation of tokens
#[derive(Clone, Debug, PartialEq)]
pub enum Ast<'a> {
Assign(Ref<'a>, Ref<'a>),
Identifier(&'a str),
Unary(Operator, Ref<'a>),
Operation(Operator, Ref<'a>, Ref<'a>),
Block(Vec<Ast<'a>>),
Literal(Value<'a>),
ListLiteral(Vec<Ast<'a>>),
If(Ref<'a>, Vec<Ast<'a>>, Option<Vec<Ast<'a>>>),
Index(Ref<'a>, Ref<'a>),
Repeat(Ref<'a>, Vec<Ast<'a>>),
RepeatUntil(Ref<'a>, Vec<Ast<'a>>),
For(&'a str, Ref<'a>, Vec<Ast<'a>>),
Procedure(&'a str, Vec<&'a str>, Vec<Ast<'a>>),
Call(&'a str, Vec<Ast<'a>>),
Nop,
}
#[derive(Clone, Debug, PartialEq)]
pub enum Value<'a> {
String(Cow<'a, str>),
Number(i64),
Boolean(bool),
List(Vec<Value<'a>>),
Undefined,
}
The interpreter
Now that we have a working abstract syntax tree parser, we can move on to how we are going to run that code. This stage was actually quite easy compared to the rest, because 2023 me knew how to write a recursive interpreter pretty well. The interpreter was mostly unchanged, with a few small tweaks for performance and idiomatic features from the Rust versions released since. Here’s an overview of how the interpreter works.
Flatten
Flatten is a function which receives a single node of the Ast and attempts to convert it into a mutable reference to a value. This function is useful for anything which expects a value but is given an Ast<'a> instead. The flatten function calls procedures, performs operations, and resolves identifiers/variables.
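As a sketch of the idea, here is a simplified flatten that returns owned values instead of mutable references, with no environment or procedure calls (those are all simplifications of mine):

```rust
#[derive(Debug, Clone, PartialEq)]
enum Value { Number(i64), Boolean(bool) }

#[derive(Debug)]
enum Ast { Literal(Value), Operation(char, Box<Ast>, Box<Ast>) }

// Reduce an Ast node to a Value, recursing through operations.
fn flatten(node: &Ast) -> Result<Value, String> {
    match node {
        Ast::Literal(v) => Ok(v.clone()),
        Ast::Operation(op, lhs, rhs) => {
            // Evaluate both sides first, then apply the operator.
            let (Value::Number(a), Value::Number(b)) = (flatten(lhs)?, flatten(rhs)?) else {
                return Err("type error: expected numbers".to_string());
            };
            match op {
                '+' => Ok(Value::Number(a + b)),
                '*' => Ok(Value::Number(a * b)),
                '<' => Ok(Value::Boolean(a < b)),
                _ => Err(format!("unknown operator {op}")),
            }
        }
    }
}
```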
Run
Run is a function which receives every node of the Ast, loops through each one, and executes instructions. IF, REPEAT, FOR EACH, and the like are all handled here.
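A stripped-down sketch of that dispatch loop, with a hypothetical Ast whose conditions are pre-evaluated and with output collected into a Vec rather than DISPLAYed (the real run defers all of that to flatten and an environment):

```rust
#[derive(Debug, Clone)]
enum Ast {
    Display(i64),
    Repeat(i64, Vec<Ast>),
    If(bool, Vec<Ast>),
}

// Walk a block of nodes, dispatching on each statement kind.
fn run(block: &[Ast], out: &mut Vec<i64>) {
    for node in block {
        match node {
            Ast::Display(n) => out.push(*n),
            // REPEAT n TIMES { ... }: execute the body n times.
            Ast::Repeat(times, body) => {
                for _ in 0..*times { run(body, out); }
            }
            // IF (cond) { ... }: execute the body when the condition holds.
            Ast::If(cond, body) => {
                if *cond { run(body, out); }
            }
        }
    }
}
```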
Functions
Functions and procedures are defined in a hashmap, with native functions using a function pointer and procedures holding an Ast and a list of arguments. This is how native functions are declared in the desktop version.
functions.insert(
"REVERSE".into(),
Function::NativeFunction(|args, env| {
parse_args!("REVERSE", env, args, Value::String(string), Type::String);
let rev = string.chars().rev().collect();
Ok(Some(Value::String(rev)))
}),
);
parse_args! is a macro designed to flatten args automagically and throw appropriate type errors.
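As a rough idea of how such a macro can work, here is a hypothetical simplification of mine using let-else; the real parse_args! also flattens its arguments through the interpreter and produces proper RuntimeErrors:

```rust
#[derive(Debug, Clone, PartialEq)]
enum Value { Str(String), Num(i64) }

// Match the argument slice against the given pattern, or bail out of the
// surrounding function with a type error.
macro_rules! parse_args {
    ($name:expr, $args:expr, $pat:pat) => {
        let [$pat] = $args.as_slice() else {
            return Err(format!("{} called with the wrong argument types", $name));
        };
    };
}

fn reverse(args: Vec<Value>) -> Result<Value, String> {
    // Binds `string` in this scope, or returns early with an error.
    parse_args!("REVERSE", args, Value::Str(string));
    Ok(Value::Str(string.chars().rev().collect()))
}
```

Because the pattern comes from the call site, the names it binds (like `string` above) are usable directly after the macro invocation.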
Porting it all to the web
After I made a basic working prototype, I realized I had used almost zero dependencies to run the code, relying only on rand and thiserror: two crates which are wasm compatible! I quickly found a JS terminal emulator and got to work.
I quickly realized that to keep the browser from locking up, I would need to wrap the code in an async layer. Pretty easy, right? Just make each function async, slap on #[async_recursion], and call it a day? Actually, that’s almost true, except for two things:
1. I had to write the input and printing functionality for the terminal
2. I had to make all the NativeFunctions async
To spare you the tiring process of figuring out how that would work, I’m just going to show you how it worked.
#[derive(Clone)]
enum Function<'a> {
NativeFunction(
Rc<
dyn for<'b> Fn(
Vec<Ast<'a>>,
&'b mut Environment<'a>,
)
-> LocalBoxFuture<'b, Result<Option<Value<'a>>, RuntimeError<'a>>>,
>,
),
Function(Vec<&'a str>, Vec<Ast<'a>>),
}
That is a crazy type definition, with futures, higher-ranked lifetimes, and so many angle brackets it’s hard to count. The declaration of native functions also got more complex:
functions.insert(
"REVERSE".into(),
Function::NativeFunction(Rc::new(|args, env| {
Box::pin(async move {
parse_args!("REVERSE", env, args, Value::String(string), Type::String);
let rev = string.chars().rev().collect();
Ok(Some(Value::String(rev)))
})
})),
);
I was actually too lazy to keep writing this out each time, and ended up converting it into a macro, making the function definition as clean as the sync version:
functions.insert(
"REVERSE".into(),
native!(args, env, async move {
parse_args!("REVERSE", env, args, Value::String(string), Type::String);
let rev = string.chars().rev().collect();
Ok(Some(Value::String(rev)))
}),
);
If you’re on a PC, try out the online interpreter or look at the source code.