Making AP CSP code real
What is the CSP pseudocode?
For thousands of high school students (including me), the first computer science class they take is AP Computer Science Principles. To make the class language agnostic, the College Board tests students on questions written in a peculiar pseudocode format.
Around February 2023, I thought it would be a fun idea to implement an interpreter for the language. Armed with a small amount of previous knowledge, I attempted to create the CSP language.
However, it did not go so well. Despite having a basic functioning prototype, each time I attempted a more complex program, or tried to do something a little harder than what a CSP student would understand, I was met with thousands of parsing and runtime errors. Still, it was powerful enough to solve AP Classroom problems.
The CSP language
I’ve talked a lot about the language behind AP CSP; now let me show you what it looks like.
PROCEDURE suffix(num) {
IF (num < 0) { num <- num * -1 }
l <- num MOD 10
sm <- num MOD 100
IF (l = 1 AND sm != 11) {
RETURN("st")
}
IF (l = 2 AND sm != 12) {
RETURN("nd")
}
IF (l = 3 AND sm != 13) {
RETURN("rd")
}
RETURN("th")
}
PROCEDURE fibbonacci(n) {
x <- 0
y <- 0
z <- 1
REPEAT n TIMES {
x <- y
y <- z
z <- x + y
}
RETURN(x)
}
DISPLAY("You don't run Java in your mind.")
DISPLAY("So why should you for CSP?")
DISPLAY("Number: ")
inp <- PARSE(INPUT())
IF (inp >= 500000) {
DISPLAY("Woah, this is going to take awhile.")
}
fib <- fibbonacci(inp)
// Multiple arguments in DISPLAY is nonstandard / not in College Board material
DISPLAY("The ", inp, suffix(inp), " Fibbonacci number is: ", fib)
As you can see, despite being pseudocode, the language is somewhat defined and rigid.
Revisiting the interpreter
I found the old interpreter while looking through my old projects and decided, why not finish it?
I quickly laid out quite a simple plan: going from text to tokens, tokens to an abstract syntax tree, and interpreting the tree. I decided not to use a parsing library like I have in the past and just write the tokenizer myself.
Tokenizing
This is how tokens are represented:
#[derive(Debug, Copy, Clone, PartialEq)]
pub enum Token<'a> {
Ident(&'a str),
Flow(char),
ArgSep,
LineSep,
Operator(Operator),
Assign,
Boolean(bool),
String(&'a str),
Number(i64),
If,
Else,
Repeat,
Times,
Until,
For,
Each,
In,
Procedure,
}
To tokenize the input, I decided to build a relatively simple parser, one which is basically an extended iterator over the input string. While looping through each element, the code begins by checking for matching symbols. After checking for those, we look for a quotation mark to start a string, then an alphabetic character to start an identifier, and finally a number.
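As a rough illustration of that loop, here is a self-contained sketch. The `Tok` type and the exact branch order are simplifications of mine, not the real code: the real `Token<'a>` borrows string slices instead of allocating, and handles many more symbols.

```rust
// Simplified token type for illustration; the real Token<'a> borrows &str slices.
#[derive(Debug, PartialEq)]
enum Tok {
    Ident(String),
    Number(i64),
    Str(String),
    Symbol(char),
}

fn tokenize(input: &str) -> Vec<Tok> {
    let mut tokens = Vec::new();
    let mut chars = input.chars().peekable();
    while let Some(&c) = chars.peek() {
        if c.is_whitespace() {
            chars.next();
        } else if c == '"' {
            // A quotation mark starts a string literal.
            chars.next();
            let s: String = chars.by_ref().take_while(|&c| c != '"').collect();
            tokens.push(Tok::Str(s));
        } else if c.is_ascii_alphabetic() {
            // An alphabetic character starts an identifier (or keyword).
            let mut ident = String::new();
            while let Some(&c) = chars.peek() {
                if c.is_ascii_alphanumeric() { ident.push(c); chars.next(); } else { break; }
            }
            tokens.push(Tok::Ident(ident));
        } else if c.is_ascii_digit() {
            // Digits accumulate into a number.
            let mut n = 0i64;
            while let Some(&c) = chars.peek() {
                if let Some(d) = c.to_digit(10) { n = n * 10 + d as i64; chars.next(); } else { break; }
            }
            tokens.push(Tok::Number(n));
        } else {
            // Everything else is treated as a single-character symbol here.
            tokens.push(Tok::Symbol(c));
            chars.next();
        }
    }
    tokens
}
```

So `x <- 10` comes out as an identifier, two symbols, and a number; a later pass (or smarter symbol matching) turns `<` `-` into the assignment token.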
Tokens to Ast
This was one of the more painful parts of the code. After struggling to come up with an approach, I finally landed on one: iterate through each token and provide the rest of the iterator to a parsing function for that token. This implementation is actually how the code still works*.
pub fn create_ast<'a, 'b>(tokens: &'b [Token<'a>]) -> Result<Vec<Ast<'a>>, AstError<'b>> {
let mut block = Vec::new();
let mut iterator = PeekableSliceIterator::from(tokens);
while let Some(token) = iterator.peek() {
let ast = match token {
Token::Ident(_) => parse_basic(&mut iterator, Token::LineSep),
Token::If => parse_if(&mut iterator),
Token::Repeat => parse_repeat(&mut iterator),
Token::For => parse_for(&mut iterator),
Token::Procedure => parse_procedure(&mut iterator),
Token::LineSep => {
iterator.next();
continue;
}
token => return Err(AstError::StartingToken(token)),
};
block.push(ast?);
}
Ok(block)
}
The astute among you may realize I used a PeekableSliceIterator, which was actually necessary for implementing the code.
Most of these methods are pretty simple: parse_if, parse_repeat, parse_for, and parse_procedure are all implemented pretty much the same way. They make sure the required tokens are present, parse the condition using parse_basic, and parse the block with parse_block.
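Here is a hedged sketch of that shared shape. The `Tok`/`Ast` types and the stand-in `parse_basic`/`parse_block` are hypothetical stubs of mine; the real versions work over `Token<'a>`/`Ast<'a>` with a PeekableSliceIterator and much richer errors.

```rust
// Hypothetical, heavily simplified shapes for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum Tok { If, Open, Close, LBrace, RBrace, Ident(String) }

#[derive(Debug, PartialEq)]
enum Ast { If(Box<Ast>, Vec<Ast>), Ident(String) }

// Consume the next token and check it is the one we require.
fn expect(iter: &mut std::slice::Iter<Tok>, want: &Tok) -> Result<(), String> {
    match iter.next() {
        Some(t) if t == want => Ok(()),
        other => Err(format!("expected {:?}, found {:?}", want, other)),
    }
}

// Stand-in for the real parse_basic: here a condition is just an identifier.
fn parse_basic(iter: &mut std::slice::Iter<Tok>) -> Result<Ast, String> {
    match iter.next() {
        Some(Tok::Ident(s)) => Ok(Ast::Ident(s.clone())),
        other => Err(format!("expected expression, found {:?}", other)),
    }
}

// Stand-in for parse_block: statements between { and }.
fn parse_block(iter: &mut std::slice::Iter<Tok>) -> Result<Vec<Ast>, String> {
    expect(iter, &Tok::LBrace)?;
    let mut stmts = Vec::new();
    while let Some(t) = iter.clone().next() {
        if *t == Tok::RBrace { break; }
        stmts.push(parse_basic(iter)?);
    }
    expect(iter, &Tok::RBrace)?;
    Ok(stmts)
}

// The pattern shared by parse_if/parse_repeat/parse_for/parse_procedure:
// check the required tokens, parse the condition, parse the block.
fn parse_if(iter: &mut std::slice::Iter<Tok>) -> Result<Ast, String> {
    expect(iter, &Tok::If)?;
    expect(iter, &Tok::Open)?;
    let cond = parse_basic(iter)?;
    expect(iter, &Tok::Close)?;
    let block = parse_block(iter)?;
    Ok(Ast::If(Box::new(cond), block))
}
```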
parse_basic
The idea behind parse_basic was pretty simple: take in all the tokens in a line and run a quick check. If the length was 0, return a Nop; if it was 1, parse a literal (string, int); otherwise, search for an operator (-, <-, MOD, etc.), split on it, parse both sides as basic tokens, and return that operation.
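A toy version of that check might look like this. The `T`/`Expr` types are hypothetical, operators are single characters, and parenthesis nesting is ignored; the real parse_basic handles all of that over `Token<'a>`.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum T { Num(i64), Op(char) }

#[derive(Debug, PartialEq)]
enum Expr { Nop, Lit(i64), Bin(char, Box<Expr>, Box<Expr>) }

fn parse_basic(tokens: &[T]) -> Expr {
    match tokens {
        // Length 0: empty line, nothing to do.
        [] => Expr::Nop,
        // Length 1: a lone literal.
        [T::Num(n)] => Expr::Lit(*n),
        // Otherwise: split at the *last* operator (for left-to-right math)
        // and parse both sides recursively.
        _ => {
            let pos = tokens
                .iter()
                .rposition(|t| matches!(t, T::Op(_)))
                .expect("no operator in line");
            let T::Op(op) = tokens[pos] else { unreachable!() };
            Expr::Bin(
                op,
                Box::new(parse_basic(&tokens[..pos])),
                Box::new(parse_basic(&tokens[pos + 1..])),
            )
        }
    }
}
```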
Hurdle #1
parse_basic requires operators to be parsed in reverse order, as the first parsed operators wrap the later ones. This led to me discovering I had accidentally parsed all math as right-to-left instead of left-to-right.
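To see why the split direction matters, here is a toy example using chains of subtraction (where associativity is visible). Both functions are illustrative stand-ins of mine, assuming a non-empty list of operands:

```rust
// Left-associative (correct): the last operator is the top of the tree,
// so everything to its left is evaluated first.
fn eval_left(nums: &[i64]) -> i64 {
    match nums {
        [n] => *n,
        _ => eval_left(&nums[..nums.len() - 1]) - nums[nums.len() - 1],
    }
}

// Right-associative (the bug): splitting at the first operator makes
// everything to its right evaluate first.
fn eval_right(nums: &[i64]) -> i64 {
    match nums {
        [n] => *n,
        _ => nums[0] - eval_right(&nums[1..]),
    }
}
```

For `8 - 3 - 2`, the left-associative version gives `(8 - 3) - 2 = 3`, while the buggy right-associative one gives `8 - (3 - 2) = 7`, which is why the splitter below searches from the right with `rposition`.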
The code which splits on operators looks like this:
// I love lifetimes I love lifetimes I love lifetimes I love lifetimes
// this function used to have a lifetime 'c as well
fn split_any<'a, 'b, const N: usize>(
slice: &'b [Token<'a>],
break_at: [Token<'static>; N],
) -> Option<(&'b Token<'a>, &'b [Token<'a>], &'b [Token<'a>])> {
// I hate my life 'r'
let mut nest = 0;
let pos = slice.iter().rposition(|x| {
if x == &Token::Flow('(') {
nest += 1;
} else if x == &Token::Flow(')') {
nest -= 1;
}
nest == 0 && break_at.contains(x)
})?;
Some((slice.get(pos)?, slice.get(..pos)?, slice.get(pos + 1..)?))
}
As you can tell, I was not enjoying the lifetimes; however, this was pretty tame compared to some of the borrow checker issues I had to deal with later (async is hard). You might also notice the nest variable, which is important for making sure parentheses match, and something which I forgot in every single parenthesis- and bracket-matching check throughout the program. All this code produced the following abstract syntax tree:
type Ref<'a> = Box<Ast<'a>>;
// A much more powerful representation of tokens
#[derive(Clone, Debug, PartialEq)]
pub enum Ast<'a> {
Assign(Ref<'a>, Ref<'a>),
Identifier(&'a str),
Unary(Operator, Ref<'a>),
Operation(Operator, Ref<'a>, Ref<'a>),
Block(Vec<Ast<'a>>),
Literal(Value<'a>),
ListLiteral(Vec<Ast<'a>>),
If(Ref<'a>, Vec<Ast<'a>>, Option<Vec<Ast<'a>>>),
Index(Ref<'a>, Ref<'a>),
Repeat(Ref<'a>, Vec<Ast<'a>>),
RepeatUntil(Ref<'a>, Vec<Ast<'a>>),
For(&'a str, Ref<'a>, Vec<Ast<'a>>),
Procedure(&'a str, Vec<&'a str>, Vec<Ast<'a>>),
Call(&'a str, Vec<Ast<'a>>),
Nop,
}
#[derive(Clone, Debug, PartialEq)]
pub enum Value<'a> {
String(Cow<'a, str>),
Number(i64),
Boolean(bool),
List(Vec<Value<'a>>),
Undefined,
}
The interpreter
Now that we have a working abstract syntax tree parser, we can move on to how we are going to run that code. This stage was actually quite easy compared to the rest, because 2023 me knew how to write a recursive interpreter pretty well. The interpreter was mostly unchanged, with a few small tweaks for performance and idiomatic features from the Rust versions released since. Here’s an overview of how the interpreter works.
Flatten
Flatten is a function which receives a single node of the Ast and attempts to convert it into a mutable reference to a value. This function is useful for anything which expects a value but is given an Ast<'a> instead. The flatten function calls procedures, performs operations, and resolves identifiers/variables.
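As a sketch of the idea, here is a simplified flatten that returns owned values instead of mutable references, with no environment or procedure calls (those are all simplifications of mine):

```rust
#[derive(Debug, Clone, PartialEq)]
enum Value { Number(i64), Boolean(bool) }

#[derive(Debug)]
enum Ast { Literal(Value), Operation(char, Box<Ast>, Box<Ast>) }

// Reduce an Ast node to a Value, recursing through operations.
fn flatten(node: &Ast) -> Result<Value, String> {
    match node {
        Ast::Literal(v) => Ok(v.clone()),
        Ast::Operation(op, lhs, rhs) => {
            // Evaluate both sides first, then apply the operator.
            let (Value::Number(a), Value::Number(b)) = (flatten(lhs)?, flatten(rhs)?) else {
                return Err("type error: expected numbers".to_string());
            };
            match op {
                '+' => Ok(Value::Number(a + b)),
                '*' => Ok(Value::Number(a * b)),
                '<' => Ok(Value::Boolean(a < b)),
                _ => Err(format!("unknown operator {op}")),
            }
        }
    }
}
```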
Run
Run is a function which receives every node of the Ast, loops through each one, and executes instructions. IF, REPEAT, FOR EACH, and the like are all handled here.
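A stripped-down sketch of that dispatch loop, with a hypothetical Ast whose conditions are pre-evaluated and with output collected into a Vec rather than DISPLAYed (the real run defers all of that to flatten and an environment):

```rust
#[derive(Debug, Clone)]
enum Ast {
    Display(i64),
    Repeat(i64, Vec<Ast>),
    If(bool, Vec<Ast>),
}

// Walk a block of nodes, dispatching on each statement kind.
fn run(block: &[Ast], out: &mut Vec<i64>) {
    for node in block {
        match node {
            Ast::Display(n) => out.push(*n),
            // REPEAT n TIMES { ... }: execute the body n times.
            Ast::Repeat(times, body) => {
                for _ in 0..*times { run(body, out); }
            }
            // IF (cond) { ... }: execute the body when the condition holds.
            Ast::If(cond, body) => {
                if *cond { run(body, out); }
            }
        }
    }
}
```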
Functions
Functions and procedures are defined in a hashmap, with native functions using a function pointer and procedures holding an Ast and a list of arguments. This is how native functions are declared in the desktop version.
functions.insert(
"REVERSE".into(),
Function::NativeFunction(|args, env| {
parse_args!("REVERSE", env, args, Value::String(string), Type::String);
let rev = string.chars().rev().collect();
Ok(Some(Value::String(rev)))
}),
);
parse_args! is a macro designed to flatten args automagically and throw appropriate type errors.
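As a rough idea of how such a macro can work, here is a hypothetical simplification of mine using let-else; the real parse_args! also flattens its arguments through the interpreter and produces proper RuntimeErrors:

```rust
#[derive(Debug, Clone, PartialEq)]
enum Value { Str(String), Num(i64) }

// Match the argument slice against the given pattern, or bail out of the
// surrounding function with a type error.
macro_rules! parse_args {
    ($name:expr, $args:expr, $pat:pat) => {
        let [$pat] = $args.as_slice() else {
            return Err(format!("{} called with the wrong argument types", $name));
        };
    };
}

fn reverse(args: Vec<Value>) -> Result<Value, String> {
    // Binds `string` in this scope, or returns early with an error.
    parse_args!("REVERSE", args, Value::Str(string));
    Ok(Value::Str(string.chars().rev().collect()))
}
```

Because the pattern comes from the call site, the names it binds (like `string` above) are usable directly after the macro invocation.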
Porting it all to the web
After I made a basic working prototype, I realized I had used almost zero dependencies to run the code, relying only on rand and thiserror: two crates which are wasm compatible! I quickly found a JS terminal emulator and got to work.
I quickly realized that to keep the browser from locking up, I would need to wrap the code in an async layer. Pretty easy, right? Just make each function async, slap on #[async_recursion], and call it a day? Actually, that’s almost true, except for two things:
1. I had to write the input and printing functionality for the terminal
2. I had to make all the NativeFunctions async
To spare you the tiring process of figuring out how that would work, I’m just going to show you how it worked.
#[derive(Clone)]
enum Function<'a> {
NativeFunction(
Rc<
dyn for<'b> Fn(
Vec<Ast<'a>>,
&'b mut Environment<'a>,
)
-> LocalBoxFuture<'b, Result<Option<Value<'a>>, RuntimeError<'a>>>,
>,
),
Function(Vec<&'a str>, Vec<Ast<'a>>),
}
That is a crazy type definition, with futures, higher-ranked lifetimes, and so many angle brackets it’s hard to count. The declaration of native functions also got more complex:
functions.insert(
"REVERSE".into(),
Function::NativeFunction(Rc::new(|args, env| {
Box::pin(async move {
parse_args!("REVERSE", env, args, Value::String(string), Type::String);
let rev = string.chars().rev().collect();
Ok(Some(Value::String(rev)))
})
})),
);
I was actually too lazy to keep writing this out each time, and ended up converting it into a macro, making the function definition as clean as the sync version:
functions.insert(
"REVERSE".into(),
native!(args, env, async move {
parse_args!("REVERSE", env, args, Value::String(string), Type::String);
let rev = string.chars().rev().collect();
Ok(Some(Value::String(rev)))
}),
);
If you’re on a PC, try out the online interpreter or look at the source code.