Language Guide

A crash course in the particular workings of microscheme

Learning Scheme

The existing wealth of Scheme tutorials and crash courses is really very good, and I shall not attempt to better it. For example, see Teach Yourself Scheme in Fixnum Days. On the other hand, I have included enough detail that the ambitious hacker could reasonably learn a lot by tinkering with the examples and referring to this guide. I do recommend that the novice reader follow at least some general introduction to functional programming (which I cannot provide) before proceeding.

Fundamental Forms

A microscheme program is a list of expressions, which are evaluated in order when the program runs. Each expression (except for constants like 4, #t and "hello") takes one of the ten fundamental forms described here. Each fundamental form is composed of a pair of parentheses (brackets), containing keywords, lists and subexpressions. Each subexpression must also be a constant or fundamental form, and so on.

  • Abstraction: (lambda (X Y Z …) B …)

    The lambda form produces a procedure (a computational unit which models some function). Its name reflects the fact that it represents a lambda abstraction as found in Lambda Calculus. The keyword lambda is followed by a list of variable names, which are the arguments of the procedure, and by one or more expressions which form the body of the procedure. When the procedure is applied (i.e. called, invoked) the body expressions are executed in order, and the value of the final expression is returned. Procedures produced in this way are intrinsically anonymous in Scheme, and they are first-class values. This means that the lambda form evaluates to a thing representing the procedure, which is not automatically given a name. We can bind that thing to a name using (define ...), but we are not obliged to. Instead, we could pass it directly to another procedure, or return it from an enclosing procedure. For example, (lambda (x) (+ x 1)) represents a procedure which takes one argument, assumes it has type number, and returns the number 1 higher.
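
    To make the "first-class" claim concrete, here is a small sketch. The names twice and make-adder are hypothetical helpers invented for this illustration; they are not part of microscheme.

    ```scheme
    ; Bind a lambda to a name.
    (define plusone (lambda (x) (+ x 1)))

    ; 'twice' (hypothetical) takes a procedure f and applies it to x two times.
    (define twice (lambda (f x) (f (f x))))

    ; 'make-adder' (hypothetical) returns a new procedure which adds n.
    (define make-adder (lambda (n) (lambda (x) (+ x n))))

    (twice plusone 5)        ; evaluates to 7
    ((make-adder 10) 5)      ; evaluates to 15
    ```

    In the first application a procedure is passed as an argument; in the second, a procedure is returned from an enclosing procedure and applied immediately.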

  • Application: (<procedure> A B C …)

    The procedure application form does not contain any keywords. It is composed by writing an expression <procedure>, followed by some number of arguments, all inside a pair of parentheses. The <procedure> expression must evaluate to a procedure in some way. That means it can either be a primitive procedure name (as listed below), a lambda expression, a variable name, or another procedure application. If you give a variable name, then that variable must be bound to a procedure by the time the application is reached. If you give another procedure application as <procedure>, then that application must return something of type procedure. In any case, the number of arguments (A B C …) given must match the number of arguments expected by <procedure>. ‘+’ is the name of a primitive procedure, taking two arguments. Hence, (+ 3 7) is a valid procedure application which evaluates to 10. Since expressions of any complexity can be used as the two arguments to +, and likewise for other math operators, this form can be used to write any arithmetic expression in prefix notation. To give a richer example, the code given for the lambda form above is a valid expression, which evaluates to something of type procedure, expecting a single numeric argument. So, we can form an application like this: (<example from above> <any numeric argument>). Writing it out in full gives: ((lambda (x) (+ x 1)) 5) which evaluates to 6.
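
    As a brief illustration of prefix notation, the following sketch uses only primitives listed in this guide, each applied to exactly two arguments. It assumes that div and mod are integer division and remainder.

    ```scheme
    ; (3 + 7) * (10 - 4) in prefix notation.
    (* (+ 3 7) (- 10 4))              ; evaluates to 60

    ; Integer quotient and remainder (assumed meaning of div and mod).
    (div 17 5)                        ; evaluates to 3
    (mod 17 5)                        ; evaluates to 2

    ; A lambda expression in the <procedure> position, taking two arguments.
    ((lambda (a b) (* a b)) 6 7)      ; evaluates to 42
    ```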

  • Definition: (define <name> <expr>)

    The two forms we've seen so far are actually powerful enough to express any program that our electronic computers can compute. (See Wikipedia, of course.) But, in practice, we can get a lot more done if we give names to things, and use them over and over again. The define form takes a variable name and an expression. It evaluates the expression, and binds the result to the given name (i.e., it stores the result in the variable <name>). From that point onwards, <name> refers to the thing produced by the expression, which could be of any type, even procedure.

    In microscheme, definitions are only allowed at the top-level. Definitions within the body of some form must be achieved using (let …).

    Combining the three forms above, we can write the following program, which results in the variable 'theothernumber' being bound to the value 6.

    (define plusone (lambda (x) (+ x 1)))
    (define thenumber 5)
    (define theothernumber (plusone thenumber))
     
  • Definition (again): (define (<proc> X Y Z …) B …)

    Since the pattern for defining a named function, (define <procname> (lambda (…) …)), is so frequently used in Scheme programs, a shorthand notation is provided for it. The first definition of the program above can be rewritten as (define (plusone x) (+ x 1)). Under Scheme's semantics, these expressions are precisely equivalent. It is (slightly) important that the programmer realises that this is a library form, i.e. it is compiled just as a lambda inside a define form.

  • Assignment: (set! <name> <expr>)

    Assignment looks just like definition, but with the ‘set!’ keyword instead of ‘define’. It is used to change the value to which some variable name is bound. That could be a global variable introduced by (define …), a procedure argument, or a local variable introduced by (let …). The set! keyword includes an exclamation mark to remind you that it is changing the state of the system; this is why Scheme is considered not to be purely functional.
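
    A minimal sketch of assignment changing program state. The names counter and add-to-counter are hypothetical, chosen for this illustration only.

    ```scheme
    ; A global variable, introduced by define.
    (define counter 0)

    ; Each call rebinds the global 'counter' to a larger value.
    (define (add-to-counter n)
      (set! counter (+ counter n)))

    (add-to-counter 5)
    (add-to-counter 2)
    counter                  ; now evaluates to 7
    ```

    The same expression, counter, gives different values at different points in the program; that is exactly the kind of state a purely functional language forbids.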

  • Conditional: (if <predicate> <consequent> <alternative>)

    The conditional form takes at least a predicate and a consequent, and optionally an alternative. Each of these is an expression of any kind. If the predicate evaluates to true (in Scheme, anything other than false, denoted #f, counts as true) then the consequent will be evaluated. If the predicate evaluates to false, and an alternative is given, then it will be evaluated. This is subtly different from the conditional branches of imperative programming. As well as making a decision about which expression to evaluate, the conditional itself inherits the value of whichever branch is chosen. This means you can use the whole expression as a subexpression, whose value depends on the predicate. e.g. (+ 1 (if (= 2 3) 7 13)) evaluates to 14.
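
    To show the conditional used as a value rather than as a statement, here is a short sketch. The procedure name larger is hypothetical.

    ```scheme
    ; 'larger' (hypothetical) returns the greater of two numbers.
    ; The value of the if form is the value of the whole procedure body.
    (define (larger a b)
      (if (> a b) a b))

    (larger 3 8)             ; evaluates to 8
    (+ 1 (larger 3 8))       ; evaluates to 9
    ```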

  • Conjunction: (and A B C …)

    The conjunction form takes any number of arguments, each of which is a subexpression. It will evaluate those expressions in order. If it reaches one that evaluates to false (denoted #f), then it will stop and return #f. If none of them evaluates to #f, then the value of the final expression will be returned (remember, anything other than #f is considered true). Using this form with zero arguments is equivalent to the true constant #t.

  • Disjunction: (or A B C …)

    Like conjunction, the disjunction form evaluates each of its arguments in order. If any one of them evaluates to anything other than #f, it stops and returns that value. If it reaches the end of the list, and every expression evaluated to #f, then it returns #f. Using this form with zero arguments is equivalent to the false constant #f.

    When used with Boolean type values, the conjunctive and disjunctive forms work just like boolean operators in imperative languages. (or #f #t #f) evaluates to #t and so forth. In Scheme, however, these forms perform a much more powerful function. Since they are variadic, and will keep evaluating until a false or true subexpression is reached respectively, they can be used as control-flow mechanisms in place of nested (if …) forms.
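
    The control-flow use of these forms can be sketched as follows. The names safe-car and first-or-zero are hypothetical helpers, not library procedures.

    ```scheme
    ; 'safe-car' (hypothetical): the car of x if x is a pair, otherwise #f.
    ; 'and' stops at (pair? x) when it is #f, so car is never applied
    ; to a non-pair.
    (define (safe-car x)
      (and (pair? x) (car x)))

    (safe-car (cons 1 2))        ; evaluates to 1
    (safe-car 5)                 ; evaluates to #f

    ; 'first-or-zero' (hypothetical): 'or' supplies a default value
    ; when safe-car returns #f.
    (define (first-or-zero x)
      (or (safe-car x) 0))

    (first-or-zero (cons 7 8))   ; evaluates to 7
    (first-or-zero 5)            ; evaluates to 0
    ```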

  • Local Binding: (let ((a X) (b Y) (c Z) …) B …)

    The let form is used to bind names to values only for a specific part of the program. The first argument to let is a list of binding pairs. Each binding pair is a pair of brackets containing a variable name and an expression. The expressions that are given as the body ‘B …’ are evaluated with those names bound to their corresponding values, and the value of the final expression is returned. Those bindings do not persist outside of the let form. For any code outside of the let's parentheses, the variables a b c … are unchanged, and may not be defined at all.

    Important nuances:

    1. Even if you only give one binding pair, the parentheses around the list of binding pairs are still needed. Hence, you end up with double brackets: (let ((x 5)) (+ x 1)). Missing those is a common mistake.
    2. The variable bindings apply in the body, but not within other binding pairs in the list, i.e. the expression Y should not rely on a being bound to X.
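
    Both nuances, and the limited scope of the bindings, can be seen in this short sketch. Nesting one let inside another is one way to make a later binding depend on an earlier one.

    ```scheme
    (define x 10)

    ; Inside the let, x is 5. Note the double brackets around the
    ; single binding pair.
    (let ((x 5)) (+ x 1))        ; evaluates to 6
    x                            ; the global x still evaluates to 10

    ; b depends on a, so b is bound in an inner let, where a is visible.
    (let ((a 2))
      (let ((b (* a 3)))
        (+ a b)))                ; evaluates to 8
    ```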

  • Sequence: (begin B1 B2 …)

    Finally, you can group together expressions with the sequential form, using the begin keyword. The whole thing is treated as one expression, whose subexpressions are executed in sequence. As usual, the value of the final subexpression is returned for the overall expression.

    You can use this in cases where you want to guarantee a group of expressions will be evaluated, or where you want to give multiple expressions in a context where only one is expected.

    (+ 1 (begin 2 4 6)) evaluates to 7. The subexpressions 2 and 4 are evaluated, but they have no effect. 6 is evaluated and returned to the outer + procedure.
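
    A slightly more practical sketch: the consequent of an if form must be a single expression, so begin is used to perform two assignments there. The variable name total is hypothetical.

    ```scheme
    (define total 0)

    ; Both set! forms run in order, then the value of 'total' is
    ; returned as the value of the begin, and hence of the if.
    (if (= total 0)
        (begin
          (set! total (+ total 1))
          (set! total (+ total 1))
          total)
        total)                   ; evaluates to 2
    ```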


Primitive Procedures

Primitive procedures are procedures that are built into the language. This means that the compiler produces efficient low-level routines for them.

Unlike in full-blown Scheme, microscheme primitives are not first-class, i.e. they can only appear in the function application form. This is a problem when you want to pass a primitive function as the argument to a higher-order function such as map. For example, you may want to invert a list of Booleans: (map not list_of_booleans).

The solution is to make a simple wrapper-function which is first-class but performs the same function as the primitive you want to work with: (define (not* x) (not x)).

Then, you are free to use it as a value: (map not* list_of_booleans). This might seem annoying, but it is not without good reason. Making all primitive functions first-class would tie up around 0.5 KB of RAM. On the Arduino, RAM is precious. This compromise amounts to you, the programmer, telling the compiler exactly which primitives need to be loaded into RAM. For the vast majority of programs, this amounts to a massive memory saving.
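
Put together as a complete program, the wrapper technique might look like the sketch below. The include path is an assumption about where the list library lives relative to your program; map comes from that library, not from the compiler.

```scheme
; Assumed path to the list library, which provides map.
(include "libraries/list.ms")

; A first-class wrapper around the primitive 'not'.
(define (not* x) (not x))

; Writing (map not ...) here would be rejected at compile time,
; because 'not' is primitive and cannot be used as a value.
(map not* (list #t #f #t))       ; evaluates to the list (#f #t #f)
```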

Available Primitives

The primitive procedures built-in to compiler version 0.6 are:

  • =, >, >=, <, <=, not
  • +, -, *, div, mod, zero?
  • number?, pair?, vector?, procedure?, char?, boolean?, null?
  • cons, car, cdr, set-car!, set-cdr!
  • list, vector
  • vector-length, vector-ref, vector-set!
  • assert, error
  • include, stacksize, heapsize, pause
  • serial-send, digital-state, set-digital-state

Type System

Microscheme has a strong dynamic type system. It is strong in the sense that:

  • All values have a specific, definite type
  • No type coercion occurs
  • Procedures are generally valid for a specific set of types
  • Type exceptions are raised when procedures are applied to values of the wrong type

It is dynamic in the sense that a variable is not restricted to hold values of a certain type. The type of value to which a variable name will be bound is not known until runtime, and can change as the program progresses.

The built-in types are: Number, Char, Boolean, Pair, Vector, ‘The Empty List’ aka null, which is said to have a type of its own, and Procedure. From these basic types we can infer compound types. A List is defined to be something of the type null or pair where the value of the cdr field has type List. This definition is effectively implemented by the (list? …) function in the ‘list.ms’ library.
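
The recursive definition of List translates almost word for word into code. This is only a sketch of the idea, under the hypothetical name my-list? so as not to clash with the library; the actual implementation in ‘list.ms’ may differ.

```scheme
; x is a List if it is null, or if it is a pair whose cdr is a List.
(define (my-list? x)
  (or (null? x)
      (and (pair? x) (my-list? (cdr x)))))

(my-list? (list 1 2 3))      ; evaluates to #t
(my-list? (cons 1 2))        ; evaluates to #f, since the cdr is a number
(my-list? 5)                 ; evaluates to #f
```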

Even though the built-in numeric type is fairly restricted (15-bit unsigned integer), a richer numeric stack can be built using combinations of pairs, vectors and numbers. For example, the 'xtypes.ms' library provides types long and fp, which represent 8-digit unsigned integers, and 4+4 digit fixed-point real numbers respectively.

For every type, there is a predicate function which answers the question 'is this value of type X'. These predicates are consistently formed by appending a question mark to the type name. For example, (number? 4) evaluates to #t. (boolean? 4) evaluates to #f. (boolean? (number? 4)) evaluates to #t.
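
The dynamic side of the type system can be observed with the same predicates. In this sketch the variable name v is hypothetical; it holds a number first and a boolean later.

```scheme
(define v 4)
(number? v)                  ; evaluates to #t
(boolean? v)                 ; evaluates to #f

; The same variable may later be bound to a value of another type.
(set! v #t)
(number? v)                  ; evaluates to #f
(boolean? v)                 ; evaluates to #t
```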

Procedures for converting between types are formed with an arrow between the type names, e.g. (vector->list v). These conversions are not provided for many types, but they can be written manually.


Microscheme Libraries

Microscheme supports the (include …) primitive, which effectively loads the whole contents of another file into the program. This allows commonly used program segments to be saved in 'libraries' that can be included in any other program. Typically, libraries contain definitions, but do not perform any input or output, so including them simply makes a set of procedures and data structures available to the program. Some useful libraries are included with microscheme, and more will become available as the project matures:

libraries/io.ms provides digital Input/Output functionality. This allows you to work with the Arduino's digital I/O pins, using the indices given to them on the Arduino board. It provides the procedures:

  • (set-ddr N X) to set the DDR (data-direction-register) for a pin. N is the pin number. X is #f for ‘input’ and #t for ‘output’.
  • (get-ddr N) returns a boolean representing the DDR value for pin N. #t means ‘output’.
  • (set-pin N Y) sets the value (high or low) for pin N.
  • (get-pin N) gets the value (high or low) of pin N.
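
A rough sketch of driving an output pin with this library follows. Several details are assumptions rather than documented facts: the include path, that the Y argument of set-pin is a boolean with #t meaning high, and that the pause primitive takes a duration in milliseconds. Pin 12 is an arbitrary choice which leaves pin 13 free.

```scheme
; Assumed path to the I/O library.
(include "libraries/io.ms")

(set-ddr 12 #t)      ; configure pin 12 as an output
(set-pin 12 #t)      ; drive the pin high (assuming #t means high)
(pause 500)          ; wait (assuming the argument is in milliseconds)
(set-pin 12 #f)      ; drive the pin low again
```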

libraries/list.ms provides various functions for working with lists, which are linear data structures built using pairs and null. Procedures provided include:

  • (list? X) returns true if and only if X is a list.
  • (reverse X) if X is a list, returns a new list which is the reverse of it.
  • (map P X) returns a list formed by performing procedure P on every element of list X.
  • foldr, foldl, for-each, all various common higher-order list procedures.
  • (vector->list V) returns a list whose elements are identical to those of vector V.
NB: the primitive (list …) for building lists is built-in, and implemented efficiently by the compiler.

libraries/long.ms provides an implementation for 8-digit unsigned integers:

  • (long hi lo) forms a long where hi represents the high four digits and lo represents the low four digits of the number. The number 994020 is produced by (long 99 4020).
  • (hi X) and (lo X) extract the high and low parts of a long.
  • (long? X) returns true if X is a valid long. Warning: any pair of numbers will satisfy this.
  • l+ l- l* l/ standard arithmetic operators. (NB: l* and l/ are slow, software-based implementations.)
  • l++ l-- l** are in-place versions of l+ l- and l*. i.e. (l++ lX lY) is equivalent to (set! lX (l+ lX lY)), but allocates no new memory. You should use these operators wherever possible.
  • l= l< l> l<= l>= standard numeric comparators.
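
As a hedged sketch of how these procedures fit together (the include path and the variable names a, b and c are assumptions for this illustration):

```scheme
; Assumed path to the long library.
(include "libraries/long.ms")

(define a (long 99 4020))    ; the number 994020, as above
(define b (long 0 5980))     ; the number 5980

; l+ allocates a new long; the sum here should represent 1000000.
(define c (l+ a b))
(hi c)                       ; the high four digits of the sum
(lo c)                       ; the low four digits of the sum

; l++ updates a in place instead of allocating a new long.
(l++ a b)
(l= a c)                     ; a and c should now represent the same number
```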

libraries/fixedpoint.ms provides an implementation for 5+5 digit unsigned fixed-point reals:

NB: including the xtypes library has the same effect as including long and fixedpoint individually, but saves memory by taking advantage of the overlap between their functions.


Compiler Errors

As of version 0.6, build 230, the possible compile-time errors are:

0  Out of memory
1  Char buffer full
2  while lexing the file '%s'. File could not be opened
3  Comment before end of token
4  Extraneous )
5  Missing )
6  Procedure '%s' is primitive, and cannot be used as a value
7  Non-identifier in formal argument list
8  Malford lambda. No formals given
9  Wrong number of operands to IF form
10 First operand to SET should be IDENTIFIER
11 Wrong number of operands to SET form
12 Wrong number of operands to DEFINE form
13 Non-identifier in formal argument list
14 First operand to DEFINE should be IDENTIFIER or PARENS
15 Definition not allowed here
16 Malformed Binding
17 Malformed LET?
18 First operand to INCLUDE should be STRING
19 Wrong number of operands to INCLUDE form
20 Unknown parenthesized form
21 Unknown form\n
22 Unexpected list of expressions
23 NOT IN SCOPE %s
24 Integer constant too large
25 Freevar refs of degree > 1 not supported yet
26 No primitive P taking N arguments
27 Internal Error

Runtime Exceptions

Like Scheme, microscheme is strongly, dynamically typed. Exceptions are semantic errors that arise at runtime. Microscheme makes use of the Arduino's built-in LED on digital pin 13 to give on-device indications of these situations. Generally, exceptions are not recoverable, and the device will need to be reset if an exception is raised. While it is possible to use digital pin 13 for general input and output, it is highly recommended to leave it free for exception indication.

Status  Meaning                  Indication
RUN     Program Running          No Light
NVP     Not a Valued Procedure   Single Flashes
NAR     Number of ARguments      2 Flashes
NAN     Not A Number             3 Flashes
NAP     Not A Pair               4 Flashes
NAV     Not A Vector             5 Flashes
OOB     Out Of Bounds            6 Flashes
DBZ     Divide By Zero           7 Flashes
ERR     Custom Exception         Continuous Flashes
HALT    Program Completed        Continuous Light

Exception Details

NVP: A procedure application takes the form (proc X1 X2 ... Xn) where proc is an expression. At the time of application, if proc does not evaluate to a (valued) procedure, such as the result of a (lambda …) form, or a variable bound to a procedure, then NVP will be raised.

NAR: A procedure application takes the form (proc X1 X2 ... Xn) where X1 X2 ... Xn are arguments. At the time of application, if proc evaluates to a procedure taking m arguments, but m ≠ n, then NAR will be raised.

NAN: Indicates that an arithmetic operator (+, -, *, /, div, mod) received an argument that did not evaluate to a number.

NAP: Indicates that a pair operator (car, cdr, set-car!, set-cdr!) received an argument that did not evaluate to a pair.

NAV: Indicates that a vector operator (vector-ref, vector-set!) received an argument that did not evaluate to a vector.

OOB: Indicates that a vector operator (vector-ref, vector-set!) received an index that was outside the dimensions of the vector given.

DBZ: Indicates an attempt to divide by zero.

ERR: This exception is raised manually by the programmer. See (error) and (assert expr) in the language guide.
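
To tie the descriptions above to code, here is a sketch of expressions which should each raise the exception named in its comment. Since exceptions are not recoverable, only the first one reached would actually be reported, so treat each line as a separate experiment. The names notproc and checked-div are hypothetical.

```scheme
(define notproc 5)

(notproc 1)                      ; NVP: notproc is bound to a number
((lambda (x) x) 1 2)             ; NAR: one argument expected, two given
(+ 1 #t)                         ; NAN: #t is not a number
(car 5)                          ; NAP: 5 is not a pair
(vector-ref 5 0)                 ; NAV: 5 is not a vector
(vector-ref (vector 1 2 3) 7)    ; OOB: index 7 is outside a 3-element vector
(div 1 0)                        ; DBZ: division by zero

; ERR is raised deliberately. 'checked-div' (hypothetical) uses assert
; to reject a zero divisor before div is reached.
(define (checked-div a b)
  (assert (not (zero? b)))
  (div a b))

(checked-div 10 2)               ; evaluates to 5
(checked-div 10 0)               ; ERR: the assertion fails
```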