Wee language design

The Wee language is part of an educational programming environment aimed at novice programmers. Please refer to this series of articles on motivations, influences, etc.

Wee language design

Goals

Program structure

Open problems

Types and operators

Primitive types

Literals

Open Problems

Strings

Literals

Open problems

Lists

Literals

Open problems

Dictionaries

Literals

Open problems

Records

Definition

Literals

Open problems

Tagged unions

Functions

Literals

Open problems

Operators & expressions

Variables and assignments

Default value initialization

Open problems

Control flow

If

While

For

Match

Named functions and overloading

Hashing and equality

Error handling

Memory management

Core library

Syntax grammar

Unsorted problems and ideas

Aliasing to reduce having to repeat access chains

Discussion

Passing elements of mutable collections as arguments is unsafe

Discussion


Goals

The target user of Wee is the novice programmer without prior exposure to programming. From this we derive Wee’s design goals.

To achieve these goals, Wee is designed as an imperative structured programming language with support for some functional elements.

Wee will not incorporate the following features:

Wee may incorporate the following features in the future:

Program structure

A Wee program consists of one or more modules. Each module is a compilation unit stored in a single file in UTF-8 encoded source form, and consists of:

Types and functions can optionally be exported for use in other modules by prepending their declaration with export. When compiling one or more modules into an executable, the user must specify the entry point module. The module-level code of this entry point module becomes the main driver of the program.

A module declares its dependency and imports exported types and functions of other modules via an import statement. Once imported, all exported types and functions of the imported module are available to the code in the importing module. To avoid name clashes, an import may be assigned an optional prefix that needs to be specified when accessing the module’s exported types and functions.

Wee provides a set of core built-in types and functions that are always available without import.

Wee’s standard library for areas like i/o is contained in modules which need to be imported.

Example:

import “foo”

import “bar” as blargh

‘ Function moo of module foo does not require a prefix

moo()

‘ Function bar of module bar requires a prefix

blargh.bar()

Open problems

Types and literals

Every type has a type name, indicated in this document via the use of a monospaced font, e.g. nothing. These type names are used when specifying the types of variables, arguments, return values, fields, elements, dictionary keys, dictionary values, pointers, or types participating in a tagged union.

Every type has one or more literal forms that create a value of the type, e.g. 123, or “Hello world.”.

Primitive types

Wee supports the following primitive types.

The type nothing is special in that it can not be used in all places other type names can be used. The type name can be used to explicitly specify that a function does not return a value (e.g. the signature (int) -> nothing of a function that takes an int and returns “nothing”). The type name can also be used in a tagged union as an option to signal “no value” is part of the union (e.g. b | c | nothing, for a union that can be an a, b or a nothing).

See core library for information on built-in functions to work with primitives.

Literals

The nothing literal can only be used when assigning “no value” to a tagged union of which nothing is a part of, or when matching against nothing in a match statement.

Values of the boolean type can be expressed by the literals true and false.

Values of the int type can be expressed by decimal literals (32342), hexadecimal literals (0x3f4a) and binary literals (0b0011011).

Values of the float type can be expressed by decimal literals (234.443, 0.65, but not .54).

Open Problems

Strings

The immutable string type stores a sequence of UTF-8 code points. String data can be accessed as individual bytes, or as Unicode code points, both returned as int.

Strings are an opaque data type; their content can only be accessed and operated on through core library functions and operators supporting the string type.

See core library for information on built-in functions to work with strings and their data. See the memory management section for information on string memory management.

Literals

Values of the string type can be expressed by double quoted literals, e.g. “Hello world.”. String literals may contain the following escape sequences:

String literals support interpolations of the form “Hello ${expression}.”. The interpolations are evaluated to a value. If the value is not of type string, then the toString() function for the type is called and the value is converted to a string.

Open problems

Lists

The dynamically sized list<elementType> type stores zero or more values of the specified elementType. Lists can not store the nothing type, except if it is part of a tagged union.

Elements in the list are stored sequentially and indexed by an integer number. The first element is located at index 0, the last element is located at index length(list) - 1 (where length() is a core library function).

All indexed accesses to list elements via the list indexing operator are runtime checked. Access of indices outside the bounds of the list throws an exception.

See core library for information on built-in functions to work with lists and their elements. See the memory management section for information on list memory management.

Literals

Values of the list type can be expressed by different literals depending on the use case.

Open problems

Dictionaries

The dynamically sized dictionary<keyType, valueType> type stores zero or more key/value pairs of the specified keyType and valueType. Dictionaries can not store the nothing type, except if it is part of a tagged union.

The dictionary is implemented as a hash map. Keys are hashed and equality tested using the hashing and equality functions of their respective type. See the hashing, equality and toString() section.

All accesses to values via the key access operator are runtime checked. Accessing a value through a key that is not contained in the dictionary results in an exception being thrown.

See core library for information on built-in functions to work with dictionaries and their key/value pairs. See the memory management section for information on dictionary memory management.

Literals

Values of the dictionary type can be expressed by different literals depending on the use case.

Open problems

Records

The user-defined record type has zero or more fields. Each field has a name unique within the record and stores a value of a specific type. A field can not store the nothing type, except if it is part of tagged union.

Records can be nested but must not be recursive.

By default, all fields of a record are visible to and mutable by any code, except if the record is passed to a function as immutable. A field’s visibility can be limited to the module its record is defined in via the hidden keyword. Write access to a field can be restricted to the defining module via the readonly keyword

Records marked with the keyword export will be exported from the module and accessible by other modules.

Definition

A record type must be defined as a top-level module statement.

record Vector2 { x: float, y: float }

export record Monkey {

   ‘ only code in the module of Monkey can modify name

   readonly name: string

   position: Vector2

   ‘ only code in the module can see and access the bananas

   hidden bananas: list<Banana>

}

An exported function may only refer to exported records in its arguments and return type. An exported record may only refer to other exported records.

Literals

Values of the record type can be created by a named constructor function. The compiler will generate one constructor function in the module of a record type:

fun RecordType() -> RecordType

The function will be exported if the record type itself is exported.

export fun RecordType() -> RecordType

A user can hide the generated constructor function of an exported type from other modules. The user must not specify a function body:

hidden fun RecordType() -> RecordType

A user can create additional constructor functions with arguments that use the generated constructor function to create the initial record value and then use the arguments to initialize the value’s fields.

‘ hide default constructor from other modules

hidden fun Vector2() -> Vector2

‘ Custom constructor function

export fun Vector2(x: float, y: float) -> Vector2 {

   var v = Vector2()

   v.x = x

   v.y = y

   return v

}

Tagged unions

A tagged union consists of one or more other types. The list of types can include nothing to signal optionality of a value. The list of participating types is the type name.

‘ A union that can store an int, float or no value.

int | float | nothing

‘ A union that can store a list or a dictionary

list<int> | dictionary<string, int>

Tagged unions do not have a literal. Before accessing the value of a tagged union, it must be type matched via the match statement.

Pointer

A pointer

Functions

A function is an opaque data type “pointing” to a function body, with a signature consisting of zero or more typed arguments, and a single typed return value. The function signature is the type name.

‘ a function type with 2 int arguments, returning nothing

(int, int)

‘ a function type with 2 arguments returning an int

(string, list<int>) -> int

‘ a function type with 1 mutable argument returning an int

(mut float) -> int

All arguments are passed as immutable by default, allowing only read access on the passed in values (recursively in case of lists, dictionaries, records and tagged unions).

Arguments must be explicitly marked as mutable with the mut keyword, in which case the passed in value can be mutated in-place freely (recursively in case of lists, dictionaries, records and tagged unions).

If the return type is omitted, the function is assumed to return a value of type nothing.

Functions can be created anywhere in the code with a function literal and assigned to variables of a function type, e.g.:

var a: (int) -> int = (a: int) -> int { return a + 1 }

Literals

A function literal consists of an argument list, an optional return type and a function body.

(a: int) -> int { return a + 1 }

If the types of arguments and the return type can be inferred, they can be omitted from the literal.

var a: (int) -> int = (a) -> { return a + 1 }

Named functions and overloads

Functions can be named and exported from a module:

fun myInternalFunc() { … }

export myExportedFunc () { … }

Functions can be overloaded by varying the argument number & types as well as return type.

fun add(a: int, b: int) -> int { return a + b }

fun add(a: float, b: float) -> float { return a + b }

Open problems

Operators & expressions

The following operators are defined.

The operator precedence is the same as in Java (where applicable). Sub-expressions may be grouped by ().

Default value initialization

When a value is created it is initialized to a default value.

Open problems

Variables, bindings, and assignments

Control flow

If

Evaluates a boolean expression and executes the first branch if the expression is true, and the second branch (else) otherwise. The second branch including the else keyword may be omitted.

if a > 10 then {

   print(“It’s greater than ten!”)

} else if a < 5 then {

   printf(“It’s smaller than 5”)

} else {

   print(“It’s just right”)

}

Else if is not a special construct, but just a side effect of allowing single statement blocks for branches.

While

While evaluates a boolean expression and executes a block for as long as the expression is true.

var a = 0

while (a < 5) repeat {

   print(“Smaller than 5”)

   a = a + 1

}

For

Match

The match statement takes a tagged union value and requires the user to handle it possible type exhaustively.

var v: int | float | string | nothing = 1.23

match(v) {

   int: v = v + 1

   float: {

      print(math.cos(v))

   }

   string: print(v)

   nothing: ‘ no value, do nothing

}

Open problems

Hashing, equality and toString()

For every type someType, Wee automatically generates three functions in the module the type was defined in:

The equals() function is called when two values are compared via the equal (==) or not equal (!=) operator.

The hash() function is used by the dictionary type to hash keys.

A user can replace the auto-generated functions by manually adding the functions to the module the corresponding type is defined in. When replacing the default implementations of hash or equals, a user must always implement both. The custom implementations must follow this contract:

Open problems

Error handling

Wee uses a return value based error handling mechanism similar to Rust. Fatal errors a program can not recover from can be raised by calling the core library error() function. This function will terminate the program and optionally print a stack trace.

Recoverable errors can be signaled using tagged unions and an Error record type.. The core library APIs will be exemplify the recommended way of defining errors. However, it is up to the user how they want to signal errors.

In general, recoverable errors will look like this:

fun thisMayError() -> resultType | Error {

   return Error(“Something bad happened”)

}

var result = thisMayError()

match(result) {

   resultType -> …

   Error -> print(“Oh no: ” + result.message)

}

Open problems

Memory management

Values in Wee are stack allocated. Their memory is reclaimed as soon as the scope within the value has been defined in ends. Memory reclamation can thus rely on a simple FIFO stack in a VM, or use the native stack when compiling to native code.

Strings, lists and dictionaries are dynamic in size and could exhaust the available stack space if allocated on the stack. The data of these types (UTF-8 bytes, list elements, dictionary keys and values) are thus allocated on the heap. The life-time of that heap memory is still bound to the stack-side representation of the value.

Lists and dictionaries are represented on the stack with opaque structs that store a pointer to the heap data. Each list and dictionary has its own unique heap memory. There is no aliasing of list or dictionary heap memory! When the stack-side representation of a list or dictionary dies because it goes out of scope, the Wee runtime can also release the corresponding heap memory.

Strings work similarily, however, their immutability allows us to share the heap data between multiple string values. The heap-side data stores a reference count as well as the UTF-8 bytes. When assigning a string value to another string value, the reference count is increased, and the pointer to the heap data is shared by the two stack-side representations of the strings. This also works if a sub-string is created. If a stack-side string dies due to going out of scope, the reference count is checked. If the count is zero, the heap-side data can be reclaimed.

Core library

Wee’s core library consists of built-in functions accessible to all modules without an implicit import. The core library functions allow working with built-in types like lists, dictionaries and strings.

Syntax grammar

Unsorted problems and ideas

Rephrasing stack and heap values (value types and reference types) as a life-time and shareability issue

In Wee’s current design, the life-time of a value is that of the scope it has been created in. There is no way for a value to outlive its scope (there is however an issue with values in lists/dictionaries dying before aliases to them go out of scope). Assigning a value to something creates a copy, same for returning a value.

Wee also lacks explicit reference types. Under the hood, arguments may be passed by reference, e.g. because they are big (collections) or mutable (mut keyword). Another under-the-hood reference is created in for-each loops, when a reference to a collection element is taken. In both cases, the implicit references are scoped to either the function or the loop, and can not escape or form aliases.

Every value is unique, or rephrased, has its own storage location (string memory management is an implementation detail, like scoped implicit references).

Finally, fields are stored consecutively inside their record, so the shape of a record, or rephrased, the size of its storage, is always statically known.

All these design decisions have an impact on what types of programs can be, and more importantly, can not be expressed (or only expressed in a very cumbersome way). It can also have a negative effect on performance, as copying large values is an expensive operation. The consequences in short:

The first consequence can be viewed as a virtue: it makes reasoning about state (but not necessarily about performance) easier. There are however down-sides: some values like resources (images, GPU resources, …) would lend themselves well to sharing.

The second consequence can be worked around in most cases. E.g. instead of returning a (copy) of a big list, the user can pass a list into a function by reference.

The third consequence is the most severe, as it prohibits certain types of programs, or makes writing them very cumbersome and error prone. The best example are graph-like structures, which could not be expressed as easily as with some form of reference type (in conjunction with nothing to signal optionality).

Wee and its users might therefore profit from a type that allows a value to outlive its scope, and which can be shared by multiple owners.

Discussion

Mario: For the sake of discussion, the type name for references is ref. I think this is a bad name, maybe it could be shared? Something that would both say “this value outlives its creation scope”, “this value can share the value with others”, and “this value dies when nobody points to it anymore” (internally, we’d use a GC, not ref-counting, so we can do arbitrary graphs without the burden of introducing weak references.)

Reference types have a single literal: ref<valueType>(value). E.g.

var m = ref(monkey())

This would create a monkey that can outlive its current scope (rephrased: it’s stored on the heap). The type parameter is infered. The variable m would have the type ref<monkey>, and generally behave like a scoped value.

m.name = “Alfred” ‘ accessing the value works as usual

var localMonkey = monkey() ‘ a scoped monkey

If (m == localMonkey) {

   ‘ passing the value to a function is trivial as well
  ‘ foo must has the signature (v: ref<monkey>) though!

   foo(m)

}

m = localMonkey ‘copies the value of localMonkey to m

The differences to a scoped value become apparent when we have more than one ref<monkey> point to the same unscoped value.

var m = ref(monkey()) ‘ one unscoped monkey

Var n = ref(monkey()) ‘ two unscoped monkeys

m = n ‘ copies the monkey of n to m

m -> n ‘ now both m and n point to the same monkey!

However, we couldn’t let a ref reference a scoped value, that’d just be nasty. If you use scoped values, you have to deal with them like a good, copying citizen.

And monkeys can now escape functions as well.

fun findMonkey(name: string, monkeys: list<ref<monkey>>) -> ref<monkey> | nothing {

   for each m in monkeys

      if (m.name == name) return m

   return nothing

}

var monkeys = [ref(monkey(name: “Alfred”)), ref(monkey(...)), ref(monkey(...))]

var alfred = findMonkey(“Alfred”, monkeys)

match(alfred) {

   monkey -> { print(“Found Alfred.”) }

   nothing -> { print(“Alfred is missing.”) }

}

And the imo biggest gain: we can do graphs!

record node {

   value: int

   next: ref<node|nothing> ‘ initialized to nothing

}

Since whether something is an unscoped or scoped value is decided use-site, the root of our linked list can even be scoped (which is of course a debatable choice).

var list = node()

list.value = 1

List.next = ref(node())

What I like about this approach:

What I don’t like about this approach:

Overall, I’m sort of happy with this though. The only way to alias scoped values is through argument passing and forEach (and maybe I just disallow forEach on collections storing scoped values).

Aliasing to reduce having to repeat access chains

In-place updates or access to “sub-values” of nested lists (element), dictionaries (key and or value), and records (fields), can be cumbersome without explicit references. E.g.:

list[1].a.b.x = 10

list[1].a.b.y = 7

list[1].a.b.z = 8

where b is a 3 dimensional vector record.

Since the assignment operator creates a copy, this code won’t work:

var v = list[1].a.b ‘ creates a copy of b

v.x = 10 ‘ the copy is modified, not b

v.y = 7

v.z = 8

In the current design, there are two ways to solve this. The first one is to provide a setter function, so the access chain only has to be specified once:

fun set(v: mut vector, x: float, y: float, z: float) {

   v.x = x

   v.y = y

   v.y = z

}

set(list[1].a.b, 10, 7, 8)

Alternatively, if no function is available to manipulate multiple fields of the record at once, a function literal can be used:

(v: mut vector) {

   v.x = 10

   v.y = 7

   v.z = 8

} (set(list[1][“someKey”].a.b)

In both cases, a scoped, shorthand alias is created through the first argument, through which the value can be modified.

There are a few problems with this approach:

Discussion

Mario: I could imagine a locally scoped aliasing mechanism:

alias v = list[1].a.b

v.x = 10

v.y = 7

v.z = 8

The alias could only be assigned to once. It would suffer from the same unsafety as the above solutions, but is much less cumbersome and might be easier for a novice to understand. See http://mustoverride.com/ref-locals_single-assignment/

Passing/aliasing elements of mutable collections is unsafe

Arguments are passed by-reference (implementation detail). Consider the following code:

‘ module-global mutable list accessible by all module functions

module var l = [vector(), vector(), vector()]

foo(l[1])

fun foo(v: mut vector) {

   l.removeAt(1) ‘ invalidates v

   v.x = 1       ‘ either segfault, or write to old l[2]

}

Another variation:

fun bar() {

   var l = [vector(), vector(), vector()]

   foo(l, l[1])

}

fun foo(l: mut list<vector>, v: mut vector) {

   l.removeAt(1) ‘ invalidates v

   v.x = 1       ‘ either segfault, or write to old l[2]

}

And a variation using for each:

var l = [1, 2, 3, 4]

for each v in l { ‘ v is an alias

   l.removeAt(0)

   v.x = 1 ‘ either segfault or write to old l[2]

}

Discussion

Mario: In the second case, we could statically prevent users from being able to passing both the list and one of its elements to the function. But that seems more like a bandaid.

In the first case there’s no static way to know that foo() invalidates v by mutating l. Functions using module globals might have to be annotated with “this thing uses a global”, the we can apply the same restriction as proposed for the second variation.

In the third case, we could “freeze” collections for the scope of the alias to an element within the collection. Freezing would prevent resizing the collection. Again, I don’t think this can be statically determined. Also, if you alias an element in a module global in module-level code, and that alias never goes out of scope, the module global can’t be resized anymore at all.

Maybe I should remove/change for each from the language for the time being. E.g.

For each index in l {

   var v = l[index] ‘ copy

   v = v + 1 ‘ mutate

   L[index] = v ‘ write back

}