mirror of
https://github.com/janet-lang/janet
synced 2025-01-25 14:46:52 +00:00
Remove doc markdown and move it to website.
parent 795e7a9de8
commit e68a889fa9
2
Makefile
@ -160,7 +160,7 @@ dist: build/janet-dist.tar.gz

 build/janet-%.tar.gz: $(JANET_TARGET) src/include/janet/janet.h \
 	janet.1 LICENSE CONTRIBUTING.md $(JANET_LIBRARY) \
-	build/doc.html README.md $(wildcard doc/*.md)
+	build/doc.html README.md
 	tar -czvf $@ $^

 #########################

@ -1,6 +0,0 @@
Janet is a dynamic, lightweight programming language with strong functional
capabilities as well as support for imperative programming. It is meant to be used
for short-lived scripts as well as for building real programs. It can also
be extended with native code (C modules) for better performance and interfacing with
existing software. Janet takes ideas from Lua, Scheme, Racket, Clojure, Smalltalk, Erlang, Arc, and
a whole bunch of other dynamic languages.
@ -1,739 +0,0 @@
# Hello, world!

Following tradition, a simple Janet program will print "Hello, world!".

```
(print "Hello, world!")
```

Put the code above in a file named `hello.janet`, and run `./janet hello.janet`.
The words "Hello, world!" should be printed to the console, and then the program
should immediately exit. You now have a working Janet program!

Alternatively, run the program `./janet` without any arguments to enter a REPL,
or read-eval-print loop. This is a mode where Janet functions like a calculator,
reading some input from the user, evaluating it, and printing out the result, all
in an infinite loop. This is a useful mode for exploring or prototyping in Janet.

This hello world program is about the simplest program one can write, and consists of only
a few pieces of syntax. The first element is the `print` symbol. This is a function
that simply prints its arguments to the console. The second element is the
string literal "Hello, world!", which is the one and only argument to the
print function. Lastly, the print symbol and the string literal are wrapped
in parentheses, forming a tuple. In Janet, parentheses and brackets are interchangeable;
brackets are used mostly when the resulting tuple is not a function call. The tuple
above indicates that the function `print` is to be called with one argument, `"Hello, world!"`.

Like all Lisps, all operations in Janet are in prefix notation; the name of the
operator is the first value in the tuple, and the arguments passed to it are
in the rest of the tuple.
|
||||
|
||||
# A bit more - Arithmetic

Any programming language will have some way to do arithmetic. Janet is no exception,
and supports the basic arithmetic operators.

```
# Prints 13
# (1 + (2*2) + (10/5) + 3 + 4 + (5 - 6))
(print (+ 1 (* 2 2) (/ 10 5) 3 4 (- 5 6)))
```

Just like the print function, all arithmetic operators are entered in
prefix notation. Janet also supports the remainder operator, or `%`, which returns
the remainder of division. For example, `(% 10 3)` is 1, and `(% 10.5 3)` is
1.5. The lines that begin with `#` are comments.

All Janet numbers are IEEE 754 floating point numbers. They can be used to represent
both integers and real numbers to a finite precision.
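
A few consequences of every number being a floating point value can be sketched as follows. This is only an illustration, assuming numbers behave as IEEE 754 doubles as described above; the names `quotient`, `sum`, and `big` are made up for the example.

```janet
# Division does not truncate, because there is no separate integer type.
(def quotient (/ 10 4))
(print quotient) # prints 2.5

# Decimal fractions are stored approximately.
(def sum (+ 0.1 0.2))
(print (= sum 0.3)) # prints false

# Integers are exact only up to 2^53.
(def big 9007199254740992)
(print (= big (+ big 1))) # prints true
```

For everyday counting and indexing this is not a concern, since integers of that magnitude are rare in practice.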
|
||||
|
||||
## Numeric literals

Numeric literals can be written in many ways. Numbers can be written in base 10, with
underscores used to separate digits into groups. A decimal point can be used for floating
point numbers. Numbers can also be written in other bases by prefixing the number with the desired
base and the character 'r'. For example, 16 can be written as `16`, `1_6`, `16r10`, `4r100`, or `0x10`. The
`0x` prefix can be used for hexadecimal as it is so common. The radix itself must be written in base 10, and
can be any integer from 2 to 36. For any radix above 10, use letters as digits (not case sensitive).

Numbers can also be written in scientific notation, such as `3e10`. A custom radix can be used
for scientific notation numbers as well (the exponent will share the radix). For numbers in scientific
notation with a radix besides 10, use the `&` symbol to indicate the exponent rather than `e`.
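
As a quick check of the literal forms above, the following sketch compares several spellings of the same values. It assumes a Janet build whose reader accepts these forms as described; the `&` exponent form is left out on purpose.

```janet
# Five spellings of sixteen.
(print (= 16 1_6 16r10 4r100 0x10)) # prints true

# Letters act as digits above radix 10, in either case.
(print (= 255 0xff 16rFF 2r11111111)) # prints true

# Scientific notation and digit grouping describe the same number.
(print (= 3e10 30_000_000_000)) # prints true
```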
|
||||
|
||||
## Arithmetic Functions

Besides the 5 main arithmetic functions, Janet also supports a number of math functions
taken from the C library `<math.h>`, as well as bit-wise operators that behave like they
do in C or Java. Functions like `math/sin`, `math/cos`, `math/log`, and `math/exp` will
behave as expected to a C programmer. They all take either 1 or 2 numeric arguments and
return a real number (never an integer!). Bit-wise functions are all prefixed with b.
They are `bnot`, `bor`, `bxor`, `band`, `blshift`, `brshift`, and `brushift`. Bit-wise
functions only work on integers.
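
A short sketch of these functions in use, assuming the `math/` functions and the bit-wise functions are available under the names listed above (`math/sqrt` and `math/pow` are assumed to follow the same `<math.h>` naming):

```janet
# Math functions mirror their C counterparts.
(print (math/sqrt 16))   # prints 4
(print (math/pow 2 10))  # prints 1024

# Bit-wise functions: 12 is 1100 and 10 is 1010 in binary.
(print (band 12 10))     # prints 8
(print (bor 12 10))      # prints 14
(print (bxor 12 10))     # prints 6
(print (blshift 1 4))    # prints 16
(print (brshift 16 2))   # prints 4
(print (bnot 0))         # prints -1
```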
|
||||
|
||||
# Strings, Keywords and Symbols

Janet supports several varieties of types that can be used as labels for things in
your program. The most useful type for this purpose is the keyword type. A keyword
begins with a colon, and then contains 0 or more alphanumeric or a few other common
characters. For example, `:hello`, `:my-name`, `::`, and `:ABC123_-*&^%$` are all keywords.

Keywords, symbols, and strings all behave similarly and can be used as keys for tables and structs.
Symbols and keywords are optimized for fast equality checks, so they are preferred for table keys.

The difference between symbols and keywords is that keywords evaluate to themselves, while
symbols evaluate to whatever they are bound to. To have a symbol evaluate to itself, it must be
quoted.

```lisp
# Evaluates to :monday
:monday

# Will throw a compile error as monday is not defined
monday

# Quote it - evaluates to the symbol monday
'monday

# Or first define monday
(def monday "It is monday")

# Now the evaluation should work - monday evaluates to "It is monday"
monday
```

The most common thing to do with a keyword is to check it for equality or use it as a key into
a table or struct. Note that symbols, keywords and strings are all immutable. Besides making your
code easier to reason about, this allows for many optimizations involving these types.

```lisp
# Evaluates to true
(= :hello :hello)

# Evaluates to false, everything in Janet is case sensitive
(= :hello :HeLlO)

# Look up into a struct - evaluates to 25
(get {
    :name "John"
    :age 25
    :occupation "plumber"
} :age)
```

Strings can be used similarly to keywords, but their primary usage is for defining either text
or arbitrary sequences of bytes. Strings (and symbols) in Janet are what is sometimes known as
"8-bit clean"; they can hold any number of bytes, and are completely unaware of things like character
encodings. This is completely compatible with ASCII and UTF-8, two of the most common character
encodings. By being encoding agnostic, Janet strings can be very simple, fast, and useful for
other uses besides holding text.
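
To make "8-bit clean" concrete, here is a small sketch that treats strings as plain byte sequences. It assumes the reader supports `\xHH` escapes for raw bytes; the names `e-acute` and `raw` are only illustrative.

```janet
# The two bytes C3 A9 are the UTF-8 encoding of one accented letter,
# but Janet only sees two bytes.
(def e-acute "\xC3\xA9")
(print (length e-acute)) # prints 2
(print (get e-acute 0))  # prints 195

# Bytes that are not valid text are stored just as well.
(def raw "\x00\xFF")
(print (length raw))     # prints 2
(print (get raw 1))      # prints 255
```

Anything that needs to count characters rather than bytes has to decode the string itself.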
|
||||
|
||||
Literal text can be entered inside quotes, as we have seen above.

```
"Hello, this is a string."

# We can also add escape characters for newlines, double quotes, backslash, tabs, etc.
"Hello\nThis is on line two\n\tThis is indented\n"

# For long strings where you don't want to type a lot of escape characters,
# you can use 1 or more backticks (`\``) to delimit a string.
# To close this string, simply repeat the opening sequence of backticks
``
This is a string.
Line 2
    Indented
"We can just type quotes here", and backslashes \ no problem.
``
```
|
||||
|
||||
# Functions

Janet is a functional language - that means that one of the basic building blocks of your
program will be defining functions (the other is using data structures). Because Janet
is a Lisp, functions are values just like numbers or strings - they can be passed around and
created as needed.

Functions can be defined with the `defn` macro, like so:

```lisp
(defn triangle-area
  "Calculates the area of a triangle."
  [base height]
  (print "calculating area of a triangle...")
  (* base height 0.5))
```

A function defined with `defn` consists of a name, a number of optional flags for def, a parameter
list, and finally a function body. The example above is named `triangle-area` and takes two parameters
named `base` and `height`. The body of the function will print a message and then evaluate to the
area of the triangle.

Once a function like the above one is defined, the programmer can use the `triangle-area`
function just like any other, say `print` or `+`.

```lisp
# Prints "calculating area of a triangle..." and then "25"
(print (triangle-area 5 10))
```

Note that when nesting function calls in other function calls like above (a call to `triangle-area` is
nested inside a call to `print`), the inner function calls are evaluated first. Also, arguments to
a function call are evaluated in order, from first argument to last argument.

Because functions are first-class values like numbers or strings, they can be passed
as arguments to other functions as well.

```lisp
(print triangle-area)
```

This prints the location in memory of the function `triangle-area`.
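
Passing a function as an argument is more useful when the receiving function calls it. The sketch below is illustrative only; `apply-twice` and `double` are hypothetical helpers, and `map` is assumed to be the core mapping function.

```janet
# A function that takes another function and applies it two times.
(defn apply-twice [f x]
  (f (f x)))

(defn double [x]
  (* 2 x))

(print (apply-twice double 5)) # prints 20

# Core functions such as map accept functions in the same way.
(def doubled (map double [1 2 3]))
(print (get doubled 2)) # prints 6
```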
|
||||
|
||||
Functions don't need to have names. The `fn` special form can be used to introduce function
literals without binding them to a symbol.

```lisp
# Evaluates to 40
((fn [x y] (+ x x y)) 10 20)
# Also evaluates to 40
((fn [x y &] (+ x x y)) 10 20)

# Will throw an error about the wrong arity
((fn [x] x) 1 2)
# Will not throw an error about the wrong arity
((fn [x &] x) 1 2)
```

The first expression creates an anonymous function that adds twice
the first argument to the second, and then calls that function with arguments 10 and 20.
This will return (10 + 10 + 20) = 40.

There is a common macro `defn` that can be used for creating functions and immediately binding
them to a name. `defn` works as expected at both the top level and inside another form. There is also
the corresponding

Note that putting an ampersand at the end of the argument list inhibits strict arity checking.
This means that such a function will accept fewer or more arguments than specified.

```lisp
(defn myfun [x y]
  (+ x x y))

# You can think of defn as a shorthand for def and fn together
(def myfun-same (fn [x y]
  (+ x x y)))

(myfun 3 4) # -> 10
```

Janet has many macros provided for you (and you can write your own).
Macros are just functions that take your source code
and transform it into some other source code, usually automating some repetitive pattern for you.
|
||||
|
||||
# Defs and Vars

Values can be bound to symbols for later use using the `def` special form. Using undefined
symbols will raise an error.

```lisp
(def a 100)
(def b (+ 1 a))
(def c (+ b b))
(def d (- c 100))
```

Bindings created with def have lexical scoping. Also, bindings created with def are immutable; they
cannot be changed after definition. For mutable bindings, like variables in other programming
languages, use the `var` special form. The assignment special form `set` can then be used to update
a var.

```lisp
(var myvar 1)
(print myvar)
(set myvar 10)
(print myvar)
```

In the global scope, you can use the `:private` option on a def or var to prevent it from
being exported to code that imports your current module. You can also add documentation to
a binding by passing a string to the def or var form.

```lisp
(def mydef :private "This will have private scope. My doc here." 123)
(var myvar "docstring here" 321)
```
|
||||
|
||||
## Scopes

Defs and vars (collectively known as bindings) live inside what is called a scope. A scope is
simply where the bindings are valid. If a binding is referenced outside of its scope, the compiler
will throw an error. Scopes are useful for organizing your bindings and, by extension, your programs.
There are two main ways to create a scope in Janet.

The first is to use the `do` special form. `do` executes a series of statements in a scope
and evaluates to the last statement. Bindings created inside the form do not escape outside
of its scope.

```lisp
(def a :outera)

(do
  (def a 1)
  (def b 2)
  (def c 3)
  (+ a b c)) # -> 6

a # -> :outera
b # -> compile error: "unknown symbol \"b\""
c # -> compile error: "unknown symbol \"c\""
```

Any attempt to reference the bindings from the do form after it has finished
executing will fail. Also notice how defining `a` inside the do form did not
overwrite the original definition of `a` for the global scope.

The second way to create a scope is to create a closure.
The `fn` special form also introduces a scope just like
the `do` special form.
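
The scope introduced by `fn` can be sketched with a counter closure. This is an illustration rather than anything from the original text; `make-counter`, `next-id`, and `other` are hypothetical names.

```janet
# Each call to make-counter creates a fresh scope holding its own var.
(defn make-counter []
  (var total 0)
  (fn []
    (set total (+ total 1))
    total))

(def next-id (make-counter))
(def other (make-counter))

(print (next-id)) # prints 1
(print (next-id)) # prints 2
(print (other))   # prints 1, since it has its own total
```

The var `total` is not visible outside `make-counter`, yet each returned function keeps access to the one it was created with.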
|
||||
|
||||
There is another built-in macro, `let`, that does multiple defs at once, and then introduces a scope.
`let` is a wrapper around a combination of defs and dos, and is the most "functional" way of
creating bindings.

```lisp
(let [a 1
      b 2
      c 3]
  (+ a b c)) # -> 6
```

The above is equivalent to the example using `do` and `def`.
This is the preferable form in most cases,
but using do with multiple defs is fine as well.
|
||||
|
||||
# Data Structures

Once you have a handle on functions and the primitive value types, you may be wondering how
to work with collections of things. Janet has a small number of core data structure types
that are very versatile. Tables, Structs, Arrays, Tuples, Strings, and Buffers are the 6 main
built-in data structure types. These data structures can be arranged in a useful table describing
their relationship to each other.

|            | Mutable | Immutable |
| ---------- | ------- | --------- |
| Indexed    | Array   | Tuple     |
| Dictionary | Table   | Struct    |
| Bytes      | Buffer  | String    |

Indexed types are linear lists of elements that can be accessed in constant time with an integer index.
Indexed types are backed by a single chunk of memory for fast access, and are indexed from 0 as in C.
Dictionary types associate keys with values. The difference between dictionaries and indexed types
is that dictionaries are not limited to integer keys. They are backed by a hashtable and also offer
constant time lookup (and insertion for the mutable case).
Finally, the 'bytes' abstraction is any type that contains a sequence of bytes. A 'bytes' value or byteseq associates
integer keys (the indices) with integer values between 0 and 255 (the byte values). In this way,
they behave much like Arrays and Tuples. However, one cannot put non-integer values into a byteseq.

```lisp
(def mytuple (tuple 1 2 3))

(def myarray @(1 2 3))
(def myarray (array 1 2 3))

(def mystruct {
    :key "value"
    :key2 "another"
    1 2
    4 3})

(def another-struct
  (struct :a 1 :b 2))

(def my-table @{
    :a :b
    :c :d
    :A :qwerty})
(def another-table
  (table 1 2 3 4))

(def my-buffer @"thisismutable")
(def my-buffer2 @``
This is also mutable ":)"
``)
```

To read the values in a data structure, use the `get` function. The first parameter is the data structure
itself, and the second parameter is the key.

```lisp
(get @{:a 1} :a) # -> 1
(get {:a 1} :a) # -> 1
(get @[:a :b :c] 2) # -> :c
(get (tuple "a" "b" "c") 1) # -> "b"
(get @"hello, world" 1) # -> 101
(get "hello, world" 0) # -> 104
```
|
||||
|
||||
### Destructuring

In many cases, however, you do not need the `get` function at all. Janet supports destructuring, which
means both the `def` and `var` special forms can extract values from inside structures themselves.

```lisp
# Before, we might do
(def my-array @[:mary :had :a :little :lamb])
(def lamb (get my-array 4))
(print lamb) # Prints :lamb

# Now, with destructuring,
(def [_ _ _ _ lamb] my-array)
(print lamb) # Again, prints :lamb

# Destructuring works with tables as well
(def person @{:name "Bob Dylan" :age 77})
(def
  {:name person-name
   :age person-age} person)
```

To update a mutable data structure, use the `put` function. It takes 3 arguments, the data structure,
the key, and the value, and returns the data structure. The allowed types of keys and values
depend on what data structure is passed in.

```lisp
(put @[] 100 :a)
(put @{} :key "value")
(put @"" 100 92)
```

Note that for Arrays and Buffers, putting an index that is outside the length of the data structure
will extend the data structure and fill it with nils in the case of the Array,
or 0s in the case of the Buffer.

The last generic function for all data structures is the `length` function. This returns the number of
values in a data structure (the number of keys in a dictionary type).
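
The behaviour of `put` and `length` described above can be sketched as follows. The names `arr` and `tab` are only illustrative, and the expected values assume arrays grow with nils as stated.

```janet
# Putting past the end of an array extends it with nils.
(def arr @[1 2 3])
(put arr 5 :x)
(print (length arr))        # prints 6
(print (nil? (get arr 4)))  # prints true

# length counts keys for dictionary types and bytes for strings.
(def tab @{})
(put tab :key "value")
(print (length tab))            # prints 1
(print (length {:a 1 :b 2}))    # prints 2
(print (length "hello"))        # prints 5
```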
|
||||
|
||||
# Flow Control

Janet has only two built-in primitives to change flow while inside a function. The first is the
`if` special form, which behaves as expected in most functional languages. It takes two or three parameters:
a condition, an expression to evaluate to if the condition is true (not nil or false),
and an optional expression to evaluate to when the condition is nil or false. If the optional parameter
is omitted and the condition is nil or false, the if form evaluates to nil.

```lisp
(if (> 4 3)
  "4 is greater than 3"
  "4 is not greater than 3") # Evaluates to the first statement

(if true
  (print "Hey")) # Will print

(if false
  (print "Oy!")) # Will not print
```

The second primitive control flow construct is the while loop. The while loop behaves much the same
as in many other programming languages, including C, Java, and Python. The while loop takes
two or more parameters: the first is a condition (like in the `if` statement) that is checked before
every iteration of the loop. If it is nil or false, the while loop ends and evaluates to nil. Otherwise,
the rest of the parameters will be evaluated sequentially and then the program will return to the beginning
of the loop.

```lisp
# Loop from 100 down to 1 and print each time
(var i 100)
(while (pos? i)
  (print "the number is " i)
  (-- i))

# Print ... until a random number in range [0, 1) is >= 0.9
# (math/random evaluates to a value between 0 and 1)
(while (> 0.9 (math/random))
  (print "..."))
```

Besides these special forms, Janet has many macros for both conditional testing and looping
that are much better for the majority of cases. For conditional testing, the `cond`, `switch`, and
`when` macros can be used to great effect. `cond` can be used for making an if-else chain, where using
just raw if forms would result in many parentheses. For looping, the `loop`, `seq`, and `generate` macros
implement Janet's form of list comprehension, as in Python or Clojure.
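
A brief sketch of some of these macros, assuming a Janet version in which `cond`, `seq`, and `loop` accept the forms shown (in particular the `:range` binding). The names `describe`, `squares`, and `total` are made up for the example.

```janet
# cond replaces a chain of nested ifs; the last lone form is the default.
(defn describe [n]
  (cond
    (< n 0) "negative"
    (= n 0) "zero"
    "positive"))

(print (describe -3)) # prints negative
(print (describe 7))  # prints positive

# seq collects the body's value for each iteration into an array.
(def squares (seq [i :range [0 5]] (* i i)))
(print (length squares)) # prints 5

# loop runs the body for side effects only.
(var total 0)
(loop [i :range [0 10]]
  (+= total i))
(print total) # prints 45
```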
|
||||
|
||||
# The Core Library

Janet has a built-in core library of over 300 functions and macros at the time of writing.
While some of these functions may be refactored into separate modules, it is useful to get to know
the core to avoid rewriting provided functions.

For any given function, use the `doc` macro to view the documentation for it in the REPL.

```lisp
(doc defn) # -> Prints the documentation for "defn"
```

To see a list of all global functions in the REPL, type the command

```lisp
(table/getproto *env*)
# Or
(all-symbols)
```

which will print out every built-in global binding
(it will not show your global bindings). To print all
of your global bindings, just use `*env*`, which is a var
that is bound to the current environment.

The convention of surrounding a symbol in stars is taken from Lisp
and Clojure, and indicates a global dynamic variable rather than a normal
definition. To get the static environment at the time of compilation, use the
`_env` symbol.
|
||||
|
||||
# Prototypes

To support basic generic programming, Janet tables support a prototype
table. A prototype table contains default values for a table if certain keys
are not found in the original table. This allows many similar tables to share
contents without duplicating memory.

```lisp
# One of many Object Oriented schemes that can
# be implemented in Janet.
(def proto1 @{:type :custom1
              :behave (fn [self x] (print "behaving " x))})
(def proto2 @{:type :custom2
              :behave (fn [self x] (print "behaving 2 " x))})

(def thing1 (table/setproto @{} proto1))
(def thing2 (table/setproto @{} proto2))

(print thing1:type) # prints :custom1
(print thing2:type) # prints :custom2

(thing1:behave thing1 :a) # prints "behaving :a"
(thing2:behave thing2 :b) # prints "behaving 2 :b"
```

Looking up in a table with a prototype can be summed up with the following algorithm.

1. `(get my-table my-key)` is called.
2. my-table is checked for the key my-key. If there is a value for the key, it is returned.
3. If there is a prototype table for my-table, set `my-table = my-table's prototype` and go to 2.
4. Return nil as the key was not found.
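
The steps above can be sketched in Janet itself. This is not how the runtime implements lookup; it is a hypothetical helper named `proto-get`, and it assumes a Janet version that provides `table/rawget` (a lookup that ignores prototypes) alongside `table/getproto`. It also omits the recursion limit.

```janet
(defn proto-get
  "Look up key in tab, then in each prototype in turn."
  [tab key]
  (var t tab)
  (var result nil)
  # Steps 2 and 3: check the current table, then move to its prototype.
  (while (and t (nil? result))
    (set result (table/rawget t key))
    (set t (table/getproto t)))
  # Step 4: result is still nil if no table in the chain had the key.
  result)

(def base @{:kind :animal :legs 4})
(def dog (table/setproto @{:name "Rex"} base))

(print (proto-get dog :name))           # prints Rex
(print (proto-get dog :legs))           # prints 4
(print (nil? (proto-get dog :missing))) # prints true
```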
|
||||
|
||||
Janet will check up to about 1000 prototypes recursively by default before giving up and returning nil. This
is to prevent an infinite loop. This value can be changed by adjusting the `JANET_RECURSION_GUARD` value
in janet.h.

Note that Janet prototypes are not as expressive as metatables in Lua and many other languages.
This is by design, as adding Lua- or Python-like capabilities would not be technically difficult.
Users should prefer plain data and functions that operate on them rather than mutable objects
with methods.
|
||||
|
||||
# Fibers

Janet has support for single-core asynchronous programming via coroutines, or fibers.
Fibers allow a process to stop and resume execution later, essentially enabling
multiple returns from a function. This allows many patterns such as schedulers, generators,
iterators, live debugging, and robust error handling. Janet's error handling is actually built on
top of fibers (when an error is thrown, the parent fiber will handle the error).

A temporary return from a fiber is called a yield, and can be invoked with the `yield` function.
To resume a fiber that has been yielded, use the `resume` function. When resume is called on a fiber,
it will only return when that fiber either returns, yields, throws an error, or otherwise emits
a signal.

Unlike traditional coroutines, Janet's fibers implement a signaling mechanism, which
is used to differentiate different kinds of returns. When a fiber yields or throws an error,
control is returned to the calling fiber. The parent fiber must then check what kind of state the
fiber is in to differentiate errors from return values from user defined signals.

To create a fiber, use the `fiber/new` function. The fiber constructor takes one or two arguments.
The first, required argument is the function that the fiber will execute. This function must accept
zero arguments. The next, optional argument is a collection of flags specifying what kinds of
signals to trap and return via `resume`. This is useful so
the programmer does not need to handle all different kinds of signals from a fiber. Any un-trapped signals
are simply propagated to the next fiber.

```lisp
(def f (fiber/new (fn []
  (yield 1)
  (yield 2)
  (yield 3)
  (yield 4)
  5)))

# Get the status of the fiber (:alive, :dead, :debug, :new, :pending, or :user0-:user9)
(print (fiber/status f)) # -> :new

(print (resume f)) # -> prints 1
(print (resume f)) # -> prints 2
(print (resume f)) # -> prints 3
(print (resume f)) # -> prints 4
(print (fiber/status f)) # -> prints :pending
(print (resume f)) # -> prints 5
(print (fiber/status f)) # -> prints :dead
(print (resume f)) # -> throws an error because the fiber is dead
```
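
Rather than calling `resume` a fixed number of times, a fiber can be drained in a loop that watches its status. The sketch below is one possible way to do this, not the only one; `letters` and `seen` are hypothetical names, and `array/push` is assumed to be the core function for appending to an array.

```janet
(def letters (fiber/new (fn []
  (yield :a)
  (yield :b)
  :done)))

# Collect every yielded value, plus the final return value.
(def seen @[])
(while (not= (fiber/status letters) :dead)
  (array/push seen (resume letters)))

(print (length seen)) # prints 3
(print (get seen 2))  # prints done
```

Checking the status before each `resume` avoids the error raised when resuming a dead fiber.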
|
||||
|
||||
## Using Fibers to Capture Errors

Besides being used as coroutines, fibers can be used to implement error handling (exceptions).

```lisp
# The fiber function must take zero arguments.
(defn my-function-that-errors []
  (print "start function")
  (error "oops!")
  (print "never gets here"))

# Use the :e flag to only trap errors.
(def f (fiber/new my-function-that-errors :e))
(def result (resume f))
(if (= (fiber/status f) :error)
  (print "result contains the error")
  (print "result contains the good result"))
```
|
||||
|
||||
# Macros

Janet supports macros like most Lisps. A macro is like a function, but transforms
the code itself rather than data. They let you extend the syntax of the language itself.

You have seen some macros already. The `let`, `loop`, and `defn` forms are macros. When the compiler
sees a macro, it evaluates the macro and then compiles the result. We say the macro has been
*expanded* after the compiler evaluates it. A simple version of the `defn` macro can
be thought of as transforming code of the form

```lisp
(defn1 myfun [x] body)
```

into

```lisp
(def myfun (fn myfun [x] body))
```

We could write such a macro like so:

```lisp
(defmacro defn1 [name args body]
  (tuple 'def name (tuple 'fn name args body)))
```

There are a couple of issues with this macro, but it will work for simple functions
quite well.

The first issue is that our `defn1` macro can't define functions with multiple expressions
in the body. We can make the macro variadic, just like a function. Here is a second version
of this macro.

```lisp
(defmacro defn2 [name args & body]
  (tuple 'def name (apply tuple 'fn name args body)))
```

Great! Now we can define functions with multiple elements in the body. We can still improve this
macro even more though. First, we can add a docstring to it. If someone is using the macro later,
they can use `(doc defn3)` to get a description of it. Next, we can rewrite the macro
using Janet's built-in quasiquoting facilities.

```lisp
(defmacro defn3
  "Defines a new function."
  [name args & body]
  ~(def ,name (fn ,name ,args ,;body)))
```

This is functionally identical to our previous version `defn2`, but written in such
a way that the macro output is clearer. The leading tilde `~` is shorthand for the
`(quasiquote x)` special form, which is like `(quote x)` except we can unquote
expressions inside it. The comma in front of `name` and `args` is an unquote, which
allows us to put a value in the quasiquote. Without the unquote, the symbol 'name'
would be put in the returned tuple, and every function we defined
would be called 'name'!

Similar to name, we must also unquote body. However, a normal unquote doesn't work.
See what happens if we use a normal unquote for body as well.

```lisp
(def name 'myfunction)
(def args '[x y z])
(def body '[(print x) (print y) (print z)])

~(def ,name (fn ,name ,args ,body))
# -> (def myfunction (fn myfunction (x y z) ((print x) (print y) (print z))))
```

There is an extra set of parentheses around the body of our function! We don't
want to put the body *inside* the form `(fn args ...)`, we want to *splice* it
into the form. Luckily, Janet has the `(splice x)` special form for this purpose,
and a shorthand for it, the `;` character.
When combined with the unquote special form, we get the desired output.

```lisp
~(def ,name (fn ,name ,args ,;body))
# -> (def myfunction (fn myfunction (x y z) (print x) (print y) (print z)))
```
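
To see the quasiquoted macro at work, the sketch below defines it again and uses it to create a function with a two-expression body. The function name `add-and-log` is hypothetical and chosen only for this example.

```janet
(defmacro defn3
  "Defines a new function."
  [name args & body]
  ~(def ,name (fn ,name ,args ,;body)))

# Both body forms are spliced into the generated fn.
(defn3 add-and-log [a b]
  (print "adding " a " and " b)
  (+ a b))

(print (add-and-log 2 3)) # prints "adding 2 and 3", then 5
```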
|
||||
|
||||
## Hygiene
|
||||
|
||||
Sometime when we write macros, we must generate symbols for local bindings. Ignoring that
|
||||
it could be written as a function, consider
|
||||
the following macro
|
||||
|
||||
```lisp
|
||||
(defmacro max1
|
||||
"Get the max of two values."
|
||||
[x y]
|
||||
~(if (> ,x ,y) ,x ,y))
|
||||
```
|
||||
|
||||
This almost works, but will evaluate both x and y twice. This is because both show up
|
||||
in the macro twice. For example, `(max1 (do (print 1) 1) (do (print 2) 2))` will
|
||||
print both 1 and 2 twice, which is surprising to a user of this macro.
|
||||
|
||||
We can do better:
|
||||
|
||||
```lisp
|
||||
(defmacro max2
|
||||
"Get the max of two values."
|
||||
[x y]
|
||||
~(let [x ,x
|
||||
y ,y]
|
||||
(if (> x y) x y)))
|
||||
```
|
||||
|
||||
Now we have no double evaluation problem! But we now have an even more subtle problem.
|
||||
What happens in the following code?
|
||||
|
||||
```lisp
|
||||
(def x 10)
|
||||
(max2 8 (+ x 4))
|
||||
```
|
||||
|
||||
We want the max to be 14, but this will actually evaluate to 12! This can be understood
|
||||
if we expand the macro. You can expand macro once in janet using the `(macex1 x)` function.
|
||||
(To expand macros until there are no macros left to expand, use `(macex x)`. Be careful,
|
||||
janet has many macros, so the full expansion may be almost unreadable).
|
||||
|
||||
```lisp
|
||||
(macex1 '(max2 8 (+ x 4)))
|
||||
# -> (let (x 8 y (+ x 4)) (if (> x y) x y))
|
||||
```
|
||||
|
||||
After expansion, y wrongly refers to the x inside the macro (which is bound to 8) rather than the x defined
|
||||
to be 10. The problem is the reuse of the symbol x inside the macro, which overshadowed the original
|
||||
binding.
|
||||
|
||||
Janet provides a general solution to this problem in terms of the `(gensym)` function, which returns
|
||||
a symbol which is guaranteed to be unique and not collide with any symbols defined previously. We can define
|
||||
our macro once more for a fully correct macro.
|
||||
|
||||
```lisp
|
||||
(defmacro max3
|
||||
"Get the max of two values."
|
||||
[x y]
|
||||
(def $x (gensym))
|
||||
(def $y (gensym))
|
||||
~(let [,$x ,x
|
||||
,$y ,y]
|
||||
(if (> ,$x ,$y) ,$x ,$y)))
|
||||
```
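
As a quick check of both problems discussed above, the sketch below repeats the `max3` definition so that it is self-contained, then reruns the shadowing case and counts how often the argument forms are evaluated. The counter name `evaluations` is an assumption made for illustration.

```janet
(defmacro max3
  "Get the max of two values."
  [x y]
  (def $x (gensym))
  (def $y (gensym))
  ~(let [,$x ,x
         ,$y ,y]
     (if (> ,$x ,$y) ,$x ,$y)))

(def x 10)
(print (max3 8 (+ x 4))) # prints 14, the outer x is no longer shadowed

# Count how many times the argument forms are evaluated.
(var evaluations 0)
(max3 (do (++ evaluations) 1) (do (++ evaluations) 2))
(print evaluations) # prints 2, each argument is evaluated exactly once
```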
|
||||
|
||||
As you can see, macros are very powerful but also are prone to subtle bugs. You must remember that
|
||||
at their core, macros are just functions that output code, and the code that they return must
|
||||
work in many contexts!
|
174
doc/Loop.md
@ -1,174 +0,0 @@
|
||||
# Loops in Janet
|
||||
|
||||
A very common and essential operation in all programming is looping. Most
|
||||
languages support looping of some kind, either with explicit loops or recursion.
|
||||
Janet supports both recursion and a primitive `while` loop. While recursion is
|
||||
useful in many cases, sometimes it is more convenient to use an explicit loop to
|
||||
iterate over a collection like an array.
|
||||
|
||||
## An Example - Iterating a Range
|
||||
|
||||
Suppose you want to calculate the sum of the first 10 natural numbers
|
||||
0 through 9. There are many ways to carry out this explicit calculation
|
||||
even when taking shortcuts. A succinct way in Janet is
|
||||
|
||||
```
|
||||
(+ ;(range 10))
|
||||
```
|
||||
|
||||
We will limit ourselves however to using explicit looping and no functions
|
||||
like `(range n)` which generate a list of natural numbers for us.
|
||||
|
||||
For our first version, we will use only the `while` special form to iterate, similar
|
||||
to how one might sum natural numbers in a language such as C.
|
||||
|
||||
```
|
||||
(var sum 0)
|
||||
(var i 0)
|
||||
(while (< i 10)
|
||||
(+= sum i)
|
||||
(++ i))
|
||||
(print sum) # prints 45
|
||||
```
|
||||
This is a very imperative style program which can grow very large very quickly.
|
||||
We are manually updating a counter `i` in a loop. Using the macros `+=` and `++`, this
|
||||
style code is similar in density to C code.
|
||||
It is recommended to use either macros (such as the loop macro) or a functional
|
||||
style in janet.
|
||||
|
||||
Since this is such a common pattern, Janet has a macro for this exact purpose. The
|
||||
`(for x start end body)` macro captures exactly this behavior of incrementing a counter
|
||||
in a loop.
|
||||
|
||||
```
|
||||
(var sum 0)
|
||||
(for i 0 10 (+= sum i))
|
||||
(print sum) # prints 45
|
||||
```
|
||||
|
||||
We have completely wrapped the imperative counter in a macro. The for macro, while not
|
||||
very flexible, is very terse and covers a common case of iteration, iterating over an integer range. The for macro will be expanded to something very similar to our original
|
||||
version with a while loop.
|
||||
|
||||
We can do something similar with the more flexible `loop` macro.
|
||||
|
||||
```
|
||||
(var sum 0)
|
||||
(loop [i :range [0 10]] (+= sum i))
|
||||
(print sum) # prints 45
|
||||
```
|
||||
|
||||
This is slightly more verbose than the for macro, but can be more easily extended.
|
||||
Let's say that we wanted to only count even numbers towards the sum. We can do this
|
||||
easily with the loop macro.
|
||||
|
||||
```
|
||||
(var sum 0)
|
||||
(loop [i :range [0 10] :when (even? i)] (+= sum i))
|
||||
(print sum) # prints 20
|
||||
```
|
||||
|
||||
The loop macro has several verbs (:range) and modifiers (:when) that let
|
||||
the programmer more easily generate common looping idioms. The loop macro
|
||||
is similar to the Common Lisp loop macro, but smaller in scope and with a much
|
||||
simpler syntax. As with the `for` macro, the loop macro expands to similar
|
||||
code as our original while expression.
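
As a rough sketch of how verbs and modifiers compose, the example below combines two `:range` bindings with a `:when` modifier. It assumes that successive bindings in `loop` nest, with the rightmost binding forming the innermost loop; the array name `found` is illustrative.

```janet
# Collect every pair [i j] with 0 <= i < j < 4.
(def found @[])
(loop [i :range [0 4]
       j :range [0 4]
       :when (< i j)]
  (array/push found [i j]))
(print (length found)) # prints 6
```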
|
||||
|
||||
## Another Example - Iterating an Indexed Data Structure
|
||||
|
||||
Another common usage for iteration in any language is iterating over the items in
|
||||
some data structure, like items in an array, characters in a string, or key value
|
||||
pairs in a table.
|
||||
|
||||
Say we have an array of names that we want to print out. We will
|
||||
again start with a simple while loop which we will refine into
|
||||
more idiomatic expressions.
|
||||
|
||||
First, we will define our array of names
|
||||
```
|
||||
(def names @["Jean-Paul Sartre" "Bob Dylan" "Augusta Ada King" "Frida Kahlo" "Harriet Tubman"])
|
||||
```
|
||||
|
||||
With our array of names, we can use a while loop to iterate through the indices of names, get the
|
||||
values, and then print them.
|
||||
|
||||
```
|
||||
(var i 0)
|
||||
(def len (length names))
|
||||
(while (< i len)
|
||||
(print (get names i))
|
||||
(++ i))
|
||||
```
|
||||
|
||||
This is rather verbose. Janet provides the `each` macro for iterating through the items in a tuple or
|
||||
array, or the bytes in a buffer, symbol, or string.
|
||||
|
||||
```
|
||||
(each name names (print name))
|
||||
```
|
||||
|
||||
We can also use the `loop` macro for this case, using the `:in` verb.
|
||||
|
||||
```
|
||||
(loop [name :in names] (print name))
|
||||
```
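
The `each` macro also walks the bytes of a string, as mentioned earlier. A minimal sketch that sums the byte values of a short ASCII string (the variable name `total` is illustrative):

```janet
(var total 0)
# Each element of a string is an integer byte value.
(each byte "abc" (+= total byte))
(print total) # prints 294, which is 97 + 98 + 99
```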
|
||||
|
||||
## Iterating a Dictionary
|
||||
|
||||
In the previous example, we iterated over the values in an array. Another common
|
||||
use of looping in a Janet program is iterating over the keys or values in a table.
|
||||
We cannot use the same method as iterating over an array because a table or struct does
|
||||
not contain a known integer range of keys. Instead we rely on a function `next`, which allows
|
||||
us to visit each of the keys in a struct or table. Note that iterating over a table will not
|
||||
visit the prototype table.
|
||||
|
||||
As an example, let's iterate over a table that maps each letter to a word that starts with that letter. We
|
||||
will print out the words to our simple children's book.
|
||||
|
||||
```
|
||||
(def alphabook
|
||||
@{"A" "Apple"
|
||||
"B" "Banana"
|
||||
"C" "Cat"
|
||||
"D" "Dog"
|
||||
"E" "Elephant" })
|
||||
```
|
||||
|
||||
As before, we can evaluate this loop using only a while loop and the `next` function.
|
||||
|
||||
```
|
||||
(var key (next alphabook nil))
|
||||
(while (not= nil key)
|
||||
(print key " is for " (get alphabook key))
|
||||
(set key (next alphabook key)))
|
||||
```
|
||||
|
||||
However, we can do better than this with the loop macro using the `:pairs` or `:keys` verbs.
|
||||
|
||||
```
|
||||
(loop [[letter word] :pairs alphabook]
|
||||
(print letter " is for " word))
|
||||
```
|
||||
|
||||
Using the `:keys` verb and the dot syntax for indexing:
|
||||
|
||||
```
|
||||
(loop [letter :keys alphabook]
|
||||
(print letter " is for " alphabook.letter))
|
||||
```
|
||||
|
||||
The symbol `alphabook.letter` is shorthand for `(get alphabook letter)`.
|
||||
Note that the dot syntax of `alphabook.letter` is different than in many languages. In C or
|
||||
ALGOL like languages, it is more akin to the indexing operator, and would be written `alphabook[letter]`.
|
||||
The `.` character is part of the symbol and is recognized by the compiler.
|
||||
|
||||
We can also use the core library functions `keys` and `pairs` to get arrays of the keys and
|
||||
pairs respectively of the alphabook.
|
||||
|
||||
```
|
||||
(loop [[letter word] :in (pairs alphabook)]
|
||||
(print letter " is for " word))
|
||||
|
||||
(loop [letter :in (keys alphabook)]
|
||||
(print letter " is for " alphabook.letter))
|
||||
```
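
The order in which a table's keys are visited should not be relied upon, so when a stable order matters one option is to sort the keys first. This is a small sketch that assumes the core `sort` function, which sorts an array in place and returns it. A shortened `alphabook` is redefined here only to keep the example self-contained, and `letters` is an illustrative name.

```janet
(def alphabook @{"A" "Apple" "B" "Banana" "C" "Cat"})

# keys returns a fresh array, so sorting it does not touch the table.
(def letters (sort (keys alphabook)))
(each letter letters
  (print letter " is for " (get alphabook letter)))
```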
|
244
doc/Parser.md
@ -1,244 +0,0 @@
|
||||
# The Parser
|
||||
|
||||
A Janet program begins life as a text file, just a sequence of bytes like
|
||||
any other on your system. Janet source files should be UTF-8 or ASCII
|
||||
encoded. Before Janet can compile or run your program, it must transform
|
||||
your source code into a data structure. Janet is a lisp, which means it is
|
||||
homoiconic - code is data, so all of the facilities in the language for
|
||||
manipulating arrays, tuples, strings, and tables can be used for manipulating
|
||||
your source code as well.
|
||||
|
||||
But before janet code is represented as a data structure, it must be read, or parsed,
|
||||
by the janet parser. Called the reader in many other lisps, the parser is a machine
|
||||
that takes in plain text and outputs data structures which can be used by both
|
||||
the compiler and macros. In janet, it is a parser rather than a reader because
|
||||
there is no code execution at read time. This is safer and simpler, and also
|
||||
lets janet syntax serve as a robust data interchange format. While a parser
|
||||
is not extensible, in janet the philosophy is to extend the language via macros
|
||||
rather than reader macros.
|
||||
|
||||
## Nil, True and False
|
||||
|
||||
Nil, true and false are all literals that can be entered as such
|
||||
in the parser.
|
||||
|
||||
```
|
||||
nil
|
||||
true
|
||||
false
|
||||
```
|
||||
|
||||
## Symbols
|
||||
|
||||
Janet symbols are represented as a sequence of alphanumeric characters
|
||||
not starting with a digit or a colon. They can also contain the characters
|
||||
\!, @, $, \%, \^, \&, \*, -, \_, +, =, \|, \~, :, \<, \>, ., \?, \\, /, as
|
||||
well as any Unicode codepoint not in the ASCII range.
|
||||
|
||||
By convention, most symbols should be all lower case and use dashes to connect words
|
||||
(sometimes called kebab case).
|
||||
|
||||
Symbols that come from another module often contain a forward slash that separates
|
||||
the name of the module from the name of the definition in the module.
|
||||
|
||||
```
|
||||
symbol
|
||||
kebab-case-symbol
|
||||
snake_case_symbol
|
||||
my-module/my-function
|
||||
*****
|
||||
!%$^*__--__._+++===~-crazy-symbol
|
||||
*global-var*
|
||||
你好
|
||||
```
|
||||
|
||||
## Keywords
|
||||
|
||||
Janet keywords are like symbols that begin with the character :. However, they
|
||||
are used differently and treated by the compiler as a constant rather than a name for
|
||||
something. Keywords are used mostly for keys in tables and structs, or pieces of syntax
|
||||
in macros.
|
||||
|
||||
```
|
||||
:keyword
|
||||
:range
|
||||
:0x0x0x0
|
||||
:a-keyword
|
||||
::
|
||||
:
|
||||
```
|
||||
|
||||
## Numbers
|
||||
|
||||
Janet numbers are represented by IEEE-754 floating point numbers.
|
||||
The syntax is similar to that of many other languages
|
||||
as well. Numbers can be written in base 10, with
|
||||
underscores used to separate digits into groups. A decimal point can be used for floating
|
||||
point numbers. Numbers can also be written in other bases by prefixing the number with the desired
|
||||
base and the character 'r'. For example, 16 can be written as `16`, `1_6`, `16r10`, `4r100`, or `0x10`. The
|
||||
`0x` prefix can be used for hexadecimal as it is so common. The radix must itself be written in base 10, and
|
||||
can be any integer from 2 to 36. For any radix above 10, use the letters as digits (not case sensitive).
|
||||
|
||||
```
|
||||
0
|
||||
12
|
||||
-65912
|
||||
4.98
|
||||
1.3e18
|
||||
1.3E18
|
||||
18r123C
|
||||
11raaa&a
|
||||
1_000_000
|
||||
0xbeef
|
||||
```
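
The equivalences described above can be checked directly, since all of these literals parse to the same number. A minimal sketch:

```janet
# All five spellings of sixteen from the text compare equal.
(print (= 16 1_6 16r10 4r100 0x10)) # prints true
# Underscores only group digits and do not change the value.
(print (= 1000000 1_000_000))       # prints true
(print 0xbeef)                      # prints 48879
```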
|
||||
|
||||
## Strings
|
||||
|
||||
Strings in janet are surrounded by double quotes. Strings are 8bit clean, meaning
|
||||
they can contain any arbitrary sequence of bytes, including embedded
|
||||
0s. To insert a double quote into a string itself, escape
|
||||
the double quote with a backslash. For unprintable characters, you can either use
|
||||
one of a few common escapes or use the `\xHH` escape to escape a single byte in
|
||||
hexadecimal. The supported escapes are:
|
||||
|
||||
- \\xHH Escape a single arbitrary byte in hexadecimal.
|
||||
- \\n Newline (ASCII 10)
|
||||
- \\t Tab character (ASCII 9)
|
||||
- \\r Carriage Return (ASCII 13)
|
||||
- \\0 Null (ASCII 0)
|
||||
- \\z Null (ASCII 0)
|
||||
- \\f Form Feed (ASCII 12)
|
||||
- \\e Escape (ASCII 27)
|
||||
- \\" Double Quote (ASCII 34)
|
||||
- \\\\ Backslash (ASCII 92)
|
||||
|
||||
Strings can also contain literal newline characters that will be ignored.
|
||||
This lets one define a multiline string that does not contain newline characters.
|
||||
|
||||
An alternative way of representing strings in janet is the long string, or the backquote
|
||||
delimited string. A string can also be defined to start with a certain number of
|
||||
backquotes, and will end with the same number of backquotes. Long strings
|
||||
do not contain escape sequences; all bytes will be parsed literally until
|
||||
the ending delimiter is found. This is useful
|
||||
for defining multi-line strings with literal newline characters, unprintable
|
||||
characters, or strings that would otherwise require many escape sequences.
|
||||
|
||||
```
|
||||
"This is a string."
|
||||
"This\nis\na\nstring."
|
||||
"This
|
||||
is
|
||||
a
|
||||
string."
|
||||
``
|
||||
This
|
||||
is
|
||||
a
|
||||
string
|
||||
``
|
||||
```
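
To illustrate the difference between escaped strings and long strings, the sketch below compares byte lengths. It only uses `length` and `=` from the core library.

```janet
(print (length "a\tb")) # prints 3: the escape is a single tab byte
(print (length `a\tb`)) # prints 4: the backslash and the t are kept literally
(print (= "\x41" "A"))  # prints true: 0x41 is ASCII 'A'
```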
|
||||
|
||||
## Buffers
|
||||
|
||||
Buffers are similar to strings, except they are mutable data structures. Strings in Janet
|
||||
cannot be mutated after they are created, whereas a buffer can be changed after creation.
|
||||
The syntax for a buffer is the same as that for a string or long string, but
|
||||
the buffer must be prefixed with the '@' character.
|
||||
|
||||
```
|
||||
@""
|
||||
@"Buffer."
|
||||
@``Another buffer``
|
||||
```
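
A short sketch of what mutability means in practice, assuming the core function `buffer/push-string` is available in your version of Janet; the name `buf` is illustrative.

```janet
(def buf @"Hello")
# Appends bytes to the same buffer object rather than creating a new one.
(buffer/push-string buf ", world!")
(print buf)          # prints Hello, world!
(print (length buf)) # prints 13
```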
|
||||
|
||||
## Tuples
|
||||
|
||||
Tuples are a sequence of white space separated values surrounded by either parentheses
|
||||
or brackets. The parser considers any of the characters ASCII 32, \\0, \\f, \\n, \\r or \\t
|
||||
to be white-space.
|
||||
|
||||
```
|
||||
(do 1 2 3)
|
||||
[do 1 2 3]
|
||||
```
|
||||
|
||||
## Arrays
|
||||
|
||||
Arrays are the same as tuples, but have a leading @ to indicate mutability.
|
||||
|
||||
```
|
||||
@(:one :two :three)
|
||||
@[:one :two :three]
|
||||
```
|
||||
|
||||
## Structs
|
||||
|
||||
Structs are represented by a sequence of white-space delimited key value pairs
|
||||
surrounded by curly braces. The sequence is defined as key1, value1, key2, value2, etc.
|
||||
There must be an even number of items between curly braces or the parser will
|
||||
signal a parse error. Any value can be a key or value. Using nil as a key or
|
||||
value, however, will drop that pair from the parsed struct.
|
||||
|
||||
```
|
||||
{}
|
||||
{:key1 "value1" :key2 :value2 :key3 3}
|
||||
{(1 2 3) (4 5 6)}
|
||||
{@[] @[]}
|
||||
{1 2 3 4 5 6}
|
||||
```
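
The note about nil can be observed by checking the length of a struct literal that contains a nil value. A minimal sketch, with `s` as an illustrative name:

```janet
# The :b pair has a nil value, so it is dropped from the struct.
(def s {:a 1 :b nil})
(print (length s)) # prints 1
```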
|
||||
## Tables
|
||||
|
||||
Tables have the same syntax as structs, except they have the @ prefix to indicate
|
||||
that they are mutable.
|
||||
|
||||
```
|
||||
@{}
|
||||
@{:key1 "value1" :key2 :value2 :key3 3}
|
||||
@{(1 2 3) (4 5 6)}
|
||||
@{@[] @[]}
|
||||
@{1 2 3 4 5 6}
|
||||
```
|
||||
|
||||
## Comments
|
||||
|
||||
Comments begin with a \# character and continue until the end of the line.
|
||||
There are no multi-line comments.
|
||||
|
||||
## Shorthand
|
||||
|
||||
Often called reader macros in other lisps, Janet provides several shorthand
|
||||
notations for some forms.
|
||||
|
||||
### 'x
|
||||
|
||||
Shorthand for `(quote x)`
|
||||
|
||||
### ;x
|
||||
|
||||
Shorthand for `(splice x)`
|
||||
|
||||
### ~x
|
||||
|
||||
Shorthand for `(quasiquote x)`
|
||||
|
||||
### ,x
|
||||
|
||||
Shorthand for `(unquote x)`
|
||||
|
||||
These shorthand notations can be combined in any order, allowing
|
||||
forms like `''x` (`(quote (quote x))`), or `,;x` (`(unquote (splice x))`).
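
Because the shorthand is expanded by the parser, the result is an ordinary tuple that can be inspected like any other data. A small sketch, with `form` as an illustrative name:

```janet
# ''x evaluates to the unevaluated form (quote x).
(def form ''x)
(print (type form))   # prints tuple
(print (first form))  # prints quote
(print (length form)) # prints 2
```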
|
||||
|
||||
## API
|
||||
|
||||
The parser contains the following functions which expose
|
||||
the parser state machine as a janet abstract object.
|
||||
|
||||
- `parser/byte`
|
||||
- `parser/consume`
|
||||
- `parser/error`
|
||||
- `parser/flush`
|
||||
- `parser/new`
|
||||
- `parser/produce`
|
||||
- `parser/state`
|
||||
- `parser/status`
|
||||
- `parser/where`
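
As a rough sketch of how these functions fit together, the example below feeds a string to a fresh parser and pulls out the parsed form. It assumes that `parser/consume` accepts a string of source text and that `parser/produce` returns the next completed value; the exact behavior may differ between Janet versions, so treat this as illustrative rather than definitive.

```janet
(def p (parser/new))
# Feed source text to the parser state machine.
(parser/consume p "(+ 1 2)")
# Retrieve the parsed value; no code is evaluated at any point.
(def form (parser/produce p))
(print (type form))   # prints tuple
(print (length form)) # prints 3
```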
|
201
doc/Peg.md
@ -1,201 +0,0 @@
|
||||
# Peg (Parsing Expression Grammars)
|
||||
|
||||
A common programming task is recognizing patterns in text, be it
|
||||
filtering emails from a list or extracting data from a CSV file. Programming
|
||||
languages and libraries usually offer a number of tools for this, including prebuilt
|
||||
parsers, simple operations on strings (splitting a string on commas), and regular expressions.
|
||||
The pre-built or custom-built parser is usually the most robust solution, but can
|
||||
be very complex to maintain and may not exist for many languages. String functions are not
|
||||
powerful enough for a large class of languages, and regular expressions can be hard to read
|
||||
(which characters are escaped?) and under-powered (don't parse HTML with regex!).
|
||||
|
||||
PEGs, or Parsing Expression Grammars, are another formalism for recognizing languages that
|
||||
are easier to write than a custom parser and more powerful than regular expressions. They also
|
||||
can produce grammars that are easily understandable and fast. PEGs can also be compiled
|
||||
to a bytecode format that can be reused. Janet offers the `peg` module for writing and
|
||||
evaluating PEGs.
|
||||
|
||||
Janet's `peg` module borrows syntax and ideas from both LPeg and the REBOL/Red parse module. Janet has
|
||||
no built in regex module because PEGs offer a superset of regex's functionality.
|
||||
|
||||
Below is a simple example for checking if a string is a valid IP address. Notice how
|
||||
the grammar is descriptive enough that you can read it even if you don't know the peg
|
||||
syntax (example is translated from a [RED language blog post](https://www.red-lang.org/2013/11/041-introducing-parse.html)).
|
||||
```clojure
|
||||
(def ip-address
|
||||
'{:dig (range "09")
|
||||
:0-4 (range "04")
|
||||
:0-5 (range "05")
|
||||
:byte (choice
|
||||
(sequence "25" :0-5)
|
||||
(sequence "2" :0-4 :dig)
|
||||
(sequence "1" :dig :dig)
|
||||
(between 1 2 :dig))
|
||||
:main (sequence :byte "." :byte "." :byte "." :byte)})
|
||||
|
||||
(peg/match ip-address "0.0.0.0") # -> @[]
|
||||
(peg/match ip-address "elephant") # -> nil
|
||||
(peg/match ip-address "256.0.0.0") # -> nil
|
||||
```
|
||||
|
||||
## The API
|
||||
|
||||
The `peg` module has few functions because the complexity is exposed through the
|
||||
pattern syntax. Note that there is only one match function, `peg/match`. Variations
|
||||
on matching, such as parsing or searching, can be implemented inside patterns.
|
||||
PEGs can also be compiled ahead of time with `peg/compile` if a PEG will be reused
|
||||
many times.
|
||||
|
||||
### `(peg/match peg text [,start=0] & arguments)`
|
||||
|
||||
Match a peg against some text. Returns an array of captured data if the text
|
||||
matches, or nil if there is no match. The caller can provide an optional start
|
||||
index to begin matching the text at, otherwise the PEG starts on the first character
|
||||
of text. A peg can be either a compiled PEG object or peg source.
|
||||
|
||||
### `(peg/compile peg)`
|
||||
|
||||
Compiles a peg source data structure into a new PEG. Throws an error if there are problems
|
||||
with the peg code.
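
A minimal sketch of the two functions together, using a precompiled pattern, the optional start index, and an extra argument. The name `digits` is illustrative; the `capture` and `argument` specials used here are described in the Captures section.

```janet
(def digits (peg/compile '(capture (some (range "09")))))

(peg/match digits "abc123")   # -> nil, the text does not start with a digit
(peg/match digits "abc123" 3) # -> @["123"], matching begins at index 3

# Values passed after the start index are available to (argument n).
(peg/match '(argument 0) "x" 0 :extra) # -> @[:extra]
```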
|
||||
|
||||
## Primitive Patterns
|
||||
|
||||
Larger patterns are built up with primitive patterns, which recognize individual
|
||||
characters, string literals, or a given number of characters. A character in Janet
|
||||
is considered a byte, so PEGs will work on any string of bytes. No special meaning is
|
||||
given to the 0 byte, or the string terminator in many languages.
|
||||
|
||||
| Pattern Signature | What it Matches |
|
||||
| ----------------- | ----------------|
|
||||
| string ("cat") | The literal string. |
|
||||
| integer (3) | Matches a number of characters, and advances that many characters. If negative, matches if not that many characters and does not advance. For example, -1 will match the end of a string |
|
||||
| `(range "az" "AZ")` | Matches characters in a range and advances 1 character. Multiple ranges can be combined together. |
|
||||
| `(set "abcd")` | Match any character in the argument string. Advances 1 character. |
|
||||
|
||||
Primitive patterns are not that useful by themselves, but can be passed to `peg/match` and `peg/compile` like any other pattern.
|
||||
|
||||
```clojure
|
||||
(peg/match "hello" "hello") # -> @[]
|
||||
(peg/match "hello" "hi") # -> nil
|
||||
(peg/match 1 "hi") # -> @[]
|
||||
(peg/match 1 "") # -> nil
|
||||
(peg/match '(range "AZ") "F") # -> @[]
|
||||
(peg/match '(range "AZ") "-") # -> nil
|
||||
(peg/match '(set "AZ") "F") # -> nil
|
||||
(peg/match '(set "ABCDEFGHIJKLMNOPQRSTUVWXYZ") "F") # -> @[]
|
||||
```
|
||||
|
||||
## Combining Patterns
|
||||
|
||||
These primitive patterns can be combined with several combinators to match a wide number of
|
||||
languages. These combinators
|
||||
can be thought of as the looping and branching forms in a traditional language
|
||||
(that is how they are implemented when compiled to bytecode).
|
||||
|
||||
| Pattern Signature | What it matches |
|
||||
| ------- | --------------- |
|
||||
| `(choice a b c ...)` | Tries to match a, then b, and so on. Will succeed on the first successful match, and fails if none of the arguments match the text. |
|
||||
| `(+ a b c ...)` | Alias for `(choice a b c ...)` |
|
||||
| `(sequence a b c)` | Tries to match a, b, c and so on in sequence. If any of these arguments fail to match the text, the whole pattern fails. |
|
||||
| `(* a b c ...)` | Alias for `(sequence a b c ...)` |
|
||||
| `(any x)` | Matches 0 or more repetitions of x. |
|
||||
| `(some x)` | Matches 1 or more repetitions of x. |
|
||||
| `(between min max x)` | Matches between min and max (inclusive) repetitions of x. |
|
||||
| `(at-least n x)` | Matches at least n repetitions of x. |
|
||||
| `(at-most n x)` | Matches at most n repetitions of x. |
|
||||
| `(if cond patt)` | Tries to match patt only if cond matches as well. cond will not produce any captures. |
|
||||
| `(if-not cond patt)` | Tries to match only if cond does not match. cond will not produce any captures. |
|
||||
| `(not patt)` | Matches only if patt does not match. Will not produce captures or advance any characters. |
|
||||
| `(! patt)` | Alias for `(not patt)` |
|
||||
| `(look offset patt)` | Matches only if patt matches at a fixed offset. offset can be any integer. patt will not produce captures and the peg will not advance any characters. |
|
||||
| `(> offset patt)` | Alias for `(look offset patt)` |
|
||||
| `(opt patt)` | Alias for `(between 0 1 patt)` |
|
||||
| `(? patt)` | Alias for `(between 0 1 patt)` |
|
||||
|
||||
PEGs try to match an input text with a pattern in a greedy manner.
|
||||
This means that if a rule fails to match, that rule will fail and not try again. The only
|
||||
backtracking provided in a peg is provided by the `(choice x y z ...)` special, which will
|
||||
try rules in order until one succeeds, and the whole pattern succeeds. If no sub pattern
|
||||
succeeds, then the whole pattern fails. Note that this means that the order of `x y z` in choice
|
||||
DOES matter. If y matches everything that z matches, z will never succeed.
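
The effect of ordering inside `choice` can be seen with two alternatives where one is a prefix of the other. In this sketch `-1` is used to require the end of the input.

```janet
# "a" succeeds first and the choice commits; -1 then fails because "b" remains.
(peg/match '(* (+ "a" "ab") -1) "ab") # -> nil
# Putting the longer alternative first lets the whole pattern succeed.
(peg/match '(* (+ "ab" "a") -1) "ab") # -> @[]
```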
|
||||
|
||||
## Captures
|
||||
|
||||
So far we have only been concerned with "does this text match this language?". This is useful, but
|
||||
it is often more useful to extract data from text if it does match a peg. The `peg` module
|
||||
uses the concept of a capture stack to extract data from text. As the PEG is trying to match
|
||||
a piece of text, some forms may push Janet values onto the capture stack as a side effect. If the
|
||||
text matches the main peg language, `(peg/match)` will return the final capture stack as an array.
|
||||
|
||||
Capture specials will only push captures to the capture stack if their child pattern matches the text.
|
||||
Most capture specials will match the same text as their first argument pattern. Also most specials
|
||||
that produce captures can take an optional argument `tag` that applies a keyword tag to the capture.
|
||||
These tagged captures can then be recaptured via the `(backref tag)` special in subsequent matches.
|
||||
Tagged captures, when combined with the `(cmt)` special, provide a powerful form of look-behind
|
||||
that can make many grammars simpler.
|
||||
|
||||
| Pattern Signature | What it captures |
|
||||
| ------- | ---------------- |
|
||||
| `(capture patt ?tag)` | Captures all of the text in patt if patt matches. If patt contains any captures, then those captures will be pushed to the capture stack before the total text. |
|
||||
| `(<- patt ?tag)` | Alias for `(capture patt ?tag)` |
|
||||
| `(quote patt ?tag)` | Another alias for `(capture patt ?tag)`. This allows code like `'patt` to capture pattern. |
|
||||
| `(group patt ?tag)` | Captures an array of all of the captures in patt. |
|
||||
| `(replace patt subst ?tag)` | Replaces the captures produced by patt by applying subst to them. If subst is a table or struct, will push `(get subst last-capture)` to the capture stack after removing the old captures. If a subst is a function, will call subst with the captures of patt as arguments and push the result to the capture stack. Otherwise, will push subst literally to the capture stack. |
|
||||
| `(/ patt subst ?tag)` | Alias for `(replace patt subst ?tag)` |
|
||||
| `(constant k ?tag)` | Captures a constant value and advances no characters. |
|
||||
| `(argument n ?tag)` | Captures the nth extra argument to the match function and does not advance. |
|
||||
| `(position ?tag)` | Captures the current index into the text and advances no input. |
|
||||
| `($ ?tag)` | Alias for `(position ?tag)`. |
|
||||
| `(accumulate patt ?tag)` | Capture a string that is the concatenation of all captures in patt. This will try to be efficient and not create intermediate strings if possible. |
|
||||
| `(% patt ?tag)` | Alias for `(accumulate patt ?tag)` |
|
||||
| `(cmt patt fun ?tag)` | Invokes fun with all of the captures of patt as arguments (if patt matches). If the result is truthy, then captures the result. The whole expression fails if fun returns false or nil. |
|
||||
| `(backref tag ?tag)` | Duplicates the last capture with the tag `tag`. If no such capture exists then the match fails. |
|
||||
| `(-> tag ?tag)` | Alias for `(backref tag ?tag)`. |
|
||||
| `(error patt)` | Throws a Janet error if patt matches. The error thrown will be the last capture of patt, or a generic error if patt produces no captures. |
|
||||
| `(drop patt)` | Ignores (drops) all captures from patt. |
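
A few of the capture specials from the table, shown on small inputs. This is a sketch using only `capture`, `position`, `constant` and `group`; the other specials follow the same calling pattern.

```janet
# capture pushes the matched text.
(peg/match '(capture (some (range "09"))) "123abc")     # -> @["123"]
# position pushes the current index without advancing.
(peg/match '(* (some (range "az")) (position)) "abc")   # -> @[3]
# constant pushes a fixed value and consumes no input.
(peg/match '(* (constant :found) "x") "x")              # -> @[:found]
# group collects the captures of its sub-pattern into one array.
(peg/match '(group (some (capture (range "az")))) "ab") # -> @[@["a" "b"]]
```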
|
||||
|
||||
## Grammars and Recursion
|
||||
|
||||
The feature that makes PEGs so much more powerful than pattern matching solutions like (vanilla) regex is mutual recursion.
|
||||
To do recursion in a peg, you can wrap multiple patterns in a grammar, which is a Janet struct. The patterns must be named by
|
||||
keywords, which can then be used in all sub-patterns in the grammar.
|
||||
|
||||
Each grammar, defined by a struct, must also have a main rule, called :main, that is the pattern that the entire grammar
|
||||
is defined by.
|
||||
|
||||
An example grammar that uses mutual recursion:
|
||||
|
||||
```clojure
|
||||
(def my-grammar
|
||||
'{:a (* "a" :b "a")
|
||||
:b (* "b" (+ :a 0) "b")
|
||||
:main (* "(" :b ")")})
|
||||
|
||||
(peg/match my-grammar "(bb)") # -> @[]
|
||||
(peg/match my-grammar "(babbab)") # -> @[]
|
||||
(peg/match my-grammar "(baab)") # -> nil
|
||||
(peg/match my-grammar "(babaabab)") # -> nil
|
||||
```
|
||||
|
||||
Keep in mind that recursion is implemented with a stack, meaning that very recursive grammars
|
||||
can overflow the stack. The compiler is able to turn some recursion into iteration via tail call optimization, but some patterns
|
||||
may fail on large inputs. It is also possible to construct (very poorly written) patterns that will result in long loops and be very
|
||||
slow in general.
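
As another sketch of recursion in a grammar, the following pattern recognizes balanced parentheses, something a plain regular expression cannot do. The grammar name `balanced` and the rule name `:parens` are illustrative.

```janet
(def balanced
  '{:parens (* "(" (any :parens) ")")
    :main (* (any :parens) -1)})

(peg/match balanced "(()())") # -> @[]
(peg/match balanced "(()")    # -> nil
(peg/match balanced "")       # -> @[]
```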
|
||||
|
||||
## String Searching and other Idioms
|
||||
|
||||
Although all pattern matching is done in anchored mode, operations like global substitution
|
||||
and searching can be implemented with the `peg` module. A simple Janet function that produces PEGs
|
||||
that search for strings shows how captures and looping specials can be composed, and how quasiquoting
|
||||
can be used to embed values in patterns.
|
||||
|
||||
```clojure
|
||||
(defn finder
|
||||
"Creates a peg that finds all locations of str in the text."
|
||||
[str]
|
||||
(peg/compile ~(any (+ (* ($) ,str) 1))))
|
||||
|
||||
(def where-are-the-dogs? (finder "dog"))
|
||||
|
||||
(peg/match where-are-the-dogs? "dog dog cat dog") # -> @[0 4 12]
|
||||
```
|
206
doc/Specials.md
@ -1,206 +0,0 @@
|
||||
# Special Forms
|
||||
|
||||
Janet is a lisp and so is defined in terms of mostly S-expressions, or
|
||||
in terms of Janet, tuples. Tuples are used to represent function calls, macros,
|
||||
and special forms. Most functionality is exposed through functions, some
|
||||
through macros, and a minimal amount through special forms. Special forms
|
||||
are neither functions nor macros -- they are used by the compiler to directly
|
||||
express a low level construct that can not be expressed through macros or functions.
|
||||
Special forms can be thought of as forming the real 'core' language of janet.
|
||||
|
||||
Below is a reference for all of the special forms in Janet.
|
||||
|
||||
## (def name meta... value)
|
||||
|
||||
This special form binds a value to a symbol. The symbol can then be substituted
|
||||
for the value in subsequent expressions for the same result. A binding made by def
|
||||
is a constant and cannot be updated. A symbol can be redefined to a new value, but previous
|
||||
uses of the binding will refer to the previous value of the binding.
|
||||
|
||||
```lisp
|
||||
(def anumber (+ 1 2 3 4 5))
|
||||
|
||||
(print anumber) # prints 15
|
||||
```
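
The remark about redefinition can be sketched as follows. This assumes default settings, where code compiled against an earlier `def` keeps referring to the value that binding had at the time; `get-a` is an illustrative name.

```janet
(def a 1)
(defn get-a [] a)

# Redefine a; get-a was compiled against the earlier binding.
(def a 2)
(print (get-a)) # prints 1
(print a)       # prints 2
```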
|
||||
|
||||
Def can also take a tuple, array, table or struct to perform destructuring
|
||||
on the value. This allows us to do multiple assignments in one def.
|
||||
|
||||
```lisp
|
||||
(def [a b c] (range 10))
|
||||
(print a " " b " " c) # prints 0 1 2
|
||||
|
||||
(def {:x x} @{:x (+ 1 2)})
|
||||
(print x) # prints 3
|
||||
|
||||
(def [y {:x x}] @[:hi @{:x (+ 1 2)}])
|
||||
(print y x) # prints hi3
|
||||
```
|
||||
|
||||
Def can also append metadata and a docstring to the symbol when in the global scope.
|
||||
If not in the global scope, the extra metadata will be ignored.
|
||||
|
||||
```lisp
|
||||
(def mydef :private 3) # Adds the :private key to the metadata table.
|
||||
(def mydef2 :private "A docstring" 4) # Add a docstring
|
||||
|
||||
# The metadata will be ignored here because mydef is
|
||||
# not accessible outside of the do form.
|
||||
(do
|
||||
(def mydef :private 3)
|
||||
(+ mydef 1))
|
||||
```
|
||||
|
||||
## (var name meta... value)
|
||||
|
||||
Similar to def, but bindings set in this manner can be updated using set. In all other respects it is the
|
||||
same as def.
|
||||
|
||||
```lisp
|
||||
(var a 1)
|
||||
(defn printa [] (print a))
|
||||
|
||||
(printa) # prints 1
|
||||
(++ a)
|
||||
(printa) # prints 2
|
||||
(set a :hi)
|
||||
(printa) # prints hi
|
||||
```
|
||||
|
||||
## (fn name? args body...)
|
||||
|
||||
Compile a function literal (closure). A function literal consists of an optional name, an
|
||||
argument list, and a function body. The optional name is allowed so that functions can
|
||||
more easily be recursive. The argument list is a tuple of named parameters, and the body
|
||||
is 0 or more forms. The function will evaluate to the last form in the body. The other forms
|
||||
will only be evaluated for side effects.
|
||||
|
||||
Functions also introduce a new lexical scope, meaning the defs and vars inside a function
|
||||
body will not escape outside the body.
|
||||
|
||||
```lisp
|
||||
(fn []) # The simplest function literal. Takes no arguments and returns nil.
|
||||
(fn [x] x) # The identity function
|
||||
(fn identity [x] x) # The identity function - the name will also make stacktraces nicer.
|
||||
(fn [] 1 2 3 4 5) # A function that returns 5
|
||||
(fn [x y] (+ x y)) # A function that adds its two arguments.
|
||||
|
||||
(fn [& args] (length args)) # A variadic function that counts its arguments.
|
||||
|
||||
# A function that doesn't strictly check the number of arguments.
|
||||
# Extra arguments are ignored, and arguments not passed are nil.
|
||||
(fn [w x y z &] (tuple w w x x y y z z))
|
||||
```
|
||||
|
||||
## (do body...)
|
||||
|
||||
Executes a series of forms for side effects and evaluates to the final form. Also
|
||||
introduces a new lexical scope without creating or calling a function.
|
||||
|
||||
```lisp
|
||||
(do 1 2 3 4) # Evaluates to 4
|
||||
|
||||
# Prints 1, 2 and 3, then evaluates to (print 3), which is nil
|
||||
(do (print 1) (print 2) (print 3))
|
||||
|
||||
# Prints 1
|
||||
(do
|
||||
(def a 1)
|
||||
(print a))
|
||||
|
||||
# a is not defined here, so fails
|
||||
a
|
||||
```
|
||||
|
||||
## (quote x)
|
||||
|
||||
Evaluates to the literal value of the first argument. The argument is not compiled
|
||||
and is simply used as a constant value in the compiled code. Preceding a form with a
|
||||
single quote is shorthand for `(quote expression)`.
|
||||
|
||||
```lisp
|
||||
(quote 1) # evaluates to 1
|
||||
(quote hi) # evaluates to the symbol hi
|
||||
(quote quote) # evaluates to the symbol quote
|
||||
|
||||
'(1 2 3) # Evaluates to a tuple (1 2 3)
|
||||
'(print 1 2 3) # Evaluates to a tuple (print 1 2 3)
|
||||
```
|
||||
|
||||
## (if condition when-true when-false?)
|
||||
|
||||
Introduce a branching construct. The first form is the condition, the second
|
||||
form is the form to evaluate when the condition is true, and the optional
|
||||
third form is the form to evaluate when the condition is false. If no third
|
||||
form is provided it defaults to nil.
|
||||
|
||||
The if special form will not evaluate the when-true or when-false forms unless
|
||||
it needs to - it is a lazy form, which is why it cannot be a function or macro.
|
||||
|
||||
The condition is considered false only if it evaluates to nil or false - all other values
|
||||
are considered true.
|
||||
|
||||
```lisp
|
||||
(if true 10) # evaluates to 10
|
||||
(if false 10) # evaluates to nil
|
||||
(if true (print 1) (print 2)) # prints 1 but not 2
|
||||
```
|
||||
|
||||
## (splice x)
|
||||
|
||||
The splice special form is an interesting form that doesn't have an analog in most lisps.
|
||||
It only has an effect in two places - as an argument in a function call, or as the argument
|
||||
to the unquote form. Outside of these two settings, the splice special form simply evaluates
|
||||
directly to its argument x. The shorthand for splice is prefixing a form with a semicolon.
|
||||
|
||||
In the context of a function call, splice will insert *the contents* of x in the parameter list.
|
||||
|
||||
```lisp
|
||||
(+ 1 2 3) # evaluates to 6
|
||||
|
||||
(+ @[1 2 3]) # bad
|
||||
|
||||
(+ (splice @[1 2 3])) # also evaluates to 6
|
||||
|
||||
(+ ;@[1 2 3]) # Same as above
|
||||
|
||||
(+ ;(range 100)) # Sum the first 100 natural numbers
|
||||
|
||||
(+ ;(range 100) 1000) # Sum the first 100 natural numbers and 1000
|
||||
```
|
||||
|
||||
Notice that this means we rarely will need the `apply` function, as the splice operator is more flexible.
|
||||
|
||||
The splice operator can also be used inside an unquote form, where it will behave like
|
||||
the `unquote-splicing` special form in other lisps.
|
||||
|
||||
## (while condition body...)
|
||||
|
||||
The while special form compiles to a C-like while loop. The body of the form will be continuously evaluated
|
||||
until the condition is false or nil. Therefore, it is expected that the body will contain some side effects
|
||||
or the loop will go on forever. The while loop always evaluates to nil.
|
||||
|
||||
```lisp
|
||||
(var i 0)
|
||||
(while (< i 10)
|
||||
(print i)
|
||||
(++ i))
|
||||
```
|
||||
|
||||
## (set l-value r-value)
|
||||
|
||||
Update the value of a var l-value to a new value r-value. The set special form will then evaluate to r-value.
|
||||
|
||||
The r-value can be any expression, and the l-value should be a bound var.
|
||||
|
||||
## (quasiquote x)
|
||||
|
||||
Similar to `(quote x)`, but allows for unquoting within x. This makes quasiquote useful for
|
||||
writing macros, as a macro definition often generates a lot of templated code with a
|
||||
few custom values. The shorthand for quasiquote is a leading tilde `~` before a form. Within
|
||||
that form, `(unquote x)` will evaluate x and insert the result into the quasiquoted form. The shorthand for
|
||||
`(unquote x)` is `,x`.
|
||||
|
||||
## (unquote x)
|
||||
|
||||
Unquote a form within a quasiquote. Outside of a quasiquote, unquote is invalid.
|
@ -1,224 +0,0 @@
|
||||
The Janet language is implemented on top of an abstract machine (AM). The compiler
|
||||
converts Janet data structures to bytecode for this machine, which can then be efficiently executed
|
||||
from inside a C program. To understand the Janet bytecode, it is useful to understand
|
||||
the abstractions used inside the Janet AM, as well as the C types used to implement these
|
||||
features.
|
||||
|
||||
## The Stack = The Fiber
|
||||
|
||||
A Janet Fiber is the type used to represent multiple concurrent processes
|
||||
in Janet. It is basically a wrapper around the idea of a stack. The stack is
|
||||
divided into a number of stack frames (`JanetStackFrame *` in C), each of which
|
||||
contains information such as the function that created the stack frame,
|
||||
the program counter for the stack frame, a pointer to the previous frame,
|
||||
and the size of the frame. Each stack frame is also paired with a number of
|
||||
registers.
|
||||
|
||||
```
|
||||
X: Slot
|
||||
|
||||
X
|
||||
X - Stack Top, for next function call.
|
||||
-----
|
||||
Frame next
|
||||
-----
|
||||
X
|
||||
X
|
||||
X
|
||||
X
|
||||
X
|
||||
X
|
||||
X - Stack 0
|
||||
-----
|
||||
Frame 0
|
||||
-----
|
||||
X
|
||||
X
|
||||
X - Stack -1
|
||||
-----
|
||||
Frame -1
|
||||
-----
|
||||
X
|
||||
X
|
||||
X
|
||||
X
|
||||
X - Stack -2
|
||||
-----
|
||||
Frame -2
|
||||
-----
|
||||
...
|
||||
...
|
||||
...
|
||||
-----
|
||||
Bottom of stack
|
||||
```
|
||||
|
||||
Fibers also have an incomplete stack frame for the next function call on top
|
||||
of their stacks. Making a function call involves pushing arguments to this
|
||||
temporary stack, and then invoking either the CALL or TCALL instructions.
|
||||
Arguments for the next function call are pushed via the PUSH, PUSH2, PUSH3, and
|
||||
PUSHA instructions. The stack of a fiber will grow as large as needed, although by
|
||||
default Janet will limit the maximum size of a fiber's stack.
|
||||
The maximum stack size can be modified on a per fiber basis.
|
||||
|
||||
The slots in the stack are exposed as virtual registers to instructions. They
|
||||
can hold any Janet value.
|
||||
|
||||
## Closures
|
||||
|
||||
All functions in Janet are closures; they combine some bytecode instructions
|
||||
with 0 or more environments. In the C source, a closure (hereafter the same as
|
||||
a function) is represented by the type `JanetFunction *`. The bytecode instruction
|
||||
part of the function is represented by `JanetFuncDef *`, and a function environment
|
||||
is represented with `JanetFuncEnv *`.
|
||||
|
||||
In the function definition part of a function (the 'bytecode' part, `JanetFuncDef *`),
|
||||
we also store various metadata about the function which is useful for debugging,
|
||||
as well as constants referenced by the function.
|
||||
|
||||
## C Functions
|
||||
|
||||
Janet uses C functions to bridge to native code. A C function
|
||||
(`JanetCFunction *` in C) is a C function pointer that can be called like
|
||||
a normal Janet closure. From the perspective of the bytecode instruction set, there is no difference
|
||||
between invoking a C function and invoking a normal Janet function.
|
||||
|
||||
## Bytecode Format
|
||||
|
||||
Janet bytecode presents an interface to a virtual machine with a large number
|
||||
of identical registers that can hold any Janet value (`Janet *` in C). Most instructions
|
||||
have a destination register, and 1 or 2 source registers. Registers are simply
|
||||
named with non-negative integers.
|
||||
|
||||
Each instruction is a 32 bit integer, meaning that the instruction set is a constant
|
||||
width RISC instruction set like MIPS. The opcode of each instruction is the least significant
|
||||
byte of the instruction. The highest bit of
|
||||
this leading byte is reserved for debugging purposes, so there are 128 possible opcodes encodable
|
||||
with this scheme. Not all of these possible opcodes are defined; undefined opcodes will trap the interpreter
|
||||
and emit a debug signal. Note that this means an unknown opcode is still valid bytecode; it will
|
||||
just put the interpreter into a debug state when executed.
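
As a minimal sketch of the encoding just described, the opcode and the reserved debug bit can be separated with two masks. The helper names `opcode_of` and `has_debug_bit` are invented for this illustration and are not taken from the Janet source.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration; not part of the Janet API. */

/* The opcode lives in the low 7 bits of the least significant byte,
 * which is why only 128 opcodes can be encoded. */
static inline uint32_t opcode_of(uint32_t instruction) {
    return instruction & 0x7Fu;
}

/* The highest bit of that byte is the one reserved for debugging. */
static inline int has_debug_bit(uint32_t instruction) {
    return (instruction & 0x80u) != 0;
}
```

With these helpers, `0x00000185` and `0x00000105` carry the same opcode (5), and only the first has the debug bit set.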
|
||||
|
||||
```
|
||||
X - Payload bits
|
||||
O - Opcode bits
|
||||
|
||||
   4    3    2    1
|
||||
+----+----+----+----+
|
||||
| XX | XX | XX | OO |
|
||||
+----+----+----+----+
|
||||
```
|
||||
|
||||
8 bits for the opcode leaves 24 bits for the payload, which may or may not be utilized.
|
||||
There are a few instruction variants that divide these payload bits.
|
||||
|
||||
* 0 arg - Used for noops, returning nil, or other instructions that take no
|
||||
arguments. The payload is essentially ignored.
|
||||
* 1 arg - All payload bits correspond to a single value, usually a signed or unsigned integer.
|
||||
Used for instructions of 1 argument, like returning a value, yielding a value to the parent fiber,
|
||||
or doing a (relative) jump.
|
||||
* 2 arg - Payload is split into byte 2 and bytes 3 and 4.
|
||||
The first argument is the 8 bit value from byte 2, and the second argument is the 16 bit value
|
||||
from bytes 3 and 4 (`instruction >> 16`). Used for instructions of two arguments, like move, normal
|
||||
function calls, conditionals, etc.
|
||||
* 3 arg - Bytes 2, 3, and 4 each correspond to an 8 bit argument.
|
||||
Used for arithmetic operations, emitting a signal, etc.
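
The variants listed above can be sketched as a handful of shifts and masks. This is an illustrative sketch only: the function and type names (`arg1_u24`, `arg2_first`, `arg2_second`, `Args3`, `args3`) are assumptions made for the example, and all fields are treated as unsigned here.

```c
#include <stdint.h>

/* Hypothetical decoders for the instruction variants; names are invented. */

/* 1 arg: all 24 payload bits form a single value. */
static inline uint32_t arg1_u24(uint32_t instruction) {
    return instruction >> 8;
}

/* 2 arg: the first argument is the 8 bit value in byte 2. */
static inline uint32_t arg2_first(uint32_t instruction) {
    return (instruction >> 8) & 0xFFu;
}

/* 2 arg: the second argument is the 16 bit value in bytes 3 and 4. */
static inline uint32_t arg2_second(uint32_t instruction) {
    return instruction >> 16;
}

/* 3 arg: bytes 2, 3 and 4 each hold one 8 bit argument. */
typedef struct {
    uint8_t a; /* byte 2 */
    uint8_t b; /* byte 3 */
    uint8_t c; /* byte 4 */
} Args3;

static inline Args3 args3(uint32_t instruction) {
    Args3 out;
    out.a = (uint8_t)((instruction >> 8) & 0xFFu);
    out.b = (uint8_t)((instruction >> 16) & 0xFFu);
    out.c = (uint8_t)(instruction >> 24);
    return out;
}
```

For the word `0x03020105`, the 3 arg view yields the arguments 1, 2 and 3, the 2 arg view yields 1 and `0x0302`, and the 1 arg view yields `0x030201`.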
|
||||
|
||||
These instruction variants can be further refined based on the semantics of the arguments.
|
||||
Some instructions may treat an argument as a slot index, while other instructions
|
||||
will treat the argument as a signed integer literal, an index for a constant, an index
|
||||
for an environment, or an unsigned integer.
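
To illustrate how the same payload bits can be read in more than one way, the sketch below interprets a field both as an unsigned index and as a signed literal. The function names are hypothetical, and the sign extension is written with plain arithmetic so that it does not rely on implementation-defined conversions.

```c
#include <stdint.h>

/* Hypothetical helpers; the names are not taken from the Janet source. */

/* Bytes 3 and 4 read as an unsigned value, e.g. a constant index. */
static inline uint32_t payload16_unsigned(uint32_t instruction) {
    return instruction >> 16;
}

/* The same 16 bits read as a signed integer literal. */
static inline int32_t payload16_signed(uint32_t instruction) {
    uint32_t raw = instruction >> 16;
    return (raw & 0x8000u) ? (int32_t)raw - 0x10000 : (int32_t)raw;
}

/* The full 24 bit payload read as signed, e.g. a relative jump offset. */
static inline int32_t payload24_signed(uint32_t instruction) {
    uint32_t raw = instruction >> 8;
    return (raw & 0x800000u) ? (int32_t)raw - 0x1000000 : (int32_t)raw;
}
```

The bits `0xFFFF` in the upper half of an instruction are 65535 as an unsigned index but -1 as a signed literal, so the instruction's semantics decide which reading applies.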
|
||||
|
||||
## Instruction Reference
|
||||
|
||||
A listing of all opcode values can be found in src/include/janet/janetopcodes.h. The Janet assembly
|
||||
short names can be found in src/assembler/asm.c. In this document, we will refer to the instructions
|
||||
by their short names as presented to the assembler rather than their numerical values.
|
||||
|
||||
Each instruction is also listed with a signature, which are the arguments the instruction
|
||||
expects. There are a handful of instruction signatures, which combine the arity and type
|
||||
of the instruction. The assembler does not
|
||||
do any type-checking per closure, but does prevent jumping to invalid instructions and
|
||||
failure to return or error.
|
||||
|
||||
### Notation
|
||||
|
||||
* The $ prefix indicates that an instruction parameter is acting as a virtual register (slot).
|
||||
If a parameter does not have the $ prefix in the description, it is acting as some kind
|
||||
of literal (usually an unsigned integer for indexes, and a signed integer for literal integers).
|
||||
|
||||
* Some operators in the description have the suffix 'i' or 'r'. These indicate
|
||||
that these operators correspond to integers or real numbers only, respectively. All
|
||||
bit-wise operators and bit shifts only work with integers.
|
||||
|
||||
* The `>>>` indicates unsigned right shift, as in Java. Because all integers in Janet are
|
||||
signed, we differentiate the two kinds of right bit shift.
|
||||
|
||||
* The 'im' suffix in the instruction name is short for immediate.
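
The difference between the two right shifts can be sketched in C on 32 bit signed integers. The function names are invented for this example, and masking the shift amount to the range 0 to 31 is an assumption made only to keep the C well-defined; the text above does not say how larger shift amounts are handled.

```c
#include <stdint.h>

/* Convert a 32 bit pattern to its two's complement signed value
 * without relying on implementation-defined casts. */
static int32_t to_i32(uint32_t u) {
    if (u <= 0x7FFFFFFFu) return (int32_t)u;
    return (int32_t)(u - 0x80000000u) + INT32_MIN;
}

/* `>>`: signed (arithmetic) right shift, copies of the sign bit fill in. */
static int32_t shift_right_signed(int32_t x, unsigned shamt) {
    shamt &= 31u; /* assumption: keep the shift amount in range */
    if (x >= 0) return x >> shamt;
    /* For negative x, ~x is non-negative, so this sign-fills portably. */
    return ~((~x) >> shamt);
}

/* `>>>`: unsigned (logical) right shift, zeros fill in from the left. */
static int32_t shift_right_unsigned(int32_t x, unsigned shamt) {
    shamt &= 31u; /* assumption: keep the shift amount in range */
    return to_i32((uint32_t)x >> shamt);
}
```

For non-negative inputs the two functions agree. For -8 shifted by 1, the signed shift gives -4 while the unsigned shift gives 2147483644.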
|
||||
|
||||
### Reference Table
|
||||
|
||||
| Instruction | Signature | Description |
|
||||
| ----------- | --------------------------- | --------------------------------- |
|
||||
| `add` | `(add dest lhs rhs)` | $dest = $lhs + $rhs |
|
||||
| `addim` | `(addim dest lhs im)` | $dest = $lhs + im |
|
||||
| `band` | `(band dest lhs rhs)` | $dest = $lhs & $rhs |
|
||||
| `bnot` | `(bnot dest operand)` | $dest = ~$operand |
|
||||
| `bor` | `(bor dest lhs rhs)` | $dest = $lhs \| $rhs |
|
||||
| `bxor` | `(bxor dest lhs rhs)` | $dest = $lhs ^ $rhs |
|
||||
| `call` | `(call dest callee)` | $dest = call($callee, args) |
|
||||
| `clo` | `(clo dest index)` | $dest = closure(defs[index]) |
|
||||
| `cmp` | `(cmp dest lhs rhs)` | $dest = janet\_compare($lhs, $rhs)|
|
||||
| `div` | `(div dest lhs rhs)` | $dest = $lhs / $rhs |
|
||||
| `divim` | `(divim dest lhs im)` | $dest = $lhs / im |
|
||||
| `eq` | `(eq dest lhs rhs)` | $dest = $lhs == $rhs |
|
||||
| `eqim` | `(eqim dest lhs im)` | $dest = $lhs == im |
|
||||
| `err` | `(err message)` | Throw error $message. |
|
||||
| `get` | `(get dest ds key)` | $dest = $ds[$key] |
|
||||
| `geti` | `(geti dest ds index)` | $dest = $ds[index] |
|
||||
| `gt` | `(gt dest lhs rhs)` | $dest = $lhs \> $rhs |
|
||||
| `gtim` | `(gtim dest lhs im)` | $dest = $lhs \> im |
|
||||
| `jmp` | `(jmp label)` | pc = label, pc += offset |
|
||||
| `jmpif` | `(jmpif cond label)` | if $cond pc = label else pc++ |
|
||||
| `jmpno` | `(jmpno cond label)` | if $cond pc++ else pc = label |
|
||||
| `ldc` | `(ldc dest index)` | $dest = constants[index] |
|
||||
| `ldf` | `(ldf dest)` | $dest = false |
|
||||
| `ldi` | `(ldi dest integer)` | $dest = integer |
|
||||
| `ldn` | `(ldn dest)` | $dest = nil |
|
||||
| `lds` | `(lds dest)` | $dest = current closure (self) |
|
||||
| `ldt` | `(ldt dest)` | $dest = true |
|
||||
| `ldu` | `(ldu dest env index)` | $dest = envs[env][index] |
|
||||
| `len` | `(len dest ds)` | $dest = length($ds) |
|
||||
| `lt` | `(lt dest lhs rhs)` | $dest = $lhs \< $rhs |
|
||||
| `ltim` | `(ltim dest lhs im)` | $dest = $lhs \< im |
|
||||
| `mkarr` | `(mkarr dest)` | $dest = call(array, args) |
|
||||
| `mkbuf` | `(mkbuf dest)` | $dest = call(buffer, args) |
|
||||
| `mktab` | `(mktab dest)` | $dest = call(table, args) |
|
||||
| `mkstr` | `(mkstr dest)` | $dest = call(string, args) |
|
||||
| `mkstu` | `(mkstu dest)` | $dest = call(struct, args) |
|
||||
| `mktup` | `(mktup dest)` | $dest = call(tuple, args) |
|
||||
| `movf` | `(movf src dest)` | $dest = $src |
|
||||
| `movn` | `(movn dest src)` | $dest = $src |
|
||||
| `mul` | `(mul dest lhs rhs)` | $dest = $lhs \* $rhs |
|
||||
| `mulim` | `(mulim dest lhs im)` | $dest = $lhs \* im |
|
||||
| `noop` | `(noop)` | Does nothing. |
|
||||
| `push` | `(push val)` | Push $val on args |
|
||||
| `push2` | `(push2 val1 val2)` | Push $val1, $val2 on args |
|
||||
| `push3` | `(push3 val1 val2 val3)` | Push $val1, $val2, $val3 on args |
|
||||
| `pusha` | `(pusha array)` | Push values in $array on args |
|
||||
| `put` | `(put ds key val)` | $ds[$key] = $val |
|
||||
| `puti` | `(puti ds index val)` | $ds[index] = $val |
|
||||
| `res` | `(res dest fiber val)` | $dest = resume $fiber with $val |
|
||||
| `ret` | `(ret val)` | Return $val |
|
||||
| `retn` | `(retn)` | Return nil |
|
||||
| `setu` | `(setu env index val)` | envs[env][index] = $val |
|
||||
| `sig` | `(sig dest value sigtype)` | $dest = emit $value as sigtype |
|
||||
| `sl` | `(sl dest lhs rhs)` | $dest = $lhs << $rhs |
|
||||
| `slim` | `(slim dest lhs shamt)` | $dest = $lhs << shamt |
|
||||
| `sr` | `(sr dest lhs rhs)` | $dest = $lhs >> $rhs |
|
||||
| `srim` | `(srim dest lhs shamt)` | $dest = $lhs >> shamt |
|
||||
| `sru` | `(sru dest lhs rhs)` | $dest = $lhs >>> $rhs |
|
||||
| `sruim` | `(sruim dest lhs shamt)` | $dest = $lhs >>> shamt |
|
||||
| `sub` | `(sub dest lhs rhs)` | $dest = $lhs - $rhs |
|
||||
| `tcall` | `(tcall callee)` | Return call($callee, args) |
|
||||
| `tchck` | `(tchck slot types)` | Assert $slot matches types |
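
To make the table's notation concrete, here is a toy interpreter for a few of the instructions (`ldi`, `add`, `addim`, `lt`, `jmpif`, `ret`). It is a sketch under several assumptions, not Janet's virtual machine: the opcode numbers are invented, slots hold plain 32 bit integers instead of arbitrary Janet values, only 256 slots exist, and a jump label is assumed to be stored as an offset relative to the jump instruction itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy opcode numbers invented for this sketch; they do not match
 * src/include/janet/janetopcodes.h. */
enum { OP_LDI = 1, OP_ADD, OP_ADDIM, OP_LT, OP_JMPIF, OP_RET };

/* Encoders for the 1, 2 and 3 argument layouts. */
static uint32_t enc1(uint32_t op, uint32_t a) {
    return op | (a << 8);
}
static uint32_t enc2(uint32_t op, uint32_t a, uint32_t b) {
    return op | ((a & 0xFFu) << 8) | ((b & 0xFFFFu) << 16);
}
static uint32_t enc3(uint32_t op, uint32_t a, uint32_t b, uint32_t c) {
    return op | ((a & 0xFFu) << 8) | ((b & 0xFFu) << 16) | ((c & 0xFFu) << 24);
}

/* Sign-extend the low `bits` bits of raw (raw must fit in `bits` bits). */
static int32_t sext(uint32_t raw, unsigned bits) {
    uint32_t sign = 1u << (bits - 1);
    return (raw & sign) ? (int32_t)raw - (int32_t)(sign << 1) : (int32_t)raw;
}

/* Run until `ret`; returns -1 on an unknown opcode or on running off the end. */
static int32_t run(const uint32_t *code, size_t len) {
    int32_t slots[256] = {0};
    size_t pc = 0;
    while (pc < len) {
        uint32_t ins = code[pc];
        uint32_t a = (ins >> 8) & 0xFFu;  /* byte 2 */
        uint32_t b = (ins >> 16) & 0xFFu; /* byte 3 */
        uint32_t c = ins >> 24;           /* byte 4 */
        switch (ins & 0x7Fu) {
        case OP_LDI:   /* $dest = integer */
            slots[a] = sext(ins >> 16, 16); pc++; break;
        case OP_ADD:   /* $dest = $lhs + $rhs */
            slots[a] = slots[b] + slots[c]; pc++; break;
        case OP_ADDIM: /* $dest = $lhs + im */
            slots[a] = slots[b] + sext(c, 8); pc++; break;
        case OP_LT:    /* $dest = $lhs < $rhs */
            slots[a] = slots[b] < slots[c]; pc++; break;
        case OP_JMPIF: /* if $cond pc = label else pc++ */
            if (slots[a]) pc = (size_t)((int64_t)pc + sext(ins >> 16, 16));
            else pc++;
            break;
        case OP_RET:   /* Return $val (slot index truncated to 8 bits here) */
            return slots[a];
        default:
            return -1;
        }
    }
    return -1;
}

/* Sums the integers 1 through 5 with a small loop. */
static int32_t sum_one_to_five(void) {
    const uint32_t code[] = {
        enc2(OP_LDI, 0, 0),             /* 0: $0 = 0 (total)      */
        enc2(OP_LDI, 1, 0),             /* 1: $1 = 0 (counter)    */
        enc2(OP_LDI, 2, 5),             /* 2: $2 = 5 (limit)      */
        enc3(OP_ADDIM, 1, 1, 1),        /* 3: $1 = $1 + 1         */
        enc3(OP_ADD, 0, 0, 1),          /* 4: $0 = $0 + $1        */
        enc3(OP_LT, 3, 1, 2),           /* 5: $3 = $1 < $2        */
        enc2(OP_JMPIF, 3, (uint32_t)-3),/* 6: if $3 jump back to 3 */
        enc1(OP_RET, 0),                /* 7: return $0           */
    };
    return run(code, sizeof(code) / sizeof(code[0]));
}
```

Calling `sum_one_to_five()` runs the loop five times and returns 15. The real interpreter differs in many ways, but the dispatch on the low bits of each 32 bit word and the reading of slots by index follow the format described in this document.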
|
||||
|