mirror of
https://github.com/janet-lang/janet
synced 2025-01-17 19:02:51 +00:00
Created The Dst AM Bytecode (markdown)
parent
b6766fadc2
commit
157a77b743
233
The-Dst-AM-Bytecode.md
Normal file
233
The-Dst-AM-Bytecode.md
Normal file
@ -0,0 +1,233 @@
|
||||
The Dst language is implemented on top of an abstract machine (AM). The compiler
|
||||
converts Dst data structures to this bytecode, which can then be efficiently executed
|
||||
from inside a C program. To understand the dst bytecode, it is useful to understand
|
||||
the abstractions used inside the Dst AM, as well as the C types used to implement these
|
||||
features.
|
||||
|
||||
## The Stack = The Fiber
|
||||
|
||||
A Dst Fiber is the type used to represent multiple concurrent processes
|
||||
in dst. It is basically a wrapper around the idea of a stack. The stack is
|
||||
divided into a number of stack frames (`DstStackFrame *` in C), each of which
|
||||
contains information such as the function that created the stack frame,
|
||||
the program counter for the stack frame, a pointer to the previous frame,
|
||||
and the size of the frame. Each stack frame also is paired with a number
|
||||
registers.
|
||||
|
||||
```
|
||||
X: Slot
|
||||
|
||||
X
|
||||
X - Stack Top, for next function call.
|
||||
-----
|
||||
Frame next
|
||||
-----
|
||||
X
|
||||
X
|
||||
X
|
||||
X
|
||||
X
|
||||
X
|
||||
X - Stack 0
|
||||
-----
|
||||
Frame 0
|
||||
-----
|
||||
X
|
||||
X
|
||||
X - Stack -1
|
||||
-----
|
||||
Frame -1
|
||||
-----
|
||||
X
|
||||
X
|
||||
X
|
||||
X
|
||||
X - Stack -2
|
||||
-----
|
||||
Frame -2
|
||||
-----
|
||||
...
|
||||
...
|
||||
...
|
||||
-----
|
||||
Bottom of stack
|
||||
```
|
||||
|
||||
Fibers also have an incomplete stack frame for the next function call on top
|
||||
of their stacks. Making a function call involves pushing arguments to this
|
||||
temporary stack, and then invoking either the CALL or TCALL instructions.
|
||||
Arguments for the next function call are pushed via the PUSH, PUSH2, PUSH3, and
|
||||
PUSHA instructions. The stack of a fiber will grow as large as needed, although by
|
||||
default dst will limit the maximum size of a fiber's stack.
|
||||
The maximum stack size can be modified on a per fiber basis.
|
||||
|
||||
The slots in the stack are exposed as virtual registers to instructions. They
|
||||
can hold any Dst value.
|
||||
|
||||
## Closures
|
||||
|
||||
All functions in dst are closures; they combine some bytecode instructions
|
||||
with 0 or more environments. In the C source, a closure (hereby the same as
|
||||
a function) is represented by the type `DstFunction *`. The bytecode instruction
|
||||
part of the function is represented by `DstFuncDef *`, and a function environment
|
||||
is represented with `DstFuncEnv *`.
|
||||
|
||||
The function definition part of a function (the 'bytecode' part, `DstFuncDef *`),
|
||||
we also store various metadata about the function which is useful for debugging,
|
||||
as well as constants referenced by the function.
|
||||
|
||||
## C Functions
|
||||
|
||||
Dst uses C functions to bridge to native code. A C function
|
||||
(`DstCFunction *` in C) is a C function pointer that can be called like
|
||||
a normal dst closure. From the perspective of the bytecode instruction set, there is no difference
|
||||
in invoking a C function and invoking a normal dst function.
|
||||
|
||||
## Bytecode Format
|
||||
|
||||
Dst bytecode presents an interface to a virtual machine with a large number
|
||||
of identical registers that can hold any Dst value (`Dst *` in C). Most instructions
|
||||
have a destination register, and 1 or 2 source register. Registers are simply
|
||||
named with positive integers.
|
||||
|
||||
Each instruction is a 32 bit integer, meaning that the instruction set is a constant
|
||||
width RISC instruction set like MIPS. The opcode of each instruction is the least significant
|
||||
byte of the instruction. The highest bit of
|
||||
this leading byte is reserved for debugging purpose, so there are 128 possible opcodes encodable
|
||||
with this scheme. Not all of these possible opcode are defined, and will trap the interpreter
|
||||
and emit a debug signal. Note that this mean an unknown opcode is still valid bytecode, it will
|
||||
just put the interpreter into a debug state when executed.
|
||||
|
||||
```
|
||||
X - Payload bits
|
||||
O - Opcode bits
|
||||
|
||||
4 3 2 1
|
||||
+----+----+----+----+
|
||||
| XX | XX | XX | OO |
|
||||
+----+----+----+----+
|
||||
```
|
||||
|
||||
8 bits for the opcode leaves 24 bits for the payload, which may or may not be utilized.
|
||||
There are a few instruction variants that divide these payload bits.
|
||||
|
||||
* 0 arg - Used for noops, returning nil, or other instructions that take no
|
||||
arguments. The payload is essentially ignored.
|
||||
* 1 arg - All payload bits correspond to a single value, usually a signed or unsigned integer.
|
||||
Used for instructions of 1 argument, like returning a value, yielding a value to the parent fiber,
|
||||
or doing a (relative) jump.
|
||||
* 2 arg - Payload is split into byte 2 and bytes 3 and 4.
|
||||
The first argument is the 8 bit value from byte 2, and the second argument is the 16 bit value
|
||||
from bytes 3 and 4 (`instruction >> 16`). Used for instructions of two arguments, like move, normal
|
||||
function calls, conditionals, etc.
|
||||
* 3 arg - Bytes 2, 3, and 4 each correspond to an 8 bit argument.
|
||||
Used for arithmetic operations, emitting a signal, etc.
|
||||
|
||||
These instruction variants can be further refined based on the semantics of the arguments.
|
||||
Some instructions may treat an argument as a slot index, while other instructions
|
||||
will treat the argument as a signed integer literal, and index for a constant, an index
|
||||
for an environment, or an unsigned integer.
|
||||
|
||||
## Instruction Reference
|
||||
|
||||
A listing of all opcode values can be found in src/include/dst/dstopcodes.h. The dst assembly
|
||||
short names can be found src/assembler/asm.c. In this document, we will refer to the instructions
|
||||
by their short names as presented to the assembler rather than their numerical values.
|
||||
|
||||
Each instruction is also listed with a signature, which are the arguments the instruction
|
||||
expects. There are a handful of instruction signatures, which combine the arity and type
|
||||
of the instruction. The assembler does not
|
||||
do any typechecking per closure, but does prevent jumping to invalid instructions and
|
||||
failure to return or error.
|
||||
|
||||
### Notation
|
||||
|
||||
* The $ prefix indicates that a instruction parameter is acting as a virtual register (slot).
|
||||
If a parameter does not have the $ suffix in the description, it is acting as some kind
|
||||
of literal (usually an unsigned integer for indexes, and a signed integer for literal integers).
|
||||
|
||||
* Some operators in the description have the suffix 'i' or 'r'. These indicate
|
||||
that these operators correspond to integers or real numbers only, respectively. All
|
||||
bitwise operators and bit shifts only work with integers.
|
||||
|
||||
* The `>>>` indicates unsigned right shift, as in Java. Because all integers in dst are
|
||||
signed, we differentiate the two kinds of right bit shift.
|
||||
|
||||
* The 'im' suffix in the instruction name is short for immediate. The 'i' suffix is short for integer,
|
||||
and the 'r' suffix is short for real.
|
||||
|
||||
### Reference Table
|
||||
|
||||
| Instruction | Signature | Description |
|
||||
| ----------- | --------------------------- | --------------------------------- |
|
||||
| `add` | `(add dest lhs rhs)` | $dest = $lhs + $rhs |
|
||||
| `addi` | `(addi dest lhs rhs)` | $dest = $lhs +i $rhs |
|
||||
| `addim` | `(addim dest lhs im)` | $dest = $lhs +i im |
|
||||
| `addr` | `(addr dest lhs rhs)` | $dest = $lhs +r $rhs |
|
||||
| `band` | `(band dest lhs rhs)` | $dest = $lhs & $rhs |
|
||||
| `bnot` | `(bnot dest operand)` | $dest = ~$operand |
|
||||
| `bor` | `(bor dest lhs rhs)` | $dest = $lhs | $rhs |
|
||||
| `bxor` | `(bxor dest lhs rhs)` | $dest = $lhs ^ $rhs |
|
||||
| `call` | `(call dest callee)` | $dest = call($callee) |
|
||||
| `clo` | `(clo dest index)` | $dest = closure(defs[$index]) |
|
||||
| `cmp` | `(cmp dest lhs rhs)` | $dest = dst\_compare($lhs, $rhs) |
|
||||
| `debug` | `(debug)` | Suspend current fiber |
|
||||
| `div` | `(div dest lhs rhs)` | $dest = $lhs / $rhs |
|
||||
| `divi` | `(divi dest lhs rhs)` | $dest = $lhs /i $rhs |
|
||||
| `divim` | `(divim dest lhs im)` | $dest = $lhs /i im |
|
||||
| `divr` | `(divr dest lhs rhs)` | $dest = $lhs /r $rhs |
|
||||
| `eq` | `(eq dest lhs rhs)` | $dest = $lhs == $rhs |
|
||||
| `eqi` | `(eqi dest lhs rhs)` | $dest = $lhs ==i $rhs |
|
||||
| `eqim` | `(eqim dest lhs im)` | $dest = $lhs ==i im |
|
||||
| `eqr` | `(eqr dest lhs rhs)` | $dest = $lhs ==r $rhs |
|
||||
| `err` | `(err message)` | Throw error $message. |
|
||||
| `get` | `(get dest ds key)` | $dest = $ds[$key] |
|
||||
| `geti` | `(geti dest ds index)` | $dest = $ds[index] |
|
||||
| `gt` | `(gt dest lhs rhs)` | $dest = $lhs > $rhs |
|
||||
| `gti` | `(gti dest lhs rhs)` | $dest = $lhs \>i $rhs |
|
||||
| `gtim` | `(gtim dest lhs im)` | $dest = $lhs \>i im |
|
||||
| `gtr` | `(gtr dest lhs rhs)` | $dest = $lhs \>r $rhs |
|
||||
| `gter` | `(gter dest lhs rhs)` | $dest = $lhs >=r $rhs |
|
||||
| `jmp` | `(jmp label)` | pc = label, pc += offset |
|
||||
| `jmpif` | `(jmpif cond label)` | if $cond pc = label else pc++ |
|
||||
| `jmpno` | `(jmpno cond label)` | if $cond pc++ else pc = label |
|
||||
| `ldc` | `(ldc dest index)` | $dest = constants[index] |
|
||||
| `ldf` | `(ldf dest)` | $dest = false |
|
||||
| `ldi` | `(ldi dest integer)` | $dest = integer |
|
||||
| `ldn` | `(ldn dest)` | $dest = nil |
|
||||
| `lds` | `(lds dest)` | $dest = current closure (self) |
|
||||
| `ldt` | `(ldt dest)` | $dest = true |
|
||||
| `ldu` | `(ldu dest env index)` | $dest = envs[env][index] |
|
||||
| `lt` | `(lt dest lhs rhs)` | $dest = $lhs < $rhs |
|
||||
| `lti` | `(lti dest lhs rhs)` | $dest = $lhs \<i $rhs |
|
||||
| `ltim` | `(ltim dest lhs im)` | $dest = $lhs \<i im |
|
||||
| `ltr` | `(ltr dest lhs rhs)` | $dest = $lhs \<r $rhs |
|
||||
| `lter` | `(lter dest lhs rhs)` | $dest = $lhs <=r $rhs |
|
||||
| `movf` | `(movf src dest)` | $dest = $src |
|
||||
| `movn` | `(movn dest src)` | $dest = $src |
|
||||
| `mul` | `(mul dest lhs rhs)` | $dest = $lhs * $rhs |
|
||||
| `muli` | `(muli dest lhs rhs)` | $dest = $lhs \*i $rhs |
|
||||
| `mulim` | `(mulim dest lhs im)` | $dest = $lhs \*i im |
|
||||
| `mulr` | `(mulr dest lhs rhs)` | $dest = $lhs \*r $rhs |
|
||||
| `noop` | `(noop)` | Does nothing. |
|
||||
| `push` | `(push val)` | Push $val as arg |
|
||||
| `push2` | `(push2 val1 val3)` | Push $val1, $val2 as args |
|
||||
| `push3` | `(push3 val1 val2 val3)` | Push $val1, $val2, $val3, as args |
|
||||
| `pusha` | `(pusha array)` | Push values in $array as args |
|
||||
| `put` | `(put ds key val)` | $ds[$key] = $val |
|
||||
| `puti` | `(puti ds index val)` | $ds[index] = $val |
|
||||
| `res` | `(res dest fiber val)` | $dest = resume $fiber with $val |
|
||||
| `ret` | `(ret val)` | Return $val |
|
||||
| `retn` | `(retn)` | Return nil |
|
||||
| `setu` | `(setu env index val)` | envs[env][index] = $val |
|
||||
| `sig` | `(sig dest value sigtype)` | $dest = emit $value as sigtype |
|
||||
| `sl` | `(sl dest lhs rhs)` | $dest = $lhs << $rhs |
|
||||
| `slim` | `(slim dest lhs shamt)` | $dest = $lhs << shamt |
|
||||
| `sr` | `(sr dest lhs rhs)` | $dest = $lhs >> $rhs |
|
||||
| `srim` | `(srim dest lhs shamt)` | $dest = $lhs >> shamt |
|
||||
| `sru` | `(sru dest lhs rhs)` | $dest = $lhs >>> $rhs |
|
||||
| `sruim` | `(sruim dest lhs shamt)` | $dest = $lhs >>> shamt |
|
||||
| `sub` | `(sub dest lhs rhs)` | $dest = $lhs - $rhs |
|
||||
| `tcall` | `(tcall callee)` | Return call($callee) |
|
||||
| `tchck` | `(tcheck slot types)` | Assert $slot does matches types |
|
||||
|
Loading…
Reference in New Issue
Block a user