mirror of
https://github.com/janet-lang/janet
synced 2025-01-12 00:20:26 +00:00
Add preliminary bytecode docs.
This commit is contained in:
parent
0d5b57daee
commit
077c345434
@ -62,6 +62,11 @@ Options are:
|
||||
$
|
||||
```
|
||||
|
||||
## Docmentation
|
||||
|
||||
API documentation and design documents will be added to the `doc` folder as they are written.
|
||||
As of March 2018, specifications are sparse because dst is evolving.
|
||||
|
||||
## Compiling and Running
|
||||
|
||||
Dst can be built with Make or CMake.
|
||||
|
195
doc/bytecode.md
Normal file
195
doc/bytecode.md
Normal file
@ -0,0 +1,195 @@
|
||||
# Dst Bytecode Interpreter
|
||||
|
||||
This document outlines the Dst bytecode format, and core ideas in the runtime.
|
||||
the are closely related to the bytecode. It should enable the reader
|
||||
to write dst assembly code and hopefully understand the dst internals better.
|
||||
It will also talk about the C abstractions used to implement some of these ideas.
|
||||
Some experience with basic computer organization is helpful for understanding
|
||||
the model of computation.
|
||||
|
||||
## The Stack = The Fiber
|
||||
|
||||
A Dst Fiber is the type used to represent multiple concurrent processes
|
||||
in dst. It is basically a wrapper around the idea of a stack. The stack is
|
||||
divided into a number of stack frames (`DstStackFrame *` in C), each of which
|
||||
contains information such as the function that created the stack frame,
|
||||
the program counter for the stack frame, a pointer to the previous frame,
|
||||
and the size of the frame. Each stack frame also is paired with a number
|
||||
registers.
|
||||
|
||||
```
|
||||
X: Slot
|
||||
|
||||
X
|
||||
X - Stack Top, for next function call.
|
||||
-----
|
||||
Frame next
|
||||
-----
|
||||
X
|
||||
X
|
||||
X
|
||||
X
|
||||
X
|
||||
X
|
||||
X - Stack 0
|
||||
-----
|
||||
Frame 0
|
||||
-----
|
||||
X
|
||||
X
|
||||
X - Stack -1
|
||||
-----
|
||||
Frame -1
|
||||
-----
|
||||
X
|
||||
X
|
||||
X
|
||||
X
|
||||
X - Stack -2
|
||||
-----
|
||||
Frame -2
|
||||
-----
|
||||
...
|
||||
...
|
||||
...
|
||||
-----
|
||||
Bottom of stack
|
||||
```
|
||||
|
||||
Fibers also have an incomplete stack frame for the next function call on top
|
||||
of their stacks. Making a function call involves pushing arguments to this
|
||||
temporary stack, and then invoking either the CALL or TCALL instructions.
|
||||
Arguments for the next function call are pushed via the PUSH, PUSH2, PUSH3, and
|
||||
PUSHA instructions. The stack of a fiber will grow as large as needed, so
|
||||
recursive algorithms can be used without fear of stack overflow.
|
||||
|
||||
The slots in the stack are exposed as virtual registers to instructions. They
|
||||
can hold any Dst value.
|
||||
|
||||
## Closures
|
||||
|
||||
All functions in dst are closures; they combine some bytecode instructions
|
||||
with 0 or more environments. In the C source, a closure (hereby the same as
|
||||
a function) is represented by the type `DstFunc *`. The bytecode instruction
|
||||
part of the function is represented by `DstFuncDef *`, and a function environment
|
||||
is represented with `DstFuncEnv *`.
|
||||
|
||||
The function definition part of a function (the 'bytecode' part, `DstFuncDef *`),
|
||||
we also store various metadata about the function which is useful for debugging,
|
||||
as well as constants referenced by the function.
|
||||
|
||||
## C Functions
|
||||
|
||||
Dst uses c functions to bridge to native code. A C function
|
||||
(`DstCFunction *` in C) is a C function pointer that can be called like
|
||||
a normal dst closure. From the perspective of the bytecode instruction set, there is no difference
|
||||
in invoking a c function and invoking a normal dst function.
|
||||
|
||||
## Bytecode Format
|
||||
|
||||
Dst bytecode presents an interface to virtual machine with a large number
|
||||
of identical registers that can hold any Dst value (`Dst *` in C). Most instructions
|
||||
have a destination register, and 1 or 2 source register. Registers are simply
|
||||
named with positive integers.
|
||||
|
||||
Each instruction is a 32 bit integer, meaning that the instruction set is a constant
|
||||
width instruction set like MIPS. The opcode of each instruction is the least significant
|
||||
byte of the instruction. This means there are 256 possible opcodes, but half of those
|
||||
are reserved, so 128 possible opcodes. The current implementation uses about half of these.
|
||||
|
||||
```
|
||||
X - Payload bits
|
||||
O - Opcode bits
|
||||
|
||||
4 3 2 1
|
||||
+----+----+----+----+
|
||||
| XX | XX | XX | OO |
|
||||
+----+----+----+----+
|
||||
```
|
||||
|
||||
8 bits for the opcode leaves 24 bits for the payload, which may or may not be utilized.
|
||||
There are a few instruction variants that divide these payload bits.
|
||||
|
||||
* 0 arg - Used for noops, returning nil, or other instructions that take no
|
||||
arguments. The payload is essentially ignored.
|
||||
* 1 arg - All payload bits correspond to a single value, usually a signed or a signed integer/
|
||||
Used for instructions of 1 argument, like returning a value, yielding a value to the parent fiber,
|
||||
or doing a jump.
|
||||
* 2 arg - Payload is split into byte 2 and bytes 3 and 4.
|
||||
The first argument is the 8 bit value from byte 2, and the second argument is the 16 bit value
|
||||
from bytes 3 and 4 (`instruction >> 16`). Used for instructions of two arguments, like move, normal
|
||||
function calls, conditionals, etc.
|
||||
* 3 arg - Bytes 2, 3, and 4 each correspond to an 8 bit argument.
|
||||
Used for arithmetic operations.
|
||||
|
||||
These instruction variants can be further refined based on the semantics of the arguments.
|
||||
Some instructions may treat an argument as a slot index, while other instructions
|
||||
will treat the argument as a signed integer literal, and index for a constant, an index
|
||||
for an environment, or an unsigned integer.
|
||||
|
||||
## Instruction Reference
|
||||
|
||||
A listing of all opcode values can be found in src/include/dst/dstopcodes.h. The dst assembly
|
||||
short names can be found src/assembler/asm.c. In this document, we will refer to the instructions
|
||||
by their short names as presented to the assembler rather than their numerical values.
|
||||
|
||||
* `add`:
|
||||
* `addi`:
|
||||
* `addim`:
|
||||
* `addr`:
|
||||
* `band`:
|
||||
* `bnot`:
|
||||
* `bor`:
|
||||
* `bxor`:
|
||||
* `call`:
|
||||
* `clo`:
|
||||
* `cmp`:
|
||||
* `debug`:
|
||||
* `div`:
|
||||
* `divi`:
|
||||
* `divim`:
|
||||
* `divr`:
|
||||
* `eq`:
|
||||
* `err`:
|
||||
* `get`:
|
||||
* `geti`:
|
||||
* `gt`:
|
||||
* `jmp`:
|
||||
* `jmpi`:
|
||||
* `jmpn`:
|
||||
* `ldc`:
|
||||
* `ldf`:
|
||||
* `ldi`:
|
||||
* `ldn`:
|
||||
* `lds`:
|
||||
* `ldt`:
|
||||
* `ldu`:
|
||||
* `lt`:
|
||||
* `movf`:
|
||||
* `movn`:
|
||||
* `mul`:
|
||||
* `muli`:
|
||||
* `mulim`:
|
||||
* `mulr`:
|
||||
* `noop`:
|
||||
* `push`:
|
||||
* `push2`:
|
||||
* `push3`:
|
||||
* `pusha`:
|
||||
* `put`:
|
||||
* `puti`:
|
||||
* `res`:
|
||||
* `ret`:
|
||||
* `retn`:
|
||||
* `setu`:
|
||||
* `sl`:
|
||||
* `slim`:
|
||||
* `sr`:
|
||||
* `srim`:
|
||||
* `sru`:
|
||||
* `sruim`:
|
||||
* `sub`:
|
||||
* `tcall`:
|
||||
* `tchck`:
|
||||
* `yield`:
|
||||
|
Loading…
Reference in New Issue
Block a user