mirror of
https://github.com/janet-lang/janet
synced 2024-11-28 19:19:53 +00:00
Add parser documentation.
This commit is contained in:
parent
24b9ae7820
commit
a412eecd36
245
doc/Parser.md
Normal file
245
doc/Parser.md
Normal file
@ -0,0 +1,245 @@
|
|||||||
|
# The Parser
|
||||||
|
|
||||||
|
A Janet program begins life as a text file, just a sequence of byte like
|
||||||
|
any other on your system. Janet source files should be UTF-8 or ASCII
|
||||||
|
encoded. Before Janet can compile or run your program, it must transform
|
||||||
|
your source code into a data structure. Janet is a lisp, which means it is
|
||||||
|
homoiconic - code is data, so all of the facilities in the language for
|
||||||
|
manipulating arrays, tuples, strings, and tables can be used for manipulating
|
||||||
|
your source code as well.
|
||||||
|
|
||||||
|
But before janet code is represented as a data structure, it must be read, or parsed,
|
||||||
|
by the janet parser. Called the reader in many other lisps, the parser is a machine
|
||||||
|
that takes in plain text and outputs data structures which can be used by both
|
||||||
|
the compiler and macros. In janet, it is a parser rather than a reader because
|
||||||
|
there is no code execution at read time. This is safer and simpler, and also
|
||||||
|
lets janet syntax serve as a robust data interchange format. While a parser
|
||||||
|
is not extensible, in janet the philosophy is to extend the language via macros
|
||||||
|
rather than reader macros.
|
||||||
|
|
||||||
|
## Nil, True and False
|
||||||
|
|
||||||
|
Nil, true and false are all literals than can be entered as such
|
||||||
|
in the parser.
|
||||||
|
|
||||||
|
```
|
||||||
|
nil
|
||||||
|
true
|
||||||
|
false
|
||||||
|
```
|
||||||
|
|
||||||
|
## Symbols
|
||||||
|
|
||||||
|
Janet symbols are represented a sequence of alphanumeric characters
|
||||||
|
not starting with a digit. They can also contain the characters
|
||||||
|
\!, @, $, \%, \^, \&, \*, -, \_, +, =, \|, \~, :, \<, \>, ., \?, \\, /, as
|
||||||
|
well as any Unicode codepoint not in the ascii range.
|
||||||
|
|
||||||
|
By convention, most symbols should be all lower case and use dashes to connect words
|
||||||
|
(sometimes called kebab case).
|
||||||
|
|
||||||
|
Symbols that come from another module often contain a forward slash that separates
|
||||||
|
the name of the module from the name of the definition in the module
|
||||||
|
|
||||||
|
```
|
||||||
|
symbol
|
||||||
|
kebab-case-symbol
|
||||||
|
snake_case_symbol
|
||||||
|
my-module/my-fuction
|
||||||
|
*****
|
||||||
|
!%$^*__--__._+++===~-crazy-symbol
|
||||||
|
*global-var*
|
||||||
|
你好
|
||||||
|
```
|
||||||
|
|
||||||
|
## Keywords
|
||||||
|
|
||||||
|
Janet keywords are really just symbols that begin with the character :. However, they
|
||||||
|
are used differently and treated by the compiler as a constant rather than a name for
|
||||||
|
something. Keywords are used mostly for keys in tables and structs, or pieces of syntax
|
||||||
|
in macros.
|
||||||
|
|
||||||
|
```
|
||||||
|
:keyword
|
||||||
|
:range
|
||||||
|
:0x0x0x0
|
||||||
|
:a-keyword
|
||||||
|
::
|
||||||
|
:
|
||||||
|
```
|
||||||
|
|
||||||
|
## Numbers
|
||||||
|
|
||||||
|
Janet numbers are represented by either 32 bit integers or
|
||||||
|
IEEE-754 floating point numbers. The syntax is similar to that of many other languages
|
||||||
|
as well. Numbers can be written in base 10, with
|
||||||
|
underscores used to separate digits into groups. A decimal point can be used for floating
|
||||||
|
point numbers. Numbers can also be written in other bases by prefixing the number with the desired
|
||||||
|
base and the character 'r'. For example, 16 can be written as `16`, `1_6`, `16r10`, `4r100`, or `0x10`. The
|
||||||
|
`0x` prefix can be used for hexadecimal as it is so common. The radix must be themselves written in base 10, and
|
||||||
|
can be any integer from 2 to 36. For any radix above 10, use the letters as digits (not case sensitive).
|
||||||
|
|
||||||
|
```
|
||||||
|
0
|
||||||
|
12
|
||||||
|
-65912
|
||||||
|
4.98
|
||||||
|
1.3e18
|
||||||
|
1.3E18
|
||||||
|
18r123C
|
||||||
|
11raaa&a
|
||||||
|
1_000_000
|
||||||
|
0xbeef
|
||||||
|
```
|
||||||
|
|
||||||
|
## Strings
|
||||||
|
|
||||||
|
Strings in janet are surrounded by double quotes. Strings are 8bit clean, meaning
|
||||||
|
meaning they can contain any arbitrary sequence of bytes, including embedded
|
||||||
|
0s. To insert a double quote into a string itself, escape
|
||||||
|
the double quote with a backslash. For unprintable characters, you can either use
|
||||||
|
one of a few common escapes, use the `\xHH` escape to escape a single byte in
|
||||||
|
hexidecimal. The supported escapes are:
|
||||||
|
|
||||||
|
- \\xHH Escape a single arbitrary byte in hexidecimal.
|
||||||
|
- \\n Newline (ASCII 10)
|
||||||
|
- \\t Tab character (ASCII 9)
|
||||||
|
- \\r Carriage Return (ASCII 13)
|
||||||
|
- \\0 Null (ASCII 0)
|
||||||
|
- \\z Null (ASCII 0)
|
||||||
|
- \\f Form Feed (ASCII 12)
|
||||||
|
- \\e Escape (ASCII 27)
|
||||||
|
- \\" Double Quote (ASCII 34)
|
||||||
|
- \\\\ Backslash (ASCII 92)
|
||||||
|
|
||||||
|
Strings can also contain literal newline characters that will be ignore.
|
||||||
|
This lets one define a multiline string that does not contain newline characters.
|
||||||
|
|
||||||
|
An alternative way of representing strings in janet is the long string, or the backquote
|
||||||
|
delimited string. A string can also be define to start with a certain number of
|
||||||
|
backquotes, and will end the same number of backquotes. Long strings
|
||||||
|
do not contain escape sequences; all bytes will be parsed literally until
|
||||||
|
ending delimiter is found. This is useful
|
||||||
|
for definining multiline strings with literal newline characters, unprintable
|
||||||
|
characters, or strings that would otherwise require many escape sequences.
|
||||||
|
|
||||||
|
```
|
||||||
|
"This is a string."
|
||||||
|
"This\nis\na\nstring."
|
||||||
|
"This
|
||||||
|
is
|
||||||
|
a
|
||||||
|
string."
|
||||||
|
``
|
||||||
|
This
|
||||||
|
is
|
||||||
|
a
|
||||||
|
string
|
||||||
|
``
|
||||||
|
```
|
||||||
|
|
||||||
|
## Buffers
|
||||||
|
|
||||||
|
Buffers are similar strings except they are mutable data structures. Strings in janet
|
||||||
|
cannot be mutated after created, where a buffer can be changed after creation.
|
||||||
|
The syntax for a buffer is the same as that for a string or long string, but
|
||||||
|
the buffer must be prefixed with the '@' character.
|
||||||
|
|
||||||
|
```
|
||||||
|
@""
|
||||||
|
@"Buffer."
|
||||||
|
@``Another buffer``
|
||||||
|
```
|
||||||
|
|
||||||
|
## Tuples
|
||||||
|
|
||||||
|
Tuples are a sequence of white space separated values surrounded by either parentheses
|
||||||
|
or brackets. The parser considers any of the characters ASCII 32, \\0, \\f, \\n, \\r or \\t
|
||||||
|
to be whitespace.
|
||||||
|
|
||||||
|
```
|
||||||
|
(do 1 2 3)
|
||||||
|
[do 1 2 3]
|
||||||
|
```
|
||||||
|
|
||||||
|
## Arrays
|
||||||
|
|
||||||
|
Arrays are the same as tuples, but have a leading @ to indicate mutability.
|
||||||
|
|
||||||
|
```
|
||||||
|
@(:one :two :three)
|
||||||
|
@[:one :two :three]
|
||||||
|
```
|
||||||
|
|
||||||
|
## Structs
|
||||||
|
|
||||||
|
Structs are represented by a sequence of whitespace delimited key value pairs
|
||||||
|
surrounded by curly braces. The sequence is defined as key1, value1, key2, value2, etc.
|
||||||
|
There must be an even number of items between curly braces or the parser will
|
||||||
|
signal a parse error. Any value can be a key or value. Using nil as a key or
|
||||||
|
value, however, will drop that pair from the parsed struct.
|
||||||
|
|
||||||
|
```
|
||||||
|
{}
|
||||||
|
{:key1 "value1" :key2 :value2 :key3 3}
|
||||||
|
{(1 2 3) (4 5 6)}
|
||||||
|
{@[] @[]}
|
||||||
|
{1 2 3 4 5 6}
|
||||||
|
```
|
||||||
|
## Tables
|
||||||
|
|
||||||
|
Table have the same syntax as structs, except they have the @ prefix to indicate
|
||||||
|
that they are mutable.
|
||||||
|
|
||||||
|
```
|
||||||
|
@{}
|
||||||
|
@{:key1 "value1" :key2 :value2 :key3 3}
|
||||||
|
@{(1 2 3) (4 5 6)}
|
||||||
|
@{@[] @[]}
|
||||||
|
@{1 2 3 4 5 6}
|
||||||
|
```
|
||||||
|
|
||||||
|
## Comments
|
||||||
|
|
||||||
|
Comments begin with a \# character and continue until the end of the line.
|
||||||
|
There are no multiline comments. For ricm multiline comments, use a
|
||||||
|
string literal.
|
||||||
|
|
||||||
|
## Shorthands
|
||||||
|
|
||||||
|
Often called reader macros in other lisps, Janet provides several shorthand
|
||||||
|
notations for some forms.
|
||||||
|
|
||||||
|
### 'x
|
||||||
|
|
||||||
|
Shorthand for `(quote x)`
|
||||||
|
|
||||||
|
### ;x
|
||||||
|
|
||||||
|
Shorthand for `(splice x)`
|
||||||
|
|
||||||
|
### \`x
|
||||||
|
|
||||||
|
Shorthand for `(quasiquote x)`
|
||||||
|
|
||||||
|
### ,x
|
||||||
|
|
||||||
|
Shorthand for `(unquote x)`
|
||||||
|
|
||||||
|
These shorthand notations can be combined in any order, allowing
|
||||||
|
forms like `''x` (`(quote (quote x))`), or `,;x` (`(unquote (splice x))`).
|
||||||
|
|
||||||
|
## API
|
||||||
|
|
||||||
|
The parser contains the following functions which exposes
|
||||||
|
the parser state machine as a janet abstract object.
|
||||||
|
|
||||||
|
- `parser/byte`
|
||||||
|
- `parser/consume`
|
||||||
|
- `parser/error`
|
||||||
|
- `parser/flush`
|
||||||
|
- `parser/new`
|
||||||
|
- `parser/produce`
|
||||||
|
- `parser/state`
|
||||||
|
- `parser/status`
|
||||||
|
- `parser/where`
|
Loading…
Reference in New Issue
Block a user