The use cases involve user-expandable grammars.
For example, consider the IRC nickname specification.
> They SHOULD NOT contain any dot character ('.', 0x2E).
> Servers MAY have additional implementation-specific nickname restrictions.
To implement this, we can do something along these lines:
```janet
(def nickname @{:main '(some :allowed)
:allowed (! (+ :forbidden/dot :forbidden/user))
# for lax mode, (put nickname :forbidden/dot false)
:forbidden/dot "."
# to add your own requirements
# (put nickname :forbidden/user 'something)
:forbidden/user false})
```
Additionally, it's common in parsing theory to allow matches of the
empty string (epsilon). `true` essentially allows for this.
Note that this does not strictly add new functionality, you could
emulate this previously using `0` and `(! 0)` respectively, but this
should be faster and more intuitive.
The speed improvement primarily comes from `(! 0)` which is now a single
step.
When peg/replace or peg/replace-all are given a function to serve as the text
replacement, any captures produced by the PEG are passed as additional
arguments to that function.
Functions will be invoked with the matched text, and their result will be
coerced to a string and used as the new replacement text.
This also allows passing non-function, non-byteviewable values, which will be
converted into strings during replacement (only once, and only if at least
one match is found).
For to and thru, we need to restore eveytime through the loop since rules need
run with the right captures on the stack, especially if they have any
sort of backrefs.
This is the beginning of a system for compiler warnings. This includes
linting, deprecation notices, and other compiler warnings that are best
detected by the `compile` function and don't require the partial
evalutaion of the flychecker.
The (undef rule :tag) combinator lets a user "scope" tagged captures.
After the rule has matched, all captures with tag :tag can no longer be
refered to by their tag. However, such captures from outside
rule are kept as is. If no tag is given, all tagged captures from rule
are unreferenced. Note that this doesn't `drop` the captures, merely
removes their association with the tag. This means subsequent calls to
`backref` and `backmatch` will no longer "see" these tagged captures.
Keep a separate stack for tagged references. May cause pegs to
use more memory but makes the backref and backmatch features much more
powerful.
Also disables the second stack if backref and backmatch are not used in the peg.
These capture the line and column number of the current position
in the matched text. This is useful for error reporting as well
as indentation checking.
This works by lazily creating an index on first use that stores all
newline character indices in order. We can then do a binary search on
this to get both line number and column number in log(n) time.
This is good enough for most use cases and doesn't slow down the common case at all
- these will not be commonly used patterns in a hot loop so it is not worth to try and
optimize this at all. Constant time look up should be possible but at
the cost of complicating code and slowing down all matching to check for
new lines.
Inpsired by the REBOL operators of the same name, these
combinators match bytes up to or inculding a given pattern.
(to patt) is (almost) equalivalent to (any (if-not patt 1)), and
(thru patt) is equivalent to (* (to patt) patt). The one difference
is that if the end of the input is reached and patt is not
matched, the entire pattern does not match.
This way we can support fewer build configurations. Also, remove
all undefined behavior due to use of memcpy with NULL pointers. GCC
was exploiting this to remove NULL checks in some builds.
This means we can add new properties to abstract types without
breaking old code. We can also make simple abstract types without
needing to add many NULL fields to the type.