REBOL Enhancement Proposal for the PARSE dialect
Author: Gabriele Santilli
Date: 9-May-2006
Contents:
1. Abstract
2. THROW
3. LITERAL
4. TO and THRU
5. NOT
6. FAIL
7. IF or CHECK
8. DO-RULE
9. REMOVE, REPLACE, REPLACE-ONLY
10. INTO-STRING (or INTO/STRING)
11. RULE! datatype
12. PARSE handling FUNCTION! values
13. DO
13.1 Why raise an ERROR! ?
1. Abstract
This REP is a summary of various proposals regarding the PARSE dialect that have been
made in the past few years. I wrote a REP for PARSE in January 2003, and Ladislav
Mecir wrote other proposals a little while later; new ideas have been proposed lately
by Brian Hawley on the REBOL3 AltME world.
I originally proposed adding the DO and THROW commands. Ladislav proposed the LITERAL
command, subrule support for TO and THRU, and the NOT, FAIL and IF commands; he
also proposed a DO-RULE command that should not be confused with my DO proposal.
Brian proposed CHECK (similar to Ladislav's IF), REMOVE, REPLACE, REPLACE-ONLY and INTO-STRING;
he is also proposing a RULE! datatype. I am now proposing that PARSE should handle FUNCTION!
values specially, which would allow most of the commands proposed here to be added as mezzanine
code (although it may still make sense to have them as keywords to avoid clashes with
normal REBOL functions).
2. THROW
The THROW command modifies the behavior of PARSE when a rule fails. It could be used while
parsing both blocks and strings. If the rule that follows the THROW command fails, PARSE should
raise an ERROR!, reporting the position of the PARSE cursor just before matching that rule in
the NEAR field of the error. THROW should accept a string argument as the error message.
rule: [pair! | 2 number!]
parse ["something else"] rule
; PARSE returns FALSE
parse ["something else"] [throw "Size expected" rule]
; ** Parse Error: Size expected
; ** Near: "something else"
digit: charset "0123456789"
parse "1234abc" [throw "Digit expected" [some digit end]]
; ** Parse Error: Digit expected
; ** Near: abc
With this command available, just by adding the THROW keyword in the rules you would
get better error reporting for your dialect.
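The intended behavior of THROW can be modelled outside REBOL. The following is a minimal sketch in Python, not an implementation of PARSE: a rule is assumed to be a function taking the input and a position and returning the new position, or None on failure. The names ParseError, throw and some_digits_then_end are hypothetical and exist only for this illustration. The sketch reports the position where the wrapped rule started, as the prose above describes.

```python
DIGITS = set("0123456789")


class ParseError(Exception):
    """Hypothetical stand-in for the ERROR! that PARSE would raise."""

    def __init__(self, message, near):
        super().__init__(message)
        self.near = near  # input remaining at the point where the rule started


def some_digits_then_end(text, pos):
    """Plays the role of [some digit end]: one or more digits, then end of input."""
    start = pos
    while pos < len(text) and text[pos] in DIGITS:
        pos += 1
    if pos == start or pos != len(text):
        return None
    return pos


def throw(message, rule):
    """Wrap a rule so that a failed match raises instead of returning None."""
    def wrapped(data, pos):
        result = rule(data, pos)
        if result is None:
            raise ParseError(message, near=data[pos:])
        return result
    return wrapped


guarded = throw("Digit expected", some_digits_then_end)
guarded("1234", 0)       # matches and returns 4
# guarded("1234abc", 0)  # would raise ParseError("Digit expected")
```

The unwrapped rule still fails quietly, so a dialect author could choose, rule by rule, where a failure should become an error.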
3. LITERAL
Suppose we want to use PARSE to check whether a block contains the numbers 1 2 3 in this order. The rule can be:
parse block [1 1 1 1 1 2 1 1 3]
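which looks awful, because an integer in a block rule is read as a repeat count, so each literal number n has to be written as 1 1 n. For this reason Ladislav proposes something like: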
parse block [literal [1 2 3]]
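As a rough model of what LITERAL would do, here is a small Python sketch (not REBOL). It assumes a rule is a function from an input list and a position to a new position, or None on failure; the name literal is hypothetical.

```python
def literal(values):
    """Build a rule that matches the given values exactly, in order."""
    expected = list(values)

    def rule(data, pos):
        end = pos + len(expected)
        # Slicing past the end yields a shorter list, so the comparison fails.
        return end if data[pos:end] == expected else None

    return rule


one_two_three = literal([1, 2, 3])
one_two_three([1, 2, 3], 0)  # matches and returns 3
one_two_three([1, 2, 4], 0)  # fails and returns None
```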
4. TO and THRU
It has been asked many times by the community to allow TO and THRU to accept a subrule
as well as a literal value to search for. Basically,
thru rule
should be equivalent to
(cont: [end skip]) some [rule (cont: none) break | skip] cont
(with maybe some optimizations where possible).
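The equivalence above amounts to trying the subrule at each position in turn. A minimal Python sketch of that search follows; it is a model of the semantics rather than REBOL code, and the names to, thru and two_digits are hypothetical. A rule is assumed to be a function from input and position to a new position, or None on failure.

```python
DIGITS = set("0123456789")


def two_digits(text, pos):
    """Sample subrule: two consecutive decimal digits."""
    chunk = text[pos:pos + 2]
    if len(chunk) == 2 and all(ch in DIGITS for ch in chunk):
        return pos + 2
    return None


def thru(rule):
    """Advance until the rule matches; leave the cursor after the match."""
    def wrapped(data, pos):
        while pos <= len(data):  # the tail position is tried as well
            result = rule(data, pos)
            if result is not None:
                return result
            pos += 1
        return None
    return wrapped


def to(rule):
    """Advance until the rule matches; leave the cursor before the match."""
    def wrapped(data, pos):
        while pos <= len(data):
            if rule(data, pos) is not None:
                return pos
            pos += 1
        return None
    return wrapped


thru(two_digits)("ab12cd", 0)  # returns 4, just after "12"
to(two_digits)("ab12cd", 0)    # returns 2, just before "12"
```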
5. NOT
The rule
not rule
would succeed if rule fails and vice versa. It would be equivalent to:
[rule (cont: [end skip]) | none (cont: none)] cont
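One consequence of this equivalence is that NOT never consumes input: when the inner rule fails, the match continues from the original position. The Python sketch below models that behavior; it is not REBOL, and the names not_ and digit are hypothetical. Rules are assumed to be functions from input and position to a new position, or None on failure.

```python
DIGITS = set("0123456789")


def digit(text, pos):
    """Sample rule: a single decimal digit."""
    return pos + 1 if pos < len(text) and text[pos] in DIGITS else None


def not_(rule):
    """Succeed, without moving the cursor, exactly when the rule fails."""
    def wrapped(data, pos):
        return pos if rule(data, pos) is None else None
    return wrapped


not_(digit)("a1", 0)  # returns 0: no digit here, cursor unchanged
not_(digit)("1a", 0)  # returns None: a digit is present
```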
6. FAIL
Even in the examples above we often use the [end skip] idiom to force a rule to
fail. PARSE should probably have a FAIL keyword that always fails (the opposite of
NONE), or maybe we should have:
fail: [end skip]
by default in REBOL.
7. IF or CHECK
Ladislav proposes an IF command, so that:
if [condition]
is equivalent to:
(cont: unless condition [[end skip]]) cont
Brian proposes to call it CHECK and express it as:
check (condition)
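Either spelling describes a rule that consumes nothing and succeeds only if a REBOL condition holds. A minimal Python model is sketched below (not REBOL); sequence, set_integer and check are hypothetical helper names, and rules are assumed to be functions from input and position to a new position, or None on failure.

```python
def sequence(*rules):
    """Match each rule in turn; fail as soon as one of them fails."""
    def rule(data, pos):
        for step in rules:
            pos = step(data, pos)
            if pos is None:
                return None
        return pos
    return rule


def set_integer(store, name):
    """Plays the role of `set name integer!`."""
    def rule(data, pos):
        if pos < len(data) and isinstance(data[pos], int):
            store[name] = data[pos]
            return pos + 1
        return None
    return rule


def check(condition):
    """Succeed without consuming input only when condition() is true."""
    def rule(data, pos):
        return pos if condition() else None
    return rule


values = {}
small_integer = sequence(
    set_integer(values, "n"),
    check(lambda: values["n"] < 10),
)

small_integer([5], 0)   # returns 1
small_integer([50], 0)  # returns None: the condition rejects the value
```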
8. DO-RULE
Ladislav proposes a way to apply a dynamically built rule: PARSE would use
the result of evaluating some REBOL code as the rule to match.
do-rule [append ['a] ['b]]
would match the rule:
'a 'b
I would prefer a paren instead of a block for the code, though.
9. REMOVE, REPLACE, REPLACE-ONLY
Brian proposes the following commands and the equivalent rules for the current PARSE:
remove rule ==> tmp1: rule tmp2: :tmp1 (remove/part :tmp1 :tmp2)
replace rule (code) ==> tmp1: rule tmp2: :tmp1 (tmp1: change/part :tmp1 (code) :tmp2) :tmp1
replace-only rule (code) ==> tmp1: rule tmp2: :tmp1 (tmp1: change/part/only :tmp1 (code) :tmp2) :tmp1
Note that if parse operations are changed to take refinements, replace-only could be expressed as replace/only.
This would be slower in a native implementation but it would look more REBOL-like if that matters to you.
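The equivalences above also fix where the cursor ends up: REMOVE leaves it at the start of the removed span, while REPLACE and REPLACE-ONLY leave it just after the inserted material. The Python sketch below models this on a mutable list. It is not REBOL; the function names are hypothetical, and make_value stands in for the (code) paren.

```python
def integer(data, pos):
    """Sample rule: a single integer element."""
    return pos + 1 if pos < len(data) and isinstance(data[pos], int) else None


def remove(rule):
    """Delete what the rule matched; the cursor stays where the match began."""
    def wrapped(data, pos):
        end = rule(data, pos)
        if end is None:
            return None
        del data[pos:end]
        return pos
    return wrapped


def replace(rule, make_value):
    """Splice the items returned by make_value() over the matched span."""
    def wrapped(data, pos):
        end = rule(data, pos)
        if end is None:
            return None
        items = list(make_value())
        data[pos:end] = items
        return pos + len(items)
    return wrapped


def replace_only(rule, make_value):
    """Insert the value returned by make_value() as one single element."""
    def wrapped(data, pos):
        end = rule(data, pos)
        if end is None:
            return None
        data[pos:end] = [make_value()]
        return pos + 1
    return wrapped


block = [1, "a", 2]
remove(integer)(block, 0)  # returns 0; block is now ["a", 2]
```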
10. INTO-STRING (or INTO/STRING)
Brian proposes a way to parse a substring while doing block parsing.
into-string rule ==> set tmp1 string! (tmp1: unless parse tmp1 rule [fail]) tmp1
As above, this could maybe be expressed as INTO/STRING.
11. RULE! datatype
Using Brian's words:
Here's my first attempt at a pattern for recursion-safe temporaries:
use [var ...] [rule ...] ==> (tmp1: use [var ...] copy/deep [[rule ...]]) tmp1
It would only work with a directly specified variable and rule block, and you should
only use the temporaries directly in the rule block or they won't get rebound.
Now, using REBOL 3's closure (probably better):
use [var ...] [rule ...] ==> (tmp1: do closure [/local var ...] [[rule ...]]) tmp1
REBOL's existing function recursion support wouldn't work because the function
returns before the rule is run.
I would prefer a native implementation of this operation if possible.
The use operation above would be a good semantic model for parse rule closures with recursion-safe
temporaries. Imagine a new datatype called rule!, a parse rule block bundled with a recursion-safe
context for local variables. You would create one with a mezzanine like this:
parse-rule: func [locals [block!] rule [block!]] [make rule! reduce [locals rule]]
It would be the equivalent of a function made by the HAS mezzanine - local variables, no parameters.
The rule would be prebound to the context and the context would be fixed up on recursion just like
function contexts are. Any time parse would accept a rule block! it would also accept a rule! value.
Now, that USE operation above was just giving a semantic model - it is too slow to use as-is.
To be practical it would have to be implemented as the rule! type (whatever you want to call it)
for efficiency. If you allowed a rule! to take parameters (that would take some significant changes to parse)
you could do some really interesting guru stuff that even Perl 6 couldn't match, but that may be a feature
for another day. For now, all I am suggesting is a bundled context that would be fixed up to be recursion-safe,
just like functions are.
12. PARSE handling FUNCTION! values
To avoid adding a new datatype as Brian proposes, I would make PARSE handle FUNCTION! values,
since a FUNCTION! is already a block with a context and already handles recursion
correctly. Basically, the following code:
rule: does ['a 'b]
parse block [rule]
would be the same as:
rule: ['a 'b]
parse block [rule]
The advantage is that you can then write:
rule: has [val] [set val number! (print val)]
parse [1 2 3] [some rule]
which would even be recursion safe (as Brian asks for RULE!). If we also allow arguments,
we can then implement most of the commands suggested above as functions, for example:
remove: func [rule /local tmp1 tmp2] [tmp1: rule tmp2: :tmp1 (remove/part :tmp1 :tmp2)]
and so on. This will probably need more discussion.
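Why recursion safety matters can be shown with a small model. The Python sketch below is not REBOL; it imitates a recursive rule of the shape [set val integer! opt [into rule] (append out val)], once with a single shared variable (like a global word in a plain rule block) and once with a variable local to each invocation (like a word in a function context). Both function names are hypothetical.

```python
def parse_shared(block, out, shared):
    """Recursive rule whose temporary lives in one shared dictionary."""
    pos = 0
    while pos < len(block):
        shared["val"] = block[pos]           # set val integer!
        pos += 1
        if pos < len(block) and isinstance(block[pos], list):
            parse_shared(block[pos], out, shared)  # into rule (recursion)
            pos += 1
        out.append(shared["val"])            # the recursion has overwritten val
    return out


def parse_local(block, out):
    """Same rule, but every invocation has its own temporary."""
    pos = 0
    while pos < len(block):
        val = block[pos]
        pos += 1
        if pos < len(block) and isinstance(block[pos], list):
            parse_local(block[pos], out)
            pos += 1
        out.append(val)                      # still the value set at this level
    return out


parse_shared([1, [2], 3], [], {})  # returns [2, 2, 3]: the outer 1 is lost
parse_local([1, [2], 3], [])       # returns [2, 1, 3]
```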
13. DO
I'm leaving my DO proposal last because it is debatable, and although I'd use it a lot
I don't want to push too much for it (as no one else seems to have asked for something like
this). The reasoning behind DO is that it would make it easier to write dialect parsers;
for example, dialects like VID that mix REBOL expressions with the dialect itself would greatly
benefit from the DO command. I also think that this kind of mixing is very useful,
even if it looks like it could create confusion for the user. If a dialect does not allow the user
to use REBOL expressions directly, the user will need to use COMPOSE (or other means) to construct
the dialect code on the fly, which consumes more memory and slows down the process.
Also, if the dialect uses PAREN!s (e.g. like the PARSE dialect), the user has to use
workarounds to keep COMPOSE from evaluating the wrong PAREN!, and so on.
The DO command should work similarly to the SET command. (Like the SET command, the DO command makes
sense only when parsing a BLOCK!.) But while SET sets a word with the value under the "PARSE cursor"
if it matches the subsequent PARSE rule, the DO command should set a word with the value obtained by
evaluating the REBOL expression under the PARSE cursor. If the result value does not match the rule
immediately following DO, PARSE should raise an ERROR! (something like: "Expecting [pair! | number!], not 10-Jan-2003");
the rule should be matched by PARSE as if the expression was replaced by a block containing the
resulting value and the rule was the argument of the INTO command. (I think this is the behavior that
makes the most sense, but I'm very open to suggestions.)
parse [1 + 1] [do result integer!]
; should evaluate the expression "1 + 1" and set the word 'RESULT to
; the resulting value (2); the PARSE cursor should be advanced after
; the expression (i.e. to the tail of the block, in this case).
circle-rule: [
    'circle
    do center pair!
    do radius number! (radius: to-integer radius + 0.5)
]
parse [circle as-pair x y d / 2] circle-rule
; should evaluate "as-pair x y" and set 'CENTER to the result,
; then should evaluate "d / 2" and set 'RADIUS to the result, etc.
parse [circle 10 20] circle-rule
; should raise an ERROR! like:
; ** Parse Error: Expecting pair!, not 10
; ** Near: 10 20
; (i.e. with the NEAR field reporting the position of the PARSE cursor
; before the evaluation)
parse [circle 5x5 7 / 0] circle-rule
; ** Math Error: Attempt to divide by zero
; ** Near: 7 / 0
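The three outcomes shown above (a value of the expected type, a value of the wrong type, and an error inside the expression itself) can be modelled in a few lines. The Python sketch below is only an approximation: evaluate_next is a toy stand-in for evaluating one REBOL expression, handling just numbers, variable names and left-to-right infix arithmetic, and every name in it is hypothetical.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}


class ParseError(Exception):
    """Hypothetical stand-in for the ERROR! that PARSE would raise."""

    def __init__(self, message, near):
        super().__init__(message)
        self.near = near


def evaluate_next(block, pos, env):
    """Evaluate one infix expression, left to right; return (value, new_pos)."""
    def operand(index):
        token = block[index]
        return env[token] if isinstance(token, str) else token

    value = operand(pos)
    pos += 1
    while pos + 1 < len(block) and block[pos] in OPS:
        value = OPS[block[pos]](value, operand(pos + 1))
        pos += 2
    return value, pos


def do(label, expected_type, block, pos, env):
    """Evaluate the expression at the cursor, then insist on the expected type."""
    value, new_pos = evaluate_next(block, pos, env)
    if not isinstance(value, expected_type):
        # NEAR reports the cursor position from before the evaluation.
        raise ParseError(f"Expecting {label}, not {value!r}", near=block[pos:])
    return value, new_pos


radius, cursor = do("number!", (int, float), ["d", "/", 2], 0, {"d": 10})
# radius is 5.0 and cursor is 3, the tail of the block
```

An error raised by the expression itself, such as a division by zero, simply propagates, which mirrors the last example above.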
13.1 Why raise an ERROR! ?
The choice of raising an error instead of just failing the match when the result does not
match the rule following DO was made for a reason. Since evaluating an expression can have side effects,
it would not be a good idea to just fail the match (i.e. like SET does).
While it makes perfect sense to write a rule like:
circle-parameters: [any [set radius number! | set center pair!]]
it would not be a good idea to write:
circle-parameters: [any [do radius number! | do center pair!]]
because code such as:
c: 5x5
; ...
parse [c: c * 2 10] circle-parameters
would actually set CENTER (and C) to 20x20, while the user would likely expect it to be set to 10x10.
A rule like that could be written as:
circle-parameters: [any [do value [number! (radius: value) | pair! (center: value)]]]
and should raise an error if the result is neither a number! nor a pair!, thus preventing PARSE from continuing
with other rules and evaluating the expression multiple times. If using a temporary word doesn't look
like a good idea, it would be possible to make DO accept NONE as the word to set, with the meaning of
"do not set any word, i.e. discard the result", and then write it as:
circle-parameters: [any [do none [set radius number! | set center pair!]]]
taking advantage of the way I have defined the behavior of the rule following DO.
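The double evaluation described in this section is easy to reproduce in a model. The Python sketch below is not REBOL: a pair is represented as a tuple, double_c plays the role of the expression c: c * 2, and the two helper functions (all names hypothetical) contrast a DO that merely fails and backtracks with a DO that evaluates once and raises on a mismatch.

```python
def double_c(env):
    """Side-effecting expression: doubles the pair stored under "c"."""
    x, y = env["c"]
    env["c"] = (x * 2, y * 2)
    return env["c"]


def do_with_backtracking(expression, env, expected_types):
    """If DO just failed: every alternative evaluates the expression again."""
    for expected in expected_types:
        value = expression(env)
        if isinstance(value, expected):
            return value
    return None


def do_once(expression, env, expected_types):
    """As proposed: evaluate once, then match the result or raise."""
    value = expression(env)
    if any(isinstance(value, expected) for expected in expected_types):
        return value
    raise TypeError(f"Unexpected result: {value!r}")


NUMBER_OR_PAIR = [(int, float), tuple]

do_with_backtracking(double_c, {"c": (5, 5)}, NUMBER_OR_PAIR)  # returns (20, 20)
do_once(double_c, {"c": (5, 5)}, NUMBER_OR_PAIR)               # returns (10, 10)
```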