Parrot Virtual Machine/Advanced PGE
Information in this chapter will likely be covered in more detail when and if a book about Perl 6 is created.
Advanced PGE
We've already looked at some of the basics of parser construction using PGE and NQP. In this chapter we are going to take a more in-depth look at some features of the grammar engine that we haven't seen yet. Some of these more advanced features, such as inline PIR code, assertions, function calls and built-in token types, will make the life of a compiler designer much easier, but they are not needed for most basic tasks.
PGE is an implementation of the Perl 6 grammar rules engine, which will eventually be used directly in the Perl 6 on Parrot implementation, "Rakudo". The Perl 6 grammar rules specification is rich and varied, and covering all aspects of it is beyond the scope of this book. We will cover the parts of PGE that are most useful for building compilers, but we cannot possibly cover all of its advanced details.
regex, token and proto
A regex is a high-level matching operation that allows backtracking. A token is a low-level matching operation that does not allow backtracking. A proto is like a regex but allows multiple dispatch. Think of a proto declaration as being a prototype or signature that several functions can match.
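The difference in backtracking can be illustrated outside PGE. The following Python sketch is only an analogy, not PGE code: the hypothetical function match_star_then_char mimics the pattern a* a, where the greedy a* first takes every "a" it can. A regex is allowed to give characters back until the final "a" matches; a token keeps whatever the greedy part consumed, so the same pattern fails.

```python
def match_star_then_char(text: str, ch: str, backtrack: bool) -> bool:
    """Match the pattern  ch* ch  at the start of text (Python analogy, not PGE)."""
    # Greedy phase: the starred item consumes as many `ch` as it can.
    n = 0
    while n < len(text) and text[n] == ch:
        n += 1
    # A regex may give characters back one at a time (backtracking);
    # a token must continue from the greedy position only.
    positions = range(n, -1, -1) if backtrack else [n]
    for pos in positions:
        # The final `ch` must match at this position.
        if pos < len(text) and text[pos] == ch:
            return True
    return False


print(match_star_then_char("aaa", "a", backtrack=True))   # True
print(match_star_then_char("aaa", "a", backtrack=False))  # False
```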
Inline PIR Sections
PIR can be embedded directly into both PGE grammar files and NQP files. This is useful for filling gaps that NQP cannot handle because of its limitations. It is also sometimes helpful to insert active processing into a grammar, so that the parser can be directed in a more intelligent way.
In NQP, PIR code can be inlined using the PIR statement, followed by a quoted string of PIR code. If you think it looks better, this quoted string can also be written with a Perl-like "qw< ... >" style of quotation.
In PGE, inline PIR can be inserted using double curly brackets, "{{ ... }}". Once in PIR mode, you can access the current match object with the statement $Px = find_global "$/", where $Px is any valid PMC register and x is a number.
Built-In Token Types
PGE predefines a number of common rules, such as ws for whitespace, to help with parsing. However, you can redefine any of these rules in your own grammar if you don't like their default behavior.
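Redefining a built-in rule can be thought of like overriding a method in a subclass. The Python sketch below is an analogy under that assumption, not the PGE API: BaseGrammar and CommentGrammar are hypothetical classes, and each ws method takes a string and a position and returns the position after the whitespace it skipped. The subclass keeps the default behavior and additionally treats "#" comments as whitespace.

```python
class BaseGrammar:
    """Hypothetical grammar with a predefined ws rule (analogy, not PGE)."""

    def ws(self, text: str, pos: int) -> int:
        # Default behavior: skip spaces, tabs and newlines.
        while pos < len(text) and text[pos] in " \t\n":
            pos += 1
        return pos


class CommentGrammar(BaseGrammar):
    """Redefines ws so that '#' comments also count as whitespace."""

    def ws(self, text: str, pos: int) -> int:
        while True:
            pos = super().ws(text, pos)          # reuse the default rule
            if pos < len(text) and text[pos] == "#":
                end = text.find("\n", pos)       # skip to end of line
                pos = len(text) if end == -1 else end
            else:
                return pos


source = "  # note\nx"
print(BaseGrammar().ws(source, 0))     # 2: stops at the '#'
print(CommentGrammar().ws(source, 0))  # 9: stops at the 'x'
```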
Calling Functions
Functions or subroutines are an integral part of modern programming practice. As such, support for them is part of the PAST system, and they are relatively easy to implement. We're going to cover a little bit of necessary background information first, and then we will discuss how to put all the pieces together to create a system with usable subroutines.
return Described
In Parrot, control flow operations, especially returns from subroutines, are implemented as special control exceptions. The reason why a return is done as an exception and not as a basic .return() PIR statement is a little bit complicated. Many languages allow for nested lexical scopes, where variables defined in an "inner" scope cannot be seen, accessed, or modified by statements in the "outer" scope. In most compilers, this behavior is enforced by the compiler directly, and is invisible when the code is converted to assembly and machine languages. However, PIR is like an assembly language for the Parrot system, and it's not possible to hide things at that level. All local variables are local to the entire subroutine and cannot be localized to a single part of a subroutine. To implement nested scopes, Parrot instead uses nested subroutines. As a consequence, a basic .return() executed inside an inner scope would only leave that inner subroutine, not the enclosing function; a control exception can propagate out to the caller of the enclosing function instead.
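As a rough Python analogy (not PIR), the sketch below models an inner block compiled as its own nested function, here the hypothetical loop_body. The block's local variable temp stays hidden from the enclosing function, but a plain return inside the block only leaves loop_body, so the loop keeps running after the first negative value.

```python
def enclosing_function(values):
    """Python analogy: an inner scope compiled as a nested function."""
    found = []                  # outer-scope variable, visible to the block

    def loop_body(v):           # hypothetical stand-in for an inner block
        temp = v * 2            # 'temp' exists only inside this scope
        if v < 0:
            found.append(v)
            return "early"      # leaves loop_body() only
        return temp

    for v in values:
        loop_body(v)            # the early return cannot stop this loop
    # The inner variable is not visible here.
    return found, "temp" in locals()


print(enclosing_function([1, -2, -3]))  # ([-2, -3], False)
```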
Returns and Return Values
Functions can be made to return a value using the "return" PAST::Op type. The return system is based on a control exception. Exceptions, as we've discussed before, move control flow to a specified location called the "exception handler". For a return exception, the handler is the code directly after the original function call. The return values (currently, the return PAST node only allows a single return value) are passed as exception data items and are retrieved by the control exception handler.
All of these details are generally hidden from the programmer, and you can treat a return PAST node exactly as you would expect. You pass a return value, if any, to the return PAST node. The current function ends and its scope is destroyed. Control flow returns to the calling function, and the return value from the function is made available.
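A minimal Python sketch of this mechanism, assuming a hypothetical ReturnException class and call helper rather than anything Parrot or PAST actually provides: the nested block raises the exception with its single return value, the exception unwinds past the enclosing function, and the handler placed directly after the call retrieves the value.

```python
class ReturnException(Exception):
    """Hypothetical control exception carrying a single return value."""

    def __init__(self, value):
        super().__init__(value)
        self.value = value


def call(function, *args):
    """Invoke function; the handler sits directly after the call."""
    try:
        function(*args)
    except ReturnException as ret:
        return ret.value        # retrieve the value from the exception
    return None                 # no explicit return was executed


def first_negative(values):
    def loop_body(v):           # inner scope as a nested function
        if v < 0:
            # Return from first_negative itself, not just loop_body.
            raise ReturnException(v)

    for v in values:
        loop_body(v)


print(call(first_negative, [3, -1, -7]))  # -1
print(call(first_negative, [3, 4]))       # None
```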
Assertions
Repetition Counting with **
Metasyntactic Assertions
You can call a function from within a rule using the <FUNC( )> form.
Non-Capturing Assertions
Use the <. > form to match a subrule without capturing it: the subrule still has to match, but its result is not stored in the enclosing match object.
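Python's re module has a loosely comparable feature, the (?:...) non-capturing group. The analogy is imperfect, since <. > calls a named subrule while Python groups are anonymous, but it shows the same effect: the text is still matched, it just isn't recorded.

```python
import re

# Capturing: the whitespace between the words is recorded as a group.
capturing = re.fullmatch(r"(\w+)(\s+)(\w+)", "hello   world")
# Non-capturing: (?:...) still matches the whitespace but records nothing.
non_capturing = re.fullmatch(r"(\w+)(?:\s+)(\w+)", "hello   world")

print(capturing.groups())      # ('hello', '   ', 'world')
print(non_capturing.groups())  # ('hello', 'world')
```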
Indirect Rules
In a rule of the form <$ >, the variable's contents, which can be a string or some other data, are converted into a regular expression and then run.
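In Python terms, an indirect rule resembles compiling a pattern that is only known at run time. This sketch uses the hypothetical helper indirect_match; the str() call stands in for converting "some other data" into pattern text, which is an assumption made for the illustration rather than a description of what PGE does internally.

```python
import re


def indirect_match(pattern_data: object, text: str) -> bool:
    # Convert the data to pattern text, compile it at run time, then run it.
    pattern = re.compile(str(pattern_data))
    return pattern.fullmatch(text) is not None


keywords = "if|else|while"                # a string holding a pattern
print(indirect_match(keywords, "while"))  # True
print(indirect_match(keywords, "for"))    # False
print(indirect_match(2024, "2024"))       # True: non-string data via str()
```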
Character Classes
Rules of the form <[ ]> define custom character classes. Rules of the form <-[ ]> define complemented character classes, which match any character not listed between the brackets.
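Python's re character classes follow the same idea, with [...] for a custom class and [^...] for its complement. This is an analogy only; the corresponding PGE forms are shown in the comments for comparison.

```python
import re

vowel = re.compile(r"[aeiou]")        # compare <[aeiou]>
not_vowel = re.compile(r"[^aeiou]")   # compare <-[aeiou]>

print(vowel.findall("parrot"))        # ['a', 'o']
print(not_vowel.findall("parrot"))    # ['p', 'r', 'r', 't']
```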
Built-in Assertions
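<?before>, <!before>: succeed if the given pattern does, or does not, match at the current position, without consuming any input
<?after>, <!after>: succeed if the given pattern does, or does not, match immediately before the current position
<?same>, <!same>: succeed if the characters on either side of the current position are, or are not, identical
<.ws>: match whitespace without capturing it
<?at()>, <!at()>: succeed if the current position is, or is not, the given position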
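Several of these have direct counterparts in Python's re module: (?=...) and (?!...) are zero-width lookaheads comparable to <?before> and <!before>, while (?<=...) and (?<!...) are lookbehinds comparable to <?after> and <!after>. The sketch below shows all four on a sample string; it is an analogy only and does not cover <?same> or <?at()>.

```python
import re

text = "cost: $40, 25 EUR, 7 GBP"

# Like <?before ...>: numbers followed by " EUR"
before = re.findall(r"\d+(?= EUR)", text)
# Like <!before ...>: whole numbers NOT followed by " EUR"
not_before = re.findall(r"\b\d+\b(?! EUR)", text)
# Like <?after ...>: numbers preceded by "$"
after = re.findall(r"(?<=\$)\d+", text)
# Like <!after ...>: whole numbers NOT preceded by "$"
not_after = re.findall(r"(?<!\$)\b\d+", text)

print(before)      # ['25']
print(not_before)  # ['40', '7']
print(after)       # ['40']
print(not_after)   # ['25', '7']
```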
Partial Matches
You can specify a partial match, a match which attempts to match as much as possible and never fails, with the <* > form.
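A partial match on a literal can be sketched as a function that returns the length of the longest prefix of the literal found at the current position. The helper partial_literal below is hypothetical and assumes the <* > form applied to a literal string behaves as described above; because zero is a valid length, it never fails.

```python
def partial_literal(literal: str, text: str, pos: int = 0) -> int:
    """Return how many leading characters of literal match text at pos.

    Never fails: a result of 0 is a valid, empty match.
    """
    n = 0
    while (n < len(literal) and pos + n < len(text)
           and text[pos + n] == literal[n]):
        n += 1
    return n


print(partial_literal("xyz", "xyzzy"))  # 3
print(partial_literal("xyz", "xya"))    # 2
print(partial_literal("xyz", "abc"))    # 0
```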
Recursive Calls
You can recurse back into subrules of the current match rule using the <~~ > form.
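Recursion is what lets a rule match nested structure such as balanced parentheses. Python's re module has no recursion operator, so this hypothetical sketch writes the recursive rule by hand: balanced matches "(", then calls itself for each nested group, then expects ")".

```python
def balanced(text: str, pos: int = 0) -> int:
    """Match  '(' <self>* ')'  at pos; return the end position, or -1."""
    if pos >= len(text) or text[pos] != "(":
        return -1
    pos += 1
    # Recurse into the same rule for every nested group.
    while pos < len(text) and text[pos] == "(":
        pos = balanced(text, pos)
        if pos == -1:
            return -1
    if pos < len(text) and text[pos] == ")":
        return pos + 1
    return -1


def is_balanced(text: str) -> bool:
    return balanced(text) == len(text)


print(is_balanced("(()())"))  # True
print(is_balanced("(()"))     # False
```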
Resources