Last modified on 25 January 2010, at 00:32

Parrot Virtual Machine/Not Quite Perl

Not Quite Perl

Note:
The source for the NQP compiler on Parrot can be found in the compilers/nqp directory in the Parrot repository.

Not Quite Perl (NQP) is an implementation of a subset of the Perl 6 language which was originally intended to help bootstrap the implementation of Perl 6. In other words, the Perl 6 developers are writing the Perl 6 compiler in a subset of the Perl 6 language itself. This bootstrapping was accomplished by first writing a small NQP compiler using PIR. After the NQP compiler was completed, programs could then be written in NQP instead of having to write them entirely in PIR.

NQP is not just a tool reserved for use with Perl 6, however. Other languages are using NQP as a light-weight implementation language. A major benefit to NQP is that it does not rely on any external code libraries which would be subject to change over time. Because of its small footprint, however, NQP tends to lack many features of higher-level programming languages, and learning to program without using some common constructs can be challenging at first.

Variables in NQP

Here we are going to discuss some of the basics of NQP programming. Experienced Perl programmers, even programmers who are familiar with Perl 5 but not necessarily Perl 6, will find most of this to be a simple review.

NQP is not Perl 5 or Perl 6. This point cannot be stressed enough. Many features of Perl are missing from NQP, which sometimes means doing a task the hard way. One example is assignment: in NQP, we use the := operator, which is called the bind operator. Unlike normal variable assignment, bind does not copy the value from one "container" to another. Instead, it creates a link between the two variables, and they are, from that point forward, aliases of the same container. This is similar to the way copying a pointer in C does not copy the data being pointed to.
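
The difference between binding and copying can be sketched outside NQP. The following Python analogy is only an illustration (Python is not part of the NQP toolchain, and the variable names are made up): assigning a list to a second name in Python behaves much like := in that both names refer to one object, whereas an explicit copy produces an independent value.

```python
# Python analogy for NQP binding (:=); all names here are hypothetical.
original = [1, 2, 3]

alias = original             # like binding: both names refer to the same list
duplicate = list(original)   # an explicit copy: a separate, independent list

original.append(4)           # modify the shared object through one name

print(alias)                 # the alias sees the change
print(duplicate)             # the copy does not
print(alias is original)     # True: one underlying object, two names
```

The alias and the original stay in step because there is only one list; this is the behavior to keep in mind whenever := appears in the examples on this page.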

Variables in NQP typically have one of three basic types: scalars, arrays, and hashes. Scalars are single values, like an integer, a floating point number, or a string. Arrays are lists of scalars that are accessed with an integer index. A hash is a collection of scalars indexed by a string, called a key. All variable names have a sigil in front of them. A sigil is a punctuation symbol like "$", "@", or "%" that indicates the type of the variable.

Scalar variables have a "$" sigil. The following are examples of scalar values:

$x := 5;
$mystring := "string";
$pi := 3.1415;

Arrays use the "@" sigil. We can use arrays like this:

@myarray[1] := 5;
@b[2] := @a[3]; 

Notice that NQP does not have a list context like Perl 6 has. This means you can't do a list assignment, like:

@b := (1, 2, 3); # WRONG!
$b := (1, 2, 3); # CORRECT

NQP is designed to be bare-bones, with as little as is needed to support development of Perl 6. The list above could also be built one element at a time:

@b[0] := 1;
@b[1] := 2;
@b[2] := 3;

We'll discuss this in more detail a little bit further down the page. Hashes are prefixed with the "%" sigil:

%myhash{'mykey'} := 7;
%mathconstants{'pi'} := 3.1415;
%mathconstants{'2pi'} := 2 * %mathconstants{'pi'};

Hashes, for people who aren't familiar with Perl, are also known as Dictionaries (in Python) or associative arrays. Basically, they are like arrays but with string indices instead of integer indices.

Where's My List Context?

As we mentioned before, there is no such thing in NQP as list context (also called array context), which Perl 5 programmers might have expected. One of the big features of the Perl language is that it is context-aware: it treats things differently depending on whether you are in scalar or list context. Without this, it really isn't Perl. That's why it is called NQP: it's Perl-ish, but isn't quite Perl. In NQP you cannot write either of the following:

@a := (1, 2, 3);                   # Wrong!
%b := ("a" => "b", "c" => "d");    # Wrong!

Lexical And Global Variables

All variables (hashes, scalars, and arrays) can be declared to be lexical with the keyword "my", or global with the keyword "our". For those readers who have read the sections on PIR, "my" variables correspond to the .lex directive and the store_lex and find_lex instructions. "our" variables correspond to the set_global and find_global instructions. Here's an example:

This NQP code:

my $x;
my @y;
my %z;

translates (roughly) into this PIR code:

store_lex "$x", ""
$P1 = new 'ResizablePMCArray'
store_lex "@y", $P1
$P2 = new 'Hash'
store_lex "%z", $P2

Likewise, for "our":

This NQP code:

our $x;
our @y;
our %z;

translates (roughly) into this PIR code:

set_global "$x", ""
$P1 = new 'ResizablePMCArray'
set_global "@y", $P1
$P2 = new 'Hash'
set_global "%z", $P2
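
To make the distinction between "my" and "our" more concrete, here is a toy Python model of the two kinds of storage. It is a simplified sketch and not how Parrot implements lexicals: the LexPad class and the GLOBALS dictionary are hypothetical names, and storing always happens in the current scope, which is an assumption made to keep the example short.

```python
# A toy model of lexical versus global storage. This is NOT Parrot's
# implementation; LexPad and GLOBALS are hypothetical names for illustration.

GLOBALS = {}                      # stands in for the global namespace ("our")


class LexPad:
    """One scope's lexical variables ("my"), linked to the enclosing scope."""

    def __init__(self, outer=None):
        self.vars = {}
        self.outer = outer

    def store_lex(self, name, value):
        self.vars[name] = value   # simplification: always store in this scope

    def find_lex(self, name):
        pad = self
        while pad is not None:    # search this scope, then each outer scope
            if name in pad.vars:
                return pad.vars[name]
            pad = pad.outer
        raise KeyError(name)


outer_pad = LexPad()
outer_pad.store_lex("$x", 5)          # my $x := 5;  in an outer block
inner_pad = LexPad(outer=outer_pad)
inner_pad.store_lex("@y", [])         # my @y;       in a nested block

GLOBALS["%z"] = {}                    # our %z;

print(inner_pad.find_lex("$x"))       # found through the outer scope
print("@y" in outer_pad.vars)         # False: @y is private to the inner block
print("%z" in GLOBALS)                # True: visible regardless of scope
```

The point of the model is only the lookup rule: a lexical is visible in the block that declares it and in nested blocks, while a global lives in one shared table.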

NQP Control Constructs

NQP has the high-level control constructs that are missing in PIR, including loops and if/then/else branches. Because this is a Perl-like language, the loops that NQP does have are varied and relatively high-level.

Branching Constructs

In terms of branches, we have:

If/Then/Else
if ($key eq 'foo') {
    # then do some foo stuff
}
elsif ($key eq 'bar') {
    # then do the bar-related stuff
}
else {
    # otherwise do this
}
Unless/Then/Else

Looping Constructs

For
A "for" loop iterates over a list and sets $_ to the current element, as in Perl 5. NQP has no C-style loop with a STARTING_POINT and STEP_ACTION, although both Perl 5 and Perl 6 provide such a construct. Here is a basic for loop:
for (1,2,3) {
  # do something with $_
}
The NQP compiler handles this statement with the following PIR action method, which builds the PAST node for the loop:
.sub 'for_statement'
   .param pmc match
   .local pmc block, past
   $P0  = match['EXPR']
   $P0  = $P0.'item'()
   $P1  = match['block']
   block = $P1.'item'()
   block.'blocktype'('sub')
   .local pmc params, topic_var
   params = block[0]
   $P3 = get_hll_global ['PAST'], 'Var'
   topic_var = $P3.'new'('name'=>'$_', 'scope'=>'parameter')
   params.'push'(topic_var)
   block.'symbol'('$_', 'scope'=>'lexical')
   $P2  = get_hll_global ['PAST'], 'Op'
   $S1  = match['sym']
   past = $P2.'new'($P0, block, 'pasttype'=>$S1, 'node'=>match)
   match.'result_object'(past)
.end 

You can also iterate over the keys of a hash like so:

for (keys %your_hash) {
    # do something with %your_hash{$_}
}

where keys %your_hash creates a list of all of the keys in %your_hash, and the loop iterates through this list, setting $_ to hold the current key.
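
The PIR action above turns the loop body into a block that takes $_ as a parameter. The same idea can be sketched in Python: the body becomes a function that is called once per element. This is an analogy, not generated code, and for_each is a hypothetical helper name.

```python
# Sketch: a for loop modeled as a block (function) called once per element.
# "for_each" is a hypothetical name, not part of NQP or Parrot.

def for_each(items, block):
    for item in items:
        block(item)               # the current element plays the role of $_


seen = []
for_each((1, 2, 3), lambda topic: seen.append(topic * 10))
print(seen)                       # [10, 20, 30]

# Iterating over the keys of a hash, as in: for (keys %your_hash) { ... }
your_hash = {"a": 1, "b": 2}
pairs = []
for_each(your_hash.keys(), lambda key: pairs.append((key, your_hash[key])))
print(pairs)                      # [('a', 1), ('b', 2)]
```

In both calls the block receives the element itself (or the key itself), not its position in the list.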

While
"While" loops repeat their body for as long as a condition holds, testing the condition before each pass. In NQP, a while loop looks like this:
while(CONDITION) {
   LOOP_CONTENTS
}
Which roughly becomes in PIR:
 loop_top:
   unless CONDITION goto loop_end
   LOOP_CONTENTS
   goto loop_top
 loop_end:
Do/While
A "do/while" loop is similar to a while loop except that the condition is tested at the end of the loop and not at the beginning. This means that the loop is always executed at least once, and is repeated for as long as the condition remains true. In NQP:
do {
  LOOP_CONTENTS
} while(CONDITION);
In PIR:
loop_top:
   LOOP_CONTENTS
   unless CONDITION goto loop_end
   goto loop_top
loop_end:
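
The practical difference between the two loop shapes is where the test sits. The Python sketch below mirrors the label-and-jump structure of the PIR using "while True" and "break", since Python has no goto. The function names are hypothetical and the iteration counter exists only to make the behavior observable.

```python
# Sketch of the two loop shapes; "while True" with "break" stands in for
# the loop_top/loop_end labels. All names are hypothetical.

def while_loop(condition, body):
    iterations = 0
    while True:                   # loop_top:
        if not condition():       #   test first; leave the loop if false
            break
        body()                    #   loop contents
        iterations += 1           #   jump back to loop_top
    return iterations             # loop_end:


def do_while_loop(condition, body):
    iterations = 0
    while True:                   # loop_top:
        body()                    #   loop contents run before any test
        iterations += 1
        if not condition():       #   test last; leave the loop if false
            break                 #   otherwise jump back to loop_top
    return iterations             # loop_end:


def never():
    return False                  # a condition that is false from the start


def nothing():
    pass


print(while_loop(never, nothing))     # 0: the test comes first
print(do_while_loop(never, nothing))  # 1: the body runs before the test
```

With a condition that is false from the start, the while form runs its body zero times and the do/while form runs it exactly once.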


Operators

NQP supports a small set of operators for manipulating variables.

Operator                 Purpose
+, -                     Scalar addition and subtraction
*, /                     Scalar multiplication and division
%                        Integer modulus
$( ... )                 Convert the argument into a scalar
@( ... )                 Treat the argument as an array
%( ... )                 Treat the argument as a hash
~                        String concatenation
eq                       String equality comparison
ne                       String inequality comparison
:=                       Binding
>, <, >=, <=, ==, !=     Numeric equality and inequality comparisons
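
Readers coming from languages with a single equality operator may find the split between the string operators (eq, ne, ~) and the numeric ones surprising. The short Python analogy below shows why the distinction matters; Python has only one == operator, so the two kinds of comparison are imitated here by converting the operands explicitly. This is an illustration of the idea, not of NQP's implementation.

```python
# Analogy only: Python has one == operator, so the two kinds of comparison
# are spelled out by converting the operands first.
a, b = "10", "10.0"

print(str(a) == str(b))       # False: compared as strings, like eq
print(float(a) == float(b))   # True: compared as numbers, like ==
print(str(3) + "abc")         # 3abc: explicit concatenation, like ~
```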

The Match Object, Defaults, and Hashes

When a grammar rule matches and its {*} action marker is reached, a special type of hash object called the match object is generated and passed to the associated NQP method. This match object is given the special name $/. You can name it something different if you like, but you would lose the shorthand syntax that makes the $/ variable so convenient.

Ordinarily when you reference an object in a hash, you would use { } curly brackets. For example:

my %hash;
%hash{'key'} := "value";

In Perl 5, accessing a value through a hash reference requires something even more verbose:

$hashref->{'key'} = "value";
Note:
In NQP (and in Perl 6) angle-brackets magically "auto quote" what's inside them. So you can write <field> instead of {'field'} or <'field'>.

However, with the special default match object, you can use < > angle brackets instead. So, instead of writing

$/->{'key'}

We can write the less-verbose:

$<key>

The keys of the hash object correspond to the names of the subrules used in the grammar. So, if we had the grammar rule:

rule my_rule {
   <first> <second> <third> <andmore>
}

Our match object would have the fields:

$<first>
$<second>
$<third>
$<andmore>

A rule can also use the same subrule more than once:

rule my_rule {
   <first> <second> <first> <second>
}

Now, $<first> and $<second> are both two-item arrays. Also, we can extend this behavior to repetition operators in the grammar:

rule my_rule {
   <first>+ <second>*
}

Now, both $<first> and $<second> are arrays whose lengths indicate how many items were matched by each. You can use the prefix + operator or the scalar() function to get the number of items matched.
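
The way captures are grouped by subrule name can be modeled with a small Python function. This is a toy illustration and not the PGE implementation: build_match and its "repeated" parameter are hypothetical, with "repeated" standing in for subrules that carry a + or * quantifier in the grammar.

```python
# Toy model of how captures could be grouped in a match object. This is an
# illustration only, not the PGE implementation; build_match is hypothetical.

def build_match(captures, repeated=()):
    """Group (name, text) pairs by subrule name.

    A name captured more than once, or listed in `repeated` (standing in for
    a subrule quantified with + or *), maps to a list; others map to a value.
    """
    match = {name: [] for name in repeated}
    for name, text in captures:
        if name in match:
            if not isinstance(match[name], list):
                match[name] = [match[name]]
            match[name].append(text)
        else:
            match[name] = text
    return match


# rule my_rule { <first> <second> <first> <second> }
m = build_match([("first", "a"), ("second", "b"), ("first", "c"), ("second", "d")])
print(m["first"])         # ['a', 'c']
print(len(m["second"]))   # 2

# rule my_rule { <first>+ <second>* } where <second> matched nothing
m2 = build_match([("first", "x")], repeated=("first", "second"))
print(m2["first"], m2["second"])   # ['x'] []
```

Taking the length of a list here plays the same role as counting the matched items in the NQP match object.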

Examples

Example: Word Detection

We want to make a simple parser that detects the words "Hello" or "Goodbye". If either of these words is entered, we want to print out a success message and the word. If neither word was entered, we print an error. To pick out words in our input, we will use the built-in subrule <ident>.

rule TOP {
  <ident>
  $
  {*}
}

In this grammar rule we are looking for a single identifier (which will be a word, for our purposes), followed by the end of the file. Once we have these, we create our match object and we call our Action method:

method TOP($/) {
  if($<ident> eq "Hello") {
     say("success! we found Hello");
  }
  elsif($<ident> eq "Goodbye") {
     say("success! we found Goodbye");
  }
  else {
     say("failure, we found: " ~ $<ident>);
  }
  make PAST::Stmts.new();
}

Since the HLLCompiler class expects our action method to return a PAST node, we must create and return an empty PAST::Stmts node. When we run this parser on input, there are three possible outcomes:

  1. We've received a "Hello" or a "Goodbye", and the system will print a success message.
  2. We've received a different word, and we will receive an error message.
  3. We've received too many words, not enough words, or something that isn't a word. This will cause a parse error.
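
For readers who want to experiment without a Parrot installation, the three outcomes can be reproduced with a rough Python counterpart of the rule and its action. This is a sketch under assumptions: the IDENT pattern is a guess at what <ident> accepts (a letter or underscore followed by word characters), and check_word is a hypothetical name.

```python
import re

# Rough Python counterpart of the TOP rule and its action. IDENT is an
# assumption about what <ident> accepts: a letter or underscore followed by
# word characters, with optional surrounding whitespace.
IDENT = re.compile(r"\s*([A-Za-z_]\w*)\s*")


def check_word(text):
    m = IDENT.fullmatch(text)         # one identifier, then end of input
    if m is None:
        raise SyntaxError("parse error: expected exactly one word")
    word = m.group(1)
    if word in ("Hello", "Goodbye"):
        return "success! we found " + word
    return "failure, we found: " + word


print(check_word("Hello"))            # outcome 1
print(check_word("Parrot"))           # outcome 2
try:
    check_word("Hello Goodbye")       # outcome 3: too many words
except SyntaxError as err:
    print(err)
```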

Try it!

Example: Oct2Bin

Here is a simple example that shows how to make a program to convert octal numbers into binary. We start with a basic language shell generated by mk_language_shell.pl.

Grammar File:

grammar Oct2Bin::Grammar is PCT::Grammar;

rule TOP {
    <octdigit>+
    [ $ || <panic: Syntax error> ]
    {*}
}

token octdigit {'0'|'1'|'2'|'3'|'4'|'5'|'6'|'7'}

Action File:

class Oct2Bin::Grammar::Actions;

method TOP($/) {
    my @table;
    @table[0] := '000';
    @table[1] := '001';
    @table[2] := '010';
    @table[3] := '011';
    @table[4] := '100';
    @table[5] := '101';
    @table[6] := '110';
    @table[7] := '111';
    my $string := "";
    for $<octdigit> {
        $string := $string ~ @table[$_];
    }
    say( $string );
    make PAST::Stmts.new( );
}

Notice how in our actions file we had to instantiate the look-up table one element at a time. This is because NQP has no list context, so @table cannot be bound to a list of values in a single statement. Notice also that we have our TOP method return an empty PAST::Stmts node, to suppress warnings from PCT that there are no PAST nodes.
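
The conversion itself is a plain table lookup: each octal digit maps to exactly three binary digits. A Python sketch of the same logic is shown here for comparison; oct_to_bin is a hypothetical name, and the ValueError stands in for the grammar's panic branch.

```python
# Python sketch of the same conversion; oct_to_bin is a hypothetical name.
TABLE = ["000", "001", "010", "011", "100", "101", "110", "111"]


def oct_to_bin(octal_text):
    if not octal_text or any(ch not in "01234567" for ch in octal_text):
        raise ValueError("Syntax error")   # mirrors the <panic: ...> branch
    return "".join(TABLE[int(ch)] for ch in octal_text)


print(oct_to_bin("17"))    # 001111
print(oct_to_bin("725"))   # 111010101
```

Because every digit expands independently, the output is always three times as long as the input and leading zeros are preserved.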

PIR Actions

NQP isn't the only way to write action methods to accompany a grammar. It's an attractive tool for a number of reasons, but action methods can also be written in PIR or PASM. This is how the NQP compiler itself is implemented. Here is an example of how a PIR action might look:

.sub 'block' :method
   .param pmc match
   .param string key
   .local pmc past
   $P0 = get_hll_global ['PAST'], 'Stmts'
   past = $P0.'new'('node' => match)
   ...
   match.'result_object'(past)  # make $past;
.end

