ROSE Compiler Framework/AST Matching
The AST Matching Mechanism:
- rose/src/midend/astMatching
The following documentation is in the following file (but not finished yet and doesn't show up in doxygen):
- rose/src/midend/astMatching/AstMatching.docs
The examples give a good overview of what you can do with the matcher. Note it uses its own parser and implements its own specification language for specifying match-expressions (with a number of different operators). The matcher is implemented on top of the AST iterator - the matcher is for that reason a use case of the iterator.
Introduction
editThe AstMatching mechanism allows to specify arbitrary large patterns to be matched on any subtree in the AST. The patterns are specified as strings and the type names of the AST nodes can be used to specify the AST patterns. Additionally variables and some operators are available to allow the specification of complex patterns. Subtrees can also be ignored in the matching by using '_'. The binary operator '|' allows to combine different matching subexpressions into one expression. Variables are used for specifying pointers to which matched subtrees are stored in the matching result for further processing by the user.
In the following example we match assignments with variables on both sides, such as x=y, and assign the result to the variable $R.
#include "rose.h" #include "AstTerm.h" #include "AstMatching.h" AstMatching m; MatchResult res=m.performMatching("$R=SgAssignOp(SgVarRef,SgVarRef)",astRoot);
where 'astRoot' is a pointer to some node in the AST. AssignOp and SgVarRef are the names of ROSE AST nodes and $R is the name of a matcher variable.
In the above example all subtrees representing an assign operation with two variables as operands would be matched. The dollar sign denotes a variable. In the above example the pointers to the matched subtrees are assigned to the variable $R. The result with all matched assignments is stored in the variable res of type AstMatchingResult. The matching result is a set of maps where each map represents the results for one successful match and holds pairs of a variable name and a pointer to the respective AST subtree.
Variables
editVariables are used to as keys to store/retrieve the matched subtrees in the result. An arbitrary number of variables (keys) can be used and two forms of use are supported. A variable is denoted with a leading dollar sign an arbitrary number of letters and underscores (a single underscore is used as wildcard). A variable assignment notation can be used to assign the pointers of a specified pattern to a variable.
Example use of Variables
editFor example, $R=SgAssignOp(SgVarRef,_,_) is matched with all assignments which have a variable on the left hand side and some expression on the right hand side.
Alternatively we can also use $R=SgAssignOp($X=SgVarRef,$Y=_) - in this case we also store a pointer to the matched variable node and a pointer to the expression on the rhs in the match result.
For the expression $Y=_ we can also simply write $Y as a shorthand, thus we can also use $R=SgAssignOp($X=SgVarRef,$Y) instead. The assignment of variables to variables, such as $Z=$Y, is not allowed.
Ignoring subtrees (wildcard '_')
editSubtrees can also be specified to be ignored for matching by using '_' in the match expression. For example, if we use SgAssignOp(_,_) we can match all assignment nodes in the AST, but ignore the structure of the ASTs representing the rhs and lhs.
Null-values
editNull values can be explicitly matched by using "null" in a match expression. For example $X=SgForStatement(_,_,_,_,null) would match all SgForStatement-terms with the 5th argument being 0.
Operators
editSkip-Subtree Operator '#'
editPlacement of operator '#' in a match expression allows to exclude arbitrary subtrees from applying the match operation in subsequent matches. I.e. the marked subtrees are not traversed. For example if we only want to match the for-statements at the outer most level, but no nested for statements, we can use:
$FOR=SgForStatement(_,_,_,#_,..)
This matches only the outer for-statements, as the body (4th argument) is excluded from applying the match operator. Without '#' we would also match the inner loops.
Arbitrary-Arguments Operator '..'
editThis operator can be used in match expressions to specify that an arbitrary number of arguments can follow. For example we can use SgBlock($First,..) to match the very first statement in a SgBlock. Since SgBlocks can have arbitrary arity this is quite useful in this respect. The operator '..' can only be used at most once when specifying the arity of a node, but arbitrary often in a match pattern, e.g. SgBlock(SgForStatement($Cond,..),..) is OK, but SgBlock(_,..,_,..) is not.
Or Operator "|"
editThis operator allows to combine multiple match expressions. For example "SgAddOp($L,$R)|SgSubOp($L,$R)" will match either a SgAddOp and bind pointers to its two children to $L and $R, or it will match SgSubOp. The operator '|' performs a short-circuit evaluation, thus, matching is performed from left to right and the matching stops as soon as one of the patterns can be successfully matched.
Examples
editUsing a top level single variable
edit- performMatching("$R=AssignOp(_,_)",astRoot);
Match all assignment operators in an AST.
- performMatching("$R=SgAssignOp(SgVarRefExp,SgIntVal)",astRoot);
Match all assignment operators with a variable on the lhs and an integer value on the rhs.
- performMatching("$FORROOT=SgForStatement(_,_,_,#_)",astRoot);
Match all outer most for loops, but no nested for-loops. The operator '#' ensures that the match expression is not applied on the AST representing the body of the for-statement (4th argument). The pointer to the root of the AST representing the for-loop is bound to $FORROOT.
- performMatching("$N=_(null)",astRoot);
Match all nodes with arity 1 and a single null value. The main purpose for such match-expressions is to perform consistency checks in the AST.
- performMatching("$N=SgInitializedName(null)",astRoot); // many of those exist in a default ROSE AST
Specifically match all SgInitializedName nodes with a null pointer.
Using multiple variables
editThe variables can be buried inside the pattern. The top level pattern does not need a variable if not needed.
- performMatching("SgForStatement($LoopCond,_,_,_)|SgWhile($LoopCond,_)|SgDoWhile(_,$LoopCond)",astRoot);
Match different Loop constructs and bind variable $LoopCond to the respective loop condition.
- performMatching("SgAssignOp(SgVarRef,SgAddOp($X,$Y))",astRoot)
Match assignments with a variable on the rhs and an add-operator on the rhs(root). The pointers to the sub-ASTs representing the lhs and rhs of the add-operator are bound to variables $X and $Y for each match in the AST:
- performMatching("$Func=SgFunctionCallExp($FuncRef,$Params)",astRoot)
Match all function calls and bind variable $Func to the root of each such expression, bind $FuncRef to the SgFunctionRefExp (which can be used to obtain the name) and $Params to the AST representing the parameters:
matching loop increment exp
edit#include "AstTerm.h" #include "AstMatching.h" AstMatching m; // match i++ MatchResult res=m.performMatching("SgForStatement(_,_,SgPlusPlusOp($I=SgVarRefExp),..)",forloop); // match i++ or i== // MatchResult res=m.performMatching("SgForStatement(_,_,SgPlusPlusOp($I=SgVarRefExp)|SgMinusMinusOp($I=SgVarRefExp),..)",forloop); // cout<<res.size()<<endl; for(MatchResult::iterator i=res.begin();i!=res.end();++i) { // obtain the result:each variable is a map SgVarRefExp* ivar = isSgVarRefExp( (*i)["$I"]); cout<<"var:"<< ivar->unparseToString()<<endl; }
Print out pattern string
editOften it is not clear what pattern string you should use for a given subtree. In this case you can first find its root node and use AstTerm::astTermWithNullValuesToString () to print out a pattern string.
#include "AstTerm.h" // check if this is within auto kernel = ...; SgStatement* stmt = SageInterface::getEnclosingStatement(exp); AstMatching m; MatchResult r =m.performMatching ("$L=SgVariableDeclaration", stmt); for(MatchResult::iterator i=r.begin();i!=r.end();++i) { SgVariableDeclaration* i1 = isSgVariableDeclaration((*i)["$L"]); cout<< AstTerm::astTermWithNullValuesToString(i1)<<endl; }
Some example output may look like
"SgAggregateInitializer(SgExprListExp(SgDesignatedInitializer(SgExprListExp(SgAdaOthersExp:others),SgAssignInitializer(SgAggregateInitializer(SgExprListExp(SgDesignatedInitializer(SgExprListExp(SgAdaOthersExp:others),SgAssignInitializer(SgLongDoubleVal:2.0))))))))"
Note that you should not directly copy&paste the output string to be the pattern to match. You have to remove :value portion of expressions.
For example: (SgLongDoubleVal:2.0) should become (SgLongDoubleVal)
There is a dedicated translator to dump out AST terms for an input file
- exampleTranslators/defaultTranslator/astTermGenerator.C
Accessing matching results
editThe results are collected in a std::list of std::maps. Each map represents on successful match at one location in the AST and contains all the bound variables. The variables can be accessed by name and using the random access operator. The number of elements (=maps) in the list corresponds to the number of matched patterns in the AST.
typedef std::map<std::string,SgNode*> SingleMatchVarBindings; typedef std::list<SingleMatchVarBindings> MatchResult;
The pointers to matched patterns in the AST can be accessed as follows:
/* 1 */ AstMatching m; /* 2 */ MatchResult res=m.performMatching("$R=SgInitializedName($X)",root); /* 3 */ std::cout << "Number of matched patterns: " << r.size() << std::endl; /* 4 */ for(MatchResult::iterator i=r.begin();i!=r.end();++i) { /* 5 */ cout<<"Match expression variable $R:"<<isSgLocatedNode((*i)["R"])->unparseToString()<<endl; /* 6 */ cout<<"Match expression variable $X:"<<isSgLocatedNode((*i)["R"])->unparseToString()<<endl; /* 7 */ }
In line 1 the AstMatching object is created. In line 2 the match-expression and the root node of the AST is provided to the matching mechanism and the results are computed. The matching can be performed on any AST subtree of interest, by letting 'root' point to the respective AST subtree when the match operation is started. In line 3 the number of matched patterns is printed. In lines 4-7 the variable $R variable $X is accessed, and by using the unparseToString function the respective matched AST subtree is unparsed. The pointer value of the match variables $R and $X refer to the root node of the successful match AST subtree.
Here is a more elaborate code example to perform one match operation on the entire ROSE AST and print all match results in the map using iterators:
#include "AstTerm.h" #include "AstMatching.h" // Fragment from the matcher_demo program AstMatching m; MatchResult r=m.performMatching("$R=SgInitializedName(_)",root); // print result in readable form for demo purposes std::cout << "Number of matched patterns: " << r.size() << std::endl; // for each matched instance for(MatchResult::iterator i=r.begin();i!=r.end();++i) { std::cout << "MATCH: \n"; // obtain the map SingleMatchVarBindings& dict=(*i); SgNode* matched_op = dict["$R"] ; // retrieve the node by its key/variable // Alternatively, iterate through all entries in the map. // for each variable in the matched instance for(SingleMatchVarBindings::iterator vars_iter=(*i).begin();vars_iter!=(*i).end();++vars_iter) { SgNode* matchedTerm=(*vars_iter).second; std::cout << " VAR: " << (*vars_iter).first << "=" << AstTerm::astTermWithNullValuesToString(matchedTerm) << " @" << matchedTerm << std::endl; } std::cout << std::endl; }
The variable matchedTerm is assigned the pointer to the respective ROSE AST node which is bound to a variable. (*vars_iter).first is the name of the variable as used in the match expression when calling performMatching. In this example these are $R, $X, and $Y. The function generateAstTerm is an auxiliary function which is used to print an AST in readable form on stdout. It is implemented using the same Ast::iterator_with_null which is also used by the matching mechanism.
Example-output:
MATCH: VAR: $R=SgInitializedName(null) @0x7f1f8914da00 MATCH: VAR: $R=SgInitializedName(null) @0x7f1f8914db28 MATCH: VAR: $R=SgInitializedName(SgAssignInitializer(SgIntVal)) @0x7f1f8914dc50 MATCH: VAR: $R=SgInitializedName(null) @0x7f1f8914dd78 MATCH: VAR: $R=SgInitializedName(null) @0x7f1f8914dea0 MATCH: VAR: $R=SgInitializedName(SgAssignInitializer(SgIntVal)) @0x7f1f8914dfc8