Erlang Programming/Making Parsers with yecc

Yecc is Erlang's LALR-1 parser generator, the counterpart of yacc and bison.

A yecc grammar is written in a notation close to BNF (Backus-Naur form) and saved in a source file ending in .yrl, read here as "yecc rule list". As an example, we can parse a simple XHTML document. We run yecc on html.yrl to generate a parser module, html_parser.erl, compile it, and then use html_parser to parse some XHTML.

yecc:file("html.yrl", [{parserfile, "html_parser.erl"}]).
c(html_parser).
f(B), {ok, B, _} = erl_scan:string("<html><head></head><body>hello_world</body></html>").
html_parser:parse(B).
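The parser does not read characters; it reads the token list that erl_scan:string/1 produces. The following minimal sketch shows what those tokens look like. The module name scan_demo and the function tokens/1 are made up for this illustration and are not part of the original example.

```erlang
-module(scan_demo).
-export([tokens/1]).

%% Tokenise a string with the Erlang scanner and return only the token list.
%% (scan_demo and tokens/1 are hypothetical names used for illustration.)
tokens(String) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(String),
    Tokens.
```

For instance, scan_demo:tokens("<b>hi</b>") returns [{'<',1},{atom,1,b},{'>',1},{atom,1,hi},{'<',1},{'/',1},{atom,1,b},{'>',1}]. Because this is the Erlang scanner, a capitalised word such as Hello is scanned as a var token rather than an atom, so the grammar in this chapter only accepts lowercase text.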

Every opening tag in the XHTML input must have a matching closing tag. (For real-world XML, xmerl, the XML library that ships with Erlang/OTP, is a more powerful choice.)
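For comparison, here is a minimal sketch of the xmerl route. It assumes the xmerl application is available, as it is in a standard Erlang/OTP installation. The module name xmerl_demo and the function root_name/1 are hypothetical.

```erlang
-module(xmerl_demo).
-export([root_name/1]).

%% Record definitions such as #xmlElement{} come from xmerl's header file.
-include_lib("xmerl/include/xmerl.hrl").

%% Parse an XML string and return the name of its root element as an atom.
%% (xmerl_demo and root_name/1 are hypothetical names used for illustration.)
root_name(Xml) ->
    {Root, _Rest} = xmerl_scan:string(Xml),
    Root#xmlElement.name.
```

Calling xmerl_demo:root_name("<html><head></head><body>hello_world</body></html>") returns the atom html. Unlike the yecc example, xmerl works on the characters directly, so the text inside the document is not restricted to Erlang tokens.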

html.yrl source:

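% Grammar for a tiny XHTML subset. The tokens come from erl_scan.
Nonterminals tag elements element start_tag end_tag.
Terminals 'atom' '<' '>' '/'.
Rootsymbol tag.

% A tag that contains exactly one child tag.
tag -> 
        start_tag tag end_tag : 
        ['$1', '$2', '$3'].
% A tag that contains exactly two child tags.
tag -> 
        start_tag tag tag end_tag : 
        ['$1', '$2', '$3', '$4'].
% A tag that contains only text (one or more atoms).
tag -> 
        start_tag elements end_tag : 
        ['$1', {'contents','$2'}, '$3'].
% An empty tag.
tag -> 
        start_tag end_tag : 
        ['$1','$2'].

start_tag -> '<' 'atom' '>' : {'open','$2'}.
end_tag -> '<' '/' 'atom' '>' : {'close','$3'}.
elements -> element : ['$1'].
% Note: this builds a nested list of the form [A, [B, [C]]].
elements -> element elements : ['$1', '$2'].
element -> atom : '$1'.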

% Usage from the Erlang shell:
% yecc:file("html.yrl", [{parserfile, "html_parser.erl"}]).
% c(html_parser).
% f(B), {ok, B, _} = erl_scan:string("<html><head></head><body>hello_world</body></html>").
% html_parser:parse(B).
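The rule tag -> start_tag tag tag end_tag only covers a tag with exactly two children. One possible way to accept any number of child tags is a left-recursive list nonterminal. The sketch below has not been run against the test module in this chapter. The nonterminal tags and the file name html_multi.yrl are assumptions introduced for illustration.

```erlang
% html_multi.yrl (hypothetical file name): a sketch, not the original grammar.
Nonterminals tag tags elements element start_tag end_tag.
Terminals atom '<' '>' '/'.
Rootsymbol tag.

% A tag holds one or more child tags, some text, or nothing at all.
tag -> start_tag tags end_tag : ['$1', '$2', '$3'].
tag -> start_tag elements end_tag : ['$1', {contents, '$2'}, '$3'].
tag -> start_tag end_tag : ['$1', '$2'].

% 'tags' (an assumed name) collects the child tags from left to right.
tags -> tag : ['$1'].
tags -> tags tag : '$1' ++ ['$2'].

start_tag -> '<' atom '>' : {open, '$2'}.
end_tag -> '<' '/' atom '>' : {close, '$3'}.
elements -> element : ['$1'].
elements -> element elements : ['$1', '$2'].
element -> atom : '$1'.
```

Left recursion is used for tags because both a child tag and the closing tag begin with '<'. With a left-recursive list, the parser can shift '<' first and decide only on the following token whether the list continues.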

Rebuilding and rerunning the parser by hand after every edit to the .yrl file is tedious. To speed things up, we can write a small test module that regenerates the parser, compiles and loads it, and runs it on a sample document.

-module(html_test).
%% event_loop/0 is exported because start/0 spawns it by module and name.
-export([start/0, event_loop/0]).

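start() ->
        %% Regenerate the parser from the grammar, then compile and load it.
        {ok, _} = yecc:file("html.yrl", [{parserfile, "html_parser.erl"}]),
        {ok, html_parser} = c:c(html_parser),
        %% Tokenise the sample document with the Erlang scanner.
        {ok, Tokens, _} = erl_scan:string(
                "<html><head><title>greeting</title></head>
                        <body>
                        hello there world what is up
                        </body>
                </html>"),
        {ok, Tree} = html_parser:parse(Tokens),
        %% Flatten the parse tree into events and send them to a worker process.
        register(do_event, spawn(html_test, event_loop, [])),
        Events = lists:flatten(Tree),
        send_events(Events),
        Events.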

send_events([]) -> do_event ! {exit};
send_events([H|T]) ->
        do_event ! H,
        %io:format(" ~w ~n",[H]),
        send_events(T).
 
event_loop() ->
        receive
                {open,{atom,_Line_Number,html}} -> 
                        io:format("~n start scan ~n", []),
                        event_loop();
                {contents,List} -> 
                        Contents = get_contents(List,[]),
                        io:format("~n contents: ~w ~n", [Contents]),
                        event_loop();
                {exit} ->
                        %% Returning ends the process normally.
                        ok;
                _Other ->
                        %% Ignore all other events so they do not pile up in the mailbox.
                        event_loop()
        end.
 
%% Append to Items the atom names found in the (possibly nested) list
%% that the 'elements' rules of the grammar build.
get_contents(List,Items) ->
        Items ++ [Item || {atom,_N,Item} <- lists:flatten(List)].
        
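% 6> c(html_test).
% {ok,html_test}
% 7> html_test:start().
%  start scan
%  contents: [greeting]
%  contents: [hello,there,world,what,is,up]
% (blank lines omitted; the shell also prints the flattened list of events
% that start/0 returns)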
Last modified on 24 March 2010, at 04:19