re2c Version 0.9.3
------------------

Originally written by Peter Bumbulis (peter@csg.uwaterloo.ca)

Currently maintained by:
    Dan Nuffer
    Marcus Boerger
    Hartmut Kaiser

The re2c distribution can be found at:

    http://sourceforge.net/projects/re2c/

This distribution is a cleaned-up version of the 0.5 release. Several bugs were fixed, and the code was cleaned up so that it compiles without warnings. It has been developed and tested with egcs 1.0.2 and gcc 2.7.2.3, 2.96 and 3.3.1 on Linux x86. You can compile your own version with other gcc versions if you have yacc or any working bison version (tested up to bison 1.875).

You can build this software by simply typing the following commands:

    ./autogen.sh
    ./configure
    make

The above build is based on the pregenerated scanner.cc file. If you want to build that file yourself (recommended when installing re2c), use the following steps:

    ./autogen.sh
    ./configure
    make
    rm -f scanner.cc
    make install

Or you can create an rpm package and install it with the following commands:

    ./autogen.sh
    ./configure
    ./makerpm
    rpm -Uhv /re2c-0.9.3-.rpm

Here the release number given to makerpm (which also appears in the rpm file name) should be a number like 1, and the directory part of the rpm path must be the directory where the makerpm step wrote the generated rpm.

re2c is a great tool for writing fast and flexible lexers. It has served many people well for many years. re2c-generated scanners are on the order of 2-3 times faster than flex-based scanners, and the re2c input model is much more flexible.

Peter's original version 0.5 ANNOUNCE and README follow.

--

re2c is a tool for generating C-based recognizers from regular expressions. re2c-based scanners are efficient: for programming languages, given similar specifications, an re2c-based scanner is typically almost twice as fast as a flex-based scanner with little or no increase in size (possibly a decrease on CISC architectures). Indeed, re2c-based scanners are quite competitive with hand-crafted ones.

Unlike flex, re2c does not generate complete scanners: the user must supply some interface code.
While this code is not bulky (about 50-100 lines for a flex-like scanner; see the man page and examples in the distribution), careful coding is required for efficiency (and correctness). One advantage of this arrangement is that the generated code is not tied to any particular input model. For example, re2c-generated code can be used to scan data from a null-byte terminated buffer, as illustrated below.

Given the following source

    #define NULL ((char*) 0)
    char *scan(char *p){
    char *q;
    #define YYCTYPE char
    #define YYCURSOR p
    #define YYLIMIT p
    #define YYMARKER q
    #define YYFILL(n)
    /*!re2c
        [0-9]+          {return YYCURSOR;}
        [\000-\377]     {return NULL;}
    */
    }

re2c will generate

    /* Generated by re2c on Sat Apr 16 11:40:58 1994 */
    #line 1 "simple.re"
    #define NULL ((char*) 0)
    char *scan(char *p){
    char *q;
    #define YYCTYPE char
    #define YYCURSOR p
    #define YYLIMIT p
    #define YYMARKER q
    #define YYFILL(n)
    {
        YYCTYPE yych;
        unsigned int yyaccept;
        goto yy0;
    yy1:    ++YYCURSOR;
    yy0:
        if((YYLIMIT - YYCURSOR) < 2) YYFILL(2);
        yych = *YYCURSOR;
        if(yych <= '/') goto yy4;
        if(yych >= ':') goto yy4;
    yy2:    yych = *++YYCURSOR;
        goto yy7;
    yy3:
    #line 10
        {return YYCURSOR;}
    yy4:    yych = *++YYCURSOR;
    yy5:
    #line 11
        {return NULL;}
    yy6:    ++YYCURSOR;
        if(YYLIMIT == YYCURSOR) YYFILL(1);
        yych = *YYCURSOR;
    yy7:    if(yych <= '/') goto yy3;
        if(yych <= '9') goto yy6;
        goto yy3;
    }
    #line 12
    }

Note that most compilers will perform dead-code elimination to remove all YYCURSOR, YYLIMIT comparisons: here YYLIMIT is the same pointer as YYCURSOR and YYFILL(n) expands to nothing, so each limit check guards an empty statement.

re2c was developed for a particular project (constructing a fast REXX scanner, of all things!) and so, while it has some rough edges, it should be quite usable. More information about re2c can be found in the (admittedly skimpy) man page; the algorithms and heuristics used are described in an upcoming LOPLAS article (included in the distribution). Probably the best way to find out more about re2c is to try the supplied examples. re2c is written in C++, and is currently being developed under Linux using gcc 2.5.8.
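For readers without re2c at hand, the following C sketch is a hand-written function that should behave like the generated scan() above: it returns a pointer just past a leading run of digits, or NULL if the first character is not a digit. It is not re2c output, and the names scan_digits and count_numbers are hypothetical. It also illustrates why the empty YYFILL is safe in the null-terminated case: the terminating NUL byte never matches [0-9], so the digit loop stops at the end of the string without any limit check.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hand-written equivalent of the generated scan():
   returns a pointer just past a leading run of digits, or NULL if
   *p is not a digit. The NUL terminator is not a digit, so the loop
   stops there without comparing against a buffer limit. */
static const char *scan_digits(const char *p) {
    if (*p < '0' || *p > '9')
        return NULL;
    do
        ++p;
    while (*p >= '0' && *p <= '9');
    return p;
}

/* Hypothetical driver: counts the digit runs in a NUL-terminated
   string, skipping one character whenever scan_digits() fails. */
static int count_numbers(const char *s) {
    int count = 0;
    assert(s != NULL);
    while (*s != '\0') {
        const char *end = scan_digits(s);
        if (end != NULL) {
            ++count;
            s = end;        /* resume right after the matched digits */
        } else {
            ++s;            /* not a digit: skip a single character */
        }
    }
    return count;
}
```

For example, count_numbers("12 abc 345,6") counts three numbers. A real re2c scanner would keep the matching logic in the generated code and only supply the driver loop and the YY* definitions.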
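When input does not fit in one null-terminated buffer, a central part of the user-supplied interface code is the routine behind YYFILL(n), which must make at least n more characters available at YYCURSOR. The C sketch below shows one possible shape for it, assuming a fixed-size buffer that is refilled from an in-memory string standing in for a file. The names struct input, input_init() and fill() are hypothetical, and a scanner might be wired to them with definitions such as #define YYCURSOR in.cur, #define YYLIMIT in.lim, #define YYMARKER in.mar and #define YYFILL(n) fill(&in, (n)). Before refilling, the routine slides the unfinished token to the front of the buffer so that YYCURSOR and YYMARKER stay valid; end-of-input handling (for example padding with a sentinel) is left out.

```c
#include <assert.h>
#include <string.h>

enum { BUFSIZE = 16 };  /* deliberately tiny, for illustration */

/* Hypothetical scanner input state. */
struct input {
    char buf[BUFSIZE];
    char *tok;              /* start of the token being scanned */
    char *cur, *lim, *mar;  /* would back YYCURSOR, YYLIMIT, YYMARKER */
    const char *src;        /* unread source text (stands in for a file) */
    size_t srclen;
};

static void input_init(struct input *in, const char *src) {
    in->src = src;
    in->srclen = strlen(src);
    in->tok = in->cur = in->lim = in->mar = in->buf;
}

/* Sketch of a YYFILL(n) routine: discard consumed text, move the
   pending token to the front, then append as much source as fits.
   Returns the number of bytes available from cur onward. */
static size_t fill(struct input *in, size_t n) {
    size_t shift = (size_t)(in->tok - in->buf);  /* consumed bytes */
    size_t keep = (size_t)(in->lim - in->tok);   /* pending token bytes */
    size_t room, take;

    assert(n <= BUFSIZE);  /* tokens longer than the buffer unsupported */
    memmove(in->buf, in->tok, keep);
    in->cur -= shift;
    /* a marker left over from an earlier token is simply reset */
    in->mar = (in->mar < in->tok) ? in->buf : in->mar - shift;
    in->tok = in->buf;
    in->lim = in->buf + keep;

    room = BUFSIZE - keep;
    take = in->srclen < room ? in->srclen : room;
    memcpy(in->lim, in->src, take);
    in->lim += take;
    in->src += take;
    in->srclen -= take;
    return (size_t)(in->lim - in->cur);
}
```

Moving the pending token instead of keeping a ring buffer keeps the pointers contiguous, which is what the generated code assumes when it advances YYCURSOR and compares it with YYLIMIT.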
Peter

--

re2c is distributed with no warranty whatever. The code is certain to contain errors. Neither the author nor any contributor takes responsibility for any consequences of its use.

re2c is in the public domain. The data structures and algorithms used in re2c are all either taken from documents available to the general public or are inventions of the author. Programs generated by re2c may be distributed freely. re2c itself may be distributed freely, in source or binary, unchanged or modified. Distributors may charge whatever fees they can obtain for re2c.

If you do make use of re2c, or incorporate it into a larger project, an acknowledgement somewhere (documentation, research report, etc.) would be appreciated.

Please send bug reports and feedback (including suggestions for improving the distribution) to

    peter@csg.uwaterloo.ca

Include a small example and the banner from parser.y with bug reports.