Unlike most commonly used regexp libraries, Felix does not represent regular expressions as strings: instead, a first-class syntax is used to define them.
Felix allows you to name regular expressions with the syntax:

    regexp <name> = <regexp> ;

The name is an identifier. A string used in a regexp matches each character of the string in sequence. The following symbols are special, and are listed in order from weakest to strongest binding:

symbol | syntax | meaning |
---|---|---|
\| | infix | alternatives |
* | postfix | 0 or more occurrences |
+ | postfix | 1 or more occurrences |
? | postfix | 0 or 1 occurrences |
<juxtaposition> | infix | concatenation |
<name> | atomic | regexp denoted by the name in a regexp definition |
<string> | atomic | sequence of chars of the string |
[<charset>] | atomic | any char of the charset |
[^<charset>] | atomic | any char not in the charset |
. | atomic | any char other than end of line |
_ | atomic | any char |
eof | atomic | end marker |
(<regexp>) | atomic | brackets |

A <charset> is built from the following elements:

symbol | meaning |
---|---|
<string> | any character in the string |
<char>-<char> | any between or including the two chars |
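
One way to picture the charset elements is as sets over the 256 possible byte values. The C++ sketch below is only an illustration of that idea, not a description of how Felix represents charsets internally; the names `CharSet`, `from_string`, `from_range`, and `negate` are hypothetical.

```cpp
#include <bitset>
#include <string>

// One bit per possible byte value; a set bit means "this char is in the set".
using CharSet = std::bitset<256>;

// <string> element: any character in the string.
CharSet from_string(const std::string& chars) {
    CharSet set;
    for (unsigned char c : chars) set.set(c);
    return set;
}

// <char>-<char> element: every character between the two, endpoints included.
CharSet from_range(unsigned char lo, unsigned char hi) {
    CharSet set;
    for (unsigned c = lo; c <= hi; ++c) set.set(c);
    return set;
}

// [^<charset>]: every character that is not in the given charset.
CharSet negate(const CharSet& set) { return ~set; }
```

In this picture, `from_range('a', 'z')` and `from_string("abcdefghijklmnopqrstuvwxyz")` produce the same set, which is why a range is simply a shorter way to write a long string of consecutive characters.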

    #line 896 "./lpsrc/flx_tutorial.pak"
    #import <flx.flxh>
    regexp lower = ["abcdefghijklmnopqrstuvwxyz"];
    regexp upper = ["ABCDEFGHIJKLMNOPQRSTUVWXYZ"];
    regexp digit = ["0123456789"];
    regexp alpha = lower | upper | "_";
    regexp id = alpha (alpha | digit) *;

    #line 910 "./lpsrc/flx_tutorial.pak"
    print
      regmatch "identifier" with
      | digit+ => "Number"
      | id => "Identifier"
      endmatch
    ;
    endl;

    print
      regmatch "9999" with
      | digit+ => "Number"
      | id => "Identifier"
      endmatch
    ;
    endl;

    print
      regmatch "999xxx" with
      | digit+ => "Number"
      | id => "Identifier"
      | _* => "Neither"
      endmatch
    ;
    endl;

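
For readers coming from string-based regexp libraries, the three `regmatch` expressions can be mimicked roughly in C++ with `std::regex`. This is an analogue only, not Felix and not what the Felix compiler generates. It assumes that `regmatch` requires the whole argument string to match a branch, that branches are tried in the order written (the text above does not state a tie-breaking rule), and that the `*` in `id` applies only to the bracketed group. The function name `classify` and the ECMAScript translations of `digit+` and `id` are introduced here for illustration.

```cpp
#include <regex>
#include <string>

// Rough analogue of:
//   regmatch s with
//   | digit+ => "Number" | id => "Identifier" | _* => "Neither"
//   endmatch
std::string classify(const std::string& s) {
    // digit+ written as a string-based pattern.
    static const std::regex number("[0-9]+");
    // alpha (alpha | digit)*, assuming * covers only the bracketed group.
    static const std::regex identifier("[A-Za-z_][A-Za-z_0-9]*");

    // regex_match succeeds only if the entire string matches.
    if (std::regex_match(s, number)) return "Number";
    if (std::regex_match(s, identifier)) return "Identifier";
    return "Neither";  // plays the role of the catch-all _* branch
}
```

Under these assumptions, `classify("identifier")` gives `"Identifier"`, `classify("9999")` gives `"Number"`, and `classify("999xxx")` gives `"Neither"`, because neither of the first two patterns matches that whole string.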
Note: the generated code is *extremely* fast, within one or two memory fetches of the fastest possible code. Here is the generated code for the inner loop of a regmatch:

    while(state && start != end) state = matrix[*start++][state];
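
To show how a loop of that shape can recognise a pattern, here is a small self-contained C++ sketch of a table-driven matcher for `digit+`. The table layout (indexed first by input byte, then by state), the choice of state 0 as the dead state, and the names `Matrix`, `build_digit_plus_matrix`, and `matches_digit_plus` are assumptions made for illustration; the tables Felix actually emits may be laid out differently.

```cpp
#include <array>
#include <string>

// States: 0 = dead (no match possible), 1 = start, 2 = seen at least one digit.
constexpr int kStates = 3;
using Matrix = std::array<std::array<unsigned char, kStates>, 256>;

// Build the transition table for digit+.
Matrix build_digit_plus_matrix() {
    Matrix m{};  // value-initialised: every transition leads to the dead state 0
    for (unsigned c = '0'; c <= '9'; ++c) {
        m[c][1] = 2;  // the first digit moves from start to accepting
        m[c][2] = 2;  // further digits stay in the accepting state
    }
    return m;
}

// Whole-string match; the inner loop has the same form as the line quoted above.
bool matches_digit_plus(const std::string& s) {
    static const Matrix matrix = build_digit_plus_matrix();
    const unsigned char* start =
        reinterpret_cast<const unsigned char*>(s.data());
    const unsigned char* end = start + s.size();
    int state = 1;
    while (state && start != end) state = matrix[*start++][state];
    return state == 2;  // accept only if the loop stopped in the accepting state
}
```

Each iteration reads one input byte and one table entry, and the loop stops early as soon as the dead state is reached, which is consistent with the claim that the cost per character is a small, fixed number of memory fetches.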