[Aeditor-talk] regexp, yet another redesign
Simon Strandgaard
neoneye at adslhome.dk
Mon Feb 23 01:40:46 EST 2004
Some status for the last few days.
This week I have been implementing the new design (I decided to change
the design radically on Monday, 7 days ago). I have almost reached the
same point where the previous design(s) gave up, so I am very curious to
see whether the new design will cut it. A word of warning for people who
are thinking of writing their own regexp engine: nested quantifiers are
complex!
Right now I have a minor problem with activation of the third
quantifier. The testcase which provokes the problem is here:
def test_verbose_repeat3
  data = <<HERE
# maximization of first repeat
a b c d # 0 0 0
# skip to next input
a b c d # 0 0 0
a . b c d # 1 0 0
a . . b c d # 2 0 0
a . . . b c d # 3 0 0 ok
a . . . . b c d # 4 0 0
a . . . . . b c d # 5 0 0
a . . . . . . b c d # 6 0 0
a . . . . . . . b c d # 7 0 0
a . . . . . . . . b c d # 8 0 0 end of string
# maximization of second repeat
# there should be no resume entry (2) for the first repeat
a . . . b . c d # 3 1 0
a . . . b . . c d # 3 2 0
a . . . b . . . c d # 3 3 0
a . . . b . . . . c d # 3 4 0 end of string
# maximization of third repeat
# there should be no resume entry (1) for either the first or the second repeat
a . . . b c . d # 3 0 1
a . . . b c . . d # 3 0 2
a . . . b c . . . d # 3 0 3
a . . . b c . . . . d # 3 0 4 end of string
HERE
  assert_regex(
    ["aaxxbcd", "axx", "", ""],
    "a(.*)b(.*)c(.*)d",
    "xaaxxbcdx", :integrity_heredoc => data
  )
end
I have attached the error output from the failing testcase above; it
generates huge amounts of output! It's easy to see where it goes wrong,
but it's not that obvious what the exact cause of the failure is
(maybe I'm too tired today).
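As a cross-check of the expected result in the testcase above, the same
pattern and input can be run through Ruby's built-in Regexp class. This is
only a sketch for comparison: it says nothing about how the new engine
walks its paths, and the variable names are made up for illustration.

```ruby
# Cross-check with Ruby's built-in engine (not the engine under test).
pattern = /a(.*)b(.*)c(.*)d/
input   = "xaaxxbcdx"

m = pattern.match(input)

# Whole match followed by the three group registers.
result = m.to_a
p result          # => ["aaxxbcd", "axx", "", ""]

# The match starts at the first "a" (position 1), not at position 0.
start = m.begin(0)
p start           # => 1
```

The first repeat ends up holding "axx" (one "a" plus two "x"), which
corresponds to the "3 0 0 ok" line in the heredoc.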
I have just tried out Rake (a Ruby replacement for Make), which is
really nice. For instance, these rules validate against the DocBook DTD:
task :valid_catalog do
  sh "xmllint --valid --noout catalog.xml"
end
task :valid_main do
  sh "xmllint --valid --noout main.xml"
end
task :validall => [:valid_catalog, :valid_main]
Rakefiles are much more consistent than makefiles.
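Since the two validation rules differ only in the file name, one possible
variation is to generate them in a loop. This is a sketch, not what the
project uses: the `docs` list is a name invented here, and the
`require "rake"` line is only there so the file can also be loaded with
plain ruby (inside a Rakefile it is not needed).

```ruby
require "rake"  # assumption: only needed when loading outside of the rake command

# Hypothetical list of documents; the names come from the two rules above.
docs = %w[catalog main]

docs.each do |name|
  # Defines valid_catalog and valid_main, one task per document.
  task "valid_#{name}" do
    sh "xmllint --valid --noout #{name}.xml"
  end
end

# validall depends on every generated task.
task :validall => docs.map { |name| "valid_#{name}" }
```

Running `rake validall` would then invoke xmllint once per file, the same
as with the hand-written rules.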
--
Simon Strandgaard
-------------- next part --------------
Loaded suite TestScanner
Started
test_alternation1(TestScanner): .
test_alternation10(TestScanner): .
test_alternation11(TestScanner): .
test_alternation12(TestScanner): .
test_alternation13(TestScanner): .
test_alternation14(TestScanner): .
test_alternation15(TestScanner): .
test_alternation16(TestScanner): .
test_alternation2(TestScanner): .
test_alternation3(TestScanner): .
test_alternation4(TestScanner): .
test_alternation5(TestScanner): .
test_alternation6(TestScanner): .
test_alternation7(TestScanner): .
test_alternation8(TestScanner): .
test_alternation9(TestScanner): .
test_repeat1(TestScanner): .
test_repeat2(TestScanner): .
test_repeat3(TestScanner): .
test_repeat4(TestScanner): .
test_repeat5(TestScanner): .
test_repeat6(TestScanner): .
test_repeat7(TestScanner): .
test_repeat8(TestScanner): .
test_repeat_lazy1(TestScanner): .
test_repeat_lazy2(TestScanner): .
test_repeat_lazy3(TestScanner): .
test_repeat_lazy4(TestScanner): .
test_repeat_lazy5(TestScanner): .
test_repeat_lazy6(TestScanner): .
test_repeat_lazy7(TestScanner): .
test_repeat_lazy8(TestScanner): .
test_repeat_min1_1(TestScanner): .
test_repeat_min1_2(TestScanner): .
test_repeat_min1_3(TestScanner): .
test_repeat_min1_4(TestScanner): .
test_repeat_min1_5(TestScanner): .
test_repeat_min1_6(TestScanner): .
test_repeat_min2_1(TestScanner): .
test_repeat_min2_2(TestScanner): .
test_repeat_min2_3(TestScanner): .
test_repeat_min2_4(TestScanner): .
test_repeat_min2_5(TestScanner): .
test_repeat_min2_6(TestScanner): .
test_repeat_range1(TestScanner): .
test_repeat_range10(TestScanner): .
test_repeat_range11(TestScanner): .
test_repeat_range12(TestScanner): .
test_repeat_range13(TestScanner): .
test_repeat_range14(TestScanner): .
test_repeat_range15(TestScanner): .
test_repeat_range16(TestScanner): .
test_repeat_range17(TestScanner): .
test_repeat_range18(TestScanner): .
test_repeat_range19(TestScanner): .
test_repeat_range2(TestScanner): .
test_repeat_range20(TestScanner): .
test_repeat_range21(TestScanner): .
test_repeat_range22(TestScanner): .
test_repeat_range3(TestScanner): .
test_repeat_range4(TestScanner): .
test_repeat_range5(TestScanner): .
test_repeat_range6(TestScanner): .
test_repeat_range7(TestScanner): .
test_repeat_range8(TestScanner): .
test_repeat_range9(TestScanner): .
test_repeat_range_ignore1(TestScanner): .
test_repeat_range_ignore2(TestScanner): .
test_repeat_range_special1(TestScanner): .
test_repeat_range_special2(TestScanner): .
test_repeat_range_special3(TestScanner): .
test_repeat_range_special4(TestScanner): .
test_repeat_range_special5(TestScanner): .
test_repeat_range_special6(TestScanner): .
test_repeat_range_special7(TestScanner): .
test_sequence1(TestScanner): .
test_sequence2(TestScanner): .
test_sequence3(TestScanner): .
test_sequence4(TestScanner): .
test_sequence5(TestScanner): .
test_sequence6(TestScanner): .
test_verbose_alt_rep1(TestScanner): .
test_verbose_alt_rep2(TestScanner): .
test_verbose_repeat1(TestScanner): .
test_verbose_repeat2(TestScanner): .
test_verbose_repeat3(TestScanner): before #test_verbose_repeat3
regexp="a(.*)b(.*)c(.*)d"
+-Sequence
+-Literal "a"
+-Group register=1
| +-Repeat greedy{0,-1}
| +-Wildcard NOT["\n"]
+-Literal "b"
+-Group register=2
| +-Repeat greedy{0,-1}
| +-Wildcard NOT["\n"]
+-Literal "c"
+-Group register=3
| +-Repeat greedy{0,-1}
| +-Wildcard NOT["\n"]
+-Literal "d"
input="xaaxxbcdx"
----------------------------------------
execute at position 0
match "a" at position 0
path end = expected "a" but got "x"
check_integrity history.size=0
index-stack=[]
integrity "a b c d" (line 0)
execute at position 1
match "a" at position 1
group_open register=1
repeat 0
visitor#set_state from active into inactive
group_close register=1
match "b" at position 2
path end = expected "b" but got "a"
check_integrity history.size=1
index-stack=[0]
integrity "a b c d" (line 1)
next_path zero. found=false lazy=false state=active index=0 has_match=false
clear history
replace zero with one
visitor#set_state from inactive into active
visitor#set_memento
match NOT["\n"] at position 2
visitor#set_state from active into inactive
one-end
group_close register=1
match "b" at position 3
path end = expected "b" but got "x"
check_integrity history.size=2
index-stack=[1]
integrity "a . b c d" (line 2)
next_path zero. found=false lazy=false state=active index=1 has_match=false
clear history
replace zero with one
visitor#set_state from inactive into active
visitor#set_memento
match NOT["\n"] at position 3
visitor#set_state from active into inactive
one-end
group_close register=1
match "b" at position 4
path end = expected "b" but got "x"
check_integrity history.size=3
index-stack=[2]
integrity "a . . b c d" (line 3)
next_path zero. found=false lazy=false state=active index=2 has_match=false
clear history
replace zero with one
visitor#set_state from inactive into active
visitor#set_memento
match NOT["\n"] at position 4
visitor#set_state from active into inactive
one-end
group_close register=1
match "b" at position 5
group_open register=2
repeat 0
visitor#set_state from inactive into inactive
group_close register=2
match "c" at position 6
group_open register=3
repeat 0
visitor#set_state from inactive into inactive
group_close register=3
match "d" at position 7
last
path end = reached last node
check_integrity history.size=6
index-stack=[3, 0, 0]
integrity "a . . . b c d" (line 4)
next_path zero. found=true lazy=false state=inactive index=0 has_match=false
skip, path end = not active
next_path zero. found=true lazy=false state=inactive index=0 has_match=false
skip, path end = not active
next_path zero. found=true lazy=false state=active index=3 has_match=false
remember zero and clear
replace zero with one
visitor#set_state from inactive into active
visitor#set_memento
match NOT["\n"] at position 5
visitor#set_state from active into inactive
one-end
group_close register=1
match "b" at position 6
path end = expected "b" but got "c"
check_integrity history.size=5
index-stack=[4]
integrity "a . . . . b c d" (line 5)
next_path zero. found=false lazy=false state=active index=4 has_match=true
clear history
replace zero with one
visitor#set_state from inactive into active
visitor#set_memento
match NOT["\n"] at position 6
visitor#set_state from active into inactive
one-end
group_close register=1
match "b" at position 7
path end = expected "b" but got "d"
check_integrity history.size=6
index-stack=[5]
integrity "a . . . . . b c d" (line 6)
next_path zero. found=false lazy=false state=active index=5 has_match=true
clear history
replace zero with one
visitor#set_state from inactive into active
visitor#set_memento
match NOT["\n"] at position 7
visitor#set_state from active into inactive
one-end
group_close register=1
match "b" at position 8
path end = expected "b" but got "x"
check_integrity history.size=7
index-stack=[6]
integrity "a . . . . . . b c d" (line 7)
next_path zero. found=false lazy=false state=active index=6 has_match=true
clear history
replace zero with one
visitor#set_state from inactive into active
visitor#set_memento
match NOT["\n"] at position 8
visitor#set_state from active into inactive
one-end
group_close register=1
match "b" at position 9
path end = expected "b" but got <EndOfInput>
check_integrity history.size=8
index-stack=[7]
integrity "a . . . . . . . b c d" (line 8)
next_path zero. found=false lazy=false state=active index=7 has_match=true
clear history
replace zero with one
visitor#set_state from inactive into active
visitor#set_memento
match NOT["\n"] at position 9
path end = expected NOT["\n"] but got <EndOfInput>
check_integrity history.size=8
index-stack=[8]
integrity "a . . . . . . . . b c d" (line 9)
next_path one. found=false lazy=false state=active index=7 has_match=true
clear history
install match
visitor#set_memento
skip, path end = is done
next_path one. found=true lazy=false state=active index=6 has_match=true
remember one and clear
install match
visitor#set_memento
skip, path end = is done
next_path one. found=true lazy=false state=active index=5 has_match=true
remember one and clear
install match
visitor#set_memento
skip, path end = is done
next_path one. found=true lazy=false state=active index=4 has_match=true
remember one and clear
install match
visitor#set_memento
skip, path end = is done
next_path one. found=true lazy=false state=active index=3 has_match=true
remember one and clear
install match
visitor#set_memento
skip, path end = is done
next_path one. found=true lazy=false state=active index=2 has_match=true
remember one and clear
install match
visitor#set_memento
skip, path end = is done
next_path one. found=true lazy=false state=active index=1 has_match=true
remember one and clear
install match
visitor#set_memento
skip, path end = is done
next_path one. found=true lazy=false state=active index=0 has_match=true
remember one and clear
install match
visitor#set_memento
----- BEGIN before activate -----
content of history stack
0: ONE_0 quantifier_id=0
1: ONE_1 quantifier_id=0
2: ONE_2 quantifier_id=0
3: ZERO_3 quantifier_id=0
4: ZERO_0 quantifier_id=1
5: ZERO_0 quantifier_id=2
quantifier information
0: active found=no
1: inactive found=no
2: inactive found=no
----- END before activate -----
activate_next
found zero/one
found zero/one
found zero/one
activate_inactive
----- BEGIN after activate -----
content of history stack
0: ONE_0 quantifier_id=0
1: ONE_1 quantifier_id=0
2: ONE_2 quantifier_id=0
3: ZERO_3 quantifier_id=0
4: ZERO_0 quantifier_id=1
5: ZERO_0 quantifier_id=2
quantifier information
0: done found=no
1: active found=yes
2: inactive found=no
----- END after activate -----
skip, path end = is done
next_path zero. found=true lazy=false state=inactive index=0 has_match=false
skip, path end = not active
next_path zero. found=true lazy=false state=active index=0 has_match=true
remember zero and clear
replace zero with one
visitor#set_state from active into active
visitor#set_memento
match NOT["\n"] at position 6
visitor#set_state from active into inactive
one-end
group_close register=2
match "c" at position 7
path end = expected "c" but got "d"
check_integrity history.size=6
index-stack=[3, 1]
integrity "a . . . b . c d" (line 10)
next_path zero. found=false lazy=false state=active index=1 has_match=true
clear history
replace zero with one
visitor#set_state from inactive into active
visitor#set_memento
match NOT["\n"] at position 7
visitor#set_state from active into inactive
one-end
group_close register=2
match "c" at position 8
path end = expected "c" but got "x"
check_integrity history.size=7
index-stack=[3, 2]
integrity "a . . . b . . c d" (line 11)
next_path zero. found=false lazy=false state=active index=2 has_match=true
clear history
replace zero with one
visitor#set_state from inactive into active
visitor#set_memento
match NOT["\n"] at position 8
visitor#set_state from active into inactive
one-end
group_close register=2
match "c" at position 9
path end = expected "c" but got <EndOfInput>
check_integrity history.size=8
index-stack=[3, 3]
integrity "a . . . b . . . c d" (line 12)
next_path zero. found=false lazy=false state=active index=3 has_match=true
clear history
replace zero with one
visitor#set_state from inactive into active
visitor#set_memento
match NOT["\n"] at position 9
path end = expected NOT["\n"] but got <EndOfInput>
check_integrity history.size=8
index-stack=[3, 4]
integrity "a . . . b . . . . c d" (line 13)
next_path one. found=false lazy=false state=active index=3 has_match=true
clear history
install match
visitor#set_memento
skip, path end = is done
next_path one. found=true lazy=false state=active index=2 has_match=true
remember one and clear
install match
visitor#set_memento
skip, path end = is done
next_path one. found=true lazy=false state=active index=1 has_match=true
remember one and clear
install match
visitor#set_memento
skip, path end = is done
next_path one. found=true lazy=false state=active index=0 has_match=true
remember one and clear
install match
visitor#set_memento
----- BEGIN before activate -----
content of history stack
0: ONE_0 quantifier_id=0
1: ONE_1 quantifier_id=0
2: ONE_2 quantifier_id=0
3: ZERO_3 quantifier_id=0
4: ZERO_0 quantifier_id=1
5: ZERO_0 quantifier_id=2
quantifier information
0: active found=no
1: inactive found=no
2: inactive found=no
----- END before activate -----
activate_next
found zero/one
found zero/one
found zero/one
activate_inactive
----- BEGIN after activate -----
content of history stack
0: ONE_0 quantifier_id=0
1: ONE_1 quantifier_id=0
2: ONE_2 quantifier_id=0
3: ZERO_3 quantifier_id=0
4: ZERO_0 quantifier_id=1
5: ZERO_0 quantifier_id=2
quantifier information
0: done found=no
1: active found=yes
2: inactive found=no
----- END after activate -----
skip, path end = is done
next_path zero. found=true lazy=false state=inactive index=0 has_match=false
skip, path end = not active
next_path zero. found=true lazy=false state=active index=0 has_match=true
remember zero and clear
replace zero with one
visitor#set_state from active into active
visitor#set_memento
match NOT["\n"] at position 6
visitor#set_state from active into inactive
one-end
group_close register=2
match "c" at position 7
path end = expected "c" but got "d"
check_integrity history.size=6
index-stack=[3, 1]
integrity "a . . . b . c d", but expected "a . . . b c . d" at line 14. ERROR!
E
Finished in 0.896336 seconds.
1) Error:
test_verbose_repeat3(TestScanner):
RuntimeError: integrity: expected "a . . . b c . d", got "a . . . b . c d" at line 14
/home/neoneye/kode/editor/projects/regexp_engine/regexp/scanner.rb:41:in `check_integrity'
/home/neoneye/kode/editor/projects/regexp_engine/regexp/scanner_helpers.rb:533:in `path_end'
/home/neoneye/kode/editor/projects/regexp_engine/regexp/scanner.rb:117:in `visit_match'
/home/neoneye/kode/editor/projects/regexp_engine/regexp/scanner_nodes.rb:19:in `accept'
/home/neoneye/kode/editor/projects/regexp_engine/regexp/scanner.rb:252:in `find_match_at'
/home/neoneye/kode/editor/projects/regexp_engine/regexp/scanner.rb:262:in `match_impl'
/home/neoneye/kode/editor/projects/regexp_engine/regexp/scanner.rb:260:in `loop'
/home/neoneye/kode/editor/projects/regexp_engine/regexp/scanner.rb:268:in `match_impl'
/home/neoneye/kode/editor/projects/regexp_engine/regexp/scanner.rb:272:in `match'
/home/neoneye/kode/editor/projects/regexp_engine/regexp/scanner.rb:294:in `match_integrity'
/home/neoneye/kode/editor/projects/regexp_engine/regexp/scanner.rb:294:in `call'
/home/neoneye/kode/editor/projects/regexp_engine/regexp/scanner.rb:282:in `check_integrity'
/home/neoneye/kode/editor/projects/regexp_engine/regexp/scanner.rb:294:in `match_integrity'
./common.rb:66:in `match_integrity'
./common.rb:83:in `assert_regex'
./match_mixins.rb:894:in `_debug_test_verbose_repeat3'
(eval):5:in `test_verbose_repeat3'
86 tests, 85 assertions, 0 failures, 1 errors