[Aeditor-talk] regexp, yet another redesign

Simon Strandgaard neoneye at adslhome.dk
Mon Feb 23 01:40:46 EST 2004


A status update for the last few days.


This week I have been implementing the new design (I decided to change
the design radically on Monday, 7 days ago). I have almost reached the
point where the previous design(s) gave up, so I am very curious to see
whether the new design will cut it.  A word of warning for people who are
thinking of writing their own regexp engine: nested quantifiers are complex!


Right now I have a minor problem with activation of the third
quantifier. The test case that provokes the problem is here:

  def test_verbose_repeat3
data =<<HERE
# maximization of first repeat
a b c d                   # 0 0 0
# skip to next input
a b c d                   # 0 0 0
a . b c d                 # 1 0 0
a . . b c d               # 2 0 0  
a . . . b c d             # 3 0 0 ok
a . . . . b c d           # 4 0 0 
a . . . . . b c d         # 5 0 0 
a . . . . . . b c d       # 6 0 0 
a . . . . . . . b c d     # 7 0 0 
a . . . . . . . . b c d   # 8 0 0 end of string  
# maximization of second repeat
# there should be no resume entry (2) for the first repeat
a . . . b . c d           # 3 1 0 
a . . . b . . c d         # 3 2 0 
a . . . b . . . c d       # 3 3 0 
a . . . b . . . . c d     # 3 4 0 end of string 
# maximization of third repeat
# there should be no resume entry (1) for either the first or the second repeat
a . . . b c . d           # 3 0 1 
a . . . b c . . d         # 3 0 2 
a . . . b c . . . d       # 3 0 3 
a . . . b c . . . . d     # 3 0 4 end of string  
HERE
    assert_regex(
      ["aaxxbcd", "axx", "", ""], 
      "a(.*)b(.*)c(.*)d", 
      "xaaxxbcdx" , :integrity_heredoc=>data
    )
  end 
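
As a sanity check on the expected registers, the same pattern can be run
through Ruby's built-in Regexp (not the engine under test). This is only a
sketch: integrity_line is a hypothetical helper, written for the literals
of this one pattern, that renders repeat counts in the notation used by the
heredoc above.

```ruby
# Cross-check with Ruby's built-in Regexp, not the engine being developed.
PATTERN = /a(.*)b(.*)c(.*)d/
INPUT   = "xaaxxbcdx"

# Hypothetical helper: render repeat counts in the heredoc's notation,
# e.g. [3, 0, 0] -> "a . . . b c d". Missing counts are treated as 0.
def integrity_line(counts)
  literals = %w[a b c d]
  tokens = []
  literals.each_with_index do |literal, i|
    tokens << literal
    # One "." per character consumed by the repeat that follows this literal.
    tokens.concat(["."] * counts.fetch(i, 0)) if i < literals.size - 1
  end
  tokens.join(" ")
end

match = PATTERN.match(INPUT)
registers = match.to_a                 # whole match followed by the three groups
counts = match.captures.map(&:length)  # characters consumed by each (.*)

puts registers.inspect       # ["aaxxbcd", "axx", "", ""]
puts counts.inspect          # [3, 0, 0]
puts integrity_line(counts)  # a . . . b c d
```

The counts [3, 0, 0] correspond to the heredoc line marked "ok".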


I have attached the error output from the failing test case above; it
generates huge amounts of output! It's easy to see where it goes wrong,
but it's not obvious what the exact cause of the failure is
(maybe I'm too tired today).



I have just tried out Rake (a Ruby replacement for Make), which is
really nice. For instance, these rules validate files against the DocBook DTD:

  task :valid_catalog do
    sh "xmllint --valid --noout catalog.xml"
  end
  task :valid_main do
    sh "xmllint --valid --noout main.xml"
  end
  task :validall => [:valid_catalog, :valid_main]

Rakefiles are much more consistent than makefiles.
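
Since a Rakefile is plain Ruby, near-identical tasks like the two above
could also be generated from a list. A minimal sketch, assuming the same
two file names; DOCBOOK_FILES is a hypothetical name, and the require line
is only needed when running this outside of a Rakefile.

```ruby
require "rake"  # not needed inside a Rakefile; only for running as a plain script

# Hypothetical constant: task-name suffix => file to validate.
DOCBOOK_FILES = { "catalog" => "catalog.xml", "main" => "main.xml" }

# Define one validation task per file (valid_catalog, valid_main).
DOCBOOK_FILES.each do |name, file|
  task "valid_#{name}" do
    sh "xmllint --valid --noout #{file}"
  end
end

# validall depends on every generated task.
task :validall => DOCBOOK_FILES.keys.map { |name| "valid_#{name}" }
```

Adding another document would then be a one-line change to the hash.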

--
Simon Strandgaard
-------------- next part --------------
Loaded suite TestScanner
Started
test_alternation1(TestScanner): .
test_alternation10(TestScanner): .
test_alternation11(TestScanner): .
test_alternation12(TestScanner): .
test_alternation13(TestScanner): .
test_alternation14(TestScanner): .
test_alternation15(TestScanner): .
test_alternation16(TestScanner): .
test_alternation2(TestScanner): .
test_alternation3(TestScanner): .
test_alternation4(TestScanner): .
test_alternation5(TestScanner): .
test_alternation6(TestScanner): .
test_alternation7(TestScanner): .
test_alternation8(TestScanner): .
test_alternation9(TestScanner): .
test_repeat1(TestScanner): .
test_repeat2(TestScanner): .
test_repeat3(TestScanner): .
test_repeat4(TestScanner): .
test_repeat5(TestScanner): .
test_repeat6(TestScanner): .
test_repeat7(TestScanner): .
test_repeat8(TestScanner): .
test_repeat_lazy1(TestScanner): .
test_repeat_lazy2(TestScanner): .
test_repeat_lazy3(TestScanner): .
test_repeat_lazy4(TestScanner): .
test_repeat_lazy5(TestScanner): .
test_repeat_lazy6(TestScanner): .
test_repeat_lazy7(TestScanner): .
test_repeat_lazy8(TestScanner): .
test_repeat_min1_1(TestScanner): .
test_repeat_min1_2(TestScanner): .
test_repeat_min1_3(TestScanner): .
test_repeat_min1_4(TestScanner): .
test_repeat_min1_5(TestScanner): .
test_repeat_min1_6(TestScanner): .
test_repeat_min2_1(TestScanner): .
test_repeat_min2_2(TestScanner): .
test_repeat_min2_3(TestScanner): .
test_repeat_min2_4(TestScanner): .
test_repeat_min2_5(TestScanner): .
test_repeat_min2_6(TestScanner): .
test_repeat_range1(TestScanner): .
test_repeat_range10(TestScanner): .
test_repeat_range11(TestScanner): .
test_repeat_range12(TestScanner): .
test_repeat_range13(TestScanner): .
test_repeat_range14(TestScanner): .
test_repeat_range15(TestScanner): .
test_repeat_range16(TestScanner): .
test_repeat_range17(TestScanner): .
test_repeat_range18(TestScanner): .
test_repeat_range19(TestScanner): .
test_repeat_range2(TestScanner): .
test_repeat_range20(TestScanner): .
test_repeat_range21(TestScanner): .
test_repeat_range22(TestScanner): .
test_repeat_range3(TestScanner): .
test_repeat_range4(TestScanner): .
test_repeat_range5(TestScanner): .
test_repeat_range6(TestScanner): .
test_repeat_range7(TestScanner): .
test_repeat_range8(TestScanner): .
test_repeat_range9(TestScanner): .
test_repeat_range_ignore1(TestScanner): .
test_repeat_range_ignore2(TestScanner): .
test_repeat_range_special1(TestScanner): .
test_repeat_range_special2(TestScanner): .
test_repeat_range_special3(TestScanner): .
test_repeat_range_special4(TestScanner): .
test_repeat_range_special5(TestScanner): .
test_repeat_range_special6(TestScanner): .
test_repeat_range_special7(TestScanner): .
test_sequence1(TestScanner): .
test_sequence2(TestScanner): .
test_sequence3(TestScanner): .
test_sequence4(TestScanner): .
test_sequence5(TestScanner): .
test_sequence6(TestScanner): .
test_verbose_alt_rep1(TestScanner): .
test_verbose_alt_rep2(TestScanner): .
test_verbose_repeat1(TestScanner): .
test_verbose_repeat2(TestScanner): .
test_verbose_repeat3(TestScanner): before #test_verbose_repeat3
regexp="a(.*)b(.*)c(.*)d"
+-Sequence
  +-Literal "a"
  +-Group register=1 
  | +-Repeat greedy{0,-1}
  |   +-Wildcard NOT["\n"]
  +-Literal "b"
  +-Group register=2 
  | +-Repeat greedy{0,-1}
  |   +-Wildcard NOT["\n"]
  +-Literal "c"
  +-Group register=3 
  | +-Repeat greedy{0,-1}
  |   +-Wildcard NOT["\n"]
  +-Literal "d"
input="xaaxxbcdx"
----------------------------------------
execute at position 0
match "a" at position 0
path end = expected "a" but got "x"
check_integrity history.size=0
index-stack=[]
integrity "a b c d" (line 0)

execute at position 1
match "a" at position 1
group_open   register=1
repeat 0
visitor#set_state from active into inactive
group_close  register=1
match "b" at position 2
path end = expected "b" but got "a"
check_integrity history.size=1
index-stack=[0]
integrity "a b c d" (line 1)

next_path zero.  found=false lazy=false state=active index=0 has_match=false
clear history
replace zero with one
visitor#set_state from inactive into active
visitor#set_memento
match NOT["\n"] at position 2
visitor#set_state from active into inactive
one-end
group_close  register=1
match "b" at position 3
path end = expected "b" but got "x"
check_integrity history.size=2
index-stack=[1]
integrity "a . b c d" (line 2)

next_path zero.  found=false lazy=false state=active index=1 has_match=false
clear history
replace zero with one
visitor#set_state from inactive into active
visitor#set_memento
match NOT["\n"] at position 3
visitor#set_state from active into inactive
one-end
group_close  register=1
match "b" at position 4
path end = expected "b" but got "x"
check_integrity history.size=3
index-stack=[2]
integrity "a . . b c d" (line 3)

next_path zero.  found=false lazy=false state=active index=2 has_match=false
clear history
replace zero with one
visitor#set_state from inactive into active
visitor#set_memento
match NOT["\n"] at position 4
visitor#set_state from active into inactive
one-end
group_close  register=1
match "b" at position 5
group_open   register=2
repeat 0
visitor#set_state from inactive into inactive
group_close  register=2
match "c" at position 6
group_open   register=3
repeat 0
visitor#set_state from inactive into inactive
group_close  register=3
match "d" at position 7
last
path end = reached last node
check_integrity history.size=6
index-stack=[3, 0, 0]
integrity "a . . . b c d" (line 4)

next_path zero.  found=true lazy=false state=inactive index=0 has_match=false
skip, path end = not active
next_path zero.  found=true lazy=false state=inactive index=0 has_match=false
skip, path end = not active
next_path zero.  found=true lazy=false state=active index=3 has_match=false
remember zero and clear
replace zero with one
visitor#set_state from inactive into active
visitor#set_memento
match NOT["\n"] at position 5
visitor#set_state from active into inactive
one-end
group_close  register=1
match "b" at position 6
path end = expected "b" but got "c"
check_integrity history.size=5
index-stack=[4]
integrity "a . . . . b c d" (line 5)

next_path zero.  found=false lazy=false state=active index=4 has_match=true
clear history
replace zero with one
visitor#set_state from inactive into active
visitor#set_memento
match NOT["\n"] at position 6
visitor#set_state from active into inactive
one-end
group_close  register=1
match "b" at position 7
path end = expected "b" but got "d"
check_integrity history.size=6
index-stack=[5]
integrity "a . . . . . b c d" (line 6)

next_path zero.  found=false lazy=false state=active index=5 has_match=true
clear history
replace zero with one
visitor#set_state from inactive into active
visitor#set_memento
match NOT["\n"] at position 7
visitor#set_state from active into inactive
one-end
group_close  register=1
match "b" at position 8
path end = expected "b" but got "x"
check_integrity history.size=7
index-stack=[6]
integrity "a . . . . . . b c d" (line 7)

next_path zero.  found=false lazy=false state=active index=6 has_match=true
clear history
replace zero with one
visitor#set_state from inactive into active
visitor#set_memento
match NOT["\n"] at position 8
visitor#set_state from active into inactive
one-end
group_close  register=1
match "b" at position 9
path end = expected "b" but got <EndOfInput>
check_integrity history.size=8
index-stack=[7]
integrity "a . . . . . . . b c d" (line 8)

next_path zero.  found=false lazy=false state=active index=7 has_match=true
clear history
replace zero with one
visitor#set_state from inactive into active
visitor#set_memento
match NOT["\n"] at position 9
path end = expected NOT["\n"] but got <EndOfInput>
check_integrity history.size=8
index-stack=[8]
integrity "a . . . . . . . . b c d" (line 9)

next_path one.  found=false lazy=false state=active index=7 has_match=true
clear history
install match
visitor#set_memento
skip, path end = is done
next_path one.  found=true lazy=false state=active index=6 has_match=true
remember one and clear
install match
visitor#set_memento
skip, path end = is done
next_path one.  found=true lazy=false state=active index=5 has_match=true
remember one and clear
install match
visitor#set_memento
skip, path end = is done
next_path one.  found=true lazy=false state=active index=4 has_match=true
remember one and clear
install match
visitor#set_memento
skip, path end = is done
next_path one.  found=true lazy=false state=active index=3 has_match=true
remember one and clear
install match
visitor#set_memento
skip, path end = is done
next_path one.  found=true lazy=false state=active index=2 has_match=true
remember one and clear
install match
visitor#set_memento
skip, path end = is done
next_path one.  found=true lazy=false state=active index=1 has_match=true
remember one and clear
install match
visitor#set_memento
skip, path end = is done
next_path one.  found=true lazy=false state=active index=0 has_match=true
remember one and clear
install match
visitor#set_memento
----- BEGIN before activate -----
content of history stack
    0: ONE_0    quantifier_id=0
    1: ONE_1    quantifier_id=0
    2: ONE_2    quantifier_id=0
    3: ZERO_3   quantifier_id=0
    4: ZERO_0   quantifier_id=1
    5: ZERO_0   quantifier_id=2
quantifier information
    0: active   found=no
    1: inactive found=no
    2: inactive found=no
----- END before activate -----
activate_next
found zero/one
found zero/one
found zero/one
activate_inactive
----- BEGIN after activate -----
content of history stack
    0: ONE_0    quantifier_id=0
    1: ONE_1    quantifier_id=0
    2: ONE_2    quantifier_id=0
    3: ZERO_3   quantifier_id=0
    4: ZERO_0   quantifier_id=1
    5: ZERO_0   quantifier_id=2
quantifier information
    0: done     found=no
    1: active   found=yes
    2: inactive found=no
----- END after activate -----
skip, path end = is done
next_path zero.  found=true lazy=false state=inactive index=0 has_match=false
skip, path end = not active
next_path zero.  found=true lazy=false state=active index=0 has_match=true
remember zero and clear
replace zero with one
visitor#set_state from active into active
visitor#set_memento
match NOT["\n"] at position 6
visitor#set_state from active into inactive
one-end
group_close  register=2
match "c" at position 7
path end = expected "c" but got "d"
check_integrity history.size=6
index-stack=[3, 1]
integrity "a . . . b . c d" (line 10)

next_path zero.  found=false lazy=false state=active index=1 has_match=true
clear history
replace zero with one
visitor#set_state from inactive into active
visitor#set_memento
match NOT["\n"] at position 7
visitor#set_state from active into inactive
one-end
group_close  register=2
match "c" at position 8
path end = expected "c" but got "x"
check_integrity history.size=7
index-stack=[3, 2]
integrity "a . . . b . . c d" (line 11)

next_path zero.  found=false lazy=false state=active index=2 has_match=true
clear history
replace zero with one
visitor#set_state from inactive into active
visitor#set_memento
match NOT["\n"] at position 8
visitor#set_state from active into inactive
one-end
group_close  register=2
match "c" at position 9
path end = expected "c" but got <EndOfInput>
check_integrity history.size=8
index-stack=[3, 3]
integrity "a . . . b . . . c d" (line 12)

next_path zero.  found=false lazy=false state=active index=3 has_match=true
clear history
replace zero with one
visitor#set_state from inactive into active
visitor#set_memento
match NOT["\n"] at position 9
path end = expected NOT["\n"] but got <EndOfInput>
check_integrity history.size=8
index-stack=[3, 4]
integrity "a . . . b . . . . c d" (line 13)

next_path one.  found=false lazy=false state=active index=3 has_match=true
clear history
install match
visitor#set_memento
skip, path end = is done
next_path one.  found=true lazy=false state=active index=2 has_match=true
remember one and clear
install match
visitor#set_memento
skip, path end = is done
next_path one.  found=true lazy=false state=active index=1 has_match=true
remember one and clear
install match
visitor#set_memento
skip, path end = is done
next_path one.  found=true lazy=false state=active index=0 has_match=true
remember one and clear
install match
visitor#set_memento
----- BEGIN before activate -----
content of history stack
    0: ONE_0    quantifier_id=0
    1: ONE_1    quantifier_id=0
    2: ONE_2    quantifier_id=0
    3: ZERO_3   quantifier_id=0
    4: ZERO_0   quantifier_id=1
    5: ZERO_0   quantifier_id=2
quantifier information
    0: active   found=no
    1: inactive found=no
    2: inactive found=no
----- END before activate -----
activate_next
found zero/one
found zero/one
found zero/one
activate_inactive
----- BEGIN after activate -----
content of history stack
    0: ONE_0    quantifier_id=0
    1: ONE_1    quantifier_id=0
    2: ONE_2    quantifier_id=0
    3: ZERO_3   quantifier_id=0
    4: ZERO_0   quantifier_id=1
    5: ZERO_0   quantifier_id=2
quantifier information
    0: done     found=no
    1: active   found=yes
    2: inactive found=no
----- END after activate -----
skip, path end = is done
next_path zero.  found=true lazy=false state=inactive index=0 has_match=false
skip, path end = not active
next_path zero.  found=true lazy=false state=active index=0 has_match=true
remember zero and clear
replace zero with one
visitor#set_state from active into active
visitor#set_memento
match NOT["\n"] at position 6
visitor#set_state from active into inactive
one-end
group_close  register=2
match "c" at position 7
path end = expected "c" but got "d"
check_integrity history.size=6
index-stack=[3, 1]
integrity "a . . . b . c d",  but expected "a . . . b c . d" at line 14.  ERROR!
E

Finished in 0.896336 seconds.

  1) Error:
test_verbose_repeat3(TestScanner):
RuntimeError: integrity: expected "a . . . b c . d", got "a . . . b . c d" at line 14
    /home/neoneye/kode/editor/projects/regexp_engine/regexp/scanner.rb:41:in `check_integrity'
    /home/neoneye/kode/editor/projects/regexp_engine/regexp/scanner_helpers.rb:533:in `path_end'
    /home/neoneye/kode/editor/projects/regexp_engine/regexp/scanner.rb:117:in `visit_match'
    /home/neoneye/kode/editor/projects/regexp_engine/regexp/scanner_nodes.rb:19:in `accept'
    /home/neoneye/kode/editor/projects/regexp_engine/regexp/scanner.rb:252:in `find_match_at'
    /home/neoneye/kode/editor/projects/regexp_engine/regexp/scanner.rb:262:in `match_impl'
    /home/neoneye/kode/editor/projects/regexp_engine/regexp/scanner.rb:260:in `loop'
    /home/neoneye/kode/editor/projects/regexp_engine/regexp/scanner.rb:268:in `match_impl'
    /home/neoneye/kode/editor/projects/regexp_engine/regexp/scanner.rb:272:in `match'
    /home/neoneye/kode/editor/projects/regexp_engine/regexp/scanner.rb:294:in `match_integrity'
    /home/neoneye/kode/editor/projects/regexp_engine/regexp/scanner.rb:294:in `call'
    /home/neoneye/kode/editor/projects/regexp_engine/regexp/scanner.rb:282:in `check_integrity'
    /home/neoneye/kode/editor/projects/regexp_engine/regexp/scanner.rb:294:in `match_integrity'
    ./common.rb:66:in `match_integrity'
    ./common.rb:83:in `assert_regex'
    ./match_mixins.rb:894:in `_debug_test_verbose_repeat3'
    (eval):5:in `test_verbose_repeat3'

86 tests, 85 assertions, 0 failures, 1 errors

