From jason.p.morrison at gmail.com Mon Feb 5 23:13:16 2007
From: jason.p.morrison at gmail.com (Jason Morrison)
Date: Mon, 5 Feb 2007 23:13:16 -0500
Subject: [grammarians] Error recovery / parsing partial source
Message-ID: <8ba0586d0702052013s7458d69ekaf6e2c09b7376abb@mail.gmail.com>

Hey there folks,

I was wondering if anyone could point me in the right direction. I'm
looking into parsing partial/invalid Ruby code, specifically for the
intent of type inference for code completion. (See
http://soc.jayunit.net... I've disappeared under school work for a
while, but am trying to reach the surface under the guise of a project
for my Language Processors/Compiler Construction course ;) The biggest
roadblock I'd hit upon, and would like to revisit, is the fact
that, inside an IDE, code completion (and therefore type inference) is
often requested at a point at which the code is not syntactically
valid. I.e.:

def foo
  if ( cond )
    puts myvar. # invoke completion

class MyKlass
  class << self; def method
    #... and so on

So we're in the middle of a method-send on the myvar expr, a puts call,
an if statement, and a method definition node, none of which are closed
before the MyKlass class declaration node is opened (or, equally often,
EOF is hit when we are ten nodes deep).

My knowledge of error hooks inside parsers is spotty at best (and
limited to error symbols in yacc-based Jay), so I appeal to you all: is
there research on heuristics in this area I can read up on? Is there
such error recovery functionality in C-Ruby's parse.y (I spy some nice
yyerror() calls in there, do they bubble up into JRuby's Jay parser?) or
Rubyfront's ruby.g (didn't see any, but I don't know ANTLR)?

If not, any thoughts on the idea? Is this something that can be
usefully resolved via error symbols? (Perhaps I am simply ignorant of
an error-recovery enabling flag in my trusty JRuby parsing calls, and
would be better served directing inquiry to the JRuby list?)
I'd rather not just blast an arbitrary number of close-parens and "end"s
at the cursor position until a valid parse surfaces, the
quickest-and-dirtiest solution that originally came to mind ; )

Thanks very much for any input!

Jason

--
Jason Morrison
jason.p.morrison at gmail.com
http://jayunit.net
(585) 216-5657

From charles.nutter at sun.com Tue Feb 6 00:21:39 2007
From: charles.nutter at sun.com (Charles Oliver Nutter)
Date: Mon, 05 Feb 2007 23:21:39 -0600
Subject: [grammarians] Error recovery / parsing partial source
In-Reply-To: <8ba0586d0702052013s7458d69ekaf6e2c09b7376abb@mail.gmail.com>
References: <8ba0586d0702052013s7458d69ekaf6e2c09b7376abb@mail.gmail.com>
Message-ID: <45C81063.4090205@sun.com>

Jason Morrison wrote:
> Hey there folks,

Hello there Jason! You live!

> I was wondering if anyone could point me in the right direction. I'm
> looking into parsing partial/invalid Ruby code, specifically for the
> intent of type inference for code completion. (See
> http://soc.jayunit.net... I've disappeared under school work for a
> while, but am trying to reach the surface under the guise of a project
> for my Language Processors/Compiler Construction course ;) The biggest
> roadblock I'd hit upon, and would like to revisit, is the fact
> that, inside an IDE, code completion (and therefore type inference) is
> often requested at a point at which the code is not syntactically
> valid. I.e.:
>
> def foo
>   if ( cond )
>     puts myvar. # invoke completion
>
> class MyKlass
>   class << self; def method
>     #... and so on

Oddly enough, I'm not a parser or grammar expert, but it's my
understanding that the parsers ANTLR generates can recover from parse
errors much more efficiently. Others on the list may be able to confirm
or refute that.
However the most efficient model I've heard about is called "parser
combinators". I have no knowledge of it other than that it combines many
"mini parsers" that individually handle smaller parts of the code, so
you can cut off parsing at any point or parse subsections of a file. To
my knowledge, there's no such parser for Ruby anywhere. ANTLR's probably
your best bet for now.

> If not, any thoughts on the idea? Is this something that can be
> usefully resolved via error symbols? (Perhaps I am simply ignorant of
> an error-recovery enabling flag in my trusty JRuby parsing calls, and
> would be better served directing inquiry to the JRuby list?) I'd rather
> not just blast an arbitrary number of close-parens and "end"s at the
> cursor position until a valid parse surfaces, the quickest-and-dirtiest
> solution that originally came to mind ; )

You're not missing anything in the JRuby parser. Most of the folks
working on editors/IDEs are using their own hacks to get around it, and
at least in the case of NetBeans I know Tor has used hacks plus
completion to limit the likelihood that a file will be unparseable, but
eventually there's a point that there's nothing you can do.

Our (grammarians) grammar is actually one contributed by the XRuby
project, and by most accounts it's the most complete ANTLR grammar yet
available. Xue has managed to get XRuby to parse and compile everything
in Ruby's "test.rb", which is no small feat. I believe this grammar
could probably use an update (Xue, is this true?).

I think we'd all love for someone to look at using Xue's grammar for
other things, and of course on the JRuby side I'm just interested in
potentially moving away from the YACC-based parser we have now. But it's
pretty scary, since to our knowledge there's no other 100% complete and
known perfect parser than the YACC-based versions.
- Charlie

From tom at infoether.com Tue Feb 6 06:18:50 2007
From: tom at infoether.com (Tom Copeland)
Date: Tue, 06 Feb 2007 06:18:50 -0500
Subject: [grammarians] Error recovery / parsing partial source
In-Reply-To: <8ba0586d0702052013s7458d69ekaf6e2c09b7376abb@mail.gmail.com>
References: <8ba0586d0702052013s7458d69ekaf6e2c09b7376abb@mail.gmail.com>
Message-ID: <1170760730.9220.25.camel@bugs.hal>

On Mon, 2007-02-05 at 23:13 -0500, Jason Morrison wrote:
> If not, any thoughts on the idea? Is this something that can be
> usefully resolved via error symbols? (Perhaps I am simply ignorant of
> an error-recovery enabling flag in my trusty JRuby parsing calls, and
> would be better served directing inquiry to the JRuby list?) I'd
> rather not just blast an arbitrary number of close-parens and "end"s
> at the cursor position until a valid parse surfaces, the
> quickest-and-dirtiest solution that originally came to mind ; )

I think that's pretty much one of the better solutions. Either you can
do "follow set recovery", where you know what tokens should come next to
close out the current nonterminal, or you can do "panic mode" where you
just slap on closing tokens and hope something works, or you
remove/replace tokens until it gets back to some known state. You've
probably already encountered the Burke-Fisher algorithm, about which I
am clueless but it seems to be good :-)

You might want to post a question about this to the compilers list (*);
all sorts of gurus hang out there and it's moderated by John Levine who
wrote yacc.
Yours,

Tom

(*) Search for 'compilers' here, not sure of a better way to get there:
http://lists.gurus.com/cgi-bin/mj_wwwusr/domain=lists.iecc.com

From zhixueyong at hotmail.com Tue Feb 6 08:03:00 2007
From: zhixueyong at hotmail.com (Xue Yong Zhi)
Date: Tue, 06 Feb 2007 08:03:00 -0500
Subject: [grammarians] Error recovery / parsing partial source
In-Reply-To: <45C81063.4090205@sun.com>
Message-ID:

>
>Oddly enough, I'm not a parser or grammar expert, but it's my
>understanding that the parsers ANTLR generates can recover from parse
>errors much more efficiently. Others on the list may be able to confirm
>or refute that. However the most efficient model I've heard about is
>called "parser combinators". I have no knowledge of it other than that
>it combines many "mini parsers" that individually handle smaller parts
>of the code, so you can cut off parsing at any point or parse
>subsections of a file. To my knowledge, there's no such parser for Ruby
>anywhere. ANTLR's probably your best bet for now.

Error recovery is not easy. You need to write lots of embedded actions
and things turn ugly really quick. Normally it is a good idea to be not
too picky in the parser (accept more if you can) and delay some error
checking to AST walking.

Antlr is good for error recovery (as it is the only one I am good at,
you may want to ask others for more opinions ;)). The best thing of
antlr is it generates human readable result which is close to what you
will write by hand. And Terence indicates antlr v3 has much better
error recovery.

>Our (grammarians) grammar is actually one contributed by the XRuby
>project, and by most accounts it's the most complete ANTLR grammar yet
>available. Xue has managed to get XRuby to parse and compile everything
>in Ruby's "test.rb", which is no small feat. I believe this grammar
>could probably use an update (Xue, is this true?).
>

Yes I did lots of cleanup but it causes some small problems as well.
I will update the repository on this project once everything comes back
together.

From tom at infoether.com Tue Feb 6 09:23:03 2007
From: tom at infoether.com (Tom Copeland)
Date: Tue, 06 Feb 2007 09:23:03 -0500
Subject: [grammarians] Error recovery / parsing partial source
In-Reply-To: <1170760730.9220.25.camel@bugs.hal>
References: <8ba0586d0702052013s7458d69ekaf6e2c09b7376abb@mail.gmail.com>
 <1170760730.9220.25.camel@bugs.hal>
Message-ID: <1170771783.9220.44.camel@bugs.hal>

On Tue, 2007-02-06 at 06:18 -0500, Tom Copeland wrote:
> the compilers list (*);
> all sorts of gurus hang out there and it's moderated by John Levine who
> wrote yacc.

Er, I should say, who wrote the O'Reilly "lex & yacc" book. Also
"Linkers and Loaders", which is great stuff as well...

Yours,

tom

From tom at infoether.com Tue Feb 6 09:25:06 2007
From: tom at infoether.com (Tom Copeland)
Date: Tue, 06 Feb 2007 09:25:06 -0500
Subject: [grammarians] Error recovery / parsing partial source
In-Reply-To:
References:
Message-ID: <1170771906.9220.46.camel@bugs.hal>

On Tue, 2007-02-06 at 08:03 -0500, Xue Yong Zhi wrote:
> Antlr is good for error recovery (as it is the only one I am good at, you
> may want to ask others for more opinions ;)). The best thing of antlr is it
> generates human readable result which is close to what you will write by
> hand. And Terence indicates antlr v3 has much better error recovery.

He's got an ANTLR book coming out from PragProg in a few months:

http://www.pragmaticprogrammer.com/titles/tpantlr/index.html

Should be pretty cool....

Yours,

Tom
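
The "slap on closing tokens" heuristic discussed in this thread can be
sketched roughly as below. This is only a toy illustration under loud
assumptions: it is not taken from JRuby, parse.y, or any ANTLR grammar,
the names OPENERS, CLOSERS, closers_for and patch are hypothetical, and
the scanner is a crude regex rather than a real Ruby lexer (it will
miscount modifier "if", keywords used as method names, and "#" inside
strings).

```ruby
# Toy sketch: work out which closing tokens would balance a snippet that
# has been cut off at the cursor. Not a real Ruby lexer.
OPENERS = {
  "def" => "end", "if" => "end", "class" => "end",
  "(" => ")"
}.freeze
CLOSERS = OPENERS.values.uniq.freeze

# Returns the closers still owed, innermost first.
def closers_for(source)
  stack = []
  source.each_line do |line|
    code = line.sub(/#.*/, "")               # drop comments (naive about strings)
    code.scan(/[A-Za-z_]\w*|[()]/) do |tok|
      if OPENERS.key?(tok)
        stack.push(OPENERS[tok])             # remember what must close this opener
      elsif CLOSERS.include?(tok)
        stack.pop if stack.last == tok       # matched closer; stray ones are ignored
      end
    end
  end
  stack.reverse
end

# Drops the dangling '.' of the unfinished method send, then appends the
# owed closers, one per line.
def patch(source)
  body = source.rstrip.sub(/\.\z/, "")
  ([body] + closers_for(source)).join("\n") + "\n"
end

partial = "def foo\n  if ( cond )\n    puts myvar."
p closers_for(partial)   # => ["end", "end"]
puts patch(partial)
```

The trailing '.' is removed before patching because an "end" placed
right after "myvar." would otherwise be read as a method name rather
than a block closer. The patched text could then be handed to whichever
parser the IDE already uses, with the completion point recorded
separately.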