From jeff at somethingsimilar.com Tue Feb 3 01:12:47 2009
From: jeff at somethingsimilar.com (Jeff Hodges)
Date: Mon, 2 Feb 2009 22:12:47 -0800
Subject: [Nokogiri-talk] flipping the bozo bit
Message-ID:

Hey,
I'm trying to add support for nokogiri's XML parser[1] to rfeedparser, a ruby translation of feedparser, but I've hit a snag.

You see, the 3000+ xml tests for feedparser expect a "bozo" bit to be flipped in the data structure returned if the parsed XML is not well-formed (i.e. tags are missing a '>', etc.). This is to provide the developer using it a handy way of detecting "bad" data. On top of that, the architecture of feedparser (and rfeedparser) depends on having a "strict" parser for well-formed XML and a "loose" parser (for ill-formed XML). rfeedparser manages to get both expat and libxml-ruby[2] to adhere to this just as they do in the python version.[3]

The problem I'm having is that I can't get nokogiri to fail on ill-formed XML! The 1500 or so ill-formed tests fail miserably when using my perfectly fine nokogiri SAX parser because nokogiri will not give up. Nothing turns up as either a warning or an error (i.e. nothing is passed to SAX::Parser#warning or SAX::Parser#error).

Is there some way to get nokogiri to either a) only work on well-formed XML or b) have it include some information on the well-formedness of the XML it is parsing? Perhaps there is something easy I've overlooked.

--
Jeff

[1] Currently, just the XML parsing. Once we jump the hurdle I explain here, I'll probably start looking at it for the "loose" things.
[2] Well, older versions of it. The 0.9.7 and 0.9.8 releases have been... finicky.
[3] "Manages" meaning "they blow up as expected and rfp cleans up the mess, so it has to have this architecture".
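The strict-then-loose flow described above can be sketched in plain Ruby. This is a minimal illustration under stated assumptions, not rfeedparser's actual code: `FeedResult`, `parse_feed`, and the two stand-in parsers `STRICT` and `LOOSE` are hypothetical, and the "strict" stand-in merely checks that angle brackets balance.

```ruby
# Hypothetical result object: the parsed data plus the "bozo" bit and
# the exception that made the strict pass fail (if any).
FeedResult = Struct.new(:data, :bozo, :bozo_exception)

# Try the strict parser first; if it raises, flip the bozo bit and
# re-parse with the loose parser. Only StandardError is rescued here,
# the same set of exceptions a bare `rescue` would catch.
def parse_feed(xml, strict:, loose:)
  FeedResult.new(strict.call(xml), false, nil)
rescue StandardError => e
  FeedResult.new(loose.call(xml), true, e)
end

# Stand-in parsers for illustration only (not real XML parsers):
# the "strict" one rejects input whose angle brackets do not balance,
# the "loose" one never fails and just collects opening tag names.
STRICT = lambda do |xml|
  raise ArgumentError, "not well-formed" unless xml.count("<") == xml.count(">")
  xml.scan(/<(\w+)/).flatten
end
LOOSE = ->(xml) { xml.scan(/<(\w+)/).flatten }

good = parse_feed("<feed><title>ok</title></feed>", strict: STRICT, loose: LOOSE)
bad  = parse_feed("<feed><title>oops</title</feed>", strict: STRICT, loose: LOOSE)

puts good.bozo # prints false
puts bad.bozo  # prints true
```

The design choice is that the bozo bit records *that* the strict pass failed (and why, via `bozo_exception`) while the caller still gets the loose result.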

From aaron.patterson at gmail.com Tue Feb 3 11:49:00 2009
From: aaron.patterson at gmail.com (Aaron Patterson)
Date: Tue, 3 Feb 2009 08:49:00 -0800
Subject: [Nokogiri-talk] flipping the bozo bit
In-Reply-To:
References:
Message-ID: <6959e1680902030849l3e10723at5b7810d7615f9827@mail.gmail.com>

Hi Jeff,

On Mon, Feb 2, 2009 at 10:12 PM, Jeff Hodges wrote:
> Hey,
> I'm trying to add support for nokogiri's XML parser[1] to rfeedparser, a ruby translation of feedparser, but I've hit a snag.
>
> You see, the 3000+ xml tests for feedparser expect a "bozo" bit to be flipped in the data structure returned if the parsed XML is not well-formed (i.e. tags are missing a '>', etc.). This is to provide the developer using it a handy way of detecting "bad" data. On top of that, the architecture of feedparser (and rfeedparser) depends on having a "strict" parser for well-formed XML and a "loose" parser (for ill-formed XML). rfeedparser manages to get both expat and libxml-ruby[2] to adhere to this just as they do in the python version.[3]

Do you want to be notified of bad XML, or actually blow up on bad XML, or both?

> The problem I'm having is that I can't get nokogiri to fail on ill-formed XML! The 1500 or so ill-formed tests fail miserably when using my perfectly fine nokogiri SAX parser because nokogiri will not give up. Nothing turns up as either a warning or an error (i.e. nothing is passed to SAX::Parser#warning or SAX::Parser#error).

Yes. Nokogiri is the hardest working XML parser in show business. It will parse anything! ;-)

> Is there some way to get nokogiri to either a) only work on well-formed XML or b) have it include some information on the well-formedness of the XML it is parsing? Perhaps there is something easy I've overlooked.

I do have something for you.

You can pass options to the DOM parser as to how strict you'd like to be. Passing in 0 is most strict.
This will blow up:

doc = Nokogiri::XML('', nil, nil, 0)

For all the options, check out the constants here:

http://nokogiri.rubyforge.org/nokogiri/classes/Nokogiri/XML.html

Just bitwise OR the constants together to set options. (Sorry the RDoc is broken; I've worked with Eric to fix up rdoc bugs and it should be nicer next release.)

If you'd just like to be *notified* of parse errors, and not blow up, there is a callback you can set:

Nokogiri.error_handler = lambda { |syntax_error| puts syntax_error.level }
doc = Nokogiri::XML('')

You can use that same callback for SAX documents. IIRC, the warning and error callbacks are just in libxml2 to tease us. I included them for completeness, but I've never been able to get them to fire.

One thing to watch out for..... That lambda is not thread safe. It's easy to fix for SAX documents, but I'm not sure what to do when DOM parsing. I can tie that error callback to a context, I just don't know what to tie it to. Thread.current? Any suggestions would be greatly appreciated. :-)

I hope that helps!

--
Aaron Patterson
http://tenderlovemaking.com/

From aaron.patterson at gmail.com Tue Feb 3 11:55:50 2009
From: aaron.patterson at gmail.com (Aaron Patterson)
Date: Tue, 3 Feb 2009 08:55:50 -0800
Subject: [Nokogiri-talk] flipping the bozo bit
In-Reply-To: <6959e1680902030849l3e10723at5b7810d7615f9827@mail.gmail.com>
References: <6959e1680902030849l3e10723at5b7810d7615f9827@mail.gmail.com>
Message-ID: <6959e1680902030855x3821c744p8c6b38a98a6d4f@mail.gmail.com>

On Tue, Feb 3, 2009 at 8:49 AM, Aaron Patterson wrote:
> I hope that helps!

Actually, any syntax suggestions on any of the error handling would be great. In my use cases, I want nokogiri to recover the document no matter what. Because of that, I haven't concerned myself too much with handling parse errors (since I don't get them). I think that is an area where I could make some nice improvements.

--
Aaron Patterson
http://tenderlovemaking.com/

From jeff at somethingsimilar.com Tue Feb 3 13:39:53 2009
From: jeff at somethingsimilar.com (Jeff Hodges)
Date: Tue, 3 Feb 2009 10:39:53 -0800
Subject: [Nokogiri-talk] flipping the bozo bit
In-Reply-To: <6959e1680902030849l3e10723at5b7810d7615f9827@mail.gmail.com>
References: <6959e1680902030849l3e10723at5b7810d7615f9827@mail.gmail.com>
Message-ID:

On Tue, Feb 3, 2009 at 8:49 AM, Aaron Patterson wrote:
> You can pass options to the DOM parser as to how strict you'd like to be. Passing in 0 is most strict. This will blow up:
>
> doc = Nokogiri::XML('', nil, nil, 0)

This looks to be what I want. However, I can't find the SAX::Parser or SAX::Document methods that would allow me to do this. Actually, this is related to another problem I found: there appears to be no way to tell a SAX::Parser or SAX::Document what encoding to parse as, given that I will always be passing a string to be parsed (meaning SAX::Parser#parse is the appropriate method to call).

Maybe there were supposed to be some other options on SAX::Parser#parse, #parse_io and #parse_file and they just were accidentally left out?
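The fourth argument in the `Nokogiri::XML('', nil, nil, 0)` example above is a set of parse-option bit flags: 0 means no flags are set (the strict case), and flags are combined with bitwise OR. A minimal sketch, assuming only that the numeric values match libxml2's `xmlParserOption` enum; the `ParseFlags` module and its constant names are defined locally for illustration and are not Nokogiri's own constants.

```ruby
# Local constants mirroring libxml2's xmlParserOption bit values.
# Assumption: these stand in for whatever option constants Nokogiri
# exposes; only the numeric values are taken from libxml2.
module ParseFlags
  STRICT    = 0    # no flags at all
  RECOVER   = 1    # XML_PARSE_RECOVER: keep going on malformed input
  NOERROR   = 32   # XML_PARSE_NOERROR: suppress error reports
  NOWARNING = 64   # XML_PARSE_NOWARNING: suppress warning reports
  NONET     = 2048 # XML_PARSE_NONET: forbid network access

  # True when every bit of +flag+ is present in +options+.
  def self.set?(options, flag)
    flag != 0 && (options & flag) == flag
  end
end

# Combine flags with bitwise OR; test for one with bitwise AND.
lenient = ParseFlags::RECOVER | ParseFlags::NOERROR | ParseFlags::NOWARNING

puts lenient                                                  # prints 97
puts ParseFlags.set?(lenient, ParseFlags::RECOVER)            # prints true
puts ParseFlags.set?(ParseFlags::STRICT, ParseFlags::RECOVER) # prints false
```

The resulting integer is what would go in the fourth position of a call shaped like `Nokogiri::XML(string, nil, nil, options)`, following the call Aaron shows.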

--
Jeff

From aaron.patterson at gmail.com Tue Feb 3 14:30:56 2009
From: aaron.patterson at gmail.com (Aaron Patterson)
Date: Tue, 3 Feb 2009 11:30:56 -0800
Subject: [Nokogiri-talk] flipping the bozo bit
In-Reply-To:
References: <6959e1680902030849l3e10723at5b7810d7615f9827@mail.gmail.com>
Message-ID: <6959e1680902031130td954c87hf6e48d93f9d29716@mail.gmail.com>

On Tue, Feb 3, 2009 at 10:39 AM, Jeff Hodges wrote:
> On Tue, Feb 3, 2009 at 8:49 AM, Aaron Patterson wrote:
>> You can pass options to the DOM parser as to how strict you'd like to be. Passing in 0 is most strict. This will blow up:
>>
>> doc = Nokogiri::XML('', nil, nil, 0)
>
> This looks to be what I want. However, I can't find the SAX::Parser or SAX::Document methods that would allow me to do this. Actually, this is related to another problem I found: there appears to be no way to tell a SAX::Parser or SAX::Document what encoding to parse as, given that I will always be passing a string to be parsed (meaning SAX::Parser#parse is the appropriate method to call).

If you set the lambda, it still gets called on SAX parse errors. In there, you can choose to raise or do whatever:

Nokogiri.error_handler = lambda { |syntax_error| raise "Damn!" }
Nokogiri::XML::SAX::Parser.new.parse('')

> Maybe there were supposed to be some other options on SAX::Parser#parse, #parse_io and #parse_file and they just were accidentally left out?

SAX::Parser.parse_io takes an encoding. The encoding is a number that maps to these constants:

http://xmlsoft.org/html/libxml-encoding.html#xmlCharEncoding

Yuck. I should document that.... :-(

I'm going to have to look into setting the encoding for in-memory parsing.... The function I'm using doesn't take any encoding options, but there must be a way to set them.
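Because the error-handler lambda in this thread is a single module-wide setting (Aaron notes it is not thread safe and floats `Thread.current` as a possible context), one hypothetical shape for that idea is a dispatcher that looks up a handler stored per thread, plus a helper that scopes a collecting handler to a block so the caller can decide afterwards whether to flip the bozo bit. Everything here (`ErrorDispatch`, `collect_errors`, `FakeError`) is illustrative and not Nokogiri's API; the numeric levels follow libxml2's `xmlErrorLevel` (1 = warning, 2 = error, 3 = fatal).

```ruby
# Stand-in for the error object passed to the handler; the `level`
# field follows the `syntax_error.level` used earlier in this thread.
FakeError = Struct.new(:level, :message)

# Hypothetical per-thread error dispatch (not Nokogiri's API).
module ErrorDispatch
  WARNING, ERROR, FATAL = 1, 2, 3 # libxml2 xmlErrorLevel values

  # Parser glue would call this for every error it sees. Thread.current[]
  # storage is fiber-local, so each thread only reaches its own handler.
  def self.dispatch(err)
    Thread.current[:xml_error_handler]&.call(err)
  end

  # Install a collecting handler for the duration of the block, then
  # restore whatever handler was there before (even if the block raises).
  def self.collect_errors
    previous = Thread.current[:xml_error_handler]
    errors = []
    Thread.current[:xml_error_handler] = ->(e) { errors << e }
    [yield, errors]
  ensure
    Thread.current[:xml_error_handler] = previous
  end
end

# Two concurrent "parses", each reporting different errors.
inputs = [[FakeError.new(1, "warning")], [FakeError.new(3, "fatal")]]
results = inputs.map do |errs|
  Thread.new do
    ErrorDispatch.collect_errors do
      errs.each { |e| ErrorDispatch.dispatch(e) }
      :parsed
    end
  end
end.map(&:value)

results.each do |value, errors|
  bozo  = errors.any? { |e| e.level >= ErrorDispatch::ERROR }
  fatal = errors.any? { |e| e.level == ErrorDispatch::FATAL }
  puts "#{value}: #{errors.size} error(s), bozo=#{bozo}, fatal=#{fatal}"
end
```

In this sketch only errors at level 2 or above flip the bozo bit; whether warnings should count too is a policy choice left to the caller.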
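Since the encoding argument Aaron describes is one of libxml2's numeric `xmlCharEncoding` values rather than a name, a small lookup table can keep those magic numbers out of calling code. A sketch under that assumption: the table covers only a few entries, their values are taken from libxml2's `encoding.h`, and the helper name `libxml_encoding_number` is hypothetical.

```ruby
# A few libxml2 xmlCharEncoding values (from libxml2's encoding.h).
LIBXML_CHAR_ENCODINGS = {
  "UTF-8"      => 1,  # XML_CHAR_ENCODING_UTF8
  "UTF-16LE"   => 2,  # XML_CHAR_ENCODING_UTF16LE
  "UTF-16BE"   => 3,  # XML_CHAR_ENCODING_UTF16BE
  "ISO-8859-1" => 10  # XML_CHAR_ENCODING_8859_1
}.freeze

# Hypothetical helper: map an encoding name or Ruby Encoding object to
# the libxml2 number, falling back to 0 (XML_CHAR_ENCODING_NONE, i.e.
# no encoding specified) for anything outside the small table above.
def libxml_encoding_number(name)
  LIBXML_CHAR_ENCODINGS.fetch(name.to_s.upcase, 0)
end

puts libxml_encoding_number("utf-8")            # prints 1
puts libxml_encoding_number(Encoding::UTF_16BE) # prints 3
puts libxml_encoding_number("Shift_JIS")        # prints 0
```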

--
Aaron Patterson
http://tenderlovemaking.com/

From timcharper at gmail.com Tue Feb 3 16:05:18 2009
From: timcharper at gmail.com (Tim Harper)
Date: Tue, 3 Feb 2009 14:05:18 -0700
Subject: [Nokogiri-talk] How to select child elements only from a Nokogiri node
Message-ID:

This gist paste says most of it:
http://gist.github.com/57751

Given a table, I'm trying to select all of the rows in that table without selecting rows from nested tables. Historically in Hpricot, I would just use (element / "> tr"). However, nokogiri doesn't like that syntax. Additionally, I can't seem to use the xpath selector either: (element / "//tr") is selecting all of the rows, nested ones included.
Am I going completely the wrong route? Or is this a feature that's planned to be implemented at some point?

Thanks :)
Tim

From aaron.patterson at gmail.com Tue Feb 3 16:25:41 2009
From: aaron.patterson at gmail.com (Aaron Patterson)
Date: Tue, 3 Feb 2009 13:25:41 -0800
Subject: [Nokogiri-talk] How to select child elements only from a Nokogiri node
In-Reply-To:
References:
Message-ID: <6959e1680902031325m51aa5d6bq7d35c2c44c979aa4@mail.gmail.com>

On Tue, Feb 3, 2009 at 1:05 PM, Tim Harper wrote:
> This gist paste says most of it:
> http://gist.github.com/57751
>
> Given a table, I'm trying to select all of the rows in that table without selecting rows from nested tables. Historically in Hpricot, I would just use (element / "> tr"). However, nokogiri doesn't like that syntax. Additionally, I can't seem to use the xpath selector either: (element / "//tr") is selecting all of the rows, nested ones included.
> Am I going completely the wrong route? Or is this a feature that's planned to be implemented at some point?

Actually, I consider this to be broken behavior in hpricot. Using CSS, you're saying "find all tr tags which are descendants of this reference node". Since your reference node is the top level "table" tag, it finds all four descendants.

Your XPath query says "find all nodes starting at the root whose name is 'tr'". If you start your XPath with a slash, it *always* means "from the root node". If you want relative queries in XPath, start with a dot: ".//tr".

Just try to think of how you might write the CSS selector when dealing with your web browser. How would you expect it to behave with the browser? That is how it should work with nokogiri. We try to match browser behavior as closely as possible.

I've forked your gist to illustrate:

http://gist.github.com/57768

Hope that helps!

--
Aaron Patterson
http://tenderlovemaking.com/

From timcharper at gmail.com Tue Feb 3 17:07:26 2009
From: timcharper at gmail.com (Tim Harper)
Date: Tue, 3 Feb 2009 15:07:26 -0700
Subject: [Nokogiri-talk] How to select child elements only from a Nokogiri node
In-Reply-To: <6959e1680902031325m51aa5d6bq7d35c2c44c979aa4@mail.gmail.com>
References: <6959e1680902031325m51aa5d6bq7d35c2c44c979aa4@mail.gmail.com>
Message-ID:

Thanks for your response :)
I don't think I explained my question well enough - sorry about that. And there was a problem with my example which showed Hpricot behavior to be opposite of what it does in the real world. Both of the examples you provided as a solution returned the rows from the nested table, but what I'm trying to get is the rows from the root table.
I forked and updated the gist, adding some comments and fixing an issue with the original:
http://gist.github.com/57787
I understand your point about what should be valid in css, and yes, naturally, you wouldn't use a css selector starting with a >. But you also wouldn't be evaluating a css selector against a node in the document somewhere (you always start from root). For example:
Nokogiri::HTML.parse("table#users > tr")
Nokogiri::HTML.parse("table#users") / " > tr"
In the latter example, I see the node found by Nokogiri as a direct substitute for "table#users", and there should be some way to select child elements from the node without recursing deeper.
I hope I'm explaining myself better. Thank you again for your quick response!
Tim

On Tue, Feb 3, 2009 at 2:25 PM, Aaron Patterson wrote:
> Hope that helps!
>
> --
> Aaron Patterson
> http://tenderlovemaking.com/
> _______________________________________________
> Nokogiri-talk mailing list
> Nokogiri-talk at rubyforge.org
> http://rubyforge.org/mailman/listinfo/nokogiri-talk

From jeff at somethingsimilar.com Tue Feb 3 17:37:50 2009
From: jeff at somethingsimilar.com (Jeff Hodges)
Date: Tue, 3 Feb 2009 14:37:50 -0800
Subject: [Nokogiri-talk] flipping the bozo bit
In-Reply-To: <6959e1680902031130td954c87hf6e48d93f9d29716@mail.gmail.com>
References: <6959e1680902030849l3e10723at5b7810d7615f9827@mail.gmail.com> <6959e1680902031130td954c87hf6e48d93f9d29716@mail.gmail.com>
Message-ID:

That's fine, but I had hoped to replace hpricot with nokogiri for the HTML parsing as well in the near future. Is there some distinguishing characteristic between HTML parse errors and XML parse errors that would be passed to that lambda? And is there a distinguishing characteristic between recoverable and non-recoverable parse errors? Modifying a module-wide variable when I'll be doing multiple kinds of parses is kind of icky.

--
Jeff

On Tue, Feb 3, 2009 at 11:30 AM, Aaron Patterson wrote:
> If you set the lambda, it still gets called on SAX parse errors. In there, you can choose to raise or do whatever:
>
> Nokogiri.error_handler = lambda { |syntax_error| raise "Damn!" }
> Nokogiri::XML::SAX::Parser.new.parse('')

From aaron.patterson at gmail.com Tue Feb 3 17:38:51 2009
From: aaron.patterson at gmail.com (Aaron Patterson)
Date: Tue, 3 Feb 2009 14:38:51 -0800
Subject: [Nokogiri-talk] How to select child elements only from a Nokogiri node
In-Reply-To:
References: <6959e1680902031325m51aa5d6bq7d35c2c44c979aa4@mail.gmail.com>
Message-ID: <6959e1680902031438y549cf86cn45797f5ccd2907a6@mail.gmail.com>

On Tue, Feb 3, 2009 at 2:07 PM, Tim Harper wrote:
> But you also wouldn't be evaluating a css selector against a node in the document somewhere (you always start from root). For example:
> Nokogiri::HTML.parse("table#users > tr")
> Nokogiri::HTML.parse("table#users") / " > tr"
> In the latter example, I see the node found by Nokogiri as a direct substitute for "table#users", and there should be some way to select child elements from the node without recursing deeper. I hope I'm explaining myself better.

Okay, I think I understand a little better. In order to support this syntax:

doc.css("table#users").css(" > tr")

We would have to keep track of the previously used CSS selector. That doesn't sound like fun.....

For now, you could do this:

doc.css('table#users').xpath('./tr')

That will select immediate children whose name is "tr".

--
Aaron Patterson
http://tenderlovemaking.com/

From timcharper at gmail.com Tue Feb 3 18:03:04 2009
From: timcharper at gmail.com (Tim Harper)
Date: Tue, 3 Feb 2009 16:03:04 -0700
Subject: [Nokogiri-talk] How to select child elements only from a Nokogiri node
In-Reply-To: <6959e1680902031438y549cf86cn45797f5ccd2907a6@mail.gmail.com>
References: <6959e1680902031325m51aa5d6bq7d35c2c44c979aa4@mail.gmail.com> <6959e1680902031438y549cf86cn45797f5ccd2907a6@mail.gmail.com>
Message-ID:

OK - thank you for your reply :) We'll work around it for now.

Tim
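The difference between `./tr`, `.//tr`, and `//tr` discussed in this thread can be checked without Nokogiri. The sketch below uses Ruby's REXML purely as a stand-in XPath engine (an assumption: it ships with Ruby, as a bundled gem since Ruby 3.0) on a small made-up nested table; the XPath expressions themselves are standard and are the same strings one would hand to Nokogiri's `xpath`.

```ruby
require "rexml/document"

# A made-up nested table: two outer rows, the second of which holds an
# inner table with two rows of its own.
xml = <<~XML
  <table id="users">
    <tr><td>alice</td></tr>
    <tr><td>
      <table>
        <tr><td>x</td></tr>
        <tr><td>y</td></tr>
      </table>
    </td></tr>
  </table>
XML

doc   = REXML::Document.new(xml)
outer = doc.root
inner = REXML::XPath.first(outer, ".//table")

# Number of nodes an XPath expression selects from a given context node.
count = ->(node, xpath) { REXML::XPath.match(node, xpath).size }

puts count.call(outer, "//tr")  # absolute: every tr in the document (4)
puts count.call(outer, ".//tr") # relative descendants: nested rows too (4)
puts count.call(outer, "./tr")  # relative children only: outer rows (2)
puts count.call(inner, "//tr")  # absolute ignores the context node (4)
puts count.call(inner, ".//tr") # relative stays inside the inner table (2)
```

Only `./tr` gives Tim's "rows of this table but not of nested tables". One caveat worth checking on real HTML: browser-style HTML5 parsers insert an implicit `tbody`, in which case the rows are no longer direct children of `table`.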

From jeff at somethingsimilar.com Tue Feb 3 23:27:16 2009
From: jeff at somethingsimilar.com (Jeff Hodges)
Date: Tue, 3 Feb 2009 20:27:16 -0800
Subject: [Nokogiri-talk] [PATCH] no Exceptions and a momma SyntaxError
Message-ID:

I discovered that Nokogiri's SyntaxError classes inherit from Ruby's own ::SyntaxError. This has the unfortunate side effect of causing SyntaxErrors generated from slightly broken XML, etc. to be raised as non-StandardError exceptions that a plain "rescue" will not catch[1]. I've put up a branch (well, two) to fix this.
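Ruby's exception hierarchy is what makes this bite: `::SyntaxError` descends from `ScriptError`, not `StandardError`, and a bare `rescue` only catches `StandardError` and its subclasses. A self-contained sketch with hypothetical `Demo::` class names (not Nokogiri's real classes) showing both the problem and a single-base-class layout like the one Jeff describes for his second branch:

```ruby
module Demo
  # Old layout: the library's parse error subclasses Ruby's own
  # ::SyntaxError, which is a ScriptError rather than a StandardError.
  class OldSyntaxError < ::SyntaxError; end

  # Single-base-class layout: one library-wide error under StandardError,
  # with per-component subclasses underneath it.
  class SyntaxError < ::StandardError; end

  module XML
    class SyntaxError < Demo::SyntaxError; end
  end

  module CSS
    class SyntaxError < Demo::SyntaxError; end
  end
end

# Report whether a bare `rescue` caught an exception of the given class.
def bare_rescue(klass)
  begin
    raise klass, "bad markup"
  rescue => e # bare rescue: StandardError and its subclasses only
    "rescued #{e.class}"
  end
rescue Exception => e # catch-all here only to observe what escaped
  "escaped bare rescue: #{e.class}"
end

puts bare_rescue(Demo::OldSyntaxError)   # prints escaped bare rescue: Demo::OldSyntaxError
puts bare_rescue(Demo::XML::SyntaxError) # prints rescued Demo::XML::SyntaxError

# With the shared base class, one rescue clause covers every component.
[Demo::XML::SyntaxError, Demo::CSS::SyntaxError].each do |klass|
  begin
    raise klass, "bad input"
  rescue Demo::SyntaxError => e
    puts "caught #{e.class}"
  end
end
```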

The first branch[2] just fixes the problem of raising Exceptions instead of StandardErrors by swapping out "< ::SyntaxError" for "< ::StandardError" and rb_eSyntaxError for rb_eStandardError. This is fine, but one of the benefits of inheriting from ::SyntaxError is that you can catch all the Nokogiri SyntaxErrors with one "rescue SyntaxError".

This leads us to the second branch[3]. The second branch has these SyntaxErrors all inheriting from one error class, Nokogiri::SyntaxError, giving us that nice little rescue statement back. This second branch might be overkill. The first branch might be too little. Both might cause people's hair to catch aflame. So, I left them separate.

Comments?

--
Jeff

[1] Per usual, this is troublesome for rfeedparser, and I found it while working with Nokogiri#error_handler. Okay, not a lot of trouble, but really, this is a problem for lots of code.
[2] http://github.com/jmhodges/nokogiri/tree/no_exceptions
[3] http://github.com/jmhodges/nokogiri/tree/combined_syntax_error

From jeff at somethingsimilar.com Wed Feb 4 01:42:04 2009
From: jeff at somethingsimilar.com (Jeff Hodges)
Date: Tue, 3 Feb 2009 22:42:04 -0800
Subject: [Nokogiri-talk] [PATCH] no Exceptions and a momma SyntaxError
In-Reply-To:
References:
Message-ID:

Minor update: Forgot to update the Manifest.txt. Pull again if you need to.

Nobody cares but me: rfeedparser now works with nokogiri with either the no_exceptions or combined_syntax_error patches applied. If or when those patches are accepted and released, a new version of rfp goes out, too. Yay! No more libxml-ruby causing drama!

--
Jeff

On Tue, Feb 3, 2009 at 8:27 PM, Jeff Hodges wrote:
> I discovered that Nokogiri's SyntaxError classes inherit from Ruby's own ::SyntaxError. This has the unfortunate side effect of causing SyntaxErrors generated from slightly broken XML, etc. to be raised as non-StandardError exceptions that a plain "rescue" will not catch[1]. I've put up a branch (well, two) to fix this.

From mike.dalessio at gmail.com Wed Feb 4 11:20:54 2009
From: mike.dalessio at gmail.com (Mike Dalessio)
Date: Wed, 4 Feb 2009 08:20:54 -0800
Subject: [Nokogiri-talk] [PATCH] no Exceptions and a momma SyntaxError
In-Reply-To:
References:
Message-ID: <618c07250902040820s652564cdp72f5b791ba29ac35@mail.gmail.com>

Jeff -

I like! I'll talk to Aaron about which patch he'd prefer. Thanks so much!

-mike

On Tue, Feb 3, 2009 at 10:42 PM, Jeff Hodges wrote:
> Minor update: Forgot to update the Manifest.txt. Pull again if you need to.

--
mike dalessio
mike at csa.net

From aaron.patterson at gmail.com Wed Feb 4 12:26:21 2009
From: aaron.patterson at gmail.com (Aaron Patterson)
Date: Wed, 4 Feb 2009 09:26:21 -0800
Subject: [Nokogiri-talk] [PATCH] no Exceptions and a momma SyntaxError
In-Reply-To: <618c07250902040820s652564cdp72f5b791ba29ac35@mail.gmail.com>
References: <618c07250902040820s652564cdp72f5b791ba29ac35@mail.gmail.com>
Message-ID: <6959e1680902040926t26186b5dlf6737ed9f0b7dbc1@mail.gmail.com>

On Wed, Feb 4, 2009 at 8:20 AM, Mike Dalessio wrote:
> Jeff -
>
> I like! I'll talk to Aaron about which patch he'd prefer. Thanks so much!

I am lazy, and I like the patches. I'm adding jeff to the collaborators list.

Jeff you may merge it yourself. :-)

--
Aaron Patterson
http://tenderlovemaking.com/

From jeff at somethingsimilar.com Wed Feb 4 12:29:10 2009
From: jeff at somethingsimilar.com (Jeff Hodges)
Date: Wed, 4 Feb 2009 09:29:10 -0800
Subject: [Nokogiri-talk] [PATCH] no Exceptions and a momma SyntaxError
In-Reply-To: <6959e1680902040926t26186b5dlf6737ed9f0b7dbc1@mail.gmail.com>
References: <618c07250902040820s652564cdp72f5b791ba29ac35@mail.gmail.com> <6959e1680902040926t26186b5dlf6737ed9f0b7dbc1@mail.gmail.com>
Message-ID:

Fantastic.

On Wed, Feb 4, 2009 at 9:26 AM, Aaron Patterson wrote:
> I am lazy, and I like the patches. I'm adding jeff to the collaborators list.
>
> Jeff you may merge it yourself. :-)

From jeff at somethingsimilar.com Thu Feb 5 16:50:10 2009
From: jeff at somethingsimilar.com (Jeff Hodges)
Date: Thu, 5 Feb 2009 13:50:10 -0800
Subject: [Nokogiri-talk] what's left for 1.1.2?
Message-ID:

Hey,

What all is left to do before the 1.1.2 release? I checked out the lighthouse for nokogiri, and didn't see anything. I'm guessing that the push parser still needs some love?

--
Jeff

From aaron.patterson at gmail.com Thu Feb 5 18:48:01 2009 From: aaron.patterson at gmail.com (Aaron Patterson) Date: Thu, 5 Feb 2009 15:48:01 -0800 Subject: [Nokogiri-talk] what's left for 1.1.2? In-Reply-To: References: Message-ID: <6959e1680902051548m6d077582k5e7e7272ae69d26b@mail.gmail.com>

On Thu, Feb 5, 2009 at 1:50 PM, Jeff Hodges wrote:
> Hey, > What all is left to do before the 1.1.2 release? I checked out the > lighthouse for nokogiri, and didn't see anything. I'm guessing that > the push parser still needs some love?

Nope. The push parser is done. The next release is actually going to be 1.2.0. I need to delete the 1.1.2 milestone.

I'm planning on releasing on the 7th. I wanted to squash a couple build bugs, but I simply can't reproduce them, and unfortunately I can't get anyone in person to reproduce them.

So! Look for 1.2.0 this weekend. :-)

--
Aaron Patterson
http://tenderlovemaking.com/

From jeff at somethingsimilar.com Thu Feb 5 21:49:53 2009 From: jeff at somethingsimilar.com (Jeff Hodges) Date: Thu, 5 Feb 2009 18:49:53 -0800 Subject: [Nokogiri-talk] what's left for 1.1.2? In-Reply-To: <6959e1680902051548m6d077582k5e7e7272ae69d26b@mail.gmail.com> References: <6959e1680902051548m6d077582k5e7e7272ae69d26b@mail.gmail.com> Message-ID:

Cool. Christ, that is an ugly bug. Did Jacque ever let you see his laptop to reproduce it?

--
Jeff

On Thu, Feb 5, 2009 at 3:48 PM, Aaron Patterson wrote:
> On Thu, Feb 5, 2009 at 1:50 PM, Jeff Hodges wrote: >> Hey, >> What all is left to do before the 1.1.2 release? I checked out the >> lighthouse for nokogiri, and didn't see anything. I'm guessing that >> the push parser still needs some love? > > Nope. The push parser is done. The next release is actually going to > be 1.2.0. I need to delete the 1.1.2 milestone.
> > I'm planning on releasing on the 7th. I wanted to squash a couple > build bugs, but I simply can't reproduce them, and unfortunately I > can't get anyone in person to reproduce them. > > So! Look for 1.2.0 this weekend. :-) > > -- > Aaron Patterson > http://tenderlovemaking.com/

From aaron.patterson at gmail.com Thu Feb 5 22:39:55 2009 From: aaron.patterson at gmail.com (Aaron Patterson) Date: Thu, 5 Feb 2009 19:39:55 -0800 Subject: [Nokogiri-talk] what's left for 1.1.2? In-Reply-To: References: <6959e1680902051548m6d077582k5e7e7272ae69d26b@mail.gmail.com> Message-ID: <6959e1680902051939u4edcb653l7bf74da11601e69a@mail.gmail.com>

On Thu, Feb 5, 2009 at 6:49 PM, Jeff Hodges wrote:
> Cool. Christ, that is an ugly bug. Did Jacque ever let you see his > laptop to reproduce it?

No... He hasn't come to nerd club yet.

People have also had this problem (which I cannot reproduce):

http://nokogiri.lighthouseapp.com/projects/19607/tickets/7-mac-native-bundle-not-loading

Right now, I am working on a gem that will help report bugs in gems. ugh. It would be nice if someone getting these errors could try solving the issue. It's very hard for me to remote debug them! :-(

--
Aaron Patterson
http://tenderlovemaking.com/

From jeff at somethingsimilar.com Fri Feb 6 01:49:09 2009 From: jeff at somethingsimilar.com (Jeff Hodges) Date: Thu, 5 Feb 2009 22:49:09 -0800 Subject: [Nokogiri-talk] what's left for 1.1.2? In-Reply-To: <6959e1680902051939u4edcb653l7bf74da11601e69a@mail.gmail.com> References: <6959e1680902051548m6d077582k5e7e7272ae69d26b@mail.gmail.com> <6959e1680902051939u4edcb653l7bf74da11601e69a@mail.gmail.com> Message-ID:

I've spent a couple of hours just now trying to get either one of these to happen. I give. A release this weekend sounds great.
I'm waiting on it to push up the new rfeedparser with nokogiri as the default strict parser.

If I can decode __xmlRaiseError in libxml2, and get it to play nice with nokogiri, I'll have nokogiri everywhere in rfp in a couple of weeks.

P.S. To anyone reading this: if you're writing in C and your function declaration is 6 lines long and the function itself is 180 goddamn lines long, you have fucked up.

--
Jeff

On Thu, Feb 5, 2009 at 7:39 PM, Aaron Patterson wrote:
> On Thu, Feb 5, 2009 at 6:49 PM, Jeff Hodges wrote: >> Cool. Christ, that is an ugly bug. Did Jacque ever let you see his >> laptop to reproduce it? > > No... He hasn't come to nerd club yet. > > People have also had this problem (which I cannot reproduce): > > http://nokogiri.lighthouseapp.com/projects/19607/tickets/7-mac-native-bundle-not-loading > > Right now, I am working on a gem that will help report bugs in gems. > ugh. It would be nice if someone getting these errors could try > solving the issue. It's very hard for me to remote debug them! :-( > > -- > Aaron Patterson > http://tenderlovemaking.com/

From aaron.patterson at gmail.com Fri Feb 6 02:11:58 2009 From: aaron.patterson at gmail.com (Aaron Patterson) Date: Thu, 5 Feb 2009 23:11:58 -0800 Subject: [Nokogiri-talk] what's left for 1.1.2? In-Reply-To: References: <6959e1680902051548m6d077582k5e7e7272ae69d26b@mail.gmail.com> <6959e1680902051939u4edcb653l7bf74da11601e69a@mail.gmail.com> Message-ID: <6959e1680902052311o59b9b71eq510ec565c4b576d4@mail.gmail.com>

On Thu, Feb 5, 2009 at 10:49 PM, Jeff Hodges wrote:
> I've spent a couple of hours just now trying to get either one of > these to happen. I give. A release this weekend sounds great. I'm > waiting on it to push up the new rfeedparser with nokogiri as the > default strict parser.
> > If I can decode __xmlRaiseError in libxml2, and get it to play nice > with nokogiri, I'll have nokogiri everywhere in rfp in a couple of > weeks.

Looking at the header file, I don't think you have access to that function... We don't define IN_LIBXML. I could be wrong though.

I may have a solution. It kind of sucks, but I just want to get it out there. The current error handler is not thread safe. We /could/ set a mutex, then lock every time we parse a document, then capture every error from that handler and set those errors on the document after it's done being parsed.

I'll create a new branch and hack something together tomorrow.

Here is the error handling api btw:

http://xmlsoft.org/html/libxml-xmlerror.html

--
Aaron Patterson
http://tenderlovemaking.com/

From jeff at somethingsimilar.com Fri Feb 6 06:15:30 2009 From: jeff at somethingsimilar.com (Jeff Hodges) Date: Fri, 6 Feb 2009 03:15:30 -0800 Subject: [Nokogiri-talk] what's left for 1.1.2? In-Reply-To: <6959e1680902052311o59b9b71eq510ec565c4b576d4@mail.gmail.com> References: <6959e1680902051548m6d077582k5e7e7272ae69d26b@mail.gmail.com> <6959e1680902051939u4edcb653l7bf74da11601e69a@mail.gmail.com> <6959e1680902052311o59b9b71eq510ec565c4b576d4@mail.gmail.com> Message-ID:

Yeah, so far, I believe the solution looks something like

void Nokogiri_error_handler(void * ctx, xmlErrorPtr error)
{
  xmlErrorPtr ptr = calloc(1, sizeof(xmlError));
  xmlCopyError(error, ptr);
  if (ptr->ctxt && ((xmlParserCtxtPtr)(ptr->ctxt))->sax) {
    // Magic! Instantiate the Parser, snag the document, call document.error
  } else {
    VALUE err = Data_Wrap_Struct(cNokogiriXmlSyntaxError, NULL, dealloc, ptr);
    VALUE block = rb_funcall(mNokogiri, rb_intern("error_handler"), 0);
    rb_funcall(block, rb_intern("call"), 1, err);
  }
}

However, the "Magic!" is where I'm at a loss. I've tried a few things, and nothing seems to work. That conditional does seem to work, though. I'm giving up for the night.
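Aaron's proposal earlier in this thread (take a lock around every parse, let the single process-wide error handler append to a buffer, then attach the buffer to the document once parsing finishes) can be modelled in plain Ruby. This is a minimal sketch under stated assumptions, not Nokogiri's implementation: `FakeLib` is a hypothetical stand-in for libxml2 and its global handler, and `parse_with_errors`, `ParsedDocument`, and `PARSE_LOCK` are illustrative names only.

```ruby
# Hypothetical stand-in for a C library that, like libxml2, reports errors
# through one process-wide handler instead of a per-parse callback.
module FakeLib
  @handler = nil

  class << self
    attr_accessor :handler

    # Toy "parser": every line that does not end in ">" counts as an error.
    def parse(text)
      text.each_line.with_index(1) do |line, lineno|
        next if line.rstrip.end_with?(">")
        handler&.call("line #{lineno}: unterminated tag")
      end
      text # stands in for the parsed document tree
    end
  end
end

ParsedDocument = Struct.new(:tree, :errors)
PARSE_LOCK = Mutex.new

# Serialize parses so the global handler only ever sees one document's
# errors, then attach the collected errors after parsing finishes.
def parse_with_errors(text)
  PARSE_LOCK.synchronize do
    errors = []
    FakeLib.handler = ->(message) { errors << message }
    begin
      tree = FakeLib.parse(text)
    ensure
      FakeLib.handler = nil # never leave a stale handler installed
    end
    ParsedDocument.new(tree, errors)
  end
end

doc = parse_with_errors("<a>\n<b\n</a>\n")
puts doc.errors.inspect # prints ["line 2: unterminated tag"]
```

The lock trades away concurrent parsing in exchange for errors that cannot leak between documents, which is the cost Aaron alludes to with "It kind of sucks".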
--
Jeff

On Thu, Feb 5, 2009 at 11:11 PM, Aaron Patterson wrote:
> On Thu, Feb 5, 2009 at 10:49 PM, Jeff Hodges wrote: >> I've spent a couple of hours just now trying to get either one of >> these to happen. I give. A release this weekend sounds great. I'm >> waiting on it to push up the new rfeedparser with nokogiri as the >> default strict parser. >> >> If I can decode __xmlRaiseError in libxml2, and get it to play nice >> with nokogiri, I'll have nokogiri everywhere in rfp in a couple of >> weeks. > > Looking at the header file, I don't think you have access to that > function... We don't define IN_LIBXML. I could be wrong though. > > I may have a solution. It kind of sucks, but I just want to get it > out there. The current error handler is not thread safe. We /could/ > set a mutex, then lock every time we parse a document, then capture > every error from that handler and set those errors on the document > after it's done being parsed. > > I'll create a new branch and hack something together tomorrow. > > Here is the error handling api btw: > > http://xmlsoft.org/html/libxml-xmlerror.html > > -- > Aaron Patterson > http://tenderlovemaking.com/

From aaron.patterson at gmail.com Fri Feb 6 12:57:17 2009 From: aaron.patterson at gmail.com (Aaron Patterson) Date: Fri, 6 Feb 2009 09:57:17 -0800 Subject: [Nokogiri-talk] better error handling Message-ID: <6959e1680902060957s74fbc654x5094da475f2eef5e@mail.gmail.com>

I've pushed a new branch to github called "errors". I think it has better error handling for DOM parsing.

Specifically check out this changeset:

http://github.com/tenderlove/nokogiri/commit/5f3453568202bae99f6618efb1d19ce925b79939

Comments?

If it looks good, I'll implement the same kind of deal with HTML parsing.
I also need to mess with the SAX parsing because I really want to get those error and warning handlers working.

http://github.com/tenderlove/nokogiri/tree/errors

--
Aaron Patterson
http://tenderlovemaking.com/

From jeff at somethingsimilar.com Fri Feb 6 21:36:14 2009 From: jeff at somethingsimilar.com (Jeff Hodges) Date: Fri, 6 Feb 2009 18:36:14 -0800 Subject: [Nokogiri-talk] better error handling In-Reply-To: <6959e1680902060957s74fbc654x5094da475f2eef5e@mail.gmail.com> References: <6959e1680902060957s74fbc654x5094da475f2eef5e@mail.gmail.com> Message-ID:

Looks good to me. I had originally thought you were referring to the SAX error stuff which is what my last message referred to.

By the way, I've run Nokogiri_wrap_xml_syntax_error sans xmlCopyError, and dike screams bloody murder and nokogiri segfaults. So, I'm thinking that the comment about the xmlCopyError call in Nokogiri_wrap_xml_syntax_error is unnecessary.

If someone else confirms, I'll toss it out.

--
Jeff

On Fri, Feb 6, 2009 at 9:57 AM, Aaron Patterson wrote:
> I've pushed a new branch to github called "errors". I think it has > better error handling for DOM parsing. > > Specifically check out this changeset: > > http://github.com/tenderlove/nokogiri/commit/5f3453568202bae99f6618efb1d19ce925b79939 > > Comments? > > If it looks good, I'll implement the same kind of deal with HTML > parsing. I also need to mess with the SAX parsing because I really > want to get those error and warning handlers working.
> > http://github.com/tenderlove/nokogiri/tree/errors > > -- > Aaron Patterson > http://tenderlovemaking.com/

From jeff at somethingsimilar.com Fri Feb 6 21:37:06 2009 From: jeff at somethingsimilar.com (Jeff Hodges) Date: Fri, 6 Feb 2009 18:37:06 -0800 Subject: [Nokogiri-talk] better error handling In-Reply-To: References: <6959e1680902060957s74fbc654x5094da475f2eef5e@mail.gmail.com> Message-ID:

Looking back, dike doesn't even get a chance. Just *boom* it all goes to hell.

--
Jeff

On Fri, Feb 6, 2009 at 6:36 PM, Jeff Hodges wrote:
> Looks good to me. I had originally thought you were referring to the > SAX error stuff which is what my last message referred to. > > By the way, I've run Nokogiri_wrap_xml_syntax_error sans xmlCopyError, > and dike screams bloody murder and nokogiri segfaults. So, I'm > thinking that the comment about the xmlCopyError call in > Nokogiri_wrap_xml_syntax_error is unnecessary. > > If someone else confirms, I'll toss it out. > -- > Jeff > > On Fri, Feb 6, 2009 at 9:57 AM, Aaron Patterson > wrote: >> I've pushed a new branch to github called "errors". I think it has >> better error handling for DOM parsing. >> >> Specifically check out this changeset: >> >> http://github.com/tenderlove/nokogiri/commit/5f3453568202bae99f6618efb1d19ce925b79939 >> >> Comments? >> >> If it looks good, I'll implement the same kind of deal with HTML >> parsing. I also need to mess with the SAX parsing because I really >> want to get those error and warning handlers working.
>> >> http://github.com/tenderlove/nokogiri/tree/errors >> >> -- >> Aaron Patterson >> http://tenderlovemaking.com/ >

From lianliming at gmail.com Sat Feb 7 09:30:14 2009 From: lianliming at gmail.com (Lian Liming) Date: Sat, 7 Feb 2009 22:30:14 +0800 Subject: [Nokogiri-talk] Docs for nokogiri? Message-ID: <2ab0f52d0902070630o4f0833ffqf965ccfeffcb06f3@mail.gmail.com>

Hi all,

I am new to nokogiri, and would like to use nokogiri as an XML parser. I am wondering where I can find documentation about nokogiri. So far, I have read the wiki pages on github, the rdoc, and the test cases in the source code, but I'm still not sure how to use this tool in the most appropriate way. Maybe a tutorial or user guide would be easier for new users to start with.

Any suggestions are appreciated! And thanks in advance!

From aaron.patterson at gmail.com Sat Feb 7 22:33:35 2009 From: aaron.patterson at gmail.com (Aaron Patterson) Date: Sat, 7 Feb 2009 19:33:35 -0800 Subject: [Nokogiri-talk] error handling in 1.2.0 Message-ID: <6959e1680902071933p3e6fdc5aq61d83d094ef20980@mail.gmail.com>

I think I've got the error handling in a place where I'm happy. Please take a look at the errors branch, and tell me what you think:

http://github.com/tenderlove/nokogiri/tree/errors

The "error_handler" lambda is *gone*. It was not thread safe, and IMHO not very useful. When doing DOM parses, even if you ran into errors there isn't anything you could do about it. So being notified of the parse errors *after* parsing seems acceptable to me. All document objects will now have a list of errors encountered while parsing the document.
For example:

  doc = Nokogiri::XML('')
  puts doc.errors.map { |error| error.to_s }.join("\n")

That being said, if you want *strict* parsing, you'll get an exception raised:

  begin
    doc = Nokogiri::XML('', nil, nil, 0)
  rescue Nokogiri::XML::SyntaxError => ex
    puts ex
  end

Removing the error_handler lambda has also made the error callbacks on SAX parsers work.

If everyone is cool with this, I'm going to merge it to master and it will be in the next release. I will take silence as a sign of approval. ;-)

The next thing I want to tackle is configuring the parser. I hate that you have to look up constants and pass numbers as flags to the parser. I would like to do something like this:

  doc = Nokogiri::XML('') do |config|
    config.encoding = 'UTF-8'
    config.recover_errors
    config.no_warnings
  end

Comments?

--
Aaron Patterson
http://tenderlovemaking.com/

From aaron.patterson at gmail.com Sat Feb 7 23:18:32 2009 From: aaron.patterson at gmail.com (Aaron Patterson) Date: Sat, 7 Feb 2009 20:18:32 -0800 Subject: [Nokogiri-talk] Docs for nokogiri? In-Reply-To: <2ab0f52d0902070630o4f0833ffqf965ccfeffcb06f3@mail.gmail.com> References: <2ab0f52d0902070630o4f0833ffqf965ccfeffcb06f3@mail.gmail.com> Message-ID: <6959e1680902072018t6aa94ab6sa10da3edb638e0c5@mail.gmail.com>

On Sat, Feb 7, 2009 at 6:30 AM, Lian Liming wrote:
> Hi all, > > I am new to nokogiri, and would like to use nokogiri as an XML parser. I > am wondering where I can find documentation about nokogiri. So far, I > have read the wiki pages on github, the rdoc, and the test cases in the source > code, but I'm still not sure how to use this tool in the most appropriate > way. Maybe a tutorial or user guide would be easier for new users > to start with.

What kind of information are you looking for? I would hope that the wiki, rdoc, and test cases would get you going. What kind of information are you missing? That might help me document it better.
:-)

--
Aaron Patterson
http://tenderlovemaking.com/

From jeff at somethingsimilar.com Sun Feb 8 00:13:56 2009 From: jeff at somethingsimilar.com (Jeff Hodges) Date: Sat, 7 Feb 2009 21:13:56 -0800 Subject: [Nokogiri-talk] error handling in 1.2.0 In-Reply-To: <6959e1680902071933p3e6fdc5aq61d83d094ef20980@mail.gmail.com> References: <6959e1680902071933p3e6fdc5aq61d83d094ef20980@mail.gmail.com> Message-ID:

Looks good to me.

Nice, I thought you'd have to put a conditional in the structured error function, but you managed to just move setting it (and then unsetting it immediately) to where it was actually needed.

One question though, I'm seeing this in a few places:

  if (doc == NULL) {
    xmlFreeDoc(doc)
    ....
  }

I'm going to guess this is used solely to get an error from libxml2 if DEBUG_TREE is flipped. Otherwise, it's a noop, right? At least, that's what I gathered from the libxml code I have on hand. Just making sure I'm not missing Something Clever.

--
Jeff

On Sat, Feb 7, 2009 at 7:33 PM, Aaron Patterson wrote:
> I think I've got the error handling in a place where I'm happy. > Please take a look at the errors branch, and tell me what you think: > > http://github.com/tenderlove/nokogiri/tree/errors > > The "error_handler" lambda is *gone*. It was not thread safe, and > IMHO not very useful. When doing DOM parses, even if you ran into > errors there isn't anything you could do about it. So being notified > of the parse errors *after* parsing seems acceptable to me. All > document objects will now have a list of errors encountered while > parsing the document. > > For example: > > doc = Nokogiri::XML('') > puts doc.errors.map { |error| error.to_s }.join("\n") > > That being said, if you want *strict* parsing, you'll get an exception raised: > > begin > doc = Nokogiri::XML('', nil, nil, 0) > rescue Nokogiri::XML::SyntaxError => ex > puts ex > end > > Removing the error_handler lambda has also made the error callbacks on > SAX parsers work.
> > If everyone is cool with this, I'm going to merge it to master and it > will be in the next release. I will take silence as a sign of > approval. ;-) > > The next thing I want to tackle is configuring the parser. I hate > that you have to look up constants and pass numbers as flags to the > parser. I would like to do something like this: > > doc = Nokogiri::XML('') do |config| > config.encoding = 'UTF-8' > config.recover_errors > config.no_warnings > end > > Comments? > > -- > Aaron Patterson > http://tenderlovemaking.com/

From jeff at somethingsimilar.com Sun Feb 8 01:23:05 2009 From: jeff at somethingsimilar.com (Jeff Hodges) Date: Sat, 7 Feb 2009 22:23:05 -0800 Subject: [Nokogiri-talk] error handling in 1.2.0 In-Reply-To: References: <6959e1680902071933p3e6fdc5aq61d83d094ef20980@mail.gmail.com> Message-ID:

Oh, and I just noticed that it's still returning just a string to SAX::Parser#error. Why wouldn't we want to return an actual error object?

--
Jeff

On Sat, Feb 7, 2009 at 9:13 PM, Jeff Hodges wrote:
> Looks good to me. > > Nice, I thought you'd have to put a conditional in the structured > error function, but you managed to just move setting it (and then > unsetting it immediately) to where it was actually needed. > > One question though, I'm seeing this in a few places: > > if (doc == NULL) { > xmlFreeDoc(doc) > .... > } > > I'm going to guess this is used solely to get an error from libxml2 if > DEBUG_TREE is flipped. Otherwise, it's a noop, right? At least, that's > what I gathered from the libxml code I have on hand. Just making sure > I'm not missing Something Clever. > -- > Jeff > > On Sat, Feb 7, 2009 at 7:33 PM, Aaron Patterson > wrote: >> I think I've got the error handling in a place where I'm happy.
>> Please take a look at the errors branch, and tell me what you think: >> >> http://github.com/tenderlove/nokogiri/tree/errors >> >> The "error_handler" lambda is *gone*. It was not thread safe, and >> IMHO not very useful. When doing DOM parses, even if you ran into >> errors there isn't anything you could do about it. So being notified >> of the parse errors *after* parsing seems acceptable to me. All >> document objects will now have a list of errors encountered while >> parsing the document. >> >> For example: >> >> doc = Nokogiri::XML('') >> puts doc.errors.map { |error| error.to_s }.join("\n") >> >> That being said, if you want *strict* parsing, you'll get an exception raised: >> >> begin >> doc = Nokogiri::XML('', nil, nil, 0) >> rescue Nokogiri::XML::SyntaxError => ex >> puts ex >> end >> >> Removing the error_handler lambda has also made the error callbacks on >> SAX parsers work. >> >> If everyone is cool with this, I'm going to merge it to master and it >> will be in the next release. I will take silence as a sign of >> approval. ;-) >> >> The next thing I want to tackle is configuring the parser. I hate >> that you have to look up constants and pass numbers as flags to the >> parser. I would like to do something like this: >> >> doc = Nokogiri::XML('') do |config| >> config.encoding = 'UTF-8' >> config.recover_errors >> config.no_warnings >> end >> >> Comments?
>> >> -- >> Aaron Patterson >> http://tenderlovemaking.com/ >

From aaron.patterson at gmail.com Sun Feb 8 02:41:59 2009 From: aaron.patterson at gmail.com (Aaron Patterson) Date: Sat, 7 Feb 2009 23:41:59 -0800 Subject: [Nokogiri-talk] error handling in 1.2.0 In-Reply-To: References: <6959e1680902071933p3e6fdc5aq61d83d094ef20980@mail.gmail.com> Message-ID: <6959e1680902072341v2ca5e777nd6bc5bdc495ac188@mail.gmail.com>

On Sat, Feb 7, 2009 at 10:23 PM, Jeff Hodges wrote:
> Oh, and I just noticed that it's still returning just a string to > SAX::Parser#error. Why wouldn't we want to return an actual error > object?

I suppose so.... I'll have to play with that. It might not be possible. The callbacks for SAX parsing only give you a string for the error callback, they don't actually give you the error object.

I can ask libxml to tell me about the last error it encountered, but that might not be thread safe, and libxml may not have added the error to its list until *after* the error callback finishes.

> On Sat, Feb 7, 2009 at 9:13 PM, Jeff Hodges wrote: >> Looks good to me. >> >> Nice, I thought you'd have to put a conditional in the structured >> error function, but you managed to just move setting it (and then >> unsetting it immediately) to where it was actually needed. >> >> One question though, I'm seeing this in a few places: >> >> if (doc == NULL) { >> xmlFreeDoc(doc) >> .... >> } >> >> I'm going to guess this is used solely to get an error from libxml2 if >> DEBUG_TREE is flipped. Otherwise, it's a noop, right? At least, that's >> what I gathered from the libxml code I have on hand. Just making sure >> I'm not missing Something Clever.

Nope, nothing clever. You are correct.
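Both the [PATCH] thread earlier in this archive and the SAX question in this exchange turn on Ruby's exception hierarchy. `::SyntaxError` descends from `ScriptError`, not `StandardError`, so a bare `rescue` does not catch it; a shared base class under `StandardError` lets a single `rescue` catch both a DOM-style error that carries location details and a SAX-style error built from nothing but the message string the callback receives. This is a minimal sketch: the `Sketch` module, its classes, `sax_error_from_string`, and `bare_rescue` are hypothetical names, not Nokogiri's actual classes.

```ruby
# Hypothetical names that mirror the shape of the discussion, not Nokogiri's code.
module Sketch
  # Inheriting from ::SyntaxError (a ScriptError) escapes a bare `rescue`.
  class OldStyleError < ::SyntaxError; end

  # Shared base class: one `rescue Sketch::SyntaxError` catches every subclass.
  class SyntaxError < StandardError; end

  # DOM-style error: the parser supplies structured details.
  class XMLSyntaxError < SyntaxError
    attr_reader :line, :column

    def initialize(message, line, column)
      super(message)
      @line = line
      @column = column
    end
  end

  # SAX-style error: built from the bare message string a callback receives.
  class SAXSyntaxError < SyntaxError; end
end

# Wraps the plain string a SAX error callback gets in an exception object.
def sax_error_from_string(message)
  Sketch::SAXSyntaxError.new(message.chomp)
end

# Reports what a bare `rescue` (which only catches StandardError) does.
def bare_rescue(error)
  begin
    raise error
  rescue => e
    "caught #{e.class}"
  end
rescue ScriptError
  "escaped"
end

puts bare_rescue(Sketch::OldStyleError.new("bad"))        # escaped
puts bare_rescue(Sketch::XMLSyntaxError.new("bad", 1, 5)) # caught Sketch::XMLSyntaxError
```

Whether the SAX wrapper is worth having is the point the thread goes on to debate; the sketch only shows that the hierarchy makes one `rescue` sufficient for both kinds.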
--
Aaron Patterson
http://tenderlovemaking.com/

From jeff at somethingsimilar.com Sun Feb 8 06:53:42 2009 From: jeff at somethingsimilar.com (Jeff Hodges) Date: Sun, 8 Feb 2009 03:53:42 -0800 Subject: [Nokogiri-talk] error handling in 1.2.0 In-Reply-To: <6959e1680902072341v2ca5e777nd6bc5bdc495ac188@mail.gmail.com> References: <6959e1680902071933p3e6fdc5aq61d83d094ef20980@mail.gmail.com> <6959e1680902072341v2ca5e777nd6bc5bdc495ac188@mail.gmail.com> Message-ID:

I was thinking something along the lines of

  static void error_func(void * ctx, const char *msg, ...)
  {
    ...
    VALUE mah_error = Nokogiri_syntax_error_from_string(message);
    rb_funcall(doc, rb_intern("error"), 1, mah_error);
    free(message);
  }

Would this be problematic, too?

--
Jeff

On Sat, Feb 7, 2009 at 11:41 PM, Aaron Patterson wrote:
> On Sat, Feb 7, 2009 at 10:23 PM, Jeff Hodges wrote: >> Oh, and I just noticed that it's still returning just a string to >> SAX::Parser#error. Why wouldn't we want to return an actual error >> object? > > I suppose so.... I'll have to play with that. It might not be > possible. The callbacks for SAX parsing only give you a string for > the error callback, they don't actually give you the error object. > > I can ask libxml to tell me about the last error it encountered, but > that might not be thread safe, and libxml may not have added the error > to its list until *after* the error callback finishes. > >> On Sat, Feb 7, 2009 at 9:13 PM, Jeff Hodges wrote: >>> Looks good to me. >>> >>> Nice, I thought you'd have to put a conditional in the structured >>> error function, but you managed to just move setting it (and then >>> unsetting it immediately) to where it was actually needed. >>> >>> One question though, I'm seeing this in a few places: >>> >>> if (doc == NULL) { >>> xmlFreeDoc(doc) >>> .... >>> } >>> >>> I'm going to guess this is used solely to get an error from libxml2 if >>> DEBUG_TREE is flipped. Otherwise, it's a noop, right? At least, that's
At least, that's >>> what I gathered from the libxml code I have on hand. Just making sure >>> I'm not missing Something Clever. > > Nope, nothing clever. You are correct. > > -- > Aaron Patterson > http://tenderlovemaking.com/ > _______________________________________________ > Nokogiri-talk mailing list > Nokogiri-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/nokogiri-talk > From jeff at somethingsimilar.com Sun Feb 8 06:55:05 2009 From: jeff at somethingsimilar.com (Jeff Hodges) Date: Sun, 8 Feb 2009 03:55:05 -0800 Subject: [Nokogiri-talk] error handling in 1.2.0 In-Reply-To: References: <6959e1680902071933p3e6fdc5aq61d83d094ef20980@mail.gmail.com> <6959e1680902072341v2ca5e777nd6bc5bdc495ac188@mail.gmail.com> Message-ID: Obviously, we couldn't use the usual XML::SyntaxError since we lack the rest of the struct we need. So maybe a XML::SAX::SyntaxError < ::Nokogiri::SyntaxError? -- Jeff On Sun, Feb 8, 2009 at 3:53 AM, Jeff Hodges wrote: > I was thinking something along the lines of > > static void error_func(void * ctx, const char *msg, ...) > { > ... > VALUE mah_error = Nokogiri_syntax_error_from_string(message); > rb_funcall(doc, rb_intern("error"), 1, mah_error); > free(message); > } > > Would this be problematic, too? > -- > Jeff From jeff at somethingsimilar.com Sun Feb 8 06:56:30 2009 From: jeff at somethingsimilar.com (Jeff Hodges) Date: Sun, 8 Feb 2009 03:56:30 -0800 Subject: [Nokogiri-talk] error handling in 1.2.0 In-Reply-To: References: <6959e1680902071933p3e6fdc5aq61d83d094ef20980@mail.gmail.com> <6959e1680902072341v2ca5e777nd6bc5bdc495ac188@mail.gmail.com> Message-ID: Oh, fuck, right. We have that structure on Nokogiri::SyntaxError. I should get some sleep. Hrm. -- Jeff On Sun, Feb 8, 2009 at 3:55 AM, Jeff Hodges wrote: > Obviously, we couldn't use the usual XML::SyntaxError since we lack > the rest of the struct we need. So maybe a XML::SAX::SyntaxError < > ::Nokogiri::SyntaxError? 
> -- > Jeff > > On Sun, Feb 8, 2009 at 3:53 AM, Jeff Hodges wrote: >> I was thinking something along the lines of >> >> static void error_func(void * ctx, const char *msg, ...) >> { >> ... >> VALUE mah_error = Nokogiri_syntax_error_from_string(message); >> rb_funcall(doc, rb_intern("error"), 1, mah_error); >> free(message); >> } >> >> Would this be problematic, too? >> -- >> Jeff >

From lianliming at gmail.com Sun Feb 8 09:52:21 2009 From: lianliming at gmail.com (Lian Liming) Date: Sun, 8 Feb 2009 22:52:21 +0800 Subject: [Nokogiri-talk] Docs for nokogiri? In-Reply-To: <6959e1680902072018t6aa94ab6sa10da3edb638e0c5@mail.gmail.com> References: <2ab0f52d0902070630o4f0833ffqf965ccfeffcb06f3@mail.gmail.com> <6959e1680902072018t6aa94ab6sa10da3edb638e0c5@mail.gmail.com> Message-ID: <2ab0f52d0902080652rae2c1c2l341924fed2ff5d57@mail.gmail.com>

> > What kind of information are you looking for? I would hope that the > wiki, rdoc, and test cases would get you going. What kind of > information are you missing? That might help me document it better. > :-) >

It would be better if nokogiri also had docs like http://wiki.github.com/why/hpricot/an-hpricot-showcase; that kind of page makes it easier for new users to learn basic and advanced usage of nokogiri.

From jeff at somethingsimilar.com Sun Feb 8 16:39:06 2009 From: jeff at somethingsimilar.com (Jeff Hodges) Date: Sun, 8 Feb 2009 13:39:06 -0800 Subject: [Nokogiri-talk] error handling in 1.2.0 In-Reply-To: References: <6959e1680902071933p3e6fdc5aq61d83d094ef20980@mail.gmail.com> <6959e1680902072341v2ca5e777nd6bc5bdc495ac188@mail.gmail.com> Message-ID:

Hah, I was wrong twice. Nokogiri::SyntaxError doesn't have the structure, Nokogiri::XML::SyntaxError does. I whipped up a 10 minute proof of concept that passes error objects to SAX::Parser#error with the side effect of having a new Nokogiri::XML::SAX::SyntaxError. The only problem is that SAX::SyntaxError does not have ancestry in common with XML::SyntaxError.
A few options present themselves.

--
Jeff

On Sun, Feb 8, 2009 at 3:56 AM, Jeff Hodges wrote:
> Oh, fuck, right. We have that structure on Nokogiri::SyntaxError. I > should get some sleep. Hrm. > -- > Jeff

From jeff at somethingsimilar.com Sun Feb 8 16:41:15 2009 From: jeff at somethingsimilar.com (Jeff Hodges) Date: Sun, 8 Feb 2009 13:41:15 -0800 Subject: [Nokogiri-talk] error handling in 1.2.0 In-Reply-To: References: <6959e1680902071933p3e6fdc5aq61d83d094ef20980@mail.gmail.com> <6959e1680902072341v2ca5e777nd6bc5bdc495ac188@mail.gmail.com> Message-ID:

The link to the branch:

Spamming everyone always.

--
Jeff

On Sun, Feb 8, 2009 at 1:39 PM, Jeff Hodges wrote:
> Hah, I was wrong twice. Nokogiri::SyntaxError doesn't have the > structure, Nokogiri::XML::SyntaxError does. I whipped up a 10 minute > proof of concept that passes error objects to SAX::Parser#error with > the side effect of having a new Nokogiri::XML::SAX::SyntaxError. The > only problem is that SAX::SyntaxError does not have ancestry in common > with XML::SyntaxError. A few options present themselves. > -- > Jeff > > On Sun, Feb 8, 2009 at 3:56 AM, Jeff Hodges wrote: >> Oh, fuck, right. We have that structure on Nokogiri::SyntaxError. I >> should get some sleep. Hrm. >> -- >> Jeff >

From aaron.patterson at gmail.com Sun Feb 8 19:10:22 2009 From: aaron.patterson at gmail.com (Aaron Patterson) Date: Sun, 8 Feb 2009 16:10:22 -0800 Subject: [Nokogiri-talk] error handling in 1.2.0 In-Reply-To: References: <6959e1680902071933p3e6fdc5aq61d83d094ef20980@mail.gmail.com> <6959e1680902072341v2ca5e777nd6bc5bdc495ac188@mail.gmail.com> Message-ID: <6959e1680902081610x430e68a5mf524878311b30a08@mail.gmail.com>

On Sun, Feb 8, 2009 at 1:39 PM, Jeff Hodges wrote:
> Hah, I was wrong twice. Nokogiri::SyntaxError doesn't have the > structure, Nokogiri::XML::SyntaxError does.
I whipped up a 10 minute > proof of concept that passes error objects to SAX::Parser#error with > the side effect of having a new Nokogiri::XML::SAX::SyntaxError. The > only problem is that SAX::SyntaxError does not have ancestry in common > with XML::SyntaxError. A few options present themselves.

Hmmm... I'm not sure what this buys us. Since we can't get an xmlErrorPtr in the SAX parser, we're essentially just using an exception object to pass a string. Why not just pass the string? The person implementing the SAX document knows that is an error case and they can act accordingly.

--
Aaron Patterson
http://tenderlovemaking.com/

From aaron.patterson at gmail.com Tue Feb 10 12:03:47 2009 From: aaron.patterson at gmail.com (Aaron Patterson) Date: Tue, 10 Feb 2009 09:03:47 -0800 Subject: [Nokogiri-talk] just one more feature.... Message-ID: <6959e1680902100903k495ba810n2326a23b9eb05e8e@mail.gmail.com>

I need to get some opinions on this one.....

I've added namespace support to CSS selectors. Now you can do something like this:

  doc.css('xmlns|link')

and that gets converted to this xpath:

  //xmlns:link

I'm thinking about automatically registering the default namespace on the root node when doing CSS selector searches. If you check out section 4 of the CSS3 namespace spec, hopefully that will make sense:

http://www.w3.org/TR/css3-namespace/

This way, given the following document:

These CSS to XPath conversions will be made:

  doc.css('foo') => //xmlns:foo
  doc.css('|foo') => //foo
  doc.css('xmlns|foo') => //xmlns:foo

I think it would make CSS queries in XML documents less surprising and more useful.

Just a couple open points:

1. I'm not quite sure how to support the '*|foo' syntax.
2. Should I automatically register *all* namespaces on the root, or just the default one?

If I get a couple +1's on this, I'll add it for 1.2.0 (the next release).
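The three conversions listed in this message follow a small rule set, consistent with the CSS3 Namespaces behavior Aaron points to: a bare type selector is placed in the default namespace (registered here under the prefix `xmlns`), an empty prefix (`|foo`) means "no namespace", and an explicit prefix is carried through to the XPath step. The sketch below covers only that mapping for a single type selector. `css_type_to_xpath` is a hypothetical helper, not Nokogiri's CSS parser, and it deliberately rejects `*|foo`, which the message leaves as an open point.

```ruby
# Hypothetical helper: turns one CSS3 type selector, optionally written as
# "prefix|name", into a descendant XPath step. It assumes the document's
# default namespace has been registered under the prefix "xmlns".
def css_type_to_xpath(selector, default_prefix: "xmlns")
  unless selector.include?("|")
    # No prefix at all: the element is looked up in the default namespace.
    return "//#{default_prefix}:#{selector}"
  end

  prefix, name = selector.split("|", 2)
  # "*|name" (any namespace) is one of the open points; not handled here.
  raise ArgumentError, "unsupported selector: #{selector}" if prefix == "*"

  # An empty prefix ("|name") selects elements that are in no namespace.
  prefix.empty? ? "//#{name}" : "//#{prefix}:#{name}"
end

%w[foo |foo xmlns|foo xmlns|link].each do |sel|
  puts "#{sel} => #{css_type_to_xpath(sel)}"
end
# foo => //xmlns:foo
# |foo => //foo
# xmlns|foo => //xmlns:foo
# xmlns|link => //xmlns:link
```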
--
Aaron Patterson
http://tenderlovemaking.com/

From julien.genestoux at gmail.com  Thu Feb 19 02:22:13 2009
From: julien.genestoux at gmail.com (Julien Genestoux)
Date: Wed, 18 Feb 2009 23:22:13 -0800
Subject: [Nokogiri-talk] Nokogiri and namespaces
Message-ID: <26c0cf900902182322x5aefca21hd2721599bbf9a6cf@mail.gmail.com>

Hello!

I am working on Babylon (http://github.com/julien51/babylon/tree/master) and
we're making extensive use of Nokogiri's XML SAX push parser! Thank you for
this!

To "dispatch" the XML stanzas to the right component, we're using XPath
matching. We used REXML for this and decided to switch over to Nokogiri
entirely (mainly to decrease the number of dependencies).
Unfortunately it doesn't seem to work: namespaces don't get set up with the
following code:

  class XmppParser < Nokogiri::XML::SAX::Document
    def initialize(&callback)
      @callback = callback
      super()
      @parser = Nokogiri::XML::SAX::Parser.new(self)
      @doc = nil
      @elem = nil
    end

    def parse(data)
      @parser.parse data
    end

    def start_document
      @doc = Nokogiri::XML::Document.new
    end

    def characters(string)
      @elem.add_child(Nokogiri::XML::Text.new(string, @doc)) if @elem
    end
    # CDATA content is handled the same way as ordinary text.
    alias :cdata_block :characters

    def start_element(qname, attributes = [])
      e = Nokogiri::XML::Element.new(qname, @doc)
      # Attributes is an array like [name, value, name, value]...
      (attributes.size / 2).times do |i|
        name, value = attributes[2 * i], attributes[2 * i + 1]
        e.set_attribute name, value
      end

      @elem = @elem ? @elem.add_child(e) : (@root = e)
      if @elem.parent.nil?
        @callback.call(@elem)
      end
    end

    def end_element(name)
      if @elem
        puts @elem.inspect
        puts @elem.namespaces.inspect

        @callback.call(@elem) if @elem.parent == @root
        @elem = @elem.parent
        # now remove from parent again to avoid space leak:
        # TODO
      end
    end
  end

When the parser receives:

<< adzadz

It outputs:

{}

{}

{}

{}

Which seems to mean that namespaces are not added (especially for , for
example). I have looked into the documentation to find out how to explicitly
specify namespaces, but haven't found anything... Can you guys help?

Thanks a lot,
Julien
--
Julien Genestoux
http://www.ouvre-boite.com
http://blog.notifixio.us

+1 (415) 254 7340
+33 (0)9 70 44 76 29

From aaron.patterson at gmail.com  Thu Feb 19 11:40:33 2009
From: aaron.patterson at gmail.com (Aaron Patterson)
Date: Thu, 19 Feb 2009 08:40:33 -0800
Subject: [Nokogiri-talk] Nokogiri and namespaces
In-Reply-To: <26c0cf900902182322x5aefca21hd2721599bbf9a6cf@mail.gmail.com>
References: <26c0cf900902182322x5aefca21hd2721599bbf9a6cf@mail.gmail.com>
Message-ID: <6959e1680902190840o541f6493oa8050ecf73ee0e2c@mail.gmail.com>

On Wed, Feb 18, 2009 at 11:22 PM, Julien Genestoux wrote:
> I have looked into the documentation to find out how to explicitly
> specify namespaces, but haven't found anything... Can you guys help?

You can't with the currently released version, but it seems easy
enough to add, and something we need.
I will add it for 1.2.0. I *hope* to have that released this weekend.

I'll post a follow up once I get it implemented.

--
Aaron Patterson
http://tenderlovemaking.com/

From julien.genestoux at gmail.com  Thu Feb 19 12:20:18 2009
From: julien.genestoux at gmail.com (Julien Genestoux)
Date: Thu, 19 Feb 2009 09:20:18 -0800
Subject: [Nokogiri-talk] Nokogiri and namespaces
In-Reply-To: <6959e1680902190840o541f6493oa8050ecf73ee0e2c@mail.gmail.com>
References: <26c0cf900902182322x5aefca21hd2721599bbf9a6cf@mail.gmail.com>
 <6959e1680902190840o541f6493oa8050ecf73ee0e2c@mail.gmail.com>
Message-ID: <26c0cf900902190920s4bf74280hf8ab6ed2e621ddb9@mail.gmail.com>

Aaron,

Thanks for this! Looking forward to seeing it during the weekend ;)

Please let me know if I can be of any help!

Thanks again!

Julien
--
Julien Genestoux
http://www.ouvre-boite.com
http://blog.notifixio.us

+1 (415) 254 7340
+33 (0)9 70 44 76 29

On Thu, Feb 19, 2009 at 8:40 AM, Aaron Patterson wrote:
> You can't with the currently released version, but it seems easy
> enough to add, and something we need.
> I will add it for 1.2.0. I *hope* to have that released this weekend.
From julien.genestoux at gmail.com  Mon Feb 23 14:41:34 2009
From: julien.genestoux at gmail.com (Julien Genestoux)
Date: Mon, 23 Feb 2009 11:41:34 -0800
Subject: [Nokogiri-talk] Nokogiri and namespaces
In-Reply-To: <26c0cf900902190920s4bf74280hf8ab6ed2e621ddb9@mail.gmail.com>
References: <26c0cf900902182322x5aefca21hd2721599bbf9a6cf@mail.gmail.com>
 <6959e1680902190840o541f6493oa8050ecf73ee0e2c@mail.gmail.com>
 <26c0cf900902190920s4bf74280hf8ab6ed2e621ddb9@mail.gmail.com>
Message-ID: <26c0cf900902231141s7105bf0dia69c1528afe2eb80@mail.gmail.com>

Thanks a lot!

Namespace support!
http://github.com/tenderlove/nokogiri/commit/e969fbe3d2cd273c7988968e0b2555b5e18f8f16
;)

--
Julien Genestoux
http://www.ouvre-boite.com
http://blog.notifixio.us

+1 (415) 254 7340
+33 (0)9 70 44 76 29

On Thu, Feb 19, 2009 at 9:20 AM, Julien Genestoux wrote:
> Aaron,
>
> Thanks for this! Looking forward to seeing it during the weekend ;)

From aaron at tenderlovemaking.com  Mon Feb 23 22:37:27 2009
From: aaron at tenderlovemaking.com (Aaron Patterson)
Date: Mon, 23 Feb 2009 19:37:27 -0800
Subject: [Nokogiri-talk] [ANN] nokogiri 1.2.1 Released
Message-ID: <20090224033727.GA16639@Jordan.local>

nokogiri version 1.2.1 has been released!

Nokogiri (鋸) is an HTML, XML, SAX, and Reader parser.

Changes:

### 1.2.1 / 2008-02-23

* Bugfixes

  * Fixed a CSS selector space bug
  * Fixed Ruby 1.9 String Encoding (Thanks ?????)

## FEATURES:

* XPath support for document searching
* CSS3 selector support for document searching
* XML/HTML builder
* Drop in replacement for Hpricot (though not bug for bug)

Nokogiri parses and searches XML/HTML very quickly, and also has
correctly implemented CSS3 selector support as well as XPath support.

Here is a speed test:

  * http://gist.github.com/24605

Nokogiri also features an Hpricot compatibility layer to help ease the change
to using correct CSS and XPath.
## SUPPORT:

The Nokogiri mailing list is available here:

* http://rubyforge.org/mailman/listinfo/nokogiri-talk

The bug tracker is available here:

* http://nokogiri.lighthouseapp.com/projects/19607-nokogiri/overview

## SYNOPSIS:

  require 'nokogiri'
  require 'open-uri'

  doc = Nokogiri::HTML(open('http://www.google.com/search?q=tenderlove'))

  ####
  # Search for nodes by css
  doc.css('h3.r a.l').each do |link|
    puts link.content
  end

  ####
  # Search for nodes by xpath
  doc.xpath('//h3/a[@class="l"]').each do |link|
    puts link.content
  end

  ####
  # Or mix and match.
  doc.search('h3.r a.l', '//h3/a[@class="l"]').each do |link|
    puts link.content
  end

## REQUIREMENTS:

* ruby 1.8 or 1.9
* libxml
* libxslt

## INSTALL:

* sudo gem install nokogiri

--
Aaron Patterson
http://tenderlovemaking.com/

From adam.vandenhoven at gmail.com  Tue Feb 24 19:55:35 2009
From: adam.vandenhoven at gmail.com (Adam van den Hoven)
Date: Tue, 24 Feb 2009 16:55:35 -0800
Subject: [Nokogiri-talk] XPath expressions other than location paths
Message-ID: <1235523335.14023.40.camel@vandenhoven>

Last month, Aaron Patterson, Andrew Watts-Curnow and a few others
discussed this briefly. It can be summarized in this exchange:

>> Does libxml support expressions other than location paths?
>> Would this make sense as an enhancement to nokogiri?
> I think it is possible, but I have a hard time justifying to myself
> why you would need it.

Today I want to offer a reason why non-location paths are important,
even critical, to any XPath implementation. It's in the filters, or
"where" clauses, if you prefer.

Here's an example.

So I might want the following:

  //store[count(book) gt 7]/book
  //store[name[contains(text(), 'Jim')]]/book

That is, give me all the books that are immediate children of stores
that have more than 7 books, and give me all the books that are
immediate children of stores whose name child contains "Jim". They're
a little contrived, but you get the point.
Anticipating your next objection, we should NOT rely on Ruby code to
handle this. That is, you might do something like:

  books = []
  doc.xpath('//store').each do |store|
    store_books = store.xpath('./book')
    books.concat(store_books.to_a) if store_books.length > 7
  end

or some variation (I've never tried it so it might be syntactically
wrong, but you get the idea).

There are several differences between the two. First is complexity.
The Ruby code is much longer. It's also a lot more challenging to
understand what's going on. As the XPath gets more complex, the Ruby
will grow less understandable. Further, converting from one to the
other is hard to do in the non-trivial case unless you have a strong
understanding of set theory. For example

  /foo[bar != 'kronk']

is very different from

  /foo[not(bar = 'kronk')]

In the first case, you get all the foos where at least one bar is not
kronk, and in the second case, you get all the foos where no bar is
kronk. In the non-trivial case, this logic can be very easy to express
in an XPath but very hard to get right in code.

Second is usage. In order to be considered for a position with a local
software company, I've been asked to write a little CGI script that
will take author and/or title or ISBN, scrape some number of sites for
the pricing information and present a rather banal table comparing
the results.

In general, it's better to write configuration files than code. In
non-trivial development environments, deploying code is a bigger deal
than deploying content, so keeping something that is likely to change
frequently (compared to your build cycle) as content is superior to
putting it in compiled code. If this was a real application, then I
would expect the sites I'm scraping to change their structure with
some frequency; that would probably not fit within my build cycle and
the system would be broken for some weeks.
But if it's just a configuration file, those (at least in the
environments where I've worked before) are not part of the build
process but of the content publishing process. That happens much more
frequently (each time someone writes new content). With some careful
work, I can write robust XPaths for probably any site I will need to
work with; many times those paths will be UGLY. This matters in cases
where there is little semantic information to work from and the
structure changes, so that an element you care about is only
identifiable by the text content of some related node. If all the
non-location paths are supported, I can use them for the filters of
my XPaths, and that means that I can actually encode everything as an
XPath which can be saved as a string in my YAML file.

Adam

From aaron.patterson at gmail.com  Tue Feb 24 20:16:21 2009
From: aaron.patterson at gmail.com (Aaron Patterson)
Date: Tue, 24 Feb 2009 17:16:21 -0800
Subject: [Nokogiri-talk] XPath expressions other than location paths
In-Reply-To: <1235523335.14023.40.camel@vandenhoven>
References: <1235523335.14023.40.camel@vandenhoven>
Message-ID: <6959e1680902241716i161911epbebff81c988ec77a@mail.gmail.com>

On Tue, Feb 24, 2009 at 4:55 PM, Adam van den Hoven wrote:
> Last month, Aaron Patterson, Andrew Watts-Curnow and a few others
> discussed this briefly. It can be summarized in this exchange:
>
>>> Does libxml support expressions other than location paths?
>>> Would this make sense as an enhancement to nokogiri?
>> I think it is possible, but I have a hard time justifying to myself
>> why you would need it.

Not exactly. XPath functions are fine. I object to the xpath()
method returning anything other than an XML::NodeSet.

> Today I want to offer a reason why non-location paths are important,
> even critical, to any XPath implementation. It's in the filters, or
> "where" clauses, if you prefer.
>
> Here's an example.
>
> So I might want the following:
>
> //store[count(book) gt 7]/book
> //store[name[contains(text(), 'Jim')]]/book

These examples work with XPath and Nokogiri out of the box. An example:

  http://gist.github.com/69929

--
Aaron Patterson
http://tenderlovemaking.com/

From adam.vandenhoven at gmail.com  Tue Feb 24 22:36:17 2009
From: adam.vandenhoven at gmail.com (Adam van den Hoven)
Date: Tue, 24 Feb 2009 19:36:17 -0800
Subject: [Nokogiri-talk] XPath expressions other than location paths
In-Reply-To: <6959e1680902241716i161911epbebff81c988ec77a@mail.gmail.com>
References: <1235523335.14023.40.camel@vandenhoven>
 <6959e1680902241716i161911epbebff81c988ec77a@mail.gmail.com>
Message-ID: <1235532977.6580.7.camel@vandenhoven>

On Tue, 2009-02-24 at 17:16 -0800, Aaron Patterson wrote:
> Not exactly. XPath functions are fine. I object to the xpath()
> method returning anything other than an XML::NodeSet.

Oh. I can see that. The only question, then, is: can you claim to
support XPath without doing it? And would this lack of full
implementation be a hindrance to acceptance, for example among those
who already know XPath? OK, the only TWO questions, then, are ....

> These examples work with XPath and Nokogiri out of the box. An example:
>
>   http://gist.github.com/69929

Hmm. OK. I'd never worked with Nokogiri before (I'd previously worked
with Scrubyt but the latest version wasn't working for me at all). I'd
tried an XPath that was working there and I fixed it to do something
similar to what I suggested and it didn't work the way I expected. But
now it seems to be working, so I guess my tests used the wrong path.
Sorry for the confusion and thanks for the clarification.

Adam

From aaron.patterson at gmail.com  Wed Feb 25 00:48:31 2009
From: aaron.patterson at gmail.com (Aaron Patterson)
Date: Tue, 24 Feb 2009 21:48:31 -0800
Subject: [Nokogiri-talk] XPath expressions other than location paths
In-Reply-To: <1235532977.6580.7.camel@vandenhoven>
References: <1235523335.14023.40.camel@vandenhoven>
 <6959e1680902241716i161911epbebff81c988ec77a@mail.gmail.com>
 <1235532977.6580.7.camel@vandenhoven>
Message-ID: <6959e1680902242148u238325ffle4416defe709f7ab@mail.gmail.com>

On Tue, Feb 24, 2009 at 7:36 PM, Adam van den Hoven wrote:
> Oh. I can see that. The only question, then, is: can you claim to
> support XPath without doing it? And would this lack of full
> implementation be a hindrance to acceptance, for example among those
> who already know XPath? OK, the only TWO questions, then, are ....

If someone is unhappy with my code, I will issue a full refund.
> Hmm. OK. I'd never worked with Nokogiri before (I'd previously worked
> with Scrubyt but the latest version wasn't working for me at all). I'd
> tried an XPath that was working there and I fixed it to do something
> similar to what I suggested and it didn't work the way I expected. But
> now it seems to be working, so I guess my tests used the wrong path.
> Sorry for the confusion and thanks for the clarification.

No problem. Glad I could help.

--
Aaron Patterson
http://tenderlovemaking.com/

From adam.vandenhoven at gmail.com  Wed Feb 25 18:54:41 2009
From: adam.vandenhoven at gmail.com (Adam van den Hoven)
Date: Wed, 25 Feb 2009 15:54:41 -0800
Subject: [Nokogiri-talk] HTML builder and Paragraph tags.
Message-ID: <1235606081.6571.20.camel@vandenhoven>

hey guys,

The documentation is a little thin on one point that I need help with.

I want to write some paragraph tags. Following the builder's syntax, that
would look something like:

  builder = Nokogiri::HTML::Builder.new do
    div.test do
      p "this is a paragraph"
    end
  end

The problem, however, is that p is also a method of the Kernel, so it
doesn't trigger method_missing.

What's the "right" way to put in paragraphs?

Adam

From greg at intelligentassistance.com  Wed Feb 25 19:38:59 2009
From: greg at intelligentassistance.com (Gregory Clarke)
Date: Wed, 25 Feb 2009 16:38:59 -0800
Subject: [Nokogiri-talk] HTML builder and Paragraph tags.
In-Reply-To: <1235606081.6571.20.camel@vandenhoven>
References: <1235606081.6571.20.camel@vandenhoven>
Message-ID: <580FC677-B443-4544-B784-DE176F055E87@intelligentassistance.com>

With the Builder gem you do things like this:

  x = Builder::XmlMarkup.new(:target => @xml, :indent => 2)
  x.instruct!
  # Builder cannot mix a text argument with a block, so pass attributes.
  x.div(:class => "test") do
    x.p "this is a paragraph"
  end

Maybe nokogiri is similar?

From mike at csa.net  Thu Feb 26 08:44:59 2009
From: mike at csa.net (Mike Dalessio)
Date: Thu, 26 Feb 2009 08:44:59 -0500
Subject: [Nokogiri-talk] HTML builder and Paragraph tags.
In-Reply-To: <580FC677-B443-4544-B784-DE176F055E87@intelligentassistance.com>
References: <1235606081.6571.20.camel@vandenhoven>
 <580FC677-B443-4544-B784-DE176F055E87@intelligentassistance.com>
Message-ID: <618c07250902260544g307d500s790b29a2a65320ef@mail.gmail.com>

That's close - you can access the builder through a block argument:

  builder = Nokogiri::HTML::Builder.new do
    div.test do |builder|
      builder.p "this is a paragraph"
    end
  end

On Wed, Feb 25, 2009 at 7:38 PM, Gregory Clarke wrote:
> Maybe nokogiri is similar?
--
mike dalessio
mike at csa.net

From adam.vandenhoven at gmail.com  Fri Feb 27 02:00:41 2009
From: adam.vandenhoven at gmail.com (Adam van den Hoven)
Date: Thu, 26 Feb 2009 23:00:41 -0800
Subject: [Nokogiri-talk] Nokogiri markup to CGI
Message-ID: <1235718041.23995.23.camel@vandenhoven>

I have what is probably an obvious question. I'm using nokogiri in a
simple CGI script and I need to send the contents of the builder object
to the output. There is probably an easy way and a right way, but I'm
not sure what that would be. Any thoughts?

From andrew at nextmobileweb.com  Sat Feb 28 17:07:31 2009
From: andrew at nextmobileweb.com (Andrew Farmer)
Date: Sat, 28 Feb 2009 14:07:31 -0800
Subject: [Nokogiri-talk] would you use this feature? inner_html=

I made a ticket for this feature request: I would like Nodes to have an
inner_html= function.

http://nokogiri.lighthouseapp.com/projects/19607/tickets/46-feature-request-inner_html-method-on-node

Aaron would like to know if anyone aside from me would use this
feature, so would you? And for my own edification, what is the
existing way to set the inner html of an element?

Thanks,
Andrew

From andrew at nextmobileweb.com  Sat Feb 28 17:28:53 2009
From: andrew at nextmobileweb.com (Andrew Farmer)
Date: Sat, 28 Feb 2009 14:28:53 -0800
Subject: [Nokogiri-talk] would you use this feature? Node.swap(html)

Another feature that I would like: Node.swap( html ). It would be a
method on a Node that you can use to replace that Node with arbitrary
HTML. This is something that I'm used to using heavily in Hpricot.

http://nokogiri.lighthouseapp.com/projects/19607/tickets/50-swap-method-hpricot-compatibility#ticket-50-1

So this is another open question to everyone: would you use this
feature? Do you currently do something similar but implement it in a
different way?

Thanks,
Andrew

From holtonma at gmail.com  Sat Feb 28 17:44:23 2009
From: holtonma at gmail.com (Holtonma)
Date: Sat, 28 Feb 2009 14:44:23 -0800
Subject: [Nokogiri-talk] would you use this feature? Node.swap(html)

On Feb 28, 2009, at 2:28 PM, Andrew Farmer wrote:

> Another feature that I would like: Node.swap( html ). It would be a
> method on a Node that you can use to replace that Node with arbitrary
> HTML. This is something that I'm used to using heavily in Hpricot.

Curious - how does that differ from .innerHTML? (which I believe
Nokogiri supports)
From aaron.patterson at gmail.com  Sat Feb 28 19:33:24 2009
From: aaron.patterson at gmail.com (Aaron Patterson)
Date: Sat, 28 Feb 2009 16:33:24 -0800
Subject: [Nokogiri-talk] would you use this feature? Node.swap(html)
Message-ID: <6959e1680902281633x7de0b161jb8a12b8e77cbb8d6@mail.gmail.com>

On Sat, Feb 28, 2009 at 2:44 PM, Holtonma wrote:
> Curious - how does that differ from .innerHTML? (which I believe Nokogiri
> supports)

This is to actually swap out the html with something else. It's
basically doing this:

  require 'nokogiri'

  doc = Nokogiri::HTML(<<-eohtml)
hello
eohtml div = doc.at('div') Nokogiri::HTML.fragment('world').children.each do |node| div.parent << node end div.remove puts doc.to_html I want to get a feel for how many people would actually use this. Hpricot has it, but there are no tests for it, and I don't want to be compatible with something that has no tests. If I add this, I would add it because people find it useful (as opposed to being compatible). I wouldn't actually use the proposed methods, which is why I want some public opinion. :-) -- Aaron Patterson http://tenderlovemaking.com/ From andrew at nextmobileweb.com Sat Feb 28 22:00:15 2009 From: andrew at nextmobileweb.com (Andrew Farmer) Date: Sat, 28 Feb 2009 19:00:15 -0800 Subject: [Nokogiri-talk] would you use this feature? Node.swap(html) In-Reply-To: <6959e1680902281633x7de0b161jb8a12b8e77cbb8d6@mail.gmail.com> References: <6959e1680902281633x7de0b161jb8a12b8e77cbb8d6@mail.gmail.com> Message-ID: The proposed swap method is a little bit different from setting inner HTML and it is a little bit different from what you've written Aaron. Let's take a different document for an example. doc = Nokogiri::HTML(<<-eohtml)
    <div>hello</div>
  eohtml

I would like to be able to replace that div with my span like this:

  doc.at("div").swap("<span>world</span>")

And get this result:

  world

Aaron, your code would produce this:

  world

The span is in the wrong place!

Hopefully now it is somewhat clear what I would like this method to do. I use this function a lot in hpricot for re-working web pages so I think it is a very useful function. Am I the only one who thinks so?

On Sat, Feb 28, 2009 at 4:33 PM, Aaron Patterson wrote:
> On Sat, Feb 28, 2009 at 2:44 PM, Holtonma wrote:
> >
> > On Feb 28, 2009, at 2:28 PM, Andrew Farmer wrote:
> >
> > Another feature that I would like: Node.swap( html ). It would be a method
> > on a Node that you can use to replace that Node with arbitrary HTML. This is
> > something that I'm used to using heavily in Hpricot.
> >
> > http://nokogiri.lighthouseapp.com/projects/19607/tickets/50-swap-method-hpricot-compatibility#ticket-50-1
> >
> > So this is another open question to everyone: would you use this
> > feature? Do you currently do something similar but implement it in a
> > different way?
> >
> >
> > Thanks,
> > Andrew
> >
> > Curious - how does that differ from .innerHTML? (which I believe Nokogiri
> > supports)
>
> This is to actually swap out the html with something else. It's
> basically doing this:
>
> require 'nokogiri'
>
> doc = Nokogiri::HTML(<<-eohtml)
>   <div>hello</div>
> eohtml
>
> div = doc.at('div')
> Nokogiri::HTML.fragment('<span>world</span>').children.each do |node|
>   div.parent << node
> end
> div.remove
>
> puts doc.to_html
>
> I want to get a feel for how many people would actually use this.
> Hpricot has it, but there are no tests for it, and I don't want to be
> compatible with something that has no tests. If I add this, I would
> add it because people find it useful (as opposed to being compatible).
>
> I wouldn't actually use the proposed methods, which is why I want some
> public opinion. :-)
>
> --
> Aaron Patterson
> http://tenderlovemaking.com/

From aaron.patterson at gmail.com Sat Feb 28 22:31:18 2009
From: aaron.patterson at gmail.com (Aaron Patterson)
Date: Sat, 28 Feb 2009 19:31:18 -0800
Subject: [Nokogiri-talk] would you use this feature? Node.swap(html)
In-Reply-To:
References: <6959e1680902281633x7de0b161jb8a12b8e77cbb8d6@mail.gmail.com>
Message-ID: <6959e1680902281931m4a42fac8sac0c80d606fc2e8c@mail.gmail.com>

On Sat, Feb 28, 2009 at 7:00 PM, Andrew Farmer wrote:
> The proposed swap method is a little bit different from setting inner HTML
> and it is a little bit different from what you've written Aaron. Let's take
> a different document for an example.
>
> doc = Nokogiri::HTML(<<-eohtml)
>   <div>hello</div>
> eohtml
>
> I would like to be able to replace that div with my span like this:
>
> doc.at("div").swap("<span>world</span>")
>
> And get this result:
>
>   world
>
> Aaron, your code would produce this:
>
>   world
>
> The span is in the wrong place!

Yes. There is a bug in my implementation. But my point remains the same. Here is a less buggy implementation:

  div = doc.at('div')
  # add_next_sibling always inserts directly after div, so walk the
  # new nodes in reverse to keep them in their original order
  Nokogiri::HTML.fragment('<span>world</span>').children.reverse.each do |node|
    div.add_next_sibling node
  end
  div.remove

> Hopefully now it is somewhat clear what I would like this method to do. I
> use this function a lot in hpricot for re-working web pages so I think it is
> a very useful function. Am I the only one who thinks so?

Interesting. What are you reworking? I'm curious.

Also, please bottom post. Thanks!

--
Aaron Patterson
http://tenderlovemaking.com/