From port001 at gmail.com Wed Aug 6 07:40:16 2008 From: port001 at gmail.com (Ian Leitch) Date: Wed, 6 Aug 2008 12:40:16 +0100 Subject: [libxml-devel] Build issue on Solaris 5.11 using Sun StudioExpress Message-ID: Hey all, Has anyone tried to build using StudioExpress yet? StudioExpress-sol-x86-2008-07-24-ii # showrev Release: 5.11 Kernel architecture: i86pc Application architecture: i386 Kernel version: SunOS 5.11 snv_67 I'm getting the following errors: cc -I. -I. -I/opt/csw/lib/ruby/1.8/i386-solaris2.8 -I. -I/usr/local/include -DRUBY_EXTCONF_H=\"extconf.h\" -I/opt/csw/include/libxml2/ -I/opt/csw/include -I/opt/csw/include -KPIC -xO3 -xarch=386 -xspace -xildoff -I/opt/csw/include -I/opt/csw/include -KPIC -I. -I/opt/csw/lib/ruby/1.8/i386-solaris2.8 -I. -I/usr/local/include -c ruby_xml_sax_parser.c "ruby_libxml.h", line 3: warning: invalid white space character in directive "ruby_libxml.h", line 4: warning: invalid white space character in directive "ruby_libxml.h", line 6: warning: invalid white space character in directive "ruby_libxml.h", line 8: warning: invalid white space character in directive "ruby_libxml.h", line 9: warning: invalid white space character in directive "ruby_libxml.h", line 10: warning: invalid white space character in directive "ruby_libxml.h", line 11: warning: invalid white space character in directive "ruby_libxml.h", line 12: warning: invalid white space character in directive "ruby_libxml.h", line 13: warning: invalid white space character in directive "ruby_libxml.h", line 14: warning: invalid white space character in directive "ruby_libxml.h", line 15: warning: invalid white space character in directive "ruby_libxml.h", line 16: warning: invalid white space character in directive "ruby_libxml.h", line 17: warning: invalid white space character in directive "ruby_libxml.h", line 18: warning: invalid white space character in directive "ruby_libxml.h", line 19: warning: invalid white space character in directive "ruby_libxml.h", line 
20: warning: invalid white space character in directive "ruby_libxml.h", line 23: warning: invalid white space character in directive "ruby_libxml.h", line 25: warning: invalid white space character in directive "ruby_libxml.h", line 28: warning: invalid white space character in directive "ruby_libxml.h", line 31: warning: invalid white space character in directive "ruby_libxml.h", line 33: warning: invalid white space character in directive "ruby_libxml.h", line 34: warning: invalid white space character in directive "ruby_libxml.h", line 35: warning: invalid white space character in directive "ruby_libxml.h", line 36: warning: invalid white space character in directive "ruby_libxml.h", line 37: warning: invalid white space character in directive "ruby_libxml.h", line 38: warning: invalid white space character in directive "ruby_libxml.h", line 39: warning: invalid white space character in directive "ruby_libxml.h", line 40: warning: invalid white space character in directive "ruby_libxml.h", line 41: warning: invalid white space character in directive "ruby_libxml.h", line 43: warning: invalid white space character in directive "ruby_libxml.h", line 44: warning: invalid white space character in directive "ruby_libxml.h", line 45: warning: invalid white space character in directive "ruby_libxml.h", line 46: warning: invalid white space character in directive "ruby_libxml.h", line 47: warning: invalid white space character in directive "ruby_libxml.h", line 65: warning: invalid white space character in directive "ruby_libxml.h", line 66: warning: invalid white space character in directive "ruby_libxml.h", line 67: warning: invalid white space character in directive "ruby_libxml.h", line 68: warning: invalid white space character in directive "ruby_libxml.h", line 69: warning: invalid white space character in directive "ruby_libxml.h", line 70: warning: invalid white space character in directive "ruby_libxml.h", line 71: warning: invalid white space character in 
directive "ruby_libxml.h", line 72: warning: invalid white space character in directive "ruby_libxml.h", line 73: warning: invalid white space character in directive "ruby_libxml.h", line 74: warning: invalid white space character in directive "ruby_libxml.h", line 75: warning: invalid white space character in directive "ruby_libxml.h", line 76: warning: invalid white space character in directive "ruby_libxml.h", line 77: warning: invalid white space character in directive "ruby_libxml.h", line 78: warning: invalid white space character in directive "ruby_libxml.h", line 79: warning: invalid white space character in directive "ruby_libxml.h", line 80: warning: invalid white space character in directive "ruby_libxml.h", line 81: warning: invalid white space character in directive "ruby_libxml.h", line 82: warning: invalid white space character in directive "ruby_libxml.h", line 83: warning: invalid white space character in directive "ruby_libxml.h", line 84: warning: invalid white space character in directive "ruby_libxml.h", line 85: warning: invalid white space character in directive "ruby_libxml.h", line 86: warning: invalid white space character in directive "ruby_libxml.h", line 95: warning: invalid white space character in directive "sax_parser_callbacks.inc", line 124: warning: invalid white space character in directive "sax_parser_callbacks.inc", line 125: syntax error before or at: do "sax_parser_callbacks.inc", line 125: invalid source character: '\' "sax_parser_callbacks.inc", line 126: invalid source character: '\' "sax_parser_callbacks.inc", line 127: invalid source character: '\' "sax_parser_callbacks.inc", line 128: invalid source character: '\' "sax_parser_callbacks.inc", line 129: invalid source character: '\' "sax_parser_callbacks.inc", line 130: invalid source character: '\' "sax_parser_callbacks.inc", line 131: invalid source character: '\' "sax_parser_callbacks.inc", line 132: invalid source character: '\' "sax_parser_callbacks.inc", line 141: 
invalid source character: '\' "sax_parser_callbacks.inc", line 150: invalid source character: '\' "sax_parser_callbacks.inc", line 159: invalid source character: '\' cc: acomp failed for ruby_xml_sax_parser.c *** Error code 2 make: Fatal error: Command failed for target `ruby_xml_sax_parser.o' Any help would be greatly appreciated! Cheers -------------- next part -------------- An HTML attachment was scrubbed... URL: From noreply at rubyforge.org Sun Aug 3 11:50:26 2008 From: noreply at rubyforge.org (noreply at rubyforge.org) Date: Sun, 3 Aug 2008 11:50:26 -0400 (EDT) Subject: [libxml-devel] [ libxml-Bugs-21424 ] Stringing commands bypasses correct node creation Message-ID: <20080803155026.5F37018581A9@rubyforge.org> Bugs item #21424, was opened at 2008-08-03 11:50 You can respond by visiting: http://rubyforge.org/tracker/?func=detail&atid=1971&aid=21424&group_id=494 Category: General Group: None Status: Open Resolution: None Priority: 3 Submitted By: Nobody (None) Assigned to: Nobody (None) Summary: Stringing commands bypasses correct node creation Initial Comment: When I create something along the lines of: foo << XML::Node.new('bar') << "bars contents" The following is returned: --------- bars contents --------- However, this: foo << bar = XML::Node.new('bar') bar << "bars contents" returns this: --------- bars contents --------- These results have been sporadic at times, it seems. When I create XML::Documents in the console, sometimes it works fine, sometimes it doesn't. I have been unable to pinpoint a specific cause, but I am, however, having to rewrite a good bit of my code because invalid XML is being produced due to this bug. 
---------------------------------------------------------------------- You can respond by visiting: http://rubyforge.org/tracker/?func=detail&atid=1971&aid=21424&group_id=494 From noreply at rubyforge.org Sun Aug 3 12:01:25 2008 From: noreply at rubyforge.org (noreply at rubyforge.org) Date: Sun, 3 Aug 2008 12:01:25 -0400 (EDT) Subject: [libxml-devel] [ libxml-Bugs-21424 ] Stringing commands bypasses correct node creation Message-ID: <20080803160125.F0F2518581AC@rubyforge.org> Bugs item #21424, was opened at 2008-08-03 10:50 You can respond by visiting: http://rubyforge.org/tracker/?func=detail&atid=1971&aid=21424&group_id=494 Category: General Group: None Status: Open Resolution: None Priority: 3 Submitted By: Nobody (None) Assigned to: Nobody (None) Summary: Stringing commands bypasses correct node creation Initial Comment: When I create something along the lines of: foo << XML::Node.new('bar') << "bars contents" The following is returned: --------- bars contents --------- However, this: foo << bar = XML::Node.new('bar') bar << "bars contents" returns this: --------- bars contents --------- These results have been sporadic at times, it seems. When I create XML::Documents in the console, sometimes it works fine, sometimes it doesn't. I have been unable to pinpoint a specific cause, but I am, however, having to rewrite a good bit of my code because invalid XML is being produced due to this bug. 
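One possible reading of the difference reported above, sketched with a toy class: Ruby parses `a << b << c` as `(a << b) << c`, so where the third operand lands depends on what `<<` returns. `ToyNode` is a hypothetical stand-in, and the assumption that `<<` returns the receiver is made only for illustration; the report does not say what libxml-ruby's `XML::Node#<<` returns.

```ruby
# Hypothetical stand-in for an XML node; this is NOT libxml-ruby's XML::Node.
class ToyNode
  attr_reader :name, :children

  def initialize(name)
    @name = name
    @children = []
  end

  # Assumption for illustration: << returns the receiver,
  # the way Array#<< and String#<< do.
  def <<(child)
    @children << child
    self
  end

  def to_s
    "<#{name}>#{@children.map(&:to_s).join}</#{name}>"
  end
end

# Chained form: parsed as (foo << bar) << "bars contents",
# so the text is appended to foo, not to bar.
foo = ToyNode.new('foo')
foo << ToyNode.new('bar') << "bars contents"
chained = foo.to_s

# Two-step form: the text is appended to bar explicitly.
foo = ToyNode.new('foo')
bar = ToyNode.new('bar')
foo << bar
bar << "bars contents"
stepwise = foo.to_s

puts chained
puts stepwise
```

If `<<` returned the appended child instead, the chained form would nest the text inside `bar`, so the two forms only agree under that second convention.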
----------------------------------------------------------------------

Comment By: Eric Musgrove (tenpaiyomi)
Date: 2008-08-03 11:01

Message:
This also applies to the node creation as well, as can be seen here:

>> a = XML::Document.new()
=>
>> a.root = XML::Node.new('foo')
=>
>> root = a.root
=>
>> root << XML::Node.new('bar') << XML::Node.new('baz')
=>

>> a = XML::Document.new()
=>
>> a.root = XML::Node.new('foo')
=>
>> root = a.root
=>
>> root << bar = XML::Node.new('bar')
=>
>> bar << XML::Node.new('baz')
=>
>> a
=>
>>

----------------------------------------------------------------------

You can respond by visiting:
http://rubyforge.org/tracker/?func=detail&atid=1971&aid=21424&group_id=494

From tenpaiyomi at gmail.com Sun Aug 3 15:24:09 2008
From: tenpaiyomi at gmail.com (Eric M.)
Date: Sun, 3 Aug 2008 12:24:09 -0700 (PDT)
Subject: [libxml-devel] String parsing
Message-ID:

I'm not sure if I'm just doing something wrong or whatnot, but I
cannot get any form of xpath searching to work on a parsed string.
Specifically, here is what is happening:

-> XML generated, sent out to server
<- Server receives XML, sends out response
-- response received, xml = XML::Parser.string(response.body).parse
-- xml.find('/error') executed, result is nil;

XML response, and value of xml, is formatted like so:

Error message here.

From transfire at gmail.com Fri Aug 8 16:08:28 2008
From: transfire at gmail.com (Trans)
Date: Fri, 8 Aug 2008 13:08:28 -0700 (PDT)
Subject: [libxml-devel] XML namespace
In-Reply-To: <6959e1680807310837p73d094b0h9a55384e61b723b8@mail.gmail.com>
References: <6959e1680807310837p73d094b0h9a55384e61b723b8@mail.gmail.com>
Message-ID:

On Jul 31, 11:37 am, "Aaron Patterson" wrote:
> I guess it's a little late to find this, but the expat ruby wrapper is
> using the XML namespace. Just FYI! :-)
>
> http://rubyforge.org/projects/xmlparser/

That's okay. We are smarter than the average bear ;)

  require 'libxml'
  LibXML::XML::Document

T.
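A minimal sketch of why nesting under `LibXML` sidesteps the clash mentioned above. Both modules below are hypothetical stand-ins written for this example (neither is the real expat wrapper nor the real libxml-ruby code); they only show how Ruby resolves the constant `XML` in each scope.

```ruby
# Stand-in for a library that claims the top-level XML constant (hypothetical).
module XML
  class Parser
    def self.origin
      'top-level XML'
    end
  end
end

# Stand-in for bindings nested under their own module (hypothetical).
module LibXML
  module XML
    class Parser
      def self.origin
        'LibXML::XML'
      end
    end
  end

  # Inside LibXML, the bare constant XML resolves lexically to LibXML::XML.
  def self.inner_lookup
    XML::Parser.origin
  end
end

top_level = XML::Parser.origin          # the other library's class
nested    = LibXML::XML::Parser.origin  # fully qualified name
inner     = LibXML.inner_lookup         # lexical lookup inside LibXML

puts top_level
puts nested
puts inner
```

Code that wants the short `XML::` spelling would have to opt in explicitly, which is where a clash with another library could reappear.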
From matt at mattmargolis.net Mon Aug 11 16:02:21 2008
From: matt at mattmargolis.net (Matthew Margolis)
Date: Mon, 11 Aug 2008 15:02:21 -0500
Subject: [libxml-devel] memory consumption when finding inside of large document never goes away
Message-ID: <4939a97f0808111302w2e7042a6v384c098dd3171938@mail.gmail.com>

I am parsing 120K of XML into a document and then running

  def get_nodes(node, namespace)
    self.find("./dn:#{node}", "dn:#{namespace}")
  end

several times.

Memory usage for my test driver sits at 20 megs if I run get_nodes less than 10 times. If I run get_nodes 1000 times my memory usage jumps from 20 megs to around 140 megs and does not come back down until the process exits. If I force a GC.start at the end of each loop I can keep the memory usage down but that is not practical in the real world where I need this code to be at least somewhat fast.

I am only building the document once during the entire duration of the test program so the parsing of the large string should not be a problem.

Any ideas as to why my memory usage grows and then never comes down?

I am running ruby 1.8.6 and libxml-ruby .8.3 with libxml 2.6.32.

Thank you,
Matt Margolis
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From sean at chittenden.org Mon Aug 11 17:38:01 2008
From: sean at chittenden.org (Sean Chittenden)
Date: Mon, 11 Aug 2008 14:38:01 -0700
Subject: [libxml-devel] memory consumption when finding inside of large document never goes away
In-Reply-To: <4939a97f0808111302w2e7042a6v384c098dd3171938@mail.gmail.com>
References: <4939a97f0808111302w2e7042a6v384c098dd3171938@mail.gmail.com>
Message-ID: <3EE31471-C5AF-48CA-A6E8-391EAFDCBC1C@chittenden.org>

> I am parsing 120K of XML into a document and then running
>
> def get_nodes(node, namespace)
>   self.find("./dn:#{node}", "dn:#{namespace}")
> end
>
> several times.
>
> Memory usage for my test driver sits at 20 megs if I run get_nodes
> less than 10 times.
> If I run get_nodes 1000 times my memory usage
> jumps from 20 megs to around 140 megs and does not come back down
> until the process exits. If I force a GC.start at the end of each
> loop I can keep the memory usage down but that is not practical in
> the real world where I need this code to be at least somewhat fast.
>
> I am only building the document once during the entire duration of
> the test program so the parsing of the large string should not be a
> problem.
>
> Any ideas as to why my memory usage grows and then never comes down?

If the memory usage caps off at certain levels but isn't continually growing (i.e. a leak), then this is a "problem" with the Ruby GC and not with libxml. libxml just leverages Ruby's GC for memory allocation, etc. See if there is an updated GC patch that you can apply. I don't have the URL handy, but this post makes reference to it:

http://antoniocangiano.com/2007/02/10/top-10-ruby-on-rails-performance-tips/

One could argue, however, that using GC.start is practical if done in tight loops. What exactly are you trying to do with your fragments? Maybe there's a more efficient way of getting the result you're interested in.

-sc

--
Sean Chittenden
sean at chittenden.org

From matt at mattmargolis.net Mon Aug 11 18:47:55 2008
From: matt at mattmargolis.net (Matthew Margolis)
Date: Mon, 11 Aug 2008 17:47:55 -0500
Subject: [libxml-devel] memory consumption when finding inside of large document never goes away
In-Reply-To: <3EE31471-C5AF-48CA-A6E8-391EAFDCBC1C@chittenden.org>
References: <4939a97f0808111302w2e7042a6v384c098dd3171938@mail.gmail.com> <3EE31471-C5AF-48CA-A6E8-391EAFDCBC1C@chittenden.org>
Message-ID: <4939a97f0808111547r575cc268m4dd777b325023119@mail.gmail.com>

Yes, the growth appears to happen in bursts with plateaus between growth cycles, so Ruby's GC could definitely be the culprit.

I am taking the document and converting it to ruby objects that map (with some fudging) to the same structure as the XML.
The XML originates from a SOAP based web service. I am converting the response from XML in to ruby objects using recursion to walk inside of each object and load any subobjects that are present in the XML. I am doing a rather involved series of type lookups so that I can cast the strings in the XML to the appropriate ruby datatype based on the XSD the web service provides. The memory growth happens even without any recursion or assignment going on, all I have to do to induce the growth is a few Node#find calls on the document. The general structure of my document is something like this made up example except a whole lot bigger and with around 100 different node types represented throughout the document. RestaurantGetResponse Restaurants Restaurant Location Street1 Street2 City Zip RestaurantName Owner Name PhoneNumber Genres Genre Genre Restaurant ..... Restaurant ..... Restaurant ..... I am making the parsed ruby objects available to a Rails application and I find that if I call GC.start when using the library with Rails that it takes several seconds to garbage collect and sometimes crashes. If I call GC.start in the loop when the program is running as a standalone process then GC.start returns in a few dozen milliseconds. I wrote a SAX style parser using libxml-ruby that does not suffer from the memory growth but it is about 30 times slower than the document based parser so I am really trying to make the document based approach work. Matt Margolis On Mon, Aug 11, 2008 at 4:38 PM, Sean Chittenden wrote: > I am parsing 120K of XML into a document and then running >> >> def get_nodes(node, namespace) >> self.find("./dn:#{node}", "dn:#{namespace}") >> end >> >> several times. >> >> Memory usage for my test driver sits at 20 megs if I run get_nodes less >> than 10 times. If I run get_nodes 1000 times my memory usage jumps from 20 >> megs to around 140 megs and does not come back down until the process exits. 
>> If I force a GC.start at the end of each loop I can keep the memory usage >> down but that is not practical in the real world where I need this code to >> be at least somewhat fast. >> >> I am only building the document once during the entire duration of the >> test program so the parsing of the large string should not be a problem. >> >> Any ideas as to why my memory usage grows and then never comes down? >> > > If the memory usage caps off at certain levels but isn't continually > growing (i.e. a leak), then this is a "problem" with the Ruby GC and not > with libxml. libxml just leverages Ruby's GC for memory allocation, etc. > See if there is an updated GC patch that you can apply. I don't have the > URL handy, but this post makes reference to it: > > > http://antoniocangiano.com/2007/02/10/top-10-ruby-on-rails-performance-tips/ > > One could argue, however, that using GC.start is practical if done in tight > loops. What exactly are you trying to do with your fragments? Maybe > there's a more efficient way of getting the result you're interested in. > > -sc > > -- > Sean Chittenden > sean at chittenden.org > > > > _______________________________________________ > libxml-devel mailing list > libxml-devel at rubyforge.org > http://rubyforge.org/mailman/listinfo/libxml-devel > -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From stephan at spaceboyz.net Fri Aug 15 10:50:36 2008
From: stephan at spaceboyz.net (Stephan Maka)
Date: Fri, 15 Aug 2008 16:50:36 +0200
Subject: [libxml-devel] [PATCH] Allow to initialize NS with prefix=nil (aka default namespace)
Message-ID: <20080815145036.GB1054@chronos.sin>

Allow to initialize NS with prefix=nil (aka default namespace)

Signed-off-by: Stephan Maka
---
 ext/libxml/ruby_xml_ns.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/ext/libxml/ruby_xml_ns.c b/ext/libxml/ruby_xml_ns.c
index 8c3c16c..517fccf 100644
--- a/ext/libxml/ruby_xml_ns.c
+++ b/ext/libxml/ruby_xml_ns.c
@@ -26,9 +26,11 @@ VALUE
 ruby_xml_ns_initialize(VALUE self, VALUE node, VALUE href, VALUE prefix) {
   xmlNodePtr xnode;
   xmlNsPtr xns;
+  xmlChar *prefixS;
 
   Data_Get_Struct(node, xmlNode, xnode);
-  xns = xmlNewNs(xnode, (xmlChar*)StringValuePtr(href), (xmlChar*)StringValuePtr(prefix));
+  prefixS = NIL_P(prefix) ? NULL : StringValuePtr(prefix);
+  xns = xmlNewNs(xnode, (xmlChar*)StringValuePtr(href), prefixS);
 
   DATA_PTR(self) = xns;
   return self;
--
1.5.6.3

From deandeblock at telenet.be Sat Aug 16 07:20:42 2008
From: deandeblock at telenet.be (Dean)
Date: Sat, 16 Aug 2008 13:20:42 +0200
Subject: [libxml-devel] NULL pointer (ArgumentError) with SAXParser
Message-ID:

Hi all,

When I instantiate a Saxparser and some overridden callbacks libxml can't successfully parse xml files with a predefined doctype. It then throws the NULL pointer error. If I remove the doctype it all works well. Any ideas someone?

I'm using:
Ruby 1.8.6
Ruby-libxml 0.8.3
Windows XP
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From cfis at savagexi.com Sat Aug 16 16:42:01 2008
From: cfis at savagexi.com (Charlie Savage)
Date: Sat, 16 Aug 2008 14:42:01 -0600
Subject: [libxml-devel] NULL pointer (ArgumentError) with SAXParser
In-Reply-To:
References:
Message-ID: <48A73B99.4000907@savagexi.com>

Hi Dean,

> When I instantiate a Saxparser and some overridden callbacks libxml
> can't successfully parse xml files with a predefined doctype. It then
> throws the NULL pointer error. If I remove the doctype it all works
> well. Any ideas someone?

Do you have a test case you can submit (see the rubyforge bug tracker)?

Charlie
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/x-pkcs7-signature
Size: 3237 bytes
Desc: S/MIME Cryptographic Signature
URL:

From cfis at savagexi.com Sat Aug 16 16:46:08 2008
From: cfis at savagexi.com (Charlie Savage)
Date: Sat, 16 Aug 2008 14:46:08 -0600
Subject: [libxml-devel] [PATCH] Allow to initialize NS with prefix=nil (aka default namespace)
In-Reply-To: <20080815145036.GB1054@chronos.sin>
References: <20080815145036.GB1054@chronos.sin>
Message-ID: <48A73C90.6040704@savagexi.com>

Thanks Stephan,

> Allow to initialize NS with prefix=nil (aka default namespace)

I'll apply this for the next release.

Charlie
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/x-pkcs7-signature
Size: 3237 bytes
Desc: S/MIME Cryptographic Signature
URL:

From cfis at savagexi.com Sat Aug 16 16:54:39 2008
From: cfis at savagexi.com (Charlie Savage)
Date: Sat, 16 Aug 2008 14:54:39 -0600
Subject: [libxml-devel] memory consumption when finding inside of large document never goes away
In-Reply-To: <4939a97f0808111547r575cc268m4dd777b325023119@mail.gmail.com>
References: <4939a97f0808111302w2e7042a6v384c098dd3171938@mail.gmail.com> <3EE31471-C5AF-48CA-A6E8-391EAFDCBC1C@chittenden.org> <4939a97f0808111547r575cc268m4dd777b325023119@mail.gmail.com>
Message-ID: <48A73E8F.9020201@savagexi.com>

Hi Matt,

> I am making the parsed ruby objects available to a Rails application and
> I find that if I call GC.start when using the library with Rails that it
> takes several seconds to garbage collect and sometimes crashes. If I
> call GC.start in the loop when the program is running as a standalone
> process then GC.start returns in a few dozen milliseconds.

What platform are you using? Can you run a debug version and get a stack trace so we can see what is going on? Are you using XPath? If so, make sure to free pointers to your XPath result objects and call GC.start before the associated documents get freed (see the rdocs for more info, document#find I think it is).

> I wrote a SAX style parser using libxml-ruby that does not suffer from
> the memory growth but it is about 30 times slower than the document
> based parser so I am really trying to make the document based approach work.

Why do you suppose SAX is so much slower? It should be a lot faster since it doesn't build an in-memory tree.

Any chance the XMLReader would work for you?

Charlie
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/x-pkcs7-signature
Size: 3237 bytes
Desc: S/MIME Cryptographic Signature
URL:

From cfis at savagexi.com Sat Aug 16 16:56:36 2008
From: cfis at savagexi.com (Charlie Savage)
Date: Sat, 16 Aug 2008 14:56:36 -0600
Subject: [libxml-devel] String parsing
In-Reply-To:
References:
Message-ID: <48A73F04.7000105@savagexi.com>

Eric M. wrote:
> I'm not sure if I'm just doing something wrong or whatnot, but I
> cannot get any form of xpath searching to work on a parsed string.
> Specifically, here is what is happening:
>
> -> XML generated, sent out to server
> <- Server receives XML, sends out response
> -- response received, xml = XML::Parser.string(response.body).parse
> -- xml.find('/error') executed, result is nil;
>
> XML response, and value of xml, is formatted like so:
>
>
>
> Error message here.
>

Because the error is in an xml namespace. See the rdocs for the XPath module. Basically:

  xml.find('/mynamespace:error', :mynamespace => 'http://some-url.com')

Charlie
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/x-pkcs7-signature
Size: 3237 bytes
Desc: S/MIME Cryptographic Signature
URL:

From matt at mattmargolis.net Sat Aug 16 17:16:45 2008
From: matt at mattmargolis.net (Matthew Margolis)
Date: Sat, 16 Aug 2008 16:16:45 -0500
Subject: [libxml-devel] memory consumption when finding inside of large document never goes away
In-Reply-To: <48A73E8F.9020201@savagexi.com>
References: <4939a97f0808111302w2e7042a6v384c098dd3171938@mail.gmail.com> <3EE31471-C5AF-48CA-A6E8-391EAFDCBC1C@chittenden.org> <4939a97f0808111547r575cc268m4dd777b325023119@mail.gmail.com> <48A73E8F.9020201@savagexi.com>
Message-ID: <4939a97f0808161416u656f4e1eg66c102ababcf0408@mail.gmail.com>

Charlie,

I am running on OSX and RedHat. I am using the Node#find method with an XPath expression for the currently desired node in the default namespace of the document.
The crashes stopped happening when I set my nodes variable to nil before calling GC.start. The memory does not spike too much if I call GC.start after every single Node#find, but since parsing a single document into the required number of ruby objects necessitates calling Node#find over a thousand times, GC.start is really slowing things down.

From what I can tell, calling Node#find on such a large document is causing Ruby to add extra object heaps, which increases my memory usage in a way that the program does not recover from. This is unfortunate since I want to run multiple processes per box but each process is using several hundred megabytes of RAM after parsing a few large documents.

The SAX parser with empty callbacks can rip through the document in about 17ms which is very fast in my opinion. The speed problem arises when I try to do anything in the callbacks. The nature of the program and the structure of the XML requires me to do quite a few lookups in a series of hashes to determine the type of the current node and the type of each text element. When SAX parsing I have to hit the hashes more often since I don't have as much context information available as I do with a recursive depth first document walk with the document parser node objects. With the necessary code in the callbacks I was seeing parse times around 400ms which is about twice as slow as the document based approach.

XMLReader looks very interesting from the API docs but I am not sure that I grok how to actually use it. I will keep searching for resources but if you know of any examples of usage out there I would love to read some code.

Thank you,
Matt Margolis

2008/8/16 Charlie Savage

> Hi Matt,
>
> I am making the parsed ruby objects available to a Rails application and I
>> find that if I call GC.start when using the library with Rails that it takes
>> several seconds to garbage collect and sometimes crashes.
If I call >> GC.start in the loop when the program is running as a standalone process >> then GC.start returns in a few dozen milliseconds. >> > > What platform are you using? Can you run a debug version and get a stack > trace so we can see what is going on? Are you using XPath? If so, make > sure to free pointers to your XPath result objects and call GC.start before > the associated documents get freed (see the rdocs for more info, > document#find I think it is). > > I wrote a SAX style parser using libxml-ruby that does not suffer from the >> memory growth but it is about 30 times slower than the document based parser >> so I am really trying to make the document based approach work. >> > > Why do you suppose SAX is so much slower. It should be a lot faster since > it doesn't build an in-memory tree. > > Any chance the XMLReader would work for you? > > Charlie > > _______________________________________________ > libxml-devel mailing list > libxml-devel at rubyforge.org > http://rubyforge.org/mailman/listinfo/libxml-devel > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cfis at savagexi.com Sat Aug 16 17:30:02 2008 From: cfis at savagexi.com (Charlie Savage) Date: Sat, 16 Aug 2008 15:30:02 -0600 Subject: [libxml-devel] memory consumption when finding inside of large document never goes away In-Reply-To: <4939a97f0808161416u656f4e1eg66c102ababcf0408@mail.gmail.com> References: <4939a97f0808111302w2e7042a6v384c098dd3171938@mail.gmail.com> <3EE31471-C5AF-48CA-A6E8-391EAFDCBC1C@chittenden.org> <4939a97f0808111547r575cc268m4dd777b325023119@mail.gmail.com> <48A73E8F.9020201@savagexi.com> <4939a97f0808161416u656f4e1eg66c102ababcf0408@mail.gmail.com> Message-ID: <48A746DA.2000802@savagexi.com> Hi Matt, > I am running on OSX and RedHat. I am using the Node#find method with an > XPath expression for the currently desired node in the default namespace > of the document. 
> The crashes stopped happening when I set my nodes
> variable to nil before calling GC.start. The memory does not spike too
> much if I call GC.start after every single Node#find but since parsing a
> single document into the required number of ruby objects necessitates
> calling Node#find over a thousand times GC.start is really slowing
> things down.

Right, that is what you have to do (nodes = nil before GC.start). In my view, this is a design flaw in Ruby's GC but I didn't get very far when I asked about it on the Ruby core list. We can work around it, but I haven't had a chance to do it. If you're feeling like writing some C code, I can explain how I think the problem can be fixed so you avoid all the manual GCs.

> From what I can tell calling Node#find on such a large document is
> causing Ruby to add extra object heaps which increases my memory usage
> in a way that the program does not recover from. This is unfortunate
> since I want to run multiple processes per box but each process is using
> several hundred megabytes of RAM after parsing a few large documents.

Well, the bindings generally only wrap an object when you access it. So in theory, calling nodes = document.find should only add one Ruby object (the result object). The code used to wrap every returned object, but I'm pretty sure I changed it. To verify, the code is in the xpath_object class.

Now if you then iterate over each returned node in the result, they will of course get wrapped (i.e., a Ruby object is created for each libxml node).

> The SAX parser with empty callbacks can rip through the document in
> about 17ms which is very fast in my opinion. The speed problem arises
> when I try to do anything in the callbacks. The nature of the program
> and the structure of the XML requires me to do quite a few lookups in a
> series of hashes to determine the type of the current node and the type
> of each text element.
When SAX parsing I have to hit the hashes more > often since I don't have as much context information available as I do > with a recursive depth first document walk with the document parser node > objects. With the necessary code in the callbacks I was seeing parse > times around 400ms which is about twice as slow as the document based > approach. Oh, I see. So its all in the lookups. > XMLReader looks very interesting from the API docs but I am not sure > that I grok how to actually use it. I will keep searching for resources > but if you know of any examples of usage out there I would love to read > some code. I think there are a couple of tests (libxml/test) that might help a bit. Can't say I'm super familiar with that code either. But look for Python examples perhaps or .NET (libxml copied the api from .NET supposedly, based on reading the libxml site). Charlie -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/x-pkcs7-signature Size: 3237 bytes Desc: S/MIME Cryptographic Signature URL: From matt at mattmargolis.net Sat Aug 16 17:46:07 2008 From: matt at mattmargolis.net (Matthew Margolis) Date: Sat, 16 Aug 2008 16:46:07 -0500 Subject: [libxml-devel] memory consumption when finding inside of large document never goes away In-Reply-To: <48A746DA.2000802@savagexi.com> References: <4939a97f0808111302w2e7042a6v384c098dd3171938@mail.gmail.com> <3EE31471-C5AF-48CA-A6E8-391EAFDCBC1C@chittenden.org> <4939a97f0808111547r575cc268m4dd777b325023119@mail.gmail.com> <48A73E8F.9020201@savagexi.com> <4939a97f0808161416u656f4e1eg66c102ababcf0408@mail.gmail.com> <48A746DA.2000802@savagexi.com> Message-ID: <4939a97f0808161446l4324bdd8jbd3f785984682f49@mail.gmail.com> I don't mind setting nodes = nil before calling GC.start (read some other threads so I think I understand why I have to do that) but I do mind the speed hit, so if you think there is a way around that I would love to know more. 
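The "nodes = nil before GC.start" pattern discussed above can be sketched in plain Ruby as follows. `fake_find` is a hypothetical stand-in for `Node#find`, and the `gc_every` parameter is an assumption added here to make the cost trade-off visible; nothing in the thread says batching collections is safe with libxml-ruby 0.8.3.

```ruby
# Hypothetical stand-in for Node#find: returns a fresh batch of objects.
# A real call would return an XPath result tied to the parsed document.
def fake_find(count)
  Array.new(count) { |i| "node-#{i}" }
end

# Runs `iterations` lookups, drops the result reference after each one, and
# forces a collection every `gc_every` iterations. Returns how many
# collections were forced, which is the part that costs time.
def lookups_with_manual_gc(iterations, gc_every: 1)
  forced = 0
  iterations.times do |i|
    nodes = fake_find(100)
    nodes.each { |n| n.length } # stand-in for the real per-node work
    nodes = nil                 # drop the reference before collecting
    next unless ((i + 1) % gc_every).zero?

    GC.start
    forced += 1
  end
  forced
end

every_time = lookups_with_manual_gc(200)                # collect after each lookup
batched    = lookups_with_manual_gc(200, gc_every: 50)  # collect every 50 lookups

puts "forced collections: #{every_time} vs #{batched}"
```

Collecting after every lookup gives the flattest memory profile at the highest cost; a larger `gc_every` trades peak memory for fewer forced collections.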
My general calling pattern is
1. Document#find_first to get the most top level element I am interested in
2. top_level_element#find for each of its direct children. When I find
   each child I then recurse down and load that child's children.

So yes I am walking the entire tree which will create a bunch of objects.
When only grabbing the top level element in my test program I am still
seeing a big spike in memory. I looked at the XPath Object code and it
looks to me like this case is the one I am going to match when trying to
find the topmost element of interest.

    case XPATH_NODESET:
      rval = Data_Wrap_Struct(cXMLXPathObject,
                              ruby_xml_xpath_object_mark,
                              ruby_xml_xpath_object_free,
                              xpop);

I am not familiar with Data_Wrap_Struct (part of Ruby?) so I don't know
if it could potentially create lots of objects.

I will look at the XMLReader tests to try to get a better feel for if it
will meet my needs. Thank you for the suggestion.

Matt Margolis

From cfis at savagexi.com Sat Aug 16 18:33:58 2008
From: cfis at savagexi.com (Charlie Savage)
Date: Sat, 16 Aug 2008 16:33:58 -0600
Subject: [libxml-devel] memory consumption when finding inside of large document never goes away
In-Reply-To: <4939a97f0808161446l4324bdd8jbd3f785984682f49@mail.gmail.com>
References: <4939a97f0808111302w2e7042a6v384c098dd3171938@mail.gmail.com>
 <3EE31471-C5AF-48CA-A6E8-391EAFDCBC1C@chittenden.org>
 <4939a97f0808111547r575cc268m4dd777b325023119@mail.gmail.com>
 <48A73E8F.9020201@savagexi.com>
 <4939a97f0808161416u656f4e1eg66c102ababcf0408@mail.gmail.com>
 <48A746DA.2000802@savagexi.com>
 <4939a97f0808161446l4324bdd8jbd3f785984682f49@mail.gmail.com>
Message-ID: <48A755D6.4030703@savagexi.com>

Matthew Margolis wrote:
> I don't mind setting nodes = nil before calling GC.start (read some
> other threads so I think I understand why I have to do that) but I do
> mind the speed hit, so if you think there is a way around that I would
> love to know more.

It would all be in the C code. Something like:

xml_document:
 * For each xpath_object returned, store a pointer to it (st_hash, key
   is a pointer, value is the object)
 * Update api to include register_xpath_object, unregister_xpath_object
 * When freed, iterate over xpath objects, and tell any that are left to
   free their underlying C object (not the ruby object)

xpath_object:
 * On creation, call document.register_xpath_object
 * On freeing, call document.unregister_xpath_object
 * Add api, called by document, called free_xpath_object (or some such)

> My general calling pattern is
> 1. Document#find_first to get the most top level element I am
>    interested in
> 2. top_level_element#find for each of its direct children. When I find
>    each child I then recurse down and load that child's children.

No need to call find for that. Just iterate over the children directly -
will be faster. node.children I think it is....

> So yes I am walking the entire tree which will create a bunch of
> objects. When only grabbing the top level element in my test program I
> am still seeing a big spike in memory. I looked at the XPath Object
> code and it looks to me like this case is the one I am going to match
> when trying to find the topmost element of interest.
>
>     case XPATH_NODESET:
>       rval = Data_Wrap_Struct(cXMLXPathObject,
>                               ruby_xml_xpath_object_mark,
>                               ruby_xml_xpath_object_free,
>                               xpop);
>
> I am not familiar with Data_Wrap_Struct (part of Ruby?) so I don't know
> if it could potentially create lots of objects.

Yes, this wraps just the return object. The xpop object looks like this:

http://xmlsoft.org/XSLT/object.gif

It's iterating over xpop->nodesetval I was mentioning.

> I will look at the XMLReader tests to try to get a better feel for if it
> will meet my needs. Thank you for the suggestion.

Sure - sounds like that will be your best bet. Seems perfect for what
you need.

Charlie

From matt at mattmargolis.net Sat Aug 16 18:45:15 2008
From: matt at mattmargolis.net (Matthew Margolis)
Date: Sat, 16 Aug 2008 17:45:15 -0500
Subject: [libxml-devel] memory consumption when finding inside of large document never goes away
In-Reply-To: <48A755D6.4030703@savagexi.com>
References: <4939a97f0808111302w2e7042a6v384c098dd3171938@mail.gmail.com>
 <3EE31471-C5AF-48CA-A6E8-391EAFDCBC1C@chittenden.org>
 <4939a97f0808111547r575cc268m4dd777b325023119@mail.gmail.com>
 <48A73E8F.9020201@savagexi.com>
 <4939a97f0808161416u656f4e1eg66c102ababcf0408@mail.gmail.com>
 <48A746DA.2000802@savagexi.com>
 <4939a97f0808161446l4324bdd8jbd3f785984682f49@mail.gmail.com>
 <48A755D6.4030703@savagexi.com>
Message-ID: <4939a97f0808161545u79a18896h1be6809c3bbf8581@mail.gmail.com>

Thank you so much Charlie. I am for sure going to switch my code to stop
abusing find and instead iterate over the children. Good catch.

If I can find some spare time I am going to give XMLReader a go and if I
can work up the courage I will also try to patch the C code to free
those objects.

Matt Margolis

From rajmohanbanavi at gmail.com Mon Aug 18 07:28:35 2008
From: rajmohanbanavi at gmail.com (Rajmohan Banavi)
Date: Mon, 18 Aug 2008 16:58:35 +0530
Subject: [libxml-devel] pull parsing
Message-ID:

I'm new to the world of XML and have just got started on libxml-ruby.
Wanted to know if libxml-ruby supports pull parsing (just like StAX)? I
read on Wikipedia that pull parsing is better than SAX.

Thanks.

From matt at mattmargolis.net Mon Aug 18 09:04:59 2008
From: matt at mattmargolis.net (Matthew Margolis)
Date: Mon, 18 Aug 2008 08:04:59 -0500
Subject: [libxml-devel] pull parsing
In-Reply-To:
References:
Message-ID: <4939a97f0808180604p6692e04s4ff8d09a5769f074@mail.gmail.com>

I believe this is what you are talking about
http://libxml.rubyforge.org/rdoc/classes/LibXML/XML/Reader.html

Look in the tests directory of libxml-ruby at tc_reader to see some
usage examples.

Matt Margolis

From noreply at rubyforge.org Tue Aug 12 21:26:18 2008
From: noreply at rubyforge.org (noreply at rubyforge.org)
Date: Tue, 12 Aug 2008 21:26:18 -0400 (EDT)
Subject: [libxml-devel] [ libxml-Bugs-21546 ] SaxParser Crashes -- Null Pointer
Message-ID: <20080813012618.F14DF18581B8@rubyforge.org>

Bugs item #21546, was opened at 2008-08-13 01:26
You can respond by visiting:
http://rubyforge.org/tracker/?func=detail&atid=1971&aid=21546&group_id=494

Category: General
Group: None
Status: Open
Resolution: None
Priority: 3
Submitted By: Nick Retallack (nickretallack)
Assigned to: Nobody (None)
Summary: SaxParser Crashes -- Null Pointer

Initial Comment:
Any xml document that contains a DTD causes SaxParser to crash. Example:

require 'rubygems'
require 'libxml'

class Handler
  def method_missing(method_name, *attributes, &block); end
end

parser = LibXML::XML::SaxParser.new
parser.filename = 'anything_with_a_dtd.xml'
parser.callbacks = Handler.new
puts parser.filename
parser.parse

result:
saxmltest.rb:12:in `parse': NULL pointer given (ArgumentError)

It seems I am not the only person having this problem.
http://rubyforge.org/pipermail/libxml-devel/2008-July/001042.html

----------------------------------------------------------------------

You can respond by visiting:
http://rubyforge.org/tracker/?func=detail&atid=1971&aid=21546&group_id=494

From noreply at rubyforge.org Tue Aug 19 14:24:53 2008
From: noreply at rubyforge.org (noreply at rubyforge.org)
Date: Tue, 19 Aug 2008 14:24:53 -0400 (EDT)
Subject: [libxml-devel] [ libxml-Bugs-21607 ] XML::Reader#read_state never returns MODE_EOF
Message-ID: <20080819182453.E9D19167820F@rubyforge.org>

Bugs item #21607, was opened at 2008-08-19 13:24
You can respond by visiting:
http://rubyforge.org/tracker/?func=detail&atid=1971&aid=21607&group_id=494

Category: General
Group: None
Status: Open
Resolution: None
Priority: 3
Submitted By: Kelvin Liu (kliuless)
Assigned to: Nobody (None)
Summary: XML::Reader#read_state never returns MODE_EOF

Initial Comment:
Even after the XML::Reader has reached the end of the stream, the
read_state is returning MODE_READING instead of MODE_EOF.

This is on Mac OS Leopard with libxml 2.6.16. I do not see the problem
on sparc/solaris with libxml 2.6.31. Sorry, I don't have time to try
different versions on the two boxes.

----------------------------------------------------------------------

You can respond by visiting:
http://rubyforge.org/tracker/?func=detail&atid=1971&aid=21607&group_id=494

From cfis at savagexi.com Wed Aug 20 04:42:50 2008
From: cfis at savagexi.com (Charlie Savage)
Date: Wed, 20 Aug 2008 02:42:50 -0600
Subject: [libxml-devel] Build issue on Solaris 5.11 using Sun StudioExpress
In-Reply-To:
References:
Message-ID: <48ABD90A.8060201@savagexi.com>

Any luck on this Ian?

Charlie

From noreply at rubyforge.org Wed Aug 20 04:02:28 2008
From: noreply at rubyforge.org (noreply at rubyforge.org)
Date: Wed, 20 Aug 2008 04:02:28 -0400 (EDT)
Subject: [libxml-devel] [ libxml-Bugs-21395 ] XML::Parser.io unfair to StringIO ?
Message-ID: <20080820080229.1D0CF18581B2@rubyforge.org>

Bugs item #21395, was opened at 2008-07-30 13:53
You can respond by visiting:
http://rubyforge.org/tracker/?func=detail&atid=1971&aid=21395&group_id=494

Category: None
Group: None
>Status: Closed
>Resolution: Rejected
Priority: 3
Submitted By: Nobody (None)
>Assigned to: Charlie Savage (cfis)
Summary: XML::Parser.io unfair to StringIO ?

Initial Comment:
> LibXML::XML::Parser::VERSION
=> "0.8.3"
> xp = LibXML::XML::Parser.new
=> #
> xp.io = StringIO.new('')
TypeError: need an IO object

----------------------------------------------------------------------

>Comment By: Charlie Savage (cfis)
Date: 2008-08-20 02:02

Message:
Since it's easy enough to do this:

xp.string = io.string

I don't think this is worth adding to libxml. Thus closing this bug.

----------------------------------------------------------------------

Comment By: Erik Hollensbe (erikh)
Date: 2008-07-30 15:45

Message:
StringIO doesn't inherit from IO, it just mirrors the interface.

f.e.:

StringIO.new("").kind_of? IO == false

The code will have to be adapted to handle StringIO directly.
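
The point made in the two comments above can be checked with Ruby's
standard library alone. The following is a minimal sketch, not code from
libxml-ruby: the variable names are illustrative, and the final step
only extracts the string that the xp.string = io.string workaround would
hand to the parser.

```ruby
require 'stringio'

io = StringIO.new('<root/>')

# StringIO mirrors the IO interface but is not a subclass of IO,
# so a strict type check such as kind_of?(IO) rejects it.
is_io = io.kind_of?(IO)

# Duck typing still works: StringIO responds to the usual read methods.
can_read = io.respond_to?(:read)

# The workaround from the tracker comment passes the underlying string
# to the parser instead of the StringIO object itself.
xml_text = io.string

puts "kind_of?(IO): #{is_io}"
puts "responds to read: #{can_read}"
puts "string payload: #{xml_text}"
```

A type check based on respond_to?(:read) would accept both IO and
StringIO, which is one way the adaptation Erik mentions could look.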
----------------------------------------------------------------------

You can respond by visiting:
http://rubyforge.org/tracker/?func=detail&atid=1971&aid=21395&group_id=494

From noreply at rubyforge.org Wed Aug 20 04:26:47 2008
From: noreply at rubyforge.org (noreply at rubyforge.org)
Date: Wed, 20 Aug 2008 04:26:47 -0400 (EDT)
Subject: [libxml-devel] [ libxml-Bugs-21424 ] Stringing commands bypasses correct node creation
Message-ID: <20080820082656.696A818581B1@rubyforge.org>

Bugs item #21424, was opened at 2008-08-03 09:50
You can respond by visiting:
http://rubyforge.org/tracker/?func=detail&atid=1971&aid=21424&group_id=494

Category: General
Group: None
>Status: Closed
>Resolution: Accepted
Priority: 3
Submitted By: Nobody (None)
>Assigned to: Charlie Savage (cfis)
Summary: Stringing commands bypasses correct node creation

Initial Comment:
When I create something along the lines of:

foo << XML::Node.new('bar') << "bars contents"

The following is returned:

---------
bars contents
---------

However, this:

foo << bar = XML::Node.new('bar')
bar << "bars contents"

returns this:

---------
bars contents
---------

These results have been sporadic at times, it seems. When I create
XML::Documents in the console, sometimes it works fine, sometimes it
doesn't. I have been unable to pinpoint a specific cause, but I am,
however, having to rewrite a good bit of my code because invalid XML is
being produced due to this bug.

----------------------------------------------------------------------

>Comment By: Charlie Savage (cfis)
Date: 2008-08-20 02:26

Message:
Hi Eric,

The problem is that << returns self, and not the appended child. I've
changed this in trunk, so the next version of libxml should work the way
you request.
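
To illustrate why the return value of << matters when calls are chained,
here is a small pure-Ruby sketch. SketchNode is a hypothetical stand-in
written for this example and is not the libxml-ruby XML::Node class; its
<< returns self, which is the behaviour described in the comment above.

```ruby
# Hypothetical stand-in for XML::Node, for illustration only.
class SketchNode
  attr_reader :name, :children

  def initialize(name)
    @name = name
    @children = []
  end

  # Appends the child and returns the receiver (self), not the child.
  def <<(child)
    @children << child
    self
  end

  # Serializes the node and its children as simple XML text.
  def to_s
    inner = @children.map(&:to_s).join
    inner.empty? ? "<#{name}/>" : "<#{name}>#{inner}</#{name}>"
  end
end

# Chained form: the second << is applied to foo again, so the text
# becomes a sibling of <bar/> rather than its content.
foo = SketchNode.new('foo')
foo << SketchNode.new('bar') << 'bars contents'
chained = foo.to_s

# Explicit form: the text is appended to bar itself.
foo2 = SketchNode.new('foo')
bar = SketchNode.new('bar')
foo2 << bar
bar << 'bars contents'
explicit = foo2.to_s

puts chained
puts explicit
```

If << returned the appended child instead, the chained form would nest
the text inside bar, which appears to be the behaviour the reporter
expected.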
----------------------------------------------------------------------

Comment By: Eric Musgrove (tenpaiyomi)
Date: 2008-08-03 10:01

Message:
This also applies to the node creation as well, as can be seen here:

>> a = XML::Document.new()
=>
>> a.root = XML::Node.new('foo')
=>
>> root = a.root
=>
>> root << XML::Node.new('bar') << XML::Node.new('baz')
=>

>> a = XML::Document.new()
=>
>> a.root = XML::Node.new('foo')
=>
>> root = a.root
=>
>> root << bar = XML::Node.new('bar')
=>
>> bar << XML::Node.new('baz')
=>
>> a
=>
>>

----------------------------------------------------------------------

You can respond by visiting:
http://rubyforge.org/tracker/?func=detail&atid=1971&aid=21424&group_id=494

From noreply at rubyforge.org Wed Aug 20 04:29:38 2008
From: noreply at rubyforge.org (noreply at rubyforge.org)
Date: Wed, 20 Aug 2008 04:29:38 -0400 (EDT)
Subject: [libxml-devel] [ libxml-Bugs-21546 ] SaxParser Crashes -- Null Pointer
Message-ID: <20080820082938.A8B1918581B1@rubyforge.org>

Bugs item #21546, was opened at 2008-08-12 19:26
You can respond by visiting:
http://rubyforge.org/tracker/?func=detail&atid=1971&aid=21546&group_id=494

Category: General
Group: None
Status: Open
Resolution: None
Priority: 3
Submitted By: Nick Retallack (nickretallack)
Assigned to: Nobody (None)
Summary: SaxParser Crashes -- Null Pointer

Initial Comment:
Any xml document that contains a DTD causes SaxParser to crash. Example:

require 'rubygems'
require 'libxml'

class Handler
  def method_missing(method_name, *attributes, &block); end
end

parser = LibXML::XML::SaxParser.new
parser.filename = 'anything_with_a_dtd.xml'
parser.callbacks = Handler.new
puts parser.filename
parser.parse

result:
saxmltest.rb:12:in `parse': NULL pointer given (ArgumentError)

It seems I am not the only person having this problem.
http://rubyforge.org/pipermail/libxml-devel/2008-July/001042.html

----------------------------------------------------------------------

>Comment By: Charlie Savage (cfis)
Date: 2008-08-20 02:29

Message:
Does the tc_sax_parser#test_doctype test work for you? That's copied
directly from the link you mention, and works fine here.

----------------------------------------------------------------------

You can respond by visiting:
http://rubyforge.org/tracker/?func=detail&atid=1971&aid=21546&group_id=494

From noreply at rubyforge.org Wed Aug 20 04:30:11 2008
From: noreply at rubyforge.org (noreply at rubyforge.org)
Date: Wed, 20 Aug 2008 04:30:11 -0400 (EDT)
Subject: [libxml-devel] [ libxml-Bugs-21607 ] XML::Reader#read_state never returns MODE_EOF
Message-ID: <20080820083040.51C901858283@rubyforge.org>

Bugs item #21607, was opened at 2008-08-19 12:24
You can respond by visiting:
http://rubyforge.org/tracker/?func=detail&atid=1971&aid=21607&group_id=494

Category: General
Group: None
Status: Open
Resolution: None
Priority: 3
Submitted By: K Liu (kliuless)
Assigned to: Nobody (None)
Summary: XML::Reader#read_state never returns MODE_EOF

Initial Comment:
Even after the XML::Reader has reached the end of the stream, the
read_state is returning MODE_READING instead of MODE_EOF.

This is on Mac OS Leopard with libxml 2.6.16. I do not see the problem
on sparc/solaris with libxml 2.6.31. Sorry, I don't have time to try
different versions on the two boxes.

----------------------------------------------------------------------

>Comment By: Charlie Savage (cfis)
Date: 2008-08-20 02:30

Message:
K Liu, do you have a test case we can look at?
----------------------------------------------------------------------

You can respond by visiting:
http://rubyforge.org/tracker/?func=detail&atid=1971&aid=21607&group_id=494

From port001 at gmail.com Wed Aug 20 17:58:04 2008
From: port001 at gmail.com (Ian Leitch)
Date: Wed, 20 Aug 2008 22:58:04 +0100
Subject: [libxml-devel] Build issue on Solaris 5.11 using Sun StudioExpress
In-Reply-To: <48ABD90A.8060201@savagexi.com>
References: <48ABD90A.8060201@savagexi.com>
Message-ID:

No, I haven't spent much more than 10 minutes trying to fix it myself.
I'm not completely comfortable with the syntax in
sax_parser_callbacks.inc. If the problem isn't immediately obvious to
anyone I'll have a go at fixing it at some point.

Cheers

2008/8/20 Charlie Savage

> Any luck on this Ian?
>
> Charlie
>
> Ian Leitch wrote:
>
>> Hey all,
>>
>> Has anyone tried to build using StudioExpress yet?
>>
>> StudioExpress-sol-x86-2008-07-24-ii
>>
>> # showrev
>> Release: 5.11
>> Kernel architecture: i86pc
>> Application architecture: i386
>> Kernel version: SunOS 5.11 snv_67
>>
>> I'm getting the following errors:
>>
>> cc -I. -I. -I/opt/csw/lib/ruby/1.8/i386-solaris2.8 -I.
>> -I/usr/local/include -DRUBY_EXTCONF_H=\"extconf.h\"
>> -I/opt/csw/include/libxml2/ -I/opt/csw/include -I/opt/csw/include -KPIC
>> -xO3 -xarch=386 -xspace -xildoff -I/opt/csw/include -I/opt/csw/include
>> -KPIC -I. -I/opt/csw/lib/ruby/1.8/i386-solaris2.8 -I.
-I/usr/local/include
>> -c ruby_xml_sax_parser.c
>> [... further "ruby_libxml.h" warnings of the same form, for lines 3 to 77, trimmed ...]
>>
"ruby_libxml.h", line 78: warning: invalid white space character in >> directive >> "ruby_libxml.h", line 79: warning: invalid white space character in >> directive >> "ruby_libxml.h", line 80: warning: invalid white space character in >> directive >> "ruby_libxml.h", line 81: warning: invalid white space character in >> directive >> "ruby_libxml.h", line 82: warning: invalid white space character in >> directive >> "ruby_libxml.h", line 83: warning: invalid white space character in >> directive >> "ruby_libxml.h", line 84: warning: invalid white space character in >> directive >> "ruby_libxml.h", line 85: warning: invalid white space character in >> directive >> "ruby_libxml.h", line 86: warning: invalid white space character in >> directive >> "ruby_libxml.h", line 95: warning: invalid white space character in >> directive >> "sax_parser_callbacks.inc", line 124: warning: invalid white space >> character in directive >> "sax_parser_callbacks.inc", line 125: syntax error before or at: do >> "sax_parser_callbacks.inc", line 125: invalid source character: '\' >> "sax_parser_callbacks.inc", line 126: invalid source character: '\' >> "sax_parser_callbacks.inc", line 127: invalid source character: '\' >> "sax_parser_callbacks.inc", line 128: invalid source character: '\' >> "sax_parser_callbacks.inc", line 129: invalid source character: '\' >> "sax_parser_callbacks.inc", line 130: invalid source character: '\' >> "sax_parser_callbacks.inc", line 131: invalid source character: '\' >> "sax_parser_callbacks.inc", line 132: invalid source character: '\' >> "sax_parser_callbacks.inc", line 141: invalid source character: '\' >> "sax_parser_callbacks.inc", line 150: invalid source character: '\' >> "sax_parser_callbacks.inc", line 159: invalid source character: '\' >> cc: acomp failed for ruby_xml_sax_parser.c >> *** Error code 2 >> make: Fatal error: Command failed for target `ruby_xml_sax_parser.o' >> >> Any help would be greatly appreciated! 
>> >> Cheers >> >> >> ------------------------------------------------------------------------ >> >> _______________________________________________ >> libxml-devel mailing list >> libxml-devel at rubyforge.org >> http://rubyforge.org/mailman/listinfo/libxml-devel >> > > _______________________________________________ > libxml-devel mailing list > libxml-devel at rubyforge.org > http://rubyforge.org/mailman/listinfo/libxml-devel > -------------- next part -------------- An HTML attachment was scrubbed... URL: From noreply at rubyforge.org Wed Aug 20 18:56:15 2008 From: noreply at rubyforge.org (noreply at rubyforge.org) Date: Wed, 20 Aug 2008 18:56:15 -0400 (EDT) Subject: [libxml-devel] [ libxml-Bugs-21424 ] Stringing commands bypasses correct node creation Message-ID: <20080820225615.BFD3C18581B2@rubyforge.org> Bugs item #21424, was opened at 2008-08-04 00:50 You can respond by visiting: http://rubyforge.org/tracker/?func=detail&atid=1971&aid=21424&group_id=494 Category: General Group: None Status: Closed Resolution: Accepted Priority: 3 Submitted By: Nobody (None) Assigned to: Charlie Savage (cfis) Summary: Stringing commands bypasses correct node creation Initial Comment: When I create something along the lines of: foo << XML::Node.new('bar') << "bars contents" The following is returned: --------- bars contents --------- However, this: foo << bar = XML::Node.new('bar') bar << "bars contents" returns this: --------- bars contents --------- These results have been sporadic at times, it seems. When I create XML::Documents in the console, sometimes it works fine, sometimes it doesn't. I have been unable to pinpoint a specific cause, but I am, however, having to rewrite a good bit of my code because invalid XML is being produced due to this bug. ---------------------------------------------------------------------- Comment By: Masashi Shimbo (shimbo) Date: 2008-08-21 07:56 Message: Hello, Sorry to interrupt, but I am against this change. 
Operator << in Ruby generally returns self, not the appended object. Actually, Node#<< used to return the appended object but it was changed to be consistent with the Ruby standard at one point. Masashi Shimbo ---------------------------------------------------------------------- Comment By: Charlie Savage (cfis) Date: 2008-08-20 17:26 Message: Hi Eric, The problem is that << returns self, and not the appended child. I've changed this in trunk, so the next version of libxml should work the way you request. ---------------------------------------------------------------------- Comment By: Eric Musgrove (tenpaiyomi) Date: 2008-08-04 01:01 Message: This also applies to the node creation as well, as can be seen here: >> a = XML::Document.new() => >> a.root = XML::Node.new('foo') => >> root = a.root => >> root << XML::Node.new('bar') << XML::Node.new('baz') => >> a = XML::Document.new() => >> a.root = XML::Node.new('foo') => >> root = a.root => >> root << bar = XML::Node.new('bar') => >> bar << XML::Node.new('baz') => >> a => >> ---------------------------------------------------------------------- You can respond by visiting: http://rubyforge.org/tracker/?func=detail&atid=1971&aid=21424&group_id=494 From noreply at rubyforge.org Wed Aug 20 19:15:45 2008 From: noreply at rubyforge.org (noreply at rubyforge.org) Date: Wed, 20 Aug 2008 19:15:45 -0400 (EDT) Subject: [libxml-devel] [ libxml-Bugs-21424 ] Stringing commands bypasses correct node creation Message-ID: <20080820231545.E964515F8012@rubyforge.org> Bugs item #21424, was opened at 2008-08-03 10:50 You can respond by visiting: http://rubyforge.org/tracker/?func=detail&atid=1971&aid=21424&group_id=494 Category: General Group: None Status: Closed Resolution: Accepted Priority: 3 Submitted By: Nobody (None) Assigned to: Charlie Savage (cfis) Summary: Stringing commands bypasses correct node creation Initial Comment: When I create something along the lines of: foo << XML::Node.new('bar') << "bars contents" The following 
is returned: --------- bars contents --------- However, this: foo << bar = XML::Node.new('bar') bar << "bars contents" returns this: --------- bars contents --------- These results have been sporadic at times, it seems. When I create XML::Documents in the console, sometimes it works fine, sometimes it doesn't. I have been unable to pinpoint a specific cause, but I am, however, having to rewrite a good bit of my code because invalid XML is being produced due to this bug. ---------------------------------------------------------------------- Comment By: Eric Musgrove (tenpaiyomi) Date: 2008-08-20 18:15 Message: Perhaps then, some alternative to changing the behavior of <> a = XML::Document.new() => >> a.root = XML::Node.new('foo') => >> root = a.root => >> root << XML::Node.new('bar') << XML::Node.new('baz') => >> a = XML::Document.new() => >> a.root = XML::Node.new('foo') => >> root = a.root => >> root << bar = XML::Node.new('bar') => >> bar << XML::Node.new('baz') => >> a => >> ---------------------------------------------------------------------- You can respond by visiting: http://rubyforge.org/tracker/?func=detail&atid=1971&aid=21424&group_id=494 From cfis at savagexi.com Wed Aug 20 23:58:43 2008 From: cfis at savagexi.com (Charlie Savage) Date: Wed, 20 Aug 2008 21:58:43 -0600 Subject: [libxml-devel] << operator Message-ID: <48ACE7F3.2010802@savagexi.com> This code: node = XML::Node.new('foo') << XML::Node.new('bar') << "bars contents" Generates this xml: puts node bars contents That is because the << returns self. Eric Musgrove if instead it should work like this (see http://rubyforge.org/tracker/?func=detail&atid=1971&aid=21424&group_id=494): puts node.parent.parent bars contents By making the << operator return the appended node. << returning the appended node seems more useful, but doesn't follow the Ruby standard as pointed out by Masashi Shimbo. Thoughts? Charlie -------------- next part -------------- A non-text attachment was scrubbed... 
Name: smime.p7s
Type: application/x-pkcs7-signature
Size: 3237 bytes
Desc: S/MIME Cryptographic Signature
URL:

From sean at chittenden.org Thu Aug 21 15:13:10 2008
From: sean at chittenden.org (Sean Chittenden)
Date: Thu, 21 Aug 2008 12:13:10 -0700
Subject: [libxml-devel] << operator
In-Reply-To: <48ACE7F3.2010802@savagexi.com>
References: <48ACE7F3.2010802@savagexi.com>
Message-ID: <0A2BCA1D-481E-46D9-961A-A80DAB706651@chittenden.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

> This code:
>
> node = XML::Node.new('foo') << XML::Node.new('bar') << "bars contents"
>
> Generates this xml:
>
> puts node
> bars contents
>
> That is because the << returns self.

That's bad.

> Eric Musgrove if instead it should work like this (see http://rubyforge.org/tracker/?func=detail&atid=1971&aid=21424&group_id=494)
> :
>
> puts node.parent.parent
>
>
> bars contents
>
>
> By making the << operator return the appended node.
>
> << returning the appended node seems more useful, but doesn't follow
> the Ruby standard as pointed out by Masashi Shimbo.
>
> Thoughts?

I have all kinds of issues with "the Ruby standard" and think this is an
opportune place to break that convention. The path of least resistance
should win (i.e. the programmer is right, do what's easiest for the
programmer).

A contrived example, but this code absolutely irritates the shit out of me:

class F
  def a=(b)
    @z = b + 1
    return @z
  end
  def a
    return @z
  end
end

f = F.new
x = f.a = 1
p f.a # Prints 2
p x # Prints 1

@#$@#$@# @#$@#$ @#$@# @#$@# @#$@#$ @#$@ing @#$#@@!!!!!!!

Ignore the "Ruby standard."
-sc

- --
Sean Chittenden
sean at chittenden.org

-----BEGIN PGP SIGNATURE-----

iEYEARECAAYFAkitvkcACgkQTrydwWwuXhaEhACdHk8MlSI+hLfji+tUVm1DMGGo
ZAgAniAAKqXGozIAyATQl+o4EFU9fGyf
=9VR5
-----END PGP SIGNATURE-----

From noreply at rubyforge.org Wed Aug 20 23:44:05 2008
From: noreply at rubyforge.org (noreply at rubyforge.org)
Date: Wed, 20 Aug 2008 23:44:05 -0400 (EDT)
Subject: [libxml-devel] [ libxml-Bugs-21424 ] Stringing commands bypasses correct node creation
Message-ID: <20080821034406.07A28185859F@rubyforge.org>

Bugs item #21424, was opened at 2008-08-03 09:50
You can respond by visiting:
http://rubyforge.org/tracker/?func=detail&atid=1971&aid=21424&group_id=494

Category: General
Group: None
Status: Closed
Resolution: Accepted
Priority: 3
Submitted By: Nobody (None)
Assigned to: Charlie Savage (cfis)
Summary: Stringing commands bypasses correct node creation

Initial Comment:
When I create something along the lines of:

foo << XML::Node.new('bar') << "bars contents"

The following is returned:
---------
bars contents
---------

However, this:

foo << bar = XML::Node.new('bar')
bar << "bars contents"

returns this:
---------
bars contents
---------

These results have been sporadic at times, it seems. When I create
XML::Documents in the console, sometimes it works fine, sometimes it
doesn't. I have been unable to pinpoint a specific cause, but I am,
however, having to rewrite a good bit of my code because invalid XML is
being produced due to this bug.

----------------------------------------------------------------------

>Comment By: Charlie Savage (cfis)
Date: 2008-08-20 21:44

Message:
Hi Masashi,

Yes, it's a good point. But in this case it seems more useful to return
the appended object. I'll post to the mailing list for more discussion.
----------------------------------------------------------------------

Comment By: Eric Musgrove (tenpaiyomi)
Date: 2008-08-20 17:15

Message:
Perhaps then, some alternative to changing the behavior of <> a = XML::Document.new()
=>
>> a.root = XML::Node.new('foo')
=>
>> root = a.root
=>
>> root << XML::Node.new('bar') << XML::Node.new('baz')
=>
>> a = XML::Document.new()
=>
>> a.root = XML::Node.new('foo')
=>
>> root = a.root
=>
>> root << bar = XML::Node.new('bar')
=>
>> bar << XML::Node.new('baz')
=>
>> a
=>
>>

----------------------------------------------------------------------

You can respond by visiting:
http://rubyforge.org/tracker/?func=detail&atid=1971&aid=21424&group_id=494

From cfis at savagexi.com Thu Aug 21 15:23:13 2008
From: cfis at savagexi.com (Charlie Savage)
Date: Thu, 21 Aug 2008 13:23:13 -0600
Subject: [libxml-devel] << operator
In-Reply-To: <0A2BCA1D-481E-46D9-961A-A80DAB706651@chittenden.org>
References: <48ACE7F3.2010802@savagexi.com> <0A2BCA1D-481E-46D9-961A-A80DAB706651@chittenden.org>
Message-ID: <48ADC0A1.8010609@savagexi.com>

> A contrived example, but this code absolutely irritates the shit out of me:
>
> class F
>   def a=(b)
>     @z = b + 1
>     return @z
>   end
>   def a
>     return @z
>   end
> end
>
> f = F.new
> x = f.a = 1
> p f.a # Prints 2
> p x # Prints 1

That's bizarre. What is going on there? Print statements show that def a=
is called before def a, so how can x ever be set to 1?

Charlie
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/x-pkcs7-signature
Size: 3237 bytes
Desc: S/MIME Cryptographic Signature
URL:

From transfire at gmail.com Thu Aug 21 15:28:27 2008
From: transfire at gmail.com (Trans)
Date: Thu, 21 Aug 2008 15:28:27 -0400
Subject: [libxml-devel] << operator
In-Reply-To: <48ACE7F3.2010802@savagexi.com>
References: <48ACE7F3.2010802@savagexi.com>
Message-ID: <4b6f054f0808211228v7be0f89du7e3f4b16526944d0@mail.gmail.com>

2008/8/20 Charlie Savage :
> This code:
>
> node = XML::Node.new('foo') << XML::Node.new('bar') << "bars contents"
>
> Generates this xml:
>
> puts node
> bars contents
>
> That is because the << returns self.
>
> Eric Musgrove if instead it should work like this (see
> http://rubyforge.org/tracker/?func=detail&atid=1971&aid=21424&group_id=494):
>
> puts node.parent.parent
>
>
> bars contents
>
>
> By making the << operator return the appended node.
>
> << returning the appended node seems more useful, but doesn't follow the
> Ruby standard as pointed out by Masashi Shimbo.
>
> Thoughts?

The most general notion of #<< is "push", so this behavior makes sense
for anything that behaves like a stack. For XML which is a tree, it
does not make sense and I agree it would be better to return the
appended node.

T.

From cfis at savagexi.com Thu Aug 21 15:29:22 2008
From: cfis at savagexi.com (Charlie Savage)
Date: Thu, 21 Aug 2008 13:29:22 -0600
Subject: [libxml-devel] << operator
In-Reply-To: <48ACE7F3.2010802@savagexi.com>
References: <48ACE7F3.2010802@savagexi.com>
Message-ID: <48ADC212.7000701@savagexi.com>

> node = XML::Node.new('foo') << XML::Node.new('bar') << "bars contents"
>
> puts node.parent.parent
>
>
> bars contents
>

FYI - one downside of returning the appended object as opposed to self
is that the variable node above is the "bar contents" node. Thus you
have to write:

puts node.parent.parent.

I find that surprising. It would be nice if node was the top node.
Thus in the example it would be nice for << to return the child object
in some cases and self in others. Sigh.

Anyone see a way of getting the best of both worlds?

Charlie
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/x-pkcs7-signature
Size: 3237 bytes
Desc: S/MIME Cryptographic Signature
URL:

From sean at chittenden.org Thu Aug 21 15:38:26 2008
From: sean at chittenden.org (Sean Chittenden)
Date: Thu, 21 Aug 2008 12:38:26 -0700
Subject: [libxml-devel] << operator
In-Reply-To: <48ADC0A1.8010609@savagexi.com>
References: <48ACE7F3.2010802@savagexi.com> <0A2BCA1D-481E-46D9-961A-A80DAB706651@chittenden.org> <48ADC0A1.8010609@savagexi.com>
Message-ID:

>> A contrived example, but this code absolutely irritates the shit
>> out of me:
>> class F
>>   def a=(b)
>>     @z = b + 1
>>     return @z
>>   end
>>   def a
>>     return @z
>>   end
>> end
>> f = F.new
>> x = f.a = 1
>> p f.a # Prints 2
>> p x # Prints 1
>
> That's bizarre. What is going on there? Print statements show that
> def a= is called before def a, so how can x ever be set to 1?

It's an "optimization". The rvalue of an assignment is the argument to
all lvalues in an assignment statement. i.e.: x = f.a = 1 is translated
into "x = 1 ; f.a = 1". I think that was a very bad decision, but oh
well. Ruby ain't perfect and this, to me at least, is perfect
justification for doing what is right vs. what is "standard."
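A minimal sketch of the behaviour described above, in plain Ruby with nothing libxml-specific. The class mirrors the contrived example from the thread, and the `send` call at the end is added here only for contrast. The "translated into" wording is probably best read loosely: the setter still runs, but an assignment expression evaluates to its right-hand side, so whatever the setter returns is discarded.

```ruby
# Toy class mirroring the contrived example from the thread.
class F
  def a=(b)
    @z = b + 1
    return @z   # ignored when the method is invoked as `f.a = ...`
  end

  def a
    @z
  end
end

f = F.new
x = f.a = 1     # the setter runs, but the expression evaluates to 1
p f.a           # prints 2 (the stored value, b + 1)
p x             # prints 1 (the right-hand side of the assignment)

# Calling the setter as an ordinary method does expose its return value.
y = f.send(:a=, 1)
p y             # prints 2
```

So the chained assignment is not a special rewrite of the statement; it is simply that setter return values never reach the caller when assignment syntax is used.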
-sc

>> class F
>>   def a=(b)
>>     @z = b + 1
>>     return @z
>>   end
>>   def a
>>     return @z
>>   end
>> end
=> nil
>> f = F.new
=> #
>> x = f.a = 1
=> 1
>> x
=> 1
>> f.a
=> 2

--
Sean Chittenden
sean at chittenden.org

From cfis at savagexi.com Thu Aug 21 15:44:15 2008
From: cfis at savagexi.com (Charlie Savage)
Date: Thu, 21 Aug 2008 13:44:15 -0600
Subject: [libxml-devel] << operator
In-Reply-To:
References: <48ACE7F3.2010802@savagexi.com> <0A2BCA1D-481E-46D9-961A-A80DAB706651@chittenden.org> <48ADC0A1.8010609@savagexi.com>
Message-ID: <48ADC58F.7040206@savagexi.com>

>>> A contrived example, but this code absolutely irritates the shit out
>>> of me:
>>> class F
>>>   def a=(b)
>>>     @z = b + 1
>>>     return @z
>>>   end
>>>   def a
>>>     return @z
>>>   end
>>> end
>>> f = F.new
>>> x = f.a = 1
>>> p f.a # Prints 2
>>> p x # Prints 1
>>
>> That's bizarre. What is going on there? Print statements show that
>> def a= is called before def a, so how can x ever be set to 1?
>
>
> It's an "optimization". The rvalue of an assignment is the argument to
> all lvalues in an assignment statement. i.e.: x = f.a = 1 is translated
> into "x = 1 ; f.a = 1". I think that was a very bad decision, but oh
> well. Ruby ain't perfect and this, to me at least, is perfect
> justification for doing what is right vs. what is "standard." -sc

Ah. That is silly.

Charlie
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s Type: application/x-pkcs7-signature Size: 3237 bytes Desc: S/MIME Cryptographic Signature URL: From sean at chittenden.org Thu Aug 21 15:51:28 2008 From: sean at chittenden.org (Sean Chittenden) Date: Thu, 21 Aug 2008 12:51:28 -0700 Subject: [libxml-devel] << operator In-Reply-To: <48ADC212.7000701@savagexi.com> References: <48ACE7F3.2010802@savagexi.com> <48ADC212.7000701@savagexi.com> Message-ID: >> node = XML::Node.new('foo') << XML::Node.new('bar') << "bars >> contents" >> puts node.parent.parent >> >> bars contents >> > > FYI - one downside of returning the appended object as opposed to > self is that the variable node above is the "bar contents" node. > Thus you have to write: > > puts node.parent.parent. > > I find that surprising. It would be nice if node was the top node. > > Thus in the example it would be nice for << to return the child > object in some cases and self in others. Sigh. > > Anyone see a way of getting the best of both worlds? << should operate on the contents of a node, IMHO. So: node1 = XML::Node.new('foo') node1 << 'bar' would produce 'bar' and not 'bar'. However: node2 = XML::Node.new('bar') node1 << node2 should produce ''. If the argument to << is a string, append to the xmlNode->content via xmlNodeAddContent(3). If, however, the argument is an XML::Node, then use xmlAddChild(3) to add an xmlNode->child node. + and << should behave differently: 'node1 + node2' should produce '' I didn't spend a lot of time on the operators when writing this, but I feel pretty strongly that there should be such a distinction and it should behave differently based on the argument's data type (not that I'm actively using the lib atm, but...). That a useful suggestion? 
-sc

--
Sean Chittenden
sean at chittenden.org

From sean at chittenden.org Thu Aug 21 15:53:06 2008
From: sean at chittenden.org (Sean Chittenden)
Date: Thu, 21 Aug 2008 12:53:06 -0700
Subject: [libxml-devel] << operator
In-Reply-To: <48ADC58F.7040206@savagexi.com>
References: <48ACE7F3.2010802@savagexi.com> <0A2BCA1D-481E-46D9-961A-A80DAB706651@chittenden.org> <48ADC0A1.8010609@savagexi.com> <48ADC58F.7040206@savagexi.com>
Message-ID: <759FF3F7-02F9-4CD0-8BB3-7CABAB499F78@chittenden.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

>>>> A contrived example, but this code absolutely irritates the shit
>>>> out of me:
>>>> class F
>>>>   def a=(b)
>>>>     @z = b + 1
>>>>     return @z
>>>>   end
>>>>   def a
>>>>     return @z
>>>>   end
>>>> end
>>>> f = F.new
>>>> x = f.a = 1
>>>> p f.a # Prints 2
>>>> p x # Prints 1
>>>
>>> That's bizarre. What is going on there? Print statements show
>>> that def a= is called before def a, so how can x ever be set to 1?
>> It's an "optimization". The rvalue of an assignment is the
>> argument to all lvalues in an assignment statement. i.e.: x = f.a
>> = 1 is translated into "x = 1 ; f.a = 1". I think that was a very
>> bad decision, but oh well. Ruby ain't perfect and this, to me at
>> least, is perfect justification for doing what is right vs. what is
>> "standard." -sc
>
> Ah. That is silly.

And because it is a counterintuitive construct compared to nearly
every other language on the planet, it is VERY difficult to figure out
for the first time!!!!! Ugh.
-sc

- --
Sean Chittenden
sean at chittenden.org

-----BEGIN PGP SIGNATURE-----

iEYEARECAAYFAkitx6IACgkQTrydwWwuXhbJ/ACeLd18kCsGW2aPx8JXVVKLoTn2
IIoAoI4rTXZ5jYtcrSk76OifSgcKK/DV
=82es
-----END PGP SIGNATURE-----

From sean at chittenden.org Thu Aug 21 15:57:47 2008
From: sean at chittenden.org (Sean Chittenden)
Date: Thu, 21 Aug 2008 12:57:47 -0700
Subject: [libxml-devel] << operator
In-Reply-To:
References: <48ACE7F3.2010802@savagexi.com> <48ADC212.7000701@savagexi.com>
Message-ID: <4ADBF8D5-B2D3-4E41-B01B-A558F715F0B8@chittenden.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

> If the argument to << is a string, append to the xmlNode->content
> via xmlNodeAddContent(3). If, however, the argument is an
> XML::Node, then use xmlAddChild(3) to add an xmlNode->child node.

And if it's an attribute:

node1 << XML::Attribute.new('foo','bar')

You'd end up with:

Here's the problem and line that I haven't drawn yet:

node1 << XML::Attribute.new('foo','bar') << 'baz'

What's that produce?

'baz' or '' ??

I think chaining objects together is a common practice and more
convenient than doing one assignment per line, so I'm biased towards
the behavior of the former (not the latter). Arguments/discussion
welcome.

-sc

- --
Sean Chittenden
sean at chittenden.org

-----BEGIN PGP SIGNATURE-----

iEYEARECAAYFAkityLsACgkQTrydwWwuXhYKewCgpGTX8ZXP9ydP+Uy/au24K9yu
p44AnRxLg0Mls4eAyXhVjAG6pEa+QRHx
=CZJw
-----END PGP SIGNATURE-----

From cfis at savagexi.com Thu Aug 21 16:01:01 2008
From: cfis at savagexi.com (Charlie Savage)
Date: Thu, 21 Aug 2008 14:01:01 -0600
Subject: [libxml-devel] << operator
In-Reply-To:
References: <48ACE7F3.2010802@savagexi.com> <48ADC212.7000701@savagexi.com>
Message-ID: <48ADC97D.7030006@savagexi.com>

> So:
>
> node1 = XML::Node.new('foo')
> node1 << 'bar'
>
> would produce 'bar' and not 'bar'. However:
>
> node2 = XML::Node.new('bar')
> node1 << node2
>
> should produce ''.
>
> If the argument to << is a string, append to the xmlNode->content via
> xmlNodeAddContent(3). If, however, the argument is an XML::Node, then
> use xmlAddChild(3) to add an xmlNode->child node.
>
>
> + and << should behave differently:
>
> 'node1 + node2' should produce ''

Yes, that is how it currently works. Note if you do this:

node1 = XML::Node.new('foo')
node1 << ''

You get this:

<bar/>

Which seems fine to me.

However, that doesn't really help with this issue:

node = XML::Node.new('foo') << XML::Node.new('bar') << XML::Node.new('baz')

If << returns self, then you get:

'' where node points to foo

If << returns the appended child, then:

'' where node points to baz. It's the pointing to baz bit I don't like.

Charlie
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/x-pkcs7-signature
Size: 3237 bytes
Desc: S/MIME Cryptographic Signature
URL:

From cfis at savagexi.com Thu Aug 21 16:02:18 2008
From: cfis at savagexi.com (Charlie Savage)
Date: Thu, 21 Aug 2008 14:02:18 -0600
Subject: [libxml-devel] << operator
In-Reply-To: <4ADBF8D5-B2D3-4E41-B01B-A558F715F0B8@chittenden.org>
References: <48ACE7F3.2010802@savagexi.com> <48ADC212.7000701@savagexi.com> <4ADBF8D5-B2D3-4E41-B01B-A558F715F0B8@chittenden.org>
Message-ID: <48ADC9CA.50904@savagexi.com>

Sean Chittenden wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
>> If the argument to << is a string, append to the xmlNode->content via
>> xmlNodeAddContent(3). If, however, the argument is an XML::Node, then
>> use xmlAddChild(3) to add an xmlNode->child node.
>
> And if it's an attribute:
>
> node1 << XML::Attribute.new('foo','bar')

Hmm, I wonder if that still works. I suspect it may not, would have to
check.

Charlie
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s Type: application/x-pkcs7-signature Size: 3237 bytes Desc: S/MIME Cryptographic Signature URL: From sean at chittenden.org Thu Aug 21 16:16:27 2008 From: sean at chittenden.org (Sean Chittenden) Date: Thu, 21 Aug 2008 13:16:27 -0700 Subject: [libxml-devel] << operator In-Reply-To: <48ADC97D.7030006@savagexi.com> References: <48ACE7F3.2010802@savagexi.com> <48ADC212.7000701@savagexi.com> <48ADC97D.7030006@savagexi.com> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 > Note if you do this: > > node1 = XML::Node.new('foo') > node1 << '' > > You get this: > > <bar/> > > Which seems fine to me. Agreed, I'll pass on creating an XML CSS vulnerability. > However, that doesn't really help with this issue: Oh! ECOFFEE > node = XML::Node.new('foo') << XML::Node.new('bar') << > XML::Node.new('baz') > > If << returns self, then you get: > > '' where node points to foo > > If << returns the appended child, then: > > '' where node points to baz. Its the > pointing to baz bit I don't like. XML::Node.new('foo') << XML::Node.new('bar') << XML::Node.new('baz') Should produce: and XML::Node.new('foo') << XML::Node.new('bar') + XML::Node.new('baz') should produce the XML: XML::Node<<(arg) should return the argument being appended *ONLY IF* the argument is an XML::Node, else it should return the xmlNode that had its content modified (e.g. XML::Attributes and Strings). That should solve this problem. 
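A rough sketch of the rule proposed above, using a hypothetical ToyNode class rather than the real XML::Node (attributes and XML escaping are left out entirely, and the serialisation is only there to make the result visible). The assumption is that << returns the appended argument when it is a node and returns the receiver when only content was added, so a chain descends through nodes but stays put on strings:

```ruby
# Hypothetical toy node; it only imitates the proposed return-value rule
# and is not the libxml-ruby implementation.
class ToyNode
  attr_reader :name, :content, :children

  def initialize(name)
    @name = name
    @content = ''.dup   # mutable buffer for text content
    @children = []
  end

  def <<(arg)
    if arg.is_a?(ToyNode)
      @children << arg
      arg               # a node was appended: return it so chains descend
    else
      @content << arg.to_s
      self              # only content changed: return the node itself
    end
  end

  def to_s
    inner = content + children.map(&:to_s).join
    inner.empty? ? "<#{name}/>" : "<#{name}>#{inner}</#{name}>"
  end
end

foo = ToyNode.new('foo')
last = foo << ToyNode.new('bar') << 'bars contents'
puts foo         # <foo><bar>bars contents</bar></foo>
puts last.name   # bar
```

Note that the chain still ends on the innermost node (`bar` here), which is the part of the trade-off the thread keeps coming back to.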
-sc - -- Sean Chittenden sean at chittenden.org -----BEGIN PGP SIGNATURE----- iEYEARECAAYFAkitzRsACgkQTrydwWwuXhb/pgCfZua676fAwCJe9hW577AnF3u0 TrIAn02in72+UCysiFXfT8pdI2G4zez+ =e45q -----END PGP SIGNATURE----- From cfis at savagexi.com Thu Aug 21 17:01:57 2008 From: cfis at savagexi.com (Charlie Savage) Date: Thu, 21 Aug 2008 15:01:57 -0600 Subject: [libxml-devel] << operator In-Reply-To: References: <48ACE7F3.2010802@savagexi.com> <48ADC212.7000701@savagexi.com> <48ADC97D.7030006@savagexi.com> Message-ID: <48ADD7C5.9000505@savagexi.com> > > XML::Node.new('foo') << XML::Node.new('bar') << XML::Node.new('baz') > > Should produce: > > > > XML::Node<<(arg) should return the argument being appended *ONLY IF* the > argument is an XML::Node, else it should return the xmlNode that had its > content modified (e.g. XML::Attributes and Strings). > > That should solve this problem. So in this case the first << returns the 'bar' node and the second << returns the baz node. Thus: node = XML::Node.new('foo') << XML::Node.new('bar') << XML::Node.new('baz') node is baz. Which is annoying, because most of the time I imagine you want foo (at least I want foo). Thus, returning the appended argument is great, until the very end, where you want the original node... Charlie -------------- next part -------------- A non-text attachment was scrubbed... 
Name: smime.p7s Type: application/x-pkcs7-signature Size: 3237 bytes Desc: S/MIME Cryptographic Signature URL: From sean at chittenden.org Thu Aug 21 17:10:55 2008 From: sean at chittenden.org (Sean Chittenden) Date: Thu, 21 Aug 2008 14:10:55 -0700 Subject: [libxml-devel] << operator In-Reply-To: <48ADD7C5.9000505@savagexi.com> References: <48ACE7F3.2010802@savagexi.com> <48ADC212.7000701@savagexi.com> <48ADC97D.7030006@savagexi.com> <48ADD7C5.9000505@savagexi.com> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 >> XML::Node.new('foo') << XML::Node.new('bar') << XML::Node.new('baz') >> Should produce: >> >> XML::Node<<(arg) should return the argument being appended *ONLY >> IF* the argument is an XML::Node, else it should return the xmlNode >> that had its content modified (e.g. XML::Attributes and Strings). >> That should solve this problem. > > So in this case the first << returns the 'bar' node and the second > << returns the baz node. > > Thus: > > node = XML::Node.new('foo') << XML::Node.new('bar') << > XML::Node.new('baz') > > node is baz. Which is annoying, because most of the time I imagine > you want foo (at least I want foo). Thus, returning the appended > argument is great, until the very end, where you want the original > node... ahahahaa... doh! Annoying and equally confusing. Since this is a fragment, what about: node = (XML::Node.new('foo') << XML::Node.new('bar') << XML::Node.new('baz')).root or node = (XML::Node.new('foo') << XML::Node.new('bar') << XML::Node.new('baz')).parent! where "parent!" recursively searches for the parent node that has a null parent (??). I'm guessing the most common situation is going to be finding a node in a document, then modifying it. Preserving an easy to use syntax for chaining nodes when building a fragment is more important than an assignment. Is that an acceptable argument to everyone? Feel free to debate this point! 
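The root / parent! idea above might look roughly like this. LinkedNode and its root method are hypothetical names chosen for illustration, not libxml-ruby calls; the sketch assumes << returns the appended child and that each node records its parent, so root can climb the parent links until it reaches a node that has none:

```ruby
# Hypothetical toy node with parent links; not the libxml-ruby API.
class LinkedNode
  attr_reader :name, :children
  attr_accessor :parent

  def initialize(name)
    @name = name
    @children = []
    @parent = nil
  end

  # Append a child, record its parent, and return the child.
  def <<(child)
    child.parent = self
    @children << child
    child
  end

  # Walk up until a node without a parent is reached.
  def root
    node = self
    node = node.parent while node.parent
    node
  end
end

node = (LinkedNode.new('foo') << LinkedNode.new('bar') << LinkedNode.new('baz')).root
puts node.name   # foo
```

The parentheses are needed because the method call binds tighter than <<; without them, root would be called on the last operand only.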
- -sc - -- Sean Chittenden sean at chittenden.org -----BEGIN PGP SIGNATURE----- iEYEARECAAYFAkit2eAACgkQTrydwWwuXhawEQCff1y9cm8HIxInKCz/318Cy8xx NKAAmwUs622pU+ScMyLU/oIGsCQ6nRrD =UOJa -----END PGP SIGNATURE----- From shimbo at is.naist.jp Fri Aug 22 00:23:14 2008 From: shimbo at is.naist.jp (Masashi Shimbo) Date: Fri, 22 Aug 2008 13:23:14 +0900 Subject: [libxml-devel] << operator In-Reply-To: References: <48ACE7F3.2010802@savagexi.com> <48ADC212.7000701@savagexi.com> <48ADC97D.7030006@savagexi.com> <48ADD7C5.9000505@savagexi.com> Message-ID: <989032930808212123j6449ee35y27701ec9085d886a@mail.gmail.com> On Fri, Aug 22, 2008 at 6:10 AM, Sean Chittenden wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > >>> XML::Node.new('foo') << XML::Node.new('bar') << XML::Node.new('baz') >>> Should produce: >>> >>> XML::Node<<(arg) should return the argument being appended *ONLY IF* the >>> argument is an XML::Node, else it should return the xmlNode that had its >>> content modified (e.g. XML::Attributes and Strings). >>> That should solve this problem. >> >> So in this case the first << returns the 'bar' node and the second << >> returns the baz node. >> >> Thus: >> >> node = XML::Node.new('foo') << XML::Node.new('bar') << >> XML::Node.new('baz') >> >> node is baz. Which is annoying, because most of the time I imagine you >> want foo (at least I want foo). Thus, returning the appended argument is >> great, until the very end, where you want the original node... The above case by Charlie very much summarizes one point of my objection. And my impression is that all the discussion so far ignores how to make trees with two or more children (easily). It is not clear to me (or am I just stupid?) how to build a tree with two or more children when << returns a child instead of self, without variable assignments. 
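The difficulty raised above can be sketched with a toy class. ChildReturningNode below is a hypothetical stand-in, not the libxml-ruby API; its << appends and then returns the appended child, which is the variant being questioned. Under that assumption a chain only ever descends, so attaching a second child to the same parent needs the parent held in a variable:

```ruby
# Hypothetical stand-in for an XML node; not the libxml-ruby API.
class ChildReturningNode
  attr_reader :name, :children

  def initialize(name)
    @name = name
    @children = []
  end

  # Variant under discussion: append, then return the appended child.
  def <<(child)
    @children << child
    child
  end

  def to_s
    return "<#{name}/>" if children.empty?
    "<#{name}>#{children.map(&:to_s).join}</#{name}>"
  end
end

N = ChildReturningNode

# Chaining walks down the tree, so this builds the path a > b > c ...
a = N.new('a')
a << N.new('b') << N.new('c')
puts a   # <a><b><c/></b></a>

# ... and giving `a` a second child means naming `a` again.
a << N.new('d') << N.new('e')
puts a   # <a><b><c/></b><d><e/></d></a>
```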
It is easy when << returns self:

  A << (B << C) \
    << (D << E)

This makes A with two children, B and D, each with a child.
If you want a path (tree without branches),
you can still do

  A << (B << (C << D))

Masashi Shimbo

From transfire at gmail.com Fri Aug 22 03:02:15 2008
From: transfire at gmail.com (Trans)
Date: Fri, 22 Aug 2008 03:02:15 -0400
Subject: [libxml-devel] << operator
In-Reply-To: <989032930808212123j6449ee35y27701ec9085d886a@mail.gmail.com>
References: <48ACE7F3.2010802@savagexi.com>
    <48ADC212.7000701@savagexi.com>
    <48ADC97D.7030006@savagexi.com>
    <48ADD7C5.9000505@savagexi.com>
    <989032930808212123j6449ee35y27701ec9085d886a@mail.gmail.com>
Message-ID: <4b6f054f0808220002n17074a07xf83d0f122a99e671@mail.gmail.com>

On Fri, Aug 22, 2008 at 12:23 AM, Masashi Shimbo wrote:
> It is easy when << returns self:
>
> A << (B << C) \
> << (D << E)
>
> This makes A with two children, B and D, each with a child.
> If you want a path (tree without branches),
> you can still do
>
> A << (B << (C << D))

Doh! Of course.

It has to return self.

T.

From dsisnero at gmail.com Fri Aug 22 14:26:57 2008
From: dsisnero at gmail.com (Dominic Sisneros)
Date: Fri, 22 Aug 2008 12:26:57 -0600
Subject: [libxml-devel] Git repository?
Message-ID:

Could you convert this to a git repository. I would like to try out
some changes and it makes it easier to fork and pull from git.

Thanks,

Dominic

From transfire at gmail.com Fri Aug 22 16:13:21 2008
From: transfire at gmail.com (Trans)
Date: Fri, 22 Aug 2008 13:13:21 -0700 (PDT)
Subject: [libxml-devel] Git repository?
In-Reply-To:
References:
Message-ID: <710d8140-df83-4e33-893f-39ef0f5ce74b@k13g2000hse.googlegroups.com>

On Aug 22, 2:26 pm, "Dominic Sisneros" wrote:
> Could you convert this to a git repository. I would like to try out
> some changes and it makes it easier to fork and pull from git.

Sean says, "devil seed" ;)

We've talked about this before. Sean doesn't like git. And honestly I
can't say I totally disagree with him.
If Rubyforge ever supported Mercurial, then I think Sean would support
the switch but I talked to Tom Copeland, and while that's on the TODO
list, there is no telling when it will happen.

Why not just branch the SVN? I can spell out how if you're not
familiar with it.

T.

From erik at hollensbe.org Sat Aug 23 00:14:24 2008
From: erik at hollensbe.org (Erik Hollensbe)
Date: Fri, 22 Aug 2008 21:14:24 -0700
Subject: [libxml-devel] Git repository?
In-Reply-To: <710d8140-df83-4e33-893f-39ef0f5ce74b@k13g2000hse.googlegroups.com>
References: <710d8140-df83-4e33-893f-39ef0f5ce74b@k13g2000hse.googlegroups.com>
Message-ID: <200808222114.24823.erik@hollensbe.org>

On Friday 22 August 2008 13:13:21 Trans wrote:
> On Aug 22, 2:26 pm, "Dominic Sisneros" wrote:
> > Could you convert this to a git repository. I would like to try out
> > some changes and it makes it easier to fork and pull from git.

> Why not just branch the SVN? I can spell out how if you're not
> familiar with it.

Or they could just use git-svn, which makes these kinds of things trivial
and doesn't impose personal choices on others.

-Erik

From transfire at gmail.com Sat Aug 23 04:37:27 2008
From: transfire at gmail.com (Trans)
Date: Sat, 23 Aug 2008 04:37:27 -0400
Subject: [libxml-devel] Git repository?
In-Reply-To: <200808222114.24823.erik@hollensbe.org>
References: <710d8140-df83-4e33-893f-39ef0f5ce74b@k13g2000hse.googlegroups.com>
    <200808222114.24823.erik@hollensbe.org>
Message-ID: <4b6f054f0808230137m4f2e3954k391ee31f92b87d4e@mail.gmail.com>

On Sat, Aug 23, 2008 at 12:14 AM, Erik Hollensbe wrote:
> Or they could just use git-svn, which makes these kinds of things trivial and
> doesn't impose personal choices on others.

Good Answer.

T.

From sean at chittenden.org Sun Aug 24 12:21:35 2008
From: sean at chittenden.org (Sean Chittenden)
Date: Sun, 24 Aug 2008 09:21:35 -0700
Subject: [libxml-devel] << operator
In-Reply-To: <4b6f054f0808220002n17074a07xf83d0f122a99e671@mail.gmail.com>
References: <48ACE7F3.2010802@savagexi.com>
    <48ADC212.7000701@savagexi.com>
    <48ADC97D.7030006@savagexi.com>
    <48ADD7C5.9000505@savagexi.com>
    <989032930808212123j6449ee35y27701ec9085d886a@mail.gmail.com>
    <4b6f054f0808220002n17074a07xf83d0f122a99e671@mail.gmail.com>
Message-ID:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

>> It is easy when << returns self:
>>
>> A << (B << C) \
>> << (D << E)
>>
>> This makes A with two children, B and D, each with a child.
>> If you want a path (tree without branches),
>> you can still do
>>
>> A << (B << (C << D))
>
> Doh! Of course.
>
> It has to return self.

I think the most common case is going to be something like this:

And the programmatic construction of it will look like:

def n(a)
  XML::Node.new(a)
end

some_block = n('some_block') << (n('a') + n('b') + n('c'))
other_block = n('other_block') << (n('d') + n('e'))
root = n << some_block + other_block
...

And I think the thing that needs to happen is this needs to be
documented somewhere because XML construction is one of those things
that people tend to like to want to do with XML libraries vs. nested
function calls w/ string concatenation.
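
A minimal sketch of the chaining behaviour discussed in this thread. It
assumes a toy ToyNode class and a t() helper, both hypothetical and not
part of libxml-ruby, whose << returns self. Under that assumption the
two-children tree and the path can be built without intermediate
variables:

```ruby
# Toy stand-in for an XML node. This is NOT the libxml-ruby API; it only
# illustrates what happens when << returns the receiver.
class ToyNode
  attr_reader :name, :children

  def initialize(name)
    @name = name
    @children = []
  end

  # Append a child and return self, so a chain keeps adding to the
  # same parent.
  def <<(child)
    @children << child
    self
  end

  # Render the subtree as nested tags, for inspection only.
  def to_s
    return "<#{name}/>" if children.empty?
    "<#{name}>#{children.join}</#{name}>"
  end
end

# Hypothetical shorthand, in the spirit of the n() helper above.
def t(name)
  ToyNode.new(name)
end

# A gets two children, B and D, each with one child.
tree = t('A') << (t('B') << t('C')) << (t('D') << t('E'))

# A path (tree without branches).
path = t('A') << (t('B') << (t('C') << t('D')))

puts tree  # <A><B><C/></B><D><E/></D></A>
puts path  # <A><B><C><D/></C></B></A>
```

If << returned the appended child instead, the first expression would
attach the (D << E) subtree to B rather than to A, which is the objection
raised earlier in the thread.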
-sc

- --
Sean Chittenden
sean at chittenden.org

-----BEGIN PGP SIGNATURE-----

iEYEARECAAYFAkixio8ACgkQTrydwWwuXha9vQCeIYz+gNj6VyPNHEIkNFnhzbUW
hfQAn0DfHUfiCiDcbXrqOMz+ZJu6FdYz
=V3ic
-----END PGP SIGNATURE-----

From noreply at rubyforge.org Sun Aug 24 13:01:39 2008
From: noreply at rubyforge.org (noreply at rubyforge.org)
Date: Sun, 24 Aug 2008 13:01:39 -0400 (EDT)
Subject: [libxml-devel] [ libxml-Bugs-21658 ] failure to parse and obey
    encoding when creating document
Message-ID: <20080824170139.4D36818585A0@rubyforge.org>

Bugs item #21658, was opened at 2008-08-24 13:01
You can respond by visiting:
http://rubyforge.org/tracker/?func=detail&atid=1971&aid=21658&group_id=494

Category: None
Group: None
Status: Open
Resolution: None
Priority: 3
Submitted By: Nobody (None)
Assigned to: Nobody (None)
Summary: failure to parse and obey encoding when creating document

Initial Comment:
The following appeared on comp.lang.ruby:

===== quoted material follows

I have an XML request, using the following code as an example:

require "rubygems"
require "xml/libxml"

movie = "sin+city"
search_url = 'http://www.movie-xml.com/interfaces/getmovie.php?moviename='
url = search_url+movie
doc = XML::Document.file(url)

Here's the response I get:

Input is not proper UTF-8, indicate encoding !

The source XML has an encoding declared as such:

===== end quoted material

Tested and confirmed, plus I tried the same operation with REXML and there
was no problem.

It looks like we are not examining the encoding attribute up front and
obeying it when parsing the body of the doc.

----------------------------------------------------------------------

You can respond by visiting:
http://rubyforge.org/tracker/?func=detail&atid=1971&aid=21658&group_id=494

From noreply at rubyforge.org Sun Aug 24 16:24:25 2008
From: noreply at rubyforge.org (noreply at rubyforge.org)
Date: Sun, 24 Aug 2008 16:24:25 -0400 (EDT)
Subject: [libxml-devel] [ libxml-Bugs-21658 ] failure to parse and obey
    encoding when creating document
Message-ID: <20080824202425.9543E15B801A@rubyforge.org>

Bugs item #21658, was opened at 2008-08-24 10:01
You can respond by visiting:
http://rubyforge.org/tracker/?func=detail&atid=1971&aid=21658&group_id=494

Category: None
Group: None
Status: Open
Resolution: None
Priority: 3
Submitted By: Nobody (None)
Assigned to: Nobody (None)
Summary: failure to parse and obey encoding when creating document

Initial Comment:
The following appeared on comp.lang.ruby:

===== quoted material follows

I have an XML request, using the following code as an example:

require "rubygems"
require "xml/libxml"

movie = "sin+city"
search_url = 'http://www.movie-xml.com/interfaces/getmovie.php?moviename='
url = search_url+movie
doc = XML::Document.file(url)

Here's the response I get:

Input is not proper UTF-8, indicate encoding !

The source XML has an encoding declared as such:

===== end quoted material

Tested and confirmed, plus I tried the same operation with REXML and there
was no problem.

It looks like we are not examining the encoding attribute up front and
obeying it when parsing the body of the doc.

----------------------------------------------------------------------

Comment By: Erik Hollensbe (erikh)
Date: 2008-08-24 13:24

Message:
From this thread on ruby-talk:

http://www.ruby-forum.com/topic/163524

----------------------------------------------------------------------

You can respond by visiting:
http://rubyforge.org/tracker/?func=detail&atid=1971&aid=21658&group_id=494

From noreply at rubyforge.org Sun Aug 24 19:29:57 2008
From: noreply at rubyforge.org (noreply at rubyforge.org)
Date: Sun, 24 Aug 2008 19:29:57 -0400 (EDT)
Subject: [libxml-devel] [ libxml-Bugs-21658 ] failure to parse and obey
    encoding when creating document
Message-ID: <20080824232958.02A9F18581C4@rubyforge.org>

Bugs item #21658, was opened at 2008-08-24 13:01
You can respond by visiting:
http://rubyforge.org/tracker/?func=detail&atid=1971&aid=21658&group_id=494

Category: None
Group: None
Status: Open
Resolution: None
Priority: 3
Submitted By: Nobody (None)
Assigned to: Nobody (None)
Summary: failure to parse and obey encoding when creating document

Initial Comment:
The following appeared on comp.lang.ruby:

===== quoted material follows

I have an XML request, using the following code as an example:

require "rubygems"
require "xml/libxml"

movie = "sin+city"
search_url = 'http://www.movie-xml.com/interfaces/getmovie.php?moviename='
url = search_url+movie
doc = XML::Document.file(url)

Here's the response I get:

Input is not proper UTF-8, indicate encoding !

The source XML has an encoding declared as such:

===== end quoted material

Tested and confirmed, plus I tried the same operation with REXML and there
was no problem.

It looks like we are not examining the encoding attribute up front and
obeying it when parsing the body of the doc.
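
A rough sketch, in plain Ruby (1.9 or later), of the symptom reported
here and of what "examining the encoding attribute up front" could look
like. The sample bytes are made up for illustration, since the real
response is not reproduced in this report, and declared_encoding is a
hypothetical helper. None of this is libxml-ruby code, nor a claim about
how libxml2 itself detects encodings:

```ruby
# Sample document bytes (an assumption for illustration): an ISO-8859-1
# declaration and a body containing the single byte 0xFC, which is
# "u with umlaut" in ISO-8859-1.
raw = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><t>verg\xFCenza</t>".b

# Interpreted as UTF-8, the byte sequence is invalid, which matches the
# "Input is not proper UTF-8" complaint.
as_utf8_valid = raw.dup.force_encoding(Encoding::UTF_8).valid_encoding?

# Hypothetical helper: read the encoding name from the XML declaration,
# falling back to UTF-8 when none is declared.
def declared_encoding(bytes)
  m = bytes.match(/\A<\?xml[^>]*encoding=["']([A-Za-z0-9._-]+)["']/)
  m ? m[1] : 'UTF-8'
end

# Tag the bytes with the declared encoding, then transcode to UTF-8.
enc  = declared_encoding(raw)
utf8 = raw.dup.force_encoding(enc).encode(Encoding::UTF_8)

puts as_utf8_valid        # false
puts enc                  # ISO-8859-1
puts utf8.valid_encoding? # true
```

The point of the sketch is only that the declaration has to be honoured
before the body is decoded; the same bytes are fine once they are read as
ISO-8859-1.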

----------------------------------------------------------------------

Comment By: Eric Ivancich (ivancich)
Date: 2008-08-24 19:29

Message:
Twice in the XML data retrieved from the URL generated in the detailed
description, the word "vergüenza" appears, where the "ü" has hex code 0xFC
that encodes a lower case "u" with umlaut in ISO-8859-1. 0xFC cannot
appear in UTF-8 data due to RFC-3629. So that adds further evidence that
it's trying to parse the file as UTF-8 rather than ISO-8859-1.

----------------------------------------------------------------------

Comment By: Erik Hollensbe (erikh)
Date: 2008-08-24 16:24

Message:
From this thread on ruby-talk:

http://www.ruby-forum.com/topic/163524

----------------------------------------------------------------------

You can respond by visiting:
http://rubyforge.org/tracker/?func=detail&atid=1971&aid=21658&group_id=494

From noreply at rubyforge.org Tue Aug 26 14:41:20 2008
From: noreply at rubyforge.org (noreply at rubyforge.org)
Date: Tue, 26 Aug 2008 14:41:20 -0400 (EDT)
Subject: [libxml-devel] [ libxml-Bugs-21546 ] SaxParser Crashes -- Null
    Pointer
Message-ID: <20080826184120.ECC8F18581A7@rubyforge.org>

Bugs item #21546, was opened at 2008-08-13 03:26
You can respond by visiting:
http://rubyforge.org/tracker/?func=detail&atid=1971&aid=21546&group_id=494

Category: General
Group: None
Status: Open
Resolution: None
Priority: 3
Submitted By: Nick Retallack (nickretallack)
Assigned to: Nobody (None)
Summary: SaxParser Crashes -- Null Pointer

Initial Comment:
Any xml document that contains a DTD causes SaxParser to crash.

Example:

require 'rubygems'
require 'libxml'

class Handler
  def method_missing(method_name, *attributes, &block); end
end

parser = LibXML::XML::SaxParser.new
parser.filename = 'anything_with_a_dtd.xml'
parser.callbacks = Handler.new
puts parser.filename
parser.parse

result:

saxmltest.rb:12:in `parse': NULL pointer given (ArgumentError)

It seems I am not the only person having this problem.
http://rubyforge.org/pipermail/libxml-devel/2008-July/001042.html

----------------------------------------------------------------------

Comment By: Andy Hauser (buggs)
Date: 2008-08-26 20:41

Message:
This bug is whitespace sensitive.
Your test_doctype with the inline document does not start with an XML
declaration, so it does not get triggered.
Remove the whitespace before it and the bug appears.
It seems otherwise it's not even parsing the data.

Here is a complete program showing the bug:

require 'libxml'

class DocTypeCallback
  include LibXML::XML::SaxParser::Callbacks
  def on_start_element(element, attributes)
    puts element
  end
end

def test_doctype
  xp = LibXML::XML::SaxParser.new
  xp.callbacks = DocTypeCallback.new
  xp.string = <<-EOS
a1
  EOS
  doc = xp.parse
end

test_doctype

----------------------------------------------------------------------

Comment By: Charlie Savage (cfis)
Date: 2008-08-20 10:29

Message:
Does the tc_sax_parser#test_doctype test work for you? That's copied
directly from the link you mention, and works fine here. Does it work
for you?

----------------------------------------------------------------------

You can respond by visiting:
http://rubyforge.org/tracker/?func=detail&atid=1971&aid=21546&group_id=494

From mark at thomaszone.com Tue Aug 26 23:01:06 2008
From: mark at thomaszone.com (Mark Thomas)
Date: Tue, 26 Aug 2008 23:01:06 -0400
Subject: [libxml-devel] Subclassing XML::Document?
Message-ID: <48B4C372.50203@thomaszone.com>

I'd like to create a subclass of XML::Document. However, I'm having
trouble figuring out how I would instantiate my subclass because parsing
requires Parser.parse() which returns an XML::Document instead of
My::Subclass. Unfortunately, Ruby doesn't have a way of changing the
class of an object.

Any suggestions?

-- Mark.

From ruby at thomaszone.com Wed Aug 27 09:24:23 2008
From: ruby at thomaszone.com (Mark Thomas)
Date: Wed, 27 Aug 2008 09:24:23 -0400
Subject: [libxml-devel] Subclassing XML::Document?
Message-ID: <48B55587.2080301@thomaszone.com>

I'd like to create a subclass of XML::Document. However, I'm having
trouble figuring out how I would instantiate my subclass because parsing
requires Parser.parse() which returns an XML::Document instead of
My::Subclass. Unfortunately, Ruby doesn't have a way of changing the
class of an object.

Any suggestions?

-- Mark.

From danj at 3skel.com Wed Aug 27 11:44:54 2008
From: danj at 3skel.com (Dan Janowski)
Date: Wed, 27 Aug 2008 11:44:54 -0400
Subject: [libxml-devel] Subclassing XML::Document?
In-Reply-To: <48B4C372.50203@thomaszone.com>
References: <48B4C372.50203@thomaszone.com>
Message-ID: <5D9EC844-1497-4686-9116-5788C0A54A20@3skel.com>

You may be able to produce what you want by defining a module and
extending the Document with it:

ardent:~ danj$ irb
irb(main):001:0> class A
irb(main):002:1> end
=> nil
irb(main):003:0> module B
irb(main):004:1> def hello
irb(main):005:2> puts "hello"
irb(main):006:2> end
irb(main):007:1> end
=> nil
irb(main):008:0> a=A.new
=> #
irb(main):009:0> a.extend B
=> #
irb(main):010:0> a.hello
hello
=> nil
irb(main):011:0>

You can also override #parse by renaming the original function and
perform the extend after calling #parse_original.

Dan

On Aug 26, 2008, at 23:01, Mark Thomas wrote:

> I'd like to create a subclass of XML::Document. However, I'm having
> trouble figuring out how I would instantiate my subclass because
> parsing requires Parser.parse() which returns an XML::Document
> instead of My::Subclass. Unfortunately, Ruby doesn't have a way of
> changing the class of an object.
>
> Any suggestions?
>
> -- Mark.
> _______________________________________________
> libxml-devel mailing list
> libxml-devel at rubyforge.org
> http://rubyforge.org/mailman/listinfo/libxml-devel

From cfis at savagexi.com Wed Aug 27 12:44:30 2008
From: cfis at savagexi.com (Charlie Savage)
Date: Wed, 27 Aug 2008 10:44:30 -0600
Subject: [libxml-devel] Subclassing XML::Document?
In-Reply-To: <48B4C372.50203@thomaszone.com>
References: <48B4C372.50203@thomaszone.com>
Message-ID: <48B5846E.4080608@savagexi.com>

Hey Mark,

> I'd like to create a subclass of XML::Document. However, I'm having
> trouble figuring out how I would instantiate my subclass because parsing
> requires Parser.parse() which returns an XML::Document instead of
> My::Subclass. Unfortunately, Ruby doesn't have a way of changing the
> class of an object.

Hmm, interesting question. One approach is what Dan said.

The way the code works now (simplified):

ruby_xml_parser_parse(VALUE self) {
  ruby_xml_parser *rxp;
  ruby_xml_parser_context *rxpc;
  xmlDocPtr xdp;
  VALUE doc;

  Data_Get_Struct(self, ruby_xml_parser, rxp);
  Data_Get_Struct(rxp->ctxt, ruby_xml_parser_context, rxpc);
  xmlParseDocument(rxpc->ctxt);
  xdp = rxpc->ctxt->myDoc;
  return ruby_xml_document_wrap(xdp);
}

The issue is the last line - it takes the libxml document object and
wraps it by creating a new Ruby document object. Somehow that would
have to change to allow the user to specify what document to create.

The most obvious way of doing this is adding a parameter to the parse
method that specifies what class to create:

XML::Parser.string('foo').parse(MyCustomDocument)

and then also add a parameter to the ruby_xml_document_wrap method.

That would be easy to do, but seems a bit kludgy. But I have no better
ideas...

Charlie
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/x-pkcs7-signature
Size: 3237 bytes
Desc: S/MIME Cryptographic Signature
URL:

From sean at chittenden.org Thu Aug 28 02:53:09 2008
From: sean at chittenden.org (Sean Chittenden)
Date: Wed, 27 Aug 2008 23:53:09 -0700
Subject: [libxml-devel] REXML vuln...
Message-ID: <3FA514BE-2A53-4904-81DD-80E5544B3197@chittenden.org>

Not applicable to libxml, but in case anyone missed it:

http://www.ruby-lang.org/en/news/2008/08/23/dos-vulnerability-in-rexml/

-sc

--
Sean Chittenden
sean at chittenden.org

From noreply at rubyforge.org Thu Aug 28 10:51:06 2008
From: noreply at rubyforge.org (noreply at rubyforge.org)
Date: Thu, 28 Aug 2008 10:51:06 -0400 (EDT)
Subject: [libxml-devel] [ libxml-Bugs-21607 ] XML::Reader#read_state never
    returns MODE_EOF
Message-ID: <20080828145106.9DFDA18581AE@rubyforge.org>

Bugs item #21607, was opened at 2008-08-19 13:24
You can respond by visiting:
http://rubyforge.org/tracker/?func=detail&atid=1971&aid=21607&group_id=494

Category: General
Group: None
Status: Open
Resolution: None
Priority: 3
Submitted By: K Liu (kliuless)
Assigned to: Nobody (None)
Summary: XML::Reader#read_state never returns MODE_EOF

Initial Comment:
Even after the XML::Reader has reached the end of the stream, the
read_state is returning MODE_READING instead of MODE_EOF.

This is on Mac OS Leopard with libxml 2.6.16. I do not see the problem on
sparc/solaris with libxml 2.6.31. Sorry, I don't have time to try
different versions on the two boxes.

----------------------------------------------------------------------

>Comment By: K Liu (kliuless)
Date: 2008-08-28 09:51

Message:
Here's a simple test case:

# === begin script ===
require 'rubygems'
require 'xml/libxml'

# constant maps
state_map = Hash[ * XML::Reader.constants.grep(/^MODE_/).collect{|c| [XML::Reader.const_get(c), c] }.flatten ]
type_map = Hash[ * XML::Reader.constants.grep(/^TYPE_/).collect{|c| [XML::Reader.const_get(c), c] }.flatten ]

rdr = XML::Reader.string('')
puts 'Node name / Node type / Read state'
p [rdr.name, type_map[rdr.node_type], state_map[rdr.read_state]]
2.times {
  rdr.read
  p [rdr.name, type_map[rdr.node_type], state_map[rdr.read_state]]
}
# === end script ===

Output on the Mac:

Node name / Node type / Read state
[nil, "TYPE_NONE", "MODE_INITIAL"]
["xml", "TYPE_ELEMENT", "MODE_READING"]
[nil, "TYPE_NONE", "MODE_READING"]

Output on the Sparc:

Node name / Node type / Read state
[nil, "TYPE_NONE", "MODE_INITIAL"]
["xml", "TYPE_ELEMENT", "MODE_EOF"]
[nil, "TYPE_NONE", "MODE_EOF"]

----------------------------------------------------------------------

Comment By: Charlie Savage (cfis)
Date: 2008-08-20 03:30

Message:
K Liu, do you have a test case we can look at?

----------------------------------------------------------------------

You can respond by visiting:
http://rubyforge.org/tracker/?func=detail&atid=1971&aid=21607&group_id=494
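
The "constant maps" idiom in the test case above (a value-to-name lookup
built from a module's constants) can be tried without libxml installed.
This is a minimal sketch that assumes a stand-in ToyReader module with
made-up numeric values and a hypothetical constant_map helper; the real
XML::Reader constants and their values are not reproduced here:

```ruby
# Stand-in for XML::Reader; the names follow the test case above but the
# numeric values are invented for illustration.
module ToyReader
  MODE_INITIAL = 0
  MODE_READING = 1
  MODE_EOF     = 5
  TYPE_NONE    = 0
  TYPE_ELEMENT = 1
end

# Hypothetical helper: build a value => name lookup for every constant
# whose name starts with the given prefix.
def constant_map(mod, prefix)
  mod.constants
     .select { |c| c.to_s.start_with?(prefix) }
     .map { |c| [mod.const_get(c), c.to_s] }
     .to_h
end

state_map = constant_map(ToyReader, 'MODE_')
type_map  = constant_map(ToyReader, 'TYPE_')

p state_map[ToyReader::MODE_EOF]  # "MODE_EOF"
p type_map[ToyReader::TYPE_NONE]  # "TYPE_NONE"
```

Prefix filtering matters because MODE_ and TYPE_ constants can share
numeric values, as they do in this stand-in, so one combined map would
lose entries.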