From bryant.doug at gmail.com Fri Jun 2 11:40:46 2006 From: bryant.doug at gmail.com (Doug Bryant) Date: Fri, 2 Jun 2006 11:40:46 -0400 Subject: [libxml-devel] error when application finished running In-Reply-To: References: <78cf1ade0605311350m725b0112oa8ac466b9588d58b@mail.gmail.com> Message-ID: <78cf1ade0606020840x6c9c367ax7dc883bac15d304c@mail.gmail.com> I created a testcase of ~50 lines which reproduces the problem. The test case duplicates part of what is going on in my program. The problem is when the program exits, you get the following error. On OS X, you get a different, but similar error. Both errors complain about double free memory or deallocating memory inside a block. *** glibc detected *** ruby: double free or corruption (fasttop): 0x09fef6e0 *** ======= Backtrace: ========= /lib/libc.so.6[0xbacf18] /lib/libc.so.6(__libc_free+0x78)[0xbb03ef] ruby(ruby_xfree+0x26)[0x806d936] /usr/lib/libxml2.so.2(xmlFreeNodeList+0x11d)[0x713de5d] /usr/lib/libxml2.so.2(xmlFreeNodeList+0x96)[0x713ddd6] /usr/lib/libxml2.so.2(xmlFreeNodeList+0x96)[0x713ddd6] /usr/lib/libxml2.so.2(xmlFreeNodeList+0x96)[0x713ddd6] /usr/lib/libxml2.so.2(xmlFreeNodeList+0x96)[0x713ddd6] /usr/lib/libxml2.so.2(xmlFreeDoc+0xcb)[0x713dbfb] /usr/local/lib/ruby/site_ruby/1.8/i686-linux/xml/libxml_so.so(ruby_xml_document_free+0x95)[0xfa57e5] ruby(rb_gc_call_finalizer_at_exit+0x87)[0x806db77] ruby[0x80573e9] ruby(ruby_cleanup+0xcc)[0x806327c] ruby(ruby_stop+0x11)[0x8063351] ruby[0x8068831] ruby[0x80521d4] /lib/libc.so.6(__libc_start_main+0xdc)[0xb5e724] ruby[0x8052111] You need two files (attached) to run the test case. Just drop them anywhere and make sure they are in the same directory. The test case simulates merging several xml files together, but in the test case, the merged files to be merged contain the same data, rather than different. You have to tell the test case how many times to merge. On my linux box, it is about 350. On my OS X box, it is about 150. 
Below those numbers, the error is not present. e.g. ruby merge.rb 350 I really appreciate the help getting this resolved. I played around with gdb and this error yesterday, but did not come up with anything helpful. My C chops are, err.. poor. Doug On 5/31/06, Ross Bamford wrote: > > On Wed, 31 May 2006 21:50:01 +0100, Doug Bryant > wrote: > > > I'm getting the below error when the program I have written using > > libxml-ruby finishes running (when the program terminates). > > > > Can you post your code? > > -- > Ross Bamford - rosco at roscopeco.co.uk > _______________________________________________ > libxml-devel mailing list > libxml-devel at rubyforge.org > http://rubyforge.org/mailman/listinfo/libxml-devel > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://rubyforge.org/pipermail/libxml-devel/attachments/20060602/59aa7147/attachment-0001.htm -------------- next part -------------- A non-text attachment was scrubbed... Name: merge.rb Type: application/x-ruby Size: 1101 bytes Desc: not available Url : http://rubyforge.org/pipermail/libxml-devel/attachments/20060602/59aa7147/attachment-0001.bin -------------- next part -------------- A non-text attachment was scrubbed... Name: data.xml Type: text/xml Size: 1619 bytes Desc: not available Url : http://rubyforge.org/pipermail/libxml-devel/attachments/20060602/59aa7147/attachment-0001.xml From rosco at roscopeco.co.uk Fri Jun 2 12:56:21 2006 From: rosco at roscopeco.co.uk (Ross Bamford) Date: Fri, 02 Jun 2006 17:56:21 +0100 Subject: [libxml-devel] error when application finished running In-Reply-To: <78cf1ade0606020840x6c9c367ax7dc883bac15d304c@mail.gmail.com> References: <78cf1ade0605311350m725b0112oa8ac466b9588d58b@mail.gmail.com> <78cf1ade0606020840x6c9c367ax7dc883bac15d304c@mail.gmail.com> Message-ID: On Fri, 02 Jun 2006 16:40:46 +0100, Doug Bryant wrote: > I created a testcase of ~50 lines which reproduces the problem. 
> The test case duplicates part of what is going on in my program. > Okay, I can replicate this problem and I'll try to work up a fix. If you have the time, I'd appreciate if you'd file a bug report to http://rubyforge.org/tracker/?atid=1971&group_id=494&func=browse 'for the record'. Cheers, -- Ross Bamford - rosco at roscopeco.co.uk From bryant.doug at gmail.com Fri Jun 2 15:31:33 2006 From: bryant.doug at gmail.com (Doug Bryant) Date: Fri, 2 Jun 2006 15:31:33 -0400 Subject: [libxml-devel] error when application finished running In-Reply-To: References: <78cf1ade0605311350m725b0112oa8ac466b9588d58b@mail.gmail.com> <78cf1ade0606020840x6c9c367ax7dc883bac15d304c@mail.gmail.com> Message-ID: <78cf1ade0606021231j32c62253g616e1a73d872bea6@mail.gmail.com> done - http://rubyforge.org/tracker/index.php?func=detail&aid=4635&group_id=494&atid=1971 Thanks Ross. Doug On 6/2/06, Ross Bamford wrote: > > On Fri, 02 Jun 2006 16:40:46 +0100, Doug Bryant > wrote: > > > I created a testcase of ~50 lines which reproduces the problem. > > The test case duplicates part of what is going on in my program. > > > > Okay, I can replicate this problem and I'll try to work up a fix. If you > have the time, I'd appreciate if you'd file a bug report to > http://rubyforge.org/tracker/?atid=1971&group_id=494&func=browse 'for the > record'. > > Cheers, > -- > Ross Bamford - rosco at roscopeco.co.uk > _______________________________________________ > libxml-devel mailing list > libxml-devel at rubyforge.org > http://rubyforge.org/mailman/listinfo/libxml-devel > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://rubyforge.org/pipermail/libxml-devel/attachments/20060602/3663cbb3/attachment.htm

From rosco at roscopeco.co.uk Sat Jun 3 12:48:50 2006
From: rosco at roscopeco.co.uk (Ross Bamford)
Date: Sat, 03 Jun 2006 17:48:50 +0100
Subject: [libxml-devel] error when application finished running
In-Reply-To: <78cf1ade0606021231j32c62253g616e1a73d872bea6@mail.gmail.com>
References: <78cf1ade0605311350m725b0112oa8ac466b9588d58b@mail.gmail.com> <78cf1ade0606020840x6c9c367ax7dc883bac15d304c@mail.gmail.com> <78cf1ade0606021231j32c62253g616e1a73d872bea6@mail.gmail.com>
Message-ID:

On Fri, 02 Jun 2006 20:31:33 +0100, Doug Bryant wrote:

> done -
> http://rubyforge.org/tracker/index.php?func=detail&aid=4635&group_id=494&atid=1971
>

Thanks for that. I've been working on this today, and I think it's down to the way we handle copied nodes, especially between documents. The crash occurs when the document is freed and tries to free its nodes, since some have already been freed (with their originating document). This whole area needs some work, but for now I'd appreciate it if you could try the attached patch against CVS head, which will (hopefully) deal with this problem case.

Cheers,
--
Ross Bamford - rosco at roscopeco.co.uk

-------------- next part --------------
A non-text attachment was scrubbed... 
Name: inter_document_node_crash.patch Type: text/x-patch Size: 2844 bytes Desc: not available Url : http://rubyforge.org/pipermail/libxml-devel/attachments/20060603/c2e6a8b4/attachment.bin From bryant.doug at gmail.com Sun Jun 4 10:19:40 2006 From: bryant.doug at gmail.com (Doug Bryant) Date: Sun, 4 Jun 2006 10:19:40 -0400 Subject: [libxml-devel] error when application finished running In-Reply-To: References: <78cf1ade0605311350m725b0112oa8ac466b9588d58b@mail.gmail.com> <78cf1ade0606020840x6c9c367ax7dc883bac15d304c@mail.gmail.com> <78cf1ade0606021231j32c62253g616e1a73d872bea6@mail.gmail.com> Message-ID: <78cf1ade0606040719i1a81f01elf788224378b905c7@mail.gmail.com> Thanks for getting to this so quickly. The patch prevents the crash on OS X and linux, but causes another, far worse problem. With the patch applied to the head of cvs, the testcase seems to *not* copy some number of nodes. I ran this test with and without the patch. 1) run the test, 2) format the output for consistency, 3) grep "
" and do a word count on the lines found:

ruby merge.rb 1000
xmllint --format generated_form.xml > output.xml
cat output.xml | grep "
" | wc -l With the unpatched version, the result of grep & wc -l was always equal to the number of times you told the merge.rb program to run. With the patched version, after a certain threshold, the output started to differ from the number of times you told the testcase to merge. So, on the linux box, for 1000 times with the patch applied, the result would be 18. Without the patch, it would be 1000. On the linux box, if you gave it a lower number, like 100, the results of applied vs. unapplied patch would be the same - 100. On the OS X box, with a target number of 100, the unapplied patch would be 100, and the applied patch result would be 52. Thanks for helping to get this problem resolved. Doug On 6/3/06, Ross Bamford wrote: > On Fri, 02 Jun 2006 20:31:33 +0100, Doug Bryant > wrote: > > > done - > > http://rubyforge.org/tracker/index.php?func=detail&aid=4635&group_id=494&atid=1971 > > > > Thanks for that. I've been working on this today, and I think it's down to > the way we handle copied nodes, especially between documents. The crash > occurs when the document is freed and tries to free it's nodes, since some > have already been freed (with their originating document). This whole area > needs some work but for now I'd appreciate if you could try the attached > patch against CVS head which will (hopefully) deal with this problem case. 
> > Cheers, > -- > Ross Bamford - rosco at roscopeco.co.uk > > _______________________________________________ > libxml-devel mailing list > libxml-devel at rubyforge.org > http://rubyforge.org/mailman/listinfo/libxml-devel > > > From rosco at roscopeco.co.uk Sun Jun 4 10:48:15 2006 From: rosco at roscopeco.co.uk (Ross Bamford) Date: Sun, 04 Jun 2006 15:48:15 +0100 Subject: [libxml-devel] error when application finished running In-Reply-To: <78cf1ade0606040719i1a81f01elf788224378b905c7@mail.gmail.com> References: <78cf1ade0605311350m725b0112oa8ac466b9588d58b@mail.gmail.com> <78cf1ade0606020840x6c9c367ax7dc883bac15d304c@mail.gmail.com> <78cf1ade0606021231j32c62253g616e1a73d872bea6@mail.gmail.com> <78cf1ade0606040719i1a81f01elf788224378b905c7@mail.gmail.com> Message-ID: On Sun, 04 Jun 2006 15:19:40 +0100, Doug Bryant wrote: > Thanks for getting to this so quickly. The patch prevents the crash > on OS X and linux, but causes another, far worse problem. > Ahh well, it was worth a try :) I'll give this issue a proper look this week and try to tackle the underlying problem (those pointer nodes). Thanks for trying the patch, though. I'll post further when I have something. Cheers, -- Ross Bamford - rosco at roscopeco.co.uk From bryant.doug at gmail.com Sun Jun 4 19:46:12 2006 From: bryant.doug at gmail.com (Doug Bryant) Date: Sun, 4 Jun 2006 19:46:12 -0400 Subject: [libxml-devel] error when application finished running In-Reply-To: References: <78cf1ade0605311350m725b0112oa8ac466b9588d58b@mail.gmail.com> <78cf1ade0606020840x6c9c367ax7dc883bac15d304c@mail.gmail.com> <78cf1ade0606021231j32c62253g616e1a73d872bea6@mail.gmail.com> <78cf1ade0606040719i1a81f01elf788224378b905c7@mail.gmail.com> Message-ID: <78cf1ade0606041646k1e546f1es4144249d19799436@mail.gmail.com> ... and thanks for trying! What are the plans for upcoming memory management in libxml-ruby? I have seen some stuff floating around on the net about the current implementation trying to bypass ruby's gc. 
Are there plans to use Ruby's mark & sweep API? Is this related to the bug in question?

Thanks,
Doug

On 6/4/06, Ross Bamford wrote:
> On Sun, 04 Jun 2006 15:19:40 +0100, Doug Bryant
> wrote:
>
> > Thanks for getting to this so quickly. The patch prevents the crash
> > on OS X and linux, but causes another, far worse problem.
> >
>
> Ahh well, it was worth a try :) I'll give this issue a proper look this
> week and try to tackle the underlying problem (those pointer nodes).
> Thanks for trying the patch, though. I'll post further when I have
> something.
>
> Cheers,
> --
> Ross Bamford - rosco at roscopeco.co.uk
> _______________________________________________
> libxml-devel mailing list
> libxml-devel at rubyforge.org
> http://rubyforge.org/mailman/listinfo/libxml-devel
>

From rosco at roscopeco.co.uk Tue Jun 6 04:39:48 2006
From: rosco at roscopeco.co.uk (Ross Bamford)
Date: Tue, 06 Jun 2006 09:39:48 +0100
Subject: [libxml-devel] error when application finished running
In-Reply-To: <78cf1ade0606041646k1e546f1es4144249d19799436@mail.gmail.com>
References: <78cf1ade0605311350m725b0112oa8ac466b9588d58b@mail.gmail.com> <78cf1ade0606020840x6c9c367ax7dc883bac15d304c@mail.gmail.com> <78cf1ade0606021231j32c62253g616e1a73d872bea6@mail.gmail.com> <78cf1ade0606040719i1a81f01elf788224378b905c7@mail.gmail.com> <78cf1ade0606041646k1e546f1es4144249d19799436@mail.gmail.com>
Message-ID:

On Mon, 05 Jun 2006 00:46:12 +0100, Doug Bryant wrote:

> ... and thanks for trying!
>
> What are the plans for upcoming memory management in libxml-ruby? I
> have seen some stuff floating around on the net about the current
> implementation trying to bypass ruby's gc. Are there plans to use
> rubys mark & sweep api? Is this related to the bug in question?
>

It may be somewhat related, although there's an aspect I'm missing with libxml2, since it's the xmlFreeDoc that actually causes the fault when a node exists in two documents. 
I'm looking more into how xmlDocCopyNode and so on works, and I might need to ask on the libxml2 list yet, but I'm working on it :) The current implementation does try to avoid the GC where possible, and there was some discussion a while back with Sean about this. He made the decision based (I think) on Ruby's performance at the time but with hindsight he thought it might be better to change the way memory is handled, and stick with the GC where possible. I'm planning a gradual migration as stuff needs to be worked on, and I'm hoping to get started with the basic XML::Node and XML::Document stuff pretty soon (possibly in a cvs branch). -- Ross Bamford - rosco at roscopeco.co.uk From bryant.doug at gmail.com Tue Jun 6 16:52:37 2006 From: bryant.doug at gmail.com (Doug Bryant) Date: Tue, 6 Jun 2006 16:52:37 -0400 Subject: [libxml-devel] error when application finished running In-Reply-To: References: <78cf1ade0605311350m725b0112oa8ac466b9588d58b@mail.gmail.com> <78cf1ade0606020840x6c9c367ax7dc883bac15d304c@mail.gmail.com> <78cf1ade0606021231j32c62253g616e1a73d872bea6@mail.gmail.com> <78cf1ade0606040719i1a81f01elf788224378b905c7@mail.gmail.com> <78cf1ade0606041646k1e546f1es4144249d19799436@mail.gmail.com> Message-ID: <78cf1ade0606061352t2fad7afeu91978d115d8f3e42@mail.gmail.com> Ross, I ran across xml-smarty today. It is also based on libxml and the author seems to have solved the cross document node copying issue. xml-smarty's license is lgpl rather than the MIT license, but I believe it is legal to look and see how things are done as long as you don't directly copy the code. Then again, I am no lawyer :) http://raa.ruby-lang.org/project/ruby-xml-smart/ I hope this is of some help. Doug On 6/6/06, Ross Bamford wrote: > > On Mon, 05 Jun 2006 00:46:12 +0100, Doug Bryant > wrote: > > > ... and thanks for trying! > > > > What are the plans for upcoming memory management in libxml-ruby? 
I > > have seen some stuff floating around on the net about the current > > implementation trying to bypass ruby's gc. Are there plans to use > > rubys mark & sweep api? Is this related to the bug in question? > > > > It may be somewhat related, although there's an aspect I'm missing with > libxml2, since it's the xmlFreeDoc that actually causes the fault when a > node exists in two documents. I'm looking more into how xmlDocCopyNode and > so on works, and I might need to ask on the libxml2 list yet, but I'm > working on it :) > > The current implementation does try to avoid the GC where possible, and > there was some discussion a while back with Sean about this. He made the > decision based (I think) on Ruby's performance at the time but with > hindsight he thought it might be better to change the way memory is > handled, and stick with the GC where possible. I'm planning a gradual > migration as stuff needs to be worked on, and I'm hoping to get started > with the basic XML::Node and XML::Document stuff pretty soon (possibly in > a cvs branch). > > -- > Ross Bamford - rosco at roscopeco.co.uk > _______________________________________________ > libxml-devel mailing list > libxml-devel at rubyforge.org > http://rubyforge.org/mailman/listinfo/libxml-devel > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://rubyforge.org/pipermail/libxml-devel/attachments/20060606/3ca5aa26/attachment-0001.htm From danj at 3skel.com Fri Jun 9 12:36:16 2006 From: danj at 3skel.com (Dan Janowski) Date: Fri, 9 Jun 2006 12:36:16 -0400 Subject: [libxml-devel] Patch/Question about xmlAttr Message-ID: Hi, I have just started using libxml2 for ruby and have come to a problem. Attribute traversal is broken. Only the first attr is accessible from .properties. 
Here is an illustration:

require 'rubygems'
require 'libxml'
p=XML::Parser.string(%q{})
d=p.parse
irb(main):007:0> d.root.properties
=> name = actid
irb(main):008:0> d.root.properties.next
=> nil

This should yield the "type" attribute. I have modified ruby_xml_attr.c and ruby_xml_node.c; with those changes the same run yields this:

require 'libxml_so'
p=XML::Parser.string(%q{})
d=p.parse
irb(main):004:0> d.root.properties
=> #
irb(main):005:0> d.root.properties.name
=> "name"
irb(main):006:0> d.root.properties.next.name
=> "type"
irb(main):007:0> d.root.properties.next
=> #
irb(main):009:0> d.root.properties.next.name
=> "type"

The problem seems to be the use of:

rxa->attr = xmlCopyProp(attr->parent, attr);

xmlCopyProp seems to disconnect attr->next in the copy. In addition, I do not know why it would need to copy instead of just using the xmlAttrPtr from the model. Unless there is some problem with this, here is a patch for it.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: libxml-ruby-0.3.8.1.patch
Type: application/octet-stream
Size: 1953 bytes
Desc: not available
Url : http://rubyforge.org/pipermail/libxml-devel/attachments/20060609/c3ed9518/attachment.obj
-------------- next part --------------

From danj at 3skel.com Fri Jun 9 12:59:51 2006
From: danj at 3skel.com (Dan Janowski)
Date: Fri, 9 Jun 2006 12:59:51 -0400
Subject: [libxml-devel] Patch/Question about xmlAttr
In-Reply-To:
References:
Message-ID: <0C8A01C6-E156-4984-9AB7-50FF4360FFD9@3skel.com>

Updated patch to include the prototype.

-------------- next part --------------
A non-text attachment was scrubbed... 
Name: libxml-ruby-0.3.8.1.patch Type: application/octet-stream Size: 2470 bytes Desc: not available Url : http://rubyforge.org/pipermail/libxml-devel/attachments/20060609/c79da0b7/attachment.obj From rosco at roscopeco.co.uk Fri Jun 9 13:54:24 2006 From: rosco at roscopeco.co.uk (Ross Bamford) Date: Fri, 09 Jun 2006 18:54:24 +0100 Subject: [libxml-devel] error when application finished running In-Reply-To: <78cf1ade0606061352t2fad7afeu91978d115d8f3e42@mail.gmail.com> References: <78cf1ade0605311350m725b0112oa8ac466b9588d58b@mail.gmail.com> <78cf1ade0606020840x6c9c367ax7dc883bac15d304c@mail.gmail.com> <78cf1ade0606021231j32c62253g616e1a73d872bea6@mail.gmail.com> <78cf1ade0606040719i1a81f01elf788224378b905c7@mail.gmail.com> <78cf1ade0606041646k1e546f1es4144249d19799436@mail.gmail.com> <78cf1ade0606061352t2fad7afeu91978d115d8f3e42@mail.gmail.com> Message-ID: On Tue, 06 Jun 2006 21:52:37 +0100, Doug Bryant wrote: > Ross, > > I ran across xml-smarty today. It is also based on libxml and the author > seems to have solved the cross document node copying issue. > > xml-smarty's license is lgpl rather than the MIT license, but I believe > it > is legal to look and see how things are done as long as you don't > directly > copy the code. Then again, I am no lawyer :) > > http://raa.ruby-lang.org/project/ruby-xml-smart/ > > I hope this is of some help. 
> Yeah, I've looked into XML-Smart on more than one occasion to see how things are done ;) On this occasion, though, there's not a great deal of difference in the way we're doing things - I think this problem is more rooted in the mixture of data pointers and pointers to data pointers floating around inside the Ruby XML::Node instances, which comes from, well, pretty much everywhere :) I wrote up a non-ruby test for the xmlDocCopyNode idea I previously suggested, and found that without copying the node first I saw the same problem as with the extension, but by copying first the problem went away, and valgrind gave a clean leak-check. XML-Smart makes do with xmlCopyNode to handle this, but still I feel sure we need to look deeper at the way we're handling those pointer nodes. -- Ross Bamford - rosco at roscopeco.co.uk From rosco at roscopeco.co.uk Fri Jun 9 13:54:32 2006 From: rosco at roscopeco.co.uk (Ross Bamford) Date: Fri, 09 Jun 2006 18:54:32 +0100 Subject: [libxml-devel] Patch/Question about xmlAttr In-Reply-To: References: Message-ID: On Fri, 09 Jun 2006 17:36:16 +0100, Dan Janowski wrote: > Hi, > > I have just started using libxml2 for ruby and have come to a > problem. Attribute traversal is broken. Only the first attr is > accessible from .properties. > This one is on my list to look at, so thanks for the patch :) This issue is actually connected to the memory management problems I'm working on at the moment, and I'm going to be making a few changes to the way we share libxml2 data between ruby instances, since this is currently causing problems, especially when multiple documents are involved. What I'm aiming for is to let ruby's GC handle as much of the management as possible, which will probably mean copying things around a bit in some cases (e.g. copying between documents), but I think in this case we could use xmlCopyPropList instead of xmlCopyProp to take care of that. 
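To make the difference between the two copy strategies concrete, here is a toy Ruby model of a singly linked attribute list. It is only a sketch and not libxml2 or libxml-ruby code: `Attr`, `copy_prop` and `copy_prop_list` are hypothetical stand-ins, and the value of the "type" attribute is made up. It mimics what the report describes, namely that copying one attribute gives a copy with no successor, while copying the whole list keeps the chain intact.

```ruby
# Toy model only: Attr, copy_prop and copy_prop_list are hypothetical names,
# not part of libxml2 or libxml-ruby.
Attr = Struct.new(:name, :value, :next_attr)

# Copy a single attribute; the copy is detached from the rest of the list.
def copy_prop(attr)
  Attr.new(attr.name, attr.value, nil)
end

# Copy the attribute and everything after it, preserving the links.
def copy_prop_list(attr)
  head = nil
  tail = nil
  while attr
    node = copy_prop(attr)
    if tail
      tail.next_attr = node
    else
      head = node
    end
    tail = node
    attr = attr.next_attr
  end
  head
end

# Two attributes, "name" followed by "type", as in the report.
# The value "example" is invented for this sketch.
type_attr = Attr.new("type", "example", nil)
name_attr = Attr.new("name", "actid", type_attr)

single = copy_prop(name_attr)
whole  = copy_prop_list(name_attr)

puts single.next_attr.inspect  # the single copy has lost its successor
puts whole.next_attr.name      # the list copy can still be traversed
```

In this model the trade-off is the one mentioned above: the list copy costs one allocation per attribute, but traversal keeps working on the copy.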
Anyway, I'll integrate this fix into my changes and will hopefully be committing very soon. I've taken the liberty of filing a bug report for this in the meantime. Thanks, -- Ross Bamford - rosco at roscopeco.co.uk From danj at 3skel.com Fri Jun 9 16:07:47 2006 From: danj at 3skel.com (Dan Janowski) Date: Fri, 9 Jun 2006 16:07:47 -0400 Subject: [libxml-devel] Patch/Question about xmlAttr In-Reply-To: References: Message-ID: <8F7083EA-E495-48ED-A3D0-F8CDBDC50081@3skel.com> Hi, Has there been a discussion about the memory-model changes? If so, I am interested to understand the direction better. Is there a particular list thread that I might read? In my past ruby extensions I have avoided giving the GC any more than necessary, to limit memory copying as well as performance. Is the libXML memory model unstable to keep pointer references to? Dan On Jun 9, 2006, at 13:54, Ross Bamford wrote: > On Fri, 09 Jun 2006 17:36:16 +0100, Dan Janowski > wrote: > >> Hi, >> >> I have just started using libxml2 for ruby and have come to a >> problem. Attribute traversal is broken. Only the first attr is >> accessible from .properties. >> > > This one is on my list to look at, so thanks for the patch :) This > issue > is actually connected to the memory management problems I'm working > on at > the moment, and I'm going to be making a few changes to the way we > share > libxml2 data between ruby instances, since this is currently causing > problems, especially when multiple documents are involved. > > What I'm aiming for is to let ruby's GC handle as much of the > management > as possible, which will probably mean copying things around a bit > in some > cases (e.g. copying between documents), but I think in this case we > could > use xmlCopyPropList instead of xmlCopyProp to take care of that. > > Anyway, I'll integrate this fix into my changes and will hopefully be > committing very soon. I've taken the liberty of filing a bug report > for > this in the meantime. 
>
> Thanks,
> --
> Ross Bamford - rosco at roscopeco.co.uk
> _______________________________________________
> libxml-devel mailing list
> libxml-devel at rubyforge.org
> http://rubyforge.org/mailman/listinfo/libxml-devel

From rosco at roscopeco.co.uk Sat Jun 10 03:09:53 2006
From: rosco at roscopeco.co.uk (Ross Bamford)
Date: Sat, 10 Jun 2006 08:09:53 +0100
Subject: [libxml-devel] Patch/Question about xmlAttr
In-Reply-To: <8F7083EA-E495-48ED-A3D0-F8CDBDC50081@3skel.com>
References: <8F7083EA-E495-48ED-A3D0-F8CDBDC50081@3skel.com>
Message-ID:

On Fri, 09 Jun 2006 21:07:47 +0100, Dan Janowski wrote:

> Hi,
>
> Has there been a discussion about the memory-model changes? If so, I
> am interested to understand the direction better. Is there a
> particular list thread that I might read?
>

Sean and I have discussed it a little bit, though it must have been offlist since I can't find it in the archive :I . There are a few threads about problems / bugs caused by memory handling, e.g. the segfault bug Doug Bryant reported recently:

http://rubyforge.org/pipermail/libxml-devel/2006-June/000162.html

> In my past ruby extensions I have avoided giving the GC any more than
> necessary, to limit memory copying as well as performance. Is the
> libXML memory model unstable to keep pointer references to?
>

Not especially, there are just a few special cases that we don't currently cater for, such as when copying nodes between documents. If a node pointer is added to a second document, and then xmlFreeDoc is called on the first, that pointer becomes invalid and will cause a segfault either when the document is accessed, or when xmlFreeDoc is called on it.

So in this case, I think we do need to do a recursive copy of the nodes, so that they can be garbage collected with the new document. I'm still learning the intricacies of libxml2 but this is how it looks to me right now.

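The ownership problem described above can be sketched with a toy Ruby model. This is not libxml2 or libxml-ruby code: `Node`, `Doc` and `DoubleFree` are hypothetical, and "freeing" is only simulated with a flag. It shows a node shared by two documents being released twice, and a recursive copy avoiding that.

```ruby
# Toy model only: Node, Doc and DoubleFree are hypothetical, not libxml2 API.
class DoubleFree < StandardError; end

class Node
  attr_reader :name, :children, :freed

  def initialize(name, children = [])
    @name = name
    @children = children
    @freed = false
  end

  # Recursive copy, so the result shares nothing with the original.
  def deep_copy
    Node.new(@name, @children.map(&:deep_copy))
  end

  # Release this node and its subtree; releasing twice is an error.
  def free
    raise DoubleFree, "node #{@name} already freed" if @freed
    @children.each(&:free)
    @freed = true
  end
end

class Doc
  attr_reader :root

  def initialize(root)
    @root = root
  end

  # Freeing a document frees every node reachable from its root.
  def free
    @root.free
  end
end

# Sharing: the same item object hangs under both documents.
item   = Node.new("item")
source = Doc.new(Node.new("source", [item]))
merged = Doc.new(Node.new("merged", [item]))
source.free
begin
  merged.free
  shared_result = :ok
rescue DoubleFree
  shared_result = :double_free
end

# Copying: the second document owns its own copy of the subtree.
item2   = Node.new("item")
source2 = Doc.new(Node.new("source", [item2]))
merged2 = Doc.new(Node.new("merged", [item2.deep_copy]))
source2.free
merged2.free
copied_result = :ok

puts shared_result  # the shared node is released twice
puts copied_result  # each document releases only what it owns
```

The cost of the copying variant is the extra allocation per node, which is the trade-off raised in the question about limiting memory copying.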
I'm not planning to copy indiscriminately, however - pointers will probably be retained to nodes copied within the same document, and with attributes wherever possible. Mainly I'm looking at rationalizing our memory handling, cleaning up and making sure we're using the right ruby_xml_whatever_new[123 etc.] function in the right place.

--
Ross Bamford - rosco at roscopeco.co.uk

From aitorgr at gmail.com Mon Jun 12 13:30:10 2006
From: aitorgr at gmail.com (Aitor Garay-Romero)
Date: Mon, 12 Jun 2006 19:30:10 +0200
Subject: [libxml-devel] parsing HTML
Message-ID: <11a866cb0606121030teb132ccu56bfea9f28cbf6ea@mail.gmail.com>

Hi there!,

I would like to use the libxml-ruby library to parse HTML obtained directly from the net. Usually the HTML is not correct, so libxml-ruby complains with a long list of errors. If I transform the HTML to correct XML using the tidy program, I still get some errors like:

---------------------------------------------------- 8< -----------------------------------------------------------------
/tmp/temp.xml26359.0:1: parser error : Space required after the Public Identifier
^
/tmp/temp.xml26359.0:1: parser error : SystemLiteral " or ' expected
^
/tmp/temp.xml26359.0:1: parser error : SYSTEM or PUBLIC, the URI is missing
^
/tmp/temp.xml26359.0:55: parser error : Entity 'nbsp' not defined
^
/tmp/temp.xml26359.0:57: parser error : Entity 'nbsp' not defined
&n
^
/tmp/temp.xml26359.0:57: parser error : Entity 'nbsp' not defined
^
---------------------------------------------------- 8< -----------------------------------------------------------------

How can I solve those errors without having to modify the XML?

Anyway, is there a way to parse non-correct HTML with libxml directly? It seems that gnome-libxml2 supports that.

Thanks!,
/AITOR

-------------- next part --------------
An HTML attachment was scrubbed... 
URL: http://rubyforge.org/pipermail/libxml-devel/attachments/20060612/bb860cc6/attachment-0001.htm From danj at 3skel.com Mon Jun 12 15:08:46 2006 From: danj at 3skel.com (Dan Janowski) Date: Mon, 12 Jun 2006 15:08:46 -0400 Subject: [libxml-devel] ruby_xml_node.c patch (includes ruby_xml_attr.c patches) Message-ID: Another attribute bug. calling .properties when node does not include attributes returns a new cXMLAttr when it should return nil. Any calls on the cXMLAttr results in a SEGV since there is no underlying attr structure. This includes prior attr patch. Dan -------------- next part -------------- A non-text attachment was scrubbed... Name: libxml-ruby-0.3.8.2.patch Type: application/octet-stream Size: 2980 bytes Desc: not available Url : http://rubyforge.org/pipermail/libxml-devel/attachments/20060612/45406eb4/attachment.obj From rosco at roscopeco.co.uk Tue Jun 13 06:20:51 2006 From: rosco at roscopeco.co.uk (Ross Bamford) Date: Tue, 13 Jun 2006 11:20:51 +0100 Subject: [libxml-devel] parsing HTML In-Reply-To: <11a866cb0606121030teb132ccu56bfea9f28cbf6ea@mail.gmail.com> References: <11a866cb0606121030teb132ccu56bfea9f28cbf6ea@mail.gmail.com> Message-ID: On Mon, 12 Jun 2006 18:30:10 +0100, Aitor Garay-Romero wrote: > Anyway, is there a way to parse non-correct HTML with libxml directly? > Seems that gnome-libxml2 supports that. > I'm not sure, but I'll look into it. In the meantime, you might want to look at SAX parsing, or one of the ruby HTML libraries (RubyfulSoup springs to mind). -- Ross Bamford - rosco at roscopeco.co.uk From aitorgr at gmail.com Tue Jun 13 11:16:41 2006 From: aitorgr at gmail.com (Aitor Garay-Romero) Date: Tue, 13 Jun 2006 17:16:41 +0200 Subject: [libxml-devel] parsing HTML In-Reply-To: References: <11a866cb0606121030teb132ccu56bfea9f28cbf6ea@mail.gmail.com> Message-ID: <11a866cb0606130816v1a305312v59f235a8b8b076ae@mail.gmail.com> >I'm not sure, but I'll look into it. 
In the meantime, you might want to
> look at SAX parsing, or one of the ruby HTML libraries (RubyfulSoup
> springs to mind).

I need DOM parsing. I use a lot of XPath, and translating this into SAX is going to be a nightmare. Right now I use tidy + REXML and I'm happy with that, but I definitely need more performance.

Thanks!
/AITOR

On 6/13/06, Ross Bamford wrote:
>
> On Mon, 12 Jun 2006 18:30:10 +0100, Aitor Garay-Romero
> wrote:
>
> > Anyway, is there a way to parse non-correct HTML with libxml directly?
> > Seems that gnome-libxml2 supports that.
> >
>
> I'm not sure, but I'll look into it. In the meantime, you might want to
> look at SAX parsing, or one of the ruby HTML libraries (RubyfulSoup
> springs to mind).
>
> --
> Ross Bamford - rosco at roscopeco.co.uk
> _______________________________________________
> libxml-devel mailing list
> libxml-devel at rubyforge.org
> http://rubyforge.org/mailman/listinfo/libxml-devel
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://rubyforge.org/pipermail/libxml-devel/attachments/20060613/a91196ef/attachment.htm

From aitorgr at gmail.com Thu Jun 15 14:54:28 2006
From: aitorgr at gmail.com (Aitor Garay-Romero)
Date: Thu, 15 Jun 2006 20:54:28 +0200
Subject: [libxml-devel] minor typo in README file example
Message-ID: <11a866cb0606151154w29eeb17bj9a0049af6288c160@mail.gmail.com>

Hi there!,

In the README file the "reading xml" example is not correct. Somewhere it says:

puts "Node path: #{node.path} \t Contents: #{node}"

But it should say:

puts "Node path: #{node.path} \t Contents: #{node.contents}"

Otherwise the output will not match the one shown on the page.

/AITOR

-------------- next part --------------
An HTML attachment was scrubbed... 
URL: http://rubyforge.org/pipermail/libxml-devel/attachments/20060615/4dee7151/attachment.htm

From pedrocr at gmail.com Tue Jun 27 09:14:05 2006
From: pedrocr at gmail.com (Pedro Côrte-Real)
Date: Tue, 27 Jun 2006 14:14:05 +0100
Subject: [libxml-devel] Segfault when doing simple parsing
Message-ID: <62e8012c0606270614n4d6e7b1ao3c3cdac35a4b9a90@mail.gmail.com>

I'm trying to use libxml to run some XPath on a document I have as a string in memory. This is my code for parsing the document:

parser = XML::Parser.new
parser.string = self.content
@xmldoc = parser.parse

Doing the same thing in a test script seems to work without segfaulting. This is run inside a rails model and segfaults when testing it. Here's the backtrace:

0xb72b37c4 in ruby_xml_parser_parse (self=144875496) at ruby_xml_parser.c:1124
1124 xmlFreeDoc(rxpc->ctxt->myDoc);
#0 0xb72b37c4 in ruby_xml_parser_parse (self=144875496) at ruby_xml_parser.c:1124
#1 0xb7ef050b in rb_iterator_p () from /usr/lib/libruby1.8.so.1.8
#2 0xb7efaf8c in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
(...)

Most of the rest seems pretty irrelevant. Just more calls inside ruby with rb_thread_trap_eval. The full backtrace is attached just in case.

My ruby is:

ruby 1.8.4 (2005-12-24) [i486-linux]

I'm running it on Ubuntu Dapper with kernel 2.6.15-25-386

Pedro.

-------------- next part --------------
#0 0xb72b37c4 in ruby_xml_parser_parse (self=144875496) at ruby_xml_parser.c:1124
#1 0xb7ef050b in rb_iterator_p () from /usr/lib/libruby1.8.so.1.8
#2 0xb7efaf8c in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#3 0xb7efba1f in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#4 0xb7ef86ab in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#5 0xb7ef9946 in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#6 0xb7ef7913 in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#7 0xb7efb6a6 in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#8 0xb7efba1f in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#9 0xb7ef8533 in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#10 0xb7ef85c6 in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#11 0xb7ef98ac in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#12 0xb7efb6a6 in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#13 0xb7efba1f in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#14 0xb7ef8578 in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#15 0xb7ef85c6 in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#16 0xb7ef85c6 in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#17 0xb7efb6a6 in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#18 0xb7f030f6 in rb_apply () from /usr/lib/libruby1.8.so.1.8
#19 0xb7ef0523 in rb_iterator_p () from /usr/lib/libruby1.8.so.1.8
#20 0xb7efaf8c in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#21 0xb7efba1f in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#22 0xb7ef86ab in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#23 0xb7efb6a6 in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#24 0xb7efba1f in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#25 0xb7ef8533 in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#26 0xb7ef991a in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#27 0xb7ef9fc7 in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#28 0xb7efdfe0 in rb_need_block () from /usr/lib/libruby1.8.so.1.8
#29 0xb7efe9d2 in rb_need_block () from /usr/lib/libruby1.8.so.1.8
#30 0xb7efb0af in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#31 0xb7efba1f in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#32 0xb7efc107 in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#33 0xb7ef0523 in rb_iterator_p () from /usr/lib/libruby1.8.so.1.8
#34 0xb7efaf8c in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#35 0xb7efba1f in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#36 0xb7ef86ab in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#37 0xb7ef8633 in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#38 0xb7efdfe0 in rb_need_block () from /usr/lib/libruby1.8.so.1.8
#39 0xb7eff2de in rb_yield () from /usr/lib/libruby1.8.so.1.8
#40 0xb7edea1a in rb_ary_each () from /usr/lib/libruby1.8.so.1.8
#41 0xb7ef050b in rb_iterator_p () from /usr/lib/libruby1.8.so.1.8
#42 0xb7efaf8c in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#43 0xb7efba1f in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#44 0xb7ef86ab in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#45 0xb7efa997 in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#46 0xb7efb6a6 in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#47 0xb7efba1f in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#48 0xb7ef86ab in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#49 0xb7ef8633 in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#50 0xb7efdfe0 in rb_need_block () from /usr/lib/libruby1.8.so.1.8
#51 0xb7eff2de in rb_yield () from /usr/lib/libruby1.8.so.1.8
#52 0xb7edea1a in rb_ary_each () from /usr/lib/libruby1.8.so.1.8
#53 0xb7ef050b in rb_iterator_p () from /usr/lib/libruby1.8.so.1.8
#54 0xb7efaf8c in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#55 0xb7efba1f in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#56 0xb7ef86ab in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#57 0xb7efa997 in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#58 0xb7efb6a6 in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#59 0xb7efba1f in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#60 0xb7ef86ab in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#61 0xb7efb6a6 in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#62 0xb7efba1f in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#63 0xb7efc107 in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#64 0xb7ef0523 in rb_iterator_p () from /usr/lib/libruby1.8.so.1.8
#65 0xb7efaf8c in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#66 0xb7efba1f in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#67 0xb7ef8533 in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#68 0xb7ef9fc7 in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#69 0xb7efa446 in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#70 0xb7efb6a6 in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#71 0xb7efba1f in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#72 0xb7ef86ab in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#73 0xb7f034e5 in rb_apply () from /usr/lib/libruby1.8.so.1.8
#74 0xb7ef9418 in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#75 0xb7efdfe0 in rb_need_block () from /usr/lib/libruby1.8.so.1.8
#76 0xb7eff2de in rb_yield () from /usr/lib/libruby1.8.so.1.8
#77 0xb7edea1a in rb_ary_each () from /usr/lib/libruby1.8.so.1.8
#78 0xb7ef050b in rb_iterator_p () from /usr/lib/libruby1.8.so.1.8
#79 0xb7efaf8c in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#80 0xb7efba1f in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#81 0xb7ef86ab in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#82 0xb7efa997 in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#83 0xb7efb6a6 in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#84 0xb7efba1f in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#85 0xb7ef86ab in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#86 0xb7f034e5 in rb_apply () from /usr/lib/libruby1.8.so.1.8
#87 0xb7ef9418 in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#88 0xb7efdfe0 in rb_need_block () from /usr/lib/libruby1.8.so.1.8
#89 0xb7eff2de in rb_yield () from /usr/lib/libruby1.8.so.1.8
#90 0xb7edea1a in rb_ary_each () from /usr/lib/libruby1.8.so.1.8
#91 0xb7ef050b in rb_iterator_p () from /usr/lib/libruby1.8.so.1.8
#92 0xb7efaf8c in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#93 0xb7efba1f in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#94 0xb7ef86ab in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#95 0xb7efa997 in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#96 0xb7efb6a6 in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#97 0xb7efba1f in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#98 0xb7ef86ab in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#99 0xb7efa997 in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#100 0xb7efb6a6 in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#101 0xb7efba1f in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#102 0xb7ef86ab in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#103 0xb7ef97e7 in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#104 0xb7efb6a6 in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#105 0xb7efba1f in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#106 0xb7ef8578 in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#107 0xb7ef97e7 in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#108 0xb7efb6a6 in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#109 0xb7efba1f in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#110 0xb7ef86ab in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#111 0xb7ef97e7 in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#112 0xb7efb6a6 in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#113 0xb7efba1f in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#114 0xb7ef86ab in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#115 0xb7ef85c6 in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#116 0xb7efb6a6 in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#117 0xb7efba1f in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#118 0xb7ef86ab in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#119 0xb7efb6a6 in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#120 0xb7efba1f in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#121 0xb7ef86ab in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#122 0xb7ef84bb in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#123 0xb7efdfe0 in rb_need_block () from /usr/lib/libruby1.8.so.1.8
#124 0xb7efe9d2 in rb_need_block () from /usr/lib/libruby1.8.so.1.8
#125 0xb7efeb3e in rb_need_block () from /usr/lib/libruby1.8.so.1.8
#126 0xb7effa48 in rb_exec_end_proc () from /usr/lib/libruby1.8.so.1.8
#127 0xb7effb23 in rb_exec_end_proc () from /usr/lib/libruby1.8.so.1.8
#128 0xb7f06f41 in ruby_cleanup () from /usr/lib/libruby1.8.so.1.8
#129 0xb7f070f3 in ruby_stop () from /usr/lib/libruby1.8.so.1.8
#130 0xb7f0778e in ruby_run () from /usr/lib/libruby1.8.so.1.8
#131 0x080486b1 in main ()