From priimak at highwire.stanford.edu Fri Sep 5 18:43:45 2008
From: priimak at highwire.stanford.edu (Dmitri Priimak)
Date: Fri, 05 Sep 2008 15:43:45 -0700
Subject: [libxml-devel] Segmentation fault in document.rb
Message-ID: <48C1B621.10308@highwire.stanford.edu>

Hi Guys.

I just got this error

/var/lib/gems/1.8/gems/libxml-ruby-0.8.3/lib/libxml/document.rb:40: [BUG] Segmentation fault

Could someone explain what that is?

--
Dmitri Priimak

From cfis at savagexi.com Mon Sep 8 16:57:37 2008
From: cfis at savagexi.com (Charlie Savage)
Date: Mon, 08 Sep 2008 14:57:37 -0600
Subject: [libxml-devel] Segmentation fault in document.rb
In-Reply-To: <48C1B621.10308@highwire.stanford.edu>
References: <48C1B621.10308@highwire.stanford.edu>
Message-ID: <48C591C1.4080304@savagexi.com>

Dmitri,

> I just got this error
>
> /var/lib/gems/1.8/gems/libxml-ruby-0.8.3/lib/libxml/document.rb:40:
> [BUG] Segmentation fault
>
> Could someone explain what that is?

You'll have to provide a lot more information if you'd like help. At a minimum, a testcase, traceback, ruby info, libxml version info, operating system info, etc.

Charlie

From priimak at stanford.edu Mon Sep 8 17:58:47 2008
From: priimak at stanford.edu (Dmitri Priimak)
Date: Mon, 08 Sep 2008 14:58:47 -0700
Subject: [libxml-devel] Segmentation fault in document.rb
In-Reply-To: <48C591C1.4080304@savagexi.com>
References: <48C1B621.10308@highwire.stanford.edu> <48C591C1.4080304@savagexi.com>
Message-ID: <48C5A017.2040002@stanford.edu>

Charlie Savage wrote:
> You'll have to provide a lot more information if you'd like help. At
> a minimum, a testcase, traceback, ruby info, libxml version info,
> operating system info, etc.

libxml2.so.2.7.1
Red Hat Enterprise Linux AS release 4 (Nahant Update 7)
ruby 1.8.7 (2008-08-11 patchlevel 72) [x86_64-linux]

That is all I have for now. However, the crashing is sporadic and a consistent test case may not be possible.
Can I even get a traceback on a segmentation fault?

--
Dmitri Priimak

From cfis at savagexi.com Mon Sep 8 18:46:06 2008
From: cfis at savagexi.com (Charlie Savage)
Date: Mon, 08 Sep 2008 16:46:06 -0600
Subject: [libxml-devel] Segmentation fault in document.rb
In-Reply-To: <48C5A017.2040002@stanford.edu>
References: <48C1B621.10308@highwire.stanford.edu> <48C591C1.4080304@savagexi.com> <48C5A017.2040002@stanford.edu>
Message-ID: <48C5AB2E.5030803@savagexi.com>

> libxml2.so.2.7.1
> Red Hat Enterprise Linux AS release 4 (Nahant Update 7)
> ruby 1.8.7 (2008-08-11 patchlevel 72) [x86_64-linux]
>
> That is all I have for now. However, the crashing is sporadic and a
> consistent test case may not be possible.
> Can I even get a traceback on a segmentation fault?

What version of libxml-ruby?

Traceback - sure. Make sure your versions of libxml-ruby and libxml are compiled with debug information. Then enable crash dumps on your linux box. When you see a crash, inspect the crash dump.

Alternatively, run your rails app under gdb (or attach to it with gdb) and set it up to break on an exception.

Charlie

From priimak at stanford.edu Mon Sep 8 19:02:22 2008
From: priimak at stanford.edu (Dmitri Priimak)
Date: Mon, 08 Sep 2008 16:02:22 -0700
Subject: [libxml-devel] Segmentation fault in document.rb
In-Reply-To: <48C5AB2E.5030803@savagexi.com>
References: <48C1B621.10308@highwire.stanford.edu> <48C591C1.4080304@savagexi.com> <48C5A017.2040002@stanford.edu> <48C5AB2E.5030803@savagexi.com>
Message-ID: <48C5AEFE.8090002@stanford.edu>

Charlie Savage wrote:
> What version of libxml-ruby?

libxml-ruby-0.8.3

> Traceback - sure. Make sure your versions of libxml-ruby and libxml
> are compiled with debug information. Then enable crash dumps on your
> linux box. When you see a crash, inspect the crash dump.
>
> Alternatively, run your rails app under gdb (or attach to it with gdb)
> and set it up to break on an exception.

Thanks, will do that next time.

--
Dmitri Priimak

From cfis at savagexi.com Mon Sep 8 22:58:37 2008
From: cfis at savagexi.com (Charlie Savage)
Date: Mon, 08 Sep 2008 20:58:37 -0600
Subject: [libxml-devel] Segmentation fault in document.rb
In-Reply-To: <48C5AEFE.8090002@stanford.edu>
References: <48C1B621.10308@highwire.stanford.edu> <48C591C1.4080304@savagexi.com> <48C5A017.2040002@stanford.edu> <48C5AB2E.5030803@savagexi.com> <48C5AEFE.8090002@stanford.edu>
Message-ID: <48C5E65D.7070704@savagexi.com>

>> Alternatively, run your rails app under gdb (or attach to it with gdb)
>> and set it up to break on an exception.
> Thanks, will do that next time.

FYI - the most likely situation is that you are using document.find to get an xpath object. Then the xpath object is being freed after the document. To fix that for now:

    results = document.find('/')
    ...
    results = nil
    GC.start

Charlie

From priimak at stanford.edu Tue Sep 9 11:35:49 2008
From: priimak at stanford.edu (Dmitri Priimak)
Date: Tue, 09 Sep 2008 08:35:49 -0700
Subject: [libxml-devel] Segmentation fault in document.rb
In-Reply-To: <48C5E65D.7070704@savagexi.com>
References: <48C1B621.10308@highwire.stanford.edu> <48C591C1.4080304@savagexi.com> <48C5A017.2040002@stanford.edu> <48C5AB2E.5030803@savagexi.com> <48C5AEFE.8090002@stanford.edu> <48C5E65D.7070704@savagexi.com>
Message-ID: <48C697D5.8060402@stanford.edu>

Charlie Savage wrote:
> FYI - the most likely situation is that you are using document.find to
> get an xpath object. Then the xpath object is being freed after the
> document. To fix that for now:
>
>     results = document.find('/')
>     ...
>     results = nil
>     GC.start

I see, but why is that needed? Are you using some global variables to keep track of things?
Also, it would be cool if one could do something like

    document.find('/') { |node| ... }

with everything freed and closed after all is done.
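A minimal sketch of what such a block form could look like, written as a thin wrapper around the workaround quoted above. `each_xpath_node` is a hypothetical helper, not part of libxml-ruby; it assumes only that the document responds to `find` and that the result responds to `each`. Dropping the local reference and calling `GC.start` makes it likely, not certain, that the result set is collected while the document is still alive.

```ruby
# Hypothetical helper, not part of libxml-ruby: scope an XPath result set
# to a block and apply the "results = nil; GC.start" workaround afterwards.
def each_xpath_node(document, xpath)
  results = document.find(xpath)
  results.each { |node| yield node }
  nil
ensure
  # Drop our reference and request a collection while the caller still
  # holds the document. This is best effort, not a guarantee.
  results = nil
  GC.start
end
```

Usage would be along the lines of `each_xpath_node(document, '/') { |node| puts node.name }`, with the caller keeping `document` referenced until the call returns.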
By the way, I'd like to say that what you guys are doing is very cool and most useful. Thanks for all the work. Can't wait for version 1.0

--
Dmitri Priimak

From cfis at savagexi.com Tue Sep 9 12:23:01 2008
From: cfis at savagexi.com (Charlie Savage)
Date: Tue, 09 Sep 2008 10:23:01 -0600
Subject: [libxml-devel] Segmentation fault in document.rb
In-Reply-To: <48C697D5.8060402@stanford.edu>
References: <48C1B621.10308@highwire.stanford.edu> <48C591C1.4080304@savagexi.com> <48C5A017.2040002@stanford.edu> <48C5AB2E.5030803@savagexi.com> <48C5AEFE.8090002@stanford.edu> <48C5E65D.7070704@savagexi.com> <48C697D5.8060402@stanford.edu>
Message-ID: <48C6A2E5.10503@savagexi.com>

Dmitri Priimak wrote:
> I see, but why is that needed? Are you using some global variables to
> keep track of things?

In my view it's a bug in Ruby's GC. See the emails I posted to ruby-core a couple of months ago. There is a way to work around it, but I haven't gotten to that yet.

Charlie

From priimak at stanford.edu Tue Sep 9 12:24:18 2008
From: priimak at stanford.edu (Dmitri Priimak)
Date: Tue, 09 Sep 2008 09:24:18 -0700
Subject: [libxml-devel] Segmentation fault in document.rb
In-Reply-To: <48C6A2E5.10503@savagexi.com>
References: <48C1B621.10308@highwire.stanford.edu> <48C591C1.4080304@savagexi.com> <48C5A017.2040002@stanford.edu> <48C5AB2E.5030803@savagexi.com> <48C5AEFE.8090002@stanford.edu> <48C5E65D.7070704@savagexi.com> <48C697D5.8060402@stanford.edu> <48C6A2E5.10503@savagexi.com>
Message-ID: <48C6A332.1070506@stanford.edu>

Charlie Savage wrote:
> In my view it's a bug in Ruby's GC. See the emails I posted to
> ruby-core a couple of months ago. There is a way to work around it,
> but I haven't gotten to that yet.

Could you expand on it, the workaround that is? Perhaps I could help you with it.
Does this bug persist in Ruby 1.9?

--
Dmitri Priimak

From cfis at savagexi.com Tue Sep 9 12:29:49 2008
From: cfis at savagexi.com (Charlie Savage)
Date: Tue, 09 Sep 2008 10:29:49 -0600
Subject: [libxml-devel] Segmentation fault in document.rb
In-Reply-To: <48C6A332.1070506@stanford.edu>
References: <48C1B621.10308@highwire.stanford.edu> <48C591C1.4080304@savagexi.com> <48C5A017.2040002@stanford.edu> <48C5AB2E.5030803@savagexi.com> <48C5AEFE.8090002@stanford.edu> <48C5E65D.7070704@savagexi.com> <48C697D5.8060402@stanford.edu> <48C6A2E5.10503@savagexi.com> <48C6A332.1070506@stanford.edu>
Message-ID: <48C6A47D.9040900@savagexi.com>

> Could you expand on it, the workaround that is? Perhaps I could help
> you with it.

http://blade.nagaokaut.ac.jp/cgi-bin/vframe.rb/ruby/ruby-core/17706?17500-18182+split-mode-vertical

http://rubyforge.org/pipermail/libxml-devel/2008-July/001034.html

> Does this bug persist in Ruby 1.9?

I'm not sure, haven't checked.

Charlie

--
Charlie Savage
http://cfis.savagexi.com

From tbagby at kosmix.com Tue Sep 9 13:19:07 2008
From: tbagby at kosmix.com (Tom Bagby)
Date: Tue, 9 Sep 2008 10:19:07 -0700
Subject: [libxml-devel] Segmentation fault in document.rb
References: <48C1B621.10308@highwire.stanford.edu> <48C591C1.4080304@savagexi.com> <48C5A017.2040002@stanford.edu> <48C5AB2E.5030803@savagexi.com> <48C5AEFE.8090002@stanford.edu> <48C5E65D.7070704@savagexi.com> <48C697D5.8060402@stanford.edu> <48C6A2E5.10503@savagexi.com> <48C6A332.1070506@stanford.edu> <48C6A47D.9040900@savagexi.com>
Message-ID:

Actually, this seg fault occurs during normal running, not just when Ruby is exiting. We had a ton of crashing due to this.
The order in which things are marked has nothing to do with the order in which things are freed; do enough work with different documents over many requests and we always eventually hit a segfault. I added some ref counting and we've been running for a while now with zero segfaults. Attached is a patch with what I did, the downside of which is that I disabled all the xpointer stuff, which we don't use and which I didn't have time/was unmotivated to fix. If you don't use xpointer this patch should solve your problems.

-Tom

From tbagby at kosmix.com Tue Sep 9 13:34:15 2008
From: tbagby at kosmix.com (Tom Bagby)
Date: Tue, 9 Sep 2008 10:34:15 -0700
Subject: [libxml-devel] Segmentation fault in document.rb
References: <48C1B621.10308@highwire.stanford.edu> <48C591C1.4080304@savagexi.com> <48C5A017.2040002@stanford.edu> <48C5AB2E.5030803@savagexi.com> <48C5AEFE.8090002@stanford.edu> <48C5E65D.7070704@savagexi.com> <48C697D5.8060402@stanford.edu> <48C6A2E5.10503@savagexi.com> <48C6A332.1070506@stanford.edu> <48C6A47D.9040900@savagexi.com>
Message-ID:

I just sent this with an attachment, which I've now removed because I think the list doesn't like me doing that:

Actually, this seg fault occurs during normal running, not just when Ruby is exiting. We had a ton of crashing due to this. The order in which things are marked has nothing to do with the order in which things are freed; do enough work with different documents over many requests and we always eventually hit a segfault. I added some ref counting and we've been running for a while now with zero segfaults. I made a patch of what I did, the downside of which is that I disabled all the xpointer stuff, which we don't use and which I didn't have time/was unmotivated to fix. If you don't use xpointer this patch should solve your problems:

http://www.acidlunchbox.com/bagby/ref_count.diff

-Tom

From cfis at savagexi.com Tue Sep 9 13:59:02 2008
From: cfis at savagexi.com (Charlie Savage)
Date: Tue, 09 Sep 2008 11:59:02 -0600
Subject: [libxml-devel] Segmentation fault in document.rb
In-Reply-To:
References: <48C1B621.10308@highwire.stanford.edu> <48C591C1.4080304@savagexi.com> <48C5A017.2040002@stanford.edu> <48C5AB2E.5030803@savagexi.com> <48C5AEFE.8090002@stanford.edu> <48C5E65D.7070704@savagexi.com> <48C697D5.8060402@stanford.edu> <48C6A2E5.10503@savagexi.com> <48C6A332.1070506@stanford.edu> <48C6A47D.9040900@savagexi.com>
Message-ID: <48C6B966.5070806@savagexi.com>

Hi Tom,

Thanks for posting the patch.

Tom Bagby wrote:
> If you don't use xpointer this patch should solve your problems:
>
> http://www.acidlunchbox.com/bagby/ref_count.diff

A quick review - that looks about right to me. Although why did you remove the setting of the @doc and @context and then put back the mark functions? Seems to me either way works fine - or am I missing something?

As for the ref counting solution - I always leaned towards having doc and xpath objects have pointers to each other because I didn't think ref counting would work. However, I can't remember why I thought that off the top of my head and the logic looks ok to me.

Anyway, I'd be happy to apply the patch but I'd like to get the mark functions straight first and of course the XPointer stuff.

Charlie

--
Charlie Savage
http://cfis.savagexi.com

From tbagby at kosmix.com Tue Sep 9 14:12:47 2008
From: tbagby at kosmix.com (Tom Bagby)
Date: Tue, 9 Sep 2008 11:12:47 -0700
Subject: [libxml-devel] Segmentation fault in document.rb
References: <48C1B621.10308@highwire.stanford.edu> <48C591C1.4080304@savagexi.com> <48C5A017.2040002@stanford.edu> <48C5AB2E.5030803@savagexi.com> <48C5AEFE.8090002@stanford.edu> <48C5E65D.7070704@savagexi.com> <48C697D5.8060402@stanford.edu> <48C6A2E5.10503@savagexi.com> <48C6A332.1070506@stanford.edu> <48C6A47D.9040900@savagexi.com> <48C6B966.5070806@savagexi.com>
Message-ID:

I had intended to clean this up and post it as a real patch, but never had the time. I believe the ivar stuff does work fine; it's just something that I changed while I was digging around for what the problem was and I never backed it out.

Ref-counting was the only solution I could come up with. The issue is that the xpath object *has* to be freed before the document, not the other way around. Since the GC doesn't free them in any particular order, that seemed the best way to make sure that the document did not get freed first.
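The ordering constraint described above can be illustrated with a toy model in plain Ruby. This is not the patch itself (that lives in the C extension and is not reproduced here); `ToyDocument`, `ToyXPathResult` and their methods are made-up names. The idea sketched, under those assumptions: each XPath result holds a count on its document, and the document's memory is only really released once the GC has asked for it and the count is back to zero.

```ruby
# Toy model only: hypothetical classes, not libxml-ruby code.
class ToyDocument
  def initialize
    @refs = 0                # number of live XPath results using this document
    @free_requested = false  # set when the GC "frees" the Ruby wrapper
    @released = false        # true once the underlying memory is given back
  end

  def released?
    @released
  end

  def retain
    @refs += 1
  end

  def release
    @refs -= 1
    really_release if @refs.zero? && @free_requested
  end

  # Stands in for the free callback the GC would invoke on the document.
  def gc_free
    @free_requested = true
    really_release if @refs.zero?
  end

  private

  def really_release
    @released = true
  end
end

class ToyXPathResult
  def initialize(document)
    @document = document
    document.retain
  end

  # Stands in for the free callback the GC would invoke on the result.
  def gc_free
    @document.release
  end
end
```

Whichever `gc_free` runs first, the document is released only after both have run, which is the property the GC's arbitrary free order does not provide on its own.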
We make extremely heavy use of libxml in production with this patch and it completely got rid of all of our intermittent crashing in stress testing, so I feel pretty good about stability with this change.

-Tom

From priimak at stanford.edu Tue Sep 9 14:25:33 2008
From: priimak at stanford.edu (Dmitri Priimak)
Date: Tue, 09 Sep 2008 11:25:33 -0700
Subject: [libxml-devel] what about rb_gc_mark(): unknown data type
Message-ID: <48C6BF9D.5040408@stanford.edu>

One more thing.

In addition to crashes I seem to be getting these errors at random times

rb_gc_mark(): unknown data type 0x28(0xb7dce198) non object

They appear in random places when calling LibXML::XML classes and methods.
Could someone elaborate on the source of these errors?

--
Dmitri Priimak

From cfis at savagexi.com Tue Sep 9 14:32:32 2008
From: cfis at savagexi.com (Charlie Savage)
Date: Tue, 09 Sep 2008 12:32:32 -0600
Subject: [libxml-devel] Segmentation fault in document.rb
In-Reply-To:
References: <48C1B621.10308@highwire.stanford.edu> <48C591C1.4080304@savagexi.com> <48C5A017.2040002@stanford.edu> <48C5AB2E.5030803@savagexi.com> <48C5AEFE.8090002@stanford.edu> <48C5E65D.7070704@savagexi.com> <48C697D5.8060402@stanford.edu> <48C6A2E5.10503@savagexi.com> <48C6A332.1070506@stanford.edu> <48C6A47D.9040900@savagexi.com> <48C6B966.5070806@savagexi.com>
Message-ID: <48C6C140.1060807@savagexi.com>

Tom Bagby wrote:
> Ref-counting was the only solution I could come up with. The issue is
> that the xpath object *has* to be freed before the document, not the
> other way around. Since the GC doesn't free them in any particular
> order, that seemed the best way to make sure that the document did
> not get freed first.

Right, exactly.
I had a scheme in mind where each time an xpath object was created, it would register itself with the document. Then when the document is freed, it would free the xpath objects first. You'd then have to add in a check for dangling references to the xpath objects though.

Your way seems a bit cleaner, although I'd probably add an api to the document object (maybe something like incrementXPathRef, decrementXPathRef) just to keep it a bit more contained.

> We make extremely heavy use of libxml in production with this patch
> and it completely got rid of all of our intermittent crashing in
> stress testing, so I feel pretty good about stability with this change.

That sounds great. Do you have a bit of time to clean it up and submit it? I'd love to solve this problem once and for all.

Charlie

--
Charlie Savage
http://cfis.savagexi.com

From cfis at savagexi.com Tue Sep 9 14:33:04 2008
From: cfis at savagexi.com (Charlie Savage)
Date: Tue, 09 Sep 2008 12:33:04 -0600
Subject: [libxml-devel] what about rb_gc_mark(): unknown data type
In-Reply-To: <48C6BF9D.5040408@stanford.edu>
References: <48C6BF9D.5040408@stanford.edu>
Message-ID: <48C6C160.9000505@savagexi.com>

Dmitri Priimak wrote:
> In addition to crashes I seem to be getting these errors at random times
>
> rb_gc_mark(): unknown data type 0x28(0xb7dce198) non object
>
> They appear in random places when calling LibXML::XML classes and
> methods.
> Could someone elaborate on the source of these errors?

That one I have never seen. Google turn up anything?

Charlie

--
Charlie Savage
http://cfis.savagexi.com

From priimak at stanford.edu Tue Sep 9 14:34:21 2008
From: priimak at stanford.edu (Dmitri Priimak)
Date: Tue, 09 Sep 2008 11:34:21 -0700
Subject: [libxml-devel] what about rb_gc_mark(): unknown data type
In-Reply-To: <48C6C160.9000505@savagexi.com>
References: <48C6BF9D.5040408@stanford.edu> <48C6C160.9000505@savagexi.com>
Message-ID: <48C6C1AD.6000604@stanford.edu>

Charlie Savage wrote:
> That one I have never seen. Google turn up anything?

Yes. For example http://www.ruby-forum.com/topic/58150
The second message in there says:

    This means an object in your application referring a broken
    reference, i.e. a C extension you are using has a bug.

--
Dmitri Priimak

From priimak at stanford.edu Tue Sep 9 14:48:04 2008
From: priimak at stanford.edu (Dmitri Priimak)
Date: Tue, 09 Sep 2008 11:48:04 -0700
Subject: [libxml-devel] what about rb_gc_mark(): unknown data type
In-Reply-To: <48C6C1AD.6000604@stanford.edu>
References: <48C6BF9D.5040408@stanford.edu> <48C6C160.9000505@savagexi.com> <48C6C1AD.6000604@stanford.edu>
Message-ID: <48C6C4E4.2000900@stanford.edu>

Dmitri Priimak wrote:
> Yes. For example http://www.ruby-forum.com/topic/58150
> The second message in there says:
>
>     This means an object in your application referring a broken
>     reference, i.e. a C extension you are using has a bug.

Actually it seems to always happen at

    parser.string = data

where

    parser = LibXML::XML::Parser.new

--
Dmitri Priimak

From priimak at stanford.edu Tue Sep 9 16:39:20 2008
From: priimak at stanford.edu (Dmitri Priimak)
Date: Tue, 09 Sep 2008 13:39:20 -0700
Subject: [libxml-devel] what about rb_gc_mark(): unknown data type
In-Reply-To: <48C6C4E4.2000900@stanford.edu>
References: <48C6BF9D.5040408@stanford.edu> <48C6C160.9000505@savagexi.com> <48C6C1AD.6000604@stanford.edu> <48C6C4E4.2000900@stanford.edu>
Message-ID: <48C6DEF8.8070302@stanford.edu>

Dmitri Priimak wrote:
> Actually it seems to always happen at
>
>     parser.string = data
>
> where
>
>     parser = LibXML::XML::Parser.new

Which ultimately originates in line 1049 in gc.c ( function gc_mark_children ) in ruby-1.8.7-p72
Any ideas?

--
Dmitri Priimak

From cfis at savagexi.com Tue Sep 9 16:43:39 2008
From: cfis at savagexi.com (Charlie Savage)
Date: Tue, 09 Sep 2008 14:43:39 -0600
Subject: [libxml-devel] what about rb_gc_mark(): unknown data type
In-Reply-To: <48C6DEF8.8070302@stanford.edu>
References: <48C6BF9D.5040408@stanford.edu> <48C6C160.9000505@savagexi.com> <48C6C1AD.6000604@stanford.edu> <48C6C4E4.2000900@stanford.edu> <48C6DEF8.8070302@stanford.edu>
Message-ID: <48C6DFFB.4060503@savagexi.com>

> Which ultimately originates in line 1049 in gc.c ( function
> gc_mark_children ) in ruby-1.8.7-p72
> Any ideas?

If you can easily duplicate the issue (sounds like it), then what I would do is run the program under a debugger like gdb or VC++. Put a breakpoint in the relevant mark functions defined in the libxml library. Then step through them and see if you can see what's going on.

Or put a breakpoint in the ruby source code that is raising the exception, and inspect the backtrace. On Windows you won't have any luck with that since the Ruby executable doesn't have symbol information, but maybe your platform is different.

Charlie

--
Charlie Savage
http://cfis.savagexi.com

From priimak at stanford.edu Tue Sep 9 16:43:39 2008
From: priimak at stanford.edu (Dmitri Priimak)
Date: Tue, 09 Sep 2008 13:43:39 -0700
Subject: [libxml-devel] what about rb_gc_mark(): unknown data type
In-Reply-To: <48C6DFFB.4060503@savagexi.com>
References: <48C6BF9D.5040408@stanford.edu> <48C6C160.9000505@savagexi.com> <48C6C1AD.6000604@stanford.edu> <48C6C4E4.2000900@stanford.edu> <48C6DEF8.8070302@stanford.edu> <48C6DFFB.4060503@savagexi.com>
Message-ID: <48C6DFFB.2010102@stanford.edu>

Charlie Savage wrote:
> If you can easily duplicate the issue (sounds like it), then what I
> would do is run the program under a debugger like gdb or VC++. Put a
> breakpoint in the relevant mark functions defined in the libxml
> library. Then step through them and see if you can see what's going on.
>
> Or put a breakpoint in the ruby source code that is raising the
> exception, and inspect the backtrace. On Windows you won't have any
> luck with that since the Ruby executable doesn't have symbol
> information, but maybe your platform is different.

I will give it a try, but I haven't used gdb in ages.
-- Dmitri Priimak From cfis at savagexi.com Tue Sep 9 16:47:02 2008 From: cfis at savagexi.com (Charlie Savage) Date: Tue, 09 Sep 2008 14:47:02 -0600 Subject: [libxml-devel] what about rb_gc_mark(): unknown data type In-Reply-To: <48C6DFFB.2010102@stanford.edu> References: <48C6BF9D.5040408@stanford.edu> <48C6C160.9000505@savagexi.com> <48C6C1AD.6000604@stanford.edu> <48C6C4E4.2000900@stanford.edu> <48C6DEF8.8070302@stanford.edu> <48C6DFFB.4060503@savagexi.com> <48C6DFFB.2010102@stanford.edu> Message-ID: <48C6E0C6.1030502@savagexi.com> Dmitri Priimak wrote: > Charlie Savage wrote: >>> Which ultimately originates in line 1049 in gc.c ( function >>> gc_mark_children ) in ruby-1.8.7-p72 >>> Any ideas? >> >> If you can easily duplicate the issue (sounds like it), then what I >> would do is run the program under a debugger like gdb or VC++. Put a >> breakpoint in the relevant mark functions defined in the libxml >> library. Then step through them and see if you can see what's going on. >> >> Or put a breakpoint in the ruby source code that is raising the >> exception, and inspect the backtrace. On Windows you won't have any >> luck with that since the Ruby executable doesn't have symbol >> information, but maybe your platform is different. > I will give it a try, but I haven't used gdb in ages. Good luck! I actually prefer a IDE for debugger - so you could always use Eclipse or NetBeans or some such if you'd like.... Another alternative is put in printf statements to standard out, but I think using a debugger is faster and provides a lot more information. Charlie -------------- next part -------------- A non-text attachment was scrubbed... 
Name: smime.p7s Type: application/x-pkcs7-signature Size: 3237 bytes Desc: S/MIME Cryptographic Signature URL: From noreply at rubyforge.org Wed Sep 10 03:03:40 2008 From: noreply at rubyforge.org (noreply at rubyforge.org) Date: Wed, 10 Sep 2008 03:03:40 -0400 (EDT) Subject: [libxml-devel] [ libxml-Bugs-21895 ] [Docs] require 'libxml' Message-ID: <20080910070340.1BAD31588030@rubyforge.org> Bugs item #21895, was opened at 2008-09-10 09:03 You can respond by visiting: http://rubyforge.org/tracker/?func=detail&atid=1971&aid=21895&group_id=494 Category: None Group: None Status: Open Resolution: None Priority: 3 Submitted By: Thomas Peklak (thomaspeklak) Assigned to: Nobody (None) Summary: [Docs] require 'libxml' Initial Comment: The README is misleading as the code samples say to use "require 'libxml'" which does not work when you do not include LibXML, too. I found in the comments of libxml.rb to use require 'xml' instead, see below # DEPRECATED: Use require 'xml' instead! # # include LibXML Please correct this, because the README is the first thing people look for code samples ---------------------------------------------------------------------- You can respond by visiting: http://rubyforge.org/tracker/?func=detail&atid=1971&aid=21895&group_id=494 From priimak at stanford.edu Wed Sep 10 20:14:50 2008 From: priimak at stanford.edu (Dmitri Priimak) Date: Wed, 10 Sep 2008 17:14:50 -0700 Subject: [libxml-devel] what about rb_gc_mark(): unknown data type In-Reply-To: <48C6E0C6.1030502@savagexi.com> References: <48C6BF9D.5040408@stanford.edu> <48C6C160.9000505@savagexi.com> <48C6C1AD.6000604@stanford.edu> <48C6C4E4.2000900@stanford.edu> <48C6DEF8.8070302@stanford.edu> <48C6DFFB.4060503@savagexi.com> <48C6DFFB.2010102@stanford.edu> <48C6E0C6.1030502@savagexi.com> Message-ID: <48C862FA.9080406@stanford.edu> Charlie Savage wrote: > > > Dmitri Priimak wrote: >> Charlie Savage wrote: >>>> Which ultimately originates in line 1049 in gc.c ( function >>>> gc_mark_children ) in 
ruby-1.8.7-p72 >>>> Any ideas? >>> >>> If you can easily duplicate the issue (sounds like it), then what I >>> would do is run the program under a debugger like gdb or VC++. Put >>> a breakpoint in the relevant mark functions defined in the libxml >>> library. Then step through them and see if you can see what's going >>> on. >>> >>> Or put a breakpoint in the ruby source code that is raising the >>> exception, and inspect the backtrace. On Windows you won't have any >>> luck with that since the Ruby executable doesn't have symbol >>> information, but maybe your platform is different. >> I will give it a try, but I haven't used gdb in ages. > > Good luck! I actually prefer a IDE for debugger - so you could always > use Eclipse or NetBeans or some such if you'd like.... > > Another alternative is put in printf statements to standard out, but I > think using a debugger is faster and provides a lot more information. Well, after some looking around it seems to me that the problem appears in ruby_xml_parser.c when calling rb_gc_mark(((rx_string_data *)rxp->data)->str); Is it possible that ((rx_string_data *)rxp->data)->str is not a proper RVALUE? Or at least without proper ruby type? 
-- Dmitri Priimak From cfis at savagexi.com Wed Sep 10 23:24:04 2008 From: cfis at savagexi.com (Charlie Savage) Date: Wed, 10 Sep 2008 21:24:04 -0600 Subject: [libxml-devel] what about rb_gc_mark(): unknown data type In-Reply-To: <48C862FA.9080406@stanford.edu> References: <48C6BF9D.5040408@stanford.edu> <48C6C160.9000505@savagexi.com> <48C6C1AD.6000604@stanford.edu> <48C6C4E4.2000900@stanford.edu> <48C6DEF8.8070302@stanford.edu> <48C6DFFB.4060503@savagexi.com> <48C6DFFB.2010102@stanford.edu> <48C6E0C6.1030502@savagexi.com> <48C862FA.9080406@stanford.edu> Message-ID: <48C88F54.8040500@savagexi.com> Hi Dmitri, > > Well, after some looking around it seems to me that the problem appears > in ruby_xml_parser.c > when calling > rb_gc_mark(((rx_string_data *)rxp->data)->str); > > Is it possible that ((rx_string_data *)rxp->data)->str is not a proper > RVALUE? Or at least without proper ruby type? Hmm, could be. I don't much like that code (the string,io and file handling) - I think its can be simplified a lot. Anyway, do you have a test case? Or if not, do you know roughly what is causing this? Is this just a parser parsing a string, or maybe something more complicated is going on like a parser being reused? Charlie -- Charlie Savage http://cfis.savagexi.com -------------- next part -------------- A non-text attachment was scrubbed... 
Name: smime.p7s Type: application/x-pkcs7-signature Size: 3237 bytes Desc: S/MIME Cryptographic Signature URL: From priimak at stanford.edu Wed Sep 10 23:44:28 2008 From: priimak at stanford.edu (Dmitri Priimak) Date: Wed, 10 Sep 2008 20:44:28 -0700 Subject: [libxml-devel] what about rb_gc_mark(): unknown data type In-Reply-To: <48C88F54.8040500@savagexi.com> References: <48C6BF9D.5040408@stanford.edu> <48C6C160.9000505@savagexi.com> <48C6C1AD.6000604@stanford.edu> <48C6C4E4.2000900@stanford.edu> <48C6DEF8.8070302@stanford.edu> <48C6DFFB.4060503@savagexi.com> <48C6DFFB.2010102@stanford.edu> <48C6E0C6.1030502@savagexi.com> <48C862FA.9080406@stanford.edu> <48C88F54.8040500@savagexi.com> Message-ID: <48C8941C.1050005@stanford.edu> Charlie Savage wrote: > Hi Dmitri, >> >> Well, after some looking around it seems to me that the problem >> appears in ruby_xml_parser.c >> when calling >> rb_gc_mark(((rx_string_data *)rxp->data)->str); >> >> Is it possible that ((rx_string_data *)rxp->data)->str is not a >> proper RVALUE? Or at least without proper ruby type? > > Hmm, could be. I don't much like that code (the string,io and file > handling) - I think its can be simplified a lot. > > Anyway, do you have a test case? Or if not, do you know roughly what > is causing this? Is this just a parser parsing a string, or maybe > something more complicated is going on like a parser being reused? I do not have a test case. The code is just loading bunch of file one after another in the string, then parses it and extracts few elements using XPath. Peculiarity of the code is that I am loading couple of hundred thousands of those xml files. The error does not always gets triggered, but if I run it several of times, I certainly get the error after processing at least 30000 files, however sometimes I can load more 400000 without any problems. I do not know the cause of it yet. Will be looking into it tomorrow, but any help is appreciated. Please let me know if you have any ideas. 
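One way to make a sporadic, GC-related crash like the one described above easier to reproduce is to force the collector to run far more often than usual. The following is only a minimal sketch, assuming the crash really is tied to garbage collection (as the gc_mark_children trace suggests) and that GC.stress is available in the Ruby build in use. The names with_gc_stress and process are hypothetical, and process is merely a stand-in for the real parse-and-XPath step:

```ruby
# Hypothetical helper (not part of libxml-ruby): run a block with GC.stress on,
# so the collector runs at nearly every allocation and GC bugs surface sooner.
def with_gc_stress
  previous = GC.stress
  GC.stress = true
  yield
ensure
  GC.stress = previous   # always restore the earlier setting
end

# Stand-in for the real per-file work (parse the string, run XPath queries).
# Here it only collects the names of opening tags with a regular expression.
def process(data)
  data.scan(/<(\w+)/).flatten
end

names = with_gc_stress { process("<a><b/></a>") }
puts names.inspect
```

Running under GC.stress is very slow, so it would probably only be practical on a small subset of the files.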
By the way, I am unable to compile the current svn trunk.
Did you get the chance to incorporate the patch provided by Tom Bugby?

--
Dmitri Priimak

From transfire at gmail.com Thu Sep 11 08:08:41 2008
From: transfire at gmail.com (Trans)
Date: Thu, 11 Sep 2008 08:08:41 -0400
Subject: [libxml-devel] [ libxml-Bugs-21895 ] [Docs] require 'libxml'
In-Reply-To: <20080910070340.1BAD31588030@rubyforge.org>
References: <20080910070340.1BAD31588030@rubyforge.org>
Message-ID: <4b6f054f0809110508q311dee97na3996d048775ed73@mail.gmail.com>

On Wed, Sep 10, 2008 at 3:03 AM, wrote:
> The README is misleading as the code samples say to use "require 'libxml'" which does not work when you do not include LibXML, too.
>
> I found in the comments of libxml.rb to use require 'xml' instead, see below
>
> # DEPRECATED: Use require 'xml' instead!
> #
> # include LibXML
>
> Please correct this, because the README is the first thing people look for code samples

I see that someone already did some work on this, but I edited the README
a bit more to give some of the rationale behind the use cases. The first
part of ==Usage now reads:

== USAGE

For in-depth information about using libxml-ruby please refer to its
online Rdoc documentation.

All libxml classes are in the LibXML::XML module. The most expedient way
to use libxml is to require 'xml'. This will mix the LibXML module into
the global namespace, allowing you to write code like this:

  require 'xml'
  document = XML::Document.new

However, when creating an application or library you plan to
redistribute, it is best to not add the LibXML module to the global
namespace, in which case you can either write your code like this:

  require 'libxml'
  document = LibXML::XML::Document.new

or, more conveniently, utilize a proper namespace for your own work and
include LibXML into it. For example:

  require 'libxml'

  module MyApplication
    include LibXML

    class MyClass
      def some_method
        document = XML::Document.new
      end
    end
  end

For simplicity's sake we will use require 'xml' in the basic examples
shown below.

T.

From cfis at savagexi.com Thu Sep 11 13:38:44 2008
From: cfis at savagexi.com (Charlie Savage)
Date: Thu, 11 Sep 2008 11:38:44 -0600
Subject: [libxml-devel] what about rb_gc_mark(): unknown data type
In-Reply-To: <48C8941C.1050005@stanford.edu>
References: <48C6BF9D.5040408@stanford.edu> <48C6C160.9000505@savagexi.com> <48C6C1AD.6000604@stanford.edu> <48C6C4E4.2000900@stanford.edu> <48C6DEF8.8070302@stanford.edu> <48C6DFFB.4060503@savagexi.com> <48C6DFFB.2010102@stanford.edu> <48C6E0C6.1030502@savagexi.com> <48C862FA.9080406@stanford.edu> <48C88F54.8040500@savagexi.com> <48C8941C.1050005@stanford.edu>
Message-ID: <48C957A4.5030606@savagexi.com>

> I do not have a test case. The code is just loading bunch of file one
> after another in the string, then parses it and extracts few elements
> using XPath. Peculiarity of the code is that I am loading couple of
> hundred thousands of those xml files. The error does not always gets
> triggered, but if I run it several of times, I certainly get the error
> after processing at least 30000 files, however sometimes I can load more
> 400000 without any problems.
>
> I do not know the cause of it yet. Will be looking into it tomorrow, but
> any help is appreciated.
> Please let me know if you have any ideas.
>
> By the way, I am unable to compile current svn trunk.

Hmm, works here for me. What problems are you having?

>
> Did you get the chance to incorporate patch provided by Tom Bugby?

No - I'm hoping he updates it (mostly I just don't have time for it
right now). Feel free to update it and submit it if you'd like.

Charlie
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s Type: application/x-pkcs7-signature Size: 3237 bytes Desc: S/MIME Cryptographic Signature URL: From noreply at rubyforge.org Thu Sep 11 13:37:53 2008 From: noreply at rubyforge.org (noreply at rubyforge.org) Date: Thu, 11 Sep 2008 13:37:53 -0400 (EDT) Subject: [libxml-devel] [ libxml-Bugs-21918 ] find() returned Nodes in 0.5.x, Attrs in 0.8.3 with this expression Message-ID: <20080911173753.518951678115@rubyforge.org> Bugs item #21918, was opened at 2008-09-11 13:37 You can respond by visiting: http://rubyforge.org/tracker/?func=detail&atid=1971&aid=21918&group_id=494 Category: General Group: None Status: Open Resolution: None Priority: 3 Submitted By: Jamie Orchard-Hays (jamieorc) Assigned to: Nobody (None) Summary: find() returned Nodes in 0.5.x, Attrs in 0.8.3 with this expression Initial Comment: With this expression: doc.find("/TEI.2/text/body//div1[@type!='header']/@id") 0.5.4 returned Nodes, while 0.8.3 returns Attrs. Is this an intentional change? I was calling content() on each item, which now blows up because they are Attrs, not Nodes. ---------------------------------------------------------------------- You can respond by visiting: http://rubyforge.org/tracker/?func=detail&atid=1971&aid=21918&group_id=494 From priimak at stanford.edu Fri Sep 12 03:06:51 2008 From: priimak at stanford.edu (Dmitri Priimak) Date: Fri, 12 Sep 2008 00:06:51 -0700 Subject: [libxml-devel] what about rb_gc_mark(): unknown data type In-Reply-To: <48C957A4.5030606@savagexi.com> References: <48C6BF9D.5040408@stanford.edu> <48C6C160.9000505@savagexi.com> <48C6C1AD.6000604@stanford.edu> <48C6C4E4.2000900@stanford.edu> <48C6DEF8.8070302@stanford.edu> <48C6DFFB.4060503@savagexi.com> <48C6DFFB.2010102@stanford.edu> <48C6E0C6.1030502@savagexi.com> <48C862FA.9080406@stanford.edu> <48C88F54.8040500@savagexi.com> <48C8941C.1050005@stanford.edu> <48C957A4.5030606@savagexi.com> Message-ID: <48CA150B.4050805@stanford.edu> Charlie Savage wrote: > >> I do not have a test case. 
The code is just loading bunch of file one >> after another in the string, then parses it and extracts few elements >> using XPath. Peculiarity of the code is that I am loading couple of >> hundred thousands of those xml files. The error does not always gets >> triggered, but if I run it several of times, I certainly get the >> error after processing at least 30000 files, however sometimes I can >> load more 400000 without any problems. >> >> I do not know the cause of it yet. Will be looking into it tomorrow, >> but any help is appreciated. >> Please let me know if you have any ideas. >> >> By the way, I am unable to compile current svn trunk. > > Hmm, works here for me. What problems are you having. ruby_xml_node.c:535: error: conflicting types for ‘ruby_xml_node_last_get’ ruby_xml_node.c:229: error: previous implicit declaration of ‘ruby_xml_node_last_get’ was here You did not define prototype for this function. See this patch http://www.stanford.edu/~priimak/patches/ruby-libxml.revision.526.patch.0 I wonder why you did not have any problems compiling it? >> >> Did you get the chance to incorporate patch provided by Tom Bugby? > > No - I'm hoping he updates it (mostly I just don't have time for it > right now). Feel free to update it and submit it if you'd like. 
> > Charlie > ------------------------------------------------------------------------ > > _______________________________________________ > libxml-devel mailing list > libxml-devel at rubyforge.org > http://rubyforge.org/mailman/listinfo/libxml-devel From cfis at savagexi.com Fri Sep 12 03:16:58 2008 From: cfis at savagexi.com (Charlie Savage) Date: Fri, 12 Sep 2008 01:16:58 -0600 Subject: [libxml-devel] what about rb_gc_mark(): unknown data type In-Reply-To: <48CA150B.4050805@stanford.edu> References: <48C6BF9D.5040408@stanford.edu> <48C6C160.9000505@savagexi.com> <48C6C1AD.6000604@stanford.edu> <48C6C4E4.2000900@stanford.edu> <48C6DEF8.8070302@stanford.edu> <48C6DFFB.4060503@savagexi.com> <48C6DFFB.2010102@stanford.edu> <48C6E0C6.1030502@savagexi.com> <48C862FA.9080406@stanford.edu> <48C88F54.8040500@savagexi.com> <48C8941C.1050005@stanford.edu> <48C957A4.5030606@savagexi.com> <48CA150B.4050805@stanford.edu> Message-ID: <48CA176A.7040902@savagexi.com> >> Hmm, works here for me. What problems are you having. > ruby_xml_node.c:535: error: conflicting types for ‘ruby_xml_node_last_get’ > ruby_xml_node.c:229: error: previous implicit declaration of > ‘ruby_xml_node_last_get’ was here > > You did not define prototype for this function. See this patch > http://www.stanford.edu/~priimak/patches/ruby-libxml.revision.526.patch.0 > > I wonder why you did not have any problems compiling it? Ah - I reverted that change locally (it was the whole discussion about <<). I checked in my change to remove the call to ruby_xml_node_last_get. So resync, should be ok (I didn't apply the patch because it's not needed anymore). Charlie -------------- next part -------------- A non-text attachment was scrubbed... 
Name: smime.p7s Type: application/x-pkcs7-signature Size: 3237 bytes Desc: S/MIME Cryptographic Signature URL: From mguterl at gmail.com Thu Sep 25 13:16:10 2008 From: mguterl at gmail.com (Michael Guterl) Date: Thu, 25 Sep 2008 13:16:10 -0400 Subject: [libxml-devel] trunk failures and other miscellanea Message-ID: <944a03770809251016o27483663q457c0f75e6cdd579@mail.gmail.com> I've been working on creating a libxml-jruby gem, which emulates the libxml-ruby interface by wrapping Java classes. My main focus at this point is to be able to install libxml-jruby under JRuby and use it as if it were libxml-ruby under MRI. Throughout my exploration of libxml-ruby I've come across a few things that I may need some clarification on. First and foremost, there are a few tests in trunk that are either resulting in failure or error. I have attached two patches that fixes one failure and one error. I was going to post these on libxml-ruby's bug tracker on rubyforge, however, I wasn't sure if patches for trunk really belong there..? The first patch 0001-fixed-assertions-in-NS-tests.patch fixes some incorrectly constructed assertions. The second patch 0002-make-sure-stringio-is-required-prior-to-creating-an.patch makes sure to require 'stringio' prior to creating an instance in the parser tests. While scanning the libxml-ruby forums on Rubyforge I came across this thread http://rubyforge.org/forum/forum.php?thread_id=24497&forum_id=2129 that asks about creating an XML::Document from a string. In this thread Charlie Savage asked if Phillip Bogle could provide a patch to add the HPricot helper methods. I went ahead and created patches for Phillip's enhancements: http://rubyforge.org/tracker/index.php?func=detail&aid=22147&group_id=494&atid=1973 http://rubyforge.org/tracker/index.php?func=detail&aid=22148&group_id=494&atid=1973 I've run across a few other typos in the documentation, I'll be working up patches for these later this week. 
Thanks, Michael Guterl -------------- next part -------------- A non-text attachment was scrubbed... Name: 0001-fixed-assertions-in-NS-tests.patch Type: application/octet-stream Size: 1085 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 0002-make-sure-stringio-is-required-prior-to-creating-an.patch Type: application/octet-stream Size: 786 bytes Desc: not available URL: From stephan at spaceboyz.net Sun Sep 28 20:00:54 2008 From: stephan at spaceboyz.net (Stephan Maka) Date: Mon, 29 Sep 2008 02:00:54 +0200 Subject: [libxml-devel] [PATCH] ruby_xml_sax_parser_new_push_parser Message-ID: <20080929000053.GC3610@chronos.sin> Hello libxml-ruby's SAX parser interface is only of limited use currently. The reason is that the document to be parsed must be passed as a whole. This is somewhat contrary to the actual reason for the use of a SAX parser. The attached patch adds a new constructor to use the parser in the so-called "push mode" where it won't complain about the document to be unfinished when parsing subsequent parts of it. My goal was the parsing of a network XML stream (XMPP). Here, the document is being provided continually over time and I need the SAX events long before the document ends. :-) Unfortunately, the libxml parser seems to buffer a lot and I don't receive all events of the current string immediately. This behaviour renders it unsuitable for XMPP and I will stick to the apparently antique expat binding unless someone comes up with a better idea. 
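The push-mode idea described above (feed the parser one chunk at a time and receive events as soon as each tag is complete) can be illustrated with a toy scanner. This is only a sketch written in plain Ruby: it is not libxml's push parser and does not use the API added by the patch. The class name PushScanner and the event names are hypothetical, text nodes are ignored, and a '>' inside an attribute value is not handled:

```ruby
# Toy push-mode scanner (hypothetical; NOT libxml's implementation).
# Chunks are appended to a buffer and an event is emitted only once a
# tag's closing '>' has arrived; an unfinished tag simply stays buffered.
class PushScanner
  def initialize(&on_event)
    @buffer = String.new
    @on_event = on_event
  end

  # Feed the next chunk and emit events for every tag that is now complete.
  def push(chunk)
    @buffer << chunk
    loop do
      lt = @buffer.index("<")
      if lt.nil?
        @buffer.clear            # only text left; this sketch ignores text
        break
      end
      gt = @buffer.index(">", lt)
      break if gt.nil?           # tag still incomplete: wait for more data
      tag = @buffer[(lt + 1)...gt]
      @buffer = @buffer[(gt + 1)..-1]
      emit(tag)
    end
    self
  end

  private

  def emit(tag)
    return if tag.start_with?("?", "!")   # skip declarations and comments
    if tag.start_with?("/")
      @on_event.call(:end_element, tag[1..-1].strip)
    else
      name = tag.split(/[\s\/]/).first
      @on_event.call(:start_element, name)
      # A self-closing tag such as <a/> also ends immediately.
      @on_event.call(:end_element, name) if tag.end_with?("/")
    end
  end
end

events = []
scanner = PushScanner.new { |type, name| events << [type, name] }
scanner.push("<stream><mess")       # only <stream> is complete so far
scanner.push("age to='a'/></str")   # completes <message/>; </stream> is pending
scanner.push("eam>")                # now </stream> is complete
puts events.inspect
```

For a stream protocol the relevant property is that each push reports everything that is already complete and holds back only the unfinished tail.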
Stephan
-------------- next part --------------
>From 7b70b3f4bd2c212c7750ca22f175b1510451825b Mon Sep 17 00:00:00 2001
From: Stephan Maka
Date: Sat, 27 Sep 2008 00:32:07 +0200
Subject: [PATCH] ruby_xml_sax_parser_new_push_parser

---
 ext/libxml/ruby_xml_sax_parser.c |   50 +++++++++++++++++++++++++++++++++++--
 ext/libxml/ruby_xml_sax_parser.h |    1 +
 2 files changed, 48 insertions(+), 3 deletions(-)

diff --git a/ext/libxml/ruby_xml_sax_parser.c b/ext/libxml/ruby_xml_sax_parser.c
index 26921c6..2515dbf 100644
--- a/ext/libxml/ruby_xml_sax_parser.c
+++ b/ext/libxml/ruby_xml_sax_parser.c
@@ -33,6 +33,12 @@ ruby_xml_sax_parser_free(ruby_xml_sax_parser *nodesp) {
   /* Apparently this isn't needed: time will tell */
   /* if (nodesp->xsh != NULL) */
   /* xmlFreeSax_Parser(nodesp->sax_parser); */
+
+  if (nodesp->push_parser && nodesp->xpc)
+  {
+    xmlParseChunk(nodesp->xpc, NULL, 0, 1);
+    xmlFreeParserCtxt(nodesp->xpc);
+  }
 }
 
 void
@@ -61,6 +67,7 @@ ruby_xml_sax_parser_new(VALUE class) {
   ruby_xml_sax_parser *nodesp;
 
   nodesp = ALLOC(ruby_xml_sax_parser);
+  nodesp->push_parser = 0;
   nodesp->xsh = &rubySAXHandlerStruct;
 
   nodesp->callbackHandler = Qnil;
@@ -72,6 +79,28 @@ ruby_xml_sax_parser_new(VALUE class) {
                           ruby_xml_sax_parser_free, nodesp));
 }
 
+/*
+ * call-seq:
+ *    XML::SaxParser.new -> sax_parser
+ *
+ * Create a new XML::SaxParser instance.
+ */
+VALUE
+ruby_xml_sax_parser_new_push_parser(VALUE class) {
+  ruby_xml_sax_parser *nodesp;
+
+  nodesp = ALLOC(ruby_xml_sax_parser);
+  nodesp->push_parser = 1;
+  nodesp->xsh = &rubySAXHandlerStruct;
+
+  nodesp->callbackHandler = Qnil;
+  nodesp->xpc = NULL;
+  nodesp->filename = Qnil;
+  nodesp->str = Qnil;
+
+  return(Data_Wrap_Struct(class, ruby_xml_sax_parser_mark,
+                          ruby_xml_sax_parser_free, nodesp));
+}
 
 /*
  * call-seq:
@@ -154,9 +183,23 @@ ruby_xml_sax_parser_parse(VALUE self) {
     status = xmlSAXUserParseFile(nodesp->xsh, nodesp, StringValuePtr(nodesp->filename));
   } else if (nodesp->str != Qnil) {
     str = StringValuePtr(nodesp->str);
-    status = //ruby_xml_document_new(cXMLDocument,
-      xmlSAXUserParseMemory(nodesp->xsh, nodesp,
-                            str, strlen(str)); //);
+    if (nodesp->push_parser)
+    {
+      if (!nodesp->xpc)
+      {
+        printf("new push: %s\n", str);
+        status = (nodesp->xpc = xmlCreatePushParserCtxt(nodesp->xsh, nodesp, str, strlen(str), "push.xml")) == NULL;
+      }
+      else
+      {
+        printf("push: %s\n", str);
+        status = xmlParseChunk(nodesp->xpc, str, strlen(str), 0) <= 0;
+      }
+    }
+    else
+      status = //ruby_xml_document_new(cXMLDocument,
+        xmlSAXUserParseMemory(nodesp->xsh, nodesp,
+                              str, strlen(str)); //);
   }
 
   /* XXX This should return an exception for the various error codes
@@ -417,6 +460,7 @@ ruby_init_xml_sax_parser(void) {
 
   /* SaxParser */
   rb_define_singleton_method(cXMLSaxParser, "new", ruby_xml_sax_parser_new, 0);
+  rb_define_singleton_method(cXMLSaxParser, "new_push_parser", ruby_xml_sax_parser_new_push_parser, 0);
 
   rb_define_method(cXMLSaxParser, "filename",
                    ruby_xml_sax_parser_filename_get, 0);
diff --git a/ext/libxml/ruby_xml_sax_parser.h b/ext/libxml/ruby_xml_sax_parser.h
index 2d20ad0..8593b7a 100644
--- a/ext/libxml/ruby_xml_sax_parser.h
+++ b/ext/libxml/ruby_xml_sax_parser.h
@@ -41,6 +41,7 @@ typedef struct ruby_xml_sax_parser_callbacks {
  */
 
 typedef struct ruby_xml_sax_parser {
+  int push_parser;
   xmlParserCtxtPtr xpc;
   xmlSAXHandlerPtr xsh;
   //ruby_xml_sax_parser_callbacks *cbp;
-- 
1.5.6.5

From stephan at spaceboyz.net Mon Sep 29 12:21:56 2008
From: stephan at spaceboyz.net (Stephan Maka)
Date: Mon, 29 Sep 2008 18:21:56 +0200
Subject: [libxml-devel] Git repository?
In-Reply-To: <200808222114.24823.erik@hollensbe.org>
References: <710d8140-df83-4e33-893f-39ef0f5ce74b@k13g2000hse.googlegroups.com> <200808222114.24823.erik@hollensbe.org>
Message-ID: <20080929162156.GE3610@chronos.sin>

Erik Hollensbe wrote:
> Or they could just use git-svn, which makes these kinds of things trivial and
> doesn't impose personal choices on others.

Just pushed: http://github.com/astro/libxml-ruby/

From transfire at gmail.com Sun Sep 28 11:34:15 2008
From: transfire at gmail.com (Trans)
Date: Sun, 28 Sep 2008 11:34:15 -0400
Subject: [libxml-devel] Git repository?
In-Reply-To: <20080929162156.GE3610@chronos.sin>
References: <710d8140-df83-4e33-893f-39ef0f5ce74b@k13g2000hse.googlegroups.com> <200808222114.24823.erik@hollensbe.org> <20080929162156.GE3610@chronos.sin>
Message-ID: <4b6f054f0809280834p38855bd3u740576d50e57703b@mail.gmail.com>

On Mon, Sep 29, 2008 at 12:21 PM, Stephan Maka wrote:
> Erik Hollensbe wrote:
>> Or they could just use git-svn, which makes these kinds of things trivial and
>> doesn't impose personal choices on others.
>
> Just pushed: http://github.com/astro/libxml-ruby/

How do you plan to keep this in sync with the SVN repo?

T.

From stephan at spaceboyz.net Mon Sep 29 13:12:39 2008
From: stephan at spaceboyz.net (Stephan Maka)
Date: Mon, 29 Sep 2008 19:12:39 +0200
Subject: [libxml-devel] Git repository?
In-Reply-To: <4b6f054f0809280834p38855bd3u740576d50e57703b@mail.gmail.com> References: <710d8140-df83-4e33-893f-39ef0f5ce74b@k13g2000hse.googlegroups.com> <200808222114.24823.erik@hollensbe.org> <20080929162156.GE3610@chronos.sin> <4b6f054f0809280834p38855bd3u740576d50e57703b@mail.gmail.com> Message-ID: <20080929171239.GF3610@chronos.sin> Trans wrote: > > Just pushed: http://github.com/astro/libxml-ruby/ > How do you plan to keep this in sync with the SVN repo? Everybody may use git svn fetch on his own. :-) There's a link to the homepage and I hope people won't take it for the official repository (which is quite a problem with Git). The repo is just a starting point so cloners won't have to fetch the whole SVN history themselves. From cfis at savagexi.com Sun Sep 28 12:08:25 2008 From: cfis at savagexi.com (Charlie Savage) Date: Sun, 28 Sep 2008 10:08:25 -0600 Subject: [libxml-devel] trunk failures and other miscellanea In-Reply-To: <944a03770809251016o27483663q457c0f75e6cdd579@mail.gmail.com> References: <944a03770809251016o27483663q457c0f75e6cdd579@mail.gmail.com> Message-ID: <48DFABF9.3080405@savagexi.com> Hi Michael, > > First and foremost, there are a few tests in trunk that are either > resulting in failure or error. I have attached two patches that fixes > one failure and one error. I was going to post these on libxml-ruby's > bug tracker on rubyforge, however, I wasn't sure if patches for trunk > really belong there..? Sounds good. Put them in the patch tracker that's on the rubyforge site - and I'll take a look at them from there. > While scanning the libxml-ruby forums on Rubyforge I came across this > thread http://rubyforge.org/forum/forum.php?thread_id=24497&forum_id=2129 > that asks about creating an XML::Document from a string. In this > thread Charlie Savage asked if Phillip Bogle could provide a patch to > add the HPricot helper methods. 
I went ahead and created patches for > Phillip's enhancements: > > http://rubyforge.org/tracker/index.php?func=detail&aid=22147&group_id=494&atid=1973 > http://rubyforge.org/tracker/index.php?func=detail&aid=22148&group_id=494&atid=1973 Ah, interesting. Will take a look. > > I've run across a few other typos in the documentation, I'll be > working up patches for these later this week. Excellent - thanks for the patches. Charlie -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/x-pkcs7-signature Size: 3237 bytes Desc: S/MIME Cryptographic Signature URL: From cfis at savagexi.com Sun Sep 28 12:10:43 2008 From: cfis at savagexi.com (Charlie Savage) Date: Sun, 28 Sep 2008 10:10:43 -0600 Subject: [libxml-devel] [PATCH] ruby_xml_sax_parser_new_push_parser In-Reply-To: <20080929000053.GC3610@chronos.sin> References: <20080929000053.GC3610@chronos.sin> Message-ID: <48DFAC83.7000003@savagexi.com> > libxml-ruby's SAX parser interface is only of limited use currently. The > reason is that the document to be parsed must be passed as a whole. This > is somewhat contrary to the actual reason for the use of a SAX parser. > The attached patch adds a new constructor to use the parser in the > so-called "push mode" where it won't complain about the document to be > unfinished when parsing subsequent parts of it. > > My goal was the parsing of a network XML stream (XMPP). Here, the > document is being provided continually over time and I need the SAX > events long before the document ends. :-) > > Unfortunately, the libxml parser seems to buffer a lot and I don't > receive all events of the current string immediately. This behaviour > renders it unsuitable for XMPP and I will stick to the apparently > antique expat binding unless someone comes up with a better idea. Does the underlying libxml library support this usage? If yes, then I'm sure we can figure out how to support it. 
I do wonder though - if you're parsing xml as it is streamed in, how do you avoid ill-formed xml (say only partially finished elements " From stephan at spaceboyz.net Mon Sep 29 14:03:04 2008 From: stephan at spaceboyz.net (Stephan Maka) Date: Mon, 29 Sep 2008 20:03:04 +0200 Subject: [libxml-devel] [PATCH] ruby_xml_sax_parser_new_push_parser In-Reply-To: <48DFAC83.7000003@savagexi.com> References: <20080929000053.GC3610@chronos.sin> <48DFAC83.7000003@savagexi.com> Message-ID: <20080929180304.GG3610@chronos.sin> Charlie Savage wrote: > Does the underlying libxml library support this usage? If yes, then I'm > sure we can figure out how to support it. I've found no appropriate options in libxml2 so far. > I do wonder though - if you're parsing xml as it is streamed in, how do you > avoid ill-formed xml (say only partially finished elements "' before raising the end_element event. BTW, my patch might not be perfect yet as xmlParseChunk() is called with finalize=1 only in the destructor. There's currently no way to reach that from the Ruby side except by waiting for the GC. Stephan From mguterl at gmail.com Sun Sep 28 13:22:36 2008 From: mguterl at gmail.com (Michael Guterl) Date: Sun, 28 Sep 2008 13:22:36 -0400 Subject: [libxml-devel] trunk failures and other miscellanea In-Reply-To: <48DFABF9.3080405@savagexi.com> References: <944a03770809251016o27483663q457c0f75e6cdd579@mail.gmail.com> <48DFABF9.3080405@savagexi.com> Message-ID: <944a03770809281022j5daad506w1ec9876653a541b9@mail.gmail.com> Hi Charlie, 2008/9/28 Charlie Savage : > Hi Michael, > > >> >> First and foremost, there are a few tests in trunk that are either >> resulting in failure or error. I have attached two patches that fixes >> one failure and one error. I was going to post these on libxml-ruby's >> bug tracker on rubyforge, however, I wasn't sure if patches for trunk >> really belong there..? > > Sounds good. 
Put them in the patch tracker that's on the rubyforge site - > and I'll take a look at them from there. > I posted these patches to the tracker. http://rubyforge.org/tracker/index.php?func=detail&aid=22206&group_id=494&atid=1973 http://rubyforge.org/tracker/index.php?func=detail&aid=22208&group_id=494&atid=1973 > > While scanning the libxml-ruby forums on Rubyforge I came across this >> >> thread http://rubyforge.org/forum/forum.php?thread_id=24497&forum_id=2129 >> that asks about creating an XML::Document from a string. In this >> thread Charlie Savage asked if Phillip Bogle could provide a patch to >> add the HPricot helper methods. I went ahead and created patches for >> Phillip's enhancements: >> >> >> http://rubyforge.org/tracker/index.php?func=detail&aid=22147&group_id=494&atid=1973 >> >> http://rubyforge.org/tracker/index.php?func=detail&aid=22148&group_id=494&atid=1973 > > Ah, interesting. Will take a look. >> >> I've run across a few other typos in the documentation, I'll be >> working up patches for these later this week. > > Excellent - thanks for the patches. > Michael Guterl From stephan at spaceboyz.net Sun Sep 28 17:42:22 2008 From: stephan at spaceboyz.net (Stephan Maka) Date: Sun, 28 Sep 2008 23:42:22 +0200 Subject: [libxml-devel] REXML-dropin, a REXML replacement using libxml-ruby Message-ID: <20080928214222.GH3610@chronos.sin> Are you tired of strange REXML behaviour? Do you need twice the performance when dealing with XML? Are you stuck with the REXML API because your code uses it everywhere? You're not alone. I don't mean any offense, and I deeply respect Sean Russell's work and the amount of time that he has put into the project. I won't be able to do better, but have had problems more than one time with REXML. In XMPP4R we're bound to the very intuitive REXML API because that's what XMPP4R is about: dealing with XML. I thought the only way out would be resembling it. 
That's why I have started REXML-dropin, a wrapper around libxml: http://github.com/astro/rexml-dropin/ To be honest, I'm not convinced by libxml-ruby's code, but it looks way cleaner than REXML. So far, I'm able to run a basic XMPP client with it. There's a lot left to be done, so go ahead and fork it! Dependencies: * libxml-parser-ruby: http://www.yoshidam.net/Ruby.html#xmlparser * libxml-ruby beyond 0.8.3/SVN r524: http://libxml.rubyforge.org/ * require 'rexml' first! Stephan Maka From sean at chittenden.org Sun Sep 28 18:21:55 2008 From: sean at chittenden.org (Sean Chittenden) Date: Sun, 28 Sep 2008 15:21:55 -0700 Subject: [libxml-devel] REXML-dropin, a REXML replacement using libxml-ruby In-Reply-To: <20080928214222.GH3610@chronos.sin> References: <20080928214222.GH3610@chronos.sin> Message-ID: > Are you tired of strange REXML behaviour? Do you need twice the > performance when dealing with XML? Are you stuck with the REXML API > because your code uses it everywhere? You're not alone. I've long fantasized about this day. Nice work! This'll likely bring additional eyes to the code too, very big win for all! -sc -- Sean Chittenden sean at chittenden.org From cfis at savagexi.com Sun Sep 28 20:25:41 2008 From: cfis at savagexi.com (Charlie Savage) Date: Sun, 28 Sep 2008 18:25:41 -0600 Subject: [libxml-devel] REXML-dropin, a REXML replacement using libxml-ruby In-Reply-To: <20080928214222.GH3610@chronos.sin> References: <20080928214222.GH3610@chronos.sin> Message-ID: <48E02085.8010609@savagexi.com> > To be honest, I'm not convinced by libxml-ruby's code, but it looks > way cleaner than REXML. So far, I'm able to run a basic XMPP client > with it. There's a lot left to be done, so go ahead and fork it!
All suggestions on improving libxml-ruby are of course appreciated...Some of the things that I think would be good are: * Better error handling (I have a bunch of changes that I need to check in) * Better namespace handling - the current api is confusing * Internal cleanup of reading data from files, strings or documents * When possible move code from C to Ruby to make it more accessible to people Charlie From rakaur at malkier.net Tue Sep 30 12:55:08 2008 From: rakaur at malkier.net (Eric Will) Date: Tue, 30 Sep 2008 12:55:08 -0400 Subject: [libxml-devel] SAX error handling Message-ID: <1ce38ef40809300955i348ff34aqea76bfa0e0c53b22@mail.gmail.com> Is there a way to get a nice useful error like you get with XML::Parser but with XML::SaxParser? The whole on_parser_error() thing is almost totally useless for identifying where the problem is. I need to find the spot the error is, and cache that tag for later.
Thanks Eric Will // rakaur