[libxml-devel] [ libxml-Bugs-22909 ] LibXML::XML::XPath::Object segfault (null pointer) on x86-64

noreply at rubyforge.org noreply at rubyforge.org
Mon Nov 24 14:11:29 EST 2008


Bugs item #22909, was opened at 2008-11-20 12:47
You can respond by visiting: 
http://rubyforge.org/tracker/?func=detail&atid=1971&aid=22909&group_id=494

Category: General
Group: None
>Status: Closed
>Resolution: Accepted
Priority: 3
Submitted By: Bradley Buda (bradleybuda)
>Assigned to: Charlie Savage (cfis)
Summary: LibXML::XML::XPath::Object segfault (null pointer) on x86-64

Initial Comment:
This script results in a ruby "[BUG] Segmentation fault" on a 64-bit machine, but works on a 32-bit machine:

----

#!/usr/bin/ruby

require 'rubygems'
require 'libxml'

x = LibXML::XML::Parser.string("<root />").parse
x.find("/root") # if you comment out this line, script will NOT segfault
x.find("/root").length  # segfault occurs here

----

valgrind and gdb agree on this stack trace:
#0  0x00002aaaaca47bc7 in ruby_xml_xpath_object_empty_q (self=46912524542400) at ruby_xml_xpath_object.c:174
#1  0x00002aaaaca47c59 in ruby_xml_xpath_object_length (self=46912524542400) at ruby_xml_xpath_object.c:242
#2  0x00002aaaaacff48f in ?? () from /usr/lib/libruby1.8.so.1.8
#3  0x00002aaaaacff7b8 in ?? () from /usr/lib/libruby1.8.so.1.8
#4  0x00002aaaaad055b7 in ?? () from /usr/lib/libruby1.8.so.1.8
#5  0x00002aaaaad0dbbb in ?? () from /usr/lib/libruby1.8.so.1.8
#6  0x00002aaaaad0dc05 in ruby_exec () from /usr/lib/libruby1.8.so.1.8
#7  0x00002aaaaad0dc30 in ruby_run () from /usr/lib/libruby1.8.so.1.8
#8  0x0000000000400883 in main ()

Unfortunately I don't know enough about the Ruby C API to understand what's going wrong here.  My environment:

(note that this is a Xen node on Amazon EC2)
$ uname -a
Linux ...compute-1.amazonaws.com 2.6.21.7-2.fc8xen #1 SMP Fri Feb 15 12:34:28 EST 2008 x86_64 GNU/Linux

$ lsb_release  -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 8.04.1
Release:        8.04
Codename:       hardy

$ ruby -v
ruby 1.8.6 (2007-09-24 patchlevel 111) [x86_64-linux]

$ gem list --local libxml-ruby

*** LOCAL GEMS ***

libxml-ruby (0.9.2)

$ aptitude show libxml2-dev
Package: libxml2-dev
State: installed
Automatically installed: yes
Version: 2.6.31.dfsg-2ubuntu1.3
...

----------------------------------------------------------------------

>Comment By: Charlie Savage (cfis)
Date: 2008-11-24 12:11

Message:
Ok, good.  I will close this bug then.

----------------------------------------------------------------------

Comment By: Bradley Buda (bradleybuda)
Date: 2008-11-24 09:30

Message:
Yes, the original test script works just fine on the same
amd64 host with gem version 0.9.3.

Thank you again for the amazingly quick response and fix!

----------------------------------------------------------------------

Comment By: Jens Wille (jwille)
Date: 2008-11-24 02:20

Message:
excellent, charlie! now we're happy again :-)

----------------------------------------------------------------------

Comment By: Charlie Savage (cfis)
Date: 2008-11-22 02:41

Message:
Hi everyone,

Try the 0.9.3 release; hopefully it fixes both the compile issue and the segmentation fault.

----------------------------------------------------------------------

Comment By: Jens Wille (jwille)
Date: 2008-11-21 04:34

Message:
hi charlie!

i had the same error on our 64-bit machines. it has been fixed in r610, great! but now (introduced in r611) i get several static vs. non-static mismatches:

ruby_xml_xpointer.c:20: error: static declaration of ‘rxml_xpointer_point’ follows non-static declaration
ruby_xml_xpointer.h:23: error: previous declaration of ‘rxml_xpointer_point’ was here

adding the appropriate statics to the header files as well fixes this. i can provide a patch, but it's probably not worth it ;-)
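
For anyone hitting the same compile error, here is a minimal sketch of the consistent form. It is an illustration only: `model_point` is a hypothetical stand-in for rxml_xpointer_point, and both the "header" prototype and the definition are shown in one file so it compiles on its own.

```c
#include <assert.h>

/* Models the header side. The prototype must carry the same linkage as
 * the definition. Writing `int model_point(int offset);` here, without
 * `static`, reproduces the "static declaration follows non-static
 * declaration" error. */
static int model_point(int offset);

/* Models the implementation side. The body is a placeholder; only the
 * matching `static` on both declarations matters for this sketch. */
static int model_point(int offset)
{
    assert(offset >= 0);
    return offset + 1;
}
```

Dropping `static` from both places would also make the two declarations agree; which one is right depends on whether other source files need to call the function.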

so will there be a new release soon? because we can't use libxml-ruby > 0.8.3 because of that. anyway, let me add that your work is really awesome! we're relying pretty heavily on your libxml bindings :-)

cheers
jens

p.s.: sorry if i'm hijacking this thread. i just thought it's too tiny an issue to open up a new ticket ;-)

----------------------------------------------------------------------

Comment By: Bradley Buda (bradleybuda)
Date: 2008-11-21 00:42

Message:
This is an AMD box.

I had a fair amount of time to play with this today.  I
learned that there's definitely something weird going on
with compiler optimizations.  I had been compiling
libxml-ruby's C extensions with -O2 (this was the default,
set by rbconfig.rb on my Ubuntu box).  I found that changing
the compile flag to -O0 for JUST ruby_xml_xpath_object.c
caused the segfault to go away.  Setting -O1 for
ruby_xml_xpath_object.c makes the segfault reappear.  I
tried each of the piecemeal -f... optimization options, but
none seemed to make a difference.

In my debugging I also accidentally discovered a printf
'patch' that makes the bug go away, at any optimization
level (see attached).  The printf statement must be
preventing the compiler from performing the harmful
optimization.  Of course this isn't a real patch, but it
might provide a clue for someone who knows more about these
things than I do.

For now, I can work around the issue by compiling with -O0.
 I wish I could give a better explanation for why this is
happening, but I'm getting in pretty far over my head here
:-).  I attached the generated assembler code at -O0 and -O1
as well just in case some compiler guru happens to stumble
on this bug.

----------------------------------------------------------------------

Comment By: Charlie Savage (cfis)
Date: 2008-11-20 13:48

Message:
FYI - is this an Intel or AMD box?  As for those old threads, I think this is different (libxml-ruby's internal architecture is much, much different than it used to be).

----------------------------------------------------------------------

Comment By: Charlie Savage (cfis)
Date: 2008-11-20 13:44

Message:
Hey Bradley,

Sounds good.  From the bug, it looks like this is what is happening:

ruby_xml_xpath_object_empty_q, line 169

  /* Get the c object that the ruby object (self) is wrapping */
  Data_Get_Struct(self,xmlXPathObject,xpop);

/* Looks like xpop is null here when it should not be, but how can that be?  So verify xpop really is null */
  if (xpop->type != XPATH_NODESET)
    return Qnil;
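
A defensive version of that check could look roughly like the sketch below. It is a standalone model, not the libxml-ruby source: `model_xpath_object`, `model_wrapper` and `model_empty_q` are hypothetical stand-ins for xmlXPathObject, the wrapping Ruby object and ruby_xml_xpath_object_empty_q, so it builds without ruby.h or the libxml2 headers. The -1 return is also an assumption made for the model.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for libxml2's XPath result type tag. */
typedef enum { MODEL_XPATH_UNDEFINED = 0, MODEL_XPATH_NODESET = 1 } model_xpath_type;

/* Hypothetical stand-in for xmlXPathObject. */
typedef struct {
    model_xpath_type type;
    int node_count;             /* models the number of nodes in the set */
} model_xpath_object;

/* Hypothetical stand-in for the Ruby object that wraps the C struct. */
typedef struct {
    model_xpath_object *data;   /* what Data_Get_Struct would hand back */
} model_wrapper;

/* Returns 1 if the node set is empty, 0 if it is not, and -1 if the
 * wrapped pointer is NULL or the result is not a node set. */
int model_empty_q(const model_wrapper *self)
{
    const model_xpath_object *xpop;

    assert(self != NULL);       /* the wrapper itself is always supplied */
    xpop = self->data;

    if (xpop == NULL)           /* guard before touching xpop->type */
        return -1;
    if (xpop->type != MODEL_XPATH_NODESET)
        return -1;
    return xpop->node_count <= 0;
}
```

A guard like this would only turn the crash into an error value; it would not explain why the pointer is NULL in the first place.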

To help you understand the code:

doc.find:

1.  In Ruby, create a new XPathContext object.
2.  In C, call find, which computes the XPath result.
3.  Return an XPathObject instance wrapped by a Ruby object.

Since you do that twice, you have two XPathContext objects and two XPathObject results.  What is weird is that the first one shouldn't matter, since you don't use it at all.  It may or may not be freed, depending on the GC (actually it would be really surprising for it to be freed so quickly).

The various scenarios (none that plausible) I can think of:

* The first XPathObject is freed, and somehow deletes its associated document (shouldn't happen of course) making the 2nd xpath object invalid

* The second XPathObject is in fact the same as the first.  Not sure how that could be, unless libxml is caching XPath return results (it's not, as far as I know)

* Freeing the first XPathObject somehow corrupts the second.

Or none of the above, but the only way to find out is to do some digging with gdb, I think.
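
To illustrate why the first scenario "shouldn't happen", here is a small standalone model of a reference-counting scheme between XPath results and their document. Every name in it (`model_doc`, `model_find`, `model_result_free`) is hypothetical and the code is not taken from ruby_xml_document.c; it only shows the invariant that freeing one result must leave the document alive while another result still refers to it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model of a document kept alive by a reference count. */
typedef struct {
    int refcount;
    int freed;      /* set instead of calling free(), so tests can inspect it */
} model_doc;

/* Hypothetical model of an XPath result that retains its document. */
typedef struct {
    model_doc *doc;
} model_xpath_result;

/* Drop one reference; mark the document freed when the count hits zero. */
static void model_doc_release(model_doc *doc)
{
    assert(doc->refcount > 0);
    if (--doc->refcount == 0)
        doc->freed = 1;
}

/* Each "find" creates a new result that takes its own reference. */
model_xpath_result *model_find(model_doc *doc)
{
    model_xpath_result *res = malloc(sizeof *res);
    if (res == NULL)
        return NULL;
    doc->refcount++;
    res->doc = doc;
    return res;
}

/* Freeing a result releases only that result's reference. */
void model_result_free(model_xpath_result *res)
{
    model_doc_release(res->doc);
    free(res);
}
```

In this model, two calls to `model_find` followed by freeing the first result leave the count above zero, so the second result's document pointer stays valid. If the real counts were ever off by one, the first scenario above would become possible.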

Thanks for your help.

----------------------------------------------------------------------

Comment By: Bradley Buda (bradleybuda)
Date: 2008-11-20 13:35

Message:
Yes, it's 100% consistent.  I haven't gotten any further
with LibXML, so I don't know if there are other test cases
that would show similar results - I can try to put something
together.

Thanks for the pointer to ruby_xml_document.c - I can look
at that code as a start.  I know C (I'm a bit rusty); it's
just the Ruby API that I don't know as well.

In my random Googling I found this (old) thread and patch:
http://rubyforge.org/pipermail/libxml-devel/2007-March/000288.html
http://rubyforge.org/pipermail/libxml-devel/attachments/20070309/a8c53f37/attachment.obj

Any guesses as to whether or not this could be in the same
class of problems?

I should have some time soon (maybe this weekend?) to dig
deeper into the code and start to understand how the
allocation and garbage collection works.  I'll update the
bug with whatever I figure out.

Thanks for the quick reply.

----------------------------------------------------------------------

Comment By: Charlie Savage (cfis)
Date: 2008-11-20 13:01

Message:
Hi Bradley,

Boy, that's interesting.  So it always happens, without fail?  Do you see other things like that happening?

My best guess is that somehow the reference counting scheme that is used between xpath objects and documents is broken on 64-bit machines (it's in ruby_xml_document.c, the top 150 lines or so).

I don't have any 64-bit machines set up here, so I'm not sure how to debug this.  Can you recompile code on EC2?  Are you a C hacker and have time to work through this?  Just trying to figure out how to proceed.

Thanks for the great bug report and stack trace, very helpful.



----------------------------------------------------------------------
