[libxml-devel] [ libxml-Bugs-22531 ] XML::Reader does not work with sockets

noreply at rubyforge.org noreply at rubyforge.org
Mon Nov 24 14:10:53 EST 2008


Bugs item #22531, was opened at 2008-10-23 07:06
You can respond by visiting: 
http://rubyforge.org/tracker/?func=detail&atid=1971&aid=22531&group_id=494

Category: None
Group: None
Status: Open
Resolution: None
Priority: 3
Submitted By: Han Holl (hanholl)
Assigned to: Nobody (None)
Summary: XML::Reader does not work with sockets

Initial Comment:
I tried with the two following programs:
A server:
#!/usr/bin/ruby -w

require 'rubygems'
require 'xml'
require 'socket'


server = TCPServer.new(22222)
while session = server.accept
  reader = XML::Reader.io(session)
  loop do
    rsl = reader.read
    puts rsl
    break if rsl != 1
    puts reader.expand
  end
  session.puts 'ok'
  sleep 1
  session.close
end
# end-of-server
and a client:
#!/usr/bin/ruby

require 'socket'

t = TCPSocket.new('localhost', 22222)

t.puts '<doc><a>k</a></doc>'
puts t.gets
# end-of-client

Output from server:
Entity: line 1:
parser
error :
Extra content at the end of the document

^
-1

I tried different platforms (RH9, CentOS 5.1 and Fedora8).
libxml-ruby-0.8.3
Various libxml2 versions

----------------------------------------------------------------------

>Comment By: Charlie Savage (cfis)
Date: 2008-11-24 12:10

Message:
Hi Han,

Look at ruby_xml_input.c.  The relevant code is this:

/* This method is called by libxml when it wants to read
   more data from a stream. We go with the duck typing
   solution to support StringIO objects. */
int rxml_read_callback(void *context, char *buffer, int len) {
  VALUE io = (VALUE)context;
  VALUE string = rb_funcall(io, READ_METHOD, 1, INT2NUM(len));
  int size;
 
  if(string == Qnil)
    return 0;
  
  size = RSTRING_LEN(string);
  memcpy(buffer, StringValuePtr(string), size);

  return size;
}

What this means:

1.  Pass an io object to libxml (reader = Reader.io(my_io))
2.  When it needs more data, libxml will call:
my_io.read(<length>)
3.  my_io should implement

def read(length)
end

Make more sense?

I agree with you - I'd hope libxml only calls read when it needs more data (presumably after you've called reader.read).
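
A minimal sketch of an object that satisfies step 3, assuming the goal is to hand libxml whatever bytes have already arrived instead of waiting for a full buffer. The class name PartialReadIO is hypothetical and not part of libxml-ruby; it relies only on Ruby's standard IO#readpartial, and a pipe stands in for the socket:

```ruby
# Hypothetical adapter (illustrative name, not part of libxml-ruby).
# It wraps any IO and exposes the read(length) method the callback expects.
class PartialReadIO
  def initialize(io)
    @io = io
  end

  # Return at most `length` bytes as soon as any are available,
  # or nil at end of stream (the C callback above maps nil to 0).
  def read(length)
    @io.readpartial(length)
  rescue EOFError
    nil
  end
end

# A pipe stands in for the socket so the sketch runs on its own.
rd, wr = IO.pipe
wr.write('<doc><a>k</a></doc>')

adapter = PartialReadIO.new(rd)
chunk = adapter.read(4096) # returns the bytes written so far, not 4096 bytes
wr.close
eof = adapter.read(4096)   # nil once the writer has closed
```

Whether this is enough for XML::Reader to return each node as it arrives also depends on how much lookahead libxml2 itself requests, which this sketch does not address.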

----------------------------------------------------------------------

Comment By: Han Holl (hanholl)
Date: 2008-11-24 03:32

Message:
I don't quite understand. I hoped, reading the libxml2 docs,
that I could read a node from a stream. IOW that the reader
would return as soon as it saw that it had received a
complete node.
I don't know the length of the next node; the producer knows.
Isn't this the relevant libxml2 function?

int xmlTextReaderRead (xmlTextReaderPtr reader)

Moves the position of the current instance to the next node
in the stream, exposing its properties.
reader:	the xmlTextReaderPtr used
Returns: 1 if the node was read successfully, 0 if there is
no more nodes to read, or -1 in case of error
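
For illustration only, the three documented return codes could be told apart like this. The helper name read_status is hypothetical; only the 1 / 0 / -1 values come from the text quoted above:

```ruby
# Hypothetical helper: maps the documented xmlTextReaderRead return codes
# to symbols so a caller can distinguish "finished" from "failed".
def read_status(code)
  case code
  when 1  then :node            # a node was read successfully
  when 0  then :end_of_document # no more nodes to read
  when -1 then :error           # parse or I/O error
  else raise ArgumentError, "unexpected return code: #{code.inspect}"
  end
end
```

A loop that only tests for "not 1", as in the server from the initial comment, cannot tell a clean end of document from an error.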

----------------------------------------------------------------------

Comment By: Charlie Savage (cfis)
Date: 2008-11-23 18:39

Message:
Yes, I would assume this would block as you read more data from the socket.  But I would hope you could incrementally process data.

One thing I missed was that the read method should take a length.  Libxml will tell you how much data it wants, and you can't return more than that.  That makes the interface a bit harder to deal with, unless you can read x bytes off the socket.

So does setting the length help?

----------------------------------------------------------------------

Comment By: Han Holl (hanholl)
Date: 2008-11-22 06:52

Message:
Hello Charlie,

Thanks a lot for this effort. Unfortunately it doesn't yet
work for me. I read the libxml2 C docs and had the
impression that XML::Reader.read should return as soon as it
had read a complete XML node.
If you have a look at the example I opened this discussion
with, here's a strace of the server.
 
read(4, "<doc><a>k</a></doc>", 4096)    = 19
sigprocmask(SIG_BLOCK, NULL, [])        = 0
sigprocmask(SIG_BLOCK, NULL, [])        = 0
read(4, "\n", 4096)                     = 1
read(4,

And there it hangs hungry for more.
Maybe I've got the documentation wrong?
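
One plausible contributor to the hang in the strace above, assuming the binding ends up calling Ruby's IO#read with the requested length: IO#read(4096) keeps reading until it has 4096 bytes or sees end of file, and the client never closes its end because it is waiting on gets. A sketch with a pipe standing in for the socket:

```ruby
rd, wr = IO.pipe
# The writer stays open, like the client that is waiting for a reply.
wr.write("<doc><a>k</a></doc>\n")

# Run the blocking read in a thread so the sketch itself cannot hang.
blocking = Thread.new { rd.read(4096) }

# join with a timeout returns nil while the thread is still running.
still_waiting = blocking.join(0.5).nil?

# Only end of file (or 4096 bytes) lets read(4096) return.
wr.close
data = blocking.value
```

Here still_waiting is true even though the whole document has arrived, and data is handed over only once the writer closes.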

Cheers,

Han Holl



----------------------------------------------------------------------

Comment By: Charlie Savage (cfis)
Date: 2008-11-22 02:46

Message:
Hi Han,

Ok, upgrade to the latest libxml.  Then use:

reader = Reader.io(io_object)

That io object has to respond to read.  So:

class SocketIO
  def initialize(server)
    @server = server
  end

  def read
    @server.accept
  end
end

And that should do the trick.

----------------------------------------------------------------------

Comment By: Han Holl (hanholl)
Date: 2008-11-17 05:16

Message:
I wouldn't know where to begin. Before I tried I had a look
at the code, ruby_xml_reader_new_io(int argc, VALUE *argv,
VALUE self), and had the strong impression that it _was_
implemented.
I have no idea what is missing.
C and ruby extensions are by no means my strong suit, I
prefer ruby <g>.
My C is read-only, with the exception of the occasional
small patch

----------------------------------------------------------------------

Comment By: Charlie Savage (cfis)
Date: 2008-11-15 16:45

Message:
Yes, that is not going to work.  Libxml does provide its own socket implementation, but that is not exposed via the ruby bindings.  Want to put together a patch?

----------------------------------------------------------------------
