From marc at bloodnok.com Tue Mar 6 15:57:45 2007
From: marc at bloodnok.com (Marc Munro)
Date: Tue, 06 Mar 2007 12:57:45 -0800
Subject: [libxml-devel] Odd attribute semantics
Message-ID: <1173214665.15303.28.camel@bloodnok.com>
While trying to create shallow copies of nodes I have run into some
oddities.
Firstly, I have been unable to access each attribute of a node except
through XPath (as coded in each_attribute, below). This is frustrating
and seems to contradict the documentation for Node.properties and
Node.each_attr.
More strangely, my each_attribute method only works for nodes returned
from a scan of the document, and not from the original nodes as added to
the document. The following snippet should illustrate this:
  node1 = XML::Node.new('node1')
  node1['x'] = 'y'
  node1['z'] = 'a'
  doc.root << node1
  node1.each_attribute do |name, val|
    # Will never reach this point!
  end
  doc.each do |node|
    if node.name == 'node1'
      node.each_attribute do |name, val|
        # Reaches this point for each attribute
      end
    end
  end
There is no obvious difference between the 'node1' object, and the
'node' object yielded by doc.each. They both print the same and both
yield similar looking doc objects, yet they have puzzlingly different
semantics.
I write in the hope that someone can explain, document and/or correct
this, and if not, to ensure that this strange behaviour is at least
described in this list.
Here is a simple test case to illustrate the issue.
require 'xml/libxml'

def each_attribute(node)
  begin
    attrs = node.find('./@*')
  rescue TypeError
    # Do nothing. This error is mistakenly? raised when no matching
    # elements can be found
    return
  end
  attrs.each do |elem|
    yield elem.name, elem.value
  end
end

def shallowcopy(old)
  node = XML::Node.new(old.name)
  each_attribute(old) do |name, value|
    node[name] = value
  end
  node
end

doc = XML::Document.new
root = XML::Node.new('root')
node1 = XML::Node.new('node1')
node2 = XML::Node.new('node2')
node1['type'] = 'x'
node2['type'] = 'y'
node1['value'] = 'xval'
node2['value'] = 'yval'
doc.root = root
root << node1
root << node2

puts "NODE1 : #{node1} :NODE1"
puts "COPY OF NODE1: #{shallowcopy(node1)} :COPY OF NODE1"
doc.root.each_child do |node|
  if node.name == 'node1'
    puts "NODE1 : #{node1} :NODE1"
    puts "COPY OF NODE1: #{shallowcopy(node)} :COPY OF NODE1"
  end
end
And this is the output generated. Note that the first copy of node1 has
no attributes, while the second does.
NODE1 : :NODE1
COPY OF NODE1: :COPY OF NODE1
NODE1 : :NODE1
COPY OF NODE1: :COPY OF NODE1
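For comparison, the result the test case expects from a shallow copy can be stated without libxml at all. The following is only a sketch under stated assumptions: ToyNode and shallow_copy are hypothetical names invented for illustration and are not part of libxml-ruby. It shows that a copy taken from the original object and a copy taken from the object found by walking the tree should be identical, carrying the name and attributes but none of the children.

```ruby
# ToyNode is a hypothetical stand-in for an XML element.
# It is NOT a libxml-ruby class; it only models name, attributes and children.
ToyNode = Struct.new(:name, :attributes, :children) do
  def initialize(name, attributes = {}, children = [])
    super(name, attributes, children)
  end
end

# Shallow copy: same name, same attribute pairs, no children.
def shallow_copy(old)
  copy = ToyNode.new(old.name)
  old.attributes.each { |name, value| copy.attributes[name] = value }
  copy
end

root  = ToyNode.new('root')
node1 = ToyNode.new('node1', { 'type' => 'x', 'value' => 'xval' })
node1.children << ToyNode.new('child')
root.children << node1

# Copy made from the object we built ourselves ...
direct = shallow_copy(node1)
# ... and from the object found by scanning the tree.
scanned = shallow_copy(root.children.find { |n| n.name == 'node1' })
```

In this toy model both copies compare equal, which is the behaviour the libxml test case above appears to expect from node1 and from the node yielded by each_child.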
__
Marc
From shimbo at is.naist.jp Thu Mar 8 12:19:57 2007
From: shimbo at is.naist.jp (Masashi Shimbo)
Date: Fri, 09 Mar 2007 02:19:57 +0900
Subject: [libxml-devel] [patch] Fix GC-related segfault on amd64
(libxml-ruby 0.3.8.4)
Message-ID: <87irdbzm82.wl%shimbo@is.naist.jp>
Hi,
Attached is a patch (against libxml-ruby 0.3.8.4) that should fix
segmentation faults in XML::Document.file.
The patch hasn't been tested much, but the problem described in
http://rubyforge.org/tracker/index.php?func=detail&aid=8337&group_id=494&atid=1971
(originally reported for freebsd-amd64) goes away on my Ubuntu Linux
6.10 amd64 box (with ruby 1.8.4 / gcc 4.1.2).
Alternatively, it is possible to avoid the problem by turning off the
compiler optimization altogether (by providing gcc with -O0), even without
the patch.
Masashi Shimbo
-------------- next part --------------
A non-text attachment was scrubbed...
Name: libxml-ruby-0.3.8.4.patch
Type: application/octet-stream
Size: 365 bytes
Desc: not available
Url : http://rubyforge.org/pipermail/libxml-devel/attachments/20070309/a8c53f37/attachment.obj
From transfire at gmail.com Sun Mar 18 08:20:02 2007
From: transfire at gmail.com (TRANS)
Date: Sun, 18 Mar 2007 08:20:02 -0400
Subject: [libxml-devel] Odd attribute semantics
In-Reply-To: <1173214665.15303.28.camel@bloodnok.com>
References: <1173214665.15303.28.camel@bloodnok.com>
Message-ID: <4b6f054f0703180520qc169ccfv8957a6a1063d02e0@mail.gmail.com>
Marc, do you know C? Think you could fix it?
T.
From marc at bloodnok.com Sun Mar 18 15:23:58 2007
From: marc at bloodnok.com (Marc Munro)
Date: Sun, 18 Mar 2007 12:23:58 -0700
Subject: [libxml-devel] Odd attribute semantics
In-Reply-To: <4b6f054f0703180520qc169ccfv8957a6a1063d02e0@mail.gmail.com>
References: <1173214665.15303.28.camel@bloodnok.com>
<4b6f054f0703180520qc169ccfv8957a6a1063d02e0@mail.gmail.com>
Message-ID: <1174245838.15634.19.camel@bloodnok.com>
Yes, I do know C. I don't know the libraries that all of this is based
on though, and don't really have the time available right now to work on
it. My free time is going on an as-yet-unpublished free-software
project which I want to concentrate on for now.
I don't actually need a fix for my project as I have found workarounds.
Actually I found this mostly as the result of some bad practice on my
part, so I'm not sure it's a bug so much as an undocumented feature. For
me, it would have been enough to have better documentation. It does
seem though that the attribute methods could use some general attention.
Far more significant in my view are bugs 9134 and 9135.
Thanks
__
Marc
On Sun, 2007-18-03 at 08:20 -0400, TRANS wrote:
> Marc, do you know C? Think you could fix it?
>
> T.
From transfire at gmail.com Mon Mar 19 10:28:03 2007
From: transfire at gmail.com (TRANS)
Date: Mon, 19 Mar 2007 10:28:03 -0400
Subject: [libxml-devel] Odd attribute semantics
In-Reply-To: <1174245838.15634.19.camel@bloodnok.com>
References: <1173214665.15303.28.camel@bloodnok.com>
<4b6f054f0703180520qc169ccfv8957a6a1063d02e0@mail.gmail.com>
<1174245838.15634.19.camel@bloodnok.com>
Message-ID: <4b6f054f0703190728i76388135s7ed60e00fa1052c5@mail.gmail.com>
On 3/18/07, Marc Munro wrote:
> Yes, I do know C. I don't know the libraries that all of this is based
> on though, and don't really have the time available right now to work on
> it. My free time is going on an as-yet-unpublished free-software
> project which I want to concentrate on for now.
>
> I don't actually need a fix for my project as I have found workarounds.
> Actually I found this mostly as the result of some bad practice on my
> part, so I'm not sure it's a bug so much as an undocumented feature. For
> me, it would have been enough to have better documentation. It does
> seem though that the attribute methods could use some general attention.
>
> Far more significant in my view are bugs 9134 and 9135.
Shoot. I was hopeful. I'm not a C coder myself, but libxml is a very
important library to me. Ross has brought the bindings along nicely,
but he seems to be pretty busy. I hardly even see him on the mailing
list these days. I just wish we could get some of these bugs fixed.
I'm even wondering if bounties would help.
T.
From deliverable at gmail.com Thu Mar 22 16:18:56 2007
From: deliverable at gmail.com (Alexy Khrabrov)
Date: Thu, 22 Mar 2007 13:18:56 -0700
Subject: [libxml-devel] XML::HTMLParser docs
Message-ID: <7c737f300703221318jed9b179y2a0e9668b6d8210d@mail.gmail.com>
Greetings -- I'm trying to make XML::HTMLParser parse a file, not a
string, and when looking over the docs, I see no HTMLParser at all --
the docs on the site are apparently for 0.3. I'm rather new to Ruby docs,
and see the distro has mostly C source and header files in it; how are
those docs generated on the site, and how can I generate them from the
source -- specifically, to see all methods available for XML::HTMLParser?
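Independently of how the site docs are built, Ruby itself can report which methods a loaded class defines. The sketch below is only an illustration: own_methods and SampleParser are hypothetical names, and SampleParser stands in for the real class so that the snippet runs without the extension installed. With libxml loaded, the same helper could presumably be called as own_methods(XML::HTMLParser).

```ruby
# Lists the public methods a class defines itself, leaving out inherited ones.
def own_methods(klass)
  {
    class_methods:    klass.singleton_methods(false).sort,
    instance_methods: klass.public_instance_methods(false).sort
  }
end

# SampleParser is a hypothetical stand-in, used only so the example is runnable.
class SampleParser
  def self.open(path)
    new
  end

  def parse; end

  def close; end
end

listing = own_methods(SampleParser)
listing.each do |kind, names|
  puts "#{kind}: #{names.join(', ')}"
end
```

This only shows method names, not their arguments or descriptions, so it complements the generated documentation rather than replacing it.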
Cheers,
Alexy
From stefan.lauer at hps-technologies.de Thu Mar 29 01:33:22 2007
From: stefan.lauer at hps-technologies.de (stefan lauer)
Date: Thu, 29 Mar 2007 07:33:22 +0200
Subject: [libxml-devel] ctrlA characters in the xml file
Message-ID: <8A71002E-11E1-4E0C-B3AE-B1E2B8B62BF4@hps-technologies.de>
Hello,
On my server (AMD x86_64), ruby 1.8.4 and libXML 0.3.8.4 are installed. I
read in large XML files (my test file is 7 MB and roughly 120,000
lines), process them (that is, delete some nodes/elements) and write
them out again. The output file sometimes contains unexpected ctrl+A
characters.
I also tried ruby 1.8.5 with libXML 0.3.8.2, and the crossed-over
combinations. The result is always the same.
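To narrow down where the stray characters appear, the output file can be scanned for the ctrl+A byte (0x01). This is a minimal sketch, not part of libxml: find_ctrl_a is a hypothetical helper name, and the sample data is made up for illustration.

```ruby
require 'stringio'

# Returns [line_number, column] pairs (both 1-based) for every
# ctrl+A byte (0x01) found in the given IO object.
def find_ctrl_a(io)
  needle = "\x01".b
  hits = []
  io.each_line.with_index(1) do |line, lineno|
    bytes = line.b            # work on raw bytes, whatever the encoding
    offset = 0
    while (idx = bytes.index(needle, offset))
      hits << [lineno, idx + 1]
      offset = idx + 1
    end
  end
  hits
end

# Made-up sample: the second line is cut off after a ctrl+A byte.
sample = StringIO.new("<a>ok</a>\n<b>te\x01\n<c/>\n")
find_ctrl_a(sample).each do |lineno, column|
  puts "ctrl+A at line #{lineno}, column #{column}"
end
```

For a real file, the same helper could be used as File.open(path, 'rb') { |f| find_ctrl_a(f) }, where path is whatever output file is being checked.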
Sometimes the ctrl+A is in the middle of an element tag and the rest of
the tag is missing. For example, it looks like this:
normal:
text
with ctrlA