From marc at bloodnok.com Tue Mar 6 15:57:45 2007
From: marc at bloodnok.com (Marc Munro)
Date: Tue, 06 Mar 2007 12:57:45 -0800
Subject: [libxml-devel] Odd attribute semantics
Message-ID: <1173214665.15303.28.camel@bloodnok.com>

While trying to create shallow copies of nodes I have run into some
oddities.

Firstly, I have been unable to access each attribute of a node except
through XPath (as coded in each_attribute, below). This is frustrating
and seems to contradict the documentation for Node.properties and
Node.each_attr.

More strangely, my each_attribute method only works for nodes returned
from a scan of the document, and not for the original nodes as added to
the document. The following snippet should illustrate this:

  node1 = XML::Node.new('node1')
  node1['x'] = 'y'
  node1['z'] = 'a'
  doc.root << node1
  node1.each_attribute do |name, val|
    # Will never reach this point!
  end

  doc.each do |node|
    if node.name == 'node1'
      node.each_attribute do |name, val|
        # Reaches this point for each attribute
      end
    end
  end

There is no obvious difference between the 'node1' object and the
'node' object yielded by doc.each. They both print the same and both
yield similar-looking doc objects, yet they have puzzlingly different
semantics.

I write in the hope that someone can explain, document and/or correct
this, and if not, to ensure that this strange behaviour is at least
described on this list.

Here is a simple test case to illustrate the issue.

require 'xml/libxml'

def each_attribute(node)
  begin
    attrs = node.find('./@*')
  rescue TypeError
    # Do nothing. This error is (mistakenly?) raised when no matching
    # elements can be found
    return
  end
  attrs.each do |elem|
    yield elem.name, elem.value
  end
end

def shallowcopy(old)
  node = XML::Node.new(old.name)
  each_attribute(old) do |name, value|
    node[name] = value
  end
  node
end

doc = XML::Document.new
root = XML::Node.new('root')
node1 = XML::Node.new('node1')
node2 = XML::Node.new('node2')
node1['type'] = 'x'
node2['type'] = 'y'
node1['value'] = 'xval'
node2['value'] = 'yval'
doc.root = root
root << node1
root << node2

puts "NODE1 : #{node1} :NODE1"
puts "COPY OF NODE1: #{shallowcopy(node1)} :COPY OF NODE1"

doc.root.each_child do |node|
  if node.name == 'node1'
    puts "NODE1 : #{node1} :NODE1"
    puts "COPY OF NODE1: #{shallowcopy(node)} :COPY OF NODE1"
  end
end

And this is the output generated. Note that the first copy of node1 has
no attributes, while the second does.

NODE1 : :NODE1
COPY OF NODE1: :COPY OF NODE1
NODE1 : :NODE1
COPY OF NODE1: :COPY OF NODE1

__
Marc
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
Url : http://rubyforge.org/pipermail/libxml-devel/attachments/20070306/6c467cf3/attachment.bin

From shimbo at is.naist.jp Thu Mar 8 12:19:57 2007
From: shimbo at is.naist.jp (Masashi Shimbo)
Date: Fri, 09 Mar 2007 02:19:57 +0900
Subject: [libxml-devel] [patch] Fix GC-related segfault on amd64 (libxml-ruby 0.3.8.4)
Message-ID: <87irdbzm82.wl%shimbo@is.naist.jp>

Hi,

Attached is a patch (against libxml-ruby 0.3.8.4) that should fix
segmentation faults in XML::Document.file.

The patch hasn't been tested much, but the problem described in
http://rubyforge.org/tracker/index.php?func=detail&aid=8337&group_id=494&atid=1971
(originally reported for freebsd-amd64) goes away on my Ubuntu Linux
6.10 amd64 box (with ruby 1.8.4 / gcc 4.1.2).
Alternatively, it is possible to avoid the problem by turning off
compiler optimization altogether (by passing -O0 to gcc), even without
the patch.

Masashi Shimbo
-------------- next part --------------
A non-text attachment was scrubbed...
Name: libxml-ruby-0.3.8.4.patch
Type: application/octet-stream
Size: 365 bytes
Desc: not available
Url : http://rubyforge.org/pipermail/libxml-devel/attachments/20070309/a8c53f37/attachment.obj

From transfire at gmail.com Sun Mar 18 08:20:02 2007
From: transfire at gmail.com (TRANS)
Date: Sun, 18 Mar 2007 08:20:02 -0400
Subject: [libxml-devel] Odd attribute semantics
In-Reply-To: <1173214665.15303.28.camel@bloodnok.com>
References: <1173214665.15303.28.camel@bloodnok.com>
Message-ID: <4b6f054f0703180520qc169ccfv8957a6a1063d02e0@mail.gmail.com>

Marc, do you know C? Think you could fix it?

T.

From marc at bloodnok.com Sun Mar 18 15:23:58 2007
From: marc at bloodnok.com (Marc Munro)
Date: Sun, 18 Mar 2007 12:23:58 -0700
Subject: [libxml-devel] Odd attribute semantics
In-Reply-To: <4b6f054f0703180520qc169ccfv8957a6a1063d02e0@mail.gmail.com>
References: <1173214665.15303.28.camel@bloodnok.com>
	<4b6f054f0703180520qc169ccfv8957a6a1063d02e0@mail.gmail.com>
Message-ID: <1174245838.15634.19.camel@bloodnok.com>

Yes, I do know C. I don't know the libraries that all of this is based
on though, and don't really have the time available right now to work on
it. My free time is going on an as yet unpublished free-software
project which I want to concentrate on for now.

I don't actually need a fix for my project as I have found work-arounds.
Actually I found this mostly as the result of some bad practice on my
part, so I'm not sure it's a bug so much as an undocumented feature. For
me, it would have been enough to have better documentation. It does
seem though that the attribute methods could use some general attention.

Far more significant in my view are bugs 9134 and 9135

Thanks
__
Marc

On Sun, 2007-18-03 at 08:20 -0400, TRANS wrote:
> Marc, do you know C? Think you could fix it?
>
> T.
> _______________________________________________
> libxml-devel mailing list
> libxml-devel at rubyforge.org
> http://rubyforge.org/mailman/listinfo/libxml-devel
>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
Url : http://rubyforge.org/pipermail/libxml-devel/attachments/20070318/c6a10ff3/attachment.bin

From transfire at gmail.com Mon Mar 19 10:28:03 2007
From: transfire at gmail.com (TRANS)
Date: Mon, 19 Mar 2007 10:28:03 -0400
Subject: [libxml-devel] Odd attribute semantics
In-Reply-To: <1174245838.15634.19.camel@bloodnok.com>
References: <1173214665.15303.28.camel@bloodnok.com>
	<4b6f054f0703180520qc169ccfv8957a6a1063d02e0@mail.gmail.com>
	<1174245838.15634.19.camel@bloodnok.com>
Message-ID: <4b6f054f0703190728i76388135s7ed60e00fa1052c5@mail.gmail.com>

On 3/18/07, Marc Munro wrote:
> Yes I do know C. I don't know the libraries that this all of this based
> on though, and don't really have the time available right now to work on
> it. My free-time is going on an as yet unpublished free-software
> project which I want to concentrate on for now.
>
> I don't actually need a fix for my project as I have found work-arounds.
> Actually I found this mostly as the result of some bad practice on my
> part, so I'm not sure its a bug so much as an undocumented feature. For
> me, it would have been enough to have better documentation. It does
> seem though that the attribute methods could use some general attention.
>
> Far more significant in my view are bugs 9134 and 9135

Shoot. I was hopeful. I'm not a C coder myself, but libxml is a very
important library to me. Ross has brought the bindings along nicely,
but he seems to be pretty busy.
I hardly even see him on the mailing list these days. I just wish we
could get some of these bugs fixed. I'm even wondering if bounties
would help.

T.

From deliverable at gmail.com Thu Mar 22 16:18:56 2007
From: deliverable at gmail.com (Alexy Khrabrov)
Date: Thu, 22 Mar 2007 13:18:56 -0700
Subject: [libxml-devel] XML::HTMLParser docs
Message-ID: <7c737f300703221318jed9b179y2a0e9668b6d8210d@mail.gmail.com>

Greetings -- I'm trying to make XML::HTMLParser parse a file, not a
string, and when looking over the docs, I see no HTMLParser at all --
the docs on the site are apparently for 0.3. I'm rather new to Ruby
docs, and see the distro has mostly c/h code in it; how are the docs on
the site generated, and how can I generate them from the source --
specifically, to see all methods available for XML::HTMLParser?

Cheers,
Alexy

From stefan.lauer at hps-technologies.de Thu Mar 29 01:33:22 2007
From: stefan.lauer at hps-technologies.de (stefan lauer)
Date: Thu, 29 Mar 2007 07:33:22 +0200
Subject: [libxml-devel] ctrlA characters in the xml file
Message-ID: <8A71002E-11E1-4E0C-B3AE-B1E2B8B62BF4@hps-technologies.de>

Hello,

on my server (amd x86_64), ruby 1.8.4 and libXML 0.3.8.4 are installed.
I read in large XML files (my test file is 7 MB and roughly 120000
lines), process them (that is, delete some nodes/elements) and write
them out. The output file sometimes suddenly contains ctrl+A
characters. I also tried it with ruby 1.8.5 and libXML 0.3.8.2, and
with the versions crossed over. It is always the same.

Sometimes the ctrl+A is in the middle of an element tag and the rest is
deleted. It looks like this, for example:

normal: text with ctrlA