[Nokogiri-talk] Nokogiri::XML::SAX::Document and elements with invalid xml contents
Jeff Hodges
jeff at somethingsimilar.com
Wed May 27 16:04:14 EDT 2009
Could you give an example? I imagine the answer would be to check
specifically which tag you're on and, if it's the parent tag of the
text you want, call inner_html on that and flip a boolean that tells
the rest of your code "don't worry about this data" until the end tag
event for the tag you care about arrives, at which point you flip it
back. Since you've been handed a pile of angle brackets and told it
was XML, you might have to get a little dirty.
--
Jeff
On Wed, May 27, 2009 at 12:33 PM, Jesse Clark <jesse at jesseclark.com> wrote:
> Hi All,
>
> I have an XML document, which I am trying to parse with
> Nokogiri::XML::SAX::Parser, that contains an element holding unescaped
> HTML fragments. I want to get the entire inner contents of this element, but
> #characters is never called with them because each inner HTML element is
> parsed as well.
>
> From what I remember of the last time I did SAX parsing in Java, I believe
> there was a method that allowed you to pull out the inner contents of an
> element as if it were CDATA and then proceed with normal parsing. Is there
> anything similar in Nokogiri? I didn't see anything like it in the docs.
>
> Alternatively, does anyone have any suggestions for other ways I could get
> this accomplished?
>
> TIA,
> -Jesse
> _______________________________________________
> Nokogiri-talk mailing list
> Nokogiri-talk at rubyforge.org
> http://rubyforge.org/mailman/listinfo/nokogiri-talk
>