From peter at hexagile.com Sun Mar 28 11:34:46 2010
From: peter at hexagile.com (Peter Szinek)
Date: Sun, 28 Mar 2010 17:34:46 +0200
Subject: [Celerity-users] UnexpectedPageException
Message-ID: <52F12151-ED20-4D04-B704-B37C31DFC379@hexagile.com>

Hey guys,

Any idea what's going on here?

>> Macintosh-4:~ mbp$ jirb
irb(main):001:0> require 'celerity'
=> true
irb(main):002:0> browser = Celerity::Browser.new(:user_agent => 'Mozilla/4.0 (compatible; MSIE 7.0b; Windows NT 6.0)')
=> #
irb(main):003:0> browser.goto('http://jobs.brassring.com/1033/ASP/TG/cim_advsearch.asp?ref=172009113323&partnerid=11721&siteid=78')
Celerity::Exception::UnexpectedPageException: image/gif
	from /Users/mbp/devel/bin/jruby-1.4.0/lib/ruby/gems/1.8/gems/celerity-0.7.9/lib/celerity/browser.rb:737:in `page='
	from /Users/mbp/devel/bin/jruby-1.4.0/lib/ruby/gems/1.8/gems/celerity-0.7.9/lib/celerity/browser.rb:773:in `enable_event_listener'
	from /Users/mbp/devel/bin/jruby-1.4.0/lib/ruby/gems/1.8/gems/celerity-0.7.9/lib/celerity/listener.rb:120:in `call'
	from /Users/mbp/devel/bin/jruby-1.4.0/lib/ruby/gems/1.8/gems/celerity-0.7.9/lib/celerity/listener.rb:120:in `webWindowClosed'
	from /Users/mbp/devel/bin/jruby-1.4.0/lib/ruby/gems/1.8/gems/celerity-0.7.9/lib/celerity/listener.rb:120:in `each'
	from /Users/mbp/devel/bin/jruby-1.4.0/lib/ruby/gems/1.8/gems/celerity-0.7.9/lib/celerity/listener.rb:120:in `webWindowClosed'
	from /Users/mbp/devel/bin/jruby-1.4.0/lib/ruby/gems/1.8/gems/celerity-0.7.9/lib/celerity/browser.rb:100:in `goto'
	from /Users/mbp/devel/bin/jruby-1.4.0/lib/ruby/gems/1.8/gems/celerity-0.7.9/lib/celerity/container.rb:761:in `rescue_status_code_exception'
	from /Users/mbp/devel/bin/jruby-1.4.0/lib/ruby/gems/1.8/gems/celerity-0.7.9/lib/celerity/browser.rb:99:in `goto'
	from (irb):4

Any way to work this around - I just need to scrape a few links, don't
really care about content types / images etc.
Cheers,
Peter

From jasoninclass at googlemail.com Sun Mar 28 15:18:14 2010
From: jasoninclass at googlemail.com (jason franklin-stokes)
Date: Sun, 28 Mar 2010 21:18:14 +0200
Subject: [Celerity-users] UnexpectedPageException
In-Reply-To: <52F12151-ED20-4D04-B704-B37C31DFC379@hexagile.com>
References: <52F12151-ED20-4D04-B704-B37C31DFC379@hexagile.com>
Message-ID:

you downloaded a gif

best jason

On Mar 28, 2010, at 5:34 PM, Peter Szinek wrote:

> Hey guys,
>
> Any idea what's going on here?
>
> >> Macintosh-4:~ mbp$ jirb
> irb(main):001:0> require 'celerity'
> => true
> irb(main):002:0> browser = Celerity::Browser.new(:user_agent => 'Mozilla/4.0 (compatible; MSIE 7.0b; Windows NT 6.0)')
> => #
> irb(main):003:0> browser.goto('http://jobs.brassring.com/1033/ASP/TG/cim_advsearch.asp?ref=172009113323&partnerid=11721&siteid=78')
> Celerity::Exception::UnexpectedPageException: image/gif
> 	from /Users/mbp/devel/bin/jruby-1.4.0/lib/ruby/gems/1.8/gems/celerity-0.7.9/lib/celerity/browser.rb:737:in `page='
> 	from /Users/mbp/devel/bin/jruby-1.4.0/lib/ruby/gems/1.8/gems/celerity-0.7.9/lib/celerity/browser.rb:773:in `enable_event_listener'
> 	from /Users/mbp/devel/bin/jruby-1.4.0/lib/ruby/gems/1.8/gems/celerity-0.7.9/lib/celerity/listener.rb:120:in `call'
> 	from /Users/mbp/devel/bin/jruby-1.4.0/lib/ruby/gems/1.8/gems/celerity-0.7.9/lib/celerity/listener.rb:120:in `webWindowClosed'
> 	from /Users/mbp/devel/bin/jruby-1.4.0/lib/ruby/gems/1.8/gems/celerity-0.7.9/lib/celerity/listener.rb:120:in `each'
> 	from /Users/mbp/devel/bin/jruby-1.4.0/lib/ruby/gems/1.8/gems/celerity-0.7.9/lib/celerity/listener.rb:120:in `webWindowClosed'
> 	from /Users/mbp/devel/bin/jruby-1.4.0/lib/ruby/gems/1.8/gems/celerity-0.7.9/lib/celerity/browser.rb:100:in `goto'
> 	from /Users/mbp/devel/bin/jruby-1.4.0/lib/ruby/gems/1.8/gems/celerity-0.7.9/lib/celerity/container.rb:761:in `rescue_status_code_exception'
> 	from /Users/mbp/devel/bin/jruby-1.4.0/lib/ruby/gems/1.8/gems/celerity-0.7.9/lib/celerity/browser.rb:99:in `goto'
> 	from (irb):4
>
> Any way to work this around - I just need to scrape a few links, don't really care about content types / images etc.
>
> Cheers,
> Peter
> _______________________________________________
> Celerity-users mailing list
> Celerity-users at rubyforge.org
> http://rubyforge.org/mailman/listinfo/celerity-users

From peter at hexagile.com Sun Mar 28 19:11:09 2010
From: peter at hexagile.com (Peter Szinek)
Date: Mon, 29 Mar 2010 01:11:09 +0200
Subject: [Celerity-users] UnexpectedPageException
In-Reply-To:
References: <52F12151-ED20-4D04-B704-B37C31DFC379@hexagile.com>
Message-ID: <2384EC1B-3E81-4F91-8371-670D548F90E0@hexagile.com>

Cool... how can I prevent downloading? I don't care about images at
all...

On 2010.03.28., at 21:18, jason franklin-stokes wrote:

>
> you downloaded a gif
>
> best jason
>
>
> On Mar 28, 2010, at 5:34 PM, Peter Szinek wrote:
>
>> Hey guys,
>>
>> Any idea what's going on here?
>>
>>>> Macintosh-4:~ mbp$ jirb
>> irb(main):001:0> require 'celerity'
>> => true
>> irb(main):002:0> browser = Celerity::Browser.new(:user_agent => 'Mozilla/4.0 (compatible; MSIE 7.0b; Windows NT 6.0)')
>> => #> @charset="UTF-8" @viewer=Celerity::DefaultViewer @error_checkers=[]>
>> irb(main):003:0> browser.goto('http://jobs.brassring.com/1033/ASP/TG/cim_advsearch.asp?ref=172009113323&partnerid=11721&siteid=78')
>> Celerity::Exception::UnexpectedPageException: image/gif
>> 	from /Users/mbp/devel/bin/jruby-1.4.0/lib/ruby/gems/1.8/gems/celerity-0.7.9/lib/celerity/browser.rb:737:in `page='
>> 	from /Users/mbp/devel/bin/jruby-1.4.0/lib/ruby/gems/1.8/gems/celerity-0.7.9/lib/celerity/browser.rb:773:in `enable_event_listener'
>> 	from /Users/mbp/devel/bin/jruby-1.4.0/lib/ruby/gems/1.8/gems/celerity-0.7.9/lib/celerity/listener.rb:120:in `call'
>> 	from /Users/mbp/devel/bin/jruby-1.4.0/lib/ruby/gems/1.8/gems/celerity-0.7.9/lib/celerity/listener.rb:120:in `webWindowClosed'
>> 	from /Users/mbp/devel/bin/jruby-1.4.0/lib/ruby/gems/1.8/gems/celerity-0.7.9/lib/celerity/listener.rb:120:in `each'
>> 	from /Users/mbp/devel/bin/jruby-1.4.0/lib/ruby/gems/1.8/gems/celerity-0.7.9/lib/celerity/listener.rb:120:in `webWindowClosed'
>> 	from /Users/mbp/devel/bin/jruby-1.4.0/lib/ruby/gems/1.8/gems/celerity-0.7.9/lib/celerity/browser.rb:100:in `goto'
>> 	from /Users/mbp/devel/bin/jruby-1.4.0/lib/ruby/gems/1.8/gems/celerity-0.7.9/lib/celerity/container.rb:761:in `rescue_status_code_exception'
>> 	from /Users/mbp/devel/bin/jruby-1.4.0/lib/ruby/gems/1.8/gems/celerity-0.7.9/lib/celerity/browser.rb:99:in `goto'
>> 	from (irb):4
>>
>> Any way to work this around - I just need to scrape a few links,
>> don't really care about content types / images etc.
>>
>> Cheers,
>> Peter

From jari.bakken at gmail.com Sun Mar 28 19:50:35 2010
From: jari.bakken at gmail.com (Jari Bakken)
Date: Mon, 29 Mar 2010 01:50:35 +0200
Subject: [Celerity-users] UnexpectedPageException
In-Reply-To: <52F12151-ED20-4D04-B704-B37C31DFC379@hexagile.com>
References: <52F12151-ED20-4D04-B704-B37C31DFC379@hexagile.com>
Message-ID:

On Sun, Mar 28, 2010 at 5:34 PM, Peter Szinek wrote:
> Hey guys,
>
> Any idea what's going on here?
>

This looks like a bug in HtmlUnit, where it for some reason thinks the
main content has content type image/gif. Can you check if you can
reproduce with Celerity from git (which includes a newer HtmlUnit
snapshot)? If that's the case, you should report it to the HtmlUnit
bug tracker [1].
Jari

[1] http://htmlunit.sourceforge.net/submittingBugs.html

From jasoninclass at googlemail.com Mon Mar 29 02:17:29 2010
From: jasoninclass at googlemail.com (jason franklin-stokes)
Date: Mon, 29 Mar 2010 08:17:29 +0200
Subject: [Celerity-users] UnexpectedPageException
In-Reply-To:
References: <52F12151-ED20-4D04-B704-B37C31DFC379@hexagile.com>
Message-ID: <88CF830C-BC51-4895-B2C8-583D857D504C@googlemail.com>

Nope, not a bug in HtmlUnit.

The second iframe on the page has a src pointing to a gif, which is
used to count page views. When that src is loaded, HtmlUnit acts
correctly by raising an unexpected page exception.

You are going to have to catch that exception and deal with it.
Unfortunately, there is no point catching it in your own code; there is
nothing you can do about it there. You will probably have to create a
falsifying web connection class and filter out these sorts of requests.

Hope that helps.

Jason.

On Mar 29, 2010, at 1:50 AM, Jari Bakken wrote:

> On Sun, Mar 28, 2010 at 5:34 PM, Peter Szinek wrote:
>> Hey guys,
>>
>> Any idea what's going on here?
>>
>
> This looks like a bug in HtmlUnit, where it for some reason thinks the
> main content has content type image/gif. Can you check if you can
> reproduce with Celerity from git (which includes a newer HtmlUnit
> snapshot)? If that's the case, you should report it to the HtmlUnit
> bug tracker [1].
>
> Jari
>
> [1] http://htmlunit.sourceforge.net/submittingBugs.html

From peter at hexagile.com Mon Mar 29 08:09:25 2010
From: peter at hexagile.com (Peter Szinek)
Date: Mon, 29 Mar 2010 14:09:25 +0200
Subject: [Celerity-users] Problem with https page
Message-ID:

Hey guys,

I am receiving this while trying to browse to
https://careers.applebees.com/psc/HPROD_EXT_ER/?cmd=start&SiteId=3000

com/sun/net/ssl/internal/ssl/Alerts.java:174:in `getSSLException':
javax.net.ssl.SSLHandshakeException:
sun.security.validator.ValidatorException: PKIX path building failed:
sun.security.provider.certpath.SunCertPathBuilderException: unable to
find valid certification path to requested target (NativeException)

Any idea to work it around?

Cheers,
Peter

From peter at hexagile.com Mon Mar 29 08:53:26 2010
From: peter at hexagile.com (Peter Szinek)
Date: Mon, 29 Mar 2010 14:53:26 +0200
Subject: [Celerity-users] Problem with https page
In-Reply-To:
References:
Message-ID: <6759104F-3A5B-4030-90E4-B5435F436FAB@rubyrailways.com>

> Any idea to work it around?

Okay, a step closer, but still not there. I added the certificate to
the keystore with

java InstallCert careers.applebees.com

(omitted a few stuff I had to google to even run the above)

got a ton of output ending in

Added certificate to keystore 'jssecacerts' using alias
'careers.applebees.com-1'

but when I try to open the page, I'm still getting the same...

Java, how much I 'love' thee!1##I!1#*$^

P

From peter at hexagile.com Mon Mar 29 09:13:19 2010
From: peter at hexagile.com (Peter Szinek)
Date: Mon, 29 Mar 2010 15:13:19 +0200
Subject: [Celerity-users] Problem with https page
In-Reply-To: <6759104F-3A5B-4030-90E4-B5435F436FAB@rubyrailways.com>
References: <6759104F-3A5B-4030-90E4-B5435F436FAB@rubyrailways.com>
Message-ID:

Ok, mystery solved, sorry for the noise...
if anyone is interested / runs into this and is stumped, I am happy
to help - just drop me a msg...

P

On 2010.03.29., at 14:53, Peter Szinek wrote:

>
>> Any idea to work it around?
>
> Okay, a step closer, but still not there. I added the certificate to
> the keystore with
>
> java InstallCert careers.applebees.com
>
> (omitted a few stuff I had to google to even run the above)
>
> got a ton of output ending in
>
> Added certificate to keystore 'jssecacerts' using alias
> 'careers.applebees.com-1'
>
> but when I try to open the page, I'm still getting the same...
>
> Java, how much I 'love' thee!1##I!1#*$^
>
> P
>
>

From lists at iDIAcomputing.com Mon Mar 29 11:32:26 2010
From: lists at iDIAcomputing.com (George Dinwiddie)
Date: Mon, 29 Mar 2010 11:32:26 -0400
Subject: [Celerity-users] Problem with https page
In-Reply-To:
References:
Message-ID: <4BB0C80A.90406@iDIAcomputing.com>

Peter,

Peter Szinek wrote:
> Hey guys,
>
> I am receiving this while trying to browse to
> https://careers.applebees.com/psc/HPROD_EXT_ER/?cmd=start&SiteId=3000
>
> com/sun/net/ssl/internal/ssl/Alerts.java:174:in `getSSLException':
> javax.net.ssl.SSLHandshakeException:
> sun.security.validator.ValidatorException: PKIX path building failed:
> sun.security.provider.certpath.SunCertPathBuilderException: unable to
> find valid certification path to requested target (NativeException)
>
> Any idea to work it around?

Try
	webclient.setUseInsecureSSL(true)
to ignore the certs.
  - George

--
  ----------------------------------------------------------------------
   * George Dinwiddie *                      http://blog.gdinwiddie.com
   Software Development                    http://www.idiacomputing.com
   Consultant and Coach                    http://www.agilemaryland.org
  ----------------------------------------------------------------------

From jari.bakken at gmail.com Mon Mar 29 14:25:40 2010
From: jari.bakken at gmail.com (Jari Bakken)
Date: Mon, 29 Mar 2010 20:25:40 +0200
Subject: [Celerity-users] Problem with https page
In-Reply-To: <4BB0C80A.90406@iDIAcomputing.com>
References: <4BB0C80A.90406@iDIAcomputing.com>
Message-ID:

On Mon, Mar 29, 2010 at 5:32 PM, George Dinwiddie wrote:
>
> Try
>        webclient.setUseInsecureSSL(true)
> to ignore the certs.
>
>  - George
>

Which is equivalent to either of these:

  Celerity::Browser.new(:secure_ssl => false)
  browser.secure_ssl = false
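
For anyone landing on this thread later, here is a minimal sketch of how the `:secure_ssl => false` option quoted above might be wrapped in a script. The helper names `insecure_browser_options` and `open_ignoring_certs` are hypothetical and not part of Celerity; the only Celerity calls assumed are the ones already shown in this thread (`Celerity::Browser.new` with an options hash, and `goto`). It needs JRuby with the celerity gem installed to actually open a page.

```ruby
# Hypothetical helper (not part of Celerity): builds the options hash for
# Celerity::Browser.new with certificate checking switched off.
# :secure_ssl => false is the option quoted in this thread; any other
# options (e.g. :user_agent) are passed through unchanged.
def insecure_browser_options(extra = {})
  # Merge last so a caller cannot accidentally re-enable certificate checks.
  extra.merge(:secure_ssl => false)
end

# Hypothetical helper: opens a URL with certificate validation disabled.
# The require is deferred so the file still loads where Celerity is absent.
def open_ignoring_certs(url, extra = {})
  require 'celerity' # assumes JRuby with the celerity gem installed
  browser = Celerity::Browser.new(insecure_browser_options(extra))
  browser.goto(url)
  browser
end
```

Usage would look something like `browser = open_ignoring_certs('https://careers.applebees.com/psc/HPROD_EXT_ER/?cmd=start&SiteId=3000')`. Forcing `:secure_ssl => false` inside one helper keeps the insecure setting in a single, easy-to-find place, which seems reasonable for a throwaway scraper against a host you already trust.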