From eric at sortfolio.com Thu Nov 1 20:04:15 2012
From: eric at sortfolio.com (Eric Marthinsen)
Date: Thu, 1 Nov 2012 16:04:15 -0400
Subject: [Mechanize-users] getaddrinfo: Temporary failure in name resolution
Message-ID:

Hi Everyone-

I've written a scraper to go through the website of all of our clients
and verify that their website url is still active and to see if they
have a twitter link on their homepage. You can see the code I'm using to
do the scraping here: https://gist.github.com/3996079

There are around 13,000 urls that I'm trying to visit. I get through
about 1000 of them and then this error starts showing up for all of the
requests:

getaddrinfo: Temporary failure in name resolution

It's extremely consistent. I'm running this off an EC2 instance. At
first, I was using Amazon's DNS servers and thought that maybe it was an
issue within their walls. So, I changed my DNS servers to point to
Google's public DNS servers. The result was exactly the same and the
error presented itself at the same point.

Does anything stand out as a potential culprit here?

Regards,
Eric

From drbrain at segment7.net Fri Nov 2 01:39:33 2012
From: drbrain at segment7.net (Eric Hodel)
Date: Thu, 1 Nov 2012 19:39:33 -0600
Subject: [Mechanize-users] getaddrinfo: Temporary failure in name resolution
In-Reply-To:
References:
Message-ID: <51F44876-9208-4768-84E6-B4126BAC0FDA@segment7.net>

On Nov 1, 2012, at 14:04, Eric Marthinsen wrote:

> Hi Everyone-
>
> I've written a scraper to go through the website of all of our clients
> and verify that their website url is still active and to see if they
> have a twitter link on their homepage. You can see the code I'm using
> to do the scraping here: https://gist.github.com/3996079
>
> There are around 13,000 urls that I'm trying to visit.
> I get through about 1000 of them and then this error starts showing up
> for all of the requests:
>
> getaddrinfo: Temporary failure in name resolution
>
> It's extremely consistent. I'm running this off an EC2 instance. At
> first, I was using Amazon's DNS servers and thought that maybe it was
> an issue within their walls. So, I changed my DNS servers to point to
> Google's public DNS servers. The result was exactly the same and the
> error presented itself at the same point.
>
> Does anything stand out as a potential culprit here?

Try require 'resolv-replace' as a temporary workaround. This enables a
pure-ruby DNS resolver.

This message likely comes directly from resolv(3), making it an OS-level
issue. I'll poke around in the ruby sources to see what I can find.

From eric at sortfolio.com Fri Nov 2 03:56:55 2012
From: eric at sortfolio.com (Eric Marthinsen)
Date: Thu, 1 Nov 2012 23:56:55 -0400
Subject: [Mechanize-users] getaddrinfo: Temporary failure in name resolution
In-Reply-To: <51F44876-9208-4768-84E6-B4126BAC0FDA@segment7.net>
References: <51F44876-9208-4768-84E6-B4126BAC0FDA@segment7.net>
Message-ID:

Hi Eric-

I might have figured it out. I'm re-running the script now and am about
7,000 records in. I made two changes. The first is that I set the idle
timeout to 1 second. The second is that I set http keep alive to false.
What I think was happening was that as I was creating page objects, they
were keeping a persistent connection to the remote server. These
connections might have timed out after 5 seconds, but after cranking
through enough records, the number of live connections grew to the point
where there were no available connections with which to do a DNS lookup.
This is all speculation for what might have been happening (I've never
heard about a fixed number of outbound connections), but it fits my
mental model of what's going on.
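
The two changes described above might look roughly like this. It is a
minimal sketch, assuming a Mechanize 2.x agent with its keep_alive and
idle_timeout accessors. The helper name tune_for_many_hosts is
hypothetical and not part of Mechanize; it only assigns the two
attributes, so it works on any object that responds to them.

```ruby
# Hypothetical helper (not part of Mechanize): applies the two settings
# described above to an agent and returns it.
def tune_for_many_hosts(agent)
  agent.keep_alive   = false # do not hold connections open between requests
  agent.idle_timeout = 1     # seconds after which an idle connection is reset
  agent
end

# Usage with the mechanize gem installed (assumed, not required above):
#   require 'mechanize'
#   agent = tune_for_many_hosts(Mechanize.new)
#   page  = agent.get('http://example.com/')
```

Keeping the settings in one helper makes it easy to try each change on
its own and see which one actually matters.
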
Regards,
Eric

From eric at sortfolio.com Fri Nov 2 22:41:23 2012
From: eric at sortfolio.com (Eric Marthinsen)
Date: Fri, 2 Nov 2012 18:41:23 -0400
Subject: [Mechanize-users] getaddrinfo: Temporary failure in name resolution
In-Reply-To:
References: <51F44876-9208-4768-84E6-B4126BAC0FDA@segment7.net>
Message-ID:

As a follow-up, the script now runs. Changing the idle timeout and
keep-alive setting did the trick.
I think either change in isolation would have done the trick, but both
certainly got the job done.

From drbrain at segment7.net Sat Nov 3 01:51:01 2012
From: drbrain at segment7.net (Eric Hodel)
Date: Fri, 2 Nov 2012 19:51:01 -0600
Subject: [Mechanize-users] getaddrinfo: Temporary failure in name resolution
In-Reply-To:
References: <51F44876-9208-4768-84E6-B4126BAC0FDA@segment7.net>
Message-ID:

On Nov 2, 2012, at 4:41 PM, Eric Marthinsen wrote:

> As a follow-up, the script now runs. Changing the idle timeout and
> keep-alive setting did the trick. I think either change in isolation
> would have done the trick, but both certainly got the job done.

I'll add some tuning information to the mechanize documentation for
people using mechanize against many different servers, and I'll see what
I can do in mechanize to improve the idle timeout.

> On Thu, Nov 1, 2012 at 11:56 PM, Eric Marthinsen wrote:
> Hi Eric-
>
> I might have figured it out. I'm re-running the script now and am
> about 7,000 records in. I made two changes. The first is that I set
> the idle timeout to 1 second. The second is that I set http keep alive
> to false. What I think was happening was that as I was creating page
> objects, they were keeping a persistent connection to the remote
> server. These connections might have timed out after 5 seconds, but
> after cranking through enough records, the number of live connections
> grew to the point where there were no available connections with which
> to do a DNS lookup.
> This is all speculation for what might have been happening (I've never
> heard about a fixed number of outbound connections), but it fits my
> mental model of what's going on.

I bet this is exactly what happened.

Setting the idle timeout low will not help; the idle timeout controls
whether or not the connection is reset. Mechanize doesn't clean up
sockets that have passed the idle timeout; it lets the GC take care of
them.

Disabling keep-alive will immediately close the connection, so this
would solve the problem.

If you're connecting to the same host for several requests and leave
keep-alive enabled, disabling the history may help performance without
using up all your sockets. Mechanize relies on the GC to close the
connections.

From brauliobhavamitra at gmail.com Thu Nov 8 13:37:49 2012
From: brauliobhavamitra at gmail.com (Bráulio Bhavamitra)
Date: Thu, 8 Nov 2012 11:37:49 -0200
Subject: [Mechanize-users] Simple cache system with mongomapper
Message-ID:

Hello all,

For those interested, I've made a simple cache system which changes the
'get' and 'post' Mechanize methods to check the cache before fetching
the page.

The source is at
https://github.com/coletivoEITA/IMD/blob/master/app/models/cache.rb

best regards,
bráulio
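
The check-the-cache-before-fetching idea described above can be sketched
as follows. This is not the linked cache.rb: it swaps MongoMapper for an
in-memory Hash and takes the fetch step as a block, so the class name
CachedFetcher and its methods are hypothetical and only illustrate the
lookup order.

```ruby
# Hypothetical sketch, not the linked cache.rb. An in-memory Hash stands
# in for the MongoMapper-backed store, and the block stands in for a real
# fetch such as Mechanize#get.
class CachedFetcher
  attr_reader :misses

  def initialize(store = {}, &fetcher)
    @store   = store
    @fetcher = fetcher
    @misses  = 0
  end

  # Returns the cached value for url, fetching and storing it on a miss.
  def get(url)
    return @store[url] if @store.key?(url)

    @misses += 1
    @store[url] = @fetcher.call(url)
  end
end

# Usage with a stand-in fetcher. With the mechanize gem (assumed), the
# block could instead be { |url| agent.get(url).body }.
cache = CachedFetcher.new { |url| "body of #{url}" }
cache.get('http://example.com/') # miss: calls the block
cache.get('http://example.com/') # hit: served from the Hash
```

Passing the fetch step in as a block keeps the cache logic independent
of Mechanize, so it can be exercised without any network access.
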