From Bil.Kleb at NASA.gov  Tue Aug 21 11:56:40 2007
From: Bil.Kleb at NASA.gov (Bil Kleb)
Date: Tue, 21 Aug 2007 11:56:40 -0400
Subject: [Mechanize-users] Signin to LinkedIn
Message-ID: <46CB0B38.2080706@NASA.gov>

Hi,

Does anyone have the formula for getting logged into LinkedIn?

Here's my current attempt:

  require 'rubygems'
  require 'mechanize'

  agent = WWW::Mechanize.new

  home_page = agent.get('http://www.linkedin.com')

  signin_page = agent.click home_page.links.text('Sign in')
  puts "\nSIGNIN PAGE"
  pp signin_page

  login_form = signin_page.form('login')
  login_form.session_login = 'LOGIN'
  login_form.session_password = 'PASSWORD'

  welcome_page = agent.submit(login_form, login_form.buttons.first)
  puts "\nWELCOME PAGE"
  pp welcome_page  <<<< Currently returns signin page

I tried mucking about with a session key, but no joy:

  login_form.session_rikey = agent.cookies.find{ |c| 'JSESSIONID' == c.name }.value

(My goal is to scrape a list of my connections' new connections.)

Thanks,
--
Bil Kleb
http://nasarb.rubyforge.org

From whitethunder922 at yahoo.com  Tue Aug 21 12:41:32 2007
From: whitethunder922 at yahoo.com (Matt White)
Date: Tue, 21 Aug 2007 09:41:32 -0700 (PDT)
Subject: [Mechanize-users] Signin to LinkedIn
Message-ID: <597056.90783.qm@web53311.mail.re2.yahoo.com>

Bil,

It's possible there is more to it than this, but looking at the page it
appears that you have the field names wrong for the login form. Also, the
login submit button is an image, which is a hoop I've had to jump through
before, even so far as having to specify the exact coordinates of where I
"clicked" on the button, so tell it to submit using that button specifically.
Since there is only one button on that form, you can just use the following
for your login form:

  login_form = agent.page.form('login')
  login_form.set_fields(:session_key => 'LOGIN', :session_password => 'PASSWORD')
  agent.submit(login_form, login_form.buttons.first)

When I tried this, it said that my credentials were invalid, so hopefully it
will work for you as you actually have a login :).

Good luck!

Matt White

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://rubyforge.org/pipermail/mechanize-users/attachments/20070821/a4991001/attachment.html

From Bil.Kleb at NASA.gov  Tue Aug 21 12:55:28 2007
From: Bil.Kleb at NASA.gov (Bil Kleb)
Date: Tue, 21 Aug 2007 12:55:28 -0400
Subject: [Mechanize-users] Signin to LinkedIn
In-Reply-To: <597056.90783.qm@web53311.mail.re2.yahoo.com>
References: <597056.90783.qm@web53311.mail.re2.yahoo.com>
Message-ID: <46CB1900.6010900@NASA.gov>

Matt White wrote:
> Bil,
>
> It's possible there is more to it than this, but looking at the page it
> appears that you have the field names wrong for the login form. [..]

Ah, thanks!

Now I'm apparently on to the next step...

  #} {meta #} {title "Redirecting..."} {iframes} {frames} {links #} {forms}>

Thanks again,
--
Bil Kleb
http://fun3d.larc.nasa.gov

From aaron at tenderlovemaking.com  Tue Aug 21 11:35:28 2007
From: aaron at tenderlovemaking.com (Aaron Patterson)
Date: Tue, 21 Aug 2007 08:35:28 -0700
Subject: [Mechanize-users] Signin to LinkedIn
In-Reply-To: <46CB0B38.2080706@NASA.gov>
References: <46CB0B38.2080706@NASA.gov>
Message-ID: <20070821153528.GA26283@mac-mini.lan>

Hey Bil!

On Tue, Aug 21, 2007 at 11:56:40AM -0400, Bil Kleb wrote:
> Hi,
>
> Does anyone have the formula for getting logged into LinkedIn?
>
> Here's my current attempt:
>
> require 'rubygems'
> require 'mechanize'
>
> agent = WWW::Mechanize.new
>
> home_page = agent.get('http://www.linkedin.com')
>
> signin_page = agent.click home_page.links.text('Sign in')
> puts "\nSIGNIN PAGE"
> pp signin_page
>
> login_form = signin_page.form('login')
> login_form.session_login = 'LOGIN'
> login_form.session_password = 'PASSWORD'
>
> welcome_page = agent.submit(login_form, login_form.buttons.first)
> puts "\nWELCOME PAGE"
> pp welcome_page <<<< Currently returns signin page
>
> I tried mucking about with a session key, but no joy:
>
> login_form.session_rikey = agent.cookies.find{ |c| 'JSESSIONID' == c.name }.value
>
> (My goal is to scrape a list of my connections' new connections.)
I think the "session_login" field is misleading.  Give this a try:

  mech = WWW::Mechanize.new
  page = mech.get('https://www.linkedin.com/secure/login')
  page = page.form('login') { |form|
    form.session_key = ARGV[0]
    form.session_password = ARGV[1]
  }.submit.links.first.click

  page.save_as('out.html')

Hope that helps!  I'll add this to the mechanize examples.  :-)

--
Aaron Patterson
http://tenderlovemaking.com/

From Bil.Kleb at NASA.gov  Tue Aug 21 13:43:45 2007
From: Bil.Kleb at NASA.gov (Bil Kleb)
Date: Tue, 21 Aug 2007 13:43:45 -0400
Subject: [Mechanize-users] Signin to LinkedIn
In-Reply-To: <20070821153528.GA26283@mac-mini.lan>
References: <46CB0B38.2080706@NASA.gov> <20070821153528.GA26283@mac-mini.lan>
Message-ID: <46CB2451.5060001@NASA.gov>

Aaron Patterson wrote:
> Hey Bil!

Hi, and thanks again for betabrite,

  http://tenderlovemaking.com/2006/09/28/new-ruby-betabrite-002/

It was a blast!

> Hope that helps!  I'll add this to the mechanize examples.  :-)

Yes, but it looks like they've hidden the actual "Connections"
in an embedded javascript browser:
All I get on the fetched page (or with "view source") is

...processing

(I swear just last month the connections were available with
"view source".)

Stymied for now,
--
Bil Kleb
http://fun3d.larc.nasa.gov

From aaron at tenderlovemaking.com  Tue Aug 21 12:51:39 2007
From: aaron at tenderlovemaking.com (Aaron Patterson)
Date: Tue, 21 Aug 2007 09:51:39 -0700
Subject: [Mechanize-users] Signin to LinkedIn
In-Reply-To: <46CB2451.5060001@NASA.gov>
References: <46CB0B38.2080706@NASA.gov> <20070821153528.GA26283@mac-mini.lan> <46CB2451.5060001@NASA.gov>
Message-ID: <20070821165138.GA26527@mac-mini.lan>

On Tue, Aug 21, 2007 at 01:43:45PM -0400, Bil Kleb wrote:
> Aaron Patterson wrote:
> > Hey Bil!
>
> Hi, and thanks again for betabrite,
>
> http://tenderlovemaking.com/2006/09/28/new-ruby-betabrite-002/
>
> It was a blast!

No problem.  I actually got a new sign, but they don't have an API for it.
It's got a USB cable with a proprietary protocol.  :-(

>
> > Hope that helps!  I'll add this to the mechanize examples.  :-)
>
> Yes, but it looks like they've hidden the actual "Connections"
> in an embedded javascript browser:

It looks like they also have a CSV export of the contacts.  Would that
get you the information you want?
  mech = WWW::Mechanize.new
  page = mech.get('https://www.linkedin.com/secure/login')
  page.form('login') { |form|
    form.session_key = ARGV[0]
    form.session_password = ARGV[1]
  }.submit

  page = mech.get('http://www.linkedin.com/addressBookExport')
  form = page.form('exportSettingsForm')
  form.submit(form.buttons.first).save_as('contacts.csv')

--
Aaron Patterson
http://tenderlovemaking.com/

From Bil.Kleb at NASA.gov  Tue Aug 21 14:30:28 2007
From: Bil.Kleb at NASA.gov (Bil Kleb)
Date: Tue, 21 Aug 2007 14:30:28 -0400
Subject: [Mechanize-users] Signin to LinkedIn
In-Reply-To: <20070821165138.GA26527@mac-mini.lan>
References: <46CB0B38.2080706@NASA.gov> <20070821153528.GA26283@mac-mini.lan> <46CB2451.5060001@NASA.gov> <20070821165138.GA26527@mac-mini.lan>
Message-ID: <46CB2F44.10007@NASA.gov>

Aaron Patterson wrote:
>
> It looks like they also have a CSV export of the contacts.  Would that
> get you the information you want?

I don't think so, because I'm going after my connections'
/new/ connections, which are one step removed from that and
indicated by yellow outlines in the connections listing.

Maybe I'll have to resort to firewatir?

Later,
--
Bil Kleb
http://fun3d.larc.nasa.gov

From aaron at tenderlovemaking.com  Tue Aug 21 13:25:21 2007
From: aaron at tenderlovemaking.com (Aaron Patterson)
Date: Tue, 21 Aug 2007 10:25:21 -0700
Subject: [Mechanize-users] Signin to LinkedIn
In-Reply-To: <46CB2F44.10007@NASA.gov>
References: <46CB0B38.2080706@NASA.gov> <20070821153528.GA26283@mac-mini.lan> <46CB2451.5060001@NASA.gov> <20070821165138.GA26527@mac-mini.lan> <46CB2F44.10007@NASA.gov>
Message-ID: <20070821172521.GA26659@mac-mini.lan>

On Tue, Aug 21, 2007 at 02:30:28PM -0400, Bil Kleb wrote:
> Aaron Patterson wrote:
> >
> > It looks like they also have a CSV export of the contacts.  Would that
> > get you the information you want?
>
> I don't think so, because I'm going after my connections'
> /new/ connections, which are one step removed from that and
> indicated by yellow outlines in the connections listing.

Ah.  Yes, this is getting messier.  I was able to get mechanize to fetch
the javascript used to populate that list:

  id = mech.cookies.find { |c| c.name == 'JSESSIONID' }.value

  page = mech.post('/dwr/exec/ConnectionsBrowserService.getMyConnections.dwr',
    {
      'callCount' => '1',
      'JSESSIONID' => id,
      'c0-scriptName' => 'ConnectionsBrowserService',
      'c0-methodName' => 'getMyConnections',
      'c0-id' => '8656_1187721167904',
      'c0-param0' => 'number:-1',
      'c0-param1' => 'number:-1',
      'c0-param2' => 'string:DONT_CARE',
      'c0-param3' => 'number:500',
      'c0-param4' => 'boolean:false',
      'c0-param5' => 'boolean:true',
      'xml' => 'true',
    })

I don't know how brittle that is....  I don't know where the c0-id number
comes from, so it may break for you.  That javascript has the info you
want, but it might be kind of nasty to parse.

--
Aaron Patterson
http://tenderlovemaking.com/
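
On the open c0-id question: the value '8656_1187721167904' has a
recognizable shape.  The part after the underscore reads as a Unix
timestamp in milliseconds that falls on 21 Aug 2007, the day of this
thread, and the part before it looks like a small random number.  Below is
a minimal sketch under that assumption.  It is a guess from the shape of
the value, not something confirmed in this thread or by LinkedIn, and the
helper name `guess_call_id` and the 0..10000 range are made up for
illustration.

```ruby
# Split the id from the post above into its two halves.
sample = '8656_1187721167904'
prefix, millis = sample.split('_')

# Interpret the second half as milliseconds since the Unix epoch.
sent_at = Time.at(millis.to_i / 1000).utc
puts "prefix=#{prefix} timestamp=#{sent_at.strftime('%Y-%m-%d')} (UTC)"

# Hypothetical helper: build a fresh id with the same shape.
# ASSUMPTION: the prefix is a random integer and the suffix is the
# current time in milliseconds; the real generator may differ.
def guess_call_id(now = Time.now)
  "#{rand(10_001)}_#{(now.to_f * 1000).to_i}"
end

puts guess_call_id
```

If the guess holds, 'c0-id' => guess_call_id could stand in for the
hard-coded value in the post parameters.  If the server rejects it, the id
probably carries more meaning than this.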