Not very elegant, but it’s my first script in Ruby: bits and pieces of code mashed into a beautiful chaos of workability. It fetches the load time for google.com from webpagetest.org. Selenium drives the browser, Nokogiri parses the results page, and Action Mailer emails the result.
Next project is to translate this script into Python. Hmm… maybe do this in Python and parse the data from XML instead? That would be cool. I’ll mull it over…
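As a rough idea of what the XML route might look like (kept in Ruby here to match the script below), this sketch pulls a load time out of an XML string using REXML. The sample document, the element path and the helper name `load_time_ms` are all assumptions made up for illustration; none of it is copied from a real WebPageTest response, so the path would need checking against actual output.

```ruby
require 'rexml/document'

# Made-up sample; the real XML layout may differ (assumption).
SAMPLE_XML = <<~XML
  <response>
    <statusCode>200</statusCode>
    <data>
      <testUrl>google.com</testUrl>
      <median>
        <firstView>
          <loadTime>1234</loadTime>
        </firstView>
      </median>
    </data>
  </response>
XML

# Hypothetical helper: returns the load time in milliseconds,
# or nil when the element is missing.
def load_time_ms(xml)
  doc = REXML::Document.new(xml)
  node = REXML::XPath.first(doc, '/response/data/median/firstView/loadTime')
  node && node.text.to_i
end

puts load_time_ms(SAMPLE_XML)
```

Parsing XML this way would skip the browser and the copied XPath expressions entirely, which is most of the appeal.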
webscraper.rb
require 'rubygems'
require 'nokogiri'
require 'open-uri'
require 'selenium-webdriver'
require_relative 'SimpleMailer'

# Open WebPageTest in Firefox
@driver = Selenium::WebDriver.for :firefox
@driver.navigate.to 'https://www.webpagetest.org'

# Wait up to 5 seconds for the URL field to be displayed
wait = Selenium::WebDriver::Wait.new(:timeout => 5)
input = wait.until {
  element = @driver.find_element(:id, 'url')
  element if element.displayed?
}

# Enter the site to test and start the run
@driver.find_element(:id, 'url').clear
input.send_keys('google.com')
@driver.find_element(:id, 'start_test-container').click

# The test can take a while, so wait much longer for the results
wait = Selenium::WebDriver::Wait.new(:timeout => 450) # seconds
wait.until { @driver.find_element(:id, 'test_results-container') }
html_source = @driver.page_source

# Check that the id exists
# starttest = wait.until {
#   element = @driver.find_element(:id, 'test_results-container')
#   element if element.displayed?
# }
# puts "Test Passed: ID found" if starttest.displayed?
# @driver.quit

# Then you can use Nokogiri to parse the html:
#####################################################################
# doc = Nokogiri::HTML(html_source)
# PREVIOUS WORKING SCRIPT
# doc.css('tr').each do |el|
#   puts el.text
#   email = SimpleMailer.simple_message('mygmailaddress', 'Best I can do', el.text)
#   email.deliver
# end
# @driver.quit
######################################################################

# copied xpath from chrome inspector
doc = Nokogiri::HTML.parse(html_source)
content = doc
  .xpath('//*[@id="header_data"]/h2/span',
         '//*[@id="LoadTime"]',
         '//*[@id="tableResults"]/tbody/tr[2]/th[2]')
  .to_a.join(" ")
puts content

# Email the joined result, then close the browser
email = SimpleMailer.simple_message('recipientemailaddress',
                                    '1st script parsed and spaced',
                                    content)
email.deliver
@driver.quit
#########################################################
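The two `wait.until` calls in the script are explicit waits: the block is retried until it returns something truthy or the timeout runs out. The plain-Ruby sketch below shows that polling idea on its own, with no browser involved. It is not Selenium's implementation; `poll_until` and its `interval` and `ignore` parameters are hypothetical names chosen for this example.

```ruby
# Hypothetical helper illustrating the polling idea behind an explicit wait.
# Retries the block until it returns a truthy value, raising once the
# timeout (in seconds) has passed. Exceptions listed in `ignore` are
# treated as "not ready yet" instead of aborting the wait.
def poll_until(timeout:, interval: 0.2, ignore: [])
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    begin
      result = yield
      return result if result
    rescue *ignore
      # Not ready yet; fall through and try again.
    end
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      raise "timed out after #{timeout} seconds"
    end
    sleep interval
  end
end

# Usage: the block only succeeds on the third attempt.
attempts = 0
value = poll_until(timeout: 2, interval: 0.01) do
  attempts += 1
  attempts >= 3 ? 'ready' : nil
end
puts value # prints "ready"
```

This is why the script can use a short 5 second wait for the form field and a 450 second wait for the results: the wait returns as soon as the block succeeds, so a generous timeout only costs time when something goes wrong.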
# SimpleMailer.rb, loaded by webscraper.rb via require_relative
require 'action_mailer'

# Gmail SMTP settings; the address and app password are placeholders
ActionMailer::Base.smtp_settings = {
  :address        => 'smtp.gmail.com',
  :port           => 587,
  :domain         => 'gmail.com',
  :user_name      => 'mygmailaddress',
  :password       => 'myapppassword',
  :authentication => :plain,
}

class SimpleMailer < ActionMailer::Base
  # Builds a plain message; the caller sends it with deliver
  def simple_message(recipient, subject, message)
    mail(:from    => 'mygmailaddress',
         :to      => recipient,
         :subject => subject,
         :body    => message)
  end
end