I have a web page that contains a number of links.
The only way I can differentiate the links is by their class attribute.
I need to extract the set of links, and their titles, of a particular class.
I tried using the scRUBYt! extractor, but I have no idea where to specify the class.
google_data = Scrubyt::Extractor.define do
  fetch 'http://www.google.com/'
  fill_textfield 'q', 'ruby'
  submit
  link "Ruby programming language" do
    url "href", :type => :attribute
  end
end
junk = google_data.to_xml
Also, how do I get the output in text/string format?
On 2008.11.17., at 19:17, Sita Rami R. wrote:
google_data = Scrubyt::Extractor.define do
  fetch 'http://www.google.com/'
  fill_textfield 'q', 'ruby'
  submit
  link "Ruby programming language" do
    url "href", :type => :attribute
  end
end
junk = google_data.to_xml
Also, how do I get the output in text/string format?
By the way, you should get the newest scRUBYt!, 0.4.05, which does not
depend on RubyInline, Ruby2Ruby, ParseTree, etc.
What would you like to do exactly?
- class: use an XPath like this: stuff "//td[@class='red']"
- text/string: use to_hash instead of to_xml.
HTH,
Peter
http://www.rubyrailways.com
http://scrubyt.org
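The class hint above is plain XPath, so it can be tried outside scRUBYt! as well. Below is a minimal sketch using REXML from the Ruby distribution rather than scRUBYt! itself. The sample markup, the class name 'l', and the URLs are assumptions made up for illustration, and REXML needs well-formed input, which real pages often are not.

```ruby
require 'rexml/document'

# Hypothetical, well-formed sample markup; a real page will look different.
html = <<~HTML
  <html><body>
    <a class="l" href="http://example.com/one">First result</a>
    <a class="nav" href="/about">About</a>
    <a class="l" href="http://example.com/two">Second result</a>
  </body></html>
HTML

doc = REXML::Document.new(html)

# Keep only the links whose class attribute is exactly 'l',
# and collect each one as a small hash of title and URL.
links = REXML::XPath.match(doc, "//a[@class='l']").map do |a|
  { title: a.text, url: a.attributes['href'] }
end

# One "title - url" line per link.
links.each { |link| puts "#{link[:title]} - #{link[:url]}" }
```

The array of hashes plays the role that to_hash plays in the advice above: plain Ruby data that is easy to print as text.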
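require 'rubygems'
require 'scrubyt'

google_data = Scrubyt::Extractor.define do
  # Note: the archive appears to have replaced the URL on the next line with
  # the page title; a Google results URL for "gap inc" is expected here.
  fetch 'gap inc - Google Search'
  # Select result links by their class attribute and keep their text.
  link_title "//a[@class='l']", :write_text => true do
    link_url
  end
  next_page "Next", :limit => 3
end

# Write one "title - url" line per result.
output_file = open("google_results.txt", 'w') do |f|
  google_data.to_hash.each do |result|
    f.puts "#{result[:link_title]} - #{result[:link_url]}"
  end
end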
produces:
Shop clothes for women, men, maternity, baby, and kids at gap.com …
HTH,
Peter
My program needs to do the following:
Navigate to the Google site, enter "ruby" as the search text, and click the
search button.
The results page then shows the first 10 results.
I would like to collect those 10 links and their titles and log them in
an output file.
Using the scRUBYt! extractor I achieved something: I captured all 10 links,
but I am unable to get the titles.
I also know how to extract in XML format,
but I need it this way: each title and its link on a single line.
My script goes here:
require 'rubygems'
require 'scrubyt'
google_data = Scrubyt::Extractor.define do
  # Perform the action(s)
  fetch 'http://www.google.com/'
  fill_textfield 'q', 'Gap Inc'
  submit
  # Construct the wrapper
  link "gap" do
    url "href", :type => :attribute
  end
  next_page "Next", :limit => 10
end
junk = google_data.to_xml
puts junk
Please help me out.
Please suggest any other way if this doesn't work out.
Thanks,
Sita.
Thank you very much, Peter. It served my purpose.
That's great to hear. If you have any scRUBYt!/scraping related
questions, don't hesitate to ask.
Cheers,
Peter
Peter,
Where can I find some good material on scRUBYt!/Ruby? Any preferred
sites?
Thanks,
Sita.
http://scrubyt.org - check out the older posts dealing with creating
scrapers for different pages.
Also check out the examples:
http://rubyforge.org/frs/download.php/46812/scrubyt-examples-0.4.05.tgz
More is on the way…
Cheers,
Peter
Hi Peter,
I need to fetch some information from http://www.ebay.in.
My required fields are : Name of the product, Image, Price and the link
to that product.
I am able to get the data using this method:
require 'rubygems'
require 'scrubyt'
google_data = Scrubyt::Extractor.define do
  fetch 'http://www.ebay.in'
  fill_textfield 'satitle', 'ipod shuffle'
  submit
  # One record per result row, located by an absolute path from the page root.
  record "/html/body/div[2]/div[4]/div[2]/div/div/div[2]/div[2]/div/div/div[3]/div/div/table/tr" do
    name "/td[2]/div/a"
    price "/td[5]"
    image "/td/a/img" do
      url "src", :type => :attribute
    end
    link "/td[2]/div/a" do
      url "href", :type => :attribute
    end
  end
end
google_data.to_xml.write($stdout, 1)
But my problem is that for some products it does not work properly (the
div structure may have changed). Is there a better solution for this?
Thanks in advance,
Vipin
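One common way to make such a scraper less brittle is to anchor the XPath on an attribute (an id or class) and use short relative paths inside each row, instead of counting div positions from /html/body. Below is a minimal sketch of that idea using REXML rather than scRUBYt!. The markup, the id "results", and the class names "item", "title" and "price" are all hypothetical, since the real eBay markup is not shown in this thread.

```ruby
require 'rexml/document'

# Hypothetical results markup; ids, classes, paths and prices are invented.
html = <<~HTML
  <html><body><div><div>
    <table id="results">
      <tr class="item">
        <td><a href="/itm/1"><img src="/img/1.jpg"/></a></td>
        <td class="title"><div><a href="/itm/1">iPod shuffle 1GB</a></div></td>
        <td class="price">Rs. 2,500</td>
      </tr>
      <tr class="item">
        <td><a href="/itm/2"><img src="/img/2.jpg"/></a></td>
        <td class="title"><div><a href="/itm/2">iPod shuffle 2GB</a></div></td>
        <td class="price">Rs. 3,200</td>
      </tr>
    </table>
  </div></div></body></html>
HTML

doc = REXML::Document.new(html)

# Find the rows by attribute, wherever the table sits in the page.
rows = REXML::XPath.match(doc, "//table[@id='results']//tr[@class='item']")

# Inside each row, use short paths relative to the row element.
records = rows.map do |row|
  link = REXML::XPath.first(row, "td[@class='title']//a")
  {
    name:  link.text,
    link:  link.attributes['href'],
    price: REXML::XPath.first(row, "td[@class='price']").text,
    image: REXML::XPath.first(row, ".//img").attributes['src']
  }
end

records.each { |r| puts "#{r[:name]} | #{r[:price]} | #{r[:link]} | #{r[:image]}" }
```

Wrapping or removing a div above the table does not affect these expressions, whereas the absolute path breaks as soon as one level changes. Whether the real page offers usable ids or classes has to be checked in its source.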
I also want to store the position of each result on the Google results page. Example:
rank 1 - Title - url
How can I change the code to do this?
Greetings, Remco
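One possible approach, sketched in plain Ruby: number the records while writing them out, using each.with_index(1). The sample records below are hypothetical stand-ins shaped like the :link_title / :link_url hashes in the script earlier in the thread. It also assumes the extractor returns results in page order, so that ranks keep counting across pages fetched with next_page.

```ruby
# Hypothetical records; in the real script these would come from
# google_data.to_hash.
results = [
  { link_title: 'First result',  link_url: 'http://example.com/one' },
  { link_title: 'Second result', link_url: 'http://example.com/two' },
  { link_title: 'Third result',  link_url: 'http://example.com/three' }
]

# with_index(1) starts counting at 1, so the index is the rank.
lines = results.each.with_index(1).map do |result, rank|
  "rank #{rank} - #{result[:link_title]} - #{result[:link_url]}"
end

puts lines
```

To log to a file instead, the same lines can be written with f.puts inside the File.open block.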