I’m very new to Ruby, and I can’t figure out something that is probably
quite basic. Any help is much appreciated!
When using Nokogiri and open-uri in Ruby, I define a constant containing
a partial URL (INITIAL_URL =
"https://zoek.officielebekendmakingen.nl/zoeken/resultaat/?zkt=Uitgebreid&pst=ParlementaireDocumenten")
so that I can append to it for each request (the full code is included
below).
However, I keep running into errors: "syntax error, unexpected tLABEL",
"unknown regexp options - zk", and "syntax error, unexpected '?'".
How can I fix this?
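All three messages are syntax errors, so they can be looked at without Nokogiri or any network access. Below is a minimal sketch that only checks whether the assignment parses. It assumes, without confirmation, that the quotes around the URL are typographic (curly) quotes, for example after copying the code from a web page or word processor. The helper name `syntax_ok?` is made up for illustration, and `RubyVM::InstructionSequence` is specific to CRuby.

```ruby
# Hypothetical helper: returns true if the source parses, false on a syntax error.
# RubyVM::InstructionSequence.compile only compiles the code; it does not run it.
def syntax_ok?(source)
  RubyVM::InstructionSequence.compile(source)
  true
rescue SyntaxError
  false
end

url = 'https://zoek.officielebekendmakingen.nl/zoeken/resultaat/?zkt=Uitgebreid&pst=ParlementaireDocumenten'

# The same assignment, once with straight quotes and once with curly quotes
# (U+201C and U+201D). The curly variant is an assumed cause, not a confirmed one.
straight_source = "INITIAL_URL = '#{url}'"
curly_source    = "INITIAL_URL = \u201C#{url}\u201D"

puts "straight quotes parse: #{syntax_ok?(straight_source)}"
puts "curly quotes parse:    #{syntax_ok?(curly_source)}"
```

If the curly-quote variant is the one that fails, Ruby is no longer seeing a string at all: it tries to read `https:` as a label and the slashes in the URL as regexp delimiters or division, which would be consistent with the messages quoted above.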
Here’s the full code:
irb
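# String literals need straight quotes (' or "); typographic quotes are not string delimiters in Ruby.
require 'nokogiri'
require 'open-uri'
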
# Download every document linked from one search results page.
def get_search_result_links(n_page)
  links = n_page.css('.linker-kolom li a')
  puts "** There were #{links.length} links found"
  links.each do |link|
    href = link['href']
    inner_url = 'https://zoek.officielebekendmakingen.nl' + href
    puts "\n\n\nFetching page at #{File.basename(inner_url).split('?')[0]}"
    # URI.open is needed on Ruby 3.0+, where Kernel#open no longer accepts URLs.
    datalezer = URI.open(inner_url).read
    lokalenieuwefilenaam = href + '.html'
    # The block form closes the file automatically.
    File.open(lokalenieuwefilenaam, 'w') { |file| file.write(datalezer) }
  end
end

INITIAL_URL = 'https://zoek.officielebekendmakingen.nl/zoeken/resultaat/?zkt=Uitgebreid&pst=ParlementaireDocumenten'
initial_page = Nokogiri::HTML(URI.open(INITIAL_URL))
pagination_links = initial_page.css('.paginering.beneden a')
# The code assumes the second-to-last pagination link points to the last page.
last_page_link = pagination_links[-2]
last_page_number = last_page_link.text.to_i

# The three-dot range excludes last_page_number itself; use 5..last_page_number to include it.
(5...last_page_number).each do |page_num|
  puts "\n\n\n***** Getting page #{page_num}"
  results_page_url = "#{INITIAL_URL}&_page=#{page_num}"
  results_page = Nokogiri::HTML(URI.open(results_page_url))
  get_search_result_links(results_page)
end
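
The URL building and file naming in the script can be tried out on their own, without fetching anything. The following is a rough sketch under stated assumptions: the helper names `results_page_url` and `local_filename`, the constant `BASE_URL`, and the sample href are all hypothetical and not part of the original script. The second helper is only a suggestion, since an href that starts with a slash or carries a query string makes an awkward local file name.

```ruby
# Hypothetical constant holding the same search URL as INITIAL_URL in the script.
BASE_URL = 'https://zoek.officielebekendmakingen.nl/zoeken/resultaat/?zkt=Uitgebreid&pst=ParlementaireDocumenten'

# Hypothetical helper: append the _page parameter, as the loop in the script does.
def results_page_url(base_url, page_num)
  "#{base_url}&_page=#{page_num}"
end

# Hypothetical helper: reduce an href to a plain file name by dropping the
# query string and any leading directories, then ensure an .html extension.
def local_filename(href)
  name = File.basename(href.split('?').first)
  name.end_with?('.html') ? name : "#{name}.html"
end

# A three-dot range excludes its end value, so this yields pages 5, 6 and 7.
page_urls = (5...8).map { |page_num| results_page_url(BASE_URL, page_num) }
puts page_urls

# The href below is invented for illustration only.
puts local_filename('/kst-12345-6.html?zoekcriteria=abc')
```

Keeping these two steps as small functions makes it possible to check the generated URLs and file names by eye before any request is sent.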