Hi All,
I am new to both scraping and multi-threading. Recently, I
had to use both in one of my applications. Please find my
code in the attachment. I am facing some peculiar issues and
I don’t know the reason. The code works well on my Ubuntu 7.04 machine but throws
“sysread” and “read_status_line” errors when run on Windows. What
could the problem be? Any help is greatly appreciated.
regards,
Venkat B.
There are a number of problems, among them:
- deriving from Monitor does not do anything for you in
this case, since you don’t use any of its locking
facilities (deriving from Monitor like this probably
isn’t a good approach anyway)
- WWW::Mechanize instances are not safe to share
between threads; it’s best to create a separate agent
per thread.
- Using the Timeout class on complicated libraries
can often break them. If you really need operations
to time out, it is best to see if the library provides
direct support for timeouts on its operations.
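
The last two points can be sketched roughly as below. This is only an illustration, not the code from the attachment: it uses the standard library's Net::HTTP as a stand-in for WWW::Mechanize so that it runs without extra gems, and build_client and the URLs are made-up names. Each thread builds its own client, and the timeouts are set through the library's own open_timeout and read_timeout settings instead of wrapping calls in Timeout.

```ruby
require 'net/http'
require 'uri'

# Build a fresh client for one thread (hypothetical helper).
# Nothing is connected until a request is actually made.
def build_client(url, open_timeout: 5, read_timeout: 10)
  uri = URI.parse(url)
  http = Net::HTTP.new(uri.host, uri.port)
  http.open_timeout = open_timeout   # seconds to wait for the connection
  http.read_timeout = read_timeout   # seconds to wait for each read
  http
end

# Hypothetical URLs, for illustration only.
urls = ['http://example.com/a.html', 'http://example.org/b.html']

# Each thread creates its own client; no client is shared between threads.
clients = urls.map { |url|
  Thread.new { build_client(url) }
}.map(&:value)
```

In a real scraper each thread would go on to issue its requests with its own client. As far as I know, Mechanize offers similarly named timeout settings, but check the documentation of the version you have installed.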
-mental
On Wed, Mar 5, 2008 at 3:10 AM, Venkat B. wrote:
Venkat B.
sysread uses a “low-level” read. In general, I wouldn’t be confident
that anything marked as “low-level” can be mixed well with
multi-threading. On my system, sysread is slower than ordinary read,
so I fail to see the advantage versus the standard IO#read.
Daniel Brumbaugh K.
On Mon, 2008-03-10 at 15:21 +0900, Venkat B. wrote:
I used a synchronized block in the “download_file(html_link)”
method. Doesn’t that count as locking? Am I wrong again?
In the code you gave, shared resources like @@html_links and @agent
aren’t protected at all.
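
One possible way to protect shared state like this, as a minimal sketch only: hand the links out through a Queue, which is thread-safe, and guard the shared results array with a Mutex. The fetch method and the link names are hypothetical stand-ins for the real download code, which I have not reproduced here.

```ruby
require 'thread' # Queue and Mutex (already loaded on current Rubies)

# Hypothetical stand-in for the real download; it only returns a label.
def fetch(link)
  "fetched #{link}"
end

# Queue takes the place of a shared, unprotected list of links.
links = Queue.new
%w[a.html b.html c.html d.html].each { |l| links << l }

results = []
results_lock = Mutex.new

workers = Array.new(2) do
  Thread.new do
    loop do
      link = begin
        links.pop(true)   # non-blocking pop raises ThreadError when empty
      rescue ThreadError
        break             # queue drained, so this worker is done
      end
      page = fetch(link)
      # Every write to the shared array goes through the same lock.
      results_lock.synchronize { results << page }
    end
  end
end
workers.each(&:join)
```

The order of results depends on thread scheduling, so sort it if the order matters.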
-mental
Mental G. wrote:
There are a number of problems, among them:
- deriving from Monitor does not do anything for you in
this case, since you don’t use any of its locking
facilities (deriving from Monitor like this probably
isn’t a good approach anyway)
-mental
I used a synchronized block in the “download_file(html_link)”
method. Doesn’t that count as locking? Am I wrong again?
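
For what it is worth, a synchronized block only helps when every access to the shared data goes through the same monitor. A rough sketch of that idea, assuming a hypothetical LinkStore class (not taken from the attached code) that mixes in MonitorMixin instead of deriving from Monitor:

```ruby
require 'monitor'

# Hypothetical class; holds the links that the threads share.
class LinkStore
  include MonitorMixin   # mix in rather than subclass Monitor

  def initialize(links)
    super()              # lets MonitorMixin set up its lock
    @links = links.dup
    @done = []
  end

  # Every method that touches @links or @done takes the same lock.
  def next_link
    synchronize { @links.shift }
  end

  def mark_done(link)
    synchronize { @done << link }
  end

  def done
    synchronize { @done.dup }
  end
end

store = LinkStore.new(%w[a.html b.html c.html])
threads = Array.new(3) do
  Thread.new do
    while (link = store.next_link)
      store.mark_done(link)
    end
  end
end
threads.each(&:join)
```

If one method locked and another read or wrote the same variables without synchronize, the lock would give no real protection.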