Is there a standard pattern for threaded access to a file?

On Oct 13, 5:17 pm, Eric H. [email protected] wrote:

On Oct 13, 2007, at 13:15 , Brian A. wrote:

On Oct 13, 1:32 pm, Eric H. [email protected] wrote:

On Oct 13, 2007, at 07:29 , Francis C. wrote:

Eric, are you reading/posting on comp.lang.ruby ? I don’t see Francis’
post, but both you and 7stud quoted him, so I’m wondering if it was
aggregated from somewhere else.

I use the one, true ruby-talk, the [email protected] mailing list.

Ok, I guess some posts get dropped in the ruby-talk → comp.lang.ruby
transition occasionally. Probably the other direction too I suppose.

On Oct 13, 2007, at 4:35 AM, Robert K. wrote:

we use the mechanism to pass the queue through

end

queue.enq queue
th.join
end

Have fun!

robert

you win the golf for sure - here’s something similar to what i’ve
used in production code:

a lines producer feeds chunks of lines to consuming threads. the
producer itself does not slurp the potentially huge log file into
memory at once, rather, it reads only ‘bufsize’ lines at a time.
consumers process ‘bufsize’ lines of the file at a time where ‘bufsize’
means that the number of lines yielded to the block will be that big at
most: near the end of a file it’s possible that consumers will be given
fewer than ‘bufsize’ lines to process

Lines::Producer.new :path => __FILE__, :bufsize => 10 do
  consumer :bufsize => 2 do |lines|
    lines.each{|line| puts line}
  end

  consumer :bufsize => 3 do |lines|
    lines.each{|line| puts line}
  end
end

Lines module and Producer/Consumer classes

BEGIN {
  require 'thread'

  module Lines
    class Error < ::StandardError
      class Starvation < Error; end
    end

    class Producer
      attr_reader :path, :bufsize

      def initialize options = {}, &block
        @path = String options[:path]
        # parenthesized so the default applies before Integer() is called
        @bufsize = Integer(options[:bufsize] || 1)
        produce(&block) if block
      end

      def produce &block
        setup
        configure(&block)
        [ new_buffered_reader, new_buffered_writer ].each{|t| t.join}
        teardown
      end

      def setup
        @consumers = []
        @sq = SizedQueue.new @bufsize
      end

      def configure &block
        instance_eval(&block)
      end

      # reads the file line by line; the sized queue blocks the reader
      # once 'bufsize' lines are waiting
      def new_buffered_reader
        Thread.new do
          Thread.current.abort_on_exception = true
          open(@path){|fd| fd.each{|line| @sq.push line}}
          @sq.push(:eof)
        end
      end

      # hands chunks of lines to each consumer in turn
      def new_buffered_writer
        Thread.new do
          Thread.current.abort_on_exception = true
          catch :eof do
            loop do
              @consumers.each do |consumer|
                chunk = []
                consumer.bufsize.times do
                  line = @sq.pop
                  if line == :eof
                    # deliver the final, possibly short, chunk before stopping
                    consumer << chunk unless chunk.empty?
                    throw :eof
                  end
                  chunk << line
                end
                consumer << chunk
              end
            end
          end
          notify_all :eof
        end
      end

      def notify_all msg = :eof
        @consumers.each{|consumer| consumer << msg}
      end

      def teardown
        @consumers.map{|consumer| consumer.wait}
      end

      def consumer options = {}, &block
        @consumers << Consumer.new(self, options, &block)
      end

      class Consumer
        attr_reader :bufsize

        def initialize producer, options = {}, &block
          @bufsize = Integer options[:bufsize]
          @producer = producer
          raise Error::Starvation unless @bufsize < @producer.bufsize
          @block = block
          @q = Queue.new
          @thread = new_thread
        end

        def << data
          @q.push data
        end

        def new_chunk
          Array.new bufsize
        end

        def new_thread
          Thread.new do
            Thread.current.abort_on_exception = true
            loop do
              data = @q.pop
              break if data == :eof
              @block.call data
            end
          end
        end

        def wait
          @thread.value
        end
      end
    end
  end
}

a @ http://codeforpeople.com/

On Sun, 14 Oct 2007 11:17:48 +0900, James Edward G. II wrote:

Here is the relevant header from the message you are discussing that
shows why it wasn’t gated:

Content-Type: multipart/alternative; boundary="----=_Part_28483_17627615.1192285743535"

I just checked out your “What is the ruby-talk” gateway; I didn’t
realize that the gateway currently dropped multipart/alternative.
That’s a shame.

Since I bear some responsibility for its evil popularity, I’ll volunteer
to update that gateway code to extract the text-part out of the
multipart if you can send it to me…

I should point out, though, that (a) it’s really not that hard
(text/plain is supposed to come first, so that even clients who didn’t
understand MIME would display the right thing before displaying the
wrong thing) and that (b) SpamAssassin doesn’t actually assign any
points for HTML e-mail - or, more accurately, it assigns zero points.

You say that “Some e-mails would be pretty non-trivial to handle
correctly”, but I’d be curious to see examples of those; by definition,
multipart/alternative contains a number of equivalent parts, and as long
as one of those parts is text/plain, you only have to extract that part.
That was the whole point of sending multipart/alternative, rather than
merely sending text/html and forcing people to downconvert. If there are
clients that send multipart/alternative, but don’t send a text/plain
subpart, they’re missing the point.
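
For anyone following along, here is a rough sketch of the "just extract the text/plain part" idea. It is not the gateway's code and not TMail; extract_text_plain is a made-up helper name, the boundary is assumed to have been read from the Content-Type header already, and nested multiparts and transfer encodings (base64, quoted-printable) are deliberately ignored.

```ruby
# Hypothetical helper: return the body of the first text/plain part of a
# multipart/alternative message body, or nil when there is none.
# Assumes `boundary` was already taken from the Content-Type header.
def extract_text_plain(raw_body, boundary)
  # Each part starts after a line of the form "--boundary"; the closing
  # delimiter is "--boundary--".
  delimiter = /^--#{Regexp.escape(boundary)}(?:--)?[ \t]*\r?\n?/

  raw_body.split(delimiter).each do |part|
    # Part headers and part body are separated by one blank line.
    headers, body = part.split(/\r?\n\r?\n/, 2)
    next if body.nil?  # preamble or empty fragment
    return body.strip if headers =~ /^Content-Type:\s*text\/plain/i
  end

  nil  # e.g. a multipart/alternative with no text/plain component
end
```

A message with only a text/html part returns nil here, which is the "missing the point" case the gateway would still have to reject or handle some other way.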

ara.t.howard wrote:

On Oct 13, 2007, at 11:43 AM, 7stud – wrote:

  1. Where are you checking a return value:

threads.map{|t| t.join}

i’m not, but in a real piece of code longer than 5 lines it would be

In fact, you discard map’s return value.

  2. How is map’s return value ever going to be different than your
    threads array?

ah - ‘join’ should indeed be ‘value’ there. sorry.

basically one should use Thread.current.abort_on_exception, check the
return values, or be prepared that threads may fail and you might not
know about it (which is obviously ok sometimes)
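
A small sketch of the join/value difference being argued about here, using only core Thread methods. The method names join_vs_value and failure_surfaces_on_value are made up for illustration, and report_on_exception assumes Ruby 2.4 or later (it did not exist when this thread was written).

```ruby
# Compare what Thread#join and Thread#value give back for the same workers.
def join_vs_value
  make_threads = lambda { (1..3).map { |i| Thread.new { i * 10 } } }

  joined = make_threads.call.map { |t| t.join }   # the Thread objects themselves
  values = make_threads.call.map { |t| t.value }  # each block's return value
  [joined, values]
end

# A thread that dies holds on to its exception; it is re-raised in the
# caller when the caller asks for the thread's value (join does the same).
def failure_surfaces_on_value
  t = Thread.new do
    Thread.current.report_on_exception = false  # keep stderr quiet (Ruby 2.4+)
    raise ArgumentError, "worker failed"
  end
  t.value
  :no_error
rescue ArgumentError => e
  e.message
end
```

So mapping over join hands back the same threads you started with, while mapping over value gives you something worth checking.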

a @ http://codeforpeople.com/

Ok, so let me get this straight:

First you post a poor example that is needlessly complex for a
beginner–and that won’t even work in the op’s situation.

Then, when someone points out some flaws in your code, you claim that
the proposed improvements are faulty and that your original code is
superior.

Finally, when someone pointedly asked how it’s possible your original
code does the things you claim it does, you refer to some imaginary
example that you would have posted.

On Oct 13, 2007, at 5:45 PM, Brian A. wrote:

I use the one, true ruby-talk, the [email protected] mailing
list.

Ok, I guess some posts get dropped in the ruby-talk → comp.lang.ruby
transition occasionally. Probably the other direction too I suppose.

Yes, I’ve written about this in the past:

http://blog.grayproductions.net/articles/what_is_the_ruby_talk_gateway

Here is the relevant header from the message you are discussing that
shows why it wasn’t gated:

Content-Type: multipart/alternative; boundary="----=_Part_28483_17627615.1192285743535"

James Edward G. II

On Oct 13, 10:17 pm, James Edward G. II [email protected]
wrote:

On Oct 13, 2007, at 5:45 PM, Brian A. wrote:

Ok, I guess some posts get dropped in the ruby-talk → comp.lang.ruby
transition occasionally. Probably the other direction too I suppose.

Yes, I’ve written about this in the past:

http://blog.grayproductions.net/articles/what_is_the_ruby_talk_gateway

Thanks for the info!

I wonder how many mailing list posters realize this. Do you have any
stats for the percentage of mailing list posts that don’t make it to
comp.lang.ruby?

If it’s common knowledge that one needs to post in text only, then I
don’t mind letting the gateway act as a filter for those who can’t
configure their mail client, but if the requirement is not widely
known, then I may be missing posts that I’d like to receive.

I personally much prefer usenet to mailing lists, so I’m reluctant to
switch to the mailing list for just this one group.

Also, do you have stats for the percent of messages originating from
usenet vs. from the mailing list?

On Oct 13, 2007, at 9:35 PM, Jay L. wrote:

that the gateway currently dropped multipart/alternative. That’s a
shame.

To be totally clear, our gateway doesn’t drop them. They are
forwarded to our Usenet host. Our host rejects them as invalid
Usenet posts.

Since I bear some responsibility for its evil popularity, I’ll
volunteer to update that gateway code to extract the text-part out
of the multipart if you can send it to me…

http://blog.grayproductions.net/articles/hacking_the_gateway

http://blog.grayproductions.net/articles/mail_to_newsrb

http://blog.grayproductions.net/articles/news_to_mailrb

I have a rewrite in progress that uses TMail for message handling.
One of my goals for this was to correctly separate the text portions
of multipart/alternative. I’ve just been distracted with work
deadlines and other short term projects, so I haven’t completed it yet.

I should point out, though, that (a) it’s really not that hard
(text/plain is supposed to come first, so that even clients who
didn’t understand MIME would display the right thing before
displaying the wrong thing)

I’ve seen some pretty crazy things in messages sent to Ruby T…
One of those is multipart/alternative with no text/plain component.
I don’t think there’s too much loss in not supporting such setups
though.

and that (b) SpamAssassin doesn’t actually assign any points for HTML
e-mail - or, more accurately, it assigns zero points.

My apologies. I thought for sure I had seen a reference to that
sometime in the past, but I’ve been unable to dig it up this
morning. I stand corrected.

James Edward G. II

On Mon, 15 Oct 2007 00:26:58 +0900, James Edward G. II wrote:

I have a rewrite in progress that uses TMail for message handling.
One of my goals for this was to correctly separate the text portions
of multipart/alternative. I’ve just been distracted with work
deadlines and other short term projects, so I haven’t completed it yet.

Sounds like it’d be more useful for me to help on the TMail version
(since that’s what I’d end up using anyway)… is that code posted
anywhere yet, or would you be willing to send/post it? It’s not in the
Gateway topic, and your search box is, uh, broken.

I’ve seen some pretty crazy things in messages sent to Ruby T…
One of those is multipart/alternative with no text/plain component.
I don’t think there’s too much loss in not supporting such setups
though.

Yeah, that’s just totally broken. I mean, it’s technically legal MIME,
but pointless.

and that (b) SpamAssassin doesn’t actually assign any points for HTML
e-mail - or, more accurately, it assigns zero points.

My apologies. I thought for sure I had seen a reference to that
sometime in the past, but I’ve been unable to dig it up this
morning. I stand corrected.

IIRC the rule used to have some points attached to it, but
somewhere along the way the mass-checks stopped determining it to be a
useful rule. That’s often what happens with SA; the scores are all
determined with some fancy AI code I think.

On Oct 13, 2007, at 11:40 PM, Brian A. wrote:

Do you have any stats for the percentage of mailing list posts that
don’t make it to comp.lang.ruby?

I just did a simple grep of the logs for a period of a little over
the last month. It looks like we average about eight rejected
messages a day (for an “HTML post” reason).

If it’s common knowledge that one needs to post in text only, then I
don’t mind letting the gateway act as a filter for those who can’t
configure their mail client, but if the requirement is not widely
known, then I may be missing posts that I’d like to receive.

Well, I wrote a blog post about it and reference it whenever the
discussion comes up. :wink:

Also, do you have stats for the percent of messages originating from
usenet vs. from the mailing list?

Over the same time period, the gateway saw 1,071 posts from Usenet
and 5,126 from the mailing list.

James Edward G. II

On Oct 14, 11:52 am, James Edward G. II [email protected]
wrote:

Over the same time period, the gateway saw 1,071 posts from Usenet
and 5,126 from the mailing list.

8 msgs/day * ~30 days / 5,126 is ~5%; that’s a little higher than I
was hoping :frowning: I’m also surprised by the mailing list : usenet ratio.

On Oct 12, 2007, at 5:51 PM, Jon H. wrote:

         blah blah blah...

     end

end

but that’s inside out! How do I rubify this code?

Thanks,

Jon

finally i remembered where this has been abstracted. in my own
lib : alib :wink:

cfp:~ > cat a.rb
require 'alib' ### gem install alib

alib.util.threadify IO.readlines(__FILE__), n_threads=5 do |line, lineno|

  puts "#{ lineno }:#{ line }"

end

cfp:~ > ruby a.rb
0:require 'alib'
1:
2:alib.util.threadify IO.readlines(__FILE__), n_threads=5 do |line, lineno|
3:
4:  puts "#{ lineno }:#{ line }"
5:
6:end

this works for any enumerable thing you want to process with ‘n’
backend threads.

the current (0.5.0) alib version will blow up if you give it an IO
object though, as it uses #size to calculate the return value. i’ll
tweak it and release 0.5.1 today.
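
For readers without alib installed, the general idea can be sketched with the standard Queue. This is a hypothetical stand-in, not alib's implementation: the threadify below is a plain method written for illustration, and it walks the input with each_with_index rather than #size, so under that assumption an IO object works as well as an Array.

```ruby
# Hypothetical stand-in, NOT alib's code: call the block for every item of
# `enumerable` on `n_threads` worker threads; results keep the input order.
def threadify(enumerable, n_threads = 5)
  jobs = Queue.new
  enumerable.each_with_index { |item, index| jobs << [item, index] }
  n_threads.times { jobs << nil }  # one stop marker per worker

  results = []
  lock = Mutex.new

  workers = Array.new(n_threads) do
    Thread.new do
      while (job = jobs.pop)       # nil ends this worker's loop
        item, index = job
        value = yield(item, index)
        lock.synchronize { results[index] = value }
      end
    end
  end

  workers.each { |t| t.value }     # re-raises if any worker failed
  results
end
```

Note that this version queues every item up front, so it trades memory for simplicity; a SizedQueue would be the obvious change for very large inputs.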

cheers.

a @ http://codeforpeople.com/

On Oct 14, 2007, at 2:25 PM, Brian A. wrote:

Also, do you have stats for the percent of messages originating from
usenet vs. from the mailing list?

Over the same time period, the gateway saw 1,071 posts from Usenet
and 5,126 from the mailing list.

8 msgs/day * ~30 days / 5,126 is ~5%; that’s a little higher than I
was hoping :frowning:

The log I used actually went back to 9-6-2007, so it was closer to 38
days, but yeah. It was higher than I would have guessed too.
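
Redoing the estimate with both window lengths, as a quick sketch. It assumes, as the estimate above does, that the 5,126 mailing list posts are the right denominator; rejection_rate is just an illustrative name.

```ruby
# Percentage of mailing list posts rejected for an "HTML post" reason,
# given an average daily rejection count over a window of `days`.
def rejection_rate(rejected_per_day, days, mailing_list_posts)
  (rejected_per_day * days * 100.0 / mailing_list_posts).round(1)
end

puts rejection_rate(8, 30, 5_126)  # the ~30 day guess
puts rejection_rate(8, 38, 5_126)  # the log actually covered about 38 days
```

With the longer window the figure comes out closer to 6% than 5%.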

James Edward G. II

On Mon, 15 Oct 2007 08:22:38 +0900, James Edward G. II wrote:

It’s not yet online. I am happy to put it up, sure. I really need
to get through two projects before I get to that though. Please give
me a few weeks.

Sure. I’ll e-mail you in a few weeks, and embed a Flash movie reminder
using ActiveX.

On Oct 14, 2007, at 11:35 AM, Jay L. wrote:

http://blog.grayproductions.net/articles/news_to_mailrb

I have a rewrite in progress that uses TMail for message handling.
One of my goals for this was to correctly separate the text portions
of multipart/alternative. I’ve just been distracted with work
deadlines and other short term projects, so I haven’t completed it
yet.

Sounds like it’d be more useful for me to help on the TMail version
(since that’s what I’d end up using anyway)… is that code posted
anywhere yet, or would you be willing to send/post it?

It’s not yet online. I am happy to put it up, sure. I really need
to get through two projects before I get to that though. Please give
me a few weeks.

It’s not in the Gateway topic, and your search box is, uh, broken.

It seems to work OK for me. Feel free to email me the details
off-list and I’ll sure try to fix it.

I’ve seen some pretty crazy things in messages sent to Ruby T…
One of those is multipart/alternative with no text/plain component.
I don’t think there’s too much loss in not supporting such setups
though.

Yeah, that’s just totally broken. I mean, it’s technically legal
MIME, but pointless.

We see quite a few broken posts pass through the gateway in quite a
few different ways. Welcome to the Internet. :wink:

James Edward G. II

Francis C. wrote:

On 10/12/07, Jon H. [email protected] wrote:

  1. Open the file
  2. Create 5 threads

Each thread should read a line of the file and process it, but no 2
threads should get the same line.

Why are you doing this in the first place? Do you have a computer with
five processors and five memory buses?

I’m stress-testing my http server… Since most of the time is spent
waiting for requests to go and come back, multiple threads on the
sending end allows for greater throughput (to a point anyway). The
number 5 was just an example of n where n > 1.

Jon

7stud – wrote:

Actually, the example provided won’t even work in your case. You have
to do some extra things.

I’m pretty new to ruby

A Queue is a first in first out container, which means the items you
push() into one end of the Queue are the first items that pop() out the
other end. A Queue is also thread safe, which means that only one
thread can access it at the same time.
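
A minimal sketch of that Queue idea applied to the original question (several threads, each taking lines from one file, no line handled twice). The name process_lines, the :done marker and the queue size are assumptions made for illustration; only Queue, SizedQueue, Thread and File.foreach come from Ruby itself.

```ruby
require 'thread'  # needed on old Rubies; Queue is built in on current ones

# Process each line of the file at `path` on `n_threads` worker threads.
# Each Queue#pop hands an item to exactly one caller, so no two workers
# ever receive the same line.
def process_lines(path, n_threads = 5)
  queue = SizedQueue.new(n_threads * 2)  # bounded: the file is never fully in memory
  results = Queue.new

  workers = Array.new(n_threads) do
    Thread.new do
      while (line = queue.pop) != :done
        results << yield(line)
      end
    end
  end

  File.foreach(path) { |line| queue << line }  # the main thread is the producer
  n_threads.times { queue << :done }           # one stop marker per worker
  workers.each { |t| t.value }                 # re-raises if a worker failed

  Array.new(results.size) { results.pop }      # completion order, not file order
end
```

For the stress-test case the block would make the HTTP request for its line; the results come back in whatever order the workers finished.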

Slapping forehead… of course! Producer/Consumer = q.

Thanks!

Wow!

Thanks everyone for the detailed and insightful help!

Cheers,

Jon