Eric, are you reading/posting on comp.lang.ruby? I don't see Francis' post, but both you and 7stud quoted him, so I'm wondering if it was aggregated from somewhere else.
you win the golf for sure - here's something similar to what i've used in production code:
a lines producer feeds chunks of lines to consuming threads. the producer itself does not slurp the potentially huge log file into memory at once; rather, it reads only 'bufsize' lines at a time. consumers process 'bufsize' lines of the file at a time, where 'bufsize' means that the number of lines yielded to the block will be that big at most: near the end of a file it's possible that consumers will be given fewer than 'bufsize' lines to process.
Lines::Producer.new :path => __FILE__, :bufsize => 10 do
  consumer :bufsize => 2 do |lines|
    lines.each{|line| puts line}
  end
  consumer :bufsize => 3 do |lines|
    lines.each{|line| puts line}
  end
end
# Lines module and Producer/Consumer classes
BEGIN {
  require 'thread'

  module Lines
    class Error < ::StandardError
      class Starvation < Error; end
    end

    class Producer
      %w[ path bufsize ].each{|a| attr a}

      def initialize options = {}, &block
        @path = String options[:path]
        @bufsize = Integer options[:bufsize] || 1
        produce &block if block
      end

      def produce &block
        setup
        configure &block
        [ new_buffered_reader, new_buffered_writer ].each{|t| t.join}
        teardown
      end

      def setup
        @consumers = []
        @sq = SizedQueue.new @bufsize
      end

      def configure &block
        instance_eval &block
      end

      # reads the file line by line; the SizedQueue blocks once 'bufsize'
      # lines are waiting, so the file is never slurped into memory
      def new_buffered_reader
        Thread.new do
          Thread.current.abort_on_exception = true
          File.open(@path){|fd| fd.each{|line| @sq.push line}}
          @sq.push(:eof)
        end
      end

      # hands each consumer, in turn, a chunk of at most consumer.bufsize lines
      def new_buffered_writer
        Thread.new do
          Thread.current.abort_on_exception = true
          catch :eof do
            loop do
              @consumers.each do |consumer|
                chunk = []
                consumer.bufsize.times do
                  line = @sq.pop
                  if line == :eof
                    # deliver the short final chunk instead of dropping it
                    consumer << chunk unless chunk.empty?
                    throw :eof
                  end
                  chunk << line
                end
                consumer << chunk
              end
            end
          end
          notify_all :eof
        end
      end

      def notify_all msg = :eof
        @consumers.each{|consumer| consumer << msg}
      end

      def teardown
        @consumers.map{|consumer| consumer.wait}
      end

      def consumer options = {}, &block
        @consumers << Consumer.new(self, options, &block)
      end

      class Consumer
        attr 'bufsize'

        def initialize producer, options = {}, &block
          @bufsize = Integer options[:bufsize]
          @producer = producer
          raise Error::Starvation unless @bufsize < @producer.bufsize
          @block = block
          @q = Queue.new
          @thread = new_thread
        end

        def << data
          @q.push data
        end

        def new_chunk
          Array.new bufsize
        end

        # pops chunks until the producer sends :eof
        def new_thread
          Thread.new do
            Thread.current.abort_on_exception = true
            loop do
              data = @q.pop
              break if data == :eof
              @block.call data
            end
          end
        end

        def wait
          @thread.value
        end
      end
    end
  end
}
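The chunking contract described above (each consumer gets at most 'bufsize' lines per call, and the final chunk may be shorter) can be seen in isolation with Enumerable#each_slice. This is a minimal sketch, not the Lines code itself. A StringIO holding seven made-up lines stands in for the log file.

```ruby
require 'stringio'

# Hypothetical stand-in for the log file: seven lines in memory.
io = StringIO.new((1..7).map { |i| "line #{i}\n" }.join)

chunks = []
# each_line without a block returns an Enumerator, so each_slice pulls
# lines from the IO as it goes instead of reading the whole file first.
io.each_line.each_slice(3) { |chunk| chunks << chunk }

p chunks.map(&:size)   # prints [3, 3, 1]
```

The last chunk holds only the one line left over. This is the same "fewer than bufsize near the end" case that the writer thread handles when it sees :eof mid-chunk.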
I just checked out your "What is the ruby-talk" gateway; I didn't realize that the gateway currently dropped multipart/alternative. That's a shame.
Since I bear some responsibility for its evil popularity, I'll volunteer to update that gateway code to extract the text-part out of the multipart if you can send it to me…
I should point out, though, that (a) it's really not that hard (text/plain is supposed to come first, so that even clients who didn't understand MIME would display the right thing before displaying the wrong thing) and that (b) SpamAssassin doesn't actually assign any points for HTML e-mail - or, more accurately, it assigns zero points.
You say that "Some e-mails would be pretty non-trivial to handle correctly", but I'd be curious to see examples of those; by definition, multipart/alternative contains a number of equivalent parts, and as long as one of those parts is text/plain, you only have to extract that part. That was the whole point of sending multipart/alternative, rather than merely sending text/html and forcing people to downconvert. If there are clients that send multipart/alternative, but don't send a text/plain subpart, they're missing the point.
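The "just pick the text/plain alternative" rule can be sketched as follows. This assumes a hypothetical, already-parsed message modelled as an ordered list of Part structs (content type plus body). Real MIME parsing (boundaries, transfer encodings, charsets) is deliberately left out, and this is not the gateway's code or TMail's API.

```ruby
# Hypothetical, simplified model of one multipart/alternative body part.
Part = Struct.new(:content_type, :body)

# Return the body of the first text/plain alternative, or nil when the
# sender omitted one (the "legal but pointless" case discussed in the thread).
def plain_text_alternative(parts)
  plain = parts.find { |part| part.content_type.downcase.start_with?("text/plain") }
  plain && plain.body
end

message = [
  Part.new("text/plain; charset=us-ascii", "Hello, ruby-talk!"),
  Part.new("text/html; charset=us-ascii", "<p>Hello, <b>ruby-talk</b>!</p>")
]
puts plain_text_alternative(message)   # prints Hello, ruby-talk!
```

Because the parts are equivalent renderings, choosing one discards no information. The only case left over is a message whose alternatives contain no text/plain at all, where this sketch returns nil.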
i’m not, but in a real piece of code longer than 5 lines it would be
In fact, you discard map’s return value.
How is map's return value ever going to be different than your threads array?
ah - ‘join’ should indeed be ‘value’ there. sorry.
basically one should use Thread.current.abort_on_exception, check the return values, or be prepared for threads to fail without your knowing about it (which is obviously ok sometimes).
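Checking return values with Thread#value looks roughly like this. It is a minimal sketch: #value waits for the thread and returns its block's result, or re-raises the exception that killed it. Thread#join also re-raises, but it returns the Thread object itself, which is why mapping over #join just gives back the threads array. The ArgumentError and its message are made up for the example.

```ruby
# Each worker returns a result; #value collects them in order.
threads = (1..3).map do |i|
  Thread.new(i) { |n| n * n }
end
squares = threads.map(&:value)   # => [1, 4, 9]

failing = Thread.new do
  # Silence the default "terminated with exception" report on stderr;
  # the error is surfaced below through #value instead.
  Thread.current.report_on_exception = false
  raise ArgumentError, "bad input"
end

caught = nil
begin
  failing.value                  # re-raises the worker's ArgumentError here
rescue ArgumentError => e
  caught = e.message
end
puts "worker failed: #{caught}"  # prints worker failed: bad input
```

Setting abort_on_exception instead takes the whole program down as soon as any worker raises. Checking #value lets the caller decide what to do with each failure.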
First you post a poor example that is needlessly complex for a beginner, and that won't even work in the op's situation.
Then, when someone points out some flaws in your code, you claim that the proposed improvements are faulty and that your original code is superior.
Finally, when someone pointedly asks how it's possible your original code does the things you claim it does, you refer to some imaginary example that you would have posted.
I wonder how many mailing list posters realize this. Do you have any
stats for the percentage of mailing list posts that don’t make it to
comp.lang.ruby?
If it’s common knowledge that one needs to post in text only, then I
don’t mind letting the gateway act as a filter for those who can’t
configure their mail client, but if the requirement is not widely
known, then I may be missing posts that I’d like to receive.
I personally much prefer usenet to mailing lists, so I’m reluctant to
switch to the mailing list for just this one group.
Also, do you have stats for the percent of messages originating from
usenet vs. from the mailing list?
that the gateway currently dropped multipart/alternative. That’s a
shame.
To be totally clear, our gateway doesn’t drop them. They are
forwarded to our Usenet host. Our host rejects them as invalid
Usenet posts.
Since I bear some responsibility for its evil popularity, I’ll
volunteer to update that gateway code to extract the text-part out
of the multipart if you can send it to me…
I have a rewrite in progress that uses TMail for message handling. One of my goals for this was to correctly separate the text portions of multipart/alternative. I've just been distracted with work deadlines and other short-term projects, so I haven't completed it yet.
I should point out, though, that (a) it’s really not that hard
(text/plain is supposed to come first, so that even clients who
didn’t understand MIME would display the right thing before
displaying the wrong thing)
I’ve seen some pretty crazy things in messages sent to Ruby T…
One of those is multipart/alternative with no text/plain component.
I don’t think there’s too much loss in not supporting such setups
though.
and that (b) SpamAssassin doesn't actually assign any points for HTML e-mail - or, more accurately, it assigns zero points.
My apologies. I thought for sure I had seen a reference to that
sometime in the past, but I’ve been unable to dig it up this
morning. I stand corrected.
On Mon, 15 Oct 2007 00:26:58 +0900, James Edward G. II wrote:
I have a rewrite in progress that uses TMail for message handling. One of my goals for this was to correctly separate the text portions of multipart/alternative. I've just been distracted with work deadlines and other short-term projects, so I haven't completed it yet.
Sounds like it'd be more useful for me to help on the TMail version (since that's what I'd end up using anyway)… is that code posted anywhere yet, or would you be willing to send/post it? It's not in the Gateway topic, and your search box is, uh, broken.
I’ve seen some pretty crazy things in messages sent to Ruby T…
One of those is multipart/alternative with no text/plain component.
I don’t think there’s too much loss in not supporting such setups
though.
Yeah, that's just totally broken. I mean, it's technically legal MIME, but pointless.
and that (b) SpamAssassin doesn't actually assign any points for HTML e-mail - or, more accurately, it assigns zero points.
My apologies. I thought for sure I had seen a reference to that
sometime in the past, but I’ve been unable to dig it up this
morning. I stand corrected.
IIRC the rule used to have some points attached to it, but somewhere along the way the mass-checks stopped determining it to be a useful rule. That's often what happens with SA; I think the scores are all determined by some fancy AI code.
Do you have any stats for the percentage of mailing list posts that
don’t make it to comp.lang.ruby?
I just did a simple grep of the logs for a period of a little over
the last month. It looks like we average about eight rejected
messages a day (for an “HTML post” reason).
If it’s common knowledge that one needs to post in text only, then I
don’t mind letting the gateway act as a filter for those who can’t
configure their mail client, but if the requirement is not widely
known, then I may be missing posts that I’d like to receive.
Well, I wrote a blog post about it and reference it whenever the
discussion comes up.
Also, do you have stats for the percent of messages originating from
usenet vs. from the mailing list?
Over the same time period, the gateway saw 1,071 posts from Usenet
and 5,126 from the mailing list.
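Turned into the percentages that were asked for, those two counts work out as follows. This is simple arithmetic on the numbers above only; it says nothing about the roughly eight rejected HTML posts a day, which are not in these totals.

```ruby
usenet       = 1_071
mailing_list = 5_126
total        = usenet + mailing_list                      # 6197

usenet_pct       = (100.0 * usenet / total).round(1)       # 17.3
mailing_list_pct = (100.0 * mailing_list / total).round(1) # 82.7
```

So roughly one gateway post in six originated on Usenet over that period.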
this works for any enumerable thing you want to process with ‘n’
backend threads.
the current (0.5.0) alib version will blow up if you give it an IO
object though, as it uses #size to calculate the return value. i’ll
tweak it and release 0.5.1 today.
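One way to process any Enumerable with 'n' backend threads without calling #size is sketched below. This is a hypothetical helper, process_in_threads, not alib's actual API or its 0.5.1 fix. It only walks the input with #each_with_index and counts items as it goes, so an IO works as well as an Array, and results come back in input order.

```ruby
require 'thread'
require 'stringio'

# Hypothetical helper (not alib's API): run the block over every element
# of `enum` using `n` worker threads; results keep the input order.
def process_in_threads(enum, n, &block)
  queue   = SizedQueue.new(n * 2)  # bounded, so a huge IO is not slurped
  results = {}
  lock    = Mutex.new

  workers = Array.new(n) do
    Thread.new do
      while (job = queue.pop) != :done
        item, index = job
        value = block.call(item)
        lock.synchronize { results[index] = value }
      end
    end
  end

  count = 0                        # counted while feeding; #size never used
  enum.each_with_index do |item, index|
    queue.push([item, index])
    count += 1
  end
  n.times { queue.push(:done) }    # one stop marker per worker
  workers.each(&:join)

  Array.new(count) { |i| results[i] }
end

io = StringIO.new("a\nb\nc\n")
upcased = process_in_threads(io, 2) { |line| line.chomp.upcase }
p upcased   # prints ["A", "B", "C"]
```

Pairing each item with its index is what lets results come back in order even though the workers finish in arbitrary order. Error handling is deliberately minimal: an exception in the block re-raises from #join.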
I have a rewrite in progress that uses TMail for message handling. One of my goals for this was to correctly separate the text portions of multipart/alternative. I've just been distracted with work deadlines and other short-term projects, so I haven't completed it yet.
Sounds like it’d be more useful for me to help on the TMail version
(since that’s what I’d end up using anyway)… is that code posted
anywhere yet, or would you be willing to send/post it?
It’s not yet online. I am happy to put it up, sure. I really need
to get through two projects before I get to that though. Please give
me a few weeks.
It’s not in the Gateway topic, and your search box is, uh, broken.
It seems to work OK for me. Feel free to email me the details off-list and I'll sure try to fix it.
I’ve seen some pretty crazy things in messages sent to Ruby T…
One of those is multipart/alternative with no text/plain component.
I don’t think there’s too much loss in not supporting such setups
though.
Yeah, that’s just totally broken. I mean, it’s technically legal
MIME, but pointless.
We see quite a few broken posts pass through the gateway in quite a
few different ways. Welcome to the Internet.
Each thread should read a line of the file and process it, but no 2
threads should get the same line.
Why are you doing this in the first place? Do you have a computer with five processors and five memory buses?
I’m stress-testing my http server… Since most of the time is spent
waiting for requests to go and come back, multiple threads on the
sending end allow for greater throughput (to a point, anyway). The
number 5 was just an example of n where n > 1.
Actually, the example provided won’t even work in your case. You have
to do some extra things.
I’m pretty new to ruby
A Queue is a first-in, first-out container, which means the first items you push() into one end of the Queue are the first items that pop() out the other end. A Queue is also thread safe: it synchronizes access internally, so several threads can push() and pop() at the same time without corrupting it or handing the same item out twice.
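Both properties are easy to check directly. This minimal sketch shows FIFO order with a single thread, then four threads pushing concurrently into one Queue without losing any items. The item counts are arbitrary. Queue#pop also blocks when the queue is empty, which is what lets consumer threads simply wait for work.

```ruby
require 'thread'

# FIFO: items come out in the order they went in.
q = Queue.new
q.push(:first)
q.push(:second)
q.push(:third)
order = [q.pop, q.pop, q.pop]   # => [:first, :second, :third]

# Thread safety: four threads pushing at once still lose nothing.
shared  = Queue.new
pushers = Array.new(4) do |t|
  Thread.new { 250.times { |i| shared.push([t, i]) } }
end
pushers.each(&:join)
total = shared.size             # => 1000
```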
Slapping forehead… of course! Producer/Consumer = q.
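For the original question (n threads, each line handled by exactly one thread), a bare-bones producer/consumer sketch follows. run_workers is a hypothetical name, and the "work" here only records which thread saw which line. In the stress-testing case it would send the HTTP request instead. The main thread is the producer, and one :eof marker per worker tells each consumer to stop.

```ruby
require 'thread'
require 'stringio'

# Hypothetical sketch: n worker threads share one queue of lines.
# Queue#pop hands each line to exactly one thread.
def run_workers(io, n)
  queue = SizedQueue.new(n * 2)   # bounded, so the file is not slurped
  seen  = Queue.new               # thread-safe place to collect results

  workers = Array.new(n) do |id|
    Thread.new do
      while (line = queue.pop) != :eof
        seen.push([id, line.chomp])   # real work would go here
      end
    end
  end

  io.each_line { |line| queue.push(line) }
  n.times { queue.push(:eof) }    # one stop marker per worker
  workers.each(&:join)

  Array.new(seen.size) { seen.pop }
end

input   = StringIO.new((1..20).map { |i| "line #{i}\n" }.join)
handled = run_workers(input, 5)
```

Every line shows up in handled exactly once, tagged with whichever worker happened to pop it. Which worker gets which line varies from run to run.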