Please explain in English

SSkonishi_takurou_S · January 28, 2013, 6:39pm

I’m learning Ruby and trying to read an expression that I saw on the
forum. I’m coming from JavaScript, and this is really hard for me. Please
help explain it to me in plain English. I understand that it’s a function
that takes a string and counts the words, returning a Hash.

def count_words(string)
  res = Hash.new(0)
  string.downcase.scan(/\w+/).map{|word| res[word] = string.downcase.scan(/\b#{word}\b/).size}
  return res
end

narazana · January 28, 2013, 6:46pm

For those more versed than myself, I have a follow-on question (thanks
for posting this, Jooma).

In this example can’t you get rid of return res?

Wayne

narazana · January 28, 2013, 6:55pm

On Mon, Jan 28, 2013 at 6:39 PM, jooma lavata [email protected]
wrote:

That’s not a very idiomatic way, because the result of the map
function, which returns an array, is ignored. This signals that map is
not the correct method to use. Now, with that said:

string.downcase #=> returns a new string with all the characters
downcased
.scan(/\w+/) #=> returns an array of strings, one for each match of the
regular expression. \w+ means one or more word characters, so this
returns an array of words.
.map #=> returns a new array where each position is filled with the
result of invoking the block with each element of the array. Example:

[1,2,3].map {|x| "x is #{x}"} #=> ["x is 1", "x is 2", "x is 3"]

res[word] = string.downcase.scan(/\b#{word}\b/).size

What this means is: take the string, downcase it again, scan it for
the current word surrounded by word boundaries (so, whole words only),
take the size of the resulting array and store it in the hash under the
key for this word.
This is extremely inefficient. For each word it downcases the string
again and then scans the full string again for that word, even though
the outer scan has already walked through it. With N words that makes
roughly N passes over the string, so this seems to be O(N^2), where a
single pass should suffice. Also, the block-less form of scan and using
map like that create many intermediate objects that are never used.

I’d do something like:

res = Hash.new(0)
string.downcase.scan(/\w+/) {|word| res[word] += 1}
return res

This uses the block form of scan, which instead of building an array,
just yields each match to the block. Since we are not doing anything
with that array, this is more efficient. We take advantage of the
default value of hash, which is set to 0, to just increment the count
for each word.
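
For anyone following along, here is how that might look wrapped back
into the method from the original post. This is only a sketch: the name
count_words comes from the first post, and the sample sentence is made up.

```ruby
# Single-pass word count using the block form of String#scan.
def count_words(string)
  res = Hash.new(0)   # missing keys read as 0
  string.downcase.scan(/\w+/) { |word| res[word] += 1 }
  res                 # the last expression is the method's return value
end

counts = count_words("The quick fox jumps over the lazy fox")
p counts
```

The `res[word] += 1` line only works for a word seen for the first time
because Hash.new(0) supplies 0 for missing keys; with a plain {} it
would raise NoMethodError on nil.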

Hope this helps,

Jesus.

narazana · January 28, 2013, 6:57pm

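Regex is critical to this one. \w matches a word character (\b is the
word boundary). Scan returns an array of everything that matches that
regex.

Downcase isn’t necessary if all you want is the total number of words;
that count is the same either way. It only matters when counting per
word, where it makes “The” and “the” land under the same key.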

Now if you just want to count words you don’t even need that hash. If
you’re trying to count instances of words that’s a different story.
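
A small sketch of what scan returns and how \w and \b behave, in case it
helps. The sample strings and variable names are made up for
illustration.

```ruby
text = "The cat saw the other cat"

# \w+ matches runs of word characters; without a block, scan returns an Array.
words = text.scan(/\w+/)
total_words = words.size

# \b is the zero-width word boundary, so /\bcat\b/ only matches the whole word.
whole_word_hits = "cat cats concat".scan(/\bcat\b/).size

# Downcasing does not change the total, but it changes per-word counts.
the_with_case    = text.scan(/\w+/).count("the")
the_without_case = text.downcase.scan(/\w+/).count("the")

puts "total=#{total_words} whole_word_hits=#{whole_word_hits}"
puts "the (case kept)=#{the_with_case}, the (downcased)=#{the_without_case}"
```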

Suggested reading: Enumerables, Blocks, Scan, Inject, and Reduce.

Enumerable covers most of those. Read the Ruby docs.

Seeing as I’m on my phone at the moment, could someone else rewrite that
code a bit? It’d look all types of funky if I did it right now.

Cheers.

narazana · January 28, 2013, 6:58pm

Never mind… Now I see what’s going on (I just had to run it in irb and
look at the results with and without the return res).

narazana · January 28, 2013, 6:59pm

On Mon, Jan 28, 2013 at 6:45 PM, Wayne B. [email protected]
wrote:

For those more versed than myself, I have a follow-on question (thanks for
posting this, Jooma).

In this example can’t you get rid of return res?

You could, using inject, but some people might say this is less
readable. It also goes back to the block-less scan, which creates an
intermediate array that is not really needed:

string.downcase.scan(/\w+/).inject(Hash.new(0)) {|h, word| h[word] += 1; h}
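
For comparison, a sketch of the inject version next to
each_with_object, which was not mentioned in the thread but does the
same job: it passes the same accumulator to every iteration, so the
block does not need the trailing `h`. The method names here are
hypothetical.

```ruby
# inject: the block's return value becomes the next accumulator,
# hence the trailing "; h".
def count_words_inject(string)
  string.downcase.scan(/\w+/).inject(Hash.new(0)) { |h, word| h[word] += 1; h }
end

# each_with_object: the accumulator is passed along unchanged,
# and note the block arguments are in the opposite order.
def count_words_ewo(string)
  string.downcase.scan(/\w+/).each_with_object(Hash.new(0)) { |word, h| h[word] += 1 }
end

p count_words_inject("a b a")
p count_words_ewo("a b a")
```

Both still build the intermediate array from scan, so neither is likely
to beat the block form of scan.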

Jesus.

narazana · January 28, 2013, 7:00pm

I’ll try to break it down, let us know if there’s anything further that
needs clarifying.

# Declare a method with one argument
def count_words(string)

# Create an empty Hash (aka dictionary) to fill in later; the 0 is the
# default value returned for missing keys
res = Hash.new(0)

# Convert the whole string to lowercase (returns a new object, doesn't
# modify in place)
string.downcase

# Use a regex to return all the words as an array ("+" means one or more
# word characters, so each match runs until a non-word character)
.scan(/\w+/)

# Iterate through each of the words and return (map) a new array (which
# isn't used in this case)
.map{|word|

# Populate the hash on each iteration (overwriting existing values)
res[word] =

# Get the "size" of the array returned by searching the string for all
# instances of the current word
string.downcase.scan(/\b#{word}\b/).size}

# Explicitly return the hash ("return" isn't strictly required as this is
# the last expression)
return res
end

I can’t help feeling that there is a more efficient way to do this,
given that the loop needlessly iterates multiple times over the
duplicates.

This does the same thing (not sure whether it’s faster):

def count_words(string)
  res = {}
  string.downcase!
  string.scan( /\w+/ ).uniq.each{ |word| res[word] = string.scan(/\b#{word}\b/).size }
  res
end

narazana · January 28, 2013, 9:15pm

On Mon, Jan 28, 2013 at 7:16 PM, Joel P. [email protected]
wrote:

“Jesús Gabriel y Galán” [email protected] wrote in post
#1094106:

string.downcase.scan(/\w+/) {|word| res[word] += 1}

I tried benchmarking out of curiosity and that is a lot faster! Nicely
done.

I guess the reason is that you avoid the intermediate arrays.

Jesus.

narazana · January 28, 2013, 7:16pm

“Jesús Gabriel y Galán” [email protected] wrote in post
#1094106:

string.downcase.scan(/\w+/) {|word| res[word] += 1}

I tried benchmarking out of curiosity and that is a lot faster! Nicely
done.

narazana · January 29, 2013, 10:25am

On Jan 28, 2013, at 10:01 , Joel P. [email protected] wrote:

def count_words(string)
  res = {}
  string.downcase!
  string.scan( /\w+/ ).uniq.each{ |word| res[word] = string.scan(/\b#{word}\b/).size }
  res
end

This modifies the argument coming in. Don’t ever call downcase! or other
mutating methods on an argument or you’ll wind up in debugging hell.
Make a copy instead:

string = string.downcase
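
Applied to the method quoted above, that advice might look like the
sketch below. Only the downcase line differs from the quoted code; the
sample string and the `original` variable are made up to show that the
caller's string survives.

```ruby
def count_words(string)
  string = string.downcase   # new String; the caller's object is left alone
  res = {}
  string.scan(/\w+/).uniq.each { |word| res[word] = string.scan(/\b#{word}\b/).size }
  res
end

original = "Hello hello World"
counts = count_words(original)

p counts
p original   # unchanged by the call
```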

narazana · January 29, 2013, 10:39am

Ryan D. wrote in post #1094171:

string = string.downcase

Thanks, I thought that those two things were equivalent.
Doesn’t string = string.downcase overwrite the argument string anyway?

narazana · January 29, 2013, 11:45am

On Mon, Jan 28, 2013 at 6:55 PM, Jesús Gabriel y Galán
[email protected] wrote:

res = Hash.new(0)
string.downcase.scan(/\w+/) {|word| res[word] += 1}
return res

And to answer Wayne’s question of how to get rid of the “return”:

Hash.new(0).tap do |res|
  string.downcase.scan(/\w+/) {|word| res[word] += 1}
end
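
Dropped into the method from the original post, the tap version might
look like this sketch (the method name is from the first post, the
sample input is made up). tap yields the hash to the block and then
returns the hash itself, so no separate `res` line is needed at the end.

```ruby
def count_words(string)
  # tap returns its receiver (the hash), not the block's result
  Hash.new(0).tap do |res|
    string.downcase.scan(/\w+/) { |word| res[word] += 1 }
  end
end

p count_words("a b a")
```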

Kind regards

robert

narazana · January 29, 2013, 10:25am

On Jan 28, 2013, at 12:13, Jesús Gabriel y Galán
[email protected] wrote:

On Mon, Jan 28, 2013 at 7:16 PM, Joel P. [email protected] wrote:

“Jesús Gabriel y Galán” [email protected] wrote in post
#1094106:

string.downcase.scan(/\w+/) {|word| res[word] += 1}

I tried benchmarking out of curiosity and that is a lot faster! Nicely
done.

I guess the reason is that you avoid the intermediate arrays.

I suspect only scanning once is much more important than the extra
arrays.
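
A rough way to check this yourself is to time the re-scanning version
against the single-pass one. This is only a sketch: the method names,
the time_it helper and the sample text are all made up, and the numbers
will vary by machine and Ruby version.

```ruby
# Approach from the original post: re-scans the whole string once per word.
def count_rescan(string)
  res = Hash.new(0)
  string.downcase.scan(/\w+/).map { |word| res[word] = string.downcase.scan(/\b#{word}\b/).size }
  res
end

# Single pass using the block form of scan.
def count_single_pass(string)
  res = Hash.new(0)
  string.downcase.scan(/\w+/) { |word| res[word] += 1 }
  res
end

# Hypothetical helper: returns the elapsed seconds for the given block.
def time_it
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

text = "the quick brown fox jumps over the lazy dog " * 100

rescan_time = time_it { count_rescan(text) }
single_time = time_it { count_single_pass(text) }
puts "re-scan: #{rescan_time}s, single pass: #{single_time}s"
```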

narazana · January 29, 2013, 12:08pm

On Tue, Jan 29, 2013 at 10:25 AM, Ryan D. [email protected]
wrote:

I guess the reason is that you avoid the intermediate arrays.

I suspect only scanning once is much more important than the extra arrays.

Sure, you are right. I didn’t really read Joel’s proposal, and assumed
he had removed the double scan.

Jesus.

narazana · January 29, 2013, 1:19pm

Ryan D. wrote in post #1094171:

This modifies the argument coming in. Don’t ever call downcase! or other
mutating methods on an argument or you’ll wind up in debugging hell.
Make a copy instead:

string = string.downcase

Ah, I didn’t know that a bang method would also change the argument
outside of the current scope as well! Dangerous.

irb(main):001:0> a = 'a'
=> "a"
irb(main):002:0> def t1(b)
irb(main):003:1> b.upcase
irb(main):004:1> end
=> nil
irb(main):005:0> def t2(b)
irb(main):006:1> b.upcase!
irb(main):007:1> end
=> nil
irb(main):008:0> t1 a
=> "A"
irb(main):009:0> a
=> "a"
irb(main):010:0> t2 a
=> "A"
irb(main):011:0> a
=> "A"

narazana · January 29, 2013, 6:00pm

As usual Robert, you’ve shown me a very elegant way to handle this!
Thanks!

Wayne

narazana · January 29, 2013, 6:00pm

On Tue, Jan 29, 2013 at 1:19 PM, Joel P. [email protected]
wrote:

Ah, I didn’t know that a bang method would also change the argument
outside of the current scope as well! Dangerous.

That’s why there is the exclamation mark in the first place. It means
“potentially dangerous method” (defined by Matz).

Btw, this does not have much to do with scope; what matters is
which object gets changed. All places in code which reference that
particular instance will notice the change once they use the object.

irb(main):008:0> t1 a
=> "A"
irb(main):009:0> a
=> "a"
irb(main):010:0> t2 a
=> "A"
irb(main):011:0> a
=> "A"

Yeah, String methods with an exclamation mark typically change the
instance itself, whereas their “less dangerous” siblings typically
return a new, modified copy.
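
To make the object-versus-variable point concrete, here is a small
sketch. The method and variable names are made up. It also covers the
earlier question about string = string.downcase: assigning inside a
method only re-points the local name, while a bang method changes the
one object that every reference shares.

```ruby
def rebind(s)
  s = s.upcase   # builds a new String and points the local name s at it
  s
end

def mutate(s)
  s.upcase!      # changes the very object the caller passed in
  s
end

a = String.new("abc")   # String.new gives an unfrozen string to experiment on
b = a                   # a second reference to the same object

rebind(a)
after_rebind = a.dup    # snapshot taken before the mutation

mutate(a)
puts after_rebind, a, b
```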

Kind regards

robert

narazana · January 30, 2013, 8:16am

On Tue, Jan 29, 2013 at 6:38 PM, sasan sasgho [email protected]
wrote:

what to do ?

First of all, please do not hijack other threads. Then, please
explain what your goal is, i.e. what you want to achieve.

Kind regards

robert

narazana · January 29, 2013, 6:37pm

I have a project in NetBeans 6.8. I created a global module like so…

module SharedVariables
  @prueba = 1

  def variable
    @prueba ||= 1
  end

  def variable=(var)
    @prueba = var
  end

end

This module is in the global_var.rb file and I want to call it from
another Ruby file…

what to do ?

thanks

narazana · January 31, 2013, 8:30am

On Tue, Jan 29, 2013 at 3:16 PM, Wayne B. [email protected]
wrote:

As usual Robert, you’ve shown me a very elegant way to handle this! Thanks!

You’re welcome! But I think the elegance is rather due to language
and library design than me. Thank Matz!

Kind regards

robert