I have been working my way through a Ruby book (Beginning Ruby) and I
want to extend an interesting capability dealing with hashes.
The code:
text = ''
line_count = 0
File.open("txt.txt").each do |line|
  line_count += 1
  text << line
end
puts "#{line_count} lines"
total_characters = text.length
puts "#{total_characters} characters"
# Escape . and ? so they match literal punctuation instead of acting as regex metacharacters
sentence_count = text.split(/\.|\?|!/).length
total_characters_no_spaces = text.gsub(/\s+/, "").length
puts "#{total_characters_no_spaces} without spaces"
word_count = text.split.length
puts "#{word_count} words in the text and #{sentence_count} sentences"
paragraph_count = text.split(/\n\n/).length
puts "#{paragraph_count} paragraphs"
puts "#{sentence_count / paragraph_count} sentences per paragraph on average"
puts "#{word_count / sentence_count} words per sentence"
stop_words = %w{a the by on for of are with just but and to the my has some in}
words = text.scan(/\w+/)
keywords = words.select { |word| !stop_words.include?(word) }
puts "#{((keywords.length.to_f / words.length.to_f) * 100).to_i}% non stop words"
This has been fun code to write, and I have been running various text files
through it.
What I want to know is: is it possible to create a hash where the key is
the word and the value is the number of occurrences of the word in the
text, and then sort the hash by the values?
On Thu, Jun 18, 2009 at 10:40 PM, Steven D. wrote:
> What I want to know is, is it possible to create a hash where the key is
> the word, and the value is the number of occurrences of the word in the
> text, and then sort the hash by the values?
> Posted via http://www.ruby-forum.com/.
Super quick and dirty, but should get you started:
words = {}
File.open("txt.txt").each do |line|
  line.split(' ').each { |w| words.has_key?(w) ? words[w] += 1 : words[w] = 1 }
end
words.sort_by { |e| e[1] }.reverse.each { |k, v| puts "#{k}: #{v}" }
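A slightly tidier variant of the same idea, as a minimal sketch: giving the hash a default value of 0 removes the need for the has_key? check. The sample_text string is a made-up stand-in for the contents of txt.txt, so the example runs without a file.

```ruby
# Sketch only: `sample_text` is an invented stand-in for the file contents.
sample_text = "the cat and the hat\nthe cat sat"

counts = Hash.new(0)   # missing keys read as 0, so += 1 works on first sight
sample_text.each_line do |line|
  line.split.each { |w| counts[w] += 1 }
end

# sort_by returns an array of [word, count] pairs; negating the count
# puts the most frequent words first.
sorted = counts.sort_by { |word, count| -count }
sorted.each { |word, count| puts "#{word}: #{count}" }
```

Splitting on whitespace is still a crude definition of a word: punctuation stays attached and case is not normalised.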
Steven D. wrote:
> What I want to know is, is it possible to create a hash where the key is
> the word, and the value is the number of occurrences of the word in the
> text, and then sort the hash by the values?
That is called a “histogram” and is one of the most common examples
(Google is your friend). The sticking point here is what you mean by a
“word”. If you’re willing to accept a fairly crude definition of this
notion, then that is an example I develop in the Blocks section of my
Ruby tutorial chapter here:
http://www.apeth.com/ruby/02justenoughruby.html
To sort, add this line:
wds = h.sort {|x,y| x[1] <=> y[1]}
Note that the concept “sort a hash” has no real meaning, since a hash is
not ordered. What you can do is to convert to an array and sort the
array.
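To make the point about sorting concrete, here is a small sketch. The hash h and its counts are invented for illustration; the sort line is the one given above.

```ruby
# Made-up word counts, standing in for the histogram hash `h`.
h = { "ruby" => 3, "hash" => 1, "sort" => 2 }

# Hash#sort comes from Enumerable: it converts the hash to an array of
# [key, value] pairs and sorts that array. The hash itself is untouched.
wds = h.sort { |x, y| x[1] <=> y[1] }

p wds.class   # Array, not Hash
p wds         # [["hash", 1], ["sort", 2], ["ruby", 3]]
```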
Matt N. wrote:
> Note that the concept “sort a hash” has no real meaning, since a hash is
> not ordered. What you can do is to convert to an array and sort the
> array.
Since arrays are key/value (from what I can understand), there are only
two parts to the array. I thought you couldn’t put a third value in an
array.
Thanks for the help. I am going to check out the web page.
(Never knew of histograms. Learn something new every other day or so.)
Hi –
On Fri, 19 Jun 2009, Matt N. wrote:
> Ruby tutorial chapter here:
> Just Enough Ruby (http://www.apeth.com/ruby/02justenoughruby.html)
> To sort, add this line:
> wds = h.sort {|x,y| x[1] <=> y[1]}
Or, slightly more compact:
wds = h.sort_by {|x| x[1] }
> Note that the concept “sort a hash” has no real meaning, since a hash is
> not ordered. What you can do is to convert to an array and sort the
> array.
In 1.9 hashes are ordered, but by key-insertion order. You can’t
change the order, so you can’t sort back into a hash (unless you
create a new hash manually using the sorted order).
David
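A minimal sketch of the "create a new hash manually" route, assuming Ruby 1.9 or later so that insertion order is preserved. The hash h and its counts are invented for illustration.

```ruby
# Made-up word counts; assumes Ruby 1.9+ (insertion-ordered hashes).
h = { "ruby" => 3, "hash" => 1, "sort" => 2 }

# Sort into an array of pairs, then insert the pairs one by one into a
# fresh hash. The new hash remembers the order of insertion.
sorted_hash = {}
h.sort_by { |x| x[1] }.each { |word, count| sorted_hash[word] = count }

p sorted_hash.keys   # ["hash", "sort", "ruby"]
p h.keys             # original order is unchanged: ["ruby", "hash", "sort"]
```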
Hi –
On Fri, 19 Jun 2009, Steven D. wrote:
> Since arrays are key/value (from what I can understand), there are only
> two parts to the array. I thought you couldn’t put a third value in an
> array.
It’s more that you convert the hash into an array of two-element arrays,
and then sort that array. Iterating through an array of two-element
arrays is similar to iterating through a hash, in the sense that each
iteration yields two values.
David
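As a small illustration of that similarity (the hash contents here are made up): the outer array can hold any number of elements, and each element happens to be a two-element [key, value] array.

```ruby
# Made-up counts for illustration.
h = { "ruby" => 3, "hash" => 1 }
pairs = h.to_a   # [["ruby", 3], ["hash", 1]]

# Iterating the hash yields a key and a value to the block...
from_hash = []
h.each { |k, v| from_hash << "#{k}: #{v}" }

# ...and iterating the array of pairs destructures each inner array
# into the same two block parameters.
from_pairs = []
pairs.each { |k, v| from_pairs << "#{k}: #{v}" }

puts from_pairs
```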
On Fri, Jun 19, 2009 at 3:09 PM, David A. Black wrote:
> That is called a “histogram” and is one of the most common examples
> [...]
> create a new hash manually using the sorted order).
IIRC the insertion order is maintained correctly for literals and Hash[]
thus
Hash[ * a_hash.sort_by{ whatever } ]
should do the trick.
Cheers
Robert
Hi –
On Sat, 20 Jun 2009, Robert D. wrote:
>> wds = h.sort {|x,y| x[1] <=> y[1]}
>> change the order, so you can’t sort back into a hash (unless you
>> create a new hash manually using the sorted order).
> IIRC the insertion order is maintained correctly for literals and Hash[]
> thus
> Hash[ * a_hash.sort_by{ whatever } ]
> should do the trick.
You’d want to throw in a flatten(1) to unwrap the inner arrays:
Hash[*hash.sort_by {…}.flatten(1)]
David
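A sketch of why the flatten(1) is needed, assuming Ruby 1.9 or later and made-up counts. Splatting the sorted pairs directly would pass each two-element array as a single argument, whereas Hash[] with a splat expects alternating keys and values.

```ruby
# Made-up word counts; assumes Ruby 1.9+ for ordered hashes.
h = { "ruby" => 3, "hash" => 1, "sort" => 2 }

pairs = h.sort_by { |word, count| count }  # [["hash", 1], ["sort", 2], ["ruby", 3]]
flat  = pairs.flatten(1)                   # ["hash", 1, "sort", 2, "ruby", 3]

# Splat the flat key, value, key, value... list into Hash[].
rebuilt = Hash[*flat]

# Hash[] also accepts a single array of [key, value] pairs directly.
from_pairs = Hash[pairs]

p rebuilt.keys      # ["hash", "sort", "ruby"]
```

On Ruby 2.1 and later, pairs.to_h gives the same result.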
On Sat, Jun 20, 2009 at 2:28 PM, David A. Black wrote:
Ooops, that was yet another mistake, thx for telling me David.
>> IIRC the insertion order is maintained correctly for literals and Hash[]
>> thus
>> Hash[ * a_hash.sort_by{ whatever } ]
>> should do the trick.
> You’d want to throw in a flatten(1) to unwrap the inner arrays:
> Hash[*hash.sort_by {…}.flatten(1)]
well spotted.
R.