Management of words in a string

Hi All.

I’m trying to write a program in which you enter a string and it
calculates the number of words entered.

The problem is that I don’t know how to deal with whole words in a
string; I can only handle individual characters or letters.

How can I implement the above?

Thanks.

On Fri, Jul 6, 2012 at 5:52 AM, Joao S. wrote:

> Hi All.
>
> I’m trying to write a program in which you enter a string and it
> calculates the number of words entered.
>
> The problem is that I don’t know how to deal with whole words in a
> string; I can only handle individual characters or letters.
>
> How can I implement the above?

You can use the String#split method. You have to define precisely what
a word is for you. For example, consider things like “one-way street” or
“it’s raining”, and also be careful with punctuation. A simplistic
approach could be just to use the default split behaviour, which
splits on whitespace:

s = "this has words. how many? let's see"
[5] pry(main)> s.split
=> ["this", "has", "words.", "how", "many?", "let's", "see"]
[6] pry(main)> s.split.size
=> 7

You can pass a regular expression to the split method to tune how you
split.

Jesus.
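
A minimal sketch of tuning the split with a regular expression, as suggested above. The pattern and the helper name split_words are assumptions for illustration, not part of the original reply: any run of characters other than word characters, apostrophes, and hyphens is treated as a separator, so punctuation is dropped while "let's" and "one-way" stay whole.

```ruby
# Hypothetical helper: split on anything that is not a word character,
# an apostrophe, or a hyphen.
def split_words(text)
  # reject(&:empty?) drops the empty leading element that split returns
  # when the text starts with a separator.
  text.split(/[^\w'-]+/).reject(&:empty?)
end

s = "this has words. how many? let's see"
p split_words(s)                        # ["this", "has", "words", "how", "many", "let's", "see"]
puts split_words(s).size                # 7
puts split_words("one-way street").size # 2
```

A free-standing hyphen or apostrophe would still come out as a token of its own, so this is only one possible definition of a word.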

Hi,

Joao S. wrote in post #1067618:

> How can I implement the above?

For large text you may use String#scan, which has the advantage of not
collecting all words in an array like String#split does:

input_text = 'This is a sentence.'
word_count = input_text.strip.scan(/\s+/).size + 1

But like Jesus already said, this simple approach will not always work.
If the “words” in your text may contain whitespace, then looking for
whitespace will obviously fail. You’ll have to use a dictionary in this
case. This would also cover errors (missing or superfluous whitespace).
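
To illustrate the dictionary idea, here is a rough sketch that counts known multi-word terms as single words. The phrase list PHRASES, the method name count_words_with_phrases, and the restriction to two-token phrases are all assumptions made for this example. It does not attempt to repair missing or superfluous whitespace; that would need a real dictionary and a proper segmentation strategy.

```ruby
require "set"

# Hypothetical dictionary of two-token terms that should count as one word.
PHRASES = Set["new york", "ice cream"]

def count_words_with_phrases(text, phrases = PHRASES)
  tokens = text.downcase.scan(/\w+/)
  count = 0
  i = 0
  while i < tokens.size
    pair = "#{tokens[i]} #{tokens[i + 1]}"
    # Consume two tokens when they form a known phrase, otherwise one.
    i += phrases.include?(pair) ? 2 : 1
    count += 1
  end
  count
end

puts count_words_with_phrases("I love ice cream in New York") # 5
```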

On Fri, Jul 6, 2012 at 10:06 AM, Jan E. wrote:

> Hi,
>
> Joao S. wrote in post #1067618:
>
>> How can I implement the above?
>
> For large text you may use String#scan, which has the advantage of not
> collecting all words in an array like String#split does:

word_count = 0
input_text.scan(/\w+/){ word_count += 1}

> input_text = 'This is a sentence.'
> word_count = input_text.strip.scan(/\s+/).size + 1

I don’t think this usage of #scan is a good approach, because it will
yield totally wrong results:

irb(main):002:0> input_text = '. : & #'
=> ". : & #"
irb(main):003:0> input_text.strip.scan(/\s+/).size + 1
=> 4

Whereas positively matching sequences of word characters comes much
closer to reality:

irb(main):004:0> input_text.scan(/\w+/).size
=> 0
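
The positive-matching approach can be wrapped in a small method. This is only a sketch: the method name count_words and the default pattern WORD are assumptions chosen for the example. The block form of String#scan is used so that no array of matches is built, and the default pattern keeps an inner apostrophe or hyphen inside a word, which plain \w+ would split.

```ruby
# Hypothetical default: word characters, optionally joined by ' or -.
WORD = /\w+(?:['-]\w+)*/

def count_words(text, pattern = WORD)
  count = 0
  # With a block, scan yields each match instead of collecting an array.
  text.scan(pattern) { count += 1 }
  count
end

puts count_words("This is a sentence.")                 # 4
puts count_words(". : & #")                             # 0
puts count_words("this has words. how many? let's see") # 7
puts count_words("let's see", /\w+/)                    # 3
```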

> But like Jesus already said, this simple approach will not always work.
> If the “words” in your text may contain whitespace, then looking for
> whitespace will obviously fail. You’ll have to use a dictionary in this
> case. This would also cover errors (missing or superfluous whitespace).

It’s crucial to clarify the definition of “word”, I agree.

Kind regards

robert

Jesús Gabriel y Galán wrote in post #1067632:

> You can use the String#split method. You have to define precisely what
> a word is for you. For example, consider things like “one-way street” or
> “it’s raining”, and also be careful with punctuation. A simplistic
> approach could be just to use the default split behaviour, which
> splits on whitespace:
>
> s = "this has words. how many? let's see"
> [5] pry(main)> s.split
> => ["this", "has", "words.", "how", "many?", "let's", "see"]
> [6] pry(main)> s.split.size
> => 7
>
> You can pass a regular expression to the split method to tune how you
> split.
>
> Jesus.

and in case you want to count the words that begin with a particular
letter (for example “a”).

##############################################
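ct = 0

print "Enter a string: "
str = gets.chomp

puts "Words ==> #{str.split}"

# str.chr only looks at the first character of the whole string,
# so check the first letter of each word instead.
str.split.each do |word|
  ct += 1 if word.start_with?("a")
end

puts "Number of words that start with a: #{ct}"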

#################################################

str.scan(/a\w+/).size

yeah sorry i was dumb.

str = "bag of bananas and one apple"
str.scan(/\Wa\w+/).size
=> 2

Still wrong, sorry Hans :(

str = "apple and banana"
str.scan(/\Wa\w+/).size
=> 1

A correct regex would be (I hope I don’t get it wrong now) /\ba\B/.

– Matma R.

Hans M. wrote in post #1067740:

> str.scan(/a\w+/).size

Clearly wrong.

str = "bag of bananas"
str.scan(/a\w+/).size
=> 2

Hans M. wrote in post #1067783:

> hm still wrong, the best thing i could do is this:

Try \ba\w*

(\b = word boundary)
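
A short sketch of the word-boundary suggestion, for comparison with the other attempts in this thread. The method name count_words_starting_with is made up for the example, and the letter is passed through Regexp.escape in case a regex metacharacter is given. Note that /\ba\B/ skips the one-letter word "a", because \B requires another word character to follow, whereas \ba\w* counts it.

```ruby
# Hypothetical helper: count words that begin with the given letter.
def count_words_starting_with(text, letter)
  # \b anchors the match at the start of a word; \w* takes the rest of it.
  text.scan(/\b#{Regexp.escape(letter)}\w*/).size
end

str = "a bag of bananas and one apple"
p str.scan(/\ba\w*/)                                    # ["a", "and", "apple"]
puts count_words_starting_with(str, "a")                # 3
puts count_words_starting_with("apple and banana", "a") # 2
puts count_words_starting_with("bag of bananas", "a")   # 0
puts str.scan(/\ba\B/).size                             # 2
```

The pattern is case-sensitive as written; adding the i flag would also count words such as "Apple".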

hm still wrong, the best thing i could do is this:

str = "a bag of bananas and one apple"
str.scan(Regexp.union(/^a\w*/,/\Wa\w*/))
=> ["a", " and", " apple"]