Peter Norvig wrote a simple spelling corrector in 20 lines of Python
2.5,
so I thought I’d see what it looks like in Ruby. I’m not too pleased
with
my version, if anyone can make it more elegant, that would be great.
Some
of the sticking points were:
-
List comprehensions in Python made the edits1 function more elegant
IMO. Hopefully someone can improve that function. -
The boolean expression in the correct function evaluates empty
sets/arrays as false in Python but not in Ruby, so I had to add the
“result.empty? ? nil : result” expression to several functions. I expect
there’s a better way to handle this also. -
Minor point, but apparently Python has a built in constant for the
set
of lower case characters “string.lowercase”, so I just defined a
constant.
Otherwise, the translation was pretty straightforward.
Here’s a link to Norvig’s page: How to Write a Spelling Corrector
That page includes a link to a text file that I saved locally as
holmes.txt: http://www.norvig.com/holmes.txt
Note: I wrapped a few of the longer lines for posting.
def words text
text.downcase.scan(/[a-z]+/)
end
def train features
model = Hash.new(1)
features.each {|f| model[f] += 1 }
return model
end
NWORDS = train(words(File.new(‘holmes.txt’).read))
LETTERS = ‘abcdefghijklmnopqrstuvwxyz’
def edits1 word
n = word.length
deletion = (0…n).collect {|i| word[0…i]+word[i+1…-1] }
transposition = (0…n-1).collect {
|i| word[0…i]+word[i+1,1]+word[i,1]+word[i+2…-1] }
alteration = []
n.times {|i| LETTERS.each_byte {
|l| alteration << word[0…i]+l.chr+word[i+1…-1] } }
insertion = []
(n+1).times {|i| LETTERS.each_byte {
|l| insertion << word[0…i]+l.chr+word[i…-1] } }
result = deletion + transposition + alteration + insertion
result.empty? ? nil : result
end
def known_edits2 word
result = []
edits1(word).each {|e1| edits1(e1).each {
|e2| result << e2 if NWORDS.has_key?(e2) }}
result.empty? ? nil : result
end
def known words
result = words.find_all {|w| NWORDS.has_key?(w) }
result.empty? ? nil : result
end
def correct word
(known([word]) or known(edits1(word)) or known_edits2(word) or
[word]).max {|a,b| NWORDS[a] <=> NWORDS[b] }
end
Brian A.