I previously asked about implementing Peter Norvig's simple spelling
corrector in Arc, without much interest. Since then, however, I
discovered Clojure and found an implementation in Clojure (by Rich
Hickey, Clojure's inventor) that's one line shorter than Norvig's
22-line Python program, quite a bit shorter than my Ruby translation,
and shows off a little bit of Clojure.

This seems like a reasonably small problem for an Arc challenge. And
with 3 versions to serve as a reference (Python, Ruby, Clojure), I
wouldn't think it would be that difficult for someone fluent in Arc
(I am not).

Can an Arc version match or exceed the Clojure version with respect to
brevity, beauty or other criteria of your choosing? Feel free to
assume the existence of re-seq and slurp. It's a language challenge,
not a library challenge.

I'll paste some resource links into a comment so they're clickable.
The Clojure version:

(defn words [text] (re-seq #"[a-z]+" (. text (toLowerCase))))
(defn train [features]
  (reduce (fn [model f] (assoc model f (inc (get model f 1))))
          {} features))
(def *nwords* (train (words (slurp "big.txt"))))
(defn edits1 [word]
  (let [alphabet "abcdefghijklmnopqrstuvwxyz", n (count word)]
    (distinct (concat
      (for [i (range n)] (str (subs word 0 i) (subs word (inc i))))
      (for [i (range (dec n))]
        (str (subs word 0 i) (nth word (inc i)) (nth word i) (subs word (+ 2 i))))
      (for [i (range n) c alphabet] (str (subs word 0 i) c (subs word (inc i))))
      (for [i (range (inc n)) c alphabet] (str (subs word 0 i) c (subs word i)))))))
; seq turns an empty result into nil so the or in correct falls through
; (an empty lazy seq is truthy)
(defn known [words nwords] (seq (for [w words :when (nwords w)] w)))
(defn known-edits2 [word nwords]
  (seq (for [e1 (edits1 word) e2 (edits1 e1) :when (nwords e2)] e2)))
(defn correct [word nwords]
  (let [candidates (or (known [word] nwords) (known (edits1 word) nwords)
                       (known-edits2 word nwords) [word])]
    (apply max-key #(get nwords % 1) candidates)))
user=> (correct "misstake" *nwords*)
"mistake"
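
One detail that may matter when porting: train looks each word up with
a default of 1 before incrementing, so a word seen once ends up with a
count of 2, and correct later treats unseen words as having a count of
1. A minimal sketch of that behavior, using a made-up three-word list
rather than big.txt (the input here is only an illustration):

```clojure
;; Same definition as in the post, repeated so this snippet runs on its own.
(defn train [features]
  (reduce (fn [model f] (assoc model f (inc (get model f 1))))
          {} features))

;; Hypothetical toy input: "the" appears twice, "cat" once.
(train ["the" "cat" "the"])
;; => {"the" 3, "cat" 2}
```

An Arc version would presumably want the same default, otherwise the
relative ranking of rare versus unseen words could differ.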