Clojure Programming/Examples/Norvig Spelling Corrector

      Peter Norvig wrote an excellent example of a spelling corrector in Python. You can see How to Write a Spelling Corrector for a more detailed explanation.

      This code, translated into Clojure by language creator Rich Hickey, demonstrates the Python-like brevity of Clojure:

      (defn words [text] (re-seq #"[a-z]+" (.toLowerCase text)))
       
      (defn train [features]
        (reduce (fn [model f] (assoc model f (inc (get model f 1)))) {} features))
       
      (def *nwords* (train (words (slurp "big.txt"))))
       
      (defn edits1 [word]
        (let [alphabet "abcdefghijklmnopqrstuvwxyz", n (count word)]
          (distinct (concat
            (for [i (range n)] (str (subs word 0 i) (subs word (inc i))))
            (for [i (range (dec n))]
              (str (subs word 0 i) (nth word (inc i)) (nth word i) (subs word (+ 2 i))))
            (for [i (range n) c alphabet] (str (subs word 0 i) c (subs word (inc i))))
            (for [i (range (inc n)) c alphabet] (str (subs word 0 i) c (subs word i)))))))
       
      (defn known [words nwords] (set (for [w words :when (nwords w)]  w)))
       
      (defn known-edits2 [word nwords] (set (for [e1 (edits1 word) e2 (edits1 e1) :when (nwords e2)]  e2)))
       
      (defn correct [word nwords]
        (let [candidates (or (known [word] nwords) (known (edits1 word) nwords) 
                             (known-edits2 word nwords) [word])]
          (apply max-key #(get nwords % 1) candidates)))
      

      To use:

      (correct "misstake" *nwords*)
      (correct "speling" *nwords*)
      

      Note: training with big.txt may require the jvm to be started with a heap bigger than its default. You can increase the heap size either by starting java with the "-server" option or setting the maximum heap size explicitly: "java -Xmx128m ..."

      Last modified on 5 July 2012, at 11:34