Here's mine. It seems to be faster than the versions presented so far, assuming I haven't made any dumb mistakes. You'll have to put
(xdef 'sub substring)
in your ac.scm first.
(= wordfile "big.txt")
(= *nwords* (table))
(w/infile f wordfile (whilet l readline.f (counts (tokens downcase.l whitec) *nwords*)))
(= str [coerce _ 'string])
(= alphabet (accum a (each c "abcdefghijklmnopqrstuvwxyz" (a str.c))))
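For reference, the table-building step above corresponds to roughly this in Python (a sketch of my own, not Norvig's exact train(); I'm splitting on whitespace to match tokens/whitec):

```python
from collections import Counter

def train(text):
    # count lower-cased, whitespace-separated tokens,
    # like counts over (tokens downcase.l whitec) above
    return Counter(text.lower().split())
```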
(def edits1 (w)
  (let n len.w
    (accum a
      (for i 0 (- n 1) (a:+ (sub w 0 i) (sub w (+ i 1))))                           ; deletions
      (for i 0 (- n 2) (a:+ (sub w 0 i) (str:w (+ i 1)) (str:w i) (sub w (+ i 2)))) ; transpositions
      (for i 0 (- n 1) (each c alphabet (a:+ (sub w 0 i) c (sub w (+ i 1)))))       ; replacements
      (for i 0 n (each c alphabet (a:+ (sub w 0 i) c (sub w i)))))))                ; insertions
(def known (ws) (keep [*nwords* _] ws))
(def known-edits2 (w)
  (accum a (each e1 edits1.w (each e2 edits1.e1 (if *nwords*.e2 a.e2)))))
(def correct (w)
  (best (compare > [*nwords* _])
        (or (known:list w) (known:edits1 w) (known-edits2 w) (list w))))
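For comparison, edits1 in Python looks something like this (a splits-based sketch along the lines of Norvig's version; like the Arc code above, it skips the dedup):

```python
alphabet = "abcdefghijklmnopqrstuvwxyz"

def edits1(w):
    # every string one edit away from w, duplicates included
    splits     = [(w[:i], w[i:]) for i in range(len(w) + 1)]
    deletes    = [a + b[1:]               for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces   = [a + c + b[1:]           for a, b in splits if b for c in alphabet]
    inserts    = [a + c + b               for a, b in splits for c in alphabet]
    return deletes + transposes + replaces + inserts
```

For a word of length n that's n deletions, n-1 transpositions, 26n replacements, and 26(n+1) insertions, matching the four loops in the Arc version.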
The main optimizations I made were replacing cut with sub, replacing string with +, and dropping the dedup.
Well, not that anyone cares, but it turns out that the changes I made here do close most of the performance gap between Arc and Python. Python is still faster, but it's nothing to lose sleep over. The most important change was replacing Arc's cut with MzScheme's substring, which works the same way but is implemented in C. The other changes were important too, though, especially replacing lists with iterators. So I guess the lessons are:
1. Any Arc library function that has an MzScheme equivalent should be rewritten to use the MzScheme version.
2. Using iterators to reduce consing is a win. So, maybe we should make a library of common iterator utilities (map, filter, etc.)?
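On lesson 2, the Python analogue of such an iterator library is generators plus itertools. A small sketch (my own names) of the consing point: a lazy version of the deletions loop builds candidates only as they're consumed.

```python
from itertools import islice

def deletions(w):
    # generator: each candidate string is built on demand,
    # so nothing is consed until the consumer asks for it
    return (w[:i] + w[i+1:] for i in range(len(w)))

# taking only the first two candidates does only two string builds
first_two = list(islice(deletions("spelling"), 2))
```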