This implementation is not quite correct because it doesn't do insertion at the end. If the last letter of a word is missing, the Arc will correct it by adding a letter at the next-to-last place and then swapping the last two letters; however, if the word has another error, the Arc won't correct it, whereas the Python will.
Whether this is a serious problem is open to question. However, it does mean that the comparison with Python is not apples to apples.
Incidentally, in case anyone cares, here's how it looks in rkts-lang,* a little toy language I've been working on:
nwords = words "big.txt" . map downcase . Hap.counts
alphabet = "abcdefghijklmnopqrstuvwxyz"
edits1 w = Iter as c:
  n = size w
  for i of 0 to n-1: c (w[below i] ... w[above i])
  for i of 0 to n-2: c (w[i to i+1] @ w[i+1 downto i])
  for i of 0 to n-1: for l of alphabet: c (w[i] @ l)
  for i of 0 to n: for l of alphabet: c (w[below i] ... l ... w[upfrom i])
edits n w = iterate (join edits1) {w} . take n+1
correct w = edits 2 w . map (keep nwords[]) . find ~empty . greatest nwords[]
If we permit the pg bug, then edits1 can be shortened a bit:
edits1 w = Iter as c:
  for i of keys w:
    c (w[below i] ... w[above i])
    if i > 0: c (w[i-1 to i] @ w[i downto i-1])
    for l of alphabet: c (w[i] @ l); c (w[below i] ... l ... w[upfrom i])
(* Suggestions for a catchy name may be rewarded.)
I find code easier to read when the names are easily spoken aloud. In the language I'm working on (nothing serious) all the common names are either English words or things that look like English words. I also favor short words over long words and Germanic words over Latin words, and I avoid abbreviations as much as possible (except in local variables, which are usually single letters). It helps me a lot.
As to car/cdr specifically, though, I think the best option is pattern matching.
Well, cdr can't really be spoken aloud unless you make up a way to pronounce it (which of course people do). And you could do the same for hp, rp, and the rest... adding an "o" sound gives some nice words, like hop, top, and fop. Kind of a Dr. Seuss feel. :)
Yes, but IMHO they are homogeneous with the other abbreviations in Arc syntax, and they're also conceptually clearer than car/cdr (which refer to registers of an old computer): for example, hp (head of pair), tp (tail of pair) and hthtp (head tail head tail of pair) may help conceptual visualization...
Very nice. I don't fully understand it yet, but it looks like it may be the best solution to the hygiene problem I've seen. Two questions:
1. Because #`() doesn't produce lists, you can't use standard list manipulation functions on the resulting code. I tried writing this macro http://arclanguage.org/item?id=7773 in your system but couldn't because the macro produces expressions with mappend, which doesn't work on your "vec" type. Is there a way to fix this?
2. Is it possible for a macro to refer to variables in the calling environment? Like this:
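Roughly this sort of thing, say, where 'bump is just a placeholder name and the intent is that 'counter means the caller's variable:

(mac bump ()
  #`(++ counter))   ; 'counter should resolve in the calling environment

(let counter 0
  (bump)
  counter)          ; hoping for 1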
This is of dubious utility, but still. I thought I could make it work in your system by changing counter to #,'counter, but then I get a cryptic error. Is this a bug or am I misunderstanding something?
Both of these can be done; they are in fact both instances of the "exceptional cases" I talk about under "Hygienic quasiquotation". I should probably have provided examples, so I'll do so now.
In order to do (1), you have to break up the stuff that's getting mappended. The thing being mappended needs to be an ordinary list, but its elements need to contain syntactic unquotations. I think an example will explain better than words here, so here's your macro written with my hygienic quasiquotes:
(mac bias args
  (withs (ws (map car pair.args)
          xs (map cadr pair.args)
          us (map [uniq] ws))
    #`(with ,(mappend (fn (u w) `(,u #,w)) us ws)
        (let r (rand (+ ,@us))
          (if ,@(mappend
                 (fn (u x) (list `(< (-- r ,u) 0) `#,x))
                 us xs))))))
To be honest, this macro doesn't benefit much from hygiene, since you still need to call 'uniq. However, if I were to write this, I'd do it by putting the biasing logic in a function and wrapping it in a macro, which considerably reduces the amount of wrangling you have to do to get it to work with my system:
(def bias-elt (weight lst)
  (let r (rand (apply + (map weight lst)))
    (catch:each e lst
      (if (< (-- r weight.e) 0) throw.e))))

(mac bias args
  #`((cadr:bias-elt car
      (list ,@(mapeach (w x) pair.args
                `(list #,w (fn () #,x)))))))
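Either way, the call side is the same: weights alternate with expressions, and one expression is picked at random with probability proportional to its weight, so something like this should print "often" about twice as often as "rarely":

(bias 1 (prn "rarely")
      2 (prn "often"))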
To implement (2), what you do is quite simple, and it's obvious why it works, but it's a bit nonintuitive:
Oops, my example for (2) is broken. I see why it doesn't work, and it's rather interesting. It doesn't work because '++ is a _macro_ that expands to '(= <counter> (+ 1 <counter>)), where <counter> is the syntactic closure of 'counter. Then '= in turn gets macroexpanded, but '= does its own examination of the code to determine what to do (because of setter functions, etc) and it isn't written with closures in mind. I may need to look into this.
This does, however, serve to highlight the main problem with adding hygiene this way: it adds a new kind of data structure (closures) to what is considered "code", and anything that manipulates code could need to be updated to handle this new case.
Wow, that's confusing. I was hoping that #` and #, could fully replace ` and , (at least in macros), so you don't have to juggle the two kinds of quasiquote.
Would it be possible to write an mappend that joins hygienically-quoted lists? It would return an ordinary list with each element inside a closure. Then we could write the first example without using the ordinary quasiquote.
Ok, I think I get it now. If we gave hygienically-quoted expressions to mappend, they would be in a different closure than the surrounding code and therefore couldn't see the variables it introduced (r, in our case). So I guess you might say that #` should usually be used at the start of a macro, and ` after that. Still confusing.
It's true, it's not quite as intuitive as I'd like it to be: I, too, originally envisioned it entirely replacing normal quasiquoting for macros, but this turned out more difficult than I expected. It might be possible to make it more intuitive, but at the moment I'm not really working on it; this is an after-the-fact explanation more than a design document. I'm presently in the process of creating a low-level systems programming language with Arc (or at least a Lisp) as its metalanguage; when I get to the problem of adding macros to that language I'll probably give this another look.
Here's mine. It seems to be faster than the versions presented so far, assuming I haven't made any dumb mistakes. You'll have to put
(xdef 'sub substring)
in your ac.scm first.
(= wordfile "big.txt")
(= *nwords* (table))
(w/infile f wordfile (whilet l readline.f (counts (tokens downcase.l whitec) *nwords*)))
(= str [coerce _ 'string])
(= alphabet (accum a (each c "abcdefghijklmnopqrstuvwxyz" (a str.c))))
(def edits1 (w)
  (let n len.w
    (accum a
      (for i 0 (- n 1) (a:+ (sub w 0 i) (sub w (+ i 1))))                           ; deletions
      (for i 0 (- n 2) (a:+ (sub w 0 i) (str:w (+ i 1)) (str:w i) (sub w (+ i 2)))) ; transpositions
      (for i 0 (- n 1) (each c alphabet (a:+ (sub w 0 i) c (sub w (+ i 1)))))       ; replacements
      (for i 0 n (each c alphabet (a:+ (sub w 0 i) c (sub w i)))))))                ; insertions

(def known (ws) (keep [*nwords* _] ws))

(def known-edits2 (w)
  (accum a (each e edits1.w (each e edits1.e (if *nwords*.e a.e)))))

(def correct (w)
  (best (compare > [*nwords* _])
        (or (known:list w) (known:edits1 w) (known-edits2 w))))
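With big.txt in the working directory, it should reproduce the examples from Norvig's essay, e.g.

(correct "speling")    ; => "spelling"
(correct "korrecter")  ; => "corrector"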
The main optimizations I made were replacing cut with sub, replacing string with +, and dropping the dedup.
Well, not that anyone cares, but it turns out that the changes I made here do close most of the performance gap between Arc and Python. Python is still faster, but it's nothing to lose sleep over. The most important change was replacing Arc's cut with MzScheme's substring, which works the same way but is implemented in C. The other changes were important too, though, especially replacing lists with iterators. So I guess the lessons are
1. Any Arc library function that has an MzScheme equivalent should be rewritten to use the MzScheme version.
2. Using iterators to reduce consing is a win. So, maybe we should make a library of common iterator utilities (map, filter, etc.)? A rough sketch of what that could look like follows.
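Here an "iterator" would just be a function that calls a callback once per element, the way the accum-based code above works; all the names below are made up:

(def ifn-of (xs)      ; turn a list into an iterator
  (fn (yield) (each x xs (yield x))))

(def imap (f it)      ; map over an iterator without consing an intermediate list
  (fn (yield) (it [yield (f _)])))

(def ikeep (test it)  ; filter an iterator
  (fn (yield) (it [if (test _) (yield _)])))

(def ilist (it)       ; collect an iterator back into a list
  (accum a (it a)))

So (ilist (imap [* 2 _] (ikeep odd (ifn-of '(1 2 3 4))))) should give (2 6).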
Those proposals shorten fns by at most three characters. Are multi-arg fns used often enough to warrant this? news.arc contains 23 multi-arg fns in 1769 lines of code; therefore they would save about 1 char every 26 lines.
That would be ok if the proposals were simple and elegant, but personally I find them hackish and inconsistent with the rest of the language. They also don't fully replace fn because they lack an equivalent for (fn args ...).
Here's my idea: just replace 'fn with a special symbol, like \. This seems to work:
--- brackets0.scm 2008-11-11 17:06:01.000000000 -0600
+++ brackets.scm 2008-11-11 17:06:17.000000000 -0600
@@ -18,7 +18,8 @@
; a readtable that is just like the builtin except for []s
(define bracket-readtable
- (make-readtable #f #\[ 'terminating-macro read-square-brackets))
+ (make-readtable #f #\[ 'terminating-macro read-square-brackets
+ #\\ 'non-terminating-macro (lambda _ 'fn)))
; call this to set the global readtable
Personally, I think that (fn (a b) (+ a b)) is more readable than (\(a b) (+ a b)), and readability matters much more than the number of characters.
Also, the [:] form could save more characters, if it automatically applied the outer set of parens to the body form.
However, I don't think it's really that much of an improvement; fn works well enough unless you really like extra syntax.
What was it the original poster wanted, anyway? It sounded like something more readable than _1 etc. for the var names; thus my dredging up of the old thread. If not, then obviously it wouldn't be a good choice. Maybe the [:] form should be capable of naming only some of the args and leaving the rest to the other naming convention? Then the [] form could name the first n arguments by putting them before a :, and refer to the args after that as $, $0, $1, $2, etc., or some better character set if _ looks bad.
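To illustrate (purely hypothetical syntax, where the args after the named ones are counted by $0, $1, ...):

[acc : (+ acc $0)]        ; would mean (fn (acc x) (+ acc x))
[a b : (list a b $0 $1)]  ; would mean (fn (a b c d) (list a b c d))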
It's only a problem if fns of two or more args are common, and they don't seem to be. In news.arc, srv.arc and blog.arc they appear once every 123 lines. In my CL code they appear every 250 lines. Are they more common in your code?
You're right from the standpoint that I hadn't noticed the similarity and should have. Your solution is probably the best yet for the example I gave.
But it's different in that reduce only works on simple lists. My main motivation for this type of command is working with arbitrary and complex data structures, not just lists.
I guess I should clarify this a bit. In other languages (e.g. OCaml), it's conventional for each data structure to provide both an iterator and a fold/reduce function. In Arc, the convention seems to be to provide iterators but not folds. The above code shows that it's sufficient to have only iterators, since a left fold can be derived automatically.
Alternatively, we could have data structures provide a left fold and automatically derive an iterator.
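Both directions are only a few lines if an "iterator" means a function that calls a callback once per element (the names here are made up):

; derive a left fold from an iterator
(def fold-of-iter (it)
  (fn (f init)
    (let acc init
      (it [= acc (f acc _)])
      acc)))

; ...and the other direction: derive an iterator from a left fold
(def iter-of-fold (foldl)
  (fn (yield)
    (foldl (fn (acc x) (yield x) acc) nil)))

For example, ((fold-of-iter [each x '(1 2 3) (_ x)]) + 0) should give 6.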
1. Ports aren't changed randomly for no reason (cf. the Debian OpenSSL debacle). A port is just that--a port of a program to FreeBSD, not some attempt to incorporate it into a holistic user experience.
4. The BSD license means you can do what you want with the code.
5. The overall system is clean, simple and very well documented. Most of what you need to know is in the handbook or man pages.
The advantages are more evident for a server. For a desktop, I think BSD and Linux are both pretty awful compared with OS X, but FreeBSD is slightly less awful, overall.
Points 1 & 2 also apply to Source Mage GNU/Linux. As for point 5, man pages are good for most packages and for the system; these days there is also a developing wiki maintained by the distro, and any documentation that's missing is made up for by the helpfulness of the people in the IRC channel.
Point 3 largely applies in that building every package you install is far more automated than in LFS, and most of the "spells" (ports, packages) are kept up to date regularly (Gnome included), and it's very easy to update any that may have been forgotten, or to add missing packages. It took me about 5 minutes to clone the mzscheme spell to "mzscheme352" for Arc use. OTOH you still have to deal with the behemoth that is the Linux kernel configuration. I cheated; I got hold of a Debian kernel and used it as a base for make oldconfig, later tweaking a little.
Your macro has problems with variable capture and multiple evaluation (see chapters 9-10 of On Lisp). Here's a version that should work properly:
(mac bias args
  (w/uniq r
    (withs (ws (map car (pair args))
            xs (map cadr (pair args))
            us (map [uniq] ws))
      `(with ,(mappend list us ws)
         (let ,r (rand (+ ,@us))
           (if ,@(mappend
                  (fn (u x) `((< (-- ,r ,u) 0) ,x))
                  us xs)))))))
IMO, though, the use of a macro here is a premature optimization. I think you should try to get a function working first, and then wrap a macro around it if you know that's what you need. See my comment http://arclanguage.org/item?id=7760 for an example of such a wrapper macro (in CL, but the Arc is similar).