Performance: for the sorts of exploratory programming I do, performance is important. For instance, one thing I did when trying to figure out the code was match all substrings of one watermark against another, to see if there were commonalities. This is O(N^3) and was tolerably fast in Python, but Arc would be too painful.
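Here's a take on it in Arc. It isn't so painful, probably because it only takes about O(N^2) time: for each starting offset into b, it makes one pass along a and records each run of matching characters. :)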
(def commonalities-at (a b bstart (o threshold 1))
  (accum acc
    (withs (stop (min len.a (- len.b bstart))
            run 0
            bank [do (unless (< run threshold)
                       (acc:list run (- _ run)))
                     (= run 0)])
      (when (< stop 0) (err "The start index was out of range."))
      (for i 0 (- stop 1)
        (if (is a.i (b:+ bstart i))
          ++.run
          bank.i))
      bank.stop)))

(def commonalities (a b (o threshold 1))
  (accum acc
    (forlen bstart b
      (each (run offset) (commonalities-at a b bstart threshold)
        (acc:list run offset (+ bstart offset))))))

(def show-top-commonalities (a b number-to-show (o threshold 1))
  (each (run astart bstart) (firstn number-to-show
                              (sort (fn (a b) (> a.0 b.0))
                                    (commonalities a b threshold)))
    (pr "matched " run " chars at " astart " and " bstart ": ")
    (write:cut a astart (+ astart run))
    (pr "=")
    (write:cut b bstart (+ bstart run))
    (prn)))
-
arc> (show-top-commonalities w1 w1 10 10)
matched 1038 chars at 0 and 0: [snip]
matched 27 chars at 561 and 852: "CGGTAGATATCACTATAAGGCCCAGGA"="CGGTAGATATCACTATAAGGCCCAGGA"
matched 24 chars at 621 and 927: "GTTTTTTTGCTGCGACGTCTATAC"="GTTTTTTTGCTGCGACGTCTATAC"
matched 22 chars at 393 and 411: "TCATGACAAAACAGCCGGTCAT"="TCATGACAAAACAGCCGGTCAT"
matched 18 chars at 447 and 504: "TGACTGTGAAACTAAAGC"="TGACTGTGAAACTAAAGC"
matched 18 chars at 429 and 528: "TCATAATAGATTAGCCGG"="TCATAATAGATTAGCCGG"
matched 18 chars at 546 and 1002: "AGTCGTATTCATAGCCGG"="AGTCGTATTCATAGCCGG"
matched 16 chars at 677 and 971: "GCGGCACTAGAGCCGG"="GCGGCACTAGAGCCGG"
matched 15 chars at 620 and 665: "AGTTTTTTTGCTGCG"="AGTTTTTTTGCTGCG"
matched 15 chars at 318 and 465: "TACTAATGCCGTCAA"="TACTAATGCCGTCAA"
nil
arc> (do1 nil (time:commonalities w1 w1 10))
time: 2494 msec.
nil
arc> (do1 nil (time:commonalities w1 w1))
time: 4657 msec.
nil
-
arc> (show-top-commonalities w1 w2 10 10)
matched 11 chars at 339 and 390: "GCTGTGATACT"="GCTGTGATACT"
matched 10 chars at 799 and 843: "TAGCAATAAG"="TAGCAATAAG"
matched 10 chars at 318 and 456: "TACTAATGCC"="TACTAATGCC"
matched 10 chars at 338 and 575: "TGCTGTGATA"="TGCTGTGATA"
matched 10 chars at 69 and 768: "TGATAAATAA"="TGATAAATAA"
nil
arc> (do1 nil (time:commonalities w1 w2 10))
time: 1622 msec.
nil
arc> (do1 nil (time:commonalities w1 w2))
time: 3264 msec.
nil
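For comparison, here's roughly how I'd expect the same scan to look in Python. This is only a sketch of the approach used in commonalities-at and commonalities above, not the code from the original post. The snake_case names are my own renderings of the Arc ones, and I haven't timed it against the Arc version.

```python
def commonalities_at(a, b, bstart, threshold=1):
    """Line a[0:] up against b[bstart:] and yield (run, offset) for each
    run of at least `threshold` consecutive matching characters."""
    stop = min(len(a), len(b) - bstart)
    if stop < 0:
        raise ValueError("The start index was out of range.")
    run = 0
    for i in range(stop):
        if a[i] == b[bstart + i]:
            run += 1
        else:
            # A mismatch ends the current run; report it if it is long enough.
            if run >= threshold:
                yield run, i - run
            run = 0
    # Report a run that extends to the end of the overlap.
    if run >= threshold:
        yield run, stop - run


def commonalities(a, b, threshold=1):
    """Collect (run, astart, bstart) over every alignment of a with b[bstart:]."""
    found = []
    for bstart in range(len(b)):
        for run, offset in commonalities_at(a, b, bstart, threshold):
            found.append((run, offset, bstart + offset))
    return found


if __name__ == "__main__":
    # Longest runs first, as show-top-commonalities does.
    for run, astart, bstart in sorted(commonalities("TTACAGA", "GATTACA", 3),
                                      reverse=True):
        print("matched", run, "chars at", astart, "and", bstart)
```

Each alignment costs one pass over the overlap, and there are len(b) alignments, which is where the roughly O(N^2) comes from. Like the Arc version, this only slides a forward along b, so a match that starts later in a than in b isn't reported; calling it again with the arguments swapped should cover those.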
To summarize, I started off in Arc, switched to Python when I realized it would take me way too long to figure out the DNA code using Arc, and then went back to Arc for this writeup after I figured out what I wanted to do. In other words, Python was much better for the exploratory part.
Yeah, I know just what you're talking about there. Still, it wasn't long ago that I found my ideas easiest to express in Java, so I think familiarity has an awful lot to do with it. I'm afraid even Arc's error messages can be an acquired taste. :-p