"Adding continuations to a language implementation seems tough."
The more I think about it, the more I feel like implementing my interpreter in Ruby, which has call/cc and proper lambdas (unlike Python's terrible ones). And then I could name it Arubic. Awesome name, right?
Pretty awesome, yeah. XD It would actually make it less useful to me personally though, since my cheap web host doesn't support Ruby. >.>
Anyhow, I think Ruby's continuation support is a bit sketchy. There's no equivalent of Scheme's 'dynamic-wind built into the language as far as I know, so it may be necessary to use a patch like this one, which replaces callcc with a dynamic-wind-aware equivalent: https://github.com/mame/dynamicwind
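(For anyone who hasn't run into dynamic-wind: it's roughly a try/finally whose "before" part also re-runs whenever a captured continuation jumps back into the body. Python can only express the one-shot half, but as a loose analogy, a sketch:)

from contextlib import contextmanager

@contextmanager
def wind(before, after):
    # One-shot analogue of dynamic-wind: 'before' runs on entry and
    # 'after' runs on exit. What this *can't* do is re-run 'before'
    # when a continuation re-enters the body, which is the whole
    # point of the Scheme version.
    before()
    try:
        yield
    finally:
        after()

with wind(lambda: print("enter"), lambda: print("exit")):
    print("body")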
Since there's no dynamic-wind, I haven't looked to see if there's a way to do continuation-friendly dynamic scope, like Racket's parameters. If that callcc replacement works though, it should be possible to implement parameters on top of Ruby, just maybe not that efficiently. :-p
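(And for anyone who hasn't used parameters: they're basically dynamically scoped variables that you can rebind for the extent of a block. Python's contextvars module gives a rough feel for the easy, continuation-free part; a loose sketch, not a real equivalent:)

import contextvars

# A parameter-like dynamically scoped value, with a default.
indent = contextvars.ContextVar("indent", default=0)

def show(msg):
    print(" " * indent.get() + msg)

def nested():
    token = indent.set(indent.get() + 2)  # like (parameterize ...)
    try:
        show("inside the rebinding")
    finally:
        indent.reset(token)               # old value restored on exit

show("outer")
nested()
show("outer again")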
Racket does a bunch of other interesting stuff with continuations, such as continuation marks, continuation guards, and... maybe that's it... and we'd lose those, but I think they're mainly useful for performance and well-specified 'letrec behavior, and who cares about that stuff? :-p
You might also find writing an Arc interpreter in Racket or even Arc3.1 a lot of fun:
- you'll get continuations, threads, and parameters that actually work
- you'll be able to easily implement things like fexprs, first-class macros, or serializable closures in your interpreter that Arc3.1 doesn't have (hard to do in a compiler)
- you can write the parts of your program that you want to run fast (and that don't need your extra interpreter features) in Arc3.1, and the other parts using your Arc interpreter, and it can all work together
Yeah, I also considered that, or Chicken Scheme, or Common Lisp, but honestly... as much as I may like Lisp... Scheme and Common Lisp just feel... bloated. I love Arc's shortness and simplicity, and Python has that too; I just hate the way Python behaves sometimes (curse Python's lambdas and lack of proper closures).
I admit that would certainly be an easier choice, in some ways. I do like the idea of writing an Arc interpreter in Arc... but from my initial tests, Arc seems quite slow with string/list processing. Maybe it was just my program. Isn't that already being handled with arcc, though?
Ah, if by "arcc" you're referring to https://github.com/nex3/arc/tree/arcc, that's an Arc compiler (replacement for ac.scm) written in Arc. Thus while it produces code that runs faster than interpreting code would, it has the same limitations as ac.scm.
"Arc seems quite slow with string/list processing"
Do you have an example handy? ("Here's a program in Python, and here's the program in Arc, and see how Arc is slower"). ...There may be things we can do to speed Arc up, but having a runnable example is useful. I.e., I might do something that makes my Arc program run faster, but it might not necessarily make your Arc program run faster.
Oh, at first glance it seemed to behave somewhat like an interpreter, my mistake.
---
Yeah, I do, actually. I wrote a program that can take an (ad-hoc and simple) playlist format and convert it into .xspf. I then rewrote the program in Arc (which was much nicer than writing it in Python), but it ended up being drastically slower. Which means I'm not sure if it's a problem in Arc, or Racket, or my program! It could be any combination of those three.
The program itself should have O(n^2) complexity, due to error checking (which actually isn't even implemented in the Arc version...). If I got rid of the error checking I could get it down to O(n log n), but I don't want to do that.
In any case, the complexity should be the same for both Python and Arc. If I recall correctly, the slowness was primarily caused by my searching for whether one string is a substring of another. Python has string.find (which performed faster than regexps when I tried it), but I'm guessing Arc's implementation is slow.
This is all wishy-washy guessing, though. I haven't done concrete tests or anything, so take it with a grain of salt. However, I'm vaguely interested in finding out what the performance bottleneck is, and possibly fixing it.
---
Edit: I just tried these:
# Python
for item in range(100000):
    "foobarquxcorge".find("foobar") != -1
; Arc
(repeat 100000 (posmatch "foobar" "foobarquxcorge"))
And got some very weird results... the Python version consistently takes about 2 seconds. When I first tried the Arc version, it took something like a minute or two. Aha! So that's the bottleneck? Not so fast. I then tried it a second time, and it took ~5 seconds... slower than Python, but not by too much.
It seems pgArc's performance isn't very reliable. I've noticed at the REPL that running the same (or similar) code will sometimes be instantaneous, and other times it'll chug along for a few seconds. I don't think it should be taking 1-2 minutes for something that should take 5 seconds, though.
However, my program pretty consistently took much longer than it does in Python, so I think I can safely rule out posmatch, which actually seems quite fast (almost as fast as Python, anyways).
There's room for improvement in posmatch, but it doesn't seem to be the smoking gun (at least not yet). For fun, I tried this:
# Python
import re
for item in range(100000):
    re.search("foobar", "foobarquxcorge")
It took ~10 seconds, so as I said, regexps are slower than find in Python. Possibly because it has to create a match object rather than just returning a number? I don't know.
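(If anyone wants to reproduce this with less noise, Python's stdlib timeit module is the usual tool; a minimal sketch of the same measurement:)

import timeit

# Three batches of 100,000 searches; if GC kicks in, one batch's
# time will stand out from the others.
print(timeit.repeat('"foobarquxcorge".find("foobar") != -1',
                    number=100000, repeat=3))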
Times will be variable because of garbage collection; that's normal. But having your posmatch example take a minute is very weird; I've never seen anything like that happen.
I'm actually surprised to hear that posmatch is almost as fast as Python's find. Python's find, after all, isn't written in Python (the way Arc's posmatch is written in Arc); it's written in C.
If you use a regex more than once you'll want to compile it. Both Python and Racket have this capability.
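A sketch of the Python side of that (the re module also caches recently used patterns internally, but compiling once up front makes the cost explicit):

import re

pattern = re.compile("foobar")        # pay the compilation cost once
for item in range(100000):
    pattern.search("foobarquxcorge")  # reuse the compiled pattern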
I've sometimes experienced extremely long delays with DrRacket (I use DrRacket as an editor; I use the command-line "racket" to interact with Arc, and haven't observed these delays with racket). Perhaps Pauan is using DrRacket? And as for why that happens... I think it was a) trying to complete a garbage collection with b) a large part of its memory put into swap space. I'm puzzled by how it could take so long to un-swap that much memory (max 360-ish MB)... perhaps it did it in an impressively inefficient way (like, load page 1 containing a certain cons cell, then seek the disk for page 25359 containing its car, then seek back to page 2 for the cdr, and so on). Also, perhaps displaying the "recycling" animation contributes to the problem. Hmm, perhaps I'll figure that out at some point.
If you're not using DrRacket, then I'm not sure what might cause it, other than some unlikely things (e.g. memory of "racket" was swapped to disk and you were moving massive files around).
Meanwhile, if you really care about string-searching, or find it to be a pain point, you may want to implement the awesome Boyer-Moore algorithm: http://en.wikipedia.org/wiki/Boyer-Moore
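To give a feel for it, here's a compact Python sketch of the Horspool simplification (bad-character rule only); it's not full Boyer-Moore, but it shows the skip-ahead idea that makes these algorithms fast:

def horspool_search(needle, haystack):
    """Return the index of the first occurrence of needle, or -1."""
    n, m = len(haystack), len(needle)
    if m == 0:
        return 0
    # Bad-character table: how far we may safely shift when the
    # haystack character aligned with the needle's last position is c.
    shift = {c: m - 1 - i for i, c in enumerate(needle[:-1])}
    pos = 0
    while pos <= n - m:
        i = m - 1
        while i >= 0 and haystack[pos + i] == needle[i]:
            i -= 1          # compare right-to-left
        if i < 0:
            return pos      # every character matched
        pos += shift.get(haystack[pos + m - 1], m)
    return -1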
No, I'm just using Arc on the command line. Garbage collection was my bet, and you're right that it could have been swap as well.
---
Yeah, might actually be a good idea to integrate that into the base posmatch, but as I said, posmatch isn't actually that slow compared to Python's find, so the bottleneck is probably elsewhere.
But yes, obviously in production-level code you should be compiling/caching them yourself, if only on a "just-in-case" basis.
---
Also, I would like to point out that in all my time using Python, it's been very consistent as far as speed goes (no 3 second pauses), so would that imply that Python's garbage collector is more incremental than Racket's?
One possibility is that Python uses reference counting, which immediately frees non-cyclic garbage as you go, plus (iirc) an occasional garbage-collection pass to free the remaining cyclic garbage. So I'm just guessing, but if you're not creating much cyclic garbage, maybe that explains why you're not seeing many garbage-collection pauses.
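That matches my understanding of CPython: refcounting reclaims most garbage the moment it becomes unreachable, and a separate cycle collector runs now and then. A small illustration (the exact count gc.collect() reports is an implementation detail):

import gc

class Node:
    pass

# Non-cyclic garbage: freed the instant its refcount hits zero,
# with no collector pause involved.
a = Node()
a = None

# Cyclic garbage: refcounts inside the cycle never reach zero, so it
# lingers until a cycle-collection pass runs.
b = Node()
b.self_ref = b
b = None
print(gc.collect())  # force a pass; prints how many unreachable objects it found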