"but as far as I know, you can't globally redirect them."
That's what I did with (stderr:stdout). Calling 'stderr with an argument sets (stderr) to that value permanently (or somewhat permanently, depending on whether other code also mucks around with 'stderr).
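Spelled out without the ssyntax, that's just:
(stderr (stdout))  ; from now on, error output goes wherever (stdout) currently points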
---
"I meant "the value isn't created dynamically at run-time.""
The result of (stderr) or (stdout) isn't freshly allocated or anything, it's just kept in a dynamic box (a Racket parameter) which you call to unwrap. It's just as constant as if it were kept in a variable, except that when you rebind it, you have the option to temporarily rebind it in a way that's friendly with continuations and threads.
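For example, in Arc 3.1 both of these rebind (stdout) only for the duration of their bodies (it's Racket's parameterize underneath) and restore it afterwards:
(tostring (prn "hello"))          ; rebinds (stdout) just for the body; returns "hello\n"
(w/stdout (stderr) (prn "oops"))  ; same mechanism, with an explicit target port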
If you replace them with regular global variables, it's probably a good idea to implement those particular variables using threading.local() somehow, just so one thread's (w/stdout ...) doesn't mess up any other threads.
Wow. In the current source for Penknife (http://github.com/rocketnia/penknife), I have an abstraction layer over input streams, which deals in tagged values of type 'fn-input. Any function that allows x!read, x!peek, and x!ready can be wrapped up as a 'fn-input, and Penknife uses this to skip comments and normalize newlines. I find it amusing our designs are so similar. :)
Your (stdin 'line) example hits upon something else I've thought about for Penknife, the ability to read different kinds of units from streams by passing the unit as a parameter. That said, the approach lost favor with me before I did anything with it, 'cause I want people to be able to define new units rather than being limited to ones that are hardcoded into the stream's implementation. The current way I imagine tackling this domain in Penknife is to have extensible global functions [lines x], [chars x], and [bytes x] which coerce arbitrary (but sensible) things into things which can be iterated upon. An input stream would be one example of a sensible x.
By the way, (coerce (stdin) 'byte) doesn't seem like a good way to read bytes if Unicode's on the scene.
---
"Of course, I'm not sure how to implement continuations, and likely won't get around to it for a while."
Adding continuations to a language implementation seems tough. I see two options:
1. Piggyback on a platform that already has them, or use low-level wizardry like stack copying to force the platform to appear to have supported them all along. If you can manage this, then the code of the language implementation can stay relatively straightforward, it can be mostly the same code as you had before adding continuation support, and it can interact intuitively with the platform's existing tools and libraries.
2. Implement them explicitly in the language core, for instance by calling everything in CPS or by maintaining explicit call stacks. This lets you implement continuations that have exactly the features you want, rather than just the features that come naturally from the platform.
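To make the second option more concrete, here's a toy sketch of the CPS idea in Arc. The names eval-cps, eval-app, eval-ccc, and lookup, and the restriction to one-argument applications, are all simplifications invented for illustration, not anyone's actual implementation:
(def eval-cps (expr env k)
  (if (isa expr 'sym)       (k (lookup env expr))  ; lookup: hypothetical env accessor
      (atom expr)           (k expr)
      (is (car expr) 'ccc)  (eval-ccc expr env k)
                            (eval-app expr env k)))

(def eval-app (expr env k)
  ; evaluate the operator, then the single operand, then apply
  (eval-cps (car expr) env
            (fn (f)
              (eval-cps (cadr expr) env
                        (fn (a) (f a k))))))

(def eval-ccc (expr env k)
  ; the reified continuation is itself a CPS function of a value and a
  ; continuation; it ignores the continuation it's given and jumps back to k
  (eval-cps (cadr expr) env
            (fn (f) (f (fn (v k2) (k v)) k))))
Because every step already passes k around explicitly, "capturing the continuation" is nothing more than handing the user that k.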
---
"In any case, I want some relatively-easy way to create streams in Arc."
I understand if that's not as easy as you're looking for, though.
...Hey, you could implement 'yield using threads. That's probably easier, huh? XD It might have odd interactions with thread-locals and locking, but implementing 'yield with continuations probably isn't everyone's idea of intuitive either.
"...Hey, you could implement 'yield using threads. That's probably easier, huh? XD It might have odd interactions with thread-locals and locking, but implementing 'yield with continuations probably isn't everyone's idea of intuitive either."
Threads seem kinda heavy-handed, though, don't you think? In any case, I'm mostly sold on this idea for streams/iterators:
;; adapted from http://c2.com/cgi-bin/wiki?ExternalIterationUsingContinuations
(def make-stream (stream)
  (let (yield next) nil
    (= next (fn (value)
              (ccc (fn (c)
                     (= next c)
                     (yield value)))))
    (fn args
      (ccc (fn (return)
             (= yield return)
             (apply stream next args))))))

(mac stream (parms . body)
  `(make-stream (fn ,(cons 'yield parms) ,@body)))
Note: the above is designed to work in pgArc, and would require some small changes to work in py-arc (when I get continuations working). Also, it mostly works, but there's still one bug I need to sort out.
---
Basically, `make-stream` accepts a function whose first argument behaves like `yield`.
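Here's a hypothetical usage sketch along those lines (the counter is purely for illustration, and the caveats in the note above still apply):
(= naturals (stream ()
              (let n 0
                (while t
                  (yield n)
                  (++ n)))))

(naturals)  ; -> 0
(naturals)  ; -> 1, and so on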
I'm amazed at how short your definitions are, but I see two problems, not necessarily counting bugs: First, your generators can't yield nil without appearing to terminate. Second, it seems like the created iterators will take arguments the first time they're called and ignore them thereafter, which makes looping over them a bit more of a pain since the initial loop is treated differently from the others.
I have a suggestion we can both benefit from: Forget generator parameters altogether. While it's tempting to emulate Python and C# with a definition form like this,
(gen foo (a b c)
  (each elem b
    yield.a yield.elem yield.c))
it's probably simpler to implement, just as useful, and just as flexible to have a parameterless syntax like this one:
(def foo (a b c)
  (accum-later:each elem b
    yield.a yield.elem yield.c))
Incidentally, 'yield is a good default name for the 'accum variable. I almost always say (accum acc ...), and I've wanted to make an anaphoric version for a long time, but I've never been comfortable having two related names that look like they abbreviate the same word. :-p
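For what it's worth, given the make-stream and stream definitions above, a first stab at 'accum-later could be as thin as this, with all the same caveats as the code it builds on (including nil doubling as the terminator):
(mac accum-later body
  `(stream () ,@body))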
"First, your generators can't yield nil without appearing to terminate"
Yeah, I know. I considered that an okay tradeoff, since they're designed to be... well, streams, you know? If necessary, you could wrap the returned values in a list, letting the caller know whether it's nil the value, or nil the terminator.
---
"Second, it seems like the created iterators will take arguments the first time they're called and ignore them thereafter, which makes looping over them a bit more of a pain since the initial loop is treated differently from the others."
"I have a suggestion we can both benefit from: Forget generator parameters altogether."
How is that clearer, shorter, or overall better? Looks clunky to me. To be more specific, what do we gain from it, aside from (as you claim) it being easier to implement?
Just simplicity. My question is, what do we gain from having these parameters?
What's an iterator good for, except to iterate over? There are only three everyday utilities I see using these things with: 'each, 'all, and 'some--and 'each and 'all can be implemented in terms of 'some.
None of those utilities has any spot to provide extra arguments to the Iterable when making its Iterator (to use the Java terms), nor any spot to provide extra arguments to the Iterator when getting each of the elements. This doesn't mean we need to add these spots.
You could go backwards... or reset it to the beginning, or send it values, ala coroutines. Or then there's peek vs. consume, etc.
I see streams as a generalization of iterators. They can be used to create simple iterators, yes, but they can also provide additional features that iterators don't (usually) have.
It's okay that functions like `each` can handle iterators without needing to pass arguments: that's a good thing. In fact, I considered it to be pretty important that passing it no arguments would do a "default action" like retrieve the next element.
---
What do we gain from having the parameters? I believe more extensibility. In fact, here's an idea that I was juggling around: prototypes (surprised?). Basically, it could work something like this:
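(What follows is a hedged reconstruction in plain Arc, with invented names and an explicit optional fail argument, a plain closure, standing in for the fail continuation and its hidden plumbing.)
(def foo (x y (o fail (fn () (err "no method"))))
  (if (is x 'name)
      "foo"
      (fail)))

(def bar (x y (o fail (fn () (foo x y))))   ; foo acts as bar's prototype
  (if (is x 'hello)
      (+ "hello " y)
      (fail)))

(bar 'hello "world")  ; -> "hello world"
(bar 'name nil)       ; bar has no 'name case, so it falls through to foo -> "foo"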
What's going on here? Basically, a prototype is a function that can selectively delegate to another function by calling the `fail` continuation. I note that this is similar to rocketnia's idea of :fail or somesuch.
All this could be wrapped up in a macro or two, making it quite simple and short:
(proto bar (foo y)
  'hello (+ "hello " y))
Anyways, when a proto calls `fail`, it then calls the prototype (in this case foo) with the same arguments. It can then do whatever it likes. Why is this good?
Well, for starters, it unifies the concepts of objects, classes, methods, functions, and prototypes... secondly, it can give us more extensibility. Specifically, how do we easily extend all functions of this type, without affecting that type? And what if we want to extend a particular function, without affecting other similar functions?
One solution would be to create prototypical functions, that act as "bases" that other functions can build off of. Consider the case of stdin, which would have the 'char and 'peek "methods":
(def stdin (x)
  (case x
    'peek ...
    'char ...))
If you want your function to behave like stdin, you could define such methods in your function, but using stdin as the prototype:
(proto my-stdin (stdin)
  'peek ...
  'char ...)
Now, what happens is, if you call (my-stdin 'peek) or (my-stdin 'char) it will behave normally. But... what if you want to add an 'xml-node method to all the input functions? Simple, just extend stdin:
(extend stdin (x) (is x 'xml-node)
  ...)
Because my-stdin inherits from stdin, it'll see the new method automatically! Oh my, this is looking an awful lot like OOP! Indeed, except rather than using objects, methods, classes, etc... it's all just functions. Functions can inherit from any other function. Also, it's a somewhat strange form of OOP, since it uses prototypes rather than more traditional classes.
Note: this methodology would only be useful in the cases where you want to make a certain "type" of function behave differently. Obviously this won't apply to all circumstances, so ordinary functional programming is still good.
I have no more doubt that you're going somewhere interesting with this stuff. ^_^ Funny, I woke up this morning out of a dream in which someone online had started making something very similar to Penknife, and here you are, taking at least some degree of inspiration from failcall.
(Actually, the codebase I dreamed about was written in Java, was organized almost exactly the same way as my Arc code (probably thanks to dream magic), and was actually a language for distributed computing rather than just for extensible syntax. If you hit all those marks, then I'd be outright scared of you, lol.)
---
"it unifies the concepts of objects, classes, methods, functions, and prototypes"
A few years ago I was working with someone who used the phrase "add an else-if" when talking about almost any new feature. But between 'extend, failcall, prototype inheritance, scope shadowing, and operator precedence, it turns out we have an awful lot of sophisticated terms and frameworks for what amounts to adding an else-if. ^_^
"(Actually, the codebase I dreamed about was written in Java, was organized almost exactly the same way as my Arc code (probably thanks to dream magic), and was actually a language for distributed computing rather than just for extensible syntax. If you hit all those marks, then I'd be outright scared of you, lol.)"
Sorry, I'm not going anywhere close to Java code. You must be thinking of some other magic person. :P
Okay, let me put it like this. You want a function that behaves like stdin, so calling (peekc my-stdin) will work. But (readc my-stdin) needs to work too... and possibly (readb) as well. Which means that those functions need to extract internal details out of my-stdin, in order to return a useful value.
The problem is that we want my-stdin to behave differently in different situations, but on the surface it just looks like an ordinary function call, so how do the various functions above get at the internal stuff in my-stdin?
This provides a solution: you call the function and tell it what data you want, and then the function provides that data. Another problem with things like (readc) is that it's all-or-nothing: either you extend it so it changes behavior for all input functions, or you jump through some clunky hoops to make it only work on a subset... but even that's not very extensible.
But with prototypes, you get completely fine-grained control. You can extend an individual function, or extend that function's prototype, or extend that prototype's prototype, etc. This not only gives a way to expose the function's internal details, but also allows for fine-grained extensibility.
Rather than extending something based on its `type` (cons, string, fn, etc.), we can extend a function based on its prototype. The functions still have a type of 'fn, but we get fine-grained control to say "these kinds of functions behave differently in these situations".
"The problem is that we want my-stdin to behave differently in different situations, but on the surface it just looks like an ordinary function call, so how do the various functions above get at the internal stuff in my-stdin?"
In Anarki, here's an easy way to go (if not an easy way to get all the way to what each of us might want):
(defcall input (self)
  readc.self)
With this in place, we can say (my-stream) to read a character, but all the other things we can do with an input stream are still available.
---
One random thing I'm not comfortable with about the (my-stream 'peek) approach is the fact that it uses a symbol, 'peek, which might mean different things to different libraries. Obviously 'peek itself would be toward the core, but I could easily imagine two pull parsers trying to use 'xml-node.
---
"Another problem with things like (readc) is that it's all-or-nothing: either you extend it so it changes behavior for all input functions, or you jump through some clunky hoops to make it only work on a subset... but even that's not very extensible."
Could you elaborate on that somehow? If I "extend" 'readc, I'm probably only changing it to support a whole new range of stream types, not (intentionally) modifying the behavior for existing ones. It shouldn't be too clunky to determine whether something's in that new range, and I don't have a clue what part of this strategy you're saying is less extensible than your idea.
---
"You can extend an individual function, or extend that function's prototype, or extend that function's prototype, etc."
I know what you're talking about there, at least. ^_^ It's interesting to note that both of our approaches put it on the API designer's shoulders to make sure the system is well-factored into extension points. My approach, with direct use of failcall, and with rulebooks rather than deep dependency chains[1], would promote patterns like this:
rulebook.readb
; To simplify the example, we'll assume readc itself isn't a rulebook.
[fun readc [str]
  [let first-byte rely:readb.str
    ...]]
; The use of the "rely" macro above makes readc fail if readb.str
; fails. The above definition of readc is roughly equivalent to this:
;
; [= readc [failfn fail [str]
;            [let first-byte [failcall fail readb.str]
;              ...]]]
;
; The "fun" macro introduces the "fail" variable anaphorically, and
; the "rely" macro expands to refer to it.
;
; Then "failcall" itself is a macro. It parses its second argument
; as an expression, then tries to unwrap it into a function call
; (potentially causing an error), then wraps it back up as a function
; call with a particular fail parameter subexpression. I'm still
; thinking over this design.
;
; Note that most of this stuff is still just a sketch right now.
Certain rulebooks could have rules that acted as inheritance, in your sense:
rulebook.my-plus
[rule* my-plus [] args
  my-plus/+ ; This is the name of the rule.
  [rely:apply + args]]
I expect this to be a common design pattern, common enough that I might end up with macros that capture patterns similar to some of your prototype system's patterns. In fact, depending on how I settle on implementing rulebooks (which is up in the air thanks to module difficulties), that pattern might already be easy to accomplish by going to a lower level:
[1] By "deep depencency chains," I mean that I assume you're talking about having patterns whereby A is the prototype of B, B is the prototype of C, C is the prototype of D, and people only ever use D most of the time. (A, B, and C might have longer names.) The rulebook equivalent would be to have a rulebook E which keeps track of A, B, C, and D. I think rulebooks probably make it easier to forget about the precedence order of different cases (for better or worse), while also making it possible to add and remove cases without having to rewire their neighbors (say, possible to remove B without fiddling with C's prototype).
"With this in place, we can say (my-stream) to read a character, but all the other things we can do with an input stream are still available."
Yeah, but that doesn't help if you want to extend readc or peekc so they understand your new stream. You need to extend readc and peekc directly. And then how do those functions get at the data they need?
---
"One random thing I'm not comfortable with about the (my-stream 'peek) approach is the fact that it uses a symbol, 'peek, which might mean different things to different libraries. Obviously 'peek itself would be toward the core, but I could easily imagine two pull parsers trying to use 'xml-node."
That is true, but how is that any more arbitrary than saying that the global functions peekc, readc, etc. are reserved for input streams? The symbol 'peek simply means whatever the function wants it to mean. Thus, a function that behaves like an input stream can be called like an input stream. Duck typing.
---
"Could you elaborate on that somehow? If I "extend" 'readc, I'm probably only changing it to support a whole new range of stream types, not (intentionally) modifying the behavior for existing ones. It shouldn't be too clunky to determine whether something's in that new range, and I don't have a clue what part of this strategy you're saying is less extensible than your idea."
Why not modify the behavior of existing ones? In fact, readline does that right now, by relying on readc. Thus, if your input supports readc, it supports readline too.
And it's not just readc either, it could be other functions, it's just that this is a discussion related to input streams. So, yeah, it's possible to do it with the current model, but I think it's clunkier and leads to more (and uglier) code.
Okay... let me try to explain... I want to allow Arc users to create new data types, like input streams. Right now, this is possible only by extending built-in functions like readc, peekc, etc. Something like this:
(extend readc (x) (isa x 'my-data-type)
  ...)
...but now you end up needing to define a new data type, and you need to extend all the built-in functions individually. You can avoid that by using coerce...
(def readc (x)
  (do-something (coerce x 'input)))
...the problem is the "do-something" part. Even after readc coerces your custom input type to 'input, how does it extract the data it needs? How does it read a character? How does readc access the internals of your data type? You could special-case it to be a function call, but then what about peek? What about reading bytes?
What I am proposing is a way to solve both problems in one fell swoop: functions are typed not based on their `type`, but based on the pseudo-methods they contain. Duck typing. And by calling these "methods", functions (like readc) can extract the data they need. Now, you can define readc like this:
(def readc (x)
  (x 'char))
And now any function that supports 'char automatically gets access to readc without needing to extend it. What if a different function also uses 'char? No problem, it works too! Thus, the functions don't care about the type of their argument, they only care that it supports the methods they need to do their job. And by leveraging prototypes, it lets you extend the scope chain at any point, giving more fine-grained control than simply whether data is a string or a cons or whatever.
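To make that concrete, here's a hypothetical character source that plays along with that readc (string-input is an invented name, and the parameterless call is the "default action" mentioned earlier):
(def string-input (s)
  (let i 0
    (fn ((o msg))
      (if (>= i (len s))  nil                  ; exhausted
          (no msg)        (do1 (s i) (++ i))   ; default action: read a char
          (is msg 'char)  (do1 (s i) (++ i))
          (is msg 'peek)  (s i)))))

(readc (string-input "hi"))  ; -> #\h, via the duck-typed readc above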
By the way, you can combine this approach with conventional types as well:
(def readc (x)
  ((coerce x 'input) 'char))
The above is more strict, because it not only expects the data to be coercible to 'input, but it also expects it to have a 'char "method". This could potentially handle your use-case about conflicting libraries using the same method names: you can use the function's actual `type` to differentiate between them.
---
Random side note: this of course means that I can define tables so that they are functions of type 'table that have a 'get and 'set method. Then you can create a custom table just by creating an annotated function that has 'get and 'set. And then `apply` could be designed so when you call a table with (foo 'bar), it calls (foo 'get 'bar) instead.
This makes it super-easy for Arc code to define custom table types. Or custom input types. Or custom any-type, really, since all composite data types can be represented as functions with methods.
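Here's a hedged sketch of that table idea, with invented names and an assoc list standing in for real storage:
(def fn-table ()
  (let backing nil
    (fn (msg . args)
      (if (is msg 'get)  (alref backing (car args))
          (is msg 'set)  (let (k v) args
                           (push (list k v) backing)
                           v)))))

(= counts (fn-table))
(counts 'set 'apples 3)
(counts 'get 'apples)  ; -> 3
Given the apply behavior described above, (counts 'apples) would then route to the 'get method, and a similar sref hook could route (= (counts 'apples) 3) to 'set.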
This is what I'm trying to explain. If you have a table called foo, then calling (foo 'bar) is a "get" action, and calling (= (foo 'bar) 5) is a "set" action. foo is a single data-type, it is self-contained, but it has two different behaviors in two different areas.
If you want to emulate that, you end up needing to jump through some hoops, like the following:
(def my-table ()
  (obj get (fn (n) ...)
       set (fn (n v) ...)))
...and now you have the fun of extending sref and apply (and probably eval as well) so they behave properly with your custom table. Remember my prototype post a while back? I couldn't even get it to work in pgArc. What a pain. So much easier if all your function needs to do is implement the required methods, and everything just works without needing to extend anything.
And this "functions with methods" idea is applicable to other data types as well, like input streams. So for the same reasons that creating custom table types are a massive pain, creating custom input types are a massive pain. But this proposal makes it super easy.
---
"My approach, with direct use of failcall, and with rulebooks rather than deep dependency chains[1], would promote patterns like this:"
By the way, I would expect my proposal to be completely compatible with rulebooks as well... the only requirement is that the call (foo) return a char, and (foo 'peek) return a char, but not consume the stream. It's completely up to foo how it handles delegation and/or inheritance. I merely provided one potential way: prototypes.
But the proposal works even with ordinary functions that aren't prototypes. In fact, they don't even need to be functions per se; for instance, I would expect the following to work:
(= foo (obj char #\f))
(readc foo) -> #\f
Neat. Since foo is a table, and (foo 'char) returns #\f, readc was able to use it. This, of course, wouldn't work in the strict version of readc, which tries to coerce its argument to 'input. But this would work:
"By "deep depencency chains," I mean that I assume you're talking about having patterns whereby A is the prototype of B, B is the prototype of C, C is the prototype of D, and people only ever use D most of the time. (A, B, and C might have longer names.)"
It could be that way, but shallower trees are also quite possible, and in fact I expect those would be the norm. My suggestion to authors of functions would be to only increase the depth of the tree as needed.
"Yeah, but that doesn't help if you want to extend readc or peekc so they understand your new stream. You need to extend readc and peekc directly."
Sounds like you're solving your own problem. :)
"And then how do those functions get at the data they need?"
Why hide it?
Anyway, I'd put the extensions of 'readc and 'peekc in the same place as the rest of my code that dealt in the concrete details of my custom stream type. That way, I can pretend in all my other code that the data is strictly encapsulated, and when I do change the implementation, everything I need to refactor is in the same page or two of code.
---
"That is true, but how is that any more arbitrary than saying that the global functions peekc, readc, etc. are reserved for input streams?"
If you're saying the symbol 'peek is just as arbitrary as the global variable name 'peekc, I agree, but global variables are the more likely thing to be handled by a namespace system. :) If that's not what you're saying, whoops.
---
"Why not modify the behavior of existing ones? In fact, readline does that right now, by relying on readc. Thus, if your input supports readc, it supports readline too."
Huh? Using 'readline doesn't change what happens the next time you use 'readc. I think we're having some word failures here.
Maybe what you mean by "modify" is more of a pure functional thing, where you "change" a list by removing its first element when you call 'cdr. But then I still don't understand what you meant in your original statement that "Another problem with things like (readc) is that it's all-or-nothing."
"you need to extend all the built-in functions individually . You can avoid that by using coerce..."
Darn 'coerce, always enticing people to use it. ^_^ It's actually possible to use a coercion pattern here, but you'll need to check whether you can coerce at all, and go on to other extensions if not. (This is something failcall and rulebooks make convenient.) However, to create a value of type 'input, you still need to specify all the reading and peeking behavior somewhere, and I prefer to specify those behaviors in separate pieces of ravioli, in this case by extending each function individually.
"Even after readc coerces your custom input type to 'input, how does it extract the data it needs?"
Exactly, I wouldn't turn something into an 'input and then turn it back; by extending 'readc directly, I'd preempt the coercion step.
To digress a bit, coercion is a positively useful design pattern when there's more than one sensible set of "axioms" for a set of utilities. If I see utilities A1, A2, B1, B2, B3, and B4, and I notice that the B's can be implemented in terms of the A's, then I can do the following:
1. Write a function C of the form "Take any value, and return a similar value that supports all the A's." I call this a coercion function, but I don't expect everyone to agree with that. ^_^
2. Extend the B's so that they try using C. (If there's no standard way to "try" using a function, such as failcall, then C needs to indicate failure in its own way, or there needs to be another function D that people call to see if they can call C.)
3. Sometimes it's useful (if uninspired) to create a boilerplate concrete type for C to return when there's no better way to make the right A-savvy value. This type tends to take the form of a wrapper containing a function to call for the A1 behavior and a function to call for the A2 behavior. Obviously, the A's should be extended to support this type; that's the only point of it.
After this, if I ever want to extend the B's, there's a good chance I can do it by finding (or making) something that extends the A's instead, and then extending C to do the conversion. After a certain initial cost (making C and C's general-purpose return type), this eventually becomes a net decrease in the number of extensions needed.
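To make this concrete, here's one hypothetical rendition in Arc, leaning on Anarki's extend; every name in it is invented for illustration:
(def char-input (readc-fn peekc-fn)       ; step 3: boilerplate concrete type
  (annotate 'char-input (list readc-fn peekc-fn)))

(extend readc (x) (isa x 'char-input) ((rep.x 0)))   ; teach the A's the new type
(extend peekc (x) (isa x 'char-input) ((rep.x 1)))

(def to-input (x)                         ; step 1: the coercion function C
  (if (in (type x) 'input 'char-input) x))  ; nil signals "can't coerce"

; Step 2 would be extending B's such as readline to try to-input first; as it
; happens, arc.arc's readline is already written in terms of readc, so that
; particular B starts handling the new type as soon as readc does.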
...And I've run out of time. I'll get to the rest of your post later! ...And your followups someday. XD
I would like to point out that prototypes and methods are completely separate subjects that serve a similar purpose (extensibility) in completely separate ways. Perhaps I shouldn't have discussed them in the same post.
Methods solve the problem of polymorphism, namely the ability to easily define new types (and new instances of existing types) that can intermesh with existing code (like the built-ins readc and peekc). They do this by implementing duck typing: if the function supports the method, just use it.
This can be augmented by a technique that I've come to like: coercion. Rather than saying "my argument needs to be of type foo", the function just coerces to type 'foo. If it can't be coerced, it will automatically throw an error.
The reason I like the coerce approach is because it means you can easily create completely new types. So if you create a new 'byte type, you can extend coerce so it can be coerced into 'char, 'int, 'num, etc. and existing code will work with it.
The reason I like the method approach is that it makes it easy to create new instances of existing data types. Like I mentioned, it makes it easy to create custom 'table types. It also maximizes the benefit of prototypes, and in some cases allows completely new types to piggyback off of existing functions, which is made easy with prototypes.
The reason I call it "more extensible" is for the exact same reason I call the coerce approach more extensible. With the coerce approach, the function doesn't care what type it is, it only cares that it's possible to coerce it to the type it wants.
In the case of methods, the function doesn't care what type it is, it only cares that it's possible to extract the data it needs, using the function's methods.
---
Prototypes, however, serve a different purpose. Specifically, they try to solve the problem where you sometimes want to extend a particular function, and sometimes want to extend many functions at once. They're designed to give fine-grained control beyond merely what the type is. Also, by letting one function serve as a "base" for other functions, they try to reduce duplicate code.
All three concepts are designed to enhance extensibility, but they do so from different angles, and in different ways. You'll note that all three attempt to achieve extensibility by ignoring what the type of something is. Instead, they focus on either the desired type, the desired functionality, or the desired ancestor.
The three combine to form a coherent whole, which I think is as it should be.
Perhaps I should write up a post comparing the two approaches (prototypes/methods vs. pure functions). Depending on the results, that could either convince me that my idea is silly, or convince you guys that it has merit.
Coercion means you only need to define coercion rules in one spot, and existing code can Just Work.
Methods mean you only need to define a method that implements the functionality, and existing code can Just Work.
---
Prototypes get rid of the traditional concept of type entirely. Rather than saying "that's an input stream" or "that's an object of type input" we instead say "that's a function that inherits from stdin."
Of course, I don't plan to get rid of types, since I think they're useful (especially with coercion), but at the same time, I think having more fine-grained control can be useful, so I think there's some synergy between types and prototypes.
"That's what I did with (stderr:stdout). Calling 'stderr with an argument sets (stderr) to that value permanently (or somewhat permanently, depending on whether other code also mucks around with 'stderr)."
Huh, I didn't know that.
---
"If you replace them with regular global variables, it's probably a good idea to implement those particular variables using threading.local() somehow, just so one thread's (w/stdout ...) doesn't mess up any other threads."
Right, threads. I hadn't given any thought to them (they're not implemented yet, either). I'll have to revisit this issue later, when I actually implement threads.
---
"By the way, (coerce (stdin) 'byte) doesn't seem like a good way to read bytes if Unicode's on the scene."
Hm... I guess it depends on what you expect it to return. If it's a Unicode character that can be represented as a single byte (in UTF-8), then just returning that should be fine. Trickier if the code point is represented as more than one byte, but wouldn't that still be possible, assuming code would be expecting multiple bytes? Of course then there's the old fallback of (readb) or (stdin 'byte).
---
"Implement them explicitly in the language core, for instance by calling everything in CPS or by maintaining explicit call stacks. This lets you implement continuations that have exactly the features you want, rather than just the features that come naturally from the platform."
I was leaning toward that approach (everything CPS), but I'm still mulling it over. Python doesn't have TCO, though, so wouldn't CPS exceed the recursion limit in Python, unless I implement a trampoline? In which case things will get even slower, hurrah.
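For reference, the trampoline idea boils down to something like this, sketched here in Arc rather than Python and with invented names: CPS'd code returns thunks instead of making nested calls, and a driver loop keeps invoking them until a real value comes back.
(def bounce (f) (annotate 'bounce f))

(def trampoline (x)
  (while (isa x 'bounce)
    (= x ((rep x))))
  x)

; mutually recursive CPS-style helpers that return thunks instead of nesting calls
(def even-k (n k) (if (is n 0) (k t)   (bounce:fn () (odd-k  (- n 1) k))))
(def odd-k  (n k) (if (is n 0) (k nil) (bounce:fn () (even-k (- n 1) k))))

(trampoline (even-k 100000 idfn))  ; -> t, with bounded stack depth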
"Adding continuations to a language implementation seems tough."
The more I think about it, the more I feel like implementing my interpreter in Ruby, which has call/cc, and proper lambdas (unlike Python's terrible ones). And then, I could name it Arubic. Awesome name, right?
Pretty awesome, yeah. XD It would actually make it less useful to me personally though, since my cheap web host doesn't support Ruby. >.>
Anyhow, I think Ruby's continuation support is a bit sketchy. There's no equivalent of Scheme's 'dynamic-wind built into the language as far as I know, so it may be necessary to use a patch like this one, which replaces callcc with a dynamic-wind-aware equivalent: https://github.com/mame/dynamicwind
Since there's no dynamic-wind, I haven't looked to see if there's a way to do continuation-friendly dynamic scope, like Racket's parameters. If that callcc replacement works though, it should be possible to implement parameters on top of Ruby, just maybe not that efficiently. :-p
Racket does a bunch of other interesting stuff with continuations, such as continuation marks, continuation guards, and... maybe that's it... and we'd lose those, but I think they're mainly useful for performance and well-specified 'letrec behavior, and who cares about that stuff? :-p
You might also find writing an Arc interpreter in Racket or even Arc3.1 a lot of fun:
- you'll get continuations, threads, and parameters that actually work
- you'll be able to easily implement things like fexpr's, first class macros, or serializable closures in your interpreter that Arc3.1 doesn't have (hard to do in a compiler)
- you can write parts of your program in Arc3.1 that you want to run fast (and doesn't need your extra interpreter features) and other parts using your Arc interpreter, and it can all work together
Yeah, I also considered that, or Chicken Scheme, or Common Lisp, but honestly... as much as I may like Lisp... Scheme and Common Lisp just feel... bloated. I love Arc's shortness and simplicity, and Python has it too, I just hate the way Python behaves sometimes (curse Python's lambdas and lack of proper closures).
I admit that would certainly be an easier choice, in some ways. I do like the idea of writing an Arc interpreter in Arc... but from my initial tests, Arc seems quite slow with string/list processing. Maybe it was just my program. Isn't that already being handled with arcc, though?
Ah, if by "arcc" you're referring to https://github.com/nex3/arc/tree/arcc, that's an Arc compiler (replacement for ac.scm) written in Arc. Thus while it produces code that runs faster than interpreting code would, it has the same limitations as ac.scm.
"Arc seems quite slow with string/list processing"
Do you have an example handy? ("Here's a program in Python, and here's the program in Arc, and see how Arc is slower"). ...There may be things we can do to speed Arc up, but having a runnable example is useful. I.e., I might do something that makes my Arc program run faster, but it might not necessarily make your Arc program run faster.
Oh, at first glance it seemed to behave somewhat like an interpreter, my mistake.
---
Yeah, I do, actually. I wrote a program that can take an (ad-hoc and simple) playlist format and convert it into .xspf. I then rewrote the program in Arc (which was much nicer than writing it in Python), but it ended up being drastically slower. Which means I'm not sure if it's a problem in Arc, or Racket, or my program! It could be any combination of those three.
The program itself should have O(n^2) complexity, due to error checking (which actually isn't even implemented in the Arc version...) If I got rid of error checking, I could get it down to O(n log n) but I don't want to do that.
In any case, the complexity should be the same for both Python and Arc. If I recall, the slowness was primarily caused by me searching for whether one string is a substring of another string. Python has string.find (which performs faster than regexps, when I tried it), but I'm guessing Arc's implementation is slow.
This is all wishy-washy guessing, though. I haven't done concrete tests or anything, so take it with a grain of salt. However, I'm vaguely interested in finding out what the performance bottleneck is, and possibly fixing it.
---
Edit: I just tried these:
# Python
for item in range(100000):
    "foobarquxcorge".find("foobar") != -1
; Arc
(repeat 100000 (posmatch "foobar" "foobarquxcorge"))
And got some very weird results... the Python version consistently takes about 2 seconds. When I first tried the Arc version, it took something like a minute or two. Aha! So that's the bottleneck? Not so fast. I then tried it a second time, and it took ~5 seconds... slower than Python, but not by too much.
It seems pgArc's performance isn't very reliable. I've noticed sometimes at the REPL that running the same (or similar) code will sometimes be instantaneous, and other times it'll chug along for a few seconds. I don't think it should be taking 1-2 minutes for something that should be taking 5 seconds, though.
However, my program pretty consistently took much longer than it does in Python, so I think I can safely rule out posmatch, which actually seems quite fast (almost as fast as Python, anyways).
There's room for improvement in posmatch, but it doesn't seem to be the smoking gun (at least not yet). For fun, I tried this:
import re
for item in range(100000):
    re.search("foobar", "foobarquxcorge")
It took ~10 seconds, so as I said, regexps are slower than find, in Python. Possibly because it has to create a match object, rather than just returning a number? I don't know.
Times will be variable because of garbage collection, that's normal. But having your posmatch example take a minute is very weird though, I've never seen anything like that happen.
I'm actually surprised to hear that posmatch is almost as fast as Python's find. Python's find, after all, isn't written in Python (like how Arc's posmatch is written in Arc), it's written in C.
If you use a regex more than once you'll want to compile it. Both Python and Racket have this capability.
I've sometimes experienced extremely long delays with DrRacket (I use DrRacket as an editor; I use the command-line "racket" to interact with Arc, and haven't observed these delays with racket). Perhaps Pauan is using DrRacket? And as for why that happens... I think it was a) trying to complete a garbage collection with b) a large part of its memory put into swap space. I'm puzzled by how it could take so long to un-swap that much memory (max 360-ish MB)... perhaps it did it in an impressively inefficient way (like, load page 1 containing a certain cons cell, then seek the disk for page 25359 containing its car, then seek back to page 2 for the cdr, and so on). Also, perhaps displaying the "recycling" animation contributes to the problem. Hmm, perhaps I'll figure that out at some point.
If you're not using DrRacket, then I'm not sure what might cause it, other than some unlikely things (e.g. memory of "racket" was swapped to disk and you were moving massive files around).
Meanwhile, if you really care about string-searching, or find it to be a pain point, you may want to implement the awesome Boyer-Moore algorithm: http://en.wikipedia.org/wiki/Boyer-Moore
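To make that concrete, here's a rough, untested Arc sketch of Boyer-Moore-Horspool, the simpler cousin that keeps only the bad-character shift; the name is invented and it isn't a drop-in replacement for posmatch, which also takes an optional start index:
(def posmatch-bmh (pat s)
  (withs (m (len pat) n (len s) shift (table))
    ; bad-character table: how far the window may jump when the character
    ; aligned with the pattern's last position is c (default: jump by m)
    (for i 0 (- m 2)
      (= (shift (pat i)) (- m 1 i)))
    (point return
      (let i 0
        (while (<= (+ i m) n)
          (when (all (fn (j) (is (s (+ i j)) (pat j)))
                     (range 0 (- m 1)))
            (return i))
          (++ i (or (shift (s (+ i m -1)))
                    m)))))))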
No, I'm just using Arc on the command line. Garbage collection was my bet, and you're right that it could have been swap as well.
---
Yeah, might actually be a good idea to integrate that into the base posmatch, but as I said, posmatch isn't actually that slow compared to Python's find, so the bottleneck is probably elsewhere.
But yes, obviously in production-level code you should be compiling/caching them yourself, if solely on a "just-in-case" basis.
---
Also, I would like to point out that in all my time using Python, it's been very consistent as far as speed goes (no 3 second pauses), so would that imply that Python's garbage collector is more incremental than Racket's?
One possibility is that Python uses reference counting, which immediately frees non-cyclic garbage as you go, and, iirc, an additional occasional garbage collection cycle to free the remaining cyclic garbage. So I'm just guessing, but if you're not creating much cyclic garbage, maybe that's an explanation for why you're not seeing many garbage collection pauses.