If that's all you want, Anarki already has it. ^_^
I'm interested in finding a documentation solution that scales up to programming-in-the-large, without being the cumbersome kind of thing that people would have trouble using even if they wanted to.
In particular, I like the Racket approach of having both an API reference, where people look up the meanings of particular functions, and a user guide, where people learn the design of the system and develop a better idea of which functions they should actually be looking for.
In-source docstrings (or Javadoc comments and the like) should be an okay way to get good coverage in the API reference, but they're kinda independent of each other and hard to automatically organize, especially in a language without modules or classes. They're not even close to being good pieces for walkthrough-like user guides.
Literate programming, meanwhile, encourages code to be arranged as though it's a user guide. (This may be just one style of literate programming; I imagine people who care about literate programming mainly just care about good documentation.) The Inform 7 docs are an awesome example of this:
However, there are usually at least some parts of the code that are left as an exercise for the reader to understand. :-p This approach also straddles the line between guide and reference without necessarily being the best at either. There's kind of a cross-cutting concern going on, the more roles you try to give to a single codebase, and it kinda drives me in the reverse of literate programming: I almost suspect it would be most ideal just to put nothing but implementation notes in the code comments, and to have code be simple enough to stand on its own (like the current state of Arc).
This ties back in with another, more important part of the Racket discussion: The version of the program you have shouldn't determine the version of the docs you have. In fact, the docs oughta be a living document that immediately collects improvements and discussions, oftentimes faster than the codebase changes. They should be some form of wiki or help forum.
I'll go farther than that: Keeping comments in the code is a brittle system because they'll become inconsistent with the real docs. Instead, people should browse the code on a website that automatically lets them see the most recent versions of the wiki content. There'd essentially be a CMS that manages:
- The code itself.
- Snippet-local API docs.
- Comprehensive, listing-like API references.
- User guides.
- Examples of various sizes (like aw's "pastebin for examples" idea).
- Tutorials.
- Bug tracking entries.
- Freeform discussions on all these things.
This is bigger than I expect to design all by myself, for sure. :-p
I definitely agree with you that there is a difference between an API reference and user guides. Languages do need both.
You bring up an excellent point that documentation should be a living document that can evolve faster than the code. With a community this small I think it's easiest to bundle the two together since anyone can contribute to Anarki.
Maybe Anarki contributors should designate a collaborative location that can serve as the official site for documentation, at least for user guides, tutorials, and FAQs.
For discussions I think we would all agree that arclanguage.com is the best place.
"With a community this small I think it's easiest to bundle the two together since anyone can contribute to anarki."
I like that in concept. We could just have a GitHub Pages branch on Anarki, and GitHub would automatically let us view it as a web page. However, aw mentioned having trouble with GitHub Pages: http://arclanguage.org/item?id=12934
Someone could rig up a website to serve GitHub raw file views as HTML, but I don't know if that's nice to GitHub. :)
Someone could instead have a website that somehow keeps an up-to-date clone of Anarki (perhaps triggering a "git pull" not only as a cron job but also every time a certain page was viewed) and somehow uses that to determine the website content.
One thing to consider is security: If anyone can show their own JS code on this page, they could set tracking cookies or something. If anyone can run Arc code on the server, there's even more to worry about (albeit nothing the racket/sandbox module isn't designed for).
---
"Maybe Anarki contributor should designate a collaborative location that can serve as the official site for documentation for atleast user guides, tutorials and faq's."
However, having a separate place for documentation is only one part of what I'm suggesting. I'm not sure it's worth it unless the separate parts are somehow integrated again--for instance, by showing docs and discussions as you browse code, or by letting user guide writers say {{doc:anarki:lib/util.arc:afnwith}} or somesuch to include a piece of the API reference.
---
"For discussions I think we would all agree that arclanguage.com is the best place."
Speaking of which, are arclanguage.com and arclanguage.org both legitimate? Both of their WHOIS entries list Paul Graham, but I don't know whether that means anything. I've never logged in anywhere but arclanguage.org, just because it's what most people link to.
Anyway, we totally do use the Arc Forum for discussions now, but I think things would be better if we could incorporate ideas like the ones from this thread: http://arclanguage.org/item?id=12920
I imagine that my complaints about GitHub Pages at the time were probably just growing pains on GitHub's part.
However, precisely because we'll want to implement our own features at some point, such as the cross references you mention, I expect we're going to want to do our own processing. That suggests GitHub Pages or the arclanguagewiki on Google Sites might be part of the right long-term solution, but only if there's a way to, e.g., insert a piece of an API reference... which we're generating.
Here's a thought: what if we had a server which went out and gathered documentation source material from various places, such as Anarki? (GitHub has http://help.github.com/post-receive-hooks/ , so the server could get notified of new pushes to Anarki instead of having to poll.)
The server would work on the text of the sources, such as docstrings found in the Anarki source code. That way even if someone pushed something malicious to Anarki then we wouldn't have a security problem (either on the server or in the reader's browser). The server would process the documentation source material and generate static HTML files... which could be hosted on S3 or GitHub Pages. This would have an additional advantage that even if the server were down, the documentation itself would still be up and available.
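To make that concrete, the docstring-gathering step could start out very small. Here's a sketch in Arc; the function name and the assumption that a docstring sits in the fourth position of a 'def form are mine, not anything that exists in Anarki:

```arc
; Hypothetical sketch: read each top-level form from a source file
; and collect (name docstring) pairs from forms that look like
; (def name args "docstring" ...).
(def scrape-docstrings (path)
  (w/infile in path
    (accum collect
      (whilet form (read in nil)
        (when (and (caris form 'def)
                   (> (len form) 3)
                   (isa (form 3) 'string))
          (collect (list (form 1) (form 3))))))))
```

The server could then hand those pairs to whatever template produces the static HTML pages.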
"The server would work on the text of the sources, such as docstrings found in the Anarki source code."
With this approach, people might be pushing to Anarki way more, sometimes using in-browser file edits on GitHub, and the server would have to scrape more and more things each time. Then again, that would be a good problem to have. :-p
---
"That way even if someone pushed something malicious to Anarki then we wouldn't have a security problem (either on the server or in the reader's browser)."
By the same token, it would be harder for just anyone to update the server, right? Eh, that might be a necessity for security anyway.
Potentially, parts of the server could run Arc code in a sandbox, incorporating the Arc code's results into the output with the help of some format that's known to have no untrusted JavaScript, like an s-expression equivalent of BBCode or something.
Well, code that generates page contents.... Suppose I want to put "prev" and "next" links on several pages, or suppose I want an API reference to automatically loop through and include all the docstrings from a file. Lots of this could be up to the server to do, but I'd like for the documentation itself to have some power along these lines. For instance, someone might write a DSL in Arc and want to set up a whole subsite covering the DSL's own API.
Besides that, it would just be nifty to have the Arc documentation improve as people improved the Arc libraries and vice versa.
Suppose I want to put "prev" and "next" links on several pages, or suppose I want an API reference to automatically loop through and include all the docstrings from a file.
I'd just have the server code do that.
For instance, someone might write a DSL in Arc and want to set up a whole subsite covering the DSL's own API.
Sorry, not following you here. How would this be different?
Besides that, it would just be nifty to have the Arc documentation improve as people improved the Arc libraries and vice versa.
Certainly. Naturally the server code can be written in Arc itself.
Say this DSL is a stack language written in Arc, called Starc, and Starc programs are implemented by lists of symbols. I've set up a global table to map from symbols to their meanings, and I have a 'defstarc macro that submits to that table and supports docstrings.
Now I want my language to have documentation support that's seamless with Arc's own documentation. Somehow I need my Starc documentation to be split across multiple pages, with some pages created using the 'defstarc docstrings. I want Starc identifiers to be displayed in a different style than Arc identifiers, but if anything, I want it easier for a Starc programmer to refer to Starc identifiers in the documentation than to Arc identifiers.
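For concreteness, 'defstarc as I picture it might look something like this; the table names and the convention that a word is a function from stack to stack are invented for this example:

```arc
(= starc-words* (table))  ; symbol -> implementation
(= starc-docs* (table))   ; symbol -> docstring

(mac defstarc (name docstring . body)
  `(do (= (starc-docs* ',name) ,docstring)
       (= (starc-words* ',name) (fn (stack) ,@body))))

; A Starc word that duplicates the top of the stack.
(defstarc dup "Duplicates the top stack element."
  (cons (car stack) stack))
```

The documentation system would then need some way to get at starc-docs* when building pages.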
So every time I come up with one of these requirements for the documentation, I should submit a patch to the server or something? Fair enough--the code implementing the documentation oughta be documented somewhere too, and keeping it close to the project also makes it more ad hoc and inconsistent--but I think this would present a bit of an obstacle to working on the documentation. I'd rather there be a compromise, where truly ad hoc and experimental things were doable in independent projects and the most useful documentation systems moved to the server code gradually.
This would be more complicated to design, and it could probably be incorporated into a more authoritarian design after it's underway, so no worries.
- you run a copy of the server code you're working on locally, until you see that your "Starc" documentation is being integrated into the rest of the documentation in the way that you want it to
- you push your changes to the server (say, via GitHub, for example) and they go live
OK, but what if you're a completely random person, you've never posted anything to arclanguage.org, no one knows who you are, and you want write access to the server so that you "can do stuff"? Alright: fork the repo on GitHub, push your changes there, and send a pull request. Then, when you turn out to be someone who isn't trying to install malicious JavaScript, you're given write access to the server repo yourself. (This is a pretty standard approach in open source projects, by the way.)
But... what if write access to the server repo ends up being controlled by an evil cabal of conservatives who reject having any of this "Starc" stuff added? Fire up your own server, publish the documentation pages yourself, and people will start using your documentation pages because they are more complete than the old stuff.
My concern with the sandbox idea is that I imagine it's going to be hard to create a sandbox that is both A) powerful enough to be actually useful, and B) sufficiently constrained so that there's no possible way for someone to manage to generate arbitrary JavaScript.
I'm finding this discussion very helpful, by the way. What I'm spending my time on now is the "pastebin for examples" site. I've been wondering if this project would stay focused on just the examples part (with the ability for other documentation sites to embed examples from the pastebin site) or if it would expand to be a site for complete documentation itself (the "code site for Arc" idea).
For the pastebin site I've thrown away several designs that weren't working and I've found one that so far does look like it's going to work. But, the catch is that by design it allows the site to execute arbitrary code in the target machine that's running the example. This isn't too terrible by itself (you can always run the example in a virtual machine or on an Amazon EC2 instance etc. instead of on your own personal computer if you want), but it does mean that the "pastebin for examples" site is going to need a higher level of security than an Arc documentation site.
Which in turn implies that while the Arc documentation site can use examples from the pastebin site (if people find it useful), the pastebin site itself shouldn't be expanding to take on the role of the Arc documentation site (since the Arc documentation site can and should allow for a much freer range of contributions).
"But... what if write access to the server repo ends up being controlled by an evil cabal of conservatives who reject having any of this "Starc" stuff added?"
The main thing I'm afraid of is the documentation site becoming stagnant. Too often, someone finds the arclanguage.org website and asks "How do I get version 372 of MzScheme?" Too often, someone who's been reading arcfn.com/doc the whole time finally looks at the Arc source and starts a forum thread to say "Look at all these unappreciated functions!" ^_^
I don't blame pg or kens; I blame the fact that they don't have all the time in the world to do everything they want. I'm in the same position, and I bet it's pretty universal.
---
"Fire up your own server, publish the documentation pages yourself, and people will start using your documentation pages because they are more complete than the old stuff."
That could be sufficient. But then while I'm pretty active on this forum, I'm not sure I have the energy to spare on keeping a server up. If the community ends up having only people as "let someone else own it" stingy as me, we'll be in trouble. >.>;
---
"My concern with the sandbox idea is that I imagine it's going to be hard to create a sandbox that is both A) powerful enough to be actually useful, and B) sufficiently constrained so that there's no possible way for someone to manage to generate arbitrary Javascript."
All I'm thinking of is some hooks where Arc code can take as input an object capable of querying the scrape results and give as output a BBCode-esque representation that's fully verified and escaped before use. But then I don't know if that would be sophisticated enough for multi-page layouts or custom styles or whatnot either. ^^;
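To spell out the kind of thing I mean, the verification could amount to a whitelist: all the text is escaped, and any node that isn't a recognized markup form is rejected outright, so there's no way to smuggle in script tags. A toy sketch, with made-up node names:

```arc
; Escape text so it can't break out of the surrounding HTML.
; (& must be replaced first so the escapes themselves survive.)
(def esc (s)
  (multisubst '(("&" "&amp;") ("<" "&lt;") (">" "&gt;")) s))

; Render a whitelisted s-expression markup tree to HTML.
(def render (x)
  (if (isa x 'string)  (esc x)
      (caris x 'b)     (string "<b>" (apply string (map render cdr.x)) "</b>")
      (caris x 'p)     (string "<p>" (apply string (map render cdr.x)) "</p>")
                       (err "Unrecognized markup node:" x)))
```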
There could also be another Arc hook that helped specify what to scrape in the first place... but in a limited way so that it couldn't do denial-of-service attacks and stuff. ^^; lol
Partly it's just a curiosity for me. I like the thought of letting Arc code be run in a sandbox for some purpose, even if it's only marginally useful. :-p
---
Meanwhile, I had another thought: Even if the server doesn't allow running arbitrary code, people could still develop special-purpose things for it by running their own static site generators and putting up the output somewhere where the server will crawl. I wonder how this could affect the server design.
But then while I'm pretty active on this forum, I'm not sure I have the energy to spare on keeping a server up.
I'd be happy to run the server, and set up some kind of simple continuous deployment system so that when someone makes a code push to the server repo the code goes live.
Depending on availability and motivation I may (or may not...) end up having time myself to get Ken's documentation into a form where it can be edited (he generously offered last year to let us do this).
A part that I don't have motivation to do myself is writing the code that would crawl Anarki and generate documentation from the docstrings.
I like the thought of letting Arc code be run in a sandbox for some purpose, even if it's only marginally useful.
I certainly won't prevent someone from adding a sandbox to the server. On the other hand... if you'd like to work on something where a sandbox would be useful ^_^, I'd encourage you to join me in my API project :-)
"The main thing I'm afraid of is the documentation site becoming stagnant. Too often, someone finds the arclanguage.org website and asks "How do I get version 372 of MzScheme?" Too often, someone who's been reading arcfn.com/doc the whole time finally looks at the Arc source and starts a forum thread to say "Look at all these unappreciated functions!" ^_^
I don't blame pg or kens; I blame the fact that they don't have all the time in the world to do everything they want. I'm in the same position, and I bet it's pretty universal."
I think if contributing is open and flexible, people will contribute to keep the site up to date. Complete and simple instructions must exist to help and encourage people to contribute. Some of it is social: people feel they need "permission" to contribute.
The interesting thing I'm seeing among the experimentation and projects people are doing here is the fragmentation. I think experimentation with languages is great and very necessary, but it's difficult that there isn't a main champion for the community to rally behind.
PS: Stupid question: how are you italicizing quoted text? I tried adding <i>some text</i>, but that didn't work. I haven't had enough time to play with the comments to figure that out.
"The server would work on the text of the sources, such as docstrings found in the Anarki source code. That way even if someone pushed something malicious to Anarki then we wouldn't have a security problem (either on the server or in the reader's browser)."
If it ever got to the point where actually eval'ing the code were necessary/desirable, you could do so in a safe namespace in PyArc (hint hint).
While "default" is a good name for that macro, it already exists as 'or=.
Also, aw's 'inline macro, which expands into the result of evaluating its body, is a slightly easier way to simulate the calculate-once semantics from official Arc in many cases:
(def foo ((o a (inline:uniq)))
  a)
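(For reference, my understanding is that 'inline can be written in a couple of lines; this is a sketch from memory, not aw's exact code. The evaluated result is quoted so it isn't evaluated again at run time:

```arc
(mac inline (expr)
  `(quote ,(eval expr)))
```

So (inline:uniq) runs (uniq) once, at macroexpansion time, and every later call to foo sees the same gensym.)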
As for me, I don't have a preference one way or another. I do rely on the current behavior, but I think I'd rely on the Python-style behavior just as much.
Why not have both, using two different optional arg syntaxes? That's not an elegant or orthogonal way to go, but it could be a good way to see which one you use more often.
I can't promise I'd use PyArc, since I'm entrenched in lots of existing Arc code and working on Penknife, but I wouldn't rule it out either. Python is one of the few CGI scripting languages my cheap web hosting supports, so PyArc could come in handy. ^_^
Well, you're already using (= ...), which isn't the Arc syntax.... :-p
As always, I recommend a syntax that takes advantage of the ssyntax space of symbols so as not to conflict with destructuring.
(fn (.o a (uniq) .= b (uniq)) ; two optional arguments
  ...)
I think you mentioned that you picked '= so that it wasn't as prone to conflicts as 'o, but sometimes I want to use things like '= as local variable names too. You can see this at http://arclanguage.org/item?id=14069, where I use '< and '==. It would hardly ever come up in practice, but the same can be said of 'o.
Yeah, but if we're having two different behaviors, then we wouldn't be using = for both, right? :P
As for 'o vs '= I would argue that 'o would come up far more often, if the programmer prefers short variable names. And the problem isn't just with conflicts... o is an ordinary letter, so I think it blends in too well, but = stands out. On top of that there's the nice synergy with (= a b) which is used for assignment. And it also leads to the obvious a=b ssyntax, which wouldn't work with 'o.
In any case, you should be able to use = as an argument name just fine. That's because ssyntax makes a distinction between prefix/infix/suffix, and = is considered infix only. Thus == would count as a prefix use of =, but = has no prefix meaning, so it's counted as an ordinary symbol. And a single = has nothing on either side to be infix between, so it's just an ordinary symbol too.
On top of that, you can escape symbols with ||, so they completely bypass ssyntax:
(def foo (|<==|))
And on top of that, you can selectively disable/change ssyntax. So what's the problem? :P
"Yeah, but if we're having two different behaviors, then we wouldn't be using = for both, right? :P"
I meant to imply that you could reintroduce 'o for the official-Arc semantics and still use '= for the Python semantics. (Of course, you could pick a different name for the "reintroduced" 'o.)
---
"In any case, you should be able to use = as an argument name just fine. That's because ssyntax ... And on top of that , you can selectively disable/change ssyntax. So what's the problem? :P"
Er, what does ssyntax have to do with this? I assume by default that whatever ssyntax system you have will allow "=" as at least a global variable name, or else you'll have to come up with a new name for Arc's '=. :-p
You overlooked the only complaint I have--probably my fault for neglecting to explain it, sorry--which is that I'd like to be able to use '= at the beginning of a three-element destructuring parameter, however rarely such a case actually comes up.
That is to say, since I can destructure like this:
(fn ((a b c))
  ...)
I want it to have the same meaning if I change a variable name:
(fn ((= b c))
  ...)
Consistency's my only goal here. (That's not quite true: I value consistency 'cause it reduces the number of cases documentation and code-manipulating code need to handle, making both of those more convenient to read and write.)
By the way, having an a=b syntax for optional args is a nice idea. It wouldn't be groundbreakingly commonplace or save much indentation, but it would occasionally remove the need to edit in two places just to put in or take out a left paren. ^_^ It'd conflict with 'expand=list, but there's hardly a point to 'expand=list anyway.
But... that's a problem with most optional argument systems. In fact, the only way I can see to get around that would be to use something like this for optionals:
(fn (a b ? c 1 d 2))
I figure = won't be terribly common with destructuring, hopefully. Less common than 'o, anyways. I could try making it so escaping it with |=| would work, but that seems clunky.
Actually... come to think of it, using ? for optionals would work nicely in PyArc, because all arguments are optional. The primary reason I disliked the ? optional syntax was because you needed to do stuff like this:
(fn (? a nil b nil c nil))
Note the nil's after each argument name. But in PyArc, that's less of an issue, because you can just use this:
(fn (a b c))
The only issue with that is, if you want to assign to an optional argument, and have it followed by more arguments, you need to do this:
(fn (? a 1 b nil c nil))
Ew. Another idea would be to require specifying ? once per argument:
(fn (? a 1 b c))
(fn (a b ? c 1 ? d 2))
Kinda clunky, though... Hm... what if we replaced ? with = ...?
(fn (= a 1 b c))
(fn (a b = c 1 = d 2))
Kinda weird. Also that wouldn't allow us to use the a=b ssyntax, either. It seems every optional argument system has some flaw or other. Let's suppose we used .= so as to prevent problems when destructuring... then you come along and say that you want this to work:
(fn (a b |.=| c d))
At some point, something has to give. :P I agree that consistency is good, but I haven't yet seen an optional argument system that works great in every situation. They all have some gotcha, so at some point we just gotta say, "this is good enough".
Is (= a b) good enough? I'm tempted to say yes, for the reasons I gave earlier: stands out better, less likely to conflict, synergy with assignment, and the a=b ssyntax. The only major issue is using = as the first element when destructuring. That's not too bad, I think, considering the alternatives...
Oh, by the way, there is one optional argument system that would work properly all the time... not using optional syntax at all. So you would do this:
(fn (a b c)
  (or= a 1)
  (or= b 2)
  (or= c 3))
Of course, that's more verbose.
---
Also, my idea was to have two kinds of ssyntax: global and argument-only. So the a=b ssyntax would only apply in the arguments list, unless the user decides to make it global. Thus, no conflict with the global `expand=list`. Unless there's a desire to make it global, and fix expand=list...?
Ohh, you're totally right about there being a dilemma here. Here I was advocating the use of ssyntax symbols for things that weren't variable names, and I somehow overlooked how nonsensical that becomes when ssyntax is in the reader. XD Not to mention how nonsensical it is when ssyntax is customizable.
(I totally value both those things, but I came to a compromise myself a while ago and henceforth forgot there was a problem: The compromise, in Penknife, is to have fixed alphabets of punctuation and non-punctuation characters, and to treat punctuated things completely differently when parsing parameter lists. (Actually, advanced parameter list parsing isn't a near-future thing for Penknife; I plan to tackle it at the same time as regexes, which will be well after I've reengineered the core.) I don't know how much an alphabet separation would help you, but it's an option.)
Hmm, what about reading a=b as a tagged value? That way the different kinds of parameter nodes can be fully distinguished by calling [only.type _].
By the way, argument-only ssyntax is fine by me, as long as you can make sense of it. You really have to do ssyntax and macros in the same step in order to pull that off, so if you move to a read-then-expand model like official Arc's, you'll run into some trouble. You can have ssyntax read as tagged types and then expand those, but I'm not sure that's something you'd like.
"Hmm, what about reading a=b as a tagged value? That way the different kinds of parameters nodes can be fully distinguished by calling [only.type _]."
So a=b would expand to (annotate 'optional '(a b))? I like that a=b has a really simple expansion, though: (= a b). And then what if you want a destructuring list that has `annotate` as the first element?! :P
If you're talking about making it an internal detail: I already do. In fact, just last night I changed it so (o a b) evals at run time (like Arc), but (= a b) evals at creation time (like Python).
Actually, in PyArc, (o a b) is seen as a destructuring list, with o as the first element. But you can use the --backwards-compat switch to change its behavior. This switch is intended to let you parse/eval existing Arc scripts in PyArc.
"So a=b would expand to (annotate 'optional '(a b)) ? I like how a=b has a really simple expansion though: (= a b) And then what if you want a destructuring list that has `annotate` as the first element?! :P"
Not quite. I'm suggesting a=b could expand to #(tagged optional (a b)), which is to say it would be the same as the result of evaluating (annotate 'optional '(a b)).
To illustrate, these expressions would be equivalent (if you don't consider memory consumption or mutation):
'(def foo (a b c=d e)
   (+ a b c e))

`(def foo (a b ,(annotate 'optional '(c d)) e)
   (+ a b c e))
? Unless you're saying that you would need to use a=b for optionals, which I don't like. As I said, I like that a=b expands into (= a b). Very nice and simple, like the other ssyntax expansions.
So... your approach would work, in the sense that you can then use = while destructuring without causing problems, but then if you don't want to use the a=b ssyntax, you would need to do this:
(with (c (annotate 'optional '(c 1))
       d (annotate 'optional '(d 2)))
  (def foo (a b c d)))
...which is pretty much the most verbose and complicated optional system I've seen yet. :P You'd be better off using these, I think:
(def foo (a b ? c 1 d 2))
(def foo (a b ? c 1 ? d 2))
Though... if I were willing to get rid of the symmetry between a=b and (= a b), I could use a hybrid approach. Then, I could support both ? and a=b for optionals:
(def foo (a b c=1 ? d 2))
Seems kinda confusing, though, using two different syntaxes and semantics (individual infix/collective prefix) for the same thing.
P.S. I find it amusing that we're coming up with increasingly convoluted and complicated systems just so we can allow = as the first element when destructuring. :P
(with (c (annotate 'optional '(c 1))
       d (annotate 'optional '(d 2)))
  (def foo (a b c d)))
My first impression of that code is that it doesn't do what you intend. Rather than using the tagged values held in the variables 'c and 'd, you're making new variables named 'c and 'd which happen to shadow the first ones.
I think what you're going for would need to look more like this:
(def foo (a b #(tagged optional (c 1)) #(tagged optional (d 2)))
  ...)
Here, #(tagged ...) is vanilla Arc's reader syntax for tagged values. The result of reading #(tagged a b) is pretty much the same as the result of evaluating (annotate 'a 'b); in fact, they satisfy Racket 'equal?. However, this syntax is largely an implementation detail of official Arc, and I expect it would have an altogether different meaning in your system, so this is nothing more than a sketch.
Seeing this sketch, you might still say it's verbose to write #(tagged optional (a b)), and you might continue to say it's verbose if I change the example to use a syntax like {tag = a b}. (I dunno, you might not. ^_^ ) The thing is, we could avoid that syntax in most cases since a=b would work just as well.
---
Oh, I forgot to mention this the first time you brought it up, but the "?" alternatives don't help with this particular issue, since I might want to name a variable "?" too. It's not that I really need to name a variable "?", any more than I really need to name a variable "o" or "=", but it's the same principle.
---
"P.S. I find it amusing that we're coming up with increasingly convolted and complicated systems just so we can allow = as the first element when destructuring. :P"
Well, depending on how much things like this are important to you, they might influence your design of the whole language. A solution can seem convoluted in one language and straightforward in another, and not just 'cause the other provides convoluted libraries. :-p IMO, finding convoluted solutions to inane problems is a great exercise, 'cause sometimes two inane problems end up sharing a solution, and that solution becomes a more obvious thing to try to streamline when building a new foundation.
Yeah, I couldn't really think of a good way to do it without using quasiquotation + unquote. Which pretty much only proves my point further that it's a clunky system that relies heavily on the a=b ssyntax to reduce the clunkiness. :P
Which is okay, if you want users to always use a=b for optional arguments, but I don't think that's so great if you want a=b to desugar to S-expressions (like the other ssyntaxes). I guess it's just the eternal battle between simplicity and correctness.
---
Yeah, but it does help with the issue of destructuring. See what I mean about no system being perfect? Anyways, if I used the hybrid approach, I would expect this to work:
(def foo (a b ?=nil))
Since ssyntax is expanded at read time, this would bypass the ? symbol detection. Thus, as near as I can see, the hybrid system would work in every situation, but at the cost of using two different systems to accomplish the same thing (optional arguments).
That's fine if you don't mind the additional complexity. But I tend to like simplicity, even at the cost of small corner-case issues.
---
I'm not necessarily saying it's a bad thing that we're discussing alternatives (even convoluted ones), just that I found it amusing. I personally consider that system too convoluted, but it might make for an interesting experiment. And as you say, sometimes experimenting with convoluted systems can lead to wonderful new solutions.
"Yeah, I couldn't really think of a good way to do it without using quasiquotation + unquote."
In PyArc, I believe you. A syntax like #(...) doesn't work if # is going to be interpreted as prefix syntax. That said, is {tag = a b} not an option? No worries either way, I'm just happy you know what the heck I'm talking about. ^_^
With all this talk about this one idea, I've been meaning to throw yet another idea at you: How about an alternate way to destructure? If (= a b) makes an optional argument 'a, maybe something like (% = a b) could mean three destructuring local variables '=, 'a, and 'b. A macro that wants to destructure using a user-supplied variable could use the (% ...) version of destructuring, rather than letting '= mess things up or jumping through more complicated hoops to deal with it.
In fact, % and = could someday be part of a user-extensible framework of pattern matchers, although that would take a lot more infrastructure.
---
"Since ssyntax is expanded at read time, this would bypass the ? symbol detection."
Whoa, nifty.
---
"But I tend to like simplicity, even at the cost of small corner-case issues."
And the reason I care about corner-case issues is 'cause I like the simplicity of not having to deal with them. ^_^
Well, I could just refuse to deal with them anyway, but then I sacrifice correctness. So it is a simplicity-and-correctness struggle after all.
---
"I'm not necessarily saying it's a bad thing that we're discussing alternatives (even convoluted ones), just that I found it amusing."
Yeah, sorry for giving a more serious and advisory sort of response, I know just what you mean. ^^ For some reason, when I'm in a forum setting, I tend to take the time to refine my words into diamonds of ultimate seriousness.
It's an option in the sense that it would work, but I dislike it because then the user must either A) use the syntax, or B) use eval with quasiquote + unquote. I really like that almost all [1] syntax expands to plain old S-expressions, and would like to keep it that way as much as I can.
Yeah, we could go the alternate route: rather than trying to fix optional syntax, we could change destructuring, since it seems to be the major pain point. If I recall, akkartik was talking about using ' for destructuring:
(def foo (a b '(= c d)) (list a b = c d))
(foo 1 2 '(3 4 5)) -> (1 2 3 4 5)
I kinda like that, actually. Makes it clear that the (= c d) part isn't going to be evaluated. Of course, that would mean hardcoding (quote ...) to mean "destructuring" when in the argument list, but that's not really any worse than hardcoding (= a b) to mean "optional assignment".
In fact, I like that enough that I'm going to implement that in PyArc. You can use --backwards-compat to use the old behavior for destructuring. In case you're worried, it would work even if you want to use quote as a normal argument name or as an argument name when destructuring:
(def foo (quote '(quote a b)))
This also completely frees us from syntax worries. If we want to use (& a b) for something, we don't need to worry about it conflicting with destructuring. The only downside is that now (quote ...) is reserved for destructuring, so you can't use (quote ...) to mean something else. Until I add in customizable semantics for argument lists...?! :P
And as an added bonus, I can make it so it distinguishes between symbols and lists:
(def foo ('a '(b c)))
So we can have ' followed by a symbol mean something different than ' followed by a list. Not sure what it would mean, but it's a possible future option.
This also fixes one thing I wanted to add: destructuring assignments. In Python, you can use this:
a, b = (1, 2)
But I don't know of any analogs in Arc. But I could change = so it destructures when the first argument is (quote ...), like so:
(= '(a b) '(1 2))
Which does the same thing as the Python version. Basically, this would assign 'a to 1, and 'b to 2. Useful if you call a function and it returns a list, since then you can destructure it into separate variables. Consider the case where a function returns a list, but you only care about the first item:
(def foo () (list 1 2 3))
(= '(a) (foo)) -> a is now 1
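To make that concrete, here's a rough sketch of how an interpreter like PyArc might implement a recursive destructuring assignment in Python. This is illustrative only: `destructure` and the plain-dict `env` are made-up names, not PyArc's actual internals.

```python
# Hypothetical sketch of recursive destructuring assignment, in the
# spirit of (= '(a b) '(1 2)).  Symbols are modeled as strings and
# the environment as a plain dict; neither is PyArc's real design.

def destructure(pattern, value, env):
    """Bind each symbol in `pattern` to the matching part of `value`."""
    if isinstance(pattern, str):            # a bare symbol: plain assignment
        env[pattern] = value
    else:                                   # a nested pattern: recurse pairwise
        for sub, val in zip(pattern, value):
            destructure(sub, val, env)

env = {}
destructure(("a", "b"), (1, 2), env)        # like (= '(a b) '(1 2))
destructure(("a",), (1, 2, 3), env)         # like (= '(a) (foo)): keep only the car
print(env)                                  # -> {'a': 1, 'b': 2}
```

Because `zip` stops at the shorter sequence, extra elements of the value are simply ignored, which matches the "only care about the first item" case above.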
Oh, and this would lead to a possible use for "," as well:
(= a,b (foo))
But that's not super important. Just something to think about.
[1]: The only syntax I can think of that can't (or doesn't) expand to S-expressions is ".". I could even change PyArc so "" gets expanded to (string ...) and #\ gets expanded to (char ...). I find this one-to-one mapping between syntax and S-expressions really neat.
---
Oh, yeah, and if you had ? and not = you could even do this:
That whole post is awesome. The (= '(a b) ...) syntax might be more consistent as (= (list a b) ...), so that you can (zap rev (list a b)), but the quote version could have more advantages than consistency:
(zap idfn '(a b c)) ; define self-evaluating symbols?
(= '(a b c=4) x) ; multiple assignment with optionals?
Yeah, it's an awesome idea. But akkartik deserves the credit for it: I merely added on the thing about destructuring assignments. I found the post where I got the idea from: http://arclanguage.org/item?id=12575
Turns out the idea originated with waterhouse, akkartik tweaked it, you mentioned something similar which reminded me of it, aaand then I added on destructuring assignments. Hurray for collaboration!
Also, why can't we do both? If I understand it correctly, = works based on the name, so we could have (list ...) and (quote ...) do the same thing. I like how (= '(a b)) is short, and it has synergy with destructuring in argument lists, so I'd like to keep that. :P
By the way, I got quoted destructuring in argument lists working hours ago. Now I'm working on getting fn, quote, if, etc. as fexprs.
Yeah, I just didn't think it was important to mention having both. And I remembered the conversation, vaguely, and I took that into account when saying that your post was awesome. ^_^
By the way, your great idea for optional assignment got me thinking: maybe we could define = so that when it sees (quote ...), it basically wraps things in a function, like `let` does. Then we would get things like optionals for free, and if I or others extended the argument list further, those changes would also automagically work with =.
In fact, if we did that... we could have destructuring destructuring assignments:
(= '(a '(b c (d)) e)
   '(1 (2 3 (4)) 5))
a -> 1
b -> 2
c -> 3
d -> 4
e -> 5
Or we could use rest args:
(= '(a b . c)
   '(1 2 3 4 5))
a -> 1
b -> 2
c -> (3 4 5)
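The rest-arg case, at least, has a direct analog that a Python-hosted implementation could lean on internally: ordinary starred assignment, a plain language feature with nothing PyArc-specific about it.

```python
# Python's starred assignment is the direct analog of
# (= '(a b . c) '(1 2 3 4 5)): the rest lands in c as a list.
a, b, *c = [1, 2, 3, 4, 5]
print(a, b, c)  # -> 1 2 [3, 4, 5]
```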
Oh, and the (let (a b c) nil) idiom would work too:
(= '(a b c) nil)
I love that symmetry. I'm not sure why, but I figure the more things we can desugar into functions, the better.
P.S. Destructuring destructuring might be useful for alists:
(= '(_ '(_ a))
   '((a 1) (b 2))) -> a is now 2
...though the above is probably better written like this:
Yeah, I use destructuring destructuring all the time. Arc lets you use optionals in destructuring too, but I've only used that once or twice.
You have an example with _, but be careful, '_ is one of the most common Arc variable names. :-p Replacing '(_ a) with '('() a) should make it safer.
You're talking about doing (= '(...) ...) in terms of functions, but I have trouble seeing how that would work, and I was thinking the reverse. What if every function began by doing a destructuring assignment? You could have no parameter list syntax at all, in a sense, and it would be completely extensible using 'defset. On the other hand, it means 'defset extensions would have to return four elements instead of three, the fourth element being a list of local variables that a function will have if the form is used as a parameter list.
But in PyArc, you can assign to nil. :P Besides, wouldn't that throw an error in pg-Arc anyways, since you can't assign to nil?
Hm... yeah, I wasn't entirely sure how my idea would work either. I like the idea of using defset to extend the argument list, rather than the other way around. Then we'd get extensible argument syntax for free, rather than me having to add in any infrastructure. :P
In pg-Arc, (let (() a) '(1 2) ...) destructures just fine. It provides a local variable "a" initialized to 2, and it ignores the 1.
In PyArc, I assume (= '() a) and (= nil a) would be distinct cases. Besides the difference of a (quote ...) form, you've already said nil and () were different.
If pg-Arc were changed to require (quote ...) around each destructuring form, (() a) would become '('() . '('a . '())). Note that simple variable names like "a" are themselves destructuring forms; they're degenerate ones which have nothing but a rest arg.
Oh, right. Still, at least (= nil 4) and (= 'nil 4) have the (quote ...) difference....
Come to think of it, maybe it doesn't work to consider a rest parameter as a destructuring form, since conflating (fn 'a ...) with (fn a ...) would also conflate (= 'a ...) with (= a ...). (From the future: In fact, there might be trouble integrating setforms and parameter lists at all. See below.)
Here's a full grammar for pg-Arc parameter lists:
=== parameter list ===
; This matches based on matching the car and cdr of a cons. It raises
; an error on other kinds of value, *including nil*.
(<destructuring form> . <parameter list>)
; This binds what it's matched to to a new local variable.
<non-nil symbol>
; This raises an error if it's matched to something other than nil.
()
; When matched to nil, this evaluates <default> and matches that to
; <destructuring atom>, and it matches nil to <parameter list>. When
; matched to a cons, this matches <destructuring atom> to the car and
; <parameter list> to the cdr. When matched to another kind of value,
; this raises an error.
((o <destructuring atom> <default>) . <parameter list>)
; The ((o <> <>) . <>) rule takes precedence over the (<> . <>) rule.
=== destructuring form ===
; This matches based on matching the car and cdr of a cons *or nil*.
; It raises an error on other kinds of value.
(<destructuring form> . <destructuring form>)
((o <destructuring atom> <default>) . <destructuring form>)
<destructuring atom>
; The ((o <> <>) . <>) rule takes precedence over the (<> . <>) rule.
=== destructuring atom ===
<non-nil symbol>
; This ignores what it's matched to altogether.
()
Here's how I currently imagine one of our grammars under discussion, without setforms integration:
=== parameter ===
; This matches based on matching the car and cdr of a cons or nil. It
; raises an error on other kinds of value.
(quote (<parameter> . <parameter>))
; When matched to nil, this evaluates <default> and matches that to
; <parameter a>, and it matches nil to <parameter b>. When matched to
; a cons, this matches <parameter a> to the car and <parameter b> to
; the cdr. When matched to another kind of value, this raises an
; error.
((= <parameter a> <default>) . <parameter b>)
; This binds what it's matched to to a new local variable.
<non-nil symbol>
; This ignores what it's matched to altogether.
()
; This matches as though it's (quote (<parameter a> . <parameter b>)).
(<parameter a> . <parameter b>)
; The (<> . <>) syntax has the lowest precedence.
With setforms integration, it seems like it would be a bit troublesome. The (<> . <>) case would need to cover (= (my-table k) v), but it would also need to cover (fn (my-table k) ...). Hmm, not as seamless as I hoped. ^^;
Remember: the destructuring/optional/rest/etc. functionality would only be when the first argument to `=` is quoted.
Derp, I just realized that `let` desugars to `fn`, which doesn't have destructuring. I guess you could replace `let` with `with` and it would still work, though it would be clunkier. Something like this:
You could also optimize it, so it expands into a normal `fn` when the argument list is a symbol, like (fn args) but does the special stuff when it's a cons, like (fn (a b)):
The above does expand properly in Arc 3.1... except you need to replace (let old fn with (let old 'old-fn because Arc doesn't have first-class special forms like PyArc. :P
Yeah, 'fn is the most basic way to make local variables in pg-Arc, and everything else is based on that. The problem I see with basing 'fn on a simpler, non-destructuring version of 'fn, 'with, or 'let is that somehow you need to take things like (a (b c) d=1 e=2) and extract their variables for use in the simpler form. Otherwise you get cases like this:
(simple-fn gs525
  (simple-let-nil (a (b c) d=1 e=2)
    (= '(a (b c) d=1 e=2) gs525)
    ...))
By the time you go to all the trouble to make this 'simple-let-nil work, you might as well have made 'fn with destructuring built in.
...Well, since you're in an interpreter, you may be able to get away with this:
(simple-let gs1 (argument-list-vars '(a (b c) d=1 e=2))
  (simple-fn gs2
    (interpreted-simple-let-nil gs1
      (= '(a (b c) d=1 e=2) gs2)
      ...)))
There are probably plenty of other possibilities I'm overlooking too. ^_^
Oh, and I suppose my Penknife plan is another option. The core language could supply a simple 'fn, and another namespace could supply a more advanced 'fn once enough functionality has been built up based on the core.
Hm... yeah, that makes sense. Alrighty then, let's try to get `=` to desugar to a function, so we can get destructuring assignments. :P I figure (= '(a b) nil) could desugar to this:
(with (gs204 nil gs205 nil)
  (= a gs204)
  (= b gs205))
And then (= '(a b (c d)) '(1 2 (3 4))) could desugar to this:
(with (gs206 1
       gs207 2
       gs208 '(3 4))
  (= a gs206)
  (= b gs207)
  (= c gs208))
Hm... but that wouldn't work recursively, so it would only be one level of destructuring. In which case you might as well use something simpler:
It sounds like you're ending up with the same thing as setforms. That'll at least work, but it's not what I thought you meant by having '= desugar to a function.
I thought you meant something like this:
(= '(a b (c d)) '(1 2 (3 4)))
-->
(apply (fn (a b (c d))
         (export-locals-to-parent-scope))
       '(1 2 (3 4)))
And come to think of it, that's something which might actually make sense in an interpreter.
The setforms way is what I'd choose, at any rate. ^_^
That would work, yes, but it'd require adding an `export-locals-to-parent-scope` built-in (or similar), as you say. Preferably, I'd like to define it in a way that's compatible with pg-Arc.
It's not a big deal, though. In a worst-case scenario, we can define `=` so it does a single level of destructuring. That should be simple to add, even in pg-Arc, while still providing some usefulness.
P.S. `with` expands to a function, so technically it is desugaring to a function. :P Your idea is quite a bit shorter and simpler, though.
Also... this just reminded me of something: lambdas in JavaScript. If I recall, there was a proposal for "thin" functions in JS, which (among other things) would not have "var" scope:
function foo() {
  (lambda {
    var bar = 10;
  })()
  return bar;
}
foo() -> 10
Basically, any "var" statements would act as if the lambda wasn't even there, thus assigning them to the outer scope. After looking it up, it seems that "lambda" was changed to "#", which is quite a bit shorter (http://brendaneich.com/2011/01/harmony-of-my-dreams/).
A similar "thin-fn" form in Arc might be an interesting experiment: any assignments within the thin-fn would always assign to the outer scope. You could still assign locally, by using `let`, since that desugars to a "heavy" function:
(def foo (a)
  ((thin-fn (a)
     (= a "foo")))
  a)
(foo) -> "foo"
(def foo (a)
  ((thin-fn (a)
     (let a nil
       (= a "foo"))))
  a)
(foo) -> nil
Then we could have `=` desugar to a thin-fn, which would work without needing an `export-locals-to-parent-scope` built-in. The advantage of this approach is that it's more general: other functions might be able to make use of thin-fn's too.
...but, Arc fn's already behave like the lambda proposal for JS, so the problems that JS is trying to solve simply don't exist in Arc. Thus, it's arguable whether Arc should have "even thinner" fn's or not.
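Incidentally, Python itself has a close analog of this thin/heavy distinction: an ordinary assignment in a nested function creates a new local, while `nonlocal` lets the assignment pass through to the enclosing scope, much like a thin-fn would. An illustrative sketch only, nothing from the JS proposal or PyArc:

```python
# Illustrative sketch: `nonlocal` behaves like the proposed thin-fn
# (assignments reach the enclosing scope), while ordinary assignment
# behaves like a "heavy" fn (a fresh local, outer variable untouched).

def with_thin():
    a = None
    def thin():
        nonlocal a        # assignment reaches the outer a
        a = "foo"
    thin()
    return a

def with_heavy():
    a = None
    def heavy():
        a = "foo"         # a brand-new local; the outer a is untouched
    heavy()
    return a

print(with_thin(), with_heavy())  # -> foo None
```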
Making optionals a first-class datatype is certainly an interesting idea, but it sounds like you're giving it special compile-time semantics. Interesting/weird.
o is a keyword. No different from assign or tagged or if. Your suggestion is to match a 'keyphrase': (tagged optional _) or even having to detect then evaluate things like (annotate 'optional 'a) at compile-time. Maybe even read time. Does that ever happen right now?
I'm having trouble articulating myself, but it definitely seems like another whacky rocketnia idea :)
"o is a keyword. No different from assign or tagged or if."
You might already know this (search for "AST" at http://arclanguage.org/item?id=13079), but I'm irked by special forms like 'assign or 'if too. I like reducing the places where some message almost always has one meaning, unless it's some particular message which has a completely different meaning. I'm just fine when language designers make knowing sacrifices, but whenever possible, I want to be able to say "the eagle has left the nest" to talk about actual eagles--or at least to be sure I won't ever want to talk about actual eagles.
I think 'o is a place where neither sacrifice is necessary (neither giving up (o b c) destructuring nor giving up 'o as a variable name).
As for 'tagged, that's a symbol you can't observe from Arc unless you drop to Racket, whereas macros see the symbol 'o all the time.
---
"Your suggestion is to match a 'keyphrase': (tagged optional _) or even having to detect then evaluate things like (annotate 'optional 'a) at compile-time. Maybe even read time. Does that ever happen right now?"
Eh? O_o You're all over the map in misinterpreting me (which isn't something I blame you for).
Some parts of what you're talking about "happen right now" in PyArc, as far as I understand: Under default options, there's no distinction between compile time and read time, and macros are expanded in that one step, just like ssyntax is. This isn't necessarily the final reader design, since it causes problems with things like (afn ((= a b)) ...). Keep in mind that I'm talking about solutions for PyArc in this turbulent time, so suggestions I make may be more generalized, more wacky, and more misinterpretable than the same suggestions applied to any given concrete version of PyArc. ^^;
The solution I'm talking about does apply to official Arc too, except that official Arc doesn't have ssyntax in the reader, so the approach isn't as practical. Official Arc does save ssexpansion until surrounding macros have done their work (post-read-time), so we can use an easier solution like '.o instead.
Here's exactly how the tagged value approach would look in official Arc:
(def foo (a b #(tagged optional (c d)) e)
  (+ a b c e))
Note that #(tagged a b) is read as a tagged value:
arc> (type '#(tagged a b))
a
arc> (mac stx-types args `',(map type args))
#(tagged mac #<procedure: stx-types>)
arc> (stx-types 4 a (b) () "c" #\d #(tagged e f))
(int sym cons sym string char e)
However, I've never seen anyone write a literal tagged value this way, and I think we oughta consider it to be mostly an implementation detail, since it piggybacks on Racket vector literal syntax. The most useful thing we'd do with it is serializing a tagged value to a file and reading it back in. Still, it does "happen right now."
By the way, if the #(...) syntax didn't exist, we could still define functions with optional arguments this way:
(eval `(def foo (a b ,(annotate 'optional '(c d)) e)
         (+ a b c e)))
I'm not saying it's a good idea. Like I said, this idea is targeted at PyArc, where a read-time a=b syntax would easily be the most attractive way to stuff values of type 'optional into the code.
"You might already know this (search for "AST" at http://arclanguage.org/item?id=13079), but I'm irked by special forms like 'assign or 'if too."
Yeah, I don't like those either. In fact, I was thinking about making `fn` a macro in PyArc. Also, if I can, I plan to allow programs to shadow things like `assign` and other special forms:
(def foo (bar assign)
  (= bar "something"))
(foo) -> error, because = uses assign, but assign is nil
I've also tried to keep the number of special forms in PyArc to an absolute minimum. Right now, there's quote, quasiquote, if, fn, assign, and apply. I hope to get rid of apply, and possibly fn.
The only way I can see to get rid of if, assign, quote, and quasiquote would be to create a third "function" type: it behaves like a function, so it's evaluated at run time, but it doesn't evaluate its arguments, so it's basically a function/macro hybrid.
Haha, I just had a crazy idea. Make (fn) desugar into the special hybrid form, but have it evaluate all its arguments. Then there would only be two function types, and (fn) would be a thin wrapper around one of them. I should be able to get rid of all the special forms then, leaving only functions and macros (which are just annotated functions anyways).
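As a toy sketch of that hybrid idea (illustrative Python only; `ev`, `fexpr`, and the `is_fexpr` flag are made-up names, not PyArc's evaluator): once an operator can receive its arguments unevaluated, `if` needs no hardcoded support in eval.

```python
# A minimal "function/macro hybrid" evaluator.  Symbols are strings,
# forms are Python lists.  A hybrid (fexpr) is called at run time but
# gets its arguments unevaluated, so it can decide what to evaluate.

def ev(expr, env):
    if isinstance(expr, str):                   # symbol: look it up
        return env[expr]
    if not isinstance(expr, list):              # literal: self-evaluating
        return expr
    op = ev(expr[0], env)
    if getattr(op, "is_fexpr", False):          # hybrid: pass args as-is
        return op(env, *expr[1:])
    return op(*[ev(a, env) for a in expr[1:]])  # function: evaluate args first

def fexpr(f):
    f.is_fexpr = True
    return f

@fexpr
def if_(env, test, then, els):
    # exactly one branch gets evaluated, just like the special form
    return ev(then, env) if ev(test, env) else ev(els, env)

env = {"if": if_, "x": 5, "+": lambda a, b: a + b}
print(ev(["if", "x", ["+", "x", 1], "unbound!"], env))  # -> 6
```

Note that the "unbound!" branch is never evaluated (which is the whole point: it would be an unbound-symbol error otherwise).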
---
"I think 'o is a place where neither sacrifice is necessary (neither giving up (o b c) destructuring nor giving up 'o as a variable name)."
But you do give up (o a b) destructuring:
(def foo ((o a 1) (o b a)) (list a b))
(foo '(1 2 3) '(4 5 6)) -> ((1 2 3) (4 5 6))
Unless of course you're talking about ab/using ssyntax like .o, in which case you're not really using (o a b) anymore, nor would it work if ssyntax is expanded at read time.
Thus, it has the same gotcha as (= a b), which is that you can't use o as the first element when destructuring. By the way, I mentioned earlier that I could treat escaped symbols differently:
(def foo ((|=| a b)) (list = a b))
(foo '(1 2 3)) -> (1 2 3)
I think this approach would be better than using ssyntax, because || already means "escape this thing", so we don't need to invent new syntax for it. Not sure if I'll go that route, but it is an option.
You have 'apply as a special form? In most Schemes it's a function. :)
You can definitely implement 'quasiquote and 'if as macros.
That leaves 'quote, 'fn, and 'assign. Those are all pretty fundamental. While you could make them macros, there's pretty much nothing to turn them into, unless you go for the tagged type route.
---
"The only way I can see to get rid of if, assign, quote, and quasiquote would be to create a third "function" type: it behaves like a function, so it's evaluated at run time, but it doesn't evaluate its arguments, so it's basically a function/macro hybrid."
Hey, look, fexprs. XD Every sufficiently long Arc Forum topic seems to get around to them at some point. :-p
Since you're writing an interpreter, fexprs are a natural thing to do. They have certain elegance advantages over classical macros, since you can avoid special forms altogether, and programmers can (re)define their fexprs after the places they use them. The main disadvantage is that they're much harder to efficiently compile--not an issue for an interpreter.
---
"Haha, I just had a crazy idea. Make (fn) desugar into the special hybrid form, but have it evaluate all its arguments. Then there would only be two function types, and (fn) would be a thin wrapper around one of them. I should be able to get rid of all the special forms then, leaving only functions and macros (which are just annotated functions anyways)."
Kernel, the most promising design I've seen for an efficient fexpr language (er, with lexical scope; sorry, PicoLisp) does the opposite: Functions are annotated fexprs. All forms are applied in exactly the same way, but functions take the extra step of evaluating their arguments. (FYI, Kernel calls functions "applicatives" and fexprs "operatives.") The one thing about Kernel is that it's kinda vaporware....
Eight's another fexpr language with lexical scope. Like Kernel, it isn't quite complete, but at least it's developed in the open on GitHub. (Hi diiq!) There's a very old topic on Eight here: http://arclanguage.org/item?id=10719
Recently, like a few months ago, diiq came back to Eight and reimplemented it in JavaScript. I've been too busy to bother with it yet, but it oughta be awesome if it's both similar to Arc and runnable in a browser.
---
"Unless of course you're talking about ab/using ssyntax like .o, in which case you're not really using (o a b) anymore, nor would it work if ssyntax is expanded at read time."
That's exactly the kind of thing I'm talking about. :) That's a solution I'm satisfied with for official Arc, but I'm confident there'd be at least some solution for PyArc that I liked better than 'o. It may just be too early to identify it.
---
"By the way, I mentioned earlier that I could treat escaped symbols differently: [...] I think this approach would be better than using ssyntax, because || already means "escape this thing", so we don't need to invent new syntax for it. Not sure if I'll go that route, but it is an option."
Well, I'm not sure what s-expressions you consider the 'def macro to see as input. If you manage to be consistent about that, and if you don't actually need to use the || in practice too much, then it's an idea I could get behind.
It's both a function and a special form. Yes, it's weird. Yes, I hope to change it.
I don't see how `if` could be implemented as a macro. Consider this case:
(def foo (a) (if a 1 2))
The `if` form needs to be evaluated every time the foo function is run, and then return either 1 or 2, depending on what 'a is. Keep in mind this is Python, so I can't have `if` desugar to `cond`. :P
---
Yeah, that's what I was talking about: have `fn` be a fexpr that returns functions that are annotated with type 'fn. And since macros are just functions with type 'fn, that would mean that everything in PyArc boils down to fexprs.
I didn't realize that what I described was fexprs, but in hindsight I probably should have. I actually messed around with NewLisp for a bit; they use fexprs. I prefer Arc's macro system, since I don't think NewLisp has quasiquote/unquote, which made certain things more verbose than they should have been.
However, the ability to have everything desugar to fexprs is appealing, especially since I no longer need special forms hardcoded into eval. PyArc will still have normal functions and macros, of course. They'll just be fexprs underneath. :P
"Eight's another fexpr language with lexical scope. Like Kernel, it isn't quite complete, but at least it's developed in the open on GitHub. (Hi diiq!) There's a very old topic on Eight here: http://arclanguage.org/item?id=10719 "
I've been meaning to ask, how are you doing macros in the read step? I take it they're able to parse their bodies as text, or else things like argument-only ssyntax wouldn't even be an option. This would make them similar either to Penknife syntaxes or Scheme/CL reader macros, depending on whether their input is string-like or stream-like. But doesn't this mean they're (at least sometimes) defined in a completely different way from Arc macros?
I tokenize the input stream (which is probably a string) one step at a time, and when I see a "(" I keep parsing until I find the closing ")". Now it's available as a list, so I check whether the car is a macro, and if so I call it with the remaining arguments and return the result.
However, I ran into a bit of a problem while parsing arc.arc, so I'll likely be refactoring that system. For instance, I may want it to only expand macros one level, rather than doing them all. The problem is, let's say you have this:
Oops! Note how it expanded the (foo) call, whereas Arc didn't. In other words, PyArc is a little bit too greedy about expanding macros.
In any case, the whole tokenize/parse/transform/macro expand stage is handled at the same time, so once that part's done, it can just use eval on the rest.
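In rough Python terms (a toy sketch, assumed rather than PyArc's real reader), that read step might look like the following. Note that it expands every macro call as soon as its list closes, which is exactly the over-greedy behavior described above.

```python
# Toy reader: tokenize, build a list at each "(", and expand a macro
# the moment its list is closed.  Not PyArc's actual code.

def tokenize(s):
    return s.replace("(", " ( ").replace(")", " ) ").split()

def read(tokens, macros):
    tok = tokens.pop(0)
    if tok == "(":
        lst = []
        while tokens[0] != ")":
            lst.append(read(tokens, macros))
        tokens.pop(0)                        # consume the ")"
        if lst and isinstance(lst[0], str) and lst[0] in macros:
            return macros[lst[0]](*lst[1:])  # expand at read time
        return lst
    return int(tok) if tok.lstrip("-").isdigit() else tok

macros = {"sq": lambda x: ["*", x, x]}       # toy macro: (sq x) -> (* x x)
print(read(tokenize("(+ 1 (sq 5))"), macros))  # -> ['+', 1, ['*', 5, 5]]
```

Limiting expansion to one level would mean returning the unexpanded list here and deferring the macro call to a later pass.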
---
As for handling argument-only ssyntax, I've been wondering about that myself. But here's my current idea: when eval sees an (fn) form, it then creates and returns a (Python) function, which it can then execute later. Arguments and closures are handled when the function is created, so I should be able to do ssyntax expansion then.
And since macros are just annotated functions, I figure that approach will work with macros too. Thus, global ssyntax expands at read time, but argument ssyntax expands when the function is created, but before it's actually run.
Ah, was afraid of just that gotcha. It means if you say (afn ((= b c)) ...), you'll end up expanding the = macro before even realizing it's an argument list. o.o; (Right?)
There's another approach I thought you might have taken, and it's free of that trouble. You mentioned looking for (fn) forms, and it's probably the same general idea: When you encounter a left paren, read the next expression after that, and use it to determine what happens next (like a reader macro). If it's a symbol, look it up and see if it's bound to a macro, and let the macro (somehow) take care of reading its body and finding the right paren. Otherwise, read elements until you get a right paren, and treat the expressions as a function call.
Unfortunately, I don't know how to polish up that idea to make syntax like "a.(b c).d" work out correctly, especially in cases like "catch.(b throw).d" where 'catch determines the meaning of 'throw. Actually, I could figure something out for that too, but it would be a bit arbitrary, and you may already know you're not headed this way. :-p
Yes. I'll need to do some thinking on how to fix macro expansion.
---
Actually, I've been thinking about looking into reader macros, and seeing if I could use something similar to implement customizable ssyntax. That would have the benefit of potentially allowing for more powerful syntax.
I've made the equality comparison its own parameter, and it's based on the comparison function by default rather than being 'is. That way you can find 3.0 by searching for 3 (even though 3 and 3.0 aren't 'is), and you can pass in other comparators like 'iso instead.
I've put in the 'len-stack local and used the 'car of the later portion rather than the 'last of the earlier portion. That lets me avoid doing linear traversals any more than I have to. I don't know if this helps with the time complexity, 'cause the complexity's still linear in the length of the haystack: Even if the caller calculates 'len-stack in constant time (say, using 'qlen) and passes it explicitly, the uses of 'nthcdr ultimately do about as many cdrs as there are elements in the list.
The 'car-versus-'last change does affect the outcome of the algorithm a little: Mine finds the latest applicable index in the list, whereas yours finds the earliest.
I also used 'int instead of 'round. This is apparently a little bit faster (and should have no effect on the results or algorithmic complexity):
Furthermore, I've specifically chosen 'nthcdr over 'split so as to reduce consing (not just to reduce the uses of 'len). Now the only places I suspect memory will be allocated are the lambdas produced by 'withs (which oughta be something the Racket compiler can strip out), maybe possibly the arithmetic (if you're dealing with ridiculously gargantuan list lengths), and miscellaneous runtime overhead associated with things like function calls and variable lookup.
In fact, we probably do have some argument list consing to worry about: Since 'bsearch has optional arguments, it's compiled using 'ac-complex-fn, which makes it a varargs function as far as Racket's concerned.
Here's a version that implements the loop using a non-varargs function:
(load "lib/util.arc") ; for 'xloop
; same as before
(def <-to-== ((o < <))
  (fn (a b)
    (~or (< a b) (< b a))))

(def bsearch (stack needle (o < <) (o == <-to-==.<))
  (xloop (stack stack offset 0 len-stack len.stack)
    (case len-stack
      0 nil
      1 (when (== car.stack needle) offset)
      (withs (len-before (int:/ len-stack 2)
              after (nthcdr len-before stack))
        (if (< needle car.after)
          (next stack offset len-before)
          (next after
                (+ offset len-before) (- len-stack len-before)))))))
arc> (time:repeat 10000 (bsearch '(0 1 4 9 16) 9))
time: 341 msec.
nil
arc> (time:repeat 10000 (bsearch '(0 1 4 9 16) 8))
time: 330 msec.
nil
That's pretty negligible, but it doesn't seem to be quite within the range of error: I've tried this test a few times to account for things like the JIT warming up, and the $.+ version is consistently slightly better.
So yeah, this algorithm is linear in the length of the list, but it still lets you cut down on the number of times the comparison function is called, which is oftentimes the most important thing.
For binary search with insertion or removal, which is a likely operation if we're maintaining a sorted list at runtime, this algorithm is actually a particularly good fit. We end up with time that's linear in the length but only logarithmic in the number of comparisons (plus an allocation for insertion), which is about the same as the time complexity on a dynamic array.
To break through these barriers any further, we'd probably need to adapt the surrounding algorithm so it used a fixed-size data structure (cutting down on allocations) and/or a heap (cutting down on the linear-in-length time factor) and/or a more informative model of the haystack's distribution (so we could use things like Newton's method for better-than-logarithmic search). But this isn't something the implementor of 'bsearch has control over.
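Since PyArc's host language is Python, here's a rough Python sketch of the same idea, assuming a simple (value, rest) cons-tuple representation of lists (the helper names are made up for illustration). The pointer-chasing is still linear in the length of the list, but the comparison function only runs a logarithmic number of times, which is the point being made above:

```python
# Binary search over a singly linked list of (value, rest) cons tuples.
# cdr traversal is O(n), but the comparison function runs only O(log n) times.

def from_list(xs):
    """Build a cons-style linked list from a Python list."""
    result = None
    for x in reversed(xs):
        result = (x, result)
    return result

def nthcdr(n, cell):
    """Walk n links down the list."""
    for _ in range(n):
        cell = cell[1]
    return cell

def bsearch(stack, needle, lt=lambda a, b: a < b):
    # One linear pass to find the length, analogous to len.stack above.
    length, cell = 0, stack
    while cell is not None:
        length += 1
        cell = cell[1]
    offset = 0
    while length > 0:
        if length == 1:
            # Equality derived from the ordering, like <-to-== above.
            equal = not (lt(stack[0], needle) or lt(needle, stack[0]))
            return offset if equal else None
        half = length // 2
        after = nthcdr(half, stack)
        if lt(needle, after[0]):
            length = half
        else:
            stack, offset, length = after, offset + half, length - half
    return None
```

Like the 'car-based version above, this finds the latest applicable index when there are duplicates, since equal elements fall into the "go right" branch.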
Great call! Ever since your post at http://arclanguage.org/item?id=12477, I've been itching to start gathering a full list somewhere. It didn't occur to me to put it on a wiki.
"Thus, the name is intentionally ugly. You shouldn't be mucking around with __built-ins * unless you need to, but it's available just in case you do need it. Though... do you think the two underscores are too Pythony?"
"Too Pythony" was my first impression, lol, but it makes sense according to your naming scheme. An underscore means it's something people shouldn't rely on accessing, and two underscores means it's something people, uh, really shouldn't rely on accessing?
---
"There should be a blacklist (or a whitelist?) of "safe" and "unsafe" functions."
I'm not a security expert either, but I don't know if that should be a global list. Suppose I make an online REPL which executes PyArc code people send to it, and suppose my REPL program also depends on someone else's Web server utilities, which I automatically download from their site as the program loads (if I don't already have them). I might distrust the Web server library in general but still trust it enough to let it open ports, while not wanting REPL users to have that power over my ports.
I think it would make more sense to control security by manually constructing limited namespaces and loading code inside of them. There's likely to be a common denominator namespace that's as secure as you'd ever care to make it, but it doesn't have to be the only one.
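To make that concrete, here's a minimal Python sketch of the idea, treating a namespace as a plain dict of capabilities (all the names here are hypothetical, not PyArc's actual API):

```python
# Per-consumer capability namespaces: each piece of code is loaded with
# exactly the bindings we choose to grant it, nothing more.

SAFE_BASE = {
    "len": len,
    "map": map,
    "sorted": sorted,
}

def make_namespace(base, grant=(), revoke=()):
    """Shallow-copy a base namespace, then add or remove capabilities."""
    env = dict(base)
    for name, fn in grant:
        env[name] = fn
    for name in revoke:
        env.pop(name, None)
    return env

def fake_open_port(n):  # stand-in for a real socket capability
    return "listening on %d" % n

# The web-server library gets the safe base plus the port capability...
server_env = make_namespace(SAFE_BASE, grant=[("open-port", fake_open_port)])

# ...while code typed at the online REPL gets only the safe base, so it
# can't touch the ports even though the server library can.
repl_env = make_namespace(SAFE_BASE)
```

So there's a common-denominator safe namespace, but each consumer of it can be handed a different extension of it.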
"As you can see, bar.arc wants to expand the macro `message` in bar's namespace, not in foo's namespace."
Well, it's fine if that's what you expect as the writer of bar.arc, but I'd expect things to actually succeed at being hygienic. My approach to bar.arc would be more like this:
(= foo!something (fn () "goodbye"))
This doesn't need to pollute all uses of foo.arc in the application; bar.arc can have its own separate instance of foo.arc.
There may still be a namespace issue though. If foo.arc defines a macro with an anaphoric variable, like 'aif, and then bar.arc uses foo.arc's version of 'aif, then the anaphoric variable will still be in foo.arc's namespace, right? My own solution would look something like this:
"An underscore means it's something people shouldn't rely on accessing, and two underscores means it's something people, uh, really shouldn't rely on accessing?"
Yeah, I figured two underscores served as more emphasis than one. :P Also, two underscores seemed uglier to me, and also distinguished it from internal ("private") variables.
---
"I think it would make more sense to control security by manually constructing limited namespaces and loading code inside of them. There's likely to be a common denominator namespace that's as secure as you'd ever care to make it, but it doesn't have to be the only one."
When I said "global list" what I meant was just defining which are safe and which aren't. Then having a default safe namespace that would contain the items deemed safe.
Yeah, you can create custom namespaces, for instance you could create a safe namespace that allows access to safe functions and (system), but nothing else:
Thus, web-server.arc has access to the safe functions, and open-socket. Meanwhile, the input that you get from the user is eval'd in a safe environment. It's a very flexible system. The above is verbose, I admit, but that can be fixed with a macro or two.
---
"My approach to bar.arc would be more like this:"
Hm... I admit that would probably be a clean solution most of the time, but what if you want both `something`s at the same time? You end up needing to store a reference to the old one and juggling them back and forth. Maybe that wouldn't be so bad.
Also, there can't be a `caller-namespace` variable (at least not implicitly), because then untrusted code could access trusted code, and so why have a distinction at all? Your example would work, but only if importers explicitly decide to give access:
"Side note: I'm going to start using .' rather than ! because I think the former looks nicer."
I agree. ^_^
---
"Hm... I admit that would probably be a clean solution most of the time, but what if you want both `something`s at the same time? You end up needing to store a reference to the old one and juggling them back and forth. Maybe that wouldn't be so bad."
I don't know what else you could do if you wanted both somethings at once. ^^; That said, I think I'd just explicitly qualify foo!something or import it under a new name.
There's some more potential trouble, though. If a module defines something that's supposed to be unique to it, then two instances of that module will have separate versions of the value, and they may not be compatible. If a module establishes a framework, for instance, then two instances of the module may define two frameworks, each with its own extensions, and some data might make its way over to the wrong framework at some point. On the other side of the issue, if a module extends a framework, then two instances of the module might extend it twice, and one of them might get in the way of the other.
There are several possible ways to deal with this. Code that loads a library (host code?) could load it in an environment that had dummy variable bindings which didn't actually change when they were assigned to, thereby causing the library to use an existing structure even if it created a new one. Framework structures could all be put in a single central namespace, as you say, and any code to make a new one could check to see if it already existed. A library could require some global variables to have already been defined in its load namespace, intentionally giving the host code a lot of leeway in how to specify those variables.
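The first of those ideas, dummy bindings that silently ignore assignment, could look something like this Python sketch (purely illustrative, not PyArc's actual implementation):

```python
# A namespace where writes to certain "frozen" names are silently ignored,
# so a library loaded a second time reuses the host's existing structure
# instead of installing a fresh duplicate.

class FrozenNamespace(dict):
    def __init__(self, frozen_names, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._frozen = set(frozen_names)

    def __setitem__(self, name, value):
        if name in self._frozen and name in self:
            return  # no-op: the host's existing binding wins
        super().__setitem__(name, value)

# The host pre-installs the one true framework table...
env = FrozenNamespace({"setter-table"}, {"setter-table": {"car": "set-car!"}})

# ...and when the library tries to "create" the table again on a second
# load, the existing one survives untouched.
env["setter-table"] = {}
assert env["setter-table"] == {"car": "set-car!"}

# Ordinary names still behave normally.
env["helper"] = 42
```

The catch, as noted, is that this only deduplicates the framework itself; it doesn't stop the library from registering its extensions twice.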
I've been considering all those approaches for Penknife, and I'm not sure what'll be nicest in practice. They all seem at least a little hackish, and none of them seems to really solve the duplicated-extension side of the issue, just the duplicated-framework side. At this point, I can only hope the scenarios come up rarely enough that whatever hackish solutions I settle on are good enough, and at least standardized so that not everyone has to reinvent the wheel. Please, if you have ideas, I'm all ears. ^_^
When you say "unique to it" do you mean "only one value, even if the module is loaded multiple times"? Do you have any examples of where a module would want that?
In any case, all those approaches should work in PyArc, in addition to using __built-ins* (provided you really really want the library's unique something to be unique and available everywhere...)
Hm... come to think of it... an environment/namespace/module can be anything that supports get/set, right? It may be possible to create a custom data-type that would magically handle that. Somehow. With magic.
"When you say "unique to it" do you mean "only one value, even if the module is loaded multiple times"? Do you have any examples of where a module would want that?"
I thought I gave an example. If a module defines something extensible, then having two extensible things is troublesome, 'cause you have to extend both of them or be careful not to assume that values supported by one extensible thing are supported by the other.
---
"Somehow. With magic."
I propose also reserving the name "w/magic" for use in examples. :-p
""something extensible?" Got any more specific/concrete examples?"
I mean something extensible like the 'setter, 'templates, 'hooks, and 'savers* tables, as well as Anarki's 'defined-variables*, 'vtables*, and 'pickles* tables, all defined in arc.arc. These might sound familiar. ^_^
Lathe (my blob of Arc libraries) is host to a few examples of non-core Arc frameworks. There's the Lathe module system itself, and then there's the rule precedence system and the type-inheritance-aware dispatch system on top of that. There's also a small pattern-matching framework.
If you load the Lathe rule precedence system twice (which I think means invasively removing it from the Lathe module system's cache after the first time, but there may be other ways), you'll have two instances of 'order-contribs, the rulebook where rule precedence rules are kept. Then you can sort some rulebooks according to one 'order-contribs and some according to the other, depending on which instances of the definition utilities you use.
---
"Okay, but if I find a magic function I'm going to put it in PyArc so you can use it in real code too. :P"
I think I saw one implemented toward the end of Rainbow.... >.>
Hm... I'm not sure why that's an issue, though. If a module imports your module, they'll get a nice clean copy. Then if a different module imports your module, they get a clean copy too. Everything's kept nice and isolated.
If you want your stuff to be available everywhere, stick it in __built-ins*. Unless you have a better suggestion?
"Hm... I'm not sure why that's an issue, though. If a module imports your module, they'll get a nice clean copy. Then if a different module imports your module, they get a clean copy too. Everything's kept nice and isolated."
That's exactly why I'm not sure it'll come up much in practice. But as an example...
Suppose someone makes a bare-bones library to represent monads, for instance, and someone else makes a monadic parser library, and then someone else finally makes a Haskell-style "do" syntax, which they put in their own library. Now I want to make a monadic parser, but I really want the convenience of the "do" syntax--but I can't use it, because the parser library has extended the monad operations for its own custom monad type and the "do" library only sees its own set of extensions.
You mentioned having the person loading the libraries be in charge of loading their dependencies, and that would yield an obvious solution: I can just make sure I only load the monad library once, giving it to both libraries by way of namespace inheritance or something.
But is that approach sustainable in practice? When writing a small snippet for a script or example, it can't be convenient to enumerate all the script's dependencies and configure them to work together. Over multiple projects, people are going to fall back to in-library (load ...) commands just for DRY's sake. What I'd like to see is a good way to let libraries specify their dependencies while still letting their dependents decide how to resolve them.
---
"Unless you have a better suggestion?"
I've told ya my ideas: Dummy global variable bindings and/or a central namespace and/or configuration by way of global variables. (http://arclanguage.org/item?id=14036) They're all too imperfect for my taste, so I'm looking for better suggestions too.
Hm... like I said, it should be possible to build a more complicated system on top of the simple core, though I'm not sure exactly how it would work.
But... here's an idea: a bootloader module that would load itself into __built-ins* so it could persist across all modules, including modules loaded later.
It could then define (namespace ...) and (require ...) functions or something. Modules could be written using said constructs, and the bootloader would then handle the dependencies, creating namespaces as needed. And it could keep a cache around, so re-importing a module that has already been imported will just grab it from the cache.
The bootloader could then define (use ...) or something, which would do all the automatic dependency and caching junk, but you could still use plain old (load) and (import) to bypass the bootloader and get more refined control. Something like that may work.
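A stripped-down Python sketch of that bootloader idea, with a registry and a cache so shared dependencies load exactly once (the names 'use', 'define_module', and the registry shape are all made up):

```python
# A tiny loader that resolves dependencies and caches loaded modules,
# so re-importing a module grabs the cached namespace instead of
# instantiating it a second time.

_cache = {}
_registry = {}  # module name -> (list of dep names, loader function)

def define_module(name, deps, loader):
    _registry[name] = (deps, loader)

def use(name):
    if name in _cache:
        return _cache[name]
    deps, loader = _registry[name]
    # Resolve dependencies first; a dep shared by two modules loads once.
    dep_envs = {d: use(d) for d in deps}
    env = loader(dep_envs)
    _cache[name] = env
    return env

# Example: 'monad' is a dependency of both 'parser' and 'do-syntax',
# but it loads exactly once, so both see the same instance.
loads = []
define_module("monad", [],
              lambda deps: loads.append("monad") or {"bind": "..."})
define_module("parser", ["monad"],
              lambda deps: {"monad": deps["monad"]})
define_module("do-syntax", ["monad"],
              lambda deps: {"monad": deps["monad"]})
```

That single-instance property is exactly what the monadic-parser-plus-do-syntax scenario above needs.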
Haha, I just had a crazy idea. What if a module imported itself into __built-ins* ? Something like this:
; foo.arc
(if no:__built-ins*.'foo-check
    (do (= __built-ins*.'foo-check t)
        (load "foo.arc" __built-ins*))
    (do
      ; define rest of foo.arc here
      ...))
I suspect any solution will have some wart or other. Tradeoffs and all that. Also, the solution to the specific problem you mentioned is to load them all in a single namespace, right? Or at least namespaces that inherit from some common one.
So perhaps we could define a macro that makes that easier, since the current way of doing it is pretty verbose. Assuming it was almost-as-simple as (import ...) that would help ease the pain somewhat, though it wouldn't help with dependency management (that's a whole different ballpark).
I also thought of a macro that would make it easier to import/export stuff to/from a module. Right now you need to do stuff like this:
Oh, and by the way. In addition to creating a safe namespace and selectively giving it unsafe functions, you can also remove functions from a safe namespace.
For instance, suppose you wanted to run code in a safe environment, but you didn't want it to be able to print (using pr, prn, prt, etc.) You could use this:
(= env (new-namespace))
(= env.'disp nil)
; do something with env
Like I said, it's very flexible. You have complete control over what is/isn't in a namespace. You can execute each module in its own namespace, or combine them however you wish, etc. It has a very simple core, but has many many potential uses.
Why? Well, consider the scenario where somebody imports your module but it has a bug in it.
We're practically the same person. XD That's one of the use cases I always reach for when talking about Penknife or about language hackability in general.
I still don't really like Python's "everything is open!" approach, but I appreciate it a bit more now than I used to. On the other hand, I do think it's a good fit for Arc.
I think your question is really "Wouldn't name-munging accomplish hygiene?" :)
What you're talking about is pretty similar to how Racket accomplishes hygiene by way of 'read-syntax. Everything's wrapped up so you know what its source file is, and that's all I know for now. :-p Seems Racket's system is pretty cumbersome, but then that's probably because it's very static, with modules revealing what they export before any of the exported values are actually calculated.
What you're talking about also sounds extremely similar to Common Lisp's approach to namespaces. I don't know that approach very well, but it could act as some sort of example. ^^
If you're creating a shallow copy and you're just treating a namespace as something that maps names to values (rather than mapping names to bindings containing values), then it won't be as customizable in some ways: When you redefine 'some, you don't get 'all for free, and you don't get to maintain different 'setter tables in different namespaces.
I'm going to great pains reimplementing Penknife so that I create an all new core environment from scratch each time, just so that it can be hackable like that. XD But I'm doing this by passing all the bindings around manually so that core-call's implementation can depend on binding-get's binding and vice versa. The core is ridiculously circuitous, with some 11-argument functions and such. Fortunately, the circuitous part is mostly confined to a single, reasonably sized file, which defines a DSL for the rest of the files to use.
Gotta finish up this reimplementation and post it at some point. XD;;;
Okay, so, my plan right now is that if a function is bound in global_env, it has dynamic scope, and if it's bound anywhere else, it's lexical.
This should allow for my shallow-copying strategy to work, but also allow for shadowing core functions. This may break arc.arc though, but I'll tackle that when I get to it.
I have almost no clue what you're doing, but I hope it works out. Treating core variables differently than others sounds nothing like what I would do, so we're separate people. :-p
Yes, it's ludicrous, crazy, and probably insane, but I'm trying it anyways. I suspect it'll break later, though, so I'll have to go back to lexical-scope-for-everything.
By the way, I called it dynamic scope, but I'm not actually sure what it is. It's this weird short-circuiting thing that causes the global_env to jump back one environment level, which then causes it to loop back onto itself if the variable isn't shadowed, but it works (for now).
Edit: nevermind, I had to revert it. Darn. It was such a silly hack too.
Hm... yes, I may end up needing to change that, or hack it in some way to allow better redefining of built-in functions, while still keeping modules isolated.
If there were a special global 'built-ins*, would it be a built-in? ^_^
Honestly though, I'm not quite sure what you mean by defining built-ins. If you're trying to change what's returned by (new-namespace), guess what: You can give something a modified version of 'new-namespace. :D
Okay, so, let me explain how this works... global_env is a Python dictionary that defines the built-ins that are exposed to Arc code. After parsing and evaling arc.arc (which still doesn't work yet, but it should eventually), it now contains the core of Arc, including special functions defined in PyArc.
The (new-namespace) function creates a shallow copy of global_env, which is what keeps the modules isolated from each other: when you use (import), the module is loaded into a fresh (new-namespace).
What this means is, Arc code cannot overwrite built-ins; it can only shadow them. So if you overwrite the new-namespace function, that change would only affect your module, and nobody else's. See what I mean about modules being too isolated from each other? They're child-proof! :P
What would need to happen in order to support the point you brought up (a coerce* table, etc.) would be a way to actually write to global_env directly, bypassing the shallow copy. But since (new-namespace) works by creating a shallow copy of global_env, any future modules loaded after yours would use the stuff you defined even if they don't import your module, which is why I'm calling it dangerous (but possibly dangerous in a good way).
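Here's a small Python sketch of that shallow-copy scheme and why writing through to global_env is the "dangerous" channel (illustrative only, with made-up table contents):

```python
# Modules get a shallow copy of global_env, so assignments shadow
# built-ins locally; only a deliberate write to global_env itself
# affects modules loaded later.

global_env = {"new-namespace": "builtin-fn", "coerce*": {"int": "..."}}

def new_namespace():
    return dict(global_env)  # shallow copy: isolated top-level bindings

mod_a = new_namespace()
mod_a["new-namespace"] = "hacked"  # shadows; affects only mod_a

mod_b = new_namespace()
assert mod_b["new-namespace"] == "builtin-fn"  # mod_b is unaffected

# Caveat: the copy is shallow, so mutating a shared table (rather than
# rebinding the name) leaks into every namespace that holds it.
mod_a["coerce*"]["string"] = "..."
assert "string" in mod_b["coerce*"]

# Writing through to global_env is the explicit, global channel:
global_env["foo"] = 1
assert "foo" in new_namespace()
```

That shallow-copy caveat is worth keeping in mind for things like a coerce* table: rebinding the name is isolated, but mutating the table in place is already global.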
I'm just trying to decide on the name that is used to write directly to global_env. I think it should be a table, but it could be a function as well, like (new-namespace). Of course, there's still the question of whether we should allow overwriting the built-ins at all, but I think that malleability fits well with Arc.
Technically, the ' ` , ,@ operators are reader macros in Arc, not ssyntax. ^_^ A reader macro is more powerful, in a sense[1]: Once a certain sequence of characters is read in, the whole reader behavior can be replaced, putting you in a different language. On the other hand, those particular operators don't need to be quite that powerful, and I'm all for implementing them and ssyntax in a single consistent way.
In Penknife, instead of saying `(a b ,c), I say qq.[a b \,c] (where "\," is just an escape sequence qq implements; Penknife syntax is founded on nestable strings). As long as infix operators are capable of acting on arbitrary subexpressions, any variable can effectively be a prefix operator, lessening the need for hieroglyphics.
---
[1]: From another perspective, reader macros are rather innately limited to prefix notation; I believe it can be overcome in certain cases (http://arclanguage.org/item?id=13888), but it means manually managing a stack of things which could be pushed under the next infix operator to come along. Can't tell if that's ugly at this point. ^^
Oh, by the way, since we're making ssyntax reader-level, I might be able to get more powerful ssyntax as well. For instance, " works by consuming the stream until it finds a matching ". Ditto for [] and () etc. This behavior hopefully wouldn't be too hard to add, though I'm not sure what the Arc interface to it would be like.
I probably wouldn't be able to move the defaults for "" [] () into Arc, though, because they call Python functions/classes that I don't particularly want to expose to Arc.
Yeah, I know. Unless I made ssyntax way more powerful, I couldn't put stuff like "" or [] in there. I'm okay with that, at least for now. But since it's possible to put ` ' , and ,@ in the ssyntax, I plan to do that. Makes it more hackable in Arc, you know?
Also, since I plan to expand ssyntax at read-time in PyArc, what's the distinction between ssyntax and reader macros, besides the fact that Arc can't define reader macros, and reader macros are more powerful?
There's nothing stopping Arc from having reader macros too, except that at this point there isn't a good standard; it takes Racket calls, and the more I learn about Racket and reader macros, the more I think it has an incomplete standard too. :-p I want to make a reader macro that stops when it reaches a symbol-terminating character--but wait, there are ways to specify symbol-terminating characters, but I see no way to check for them. Time to hack the language core... if only I could. ^^
"what's the distinction between ssyntax and reader macros, besides..."
I think the distinction is how much you're parsing the stream one character at a time (in which case you can dispatch on reader macros) and how much you're parsing it in chunks. Infix syntax always looks like a chunk to me, but as I was saying, infix operators could be implemented as reader macros if we kept/passed enough state in the reader. There could be no distinction at all.
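To illustrate the character-at-a-time dispatch being described, here's a toy Python reader where quote-like prefixes and bracket readers are just entries in one macro table (a sketch, not PyArc's actual reader):

```python
# A reader that consults a macro table on each leading character.
# Each reader function takes (source, index) and returns (expr, new_index),
# so a macro can consume as much of the stream as it likes.

def read(s, i=0):
    while i < len(s) and s[i] == " ":
        i += 1
    ch = s[i]
    if ch in READER_MACROS:
        return READER_MACROS[ch](s, i + 1)
    # Default: read a symbol up to a terminating character.
    j = i
    while j < len(s) and s[j] not in " )('`,":
        j += 1
    return s[i:j], j

def read_quote(s, i):
    expr, j = read(s, i)
    return ["quote", expr], j

def read_list(s, i):
    items = []
    while True:
        while s[i] == " ":
            i += 1
        if s[i] == ")":
            return items, i + 1
        expr, i = read(s, i)
        items.append(expr)

READER_MACROS = {"'": read_quote, "(": read_list}
```

Because the macro functions carry the stream position around explicitly, this is the "one character at a time" style; keeping extra state in the reader (a stack of pending expressions, say) is what it would take to graft infix operators onto the same mechanism.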