"I specified that strings within a Unicode escape must be separated by only one space."
Drat. I thought of it as a nifty way to split a large chunk of escape sequences over multiple lines. It's not like I was ever going to want to do that, though.
That was actually never possible, because of the way the parser chunks the stream into lines and then operates on each individual line. I had considered making it possible... but I think that would have been too confusing. And it would have been far too little gain for too much work...
I've made a prototype for you as a DSL in Arc 3.1. Since I didn't want to call it Blub, I'm calling it Whacky, but feel free to change it. ^_^;
(mac $ (x)
`(cdr `(0 . ,,x)))
(def whacky-call (a b)
(let call-fn
(fn (f arg)
; Get the maximum Racket arity, treating varargs arities as
; though the rest arg is absent.
(caselet arity
(apply max
(map [if $.arity-at-least?._
$.arity-at-least-value._
_]
(let arity $.procedure-arity.f
(if (or $.procedure-arity?.arity
(isa arity 'int))
list.arity
arity))))
0 (err "Can't use a 0-arity fn in whacky.")
( (afn (rev-args n)
(if (is n arity)
(apply f rev.rev-args)
[self (cons _ rev-args) inc.n]))
list.arg 1)))
(if (isa a 'fn) (call-fn a b)
(in type.a 'cons 'table) a.b
(isa b 'fn) (call-fn b a)
(in type.b 'cons 'table) b.a
(err "Neither part of a whacky-call was a fn."))))
(def whacky-compile (expr)
(if (~acons expr)
expr
(let (first . rest) expr
(zap whacky-compile first)
(if no.rest
first
(reduce
(fn (op arg)
(if (and op (isa op 'sym) (~ssyntax op)
(isa (bound&eval op) 'mac))
`(,op ,arg)
`(',whacky-call ,op ,whacky-compile.arg)))
(cons first rest))))))
(mac wh code
whacky-compile.code)
This uses Arc procedures, tables, and lists as procedures, uses Arc macros as macros, and resolves variables in the Arc local environment.
When both parts of a call are procedures, the first procedure is called with the second as its argument. If the first procedure has arity 0 (after ignoring varargs), this results in an error. Note that lists and tables are treated as procedures here.
When calling a procedure, varargs arguments are never passed in. If you must use a varargs Arc procedure, define a non-varargs Arc procedure that proxies it.
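For readers who don't speak Arc, here's a rough Python analogue of the dispatch and arity rules just described. The names (`whacky_call`, `max_arity`, `call_fn`) are my own, and this is only a sketch of the idea, not the actual implementation:

```python
import inspect

def max_arity(f):
    # Count required positional parameters, mirroring the Arc code's
    # "treat varargs arities as though the rest arg is absent".
    params = inspect.signature(f).parameters.values()
    return sum(1 for p in params
               if p.kind in (p.POSITIONAL_ONLY, p.POSITIONAL_OR_KEYWORD))

def call_fn(f, arg):
    n = max_arity(f)
    if n == 0:
        raise TypeError("Can't use a 0-arity fn in whacky.")
    def collect(args):
        # Once enough arguments are collected, call f; otherwise
        # return a one-argument fn that collects the next one.
        if len(args) == n:
            return f(*args)
        return lambda nxt: collect(args + [nxt])
    return collect([arg])

def whacky_call(a, b):
    # If either side is a procedure, call it with the other side.
    # Dicts and lists stand in for Arc tables and conses: they get
    # indexed rather than called.
    if callable(a):
        return call_fn(a, b)
    if isinstance(a, (dict, list)):
        return a[b]
    if callable(b):
        return call_fn(b, a)
    if isinstance(b, (dict, list)):
        return b[a]
    raise TypeError("Neither part of a whacky-call was a fn.")
```

So `whacky_call(1, lambda x, y: (x, y))` returns a partially applied function awaiting one more argument, which is how chained infix calls thread through.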
When calling an Arc macro, only one argument is passed in, and that argument doesn't go through whacky-compile, so its exact parentheses are significant. Notably, this means you can use "do (...)" to get back to regular Arc code.
Ssyntax isn't processed, except to protect against evaluating an ssyntax symbol when checking whether it's a macro name.
As in Arc 3.1, a global macro shadows a local procedure. If you use this code in a variant of Arc where this shadowing bug has been fixed (e.g. Anarki), there's probably no way to make this code behave properly, since the 'wh macro can't look up the local environment to see if some macro is shadowed. Instead, this implementation will compile the expression as though it's a macro call (i.e. without compiling the body), and the local function shadowing the macro will be called instead.
arc> (wh 1)
1
arc> (wh ((1)))
1
arc> (wh 1 cons 2)
(1 . 2)
arc> (wh obj a)
#hash()
arc> (let w- (fn (a b) (- a b)) (wh - 1 w- 2))
-3
arc> (wh map1 list (do '(1 2 3)))
((1) (2) (3))
arc> (wh list map1 (do '(1 2 3)))
Error: "Can't use a 0-arity fn in whacky."
arc> (wh (3 cons (1 cons 2)))
(3 1 . 2)
arc> (wh ((1 cons 2) cons 3))
Error: "list-ref: expects type <non-negative exact integer> as 2nd argument, given: #<procedure:cons>; other arguments were: '(1 . 2)"
arc> (wh (cons (1 cons 2) 3))
((1 . 2) . 3)
arc> (wh len (10 cons nil))
1
arc> (wh (10 cons nil) 0)
10
arc> (wh 0 (10 cons nil))
10
arc> (wh 10 cons nil 0)
10
arc> (wh 1 - sqrt)
0+1i
The "0-arity fn" error is due to 'list, whose varargs parameters are ignored. The 'list-ref error is due to the fact that the cons cell (1 . 2) counts as a Whacky procedure.
I spent a week thinking I should be able to crank it out with a few hours of work, but unable to actually do so. Finally I gave up and just threw the idea out there without any code, and boy am I glad I did!
Part of the problem was that hacking on wart I've forgotten that I have a mature lisp available to me.
"JSON unicode escapes have four hexadecimal characters (per RFC 4627), and you're missing a u."
Ah, thanks! I'll fix that right away. The reason I left off the leading 0s is that some languages (like Racket) let you do that. Always using 4 is less ambiguous and more portable, so I'll change that.
---
"You now say "It is invalid for a non-empty line to be indented if it is not within a list, comment, or string," but the whole file is in a list. :-p"
You and your pedantry! The top-level implied list doesn't count. I would hope that's obvious enough that I don't need to explicitly say it, but... unfortunately I know how some people are.
Hm... come to think of it... I could just give the implied list the same rules as explicit lists. Which means that it's okay for top-level expressions to be indented, just so long as they're all the same indent. That solves the issue while being more flexible and consistent. I'll do that instead, and thanks to the rewrite, it's a trivial change to make.
Looks like your pedantry saved the day (again)!
---
"The lack of informative commit messages made it difficult to catch up with the spec changes. Just saying. ^_^"
Well, I usually end up putting a lot of changes into a single commit, so just reading the diffs should be enough? In any case, as I said, the spec itself didn't really change much; things just got clarified.
Nuit hasn't changed very much from when I first posted it, except that # and " are now multi-line block prefixes, whereas # used to be single-line and " used to be a delimiter like in JSON/YAML.
Oh yeah, it also ignores whitespace now, thanks to your suggestion. It used to throw an error. Oh! And the second part of the @ list can now be any arbitrary sigil rather than just a string. I think that's about it...
I honestly don't expect Nuit to change much from this point onward. I think things are in a pretty stable state. But I'm still not entirely sure about handling whitespace at the start/end of a string, and I've been mulling over the idea of getting rid of the \ sigil...
"The reason I left off the leading 0s is that some languages (like Racket) let you do that."
Hmm, I thought JavaScript was like that too, but it appears ECMAScript 5 doesn't allow it, and Chrome's implementation doesn't like it either.
---
"Well, I usually end up putting a lot of changes into a single commit, so just reading the diffs should be enough?"
There were lots and lots of indentation-only changes. If those were separated into their own commits, with commit messages that indicated that the indentation was all that changed, it would have been easier.
I trust you to know that ultimately, it doesn't matter what's easy for me as long as it's easy for you. :-p
---
"Nuit hasn't changed very much from when I first posted it[...]"
The wordings changed. Even if you had commit messages that stated your intentions like this, I would have looked at the changes carefully in case something became contradictory or ambiguous by accident.
In hindsight, I should have just checked out the old and new versions of the project and done a diff, lol.
Don't trust me to go to this effort all the time, but I guess I was in the mood for it.
---
"Oh! And the second part of the @ list can now be any arbitrary sigil rather than just a string."
That was the most significant change, in my mind. This example of yours should be a good test case for Nuit implementations:
"There were lots and lots of indentation-only changes. If those were separated into their own commits, with commit messages that indicated that the indentation was all that changed, it would have been easier."
Sure, but that woulda been more work for me. :P I honestly wasn't expecting you to pore through the commit log... Since I am used to working alone, I just use git as essentially a safety net: it lets me go back to an old version just in case the new version doesn't work out. So commit messages aren't nearly as important to me as they would be in a team-based environment.
---
"The wordings changed. Even if you had commit messages that stated your intentions like this, I would have looked at the changes carefully in case something became contradictory or ambiguous by accident."
Then the commit messages would have been useless anyways, right? :P
---
"In hindsight, I should have just checked out the old and new versions of the project and done a diff, lol."
Github even lets you do a diff on their website! :D
---
"Don't trust me to go to this effort all the time, but I guess I was in the mood for it."
I honestly wasn't expecting anything like that.
---
"That was the most significant change, in my mind. This example of yours should be a good test case for Nuit implementations:"
Check out "nuit-test.arc", which should have conformance tests for everything in the Nuit spec:
"I've been wondering about that too. It seems like " or ` will work just as well for those cases."
Yeah, I know. I guess it's because the only single-line thing is a non-sigil, and I wanted to be able to slap anything in without worrying about it, so \ was a single-line escape thingy. But I think I can safely get rid of it.
This decision of yours triggered me to post about my own PHP woes just now[1], but just so you know, I totally understand the value of having someone on hand to help you out. I think you're off to a good start, and I hope programming clicks for you. ^_^
What fate befall ye, StackOverflow netizens? How do you think in such backward ways?
---
At one time or another, I've found myself stuck with PHP. It's the most hassle-free server-side platform on my cheap hosting service, I worked in PHP for my college's website, and I've helped family members muck around with WordPress.
The last time I tried to use PHP, I realized its semantics were extremely surprising--at least to someone like me who likes the scoping rules of JavaScript and Scheme. But more recently than that, I realized these semantic surprises weren't utter dealbreakers:
- Assigning to an array element may be sugar for assigning a whole new array to the variable, and the most convenient syntax for function arguments may cause them to be deeply copied, but PHP's classical object system corrects these flaws somewhat.
- PHP's anonymous function syntax may force you to declare the function's closed-over variables, but hey, at least it's an anonymous function syntax, and at least this way it can plausibly avoid capturing the entire lexical scope. (Since PHP supports eval in local scopes, it's hard to determine free variables in the general case. That hasn't stopped JavaScript implementations from optimizing the eval-free case when possible, but still.)
I wouldn't shy away from coding in PHP again if I had to, but I'd much rather compile to it. The right source language could compile all my data structures to objects, and it could infer all my anonymous functions' lexical closures.
---
The StackOverflow link I posted doesn't help much--I just found it amusing--but there are a few helpful options:
- Snowscript (https://github.com/runekaagaard/snowscript) is a CoffeeScript-like syntax sugar layer for PHP. It doesn't address any of my semantic problems with PHP, but it's something. (The OP of the StackOverflow thread linked to Snowscript in a comment.)
- Fructose (https://github.com/charliesome/Fructose) is a language which approximates Ruby and compiles to PHP. The main website is down, the GitHub project hasn't been updated for a year, and I don't see much documentation. But wait! The compiler code does manipulate lexical closures in some way, and the runtime library has object wrappers for arrays and other primitives, so I expect it to be pretty nice to work with.
- Pharen (http://scriptor.github.com/pharen/reference.html) is a lisp which compiles to PHP. It's probably the most promising of these: not only does it have implicit lexical closures and PHP-object-system-related features, but it also has macros, and the language reference explicitly talks about tail call elimination.[1] And hey, the compiler is written in PHP itself, so it should be possible to import the compiler and do eval at run time (whether or not Pharen is set up to do that out of the box).
[1] Pharen probably doesn't support TCE across multiple procedures, since it seems Pharen procedures can be passed directly to existing PHP utilities, which indicates they're implemented as (TCE-unfriendly) PHP procedures.
To be precise, apparently the result of parsing a Nuit value is either of the following:
- A Nuit string, which is a finite sequence of 16-bit values (just like a JavaScript or JSON string).
- A pair consisting of a Nuit string and a finite sequence of Nuit values. This recursion can't create cycles.
When translating Nuit to JSON, Nuit strings become JSON strings, and string-sequence pairs become JSON Arrays where the first element is a string.
Is this right?
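To make the model I'm describing concrete, here's a quick Python sketch of that Nuit-to-JSON translation. The `(string, children)` tuple representation and the `to_json` name are my own assumptions, not anything from the spec:

```python
def to_json(value):
    # A Nuit value is either a string, or a pair of a string and a
    # sequence of child values. Strings stay strings; a pair becomes
    # a JSON array whose first element is the string.
    if isinstance(value, str):
        return value
    first, children = value
    return [first] + [to_json(c) for c in children]

# e.g. ("foo", ["bar", ("baz", [])]) becomes ["foo", "bar", ["baz"]]
```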
---
Raw text appearing in Nuit's surface syntax (which starts as UTF-8, as you specify) becomes encoded as a sequence of UTF-16 code units, right? You just use the word "characters" as if it's obvious, but if you want strings to be sequences of full Unicode code points, your 16-bit escape sequences aren't sufficient.
Does the byte-order mark have any effect on the indentation of the first line?
What if the first line is indented but it occurs after a shebang line? Does the # consume it?
If I understand correctly, every Nuit comment must take up at least one whole line. There's no end-of-line comment. Is this intentional?
---
When I use JSON, I often encode richer kinds of data in the form {"type":"mydatatype",...} or rarely ["mydatatype",...]. Here's a stab at encoding richer data (in this case JSON!) inside Nuit:
[{a:1,b:null},"null"]
-->
@array
  @obj
    a
    @number 1
    b
    @null
  @string null
I don't have an opinion about this yet, but it's something to contemplate.
"A Nuit string, which is a finite sequence of 16-bit values (just like a JavaScript or JSON string)."
A finite sequence of Unicode characters. UTF-8 is recommended, but the encoding can be any Unicode encoding (UTF-32/16/8/7, Punycode, etc.)
---
"A pair consisting of a Nuit string and a finite sequence of Nuit values. This recursion can't create cycles."
No, because it uses the abstract concept of "list", which might map to a vector, array, cons, binary tree, etc. The only requirement is that it can hold 0 or more strings in order. How it's represented in a particular programming language is an implementation detail, not part of the specification.
---
"When translating Nuit to JSON, Nuit strings become JSON strings, and string-sequence pairs become JSON Arrays where the first element is a string."
Yes, except an empty Nuit list would be an empty JSON array. Also, if a meta-encoding scheme were used, it is possible for the serializer to encode Nuit as a JSON object, number, etc. But those are just de facto conventions, not part of the spec.
---
"Raw text appearing in Nuit's surface syntax (which starts as UTF-8, as you specify)"
Actually, the spec doesn't mention any encoding at all. It deals only with Unicode characters, with the encoding being an implementation detail. Parsers/serializers can use any encoding they want, as long as it supports Unicode. Even Punycode could be used.
In the "Size comparison" section I mention that it is assumed that UTF-8 is used in serialization. That was just so that the bytes would be consistent between the different examples.
It's also useful because it mimics a common situation found when transmitting data over HTTP, so it's closer to a "real world" benchmark rather than a synthetic one. That's also why CR+LF line endings were used rather than just LF.
As noted at the very bottom, if LF or CR endings are used, then Nuit becomes even shorter. This means that even in the worst-case scenario of CR+LF, Nuit is still shorter than JSON.
---
"You just use the word "characters" as if it's obvious, but if you want strings to be sequences of full Unicode code points, your 16-bit escape sequences aren't sufficient."
Incorrect. UTF-7/8 and UTF-16 are capable of representing all Unicode code points. UTF-7/8 does so by using a variable number of bytes. UTF-16 does so by using surrogate pairs. Punycode does so by using dark voodoo magic.
All that matters is that a string is a finite sequence of Unicode code points. How those code points are encoded is an implementation detail.
Hmmm... I think the current spec actually forbids certain valid UTF-16 strings, because surrogate pairs are forbidden. So I should change the Unicode part of the spec so it works correctly in all Unicode encodings.
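The surrogate-pair point is easy to demonstrate in Python (my own illustration, using U+1D11E as an example astral-plane character):

```python
# U+1D11E (MUSICAL SYMBOL G CLEF) is above U+FFFF, so UTF-16 encodes
# it as a surrogate pair: two 16-bit code units.
clef = "\U0001D11E"
units = clef.encode("utf-16-be")
assert len(units) == 4                # two 16-bit units = four bytes
high = int.from_bytes(units[:2], "big")
low = int.from_bytes(units[2:], "big")
assert 0xD800 <= high <= 0xDBFF       # high surrogate
assert 0xDC00 <= low <= 0xDFFF        # low surrogate
# UTF-8 reaches the same code point with four bytes and no surrogates:
assert len(clef.encode("utf-8")) == 4
```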
---
"Does the byte-order mark have any effect on the indentation of the first line?"
Nope. It's a part of the encoding and thus is an implementation detail, so it has no effect on indentation.
---
"What if the first line is indented but it occurs after a shebang line? Does the # consume it?"
Yes, the # would consume it. If you don't want that, then the first line must not be indented. The same is true of @, `, and ". This is intentional. In fact, it's actually illegal for the first sigil to be indented. This is to help avoid the kind of mistakes that you're talking about.
---
"If I understand correctly, every Nuit comment must take up at least one whole line. There's no end-of-line comment. Is this intentional?"
That is correct and it is intentional. The design of Nuit only allows sigils at the start of a line. This makes it easy to take almost any arbitrary string and plop it in without having to quote or escape it. Which means that this Nuit code:
@foo bar
  qux#nou
Would be equivalent to this JSON:
["foo", "bar", "qux#nou"]
That's part of the secret to not needing delimiters and escapes. The other part of the secret is using indentation, like with the ` sigil.
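As a sanity check, here's a toy Python parse of that two-line example. This is a drastic simplification of the real spec (only one flat @ list, no nesting, indentation ignored), and the `parse_flat` name is my own:

```python
def parse_flat(lines):
    # Toy subset of Nuit: an @ sigil opens a list whose first element
    # is the word right after the @, and every later sigil-free line
    # is appended verbatim. No quoting or escaping is needed, because
    # sigils are only recognized at the start of a line.
    result = []
    for raw in lines:
        line = raw.lstrip()   # real indentation handling is elided
        if line.startswith("@"):
            result = line[1:].split(" ")
        else:
            result.append(line)
    return result

assert parse_flat(["@foo bar", "  qux#nou"]) == ["foo", "bar", "qux#nou"]
```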
---
"I don't have an opinion about this yet, but it's something to contemplate."
I have already thought about such "meta-encoding schemes." Nuit itself doesn't do anything special with them, but applications can use the information to do something special. It is up to the applications to parse things in the way they want to.
I'm not against a Nuit parser/serializer using those kinds of de facto encoding schemes, but I want to keep Nuit simple, so I don't plan to put them into the spec. But, I might include some standard meta-encodings for JSON and YAML. They would be built on top of the simpler Nuit which supports only lists and strings.
This works because JSON keys must always be strings.
---
"Apparently I have to use \u(20) in order to put a space at the end of a string."
Yes, except that \ is only valid at the start of a line or within a ", so you would have to prefix those lines with ":
@tag a
  @attr href http://www.arclanguage.org/
  "Visit\u(20)
  @tag cite
    Arc Forum
  !
I'm still thinking about the right interaction of whitespace, ", and \ escaping. But I believe making whitespace at the end of the line illegal is an overall net gain. I might change my mind about it later.
This is a bit of a spec wormhole (as akkartik calls it ^_^ ), but go with it if you feel it's right.
If I want to escape a Unicode character in the 10000-10FFFF range, can I use \u(12345) or whatnot?
Are Nuit strings, and/or the text format Nuit data is encoded in, allowed to have unnormalized or invalid Unicode? If invalid Unicode is disallowed, then you'll have trouble encoding JSON in Nuit, since JSON strings are just sequences of 16-bit values, which might not validate as Unicode.
Are you going to do anything special to accommodate the case where someone concatenates two UTF-16 files and ends up with a byte order mark in the middle of the file? (I was just reading the WHATWG HTML spec today, and HTML treats the byte order mark as whitespace, using this as justification. Of course, the real justification is probably that browsers have already implemented it that way.)
---
"The only requirement is that it can hold 0 or more strings in order."
Technically it needs to hold sub-lists too, but I know that's not your point.
Zero? How do you encode a zero-length list in Nuit?
Is there a way to encode a list whose first element is a list?
Oh, come to think of it, is there a way to encode a list whose first element is a string with whitespace inside?
---
"In fact, it's actually illegal for the first sigil to be indented."
Cool. Put it in the doc. ^_^
I assume you mean the first line, rather than the first sigil. The first line could be a sigil-free string, right?
Speaking of which, it seems like there will always be exactly one unindented line in Nuit's textual encoding, that line being at the beginning. Is this true?
---
"I'm not against a Nuit parser/serializer using those kinds of de facto encoding schemes, but I want to keep Nuit simple, so I don't plan to put them into the spec."
I like it that way too.
---
@attr href http://www.arclanguage.org/
Er, I think that creates the following:
[ "attr", "href http://www.arclanguage.org/" ]
---
"But I believe making whitespace at the end of the line illegal is an overall net gain."
I agree. I'm not sure I'd make it illegal, but I'd at least ignore it.
If you make whitespace at the end of blank lines illegal, bah! I like to indent my blank lines. :-p
This is partially because I've used editors which do it for me, but also because I code with whitespace visible, and a completely blank line looks like a hard boundary between completely separate blocks of code.
"This is a bit of a spec wormhole (as akkartik calls it ^_^ )"
I have no clue what you're talking about.
---
"If I want to escape a Unicode character in the 10000-10FFFF range, can I use \u(12345) or whatnot?"
I don't see why not...
---
"Are Nuit strings, and/or the text format Nuit data is encoded in, allowed to have unnormalized or invalid Unicode?"
Invalid Unicode is not allowed.
---
"If invalid Unicode is disallowed, then you'll have trouble encoding JSON in Nuit, since JSON strings are just sequences of 16-bit values, which might not validate as Unicode."
I have no clue where you got that idea from... I'm assuming you mean that JSON is encoded in UTF-16.
UTF-16 is just a particular encoding of Unicode that happens to use two or four 8-bit bytes, that's all. UTF-16 can currently handle all valid Unicode and doesn't allow for invalid Unicode.
But JSON doesn't even use UTF-16. Just like Nuit, JSON uses "sequences of Unicode characters" for its strings. And also like Nuit, JSON doesn't specify the encoding: neither "json.org" nor Wikipedia makes any mention of encoding. And the JSON RFC (https://tools.ietf.org/html/rfc4627) says:
JSON text SHALL be encoded in Unicode. The default encoding is UTF-8.
All characters mentioned in this specification are Unicode code points. Each
such code point is written as one or more bytes depending on the character
encoding used. Note that in UTF-16, characters above #xFFFF are written as
four bytes, using a surrogate pair.
The character encoding is a presentation detail and must not be used to
convey content information.
The YAML spec says:

On input, a YAML processor must support the UTF-8 and UTF-16 character
encodings. For JSON compatibility, the UTF-32 encodings must also be
supported.
And the ECMAScript spec says:

A conforming implementation of this Standard shall interpret characters in
conformance with the Unicode Standard, Version 3.0 or later and ISO/IEC
10646-1 with either UCS-2 or UTF-16 as the adopted encoding form,
implementation level 3.
And I believe Java also uses UTF-16.
But I see no reason to limit Nuit to only certain encodings. And if I did decide to specify a One True Encoding To Rule Them All, I'd specify UTF-8 because it's the overall best Unicode encoding that we have right now.
Instead, if a Nuit parser/serializer is used on a string that it can't decode/encode, it just throws an error. It's very highly recommended to support at least UTF-8, but any Unicode encoding will do.
---
"Are you going to do anything special to accommodate the case where someone concatenates two UTF-16 files and ends up with a byte order mark in the middle of the file? (I was just reading the WHATWG HTML spec today, and HTML treats the byte order mark as whitespace, using this as justification. Of course, the real justification is probably that browsers have already implemented it that way.)"
The current spec for Nuit says to throw an error for byte order marks appearing in the middle of the file.
---
"Zero? How do you encode a zero-length list in Nuit?"
Easy, just use a plain @ with nothing after it:
@
@foo
The above is equivalent to the JSON [] and ["foo"]
---
"Is there a way to encode a list whose first element is a list?"
Yes:
@
  @bar qux
And I was thinking about changing the spec so that everything after the first string is treated as a sigil rather than as a string. Then you could say this:
Ah, I get to learn a few new things about JSON! JSON strings are limited to valid Unicode characters, and "A JSON text is a serialized object or array," not a number, a boolean, or null. All this time I thought these were just common misconceptions! XD
It turns out my own misconceptions about JSON are based on ECMAScript 5.
To start, ECMAScript 5 is very specific about the fact that ECMAScript strings are arbitrary sequences of unsigned 16-bit values.
4.3.16
String value
primitive value that is a finite ordered sequence of zero or more
16-bit unsigned integer values
NOTE A String value is a member of the String type. Each integer
value in the sequence usually represents a single 16-bit unit of
UTF-16 text. However, ECMAScript does not place any restrictions or
requirements on the values except that they must be 16-bit unsigned
integers.
ECMAScript 5's specification of JSON.parse and JSON.stringify explicitly calls out the JSON spec, but then it relaxes the constraint that the top level of the value must be an object or array, and it subtly (maybe too subtly) relaxes the constraint that strings must contain valid Unicode. It says "The JSON interchange format used in this specification is exactly that described by RFC 4627 with two exceptions," and one of those exceptions is that conforming implementations of ECMAScript 5 aren't allowed to implement their own extensions to the JSON format; they must instead use exactly the format defined by ECMAScript 5. As it happens, the formal JSON grammar defined by ECMAScript 5 supports invalid Unicode.
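Python's json module happens to mirror this relaxed grammar, which makes the point easy to demonstrate (a sketch about CPython's behavior, not a claim about every JSON parser):

```python
import json

# RFC-strict JSON text must be valid Unicode, but the \uXXXX escape
# grammar itself lets you write a lone surrogate, and some parsers
# (CPython's json among them) will happily produce one:
s = json.loads('"\\ud800"')
assert len(s) == 1
assert ord(s) == 0xD800  # a lone high surrogate: not a valid scalar value
# Round-tripping it back out also works in such implementations:
assert json.dumps(s) == '"\\ud800"'
```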
---
"This follows naturally if you assume that empty strings aren't included in the list."
I'm not most people, but when the Nuit spec says "Anything between the @ and the first whitespace character is the first element of the list," I don't see a reason to make "@ " a special case that means something different.
"I'm not most people, but when the Nuit spec says "Anything between the @ and the first whitespace character is the first element of the list," I don't see a reason to make "@ " a special case that means something different."
Then I'll change the spec to be more understandable. What wording would you prefer?
I'll try to make it a minimal change: "If there's anything between the @ and the first whitespace character, that intervening string is the first element of the list."
Well, it's true that the Nuit spec intentionally ignores encoding issues, and thus a Nuit parser/serializer might need to understand encoding in addition to the Nuit spec. I don't see a problem with that.
The Arc implementation of Nuit basically just ignores encoding issues because Racket already takes care of all that. So any encoding information in the Nuit spec would have just been a nuisance.
There's already plenty of information out there about different Unicode encodings, so people can just use that if they don't have the luxury of relying on a system like Racket.
---
I see encoding as having to do with the storage and transportation of text, which is certainly important, but it's beyond the scope of Nuit.
Perhaps a Nuit serializer wants to use ASCII because the system it's communicating with doesn't support Unicode. It could then use Punycode encoding.
Or perhaps the Nuit document contains lots of Asian symbols (Japanese, Chinese, etc.) and so the serializer wants to use an encoding that is better (faster or smaller) for those languages.
Or perhaps it's transmitting over HTTP in which case it must use CR+LF line endings and will probably want to use UTF-8.
---
I'll note that Nuit also doesn't specify much about line endings. It says that the parser must convert line endings to U+000A but it doesn't say what to do when serializing.
If serializing to a file on Windows, the serializer probably wants to use CR+LF. If on Linux it would want to use LF. If transmitting over HTTP it must use CR+LF, etc.
Nuit also doesn't specify endianness, or whether a list should map to an array or a vector, or how big a byte should be, or whether the computer system is digital/analog/quantum, or or or...
Nuit shouldn't even be worrying about such things. Nuit shouldn't have to specify every tiny minuscule detail of how to accomplish things.
They are implementation details, which should be handled by the parsers/serializers on a case-by-case basis, in the way that seems best to them.
"Speaking of which, it seems like there will always be exactly one unindented line in Nuit's textual encoding, that line being at the beginning. Is this true?"
Well, not exactly. If all the lines are blank, that's fine too. But assuming at least one non-blank line... there must be at least one line that is unindented. There might be more than one unindented line.
"...then I'd say a single item should be put in a list too [...] And yet now Nuit values at the root can't be strings; they must be lists."
That's correct. There's an implied list wrapping the entire Nuit text. You can think of it like XML's root node, except in Nuit it's implicit rather than explicit.
Calling "readfile" in Arc also returns a list of S-expressions, so this isn't without precedent.
If every definition form actually causes a new environment to be created and placed in some file-wide (or global?) variable behind the scenes, how often is that variable accessed? If I say (foo ($set! bar ...) bar), will I already get the new value for 'bar?
That's the kind of question that's tripped me up along this path. I like your system a lot.
The environment variable is implicit, but it's passed to vaus:
$vau Env ...
In the above code, the local name Env is a variable that points to the dynamic environment[1].
I'll note this is a lot like call-cc, which gives you access to the normally-implicit continuation:
call-cc: $fn [C] ...
---
"If I say (foo ($set! bar ...) bar), will I already get the new value for 'bar?"
Well, let's see... the above will:
1. eval foo in the dynamic env
2. if foo is a $fn then...
3. eval ($set! bar ...) passing it the dynamic env
4. $set! will update the dynamic env with the new binding
5. eval bar in the dynamic env
So, yes, the second usage of "bar" will return the new binding, but only because functions evaluate their arguments left-to-right in sequential order. Theoretically, if they didn't evaluate them left-to-right, then that property might not hold. But for now I'm mandating left-to-right order for simplicity.
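The same left-to-right dependency is observable in Python, where arguments are also evaluated in order against a mutable environment. This is just a loose analogue of the steps above (the names `env`, `set_bar`, and `foo` are mine), not the vau calculus itself:

```python
env = {"bar": 1}

def set_bar(value):
    # Stands in for ($set! bar ...): mutate the shared env,
    # then return the value.
    env["bar"] = value
    return value

def foo(a, b):
    return (a, b)

# foo(set_bar(2), env["bar"]) evaluates its arguments left to right,
# so the second argument already sees the new binding:
assert foo(set_bar(2), env["bar"]) == (2, 2)
```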
---
* [1]: That means a vau can mutate the environment variable directly, which is what $set! and $def! and friends do.
This doesn't offer any more capabilities because you can always just say:
I asked about (foo ($set! bar ...) bar) because I thought maybe each symbol would be looked up in the immutable environment that existed at the beginning of the expression. That is, I thought you might pass an immutable environment, not a ref, to 'eval.
---
"That means a vau can mutate the environment variable directly, which is what $set! and $def! and friends do."
Yeah, that makes sense. The thing passed to 'eval needs to be a ref for the same reason.
Wait. I thought $set! worked by looking something up in the immutable environment, expecting that thing to be a ref, and then mutating that ref. Why does the environment need to be in a ref too?
(Sorry, I'd rather say "ref" than "var", since this discussion potentially involves environments which bind certain symbols (aka variables) to non-refs (aka non-variables?).)
"That is, I thought you might pass an immutable environment, not a ref, to 'eval."
Well, I don't see why you can't directly pass in an immutable data structure to eval. It's just that environments are usually stored in a var, that's all.
Of course, if you DID pass in a non-var data structure, then things like $set! would break because they expect a var. So it'd be read-only.
---
"Wait. I thought $set! worked by looking something up in the immutable environment, expecting that thing to be a ref, and then mutating that ref. Why does the environment need to be in a ref too?"
Allow me to explain... look at this piece of code:
$def! foo 1
$def! bar 2
prn! foo bar
In the above, when we evaluate "$def! foo 1" we want "foo" to be bound to "1" in all expressions evaluated after it, and when we evaluate "$def! bar 2" we want "bar" to be bound to "2" in all expressions after it. That way, when we evaluate "prn! foo bar" it will print out "1 2".
In order to accomplish this, we need some form of mutation. This mutation is handled by an implicit environment variable. Let's rewrite the above to make the environment variable explicit:
Here we have an empty environment which is stored in a var. We then sequentially evaluate expressions in that variable.
Now, when we evaluate "($def! foo 1)", it will pass the Env variable to the $def! vau:
$vau Env [X Y]
var-set! Env: set (Env) X: eval Env Y
That local "Env" name refers to the exact same variable that is defined in the $let. It will then create a new environment which is just like the existing Env but with the name "foo" bound to "1". It will then set the Env variable to point to this new environment.
This means that when the second $def! expression is evaluated, it's called with the same environment variable, but the environment variable was mutated by the first $def! so that "foo" is bound to "1". This is then repeated again, binding "bar" to "2".
This variable mutation is how Nulan achieves a linear style of bindings that can change over time while maintaining immutability. This normally happens under-the-hood, but, just like call/cc lets you grab a hold of the implicit continuation, $vau lets you grab a hold of the implicit environment variable.
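A loose Python model of that sequence (a sketch under assumed semantics, not Nulan's implementation): the environment variable is a mutable box, and each $def! builds an extended immutable copy of the dictionary and repoints the box at it, so every later expression sees the new binding.

```python
class Box:
    """A mutable cell: the implicit environment variable."""
    def __init__(self, value):
        self.value = value

def def_bang(env_box, name, value):
    """$def!: extend the current dict immutably, then repoint the box."""
    env_box.value = {**env_box.value, name: value}

Env = Box({})                 # the empty environment, stored in a var
def_bang(Env, "foo", 1)       # $def! foo 1
def_bang(Env, "bar", 2)       # $def! bar 2
print(Env.value["foo"], Env.value["bar"])   # 1 2
```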
---
"(Sorry, I'd rather say "ref" than "var", since this discussion potentially involves environments which bind certain symbols (aka variables) to non-refs (aka non-variables?).)"
My terminology hasn't been very clear. In Nulan, an "environment" is really just a dictionary that stores symbol/value pairs. As shown above, this dictionary is usually wrapped in a var, which allows for environment "mutation"[1]. I usually call this combination of "dictionary + var" an "environment" but sometimes I might use "environment" to refer only to the dictionary part.
---
As for symbols vs. variables...
A symbol is never a variable. A variable is a separate data type, which is a mutable single-celled box. An environment dictionary has symbols as keys[2], and anything (including variables) as values.
But it doesn't usually make sense to talk of an environment binding a variable directly. Instead, an environment binds a symbol to a variable, and the variable can contain anything.
When discussing Nulan with you earlier, I used the word "reference" to refer to these mutable boxes. I decided to change the name to "variable" since they're the only thing in Nulan that actually varies: everything else is immutable. Meanwhile, I've consistently used the word "name" for symbols that refer to variable/non-variable values.
So, if I say "an environment binds the name foo to the variable 5" that would mean this:
set Env ($quote foo) (var 5)
That is, creating a new environment that has the symbol "foo" as its key, and a variable as its value, which contains the value 5.
---
I'll note that I plan for "eval" to automatically unwrap a variable when looking up a symbol in an environment, which gives the illusion that a variable is bound directly to an environment. As an example... this code here...
$var! foo 5
...is equivalent to this code here[3]:
$set! foo: var 5
That is, we're creating a variable which contains "5", and we're then binding the name "foo" to that variable.
But when eval looks up a symbol in an environment, if it happens to be a variable, it will automatically unwrap it, so that this...
foo
...will return "5" rather than the variable itself. If you want to get at the variable itself, there's a little trick. You can create a vau which will do the lookup manually, bypassing eval:
$def! $get: $vau Env [X]: (Env) X
Now you can say "$get foo" to return the actual variable itself. Most of the time you want the value of the variable, not the variable itself, so having to explicitly use "$get" to retrieve the variable seems like a good compromise to me.
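A rough Python sketch of the two lookup rules just described (assumed semantics; the names Var, lookup, and get are mine): eval's lookup auto-unwraps a variable, while a $get-style accessor reads the environment dictionary directly and returns the variable itself.

```python
class Var:
    """A mutable single-celled box; everything else is immutable."""
    def __init__(self, value):
        self.value = value

def lookup(env, name):
    """What eval does: unwrap a Var so `foo` yields its contents."""
    found = env[name]
    return found.value if isinstance(found, Var) else found

def get(env, name):
    """What the $get vau does: raw dictionary access, no unwrapping."""
    return env[name]

env = {"foo": Var(5)}                    # $var! foo 5
print(lookup(env, "foo"))                # 5 -- the value, auto-unwrapped
print(isinstance(get(env, "foo"), Var))  # True -- the Var itself
```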
---
* [1]: All mutation in Nulan must use vars, which is why any concept of "bindings changing over time" must use vars as well. That's why the environment dictionary is wrapped in a var, to allow for bindings to change over time.
* [2]: Technically speaking, you could bind non-symbols into an environment which would provide you with a special hiding spot that's invisible to eval, since eval only looks things up by symbol. This is very esoteric and unusual, though. Normal code just uses symbol keys in environment dictionaries.
* [3]: One difference: if the symbol is already bound to a variable, then $var! will update the existing variable rather than making a new one.
"Here we have an empty environment which is stored in a var. We then sequentially evaluate expressions in that variable."
"All mutation in Nulan must use vars, which is why any concept of "bindings changing over time" must use vars as well."
That's the technique you're using to define new things, yes. But I don't think 'eval and $vau, so core to the language's computation semantics, should necessarily be coupled to refs just to make definitions work.
For an alternative, you could have a procedure that sets what the next top-level command's environment will be, and you could implement definition in terms of that.
For another alternative, you could have each top-level command be an expression which returns the context to use for executing the next command. This is an option even if you eschew side-effectful procedure calls entirely.
---
"I decided to change the name to "variable" since they're the only thing in Nulan that actually varies: everything else is immutable."
You forget parameters. Those vary without mutation.
Local variables also vary, by nature of being calculated from parameters.
---
"A symbol is never a variable."
In this particular case, I do think there's a conceptual overlap between "symbol" and "variable" in our everyday use, but I don't care either way.
My point is, I find it confusing to redefine "variable."
In your examples, Env is a variable (in the everyday sense). The Env variable refers to a ref, and that ref holds an immutable environment. I don't want to say "the Env variable refers to a variable," and I'd rather not resort to workarounds like "the Env name refers to a variable."
I'll accept your terminology if you insist though.
"That's the technique you're using to define new things, yes. But I don't think 'eval and $vau, so core to the language's computation semantics, should necessarily be coupled to refs just to make definitions work."
Why not? Immutability and variables are just as core to Nulan as anything else. I don't see why you would see variables as being somehow "less core" than vau/eval.
---
"For an alternative, you could have a procedure that sets what the next top-level command's environment will be, and you could implement definition in terms of that."
And why have a primitive that uses internal hidden mutation that only works in the one special case of global names, when I could instead use vars, which can be stored in global names, or local names, or inside data structures? It's a much more general and powerful system than a special-cased "set-the-next-global-env" primitive.
And how would you handle things like multiple namespaces? Or are you planning to always use a single namespace all the time? Vars have a sane semantic when it comes to multiple namespaces, and in fact vars can be used to create new namespaces in constant time with no overhead.
---
"For another alternative, you could have each top-level command be an expression which returns the context to use for executing the next command. This is an option even if you eschew side-effectful procedure calls entirely."
Sure, that's an interesting idea, but that still won't help with times where you really DO want mutation, like with local mutation, or dynamic bindings.
I chose vaus because they accomplish the same awesomeness as macros, except BETTAR. I chose immutability + mutable vars because they're very practical: immutability is basically a necessity if you want to do any sort of concurrency, but you still want some mutability for the sake of practicality.
And even in a single-threaded program, vars are still useful because:
1) You get very flexible multiple-namespace-like semantics for free. This includes being able to selectively mark some names as mutable/immutable and being able to include/exclude names.
Not only that, but thanks to immutability + vars you can create multiple namespaces in constant-time with essentially 0 overhead. This isn't normally needed, but the $include! vau does use it. Without vars it would have to be yet-another-special-cased-primitive, but with vars you can use them for everything.
2) Kernel's restricted mutable environment system makes dynamic bindings (along with other things) a pain in the butt. Nulan's freeflowing immutable environment system makes dynamic bindings trivial and also helps with many of the other problems that Kernel's environments have.
3) It works consistently for all data types, all the way from the core upwards.
I'll admit that I'm currently struggling to define a sane, practical, AND consistent semantic for variables, and I may very well ditch them for a better idea, but I don't think either of your two ideas is good enough to replace them.
I'll note this isn't anything new: I've struggled with many of the ideas in Nulan before finally finding a semantic that I'm happy with, or finding a better idea. I have my doubts about vars, but I'm not convinced yet that they're inherently flawed, just that my current semantic for them is.
I'm using vars quite bluntly because it's the best idea I have found. If you don't like vars, come up with a better idea. You'll need to solve all the problems above, or, even better, provide a system that makes the above problems no longer problems. In the meantime, I'll be doing my own thinking.
---
"You forget parameters. Those vary without mutation."
I didn't forget; I just prefer to call them "names". I see them as being more static because even though they can "change" by calling the vau with different parameters, they can't actually change. That is, you can't use $set-var! on them.
This distinction is obvious to me: functions can be trivially inlined, but vars are harder. Thus I see variables as being "more variable" than ordinary names.
---
"My point is, I find it confusing to redefine "variable.""
Well, I can understand that, but I've renamed a lot of "traditional" things in my language.
I know this is against the mathematical definition, but then again, using "sum" for "foldl" is against tradition too.
---
"and I'd rather not resort to workarounds like "the Env name refers to a variable.""
Why not? It's shorter than the word "variable" and closer to how we use the word "name" in everyday life. I'm not terribly fond of unnecessary words like "reference" and "lambda" and "parameter". If there's a simpler/shorter word, I'll use it.
After using Arc for so long, I figured you would be used to names like "fn" and "=" which are against the normal traditions. I don't care for tradition. I thought that would be obvious given some of the crazy ideas I'm cramming into Nulan.
---
"I'll accept your terminology if you insist though."
One of my goals is to be as internally consistent as I can be (while remaining practical enough for my own tastes). External consistency is just a nice bonus.
I'm making the language for myself and don't really expect other people to actually use it, so call it what you want. What's important is that we can understand each other; the rest is irrelevant.
"What's important is that we can understand each other; the rest is irrelevant."
I'll use a combination of "name" and "ref" instead of "variable" to minimize confusion.
---
"Immutability and variables are just as core to Nulan as anything else. I don't see why you would see variables as being somehow "less core" than vau/eval."
Without '$vau or 'eval, you don't have much of an fexpr language anymore.
Without refs, it's still possible to write a program in pure FP style. If your top-level design depends on refs, a program without refs might have to take the form of a single gigantic expression, but it's still reasonably maintainable that way.
---
"And why have a primitive that uses internal hidden mutation that only works in the one special case of global names, when I could instead use vars, which can be stored in global names, or local names, or inside data structures?"
I'm not saying to get rid of refs entirely! I'm suggesting to untangle them from '$vau and 'eval. You can think of 'set-the-next-global-env as an output stream instead of a ref if it makes it look more like a whole feature.
(In fact, I'm actually not suggesting to untangle refs from 'eval entirely. When 'eval processes a variable reference, finds a ref, and auto-unwraps it, I don't mind that. I'm only concerned about the fact that 'eval takes a ref to the namespace, rather than simply taking the namespace.)
I'll add another alternative to the fray. Instead of 'set-the-next-global-env, you can have a (global-env) syntax that gives you a ref which contains the environment that will be used for the next top-level command. By setting this ref, you influence future commands, but you don't influence the current one.
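To sketch that alternative in Python (hypothetical; (global-env) doesn't exist anywhere, this is just a model): a ref holds the environment for the next top-level command, so setting it influences future commands but never the one currently running.

```python
class Ref:
    """A mutable cell, standing in for the (global-env) ref."""
    def __init__(self, value):
        self.value = value

next_env = Ref({})                 # the env the NEXT command will run in

def run_top_level(command):
    env = next_env.value           # snapshot taken before running
    return command(env)            # setting next_env inside `command`
                                   # only affects later commands

def define_x(env):
    next_env.value = {**env, "x": 1}   # influence future commands...
    return env.get("x")                # ...but the current env is unchanged

print(run_top_level(define_x))              # None: x not visible yet
print(run_top_level(lambda env: env["x"]))  # 1: visible to the next command
```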
---
"And how would you handle things like multiple namespaces? Or are you planning to always use a single namespace all the time? Vars have a sane semantic when it comes to multiple namespaces, and in fact vars can be used to create new namespaces in constant time with no overhead."
You can still coordinate multiple namespaces using any of the alternatives. That's the reason the features 'set-the-next-global-env, (global-env), or returning-a-new-context-to-the-top-level would even exist.
Refs are pretty orthogonal to namespaces.
---
"Sure, that's an interesting idea, but that still won't help with times where you really DO want mutation, like with local mutation, or dynamic bindings."
You can use refs for that. I repeat, I'm not saying to get rid of refs entirely, just to untangle them from '$vau and 'eval.
---
"I chose immutability + mutable vars because they're very practical: immutability is basically a necessity if you want to do any sort of concurrency, but you still want some mutability for the sake of practicality."
If refs are tangled with the core language semantics, then arguably that's not just "some" mutability. :) Of course, it depends on what you compare it to.
---
"1) You get very flexible multiple-namespace-like semantics for free. This includes being able to selectively mark some names as mutable/immutable and being able to include/exclude names.
Not only that, but thanks to immutability + vars you can create multiple namespaces in constant-time with essentially 0 overhead. This isn't normally needed, but the $include! vau does use it. Without vars it would have to be yet-another-special-cased-primitive, but with vars you can use them for everything."
To "selectively mark some names as mutable/immutable," you can use refs for individual names in the namespace, rather than a single ref holding the whole namespace.
To "include/exclude names" or "create multiple namespaces," you can pass a namespace to a pure function that gives you a new namespace. Currently you store that new namespace in a ref to install it, but I'm suggesting alternatives: You can pass it to 'set-the-next-global-env, or you can return the new namespace to the top level. (My (global-env) alternative doesn't count here, because it still stores the namespace in a ref.)
As far as "special-cased" goes, if the namespace is passed around in a ref, I consider that just as arbitrary as these alternatives.
---
"2) Kernel's restricted mutable environment system makes dynamic bindings (along with other things) a pain in the butt. Nulan's freeflowing immutable environment system makes dynamic bindings trivial and also helps with many of the other problems that Kernel's environments have."
I think you're trying to talk about auto-unwrapping here, which I don't mind. Are you actually talking about something else too?
---
"3) It works consistently for all data types, all the way from the core upwards."
I think any of these styles could be implemented in terms of refs. Is that enough?
---
"I have my doubts about vars, but I'm not convinced yet that they're inherently flawed, just that my current semantic for them is."
What issue is it? Something to do with auto-unwrapping? We should talk about this, because it seems workable to me.
---
"After using Arc for so long, I figured you would be used to names like "fn" and "=" which are against the normal traditions."
Those mesh with traditions just fine! They're just not lisp traditions. ^_^
These days I might call an Arc (fn ...) a "procedure" rather than a "function," but that's just because I like the way "procedure" is more overt about the presence of side effects.
"Without refs, it's still possible to write a program in pure FP style."
And without vau it's still possible to use macros, etc. Nulan has many things in it, all of which I consider to be important. I don't see vau as being more important to Nulan than immutable data structures.
If anything, I see immutability as being more important. The only major benefit of vaus over macros in Nulan is that they're hygienic by default, which, frankly, isn't a big deal because of... (wait for it...) immutable environments.
Ironically, this means that now that Nulan has immutable envs, vaus lose a lot of their power and aren't actually that much more attractive than macros. Or, to put it in another perspective, immutable environments make macros more powerful, almost to the point of vaus.
And because practical programs usually need at least a little mutation, I think using variables to provide that mutation is a reasonable thing to do, so I consider variables to be pretty important too (but less important than immutability). I'm still open to better ideas on how to handle mutation, but, I think it'll be pretty hard to beat the raw simplicity of variables.
---
"By setting this ref, you influence future commands, but you don't influence the current one."
Ah, I see the problem now! You're worried about influencing the current environment. I've only used environment mutation at the top level of a global/local environment, never within an expression like with your (foo ($set! bar ...) bar) example, so I don't really care one way or the other.
---
"Refs are pretty orthogonal to namespaces."
Sure, but the fact that they make it so trivial to create new namespaces, in addition to their other awesome features, is certainly a plus.
---
"You can use refs for that. I repeat, I'm not saying to get rid of refs entirely, just to untangle them from '$vau and 'eval."
Oh, well, in that case I don't have much of an opinion one way or the other.
Manual global environment mutation isn't something you're supposed to be doing that frequently anyways, since you're expected to use the built-in "$set!", "$def!", etc.
How environment mutation occurs under the hood isn't a big deal from the programmer's perspective.
---
"If refs are tangled with the core language semantics, then arguably that's not just "some" mutability. :) Of course, it depends on what you compare it to."
Well, in Clojure, you can freely mutate global variables, like functions.
In Nulan, globals are by default immutable, so you have to explicitly use "$var!" to turn on mutability.
So, yeah, even with mutable environments, Nulan is still significantly less mutable than Clojure, which is the language I'm using as a comparison for immutability.
---
"To "selectively mark some names as mutable/immutable," you can use refs for individual names in the namespace [...]"
Right, that's what Nulan does already. It has to do that because even though the whole environment may be in a var, the environment dictionary itself is static.
And once a function has been created, it uses the unwrapped static environment dictionary as its lexical scope. So changing the environment variable only affects future expressions anyways, it can't change previous lexical scopes.
---
"As far as "special-cased" goes, if the namespace is passed around in a ref, I consider that just as arbitrary as these alternatives."
Sure, but right now I get to use a single system (variables) for everything, as opposed to adding in a second primitive (like set-the-next-global-env or whatever). I'd like to keep the core primitives as low as possible, so reusing variables for environment mutation makes sense to me since the whole point of variables is to manage mutation.
---
"I think you're trying to talk about auto-unwrapping here, which I don't mind [...]"
Nope! I'm talking about the fact that Kernel requires you to have an explicit reference to an environment in order to mutate it. So let's suppose I had a "$let-var" vau which is like "$let" except with a dynamic variable:
($let-var ((bar 5))
...)
That will work fine at the top level, but it won't work in this case...
($let ((foo 1))
($let-var ((bar 5))
...))
...because the $let-var can only mutate things in its immediate parent environment. This also means that the common practice (in Arc) of binding a local and then binding a global function doesn't work...
($let ((foo 5))
($define! bar
($lambda ...)))
...because it's assigning the name "bar" into the scope of the "$let", not the global scope like you wanted. So instead you have to do contortions, like capturing the global environment ahead of time and then mutating it explicitly from inside the "$let".
Kernel went out of its way to define two versions of various vaus: one that mutates the current environment, and one that is passed an explicit environment as an argument. I get the feeling that was done in part to make situations like the above more palatable.
Contrast that with Nulan, which would work like so:
$var! bar
$let foo 1
$let-var bar 5
...
$predef! bar
$let foo 5
$def! bar
$fn ...
Variables not only give you mutation, but they also let you control where a name is set. If there's a var in the scope chain, the assignment will go to that var (regardless of where the var is), but if not, it'll make a local binding. This is okay because environments are immutable, so an assignment can only reach a var that was defined before it.
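Here's how I'd model that assignment rule in Python (my sketch, with made-up names; `scopes` is an innermost-first chain of otherwise-immutable dicts): if the name is bound to a var anywhere in the chain, the var is mutated in place; otherwise the innermost scope is extended with a new immutable binding.

```python
class Var:
    """A mutable single-celled box."""
    def __init__(self, value):
        self.value = value

def assign(scopes, name, value):
    """scopes: innermost-first list of environment dicts."""
    for scope in scopes:
        if isinstance(scope.get(name), Var):
            scope[name].value = value      # mutate the existing var
            return scopes
    # no var anywhere: make a fresh local binding in the innermost scope
    return [{**scopes[0], name: value}] + scopes[1:]

global_scope = {"bar": Var(0)}          # $var! bar
scopes = [{"foo": 5}, global_scope]     # inside ($let ((foo 5)) ...)
assign(scopes, "bar", "fn-body")        # $def! bar ... hits the global var
print(global_scope["bar"].value)        # fn-body
scopes = assign(scopes, "baz", 1)       # no var anywhere: binds locally
```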
---
"I think any of these styles could be implemented in terms of refs. Is that enough?"
I'm not sure what you mean. I was talking about how it works for all data types (environments, dictionaries, lists, etc.) which is nice because it's consistent.
---
"What issue is it? Something to do with auto-unwrapping? We should talk about this, because it seems workable to me."
Yeah, that's the thing that's been bugging me the most about vars. But I think the way to handle that is to have two kinds of variables: one that is auto-unwrapped, and another that isn't. Most of the time you'd use the auto-unwrapped one for convenience, but the other one would still be available in case that gets messy.
Whoops. You mean "var" in the Clojure or Common Lisp 'defvar sense, don't you? I've been using "ref" in what I thought was the Clojure sense, but was actually just the Haskell sense. :-p So much for avoiding confusion.
---
"And without vau it's still possible to use macros, etc."
I was about to make a frequency argument, but yeah, I guess macros would still let you do most of what you're doing (as you say).
---
"I don't see vau as being more important to Nulan than immutable data structures."
Did you mean to say "mutable" there instead of "immutable"? I've been talking about mutability being farther from the core.
---
"Ironically, this means that now that Nulan has immutable envs, vaus lose a lot of their power and aren't actually that much more attractive than macros."
Ah, right. Fexprs are nice at the REPL when they're late-bound, but now they're not.
With Fexpress I plan to sacrifice late binding too. I don't have practicality in mind for Fexpress, but I do have at least these goals:
1) Give an existence proof that at least one design for an fexpr language is ahead-of-time compilable.
2) See if this enables programming patterns that make heavy use of eval. (If it turns out there is a cool technique enabled this way, it'll be a practical solution... in search of a problem.)
---
"Ah, I see the problem now! You're worried about influencing the current environment."
Yep. But I haven't really said why yet....
I'm heavily influenced by the desire to partially evaluate fexprs. If a single mutable container is passed through every corner of the code, even one uncompilable expression will make it difficult to know the contents of that container, which will make it difficult to compile any of the expressions which follow that one.
---
"I'd like to keep the core primitives as low as possible, so reusing variables for environment mutation makes sense to me since the whole point of variables is to manage mutation."
Hmm, if Nulan's variables weren't just mutable boxes but had support for something like Clojure's STM, then your goals for that side system might justify using variables in this case, and also in various other cases where just having a random mutable box wouldn't even make sense otherwise. :)
Maybe the interactions with continuations and concurrency could persuade me. Do you intend to have '$let-var be just like Racket's 'parameterize, or would you just use mutate-and-mutate-back?
---
"If there's a var in the scope chain, it'll assign to it (regardless of where the var is), but if not, it'll make a local binding."
I find that unsettling, but I don't have a reason why. Not even an impractical one. :-p
---
"I'm not sure what you mean. I was talking about how it works for all data types (environments, dictionaries, lists, etc.) which is nice because it's consistent."
Given mutable boxes, people are going to create their own stateful interfaces on top of them, so you're going to get stateful interfaces different from boxes anyway. I think 'set-the-next-global-env is consistent with that language experience.
Anyway, the top-level namespace doesn't have to be stored in a data structure. The language itself can hold onto it.
---
"But I think the way to handle that is to have two kinds of variables: one that is auto-unwrapped, and another that isn't."
Oh, I'd take a more syntax-side approach: Define a new expression type that holds a symbol. When that expression is evaluated, it's like evaluating the symbol, but without auto-unwrapping. That way the programmer always has the option to write any part of the code in a more precise style, at the cost of brevity.
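A quick Python sketch of what I mean (hypothetical; RawRef is a made-up name): evaluating the wrapped symbol looks it up exactly like a plain symbol would, but skips the auto-unwrapping step.

```python
class Var:
    """A mutable single-celled box."""
    def __init__(self, value):
        self.value = value

class RawRef:
    """Expression type: evaluate the symbol without unwrapping."""
    def __init__(self, symbol):
        self.symbol = symbol

def evaluate(expr, env):
    if isinstance(expr, RawRef):
        return env[expr.symbol]        # no unwrap: the Var itself
    if isinstance(expr, str):
        found = env[expr]              # plain symbol: auto-unwrap
        return found.value if isinstance(found, Var) else found
    return expr                        # literals self-evaluate

env = {"foo": Var(5)}
print(evaluate("foo", env))                           # 5
print(isinstance(evaluate(RawRef("foo"), env), Var))  # True
```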
"Whoops. You mean "var" in the Clojure or Common Lisp 'defvar sense, don't you?"
Yup, the Clojure sense.
---
"Did you mean to say "mutable" there instead of "immutable"? I've been talking about mutability being farther from the core."
Nope. I see immutability as being more important than vau. But for practicality's sake, you do need some mutability, so I also see mutability as being important (just less so than immutability).
You're right in the sense that it's possible to define the language with $vau, eval, and immutability, but without mutability... but I think the end result would be a massive pain and very impractical for actually getting things done, so I still consider mutability important.
My point was that I don't really see these things in some sort of rigid hierarchy where "mutability is frivolous and vau is central". They're used for different purposes and they're both important, for different reasons.
---
"I'm heavily influenced by the desire to partially evaluate fexprs."
Yeah, I get that. But I'm not going to contort my language to conform to some notion of purity, especially not to enable some technique like partial evaluation which may or may not work, and may not be beneficial even if it does work. Sure, it might be great if you could partially evaluate Nulan, but if you can't, that's fine too.
I already decided that it's just fine if Nulan can't be compiled and must be interpreted. Until I run into some major roadblocks, I'm just going to do the simplest/best thing I can. I'll worry about speed and purity later. I already said my idea about immutable environments is about practical benefits, not speed. The speed is just a nice bonus, that's all.
---
"If a single mutable container is passed through every corner of the code, even one uncompilable expression will make it difficult to know the contents of that container, which will make it difficult to compile any of the expressions which follow that one."
This is true of vars in general. I'm not exactly sure how that situation is made worse for mutable environments, but you're the one with the partial evaluation ideas. I haven't thought about this much.
---
"Maybe the interactions with continuations and concurrency could persuade me. Do you intend to have '$let-var be just like Racket's 'parameterize, or would you just use mutate-and-mutate-back?"
I'm currently using mutate-and-mutate-back because it's dirt-simple. So how $let-var interacts with threads will depend on how vars themselves interact with threads. I haven't really given that much thought to multicore yet: right now I'm focusing on just getting the damn thing working at all, so all my talk about multicore is about the future.
---
"I find that unsettling, but I don't have a reason why. Not even an impractical one. :-p"
Yeah it disturbs me too, but like you I can't find any real reason why. It's not any less dangerous than what Arc/Scheme/JavaScript/etc. do, and in fact it's actually far safer because it's selective and you can use various constructs to control mutability.
---
"Given mutable boxes, people are going to create their own stateful interfaces on top of them, so you're going to get stateful interfaces different from boxes anyway. I think 'set-the-next-global-env is consistent with that language experience."
Yeah, but I still need to think about what the core of the language is and what the idioms are. Saying, "mutable boxes make such and such thing possible, so eventually somebody will add it in, so you might as well add it in now" doesn't convince me because languages can and do influence what people do and also how easy it is to do things. My view can be summed up as: the default experience matters.
---
"Anyway, the top-level namespace doesn't have to be stored in a data structure. The language itself can hold onto it."
And how would the language itself hold onto it? Presumably in a data structure. And $vau already reifies the dynamic environment, so I can't just pretend that the environment is some closed-box that isn't accessible to the programmer.
I don't currently make a distinction between top-level or local-level environments: they're the exact same from both the implementation's perspective and $vau's perspective. I like that consistency, so I'd need to see a good incentive to make them different.
---
"Oh, I'd take a more syntax-side approach: Define a new expression type that holds a symbol. When that expression is evaluated, it's like evaluating the symbol, but without auto-unwrapping. That way the programmer always has the option to write any part of the code in a more precise style, at the cost of brevity."
So you'd be wrapping a symbol rather than a var? Well, okay, but I can't think of any situation where I'd want to auto-unwrap something other than a var, so tying it to var seems okay to me.
Wait... there is one thing... $lazy creates a special data structure which is auto-evaled, and currently there's no way to disable that. So I guess I could use the same auto-unwrap mechanism for both $lazy and vars.
"Nope. I see immutability as being more important than vau."
Then we're talking past each other here. I don't care to compare these two.
---
"This is true of vars in general. I'm not exactly sure how that situation is made worse for mutable environments, but you're the one with the partial evaluation ideas. I haven't thought about this much."
I'm not sure what you're saying here, but I'll explain again. Passing a mutable box to every part of the code means any part could clobber that box, inhibiting our ability to predict the behavior of the remaining code.
The box you're passing around every part of the code is the one that contains the namespace.
---
"And how would the language itself hold onto it? Presumably in a data structure."
That's an implementation detail. It doesn't make a difference to the users of the language.
To implement a language that "holds" something without a data structure, you can implement it in a language that can already do that. ^_-
We already have lots and lots of languages with that feature. Any language with non-first-class variables can qualify (e.g. Haskell, Java, or Arc). Some languages might let you hold values in event queues (e.g. JavaScript or E) or in a language-wide stack (e.g. Forth, Factor, or assembly language).
---
"And $vau already reifies the dynamic environment, so I can't just pretend that the environment is some closed-box that isn't accessible to the programmer."
All it needs is the immutable part. You're putting a box around it, and I consider that to be harmful, unnecessary complexity.
---
"I don't currently make a distinction between top-level or local-level environments: they're the exact same from both the implementation's perspective and $vau's perspective. I like that consistency, so I'd need to see a good incentive to make them different."
Hmm, why are you even wrapping a local environment in a box? Why not use '$let for every locally scoped definition?
"I'm not sure what you're saying here, but I'll explain again. Passing a mutable box to every part of the code means any part could clobber that box, inhibiting our ability to predict the behavior of the remaining code."
Yeah, but only certain functions are capable of clobbering that box, and because environments are immutable... it should be possible to statically determine at compile-time whether such a function occurs or not. Racket can already do this: it statically determines whether a module uses "set!" or not. It should be a piece of cake to do the same in Nulan, at compile-time.
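For what it's worth, that kind of static check can be approximated with a plain tree walk over the code. Here's a toy Python sketch (code modeled as nested lists of strings; this is my own stand-in for a real compiler pass, not how Racket or Nulan actually do it):

```python
def mutates(expr, name):
    """Return True if a (set! name ...) form occurs anywhere in expr.

    expr is s-expression-style code modeled as nested Python lists of
    strings; a toy stand-in for a real compile-time analysis."""
    if not isinstance(expr, list):
        return False
    if len(expr) >= 2 and expr[0] == "set!" and expr[1] == name:
        return True
    return any(mutates(sub, name) for sub in expr)

# A module that never mutates "x" can be compiled assuming x is constant.
code = ["begin", ["define", "x", "1"], ["set!", "x", ["+", "x", "1"]]]
```

If the walk finds no mutation of the environment variable, the compiler is free to treat the environment as a constant for the rest of the module.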
As I already said, when a vau is created, the lexical scope is the current value of the environment variable, not the environment variable itself. So you know exactly what bindings are present within the vau's body when the vau is called, even with environment mutation. That should give you plenty to reason about, right?
And because the dynamic environment is really just the lexical environment of the call site, that should be immutable as well... which means fexpr inlining should be possible even though the environment is stored in a variable. Unless I'm missing something. I haven't really thought this through, so I wouldn't be surprised.
---
"All it needs is the immutable part. You're putting a box around it, and I consider that to be harmful, unnecessary complexity."
Yet clearly there needs to be some kind of mutability... and I don't see your environment-mutability schemes as being any less complex than variables. Since I've already decided to include variables as part of the language, I might as well use 'em. Now, if my language didn't have variables to begin with, then I could agree with you that they'd be more complicated than your ideas.
---
"Hmm, why are you even wrapping a local environment in a box? Why not use '$let for every locally scoped definition?"
That's to allow $def! and $set! to create local bindings. Though you're right that if I gave up on internal-definition forms I could probably remove var from local environments.
I'm not convinced that's a net win, but it would solve the problem of $def! creating a local binding sometimes and mutating a var in other cases... so that's pretty nice.
---
But I get the distinct impression you're only being so anti-var because they interfere with your whole partial evaluator scheme... maybe it would be better to figure out a way to deal with vars in your partial evaluator? Unless you're saying that's impossible?
In particular, languages like Scheme/Racket/JavaScript seem to do just fine with mutable variables. Yeah they could be even faster with immutable everything, but I wouldn't exactly call them slow languages...
I know Racket/JS use JIT rather than AoT. I'm okay with that. I'm even okay with Nulan staying interpreter-only, until it becomes too slow to be usable. Python/Ruby seem fast enough to me and they're interpreted (albeit written in C... or is it C++?)
I haven't even gotten to the point of porting Nulan from pure-Python to PyPy, so I have no clue how fast PyPy will make it run. I still have high hopes that PyPy will give it Good Enough(tm) speed for the kinds of things I want to do. That's enough for me.
"So I guess I could use the same auto-unwrap mechanism for both $lazy and vars."
Okay, I've got a semantics I'm reasonably satisfied with... there will be three built-in primitives: `implicit`, `$explicit`, and `implicit?`.
* implicit accepts a function as its argument and returns a wrapped structure that when evaluated will call the function argument.
* $explicit evaluates its argument and if it's implicit, it returns the underlying function.
* implicit? just tells you whether the data structure is implicit or not.
This can be used to implement auto-unwrapping behavior for vars and also laziness.
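To make sure I follow, here's how I'd model those three primitives in Python. The names and details are my guesses at your semantics, not the real thing: `implicit` wraps a thunk, the evaluator auto-calls it, and `$explicit` peels the wrapper back off.

```python
class Implicit:
    """Wrapper the evaluator auto-calls; a toy model of `implicit`."""
    def __init__(self, thunk):
        self.thunk = thunk

def implicit(thunk):
    # implicit: wrap a function so evaluation will call it.
    return Implicit(thunk)

def is_implicit(value):
    # implicit?: report whether a value is an implicit wrapper.
    return isinstance(value, Implicit)

def explicit(value):
    # $explicit: if the value is implicit, return the underlying function.
    return value.thunk if is_implicit(value) else value

def evaluate(value):
    # The evaluator unwraps implicits by calling their thunk.
    return value.thunk() if is_implicit(value) else value

box = implicit(lambda: 42)
```

So `evaluate(box)` yields 42, while `explicit(box)` hands back the raw thunk without calling it.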
---
By the way, if "eval" were mutable, I could actually implement the whole implicit/explicit thing in Nulan itself. The downside is that every call to eval would have extra overhead because it would have to unwrap the variable. I'm also not convinced it's a good idea to let the language change evaluation in arbitrary ways, but it's an idea I'll have to think about.
Your implicit wrappers sound like they would be too slippery. First I'd try (all implicit? lst)--sorry, I'm using Arc syntax here--and I'd find out all the implicit wrappers were unwrapped before they got to 'implicit?. Then I'd try something like (all [implicit? ($explicit _)] lst), but no dice; the implicit wrappers are unwrapped before they get to [implicit? ($explicit _)]. Either I'd have to pass an fexpr (which 'all shouldn't even support, for API simplicity's sake), or I'd have to hand-roll a new version of 'all. (EDIT: Or maybe 'all would just have to use '$explicit in the first place.)
I think aw's implicit global variables are somewhat better. Things are only unwrapped once--when a symbol is evaluated--and after that they're first-class values, without the evaluator oppressing them at every turn.
---
"This can be used to implement auto-unwrapping behavior for vars and also laziness."
I don't think it can model laziness (since it forces at every step). I recommend modeling laziness by way of a doppelganger standard library that builds promises out of promises. Eager utilities would still see these promises as first-class promise objects. I explored this not too long ago as a way to port Haskell code to JavaScript... but I used a different technique than this one, and I grew to dislike it.
The code which uses that technique I now dislike is at (https://github.com/rocketnia/underreact/blob/865ccdb1a2c8dc0...), if you're willing to trudge through its boilerplate. (I'm linking to a specific commit because I plan to delete it for being too long out of date and too hard to maintain.)
---
"I'm also not convinced it's a good idea to let the language change evaluation in arbitrary ways, but it's an idea I'll have to think about."
It might be a challenge to keep things efficient and modular. But it's just the kind of challenge you'd look forward to, I'm sure. :-p
There's nothing wrong with pursuing this kind of thing. A minimalistic Kernel-style 'eval is already user-extensible in a way, since whenever it evaluates a cons cell, it executes user code (an operative or applicative).
"Your implicit wrappers sound like they would be too slippery."
Yeah, I've already changed it since then. It's in a lot of flux right now!
---
"I think aw's implicit global variables are somewhat better. Things are only unwrapped once--when a symbol is evaluated--and after that they're first-class values, without the evaluator oppressing them at every turn."
Not true. aw's implicit system is the same as Arc/Nu and Nulan except that ar hardcodes symbol names while Arc/Nu and Nulan use first-class mutable thunks; that's all.
aw's implicit global system works by having a table of symbols. The compiler will look at this table and if it finds a symbol in it, it will wrap it as a function call. In other words... in ar, if 'foo is in the implicit table, this:
(foo bar qux)
Would be compiled into this:
((foo) bar qux)
And then foo is a function that returns the value. This is just like Arc/Nu and Nulan except both Arc/Nu and Nulan handle it in much cleaner ways:
In particular, aw's system hardcodes the symbols, which means it isn't late-bound and works poorly with multiple namespaces. Because Arc/Nu uses values rather than symbols, both of those work perfectly.
As for Nulan... it doesn't have late binding or multiple namespaces (by default), so that's not a problem either way. :P
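If I'm reading ar's approach right, the compile step amounts to something like this Python sketch (the table of implicit symbols is fixed at compile time, which is exactly the late-binding weakness being described; the table contents and code representation here are my own illustration):

```python
IMPLICIT = {"foo"}  # hypothetical table of implicit symbol names

def compile_expr(expr):
    """Rewrite (foo bar qux) into ((foo) bar qux) when foo is implicit.

    Code is modeled as nested Python lists of strings."""
    if not isinstance(expr, list) or not expr:
        return expr
    head, *args = expr
    compiled = [compile_expr(a) for a in args]
    if isinstance(head, str) and head in IMPLICIT:
        # The symbol is implicit: wrap it as a zero-argument call.
        return [[head]] + compiled
    return [compile_expr(head)] + compiled
```

So `(foo bar qux)` compiles to `((foo) bar qux)`, and `foo` is expected to be a function that returns the current value.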
---
"I don't think it can model laziness (since it forces at every step)."
Why not? It may call the implicit thunk repeatedly, but the $lazy vau caches it so the expression is only evaluated once:
$defv! $lazy Env; X ->
  $lets: U: uniq
         Saved: var U
    implicit; ->
      $if Saved = U
        $set-var! Saved: eval X
      Saved
The only issue I see with this is efficiency, which I'm not really worried about right now. That particular problem could be solved by giving a way for the implicit wrapper to mutate itself... then, when evaluating the thunk, it would mutate itself to point directly to the evaluated expression. It would still be slightly inefficient because it would still have to unwrap the implicit, but hey, I can only do so much.
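In Python terms, the trick is just a memoizing thunk guarded by a unique sentinel. This mirrors the Nulan code above, but it's my own sketch, not Nulan's actual implementation:

```python
_UNSET = object()  # plays the role of (uniq): a sentinel no expression returns

def lazy(compute):
    """Model of $lazy: wrap `compute` so it runs at most once."""
    saved = _UNSET
    def thunk():
        nonlocal saved
        if saved is _UNSET:
            # First force: evaluate and cache the result.
            saved = compute()
        return saved
    return thunk

calls = []
promise = lazy(lambda: calls.append("ran") or 42)
```

Forcing `promise` any number of times evaluates the body exactly once and returns the cached 42 thereafter.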
---
"There's nothing wrong with pursuing this kind of thing."
There's nothing wrong with any choices. The question is whether you like them or not, whether they support your goals or not.
By the way... one interesting idea that I would be open to is to change $vau so that it returns a list of two elements, the environment and the return value. In other words, rather than saying this:
$vau Env Foo
  ... Foo ...
You'd instead say:
$vau Env Foo
  ... [Env Foo] ...
And then it's trivial to write a wrapper that automatically returns the environment, so there's no loss in convenience. I'm not entirely convinced this is actually better than using a mutable var, but it might solve the problem (?) of assigning to an env while in the middle of an expression, like with (foo ($set! bar ...) bar)
That's a more pure-FP approach to things, but what do you plan to use the returned environment for?
If you're just going to thread it into the next subexpression, then it's still hard to process this in parallel. (As I say in the other comment I just posted, "even one uncompilable expression will make it difficult to know the contents of that container, which will make it difficult to compile any of the expressions which follow that one." In this case the series of values isn't in a container, but it's still hard to follow.)
If you ignore the environment result everywhere but at the top level, and you use it as the environment for the next top-level command, that could be pretty nice.
"That's a more pure-FP approach to things, but what do you plan to use the returned environment for?"
Well, here's how I figured it might work... a $vau evaluates its body left-to-right in its lexical environment, right? So I figured it would take the result of evaluating the top-level expression and then use that as the environment for the next top-level expression.
Like I said, I'm not really terribly worried about parallelism, at least, not that kind of parallelism. Being able to evaluate sub-expressions and stuff in parallel is cool and all, but I get the impression it'll be hard no matter what system I come up with, because of mutation and side effects. It's probably better to make concurrency explicit, like it is in Clojure.
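That top-level threading is easy to state concretely. Here's a toy Python model (each form is a function taking an env and returning a (new_env, value) pair; none of this is the actual $vau semantics, just my illustration):

```python
def run_top_level(forms, env):
    """Thread the environment through a sequence of top-level forms.

    Each form: env -> (new_env, value). The env returned by one form
    becomes the env the next form is evaluated in."""
    value = None
    for form in forms:
        env, value = form(env)
    return env, value

def define(name, val):
    # Returns a fresh env instead of mutating one, pure-FP style.
    return lambda env: ({**env, name: val}, val)

def lookup(name):
    # Leaves the env untouched and returns the binding's value.
    return lambda env: (env, env[name])

env, value = run_top_level([define("x", 1), lookup("x")], {})
```

The sequencing is inherently serial, which is the parallelism cost mentioned above: each form's env depends on the previous form's result.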
---
"If you ignore the environment result everywhere but at the top level, and you use it as the environment for the next top-level command, that could be pretty nice."
Yeah that pretty much sums up my idea. I'm still not convinced this is actually the right way to do it, but it is interesting enough to have mildly piqued my interest.
You get used to it (I hope). Anyways, it's actually like this:
($lets (Top (get-current-env))
       (N ($quote nou))
       (Old ((Top) N))
  (eval Top ($quote $use! foo))
  (set! Top N Old))
...Which just caused me to see a bug, woohoo. Anyways, the reason it calls Top is because (as subtly pointed out in the article) environments are variables. Calling a variable returns whatever the variable is pointing to[1]. So we first call Top which returns an immutable data structure, and we then call that data structure to return the value at the key.
Basically, the reason we're doing this is that a data structure might contain a variable or a value, so we grab the variable-or-value from the current environment, and then assign it back again after the call to `$use!`.
---
* [1]: This isn't set in stone yet, and I'm still thinking about the right semantics for variables.
Right, but all the built-in things that deal with environments usually wrap them in a var. There's no special "environment data type", it's just an ordinary dictionary that is wrapped in a var and treated like an environment[1]. So you could pass in your own custom dictionary to eval, for instance. At least, that's the plan.
---
* [1]: Actually, I was thinking that it might be a good idea to have a special environment data type, which is exactly like a normal dictionary except that if you try to set a key that isn't a symbol, it throws an error. Not sure about that yet.
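That check is cheap to model. Here's a quick Python sketch of a dictionary that only accepts symbol keys, purely to illustrate the idea (symbols are modeled as Python strings):

```python
class Env(dict):
    """Dictionary restricted to symbol keys; a sketch of the proposed
    environment type. Symbols are modeled as Python strings."""
    def __setitem__(self, key, value):
        if not isinstance(key, str):
            raise TypeError(
                "environment keys must be symbols, got %r" % (key,))
        super().__setitem__(key, value)

env = Env()
env["x"] = 1   # fine: "x" is a symbol
```

Setting a non-symbol key like `env[42] = "nope"` raises a TypeError, while everything else behaves like an ordinary dictionary.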
To start out, you can try adding this line to news.css:
body { direction: rtl; }
The news.css stylesheet isn't actually its own file. It's defined in news.arc like this:
(defop news.css req
  (pr "
...
"))
When you add that line, the page layout will go from right to left.
You may end up with petty page layout issues that have to be solved on a case-by-case basis.
---
You might have character encoding issues, but probably not. The code in srv.arc announces in an HTTP header that it's serving UTF-8, which it is. But just to be safe, you may want to put a charset declaration in the HTML itself like this:
; In html.arc, insert this alongside the other (attribute ...)
; declarations:
(attribute meta charset opstring)
; In news.arc, add one line:
(mac npage (title . body)
  `(tag html
     (tag head
       (tag (meta charset "UTF-8")) ; This is the added line.
       ...)
     ...))
---
Some characters will probably be displayed out of order. This is because HTML and Unicode try to arrange characters by the characters' own directionality, and when characters of different directionality mix, the result is confusing.
The fix is to use special Unicode characters to clearly delineate which parts of the text are ltr and which are rtl: http://www.w3.org/TR/WCAG20-TECHS/H34
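Concretely, the H34 technique comes down to inserting the Unicode directional marks around runs of opposite-direction text. A minimal Python sketch, just to show the code points involved (real pages might prefer `dir` attributes or `<bdi>` instead):

```python
# Unicode directional marks, as described in WCAG technique H34.
LRM = "\u200e"  # LEFT-TO-RIGHT MARK
RLM = "\u200f"  # RIGHT-TO-LEFT MARK

def isolate_ltr(text):
    """Surround an LTR fragment (e.g. ASCII code) with LRM marks so
    adjacent punctuation displays correctly inside RTL text."""
    return LRM + text + LRM

# An ASCII code snippet embedded in a Hebrew sentence:
mixed = "\u05e9\u05dc\u05d5\u05dd " + isolate_ltr("(def foo)") + " \u05e2\u05d5\u05dc\u05dd"
```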
I don't know how easy it will be for users to enter these characters (e.g. if they're pasting ASCII code alongside Hebrew text). All I can say is that I hope it's something Hebrew IMEs already support. :/
I'd be glad to hear horror stories or success stories about Hebrew IMEs (or IMEs of any language). :)
---
You might also want to change the page generator to emit <html lang="he">, although I guess I don't see a lot of sites actually doing that. :)
If you do care about the lang attribute, it should be possible to insert it with two lines of code:
; In html.arc, insert this alongside the other (attribute ...)
; declarations:
(attribute html lang opstring)
; In news.arc, change one line:
(mac npage (title . body)
  `(tag (html lang "he") ; This is the changed line.
     (tag head
       ...)
     ...))
If you're going to do this, add another (attribute ____ lang opstring) line for each tag name you want to use the lang attribute with.
---
I've done this for an HTML page before, if you can't tell. :-p But I have not used news.arc, so the code I'm suggesting might not work. I hope it gets you started though.