It already has a command-line interface: just use "./nulan". I'm working on getting it to work with shell scripts too.
As for socket.io... well, that's just a JS library, right? And Nulan just compiles to raw JS, so you should be able to interface with it just fine.
As for online hosting... no, not really. I mean, if somebody wants to host it, feel free, but I'd personally wait until it's all nice and polished first.
"As for socket.io... well, that's just a JS library, right? And Nulan just compiles to raw JS, so you should be able to interface with it just fine."
I mean Node.js gives JavaScript the side effect powers typical of OS processes, while the browser gives JavaScript a UI. With socket.io, it should be very easy to take your browser interface and use it as a command shell. I can help if you'd like. ^_^
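Just to make that concrete, here's the rough shape of the glue I'm imagining, as a sketch (the port and the "command"/"output" event names are made up, and I'm assuming socket.io's standalone require( "socket.io" )( port ) server API):

// server.js: the browser page sends command lines over socket.io, the server
// runs them as shell commands, and the output goes back to the page.
var exec = require( "child_process" ).exec;
var io = require( "socket.io" )( 8080 );

io.on( "connection", function ( socket ) {
    socket.on( "command", function ( line ) {
        // NOTE: exec()ing raw input like this is only okay for a local toy shell.
        exec( line, function ( err, stdout, stderr ) {
            socket.emit( "output", err ? String( stderr || err ) : stdout );
        } );
    } );
} );

On the page side it would just be a matter of connecting with the socket.io client, emitting "command" when the user submits a line, and printing whatever comes back on "output".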
---
"As for online hosting... no, not really. I mean, if somebody wants to host it, feel free, but I'd personally wait until it's all nice and polished first."
I was thinking of something easy and free like GitHub Pages. I understand if you'd like some more polish though.
"I mean Node.js gives JavaScript the side effect powers typical of OS processes, while the browser gives JavaScript a UI. With socket.io, it should be very easy to take your browser interface and use it as a command shell. I can help if you'd like. ^_^"
Alrighty. I'm still not 100% sure how you'd accomplish that, so I wouldn't mind some help.
---
"I was thinking of something easy and free like GitHub Pages. I understand if you'd like some more polish though."
Well, before I consider any of that, I have a few cool ideas I want to add in first to "nulan.html".
I stumbled across this in an unrelated search for "meta-control." The documentation quality surprised me. Take a look at it yourself for a bit and come back. :)
---
I found myself interacting with the example before I finished reading about it. The clear section organization of "The meta code..." and "Your goal:" makes it easy to skim. The fact that the goals are very simple--just change an identifier--makes it painless to jump in and modify the example.
If I actually cared about using the system documented here, I would probably do more than change identifiers. :-p
Oops, I couldn't help myself. I put a combo box in my combo box. Broken, but wonderful! XD
"One could argue that a "usability test" of Fortran II leaded to a complete new language: BASIC, which was designed to be more usable (especially for beginners) than its predecessor."
Funny. I learned BASIC first, and then the second language I learned was C, when I was about 15. I found C much more understandable than BASIC. Things that had seemed mysterious in BASIC suddenly became crystal clear in C. Like the fact that a character is the same as an integer holding its ASCII code, a string is the same thing as an array of characters -- and a pointer is the same thing as a memory address. The puzzle pieces all fit. The thinking seems to go that the more you abstract away the hardware, the more understandable the language will be. But it seems to me that a concrete concept is easier to understand than an abstract one. The hardware gives you a concrete frame of reference.
I'm not sure why you bring up C. A look on Wikipedia tells me FORTRAN II was made in 1958, then BASIC in 1964, and then C in 1969-1973. FORTRAN II looks like it was clearly in the same design family as BASIC, with line numbers, multiple GOTO variants, capitalized English words for syntax, and a necessity to munge obscure memory addresses to invoke advanced functionality. :)
Just like you, the first language I learned was BASIC--specifically Applesoft BASIC--and the second was C. I liked C better because its program code was more modular (no need to push around line numbers when combining programs or inserting code) and it had variables that were local to the current procedure call, which made recursion much more useful.
Then I learned JavaScript, and I no longer had to worry about choosing specific numbers for my array sizes or fumbling with pointer operators. Then I was formally taught Java, and that's when I finally felt capable of writing just about any program: Run time allocation was now easy even when the lifetimes didn't fit a recursion hierarchy (i.e. I couldn't stand malloc() before), and the notion of behavior as part of data made it easy to pursue higher-order designs.
---
"But it seems to me that a concrete concept is easier to understand than an abstract one. The hardware gives you a concrete frame of reference."
Although I briefly programmed in C and I've occasionally read machine code, I don't recall ever considering computation hardware to be a very good frame of reference. Mathematics is where I find confidence, and user experience is where I find tangible feedback.
C tells an elaborate story of a world where memory is mostly one-dimensional, mutable, and uniformly cheap to access, where this memory contains all execution state (even the stack), and where execution takes place in a sequence of discrete steps. In our present-day world of cloud and mobile computing (not to mention the future), where we use networks, caches, distributed code sandboxes, predictive branching, cryptography, etc., this metaphor of computation is a joke at spatially large scales and only an approximation at small scales.
I suspect C feels close to the hardware because (1) historically it's been much closer, (2) CPU-scale architecture design has continued to pander to it, and (3) its elaborate story provides programmers with a chance to discover escape hatch after escape hatch, until they're trained to believe that C closes no doors to them.
---
"a character is the same as an integer holding its ASCII code, a string is the same thing as an array of characters"
If the text you need to represent is always unformatted, canned English, then a sequence of ASCII codes might be the only representation you need. However, I hardly consider all text to fall into that category. I think text can be as hard as natural language and typography, which can be as hard as UI.
>> Then I learned JavaScript, and I no longer had to worry about choosing specific numbers for my array sizes or fumbling with pointer operators.
Javascript is easier in that way because it has garbage collection. Garbage collection makes programming easier. But you can't add garbage collection to every programming language. There are trade-offs when the language abstracts away details like memory management.
>> I suspect C feels close to the hardware because... CPU-scale architecture design has continued to pander to it
I think this gets to the issue of the Von Neumann architecture, and the fact that it isn't the only possible way to design a CPU. I'm not well educated on this subject... However, I don't think you can say that C is the reason for the Von Neumann architecture. I think it's the other way around.
>> "a pointer is the same thing as a memory address"
>> Eh? Who says it isn't? :)
I meant that pointers are the same as integers, integers that hold memory addresses. Which is what they are from the standpoint of the cpu.
"The thinking seems to go that the more you abstract away the hardware, the more understandable the language will be. But it seems to me that a concrete concept is easier to understand than an abstract one. The hardware gives you a concrete frame of reference."
The reason C "feels right" is because it matches the hardware, as you said.
But that doesn't mean high-level is bad: you can have high-level hardware. I'm sure on a Lisp machine, Lisp would feel right at home, whereas C wouldn't.
Unfortunately, I predict we'll be stuck with our von Neumann hardware for quite some time, which will severely hinder the progress of our software. No such thing as a free lunch, eh?
I didn't say high-level is bad. The biggest counter-example to that I can think of is pure mathematics. Clearly, mathematics is incredibly useful.
It just seems to me that, more often than not, the orthodoxy with respect to programming is wrong. Or, at best, partly wrong.
Another thing about programming languages designed for beginners is that they tend to be very strongly typed, e.g. BASIC and Pascal. This is either because the designers thought it made things easier, or perhaps because they thought it instills some useful lesson. As far as making things easier, that doesn't seem to be true: think of how many people out there who would flunk out of CS school are managing to write JavaScript and PHP.
Leaky abstractions is exactly right. At some point, all abstractions seem to break down. This is why I'm not sure the OO principle of encapsulation or information hiding is a good thing. The more you hide the implementation of something, the more you are forcing the use of an abstraction, and the harder it will be to go around the abstraction when, inevitably, you have to.
I don't think there is much you can say about programming languages without qualification. The range of programs you can write is too vast to generalize.
How do you feel about Lisp versus SML? I had to learn SML my second year in college, but at that time in my educational career I wasn't doing much studying, and I only half-learned it. I think some form of Lisp might have made a better introduction to functional programming, because you wouldn't have to deal with SML's type system at the same time.
I've never used SML and have only read a little about it. It looks a lot like Haskell, which I don't have much experience with either.
But from what I've seen, I don't like static type systems. I think making them optional is fantastic, but I don't like having them shoved down your throat.
I think you should be able to write your program in care-free Ruby/Arc style, and then once things have settled down, go back in and add type annotations for speed/safety. But you shouldn't have to use the type system right from the start.
The problem is that a lot of the people who find static type systems useful are also the kind of people who like safety a lot, so they want the type system on all the time. Not just to protect their code, but to prevent other programmers from making mistakes.
I don't like that mindset. Which is why I prefer languages like Ruby and Arc, even with their flaws. I don't think any restriction should be added in to prevent stupid people from doing stupid things. I think the language should only add in restrictions if it helps the smart people to do smart things. And for no other reason.
So as long as the type system helps smart people to do smart things, and doesn't get in the way too much, then sure, I think it's great. But if it gets in the way, or it's done to prevent stupid people from doing stupid things... no thanks.
In that line of reasoning, I've been thinking about adding in a static type checker to Nulan. But I want it to use a blacklist approach rather than a whitelist.
What I mean by that is, if it can be guaranteed at compile time that a program is in error, then it should throw a well formatted and precise error that makes it easy to fix the problem.
But if there's a chance that the program is correct, the type system should allow for it. This is the opposite of the stance in Haskell/SML which says: if it cannot be guaranteed at compile-time that a program is valid, then the program is rejected.
Here's an example of what I'm talking about:
def foo ->
  bar 10 20
The variable `bar` isn't defined. This can be determined at compile-time. Thus, Nulan throws this error at compile-time:
NULAN.Error: undefined variable: bar
  bar 10 20  (line 2, column 3)
  ^^^
The error message is precise, and pinpoints the exact source of the error, making it easy to fix. And likewise, this program...
def foo -> 1
  5
foo 2
...creates a function `foo` that requires that its first argument is the number `1`. It then calls the function with the number `2`. This situation can be determined at compile-time, and so I would like for Nulan to throw a similar compile-time error.
In other situations, it might not be possible to determine whether the first argument to `foo` is the number `1` or not. If this were Haskell/SML, it might refuse to run the program. But in Nulan, I would simply defer the check to runtime.
This means that every program that is valid at runtime is also valid according to the type-checker. Thus the type-checker is seen as a useful tool to help catch some errors at compile-time, unlike Haskell/SML which attempt to catch all errors at compile-time.
I think this kind of hybrid system is better than pure dynamic/pure static typing.
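To make the compile-time/run-time split concrete, here's roughly the decision procedure I have in mind, written out as plain JS (compileCall, nulan.check, and the argument-info shape are all made up for illustration; this isn't Nulan's actual compiler):

function compileCall(expected, arg) {
  if (arg.knownAtCompileTime) {
    if (arg.value !== expected) {
      // Provably wrong: report it now, as a compile-time error.
      throw new Error("compile-time error: expected " + expected +
                      " but got " + arg.value);
    }
    // Provably fine: emit the call as-is, with no runtime check.
    return arg.code;
  }
  // Can't decide either way: defer the check to runtime instead of
  // rejecting the program like Haskell/SML would.
  return "nulan.check(" + JSON.stringify(expected) + ", " + arg.code + ")";
}

compileCall(1, { knownAtCompileTime: true, value: 2, code: "2" });
// throws at compile time

compileCall(1, { knownAtCompileTime: false, code: "readNumber()" });
// => "nulan.check(1, readNumber())"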
How is this different from preventing stupid people from doing stupid things?
I've said this recently, but I like static typing when it contributes to the essential details of the program, rather than merely being a redundant layer for enhancing confidence in one's own code. Static typing is particularly meaningful at module boundaries, where it lets people establish confidence about each other's programs.
Anyway, enhanced usability is nothing to scoff at either. If you find this kind of static analysis important, I look forward to what you accomplish. :)
"How is this different from preventing stupid people from doing stupid things?"
Because the only difference is whether the error occurs at compile-time or run-time. I'm not adding in additional restrictions to make the type-system happy: if the type system can't understand it, it just defers the checking until run-time.
Thus, the type system takes certain errors that would have happened at run-time, and instead makes them happen at compile-time, which is better because it gives you early error detection. What the type system doesn't do is restrict the programmer in order to make it easier to detect errors at compile-time.
---
"If you find this kind of static analysis important"
Not really, no. Useful? Yeah, a bit. It's nice to have some early detection on errors. But my goals aren't to guarantee things. So whether you have the type-checker on or off just determines when you get the errors. A small bonus, but nothing huge. So I'd be fine with not having any static type checker at all.
The way I see it, what you're talking about still seems like a way to cater to stupid programming. Truly smart programmers don't generate any errors unless they mean to. ;)
---
"What the type system doesn't do is restrict the programmer in order to make it easier to detect errors at compile-time."
Guarantees don't have to "restrict the programmer." If you take your proposal, but add a type annotation form "(the <type> <term>)" that guarantees it'll reject any program for which the type can't be sufficiently proved at compile time, you've still done nothing but give the programmer more flexibility. (Gradual typing is a good approach to formalizing this sufficiency: http://ecee.colorado.edu/~siek/gradualtyping.html)
I think restriction comes into play when one programmer decides they'll be better off if they encourage other programmers to follow certain conventions, or if they follow certain conventions on their own without immediate selfish benefit. This happens all the time, and some may call it cargo culting, but I think ultimately it's just called society. :-p
"The way I see it, what you're talking about still seems like a way to cater to stupid programming. Truly smart programmers don't generate any errors unless they mean to. ;)"
Then I'll reclarify and say "any programmer who's just as smart as me", thereby nullifying the argument that a "sufficiently smart programmer would never make the mistake in the first place".
---
"If you take your proposal, but add a type annotation form [...]"
Sure, if it's optional. And not idiomatic to use it all the time. The problem that I see with languages that emphasize static typing is that even if it's technically possible to disable the type checker, it's seen as very bad form, and you'll get lots of bad looks from others.
The idioms and what is seen as "socially acceptable" matter just as much as whether it's "technically possible". If I add in type checking, it'll be in a care-free "sure use it if you want, but you don't have to" kind of way. I've seen very few languages that add in static typing with that kind of flavor to it.
---
"This happens all the time, and some may call it cargo culting, but I think ultimately it's just called society. :-p"
And I am very much so against our current society and its ways of doing things, but now we're straying into non-programming areas...
"And I am very much so against our current society and its ways of doing things, but now we're straying into non-programming areas..."
Yeah, I know, your and my brands of cynicism are very different. :) Unfortunately, I actually consider this one of the most interesting programming topics right now. On the 20th (two days ago) I started thinking of formulating a general-purpose language where the primitives are the claims and communication avenues people share with each other, and the UI tries its best to enable a human to access their space of feedback and freedom in an intuitive way.
I'd like to encourage others to think about how they'd design such a system, but I know this can be a very touchy subject. It's really more philosophy and sociology than programming, and I can claim no expertise. If anyone wants to discuss this, please contact me in private if there's a chance you'll incite hard feelings.
I think that C has a good solution. It will compile any code that's possible to compile, but it will output warnings. I don't think it's necessary to halt compilation just to get the programmer's attention. That's what Java does, and it really annoys me.
If the type-checking is not strictly necessary, maybe you should make it an option, like -Wall.
Yes, absolutely. There are certain errors that absolutely cannot be worked around, like an undefined variable. Those are errors that actually halt the program. But the rest should be optional.
I've learned through bitter experience to treat all C warnings as errors, and more. The presence of a single uninitialized local variable somewhere in your program makes the entire program undefined. Where undefined means "segfaults in an entirely random place."
I think that's a good practice in general. But when you are experimenting and debugging, it can be useful to eliminate chunks of code by expedient means, which often generates warnings that you don't care about.
I find programming to fractally involve debugging all the time. So if I allowed warnings when debugging I'd be dead :)
You're right that there are exceptions. I think of warnings as something to indulge in only in the short term, the extreme short term: I try very hard not to ever commit a patch that causes warnings. It really isn't that hard in the moment, and the cost rises steeply thereafter.
Incidentally, I'm only this draconian with C/C++. Given their utterly insane notions of undefined behavior I think it behooves us to stay where the light shines brightest. Whether we agree with individual warning types or not, it's easier to just say no.
But with other languages, converting errors to warnings is a good thing in general. Go, for example, goes overboard by not permitting one to define unused variables.
"The problem is that a lot of the people who find static type systems useful are also the kind of people who like safety a lot, so they want the type system on all the time. Not just to protect their code, but to prevent other programmers from making mistakes.
I don't like that mindset. Which is why I prefer languages like Ruby and Arc, even with their flaws. I don't think any restriction should be added in to prevent stupid people from doing stupid things. I think the language should only add in restrictions if it helps the smart people to do smart things. And for no other reason."
I could not agree more. I think that the idea of preventing mistakes via restrictive language features is one of the dominant ideas behind object-oriented languages. Consider the keywords "private" and "protected;" they literally have no effect other than to cause compile-time errors. It seems to me, intuitively, that the kinds of mistakes that can be easily caught by the compiler at compile time are in general the kinds of mistakes that are easily caught, period. The kinds of bugs that are hard to find are the ones that happen at runtime and propagate before showing themselves, and they are literally impossible for the compiler to find, because that would require the compiler to solve problems that are provably uncomputable.

At my last job, I was working on fairly complicated web applications in PHP, and even though occasionally I'd run into a bug that could have been prevented by static type-checking, it was always in code that I had just written and wasn't hard to find. By eliminating things like variable declarations, PHP code can be made very succinct, and I think that simplicity and succinctness more than offset the risks that come from a permissive language. But I've never used a language that came with type-checking optional, so I have never made an apples-to-apples comparison.

PHP is actually an interesting example, because in PHP, the rules for variable declarations are basically inverted from normal: you have to declare global variables in every function that you use them (or access them through the $GLOBALS array), but you don't have to declare function-scope variables at all. It makes a lot of sense if you think about it.
"Consider the keywords "private" and "protected;" they literally have no effect other than to cause compile-time errors."
Would you still consider this semantics restrictive if the default were private scope and a programmer could intentionally expose API functionality using "package," "protected," and "public"?
IMO, anonymous functions make OO-style private scope easy and implicit, without feeling like a limitation on the programmer.
---
"It seems to me, intuitively, that the kinds of mistakes that can be easily caught by the compiler at compile time are in general the kinds of mistakes that are easily caught, period."
I think that's true, yet not as trivial as you suggest. In general, the properties a compiler can verify are those that can be "easily" expressed in mathematics, where "easily" means the proof-theoretical algorithms of finding proofs, verifying proofs, etc. (whatever the compiler needs to do) have reasonable computational complexity. Mathematics as a whole is arbitrarily hard, but I believe human effort has computational complexity limits too, and I see no clear place to draw the line between what computers can verify and what humans can verify. Our type systems and other tech will keep getting better.
---
"The kinds of bugs that are hard to find are the ones that happen at runtime and propagate before showing themselves, and they are literally impossible for the compiler to find, because that would require the compiler to solve problems that are provably uncomputable."
I believe you're assuming a program must run Turing-complete computations at run time. While Turing-completeness is an extremely common feature of programming languages, not all languages encourage it, especially not if their type system is used for theorem proving. From a theorems-as-types point of view, the run time behavior of a mathematical proof is just the comfort in knowing its theorem is provable. :-p If you delay that comfort forever in a nonterminating computation, you're not proving anything.
Functional programming with guaranteed termination is known as total FP. "Epigram [a total FP language] has more static information than we know what to do with." http://strictlypositive.org/publications.html
---
"PHP is actually an interesting example, because in PHP, the rules for variable declarations are basically inverted from normal: you have to declare global variables in every function that you use them (or access them through the $GLOBALS array), but you don't have to declare function-scope variables at all. It makes a lot of sense if you think about it."
I find this annoying. My style of programming isn't absolutely pure functional programming, but it often approximates it. In pure FP, there's no need to have the assignment syntax automatically declare a local variable. That's because there's no assignment syntax! Accordingly, if a variable is used but not defined, it must be captured from a surrounding scope, so it's extraneous to have to declare it as a nonlocal variable.
I understand if PHP's interpreter doesn't have the ability to do static analysis to figure out the free variables of an anonymous function. That's why I would use Pharen, a language that compiles to PHP. (http://arclanguage.org/item?id=16586)
>> >> "Consider the keywords "private" and "protected;" they literally have no effect other than to cause compile-time errors."
>>
>> Would you still consider this semantics restrictive if the default were private scope and a programmer could intentionally expose API functionality using "package," "protected," and "public"?
Actually, in C++ the default for class members is private...
It's simply a true statement that "private" generates no machine language. All it does is cause compilation to fail. Whether or not this is a good thing is a matter of opinion.
>> IMO, anonymous functions make OO-style private scope easy and implicit, without feeling like a limitation on the programmer.
If you're speaking of lexical closures, I think you're right. You don't need to declare variables as private, because you can use the rules of scoping to make them impossible to refer to. You can achieve the same thing with a simpler syntax and more succinct code.
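For example, a closure like this keeps its state completely inaccessible from the outside, with no "private" keyword in sight (just a throwaway sketch):

function makeCounter() {
    var count = 0;                    // effectively private
    return {
        increment: function () { count += 1; },
        value: function () { return count; }
    };
}

var c = makeCounter();
c.increment();
c.value();     // 1
// There is no way to name `count` from out here; c.count is just undefined.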
>> I believe you're assuming a program must run Turing-complete computations at run time. While Turing-completeness is an extremely common feature of programming languages, not all languages encourage it, especially not if their type system is used for theorem proving.
I'm not assuming that programming languages must be Turing complete. It happens to be true of all general-purpose languages that are in common use today.
>> Functional programming with guaranteed termination is known as total FP. "Epigram [a total FP language] has more static information than we know what to do with." http://strictlypositive.org/publications.html
I'll take a look at that language. I think that in 50 years' time, we might all be using non-Turing-complete languages. Making a language Turing complete is the easiest way to ensure that it can solve any problem, but isn't necessarily the best way.
(Technically, a language has to be Turing complete to solve literally any problem, but my hunch is that all problems of practical utility can be solved without resorting to a Turing machine.)
(arc
  (settings ... {instead of <head>}
    (load "common-styles.arc")
    (styles
      (hello
        (font "10pt Arial")
        (color "red"))))
  (render ... {= instead of <body>}
    (repeat 4
      (box {instead of <div>} style "hello" "Hello World"))
    (br)
    (a href "www.arclanguage.org")))
I think the primary use case for HTML+CSS+JS is to distribute interactive content to multiple kinds of user agents (desktop screens, mobile screens, text-to-speech screen readers, Web crawlers, etc.), with various kinds of user agent customization (user styles, user scripts, text size configuration, browser extensions, machine translation, etc.).
The specs are kind of a mess, but the way I see it, HTML and CSS try to establish a development experience where developers can cover most cases without thinking about all of them at once. When that falls short, JavaScript provides an escape hatch where developers can model the world the way they see fit, even if it's short-sighted and alienates many users.
I don't see lisp syntax as being particularly well-suited as a format for text. Nor do I see lisp semantics as being a great fit for distributed, networked, interactive computation; they're practically the same as JavaScript's semantics.
"Why is HTML/XML a well-suited format for text and Lisp not? Where are the differences?"
The shorthands of HTML can be pretty nice. For instance, it's nice that HTML collapses whitespace to a single space, and I think there's actually an odd boost in brevity and readability of rich text:
lisp: "Said the program, \"" (i Helloooo) ", nurse world.\""
HTML: Said the program, "<i>Helloooo</i>, nurse world."
Racket tries to get around this shortcoming with its own TeX-like syntax, Scribble:
TeX: Said the program, ``\emph{Helloooo}, nurse world.''
Scribble: Said the program, "@i{Helloooo}, nurse world."
Arc has its own mechanism:
atstrings: "Said the program, \"@(i Helloooo), nurse world.\""
Then, of course, there are markdown languages, which try to incorporate kludges that look similar to the everyday kludges people already use when trying to communicate in unformatted Unicode:
markdown: Said the program, "*Helloooo*, nurse world."
I prefer my own approach, which is probably just a more minimalistic variant of TeX, supporting only a lisp-like section syntax. It lets the section type dictate any idiosyncratic treatment of the section body:
Chops: Said the program, "[i Helloooo], nurse world."
"And why is it not good for distributed, networked computation?"
I mostly make that claim because I strongly believe in David Barbour's RDP (Reactive Demand Programming). RDP, like much research (and practice) in distributed computation, deals with designing control flow constructs that represent changes in the location of the computation. This makes it easier to manage behavior on multiple machines from a single body of source code.
In Web programming, distributed code typically takes the form of a program that primarily runs on the server but generates HTML+CSS+JS. Tools intended to help in this process include text-munging tools like PHP and Rails's use of templating syntax; control-flow-munging tools like Racket and Arc's Web continuation management; and languages whose distributed control flow is more seamless by design, like Opa and Ur/Web.
While most forays into distributed computation, including these, seem to start from an imperative language, RDP starts from a reactive one. A reactive control flow graph applies at any given instant, rather than implicitly carrying interactions across time. Thanks to both time- and space-transportation being represented explicitly, it's probably easier to account correctly for the time cost of space transportation. (And when state, being the time-distribution channel it is, is represented externally to the program, it's easier to use other tools to manage it.)
Although RDP is reactive, it even diverges from popular reactive models like FRP (Functional Reactive Programming) by abandoning the idea of discrete events. Discrete events tend to complicate fan-in computation flow (how do we zip and deduplicate event streams?), and yet they're inessential to the continuous and tentative way we actually interact with the world.
Much of the complexity of Web programming is likely due to the fact that our space- and time-distribution standards--HTTP requests, HTTP caching, cookies, WebSockets, HTML forms, HTML local storage, etc.--are designed with eventful interaction in mind. JavaScript (and potentially a JavaScript-like lisp) only enables our event addiction.
On the other hand, HTML and CSS have reactive semantics. They're very inelegant as they stand today, but with some maintenance, they might represent a better evolution path toward a simple Web.
"Arc uses ar-nil-terminate and ac-niltree to do the Arc->Racket->Arc conversion. ar-nil-terminate does a shallow copy, but ac-niltree does a deep copy.
However, the Arc compiler could be changed so that + uses a shallow version of ac-niltree."
That's the essence of the bug, as I see it. This fix is much shallower than the other fixes discussed, so this fix would make the most sense in a minimally updated variant of pg's Arc.
Should Anarki go for shallow or deep improvement? I've advocated shallow in the past, but now I'm thinking Arc oughta follow through on its promise to "break all your code," which of course means the entire network of hacks, libraries, help resources, and alternate language implementations we've built up so far. It would be nice to see Anarki become a fork of Arc/Nu and undergo deep improvements, without losing the Arc flavor.
"I feel uneasy about the shallow change to +; I'm sure the same bug exists in some other functions since deep niltree is the default."
From what I can see, there are only three places in the pg-Arc code where 'ac-niltree traverses too far, and they're all in ac.scm. Two are the definitions of + and ar-+2, and one is a misleading comment:
; Arc primitives written in Scheme should look like:
; (xdef foo (lambda (lst)
; (ac-niltree (scheme-foo (ar-nil-terminate lst)))))
; That is, Arc lists are NIL-terminated. When calling a Scheme
; function that treats an argument as a list, call ar-nil-terminate
; to change NIL to '(). When returning any data created by Scheme
; to Arc, call ac-niltree to turn all '() into NIL.
; (hash-table-get doesn't use its argument as a list, so it doesn't
; need ar-nil-terminate).
From another point of view, there are only a few places where 'ac-niltree probably needs to be recursive. Those are in the definitions of 'ac-call, 'ac-mac-call, and 'ac-macex, where they deal with quotation and macroexpansion, the two ways literal code is made available to Arc programs.
The other uses of 'ac-niltree are in 'ar-coerce (string to cons), 'dir, and 'timedate, where the lists are flat anyway.
I only looked for uses in pg-Arc, not any code that's been derived from it.
"Confusing that you're using deep/shallow with two meanings in the same comment :)"
Whoops! My comment was originally going to stand on its own, but I adapted it into a reply when Pauan got to what I wanted to say first. :) I didn't notice the word overlap.
"The nice thing about indenting in C is simply that it falls out naturally, so you don't have to think about it."
In my experience with Java, Groovy, and JavaScript, the indentation gets just as idiosyncratic when it comes to nested function call expressions, long infix expressions, and method chaining.
I've rarely used C, but does its error code idiom obscure this drawback? I imagine error codes force programmers to put many intermediate results into separate variables, whether they like it that way or not.
In other words, you usually won't see this in C, right?
foo(bar(qux(1, 2, 3)))
Instead, you'd write it more like this:
x = qux(1, 2, 3);
if (x == ERR) {
    ...
}
x = bar(x);
if (x == ERR) {
    ...
}
x = foo(x);
if (x == ERR) {
    ...
}
Because you don't have nested expressions, there's only a single way to indent the code. But in languages like JavaScript, method chaining and nested function calls mean that there are now multiple ways to indent the same code.
Thus, C's syntax isn't actually more uniform than Lisp, it only seems that way because of C's way of handling errors.
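For example (the names here are made up), both of these are perfectly ordinary ways to write the same JavaScript:

var results = [1, 2, 3];
function isReady(x) { return x > 1; }
function render(x) { return "item " + x; }

// One common way to indent the chain:
var textA = results.filter(isReady)
                   .map(render)
                   .join("\n");

// Another, equally common way:
var textB = results
    .filter(isReady)
    .map(render)
    .join("\n");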
Hmm, I'm losing track of my point. I prefer writing code so it models all errors explicitly, so I end up with that kind of verbosity in JavaScript too. I only get code like foo(bar(qux(x))) when I'm using lots of function calls that have no errors (or whose arguments can be errors).
>> C's syntax isn't actually more uniform than Lisp
C's syntax is actually less uniform, isn't it?
C isn't a very sophisticated language, but it tends to be readable. At least in the sense of following the flow of control; perhaps things like error handling make it hard to see the forest for the trees.
There may be a fundamental law that the more underpowered the language, the easier it is to read. Sort of like how Dr. Seuss books are more readable than research papers on programming languages theory, right?
I don't think Pauan was referring to C syntax as a whole. In this subthread, I think we've been specifically talking about whether certain languages have a "single, canonical" indentation style that "falls out naturally."
---
"There may be a fundamental law that the more underpowered the language, the easier it is to read. Sort of like how Dr. Seuss books are more readable than research papers on programming languages theory, right?"
In one sense that's true, since it's easy to make naive improvements to one feature while neglecting another. In another sense, a less readable language is always relatively "underpowered" due to its greater difficulty to use (assuming it's a language we use by reading :-p ).
I think C is a great language. It maps straightforwardly onto the capabilities of the hardware. What I meant by calling it underpowered is that it doesn't do much to increase your power beyond freeing you from having to write assembly language.
Higher order functions and metaprogramming are the sort of things I associate with a powerful language, like Lisp. But sometimes things get so abstract you can't tell what you're looking at.
As you point out, it's easy to ruin something like a programming language while trying to improve it. (I haven't created a programming language, but I've used bad ones.)
> "There may be a fundamental law that the more underpowered the language, the easier it is to read."
That's a much stronger claim than your original :) Aren't python, ruby, and haskell all high-power but easy to read?
There's the confounding effect of learnability; lisp gets more readable over time. There's also the confounding effect of density or difficulty. This quote captures both:
"If you're used to reading novels and newspaper articles, your first experience of reading a math paper can be dismaying. It could take half an hour to read a single page. And yet, I am pretty sure that the notation is not the problem, even though it may feel like it is. The math paper is hard to read because the ideas are hard. If you expressed the same ideas in prose (as mathematicians had to do before they evolved succinct notations), they wouldn't be any easier to read, because the paper would grow to the size of a book." (http://www.paulgraham.com/power.html)
I seem to have overlooked this post until just now...
Incidentally, I've never written python, ruby, or haskell, except for a tiny amount of python.
Good quote. I've been reading a lot of computer science papers lately, and I tend to skip over the math formulas and focus on the text. This could be because I'm reading them for "fun" and not because I have to for a class, or something. But I have always found it hard to take in dense notation, and preferred a conceptual argument. Maybe it's just that I have a deficiency in that area. But I think prose has the potential to carry powerful insights that are out of the reach of formulas; I suspect the problem is that succinct, brilliant prose is just incredibly hard to write. It's probably easier to just list formulas than to get deep ideas into prose. The reverse is also true, of course. Some ideas can only be expressed properly with notation.
But that probably has nothing to do with programming language syntax per se.
"I've been reading a lot of computer science papers lately, and I tend to skip over the math formulas and focus on the text."
I do that too. :) Unfortunately, at some point it gets hard to understand the prose without going back to read some of the fine details of the system they're talking about. XD
I tend to jump around. The introduction is usually boilerplate for the particular area of research, so it can be skipped. (I wonder how many different papers have told the story of the memory hierarchy and how it's getting more and more important as data gets bigger.) Then I try to figure out if the paper has anything important to say, before working on the math. I figure that sometimes the big idea of the paper is in the math, and other times, the big idea is in the text, and the math is just obligatory. (You can't publish a paper on an algorithm without spelling out the precise bounds on time and space, even if the formula contains 15 terms. Not all 15 terms can be important to the performance, but it certainly is important to put them in the paper.) I guess it depends on the field, but in the data structures papers I like to look at, it usually doesn't take a lot of math notation to express the key innovation.
Well, essentially we already have 'avg:list as the variadic version. In fact, it looks like we can shave off two characters:
(map(fn e avg.e)r t)
(map avg:list r t)
I don't remember ever using 'avg except maybe to average some time measurements at the REPL, and I think the idiom (avg:accum ...) comes in handy for that. But I generally don't use arithmetic operators in my programs, probably due to the subject matter I choose.
After I wrote that comment it seemed to me that variadic is more fundamental, because you can always convert lists using apply. But you're right that it's not hard to bounce between the two.
There'll totally be places where it matters whether something conses or not. But that should reflect in the (longer) name, IMO. cons-minimal-append, mayhap?
I don't know exactly what you're asking for, but I'll try to help you. :)
---
Thank you, Pauan.
Ok. I made a mistake not to call apply.
Now it is clear. I learned from this mistake.
---
The phrase "apply it" is confusing because we sometimes say "apply a function (to some arguments)" even when we're not calling apply.
As a verb, "to mistake something (for something else)" means "to misunderstand something (thinking it's something else instead)." As a noun, "a mistake" is an accidental occurrence. The idiom "to make a mistake" means to cause an occurrence by accident.
The word "mistake" isn't usually broken up into "mis-" ("incorrectly") and "take" ("understand").
"That works just fantastic in Racket, where you can splice arbitrary things into the AST, but that won't work in JS, where things generally have to be given a name."
Just use my Function( "x", ... )( x ) pattern. Keep in mind that you should only need to translate to JavaScript once per eval or top-level command, rather than once per macro call. The "AST" {bar qux corge} will still exist as an intermediate representation, since we still need to process 'bar if it's a macro and compile 'qux and 'corge if it isn't.
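In other words, something like this (the variable names are arbitrary):

// Instead of splicing the value itself into the generated source text,
// compile the code with a parameter name and pass the value in when the
// generated function is instantiated.
var box = { val: 42 };   // some value that only exists at compile time
var compiled = Function( "g1", "return function () { return g1.val + 1; };" )( box );
compiled();   // 43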
---
"If the symbol refers to an immutable box, the compiler will unbox it and splice it in directly. If it's a mutable box, it'll put the box into the AST, which will be unboxed at runtime."
I did this in Penknife too, but Penknife only had mutable boxes.
Well, I did it for global scope anyway. You say you're doing this for local scope too?! The boxes for a function's parameters don't even exist at compile time, right?
---
"JS doesn't have boxes"
Sure it does! I usually use { val: _ } when I want a box.
You should probably keep using separate JavaScript variables, but here are your examples using boxes:
// (var foo 1)
(function () {
    env.foo = { val: 1 };
})();

// (def bar -> foo + 5)
(function () {
    var g1 = env.foo;
    env.bar = { val: function () {
        return nulan.plus_( g1.val, 5 );
    } };
})();

// (var foo 3)
(function () {
    env.foo = { val: 3 };
})();

// (bar)
(function () {
    var g1 = env.bar;
    return (0, g1.val)();
})();

// (mac foo -> {bar qux corge})
(function () {
    var g1 = env.bar;
    var g2 = env.qux;
    var g3 = env.corge;
    env.foo = { val: nulan.macro_( { bar: g1 }, function () {
        return [ g1.val, g2.val, g3.val ];
    } ) };
})();

// (let bar 5 (foo))
// ==>
// (do (var bar 5)
//     (bar qux corge))
(function () {
    var g1 = env.foo.val.captures.bar;
    var g2 = env.qux;
    var g3 = env.corge;
    return (function () {
        var bar = 5;
        return (0, g1.val)( g2.val, g3.val );
    })();
})();
(The calls to nulan.macro_() and nulan.plus_() are just placeholders for whatever you would use there.)
Since I'm not using separate JavaScript variables for separate definitions, the original bar is totally out of scope in (let bar 5 (foo)), and I end up having to smuggle it in as part of foo (env.foo.val.captures.bar). I did this in Penknife, and I called the foo->bar path a "lexid." If I had to do it again, I might consider generating unique variables outside every environment, something like what you're doing.
For an easier solution all around, we can call the 'foo macro at run time, rather than relying on the compiler to expand it beforehand.
// (let bar 5 (foo))
// ==>
// (do (var bar 5)
//     (bar qux corge))
nulan.eval_( env, [ "let", "bar", 5, [ "foo" ] ] );
This way, 'foo will return its own view of [ bar, qux, corge ], and we don't have to worry about bar being in scope.
As a bonus, this approach allows side effects during macroexpansion to work properly, rather than being forgotten by the compilation process.
The reason I didn't do this in Penknife is because Penknife macros suppressed parsing, the most expensive step, so this kind of compilation wouldn't have done any good.
---
"Anyways, with that out of the way, I actually think I'm going to go for strategy #2, because I only lose a tiny bit of flexibility, but it should be much much faster."
Just wait until someone comes along and thinks your language is too static. ;)
That approach is also good for debugging. Compiling to so-called idiomatic JS is a sacrifice many JS-targeting languages make right now.
---
"I'd still like to hear rocketnia's advice on bytecode, though, for future reference."
I've only done a few things with bytecode, most of which involve reverse engineering GB/NES/SNES machine code and game-specific binary data. I've taken a look at bytecode generation for the JVM a few times, but I know better than to trouble myself with that. :-p
Bytecode is a term commonly used to describe binary formats for programs that run on virtual machines. It's designed to be computationally cheap to execute (e.g. JIT-compile) on multiple CPU machine code dialects.
JavaScript is the only "machine" code you're targeting, so there's not much need to build a compatibility layer. However, you might be interested in fine-tuning for particular JS engines' strengths and weaknesses. For instance, supposedly V8 compiles JavaScript a lot like C code, so it might encourage large blocks of imperative code, few variables, and heavier use of structured control flow than function calls. (Eh, I almost never write C code, so what do I know?)
"Just use my Function( "x", ... )( x ) pattern. Keep in mind that you should only need to translate to JavaScript once per eval or top-level command, rather than once per macro call. The "AST" {bar qux corge} will still exist as an intermediate representation, since we still need to process 'bar if it's a macro and compile 'qux and 'corge if it isn't."
Ewwww. Slow. And yes, I know you only need to eval once per top-level form, but the problem is that that means you need to actually keep the top-level forms around, and the compiler/runtime always needs to be present.
Keep in mind that I want to use Nulan to write Chrome Extensions. Now imagine a popup that occurs when the user presses a button. This popup is coded in Nulan. Now, every time the user presses the button, it has to re-parse, re-compile, and re-eval all of the Nulan code that is inside the popup.
I consider that unacceptable for my desires, where the code should be about as fast as handwritten JS. So, Nulan compiles things ahead of time to a JS file, to achieve those speeds.
---
"Well, I did it for global scope anyway. You say you're doing this for local scope too?! The boxes for a function's parameters don't even exist at compile time, right?"
Well, in Racket, I just use a "let" for local scope, so no, locals aren't boxes. But I could make locals use boxes too. It would just be a lot harder to implement. At that point I might as well just write an interpreter.
And in fact, the interpreted vau version of Nulan does use boxes for all variables, global and local.
---
"Sure it does! I usually use { val: _ } when I want a box."
Yes, but then it has to do the boxing/unboxing at runtime in addition to the variable lookup. Whereas with Nulan's scheme, you get almost all the same benefits, but with just the variable lookup. The only downside is that the boxes are no longer available at runtime (but you can access the fake "boxes" at compile-time).
---
"Just wait until someone comes along and thinks your language is too static. ;)
That approach is also good for debugging. Compiling to so-called idiomatic JS is a sacrifice many JS-targeting languages make right now."
Yes it is a tradeoff. Yes I know it's more static than Arc or the Racket version of Nulan. But with my scheme, I can actually make things feel fairly fluid. As an example, this works:
(var + (fn ...))
(+ 1 2)
That is, the runtime function + shadows the macro +, so it doesn't do the macro expansion. And macros can hygienically expand to runtime values, as I already explained, thanks to the variable renaming scheme.
So, the only real downside is that you can't do this:
(def bar -> ...)
(mac foo ->
  (bar ...))
That is, you can't actually call a runtime function from inside the macro. But you can still expand to it:
(mac foo ->
  {bar ...})
And you can also evaluate the "bar" function at compile-time, which would make it available:
(&eval (def bar -> ...))
(mac foo ->
  (bar ...))
I consider this to be a perfectly reasonable downside in exchange for increased speed.
"Why does it pass { bar: g1 } as the first argument to nulan.macro_?"
That object becomes the "captures" property in env.foo.val.captures.bar later. In Penknife, that's how I let code have run time access to the lexical scopes of compile-time-expanded macros.
---
"Why do you do (0, g1.val)? That's the same as just using g1.val."
I like to go out of my way not to pass a "this" parameter unless I actually want to pass it. If I were hand-writing this code, I would have probably used a fresh local variable to hold g1.val, but this (0, g1.val)( ... ) approach is sufficient for compiler output.
Well, this behavior is the one that surprised me initially:
o.foo() -> Object
I expected the property access and the procedure call to be two completely separate steps. Instead, there's a secret avenue that somehow exposes my local variable "o" to the callee without me explicitly passing it in!
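Here's that surprise in runnable form (just a throwaway object):

var o = {
    foo: function () { return this; }
};

o.foo() === o;   // true: the method-call syntax quietly hands o to the callee

var f = o.foo;
f() === o;       // false: a plain call sees the global object as "this"
                 // (or undefined in strict mode), even though f and o.foo
                 // are the very same function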
After this encounter, I reasoned that a.b( ... ) must be its own syntax, only superficially related to a.b and b( ... ). My choice to use this syntax is explicit. This intuition worked well for a while, but then this behavior surprised me:
(o.foo)() -> Object
((o.foo))() -> Object
So it's its own syntax, but it also supports grouping parentheses? What for?
Turns out the spec actually specifies this behavior in terms of "References." An expression doesn't just return a value; it returns a Reference that can be either resolved to a value, assigned to, deleted, passed to typeof, or called as a method call. (Maybe there are some other options too.)
In the spec's own words:
"The Reference type is used to explain the behaviour of such operators as delete, typeof, and the assignment operators. For example, the left-hand operand of an assignment is expected to produce a reference. The behaviour of assignment could, instead, be explained entirely in terms of a case analysis on the syntactic form of the left-hand operand of an assignment operator, but for one difficulty: function calls are permitted to return references. This possibility is admitted purely for the sake of host objects. No built-in ECMAScript function defined by this specification returns a reference and there is no provision for a user-defined function to return a reference. (Another reason not to use a syntactic case analysis is that it would be lengthy and awkward, affecting many parts of the specification.)"
An ECMAScript Reference is similar to what I independently called a "fork" in Penknife. Forks were my approach to unifying Arc-style metafns, setforms, and macros into a single syntactic mechanism. A Penknife syntactic abstraction would return a fork value, and that value would have behaviors for parsing a macro call body, for setting, for getting, and for acting as a top-level command. The result of the parsing behavior would be another fork, so this branching process could handle multiple layers of brackets, metafn style. (Now I understand this as a monadic, coinductive approach. Yay big words.)
I've compared JavaScript's method invocation to metafns before: http://arclanguage.org/item?id=12094. I was somewhat supportive of the combination in that post, but I think Arc, JavaScript, and Penknife just use this technology for name punning, rather than resolving any essential conundrums of language design, so I'm not very enthusiastic about the technique in general.
P.S.: I've also casually explained this part of JavaScript's method invocation to you before: http://arclanguage.org/item?id=14665 But it was technically a different example, so no worries. ;)
Yes, the whole "this" system in JS is pretty insane. Naturally Nulan solves this by requiring you to explicitly pass in the object. This also gives you the "call" and "apply" behavior of JS for free, in an orthogonal way.
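Roughly the idea, in plain JS (the invoke helper is invented for illustration, not Nulan's actual output):

// When the receiver is always an explicit argument, JS's call/apply behavior
// falls out for free: reusing a "method" on another object is just passing a
// different first argument.
function invoke(self, fn) {
    var args = Array.prototype.slice.call(arguments, 2);
    return fn.apply(self, args);
}

function greet(name) { return this.greeting + ", " + name; }

invoke({ greeting: "hi" }, greet, "world");   // "hi, world"
invoke({ greeting: "yo" }, greet, "world");   // "yo, world"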
P.S. The JS version of Nulan is coming along quite nicely. 25 working macros and 10 semi-working macros. It can handle a ton of stuff already, but it still has a long way to go!