Cool. I wondered when Ruby would get something like that.
Thankfully, I believe Nulan avoids most of that issue by A) using global names, so you could simply define a new "camelize" function, no need to attach it to the String class, and B) a combination of mutable variables, hyper-static scope, unique types, and Arc-style "extend" to control extensibility.
I'd be interested in seeing some examples where Ruby's refinements are better than Nulan's system.
---
Some interesting points:
"Because there's no "using" anywhere in the code, and we're not extending some other class, most folks will assume we're simply concatenating strings here. After all, why would I expect my "+" call to do something else? Why should my "+" call ever do something else here?"
"The "module_eval" behavior of refinements simply goes too far. It forces every Ruby programmer to second-guess every block of code they pass into someone else's library or someone else's method call. The guarantees of standard method dispatch no longer apply; you need to know if the method you're calling will change what calls your code makes. You need to understand the internal details of the target method. That's a terrible, terrible thing to do to Rubyists."
I'm fairly sure Nulan avoids both these issues, but I think this kind of "magical action at a distance" is consistent with Ruby's hyper-liberal viewpoint.
Hm, thinking about it some more... Nulan does technically suffer from the same problems as Ruby, because of mutable variables. I took this example (https://gist.github.com/4110634#file_ref_8.rb) and tried rewriting it in Nulan style:
# library1
(types %inject)

(def (add-all [ %inject f ])
  (f "" -> str acc (+ acc str)))

# application
(import library1)

(def (my-+ x y) "@x plus @y")

(def (my-array)
  [ %inject -> str f
    (do (each {"foo" "bar" "qux"} -> x
          (set! str (let! + my-+ (f x str))))
        str) ])
In this case, we're using "let!" to dynamically rebind "+" to "my-+". I think this is okay, though, for two reasons:
1) It's not idiomatic to assign to variables. Instead, it's idiomatic to use unique types to create extensible functionality. Mutability is discouraged in general (though allowed).
2) It's also possible to protect against such things, if you really want to. For instance, you could do this:
This is clunky, though, having to manually make everything safe to use. If I really wanted to, I could make a macro that automatically protects against such things:
But I, personally, don't see much reason for that since as point 1 said, it's not idiomatic to assign to variables.
Of course, I guess the same argument could be used for Ruby... if refinements aren't idiomatic, then it's not a big deal, right? Well, except that mutable monkey-patching actually is idiomatic in Ruby; refinements are just a way to scope it.
In any case, I feel that due to Nulan's different idioms and such, the kinds of problems Ruby has generally aren't an issue.
"In any case, I feel that due to Nulan's different idioms and such, the kinds of problems Ruby has generally aren't an issue."
It perplexed me that you didn't take that point of view in the first place. :) From that blog post, Ruby's refinements strike me as a snowball of complexity arising from the existing use of monkey patching in the wild. Nulan has no entrenched user base and no monkey patching mechanism (that I can see offhand), let alone a culture of monkey patching, so you have little to worry about. :-p
---
"It's not idiomatic to assign to variables [in Nulan]."
Well, I wonder what Ruby idioms were like before Rails came along! EDSLs often use every readability-enhancing language feature they can get their paws on, regardless of the intended stability of those features.
I'm a culprit of this. Consider my Arc module system, where namespaces work by assigning macros to the global scope, calling 'eval on a module body, and removing the macros again. When my module system didn't work in Jarc and Rainbow, I considered Jarc and Rainbow to be nonconformant, even after jazzdev and conanite fixed most of the obvious bugs and probably conformed to pg's intended design about as well as pg-Arc did. :)
---
Blog post: "It forces every Ruby programmer to second-guess every block of code they pass into someone else's library or someone else's method call."
You: "I'm fairly sure Nulan avoids [this issue]"
Actually, since Nulan's an fexpr language, its programmers potentially have to second-guess how every single expression will be evaluated. XD Yes, you may have idioms to manage this paranoia, but that doesn't make Nulan exempt from the blog post's point, since Ruby could develop idioms for this too.
---
"[...]but I think this kind of "magical action at a distance" is consistent with Ruby's hyper-liberal viewpoint."
Yeah, when the blog post talks about a very dynamic language and warns about how certain features might threaten to break its local static invariants, there's a sense of whiplash there. :-p But it does make sense to care about local static reasoning in this case, since that's (at least informally) what refinements are supposed to bring to monkey patching in the first place.
"My pet peeve: Lisp's multiple flavors of equality are still with us 50 years later, leading later languages like javascript astray. The superior approach IMO is to use structural equality by default, and offload the semantics of eq to an operator that returns a stable address for a value. In my toy language I call this operator addr. So I'd replace (eq a b) in Common Lisp with something like (equal (addr a) (addr b))."
Nulan, of course, uses an "egal" I wrote for Racket:
; Generic equality predicate, based on egal
; http://home.pipeline.com/~hbaker1/ObjectIdentity.html
(define (is? x y)
  (if (number? x)
      (if (number? y)
          (= x y)
          #f)
      (if (immutable? x)
          ; TODO: do I need to check if y is immutable too?
          (equal? x y)
          ; need to use eqv? for characters
          (eqv? x y))))
In Nulan it's called "is".
The reason multiple equalities came about is that cons cells were mutable but were treated as if they were immutable: most of the time you want to use equal on them, but occasionally you need eq.
If you instead have immutable conses and mutable conses, "egal" can correctly distinguish between them. The problem, then, is that we have a data structure (cons) that tries to be both mutable and immutable at the same time.
The solution is to properly separate mutability from immutability.
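For what it's worth, the same separation can be sketched in Python, which already splits some types into mutable/immutable pairs (list/tuple, set/frozenset). The egal function below is a hypothetical illustration, not something Nulan or Python actually provides: immutable values compare structurally, mutable ones by identity.

```python
# Hypothetical egal-style predicate: structural equality for immutable
# values, identity for mutable ones.
def egal(x, y):
    atoms = (int, float, complex, str, bytes, bool, type(None))
    if isinstance(x, atoms) and isinstance(y, atoms):
        return x == y
    if isinstance(x, tuple) and isinstance(y, tuple):
        # Immutable sequence: recurse on the components.
        return len(x) == len(y) and all(egal(a, b) for a, b in zip(x, y))
    if isinstance(x, frozenset) and isinstance(y, frozenset):
        return x == y
    # Mutable containers (list, dict, set, ...) and everything else:
    # only the object itself is operationally equivalent to itself.
    return x is y
```

So egal((1, 2), (1, 2)) is true, while egal([1, 2], [1, 2]) is false, matching Baker's operational-equivalence criterion.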
---
"Most programmers don't actually use eq that often. Python and ruby and perl still don't have it. No vaguely insulting muttering has resulted. (Is it really a big enough inconvenience to cause even muttered insult?) Most of the time eq is just an optimization, usually a premature optimization, and even otherwise one that can mostly be delegated to the implementation."
Actually, that's incorrect. Most languages have only eq and no equal: the "==" operator in Ruby/Python compares by object reference.
The reason they don't complain about it is because they never use "equal" because in those languages, every object is mutable.
But in Lisps, we treat cons cells as immutable, hence the issues with eq vs equal. So the reason they don't complain in Ruby/Python is simply because their objects are always mutable, thus eq is always the correct operator.
irb(main):001:0> class A
irb(main):002:1> end
=> nil
irb(main):003:0> a = A.new
=> #<A:0x8f6dd3c>
irb(main):004:0> a == a
=> true
irb(main):005:0> a == a.clone
=> false
irb(main):006:0> x = "abc"
=> "abc"
irb(main):007:0> x == x
=> true
irb(main):008:0> x == x.clone
=> true
irb(main):009:0> (1..10) == 10
=> false
irb(main):010:0> (1..10) === 10
=> true
- Perl (near as I can tell) doesn't have anything in the way of user-definable comparison, but still has separate operators. Granted, I don't think the separation is related to the reference-vs-value thing---I don't do Perl much, so the semantics confuse me:
It'll be hard to appeal to the authority/popularity of most languages, because equality is a tricky concept with many meanings---whether or not it's specifically the address-vs-value (eq-vs-equal) thing.
I like egal fine if we decide to tell the type system about immutability, but that seems like a major design decision. I'm not sure this one feature is sufficient reason.
I don't understand the paper very well (thanks for the pointer). I'm not sure why he cares about identity so much if most of the time we're treating lists as immutable. And if a list is mutable, equality is not forever; all it returns is whether these two lists are equal right now.
The only new drawback I learned about was that equal can't handle cycles. I hadn't thought of that. It's not hard to fix, but the obvious fix costs performance every single time.
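CPython makes the cycle problem concrete: list == recurses structurally, with only a per-element identity short-circuit, so comparing two distinct but structurally identical cycles blows the stack. A quick sketch of CPython's behavior:

```python
# Structural equality diverging on cycles (CPython behavior).
a = []
a.append(a)        # a is now a cycle: a[0] is a
b = []
b.append(b)        # b is a structurally identical but distinct cycle

same = (a == a)    # True: elementwise compare short-circuits on identity
try:
    a == b         # naive structural recursion never bottoms out
    diverged = False
except RecursionError:
    diverged = True
```

The identity short-circuit saves a == a, but nothing saves a == b short of real cycle detection.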
I'm going to go reread it, but so far I think it suffers from being written ten Moore's-law generations ago, and complects performance considerations with API design.
"Most languages have only eq and no equal: the == operator in Ruby/Python compares by object reference."
$ python
>>> [0, 1, 2] == [0, 1, 2]
True
I'm pretty sure those two literals turn into objects with different 'identities'.
Most of the time I don't care what 'identity' an object has. If I'm doing weird stuff with assignment and mutable structures I'm happy for the language to push back and make me go back to the straight and narrow asap.
"The only new drawback I learned about was that equal can't handle cycles. I hadn't thought of that. It's not hard to fix, but the obvious fix costs performance every single time."
Just as a point of interest, Racket handles this with its built-in types:
I wasn't able to post this earlier, due to the Arc Forum being all wonky, so I'll post it now.
---
"I'm pretty sure those two literals turn into objects with different 'identities'."
Weird, I distinctly remember Python returning false for that... I guess I was wrong about == in Python/Ruby.
Oh, I see, I was thinking about the "is" operator, which does return False, but is rarely used. So you were right: Python uses equal by default for arrays/dictionaries. However, that only applies to the built-in types:
>>> class Foo:
... pass
>>> Foo() == Foo()
False
User-created types can add an __eq__ method to implement "equal" testing, but it's not the default.
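For example, with a hypothetical Point class:

```python
# Default class equality is identity; __eq__ opts in to structural equality.
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

class EqPoint(Point):
    def __eq__(self, other):
        return isinstance(other, EqPoint) and \
               (self.x, self.y) == (other.x, other.y)

assert Point(1, 2) != Point(1, 2)      # default: compares object identity
assert EqPoint(1, 2) == EqPoint(1, 2)  # structural, via __eq__
```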
JavaScript, however, uses object reference for both "==" and "===":
[1, 2, 3] == [1, 2, 3]   // false
[1, 2, 3] === [1, 2, 3]  // false
Which, as far as I'm concerned, is the correct semantic, because arrays are mutable.
"I think I don't understand why object reference is worth a special operator (besides performance reasons)."
The point of "egal" is that you don't have a special operator: you use the same operator for both mutable and immutable objects. It's wart (and Arc/Common Lisp/Scheme) that has a special operator for reference purposes. Just because you call it "addr" rather than "eq" doesn't make it any less of a special operator. And I doubt you put it into wart for performance reasons.
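Incidentally, Python already ships something close to wart's addr: id() returns an identity that is stable for the object's lifetime, so reference equality can be spelled as an id comparison, exactly in the shape of the (equal (addr a) (addr b)) proposal.

```python
# id() as an 'addr'-style operator: eq reduces to comparing identities.
a = [1, 2]
b = [1, 2]
assert a == b                        # structurally equal
assert id(a) != id(b)                # but distinct identities
assert (a is b) == (id(a) == id(b))  # 'is' and addr-comparison agree
```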
The "egal" article actually defines equality as "operational equivalence":
Two objects are "operationally equivalent" if and only if there is no way
that they can be distinguished, using ... primitives other than [equality
primitives]. It is guaranteed that objects maintain their operational
identity despite being named by variables or fetched from or stored into
data structures. [Rees86,6.2]
In other words, it is possible to distinguish two mutable objects by mutating one of them and checking to see if the same change occurs in the other object. But this is not possible with immutable objects.
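In Python terms (lists standing in for mutable conses), the mutation test looks like this:

```python
# Two mutable objects that look the same can be told apart by mutation.
a = [1, 2, 3]
b = [1, 2, 3]
assert a == b      # structurally equal right now
a.append(4)
assert a != b      # mutating one exposed that they were never the same

c = a              # an alias: genuinely the same object
c.append(5)
assert a == c      # no mutation can ever tell a and c apart
```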
So my stance is simple: immutable objects should be compared by recursing on their components, like how "iso" works in Arc. Mutable objects should compare by reference equality, because the whole point of mutation is that you can change the object without affecting other objects which have the same subcomponents. That has to do with the conceptual model of how the objects are used and has nothing to do with performance.
The reason the article talks about performance at all is that immutability is much preferred in parallel computing: most Lisps treat cons cells as immutable, but they're not actually immutable, so a parallel system has to have extra infrastructure to support mutable conses, even though they're almost never mutated. In any case, the point the article is making is valid even if you ignore performance:
There are several problems with EQUAL. First, it may diverge in the
presence of directed cycles (loops) in one of its arguments, although some
(e.g., [Pacini78]) have suggested more sophisticated predicates capable of
detecting such cycles. Secondly, it is not referentially transparent; two
calls to EQUAL on the "same" (i.e., EQ) arguments can produce different
results at different times. Neither of these possibilities is to be desired
in a programming language primitive equality test because we would like such
a test to always return and we would like object identity to be preserved.
Yet EQUAL is an extremely valuable operation, because the vast majority of
Lisp lists are side-effect free--i.e., "pure". Without side-effects, loops
cannot be constructed and sublists cannot be modified, and within these
restrictions EQUAL becomes a well-defined and useful operation.[8]
This tension between the simplicity of EQ and the usefulness of EQUAL has
led to a great deal of confusion. This confusion has now lasted for 30
years, with additional confusion having been recently added by Common Lisp.
Since neither EQ nor EQUAL has the desired semantics for the multiplicity of
Common Lisp datatypes, Common Lisp added six more--EQL, EQUALP, =, CHAR=,
STRING=, and STRING-EQUAL, yet these also lack the correct semantics.[9]
Scheme continues this confusion by offering the three predicates (eq?, eqv?
and equal?) which are roughly equivalent to Common Lisp's EQ, EQL and
EQUALP.
In other words, if you have mutable cons cells that are almost always treated as immutable, you need both eq and equal. But if you create two datatypes: mutable cons and immutable cons, then you only need "egal". With immutable conses, you don't want to use "eq" on them because they're functional: letting you grab the reference of an immutable cons would be an abstraction leak.
But with mutable conses, you do need "eq", because two cons cells that are equal may not be eq. So, if your language distinguishes between mutable and immutable conses, you only need "egal". This not only prevents abstraction leaks, but it also preserves operational equivalence, which means that if "egal" says that two objects are equal, then it's impossible to distinguish them in any way. This is not true with eq/equal.
As a practical example of why you would want that... look at Racket. It has three kinds of hash tables: hash-eq, hash-equal, and hash-eqv. And it has all kinds of issues if you use a mutable object as a key in an "equal" hash:
Caveat concerning mutable keys: If a key in an equal?-based hash table is
mutated (e.g., a key string is modified with string-set!), then the hash
table’s behavior for insertion and lookup operations becomes unpredictable.
So you, the programmer, have to decide every time you create a hash table which kind of equality it should use. And if you choose wrong, you end up with issues. And it also means you can't mix mutable and immutable keys in a hash table. This is ridiculous. With "egal" this isn't a problem.
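For contrast, Python's way out is to make the mutable built-ins unhashable, so the unpredictable case can't arise at all, at the price of banning those keys outright:

```python
# Mutable built-ins are rejected as dict keys; immutable ones are fine.
d = {}
try:
    d[[1, 2]] = "list key"     # list is mutable, hence unhashable
    rejected = False
except TypeError:
    rejected = True

d[(1, 2)] = "tuple key"        # tuple is immutable, hence hashable
```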
Notice that none of this has to do with performance. It has to do with the conceptual model of how objects behave.
Thanks for all that detail! Yeah, using lists and hashes as hash keys is kinda crazy. And mutating such hash keys is utterly crazy.
I think the crux is this: "..it is possible to distinguish two mutable objects by mutating one of them and checking to see if the same change occurs in the other object."
This still feels like a crazy abstraction. If two mutable objects look the same I want equality to say they are the same by default, not go into contortions to cover its ass because I might choose to mutate them and I might then care about reference equality.
However I have a lot more respect after your comment for how egal can simplify lots of thorny concurrency issues.
This is a matter where Pauan and I agree on the ideal semantics completely. I seem to remember us advocating egal a few other times (though I don't think I've ever called it "egal").
That said, I don't care if this functionality is provided as a two-argument function or as (addr ...).
---
"Yeah, using lists and hashes as hash keys is kinda crazy. And mutating such hash keys is utterly crazy."
I built Cairntaker on the idealistic notion that every single first-class value is a mutable or immutable weak table. That includes the table keys. Cairntaker does make compromises for performance and FFI, but none that betray this ideal.
Even when I'm programming normally in Arc or JavaScript, I use tables with immutable list keys all the time. It doesn't seem unusual to me.
"This is a matter where Pauan and I agree on the ideal semantics completely."
Quite the rare thing indeed.
---
"I seem to remember us advocating egal a few other times (though I don't think I've ever called it "egal")."
I personally haven't. I only recently started to actually care about egal because of Nulan using immutable objects.
I do remember us having discussions before about making "iso" the default equality in Arc, which is what wart does, but I don't recall us talking about cons mutability.
However, I can't find any more than that. Maybe I just took it for granted at that point.
It is the policy I applied in Cairntaker. Cairntaker interns all immutable tables so they can be compared for deep equality even after some of their entries have been garbage-collected. Equality support is essential in Cairntaker since any value can be used as a table key.
"Even when I'm programming normally in Arc or JavaScript, I use tables with immutable list keys all the time. It doesn't seem unusual to me."
You silly. Neither JavaScript nor Arc has immutable lists. Well, I guess you could use Object.freeze on an array in JS, but... even if you did that, it would just serialize the array to a string.
Lol, yep. The way I "implement" immutable lists in practical programming is by not modifying them. I rationalize that their static type would enforce this if only there were a type system. :-p
More specifically...
In Arc I can't really rely on using lists as keys, since (IIRC) Jarc converts table keys to strings using the same behavior as 'write. However, in practice I have used lists of alphanumeric symbols, since those have a predictable 'write representation on Jarc, a predictable Racket 'equal? behavior on pg-Arc, etc.
Similarly, in JavaScript I tend to use Arrays of strings, storing them as their ECMAScript 5 JSON representation.
Yeah I take that back about immutable list keys :)
This is all most interesting. I'm going to keep an eye out for nice programs that exploit structural equality on mutable structures. Y'all should do the same!
"This still feels like a crazy abstraction. If two mutable objects look the same I want equality to say they are the same by default, not go into contortions to cover its ass because I might choose to mutate them and I might then care about reference equality."
Then make the objects immutable. Then equal makes sense. Then the compiler/interpreter knows what your intent is. Then you avoid all these issues. Then you don't even need a distinction between eq/equal: you can just use egal.
This seems to me to be the obvious choice, but it seems we disagree.
Yeah, I didn't fully understand egal when I said I agreed with it if the language distinguishes immutable objects. I want my equality to be more lenient in comparing mutable objects. Even with mutable objects it seems a lot more common that I just care about structural equality rather than identity.
Basically. You could drop down and use Racket's "fixnum?" and "integer?" functions, but Nulan doesn't provide them. If you find any way to distinguish them, I'd like to hear it.
I prefer to view integers as being a subset of floating point, so I prefer to see 1 and 1.0 as identical. Is there a reason not to?
Does Nulan give access to 'write? That would distinguish them.
"I prefer to view integers as being a subset of floating point[...]"
I don't know about that. What happens with very large, but exact, integers?
---
"[...]so I prefer to see 1 and 1.0 as identical. Is there a reason not to?"
Actually I think there is a reason. We're talking about exact and inexact numbers here. If you want to know that a number is exactly 1, then 1.0 doesn't give you enough information. :-p
Personally, I think floating point numbers mostly get in the way. I'd rather treat them as a separate concept, even if it means using a special +. operator like OCaml.
I don't see much need for complex numbers, rational numbers, or negative numbers either, but at least their addition and multiplication are still associative.
"Does Nulan give access to 'write? That would distinguish them."
Not if I change printing to always use floating point, or print exact floating points as integers. :P I actually did something like that with Arc/Nu: I was tired of typing in (/ 1 5) and getting back 1/5, so I changed Arc/Nu to print exact numbers as if they were inexact, so it would print 0.2. This only affects the printing, not the actual math stuff.
---
"I don't know about that. What happens with very large, but exact, integers?"
I dunno, what happens in Racket?
---
"Similarly, in JavaScript I tend to use Arrays of strings, storing them as their ECMAScript 5 JSON representation."
Actually, this is going a bit off topic, since it doesn't especially matter to me whether exact integers are a subset of inexact numbers. I thought I even deleted this part of my reply, but oh well. XD
Looks like the Racket docs say[1] Racket's inexact numbers are IEEE floating-point numbers of single or double precision, with double being the default. So they don't preserve integers beyond 53 bits:
Odd, this suggests Racket double-precision floating-point numbers have only 51 bits of mantissa rather than 52, so maybe they actually only preserve integers up to 52 bits...?
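The 52-vs-53 confusion is just the implicit bit: an IEEE double stores 52 explicit fraction bits plus an implied leading 1, for 53 significant bits. So exact integer representation runs out precisely at 2**53, which is easy to check:

```python
# Doubles have 53 significant bits (52 stored + 1 implicit), so integers
# are preserved exactly only up to 2**53.
assert float(2**53) == 2**53          # exactly representable
assert float(2**53 + 1) == 2**53      # rounds back down: first gap
assert float(2**53 - 1) == 2**53 - 1  # below the limit, still exact
```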
I "solved" this issue by simply making all numbers 64-bit floating point. The change was really easy to make. I didn't do it because of this issue, though. I did it because I want to eventually compile to JS, and JS only has 64-bit floating point.
Though, it also makes it easier conceptually to only have to worry about floating point, as compared to floating point + integer + rational + complex + whatever else Racket has. Even if you divide numbers into exact/inexact like Racket does, that's still more complicated than only having a single numeric type.
No, they certainly aren't. Just like you can have dynamic/lexical function scope, you can have dynamic/lexical global scope. There's no difference between global/local. There is a difference between lexical/dynamic.
Most languages (that I know of and use) have lexical scope for functions but dynamic scope for globals. And I've found that all the issues with name collisions are traced back to this global dynamic scope.
If you instead make the global scope lexical, there are no more namespace issues.
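Python, for contrast, has exactly the dynamic-by-default globals being criticized here: a global reference is looked up at call time, so rebinding the global retroactively changes every function that mentions it. A small sketch of that late binding:

```python
# Globals are late-bound: f's reference to g is resolved at call time.
def g():
    return "original"

def f():
    return g()

first = f()                  # "original"
g = lambda: "replaced"       # rebind the global...
second = f()                 # ...and f silently changes behavior
```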
"Do you mean allowing globals to be lexically bound?"
I mean that the default binding for globals should be lexical. Arc and JavaScript do not have lexical globals. At all. Ever. It's just not possible.
Racket's modules use dynamic binding inside the module, but lexical binding outside. This is reasonable, but Racket modules are clunky and big and complicated and such. And they disallow certain dynamic features that I like, such as assigning to a variable from another module.
Racket modules are hard to implement, hard to understand (once you get down to the nitty gritty), and they're much too rigid and staticy.
I want something that's simple to implement, simple to understand, concise, flexible and dynamic, no boilerplate, and yet has all the power of Racket's modules. Using lexical global scope + a few utilities can achieve that.
---
"Is it a bad idea?"
I think dynamic scope is both useful and practical. I just don't think it should be the default. So, yes, having a distinction between "let" and "making" is good. But lexical should be the default.
"No, they certainly aren't. Just like you can have dynamic/lexical function scope, you can have dynamic/lexical global scope. There's no difference between global/local. There is a difference between lexical/dynamic."
We're arguing semantics here, but the way I see it, "lexical" means "local to a chunk of code," and "dynamic" means "local to a run time stack frame and its children." "Global," on the other hand, means "local to nothing."
While I'd put away these definitions and accept yours for the sake of argument, I'm having trouble figuring out what yours even are, and it looks like akkartik is too.
---
"Racket's modules use dynamic binding inside the module, but lexical binding outside."
Huh? If you could explain this one sentence in terms other than "dynamic" and "lexical," maybe that explanation could clarify what your definitions are all on its own. No promises though. ^_^;
---
"And they disallow certain dynamic features that I like, such as assigning to a variable from another module."
Racket modules don't provide that feature by default, but the person defining the module does have the option to provide it. This makes it easy for the author of an existing module to introduce mutability without breaking backward compatibility. Since you like immutability by default anyway, I'm surprised you take the opposite stance for module-level variables.
Hmm, actually, my definition of "dynamic" scope might be "not generally possible to associate a usage site with a particular definition site at compile time." I've referred to mutability as a kind of dynamic scope before.
This definition connects with the "not at compile time" meaning of the word "dynamic," and I wonder if it was one of the main complaints lexical scope advocates had against the status quo. I seem to remember reading about that history somewhere, but I don't have a citation, so I could just be making up rumors. :-p
* [1]: Because I have had a hard time using the Arc Forum lately, akkartik, myself, and rocketnia had an e-mail conversation. I compiled this conversation into an HTML page where it can be read:
rocketnia: "Hmm, actually, my definition of "dynamic" scope might be "not generally possible to associate a usage site with a particular definition site at compile time." I've referred to mutability as a kind of dynamic scope before."
I'm fine with that definition.
---
rocketnia: "We're arguing semantics here, but the way I see it, "lexical" means "local to a chunk of code," and "dynamic" means "local to a run time stack frame and its children." "Global," on the other hand, means "local to nothing.""
I'm not a fan of those definitions, in large part because the way I'm implementing Nulan is based on first-class environments, and thus "dynamic" means "evaluated in the environment where the function is run" and "static/lexical" means "evaluated in the environment where the function is defined".
I think that's more useful conceptually, rather than bringing in things like stack frames and a distinction between global/local: Nulan treats global environments exactly the same as local environments.
---
rocketnia: "Racket modules don't provide that feature by default, but the person defining the module does have the option to provide it. This makes it easy for the author of an existing module to introduce mutability without breaking backward compatibility. Since you like immutability by default anyway, I'm surprised you take the opposite stance for module-level variables."
Yes and I don't like having to go through those extra hoops. I also dislike the inconsistency. Which is why Nulan uses purely static environments everywhere, and it lets you assign to variables defined in another module. I think this is simpler to reason about and is more powerful too.
As for immutability... that's not correct. I don't value immutability by default. I value immutability for practical reasons. All of Nulan's datatypes (cons, object, number, environment, etc.) are immutable because I've found that to be remarkably useful. But I've come to the conclusion that making variables mutable by default (like Racket and Arc do) is perfectly fine, from a practical perspective.
And even if I did decide to make variables immutable by default, I'd still need to provide some way to easily create mutable variables in the situations where they're useful. And I'd like to allow for modules to modify these mutable variables even if they're defined in a different module.
You might be thinking, "but what if I want to write a module that doesn't allow for mutation?!" My solution is very simple:
(var foo 5)
... do stuff with foo ...
(const foo foo)
The "var" function creates a mutable variable and the "const" function creates an immutable variable. So, the above first creates a mutable variable, which can be mutated by the module. At the end of the module, it replaces it with an immutable variable, and due to the static global scope, this works out perfectly fine.
In fact, you could make this even stricter and delete the variable entirely, effectively making it private to your module:
(var foo 5)
... do stuff with foo ...
(unvar foo)
And now you have created a global mutable variable "foo" which is private to your module. And once again, this works just fine because the global variable is static. And if you would prefer to not put "unvar" at the end of the module, you could use something like "exclude":
(exclude {foo}
(var foo 5)
... do stuff with foo ...)
I think this is a better tactic than Racket's "you have to jump through some extra hoops to mutate things".
---
akkartik: "I don't understand how a global binding can have a lexical scope if it's not attached to a block. Do you want it attached to a file/module/namespace by default?"
Sorta. In this thread[1] I already explained my views, but to summarize, I think functions should only be able to refer to variables that exist at the time the function is defined. In other words, no late binding of global variables.
Alternatively, you could go with Racket's system which uses lexical scope at the module level, and uses modules for everything by default. That works just fine too. The important thing isn't which particular lexical scope system your language uses, the important thing is that it doesn't use dynamic scope by default.
This is an extremely clarifying post. I think it would have headed off a few of the misunderstandings we had over email. XD
Now to reply to the later parts of that email exchange...
---
Me: "Pauan, when you show code examples that demonstrate how you'd deal with certain library dependency corner cases in Nulan, are you considering that the application developer can't go in and change the code of the existing libraries?"
Pauan: "Yes, definitely."
Then I'll respond to your examples individually.
Regarding "two libraries [...] use the same namespace and have name collisions," we've already started to talk about this case, so I'll skip it here and address it below.
Regarding "two libraries, each of which use different, incompatible versions of some third library," the code you posted doesn't comprise a solution. How do you keep library1 and library2 from using the same version of library3?
Regarding "they'd like the new library to be able to occupy both original spaces," your code is again incomplete. If the two originals are library1 and library2, an intermediate library library3 depends on library1, and the new one is library4, then how do you get library3 to start using library4 instead of library1? Keep in mind you can't modify any of the libraries.
Regarding the licensing example with "they'd like it to be a drop-in replacement [...] An application I'm working on uses two libraries, one of which requires the replacement library [...] The other [...] uses the original basically by default": I think they're asking for a mechanism that lets the application choose whether the "other" library uses the original or the replacement. I don't see how your code makes this possible, and your "this could instead import replacement" comment suggests that you'd actually modify library2.
Regarding "importing a package should only introduce the names that library explicitly exports," I agree you've got the important parts of this covered.
Regarding "conrol the names you import [...] such that [...] a new export to an existing library will not affect the meaning," it looks like you do have a good response to this. Personally, I wouldn't want there to be any imports that could possibly break that way, but I guess I'm more picky than they are. :)
---
Pauan: "That's the point of lexical scoping: they can use the same names and
not collide with eachother."
You missed my point. If the two libraries really use the same namespace, your code for importing both of them will turn into something like this:
(import library1)
(import library1)
...
Obviously this extrapolation of your code is a degenerate case, and I bet you wouldn't really suggest it. But the challenge itself is degenerate in exactly this way.
I often consider what would happen if two versions of a single library were both useful in a single application. If the library author didn't anticipate and avoid self-conflict, the two versions would host a train wreck of collisions on every single name, even the library name. I think that's the kind of challenge Casey McCann has in mind here.
I fail to see the issue. In library1's lexical scope it will use library3-version1, and in library2's lexical scope, it will use library3-version2. I have no clue what issue you're thinking of.
---
rocketnia: "Regarding "they'd like the new library to be able to occupy both orignal spaces," your code is again incomplete. If the two originals are library1 and library2, an intermediate library library3 depends on library1, and the new one is library4, then how do you get library3 to start using library4 instead of library1? Keep in mind you can't modify any of the libraries."
In other words, you have two libraries. Then they create a third library which merges the two libraries together. Existing code that relies only on library1 will continue to work. Existing code that relies only on library2 will continue to work. New code that wishes to use either library1 or library2 can instead use library3, which combines both library1 and library2.
There's no need to change application1 and application2: they can just keep using library1 and library2. You might say that's inefficient and you'd rather them use library3, but that's up to them whether they want to switch or not. I'm also imagining the situation that library3 actually does cause some backwards compat issues, in which case it would be desirable to keep using library1/library2. The application chooses which libraries to import.
You seem to be suggesting some mechanism in which a module can actually modify what a different module imports and exports. I fail to see the benefit in that: that module's code is written under the assumption that its imports won't be changing. Could you provide an example where that would be useful?
---
rocketnia: "Regarding the licensing example with "they'd like it to be a drop-in replacement [...] An application I'm working on uses two libraries, one of which requires the replacement library [...] The other [...] uses the original basically by default": I think they're asking for a mechanism that lets the application choose whether the "other" library uses the original or the replacement. I don't see how your code makes this possible, and your "this could instead import replacement" comment suggests that you'd actually modify library2."
No I do not think that's what they're asking. I think what they're asking is: can the original library coexist at the same time as the replacement, even though they have name collisions. Notice they used the word "reconcile". This suggests reconciling the name collisions. There's no need to "reconcile" the other library to use the replacement: it can just keep using the original.
But if I'm wrong and they were in fact not talking about name collisions at all but about the orthogonal issue of replacing another module's imports... well, no, Nulan doesn't let you do that, because I don't see any useful benefit from doing so.
In any case, that's a completely different issue from avoiding name collisions, and seems to me like it'd be a part of some sort of build system or dependency resolution system, like RubyGems. That might be useful, but I don't see it as being a part of the namespace system per se.
---
rocketnia: "Regarding "conrol the names you import [...] such that [...] a new export to an existing library will not affect the meaning," it looks like you do have a good response to this. Personally, I wouldn't want there to be any imports that could possibly break that way, but I guess I'm more picky than they are. :)"
Yup, I try to be pretty laid back (at least about my code). If you're anal about it, you can use "include", but I think it's nice to have the option of just doing unqualified imports. It's up to the library/application author, in my opinion.
---
rocketnia: "You missed my point. If the two libraries really use the same namespace, your code for importing both of them will turn into something like this:"
Nope, I don't think namespace == module. You are importing the same module twice, but it's in the same namespace. And if you did this:
(import library1)
(import library2)
You're importing two different modules, but they're in the same namespace. In other words, "library2" can see the stuff exported by "library1", even though it didn't import it.
You might say this is undesirable. I agree it can be undesirable in certain situations, and I think the solution is to import each library in a separate namespace. But hyper-static scope can be implemented with a single namespace, so whether it's single-namespace or multi-namespace is an orthogonal issue.
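As a sketch, hyper-static scope with a single namespace can be modeled in Python (the `Env` class and its methods are invented for illustration, not Nulan's implementation): every definition yields a fresh environment, so code defined earlier keeps seeing the bindings it was defined against, even after a later definition shadows a name.

```python
class Env:
    """A hypothetical immutable environment for hyper-static scope."""
    def __init__(self, bindings=None):
        self.bindings = dict(bindings or {})

    def define(self, name, value):
        # Returns a NEW environment; the old one is untouched.
        new = dict(self.bindings)
        new[name] = value
        return Env(new)

    def lookup(self, name):
        return self.bindings[name]

e1 = Env().define("x", 1)

# Code "defined" now closes over e1, not over whatever comes later.
get_x = lambda env=e1: env.lookup("x")

e2 = e1.define("x", 2)   # shadows x for LATER code only

assert get_x() == 1          # earlier code still sees the old binding
assert e2.lookup("x") == 2   # later code sees the new one
```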
---
rocketnia: "I often consider what would happen if two versions of a single library were both useful in a single application. If the library author didn't anticipate and avoid self-conflict, the two versions would host a train wreck of collisons on every single name, even the library name. I think that's the kind of challenge Casey McMann has in mind here."
In that case whichever one is defined last "wins", and you can use "include" and "exclude" to control which ones you want. I could also provide a way to import a library into a separate namespace to solve that case.
I think that's useful, but hyper-static can be implemented without it. My point is that even if a language doesn't provide include/exclude/import-as, it'd still be better off using hyper-static rather than nothing at all.
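The include/exclude idea can be sketched in Python (`import_names` is an invented stand-in; Nulan's actual import forms may differ): importing filters which of a module's exported names land in the current scope.

```python
def import_names(exports, include=None, exclude=()):
    """Hypothetical import filter: 'include' whitelists names,
    'exclude' blacklists them; by default everything is imported."""
    names = include if include is not None else exports.keys()
    return {n: exports[n] for n in names if n not in exclude}

lib = {"foo": 1, "bar": 2, "qux": 3}

assert import_names(lib, include=["foo"]) == {"foo": 1}
assert import_names(lib, exclude=["bar"]) == {"foo": 1, "qux": 3}
```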
"In library1's lexical scope it will use library3-version1, and in library2's lexical scope, it will use library3-version2."
I'm assuming the example is supposed to look like this instead:
; library1 (coded with library3's version 1 in mind)
(import library3)
; library2 (coded with library3's version 2 in mind)
(import library3)
Each of these libraries would work with a certain library3, but library1 might not work with the newest version, or (in my dream world where everything is perfectly forward-compatible) maybe library1 is just considerably more performant with the old version.
---
"You seem to be suggesting some mechanism in which a module can actually modify what a different module imports and exports. I fail to see the benefit in that: that module's code is written under the assumption that its imports won't be changing. Could you provide an example where that would be useful?"
Here's Casey McCann's full description of the challenge again, just for clarity's sake:
Casey McCann: "At some point, two popular libraries with related functionality get tired of duplicating effort, so they collaborate to produce a third library providing the best of the shared features. To avoid unnecessarily breaking client code with the new versions, they'd like the new library to be able to occupy both original namespaces, or otherwise be easily substituted in. How would the namespace system handle this?"
I had it wrong before, when I made the sweeping statement that only the application's code should have to change in these challenges. This challenge is a case where the library developers are trying to change their code without causing disruption for anyone else.
Ideally, the only thing an application developer should have to do to upgrade is to replace the existing two libraries with the new one. This is even something that an end user might handle using a package manager.
---
"I think what they're asking is: can the original library coexist at the same time as the replacement, even though they have name collisions. Notice they used the word "reconcile". This suggests reconciling the name collisions. There's no need to "reconcile" the other library to use the replacement: it can just keep using the original."
I think the point is that the library has been upgraded, and the library developer would like the upgrade to propagate to all code that uses the library, even though a few library users might actually prefer otherwise. This is a conflict of preferences, and the language might give tools to certain participants to "reconcile" this conflict.
I prefer to view this as a last resort. The module system should give a complete(-feeling) range of freedom to every participant, and the ranges should not overlap. I would allow upstream developers to propagate their changes only as far as downstream developers have opted in. This turns into a design for package versioning.
---
"Nope, I don't think namespace == module."
I agree. The challenges are supposed to be realistic cases where name collisions would actually occur and cause havoc, so naturally they mostly involve programs made up of parts that have been written by multiple people independently. So we should expect to consider modules here, but yeah, the modules don't have to correspond one-to-one with namespaces.
However, I think Casey McCann is assuming a namespace system built on implicit name prefixes, which may or may not incorporate a way to take a name prefix and find a module that addresses it. While this may be a fine way to think of namespaces in Java, Racket, and Haskell, the challenges which arise from this assumption may not apply to Nulan. No worries. :)
That said, if you change Nulan's design in order to accommodate some of these other challenges, you may come back to this one and find you've broken it. That is to say, this is still a meaningful test case even if it passes at the moment.
---
"My point is that even if a language doesn't provide include/exclude/import-as, it'd still be better off using hyper-static rather than nothing at all."
By "nothing at all," do you mean a late-bound global scope like Arc's? I think I agree with you.
"I'm assuming the example is supposed to look like this instead"
Well, that either won't work (it would load the same "library3" in both cases), or it would work with a sufficiently smart lookup algorithm.
Basically, when importing a file, I think it should look for it in this order: the cwd, the folder (and subfolders) where the importee is defined, then lastly the folder where Nulan itself is defined.
In that case, assuming library1 and library2 are in different folders and have different versions of library3 in their path, it would work.
This implies bundling the libraries you use with your app, which you pretty much have to do anyways.
In that case, this has nothing to do with name resolution, it has to do with file resolution: finding which file to import when given a name.
You could argue that's a good case for a package manager like RubyGem, and that may be true, but I consider such a system to be in a layer above the namespace system.
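A minimal sketch of that search order in Python, assuming the three-step lookup just described (the function name and the ".nu" extension are assumptions; the subfolder search is omitted for brevity):

```python
import os

def resolve_import(name, importer_dir, nulan_dir):
    """Hypothetical file resolution: the current working directory
    first, then the importer's own directory, then the directory
    where Nulan itself lives."""
    for base in (os.getcwd(), importer_dir, nulan_dir):
        candidate = os.path.join(base, name + ".nu")
        if os.path.isfile(candidate):
            return candidate
    return None
```

With library1 and library2 in different folders, each bundling its own copy of library3, the second step would resolve each import to the local copy.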
---
"This challenge is a case where the library developers are trying to change their code without causing disruption for anyone else."
And Nulan's namespace system handles that just fine.
---
"I think the point is that the library has been upgraded, and the library developer would like the upgrade to propagate to all code that uses the library, even though a few library users might actually prefer otherwise. This is a conflict of preferences, and the language might give tools to certain participants to "reconcile" this conflict."
I still don't get the impression they were talking about some sort of package management or version dependency resolution system. It's certainly okay to discuss such things, but I think they're outside the scope of a namespace system. That thread (and post) seemed to me to be talking specifically about resolving naming conflicts.
I think the talk about "multiple libraries" and "multiple versions" was just to create a situation where naming conflicts occur. I don't think it was meant to imply that the goal is to be able to easily change library versions on the fly, like you were discussing.
If I wanted some sort of system like that in Nulan, it wouldn't be particularly hard to do... you'd create a new file, like say... "dependencies.nu" and you'd then put a bunch of imports into it:
(import foo)
(import bar)
(import qux)
That's all it would be: a list of imports. Then all the dependencies are in one place, rather than scattered around in various files. I personally don't see much benefit to that, but you could do it if you wanted to.
---
"However, I think Casey McCann is assuming a namespace system built on implicit name prefixes, which may or may not incorporate a way to take a name prefix and find a module that addresses it."
That is quite possible.
---
"By "nothing at all," do you mean a late-bound global scope like Arc's?"
Yes. And JavaScript. Basically any language with dynamic scope at the global level by default.
Me: "This challenge is a case where the library developers are trying to change their code without causing disruption for anyone else."
You: "And Nulan's namespace system handles that just fine."
What I meant was that the library developer wants our applications to use their new library version without any involvement of the application developers. I seriously doubt that's what you're claiming to support.
---
"You could argue that's a good case for a package manager like RubyGem, and that may be true, but I consider such a system to be in a layer above the namespace system."
I think the only place we really disagree now is that you're content to neglect the "RubyGem" parts of these challenges for the purposes of this discussion, while I consider them the hardest parts. I think if you do try to design a package manager, you may very well get most of a namespace system out of the deal, or even a whole programming language.
Unfortunately, package managers and filesystems aren't quite a full layer above the languages they're used with. Program units tend to link together using hardcoded filenames or package names, a practice that ties some of the filesystem or package manager complexity into the complexity of the language proper.
The module system I've been working on recently (ever since [1]) is meant to help resolve that, by making the non-language-specific complexity as simple and universal as possible. It's based on two notions: A program is a multiset of independently maintained extensions, and it exists in the context of an expansive historical record containing all knowledge ever discovered (or at least all knowledge we have on hand).
The extensions and units of background knowledge may ultimately be encoded as files and managed at a "layer above" using traditional package managers and filesystems. However, they may also be manipulated as first-class values in a host program, or even as visual elements in a UI.
"In that case, this has nothing to do with name resolution, it has to do with file resolution: finding which file to import when given a name."
As an argument in favor of this viewpoint, I'll quote the original challenge:
"Now I'm writing a piece of code that uses two libraries, each of which use different, incompatible versions of some third library, causing namespace collisions between the two versions. How does the namespace system handle this?"
The challenge seems to imply that the two incompatible libraries were loaded successfully: it's NOT talking about file resolution. It's assuming the file resolution proceeded successfully, and thus the only issue remaining is resolving the name collisions.
You keep talking about things like dependency resolution systems, swapping in different libraries, etc. That's fine, but that does not seem to be what the challenges are talking about. I'd rather see some challenges specifically geared toward dependency resolution and file loading rather than trying to hijack namespace challenges, since I see these two issues as being orthogonal, with maybe a little overlap.
"The challenge seems to imply that the two incompatible libraries were loaded successfully: it's NOT talking about file resolution. It's assuming the file resolution proceeded successfully, and thus the only issue remaining is resolving the name collisions."
From what you quoted, technically that could be true.
---
"You keep talking about things like dependency resolution systems, swapping in different libraries, etc. That's fine, but that does not seem to be what the challenges are talking about."
Objection! :)
The challenge we're talking about puts the words "different, incompatible versions" in italics. If direct substitution were unimportant, the challenge would not even need to use the word "version"; it could just say there were two separate libraries with name collisions. And yet that's precisely the scenario of the previous challenge.
"What I meant was that the library developer wants our applications to use their new library version without any involvement of the application developers. I seriously doubt that's what you're claiming to support."
I don't get the impression they're saying, "we release this new library and all the users of the existing libraries automatically switch to it". I think what they're saying is, "this library can be used by existing applications, but they still need to choose to use it". That's how I interpreted the "drop-in replacement" phrase.
Technically speaking Nulan can support that... you would change library1 and library2 to just import library3.
Then the applications don't need to change at all. That's assuming of course that the developers of library3 can change library1 and library2. If they can't, then oh well, my solution is for the applications to manually update to use library3 if they so wish. In that semi-contrived example it doesn't matter because library3 is a superset of library1 and library2, so applications can just keep using library1/2 just fine.
---
"I think the only place we really disagree now is that you're content to neglect the "RubyGem" parts of these challenges for the purposes this discussion, while I consider them the hardest parts. I think if you do try to design a package manager, you may very well get most of a namespace system out of the deal, or even a whole programming language."
They are the hardest parts, I just don't think it has to do with namespace support, which that entire topic was devoted to. Nor am I particularly interested in designing a package manager.
So my suggestion still stands: if you feel this strongly about package managers, make up your own semi-contrived challenges that try to demonstrate what a package manager should be capable of doing. I'm sure it will include all kinds of stuff that was left out of the namespace challenges, because package managers are much more complex and involved than namespaces.
Then you can point to those challenges and I can say, "Nulan doesn't support that stuff" rather than trying to insist that Nulan is failing the namespace challenges because you're interpreting them far more broadly than what they actually say. I think it's useful to have "package manager challenges", but I think it's also useful to have "namespace challenges". I don't see why we need to throw out the namespace challenges just because you view them as "not as hard" as package managers.
---
"The challenge we're talking about puts the words "different, incompatible versions" in italics. If direct substitution were unimportant, the challenge would not even need to use the word "version"; it could just say there were two separate libraries with name collisions. And yet that's precisely the scenario of the previous challenge."
Like I said, I don't get the impression they're talking about package managers. I think they used the phrase "different incompatible versions" because that's a situation that can actually occur in real life. They're using it as an example, that's all.
It's obvious that you want to think in terms of package managers, and that's fine, but I really do not get the impression that's what they're talking about. Ultimately, it seems the only way to resolve this is to actually ask them what they meant.
Oh, I can understand that point of view. Just because there were multiple examples doesn't mean each one introduced new requirements.
---
"It's obvious that you want to think in terms of package managers, and that's fine, but I really do not get the impression that's what they're talking about."
I just wanted to help you elaborate on the claims you're making about Nulan's namespace system. You originally quoted Casey McCann's examples, but I perceived them differently than you did, and I had trouble understanding your perception from your terse example code.
Now this seems to be resolved. We may not agree in our interpretations, but now I understand your claims relative to both of them. Thanks for your patience with me. ^_^
---
"Ultimately, it seems the only way to resolve this is to actually ask them what they meant."
I don't think that's necessary, but it would be interesting to know.
And here are some replies to the "meta" part of that email exchange:
---
akkartik: "Thanks! I find myself wishing we could put Ross's long response in the 'namespace' section of the page rather than the 'meta' section."
I was conscious of that when I sent that reply. I'm not sure how to reply to a particular "subthread" when it's email. >_<
The only time I've ever seen email become a threaded tree is in mailing list archives, and I kinda assumed that had to do with automatic detection of quotations or something.
---
akkartik: "(I'm not actually expecting you to do this, just thinking aloud about how someone could maintain hygiene when managing a forum.)"
The difficulty here isn't managing one forum, but managing a conversation that spans multiple communication media. Unfortunately, I don't expect there to be a nice bridge that eliminates more complexity than it introduces. That kind of synergy tends to rely on foresight on the part of both systems.
Actually, email supports threaded trees using the References: header. Every message adds the ids of all messages it's under, and it turns out that is enough information to recreate the tree structure. Text-mode email clients like pine and mutt use this.
Given my history with those programs, I'm anal about hitting reply on the precise message in a thread that I'm replying to :) That helps list archives get the tree structure right even though it's invisible over gmail. So in your place I'd have split my message into parts and attached them to the right messages in the thread :) I know, it's utterly ridiculous and irrational.
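A sketch of that reconstruction in Python (the data format here is invented; real clients parse the raw headers): since References lists a message's ancestors oldest-first, the last entry is the direct parent, which is enough to rebuild the tree.

```python
def build_thread(messages):
    """messages: list of (msg_id, references) pairs, where references
    is the ancestor chain oldest-first, as in the References: header.
    Returns a {child_id: parent_id} map (None for thread roots)."""
    return {
        msg_id: refs[-1] if refs else None
        for msg_id, refs in messages
    }

msgs = [
    ("<root@x>", []),
    ("<reply1@x>", ["<root@x>"]),
    ("<reply2@x>", ["<root@x>", "<reply1@x>"]),  # a reply to reply1
]

assert build_thread(msgs)["<reply2@x>"] == "<reply1@x>"
```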
I don't understand how a global binding can have a lexical scope if it's not attached to a block. Do you want it attached to a file/module/namespace by default?
I think conses are silly and Lisps should use objects instead!
If you insist on using cons, at least make everything immutable, in which case you don't even need copy at all. Yup. That's my stance: accidentally immutable all the things.
---
"The drawback of a deeper copy is that the above function needs to be inlined into arc's list function, which is currently [...]"
pg actually says in the arc.arc source code:
; Can return to this def once Rtm gets ac to make all rest args
; nil-terminated lists.
; (def list args args)
That definition works just fine in Arc/Nu. If it doesn't work right in Anarki, I'd say that's a bug with their compiler that should be fixed. As an added bonus, in Arc/Nu, the above is over 3 times faster than Arc's definition of copylist.
---
"I just noticed today that arc's copy had that additional semantic of merging in new args, and it turned out to be easy to add. Can Nulan add it cleanly? :)"
I'm not sure what you're talking about. Could you clarify please?
def (copy x ... merge_args) :case merge_args
  ret result copy.x
    each (k v) pair.merge_args
      result.k <- v
I assume Nulan's way of doing extensible copy would be to call an object's %copy method or something like that. But would it be able to handle if the signature for copy changed, or would you have to go update all the %copy methods?
"I assume Nulan's way of doing extensible copy would be to call an object's %copy method or something like that. But would it be able to handle if the signature for copy changed, or would you have to go update all the %copy methods?"
Well, there's no need for copy since everything's immutable in Nulan... but assuming I did want copy, yes there would be a %copy method for objects. And yes, if you changed the signature for the %copy method you'd need to change all the objects.
That's because methods form an interface: if you fulfill the interface, you get the functionality. So naturally if the interface changes, any code that uses that interface will need to change. If you're really worried about backwards compat, you could instead create a new %copy2 interface, etc.
But the copy function itself is not the same as the %copy method. So even if the copy function changed, it might not require changes in the %copy method. It depends on what the changes are.
Looking at that function... it looks like you're saying that this:
(copy x 'a 5 'b 10)
Would copy "x", and then assign "a" to 5 and "b" to 10, correct? That change would be trivial to do in Nulan, without affecting the %copy method at all. I guess it'd look something like this:
(def copy -> (let x [ %copy f ]) @r
(let x (f x)
(each (pair r) -> [k v]
(set! x k v))
x))
Not so different from in wart, eh? Though I'd prefer to use an immutable version that doesn't use set! and boxes. Maybe something like this instead:
(def copy -> (let x [ %copy f ]) @r
(foldl (f x) (pair r) -> x [k v]
(set x k v)))
By the way, in case you're wondering... the %copy signature is this here:
(f x)
Yeah, that tiny little thing. What that does is it takes the %copy method and calls it with the object as its first argument. That's it. As long as that doesn't change, then there's no need to update the objects.
So, if we wanted to add in some new copy functionality that required us to pass more information to the %copy method, we'd need to change the copy function to pass more arguments to the %copy method, which would require changes to all the objects that use the %copy method.
But in this case, we don't need to pass any more arguments to the %copy method so nothing changes except the copy function itself.
And the copy function uses @r which means "0 or more extra arguments", so it works just fine in the 1-argument case. Thus, this change is backwards compatible in every way (that I care about).
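The same division of labor can be sketched in Python (`nulan_copy` is a stand-in for the %copy method; none of this is Nulan's actual code): the generic copy calls the one-argument method, then merges the extra key/value arguments itself, so the method's interface never changes.

```python
def copy(x, *merge_args):
    """Generic copy: dispatch to the object's one-argument copy
    method, then merge extra key/value pairs into the result."""
    result = x.nulan_copy()            # the "%copy method": (f x)
    pairs = zip(merge_args[::2], merge_args[1::2])
    for k, v in pairs:
        setattr(result, k, v)
    return result

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y
    def nulan_copy(self):              # fulfills the copy interface
        return Point(self.x, self.y)

p = Point(1, 2)
q = copy(p, "x", 5)

assert (q.x, q.y) == (5, 2)
assert (p.x, p.y) == (1, 2)            # the original is untouched
```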
(def or
[ %pattern-match (vau ~ {~ f e p v}
(any p -> x
(on-error (f e x v)
%pattern-match -> ~ %f))) ])
More complicated. Anyways, it chooses the first one that matches, and because of the way pattern matching is implemented (with first-class environments), it automatically does The Right Thing(tm) with duplicate names:
(var (or {a} {a b}) {1 2})
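That first-match-wins behavior, including the duplicate-name unification, can be sketched in Python (`try_match` and `match_or` are invented stand-ins, not Nulan's implementation):

```python
def try_match(pat, value):
    """pat is a list of variable names; value a sequence to match.
    If the same name appears twice, both positions must be equal."""
    if not isinstance(value, (list, tuple)) or len(pat) != len(value):
        return None
    bindings = {}
    for name, v in zip(pat, value):
        if name in bindings and bindings[name] != v:
            return None            # duplicate names must unify
        bindings[name] = v
    return bindings

def match_or(patterns, value):
    """Try each alternative in order; the first that matches wins."""
    for pat in patterns:
        bindings = try_match(pat, value)
        if bindings is not None:
            return bindings
    return None

# (var (or {a} {a b}) {1 2}) -> the second alternative matches:
assert match_or([["a"], ["a", "b"]], [1, 2]) == {"a": 1, "b": 2}
assert try_match(["a", "a"], [1, 1]) == {"a": 1}
assert try_match(["a", "a"], [1, 2]) is None
```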
---
I decided to look up how Racket does pattern matching, so I could see how its implementation of "or" differs from mine. I wasn't able to make heads or tails of how Racket defines "or", but I did find this:
(and pat ...) — matches if all of the pats match. This pattern is often used
as (and id pat) to bind id to the entire value that matches pat.
"I just made an interesting discovery: "and" is a superset of Haskell's "as" pattern"
How so?
The way I like to look at it, the correspondence between patterns and expressions is based on one being the inverse of the other. If a value is matched and then reconstructed using the same code, it should become the same value.
For example, since (cons a b) as an expression takes two inputs and gives one cons cell as output, its pattern version takes one cons cell as input and gives two outputs.
The pattern version of (and ...) would take one input and give at least one output. (I'm ignoring zero-argument (and) here.) If the input is nil, the outputs can be picked arbitrarily from the space of all possible values, as long as at least one of them is nil. If the input is not nil, then the last output will be equal to that value, and the other outputs can be any arbitrary non-nil values. Thanks to all this nondeterminism, an 'and pattern isn't exactly useful or even meaningful.
One refinement of this interface is to pick the outputs such that they're all equal to the input. That seems to be what you're going for. However, there are two caveats:
- The same implementation fits the requirements for a pattern version of 'or, so why choose one of these names over the other?
- If pattern-and's outputs are always equal, then expression-and's inputs should always be equal too, right?
We can dodge both of these conundrums by defining 'as as its own operator and just giving up on finding inverted counterparts of 'as, 'and, and 'or. It's okay for some computations to be non-invertible--unless, of course, the language design demands otherwise.
---
Besides inverting expressions, you and Racket (not to mention my past self) do some extra things with pattern matching beyond inversion: You allow the same variable to appear more than once in a pattern, and you allow backtracking when a match fails.
If the variables are seen as logic variables, then multiple variable occurrences make sense as multiple constraints to be unified. Racket and Nulan operations can only operate on values that have been determined up to equality, so the logic variable semantics degenerates into an equality test.
In another language, (and ...) as a pattern might constrain its arguments without actually making an arbitrary choice. (At this point it's even tempting to remove notions of "input" and "output" and just reason about a constraint graph, but I don't know how to work that way. In my vague thought experiments, the extra polarity comes in handy for choosing orders of evaluation and such.)
---
"Oh, and, here's "or""
I'm suspicious of the name "or," but if you understand what I'm saying about "and" above, I don't have anything more to say on that here.
With this alternation operator, you (and Racket and my past self) introduce backtracking at the level of subpatterns, rather than just at the top level of the pattern.
This is a bit curious for syntax purposes, since different branches impose different constraints on the logic variables. In particular, suppose one branch binds foo and another branch doesn't. If we end up in the branch that doesn't, then foo will be fully unconstrained.
Personally, I would expect this unconstrained foo to shadow nonlocal variables named "foo" just as if it were constrained to a particular value. However, if I tried to access it, I would expect some kind of uninitialized variable error. Is your version of "The Right Thing(tm)" the same as mine?
I've been thinking about "as" patterns for weeks. Oddly enough, I tried to think up a way wart could cleanly express them using or, which is how I now denote param aliases.
For fun, here's Arc-style optional arguments using "o":
(def o
[ %pattern-match (vau e {~ f d {p o} v}
(if v
(f d p v)
(f d p (eval e o)))) ])
> (var (o a 5) 2)
a -> 2
> (var (o a 5) %f)
a -> 5
There's one problem, though... I'm not sure how to get this to work in a clean fashion:
> (var {a (o b 5)} {1})
a -> 1
b -> 5
> (var {a (o b 5)} {1 2})
a -> 1
b -> 2
But once I figure it out, it'll be awesome. Not only was I able to add in optional destructuring in Nulan itself, but there's no ambiguity when destructuring a list. In Arc, you can't destructure a list where the first element is "o", but in Nulan you can:
{o b 5} is not the same as (o b 5)
So Nulan's pattern matching is extensible, hygienic, and avoids the ambiguity problems of Arc's destructuring. I like to think of Nulan as being a "better" version of Arc: I try to keep the same fluid and liberal flavor, but fix some glaring issues.
My plan with patmac.arc was to support (o a 5) as one of many sequence parsing utilities inspired by regular expressions. The point is, this particular pattern doesn't describe a value; it describes a subsequence of length 0 or 1.
If you want this to work as part of {a b c} list patterns, I recommend somehow making sure all value patterns can be coerced to (or otherwise interpreted as) length-one subsequence patterns.
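A quick Python sketch of that recommendation, where ordinary value patterns consume exactly one element and optionals consume zero or one (Var and Opt are invented names, and a full version would need backtracking for cases like an optional followed by a required pattern):

```python
# Sketch of the "length-0-or-1 subsequence" idea: ordinary patterns consume
# exactly one element, optional patterns consume one if available and fall
# back to their default otherwise.
class Var:
    def __init__(self, name): self.name = name

class Opt:
    def __init__(self, name, default): self.name, self.default = name, default

def match_seq(patterns, values):
    """Match a list of patterns against a list of values, returning bindings."""
    bindings = {}
    i = 0
    for p in patterns:
        if isinstance(p, Opt):
            if i < len(values):
                bindings[p.name] = values[i]; i += 1
            else:
                bindings[p.name] = p.default
        elif isinstance(p, Var):
            bindings[p.name] = values[i]; i += 1
        else:  # literal: compare by equality
            assert p == values[i]; i += 1
    assert i == len(values), "too many values"
    return bindings

# {a (o b 5)} against {1}   -> a = 1, b = 5
print(match_seq([Var("a"), Opt("b", 5)], [1]))
# {a (o b 5)} against {1 2} -> a = 1, b = 2
print(match_seq([Var("a"), Opt("b", 5)], [1, 2]))
```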
---
In other news, this kind of (o a 5) syntax means you're putting expressions inside your patterns inside your expressions. You may need to ask yourself just what variables should be in scope in that inner expression; do they include some variables "previously" bound in the pattern? There was no need for this notion of "previously" until now.
I don't especially like the idea of infix operators breaking up juxtaposed identifiers like "a b". Infix operators are visible boundaries, and juxtaposition has no boundary except whitespace.
However, this is lisp syntax. Juxtaposition is an element separator, not an operator of its own. Ultimately, I don't know where I stand on this. :)
---
"And then "is" would implement pattern match behavior."
Because of my preference for patterns to be inverted expressions, I would rather (is b 5) match only t or nil. If the input is t, then b will be bound to 5. If the input is nil, b will be bound to anything but 5.
Not that I would have any use for such an operator.
---
The inverse of the optional arg pattern (o b 5) would be a splicing(!) expression which usually returned a length-1 subsequence containing b, but might(!) instead return a length-0 subsequence if b were 5. I wouldn't have a use for this operator either.
Actually, it might not be as simple as that. Should the expression 5 become the pattern 5 when we invert (o b 5)? What would that mean?
"The way I like to look at it, the correspondence between patterns and expressions is based on one being the inverse of the other. If a value is matched and then reconstructed using the same code, it should become the same value.
For example, since (cons a b) as an expression takes two inputs and gives one cons cell as output, its pattern version takes one cons cell as input and gives two outputs."
I think you're thinking in terms of Haskell pattern matching. I view pattern matching as being more general and dynamic, not exclusively about destructuring the components of data. I see "data destructuring" as being one kind of pattern matching. Nulan supports this kind, of course, but it supports other forms too. For instance, regexps are a form of pattern matching that slices and dices strings, but you can't recreate the string from the regexp.
---
"The pattern version of (and ...) would take one input and give at least one output. (I'm ignoring zero-argument (and) here.) If the input is nil, the outputs can be picked arbitrarily from the space of all possible values, as long as at least one of them is nil. If the input is not nil, then the last output will be equal to that value, and the other outputs can be any arbitrary non-nil values. Thanks to all this nondeterminism, an 'and pattern isn't exactly useful or even meaningful."
That's not how Nulan's pattern matching works. At least, for now. The point of Nulan's pattern matching is simply to take an environment, a pattern, and some blob of data and return an environment that possibly binds new variables to the data. This means a pattern could insert arbitrary stuff into the environment, or return an empty environment, etc.
I'm not big on the whole staticy guarantee thing, where you try to prevent bad stuff from happening. So my stance is, "well, if you wanna do that stuff, go ahead, even though I personally don't see much use for it." And if somebody happens to write a pattern matcher that does funky stuff, I can simply choose to not use it. That's what modules are for: to decide which things you want to use and not use.
"One refinement of this interface is to pick the outputs such that they're all equal to the input. That seems to be what you're going for."
That is kind of the point of "and", yes.
---
"The same implementation fits the requirements for a pattern version of 'or, so why choose one of these names over the other?"
No it doesn't. "or" matches only a single pattern, the first one that works. "and" always matches all the patterns, or fails. This is analogous to how "or" and "and" work in pretty much every programming language.
---
"If pattern-and's outputs are always equal, then expression-and's inputs should always be equal too, right?"
I have no clue what you mean by expression-and. I guess you're talking about using "and" in a non-pattern context. Right now, "and" behaves like it does in Arc (and many other languages). Pattern matching and calling are two separate things and have two separate implementations.
---
"We can dodge both of these conundrums by defining 'as as its own operator and just giving up on finding inverted counterparts of 'as, 'and, and 'or. It's okay for some computations to be non-invertible--unless, of course, the language design demands otherwise."
I see no reason to define "as" as its own operator since I can simply use "and" to get the same effect. I guess "as" might be slightly more self-documenting.
---
"Besides inverting expressions, you and Racket (not to mention my past self) do some extra things with pattern matching beyond inversion: You allow the same variable to appear more than once in a pattern, and you allow backtracking when a match fails."
Yes.
---
"If the variables are seen as logic variables, then multiple variable occurrences make sense as multiple constraints to be unified. Racket and Nulan operations can only operate on values that have been determined up to equality, so the logic variable semantics degenerates into an equality test.
In another language, (and ...) as a pattern might constrain its arguments without actually making an arbitrary choice. (At this point it's even tempting to remove notions of "input" and "output" and just reason about a constraint graph, but I don't know how to work that way. In my vague thought experiments, the extra polarity comes in handy for choosing orders of evaluation and such.)"
No clue what you're talking about here.
---
"Personally, I would expect this unconstrained foo to shadow nonlocal variables named "foo" just as if it were constrained to a particular value. However, if I tried to access it, I would expect some kind of uninitialized variable error. Is your version of "The Right Thing(tm)" the same as mine?"
If one of the branches does not bind "foo" and you try to use "foo", it will use the lexically scoped "foo":
(let foo 5
  (let (or {foo} bar) 2
    # foo is 5
    # bar is 2
    ))
That's a good point, though: I should support the ability to write a pattern matcher that binds non-matching branches to variables that throw an error when accessed.
That way you can have a version of "or" that uses the lexical scope for non-matching branches, or one that uses special variables or whatever.
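Here's a loose Python sketch of that fallback behavior, where names bound only by the non-matching branch get filled in from the enclosing scope (the matcher representation here is purely an invention of the sketch):

```python
# Sketch of an "or" pattern whose branch-only variables fall back to the
# lexical scope. Each branch is (names_it_binds, matcher); matchers return
# a binding dict on success or None on failure.
def or_match(branches, value, lexical):
    """Try each branch in order; fill branch-only names from `lexical`."""
    all_names = set().union(*(names for names, _ in branches))
    for names, matcher in branches:
        bindings = matcher(value)
        if bindings is not None:
            # names bound only by the *other* branches fall back lexically
            for name in all_names - set(bindings):
                bindings[name] = lexical[name]
            return bindings
    raise ValueError("no branch matched")

# (let foo 5 (let (or {foo} bar) 2 ...)) -> foo is 5, bar is 2
match_list_foo = ({"foo"}, lambda v: {"foo": v[0]} if isinstance(v, list) else None)
match_sym_bar  = ({"bar"}, lambda v: {"bar": v})
result = or_match([match_list_foo, match_sym_bar], 2, {"foo": 5})
print(result)  # bar is 2 (matched), foo is 5 (from the enclosing scope)
```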
---
"In other news, this kind of (o a 5) syntax means you're putting expressions inside your patterns inside your expressions."
No, it doesn't. These are all patterns:
(o a 5)
{a (o b 5)}
(list a (o b 5))
"{a (o b 5)}" is just syntax sugar for "(list a (o b 5))"
A pattern is defined as:
* A list, in which case the first element has a %pattern-match property that is a vau that describes how to do the pattern matching.
* A symbol, in which case the value is bound to the symbol.
* Everything else is compared by equality.
So in the above case, "list" has a %pattern-match property, and "o" also has a %pattern-match property. It's just simple recursion with first-class environments. Very simple to understand and implement.
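That three-case recursion is simple enough to sketch in a few lines of Python (Symbol and ListMatcher are stand-ins I made up; in Nulan the dispatch goes through the %pattern-match property):

```python
# Sketch of the three-case definition above: lists dispatch to the head's
# custom matcher, symbols bind, and everything else compares by equality.
class Symbol:
    def __init__(self, name): self.name = name

def pattern_match(env, pattern, value):
    """Return a new environment extending `env` with bindings from the match."""
    if isinstance(pattern, list):
        head, *args = pattern
        # the head's %pattern-match property decides how to match
        return head.pattern_match(env, args, value)
    if isinstance(pattern, Symbol):
        return {**env, pattern.name: value}
    if pattern != value:
        raise ValueError("mismatch")
    return env

class ListMatcher:
    """Stand-in for "list": match subpatterns element-wise, left to right."""
    def pattern_match(self, env, args, value):
        if len(args) != len(value):
            raise ValueError("length mismatch")
        for sub, v in zip(args, value):
            env = pattern_match(env, sub, v)
        return env

LIST = ListMatcher()
# (def (list a b) (list 1 2)) -> a = 1, b = 2
print(pattern_match({}, [LIST, Symbol("a"), Symbol("b")], [1, 2]))
```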
---
"You may need to ask yourself just what variables should be in scope in that inner expression; do they include some variables "previously" bound in the pattern? There was no need for this notion of "previously" until now."
This is a non-issue for the most part because if a variable occurs twice, it will test for equality. But yes, it currently evaluates the pattern in the environment of the previous patterns.
Because Nulan uses first-class environments, it's also easy to evaluate the patterns in the dynamic environment, or the lexical environment. I think I'll change it to use the lexical environment.
---
"I don't especially like the idea of infix operators breaking up juxtaposed identifiers like "a b". Infix operators are visible boundaries, and juxtaposition has no boundary except whitespace."
Personally I think it looks nice and short. I might choose a different operator other than "=" but I don't see a problem with the idea itself. You just have to keep in mind that "=" is an infix operator, that's all.
---
"Because of my preference for patterns to be inverted expressions, I would rather (is b 5) match only t or nil. If the input is t, then b will be bound to 5. If the input is nil, b will be bound to anything but 5."
I don't have much interest in constraining pattern matching to only be inversions of expressions. User code can supply any kind of pattern matching they want. And the reason for using "is" is because "a = b" expands to "(is a b)". It isn't actually tied to the is-expression, as you might call it.
Also, I would probably use "if" for such an expression. As in:
(var (if a) %f) -> error
(var (if a 1) %f) -> error
(var (if a 1 2) %f) -> 2
(var (if a) "foo") -> "foo"
(var (if a 1) "foo") -> 1
(var (if a 1 2) "foo") -> 1
That actually seems pretty useful. Thanks for the idea.
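Sketching those six examples in Python, with a MISSING sentinel standing in for an absent branch (if_match is an invented name for the sketch):

```python
# Sketch of the "if" pattern's behavior as shown in the examples above:
# a truthy input binds the name to the "then" branch (or the input itself),
# a falsy input binds it to the "else" branch, or errors if there is none.
MISSING = object()

def if_match(value, name, then=MISSING, els=MISSING):
    """Bind `name` according to the truthiness of `value`."""
    if value:
        return {name: value if then is MISSING else then}
    if els is MISSING:
        raise ValueError("falsy value with no else branch")
    return {name: els}

print(if_match("foo", "a"))        # {'a': 'foo'}
print(if_match("foo", "a", 1))     # {'a': 1}
print(if_match(False, "a", 1, 2))  # {'a': 2}
```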
"I have no clue what you mean by expression-and. I guess you're talking about using "and" in a non-pattern context."
Yep, specifically an expression context. :-p
---
"I think you're thinking in terms of Haskell pattern matching. I view pattern matching as being more general and dynamic, not exclusively about destructuring the components of data."
I see that, of course. I'm not even saying I disagree with adding that functionality, just saying I'd choose operator names so that whenever a pattern and an expression are identical, they correspond in semantics too.
Haskell's pattern matching supports backtracking if the pattern fails, but its expressions don't support the same feature, so I hesitate to cite Haskell as a good example of what I'm going for. It is closer.
---
Me: "The same implementation fits the requirements for a pattern version of 'or, so why choose one of these names over the other?"
You: "No it doesn't. "or" matches only a single pattern, the first one that works. "and" always matches all the patterns, or fails. This is analogous to how "or" and "and" work in pretty much every programming language."
I was still talking about pattern-expression symmetry, trying to explain it with enough examples to convey how I reason about it and why I might make different choices than you do.
Your chosen definitions of pattern-and and pattern-or are (in most ways) not inverses of your definitions of expression-and and expression-or.
The point of pattern-as is that it takes one input and gives some outputs equal to it. If on the other hand you give several equal inputs to expression-and or expression-or, they'll all be equal to the output. Thus, both of these expression operators are special cases of the inverted interface of pattern-as.
---
"No clue what you're talking about here."
Logic variables are variables whose value is defined by solving a system of partial descriptions. Here's a sketch of an example:
(introduce-logic-vars (a b c d)
  ; We don't know anything about a, b, c, or d yet.
  (= a (cons b c))
  (= (cons d a) c)
  ; Now we know a is the infinite list (b d b d b d ...).
  ; The order of = doesn't matter.
  (= a (cons b a))
  ; Now we know b and d are identical.
  (= c "not a cons")
  ; Now we know there's an error, since the descriptions are
  ; inconsistent.
  )
The terms "unification" and "constraint" may come in handy too, but these are strongly associated with logic programming and constraint programming respectively, and I'm not sure which kind of programming I'm talking about here.
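For a taste of how this could work, here's a bare-bones finite unification sketch in Python (the infinite-list case in the example above would additionally need rational-tree unification, which this deliberately skips):

```python
# Minimal unification sketch: constraints accumulate in any order in a
# substitution dict, and inconsistent constraints raise an error.
class LVar:
    def __init__(self, name): self.name = name

def walk(t, subst):
    """Follow variable bindings until a non-variable (or fresh var) remains."""
    while isinstance(t, LVar) and t.name in subst:
        t = subst[t.name]
    return t

def unify(t1, t2, subst):
    t1, t2 = walk(t1, subst), walk(t2, subst)
    if isinstance(t1, LVar):
        return {**subst, t1.name: t2}
    if isinstance(t2, LVar):
        return {**subst, t2.name: t1}
    if isinstance(t1, tuple) and isinstance(t2, tuple) and len(t1) == 2 == len(t2):
        return unify(t1[1], t2[1], unify(t1[0], t2[0], subst))
    if t1 != t2:
        raise ValueError("inconsistent constraints")
    return subst

a, b = LVar("a"), LVar("b")
s = unify(a, (b, "rest"), {})   # a = (cons b "rest")
s = unify(b, 1, s)              # order of constraints doesn't matter
print(walk(b, s))               # 1
try:
    unify(a, "not a cons", s)   # conflicts with a = (cons b "rest")
except ValueError:
    print("inconsistent")
```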
---
Me: "In other news, this kind of (o a 5) syntax means you're putting expressions inside your patterns inside your expressions."
You: "No, it doesn't."
The "5" part is an expression, right?
Suppose the pattern were {a b c (o d (cons b e)) e}. Should the expression (cons b e) see the value of b as bound by the pattern, or the version in the nearest non-pattern surrounding scope? What version of e should it see?
---
"That actually seems pretty useful. Thanks for the idea."
Well, I'm glad you found something that might be useful, but I think that idea is all yours, lol. ^_^
"I see that, of course. I'm not even saying I disagree with adding that functionality, just saying I'd choose operator names so that whenever a pattern and an expression are identical, they correspond in semantics too."
It's up to the object to decide whether to ensure that correspondence or not. What operator name would you suggest for my "and" pattern matching?
---
"I was still talking about pattern-expression symmetry, trying to explain it with enough examples to convey how I reason about it and why I might make different choices than you do."
Yes, I know, and I'm saying I don't see any real benefit to that. So you're going to have to make a harder case to convince me.
---
"You're choosing definitions of pattern-and and pattern-or are (in most ways) not inverses of your definitions of expression-and and expression-or."
Yes, but I think it fits with the intuitive understanding of what "and" and "or" do. "and" makes sure all its arguments are "truthy" and "or" selects the first "truthy" argument. In this case, "truthiness" means whether the pattern matches or not. In an expression context, "truthiness" would mean "not %f".
---
"The point of pattern-as is that it takes one input and gives some outputs equal to it. If on the other hand you give several equal inputs to expression-and or expression-or, they'll all be equal to the output. Thus, both of these expression operators are special cases of the inverted interface of pattern-as."
Yes, and "and" is a superset of "as" because it behaves exactly like "as" when the first argument is a symbol and the second argument is a pattern:
(var (and a {b c}) {1 2})
(var (as a {b c}) {1 2})
But as you rightfully pointed out, "and" can do things besides that, because it can have more than 2 arguments, and it doesn't require a symbol as the first one. Thus, "as" is effectively a restricted form of "and", which makes "and" a superset of "as".
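For what it's worth, the "match every subpattern against the same value and merge the bindings" reading of pattern-and fits in a few lines of Python (matchers here are plain functions returning binding dicts, purely for the sketch):

```python
# Sketch of an "and" pattern: every subpattern is matched against the same
# value and the resulting bindings are merged.
def and_match(matchers, value):
    bindings = {}
    for m in matchers:
        bindings.update(m(value))  # all subpatterns see the same value
    return bindings

bind_a  = lambda v: {"a": v}                 # the symbol pattern a
bind_bc = lambda v: {"b": v[0], "c": v[1]}   # the {b c} pattern
# (var (and a {b c}) {1 2}) -> a = [1, 2], b = 1, c = 2
print(and_match([bind_a, bind_bc], [1, 2]))
```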
---
"Logic variables are variables whose value is defined by solving a system of partial descriptions."
Okay. I have no clue how to implement such a system, so I'll stick with my nice simple system.
---
"The "5" part is an expression, right?"
In that particular case, I suppose so, yes.
If you call "pattern-match", it's a pattern, if you call "eval", it's an expression. "o" happens to call "eval" on its second argument, so it's an expression.
---
"Suppose the pattern were {a b c (o d (cons b e)) e}. Should the expression (cons b e) see the value of b as bound by the pattern, or the version in the nearest non-pattern surrounding scope? What version of e should it see?"
The "o" pattern matcher can actually choose whether its second argument is evaluated in the pattern-match environment or the enclosing environment. I suppose the "obvious" thing to do would be to evaluate it in the pattern-match environment.
As for "e"... the "list" pattern matcher does its pattern matching from left-to-right by recursing on the cdr of the list. Which means at the time (cons b e) is evaluated, "e" is not in the pattern-match environment.
As for what version it should see... I like the simplicity of doing the pattern matching from left-to-right, so I suppose it would see whatever "e" is bound (or not bound) in the pattern-match environment at the time it's evaluated. In that case, it'll probably use the "e" from the enclosing scope.
---
"Well, I'm glad you found something that might be useful, but I think that idea is all yours, lol. ^_^"
"I have no clue how to implement [logic variables]..."
I'll pretend you're curious though~! :D
In proof theory terms, constraint logic programming is closely related to proof search: Given a theorem, it finds a proof. Functional programming is closely related to proof normalization: Given a proof, it finds a simpler way to express the same proof (by, for example, reducing the proof all the way to a literal value).
Going by this correspondence, one way to combine these kinds of programming is by putting the theorem-heavy constraint logic code into a type system for the proof-heavy functional code. Agda is probably a very expressive language for this approach, since the full computational power of its "static" type system can potentially live until run time. Other typed functional languages with implicit arguments or type classes fit this bill too, but the phase separation might stifle it.
If you're not so interested in type systems, other languages have tackled this combination of programming paradigms in a more procedural way. It looks like http://en.wikipedia.org/wiki/Oz_(programming_language) and its "see also" links are a good starting point.
A search for [racket constraint logic] brings up Racklog (http://docs.racket-lang.org/racklog/index.html). It doesn't look like Racklog has numeric constraints, and it's based on Prolog besides, so it's clearly in the logic programming camp.
I don't really have a clue how this stuff is implemented, but I fear it might just be a lot of backtracking search most of the time. :-p I'm guessing (ISO standard) Prolog is particularly doomed to backtracking search, since some of its predicates have side effects.
There's probably some kind of relationship between depth-first vs. breadth-first search and strict vs. lazy evaluation....
---
"[...]so I'll stick with my nice simple system."
"I like the simplicity of doing the pattern matching from left-to-right[...]"
As Rich Hickey says, "simple" and "easy" are distinct concepts. An easy implementation doesn't necessarily lead to a simple result.
In his "Simple Made Easy" talk, he specifically considered 'fold more complex than 'reduce due to the way it imposed a left-to-right or right-to-left ordering that wasn't necessarily part of the programmer's intent.
fun {Fact N}
   if N =< 0 then 1 else N*{Fact N-1} end
end

fun {Comb N K}
   {Fact N} div ({Fact K} * {Fact N-K}) % integers can't overflow in Oz (unless no memory is left)
end

fun {SumList List}
   case List of nil then 0
   [] H|T then H+{SumList T} % pattern matching on lists
   end
end
I found the above to be cool: notice that the function arguments match the function call. This is like what akkartik was talking about: http://arclanguage.org/item?id=16924
I admit I am vaguely curious about lots of things. That Racklog in particular seems interesting.
---
"As Rich Hickey says, "simple" and "easy" are distinct concepts. An easy implementation doesn't necessarily lead to a simple result."
I am aware. But until I either A) find a better system that I can actually understand and implement or B) find some severe flaw with my system, I'll stick with my system.
In addition, because of immutable environments, variable bindings are strictly from top-to-bottom. So enforcing that same kind of strict left-to-right ordering is more consistent and I think intuitive for the programmer. So I would say it is both easy and simple.
---
"In his "Simple Made Easy" talk, he specifically considered 'fold more complex than 'reduce due to the way it imposed a left-to-right or right-to-left ordering that wasn't necessarily part of the programmer's intent."
Well, okay, but sometimes you really do want that different ordering.
---
"Ironically, he also dissed type systems. :)"
I guess it depends highly on what he meant by "type systems". I haven't seen that video so I don't know.
"What operator name would you suggest for my "and" pattern matching?"
I've been using "as" in this conversation, but I suppose I'd try to name it based on how its corresponding expression would behave (even if it isn't really usable as an expression). Since the point of this pattern is to act as a T-junction whose inputs and outputs are all equal, how about "idfn"?
If all else fails, I might name it something like "pand" just so it didn't conflict with 'and. But I don't seriously expect you to reach the same conclusion. :)
---
"So you're going to have to make a harder case to convince me."
Last time I brought this up, I gave an example implementation of a pattern-matching macro (with no backtracking) that took its operators from the same environment as the functions. In fact, the operators were implemented as Racket procedures with multiple return values.
At the time, I think I might have cited the advantage that a single macro could be used for patterns and expressions without any difference in the code it emits.
Personally, I find that a weak argument; there aren't likely to be very many macros that take advantage of this. Since you're embracing fexprs rather than macros, I don't expect it to convince you either, but I'm bringing it up just in case.
That's not a bad name choice at all, except that idfn takes a single argument, so it feels weird to have it take two with the pattern match version.
---
"If all else fails, I might name it something like "pand" just so it didn't conflict with 'and. But I don't seriously expect you to reach the same conclusion. :)"
That would be a good idea if there were multiple competing implementations for the "and" pattern. In this case, I think my implementation is reasonably intuitive and simple. So there's not much reason to rename it to avoid that sorta thing.
By the way, I know it's a bit of a pain, but it is possible to just do this:
(def pand and)
And now you can use "pand" instead of "and" in your own code.
---
"Since you're embracing fexprs rather than macros"
I'm actually only embracing fexprs because the implementation is super easy and makes dealing with environments easier.
I could instead have gone with the macro route and had a "call-with-current-environment" (analogous to call-with-current-continuation) instead.
That might actually be a better idea. It kinda bugs me having to specify the environment in vaus even if you aren't going to use it.
---
"As you indicate, this has to do with the pattern choosing to call 'eval on an argument. I think the scope question will come up pretty much any time a pattern does that. If different patterns do different things here, that's okay, but I'd hesitate to introduce that inconsistency without a good reason."
My philosophy in this case is basically "don't worry about it, just let the patterns do what they want to do. If I make it easy to do The Right Thing(tm), it'll be okay"
In other words, if I make it easy (and idiomatic) to evaluate in the pattern-match environment, then pattern matchers will naturally go that route unless they have a specific reason to do something else, in which case they're probably right.
"But as you rightfully pointed out, "and" can do things besides that, because it can have more than 2 arguments, and it doesn't require a symbol as the first one."
I didn't suppose 'as would have those restrictions. The reason 'as is a restricted version of 'and is because the arguments of 'as are fully determined by its value, while 'and is nondeterministic in that direction.
(I'm not talking about your pattern-and. That's just pattern-as to me.)
---
"In that particular case, I suppose so, yes."
That's the particular case I was talking about. :)
As you indicate, this has to do with the pattern choosing to call 'eval on an argument. I think the scope question will come up pretty much any time a pattern does that. If different patterns do different things here, that's okay, but I'd hesitate to introduce that inconsistency without a good reason.
Still, getting something that works for now is a good reason. :-p
> (var (list a (splice b) c) (list 1 2 3 4))
a -> 1
b -> (2 3)
c -> 4
Unfortunately it hardcodes the symbol "splice" but I plan to change that. Now to work on dictionary splicing... Oh, and by the way, once I get the reader working, the above will look like this:
> (var {a @b c} {1 2 3 4})
See how easy that is?! But the best part is that this customizable pattern matching is available to Nulan. The implementation is decoupled from the interface, because I'm using objects.
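The splice behavior itself is easy to sketch in Python, assuming exactly one splice per pattern: fixed names take elements from each end, and the splice absorbs whatever is left in the middle (match_splice and the ("splice", name) marker are inventions of the sketch):

```python
# Sketch of list-splice matching: names before the splice match from the
# front, names after it match from the back, and the splice variable gets
# the middle.
def match_splice(pattern, values):
    """`pattern` is a list of names; ("splice", name) marks the splice slot."""
    splice_at = next(i for i, p in enumerate(pattern)
                     if isinstance(p, tuple) and p[0] == "splice")
    n_after = len(pattern) - splice_at - 1
    bindings = dict(zip(pattern[:splice_at], values[:splice_at]))
    bindings[pattern[splice_at][1]] = values[splice_at:len(values) - n_after]
    bindings.update(zip(pattern[splice_at + 1:], values[len(values) - n_after:]))
    return bindings

# (var (list a (splice b) c) (list 1 2 3 4)) -> a = 1, b = [2, 3], c = 4
print(match_splice(["a", ("splice", "b"), "c"], [1, 2, 3, 4]))
```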
Anyways, how dictionary splicing works is...
> (var [@a "foo" b] ["foo" 1 "bar" 2 "qux" 3])
a -> ["foo" 1 "bar" 2 "qux" 3]
b -> 1
> (var ["foo" a @b] ["foo" 1 "bar" 2 "qux" 3])
a -> 1
b -> ["bar" 2 "qux" 3]
What happened is that in the first one, it took the entire dictionary and bound it to "a", then it bound the "foo" key to "b".
In the second one, it bound the "foo" key to "a", and then bound everything except the "foo" key to "b".
Thus, depending on where you splice, you get different results. The first one is like Haskell's "as" patterns.
The second one is the more traditional "put everything except these keys into this variable" approach. And of course you can combine them as much as you like:
> (var ["foo" a @b "bar" 2 @c] ["foo" 1 "bar" 2 "qux" 3])
a -> 1
b -> ["bar" 2 "qux" 3]
c -> ["qux" 3]
And for fun, you can nest things inside the splice too:
> (var ["foo" a @["bar" b]] ["foo" 1 "bar" 2 "qux" 3])
a -> 1
b -> 2
The above is exactly the same as (var ["foo" a "bar" b] ["foo" 1 "bar" 2 "qux" 3]) except that it needed to do a bit more work. Useful? Who knows, but it's awesome.
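Here's a rough Python sketch of that left-to-right reading of dictionary splicing (no nested splices, and the ("splice", name) marker is just an encoding I picked for the sketch):

```python
# Sketch of dictionary splicing, read left to right: a (key, name) pair
# consumes that key, and a splice name grabs a copy of the dict minus
# everything consumed so far.
def match_dict(pattern, d):
    """`pattern` mixes (key, name) pairs and ("splice", name) markers."""
    bindings, consumed = {}, set()
    for item in pattern:
        if item[0] == "splice":
            bindings[item[1]] = {k: v for k, v in d.items() if k not in consumed}
        else:
            key, name = item
            bindings[name] = d[key]
            consumed.add(key)
    return bindings

d = {"foo": 1, "bar": 2, "qux": 3}
# (var [@a "foo" b] ...) -> a = the whole dict, b = 1
print(match_dict([("splice", "a"), ("foo", "b")], d))
# (var ["foo" a @b] ...) -> a = 1, b = everything except "foo"
print(match_dict([("foo", "a"), ("splice", "b")], d))
```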
---
By the way, for the sake of clarity, I used Nulan notation for all the above stuff. But the reader isn't implemented yet, so for now you have to use (dict ...) rather than [...] and (splice ...) rather than @...
In addition, it no longer hardcodes the "splice" symbol, but now the only way to splice is to use @. I plan to fix this so that you can use both in a hygienic way.
Aaand now mutable variables work. I decided to make all variables mutable by default, but I miiiiight add in a way to make a variable constant.
Now there are two primitives for dealing with names: var and set!
(var a 10)
(set! a 20)
"def" then is built on top of them:
(var def
  (vau e (list n v)
    (eval e (list do (list var n %f)
                    (list set! n v)))))
Basically, that means (def a 5) is equivalent to (do (var a %f) (set! a 5))
This is important for recursive functions, which need to refer to themselves. In other words, "var" is like "let" in Arc, and "def" is like "var" in JavaScript.
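The reason the two-step expansion matters can be sketched in Python: declare the binding first, then fill it in, so the function body can see itself through the shared, mutable environment:

```python
# Sketch of why (def f ...) expands to (var f %f) followed by (set! f ...):
# the name must exist in the environment before the body runs, so a
# recursive function can find itself through the shared binding.
env = {}
env["fact"] = None                      # (var fact %f): declare first
def _fact(n):                           # the body closes over env
    return 1 if n == 0 else n * env["fact"](n - 1)
env["fact"] = _fact                     # (set! fact ...): fill it in
print(env["fact"](5))                   # 120
```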
Ah, yes, I actually took out fn because I plan to build it in "02 nulan.nu" rather than "01 nulan.rkt". I put in a quick fix to get fn working, but I'll need to change that later.
Well, after a couple false starts, I finally got a Nulan interpreter written in Racket working. Oh, and, ignore the README, it's super duper outdated and I haven't updated it yet.
You can use "./nulan" to get to a REPL. Near the bottom of "01 nulan.rkt" you can see all the stuff that's currently supported:
* An awful lot of functionality is customizable via objects. That includes custom pattern matching. I plan to make more things customizable as time goes on.
---
I found that writing it as an interpreter was an awful lot easier than trying to write it as a compiler, because I discovered that the way I was doing pattern matching made it impossible to determine which variables in a vau are free.
How it works is... it's passed three arguments: the environment, the pattern, and the value. It returns a new environment which is the result of matching the pattern to the value.
But it doesn't do an awful lot: it handles symbols and the wildcard ~; lists use the %pattern-match function of an object, and everything else just uses "eq?".
So, for instance, the "list" pattern matching is implemented there.
It's actually a lot simpler than it looks. It just recurses down the pattern and value, calling "pattern-match" on each element of the lists. Then, at the end, if either the pattern or value is not null, there was a mismatch, so it throws an error.
The argument list of a vau obviously uses the "pattern-match" function, but another place it's used is "def".
See how simple that is? It simply unboxes the dynamic environment, runs it through the pattern matcher, then assigns it to the dynamic environment[1].
With that small amount of code, the following now works:
(def (list a b) (list 1 2))
Which will bind "a" to 1 and "b" to 2. And as said, it's completely customizable: just slap a %pattern-match property onto any object.
And because Nulan is a vau-based language with first-class environments, you can use "def" inside vaus to create local bindings:
(fn ()
  (def a 5)
  a)
This is like "var" in JavaScript only much better.
---
* [1]: The reason the Racket version is a bit funky is because the "pattern-match" function is written in Nulan style. Here's how "def" would be written in Nulan:
(def def
  (vau e {n v}
    (let v (eval e v)
      (set! e (pattern-match (unbox e) n v))
      v)))