Arc Forumnew | comments | leaders | submitlogin
3 points by rocketnia 5522 days ago | link | parent

The way I see it--cue rant--the compiler is looking for information about the runtime binding of a symbol at compile time. Specifically, it wants to know how the velcro in a velcro call treats its arguments. This is the job of a static type system.

Suppose we have something like this:

  (def compose (f g)
    (fn (x) (f (g x)))
I don't know if this is true about Eight, but I suspect that you can pass macro-like velcros to this version of compose and expect to observe the expressions 'x and '(g x). Maybe that makes this 'compose have a sloppy API, but it also makes it a good example.

The type of 'compose might be expressed as something like this:

  (all-of
    ; In the case where f and g both evaluate their parameters, it's the type
    ; you'd expect.
    (for-all-types f-res g-res x-type
      (->
        (at-least (-> g-res f-res))
        (at-least (-> x-type g-res))
        (-> x-type f-res)))
    ; When g doesn't evaluate its parameter but f does, the symbol 'x is passed
    ; directly.
    (for-all-types f-res g-res x-type
      (->
        (at-least (-> g-res f-res))
        (at-least (-> (unevaluated x-type) g-res)
        (-> x-type f-res)))
    ; When f doesn't evaluate its parameter, g doesn't even have to be a velcro.
    ; The unevaluated expression '(g x) is passed directly.
    (for-all-types f-res g-type x-type
      (->
        (at-least (-> (unevaluated (list g-type x-type)) f-res))
        g-type
        (-> x-type f-res))))
  
  ; Notes on the forms used here:
  ;
  ; all-of:
  ;   A procedure that constructs the type of a velcro that can take any of the
  ;   given roles, where the roles are themselves velcro types.
  ;
  ; for-all-types:
  ;   A quantifier form that specifies the type of a velcro that can take any of
  ;   several roles, which have specifications that vary only by one
  ;   or more unrestricted types. The types quantified over are given as the
  ;   first parameters, and the way to construct the signature based on the
  ;   types is given as the last parameter.
  ;
  ;   I think for-all-types can be implemented as a macro by making "unification
  ;   placeholder" types for all the quantified-over types and then wrapping the
  ;   result in a "unification start" type. The point is that the resulting type
  ;   object needs to be able to determine, given unevaluated parameters,
  ;   whether it accepts these parameters and, if it does, what type the result
  ;   is. In this case, it uses pattern matching based on a template type.
  ;
  ; ->:
  ;   A procedure that constructs a simple velcro type in terms of the types of
  ;   its parameters followed by the type of its result. Note that this example
  ;   doesn't take varargs into account.
  ;
  ; at-least:
  ;   A procedure that constructs the type of a velcro which can take the given
  ;   role, but which might also take other, unspecified roles. The result of
  ;   this can only be used as a parameter type, not as a return type, since
  ;   it can't be used to determine the result type of an invocation that falls
  ;   under one of the unspecified cases.
  ;
  ; unevaluated:
  ;   A procedure that constructs a type that describes unevaluated input to a
  ;   velcro. The result of this will always be a simple wrapper type that is
  ;   the "unevaluated" version of some other type. (As with 'at-least, this
  ;   kind of result can only describe a parameter, but for a different reason.)
  ;   However, this doesn't simply wrap its argument.
  ;
  ;   If the input to 'unevaluated is a type, inside the wrapper will be the
  ;   type of a syntax-symbol that is bound to a value of that type.
  ;
  ;   If the input to 'unevaluated is a cons cell, it's recursively applied to
  ;   the car and cdr, then it unwraps those two results, and inside the new
  ;   wrapper will be the type of a syntax-cons with those types of contents.
  ;
  ;   If the input to 'unevaluated is nil, inside the wrapper will be the type
  ;   of the syntax-literal (or whatever) that appears when Eight's nil value
  ;   arrives somewhere unevaluated.
  ;
  ;   No other kinds of input to 'unevaluated have specified results.
Actually, considering that a velcro that evaluates its arguments is the same as a velcro that says it doesn't evaluate its arguments, then proceeds to evaluate its arguments in the normal order before continuing, this can be simplified to just the last case:

  (for-all-types f-res g-type x-type
    (->
      (at-least (-> (unevaluated (list g-type x-type)) f-res))
      g-type
      (-> x-type f-res)))
This is a start, but things can get a bit more difficult.

  (def some-fun ()
    (withs (it (some-calculation)
            start-part (extract-start it)
            ; We're not going to do anything with the humongous middle part.
            stop-part (extract-stop it))
      (aif (some [pos _ start-part] stop-part)  ; This could take a while.
         (* 2 it))))
It would be nice to be able to garbage-collect the result of (some-calculation), together with its "humongous middle part," as soon as extract-stop no longer needs it. Unfortunately, the "it" symbol that appears down in the aif body is currently a closure over that value--even though we as readers know it's going to be rebound at run time by 'leak--so it can't be let go of yet. In this sense, the aif call has "captured" 'it even without the full power of variable capture. (And from the paranoid programmer's standpoint, the aif implementation could even secretly log the previous value of 'it, so there is a sort of variable capture potential here.)

In order for the compiler to optimize this, it needs to be able to ask the aif type what free variables any given invocation has. And in order to do that, it needs to see how unevaluated parameters are used in aif's body. If they're stored or passed to undetermined functions, for instance, then they might be evaluated secretly, and no optimization can take place. Conversely, if the compiler can understand that things like ",(leak 'it then)" and ",(car elses)" will have certain effects, then aif is just fine.

If the compiler does inference like this, then the type system will be slightly different. If we make an appropriate update to the behavior of 'unevaluated in the type system I was using above, the revised version of compose's type might be expressed as follows:

  (for-all-types f-res g-type x-type
    (->
      (at-least
        (->
  
          (unevaluated 'g g-type 'x x-type
            '(g x))
          ; This is an unevaluated type such that 'g is bound to a g-type if not
          ; leaked with some other value, 'x is bound to an x-type if un-leaked,
          ; and '(g x) is the unevaluated expression itself.
  
          f-res))
      g-type
      (-> x-type f-res)))
In order to determine what things like ",(leak 'it then)" and ",(car elses)" can preserve this information, the compiler writer can pick and choose which core functionality like 'leak and 'car is appropriate for expanding at compile time, and these particular expressions can work. However, if someone hacks together a new operation that works only somewhat like 'leak, then it may not be so easy for the compiler to make that determination. At that point, if the programmer even notices the lack of garbage collection--maybe they're storing serializable continuations and finding that they hold lexical references that they'll never use--and even if they figure out exactly where their problem is, it's because they're pushing the limits of syntax already, so they probably have no way to correct it without redesigning the program.

These programmers could use compiler hints. At the simplest level, the compiler might support top-level declarations like (declare leak quick-syntax-tranformer), and that would probably do the trick.

That's all. One kind of 'declare annotation is enough to cover any reasonable challenges I can think of for static analysis of Eight code. I'm sure things could get quite a bit more challenging than that, but this should at least get you to the level of compiling to a macro-less format, which can then be optimized every which way by mainstream compiler techniques.

Incidentally, on my site, rocketnia.com, is an interpreter for a toy lisplike (unusable, even--no lambda or defun yet) that treats every function as a "micro" that takes unevaluated arguments and evaluates them manually, with 'quote as a built-in micro that doesn't take that extra step, and with 'doublequote as a built-in micro that doesn't even parse its arguments. That was made sometime in 2006, I think. I'm also pretty sure I've seen another lisplike since then that does the same thing, although it might have actually been Eight that I saw.

A couple of weeks ago the problem of compiling languages with micros/velcros became really interesting to me, and I came up with the approach you see here. The language I was thinking about this time may not be exactly the same thing--it's inspired more by an everything-is-a-multimethod theme--but multimethods that are macros and procedures at once amount to the same difficulties, so that that's why I've been able to ramble on for so long. ^_^



1 point by diiq 5519 days ago | link

Sorry, I'm still not understanding your proposal. From what I can tell, your example assumes that we can fully type the outputs of functions as well as inputs; I don't see that as a particularly easy thing. Just like any duck-typed language, the return type can be dependent on runtime data. Take, for instance:

    (def pathologic ('a b)
        (if b 
           (fn (c 'd) (,a c ,d))
            a
           (fn ('c d) (,a ,c d))
            5))
And then:

    ((pathologic one-thing another) (list 3 4) '(a b c))
I can't type the function in this expression until the time of evaluation; the same is generally true of higher order functions.

In response to declare leak: I'm not concerned with optimization right now, more with the parsimony and elegance of the language.

In response to your interpreter &c., yes, the idea of first-class macros isn't new. Picolisp has been doing it since at least 2002. As far as I know, Eight is the only language that does so while maintaining lexical scope.

-----

1 point by rocketnia 5518 days ago | link

Right, things can get pretty hairy, and sometimes it won't be possible for the compiler to figure out whether a given expression will be needed unevaluated at runtime, in which case those expressions need to be preserved somewhere in the compiler's result, just in case. The preserved code could still be compiled, too, if the application's file size isn't as important as its speed, or if doing that would save from having to bundle an interpreter in there too. (Take a look at 'compose. If it's exposed in a compiled module, the code (g x) has to be preserved.)

But even for your 'pathologic example, a compiler might be able to determine that the result of the expression must be some result that one-thing could produce. That's because the 'a parameter, 'one-thing, is known at compile time to be a true value (since it's a symbol), and so both of the possible function types successfully accept two parameters and result in something 'a (one-thing) would result in when called.

This might be a bit tough for a compiler to do until it's really mature, but it does fit within the scope of a type system. In particular, the type of (if b x a y 5) might be expressible using an "if" type combinator, which I imagine can be manipulated and simplified as long as there's a way to ask a type whether its values are guaranteed to count as true or false.

More and more pathological examples could be built, though, and that's the halting problem for you.

I'm not concerned with optimization right now, more with the parsimony and elegance of the language.

As it should be. ^_^ Compiling is nothing if not one big optimization, though, so optimization was what I was talking about.

In a non-optimization defense of declare quick-syntax-transformer, it could also potentially make IDE integration slightly better, since the IDE would be able to determine more about the code being generated.

But yeah, it's not very elegant to have something in the language that programs appear to work just as well without, and it's not very parsimonious to have something in the language spec which is likely to mean nothing to lots of implementations. It's just something to keep in mind if there turns out to be a clever tweak to the language that'll help everybody.

-----

1 point by diiq 5518 days ago | link

Compiling is nothing if not one big optimization, though, so optimization was what I was talking about.

Ah. There's our confusion. I define compiling as the transformation of code in one language to code in another (lower level) language.

Sure, Eight can be optimized. I don't see how it can (by my definition) be compiled.

-----

1 point by rocketnia 5518 days ago | link

I was thinking about adding "(and translation)" to that sentence, but I thought that was the easy part. After all, you can just write an interpreter that compiles to the lower level language and package it up with every Eight program. It isn't an especially insightful translation, perhaps, but it's a start.

-----

1 point by diiq 5517 days ago | link

Yes, I concede --- you're right. I should be thinking in terms of partial evaluators, rather than full compilation.

With your permission, I'd like to cp this conversation to the Eight github-wiki, so we can stop cluttering up the Arc Forum ;)

-----

1 point by rocketnia 5516 days ago | link

Sure, that's fine by me.

Speaking of partial evaluators... I was thinking about this topic some more yesterday, in regards to my everything-is-a-multimethod language idea, and I came up with this. I imagine having

  (aif a b c d e)
translate something like this:

  (%apply %compile-to-apply '(aif a b c d e))
Here, %apply is a special value that the compiler knows will evaluate all its parameters, then apply the first parameter to the next. Meanwhile, %compile-to-apply actually breaks down the syntax passed to it so that there's a simpler %apply to execute. Since %apply, %compile-to-apply, and '(aif a b c d e) are constants, a partial evaluator can simplify this to:

  (%apply (%compiler-of aif) '(a b c d e))
Here, %compiler-of is yet another compiler-internal procedure that looks at the signature of its argument in order to produce another procedure that does the actual work of compiling a parameter list to a simpler %apply form. Since (%compiler-of aif) is usually a constant and doesn't usually have side effects, this can be evaluated some more at compile time:

  (%apply ??? a 'b '(c d e))  ; where ??? is an implementation of aif's body
Even if aif isn't a constant, as long as the compiler knows enough about what aif will be (its type), it might be able to reduce (%compiler-of aif) anyway.

Now is when (%apply ??? a 'b '(c d e)) would be inlined if possible, in the hopes of compiling '(c d e). With all of Eight's 'leak, 'comma, and 'asterix, I think that might take a few more techniques than I've thought about. My language's approach to afn would probably work a bit more like this:

  (%apply (%compiler-of aif) '(a b c d e))
  (%apply ??? (list (fn () a)
                    (fn (it) b)
                    (fn () c)
                    (fn (it) d)
                    (fn () e)))
...and therefore avoid having to working with syntax in the method body. Then again, this language is probably going to be considerably more mind-melting to try to read. Here's a generous mockup of the kind of thing I have in mind for the base language (which I'm calling Blade):

  (def (aif (-repeat branch (condition) (consequence it))
            (-optional (else) (fn ())))
    (with-first (finding-on branch
                  (= it (condition)))
      (consequence it)
      else: (else)))
I'm not sure whether you actually wanted to know any of this stuff, heh, but I hope it helps. ^_^

-----

1 point by rocketnia 5519 days ago | link

Your example brings up an interesting side point: 'fn has a special interaction with , and *? Otherwise, ,d and ,c are referring to top-level things. Sounds like 'let probably has the same interaction, since if you could do (let it it ,then), you wouldn't need ,(leak 'it then). Is this something velcros can do too? Something like (def my-let ('var val ... ''body) ...)? ^_^ I'm probably going to check out the Eight interpreter myself at some point, but I figured I'd ask.

-----

1 point by diiq 5518 days ago | link

, and * are just read-macros. ,a -> (comma a) a -> (asterix a)

The lambda-list of fn* is: ('lambda-list ... 'body) so ,c and ,d are not evaluated at the time of function creation.

Let is defined in terms of fn.

I'm not sure what behavior you're hoping for from that double quote.

-----

1 point by rocketnia 5518 days ago | link

Ah, the fact that they're shorthand is what I expected, I was under the impression that even so, they were expanded before the velcro was finally called. Otherwise

  (aif *elses)
seems like it wouldn't work. Time for me to go get the interpreter.

-----

1 point by diiq 5522 days ago | link

Thank you for spending such a non-trivial amount of time explaining this! You'll have to forgive me if I take a day or two to absorb your logic before I can properly respond.

-----