Arc Forum
The Curse of Chains: How the informal, ad-hoc, ill-specified multimethod dispatch hurts Arc
11 points by almkglor 6017 days ago | 15 comments
Arc supports dynamically redefining functions by using something like the following pattern:

  (let old car
    (= car
       (fn (c)
         (prn "car!")
         (old c))))
This pattern has been captured in the Anarki 'redef:

  (redef car (c)
    (prn "car!")
    (old c))
It's a simple and direct way of overloading a function, for example to add handling for your own types. This style is known as "chaining": if you can't handle the case, you fall through to the "old" version.
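The same chaining idiom can be sketched in Python; `MyVector`, `old_len`, and `chained_len` are illustrative names only, not part of Arc or Anarki:

```python
# A rough Python analogue of Arc's chaining idiom.
class MyVector:
    def __init__(self, size):
        self.size = size

old_len = len  # capture the "old" definition, like (let old car ...)

def chained_len(x):
    # Handle our own type; otherwise fall through to the old version.
    if isinstance(x, MyVector):
        return x.size
    return old_len(x)

len = chained_len  # "redefine" len; a later redef would wrap this one
```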

Unfortunately, this informal method of overloading can go badly wrong. The failures are edge cases, but the last thing you want in a language is to have to handle edge cases.

For example, consider a vector type implemented underneath as a table:

  (def vector ()
    (annotate 'vector (table)))
  (defcall vector (v i)
    (if (and (isa i 'int) (>= i 0))
        (v i)
        (err:string "Vector expects a non-negative integer - " i)))
  (redef sref (v val i)
    (if (isa v 'vector)
        (sref (rep v) val i)
        (old v val i)))
  (redef len (v)
    (if (isa v 'vector)
        (+ 1 (or (best > (keys:rep v)) -1))
        (old v)))
  (redef keys (v)
    (if (isa v 'vector)
        (let max (best > (keys:rep v))
          (if max (range 0 max)))
        (old v)))
For convenience, you can also redefine 'cut:

  (redef cut (v start (o end (len v)))
    (if (isa v 'vector)
        (let newv (vector)
          (if (< end 0)   (zap + end (len v)))
          (if (< start 0) (zap + start (len v)))
          (for i start (- end 1)
            (= (newv (- i start)) (v i)))
          newv)
        (old v start end)))
It'll work, right?

Unless you happen to have done this before you loaded the above:

  (require "lib/scanner.arc")
Why? Because scanners can be infinite:

  (def generator (f init)
    (scanner 'car init
             'cdr (generator f (f init))))
scanner.arc's 'cut, in fact, will work properly with infinite lazy lists like the above:

  (= foo (generator [+ 1 _] 0))
  (= bar (cut foo 2))
  (car bar)
  => 2
Unless, that is, you redef cut as in the 'vector example. Then it'll fail, because vector's 'cut attempts to get the length of an infinite list!

Of course, nobody uses scanners anyway. YET. But what if someone does? Then we suddenly get all sorts of very strange behavior when we start mixing libraries: ouch.
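The hazard can be sketched in Python, with generators standing in for scanners (`generator` and `lazy_cut` are illustrative names, not scanner.arc's actual API):

```python
import itertools

def generator(f, init):
    # Infinite lazy sequence: init, f(init), f(f(init)), ...
    while True:
        yield init
        init = f(init)

def lazy_cut(seq, start, end=None):
    # scanner.arc-style cut: never asks for the length, so it is
    # safe even when seq is infinite.
    return itertools.islice(seq, start, end)

bar = lazy_cut(generator(lambda x: x + 1, 0), 2)
# A vector-style cut whose end defaults to (len seq) would hang here,
# because taking the length of an infinite sequence never returns.
```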

Or take another example. Suppose our 'redef of a function is buggy:

  (redef cut (seq start (o end (lne esq)))
    (...))
The problem is that we can't easily remove it: because of chaining, the bug is now embedded in the system. Even after we correct the definition and reload it, the buggy link remains in the chain; instead of replacing the handling, we're just chaining again, so everything after the buggy link is still affected by the bug.

This is one reason why we need a better multimethod dispatch scheme.



3 points by nlavine 6016 days ago | link

It seems to me that Arc provides a solution to this problem in the fact that you can call any data object as a function, and define what it means to do so. What we're really getting at, though, is a theoretical issue with what a function is.

The easiest solution to your problem might be a callable table whose keys are the types of arguments and whose values are different function implementations. A slightly cooler implementation could provide arbitrary predicates by having a list of (predicate, implementation) pairs. When it was called, it would iterate down until it found a predicate that matched the arguments, and then call that implementation. You could make either of these pretty trivially.
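The predicate version can be sketched in Python (`PredicateFn` is a hypothetical name, not an existing Arc construct):

```python
# Sketch of the "list of (predicate, implementation) pairs" idea.
class PredicateFn:
    def __init__(self):
        self.clauses = []  # ordered (predicate, implementation) pairs

    def add(self, pred, impl):
        self.clauses.append((pred, impl))

    def __call__(self, *args):
        # Iterate down the list until a predicate matches the args.
        for pred, impl in self.clauses:
            if pred(*args):
                return impl(*args)
        raise TypeError("no matching clause")

area = PredicateFn()
area.add(lambda s: isinstance(s, tuple), lambda s: s[0] * s[1])
area.add(lambda s: isinstance(s, (int, float)), lambda s: s * s)
```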

However, this gets at the idea of what a function really is. A theoretical "pure lambda" is something you can apply to other objects and maybe get a result from. I think we need these in the language, to support primitive types and a foreign function interface.

When you talk about chaining, however, you're thinking of something like a linked list of these things, and you want to manipulate the structure of the list. That's already getting away from the pure lambda idea. My ideas get farther away, and at the risk of looking foolish, I'm going to claim that anything that solves your problem also includes a way to get inside a function object and mess with its internals. The question then is, what should a function be? What should you be able to do to one? Here's a short list of what we've talked about:

  ; f is an extended function object.
  ; the following should work.
  (apply f ...) ; call f on some args
  (redef f (argtyps) ...) ; redefine the behavior of f on some argument types, or add a new implementation if one isn't already there.
  (extended-function
     argtyps imp
     ...) ; takes pairs of argtypes and implementations, and makes an extended function object.
You also need the primitives still, so you'll want this:

  ; f is some sort of object.
  (callable? f) ; see if f is any sort of callable
  (primitive-procedure? f) ; see if f is a pure lambda. this might be a bad name for this function.
There would be primitive constructors for assembly code and other languages.

  (asm ...) ; takes asm, returns a primitive-procedure
  (from-shared-object file name) ; loads a primitive from a library. this should probably be renamed too
  (from-c-header header name impfile) ; loads a primitive, but deals with types nicely by reading a C header file that defines this.
Is there anything else?

-----

3 points by almkglor 6016 days ago | link

> The easiest solution to your problem might be a callable table whose keys are the types of arguments and whose values are different function implementation.

Yes. But there's a problem: how do you define the table for, say, the variadic function '+ ?

Your solution(s) are approximately what I had in mind, although I think I found some problems with it last night, and I just woke up and can't remember quite yet (brain is still booting or something).

-----

3 points by almkglor 6016 days ago | link

Okay, brain is up and running.

One way to do this is to use settable-fn.arc and have some sort of attachment for a table of argument signatures.

HOWEVER, we will need to formalize which part of the argument signature we dispatch on.

For example we can consider nex3's 'defm syntax:

  (defm foo ((t arg type) (t arg2 type2))
     ...)
But then, what about optional parameters?

  (defm foo (arg (o arg2 something))
    ...)
Also: consider the case of 'coerce . It would be trivial to convert from our type to a target type:

  (defm coerce ((t arg our-type) target)
    (case target
      int    (our-type-to-int arg)
      string (our-type-to-string arg)
      ...))
However, how about when we want to convert to our type? How do we express (coerce (string something) 'our-type) ?

-----

3 points by almkglor 6013 days ago | link

Okay, here's the solution I've been thinking of.

  (def coerce (obj typ . rest)
    (apply <base>coerce obj (annotate typ nil) rest))

  (defm <base>coerce ((t obj my-type) (t _ string))
    (convert-my-type-to-string obj))

                                     ; edit: corrected type
  (defm <base>coerce ((t obj string) (t _ my-type))
    (convert-string-to-my-type obj))
Also, for variadic functions:

  (def + rest
    (reduce <base>+ rest))

  (defm <base>+ ((t x string) (t y string))
    (join x y))
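The reduce trick can be sketched in Python, with a dispatch table keyed on the pair of argument types (`base_add` and `plus` are hypothetical names):

```python
from functools import reduce

# Variadic + built by reducing a binary <base>+ that dispatches
# on both argument types.
base_add = {
    (int, int): lambda x, y: x + y,
    (str, str): lambda x, y: x + y,  # like (join x y) for strings
}

def plus(*rest):
    return reduce(lambda x, y: base_add[(type(x), type(y))](x, y), rest)
```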
Further, optional parameters are simply ignored and considered part of the rest parameter (i.e. they can't be typed). Basically, the type system matches on the first N parameters, where N is how many typed parameters you care to define.

Why the first N parameters? So that we can protect against optional parameters defaulting to values that are inappropriate for other types, such as in the 'scanner example I gave above.

-----

3 points by rntz 6015 days ago | link

Ideally, multimethod dispatch would be unified with pattern matching/destructuring. After all, pattern-matching (always?) implies the argument is of a given type. So functions would be "open pattern-matchers"; instead of just defining a function once, you keep adding patterns and their associated code. The problem is precedence: if multiple patterns match, which one do you use? If you want to integrate "inheritance" into this (given pg's views on OO, he probably doesn't), it gets even more complex.

The ideal solution, IMO, is that "implication means precedence". If we have patterns A and B, then if A(x) (if A matches pattern x; I'm treating patterns as predicates here) implies B(x) but not vice-versa, A has precedence over B. This is a superset of the traditional OO "inheritance" model; model types as predicates for membership in that type, and being a member of a subclass implies being a member of the superclass.
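A toy Python sketch of "implication means precedence", using subclassing as the implication relation (matching `Dog` implies matching `Animal`, so the `Dog` clause must be tried first; all names are illustrative, and sorting by MRO length is only a stand-in for a real implication check):

```python
class Animal: pass
class Dog(Animal): pass

clauses = [
    (Animal, lambda x: "generic animal"),
    (Dog, lambda x: "dog"),
]
# Most-specific first: a longer MRO means membership implies
# membership in every class above it.
clauses.sort(key=lambda c: len(c[0].__mro__), reverse=True)

def speak(x):
    for typ, impl in clauses:
        if isinstance(x, typ):
            return impl(x)
    raise TypeError("no matching clause")
```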

I have, of course, absolutely no clue how best to implement this. A minilanguage for pattern-matching predicates and the implications among them would be the simplest solution I can think of. The "right way" would probably involve integrating the type system into the language, along the lines of what Qi has done.

This "solution" is also not necessarily unambiguous. A(x) and B(x) may both match without any implication relationship. I suppose some sort of auxiliary specification, a la the C3 MRO (see http://www.python.org/download/releases/2.3/mro/) would be necessary.

However, the above sounds like something you'd want to build on top of an existing function system; otherwise you're certainly not following pg's ideal of an "axiomatic" approach to language design.

-----

3 points by stefano 6016 days ago | link

The source of this problem is not only chaining itself: redefining standard functions will almost always lead to problems when distinct libraries redefine the same function. That's why other languages have module systems. Moreover, chaining raises another problem: performance. The longer the chain, the more function calls it takes to reach the original behavior. Were you thinking about a multimethod dispatch such as that of CLOS?

-----

3 points by almkglor 6016 days ago | link

> The source of this problem is not only in chaining itself: redefining standard functions will almost always lead to problems when distinct libraries redefine the same function.

But this should be supported. For instance, in C++, you can redefine the "standard function" operator+ :

  Foo operator+(const Foo &a, const Foo &b){
      // your own behavior for Foo + Foo
      return Foo();
  }
Or in Ruby, you can redefine the "standard function" each:

  class Foo
    def each
      yield 1  # your own iteration for Foo
    end
  end

  Foo.new.each{ |i| puts i }
C++ does it by strict static typing, while Ruby does it by attaching to the object. Neither way feels very Arcish; what is a better Arc solution?

> Were you thinking about a multimethod dispatch such that of CLOS?

Possibly, if we can hack this into Arc somehow.

-----

3 points by stefano 6016 days ago | link

The CLOS solution is pretty similar to the Ruby solution: it checks the types passed to the function and dispatches to the correct implementation. The big difference is that CLOS looks at all the arguments, while Ruby looks only at the first (the object). The CLOS approach seems to me the best for solving the problem, but I think that applying it to functions as basic as car would kill performance without a really good optimizer.
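The difference can be sketched in Python by keying a method table on the types of all arguments (`methods`, `defmethod`, and `call` are hypothetical names, not CLOS's actual machinery):

```python
# CLOS-style dispatch keys on the types of ALL arguments;
# Ruby-style dispatch would key on the first argument only.
methods = {}

def defmethod(name, *types):
    def register(impl):
        methods[(name,) + types] = impl
        return impl
    return register

def call(name, *args):
    impl = methods.get((name,) + tuple(type(a) for a in args))
    if impl is None:
        raise TypeError("no applicable method")
    return impl(*args)

@defmethod("scale", int, str)
def _(n, s):
    return s * n

@defmethod("scale", str, int)
def _(s, n):
    return s * n
```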

-----

3 points by almkglor 6016 days ago | link

> The CLOS solution is pretty similar to the Ruby solution: it checks the type passed to the function and dispatch to the correct function. The big difference is that CLOS looks at all the arguments, while Ruby only at the first argument (the object)

IMO the difference is big enough for CLOS Way != Ruby Way. ^^

Still, I wonder - how does CLOS implement this? How about for variadic functions?

> but I think that applying it to functions as basic as car would kill performance without a really good optimizer.

Hmm. I've been trying to grok through dynamic dispatching speedup techniques - the Sun Self papers are pretty good, Sun's index doesn't have the papers themselves but you can ask Google for them.

-----

2 points by stefano 6015 days ago | link

> Still, I wonder - how does CLOS implement this?

I think that for every method with the same name a hash table indexed on the types of the arguments is created, and when the method is called, a lookup is made on that table.

> Hmm. I've been trying to grok through dynamic dispatching speedup techniques - the Sun Self papers are pretty good, Sun's index doesn't have the papers themselves but you can ask Google for them.

I found those papers a few weeks ago, but I've never found the will to read them: they are written for people already in the compiler world, and I'm still learning the basics about compilers. BTW, a good type inferencer + a good function inliner could solve the problem, but I wouldn't even know where to start implementing them :(.

-----

2 points by almkglor 6014 days ago | link

The gist of the papers is mostly this:

Use a "Polymorphic Inline Cache" (PIC). Basically, if a piece of code could call several different methods, we figure out which one it is, then we create a copy of the calling function that does the type checking at the top and specializes all method calls to that type:

  (defm method ((t n my-type))
    (my-type::method n))
  (defm method ((t n your-type))
    (your-type::method n))
  (defm other-meth ((t n my-type))
    (my-type::other-meth n))
  (defm other-meth ((t n your-type))
    (your-type::other-meth n))

  (def general-function (n)
    (+ (method n) (other-meth n)))

  (general-function (your-type-creator 42))
  ===>
    (do
      (def general-function (n)
        (if (isa n 'your-type)
          (+ (your-type::method n) (your-type::other-meth n))
          (+ (method n) (other-meth n))))
      (general-function (your-type-creator 42)))
Everything else in the papers that goes beyond the PIC is mostly about debugging and making the PIC lazy.
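A toy Python sketch of the caching part of the idea (`make_pic` and `specialize` are illustrative names; a real PIC rewrites the call site rather than wrapping it):

```python
# After a type is seen often enough at a call site, cache a
# specialized implementation keyed on that type.
def make_pic(generic, specialize, trigger=2):
    cache, counts = {}, {}
    def dispatch(x):
        t = type(x)
        if t in cache:
            return cache[t](x)  # fast path: specialized version
        counts[t] = counts.get(t, 0) + 1
        if counts[t] >= trigger:
            cache[t] = specialize(t)  # hot call site: specialize it
        return generic(x)
    return dispatch

def generic_double(x):
    return x + x  # dynamically dispatched "slow" path

fast_double = make_pic(generic_double, lambda t: lambda x: x + x)
```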

Edit:

Okay, I've been thinking. Basically the call* table is about specializing on 'apply, and we might think of 'defcall as:

  (defcall type (val . params)
    (code))
  ==>
  (defm apply ((t val type) . params)
    (code))
Could we possibly generalize this at the language level and make a PIC, say in arc2c/SNAP?

-----

1 point by stefano 6014 days ago | link

To do the optimization when we see

  (general-function (your-type-creator 42))
we need a type inferencer to discover the return type of your-type-creator. The cache should also be able to change. For example I could write:

  (general-function (your-type-creator 42))

  (set your-type-creator another-type-creator)

  (general-function (your-type-creator 42))

Now the optimization doesn't work if the cache doesn't change. This seems a rare case, though, and it's useless to optimize rare cases.

-----

1 point by almkglor 6013 days ago | link

Actually we don't: general-function is actually defined this way:

  (with (PIC (table) ;init empty table
         num-calls 0
         orig-fn
         (fn (n)
           (+ (method n) (other-meth n))))
    (def general-function (n)
      ((aif
         ; don't infer type: just look at the runtime type
         (PIC:type n)
            it
         (is optimization-trigger-level** (++ num-calls))
            (do (= num-calls 0)
                (= (PIC:type n)
                   (optimize* orig-fn (type n))))
            orig-fn)
        n)))
Basically s/type inference/just look at it directly/

-----

1 point by eds 6014 days ago | link

> Still, I wonder - how does CLOS implement this? How about for variadic functions?

I don't think CLOS lets you check types on &rest, &optional, or &key parameters. So you couldn't use CLOS for the current behavior of '+.

Also note that CLOS only works on methods with "congruent lambda lists", that is, methods with the same number of required, optional, and keyword arguments. So you can't have

  (defmethod foo ((a type-a)) ...)
  (defmethod foo ((b type-b) &optional (c ...)) ...)

-----

1 point by almkglor 6014 days ago | link

ah, I see. So I suppose this greatly simplifies things then.

Hmm. This certainly seems easier to hack into arc. We could have each method start with a generic function whose lambda list we have to match.

As an aside, currently the base of an Arc lambda list is simply a rest parameter (i.e. optional parameters are converted into rest parameters with destructuring). Should we match on the plain rest parameters, or should we properly support optionals?

-----