Arc Forum | Poll: I'm stuck with optimized compilation, what's your advice ?

Arc Forum

Poll: I'm stuck with optimized compilation, what's your advice ?

3 points by sacado 6315 days ago | 14 comments

I'm still working on my "speed-up-things" compiler, you know, the Psyco-like, but not quite. Hmm... let's call it Arco.

I was thinking about adding a bit of type inference behind it. After all, in an expression like (+ s "foo" "bar")), you know that s is a string as the + operator is designed to concatenate string, append lists or add numbers. That way, compiling functions would be easier and more efficient (we could deal with lambdas inside lambdas for example). Well, I found quit a big problem.

Operators can be redefined to deal with other types (for example, + on characters could build a string), but that's not a problem as the operator's behavior is conservative : it still works the old way too.

But it can be redefined in a destructive way, as in (= + -). Adding two numbers does not behave the same as before, and you can't inline the use of the + operator. That's why mzscheme code is faster inside modules : predefined operators cannot be redefined, so (+ a b) can be safely inlined. Outside modules, well, the norm says you can destroy the basic operators, so, the interpreter is screwed, it cannot inline things.

My solution, as hinted by almkglor, was to say : "hey, you only compile the functions of your choice, so just get sure you don't redefine the existing operators destructively". That way, the + applied on numbers can be translated to mzscheme's +, itself inlined as the Arc code is in a module.

As long as you declare types, that's easy : you know the type of everything and know if you can inline or not. Infering types looks quite easy too : in +, all types must be the same : (+ x 1) means x is a number, (+ x "foo") means it's a string, etc. But, wait, I could redefine the operator "+" to work on a new type of mine, an amount of money for example. Thus, (+ x '(1 euro)) would add 1 euro (ok, that still works), but you could also provide the shortcut (+ x 1) meaning "add one unit of the current currency". There, we're screwed. And forbidding such things is out of question.

There a a few solutions :

- inserting type declaration inside the code you want to optimize, à la CL, where you can get into the details to have the most optimized code as possible,

- using a Stalin-like model : once the compilation is called, we're in a closed world ; the whole system is compiled, and we know which operators cannot possibly change ; if something is changed in the REPL (a new definition, a modification, ...), you get back to the uncompiled version and have to recompile the whole thing if you want to (not to be used when debugging or designing),

- not dealing with optimized compilation anymore, the compile facility already present is enough to get acceptable performance (we're faster than Python for example),

- insert your idea here.

My favorite would be the second solution. Very hard to implement efficiently, but we can go step after step so that's not really a problem.

Anyway, what's your advice ?

	à la CL
	6 points
	à la Stalin
	4 points
	don't touch anything
	1 point
	I have a better idea
	1 point
	with a hook in 'set and 'safeset
	1 point

2 points by nlavine 6315 days ago | link

Stalin, but there's no need to make the whole program freeze. If compiled code needs to inline operators, just inline them as they are in that particular function, and let the rest of the world keep on changing. If it turns out to matter later, you can always say (recompile function) or something like that. Maybe (update-internal-defs function) would be better.

It would make it a lot easier if you just kept the code to all interpreted functions you defined around, so you could just inline them with whatever definition happened to be around. Keeping the code around is a good thing anyway, though, so that's all right.

(Note: I do not mean keep the code to compiled functions. That would be bad, because compiled function objects are exactly how an ffi is going to get into arc later. We don't want to start assuming that everything has sexps. This does make update-internal-defs a little weird, though. Even though they don't necessarily have sexps, I think they do need to have a table listing everything from the environment that they use.)

-----

3 points by stefano 6315 days ago | link

This way, the best we can do is to bring Arc's performance nearer MzScheme's performance. If we really want Arc to be fast we should implement a real Arc -> binary code compiler. Even without optimizations the code would result fast enough. The problem is that such a task requires a lot of skills and a lot of time.

-----

2 points by eds 6315 days ago | link

Maybe you could set a hook on 'safeset that would trigger Arco whenever you changed a variable holding a function definition? Then you could reanalyze the function and produce a new optimized definition. In between you would be able to assume that the function hadn't changed à la Stalin. Or am I missing something that would prevent this from working properly?

If that isn't possible, I guess CL style is the next best. But I don't like how verbose CL type declarations are. I don't really want to code C in Arc.

Of course, a native code compiler in Arc would provide a huge boost in speed, but I think that is a long term goal.

Another random idea: if you wrote a version of Arco for CL-Arc you could take advantage of CL's type declarations. This could result in even greater boosts in speed. Just a thought.

-----

1 point by almkglor 6315 days ago | link

When someone evil doesn't use safeset:

  (= +
    (fn args
      "I am EVIL!!"))

Hmm. Maybe put the hook in 'set instead?

Also, not sure, but how about when it's given an already-compiled-but-not-optimized function?

  (= foo (fn args "I am EVIL!!"))
  (= + foo)

Not sure about how arco is structured anyway ^^;

-----

1 point by eds 6315 days ago | link

I chose 'safeset only because that is what is used internally in 'def. But I see that you would need to watch 'set to be completely sure no redefinitions had occurred.

So if you put a hook in 'set, and watched for local variables (really function parameters, because 'let is implemented in terms of 'fn) overriding function names, that would probably be enough to know a function hadn't changed. And that would hopefully remove the need for CL style type declarations.

Maybe this should be added as an option to the poll?

-----

1 point by almkglor 6315 days ago | link

In fact, yes you really have to watch for locals either way - it's entirely possible to do this (I have, since I wanted to package Arki in a module-like object):

  (let (localfn localfn2
        help* sig source-file*) nil
    ; protect 'def from bashing Anarki help docstrings
    (= help* (table) sig (table) source-file* (table))
    (def localfn ()
      (prn "localfn!"))
    (def localfn2 ()
      (prn "!2nflacol"))
    (def globalfn ()
      (localfn) (localfn2)
      (prn "haha only global!")))

-----

1 point by eds 6314 days ago | link

Is that multi-variable form of 'let an Anarki thing? I don't remember seeing it before. It does look really cool though.

-----

1 point by almkglor 6313 days ago | link

It's part of the Arc core destructuring thing. For example:

  (let (x y z) (list 1)
    (prn x)
    (prn y)
    (prn z))

...will assign 1 to x, and nils to y and z. Basically it seems that destructuring is done in the following manner:

  (let gs4932 (list 1)
    (let x (car gs4932)
      (let y (car:cdr gs4932)
        (let z (car:cdr:cdr gs4932)
          (body ...)))))

(NOTE: not exact code, but basically this is how ac.scm handles destructuring)

-----

1 point by almkglor 6315 days ago | link

Suppose instead that we define a basic + operation which always operates on two values:

  (def base-+ (a b)
    (...))

Then we simply define + as:

  (def + rest
    (reduce base-+ (car rest) (cdr rest)))
  (def reduce (f z l)
    (if l
      (reduce f (f z (car l)) (cdr l))
      z))

Then, instead of redefining +, we add a rule: if you want your new type to have a + operation and still be optimizable by arco, then redefine base-+, not +

Then we can make type inferences based simply on pairs of values.

-----

1 point by sacado 6315 days ago | link

Oh, and btw, Psyco's behavior would not solve the problem I think. Operator redefinition in Python is much easier to deal with than Arc's.

-----

1 point by jazzdev 6315 days ago | link

If you'd named option 2 after someone besides Stalin, it might be more popular. Maybe Frozen? Or Snapshot?

-----

1 point by absz 6315 days ago | link

Stalin is actually the name of "an aggressive optimizing batch whole-program Scheme compiler," to quote Wikipedia (http://en.wikipedia.org/wiki/Stalin_(Scheme_implementation)). It can be found at http://cobweb.ecn.purdue.edu/~qobi/software.html . Now, it's certainly an unfortunate name for the compiler, but that's hardly sacado's fault.

-----

0 points by Jesin 6315 days ago | link

Arco is a bad name. It should be something more like Maniarc.

-----

2 points by Jesin 6315 days ago | link

Or maybe not, I don't know. Name aside, the idea is great.

I say CL style; it seems to give the most freedom.

-----