Hello everyone. This is a veeeeery long post, I know, so feel free not to read it all at once (or not to read it at all, for that matter).

PART I. "n" operators and the 20x factor

You may know I am interested in improving Arc's performance (I sometimes crunch numbers or do equivalent things and would like to do so in Arc, at least at the prototype level, so performance is an issue). I have been thinking about writing an optimizer for Arc that works the way Psyco does (or the way I think it works). Recently, I've added a few more axioms to ac.scm (secretly -- I didn't share them on Anarki yet) that optimize the generated mzscheme code when it is compiled. These are, basically, n+, n-, n<, n>, nis, nisnt (and friends): versions of their counterparts without the initial "n" that assume their args are of type 'num. When you type (+ a b), the generated code is:

  (ar-funcall2 __+ __a __b)
That means: look up the __+ binding, apply it to the values of a and b, check the types of those values -- ah, they are numbers? So call mzscheme's +, which then checks again that they are effectively numbers, adds them and returns the result. If you call this in a loop, the overhead dwarfs the actual computation. That's the price to pay for polymorphic functions and dynamic typing. Well, when I know all my vars are nums and I want performance, I call (n+ a b) instead, and the generated code becomes:

  (+ __a __b)
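Not Arc, obviously, but the shape of that overhead can be sketched in Python. Here ar_funcall2 is a hypothetical stand-in that just mimics what the generated mzscheme code does, not real ac.scm machinery:

```python
import operator

def ar_funcall2(f, a, b):
    # Generic path: type-check the arguments on every single call,
    # the way the compiled (ar-funcall2 __+ __a __b) form does.
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return f(a, b)
    raise TypeError("unsupported argument types")

def add_generic(a, b):
    # Polymorphic, checked call: what (+ a b) compiles to.
    return ar_funcall2(operator.add, a, b)

def add_n(a, b):
    # "n+" style: direct addition, trivially inlinable by a JIT.
    return a + b

print(add_generic(1, 2))  # 3
print(add_n(1, 2))        # 3
```

Per call the two return the same thing; the difference is only the dispatch and checking work done on the generic path, which is exactly what a tight loop pays for millions of times.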
This is inlined and optimized by mzscheme's JIT, so it essentially becomes "add __a and __b". That gives impressive results: (fib 40) with only "canonical" operators takes 350 s, while (nfib 40) with "n" operators only takes 18 s. Yes, a factor of 20. I get no credit, of course: mzscheme's JIT does 95% of the work. Now have a look at this one:

  (time:for i 1 10000000)
  time: 11375 msec.
  (time:nfor i 1 10000000)
  time: 722 msec.
That works with numbers, of course, but it could be done with string operators too, for example (intensive regexes, anyone?).

PART II. This is not really Arcish!

Well, there are a few drawbacks of course. First, n operators are special forms, not regular functions. They have the same status as if or fn and are not first-class values. You cannot reduce a list of integers with n+, for example: it has to be wrapped in a lambda. Next, when you want to optimize something, you have to change its definition so as to manually replace all the operators with optimized ones. That can be discouraging for code you didn't write yourself (the for macro, for example). I would like to have that performance when I need it, without having to think about it and without rewriting functions every time! That brings me to the subject of my post (yes! finally!).

PART III. A Psyco equivalent for Arc: is it easy, possible or inconceivable?

The goal of Python's Psyco is much the same as mine. Take a bunch of Python code. It's elegant but slow, particularly when you crunch numbers in loops: when the interpreter sees an innocent "+" operator, it does dynamic dispatch, type checking of the arguments and all that stuff, spending 95% of its time checking everything is fine. Are you really adding numbers? Yes, I am, thanks for asking! Then comes Psyco. You are calling that fib function with a number? Hmm, OK, let's generate a specialized version of fib for when it gets called with a number. So n is a number, right? Then that call n < 2 is just comparing numbers, let's inline it. n - 1 and n - 2 are numbers too, etc. Now, every time fib is called, Psyco looks at its arg. If it is a number, the inlined, faster version is called; otherwise the more general version is (but that never happens here). Congratulations, you won a 50x speed factor. How hard would it be to build such a tool for Arc?
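The dispatch side of that idea is easy to sketch (the actual hard part, generating the specialized code, is faked here). This is a minimal Python sketch, assuming a hypothetical make_variant hook that "compiles" one variant per argument-type tuple:

```python
def specialize(make_variant):
    # Cache one variant per tuple of argument types, Psyco-style:
    # dispatch on the actual types at every call.
    variants = {}
    def call(*args):
        key = tuple(type(a) for a in args)
        if key not in variants:
            variants[key] = make_variant(key)  # "compile" for these types
        return variants[key](*args)
    return call

def make_fib_variant(key):
    if key == (int,):
        def fib_int(n):
            # Specialized variant: recursion stays inside the fast path,
            # so the type check is paid only at the entry call.
            return n if n < 2 else fib_int(n - 1) + fib_int(n - 2)
        return fib_int
    def fib_general(n):
        # General fallback (would go through the polymorphic operators).
        return n if n < 2 else fib_general(n - 1) + fib_general(n - 2)
    return fib_general

fib = specialize(make_fib_variant)
print(fib(10))  # 55
```

In real Psyco the variant is generated machine code, not a Python closure, but the bookkeeping (a per-type-signature cache plus a type test at the entry point) is the same shape.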
I'd like a function named, say, arco, doing this: every time the function is called with new arg types, generate a version of the code optimized for those types. If you arco fib, then the first time you call it with an 'int value, it will generate a version specialized for that type. Since < is called with n and 2, and since n is currently an int and 2 is, well, obviously an int, that call becomes (n< n 2). That would generate something like this (the code could be specialized again later if we ever call fib with, say, a string):

  (def fib (n)
    (if (isa n 'int)
        (if (n< n 2)  ; specialized, automatically generated code
            n
            (n+ (fib (n- n 1)) (fib (n- n 2))))
        (if (< n 2)   ; original code
            n
            (+ (fib (- n 1)) (fib (- n 2))))))
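For comparison, the same guard-then-branch structure rendered in Python, with isinstance playing the role of (isa n 'int):

```python
def fib(n):
    if isinstance(n, int):
        # Specialized, automatically generated branch:
        # plain integer comparison and arithmetic, no generic dispatch.
        if n < 2:
            return n
        return fib(n - 1) + fib(n - 2)
    # Original, generic branch (in Arc this would still go through
    # the polymorphic + / - / < operators).
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(20))  # 6765
```

Note that here the guard is re-tested on every recursive call; a smarter arco would make the specialized branch call the specialized branch directly, as in the per-type cache sketched earlier.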
What are the gotchas there? + could have been redefined, n could have been changed by a function call inside the optimized function (not possible here, but you never know), and many other things, I guess... Does this mean my dream is an impossible one? These potential traps exist in Python too, and yet Psyco works... But if I'm not wrong, the seriously optimizing Scheme compilers require the code to be encapsulated, i.e. it has to be a closed world: see Stalin or Ikarus. Even mzscheme's JIT only kicks in when you enclose the code inside a module, where it knows you didn't redefine + (that's why my nfib is faster than mzscheme's version run from the REPL, btw). What's your advice / opinion / feeling? Oh, and thanks for reading all that!
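To make the redefinition gotcha concrete, here is a tiny sketch (hypothetical env table standing in for the global namespace): once the specialized code has inlined "+", it no longer sees a later redefinition of the operator, while the generic, late-bound code does.

```python
# env stands in for the global environment where "+" lives.
env = {"+": lambda a, b: a + b}

def add_dynamic(a, b):
    # Generic path: looks "+" up on every call, so it always
    # sees the current definition.
    return env["+"](a, b)

def add_inlined(a, b):
    # Specialized path: "+" was baked in at specialization time.
    return a + b

# Someone redefines + after specialization...
env["+"] = lambda a, b: (a + b) % 10

print(add_dynamic(7, 8))  # 5  (sees the new +)
print(add_inlined(7, 8))  # 15 (stale, inlined behavior)
```

This is exactly why closed-world compilers (and mzscheme's module-level JIT) demand a promise that + won't change: either the specialized code must carry a guard that invalidates it on redefinition, or redefinition must be forbidden.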