Arc Forumnew | comments | leaders | submitlogin
3 points by almkglor 6116 days ago | link | parent

If you're interested, the Anarki lib/arcformat.arc is actually a start at parsing Arc syntax; we just have to change the filters so that instead of emitting html, we emit Arc objects wrapped in singleton lists. It uses treeparse.

treeparse is pretty good, and I'm sure a full Arc parser can be built on treeparse.

Looks like I need to speed up writing my treeparse tutorial ^^

As for parsing special syntax, an Arc version of ssyntax would work: we just need a symbol->string conversion (meaning full 'coerce support) and seek ~!.: in the substring.

edit: how did you handle continuations and tail call opts?



4 points by sacado 6116 days ago | link

Ah, I'll have a closer look at arcformat then. Might be good to have a reader :)

Continuations & tail-calls are treated the way Steele proposed it in the lambda papers.

The following code :

  (set f (fn (n)
     (if (< n 2)
          n
          (+ (f (- n 1)) (f (- n 2))))))

  (prn (f 30))
is first translated so as to have unique names for variables and so that primitive operations are prefixed with a % (we won't treat them the same way as other operations in the next steps) :

  (do
    (set f (fn (n@1)
      (if (%< n@1 2)
        n@1
        (%+ (f (%- n@1 1)) (f (%- n@1 2))))))

  (%prn (f 30)))
Then we perform continuation-passing-style transformation : each function call is given, as an extra parameter, the portion of code (the continuation) it will send its result to : functions do not return, they call another function with the value they calculated :

  (let ((r@5 (fn (k@6 n@1)
    (let ((r@7 (%< n@1 2)))
      (if r@7
        (k@6 n@1)
        (let ((r@11 (%- n@1 1)))
          (f (fn (r@8)
            (let ((r@10 (%- n@1 2)))
              (f (fn (r@9) (k@6 (%+ r@8 r@9))) r@10))) r@11)))))))

  (let ((r@3 (set f r@5)))
    (f (fn (r@4)
      (let ((r@2 (%prn r@4)))
        (%halt r@2))) 20)))
Finally, we perform closure conversion so as to only perform gotos (i.e., no function calls) :

  (fn nil
    (let ((r@5 (%closure
                       (fn (self@12 k@6 n@1)
                         (let ((r@7 (%< n@1 2)))
                           (if r@7
                             ((%closure-ref k@6 0) k@6 n@1)
                             (let ((r@11 (%- n@1 1)))
                               ((%closure-ref f 0)
                                 f
                                 (%closure
                                   (fn (self@13 r@8)
                                     (let ((r@10 (%- (%closure-ref self@13 2) 2)))   
  ((%closure-ref f 0) f (%closure (fn (self@14 r@9) ((%closure-ref 
  (%closure-ref self@14 1) 0) (%closure-ref self@14 1) (%+ 
  (%closure-ref self@14 2) r@9))) (%closure-ref self@13 1) r@8) 
  r@10))) k@6 n@1) r@11)))))))) (let ((r@3 (set f r@5))) 
  ((%closure-ref f 0) f (%closure (fn (self@15 r@4) (let ((r@2 (%prn 
  r@4))) (%halt r@2)))) 20))))
(Sorry for the indentation, I don't feel like indenting this by hand, but you get the idea I guess ?)

Generating C code from there is then easy. It is almost only Marc feeley's code, based on Steele's ideas a few years ago. I only changed the code so that it understands Arc code and primitives instead of Scheme's...

-----

1 point by almkglor 6116 days ago | link

I assume that for code blocks 3 and 4, 'let is the Lisp 'let, not the Arc one? (let ((var val)) ...) not (let var val ...)?

Also on code block 1 you show (prn (f 30)) but in code block 2 you seem to pass (f <continuation> 20)? I assume this is just a mistake?

I managed to understand the transformation up to code block 3, but not code block 4; what does '%closure do? Allocate heap memory for the closure? Also, what's the syntax for "goto"?

Finally: how about the equivalent C code?

LOL, maybe I should just wait for you to push it on the git and experiment with it myself.

-----

2 points by almkglor 6116 days ago | link

Aruu, okay okay I finally actually looked at the presentation docs, which I probably should have looked at first. One of the things that threw me off was 'self - I was thinking of Arc's sense of 'self as in 'afn!

The output C code looks suspiciously like assembly language to me. Perhaps we can also target a temporary assembly syntax so that we can do minimal peephole opts, such as convert stuff like PUSH(x); y = TOS(); to MOVE(x,y);. Can't wait to actually see this code on the git ^^.

P.S. Given that the transformations are (gasp!) syntactic, it might actually be possible to implement the entire compiler as (gasp gasp!) a treeparse parser (or at the very least a piped chain of treeparse parsers) ^^.

-----

2 points by sacado 6116 days ago | link

Maybe treeparse would be the right thing to use... The code is getting uglier everytime I try to add a new primitive / special form... I dunno...

As for the generated code, yes, it's a lot like a portable assembly code. There are certainly easy optimizations to perform on it, but as for now, it's working and that's a lot :)

And yes, let is the traditional one -- with tons of parens everywhere.

-----

2 points by almkglor 6116 days ago | link

I'll go through the code later to see what can be done. Certainly the AST looks representable as plain lists to me, although I haven't fully analyzed it yet.

As an aside compile-file could be restructured like the following:

  (def compile-file (filename)
    (compile-ast (parse-file filename) (+ (strip-ext filename) ".c")))
  ; to allow programmatic access
  (def compile-ast (ast dest)
    ; chain of conversions
    (let chain
         (list
           (list cps-convert "CPS-CONVERSION")
           (list closure-convert "CLOSURE-CONVERSION"))
    ; do reduction
    (let final-ast
         (reduce
           (fn (ast (f desc))
             (let new-ast (f ast)
               (prn "----------------- AST after " desc)
              (prn (source new-ast))
               new-ast))
           chain ast)
        (prn "-------------------- C Code:")
        (w/outfile f dest
          (w/stdout f
            (prn:liststr:code-generate final-code))))))
This should allow easy insertion of any steps (e.g. optimization steps) along the way.

In fact the chain list should probably be better use `(,), so that we can support flags or suchlike for optimizations:

   `(
       (,cps-convert "CPS-CONVERSION")
       ,@(if optimize
             `((,optimize-after-cps "CPS-OPTIMIZATION")))
       (,closure-convert "CLOSURE-CONVERSION")
       ,@(if optimize
             `((,optimize-after-closure "CLOSURE-OPTIMIZATION"))))

-----

2 points by binx 6114 days ago | link

Just leave the peephole stuff to gcc, it almost always does better than handcoded optimizer.

The CPS transformed code can be arbitrarily inlined, so a simple inliner without flow analysis can give you much efficiency for free.

-----

2 points by almkglor 6114 days ago | link

And if the target isn't gcc?

For that matter my concern is with the expansion of PUSH and POP:

   PUSH(x); y = POP();

   =>

   *sp++ = x; y = *--sp;
Can gcc peephole the above, completely eliminating the assignment to the stack (which is unnecessary in our case after all)?

   y = x; //desired target
Without somehow informing gcc to the contrary, gcc will assume that writing to * sp is significant, even though in our "higher-level" view the assignment to * sp is secondary to transferring the data between two locations.

-----

4 points by sacado 6114 days ago | link

Actually, I tried the above (tuning generated code so as to change something like :

  BEGIN_JUMP(3); PUSH(LOCAL(5)); PUSH(LOCAL(6)); PUSH(LOCAL(7)); END_JUMP(3);
to its semantic but obviously much faster equivalent :

  memcpy (stack, stack + 5, sizeof(obj) * 3); sp = stack + 3; END_JUMP(3);
and

  PUSH(x); if(POP)
to

  if (x)
Well, with full optimizations on gcc (-O3), it doesn't change anything (at least in execution time, I didn't compare generated machine codes). Wow, gcc seems really clever. Now that I know how hard it is to implement a compiler, I can only applaud :)

-----

1 point by almkglor 6114 days ago | link

WOW. gcc must be real good then ^^.

-----