Well, I only have experience of writing direct-style compilers, not CPS-style ones, so my advice needs to be adapted.
But judging from the mechanism of the arc2c output you showed above, I see several places for improvement:
1) In a function:
(fn (x y z ...) (g A B C D ...)),
if B doesn't depend on x, C doesn't depend on x or y, D doesn't depend on x, y, or z, and so on, the calling function can avoid copying elements to the bottom. Instead, it moves the stack pointer to the bottom first and then pushes the arguments directly.
2) For functions that have no environment, we don't have to push a full closure; pushing the pc is enough.
3) For known functions, we can do a C goto not to the jump label but straight to the (case n), because C cases are in fact labels.
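To make point 3 concrete, here is a rough sketch, not actual arc2c output. The function sum_to, the pc numbering, and the label names are all made up for illustration. One caveat: standard C has no way to goto a case label by its number, so the sketch puts an ordinary named label on the same statement as the case and jumps to that instead.

```c
/* Hypothetical miniature of a switch-dispatched code body.
   sum_to, the pc numbering and the labels are illustrative only. */
static int sum_to(int n, int *dispatches)
{
    int pc = 0;             /* which case to run next */
    int acc = 0;

jump:                       /* generic jump label: re-dispatch on pc */
    ++*dispatches;
    switch (pc) {
    case 0:                 /* entry point: set up, then "call" the loop */
        acc = 0;
        goto L1;            /* known target: jump straight to its case */
    case 1:
    L1:                     /* named label sitting on the same statement as case 1 */
        if (n <= 0) {
            pc = 2;         /* treated as an unknown target: go through the switch */
            goto jump;
        }
        acc += n;
        --n;
        goto L1;            /* known self-call: no dispatch needed */
    case 2:
        return acc;
    default:
        return -1;
    }
}
```

Calling sum_to(4, &d) with d starting at 0 goes through the switch only twice (once on entry, once for the final return), however many times the loop body runs.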
Finally, in my opinion, a CPS-style compiler is no longer the better choice nowadays. It complicates the source, the debugging information, and any (human) analysis of the program structure. Since we are already using a stack that is separate from C's, continuations can be implemented in a direct-style compiler as easily as in a CPS-style one. Codegen for a direct-style compiler is only slightly more difficult, which isn't an issue. In addition, a naive direct-style compiler performs much better than a naive CPS-style one: the latter needs a source-simplification step to eliminate the unnecessary closures and function calls produced by CPS conversion.
1) Personally I think this is a rare case, but I could be wrong.
2) arc2c closures are very lightweight: a closure is just a simple array of objs, with the first obj being the pc. So in effect, for functions that have no environment, we are pushing a pointer to the pc.
That said, closures are also used to represent functions that can be passed around. Unfortunately closures are currently untyped, so we expect the current closure style to be changed.
We also need to support the possibility that the "function" being called isn't really a function: after all, table lookup syntax is just (tb key). And this is perfectly valid Arc:
(let sometable (table)
  (each k lst
    (= sometable.k (generate-something k)))
  (map sometable ; yes, we're passing a table as if it were a function!
       foolst))
3) I was actually thinking of this too, although I haven't gotten around to it.
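Going back to point 2, here is a minimal sketch of what a typed callee could look like, so that a call site can tell a closure from a table. This is not the current arc2c representation. The type tag, struct callee, run_code, and apply1 are all hypothetical names, and the "code" is reduced to two toy cases over ints.

```c
#include <stddef.h>

/* Hypothetical type tag; the thread says closures are currently untyped. */
enum tag { TAG_CLOSURE, TAG_TABLE };

/* Toy table: parallel arrays of int keys and int values. */
struct table { size_t n; const int *keys; const int *vals; };

struct callee {
    enum tag tag;
    union {
        int pc;                  /* closure: index of the code to run */
        const struct table *tb;  /* table: looked up instead of run */
    } u;
};

/* Stand-in for the generated switch: pc 0 doubles, pc 1 negates. */
static int run_code(int pc, int arg)
{
    switch (pc) {
    case 0:  return arg * 2;
    case 1:  return -arg;
    default: return 0;
    }
}

/* Linear lookup; returns `missing` when the key is absent. */
static int table_ref(const struct table *tb, int key, int missing)
{
    for (size_t i = 0; i < tb->n; i++)
        if (tb->keys[i] == key)
            return tb->vals[i];
    return missing;
}

/* A call site that accepts either kind of callee, like (f arg) in Arc. */
static int apply1(const struct callee *f, int arg)
{
    switch (f->tag) {
    case TAG_CLOSURE: return run_code(f->u.pc, arg);
    case TAG_TABLE:   return table_ref(f->u.tb, arg, -1);
    }
    return -1;
}

/* map over an int array, calling whatever f happens to be. */
static void map_ints(const struct callee *f, const int *in, int *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = apply1(f, in[i]);
}
```

With this shape, map_ints works the same whether it is handed a closure or a table, which is what the (map sometable foolst) example relies on. The cost is one tag check per call through an unknown callee.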
re: CPS: I wouldn't really know. Me, I'm just hacking around at the transformations that run before the CPS and Closure conversions. Because of the somewhat modular construction of arc2c, in theory you could write a drop-in replacement for the CPS and Closure conversions, as well as the code generator, and we could then offer either CPS or direct style as options, maybe.
1) It's not a rare case. It matters for the speed of most useful programs. For example, map and foreach, which are used quite often, can be optimized by not copying data on the stack.
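As a rough illustration of the difference (my reading of point 1, not arc2c's actual stack layout), take (fn (x y) (g (+ x y) (* y 2))). The second argument doesn't depend on x, so the arguments can be written straight over the caller's slots. The struct vm, its fields, and the function names below are hypothetical.

```c
#include <string.h>

enum { STACK_SIZE = 16 };

/* Hypothetical VM state: a value stack, a stack pointer, a copy counter. */
struct vm { int stack[STACK_SIZE]; int sp; long copies; };

/* Set up a frame at `base` holding the parameters x and y. */
static void vm_enter(struct vm *vm, int base, int x, int y)
{
    memset(vm, 0, sizeof *vm);
    vm->sp = base;
    vm->stack[vm->sp++] = x;
    vm->stack[vm->sp++] = y;
}

/* Naive tail call: push the new arguments above the frame,
   then copy them down over the old parameters. */
static void tail_call_copying(struct vm *vm, int base)
{
    int x = vm->stack[base], y = vm->stack[base + 1];
    int first = vm->sp;              /* where the new arguments start */
    vm->stack[vm->sp++] = x + y;     /* A */
    vm->stack[vm->sp++] = y * 2;     /* B */
    for (int i = 0; i < 2; i++) {
        vm->stack[base + i] = vm->stack[first + i];
        vm->copies++;
    }
    vm->sp = base + 2;
}

/* In-place tail call: reset sp to the bottom of the frame first,
   then push each argument directly. Safe here only because B does
   not read x, which A has already overwritten. */
static void tail_call_in_place(struct vm *vm, int base)
{
    vm->sp = base;
    vm->stack[vm->sp] = vm->stack[base] + vm->stack[base + 1];  /* A replaces x */
    vm->sp++;
    vm->stack[vm->sp] = vm->stack[base + 1] * 2;                /* B reads only y */
    vm->sp++;
}
```

Both versions leave the same two arguments at the bottom of the frame, but the in-place one does no copy pass. For a loop like map that makes one such call per element, that saving is paid on every iteration.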