I've since added a 'tconc facility to Anarki. Basically, 'tconc encapsulates the head+tail idiom for list catenation: a single cons cell is used, with car == head and cdr == tail.
The head of the list is its first cons cell, while the tail is its last cons cell:
  a cons cell:

    O--> cdr
    |
    v
   car
the list (1 2 3 4 5):
  O->O->O->O->O->nil
  |  |  |  |  |
  v  v  v  v  v
  1  2  3  4  5
the tconc cell for the above list:
  tconc cell
  O-----------+
  |head       |tail
  v           v
  O->O->O->O->O->nil
  |  |  |  |  |
  v  v  v  v  v
  1  2  3  4  5
'tconc conses the new element onto the end of the list and updates the tconc cell so that its tail points at the new last cons cell. You can extract the currently concatenated list by taking 'car of the tconc cell.
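To make the technique concrete outside of Arc, here is a minimal sketch of the tconc idea in Python. The names (tconc_new, tconc, to_list) mirror the Arc library but the code itself is only an illustration: a "tconc cell" is a two-slot box [head, tail] over a linked list of [value, next] pairs, so each append is O(1) instead of the O(n) of repeated 'join.

```python
def tconc_new():
    """An empty tconc cell: no head, no tail yet."""
    return [None, None]

def tconc(cell, value):
    """Append value in O(1) by re-pointing the tail."""
    node = [value, None]
    if cell[0] is None:
        cell[0] = cell[1] = node   # first element: head and tail coincide
    else:
        cell[1][1] = node          # splice after the current last cell
        cell[1] = node             # re-point the tail at the new last cell
    return cell

def to_list(cell):
    """Extract the accumulated list (like taking car of the tconc cell)."""
    out, node = [], cell[0]
    while node is not None:
        out.append(node[0])
        node = node[1]
    return out

c = tconc_new()
for x in [1, 2, 3, 4, 5]:
    tconc(c, x)
print(to_list(c))  # [1, 2, 3, 4, 5]
```

The point of the extra indirection is that the producer never has to walk the list to find its end; the tconc cell remembers the end for it.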
The diff between my version of treeparse and yours is now:
  --- treeparse.arc    2008-03-21 11:59:13.000000000 +0800
  +++ m_treeparse.arc  2008-03-22 23:00:51.000000000 +0800
  @@ -23,4 +23,6 @@
   ; Examples in "lib/treeparse-examples.arc"
  +(require "lib/tconc.arc")
  +
   (mac delay-parser (p)
     "Delay evaluation of a parser, in case it is not yet defined."
  @@ -112,12 +114,12 @@
   (def many (parser)
     "Parser is repeated zero or more times."
  -  (fn (remaining) (many-r parser remaining nil nil)))
  +  (fn (remaining) (many-r parser remaining (tconc-new) (tconc-new))))
   (def many-r (parser li acc act-acc)
     (iflet (parsed remaining actions) (parse parser li)
       (many-r parser remaining
  -      (join acc parsed)
  -      (join act-acc actions))
  -    (return acc li act-acc)))
  +      (nconc acc (copy parsed))
  +      (nconc act-acc (copy actions)))
  +    (return (car acc) li (car act-acc))))
   (def many1 (parser)
edit: note that using 'tconc/'nconc is slightly slower than explicitly passing around the tails. On the test, it runs at 79 msec on my machine (explicit passing ran at 58 msec); this is expected, since we must destructure the cons cell into the head and tail of the list under construction. Might it be better to use a macro that hides the dirty parts of explicitly passing hd and tl?
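For comparison, the "explicit passing" alternative mentioned above threads the head and the last cell through the code yourself instead of boxing them in a tconc cell. A rough Python sketch of that style (names are mine, not from treeparse):

```python
def append_explicit(hd, tl, value):
    """Append to a [value, next] linked list, passing head and tail separately."""
    node = [value, None]
    if hd is None:
        return node, node      # first element: head and tail coincide
    tl[1] = node               # re-point the old last cell
    return hd, node

hd = tl = None
for x in [1, 2, 3]:
    hd, tl = append_explicit(hd, tl, x)

out = []
node = hd
while node is not None:
    out.append(node[0])
    node = node[1]
print(out)  # [1, 2, 3]
```

This avoids the destructure/restructure of the tconc cell on every append, which is presumably where the 79 msec vs 58 msec gap comes from, at the cost of every caller having to carry two values around.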
Nice optimization. I'm not so sure about the naming of nconc, though. Although it is used for a similar purpose as the traditional CL nconc, I would expect anything called nconc to behave like this:
  (def last-list (li)
    (if (or (no li) (no (cdr li))) li
        (last-list (cdr li))))

  (def nconc (li . others)
    "Same behavior as Common Lisp nconc."
    (if (no others) li
        (no li) (apply nconc others)
        (do (= (cdr (last-list li)) (apply nconc others))
            li)))
>> do you think this optimization is worth putting in treeparse?
Certainly. At the moment you are probably 50% of the treeparse user base, so it needs to be fast enough for your use case :)
I admit that efficiency wasn't a big thought when I first wrote treeparse (besides avoiding infinite loops -- hopefully those are gone now...). I fondly remember my CL optimization days... we've gotta make ourselves one of those nifty profilers for Arc.
It seems that 'many is the low-hanging fruit of optimization. I've since gotten an 8-paragraph lorem ipsum piece, totalling about 5k, which renders in 3-4 seconds (about 3800 msec).
Hmm. Profiler.
I'm not 100% sure, but the fact that nearly all the composing parsers decompose the return values of their sub-parsers and then recompose them into a new return value might be slowing things down. Having parsers accept an optional return-value argument, which 'return would fill in instead of creating its own, might significantly reduce memory consumption (assuming it's the GC that's slowing us down).
Note that the timing will not be very accurate or particularly useful IMO, since it doesn't count recursion but does count calls to other functions. Sigh. We need a real profiler ^^
Hmm. It seems we can't squeeze much performance out of 'alt; I can't really see a way of optimizing 'alt itself, so possibly we should optimize the grammars that use 'alt instead.