Arc Forum

1 point by almkglor 6653 days ago

I tried the following modification on 'many:

  (def many (parser)
    "Parser is repeated zero or more times."
    (fn (remaining) (many-r parser remaining nil nil nil nil)))

  (let lastcdr (afn (p) (aif (cdr p) (self it) p))
    (def many-r (parser li acc act-acc acctl act-acctl)
      (iflet (parsed remaining actions) (parse parser li)
             (do
               (when parsed
                 ; edit: necessary, it seems that some of the other
                 ; parsers reuse the return value
                 (zap copy parsed)
                 ; end of edit
                 (if acc
                     (= (cdr acctl) parsed)
                     (= acc parsed))
                 (= acctl (lastcdr parsed)))
               (when actions
                 ; edit: necessary, it seems that some of the other
                 ; parsers reuse the return value
                 (zap copy actions)
                 ; end of edit
                 (if act-acc
                     (= (cdr act-acctl) actions)
                     (= act-acc actions))
                 (= act-acctl (lastcdr actions)))
               (many-r parser remaining
                       acc act-acc acctl act-acctl))
             (return acc li act-acc))))

Basically instead of using join, I used a head+tail form of concatenating lists. It seems to work, and the optimization above seems to drop the test:

  ((many anything) (range 1 1000))

down to 27 msec (edited: 58 msec) on my machine (it was about 7350 msec on the older version).

What are your thoughts? The code now looks unprintable. Also, I'm not 100% sure of its correctness.

UPDATE: yes, the first version was not correct; however, the edited version above seems to work now. Rendering of my "difficult" page has dropped to 1100 msec.
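Why the head+tail form helps: `join` builds a fresh list, so `(join acc parsed)` re-copies the whole accumulator on every step and the loop does quadratic work overall, while keeping a pointer to the last cons lets each step touch only the newly parsed piece. A minimal sketch of the two strategies on plain lists of chunks; `collect-join`, `collect-tail` and `last-cell` are hypothetical helpers, not treeparse functions:

```arc
; Hypothetical helpers for illustration; not part of treeparse.

; Quadratic: each join re-copies everything collected so far.
(def collect-join (chunks)
  (let acc nil
    (each c chunks
      (= acc (join acc c)))
    acc))

; Return the last cons of a non-empty list.
(def last-cell (xs)
  (if (cdr xs) (last-cell (cdr xs)) xs))

; Linear in the total output: keep the head and the last cons, and
; splice a fresh copy of each chunk after the last cons.
(def collect-tail (chunks)
  (with (hd nil tl nil)
    (each c chunks
      (when c
        (let c2 (copy c)          ; copy: the caller may still own c
          (if hd
              (= (cdr tl) c2)
              (= hd c2))
          (= tl (last-cell c2)))))
    hd))

(collect-join '((1 2) (3) nil (4 5)))   ; => (1 2 3 4 5)
(collect-tail '((1 2) (3) nil (4 5)))   ; => (1 2 3 4 5)
```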

2 points by almkglor 6653 days ago

I've since added a 'tconc facility to Anarki. Basically, 'tconc encapsulates the head+tail form of list concatenation: a single cons cell is used, with car == head and cdr == tail.

The head of the list is the start of the list, while the tail of the list is the last cons cell:

  cons
    O->cdr
    v
    car

  the list (1 2 3 4 5):
  O->O->O->O->O->nil
  v  v  v  v  v
  1  2  3  4  5

  the tconc cell for the above list:
  tconc cell
  O-----------+
  | head      | tail
  v           v
  O->O->O->O->O->nil
  v  v  v  v  v
  1  2  3  4  5

'tconc creates a new cons cell for the added element, links it after the current tail, and updates the tconc cell so that its cdr points to the new tail. You can extract the list concatenated so far by applying 'car to the tconc cell.
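A minimal sketch of the cell described above, using nothing but plain conses. The names `tc-new`, `tc-add` and `tc-list` are hypothetical and are not the API of Anarki's `lib/tconc.arc`:

```arc
; Hypothetical sketch, not Anarki's lib/tconc.arc.
; A tconc cell: car = head of the list under construction,
;               cdr = last cons of that list (nil while empty).
(def tc-new ()
  (cons nil nil))

; Add one element at the end in O(1): allocate a new last cons,
; link it after the current tail, then repoint the tail.
(def tc-add (tc x)
  (let cell (list x)
    (if (car tc)
        (= (cdr (cdr tc)) cell)
        (= (car tc) cell))
    (= (cdr tc) cell)
    tc))

; The concatenated list is just the car of the tconc cell.
(def tc-list (tc)
  (car tc))

(let tc (tc-new)
  (each x '(1 2 3 4 5) (tc-add tc x))
  (tc-list tc))                         ; => (1 2 3 4 5)
```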

The diff between my version of treeparse and yours is now:

  --- treeparse.arc     2008-03-21 11:59:13.000000000 +0800
  +++ m_treeparse.arc   2008-03-22 23:00:51.000000000 +0800
  @@ -23,4 +23,6 @@
   ; Examples in "lib/treeparse-examples.arc"
   
  +(require "lib/tconc.arc")
  +
   (mac delay-parser (p)
     "Delay evaluation of a parser, in case it is not yet defined."
  @@ -112,12 +114,12 @@
   (def many (parser)
     "Parser is repeated zero or more times."
  -  (fn (remaining) (many-r parser remaining nil nil)))
  +  (fn (remaining) (many-r parser remaining (tconc-new) (tconc-new))))
   
   (def many-r (parser li acc act-acc)
     (iflet (parsed remaining actions) (parse parser li)
            (many-r parser remaining
  -                 (join acc parsed) 
  -                 (join act-acc actions))
  -         (return acc li act-acc)))
  +                 (nconc acc (copy parsed))
  +                 (nconc act-acc (copy actions)))
  +         (return (car acc) li (car act-acc))))
   
   (def many1 (parser)
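The `(copy parsed)` in the diff matters because splicing is destructive: the cdr of the old last cons is overwritten, so a list that a sub-parser still holds on to gets modified in place (and splicing the very same list twice would even create a cycle). A sketch of a lconc-style splice onto a tconc cell that shows the hazard; `tc-new` and `tc-splice` are hypothetical names, not the Anarki functions:

```arc
; Hypothetical sketch; tc-new and tc-splice are illustrative names.
; tconc cell: car = head of the list being built, cdr = its last cons.
(def tc-new () (cons nil nil))

; Destructively splice xs onto the end of tconc cell tc. Only xs is
; walked (to find its last cons), not the whole list built so far.
(def tc-splice (tc xs)
  (when xs
    (if (car tc)
        (= (cdr (cdr tc)) xs)
        (= (car tc) xs))
    (= (cdr tc) ((afn (p) (if (cdr p) (self (cdr p)) p)) xs)))
  tc)

; Without copy: the caller's list a is extended in place.
(let a (list 1 2)
  (tc-splice (tc-splice (tc-new) a) (list 3))
  a)                                    ; => (1 2 3)

; With copy: a is left untouched.
(let a (list 1 2)
  (tc-splice (tc-splice (tc-new) (copy a)) (list 3))
  a)                                    ; => (1 2)
```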

edit: note that using 'tconc/'nconc is slightly slower than explicitly passing around the tails. For the test, it runs at 79 msec on my machine (explicit passing ran at 58 msec); this is expected, since we must destructure the cons cell into the head and tail of the list under construction. Would it perhaps be better to use a macro to hide the dirty parts of explicitly passing hd and tl?
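One possible shape for such a macro, as a sketch only: the head and tail stay in two ordinary variables (or in many-r's parameters), and a single form hides the splice bookkeeping, so nothing has to be destructured out of a tconc cell. `splice-onto` and `last-cell` are hypothetical names:

```arc
; Hypothetical helpers; not part of treeparse or Anarki.
(def last-cell (xs)
  (if (cdr xs) (last-cell (cdr xs)) xs))

; (splice-onto hd tl xs): append a fresh copy of xs to the list whose
; first cons is held in hd and whose last cons is held in tl.
; hd and tl must be assignable places, e.g. local variables.
(mac splice-onto (hd tl xs)
  (w/uniq g
    `(let ,g ,xs
       (when ,g
         (= ,g (copy ,g))
         (if ,hd
             (= (cdr ,tl) ,g)
             (= ,hd ,g))
         (= ,tl (last-cell ,g))))))

; Usage: the loop body no longer spells out the tail handling.
(with (hd nil tl nil)
  (each chunk (list (list 1 2) nil (list 3))
    (splice-onto hd tl chunk))
  hd)                                   ; => (1 2 3)
```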

-----

1 point by raymyers 6653 days ago

Nice optimization. I'm not so sure about the naming of nconc, though. Although it is used for a similar purpose to the traditional CL nconc, I would expect anything called nconc to behave like this:

  (def last-list (li)
    (if (or (no li) (no (cdr li))) li
        (last-list (cdr li))))

  (def nconc (li . others)
    "Same behavior as Common Lisp nconc."
    (if (no others) li
        (no li) (apply nconc others)
        (do (= (cdr (last-list li)) (apply nconc others))
            li)))

-----

1 point by almkglor 6653 days ago

Ah crick; let me change that to lconc, that was what I was thinking ^^

I picked up 'tconc and lconc from Cadence Skill; see:

http://www.ece.uci.edu/eceware/cadence/sklanguser/chap8.html...

Funny that CL doesn't actually have this facility ^^

Will rename this soon. In the meantime, do you think this optimization is worth putting in treeparse?

-----

4 points by raymyers 6653 days ago

>> do you think this optimization is worth putting in treeparse?

Certainly. At the moment you are probably 50% of the treeparse user base, so it needs to be fast enough for your use case :)

I admit that efficiency wasn't a big thought when I first wrote treeparse (besides avoiding infinite loops -- hopefully those are gone now...). I fondly remember my CL optimization days... we've gotta make ourselves one of those nifty profilers for Arc.

-----

3 points by almkglor 6653 days ago

>> we've gotta make ourselves one of those nifty profilers for Arc.

True, true. I was optimizing random functions in treeparse, but didn't get a boost in speed until you suggested optimizing 'many.

-----

1 point by almkglor 6653 days ago

It seems that 'many was the low-hanging fruit of optimization. I've since put together an 8-paragraph lorem ipsum page, about 5k characters in total, which renders in 3-4 seconds (around 3800 msec).

Hmm. Profiler.

I'm not 100% sure, but maybe the fact that nearly all the composing parsers decompose the return values of their sub-parsers and then recompose a new return value is slowing it down? Maybe parsers could accept an optional return-value argument that 'return fills in (instead of creating its own list); that might significantly reduce memory consumption, assuming it's GC that is slowing things down.

mockup:

  (def parser-function (remaining (o retval (list nil nil nil)))
    (....)
    (return parsed li actions retval))

  (def many-r (parser remaining acc act-acc (o retval (list nil nil nil)))
    (while (parse parser remaining retval)
      ; parsed
      (lconc acc (copy (car retval)))
      ; actions
      (lconc act-acc (copy (car:cdr:cdr retval)))
      ; remaining
      (= remaining (car:cdr retval)))
    (return (car acc) remaining (car act-acc) retval))

Removing 'actions might help too: then we could use just a plain 'cons cell, with car == parsed and cdr == remaining.

-----

1 point by raymyers 6653 days ago

I tried taking out actions for the heck of it. Removing them yields roughly a 30% speed increase on this benchmark:

  (time (do ((many anything) (range 1 5000)) nil))

Using the following method, we can keep actions as a feature but still get the 30% speedup when we don't use them.

  (def many (parser)
    "Parser is repeated zero or more times."
    (fn (remaining) (many-r parser remaining (tconc-new) nil)))

  (def many-r (parser li acc act-acc)
    (iflet (parsed remaining actions) (parse parser li)
           (many-r parser remaining
                   (lconc acc (copy parsed))
                   (if actions (join act-acc actions) act-acc))
           (return (car acc) li act-acc)))

Not bad, but still not as fast as we'd want for processing wiki formatting on the fly...

ed: Yes. act-acc, not (car act-acc).

-----

1 point by almkglor 6653 days ago

Hmm. If you remove 'actions, how about also trying to use just a single 'cons cell:

  (iflet (parsed . remaining) (parse parser remaining)
    ...)

  (def return (parsed remaining)
    (cons parsed remaining))

If the speed increase is that large on that testbench, it might very well be due to garbage collection.

This might be an interesting page for our problem here ^^

http://www.valuedlessons.com/2008/03/why-are-my-monads-so-sl...
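A sketch of the single-cons return convention suggested above, outside treeparse: a parser returns nil on failure or `(parsed . remaining)` on success. Since `(cons nil nil)` is still non-nil, a successful parse that produced nothing remains distinguishable from failure. `any-item`, `many-cons` and `last-cell` are hypothetical names:

```arc
; Hypothetical sketch of a (parsed . remaining) protocol.
(def last-cell (xs)
  (if (cdr xs) (last-cell (cdr xs)) xs))

; Toy parser: accept any single item.
(def any-item (li)
  (when li
    (cons (list (car li)) (cdr li))))

; Zero-or-more repetition under the cons protocol. Assumes the
; sub-parser consumes input whenever it succeeds; otherwise this loops.
(def many-cons (parser li)
  (with (hd nil tl nil)
    (whilet r (parser li)
      (let parsed (car r)
        (when parsed
          (= parsed (copy parsed))
          (if hd (= (cdr tl) parsed) (= hd parsed))
          (= tl (last-cell parsed))))
      (= li (cdr r)))
    (cons hd li)))

(many-cons any-item '(1 2 3))   ; => ((1 2 3)): parsed (1 2 3), remaining nil
```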

-----

1 point by raymyers 6653 days ago

Tried changing the list to a single cons cell. I did not see any additional performance boost.

-----

1 point by almkglor 6653 days ago

  (def many-r (parser li acc act-acc)
    (iflet (parsed remaining actions) (parse parser li)
           (many-r parser remaining
                   (lconc acc (copy parsed))
                   (if actions (join act-acc actions) act-acc))
           (return (car acc) li (car act-acc))))

s/(car act-acc)/act-acc maybe?

Personally I don't mind losing 'actions, it does seem that 'filt would be better ^^.

-----

1 point by almkglor 6653 days ago

I tested this on my 8-paragraph, 5000-char lorem ipsum page, and the run time dropped to about 3400 msec (from 3800 msec).

Hmm. Not sure where the slow down is now ^^

I've tried my "retval" suggestion and it's actually slower, not faster. So much for not creating new objects T.T;

-----

1 point by almkglor 6652 days ago

Arrg. I've built a sort-of profiler for the wiki; it's in the git repo. To enable it, look for the line:

  (= *wiki-profiling-on nil)

Change it to t and reload Arki to turn profiling on, then use (*wiki-profile-print) to print the profile report.

Note that turning on profiling increases run time by a factor of more than 5. Don't use it unless desperate.

Anyway, here is a sample run (figures in msec). This page rendered in about 800 msec without profiling; with profiling it took about 5150 msec:

  bold: 305
  bolded-text: 914
  nowiki-e: 31
  open-br: 305
  seq-r: 2681
  italicized-text: 793
  many-format: 4830
  plain-wiki-link: 841
  alt-r: 4555
  nowiki-text: 584
  nowiki: 184
  joined-wiki-link: 413
  ampersand-coded-text: 486
  ampersand-codes: 171
  italics: 128
  many-r: 4829
  formatting: 4616
  close-br: 39

Note that the timing will not be very accurate or particularly useful IMO, since it doesn't count recursion but does count calls to other functions. Sigh. We need a real profiler ^^
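A minimal sketch of the kind of wrapper described above, assuming plain Arc 3 (`msec`, `after`, tables). It records inclusive time per function name and does not re-time recursive re-entries, so recursion is not double-counted while time spent in callees is included. `prof-wrap`, `prof-times*` and `prof-active*` are hypothetical names, not the code in the wiki repository:

```arc
; Hypothetical sketch, not Arki's profiler.
(= prof-times*  (table)   ; name -> accumulated msec
   prof-active* (table))  ; name -> t while a timed call is running

; Wrap function f so that outermost calls to it are timed under name.
(def prof-wrap (name f)
  (fn args
    (if (prof-active* name)
        (apply f args)            ; recursive re-entry: already being timed
        (let t0 (msec)
          (= (prof-active* name) t)
          (after (apply f args)
                 (= (prof-active* name) nil)
                 (= (prof-times* name)
                    (+ (or (prof-times* name) 0) (- (msec) t0))))))))

; Usage: rebind the global so recursive calls also go through the wrapper.
(def fib (n) (if (< n 2) n (+ (fib (- n 1)) (fib (- n 2)))))
(= fib (prof-wrap 'fib fib))
(fib 20)
(prof-times* 'fib)                ; msec spent inside outermost fib calls
```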

-----

2 points by raymyers 6652 days ago

>> Sigh. We need a real profiler ^^

Maybe this'll help. http://www.arclanguage.org/item?id=5318

-----

1 point by raymyers 6652 days ago

Knowing a bit about the call hierarchy, maybe we can squeeze a bit more knowledge out of that. Here's what seems to be going on:

    formatting               4616
    [-] alt-r                4555
     | bolded-text            914
     | plain-wiki-link        841
     | italicized-text        793
     | nowiki-text            584
     | ampersand-coded-text   486
     | joined-wiki-link       413

-----

1 point by almkglor 6652 days ago

Hmm, then the total time of alt-r's children is 4031, leaving 524 msec in alt-r itself.
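The subtraction spelled out, assuming the six listed parsers are alt-r's only direct callees and every figure is inclusive time in msec:

```arc
; Inclusive times of alt-r's children, from the profile above:
; bolded-text, plain-wiki-link, italicized-text, nowiki-text,
; ampersand-coded-text, joined-wiki-link.
(let children '(914 841 793 584 486 413)
  (list (apply + children)              ; total in children
        (- 4555 (apply + children))))   ; left over in alt-r itself
; => (4031 524)
```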

My test page has quite a bit of bolded text (for testing), so I suppose that's why bolded-text is the highest. Hmm.

Anyway I'm thinking of adding the following parser to the top of the big 'alt structure in formatting:

  (= plain-text
    (pred [or (alphadig _) (whitec _) (in _ #\. #\,)] anything))

  (= formatting
    (alt
      plain-text
      ...))

Hmm. It seems we can't squeeze much performance out of 'alt; I can't really see a way of optimizing 'alt itself, so we should possibly optimize the grammar that uses 'alt instead.
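Why putting a cheap, common-case parser first helps: an ordered choice tries its alternatives left to right and stops at the first success, so for ordinary text every rare alternative listed earlier costs a failed attempt. A toy model that counts attempts; `toy-alt`, `item-if` and `count-attempts` are hypothetical and only assume that treeparse's 'alt also returns the first alternative that succeeds:

```arc
; Hypothetical toy model of ordered choice; not treeparse code.
(= attempts* 0)

; Try parsers left to right, return the first non-nil result,
; and count every parser call in attempts*.
(def toy-alt parsers
  (fn (li)
    (let result nil
      (each p parsers
        (unless result
          (++ attempts*)
          (= result (p li))))
      result)))

; Toy parser: consume one item if it satisfies test.
(def item-if (test)
  (fn (li)
    (when (and li (test (car li)))
      (cons (list (car li)) (cdr li)))))

(= p-star  (item-if [is _ #\*])
   p-brack (item-if [is _ #\[])
   p-plain (item-if [or (alphadig _) (whitec _)]))

; Run parser once per character; return the number of attempts made.
(def count-attempts (parser chars)
  (= attempts* 0)
  (each c chars (parser (list c)))
  attempts*)

(let text (coerce "lorem ipsum" 'cons)
  (list (count-attempts (toy-alt p-star p-brack p-plain) text)
        (count-attempts (toy-alt p-plain p-star p-brack) text)))
; => (33 11)
```

Because the first success wins, a fast path placed first must not accept characters that should start one of the later alternatives.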

-----

1 point by almkglor 6651 days ago

Did something very similar to this; it reduced my 8-paragraph lorem ipsum time from about 3200 msec to 2100 msec.

-----