Arc Forumnew | comments | leaders | submitlogin
3 points by almkglor 6122 days ago | link | parent

nice ^^. I'll be trying to hack this into Arki's parser and see how things go.

Another parsing approach I've seen floating around is to use continuations - have a "pass" continuation and a "fail" continuation. It would still approximately be using combinators, but basic parsers would look more like this:

  (def lit (v)
    (fn (remaining pass fail)
      (if (caris remaining v) (pass (list:car remaining) (cdr remaining) nil)
                              (fail))))

  (def alt-l (parsers)
    (if parsers
      (withs ((parser . parsers) parsers
              next (alt-l parsers))
        (fn (remaining pass fail)
          (parser remaining
            pass
            (fn ()
              (next remaining pass fail)))))
      (fn (_ _ fail) (fail))))
Hmm.


1 point by raymyers 6122 days ago | link

Interesting. Wouldn't the continuation approach be slower though?

-----

1 point by almkglor 6122 days ago | link

Not sure; might actually be faster, since supposedly scheme is much more continuation-friendly than lazy-evaluation-friendly. Or not, I'm not sure

Anyway I haven't managed to wrap my head around how to implement 'seq in a continuation style; also, there's the problem of how 'cut on a parser would work in a nonlazy language - supposedly as soon as the parser given to 'cut consumes anything, 'cut will somehow dispose the 'fail continuation. I'd assume 'cut will be nearer to cut-seq, which would approximately be a cut:seq -

  (def cut-seq (parser . rest)
    (let next (seq rest)
      (fn (remaining pass fail)
        (parser
          ; since the name "remaining" is reused here,
          ; a smart compiler should be able to
          ; dispose of the outer function's remaining
          (fn (parsed remaining actions)
            (next remaining
                  (fn (parsed2 remaining actions2)
                      (pass (+ parsed parsed2)
                            remaining
                            (+ actions actions2)))
                  [err "parse error!"]]))
          fail))))
Since fail is disposed upon continuing to next (which is the rest of the sequence) everything which fail holds could then be reclaimed. 'alt's pass is its own pass, but its 'fail contains a closure.

-----

1 point by almkglor 6122 days ago | link

The nice thing here too is that we completely lose the 'return structure. The current code keeps composing and decomposing it anyway. Hmm:

  (def filt (fun parser)
    (fn (remaining pass fail)
      (parser remaining
        (fn (parsed remaining actions)
          (pass (fun parsed) remaining actions))
        fail)))
If we assume that the base compiler isn't very optimizing (i.e. captures closures when it doesn't need to) we could use helper functions for new 'pass functions:

  (let new-pass
       (fn (fun pass)
         (fn (parsed remaining actions)
           (pass (fun parsed) remaining actions)))
    (def filt (fun parser)
      (fn (remaining pass fail)
        (parser remaining
                (new-pass fun pass)
                fail))))

-----

2 points by almkglor 6121 days ago | link

I've just built a continuation-approach parser combinator library, pushed on the git. It's currently faster on 'many and 'seq, but slower on 'alt. It's considerably faster on Arki's use case, however (about 33% less time).

-----

2 points by raymyers 6121 days ago | link

From the comments: "...continuation-passing style, which is hard on the eyes..."

You weren't kidding :) However, I congratulate you on your superior understanding of continuations.

-----

1 point by almkglor 6121 days ago | link

Thanks. And #2 wasn't kidding either.

That said I think the main issues slowing down the monadic treeparse are:

1. Use of 'return. I'm not sure if mzscheme can optimize this away (it's just being used as a multiple-value return, and is immediately destructured by the calling function). Heck, I'm not even sure Arc destructuring is in a form optimizable by Scheme. Using continuations, this tuple becomes the parameters of the call to the continuation, and that I'm somewhat convinced that Scheme (all Schemes) are capable of optimizing. Also, by avoiding explicit destructuring, we can be reasonably sure that Arc gives the parameter list plain to Scheme (not that I've actually studied ac.scm much).

2. Copying of 'parsed. Unfortunately, we have to copy 'parsed, because of 'actions! This is because 'sem attaches the 'parsed part to the 'actions field (via closure), and so we can't reuse this 'parsed part (alternatively we could just copy 'parsed inside 'sem, instead of copying on 'many and 'seq). IMO: drop 'actions ^^. Copying 'parsed, I suspect, may be the single most time-consuming part of the monadic treeparse.

3. Using a 'tconc cell, while neater, is less efficient (about 20% IIRC) than explicit and separate head+tail variables. My goal in optimizing treeparse was to maintain legibility while adding speed; my goal in cps_treeparse was to optimize, legibility be damned^H^H^H^H^H^H^H consigned to hell.

4. There's a few more bits and pieces that may be optimizable - for example, most of my 'filt usage is (filt [map something _] parser). If we are sure that 'parsed will not need to be reused (assured by dropping 'actions, or giving the responsibility of copying 'parsed to 'sem) then we can define a (filt-map something parser) which returns the same list as 'parser, but with their car's transformed by 'something:

  ;monadic
  (def filt-map (fun parser)
    (fn (remaining)
      ; no actions!
      (iflet (parsed remaining) (parser remaining)
        ((afn (p)
           (when p
             (zap fun (car p))
             (self cdr p)))
         parsed)
        (return parsed remaining))))

  ; cps
  (def filt-map (fun parser)
    (fn (remaining pass fail)
      (parser
        (fn (parsed parsedtl remaining)
          ((afn (p)
             (when p
               (zap fun (car p))
               (self cdr p)))
           parsed)
          (pass parsed parsedtl remaining))
        fail)))
Using filt-map would reduce consing, and especially in the cps case would remove the scan through 'parsed to find the 'parsedtl.

Further optimizations abound: another common idiom I found in my wiki parser was (filt nilfn (seq-str "something")), which basically ignores the 'parsed return value of 'seq, It may be possible to create a variant of 'seq, 'seq-nil, which wouldn't bother catenating the 'parsed return values of subparsers.

So no, I don't think continuation-passing per se causes most of the speed difference, except possibly for eliminating the 'return structure.

Another possible issue is the use of a list-like structure. 'scanner-string also conses on each 'cdr, and the total memory usage actually exceeds (coerce "string" 'cons) - it's just that scanners are sexier ^^.

For this case it may be possible to create a variant of the CPS style which takes a tuple of (string, index, end) - as explicit and separate variables of course, so that underlying scheme has a shot at optimizing it away - and use indexing, which I suspect is faster on strings (and which scanner-string ends up doing anyway). This would remove the need for consing too.

-----

1 point by almkglor 6121 days ago | link

LOL forgot to give the parser the list to parse:

  ; cps
  (def filt-map (fun parser)
    (fn (remaining pass fail)
      (parser remaining ;oops!
        (fn (parsed parsedtl remaining)
          ((afn (p)
             (when p
               (zap fun (car p))
               (self cdr p)))
           parsed)
          (pass parsed parsedtl remaining))
        fail)))

-----