Arc Forumnew | comments | leaders | submitlogin
5 points by waterhouse 5018 days ago | link | parent

Hi, finally replying to this thread. So, I, a long time ago, found that the + function was allocating memory every time it was called; this a) made things slower than they had to be (according to my old thread, incrementing a variable was half as fast as it should have been), b) triggered garbage collections, which were annoying pauses at the REPL, and c) triggered garbage collections when I was doing, like, (time:repeat 100000 <test procedure>) (note that "repeat" is implemented with "for", which uses +).

The problem with (c) is that it made the time to completion rather variable (depending on whether a GC happened, or sometimes on how many GCs happened), so the results were difficult to interpret. It interfered with my measuring and comparing the performance of my procedures.

So it annoyed me. And I found that just using the Scheme + fixed it, stopped the malloc'ing. To know that this crap was caused by a feature I didn't use and that Paul Graham had seemed to declare was a bad idea, a feature that seemed it shouldn't even be there anymore... I took it upon myself to eradicate it from the source code. And I felt better afterwards.

But now this case-lambda version of + (which is polymorphic) seems to not malloc at all, and a bit of testing by me indicated it performed approximately as well as the raw Racket +; I couldn't see any difference. (Btw, I think I perceived that Racket doesn't malloc for argument lists when you call "apply" or something, but it does malloc--probably copies the arglist--if you access it with car or cdr.)

So polymorphic + is no longer so obnoxious to me, and I no longer feel the need to eradicate it. I'd probably still prefer to use "string" to join/coerce to strings and "join" to join lists, myself, and I'd probably prefer "join" as a generic join operator. But I won't kill concatenate-+ on sight. Since overloading itself seems not to be the cause of the performance problems, I might warm to the idea... maybe overload + for my polynomials--or for matrices, I've been playing with them recently--though I've actually only multiplied them, never added them--perhaps I should overload *...



2 points by aw 5017 days ago | link

How did you measure how much memory was being allocated by the various alternatives?

-----

3 points by waterhouse 5017 days ago | link

You may find the following definitions extremely useful. (I do.) Evolved from my previous "mem-use" macro.

  ;note that "memory" = $.current-memory-use
  ; and "current-gc-milliseconds", "current-process-milliseconds"
  ; were supplied in ac.scm
  
  (= gc-msec      current-gc-milliseconds
     process-msec current-process-milliseconds)
  
  (mac utime body
    (w/uniq (gtime ggc gmem)
      `(with (,gtime (msec) ,ggc (gc-msec) ,gmem (memory))
         (do1 (do ,@body)
              (prn "time: " (- (msec) ,gtime)
                   " gc: " (- (gc-msec) ,ggc)
                   " mem: " (- (memory) ,gmem))))))
  
  (= time utime)
You might expect some memory or time overhead just from using this, but in fact there doesn't seem to be any:

  arc> (time:- 1 2)
  time: 0 gc: 0 mem: 0 ;zero overhead
  -1
  ;now I copy the old definition of + from ac.scm to clipboard
  arc> (= + (eval:list '$ (read:pbpaste))) 
  #<procedure>
  arc> (time:+ 1 2)
  time: 0 gc: 0 mem: 352 ;probably from JIT-compiling +
  3
  arc> (time:+ 1 2)
  time: 0 gc: 0 mem: 64
  3
  arc> (time:+ 1 2)
  time: 0 gc: 0 mem: 64 ;by now this is clearly the usual mem-use
  3
  arc> (time:+ 1 2)
  time: 0 gc: 0 mem: 64
  3
  arc> (= + $.+) ;Racket +
  #<procedure:+>
  arc> (time:+ 1 2)
  time: 0 gc: 0 mem: 0
  3
  arc> (time:+ 1 2)
  time: 0 gc: 0 mem: 0
  3
  ;now I copy this definition to clipboard:
  (define (ar-+2 x y)
    (cond ((char-or-string? x)
           (string-append (ar-coerce x 'string) (ar-coerce y 'string)))
          ((and (arc-list? x) (arc-list? y))
           (ac-niltree (append (ar-nil-terminate x) (ar-nil-terminate y))))
          (#t (+ x y))))
  arc> (eval:list '$ (read:pbpaste))
  #<void>
  arc> (time ($.ar-+2 1 2))
  time: 0 gc: 0 mem: 416 ;JIT
  3
  arc> (time ($.ar-+2 1 2))
  time: 0 gc: 0 mem: 0 ;so it allocates zero memory
  3
  arc> (time ($.ar-+2 1 2))
  time: 0 gc: 0 mem: 0
  3
Now for more playing around. A Racket loop that counts to 1 million.

  arc> (time:$:let loop ((n 0)) (if (> n 1000000) 't (loop (+ n 1))))
  time: 3 gc: 0 mem: 760
  t
  arc> (time:$:let loop ((n 0)) (if (> n 1000000) 't (loop (+ n 1))))
  time: 5 gc: 0 mem: 920 ;kinda weird, it alternates 760 and 920
  t
  ;now up to 10 million
  arc> (time:$:let loop ((n 0)) (if (> n 10000000) 't (loop (+ n 1))))
  time: 36 gc: 0 mem: 760
  t
  ;now we test ar-+2
  arc> (time:$:let loop ((n 0)) (if (> n 10000000) 't (loop (ar-+2 n 1))))
  time: 1020 gc: 0 mem: 1096
  t
  arc> (time:$:let loop ((n 0)) (if (> n 10000000) 't (loop (ar-+2 n 1))))
  time: 1019 gc: 0 mem: 776
Apparently it makes a significant difference (30x) inside a tight Racket loop. For fun, let's try the unsafe ops too.

  arc> ($:require racket/unsafe/ops)
  #<void>
  arc> (time:$:let loop ((n 0)) (if (unsafe-fx> n 10000000) 't (loop (unsafe-fx+ n 1))))
  time: 28 gc: 0 mem: 760
  t
Hmm, I had an idea. Put the number check first. New definition:

  (define (ar-+2 x y)
    (cond ((number? x) (+ x y))
          ((char-or-string? x)
           (string-append (ar-coerce x 'string) (ar-coerce y 'string)))
          ((and (arc-list? x) (arc-list? y))
           (ac-niltree (append (ar-nil-terminate x) (ar-nil-terminate y))))
          (#t (+ x y))))
  
  arc> (eval:list '$ (read:pbpaste))
  #<void>
  arc> (= + $.ar-+2)
  #<procedure:ar-+2>
  arc> (time:repeat 1000000 nil)
  time: 122 gc: 0 mem: 3256
  nil
  arc> (time:repeat 1000000 nil)
  time: 121 gc: 0 mem: 1880
  nil
  arc> (time:$:let loop ((n 0)) (if (> n 10000000) 't (loop (ar-+2 n 1))))
  time: 310 gc: 0 mem: 776
  t
  arc> (time:$:let loop ((n 0)) (if (> n 10000000) 't (loop (ar-+2 n 1))))
  time: 323 gc: 0 mem: 936
  t
What, now the Arc loop goes faster than it did with the Racket +. Probably because this function assumes two arguments, while Racket + assumes any number. And now the Racket loop is only 10x slower than with Racket +. (I expect Racket knows how to optimize its own 2-argument +.) By the way, in case you think the Arc loop goes faster than Racket, note that the Arc loop goes to 1 million and the Racket loop to 10 million; Racket is here going 3x as fast as Arc (and if Racket used Racket +, it'd be going 30x as fast).

Oh, um, I should test the case-lambda thing. I copy to clipboard my definition of + from the Optimizations page:

  arc> (cp)
  #<procedure:zz>
  arc> (time:repeat 1000000 nil)
  time: 125 gc: 0 mem: 3096
  nil
  arc> (time:repeat 1000000 nil)
  time: 128 gc: 0 mem: 2200
  nil
Virtually no difference from the raw ar-+2. Just for reference, so I can be sure ar-+2 only goes faster than Racket + in the Arc loop because ar-+2 effectively declares that it takes only two arguments:

  arc> (= + ($:lambda (x y) (+ x y)))
  #<procedure:zz>
  arc> (time:repeat 1000000 nil)
  time: 115 gc: 0 mem: 2776
  nil
  arc> (time:repeat 1000000 nil)
  time: 114 gc: 0 mem: 2040
  nil
Good, I'm not going crazy. Anyway, that final version of ar-+2 is my official recommendation.

Oh, incidentally, this "time" thing (though I think it was just "mem-use" back then) also helped me figure out that "<", ">", and "is" were allocating memory (provoking my work behind the Optimizations page). For example, I found that (no x) malloc'd; I eventually realized that this is because "is" mallocs, and (no x) is defined as (is x nil).

Also, here's some further relevant usage of "time":

  arc> (time:no:= xs (n-of 1000000 0))
  time: 2351 gc: 331 mem: 32513488
  nil
  arc> (time:no:zap nrev xs)
  time: 152 gc: 0 mem: 632 ;hell yeah, nrev works well
  nil
  arc> (time:no:zap rev xs)
  time: 254 gc: 0 mem: 32000584
  nil
  ;now here's when it GC's
  arc> (time:no:zap rev xs)
  time: 371 gc: 117 mem: 31824248
  nil
  arc> time:len.xs ;this is my fixed "len"
  time: 8 gc: 0 mem: 0
  1000000
  ;now, for comparison, let's see the old definition of "len" from ac.scm
  arc> (= len (eval:list '$ (read:pbpaste)))
  #<procedure>
  arc> time:len.xs
  time: 461 gc: 261 mem: 68936656 ;OMG this is ridiculous
  1000000

-----