Arc Forum | shader's comments

Arc Forum

new | comments | leaders | submit | shader's comments

1 point by shader 6448 days ago | link | parent | on: SNAP: Shared Nothing Arc Processes the VM discussi...

Are there any plans in arc2c or SNAP to reify pointers and direct memory addressing in arc? I realize that isn't very "lispy" but it would be way cool to be able to do low level byte mashing/memory management in an otherwise high level language.

pointers + macros = really cool or really dangerous.

Maybe we should go for something a bit like erlang's bit syntax instead.

-----

1 point by almkglor 6448 days ago | link

Err, no. Hehe. Although I believe some Lisps actually implemented their memory managers (GC!) in Lisp itself ^^, but I think it was back when Lisp was more imperative (i.e. when (prog ...) with labels was more popular).

Hmm. Interesting idea, although I can't quite imagine how to do this.

-----

1 point by shader 6437 days ago | link

"interesting" doesn't convey much information. Does it mean positive interesting, as in you'd be interested in trying it? Or does it mean negative interesting, as in your interested in why I would want such a crazy feature in the first place? :)

-----

1 point by almkglor 6437 days ago | link

> Does it mean positive interesting, as in you'd be interested in trying it?

This one. If it were the negative interesting, I'd ask you directly "what for?" ^^

Hmm. Although it might be difficult to do with a copying GC.

As an aside, I'm thinking of exposing the "sharedvar" objects into arc, basically they would be called the "container" type. An arc-on-mzscheme implementation would be:

  (require "lib/settable-fn.arc")
  (def container ((o val))
    (add-attachments
      '= (fn (v) (= val v))
      (annotate 'container
        (fn () val))))

Basically it would be used this way:

  (= foo (container 0))
  (foo)
  => 0
  (= (foo) 42)
  (foo)
  42

Basically assigning to it would be equivalent to setting the pointer, and (foo) would be equivalent to (* foo) in C.

I thought it might be useful in a C FFI (when I get there in, say, T=inf). If a C function accepts a int*, for example, where it would put an int, you would pass in a container, which would automagically get the int.

-----

1 point by shader 6437 days ago | link

Sounds like a reasonable syntax. Would the value in a container in a container be accessed with ((foo))? It fits, but it might get old after a while.

Also, how similar would these "containers" be to pointers? It doesn't seem like it would be as useful, because I doubt you can increment/decrement a container.

So, what are the problems we might encounter if we actually use this?

-----

1 point by almkglor 6437 days ago | link

The value pointed to by the container would be accessed with (foo), just a single layer of parens. The container's pointer would be mutated as (= (foo) ...) though. Oh well.

Edit: Oh, you said "container in a container". Yes, ((foo)) it is.

> It doesn't seem like it would be as useful, because I doubt you can increment/decrement a container

This functionality would be in a scanner, although a scanner, technically, would be just an iterator.

For instance, I fully intend to implement string scanners and input file scanners on the C++-side in SNAP; they would be somewhat "safe" wrappers around C++ iterators, which are just a generalization on the pointer. so (car foo) would be *foo and (cdr foo) would be (foo + 1).

> what are the problems we might encounter if we actually use this?

None at all, if you're talking about just containers and scanners. A container is just a reference to an Arc object. A scanner would just be an Arc-safe wrapper around an iterator.

-----

1 point by almkglor 6387 days ago | link

http://okmij.org/ftp/Scheme/pointer-as-closure.txt

-----

1 point by shader 6449 days ago | link | parent | on: Rethinking macros

Maybe we could use { args | body } ? I don't think the braces are taken.

Now, maybe that's a bad idea; instead, we could redefine the brackets, so that a pipe divides args and body, and if it doesn't have a pipe, it defaults to binding _ ? I don't know how hard that would be, or some variation on the concept, but it would be a bit shorter than (fn (args) (body)), if you don't want to type that all the time.

And how exactly does 'w/hof work, as defined so far? And if it's "standard" now, why not just implement it?

-----

3 points by almkglor 6449 days ago | link

Since Anarki defines [ ... ] as (make-br-fn ...), it's actually possible to have the syntax:

  [params || body]

Which would be:

  (make-br-fn (params || body))

(The double bar is needed because a single | is reserved for weird symbols)

By simply redefining make-br-fn, you can redefine the syntax of [ ... ] to an extent

-----

1 point by shader 6449 days ago | link

I wondered if redefining [...] might be possible. It seems to me that the new double bar syntax is practically a super-set of the old one: if it doesn't have the double bar, just treat it like the old bracket function and define _ to be the argument; otherwise use the argument list provided. Including an empty list, I would hope.

Any word on how hard that would be?

-----

1 point by almkglor 6449 days ago | link

  (let old (rep make-br-fn)
    (= make-br-fn
       (annotate 'mac
         (fn (l)
           (if (some '|| l)
               (do
                 your-stuff)
               (old l))))))

Be careful of supersets: someone's code might unexpectedly break ^^

-----

1 point by shader 6449 days ago | link

Isn't that to be expected in an evolving open source language :)

Do you think || is the best choice, or something else?

-----

2 points by rkts 6449 days ago | link

I think it's a bad choice, personally. I'm not crazy about the single pipe either, but || is awful.

Tangent: this may be a dumb question, but do we really need the pipe character for symbols? I know I've never used it. Why not disallow spaces (and the like) in symbols, and free the pipe for new syntax?

-----

2 points by shader 6449 days ago | link

If you don't like the pipe, then recommend something :)

Other possibilities, in no particular order:

  [ # ]
  [ - ]
  [ = ]
  [ -> ]
  [ : ]
  [ => ]
  [ > ]
  [ ~ ]
  [ % ]
  [ ! ]
  [ $ ]
  [ ^ ]
  [ & ]
  [ * ]
  [ @ ]
  [ + ]
  [ | ]
  [ || ]
  [ ? ]

Most of those are either bad looking or already taken. Anything stand out as a good / ok / not bad choice?

-----

2 points by almkglor 6449 days ago | link

:, =>, and -> don't look bad.

# can't be redefined

Let's try some mockups:

  [ a b c :
    (/ (+ (- b) (sqrt (+ (* b b) (* 4 a c))))
       (* 2 a))]

  [: (thunk this)]

  [ a b c ->
    (/ (+ (- b) (sqrt (+ (* b b) (* 4 a c))))
       (* 2 a))]

  [-> (thunk this)]

  [ a b c =>
    (/ (+ (- b) (sqrt (+ (* b b) (* 4 a c))))
       (* 2 a))]

  [=> (thunk this)]

-----

1 point by shader 6435 days ago | link

So, did we ever make a decision about this? Does someone who knows more than I do about this want to implement it?

Also, is there a way to compose or nest these lambda shortcuts? Or would that make this almost impossible to implement?

-----

1 point by almkglor 6435 days ago | link

Nesting doesn't seem impossible: the reader, I think, will handle nesting as:

  [foo [bar]]

  (make-br-fn (foo (make-br-fn (bar))))

As for implementation, it's easy:

  (given ; this gives us access to the old
         ; implementation of [] syntax; it
         ; is used when we don't find the
         ; separator
         old (rep make-br-fn)
         ; use a variable to easily change
         ; the separator
         separator ': ;for experimentation
    (= make-br-fn
       ; a macro is just a function that has
       ; been tagged (or annotated) with the
       ; symbol 'mac
       (annotate 'mac
         ; the reader reads [...] as
         ; (make-br-fn (...))
         (fn (rest)
               ; find the separator
           (if (some separator rest)
               ; note the use of the s-variant givens
               ; the "s" at the end of the name of givens
               ; means that the variables are specifically
               ; bound in order, and that succeeding variables
               ; may refer to earlier ones
               (givens ; scans through the list, returning
                       ; an index for use with split
                       ; (no built-in function does this)
                       scan
                       (fn (l)
                         ((afn (l i)
                            (if (caris l separator)
                                i
                                (self (cdr l) (+ i 1))))
                          l 0))
                       ; now do the scan
                       i (scan rest)
                       ; this part destructures a two-element
                       ; list
                       (params body)
                         ; used to get around a bug in split
                         (if (isnt i 0)
                             (split rest i)
                             (list nil rest))
                 ; it just becomes an ordinary function
                              ; body includes the separator,
                              ; so we also cut it out
                 `(fn ,params ,@(cut body 1)))
               ; pass it to the old version of make-br-fn
               ; if a separator was not found
               (old rest))))))

Edit: tested. Also reveals a bug in split: (split valid_list 0) == (split valid_list 1)

  (= foo [ i :
           [ : i]])

  ((foo 42))

edit2: p.s. probably not really easy much after all^^. As a suggestion, (help "stuff") is good at finding stuff.

edit3: added comments

-----

1 point by shader 6434 days ago | link

Hmm. It doesn't seem to work with the older version. If I try ([+ _ 10] 3) it complains: "reference to undefined identifier: ___"

It used to complain "#<procedure>: expects 1 argument, given 3: + _ 10", but something seems to have changed between updates :)

-----

1 point by almkglor 6434 days ago | link

Have you tried restarting Arc and then repasting the code?

Probably some dirt left from older versions ^^

-----

1 point by shader 6449 days ago | link

I agree, those aren't bad.

I think that out of those, : makes the most sense. They all make logical sense with arg lists, but : looks the best without any.

-----

1 point by almkglor 6449 days ago | link

> Tangent: this may be a dumb question, but do we really need the pipe character for symbols? I know I've never used it. Why not disallow spaces (and the like) in symbols, and free the pipe for new syntax?

Wanna start implementing a reader for Arc?

-----

1 point by rkts 6448 days ago | link

Looks like this is configurable in MzScheme. Do

  (read-accept-bar-quote #f)

-----

1 point by rkts 6449 days ago | link

How would you write a function with no arguments?

-----

1 point by shader 6449 days ago | link

leave the part before the pipe empty, I suppose

  {| body}

it might need a space between the brace and the pipe, but I don't know

-----

1 point by shader 6450 days ago | link | parent | on: Poll: Priorities

How do you post polls? I was going to ask which libraries we should work on first. Though some people think we should write programs first, and then figure out the necessary libs, I think that there are a great many libraries that are so basic and useful that anyone would want them. A math lib, for instance.

-----

1 point by almkglor 6450 days ago | link

http://arclanguage.com/newpoll

Here's a list started by stefano btw:

http://github.com/nex3/arc/wikis/libraries-todo-list

-----

1 point by shader 6450 days ago | link | parent | on: Rethinking macros

Pardon my ignorance, but what is anaphora as it relates to macros?

-----

2 points by almkglor 6450 days ago | link

It's a macro which automagically binds a name. For example, 'afn:

  (afn ()
    (your-code))
  =>
  (let self nil
    (= self
      (fn ()
        (your-code))))

Or aif:

  (aif x
    (your-code))
  =>
  (let it x
    (if it
      (your-code)))

-----

1 point by shader 6450 days ago | link | parent | on: Rethinking macros

Aren't the "only" differences between macros and functions the time of evaluation, the fact that they don't evaluate arguments, and the fact that they return code to be evaluated instead of a value?

What if we made those traits optional modifiers, similar to the * shown here, so that the programmer would have complete control over how the function operated? Then we could make functions that return code, but evaluate arguments, or avoids argument evaluation but returns a value, etc. And we could flag our functions to say whether they should be evaluated at compile time, or only at run time.

Any common patterns of modifiers could be abstracted by a "macro." So a function that ran at compile time, didn't eval args, and returned code, would be "(mac," whereas a function that evaluated arguments, returned a value, and ran at run time would be "(def" etc. etc. We could make more for other varieties of functions.

This doesn't make code shorter necessarily, but gives greater flexibility when defining functions.

Now, what did I forget that makes this obviously a bad idea?

If I understand it right, the [] syntax requires a function of a certain signature. Interesting idea.

-----

1 point by almkglor 6450 days ago | link

> Now, what did I forget that makes this obviously a bad idea?

Interesting, so we can get something that runs at compile time, evals the args (which we presumably evaluate in the runtime environment) and returns a value.

Oh, you can't get the runtime environment at compile time yet? Hmm.

> If I understand it right, the [] syntax requires a function of a certain signature. Interesting idea.

No, it transforms it to a function with that signature:

  (defho foo ([hmm (it)])
    hmm)
  =>
  (mac foo (hmm)
    `(foo-internal (fn (it) ,hmm)))
  (def foo-internal (hmm)
    hmm)

  (macex '(foo it))
  =>
  (foo-internal (fn (it) it))

-----

1 point by shader 6450 days ago | link

lol, I realize that you could try and apply those three at the same time, but the "compile time" modifier was supposed to be used by the programmer to tell arc that it was safe to do just such a thing. Other wise, it would be evaluated at run time. So while you could use those three at once, it would like you say have an undefined consequence.

Basically, I thought that it might be useful to have functions that didn't eval all their args. At that point, they're only a few steps away from macros, and I thought it would be cool if we could unify them.

>transforms to a function Just shows I didn't understand it right. :) That makes more sense, but I'm still not sure I understand it. Oh well.

Does your p-m:def support matching based on function sigs?

-----

1 point by almkglor 6450 days ago | link

> Basically, I thought that it might be useful to have functions that didn't eval all their args.

  (def foo-internal (x y z)
    (do-something x y z))

  (mac foo (x y z)
    ; eval x, don't eval y, eval z
    `(foo ,x ',y ,z))

> Does your p-m:def support matching based on function sigs?

Yes, although I'm not sure I quite understand what you're asking.

For example factorial works out like this:

  (p-m:def factorial
    "a factorial function"
    (1)  1
    (N)  (* N (factorial:- N 1)))

-----

1 point by shader 6450 days ago | link

Actually, what I meant by match on function sigs was: while writing a higher order function that wants a function as an input, can you require that it be a function that's signature implies one var, two vars, a list etc?

As to the non-evaling functions, I suppose we could have a macro that defined a macro / function combination; the resulting macro would expand into a call to the function with ' before each item in the arg list.

Oh, and I think that `(foo ,x ... is supposed to be `(foo-internal ,x ... Otherwise you have an infinitely recursive macro. :)

-----

1 point by almkglor 6450 days ago | link

> can you require that it be a function that's signature implies one var, two vars, a list etc?

Nope ^^

> Oh, and I think that `(foo ,x ... is supposed to be `(foo-internal ,x ... Otherwise you have an infinitely recursive macro. :)

Yep ^^

-----

2 points by shader 6451 days ago | link | parent | on: SNAP: Shared Nothing Arc Processes the VM discussi...

How's SNAP coming?

Would you mind too terribly much explaining/commenting the code so that I could follow the thought process? I'd like to help, as erlang mixed with lisp just sounds awesome, but I'm not the best with c++, much less c++ I don't understand. ;)

-----

2 points by almkglor 6451 days ago | link

Nicely enough, although I haven't managed to push some of my more recent changes on the github (for some inexplicable reason our internet connection is very slow...).

There are quite a few comments in the code but no real top-level description of the code yet. Here's a first attempt:

The basic class structure for memory management is:

  Generic (abstract/non-instantiatable base class)
   +- .... all Arc objects

  Heap (abstract/non-instantiatable base class)
  +-Globals (concrete)
  +-Process (concrete)

  Semispace

  Heap contains a pointer to Semispace, "s"
  Heap contains a vector of Semispace pointers, "other_spaces"
  Heap has the following dynamic functions:
    get_root_set()
      - gets the root set of this heap, i.e. if it's a
        Process, the process's stack and cache of globals

  A Semispace is defined as a memory area containing 0 or
  more Generic objects.  Each Generic may have 0 or more
  references to other Generic objects.

  A Semispace may be held by only one Heap.

  Each Generic may refer only to Generic objects in the
  same Semispace, or to another Semispace held by the
  Heap, i.e. can only refer to Generic objects in the same
  Heap.

  When a Heap wants to transfer an object to another Heap,
  it must copy all those objects into a new Semispace
  (one which it, conceptually, does not "hold" except
  during initialization), then adds it to the destination
  Heap's "other_spaces" vector (obviously locking is
  required).  While holding the lock it must also add
  the copied memory to the mailbox or wherever in the
  destination's root set it can be held.

  During a GC, the Heap holds its lock, then computes
  a "good" size for a new Semispace, creates it, then
  copies the root set to the new Semispace.  Then it
  releases all its Semispace objects (in "s" and
  "other_spaces")

So, err. that's memory management so far. ^^

-----

1 point by almkglor 6451 days ago | link

Here's a further bit, about ToPointerLock:

  ToPointerLock is a RAII class which protects the
  to-pointers of objects.

  The to-pointers are used for tracing and copying.

  Each Generic includes a to-pointer.  This is private to
  the Generic class, and is not accessible except via
  a ToPointerLock.

  During copying, the copy routine gets a reference
  from the to-copy set.  Then it queries the ToPointerLock's
  to() function on the reference.  If it returns a
  non-NULL pointer, the reference is changed to that
  value.  Otherwise, the copy routine clones the object
  to the destination Semispace, then sets the to-pointer
  via the ToPointerLock's pointto() function, and then
  changes the reference.

  ToPointerLock keeps track of modified to-pointers.  If
  something throws and the ToPointerLock goes out of scope,
  it will reset the to-pointers back to NULL (so that
  the Heap will remain valid).

  The ToPointerLock can be told to forget about resetting
  the to-pointers back to NULL, for example after a
  successful GC, where the memory areas will be reclaimed
  anyway and the to-pointer values don't matter.  This is
  done by calling the good() function.

-----

2 points by almkglor 6442 days ago | link

Here's a widdle bit more, about Executor's, BytecodeSequence's, and Bytecode's:

  An Arc function is represented by the Generic-derived
  Closure class.  The Closure class is composed of a
  function variable and a vector of Generic pointers.

  The Closure's function variable is a pointer to an
  Executor.  An Executor is usually just a wrapper around
  an _executor_label variable, but a derived class called
  ArcExecutor includes a BytecodeSequence as well.

  The implementation of the _executor_label type will
  depend on whether you are using GCC or not.  If you're
  using GCC, then _executor_label is a void* and will be
  a pointer to a lebel in the source code.  Otherwise it
  will be an enumeration, which will be used in a switch
  statement for standard C.

  (GCC supports an extension to C / C++ where the pointer
  to a source code label can be taken using &&label.  In
  standard C / C++ this is not allowed.  This is used to
  speed up interpretation so we don't have to go through
  a switch statement, insted we just use indirect jumps)

  A BytecodeSequence is just a sequence of Bytecode's (the
  exact format is not yet well defined - might be a linked
  list for simplicity, or might do some hackery to put
  them into a vector).  Most Bytecode's are (like
  Executor's) just a wrapper around a _bytecode_label type,
  which (like _executor_label) will be either a void* or an
  enum depending on your C++ compiler.

  However there are two additional variants of Bytecode.
  One is the Bytecode with BytecodeSequence variant, called
  a SeqBytecode.  A SeqBytecode is used to represent an
  inner function prior to creation from a closure:

    arc2c AST:
    (%closure (fn (self k) (k (%closure-ref self 0)))
              x)
    SNAP symbolcode:
    ( (fn (check-vars 2)
          (local 1 #|k|#)
          (local 0#|self|#) (number 0) closure-ref
          (apply 2))
      (local 2 #|x|#)
      closure
    )

  It's also used to represent an if:

    arc2c AST:
    (if x
        (k 0)
        (k 1))
    SNAP symbolcode:
    ( (local 2 #|x|#)
      ; the if contains only the then branch
      ; when flow of control enters the if, it won't
      ; leave the branch!
      (if
         (local 1 #|k|#)
         (number 0)
         (apply 2)
      );else
      (local 1 #|k|#)
      (number 1)
      (apply 2)
      )

  Another kind of Bytecode is the IntBytecode, which are
  used to represent symbolcodes with parameters, such as
  local and apply above (in fact a majority of Bytecode's
  will be IntBytecode's)

  I might also need bytecodes for strings, although I
  might just insert a step in the arc2c compilation
  code to transform them (as well as quoted lists) into
  the following code:

    (def foo ()
      (pr "foo"))
    =>
    (let quoted (string (cons #\f (cons #\o (cons #\o nil))))
      (def foo ()
        (pr quoted)))

  Character literals would probably also be transformed
  to IntBytecode's where the int is the character's
  unicode.

  Then there are the symbol literals ^^, which I haven't
  thought of yet.

-----

1 point by almkglor 6440 days ago | link

Okay, I've been (re)thinking about bytecodes a bit.

The current arc2c code generator essentially outputs a stack-based "bytecode", with the bytecodes being directly output as C macros.

However I've been reading some good things about register-based bytecodes recently. In essence register-based bytecodes have the advantage of describing more operations per bytecode, which reduces the number of dispatches in a function - i.e. we have fewer but longer instructions, unlike a stack-based bytecode system which has simpler but more instructions.

Now from my analysis of arc2c, in the final AST output (just prior to code generation) each function's body is composed of just one thing. It can either be a function call, or an if.

An if statement would then have either a variable reference or a primitive call in its condition, and its then and else branches are either function calls or if statements of the same type.

The component sub-expressions of a function call would then be either variable references or a primitive call (function literals at this time have been converted to primitive calls to %closure, and any nested function calls will have been transformed away by CPS conversion).

So it seems that pretty much everything can be transformed to variable references.

Hmm...

Edit: Waa!! I'm being forced to study MFC against my will!! LOL

-----

1 point by almkglor 6437 days ago | link

If anyone at all's still interested, I've pushed some notes on the bytecode system. I've also started to adapt bits of the arc2c compiler (a few changes were necessary in the final cps and closure-conversion changes) and creating the bytecode generator (unpushed though).

There's an untested symbolcode-to-bytecode converter in the executors source of SNAP too.

The VM is a somewhat register, mostly stack machine now. A few bytecodes were modified/added to allow direct access from a local variable instead of just modifying the stack top. So a sequence {(local N) bytecode} becomes {(bytecode-local-push N)}, removing one bytecode dispatch. It still pushes on the stack top though, but then given that there are relatively few "addressing modes" so to speak, it's not bad.

I suspect it will make JIT, if ever, a bit worse though.

-----

2 points by almkglor 6432 days ago | link

Of course nobody actually cares anyway, but SNAP's bytecode engine now seems to be working.

The tests/executors_test.cpp contains the test for the bytecode/executor engine. It's very basic - it just converts a list-encoded symbolcode/bytecode to a function, then executes that function.

The symbolcode calls a builtin function, '$ (which I'm reserving for implementation-specific Arc stuff). Since this is CPS, the builtin function returns by calling back to a bytecoded/interpreted function.

Assuming you have boost in a standard include, and g++, you can try it out by downloading/gitting from http://github.com/AmkG/snap/tree/master , then go to the tests/ directory and:

  ./compile executors_test.cpp
  ./a.out

IIRC the only thing I've used so far in boost are the shared_ptr and scoped_ptr, which are probably the most portable parts of boost (and don't require compiling a library).

It's tested on 8.04 Ubuntu with the versions of g++ and boost included in it (don't remember exact versions, and don't currently have access to my machine).

This will probably end up being how SNAP initializes:

  1. Construct a bytecode list for compilation by the
     executor system.  This bytecode list will be derived
     from a version of arc2c compiling itself with the
     assumption that it's safe to use macros in the same
     environment.
  2. Convert the bytecode list to a function.  Since this
     requires the executor system, which assumes that it
     normally executes interpreted code and only
     occassionally built-in CPS functions, this conversion
     will need to use some temporary global variables to
     access bits of the executor system that are normally
     accessible only to interpreted code.
  3. Execute the function.  This will end up being similar
     to how interpreted code is normally executed, except
     that we don't have to go through work queues etc.

And of course... a few more tests would be necessary for the executor system, then we have to consider how to handle a worker thread pool to execute processes, and also to disable the use of a worker thread pool if we want to use a single OS thread.

-----

1 point by almkglor 6420 days ago | link

Update...

Currently I'm waffling on I/O. Asynchronous I/O is hard, let's go redditing...

Originally the plan was that there would be a separate builtin type for I/O ports, which would be just shared_ptr wrappers around AsyncPort objects.

Of course there's the rather minor problem though when several separate Arc processes try to access the port.

And there's also the minor problem where in Linux some (most?) filesystems assume they're so fast that they will always return "ready" for select() or poll(). Haha.

Currently I'm trying to implement ports as separate C++ processes with their own message queues.

And just suddenly now, it dawned unto me that this time there's the minor problem where I/O is performed on reads. A read might consume a variable number of characters; the problem is, which variable number of characters? How do I determine, before I even look at anything, how many characters to consume? Since I only get one queue entry, and another process might get the next entry? Take for example the 'read function: it knows it will start a list when it sees #\(, but it can't know how many characters, total, the list will be. We can't "release" the port until we actually complete the 'read function, because if we do, another process might read from the port and start reading random characters, meaning we (this process and the other proess) both end up not reading anything useful.

I'm not making sense I think.

So in the end I may very well have AsyncPort objects in the Arc-side, and when reading pass in an entire function which will consume characters, and then when the function returns a value (or throws an error), then we "know" that the port is released and another process can grab it by passing in a new function.

-----

1 point by stefano 6420 days ago | link

For the read problem, the shared I/O port should have a buffer containing the characters read, and an index on this buffer for every process that read data from it. This way every process can act as if it were the only one to read data because it will read starting from its own index. This doesn't work for writing data, of course, so output port shouldn't be shared.

-----

1 point by almkglor 6420 days ago | link

Suppose process A reads up to the second-to-the-last character. Then suppose it passes the port to process B. Process B does a read. What does it read? First character or last character? What would the programmer reasonably expect?

-----

2 points by stefano 6419 days ago | link

Both the options seem reasonable to me. If B wants to read the first character probably he would open a new port, so when B receives a port from A he should read from the last character. What if now A tries to read from the port? To keep the behavior of the virtual machine consistent (always copy) A should read the last character, and not EOF. If this didn't happen, B would have successfully changed the state of something belonging to A. In conclusion, every process should have their own index in the buffer, and when a port is passed from A to B, B gets a new index initialized with the value of A's index.

-----

1 point by almkglor 6418 days ago | link

Huh. Didn't think of that.

Anyway, the internet connection at home is down, I've got girl problems again, and I am in the mood to sleep before I start shooting everybody (especially since I suck at first player shooters, so playing those games just pisses me off). Don't anybody go around expecting anything decent for SNAP this weekend, not that there was anything decent there anyway.

-----

2 points by almkglor 6416 days ago | link

http://www.monkey.org/~provos/libevent/ Hmm. Possibly useful, we might be able to use its event_loop(). Possibly we can implement a central I/O process written in C++ which simply waits for I/O, then notifies the calling process that actually performs I/O.

As an aside, the problem with having separate buffer pointers for each process in each input port is that it stops being orthogonal to output ports, where it's simply impossible to have separate buffer pointers.

So I think I'm putting it back to "I/O ports are really processes", where the I/O port process is an Arc process that acts as a serializing mutex around the I/O port.

-----

1 point by almkglor 6413 days ago | link

Oh man, asynch I/O is hard...

As an aside, the newer (but presumably less well-developed) libev http://software.schmorp.de/pkg/libev.html supports nice timeouts and child process monitoring, which would really help in implementing 'sleep and 'system.

Further, I'm also thinking of ways of reusing continuation closures.

In arc2c many continuation closures can be eliminated because the compiler can inline global functions with impunity (simply by detecting if a global variable is assigned with a function exactly in one place, at the top-level). These remove the need to construct continuations for many cases, such as calling the basic 'car and 'cdr functions (they are inlined to the 'car and 'cdr primitives), which in most cases are defined only once, in the library.

However in SNAP we want to support full dynamism, so we can't do inlining for many cases and we must actually perform a CPS-style call, which requires constructing a continuation closure.

A continuation closure, once invoked, can usually be discarded immediately; in fact, the continuation closure's data can even be completely overwritten.

The only exception here is with 'ccc. So, what I'm thinking is, we add a new type of closure, a k-closure, which is just a subtype of standard closures. It contains an additional bool, specifying if it can be safely reused (the default).

When 'ccc is invoked, it clears that bool. In addition, it must also search through the entire existing "stack" of k-closures, clearing all their reusable flags.

A continuation function can then simply reuse its own k-closure while constructing a new continuation, unless the reusable flag is cleared.

-----

1 point by almkglor 6410 days ago | link

Okay, I've implemented the continuations idea above. I also tested it a little, and tested a beta version of the arc to bytecode compiler (that's now on the git). Testing revealed a hidden bug in the executors framework ^^ specifically the first bytecode in each function did not get executed. Since the most usual first bytecode is a check of the number of parameters, I didn't see the error ^^

I'm thinking of also implementing the continuations idea above in arc2c.

Edit: Oh, and since we're on the subject of continuations: I don't know, but the lack of a full 'dynamic-wind support in Arc seems rather, err, well, puzzling. What's supported is about half of 'dynamic-wind, i.e. the half that handles exiting the 'dynamic-wind; it's exposed as the ac.scm 'after. This means that if someone creates a generator library and does something like:

  (defgen foo (f)
    (w/infile p f
      (yield (read p))))

... it won't actually work, because once 'yield calls outside of the context of 'w/infile (via a continuation), the file is closed and can't be reopened (because 'w/infile doesn't use an "before" handler, just an "after" handler).

What I'm wondering about is: is this actually OK? If I implement just "after" handlers on continuations, is that "good enough" for Arc?

-----

1 point by stefano 6413 days ago | link

How do you safely inline functions?

Suppose you have a file with these definitions:

  (def f () 8)
  (def g () (f))

and you compile it. arc2c would then inline f when called by g. Suppose now that I load the compiled file from the repl and then type:

  (def f () 5)

now calling g from the repl should return 5, but f were inlined before, so it will return 8. Does arc2c handle this kind of problems? If it does, how is this implemented? I'm asking because I wanted to do a similar optimization in nyac, but this problem blocked me.

-----

1 point by almkglor 6412 days ago | link

Simple: arc2c doesn't have a REPL ^^

Basically, it expects all definitions to be in the file. It only inlines definitions that are:

1. Done at the top-level, including loaded and required files

2. Defined only once (i.e. assigned to only once)

Basically this is largely an optimization of the arc.arc and ac.scm libraries.

This optimization is useable only if 'eval is not ever used. If 'eval is ever used, this optimization will have to be disabled.

-----

2 points by stefano 6412 days ago | link

I wonder if there is a way to make that optimization work also when eval is used, maybe by tracking a dependecies list for every function, because it is really helpful. To make an example: the '- function takes a variable number of args, so all the arguments passed to it must be consed. With inlining it would be possible to avoid consing the arguments when they are known to be 1 or 2. To give you an idea of how much consing the args costs, take the fibonacci example: to calculate the 32nd number on NYAC when consing '- args it takes (on my computer) ~3.4 secs, without consing '- args it takes ~0.6 secs. This is a huge difference.

-----

1 point by almkglor 6410 days ago | link

> With inlining it would be possible to avoid consing the arguments when they are known to be 1 or 2.

Hmm.

Since I control the VM anyway, it may be possible for me to add some sort of feature/bytecode/built-in-function that can perform reduction of arguments without consing (i.e. just by allocating a continuation function and reusing it). It would be possible to also have it "short-circuit" so to speak, prevent it from allocating a new closure and just pass its own called continuation directly to the child function.

Basically it would be like this:

  (with (f0 (fn () (handle-0-arguments))
         f1 (fn (a) (handle-1-argument a))
         f2 (fn (a b) (handle-2-arguments a b)))
    (fn rest ; not actually expanded into a list
      ; this will be implemented in C
      (if
        (no rest)
          (f0) ; pass the continuation
        (no (cdr rest))
          (f1 (car rest))
        (no (cdr:cdr rest))
          (f2 (car rest) (cadr rest))
        ; perform reduction
          (ccc
            ; enclose these variables
            (with (a (f2 (car rest) (cadr rest))
                   rest (cdr:cdr rest))
              ; also enclose the continuation
              (afn (k)
                (zap a (f2 (car rest) (cadr rest)))
                    ; rest will be an index into the
                    ; closure variables
                (if (zap cdr rest)
                    (self k)
                    (k a)))))))))

Oh and here's a blog of SNAP: http://snapvm.blogspot.com/

-----

2 points by stefano 6409 days ago | link

So handle-n-arguments is a special form? Exposing such a special form would break compatibility with both arcN.tar and Anarki, but this seems a good solution.

> Oh and here's a blog of SNAP: http://snapvm.blogspot.com/

Nice! This way informations about SNAP will be no more spreaded in the forum. I suggest copying everything already present in the forum about SNAP into the blog.

-----

1 point by almkglor 6408 days ago | link

It won't be exposed as a special form - rather it will be exposed as a bytecode, and the symbolcode-to-bytecode compiler will be exposed as a function that the implementation specific '$ will return. This should reduce the conceptual footprint of extensions to Arc that I add strictly for efficiency, versus actual axioms.

Basically we just grab $ as a sort of dispatcher for implementation-specific stuff (much like it is used to access mzscheme in Anarki), and put anything that is more-or-less implementation-specific there.

-----

1 point by almkglor 6410 days ago | link

True; I'm thinking of something like using a dependencies list like you suggested, and doing dynamic recompilation (inlining the code) when a function call is done a particular number of times. I then keep a reference to the original code, in case some member of the dependencies list is modified.

The problem here however lies in the severely parallel nature of SNAP. If I'm recompiling code, then I either make a (process-local!) copy, or I force everything to use (comparatively slow!) locks just to read code. The better choice is the process-local copy but it has the drawback that optimizations that run on one process won't get passed to another T.T unless SNAP is run in single-worker-thread mode.

-----

2 points by shader 6431 days ago | link

I do too care. It's just that I'm not intelligent/experienced enough to understand what you're doing in a way that would allow me to help. I have very little experience with pl design, and even less with compiler/vm implementation (read, none). However, I'm very interested in how it turns out in case I could learn something and especially because I want to use it when you're finished.

So please, keep working on it, and updating us about the changes. And it's not like there's anything else going on around here for you to distract us from anyway.

-----

1 point by almkglor 6431 days ago | link

Well, this is my first time hacking together a VM anyway ^^. And arc2c is the first time I've hacked on a compiler, and I didn't even start arc2c.

Here are some interesting bits and pieces about VM's that I got via google trawl:

http://www.sidhe.org/~dan/blog/ - Squawks of the Parrot, a blog made by one of the first developers of Parrot VM. SNAP does things very differently from Parrot, partly because of its shared-nothing constraints, partly because this is a lisplike, and mostly because the Parrot VM is just something ^^.

http://blog.mozilla.com/dmandelin/2008/06/03/squirrelfish/ - about the squirrelfish VM. This is the main reason why I decided to use bytecode instead of AST traversal, and why I made a bit of an effort to reduce dependence on the stack (by introducing the -local-push and -clos-push variants of some bytecodes). It also has a bit about direct threading, which is portably implementable only on GNU compilers, and is the fastest portable implementation of bytecode interpreters (faster methods require assembly coding, or at least (in the case of dynamic inlining) some knowledge of what type of code can be copied without modification on what type of processor).

http://research.sun.com/self/papers/papers.html - a bunch of papers about Self, a smalltalk-like language and its implementation. I hear tell it's the most powerful VM on the planet, and its results are what is fueling Sun's recent JVM optimizations. The page itself does not contain the papers, but you can search for their titles and have a good chance of finding them online (or at least, when citeseer is up ^^)

And some more discussion:

Dynamic inlining is a sort of "poor man's JIT". Instead of generating a bytecode sequence as a vector of pointers-to-labels (as in direct threading), we directly copy the bytecode's implementation into the stream.

JIT is still the fastest way to implement an interpreter.

-----

1 point by almkglor 6431 days ago | link

Added a few more tests.

Also, the g++ version I've got is 4.2.3, and libboost is 1.34.1 . Ubuntu here is the 64-bit version, 8.04

-----

1 point by shader 6451 days ago | link | parent | on: Poll: Priorities

>easy-install environment...

That would be cool. As far as I know, s-exps are s-exps no matter which lisp they're written in. The general environment might need a good way of defining syntax so that it could be more helpful than just a text editor with REPL, but that sounds like something lisp should be capable of doing pretty well.

The only problem is that everyone would argue over which language to implement it in. How about we just start in arc, and let them all join us later?

Actually, it sounds like an extended DrScheme, which already supports several different languages of scheme in the editor. If the interface was improved a bit, and it was extended to handle CL and Arc, then it should suit admirably. Unfortunately, I don't know how the language selection feature works, so I don't even know if that's possible.

-----

4 points by cchooper 6451 days ago | link

Arc is probably a good language to implement it in, because it has such a small core and is so flexible. I also think Arc's users are a bit more experimental and ambitious than most Lisp users. Consider the fact that Arc will probably have a shared-nothing garbage collector before any other dialect!

-----

4 points by shader 6451 days ago | link | parent | on: Poll: Priorities

Documentation would be helpful. I think it would be nice if the docs did more than just describe the interfaces to the functions, though that would help. It would be nice if the docs were more developer oriented for the time being, so that interested parties (like myself) could figure out enough of what was going on to help a bit :)

Once I learned what and how things were going on, then I could start to work on libraries. Not to mention the virtual machine. ;)

Personally, of the other three, I would prefer the VM if it included erlang functionality. In second place would be the a2c, because I think it would optimize better than a native compiler, though I could be wrong. I've been amazed several times by what y'all have been able to do.

That being said, a native compiler would be really cool. It sounds more stable and standalone than a2c, and probably the right long term choice. It's also something I'd be interested in learning how to do.

The only other thing I think the community should do is start a good wiki for documentation. That and a good place to upload / download libraries, a la cpan. I don't think we want to include them all in git forever. It would be great if we could make a good centralized place to find libraries and documentation, instead of distributed across blogs and this forum.

-----

5 points by almkglor 6451 days ago | link

There's kens' excellent http://arcfn.com , as well as the Anarki-only (help ...) : use (help function-name) for help on a specific function and (help "regular expression") to search function names and documentation.

re: native compiler: possibly a gcc front-end might work?

-----

1 point by stefano 6451 days ago | link

There is a wiki on github. For the moment I think putting all libraries on Anarki should suffice. When the libraries begin to proliferate (if this will ever happen) we could start thinking about something like cpan.

-----

5 points by shader 6454 days ago | link | parent | on: PLT Scheme 400 officially out -- anybody tried Arc...

Ditto. I especially like the fact that, because there are so few people working on it, I actually have a chance at making a difference. Not a very big chance, since as yet I'm still a noob, but it's still something to hope for ;)

-----

7 points by lojic 6453 days ago | link

I thought it would be cool to be part of new Lisp community also, and it may eventually develop into that, but it's increasingly seeming like just a pet project of pg's that may or may not develop into something more. That's not a criticism - I think it was generous of pg to open source Arc and invite folks to participate, and he's been straightforward with his intentions. It's just a slightly different project than I expected initially.

I'm also a Lisp newbie. If I was already experienced in Common Lisp or Scheme, I might be more interested in investing more time with Arc to learn something new, but at this point, I'm simply looking for the best Lisp to learn. Given that I primarily develop web based software, you'd think Arc might be the one; however the ease of creating simple web apps is offset by the lack of other stuff I need and the question of long term viability.

So, learning PLT Scheme seems like a reasonable course of action because I expect I can bring most of that knowledge back to Arc if it gains more traction. Who knows, if developing web apps with PLT Scheme becomes too painful, maybe I'll just switch back to Arc and give it a few months of learning.

-----

1 point by projectileboy 6448 days ago | link

I agree with your comments, and those above, and I actually think it's a huge strength of this project and why I remain interested. I think way, way too much stuff in the software community is focused on DO LOTS OF STUFF RIGHT AWAY GET IT DONE YESTERDAY WAAAAHHHHHHHH FASTER FASTER FASTER!!! That may be the right way to start a company, but it's a really crappy way to build a "one-hundred year language".

I like the fact that this project is moving slowly, and that the community (from newbies like me to pros like Tilton) has time to try things and reflect, and not in internet time.

I'm 36. I've already learned and thrown away about 3 different major development ecosystems in my career. I accept that I'll continue to do that in order to pay my mortgage, but in my own time, when I'm actually programming for fun, I want to hitch my wagon to something that will actually have some lasting value. And Arc (or at least something very similar) is the best-looking option to me at the moment.

-----

1 point by schtog 6445 days ago | link

True, most language-developers don't have the luxury of skipping backwards compatability.

Obviously this complicates things for library-developers but Arc will be very interesting when it is done.

Even if it might not be something new it will hopefully together with the Arc-community things what´s is wrong with all current LISPs.

-----

7 points by shader 6454 days ago | link | parent | on: PLT Scheme 400 officially out -- anybody tried Arc...

Any plans to move arc away from scheme to a more stand alone situation? Maybe via arc2c? Then we don't have to worry about it working with different scheme implementations.

-----

4 points by stefano 6454 days ago | link

The positive thing of Arc being implemented in top of MzScheme is that this way you can easily modify the core language and try new features without messing with assembly or virtual machines, but in the long run an independent implementation seems to me inevitable. Arc2c is moving this way and I think it will eventually abstract over C, maybe by targeting a general bytecode that can be translated to C, to assembly or interpreted by a VM (this is I think what SNAP is supposed to do).

-----

1 point by mr_luc 6432 days ago | link

Absolutely.

I constantly see comments bemoaning the lack of progress in defining new language ideas in Arc. I don't say I agree 100% with this complaint, although probably the closest that the community has come to this is with hcase.

And of course pg gives library design the same status as language design -- or Arc does, by being so damned flexible. It's not as though Arc is standing still.

But really, the thing that's going to advance the language will be when we use Arc to make things! And that takes some time, and the things we want to make may not be all that important but they'll be interesting to us.

And so while Arc's built-on-PLT status is an obvious hindrance to winning language pissing matches, it means that little is preventing us from doing very interesting things. In fact, that's why FFI's have been such an exciting area of discussion; many of us were limping along with fun little pet projects in another language that wasn't ideal but that had a convenient interface to a couple of C libraries.

In my case, ruby has a good interface to Gosu (for smooth, fast, trouble-free operations on images, even big ones, abstracting away a good chunk of opengl use cases while still letting you use opengl where you want to) and chipmunk (for relatively pain-free 2D physics), and so I made a little screen-saver that has my photographs, mixed with random pg whodehouse quotes from drones.com, tumbling down the screen waterfall-style, and then set about cooking my own code. I got it down to under 100 lines with room for improvement.

(I was living in a fishing village in pacific south america for the last 6 months, with nothing but a $300 windows vista craptop a friend loaned me, so my projects were necessarily of the silly, one-off, only-until-the-surf-gets-good-again variety. I just got back a week ago.)

At the moment, I'm mulling over redoing the same project in Arc, and so my first step is figuring out how to use Chipmunk and Gosu in PLT Scheme. I've never used a C lib from scheme before, so this could be educational.

But I can at least consider doing this project in Arc, in the short-term, because it's implemented on top of PLT. Bad for winning a pissing contest, good for hacking.

-----

4 points by almkglor 6454 days ago | link

There is indeed a plan to make arc2c self-compileable, and to implement a REPL on top of an arc2c that's also capable of interpreting code.

-----

4 points by shader 6454 days ago | link

In that case, what needs to be done to finish it, or at least make it "self-compilable"? And then what do we need to do to skip c? Or is that even reasonable?

-----

3 points by almkglor 6454 days ago | link

> In that case, what needs to be done to finish it, or at least make it "self-compilable"?

Macros, a reader (can be implemented in Arc using lib/treeparse.arc), symeval, I/O

> And then what do we need to do to skip c? Or is that even reasonable?

SNAP

-----

3 points by almkglor 6454 days ago | link

Regarding I/O: it's difficult to do I/O in the presence of green threads. Obviously when one green thread is blocked by I/O the other threads must continue running, meaning we need some sort of async I/O. Most of the async I/O libraries I've seen so far seem to concentrate more on sockets than I/O in general.

-----

2 points by shader 6453 days ago | link

What are your ideas for handling async io? I know there are system specific ways, but I don't think that's a good idea.

We could have separate os threads for the io, one for each major io system: HDDs, network, keyboard, etc. We could write special io methods for the green threads that request the io from the scheduler. The scheduler adds the request to the appropriate io thread queue. When the operation completes, it either unblocks the gt, uses the provided call-back, or passes a SNAP style message.

I don't know if that would work very well. It doesn't seem very scalable, and at the very least is complicated. It would be best of course, if we could find a cross platform async io lib. But it still might be a good idea to make the scheduler responsible for io, whatever we use. I'm sure you know more about the whole process than I do.

Do you know how any other systems do it? Does erlang have async io? How does the JVM get it's async io? The wikipedia page claims that it is possible to synthesize async io from polling and interrupts (supposedly the JVM does that), but I don't know how standard and trans-system those are. http://en.wikipedia.org/wiki/Non-blocking_I/O

-----

2 points by almkglor 6453 days ago | link

> What are your ideas for handling async io?

Here's a list of ideas:

nil

> Does erlang have async io?

I assume it uses async I/O: For a long time Erlang had only green threads and didn't actually use OS threads. From what little I could grok of the Erlang source, it seems it has a set of OS-specific "drivers" for I/O, but these drivers seem to be extremely basic. It seems to have some sort of queue, but I can't figure out how it handles the queue (i.e. how it prevents the queue from blocking normal operation).

Each port is apparently modeled as a process that happens to have the ability to resume/suspend other processes. When the port gets its timeslice (?) it goes through its queue, does one I/O request (?), then resumes the process that asked for that I/O (I think).

In fact the scheme you described seems, approximately, what Erlang does. I think. I haven't found any particularly good discussions on the Erlang source.

Then there's some note somewhere in erl_async too that if threads (OS threads?) are unsupported, then the "async" I/O becomes synchronous (?).

> How does the JVM get it's async io?

I think that again, this is OS-dependent. After all, JVM is a virtual machine, so it's up to the JVM implementation to handle this. Figuring out how a random JVM implementation does this probably means digging through C source.

-----

2 points by stefano 6453 days ago | link

To do async IO there are 2 ways: use OS threads or use green threads and OS-specific asynchronous I/O calls. Either way, the VM should, obviously, abstract away the OS-dependant features. Using OS threads seems easier to me.

-----

1 point by almkglor 6453 days ago | link

Both ways seem quite OS-specific to me ^^

Hmm. This will also have to be done in arc2c too.

-----

1 point by stefano 6452 days ago | link

I don't think there is some way to avoid using OS specific functions for async IO. Abstracting this shouldn't be very hard, though.

-----

1 point by shader 6451 days ago | link

Well, there is a Boost library for cross-platform threading. We could just use that and "simulated" async using threads to offload the synchronous io. Now, maybe that's a bad idea, and we need to use some abstracted os specific async libraries, but it would work.

Boost also seems to have plenty of libraries that would be useful for SNAP. How is that going anyway, almkglor? I should probably ask questions pertaining to that subjects in it's own thread.

Pretty quiet around here. Hopefully that means everyone's busy ;)

-----

1 point by almkglor 6451 days ago | link

> Well, there is a Boost library for cross-platform threading. We could just use that and "simulated" async using threads to offload the synchronous io.

Boost also has asio, but it appears to be concentrated more on socket I/O than I/O in general.

What I would like is to have a general-purpose asynchronous I/O library, which would handle all details of asynchronicity.

Basically, I intend to allow SNAP to be compiled in two modes:

1. Single worker thread, multiple process. For cases where the OS doesn't have decent threads (dorky embedded systems? LOL, quite a few of the embedded systems I've seen recently actually have decent threads)

2. Multiple worker OS-level threads, single runqueue.

async I/O is needed for 1, and would be nice for 2: in case 2, if a worker thread blocks on synchronous I/O, the others will still fetch processes to run from the runqueue, although the blocked worker thread is one less thread that can work.

However in case 1, there's just one thread, so it can't block.

-----

2 points by conanite 6445 days ago | link

rainbow is a completely scheme-independent arc implementation. I expect that it's not the 100-year implementation of the 100-year language: arc2c is likely to have seriously superior performance. And I gather that java isn't pg's all-time favourite language. A lot of scheme leaks into the current arc implementation: (is #f nil) is t, #x12 is 18, 'sread reads scheme expressions. Apart from sread, which is destined in a comment to be replaced by writing 'read, it's not obvious whether these should be considered part of the specification.

In the early days, many java libraries were non-java implementations. Gradually they were replaced by "pure java" versions. Is there any reason arc should not do the same?

-----