Arc Forum | shader's comments

Just tried it, on Windows.

It fails pretty early, in ac.scm. If I use pretty much any of the built-in "languages", it breaks, claiming that "interaction-environment" is unbound. This is a function that's supposed to return the current environment, and it's used as a parameter to eval in the REPL.

If I use the R5RS language, the interaction-environment function is defined, but it doesn't help because it breaks even earlier, at (require mzscheme) in as.scm, claiming that it doesn't recognize require.

Update:

After replacing all of the interaction-environment calls with calls to current-namespace, as recommended by another comment on this forum, the first thing that failed was 'set-car!'
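
For reference, the substitution I mean is roughly this (untested sketch; the exact call sites in ac.scm may differ a bit, but in PLT 4 eval accepts a namespace and current-namespace is a parameter that returns one):

  ; MzScheme 3xx style, as in the distributed ac.scm:
  (eval (ac expr '()) (interaction-environment))

  ; PLT 4 style:
  (eval (ac expr '()) (current-namespace))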

-----

5 points by eds 6454 days ago | link

Since mzscheme's lists are now immutable, you need to use 'mcons rather than 'cons in ac.scm.
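
For instance, something like this (from memory, so double-check against the PLT 4 docs; mcons, mcar, mcdr, set-mcar! and set-mcdr! come with scheme/base, and scheme/mpair has the list-level helpers):

  (define p (cons 1 2))      ; immutable in PLT 4
  ; (set-car! p 3)           ; gone from the mzscheme language in v4

  (define mp (mcons 1 2))    ; mutable pair instead
  (set-mcar! mp 3)
  (mcar mp)                  ; => 3
  (mcdr mp)                  ; => 2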

-----

1 point by mr_luc 6432 days ago | link

Huh! Is that all it takes?

It's been a while -- does anyone have a version of ac.scm that works w/400? Can I see?

Man, I'm feeling lazy today. Fourth of July, woohoo.

-----


Um, how does that work?

-----


If they're immutable, then great. But using shared memory to transfer state is flawed, and can cause race conditions, etc. That's why we're using the actor model to begin with, right?

I suppose they can't really be immutable, or we couldn't do hot code loading. How does Erlang do it?

-----

1 point by almkglor 6455 days ago | link

> But using shared memory to transfer state is flawed,

cough let the programmer second-guess you cough

> How does Erlang do it?

It doesn't, actually. Functions are still effectively global variables, ones which you can't write to except via the c() BIF.

-----

4 points by shader 6455 days ago | link

>cough...

Heh, you're right. So, in that mindset, why not give a lot of nice macros for controlling share vs copy, and make the default be copy? Then programmers could control nearly everything. Of course, they could always hack on your vm if they really wanted tons of control.

But still, concurrent writing to a global variable sounds dangerous.

I kind of like the idea of them being "registered processes." I'll have to do some more thinking on that.

>It doesn't, actually...

Yes, that answers some of the question, but I was a bit more interested in how they implemented their hot code loading. The old code still exists for a while, as the existing processes continue to use it, but eventually the old functions are phased out and everything swaps to the new ones.

IMHO, hot code loading is a very nifty feature. Combined with a remote REPL it's especially useful. I don't know how well current Lisps support hot swapping, but I don't think it can work effectively without a concurrent system.

-----

3 points by almkglor 6455 days ago | link

> Yes, that answers some of the question, but I was a bit more interested in how they implemented their hot code loading

It's in the OTP library actually. For example, they have a standard gen_server module. The gen_server would look approximately like this in snap:

  (def gen-server (fun state)
    (<==
      ('request pid tag param)
        (let (state . response) (fun state param)
          (==> pid (list 'response tag response))
          (gen-server fun state))
      ; hot code swapping!!
      ('upgrade new-fun)
        (gen-server new-fun state)
      ('stop)
        t))
So yes: hot swapping just means sending in a message with the new code ^^

It's actually more complex than that - they generally make messages include a version number so that nodes with new versions can communicate to nodes with older versions. This versioning is, in fact, part and parcel of the gen_server series of functions. Requests to servers are made via functions which send a message (with tags and other metadata such as versions abstracted away) and wait for a receive.

I think what they say is, the programmers good at concurrency write the gen_server and other parts of the OTP, while the average programmers write application code ^^

Much of Erlang isn't implemented in the VM level ^^

-----

3 points by shader 6455 days ago | link

It makes sense that they wouldn't do that at the vm level. Your code even makes sense, though I thought "let" only assigned one variable.

I'm still not quite able to read Arc fluently, so any explanations of the subtleties that I likely missed will always be appreciated. Come to think of it, any explanations of any code would be nice, as the thoughts and reasons behind code don't always come out in the source itself. And I also like learning new things :)

-----

2 points by almkglor 6455 days ago | link

  (def gen-server (fun state)
    (<==
I'm using pattern matching. Although Arc doesn't actually have pattern-matching built in, someone wrote a pattern matching library a long time ago using macros: http://arclanguage.com/item?id=2556 http://arclanguage.org/item?id=1825 . The modern evolution uses something like p-m:def to define a pattern-matching function, p-m:fn to create an anonymous pattern-matching function, etc.

      ('request pid tag param)
The pattern above means "match a 4-element list, whose first element is the symbol 'request, and which has 3 more elements that we'll call pid, tag, and param".

        (let (state . response) (fun state param)
This is a destructuring. It simply means that (fun state param) should return a cons cell, with the 'car of the cell being placed in state and the 'cdr being placed in response. So we expect fun to return something like (cons state response)

          (==> pid (list 'response tag response))
Note the use of 'tag here. We expect that 'tag would be a 'gensym'ed symbol, and is used in pattern matching so that the client can receive the message it's looking for.

          (gen-server fun state))
Plain recursion.

      ; hot code swapping!!
      ('upgrade new-fun)
        (gen-server new-fun state)
This is it, really: just give a new function.

      ('stop)
        t))

-----


Agreed. Depending too much on global state sounds risky, though I suppose it depends on how you abstract it. You could have two types of 'globals': one the traditional scopeless variable, local to each process, and the other a process registered somewhere it can easily be found. So, instead of requiring pre-knowledge of the pid, you can look it up. That sort of sounds like a global variable, but it wouldn't be limited to just storing a value.

Unfortunately, you'd still need to handle the problems that would occur if one of those global procs died, or if the table got corrupted.

-----


Concerning Bloom filters: they are only helpful if knowing whether a particular process already has a reference matters when deciding to copy. So they're useful for the case where you're sending an object to a process. Otherwise, ref counting is better, because you aren't broadcasting "do you have this?" messages.

Bloom filters are probably only a good idea if your copy is a large one, rather than just a small variable.

Here's an interesting idea: for extremely large messages, have some sort of "lazy copying" where the process only copies pieces just before it uses them, or when there isn't a cost associated with the transfer. That way large copies don't slow down your system. You'd still increment the reference counter, if you were using one, and any mutations would have to change a different copy.

Actually, a good question is what the relative costs of sharing vs. copying are. I would think that for small things copying is less expensive than keeping track of who has it, etc., just to avoid copying. For bigger items, it might be worth avoiding a premature copy. I don't know.

Edit 1: About the k hash functions: I think that if there is low enough correlation between bytes in the result, you can break up one hash function into several. Also, most good hash functions (the ones that might be too slow for this purpose) change their output enough with slight changes to the input that you could just add 1 to the value several times and get a dozen different hashes. So you could use:

  (hash val)
  (hash (+ val 1))
  (hash (+ val 2))
  ...
and get "3" hash functions. If you break each hash into bytes as well, and each produced multiple bytes, you could easily have 6, 9, or 27 "hash functions"

-----

2 points by almkglor 6455 days ago | link

> So, it's useful for the case when you're sending an object to a process.

Yes, but it is in fact the only case where we copy objects anyway (even reading a global variable would be effectively sending the object in the global to the process in question)

> for extremely large messages...

  #define N    I don't know
  if(msg.total_size() > N){
    msg.share(process);
  } else {
    msg.copy(process);
  }
I think this will end up having twice the overhead though: you need mechanisms to support both sharing with copy-on-write-and-if-receiver-already-has-it and copying.

Take note also that sending messages is recursive: we might be sending a list and thus end up sending several items.

> I would think that for small things copying is less expensive than keeping track of who has it, etc., just to avoid copying.

This was the justification in the Erlang BEAM and JAM machines to always copy.

> About the k hash functions...

Cool, I didn't think of that!

-----

2 points by shader 6455 days ago | link

So then, should we just always copy?

Unless we make some special immutable types as mentioned above; then we could share those. Maybe I'm just prematurely optimizing, and should just give up on trying to find ways to avoid the copy. Life is much simpler just copying. :)

How about just starting with always-copy, and later, when we have performance data and experience using it, we can code exceptions? I don't know how easy it would be to code such exceptions, but we'd be less likely to break things if we waited for empirical evidence before making these decisions.

>twice the overhead...

I didn't think it would create any extra mechanisms, just a short-circuit rule when deciding whether to copy or not. Rather than checking the ref counts each time the item is mutated or sent, if it is smaller than a certain amount, we know it was copied.

Now, maybe ref counts are faster than total_size(), but I don't know.

Certainly, always copying would avoid this whole overhead business. :)

-----

1 point by shader 6456 days ago | link | parent | on: Arc Programming Assignment

Usually, it seems to be either (n log n) or (n log n) - 1 digits.

And in this case I would leave off the O, as that usually refers to the performance of an algorithm in time or space. I suppose you could construe the number of digits to be "space", but multiplying O(n) numbers doesn't make that much sense.

-----

4 points by shader 6457 days ago | link | parent | on: What's next?

Cool. Have you made any progress? The intersection of Arc/Lisp and Erlang is one of my interests as well. Think of macros + easy, fault-tolerant concurrency, and hot code swapping! :) I'm very interested in seeing anything you've made, and maybe even helping (if I can; unlikely). Based on what you've done so far I'm sure it will be awesome. If you need any ideas on implementation, you could probably ask Joe or the Termite guys.

-----

3 points by almkglor 6457 days ago | link

Well, the memory management is just about done, with the not-very-minor problem that I haven't tested that part yet, for the very simple reason that I don't have anything to test it on. Bad style, I think; I'm pretty much going to haul a lot of stuff into workability when most of the stuff's written, instead of bit by bit. Lisplikes have spoiled me with their easy-easy-easy-test-on-the-repl T.T

The usability of the VM is also predicated on the usability of arc2c, for the very simple reason that I intend to just replace the final code generator step with a bytecode generator.

Joe == Joe Armstrong?

Edit: Digging through a bit about Termite, I see they can actually implement process migration, by the simple expedient of using call/cc. Since I fully intend to support call/cc myself, and to allow serialization of functions, my new VM will also be able to support this feature ^^.

-----

3 points by almkglor 6457 days ago | link

As an aside, I've noticed that Termite, and apparently quite a few languages built from the ground up to have Erlang-style concurrency, all ban mutation. Which doesn't seem to be necessary for Erlang-style concurrency, actually. At least, if you're copying objects between processes, then mutation shouldn't matter. If underneath the "shared-nothing" gloss you're actually sharing (at the VM level), then yes, having everything immutable is necessary. Edit: an alternative would be to do copy-on-write for shared objects, but mixing copy-on-write for shared stuff gave me headaches, which is why I decided to drop it in favor of copy-on-message-send.

I think I'd rather keep Arc's ability to mutate everything, though; although certainly I'd avoid using mutations in any code samples if at all possible.

An interesting bit in Termite is that ports can also be passed over the network.... whoa!

-----

1 point by shader 6456 days ago | link

I suppose that choice (mutation vs. copying) depends on the relative performance / ease of coding. If you think that mutation is important, then you can sacrifice the efficiency of limited sharing. Alternatively, you can share and not mutate. Maybe your implementation could leave that up to the user? Have directives for copy vs. share, and if sharing is permitted, then mutation is not, etc. Of course, that clutters the vocabulary and leads to extra typing.

Actually, I don't know which is worse performance-wise: code introspection to prohibit mutation, or copying. Alas, this problem gets complicated the more one moves away from simple absolutes like "no mutation" and "copy" to "immutable sometimes" and "copy mostly" :)

-----

1 point by almkglor 6456 days ago | link

> Have directives for copy vs. share, and if sharing is permitted, then mutation is not, etc. Of course, that clutters the vocabulary and leads to extra typing.

Then User A writes Process A to use sharing and User B writes Process B to use copying. Process A sends an object to Process B. ^^

-----

1 point by absz 6457 days ago | link

What about implementing copy-on-write semantics? That would preserve the efficiency of sharing while allowing mutation. Since Arc only has two mutation primitives (set and lset), modifying those would be necessary and sufficient. You would have to have every object on the heap know how many threads referenced it, and if there was more than one, set/lset would copy and then mutate as opposed to simply mutating in place. (Disclaimer: I am not an expert.)
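
Very roughly, I'm picturing something like this (a toy sketch in PLT 4 Scheme rather than the real VM; all the names here are made up):

  (define-struct obj (owners value) #:mutable)  ; owners = threads that can see it

  (define (send-to-thread! o)            ; sending marks the object as shared
    (set-obj-owners! o (+ (obj-owners o) 1))
    o)

  (define (obj-write! o new-value)       ; what set/lset would do underneath
    (if (> (obj-owners o) 1)
        (make-obj 1 new-value)           ; shared: copy, mutate the copy;
                                         ; caller must repoint its reference
        (begin (set-obj-value! o new-value) o)))  ; sole owner: mutate in place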

If I recall correctly, Termite's ports are essentially processes, which is what allows them to be passed around.

-----

3 points by almkglor 6457 days ago | link

scar, scdr, sref....

My problem is with this: http://arclanguage.com/item?id=7066

To summarize: Suppose Process A has an object a. It sends this object to process B. This object is a copy, a' (conceptually, at the language level). Now Process A sends object a to process B again. This object is a second copy, a'':

  (def process-A (process-B a)
    (==> process-B a)
    (==> process-B a))

  (def process-B ()
    (givens a
              (<==
                any any)
            a2
              (<==
                any any)
      (if (is a a2)
          (err "I somehow received the same object twice!"))))
So I also need copy-on-write and copy-on-message-pass-if-receiver-already-has-a-copy.

Edit: It is important for them to be separate because process B expects two separate objects, and it might mutate one without expecting the other to be mutated.

There was a solution in my old shared concept, but it involved an O(M log N) search, where M is the message size in number of objects and N is the number of shared objects the receiver has access to, which I decided was simply suboptimal compared to an O(M) copy (where M is number of bytes in the message, but still...).

Edit2: re: passing stuff around. My intent is to not integrate the actual passing-around-stuff-via-network server in the C++-side code; instead, the VM will provide access to plain sockets, and an Arc-side server would be defined, with its semantics standardized (but not its implementation, so that you can, say, implement a "more secure" server, or use some other protocol, etc.).

Then, if a server receives a serialized PID over its connection from another computer, it checks a table matching the PID's computer and id number. If the table fails to match, it launches a "reflector process" which simply sends every message it receives to the server, for serialization, and inserts that into the table. The PID of the reflector process is then used as the message sent by the server to the destination process. All PIDs then get this translation (obviously the table has to be bidirectional, so that a PID can be matched to the serialization of the PID).

Obviously not only processes would require reflectors, but also ports if we want to pass them around ^^.

-----

2 points by shader 6456 days ago | link

Interesting problem. I bet this is why the Erlang folks decided to use mutation-less, copy-everything, so that they didn't have to carefully think through every possible case. Maybe you could write your VM to automatically copy everything, and then have a means of coding exceptions. So, if you find that a certain specific case can share, you can add that to the list of exceptions. Otherwise, it copies, so you don't have to worry about unexpected errors.

And you should probably allow the user to require copying if they need to, so that there aren't any surprises.

As an aside, you mentioned an O(M log N) search to determine whether to share or not. Now maybe I don't understand, but you could use a Bloom filter to determine if the object is present on the other process. The only shortcoming is the false positives, which mean you would copy unnecessarily, but they happen so infrequently that it probably isn't an issue. I believe Bloom filters were mentioned on Hacker News yesterday. The link on Wikipedia is: http://en.wikipedia.org/wiki/Bloom_filter

Using a Bloom filter, you'd be able to determine in constant time, with low memory usage, whether an item is in a set. In this case the items are shared objects, and the set is the objects owned by the target process. Now, maybe I don't understand what you're trying to do, but you can clarify that shortly.
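
A sketch of what I mean (untested mzscheme, reusing the derived-hashes idea from my earlier comment for the k indices; a real implementation would pack the bits instead of using a vector of 0s and 1s):

  (define (make-bloom m) (make-vector m 0))

  (define (bloom-add! filter val k)      ; run when a process receives an object
    (for-each (lambda (i) (vector-set! filter i 1))
              (derived-hashes val k (vector-length filter))))  ; as sketched earlier

  (define (bloom-member? filter val k)   ; false positives possible, false negatives not
    (andmap (lambda (i) (= 1 (vector-ref filter i)))
            (derived-hashes val k (vector-length filter))))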

How much are you implementing on the C++ side? Couldn't you get a few basic necessities built first, and then just abstract towards usability in Arc? I don't know exactly what you're doing, but I don't see why you have to make everything at once.

Maybe we should move this to another thread? :)

-----

1 point by almkglor 6456 days ago | link

http://arclanguage.com/item?id=7142

-----

1 point by absz 6456 days ago | link

That's a good point. But what about having copy-on-write-if-more-than-one-logical-copy semantics? Every Arc object on the heap would have a shared flag. If you send an object to another thread, the object is shared, and it has its internal shared field set to true (or 1, if this is C). And whenever you try to set (or lset, or scar, or scdr, or sref; good catch) an object whose shared flag is true (1), then you copy it first, modify it, set its shared flag to false (0), and then make the reference through which you modified it point instead to this new object.

Except this doesn't take any shared structure into account: as soon as you send something over the network, all the objects which share it are fragmented. So you need some sort of "versioning." Perhaps you instead have every reference carry a version flag, and every copy between threads increments the version for the new copy; modifying something checks the reference you are modifying it through, makes a new copy if necessary for that version, and then modifies it. The problem is that I can't see a good way to do versioned indexing, especially if references are really just raw pointers.

Though note that if you do come up with a working "copy-on-write-and-other-things" scheme, the Arc copy function becomes essentially

  (def copy (arg)
    (==> (current-pid) arg)
    (<== (current-pid) any))

.

-----

2 points by almkglor 6456 days ago | link

> Then every Arc object on the heap would have a shared flag....

This was indeed my original idea: http://arclanguage.com/item?id=7037

Incidentally, 'lset is very, very hard to implement if closures are not very mutable, as in the arc2c output.

> and then make the reference through which you modified it point instead to this new object.

  (def hmm (x)
    (let y (cdr x)
      (scar y 42))) ; is x mutated? the reference I used here was *just* y...
> the Arc copy function becomes essentially (...)

Yes, this is true, although I'd prefer to add some sort of 'uniq tag so that other pending messages don't accidentally get taken ('<== does pattern matching, obviously implemented via my p-m macro) ^^. It will even correctly handle recursive and shared structure, which I think the current 'copy does not ^^.

  (def copy (arg)
    (w/uniq tag
      (==> (mypid) (list tag arg))
      (<==
        (,tag cpy) cpy)))
Edit:

> That's a good point. But what about having copy-on-write-if-more-than-one-logical-copy semantics? Then every Arc object on the heap would have a shared flag. Then if you send an object to another thread, the object is shared, and it has its internal shared field set to true (or 1, if this is C).

The problem here is: what if the object to be sent is already shared and the receiver already has it? Say:

  (def process-A ()
    (let v (list 42)
       (==> B v)
       (==> C v)))

  (def process-B ()
    (<==
      any  (==> C any)))

  (def process-C ()
    (givens v1 (<== any any)
            v2 (<== any any)
      (if (is v1 v2)
          (err "I somehow received the same object twice!"))))

-----

2 points by shader 6456 days ago | link

I don't think you have to worry about versioning. Since sharing is, as far as I can tell, just there to save time and bandwidth over copying, conceptually all messages are copies. This means that you shouldn't be programming as if you can update other processes just by changing a shared object. They would have to detect the change and copy it again anyway, if the other owning processes were remote.

Instead, you should only transfer information over messages. That way you don't need to worry about keeping shared objects up to date for all users.

Good idea about the shared flag; maybe it should be a ref counter? That way you can tell if the other process(es) have stopped using it. So, if the ref count is 1, then just straight up mutate, otherwise, copy?

Also, couldn't the send function be:

  (-> pid message)
? It would save a char ;) Of course, you probably have a reason for not using that.

-----

2 points by absz 6456 days ago | link

You need versioning in a situation like the following:

  (givens str1 "message"        ; 1
          str2 str1             ; 2
    (do                         ; 3
      (-> another-process str1) ; 4
      (= (str1 0) #\M)          ; 5
      (prn str1)                ; 6
      (prn str2)))              ; 7
Let's assume you just have the shared flag. First, line 4 will set the shared flag to true. Then, on line 5, we mutate str1. We check the shared flag of the value of str1. OK, it's true, so we make a copy of the value and mutate it. Then we print out our values on lines 6 and 7:

  Message
  message
The problem is that the copy separates the string not only from its copies in other threads, but also from its aliases in the thread it comes from.

Versioning would mean that str1 and str2 would both be pointers to "version 0" of some value, which had an internal max-version of 0. Line 4 sends the object over, so max-version is incremented (to 1), and the other process is sent a pointer to "version 1" of the same object (1 because that's the new max-version). However, since no modifications have been made, somehow (I don't know how) both "version 0" and "version 1" resolve to the same underlying object. Then, on line 5, sref is called on "version 0" of this object, so the object is copied: "version 0" is copied away and modified, but "version 1" is left alone. When we print the objects, their reference to "version 0" makes them look at this new, copied object, so we see

  Message
  Message
The problems here are that (1) I don't know how we'd do versioned addressing, and (2) it's not clear what we do with the copy of "version 0". It seems like you would have a structure which had a list of the available values, and some sort of mapping from versions to values, e.g.

  ; Available values
  A --> "a string or another object"
  B --> "A string or another object"
  C --> "A string."
  
  ; Version --> value mapping
  0 --> [A]
  1 --> [A]
  2 --> [B]
  3 --> [A]
  4 --> [C]
Then, when we modify 0 to, say, "Another object...", we would add D --> "Another object..." to the list of values and change 0 to 0 --> [D].

Hmm, thinking about the underlying structure of this and writing it out makes me think this makes less and less sense, and that either (1) this is actually trivial/a restatement of another scheme/both and thus won't work, or, if I'm lucky, that (2) there's some more general and less clunky scheme hiding inside it (hey, I can dream, right?). My money's on (1) right now, but I do feel like I understand the problem better.

Oh, and I like your suggestion of -> / <-. Nice and short.

EDIT: Almkglor, is any of the code for your VM available for perusal/editing? School's gotten out and I've gotten interested, so I might want to work on it (if I can... I am not all that experienced, after all, so it might be beyond me; regardless, I'd be interested in reading it).

-----

2 points by shader 6456 days ago | link

Maybe you could get versioning to work by making a real alias as opposed to just a copied reference. If somehow (I'm getting really far-fetched here) expansion of macros, etc., could happen in any position instead of just application, you could have the alias expand to the other variable name. Then you wouldn't have to worry about messing with your aliases ;)

So instead of merely 'evaluating' identifiers, you'd 'expand' them, and variables would expand to their values, etc. You could have better aliases and neater macros. Of course, I've probably misunderstood something horribly, but that's why you guys are here.

-----

1 point by absz 6456 days ago | link

The problem is that macro expansion happens at "compile" time, so local variables don't have their values yet. But you reminded me of something in arc2c: to quote the "NOTES" file, "...if a local variable is ever mutated, a 'sharedvar' C-struct is used to represent that variable. These are constructed by %sharedvar primitives, and are accessed by %sharedvar-write and %sharedvar-read primitives."

So, since I believe almkglor said he was planning to use arc2c for this VM, only changing the code generation step, the solution seems to be to just use the shared flag on values (probably the reference-counting version, as shader suggested). It seems like since we have sharedvars, the problem solves itself: when you send an object to another thread, you don't send the sharedvar, you send the value the sharedvar is wrapping. Then, when you modify something: if it's a sent copy, that copy is modified, not affecting the original (because of the shared flag); if it's a local with aliases, then it's modified through the sharedvar, and since the sharedvar is still aliased and only its internals are modified, there's no problem!

Well, maybe---that assumes that I actually understand sharedvars well enough. Almkglor? Stefano (since you suggested/invented them)?

-----

1 point by almkglor 6456 days ago | link

> It seems like since we have sharedvars, the problem solves itself: when you send an object to another thread, you don't send the sharedvar, you send the value the sharedvar is wrapping.

You never actually send sharedvars - they are not accessible at the Arc level. However, they are necessary in order to send functions across processes; the generated code expects to see sharedvars, so it has to have (a copy of) sharedvars.

Edit:

> ...if a local variable is ever mutated, a 'sharedvar' C-struct is used to represent that variable. These are constructed by %sharedvar primitives, and are accessed by %sharedvar-write and %sharedvar-read primitives.

This means this case:

  (let foo 1
    (set foo 42))
Importantly, this does not use sharedvars!

  (let foo "hello"
    (sref foo #\H 0))

-----

2 points by shader 6456 days ago | link

So, is that the only situation where versioning is required? Updating local aliases? If so, then there should be some workaround, or we could just forget about it and consider it a "feature" ;)

Also, I agree about the whole perusal / editing bit. I'm probably even less experienced than absz, but no less interested.

And I again recommend moving to a new thread. The window's getting a bit cramped, and we aren't even talking on the original subject of the post.

Is there a news.yc / forum equivalent for Anarki?

-----

2 points by almkglor 6456 days ago | link

> Is there a news.yc / forum equivalent for Anarki?

You're using it ^^

-----

1 point by almkglor 6456 days ago | link

> Almkglor, is any of the code for your VM available for perusal/editing?

Not yet; the only code I've started writing is the memory management code. There are two versions, one which uses a mark-and-sweep for local heaps and has shared heaps implemented via boost::shared_ptr, and my newer version which uses stop-and-copy and does (will do?) copy-on-message-pass.

I can send you (and anyone interested) some initial tarballs though ^^. WARNING: C++. With RAII. Be scared.

My solution for the multiple-version thing was to simply have the thread copy all its shared objects and drop all references to the shared ones.

Then I began to feel uncomfortable with it and decided that just copying the darn thing might be better after all ^^

My main reason for using ==> and <== was to make them stand out a bit ^^. -> and <- just look so skinny. Although it's not carved into stone yet.

I already did a mock-up for this in plain Anarki-on-mzscheme, which is the main reason why Anarki suddenly got semaphores and other lock primitives suddenly a month back ^^

-----