That's a good point. But what about having copy-on-write-if-more-than-one-logical-copy semantics? Then every Arc object on the heap would have a shared flag. Then if you send an object to another thread, the object is shared, and it has its internal shared field set to true (or 1, if this is C). And whenever you try to set (or lset, or scar, or scdr, or sref---good catch) an object whose shared is true (1), then you copy it first, modify it, set its shared flag to false (0), and then make the reference through which you modified it point instead to this new object.
Except this doesn't take any shared structure into account---as soon as you send something over the network, all the objects which share it are fragmented. So you need some sort of "versioning." Perhaps you instead have every reference have a version flag, and every copy between threads increments version for the new copy; modifying something checks the reference you are modifying it through, makes a new copy if necessary for that version, and then modifies it. The problem is that I can't see a good way to do versioned indexing, especially if references are really just raw pointers.
Though note that if you do come up with a working "copy-on-write-and-other-things" scheme, the Arc copy function becomes essentially
Incidentally, 'lset is very, very hard to implement if closures are not very mutable, as in the arc2c output.
> and then make the reference through which you modified it point instead to this new object.
(def hmm (x)
(let y (cdr x)
(scar y 42))) ; is x mutated? the reference I used here was *just* y...
> the Arc copy function becomes essentially (...)
Yes, this is true, although I'd prefer to add some sort of 'uniq tag so that other pending messages don't accidentally get taken ('<== does pattern matchimg, obviously implemented via my p-m macro) ^^. It will even correctly handle recursive and shared structure, which I think the current 'copy does not ^^.
(def copy (arg)
(w/uniq tag
(==> (mypid) (list tag arg))
(<==
(,tag cpy) cpy)))
Edit:
> That's a good point. But what about having copy-on-write-if-more-than-one-logical-copy semantics? Then every Arc object on the heap would have a shared flag. Then if you send an object to another thread, the object is shared, and it has its internal shared field set to true (or 1, if this is C).
The problem here is: what if the object to be sent is already shared and the receiver already has it? Say:
(def process-A ()
(let v (list 42)
(==> B v)
(==> C v)))
(def process-B ()
(<==
any (==> C any)))
(def process-C ()
(givens v1 (<== any any)
v2 (<== any any)
(if (is v1 v2)
(err "I somehow received the same object twice!"))))
I don't think you have to worry about versioning. Because sharing is, as far as I can tell, just to save time and bandwidth over copying, conceptually all messages are copies. This means that you shouldn't be programming as if you can update other processes just by changing a shared object. They would have to detect the change and copy it again anyway, if the other owning processes were remote.
Instead, you should only transfer information over messages. That way you don't need to worry about keeping shared objects up to date for all users.
Good idea about the shared flag; maybe it should be a ref counter? That way you can tell if the other process(es) have stopped using it. So, if the ref count is 1, than just straight up mutate, otherwise, copy?
Also, couldn't the send function be:
(-> pid message)
? It would save a char ;) Of course, you probably have a reason for not using that.
Let's assume you just have the shared flag. First, line 4 will set the shared flag to true. Then, on line 5, we mutate str1. We check the shared flag of the value of str1. OK, it's true, so we make a copy of the value and mutate it. Then we print out our values on lines 6 and 7:
Message
message
The problem is that the copy separates the string not only from its copies in other threads, but also its aliases in the thread it comes from.
Versioning would mean that str1 and str2 would both be pointers to "version 0" of some value, which had an internal max-version of 0. Line 4 sends the object over, so max-version is incremented (to 1), and the other process is sent a pointer to "version 1" of the same object (1 because that's the new max-version). However, since no modifications have been made, somehow (I don't know how) both "version 0" and "version 1" resolve to the same underlying object. Then, on line 5, sref is called on "version 0" of this object, so the object is copied: "version 0" is copied away and modified, but "version 1" is left alone. When we print the objects, their reference to "version 0" makes them look at this new, copied objects, so we see
Message
Message
The problems here are that (1) I don't know how we'd do versioned addressing, and (2) what we do with the copy of "version 0." It seems like you would have a structure which had a list of the available values, and some sort of mapping from versions to values, e.g.
; Available values
A --> "a string or another object"
B --> "A string or another object"
C --> "A string."
; Version --> value mapping
0 --> [A]
1 --> [A]
2 --> [B]
3 --> [A]
4 --> [C]
Then, when we modify 0, to, say, "Another object...", we would add D --> "Another object..." to the list of values and change 0 to 0 --> [D].
Hmm, thinking about the underlying structure of this and writing it out makes me think this makes less and less sense, and that either (1) this is actually trivial/a restatement of another scheme/both and thus won't work, or, if I'm lucky, that (2) there's some more general and less clunky scheme hiding inside it (hey, I can dream, right?). My money's on (1) right now, but I do feel like I understand the problem better.
Oh, and I like your suggestion of -> / <-. Nice and short.
EDIT: Almkglor, is any of the code for your VM available for perusal/editing? School's gotten out and I've gotten interested, so I might be interested in working on it (if I can... I am not all that experienced, after all, so it might be beyond me; regardless, I'd be interested in reading it).
Maybe you could get versioning to work by making a real alias as opposed to just a copied reference. If somehow (i'm getting really far fetched here) expansion of macros, etc, could happen in any position instead of just application, you could have the alias expand to the other variable name. Then, you wouldn't have to worry about messing with your aliases ;)
So instead of merely 'evaluating' identifiers, you 'expanded' them, and variables expanded to their values, etc. you could have better aliases, and neater macros. Of course, I've probably misunderstood something horribly, but that's why you guys are here.
The problem is that macro expansion happens at "compile" time, so local variables don't have their values yet. But you reminded me of something in arc2c: to quote the "NOTES" file, "...if a local variable is ever mutated, a 'sharedvar' C-struct is used to represent that variable. These are constructed by %sharedvar primitives, and are accessed by %sharedvar-write and %sharedvar-read primitives."
So, since I believe almkglor said he was planning to use arc2c for this VM, only changing the code generation step, then the solution seems to be to just use the shared flag on values (probably the reference-counting version, as shader suggested). It seems like since we have sharedvars, the problem solves itself: when you send an object to another thread, you don't send the sharedvar, you send the value the sharedvar is wrapping. Then, when you modify something, then if it's a sent copy, that copy is modified, not affecting the original (because of the shared flag); if it's a local with aliases, then it's modified through the sharedvar, and since the sharedvar is still aliased and only its internals are modified, there's no problem!
Well, maybe---that assumes that I actually understand sharedvars well enough. Almkglor? Stefano (since you suggested/invented them)?
> It seems like since we have sharedvars, the problem solves itself: when you send an object to another thread, you don't send the sharedvar, you send the value the sharedvar is wrapping.
You never actually send sharedvars - they are not accessible at the Arc level. However, they are necessary in order to send functions across processes; the generated code expects to see sharedvars, so it has to have (a copy of) sharedvars.
Edit:
> ...if a local variable is ever mutated, a 'sharedvar' C-struct is used to represent that variable. These are constructed by %sharedvar primitives, and are accessed by %sharedvar-write and %sharedvar-read primitives.
So, is that the only situation versioning is required? Updating local aliases? If so then there should be some work around, or we could just forget about it and consider it a "feature" ;)
Also, I agree about the whole perusal / editing bit. I'm probably even less experienced than absz, but no less interested.
And I again recommend moving to a new thread. The window's getting a bit cramped, and we aren't even talking on the original subject of the post.
> Almkglor, is any of the code for your VM available for perusal/editing?
Not yet, the only code I've started writing on is the memory management code. There are two versions, one which uses a mark-and-sweep for local heaps and has shared heaps implemented via boost::shared_ptr, and my newer version which uses stop-and-copy and does (will do?) copy-on-message-pass.
I can send you (and anyone interested) some initial tarballs though ^^. WARNING: C++. With RAII. Be scared.
My solution for the multiple-version thing was to simply have the thread copy all its shared objects and drop all references to shared.
Then I began to feel uncomfortable with it and decided that just copying the darn thing might be better after all ^^
My main reason for using ==> and <== was to make them stand out a bit ^^. -> and <- just look so skinny. Although it's not carved into stone yet.
I already did a mock-up for this in plain Anarki-on-mzscheme, which is the main reason why Anarki suddenly got semaphores and other lock primitives suddenly a month back ^^