Arc Forum
3 points by waterhouse 5145 days ago

I've had a couple of ideas...

1. Create a special form that rebinds 'assign (which is the assignment operator that '= and pretty much everything else expand into--except not things like 'scar or the equivalent (= (car x) ...)), so that instead of affecting global variables, all '= and 'def forms evaluated within this special form will modify values within a hash table. This special form can return the hash table. You'd put (load "file.arc") inside that special form.

This will probably have to be done by the Arc compiler (or something that duplicates its functionality), because global references to preexisting variables will need to be compiled normally, while global references to variables within the module will need to be compiled into references to the hash-table (or perhaps to variables in a local environment that spans the whole module), and references to local variables will need to be compiled normally. I think this will be somewhat hard to implement, but if done, then it could do exactly what hasenj asks for.

2. A hacky idea that seems like it'll work... very well, in fact. It has several caveats, which I'll get into later. Here are the pieces of the idea.

A. Scheme (the PLT Scheme that Arc runs on) has a built-in (namespace-mapped-symbols) function. It returns a list of all symbols that are bound in the current namespace.

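B. We can use this to extract a list of all Arc variables. (Arc globals show up in the Scheme namespace with a "_" prefix, which is what the code below filters on and strips.)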

  (def arc-namespace ()
    (map [sym:cut string._ 1]
         (keep [is string._.0 #\_]
               ($.namespace-mapped-symbols))))

C. Now, we're going to find out which global variables the module file creates, and also which ones it modifies; there may be some definition overlaps (that case is what such a module system is meant to deal with). So the plan is: store the entire Arc environment (i.e. make an assoc-list or hash-table of symbol-value pairs), load the file, find out which variables were modified or created, and restore the old environment. Then create a massive local environment in which all the newly created/modified variables are initialized to nil, load the file in that local environment, and do what we want with the local variables that now hold the file's definitions (like stuff them in a hash table).
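
The "find out what was modified or created" step could be sketched as a diff of two snapshots. This is only an illustration, not the code from my implementation: 'changed-vars is a made-up name, and the snapshots are assumed to be assoc-lists of (name value) pairs taken before and after loading the file.

```arc
; Hypothetical helper: names in `after` that are new, or whose value
; is no longer 'is-identical to the one recorded in `before`.
(def changed-vars (before after)
  (keep (fn (name)
          (let old (assoc name before)
            (or (no old)
                (isnt (cadr old) (alref after name)))))
        (map car after)))

(changed-vars '((a 1) (b 2))
              '((a 1) (b 3) (c nil)))
```

With those two toy snapshots, b (changed) and c (new) should be reported and a should not. Since the comparison is done with 'is, a variable reassigned to an identical value will not be noticed.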

Details. First, note that 'eval doesn't respect local variables, and I think it kinda has to be that way, so the way we should load the file is like this:

  (eval `(with (newvar nil newvar2 nil modvar nil ...)
            ,@(readfile "stuff.arc")
            (let gtb (table)
              (= gtb!newvar newvar
                 gtb!newvar2 newvar2
                 ...)
              gtb)))

The stuff involving 'newvar will, of course, be macro-generated as well. This may be a complicated beast of a macro, but quite doable.
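
Here is a rough sketch of how that form could be generated, given the list of affected variables. It is a simplification and not my actual implementation: 'module-form, 'sq2 and 'cube2 are made-up names, the body is passed in as a list of expressions rather than read from a file, and macros defined in the module are not handled at all.

```arc
; Hypothetical generator for the (with ...) form shown above.
; vars: symbols to localize.  body: list of toplevel expressions.
(def module-form (vars body)
  (w/uniq gtb
    `(with ,(mappend [list _ nil] vars)     ; every var starts out nil
       ,@body                               ; defs now assign the locals
       (let ,gtb (table)
         ,@(map (fn (v) `(= (,gtb ',v) ,v)) vars)
         ,gtb))))

(= m (eval (module-form '(sq2 cube2)
                        '((def sq2 (x) (* x x))
                          (def cube2 (x) (* x (sq2 x)))))))

((m 'cube2) 3)
```

The call at the end should give 27, and 'sq2 and 'cube2 should stay unbound globally, because the 'def forms end up assigning the lexical variables created by 'with. Any variable whose final value is nil won't show up in the table, for the reason given in (b) below.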

Less importantly, note: a) I don't think Arc has something like (let u 'ach (setf (symbol-value u) desired-value)). This is easily patchable with something like (let u 'ach (($ namespace-set-variable-value!) ($.ac-global-name u) desired-value)). You can also do (eval `(= ,u ',desired-value)) but that sucks in a few ways and it really should be put into the language. (Note that $.[name-containing-bang!] breaks due to ssyntax, so you need to put it in parentheses form.)

b) Currently, hash tables can't contain nil; with table tb, (= tb!x nil) deletes the value associated with x. This is worked around in the definition of 'defmemo by creating a "nilcache" table. Not sure if that's the best way to handle it. For the moment, I'd probably use an assoc-list--and note that 'alref doesn't allow an optional "fail-value" argument, so I'd use 'assoc and deal with it.

c) I don't think Arc has a way to undefine variables. Again, this is easily patchable: (($ namespace-undefine-variable!) ($.ac-global-name var)).
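
For point (b), this is roughly what I mean by using 'assoc and dealing with it. A minimal sketch; 'al-lookup is a made-up name, not something in Arc.

```arc
; Hypothetical helper: like 'alref, but returns `fail` when the key is
; absent, so a key bound to nil can be told apart from a missing key.
(def al-lookup (al key (o fail))
  (aif (assoc key al)
       (cadr it)
       fail))

(= vars '((x nil) (y 2)))
(al-lookup vars 'x 'missing)   ; x is present and bound to nil
(al-lookup vars 'z 'missing)   ; z is absent, so we get the fail value
```

The trick is that 'assoc returns the whole (key value) pair, which is non-nil even when the value is nil.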

...And here is an implementation, along with a demonstration of it in action.

http://pastebin.com/qJRepGcM

(I modified it from the above to respect macro definitions within the file. Interestingly, this leads me to load a file three times: once to extract changed variables, once to get macros, once to evaluate all definitions. I've heard of "load, compile, and run". Maybe this is like that.)

We see that creating the module in a local environment containing all the variables makes each one independent. You can just extract 'ppr, and its lexical environment will contain all its dependencies; you can throw away the rest of the hash table, and your namespace is totally unpolluted; additionally, there is no cost of doing hash table lookups (in fact, they're lexical-variable lookups) or of figuring out how to make the compiler optimize them out.

Also, take note of this: Suppose a module contains a huge amount of stuff and all you want is one little function that depends on just a couple of things. If we were, say, going to save the Arc process into a self-contained executable and distribute it, we wouldn't want to stick it with 40 MB of mostly unnecessary libraries (I'm looking at you, SBCL). Well, guess what happens if you use this local-environment method and extract only the function you want: All the unnecessary crap will have nothing referencing it and will get garbage-collected. This is so good. I believe it is the perfect solution. (Returning a hash-table and extracting the desired functions and throwing away the table also has this effect.)

Caveats. First, there are fundamental issues with this design. All toplevel expressions will be evaluated more than once (three times currently; two is easily possible, but not one without compiler power); if you want the module to do any multiple-evaluation-sensitive initialization, put that inside a function named 'initialize or similar, and the person loading the module should call (initialize) afterwards. Also, your module shouldn't generate new names when loaded multiple times. And it shouldn't contain any functions (including 'initialize) that do global assignment to variables that don't exist yet. So if you want it to do (= counter* 0) within a function, you should also do a toplevel (= counter* 'nonce), so that the variable gets noticed and localized when the file is loaded. Finally, don't even think about using macros from a module (though the definitions in the module can themselves contain macros). Macros will be put into the table, and you can extract their functions with 'rep and put them into your own macros, but if their expansions are supposed to reference things from the module, that seems impossible without at least code-walking power.

Second, there are some issues that can be fixed with some surface modifications of the design. If you want to load a module and then modify anything inside it (global functions or variables), that currently will have no effect unless the module provides specialized functions to do so (the functions depend on what's in the lexical environment of the module, which you can't touch by modifying the hash table). However, it is trivial to make 'load-module insert 'get-var and 'set-var functions into the module that will get and set the lexical variables. A user of the module will have to use (ns!set-var 'x val) instead of (= ns!x val).
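
To make the 'get-var / 'set-var idea concrete, here is a hand-written stand-in for what 'load-module could generate. It is only a sketch: 'ns, 'counter and 'bump are made-up names, and a real version would generate one 'case clause per module variable.

```arc
(= ns
   (let counter 0                    ; stands in for a module-level variable
     (let tb (table)
       (= tb!bump    (fn () (++ counter))
          tb!get-var (fn (name) (case name counter counter))
          tb!set-var (fn (name val)
                       (case name counter (= counter val))))
       tb)))

(ns!bump)                  ; counter becomes 1
(ns!set-var 'counter 10)   ; reaches into the module's lexical environment
(ns!bump)                  ; counter becomes 11
```

All three functions close over the same lexical 'counter, so a change made through 'set-var is seen by 'bump, which (= ns!counter 10) could not achieve.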

Issues with my implementation: It uses 'is to determine what variables were changed. If, say, your code goes (= thing* nil) and the module goes (= thing* nil), my implementation won't notice the difference and won't localize it. Likewise for (= counter* 0) and (= name* "Bob"). This could be worked around by, say, using something like "defvar" to define global variables (not to be confused with the defvar from Common Lisp, which creates special variables with dynamic scope), where 'defvar would record everything it does in a table that 'affected-vars could access. (For helpfulness, 'load-module could give warnings when attempting to load a file containing toplevel '= things.) Or: We could modify the body of '= to contain a clause like

  (if we-are-loading-a-module*
      `(do (= (things-changed* ',var) t)
           ,(proceed-as-usual))
      (proceed-as-usual))

Then 'affected-vars could set we-are-loading-a-module* to t, do its work, and set it back to nil. Then it could work perfectly (assuming one doesn't do global assignment with 'assign).
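
The "defvar"-style alternative mentioned above could look something like this. A minimal sketch, under the assumption that modules would use it for their toplevel variables; 'def-tracked is a made-up name, and things-changed* is the same table as in the clause above.

```arc
(= things-changed* (table))

; Hypothetical macro: assign a global and record its name, so the
; change is noticed even if the new value is 'is-identical to the old.
(mac def-tracked (name val)
  `(do (= (things-changed* ',name) t)
       (= ,name ,val)))

(def-tracked thing* nil)
(keys things-changed*)
```

After this, thing* should be listed among the keys of things-changed* even though its value is nil, which is exactly the case the 'is comparison misses.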

Also, btw, it doesn't reset the 'sig function signature table. Easily fixable.

With all these issues in mind, here is my evaluation of my implementation:

1. If your module just defines functions (possibly using macros to define them) and possibly hash tables, then it is flawless and requires no boilerplate.

2. If your module creates global variables with initial values that 'is could think are identical to those already created (integers, nil, strings, symbols), there's a small chance that bad things will happen (if your code and the module's code happen to initialize the same global variable to the same value). This could be fixed in a few different ways.

3. If you want to load a module and then modify and use its global variables, then, with a fix I have described above, you can do that with a little bit of boilerplate.

4. If your module does weird stuff like modifying different variables the second time it's loaded, or performing structural modifications (= (car ...) ...) on existing global variables or the universe (writing to files), this will die badly, although in non-weird cases it could be fixed with the minor boilerplate of putting stuff in an 'initialize function.

5. If you want to export macros... bad. Sorry. Note, however, that a certain class of macros can be covered by creating functions that accept thunks, and then the person using the module can write a macro on top of it. (See 'on-err as an example; imported from Scheme.)
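
To illustrate the thunk workaround from point 5: the module exports an ordinary function that takes a thunk, and the user wraps it in a macro on their own side. Just a sketch; 'm stands in for a table returned by 'load-module, and 'call-w/banner and 'w/banner are made-up names.

```arc
; Stand-in for a loaded module: a table holding one exported function.
(= m (table))
(= m!call-w/banner
   (fn (thunk)
     (prn "-- begin --")
     (let result (thunk)
       (prn "-- end --")
       result)))

; Written by the user of the module, not exported by it.
(mac w/banner body
  `((m 'call-w/banner) (fn () ,@body)))

(w/banner (+ 1 2))
```

The last form should print the two banner lines around the body and return 3. The macro's expansion only refers to the user's own global 'm, so nothing inside the module needs to be visible by name.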

Summary: I agree with hasenj that returning a hash-table containing all variables in a module would be nice. How to do this? Either you could build something on top of the Arc compiler, which I expect to be hard, but it would let your implementation be perfect; or you could do something like my 'load-module system. My system is easy to implement (I did it just now, though it is the result of a lot of thought plus a bit of work), and it works pretty well (it works perfectly in the first large class of cases, almost-perfectly and fixably in the second, fails in the third but is easily fixable, and it works somewhat in the fourth and fifth cases).