1 point by Pauan 5137 days ago | link | parent

rocketnia brought up an excellent point, that sometimes a module will want to share state with all modules. For instance, consider a module that defines a coerce* table, and overwrites the built-in function coerce.

When you want your custom data-type to be coercible, you can modify the global coerce* table, and voila. The problem is that this module system isolates modules from each other a bit too well. My solution is to allow for changing the built-ins directly. This means that all future calls to (new-namespace) will contain the changes.

Of course, this gives modules a lot of power, since they can change how all modules loaded after them behave. Imagine a module that overwrites eval, for instance. With this module system, such changes would be isolated to the module, so everything is fine. But if it's possible to overwrite the built-ins directly, then one module could break things in other modules, even if they don't explicitly allow it.

Then again, that kind of raw power seems to me to be in the spirit of Arc, so I'm going to try that approach. My suggestion, however, is to use it sparingly: only when your module really needs to persist and be accessible even to modules that haven't imported you.

So... one question: how should you define built-ins? Should there be a special global built-ins* that you can write to directly? Should it be a function call?



1 point by Pauan 5136 days ago | link

I gave this some more thought.

My worry with something like built-ins* is that it gives the programmer a lot of power to break stuff. Not just in their module, but in all modules! But then I realized that any module can just use (system "rm -rf /") and break far more.

Modules are child-proof in the sense that they're protected from other modules, but the built-in functions like `system` are not safe. So, really, you should only import modules that you have verified, or that have been verified by somebody you trust, etc. [1]

Don't rely on the module system to protect you from other people's bad code. The module system exists to make programming easier, better, and more flexible; not necessarily safer. [1]

What this means is that the whole "built-ins is dangerous" thing is a non-issue. Regardless of how dangerous it is, there are already far more destructive things in Arc, so I might as well include built-ins* and give people the flexibility to do stuff.

---

With that out of the way, here's my current plan:

Provide a global __built-ins* variable that contains the built-in functions. Using (new-namespace) creates an empty namespace that inherits from __built-ins*. [2]

This means that if you change __built-ins* it will change all modules, even modules that were loaded before yours. This completely destroys the conceptual model of modules being distinct and isolated from each other.
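Sketched in Python (the names here are made up; PyArc's internals may differ), inheritance rather than copying is what makes a later write to __built-ins* visible even to modules loaded earlier:

```python
from collections import ChainMap

# Hypothetical sketch of the inheritance model described above.
built_ins = {"coerce": "core-coerce"}

def new_namespace():
    # an empty namespace that *inherits* from the built-ins
    return ChainMap({}, built_ins)

mod = new_namespace()               # module loaded before the change
built_ins["my-util"] = "shared"     # someone writes to __built-ins*
print(mod["my-util"])               # prints "shared": old module sees it
```

Contrast this with a shallow copy, where only namespaces created after the write would see it.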

Thus, the name is intentionally ugly. You shouldn't be mucking around with __built-ins* unless you need to, but it's available just in case you do need it. Though... do you think the two underscores are too Pythony?

This has some interesting implications. Aside from the obvious ones (being able to create a persistent function/table/whatever that is available to all modules), it also means you can load any module into the built-ins:

  (load "foo.arc" __built-ins*)
So for instance, the Arc core is implemented in arc.arc, but there are also some libraries like strings.arc, code.arc, etc. For now, the plan is for PyArc to only load arc.arc, but we'll see.

In any case, let's say you wanted it to behave as if PyArc had loaded both arc.arc and strings.arc. You could use this:

  (load "strings.arc" __built-ins*)
Voila. Now the functions defined in strings.arc are available as globals in every module, including yours. Loading into __built-ins* is the only way to do this: if you used `load` without a second argument, it would load them into your module, but other modules wouldn't have access to them (unless they loaded it too).

[1]: partially invalidated; see below

[2]: not anymore; see below

---

Oh, and here's my plan for handling stuff like (system) etc.

There should be a blacklist (or a whitelist?) of "safe" and "unsafe" functions. With "safe" being defined as "any changes made only affect the module, and do not affect other modules, or the system as a whole."

Obviously things like (system) or __built-ins* are unsafe, etc.
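As a rough Python sketch (the blacklist contents and names are made up for illustration), a safe namespace is just the built-ins filtered through the blacklist:

```python
# Hypothetical sketch: derive a safe namespace from a blacklist.
built_ins = {"car": "<fn car>", "cdr": "<fn cdr>",
             "system": "<fn system>", "disp": "<fn disp>"}
UNSAFE = {"system"}   # illustrative blacklist

def new_safe_namespace():
    return {name: val for name, val in built_ins.items()
            if name not in UNSAFE}

safe = new_safe_namespace()
assert "system" not in safe
assert "car" in safe
```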

Then, there would be two types of Arc code: trusted and untrusted. Trusted code can do anything; untrusted code can only do safe things.

Since trusted code can load other modules with __built-ins*, this lets trusted code elevate the privileges of any other module.

So, why is this important? Well, it means you only need to verify trusted modules. If a module uses only safe functions, then you can load it in a restricted namespace, which means that you can safely load that module even if it's malicious: any bad stuff it does is scoped to the module.

Obviously trusted modules will need to be verified to ensure they don't do nasty stuff, but that's still far better than having to verify every module. You would only need to verify modules that actually use the unsafe functions. I expect most modules will work fine with the safe subset, so this hopefully would save a lot of time spent verifying.

To accomplish this, I plan to change `new-namespace` so it returns a safe namespace. It will also be changed to accept an optional first parameter: the namespace to inherit from. Thus, if you want an unsafe namespace, you can use this:

  (new-namespace __built-ins*)
This also allows you to inherit from other namespaces:

  (import foo "foo.arc")
  (load "bar.arc" (new-namespace foo))
The above loads bar.arc in a new namespace that inherits from foo.arc's namespace. So if a variable is not found in bar's namespace, it will then check foo's namespace, and so on.
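The lookup rule amounts to walking a parent chain. A minimal Python sketch (the class and method names here are made up):

```python
class Namespace:
    """A namespace that defers missing lookups to its parent."""
    def __init__(self, parent=None):
        self.vars = {}
        self.parent = parent

    def lookup(self, name):
        ns = self
        while ns is not None:
            if name in ns.vars:
                return ns.vars[name]
            ns = ns.parent
        raise NameError(name)

foo = Namespace()
foo.vars["x"] = "from-foo"
bar = Namespace(parent=foo)           # like (new-namespace foo)
bar.vars["y"] = "from-bar"
assert bar.lookup("x") == "from-foo"  # missed in bar, found in foo
```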

The point of it isn't to restrict what programs can do. The point is to give modules only as much functionality as they need, making it easier to verify that they don't do bad stuff.

Oh, by the way. The whole "unsafe/safe" distinction isn't going to be enforced by the interpreter. So it won't be like you submit a module to a committee and they'll put a "trusted" stamp on it or anything like that. It'll just be a community convention thing. Thus, you could load all modules as trusted code if you wanted to, but then you run the risk of a random module doing bad stuff.

The only way the interpreter will "enforce" this is to provide a standard way of making safe namespaces, and ensuring that safe namespaces can't do bad stuff unless you let them. It's up to you whether you want to load a module as safe or unsafe.

The way I'll handle this is that any modules you specify on the command line...

  ./arc foo.arc bar.arc
...will be loaded as trusted. They can then choose to load other modules as trusted/untrusted as they wish. If people want, I could provide a way to load an untrusted module from the command line, but I don't think there'll be any need for that for a while.

I'm not a security expert or anything, so if you see any sort of flaw with this plan, please point it out!

---

Also, here's an idea I had:

A (current-namespace) function that returns the current namespace. That means that the following two are equivalent:

  (load "foo.arc")
  (load "foo.arc" (current-namespace))
Why? Well, here's my plan for handling macros with modules. While within a module, macros are unhygienic. It's up to you to not screw stuff up: it's your module, after all.

But when importing a module, macros always expand in the namespace they were defined in. There's a problem with that, though:

  ; foo.arc

  (def something () "hello")
  (mac message () '(something))


  ; bar.arc

  (import (message) "foo.arc")
  (message) -> "hello"

  (def something () "goodbye")
  (message) -> "hello"
As you can see, bar.arc wants to expand the macro `message` in bar's namespace, not in foo's namespace. Because macros are hygienic by default, this won't work. You can force it with (w/namespace), though:

  (w/namespace (current-namespace)
    (message)) -> "goodbye"
That means that (message) is equivalent to (w/namespace foo (message)).

Could this be handled by `eval`, maybe? Something like this:

  (eval '(message) (current-namespace))
Then w/namespace could just be a convenience macro. Not sure how I'll do the interpreter magic to make that work, though. Guess I'll try it and see!
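For what it's worth, Python's own eval, which takes an explicit globals dict, behaves a lot like the (eval expr namespace) form sketched above (the bindings here are made up):

```python
# Sketch: the same expression evaluated in two different namespaces.
foo_ns = {"something": lambda: "hello"}
bar_ns = {"something": lambda: "goodbye"}

expr = "something()"
assert eval(expr, foo_ns) == "hello"    # expand in foo's namespace
assert eval(expr, bar_ns) == "goodbye"  # expand in the caller's
```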

-----

1 point by rocketnia 5136 days ago | link

"Thus, the name is intentionally ugly. You shouldn't be mucking around with __built-ins* unless you need to, but it's available just in case you do need it. Though... do you think the two underscores are too Pythony?"

"Too Pythony" was my first impression, lol, but it makes sense according to your naming scheme. An underscore means it's something people shouldn't rely on accessing, and two underscores means it's something people, uh, really shouldn't rely on accessing?

---

"There should be a blacklist (or a whitelist?) of "safe" and "unsafe" functions."

I'm not a security expert either, but I don't know if that should be a global list. Suppose I make an online REPL which executes PyArc code people send to it, and suppose my REPL program also depends on someone else's Web server utilities, which I automatically download from their site as the program loads (if I don't already have them). I might distrust the Web server library in general but trust it enough to open ports, but I also might not want REPL users to have that power over my ports.

I think it would make more sense to control security by manually constructing limited namespaces and loading code inside of them. There's likely to be a common denominator namespace that's as secure as you'd ever care to make it, but it doesn't have to be the only one.

Is there a way to execute resource-limited code in Python, like http://docs.racket-lang.org/reference/Sandboxed_Evaluation.h...? ...Hm, I suppose (http://wiki.python.org/moin/How%20can%20I%20run%20an%20untru...) is a starting point to answer that.

---

"As you can see, bar.arc wants to expand the macro `message` in bar's namespace, not in foo's namespace."

Well, it's fine if that's what you expect as the writer of bar.arc, but I'd expect things to actually succeed at being hygienic. My approach to bar.arc would be more like this:

  (= foo!something (fn () "goodbye"))
This doesn't need to pollute all uses of foo.arc in the application; bar.arc can have its own separate instance of foo.arc.

There may still be a namespace issue though. If foo.arc defines a macro with an anaphoric variable, like 'aif, and then bar.arc uses foo.arc's version of 'aif, then the anaphoric variable will still be in foo.arc's namespace, right? My own solution would look something like this:

  ; in foo.arc
  (mac aif ...
    `(...
       ,(eval ''it caller-namespace)
       ...))

-----

1 point by Pauan 5136 days ago | link

"An underscore means it's something people shouldn't rely on accessing, and two underscores means it's something people, uh, really shouldn't rely on accessing?"

Yeah, I figured two underscores served as more emphasis than one. :P Also, two underscores seemed uglier to me, and also distinguished it from internal ("private") variables.

---

"I think it would make more sense to control security by manually constructing limited namespaces and loading code inside of them. There's likely to be a common denominator namespace that's as secure as you'd ever care to make it, but it doesn't have to be the only one."

When I said "global list" what I meant was just defining which are safe and which aren't. Then having a default safe namespace that would contain the items deemed safe.

Yeah, you can create custom namespaces, for instance you could create a safe namespace that allows access to safe functions and (system), but nothing else:

  (= env (new-namespace))
  (= env.'system system)
  (load "foo.arc" env)
Voila. In fact, here's how you could handle the scenario you described:

  (= unsafe-env (new-namespace))
  (= unsafe-env.'open-socket open-socket)
  (= web (load "web-server.arc" unsafe-env))

  (= safe-env (new-namespace))
  (eval (read-input-from-user) safe-env)
Thus, web-server.arc has access to the safe functions, and open-socket. Meanwhile, the input that you get from the user is eval'd in a safe environment. It's a very flexible system. The above is verbose, I admit, but that can be fixed with a macro or two.

---

"My approach to bar.arc would be more like this:"

Hm... I admit that would probably be a clean solution most of the time, but what if you want both `something`s at the same time? You end up needing to store a reference to the old one and juggling them back and forth. Maybe that wouldn't be so bad.

Also, there can't be a `caller-namespace` variable (at least not implicitly), because then untrusted code could access trusted code, and so why have a distinction at all? Your example would work, but only if importers explicitly decide to give access:

  (= env (new-namespace))
  (= env.'caller-namespace (current-namespace))
  (= foo (load "foo.arc" env))
Now foo.arc can access caller-namespace, because you're allowing them to.

---

Side note: I'm going to start using .' rather than ! because I think the former looks nicer.

-----

1 point by rocketnia 5136 days ago | link

"Side note: I'm going to start using .' rather than ! because I think the former looks nicer."

I agree. ^_^

---

"Hm... I admit that would probably be a clean solution most of the time, but what if you want both `something`s at the same time? You end up needing to store a reference to the old one and juggling them back and forth. Maybe that wouldn't be so bad."

I don't know what else you could do if you wanted both somethings at once. ^^; That said, I think I'd just explicitly qualify foo!something or import it under a new name.

There's some more potential trouble, though. If a module defines something that's supposed to be unique to it, then two instances of that module will have separate versions of the value, and they may not be compatible. If a module establishes a framework, for instance, then two instances of the module may define two frameworks, each with its own extensions, and some data might make its way over to the wrong framework at some point. On the other side of the issue, if a module extends a framework, then two instances of the module might extend it twice, and one of them might get in the way of the other.

There are several possible ways to deal with this. Code that loads a library (host code?) could load it in an environment that had dummy variable bindings which didn't actually change when they were assigned to, thereby causing the library to use an existing structure even if it created a new one. Framework structures could all be put in a single central namespace, as you say, and any code to make a new one could check to see if it already existed. A library could require some global variables to have already been defined in its load namespace, intentionally giving the host code a lot of leeway in how to specify those variables.
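Roughly, in Python terms, the first approach (bindings that silently ignore assignment) could look like this; `PinnedDict` and its keys are made up for illustration:

```python
class PinnedDict(dict):
    """A namespace whose pinned keys ignore reassignment, so a
    library reuses the structure it was handed instead of
    installing a fresh one."""
    def __init__(self, pinned, *args, **kwargs):
        self._pinned = set(pinned)
        super().__init__(*args, **kwargs)

    def __setitem__(self, key, value):
        if key in self._pinned and key in self:
            return  # keep the existing framework structure
        super().__setitem__(key, value)

env = PinnedDict({"framework*"}, {"framework*": "existing-instance"})
env["framework*"] = "duplicate"   # the library tries to remake it
assert env["framework*"] == "existing-instance"
```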

I've been considering all those approaches for Penknife, and I'm not sure what'll be nicest in practice. They all seem at least a little hackish, and none of them seems to really solve the duplicated-extension side of the issue, just the duplicated-framework side. At this point, I can only hope the scenarios come up rarely enough that whatever hackish solutions I settle on are good enough, and at least standardized so that not everyone has to reinvent the wheel. Please, if you have ideas, I'm all ears. ^_^

-----

1 point by Pauan 5136 days ago | link

When you say "unique to it" do you mean "only one value, even if the module is loaded multiple times"? Do you have any examples of where a module would want that?

In any case, all those approaches should work in PyArc, in addition to using __built-ins* (provided you really really want the library's unique something to be unique and available everywhere...)

Hm... come to think of it... an environment/namespace/module can be anything that supports get/set, right? It may be possible to create a custom data-type that would magically handle that. Somehow. With magic.

-----

1 point by rocketnia 5136 days ago | link

"When you say "unique to it" do you mean "only one value, even if the module is loaded multiple times"? Do you have any examples of where a module would want that?"

I thought I gave an example. If a module defines something extensible, then having two extensible things is troublesome, 'cause you have to extend both of them or be careful not to assume that values supported by one extensible thing are supported by the other.

---

"Somehow. With magic."

I propose also reserving the name "w/magic" for use in examples. :-p

-----

1 point by Pauan 5136 days ago | link

"something extensible?" Got any more specific/concrete examples?

---

"I propose also reserving the name "w/magic" for use in examples. :-p"

Okay, but if I find a magic function I'm going to put it in PyArc so you can use it in real code too. :P

-----

1 point by rocketnia 5136 days ago | link

""something extensible?" Got any more specific/concrete examples?"

I mean something extensible like the 'setter, 'templates, 'hooks, and 'savers* tables, as well as Anarki's 'defined-variables*, 'vtables*, and 'pickles* tables, all defined in arc.arc. These might sound familiar. ^_^

Lathe (my blob of Arc libraries) is host to a few examples of non-core Arc frameworks. There's the Lathe module system itself, and then there's the rule precedence system and the type-inheritance-aware dispatch system on top of that. There's also a small pattern-matching framework.

If you load the Lathe rule precedence system twice (which I think means invasively removing it from the Lathe module system's cache after the first time, but there may be other ways), you'll have two instances of 'order-contribs, the rulebook where rule precedence rules are kept. Then you can sort some rulebooks according to one 'order-contribs and some according to the other, depending on which instances of the definition utilities you use.

---

"Okay, but if I find a magic function I'm going to put it in PyArc so you can use it in real code too. :P"

I think I saw one implemented toward the end of Rainbow.... >.>

-----

1 point by Pauan 5136 days ago | link

Hm... I'm not sure why that's an issue, though. If a module imports your module, they'll get a nice clean copy. Then if a different module imports your module, they get a clean copy too. Everything's kept nice and isolated.

If you want your stuff to be available everywhere, stick it in __built-ins*. Unless you have a better suggestion?

-----

1 point by rocketnia 5136 days ago | link

"Hm... I'm not sure why that's an issue, though. If a module imports your module, they'll get a nice clean copy. Then if a different module imports your module, they get a clean copy too. Everything's kept nice and isolated."

That's exactly why I'm not sure it'll come up much in practice. But as an example...

Suppose someone makes a bare-bones library to represent monads, for instance, and someone else makes a monadic parser library, and then someone else finally makes a Haskell-style "do" syntax, which they put in their own library. Now I want to make a monadic parser, but I really want the convenience of the "do" syntax--but I can't use it, because the parser library has extended the monad operations for its own custom monad type and the "do" library only sees its own set of extensions.

You mentioned having the person loading the libraries be in charge of loading their dependencies, and that would yield an obvious solution: I can just make sure I only load the monad library once, giving it to both libraries by way of namespace inheritance or something.

But is that approach sustainable in practice? When writing a small snippet for a script or example, it can't be convenient to enumerate all the script's dependencies and configure them to work together. Over multiple projects, people are going to fall back to in-library (load ...) commands just for DRY's sake. What I'd like to see is a good way to let libraries specify their dependencies while still letting their dependents decide how to resolve them.

---

"Unless you have a better suggestion?"

I've told ya my ideas: Dummy global variable bindings and/or a central namespace and/or configuration by way of global variables. (http://arclanguage.org/item?id=14036) They're all too imperfect for my taste, so I'm looking for better suggestions too.

-----

2 points by Pauan 5136 days ago | link

Hm... like I said, it should be possible to build a more complicated system on top of the simple core, though I'm not sure exactly how it would work.

But... here's an idea: a bootloader module that would load itself into __built-ins* so it could persist across all modules, including modules loaded later.

It could then define (namespace ...) and (require ...) functions or something. Modules could be written using said constructs, and the bootloader would then handle the dependencies, creating namespaces as needed. And it could keep a cache around, so re-importing a module that has already been imported will just grab it from the cache.

The bootloader could then define (use ...) or something, which would do all the automatic dependency and caching junk, but you could still use plain old (load) and (import) to bypass the bootloader and get more refined control. Something like that may work.
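The caching half of that bootloader is simple to sketch in Python (`use` and the loader argument here are hypothetical):

```python
_cache = {}   # maps a module's path to the namespace it loaded into

def use(path, loader):
    """Hypothetical bootloader entry: load each module at most once,
    returning the cached namespace on re-import."""
    if path not in _cache:
        _cache[path] = loader(path)
    return _cache[path]

loads = []
def fake_loader(path):
    loads.append(path)          # count how often we really load
    return {"name": path}

a = use("foo.arc", fake_loader)
b = use("foo.arc", fake_loader)
assert a is b
assert loads == ["foo.arc"]     # loaded only once
```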

Haha, I just had a crazy idea. What if a module imported itself into __built-ins* ? Something like this:

  ; foo.arc

  (if no:__built-ins*.'foo-check
    (do
      (= __built-ins*.'foo-check t)
      (load "foo.arc" __built-ins*))
      
    (do
      ; define rest of foo.arc here
      ...))
I suspect any solution will have some wart or other. Tradeoffs and all that. Also, the solution to the specific problem you mentioned is to load them all in a single namespace, right? Or at least namespaces that inherit from some common one.

So perhaps we could define a macro that makes that easier, since the current way of doing it is pretty verbose. Assuming it was almost-as-simple as (import ...) that would help ease the pain somewhat, though it wouldn't help with dependency management (that's a whole different ballpark).

I also thought of a macro that would make it easier to import/export stuff to/from a module. Right now you need to do stuff like this:

  (= env (new-namespace))
  (= env.'foo foo)
  (= env.'bar bar)
  ; etc.
Which is clunky. But I haven't figured out a good name for it. Okay, wait, I could use plain-ol `namespace`:

  (namespace foo bar)
I'm undecided though. It's like (table) vs (obj), only with namespaces.

-----

1 point by Pauan 5136 days ago | link

Oh, and by the way. In addition to creating a safe namespace and selectively giving it unsafe functions, you can also remove functions from a safe namespace.

For instance, suppose you wanted to run code in a safe environment, but you didn't want it to be able to print (using pr, prn, prt, etc.) You could use this:

  (= env (new-namespace))
  (= env.'disp nil)

  ; do something with env
Like I said, it's very flexible. You have complete control over what is/isn't in a namespace. You can execute each module in its own namespace, or combine them however you wish, etc. It has a very simple core, but many, many potential uses.

-----

1 point by shader 5137 days ago | link

Traditionally, scopes are formed as trees, and if a name isn't found in the local scope, then the parent is checked and so on.

I see namespaces, environments, and scopes as different names for the same thing. Thus when Arc is loaded, it would create a default namespace and load all built-in functions into that namespace. It's up to you whether user functions default to a "user" namespace (some languages do this) or default to the root namespace. Any newly created namespaces reference that as their parent, and so on down the tree.

If you do implement a true environment system for PyArc, I recommend you do it this way.

I'd also recommend considering making environments a first class part of the language, which is where you seem to be headed. Reifying environments creates many interesting possibilities, including user-level implementations of module systems, selective dynamic scoping, parameterization, and more control over evaluation in general.

-----

1 point by Pauan 5137 days ago | link

It's already done. As I said, modules are implemented, but there are still some bugs to work out.

There's a Python variable global_env that contains the base functions (defined in PyArc) and (once I get it working) arc.arc as well. When it loads a file, it creates a shallow copy of global_env and uses that as the namespace for the file.

Then, within the file, if it uses import, that then calls (new-namespace) which once again creates a shallow copy of global_env.

Actually, PyArc does have an environment class, but it's not exposed to Arc. On the other hand, eval and load both accept tables and alists for an environment. Is that what you meant?

So, I already have all the scaffolding in place, I'm just trying to decide on what name to use. I don't really like the name built-ins* but it does seem to be pretty accurate.

-----

1 point by rocketnia 5137 days ago | link

If you're creating a shallow copy and you're just treating a namespace as something that maps names to values (rather than mapping names to bindings containing values), then it won't be as customizable in some ways: When you redefine 'some, you don't get 'all for free, and you don't get to maintain different 'setter tables in different namespaces.
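Roughly, in Python terms (the definitions are made up to illustrate the point): when 'all captures core's binding of 'some at definition time, a shallow copy can't retroactively rewire it:

```python
core = {}
core["some"] = lambda test, seq: any(map(test, seq))
# 'all' resolves 'some' in the namespace it was defined in (core):
core["all"] = lambda test, seq: not core["some"](
    lambda x: not test(x), seq)

mod = dict(core)                        # a module's shallow copy
mod["some"] = lambda test, seq: False   # module redefines 'some...
# ...but 'all' still goes through core's binding:
assert mod["all"](lambda x: x > 0, [1, 2]) is True
```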

I'm going to great pains reimplementing Penknife so that I create an all new core environment from scratch each time, just so that it can be hackable like that. XD But I'm doing this by passing all the bindings around manually so that core-call's implementation can depend on binding-get's binding and vice versa. The core is ridiculously circuitous, with some 11-argument functions and such. Fortunately, the circuitous part is mostly confined to a single, reasonably sized file, which defines a DSL for the rest of the files to use.

Gotta finish up this reimplementation and post it at some point. XD;;;

-----

1 point by Pauan 5137 days ago | link

Okay, so, my plan right now is that if a function is bound in global_env, it has dynamic scope, and if it's bound anywhere else, it's lexical.

This should allow for my shallow-copying strategy to work, but also allow for shadowing core functions. This may break arc.arc though, but I'll tackle that when I get to it.

-----

1 point by rocketnia 5137 days ago | link

I have almost no clue what you're doing, but I hope it works out. Treating core variables differently than others sounds nothing like what I would do, so we're separate people. :-p

-----

1 point by Pauan 5137 days ago | link

Yes, it's ludicrous, crazy, and probably insane, but I'm trying it anyways. I suspect it'll break later, though, so I'll have to go back to lexical-scope-for-everything.

By the way, I called it dynamic scope, but I'm not actually sure what it is. It's this weird short-circuiting thing that causes the global_env to jump back one environment level, which then causes it to loop back onto itself if the variable isn't shadowed, but it works (for now).

Edit: nevermind, I had to revert it. Darn. It was such a silly hack too.

-----

1 point by Pauan 5137 days ago | link

Hm... yes, I may end up needing to change that, or hack it in some way to allow better redefining of built-in functions, while still keeping modules isolated.

-----

1 point by rocketnia 5137 days ago | link

If there were a special global 'built-ins*, would it be a built-in? ^_^

Honestly though, I'm not quite sure quite what you mean by defining built-ins. If you're trying to change what's returned by (new-namespace), guess what: You can give something a modified version of 'new-namespace. :D

-----

1 point by Pauan 5137 days ago | link

Yes, it would be. In fact, it would have to be.

Okay, so, let me explain how this works... global_env is a Python dictionary that defines the built-ins that are exposed to Arc code. After parsing and evaluating arc.arc (which doesn't work yet, but should eventually), it contains the core of Arc, including special functions defined in PyArc.

The (new-namespace) function creates a shallow copy of global_env, which is what keeps the modules isolated from each other, because when you use (import) it loads it with (new-namespace).

What this means is, Arc code cannot overwrite built-ins; it can only shadow them. So if you overwrite the new-namespace function, that change would only affect your module, and nobody else's. See what I mean about modules being too isolated from each other? They're child-proof! :P

What would need to happen in order to support the point you brought up (a coerce* table, etc.) would be a way to actually write to global_env directly, bypassing the shallow copy. But since (new-namespace) works by creating a shallow copy of global_env, any future modules loaded after yours would use the stuff you defined even if they don't import your module, which is why I'm calling it dangerous (but possibly dangerous in a good way).
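In Python terms, the behavior described here looks roughly like this (the names are illustrative):

```python
global_env = {"coerce": "core-coerce"}   # PyArc's built-ins, roughly

def new_namespace():
    return dict(global_env)              # shallow copy per module

mod_a = new_namespace()
mod_a["coerce"] = "my-coerce"            # shadows: only mod_a changes

global_env["coerce*"] = "shared-table"   # writing to global_env directly
mod_b = new_namespace()                  # reaches every *future* module
assert "coerce*" not in mod_a            # but not earlier-loaded ones
assert mod_b["coerce*"] == "shared-table"
```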

I'm just trying to decide on the name that is used to write directly to global_env. I think it should be a table, but it could be a function as well, like (new-namespace). Of course, there's still the question of whether we should allow overwriting the built-ins at all, but I think that malleability fits well with Arc.

-----