Arc Forum

2 points by Pauan 5198 days ago

Okay, I know I wrote a lot, and I kinda rambled a bit, so here is the simplified step-by-step explanation of what I'm talking about. You can ignore the other posts if you like, but please do read this one.

First, what is our goal? I want users to be able to create new data types that behave like Arc's existing data types and fit in seamlessly. What do I mean by that?

Suppose you wish to create something that behaves almost identically to a table, but does something special when you call = on it:

  (= my-table.'foo 5) ; do something special

The naive approach, creating a new data type and extending sref, is neither extensible nor scalable. I'll explain why shortly.

---

But first, what is the problem? To explain it, I need to introduce the concept of private and public data. Public data is exposed everywhere; examples are global variables and global functions.

Private data is data that is hidden from the outside world completely. In Arc, we can use closures to create private data:

  (let priv 5
    (def get-priv () (+ 2 priv)))

  priv       -> error
  (get-priv) -> 7

As you can see, because priv is bound by a let expression, it is private: only the function `get-priv` can access it, and nothing else can. `get-priv` has full control over who gets to access priv, and how.
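Two closures can also share one private variable, and together they decide exactly what outside code may do with it. This is a minimal sketch for stock Arc 3.1; counter, get-count, and bump are hypothetical names chosen for illustration:

```arc
; counter is visible only to the two functions defined inside the let.
; Outside code can read it and increment it by one, but it cannot set
; counter to an arbitrary value or reset it.
(let counter 0
  (def get-count () counter)
  (def bump () (++ counter)))

(bump)
(bump)
(get-count) ; -> 2
```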

Okay, so what? We use closures all the time, so this isn't exactly new... hold that thought for a moment. What is an input port in Arc? Input ports are implemented in Racket, and they contain private data, just like the variable priv.

This is important. Arc code cannot access the private data of the stdin port; only built-in functions implemented in Racket, such as readc, peekc, and readb, can. This means two things:

1) The functions above (readc, peekc, etc.) must be implemented in Racket, because only Racket code can access stdin's private data.

2) It is much harder and more verbose to create "fake" input ports: data types written in Arc but designed to behave like input ports.

Let's consider an example. We wish to create two input ports, foo and bar. Here is the obvious way:

  (let stream '(#\f #\o #\o)
    (def foo ())
    
    (extend peekc (x) (is x foo)
      (car stream))
      
    (extend readc (x) (is x foo)
      (pop stream)))
      
      
  (let stream '(#\b #\a #\r)
    (def bar ())
    
    (extend peekc (x) (is x bar)
      (car stream))
      
    (extend readc (x) (is x bar)
      (pop stream)))

Now let's test it:

  (peekc foo) -> #\f
  (readc foo) -> #\f
  (readc foo) -> #\o
  (readc foo) -> #\o
  (readc foo) -> nil

  (peekc bar) -> #\b
  (readc bar) -> #\b
  (readc bar) -> #\a
  (readc bar) -> #\r
  (readc bar) -> nil

And it works perfectly. But note a couple of things. First, the verbosity: the two definitions are almost entirely duplicate code! A macro won't help much, because each stream may want different behavior, so you still need to define each one individually. And this example only extended readc and peekc, never mind the other input functions! Second, it's inefficient: every time you create a new stream, you need to re-extend all the relevant built-in functions...

Now, that was just streams. Tables are a whole different ballgame: in pgArc it's impossible to make a fake table (something that looks and behaves like a table but isn't one), because you can't properly extend eval and apply in Arc. And even if you could, it would be just as clunky as the stream example, or worse.

Why is this? It is for the exact same reason: tables contain private data that only Racket knows about. Arc cannot access that private data, nor can it create it. But before we solve that problem, let's take a little detour...

---

There are fundamentally two ways of calling a function. Let's consider a simple example:

  (+ 1 5)  -> 6
  (1 '+ 5) -> 6

Whoa, what's going on here? Suppose we had a hypothetical Lisp where numbers were functions. Rather than calling (+ 1 5), you would call the number and tell it what behavior you want (in this case, addition). Thus:

  (+ 1 5) -> (1 '+ 5)
  (- 1 5) -> (1 '- 5)
  (* 1 5) -> (1 '* 5)
  ...
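To make the idea concrete, here is a sketch of such a "number" in stock Arc 3.1, where a closure stands in for the number and receives the operation as a message. make-num and one are hypothetical names, and only three operations are handled:

```arc
; A number as a closure: the operation arrives as a message symbol.
; case keys are not evaluated, so + - * are written unquoted.
(def make-num (n)
  (fn (op y)
    (case op
      + (+ n y)
      - (- n y)
      * (* n y)
        (err "unknown message:" op))))

(= one (make-num 1))
(one '+ 5) ; -> 6, same as (+ 1 5)
(one '- 5) ; -> -4, same as (- 1 5)
```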

What gives? That's really weird! Why are we doing this? Hold that thought for a moment, please. Let's look at some more examples:

  (car my-list)       -> (my-list 'car)
  (len my-list)       -> (my-list 'len)
  (my-table 'foo)     -> (my-table 'get 'foo)
  (= my-table.'foo 5) -> (my-table 'set 'foo 5)

No, really, why are we doing this? Let's go back to our two streams from before. They both contain private data, specifically a list of characters. peekc and readc need access to that private data, so we extend them. The alternative would be to store the data as global variables, which is ridiculous.

But... wait... functions can accept arguments. Consider the two streams again. foo is a self-contained stream that has access to private data. We want a way to get at that private data, which is why we use readc and peekc. The problem is that what should happen to the private data depends on how foo is used.

If foo is called by readc, we want it to consume a character, but if it is called by peekc, we don't. But foo doesn't know which function called it, so how does it know what to do? Simple: we give it a message telling it what to do:

  (foo 'peek)
  (foo 'read)

The first argument to foo is a symbol, telling foo what to do. Now foo knows whether it should consume a character or not. But... why are we going into this whole "message passing" thing? To reduce verbosity and drastically increase extensibility. Let's assume that we adopted this message-passing system. You could now write the two streams like this:

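  ; case compares against its keys unevaluated, so peek and read are
  ; written without a quote ('peek would never match)
  (let stream '(#\f #\o #\o)
    (= foo (annotate 'input (fn (x)
                              (case x
                                peek (car stream)
                                read (pop stream))))))

  (let stream '(#\b #\a #\r)
    (= bar (annotate 'input (fn (x)
                              (case x
                                peek (car stream)
                                read (pop stream))))))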

You'll note something: I didn't have to extend anything. Not readc, not peekc, not readb, nothing. Under this proposal, those built-ins would simply send foo the appropriate message, so everything just works. Imagine if every time you wanted to make a new stream, you had to extend several built-ins... ew. This way, you don't need to extend anything.
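Stock pgArc's peekc and readc know nothing about this protocol, so trying the idea today means sending the messages by hand. A minimal sketch for Arc 3.1; make-stream, my-peekc, and my-readc are hypothetical names standing in for what the built-ins would do under the proposal:

```arc
; Each stream is a closure over its own list of characters, tagged 'input.
(def make-stream (chars)
  (annotate 'input
    (fn (msg)
      (case msg
        peek (car chars)            ; look without consuming
        read (pop chars)            ; consume one character
             (err "unknown stream message:" msg)))))

; Hypothetical wrappers: unwrap the tagged value with rep, send a message.
(def my-peekc (s) ((rep s) 'peek))
(def my-readc (s) ((rep s) 'read))

(= foo (make-stream '(#\f #\o #\o)))
(my-peekc foo) ; -> #\f
(my-readc foo) ; -> #\f
```

Adding a new stream is now a single call to make-stream; no built-in is extended.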

Okay, that's great for streams, but what about tables? They work too! My idea is that when `apply` sees something that is annotated with type 'table, it will do a behind-the-scenes conversion:

  ; before            -> after
  (my-table 'foo)     -> (my-table 'get 'foo)
  (= my-table.'foo 5) -> (my-table 'set 'foo 5)
  (keys my-table)     -> (my-table 'keys)

This is an internal conversion done by `apply`, so tables look and behave the same way they do right now. But it also lets us create new table types that integrate seamlessly into Arc:

  (annotate 'table (fn (msg . args)   ; args: the key, plus the value for set
                     (case msg
                       keys ...
                       get  ...
                       set  ...)))
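Until `apply` performs that conversion, the same shape can be tried on stock Arc 3.1 by sending the messages explicitly. A minimal sketch, assuming an association list as the backing store; make-alist-table, tget, tset, and tkeys are hypothetical names standing in for what apply, sref, and keys would do under the proposal:

```arc
(def make-alist-table ()
  (let pairs nil                        ; list of (key value) lists
    (annotate 'table
      (fn (msg (o k) (o v))             ; k and v are optional: keys takes neither
        (case msg
          keys (map car pairs)
          get  (alref pairs k)
          set  (do (= pairs (cons (list k v)
                                  (rem [is (car _) k] pairs)))  ; drop old entry
                   v))))))

(def tget  (tb k)   ((rep tb) 'get k))
(def tset  (tb k v) ((rep tb) 'set k v))
(def tkeys (tb)     ((rep tb) 'keys))
```

An association list keeps the sketch short but makes every lookup linear; a serious replacement would more likely wrap a real hash table.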

Basically, all the compound data types (conses, tables, input, output) would be represented as functions that accept a message parameter. This lets Arc code create custom data types that integrate seamlessly into the language, without needing to extend anything.

Note: I'm only proposing this for the data types. Arc code in general could be written in this style too, but I doubt it will be. Still, from what I've seen, this particular style is well suited to defining data types in Arc.

1 point by Pauan 5198 days ago

Okay, so... one thing I didn't emphasize in this post is that this lets you write core Arc functions such as cons, table, and instring in Arc itself. So here's my first partial crack at it:

  (def attr (x . args)
    (apply (annotate 'fn x) args))

  ; case keys are not evaluated, so they are written without a quote
  (def cons (x y)
    (annotate 'cons
      (fn (m)
        (case m
          len (+ (len y) 1)
          car x
          cdr y))))

  (def car (x)
    (attr x 'car))

  (def cdr (x)
    (attr x 'cdr))

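  (def table ()
    (let alist nil                      ; list of (key value) pairs
      (annotate 'table
        (fn (m (o k) (o v))             ; k and v are optional: keys takes neither
          (case m
            keys (map car alist)
            get  (alref alist k)
            set  (do (= alist (cons (list k v)
                                    (rem [is (car _) k] alist)))
                     v))))))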

  (def keys (x)
    (attr (coerce x 'table) 'keys))

  (def len (x)
    (if (no x) 0                        ; the empty list has length 0
        (case type.x
          cons  (attr x 'len)
          table (len (keys x)))))

  (def vals (x)
    (map [x _] (keys x)))

Note: this is assuming py-arc, and also assuming the message-passing idea.

It's untested, and py-arc doesn't actually support my idea yet, so I might not be able to test it anytime soon. As such, there are probably bugs in it. But it should give you a taste of the power of this idea: the ability to define core data types in Arc itself, using only functions (no piggybacking!), in a way that is easy for user code to extend.
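Because the code above relies on the proposed `apply` behavior, here is a smaller sketch of the cons cell that runs on stock Arc 3.1 by unwrapping the closure with rep instead of the proposed attr. kons, kar, kdr, and klen are hypothetical names, chosen so they don't clobber the real built-ins:

```arc
; A cons cell as a message-receiving closure, tagged 'kons.
(def kons (x y)
  (annotate 'kons
    (fn (msg)
      (case msg
        car x
        cdr y
        len (+ 1 (klen y))))))

(def kar (c) ((rep c) 'car))
(def kdr (c) ((rep c) 'cdr))
(def klen (c) (if (no c) 0 ((rep c) 'len)))   ; nil plays the empty list

(= xs (kons 1 (kons 2 (kons 3 nil))))
(kar (kdr xs)) ; -> 2
(klen xs)      ; -> 3
```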

As a side note, what's with the attr function? When Arc sees (my-table 'foo), it expands it to (my-table 'get 'foo), so calling (my-table 'keys) doesn't work: it becomes (my-table 'get 'keys). But by annotating the table with type 'fn, you can get at the actual "hidden" attributes, like 'keys, 'get, and 'set. This means that (attr my-table 'keys) will never conflict with (my-table 'get 'keys).

Also, this is probably significantly slower than defining these things in Racket, but that's okay. It's nice to know that Arc can define them, even if in practice they stay defined in Racket for the sake of performance.

-----

2 points by aw 5198 days ago

Yes, Arc should be hackable, so it's a good idea for people to be able to provide their own implementations in Arc for tables, streams, and so on.

-----