This is part II of a series of blog posts, but I don't know if part I or part III is as immediately relevant to us.
One of the things this touches on is associating expressions with source code. That's something I tend to neglect to do, but this makes it look easy.
The other thing is it shows a way to normalize expressions to other expressions, just using HOAS (aka putting functions in your expression AST). This makes expressions the only run time data structures needed in the language.
Normalization of expressions is commonly needed in dependent type theories because it makes it possible to compare types by intensional (implementation) equality, even if those types contain program expressions. (Comparing types for equality is needed for instance when typechecking a function application, since you might have used an argument of incorrect type.)
The creator (smihica, Shin Aoyama) hasn't come to the forum yet, but their work is impressive. Just look at that REPL start up in under a second! On my machine anyway. :)
Thanks to svetlyak40wt for discovering this project 80 days ago (http://arclanguage.org/item?id=18355). It was in active development then, and it's still active now.
---
The primary spark that led me to make Rainbow.js was a desire to respond constructively to threads like this one.
Using Rainbow was a side effect because I wanted to wow people with the execution speed, which conanite had already meticulously worked on in Rainbow. This was still tied into the desire to help those forum threads along: Between speed and portability, what possible excuse could there be not to use it? ^_^
Once I committed to particular technical goals involving consistency with Rainbow, the 20,000 lines of code were basically predetermined. It's kind of unfortunate that it would fail to be helpful in threads like this one, but it's not all a loss.
The secondary spark that led me to make Rainbow.js was that I wanted to get the hang of using JavaScript, after having worked mostly in Arc for a few years. That goal was met. :)
Thank you so much for this!! arc-js looks great. I'm able to define and run Arc macros with it.
This changes everything. With arc-js you can start writing client-side apps in the browser without dealing with JavaScript. You can add libraries in arc-js that do that (are there any already?). The big advantage is that you bypass HTML: you can make a browser and a server send sexps back and forth.
There I was thinking there was no hope with an Arc in Javascript and sure enough Arc is quietly thriving.
That arc-js looks very promising, though also somewhat incompatible with the existing libraries.
For instance, the basic fizzbuzz example provided on the main page looks very similar, but doesn't actually run in anarki. Maybe that's anarki's fault, but the syntax of the for loop is different.
I do like the idea of arc ported to js though; makes getting access to mongodb easier, at the very least :)
In general, worrying about compatibility in the arc/anarki neighborhood is a fool's errand. Not worth doing. Just do what makes sense, and don't be afraid to be different.
Thanks for sharing, rocketnia! I’ve been looking into ways of getting started, and Arc.js looks like the most novice-friendly implementation so far. Node.js and browser is what I use for hobby coding anyway. :)
The news.arc code writes to files. It doesn't use an SQL database.
---
Even without SQL, code injection is something to worry about. The Arc codebase is a breeding ground for exactly this kind of issue, since it rarely does string escaping. Let's see...
HTML injection (XSS attacks): This is the kind of injection news.arc primarily needs to worry about. Almost every string it passes around is used directly as an HTML code snippet. Fortunately, every user input is sanitized thanks to the form-generating utilities in app.arc.
Shell injection: Make sure that any directory paths passed to (ensure-dir ...) are already shell-escaped. (Arc also invokes the shell in a few other places, but those don't need any extra escaping.)
Format string injection: Be careful about file paths passed to (tofile ...). Everything after the last slash must be a valid 0-argument format string. The format string syntax is described at http://docs.racket-lang.org/reference/Writing.html.
Arc injection: The prompt.arc webapp is explicitly designed to let admin users evaluate their own Arc code on the server. If an attacker gained access to this page, it would be worse than any other kind of code injection. Because of this, I don't recommend running prompt.arc on a production site. (If it can't be helped, I recommend at least using HTTPS so admin login credentials and commands can't be intercepted by a man-in-the-middle attack.)
"So, are there anything like restarts available for Racket/arc?"
I'm pretty sure the answer is no, for now.
With the right use of Racket's 'parameterize and 'current-parameterization, we could build a system that works like Common Lisp's conditions and restarts. (Global variables are an alternative to 'parameterize, but they'll have more awkward interactions with continuations and threads.)
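To make the idea concrete, here's a toy sketch in Python (all names here are invented for illustration, not Racket or CL API) of dynamically scoped handlers, which is the core mechanism 'parameterize would supply. The key difference from exceptions: signaling does not unwind the stack, so the handler's return value can resume the computation in place.

```python
# Invented names: Condition, with_handler, signal, parse.
handlers = []

class Condition(Exception):
    pass

def with_handler(handler, thunk):
    # Dynamically scoped: the handler is visible only while thunk() runs.
    handlers.append(handler)
    try:
        return thunk()
    finally:
        handlers.pop()

def signal(condition):
    # Ask the innermost handler first; a handler returns a value to
    # resume the computation, or None to decline.
    for h in reversed(handlers):
        result = h(condition)
        if result is not None:
            return result
    raise condition

def parse(s):
    try:
        return int(s)
    except ValueError:
        # Unlike raising, signaling doesn't abandon this frame; whatever
        # the handler returns becomes the parse result.
        return signal(Condition(s))

result = with_handler(lambda c: 0, lambda: parse("oops"))  # -> 0
```

A real condition system would also name the available restarts instead of just returning a value, but this shows why dynamic scope (rather than plain globals) is the right tool.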
I've only played with CL once or twice, but I seem to recall one of the nicest parts about conditions and restarts is that when a condition isn't caught at the REPL, the user gets a sub-prompt they can use to invoke a restart. Threads might give us something close to this behavior, if we make the REPL's catch-all condition handler block the thread while the user chooses a restart.
Unfortunately, all preexisting Racket/Arc errors will still use the exception system rather than this homemade condition system. In order to make conditions the main error mechanism, we might have to modify the way Arc compiles to Racket so that it traps exceptions and turns them into conditions.
Finally, CL's condition handlers use its subclassing system to find the most specific handler. Arc doesn't have hierarchical dynamic typing like that, or at least not in the main language, so it might not make sense to handle conditions that way.
Altogether this is quite a few feature interactions that would need to be considered, and we'd end up with quite a different REPL experience--maybe even a qualitatively different language. How do we know if it's worth it?
It's probably not. Restarts (and similar) just seemed like a really useful tool for use in production environments. Anything that makes error handling more powerful, flexible, or understandable is a good thing. But changing arc to support it might be a bit much, at least at this point.
I do think it would be good if we could improve arc's error handling though. Or at least error reporting on the repl.
Arc doesn't have hierarchical dynamic typing like that...
I had created a hierarchical typing system for arc once, by just changing the tagged symbol to a list of symbols that is treated as the type hierarchy, but I think I lost the code... I'll have to see if I can find it.
arc> (= k-idfn ($:lambda (#:k v) v))
#<procedure:zz>
arc> (k-idfn '#:k "hello")
Error: "struct procedure:zz: expects 0 arguments plus an argument with keyword #:k, given 2: #:k "hello""
This function expects 0 positional arguments and a #:k keyword argument. However, we've passed it 2 positional arguments, one of which is #:k and one of which is "hello". We haven't passed it any keyword arguments at all, let alone one with the keyword #:k.
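The same mismatch is easy to reproduce in Python (an analogy I'm adding, not the Racket mechanism): a keyword-only parameter can't be satisfied by passing the keyword as positional data.

```python
# Python analog of the mistake: k_idfn takes no positional arguments,
# only a keyword argument named k.
def k_idfn(*, k):
    return k

try:
    k_idfn("#:k", "hello")   # two positional args, like the Arc call
except TypeError as e:
    print(e)                 # takes 0 positional arguments but 2 were given

print(k_idfn(k="hello"))     # the keyword form works
```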
We pretty much have two ways to call this function successfully.[1] First, we can write the function call itself as Racket code, taking advantage of Racket's function call syntax compiler to parse the keywords:
(Here I've used ($.list ...) as a way to construct ()-terminated instead of nil-terminated lists.)
Note that keyword-apply requires the keywords to be given in a specific order based on their UTF-8 bytes.
If we do a lot of this in the code, I'm thinking we could benefit from one of three things:
- An Arc function that takes an Arc table like (obj k (list "hello")) and calls Racket's 'keyword-apply with the keywords sorted as it expects.
- An Arc macro that looks like a function call, e.g. ($kw-call k-idfn #:k "hello"), but finds any occurrences of keywords and generates the appropriate Racket function call.
- A patch to Arc's own function call syntax so that it parses Racket-style keyword arguments.
I think these solutions would be more effective than shaving one or two hieroglyphics off of '#:k.
---
Not even Racket allows #:k as an expression by itself. It has to be written '#:k. After all, even if #:k evaluated to itself, (list #:k "hello") would not give you a list containing #:k and "hello", since it would try to pass a keyword argument instead.
You might wonder why Racket chose a design with such hoops to jump through. Well, I can't speak for them, but personally I'd give several shallow and hand-wavy reasons:
- It aligns with a certain mental model where positional args are an ordered list and keyword args are an orderless map. (This is the mental model I learned a long time ago in Groovy, so I like it.)
- It lets Racket treat (foo #:a 1 #:b #:c 3) as a compile-time error.
- It lets Racket treat missing/extra keyword args and missing/extra positional args as the same kind of error.
- It means keyword args don't have to be described in terms of positional args. It would be possible to design a lisp that only has Racket-style keyword args, with no positional args whatsoever.
- It means when a programmer wants to get a procedure's arity information dynamically, the result can include information about the keyword args it supports, rather than just the positional args it supports.
---
[1] Hmm, maybe there's a third way to call it successfully:
My original goal was not to shave characters off of '#:key. I apparently misunderstood how they were supposed to work, and thought that arc wasn't compatible with them. I ended up doing the cosmetic change as part of my process of figuring out how to hack it onto arc.
If what you're saying about '#:key vs #:key is true though, then my test cases were incorrect and I was trying to make something work that shouldn't have.
So, what's the right way to hack on arc to support calling racket functions with keyword args? Or would it be better to make a more arc-idiomatic mongo driver, and how?
Oops, I think you've been seeing something I wasn't seeing.
It turns out I'm getting much different results on a local setup than I was getting on tryarc.org. Since Arc 3.1 and Anarki use (require mzscheme), all sorts of things are replaced with doppelgangers that don't support keyword args, including the function application syntax #%app. In fact, the lack of keyword arguments is one of the only things (require mzscheme) is good for. This is something tryarc.org changes, apparently.
"So, what's the right way to hack on arc to support calling racket functions with keyword args?"
I wouldn't say it's "the right way," but we could put the functionality of Racket's 'keyword-apply in a function that takes an Arc table, and then we'd pretty much never need to worry about keywords other than that.
I have an implementation for this, which I should be able to commit with some unit tests now that I know what's going on with MzScheme.
...Actually it might take me a few days to get around to that, so here's the code if anyone wants to use it right away:
(def $kw-apply (func kwargs . posargs)
  ; Convert the last element of posargs (the first-class list) into a
  ; Racket list. If posargs is completely empty, pretend the last
  ; element is an empty list just like Arc's 'apply already does.
  (zap [rev:aif rev._
         (cons (apply $.list car.it) cdr.it)
         (list:$.list)]
       posargs)
  (let (ks vs)
       (apply map list ; Transpose.
         (sort (compare $.keyword<? !0)
           (map [list ($.string->keyword:+ "" (or _.0 "nil")) _.1.0]
                tablist.kwargs)))
    (apply $.keyword-apply func (apply $.list ks) (apply $.list vs)
           posargs)))
I'm using a table of singleton lists so that we can pass the symbol 'nil as an argument value.
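Here's roughly the same logic in Python, as an analogy (kw_apply and greet are names I made up, and Python's ** doesn't actually care about keyword order, so the sort just mirrors the discipline keyword-apply imposes):

```python
# kwargs maps keyword names to singleton lists, like the Arc table.
def kw_apply(func, kwargs, *posargs):
    # Sort by keyword name, the way Racket's keyword-apply demands,
    # then unwrap each singleton list to get the actual value.
    pairs = sorted((k, v[0]) for k, v in kwargs.items())
    ks = [k for k, _ in pairs]  # transpose into parallel keyword
    vs = [v for _, v in pairs]  # and value lists
    return func(*posargs, **dict(zip(ks, vs)))

def greet(greeting, *, name):
    return greeting + ", " + name

kw_apply(greet, {"name": ["world"]}, "hello")  # -> 'hello, world'
```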
---
"Or would it be better to make a more arc-idiomatic mongo driver, and how?"
I actually have some opinion about "the best way" for this. :)
I think Arc-idiomatic approaches serve no particular purpose, since Arc is a tool for general-purpose computation.[1] I would want a database driver to be idiomatic only for the database itself, so that it serves the more specific purpose of storing data. This can then be accompanied with sugary helper utilities, as long as they're optional.
It seems many ORMs want to bake the sugar into the interface, or they make sugar that has tiny escape hatches for poking at the underlying interface. If sugar is the only thing a programmer (usually) sees, I look at it as though it's a full-on database design of its own... which is rarely favorable since it usually inherits most of the complexity and obligations of the original.
[1] Well, while Arc is a general-purpose tool, it's specifically a language, so Arc-idiomatic approaches serve the particular purpose of making features more accessible to language users.
Do we actually need the mzscheme dependency? Any reason we couldn't switch to a full racket base?
---
Interesting. I'll have to play around with that. And study it a bit more to figure out how it works. If I understand your description correctly, the reason you're using lists is so that 'nil is interpreted as no value, while '(nil) is interpreted as intentionally passing the value 'nil? How does the function know the difference?
---
Interesting opinion. Unless I misunderstood you, I would have felt the opposite way. Normally, I would expect the purpose of the data interaction layer to be to separate the implementation details from the code, so that if the backend storage system needs to change, you can just swap it out.
Maybe that is what you mean though, and that's supposed to be a distinction between the "driver" and an additional abstraction layer. Though your comment about ORMs including sugar confuses that a bit. I don't really want sugar per se, just abstraction away from how I'm actually storing the data, within reason anyway.
"Do we actually need the mzscheme dependency? Any reason we couldn't switch to a full racket base?"
There's always a reason, but these days Anarki has broken all my code enough times that that shouldn't be a concern. :)
I think this would be a positive change.
---
"...the reason you're using lists is so that 'nil is interpreted as no value, while '(nil) is interpreted as intentionally passing the value 'nil? How does the function know the difference?"
Arc doesn't support nil as an element of a table. Setting a table entry to nil removes it. Therefore '$kw-apply will only see non-nil values anyway.
As pg says: "In situations where the values you're storing might be nil, you just enclose all the values in lists." http://arclanguage.org/item?id=493
When I was using Arc heavily, I defined a utility (sobj ...) that was just like (obj ...) but it wrapped everything in a singleton list. That may have been the only extra utility I needed.
I could write (each (k (v)) tab ...) in place of (each (k v) tab ...). I could write tab!key.0 in place of tab!key. I could write (iflet (v) tab!key ...) in place of (iflet v tab!key ...).
It was surprisingly unintrusive, even pleasant.
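The sobj idiom translates to other dynamic languages too. Here's a Python rendering I'm adding for illustration (sobj is the only name taken from the post; None stands in for 'nil): wrapping each value in a singleton list distinguishes "key absent" from "key present, with value nil", which a plain lookup can't do when absence and nil look the same.

```python
# Like (obj ...) but every value is wrapped in a singleton list.
def sobj(**kvs):
    return {k: [v] for k, v in kvs.items()}

tab = sobj(key=None)       # deliberately storing a "nil" value
print(tab.get("key"))      # [None] -- present, and its value is None
print(tab.get("missing"))  # None   -- genuinely absent
```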
---
"Normally, I would expect the purpose of the data interaction layer to be separating the implementation details from the code, so that if changes need to be made to what backend storage system you can just trade it out."
I like the sound of that, but I think it's always a bit leaky, unless the result is a full database design that lets people happily forget there's another database underneath.
It seems to, though much of that is the ODBC layer, which I'm not sure I like anyway. It generates some constructors and getters and setters that use keywords, and I don't think that fits particularly well with the arc style.
Unfortunately, some of the database options are also keyword based, but maybe wrappers could be made for the few cases that matter? Sadly, I am also a beginner with mongo, so I don't really know what I need or how to do it.
I think you're telling fibs. :-p I double-checked srv.arc (which defines 'defop), and the code there opens a thread for every request. This is true in Anarki, in official Arc 3.1, and even way back in Arc0.
Even without parallelism, this would come in handy to prevent I/O operations from pausing the whole server.
Arc does have threads, yes, but it also has a style of mutating in-memory globals willy-nilly. As a result, all its mutator primitives run in atomic sections (http://arclanguage.github.io/ref/atomic.html#atomic) at a deep level. The net effect is as if the HN server has only one thread, since most threads will be blocked most of the time.
I can't find links at the moment, but pg has repeatedly said that HN runs on a single extremely beefy server with lots of caching for performance.
Edit: Racket's docs say that "Threads run concurrently in the sense that one thread can preempt another without its cooperation, but threads do not run in parallel in the sense of using multiple hardware processors." (http://docs.racket-lang.org/guide/concurrency.html) So arc's use of atomic wouldn't matter in this case. It does prevent HN from using multiple load-balanced servers.
Looking back, I see that I did indeed inaccurately answer zck's question about "running single-threaded". I'd like to amend my answer to "No, it runs multi-threaded, but the threads use a single core." Rocketnia is right that arc has concurrency but not parallelism.
"The net effect is as if the HN server has only one thread, since most threads will be blocked most of the time."
Well, in JavaScript, I do concurrency by manually using continuation-passing style and building my own arbiter/trampoline... and using it for all my code. If I ever do something an easier way, I have to rewrite it eventually. Whenever I want to try out a different arbiter/trampoline technique, I have to rewrite all my code.
Arc's threading semantics are at least more automatic than that. Naive Arc code is pretty much always usable as a thread, and it just so happens it's especially useful if it doesn't use expensive 'atomic blocks (or the mutators that use them).
"Automatic" doesn't necessarily mean automatic in a good way for all applications. Even if I were working in Arc, I still might resort to building my own arbiters, trampolines, and such, because concurrency experiments are part of what I'm doing.
All in all, what I mean to say is, Arc's threads are sometimes helpful. :-p
Absolutely. I was speaking only in the context of making use of multiple cores.
I see now that I overstated how bad things are when I said it's as if there's only one thread. Since I/O can be done in parallel and accessing in-memory data is super fast, atomic isn't as bad as I've believed for the past 5 years.
"It sounds hard to believe using a single core is a problem of mzScheme."
(Psst, if you say "mzScheme," I feel the need to remind you that Arc works on the latest versions of Racket, which have long since dropped the name "mzscheme".)
Generally, threads are for concurrency, not necessarily parallelism. They're a workaround for an imperative, sequence-of-side-effects model of computation, which would otherwise force us to choose which subcomputation should come first. In Racket, this kind of workaround is their only purpose.
Racket has two features for parallelism, and they're called "futures" and "places":
I'm finding out about these for the first time, but I'll summarize anyway.
Futures are a lot like threads, but they're specifically for speculative parallelism. They're allowed to break some invariants that threads would have preserved, and (as per the nature of speculative parallelism) some of their computations may be thrown away.
Places use shared-nothing concurrency with message passing, and each place runs in parallel.
So although news.arc does spawn a thread to handle every server request, it would probably need to use Racket's futures to take advantage of multi-core systems--and even then, I'm guessing it would need some fine-tuning to avoid wasting resources. If it used Racket's places, that could make its resource usage easier to reason about (but not necessarily better!), but it would require even more substantial refactoring.
It's funny to see you make this change to 'pluralize, 'cause I was thinking of exactly the reverse change for everything else.
In the language(s) I'm working on, the number type has mainly been a way to represent list indexes and lengths. List processing depends on integer processing, or so I thought. The other day I realized I could just use lists instead.
; Scheme-ish code
> (list-ref '(a b c) '())
a
> (list-ref '(a b c) '(z))
b
> (list-ref '(a b c) '(y z))
c
> (len '(a b c))
(() () ()) ; or even (a b c)
This unary encoding won't be efficient for big numbers, but these aren't very big numbers. They're no bigger than other cons lists that I already have in memory, and they might even share the same structure.
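In Python terms, the transcript above might look like this (list_ref and length are just sketch names I'm introducing; the point is that the index is itself a list, and a length is a list of the same size):

```python
def list_ref(xs, n):
    # n is a unary number encoded as a list: [] is 0, ['z'] is 1, ...
    for _ in n:          # drop one element per "digit" of n
        xs = xs[1:]
    return xs[0]

def length(xs):
    return [() for _ in xs]   # a list exactly as long as xs

list_ref(['a', 'b', 'c'], [])          # -> 'a'
list_ref(['a', 'b', 'c'], ['y', 'z'])  # -> 'c'
length(['a', 'b', 'c'])                # -> [(), (), ()]
```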
So I'm thinking that even if I do support other number types, I'll very rarely need to use them for counting first-class objects. They'll mostly be for counting pixels, milliseconds, etc.
"What was the reason you decided to do it this way? It seems more complicated to work with."
In my designs, I don't just want to make things that are easy for casual consumers. I want to make things people can consume, understand, talk about, implement, upgrade, deprecate, and so on. These are things users do, even if they're not all formal uses of the interface.
I hardly need number types most of the time. If I add them to my language(s), they might just be dead weight that makes the language design more exhausting to talk about and more work to implement.
Still, sometimes I want bigints, or at least numbers big enough to measure time intervals and pixels. I don't think any one notion of "number" will satisfy everyone, so I like the idea of putting numbers in a separate module of their own, where their complexity will have a limited effect on the rest of the language design.
---
"Huh, this is similar to Church numerals"
I'm influenced by dependently typed languages (e.g. Agda, Twelf, Idris) which tend to use a unary encoding of natural numbers in most of their toy examples:
data Nat = Zero | Succ Nat
To be fair, I think they only do this when efficiency isn't as important as the simplicity of the implementation. In toy examples, implementation simplicity is pretty important. :)
A binary encoding might use a technique like this:
data OneOrMore = One | Double OneOrMore | DoublePlusOne OneOrMore
data Int = Negative OneOrMore | Zero | Positive OneOrMore
Or like this:
data Bool = False | True
data List a = Nil | Cons a (List a)
data Int = Negative (List Bool) | Zero | Positive (List Bool)
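To make that second encoding concrete, here's how an integer could be packed into it, in Python (an illustration I'm adding, not from the post; encode is an invented name):

```python
# Int = Negative(bits) | Zero | Positive(bits), with bits stored
# least-significant first, represented here as a (tag, bits) pair.
def encode(n):
    if n == 0:
        return ("Zero", None)
    sign = "Positive" if n > 0 else "Negative"
    n, bits = abs(n), []
    while n:
        bits.append(bool(n & 1))  # peel off the low bit
        n >>= 1
    return (sign, bits)

encode(0)   # -> ('Zero', None)
encode(6)   # -> ('Positive', [False, True, True])
encode(-1)  # -> ('Negative', [True])
```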
I get the impression these languages go to the trouble to represent these user-defined binary types as efficient bit strings, at least some of the time. I could be making that up, though.
For what I'm talking about, I don't have the excuse of an optimization-friendly type system. :) I'm just talking about dynamically typed cons cells, but I still think it could be a nifty simplification.
I don't think this itself would be called Church numerals, but it's related. The Church encoding takes an ADT definition like this one and looks at it as a polymorphic type. Originally we have two constructors for Nat whose types are as follows:
Zero : Nat
Succ : (Nat -> Nat)
These two constructors are all you need to build whatever natural number you want:
buildMyNat : Nat -> (Nat -> Nat) -> Nat
We could make this function more general by abstracting it over any type, not just Nat:
buildMyNat : a -> (a -> a) -> a
This type (a -> (a -> a) -> a) is the type of a Church numeral.
While it's more general in this way, I think sometimes it's a bit less powerful. Dependently typed languages often provide induction and recursion support for ADT definitions, but I think they can't generally do that for Church-encoded types. (I could be wrong.)
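For a concrete feel, Church numerals are easy to write down in a dynamically typed language. Here's a Python sketch (names invented): a numeral is just a function awaiting a zero value and a successor function.

```python
# A Church numeral n applies a successor function s to a zero value z,
# n times.
zero = lambda z, s: z
def succ(n):
    return lambda z, s: s(n(z, s))

three = succ(succ(succ(zero)))
three(0, lambda x: x + 1)      # interpret with ints -> 3
three("", lambda s: s + "|")   # or as tally marks -> '|||'
```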
For something more interesting, we can go through the same process to build a Church encoding for my binary integer example:
data OneOrMore = One | Double OneOrMore | DoublePlusOne OneOrMore
data Int = Negative OneOrMore | Zero | Positive OneOrMore
buildMyInt :
OneOrMore -> -- One
(OneOrMore -> OneOrMore) -> -- Double
(OneOrMore -> OneOrMore) -> -- DoublePlusOne
(OneOrMore -> Int) -> -- Negative
Int -> -- Zero
(OneOrMore -> Int) -> -- Positive
Int
buildMyInt :
a -> -- One
(a -> a) -> -- Double
(a -> a) -> -- DoublePlusOne
(a -> b) -> -- Negative
b -> -- Zero
(a -> b) -> -- Positive
b