Arc Forum | Brainstorm: syntax sugar for lambdas

3 points by andreyf 6103 days ago | 20 comments

I'm a big fan of Arc's [... _ ...] syntax, but I find it lacking when I want to represent short lambdas with no variables, or [>= _ 2] variables.

I'm aware that Clojure solves this via

    #(... %1 ... %2 ...)

(yuk!), and Anarki via

    (... _1 ... _2 ...)

(where is _0!?), are there any other elegant solutions people have had?

5 points by cchooper 6102 days ago | link

As rkts said, if you start specifying names of variables, then you aren't really saving many characters. The other problem with some of the suggestions is that you don't get the benefit of omitting the outer parens of an expression:

  [car _]

Note that there are no parens in that function. On the other hand, most suggestions about putting the variables at the start reintroduce the parens:

  [x y || (list x y)]

It would be better to do this:

  [x y || list x y]

One option is to do what K does: parameters are called x, y and z by default:

  [list x y z]

It's concise, but not really much different to [list _1 _2 _3] or Clojure's #(list %1 %2 %3).

I like the idea of re-purposing \ as a synonym for 'fn', as in some other languages. Personally, I've never used \ as a single-escape character in any Lisp code I've written, and all instances can be replaced with |...| anyway. But this is still not as concise as [...]

  (\(x) (car x))

Perhaps using \ as an infix operator would be better:

  (x)\(car x)

Pushing it further:

  x\(car x)

  x\y\z\(list x y z)

This means you can't use rest arguments with this notation, but then I almost never use rest arguments with anonymous functions, just as I don't use implicit progn (which you can't represent with [...] either). You would have to resort to 'fn' in these situations, which is fine because the language should be optimised for the most common case rather than the general case.
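
The chained infix form above can be sketched as a simple string rewrite. This is only an illustration in Python, not a reader extension for Arc: the function name `parse_backslash_lambda` is hypothetical, and it assumes the body itself contains no backslashes.

```python
def parse_backslash_lambda(src):
    """Rewrite 'x\\y\\(body)' or '(x)\\(body)' into '(fn (x y) (body))'.

    Hypothetical helper: everything before the last backslash is treated
    as parameter names, everything after it as the function body.
    """
    *params, body = src.split("\\")
    # Accept both bare names (x) and parenthesised lists ((x) or (x y)).
    params = [p.strip().strip("()").strip() for p in params]
    params = [p for p in params if p]
    return "(fn ({}) {})".format(" ".join(params), body.strip())


print(parse_backslash_lambda(r"x\y\z\(list x y z)"))
```

As the post notes, there is no way to express a rest argument in this notation, so the sketch always produces a fixed parameter list.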

One final observation: almost all Arc functions take either one, two or three arguments, with the last one often being a rest argument. So these are the cases that need to be optimised.

-----

3 points by andreyf 6101 days ago | link

> almost all Arc functions take either one, two or three arguments, with the last one often being a rest argument. So these are the cases that need to be optimized

Although the other analysis is insightful, IMO this is the most important part of your post. That said, I'm not looking to optimize all functions, just those with bodies smaller than about twice the size of the "fn (argument list)" code.

-----

3 points by shader 6099 days ago | link

Here's an idea: If you don't actually specify an order to your variables, unbound symbols are bound in alphabetical order. That way you can have complex symbols you like, and you don't have to specify them either.

This would work with _1 _2 _3, and a b c.

-----

1 point by shader 6093 days ago | link

Any ideas on how to implement this? Is it worth it?

-----

2 points by Darmani 6091 days ago | link

Let's call these anonymous functions whose arguments are the unbound symbols within them in alphabetical order "ofn"'s.

The clearest way to implement ofns is to have a function on the outside of every block of code which does a tree-traversal, collecting all the symbols bound, and then, whenever it finds an ofn, simply does (sort < (rem [mem _ bound-list] (get-list-of-symbols-used ofn-code))) to find the args list.
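
A minimal sketch of that traversal, written in Python with s-expressions modelled as nested lists of strings. The names `symbols_in` and `ofn_args` are made up for illustration, and the sketch assumes the set of bound symbols is already known; it does not handle binding forms such as fn or let inside the body.

```python
def symbols_in(expr):
    """Yield every symbol (a str) found in a nested-list s-expression."""
    if isinstance(expr, str):
        yield expr
    elif isinstance(expr, list):
        for sub in expr:
            yield from symbols_in(sub)
    # Anything else (numbers, etc.) is a literal, not a symbol.


def ofn_args(body, bound):
    """Return the symbols in body that are not in bound, sorted alphabetically."""
    return sorted({s for s in symbols_in(body) if s not in bound})


# [+ b (* a b 2)] with + and * already bound globally
print(ofn_args(["+", "b", ["*", "a", "b", 2]], bound={"+", "*"}))
```

The hard part discussed in the rest of this post is producing the `bound` set in the first place, which this sketch simply takes as an argument.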

There are two major issues with this. The first is global variables -- it needs to determine at compile time which symbols will be bound in the global scope at runtime. Let's assume all global bindings will be done at the top layer or in a layer separated by nothing but calls to 'fn from the top (please, tell me you don't bind your globals in 'eval). (E.g.: (let a 1 (def b ...)) becomes ((fn (a) (= b (fn ...))) 1) and thus meets that criterion, but (def a (b) (= b 1)) becomes roughly (= a (fn (b) (= b 1))), and thus the call to (= b 1) has a call to = between it and the top, and thus will not be counted.) In a process very similar to the overall version, we can then go through the macro-expanded files, and find all the symbols bound by those = forms.

The second problem is nested ofn's (e.g.: [map [map [+ a b] c] d]). I'm still in search of a good solution. I've thought about finding the arity of each ofn (made impossible in the general case by higher-order functions like map which call functions with a variable amount of arguments); finding where each variable is used to determine its minimum scope, and having each symbol an argument of the ofn with that minimal scope; letting inner ofns each have one arg, determined by some alphabetical ordering, and letting the outer ofn have the rest. All those solutions fail some major elegance tests. Best way is probably to just ban nested ofns.

At first I would have said ofn's would be a great feature, but, if we can't have them nested, I can't be so sure -- it would be inelegant to have nested bracketed functions work differently than the outer one, or to have a different delimiter for ofns and normal square-bracketed functions.

Ironically, the difficulty in implementing this dynamic feature stems from the source code of dynamic languages being difficult to analyze.

-----

2 points by shader 6091 days ago | link

It seems to me that, however we do it, it would probably be slow. Maybe, as the source is being read in, the compiler/interpreter could somehow flag each symbol as bound or unbound? Then you could have your ofns just inspect all of the symbols below them, and check whether they're bound or not.

About nested ofns, how would you expect them to work? That should be the real test of what method to use in determining which ofns bind which vars. Obviously they're bound in alphabetical order. How about having the outer ofn bind as many vars as it has args, and leaving the rest unbound for the next layer? This would make it impossible to tell which ofn would bind which var at compile time (unless you know the arity of the outer function based on its environment), but it would (I think) make the most sense. It does leave room for confusing circumstances though ;)
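
The "outer ofn binds as many vars as it has args" rule can be sketched as a split of the alphabetically sorted free symbols. This is Python for illustration only; `split_by_arity` is a hypothetical name, and the sketch assumes the outer arity is somehow known, which is exactly the part that cannot be determined at compile time.

```python
def split_by_arity(free_symbols, outer_arity):
    """Split free symbols into (bound by the outer ofn, left for inner ofns).

    The outer ofn takes the alphabetically first outer_arity symbols;
    whatever remains stays unbound for the next layer in.
    """
    ordered = sorted(free_symbols)
    return ordered[:outer_arity], ordered[outer_arity:]


outer, remaining = split_by_arity({"d", "b", "a", "c"}, outer_arity=1)
print(outer, remaining)
```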

What would happen if there weren't enough arguments to fill in all of the vars? Return a fn? Even more fun!

If the ofns bind inwards, you would need to do your example backwards: [map [map (+ d c) b] a]

You could however try and bind outwards, so the inner ofn binds first, but I don't see how that would be a good idea.

What do you think? Any better ideas? Any glaring flaws with my idea? (I don't know lisp or arc that well)

-----

1 point by Darmani 6090 days ago | link

Source-code traversals are O(n), and compile-speed is not that important below a certain point. And finding all the globals only needs to be done once.

At first I thought it was important to have this conventionalize-ofns function run inside every def and every mac, so that, if I wished to use ofns inside a macro-expansion, which symbols are arguments would not depend on the context. I then realized this would break for virtually any method of generating macro-expansions other than quasiquote, and that, since, under the current implementation, two pseudo-gensyms have the same alphabetical ordering as the order they were created in unless they're of different length, it would still be workable (just do a (w/uniq (a b c) ...) and I'll hardly notice the difference -- saves fewer characters overall, but I find it a little more readable; plus, these can be reused if the macro-expansion contains multiple ofns).
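
The caveat about gensyms of different length is easy to see with plain string sorting. In this Python illustration the names are hypothetical stand-ins for generated symbols (a fixed prefix followed by a counter); the exact form of Arc's generated names is an assumption here.

```python
def creation_order_preserved(names):
    """True if sorting the names alphabetically keeps their creation order."""
    return sorted(names) == list(names)


# Counters with the same number of digits sort in creation order...
print(creation_order_preserved(["gs8", "gs9"]))
# ...but crossing a digit boundary does not, since "gs10" sorts before "gs9".
print(creation_order_preserved(["gs9", "gs10"]))
```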

If we are to do ofns at runtime, they would be much more workable, but also more unpredictable. I've said once before that a runtime-list of all bound symbols would be desirable for other reasons, but, if I'm testing code in the REPL and set a to some value, I don't want half the functions I call breaking for some mysterious reason. Especially considering I haven't found a way to unbind variables.

-----

1 point by shader 6090 days ago | link

Well, I think that the ofns should probably be closures, at which point the binding of the variables only matters when they are defined; after that, defining a out of context wouldn't overwrite the a in the ofn. If not, we could have the ofn replace each symbol with a gensym tagged with 1st, 2nd, 3rd, etc. Then it wouldn't matter what you did with the original name. The gensym replacement only happens, of course, if the var is unbound during definition.

Here's a question: how does anarki's [ _1 _2 _3 ] form work when nested?

-----

2 points by shader 6103 days ago | link

Well, almkglor and I had talked once about having [ .. : .. ] syntax, which overloads the normal brackets if the reader detects the colon. In this version, the variable names come before the colon, and the function body after. Zero arguments should also work. I don't know if it was ever included into anarki, but it seems like it should be useful.

Here's the link to the thread: http://www.arclanguage.com/item?id=7245

Near the bottom is almkglor's implementation. It is backwards compatible with the [ .. _ .. ] form.

-----

1 point by rkts 6102 days ago | link

Those proposals shorten fns by at most three characters. Are multi-arg fns used often enough to warrant this? news.arc contains 23 multi-arg fns in 1769 lines of code; therefore they would save about 1 char every 26 lines.
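
The estimate above works out as follows, taking the figures from the post (23 multi-arg fns, 1769 lines) and the upper bound of three characters saved per fn. The variable names are just for this Python sketch.

```python
multi_arg_fns = 23        # multi-arg fns counted in news.arc (from the post)
lines_of_code = 1769      # lines in news.arc (from the post)
chars_saved_per_fn = 3    # upper bound claimed for the proposals

total_chars_saved = multi_arg_fns * chars_saved_per_fn
lines_per_char_saved = lines_of_code / total_chars_saved

print(total_chars_saved)
print(round(lines_per_char_saved, 1))
```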

That would be ok if the proposals were simple and elegant, but personally I find them hackish and inconsistent with the rest of the language. They also don't fully replace fn because they lack an equivalent for (fn args ...).

Here's my idea: just replace 'fn with a special symbol, like \. This seems to work:

  --- brackets0.scm       2008-11-11 17:06:01.000000000 -0600
  +++ brackets.scm        2008-11-11 17:06:17.000000000 -0600
  @@ -18,7 +18,8 @@
   ; a readtable that is just like the builtin except for []s
 
   (define bracket-readtable
  -  (make-readtable #f #\[ 'terminating-macro read-square-brackets))
  +  (make-readtable #f #\[ 'terminating-macro read-square-brackets
  +                     #\\ 'non-terminating-macro (lambda _ 'fn)))
   
   ; call this to set the global readtable

Now we can say

  arc> (map (\(a b) (prn a ", " b)) '(1 2 3) '(3 2 1))
  1, 3
  2, 2
  3, 1
  (1 2 3)
  arc> ((\args args) 1 2 3)
  (1 2 3)

This saves two chars and is relatively unintrusive.

P.S. Please nobody add any more comments to the thread linked above. It's making my threads page very wide :-(

-----

1 point by shader 6102 days ago | link

Personally, I think that (fn (a b) (+ a b)) is more readable than (\(a b) (+ a b)), and readability matters much more than number of characters.

Also, the [:] form could save more characters, if it automatically applied the outer set of parens to the body form.

However, I don't think it's really that much of an improvement; fn works well enough unless you really like extra syntax.

What was it the original poster wanted, anyway? It sounded like something that was more readable than _1 etc. for the var names; thus my dredging of the old thread. If not, then obviously, it wouldn't be a good choice. Maybe the [:] form should be capable of only naming some of the args, and leaving the rest to the other naming convention? Then the [] form can name the first n arguments by putting them before a :, and have the args after that referenced by $, $0, $1, $2, etc. or some better character set, if _ looks bad.

-----

1 point by andreyf 6101 days ago | link

Yuk, the point I was making is that we should skip argument lists if our function is tiny - for example...

    (fn (a b) (- (/ b (* 2 a))))

...has "fn (a b) ", or 9/29 characters ~ 30% code is in some sense superfluous.

-----

1 point by rkts 6100 days ago | link

It's only a problem if fns of two or more args are common, and they don't seem to be. In news.arc, srv.arc and blog.arc they appear once every 123 lines. In my CL code they appear every 250 lines. Are they more common in your code?

-----

1 point by andreyf 6102 days ago | link

> It is backwards compatible with the [ .. _ .. ] form.

I agree that this is important - it's hard to beat the elegance of [ .. _ .. ] when you have one variable.

From the thread you linked, I'm a big fan of:

    [a b -> (- (/ b (* 2 a)))]

While this seems a bit obfuscated:

    [(- (/ _1 (* 2 _0)))]

-----

1 point by shader 6102 days ago | link

The only things I don't like about arrow form are 1) two characters, and 2) it looks like other math symbols.

I like the colon form, but some text editors make it almost invisible. If the font makes it bold enough, it can be easier to recognize than many of the others.

I originally liked the pipe form, as it's also pretty obvious. However, since this is a lisp, you can always rewrite it to suit your individual tastes ;) How about writing a "config" file for arc, and various conversion tools, that allow us to all write in our own style, and easily convert between them? Then we wouldn't have to argue over which separator to use.

-----

1 point by andreyf 6101 days ago | link

> How about writing a "config" file for arc, and various conversion tools, that allow us to all write in our own style, and easily convert between them?

Good call, but this doesn't address the problem of having to explicitly list parameters, which can be the majority of the code in a small function.

-----

1 point by rincewind 6102 days ago | link

I implemented something like the first one in my m-expression reader, "a -> b;" is translated into "(fn a b)". It could be used with cchooper's customisable reader, so you can still use s-exprs most of the time, like this:

  (map #m[a;b]-> 0 - b / 2 / a; my-list)

it would be read as

  (map (fn (a b) (- 0 (/ b 2 a))) my-list)

-----

1 point by rincewind 6101 days ago | link

let [... _ ...] mean (fn _ (... _ ...))

example usage:

  (reduce [-  (/ _.1 (* 2 _.0))] mylist)

-----

1 point by absz 6101 days ago | link

Unfortunately, this is then deoptimized for the most common one-argument case, where you have to write [... _.0 ...]. We could have [...] expand to (fn __ (let _ (car __) ...)), so that you would have

  (reduce [- (/ __.1 (* 2 __.0))] mylist)

but still have

  (map [+ _ (* _ _)] myotherlist)

I still prefer the _1, _2, ... syntax, though---unsurprisingly, as I contributed that implementation (though not the idea) :)
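
The behaviour of the __ / _ expansion, where __ is the full argument list and _ is its first element, can be modelled roughly like this. It is a Python sketch, not Arc: `bracket_fn` is a hypothetical helper, and the body is passed as a two-parameter function receiving _ and __ explicitly.

```python
def bracket_fn(body):
    """Model of a bracket function that exposes both _ and __.

    body is called with (_, __): __ is the tuple of all arguments and
    _ is the first of them (None if there are none, like (car nil)).
    """
    def f(*args):
        first = args[0] if args else None
        return body(first, args)
    return f


# One-argument case still reads naturally: [+ _ (* _ _)]
square_plus_self = bracket_fn(lambda _, __: _ + _ * _)
print(square_plus_self(3))

# Multi-argument case indexes into __: [- (/ __.1 (* 2 __.0))]
neg_ratio = bracket_fn(lambda _, __: -(__[1] / (2 * __[0])))
print(neg_ratio(2, 8))
```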

-----

1 point by shader 6101 days ago | link

Actually, I do think the _1, _2 .. syntax makes the most sense, if you aren't wanting to name parameters. Though maybe (as was mentioned above) in some cases, the first three should be a b c, or x y z? I don't know how you would set that up to work well with what already exists though.

-----