Arc Forum | Docstrings: a contrarian view

Docstrings: a contrarian view

7 points by CatDancer 6368 days ago | 26 comments

Documentation strings can be stored with the function or macro definition in some way [http://arclanguage.org/item?id=989], e.g.:

  (def add (a b) "Adds two numbers." ...)

or, they could be stored in a separate docstring file:

  ((add "Adds two numbers.")
   ...)

Embedding docstrings in the source code implies that we want to have one true documentation for the code. But I might prefer Alice's documentation because it gives explanations in great detail, someone else might prefer Bob's documentation because it is pithy, a third person might prefer Cindy's documentation because it has lots of examples, yet another might prefer Dmitry's documentation because it is written in Russian...

I've gotten sucked into a number of useless arguments over the years, and I've found that many were based on a false assumption that there had to be one "official" way of doing something. Putting docstrings in with the source code makes that documentation the "official" version, and I foresee various useless arguments about the "official" version: for example, should it describe actual behavior even if it was unintended, or should it describe what the system "should" be doing (and then, what "should" the system's functionality be?)

It is better to decouple the documentation from the source. Don't like the documentation? See a way to describe things better? Publish your own documentation file. You don't need anyone's permission, you don't need to try to get your changes accepted by some central gatekeeper, just do it. And if your documentation is better than Alice's or Bob's or Paul's, people will use yours.

Embedding docstrings in with the source also implies that functions and macros should be documented as they are written, or, at the very least, that whenever code is changed, old documentation that is no longer valid should be deleted.

I have at various times in my life single-handedly implemented projects that a team of programmers failed to do, or implemented parallel features ahead of a team of programmers. A fun example: last year I was the only programmer a startup had when they discovered a major new business model of integrating with large businesses. They were in a typical startup catch-22: they couldn't make the sale without having a product to sell, but couldn't fund (i.e., pay me to make) the product without having made the sale. So we (the startup and I) told the large business that we were starting them off with our API running in "test mode", and once they had done their system implementation to work with the API, we'd "turn on" our system and they'd be live. In actuality, the "test mode" was some stubs that I had thrown together over a day or two, and the "system" that we claimed we'd be "turning on" hadn't been written yet. As the team of programmers at the big business did their implementation to talk to our API, I wrote our system which implemented the API.

When I think about the times like this when I was able to go fast, and other times when I got mired down and the project was a failure, a key realization for me is that my brain isn't very large. I'm only able to think about one or two things at once. A vital strategy for me is thus to pare things down to absolute essentials. So, for me, not having to embed the documentation in with the source is a major advantage. As I am pushing code around, I'm able to look just at the code. Having documentation in view at the same time may seem like a rather minor issue... and maybe it is for people who either A) don't try to beat teams of programmers single-handedly, or B) have larger brains than I do... yet, again for me, every additional thing that is in view when I'm working on something diminishes my brain's capacity to work on the problem.

If documentation were stored in a separate docstring file, then I could concentrate on the code, and when it was time to work on the documentation, I could go over and work on the documentation.

I realize that embedding documentation with the source code is a feature offered by, and indeed the usual practice of, many languages; and it is one of those things that on the face of it sounds like a good idea. Yet I recommend against it.

I don't have a problem with those who want to embed their documentation in their source code being able to do so. I hope that as people write help systems and documentation viewers, they will support both styles: documentation written in a separate docstring file, and documentation embedded in the source for those who want that.
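A minimal sketch of a lookup that supports both styles, assuming a docfile is a single Arc list of (name "doc") pairs like the one shown earlier. The names `ddef`, `load-docfile`, `doc`, `inline-docs*` and `file-docs*` are hypothetical and not part of Arc; `readfile1` comes from arc.arc. A docfile the reader has loaded wins over the inline docstring, so Alice's or Dmitry's docs can stand in for the "official" ones without anyone touching the source.

```arc
; Hypothetical sketch: none of these names are built into Arc.
(= inline-docs* (table)   ; docstrings written next to the code
   file-docs*   (table))  ; docstrings loaded from a separate docfile

; ddef: like def, but records a leading string as the docstring
; (only when it isn't the whole body, since then it is the return value)
(mac ddef (name args . body)
  (if (and (isa (car body) 'string) (cdr body))
      `(do (= (inline-docs* ',name) ,(car body))
           (def ,name ,args ,@(cdr body)))
      `(def ,name ,args ,@body)))

; A docfile holds a single list of (name "doc") pairs, e.g.
;   ((add "Adds two numbers."))
(def load-docfile (file)
  (each (name text) (readfile1 file)
    (= (file-docs* name) text)))

; Prefer the docfile the reader chose, then the inline docstring.
(def doc (name)
  (or (file-docs* name) (inline-docs* name) "No documentation."))
```

For example, after `(ddef add (a b) "Adds two numbers." (+ a b))`, `(doc 'add)` returns "Adds two numbers." until a docfile with its own entry for `add` is loaded.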

14 points by kennytilton 6368 days ago

The problem with comments is that no one ever changes them when the code changes, even tho the comment is right there! Don't you see it! Change that, too!!!

Moving docstrings away from the code would make things worse, creating multiple copies would make it even worser.

-----

6 points by marvin 6367 days ago

I love contrarian views. I disagree with this one, though.

I'd like to hear good arguments for not having docstrings. CatDancer has one instance above, but that particular case could be solved by instructing the editor to hide them until we want them. Source-level documentation solves so many other problems.

Documentation is essential if multiple programmers are going to touch something, which will be the case in the vast majority of projects. "The vast majority" likely even includes most peripheral work on the language itself, such as writing libraries. Keeping the documentation right next to the function/macro definition makes it that much easier to check the documentation and a lot easier to write the documentation in the first place. Even looking up automatically generated documentation on a web site is too much hassle when it could be checked interactively from your terminal.

I think the principal argument for docstrings is that they actually make most programmers document their code, as long as the framework (help-query systems, automatic HTML generation etc.) is in place. And this in turn makes libraries that much more valuable, because it won't any longer take two hours to understand a simple, peripheral but necessary support system.

-----

3 points by kennytilton 6367 days ago

> Documentation is essential if multiple programmers are going to touch something, which will be the case in the vast majority of projects.

I don't know. (a) This is Lisp: we need one tenth the programmers of other languages, meaning it is one tenth as likely that anyone will ever look at my code. (b) Even on projects with twenty people in tall buildings, when I was on a trekking vacation in the Andes and my program broke, guess whose yak-phone rang?

-----

3 points by almkglor 6367 days ago

(a) This is Lisp, so the same-sized team of Lisp programmers can build a project ten times larger. (b) You.

-----

2 points by shiro 6367 days ago

Although this is not a very general example, I tend to oppose in-code docstrings for a couple of reasons.

One of the primary reasons is that we have developers in multiple countries with different primary languages; forcing everyone to write English docs would actually discourage some from writing docs. And mixing multiple languages in the source was a source of trouble (although it's getting better now that most tools and editors can cope with UTF-8).

Another is that documentation and code sometimes need different structures; tying one doc to each global entry skews documents away from the better structure. (This doesn't exclude having a small 'reminder' entry for each definition, which, as you said, is useful because you can check it in the editor.)

-----

4 points by kennytilton 6367 days ago

btw:

"I love contrarian views. I disagree with this one, though."

A sig is born. :)

-----

1 point by akkartik 6367 days ago

Here's how I handle that: explicitly connect comments to a specific point in time. Use (version control) tools to regain the advantages of having comments next to code.

http://akkartik.name/codelog.html

That said, the idea of different people maintaining their own documentation is pretty interesting. Anybody have pointers to projects where this has happened?

-----

6 points by vrk 6367 days ago

> Embedding docstrings in with the source also implies that functions and macros should be documented as they are written

This is how it should always be done. Encoding your ideas in any programming language is a lossy operation: it's difficult or impossible to determine from an arbitrary piece of source code what the intent and purpose of the programmer originally was. Good documentation gives the missing pieces.

I've found it incredibly helpful to write the documentation of functions before the function itself. Like you, I have a small head, and unless I know exactly what I should program beforehand, it's going to be difficult and messy to keep all details in mind. Describe the intent and purpose, give a canonical example of use (a single line, or a couple more if the function is variadic), rule out illegal input, and you're set.

Note that I'm not advocating long pieces of documentation. If you cannot describe a function well in a few lines, it's too big, and it's definitely not clear to you what it should be. If the documentation is in the same place as the source code, it is worlds easier to write the documentation, write the source code, then bounce between them while developing (since both usually require iteration).
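Here is a hypothetical function written the way described above: the docstring states the intent, one canonical example, and the illegal input it rules out, before any code. `mean` is an illustrative name (arc.arc already has `avg`). In plain Arc a leading string in a `def` body is simply evaluated and discarded, so this runs whether or not a help system picks the string up.

```arc
; Hypothetical example: docstring first, then the code.
(def mean (ns)
  "Returns the arithmetic mean of ns, a non-empty list of numbers.
   Example: (mean '(1 2 3)) returns 2. An empty list is an error."
  (if (no ns)
      (err "mean: empty list")
      (/ (apply + ns) (len ns))))
```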

I do not want to see idiotic documentation in Arc programs, but I doubt people who pick up Arc would do that anyway. I've seen many systems written in, e.g., Java, where documentation follows the same pattern as many Microsoft application help systems: "If you press the button called Print, you can print the document." or "Check here to enable grayscale printing" as a tooltip to a checkbox called "Enable grayscale printing".

-----

2 points by akkartik 6367 days ago

"Embedding docstrings in with the source also implies that functions and macros should be documented as they are written"

Not at all! It's been a mind-blowing experience for me to see documentation pop up all over the Arc codebase just because we had a Wikipedia-like workflow.

People often don't want to talk about what they're doing when they are doing it. It's just human nature. This suggests another great time to document code: not when you write it, but when you read it and figure out what it does. A wiki allows this.

-----

1 point by partdavid 6366 days ago

Little discursive notes about why one thing works and something more obvious doesn't can be really helpful. Commentary about the code you didn't write can be helpful.

But in my experience on large projects that kind of documentation becomes:

   # do_my_foo() is a function accepts a float and
   # returns a float, performing necessary calculations.
   double do_my_foo(double inarg) {
   .
   .
   .

That is, they lie, they rot, and, since it's a "requirement" the programmer wasn't inclined to perform, they aren't informative, either.

-----

4 points by pau 6367 days ago

If my opinion counts for something, I totally support your view, CatDancer. The same happens to me, I always want to remove distraction. But it's a personal thing, I realize that. Everyone has deep reasons to prefer their own way, and I am no exception. I also should confess that I program mostly "alone", that's why my opinion maybe doesn't count as much as everyone else's.

In my case, I kind of think about it for "typographical" reasons. For instance, all manuals of typography tell you that it's uncomfortable to read very long lines of text, especially if they are set too close together (in fact, this forum is above the limit, whereas PG's essays, if you remember, are almost perfect). The reason is that when you try to find the line that continues the one you are on, you have to search for it. Best measures for this have been tabulated by the professionals, of course. And, you know, lately I've caught myself constantly trying to keep programs within the 65th column, running the "wc -L" command all the time, so that programs fit in your field of vision without moving your eyes... Stupid maybe? Perfectionism? Yes, I know. But I think this goes along the lines of your comment: since documentation breaks this 'unity' of code, I don't like it in the middle.

And the one thing that has attracted me towards Arc is, as a matter of fact, "the PG style", this minimalist, amazing (for my taste) coding style that I saw in "On Lisp" and "ANSI Common Lisp". It feels right to me, and you feel that code becomes a 'definition', so I now think I should be trained enough to read "arc.arc" _without_ documentation, as if I was reading math or something. And since Arc seemed like a language built with those principles, the addition of documentation, while certainly useful, didn't feel right to me.

In fact, I would die to know what PG thinks about this... ;)

-----

3 points by kens 6367 days ago

I find it hard to believe that reading the code can be a replacement for documentation. For example, I would find a one-line description much more useful than trying to figure out:

  (def qne (obj q)
    (atomic
      (++ (q 2))
      (if (no (car q))
          (= (cadr q) (= (car q) (list obj)))
          (= (cdr (cadr q)) (list obj)
             (cadr q)       (cdr (cadr q))))
      (car q)))

Seriously, I'd like to know if I can expect to get to the point where the above is just obvious.

-----

3 points by kens 6366 days ago

In case anyone was wondering... this is Arc's enqueue operation, which atomically adds obj to the queue q. Arc's queue data structure provides constant-time enqueue, dequeue, length, and conversion-to-list operations. My point is that documentation is a good thing, since understanding raw code can be difficult for mere mortals such as myself.
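For comparison, here is the same `qne` definition with that one-line description as a leading docstring, plus a comment per step. It assumes the queue layout the code implies: a list of (front-cons last-cons length), which starts out as `(list nil nil 0)`. The docstring and comments are illustrative additions; the behavior is unchanged.

```arc
; The same enqueue as qne above, documented inline.
; Assumed queue layout: (front-cons last-cons length).
(def qne (obj q)
  "Atomically adds obj to the end of queue q; returns the list of queued items."
  (atomic
    (++ (q 2))                 ; bump the stored length
    (if (no (car q))
        ; empty queue: the new cons is both the front and the last cell
        (= (cadr q) (= (car q) (list obj)))
        ; otherwise link a new cons after the last cell, then advance
        ; the last-cell pointer to it
        (= (cdr (cadr q)) (list obj)
           (cadr q)       (cdr (cadr q))))
    (car q)))                  ; the front cons is the whole item list
```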

-----

2 points by almkglor 6367 days ago

>It feels right to me, and you feel that code becomes a 'definition', so I now think I should be trained enough to read "arc.arc" _without_ documentation, as if I was reading math or something

Sure, sure. Then you completely forget wtf 'on is supposed to do, and how it's different from 'in, and have to read it again.

-----

1 point by pau 6367 days ago

Well, if you completely forget about some function, you will have to read something anyway. I guess what you say is that the documentation is obviously more readable than code, but I'm not so sure...

-----

2 points by kennytilton 6366 days ago

It probably varies. Natural language is a mess, and in the hands of geeks...ugh. So if I have an interesting function to understand I would just as soon see the code. But if I am looking at standard library stuff it will be hard to muck up the one very short sentence description. Add to that that we have a new language with a lot of shortcut syntax, that it's a Lisp, and that a lot of non-Lispers are hopefully looking at the language, and I think maybe the ball was dropped when we were told to just read the code, as much as I grok that sentiment. It is something I am much faster to say when we have people asking about application code in a language everyone asking is supposed to know.

-----

1 point by almkglor 6366 days ago

  (mac in (x . choices)
    (w/uniq g
      `(let ,g ,x
         (or ,@(map1 (fn (c) `(is ,g ,c)) choices)))))

  (mac on (var s . body)
    (if (is var 'index)
        (err "Can't use index as first arg to on.")
        (w/uniq gs
          `(let ,gs ,s
             (forlen index ,gs
               (let ,var (,gs index)
                 ,@body))))))

versus:

  (from "arc.arc")
  [mac] (in x . choices)
   Returns true if the first argument is one of the other arguments.
      See also [[some]] [[mem]]

  (from "arc.arc")
  [mac] (on var s . body)
   Loops across the sequence `s', assigning each element to `var',
      and providing the current index in `index'.
      See also [[each]] [[forlen]]

The main point is this: documentation says "what this code should do". Code says: "this is how we do it". In many cases, the user of a function or macro is interested in what the code is supposed to do; how it's done is less important.
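A short usage sketch of the two macros defined above, behaving as the doc entries say. The printed lines assume `prn` displays its arguments one after another, which is how arc.arc's `prn` works.

```arc
(in 'b 'a 'b 'c)    ; t, since 'b is among the choices
(in 'z 'a 'b 'c)    ; nil
(on c "abc"         ; c is bound to each character in turn
  (prn index " " c))
; prints
; 0 a
; 1 b
; 2 c
```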

-----

2 points by cpfr 6366 days ago

almkglor, that documentation is beautiful.

-----

2 points by almkglor 6366 days ago

The full documentation is available on arc-wiki.git. Wanna make a guess on who did most of the documentation?

-----

1 point by pau 6366 days ago

I am really sorry I gave the impression that I was criticizing your work... my apologies.

-----

1 point by almkglor 6366 days ago

Err, sorry, but that's not what I meant. Just tooting my own horn ^^. In any case, the documentation can be improved. I did most of that while I was sick, and I'm not 100% sure it's correct; there may be subtle uses that I haven't documented. Also, the docs are far from complete: a huge bunch of functions still lack decent "see also" links. And I've only completed arc.arc so far; I still haven't had time to do srv.arc, html.arc, app.arc, and prompt.arc.

-----

4 points by nex3 6367 days ago

I understand your points, but I think there's one concern that sort of overrides them: it's very important for users to be able to have access to the documentation without any extra effort. They shouldn't need to load, or God forbid _find_, a documentation file before they can get help.

Now, you could say that we should have a default docfile and move the inline docstrings to that, but I think that in general when you are writing and maintaining documentation, it's much easier to have it right there.

I think adding support for alternative docfiles might be a good idea... feel free to do it in Anarki. But I don't think all documentation should be moved out to them.

-----

2 points by ryantmulligan 6367 days ago

I feel like if we have a canonical web repo for docstrings, people will be able to find and use it. It works for Java's and Ruby's documentation. Most people probably don't care whether the documentation is in the src or not. I don't know how to do it better than putting it in the src, though.

-----

2 points by almkglor 6367 days ago

If you've got arc-wiki and the arc server running, try http://localhost:8080/help

-----

2 points by nzc 6367 days ago

My experience (with emacs, and emacs-lisp, anyway) is that the docstrings are incredibly valuable to me, and that the culture of emacs hackers leads the doc strings to be maintained pretty well.

However, I've never seen anyone use docstrings in any of the Common Lisp programming jobs I've held, which is why I invoke the "culture of emacs hackers" above.

-----

1 point by CatDancer 6363 days ago

Documentation is Good

From some comments it sounds like some people thought I was saying that documentation is unimportant, or even that code shouldn't or doesn't need to be documented. So do let me clarify: I think documentation is wonderful. I am delighted and appreciative of documentation efforts, and I find the documentation that people provide of tremendous value to me.

-----