Arc Forum | My new project: programming in x86 assembly with lots of tests

	My new project: programming in x86 assembly with lots of tests (github.com)
	5 points by akkartik 2632 days ago | 15 comments

3 points by i4cu 2632 days ago | link

I'm going to ask some seriously basic questions here:

1. So, Mu is a general programming language - right? (in reading the docs I almost wondered if it was a testing language only).

2. It is built on top of Arc - right? (It looks as though it's built with c code, but I see arc in there too so I have to wonder where it fits in).

3. Is the idea that you build your tests for each function inside the actual function? (which the examples seem to illustrate). If that's the case wouldn't the code become really large and hard to navigate? Have you written any substantial programs in it to see how that might look?

4. How does SubX relate to Mu?

I'm probably missing the depth being presented here, but I'm a high-level language guy so I have to start with basic questions :)

-----

4 points by akkartik 2632 days ago | link

Thanks for the questions! Part of the problem is that the repository isn't for a single 'language'. It's for my experiments with a new way to program that makes codebases easier to understand. As the biggest stress test for the new way, I've been trying to create a whole stack that is easy to understand. But it's just a prototype, or rather a series of prototypes. I've tried to make each prototype self-contained, with a Readme and instructions for running it that continue to work to this day.

Prototype 1 was a statement-oriented Assembly-like language built in Arc. You can still see it at https://github.com/akkartik/mu/tree/master/arc. I worked on it until about 2015. It's 3kLoC of code and 7kLoC of tests. The most substantial program I built in it was a hacky Lisp repl with syntax highlighting: https://github.com/akkartik/mu/blob/master/arc/color-repl.mu

Prototype 2 was a similar (but not compatible) language built in C++. It had the most work on it, and I also used it for teaching for a couple of years. I stopped working on it sometime this year. It's still available on the top level at https://github.com/akkartik/mu. It's 23kLoC of C++ code, and 5kLoC of Mu libraries. The most substantial program built in it was a 2-pane programming environment I used for teaching programming: https://github.com/akkartik/mu/tree/master/edit#readme. It's 12k lines of Mu code, about half of which is tests. You can see its sources colorized to be easier to read at the bottom of http://akkartik.github.io/mu.

Now I'm at prototype 3, SubX. It's in a very preliminary state. As above it is not intended to be compatible with existing prototypes. The previous prototypes were kinda-sorta designed to be easy to translate to native code, but they were still simple-minded tree-walking interpreters. I had hazy plans of gradually compiling them to native code from the top down, but that turns out to be beyond my ability. SubX starts from the bottom up, building an almost trivial syntax on top of raw native x86 machine code, and I'm gradually learning how to implement a compiler in it. Most people would say it's too hard to build a compiler in assembly these days when we have so many high level languages available. I'd like to see if it's manageable with the right framework for writing automated tests. So SubX starts out not with improvements to syntax, but to error checking and automated testing.

The most substantial program I've built in SubX so far is a port of the very initial version of Crenshaw's "Let's build a compiler" series[0]. All it does is read a number and emit native code to return that number in the exit status: http://akkartik.github.io/mu/html/subx/apps/crenshaw2-1.subx.... So SubX is still in a very early state.

SubX tests aren't inside the functions they test, they're just interleaved in the same file. Any label that doesn't start with '$' is the start of a new function. Functions that start with 'test-' are tests, and they all run when you run with a 'test' argument on the commandline.
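To make the naming convention concrete, here is a minimal sketch in Python (not SubX, and not how SubX itself does it). The sample listing and the `name:` label syntax are assumptions made for illustration; the only rules taken from the description above are that labels starting with '$' don't begin a function and that functions starting with 'test-' are tests.

```python
# Hypothetical SubX-like listing; the `name:` label syntax is an assumption.
SOURCE = """\
main:
  # ... instructions ...
$main:end:
  # ... instructions ...
add-one:
  # ... instructions ...
test-add-one:
  # ... instructions ...
$test-add-one:check:
  # ... instructions ...
test-add-one-to-zero:
  # ... instructions ...
"""

def find_functions(source):
    """Return every label that starts a function, i.e. does not begin with '$'."""
    names = []
    for line in source.splitlines():
        text = line.split("#")[0].strip()  # drop comments and indentation
        if text.endswith(":"):
            name = text[:-1]
            if not name.startswith("$"):   # '$' labels are internal jump targets
                names.append(name)
    return names

def find_tests(source):
    """Return the functions that count as tests: those named 'test-...'."""
    return [name for name in find_functions(source) if name.startswith("test-")]

print(find_functions(SOURCE))
print(find_tests(SOURCE))
```

In this sketch, running with a 'test' argument would amount to calling everything `find_tests` returns, in file order.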

The hope is to one day build a robust, hackable stack culminating in a high-level Lisp atop this infrastructure, a stack that leans into the Lisp tendency to fragment dialects by encouraging people to create incompatible forks -- while also making it easy (but not automatic!) to share code between the incompatible forks. It would still be some amount of work to copy the tests over and then make them pass, copying over bits of code at a time and modifying it as necessary. That's the sort of workflow I want to encourage rather than blindly upgrading software by running a package manager command. But before I can recommend it to others I have to see if I can get it to work. This repo is a test bed for eventually building tools to help people collaborate across incompatible forks.

Thanks for asking these questions! They're very helpful in understanding how others see the mess my repo has turned into. I'm going to try cleaning it up.

[0] https://compilers.iecc.com/crenshaw

-----

3 points by i4cu 2631 days ago | link

Ah... OK.

See, I always move up to the top-level directory and work my way down. So as I did this, my reference point for understanding got completely mixed up.

I'd make a top-level dir called 'prototypes' with details like [1] you provided in this comment and then branch down.

* Also, I'd probably limit referencing Mu, specifically, in SubX. I don't think it's helpful. Just remove this line:

  "We'll gradually port ideas for other syscalls from the old Mu VM in the parent directory."

It doesn't add much value and takes people's attention away as they start trying to understand the relationship. If you need to, just add notes at the bottom.

[1] "The hope is to one day build a robust, hackable stack culminating in a high-level Lisp atop this infrastructure, a stack that leans into the Lisp tendency to fragment dialects by encouraging people to create incompatible forks -- while also making it easy (but not automatic!) to share code between the incompatible forks."

Re: [1]

Yeah, if you can build a language that allows someone to build arc and even a clojure version of arc using the same base code, that would be cool. I'd immediately try spinning up a new version of arc with clojure's tables and table functions :)

-----

3 points by akkartik 2631 days ago | link

> I'd make a top-level dir called 'prototypes'... and then branch down.

Yeah, I've been planning a reorganization like that. Unfortunately the reorg is going to break all the links I shared before I thought of this :/ So I'm going to wait a bit before I make the switch.

> if you can build a language that allows someone to build arc and even a clojure version of arc using the same base code, that would be cool.

I'm not aiming quite there. That would be really hard to do, and then it would be impossible to keep in sync with existing language upstreams over time, given the underlying platform will be very different. The way I imagine providing something similar is this: there would be multiple forks of the Mu stack for providing a Clojure-like or Arc-like high-level language. But these languages wouldn't be drop-in replacements for real Arc or real Clojure. Also, each fork would try to minimize the number of languages it relies on to do its work, because every new language used in a codebase multiplies the comprehension load for readers. So you wouldn't immediately be able to run both Clojure and Arc on a single stack. (I ranted about this before at https://lobste.rs/s/mdmcdi/little_languages_by_jon_bentley_1...)

Basically, I want to commoditize the equivalent of a Lisp Machine for any language. My goal is to help people collaborate across incompatible forks, _but_ the forks have to all have certain characteristics that no existing software has (because it all assumes rigid compatibility requirements).

-----

3 points by i4cu 2631 days ago | link

> Basically, I want to commoditize the equivalent of a Lisp Machine for any language. My goal is to help people collaborate across incompatible forks...

I'm struggling to understand, so forgive me, but the reasons why and what you're doing seem to change, or at least are manifold and thus hard to unpack. Or maybe the different prototypes are messing me up.

So let me see if I can unpack it (at least for myself :)

Your goals are:

1. To build prototype 'x' compiler/language that's more robust and easier to maintain because it has been built with testing capabilities in mind (I'm imagining a model with convenience features or set requirements to accomplish this).

2. Build prototype 'x' to permit developers to build their own compiler/language(s) that inherit the benefits from prototype 'x', thus making that process more enjoyable and more likely to succeed.

3. Permit/Encourage greater collaboration amongst developers, on separate prototype 'x' projects, because prototype 'x' is robust and developers are working under a shared model that has core concepts/features that act as a bridge for that collaboration to happen.

Does this seem right?

So is this a project that you're doing because you're passionate about it and you think it can change things (i.e. improve the lives of other people)? Or is this a product idea where you have assessed there's a need and you're going to fill it?

-----

4 points by akkartik 2631 days ago | link

Thanks for the probing questions! Yes, I don't mean to move the goalposts on my reasons. I feel like I'm reaching for something fundamental that could end up having lots of different benefits, mostly things I can't anticipate.

I don't actually care that much what the high level language is at the top. I'm biased toward Lisp :) so that's what I'm going to build towards. But if others want a different language I want to make it easy to switch the superficial syntax to suit their taste -- and convert all existing code on the stack so everything is consistent and easy to read. If some others want to add new runtime features, I want to make that easy too. Finally, I want it to be tractable to mix and match syntax and runtime features from different people and still end up with something consistent. Using tests at the bottom-most layers, and building more rigorous type systems and formalisms as necessary higher up.

The key that would make this (and much else) possible is making the global structure of the codebase easier to comprehend so that others can take it in new directions and add expertise I won't ever gain by myself, in a way that I and others can learn from.

Ignore the other prototypes in this repo; they're just details. The goal I'm working toward is a single coherent stack that is easy for others to comprehend and modify.

This isn't a product, in the sense that I can't/won't charge money for it. I'm not really making something others want right now. I'm trying to make something I think the world needs, and I'm trying to make the case for something the world hasn't considered to be a good idea yet. I'm sure I don't have all the details nailed down yet :)

-----

3 points by i4cu 2631 days ago | link

> I'm trying to make something I think the world needs, and I'm trying to make the case for something the world hasn't considered to be a good idea yet. I'm sure I don't have all the details nailed down yet :)

"To boldly go where no man has gone before"

"Second star on the Left"

I can get behind that :)

-----

3 points by hjek 2632 days ago | link

I'm somewhat disoriented by the very first example[0]. How is a newcomer meant to grok how this program prints out 42?

    bb/copy-to-EBX  2a/imm32
    # exit(EBX)
    b8/copy-to-EAX  1/imm32
    cd/syscall 0x80/imm8

This is not meant to criticize, but just as feedback from a real assembly newcomer, since this is in your project description:

> It would make it easier to write programs that can be easily understood by newcomers.

[0]: https://github.com/akkartik/mu/blob/master/subx/examples/ex1...
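For readers stuck at the same spot: the 42 is there, written in hexadecimal as `2a`. The program doesn't print anything; it puts 0x2a in EBX and makes the Linux `exit` system call (number 1 in EAX, triggered by `int 0x80`), so 42 comes back as the exit status. The Python sketch below turns the three instruction lines into raw bytes. It assumes that every word is a hex number followed by '/' and a descriptive tag, and that `imm32` means a 4-byte little-endian value; `encode_word` and `assemble` are hypothetical helpers for illustration, not part of SubX.

```python
import struct

# The three instruction lines from the example, comment included.
SOURCE = """
bb/copy-to-EBX  2a/imm32
# exit(EBX)
b8/copy-to-EAX  1/imm32
cd/syscall 0x80/imm8
"""

def encode_word(word):
    """Turn one 'value/tag' word into bytes (illustrative helper, not SubX's own)."""
    value, _, tag = word.partition("/")
    number = int(value, 16)               # base 16 accepts both '2a' and '0x80'
    if tag == "imm32":
        return struct.pack("<I", number)  # 32-bit immediate, little-endian
    return bytes([number])                # opcodes and imm8 are single bytes

def assemble(source):
    """Concatenate the bytes for every word, skipping '#' comments."""
    out = b""
    for line in source.splitlines():
        for word in line.split("#")[0].split():
            out += encode_word(word)
    return out

code = assemble(SOURCE)
print(code.hex(" "))    # bb 2a 00 00 00 b8 01 00 00 00 cd 80
print(int("2a", 16))    # 42
```

The part after each '/' only documents what the byte means; under this reading, the bytes before the slashes are the whole program.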

-----

2 points by akkartik 2632 days ago | link

Absolutely valid criticism. I'd love to hear more about what you did between seeing the link here and navigating to that example program. I've been trying to build a path to gradually take programmers to an understanding of (this particular unconventional style of) assembly programming. For example, I'm curious how much of the Readme you read, and if you happened to notice that the Readme has an orientation on the x86 processor.

I also made a couple of tweaks to this particular example. I hadn't looked at it in a while. Thank you! https://github.com/akkartik/mu/commit/d6535f3382

-----

2 points by hjek 2632 days ago | link

I read the first section of the readme without really understanding much but also not expecting to as I don't know x86 assembly. Then I decided to at least give the examples a superficial look as I'd noticed the word newcomers in the readme. But I couldn't see the number `42` in a program meant to print `42`, so that's where I gave up.

Is this meant to be a tutorial for assembly noobs?

For comparison I think your readme for Wart[0] is more welcoming: Briefly explaining what it is and how to run it, and then straight onto a simple example that people can actually try out.

[0]: https://github.com/akkartik/wart

-----

2 points by akkartik 2632 days ago | link

That's a good point. I have similar instructions here, but they're in the second section, 3 screens down.

The audience is assembly-curious programmers, but you aren't expected to know any assembly. I just want to try to hook anyone interested in the goal. If you're interested in a stack you can understand from the ground up, I'm willing to try to explain things to you.

-----

2 points by hjek 2632 days ago | link

I might just be outside the target audience, for now. I don't even know any C. Realistically, I think it would have to be spoon-fed to me in some Bret Victor-esque crocodiles and eggs[0] manner for me to not lose focus.

I went through this absolutely fantastic SQL tutorial this week. Perhaps you might find their list of pedagogical principles[1] useful?

I think one thing that potentially could tempt me into low-level code would be making cool tunes[2][3].

[0]: http://worrydream.com/AlligatorEggs/

[1]: https://selectstarsql.com/frontmatter.html

[2]: https://www.youtube.com/watch?v=GtQdIYUtAHg

[3]: https://www.youtube.com/watch?v=qlrs2Vorw2Y

-----

2 points by akkartik 2632 days ago | link

I just made some tweaks to the Readme. What do you think?

https://github.com/akkartik/mu/commit/4650c8188f

https://github.com/akkartik/mu/blob/master/subx/Readme.md

-----

2 points by hjek 2631 days ago | link

Yea, I think that's more inviting.

I got to the point of compiling and trying out your programs now. `ex8`, `ex9` and `ex10` all segfault here.

Some time ago I was at a wedding, and I was terribly bored, until I found out that the guy to my left was writing washing machine software in assembly. In a way it seems awfully primitive, e.g. your `ex11.subx` is 350 lines long and prints out `.......` but I guess in certain systems it's the only option, and what's underneath it all in any system.

-----

2 points by akkartik 2631 days ago | link

Thanks for trying them out!

Those examples expect arguments at the commandline, and I chose not to perform error checking for an example :) The focus lay elsewhere for them. See the comment at the top for each.

ex11.subx is running a test for each of those dots :)
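The dot-per-test output is a common convention in test runners. A minimal Python sketch of the idea follows; it is not how ex11.subx implements it, and the `F` marker for a failing test is an assumption added for illustration.

```python
def run_tests(tests):
    """Run each zero-argument test and return one character per test."""
    marks = []
    for test in tests:
        try:
            test()
            marks.append(".")   # test passed
        except AssertionError:
            marks.append("F")   # test failed (assumed marker)
    return "".join(marks)

def test_addition():
    assert 1 + 1 == 2

def test_hex_literal():
    assert int("2a", 16) == 42

# Seven passing tests produce seven dots, like the output described above.
print(run_tests([test_addition, test_hex_literal] * 3 + [test_addition]))
```

Seen this way, a 350-line file that prints only a row of dots is mostly test code reporting that each check passed.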

-----