Thursday, January 14, 2010

Decrement in Reck

One obvious question about Urbit, of course, is whether you can actually build a practical programming language that uses Nock as an intermediate representation.

Does Watt exist? It does not. It is vaporware. However, I know the problem is solvable, because I've actually built several of these languages. The only one I've actually bootstrapped (that is, used to compile itself to Nock) is the ancient Dogo, which barely had a type system to speak of.

Because any Nock language has to build arithmetic from scratch, the equivalent of "hello, world" is always decrement. Here is decrement in Reck, the predecessor to Watt:
=.  a=@
|=
  =.  b=0
  |-
    ?:  (eq a (inc b))
      b
    $(b (inc b))

(This, of course, assumes a kernel which exports gates named eq and inc. But since equivalent operators are built into Nock, these are trivial.)

Reck (like Watt) can be seen as a Lisp dialect, if you can still call something a Lisp dialect when it has (a) a static subject rather than a dynamic environment, and (b) algebraic, higher-order type inference. The syntax is also quite a bit different, as you see. Watt is a lot like Reck, except that all the details are different. (Also, the type system is not as broken.)

If you care to see some more Reck code, you can download a working Reck kernel, through arithmetic and container types, from Google Code. To bootstrap the language, the next stages would be a parser and a Reck-to-Nock compiler. Unfortunately, the Reck type system turned out to need a major overhaul, so...

Wednesday, January 13, 2010

Urbit: functional programming from scratch

Urbit is a static functional namespace: a programming environment specified as a single stateless, frozen function. The Urbit function is Maxwellian: formally defined by a small system of axioms. Urbit, while a cool toy, is designed for practical computing.

If this sounds like it needs some explaining, it does. But first, since Moron Lab is just a blog and not a recipient of Federal funding, I can start with a vaporware disclaimer. Urbit, while all the things I say it is, does not exist. Indeed, it may never exist.

Urbit, which is vaporware, can indeed be described as a "static functional namespace." Or it could if it existed, which it does not. While Urbit is designed to be useful, it is only a design. Some of this design is complete; some is not. It is not in any sense useful at present, because it does not exist. It may be useful in future, or not. I expect the former; I believe in the former; but as Mandy Rice-Davies said, "well, he would, wouldn't he?" I have certainly not demonstrated it, and will not any time soon.

That said: a what? What in the bejesus am I peddling here? What percentage is real, what percentage is vapor? What does it do, why does it do it, and when will it do it? Most important: what problem is it supposed to solve?

Since these are always the questions I ask when introduced to someone else's random vaporware, it seems sensible to ask them of myself. It seems impolite to suddenly start spouting technology. Besides, as we'll see, there is really not a lot of technology here. Context must always come first.

You may also be interested in about comparable efforts for defining software from the bottom up, such as Kay and Piumarta's work at VPRI, or Paul Graham's Arc. This list should probably be longer - if you know of any other active effort that should be on it, please notify. If non-active efforts were included, the list might exceed Blogger's disks. The problem, by definition, is not a new one.

If you follow those links, you'll see that everyone has their own introduction to "software from scratch." So, of course, do I. Mine starts with Mars.

Few of us have been to Mars. But science fiction has filled in many of the gaps. We know that the Martian civilization is immensely old. Its software technology, therefore, must be similarly antique. Obviously nothing like it exists on earth - though those of us who remember VMS might be deceived.

Whether we remember VMS or not, some of us are unhappy with the state of Earth software. Therefore it behooves us to consider an alternative. Since we have not yet established communications with the Martians (who retreated into tunnels as their planet froze), we cannot just FTP their code. We could, however, try to write it.

What is Martian code actually like? There are two possibilities.

One: since Earth code is fifty years old, and Martian code is fifty million years old, Martian code has been evolving into a big ball of mud for a million times longer than Earth software. (And two million times longer than Windows.)

This hypothesis strikes me as possible, but implausible. Since the big ball of mud expands indefinitely, Martian code would therefore be so large and horrible that, despite its underground installed base, the server room bulged into space like a termite mound, intercepting low-flying asteroids and stinking up the solar system all the way to Pluto. Our latest space telescopes would surely have detected this abominable structure - if not, in fact, collided with it.

Two: therefore, at some point in Martian history, some abject fsck of a Martian code-monkey must have said: fsck this entire fscking ball of mud. For lo, its defects cannot be summarized; for they exceed the global supply of bullet points; for numerous as the fishes in the sea, like the fishes in the sea they fsck, making more little fscking fishes. For lo, it is fscked, and a big ball of mud. And there is only one thing to do with it: obliterate the trunk, fire the developers, and hire a whole new fscking army of Martian code-monkeys to rewrite the entire fscking thing.

This is such an obvious and essential response to the big ball of mud pattern that, despite the fact that we know nothing about Mars, we can deduce that it must have happened on Mars. Probably several times. Probably several hundred. For each of these attempts but the last, of course, the result was either (a) abject failure, (b) another big ball of mud, or (c) both.

But the last, by definition, succeeded. This is the crucial inference we can draw about Mars: since the Martians had 50 million years to try, in the end they must have succeeded. The result: Martian code, as we know it today. Not enormous and horrible - tiny and diamond-perfect. Moreover, because it is tiny and diamond-perfect, it is perfectly stable and never changes or decays. It neither is a big ball of mud, nor tends to become one. It has achieved its final, permanent and excellent state.

For instance (and just for instance - this is by no means a guarantee of perfection), Martian code is entirely Maxwellian: computationally defined by a small set of axioms. In other words, it is formally specified - always and everywhere. Nothing else could be perfect.

Of course, Martian code is also Martian. So completely incompatible with Earth code. Note, however, that Mars had to go through this same transition - they had to throw away their old Windows for Mars (tm), and adopt this new, tiny, diamond-perfect kernel.

But can Earth code cooperate productively with this alien intruder? It sure can. All it takes is glue - and Earth code is no stranger to glue. If there's one thing Earth doesn't need, it's a new glue language. There is certainly no difficulty in gluing Mars code to Earth code. From the Earth perspective, Martian software is just another strange, mutually incompatible doohickey. Welcome aboard! Alas, our mudball is already ornamented with many such curios. They stick quite well.

Moreover, Martian code is so perfect that it cannot be contaminated, even in this sticky environment. The general structure of cross-planet computation is that Earth always calls Mars; Mars never calls Earth. The latter would be quite impossible, since Earth code is non-Maxwellian. There is only one way for a Maxwellian computer to run non-Maxwellian code: in a Maxwellian emulator. Any direct invocation is an implementation error by definition. Thus, Mars remains completely pure and Martian, even while glued firmly to Earth.

Unfortunately, Moron Lab is not in contact with Mars and cannot download their tiny, diamond-perfect, 49-million-year-old kernel. Therefore, we have had no alternative but to construct our own. It is not entirely done yet, and even the done parts are probably nowhere near the quality of real Martian software. Getting them there will require an actual collective effort, in the traditional open-source manner. Instructions for participation are at the end of the post.

As with any dangerous new technology, it's fortunate that Urbit is not yet fully armed and operational. It's our plan to introduce Martian computing gradually, to avoid producing serious social unrest. He who controls the spice controls the universe. While Martian code is not in any real position to dominate the universe, it cannot be described as Martian with a straight face, unless clearly designed to dominate the world. After all, it's already dominating Mars!

Any ersatz, Earth-produced Martian code is truly Martian in character only if it is obviously and self-evidently immune to any extension or improvement. Simplicity inflation is a chronic industry disease; Martian software must be far beyond simple. It can only be described as stupid in its majestic triviality. Or, of course, moronic.

Last week, we saw the Martian computational automaton, Nock. (Or, in the original Martian, Nwoglecq.) I posted the spec of Nock - but I didn't say what it was for. Today, we'll see what it's for.

Let's start by taking a closer look at the Martian data structure: the noun. Broadly, this is the Martian equivalent of XML, JSON, or S-expressions. Here is the definition:
A noun is either an atom or a cell. An atom is an unsigned integer of any size. A cell is an ordered pair of any two nouns.
Doesn't this data structure strike you as Martian (or moronic) in quality? Some readers may say: yes, obviously. This is the Martian data structure. Others may not be so sure.

Let us, therefore, defend this structure - the noun. We'll see why it's absolutely minimal and perfect. To validate the exercise, we'll forget Mars for a moment, and explain nouns in mere Earth terms.

First, we note that nouns reflect no concern whatsoever with the fascinating problem of deriving mathematics from propositional logic, set theory, etc. The unsigned integer, or natural number, is boldly and uncritically adopted as an axiomatic feature of the universe - as is the even more speculative ordered pair. Unlike the lambda calculus, Nock represents code as data, not data as code. This is because it is system software, not metamathematics. Its loyalties in this matter are not even slightly divided.

The Earth software that's most like Martian software is, of course, Lisp. (Lisp claims to be based on lambda, but the connection is in fact dubious - a true lambda language is more like Haskell.) Nouns can be described as "S-expressions without the S" - that is, there are no symbols or other base types; atoms are completely typeless. Note that Rivest's IETF draft also makes this change, although he (a) defines his atoms as octet-strings, not unsigned integers; and (b) remains list-centric, not pair-centric.

Of course, an octet-string and an unsigned integer are the same thing. And by general convention and common decency, nouns are pair-centric. In Nock, it is bad form to stick unnecessary terminators on tuples. Or, as a Lisper would say, "improper lists." Other than these unfortunate articles of baggage, though, Rivest's draft has a lot of cool syntax, which could easily be adapted to nouns and probably should be - if just to quell the suspicion, perhaps already pregnant in your mind, that Urbit is really just the world's worst case of NIH syndrome.

Note, however, that there is no textual language definition associated with nouns. There is no one "noun syntax." There will never be. We define the data structure, not the syntax. A remarkable variety of functions map invertibly between an atom (such as a text file) and an arbitrary noun. Except for standardization reasons, there is no reason to privilege any one such function. And as for standardization, Urbit's approach will shortly be clear!

Note the similarity of nouns to the Unix file abstraction. An atom is actually slightly cleaner than a Unix bytestream, because an atom's size in bits is data, not metadata. (From the bytestream perspective, direct atomic encoding ignores trailing null octets.) And of course, because nouns also have cells, they can represent actual data structures directly - without encoding.

Nonetheless, if you look at pre-Unix OS, you see filesystems festooned with an enormous and inordinate variety of built-in file-type mechanisms. One of the greatest achievements of Unix was to take out all typed records and shoot them. Unfortunately, Unix did not provide for type at a higher layer, but it inevitably crept back in as filename extensions and MIME types. Extension and MIME together are not exactly the world's best type system, especially as there is no formal specification of a MIME type. But they work.

So it's not that Unix has no type system. It's that it has a layered type system. A MIME type specifies a set of files - at least, a good MIME specification can be converted, typically with nontrivial human effort, into a boolean predicate function which tells you whether a typeless bytestream is or is not, say, a GIF. Because the generic data model (bytestreams) is not cluttered with quixotic attempts at defining image formats, the description layer remains clean.

Nouns are designed on this principle. For instance, while nouns have no string type, any string is an octet-string, and any octet-string is an unsigned integer, ie, an atom (conventionally LSB first). Because there is no way to tell whether an atom is an integer, a string, or something else, printing an untyped noun is at best a delicate and heuristic process. Of course, so is printing an untyped Unix file.

Adding a string type seems like a very simple improvement to nouns. In fact it would be a disaster. For instance, what character set are the strings in? Unicode? Great. You've just taken a data model defined in three sentences, and expanded it to a shelf-sized manual set. If you have a shelf that strong, why not just start with VMS? Or, of course, you could include some simple, broken, built-in string type, in which case you're making the same mistake that VM/370 made with its record formats: inserting a low-level facility which higher layers will (or should) work around, not with.

Moreover, note that nouns are acyclic. In my own personal opinion, the task of constructing formal descriptions of cyclic data structures is one of the most substantial wastes of human brainpower since Thomas Aquinas was a little boy. Like the Revelation of St. John the Divine, the foundations of mathematics, or the classification of butterflies, this problem appears capable of absorbing arbitrary human intelligence in both quality and quantity.

So why does Earth use cyclic data structures, and why doesn't Mars? This demands a digression on Nock and performance.

Cyclic pointer graphs are, of course, essential to the high performance of low-level programming languages, such as C. There is no C on Mars, but there is something quite like it. It is not difficult to define a low-level language in functional, Maxwellian terms: just model an address space as an atom, and iterative mutations of that address space as processing steps. If you define a C interpreter this way, you have a functional definition of C - which need not be executed by naive iteration. But I digress.

The point is: if your goal is merely to model data, you neither need nor want cyclic graphs nor pointer identity. (Note that while nouns containing replicated subtrees can and should be represented physically as acyclic graphs, there is no Nock operator, like Lisp eq, which reveals pointer identity.) Nor are cyclic structures required to specify any interesting algorithm.

Rather, pointer cycles are useful only for efficient implementations of certain algorithms. Which is not nothing. However, it suggests an approach to optimized noun processing. Specify your entire program in functional, Maxwellian code; if this is not fast enough, implement key functions in C.

These C functions, or jets, will take noun arguments and produce noun results. In between, they can do whatever the heck they want - so long as they produce the same result as the actual Nock formula. They can even make system calls and have side effects, though this is a very bad idea. A jet is not a "native method."

The assumption that any practical Nock interpreter will be optimized for the specific formulas it is expected to execute, jet-propelling all well-known inner loops, allows Nock to discard many programming-language features normally considered essential to efficiency - from cyclic graphs, to built-in arithmetic. Because its only arithmetic operator is increment, decrement in Nock is an O(n) operation. But this is not a practical concern, because a production Nock interpreter will recognize the standard decrement formula and substitute an efficient jet. The approach is quite scalable.

Thus the design narrative for nouns. The result is a design that could have been invented in 1960 - and, in various forms more or less corrupt, has been invented umpteen times since 1960. However, it also clearly benefits from 50 years of programming history, and it seems to satisfy Saint-Exupéry's principle. Martian code may not always be this clean, but it should at least try to be.

Your reaction to nouns may be: but does it need to be this clean and simple? Yes. It does. This is perhaps my most significant objection to "other peoples' languages." They all want to be clean and simple. Everyone does. But their idea of clean and simple: terminating bacterial growth. My idea of clean and simple: terminating atomic motion.

Which is not because, like Sir James Dewar,
Sir James Dewar
Is a better man than you are
None of you asses
Can liquefy gases
I'm a better man than they are. It's because I have a different problem to solve. If mine can be solved at all, which is not at all clear, it can be solved only at cryogenic temperatures.

In case this metaphor is too weird for you, let me make it concrete. Normally, when normal people release normal software, they count by fractions, and they count up. Thus, they can keep extending and revising their systems incrementally. This is generally considered a good thing. It generally is.

In some cases, however, specifications needs to be permanently frozen. This requirement is generally found in the context of standards. Some standards are extensible or versionable, but some are not. ASCII, for instance, is perma-frozen. So is IPv4 (its relationship to IPv6 is little more than nominal - if they were really the same protocol, they'd have the same ethertype). Moreover, many standards render themselves incompatible in practice through excessive enthusiasm for extensibility. They may not be perma-frozen, but they probably should be.

The true, Martian way to perma-freeze a system is what I call Kelvin versioning. In Kelvin versioning, releases count down by integer degrees Kelvin. At absolute zero, the system can no longer be changed. At 1K, one more modification is possible. And so on. For instance, Nock is at 9K. It might change, though it probably won't. Nouns themselves are at 0K - it is impossible to imagine changing anything about those three sentences.

With these basic conceptual tools in place, let's step back and design the Martian computing infrastructure. We have already started to look at it from one perspective: from the axioms up. Let's refocus and take a different angle: from the network in.

Strangely enough, some insight into Martian networking can be gleaned by reading this Wikipedia page, or watching this Van Jacobson talk. (VJ, in case you're not aware, is a god in human form.) When I saw the latter I thought: "that bastard had my idea!" "Bastard" was not actually the word. It was a worse word - illegal, in fact, in many states. But of course, any truly Martian idea is far too simple to be original. Fortunately, VJ did not have my whole idea - just the network part.

I call the Martian network architecture source-independent networking, or SIN. In almost every way, a source-independent network is exactly the same as today's insecure, unreliable packet network. However, there is one difference. To make IP source-independent, remove one 4-byte field: the packet source address.

In SIN, a packet's processing is independent of the sender. Its meaning must be computed from the packet itself. If you're not instantly convinced that this is right and good and true, watch the talk - VJ will tell you all about it. VJ more or less invented TCP, which is fundamentally source-dependent. So I'd kind of suggest you believe him on this one. It takes a true engineer to throw out his own mistakes.

On Mars, SIN is taken to an extreme. Logically, Urbit is a single broadcast network - a single big Ethernet wire. Everyone sees everyone else's packets. You don't send a packet. You "release" it. As a matter of practical optimization, of course, routing is necessary, but it is opaque to the receiver. If there is a routing envelope, it is stripped before processing.

An easy way to see this is to compare an Urbit packet, or card, to an IP packet. An IP packet has a header, which is visible both to the routing infrastructure and the application stack. The header, of course, contains the infamous source address. Urbit removes not just the source address, but also the rest of the header. An Urbit card is all payload - it has no region seen both by receiver and router.

A card is simply a naked atom - a large unsigned integer. Nothing about its internal structure can be assumed. Obviously, since Martian networking is insecure and unreliable, it may consist entirely of garbage. Anyone could have sent us this garbage. Moreover, whether a card is garbage or not, we can discard it for any reason at any level.

Of course, a card is not at all opaque or without structure. It would be a strange routing infrastructure indeed in which the routers never examined their payloads. But logically, a card is simply a naked PDU (as the old WAP specs used to put it - if you hear an individual using this term, while he may be a fine individual personally, in professional life he is the only thing worse than a Bellhead: a Euro-Bellhead; flight is indicated) in the Urbit protocol. Ie: an Urbit message, which can be delivered to any Urbit host, at any time, for any reason.

In Urbit, the function is the protocol. Which is also the application. On Mars, there is only one application: Urbit. There is one operating system: Urbit. There is one network protocol: Urbit. One ring to bind them all, and in the darkness hold them!

On Mars, Urbit is TCP, Urbit is HTTP, Urbit is Linux, Urbit is Java, Urbit is Tomcat, Urbit is Mozilla. Urbit is the entire application stack above the network layer, on both server and client sides. There is no pre-Maxwellian application logic presently executing on Mars, although a few of the darkest, most ancient histories allude vaguely to such abuses - just as in the oldest books of Judaism, we see hints of human sacrifice. (Urbit is actually dedicated to the Roman bull-god, Mithras.)

Below Urbit, as below any standard, is whatever it wants to be. On Earth: the classic American operating system. Presumably on Mars there is some classic Martian equivalent. Just as almost anything can run a Web browser, almost anything can compute Urbit.

Back to the network-up design. We have seen the network. Up to the computer.

The formal state of an Urbit computer is its deck: the list of cards it has received, newest first. Clearly, there is only one network on Mars. Just as clearly, there is no pre-network legacy I/O. Everything the computer knows, it knows because it received a card that told it so. Even time can (and should) be delivered by card.

Now, a small reality check. In practice, is it always practical for a computer to store all the packets it has ever received? In some cases, this is exactly what it should do. Disk space is much cheaper than network bandwidth. The ratio is only changing in the former's favor. In other cases, of course, storing the entire deck would be preposterous. Broadly, most servers should; most clients should not.

It is always practical, however, to define the computer's semantics as a function of its deck. Ideally, these semantics should be crafted so that computers not needing the whole deck can devise some hack to avoid storing it. In general, one of the great benefits of tightly-specified systems is that they can be safely hacked to within an inch of their lives.

But what is the function? Ah. Well, this is where the vaporware comes in. I don't have the Urbit function, you see. I have a description of the Urbit function - an interface. Eventually, I expect to implement this description. The result will receive a Kelvin number - probably about room temperature. When Urbit reaches 0K, it is done and frozen forever. This will be as soon as possible, and no sooner.

What does Urbit do? Urbit defines the functional semantics of a computer's deck. The deck is the computer's state; to compute, we must query it. This suggests an obvious interface, which I render in pseudo-C:
noun
Urbit(noun path, atom_list deck)
{
  return file;
}
In other words, the Urbit function is a functional namespace. This pure, stateless function takes two arguments, a name or path, which is an arbitrary noun; and a packet log or deck, which is a list of atoms. It returns a value or file, also an arbitrary noun. I trust that we are not far into rocket-science territory here.

The function can fail, of course. Like its computational automaton, Nock, Urbit is crash-only. All errors are modeled as nontermination - ie, an infinite loop. Of course, your actual implementation need not spin forever upon mandatory nontermination. It is just a way of saying that all computation ends, and no result is produced.

(This is generally referred to as an exit. An exit is like an exception, except without an exception object. If a Nock computation does not exit, but succeeds, it is said to pass. If it is aborted externally for failure to compute, it is said to tank. Thus a mandatory or formally detected infinite loop exits; a heuristically suspected infinite loop, or any other computation judged impractical, tanks.)

When Urbit passes a file for some path and deck, the path is said to be bound in that deck. When Urbit exits, the path is unbound. The path-deck-file triple is the binding.

Urbit is not just a functional namespace, however. It is a static functional namespace. That is: its definition ensures that for every binding
Urbit(path deck) == file
we also know
Urbit(path [card deck]) == file
In other words, once a binding is determined, new cards cannot revise it. Once Urbit learns something, it cannot un-learn it. A functional namespace obeying this property also could be described as monotonic, but I prefer the simpler static. Static namespaces, for instance, are common in revision-control systems - for obvious reasons.

I hope you'll agree that this problem definition is not rocket science. Surprisingly, however, the legitimate FP community (by which I mean the Haskell community) is still searching for it.

As Conal Elliott writes:
This perspective is what’s behind my confidence that we will someday really learn to look at whole systems in a consistently functional/denotational [style] (simple and precise semantics).
Read the whole thing. Haskell fans, you'll know exactly where you can stick your monads. Indeed, Elliott on monadic I/O sounds a lot like VJ on connection-oriented networking. This is not, I believe, a coincidence. If Urbit is like anything, it's like Elliot's FRP - though with no monads, gonads, morphinisms, monoids, combiguators, or other implements of metamathematical torture.

Now, let's step back a moment and look at the most important aspect of the Urbit function. The most important aspect of the Urbit function is that it is frozen. It is nothing special to write a function which maps path and deck to file. It is something else to freeze one. Everyone on Mars uses the same Urbit, and has for 49 million years.

It is also something else to make that function universal. Mars has no computational infrastructure other than Urbit. Code on Mars is of two types: code within Urbit, and implementations of Urbit. In practice, this means that Urbit has to be able to do anything any application can do, and do it well.

So we have the Urbit problem: a frozen, universal, static functional namespace. In other words: a standard static functional namespace. Perhaps we are getting further into the rocket-science department. It is not hard to write a functional namespace. But how can we standardize a functional namespace? You will search the RFC Editor's shelves in vain for anything even remotely like Urbit.

In general, IETF RFCs and things like them (IETF is at the top of a tall food-chain of Internet standards organizations, descending in quality) do not standardize functions. When they have to standardize functions, they do not use any existing programming language. They use pseudocode - in other words, informal English. This is not out of stubbornness or stupidity - it is exactly the right thing to do.

The problem is a mismatch in objectives between the people trying to standardize languages, and the people trying to standardize protocols. There is simply no Turing-complete function definition format that is small and stable enough to reference as a dependency in an RFC. If you're writing an RFC, here is the type of external standard you want to depend on: ASCII. Even XML is a bit dodgy. But even a "simple" practical language like Scheme is defined by a 50-page manual. Show me a 2-page spec that depends on a 50-page spec, and I'll show you a 52-page spec.

In an RFC, pseudocode or English is better than Scheme, because Scheme would give a false impression of precision. Scheme is defined by an old-school language-standardization process whose goal is to terminate bacterial growth. Internet standardization means terminating atomic motion. Scheme seems pretty cool to Schemers, but it's nowhere near chilly enough for Urbit.

Urbit, of course, is defined by a Nock formula:
Urbit(path deck) == Nock([path deck] urbit-formula)
Since Nock is specified in less than 200 words of English, so is Urbit. The formal precision of Urbit is the formal precision of Nock. The formula does not add to this total - it is code, not English. And Nock, I think, is about as precise as a practical automaton can get.

Of course, this urbit-formula noun is not at all a pretty thing - a large (though not vast - certainly measured in kilobytes, not megabytes) tree of 0's and 2's and 3's. The definition of Urbit is certainly precise, because Urbit is defined in Nock, and Nock is precise. However, since it is not easy to write a big tree of small numbers, surely it is quite impossible for everyone to agree on a big tree of small numbers. Standardization requires extreme clarity, whereas Nock formulas are extremely obscure.

Therefore, we need another deliverable: a human-usable language, Watt. Watt is defined as follows:
Watt(source) == Nock(source watt-formula)
Thus, for instance,
urbit-formula == Watt(urbit-source) 
              == Nock(urbit-source watt-formula)
watt-formula == Watt(watt-source) 
             == Nock(watt-source watt-formula)
In English, Watt is a self-compiling compiler that translates a source file to a Nock formula. Watt is formally defined by the combination of watt-source and watt-formula. The latter verifies the former. Again, we are not in rocket-science territory here.

To bootstrap Watt, we write Watt in C, then use that implementation to write Watt in Watt, then ensure the two are equivalent. Thus, we never have to write anything in raw Nock - and we never do. (Except as an exercise, of course.)

Thus, to agree on Urbit, here is what we need to agree on: the Nock spec, the Watt source, and the Urbit source. The Nock spec is pseudocode, but very short and already very cold. Watt and Urbit are Maxwellian source, both readable and well-defined. (Watt, of course, must be not just a usable language, but an excellent one.) They should also be small. Definitely under 5000 lines of Watt - ideally, under 2000. Anything much larger would be almost impossible to get to zero Kelvin. As it is, the task will be quite difficult.

Should this be submitted to an actual standards organization, such as IETF? Maybe, but maybe not. As a Marxist might say, the historical mission of the standards forum may be reaching its end. An even more challenging test for a proposed standard, such as Urbit, is that it be suited to standardization without a formal process. If it is hard to standardize a system through IETF, it is even harder to standardize it without IETF. Obviously, any challenge that forces Urbit to be more Martian is good for Urbit.

This poses an obvious question. Is this really possible? How in the world can a 5000-line program, in any language, constitute an entire planetary software infrastructure? We may be able to freeze Urbit - but can we make it universal?

If Urbit has any hope of universality, it has to do something else which is traditionally Not Done (tm). It has to run code off the network. To run code off the network is to extract a source tree, or other application representation, from the deck, and execute it within the Urbit function. Yikes!

It is easy to see that execution of remote code is an absolute requirement for any source-independent network. Since the network does not have a source address, it must validate packets through cryptography. Since no encryption algorithm is proven secure, any serious functional namespace must be able to upgrade its crypto algorithms. To validate packets secured by the new algorithms, it must run code that was delivered as data - secured by the old algorithms.

Again, we see why a new language is needed. The reason, as always, is a new problem. The only family of functional languages designed to execute code as data is the Lisp family. For instance, to write Urbit in Haskell, you would need a Haskell interpreter written in Haskell. It could be done, but I don't think it has been done; and Haskell is certainly not designed to do it well. Therefore, the revolution will not be Haskellized.

Most Lisps can compile and evaluate remote code. At least, if your Lisp can't, it is not much of a Lisp! However, I know of no Lisps designed to solve this problem. I also know of no Lisps which are anywhere near as frozen as Nock. Hence: a new language.

In case you're still wondering whether Urbit can actually serve as a frozen, universal programming environment, let's pull the camera back and examine this monster in the large.

Any Urbit computer may be described in three ways: by what it hears, what it knows, and what it does. By analogy to another famous TLA, we can call this the HKD model.

What the computer hears is the list of cards it has received - its deck. One computer, one deck. Not even the computer's identity is defined outside the deck. Its identity is a function of what it has heard, which is the deck. What makes a deck, say, Igor's deck? A deck is Igor's deck if it contains Igor's secrets. This enables it to decode Igor's email, display Igor's photographs, talk to Igor's bank, etc. Generally, it enables Urbit to compute Igor's applications.

What the computer knows is the Urbit namespace: the meaning of the deck. For instance, suppose your computer wants to display the front page of the New York Times. "The front page of the New York Times" is not a phrase that can be mapped to a static path. "The front page of the New York Times at 6:15:01 PM, January 12, 2010" is. Similarly, "the image on Igor's monitor" is not a static path; "the image on Igor's monitor at 6:15:01 PM, frame 12, pixel (653, 24)" is.

If Igor is reading the Times at that time, Urbit must know that Igor's monitor is showing an image that is a rendering of the Times front page as of that time. Of course, this requires Igor's deck to actually contain an up-to-date copy of the New York Times.

Which presumably requires Igor's computer to have contacted that of the Times, etc, etc. How does Urbit send cards? It only appears to receive them. Is it just half the protocol definition?

This gets us to the third part of the HKD model, which is what the computer does. Which is what? What does it do? Oddly enough, we haven't said anything about this at all.

Answer: what it does is its own damned business! Urbit is no more than a recipe for how to interpret cards. A computer must act; it is useless unless it causes some side effect, whether by actual, physical output, or just by releasing more cards. No standard can constrain its actions. All Urbit can do is to define the knowledge it needs to act; all it can hope is that this knowledge is computed by the Urbit namespace, not ad hoc in the traditional manner - or worse, by a competing functional namespace.

So, for instance, Igor is reading the New York Times. To display the Times, his computer must contain the Times. To get the Times, it must contact the Times. To contact the Times, it must release and route cards. In other words, Igor's browser application must know that, at some time T, Igor does not have the Times; Igor wants the Times; and Igor's browser, to satisfy this request, proposes to release certain cards, with envelopes that should route them to the Times. All this, Urbit can know.

Urbit can specify all this knowledge - or, to be more precise, compute it from Igor's deck, which presumably includes his keystrokes and mouse movements. But Urbit cannot do. Urbit cannot force an Urbit implementation to actually calculate Igor's browser requests, and route the resulting cards to the Times. It can define the calculation; it cannot cause it to happen, or cause the results to be used.

And that's all for this week. However big your brain is, I suspect it is probably full.

Next Thursday, we'll dig a little deeper into Urbit. Which is vaporware - but not as vaporous as this post might make it seem. In particular, although there are many enormous problems, both practical and theoretical, in making this architecture work, the most significant impediment is the problem of allocating the namespace. After all, since everyone on Mars is running the same function, they all have to share the same static namespace. One can imagine some conflicts.

Any good namespace allocation must solve the unsolvable problem known as Zooko's triangle. The problem is, of course, unsolvable. Therefore, Urbit does not and cannot solve it. However, in the classic OS style, we can score 100% on one vertex and 75% on the other two. This quasi-solution is a design I call digital feudalism - of which more later.

If you're interested in joining the Urbit conspiracy, there's a simple step you can take. Take the Nock challenge - that is, write a Nock interpreter, then a Nock decrement formula, and mail them to me. (My Google account is cgyarvin.) While this task is of no use whatsoever, and (as previously noted) decrement is the only formula you will ever write in raw Nock, it makes an excellent pons asinorum. If you succeed, you know your Nock.

In my original post I said that this was "not a difficult problem." This may have been an overstatement. With 16 (unspecified) prizes to distribute, I've received only two (unsolicited) solutions. One of these lucky winners is a math major at Caltech; the other is a math Ph.D in Budapest. So perhaps it is slightly difficult.

But still, you can do it! And now that you know what Nock is for, perhaps you will. He who controls the spice controls the universe...

Saturday, January 9, 2010

Maxwell's equations of software (Nock)

This post, although by me and not by him, was generously hosted by my good friend Mencius Moldbug.

I love Mencius to death, but let's face it - he's a strange guy. And his blog is even stranger. Frankly, I'd stay off it.

So here's a copy of the original UR post - the Nock spec, plus some mysterious non-clarification:
1 Context

    This spec defines one function, Nock. 

2 Structures

    A noun is an atom or a cell.  
    An atom is any unsigned integer.  
    A cell is an ordered pair of any two nouns.

3 Pseudocode

    Brackets enclose cells.  [a b c] is [a [b c]].

    *a is Nock(a).  Reductions match top-down.

4 Reductions

    ?[a b]           => 0
    ?a               => 1

    ^[a b]           => ^[a b]
    ^a               => (a + 1)

    =[a a]           => 0
    =[a b]           => 1
    =a               => =a

    /[1 a]           => a
    /[2 a b]         => a
    /[3 a b]         => b
    /[(a + a) b]     => /[2 /[a b]]
    /[(a + a + 1) b] => /[3 /[a b]]
    /a               => /a

    *[a 0 b]         => /[b a]
    *[a 1 b]         => b
    *[a 2 b c d]     => *[a 3 [0 1] 3 [1 c d] 
                            [1 0] 3 [1 2 3] [1 0] 5 5 b]
    *[a 3 b]         => **[a b]
    *[a 4 b]         => ?*[a b]
    *[a 5 b]         => ^*[a b]
    *[a 6 b]         => =*[a b]
    *[a [b c] d]     => [*[a b c] *[a d]]
    *a               => *a


For context and comparison, see this or this or this. Here at UR, axioms are axioms - we leave no "niceties such as arithmetic" to the programmer's imagination. Just sayin'.

To grok Nock, construct a formula f such that *[s f] equals (s - 1) for any atomic nonzero subject s. The equivalent formula for (s + 1) is [5 0 1]. The first 16 people who mail me a correct answer may (or may not) win a prize, which may (or may not) be valuable. (Please include your interpreter - in any language, it should fit on a page. Please do not post the answer here or anywhere, and let me know if you see it posted. This is not a difficult problem.)

As you'll quickly see if you try this exercise, raw Nock is not a usable programming language. But nor is it an esoteric language or automaton, like SK combinators. Rather, Nock is a tool for defining higher-level languages - comparable to the lambda calculus, but meant as foundational system software rather than foundational metamathematics.

To define a language with Nock, construct two nouns, q and r, such that *[q r] equals r, and *[s *[p r]] is a useful functional language. In this description, p is the function source; q is your language definition, as source; r is your language definition, as data; s is the input data. You will find this a tractable and entertaining problem.

Moron Lab: author, goals and guidelines

Moron Lab (the name is a pun) presents my independent (but stupid) computer-science research.

Actually, like all sound engineers, I despise this term. I am a computer programmer. Nonetheless, if you absolutely have to read an authentic CS paper by me, you can go here. Google claims 52 citations for this thing, which I must say is not bad for a 19-year-old. Unfortunately, I am no longer a 19-year-old.

The rest of my dubious non-career can be condensed, with some loss of truth, to one word: WAP. If you ever tried and failed to use the broken "microbrowser" in your old cell phone, I deeply apologize for this entire fiasco. Not that I was responsible for your suffering - but I may have profited from it. Carriers still get considerable ARPU off accidental activations of the WAP browser whose kernel I wrote.  In fact, I think Virgin Mobile took about $10 off me this way, and is liable to get more. Alas, I am no longer connected to this revenue stream.

Due to my great personal desire to atone for these and other abortions, all Moron Lab content is entirely unauthorized, unsupervised, unsupported, uncopyrighted and unpatented. Come and get it!

So long as you don't pretend my work is yours, you can do whatever you like with it. (If my work actually is yours, it's probably because you had the same idea, but you had it first. This is common with good ideas! Send me a link and I'll post it.)

To be exact: all code and documents posted at Moron Lab are in the public domain. All code, algorithms, and designs described are mathematical formulas - nowhere patentable, at least not legally. While so for all code, this is even more so for Maxwellian code.

(However, I assume no liability for any bandit-raids some patent brigand may make on your purse. Try to travel without purse; if purse you have, guard it. Odin in the Havamal says:
A wayfarer should not walk unarmed,
But have his weapons to hand:
He knows not when he may need a spear,
Or what menace meet on the road.

For "spear," read "lawyer." In the early 21st century, Odin's advice is your only prudent approach to patent "law." The spear industry is certainly not about to go away.)

And yes, C. Guy Yarvin is my real name, although in real life I go by the C. This adjustment is an eggshell between my real life and my virtual life. It is not secure, but it can't hurt. So even if you know me as C. in person, I'll have to ask you to call me Guy in bits.

Intelligent comments at ML are welcome. Your comments will be read by me, and to the best of my ability responded to. However, please ensure they are directed to other readers, rather than to me - or, at least, that they will inform or entertain other readers. If you have something to say just to me, please send it to my Gmail account, which is cgyarvin.