Urbit is a
static functional namespace: a programming environment specified as a single stateless, frozen function. The Urbit function is
Maxwellian: formally defined by a small system of axioms. And while undeniably a cool toy, Urbit is designed for practical computing.
If this sounds like it needs some explaining, it does. But first, since Moron Lab is just a blog and not a recipient of Federal funding, I can start with a
vaporware disclaimer. Urbit, while all the things I say it is,
does not exist. Indeed, it may
never exist.
Urbit, which is
vaporware, can indeed be described as a "static functional namespace." Or it could if it existed, which it does
not. While Urbit is designed to be useful, it is only a
design. Some of this design is complete; some is
not. It is not in any sense useful at present, because it
does not exist. It may be useful in future, or
not. I expect the former; I believe in the former; but as Mandy Rice-Davies said, "well, he would, wouldn't he?" I have certainly not
demonstrated it, and will not any time soon.
That said: a what? What in the bejesus am I peddling here? What percentage is real, what percentage is vapor? What does it do, why does it do it, and when will it do it? Most important: what problem is it supposed to solve?
Since these are always the questions I ask when introduced to someone else's random vaporware, it seems sensible to ask them of myself. It seems impolite to suddenly start spouting technology. Besides, as we'll see, there is really not a lot of technology here. Context must always come first.
You may also be interested in comparable efforts to define software from the bottom up, such as
Kay and Piumarta's work at VPRI, or
Paul Graham's Arc. This list should probably be longer - if you know of any other active effort that should be on it, please notify me. If non-active efforts were included, the list might exceed Blogger's disks. The problem, by definition, is not a new one.
If you follow those links, you'll see that everyone has their own introduction to "software from scratch." So, of course, do I. Mine starts with Mars.
Few of us have been to Mars. But science fiction has filled in many of the gaps. We know that the Martian civilization is immensely old. Its software technology, therefore, must be similarly antique. Obviously nothing like it exists on earth - though those of us who remember VMS might be deceived.
Whether we remember VMS or not, some of us are unhappy with the state of Earth software. Therefore it behooves us to consider an alternative. Since we have not yet established communications with the Martians (who retreated into tunnels as their planet froze), we cannot just FTP their code. We could, however, try to write it.
What is Martian code actually like? There are two possibilities.
One: since Earth code is fifty years old, and Martian code is fifty million years old, Martian code has been evolving into a
big ball of mud for a million times longer than Earth software. (And two million times longer than Windows.)
This hypothesis strikes me as possible, but implausible. Since the
big ball of mud expands indefinitely, Martian code would therefore be so large and horrible that, despite its underground installed base, the server room bulged into space like a termite mound, intercepting low-flying asteroids and stinking up the solar system all the way to Pluto. Our latest space telescopes would surely have detected this abominable structure - if not, in fact, collided with it.
Two: therefore, at some point in Martian history, some abject fsck of a Martian code-monkey must have said:
fsck this entire fscking ball of mud. For
lo, its defects cannot be summarized; for they exceed the global supply of bullet points; for numerous as the fishes in the sea, like the fishes in the sea they fsck, making more little fscking fishes. For lo, it is
fscked, and a big ball of mud. And there is only one thing to do with it:
obliterate the trunk,
fire the developers, and hire a whole new fscking
army of Martian code-monkeys to rewrite the
entire fscking thing.
This is such an obvious and essential response to the
big ball of mud pattern that, despite the fact that we know nothing about Mars, we can deduce that it must have happened on Mars. Probably several times. Probably several hundred. For each of these attempts but the last, of course, the result was either (a) abject failure, (b) another
big ball of mud, or (c) both.
But the last, by definition, succeeded. This is the crucial inference we can draw about Mars: since the Martians had 50 million years to try, in the end they must have succeeded. The result: Martian code, as we know it today. Not enormous and horrible -
tiny and diamond-perfect. Moreover, because it is tiny and diamond-perfect, it is perfectly stable and never changes or decays. It neither is a
big ball of mud, nor tends to become one. It has achieved its final, permanent and excellent state.
For instance (and just for instance - this is by no means a guarantee of perfection), Martian code is entirely Maxwellian: computationally defined by a small set of axioms. In other words, it is formally specified - always and everywhere. Nothing else could be perfect.
Of course, Martian code is also Martian - and thus completely incompatible with Earth code. Note, however, that Mars had to go through this same transition - they had to throw away their old Windows for Mars (tm), and adopt this new, tiny, diamond-perfect kernel.
But can Earth code cooperate productively with this alien intruder? It sure can. All it takes is glue - and Earth code is no stranger to glue. If there's one thing Earth doesn't need, it's a new glue language. There is certainly no difficulty in gluing Mars code to Earth code. From the Earth perspective, Martian software is just another strange, mutually incompatible doohickey. Welcome aboard! Alas, our mudball is already ornamented with many such curios. They stick quite well.
Moreover, Martian code is so perfect that it cannot be contaminated, even in this sticky environment. The general structure of cross-planet computation is that Earth always calls Mars; Mars never calls Earth. The latter would be quite impossible, since Earth code is non-Maxwellian. There is only one way for a Maxwellian computer to run non-Maxwellian code: in a Maxwellian emulator. Any direct invocation is an implementation error by definition. Thus, Mars remains completely pure and Martian, even while glued firmly to Earth.
Unfortunately, Moron Lab is not in contact with Mars and cannot download their tiny, diamond-perfect, 49-million-year-old kernel. Therefore, we have had no alternative but to construct our own. It is not entirely done yet, and even the done parts are probably nowhere near the quality of
real Martian software. Getting them there will require an actual collective effort, in the traditional open-source manner. Instructions for participation are at the end of the post.
As with any dangerous new technology, it's fortunate that Urbit is not yet fully armed and operational. It's our plan to introduce Martian computing gradually, to avoid producing serious social unrest. He who controls the spice controls the universe. While Martian code is not in any real position to dominate the
universe, it cannot be described as Martian with a straight face, unless clearly designed to dominate the
world. After all, it's already dominating Mars!
Any ersatz, Earth-produced Martian code is truly Martian in character only if it is obviously and self-evidently immune to any extension or improvement. Simplicity inflation is a chronic industry disease; Martian software must be far beyond simple. It can only be described as
stupid in its majestic triviality. Or, of course,
moronic.
Last week, we saw the Martian computational automaton,
Nock. (Or, in the original Martian,
Nwoglecq.) I posted the spec of Nock - but I didn't say what it was
for. Today, we'll see what it's
for.
Let's start by taking a closer look at the Martian data structure: the
noun. Broadly, this is the Martian equivalent of XML, JSON, or S-expressions. Here is the definition:
A noun is either an atom or a cell. An atom is an unsigned integer of any size. A cell is an ordered pair of any two nouns.
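Those three sentences translate almost mechanically into code. Here is a minimal Earth-language sketch in Python - my representation choice (a cell as a two-tuple), not part of the definition:

```python
def is_atom(x):
    """An atom is an unsigned integer of any size."""
    return isinstance(x, int) and x >= 0

def is_cell(x):
    """A cell is an ordered pair of any two nouns."""
    return isinstance(x, tuple) and len(x) == 2 and all(is_noun(n) for n in x)

def is_noun(x):
    """A noun is either an atom or a cell."""
    return is_atom(x) or is_cell(x)

# 42 is a noun; so is the cell [42 [0 1]].
assert is_noun(42)
assert is_noun((42, (0, 1)))
assert not is_noun(-1)       # no negative numbers
assert not is_noun("foo")    # no strings, symbols, or other base types
```

Note that the sketch needs nothing beyond unsigned integers and pairs - no third case ever appears.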
Doesn't this data structure strike you as Martian (or moronic) in quality? Some readers may say: yes, obviously. This is the Martian data structure. Others may not be so sure.
Let us, therefore, defend this structure - the noun. We'll see why it's absolutely minimal and perfect. To validate the exercise, we'll forget Mars for a moment, and explain nouns in mere Earth terms.
First, we note that nouns reflect no concern whatsoever with the fascinating
problem of deriving mathematics from propositional logic, set theory, etc. The unsigned integer, or natural number, is boldly and uncritically adopted as an axiomatic feature of the universe - as is the even more speculative ordered pair. Unlike the
lambda calculus, Nock represents code as data, not data as code. This is because it is system software, not metamathematics. Its loyalties in this matter are not even slightly divided.
The Earth software that's most like Martian software is, of course,
Lisp. (Lisp claims to be based on lambda, but the connection is in fact dubious - a true lambda language is more like
Haskell.) Nouns can be described as "
S-expressions without the S" - that is, there are no symbols or other base types; atoms are completely typeless. Note that Rivest's
IETF draft also makes this change, although he (a) defines his atoms as octet-strings, not unsigned integers; and (b) remains list-centric, not pair-centric.
Of course, an octet-string and an unsigned integer are the same thing. And by general convention and common decency, nouns are pair-centric. In Nock, it is bad form to stick unnecessary terminators on tuples. Or, as a Lisper would say, "improper lists." Other than these unfortunate articles of baggage, though, Rivest's draft has a lot of cool syntax, which could easily be adapted to nouns and probably should be - if just to quell the suspicion, perhaps already pregnant in your mind, that Urbit is really just the world's worst case of
NIH syndrome.
Note, however, that there is no textual language definition associated with nouns. There is no one "noun syntax." There will never be. We define the data structure, not the syntax. A remarkable variety of functions map invertibly between an atom (such as a text file) and an arbitrary noun. Except for standardization reasons, there is no reason to privilege any one such function. And as for standardization, Urbit's approach will shortly be clear!
Note the similarity of nouns to the Unix file abstraction. An atom is actually slightly cleaner than a Unix bytestream, because an atom's size in bits is data, not metadata. (From the bytestream perspective, direct atomic encoding ignores trailing null octets.) And of course, because nouns also have cells, they can represent
actual data structures directly - without encoding.
Nonetheless, if you look at pre-Unix OSes, you see filesystems festooned with an enormous and inordinate variety of built-in file-type mechanisms. One of the greatest achievements of Unix was to take out all typed records and shoot them. Unfortunately, Unix did not provide for type at a higher layer, but it inevitably crept back in as filename extensions and MIME types. Extension and MIME together are not exactly the world's best type system, especially as there is no formal specification of a MIME type. But they work.
So it's not that Unix has no type system. It's that it has a
layered type system. A MIME type specifies a set of files - at least, a good MIME specification can be converted, typically with nontrivial human effort, into a boolean predicate function which tells you whether a typeless bytestream is or is not, say, a GIF. Because the generic data model (bytestreams) is not cluttered with quixotic attempts at defining image formats, the description layer remains clean.
Nouns are designed on this principle. For instance, while nouns have no string type, any string is an octet-string, and any octet-string is an unsigned integer, ie, an atom (conventionally LSB first). Because there is no way to tell whether an atom is an integer, a string, or something else, printing an untyped noun is at best a delicate and heuristic process. Of course, so is printing an untyped Unix file.
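To make the layering concrete, here is the string-to-atom convention in Python. The LSB-first rule is from the text; the helper names are my own invention:

```python
def string_to_atom(s: str) -> int:
    """Encode a string as an atom: an unsigned integer, LSB first."""
    return int.from_bytes(s.encode("utf-8"), "little")

def atom_to_string(a: int) -> str:
    """Decode an atom back to a string. Trailing null octets vanish:
    an atom's size in bits is data, not metadata."""
    length = (a.bit_length() + 7) // 8
    return a.to_bytes(length, "little").decode("utf-8")

assert string_to_atom("foo") == 0x6F6F66   # 'f' = 0x66 is the low byte
assert atom_to_string(string_to_atom("foo")) == "foo"
```

Decoding is exact only because we already know the atom is a UTF-8 string; handed an arbitrary atom, no such knowledge exists - hence the delicate, heuristic printing described above.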
Adding a string type seems like a very simple improvement to nouns. In fact it would be a disaster. For instance, what character set are the strings in? Unicode? Great. You've just taken a data model defined in three sentences, and expanded it to a shelf-sized manual set. If you have a shelf that strong, why not just start with VMS? Or, of course, you could include some simple, broken, built-in string type, in which case you're making the same mistake that VM/370 made with its record formats: inserting a low-level facility which higher layers will (or should) work around, not with.
Moreover, note that nouns are acyclic. In my own personal opinion, the task of constructing formal descriptions of cyclic data structures is one of the most substantial wastes of human brainpower since Thomas Aquinas was a little boy. Like the Revelation of St. John the Divine, the foundations of mathematics, or the classification of butterflies, this problem appears capable of absorbing arbitrary human intelligence in both quality and quantity.
So why does Earth use cyclic data structures, and why doesn't Mars? This demands a digression on Nock and performance.
Cyclic pointer graphs are, of course, essential to the high performance of low-level programming languages, such as C. There is no C on Mars, but there is something quite like it. It is not difficult to define a low-level language in functional, Maxwellian terms: just model an address space as an atom, and iterative mutations of that address space as processing steps. If you define a C interpreter this way, you have a functional definition of C - which need not be executed by naive iteration. But I digress.
The point is: if your goal is merely to
model data, you neither need nor want cyclic graphs or pointer identity. (Note that while nouns containing replicated subtrees can and should be represented physically as acyclic graphs, there is no Nock operator, like Lisp
eq, which reveals pointer identity.) Nor are cyclic structures required to
specify any interesting algorithm.
Rather, pointer cycles are useful only for efficient
implementations of certain algorithms. Which is not nothing. However, it suggests an approach to optimized noun processing. Specify your entire program in functional, Maxwellian code; if this is not fast enough, implement key functions in C.
These C functions, or
jets, will take noun arguments and produce noun results. In between, they can do whatever the heck they want - so long as they produce the same result as the actual Nock formula. They can even make system calls and have side effects, though this is a
very bad idea. A jet is not a "native method."
The assumption that any practical Nock interpreter will be optimized for the specific formulas it is expected to execute, jet-propelling all well-known inner loops, allows Nock to discard many programming-language features normally considered essential to efficiency - from cyclic graphs, to built-in arithmetic. Because its only arithmetic operator is increment, decrement in Nock is an O(n) operation. But this is not a practical concern, because a production Nock interpreter will recognize the standard decrement formula and substitute an efficient jet. The approach is quite scalable.
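As a sketch of the jet idea (in Python, with names of my own devising): the "formula" below computes decrement the only way increment-only arithmetic allows, and the jet replaces it with native arithmetic while producing, by obligation, the identical result.

```python
def slow_dec(n: int) -> int:
    """Decrement as a Nock formula must: count up from zero by increment
    until we reach n. O(n), since increment is the only arithmetic."""
    assert n > 0          # decrementing 0 does not terminate - an exit
    i = 0
    while i + 1 != n:
        i += 1
    return i

def dec_jet(n: int) -> int:
    """A jet: native-speed decrement. Legitimate only because it is
    obligated to agree with the formula on every input."""
    assert n > 0
    return n - 1

# The interpreter may substitute the jet precisely because the two agree:
assert all(slow_dec(n) == dec_jet(n) for n in range(1, 1000))
```

The jet is invisible from inside the system: semantics come from the formula, speed from the jet.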
Thus the design narrative for nouns. The result is a design that
could have been invented in 1960 - and, in various forms more or less corrupt,
has been invented umpteen times since 1960. However, it also clearly benefits from 50 years of programming history, and it seems to satisfy
Saint-Exupéry's principle. Martian code may not always be this clean, but it should at least try to be.
Your reaction to nouns may be: but does it need to be
this clean and simple? Yes. It does. This is perhaps my most significant objection to "other peoples' languages." They all want to be clean and simple. Everyone does. But their idea of clean and simple: terminating bacterial growth. My idea of clean and simple: terminating atomic motion.
Which is not because, like Sir James Dewar,
Sir James Dewar
Is a better man than you are
None of you asses
Can liquefy gases
I'm a better man than they are. It's because I have a different problem to solve. If mine can be solved at all, which is not at all clear, it can be solved only at cryogenic temperatures.
In case this metaphor is too weird for you, let me make it concrete. Normally, when normal people release normal software, they count by fractions, and they count
up. Thus, they can keep extending and revising their systems incrementally. This is generally considered a good thing. It generally is.
In some cases, however, specifications need to be
permanently frozen. This requirement is generally found in the context of
standards. Some standards are extensible or versionable, but some are not. ASCII, for instance, is perma-frozen. So is IPv4 (its relationship to IPv6 is little more than nominal - if they were really the same protocol, they'd have the same ethertype). Moreover, many standards render themselves incompatible in practice through excessive enthusiasm for extensibility. They may not be perma-frozen, but they probably should be.
The true, Martian way to perma-freeze a system is what I call
Kelvin versioning. In Kelvin versioning, releases count down by integer degrees Kelvin. At absolute zero, the system can no longer be changed. At 1K, one more modification is possible. And so on. For instance, Nock is at 9K. It might change, though it probably won't. Nouns themselves are at 0K - it is impossible to imagine changing anything about those three sentences.
With these basic conceptual tools in place, let's step back and design the Martian computing infrastructure. We have already started to look at it from one perspective: from the axioms up. Let's refocus and take a different angle: from the network in.
Strangely enough, some insight into Martian networking can be gleaned by
reading this Wikipedia page, or watching
this Van Jacobson talk. (
VJ, in case you're not aware, is a god in human form.) When I saw the latter I thought: "that bastard had my idea!" "Bastard" was not actually the word. It was a worse word - illegal, in fact, in many states. But of course, any truly Martian idea is far too simple to be original. Fortunately, VJ did not have my
whole idea - just the network part.
I call the Martian network architecture
source-independent networking, or
SIN. In almost every way, a source-independent network is exactly the same as today's insecure, unreliable packet network. However, there is one difference. To make IP source-independent, remove one 4-byte field: the packet source address.
In SIN, a packet's processing is independent of the sender. Its meaning must be computed from the packet itself. If you're not instantly convinced that this is right and good and true, watch the talk - VJ will tell you all about it. VJ more or less invented TCP, which is fundamentally source-dependent. So I'd kind of suggest you believe him on this one. It takes a true engineer to throw out his own mistakes.
On Mars, SIN is taken to an extreme. Logically, Urbit is a single broadcast network - a single big Ethernet wire. Everyone sees everyone else's packets. You don't send a packet. You "release" it. As a matter of practical optimization, of course, routing is necessary, but it is opaque to the receiver. If there is a routing envelope, it is stripped before processing.
An easy way to see this is to compare an Urbit packet, or
card, to an IP packet. An IP packet has a
header, which is visible both to the routing infrastructure and the application stack. The header, of course, contains the infamous source address. Urbit removes not just the source address, but also the rest of the header. An Urbit card is all payload - it has no region seen both by receiver and router.
A card is simply a naked atom - a large unsigned integer. Nothing about its internal structure can be assumed. Obviously, since Martian networking is insecure and unreliable, it may consist entirely of garbage. Anyone could have sent us this garbage. Moreover, whether a card is garbage or not, we can discard it for any reason at any level.
Of course, a card is not at all opaque or without structure. It would be a strange routing infrastructure indeed in which the routers never examined their payloads. But
logically, a card is simply a naked PDU (as the old WAP specs used to put it - if you hear an individual using this term, while he may be a fine individual personally, in professional life he is the only thing worse than a
Bellhead: a Euro-Bellhead; flight is indicated) in the Urbit protocol. Ie: an Urbit message, which can be delivered to any Urbit host, at any time, for any reason.
In Urbit, the function
is the protocol. Which is also the application. On Mars, there is only one application: Urbit. There is one operating system: Urbit. There is one network protocol: Urbit. One ring to bind them all, and in the darkness hold them!
On Mars, Urbit is TCP, Urbit is HTTP, Urbit is Linux, Urbit is Java, Urbit is Tomcat, Urbit is Mozilla. Urbit is the entire application stack above the network layer, on both server and client sides. There is no pre-Maxwellian application logic presently executing on Mars, although a few of the darkest, most ancient histories allude vaguely to such abuses - just as in the oldest books of Judaism, we see hints of human sacrifice. (Urbit is actually dedicated to the Roman bull-god,
Mithras.)
Below Urbit, as below any standard, is whatever it wants to be. On Earth: the classic American operating system. Presumably on Mars there is some classic Martian equivalent. Just as almost anything can run a Web browser, almost anything can compute Urbit.
Back to the network-up design. We have seen the network. Up to the computer.
The
formal state of an Urbit computer is its
deck: the list of cards it has received, newest first. Clearly, there is only one network on Mars. Just as clearly, there is no
pre-network legacy I/O. Everything the computer knows, it knows because it received a card that told it so. Even time can (and should) be
delivered by card.
Now, a small reality check. In practice, is it always practical for a computer to store all the packets it has ever received? In some cases, this is exactly what it should do. Disk space is much cheaper than network bandwidth. The ratio is only changing in the former's favor. In other cases, of course, storing the entire deck would be preposterous. Broadly, most servers should; most clients should not.
It is always practical, however, to define the computer's
semantics as a function of its deck. Ideally, these semantics should be crafted so that computers not needing the whole deck can devise some hack to avoid storing it. In general, one of the great benefits of tightly-specified systems is that they can be safely hacked to within an inch of their lives.
But what is the function? Ah. Well, this is where the vaporware comes in. I don't have the Urbit function, you see. I have a
description of the Urbit function - an
interface. Eventually, I expect to implement this description. The result will receive a Kelvin number - probably about room temperature. When Urbit reaches 0K, it is done and frozen forever. This will be as soon as possible, and no sooner.
What does Urbit do? Urbit defines the
functional semantics of a computer's deck. The deck is the computer's state; to compute, we must
query it. This suggests an obvious interface, which I render in pseudo-C:
noun
Urbit(noun path, atom_list deck)
{
    return file;
}
In other words, the Urbit function is a
functional namespace. This pure, stateless function takes two arguments, a name or
path, which is an arbitrary noun; and a packet log or
deck, which is a list of atoms. It returns a value or
file, also an arbitrary noun. I trust that we are not far into rocket-science territory here.
The function can fail, of course. Like its computational automaton, Nock, Urbit is
crash-only. All errors are modeled as nontermination - ie, an infinite loop. Of course, your actual implementation need not spin forever upon mandatory nontermination. It is just a way of saying that all computation ends, and no result is produced.
(This is generally referred to as an
exit. An exit is like an exception, except without an exception object. If a Nock computation does not exit, but succeeds, it is said to
pass. If it is aborted externally for failure to compute, it is said to
tank. Thus a mandatory or formally detected infinite loop exits; a heuristically suspected infinite loop, or any other computation judged impractical, tanks.)
When Urbit passes a file for some path and deck, the path is said to be
bound in that deck. When Urbit exits, the path is
unbound. The path-deck-file triple is the
binding.
Urbit is not just a functional namespace, however. It is a
static functional namespace. That is: its definition ensures that for every binding
Urbit(path deck) == file
we also know
Urbit(path [card deck]) == file
In other words, once a binding is determined, new cards cannot revise it. Once Urbit learns something, it cannot un-learn it. A functional namespace obeying this property also could be described as
monotonic, but I prefer the simpler
static. Static namespaces, for instance, are common in
revision-control systems - for obvious reasons.
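The static property is easy to state as executable code. Below is a toy namespace of my own devising (emphatically not the Urbit function): path n is bound to the (n+1)th card the computer ever heard. Since the deck lists cards newest first, a new card can create bindings but never revise one.

```python
def toy_urbit(path: int, deck: list) -> object:
    """Bind path n to the (n+1)th card ever heard; deck is newest-first.
    An unbound path exits - modeled here as None rather than a spin."""
    if path < len(deck):
        return deck[len(deck) - 1 - path]
    return None

deck = [30, 20, 10]              # heard 10 first, then 20, then 30
assert toy_urbit(0, deck) == 10
assert toy_urbit(3, deck) is None            # unbound - for now

# Hearing a new card binds a new path but revises no old binding:
new_deck = [40] + deck
for path in range(len(deck)):
    assert toy_urbit(path, new_deck) == toy_urbit(path, deck)
assert toy_urbit(3, new_deck) == 40
```

Any namespace with this shape satisfies Urbit(path [card deck]) == file whenever Urbit(path deck) == file; the toy merely makes the property testable.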
I hope you'll agree that this problem definition is not rocket science. Surprisingly, however, the legitimate FP community (by which I mean the Haskell community) is still searching for it.
As
Conal Elliott writes:
This perspective is what’s behind my confidence that we will someday really learn to look at whole systems in a consistently functional/denotational [style] (simple and precise semantics).
Read the whole thing. Haskell fans, you'll know exactly where you can stick your monads. Indeed, Elliott on monadic I/O sounds a lot like VJ on connection-oriented networking. This is not, I believe, a coincidence. If Urbit is like anything, it's like Elliott's
FRP - though with no monads, gonads, morphinisms, monoids, combiguators, or other implements of metamathematical torture.
Now, let's step back a moment and look at the most important aspect of the Urbit function. The most important aspect of the Urbit function is that it is
frozen. It is nothing special to write a function which maps path and deck to file. It is something else to
freeze one. Everyone on Mars uses the same Urbit, and has for 49 million years.
It is also something else to make that function
universal. Mars has no computational infrastructure other than Urbit. Code on Mars is of two types: code within Urbit, and implementations of Urbit. In practice, this means that Urbit has to be able to do anything any application can do, and do it well.
So we have the Urbit problem: a frozen, universal, static functional namespace. In other words: a
standard static functional namespace. Perhaps we are getting further into the rocket-science department. It is not hard to write a functional namespace. But how can we standardize a functional namespace? You will search the
RFC Editor's shelves in vain for anything even remotely like Urbit.
In general, IETF RFCs and things like them (IETF is at the top of a tall food-chain of Internet standards organizations, descending in quality) do not standardize functions. When they have to standardize functions, they do not use any existing programming language. They use pseudocode - in other words, informal English. This is not out of stubbornness or stupidity - it is exactly the right thing to do.
The problem is a mismatch in objectives between the people trying to standardize languages, and the people trying to standardize protocols. There is simply no Turing-complete function definition format that is small and stable enough to reference as a dependency in an RFC. If you're writing an RFC, here is the type of external standard you want to depend on: ASCII. Even XML is a bit dodgy. But even a "simple" practical language like Scheme is defined by a 50-page manual. Show me a 2-page spec that depends on a 50-page spec, and I'll show you a 52-page spec.
In an RFC, pseudocode or English is better than Scheme, because Scheme would give a false impression of precision. Scheme is defined by an old-school language-standardization process whose goal is to terminate bacterial growth. Internet standardization means terminating atomic motion. Scheme seems pretty cool to Schemers, but it's nowhere near chilly enough for Urbit.
Urbit, of course, is defined by a Nock formula:
Urbit(path deck) == Nock([path deck] urbit-formula)
Since Nock is specified in less than 200 words of English, so is Urbit. The formal precision of Urbit is the formal precision of Nock. The formula does not add to this total - it is code, not English. And Nock, I think, is about as precise as a practical automaton can get.
Of course, this
urbit-formula
noun is not at all a pretty thing - a large (though not vast - certainly measured in kilobytes, not megabytes) tree of 0's and 2's and 3's. The definition of Urbit is certainly precise, because Urbit is defined in Nock, and Nock is precise. However, since it is not easy to write a big tree of small numbers, surely it is quite impossible for everyone to
agree on a big tree of small numbers. Standardization requires extreme clarity, whereas Nock formulas are extremely obscure.
Therefore, we need another deliverable: a human-usable language,
Watt. Watt is defined as follows:
Watt(source) == Nock(source watt-formula)
Thus, for instance,
urbit-formula == Watt(urbit-source)
== Nock(urbit-source watt-formula)
watt-formula == Watt(watt-source)
== Nock(watt-source watt-formula)
In English, Watt is a self-compiling compiler that translates a source file to a Nock formula. Watt is formally defined by the combination of
watt-source
and
watt-formula
. The latter verifies the former. Again, we are not in rocket-science territory here.
To bootstrap Watt, we write Watt in C, then use that implementation to write Watt in Watt, then ensure the two are equivalent. Thus, we never have to write anything in raw Nock - and we never do. (Except as an exercise, of course.)
Thus, to agree on Urbit, here is what we need to agree on: the Nock spec, the Watt source, and the Urbit source. The Nock spec is pseudocode, but very short and already very cold. Watt and Urbit are Maxwellian source, both readable and well-defined. (Watt, of course, must be not just a usable language, but an
excellent one.) They should also be small. Definitely under 5000 lines of Watt - ideally, under 2000. Anything much larger would be almost impossible to get to zero Kelvin. As it is, the task will be quite difficult.
Should this be submitted to an actual standards organization, such as IETF? Maybe, but maybe not. As a Marxist might say, the historical mission of the standards forum may be reaching its end. An even more challenging test for a proposed standard, such as Urbit, is that it be suited to standardization
without a formal process. If it is hard to standardize a system through IETF, it is even harder to standardize it
without IETF. Obviously, any challenge that forces Urbit to be more Martian is good for Urbit.
This poses an obvious question. Is this really possible? How in the world can a 5000-line program, in any language, constitute an entire planetary software infrastructure? We may be able to freeze Urbit - but can we make it universal?
If Urbit has any hope of universality, it has to do something else which is traditionally Not Done (tm). It has to run code off the network. To run code off the network is to extract a source tree, or other application representation, from the deck, and execute it within the Urbit function. Yikes!
It is easy to see that execution of remote code is an absolute requirement for any source-independent network. Since the network does not have a source address, it must validate packets through cryptography. Since no encryption algorithm is proven secure, any serious functional namespace must be able to upgrade its crypto algorithms. To validate packets secured by the new algorithms, it must run code that was delivered as data - secured by the old algorithms.
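The upgrade scenario above can be sketched concretely. Nothing here is specified by Urbit - the keys, names, and algorithms are all illustrative - but the shape is exactly as described: a new verification routine arrives as data, is authenticated by the old algorithm, and is then executed and used.

```python
# Sketch of crypto upgrade via code-as-data. HMAC-SHA1 plays the role
# of the aging deployed algorithm; HMAC-SHA256 the replacement.
import hashlib, hmac

key = b"igor-secret"

def verify_old(key, payload, tag):
    """The deployed algorithm, imagined as the one due for retirement."""
    return hmac.compare_digest(hmac.new(key, payload, hashlib.sha1).digest(), tag)

# The upgrade card: source for a new verifier, secured by the old one.
new_verifier_src = """
def verify_new(key, payload, tag):
    import hashlib, hmac
    return hmac.compare_digest(hmac.new(key, payload, hashlib.sha256).digest(), tag)
"""
upgrade_payload = new_verifier_src.encode()
upgrade_tag = hmac.new(key, upgrade_payload, hashlib.sha1).digest()

# 1. Validate the upgrade under the old algorithm.
assert verify_old(key, upgrade_payload, upgrade_tag)

# 2. Execute code that was delivered as data - the essence of the requirement.
scope = {}
exec(upgrade_payload.decode(), scope)
verify_new = scope["verify_new"]

# 3. Subsequent packets are validated by the new algorithm.
packet = b"hello from the times"
tag = hmac.new(key, packet, hashlib.sha256).digest()
assert verify_new(key, packet, tag)
```

The chain of trust never breaks: every new algorithm is introduced under the signature of the algorithm it replaces.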
Again, we see why a new language is needed. The reason, as always, is a new problem. The only family of functional languages designed to execute code as data is the Lisp family. For instance, to write Urbit in Haskell, you would need a Haskell interpreter written in Haskell. It could be done, but I don't think it has been done; and Haskell is certainly not designed to do it well. Therefore, the revolution will not be Haskellized.
Most Lisps can compile and evaluate remote code. At least, if your Lisp can't, it is not much of a Lisp! However, I know of no Lisps designed to solve this problem. I also know of no Lisps which are anywhere near as frozen as Nock. Hence: a new language.
In case you're still wondering whether Urbit can actually serve as a frozen, universal programming environment, let's pull the camera back and examine this monster in the large.
Any Urbit computer may be described in three ways: by what it hears, what it knows, and what it does. By analogy to another famous TLA, we can call this the HKD model.
What the computer hears is the list of cards it has received - its deck. One computer, one deck. Not even the computer's identity is defined outside the deck. Its identity is a function of what it has heard, which is the deck. What makes a deck, say, Igor's deck? A deck is Igor's deck if it contains Igor's secrets. This enables it to decode Igor's email, display Igor's photographs, talk to Igor's bank, etc. Generally, it enables Urbit to compute Igor's applications.
What the computer knows is the Urbit namespace: the meaning of the deck. For instance, suppose your computer wants to display the front page of the New York Times. "The front page of the New York Times" is not a phrase that can be mapped to a static path. "The front page of the New York Times at 6:15:01 PM, January 12, 2010" is. Similarly, "the image on Igor's monitor" is not a static path; "the image on Igor's monitor at 6:15:01 PM, frame 12, pixel (653, 24)" is.
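This path discipline can be sketched in a few lines. All the names below are invented for illustration; the two rules are the ones just stated: a path must be fully qualified (here, by a timestamp), and a binding, once made, never changes.

```python
# Minimal sketch of a static namespace: fully qualified paths,
# permanent bindings. Names and path shapes are hypothetical.

class StaticNamespace:
    def __init__(self):
        self._bindings = {}

    def bind(self, path, value):
        # A static namespace never rebinds: a path means one thing forever.
        if path in self._bindings and self._bindings[path] != value:
            raise ValueError("path already bound: %r" % (path,))
        self._bindings[path] = value

    def resolve(self, path):
        # "The front page of the Times" is not a valid path;
        # "...at 6:15:01 PM, January 12, 2010" is.
        if "time" not in dict(path):
            raise KeyError("path is not fully qualified: %r" % (path,))
        return self._bindings[path]

ns = StaticNamespace()
front_page = (("host", "nytimes.com"), ("page", "front"),
              ("time", "2010-01-12T18:15:01"))
ns.bind(front_page, "<html>...</html>")
assert ns.resolve(front_page) == "<html>...</html>"
```

Because no path is ever rebound, any answer the namespace gives may be cached, replicated, or proved, forever.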
If Igor is reading the Times at that time, Urbit must know that Igor's monitor is showing an image that is a rendering of the Times front page as of that time. Of course, this requires Igor's deck to actually contain an up-to-date copy of the New York Times.
Which presumably requires Igor's computer to have contacted that of the Times, etc, etc. But how does Urbit send cards? It only appears to receive them. Is Urbit, then, just half of a protocol definition?
This gets us to the third part of the HKD model, which is what the computer does. Which is what? What does it do? Oddly enough, we haven't said anything about this at all.
Answer: what it does is its own damned business! Urbit is no more than a recipe for how to interpret cards. A computer must act; it is useless unless it causes some side effect, whether by actual, physical output, or just by releasing more cards. No standard can constrain its actions. All Urbit can do is to define the knowledge it needs to act; all it can hope is that this knowledge is computed by the Urbit namespace, not ad hoc in the traditional manner - or worse, by a competing functional namespace.
So, for instance, Igor is reading the New York Times. To display the Times, his computer must contain the Times. To get the Times, it must contact the Times. To contact the Times, it must release and route cards. In other words, Igor's browser application must know that, at some time T, Igor does not have the Times; Igor wants the Times; and Igor's browser, to satisfy this request, proposes to release certain cards, with envelopes that should route them to the Times. All this, Urbit can know.
Urbit can specify all this knowledge - or, to be more precise, compute it from Igor's deck, which presumably includes his keystrokes and mouse movements. But Urbit cannot do. Urbit cannot force an Urbit implementation to actually calculate Igor's browser requests, and route the resulting cards to the Times. It can define the calculation; it cannot cause it to happen, or cause the results to be used.
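The know/do split can be sketched as a pure function plus an impure driver. None of this machinery is specified by Urbit, and every name below is illustrative; the point is only where the boundary falls.

```python
# The know side: a pure function from everything heard to proposed cards.
# Urbit can specify this function; it cannot make anyone run it.

def proposed_cards(deck):
    """Computes, from the deck, the cards Igor's browser proposes
    to release. Pure: no I/O, no state, just knowledge."""
    cards = []
    for card in deck:
        if card == ("igor", "clicks", "nytimes.com"):
            cards.append(("to", "nytimes.com", "GET /front-page"))
    return cards

# The do side: an impure driver, entirely the implementation's own business.

def runtime_loop(deck, send):
    """Urbit cannot force this loop to run, or 'send' to do anything."""
    for card in proposed_cards(deck):
        send(card)

sent = []
runtime_loop([("igor", "clicks", "nytimes.com")], sent.append)
assert sent == [("to", "nytimes.com", "GET /front-page")]
```

Everything above the `send` boundary is knowledge the namespace can define; everything below it is action no standard can constrain.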
And that's all for this week. However big your brain is, I suspect it is probably full.
Next Thursday, we'll dig a little deeper into Urbit. Which is vaporware - but not as vaporous as this post might make it seem. In particular, although there are many enormous problems, both practical and theoretical, in making this architecture work, the most significant impediment is the problem of allocating the namespace. After all, since everyone on Mars is running the same function, they all have to share the same static namespace. One can imagine some conflicts.
Any good namespace allocation must solve the unsolvable problem known as Zooko's triangle. The problem is, of course, unsolvable. Therefore, Urbit does not and cannot solve it. However, in the classic OS style, we can score 100% on one vertex and 75% on the other two. This quasi-solution is a design I call digital feudalism - of which more later.
If you're interested in joining the Urbit conspiracy, there's a simple step you can take. Take the Nock challenge - that is, write a Nock interpreter, then a Nock decrement formula, and mail them to me. (My Google account is cgyarvin.) While this task is of no use whatsoever, and (as previously noted) decrement is the only formula you will ever write in raw Nock, it makes an excellent pons asinorum. If you succeed, you know your Nock.
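For a head start on the interpreter half, here is a sketch in Python. It follows the reduction rules of a later-published Nock revision, operators 0 through 5 only; the revision this post refers to may number things differently, so treat it as an illustration rather than the standard. Nouns are Python ints (atoms) or 2-tuples (cells).

```python
# A toy Nock interpreter: two mutually recursive functions over nouns.

def slot(axis, noun):
    """The / operator: tree addressing. Axis 1 is the whole noun;
    axes 2 and 3 are its head and tail; and so on down the tree."""
    if axis == 1:
        return noun
    parent = slot(axis // 2, noun)
    return parent[axis & 1]        # even axis -> head, odd axis -> tail

def nock(subject, formula):
    """The * operator: apply a formula (a cell) to a subject (any noun)."""
    op, arg = formula
    if isinstance(op, tuple):      # a cell operator distributes:
        return (nock(subject, op), nock(subject, arg))
    if op == 0:                    # slot: fetch from the subject
        return slot(arg, subject)
    if op == 1:                    # constant: the argument itself
        return arg
    if op == 2:                    # evaluate: new subject, new formula
        b, c = arg
        return nock(nock(subject, b), nock(subject, c))
    if op == 3:                    # cell test: 0 (yes) for a cell
        return 0 if isinstance(nock(subject, arg), tuple) else 1
    if op == 4:                    # increment an atom
        return nock(subject, arg) + 1
    if op == 5:                    # equality: 0 (yes) if the results match
        b, c = arg
        return 0 if nock(subject, b) == nock(subject, c) else 1
    raise ValueError("unknown operator: %r" % (op,))

assert nock(42, (4, (0, 1))) == 43    # increment the subject
assert nock((20, 30), (0, 3)) == 30   # fetch the tail of the subject
```

The decrement formula - looping by building a new subject and re-evaluating with operator 2 - remains the exercise.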
In my original post I said that this was "not a difficult problem." This may have been an overstatement. With 16 (unspecified) prizes to distribute, I've received only two (unsolicited) solutions. One of these lucky winners is a math major at Caltech; the other is a math Ph.D. in Budapest. So perhaps it is slightly difficult.
But still, you can do it! And now that you know what Nock is for, perhaps you will. He who controls the spice controls the universe...