T-expressions: Pairs and Unions

October 27, 2016

The more I think about T-expressions the more I feel that the dotted (setlike) portion really needs to be recursively decomposable. This means that in, eg, a T-expression like

(a . 1 2 3)

the expression ‘really’ needs to be represented internally as

(a . 1 . 2 . 3 .)

and then the parsing and printing convention is just to suppress all dots after the first.

This means that a setlike expression breaks down into a pair of an element and a set, and the final set is always the empty set (.)

We could call such element-and-set pairs unions, to distinguish them from the more conventional conslike pairs. (We could also call a union a column and a pair a row).

Types would break down as:

– symbol
– number
– union (starts with .)
– term (starts with |)
– pair (starts with .)

Under the more strict (and simpler) T-expression semantics I’ve been thinking about recently, there would be no auto-expanding of setlike expressions inside other setlike expressions. This makes finding equivalents for car/cdr much simpler:

first – takes the first element of a union, or simply returns a non-union
rest – takes the second element of a union, or simply returns a non-union
head – takes the first element of a pair
tail – takes the second element of a pair

The tail of a T-expression pair is always a set, even in the case of a simple list such as (a b c),  which makes parsing them quite different from S-expressions. To get the equivalent of (cdr X) you would have to take (first (tail X)).

It might be useful to define car as (head first) and cdr as (tail first).
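
Here’s a minimal sketch in Python of how that might look – the class names are illustrative, and I’m reading ‘(head first)’ / ‘(tail first)’ as pipelined compositions (head then first, tail then first), matching the (first (tail X)) equivalence above:

# A minimal sketch of the structure and the four accessors described above.
# A union is an element plus the rest of a set; a pair is a conventional
# conslike pair whose tail is always a set (a chain of unions).
from dataclasses import dataclass

EMPTY = ()   # the empty set (.)

@dataclass
class Union:              # 'column': an element plus the rest of the set
    element: object
    rest: object          # another Union, or EMPTY

@dataclass
class Pair:               # 'row': a conventional conslike pair
    head: object
    tail: object          # always a set (a Union chain, or EMPTY)

def first(x): return x.element if isinstance(x, Union) else x
def rest(x):  return x.rest    if isinstance(x, Union) else x
def head(x):  return x.head
def tail(x):  return x.tail

def car(x): return first(head(x))   # one reading of '(head first)': head, then first
def cdr(x): return first(tail(x))   # '(first (tail X))' from the text

# (a . 1 2 3), internally (a . 1 . 2 . 3 .):
expr = Pair("a", Union(1, Union(2, Union(3, EMPTY))))
print(head(expr))   # a
print(cdr(expr))    # 1 -- the first element of the set in the tail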

A list probably isn’t the best way to store sets internally. An obvious alternative would be a hash or tree structure (say a red/black tree), and it might be useful to have varying kinds of internal representation. But we still need a way to construct and serialise sets, and that’s probably going to need some kind of canonical ordering.

It becomes especially important if we want to have a Prolog or Kanren like language produce a sensible datastructure for its output: it would be nice if all predicates could return a first-class set of results rather than an ‘answer list’ which is almost, but not quite, entirely unlike a set.

Canonical ordering of computed sets is probably a complicated subject. There’s probably no way to optimise for both searching and generating a computed set. In the Prolog/Kanren way, I think the answer list of a predicate would be in a defined order but not necessarily one well-suited to searching. The idea would be to avoid even generating more sets than necessary.

The Primitiveness of Primitives

September 26, 2016

It’s generally understood, I think, at least in theory if not in practice, that complex systems are best constructed out of a small set of universal primitives which strike a balance between expressive power and enough restriction that they can be reasoned about. It’s usually a good thing if the primitives can then be recursively composed into high-level components or operations that allow higher-level reasoning. (Ie, stacked inside each other at arbitrary depth: as, for example, both ‘data types’ and ‘functions’ can be within a language, even one as reasonably primitive as C).

On the planet-spanning Internet in 2016, you might think that we had evolved such a sufficiently powerful and flexible layer of software composition. Rather incredibly, after nearly 40 years, we find that we really only have two sets of universally agreed on primitives to connect all our software. And neither of them are recursively composable.

  1. Within a computer:
    1. Read a machine word from RAM at an address
    2. Write a machine word to RAM
    3. Execute raw machine code at a RAM address
    4. Call an operating system function using an API based on (1,2,3) plus, sometimes, ‘machine interrupt’.
  2. Between two computers:
    1. Send a TCP/IP packet (of bytes, not machine words) to (IP address, port)
    2. Receive and process a TCP/IP packet on a process listening on a port

There’s a lot more, of course. Vast stacks and layers of APIs and abstractions. But there’s not much in the way of universally agreed on layers above these two.

Eg, we have HTTP: GET a document at a DNS address, POST a reply, maybe (if you’re lucky) PUT a document (but you’re more likely to have to use some not-especially-well-adopted extension like WebDAV, CalDAV, etc).

You can’t, though, use HTTP to abstract all of TCP/IP. You’ll have to use a whole patchwork of different protocols to get different results. SSL/TLS for encryption may or may not be available for every protocol. You can encapsulate things, badly, in various VPN and tunnelling protocols that are extremely brittle and fragile. You can’t even abstract all of IP with TCP. IP packets remain the only universal lingua franca between computers. (And sometimes not even that – eg Bluetooth, NFC, and all the other Layer 2 protocols – but we’re being optimistic here).

Inside a computer it’s far, far worse. We have such a Babel of languages and binary formats that put data in a machine address and execute code at a machine address remain the only universal connections between our language stacks – and it’s pretty obvious, looking at these, why we have such a security disaster: because executing raw machine code truly is Turing-complete in all its horror. It’s like a society built on sending messages by injecting yourself with used syringes. Once you execute raw machine code – unless you’re 100% certain you hand-loaded it and triple-checked it with mathematical proofs – you’ve pretty much lost any security guarantees you ever had.

It would seem obvious that these two devices (send an IP packet, and execute machine code at an address) are incredibly poor as universal abstractions! Neither of them actually correlate to what’s taught as the ironclad foundation of computing in computer science classes: function evaluation. Sending an IP packet *might*, roughly, correlate to an ‘object call’, except that it’s incredibly coarse-grained (machine, port), there’s no notion of object identity, creation or destruction, and there’s no mechanism for marshalling or referencing other objects.

We do have, of course, protocols for ‘remote procedure call’. Many of them. RPC, SOAP, CORBA, DCOM, etc. They are not, however, universally used: there’s no agreement on which is appropriate when, and the answer very often is ‘none of these are appropriate’. Their semantics are complex and all subtly different, which seems kind of odd since ‘function application’ is something you’d think would be mathematically fundamental, simple, and unchanging.

What, I wonder, might be the simplest possible safe and composable primitive for communicating functions, objects or processes (preferably not making any kind of type distinction) both within a machine process and to arbitrary remote machines?

By ‘safe’ I mean: one which does not (like ‘execute machine code at this address’ does) open an entire machine state to manipulation by remote actors, but allows only strictly bounded amounts of 1) machine state change and 2) usage of transmit/compute/storage resources?

By ‘composable’ I mean: a single primitive type that can be user-defined and stacked inside itself, using controlled recursion, to create arbitrarily complex structures or operations.

You’d think this would have been the #1 priority to solve back when the Internet went live, and yet, here we are decades later, and it hasn’t been, and the Internet’s on fire.

A generous evaluation as to why this is: the design goals of the Internet (and, well, Unix and C, because that’s what drove most of it) never even remotely included ‘safety’ or even ‘composability’, and certainly not ‘privacy against hostile nation-states and organised crime’. It was simply assumed that all Internet users would be sufficiently smart and sufficiently careful. What the developers of the Internet, C and Unix wanted – and got – was simplicity of implementation, speed of execution and absolute freedom for the endpoints. Period. Send a message, read/write RAM, execute code. Everything else is the programmer’s problem.

The problem is that we now know, that’s not enough. We need a communication/computation fabric which can make some basic guarantees (some of which are potentially in conflict): that you can send a message that’s not been overheard or altered in transit, that you can identify the source of a message and refuse receipt of irrelevant or fraudulent messages, and that you can limit the resources a message uses or the changes it makes to your state. On the other hand, we don’t really want another OSI model. But can’t we at least get something like ‘function call’? With something just a little more structured than ‘raw array of bytes’ and safer than ‘C pointer’ as the universal message type?

Negation as a Type

September 26, 2016

If the expression

(|not x)

were something like an instance of a datatype, what might ‘not x’  mean as a value? How would one use it?

I imagine there’s not much you could do directly with such a datatype other than 1) add it to a collection (such as |all or |only), 2) search that collection to decide if it asserts or does not assert a value (and obtain a collection of those values), and 3) extract the ‘positive’ or ‘absolute’ value from it (ie the statement which is being negated).
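
A rough Python sketch of that small interface, just to make the three operations concrete (all the names and the collection representation here are illustrative):

# A rough sketch of a negation value and the three things you might do with it:
# put it in a collection, query the collection, and pull out the negated statement.
from dataclasses import dataclass

@dataclass(frozen=True)
class Not:
    statement: object        # the 'positive' value being negated

    def positive(self):      # 3) extract the statement being negated
        return self.statement

collection = {("sky", "is", "blue"), Not(("grass", "is", "red"))}   # 1) add to a collection

def asserts(collection, stmt):   # 2) does the collection assert stmt, deny it, or neither?
    if stmt in collection: return "asserted"
    if Not(stmt) in collection: return "denied"
    return "unknown"

print(asserts(collection, ("grass", "is", "red")))    # denied
print(Not(("grass", "is", "red")).positive())         # ('grass', 'is', 'red')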

It seems odd, though, that the idea of such a ‘negation type’ doesn’t seem to exist as such in the usual literature. It seems like an obvious generalisation of basic arithmetic to logic. Perhaps it exists somewhere in the literature of modern (computing) type theory? But which type theory, I wonder?


Delta terms and functors

September 22, 2016

We can construct a workable dataset delta – a set of logical propositions showing change in a set of logical propositions – by just adding a (|not) to our existing (|all) expression that indicates a conjunction (something similar to, but not quite like a set – depending on whether or not we want to give it a hard boundary like a set has). But for practical use it would be nice to add a second term: (|only).

‘All’, ‘not’ and ‘only’ act here like the operators +, – and = from my earlier idea of ‘delta expressions’, just mapped into term-expression syntax.

The usefulness of (|only) is that we have a lot of cases in real-world data where a given property or slot of an object-like structure has a many-to-one relationship (ie, a property that can only take one value) or where we want to express the idea of ‘updating’ it rather than just ‘adding a new piece of information to it’. For searching, it’s also really useful to encounter something like (|only), because it means we can stop the search at that point, making it more efficient.

I thought we could maybe get away without needing (|not) entirely, but I don’t think that will work. So (|all), (|not) and (|only) are I think a minimal useful set of such operators (or functors, in Prolog-talk, which expresses perhaps better that these aren’t necessarily functions or function-like things: they’re syntactic markers in data which might have standard interpretations, but don’t have to rewrite to a different value).

The hope is that we could use one set of standard functors to make a data format that can describe both changes to information, and static information. So we could have, eg, a person record, using a functor as something like a datatype indicator:

(|person |all (name |only Fred) (age |only 42…) (hobbies stamp collecting) (|not smoker))

etc. Where we’re explicitly flagging in a record that certain information is authoritative, or overrides any other information.

Marked up like this, ‘only’ (and ‘not’) start to look a little like data types themselves. I’m not sure that they work quite the same as the more familiar data types we know in computing, but it’s interesting to think of them that way: that (|only . x) might be a ‘typed’ or more narrow version of x, and that wherever we see a (|only…) expression we know it obeys a particular law: that it replaces information rather than adding to it.
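
Here’s a minimal sketch of one possible reading of those three functors, applied to a record held as a dictionary of value-sets – |all adds a value, |only replaces a slot with a single authoritative value, |not retracts one. The representation and names are just illustrative:

# One interpretation of the three functors, applied to a record held as
# {property: set of values}: |all adds a value, |only replaces the whole slot
# with a single authoritative value, |not retracts a value.
def apply_delta(record, delta):
    for op, prop, value in delta:
        slot = record.setdefault(prop, set())
        if op == "all":
            slot.add(value)
        elif op == "only":
            record[prop] = {value}          # many-to-one: overrides anything else
        elif op == "not":
            slot.discard(value)
    return record

person = {"name": {"Fred"}, "hobbies": {"stamp collecting"}}
apply_delta(person, [("only", "age", 42),
                     ("all", "hobbies", "fishing"),
                     ("not", "hobbies", "stamp collecting")])
print(person)   # {'name': {'Fred'}, 'hobbies': {'fishing'}, 'age': {42}}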

A second, slightly related, thought:

Thinking a little more about functors (ie, heads of term-expressions) and how we can assign meaning to them: it seems the meaning of a term is probably a function of 1) the functor definition, 2) the body of the term, and 3) the context (ie, all the expressions ‘to the left’ of the expression). The context (3) being roughly equivalent to, or a generalisation of, the ‘dynamic environment’ as opposed to lexical environment of fexprs (and vau expressions in vau calculus).

I’m wondering now if we could make that idea of context a little more restricted by forcing it to be the functors of all enclosing expressions, not just ‘all terms in all enclosing expressions’. Would that lose us a useful degree of freedom?

Eg: say we had a (|let) form that worked much like Lisp. If we weren’t restricting context to functors, we could say

(|let (a 1) |a)

such that |let binds the name a to the value 1, |a refers to that bound value, and we could then say that the meaning of the new functor |a we’ve created is defined by the enclosing (|let) context. But the information that defines the meaning of that functor isn’t in the enclosing functor, it’s in the term body. If we restricted meaningful context to functors, we might have to write that as

(|(|let a 1) |a)

Which is syntactically clumsier, but does put all the meaning into the head of the term: we’re creating a new combined functor (|let (a 1)), and using  that to give meaning to the otherwise undefined functor-call (|a). Among other things, this seems like it makes tracking the ‘functor environment’ a lot simpler. It’s just a straight list of all the heads of enclosing terms. Otherwise, I think it would have to be quite a large expression.
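
A tiny sketch of how light that bookkeeping could be, assuming terms are just tuples whose first element is the functor (possibly itself a term) – everything here is illustrative:

# If context is only the heads of enclosing terms, tracking it is just
# appending the current head to a list as we descend into a term's body.
def functor_env(term, path, env=()):
    """Functor environment seen by the subterm reached via `path` (body indexes)."""
    if not path:
        return env
    head, *body = term
    return functor_env(body[path[0]], path[1:], env + (head,))

# (|(|let a 1) |a): the inner (|a) sees just the combined functor (|let a 1).
term = (("let", "a", 1), ("a",))
print(functor_env(term, [0]))   # (('let', 'a', 1),)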

(We still probably wouldn’t want functors directly accessing that context term except in very restricted ways as it makes their meaning hard to predict. )

We could *maybe* relax even more and assume that a list as a functor is a term

(|(let a 1) |a)

but really not sure I like that at all – we might want a list-as-list to mean something.

This takes advantage of something that the HiLog people have pointed out for a while, which is that there might be an improvement in expressiveness by allowing functors to be complex terms. But do we win by enforcing rather than just allowing that?

If we were able to rely on something like currying, such that the meaning of every token in the body of a term was completely defined by the expression syntactically to the left of it, we could use the list of all such left-expressions. But I’m not sure we can rely on currying – it’s just too useful to distinguish functors by arity, so that (|frob bla) can mean something different from (|frob bla1 bla2).

Potentially messy to look at, but perhaps interesting.

In normal Lisp, we’d handle this by restricting the idea of ‘meaning’ to ‘environment’. It’s just that I’m trying to generalise from Lisp toward much more data-centric systems which may not necessarily be doing evaluation.



A For Loop for Logic

September 21, 2016

Taking an opposite turn from the exploration of ‘generalised OR as computation’, which is about peeling apart the atoms, and putting them together: here’s a mostly Prolog-like logic-programming clause which works a bit like a function, mixing variable definition with an implication.

Using term-expressions (ie, writing | in a list to indicate that it’s the start of a Prolog-like logical term), and defining the functor ‘for’ to declare variables, we can have:

|for (varlist) (pattern) … list of conditions


(cat likes |for (x) (x) (x isa fish))

This is almost, but not quite, like a Prolog clause-and-implication structure. It does three things at once:

1. Declares a new variable ‘x’. (As a Prolog clause boundary does, and as Kanren’s ‘fresh’ operator does). The syntax here assumes that we just treat any occurrences of the declared variable tokens as variables. That’s maybe a bit ambiguous, but it’s simple.

2. Produces the pattern ‘x’ (substituting in any possible value for the variables – and potentially producing unique ‘unknowns’ as Prolog does, and which don’t quite have any counterpart in functional languages). This can be understood as a return value, except that as a relation, this might return multiple such values, lazily. (If it returned them eagerly we’d have a problem because it might be an infinitely large set of values).

3. Restricts the pattern to the list of conditions (binding variables by unifying as it evaluates). This would work just like the body of a Prolog rule.

The main difference between this and Prolog is that we can see that this is a functional form, that it’s returning a value (a list, or a set of lists), and that it’s putting that value in a very specific place. The sense of place, of recursive structure, is an important modern feature that Prolog, with its 1970s-era central database, lacks.
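
To make that functional reading concrete, here’s a minimal generator-based sketch over a toy fact base – plain Python tuples, one global FACTS set, and only the simplest variable matching; everything here is illustrative rather than a committed design:

# A minimal sketch of the |for form over a toy fact base: declare variables,
# produce the pattern for every assignment that satisfies the conditions,
# yielding results lazily.
FACTS = {("salmon", "isa", "fish"), ("trout", "isa", "fish"), ("robin", "isa", "bird")}

def unify(pattern, fact, bindings):
    """Match one condition against one ground fact, extending bindings in place."""
    if len(pattern) != len(fact):
        return False
    for p, f in zip(pattern, fact):
        if p in bindings:                 # p is a declared variable
            if bindings[p] is None:
                bindings[p] = f
            elif bindings[p] != f:
                return False
        elif p != f:                      # p is a constant
            return False
    return True

def substitute(pattern, bindings):
    return tuple(bindings.get(tok, tok) for tok in pattern)

def for_term(variables, pattern, conditions):
    """(|for (vars) (pattern) conditions...), as a lazy generator of patterns."""
    def search(conds, bindings):
        if not conds:
            yield substitute(pattern, bindings)
            return
        for fact in FACTS:
            new = dict(bindings)
            if unify(conds[0], fact, new):
                yield from search(conds[1:], new)
    yield from search(list(conditions), {v: None for v in variables})

# (cat likes |for (x) (x) (x isa fish)) -- the |for part, roughly:
print(sorted(for_term(("x",), ("x",), [("x", "isa", "fish")])))
# [('salmon',), ('trout',)]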

Using a ‘for’ declaration like this calls to mind ‘forall x’ in logic, but also a ‘for’ loop in programming – it makes it clear that introducing a logic variable is like creating a loop, or a setlike structure.

I like this a lot, but I still don’t like that there’s no apparent way to resolve the split between ‘unification environment’ (the logical variables and their bindings, including their ‘next value’ which must be something a bit like a closure) and the ‘logical database’ (the set of currently defined statements). In a functional language like Scheme these are both part of the same environment. Or at least, the functional equivalent of ‘unification environment’ is sort of the activation records (stack and/or heap).

micro-Kanren implementations seem to do various tricks to create ‘unknowns’ as unique objects – sometimes just by incrementing a system-global integer – which is, um. It works, but doesn’t quite seem right to me. Certainly doesn’t seem like something that you’d want to scale up to the Internet. Generate unknowns as GUIDs? And then the unification environment is a big old list, which is fine, but…

It seems like unknowns ought to be something like closures, but that probably means it would be very hard to unify them. You could make them OOP objects, I guess, with ‘unify’ methods, but again that seems like putting the cart before the horse. Why should logic programming be so much more complicated at its foundations than functional programming? Or rather, why are there concepts – like unknowns – in logic resolution which don’t seem to have any counterparts in functional theory? (Other than type theory, which basically is logic resolution with the serial numbers filed off).

What I find especially odd is that functional theory as used in modern computation (eg lambda calculus and combinatory logic) was invented as a method of doing logic – but we’ve never used either lambdas or combinators for logic programming! The logic programming people had to stumble onto a computation kernel (Robinson resolution with backtracking over Horn clauses) by a very different path.

(Both lambdas and combinators were in fact rejected for logic because they were inconsistent – produced paradoxes – which explains why the Prolog folks didn’t use them, but somehow this didn’t stop Lisp from using lambdas.)

I often feel with logic that I’ve stumbled into a vast, crumbling, abandoned library written in a dead language. Programming is a bustling city, but logic outside of the bright lights of simple Boolean algebra is… extremely hard to parse out exactly what problems they were trying to solve. Why was any of this infinity stuff considered important? Life went on perfectly fine without it over in Computation City. The problems we’re now trying to solve where logic seems like it might be applicable seem much simpler and more practical than ‘represent all of maths’: how to search databases, how to prevent buffer overflows. But even the simple things still seem unnecessarily complex, which makes me think we’ve not quite got the foundations right.

That is to say: now that the Internet is up and running and we’ve moved most of our civilisation online, the problem of computing is ‘how do we efficiently and safely represent and navigate all of human discourse’, as opposed to ‘how do we invent a mechanical oracle which, given carefully restricted input on any possible theoretical question, will give us absolute truth as output’.

I feel that many if not most of the formalisms of logic were created to solve Problem #2 – something we don’t actually care about today – as opposed to Problem #1. The good news is that restricting the universe of logic to discourse – words, images, ideas existing in symbolic form – rather than abstract theoretical concepts like towers of infinity, may mean the problem’s more tractable! But it also means that a lot of the constructs invented for Problem #2 just aren’t applicable, and also get in the way. ‘Restricting the input’ is not an option when the input is given for you.

If we want to use logic for something like Vannevar Bush’s Memex, Douglas Engelbart’s NLS, Ted Nelson’s Xanadu – decentralised personal databases of texts and concepts – we need a logic that’s very simple, durable, makes almost no assumptions about the data that’s fed in, and can let us search and index and update and make basic deductions reasonably safely and without infinite loops. If there’s data we don’t know how to handle, we should preserve it untouched and just leave it alone. That’s all we need, really.

Some of the things we need to do, but can’t do easily right now:

  • Manage collections of messy data. Unstructured, ragged collections that aren’t perfectly rectangular tables.
  • Deduce facts from collections of data and rules. Assume there are probably many, many contradictions. Don’t blow up if this happens. Be conservative and handle the error gracefully.
  • Given two sets of data, produce a change delta – and given a delta, apply it to create another set. Apply deltas to deltas. Preferably every data set should itself be a delta.
  • Assume these sets may be very large, infinite in multiple dimensions, so only compute as much data as you need at any one time.
  • Have one data structure that can cope with everything. Something like objects, but with fewer assumptions. Something so simple anyone can implement it.
  • Allow modelling of absolutely ANY system (including yourself – compilers compile themselves, after all). Make as few assumptions as possible.

Modal logic, AND and OR

September 20, 2016

Thinking further about the odd (yet convincing, to me) idea that AND is a data-level construct while OR/XOR are control-level constructs, and what this might imply for logic systems:

The difference between ‘testing’ vs ‘asserting’ a logical connective like AND and OR comes down to much the same thing as the difference between logical ‘possibility’ vs ‘necessity’. For AND, possibility implies necessity: for OR/XOR, this is not the case. This is an idea from modal logic, but not perhaps a full implementation of modal logic.

(This is to say that all facts we feed into the database of a logical system are to be taken as ‘necessary’ because they are given: there is no interpretation or world-model we can come up with where these can be disputed. Any facts which are NOT given, but COULD be true in some world model that would still be consistent with our input, are ‘possible’. This is not the only definition of necessity/possibility, but it’s the one that’s relevant both to searching a logical database, and to my approach to AND/OR).

For AND, it is always true that the facts ‘A’ and ‘B’ taken together in the database imply ‘A AND B’. There may be some mechanical shuffling to get from there (eg our actual database may say ‘A and (B and C)’ from which our search procedure may rearrange things to get ‘A and B’) but they are very strongly associated.

For OR, though, the presence of ‘A’ in the database ALLOWS us to assume the much stronger statement ‘A OR B’ – but it doesn’t make this statement a NECESSARY deduction. We could only deduce ‘necessarily A OR B’ if our database contained (some mechanically-shuffled version of) the actual statement ‘A OR B’.

However, to look at our current world state and ask ‘is (A OR B) currently true?’ – evaluating a truth-function – is the same as asking ‘is the strong connective statement (A OR B) POSSIBLE?’

We see this clearly in material implication. The usual human meaning of ‘implies’ revolves around NECESSITY: ‘A -> B means that there is NO POSSIBLE WORLD where, given A, we can have not-B’. Looking just at the current state of the world, seeing B, and from there deducing ‘A -> B in all possible worlds’ is obviously not a valid inference. The other side of the coin, the truth-function ‘A -> B’, false only if A and not B, has very little use or meaning in the context of ordinary human logic, which is what makes material implication so weird. But we could understand it as asking ‘could it be POSSIBLE that A implies B?’ Yes, if the truth-function evaluates true then it could be possible – but most humans would want some evidence (perhaps a large number of cases, with few or no exceptions) before inferring the rule itself.

So we see that logical connectives as asserted statements and logical connectives as evaluated truth-functions  are two very different concepts, and we must not confuse them. But we could perhaps collapse ‘asserted statement’ with ‘necessarily true truth-function’, and ‘evaluated truth-function’ with ‘possibly-true statement’, to get a sense of how the two concepts overlap and complement each other.

In Prolog, this possibility/necessity, or evaluation/assertion split is very easy to see, because we see two very separate things:

a) asserting a fact or rule into the database (which is either given, loaded from a file, or must be wrapped in an ‘assert’ if we do it at runtime)
b) testing the current state of the database using an ‘AND’ or ‘OR’ predicate – which can only occur as a case of testing a higher-level predicate which ‘calls’ it

Note that Prolog does provide a built-in way of asserting ‘AND’ (implicit in the database when we assert multiple facts), but does NOT provide any built-in way of asserting ‘OR’. However it does allow us to assert an implication, with :-. The symbol used is that of the sequent calculus turnstile – for ‘deduction rule’ – not the material implication ‘<-’. This is because although Horn clauses may be defined like material implications as ‘OR’ forms, Prolog treats them in practice more like sequent rules: we are not allowed to infer them except in a strict sequence by following other such rules.

On the other side, Prolog has predicates to evaluate both ‘AND’ and ‘OR’, but does not have a predicate for testing material implication. It does though have a predicate for testing the existence and reading the contents of a :- rule (although this then gets tricky because of variable scoping,  the non-recursively-defined nature of Prolog clauses, and Prolog’s lack of a ‘quotation’ construct as in Lisp).

What the Prolog resolution algorithm doesn’t make clear though is why Horn clauses must be treated in this special way (like sequent rules, only inferred as guided by other rules) – and this makes understanding the algorithm hard.

The odd but incomplete parallelism between sequent rules and implications has always bothered me, but this explains it a little. A sequent rule is really just an implication that we are not allowed to infer without cause. But what if we made all implications like that? More particularly, what if we made all disjunctions like that?

I believe this insight about the fundamentally different levels of operation of OR vs AND has been hidden in Prolog and other systems based on sequent calculus because of the artificial construction in modern logic of using two different operators for implication-as-possibility (arrow <-) vs implication-as-necessity (turnstile :-). They’re not actually two different operations! Just two different modes of applying the same concept. And since implication is not fundamental but just a combination of OR and NOT, then it’s OR that we should be looking at.

The idea that OR is naturally a higher-level construct than AND – or rather, that both AND and OR have ‘possibility’ and ‘necessity’ modes but that AND always combines the two, so that it is very easy for us in Boolean logic to confuse ‘possible-OR’ (function) with ‘necessary-OR’  (assertion) – would lead us, perhaps, to a Prolog-like design but with a few changes: for example, perhaps generalised OR (or XOR) might be allowed as a top-level assertion (in the same way that Prolog allows implications) – with the awareness that evaluating the truth-function predicate OR (eg, testing for ‘possible OR’) is not the same thing as testing for the existence of OR-clauses in the database (testing for ‘necessary OR’).

(Also keeping in mind that ‘database’ should actually always just mean ‘AND-statement’. An important required feature of any logic system in the Internet age is that data should be recursively structured, should be kept local where possible, that entry and exit points of data from systems can be delimited, and that sets of data that represent entire systems must be able to be communicated as a single unit.)

This would require that variable scoping and allocation are firmly separated from logical connectives – as they are in Prolog (though non-nestable clauses are a strong restriction), and also in Kanren and most similar systems, though it’s tempting elsewhere to think of variable scoping as linked with ‘defining a rule’.

Would there be problems (eg loops, or inefficiency) with such a generalised necessary-OR design that Prolog’s restriction to Horn clauses mitigates? Perhaps. But it seems worthwhile to explore.

For one thing: keeping ‘possibility’ and ‘necessity’ modes separate would maybe help us think about ‘testing’ as ‘speculating’ or ‘constructing’. It would be more obvious to see that we can speculate about the current world state (ie, constructing inferences as call-chains on the stack or heap) – but we can also speculate about rules that are consistent with the world state (ie… self-programming or machine learning – or more prosaically, ‘adding runtime assertions’ or ‘allocating objects on the stack/heap’, which is something functional systems do all the time).


A Language for Interfaces

September 19, 2016

A shorter statement of what I see as one of the major problems in computing:

We have no or very few (formal, machine-parseable) languages capable of describing the interfaces between computer systems. But increasingly our systems are made up of these interfaces. And faults in the interfaces become faults in the whole system.

What do I mean by interfaces? I mean the parts where our systems overlap. No, not the function calls, the Web sockets or even the files. I mean at a deeper level of abstraction than that. Deeper even than that. The levels of abstraction themselves.

We have chips written in Verilog/VHDL with CPUs running system software written in C, communicating locally using USB and SCSI, and globally over Ethernet and TCP/IP, to SQL databases. And all the stacks on top of that. To the extent that small teams of theoretical pioneers have been able to formally describe and prove much of that, it’s been a subset of C and a subset of a CPU / machine architecture.

Do we have a formal language that can describe how VHDL interacts with C, C with USB, USB with TCP/IP and TCP/IP with SQL? Does even saying that sound weird, as if it’s a category error for a trained professional to even try to think about how such disparate subsystems described in such vastly different paradigms – even different disciplines – might overlap?

Then that might be a problem. Because all these separate languages are systems; the systems they implement certainly do connect in our real, physical computers; and if we can’t specify how they connect, how can we know what our real, physical computers are actually doing?

You can certainly define systems as separate theoretical worlds using separate abstractions. But if you can’t define the interfaces between those systems and those abstractions, you haven’t finished the real job. Possibly haven’t even begun it.

So far it seems the proof-theoretical people have been using Coq as their ‘glue language’ to wire everything together. I’d like to think there was something better than Coq possible.

I think a good start is to ask whether the idea of ‘category error’ is not, itself, an error.

Four-State Logic and Disjunction Introduction

September 19, 2016

If you have a world (a set of logical propositions) and you want to represent changes to that world over time (or perhaps space, or inheritance, or any other comparison), you will need some kind of ‘delta’ expression. The simplest possible delta, annoying as it is to humans, is an XOR (and I might yet come back to that). The next simplest is adding an explicit NOT, and representing deltas as sets of propositions P1, P2… NOT P1, P2…, to capture how propositions change from true to false, and false to true.

This leads naturally to a four-state logic: propositions not listed in the delta set (an infinite number of them) have logical state Unknown; propositions which have both a P1 and a NOT P1 have logical state Both. Whether or not we’d prefer a three-state logic, the intrinsic shape of the data seems to give us four states.
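
A quick sketch of how the four states fall straight out of the shape of such a delta set (propositions as Python tuples; the delta here is made up):

# The four states read straight off a delta set: a proposition can appear
# positively, negated, both, or not at all.
delta = {("door", "is", "open"),
         ("not", ("light", "is", "on")),
         ("heater", "is", "on"),
         ("not", ("heater", "is", "on"))}

def state(prop, delta):
    asserted = prop in delta
    denied = ("not", prop) in delta
    if asserted and denied: return "Both"
    if asserted:            return "True"
    if denied:              return "False"
    return "Unknown"

print(state(("heater", "is", "on"), delta))   # Both
print(state(("cat", "is", "asleep"), delta))  # Unknown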

Belnap used four-state logic to investigate relevance logics, but after a quick glance at his ideas I’m stuck in the weeds of complex logical formalisms with very little idea on what he was trying to accomplish. Trying instead to consult my intuition on which way to go. Or at least wander around looking at some of the sights now that I’m off the path.

Relevance (or relevant) logics are one way of attacking the explosion problem (‘from paradox, all propositions follow’) of classical logic. This feature of classical logic is annoying because no human naturally thinks like this – which is perhaps a clue that this logic, though simple, is not based on our actual laws of thought. It’s maybe also linked to the problem of creating an efficient search procedure for logical inference – because explosion seems to involve dragging in irrelevant facts, while efficient search is about minimising the number of irrelevant facts we need to drag in.

The more I look at explosion, the more my sense of annoyance centres around the idea of disjunction introduction. My gut feeling seems to come down to: a simple logical fact is natural: it’s in the database or it’s not. A conjunction (AND) is also natural – two or more facts are in the database, or they are not. If we express the database as something like a set, or a relation, we seem to get ANDs ‘for free’: they are simply in the structure of the data.

A disjunction (OR), however is NOT naturally ‘in the data’ in any sense. It’s an extra fact we have to add: a higher-level judgement about data, in a sense.

This queasiness about disjunctions parallels the queasiness most people feel about material implication: that we could infer (A -> B), ie ((not A) or B), if we only know that B is true. This feels intuitively very wrong because, I believe, we have a natural intuition that an implication is a statement about all possible states of the world, not about the current state.

The queasiness about material implication comes down to a disjunction being inferred. I would like to argue that – as our intuition suggests – a disjunction is in fact a higher-level fact (at least, a fact in itself) and that it should never be legitimate to infer a disjunction from any part of the current state of a logical world.

I feel strongly that this is correct. In logic as practiced by actual humans, ‘A or B’ is not entailed in ‘A’, in much the same sense that the cons cell (A . nil) is not the same data element as the atom A. It is a higher order statement linking two statements. Think about it for a moment:

First, ‘I am in New York, or I am in Moscow’ is a sensible fact to add or infer if we have some reason for believing that either were the case. (Eg, we had been directly told this proposition, or we had observed something which entailed this disjunction). The disjunction, however, does not arise at will. The ‘OR’ here is itself one bit of data. We can’t arbitrarily add bits of data to a database during a search procedure!

Second: in the back of our human mind is our constant understanding that the state of a world is fragmentary, not completely known, and will change over time, and therefore individual facts should be complete in themselves, and should not be dependent on the state of the world as a whole. ‘A’ is complete in itself. ‘A and B’ is also complete. ‘A or B’ is a complete statement if we know this as a given, as a whole piece of data. But ‘A or B’ inferred out of thin air seems wrong: it is created by the temporary, transient state of A or B, and even by the mere act of thinking about either of them. If it’s caused by the act of thinking (or searching), it’s metadata; again, it seems very much illegitimate to add metadata about an enquiry to the database as if it were the same kind of information that we are searching.

Another way of looking at this: Think about the statement ‘A and not-B’. This would also be true of A as long as B wasn’t in the database – thus it could, legitimately, be inferred at any point from a database which only contained (A). But if the state of B ever changed, it would become false. It’s easy to see that this inference links two facts. The inference ‘A or B’ also links two facts, but because (by accident) it is always true when A is true (though it is also true at other times), we might think it legitimate to say that A alone entails it – the ‘or anything’ is somehow contained within every A. But if you view connectives as links, this inference becomes obvious as a mistake. And indeed, in the case where A somehow becomes both true and false, that mistake (in the sense of being a very strong deviation from human logic) becomes obvious.

What this line of thinking comes down to is that the (modern) classical-logic idea that connectives are and only are truth-functions capturing the current state of the database is dead wrong, at least in the case of disjunctions. In analyzing the intuition of how our own built-in human logic works, we see that we naturally treat disjunctions as two separate cases: we can evaluate a truth-function ‘A or B’ by looking at the database – but evaluating the truth of ‘A or B’ is a very different thing from adding ‘A or B’ to the database. Adding ‘A or B’ at any point means we are creating a rule, a link, a meta-statement about the state of the world in all its possible states.

This means, I think, that we need to start from a model of logic not as separated facts but as links between inferences.

I’d like to go further and say that disjunction is at the heart of the concept of ‘rule’ and ‘law’ – we can construct implications from OR, but also more subtle links between data. With AND we have ‘the data itself’, but with OR we have something much deeper, stronger and more interesting: the potential to construct data not explicitly described – to describe or put restrictions on infinite possible worlds. That’s where our whole concept of computation comes from right there.

And because of that deep computational-inference power encoded within OR – that it represents a structure that ranges across infinite possible logical worlds – we should treat it with respect. Just as you wouldn’t expect a compiler or interpreter to, at runtime, construct and execute random garbage lambdas  (well, unless it were deliberately stress-testing a system, such as fuzzing an API),  you shouldn’t expect a sensible logic solver to randomly introduce disjunctions – the logical equivalent of lambdas in their computational power – at the level of the primary database and expect sane results.

Prolog’s search procedure (Robinson resolution over Horn clauses with the closed-world assumption), interestingly, ‘bakes in’ this view of disjunctions, though it never explicitly says so. But in Prolog, disjunctions are never inferred during search. Implications can be added to the database only if another implication instructs us to do so. This decision makes the Prolog solver naturally easy to understand – but it would be very useful, I think, if we understood at a much deeper theoretical level that disjunction introduction is an unnatural operation – and not just making it a ‘solver design quirk’.

Restricting disjunction introduction is obviously what some paraconsistent logics do. But Prolog doesn’t deliberately intend to be a paraconsistent logic. It just happily ‘falls out’ of the solver design that it’s somewhat paradox-tolerant in a way that mostly aligns with human expectations. I’m saying it would be smarter if we didn’t just trust the stability of our core logic to a happy accident. And that, given the massive complexity of paraconsistent logics, perhaps we don’t need to go all the way to paraconsistency to get big wins. Small steps in a useful direction may be more helpful.

If we eliminate disjunction introduction, we can still get paradox at the level of individual statements, or ones linked by  direct database-level disjunctions. This is where the fourth logical value in four-state logic might help us decide that from ‘both true and false’ we should not conclude anything. But in our human logic, it would still be an acceptable failure (though unhelpful) to conclude from ‘A and not-A’ all the entailments of A and the entailments of not-A; it remains a completely machinelike and nonhuman failure for A and not-A to entail ‘everything in the entire universe’.

The infinite-arithmetic model of the fourth state as ‘NaN’ may or may not be helpful here. It would perhaps flag a direct contradiction as immediately wrong. I think Belnap’s use of it is much more subtle, and I’m not entirely sure I want to follow him.

But the idea of ‘AND is data; OR is computation’ I’ve been exploring here seems perhaps useful. I want to think about it some more.

Edit: Looking at OR as a secondary construct created from the more primary Boolean arithmetic operations of AND (multiplication) and XOR (addition) maybe sheds some more light. It’s very easy to see that A XOR B depends strongly on the state of both A and B, just as A AND B does, and that therefore it is not legitimate to infer it from one fact alone, while A OR B gives us that accidental (and unfortunate) coincidence with A.

But there’s still a deeper connection, I think. My intuition tells me it should still not be legitimate to conclude A XOR B from (A and not-B) even given access to both bits of data. Because the XOR connective is ‘not naturally in the shape of the data’ in the way AND is, therefore it must be an extra piece of information, separate from the data itself. And that extra piece of information is about how data changes across infinite possibility space. Inference and computation.

Edit: What I’m talking about here is similar to the idea of Strict Conditional in relevance logics: that implication, in natural human logic, means something like ‘in every possible world‘ A -> B. But I’m arguing a stronger version: that the OR/XOR operators themselves are, fundamentally, modal across all possible worlds. Because when you look at a set of facts, you see ‘and’ arising naturally as a result of its set-ness: the existence of more than one fact means there’s an implicit ‘and’, which is inextricable from the facts. OR/XOR, however, aren’t in the set of facts: they are statements about relations between facts. So the most natural interpretation is that they are describing possible relations among configurations of data that don’t yet exist. OR/XOR (and to an extent, NOT) are control operators while AND is a data operator.

This explains why, if a human reads a sentence with ANDs – ‘John owns a car and a bicycle’ – it feels simple and easy to understand: we can parse it directly into a single data model, or a picture in our heads. But a sentence with ORs immediately creates a sense of tension, nervousness, complexity, because it requires creating multiple possible worlds, sets of data, in our mind: ‘John owns a car or bicycle or both / John owns either a car or a bicycle, but not both’. Try it: what do you visualise in your mind? A set of scenarios? Two or three pictures: John with a car, John with a bicycle, John with both? A feeling that ‘This is a rule, not a fact – I don’t quite fully understand this, this could get very complicated very quickly, I need to sit down and carefully work it through?’

That’s because AND is naturally in the data model, but OR is a control operator – it’s creating constraints over a potentially infinite set of potential models. That right there is the power of logic, but also a hint that AND and OR are working on fundamentally different levels. One is a connective, the other a meta-connective, and we should not confuse those two levels. (Per Wittgenstein and contra Russell, they’re not different types or categories of things – if you can name anything, even a mental concept, it’s at the very least an instance of NameableThing – but there’s an issue of nesting level involved. What is the data set that OR, taken as a fact, operates on? It is not the current world state but the set of all world states).

This makes me strongly sympathetic to the idea that ‘testing’ or ‘evaluating’ a disjunction (which can occur at the level of ‘data’, or the current world state) is something fundamentally different from ‘inferring’ or ‘asserting’ a disjunction (which is at the level of ‘control’, or constraints over all possible worlds) and that two disjunction operators are probably required. Which I guess puts my idea of a ‘natural human logic’ into the subset of paraconsistent logics which 1) reject disjunction introduction and 2) have two disjunction operators.

In fact AND operates much the same way as OR – but it’s an accident that, in all possible worlds, the assertion of AND is the same as the evaluation of AND. If and only if ‘A’ and ‘B’, then ‘A AND B’ by definition. This is not true for OR: ‘A’ at data-level with or without ‘B’ does not imply the control-level rule ‘A OR B’, although the truth-function ‘A or B’ evaluates to true whenever A is true. So we need a way of testing ‘OR’ as a truth-function without asserting its truth.
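
A trivially small sketch of that split, just to pin it down (the representation is illustrative):

# Evaluating OR against the current facts vs. asserting an OR-statement as a
# fact in its own right: two different operations on the database.
facts = {"A"}

def or_holds(facts, a, b):          # 'possible OR': a truth-function over the current state
    return a in facts or b in facts

print(or_holds(facts, "A", "B"))    # True -- evaluates true because A is present

print(("or", "A", "B") in facts)    # False -- no OR-statement has been asserted
facts.add(("or", "A", "B"))         # asserting 'necessary OR' is a separate, explicit act
print(("or", "A", "B") in facts)    # True -- and only now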

Prolog resolution gets us this nice property that inferences are only made as a result of higher-level implications in the database, apparently ‘by accident’, but actually because it is optimised to be efficient, and the key to being efficient is to only deal with the smallest set of relevant facts. Many of the inference rules of classical logic are wildly inefficient, which should also be a hint that they’re not how human reasoning works.

Still not convinced that AND and OR act at separate levels? Here’s what they look like as syntax:

(AND (owns John bicycle) (owns John car)
     (OR (owns John bicycle) (owns John car)))

You can see that they are literally separate statements: they can coexist at the same time. (AND is so primal that we have to, literally, wrap the OR inside it in order to merely put these statements together, to say that they are ‘true at the same time’.) Since they can exist at the same time, and are not identical, and can’t be made identical, they must require separate bits of data to describe. However all the bits of data that describe the current world state are taken up by the AND-statements. Since A OR B as a truth-function is true for any B whenever A is true, any OR-statements in a statement like this that also contains AND-statements would be vacuous tautologies. That’s one possible interpretation of ‘OR’ but it seems a very poor one, which adds very little information and makes it mostly useless. So the most natural interpretation of OR – the interpretation that maximises its information-containing capacity – would be that it’s describing something completely orthogonal to AND: a constraint that persists through all permutations of the world.

(This is applying the same logic which, applied to NOT, leads us to a four-state logic.)

An Infinite Arithmetic of Four-State Logic

September 16, 2016

Logical paradoxes are interesting, not necessarily because they demonstrate any particularly deep truths about logic, but because they demonstrate the differences between human thought and the simplest mechanical models of human thought that we can build. Most paradoxes won’t throw a human for a moment; but they will cause problems for a machine, if that machine is tiny, fairly simpleminded and doesn’t have elaborate guard conditions added.

Take the Liar Paradox, for instance – the classic example of how annoying the simple act of self-reference can become, ‘This sentence is false’.

It strikes me that most recursive logical sentences (of which the Liar Paradox is the oldest example, though Russell’s Paradox seems to me to be much the same thing except expressed in set theory) are essentially examples of the well-known mathematical concept of infinite series (or rather infinite product – an infinite list of factors to be multiplied, rather than summed). It’s well known since the earliest days of calculus that some infinite series converge, and some diverge to infinity, and others just don’t produce any numerical answer at all. If a human looks at a Liar or Russell Paradox for more than a moment, they realise it’s pretty much just an oscillating infinite series. So I find it odd that a line wasn’t immediately drawn between the two areas of inquiry.

I mean, maybe it was – surely it must have been? George Boole had already crossbred algebra with logic in 1854, and most of the logic we use in practical computing, simpleminded though it is, derives from him – but I do find it weird looking at, eg, Russell and wondering why he’s wasting his time with complicated stratified type constructions instead of just saying ‘divergent infinite series, got it, it won’t produce a result, long solved and not very interesting, next’.

Except I presume Russell et al were trying to put mathematics on a solid ground, and so probably didn’t accept calculus as given but were trying to prove it from logic, with the fixed and unalterable meanings of true and false (or at least set membership) as their starting point, and so having things like infinite divergences popping up in the very axioms they were starting from would have been a most unpleasant surprise. And then all the stuff we now take as standard from the theory of computable functions about non-terminating computations would have come post-Russell. So they did perhaps have some excuse for expecting, at least in the apparently simple grounds of logic, every well-formed formula to terminate, even recursive ones. Well, we know better now.

But still. With hindsight – and after Boole –  it does seem a bit surprising that they wouldn’t have started from calculus.

Anyway. If we wanted to model the Liar Paradox as an infinite series or product, it would be fairly simple to do so as the Boolean ‘not’ function:

Find the truth value of: (not not not not ….. )

It seems pretty obvious that this is never going to end… and therefore that the value isn’t going to be either ‘true’ or ‘false’… but I’m guessing Boole hadn’t thought that much about abstract functions.

But there’s another way we could do it. If we didn’t follow Boole by assigning True to 1 and False to 0, the other obvious mapping from logic to numbers would be True as 1, and False as -1. This would give the Liar as:

-1 . -1 . -1 . -1 …

Which becomes a very standard numerical infinite product and it’s obvious it’s oscillating. However you choose to compute that, it’s NOT going to give you either -1 or +1 as a result.
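
The oscillation is easy to see in the partial products:

# Partial products of the Liar-as-product never settle on +1 or -1.
partials, p = [], 1
for _ in range(6):
    p *= -1
    partials.append(p)
print(partials)   # [-1, 1, -1, 1, -1, 1]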

So much for the Liar. What’s interesting here is the modelling of logic-states-to-numbers.

A logic based on -1 and +1 suggests it’s going to have 0 as well, probably meaning ‘unknown’, which would be a trivalent logic, and that seems very useful in the real world for modelling things like ‘deltas’ – sets of changes to the state of a world. But… three values don’t seem especially numerically stable. Thinking about, eg, William Hamilton’s dealing with quaternions, where he tried to model 3D vectors in algebra and couldn’t get it to work consistently until he went to 4 dimensions. And in fact, we seem to have an embarrassment of mutually incompatible ‘trivalent logics’, which is not a good sign. Could it be that something like quaternions is going on here? Is three an unstable number of logic states? Would four work better?

One could also come to this conclusion by just looking at Boolean logic and counting bits. One bit of information – ‘True or not True’ – gives us Boolean. The next smallest step up is two bits, for four states at a minimum. The simplest mapping would be one bit for ‘True’, and one bit for ‘False’: If we let 10 = True, 01 = False, that leaves 00 = ‘Neither True Nor False’ (unknown)… and a fourth state, 11, ‘Both True And False’. Which looks like ‘Contradiction’.

Now. Thinking about our assignment of +1 for True and -1 for False… that’s nice, but it doesn’t quite capture the irrevocable-ness of classical logical judgements. For a logic based on ‘balance of evidence’ it would make sense to use integers for ‘weight of data’… but if we’re sticking with classical logic, we have odd equations like True + True = True, and True + False = Contradiction. What kind of numbers behave like that? Very large ones?

Infinite numbers do, that’s what. Can we assign our four logical states to infinities? I think perhaps we can!

True  = +Infinity
False = -Infinity
Unknown = 0
Contradiction = … what? How about 0/0: undefined, or Not-A-Number?

Ten years ago, in 2006, a British academic, Dr James Anderson, ruffled some feathers by saying that 0/0 should be considered an actual number – he called it ‘nullity’. The storm of scorn and controversy the Internet and the Academy spat out in response ranged from ‘that’s nonsense, it will never work’ to ‘of course, we all know this is true but nullity is just a silly new name for IEEE floating point NaN’.

Anderson’s Nullity does behave slightly differently from IEEE NaN: it compares equal to itself while NaN does not. NaN’s definition (which works the same as SQL Null, in that it literally means undefined value rather than every possible value) may or may not be more correct here. For our purposes, though, the important part is that we need to consider this value to be a legitimate logical state: what you get by adding True and False. 0/0 is Undefined – this is a kind of infinity that ranges across the entire number line, as opposed to the usual positive and negative infinities that live beyond each end of the number line.

So let’s do the truth tables of infinity + NaN + 0, and see what we get!

Addition = ‘asserting two truth values for the same fact’

0 + 0 = 0 == Unknown and Unknown is Unknown
0 + inf = inf == Unknown and True is True
0 + -inf = -inf == Unknown and False is False
0 + nan = nan == Unknown and Contradiction is Contradiction

inf + 0 = inf
inf + inf = inf == True and True is True
inf + -inf = nan == True and False is Contradiction
inf + nan = nan == True and Contradiction is Contradiction

-inf + 0 = -inf
-inf + inf = nan
-inf + -inf = -inf == False and False is False
-inf + nan = nan == False and Contradiction is Contradiction

nan + 0 = nan
nan + inf = nan
nan + -inf = nan
nan + nan = nan == Contradiction and Contradiction is Contradiction
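
These tables can be checked directly with ordinary IEEE floats – a quick sketch (and the same loop with x * y reproduces the multiplication tables below):

# Checking the addition table with ordinary IEEE floats: inf, -inf, 0 and nan
# behave exactly as listed above.
import math
from itertools import product

STATES = {"True": math.inf, "False": -math.inf, "Unknown": 0.0, "Contradiction": math.nan}

def name_of(x):
    if math.isnan(x):
        return "Contradiction"
    return {math.inf: "True", -math.inf: "False", 0.0: "Unknown"}[x]

for (a, x), (b, y) in product(STATES.items(), repeat=2):
    print(f"{a} + {b} = {name_of(x + y)}")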

That looks good so far! It’s perhaps a bit trickier to determine what multiplication naturally represents in logic. The application of a function, I think – False works like Boolean Not – but I’m not sure what the other logical states imply. Specifically, if ‘Unknown’ behaves like multiplying by zero, then it takes True/False to Contradiction, which seems not quite right. Shouldn’t it become Unknown? But it would depend on exactly what the operation was.

0 x 0 = 0 == Unknown x Unknown is Unknown
0 x inf = nan == Unknown x True is Contradiction
0 x -inf = nan == Unknown x False is Contradiction
0 x nan = nan == Unknown x Contradiction is Contradiction

Multiplying by Unknown might be asserting that a value is unknown. If multiplying infinity by zero gave us zero, it would force the value to be Unknown; since it instead gives NaN, it seems to act as a requirement that the value be unknown – and asserting that of something which is already True or False yields a Contradiction.

inf x 0 = nan
inf x inf = inf == True x True is True
inf x -inf = -inf == True x False is False
inf x nan = nan == True x Contradiction is Contradiction

Multiplying by True *ought* to be the identity operation, but for ‘unknown’, it’s not.

-inf x 0 = nan
-inf x inf = -inf
-inf x -inf = inf == False x False is True
-inf x nan = nan == False x Contradiction is Contradiction

Multiplying by False inverts, unless the other operand is Unknown or Contradiction.

nan x 0 = nan
nan x inf = nan
nan x -inf = nan
nan x nan = nan == Contradiction x Contradiction is Contradiction

And Contradiction just asserts a contradiction which won’t go away.
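
The multiplication table can be checked the same way; here’s a sketch (again just ordinary Python floats, with a decode() helper of my own to turn a float back into a logical state name) that prints the full 4×4 table:

    import math

    STATES = {
        "Unknown":       0.0,
        "True":          math.inf,
        "False":         -math.inf,
        "Contradiction": math.nan,
    }

    def decode(x):
        # Map an IEEE float back onto one of the four logical states
        if math.isnan(x):
            return "Contradiction"
        if x == math.inf:
            return "True"
        if x == -math.inf:
            return "False"
        return "Unknown"

    # Print every combination: 'logical multiplication' as plain float multiplication
    for a_name, a in STATES.items():
        for b_name, b in STATES.items():
            print(f"{a_name} x {b_name} = {decode(a * b)}")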

(I won’t do the full truth tables here, but you can also, as in multivalued or fuzzy logic, take Minimum and Maximum, which should work like Boolean AND / OR. As with SQL Null, Min and Max should bypass a NaN value.)
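
Something like the following is what I have in mind – a sketch only, with the ‘bypass NaN’ rule read as skipping Contradiction operands the way SQL aggregates skip NULL; whether that is really the right treatment of a hard contradiction is an open question:

    import math

    def logical_and(*values):
        # Minimum over the operands, skipping NaN the way SQL aggregates skip NULL
        vs = [v for v in values if not math.isnan(v)]
        return min(vs) if vs else math.nan

    def logical_or(*values):
        # Maximum over the operands, again bypassing NaN
        vs = [v for v in values if not math.isnan(v)]
        return max(vs) if vs else math.nan

    print(logical_and(math.inf, 0.0))        # 0.0   – True AND Unknown is Unknown
    print(logical_or(-math.inf, math.nan))   # -inf  – False OR Contradiction bypasses the NaN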

With the proviso that we need to understand more fully what ‘logical multiplication’ means (since it’s not, as in Boolean logic, the AND operation), it seems that by accepting Undefined or NaN as a legitimate value we get a sensible interpretation of four-state logic, a Contradiction from which we can’t derive arbitrary True or False propositions, and a negation behaviour for False much like that of Boolean logic.

Perhaps this doesn’t let us immediately short-circuit the Liar, because -Inf x -Inf is perfectly valid? But… the fact that we’re taking the product series to infinity – so the length of the series is undefined, and the final value depends precisely on whether that length is divisible by two – maybe does make it legitimate to short-circuit! (Assuming, that is, that we know the series will not terminate, which is not always possible or easy.) The obvious short-circuit value would of course be Undefined/Contradiction.
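
A toy illustration of that oscillation, under my (possibly wrong) reading that the Liar unfolds into an ever-longer product of -Infinity terms:

    import math

    # Partial products of the Liar's unfolding: (-inf), (-inf)^2, (-inf)^3, ...
    product = 1.0
    for n in range(1, 7):
        product *= -math.inf
        print(n, product)            # alternates -inf, +inf, -inf, +inf, ...

    # The 'final' value depends on the parity of an undefined length,
    # so short-circuit to the one value that further unfolding cannot change:
    liar = math.nan
    print(liar * -math.inf)          # nan – Contradiction is an absorbing, stable state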

So that’s one possible answer. In a four-state logic which also maps to an infinity algebra including nullity/undefined, the Liar can be immediately short-circuit-evaluated as Contradiction. It’s a paradox – you could legitimately derive any logical state from it. Except that you can’t, because it depends on its own logical state. Except that in four-state logic, the only result you could derive is Contradiction. Therefore, Undefined/Contradiction remains a stable ground state. That’s nice.

There are probably other paradoxes which don’t short-circuit as neatly as the good old Liar, but – as well as having a sensible mapping to two bits of logical state, and having a nice algebra based on well-understood notions of infinity – this makes four-state logic look quite appealing.

And… this may well be Belnap’s four-valued logic! So there’s a thing. But I’m not sure whether Belnap ever mapped it onto arithmetic with infinities?

The Non-Closure of Binary Relations under Indexing

September 12, 2016

Binary relations are neat. They’re simple and mathematically well understood, and they provide a proper superset of the data-representation ability of functions, lists, arrays, dictionaries, and Codd tables. They seem like a great candidate for a universal structured-data type!

Except they have one annoying little property. They’re not closed under a very simple, commonly-used operation: lookup or indexing. By which I mean: the operation where you take a (chunk of structured data) and ask it ‘give me everything you have under key X’.

If you do this to a function, you’ll get either a value, or a function.
If you do this to a dictionary, you’ll get either a value, or a dictionary.
If you do this to a list (taking its CDR) you’ll get either a value, or a list.
If you do this to a relational Codd table (relational division, or joining), you’ll get a table of results. But Codd relations have a lot of preconditions. Your data has to be extremely regular in a way that real-world data mostly isn’t.

If you do it to a binary relation, you won’t get a binary relation. You’ll get a set of results.

A set is not, in general, a binary relation. It might be! But it’s not always. This might, or might not, be a problem depending on whether you want to have three basic datatypes in your simplest-possible semantics (set, pair and atom). But it’s annoying if you wanted to have at most two (structured_collection and atom).
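
To make the closure failure concrete, here’s a small Python sketch (my own ad-hoc encoding of a binary relation as a set of pairs, nothing standard):

    # A binary relation encoded as a set of (key, value) pairs
    likes = {
        ("alice", "bob"),
        ("alice", "carol"),
        ("bob",   "carol"),
    }

    def index(relation, key):
        # 'Give me everything you have under key X'
        return {v for (k, v) in relation if k == key}

    print(index(likes, "alice"))     # {'bob', 'carol'} – a plain set of atoms, not a binary relation

    # Contrast with a dictionary of dictionaries, which *is* closed under the same operation:
    nested = {"alice": {"bob": 1, "carol": 2}}
    print(nested["alice"])           # another dictionary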

A binary relation is like a nondeterministic function. If you treat it like a nondeterministic function, you’ll get some kind of {and here a system-dependent miracle occurs} fork point: eg, in Prolog, you’ll set a backtracking point, then you’ll get each one of the possible set of answers in a very specific linear order; or, if you use the findall() predicate, you’ll get not a set but a list of those answers, ordered in that very specific linear order. (Good luck defining the findall() predicate in predicate logic, by the way; it’s considered ‘extra-logical’, though you can’t do much useful work in Prolog without it. This also may or may not be a problem, but it’s annoying.)

In some other language, invoking a nondeterministic function might spawn a bunch of parallel processes. Top Minds tried modifying Prolog to support this back in the 1980s – the result, Kernel Language 1 with its ‘guarded Horn Clauses’, did not get very far, and wasn’t especially simple or logically tractable.

However, modulo that {system-dependent miracle} at the fork point, each of those subqueries will be closed – what they return will be either atoms or binary relations. Well, sort of. You may still end up with nil pairs inside your binary relations. (You will). These don’t map neatly onto our current model of ‘function’ or ‘dictionary’. So that’s a second annoying thing.

But because ‘get me the entire set of results’ to a query is such a common thing to want to do, it is very annoying that we can’t preserve that property of closure. (Not to be confused with the transitive closure of a binary relation, which is something else again, or with ‘closures’ as in lexically-scoped lambdas, which are also different).

Dave Childs’ Extended Set Theory (see eg http://c2.com/cgi/wiki?ExtendedSetTheory and http://c2.com/cgi/wiki?ExtendedSetTheoryStorageModel) claims to provide that closure property! But I’m not convinced he’s cracked it. His XST, at least in the few papers I’ve managed to read, is the equivalent of sets of proper pairs – no nils, no atoms. You can certainly construct a lot of data structures with that. But I’m not sure that you can map that neatly onto the semantics of lists. In XST, a set containing only X – that is, {X} – is mapped as {(X . {})}. {}, the empty set, is not the same as Lisp nil, ie (), so it’s not at all clear how he would model a nil pair. But if {} were taken as the same as (), I think you would get something very bad happening: X would equal (X) for all X, which would destroy the semantics of lists. Childs claimed (back in the 1970s) to have proved something about XST, but I don’t grasp whether what he’s proved relates to the semantics of pairs and lists at all.

(I think XST does not concern itself with list semantics at all, really; it’s more based on fixed-size records. Which is fine for Codd-like relations with rigid sizes, but less fine for semi-structured data – like logical assertions – that doesn’t know its size ahead of time.)

Perhaps XST plus an explicit pair type would be stable? And in some papers on XST, it seems that he’s adding explicit pairs – which doesn’t seem to get us anything that just using ordinary sets and pairs wouldn’t.

Oh, and finally, XST is patented –  which complicates things a lot.

Edit: A relatively recent (2011) statement of the XST axioms may shed some more light: http://www.math.lsa.umich.edu/~ablass/XST_Axioms.pdf