Tell me who you are?
Identity, institutional memory, and the persistent illusion of the self
“The soul, if it can be said to exist at all, is certainly without attributes” - Shivanath
More than a couple of times a year, I meet teams working on identity.
I always make myself immediately unpopular by asking them “what is identity?” and, of course, everybody acts as if it is completely obvious from our mutual shared context, and I should already know. Usually when people use the word “identity” their implicit definition falls into one of the following buckets.
Our challenge, in creating software to enable people to use or manifest their identity, is that doing either job well creates problems. On one hand, a world in which people are things, serialized and tracked like all other things. On the other hand, a liquid world in which defrauding Grandma on eBay never, ever feeds back to a person’s dating profile, even after convictions.
Our job, as software engineers working on identity, is first to be philosophers - phenomenologists and epistemologists - to understand the abstraction that we hope to represent.
What is a person, and how do I identify one to a computer?
Charles and Ava Berner distilled sources from a variety of Eastern and Western traditions into a practice called the Enlightenment Intensive. The procedure is pretty simple - rows of people facing each other, taking turns following the instruction “tell me who you are.” This sounds simple enough, but 10 hours a day for three or four days rapidly shakes many people’s faith that they understand themselves. Many experience a sudden break with their usual flow of life, perhaps along the lines of what Zen practitioners would term “satori.” I’m not formally trained in either the EI process or Zen, so I’m not making a direct equivalence here, nor am I going to delve much further into the technical details of the practice.
What I want to point out is that identity is a deeply secret and mysterious thing for us, and staring right at it for long periods of time is the lifestyle preference of poets, saints, and madmen. I’ve never read a single piece on identity in a computer science context - much less a business context - which fully spoke to the mysteriousness of identity, so I thought I’d write this one.
Nobody comes out of an Enlightenment Intensive with a pat answer. Temporary answers like “I am Pat’s mother” can be factually true, but when examined in detail, according to many people, these descriptions seem to miss the core of a person’s identity. They’re things we do, or conditions we experience, or relationships we participate in, but the sum of all the roles we play seems to fall short of describing the totality of our identity. Even when all the parts are put together, something remains undescribed.
There is something in the whole which is beyond description.
So I want to start by problematizing identity. It seems that whatever toolkit we approach identity with - art, philosophy, psychology, esoteric yoga, whatever it happens to be for any given person - there is no simple, clear, pat answer which holds true for a lifetime. For people that do “find themselves” (whatever that means) the result may be a higher quality of life and a less turbulent mind, less self-doubt or similar benefits, but they never quite seem to be able to explain what it is that they found, much less how a third party might find it.
You probably don’t know who you are. And if you do, you probably can’t tell me about it in a way that communicates it to me.
And we’re supposed to write software about this???!
So, for now, let’s assume that software isn’t going to help us much with metaphysical or deep psychological models of our identity.
What are the areas where software is doing a good job of describing identity?
Let’s take a look at four cases where identity seems to intersect software in useful and predictable ways - not strange one-offs or art projects, but the meat-and-potatoes software of everyday life.
I hope, by now, that I’ve entirely convinced you that identity is a field filled with half-understood categories and half-baked theories which are nonetheless being used by enormously powerful organizations every day to decide questions like “can I have a mortgage, please?”
I want to posit that there is simply no solution to these questions without building out a really extensive new theory of identity, and that such a theory is going to have to compromise with the philosophical abyss presented by the gap between who we think and feel we are, and the identities that the world generates for us on a more-or-less ad-hoc basis, without our say so or so much as a by-your-leave other than, perhaps, accepting an occasional EULA.
It’s a mess.
If we think the identity debate is purely ontological, if we share the Enlightenment faith in Natural Laws and Naturally Occurring Kinds, damn right it’s a mess. The world looks unknowable.
However, if we understand that it’s a set of narratives that rely on culture to be effective, we can talk about which narratives are useful and reliable, and try to construct some purpose-built narratives that help us relate to strangers and make the queues at airport immigration move more quickly.
So I want to start over, from the other end of the equation: what’s the simple, easy, clear stuff in the identity space?
The simplest form of identity we have is our role in our family when we are babies. We don’t know our name, and in some cultures we might not even have a name, but everybody in our family knows who we are, and how we relate to them and the environment.
At this stage, assuming we are healthy, we have very few attributes. Maybe some early behavioral preferences, like sleeping or not sleeping, and a little biological variance like how our hair is. But there’s not much to us at this stage: no biography, no (visible?) hopes and dreams. We’re mainly made of potential, and the story of our lives is largely told by other people.
At this point, there’s no doubt, identity is what other people make around us. Let’s think about the identities a healthy baby might have.
Now, at this point, the near-absence of attributes of a baby is really worth noting. There’s just not much you can add: height, weight, ok. But no degrees, no employment history, no credit card debt - all that comes later!
Of course, in a society with fully high-tech medicine, there will be prenatal test results, ultrasounds and a whole pile of other medical information, but for the sake of argument, let’s imagine we’re dealing with a slightly lower-tech culture - us as we were a few decades ago, or a poorer country where, if everything is going well, there isn’t much medical involvement in pregnancy. This is a slightly artificial simplification, but I think it really will help. Go with me on this! So we have our state-of-nature healthy baby without much of a medical history. Let’s now introduce the medical scenario and see what happens to baby.
Let’s say the poor kid is a few weeks premature, and a bit unwell. Hospital is involved. Now we have a case history.
Suddenly our attribute-free happy child becomes wrapped up in an entire world of data, identity, relationship and rights. And, let us note, none of this is endogenous - our baby has not generated all of these relationships, data and identities by will. Perhaps the little tyke had a rough outline of a medical history wrapped up in midwife and mother’s head, but when all is well, that data is soft and passing. It may be recorded, but it is scarcely processed. But now, in even a minor, routine crisis, the entire charting-and-recording apparatus spins up and it becomes urgent to build a medical identity for a child that previously probably had only the most superficial medical history.
Particularly in countries where families pay for their own medical care, and medical care for their kids, there is a sudden urgency here: who is to pay, and if nobody is going to pay, who is liable for covering the care that must, by law, be given to the child? This is no joke: if the situation is complex, and the care expensive, the price of a couple of houses could be spent saving a child’s life. So a lot of what is being established, in the hard case of paying for your own health care, is a set of entitlements that one is contractually bound to receive having paid health insurance premiums. Identity - “are you the child of a person who has paid their premiums?” - turns into life rather than death in our worst-case scenario.
Now, I want you to pause a second and think.
We’ve established in the first section of this essay that at a very fundamental level, identity seems to be philosophically hard to the point of near-insolubility. It’s just not that clear that the commonly used concept of “identity” neatly maps to anything which stands up to the full light of clear, conscious scrutiny. And that’s OK, we have quite a few concepts like that (try “love” next) which are at the core of our lives - we know it when we see it, but all written definitions seem to fall short of the mark.
Then we examine some actually-existing but still very, very odd forms of identity - social network, project leadership, and the ever-present surveillance marketing databases. All very dark glass, hard to see through - things are not defined.
Then in a medical context, we fall bang into the middle of the hardest possible kinds of identity: we know who you are, because we paid for you to be born in this hospital, and they gave you a tag 30 seconds after you took your first breath. And in the world where people have excellent medical care, that medical identity forms not that long after conception - medical records dating before your birth. That’s a trend likely to continue and become ever-more important as we discover more about epigenetics: your DNA and your gene activation continuously mapped from the first signs of pregnancy through to post-birth.
So this is my first concrete observation: philosophically and intellectually unsatisfying as “identity” might be as a concept, there is absolutely no denying that in a variety of critical places, the current identity infrastructure works.
It is not always ideal. To access a patient’s records when they are (for example) unconscious in another country is an absolute nightmare, and there are many less rare scenarios which still throw up plenty of obstacles between a doctor and their patient’s data. But there’s no way around the basic conclusion that in the core cases, medical identity basically holds together.
But gods of bureaucracy help you if you get trapped in an edge case with your life on the line.
In the 1980s, if you had internet access, it was usually through a university. The university had a pretty good hold on your identity, and usually the “chain of custody” for your identity would go something like:
So the username and password you had were, in the final analysis, backed by a long chain of accountability which went all the way back to the government at the head end of the chain of custody of your identity.
There were probably other ways the internet could have been organized than username and password. The username/password standard had been set years before, in the earliest days of the minicomputer era. But once those standards and those expectations had been set, path dependence took hold, and everything tended to look a little like what came before, because if it does not, the users can’t navigate.
And in the old days, the 1980s and 1990s, those system administrators acted very much like town mayors or other local government. If you were being a pest on the internet, your identity was trackable: email@example.com went back to the Computer Science system administrator at Utah State or wherever, who would figure out who you were from your username, and then fire you off an email that threatened either your grades, your tuition or your life depending on the degree of infraction! This system of accountability lasted until The September Which Never Ended, when new actors entered the space (in this case, AOL) who were paid, not by the federal government or by the universities, but by the users themselves. AOL had very, very many users per administrator, and no particular interest in kicking paying customers off the internet for abuse.
The result of this breakdown in accountability (a breakdown in the personal relationships between individuals using and providing internet access) was a deep and near-permanent change in the lived experience and culture of the internet. Identity, governance and accountability are beyond scope for this article, but do remember this story: it will come back over and over again as the internet progresses and new media like virtual reality or biometrics get added to the systems we currently have acclimatized to and take for granted.
It’s remarkable to think that the humble, ordinary username and password used to be connected to such a fundamental hierarchical chain of accountability, but that’s how it used to be.
Let’s compare that to today’s username/password combination.
A typical flow might be that I go to a web service, give it a username and password, and usually an email address which might come from (say) a one-use temporary email address provider. The owners of this web service now have no way of reaching me other than in-app messaging, and even if they can reach me, they have absolutely no idea of who I am. Therefore there’s no reasonable approach to managing abuse on these services, other than (say) tracing the IP address I connect with and going after me at the internet service provider level. This is, of course, rather an analogue to the old 1980s internet norms: in those days, the university system administrators were essentially the internet service providers. You could even argue that the IP address is providing much the same kind of accountability. But, in practice, there are so many users and so few administrators, and such unwillingness to go after abuse that - unless the matter is criminal, and in many cases, even if it is criminal - there’s simply nobody to go to.
The same credentials - a username and a password - have gone from being a more or less absolute guarantee of good conduct based on a rigid chain of accountability going from your academic institution right up to the nation state level, to indicating almost nothing other than you are the same person that opened the account yesterday, or their chosen representative.
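In software terms, that weak guarantee (“you are the same person who opened the account”) is usually implemented as nothing more than a salted, slow password hash. A minimal sketch, with function names of my own invention, not from any particular service:

```python
import hashlib
import hmac
import os

def register(password: str) -> tuple[bytes, bytes]:
    """Store only a random salt and a slow salted hash, never the password."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt, digest

def verify(password: str, salt: bytes, digest: bytes) -> bool:
    """True only if the caller presents the same secret the account was opened with."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return hmac.compare_digest(candidate, digest)
```

Note what this does and does not prove: a successful `verify` says nothing about who you are, only that you know the secret from yesterday.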
Identity is slippery, and the rules keep changing. In a five year period around 1993, internet access and the username/password combo went from being a near-guarantee of good conduct because it was embedded in a comprehensive and federated model of governance, to being open to everybody in the fruitful anarchy we know today, ridden as it is with spam, fraud and aggressive idiocy.
Around the same time this was going on, corresponding work was being done to make it easier and easier for the machines to talk to each other. Every machine that wanted to connect to the network got a “MAC address,” built into the hardware used to connect (in those days, ethernet cards: this was the 1980s.) Each vendor got a block of the MAC number range, and it was up to them not to issue duplicate hardware.
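That vendor allocation is visible in the address itself: the first three of the six bytes are the vendor’s assigned prefix (the OUI), and the remaining three are up to the vendor. A small illustrative sketch (the helper name is mine):

```python
def oui(mac: str) -> str:
    """Return the vendor prefix (OUI): the first three of a MAC's six octets."""
    parts = mac.lower().replace("-", ":").split(":")
    if len(parts) != 6:
        raise ValueError("expected six octets")
    return ":".join(parts[:3])
```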
However, MAC addresses are not cryptographic objects - they are just numbers. This was OK before money started to change hands on the network, but once money starts to flow, keypairs (a public and private key, from strong cryptography) start to be vital.
Enter the mobile phone SIM card, which carries a globally unique IMSI number to identify the subscriber, and a secret key, held on the SIM, which is used for billing. Identifying which phone - and therefore which contract - made the call is done by challenging the SIM to prove it holds that key.
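The billing check can be sketched as a challenge-response protocol: the network sends a random challenge, and only a SIM holding the right secret can compute the right answer, without the secret ever leaving the card. This toy version uses HMAC as a stand-in for the real SIM algorithms, and all the class and method names are my own:

```python
import hashlib
import hmac
import os
import secrets

class Sim:
    """Toy SIM: holds a secret key and answers challenges without revealing it."""
    def __init__(self, imsi: str, key: bytes):
        self.imsi = imsi
        self._key = key

    def respond(self, challenge: bytes) -> bytes:
        return hmac.new(self._key, challenge, hashlib.sha256).digest()

class Network:
    """Operator side: knows each subscriber's key and verifies responses."""
    def __init__(self):
        self._keys = {}

    def provision(self, imsi: str) -> Sim:
        key = secrets.token_bytes(32)
        self._keys[imsi] = key
        return Sim(imsi, key)

    def authenticate(self, sim: Sim) -> bool:
        challenge = os.urandom(16)  # fresh every time, so replays fail
        expected = hmac.new(self._keys[sim.imsi], challenge, hashlib.sha256).digest()
        return hmac.compare_digest(expected, sim.respond(challenge))
```

A cloned SIM with the wrong key fails authentication, which is the whole point: the identity being checked is possession of the key, nothing more.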
This system has proven to be remarkably stable. MAC addresses are sometimes changed by software (even for privacy reasons) but nothing much changes when this happens; perhaps an occasional machine gets identified as being something it is not, but MACs are not usually used for routing traffic beyond the local network. The internet address which machines get when they connect to a local network is another issue altogether: a centralized, top-down numbering system in which the world is divided up into zones, each getting a number range to put machines into. This top-down system operates all the way down to your router when it issues you an IP address: that address has come from IANA (the Internet Assigned Numbers Authority) to your nation state or network provider, to your router, to you.
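Python’s standard library can illustrate that nesting directly. The specific blocks below are invented for illustration (203.0.113.0/24 is a range reserved for documentation), but the containment relationship is the point: each level of the hierarchy wholly contains the next, down to a single address.

```python
import ipaddress

# Hypothetical delegation chain: big block -> registry allocation -> home LAN.
big_block = ipaddress.ip_network("203.0.0.0/8")      # illustrative only
registry  = ipaddress.ip_network("203.0.113.0/24")   # TEST-NET-3, reserved for docs
home_lan  = ipaddress.ip_network("203.0.113.0/28")

addr = ipaddress.ip_address("203.0.113.5")

assert home_lan.subnet_of(registry)   # the LAN sits inside the registry's block
assert registry.subnet_of(big_block)  # which sits inside the top-level block
assert addr in home_lan               # which finally contains your machine
```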
If this sounds very baroque - bureaucracy straight out of the film Brazil - it works because all the companies making hardware, selling internet connections, and managing these numbers simply have to make it work to get paid. Their common interest is that these systems just work and stay out of everybody’s hair, and as a result cooperation to resolve problems (including through standardization) is usually swift, spontaneous and effective. (There may be some chortling from people who are intimately involved in IANA etc. All I can say is “compared to the climate process.”)
Part of what makes this possible is that there is very little commercial competition over these numbers: one device per number, but nobody (generally speaking) buys/sells/speculates on these numbers. They are anonymous and unimportant. In the few instances where scarcity has resulted in arbitrage opportunities, usually the outcry at crass commercialism taking precedence over the structure of the network and its function is immediate.
Note this is entirely different to the domain name system, where there is massive political contest over who gets to sell which domain names, and over the domains themselves as people scramble to find a domain name that suits their purposes (alas, poor http://leashless.org). There’s little commercial advantage to having the right number for a device; a number is just a number. Much the same kinds of systems dole out credit card numbers to the banks, and that all works out just fine, for the most part.
So in the land of machines, things work basically smoothly. All the trouble in the world gathers around DNS and HTTPS, and we’ll look into that below when we discuss certificate authorities and their problems.
Back to humans.
So far we’ve examined a few different ways that identities are acquired, mostly defaulting to some long variant on “the state hands you an identity number, and it is used to back things like hospital or university records.”
The biometric has an appealing quality, in that it is not necessarily tightly associated with a State for its issuance.
So let’s start with the base case: a simple facial biometric. I take some clear pictures of my face. A computer measures things like how long my nose is, or the distance between my pupils. These things are hard to change, and (if we measure enough points) pretty unique. Anybody who matches all these facial biometric features is going to be more similar looking to me than even an identical twin might be (in many cases.)
In theory this is a pretty good way of identifying somebody.
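Mechanically, matching such a biometric amounts to comparing feature vectors and accepting anything within a tolerance. A toy sketch, with invented measurements and an invented threshold:

```python
import math

def matches(probe, enrolled, threshold=2.0):
    """Euclidean distance between feature vectors; smaller means more similar."""
    return math.dist(probe, enrolled) <= threshold

me       = [34.1, 62.0, 48.7]   # e.g. nose length, pupil distance, jaw width (mm)
me_again = [34.3, 61.8, 48.9]   # same face, slight measurement noise
lookalike = [36.5, 64.2, 51.0]  # a different, similar-looking face

assert matches(me_again, me)
assert not matches(lookalike, me)
```

The threshold is the whole game: too tight and you lock yourself out on a bad hair day; too loose and the lookalike gets in.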
In practice, though, we often want to be able to identify ourselves in different ways to different people. Suppose I am a closeted gay member of a church which is rather hard on gay people. If I use the same ID process, involving my “you only get one” face, as identity for both my church and my gay dating sites, I’m wide open to blackmail and exposure. An enterprising hacker can match my identity information across a couple of different databases, come to the conclusion that I’m a closeted gay man, and then extort me.
This may seem far fetched, and obviously something that could be done much more easily using my computer’s IP address: connects from the same address, two separate lives, obviously there’s an exploitable person here. This is an excellent reason why people with things to hide should use tools like Tor to obscure their IP addresses, and why websites should not keep logs which could be captured by a hacker and correlated to other sources of data. The recent Ashley Madison hack exposed an enormous number of people living double lives, because personally identifying information was spattered all the way through their inadequately protected databases. I don’t know how many lives were destroyed in that hack, but the answer is certainly “more than none” and we should be acutely aware of the risk presented by personal data in sensitive areas of people’s lives.
So let’s go back to the biometric example. Suppose that I authenticate to a web site by waving my face into a phone camera. This seems pretty reasonable: in fact, my Samsung phone can use my face to unlock itself. I wave the phone in front of my face for a fraction of a second, and it unlocks. I’m old enough that this seems rather magical, particularly given that it’s reasonably reliable and precise. Could similar mechanisms be used to log me into Facebook?
The answer is obviously yes, but here’s the problem: somebody can make a fake version of my head that meets all the biometric criteria required, and now can log in as me.
I realize this sounds absurd, but what’s required? Some high resolution pictures, maybe a 3D printer, and some experience making latex moulds and rubber heads. Certainly it seems unlikely anybody’s going to do this for me... but suppose I was an elderly rich person of the type so often targeted by unscrupulous criminal gangs intent on hitting scared old people right in their retirement savings. Of course, many of these old people have weak passwords - in general, in fact, people have incredibly weak passwords. So, once again, as with IP addresses leaking huge amounts of personal data, we must remember that existing practice has horrible privacy implications already. The hypothetical flaws I’m discussing with biometrics are certainly matched by equally horrific flaws in our current identity infrastructures, as I’ll discuss further.
So with Grandma’s nest egg in play, our unscrupulous gang takes some pictures, makes a fake head, logs into her bank account, and wires the whole amount to a bitcoin ATM in a faraway country, never to be seen again. Using a rubber head to hack somebody’s identity may seem utterly bizarre, but these things are already happening at a lesser scale. Consider the stunning story of a group of Brazilian doctors manufacturing fake fingers so they could log each other in at work.
Now, I want you to think about this carefully, relative to the username / password situation.
With a username / password, something I (and only I) know is what identifies me. In theory I can keep this information in my head, so short of using fMRI to pull the secret out of my skull, there’s no way for a thief to steal my username/password combo. But if there’s some kind of technical attack on my computer, where my password gets copied by some sneaky piece of software as I type it in, my identity is compromised.
And I can change a password. The critical problem with using raw biometrics for identification is that I cannot change my biometrics. If the vital data leaks, I’m compromised for the rest of my life!
DNA is, at least on current scientific knowledge, a seductively perfect biometric: a unique sequence of digital data, one CD’s worth per human, accessible from a swab (you don’t even need a blood sample). It is unchanging through a lifetime, and the ability to read the DNA sequence from a sample gets cheaper, easier and more reliable every year. We can even tell the difference between “identical” twins now. And you get lots of additional information from DNA about useful things like family structure - who parents, siblings, cousins, grandparents and all the rest are - pretty much 100% reliably baked into the deal.
DNA is great. If only it were private, it would be perfect.
The rub with DNA is that it gets everywhere that cat hair gets, and further. If you’ve ever owned a fluffy white cat, or a golden retriever with a tendency to shed, you know what I’m talking about. You’re at a friend’s house, and you see their nice black jacket, and there’s one of your cat/dog hairs stuck on the arm. They’ve never even worn the jacket to your house (they aren’t daft!) - and yet there’s no limit to the reach of your dog hair.
DNA from shed skin cells is even worse. Such tiny quantities of DNA tend to get mixed in with everything else, but the techniques continue to improve. Sequencing is faster and faster, and techniques for rejecting environmental contaminants grow ever more sophisticated.
In the limit, perhaps in twenty or thirty years (as far forwards as the internet is looking back) it might be possible to discover everybody who had set foot inside of a shopping center by sweeping-and-sequencing the dust on the floors. You might well get next of kin and coworker data for a lot of them too. If there is a DNA database to turn such incidental biological traces into identity information then privacy as we currently know it is a thing of the past. Obviously political use could be made of such technology too - unless people are going to start attending riots in drysuits!
Over and over, with biometric information, we are going to discover that the data itself is harmless, but the ability to compare the data with data from other sources is dangerous. Imagine that I am a small shopkeeper with a smart camera that I use for keeping people who have behaved inappropriately out of my shop. There’s not much that can go wrong here: I have 5, 10, maybe 20 people a year on my camera’s memory of miscreants, and if they come into my shop again, it beeps. I could remember the faces myself, but how do I teach my staff to recognize the people I’ve banned? Probably easier to have a machine do it.
If instead that was a futuristic DNA system, it would be no more or less dangerous than a recognition camera. These things, in and of themselves, are not dangerous. Something that aids my memory is not, in itself, bad.
But networked it is a whole different story. If my “shoplifter memory” system analyzes the DNA data of everybody coming into my shop, I could well know more about people’s familial relationships than they do. “That’s not your brother!”
Likewise if I correlate the DNA samples from my shop floor with medical databases: now I know your genetic predispositions to a range of diseases. If science continues to make strong advances in mapping genes to risks this information could be quite valuable - mapping out insurable risks for a health insurer, say. Again, it’s not the data which is dangerous, but the correlation of the data.
Finally, what if I as the shopkeeper network my shop’s DNA scanner output with everybody else’s? Now we can track shoplifters, sure. But we also have to contend with cantankerous shopkeepers who put their former employees into the database as thieves by accident, or otherwise contaminate the “reputation” database associated with each piece of DNA that the network has stored with false information of some other kind. Suddenly we have an enormous capability to identify people, far exceeding our ability to justly and prudently profile them based on their previous behaviour. We cannot actually manage the reputation databases all that well - scandal, hearsay and gossip penetrate into people’s records far too easily - and if these systems have something resembling the weight of law (at least common law) in everyday life, there are clearly huge consequences to even the occasional accidental false report, or intentional and malicious contamination of the history data.
The better we are able to identify people, the more certain we must be that the identity databases which back up those conclusions are flawless.
I don’t think that’s going to be an easy job at all.
While DNA gives us perhaps the best imaginable way to distinguish two individuals from each other in the normal course of day-to-day business, in the process it reveals the exact genetic composition of their body - essentially the “source code” to their biology! We know that an awful lot of very private, very sensitive data is revealed in this template: propensity to cancer, diabetes, depression, resistance (or not) to HIV and many other medical factors are encoded.
There is reason to believe that temperament may also have at least some genetic factors: heritability of temperament is still an open question (nature, nurture, epigenetics and more have to be weighed) but, because we do not know what we do not know, it is probably safer to assume that there might be huge insights into human personality and general cognitive function at the genetic level, and to plan accordingly: to defend our genetic privacy against all attempts to turn our DNA into a casual identity validation mechanism. But, of course, we are still faced with the problem of leaking our DNA into every physical environment we set foot in.
I strongly suspect that the paradoxical nature of DNA - as an identifier, but also as deep insight into the biology and possibly psychology of the person identified - is going to be a defining quandary of the 21st century. As our tools rapidly evolve to allow us to use DNA, can business, government and society keep up and create countervailing protections and indirections, keeping us safe from the transparent nature of at least some of our genes?
Into this gap steps cryptography. The first, easy-to-imagine measure is biometric hashing of DNA data. An arithmetic formula is applied to 750 MB of DNA data, reducing it to a few dozen characters. The formula cannot be run in reverse: any of a trillion trillion or so possible DNA sequences could lie behind that short string. This is just how hashes work in general - those long strings of “hex” (8511b2ee59142cf1ced7e70ff6fca103, for example) can be derived from any digital data source. If one bit changes in the input, roughly half of the output changes, so hashes are (generally speaking) very secure. Indeed, Bitcoin’s security rests on the security of hashes, and on the computational difficulty of finding an input whose hash even partially matches a chosen target, let alone finding two inputs with identical hashes.
Of course, biometrics are (generally speaking) a bit squishy and analogue - they are, after all, measurements of a body. Even DNA sequence matching is usually probabilistic, because even if the data is dry and digital in the abstract, in practice an awful lot of error-prone wet chemistry is performed to get to the matches and sequences.
Most of our current generation of biometric hashing schemes coarsen the data before it goes into the hash, so that a small analogue measurement error on the body has a reduced impact. For example, if a fingerprint line is obscured by a paper cut, the algorithms absorb the discrepancy so that the underlying fact (fingerprint A is the same person as fingerprint B) is not obscured.
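One simple, and simplified, way to build such tolerance is to quantize each measurement into coarse bins before hashing, so small errors land in the same bin and produce the same hash. Real schemes use more robust constructions (fuzzy extractors, error-correcting codes); this sketch, with invented numbers and my own function name, only shows the idea:

```python
import hashlib

def biometric_hash(measurements, bin_size=0.5):
    """Quantize analogue measurements into coarse bins, then hash the bins,
    so small measurement errors still yield an identical hash."""
    binned = tuple(round(m / bin_size) for m in measurements)
    return hashlib.sha256(repr(binned).encode()).hexdigest()

enrolled = biometric_hash([34.10, 62.00, 48.70])
noisy    = biometric_hash([34.12, 61.97, 48.68])  # same finger, slight error
other    = biometric_hash([36.50, 64.20, 51.00])  # a different person

assert enrolled == noisy
assert enrolled != other
```

The obvious weakness is the bin boundary: a measurement sitting right on an edge can flip bins, which is exactly why production schemes use error correction rather than plain quantization.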
In theory, a biometric hash of your DNA ought to be a pretty reliable artefact: your DNA sequence does not (for the most part) change very much, although within muscle groups, organs, the brain etc. the parts of the DNA which are active and inactive are changed by diet, exercise, heredity, stress, use, and many other external factors. Changes in the active and inactive sections of DNA do not alter the underlying code, so while these factors are medically interesting, they do not affect our gene sequences for biometric purposes.
The same kind of biometric hashing can be applied to fingerprints, iris scans and various other kinds of measurements of the body, with differing degrees of success, confidence and precision.
The problem is that to prepare a biometric hash, I require a complete biometric: I need to have a sample of your DNA, which I sequence, to check your DNA against a biometric hash in a database or in a blockchain. In theory I might delete that data immediately, but in practice the temptation to filch biometric data (or even biometric samples) to pry deeply into people’s bodies and potentially minds will likely be too strong to be reliably resisted. It’s hard to imagine a world where somebody steals some cells you left on the arm of a chair in your doctor’s office, and the resulting data set doubles your life insurance premiums because of a family risk of some expensive disease or other, but this is exactly the kind of world that cheap DNA sequencing and interpretation of results takes us into.
To get back out of this world (probably best visualized in the movie GATTACA) we would have to mount a comprehensive effort to control DNA sequencing equipment, so that it doesn’t simply wind up lying around to be used by anybody who can swab some DNA off a pair of shoes you once handled in a store. Perhaps we could envisage a future in which a canonical copy of your DNA sits with some kind of highly secure agent who protects it on your behalf, and allows medical personnel with the correct consent forms and identification to access your most private DNA records.
We can be sure that handling personal DNA records is going to be at the core of the hardest problems in future privacy systems. It’s also likely to be at the nexus of interests who want to be able to uniquely identify humans. Such a future is likely to be highly dependent on cryptography for holding the records unreadable to the people storing them (a simple single blind system), and perhaps for authorizing and logging access by whatever legitimate users might need to access your DNA records. This is not a scenario to be modelled in detail now, in this discussion, but some of my earlier work touches on a less cryptographically heavyweight solution to these problems.
So where this puts us is staring at two areas of incredibly rapid and sophisticated technology development, both of which have a school of thought which views them as keys to solving the problem of doing business with people we don’t know.
On the blockchain side we get objective permanent record storage without creating an intermediary like VISA. Plus we get things like anonymity, pseudonymity, public key infrastructure for figuring out who we’re talking to without having to meet them in person and various other useful properties from the cryptographic systems which typically go right alongside or within blockchain technology. This is a near-ideal scenario for many kinds of identity, but it ties quite poorly to the legacy identity infrastructure provided by the nation state.
Here’s why. The model of identity encoded in Bitcoin is cryptographic keypairs: you hold a secret (a key) whose possession you can prove to other people without ever revealing it. So we can prove (mathematically) that a person who knows a secret (X) is the person who (for example) signed both document A and document B. We cannot tell anything about this person - not a name, not an age, not a face - without additional information. One method by which that information is traditionally attached to a keypair is called a “keysigning party,” in which people come with their passports and a computer, take the name on each passport, and sign a statement that a given cryptographic key is under the control of the person who matches that passport. While a single attestation of this kind may be considered quite weak, a few dozen attestations of this sort, including some from people you know or from well-known honest public figures, turn out to be quite a comprehensive validation that a passport matching this person’s face exists.
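To make the “prove you know a secret without revealing it” step concrete, here is a minimal sketch of a Lamport one-time signature, built from nothing but a hash function. This is not Bitcoin’s scheme (Bitcoin uses ECDSA, and a Lamport key should sign only one message before being retired), but it shows the principle: verification needs only the public key, and a signature over document A does not verify over document B:

```python
import hashlib
import secrets

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def keygen():
    # 256 pairs of random secrets; the public key is their hashes.
    sk = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    pk = [(H(a), H(b)) for a, b in sk]
    return sk, pk

def bits(msg: bytes):
    # The 256 bits of the message digest select which secrets to reveal.
    digest = H(msg)
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(256)]

def sign(sk, msg: bytes):
    # Reveal one secret from each pair, chosen by the message-hash bits.
    return [pair[bit] for pair, bit in zip(sk, bits(msg))]

def verify(pk, msg: bytes, sig) -> bool:
    # Hash each revealed secret and check it against the public key.
    return all(H(s) == pair[bit] for s, pair, bit in zip(sig, pk, bits(msg)))

sk, pk = keygen()
doc_a, doc_b = b"document A", b"document B"
sig_a = sign(sk, doc_a)
print(verify(pk, doc_a, sig_a))   # True
print(verify(pk, doc_b, sig_a))   # False: the signature is bound to the message
```

The bulk of the revealed secrets stay hidden (half of each pair), which is the hash-based analogue of proving control of a key without handing the key over.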
However, in this paradigm, most uses of this secret key and the associated identity are private - personal email, say. In the blockchain paradigm, at least at present, the vast majority of actions undertaken with a key are public: attestations do not become “real” (socially agreed as admissible in an argument, say) until they have been lodged in a block chain. So if we link your real name to a secret key, and then put all the actions taken with this secret key into a blockchain, you have essentially zero privacy. The Bitcoin model is that people’s privacy was preserved specifically by not linking their passport name to their secret key, so you could only tell how much Bitcoin a key owned, not who owned the key.
When these two models collide, what’s left is a kind of panopticon: a permanent record, tied to your real name, in a medium which cannot be erased. The “web of trust” model (the old name for the passport-and-secret-key type activities described previously) and the blockchain model are fundamentally different models of privacy, each one of which works reasonably well within its parameters.
But when they are combined, at least in an unsophisticated way, the whole is less than the sum of its parts: nobody (except maybe David Brin) wants a world in which everybody can see everybody else’s financial transactions all the time, right down to kids inspecting their parent’s Amazon purchases around Christmas.
So if we aren’t going to enjoy a naive tie to the passport name infrastructure provided by the government, what about biometrics?
Let’s consider that case from the beginning, taking the worst case first. I put my DNA onto a blockchain - not a hash, the whole thing. I then link it to my secret key. We will examine some of the mechanisms for linking a DNA record to a secret key later on; for now, just assume we do that part in some convenient way. Now we are in a position where anybody in the world can estimate my risk of heart attack by taking my genetic profile and matching it to my purchase history. If 100% of what I buy goes through the blockchain (a scenario many “Bitcoin maximalists” would predict) it’s not at all unreasonable to think that random third parties could make an educated guess, from genetic and lifestyle factors, at what my heart attack risk might be.
In a less extreme example, what if I tie a keypair to my fingerprint hash - so the blockchain doesn’t record my fingerprint directly, but stores information which can be extracted from my fingerprint. Again, we have the permanence problem: future science figures out how to reverse from a fingerprint hash to a fingerprint, and we are back at people 3D printing rubber fingers to open my doors. Or people can take casual fingerprints I leave behind when, say, examining goods in a store and recover my identity from the blockchain and use my purchase history (again, all on the blockchain) to send me devastatingly personal targeted advertising.
The “public profile on a blockchain” model just does not seem to have the kinds of properties we want our societies to have: it’s the wrong kind of transparency. There’s an essay at least this length on different kinds of transparency. It’s probably already been written by somebody who sees the world mainly through that lens: if you know what it is, please put it in the comments!
So if we can’t do the naive thing, what might work? What do you have to do to get blockchains and biometrics to mix?
So what if we simply put all of our biometric information on the blockchain, accept that friends and family will hector us about eating dessert based on what our genes say about our risk of heart disease, and let life go on?
This is another “path not taken” situation. There were early proposals for containing the HIV epidemic with compulsory testing - extreme proposals, proposals that might well have destroyed quite a few people’s lives. Over the course of the epidemic, over half a million people are thought to have died of AIDS. Suppose a compulsory testing approach had been adopted: would life have gone on? Yes: it would be a different world, but it would not be the first time that society has taken a dramatic wrong turn in a critical situation, and the world has muddled through. Just look at how badly we screwed up on breaking the atom! So there are clear historical precedents for this kind of radicalism. The Personal Genome Project at Harvard has (anonymous) individuals publishing their genomes for public use, and it is not at all unlikely that keen interest is already being taken in the datasets of groups like 23andMe, who see an awful lot of genetic data.
We are, in a sense, quite a long way down this path already. Genomes have not been hacked, stolen and published yet (at least not as far as I know) but it’s only a matter of time until raiding celebrity medical records results in genetic data being leaked, or simply sold on the open market.
For our older readers: yes, this is the cyberpunk future you were promised.
So what kinds of things might change if we accepted that, over time, most of us were likely to be identified by our genomes: brave volunteers first, then a few leaked celebrities, then (as nothing particularly bad happens) everyone else, and life goes on? I think we would see three broad kinds of change.
I have no doubt at all that after a few generations, this would all seem as completely and perfectly natural - with all of its utopian and dystopian features - as having every click we make on the internet used to target advertising to us from semi-visible corporations whose models of our behavior focus on which buttons they should push to make us buy stuff, or the knowledge we now all have post-Snowden about the likely fate of our electronic data with respect to various aspects of the US state.
We just get used to things. We did it with the Cold War. We are currently doing it with climate change.
It’s just what we do.
We’re going to do a lot of adapting to technology in the next few years. I’ve lived my entire life without driving (a neurological glitch affects my ability to judge distances) and pretty soon I expect self-driving cars will be an option. Odds are I’d never own one either: on-demand use would do. The delivery drones won’t take long to arrive after that, and so on. You might think we could balance out this whole identity mess, but no settlement will last longer than a generation or two. Then it’ll be cloning, or personality uploading, or people taking illegal genetic modification agents and turning themselves into big lizards.
So try thinking of this as a continual attempt to fit technology to “the social good” or “progress” or whatever your chosen set of virtues (“Liberté, égalité, fraternité”) happens to be. This is the role of critical technologists - to understand that we have choices, that each technology or set of technologies comes with a set of options, a set of decisions, about how we will integrate it into our lives, our societies, and our culture.
We aren’t particularly good at this steering process. In fact, it’s very nearly cost us the world a few times already in the last two centuries. So as we examine some slightly less radical solutions than “let’s just dump all of our genetic and biometric data into the public domain and figure it out” think about this as a critical technologist.
We must negotiate with each incoming set of possibilities, each possible future, and individually or collectively we adopt or ignore the offers that these futures make us. And this is fundamental: if hotheads in the American government had forced compulsory HIV testing as their front line response to the AIDS crisis, the human rights crisis created might be running to this very day.
So put on your critical technologist hat, and come with me through the next two options.
Let’s reframe what we are talking about: there are two amazing technologies marching rapidly into our culture - blockchains and biometrics. Apple Touch ID meets Bitcoin. The simple version of that story isn’t so scary at all: the Bitcoin wallet on your iPhone is protected by your fingerprint. That all seems very good. But the same architecture then gets tied together at wider and wider scales.
Now we’ve gone from a real convenience - simple passwordless protection for your digital cash accounts - through to an all-encompassing, all-embracing solution to matching identity to actions. Each technology independently matures, and the intersection between the technologies becomes huge, intrusive, massive.
And that’s how simple, fast and painless this could all be: incremental adoption of technologies following one on another much as we did with the internet and wound up with continent-spanning robot warehouses which will deliver ice cream and red wine to a dinner party if you order them half an hour before your guests arrive, at least in major cities.
So keep your eyes open for those weird little weak signals which indicate the shape of things to come.
To expand, our second option is to work with biometrics and lots of other personal information (like spending records) on a blockchain, but this time to encrypt as much as possible as we go. For example, a naive blockchain implementation of identity might sign your real name into the blockchain, but encrypted with keys which are closely held, so only financial institutions and the state can use them. However, keys being keys, they leak: consider DeCSS, where the leaked DVD descrambling code - famously encoded as a single “illegal” prime number - enables one to play DVDs on devices which the industry considers improper.
But with the right kinds of key management schemes - information compartmentalization - a public register can still provide useful kinds of truth, but reflected as in a multi-faceted crystal: I give my bank a key which reveals another aspect of my identity, and they can know for sure that this key reveals a past which actually happened, because all that information is on the blockchain.
This seems, in principle, quite stable. But the information contained in these documents should last at least a human lifetime, and 100 years from 2015 is a very, very long time indeed. While I’ve generally been pretty shy about the Singularity and its friends, it is worth considering that most of our current generation of crypto will be long, long obsolete by 2115 - a century (not an unusually long lifespan these days) will give us computers a thousand trillion times faster than today’s, if speeds keep doubling every two years.
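That “thousand trillion times faster” figure is easy to check, and it translates directly into lost key strength, since each doubling of attacker speed costs roughly one bit of effective symmetric key strength:

```python
# Sanity check on the claim above: doubling every two years for a century.
years = 2115 - 2015
doublings = years // 2          # 50 doublings in 100 years
speedup = 2 ** doublings

print(doublings)                # 50
print(f"{speedup:.2e}")         # ~1.13e+15, i.e. about a thousand trillion

# Each doubling of attacker speed erases roughly one bit of symmetric key
# strength, so a century of this shaves ~50 bits off any fixed-size key.
print(f"effective strength lost: ~{doublings} bits")
```

This is the simple exponential story only; quantum algorithms (the subject of the next paragraph) make the picture worse for asymmetric crypto specifically, not merely faster.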
Quantum computers, new mathematics (particularly number theory affecting prime numbers) and so on will accelerate the pace at which the code making up blockchains gets broken, albeit that it will still run nicely - it just will no longer be remotely secure! People will be able to rewrite anything, perhaps even the past, if they have a big enough computer.
And this is not something we can patch, this is inherent to the nature of all of our public key / private key (“asymmetric”) cryptosystems.
In this model - a blockchain with material encrypted to various special-purpose keys, allowing certifications like “this person is over 18” without having to out their full identity - there has to be technological migration as algorithms become obsolete. You have to pull the data out of one format, decrypt it, and re-encrypt it with a cipher expected to last for at least another few decades, and this dance becomes a standard part of how we move our identities into the future across several rounds of profound technological change.
Quantum computers, anybody?
The strategy of using encryption to protect facts on the blockchain can only work if there’s some way to get rid of the old data, or if the old algorithms last long enough that we are dead.
And there’s the catch: in all probability there will be no meaningful way to delete (finally, really, with no copies remaining) data which is being migrated to new cryptographic algorithms. That implies there is no reasonable way to keep information on blockchains which is encrypted and likely to remain sensitive over a 30 or 50 year period. Information disclosed in near-absolute security now is likely to be broken into in future.
At the 100 year horizon, one popular key strength estimation website suggests that a 10,000 bit key would be required for security. Today even the most paranoid only use a 4,096 bit key.
So this suggests that keeping our private data private, well enough to protect young children today - or those born in 20 years - requires standards which extremely aggressively model and map technological change, and deploy algorithms with the best possible chance of being secure well over the farthest horizon that we can imagine.
Most of our record systems were never designed to last forever. Some civilizations used clay tablets which lasted essentially indefinitely, others used palm leaves which have fared a little less well than baked clay.
Given how fraught everything to do with identity has been in the 20th century, from genocides run on reading people’s government issued ID cards (more than once) through to the current ongoing debate about how much of our personal life only exists because Facebook curates it for us, perhaps what we need to consider is:
It seems unlikely that the NSA will have held medical records sacrosanct.
It seems unlikely that they will look past any large repositories of genetic information, particularly if it turns out that there are some genetic sequences which can be identified as behaviorally linked to (for example) terrorism.
Perhaps, then, what is needed is a rather different approach to blockchains, trust, and the public sphere.
Blockchains are great for pointing out things that we want to prove to other people: “look, it is right here, in the blockchain.” This is great for things like property registers, but perhaps not quite so great when it comes to all the things we want to selectively disclose.
So what if the architecture is a little different: what goes in the blockchain isn’t the data - or, in many cases, even the hash of the data. Instead, we store a signed statement from a seriously credible source, or the necessary information required for a zero knowledge proof. A thing we might often store would be a public key, or a set of statements-and-signatures on a public key. It might be a script to perform some function, but without tons of personal information on it.
This is a transactional vision of the blockchain: something that’s a little less about storing the entirety of our being on a permanent record, but more focussed on disclosing the minimum required for people to be able to do business (or other critical functions) together.
Maybe information is partitioned: your school grades, but not your name, all separated with a zero knowledge proof. If you have to prove that those are, in fact, your grades - well, that’s what that public key is for. You demonstrate possession of the relevant private key - you know the secret - without revealing the key. Every piece of personal information about you is stored with a different public key: you have a big fat keyring, and each key reveals only a single fact.
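A minimal sketch of the “one key per fact” idea, using salted hash commitments as a stand-in for real zero-knowledge machinery (opening a commitment reveals the fact itself, which a true zero-knowledge proof would avoid; the labels and data here are hypothetical):

```python
import hashlib
import secrets

def commit(fact: str):
    """Commit to a fact: publish the commitment, keep the salt as the 'key'."""
    salt = secrets.token_bytes(16)
    commitment = hashlib.sha256(salt + fact.encode()).digest()
    return commitment, salt

def reveal_ok(commitment: bytes, salt: bytes, fact: str) -> bool:
    """A verifier, given the salt and claimed fact, checks the public record."""
    return hashlib.sha256(salt + fact.encode()).digest() == commitment

# The 'blockchain' holds only opaque commitments, one per fact.
facts = {"grades": "A, A, B", "over_18": "true", "name": "J. Doe"}
chain, keyring = {}, {}
for label, fact in facts.items():
    c, s = commit(fact)
    chain[label] = c       # public and permanent
    keyring[label] = s     # private; hand out one salt per verifier

# Disclose grades to an employer without touching 'name' or 'over_18':
print(reveal_ok(chain["grades"], keyring["grades"], "A, A, B"))   # True
print(reveal_ok(chain["grades"], keyring["grades"], "A, A, A"))   # False
```

The design point is that each fact has its own opening key, so disclosing one fact gives the verifier no leverage over the others on the public record.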
And maybe those keys have blind signatures or proxies or similar arrangements which prevent somebody noticing that Applicant for Job 1 has the same “see my proven grades” key as Applicant for Job 2. We are not helpless in this mess: a certain perspicacity - an awareness of how long time is, how fast things pass, and the sheer complexity of the upcoming 21st century - leads to a sagacious approach: information partitioning, database translucency, temporary credentials for passing needs, being economical with disclosures, even guarded. These trends are very different from what has become normative in the social media age across the liberal democracies which are still, just barely, the majority of the internet. But that kind of blind trust that the data you publish will never, ever be used against you is very much a product of liberalism. People from countries with more to fear, or cultures which were ruled by aggressive empires, have quite different norms when it comes to personal disclosure.
Although preferring legal protection to technical protection sounds like naivete, inadmissibility in a court of law is still a very serious problem for data gathered by illicit technical means. Data protection legislation prevents organizations with shorter perspectives or less technical sophistication from simply throwing critical facts about us on a blockchain and then apologizing when it turns out that there’s no way to get it back down again.
Actually, it doesn’t prevent such careless behavior, but it at least punishes it.
So let’s factor all of this back together, and take a crack at defining identity and speculate about some possible stable equilibria in identity that the blockchain might enable.
We can’t go on like this; the systems we have just will not work. Identity theft, including complex identity frauds that run for years, is a huge criminal enterprise which keeps wrecking people’s lives. Many people sense huge economic opportunities in straightening out the identity mess and making things work, but we have to be sure the solutions proposed do not simply become the next generation of problems. Finally, as with my work on Simple Critical Infrastructure Maps, we have to be person-centered if we want to find a stable point which will survive the extremely rapid rates of change which are so much a part of our twenty-first century. Put the person in the middle of the story, and work out from there, and generally speaking technological change will not break your model: people are, for now at least, people. Ask me about this again in 2050.
So what can we do that works?
I believe in a simple division of the identity space into two parts.
Profiles are a set of opinions that somebody has about me. These might include things like medical records, which are stories about me. Note that I’m very carefully describing these as stories because I, as an individual, have no idea what is in your profile about me.
I cannot validate it. I might not even know it exists, and it might be filled with utter stuff and nonsense. Your profile is your business, even if it is about me.
Much of the direction of our society right now is towards merging profiles - combining credit records and social media to give ever more accurate forecasts of loan risk, that sort of thing. The problem is that each profile contains within it a set of assumptions (a hidden frame, a perspective) which is incommensurable with much of the other data in such a combined profile. While these errors are small at first, the inevitable semantic problems of profile combining will always lead to problems which people (in most cases) will never be able to correct, because the profiles about them are largely secret, private, or shared between cartels, but never shown to the people they are about. Public profiles (for example, a university which publicly lists student grades) might have quite a bit of utility in some cases - see “privileged profiles” below.
Actions started out being called, in my mind, a volitional register - a log of the places where I have made choices. These would typically be key life decisions, like signing an employment contract, where the evidence that the signature was valid and accepted is a visible flow of money ideally across the blockchain itself. Another good example would be a mortgage payment. In all cases actions ideally lead to a stream of payments into or (better still) out of an account, something which shows that the contract was actually performed.
Who are you? A complex construct inside the head of a naked ape that looks at the sky and wonders, or (in fact, in the final analysis) the sum total of your mortgage payments?
If I’m looking for an identity, the person who’s paid the mortgage for 18 years is a very, very tangible person - a person who’s got £216,000 of evidence over almost two decades as an escrow, a proof of trust.
It’s not even about the property, it’s about the sheer historical fact of the payments which, particularly in a blockchain environment, constitute a visible, transparent impact of a decision made. You sign the contracts, you make the payment, and that identity is very solid. Same thing applies to student loans and almost any other major contractual commitment. We should not be surprised that the same foundational pillars which banks have been using for generations turn out to still provide support in this new world.
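The arithmetic behind that figure is simple, and a toy version of such an action register shows why the history is persuasive: it is the unbroken run of payments, not any single one, that constitutes the evidence (the amounts and dates below are illustrative):

```python
# £216,000 over 18 years corresponds to a £1,000/month mortgage payment.
monthly_payment = 216_000 / (18 * 12)
print(monthly_payment)   # 1000.0

# A toy 'action register': 18 years of monthly payment records, checked
# for completeness - a gap-free run is what makes the identity tangible.
months = [(y, m) for y in range(1997, 2015) for m in range(1, 13)]
ledger = [{"period": f"{y}-{m:02d}", "amount": 1000} for (y, m) in months]

total = sum(p["amount"] for p in ledger)
no_gaps = len(ledger) == 18 * 12
print(total, no_gaps)    # 216000 True
```

On a real blockchain each entry would be a signed transaction rather than a dictionary, but the verification logic - sum the stream, check for gaps - is the same.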
Now, in a pre-blockchain environment, actions and profiles get confused, because more or less the only way I can get to your actions is by asking somebody else (your bank) what you did. This confusion is pervasive. On a blockchain, because my actions are self-certifying, like performative speech acts, I can present you with proof of what I have done without any third party having to vouch for me. This identity is all mine: self-sovereign.
So we have to mentally separate it all out: my proof of what I did is on the blockchain, and you can see it if I tell you which public key corresponds to my actions. Other people’s profiles of my behavior might even be on blockchains, but if I didn’t sign the transaction, it’s not “me” in the same way as my digital signature is “me.”
Note that, for the moment, I’m holding back on things like key management and how we manage lost keys and similar issues. There are approaches to those problems, but let’s get the epistemological basis straight to begin with, then manage the resulting security issues.
Now profiles, as noted, are other people’s stories about me and my behavior. They could be riddled with errors, or bound to me by mistake: somebody else uses my health insurance number because of identity fraud, resulting in me having medical records showing I’ve been prescribed drugs which, in fact, I have never taken.
Similarly, my actions are not my stories about myself: they are the evidential trail of things I have done, and preferably are bound to a keypair.
Not everything is a profile or an action. Let’s deal with some of those cases.
My profile about myself is a special case: a set of things I say are true about myself (“I like cold beer”) which (still) may or may not be true. But because they originate with me, and I might well sign those statements with the same keys I used to pay my mortgage, you might be inclined to believe those stories. They’re a special case, particularly if I am an honest dealer.
On the other hand, there are privileged profiles. These might include profiles about me from people you trust more than me: if Harvard says I did a degree there, most people will believe them. My proof-of-payments on a student loan is pretty good evidence too, but most people would prefer to hear about my grades from Harvard itself.
Of course, the combination of their privileged profile and my action records is pretty much unbeatable: “they say I got a law degree, and here’s my evidence that I’ve paid for one.”
In the more general case, a privileged profile is a profile that I sign, stating that I (as the person the profile is about) currently consider it to be correct, as far as I can see. Examples of privileged profiles we might see a lot of: medical records, tax records, court transcripts, but also less official documents - my score record as a player on the local neighbourhood baseball team could become a privileged profile.
You make a statement about me. I approve that statement about me. That’s a privileged profile. It’s privileged because I said it’s true, not because it was signed off on by the Supreme Court!
Now let’s talk about cryptographic keys. Specifically, about PKI (Public Key Infrastructure) which is a fancy way of saying “address books and cancellation notices.”
You use PKI every day. Every time you go to Amazon or Google, a certificate makes the “https” goodness happen - the all-important protection of your privacy over plain http, indicated with a humble single letter. If your credit card goes over the wifi without https, it can be stolen at any point between your computer and Amazon’s data center. This is highly problematic, and a lot of time and effort was spent in the 1990s making sure these systems behaved properly. Remember, the web was never built with credit-card-grade security in mind; it was made for slinging physics papers around!
So the security in the “https” comes from knowing the encryption key that Amazon wants you to use to protect your data when you log in. Your browser has a list of parties it trusts to verify other people’s identities, and Amazon presents a signed document from one of those parties saying “This key belongs to Amazon, please encrypt your payment details with it when you go to their web site.” What’s interesting is that this is clearly a profile: Amazon has no way of identifying itself to you except through the profile provided by this certificate authority. Your browser auto-trusts the registrar, and that’s how you get to Amazon to do your Christmas shopping.
Of course it would be lovely if Amazon had some more direct way of identifying itself to me, particularly since the certificate providers get paid when they issue certificates, and are therefore at some risk of simply handing out certificates any time somebody tells them a reasonably plausible story. It’s not as if they get sued when something goes wrong and a certificate registrar lets a fake Microsoft certificate into the wild, for example.
But as things stand there’s no way for Amazon to show me credentials directly to prove it is who it says it is. I can’t use, for example, the purchase records (profiles) provided by 80 million of my neighbours to validate that a site purporting to be Amazon really is who it says it is. It all has to go through the certificate registrar system, and if any one of 1500 or more registrars makes a mistake, fake Amazons can spring up, skim your credit card when you try to order, and you may never be any the wiser.
The Certificate Authority is an antipattern, unless perhaps the authorities are also financially liable, with very solid insurance you can claim against if they issue a bad certificate which causes people to be defrauded when they try to reach your business.
I can’t say exactly how we will do PKI in future, but I am confident of at least two things...
There is an enormous amount of room for innovation in public key infrastructure now that we have blockchains to work with. In particular, Simple Public Key Infrastructure is a very good fit for the kinds of social interaction we do in social networks (spanning friends-of-friends for key distribution), and the blockchain itself is more or less the perfect medium for distribution of key revocation certificates.
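A sketch of the blockchain-as-revocation-medium idea: an append-only public set of revoked key fingerprints which everyone checks before trusting a key. (The fingerprint scheme and names here are illustrative, not SPKI's actual format.)

```python
import hashlib

def fingerprint(pubkey: bytes) -> str:
    """A short identifier derived from a public key."""
    return hashlib.sha256(pubkey).hexdigest()[:16]

# An append-only set standing in for revocation certificates on a chain:
# once a fingerprint lands here, it never leaves.
revoked_on_chain = set()

def revoke(pubkey: bytes) -> None:
    """Publish a revocation: this key should never be trusted again."""
    revoked_on_chain.add(fingerprint(pubkey))

def is_trustworthy(pubkey: bytes) -> bool:
    """Check the public register before trusting a key."""
    return fingerprint(pubkey) not in revoked_on_chain

alice, mallory = b"alice-public-key", b"stolen-public-key"
revoke(mallory)
print(is_trustworthy(alice))    # True
print(is_trustworthy(mallory))  # False
```

The properties a blockchain brings here are exactly the ones revocation needs: the register is public, append-only, and has no single party who can quietly un-revoke a stolen key.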
I’m all ears if anybody wants to innovate in these areas.
If we do not find a compelling alternative to profile-based identities, we are going to live in a world in which we exchange constant surveillance for services, from Facebook through to our own governments. The convergence of interests between national security and advertising is one of the most unholy pacts made in our time.
To get out of this hole we have to find another way to pay for the services we rely on.
Blockchains provide every possibility of efficient payments for small services - a dollar here and there - which might be more profitable for many services in the long run than advertising.
I think I’ve made a good case that action-based rather than profile-based identities are possible for at least a large subset of adults. Minors, particularly those too young to even have much profile, still represent a very hard problem in the identity space, not least because of extremely strict data protection laws. But action-based identities are very, very rigid things, and represent (for a good number of people) a credential nearly as hard as a State ID, and with considerably better privacy implications in many cases.
The temptation is always going to be to nail people to the meat. Leaving aside the history of putting serial numbers on human beings with tattoo guns - which should be enough to kill this idea forever - the Office of Personnel Management hack breached the IDs of 22 million US government personnel.
It’s absolutely clear that there is no way to protect biometric data in the real world, at least not with current levels of data security technology. It’s possible we’ll continue to see this issue fudged and hedged with biometric hashing of limited biometrics (“left pinkie only!”) but, in the long run, biometrics probably shouldn’t wind up in databases at all - or they should wind up in blockchains, where everybody knows exactly what has been exposed and what has not. I think in particular there’s a good case to be made for facial biometrics (already exposed for most Facebook users, or anybody who walks around in cities with CCTV cameras).
What this leaves us with is a tentative path forwards. Put some things into blockchains, particularly things like the public keys of major organizations, and certificate revocations for people whose keys have been stolen or whatever. Use additional cryptography to provide things like zero knowledge proofs of age without revealing identity. Compartmentalize information about our various roles in ways which make it hard to join profiles without our consent.
The result of this “wise as a serpent” approach might be a mixed ecology of identity solutions, which serves us well enough to navigate around our world, without opening up some of the scary terrain which surrounds us on all sides as these various technologies change, evolve, mutate, mature, and form a continuous surface on which the next phase of human history will be played out. We clearly need new identity infrastructure, but we also need time to experiment, scale experiments, and understand where we are in the ever-changing terrain of the new which so much marks our current lives.
What is not disclosed can be disclosed in future, but what is published now is published forever, encrypted or not.