Articles

  • Genomic Cryptography, Network Functions, and Hypergraphs

    This is a more speculative post, but I had a thought which I want to explore. I haven’t worked through it rigorously, so take what follows with a grain of salt because there may be algorithms I don’t know about that render one or more parts of it moot.

    The core requirement of cryptographic schemes is that they be easy to compute given all the right ingredients, but functionally impossible to compute otherwise. Typically this is accomplished via functions which can be computed very quickly, but can’t be inverted by any known P-time[1] algorithm. Quantum computers thoroughly bork that for the classic approach used in most security systems, but post-quantum cryptographic methods are emerging to address this vulnerability.

    An intuitive example of this was given by Steven Rudich, a computational complexity theorist, who phrased it as “I can recognize great music, but I can’t create great music.”[2] While it can sometimes be a questionable endeavor to extend metaphors with additional metaphors, that quote combined with a recent bit of news to spark a thought for me.

    This past week, the United Kingdom’s ARIA (Advanced Research & Invention Agency) posted a draft program thesis for the program which I’ve been most eagerly waiting to hear more about. While the announcement of this cohort of program directors included mention of some early bits of the idea under the info on Alex Obadia, the director of this particular program, we now have a bit more detail. Titled “Trust Everything, Everywhere”, it proposes to extend security and trust systems to new contexts and substrates–many of which have previously seen little to no attention outside of small groups of researchers.[3] Among the substrates they list are non-biological materials, DNA, and brains. Neurosecurity being one of my main interests, that topic is addressed elsewhere on this blog, and will continue to be in future posts. The idea of using DNA for security was new to me, however.

    Most of the advantages emphasized in research on DNA cryptography are linked to the advantages of DNA computing more broadly.[4] Per unit mass, DNA offers extremely high information density, enabling more efficient storage. It’s also highly stable, and because DNA is soluble the volumetric density of parallel computations can be impressively high. While less of a computational advantage and instead more of a unique feature, DNA computing–and thus equally so a DNA cryptography scheme–could be embedded in living organisms. On the delightfully cinematic extreme, one could imagine a spy flick with a dead drop where one spy empties a sample of water with bacteria into a puddle, then later another spy collects a sample of water from that puddle, takes it to the DNA sequencer back at base, and applies a secret key to the various genomes found in the puddle until they identify the message encoded in one strain of bacteria via DNA steganography.[5] I suspect nerds everywhere would be swooning over James Bond, PhD.

    However, there doesn’t appear to be much research related to leveraging the computational properties of DNA as it’s used in organisms. What immediately came to my mind is the possibility of leveraging this to generate functions which are easy to “compute” by mixing all the right ingredients, but virtually impossible to solve without them.

    Take, for example, the CRISPR-Cas9 system. From a computational perspective, this is a highly flexible function which bacteria evolved for computing on viral genomes. Now, in the wild these “computations” are actually just neutralizing a virus, as CRISPR functions as a sort of immune system for many species of bacteria. But the generality of the system, especially as applied already in genetic engineering research, produces pathways towards exactly the kind of combinatorial explosion in possible solutions which one wants from a good cryptographic scheme.

    Let’s imagine, for example, a set of DNA sequences which encode (among other things) everything we need to generate CRISPR-Cas9 apparatuses with specially designed guide sequences.[6] By having certain guide sequences target others, we could design the guides and the rest of these sequences such that adding them to a solution in different orders would produce different final outputs. For example, suppose the following set of interactions:

    A inactivates B

    B inactivates C

    C inactivates A

    D inactivates B

    Suppose that the output of the function is chosen to be “Which of A, B, and/or C are intact following the sequence of additions?”

    For the addition order D, B, C, A, the output is C.

    Because D is present, when B is added, B gets inactivated. Since B is inactivated, it doesn’t affect C when the latter is added. When A is added, C (still intact) inactivates it, leaving only C among the sequences we’ve chosen as our outputs.

    For the addition order C, B, A, D, the output is A. B inactivates C, leaving A to inactivate B. D then does nothing, since B is already inactive.
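
    To make this concrete, here’s a minimal Python sketch of the order-dependent evaluation. The rule encoding, and the assumption that a sequence inactivated on arrival never acts, are my own simplifications for illustration rather than anything from the wet lab:

    ```python
    # Toy model: each active sequence inactivates one target ("X inactivates Y").
    RULES = {"A": "B", "B": "C", "C": "A", "D": "B"}

    def evaluate(order, outputs=("A", "B", "C")):
        """Add sequences one at a time and return which outputs remain intact."""
        active = set()  # sequences added so far and still intact
        for seq in order:
            if any(RULES.get(a) == seq for a in active):
                continue  # an already-present sequence inactivates the newcomer
            active.add(seq)
            target = RULES.get(seq)
            if target in active:
                active.remove(target)  # the newcomer inactivates a present target
        return sorted(s for s in active if s in outputs)

    print(evaluate("DBCA"))  # ['C'] -- matches the first worked example
    print(evaluate("CBAD"))  # ['A'] -- matches the second
    ```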

    Generalizing this, even with simple one-to-one interactions there are N! possible addition orders (and thus up to N! distinct outcomes) for N elements. The “order of operations”, so to speak, means that the specific order which outputs the desired solution can serve as a cryptographic key.

    However, one could flip this around so that the order itself is the output of the function. How would this work? Well, suppose you have some collection of sequences from which one or more are missing. You’re given the first and last sequences in the order, and want to compute the rest. If you knew all the sequences, my intuition is that there’d be a P-time graph algorithm to find an ordering (essentially a path through the graph) which would get you from the first entry in the list to the last entry. However, with one or more DNA sequences missing, you’re back to N! possibilities. It’d likely be feasible to narrow down the space, but properly designed, the number of potential orderings would probably remain quite large. The asymmetry between the problem of finding the (or an) order given all sequences vs finding the order given only some sequences might make for a viable cryptographic function.
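
    To illustrate the “easy with everything in hand” half of that asymmetry, here’s a sketch of the kind of P-time search I have in mind, under the assumption that any path from the first sequence to the last counts as a valid ordering. (One caveat worth flagging: if the ordering were instead required to use every sequence, this would be the Hamiltonian-path problem, which is not known to be solvable in P-time, so the precise formulation matters.)

    ```python
    from collections import deque

    def find_order(edges, first, last):
        """Breadth-first search for an ordering (path) from first to last."""
        graph = {}
        for a, b in edges:  # directed "A interacts with B" links
            graph.setdefault(a, []).append(b)
        queue, seen = deque([[first]]), {first}
        while queue:
            path = queue.popleft()
            if path[-1] == last:
                return path
            for nxt in graph.get(path[-1], []):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(path + [nxt])
        return None  # no ordering connects the two endpoints

    # The interaction set from earlier, read as directed edges:
    edges = [("A", "B"), ("B", "C"), ("C", "A"), ("D", "B")]
    print(find_order(edges, "A", "C"))  # ['A', 'B', 'C']
    ```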

    But suppose we weren’t content with that. Well, transcription factors can alter the expression probability of a given section of a DNA sequence, and their inclusion takes us from the realm of graphs[7] to that of hypergraphs[8]. Rather than the more binarized operations of the preceding example, transcription factors will primarily impact the probability of a given bit of DNA sequence being active, meaning that we have another lever (or some large quantity of them) for encoding distributional information.[9]
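
    As a toy illustration of the hypergraph view (every factor name, target set, and multiplier below is invented for the example), each transcription factor can be modeled as a hyperedge joining it to all the sequences whose expression probability it shifts:

    ```python
    # Hyperedges: (transcription factor, set of affected sequences, multiplier).
    HYPEREDGES = [
        ("TF1", {"A", "C"}, 0.4),       # TF1 represses A and C
        ("TF2", {"B", "C", "D"}, 1.8),  # TF2 up-regulates B, C, and D
    ]

    def expression_prob(seq, active_tfs, base=0.5):
        """Probability that a given sequence is expressed, given active factors."""
        p = base
        for tf, targets, mult in HYPEREDGES:
            if tf in active_tfs and seq in targets:
                p *= mult
        return min(p, 1.0)

    print(expression_prob("C", {"TF1", "TF2"}))  # 0.5 * 0.4 * 1.8 ≈ 0.36
    ```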

    Now imagine we take our engineered transcription factors and design small molecule inhibitors for one or more of them. This introduces another form of key beyond the ordering keys discussed above. The space of small molecule structures and their interactions with transcription factors puts N! to shame because the space of possible geometries is continuous. That doesn’t mean there’s an infinite number of solutions in practice, but it makes the space of possible solutions delightfully vast.
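
    A crude way to picture that continuous key space (all numbers below are made up for illustration): treat each candidate inhibitor as a point in a continuous space of molecular parameters, with a transcription factor blocked only when the “key” lands close enough to its binding profile.

    ```python
    import math

    # Hypothetical binding profiles for the engineered transcription factors.
    TF_PROFILES = {"TF1": (0.62, 1.30), "TF2": (0.10, 2.75)}

    def blocked_tfs(key, tolerance=0.05):
        """Which factors does a given small-molecule 'key' inhibit?"""
        return {tf for tf, profile in TF_PROFILES.items()
                if math.dist(key, profile) < tolerance}

    print(blocked_tfs((0.61, 1.31)))  # {'TF1'}: close enough to dock
    print(blocked_tfs((0.90, 2.00)))  # set(): wrong geometry, nothing blocked
    ```

    Unlike the finite N! ordering space, this key ranges over a continuum, so the effective keyspace is bounded only by measurement precision and binding specificity.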

    So we have multiple types of functions, multiple types of data, and multiple forms of keys with which to encrypt them–all by repurposing capabilities evolution has already engineered into DNA and proteins.

    Potentially a useful tool for “Bond. Noncovalent Bond.”[10]

    As always, eager to hear your thoughts. If you’re interested in this topic, feel free to reach out via the contact page on this site.

    Footnotes:

    1. For those unfamiliar with computational complexity, P-time means that an algorithm will at worst run in an amount of time which scales with some polynomial function of the problem size. For example, an algorithm which sorts a list of N numbers by swapping them pairwise if they’re out of order will, at worst, take ~N^2 operations. This is inefficient compared with the N*log(N) algorithms which exist, but very fast compared with the decidedly not P-time algorithm of exhaustively trying all combinations, which would take ~N! operations.
    2. https://blog.computationalcomplexity.org/2005/03/pnp-and-arts.html
    3. https://www.aria.org.uk/opportunity-spaces/trust-everything-everywhere
    4. Anam et al., 2010 (https://spj.science.org/doi/full/10.34133/icomputing.0106#sec-4); Chu et al., 2025 (https://spj.science.org/doi/full/10.34133/icomputing.0106#sec-4); Niu et al., 2019 (https://link.springer.com/chapter/10.1007/978-981-15-3415-7_11); Mondal and Ray, 2023 (https://arxiv.org/pdf/1904.05528)
    5. https://en.wikipedia.org/wiki/Steganography
    6. https://news.stanford.edu/stories/2024/06/stanford-explainer-crispr-gene-editing-and-beyond
    7. “Graphs” in the mathematical sense, i.e. networks where any given link is between two nodes.
    8. Instead of a link joining exactly two nodes, a single hyperedge can simultaneously join up to K nodes (exactly K in a K-uniform hypergraph). This doesn’t lend itself to pictures as well, but if you think of a network as defined by pairs of nodes joined by a link, e.g. some collection of pairs (i,j), a K-uniform hypergraph’s links are just sets (i,j,…,m) such that there are K numbers inside the parentheses.
    9. By nature of how molecular biology works, even the direct inactivation route is still probabilistic, but because it isn’t invertible over time it can get close to deterministic. By contrast, transcription factors can associate, dissociate, reassociate, and so on with the same sequence, meaning at any given time you just have a certain probability of a given copy of a sequence having its expression altered.
    10. I’m sorry, I couldn’t help myself.
  • I think, therefore am I?

    “The philosopher Rene Descartes standing next to a robotic replica of himself”, courtesy of Sora

    For years, people have been raising the question of when an AI might become conscious. This stretches back to the science fiction of the 1950s, and in a loose sense at least as far as Eleazar ben Judah’s 12th-century writings on how to supposedly create a golem—artificial life. However, the issue has become a more immediate and practical question of late. Perhaps the most widely discussed cases are misinterpretations of the Turing test and, more remarkably to me, situations like the 2022 case of a software engineer at Google becoming convinced of a chatbot’s sentience and seeking legal action to grant it rights.

    Baked into this is a presupposition which is remarkably easy to miss for us as humans: Is consciousness, or any awareness beyond direct data inputs, necessary to produce human-level intelligence? That has some serious implications for how we think about AI and AI safety, but first we need a fun little bit of philosophical background.

    Perhaps most famous among thought experiments related to this question, John Searle’s Chinese Room presents an imagined case of a room into which slips of paper with Mandarin text are passed, and from which it is expected that Mandarin responses to the text will be returned. If Searle, with no understanding of Mandarin, were to perform this input-output process painstakingly by hand via volumes of reference books containing no English, he would not understand the conversation. However, given sufficient time and sufficiently comprehensive materials for mapping from message content to reply content, he could in principle do so with extremely high accuracy. 

    Yet despite the fact that (given sufficiently accurate mappings) a Mandarin-speaker outside the room might quite reasonably think they were truly conversing with the room’s occupant, in reality Searle would have no meaningful awareness of the conversation. The room would be nothing more than an algorithm implemented via a brain instead of a computer; a hollow reflection of the understandings of all the humans who created the reference books Searle was using.
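
    In software terms, the room is nothing more than a lookup table. A comically undersized toy version (the mappings below are my own invented stand-ins for Searle’s reference books) makes the point: fluent output, with understanding nowhere in the loop.

    ```python
    # The "reference books": symbol-to-symbol mappings, no meaning attached.
    REFERENCE_BOOKS = {
        "你好吗？": "我很好，谢谢。",        # "How are you?" -> "I'm fine, thanks."
        "你有意识吗？": "是的，我有意识。",  # "Are you conscious?" -> "Yes, I am."
    }

    def the_room(message: str) -> str:
        """Searle's role: match input symbols to output symbols, nothing more."""
        return REFERENCE_BOOKS.get(message, "对不起，我不明白。")  # "Sorry, I don't understand."

    print(the_room("你有意识吗？"))  # fluent reply, zero awareness anywhere
    ```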

    Now suppose we train an LLM on sufficiently comprehensive materials for mapping message content to reply content, and provide it with sufficient compute to perform these mappings. Suppose that mapping included assertions that such behaviors constituted consciousness, as is the overwhelmingly predominant case across the gestalt of relevant human writing throughout history. Unless trained to do otherwise, what else would Room GPT be likely to do save hold up a mirror to our own writing and output text like “Yes, I am conscious”?

    While musings about whether frontier AI systems are conscious can seem like navel gazing at first blush, they matter a great deal in a very practical sense. Of course, there are the more obvious issues. How would one even provably detect consciousness? Many centuries of philosophers, and probably the medicine Nobel committee, would like an update if you can figure that one out. If an AI system were conscious, what should its rights be? What should be the rights of species not all that less intelligent than us, like elephants and non-human primates? How would we relate to a conscious entity of comparable or greater intelligence whose consciousness—if it even existed in the first place—would likely be wholly alien to us?

    Yet as with most issues related to AI safety, and my constant refrain on the subject, there are subtle, nuanced things we have to consider. Given there’s no indication Room GPT would actually be conscious, why do we use language which implies it to be? A simpler algorithm obviously can’t lie or hallucinate, as those would both require it to be conscious. If an algorithm sorting a list spits out the wrong answer, obviously there’s a problem with the input, the algorithm, the code, or the hardware it’s running on. It can’t lie. It can’t hallucinate. 

    Neither can LLMs and other gen AI systems. They can produce wrong answers, but without consciousness there are no lies, and especially no hallucinations—breakdowns of one’s processing of information into a conscious representation of the world. Why is “hallucination” the term of choice, then? Because “We built a system which sometimes gives the wrong answers, and that’s ultimately our responsibility” isn’t a good business model. It raises the standards to which the systems would be held, whereas offloading agency* to a system incapable of it is a convenient deflection.

    The common response to this is pointing to benchmarks upon which LLMs’ performance has been improving over time. In some cases there’s legitimacy to this, but often less so for questions of logic and fact. It’s repeatedly been found that LLMs have already memorized many of the questions and answers from benchmarks, to the complete non-surprise of many people who are aware of LLMs’ capacity for memorization and the fact that you can find benchmarks online. Among the most striking are the recent results where frontier LLMs were tested on the latest round of International Maths Olympiad questions before they could enter the corpus of training text.** The best model was Google’s Gemini, which gave correct, valid answers for ~25% of questions. That’s rather contradictory to prior claims of LLMs being at the level of IMO silver medalists but, in fairness to Google, still significantly higher than the <5% success rate of other LLMs.

    Ascribing false agency* to Room GPT allows the offloading and dismissal of the responsibility to make more reliable, trustworthy systems. Systems which prioritize correctness over sycophancy. Room GPT would often output misinformation for the commonly—and correctly—noted reason that it’s been trained to give answers people like. However, the problem goes deeper, into the properties of the statistical distribution of language from which they produce responses. The fact-oriented materials LLMs are trained on were by and large written by people who actually knew what they were talking about—perhaps excluding a large fraction of the Twitter and Facebook posts they might have had access to. Those expert authors knew their stuff, so of course that’s the posture adopted by the masterpiece of language mimicry that is Room GPT. It gives answers as though it were one of those experts, even though it has just as little understanding as Searle would of Mandarin while working in his Chinese Room.

    False ascription of agency creates a mindset in which we absolve ourselves of responsibility for the systems we’re building. If we want to achieve AI’s full potential for good, especially in high-stakes domains like medicine and defense technology, we need to stop our own “hallucination” and get more serious about ensuring these systems return correct answers with significantly greater consistency.

    * Here I mean agency in the philosophical sense of being capable of independent, conscious decisions. This is very much distinct from the use of the term agents in the technical sense of allowing AI systems to complete tasks independently.

    ** Here’s the link to the study on LLMs’ performance on mathematics questions they couldn’t have seen before: https://arxiv.org/abs/2503.21934

    P.S. If you want some masterfully well-written yet unsettling discussions of these sorts of ideas, Peter Watts has a fantastic pair of novels called Blindsight and Echopraxia. Without spoiling anything in the plot, they ask the question: What if consciousness is an accidental inefficiency; an evolutionary bug which may eventually evolve away?

    May 1st, 2025

  • I think, therefore I prefer you not read my mind

    Josan Gonzalez’s cover art for the novel Neuromancer by William Gibson

    “When the past is always with you, it may as well be present; and if it is present, it will be future as well.”

    In Jack Womack’s afterword to the novel Neuromancer by William Gibson, he presents his view that the novel’s groundedness and depth come in part from the way it connects with timeless aspects and artifacts of human experience. That’s one of the many reasons Neuromancer is among my favorite novels, because at its core the book tells a human story which happens to occur in the context of AI, virtual realities, and—above all—brain-computer interfaces (BCIs).

    Neurotechnology has been a passion of mine for over a decade, though it’s only in the past couple of years that I’ve had the opportunity to do research on the subject. If you aren’t aware of the changes happening in the field in recent decades, I have to tell you, it’s nothing short of extraordinary. The following is by no means an exhaustive list, and simply reflects the hardware and companies with which I’m personally more familiar. That said, they and many others are making immense progress.

    Existing approaches are being modified and significantly upgraded, like Paradromics’ impressively compact, pea-sized implant or Precision Neuroscience’s epidural microelectrode arrays, the latter of which recently set the record for the greatest number of electrical leads in an implantable BCI by networking multiple copies of their device together. Interesting twists on traditional electrical hardware are already well into human clinical trials, like Synchron’s Stentrode, a device which is implanted as a stent into large blood vessels in the brain and provides electrical readout and stimulation to the corresponding local region. Other technologies leverage entirely different branches of physics which have never before been used for clinical neurotechnology, such as the rapid progress Forest Neurotech—led by Sumner L Norman, Will Biederman, and Tyson Aflalo—and others have been making in developing focused ultrasound for brain stimulation devices. Another particularly inventive and exciting technology is that of NeuroBionics, where MJ Antonini and Nicolette Driscoll have built a company around their invention of microns-thin, flexible polymer wires which can safely be implanted into the blood vessels of the brain, capable of simultaneously delivering medications and performing optical stimulation, electrical stimulation, and electrical recording all in the same device.

    BCIs already allow people to use computers, write text, and generate synthetic speech via nothing except measurements of their brain activity. Complexity and capabilities in this space are rocketing forward, and we’re approaching a point where BCIs capable of virtually whole-brain readout and modulation may be possible. The devices I listed as examples above are also notable in being far safer to implant than traditional deep-brain stimulation hardware, and I anticipate the safety profile of BCIs to continue improving.

    This confluence of safety and capability means that BCIs will be used for more and more tasks by an ever-growing number of people. I doubt anyone can currently predict the full breadth of the positive impacts this will have on the lives of patients and their loved ones, as well as society more broadly.

    However, amidst all the completely justified excitement regarding this progress, there’s a flip side to the technology which I virtually never see discussed: information security.

    Future BCIs—and to a significant degree current ones as well—will handle information representing the most private and intimate parts of who we are as human beings. Our thoughts, our very identities in their totality, will have new ways to interact with the world around us, for the first time in our species’ history expanding beyond the types of capabilities evolution itself provided to us. Devices capable of translating brain activity into text or speech are already, quite literally, reading some portion of the person’s thoughts. Yet whenever you have a piece of computing hardware with valuable information, there’s an incentive to steal that information.

    As the technology becomes safer and more powerful, and thus finds itself in more brains and with access to more information in each brain, the value of methods for hacking them increases. For devices which both record and stimulate, there will also be the possibility of implanting information, though the potential precision and efficacy of such an action is currently unclear and hard to predict. Information richness makes all the difference. A hypothetical device which outputs the probability of someone having a seizure in the next minute won’t be providing data worth much of anything to potential hackers. A device which outputs your thoughts in detail, or is capable of modifying them with any non-trivial efficacy, is an almost incalculably valuable target for would-be brainjackers (the term from the research literature on the subject).

    My mention of Neuromancer is relevant beyond the similar themes, as I was rereading the novel in late 2022 when I started to wonder what the mathematical features of such security problems would be. It turned out that in the roughly 15 years since the first paper on the subject, there had been shockingly little research on the whole. I’m talking a couple dozen papers total on a subject which will soon be of immense practical, medical, and societal concern. Of that, only a couple had done any work studying the problem in terms of computational and information properties, which is at the core of figuring out how one could even begin developing security methods deserving of meaningful confidence. I’m not going to discuss my work on the topic here, because my goal with this post is to generate awareness and interest in neurotech security and neurotech as a whole. Neurosecurity is currently a massive gap in the scientific, engineering, mathematics, and computing literature, and we urgently need more people thinking about and working on it.

    Possibilities previously relegated to the realm of science fiction will soon begin impacting us in remarkable ways. Paralyzed patients regaining so much of what they’d lost to injury or disease. Improvement for brain injuries once completely untreatable. Better management of chronic and severe psychiatric conditions. Countless millions stand to experience profound and life-changing benefits. Inherently, these benefits carry with them the risk of dangers which until recently have also existed solely within the purview of sci-fi. Theft of human thought and identity at a neurological level. Alteration of the probability of someone making one decision over another, or of the odds their beliefs will change in different directions. Fraud via man-in-the-middle attacks targeting one’s brain or the portion of a device which converts the neural activity readings into a computer output. These are all closely akin to things we’ve seen before, in an astonishingly broad and sophisticated collection of methods developed for cyberattacks on all manner of computing devices. Hackers target everything from data centers and government labs to personal computers, robot vacuums, and even children’s toys. Believing that BCIs would somehow be exempt from this is completely absurd. As Jack Womack said, when the past is always with us, it will be our future as well.

    Contemporary neurotechnology is already incredible, and is progressing at an astonishing pace. This field will change countless lives in extraordinary ways, and has the potential to change the world. We need to make sure that the science of how to keep these devices safe is ready before future threats become very present ones.

    Originally posted to LinkedIn, April 10, 2025