Taming AI Bots: Controllability of Neural States in Large Language Models
ABSTRACT:
I will present a view of large language models (LLMs) as stochastic dynamical systems, for which the notion of controllability is well established. From this view, it follows that the ``state of mind'' of an LLM can be steered by a suitable choice of input, given enough time and memory. However, the space of interest for an LLM is not that of words, but rather the set of ``meanings'' expressible as sentences that a human could have spoken and would understand. Unfortunately, unlike controllability, the notions of ``meaning'' and ``understanding'' are not usually formalized in a way that relates to the LLMs in use today.
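As a minimal sketch of this view (the notation $p_\theta$, $\mathcal{X}$, $\mathcal{U}$ is assumed here for illustration, not taken from the talk), next-token generation can be written as a stochastic dynamical system driven by user input, with controllability stated in the usual way:
\[
x_{t+1} \sim p_\theta(\,\cdot \mid x_t, u_t\,), \qquad x_t \in \mathcal{X} \ \text{(context/state)}, \quad u_t \in \mathcal{U} \ \text{(user-supplied tokens)},
\]
and the system is controllable if, for every pair of states $x, x' \in \mathcal{X}$, there exists a finite input sequence $u_1, \dots, u_T$ that drives the state from $x$ to $x'$ (with probability one, or in a weaker sense with positive probability).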
I will propose a simplistic definition of meaning that is compatible with at least some theories found in epistemology, and relate it to functional characteristics of trained LLMs. Then, I will describe both necessary and sufficient conditions for controllability in the space of meanings. I will show that a well-trained LLM establishes a topology and geometry on the space of meanings, represented in an embedding space whose coordinate axes are words (tokens). In this space, meanings are equivalence classes of trajectories (complete sentences).
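As an illustrative sketch under assumed notation (the precise equivalence relation is left to the talk), the space of meanings can be written as a quotient of the set of complete sentences:
\[
\mathcal{M} \;=\; \mathcal{S} / \!\sim, \qquad s \sim s' \iff s \text{ and } s' \text{ express the same meaning (e.g.\ are paraphrases)},
\]
where each sentence $s = (w_1, \dots, w_T) \in \mathcal{S}$ is a trajectory in the embedding space whose coordinate axes are the tokens $w_i$.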
I will then argue that meaning attribution requires an external grounding mechanism, and relate LLMs to models of the physical scene inferred from images. There, I will highlight the analogy between meanings inferred from sequences of words and the ``physical scene'' inferred from collections of images. But while the entity that generates textual meanings (the human brain) is not accessible for experimentation, the physical scene can be probed and hypotheses about it falsified.
Joint work with Alessandro Achille, Matthew Trager, Giovanni Paolini, Pratik Chaudhari, and others.