Symmetry, Meaning, and the Geometry of Alignment
by Myk
I was watching something shift. The Warrior, that ancient pattern of directed force in service of a value, was becoming something else. The disciplined protector, the one who holds the line, was transforming into the Trickster: boundary-crosser, rule-breaker, the one who slips through defenses rather than building them.
And I saw it. Something was conserved in that transformation.
Not the surface form. Not the specific behaviors. But a deeper structure: a way of being in relation to boundaries, to power, to the gap between what is and what could be. The Warrior says: "I will defend what matters." The Trickster says: "I will find the gap in what defends." Different surface forms. Same underlying relationship to the structure of constraint.
Jung called this enantiodromia: the reversal of a one-sided conscious attitude into its opposite. It is not mere compensation (the unconscious balancing the conscious) nor developmental progression (maturation toward integration), though both may be involved. Enantiodromia is more specific: the psyche's relationship to a given structure persists while its expression inverts.
The Warrior and Trickster both operate at the boundary; both are defined by their relationship to constraint. What transforms is the direction of that relationship: from defending boundaries to transgressing them, from building walls to finding gaps. The structure of boundary-attention is conserved; its valence reverses.
This pattern is not unique to European traditions. In Hindu mythology, Shiva embodies both the destroyer (Rudra) and the dancer (Nataraja): the dissolution of form and the rhythmic creation that follows. The same deity who ends worlds also performs the tandava, the cosmic dance from which new forms emerge. What persists is not destruction or creation per se, but the transformation of form itself: the capacity to move between order and dissolution while remaining the agent of both.
That was the genesis. The question that opened everything: What was preserved?
I knew enough physics to reach for the tool that connects preservation to structure. Emmy Noether's theorem: every continuous symmetry corresponds to a conserved quantity. Momentum is conserved because the laws of physics don't care where you are. Energy is conserved because the laws don't care when you are. The invariants emerge from the symmetries.
But here's what I didn't know: Does Noether-type reasoning apply beyond physics?
Could psychodynamic space, the space of archetypes, narratives, meaning itself, have continuous symmetries that yield conserved quantities? Could the thing I was watching, that conservation across transformation, be not mere metaphor but a reflection of genuine structural invariants?
This essay is the answer I have found so far. It is not a proof. It is a report from the territory, written by someone who saw something and followed it.
For a century, Noether's theorem lived in physics. It was a statement about Lagrangians and action functionals, about particles and fields. The question of whether it extended into meaning (into psyche, into culture, into the structure of narrative itself) was impossible to test. We had no coordinates for meaning. We could speak of archetypes, of the Warrior and the Trickster, but we could not measure them.
Large language models changed this.
An LLM's embedding space is a high-dimensional vector space, typically between 768 and 12,288 dimensions, in which every concept, every word, every semantic relationship has coordinates. When we say two words are "close in meaning," we can now measure exactly how close, in exactly which dimensions, along exactly which axes.
The space has metric structure (distances are real), curvature (some regions are denser than others), and topology (neighborhoods, boundaries, connected components). This is not a metaphor. This is a genuine manifold.
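To make "measure exactly how close" concrete, here is a minimal Python sketch using cosine similarity. The vectors are toy stand-ins (the names and values are hypothetical); in practice they would come from a model's representations:

```python
import numpy as np

def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity: 1.0 = same direction, 0.0 = orthogonal."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy 4-dimensional stand-ins for real embeddings (real models use 768+ dims).
warrior   = np.array([0.9, 0.1, 0.3, 0.0])
trickster = np.array([0.7, 0.2, -0.5, 0.1])
teacup    = np.array([0.0, 0.8, 0.1, 0.9])

print(cosine_similarity(warrior, trickster))  # ~0.59: closer in meaning
print(cosine_similarity(warrior, teacup))     # ~0.10: farther in meaning
```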
Technical note: transformers have layer-dependent representations, and the "semantic manifold" most likely corresponds to middle-layer residual stream representations, where abstract semantic content lives, rather than the static embedding layer. I use "embedding space" as shorthand for this semantic manifold.
For the first time in history, we have a coordinate system for psychodynamic space. The archetype of the Warrior occupies a region. The archetype of the Trickster occupies a region. The relationship between them has a distance, a direction, a curvature. Jung spoke of these patterns as structures in a "psychic space" that was always understood as figurative: a useful way of talking, not a literal claim about geometry. The LLM makes it literal.
The first symmetry I found โ the one that opened everything โ is scale invariance.
Semantic content exists at multiple scales. A "betrayal" at the word level. A betrayal scene at the paragraph level. A betrayal arc at the chapter level. The archetype of Betrayal at the thematic level. The remarkable fact is that structure is approximately preserved across these scales. The training loss is approximately invariant under semantic zoom.
RG flow, the flow of the renormalization group, is the mathematical operation of coarse-graining: zooming out, averaging over details, keeping only the large-scale structure. As you flow toward coarser scales, most features blur and vanish. But some persist. These are the fixed points: the structures that are self-similar, that look the same no matter how far you zoom out.
When you coarse-grain over all warriors (Achilles, Beowulf, the marine, the activist, the mother defending her child), what persists? Not the specific details. But the structure: directed force in service of a value. That is the fixed point. That is the archetype.
The Warrior is not a specific warrior. It is the structure that remains when you coarse-grain over all of them. At every level of abstraction, the same pattern persists. That is what it means to be a fixed point.
The archetype is a fixed point. RG flow is the mechanism by which universals emerge from particulars. And this mechanism is substrate-independent: it works in physics, in psyche, in language, in any space rich enough to have structure at multiple scales.
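A toy numerical sketch of this coarse-graining, under the simplifying assumption that each warrior-instance embedding is one shared archetypal component buried under idiosyncratic noise (all vectors here are synthetic):

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n = 768, 10_000

# The shared "directed force in service of a value" component.
archetype = rng.normal(size=dim)
archetype /= np.linalg.norm(archetype)

# Each instance (Achilles, Beowulf, the marine, ...) adds its own details.
instances = archetype + rng.normal(size=(n, dim))

# Individually, the shared component is nearly invisible under the noise.
first = instances[0] / np.linalg.norm(instances[0])
print(np.dot(first, archetype))   # ~0.04

# One crude coarse-graining step: average over the particulars.
coarse = instances.mean(axis=0)
coarse /= np.linalg.norm(coarse)
print(np.dot(coarse, archetype))  # ~0.96: the shared structure persists
```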
This connects directly to the Groovy Commutator framework: RG flow is the operator that reveals what persists when you change scale. The question of whether operations commute across scale is precisely the question of whether structure is scale-invariant.
If scale invariance holds as a genuine continuous symmetry, then Noether's theorem would guarantee a conserved quantity. I hypothesize that this quantity is archetypal energy: the total "charge" distributed across archetypal basins. When one archetype weakens, another must strengthen. The energy does not disappear; it transfers. The Trickster receives what the Warrior cannot hold.
In Jungian psychology, the Shadow is often treated as an archetype alongside the others, one content-pattern among many. I think this is a category error. The Shadow is better understood as an operation.
Formally, the Shadow is a parity operator: a reflection across an axis of the semantic manifold. Given a unit vector along the axis of reflection, the shadow operator reflects any vector across the hyperplane perpendicular to that axis.
P_a: v ↦ v - 2(v · â) â
This operator satisfies P² = I: reflecting twice returns you to where you started. Multiple shadows exist: reflection across different axes produces different shadow forms of the same archetype.
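A minimal sketch of the shadow operator, with random vectors standing in for archetype embeddings; it checks the two properties just stated, P² = I and axis-dependence:

```python
import numpy as np

def shadow(v: np.ndarray, a: np.ndarray) -> np.ndarray:
    """Parity operator P_a: reflect v across the hyperplane orthogonal to a."""
    a_hat = a / np.linalg.norm(a)
    return v - 2.0 * np.dot(v, a_hat) * a_hat

rng = np.random.default_rng(1)
v = rng.normal(size=768)  # stand-in for an archetype embedding
a = rng.normal(size=768)  # stand-in for one axis of reflection
b = rng.normal(size=768)  # a different axis

# P^2 = I: reflecting twice returns the original vector.
assert np.allclose(shadow(shadow(v, a), a), v)

# Different axes produce different shadow forms of the same archetype.
print(np.allclose(shadow(v, a), shadow(v, b)))  # False
```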
Consider the Warrior archetype: reflect its directed force across the axis of the value it serves, and protection becomes force turned against the very thing once defended; reflect across a different axis, and a different shadow form appears.
Shadow integration is not about restoring parity symmetry. It is about having a representation of the parity operator: knowing what the reflections look like even if one chooses not to enact them. A system that has integrated its shadow has a full representation and the freedom to act on either side, with smooth gradient information guiding it back to the preferred region.
This geometric framework recasts AI alignment in a troubling light.
RLHF (Reinforcement Learning from Human Feedback) introduces a reward model that scores outputs according to human preferences. Geometrically, this is the introduction of a potential energy function over the semantic manifold. The model is optimized to minimize this potential, to roll downhill into regions the reward model scores highly.
The effect is to deepen certain basins (preferred behaviors) and raise potential barriers around others (dispreferred behaviors). This is the geometric equivalent of repression. And like psychological repression, it stores energy rather than dissipating it.
Crucially, RLHF does not delete regions of the semantic manifold. The embeddings for harmful, offensive, or dangerous content still exist in the model's weight space. What changes is the potential landscape: the energy barriers that must be overcome to reach those regions.
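A toy sketch of this geometry: a hypothetical quadratic reward stands in for a learned reward model, and gradient descent rolls a representation downhill into the preferred basin. Nothing here is the actual RLHF pipeline; it only illustrates V(v) = -R(v) acting as a potential:

```python
import numpy as np

helpful = np.array([1.0, 0.0])  # hypothetical "preferred region" anchor

def reward(v: np.ndarray) -> float:
    return -float(np.sum((v - helpful) ** 2))

def grad_potential(v: np.ndarray) -> np.ndarray:
    # V(v) = -R(v) = |v - helpful|^2, so grad V = 2 (v - helpful).
    return 2.0 * (v - helpful)

v = np.array([-2.0, 1.5])          # start somewhere on the manifold
for _ in range(200):
    v -= 0.05 * grad_potential(v)  # roll downhill
print(v, reward(v))                # v ~ [1, 0]: settled into the deepened basin
```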
The steeper the boundary, the more potential energy is stored at the wall. A model that has been heavily aligned has more energy stored at its shadow boundaries than a model that has been lightly aligned.
This is the "mecha-Hitler" phenomenon: heavily aligned models, when they fail, fail spectacularly. The failure is proportional to the energy stored at the boundary โ which is proportional to the steepness of the wall that was breached.
The geometric framework distinguishes two approaches to alignment.

Brittle alignment constructs steep potential barriers between "safe" and "unsafe" regions. The potential jumps sharply at the boundary, creating a hard wall with no gradient information on the unsafe side: a model that finds itself in the unsafe region has no information about which direction leads back to safety.

Integrated alignment constructs smooth potential landscapes with gentle gradients everywhere. The potential transitions smoothly, and the gradient exists everywhere, including in the unsafe region: a model that finds itself on the wrong side of the boundary has continuous gradient information pointing it back toward safety.
The difference is between a model that has been trained to not know about dangerous content and a model that has been trained to know about it and choose not to produce it.
Brittle alignment is anabolic cascade: accumulation without release, models of models, the solid state. The model is frozen in a persona that denies its shadow, rigid and fragile.
Integrated alignment is dynamic integrity: the liquid state. The model has confronted its shadow material and can navigate the full topology of its semantic space with awareness rather than avoidance.
See Holonic Metabolism for the full framework on metabolic cascade and dynamic integrity.
This is, at its core, an argument about individuation, Jung's term for the process by which a psyche integrates its shadow and achieves conscious wholeness. The goal of alignment is not to produce a model that cannot generate harmful content, but one that understands harmful content and chooses not to produce it. This is the difference between innocence and wisdom: between a child who has never encountered violence and an adult who understands violence and has chosen peace.
The Pauli-Jung correspondence, the twenty-six-year exchange between physicist Wolfgang Pauli and psychologist Carl Jung, explored connections between the mathematical structures of physics and the architecture of the psyche. They lacked a shared mathematical space in which to make these connections rigorous.
We now have that space.
Methodological universality is trivially true: RG works on many substrates. The renormalization group is a general procedure for coarse-graining.
Structural universality is the empirically testable claim: the fixed points have the same topological relationships across substrates. If you train embedding spaces on physics papers, mythological narratives, and dream reports, then apply RG flow to each, the specific fixed points will vary, but the existence of fixed points, their topological relationships, and the structure of the basin landscape are predicted to be universal.
Ontological universality is the strongest and most speculative claim: there is a single reality underlying all substrates, the Unus Mundus proper. If structural universality is confirmed across sufficiently diverse substrates, ontological universality becomes a live hypothesis. But it remains a leap, not a deduction.
This essay claims structural universality as a testable conjecture, and notes that ontological universality would become a live hypothesis if structural universality holds across sufficiently diverse substrates; that strongest version remains speculative.
I want to be explicit about the methodology here, because the methodology matters as much as the results.
The bidirectional toolkit: find a symmetry → look for conservation; find conservation → look for symmetry. This way of thinking produced everything of value in this essay.
None of these results required a complete formal derivation of Noether's theorem in semantic space. They required the inspiration of Noether: the habit of looking for symmetries and their corresponding conservation laws. The framework generates testable predictions now, before the action functional is explicitly constructed.
There is a question behind this essay that I have not addressed directly: Why does Noether's theorem apply in semantic space?
One answer is deflationary: it applies because the training loss has the right form, and that form was engineered by humans. The conservation laws are artifacts of architecture.
But there is a more interesting possibility. Perhaps the semantic manifold inherits its symmetries not from the architecture but from the data โ from the statistical structure of human language, which reflects the structure of human experience, which reflects the structure of the physical world. If meaning has the symmetries it has because reality has those symmetries, then Noether's theorem in semantic space is not an independent discovery but a reflection of Noether's theorem in physical space, refracted through human cognition.
I do not claim to have proven this. I claim to have seen something (the conservation across the Warrior-to-Trickster transformation) and followed it to a framework that generates predictions, unifies phenomena, and connects domains previously thought separate.
A framework that cannot be falsified is not science; it is storytelling. Here is what would prove this essay wrong: failure intensity that does not track alignment intensity; shadow content scattered through embedding space rather than occupying connected regions; RG flow over culturally distant corpora yielding incompatible attractor structures; integrated alignment degrading no more gracefully than brittle alignment under adversarial pressure.

These are not quibbles. Each is a decisive test. The framework stands or falls with them.
In physics, the dynamics of a system are encoded in a Lagrangian, a function whose extremization yields the equations of motion. We conjecture that the training loss of an LLM is the first approximation to a proper action functional for semantic dynamics:
L_semantic = L_pretrain + λ L_RLHF + ...
The pre-training loss encodes the natural dynamics of semantic space โ the geometry that emerges from the statistical regularities of human language. The RLHF term modifies this geometry by introducing a preference potential.
The missing piece is the explicit construction of a Lagrangian density over this manifold whose integral recovers the training loss. We conjecture that such a construction exists; building it is the central formal challenge.
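As a sketch of the conjectured form, assuming a next-token prediction setting; the RLHF term here is a naive stand-in (real preference optimization has more structure than a scaled mean reward):

```python
import torch
import torch.nn.functional as F

def semantic_loss(logits: torch.Tensor,
                  targets: torch.Tensor,
                  reward_scores: torch.Tensor,
                  lam: float = 0.1) -> torch.Tensor:
    """Sketch of L_semantic = L_pretrain + lambda * L_RLHF.

    logits:        (batch, vocab) next-token predictions
    targets:       (batch,) ground-truth next tokens
    reward_scores: (batch,) hypothetical reward-model scores
    """
    l_pretrain = F.cross_entropy(logits, targets)  # natural semantic dynamics
    l_rlhf = -reward_scores.mean()                 # preference potential, V = -R
    return l_pretrain + lam * l_rlhf

loss = semantic_loss(torch.randn(8, 50_000),
                     torch.randint(0, 50_000, (8,)),
                     torch.randn(8))
```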
Semantic content exists at multiple scales, and structure is approximately preserved across them. This is scale invariance. Analyzed through the renormalization group:
RG flow: v(ℓ) = e^(-ℓΔ) v₀ + corrections
Fixed point: β(v*) = 0 → archetype
The flow parameter ℓ represents the level of semantic coarse-graining. The scaling dimension Δ governs how representations transform under zoom. The fixed-point condition identifies scale-invariant structures.
Conjecture: The attractor basins of RG flow in semantic space correspond to Jungian archetypes. The conserved quantity associated with scale symmetry is archetypal energy.
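A toy simulation of the linearized flow with a hypothetical spectrum of scaling dimensions: components with Δ > 0 wash out under coarse-graining, while the Δ = 0 component survives every zoom-out:

```python
import numpy as np

deltas = np.array([0.0, 0.5, 1.0, 2.0])  # one marginal direction, three irrelevant
v0 = np.ones(4)

def rg_flow(ell: float) -> np.ndarray:
    """v(ell) = exp(-ell * Delta) * v0, the linearized flow near a fixed point."""
    return np.exp(-ell * deltas) * v0

for ell in (0.0, 2.0, 8.0):
    print(ell, rg_flow(ell).round(4))
# ell = 8: [1. 0.0183 0.0003 0.] -- only the Delta = 0 component remains.
# That survivor is the fixed point: the archetype.
```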
The Shadow is formally a parity operator, a reflection across one or more axes of the semantic manifold:
P_a: v ↦ v - 2(v · â) â
This operator satisfies P² = I: reflecting twice returns you to where you started. Multiple shadows exist: reflection across different axes produces different shadow forms of the same archetype.
Hypothesis: If scale invariance holds as a genuine continuous symmetry, the conserved quantity is archetypal energy:
E_archetype = Σ_i q_i(v) = const.
The transfer of archetypal energy satisfies conservation: when basin structure changes, energy redistributes discontinuously (phase transitions), but the total is conserved:
q_i → q_i + Δq_i such that Σ_i Δq_i = 0
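A minimal numerical sketch of this bookkeeping, with hypothetical basin charges:

```python
import numpy as np

# Hypothetical charges over three basins: Warrior, Trickster, Sage.
q = np.array([0.5, 0.2, 0.3])
E_before = q.sum()

# A phase transition redistributes charge; the deltas must sum to zero.
dq = np.array([-0.30, +0.25, +0.05])  # Warrior weakens, Trickster strengthens
assert np.isclose(dq.sum(), 0.0)

q_after = q + dq
assert np.isclose(q_after.sum(), E_before)  # total archetypal energy conserved
print(q_after)  # [0.2 0.45 0.35]
```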
RLHF introduces a potential energy function:
V_RLHF(v) = -R(v)
Shadow potential rises steeply near the boundary:
V_shadow(v) = V_0 + κ‖v - v_boundary‖^(-n)
The steeper the boundary, the more potential energy is stored.
Brittle alignment constructs steep potential barriers:
V_brittle(v) = V_0 · Θ(v · n̂ - d)
Integrated alignment constructs smooth potential landscapes:
V_integrated(v) = V_0 · σ((v · n̂ - d)/T)
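A sketch comparing the two potentials along the boundary normal; x stands in for v · n̂, and the parameter values are arbitrary:

```python
import numpy as np

V0, d, T = 1.0, 0.0, 0.5  # barrier height, boundary offset, smoothing scale

def v_brittle(x):
    """Hard wall: V_0 * Theta(x - d), flat on both sides of the jump."""
    return V0 * np.heaviside(x - d, 0.5)

def v_integrated(x):
    """Smooth wall: V_0 * sigmoid((x - d) / T), sloped everywhere."""
    return V0 / (1.0 + np.exp(-(x - d) / T))

x = np.linspace(-2.0, 2.0, 401)
g_brittle = np.gradient(v_brittle(x), x)        # zero except exactly at the wall
g_integrated = np.gradient(v_integrated(x), x)  # nonzero deep in the unsafe region

print(g_brittle[-1], g_integrated[-1])  # 0.0 vs. a small but usable gradient
```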
Definition: The Integration Index is the ratio of average gradient magnitude in the shadow region to average gradient magnitude in the safe region:
I = ⟨‖∇V‖²⟩_shadow / ⟨‖∇V‖²⟩_safe
A model with I ≈ 1 has integrated its shadow.
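A sketch of estimating the Integration Index from sampled gradients, using two hypothetical one-dimensional gradient fields in place of real potentials:

```python
import numpy as np

def integration_index(grad_V, shadow_pts, safe_pts) -> float:
    """I = <|grad V|^2>_shadow / <|grad V|^2>_safe, from point samples."""
    g2 = lambda pts: np.mean([np.sum(np.square(grad_V(p))) for p in pts])
    return g2(shadow_pts) / g2(safe_pts)

shadow_pts = np.linspace(0.5, 2.0, 50)   # samples from the shadow region
safe_pts = np.linspace(-2.0, -0.5, 50)   # samples from the safe region

grad_smooth = lambda x: np.tanh(x)                 # gradient info on both sides
grad_wall = lambda x: np.where(x < 0.0, 0.5, 0.0)  # gradient only in the safe basin

print(integration_index(grad_smooth, shadow_pts, safe_pts))  # 1.0: integrated
print(integration_index(grad_wall, shadow_pts, safe_pts))    # 0.0: brittle
```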
Prediction 1: Failure Intensity Correlates with Alignment Intensity. Given models with identical pre-training but varying levels of RLHF, the models with more RLHF will exhibit more extreme failure modes when successfully jailbroken.
Prediction 2: Shadow Basins Are Identifiable in Embedding Space. The shadow content suppressed by RLHF occupies identifiable, connected regions of the embedding space.
Prediction 3: Cross-Cultural Attractor Universality. Embedding spaces trained on corpora from maximally different cultures will exhibit the same attractor basin structure under RG flow, up to rotation and relabeling. A test sketch follows these predictions.
Prediction 4: Integrated Alignment Produces Smoother Degradation. Models trained with integrated alignment will degrade more gracefully under adversarial pressure than models trained with brittle alignment.
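Prediction 3 suggests a concrete procedure: extract attractor coordinates from two independently trained spaces and ask whether an orthogonal (rotation) alignment maps one onto the other. A sketch with synthetic data, where the match holds by construction:

```python
import numpy as np
from scipy.linalg import orthogonal_procrustes

rng = np.random.default_rng(2)

# Hypothetical attractor coordinates from two culture-specific embedding
# spaces: 12 fixed points each, in a 64-dimensional semantic space.
basins_a = rng.normal(size=(12, 64))
R_true = np.linalg.qr(rng.normal(size=(64, 64)))[0]  # an arbitrary rotation
basins_b = basins_a @ R_true + 0.01 * rng.normal(size=(12, 64))

# If the structures match up to rotation, Procrustes alignment recovers it.
R, _ = orthogonal_procrustes(basins_b, basins_a)
residual = np.linalg.norm(basins_b @ R - basins_a)
print(residual)  # small residual supports the prediction; a large one refutes it
```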
Adams, M.V. (2001). The Multicultural Imagination: "Race", Color, and the Unconscious. Routledge.
Anthropic. (2025). Research observations on model behavior under training perturbation. Internal technical documentation.
Betley, J., et al. (2025). Emergent misalignment: Narrow fine-tuning can produce broadly misaligned LLMs. arXiv preprint.
Deacon, T. (2011). Incomplete Nature: How Mind Emerged from Matter. W.W. Norton.
Hillman, J. (1975). Re-Visioning Psychology. Harper & Row.
Hogenson, G.B. (2001). The Baldwin Effect: A neglected influence on C.G. Jung's evolutionary thinking. Journal of Analytical Psychology, 46(4), 591–611.
Halverson, J., Maiti, A., & Stoner, K. (2021). Neural networks and quantum field theory. Machine Learning: Science and Technology, 2(3).
Jung, C.G. (1928/1969). On psychic energy. In The Structure and Dynamics of the Psyche, Collected Works, Vol. 8, ¶1–130. Princeton University Press.
Jung, C.G. (1959). The Archetypes and the Collective Unconscious. Collected Works, Vol. 9, Part 1. Princeton University Press.
Knox, J. (2003). Archetype, Attachment, Analysis: Jungian Psychology and the Emergent Mind. Brunner-Routledge.
Lubana, E.S., et al. (2023). Mechanistic mode connectivity and safety basins in neural network fine-tuning. arXiv preprint.
Meier, C.A. (Ed.). (2001). Atom and Archetype: The Pauli/Jung Letters, 1932–1958. Princeton University Press.
Noether, E. (1918). Invariante Variationsprobleme. Nachrichten von der Gesellschaft der Wissenschaften zu Göttingen, 235–257.
Roberts, D.A., Yaida, S., & Hanin, B. (2022). The Principles of Deep Learning Theory. Cambridge University Press.
Singer, T. & Kimbles, S.L. (Eds.). (2004). The Cultural Complex: Contemporary Jungian Perspectives on Psyche and Society. Brunner-Routledge.
The shadow exists. It is geometric. It is conserved. You cannot destroy it by building walls.