Vector Ternary Logic via Native H24 Leech-Lattice Quantization in LLMs

39 Pages. Posted: 24 Jun 2026

Date Written: June 21, 2026

Abstract

The deployment of large language models is severely constrained by the von Neumann memory bottleneck. We introduce Pollux, an architecture that shifts from 1D scalar to 24-dimensional vector quantization on the Leech lattice H24, achieving a core inference footprint of 0.76 bits per parameter. This extreme compression allows the entire transformer backbone to reside within on-chip SRAM, converting LLM inference from a memory-bandwidth-bound to a compute-bound operation. Unlike continuous models that conflate fluid syntactic reasoning with crystallised factual memorisation, Pollux operates as a purely structural engine. Its 0.76-bit Voronoi bottleneck functions as a thermodynamic noise filter: it crystallises invariant syntactic rules while mechanically attenuating high-entropy factual trivia. The efficient ternary logic (amplify-attenuate-reject) that inflates 1D scalar systems to ≈ 1.58 bits is realised at the 24D vector level via a null attractor absorbed into the 18-bit Leech codebook at zero marginal deployment cost. Evaluated under a strict Iso-Memory paradigm, the 1B-class Pollux-1920 compresses its 796M-parameter backbone into just 76 MB of SRAM (265 MB total on-disk). At its thermodynamic crystallisation peak (2.6 billion processed tokens), Pollux-1920 achieves near parity in fluid intelligence with the uncompressed Pythia-160M baseline (73.0% vs. 73.1% aggregate mean on BLiMP), despite requiring less than half the SRAM-relevant backbone memory footprint (76 MB vs. 162 MB) and utilising over two orders of magnitude less training data. By restricting its internal factual substrate, Pollux avoids the parametric knowledge conflicts that trigger contextual hallucinations in RAG applications. This strict architectural decoupling positions Pollux as a stateless, zero-interference reasoning CPU, ideal for immediate deployment in SRAM-resident Edge AI, while establishing the thermodynamic blueprint for hallucination-free, datacenter-scale Macro-RAG pipelines.
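The storage figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is not taken from the paper: the variable names are illustrative, and it assumes decimal megabytes (1 MB = 10^6 bytes) and one 18-bit codebook index per 24-dimensional weight block. Under those assumptions the index alone costs 0.75 bits per parameter; the abstract does not say what accounts for the remaining 0.01 bit in the reported 0.76.

```python
import math

# Figures quoted in the abstract.
BACKBONE_PARAMS = 796e6   # Pollux-1920 backbone parameters
BITS_PER_PARAM = 0.76     # reported core inference footprint
CODEBOOK_BITS = 18        # reported Leech codebook index width
BLOCK_DIM = 24            # lattice dimension (weights per codeword)
BASELINE_MB = 162         # reported Pythia-160M backbone footprint


def footprint_mb(params: float, bits_per_param: float) -> float:
    """Storage in decimal megabytes (assumption: 1 MB = 10**6 bytes)."""
    return params * bits_per_param / 8 / 1e6


# Assumption: one 18-bit index encodes a block of 24 weights.
index_rate = CODEBOOK_BITS / BLOCK_DIM

# Information content of one scalar ternary weight, log2(3).
ternary_rate = math.log2(3)

backbone_mb = footprint_mb(BACKBONE_PARAMS, BITS_PER_PARAM)
memory_ratio = backbone_mb / BASELINE_MB

print(f"index rate      : {index_rate:.2f} bits/param")
print(f"scalar ternary  : {ternary_rate:.3f} bits/param")
print(f"backbone        : {backbone_mb:.1f} MB")
print(f"vs. baseline    : {memory_ratio:.2f}x")
```

The computed backbone size rounds to the 76 MB stated in the abstract, and the ratio to the 162 MB baseline comes out below one half, consistent with the "less than half" claim.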

Declaration of AI Assistance: The core conceptual architecture and mathematical formulations are the original work of the author. Generative AI tools were utilized solely as collaborative assistants for LaTeX formatting and academic prose refinement. A full statement of responsibility is available at the end of the manuscript.

Keywords: Leech lattice, sub-1-bit quantization, language model, fluid intelligence, SRAM-resident Edge AI, Retrieval-Augmented Generation

Suggested Citation

Lavicka, Alexander, Vector Ternary Logic via Native H24 Leech-Lattice Quantization in LLMs (June 21, 2026). Available at SSRN: https://ssrn.com/abstract=6973978 or http://dx.doi.org/10.2139/ssrn.6973978

Alexander Lavicka (Contact Author)

Independent

Austria
