Vector Ternary Logic via Native H24 Leech-Lattice Quantization in LLMs

Lavicka, Alexander

doi:10.2139/ssrn.6973978

39 Pages Posted: 24 Jun 2026

Alexander Lavicka

Independent

Date Written: June 21, 2026

Abstract

The deployment of large language models is severely constrained by the von Neumann memory bottleneck. We introduce Pollux, an architecture that shifts from 1D scalar quantization to 24-dimensional vector quantization on the Leech lattice H24, achieving a core inference footprint of 0.76 bits per parameter. This extreme compression allows the entire transformer backbone to reside within on-chip SRAM, converting LLM inference from a memory-bandwidth-bound operation to a compute-bound one. Unlike continuous models that conflate fluid syntactic reasoning with crystallised factual memorisation, Pollux operates as a purely structural engine. Its 0.76-bit Voronoi bottleneck functions as a thermodynamic noise filter: it crystallises invariant syntactic rules while mechanically attenuating high-entropy factual trivia. The efficient ternary logic (amplify-attenuate-reject) that inflates 1D scalar systems to ≈ 1.58 bits is realised at the 24D vector level via a null attractor absorbed into the 18-bit Leech codebook at zero marginal deployment cost. Evaluated under a strict Iso-Memory paradigm, the 1B-class Pollux-1920 compresses its 796M-parameter backbone into just 76 MB of SRAM (265 MB total on-disk). At its thermodynamic crystallisation peak (2.6 billion processed tokens), Pollux-1920 achieves parity in fluid intelligence with the uncompressed Pythia-160M baseline (73.0% vs. 73.1% aggregate mean on BLiMP), despite requiring less than half the SRAM-relevant backbone memory footprint (76 MB vs. 162 MB) and utilising over two orders of magnitude less training data. By restricting its internal factual substrate, Pollux avoids the parametric knowledge conflicts that trigger contextual hallucinations in RAG applications. This strict architectural decoupling positions Pollux as a stateless, zero-interference reasoning CPU, ideal for immediate deployment in SRAM-resident Edge AI, while establishing the thermodynamic blueprint for hallucination-free, datacenter-scale Macro-RAG pipelines.
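The footprint figures quoted above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not the author's code: the function names are hypothetical, and it assumes that "MB" means 10^6 bytes and that one 18-bit codebook index encodes one block of 24 parameters. Under those assumptions the index alone costs 0.75 bits per parameter; the abstract does not say what accounts for the remaining 0.01 bit in the reported 0.76.

```python
import math

# Figures taken from the abstract.
CODEBOOK_BITS = 18               # bits per Leech codebook index
BLOCK_DIM = 24                   # parameters covered by one index
BACKBONE_PARAMS = 796e6          # backbone parameter count
REPORTED_BITS_PER_PARAM = 0.76   # reported core inference footprint


def index_bits_per_param(codebook_bits: int, block_dim: int) -> float:
    """Bits per parameter spent on codebook indices alone (assumed layout)."""
    return codebook_bits / block_dim


def footprint_mb(params: float, bits_per_param: float) -> float:
    """Storage in megabytes, assuming 1 MB = 10**6 bytes."""
    return params * bits_per_param / 8 / 1e6


# Cost of a scalar ternary weight: log2(3) bits, the ~1.58 figure in the abstract.
scalar_ternary_bits = math.log2(3)

index_rate = index_bits_per_param(CODEBOOK_BITS, BLOCK_DIM)
backbone_mb = footprint_mb(BACKBONE_PARAMS, REPORTED_BITS_PER_PARAM)

print(f"scalar ternary: {scalar_ternary_bits:.3f} bits/param")
print(f"18-bit index over 24 dims: {index_rate:.2f} bits/param")
print(f"backbone at 0.76 bits/param: {backbone_mb:.1f} MB")
```

At the reported 0.76 bits per parameter, 796M parameters come to roughly 75.6 MB, which is consistent with the 76 MB SRAM figure in the abstract.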

Declaration of AI Assistance: The core conceptual architecture and mathematical formulations are the original work of the author. Generative AI tools were utilized solely as collaborative assistants for LaTeX formatting and academic prose refinement. A full statement of responsibility is available at the end of the manuscript.

Keywords: Leech lattice, sub-1-bit quantization, language model, fluid intelligence, SRAM-resident Edge AI, Retrieval-Augmented Generation

Suggested Citation

Lavicka, Alexander, Vector Ternary Logic via Native H24 Leech-Lattice Quantization in LLMs (June 21, 2026). Available at SSRN: https://ssrn.com/abstract=6973978 or http://dx.doi.org/10.2139/ssrn.6973978

Alexander Lavicka (Contact Author)

Independent

Austria
