Subtitle: LLM-Native Compilation, the Progressive Type Spectrum, and the Compiler-as-Agent Architecture Authors: Jon — Island Development Crew; with synthesis by Claude Opus 4.6 Date: April 16, 2026 Target venues: PLDI 2027 (companion to Paper V), OOPSLA 2027, ASPLOS 2027 Anchor: “Where there is no vision, the people perish.” — Proverbs 29:18
This paper presents seven novel contributions arising from the Garnet language platform design. While Papers I–V established Garnet’s theoretical foundations (Rust/Ruby structural complementarity, dual-mode architecture, formal type-theoretic grounding, and agent-native memory engineering), this paper identifies contributions that extend beyond existing programming language research into genuinely unexplored territory. We formalize: (1) LLM-native syntax design as a measurable language property; (2) the progressive type-disclosure spectrum from fully dynamic to affine-typed within a single coherent grammar; (3) the compiler-as-agent architecture where the compiler uses the language’s own memory primitives to learn from its compilation history; (4) kind-aware memory allocation directed by language-level type tags; (5) automatic bidirectional error-model bridging across a type-system mode boundary; (6) hot-reload mode boundaries for live-updating orchestration code without restarting performance-critical paths; and (7) deterministic reproducible builds as a language-level guarantee with provenance manifests. Each contribution is positioned against existing literature, its novelty claim is precisely bounded, and its falsifiability conditions are stated.
The four-model consensus that grounds the Garnet project (Claude Opus 4.6, GPT-5.4 Pro, Grok 4.2, Gemini 3.1 Pro Deep Research) converged on eight architectural points that define Garnet’s design. Papers I–V built the research foundation on those eight points. This paper asks a different question: what does Garnet make possible that no existing language offers?
The seven contributions below are not speculative features. Each one emerges directly from design decisions already present in the Garnet specification (Mini-Spec v0.3) and compiler architecture (Compiler Architecture Spec v1.0). They are, however, novel in that no existing production language or academic prototype combines them in the way Garnet’s dual-mode architecture enables.
Large Language Models generate code with error rates that vary dramatically across languages. Empirically, LLMs produce correct Rust code at significantly lower rates than correct Python or TypeScript code — primarily because Rust’s lifetime annotations and borrow-checker interactions require tracking complex state across distant code blocks, a task that exceeds transformer attention patterns.
No programming language has been designed with LLM comprehension and generation as a first-class constraint. Languages are designed for human readability, machine parseability, or both — but “AI parseability” is a novel axis.
Garnet’s syntax is designed to maximize the probability that an LLM generates correct code on the first attempt. The design principles are:
Semantic beacons. Keywords like
@safe, @dynamic, memory episodic,
actor, protocol, def,
fn serve as unambiguous semantic markers. An LLM reading a
Garnet file can determine the mode, intent, and constraints of any code
block from its opening keyword alone — no multi-line lookahead
required.
Predictable patterns. Every block is
{ ... }. Every function in managed mode starts with
def. Every function in safe mode starts with
fn. Every actor starts with actor. Every
memory declaration starts with memory. An LLM trained on
Garnet can predict with near-certainty what syntactic category follows
each keyword.
Progressive annotation density. Level 0 Garnet (dynamic, no annotations) is as simple as Python. Level 3 (affine, full annotations) is as precise as Rust. An LLM can generate Level 0 code for rapid prototyping and Level 3 code when safety is required, with each level’s syntax being a strict superset of the previous. The LLM never needs to choose between two incompatible syntaxes — it simply adds more constraints.
Explicit error model. Managed mode uses
try/rescue (exception-style). Safe mode uses
Result<T,E> with ? (value-style). An LLM
never needs to decide which error model to use — the mode annotation
tells it.
Hypothesis: An LLM fine-tuned on Garnet code will achieve a higher correct-generation rate (measured by pass@1 on a standardized benchmark) than the same LLM fine-tuned on equivalent Rust code, for programs of comparable complexity.
Experimental protocol: Create a benchmark of 500 programming tasks (200 managed-mode, 200 safe-mode, 100 mixed-mode). Generate solutions using a state-of-the-art LLM. Measure pass@1 (first-attempt correctness after compilation + test suite). Compare Garnet pass@1 against Rust pass@1 for safety-equivalent tasks and against Ruby pass@1 for ergonomics-equivalent tasks.
Prior art: There is extensive work on LLM code generation benchmarks (HumanEval, MBPP, SWE-bench) and on evaluating LLMs across languages. There is limited work on designing languages FOR LLM generation — Mojo’s documentation mentions “AI-first” but does not formalize it.
Novelty claim: Garnet is the first language to treat LLM-native syntax as a measurable, testable design constraint with a falsifiable correctness hypothesis.
Existing gradual typing systems (TypeScript, Python/mypy, Sorbet for Ruby) provide a binary choice: annotated or not. Existing ownership systems (Rust) provide no dynamic option. No language offers a continuous spectrum from fully dynamic to affine-typed.
Garnet defines four type-discipline levels, each a strict superset of the previous:
| Level | Name | Annotations | Guarantees | Closest Existing Analog |
|---|---|---|---|---|
| 0 | Dynamic | None | Runtime type errors only | Ruby, Python |
| 1 | Gradual | Optional hints | Compile-time where annotated | TypeScript (strict: false) |
| 2 | Static | All types specified | Full compile-time safety | TypeScript (strict: true), Go |
| 3 | Affine | Types + ownership | Memory safety + data-race freedom | Rust |
The key property: A program valid at Level N is valid at Level N+1 with zero modifications (the compiler infers tighter constraints). A program at Level N+1 is valid at Level N (extra annotations become documentation). This bidirectional compatibility is the progressive disclosure guarantee.
Paper V defines λ_managed (Levels 0–2) and λ_safe (Level 3) as distinct sub-calculi with a bridging judgment. The progressive disclosure spectrum maps onto these:
Dyn for all types, Level 1 uses
Dyn for unannotated positions and concrete types for
annotated ones, Level 2 requires all positions to be concretely
typed.@safe), not at the expression
level.The formal guarantee: for any program P at Level N, if P type-checks at Level N, then P also type-checks at Level N-1 (relaxation is always safe) AND at Level N+1 (strengthening is always sound, because inference adds constraints, never removes them).
Hypothesis: The progressive type spectrum reduces the barrier to adopting memory-safe programming practices. Measured by: developers who start at Level 0 and incrementally annotate to Level 2/3 achieve comparable safety outcomes to developers who start at Level 3, with lower time-to-first-correct-program.
Prior art: Siek & Taha’s gradual typing (2006),
Garcia et al.’s Abstracting Gradual Typing (POPL 2016),
New/Licata/Ahmed’s Gradual Type Theory (POPL 2019). Mojo’s progressive
disclosure (Python def → typed fn). Rust’s
ownership model. Swift’s optional chaining and progressive
strictness.
Novelty claim: No existing language provides a formally continuous spectrum from dynamic typing through gradual typing to affine typing within one grammar, with the bidirectional compatibility guarantee. TypeScript spans Levels 0–2 but has no Level 3. Rust spans Levels 2–3 but has no Levels 0–1. Mojo spans Levels 0–2 for Python compatibility but has no affine types. Garnet is the first to span all four.
Compilers are stateless tools — they consume source code and produce artifacts with no memory of past compilations beyond incremental caches. No compiler uses episodic memory (what optimizations worked previously), semantic memory (knowledge about the codebase’s patterns), or procedural memory (compilation strategies) in the way that Garnet defines these concepts for user programs.
The Garnet compiler can use Garnet’s own four memory types to improve its own performance over time:
Compilation N:
source → [compile] → artifact + compilation trace
Compilation N+1:
source + diff(N, N+1) + compilation trace(N) → [compile with hints] → artifact + trace
The compiler maintains a .garnet-cache/ directory
containing: - episodes.log — timestamped compilation events
with optimization decisions and their outcomes -
knowledge.db — semantic index of the codebase (most-called
functions, hottest loops, largest types) - strategies.db —
validated compilation strategies (e.g., “inlining
fast_path::process always reduces p99 latency”)
On subsequent compilations, the compiler consults these stores to inform optimization decisions. This is NOT profile-guided optimization (PGO) — PGO uses runtime profiling data. This uses the compiler’s own compilation history.
Hypothesis: A compilation-history-aware Garnet compiler produces measurably better-optimized output after 10+ compilations of the same codebase than a stateless compiler, measured by: (a) compilation speed (less time spent on optimization passes that were historically unproductive), and (b) output quality (better inlining decisions based on past outcomes).
Prior art: PGO (profile-guided optimization) in GCC, LLVM, and the JVM. JIT compilation in V8, HotSpot, YJIT. Machine-learning-based compiler optimization (CompilerGym, MLGO). AutoFDO (automatic feedback-directed optimization).
Novelty claim: None of these systems use the compiler’s own compilation trace as a memory source. PGO uses runtime profiles. JITs use runtime execution counts. ML-based approaches train offline models. Garnet’s compiler-as-agent uses its own episodic, semantic, and procedural memory — making it the first compiler that genuinely learns from its own past, using the same memory model it compiles for user programs. The self-referential nature (compiler uses the language’s own memory primitives) is the novelty.
All general-purpose allocators treat all allocations identically. The programmer knows that some data is short-lived (scratch buffers), some is append-only (logs), some is read-heavy (lookup tables), and some is versioned (workflow state) — but the allocator does not. The programmer cannot convey allocation intent to the runtime without reaching for specialized data structures.
Garnet’s memory declarations carry kind information
(working, episodic, semantic,
procedural) that the compiler passes to the allocator:
| Memory Kind | Compiler-Selected Allocator | Access Pattern |
|---|---|---|
| Working | Arena allocator — bulk alloc, bulk free at scope exit | High write, short-lived |
| Episodic | Append-only log — sequential writes, immutable after write | Sequential write, range read |
| Semantic | Persistent allocator — structural sharing, COW nodes | Read-heavy, rare mutation |
| Procedural | COW allocator — copy-on-write with version chain | Read + versioned mutation |
The programmer writes
memory episodic log : EpisodeStore<Event> and the
compiler automatically selects the append-only allocator. No allocator
API is exposed in the language surface.
Hypothesis: Kind-aware allocation reduces peak memory usage and allocation overhead compared to a general-purpose allocator, for programs that use the four memory types, by at least 20% on representative agent workloads.
Prior art: Arena allocation (widely used in game engines, Zig’s std.heap.ArenaAllocator). Region-based memory management (Tofte & Talpin, 1997). Typed assembly language (TAL, Morrisett et al., 1999). Rust’s allocator API (nightly).
Novelty claim: No language automatically selects allocation strategy from type-level kind annotations. Existing approaches require the programmer to explicitly choose the allocator. Garnet is the first to derive allocator selection from the language’s type system.
No production language supports two distinct error-handling models (exceptions AND Result types) with round-trip-lossless conversion at the boundary. Languages choose one model and stick with it.
Bridging operates via two language-level primitives that compose cleanly at the managed↔︎safe boundary:
? operator in managed code unwraps
Ok(v) to v and converts Err(e)
into a raise that the enclosing try/rescue captures.try { ... } rescue e { ... } in safe
code captures a managed raise and returns err(e) to the
safe caller.So:
Err(e) at the
safe boundary (captured via try/rescue +
err(e)).Err(e) becomes a managed raise at the
? site, caught by the enclosing
try/rescue.Ok(v) flows through transparently.try/rescue + ? composition.
Verified by garnet-interp/tests/boundary_errors.rs across
all four directions (safe→managed, managed→safe, double-bounce,
type-mismatch loud-fail) with Err-payload round-trip
("unlucky number" reaches rescue handler intact).Result<T, E> and elides the user’s
try/rescue scaffold, inserting the conversion implicitly.
Requires complete type-checker (Stage 2+). Falsifiable hypothesis
(pass/fail to be measured in v4.0 empirical validation): zero
error-information loss in compiler-generated bridges across a 100-case
corpus.Prior art: Swift’s Objective-C error bridging (NSError ↔︎ throws). Java’s checked ↔︎ unchecked exception hierarchy. Kotlin’s Java interop for exceptions.
Novelty claim: All prior art bridges within one
error paradigm (exceptions to exceptions). Garnet bridges across
paradigms (exceptions to algebraic types and back). The formal
basis in Paper V §4’s bridging judgment — proven in v3.2 via the manual
?/try/rescue composition — ensures no error
information is lost in either direction, a property none of the prior
art guarantees formally. The v4.0 automatic-bridging layer is an
ergonomic refinement of the already-proven semantic primitive.
Long-horizon AI agents run for hours or days. Restarting them to deploy new orchestration logic interrupts active workflows, drops in-flight state, and forces expensive context reconstruction. No systems language supports hot-reloading of orchestration code while performance-critical code continues running.
The mode boundary IS the reload boundary:
[Safe-mode: native, running] ←→ [Mode Boundary] ←→ [Managed-mode: bytecode VM]
↓ hot-reload
[New bytecode loaded]
↓
[Drain buffered messages]
↓
[Resume normal operation]
Hypothesis: Garnet’s hot-reload mechanism achieves zero-downtime orchestration updates for long-running agent systems, with message delivery latency increase bounded to the reload window duration (measured in milliseconds).
Prior art: Erlang/OTP hot code loading (the gold standard). Java’s OSGi dynamic modules. Elixir’s hot reloading. JavaScript’s HMR (Hot Module Replacement) in development servers.
Novelty claim: Erlang reloads within one runtime (BEAM). Garnet reloads across a type-system mode boundary, where one side (safe mode) has zero-cost compiled native code that doesn’t participate in the reload. This is architecturally distinct: Erlang trades performance for universal reloadability, while Garnet preserves native performance in the hot path and confines reloading to the orchestration layer.
No mainstream language guarantees that the same source + same compiler version = byte-identical output. Non-determinism in compilation (timestamps, randomized optimization choices, platform-dependent codegen) means builds are not reproducible. For agent systems, this is a security risk: you cannot verify that a deployed binary matches its source.
Garnet’s compiler guarantees: 1. Deterministic
codegen: No timestamps, no randomized choices, no
platform-dependent paths (all platform differences are explicit in
target triples) 2. Provenance manifest: Every build
produces a .garnet-manifest file containing the SHA-256
hash of every input (source files, dependencies, compiler version,
target triple, optimization flags) 3. Embedded
manifest: The manifest hash is embedded in the output binary’s
metadata section, so any binary can prove its own provenance
$ garnet build --release
→ my-agent (binary)
→ my-agent.garnet-manifest (JSON: hashes of all inputs + output)
$ garnet verify my-agent
✓ Binary hash matches manifest
✓ All source hashes match
✓ Compiler version: garnetc 0.3.0 (2026-04-16)
✓ Build is deterministically reproducible
Hypothesis: Given identical source, dependencies, compiler version, and target triple, two independent Garnet compilations produce byte-identical output. This is testable by compiling the same project on two different machines and comparing hashes.
Prior art: Reproducible Builds project
(reproducible-builds.org). Nix/Guix reproducible build systems. Go’s
reproducible builds (largely achieved in practice). Rust’s
-C codegen-units=1 for more deterministic output.
Novelty claim: Existing approaches achieve reproducibility as an emergent property of careful engineering. Garnet specifies it as a language-level guarantee with a first-class provenance mechanism. The manifest is not an external tool — it is part of the compiler’s output contract.
These seven contributions are not theoretical wishes. Each one: 1. Emerges from design decisions already present in Mini-Spec v0.3 2. Has a falsifiable hypothesis that can be tested against existing languages 3. Is bounded against prior art with precise novelty claims 4. Can be implemented incrementally across the engineering ladder (Rungs 3–6)
Together they position Garnet not merely as a language reconciling Rust and Ruby, but as a research vehicle for the next generation of programming language theory — one that takes AI generation, agent systems, and compilation intelligence as first-class design constraints alongside the traditional concerns of safety, performance, and expressiveness.
The question these contributions pose to the programming language community is not whether they are achievable (each builds on proven precedent), but whether a single language can credibly embody all seven simultaneously. Garnet’s dual-mode architecture makes this possible by providing a clean separation surface: contributions 1–3 operate across the full language, while contributions 4–7 leverage the specific properties of the mode boundary to achieve results that a single-mode language cannot.
“In the multitude of counsellors there is safety.” — Proverbs 11:14 “The plans of the diligent lead surely to abundance.” — Proverbs 21:5 “Where there is no vision, the people perish.” — Proverbs 29:18
Paper VI prepared by Claude Code (Opus 4.6) | April 16, 2026