
🚀 Deployment & Infrastructure: The Enterprise Manifold

The coreason-runtime is not designed to run as a fragile local script in production. It is engineered as a strictly network-isolated Deployment Manifold.

This guide explains how to deploy the entire cybernetic mesh (Runtime + Orchestrator + Tensor Engine) onto a bare-metal Linux host or a VM hypervisor (such as Proxmox) using hardware GPU passthrough.


1. Prerequisites

Before booting the mesh, your host machine must meet the following infrastructure requirements:

  • OS: Linux (Ubuntu 24.04 LTS or Debian 12 recommended).
  • Orchestration: Docker Engine (v24+) and Docker Compose (v2+).
  • Hardware: A physical NVIDIA GPU (e.g., RTX 3090/4090, A100, H100).
  • Drivers: NVIDIA Container Toolkit installed and configured for the Container Device Interface (CDI).

2. Host Preparation: The NVIDIA CDI

Because the Swarm's active inference loop is highly sensitive to Time-To-First-Token (TTFT) latency, we cannot use CPU-bound models or heavy virtualization abstraction layers. Instead, we expose the physical GPU directly to the sglang container.

Ensure your Docker daemon is configured to use the NVIDIA runtime. Generate the CDI specification on your host:

sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
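
To confirm the spec generated correctly, list the device names the toolkit registered (a quick sanity check; cdi list is part of the same nvidia-ctk toolkit):

# Verify the CDI devices registered on this host
nvidia-ctk cdi list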

Note: The generated CDI spec allows our compose.yaml to request capabilities: [gpu] natively, without the legacy --gpus all flag.
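
For reference, here is a minimal sketch of the GPU reservation stanza such a compose.yaml relies on; the service name and device count are illustrative, and the repository's own compose.yaml is authoritative:

# Sketch: GPU reservation for the sglang service (illustrative)
services:
  sglang:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]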


3. Environment Projection

Clone the repository to your host machine and project the environment topology.
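
For example, assuming a standard Git remote (the URL below is a placeholder; substitute your actual remote):

# Placeholder URL: replace with the real coreason-runtime remote
git clone https://github.com/your-org/coreason-runtime.git
cd coreason-runtime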

cp .env.example .env

Open .env and configure the necessary variables. Crucially, you must provide an HF_TOKEN (a Hugging Face User Access Token); the SGLang container requires it to pull gated foundation models such as Meta-Llama-3-8B-Instruct.

# .env
SGLANG_URL=http://sglang:30000
LANCEDB_URI=/app/data/lancedb
ECOSYSTEM_REGISTRY_URL=http://ecosystem:8080
TELEMETRY_BROKER_URL=http://localhost:8000
TEMPORAL_HOST=temporal:7233
HF_TOKEN=hf_your_secure_token_here
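
Before booting, you can have Compose render the fully-interpolated configuration, which catches missing or malformed variables early:

# Render the merged compose file with .env interpolation applied
docker compose config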

4. Booting the Mesh

With the environment projected and the GPU primed, you can boot the entire organism. We execute this via the root compose.yaml:

docker compose up -d --build
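
Once the stack is up, confirm that all three services are running and tail the runtime's logs:

# List service status
docker compose ps

# Follow the runtime edge logs
docker compose logs -f runtime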

The Topography of the Mesh

Once booted, the manifold isolates the components into three microservices:

  1. runtime (Port 8000): The FastAPI edge and Temporal Worker. This is the only service that exposes an API to the public network/frontend IDE. It runs securely as a non-root user (UID 10000).
  2. sglang (Port 30000): The Cognitive Engine. It monopolizes the GPU and binds only to the internal Docker bridge network, publishing no host ports, so it is invisible to external port scanners (see the network sketch after this list).
  3. temporal (Port 7233 / Web UI 8233): The Orchestration Substrate. Handles the heavy gRPC task queue routing and durable state serialization.
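
A minimal sketch of how that sglang isolation is typically expressed in Compose; the network name is illustrative, and the key points are the absence of a ports: mapping and the internal-only bridge:

# Sketch: sglang publishes no host ports and joins an internal-only network
services:
  sglang:
    networks:
      - mesh-internal

networks:
  mesh-internal:
    internal: true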

5. Epistemic Persistence & Volume Mounts

If a container crashes, the Linux OOM-killer terminates a process, or you upgrade the daemon, the Swarm must not experience amnesia.

The compose.yaml maps physical host directories into the containers via strict bind mounts (a sketch of the corresponding stanza follows the list below). Because the runtime executes as the unprivileged coreason user (for WASM sandbox security), you must ensure the host directories have the correct permissions:

# Create the physical data directories on the host
mkdir -p data/lancedb data/bronze data/silver data/gold

# Grant read/write access to the internal container user (UID 10000)
sudo chown -R 10000:10000 data/

  • data/lancedb/: The continuous Epistemic Vector Ledger.
  • data/bronze/, silver/, gold/: The Medallion ETL intelligence matrices.
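
A sketch of the corresponding bind mounts and user mapping (the container paths follow LANCEDB_URI from the .env above; the exact stanza in the repository's compose.yaml is authoritative):

# Sketch: runtime runs as UID 10000 and bind-mounts the host data/ tree
services:
  runtime:
    user: "10000:10000"
    volumes:
      - ./data/lancedb:/app/data/lancedb
      - ./data/bronze:/app/data/bronze
      - ./data/silver:/app/data/silver
      - ./data/gold:/app/data/gold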

6. Observability & Analytics

Once the mesh is running, you have two primary observability vectors:

A. Real-Time Swarm Topography (Temporal)

To watch the deterministic AST execution, visualize retries, or manually intervene in a stalled workflow, navigate to the Temporal Web UI: 👉 http://<your-host-ip>:8233

B. Offline Epistemic Auditing (The Medallion Pipeline)

The daemon automatically funnels Server-Sent Events (SSE) telemetry into local .parquet files. To audit token economics, epistemic drift rates, or node latency, simply point a Jupyter Notebook or a local Polars script at your host's data/gold/ directory:

import polars as pl

# Audit total token spend and node activation
df = pl.read_parquet("./data/gold/metrics.parquet")
print(df)
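
From there, ordinary Polars expressions cover most audits. For instance, a per-node rollup (the columns node_id, tokens, and latency_ms are hypothetical; inspect df.columns for your actual gold schema):

# Hypothetical columns; check df.columns against your gold schema
summary = df.group_by("node_id").agg(
    pl.col("tokens").sum().alias("total_tokens"),
    pl.col("latency_ms").mean().alias("avg_latency_ms"),
)
print(summary)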