Architecture Overview

Conduit is built on a two-plane architecture that brings queries to the data rather than moving data to a central location. This design lets Conduit meet data sovereignty requirements while providing unified access across all your industrial data sources.

Core Principles

1. Data Stays Where It Is

Unlike traditional data platforms that require ETL pipelines and data lakes, Conduit queries data at the source. Your Splunk data stays in Splunk. Your MQTT streams stay on the broker. Conduit simply provides a unified, AI-powered query interface.

2. Queries Travel, Data Doesn't

When you ask Conduit a question, the query is compiled and routed to the appropriate translators. Each translator converts the query to its source's native format (SPL for Splunk, topic subscriptions for MQTT, MCP calls for IoT protocols), executes it locally, and returns only the results.
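The routing step can be pictured with a minimal sketch. Everything here is an assumption made for illustration: the `Translator` interface, the `CompiledQuery` shape, the `plant` index name, and the topic layout are hypothetical and not Conduit's actual types.

```typescript
// Hypothetical shapes for illustration only; not Conduit's real API.
type SourceKind = "splunk" | "mqtt";

interface CompiledQuery {
  tag: string;             // logical tag name, e.g. "Tank1.Temperature"
  lookbackMinutes: number; // how far back to query
}

interface Translator {
  kind: SourceKind;
  // Convert the compiled query into the source's native form.
  toNative(query: CompiledQuery): string;
}

const translators: Translator[] = [
  {
    kind: "splunk",
    // Assumed index and field names; a real deployment defines its own.
    toNative: (q) =>
      `search index=plant tag="${q.tag}" earliest=-${q.lookbackMinutes}m | stats avg(value)`,
  },
  {
    kind: "mqtt",
    // Assumed mapping from dotted tag names to a topic path.
    toNative: (q) => `plant/${q.tag.replace(/\./g, "/")}`,
  },
];

// Send the query to each targeted translator and collect the native form.
function route(query: CompiledQuery, targets: SourceKind[]): Map<SourceKind, string> {
  const native = new Map<SourceKind, string>();
  for (const t of translators) {
    if (targets.indexOf(t.kind) !== -1) {
      native.set(t.kind, t.toNative(query));
    }
  }
  return native;
}

const routed = route({ tag: "Tank1.Temperature", lookbackMinutes: 60 }, ["splunk", "mqtt"]);
```

Only the native query strings leave the control plane in this sketch; the data itself is read where it lives.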

3. Context is Centralized, Data is Distributed

While data remains distributed, Conduit maintains a centralized Context Store (PostgreSQL + pgvector) that holds metadata, semantic embeddings, Golden Templates, and organizational context. This enables natural language queries without centralizing the actual data.

System Components

┌─────────────────────────────────────────────────────────────┐
│                      CONTROL PLANE (Fastify)                │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐ │
│  │ Context     │  │ Multi-LLM   │  │ Query Planner       │ │
│  │ Store (PG)  │  │ AI Engine   │  │ (DAG)               │ │
│  └─────────────┘  └─────────────┘  └─────────────────────┘ │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐ │
│  │ Golden      │  │ Mesh        │  │ Vector Search       │ │
│  │ Templates   │  │ Registry    │  │ (pgvector)          │ │
│  └─────────────┘  └─────────────┘  └─────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
          │ MCP (queries)            │ NATS (real-time)
     ┌────▼────────┐          ┌─────▼───────────┐
     │ Query Plane │          │ Real-Time Plane │
     │ (HTTP/MCP)  │          │ (NATS pub/sub)  │
     └────┬────────┘          └─────┬───────────┘
          │                         │
     ┌────▼─────────────────────────▼────┐
     │          TRANSLATORS              │
     │  Splunk  │  MQTT  │  OPC-UA  │MCP │
     └──────────┴────────┴──────────┴────┘

Control Plane

The Control Plane is the brain of Conduit, built on Fastify for high-performance HTTP handling. It handles query compilation, routing, AI processing, and context management.

Key Components

Context Store (PostgreSQL + pgvector)

A PostgreSQL database with the pgvector extension that stores:

Tag Catalog: Metadata about all discovered tags across all sources
Semantic Embeddings: Vector representations for hybrid BM25 + semantic search
Golden Templates: Learned query patterns with confidence scores
Organizational Context: Department/role-based filtering and access patterns
UNS Topics: Dynamic Unified Namespace structure
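To make the idea of hybrid search concrete, here is a minimal sketch of one common way to blend a lexical score with a semantic score. The weighted-sum fusion, the `alpha` parameter, and the `Candidate` shape are assumptions; this page does not specify how Conduit combines BM25 and pgvector results, and in practice both scores would come from the database rather than in-process code.

```typescript
// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length || a.length === 0) {
    throw new Error("vectors must have the same non-zero length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

// Hypothetical candidate row: a tag with a precomputed BM25 score and embedding.
interface Candidate {
  tag: string;
  bm25: number;
  embedding: number[];
}

// Blend normalized BM25 with cosine similarity; alpha weights the lexical part.
function hybridRank(
  queryEmbedding: number[],
  candidates: Candidate[],
  alpha = 0.5,
): { tag: string; score: number }[] {
  const maxBm25 = Math.max(0, ...candidates.map((c) => c.bm25));
  return candidates
    .map((c) => {
      const lexical = maxBm25 > 0 ? c.bm25 / maxBm25 : 0;
      const semantic = cosineSimilarity(queryEmbedding, c.embedding);
      return { tag: c.tag, score: alpha * lexical + (1 - alpha) * semantic };
    })
    .sort((x, y) => y.score - x.score);
}

const ranked = hybridRank([1, 0], [
  { tag: "Tank1.Level", bm25: 1.0, embedding: [0, 1] },
  { tag: "Tank1.Temperature", bm25: 2.0, embedding: [1, 0] },
]);
```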

Multi-LLM AI Engine

Supports pluggable LLM providers for query interpretation:

Claude (Anthropic) - Primary recommendation
OpenAI (GPT-4 / GPT-4o)
Azure OpenAI - For enterprise Azure deployments
Ollama - Self-hosted open-source models
Mock - For testing and development
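A pluggable provider layer usually comes down to one shared interface plus a registry. The sketch below is an assumption about how that could look; the `LlmProvider` interface, `Interpretation` shape, and registry functions are hypothetical names, and only a mock provider is implemented so the example runs without network access.

```typescript
// Hypothetical result of interpreting a natural language question.
interface Interpretation {
  intent: string;
  entities: Record<string, string>;
}

// Hypothetical contract every provider (Claude, OpenAI, Ollama, ...) would satisfy.
interface LlmProvider {
  readonly name: string;
  interpret(question: string): Promise<Interpretation>;
}

// Deterministic stand-in for tests: pattern matching instead of a model call.
class MockProvider implements LlmProvider {
  readonly name = "mock";

  async interpret(question: string): Promise<Interpretation> {
    const asset = /tank\s*(\d+)/i.exec(question);
    return {
      intent: /average/i.test(question) ? "aggregate" : "lookup",
      entities: asset ? { asset: `Tank ${asset[1]}` } : {},
    };
  }
}

const providers = new Map<string, LlmProvider>();

function registerProvider(provider: LlmProvider): void {
  providers.set(provider.name, provider);
}

function getProvider(name: string): LlmProvider {
  const provider = providers.get(name);
  if (!provider) throw new Error(`unknown LLM provider: ${name}`);
  return provider;
}

registerProvider(new MockProvider());
```

Swapping providers then means registering another implementation of the same interface and changing the configured name.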

NQE Parser & Query Compiler

The Natural Query Engine processes natural language queries through several stages:

Golden Template Matching: Check for high-confidence learned patterns
AI Interpretation: Understand user intent and extract entities via LLM
Query Generation: Create structured query (SPL, MQTT, MCP commands)
Validation: Check against the tag catalog and organizational context
User Review: Present query for user confirmation or adjustment
Federation Planning: Determine which translators to query
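The first stage, Golden Template matching, acts as a fast path that can skip the LLM call. A minimal sketch follows, assuming templates are stored as regular expressions with a confidence score. The `GoldenTemplate` shape, the 0.9 threshold, and the output query syntax are all hypothetical.

```typescript
// Hypothetical template: a pattern, a learned confidence, and a query builder.
interface GoldenTemplate {
  pattern: RegExp; // must not use the global flag, so exec() stays stateless
  confidence: number;
  build(match: RegExpExecArray): string;
}

interface TemplateMatch {
  query: string;
  confidence: number;
}

// Assumed cut-off for taking the fast path.
const FAST_PATH_THRESHOLD = 0.9;

// Return the highest-confidence template that matches and clears the threshold.
function matchTemplate(question: string, templates: GoldenTemplate[]): TemplateMatch | null {
  let best: TemplateMatch | null = null;
  for (const t of templates) {
    if (t.confidence < FAST_PATH_THRESHOLD) continue;
    const m = t.pattern.exec(question);
    if (m && (best === null || t.confidence > best.confidence)) {
      best = { query: t.build(m), confidence: t.confidence };
    }
  }
  return best;
}

const templates: GoldenTemplate[] = [
  {
    pattern: /average (\w+) for (tank \d+) over the last hour/i,
    confidence: 0.92,
    // Illustrative output format only.
    build: (m) => `avg(${m[1]}) asset="${m[2]}" window=1h`,
  },
];

const hit = matchTemplate("Show average temperature for Tank 1 over the last hour", templates);
const miss = matchTemplate("List all alarms from yesterday", templates);
```

When `matchTemplate` returns `null`, the pipeline would fall through to AI interpretation.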

Query Planner (DAG Engine)

The Query Planner builds directed acyclic graphs (DAGs) for multi-source query execution:

UNS Resolution: Maps NQE fields to physical data source paths via the Unified Namespace
DAG Construction: Groups resolved fields by source, creates parallel execution nodes
Partial Execution: Returns results from available sources immediately, then updates them as slower sources respond
Formula Evaluation: DuckDB compute nodes evaluate cross-source formulas within the DAG
Plan Caching: Redis-backed cache with SHA-256 topology hashing for instant repeat queries
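DAG construction and topology hashing can be sketched in a few lines. The `ResolvedField` and `PlanNode` shapes and the canonical JSON form are assumptions; the only details taken from this page are grouping fields by source, parallel fetch nodes feeding a compute node, and a SHA-256 hash of the plan topology as the cache key.

```typescript
import { createHash } from "node:crypto";

// Hypothetical output of UNS resolution: a logical field mapped to a source path.
interface ResolvedField {
  field: string;
  source: string;
  path: string;
}

interface PlanNode {
  id: string;
  source: string;
  paths: string[];
  dependsOn: string[];
}

// Group fields by source into independent fetch nodes, then add one compute
// node that depends on all of them.
function buildPlan(fields: ResolvedField[]): PlanNode[] {
  const bySource = new Map<string, string[]>();
  for (const f of fields) {
    const paths = bySource.get(f.source) ?? [];
    paths.push(f.path);
    bySource.set(f.source, paths);
  }
  const fetchNodes: PlanNode[] = Array.from(bySource.entries())
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
    .map(([source, paths]) => ({
      id: `fetch:${source}`,
      source,
      paths: paths.slice().sort(),
      dependsOn: [],
    }));
  const compute: PlanNode = {
    id: "compute",
    source: "duckdb",
    paths: [],
    dependsOn: fetchNodes.map((n) => n.id),
  };
  return [...fetchNodes, compute];
}

// Hash a canonical form of the plan so equivalent queries share a cache key.
function topologyHash(plan: PlanNode[]): string {
  const canonical = JSON.stringify(plan.map((n) => [n.id, n.paths, n.dependsOn]));
  return createHash("sha256").update(canonical).digest("hex");
}

const plan = buildPlan([
  { field: "temperature", source: "splunk", path: "plant/tank1/temp" },
  { field: "level", source: "modbus", path: "holding/40001" },
]);
const cacheKey = topologyHash(plan);
```

Sorting sources and paths before hashing means the same fields requested in a different order produce the same key.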

Pattern Detection Engine

Correction pattern analysis for query improvement:

Tracks query corrections and identifies recurring patterns
Confidence scoring based on correction frequency and outcomes
Auto-promotion pipeline promotes patterns to Golden Templates when thresholds are met
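As a rough illustration of threshold-based promotion, the sketch below scores a pattern by its success ratio and promotes it once it has been seen often enough. The scoring formula, both thresholds, and the `CorrectionPattern` shape are assumptions; this page does not state how Conduit computes confidence.

```typescript
// Hypothetical record of a recurring correction.
interface CorrectionPattern {
  id: string;
  occurrences: number; // how often the correction was observed
  successes: number;   // how often applying it produced an accepted query
}

// Assumed thresholds; real values would be configurable.
const MIN_OCCURRENCES = 5;
const MIN_CONFIDENCE = 0.8;

// Confidence as a plain success ratio (an assumed formula).
function patternConfidence(p: CorrectionPattern): number {
  return p.occurrences === 0 ? 0 : p.successes / p.occurrences;
}

// Promote only patterns that are both frequent and reliable.
function shouldPromote(p: CorrectionPattern): boolean {
  return p.occurrences >= MIN_OCCURRENCES && patternConfidence(p) >= MIN_CONFIDENCE;
}

const candidates: CorrectionPattern[] = [
  { id: "tank-alias", occurrences: 10, successes: 9 },
  { id: "rare-fix", occurrences: 3, successes: 3 },
  { id: "unreliable", occurrences: 10, successes: 5 },
];
const promoted = candidates.filter(shouldPromote).map((p) => p.id);
```

Requiring a minimum number of occurrences keeps a pattern seen only a handful of times from being promoted on a perfect but tiny sample.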

DuckDB Correlation Engine

In-memory analytical engine for cross-source data correlation:

Time-window based joins across different translators
Anchor event correlation with context windows
High-performance aggregation and analytics
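Anchor event correlation can be illustrated without a database. The sketch below shows the concept in plain TypeScript: for each anchor event, collect the samples that fall inside a context window around it. It is not how the DuckDB engine is invoked, and the `Sample` and `AnchorEvent` shapes are hypothetical.

```typescript
// Hypothetical normalized rows returned by two different translators.
interface Sample {
  ts: number; // epoch milliseconds
  tag: string;
  value: number;
}

interface AnchorEvent {
  ts: number; // epoch milliseconds
  label: string;
}

interface Correlated {
  anchor: AnchorEvent;
  context: Sample[];
}

// For each anchor, keep samples within [ts - beforeMs, ts + afterMs].
function correlate(
  anchors: AnchorEvent[],
  samples: Sample[],
  beforeMs: number,
  afterMs: number,
): Correlated[] {
  return anchors.map((anchor) => ({
    anchor,
    context: samples.filter(
      (s) => s.ts >= anchor.ts - beforeMs && s.ts <= anchor.ts + afterMs,
    ),
  }));
}

const alarms: AnchorEvent[] = [{ ts: 10_000, label: "High temperature alarm" }];
const readings: Sample[] = [
  { ts: 4_000, tag: "Tank1.Temperature", value: 71.2 },
  { ts: 9_000, tag: "Tank1.Temperature", value: 88.5 },
  { ts: 11_000, tag: "Tank1.Temperature", value: 90.1 },
  { ts: 20_000, tag: "Tank1.Temperature", value: 75.0 },
];

// Five seconds before and two seconds after each alarm.
const correlated = correlate(alarms, readings, 5_000, 2_000);
```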

Translators

Translators are standalone containers that connect to your data sources and handle protocol translation. Each translator publishes real-time values to NATS and exposes an MCP server for queries.

Production Translators

| Translator      | Protocol   | Status     | Use Case                        |
| --------------- | ---------- | ---------- | ------------------------------- |
| Splunk          | REST/SPL   | Production | Log analytics, SIEM data        |
| MQTT            | mqtt.js v5 | Production | IoT telemetry, Sparkplug B      |
| OPC-UA          | OPC-UA     | Production | Industrial automation servers   |
| MCP IoT Gateway | MCP        | Production | Modbus, S7, EtherNet/IP, BACnet |

MCP IoT Gateway

The MCP IoT Gateway is a multi-protocol translator that uses the Model Context Protocol to dynamically connect to multiple industrial protocols:

Modbus TCP/RTU: PLCs and field devices
Siemens S7: S7-300/400/1200/1500 PLCs
EtherNet/IP: Rockwell/Allen-Bradley controllers
BACnet/IP: Building automation systems

Two Data Planes

Query Plane (MCP over HTTP)

Historical and analytical queries flow through MCP:

tools/call: Execute NQE queries against translators
tools/list: Discover capabilities across all connected translators
Bulk export: Large result sets from Splunk, historians
REST fallback: For environments where direct MCP isn't available
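MCP messages are JSON-RPC 2.0, so a `tools/call` request is a small JSON envelope. The sketch below builds one; the tool name `query_tags` and its arguments are hypothetical, since the tools a translator actually exposes are discovered through `tools/list`.

```typescript
// JSON-RPC 2.0 envelope for an MCP tools/call request.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: {
    name: string;
    arguments: Record<string, unknown>;
  };
}

let nextRequestId = 1;

// Build a request for a named tool; each request gets a fresh id.
function buildToolCall(name: string, args: Record<string, unknown>): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: nextRequestId++,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// Hypothetical tool name and arguments, for illustration only.
const request = buildToolCall("query_tags", { tag: "Tank1.Temperature", range: "1h" });
const body = JSON.stringify(request);
```

The serialized `body` is what would be sent to a translator's MCP endpoint over HTTP.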

Real-Time Plane (NATS)

Push-based real-time data flows through NATS:

Sub-millisecond delivery: Tag values published and received in under 1ms
Wildcard subscriptions: Subscribe to uns.PlantA.Line3.> for all tags on Line 3
Edge-to-edge: Translators communicate directly through NATS without the control plane
JetStream: Persistent streams for bulk export and replay after consumer disconnection
WebSocket bridge: Portal dashboards receive NATS data via WebSocket gateway
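NATS wildcard rules are simple enough to show directly: `*` matches exactly one token and `>` matches one or more trailing tokens. The function below is a standalone sketch of those matching rules for reasoning about subscriptions; it is not part of the NATS client library, and the subjects under `uns.PlantA` are illustrative.

```typescript
// Return true if a NATS subject matches a subscription pattern.
// "*" matches exactly one token; ">" must be last and matches one or more tokens.
function subjectMatches(pattern: string, subject: string): boolean {
  const p = pattern.split(".");
  const s = subject.split(".");
  for (let i = 0; i < p.length; i++) {
    if (p[i] === ">") {
      return i === p.length - 1 && s.length > i;
    }
    if (i >= s.length) return false;
    if (p[i] !== "*" && p[i] !== s[i]) return false;
  }
  return p.length === s.length;
}

const line3 = "uns.PlantA.Line3.>";
const matchesDeepTag = subjectMatches(line3, "uns.PlantA.Line3.Tank1.Temperature");
const matchesOtherLine = subjectMatches(line3, "uns.PlantA.Line4.Tank1.Temperature");
const matchesAnyPlant = subjectMatches("uns.*.Line3.Tank1", "uns.PlantB.Line3.Tank1");
```

Note that `uns.PlantA.Line3.>` does not match the bare subject `uns.PlantA.Line3`, because `>` requires at least one further token.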

Message Flow

1. User submits natural language query
2. Golden Template matching (fast path if match found)
3. AI interprets intent via configured LLM provider
4. User reviews and confirms the compiled query
5. Federation layer routes to target translators
6. Translators compile to native format (SPL, MQTT sub, MCP call)
7. Results normalized and returned via MCP
8. DuckDB correlates cross-source results
9. Control Plane returns unified results to user

Security

All communication is secured with:

TLS 1.3 encryption in transit
JWT tokens for authentication
API keys for translator identity
mTLS for mesh node communication with automatic certificate rotation
Health endpoint reports certificate days-until-expiry
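The days-until-expiry figure is a date difference. A minimal sketch follows, assuming the certificate's `notAfter` date is already known; the `CertificateHealth` shape, the status names, and the 14-day warning threshold are hypothetical and not Conduit's actual health payload.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Whole days remaining before the certificate expires (negative once expired).
function daysUntilExpiry(notAfter: Date, now: Date = new Date()): number {
  return Math.floor((notAfter.getTime() - now.getTime()) / MS_PER_DAY);
}

// Hypothetical health payload fragment.
interface CertificateHealth {
  daysUntilExpiry: number;
  status: "ok" | "expiring" | "expired";
}

// Assumed warning threshold of 14 days.
function certificateHealth(notAfter: Date, now: Date = new Date(), warnDays = 14): CertificateHealth {
  const days = daysUntilExpiry(notAfter, now);
  const status = days < 0 ? "expired" : days <= warnDays ? "expiring" : "ok";
  return { daysUntilExpiry: days, status };
}

const health = certificateHealth(
  new Date("2025-01-31T00:00:00Z"),
  new Date("2025-01-01T00:00:00Z"),
);
```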

Data Flow Example

Let's trace a query through the system:

Query: "Show average temperature for Tank 1 over the last hour"

1. NQE Query Compiler
   Input: "Show average temperature for Tank 1 over the last hour"
   Golden Template: Match found (confidence: 0.92)
   Output: SPL query + MCP IoT Gateway command

2. User Review
   User sees compiled query with confidence score
   User confirms or adjusts

3. Federation Planner
   Decision: Route to Splunk translator (historical data)
            + MCP IoT Gateway (current values via Modbus)

4. Parallel Execution
   Splunk: SPL query against indexed data → 60 data points
   MCP IoT Gateway: Modbus register read → current value

5. DuckDB Correlation
   Combine results, time-align, apply aggregation
   Return unified dataset with metadata

Technology Stack

| Component     | Technology                                |
| ------------- | ----------------------------------------- |
| Control Plane | Fastify (Node.js)                         |
| Portal        | React + Vite                              |
| Database      | PostgreSQL 15+ with pgvector              |
| Cache         | Redis                                     |
| Correlation   | DuckDB (in-memory)                        |
| Search        | Hybrid BM25 + pgvector semantic           |
| Transport     | NATS (real-time pub/sub) + MCP (queries)  |
| AI            | Multi-LLM (Claude, OpenAI, Azure, Ollama) |

Scaling

Horizontal Scaling

Control Plane: Scale behind a load balancer for high query volumes
Translators: Deploy multiple translators per source for redundancy
PostgreSQL: Read replicas for query scaling
Redis: Cluster mode for distributed caching

Vertical Scaling

DuckDB: Increase memory for larger correlation datasets
Translators: Tune connection pools for high-throughput sources

Next Steps

NQE Guide - Learn the natural query language
Translators - Learn about translator architecture
Deployment - Production deployment guide