Architecture Overview
Conduit is built on a two-plane architecture, a Query Plane and a Real-Time Plane coordinated by a Control Plane, that brings queries to data rather than moving data to a central location. This design lets Conduit meet data sovereignty requirements while providing unified access across all your industrial data sources.
Core Principles
1. Data Stays Where It Is
Unlike traditional data platforms that require ETL pipelines and data lakes, Conduit queries data at the source. Your Splunk data stays in Splunk. Your MQTT streams stay on the broker. Conduit simply provides a unified, AI-powered query interface.
2. Queries Travel, Data Doesn't
When you ask Conduit a question, the query is compiled and routed to the appropriate translators. Each translator converts the query to its source's native format (SPL for Splunk, topic subscriptions for MQTT, MCP calls for industrial protocols), executes it locally, and returns only the results.
3. Context is Centralized, Data is Distributed
While data remains distributed, Conduit maintains a centralized Context Store (PostgreSQL + pgvector) that holds metadata, semantic embeddings, Golden Templates, and organizational context. This enables natural language queries without centralizing the actual data.
System Components
┌─────────────────────────────────────────────────────────────┐
│                   CONTROL PLANE (Fastify)                   │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐  │
│  │ Context     │  │ Multi-LLM   │  │ Query Planner       │  │
│  │ Store (PG)  │  │ AI Engine   │  │ (DAG)               │  │
│  └─────────────┘  └─────────────┘  └─────────────────────┘  │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐  │
│  │ Golden      │  │ Mesh        │  │ Vector Search       │  │
│  │ Templates   │  │ Registry    │  │ (pgvector)          │  │
│  └─────────────┘  └─────────────┘  └─────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
         │ MCP (queries)           │ NATS (real-time)
    ┌────▼────────┐          ┌─────▼───────────┐
    │ Query Plane │          │ Real-Time Plane │
    │ (HTTP/MCP)  │          │ (NATS pub/sub)  │
    └────┬────────┘          └─────┬───────────┘
         │                         │
    ┌────▼─────────────────────────▼────┐
    │            TRANSLATORS            │
    │ Splunk   │ MQTT   │ OPC-UA   │MCP │
    └──────────┴────────┴──────────┴────┘
Control Plane
The Control Plane is the brain of Conduit, built on Fastify for high-performance HTTP handling. It handles query compilation, routing, AI processing, and context management.
Key Components
Context Store (PostgreSQL + pgvector)
A PostgreSQL database with the pgvector extension that stores:
- Tag Catalog: Metadata about all discovered tags across all sources
- Semantic Embeddings: Vector representations for hybrid BM25 + semantic search
- Golden Templates: Learned query patterns with confidence scores
- Organizational Context: Department/role-based filtering and access patterns
- UNS Topics: Dynamic Unified Namespace structure
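As a rough illustration of what hybrid BM25 + semantic search can look like, the sketch below blends a lexical score with a vector similarity score. It is an assumption made for illustration only: the function names, the weight `alpha`, and the max-normalization of BM25 scores are hypothetical and are not taken from Conduit.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def hybrid_rank(bm25_scores: dict[str, float],
                semantic_scores: dict[str, float],
                alpha: float = 0.5) -> list[str]:
    """Rank tags by alpha * normalized BM25 + (1 - alpha) * semantic score.

    alpha is a hypothetical tuning weight; BM25 scores are divided by the
    best BM25 score so both signals fall roughly in the 0..1 range.
    """
    top = max(bm25_scores.values(), default=0.0)

    def blended(tag: str) -> float:
        lexical = bm25_scores.get(tag, 0.0) / top if top else 0.0
        return alpha * lexical + (1 - alpha) * semantic_scores.get(tag, 0.0)

    tags = set(bm25_scores) | set(semantic_scores)
    # Highest blended score first; ties broken alphabetically for stability.
    return sorted(tags, key=lambda t: (-blended(t), t))
```

A tag that scores well on only one signal can still surface, which is the usual motivation for combining keyword and embedding search.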
Multi-LLM AI Engine
Supports pluggable LLM providers for query interpretation:
- Claude (Anthropic) - Primary recommendation
- OpenAI (GPT-4 / GPT-4o)
- Azure OpenAI - For enterprise Azure deployments
- Ollama - Self-hosted open-source models
- Mock - For testing and development
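A pluggable provider layer like the one listed above is often built as a small registry behind a common interface. The following is a minimal sketch under that assumption; `LLMProvider`, `MockProvider`, `register_provider`, and `get_provider` are hypothetical names, not Conduit's actual API.

```python
from typing import Callable, Protocol

class LLMProvider(Protocol):
    """Interface every provider is assumed to implement."""
    def interpret(self, question: str) -> dict: ...

class MockProvider:
    """Deterministic provider for testing and development: no network calls."""
    def interpret(self, question: str) -> dict:
        return {"intent": "mock", "question": question}

# Registry of provider factories, keyed by a configuration name.
_PROVIDERS: dict[str, Callable[[], LLMProvider]] = {"mock": MockProvider}

def register_provider(name: str, factory: Callable[[], LLMProvider]) -> None:
    """Add or replace a provider factory (e.g. for Claude, OpenAI, Ollama)."""
    _PROVIDERS[name] = factory

def get_provider(name: str) -> LLMProvider:
    """Instantiate the provider selected in configuration."""
    try:
        return _PROVIDERS[name]()
    except KeyError:
        raise ValueError(f"unknown LLM provider: {name!r}") from None
```

With this shape, switching providers is a configuration change, and the mock provider keeps tests independent of any external service.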
NQE Parser & Query Compiler
The Natural Query Engine processes natural language queries through several stages:
- Golden Template Matching: Check for high-confidence learned patterns
- AI Interpretation: Understand user intent and extract entities via LLM
- Query Generation: Create structured query (SPL, MQTT, MCP commands)
- Validation: Check against the tag catalog and organizational context
- User Review: Present query for user confirmation or adjustment
- Federation Planning: Determine which translators to query
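The first two stages can be pictured as a fast path with a fallback: use a Golden Template when one matches with high confidence, otherwise ask the LLM. The sketch below assumes exact matching on normalized question text and a threshold of 0.9; both are simplifications chosen for illustration, and every name in it is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional

FAST_PATH_THRESHOLD = 0.9  # hypothetical cut-off, not a documented Conduit value

@dataclass
class GoldenTemplate:
    pattern: str       # normalized question text the template was learned from
    query: str         # previously confirmed structured query
    confidence: float  # learned confidence score in 0..1

def normalize(question: str) -> str:
    """Lowercase and collapse whitespace so trivial variations still match."""
    return " ".join(question.lower().split())

def compile_query(question: str,
                  templates: list[GoldenTemplate],
                  interpret_with_llm: Callable[[str], str]) -> dict:
    """Return a compiled query, preferring a high-confidence Golden Template."""
    key = normalize(question)
    best: Optional[GoldenTemplate] = max(
        (t for t in templates if t.pattern == key),
        key=lambda t: t.confidence,
        default=None,
    )
    if best is not None and best.confidence >= FAST_PATH_THRESHOLD:
        return {"source": "golden_template", "query": best.query,
                "confidence": best.confidence}
    # No confident template: fall back to AI interpretation.
    return {"source": "llm", "query": interpret_with_llm(question),
            "confidence": None}
```

Either way the result would still go through validation and user review before execution, as the remaining stages describe.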
Query Planner (DAG Engine)
The Query Planner builds directed acyclic graphs (DAGs) for multi-source query execution:
- UNS Resolution: Maps NQE fields to physical data source paths via the Unified Namespace
- DAG Construction: Groups resolved fields by source, creates parallel execution nodes
- Partial Execution: Returns results from available sources immediately, updates when slower sources respond
- Formula Evaluation: DuckDB compute nodes evaluate cross-source formulas within the DAG
- Plan Caching: Redis-backed cache with SHA-256 topology hashing for instant repeat queries
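To make DAG construction and plan caching more concrete, the sketch below groups resolved fields by source and derives a SHA-256 cache key from the resulting topology. The data shapes, the example paths, and the choice of canonical JSON as hash input are assumptions for illustration; Conduit's actual plan format is not described here.

```python
import hashlib
import json
from collections import defaultdict

def group_by_source(resolved: dict[str, tuple[str, str]]) -> dict[str, list[str]]:
    """Group physical paths by the source that serves them.

    `resolved` maps an NQE field name to (source, physical_path), i.e. the
    assumed output of UNS resolution. Each group can become one parallel node.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for source, path in resolved.values():
        groups[source].append(path)
    return {source: sorted(paths) for source, paths in groups.items()}

def topology_hash(groups: dict[str, list[str]]) -> str:
    """SHA-256 over a canonical JSON form, usable as a plan cache key."""
    canonical = json.dumps(groups, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Hypothetical resolution result for a two-source query.
resolved = {
    "tank1_temperature_history": ("splunk", "plant_a/tank1/temperature"),
    "tank1_temperature_now": ("mcp-iot-gateway", "plant_a/tank1/temperature"),
    "tank1_level_now": ("mcp-iot-gateway", "plant_a/tank1/level"),
}
plan_key = topology_hash(group_by_source(resolved))
```

Sorting keys and paths before hashing means two queries that resolve to the same sources and paths produce the same key regardless of field order, which is what lets a repeat query hit the cache.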
Pattern Detection Engine
Analyzes user corrections to improve future queries:
- Tracks query corrections and identifies recurring patterns
- Confidence scoring based on correction frequency and outcomes
- Auto-promotion pipeline promotes patterns to Golden Templates when thresholds are met
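The promotion rule can be sketched as a simple threshold check. The thresholds and the confidence formula below (accepted outcomes divided by occurrences) are hypothetical; the source only states that confidence depends on correction frequency and outcomes.

```python
from dataclasses import dataclass

MIN_OCCURRENCES = 5   # hypothetical: minimum times the correction must recur
MIN_CONFIDENCE = 0.9  # hypothetical: minimum share of successful outcomes

@dataclass
class CorrectionPattern:
    occurrences: int  # how often users made this same correction
    accepted: int     # how often the corrected query was then confirmed

    @property
    def confidence(self) -> float:
        """Share of occurrences that ended in an accepted query."""
        return self.accepted / self.occurrences if self.occurrences else 0.0

def should_promote(pattern: CorrectionPattern) -> bool:
    """True when the pattern is both frequent and reliable enough."""
    return (pattern.occurrences >= MIN_OCCURRENCES
            and pattern.confidence >= MIN_CONFIDENCE)
```

Requiring both conditions keeps a correction that was seen only a few times from becoming a Golden Template on the strength of a perfect but tiny sample.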
DuckDB Correlation Engine
In-memory analytical engine for cross-source data correlation:
- Time-window-based joins across different translators
- Anchor event correlation with context windows
- High-performance aggregation and analytics
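The idea behind anchor event correlation with context windows is shown below in plain Python rather than DuckDB SQL, so it demonstrates the join semantics only, not Conduit's implementation. The function name and the tuple layouts are assumptions.

```python
from bisect import bisect_left, bisect_right

def window_join(anchors: list[tuple[float, str]],
                readings: list[tuple[float, float]],
                before: float,
                after: float) -> list[tuple[str, list[float]]]:
    """Attach to each anchor event the readings inside its context window.

    anchors:  (timestamp, label) pairs, e.g. alarm events from one source.
    readings: (timestamp, value) pairs from another source, sorted by time.
    The window is inclusive: [timestamp - before, timestamp + after].
    """
    times = [t for t, _ in readings]
    joined = []
    for t, label in anchors:
        lo = bisect_left(times, t - before)   # first reading inside the window
        hi = bisect_right(times, t + after)   # one past the last reading inside
        joined.append((label, [value for _, value in readings[lo:hi]]))
    return joined

# Temperature readings every 30 s and one alarm at t = 60 s.
readings = [(0, 20.0), (30, 21.0), (60, 22.0), (90, 23.0)]
context = window_join([(60, "alarm")], readings, before=30, after=0)
# context == [("alarm", [21.0, 22.0])]
```

In an analytical engine the same result would typically be expressed as a join with a range condition on the timestamp columns.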
Translators
Translators are standalone containers that connect to your data sources and handle protocol translation. Each translator publishes real-time values to NATS and exposes an MCP server for queries.
Production Translators
| Translator | Protocol | Status | Use Case |
| --------------- | ---------- | ---------- | ------------------------------- |
| Splunk | REST/SPL | Production | Log analytics, SIEM data |
| MQTT | mqtt.js v5 | Production | IoT telemetry, Sparkplug B |
| OPC-UA | OPC-UA | Production | Industrial automation servers |
| MCP IoT Gateway | MCP | Production | Modbus, S7, EtherNet/IP, BACnet |
MCP IoT Gateway
The MCP IoT Gateway is a multi-protocol translator that uses the Model Context Protocol to dynamically connect to multiple industrial protocols:
- Modbus TCP/RTU: PLCs and field devices
- Siemens S7: S7-300/400/1200/1500 PLCs
- EtherNet/IP: Rockwell/Allen-Bradley controllers
- BACnet/IP: Building automation systems
Two Data Planes
Query Plane (MCP over HTTP)
Historical and analytical queries flow through MCP:
- tools/call: Execute NQE queries against translators
- tools/list: Discover capabilities across all connected translators
- Bulk export: Large result sets from Splunk, historians
- REST fallback: For environments where direct MCP isn't available
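MCP messages are JSON-RPC 2.0, so the tools/list and tools/call operations above reduce to small JSON envelopes. The sketch below builds those envelopes; the tool name `run_nqe_query` and its arguments are hypothetical placeholders, since the actual tools are whatever each translator advertises via tools/list.

```python
import json
from itertools import count
from typing import Optional

_request_ids = count(1)  # JSON-RPC request ids, unique per process

def mcp_request(method: str, params: Optional[dict] = None) -> str:
    """Serialize a JSON-RPC 2.0 request for an MCP server."""
    message = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message)

def tools_list() -> str:
    """Ask a translator which tools it exposes."""
    return mcp_request("tools/list")

def tools_call(name: str, arguments: dict) -> str:
    """Invoke one tool by name with its arguments."""
    return mcp_request("tools/call", {"name": name, "arguments": arguments})

# Hypothetical tool name and argument, for illustration only.
payload = tools_call("run_nqe_query", {"query": "average temperature for Tank 1"})
```

The serialized string would then be sent over whichever MCP transport the deployment uses, such as HTTP.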
Real-Time Plane (NATS)
Push-based real-time data flows through NATS:
- Sub-millisecond delivery: Tag values published and received in under 1ms
- Wildcard subscriptions: Subscribe to uns.PlantA.Line3.> for all tags on Line 3
- Edge-to-edge: Translators communicate directly through NATS without the control plane
- JetStream: Persistent streams for bulk export and replay after consumer disconnection
- WebSocket bridge: Portal dashboards receive NATS data via WebSocket gateway
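NATS subjects are dot-separated tokens, where `*` matches exactly one token and a trailing `>` matches one or more remaining tokens. The matcher below is a local sketch of those rules to show which subjects a subscription such as uns.PlantA.Line3.> would receive; it is not part of any NATS client library, and the tag names are made up.

```python
def subject_matches(pattern: str, subject: str) -> bool:
    """Return True if a NATS-style subscription pattern matches a subject."""
    pattern_tokens = pattern.split(".")
    subject_tokens = subject.split(".")
    for i, token in enumerate(pattern_tokens):
        if token == ">":
            # '>' is only valid as the last token and needs at least one
            # remaining subject token to match.
            return i == len(pattern_tokens) - 1 and len(subject_tokens) > i
        if i >= len(subject_tokens):
            return False  # subject is shorter than the pattern
        if token != "*" and token != subject_tokens[i]:
            return False  # literal token mismatch
    # Without '>', both sides must have the same number of tokens.
    return len(pattern_tokens) == len(subject_tokens)

line3 = "uns.PlantA.Line3.>"
on_line3 = subject_matches(line3, "uns.PlantA.Line3.Tank1.Temperature")  # True
on_line4 = subject_matches(line3, "uns.PlantA.Line4.Tank1.Temperature")  # False
```

Note that uns.PlantA.Line3.> does not match the bare subject uns.PlantA.Line3, because `>` requires at least one further token.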
Message Flow
1. User submits natural language query
2. Golden Template matching (fast path if match found)
3. AI interprets intent via configured LLM provider
4. User reviews and confirms the compiled query
5. Federation layer routes to target translators
6. Translators compile to native format (SPL, MQTT subscription, MCP call)
7. Results normalized and returned via MCP
8. DuckDB correlates cross-source results
9. Control Plane returns unified results to user
Security
All communication is secured with:
- TLS 1.3 encryption in transit
- JWT tokens for authentication
- API keys for translator identity
- mTLS for mesh node communication with automatic certificate rotation
- Health endpoint reports certificate days-until-expiry
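The days-until-expiry figure is a date difference against the certificate's notAfter timestamp. The sketch below shows that calculation; the payload field names and the 14-day warning threshold are hypothetical, and reading notAfter from the actual certificate is left out.

```python
from datetime import datetime, timezone
from typing import Optional

WARN_BELOW_DAYS = 14  # hypothetical threshold for flagging an upcoming expiry

def days_until_expiry(not_after: datetime,
                      now: Optional[datetime] = None) -> int:
    """Whole days remaining before the certificate's notAfter timestamp."""
    now = now or datetime.now(timezone.utc)
    return (not_after - now).days

def certificate_health(not_after: datetime,
                       now: Optional[datetime] = None) -> dict:
    """Build the certificate part of a health response (assumed field names)."""
    remaining = days_until_expiry(not_after, now)
    if remaining < 0:
        status = "expired"
    elif remaining < WARN_BELOW_DAYS:
        status = "expiring_soon"
    else:
        status = "ok"
    return {"certificate_days_until_expiry": remaining, "status": status}

# Fixed clock so the example is reproducible.
report = certificate_health(
    not_after=datetime(2025, 3, 1, tzinfo=timezone.utc),
    now=datetime(2025, 1, 30, tzinfo=timezone.utc),
)
# report == {"certificate_days_until_expiry": 30, "status": "ok"}
```

Exposing the number rather than only a status lets monitoring systems apply their own alert thresholds.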
Data Flow Example
Let's trace a query through the system:
Query: "Show average temperature for Tank 1 over the last hour"
1. NQE Query Compiler
   Input: "Show average temperature for Tank 1 over the last hour"
   Golden Template: Match found (confidence: 0.92)
   Output: SPL query + MCP IoT Gateway command
2. User Review
   User sees compiled query with confidence score
   User confirms or adjusts
3. Federation Planner
   Decision: Route to Splunk translator (historical data)
             + MCP IoT Gateway (current values via Modbus)
4. Parallel Execution
   Splunk: SPL query against indexed data → 60 data points
   MCP IoT Gateway: Modbus register read → current value
5. DuckDB Correlation
   Combine results, time-align, apply aggregation
   Return unified dataset with metadata
Technology Stack
| Component | Technology |
| ------------- | ----------------------------------------- |
| Control Plane | Fastify (Node.js) |
| Portal | React + Vite |
| Database | PostgreSQL 15+ with pgvector |
| Cache | Redis |
| Correlation | DuckDB (in-memory) |
| Search | Hybrid BM25 + pgvector semantic |
| Transport | NATS (real-time pub/sub) + MCP (queries) |
| AI | Multi-LLM (Claude, OpenAI, Azure, Ollama) |
Scaling
Horizontal Scaling
- Control Plane: Scale behind a load balancer for high query volumes
- Translators: Deploy multiple translators per source for redundancy
- PostgreSQL: Read replicas for query scaling
- Redis: Cluster mode for distributed caching
Vertical Scaling
- DuckDB: Increase memory for larger correlation datasets
- Translators: Tune connection pools for high-throughput sources
Next Steps
- NQE Guide - Learn the natural query language
- Translators - Learn about translator architecture
- Deployment - Production deployment guide