The Intent Blade: SLM Orchestration
The Intent Blade is the conversational intelligence layer of ANDARTIS. Rather than using fixed rule parsers, it runs a Small Language Model (SLM) locally on Apple Silicon to understand complex prompts, select tools, and format raw database records.
Technical Stack
- ML Library: Apple mlx-lm running in the persistent Python daemon.
- Model Target: High-performance, quantized 7B-class models (e.g., Mistral-7B-Instruct-v0.3-4bit) or ultra-lightweight 3B models.
- VRAM Persistence: The weights remain loaded in Apple Silicon's Unified Memory, so subsequent queries skip the model load and respond with minimal delay.
- Storage Path: Configured globally inside storage/models/global_slm/.
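The persistence described above can be sketched as a load-once cache inside the daemon. This is a minimal sketch, not the ANDARTIS implementation: the class name PersistentSLM and the injectable loader are hypothetical, and it assumes the mlx-lm package's load function, which returns a model and tokenizer pair.

```python
from typing import Any, Callable, Optional, Tuple

Loaded = Tuple[Any, Any]  # (model, tokenizer)


def mlx_loader(model_path: str) -> Loaded:
    """Load weights with mlx-lm; imported lazily so this module imports without it."""
    from mlx_lm import load  # assumption: mlx-lm is installed in the daemon's environment

    return load(model_path)


class PersistentSLM:
    """Hypothetical holder that loads the model once and reuses it across queries."""

    def __init__(self, model_path: str, loader: Callable[[str], Loaded] = mlx_loader):
        self._model_path = model_path
        self._loader = loader
        self._loaded: Optional[Loaded] = None

    def get(self) -> Loaded:
        # Only the first call pays the load cost; later calls reuse the cached pair.
        if self._loaded is None:
            self._loaded = self._loader(self._model_path)
        return self._loaded
```

Injecting the loader keeps the caching logic testable on machines without Apple Silicon; a daemon would presumably create one PersistentSLM at startup and call get() per request.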
Execution Phases
mermaid
flowchart TD
Query[User Query] --> Intent[Intent Parsing Phase]
Intent --> |JSON Schema Plan| Tools{Tool Orchestrator}
Tools --> |SQL Query| DB[(SQLite DB)]
Tools --> |Neural Search| Vector[Semantic Navigator]
DB --> Output[Raw JSON Data]
Vector --> Output
Output --> Synthesis[Grounded Synthesis Phase]
Synthesis --> Response[Conversational prose]
1. Intent Parsing
When you submit a prompt, the SLM interprets your intent and compiles a structured JSON execution plan containing specific tools and SQL filters.
Supported Actions
- hybrid_search: Combines a strict metadata SQL filter with semantic neural ranking. Recommended for scaling across 10,000+ files.
- entity_analytic: Used to run high-speed quantitative counting or database statistics.
- semantic_search: Used to surface abstract concepts across different files.
- aggregate_query: Processes map-reduce style extractions over large datasets.
- clarify: Prompted when a user query is ambiguous, generating a professional clarifying question.
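Because the plan is produced by a language model, it is worth checking before any tool runs. The following is a minimal sketch of such a check, assuming the plan arrives as a JSON string with an "action" key; the function name parse_plan and the defaulting of "parameters" are illustrative assumptions, not part of ANDARTIS.

```python
import json
from typing import Any, Dict

# Action names taken from the list above.
SUPPORTED_ACTIONS = {
    "hybrid_search",
    "entity_analytic",
    "semantic_search",
    "aggregate_query",
    "clarify",
}


def parse_plan(raw: str) -> Dict[str, Any]:
    """Parse the SLM's JSON output and reject plans with unknown actions."""
    try:
        plan = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"plan is not valid JSON: {exc}") from exc
    if not isinstance(plan, dict):
        raise ValueError("plan must be a JSON object")
    action = plan.get("action")
    if not isinstance(action, str) or action not in SUPPORTED_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    # Assumption: "parameters" is optional and defaults to an empty object.
    plan.setdefault("parameters", {})
    return plan
```

Rejecting unknown actions at this boundary means a malformed model output fails loudly instead of reaching the tool orchestrator.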
Grounded Synthesis (The Model Weaver)
To process massive contexts without exceeding local RAM limits, ANDARTIS uses the Model Weaver pipeline:
- Information Filtering (Micro): The Semantic Navigator or Entity Analytic executes the initial query. For a dataset of 10,000 documents, it filters the set down to approximately 50 candidates using metadata, then ranks the top 5 semantically.
- Context Compilation: The PHP Orchestrator retrieves the text of the top 5 chunks and structures them into a raw "Ground Truth Context".
- The Synthesis Pass: The context is fed to the Senior Analyst (SLM) with the synthesize action. The model is instructed to write the final response using only the provided facts, which grounds the answer in retrieved data and reduces the risk of hallucination.
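The filter, rank, and compile steps above can be illustrated with a small sketch. It is written in Python for illustration only (the source assigns context compilation to the PHP Orchestrator), and the document shape (text, meta, vector), the cosine ranking, and the function name weave_context are all assumptions.

```python
import math
from typing import Any, Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def weave_context(
    docs: List[Dict[str, Any]],
    query_vec: Sequence[float],
    metadata_filter: Dict[str, Any],
    top_k: int = 5,
) -> str:
    # Step 1: cheap metadata filter shrinks the candidate set.
    candidates = [
        d for d in docs
        if all(d["meta"].get(k) == v for k, v in metadata_filter.items())
    ]
    # Step 2: rank the survivors semantically and keep the top_k.
    ranked = sorted(
        candidates,
        key=lambda d: cosine(d["vector"], query_vec),
        reverse=True,
    )[:top_k]
    # Step 3: compile a numbered context block for the synthesis prompt.
    return "\n\n".join(f"[{i}] {d['text']}" for i, d in enumerate(ranked, start=1))
```

Filtering before ranking keeps the similarity computation limited to the small candidate set, which is what bounds memory use regardless of corpus size.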
Sample Interaction Flow
Below is a trace of the system processing the query “Show me patient summaries from March”:
- Input: User sends the query via the Vue UI.
- Parsing: The Intent Blade parses the query and returns a structured JSON payload:

      {
        "action": "aggregate_query",
        "query": "SELECT summary, date FROM clinical_records WHERE date LIKE '2026-03-%'",
        "parameters": {}
      }

- Retrieval: The Laravel core runs the SQL query against the SQLite database in milliseconds, retrieving matching rows.
- Packaging: The core packages the raw results and forwards them to the Python synthesis pipeline.
- Synthesis: The SLM receives the JSON array and compiles it into a bulleted summary.
- Output: The user sees a formatted response in the desktop window.
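The retrieval and synthesis steps of this trace can be reproduced in a small, self-contained sketch. It uses an in-memory SQLite database with made-up sample rows, and synthesize_stub is a hypothetical stand-in for the SLM pass; in ANDARTIS the query is run by the Laravel core, not by Python.

```python
import sqlite3
from typing import Any, Dict, List

# The plan from the trace above.
PLAN = {
    "action": "aggregate_query",
    "query": "SELECT summary, date FROM clinical_records WHERE date LIKE '2026-03-%'",
    "parameters": {},
}


def run_plan(conn: sqlite3.Connection, plan: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Run the plan's SQL and return rows as plain dicts (the 'raw JSON data')."""
    conn.row_factory = sqlite3.Row
    return [dict(row) for row in conn.execute(plan["query"]).fetchall()]


def synthesize_stub(rows: List[Dict[str, Any]]) -> str:
    """Stand-in for the SLM synthesis pass: formats rows as a bulleted summary."""
    return "\n".join(f"- {r['date']}: {r['summary']}" for r in rows)


# Made-up sample data for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clinical_records (summary TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO clinical_records VALUES (?, ?)",
    [("Routine check-up", "2026-03-04"), ("Follow-up visit", "2026-04-11")],
)

rows = run_plan(conn, PLAN)
print(synthesize_stub(rows))  # prints "- 2026-03-04: Routine check-up"
```

The source does not say how generated SQL is vetted before execution; a read-only connection or a statement allow-list would be a reasonable precaution in a real deployment.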