
The Intent Blade: SLM Orchestration

The Intent Blade is the conversational intelligence layer of ANDARTIS. Rather than using fixed rule parsers, it runs a Small Language Model (SLM) locally on Apple Silicon to understand complex prompts, select tools, and format raw database records.


Technical Stack

  • ML Library: Apple mlx-lm running in the persistent Python daemon.
  • Model Target: High-performance, quantized 7B-class models (e.g., Mistral-7B-Instruct-v0.3-4bit) or ultra-lightweight 3B models.
  • Memory Persistence: Apple Silicon has no separate VRAM, so the weights stay loaded in Unified Memory for the lifetime of the daemon. Subsequent queries skip the model-load step and pay only for inference.
  • Storage Path: Configured globally inside storage/models/global_slm/.
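The persistence behaviour described above can be sketched as a load-once wrapper inside the daemon. This is a minimal illustration, not the actual ANDARTIS code: the class name `ResidentModel`, the `load_count` counter, and the injectable `loader` argument are assumptions made for the example. The default loader calls `mlx_lm.load`, which returns a `(model, tokenizer)` pair.

```python
from typing import Any, Callable, Optional, Tuple

# The source names only this directory; the exact model subfolder is not specified.
MODEL_PATH = "storage/models/global_slm/"


def _load_with_mlx(path: str) -> Tuple[Any, Any]:
    # Imported lazily so this module can be imported without mlx-lm installed.
    from mlx_lm import load
    return load(path)


class ResidentModel:
    """Hypothetical helper: loads the weights on first use, then reuses them."""

    def __init__(
        self,
        path: str = MODEL_PATH,
        loader: Callable[[str], Tuple[Any, Any]] = _load_with_mlx,
    ) -> None:
        self._path = path
        self._loader = loader
        self._loaded: Optional[Tuple[Any, Any]] = None
        self.load_count = 0  # how many times the weights were actually loaded

    def get(self) -> Tuple[Any, Any]:
        # Only the first call pays the load cost; later calls return the cached pair.
        if self._loaded is None:
            self._loaded = self._loader(self._path)
            self.load_count += 1
        return self._loaded
```

Because the daemon process is persistent, a single `ResidentModel` instance created at startup would serve every request, which is what keeps the weights resident between queries.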

Execution Phases

mermaid
flowchart TD
    Query[User Query] --> Intent[Intent Parsing Phase]
    Intent --> |JSON Schema Plan| Tools{Tool Orchestrator}
    Tools --> |SQL Query| DB[(SQLite DB)]
    Tools --> |Neural Search| Vector[Semantic Navigator]
    DB --> Output[Raw JSON Data]
    Vector --> Output
    Output --> Synthesis[Grounded Synthesis Phase]
    Synthesis --> Response[Conversational prose]

1. Intent Parsing

When you submit a prompt, the SLM interprets your intent and compiles a structured JSON execution plan containing specific tools and SQL filters.

Supported Actions

  • hybrid_search: Combines a strict metadata SQL filter with semantic neural ranking. Recommended for scaling across 10,000+ files.
  • entity_analytic: Runs counts and other aggregate statistics directly against the database.
  • semantic_search: Surfaces abstract concepts that span multiple files.
  • aggregate_query: Processes map-reduce style extractions over large datasets.
  • clarify: Returned when the user query is ambiguous. The SLM generates a clarifying question instead of running a tool.
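Since the plan is produced by a language model, the orchestrator would plausibly validate it before dispatching any tool. The sketch below checks the `action` field against the five actions listed above. The function name `parse_plan` and the choice to default `parameters` to an empty object are assumptions for illustration; the source does not describe how validation is actually done.

```python
import json

# The five actions listed in this section.
SUPPORTED_ACTIONS = {
    "hybrid_search",
    "entity_analytic",
    "semantic_search",
    "aggregate_query",
    "clarify",
}


def parse_plan(raw: str) -> dict:
    """Parse the SLM's JSON output and reject plans with unknown actions."""
    plan = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(plan, dict):
        raise ValueError("plan must be a JSON object")
    action = plan.get("action")
    if action not in SUPPORTED_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    # Assumption: a missing "parameters" field is treated as empty.
    plan.setdefault("parameters", {})
    return plan
```

Rejecting unknown actions up front means a malformed or unexpected model output fails loudly rather than reaching the database or the vector index.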

2. Grounded Synthesis (The Model Weaver)

To process massive contexts without exceeding local RAM limits, ANDARTIS uses the Model Weaver pipeline:

  1. Information Filtering (Micro): The Semantic Navigator or Entity Analytic executes the initial query. For a dataset of 10,000 documents, it filters the set down to approximately 50 candidates using metadata, then ranks the top 5 semantically.
  2. Context Compilation: The PHP Orchestrator retrieves the text of the top 5 chunks and structures them into a raw "Ground Truth Context".
  3. The Synthesis Pass: The context is fed to the Senior Analyst (SLM) with the synthesize action. The model is instructed to write the final response using only the provided facts, which keeps the answer grounded in retrieved data and reduces the risk of hallucination.
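The filter, rank, and compile steps above can be sketched in a few lines. This is an illustration under stated assumptions, not the Model Weaver itself: documents are plain dictionaries with hypothetical `text` and `embedding` keys, ranking uses cosine similarity (the source does not name the similarity measure), and the function name `weave_context` is invented for the example. The defaults of 50 candidates and 5 top chunks mirror the figures quoted above.

```python
import math
from typing import Any, Callable, Dict, List, Sequence

Doc = Dict[str, Any]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def weave_context(
    docs: List[Doc],
    matches_metadata: Callable[[Doc], bool],
    query_vec: Sequence[float],
    max_candidates: int = 50,
    top_k: int = 5,
) -> str:
    # Step 1a: cheap metadata filter narrows the full set to a candidate pool.
    candidates = [d for d in docs if matches_metadata(d)][:max_candidates]
    # Step 1b: semantic ranking over the small pool only.
    ranked = sorted(
        candidates,
        key=lambda d: cosine(d["embedding"], query_vec),
        reverse=True,
    )
    # Step 2: compile the top chunks into one numbered context string.
    top = ranked[:top_k]
    return "\n\n".join(f"[{i}] {d['text']}" for i, d in enumerate(top, start=1))
```

The point of the ordering is cost: the metadata filter touches every record but is cheap, while the similarity computation only runs on the candidates that survive it, so the context handed to the SLM stays small regardless of corpus size.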

Sample Interaction Flow

Below is a trace of the system processing the query: “Show me patient summaries from March”

  1. Input: User sends the query via the Vue UI.
  2. Parsing: The Intent Blade parses the query and returns a structured JSON payload:
    json
    {
      "action": "aggregate_query",
      "query": "SELECT summary, date FROM clinical_records WHERE date LIKE '2026-03-%'",
      "parameters": {}
    }
  3. Retrieval: The Laravel core runs the SQL query against the SQLite database in milliseconds, retrieving matching rows.
  4. Packaging: The core packages the raw results and forwards them to the Python synthesis pipeline.
  5. Synthesis: The SLM receives the JSON array and compiles it into a bulleted summary.
  6. Output: The user sees a formatted response in the desktop window.
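Steps 2 to 4 of the trace can be reproduced against an in-memory SQLite database. In ANDARTIS the query is run by the Laravel core; Python is used here only to keep the example self-contained. The table layout and the three sample rows are invented for the demonstration, and `run_plan` and `demo` are hypothetical helper names. The plan itself is copied from the trace.

```python
import sqlite3
from typing import Any, Dict, List

# The structured plan from step 2 of the trace above.
PLAN = {
    "action": "aggregate_query",
    "query": "SELECT summary, date FROM clinical_records WHERE date LIKE '2026-03-%'",
    "parameters": {},
}


def run_plan(conn: sqlite3.Connection, plan: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Execute the plan's SQL and return rows as JSON-friendly dictionaries."""
    conn.row_factory = sqlite3.Row  # lets each row be converted with dict(row)
    rows = conn.execute(plan["query"]).fetchall()
    return [dict(row) for row in rows]


def demo() -> List[Dict[str, Any]]:
    conn = sqlite3.connect(":memory:")
    # Made-up schema and data, only so the query has something to match.
    conn.execute("CREATE TABLE clinical_records (summary TEXT, date TEXT)")
    conn.executemany(
        "INSERT INTO clinical_records VALUES (?, ?)",
        [
            ("Initial consult", "2026-02-27"),
            ("Follow-up visit", "2026-03-14"),
            ("Discharge note", "2026-04-02"),
        ],
    )
    try:
        return run_plan(conn, PLAN)
    finally:
        conn.close()
```

Calling `demo()` returns only the March row, as a list of dictionaries ready to be serialized and forwarded to the synthesis pipeline. A production path would presumably also verify that a model-generated query is read-only before executing it; the source does not describe that check.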

Released in Alpha