
Multi-Agent Orchestration Architecture

We provide a detailed explanation of the multi-agent orchestration system that powers almost all analysis in Aliveo AI. The architecture is designed to address complex analysis questions by decomposing them into smaller tasks, generating and validating code, executing data-driven workflows, and then combining the results into a final, coherent answer. A high-level overview of the architecture is shown below:

[Figure: Multi-Agent Orchestration Architecture (high-level overview)]

The system is composed of:

  • Multi-Agent Sub-Engines (e.g., Plan Generator, Code Engine)
  • Fine-Tuned LLMs with RAG-Based Prompts (e.g., Answer Validator, Chat History Generator)
  • Helper LLMs (e.g., Evaluator, Example Generator, DKG Traversal)
  • Data and Knowledge Repositories (VectorDB, relational/NoSQL databases, and a Data Knowledge Graph)

This layered approach ensures that each step—planning, code generation, execution, and validation—is handled by the most appropriate specialized component.


High-level Flow

When a user asks a complex question, the Plan Generator first outlines a sequence of subtasks and pulls relevant examples from VectorDB or the DKG. Each subtask is then sent to the Code Engine, where code is generated, compiled, and executed. If errors occur, the Error Handling module attempts automated fixes; once a step succeeds, a Per-Answer Validator reviews its result for consistency. All validated subtask outputs are assembled into one Combined Answer, which the Answer Validator inspects for logical coherence before returning a final, polished response. Helper LLMs (Evaluator, Example Generator, and DKG Traversal) are invoked along the way to provide extra context, prompt examples, data details, or lineage verification.
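
The control flow can be pictured with a short sketch. The names below (`orchestrate`, `exec_subtask`, and the callable arguments) are illustrative stand-ins for the sub-engines described above, not the actual Aliveo AI interfaces:

```python
from dataclasses import dataclass


@dataclass
class SubtaskResult:
    subtask: str
    output: object
    valid: bool


def exec_subtask(code: str):
    """Hypothetical execution of one generated snippet; the real Code Engine
    runs generated code in an isolated environment."""
    namespace = {}
    exec(code, namespace)
    return namespace.get("result")


def orchestrate(question, plan_generator, code_engine, error_handler,
                per_answer_validator, answer_validator):
    """Illustrative control loop; every argument is a callable standing in
    for one of the sub-engines described above."""
    plan = plan_generator(question)                  # ordered subtask descriptions
    results = []
    for subtask in plan:
        code = code_engine(subtask, context=results)
        try:
            output = exec_subtask(code)
        except Exception as err:                     # automated fix via Error Handling
            code = error_handler(code, err)
            output = exec_subtask(code)
        results.append(SubtaskResult(subtask, output,
                                     per_answer_validator(subtask, output)))
    combined = [r for r in results if r.valid]       # the Combined Answer
    return answer_validator(question, combined)      # final coherence check
```

In practice each of these callables is one of the fine-tuned or helper LLM modules described below, and generated code runs in a sandbox rather than through a bare `exec`.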

By breaking a complex request into smaller, specialized steps—each handled by the sub-engine best suited for that job—this architecture enables end-to-end automation, maintains high accuracy, and produces explainable, data-driven answers in a structured, LLM-guided workflow.

Core Components

Plan Generator

The Plan Generator first retrieves relevant workflows from a Plan Library, which contains both positive examples (successful analyses) and negative examples (common pitfalls). By embedding the user’s query and performing a nearest-neighbor search, it selects patterns that have worked before and avoids known mistakes.
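A minimal sketch of that retrieval step, assuming a precomputed query embedding and a simple in-memory plan library (the production lookup runs against VectorDB, and the schema shown here is hypothetical):

```python
import numpy as np


def retrieve_plans(query_embedding, plan_library, k=3):
    """Return the k most similar positive and negative plan examples.

    plan_library: list of dicts such as
        {"embedding": np.ndarray, "steps": [...], "label": "positive" | "negative"}
    """
    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    ranked = sorted(plan_library,
                    key=lambda p: cosine(query_embedding, p["embedding"]),
                    reverse=True)
    positives = [p for p in ranked if p["label"] == "positive"][:k]
    negatives = [p for p in ranked if p["label"] == "negative"][:k]
    return positives, negatives
```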

Next, it checks a UDF Library of reusable code snippets—such as functions to compute churn rates or normalize features—picking only those relevant to the question’s intent. This way, when the Code Engine starts writing code, it already has a small set of recommended helper functions.
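One plausible way to narrow the UDF Library to the question's intent is a simple tag match. The registry and tags below are hypothetical and only illustrate the idea:

```python
# Hypothetical UDF registry keyed by intent tags; in practice this metadata
# would live alongside the UDF Library itself.
UDF_LIBRARY = {
    "compute_churn_rate": {"tags": {"churn", "retention"},
                           "signature": "compute_churn_rate(df)"},
    "normalize_features": {"tags": {"scaling", "normalization"},
                           "signature": "normalize_features(df, cols)"},
}


def select_udfs(question_intents: set) -> dict:
    """Pick only the UDFs whose tags overlap the intents detected in the question."""
    return {name: meta for name, meta in UDF_LIBRARY.items()
            if meta["tags"] & question_intents}


# e.g. select_udfs({"churn", "monthly"}) -> {"compute_churn_rate": {...}}
```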

Before moving forward, the Plan Generator performs a quick Plan Validation. A reasoning LLM inspects the proposed sequence of steps to ensure logical consistency, and lightweight code samples are executed to confirm basic assumptions (for instance, that a “compute churn flag” function actually yields 0/1 values). Only after both checks pass does the plan move on to the Code Engine.
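The "lightweight code samples" half of Plan Validation can be pictured as a smoke test on toy data. The helper names below (`validate_churn_flag`, `compute_churn_flag`) are hypothetical stand-ins:

```python
import pandas as pd


def validate_churn_flag(compute_churn_flag) -> bool:
    """Smoke-test a 'compute churn flag' step on toy data:
    the resulting flag column should contain only 0/1 values."""
    sample = pd.DataFrame({
        "customer_id": [1, 2, 3],
        "days_since_last_order": [5, 120, 400],
    })
    flagged = compute_churn_flag(sample)
    return set(flagged["churn_flag"].unique()) <= {0, 1}


# Stand-in implementation, for illustration only:
def compute_churn_flag(df: pd.DataFrame, threshold_days: int = 90) -> pd.DataFrame:
    out = df.copy()
    out["churn_flag"] = (out["days_since_last_order"] > threshold_days).astype(int)
    return out


assert validate_churn_flag(compute_churn_flag)
```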

Code Engine

Code generation follows the structure of the validated plan, producing one snippet per subtask in the order specified. Each snippet is generated by a fine-tuned LLM that takes as input the subtask description and any relevant chat context or UDF references. Once generated, the code is linted and tested against a sample dataset to catch syntax errors or missing imports before full execution. Execution proceeds step by step, ensuring that intermediate outputs feed correctly into subsequent steps.
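A rough sketch of this lint-then-execute discipline, assuming snippets share a single namespace so intermediate outputs flow into later steps (the real Code Engine's checks and sandboxing are richer than this):

```python
import ast


def lint_snippet(code: str) -> list[str]:
    """Lightweight stand-in for the pre-execution checks: catch syntax errors
    and a couple of obviously missing imports."""
    try:
        tree = ast.parse(code)
    except SyntaxError as err:
        return [f"syntax error: {err}"]
    problems = []
    aliases = {a.asname or a.name
               for node in ast.walk(tree) if isinstance(node, ast.Import)
               for a in node.names}
    used = {n.id for n in ast.walk(tree) if isinstance(n, ast.Name)}
    for short, full in {"pd": "pandas", "np": "numpy"}.items():
        if short in used and short not in aliases:
            problems.append(f"'{short}' is used but '{full}' is never imported")
    return problems


def run_steps(snippets: list[str], sample_data) -> dict:
    """Execute one snippet per subtask in plan order; each snippet reads and
    writes a shared namespace so intermediate outputs feed later steps."""
    namespace = {"data": sample_data}
    for i, code in enumerate(snippets):
        issues = lint_snippet(code)
        if issues:
            raise ValueError(f"step {i} failed lint: {issues}")
        exec(code, namespace)   # hypothetical; real execution is isolated
    return namespace
```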

If a step fails (such as a filtering operation that leaves an empty DataFrame), the Error Handling module consults a Common Error Library built from past auto-executions and an expanding test suite. This library contains patterns for frequent issues (e.g., missing column names, overly restrictive filters, or deprecated function calls). In the case of an empty DataFrame after filtering, the engine might run a quick diagnostic to compare the filter criteria against schema metadata and suggest a relaxed filter or a fallback behavior. Once an error is automatically corrected or flagged for a minor manual adjustment, execution resumes; when all subtasks succeed, their outputs are validated before being added to the Combined Answer.
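As an illustration of the empty-DataFrame pattern, a diagnostic drawn from the Common Error Library might compare the offending filter against the schema and the observed values. The function below is a hypothetical sketch:

```python
import pandas as pd


def diagnose_empty_filter(df: pd.DataFrame, column: str, value) -> str:
    """Hypothetical diagnostic for the 'empty DataFrame after filtering' pattern."""
    if column not in df.columns:
        close = [c for c in df.columns
                 if column.lower() in c.lower() or c.lower() in column.lower()]
        return (f"column '{column}' not found; similar columns: {close}"
                if close else f"column '{column}' not found in schema")
    if value not in set(df[column].unique()):
        return (f"no rows where {column} == {value!r}; "
                f"observed values include {list(df[column].unique())[:5]}")
    return "filter looks valid; check upstream steps"


# Usage: an empty result after df[df["region"] == "EMEA"] might return
# "no rows where region == 'EMEA'; observed values include ['emea', 'amer', ...]"
```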

Answer Validator

The Answer Validator reviews the combined results, applying UDF-based checks together with the plan and code validators to minimize hallucinations and ensure factual accuracy. It verifies that each insight is grounded in data, cross-checking numbers, labels, and narrative text against the executed outputs. For example, when explaining why spend metrics have dropped, it ensures the statement “Spends are down because of a budget change that was done last week” is directly supported by transaction and budget data rather than inferred without evidence.
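A toy version of that grounding check for the spend example might look like the following, assuming hypothetical `spend` and `budget` tables with `date`, `amount`, and `changed_at` columns:

```python
import pandas as pd


def claim_is_grounded(spend: pd.DataFrame, budget: pd.DataFrame,
                      change_date: pd.Timestamp) -> bool:
    """Toy grounding check: for the claim to stand, a budget change must exist
    on the stated date, and average spend after it must actually be lower."""
    budget_changed = (budget["changed_at"] == change_date).any()
    before = spend.loc[spend["date"] < change_date, "amount"].mean()
    after = spend.loc[spend["date"] >= change_date, "amount"].mean()
    return bool(budget_changed and after < before)
```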

For non-technical users, the Answer Validator frames findings in clear, business-friendly language. It highlights key takeaways—such as budget adjustments impacting spend—while avoiding jargon. If any inconsistencies emerge (e.g., conflicting risk factors or unsupported conclusions), it signals the Plan Generator to revisit specific steps. Once checks pass, it produces a concise, polished response that combines data tables, charts, and plain‐language explanations.

Chat History Generator

Throughout the process, a Chat History Generator updates and formats conversational context. Whenever the Code Engine needs to generate or refine code, it receives the entire session history (including prior instructions, examples, or user clarifications). This ensures consistency—variable names remain the same, previously agreed-upon thresholds or data‐cleaning steps are preserved, and the LLM never “forgets” earlier decisions.
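A minimal sketch of how session history could be rendered into the prompt context, with a crude length budget standing in for real token counting (the function and its trimming rule are assumptions, not the actual implementation):

```python
def format_chat_history(turns: list[dict], max_tokens: int = 2000) -> str:
    """Render prior turns (instructions, clarifications, agreed thresholds)
    into the context block prepended to Code Engine prompts, dropping the
    oldest turns first when the budget is exceeded."""
    lines = [f"{t['role']}: {t['content']}" for t in turns]
    while lines and sum(len(line) // 4 for line in lines) > max_tokens:  # rough token estimate
        lines.pop(0)
    return "\n".join(lines)


history = [
    {"role": "user", "content": "Use a 90-day window to define churn."},
    {"role": "assistant", "content": "Noted: churn_flag = inactive for more than 90 days."},
]
context_block = format_chat_history(history)
```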

Helper LLMs

A set of smaller LLM-based modules sits above the main flow to handle focused tasks. The Evaluator compares alternative subtask outputs (for example, two different clustering methods) and suggests the best choice. The Example Generator creates synthetic or toy datasets on demand, such as a small DataFrame illustrating how k-means clustering works. The DKG Traversal module navigates the Data Knowledge Graph to reveal entity relationships, data lineage, or semantic metadata (for instance, showing which raw logs feed into a reporting table). The Plan Generator and Code Engine consult these helpers whenever they need a quick second opinion, a sample dataset, or lineage details that inform query logic.
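For instance, the Example Generator's output for the k-means case could resemble the toy dataset below; the function is an illustrative sketch, not the module's real interface:

```python
import numpy as np
import pandas as pd


def toy_kmeans_dataset(n_per_cluster: int = 20, seed: int = 0) -> pd.DataFrame:
    """The kind of synthetic dataset the Example Generator might return when a
    prompt needs to illustrate k-means: three well-separated 2-D clusters."""
    rng = np.random.default_rng(seed)
    centers = np.array([[0, 0], [5, 5], [0, 5]])
    points = np.vstack([rng.normal(c, 0.5, size=(n_per_cluster, 2)) for c in centers])
    labels = np.repeat([0, 1, 2], n_per_cluster)
    return pd.DataFrame({"x": points[:, 0], "y": points[:, 1], "cluster": labels})
```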

Data Sources & Knowledge Repositories

Three main repositories back the orchestration pipeline. A VectorDB stores high‐dimensional embeddings of code snippets, documentation, and past question-answer pairs. When an LLM needs a relevant example or template, it performs a nearest-neighbor lookup in VectorDB. A relational/NoSQL database holds all raw datasets (customer tables, transaction logs, time-series records), which the Code Engine reads or writes during execution. Finally, a Data Knowledge Graph (hosted in a graph database) captures semantic relationships—how “Customer” connects to “Order,” what fields each table contains, and data quality rules. Whenever a workflow touches the data layer, the system can query the DKG to confirm that joins are correct, attributes are named properly, and lineage is maintained.
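A toy Data Knowledge Graph built with `networkx` illustrates the kinds of lookups described here (lineage and join verification). The node names, edge attributes, and helper functions are assumptions for illustration; the production DKG lives in a graph database:

```python
import networkx as nx

# Toy DKG: nodes are tables/entities, edges carry semantic relationships.
dkg = nx.DiGraph()
dkg.add_edge("raw_clickstream", "sessions", relation="feeds")
dkg.add_edge("sessions", "daily_report", relation="feeds")
dkg.add_edge("Customer", "Order", relation="places", join_key="customer_id")


def lineage(table: str) -> set:
    """All upstream sources that feed a table."""
    return nx.ancestors(dkg, table)


def join_key(left: str, right: str):
    """Confirm a join is backed by a declared relationship in the DKG."""
    data = dkg.get_edge_data(left, right) or dkg.get_edge_data(right, left) or {}
    return data.get("join_key")


# lineage("daily_report") -> {"raw_clickstream", "sessions"}
# join_key("Customer", "Order") -> "customer_id"
```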