Beyond Keyword Search

Traditional Retrieval-Augmented Generation (RAG) systems find relevant text chunks, but they don't understand the language within them. We propose a Linguistic Query Engine that models language structure to enable far more precise, meaningful, and powerful queries.

Traditional RAG

User Query

"Did the project fail?"

Semantic Search

Finds vectors close to the query.

Retrieved Text Chunks

"The project launch was a success..."
"...initial failure was overcome..."
"...a project to mitigate failure..."

This approach often retrieves documents with matching keywords but conflicting or irrelevant meanings, leading to inaccurate answers.

Linguistic Query Engine

User Query

"Did the project fail?"

Linguistic Parsing

Analyzes grammar, roles, and context.

Structured Query on Model

SELECT * WHERE
Participant = 'project'
Process = 'fail'
Polarity = 'positive'

By querying a structured model of language, we can precisely target concepts and their relationships, ignoring superficial keyword matches.
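Such a structured query can be sketched in a few lines of Python. The schema below is illustrative only: the field names (participant, process, polarity) mirror the query shown above and are assumptions, not the engine's actual data model.

```python
from dataclasses import dataclass

@dataclass
class Proposition:
    participant: str
    process: str
    polarity: str  # 'positive' (asserted) or 'negative' (negated)

# Toy knowledge base distilled from the retrieved chunks above.
propositions = [
    Proposition("project", "succeed", "positive"),
    Proposition("project", "fail", "negative"),   # "initial failure was overcome"
    Proposition("project", "fail", "positive"),   # an asserted failure
]

def query(participant: str, process: str, polarity: str):
    """Return only propositions matching all three fields exactly."""
    return [p for p in propositions
            if p.participant == participant
            and p.process == process
            and p.polarity == polarity]

matches = query("project", "fail", "positive")
# Keyword search would surface all three chunks; the structured
# query keeps only the one asserted failure.
```

Note how polarity filtering discards the negated mention ("failure was overcome") that a similarity search would happily return.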

A Dual-Model Architecture

The engine's power comes from integrating two linguistic theories: Universal Grammar (UG) for timeless, stable structure, and Systemic Functional Linguistics (SFL) for dynamic, contextual meaning.

Firmware (The Deep Structure - UG)

Represents the core, abstract knowledge. It's the stable "hardware" of meaning, defining concepts and their relationships independent of how they are expressed.

Proposition

The core unit of knowledge: a timeless, abstract claim linking concepts (e.g., 'Acme Corp' -> 'acquire' -> 'Innovate Inc').

Concept

The atomic entities (nouns, ideas) that build propositions. The building blocks of the ontology.

Software (The Contextual Meaning - SFL)

Represents a single, concrete communicative act. It's the dynamic "software" that runs on the hardware, capturing who is saying what to whom, and why.

Discourse Event

The central hub for context. Represents a specific utterance like an email, a sentence in a report, etc.

Interpersonal Stance

Captures the "Tenor" - the relationship, attitude, and certainty of the speaker.

The Critical Bridge: Text Chunk

The `TextChunk` model is the linchpin that connects the two worlds. It links a specific piece of text (e.g., "The company announced a layoff") to both its dynamic SFL context (who announced it, when, with what certainty) and the timeless UG propositions it represents (company -> layoff).
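The bridging role of `TextChunk` can be sketched with simple dataclasses. All class and field names below are illustrative assumptions, not the engine's real schema; the point is the shape of the links, not the names.

```python
from dataclasses import dataclass, field

@dataclass
class Proposition:
    """UG side: a timeless, abstract claim."""
    subject: str
    relation: str
    obj: str

@dataclass
class DiscourseEvent:
    """SFL side: one concrete utterance with its context."""
    source: str      # e.g. an email, a sentence in a report
    speaker: str
    certainty: str   # interpersonal stance ("Tenor")

@dataclass
class TextChunk:
    """The bridge: one piece of text linked to both models."""
    text: str
    event: DiscourseEvent
    propositions: list = field(default_factory=list)

chunk = TextChunk(
    text="The company announced a layoff",
    event=DiscourseEvent(source="press release",
                         speaker="company", certainty="high"),
    propositions=[Proposition("company", "announce", "layoff")],
)
```

A query can now traverse from a literal string to either the dynamic SFL context (who said it, with what certainty) or the timeless UG claim it encodes.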

Hybrid Processing Pipeline

Raw text is transformed into the structured dual-model through a five-stage automated pipeline, combining traditional computational linguistics with modern LLM capabilities.
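The flow of such a pipeline can be sketched as five chained functions. The source does not enumerate the stages, so the stage names and the toy heuristics below are assumptions; each real stage (parsing, role labeling, LLM enrichment) would be far more sophisticated.

```python
def segment(text: str) -> list:
    """Stage 1 (assumed): split raw text into sentence chunks."""
    return [s.strip() for s in text.split(".") if s.strip()]

def parse(chunk: str) -> list:
    """Stage 2 (assumed): grammatical parse, stubbed as tokenization."""
    return chunk.split()

def label_roles(tokens: list) -> dict:
    """Stage 3 (assumed): toy participant/process labeling."""
    content = [t.lower() for t in tokens if t.lower() != "the"]
    return {"participant": content[0], "process": content[-1]}

def enrich(roles: dict) -> dict:
    """Stage 4 (assumed): LLM enrichment, stubbed with a default polarity."""
    return {**roles, "polarity": "positive"}

def store(records) -> list:
    """Stage 5 (assumed): persist structured records, stubbed as a list."""
    return list(records)

def pipeline(text: str) -> list:
    return store(enrich(label_roles(parse(c))) for c in segment(text))

rows = pipeline("The project failed. The launch succeeded.")
```

Each stage consumes the previous stage's output, so any single stage (say, the stubbed LLM enrichment) can be swapped for a real implementation without touching the rest.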

Integrated Database Schema

The architecture is realized as a set of interconnected database tables. Explore the models below to see how linguistic concepts are stored.

Unlocking Powerful Queries

This structured approach enables queries that are impossible for systems based on semantic similarity alone. See the difference in specificity and power.

Traditional RAG Query

Linguistic Engine Query