Modeling Language as Data

This application explores a database architecture for Natural Language Processing. It translates two linguistic theories, Universal Grammar (UG) and Systemic Functional Linguistics (SFL), into a concrete, integrated ActiveRecord schema designed for contextual Retrieval-Augmented Generation (RAG) systems.

Core Linguistic Theories

The architecture models language on two axes: its formal, hierarchical structure (the "what") and its functional, contextual meaning (the "why" and "how").

Universal Grammar (UG)

UG provides a blueprint for the formal syntactic structure of text. It lets us model the underlying grammatical rules (Deep Structure) in a canonical form, independent of surface-level variations.

  • Focus: Formal Syntax & Structure
  • Key Concept: X-bar theory models phrasal hierarchies (NP, VP, etc.).
  • Application: Defines the schema for `Lexeme`, `Phrase`, and `DeepStructure` models.
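The structural side can be illustrated with a minimal plain-Ruby sketch. It uses Structs as stand-ins for the ActiveRecord models so that it runs without a database. Every field name (`lemma`, `category`, `head`, `specifier`, `complement`, `root`) and the `canonical_form` helper are assumptions made for illustration, not the application's actual schema. Storing the subject as the specifier of the VP is likewise a simplification.

```ruby
# Plain-Ruby stand-ins for the UG-side models. In the real schema these
# would be ActiveRecord models; all field names here are assumptions.
Lexeme = Struct.new(:lemma, :category, keyword_init: true)

# A Phrase follows the X-bar idea: a head plus an optional specifier and
# complement, labeled with a phrasal category (NP, VP, ...).
Phrase = Struct.new(:label, :head, :specifier, :complement, keyword_init: true) do
  # Flatten the phrase into lemmas in specifier, head, complement order.
  def lemmas
    [specifier, head, complement].compact.flat_map do |node|
      node.is_a?(Phrase) ? node.lemmas : [node.lemma]
    end
  end
end

# DeepStructure wraps the root phrase and exposes a canonical, lemma-based
# form that does not depend on inflection in the surface text.
DeepStructure = Struct.new(:root, keyword_init: true) do
  def canonical_form
    root.lemmas.join(" ")
  end
end

the   = Lexeme.new(lemma: "the",   category: "D")
dog   = Lexeme.new(lemma: "dog",   category: "N")
cat   = Lexeme.new(lemma: "cat",   category: "N")
chase = Lexeme.new(lemma: "chase", category: "V")

subject = Phrase.new(label: "NP", head: dog, specifier: the)
object  = Phrase.new(label: "NP", head: cat, specifier: the)
clause  = Phrase.new(label: "VP", head: chase, specifier: subject, complement: object)

puts DeepStructure.new(root: clause).canonical_form
```

In this sketch, a surface sentence such as "The dog chased the cat" would be stored once in lemma form, which is one way to read "independent of surface-level variations".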

Systemic Functional Linguistics (SFL)

SFL provides a framework for modeling the functional and contextual dimensions of language. It analyzes how language is used to make meaning in a given situation.

  • Focus: Function & Contextual Meaning
  • Key Concept: The `ContextOfSituation` (field, tenor, mode) shapes meaning.
  • Application: Defines the schema for `Clause` and its three metafunctional frames.
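The functional side can be sketched the same way. The plain-Ruby Structs below stand in for the ActiveRecord models, and the attribute names (`process`, `participants`, `mood`, `tone`, `theme`, `rheme`) are assumptions chosen to match the three metafunctions, not the application's confirmed columns.

```ruby
# Plain-Ruby stand-ins for the SFL-side models (assumed field names).
ContextOfSituation  = Struct.new(:field, :tenor, :mode, keyword_init: true)
IdeationalFrame     = Struct.new(:process, :participants, keyword_init: true)
InterpersonalStance = Struct.new(:mood, :tone, keyword_init: true)
TextualOrganization = Struct.new(:theme, :rheme, keyword_init: true)

# A Clause ties one stretch of text to its context and to one frame per
# metafunction.
Clause = Struct.new(:text, :context, :ideational, :interpersonal, :textual,
                    keyword_init: true) do
  def metafunctions
    { ideational: ideational, interpersonal: interpersonal, textual: textual }
  end

  # True only when all three metafunctional frames have been filled in.
  def complete?
    metafunctions.values.none?(&:nil?)
  end
end

context = ContextOfSituation.new(
  field: "server maintenance",       # what is going on
  tenor: "engineer to teammates",    # who is taking part
  mode:  "written status update"     # how the text is delivered
)

clause = Clause.new(
  text: "The server restarted overnight.",
  context: context,
  ideational: IdeationalFrame.new(process: "restart",
                                  participants: { actor: "the server" }),
  interpersonal: InterpersonalStance.new(mood: "declarative", tone: "neutral"),
  textual: TextualOrganization.new(theme: "The server",
                                   rheme: "restarted overnight")
)

puts clause.complete?
```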

Interactive Schema Explorer

The integrated ActiveRecord schema combines UG-based structural models with SFL-based functional models. The structural models come first, followed by the functional models; each model is listed with its role.

  • `Lexeme`: Base lexical unit
  • `Phrase`: Syntactic group (NP, VP)
  • `DeepStructure`: Canonical grammar
  • `SurfaceUtterance`: Actual source text
  • `ContextOfSituation`: Field, Tenor, Mode
  • `Clause`: Unit of meaning
  • `IdeationalFrame`: Content (Who does what)
  • `InterpersonalStance`: Interaction (Mood, Tone)
  • `TextualOrganization`: Message Flow (Theme)

Data Processing Pipeline

Raw text is processed through a series of specialized Ruby gems to extract structural and functional information, which then populates the schema models.

  • Raw Text (input)
  • Tokenization: Pragmatic Tokenizer
  • POS & DEP Parsing: ruby-spacy
  • Deep Grammar: Link Parser
  • Semantic Relations: ruby-wordnet
  • Populate Schema (output)
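The flow of the pipeline can be sketched as a chain of stages, each adding one layer of analysis to a shared document hash. The lambdas below are deliberately naive stand-ins so the sketch runs without any gems: a real pipeline would call Pragmatic Tokenizer, ruby-spacy, Link Parser, and ruby-wordnet at the corresponding stages. The names `PIPELINE` and `run_pipeline` and the hash keys are hypothetical.

```ruby
# Each entry is [result_key, stage]. A stage receives the analysis so far
# and returns the value to store under result_key.
# All four stages are placeholders, not the real gem calls.
PIPELINE = [
  # Tokenization stand-in: split into word and punctuation tokens.
  [:tokens,    ->(doc) { doc[:text].scan(/\w+|[^\w\s]/) }],
  # POS & DEP stand-in: only distinguishes words from punctuation.
  [:tags,      ->(doc) { doc[:tokens].map { |t| t.match?(/\A\w+\z/) ? "WORD" : "PUNCT" } }],
  # Deep grammar stand-in: links each token to its right-hand neighbor.
  [:linkage,   ->(doc) { doc[:tokens].each_cons(2).to_a }],
  # Semantic relations stand-in: no lexical database is consulted.
  [:relations, ->(_doc) { {} }]
].freeze

# Run every stage in order, accumulating results in a new hash each time.
def run_pipeline(text, stages = PIPELINE)
  stages.reduce({ text: text }) do |doc, (key, stage)|
    doc.merge(key => stage.call(doc))
  end
end

analysis = run_pipeline("The dog barked.")
puts analysis[:tokens].inspect
puts analysis[:tags].inspect
```

The resulting hash is what a final "populate schema" step would map onto model records. Keeping each stage behind the same callable interface means a stand-in can be swapped for the real gem without touching `run_pipeline`.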

Interactive RAG Demo

This demo simulates how a RAG application queries the SFL models. The selected `ContextOfSituation` determines how the three metafunctional frames are weighted when retrieving or generating the most relevant response.
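One way to picture the weighting is a weighted sum over per-frame relevance scores. The sketch below is an assumption about how such a scheme could work, not the demo's actual logic: the context labels, the weights in `FRAME_WEIGHTS`, and the candidate scores are all invented for illustration.

```ruby
# Hypothetical weights: how much each metafunctional frame matters in a
# given context. Each row sums to 1.0.
FRAME_WEIGHTS = {
  "technical_support" => { ideational: 0.6, interpersonal: 0.1, textual: 0.3 },
  "casual_chat"       => { ideational: 0.2, interpersonal: 0.6, textual: 0.2 }
}.freeze

# Combine a candidate's per-frame relevance scores using the context's weights.
def weighted_score(frame_scores, context)
  FRAME_WEIGHTS.fetch(context).sum do |frame, weight|
    weight * frame_scores.fetch(frame, 0.0)
  end
end

# Order candidates from most to least relevant for the given context.
def rank(candidates, context)
  candidates.sort_by { |c| -weighted_score(c[:scores], context) }
end

# Invented candidates with invented per-frame relevance scores.
CANDIDATES = [
  { id: "factual",  scores: { ideational: 0.9, interpersonal: 0.2, textual: 0.5 } },
  { id: "friendly", scores: { ideational: 0.3, interpersonal: 0.9, textual: 0.5 } }
].freeze

FRAME_WEIGHTS.each_key do |context|
  puts "#{context}: #{rank(CANDIDATES, context).map { |c| c[:id] }.join(', ')}"
end
```

With these made-up numbers, the content-heavy candidate ranks first under the support context and the interpersonally strong candidate ranks first under the chat context, which is the behavior the demo describes.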