Beyond Semantic Search: A Neurosymbolic Knowledge Graph
An initiative to evolve Retrieval-Augmented Generation (RAG) from simple vector search into a linguistically grounded reasoning engine built on a modular Ruby stack.
"The dominant architectural paradigm... is built upon a foundation of vector-based semantic similarity search. This foundation, while effective for simple fact retrieval, imposes a hard 'semantic ceiling' on the system's reasoning, precision, and contextual understanding capabilities."
This project addresses a critical limitation in modern AI systems: the "context conundrum." Traditional RAG methods often destroy vital contextual information during data processing, leading to responses that are topically relevant but lack deep understanding. By migrating from monolithic frameworks like `langchainrb` to a curated stack of high-performance Ruby gems, we can build a system that prioritizes explicit control, transparency, and maintainability.
The goal is to implement a working neurosymbolic knowledge graph with Systemic Functional Linguistics (SFL) and Memex-inspired capabilities. The existing RubyRAG architecture is a suitable foundation for this evolution: it allows rapid, research-friendly, incremental development of a system that retrieves on context as well as topic.
The Firmware & Software Architecture
A bifurcated data model grounded in linguistic theory to capture both stable knowledge and dynamic context.
The "Firmware" Layer
Theory: Universal Grammar (UG)
This layer structures the stable, timeless, ontological knowledge base. It represents the deep, canonical meaning of information, independent of how it's expressed. Using UG principles, it models innate hierarchical syntax and core logic, forming the backbone of the knowledge graph.
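As a rough illustration of a stable, hierarchical store, the sketch below models canonical concepts as nodes linked by a parent relation and walks up the hierarchy. The names `CanonicalNode` and `Ontology`, and the sample concepts, are hypothetical and chosen only for this example; they are not taken from RubyRAG or any gem, and the project's real model may look quite different.

```ruby
# Hypothetical sketch: canonical concepts stored independently of any phrasing.
CanonicalNode = Struct.new(:id, :label, :parent_id, keyword_init: true)

class Ontology
  def initialize
    @nodes = {}
  end

  # Registers a concept; parent_id is nil for a root concept.
  def add(id:, label:, parent_id: nil)
    @nodes[id] = CanonicalNode.new(id: id, label: label, parent_id: parent_id)
  end

  # Returns the labels from the node's parent up to the root.
  def ancestors(id)
    chain = []
    node = @nodes[id]
    while node && node.parent_id
      node = @nodes[node.parent_id]
      break unless node
      chain << node.label
    end
    chain
  end
end

# Assumed sample data, for illustration only.
ontology = Ontology.new
ontology.add(id: :entity, label: "Entity")
ontology.add(id: :document, label: "Document", parent_id: :entity)
ontology.add(id: :report, label: "Report", parent_id: :document)
```

Calling `ontology.ancestors(:report)` walks from "Report" to its parent "Document" and then to the root "Entity". The point is that this structure does not change when the same concept is phrased differently in source text.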
The "Software" Layer
Theory: Systemic Functional Linguistics (SFL)
This layer models the rich, dynamic, and queryable context surrounding each piece of information. It captures language in use, answering questions about who is communicating, what is being discussed, and through what channel, enabling highly precise, context-aware retrieval.
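One way to picture context-aware retrieval is to attach SFL's three register variables to each chunk: field (what is being discussed), tenor (who is communicating), and mode (the channel). The sketch below filters chunks on those attributes. The `ContextualChunk` struct, the `filter_by_context` helper, and the sample data are assumptions made for illustration, not the project's actual schema.

```ruby
# Hypothetical sketch: each chunk carries SFL register metadata alongside its text.
ContextualChunk = Struct.new(:text, :field, :tenor, :mode, keyword_init: true)

# Assumed sample data, for illustration only.
CHUNKS = [
  ContextualChunk.new(text: "Deploy with the release script.",
                      field: "deployment", tenor: "engineer-to-engineer", mode: "chat"),
  ContextualChunk.new(text: "Deployment requires sign-off from the release manager.",
                      field: "deployment", tenor: "policy-to-staff", mode: "document"),
  ContextualChunk.new(text: "Invoices are issued monthly.",
                      field: "billing", tenor: "policy-to-staff", mode: "document")
].freeze

# Keeps only the chunks whose register matches every supplied criterion.
def filter_by_context(chunks, **criteria)
  chunks.select do |chunk|
    criteria.all? { |key, value| chunk[key] == value }
  end
end

policy_on_deployment = filter_by_context(CHUNKS, field: "deployment", mode: "document")
```

Here `policy_on_deployment` holds only the formal policy statement about deployment. A similarity search alone could not make this cut, because both deployment chunks are topically close. In a full system, a filter like this would presumably narrow the candidates before or after vector search.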
The Evolution of RAG
Comparing traditional RAG methods with the proposed Linguistic Query Engine.
Implementation Roadmap
A phased approach to building the Neurosymbolic Knowledge Graph.
Total Estimated Timeline: 14-20 weeks
Core Components & Pipeline
The modular gem stack, processing pipeline, and database schema that power the system.
Hybrid Processing Pipeline
Database Schema Explorer