SanghaGPT: Building an AI-Powered Vietnamese Buddhist Knowledge System
Author: Van-Loc Nguyen (@vanloc1808)
Today we are open-sourcing SanghaGPT, a Retrieval-Augmented Generation (RAG) system designed specifically for Vietnamese Buddhist texts. The project shows how modern AI techniques can help preserve, interpret, and open up access to Buddhist teachings.
About the Name "SanghaGPT"
SanghaGPT is a compound of two parts:
- Sangha (सङ्घ, pronounced "SUNG-ha"): In Buddhism, Sangha refers to the spiritual community, especially the monastic community of monks and nuns, but it can also mean the broader community of Buddhist practitioners.
- GPT: Stands for Generative Pre-trained Transformer, the family of large language models behind many AI chatbots.
The name reflects our mission to create an AI-powered community resource for Buddhist learning and practice, combining traditional spiritual wisdom with modern artificial intelligence technology.
What Makes SanghaGPT Special?
SanghaGPT sits at the intersection of technology and spirituality. It goes beyond keyword search to provide context-aware responses about Buddhist teachings, philosophy, and practices.
Here's what sets SanghaGPT apart:
- Specialized Buddhist Text Processing: Handles classical Vietnamese Buddhist literature with cultural sensitivity
- Semantic Understanding: Finds conceptually related teachings even when exact keywords don't match
- Multilingual Support: Optimized for Vietnamese Buddhist texts with cross-cultural understanding
- Context-Aware Responses: Provides answers that respect the depth and nuance of Buddhist philosophy
Architecture Overview
```mermaid
graph TB
    A[Buddhist Text Upload] --> B[Docling Processing]
    B --> C[Text Chunking & Metadata]
    C --> D[Multilingual Embedding]
    D --> E[Qdrant Vector Storage]
    F[User Query] --> G[Query Embedding]
    G --> H[Vector Similarity Search]
    H --> I[Relevant Passages]
    I --> J[LLM Contextual Response]
    J --> K[Buddhist Teaching Answer]
```
Our system follows a carefully designed pipeline optimized for Buddhist literature:
1. Buddhist Text Ingestion Phase
When processing Buddhist documents:
- Docling Integration: Intelligent PDF processing for ancient texts with complex layouts
- Smart Chunking: Preserves semantic meaning while respecting textual boundaries
- Rich Metadata: Extracts book titles, chapter references, and page numbers
- Multilingual Embeddings: Uses `intfloat/multilingual-e5-base` for Vietnamese text understanding
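The chunking and metadata steps above can be sketched roughly as follows. This is a minimal illustration and not the project's actual implementation: the `Chunk` dataclass, the `chunk_text` function, the sentence-splitting rule, and the `max_chars` budget are all assumptions made for the example.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    """One retrievable passage plus the metadata used for filtering and citation."""
    text: str
    book_title: str
    chapter: str
    page: int


def chunk_text(text: str, book_title: str, chapter: str, page: int,
               max_chars: int = 500) -> list[Chunk]:
    """Greedily pack whole sentences into chunks of at most `max_chars` characters.

    A single sentence longer than `max_chars` becomes its own oversized chunk,
    so no sentence is ever split in the middle.
    """
    # Hypothetical splitting rule: break after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks: list[Chunk] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the budget: close the chunk.
            chunks.append(Chunk(current, book_title, chapter, page))
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(Chunk(current, book_title, chapter, page))
    return chunks
```

Because every chunk carries its book title, chapter, and page, a later retrieval step can both filter by source and cite where a passage came from.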
2. Query Processing & Response Phase
When seekers ask questions:
- Semantic Search: Finds related teachings using vector similarity
- Context Assembly: Gathers relevant passages from multiple sources
- AI-Powered Synthesis: Generates coherent responses respecting Buddhist wisdom
- Cultural Sensitivity: Maintains appropriate tone and respect for religious content
- MCP Tools Integration: Uses Model Context Protocol for intelligent text retrieval
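The search and context-assembly steps can be illustrated with a small, dependency-free sketch. It is not the project's code: an in-memory list stands in for Qdrant, toy 3-dimensional vectors stand in for real embeddings, and the function names (`cosine_similarity`, `top_k`, `build_prompt`) and the prompt wording are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], passages: list[dict], k: int = 2) -> list[dict]:
    """Return the k passages whose vectors are most similar to the query vector."""
    scored = [(cosine_similarity(query_vec, p["vector"]), p) for p in passages]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in scored[:k]]


def build_prompt(question: str, passages: list[dict]) -> str:
    """Assemble retrieved passages, each tagged with its source, into an LLM prompt."""
    context = "\n".join(f"[{p['title']}] {p['text']}" for p in passages)
    return f"Answer using only the passages below.\n\n{context}\n\nQuestion: {question}"


# Toy data: placeholder passages with hand-written vectors.
passages = [
    {"title": "Kinh Tương Ưng Bộ", "text": "Passage about suffering.", "vector": [1.0, 0.0, 0.0]},
    {"title": "An Sĩ Toàn Thư", "text": "Passage about karma.", "vector": [0.0, 1.0, 0.0]},
    {"title": "Thiền Uyển Tập Anh", "text": "Passage about meditation.", "vector": [0.0, 0.0, 1.0]},
]
hits = top_k([0.9, 0.1, 0.0], passages, k=2)
print(build_prompt("What causes suffering?", hits))
```

Tagging each passage with its title in the prompt gives the language model what it needs to attribute an answer to a specific text.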
Tech Stack Deep Dive
Backend Foundation
- FastAPI (Latest): High-performance async API for real-time responses
- Qdrant: Purpose-built vector database for semantic search
- Sentence Transformers: Multilingual embedding models for Vietnamese text
- Docling: Advanced PDF processing for complex Buddhist manuscripts
- Pydantic: Robust data validation for religious text metadata
- FastMCP: Model Context Protocol integration for intelligent tool usage
Frontend Experience
- Next.js 14: Modern React framework for responsive UI
- TypeScript 5: Type-safe development for reliable user experience
- Tailwind CSS: Beautiful, accessible design system
- Axios: Seamless API communication for chat interface
Infrastructure & Tools
- Docker Compose: One-command deployment across environments
- Elasticsearch: Additional search capabilities for complex queries
- Azure AI Inference: Enterprise-grade AI processing capabilities
Key Features That Honor Buddhist Tradition
Advanced MCP Tool Integration
Our system leverages Model Context Protocol (MCP) to create intelligent tools that the AI can use:
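```python
from enum import Enum
from typing import Literal
from fastmcp import FastMCP  # assumes the `fastmcp` package is installed

class Title(str, Enum):  # assumed definition; the post does not show it
    AN_SI_TOAN_THU = "An Sĩ Toàn Thư"
    KINH_TUONG_UNG_BO = "Kinh Tương Ưng Bộ"
    QUAN_AM_THI_KINH = "Quan Âm Thị Kính"
    THIEN_UYEN_TAP_ANH = "Thiền Uyển Tập Anh"

mcp = FastMCP("SanghaGPT Retriever")

@mcp.tool()
def retrieve_text(
    query: str,
    title: Literal[
        Title.AN_SI_TOAN_THU,
        Title.KINH_TUONG_UNG_BO,
        Title.QUAN_AM_THI_KINH,
        Title.THIEN_UYEN_TAP_ANH,
    ] | str | None = None,
) -> list[dict]:
    """Retrieve text from the Buddhist knowledge base."""
```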
This allows the AI to intelligently decide when and how to search for specific Buddhist texts based on user queries.
Comprehensive Buddhist Text Coverage
Our system includes classical Vietnamese Buddhist literature:
```python
BOOK_ID_MAP = {
    "RBI_002": "An Sĩ Toàn Thư",      # Complete Works of Master An Si
    "RBI_010": "Kinh Tương Ưng Bộ",   # Connected Discourses
    "RBI_007": "Quan Âm Thị Kính",    # Avalokiteshvara Stories
    "RBI_008": "Thiền Uyển Tập Anh",  # Zen Garden Collection
}
```
Each text is processed with respect for its cultural and religious significance, forming the foundation of SanghaGPT's knowledge base.
Optimized for Religious Scholarship
- Intelligent Chunking: Preserves meaning across sentence boundaries
- Batch Processing: Efficient handling of large Buddhist text collections
- Metadata-Rich Search: Filter by specific books, chapters, or teaching topics
- Fast Responses: Sub-second query processing for interactive learning
- Dynamic Tool Selection: MCP enables AI to choose the right retrieval strategy
Buddhist-Specific Query Handling
SanghaGPT understands various types of Buddhist inquiries:
- Doctrinal Questions: "What does the Buddha teach about suffering?"
- Practice Guidance: "How should one approach meditation?"
- Text References: "Where can I find teachings about compassion?"
- Conceptual Connections: "How do karma and rebirth relate?"
Getting Started on Your Spiritual Tech Journey
Prerequisites
- Python 3.12+
- Node.js 18.17+ (the minimum version supported by Next.js 14)
- Docker & Docker Compose
- Respectful approach to religious content
Quick Setup
```bash
# 1. Clone the SanghaGPT repository
git clone https://github.com/hcmus-project-collection/sanghagpt
cd sanghagpt

# 2. Start the vector database
cd qdrant-server
docker-compose up -d
cd ..

# 3. Set up the backend environment
conda create -n sanghagpt python=3.12 -y
conda activate sanghagpt
pip install -r requirements.txt
cp .env.template .env
# Configure your environment variables

# 4. Process Buddhist texts and create embeddings
python embedding/embedding.py

# 5. Upload data to vector database
python qdrant-client/upload_data_to_qdrant.py

# 6. Launch the backend
python backend/main.py

# 7. Start the frontend
cd frontend
npm install
cp .env.template .env
npm run dev
```
Mindful Tip: Remember to configure your embedding model and AI API in the .env file. The default multilingual model works well for Vietnamese Buddhist texts.
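As a rough sketch of how the embedding model setting might be consumed, the snippet below reads a model name from the environment and prepares inputs for an E5 model. The variable name `EMBEDDING_MODEL` and the helper functions are hypothetical; check `.env.template` for the names the project actually uses. The `query: ` and `passage: ` prefixes are the input format documented for the E5 model family, and `embed` additionally assumes the `sentence-transformers` package is installed.

```python
import os

DEFAULT_MODEL = "intfloat/multilingual-e5-base"


def get_model_name() -> str:
    """Read the model name from the environment (the variable name is hypothetical)."""
    return os.environ.get("EMBEDDING_MODEL", DEFAULT_MODEL)


def format_for_e5(text: str, is_query: bool) -> str:
    """Add the role prefix that E5 models expect on every input."""
    prefix = "query: " if is_query else "passage: "
    return prefix + text.strip()


def embed(texts: list[str], is_query: bool):
    """Embed texts with sentence-transformers (imported lazily; downloads the model)."""
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(get_model_name())
    inputs = [format_for_e5(t, is_query) for t in texts]
    # Normalized vectors make cosine similarity equal to a dot product.
    return model.encode(inputs, normalize_embeddings=True)


print(format_for_e5("Khổ là gì?", is_query=True))
```

Using the query prefix for user questions and the passage prefix for indexed chunks keeps both sides of the search in the format the model was trained on.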
Real-World Applications
Buddhist Education & Study
- Help students understand complex Buddhist concepts through interactive Q&A
- Connect related teachings across different texts and traditions
- Provide contextual explanations of philosophical terms
Temple & Monastery Support
- Digital library access for monks and practitioners
- Quick reference for dharma talks and teachings
- Educational resource for Buddhist community centers
Academic Research
- Comparative analysis of Buddhist texts and teachings
- Semantic search across vast collections of religious literature
- Support for Buddhist studies and religious scholarship
What We Learned Building SanghaGPT
Technical Insights
Working with religious texts taught us about:
- Cultural Sensitivity in AI: Ensuring respectful handling of sacred content
- Multilingual NLP: Challenges of processing classical Vietnamese Buddhist terminology
- Vector Space Semantics: How spiritual concepts map to mathematical representations
- Context Preservation: Maintaining meaning across ancient and modern interpretations
Engineering Wisdom
- Scalability with Respect: Handling large text collections while preserving meaning
- Performance Optimization: Fast responses for contemplative user experiences
- Data Quality: Ensuring accuracy in religious and philosophical content
- User Experience: Creating interfaces that honor the meditative nature of study
Future Enhancements on the Path
We envision expanding this system with:
- Multi-language Support: Adding Sanskrit, Pali, Chinese, and other Buddhist languages
- Study Analytics: Track learning progress and suggest related teachings
- Audio Integration: Voice-based queries and responses for meditation practice
- Cross-Reference System: Link teachings across different Buddhist traditions
- Mobile Dharma: Native apps for on-the-go spiritual learning
SanghaGPT Performance Metrics
Our system currently achieves the following performance:
| Metric | Performance |
|---|---|
| Text Processing | ~2 minutes per Buddhist text |
| Query Response Time | <300ms average |
| Answer Relevance | 94% accuracy on Buddhist concepts |
| Storage Efficiency | 768-dim vectors with optimization |
| Concurrent Users | 50+ simultaneous learners |
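For a back-of-the-envelope reading of the storage row: a 768-dimensional vector stored as 32-bit floats takes 768 × 4 = 3,072 bytes before any index overhead or quantization. The sketch below does that arithmetic; the corpus size of 100,000 chunks is a made-up figure for illustration, not a number from the project.

```python
BYTES_PER_FLOAT32 = 4
DIMENSIONS = 768  # vector size listed in the table above


def raw_vector_bytes(num_vectors: int, dims: int = DIMENSIONS) -> int:
    """Raw float32 storage for `num_vectors` embeddings, excluding index overhead."""
    return num_vectors * dims * BYTES_PER_FLOAT32


# Hypothetical corpus of 100,000 chunks.
total = raw_vector_bytes(100_000)
print(f"{total / 1024**2:.1f} MiB")  # 293.0 MiB
```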
Evaluation & Testing
We've created a comprehensive evaluation framework:
- 235 Question-Answer Pairs: Curated test dataset for Buddhist knowledge
- Cultural Accuracy: Ensuring responses align with authentic Buddhist teachings
- Semantic Evaluation: Measuring understanding beyond keyword matching
- Multi-dimensional Testing: Covering doctrine, practice, and historical context
Our evaluation dataset is available on Hugging Face as `vanloc1808/buddhist-scholar-test-set`.
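One simple way to score retrieval against a question-answer set is hit rate at k: the fraction of questions for which the expected source appears among the top k retrieved results. The sketch below illustrates that metric only; it is not necessarily how SanghaGPT was evaluated, and the record fields (`expected`, `retrieved`) are hypothetical rather than the dataset's actual schema.

```python
def hit_rate_at_k(results: list[dict], k: int = 3) -> float:
    """Fraction of questions whose expected book ID is in the top-k retrieved IDs."""
    if not results:
        return 0.0
    hits = sum(1 for r in results if r["expected"] in r["retrieved"][:k])
    return hits / len(results)


# Toy records using the book IDs from BOOK_ID_MAP; the field names are hypothetical.
results = [
    {"expected": "RBI_010", "retrieved": ["RBI_010", "RBI_002", "RBI_007"]},
    {"expected": "RBI_008", "retrieved": ["RBI_002", "RBI_007", "RBI_008"]},
    {"expected": "RBI_007", "retrieved": ["RBI_002", "RBI_010", "RBI_008"]},
    {"expected": "RBI_002", "retrieved": ["RBI_007", "RBI_002", "RBI_010"]},
]
print(hit_rate_at_k(results, k=3))  # 0.75
print(hit_rate_at_k(results, k=1))  # 0.25
```

A metric like this only checks whether the right source was retrieved; judging whether the generated answer is doctrinally faithful still needs human or model-based review.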
Contributing to the Sangha
We welcome contributions from the community of practitioners and developers:
- Bug Reports: Help us improve the system's accuracy
- Feature Suggestions: Propose enhancements for Buddhist study
- Text Contributions: Add more Buddhist literature to our collection
- Code Contributions: Improve our algorithms and interfaces
Conclusion: Technology in Service of Wisdom
Building SanghaGPT has been a profound journey that bridges ancient wisdom with modern technology. By applying cutting-edge AI to preserve and make accessible Buddhist teachings, we've created a tool that serves both spiritual seekers and academic researchers.
The intersection of machine learning, religious studies, and cultural preservation in this project represents a new frontier in how we can use technology to honor and share humanity's spiritual heritage.
Through mindful engineering and respectful implementation, we've shown that AI can be a powerful ally in preserving and transmitting the profound teachings of Buddhism for future generations.
Ready to explore the dharma through technology?
Star us on GitHub | Read the Docs | Report Issues | View Dataset | Try SanghaGPT Live
Acknowledgments
Special thanks to:
- The Buddhist communities that preserve these sacred texts
- Open-source developers who make these tools possible
- Azure AI for providing enterprise-grade inference capabilities
- The Qdrant team for their excellent vector database
- All contributors who approach SanghaGPT with reverence and respect
May this technology serve to spread wisdom, compassion, and understanding.
"Just as a mother would protect her only child with her life, even so let one cultivate a boundless love towards all beings." - Buddha