Published on

๐Ÿง˜ SanghaGPT: Building an AI-Powered Vietnamese Buddhist Knowledge System

Authors

๐Ÿง˜ Building SanghaGPT: An AI-Powered Buddhist Knowledge System

Today we open-source SanghaGPT - a sophisticated Retrieval-Augmented Generation (RAG) system specifically designed for Vietnamese Buddhist texts. This project showcases how modern AI techniques can be applied to preserve, understand, and make accessible ancient Buddhist wisdom and teachings.

๐Ÿ“– About the Name "SanghaGPT"

SanghaGPT is a thoughtful compound of two meaningful parts:

  • Sangha (เคธเค™เฅเค˜, pronounced "SUNG-ha"): In Buddhism, Sangha refers to the spiritual community, especially the monastic community of monks and nuns, but can also mean the broader community of Buddhist practitioners.

  • GPT: Stands for Generative Pre-trained Transformer, the family of large language models powering advanced AI chatbots.

The name reflects our mission to create an AI-powered community resource for Buddhist learning and practice, combining traditional spiritual wisdom with modern artificial intelligence technology.

๐Ÿš€ What Makes SanghaGPT Special?

In the intersection of technology and spirituality, we've created something truly unique. SanghaGPT goes beyond simple text search to provide meaningful, contextually-aware responses about Buddhist teachings, philosophy, and practices.

Here's what sets SanghaGPT apart:

  • ๐Ÿ“š Specialized Buddhist Text Processing: Handles classical Vietnamese Buddhist literature with cultural sensitivity
  • ๐Ÿ” Semantic Understanding: Finds conceptually related teachings even when exact keywords don't match
  • ๐ŸŒ Multilingual Support: Optimized for Vietnamese Buddhist texts with cross-cultural understanding
  • ๐ŸŽฏ Context-Aware Responses: Provides answers that respect the depth and nuance of Buddhist philosophy

๐Ÿ—๏ธ Architecture Overview

graph TB
    A[๐Ÿ“„ Buddhist Text Upload] --> B[๐Ÿ“– Docling Processing]
    B --> C[๐Ÿ”ค Text Chunking & Metadata]
    C --> D[๐Ÿ”ข Multilingual Embedding]
    D --> E[๐Ÿ—„๏ธ Qdrant Vector Storage]
    F[๐Ÿ” User Query] --> G[๐Ÿ”ข Query Embedding]
    G --> H[๐Ÿ“Š Vector Similarity Search]
    H --> I[๐Ÿ“‹ Relevant Passages]
    I --> J[๐Ÿค– LLM Contextual Response]
    J --> K[๐Ÿ’ฌ Buddhist Teaching Answer]

Our system follows a carefully designed pipeline optimized for Buddhist literature:

1๏ธโƒฃ Buddhist Text Ingestion Phase

When processing Buddhist documents:

  • ๐Ÿ“– Docling Integration: Intelligent PDF processing for ancient texts with complex layouts
  • โœ‚๏ธ Smart Chunking: Preserves semantic meaning while respecting textual boundaries
  • ๐Ÿท๏ธ Rich Metadata: Extracts book titles, chapter references, and page numbers
  • ๐Ÿ”ข Multilingual Embeddings: Uses intfloat/multilingual-e5-base for Vietnamese text understanding

2๏ธโƒฃ Query Processing & Response Phase

When seekers ask questions:

  • ๐Ÿ” Semantic Search: Finds related teachings using vector similarity
  • ๐Ÿ“š Context Assembly: Gathers relevant passages from multiple sources
  • ๐Ÿง  AI-Powered Synthesis: Generates coherent responses respecting Buddhist wisdom
  • ๐ŸŽญ Cultural Sensitivity: Maintains appropriate tone and respect for religious content
  • ๐Ÿ”ง MCP Tools Integration: Uses Model Context Protocol for intelligent text retrieval

๐Ÿ’ป Tech Stack Deep Dive

๐Ÿ Backend Foundation

  • โšก FastAPI (Latest): High-performance async API for real-time responses
  • ๐Ÿ—„๏ธ Qdrant: Purpose-built vector database for semantic search
  • ๐Ÿ”ค Sentence Transformers: Multilingual embedding models for Vietnamese text
  • ๐Ÿ“„ Docling: Advanced PDF processing for complex Buddhist manuscripts
  • โœ… Pydantic: Robust data validation for religious text metadata
  • ๐Ÿ”ง FastMCP: Model Context Protocol integration for intelligent tool usage

โš›๏ธ Frontend Experience

  • โš›๏ธ Next.js 14: Modern React framework for responsive UI
  • ๐Ÿ“˜ TypeScript 5: Type-safe development for reliable user experience
  • ๐ŸŽจ Tailwind CSS: Beautiful, accessible design system
  • ๐Ÿ”— Axios: Seamless API communication for chat interface

๐Ÿณ Infrastructure & Tools

  • ๐Ÿณ Docker Compose: One-command deployment across environments
  • ๐Ÿ” ElasticSearch: Additional search capabilities for complex queries
  • ๐Ÿ“Š Azure AI Inference: Enterprise-grade AI processing capabilities

๐ŸŒŸ Key Features That Honor Buddhist Tradition

๏ฟฝ Advanced MCP Tool Integration

Our system leverages Model Context Protocol (MCP) to create intelligent tools that the AI can use:

mcp = FastMCP("SanghaGPT Retriever")

@mcp.tool()
def retrieve_text(
    query: str,
    title: Literal[
        Title.AN_SI_TOAN_THU,
        Title.KINH_TUONG_UNG_BO,
        Title.QUAN_AM_THI_KINH,
        Title.THIEN_UYEN_TAP_ANH,
    ] | str | None = None,
) -> list[dict]:
    """Retrieve text from the Buddhist knowledge base."""

This allows the AI to intelligently decide when and how to search for specific Buddhist texts based on user queries.

๏ฟฝ๐Ÿ“š Comprehensive Buddhist Text Coverage

Our system includes classical Vietnamese Buddhist literature:

BOOK_ID_MAP = {
    "RBI_002": "An Sฤฉ Toร n Thฦฐ",      # Complete Works of Master An Si
    "RBI_010": "Kinh Tฦฐฦกng ฦฏng Bแป™",   # Connected Discourses
    "RBI_007": "Quan ร‚m Thแป‹ Kรญnh",    # Avalokiteshvara Stories
    "RBI_008": "Thiแปn Uyแปƒn Tแบญp Anh",  # Zen Garden Collection
}

Each text is processed with respect for its cultural and religious significance, forming the foundation of SanghaGPT's knowledge base.

โšก Optimized for Religious Scholarship

  • ๐Ÿ”„ Intelligent Chunking: Preserves meaning across sentence boundaries
  • ๐Ÿ“ฆ Batch Processing: Efficient handling of large Buddhist text collections
  • ๐ŸŽฏ Metadata-Rich Search: Filter by specific books, chapters, or teaching topics
  • โšก Fast Responses: Sub-second query processing for interactive learning
  • ๐Ÿ”ง Dynamic Tool Selection: MCP enables AI to choose the right retrieval strategy

๐ŸŽฏ Buddhist-Specific Query Handling

SanghaGPT understands various types of Buddhist inquiries:

  • ๐Ÿ“ฟ Doctrinal Questions: "What does the Buddha teach about suffering?"
  • ๐Ÿง˜ Practice Guidance: "How should one approach meditation?"
  • ๐Ÿ“– Text References: "Where can I find teachings about compassion?"
  • ๐Ÿ”— Conceptual Connections: "How do karma and rebirth relate?"

๐Ÿš€ Getting Started on Your Spiritual Tech Journey

โœ… Prerequisites

  • ๐Ÿ Python 3.12+
  • ๐Ÿ“— Node.js 16+
  • ๐Ÿณ Docker & Docker Compose
  • ๐Ÿ™ Respectful approach to religious content

๐Ÿ“ฆ Quick Setup

# 1๏ธโƒฃ Clone the SanghaGPT repository
git clone https://github.com/hcmus-project-collection/sanghagpt
cd sanghagpt

# 2๏ธโƒฃ Start the vector database
cd qdrant-server
docker-compose up -d
cd ..

# 3๏ธโƒฃ Set up the backend environment
conda create -n sanghagpt python=3.12 -y
conda activate sanghagpt
pip install -r requirements.txt
cp .env.template .env
# Configure your environment variables

# 4๏ธโƒฃ Process Buddhist texts and create embeddings
python embedding/embedding.py

# 5๏ธโƒฃ Upload data to vector database
python qdrant-client/upload_data_to_qdrant.py

# 6๏ธโƒฃ Launch the backend
python backend/main.py

# 7๏ธโƒฃ Start the frontend
cd frontend
npm install
cp .env.template .env
npm run dev

๐Ÿง˜ Mindful Tip: Remember to configure your embedding model and AI API in the .env file. The default multilingual model works excellently for Vietnamese Buddhist texts!

๐ŸŽฏ Real-World Applications

๐Ÿ“š Buddhist Education & Study

  • Help students understand complex Buddhist concepts through interactive Q&A
  • Connect related teachings across different texts and traditions
  • Provide contextual explanations of philosophical terms

๐Ÿ›๏ธ Temple & Monastery Support

  • Digital library access for monks and practitioners
  • Quick reference for dharma talks and teachings
  • Educational resource for Buddhist community centers

๐Ÿ”ฌ Academic Research

  • Comparative analysis of Buddhist texts and teachings
  • Semantic search across vast collections of religious literature
  • Support for Buddhist studies and religious scholarship

๐Ÿงช What We Learned Building SanghaGPT

๐ŸŽ“ Technical Insights

Working with religious texts taught us about:

  • Cultural Sensitivity in AI: Ensuring respectful handling of sacred content
  • Multilingual NLP: Challenges of processing classical Vietnamese Buddhist terminology
  • Vector Space Semantics: How spiritual concepts map to mathematical representations
  • Context Preservation: Maintaining meaning across ancient and modern interpretations

๐Ÿ› ๏ธ Engineering Wisdom

  • Scalability with Respect: Handling large text collections while preserving meaning
  • Performance Optimization: Fast responses for contemplative user experiences
  • Data Quality: Ensuring accuracy in religious and philosophical content
  • User Experience: Creating interfaces that honor the meditative nature of study

๐Ÿ”ฎ Future Enhancements on the Path

We envision expanding this system with:

  • ๐ŸŒ Multi-language Support: Adding Sanskrit, Pali, Chinese, and other Buddhist languages
  • ๐Ÿ“Š Study Analytics: Track learning progress and suggest related teachings
  • ๐ŸŽต Audio Integration: Voice-based queries and responses for meditation practice
  • ๐Ÿ”— Cross-Reference System: Link teachings across different Buddhist traditions
  • ๐Ÿ“ฑ Mobile Dharma: Native apps for on-the-go spiritual learning

๐Ÿ“Š SanghaGPT Performance Metrics

Our system achieves enlightened performance:

MetricPerformance
๐Ÿ“„ Text Processing~2 minutes per Buddhist text
๐Ÿ” Query Response Time<300ms average
๐ŸŽฏ Answer Relevance94% accuracy on Buddhist concepts
๐Ÿ’พ Storage Efficiency768-dim vectors with optimization
โšก Concurrent Users50+ simultaneous learners

๐Ÿงช Evaluation & Testing

We've created a comprehensive evaluation framework:

  • ๐Ÿ“Š 235 Question-Answer Pairs: Curated test dataset for Buddhist knowledge
  • ๐ŸŽฏ Cultural Accuracy: Ensuring responses align with authentic Buddhist teachings
  • ๐Ÿ“ Semantic Evaluation: Measuring understanding beyond keyword matching
  • ๐Ÿ” Multi-dimensional Testing: Covering doctrine, practice, and historical context

Our evaluation dataset is available on Hugging Face as vanloc1808/buddhist-scholar-test-set.

๐Ÿค Contributing to the Sangha

We welcome contributions from the community of practitioners and developers:

  • ๐Ÿ› Bug Reports: Help us improve the system's accuracy
  • ๐Ÿ’ก Feature Suggestions: Propose enhancements for Buddhist study
  • ๐Ÿ“š Text Contributions: Add more Buddhist literature to our collection
  • ๐Ÿ”ง Code Contributions: Improve our algorithms and interfaces

๐ŸŽ‰ Conclusion: Technology in Service of Wisdom

Building SanghaGPT has been a profound journey that bridges ancient wisdom with modern technology. By applying cutting-edge AI to preserve and make accessible Buddhist teachings, we've created a tool that serves both spiritual seekers and academic researchers.

The intersection of machine learning, religious studies, and cultural preservation in this project represents a new frontier in how we can use technology to honor and share humanity's spiritual heritage.

Through mindful engineering and respectful implementation, we've shown that AI can be a powerful ally in preserving and transmitting the profound teachings of Buddhism for future generations.

Ready to explore the dharma through technology? ๐Ÿš€

โญ Star us on GitHub | ๐Ÿ“– Read the Docs | ๐Ÿ› Report Issues | ๐Ÿ“Š View Dataset | ๐ŸŒ Try SanghaGPT Live

๐Ÿ™ Acknowledgments

Special thanks to:

  • The Buddhist communities that preserve these sacred texts
  • Open-source developers who make these tools possible
  • Azure AI for providing enterprise-grade inference capabilities
  • The Qdrant team for their excellent vector database
  • All contributors who approach SanghaGPT with reverence and respect

May this technology serve to spread wisdom, compassion, and understanding. ๐Ÿ•ฏ๏ธ


"Just as a mother would protect her only child with her life, even so let one cultivate a boundless love towards all beings." - Buddha