GeneInsight Documentation
=========================

.. image:: https://img.shields.io/badge/license-MIT-blue
   :alt: MIT License

.. image:: https://img.shields.io/badge/coverage-89%25-brightgreen
   :alt: Coverage

.. image:: https://img.shields.io/badge/python-3.10%2B-blue
   :alt: Python 3.10+

**GeneInsight**: AI-powered tool for gene set interpretation through advanced topic modeling and large language models.

.. image:: _static/GeneInsight.png
   :alt: GeneInsight logo

Overview
--------

GeneInsight addresses the challenge of interpreting gene sets by combining advanced topic modeling with large language models to automatically synthesize diverse biological annotations. It consolidates extensive annotations into coherent thematic summaries, enabling rapid extraction of biologically significant insights that conventional enrichment analyses often overlook.

The tool provides a comprehensive pipeline for analyzing gene sets:

1. **Gene Enrichment**: Query the STRING database (`https://string-db.org/ <https://string-db.org/>`_) for gene-specific annotations
2. **Topic Modeling**: Apply BERTopic (`https://maartengr.github.io/BERTopic/index.html <https://maartengr.github.io/BERTopic/index.html>`_) to identify coherent biological themes
3. **LLM Integration**: Use language models to refine and interpret topics through Retrieval Augmented Generation (RAG), enhancing topic interpretations with domain-specific knowledge
4. **Enrichment Analysis**: Perform hypergeometric testing to validate biological relevance
5. **Results Visualization**: Generate interactive reports with biological insights

Key Features
------------

* Integrates gene-specific annotations from multiple sources, including the STRING database
* Employs cluster-based topic modeling to identify coherent biological themes
* Utilizes large language models for automated knowledge synthesis
* Performs statistical validation through hypergeometric testing
* Generates hierarchical summaries with interactive visualization
* Supports multiple species through NCBI taxonomy IDs
* Offers both a command-line interface and API access

.. toctree::
   :maxdepth: 2
   :caption: Contents:

   installation
   getting_started
   usage/index
   theory/index
   architecture/index
   examples/index

References
----------

.. _bertopic-ref:

**BERTopic** `https://maartengr.github.io/BERTopic/index.html <https://maartengr.github.io/BERTopic/index.html>`_ - Advanced topic modeling technique that leverages BERT embeddings and clustering algorithms to create coherent topics

.. _stringdb-ref:

**STRING-DB** `https://string-db.org/ <https://string-db.org/>`_ - Database of known and predicted protein-protein interactions, providing comprehensive gene annotations