Document Augmentation
This implementation demonstrates a text augmentation technique that uses question generation to improve document retrieval in a vector database. For each text fragment, the system generates related questions and incorporates them into the index, so a user query can match a generated question as well as the raw text. This raises the likelihood of retrieving the relevant fragment as context for generative question answering.
By enriching text fragments with related questions, we aim to identify more accurately the sections of a document that contain answers to user queries.
What you'll learn
- PDF Processing and Text Chunking: Handling PDF documents and dividing them into manageable text fragments.
- Question Augmentation: Generating relevant questions at both the document and fragment levels using OpenAI's language models.
- Vector Store Creation: Calculating embeddings for documents using OpenAI's embedding model and creating a FAISS vector store.
- Retrieval and Answer Generation: Finding the most relevant document using FAISS and generating answers based on the context provided.
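The steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the notebook's implementation: PDF extraction and chunking are assumed to be done already, a bag-of-words vector stands in for OpenAI embeddings, a plain list scan stands in for a FAISS index, and canned questions stand in for the language model. All function names (`embed`, `cosine`, `build_augmented_index`, `retrieve`, `fake_generate_questions`) are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_augmented_index(chunks, generate_questions):
    """Index every chunk plus its generated questions.

    Each entry is (vector, chunk_id), so a hit on a question
    still leads back to the fragment it was generated from.
    """
    index = []
    for chunk_id, chunk in enumerate(chunks):
        index.append((embed(chunk), chunk_id))
        for question in generate_questions(chunk):
            index.append((embed(question), chunk_id))
    return index


def retrieve(query, index, chunks):
    """Return the chunk whose own text or generated question best matches the query."""
    query_vec = embed(query)
    _, best_chunk_id = max(index, key=lambda entry: cosine(query_vec, entry[0]))
    return chunks[best_chunk_id]


# Assumed to be the output of the PDF processing and chunking step.
chunks = [
    "Greenhouse gases such as carbon dioxide trap heat in the atmosphere.",
    "Solar panels convert sunlight into electricity using photovoltaic cells.",
]

# Hypothetical stand-in for the LLM call that writes questions per fragment.
CANNED_QUESTIONS = {
    chunks[0]: ["What causes global warming?", "Why is the planet getting hotter?"],
    chunks[1]: ["How is renewable power produced?"],
}


def fake_generate_questions(chunk):
    return CANNED_QUESTIONS[chunk]


index = build_augmented_index(chunks, fake_generate_questions)

# The query shares no words with either raw chunk, only with a generated question.
answer_context = retrieve("Why is global warming happening?", index, chunks)
print(answer_context)
```

The query here has no word overlap with either fragment, so an index of raw fragments alone would score both at zero under this toy similarity. The generated question supplies the match, and the stored `chunk_id` maps it back to the fragment that would be passed to the answer-generation step.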
About this tutorial
This hands-on Jupyter notebook is part of RAG Techniques, a free open-source repository by Nir Diamant covering RAG techniques with runnable code examples and detailed explanations.
