    Semantic Chunking

    This notebook implements semantic chunking for processing and retrieving information from PDF documents. The approach was first proposed by Greg Kamradt and later implemented in LangChain. Unlike traditional methods that split text at fixed character or word counts, semantic chunking aims to produce segments that are each coherent in meaning.

    Traditional text splitting methods often break documents at arbitrary points, potentially disrupting the flow of information and context. Semantic chunking addresses this issue by attempting to split text at more natural breakpoints, preserving semantic coherence within each chunk.
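To make the idea of a "natural breakpoint" concrete, here is a minimal sketch of one common variant: embed each sentence, measure the distance between neighbouring sentences, and start a new chunk wherever that distance exceeds a percentile threshold. Everything in it is an assumption for illustration only. The helper names (`toy_embed`, `semantic_chunks`, `breakpoint_percentile`) are hypothetical, and the bag-of-words "embedding" is a stand-in so the example runs without an API key. It is not LangChain's SemanticChunker, which uses a real embedding model.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Split on '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def toy_embed(sentence):
    """Stand-in embedding: word counts. A real pipeline would call an embedding model."""
    return Counter(re.findall(r"[a-z']+", sentence.lower()))


def cosine_distance(a, b):
    """1 - cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def percentile(values, pct):
    """Linearly interpolated percentile, with pct in [0, 100]."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100.0
    lo, hi = math.floor(rank), math.ceil(rank)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


def semantic_chunks(text, breakpoint_percentile=90):
    """Group sentences, breaking where adjacent-sentence distance exceeds the threshold."""
    sentences = split_sentences(text)
    if len(sentences) < 2:
        return sentences
    vectors = [toy_embed(s) for s in sentences]
    distances = [
        cosine_distance(vectors[i], vectors[i + 1]) for i in range(len(vectors) - 1)
    ]
    threshold = percentile(distances, breakpoint_percentile)

    chunks, current = [], [sentences[0]]
    for sentence, distance in zip(sentences[1:], distances):
        if distance > threshold:  # large jump in meaning: close the current chunk
            chunks.append(" ".join(current))
            current = []
        current.append(sentence)
    chunks.append(" ".join(current))
    return chunks


if __name__ == "__main__":
    sample = (
        "Cats purr when happy. Cats sleep when tired. Cats purr when tired. "
        "Rockets burn liquid fuel. Rockets burn solid fuel."
    )
    for chunk in semantic_chunks(sample):
        print(chunk)
```

On the sample text, the only adjacent pair with no shared words is the switch from cats to rockets, so that is where the sketch splits. A fixed-size splitter would cut wherever the character count happened to run out.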

    What you'll learn

    1. PDF processing and text extraction
    2. Semantic chunking using LangChain's SemanticChunker
    3. Vector store creation using FAISS and OpenAI embeddings
    4. Retriever setup for querying the processed documents

    About this tutorial

    This hands-on Jupyter notebook is part of RAG Techniques, a free, open-source repository by Nir Diamant covering RAG techniques with runnable code examples and detailed explanations.

    Free and open-source · Runnable Jupyter notebook · Active community support
