Semantic Search: The Foundation of Modern AI
How vector embeddings and similarity search power RAG, recommendations, and AI memory - the technical foundations and practical implementation.
Before RAG. Before agents. Before modern AI.
There was semantic search.
Everything else is built on top.
What It Is
Traditional search: match keywords.
Query: "refund policy"
Matches: documents containing "refund" and "policy"
Semantic search: match meaning.
Query: "refund policy"
Matches: documents about returns, money back,
cancellation, even if those exact words aren't used
The difference is understanding vs. pattern matching.
How It Works
Step 1: Embed Everything
Convert text to vectors (lists of numbers).
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('all-MiniLM-L6-v2')
embedding = model.encode("refund policy")
# → [0.023, -0.089, 0.156, ..., 0.044] (384 numbers)
Similar meanings → similar vectors.
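"Similar" here usually means cosine similarity: the angle between two vectors, ignoring their length. A minimal sketch with hand-made toy vectors (real embeddings have hundreds of dimensions, but the math is identical):

```python
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of magnitudes:
    # 1.0 = same direction, 0.0 = unrelated, -1.0 = opposite.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 4-dimensional "embeddings" standing in for model output
refund = [0.9, 0.1, 0.0, 0.2]
returns = [0.8, 0.2, 0.1, 0.3]   # close in meaning -> close in space
weather = [0.0, 0.1, 0.9, 0.1]   # unrelated topic

cosine_similarity(refund, returns)  # high, close to 1
cosine_similarity(refund, weather)  # low, close to 0
```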
Step 2: Store in a Vector Database
Index for fast retrieval.
import chromadb

client = chromadb.Client()
collection = client.create_collection("docs")

documents = ["Return items within 30 days...", "Full refund available..."]
collection.add(
    documents=documents,
    embeddings=[embed(doc) for doc in documents],  # embed() wraps model.encode from Step 1
    ids=["doc1", "doc2"]
)
Step 3: Query by Similarity
Find nearest neighbors.
results = collection.query(
    query_embeddings=[embed("money back guarantee")],
    n_results=5
)
# Returns documents about refunds, returns, guarantees
Even if those exact words weren't in the query.
The Embedding Space
Embeddings capture relationships.
king - man + woman ≈ queen
paris - france + italy ≈ rome
Similar concepts cluster together. Different concepts are far apart.
This is why semantic search works - it's navigating a space where meaning has geometry.
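The analogy arithmetic above can be sketched with hand-made 3-d vectors where each dimension is an interpretable feature; real models learn hundreds of opaque dimensions, but the arithmetic works the same way:

```python
# Toy vectors; dimensions loosely read as [royalty, maleness, femaleness].
vocab = {
    "king":  [1.0, 1.0, 0.0],
    "queen": [1.0, 0.0, 1.0],
    "man":   [0.0, 1.0, 0.0],
    "woman": [0.0, 0.0, 1.0],
}

def nearest(target, vocab):
    # Return the word whose vector is closest (squared Euclidean) to target
    def dist(v):
        return sum((a - b) ** 2 for a, b in zip(v, target))
    return min(vocab, key=lambda w: dist(vocab[w]))

# king - man + woman, computed componentwise
target = [k - m + w for k, m, w in
          zip(vocab["king"], vocab["man"], vocab["woman"])]
nearest(target, vocab)  # → "queen"
```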
Choosing an Embedding Model
Dimensions
More dimensions = more nuance, more compute, more storage.
- 384 dims: fast, good for most use cases
- 768 dims: better quality, reasonable cost
- 1536+ dims: best quality, highest cost
Match to your quality needs.
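Dimensions also translate directly into storage. A back-of-envelope calculation, assuming float32 (4 bytes per number) and counting raw vectors only, not index overhead:

```python
def index_size_gb(n_vectors, dims, bytes_per_float=4):
    # Raw vector storage only; index structures add overhead on top
    return n_vectors * dims * bytes_per_float / 1e9

index_size_gb(1_000_000, 384)    # ~1.5 GB
index_size_gb(1_000_000, 1536)   # ~6.1 GB
```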
Training Data
Models trained on similar data work better.
- General text → use general models
- Code → use code-trained models
- Medical/legal → use domain-specific models
Popular Options
| Model | Dims | Speed | Quality |
|---|---|---|---|
| all-MiniLM-L6-v2 | 384 | ⚡⚡⚡ | ⭐⭐ |
| all-mpnet-base-v2 | 768 | ⚡⚡ | ⭐⭐⭐ |
| text-embedding-3-small | 1536 | ⚡⚡ | ⭐⭐⭐ |
| text-embedding-3-large | 3072 | ⚡ | ⭐⭐⭐⭐ |
Benchmark on YOUR data. General rankings don't always hold.
Vector Database Options
Purpose-Built
- Pinecone: Managed, scalable, expensive
- Weaviate: Open source, feature-rich
- Qdrant: Open source, Rust-based, fast
- Milvus: Open source, distributed
Add-ons to Existing DBs
- pgvector: PostgreSQL extension
- Elasticsearch: Has vector search now
- MongoDB: Atlas Vector Search
If you're already running Postgres, pgvector is often enough.
Practical Patterns
Hybrid Search
Vector search alone misses exact matches.
def hybrid_search(query, k=10):
    # Pull extra candidates from each backend, then merge and rerank
    vector_results = vector_db.search(embed(query), k*2)
    keyword_results = keyword_db.search(query, k*2)
    combined = merge(vector_results, keyword_results)
    return rerank(combined, query)[:k]
"Invoice #12345" - you want exact match, not semantically similar invoices.
Chunking Strategy
Documents → chunks → embeddings.
- Too small: loses context. "It" has no referent.
- Too large: dilutes relevance. Buries the answer.
# Overlapping chunks preserve context
# Overlapping chunks preserve context
chunks = chunk(document,
    size=500,     # tokens per chunk
    overlap=100   # tokens shared between adjacent chunks
)
500-1000 tokens with 10-20% overlap works for most cases.
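A minimal chunker along these lines, splitting on words rather than tokens (a rough stand-in; real pipelines count model tokens):

```python
def chunk_words(text, size=500, overlap=100):
    # Emit windows of `size` words; each window shares `overlap`
    # words with the previous one, so context at the seams survives.
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # last window already reached the end
    return chunks

doc = " ".join(f"w{i}" for i in range(1200))
parts = chunk_words(doc, size=500, overlap=100)
# 3 chunks: words 0-499, 400-899, 800-1199
```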
Metadata Filtering
Narrow search before similarity.
results = collection.query(
    query_embeddings=[embed(query)],
    where={"department": "engineering", "year": {"$gte": 2024}},
    n_results=10
)
Filter first, then find similar. Faster and more relevant.
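What "filter first" does under the hood can be sketched with a brute-force pass over an in-memory list (real vector DBs use indexed filters, but the order of operations is the same; the field names here are illustrative):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

docs = [
    {"id": "d1", "dept": "engineering", "year": 2024, "vec": [0.9, 0.1]},
    {"id": "d2", "dept": "engineering", "year": 2023, "vec": [0.9, 0.2]},  # fails year filter
    {"id": "d3", "dept": "sales",       "year": 2024, "vec": [0.9, 0.1]},  # fails dept filter
    {"id": "d4", "dept": "engineering", "year": 2025, "vec": [0.1, 0.9]},
]

def filtered_search(query_vec, docs, dept, min_year, k=2):
    # Apply the metadata predicate first, then rank only the survivors
    pool = [d for d in docs if d["dept"] == dept and d["year"] >= min_year]
    pool.sort(key=lambda d: cosine(query_vec, d["vec"]), reverse=True)
    return [d["id"] for d in pool[:k]]

filtered_search([1.0, 0.0], docs, dept="engineering", min_year=2024)  # → ["d1", "d4"]
```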
Common Failures
Cold Start
New documents aren't searchable until embedded.
Fix: Embed on write, not on read. Keep index fresh.
Embedding Drift
Model updates change the embedding space.
Old embeddings + new query embedding = bad results.
Fix: Re-embed everything when changing models.
Semantic Mismatch
User's language ≠ document's language.
"How do I get my money back?" vs. formal policy documents.
Fix: Query expansion. Rephrase queries before searching.
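The simplest form of expansion is a hand-maintained synonym table; production systems often ask an LLM to paraphrase instead. A toy sketch (the phrase table is illustrative):

```python
# Hand-maintained phrase table; in production this is often an LLM step
EXPANSIONS = {
    "money back": ["refund", "reimbursement"],
    "cancel": ["cancellation", "terminate"],
}

def expand_query(query):
    # Return the original query plus variants with known phrases swapped in;
    # search all variants and merge the results.
    variants = [query]
    for phrase, synonyms in EXPANSIONS.items():
        if phrase in query.lower():
            variants.extend(query.lower().replace(phrase, s) for s in synonyms)
    return variants

expand_query("How do I get my money back?")
# → original query plus "refund" and "reimbursement" variants
```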
Beyond Search
Semantic search enables:
- RAG: Find relevant context for LLM prompts
- Recommendations: Find similar items
- Deduplication: Find near-duplicate content
- Clustering: Group similar documents
- Anomaly detection: Find outliers
The foundation is the same. Applications vary.
Performance at Scale
Approximate Nearest Neighbor (ANN)
Exact search is O(n). ANN is O(log n).
# HNSW index - fast approximate search
collection.create_index(
    index_type="HNSW",
    metric="cosine",
    m=16,                 # graph connectivity
    ef_construction=200   # build-time quality
)
Slightly less accurate. Much faster.
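For contrast, here is the exact O(n) baseline that ANN indexes approximate: score every stored vector against the query and keep the top k. Fine at thousands of vectors; this is what becomes the bottleneck at millions.

```python
import heapq
import math

def exact_knn(query, vectors, k=5):
    # Brute force: compute cosine similarity against every stored
    # vector (O(n)) and keep the k best. HNSW approximates this.
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb)
    return heapq.nlargest(k, vectors, key=lambda item: cos(query, item[1]))

store = [("d1", [1.0, 0.0]), ("d2", [0.7, 0.7]), ("d3", [0.0, 1.0])]
exact_knn([1.0, 0.1], store, k=2)  # d1 then d2; d3 points the other way
```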
Sharding
Millions of vectors → split across machines.
Shard 1: docs from A-M
Shard 2: docs from N-Z
# Query both, merge results
Most vector DBs handle this automatically.
Further Reading
Technical Resources
- Sentence Transformers - Embedding models
- Understanding HNSW
- Vector Database Comparison (2024)
Related Posts
- RAG Reality Check - Search in practice
- Context Engineering - Using search results
LLMs get the attention. Semantic search does the work. Master the foundation and everything built on top makes more sense.