Lightweight RAG: cheap wins with embeddings

June 2025

You can start using RAG without setting up a big fancy vector database. I made a working setup with just embeddings stored in a file and a simple cosine similarity search.

The basic setup

1. Storing the embeddings

I did not use a database here. A plain JSON file worked fine.

import fs from 'node:fs'

interface Document {
  id: string
  content: string
  embedding: number[]
  metadata: {
    title: string
    date: string
  }
}

// Load the whole corpus into memory. Passing an encoding makes
// readFileSync return a string instead of a Buffer.
const documents: Document[] = JSON.parse(fs.readFileSync('embeddings.json', 'utf-8'))
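
The post doesn't show how embeddings.json gets built in the first place, so here's a rough sketch of the one-off indexing script I'd expect. It assumes the OpenAI Node SDK; the model name, the rawDocs shape, and buildIndex are illustrative, not from the original.

import OpenAI from 'openai'

const openai = new OpenAI()  // assumes OPENAI_API_KEY is set in the environment

// Hypothetical raw input: whatever source material you want to index
const rawDocs = [
  { id: '1', title: 'Key rotation', date: '2025-05-01', content: 'How to rotate API keys...' },
]

async function buildIndex() {
  const indexed: Document[] = []
  for (const doc of rawDocs) {
    const res = await openai.embeddings.create({
      model: 'text-embedding-3-small',  // assumption: use whichever embedding model you prefer
      input: doc.content
    })
    indexed.push({
      id: doc.id,
      content: doc.content,
      embedding: res.data[0].embedding,
      metadata: { title: doc.title, date: doc.date }
    })
  }
  fs.writeFileSync('embeddings.json', JSON.stringify(indexed))
}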

2. Finding the most similar documents

function cosineSimilarity(a: number[], b: number[]): number {
  // dot product of the two vectors, divided by the product of their magnitudes

  const dot = a.reduce((sum, val, i) => sum + val * b[i], 0)

  const magA = Math.sqrt(a.reduce((sum, val) => sum + val * val, 0))
  const magB = Math.sqrt(b.reduce((sum, val) => sum + val * val, 0))

  return dot / (magA * magB)
}
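
A couple of hand-checked values make the range easier to reason about (these are toy vectors, not real embeddings):

// Identical direction -> 1, orthogonal -> 0
cosineSimilarity([1, 0], [1, 0])  // 1
cosineSimilarity([1, 0], [0, 1])  // 0
cosineSimilarity([1, 2], [2, 4])  // 1: same direction, different magnitude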
async function search(query: string, docs: Document[], topK: number = 3) {
  // Embed the query with the same model used for the documents
  const queryEmbedding = await getEmbedding(query)

  // Score every document, then keep the topK most similar
  return docs
    .map(doc => ({
      ...doc,
      similarity: cosineSimilarity(queryEmbedding, doc.embedding)
    }))
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, topK)
}
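
The code above calls getEmbedding but never defines it, so here's a minimal sketch assuming the OpenAI Node SDK and the text-embedding-3-small model; any provider works as long as documents and queries go through the same model.

import OpenAI from 'openai'

const openai = new OpenAI()  // reads OPENAI_API_KEY from the environment

async function getEmbedding(text: string): Promise<number[]> {
  const res = await openai.embeddings.create({
    model: 'text-embedding-3-small',  // assumption: match whatever model built embeddings.json
    input: text
  })
  return res.data[0].embedding
}

Typical usage then looks like this (the heading format for the prompt context is just illustrative):

async function buildContext(question: string) {
  const hits = await search(question, documents)
  // This string gets pasted into the chat model's prompt as grounding context
  return hits.map(doc => `## ${doc.metadata.title}\n${doc.content}`).join('\n\n')
}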

How it performed for me

The setup took about two hours, versus the couple of days it can take to stand up a real vector DB.
Average query time was around 50 milliseconds.
Accuracy on my small test set was about 85 percent, which is fair.
Storage cost was next to nothing: about five cents a month, compared to fifty dollars or more for a managed vector DB.

When you might need to upgrade

Switch to a proper vector database if you have more than ten thousand documents, need instant updates, want advanced filtering, or need rock-solid uptime in production.

Things I learned

It is worth starting with something simple and making it better over time.
Good data quality helps more than fancy algorithms.
Keep your embeddings consistent: vectors from different models (or model versions) aren't comparable, so re-embed everything if you switch.
Cache the embeddings of queries you run often to save time and API cost (a minimal sketch of this follows).
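
A minimal sketch of that caching idea, assuming the getEmbedding helper from earlier; an in-memory Map is enough at this scale, and you could persist it to disk the same way as embeddings.json.

// In-memory cache keyed by query text; avoids re-embedding repeat queries
const embeddingCache = new Map<string, number[]>()

async function getEmbeddingCached(text: string): Promise<number[]> {
  const cached = embeddingCache.get(text)
  if (cached) return cached

  const embedding = await getEmbedding(text)  // assumed helper shown earlier
  embeddingCache.set(text, embedding)
  return embedding
}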