You can start using RAG without setting up a big fancy vector database. I made a working setup with just embeddings stored in a file and a simple cosine similarity search.
There is no database here at all; a plain JSON file of precomputed embeddings worked fine.
import fs from 'fs'

interface Document {
  id: string
  content: string
  embedding: number[]
  metadata: {
    title: string
    date: string
  }
}

// Load the precomputed embeddings from disk once at startup
const documents: Document[] = JSON.parse(fs.readFileSync('embeddings.json', 'utf-8'))
function cosineSimilarity(a: number[], b: number[]): number {
  // dot product of the two vectors, normalized by the product of their magnitudes
  const dot = a.reduce((sum, val, i) => sum + val * b[i], 0)
  const magA = Math.sqrt(a.reduce((sum, val) => sum + val * val, 0))
  const magB = Math.sqrt(b.reduce((sum, val) => sum + val * val, 0))
  return dot / (magA * magB)
}
async function search(query: string, docs: Document[], topK: number = 3) {
  // this will turn the query into an embedding
  const queryEmbedding = await getEmbedding(query)
  return docs
    .map(doc => ({
      ...doc,
      similarity: cosineSimilarity(queryEmbedding, doc.embedding)
    }))
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, topK)
}
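The search function leans on a getEmbedding helper that isn't shown above. Here is a minimal sketch, assuming the official openai package with OPENAI_API_KEY set in the environment; the model name is my own assumption, so use whichever model produced the vectors in embeddings.json.

import OpenAI from 'openai'

const openai = new OpenAI() // picks up OPENAI_API_KEY from the environment

// Turn a piece of text into an embedding vector.
// Assumption: the model must match the one used to build embeddings.json.
async function getEmbedding(text: string): Promise<number[]> {
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: text
  })
  return response.data[0].embedding
}

The same helper is all a one-off indexing script needs in order to produce embeddings.json in the first place. A rough sketch, with a made-up rawDocs input shape:

async function buildIndex(rawDocs: { id: string; title: string; date: string; content: string }[]) {
  const docs: Document[] = []
  for (const raw of rawDocs) {
    docs.push({
      id: raw.id,
      content: raw.content,
      embedding: await getEmbedding(raw.content),
      metadata: { title: raw.title, date: raw.date }
    })
  }
  fs.writeFileSync('embeddings.json', JSON.stringify(docs))
}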
The setup took about two hours, far less than the couple of days it can take to stand up a real vector DB.
Average query time was around 50 milliseconds.
Accuracy on my small test set was about 85 percent, which is fair.
Storage cost was next to nothing: about five cents a month, compared to fifty dollars or more for a managed vector DB.
Switch to a proper vector database if you have more than ten thousand documents, need instant updates, want advanced filtering, or need rock-solid uptime in production.
It is worth starting with something simple and making it better over time.
Good data quality helps more than fancy algorithms.
Keep your embeddings consistent: use the same embedding model (and the same dimensions) for stored documents and incoming queries.
Cache the embeddings you look up often to save time and cost.
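Here is a minimal sketch of that caching idea, using a plain in-memory Map in front of the getEmbedding helper; the cache itself is my own illustration, not part of the original setup.

// In-memory cache so repeated queries don't hit the embedding API twice
const embeddingCache = new Map<string, number[]>()

async function getEmbeddingCached(text: string): Promise<number[]> {
  const cached = embeddingCache.get(text)
  if (cached) return cached
  const embedding = await getEmbedding(text)
  embeddingCache.set(text, embedding)
  return embedding
}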