A large language model is, by default, an amnesiac. Every API call starts with a completely blank slate. To build systems that exhibit continuous learning, self-correction, and genuine autonomy, we must engineer external memory architectures.
In this lesson, we graduate from simple "chat history arrays" to Tiered Memory Systems utilizing vector databases and semantic retrieval.
Just like the human brain, an enterprise-grade agentic system separates memory by latency and relevance.
The first tier is the immediate working memory. It is passed directly into the LLM prompt as a list of {"role": "user/assistant", "content": "..."} dicts.
The second tier is a chronological ledger of everything the agent has ever done, stored in a traditional relational database (SQLite/PostgreSQL).
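As a rough illustration of these two tiers, the sketch below pairs a rolling message window with an append-only SQLite log. The class names, the `events` table, and its columns are assumptions made for this example, not part of any particular framework.

```python
import json
import sqlite3
from collections import deque


class ShortTermMemory:
    """Rolling window of chat messages that is sent with every prompt."""

    def __init__(self, max_messages: int = 20):
        # deque silently drops the oldest message once the window is full
        self.messages = deque(maxlen=max_messages)

    def add(self, role: str, content: str) -> None:
        self.messages.append({"role": role, "content": content})

    def as_prompt(self) -> list:
        return list(self.messages)


class EpisodicLedger:
    """Append-only chronological log of agent actions, kept in SQLite."""

    def __init__(self, db_path: str = ":memory:"):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS events ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "created_at TEXT DEFAULT CURRENT_TIMESTAMP, "
            "kind TEXT NOT NULL, "
            "payload TEXT NOT NULL)"
        )

    def record(self, kind: str, payload: dict) -> None:
        with self.conn:  # commits the insert on success
            self.conn.execute(
                "INSERT INTO events (kind, payload) VALUES (?, ?)",
                (kind, json.dumps(payload)),
            )

    def recent(self, limit: int = 10) -> list:
        rows = self.conn.execute(
            "SELECT kind, payload FROM events ORDER BY id DESC LIMIT ?", (limit,)
        ).fetchall()
        return [(kind, json.loads(payload)) for kind, payload in rows]
```

The window keeps the prompt small and fast, while the ledger keeps everything but is only queried on demand.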
The third tier, Long-Term Memory, is where true "intelligence" lives. It allows the agent to recall specific facts, SOPs, or past experiences based on meaning rather than exact keywords.
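To make "meaning rather than exact keywords" concrete, here is a toy sketch of similarity search. The three-dimensional vectors are hand-written stand-ins for real embeddings (which have hundreds of dimensions), and the stored sentences are invented for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings: nearby vectors represent similar meanings.
memory = {
    "Refund requests are approved by the finance team": [0.9, 0.1, 0.0],
    "The office coffee machine is on floor 2": [0.0, 0.2, 0.9],
}

# Pretend embedding of "Who signs off on reimbursements?"
# The query shares no keywords with the stored text, only direction.
query_vector = [0.8, 0.2, 0.1]

best_match = max(memory, key=lambda text: cosine_similarity(memory[text], query_vector))
print(best_match)
```

A vector database performs the same comparison, but over many stored vectors and with an index that avoids scanning every entry.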
How does an agent actually use its Long-Term Memory? Through an automated retrieval-augmented generation (RAG) injection loop that runs before every major decision.
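One way such an injection loop might look is sketched below: retrieve memories for the task, then prepend them to the prompt. The function `build_augmented_prompt`, the prompt layout, and the canned retriever are all hypothetical; in practice `retrieve` would wrap a vector store query.

```python
from typing import Callable, List


def build_augmented_prompt(
    task: str,
    retrieve: Callable[[str, int], List[str]],
    top_k: int = 2,
) -> str:
    """Retrieve relevant memories for the task and prepend them to the prompt."""
    memories = retrieve(task, top_k)
    if not memories:
        # Nothing relevant stored yet: fall back to the bare task
        return f"Task: {task}"
    context = "\n".join(f"- {m}" for m in memories)
    return (
        "Relevant past insights:\n"
        f"{context}\n\n"
        f"Task: {task}"
    )


def fake_retrieve(query: str, top_k: int) -> List[str]:
    # Stand-in for a vector store query; returns canned, made-up results
    canned = [
        "Short subject lines got more replies",
        "Mention the CRM integration early",
    ]
    return canned[:top_k]


prompt = build_augmented_prompt("Draft a pitch email", fake_retrieve)
print(prompt)
```

Keeping the retriever as a plain callable means the same loop works whether memories come from a vector database, a test stub, or nothing at all.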
An example retrieval step: [Vectorize: "TechCorp CEO CRM Pitch successful examples"].
A system that only reads memory is static. A true autonomous agent must also write to its own memory. We do this through an asynchronous Consolidation Job.
At the end of an objective, the agent summarizes its experience and commits it to the Vector Store for future use.
import hashlib

import chromadb
from sentence_transformers import SentenceTransformer


class MemoryEngine:
    def __init__(self):
        # Initialize local vector database (persisted to disk)
        self.chroma_client = chromadb.PersistentClient(path="./agent_memory")
        self.collection = self.chroma_client.get_or_create_collection(name="strategic_insights")
        self.embedder = SentenceTransformer('all-MiniLM-L6-v2')  # Lightweight local embedder

    def commit_insight(self, task_description: str, successful_outcome: str):
        """The agent reflects on what worked and saves it."""
        insight = f"Task: {task_description} | Winning Strategy: {successful_outcome}"
        # Convert text to an embedding vector
        vector = self.embedder.encode(insight).tolist()
        # Derive a stable ID from the text. The built-in hash() is salted per process,
        # so it would give the same insight a different ID on every run.
        doc_id = f"insight_{hashlib.sha256(insight.encode('utf-8')).hexdigest()[:16]}"
        # Save to Long-Term Memory; upsert overwrites an identical insight instead of duplicating it
        self.collection.upsert(
            embeddings=[vector],
            documents=[insight],
            ids=[doc_id]
        )
        print(f"Memory Consolidated: {doc_id}")

    def recall_strategy(self, current_problem: str, top_k: int = 2):
        """The agent searches its brain before acting."""
        query_vector = self.embedder.encode(current_problem).tolist()
        results = self.collection.query(
            query_embeddings=[query_vector],
            n_results=top_k
        )
        return results['documents'][0]  # Matches for the single query, closest first
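The class above commits insights synchronously. The asynchronous Consolidation Job mentioned earlier could be approximated with a background worker that drains a queue, as in this minimal sketch. The names `consolidation_worker` and `run_demo` are assumptions for illustration, and the sample task text is invented; in a real agent, `commit` would be something like `MemoryEngine().commit_insight`.

```python
import asyncio
from typing import Callable


async def consolidation_worker(
    queue: asyncio.Queue,
    commit: Callable[[str, str], None],
) -> None:
    """Drain (task, outcome) pairs and commit each one in the background."""
    while True:
        task_description, outcome = await queue.get()
        try:
            # Embedding and writing are blocking calls, so run them in a
            # worker thread to keep the event loop responsive (Python 3.9+)
            await asyncio.to_thread(commit, task_description, outcome)
        finally:
            queue.task_done()


async def run_demo(commit: Callable[[str, str], None]) -> None:
    queue: asyncio.Queue = asyncio.Queue()
    worker = asyncio.create_task(consolidation_worker(queue, commit))
    # The agent enqueues its reflection and carries on immediately
    await queue.put(("Book a demo call", "Offering two time slots worked"))
    await queue.join()  # wait until every queued insight has been committed
    worker.cancel()     # stop the background worker


# Stand-in commit function that just collects what it is given
committed = []
asyncio.run(run_demo(lambda task, outcome: committed.append((task, outcome))))
print(committed)
```

The queue decouples the agent's main loop from the cost of embedding and writing, so consolidation never delays the next action.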
The highest tier of agentic memory is Error Recognition. When an agent fails (e.g., an API returns an HTTP 400 error, or the user rejects its draft), it must explicitly write a "Negative Insight" to its database: "When contacting API X, using parameter Y causes a failure. Next time, use parameter Z."
By embedding negative constraints, the system becomes anti-fragile: every recorded failure becomes a retrievable warning, so the agent is less likely to repeat the same mistake.
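A Negative Insight can be stored like any other memory as long as its text is structured enough to retrieve later. The sketch below shows one possible layout; the `AVOID` prefix, the field names, and both helper functions are assumptions for this example, and `store` stands in for whatever writes to the vector database.

```python
from typing import Callable


def format_negative_insight(context: str, failed_action: str, error: str, fix: str = "") -> str:
    """Render a failure as a single retrievable lesson string."""
    lesson = f"AVOID | Context: {context} | Failed action: {failed_action} | Error: {error}"
    if fix:
        lesson += f" | Next time: {fix}"
    return lesson


def record_failure(
    store: Callable[[str], None],
    context: str,
    failed_action: str,
    error: str,
    fix: str = "",
) -> str:
    """Format the failure and hand it to the storage callable."""
    insight = format_negative_insight(context, failed_action, error, fix)
    store(insight)
    return insight


# Stand-in store: a plain list instead of a vector database
saved = []
record_failure(
    saved.append,
    context="Contacting API X",
    failed_action="using parameter Y",
    error="HTTP 400",
    fix="use parameter Z",
)
print(saved[0])
```

Because the context is part of the embedded text, a later query about the same API should surface the warning alongside any positive insights.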
Set up a local instance of ChromaDB in Python. Manually insert 5 "facts" about a fictional company into the vector database. Write a script that takes user input, converts it to an embedding, queries the database, and injects the retrieved fact into a Gemini 2.5 Flash prompt to answer the user's question accurately.