In local AI operations, raw clock speed is secondary to Tokens Per Second (TPS). In this lesson, we learn how to benchmark your local hardware to determine its capacity for industrial-scale automation.
To measure performance in Ollama (LM Studio displays tokens per second directly in its chat UI):
# In an interactive `ollama run` session, enable verbose timing stats
/set verbose
"Write a 500 word technical brief on RAG."
# Check the statistics block printed after the response:
# eval count:    512 token(s)
# eval duration: 10.2s
# TPS = eval count / eval duration = 512 / 10.2 ≈ 50 tokens/s
# (Ollama also reports this directly as "eval rate")
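The same measurement can be scripted. The sketch below, assuming a local Ollama server on its default port, calls Ollama's REST API in non-streaming mode and computes TPS from the `eval_count` and `eval_duration` fields in the response (Ollama reports durations in nanoseconds); the model name and prompt are placeholders.

```python
import json
import urllib.request

def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """TPS = tokens generated / generation time in seconds."""
    return eval_count / (eval_duration_ns / 1e9)

def benchmark_ollama(model: str, prompt: str,
                     host: str = "http://localhost:11434") -> float:
    """Run one non-streaming generation and compute TPS from Ollama's stats."""
    body = json.dumps({"model": model, "prompt": prompt,
                       "stream": False}).encode()
    req = urllib.request.Request(f"{host}/api/generate", data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        stats = json.load(resp)
    return tokens_per_second(stats["eval_count"], stats["eval_duration"])

if __name__ == "__main__":
    # Matches the worked example above: 512 tokens in 10.2 s
    print(round(tokens_per_second(512, 10_200_000_000), 1))  # 50.2
```

Run a few prompts of different lengths and average the results; a single short generation underestimates sustained throughput.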
Unlike a typical gaming workload, LLM inference pins the GPU (or CPU) at near-full utilization for minutes at a time. If your "laptop server" throttles at around 90°C, sustained TPS can drop sharply, sometimes by half. A professional setup requires active cooling (a ventilated laptop stand or a properly cooled server rack).
Exercise: Run the benchmark above with Llama-3-8B at Q4 quantization and record the TPS. Identify a task that requires processing 100,000 words. Based on your measured TPS, calculate how many hours it would take to finish, then propose one hardware upgrade that would cut that time in half.
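The capacity arithmetic for the exercise can be sketched as follows. The tokens-per-word ratio of ~1.33 is a rough heuristic for English text, not a measured value, so treat the result as an order-of-magnitude estimate.

```python
def hours_for_job(total_words: float, tps: float,
                  tokens_per_word: float = 1.33) -> float:
    """Estimate wall-clock hours to process a word count at a given TPS.
    tokens_per_word ~1.33 is a rough English-text heuristic (assumption)."""
    total_tokens = total_words * tokens_per_word
    return total_tokens / tps / 3600

# 100,000 words at the ~50 TPS from the worked example:
print(round(hours_for_job(100_000, 50), 2))   # 0.74
# Doubling TPS (e.g. a GPU with more memory bandwidth) halves the time:
print(round(hours_for_job(100_000, 100), 2))  # 0.37
```

Note that TPS scales roughly with memory bandwidth for these memory-bound workloads, which is why a faster GPU, not a faster CPU clock, is usually the upgrade that halves the time.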