Local LLM Trading Engine: Ollama + RAG on M3 24GB
Multi-model decision engine 100% on-device: qwen2.5:14b primary + llama3.1:8b fallback + LLaVA:7b vision for chart screenshots + ChromaDB RAG. Zero API cost, zero data leakage, full control over prompts and decision system.
Cloud LLM for trading — three reasons not to use it
(1) Latency. With hundreds of signals per day, every request to OpenAI/Claude adds seconds, and you miss entry points. (2) Cost. At $0.01-0.10 per signal, 1,000 signals cost $10-100/day on inference alone. (3) Strategy leakage. Trading edges encoded in prompts are IP; sending them to OpenAI means sharing your competitive advantage.
I needed a fully local engine that receives ICT/Smart Money Concepts signals (BHM-3BP, iFVG, IMPULSIVE, REVERSAL), analyzes confluence (HTF bias + displacement + liquidity), and produces BUY/SELL/NO_TRADE decisions with a confidence metric.
Three models + RAG + bridge
qwen2.5:14b: primary model for the trade decision. Reasons over structured signal input plus historical context from RAG.
llama3.1:8b: backup when qwen times out, also used for fast auxiliary queries (market classification, sanity checks).
LLaVA:7b: vision model for chart-screenshot analysis: pattern recognition and visual validation of order blocks, FVGs, and liquidity sweeps.
ChromaDB RAG: stores historical trades (win/loss + context) and pulls the 5-10 nearest analogs for the current signal, giving the LLM context for the decision.
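The primary/backup arrangement can be sketched as a thin wrapper around Ollama's local HTTP API. This is a minimal sketch, not the engine's actual bridge: the function names, the timeout values, and the choice to fall back on any connection error are assumptions made for illustration. Only the model tags and the fall-back-on-timeout idea come from the text above.

```python
import json
import urllib.request

# Ollama's default local endpoint for single-shot generation.
OLLAMA_URL = "http://localhost:11434/api/generate"


def ollama_generate(model: str, prompt: str, timeout_s: float) -> str:
    """Send one non-streaming generation request to a local Ollama server."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return json.loads(resp.read())["response"]


def generate_with_fallback(prompt, call=ollama_generate,
                           primary="qwen2.5:14b", fallback="llama3.1:8b",
                           primary_timeout_s=30.0, fallback_timeout_s=15.0):
    """Try the primary model; on timeout or connection failure use the backup.

    Returns (model_used, text). Timeout values are illustrative assumptions.
    `call` is injectable so the routing logic can be tested without a server.
    """
    try:
        return primary, call(primary, prompt, primary_timeout_s)
    except OSError:
        # Socket timeouts and urllib's URLError are both OSError subclasses.
        return fallback, call(fallback, prompt, fallback_timeout_s)
```

A call such as `generate_with_fallback("Classify this signal: ...")` returns the tag of the model that actually answered, which is worth logging next to the decision so that fallback-produced decisions can be audited separately.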
Decision engine with confidence score
For each incoming signal (BHM-3BP / iFVG / SM-OB-CONT / SM-ASIA-SWEEP), the decision engine assembles:
- Tier classification of the signal (A/B/C/D) → base confidence 0.45-0.65
- Confluence bonuses: HTF bias (+0.05) + displacement (+0.03) + liquidity sweep (+0.02)
- Historical modifier via RAG: 5-10 similar trades are pulled, win rate is computed
- Regulatory gating: time-of-day, session (Asia/London/NY), seasonality
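The numeric parts of this list can be sketched as a deterministic pre-score that is handed to the LLM as context. This is a sketch under stated assumptions: the per-tier values are hypothetical picks inside the 0.45-0.65 range, the `outcome` metadata field is an invented name, and the linear mapping from win rate to a modifier is an assumption, since the text does not specify how the historical modifier is computed. Only the bonus values are taken from the list.

```python
from dataclasses import dataclass

# Hypothetical per-tier values; the text only gives the 0.45-0.65 range.
TIER_BASE = {"A": 0.65, "B": 0.58, "C": 0.52, "D": 0.45}

# Confluence bonuses as listed in the text.
CONFLUENCE_BONUS = {"htf_bias": 0.05, "displacement": 0.03, "liquidity_sweep": 0.02}


@dataclass
class Signal:
    kind: str              # e.g. "BHM-3BP", "iFVG"
    tier: str              # "A".."D"
    confluence: frozenset  # subset of CONFLUENCE_BONUS keys


def win_rate(analogs):
    """Share of winning trades among retrieved analogs, or None if there are none.

    `analogs` is a list of metadata dicts, e.g. the first element of the
    "metadatas" list returned by a ChromaDB collection query. The "outcome"
    field name is an assumption.
    """
    if not analogs:
        return None
    wins = sum(1 for a in analogs if a["outcome"] == "win")
    return wins / len(analogs)


def historical_modifier(rate, max_shift=0.10):
    """Assumed linear mapping: 50% win rate -> 0, 100% -> +max_shift, 0% -> -max_shift."""
    if rate is None:
        return 0.0
    return (rate - 0.5) * 2 * max_shift


def pre_score(signal, analogs):
    """Tier base + confluence bonuses + historical modifier, clamped to [0, 1]."""
    score = TIER_BASE[signal.tier]
    score += sum(CONFLUENCE_BONUS[c] for c in signal.confluence)
    score += historical_modifier(win_rate(analogs))
    return round(min(max(score, 0.0), 1.0), 4)
```

With these assumed numbers, a tier-A signal with all three confluence factors and four wins out of five analogs scores 0.65 + 0.10 + 0.06 = 0.81, while a tier-D signal with no confluence and no history stays at 0.45.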
The LLM produces a final confidence and decision (BUY/SELL/NO_TRADE). The threshold is 0.70; anything below it is rejected as "borderline". Code-level guardrails then check lot size, distance to TP/SL, and correlation with open positions.
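The threshold and guardrail step can be sketched as plain code that runs after the LLM and can only veto a trade, never create one. The 0.70 threshold is from the text. Everything else here is a hypothetical illustration: the `Order` fields, the limit values (`max_lots`, `min_sl_distance`, `min_rr`), and the representation of correlation as a set of symbol pairs.

```python
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.70  # from the text


@dataclass
class Order:
    symbol: str
    side: str          # "BUY" or "SELL"
    lots: float
    entry: float
    stop_loss: float
    take_profit: float


def guardrail_violations(order, open_symbols, correlated_pairs,
                         max_lots=1.0, min_sl_distance=0.0010, min_rr=1.5):
    """Return the names of violated checks; an empty list means the order may proceed.

    All limits are illustrative assumptions. `correlated_pairs` is a set of
    frozensets, each holding two symbols treated as correlated.
    """
    problems = []
    if not 0 < order.lots <= max_lots:
        problems.append("lot_size")
    risk = abs(order.entry - order.stop_loss)
    reward = abs(order.take_profit - order.entry)
    if risk == 0 or risk < min_sl_distance:
        problems.append("sl_too_close")
    elif reward / risk < min_rr:
        problems.append("tp_sl_ratio")
    if any(frozenset((order.symbol, s)) in correlated_pairs for s in open_symbols):
        problems.append("correlation")
    return problems


def final_decision(llm_decision, llm_confidence, order, open_symbols, correlated_pairs):
    """Apply the confidence threshold first, then the hard guardrails."""
    if llm_decision not in ("BUY", "SELL") or llm_confidence < CONFIDENCE_THRESHOLD:
        return "NO_TRADE"
    if guardrail_violations(order, open_symbols, correlated_pairs):
        return "NO_TRADE"
    return llm_decision
```

Keeping these checks outside the model means a malformed or overconfident LLM answer still cannot place an oversized or badly protected order.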
Phase-3A dry run · 100 signals
within the 60-75% corridor → strict noise filter
all safety checks passed, no lot-size / SL / correlation violations
vs OpenAI/Claude API ~$100/day at comparable volume
Where it fits for clients: a local-LLM stack is ideal wherever data must not leave the company perimeter: healthcare (PII under FZ-152), banking, defense-affiliated work, corporate security, legal-tech. It deploys on an M2/M3 Mac mini (~₽1000/mo electricity); Apple Silicon with unified memory handles 14B models at comfortable latency.
"The same approach I use myself — applied to you." Custom RAG over your documents + multi-model fallback + local inference. No external APIs, no IP leakage, full control.
Audit for ₽5,000, with a concrete report and cost estimate
I'll tell you what to implement in your business first, what the payback will be, and whether you need AI for your task at all (sometimes you don't).
Or just send me your question and I'll reply within 2 hours