AI integration for enterprise.
Multi-agent systems, RAG, intelligent automation, and model selection — built to pass compliance, designed to survive scale. From discovery sprint to production rollout.
From €5k discovery sprint · project-based · for funded scale-ups and enterprise teams
What's included
RAG & retrieval
Vector store choice (pgvector, Qdrant, Pinecone), chunking strategy, citation-enforced generation, evaluation harness.
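A minimal sketch of the citation-enforced pattern, with the retriever left as a placeholder for whichever vector store fits; the `search` function and chunk ID convention here are illustrative:

```python
# A minimal sketch of citation-enforced generation: the model may only answer
# from retrieved chunks and must cite their IDs, or it refuses. `search` is a
# stand-in for whichever vector store you pick (pgvector, Qdrant, Pinecone).
import re

def search(query: str, k: int = 4) -> list[dict]:
    """Placeholder retriever; returns chunks as {'id': ..., 'text': ...}."""
    raise NotImplementedError

def build_prompt(question: str, chunks: list[dict]) -> str:
    context = "\n".join(f"[{c['id']}] {c['text']}" for c in chunks)
    return (
        "Answer using ONLY the sources below. Cite every claim as [id]. "
        "If the sources do not answer the question, reply exactly: INSUFFICIENT_CONTEXT.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )

def enforce_citations(answer: str, chunks: list[dict]) -> str:
    if answer.strip() == "INSUFFICIENT_CONTEXT":
        return answer
    cited = set(re.findall(r"\[([^\]]+)\]", answer))
    known = {str(c["id"]) for c in chunks}
    if not cited or not cited <= known:
        # No citations, or citations outside the retrieved set: reject the
        # answer instead of passing ungrounded text downstream.
        raise ValueError(f"Ungrounded answer: cited {cited}, retrieved {known}")
    return answer
```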
Multi-agent orchestration
Plan/act/reflect loops with tool calls, role specialization, supervisor patterns, observable traces, fallback paths.
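As an illustration of the supervisor pattern, a stripped-down plan/act/reflect loop with a bounded step budget and an explicit fallback path; the planner and tools are stubs, and the observable trace goes through standard logging:

```python
# Skeleton of the supervisor pattern: plan, act (tool call), reflect, with a
# bounded step budget and an explicit fallback path. Planner and tools are
# stubs; in practice the planner is a model call.
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("supervisor")

TOOLS = {
    "lookup_account": lambda args: {"status": "active"},  # illustrative tools
    "draft_reply": lambda args: {"text": "Hello ..."},
}

def plan(goal: str, history: list) -> dict:
    """Stand-in planner; returns the next step as {'tool', 'args', 'done'}."""
    return {"tool": "lookup_account", "args": {"goal": goal}, "done": bool(history)}

def run(goal: str, max_steps: int = 5):
    history = []
    for step in range(max_steps):
        decision = plan(goal, history)
        log.info("step=%d decision=%s", step, decision)  # trace every step
        if decision.get("done"):
            return history
        tool = TOOLS.get(decision["tool"])
        if tool is None:  # fallback: unknown tool means escalate, not guess
            log.warning("unknown tool %r, escalating", decision["tool"])
            return {"escalate": True, "history": history}
        result = tool(decision["args"])                            # act
        history.append({"decision": decision, "result": result})  # feeds reflection
    return {"escalate": True, "reason": "step budget exhausted", "history": history}
```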
Intelligent automation
Classification, extraction, routing, summarization at scale. Production-grade error handling.
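What production-grade error handling means here, sketched around a hypothetical `classify` call: retries with exponential backoff, a closed label set, and a dead-letter path for anything that still fails:

```python
# Error handling around a classification call: retries with exponential
# backoff, a closed label set, and a dead-letter path for what still fails.
# `classify` is a hypothetical model call.
import time

LABELS = {"billing", "technical", "sales", "other"}
DEAD_LETTER: list[dict] = []  # in production: a queue or table, not a list

def classify(text: str) -> str:
    """Stand-in for the model call; returns a raw label string."""
    raise NotImplementedError

def route(ticket: dict, retries: int = 3) -> str:
    last_error = None
    for attempt in range(retries):
        try:
            label = classify(ticket["text"]).strip().lower()
            if label in LABELS:
                return label
            raise ValueError(f"unexpected label {label!r}")
        except Exception as exc:
            last_error = exc
            time.sleep(2 ** attempt)  # exponential backoff before the retry
    DEAD_LETTER.append({"ticket": ticket, "error": str(last_error)})
    return "other"  # safe default route; the dead letter gets human review
```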
Content moderation
Safety classifiers, content policy enforcement, audit-grade decision logs.
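What audit-grade can look like, as a sketch: every verdict lands in an append-only JSONL record with a hashed input, policy and model versions, and a UTC timestamp. The classifier and version tags are placeholders:

```python
# Audit-grade decision logging: every verdict is appended as a JSONL record
# with a hashed input, policy and model versions, and a UTC timestamp.
import datetime
import hashlib
import json

POLICY_VERSION = "2025-06"      # illustrative version tags
MODEL_VERSION = "moderation-v3"

def moderate(text: str) -> dict:
    """Stand-in safety classifier; returns {'allowed': bool, 'categories': [...]}."""
    raise NotImplementedError

def moderate_and_log(text: str, log_path: str = "moderation.jsonl") -> dict:
    verdict = moderate(text)
    record = {
        "ts": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "input_sha256": hashlib.sha256(text.encode()).hexdigest(),  # no raw text in logs
        "policy_version": POLICY_VERSION,
        "model_version": MODEL_VERSION,
        "verdict": verdict,
    }
    with open(log_path, "a") as f:  # append-only by convention
        f.write(json.dumps(record) + "\n")
    return verdict
```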
Model selection & fine-tuning
OpenAI, Claude, Llama, Mistral — cost/quality tradeoff analysis. Fine-tuning when warranted.
AI ops
Token budget tracking, prompt versioning, A/B evaluation, regression detection.
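One way regression detection slots in, sketched with a stubbed evaluation harness: score the current and candidate prompt versions on the same eval set and block the rollout if the pass rate drops past a threshold:

```python
# Regression detection as a release gate: score current and candidate prompt
# versions on the same eval set, block the rollout on a drop.
# `run_eval` is a stand-in for the evaluation harness.
def run_eval(prompt_version: str, cases: list[dict]) -> float:
    """Stand-in: fraction of eval cases this prompt version passes."""
    raise NotImplementedError

def regression_gate(current: str, candidate: str, cases: list[dict],
                    max_drop: float = 0.02) -> bool:
    base = run_eval(current, cases)
    new = run_eval(candidate, cases)
    if new < base - max_drop:
        raise RuntimeError(
            f"Regression: {candidate} scores {new:.1%} vs {base:.1%} on {current}"
        )
    return True
```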
Example engagements
Dealko
First Slovenian AI telecom assistant — multi-agent, embeddable widget, GDPR-compliant lead flow.
CrewPress
7-agent CrewAI system for WordPress automation — content, SEO, dev, maintenance, analytics agents.
FAQ
How do you choose between models — OpenAI, Claude, open-source?
Per use case. Tool-use + structured output favours Claude Sonnet/Haiku 4.5. Bulk classification or RAG often runs cheaper on smaller models. Open-source (Llama, Mistral) for on-prem or data-residency mandates. I run the cost/quality experiments before recommending a stack.
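The shape of those experiments, roughly: one labeled eval set, several candidate models, accuracy and token spend recorded per model. The `ask` function is a stand-in for the model call, and the price table is deliberately left empty rather than guessed:

```python
# Cost/quality experiment: one labeled eval set, several candidate models,
# accuracy and token spend recorded per model. `ask` is a stand-in model
# call; fill in real per-model rates before trusting the cost column.
def ask(model: str, text: str) -> tuple[str, int]:
    """Stand-in: returns (answer, tokens_used) for one model call."""
    raise NotImplementedError

PRICE_PER_1K_TOKENS = {"model-large": 0.0, "model-small": 0.0}  # real rates go here

def compare(models: list[str], dataset: list[dict]) -> dict:
    results = {}
    for model in models:
        correct, tokens = 0, 0
        for case in dataset:  # case = {"input": ..., "expected": ...}
            answer, used = ask(model, case["input"])
            correct += answer.strip() == case["expected"]
            tokens += used
        results[model] = {
            "accuracy": correct / len(dataset),
            "est_cost": tokens / 1000 * PRICE_PER_1K_TOKENS.get(model, 0.0),
        }
    return results
```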
What about hallucination risk?
You constrain it at the architecture level — RAG with explicit citation, tool calls that fetch authoritative data, structured outputs validated against schemas, and human-in-the-loop on high-stakes paths. The model is one component, not the system.
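One of those constraints made concrete: structured output validated against a schema before anything downstream sees it. A sketch assuming pydantic v2 and a hypothetical `Invoice` schema:

```python
# Schema validation as a hallucination constraint: model output is parsed
# into a typed schema and rejected on failure. Assumes pydantic v2; the
# Invoice schema is hypothetical.
from pydantic import BaseModel, Field, ValidationError

class Invoice(BaseModel):
    invoice_id: str
    total_eur: float = Field(ge=0)
    due_date: str  # tighten to datetime.date in a real system

def parse_or_reject(raw_model_output: str) -> Invoice:
    try:
        return Invoice.model_validate_json(raw_model_output)
    except ValidationError as exc:
        # Fail closed: route to a retry or human review, never forward bad data.
        raise ValueError(f"Output failed schema validation: {exc}") from exc
```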
Can you keep data inside the EU / on-prem?
Yes. AWS Bedrock or Azure OpenAI in EU regions, OpenAI EU Data Residency, or self-hosted Llama/Mistral on your infra. I design the data flow before the model selection.
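In practice that can be as simple as pinning the inference client to an EU region. A sketch for AWS Bedrock in Frankfurt; the model ID is illustrative, and regional availability should be checked first:

```python
# Keeping inference inside the EU: pin the Bedrock runtime client to an EU
# region (Frankfurt here). The model ID is illustrative.
import json

import boto3

client = boto3.client("bedrock-runtime", region_name="eu-central-1")

response = client.invoke_model(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 256,
        "messages": [{"role": "user", "content": "Ping from Frankfurt"}],
    }),
)
print(json.loads(response["body"].read())["content"][0]["text"])
```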
Who owns the prompts, the fine-tuned models, the IP?
You do. The contract is explicit: code, prompts, fine-tunes, and evaluation suites all transfer to you. The exception is general-purpose libraries (Anthropic SDK, LangChain wrappers), where improvements are contributed back upstream.
How long does a typical project take?
A discovery sprint runs one week. An MVP build runs 4-8 weeks. A production-grade enterprise integration takes 3-6 months, including the evaluation harness, monitoring, and rollback procedures.
Book a discovery sprint
One week. I read your existing system, scope the AI integration honestly, deliver a written plan with cost, timeline, and a yes/no on whether the project is worth running.
Start a discovery sprint →