AI Services / Enterprise AI

AI integration for enterprise.

Multi-agent systems, RAG, intelligent automation, and model selection — built to pass compliance, designed to survive scale. From discovery sprint to production rollout.

From €5k discovery sprint · project-based · for funded scale-ups and enterprise teams

01 — INPUT: user / system request
02 — AGENT LOOP: plan → act → reflect
03 — TOOLS: LLM (Claude, GPT) · RAG / vectors · APIs / DB · memory
04 — OBSERVABILITY: token budget · prompt versioning · A/B evaluation · regression detection

What's included

RAG & retrieval

Vector store choice (pgvector, Qdrant, Pinecone), chunking strategy, citation-enforced generation, evaluation harness.
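A minimal sketch of citation-enforced retrieval. The corpus, embeddings, and chunk IDs here are toy placeholders (a real system retrieves from pgvector/Qdrant/Pinecone with a proper embedding model); the point is the shape: rank chunks by similarity, then force the generation prompt to cite chunk IDs.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy corpus: (chunk_id, embedding, text). Embeddings are 2-d placeholders.
CORPUS = [
    ("doc1#0", [1.0, 0.0], "Refunds are processed within 14 days."),
    ("doc2#0", [0.0, 1.0], "Support hours are 9:00-17:00 CET."),
]

def retrieve(query_emb, k=1):
    """Return the top-k chunks by cosine similarity to the query embedding."""
    ranked = sorted(CORPUS, key=lambda c: cosine(query_emb, c[1]), reverse=True)
    return ranked[:k]

def build_prompt(question, chunks):
    """Citation-enforced prompt: the model may only answer from cited sources."""
    context = "\n".join(f"[{cid}] {text}" for cid, _, text in chunks)
    return (
        "Answer using ONLY the sources below. Cite every claim as [chunk_id].\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )

chunks = retrieve([0.9, 0.1])
prompt = build_prompt("How fast are refunds?", chunks)
```

An evaluation harness then checks that every sentence in the model's answer carries a citation that resolves to a retrieved chunk.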

Multi-agent orchestration

Plan/act/reflect loops with tool calls, role specialization, supervisor patterns, observable traces, fallback paths.
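The core of the loop above can be sketched in a few lines. The `plan`/`act`/`reflect` callables here are trivial stand-ins (a real agent would call an LLM for planning and reflection, and real tools for acting); the bounded step budget and the `None` fallback path are the structural point.

```python
def agent_loop(goal, plan, act, reflect, max_steps=5):
    """Generic plan/act/reflect loop with a bounded step budget."""
    history = []
    for _ in range(max_steps):
        step = plan(goal, history)             # choose the next tool call
        result = act(step)                     # execute it
        history.append((step, result))         # observable trace
        done, answer = reflect(goal, history)  # did we reach the goal?
        if done:
            return answer
    return None  # budget exhausted: take the fallback path (e.g. escalate)

# Toy stand-ins: one "calculator" tool in place of LLM planning and tool use.
def plan(goal, history):
    return ("calc", goal)

def act(step):
    tool, expr = step
    return eval(expr, {"__builtins__": {}})  # demo only; never eval untrusted input

def reflect(goal, history):
    _, result = history[-1]
    return True, result

answer = agent_loop("2 + 3", plan, act, reflect)  # → 5
```

Role specialization and supervisor patterns are layered on top: a supervisor's `plan` step dispatches to specialized sub-agents instead of raw tools.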

Intelligent automation

Classification, extraction, routing, summarization at scale. Production-grade error handling.
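What "production-grade error handling" means in practice, as a minimal routing sketch: a model failure or a low-confidence prediction never drops work on the floor, it falls back to human review. The `classify` function here is a toy stand-in for an LLM classification call; the threshold is an illustrative assumption.

```python
def route(ticket: str, classify, threshold: float = 0.8) -> str:
    """Route a ticket to a queue; failures and low confidence fall back to humans."""
    try:
        label, confidence = classify(ticket)
    except Exception:
        return "human_review"  # model/API errors must never lose a ticket
    if confidence < threshold:
        return "human_review"  # uncertain predictions are not auto-routed
    return label

# Toy classifier standing in for an LLM call.
def classify(ticket):
    return ("billing", 0.95) if "invoice" in ticket else ("general", 0.5)
```

Usage: `route("invoice question", classify)` routes to `billing`; anything the classifier is unsure about lands in `human_review`.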

Content moderation

Safety classifiers, content policy enforcement, audit-grade decision logs.

Model selection & fine-tuning

OpenAI, Claude, Llama, Mistral — cost/quality tradeoff analysis. Fine-tuning when warranted.

AI ops

Token budget tracking, prompt versioning, A/B evaluation, regression detection.
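A minimal sketch of the token-budget piece (names and limits are illustrative assumptions, not a specific library's API): every model call is charged against an explicit budget, so cost overruns fail loudly instead of showing up on the invoice.

```python
from dataclasses import dataclass

@dataclass
class TokenBudget:
    """Hard cap on tokens spent by a pipeline run or tenant."""
    limit: int
    used: int = 0

    def charge(self, prompt_tokens: int, completion_tokens: int) -> None:
        """Record a model call's cost; raise once the cap is exceeded."""
        self.used += prompt_tokens + completion_tokens
        if self.used > self.limit:
            raise RuntimeError(f"token budget exceeded: {self.used}/{self.limit}")

    @property
    def remaining(self) -> int:
        return self.limit - self.used
```

Prompt versioning works the same way in spirit: every prompt is an immutable, versioned artifact, so A/B evaluation and regression detection compare runs against a pinned prompt version rather than whatever is currently in the codebase.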

Example engagements

AI · Telecom · First in SI

Dealko

First Slovenian AI telecom assistant — multi-agent, embeddable widget, GDPR-compliant lead flow.

AI · SaaS · Multi-agent

CrewPress

7-agent CrewAI system for WordPress automation — content, SEO, dev, maintenance, analytics agents.

FAQ

How do you choose between models — OpenAI, Claude, open-source?

Per use case. Tool-use + structured output favours Claude Sonnet/Haiku 4.5. Bulk classification or RAG often runs cheaper on smaller models. Open-source (Llama, Mistral) for on-prem or data-residency mandates. I run the cost/quality experiments before recommending a stack.

What about hallucination risk?

You constrain it at the architecture level — RAG with explicit citation, tool calls that fetch authoritative data, structured outputs validated against schemas, and human-in-the-loop on high-stakes paths. The model is one component, not the system.
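The "structured outputs validated against schemas" part, as a minimal sketch: model output is parsed and checked before anything downstream sees it, and anything malformed is rejected rather than trusted. The schema here is a hypothetical example; a production system would use JSON Schema or pydantic, which also handle coercion (e.g. an integer amount where a float is expected).

```python
import json

# Hypothetical expected shape of a model's structured output.
SCHEMA = {"amount": float, "currency": str, "approved": bool}

def validate(raw: str) -> dict:
    """Parse model output and reject it unless it matches the expected schema."""
    data = json.loads(raw)  # raises on non-JSON output
    if set(data) != set(SCHEMA):
        raise ValueError(f"unexpected fields: {set(data) ^ set(SCHEMA)}")
    for key, typ in SCHEMA.items():
        if not isinstance(data[key], typ):
            raise ValueError(f"{key}: expected {typ.__name__}")
    return data
```

A rejected output triggers a retry or the human-in-the-loop path; it never flows downstream as fact.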

Can you keep data inside the EU / on-prem?

Yes. AWS Bedrock or Azure OpenAI in EU regions, OpenAI EU Data Residency, or self-hosted Llama/Mistral on your infra. I design the data flow before the model selection.

Who owns the prompts, the fine-tuned models, the IP?

You do. The contract is explicit: code, prompts, fine-tunes, evaluation suites — all transferred. The exception is general-purpose libraries (Anthropic SDK, LangChain wrappers), where we contribute back upstream.

How long does a typical project take?

A discovery sprint runs 1 week. An MVP build runs 4-8 weeks. Production-grade enterprise integration takes 3-6 months, including the evaluation harness, monitoring, and rollback procedures.

Book a discovery sprint

One week. I read your existing system, scope the AI integration honestly, and deliver a written plan with cost, timeline, and a yes/no on whether the project is worth running.

Start a discovery sprint →