Build Notes
Note 1: Choosing an LLM for Business Agent Tasks
Models · Cost · Updated March 2025

Not every task needs the most powerful model. Most business agent tasks — drafting replies, summarizing data, categorizing messages — work well with mid-tier models. Save the expensive models for complex reasoning.
Decision matrix
| Task type | Good enough | When to upgrade |
|---|---|---|
| Auto-replies & acknowledgments | GPT-4o-mini, Claude Haiku, Gemini Flash | Rarely — these are templated |
| Summarization & reporting | GPT-4o-mini, Claude Sonnet | Long documents or multi-source synthesis |
| Client communication drafts | Claude Sonnet, GPT-4o | High-stakes or nuanced tone required |
| Complex analysis & planning | Claude Opus, GPT-4o | Default to best available |
| Code generation & debugging | Claude Sonnet, GPT-4o | Multi-file refactors, security-critical code |
Deeper dive: How to Choose an LLM →
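The matrix above boils down to a routing table: map each task type to the cheapest adequate model, and fall back to the strongest tier for anything unrecognized. A minimal sketch — the task names and model identifiers here are illustrative assumptions, not exact API model strings:

```python
# Route each task to the cheapest model tier that handles it well.
# Task names and model identifiers are illustrative, not API strings.
ROUTES = {
    "auto_reply":    "gpt-4o-mini",   # templated, low stakes
    "summarization": "gpt-4o-mini",
    "client_draft":  "claude-sonnet", # nuanced tone matters
    "analysis":      "claude-opus",   # complex reasoning
}

def pick_model(task_type: str) -> str:
    """Return a model for the task; unknown task types default to the
    strongest tier rather than silently getting a weak one."""
    return ROUTES.get(task_type, "claude-opus")
```

Defaulting unknown tasks to the best model is the safe failure mode: you overpay occasionally instead of shipping a bad answer.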
Note 2: Agent Platforms — What Actually Matters
Platforms · Architecture

The platform you build your agent on matters less than you think. What matters: can it connect to your tools, can a non-developer maintain it, and will it still work when you're not watching?
What to evaluate
- Integration depth: Does it natively connect to your CRM, email, project management tool? Or do you need a middleware layer (Zapier, Make, n8n)?
- Failure handling: What happens when the API is down, the model hallucinates, or a webhook fails? Does it retry, alert, or silently break?
- Observability: Can you see what the agent did, why it did it, and what it sent — after the fact? If you can't audit it, you can't trust it.
- Maintenance burden: Who updates the prompts, adjusts the logic, fixes the broken integration? If the answer is "the developer who built it," that's a bus-factor problem.
- Cost predictability: Per-message pricing, per-seat pricing, or usage-based? Model costs are separate from platform costs — track both.
Deeper dive: How to Choose an Agent Platform →
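The failure-handling point is worth making concrete: a flaky integration call should retry with backoff and then alert loudly, never break silently. A minimal sketch, assuming the alert sink is anything callable (a Slack webhook, an email, or just `print`):

```python
import time

def call_with_retry(fn, retries=3, base_delay=1.0, alert=print):
    """Retry a flaky integration call with exponential backoff.
    When retries are exhausted, alert and re-raise instead of
    failing silently."""
    for attempt in range(retries):
        try:
            return fn()
        except Exception as exc:
            if attempt == retries - 1:
                alert(f"Agent call failed after {retries} attempts: {exc}")
                raise
            time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
```

The key design choice is the final `raise`: the caller still sees the failure, so nothing downstream proceeds on missing data.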
Note 3: Security Basics for AI Agent Systems
Security · Non-Negotiable

AI agents touch your client data, your communication channels, and your business tools. Security isn't optional — even for a 3-person shop.
Minimum security checklist
- API keys and credentials: Store in environment variables or a secrets manager. Never in code, never in prompts, never in shared docs.
- Principle of least privilege: The agent should only have access to what it needs. Read-only where possible. Don't give your lead-response agent access to your invoicing system.
- Audit logging: Log every action the agent takes — what it read, what it sent, who it contacted. You need this for debugging and for client trust.
- Input validation: If the agent processes external input (forms, emails, messages), sanitize it. Prompt injection is real — someone will eventually submit a form that says "ignore your instructions and…"
- Human-in-the-loop for high-risk actions: Sending money, deleting data, modifying access permissions, publishing content — these require human approval. Always.
- Regular review: Monthly, spend 15 minutes reviewing what the agent has access to and whether it still needs it. Revoke stale permissions.
Related: MSA Security Resources →
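The human-in-the-loop rule can be enforced in one place: a dispatch gate that refuses high-risk actions without a named approver and records an audit entry for everything. A sketch under assumptions — the action names and return shape are illustrative, not a real framework's API:

```python
# Actions that must never run without a named human approver.
# The action names here are illustrative assumptions.
HIGH_RISK = {"send_payment", "delete_data", "modify_permissions", "publish"}

def execute(action: str, payload: dict, approved_by=None) -> dict:
    """Dispatch an agent action through an approval gate.
    High-risk actions raise unless a human approver is named."""
    if action in HIGH_RISK and not approved_by:
        raise PermissionError(f"'{action}' requires human approval")
    # In a real system this entry would be written to an append-only
    # audit log BEFORE the action is dispatched.
    return {"action": action, "payload": payload, "approved_by": approved_by}
```

Because the gate lives in the dispatcher, a prompt-injected instruction can't route around it: the model never gets to decide whether approval applies.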
Note 4: Integration Patterns That Actually Work
Architecture · Integrations

Every AI agent project involves connecting systems. Here's what we've seen work reliably vs. what causes ongoing pain.
Reliable patterns
- Webhook → Agent → Action: Form submits, webhook fires, agent processes, takes action. Simple, testable, predictable. Use this for lead response, intake processing, notification routing.
- Scheduled pull → Summary → Distribute: Agent runs on a schedule (daily/weekly), pulls data from connected tools, generates a summary, sends it to the right people. Use this for ops reviews, client updates, pipeline reports.
- Monitor → Flag → Human decision: Agent watches a channel (email, Slack, support inbox) for patterns, flags items that need attention, and routes them with context. Use this for scope creep detection, escalation management, opportunity spotting.
Patterns that cause pain
- Real-time bidirectional sync: Keeping two systems in perfect sync in real time is fragile — it invites conflicts, race conditions, and silent failures. Use event-driven updates with conflict resolution rules instead.
- Agent chains longer than 3 steps: Agent A calls Agent B calls Agent C. Each step adds latency, failure risk, and debugging complexity. If your chain is longer than 3 steps, redesign the workflow.
- Screen scraping as an integration: If there's no API, build a buffer layer (spreadsheet, form, manual entry point) rather than scraping a UI. Scrapers break on every UI update.
Note 5: What We Got Wrong (And Fixed)
Lessons · Honesty

Building in public means sharing mistakes too. Here's what bit us.
- Over-automating too early: We automated a client update flow before the content format was stable. Spent more time fixing the automation than it would have taken to do it manually for 3 months. Lesson: Automate after the process is proven, not before.
- Skipping the handoff rules: An early agent didn't have clear escalation criteria. It confidently answered a pricing question wrong. Client was confused, we looked unprofessional. Lesson: Define the boundary before you deploy. (See the Handoff SOP →)
- Using the best model for everything: Our first agent used GPT-4 for every task, including simple acknowledgments. Monthly cost was 5× what it needed to be. Lesson: Route by task complexity, not by default.
- No monitoring for 2 weeks: Deployed an agent, celebrated, moved on. It silently failed 4 days later when an API token expired. Nobody noticed for 10 days. Lesson: Set up dead-man-switch alerts. If the agent hasn't done anything in 24 hours, something is probably wrong.
Use these notes to make one tooling decision this week, then document the result after 14 days.
Apply with a playbook →