Build Notes
Note 1: Choosing an LLM for Business Agent Tasks
Models · Cost · Updated March 2025

Not every task needs the most powerful model. Most business agent tasks — drafting replies, summarizing data, categorizing messages — work well with mid-tier models. Save the expensive models for complex reasoning.
Decision matrix
| Task type | Good enough | When to upgrade |
|---|---|---|
| Auto-replies & acknowledgments | GPT-4o-mini, Claude Haiku, Gemini Flash | Rarely — these are templated |
| Summarization & reporting | GPT-4o-mini, Claude Sonnet | Long documents or multi-source synthesis |
| Client communication drafts | Claude Sonnet, GPT-4o | High-stakes or nuanced tone required |
| Complex analysis & planning | Claude Opus, GPT-4o | Default to best available |
| Code generation & debugging | Claude Sonnet, GPT-4o | Multi-file refactors, security-critical code |
Deeper dive: How to Choose an LLM →
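The matrix above boils down to a routing table: map each task type to the cheapest adequate model, and fall back to the strongest tier for anything unrecognized. A minimal sketch — the task names and model identifiers here are illustrative assumptions, not exact API model strings:

```python
# Route each task to the cheapest model tier that handles it well.
# Task names and model identifiers are illustrative, not API strings.
ROUTES = {
    "auto_reply":    "gpt-4o-mini",   # templated, low stakes
    "summarization": "gpt-4o-mini",
    "client_draft":  "claude-sonnet", # nuanced tone matters
    "analysis":      "claude-opus",   # complex reasoning
}

def pick_model(task_type: str) -> str:
    """Return a model for the task; unknown task types default to the
    strongest tier rather than silently getting a weak one."""
    return ROUTES.get(task_type, "claude-opus")
```

Defaulting unknown tasks to the best model is the safe failure mode: you overpay occasionally instead of shipping a bad answer.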
Note 2: Agent Platforms — What Actually Matters
Platforms · Architecture

The platform you build your agent on matters less than you think. What matters: can it connect to your tools, can a non-developer maintain it, and will it still work when you're not watching?
What to evaluate
- Integration depth: Does it natively connect to your CRM, email, project management tool? Or do you need a middleware layer (Zapier, Make, n8n)?
- Failure handling: What happens when the API is down, the model hallucinates, or a webhook fails? Does it retry, alert, or silently break?
- Observability: Can you see what the agent did, why it did it, and what it sent — after the fact? If you can't audit it, you can't trust it.
- Maintenance burden: Who updates the prompts, adjusts the logic, fixes the broken integration? If the answer is "the developer who built it," that's a bus-factor problem.
- Cost predictability: Per-message pricing, per-seat pricing, or usage-based? Model costs are separate from platform costs — track both.
Deeper dive: How to Choose an Agent Platform →
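The failure-handling point is worth making concrete: a flaky integration call should retry with backoff and then alert loudly, never break silently. A minimal sketch, assuming the alert sink is anything callable (a Slack webhook, an email, or just `print`):

```python
import time

def call_with_retry(fn, retries=3, base_delay=1.0, alert=print):
    """Retry a flaky integration call with exponential backoff.
    When retries are exhausted, alert and re-raise instead of
    failing silently."""
    for attempt in range(retries):
        try:
            return fn()
        except Exception as exc:
            if attempt == retries - 1:
                alert(f"Agent call failed after {retries} attempts: {exc}")
                raise
            time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
```

The key design choice is the final `raise`: the caller still sees the failure, so nothing downstream proceeds on missing data.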
Note 3: Security Basics for AI Agent Systems
Security · Non-Negotiable

AI agents touch your client data, your communication channels, and your business tools. Security isn't optional — even for a 3-person shop.
Minimum security checklist
- API keys and credentials: Store in environment variables or a secrets manager. Never in code, never in prompts, never in shared docs.
- Principle of least privilege: The agent should only have access to what it needs. Read-only where possible. Don't give your lead-response agent access to your invoicing system.
- Audit logging: Log every action the agent takes — what it read, what it sent, who it contacted. You need this for debugging and for client trust.
- Input validation: If the agent processes external input (forms, emails, messages), sanitize it. Prompt injection is real — someone will eventually submit a form that says "ignore your instructions and…"
- Human-in-the-loop for high-risk actions: Sending money, deleting data, modifying access permissions, publishing content — these require human approval. Always.
- Regular review: Monthly, spend 15 minutes reviewing what the agent has access to and whether it still needs it. Revoke stale permissions.
Related: MSA Security Resources →
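The human-in-the-loop rule can be enforced in one place: a dispatch gate that refuses high-risk actions without a named approver and records an audit entry for everything. A sketch under assumptions — the action names and return shape are illustrative, not a real framework's API:

```python
# Actions that must never run without a named human approver.
# The action names here are illustrative assumptions.
HIGH_RISK = {"send_payment", "delete_data", "modify_permissions", "publish"}

def execute(action: str, payload: dict, approved_by=None) -> dict:
    """Dispatch an agent action through an approval gate.
    High-risk actions raise unless a human approver is named."""
    if action in HIGH_RISK and not approved_by:
        raise PermissionError(f"'{action}' requires human approval")
    # In a real system this entry would be written to an append-only
    # audit log BEFORE the action is dispatched.
    return {"action": action, "payload": payload, "approved_by": approved_by}
```

Because the gate lives in the dispatcher, a prompt-injected instruction can't route around it: the model never gets to decide whether approval applies.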
Note 4: Integration Patterns That Actually Work
Architecture · Integrations

Every AI agent project involves connecting systems. Here's what we've seen work reliably vs. what causes ongoing pain.
Reliable patterns
- Webhook → Agent → Action: Form submits, webhook fires, agent processes, takes action. Simple, testable, predictable. Use this for lead response, intake processing, notification routing.
- Scheduled pull → Summary → Distribute: Agent runs on a schedule (daily/weekly), pulls data from connected tools, generates a summary, sends it to the right people. Use this for ops reviews, client updates, pipeline reports.
- Monitor → Flag → Human decision: Agent watches a channel (email, Slack, support inbox) for patterns, flags items that need attention, and routes them with context. Use this for scope creep detection, escalation management, opportunity spotting.
Patterns that cause pain
- Real-time bidirectional sync: Keeping two systems in perfect sync in real time is fragile — it invites conflicts, race conditions, and silent failures. Use event-driven updates with conflict resolution rules instead.
- Agent chains longer than 3 steps: Agent A calls Agent B calls Agent C. Each step adds latency, failure risk, and debugging complexity. If your chain is longer than 3 steps, redesign the workflow.
- Screen scraping as an integration: If there's no API, build a buffer layer (spreadsheet, form, manual entry point) rather than scraping a UI. Scrapers break on every UI update.
Note 5: What We Got Wrong (And Fixed)
Lessons · Honesty

Building in public means sharing mistakes too. Here's what bit us.
- Over-automating too early: We automated a client update flow before the content format was stable. Spent more time fixing the automation than it would have taken to do it manually for 3 months. Lesson: Automate after the process is proven, not before.
- Skipping the handoff rules: An early agent didn't have clear escalation criteria. It confidently answered a pricing question wrong. Client was confused, we looked unprofessional. Lesson: Define the boundary before you deploy. (See the Handoff SOP →)
- Using the best model for everything: Our first agent used GPT-4 for every task, including simple acknowledgments. Monthly cost was 5× what it needed to be. Lesson: Route by task complexity, not by default.
- No monitoring for 2 weeks: Deployed an agent, celebrated, moved on. It silently failed 4 days later when an API token expired. Nobody noticed for 10 days. Lesson: Set up dead-man-switch alerts. If the agent hasn't done anything in 24 hours, something is probably wrong.
Use these notes to make one tooling decision this week, then document the result after 14 days.
Apply with a playbook →