What a 6-week production AI pilot actually looks like

Most AI pilots don't ship. This is the engagement shape that does.

Most AI pilots don’t ship.

The mid-market firm signs a contract with a big SI, runs a 12-week “discovery” phase, ends up with a slide deck, and quietly never builds anything. Or the in-house team experiments with the Claude API for six months and produces a demo that never reaches production. Or the CEO buys ChatGPT Enterprise seats and asks “where’s the ROI?” eighteen months later. The default outcome of an AI initiative inside a mid-market firm is no shipped tool.

We do something different. Six to eight weeks. Fixed scope. Fixed price. One specific workflow becomes a Claude-powered tool your team uses the day after we hand it off. This is what that engagement actually looks like, week by week.

Week 0: Scoping (before the contract is signed)

Before we sign anything, we spend 90 minutes, free of charge, with the person who would own the engagement on your side, plus one or two operators who do the actual work the tool would help with. The goal is to find ONE workflow that meets four criteria.

The workflow has to be painful enough that someone can name the hours lost to it per week. Not “we’d like to automate emails.” More like “our credit analysts spend 12 hours a week pulling data from three internal systems and writing standardized memos.”

The workflow has to be bounded enough to ship in six weeks. Not “modernize our customer service.” More like “draft the first version of a credit memo from a structured input.”

The data has to be accessible. If touching it requires a six-week security review, we can’t ship in six weeks. We look for systems we can read with a service account and that don’t require us to handle PHI or unencrypted PII in v1.

The buyer has to have discretionary spend authority at the $60-100k range. If the engagement needs to go to an investment committee or a board vote, it isn’t a pilot. It’s a procurement cycle.

If all four boxes check out, we send a Statement of Work the next day. The scope is one sentence. The price is one number. The timeline is six to eight weeks.

Week 1: Discovery

Day one, our engineer is in your office (or on a video call all day if your team is remote-first) sitting next to the two operators who do the workflow today. We watch them do it. We don’t take notes about feelings. We take notes about what they actually click, what they actually look up, what they actually re-type, what they actually wait for.

By Friday of week one, we’ve produced two artifacts: a flowchart of the current workflow with timestamps on each step, and a one-page spec of the production tool we’re going to build. Your operators sign off on both. If they don’t, we revise.

The single most important thing we figure out in week one is not “what does Claude do.” It’s “what does the human do AFTER the tool produces an output.” If we don’t have that part nailed, we’re building a demo, not a tool.

Weeks 2 to 5: Build

The applied AI engineer sits in your environment (VPN access, service account, dev keys for Claude) and builds. Every Friday, a 30-minute demo to the same two operators from week one. They use the latest version. They tell us what’s broken. The next Monday, we fix it.

By the end of week four, the tool produces outputs an operator can review without rewriting from scratch. By the end of week five, the tool produces outputs an operator can ship 80% of the time without changes.

The unsexy work that happens in weeks two through five is integration. The Claude call itself is 30 lines of code. The other 700 lines are: authenticating to your CRM, parsing the input format, handling the edge cases your operators didn't think to mention in week one, logging every output for audit, building a review interface that doesn't make your operators hate their job, and writing the eval set that proves the thing actually works.
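
To make that proportion concrete, here is a minimal sketch of the shape, not our production code. Everything in it except the Anthropic SDK call itself (`client.messages.create`) is a hypothetical stand-in: `fetch_borrower_snapshot`, `audit_log`, and the memo wording are illustrative placeholders for the client-specific integration work described above.

```python
# Minimal sketch, not production code. Only the Anthropic SDK call is real;
# the helpers are hypothetical stand-ins for client-specific integration.
import json
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment


def fetch_borrower_snapshot(borrower_id: str) -> dict:
    """Stand-in for the real work: service-account auth, three internal systems, retries."""
    return {"borrower_id": borrower_id, "financials": "..."}


def audit_log(**record) -> None:
    """Stand-in for the append-only audit trail every output gets written to."""
    print(json.dumps(record, default=str))


def draft_credit_memo(borrower_id: str) -> str:
    snapshot = fetch_borrower_snapshot(borrower_id)

    # The Claude call itself -- the "30 lines" part.
    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # whichever model the eval set was built against
        max_tokens=2000,
        system="You draft credit memos in the firm's standard format.",
        messages=[{
            "role": "user",
            "content": "Draft a credit memo from this data:\n" + json.dumps(snapshot),
        }],
    )
    draft = response.content[0].text

    # The rest of the "700 lines": validation, audit logging, the review UI, the eval set.
    audit_log(borrower_id=borrower_id, input=snapshot, output=draft)
    return draft
```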

If a vendor’s pitch is “we’ll build this in two weeks because Claude does everything,” they don’t ship. They produce a demo.

Weeks 6 to 8: Handoff and adoption

In week six, your operators stop using the OLD workflow during business hours. They use the new tool. We’re sitting next to them, watching what breaks. By the end of week six, we ship a v1.0 release tag and a runbook for your engineering team.

Weeks seven and eight are adoption support. This is the part most consultancies skip. The tool exists. It works. But getting six analysts to actually use it instead of falling back on the old PDF-export-to-Excel routine takes deliberate work. We watch usage logs. We pair with the holdouts. We tune the prompts based on what trips them up. We update the runbook.

By the end of week eight, your team is using the tool without us there. Your engineers own the code. We’re available on Slack for a month after handoff, but the engagement is closed. We invoice the final payment. We move to the next client.

Your team using the tool is the deliverable. Not the code. The code is the artifact. The use is the outcome.

What this is NOT

This is not a 12-week assessment that produces a slide deck.

This is not a roadmap.

This is not a proof-of-concept that proves Claude can theoretically do the work.

This is not a research project.

This is not “let’s see what’s possible.”

The pilot ships one production tool that absorbs one specific workflow. That’s it. The reason this works is that the scope is small enough to actually finish. Most AI engagements fail not because the technology doesn’t work, but because the scope was too big to ever ship.

What to look for in any AI vendor’s pilot proposal

If you’re evaluating proposals from us or from anyone else, these are the questions to ask:

“Is the scope a deliverable, or a phase?” A deliverable is “the credit memo drafting tool will be in production on date X.” A phase is “we’ll spend six weeks understanding your AI maturity.” Phases don’t ship anything.

“Who from your team is doing the work, and who am I going to talk to weekly?” If the answer involves a “senior partner” who sells the deal and disappears, plus a junior team you’ve never met who shows up week one, walk away. If the answer is one senior engineer who shows up day one and is still there day forty, you have a real pilot.

“What’s the price, and what does it include?” Fixed-price pilots force the vendor to scope correctly. Time-and-materials engagements give the vendor no incentive to ship.

“What does adoption look like in the last two weeks?” If the vendor doesn’t have an answer, the engagement ends at code handoff and the tool dies on the shelf. The right answer involves pairing with end users during real workflow time.

“What’s the eval set, and can I see it?” A vendor without an eval set is shipping vibes. A vendor with one has a way to prove the tool works AND a way to keep proving it as you maintain it.
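
What an eval set means in practice is smaller than it sounds: a fixed list of real historical inputs with checkable expectations, run against the tool on every prompt or model change. A minimal sketch, with hypothetical case data and checks standing in for the real workflow:

```python
# Minimal sketch of an eval set: real historical inputs plus checkable expectations.
# The cases, the required phrases, and draft_credit_memo are hypothetical stand-ins.

EVAL_CASES = [
    {"borrower_id": "B-1042", "must_include": ["Debt service coverage", "Collateral"]},
    {"borrower_id": "B-2208", "must_include": ["Covenant breach"]},
]


def run_evals(draft_fn) -> float:
    """Run every case through the tool and report the pass rate."""
    passed = 0
    for case in EVAL_CASES:
        draft = draft_fn(case["borrower_id"])
        if all(phrase in draft for phrase in case["must_include"]):
            passed += 1
        else:
            print(f"FAIL {case['borrower_id']}: missing required section")
    score = passed / len(EVAL_CASES)
    print(f"{passed}/{len(EVAL_CASES)} cases passed ({score:.0%})")
    return score


# Run on every change; the score is what "the tool works" means, now and during maintenance.
# run_evals(draft_credit_memo)
```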

The right pilot ships. The wrong one produces a deck. The difference is in how the first conversation gets scoped.

That’s the work.