I Let an AI Help Me Build an AI System. It Fought Me the Entire Way.
At some point during this build, I realized something slightly absurd:
I was using an AI coding agent to debug an AI system that was failing because of another AI service while deploying through infrastructure that deletes itself for fun.
This is not satire. This is modern engineering.
This is what actually happened building the GRIZL AI system, what worked, what didn’t, and why the next phase is going to look less like a chatbot and more like a small army.
The Setup: Copilot Agent + Codespaces + Controlled Chaos
I didn’t build this the “traditional” way.
This wasn’t:
local dev
manual commits
careful iteration
This was:
GitHub Copilot Agent writing code, fixing issues, opening PRs
Codespaces acting as a disposable dev environment
Me acting more like a product owner / reviewer than a line-by-line coder
In theory, this is the dream:
“Describe what you want, agent builds it.”
In reality, it’s more like:
“Describe what you want, agent builds 80% of it perfectly and 20% of it like it has never seen a computer before.”
Still… that 80% is dangerous in a good way.
What I Actually Built
At a high level, GRIZL now has:
1. A Grounded AI Chat System
Azure OpenAI (chat + embeddings)
Azure AI Search (vector + semantic + keyword fallback)
Structured knowledge base (chunked, indexed, embedded)
This part is working for real now. Not demo-working. Production-working.
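The whole grounded-chat pipeline boils down to one shape: chunk → embed → index → retrieve. Here's a toy sketch of that shape; the `chunk`, `embed`, and `retrieve` functions are stand-ins for the real Azure OpenAI and Azure AI Search calls, not GRIZL's actual code.

```python
# Toy sketch of the grounded-chat shape: chunk -> embed -> index -> retrieve.
# embed() is a stand-in for an Azure OpenAI embedding call (real vectors are 1536-dim).

def chunk(text: str, size: int = 40) -> list[str]:
    """Split a document into fixed-size chunks (real chunking respects sentences)."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(piece: str) -> list[float]:
    """Stand-in embedding: two made-up features instead of a real 1536-dim vector."""
    return [len(piece) / 100.0, piece.count("search") / 10.0]

def retrieve(query: str, index: list[tuple[str, list[float]]]) -> str:
    """Return the chunk whose (toy) embedding is closest to the query's."""
    q = embed(query)
    def dist(v: list[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(q, v))
    return min(index, key=lambda item: dist(item[1]))[0]

doc = "GRIZL uses Azure AI Search for retrieval. Answers cite their sources."
index = [(c, embed(c)) for c in chunk(doc)]
```

Everything in the real system is this, with real embeddings and a real index in the middle.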
2. A Fully Instrumented Retrieval Layer
We went from:
“something broke”
to:
endpoint=grizl-search.search.windows.net semantic=true vector=true fallback=false
Which is the difference between:
guessing
and knowing exactly which piece failed
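That one-line telemetry format is simple to reproduce. A minimal sketch, with the field names taken from the log line above and everything else illustrative:

```python
# One structured log line per retrieval request: which paths ran, which fell back.
import logging

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("retrieval")

def log_retrieval(endpoint: str, semantic: bool, vector: bool, fallback: bool) -> str:
    """Emit a key=value line recording exactly which retrieval paths fired."""
    line = (f"endpoint={endpoint} semantic={str(semantic).lower()} "
            f"vector={str(vector).lower()} fallback={str(fallback).lower()}")
    log.info(line)
    return line
```

The key=value shape means any log search can answer "how often did we fall back?" without parsing prose.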
3. A Frontend That Actually Feels Like a Product
Chat UI with memory feel
Source attribution (now being cleaned up so it’s not fake links to nowhere)
Error handling that doesn’t scream “developer preview”
It finally looks like something you’d show someone without apologizing first.
What Worked Better Than Expected
Copilot Agent Is Actually Viable (With Supervision)
I’ll say it straight:
Copilot Agent (using Claude Sonnet 4.6) + Issues → PR → Review loop is fast
Like… uncomfortably fast.
Things it handled well:
wiring middleware
adding telemetry
refactoring constants (VECTOR_FIELD_NAME, etc.)
test scaffolding
CI/CD wiring
Where it struggled:
subtle infra bugs
Azure-specific quirks
anything involving “this service talks to that service under these conditions”
So the pattern becomes:
Agent builds. You validate reality.
Not optional.
Codespaces Removed All Friction
No “it works on my machine.”
No environment drift.
No local setup hell.
Just:
open → code → run → destroy → repeat
It pairs really well with agent-based workflows because everything is disposable.
Hybrid Search + Embeddings = Real Answers
Once everything aligned:
embedding dimensions matched the index (1536)
index schema correct
semantic config actually real (not theoretical)
vector profile properly wired
The system stopped guessing and started answering.
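The dimension mismatch is the one alignment bug worth a guard rail: embedding vectors must match the index's vector field exactly, or queries quietly degrade. A preflight check like this (names illustrative) fails loudly instead:

```python
# Fail fast if an embedding doesn't match the index's declared vector dimensions.
EXPECTED_DIMENSIONS = 1536  # must equal the vector field's dimensions in the index schema

def check_vector(vector: list[float], expected: int = EXPECTED_DIMENSIONS) -> list[float]:
    """Raise instead of sending a mismatched vector to the search service."""
    if len(vector) != expected:
        raise ValueError(
            f"embedding has {len(vector)} dimensions, index expects {expected}"
        )
    return vector
```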
That moment where:
fallback=false
…felt like hitting a checkpoint in a boss fight.
What Did Not Work (and Tried to Ruin My Life)
Azure Deployments Are Not Your Friend
If a variable is not explicitly defined in your Bicep templates or your pipeline configuration:
It will disappear.
Not degrade. Not warn. Not log helpfully.
Disappear.
I have now set the same environment variables more times than I care to admit.
“Resource Not Found” Is a Lifestyle
At one point the system was trying to query:
grizl-openai.openai.azure.com/indexes/...
Which is not where search indexes live.
That wasn’t a bug in logic.
That was one miswired config.
And suddenly your entire system is confidently asking the wrong service for the right thing.
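This class of bug is also checkable: a search query should only ever go to a Search host, never the OpenAI one. A tiny sanity check, hostnames illustrative:

```python
# Sanity-check that a "search endpoint" is actually an Azure AI Search host.
from urllib.parse import urlparse

def is_search_endpoint(url: str) -> bool:
    """True only if the URL points at a *.search.windows.net host."""
    host = urlparse(url).hostname or ""
    return host.endswith(".search.windows.net")
```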
Vector + Semantic Setup Is Not Plug-and-Play
Things that broke, in no particular order:
API version mismatch
vector field not recognized
semantic config invalid
embedding deployment missing
field naming mismatches
index created but not actually usable
Every time you fix one thing, another thing steps forward like:
“hello, I am the real problem”
Silent Fallbacks Will Trick You
The system looked like it worked long before it actually did.
Because it was quietly doing:
semantic → fail → fallback
vector → fail → fallback
everything → keyword search
And still returning answers.
That is dangerous.
Because you don’t notice you’re running on the weakest version of your system.
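The fix is to make fallbacks loud: every degraded tier gets recorded, so "it returned an answer" and "it used the strong path" stop looking identical. A sketch of that pattern (tier names illustrative):

```python
# Try search tiers in order, but record every tier that failed along the way.
from typing import Callable

def search_with_tiers(
    query: str, tiers: dict[str, Callable]
) -> tuple[str, str, list[str]]:
    """Return (tier_used, result, tiers_that_failed) instead of hiding the fallbacks."""
    failed: list[str] = []
    for name, fn in tiers.items():
        try:
            return name, fn(query), failed
        except Exception:
            failed.append(name)
    raise RuntimeError(f"all search tiers failed: {failed}")
```

Now a response that rode the keyword tier arrives with `failed=["semantic", "vector"]` attached, and the weakest-version problem is visible instead of silent.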
The Part That Actually Matters
This project stopped being “a chatbot” pretty quickly.
What we’re really building is:
A System That Knows Things
retrieves structured knowledge
ranks it
explains it
cites it
That’s step one.
Where This Is Going Next (The Fun Part)
This is where it gets interesting.
1. GRIZL Multi-Agent System
Instead of one chat brain, we move to:
Retrieval Agent (search + ranking)
Reasoning Agent (answer construction)
Personality Agent (voice, tone, character)
Action Agent (commands, automation)
Each one focused. Coordinated. Replaceable.
Less “one smart model.” More “team of specialized operators.”
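The coordination layer for that team can be small. A sketch of the planned split, with the agents as single-purpose callables behind one coordinator so any of them can be swapped out; all names here are illustrative:

```python
# One coordinator, four replaceable agents. Each agent is just a callable.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Coordinator:
    retrieve: Callable  # Retrieval Agent: search + ranking
    reason: Callable    # Reasoning Agent: answer construction
    stylize: Callable   # Personality Agent: voice, tone, character

    def answer(self, question: str) -> str:
        sources = self.retrieve(question)        # find relevant knowledge
        draft = self.reason(question, sources)   # construct the answer
        return self.stylize(draft)               # apply the voice
```

Replaceability is the point: swap `retrieve` for a better ranker and nothing else changes.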
2. Self-Learning Feedback Loop
Right now:
user asks → system answers
Next:
user asks → system answers → system evaluates itself
We start capturing:
which answers worked
which didn’t
what sources were useful
Then feeding that back into:
ranking
prompt construction
knowledge weighting
Not full autonomy. Controlled learning.
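The capture side of that loop is mostly bookkeeping. A sketch of turning raw feedback into per-source weights that ranking could consume later; the feedback schema here is illustrative:

```python
# Turn thumbs-up/down feedback into a helpfulness weight per source.
from collections import defaultdict

def source_weights(feedback: list[dict]) -> dict[str, float]:
    """feedback entries look like: {"sources": [...], "helpful": bool}"""
    ups: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for entry in feedback:
        for src in entry["sources"]:
            total[src] += 1
            if entry["helpful"]:
                ups[src] += 1
    # fraction of answers citing this source that the user found helpful
    return {src: ups[src] / total[src] for src in total}
```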
3. Ticketing + Memory Layer
This is a big one.
Turning chat into:
persistent conversations
issue tracking
user context memory
Think:
“AI support system that actually remembers what you asked yesterday”
Instead of:
“stateless goldfish with confidence”
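The shape of the memory layer is simple even if the persistence isn't: conversations keyed by user, surviving across sessions, with recent turns fed back in as context. A purely illustrative in-memory sketch (the real thing would sit on a database):

```python
# In-memory stand-in for a persistent per-user conversation store.
from collections import defaultdict

class ConversationStore:
    def __init__(self) -> None:
        self._history: dict[str, list[tuple[str, str]]] = defaultdict(list)

    def add(self, user: str, role: str, text: str) -> None:
        """Append one turn ("user" or "assistant") to this user's history."""
        self._history[user].append((role, text))

    def context(self, user: str, last_n: int = 10) -> list[tuple[str, str]]:
        """Return the most recent turns to prepend to the next prompt."""
        return self._history[user][-last_n:]
```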
4. Real Source System (Not Decorative Links)
Right now sources exist.
Next:
clickable
mapped to real pages
possibly inline highlights
Because “sources” that go nowhere are just UI cosplay.
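The un-cosplay step is mostly a mapping: resolve each retrieved chunk's source ID to an actual page, and drop citations that can't be resolved rather than render dead links. A sketch with an entirely made-up mapping:

```python
# Resolve source IDs to real URLs; silently drop anything unmapped.
SOURCE_MAP = {  # illustrative mapping, not GRIZL's real one
    "kb/search-setup": "https://example.com/docs/search-setup",
}

def resolve_sources(source_ids: list[str], source_map: dict[str, str]) -> list[str]:
    """Keep only citations that map to a real page."""
    return [source_map[sid] for sid in source_ids if sid in source_map]
```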
5. Performance + Perception
We already hacked perceived speed with streaming.
Next:
caching
faster retrieval
precomputed answers for common queries
Goal:
feels instant even when it isn’t
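The precomputed-answer idea is a cache keyed by a normalized query, so trivial phrasing differences still hit. A minimal sketch, names illustrative:

```python
# Answer cache: normalize the query, return cached answers instantly on a hit.
from typing import Callable

class AnswerCache:
    def __init__(self) -> None:
        self._cache: dict[str, str] = {}

    @staticmethod
    def _normalize(query: str) -> str:
        """Lowercase and collapse whitespace so near-identical queries share a key."""
        return " ".join(query.lower().split())

    def get_or_compute(self, query: str, compute: Callable) -> tuple[str, bool]:
        """Return (answer, was_cache_hit)."""
        key = self._normalize(query)
        if key in self._cache:
            return self._cache[key], True   # hit: the perceived-instant path
        answer = compute(query)             # miss: do the slow retrieval once
        self._cache[key] = answer
        return answer, False
```

The hit/miss flag matters for honesty in telemetry: "instant" should be measurable, not just vibes.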
The Real Takeaway
The hardest part of this wasn’t AI.
It was alignment.
aligning services
aligning schemas
aligning environments
aligning expectations with reality
AI didn’t fail.
Infrastructure did. Configuration did. Assumptions did.
And maybe the most important realization:
The future of building AI systems isn’t just writing code. It’s orchestrating systems and supervising other AIs doing the same thing.
Copilot Agent didn’t replace me.
It made me faster.
But it also made it very clear:
You still need someone in the loop who knows when something is technically working but actually wrong.
If you’re building in this space right now and it feels chaotic:
Good.
That means you’re not just using the tools.
You’re actually pushing them.
— Jeremiah Williams
Still debugging
Still shipping