My lessons from the first chatbot wars
Falling in love with AI isn't just a danger in the movies
AI and conversational interfaces are the new hot thing…again. It’s different this time, to be sure, but there are clear lessons from the prior wave of bot hype about how the best AI tech is relatively unimportant in building successful businesses. Myra Labs, where I worked across product and business in 2016-18, was one of the first chatbot companies at the dawn of the deep learning era. We learned some counterintuitive lessons during those first couple of years: some of which are different today, and many of which are not.
Don’t fall in love with the tech
Natural language-only inputs are quite risky if not handled right
AI infrastructure sucks up huge resources
AI complicates onboarding
Success hinges on designing proactively for error
Start with the plumbing
For the first wave of B2B AI software, these lessons favor existing software with value and infrastructure in place, as well as middleware companies that can help incumbents add AI pixie dust quickly and easily. After that, who knows — and when “fun” is a core product value, there’s an exciting new world ahead.
#1: Don’t fall in love with the tech
We had a novel deep-learning-driven natural language understanding system. So obviously, we should build the product and company positioning around it, right?
Natural language-only inputs are risky
On the surface, our fancy model that understood language looked advantageous:
It was platform-agnostic (Text-only input meant SMS, WhatsApp, FB Messenger, and web chat were all equally good!)
It’d let users describe their problems specifically and naturally
It leveraged Myra’s existing AI infrastructure
Turns out, though, buttons are pretty good, especially on mobile, and the AI models weren’t perfect out of the gate. So, we should add buttons, no big deal, right?
However, the chat platforms did not offer buttons AND natural language input together, which meant we had to build our own chat widget to accommodate our model.
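For illustration, here is a minimal sketch of the message shape such a widget has to support; the names are hypothetical, not Myra’s actual schema. The point is that every turn can carry buttons and an open text box at once:

```python
from dataclasses import dataclass, field

# Hypothetical message shape: every bot turn can carry quick-reply
# buttons AND free-text input at the same time, which no single chat
# platform offered out of the box at the time.
@dataclass
class BotMessage:
    text: str
    buttons: list[str] = field(default_factory=list)
    allow_free_text: bool = True  # per-platform adapters degrade this gracefully

msg = BotMessage("Is your question about returns or pickups?",
                 buttons=["Returns", "Pickups"])
```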
The infrastructure sucked up near-term time with mostly long-term value
Around this time, we decided to focus on the customer support market where there was a clear path to increasing customer satisfaction and decreasing cost. To get to a working demo, we had to build a lot of infrastructure, including:
Conversational design interfaces (Let’s plot out: “Is your question about returns or pickups?”, then show “Returns” and “Pickups” buttons, then if it’s Returns, try to understand the natural language response while watching for a sixteen-digit Order ID, then…)
Conversational state management (Let’s actually execute the flow we plotted above; a sketch follows this list)
Model lifecycle management (Let’s make sure the language part works)
Model inference (Let’s understand the language quickly)
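To make the state management piece concrete, here is a toy sketch of the returns flow plotted above, with hypothetical names and none of the production concerns:

```python
import re

# Each turn returns (next_state, reply, buttons); every message is also
# scanned for a sixteen-digit Order ID, as in the flow above.
ORDER_ID_RE = re.compile(r"\b\d{16}\b")

def handle_turn(state: str, text: str) -> tuple[str, str, list[str]]:
    match = ORDER_ID_RE.search(text)
    if match:
        return "lookup", f"Looking up order {match.group()}...", []
    if state == "start":
        return "topic", "Is your question about returns or pickups?", ["Returns", "Pickups"]
    if state == "topic" and text.lower() == "returns":
        # The reply here is free text; this is where the NLU model would run.
        return "returns", "What's the issue with your return? You can also paste your 16-digit Order ID.", []
    return state, "Sorry, I didn't catch that. Returns or pickups?", ["Returns", "Pickups"]

# Drive a couple of turns:
state, reply, buttons = handle_turn("start", "hi")
state, reply, buttons = handle_turn(state, "Returns")
```

Multiply that by every intent and every platform, and the plumbing bill adds up fast.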
This complexity distracted from the real meat of enterprise sales: Salesforce and Zendesk integrations, easy onboarding, and appealing interfaces, to say nothing of security certifications, permissions, and all the rest. Otherwise known as: what our prospects actually saw and cared about when making a buying decision.
(The good news: monolithic pre-trained LLMs available via API now simplify much of this, though I suspect not entirely, as the error space remains quite wide.)
It complicated onboarding
Even once we had the buttons and a chat widget going, it still felt important to showcase natural language — otherwise, where was the differentiation?
Myra’s LSTM-based models needed many labeled examples of each customer problem to train on: at least hundreds, and preferably thousands. With that data in place, the models could understand all sorts of ways to say things, like we see in ChatGPT today. But practically, it also meant that:
Small startups were out: They didn’t have the ticket volume to train the models.
Middle market tech companies were in: They had the volume, the technical appetite, and modern customer care stacks, but they were also getting pitched by plenty of other startups and knew the importance of security.
In addition to learning this brand new technology space, we were also a small technical team learning enterprise sales, and enterprise sales turned out not to be an easy craft. For example, one of our first questions to a prospect was “can I have a huge amount of your customer data?” It wasn’t a good look for a tiny company to be asking large prospects for some of their most valuable data.
The times we were able to get the data, we’d spend hundreds of hours hand-labeling tickets to train the models. Often, the model still wasn’t as accurate out of the gate as we wanted. The fix? More data.
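To give a feel for that loop, here is a toy stand-in; it uses scikit-learn rather than Myra’s actual LSTM stack, and the tickets are invented, but the shape is the same: hand-labeled tickets in, intent classifier out.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

labeled_tickets = [  # in practice: hundreds to thousands per intent
    ("where is my refund", "returns"),
    ("can someone collect my package", "pickups"),
    # ...hundreds of hours of hand-labeling continue here...
]
texts, labels = zip(*labeled_tickets)
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)
print(model.predict(["I want my money back"]))  # not accurate enough? more data.
```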
Bots with simpler technology could get started instantly. Not as smart, sure, but just add a few phrases to look for and get started, and you’re off to the races.
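A hypothetical version of that simpler bot fits in a dozen lines and needs no training data at all:

```python
# No model, no training data: just a few phrases to look for per intent.
KEYWORD_INTENTS = {
    "returns": ["return", "refund", "send back"],
    "pickups": ["pickup", "pick up", "collect"],
}

def classify(text: str) -> str:
    text = text.lower()
    for intent, phrases in KEYWORD_INTENTS.items():
        if any(phrase in text for phrase in phrases):
            return intent
    return "unknown"  # fall back to a menu or a human

print(classify("I want to send back my order"))  # -> returns
```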
Success hinges on designing proactively for error
When regular software makes mistakes, it’s bad. When ML makes mistakes, it’s normal. “Siri sucks” is still a popular sentiment despite a decade-plus of Apple’s investment. The context matters, though. Suboptimal content recommendation, like on TikTok or Netflix, is not a huge deal. However, when the model needs to return a right-or-wrong answer, like when Siri hears you wrong or ChatGPT confidently hallucinates a fact, it’s frustrating.
For a B2B bot, those errors carry risk at every stage of the sale:
Demo risk: Will the bot say the wrong thing when a prospect asks an off-script question?
Pilot risk: Once the bot is live in a test, will end customers feel good about an interaction they can’t control?
Long-term adoption risk: After launch, will your client’s support staff lose whatever enthusiasm they had for your tool after dealing with customers angered by bot errors at the very moment they needed help?
Improving model quality helps mitigate these risks, but almost never to 100%. Closing the remaining gap comes from non-machine-learning features.
Start with the plumbing, add sophistication later
Ideally, your product delivers value with no AI at all, so resources can go toward easy onboarding and immediate value. Over time, you can gradually add AI with escape hatches built in (one is sketched after this list):
Show multiple predictions or variants. Let the user choose their favorite variant, or skip bad predictions.
Human-in-the-loop. Assume model output will be helpful, but wrong — how can the user best make use of that through editing?
The escape hatch. Can the user, in one click, move the AI out of the way when the prediction isn’t great? (The ‘talk to a human’ button.)
Hiding the bot. Instead of presenting as a conversational interface, should it look like a form until the AI is confident it can detect and provide value?
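As one concrete pattern, here is a minimal sketch of the confidence-gated ‘talk to a human’ hatch; the names and the 0.8 threshold are assumptions for illustration, not a prescription:

```python
from typing import Callable, Tuple

CONFIDENCE_THRESHOLD = 0.8  # an assumed tuning knob, not a magic number

def respond(text: str,
            predict: Callable[[str], Tuple[str, float]],
            answers: dict) -> dict:
    """Answer when confident; otherwise pull the escape hatch to a human."""
    intent, confidence = predict(text)
    if confidence < CONFIDENCE_THRESHOLD or intent not in answers:
        # Every hatch-pull doubles as labeled training/validation data.
        print(f"log for labeling: {text!r} -> {intent} ({confidence:.2f})")
        return {"reply": "Let me get a human to help.", "handoff": True}
    return {"reply": answers[intent], "handoff": False,
            "buttons": ["That helped", "Talk to a human"]}  # hatch stays one click away

# Toy usage with a stub predictor:
stub = lambda t: ("returns", 0.92) if "return" in t else ("unknown", 0.30)
print(respond("how do I return these shoes?", stub,
              {"returns": "Here's our returns page."}))
```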
The plus side: all these escape hatches implicitly provide training and validation data that lets you improve functionality while monitoring performance and efficacy. Fully automating pieces of the workflow becomes an option you can exercise in a glorious AGI future, not a hard necessity.
These learnings, paired with the new generation of technology, imply that the answer to the question of “Is a startup that wraps features around OpenAI’s APIs defensible?” is almost certainly yes. Using AWS Redshift or BigQuery on Google Cloud doesn’t threaten your startup’s defensibility; delivering value to customers has less to do with the technology itself and more to do with what you do with it.