The history of software design is a history of constraints.
For the last thirty years the Graphical User Interface defined how humans talked to machines. This era relied on a simple contract between the designer and the user. The designer predicted every single action a user might want to take. They built a specific button or menu for each one.
This model made analytics incredibly easy. If you wanted to know what a user wanted you just looked at what they clicked. A click on a Pricing tab was an explicit declaration of intent. You did not have to guess. The button was the intent.
Tools like Google Analytics and Mixpanel were built for this deterministic world. They turned structured clicks into neat funnels. They told you exactly where users dropped off because the path was linear.
Generative AI has broken this model completely.
We are shifting to the Conversational User Interface. In this new world there are no menus. There are no filters. There are no Add to Cart buttons. There is just a blinking cursor and an empty text box.
The user does not select a pre-defined option. They type a sentence. They might say I need to fix the error in this Python script or Draft a legal response to this email.
This shift empowers users but it blinds product teams. When intent moves from a fixed coordinate on a screen to an open-ended sentence your existing metrics stop working. You can no longer track success by counting events. You have to start understanding language.
The black box of the text box
Standard product analytics fail in GenAI because user behavior is hidden inside the text.
If you look at a chatbot session through a traditional tool every session looks identical. The user lands on the page. They type in the box. The server responds. The user leaves.
You might see that a session lasted ten minutes. In the world of web analytics a ten minute session is cause for celebration. It implies high engagement and stickiness.
But in the world of AI a ten minute session is often a disaster.
Consider two different users.
User A opens your internal coding assistant. They ask for a complex refactoring of a legacy codebase. The AI understands the context perfectly. It generates clean code. The user spends ten minutes reviewing the code and testing it. They leave happy.
User B opens the same assistant. They ask a simple question about a library. The AI gives a vague answer. The user asks again. The AI hallucinates a function that does not exist. The user gets frustrated. They type That is not what I meant and try a third time. They spend ten minutes wrestling with the model before giving up.
To a click tracking tool these two scenarios look exactly the same. They both show ten minutes of time on site. They both show high activity. But one user is a promoter and the other is a churn risk.
Recent industry data from 2025 suggests that up to 85% of AI projects fail to scale. The primary reason is not the quality of the model. It is the inability of product teams to distinguish between User A and User B. They see high usage numbers and assume success. They do not see the friction hidden in the text.
Why time on page is a vanity metric
In productivity tools efficiency is the goal. If a user opens your AI copilot to summarize a PDF you want that interaction to take thirty seconds. You do not want it to take ten minutes.
If your analytics dashboard celebrates increasing session duration you might be celebrating user struggle. A longer session often means the model failed to understand the prompt on the first try. It means the user had to spend time correcting the AI.
We need to stop measuring volume. We need to stop counting tokens and minutes. We need to start measuring value. Did the user get the answer they needed? Did they leave the session with a completed task or a new problem?
From explicit to implicit intent
The core difference between the old world and the new world is how users signal intent.
In the point and click world intent was explicit. A user clicked Pricing so you knew they cared about costs. A user clicked Cancel Subscription so you knew they wanted to leave.
In the conversation world intent is implicit. A user might type This response is too long. This is a formatting intent. A user might type Actually look at Q3 data not Q4. This is a correction intent. A user might type Are you sure about that number. This is a trust intent.
Traditional tools cannot read these sentences. They treat every input as a generic event. To understand GenAI users you need an analytics engine that uses Natural Language Understanding. This engine reads the conversation as it happens. It extracts these intents and clusters them into patterns.
This allows you to see the reality of your product. You can see that 20% of your users are trying to use the bot for legal advice even though it was designed for marketing. You can see that users in the Finance department are consistently frustrated with data accuracy. You move from guessing to knowing.
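To make the idea concrete, implicit intents can be surfaced even by classifiers far simpler than a full NLU engine. The sketch below is a toy rule-based tagger: the patterns and intent names are invented for illustration, not any real product's taxonomy, and a production system would use a trained language model instead of keywords.

```python
import re

# Toy keyword patterns per intent. These are illustrative stand-ins;
# real systems learn intents from data rather than hand-written rules.
INTENT_PATTERNS = {
    "formatting": [r"\btoo long\b", r"\bshorter\b", r"\bbullet points?\b"],
    "correction": [r"\bactually\b", r"\binstead\b", r"\bnot what i meant\b"],
    "trust": [r"\bare you sure\b", r"\bsource\b", r"\bcheck the math\b"],
}

def tag_intents(message: str) -> list[str]:
    """Return every implicit intent whose patterns match the message."""
    text = message.lower()
    return [
        intent
        for intent, patterns in INTENT_PATTERNS.items()
        if any(re.search(p, text) for p in patterns)
    ]
```

Aggregating these tags across sessions is what turns free text into the pattern counts described above.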
The three types of AI friction
In traditional software we optimized funnels. We wanted to smooth the path from Page A to Page B.
In GenAI there is no path. There is a loop. We need to optimize friction.
Friction in GenAI looks different than friction in a web app. It is cognitive friction. It is the gap between what the user expects and what the AI delivers.
We categorize this friction into three specific types that every Product Manager should track.
1. Blank Page Syndrome
This is the most common drop off point. The user opens the bot but does not know what to ask.
Adoption statistics from 2025 show that 42% of AI projects are abandoned partly due to complexity. Users stare at the blinking cursor. They feel the pressure of the infinite possibilities. They type Hello. They get a generic response. They leave.
This is a failure of onboarding. In a GUI the options are visible. In a CUI the capabilities are hidden.
Analyzing these drop off points allows you to build better starter prompts. You can guide the user. You can suggest Try asking me to analyze your Q3 spend instead of leaving them to guess.
2. The Rephrasing Loop
This is the clearest signal of model failure.
The user asks a question. The AI answers. The user asks again using slightly different words. The AI answers again.
This repetition is a scream of frustration. The user is saying You did not understand me so I will try saying it simpler.
Field studies show that satisfaction drops precipitously when users have to repeat themselves. Yet high active user numbers often mask this frustration. A traditional tool sees three messages and thinks the user is engaged. A semantic analytics tool sees three similar messages and identifies a loop.
Nebuly detects these loops automatically. We flag it as high friction. This allows your engineering team to inspect the conversation. They can see exactly where the retrieval failed or where the prompt instructions were unclear.
3. The Trust Gap
Trust is the currency of AI adoption. The Trust Gap occurs when the user gets an answer but does not believe it.
Research indicates that 82% of users are skeptical of AI outputs. However only 8% of users consistently check the sources.
This creates a dangerous middle ground. Users doubt the AI but they do not verify the work. They simply stop using the tool. This is silent churn.
You can identify this by tracking verification intents. When a user asks the AI to show its source they are signaling a lack of trust. When they ask the AI to check the math they are skeptical.
If you see a spike in these intents it means your model is not projecting confidence. You can improve citations. You can adjust the model to be less confident when it is unsure. You can design the interface to show the reasoning steps.
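One way to put a number on the Trust Gap is the share of sessions that contain at least one verification intent. The phrase list below is a stand-in for a real classifier, included only to show the shape of the metric.

```python
# Stand-in phrases for a real verification-intent classifier.
VERIFICATION_PHRASES = ("are you sure", "show your source", "check the math")

def verification_rate(sessions: list[list[str]]) -> float:
    """Fraction of sessions where the user questioned an answer."""
    if not sessions:
        return 0.0
    flagged = sum(
        1
        for session in sessions
        if any(p in msg.lower() for msg in session for p in VERIFICATION_PHRASES)
    )
    return flagged / len(sessions)
```

Tracked weekly, a rising verification rate is the spike in skepticism described above.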
The economic impact of bad analytics
The cost of failure in GenAI is high.
In the world of rule-based chatbots a failed query cost fractions of a cent. In the world of Large Language Models a failed session can cost real money.
An agent that gets stuck in a loop might make fifteen calls to a paid API like GPT-4. It burns through tokens. It increases latency. And it still fails to solve the user's problem.
This inverts the unit economics of software. A "power user" in a SaaS app is usually profitable. A "power user" in GenAI who is stuck in a loop is a cost center.
You need to move from measuring Cost Per Token to measuring Cost Per Outcome. If your agent costs fifty cents per run and has a 50% success rate your Cost Per Outcome is one dollar. If you improve the success rate to 90% your Cost Per Outcome drops to roughly fifty-six cents.
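The arithmetic above reduces to a single formula. A minimal sketch, assuming failed runs are simply retried so each success amortizes 1 / success_rate runs on average:

```python
def cost_per_outcome(cost_per_run: float, success_rate: float) -> float:
    """Expected spend per successful outcome.

    Each success amortizes the cost of 1 / success_rate runs
    on average, assuming failed runs are retried.
    """
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_run / success_rate
```

At fifty cents per run, a 50% success rate yields a one dollar Cost Per Outcome; a 90% success rate drops it to about fifty-six cents.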
You cannot optimize this if you cannot measure success. You need to know which intents are succeeding and which are failing. You need to know if the high costs are driving value or just driving frustration.
Structuring the unstructured
This is why we built Nebuly.
We recognized that the point and click analytics stack is becoming obsolete. Companies do not need another tool to track page views. They need a tool to translate language into data.
Nebuly sits between your users and your model. It listens to the messy stream of conversation. It uses specialized small language models to structure this data in real time.
It detects topics. You can see that 20% of your users are asking about the new HR policy. It detects sentiment. You can see that users are frustrated when asking about Payroll. It detects implicit feedback. You can see that users reject the first draft of code 40% of the time.
By converting conversation into structured metrics we allow you to manage your AI product with the same rigor as your web product. You can treat language as data.
The era of point and click is passing. The era of ask and receive is here. Your analytics need to speak the language.
You cannot manage what you cannot measure. If you are ready to see what your users are actually telling you we are here to help. Book a demo to see your data clearly.