When the internet first entered enterprises, many companies set up dedicated “internet teams” to build and run the corporate website. This specialized group picked the tech stack, managed content, and decided what went online. Over time, that model disappeared as internet technology became integral to every department’s work: marketing ran online campaigns, sales managed web leads, HR recruited via digital platforms. The responsibility for “internet” dissolved into the fabric of each function. Generative AI is following the same trajectory.
Today, in most organizations AI is still concentrated in a central team (often under IT or engineering). This central AI group chooses models, manages integrations, and tracks technical performance. Centralizing made sense in AI’s early stages for a few reasons: early deployments required niche machine learning skills, companies needed tight risk controls around sensitive data, and initial projects were experimental pilots easier to manage in one place. In other words, just as early web projects lived with the “internet team,” early AI efforts have been siloed with an “AI team.”
But we’re already seeing the shift. In the near future, AI will be embedded in every department’s tools and processes. Signs of this are emerging today: marketing teams spinning up AI-driven content generators, HR teams deploying onboarding copilots for new hires, operations managers using conversational assistants for routine processes, and sales reps relying on AI for research and lead qualification. Instead of a distant AI expert team handling all use cases, each function will integrate generative AI into its daily workflows. The central AI team won’t disappear – it will provide governance, best practices, and technical oversight – but ownership of AI’s day-to-day use and improvement will shift to the teams that use AI every day. This mirrors how IT departments now set web infrastructure standards while marketing, sales, and others run their own web initiatives. The chatbot or AI assistant may “sit” with an AI team today, but its future is shared across the organization.
One company, many AIs (the multi-model enterprise)
As AI decentralizes, enterprises will not standardize on a single model or vendor for all use cases. Just as individual consumers mix and match AI tools for different tasks (one might use OpenAI’s GPT-4 for creative writing, Anthropic’s Claude for brainstorming legal language, Google’s Bard/Gemini for multimodal research, etc.), companies will do the same across their departments. Each team will gravitate to the model or AI service that best fits its domain and requirements.
For example, an HR department might adopt Google’s Gemini assistant for its strong multimedia and translation capabilities in training modules, the legal team might prefer Anthropic’s Claude for contract analysis due to its focus on compliance and careful reasoning, and the marketing group might lean on OpenAI’s GPT-4 for creative campaign content generation. These choices depend on each team’s specific use cases, data sensitivity, and success metrics. The result is an organization running a mosaic of AI assistants – each tuned to its function’s needs.
Several forces are driving this multi-model landscape:
Broad AI adoption across functions
Companies are embracing AI at an unprecedented rate, making it natural for multiple solutions to sprout. According to McKinsey’s latest global survey, 78% of organizations report using AI in at least one business function, up from just 55% a year earlier. Generative AI in particular saw usage nearly double within 10 months to 71% of businesses. As marketing, customer support, finance, and other teams all build AI into their workflows, they demand tools tailored to their context. Each team, being closest to its domain challenges, wants more control and customization over “its” AI.
Falling costs and more model options
The cost of running advanced AI models has plummeted. Since late 2022, the price of using a GPT-3.5 level model dropped from about $20 to just $0.07 per million tokens – a 280× reduction in under two years. At the same time, open-source and “open-weight” models have rapidly improved, in some cases approaching the performance of closed APIs. In fact, the performance gap between the top AI models is shrinking year over year. Smaller models (with far fewer parameters) can now achieve tasks that two years ago required giant 500B+ parameter models. This means teams no longer need a big budget or big infrastructure to leverage AI; many can fine-tune modest models on their own data or build custom AI apps cheaply. According to an AWS generative AI adoption study, 58% of businesses plan to customize existing models and 55% plan to build AI applications using fine-tuned models trained on proprietary data. In practice, one department might spin up a fine-tuned open-source model for its needs while another buys access to a premium API – whatever gets the best results. Once teams start experimenting independently, the only way to know which AI solutions work best is to measure how people actually use them. Adoption and outcome data become the benchmarks for comparing value across these different approaches.
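To make the scale of that price drop concrete, here is a quick back-of-the-envelope comparison in Python using the per-token figures cited above; the monthly token volume is purely an illustrative assumption.

```python
# Price per million tokens for a GPT-3.5-class model, using the figures cited above.
price_late_2022 = 20.00  # USD per million tokens
price_today = 0.07       # USD per million tokens, roughly two years later

# Assumption for illustration only: a departmental assistant processing ~50M tokens/month.
monthly_million_tokens = 50

cost_then = monthly_million_tokens * price_late_2022
cost_now = monthly_million_tokens * price_today
print(f"~{price_late_2022 / price_today:.0f}x cheaper: "
      f"${cost_then:,.0f}/month then vs ${cost_now:.2f}/month now")
```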
Organizational changes supporting distributed AI
Companies are formally reorganizing to support AI everywhere. The same AWS study found that 60% of companies have already appointed a Chief AI Officer (CAIO) and another 26% plan to do so within a year. These roles, along with hiring of generative AI specialists in various departments, create a structure for more distributed AI ownership. Cross-functional teams (product managers, engineers, and domain experts together) are emerging to build AI features within departments. These groups iterate quickly and tailor AI to on-the-ground needs. For example, a product team in customer support might develop a custom AI agent to help resolve tickets faster, while a marketing content team fine-tunes an AI writer on brand-specific data. This distributed experimentation is healthy for innovation, but without coordination it can also lead to duplicated efforts and siloed learnings. Each team might be solving similar problems in parallel without sharing insights. A common measurement system – focusing on user adoption and outcomes – is needed so teams can learn from each other’s successes and failures. In essence, if each team is “doing its own AI thing,” a unified way to track and compare their AI’s performance is critical for the organization to maximize value and avoid reinventing wheels.
The net result of these trends will be AI embedded in every corner of the organization, with the central AI group acting more as a platform steward or center of excellence than a sole owner. This diffusion of AI brings huge opportunities for innovation and productivity – but also a new challenge: fragmentation of data and insight. When every department runs its own assistant or model, how can the company as a whole answer basic questions like “Which AI tools are actually working for us?” or “Where are users getting frustrated or dropping off?” or “What’s the ROI of our various generative AI initiatives?” Without a unified view, those answers remain elusive.
The fragmentation problem: Why we need cross-model analytics
When AI ownership spreads out, it becomes harder to see the full picture. In the early centralized model, the AI team could track usage and outcomes for the one or two chatbots or models it deployed. In a decentralized scenario, you might have five, ten, or more different AI systems in play across the business. Each likely comes with its own usage logs or vendor dashboard, but none of them alone shows how AI is performing enterprise-wide. Important patterns only emerge when you look across all AI tools in aggregate. This is why tracking adoption and user behavior across departments is the only way to truly spot where AI is thriving and where it’s underused (or causing issues).
Imagine trying to improve customer experience when your support team’s virtual agent uses one model, your website’s sales chatbot uses another, and internal teams use a mix of other AI assistants. You might see that one assistant is answering thousands of queries a week while another in a different department barely gets any use – a sign that one team’s tool provides more value or is easier to use than another’s. Or you might find that in a certain workflow, users consistently abandon an AI assistant halfway through the task, indicating friction or dissatisfaction. These insights are only visible if you have cross-model, cross-department analytics tying together all usage data.
Right now, most organizations lack this unified view. Each AI vendor might provide some metrics in its own silo, but there’s no “common dashboard” for all AI interactions happening in the business. The risk is flying blind: improving or troubleshooting one AI tool in isolation while missing bigger wins or failures elsewhere. A unified analytics layer for AI usage would serve as the shared source of truth that keeps everyone aligned. It would let a company answer questions like:
- What are users trying to do with our AI assistants? (e.g. Are employees mostly asking the HR bot policy questions? Are customers using the support bot for troubleshooting, account info, or something else?)
- Where do they succeed or fail? (Which requests are fulfilled well by the AI versus where does it often fall short, causing users to give up or escalate to a human?)
- Which features or use cases create real value? (Maybe the marketing content generator saves dozens of hours on blog drafts – indicated by high adoption – whereas a legal document summarizer is rarely used, indicating low value or poor UX.)
- Where do drop-offs and frustrations occur? (Do users tend to rephrase questions repeatedly, suggesting the AI didn’t get it right? Are there common points in a conversation or process where they get frustrated and abandon the AI?)
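As a minimal sketch of what such a common dashboard computes under the hood, assuming every assistant's interactions land in one shared log with a few standard fields (the field names and sample records below are illustrative, not a fixed schema):

```python
from collections import defaultdict

# Hypothetical unified log of interactions across all assistants, whatever model or vendor powers them.
interactions = [
    {"assistant": "hr-bot",      "intent": "policy-question", "abandoned": False},
    {"assistant": "hr-bot",      "intent": "policy-question", "abandoned": True},
    {"assistant": "support-bot", "intent": "troubleshooting", "abandoned": False},
    {"assistant": "support-bot", "intent": "account-info",    "abandoned": False},
]

# Per-assistant usage, abandonment, and the intents users actually bring to each tool.
usage = defaultdict(int)
abandoned = defaultdict(int)
intents = defaultdict(lambda: defaultdict(int))
for i in interactions:
    usage[i["assistant"]] += 1
    abandoned[i["assistant"]] += i["abandoned"]
    intents[i["assistant"]][i["intent"]] += 1

for bot, n in usage.items():
    top_intent = max(intents[bot], key=intents[bot].get)
    print(f"{bot}: {n} interactions, {abandoned[bot] / n:.0%} abandoned, top intent: {top_intent}")
```

The same handful of fields, logged consistently for every assistant, is enough to compare usage, intents, and drop-off rates side by side, regardless of which model sits behind each tool.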
With multiple AI systems in play, having this bird’s-eye view is critical. In one real example, a Nebuly client in the automotive industry found that usage of their generative AI assistant varied widely by region. Initially, leadership assumed that some regions just weren’t as interested in the tool. But by digging into the conversational analytics, they discovered a specific problem: the assistant’s performance for non-English queries was poor, leading to high failure rates for those users.
In practice, employees in Latin American offices were asking questions, getting poor answers, and understandably using the tool less. The fix was to improve the assistant’s multilingual support – which removed a major blocker to adoption. System-level technical metrics alone would not have uncovered this issue; only by pairing technical data with user-centric analytics did the team get the full picture of why one deployment succeeded while another struggled. This kind of insight – understanding why one team’s AI is delivering value while another’s isn’t – is exactly what cross-model user analytics provides.
Perhaps most importantly, a unified adoption dataset allows the business to compare the effectiveness and ROI of each AI tool. For example, if Marketing’s content AI is used 10× more often than Sales’ lead-qualifying AI, that tells you where value is being realized (and maybe which team might need help improving their solution). Or if one department’s custom fine-tuned model drives measurable productivity gains, while another team’s off-the-shelf AI sees low engagement, you can make informed decisions about where to invest further, which approaches to replicate or scale back, or whether to consolidate solutions. Once each department starts shaping its own AI, adoption data becomes the benchmark for judging which approaches actually deliver value. Without that data, every team is just guessing or relying on anecdotal feedback.
In short, a cross-model analytics layer turns a potential fragmentation nightmare into an opportunity. It gives you a way to oversee and optimize AI at the portfolio level across all the different models and assistants your company uses. Instead of siloed views and guesswork, you get a holistic understanding of AI’s impact on your business. Without it, you’re essentially in the dark – you might improve some AI tool in isolation while missing a chance to replicate its success elsewhere (or failing to notice a high-risk failure in another corner). With unified analytics, you gain the visibility to govern AI use company-wide: guiding best practices, reallocating resources to high-impact projects, and ensuring every AI initiative aligns with business outcomes.
Lessons from web, BI, and CRM analytics
If this need for a unified view sounds conceptually familiar, that’s because we’ve seen this movie before. Every time businesses adopted a transformative technology at scale, they eventually needed a unified analytics or management layer to make sense of it across the organization. Consider a few analogies:
Web Analytics (e.g. Google Analytics)
In the early days of websites, companies often had very basic metrics – perhaps a hit counter on each page or server logs analyzed in silos. Marketing had its web stats, product teams had theirs, and there was no easy way to get a cohesive view of user behavior online. The introduction of platforms like Google Analytics changed the game by providing one unified view of user behavior across an entire website (and across marketing channels driving traffic to that site). Suddenly, everyone could agree on a single source of truth for web metrics – which campaigns bring in traffic, how users navigate pages, where they drop off in a funnel. This unified approach was crucial for the web to mature into a core business tool. Today, using a web analytics solution is just part of doing business online; it’s hard to imagine running a major website without tracking user traffic and conversions (indeed, over 80% of websites use Google Analytics or similar tools for tracking). We need the same for AI assistants: a single lens on how users interact with all our different AI interfaces, not just siloed stats for each.
Business Intelligence (BI tools like Tableau, Power BI)
Large enterprises have dozens of databases and software systems – finance, supply chain, HR, sales, marketing, support, etc. In the past, each might produce separate reports, making it hard for leadership to get the full picture. Modern BI platforms aggregate data from across these silos, enabling cross-functional dashboards and analysis. This centralization of reporting means a company can correlate metrics that would be impossible to see in isolation (for example, seeing how customer service response time impacts customer satisfaction scores or revenue retention). In the same way, GenAI usage insights must be unified so companies can correlate AI usage with business metrics. For instance, does increased use of an internal AI knowledge assistant correspond to faster project delivery or fewer support tickets? Does the customer support chatbot deflect a meaningful percentage of inquiries from call centers (and thereby save costs)? Only a unified data approach lets you tie those threads together.
Customer Relationship Management (CRM systems like Salesforce)
Before CRMs, customer interactions were scattered. Sales might track leads in spreadsheets, support had a separate ticketing system, marketing had an email list, etc. Salesforce and its peers created a single system of record for all customer interactions across departments. This not only improved internal efficiency but provided management with a holistic view of the customer journey. Similarly, as every department starts interacting with users (whether customers or employees or partners) via AI, those AI interactions become part of the overall user experience and journey. We’ll need a central record or analytics for those AI-driven interactions. For example, how many times did a customer attempt self-service with the chatbot before contacting a human support agent? What kinds of questions are employees asking an internal HR AI assistant, and are they getting answers or escalating to managers? These are new interaction points that should feed into our analytics and CRM thinking, just as web clicks or support calls do.
In all these cases – web analytics, BI, CRM – the pattern was the same: early fragmentation and siloed efforts eventually gave way to unified platforms and “single source of truth” approaches as usage scaled. Generative AI inside the enterprise is reaching that inflection point now. Companies that pioneered some AI pilots in one team are now scaling AI across the org, and they’re running into the limits of siloed monitoring and ad-hoc measurement. It’s time to bring AI usage data together. In fact, Nebuly’s vision is that a unified analytics layer for AI will become as essential as web analytics is today. You wouldn’t run a mission-critical website without Google Analytics or an equivalent in place; likewise, in a few years it will be unthinkable to deploy dozens of enterprise AI and assistant tools without proper user analytics and feedback loops to understand how they’re performing.
Beyond observability: Tracking user behavior in conversations
Up to now, many AI teams have relied on technical observability tools or the models’ own logs to monitor their systems. These are valuable for what they do – ensuring the technical system is functioning correctly. For example, observability dashboards will track metrics like response latency, error rates, throughput, and infrastructure usage. This kind of monitoring is essential, especially early on, to debug and scale the system. If your chatbot’s response times spike or an API call fails, you need to know immediately.
However, once the AI assistant is live with real users, those system metrics cannot tell you whether the AI is actually helping people or driving business outcomes. A model could be fast, stable, and error-free and still fail to deliver value if, say, users don’t find its answers helpful or stop using it after a few tries. Technical metrics alone miss the human side of the equation.
This is where tracking user behavior in conversations becomes critical. It captures the interaction from the user’s perspective: Are users engaging with the AI or abandoning it? What are they asking for? Do they have to rephrase or repeat questions (a sign they didn’t get what they needed)? How often do they give up on the AI and switch to another channel (like calling support or asking a colleague)? Traditional product and web/app analytics tools (which excel at tracking clicks, page views, and form submissions) provide very little insight here; they were not built for the nuance of natural-language dialogue. A conversation isn’t a series of discrete events like page loads – it’s a back-and-forth with context, intent, and sometimes ambiguity.
Crucially, conversational AI introduces a whole new layer of behavioral data that standard analytics tools don’t capture. Measuring where in a dialogue a user gets frustrated, for example, requires understanding the content and flow of the conversation, not just an isolated UI event. If a user asks a question, gets an answer, then asks the same question slightly differently two more times, a human can infer “the first answer didn’t satisfy them.” But a generic event logger might just count three queries. If you optimize only for system metrics, you might end up making the AI respond a few milliseconds faster while missing the fact that users are unhappy with the answers. Conversely, if you only look at high-level business outcomes (e.g., support ticket volume), you might not realize an AI tool is underperforming until it’s reflected in those outcomes – by which time you may have lost user trust or wasted effort.
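To make the "three similar queries" example concrete, here is a minimal heuristic sketch that flags likely rephrasings by comparing word overlap between consecutive user messages. The threshold is an assumption to tune; in practice, embedding similarity or an LLM-based classifier usually works better than raw word overlap.

```python
def is_likely_rephrase(prev_query: str, query: str, threshold: float = 0.5) -> bool:
    """Flag a query as a probable rephrase of the previous one via word overlap.

    A generic event logger would count these as unrelated queries; treating them
    as rephrases surfaces the frustration signal described above.
    """
    prev_words, words = set(prev_query.lower().split()), set(query.lower().split())
    if not prev_words or not words:
        return False
    jaccard = len(prev_words & words) / len(prev_words | words)
    return jaccard >= threshold


# Example: a user asking for the same thing three times in one session.
session = [
    "how do I reset my VPN password",
    "how can I reset the VPN password",
    "VPN password reset is not working",
]
flags = [is_likely_rephrase(a, b) for a, b in zip(session, session[1:])]
print(flags)  # which follow-ups look like rephrasings of the previous query
```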
To truly measure success in GenAI deployments, organizations need to track a broader set of metrics that link to user experience and business value. One helpful framework is to think of metrics in three layers:
1. Behavioral signals:
How users interact with the AI day-to-day. This includes adoption rates (how many people are using it, how often), engagement patterns (e.g. average conversation length, number of turns per session), retry or rephrase rates (do users need to ask multiple times to get a good answer?), and drop-off points (where in a conversation or task users give up). You can also gather implicit satisfaction cues – for instance, if a user abruptly ends the session or continually rephrases a query, that implies frustration. These metrics reveal whether people are actually embracing the tool and where they encounter friction or confusion.
2. Operational signals:
How AI is affecting core processes. For example, is the AI actually resolving issues or just handing them off? Metrics here could be things like self-service success rate (what percent of inquiries the chatbot resolves without human handoff), average handling time (if AI is involved in a workflow, does it speed it up?), or internal metrics like faster project completion when using an AI copilot. These connect AI usage to efficiency improvements in the business.
3. Financial or business outcomes:
The highest-level results influenced by AI. This includes direct outcomes like revenue generated or costs saved due to AI, as well as broader KPIs like customer satisfaction scores or employee productivity measures. Ultimately, these tell you if the AI investment is paying off in tangible terms.
Most observability tools focus on technical and perhaps some operational metrics. Traditional product analytics focus mainly on the very top of the funnel (behavioral events in a UI, but not the content or success of those events).
Conversational AI spans all three layers: you need to understand the interaction, link it to process outcomes, and ultimately see the business impact. Without all three layers, it’s easy to optimize for the wrong thing – like boosting usage numbers without improving outcomes, or improving model accuracy without improving user satisfaction.
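As an illustration of how the three layers connect, here is a rough sketch computing one example metric per layer from a handful of logged interactions. The records, field names, and the cost-per-ticket figure are placeholder assumptions, not benchmarks.

```python
from statistics import mean

# Hypothetical logged interactions; in practice these come from your analytics store.
interactions = [
    {"assistant": "support-bot", "rephrased": False, "resolved": True,  "escalated": False},
    {"assistant": "support-bot", "rephrased": True,  "resolved": False, "escalated": True},
    {"assistant": "support-bot", "rephrased": False, "resolved": True,  "escalated": False},
]

# 1. Behavioral signal: how often users had to rephrase to get a usable answer.
rephrase_rate = mean(i["rephrased"] for i in interactions)

# 2. Operational signal: share of inquiries resolved without a human handoff.
self_service_rate = mean(i["resolved"] and not i["escalated"] for i in interactions)

# 3. Financial outcome: estimated cost avoided, using an assumed cost per deflected ticket.
ASSUMED_COST_PER_HUMAN_TICKET = 8.0  # placeholder value, not a benchmark
deflected = sum(i["resolved"] and not i["escalated"] for i in interactions)
estimated_savings = deflected * ASSUMED_COST_PER_HUMAN_TICKET

print(f"rephrase rate: {rephrase_rate:.0%}, self-service rate: {self_service_rate:.0%}, "
      f"estimated savings: {estimated_savings:.2f}")
```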
So how can teams get these missing AI user insights? This is an emerging area, and it requires purpose-built solutions. Companies need to instrument their AI assistants in a way that captures conversational interactions similar to how we instrument websites and mobile apps for user actions. This might involve logging each turn of a conversation (user question and AI answer), noting user reactions (did the user ask a follow-up? rephrase? click a thumbs-down button if provided?), and tagging outcomes (was the issue resolved or did it escalate? what was the user’s sentiment or feedback?). Doing this at scale and making sense of the data isn’t trivial – but that’s exactly the challenge that new platforms like Nebuly are tackling.
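Even before adopting a dedicated platform, the shape of that instrumentation can be sketched as a thin wrapper that records each turn plus the user's reaction to it. Everything here, from the `analytics_store` parameter to the field names, is a hypothetical stand-in for whatever logging backend you actually use.

```python
import json
import time
import uuid


def log_turn(analytics_store, session_id: str, assistant_id: str,
             user_message: str, ai_response: str,
             thumbs_down: bool = False, escalated: bool = False) -> None:
    """Record one conversation turn plus the user's reaction to it.

    `analytics_store` is a placeholder for your logging backend (a queue,
    a database table, or an analytics API); swap in the real call.
    """
    event = {
        "event_id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "session_id": session_id,
        "assistant_id": assistant_id,
        "user_message": user_message,
        "ai_response": ai_response,
        "explicit_feedback": "thumbs_down" if thumbs_down else None,
        "escalated_to_human": escalated,
    }
    analytics_store.append(json.dumps(event))  # stand-in for a real write or API call


# Example: an in-memory list standing in for the analytics backend.
store: list[str] = []
log_turn(store, session_id="abc123", assistant_id="hr-onboarding-bot",
         user_message="How many vacation days do I get?",
         ai_response="Full-time employees receive 25 days per year.")
```

From there, the harder problems are scale and interpretation: classifying intents, detecting rephrasing and frustration, and tying conversations to outcomes, which is where purpose-built tooling earns its keep.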
A unified analytics platform for GenAI (how Nebuly helps)
Nebuly is building what we like to call the “user intelligence layer” for generative AI products. In essence, it’s a user analytics platform for LLMs – a centralized solution to capture and analyze how users interact with any AI assistant or agent, regardless of which model or vendor is behind it. Think of it as analogous to Google Analytics, but for AI-driven conversations. Just as GA tracks events on your website or app, Nebuly tracks events in a conversation: user queries, AI responses, follow-up actions, and implicit feedback signals. The goal is to turn all that unstructured conversational data into actionable insights that product owners and business stakeholders can use.
What does this look like in practice? Nebuly connects to your AI applications (whether it’s a customer support chatbot, an internal copilot integrated in your software, a voice assistant, etc.) and automatically logs the interactions. It then provides analytics and dashboards on top of this data, so you can see things like:
- Usage and adoption: e.g. how many users engage with each assistant daily/weekly, how many total queries, what the user retention looks like (do people come back and continue using it?).
- Conversation patterns and outcomes: e.g. the most common intents or questions users have, the average dialogue length, where users tend to drop off in a conversation, and what percentage of conversations are successful (however success is defined for that use case).
- Implicit feedback and friction: e.g. how often users rephrase their queries (which might indicate the AI didn’t satisfy them initially), how often they switch to a human channel or escalate, sentiment analysis of user messages, or detection of frustration signals like repeated “help” requests.
- Comparative performance: if you have multiple bots or multiple versions of a bot, Nebuly can compare them side by side. For instance, if one team is A/B testing two different models or prompt strategies, the platform can show which variant yields higher user satisfaction or task completion rates.
- Trend and ROI analysis: tying usage metrics to business outcomes. For example, correlating a rise in chatbot self-service resolution with a drop in live agent support tickets, or measuring how much time an internal AI tool is saving employees on certain tasks (time that can be quantified in cost savings).
Crucially, Nebuly is model-agnostic and cross-functional. Your HR bot might be built on OpenAI, your finance analysis tool might run an open-source LLM on-prem, and your customer chatbot might use Google’s API – Nebuly aggregates their analytics into one place. This provides the common language of adoption data we discussed earlier: a way for different teams and stakeholders to align on what’s working and what’s not. The platform is designed to deliver insights that are useful not just for engineers, but for product managers, UX designers, and business leaders as well. In fact, Nebuly emphasizes cross-team value: it’s one platform delivering relevant insights for product, marketing, customer experience, and compliance teams, not only for AI developers.
For example:
- Product teams can get faster feedback on new AI features by seeing how users actually interact with them and where they get stuck, enabling a more rapid iteration cycle based on real behavior.
- Marketing teams (in a customer-facing context) can gain a deep understanding of user intent and pain points from analyzing chatbot conversations, informing content strategy or identifying gaps in self-service resources.
- Customer experience or success teams can identify where users get frustrated with AI help and prioritize those areas for improvement or additional training content.
- Compliance and risk officers can even use the analytics to spot potential issues – e.g. if users are frequently asking an AI about something that might lead to inappropriate advice, or if employees are entering sensitive data despite policies, those patterns can be caught.
By contrast, traditional observability tools are built mainly for engineering metrics and don’t offer this kind of rich user-centric insight. And traditional product analytics (web/app analytics) can’t parse an AI conversation meaningfully – a chat isn’t a series of button clicks or page loads. Even product analytics vendors have recognized this gap: for instance, Pendo (a product analytics company) recently introduced a beta feature called Agent Analytics to measure AI agent interactions, an acknowledgment that new approaches are needed. However, these retrofits are often limited in depth because they’re bolted onto tools not originally designed for conversational data. Nebuly, by contrast, is purpose-built from the ground up for conversational AI analytics.
The table below highlights how a GenAI user analytics platform like Nebuly differs from traditional analytics tools:

| Tool category | Primary focus | What it tells you about AI users |
| --- | --- | --- |
| Technical observability / LLM monitoring | Response latency, error rates, throughput, infrastructure usage | Confirms the system runs reliably, not whether it actually helps anyone |
| Traditional product / web analytics | Clicks, page views, form submissions | Very little: a conversation isn’t a series of discrete UI events |
| GenAI user analytics (e.g. Nebuly) | User queries, AI responses, rephrases, drop-offs, escalations, implicit feedback | Links conversational behavior to adoption, satisfaction, and business outcomes |
By deploying a platform like Nebuly, companies essentially gain Google-Analytics-like visibility into their AI usage. You can pinpoint which assistant is delivering high ROI and which ones are underperforming. You can measure employee engagement with a new internal AI tool and identify if additional training or UX improvements are needed – for instance, in one global manufacturer’s rollout of an internal coding assistant, Nebuly’s analytics revealed that many engineers weren’t phrasing queries in a way the model could handle, leading to low success rates. The fix was to provide a short prompt training and examples to those teams, after which usage and success metrics climbed significantly (resulting in measurable productivity gains). Without user analytics, the company might have assumed the model itself was insufficient, when in fact the issue was a solvable user education gap.
Nebuly also helps enforce governance and compliance in this multi-model environment. A central AI governance team can see, for example, if a certain department’s AI usage spikes unexpectedly (maybe indicating an unmanaged “shadow AI” tool being used), or if certain types of queries (like requests involving sensitive data or policy-related questions) are causing problems across different bots. They can then step in to investigate or set organization-wide guidelines. Essentially, Nebuly closes the feedback loop for AI deployments: it provides the data to answer “Is our AI actually helping users? Where and how should we improve it?” Instead of guessing or relying only on anecdotal reports, teams get concrete evidence from user behavior.
In effect, Nebuly is positioning itself as the category-defining GenAI user analytics platform – purpose-built to handle the complexity of conversational data at scale. It gives enterprises a way to turn raw conversational logs into structured insights, revealing user behavior patterns that purely technical metrics would never show. As one Nebuly message puts it: If you want to deliver business value in the age of AI, you need to understand your users – and that now means understanding how users interact with AI systems, not just with traditional software.
Looking ahead: AI as the new everyday tool
The trend is clear: Just as the internet moved from a specialist project to a pervasive utility in business, AI is moving from a siloed capability to everyday infrastructure. In a few years, every department will take for granted that some form of AI assistant or generative tool helps power its work – analogous to how every department today relies on software and connectivity. We’re nearing a future where saying “we have an AI team that handles all our AI” will sound as outdated as having an “internet team” does now.
To make this transition successful (and not chaotic), companies will need a combination of robust technical monitoring, clear adoption metrics, and a business value framework. Technical monitoring ensures the models run reliably, securely, and within guardrails. But adoption metrics and user analytics ensure the AI is actually delivering measurable impact, not just being available. Governance will also evolve: early on, centralized teams set guardrails; over time, those become organization-wide policies with each department accountable for using AI responsibly and effectively. A unified analytics layer becomes the compass that guides that responsibility – showing where to course-correct and where to double down.
In summary, as generative AI gets embedded across every department, organizations will inevitably juggle multiple models and AI tools. It’s a positive development – allowing each team to leverage the AI that fits best – but it comes with the challenge of fragmentation. The solution is to zoom out and treat AI usage and behavior data as a first-class domain to be measured and optimized, just like web traffic or sales pipelines. By implementing cross-model user analytics and sharing those insights widely, companies can ensure all their AI efforts are rowing in the same direction. They’ll be able to compare apples-to-apples across tools, learn quickly what works (and what doesn’t), and make informed decisions to drive maximum value from AI.
The businesses that master this – that turn AI user data into actionable insight – will have a massive advantage. They’ll improve their AI systems faster (because they actually know what’s happening in those systems), deliver better user experiences, and ultimately achieve stronger ROI on AI investments. In the end, successful AI adoption isn’t just about model performance or technical feats; it’s about user acceptance, effective usage, and real outcomes. As we enter a world of AI-in-every-department, understanding your users’ behavior with those AI systems isn’t a “nice-to-have” – it’s the key to delivering business value in the age of AI.