An intro piece to an upcoming series on how to collect and analyze user feedback in LLM-based products.
Companies are adopting Large Language Models (LLMs) at a rapid pace, with NLP accounting for almost 50% of daily Python data science library usage. User feedback is at the heart of every product's success, especially for products driven by LLMs. Capturing and acting on this feedback is the secret to transforming your product from good to exceptional.
At the heart of LLMs lies a well-known problem: hallucinations. Hallucinations happen when the language model gets a little too creative and starts making things up. Chip Huyen highlighted how companies like Dropbox, Langchain, Elastic and Anthropic consider hallucinations the main reason some businesses are not adopting LLMs in production.
Beyond hallucinations, LLM users face issues like high latency and inconsistency. For instance, because text generation is non-deterministic, LLMs can produce radically different responses to the same prompt.
With all these challenges, how can companies make sense of customer feedback and upgrade their LLM offerings?
Users can provide feedback in two ways: explicit or implicit. Each carries its own weight and surfaces different problems in different ways. Businesses need both to get the full picture.
Think of explicit feedback as a direct conversation with your LLM's users. It's feedback you can see and hear: upvotes and downvotes, reviews, or short surveys. Consider GitHub Copilot, a tool that assists developers while they code. GitHub Copilot was trained on a huge number of repositories, leveraging open-source code to learn the rules of programming and the conventions used across different projects. Imagine this tool receiving feedback that it underperforms on a particular programming language. This direct input empowers Microsoft to refine the product, benefiting both the end user and the company's bottom line.
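To make this concrete, here is a minimal sketch of how explicit upvote/downvote feedback could be captured and aggregated. All names (`FeedbackStore`, `record`, `satisfaction_rate`) are hypothetical, not part of any real product's API:

```python
from dataclasses import dataclass, field
from collections import Counter

@dataclass
class FeedbackStore:
    """Minimal in-memory store for explicit thumbs-up/down feedback on LLM responses."""
    votes: Counter = field(default_factory=Counter)

    def record(self, response_id: str, vote: str) -> None:
        # Each vote is tied to a specific model response so it can be traced back later.
        if vote not in ("up", "down"):
            raise ValueError("vote must be 'up' or 'down'")
        self.votes[vote] += 1

    def satisfaction_rate(self) -> float:
        # Fraction of votes that are positive; 0.0 when no votes have been recorded.
        total = self.votes["up"] + self.votes["down"]
        return self.votes["up"] / total if total else 0.0

store = FeedbackStore()
store.record("resp-1", "up")
store.record("resp-2", "down")
store.record("resp-3", "up")
print(store.satisfaction_rate())
```

In a real product the votes would go to a database and be segmented by model version, feature, or (as in the Copilot example) programming language, so that a drop in satisfaction can be pinned to a specific area.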
However, explicit feedback has its downsides. First, most users are not incentivized to provide feedback, so very few do. Second, the users who do provide it are often a biased sample: typically only the very satisfied or the very angry speak up, which can skew decision-making.
Implicit feedback is where the silent majority of LLM users speaks. By analyzing the user experience, such as the topic and sentiment of input prompts, repeated queries, and similar signals, companies can measure user satisfaction without users saying a word. These subtle clues, once decoded, offer a wealth of insights that are instrumental in improving an LLM's weak points.
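As one example of such a signal, repeated or rephrased prompts within a session often suggest the first answer missed the mark. The sketch below flags consecutive prompts with high token overlap as likely retries; the function names and the 0.5 similarity threshold are illustrative assumptions, not an established metric:

```python
def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity between two prompts."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def retry_rate(session_prompts: list[str], threshold: float = 0.5) -> float:
    """Share of consecutive prompt pairs that look like retries/rephrasings."""
    pairs = list(zip(session_prompts, session_prompts[1:]))
    if not pairs:
        return 0.0
    retries = sum(1 for a, b in pairs if jaccard(a, b) >= threshold)
    return retries / len(pairs)

session = [
    "summarize this contract",
    "summarize this contract in bullet points",  # likely a retry of the first prompt
    "what's the weather today",                  # unrelated new request
]
print(retry_rate(session))
```

A rising retry rate across sessions is a wordless complaint: users are asking again because the first response was not good enough. In practice one would use embeddings rather than raw token overlap, but the idea is the same.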
At Nebuly we want AI-driven companies to succeed at building LLMs customers love. We believe capturing real user feedback is the way to go. Since systematically capturing and analyzing user feedback is hard, we're building an LLM analytics platform that helps you build LLM products users love. We will also soon publish in-depth posts about implicit feedback and how to analyze it, so stay tuned!