AI hallucinations
Research conducted by OpenAI found that its own ChatGPT models hallucinate between 33% and 48% of the time on certain benchmarks. In AI parlance, a hallucination is what you and I might call a mistake.
But hallucinations are not simple mistakes. They are a byproduct of how AI models learn, and they become mistakes only when humans fail to spot and correct them. OpenAI's research also shows that hallucinations are likely to increase as more advanced versions of AI models are released.
AI metrics
How do you solve this problem? Put measurements in place to identify and manage hallucinations. The key is to select the appropriate AI metrics for the task at hand. Most of the key AI metrics are built on four machine learning variables.
- True Positive (TP): A correct positive prediction. For example, an AI agent is assigned to separate legitimate emails from spam. Of 100 legitimate emails, the AI correctly identifies 90 as legitimate but wrongly flags 10 as spam; the TP rate is 90%.
- True Negative (TN): A correct negative prediction. For example, AI reads a diagnostic test to identify individuals who do not have a particular condition or disease. Of 100 people who are negative, the AI correctly identifies 90 as negative but flags 10 as positive. The TN rate is 90%, and the 10 flagged people may receive unnecessary treatment for a condition they don't have.
- False Positive (FP): An incorrect positive prediction. In the email scenario, the 10 legitimate emails wrongly flagged as spam give an FP rate of 10%.
- False Negative (FN): Also called a Type II error or missed detection, an FN occurs when a model predicts a negative outcome when the reality is positive. For example, a medical AI misses a cancerous tumor (predicts negative when it's positive), or a security system fails to detect a cyber threat. If 5 of 100 real cyber threats are classified as benign, the FN rate is 5%, and the consequences could be severe.
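These four counts can be tallied directly from predictions. Here is a minimal sketch using hypothetical spam-filter results that mirror the email example above (the 20 spam emails are an assumed figure, added for illustration):

```python
# Labels: True = spam, False = legitimate email.
actual    = [True] * 20 + [False] * 100                # 20 spam, 100 legitimate (illustrative)
predicted = [True] * 20 + [True] * 10 + [False] * 90   # all spam caught; 10 legitimate wrongly flagged

tp = sum(a and p for a, p in zip(actual, predicted))          # spam correctly flagged
tn = sum(not a and not p for a, p in zip(actual, predicted))  # legitimate kept in the inbox
fp = sum(not a and p for a, p in zip(actual, predicted))      # legitimate wrongly flagged as spam
fn = sum(a and not p for a, p in zip(actual, predicted))      # spam that slipped through

print(tp, tn, fp, fn)  # 20 90 10 0
```

Every metric below is computed from these four numbers.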
You can accomplish a lot with an understanding of these four variables: not only preventing major mistakes, but realizing significant benefits. To achieve results with AI in business, here are the key AI metrics, along with use cases that demonstrate what can be accomplished.
5 AI Metrics to manage AI hallucinations
1. Accuracy. Is AI doing its job?
Why it matters: Accuracy shows how often AI gets things right.
Use Case: A major bank used AI to spot fraudulent credit card transactions. By improving accuracy by just 2%, it saved over $50 million annually by stopping fraudulent charges before they hit customer accounts.
Formula: Accuracy = (TP+TN) / (TP+FP+FN+TN)
Even small improvements in accuracy can create outsized results.
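The accuracy formula translates directly into code. A minimal sketch, using the counts from the hypothetical spam example above:

```python
def accuracy(tp, tn, fp, fn):
    """Accuracy = (TP + TN) / (TP + TN + FP + FN)."""
    return (tp + tn) / (tp + tn + fp + fn)

# Counts from the spam example: 20 spam caught, 90 legitimate kept, 10 wrongly flagged.
print(round(accuracy(tp=20, tn=90, fp=10, fn=0), 3))  # 0.917
```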
2. Precision and Recall. Are you catching critical AI hallucinations?
Why it matters: It’s not enough for AI to be right sometimes—you want it to be right where it counts.
Use Case: A hospital system introduced AI to detect early signs of sepsis. High recall meant the AI flagged nearly all at-risk patients, which reduced mortality rates by 17%. On the other hand, too many false alarms (low precision) would overwhelm doctors. The sweet spot is balancing both.
Formula: Precision = TP / (TP + FP) and Recall = TP / (TP + FN)
When lives or revenues are at stake, missing the right signal can cost dearly.
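Both formulas are one-liners in code. The sepsis-style counts below are assumptions for illustration, not figures from the hospital case:

```python
def precision(tp, fp):
    """Of everything flagged positive, how much was truly positive?"""
    return tp / (tp + fp)

def recall(tp, fn):
    """Of everything truly positive, how much was flagged?"""
    return tp / (tp + fn)

# Hypothetical sepsis screen: 95 at-risk patients flagged, 5 missed, 40 false alarms.
print(round(precision(tp=95, fp=40), 3))  # 0.704
print(recall(tp=95, fn=5))                # 0.95
```

High recall (few missed patients) comes at the cost of lower precision (more false alarms), which is exactly the tradeoff the use case describes.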
3. F1 Score. Is Accuracy aligned with business needs?
Why it matters: This metric is a way of asking, “Is our AI practical in the real world?”
Use Case: An e-commerce giant used AI to recommend products. High accuracy looked good on paper, but customers ignored irrelevant suggestions. So the team balanced the model's precision and recall (measured by F1 score). Click-through rates rose by 35%, and revenue per visitor increased by 18%.
Formula: F1 Score = 2 * (Precision * Recall) / (Precision + Recall)
It’s not just about being right, it’s about being right in ways that drive outcomes.
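The F1 score is the harmonic mean of precision and recall, so it only scores high when both are high. A sketch, reusing the hypothetical precision and recall values from the sepsis example:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

print(round(f1_score(0.704, 0.95), 3))  # 0.809
```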
4. Cost per Prediction. Is it scalable?
Why it matters: Every AI prediction (a chatbot answer, a fraud alert, a product recommendation) costs computing power. At scale, those pennies add up.
Use Case: A logistics company built an AI route optimization system. By cutting cloud costs by 40% without sacrificing performance, they saved $6 million annually.
Formula: Cost per Prediction = (Total AI Operational Costs) / (Total Number of Predictions)
Tracking cost ensures AI is sustainable as usage grows.
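The arithmetic is simple, but tracking it per month makes cost drift visible. A sketch with hypothetical figures (the dollar amounts and prediction volumes below are assumptions, not from the logistics case):

```python
def cost_per_prediction(total_cost, total_predictions):
    """Cost per Prediction = Total AI Operational Costs / Total Number of Predictions."""
    return total_cost / total_predictions

# Hypothetical month: $120,000 in cloud spend across 60 million predictions.
print(cost_per_prediction(120_000, 60_000_000))  # 0.002 -> $0.002 per prediction
```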
5. Return on Investment (ROI). The ultimate measure of success
Why it matters: AI isn’t about fancy models—it’s about delivering value.
Case in point: Coca-Cola used AI to personalize marketing campaigns. The result? Campaign ROI improved by 4x, with significantly higher engagement and sales lift. Another example: An insurance provider reduced claims processing time by 80% through AI automation, saving tens of millions of dollars per year.
Formula: ROI = (Total Return – Total Cost) / Total Cost × 100%
If you can’t tie AI back to ROI, it’s just an experiment.
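The ROI formula in code, with hypothetical figures chosen to illustrate the 4x-style return mentioned above (the dollar amounts are assumptions, not Coca-Cola's numbers):

```python
def roi_percent(total_return, total_cost):
    """ROI = (Total Return - Total Cost) / Total Cost * 100%."""
    return (total_return - total_cost) / total_cost * 100

# Hypothetical: $5M in returns on a $1M AI investment.
print(roi_percent(5_000_000, 1_000_000))  # 400.0 -> a 4x return on cost
```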
Does this explain AI hallucinations to you? Do these AI metrics help you understand how to manage them and see results? Are you ready to implement an AI measurement plan, one that achieves ROI?