AI: A Trust Issue

Apr 28, 2024 | 2 min read

How can an intelligent agent be trusted? How can I tell if artificial intelligence is reliable? 

We must consider questions such as: How can we help users establish that trust? How can we, as the people who build this technology, trust it ourselves? And what does it mean to "trust" an artificially intelligent technology in the first place?

With our intelligent agents we are entering a new area of computing, because dealing with probability and uncertainty in work software is unfamiliar to many users. Computer systems are usually expected to be precise machines that are always right. Most users have encountered probability in speech recognition, advertising, and weather forecasts, but not in their professional software. It is worth emphasizing that all of the information an agent collects is genuinely available to the public via the intranet or the internet, and through careful training we have taught our models to distinguish between reliable and dubious sources. Nevertheless, our agents, and artificial intelligence in general, deliver results that have a certain probability of being right or wrong.

Precision and Recall

As we explain, there is no one-size-fits-all answer to the trust question, because people's faith in AI differs with their viewpoint and the situation. Our developers, like other machine learning vendors, build systems for tasks such as pattern recognition, classification, and automated assessments. They measure a system's performance and dependability with statistical metrics such as "precision" and "recall." Precision is the probability that a positive prediction is actually positive; in other words, the results our agent recommends really are relevant results. Recall is the probability that an actually positive item is predicted as positive; in other words, our agent found all the relevant results in the entire data set. The higher these two values, the more dependable and trustworthy the system. This is how we guarantee the quality of our agents during development.
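
To make these two metrics concrete, here is a minimal sketch in Python. The company names and numbers are purely hypothetical; it illustrates the standard definitions, not our production evaluation code.

```python
def precision_recall(suggested, relevant):
    """Return (precision, recall) for two collections of company identifiers."""
    suggested, relevant = set(suggested), set(relevant)
    true_positives = len(suggested & relevant)  # correct suggestions
    precision = true_positives / len(suggested) if suggested else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical run: the agent suggests 4 companies, 3 of them relevant,
# out of 6 relevant companies that exist in total.
suggested = {"Acme", "Borel GmbH", "Candor Ltd", "Dither Inc"}
relevant = {"Acme", "Borel GmbH", "Candor Ltd", "Elan SA", "Fugu KK", "Grau AB"}
print(precision_recall(suggested, relevant))  # (0.75, 0.5)
```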

Subjective Precision and Subjective Recall

  • Our users typically arrive at a different answer to the trust question. We encounter the trust challenge, for instance, when our clients give us the task of supplier search. Our Company Identification Agent mimics a human search: based on specific customer requests, it performs fast, requirements-specific supplier searches using data from the public web. From thousands of candidate firms, the agent selects those that offer the required feature set and are located in the designated region. The question we must answer is: "How can I trust the agent to find me all the relevant companies?" 

  • After some more in-depth discussions, we concluded that the first interactions with the agent have a significant influence on trust. Trust is high when the agent lists businesses the user already knows; it is low when the agent fails to mention them. "If the agent failed to find even the companies I know, how can the rest of the list be any good?" users ask. Naturally, this "recall" metric is subjective: it is less about recalling all of the companies that exist and calculating the proportion the agent found than about the agent identifying the businesses our users have in mind. To them this makes a lot of sense, because no one can possibly know every company that exists, so the only reference data customers have is subjective. (The small numerical sketch after this list illustrates the gap.)

  • When it comes to precision, we observe another phenomenon. When the agent presents its findings, users judge false results by how obvious the error would be to a human, in other words: "How easy is it for a human user to see that this result is incorrect?" or "How stupid was the agent to show this false result?" Because the agent evaluates and links data in a fundamentally different way, there are indeed erroneous results that are very easy for humans to spot, for instance a supplier agent returning a record that is obviously unrelated to the supplier. The agents, however, typically struggle with exactly these judgments because they lack contextual knowledge about the world. 

  • Unfortunately, a single extremely "stupid" result among a number of otherwise excellent ones significantly damages trust in the agent. Humans really do follow the reasoning: "If one result is that stupid, how can the agent be right about the other results?"

  • In both situations, then, recall and precision become subjective metrics that make it difficult to judge how much faith to place in an agent. Because of these unfounded first impressions, people may be reluctant to trust an agent that performs well. Conversely, users may trust an agent that does not generally perform well but was "lucky" enough to produce the desired result. 
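
The gap between objective and subjective recall can be made concrete with a small calculation. The numbers and company names below are purely hypothetical; they only illustrate why the same result list can look very different to an evaluator and to a user.

```python
# All 20 truly relevant suppliers (in reality nobody knows this full set).
all_relevant = {f"Supplier {i:02d}" for i in range(17)} | {"Acme", "Borel GmbH", "Candor Ltd"}

# The agent returns 12 relevant companies, but misses two the user knows well.
agent_results = {f"Supplier {i:02d}" for i in range(11)} | {"Acme"}

# The subjective reference set: the three companies this user has in mind.
user_known = {"Acme", "Borel GmbH", "Candor Ltd"}

objective_recall = len(agent_results & all_relevant) / len(all_relevant)  # 12/20 = 0.60
subjective_recall = len(agent_results & user_known) / len(user_known)     # 1/3  ≈ 0.33

print(objective_recall, subjective_recall)
```

In this hypothetical case the measured recall of 60% tells a reasonable story, while the user, anchored on the two known companies the agent missed, experiences a recall of one in three and loses trust in the whole list.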

Establishing Credibility 

We were discussing this phenomenon at a recent conference when one of the attendees offered a remedy: everyone, including our users, should become knowledgeable about test procedures, statistics, and related topics, and we should all rely on objective facts. Although this sounds like a wonderful idea, it is hard to put into practice, because not everyone can become a statistician and keep their emotions out of the judgment. And even if we could, we would often still need to examine the test data carefully to understand what the results imply for the specific use case users have in mind. 

As a result, we recommend a more practical strategy:

We share and explain the method the agent uses, so that users better understand the results and their quality, and so that they can design the agents more effectively.

We also examine each individual incorrect finding in detail and apply these insights to the agent's overall behavior. Unfortunately, this approach has its limits, particularly for deep learning models: the models and data sets become so complex that it is difficult to explain an individual outcome. With our patent agent, for example, we cannot give a detailed explanation of the reasoning behind each predicted classification. In addition, there are cases where our secret recipe, which we find difficult to divulge, is our primary intellectual property. 

For this reason, we also advise taking a very practical view of the agents' output and checking whether the agent actually helps with, contributes to, or completes the work as a whole. Since an agent merely supports a task and we are not bound by its conclusions, we do not always need to trust it completely. False results that are easy for people to spot are typically not a major problem; false results that are hard for human intellect to detect are far more serious. 

We go on to discuss the agent's limitations and agree with customers that they should not base their opinion on a single interaction. To test an agent's abilities, we advise users to give it a variety of tasks. As with a human coworker, we need repeated encounters to gain a deeper understanding of its strengths and the areas where it performs well. 

Lastly, we explain how artificial intelligence differs from human intelligence in terms of experience and judgment, and we therefore advise against rating the agents' intelligence on a human scale. An agent may make an apparently stupid error and still do well in other situations, and vice versa. 

What comes next in fostering trust in the agents? 

As you can see, many of the strategies above require us, the human engineers and designers of the agents, to help users develop trust. This works, but it is a difficult process to scale. We are therefore working to enable the agents themselves to help users gain experience and deal with trust in intelligent technology in a professional way. 
