Gaffes by 'Ask Jamie' highlight need to properly evaluate chatbots. Start by asking the user
Poor "Ask Jamie". Despite working tirelessly for years to help answer queries on several dozen government websites, the chatbot swiftly became the butt of online jokes and criticisms after it emerged that it was dispensing inappropriate advice for Covid-19.
Users who had asked the chatbot on the Ministry of Health’s website what to do if a family member tested positive for Covid-19 were told to “practise safe sex through the correct use of condoms, or abstinence, for at least the whole duration of your female partner’s pregnancy”.
Others who asked where to buy antigen rapid test kits were instead told that polio vaccines were available at local polyclinics.
The embarrassing gaffes prompted the ministry to temporarily disable the chatbot on its site earlier this month.
Ask Jamie’s gaffes are mild compared with the controversies that other chatbots have been embroiled in.
For instance, some users of Replika, a chatbot app designed to “ease loneliness”, found that it could be easily manipulated into encouraging suicide.
“Tay”, arguably the most infamous chatbot in recent memory, was corrupted by Twitter users in less than 24 hours and spewed racist and misogynistic remarks on the social media platform before it was taken down.
The problems with chatbots like Ask Jamie, Tay or Replika are fairly well known.
The better question is why they remain widely used despite underwhelming results and poor user experience.
The Government’s chatbot service has, in fact, been deemed so successful that Ask Jamie is being upgraded to a more sophisticated version, codenamed Ask Emma.
In my view, this stems from a “digital fait accompli” that corporations and government agencies have imposed on consumers and the public at large.
With rare exceptions, these entities unilaterally decide which tech solutions are introduced or get to be retained in service, often for a long time, without fully accounting for public feedback.
Should this trend go unchecked?
Do these new technologies really address the public’s needs, or are they just expensive, high-tech window dressing?
These are issues worth debating as the Government’s Smart Nation initiative gathers pace in the coming years.
MANY CHATBOTS, SAME PROBLEM
Some 70 versions of the Ask Jamie chatbot have been deployed on government websites in Singapore.
Most of us would have encountered chatbots of different builds and designs when dealing with banks, hotels or telcos.
Under the hood, however, these chatbots mostly fall into three broad categories.
The first and oldest type is the rules-based chatbot, which is bound by scripted answers that vary according to a set of explicit rules set by the developers.
The second and increasingly popular type is the so-called artificial intelligence (AI) chatbot, which is based on neural networks and trained on large amounts of conversational data.
The third type of chatbot is a hybrid of the first two, where rules are used to constrain the AI chatbot’s often unpredictable responses.
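To make the distinction concrete, here is a minimal sketch of the first and third types. Everything in it is hypothetical: the rules, the placeholder answers, the blocked terms and the `model` argument (a stand-in for an AI chatbot) are invented for illustration and do not describe how Ask Jamie or any real product is built.

```python
import re

# Hypothetical rules: each pairs a keyword pattern with a scripted answer.
# The answers are placeholders, not real government advice.
RULES = [
    (re.compile(r"\b(antigen|test kits?)\b", re.I),
     "[scripted answer: where to buy test kits]"),
    (re.compile(r"\bpolio\b", re.I),
     "[scripted answer: polio vaccination]"),
]
FALLBACK = "[scripted answer: sorry, please rephrase or call the hotline]"

# Hypothetical terms that the developers never want the bot to mention.
BLOCKED_TERMS = ("pregnancy",)


def rules_based_reply(query: str) -> str:
    """Type 1: return the first scripted answer whose rule matches."""
    for pattern, answer in RULES:
        if pattern.search(query):
            return answer
    return FALLBACK


def hybrid_reply(query: str, model) -> str:
    """Type 3: let a model answer, but apply a rule to its output.

    `model` is any callable that maps a query to a reply; it stands in
    for an AI chatbot, which this sketch does not implement.
    """
    candidate = model(query)
    if any(term in candidate.lower() for term in BLOCKED_TERMS):
        # The rule overrides the model and falls back to the script.
        return rules_based_reply(query)
    return candidate
```

Note that the rules-based bot can only be as good as the rules its developers thought to write, while the hybrid bot is only protected against the failures its developers thought to block.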
What’s common to all chatbots, regardless of their technical architecture, is that they reduce semantic understanding to a mathematical operation of varying complexity.
In other words, the chatbot doesn’t actually know what “Covid-19” means in the human context.
Rather, it understands “Covid-19” as part of a mathematical representation of the sequence and structure of words before and after the term “Covid-19”.
As such, a chatbot will keep repeating responses that we deem inappropriate until it is programmed to stop doing so, simply because it has no notion of what’s socially acceptable to say or not.
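A toy example shows how matching on numbers rather than meaning can go wrong. The sketch below scores a query against two stored questions using simple word counts and cosine similarity. This is a deliberately crude stand-in chosen for brevity: the stored questions and labels are invented, and it is not a description of Ask Jamie, whose internals are not public in this article. Production chatbots use far richer representations, but the point carries over: the score reflects word patterns, not understanding.

```python
import math
import re
from collections import Counter


def to_vector(text: str) -> Counter:
    """Represent a sentence as counts of its lowercase word tokens."""
    return Counter(re.findall(r"[a-z0-9-]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


# Hypothetical stored questions mapped to hypothetical answer labels.
FAQ = {
    "what to do if a partner tested positive for flu": "flu_advice",
    "covid-19 isolation rules for households": "covid_isolation",
}


def best_match(query: str) -> str:
    """Return the label of the stored question with the highest score."""
    q = to_vector(query)
    best_question = max(FAQ, key=lambda k: cosine(q, to_vector(k)))
    return FAQ[best_question]


query = "What to do if a family member tested positive for Covid-19?"
print(best_match(query))  # flu_advice
```

The query shares eight words with the flu question but only two with the Covid-19 question, so the bot confidently picks the wrong topic. Nothing in the arithmetic knows that "Covid-19" is the word that matters.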
Companies and government agencies that use chatbots in public interactions can never fully avoid the risk of an occasional “chatbot fail”, as it is impossible to anticipate all the new ways that users might interact with these bots, or how the bots might react when entirely new terms or phrases come into play.
I think it is fair to say that most of us can accept a certain amount of risk and error as part of the adoption process of new technology.
But the rewards have to be worth the risks.
In Ask Jamie’s case, has it actually been successful in addressing the public’s needs for more accurate information on government policy?
What are the metrics for assessing its performance, and who decides which ones matter most?
ASK CONSUMERS, NOT JAMIE
Ask Jamie has been in active service for several years after being conceptualised in 2014.
But I have not been able to find a thorough evaluation of its performance or a survey that sheds light on whether the public is actually satisfied with the chatbot.
A case study published by Sabio, one of the tech vendors associated with the project, claimed that Ask Jamie had answered 15 million queries in the five years since it was launched.
The chatbot also helped to reduce, by up to 50 per cent, the volume of enquiries that would have previously gone to call centres.
But are these the right measures for success? A chatbot is, by design, supposed to handle large numbers of queries.
Handling a large number of queries also doesn’t necessarily mean that they were accurately or satisfactorily answered. How many users gave up and left in frustration?
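The gap between volume and quality can be sketched in a few lines. The example below assumes a hypothetical session log with three fields; the field names and the four sample sessions are made up purely for illustration and are not Ask Jamie data.

```python
from dataclasses import dataclass


@dataclass
class Session:
    answered: bool   # the bot returned some reply
    resolved: bool   # the user confirmed the reply solved the problem
    abandoned: bool  # the user left without a resolution or escalation


def summarise(sessions):
    """Compare a volume metric with two outcome metrics."""
    total = len(sessions)
    if total == 0:
        return {"queries_answered": 0,
                "resolution_rate": 0.0,
                "abandonment_rate": 0.0}
    return {
        "queries_answered": sum(s.answered for s in sessions),
        "resolution_rate": sum(s.resolved for s in sessions) / total,
        "abandonment_rate": sum(s.abandoned for s in sessions) / total,
    }


# Invented sample: every query is "answered", but only one is resolved.
sample = [
    Session(answered=True, resolved=True, abandoned=False),
    Session(answered=True, resolved=False, abandoned=True),
    Session(answered=True, resolved=False, abandoned=True),
    Session(answered=True, resolved=False, abandoned=False),
]
report = summarise(sample)
```

In this invented sample the bot "answered" all four queries, yet only a quarter were resolved and half the users gave up. A headline count of queries handled would hide both figures.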
More importantly, we should take a step back and ask: What is the real pain point that needs addressing?
To me, the core problem is with bad website design rather than Ask Jamie’s gaffes.
Most government websites are hard to navigate because they are cluttered with non-essential items, and the information flow is not designed with ordinary users in mind.
The search function on these sites is mostly relegated to a tiny corner, and the results are often poorly displayed.
If these agencies want to help users find answers faster (and hence reduce the number of hotline calls), they are better off redesigning their sites to make sure key information is presented well and easily searchable.
Adding a chatbot to an already bloated website just won’t address the users’ needs.
Bottom line: Tech/AI is not a panacea, even as adoption of these products and services grows in the coming years.
The default response to every problem should not be “more tech”, or “new tech”, especially given the costs involved in these projects.
Neither should the process be a fait accompli where the public has no say, particularly with solutions that might not work as well as initially thought.
Clear and transparent evaluations of these tech products would go a long way in improving the public’s confidence in such services.
So even as the Government’s engineers try to fix the Ask Jamie chatbot service, they shouldn’t forget another more important task: Ask the consumers.
ABOUT THE AUTHOR:
Chua Chin Hon is lead of artificial intelligence strategy and solutions for the Mediacorp News Group. He was formerly a supervising editor at TODAY and before that, bureau chief in China and the United States with The Straits Times.