In Calligo’s latest Beyond Data podcast, co-hosts Sophie Chase Borthwick and Tessa Jones are joined by Alexander Visheratin, Artificial Intelligence Engineer at Beehive AI. Here we explore some of the episode’s highlights: the importance of Natural Language Processing (NLP) and the pros and cons of the output produced by models like OpenAI’s ChatGPT.

“It can do anything, because it was trained on everything”

NLP models like ChatGPT are changing the way we search for data online. But if you average everything, the output will necessarily be average. And we have questions:

  • How ethical is the learning data that feeds these models, and how ethical was the process of collecting it?
  • How can global models be policed and regulated within individual countries?
  • What is the potential for small and specific training datasets to be manipulated by humans in ways that constrain the algorithms and introduce bias?
  • Is it a ‘bug’ when a prompt doesn’t give us what we wanted? What we ask for is rarely what we actually get.

Confidence or competence?

One major drawback of the NLP process is that many models stopped learning from new data at the turn of the decade, which, as Alexander highlights, can easily lead to out-of-date or incorrect information being generated. “I asked one of the large models, ‘who is the president of the United States?’ and it answered very confidently, Barack Obama.” That confidence is interesting, because as humans we are predisposed to trust information that is given to us clearly and directly, with no hint of doubt.

NLP models are also built to prove, or agree with, whatever task is given to them, and they sound remarkably plausible doing it. Alexander shares a specific example of ChatGPT providing convincing output that could easily persuade someone unfamiliar with the facts.

“Andrew Ng, who is an Adjunct Professor at Stanford University, asked ChatGPT to prove that a CPU is better than a GPU for deep learning. It was very confident and created a long paragraph of text proving it. Then he asked it to prove that some more primitive way of calculating is better than a CPU, and it again provided a very confident paragraph of text. He ended up basically ‘proving’ that an abacus is better than a GPU for deep learning.”
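For readers who want to see this behaviour first-hand, here is a minimal sketch of the same experiment. It assumes the openai Python package (v1.x) and an API key in your environment; the model name is illustrative, not necessarily the one Andrew Ng used.

```python
# Minimal sketch of the "confident contradiction" experiment described above.
# Assumes: pip install openai (v1.x) and OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()

def argue(claim: str) -> str:
    """Ask the model to 'prove' an arbitrary claim and return its answer."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # illustrative model choice
        messages=[{"role": "user", "content": f"Prove that {claim}."}],
    )
    return response.choices[0].message.content

# The model will happily produce fluent arguments in both directions.
print(argue("a CPU is better than a GPU for deep learning"))
print(argue("an abacus is better than a GPU for deep learning"))
```

The point is not the particular model but the pattern: the prompt presupposes the conclusion, and the model obliges.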

In this age of misinformation, there is huge potential for NLP to spread misleading (or downright false) information very quickly to large audiences: ‘facts’ which then become accepted, magnified and transmitted further.

Taking liberties with artistic license

There are obvious intellectual property issues when it comes to NLP and art generation. Asking an AI tool to create a piece in the style of a named artist will generate convincingly similar work. But if this output contravenes the artist’s morals or political views, for example, it is easy to see how discomfort (and possibly even legal challenges) could follow.

Conversely, when original artwork is generated from hundreds of prompt iterations to finesse exactly the output required, can it still be seen as ‘art’? Is it the work of the individual using the AI tool, or of the tool itself? Then again, is this any different to the great works credited to Michelangelo that we know were produced in part by his students? Is the value of NLP in art actually more as an idea generator, a source of inspiration for the artist rather than the end point?

Alexander believes that creatives shouldn’t be afraid of natural language processing. “I think NLP is more of a supplement, a good supplement, because it allows us to be more creative, pushing forward, advancing. It’s not a replacement at all; it’s more like a co-worker, or almost a supplemental ghostwriter.”

Do guard rails contain discriminatory language, or keep it out?

When ChatGPT first launched, OpenAI was very upfront that the model would not be allowed to produce misogynistic or racist material. Yet the very nature of the learning process saw AI models scraping huge amounts of training data from the internet, much of which inherently carries questionable bias and tone. What these models draw on as ‘normal’ is therefore very much not.

“What ChatGPT doesn’t allow, it feels like it doesn’t allow not because of how it was trained, but because of the huge number of guard rails that OpenAI built around it. So they basically caged this model in all these sorts of limitations about stuff that it shouldn’t allow. But if you can get past these guard rails and into the model itself, it still has all these biases: race, gender, all this stuff. It still has them, but they just try their very best to limit the way it can show them. ChatGPT is essentially a celestial bureaucrat!”
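As Alexander describes it, the guard rails are largely a layer wrapped around the model rather than a property of the model itself. The sketch below illustrates that general pattern using OpenAI’s public moderation endpoint to screen prompts before they reach the model; it is an illustration of the idea only, not OpenAI’s actual internal safety pipeline.

```python
# Illustrative guard-rail pattern: a separate classifier screens input before
# it ever reaches the language model. A sketch of the general idea only,
# not OpenAI's actual internal pipeline.
from openai import OpenAI

client = OpenAI()

def guarded_completion(prompt: str) -> str:
    # First line of defence: a moderation model flags disallowed content.
    moderation = client.moderations.create(input=prompt)
    if moderation.results[0].flagged:
        return "Request refused by guard rail."
    # Only unflagged prompts are passed through to the underlying model.
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```

Caging the model this way limits what it can say but, as Alexander notes, the biases absorbed during training are still inside.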

NLP provides assistance, not autonomy

Going forward, combining NLP output with factual, search-sourced content feels like best practice when using AI tools. Alexander points out that this is still quicker than finding the information yourself, and it gives us the opportunity to validate what the models generate. Ultimately, he believes that directed and federated learning have fantastic potential, as long as we remain mindful of the risks of reverse engineering and privacy breaches. NLP should be part of the solution, not the sole source of the answer.
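In code, that generate-then-validate stance might look like the sketch below. Here `verify_claim` is a hypothetical placeholder for whatever validation step your organization uses (cross-checking search results, internal documentation, or human review); everything else assumes the openai Python package (v1.x) as in the earlier examples.

```python
# Sketch of a generate-then-validate workflow: the model drafts an answer,
# but nothing is accepted until it passes an independent check.
from openai import OpenAI

client = OpenAI()

def verify_claim(answer: str) -> bool:
    """Hypothetical placeholder: cross-check the draft against trusted
    sources (search results, internal docs, or a human reviewer)."""
    return False  # default to distrust until a real check is wired in

def assisted_answer(question: str) -> str:
    draft = client.chat.completions.create(
        model="gpt-3.5-turbo",  # illustrative model choice
        messages=[{"role": "user", "content": question}],
    ).choices[0].message.content
    # The model assists; it is not the authority. Validate before use.
    return draft if verify_claim(draft) else "Draft needs human review."
```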

If you’d like to discuss the benefits of using Natural Language Processing in your organization, please contact Tessa Jones to find out more.

You can also watch the fascinating podcast in full below.