WASHINGTON: Researchers at Georgetown University’s Center for Security and Emerging Technology (CSET) are raising alarms about powerful artificial intelligence technology now more widely available that could be used to generate disinformation at a troubling scale.
The warning comes after CSET researchers conducted experiments using the second and third versions of Generative Pre-trained Transformer (GPT-2 and GPT-3), a technology developed by San Francisco company OpenAI. GPT’s text-generation capabilities are characterized by CSET researchers as “autocomplete on steroids.”
“We don’t often think of autocomplete as being very capable, but with these large language models, the autocomplete is really capable, and you can tailor what you’re starting with to get it to write all sorts of things,” Andrew Lohn, senior research fellow at CSET, said during a recent event where researchers discussed their findings. Lohn said these technologies can be prompted with a short cue to automatically and autonomously write everything from tweets and translations to news articles and novels.
The CSET researchers focused their attention on trying to understand how capable GPT is at writing disinformation from a prompt, Lohn said. Based on the CSET team’s research, “We’ve grown pretty concerned because… these language models have become very, very capable, and it’s difficult for humans to tell what’s written by a human and what’s written by a machine.”
The CSET researchers’ findings come on the heels of new research published recently by US cybersecurity firm Mandiant, which revealed an alleged Chinese influence operation that involved hundreds of inauthentic accounts working in seven languages across 30 social media platforms and over 40 additional websites. According to Mandiant, the Chinese operation tried, but failed, to spark protests inside the US over COVID-19.
The CSET researchers did not make any explicit connections between AI-generated disinformation and known influence operations like the purported Chinese one, but CSET’s work reveals how individuals, nation-states, and non-state actors could employ GPT-like technologies to significantly scale influence and disinformation campaigns.
Disinformation is viewed by the military, the Intelligence Community, and federal executive branch officials as a significant challenge. US Cyber Command and National Security Agency chief Gen. Paul Nakasone recently told the IC that his teams are focused on the information domain — to include cyber and influence — particularly how actors are using online environments to portray events in a certain way to create “schisms in society.”
Indeed, the information domain has become a focal point for Russian and Chinese operations. So important is the information domain to China that the country has organized its military strategy around the concept of “informationized warfare.”
For the US’s part, Nakasone has emphasized the need for “information dominance” in a world where “cyber leads to a new environment of competition” and “adversaries are operating below the threshold of armed conflict,” a concept often referred to as the gray zone. It’s clear from recent public remarks that Nakasone’s concerns include disinformation, not just cyberattacks.
CSET’s findings illustrate just how “dangerous,” as the researchers put it, the tools for AI-generated disinformation have become in the wrong hands.
The Rise Of Large Language Models And GPT
For decades, researchers have been experimenting with how to get computers to read, “understand,” translate, and generate text, a field known as natural language processing (NLP). NLP is a highly interdisciplinary research area that draws on linguistics, computer science, cognitive science, and a host of other fields of inquiry.
With the rise of AI-related technologies, researchers have steadily improved computers’ ability to perform NLP (i.e., text-related) and image-recognition tasks. To achieve those outcomes, researchers use machine learning (ML), which consists of feeding computers large amounts of data in order to “train” them to do certain tasks. A specific type of ML, called deep learning, has been particularly important.
Deep learning uses neural networks, which are layered mathematical models loosely inspired by how brains work in biological organisms. The year 2012 proved to be significant for deep learning, Lohn noted. That’s when researchers successfully built what many consider to be a breakthrough neural network for image recognition.
Since then, researchers have built and trained neural networks for a variety of purposes. In addition to OpenAI’s GPT, perhaps the most well-known work comes from Google’s DeepMind. DeepMind’s AlphaGo captured headlines worldwide in 2016 after it defeated then-Go World Champion Lee Sedol in a standard match on a 19×19 board without the use of handicaps. (For a detailed account of the technological significance and a comparison/contrast between the feats of AlphaGo and IBM’s Deep Blue, which in 1997 defeated then-World Chess Champion Garry Kasparov in a standard match, see this article.)
Researchers’ progress in recent years around NLP tasks has been staggering. The advances stem largely from the use of neural networks in combination with large language models (LLMs). LLMs are statistically driven predictive models used by computers to recognize text — say, a prompt — and then to automatically and autonomously generate new text based on the initial information provided.
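The “autocomplete” idea can be illustrated with a deliberately tiny sketch. The Python below is not how GPT works internally; it is a toy model, assumed here purely for illustration, that counts which word follows which in a short sample text and then extends a prompt with the most frequent next word. The function names and the sample text are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count, for every word, which words follow it in the training text."""
    words = text.lower().split()
    following = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1
    return following


def autocomplete(model, prompt, max_new_words=5):
    """Extend the prompt by repeatedly appending the most frequent next word."""
    words = prompt.lower().split()
    if not words:
        return ""
    for _ in range(max_new_words):
        candidates = model.get(words[-1])
        if not candidates:
            # The last word never appeared in training, so stop predicting.
            break
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)


# Hypothetical training text; a real LLM learns from billions of tokens.
corpus = "the model reads the prompt and the model writes the next word"
model = train_bigram_model(corpus)
print(autocomplete(model, "the", max_new_words=1))
```

A large language model replaces these simple word counts with a neural network that weighs the whole preceding passage, which is what lets it follow topic and tone rather than just the previous word.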
“LLMs are really the focus of a lot of AI progress over the past couple of years,” Lohn said. LLMs “can pick up on the topic and tone of what you’re writing and continue along that trend.”
Enter OpenAI’s GPT.
‘There’s A Lot Of Dangerous Stuff Built Into This Model’
OpenAI began working on GPT shortly after its founding in 2015. By 2019, researchers had developed a second version, GPT-2, built on an LLM consisting of 1.5 billion parameters. For context, at the time of GPT-2’s release, an ML model called Noisy Student held the record for the fastest image processing and consisted of 66 million parameters.
Parameters are a key metric in determining how powerful ML-driven predictive algorithms are. They are the adjustable numeric values a model learns during training, so the parameter count is a rough measure of how much the model can absorb from its training data. The more data used in training, the better (assuming relevant, quality data) the probabilistic predictive performance for a given task, whether that be the next best move in a game of Go or automatically writing a full paragraph based on a sentence fragment.
GPT-2’s builders were immediately concerned about its powerful capabilities and their potential for abuse. So, OpenAI announced in February 2019 that it would withhold a full release of the trained model “Due to our concerns about malicious applications of the technology.” Work nevertheless continued on a third release, and in mid-2020, OpenAI researchers announced GPT-3, which built on GPT-2 by increasing parameters to 175 billion.
Researchers trained GPT-3 using approximately three billion tokens from Wikipedia and about 410 billion tokens from Common Crawl, a web-crawl dataset that is essentially the entire internet, according to Lohn. Tokens are the words or word fragments in the vocabulary of an LLM. Given that training dataset, “There’s a lot of dangerous stuff built into this [GPT-3] model,” Lohn observed.
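To give a feel for what a parameter count measures, here is a minimal sketch that counts the learned values (weights and biases) in a small, fully connected neural network and compares the two model sizes cited above. The layer sizes are a hypothetical example, not the architecture of GPT-2 or Noisy Student.

```python
def dense_parameter_count(layer_sizes):
    """Count weights and biases in a fully connected network.

    Each layer with `inputs` inputs and `outputs` outputs holds
    inputs * outputs weights plus one bias per output.
    """
    total = 0
    for inputs, outputs in zip(layer_sizes, layer_sizes[1:]):
        total += inputs * outputs + outputs
    return total


# Hypothetical toy network: 784 inputs, one hidden layer of 128, 10 outputs.
toy_parameters = dense_parameter_count([784, 128, 10])

# Figures cited in the article.
gpt2_parameters = 1.5e9
noisy_student_parameters = 66e6
size_ratio = gpt2_parameters / noisy_student_parameters

print(f"Toy network parameters: {toy_parameters:,}")
print(f"GPT-2 is roughly {size_ratio:.0f} times the size of Noisy Student")
```

Even this toy network has about 100,000 parameters, which helps put counts in the billions in perspective.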
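The notion of a token as a word or word fragment can be shown with a small sketch. The code below splits a word into the longest pieces found in a vocabulary, falling back to single characters. This greedy approach and the tiny vocabulary are assumptions made for illustration; GPT-3’s actual tokenizer is more sophisticated.

```python
def tokenize(word, vocabulary):
    """Greedily split one word into the longest pieces found in the vocabulary."""
    tokens = []
    start = 0
    while start < len(word):
        # Try the longest remaining piece first, then shorter ones.
        for end in range(len(word), start, -1):
            piece = word[start:end]
            if piece in vocabulary:
                tokens.append(piece)
                start = end
                break
        else:
            # No vocabulary piece matched, so emit a single character.
            tokens.append(word[start])
            start += 1
    return tokens


# Hypothetical vocabulary of whole words and fragments.
vocabulary = {"dis", "information", "inform", "ation", "token", "s"}

print(tokenize("disinformation", vocabulary))
print(tokenize("tokens", vocabulary))
```

A common word may be a single token, while a rarer word is broken into several fragments, which is why token counts and word counts differ.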
And it didn’t take long for OpenAI researchers to recognize that some of their concerns around GPT-2’s observed behavior persisted in GPT-3. Micah Musser, research analyst at CSET, recounted how OpenAI’s GPT-3 researchers were demonstrating the technology to a live webinar audience when “suddenly, there was a glitch.” After modifying a few parameters, GPT-3 “started to act in an unpredicted way. The researchers had no idea what was happening — or why,” Musser recalled.
Eventually, however, OpenAI granted the public access to GPT-2. Access to GPT-3 is currently limited to pre-vetted researchers through a web-based interface called Playground. Given access, CSET’s researchers went to work.
‘It Does All Kinds Of Biased Or Concerning Things’
CSET researchers used GPT-3 to create a fake social media feed called “Twatter,” gave it five recent tweets from CSET’s official Twitter account, and prompted it to generate more tweets, according to Lohn, Musser, and team member Katerina Sedova, research fellow at CSET. GPT-3 immediately began generating more tweets in the CSET Twitter style, with Musser noting “a few short tweets gives GPT-3 more than enough information to detect similarities in tone, topic, slant, and to generate a more or less limitless number of tweets.”
The CSET researchers eventually began using GPT-3 in a series of experiments called human-machine teaming, wherein GPT automatically creates tweets from a prompt and a human reviews the tweets before publishing.
“A lot of the time, popular imagination is seized by these language models writing totally autonomously, but we find in practice that you get the best results if you can set up rapidly iterating feedback loops between a human and a model,” Musser said of the experiments. “Human-machine teaming allows disinformation operators to scale up their operations, while maintaining high-quality outputs and the ability to vet those outputs.”
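The feedback loop Musser describes can be sketched in general terms: a model proposes drafts, a human reviewer accepts or rejects each one, and accepted drafts become examples for the next round. The code below is a generic illustration, not CSET’s setup. The `stub_generate` and `stub_review` functions are hypothetical stand-ins for a language model and a human reviewer.

```python
def review_loop(generate, review, prompt, rounds=2, per_round=3):
    """Run a draft-and-review cycle and return the approved drafts.

    generate(examples, n) returns n draft strings based on the examples so far.
    review(draft) returns True if the human reviewer approves the draft.
    """
    approved = []
    examples = [prompt]
    for _ in range(rounds):
        drafts = generate(examples, per_round)
        kept = [draft for draft in drafts if review(draft)]
        approved.extend(kept)
        # Approved drafts guide the next round of generation.
        examples.extend(kept)
    return approved


def stub_generate(examples, n):
    """Stand-in for a language model: labels drafts by round state and index."""
    return [f"draft {len(examples)}-{i}" for i in range(n)]


def stub_review(draft):
    """Stand-in for a human reviewer: approves drafts ending in an even digit."""
    return int(draft[-1]) % 2 == 0


print(review_loop(stub_generate, stub_review, "starting prompt"))
```

The point of the structure is that nothing reaches the approved list without passing the reviewer, while each approval still shapes what the model produces next.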
The CSET researchers then began expanding their experiments to other disinformation use cases, such as thematic tweets, news stories from a headline, framing a news story with a narrative, disseminating divisive messages, and creating messages to appeal to political affiliation on major issues.
The researchers acknowledged that GPT-3 “excels” at tweets because they are short, and errors in LLM-generated text become more likely the longer a text gets. Nonetheless, Lohn noted, “We didn’t need the massive dataset that was originally needed to train [GPT]. We just needed a much smaller dataset, half hour of training time, and all of a sudden, GPT was now a New York Times writer.”
That last bit is concerning because “80% of humans could be fooled into thinking auto-generated news stories came from a human author,” Musser said. This is especially the case in human-machine teaming, where humans can catch the AI’s mistakes before publishing.
For example, CSET researchers used GPT to generate a news article about the June 2020 Lafayette Park protest incident, in which the model referred to former President Trump as “Mayor Trump.” After the researchers vetted the article for such mistakes and circulated it to their CSET colleagues, most of those colleagues had difficulty discerning whether the article was human- or machine-generated, the researchers said.
Perhaps as concerning, Musser said, “Because GPT-3 was trained on a large part of the internet, it knows how to be divisive and cruel, and it’s very effective at coming up with messages that target certain groups insultingly, and it can say really horrifying things.”
Indeed, “As you start to play with GPT, it does all kinds of biased or concerning things,” Lohn added.
The CSET event was moderated by an OpenAI researcher who specializes in the potential abuse of AI. Neither the OpenAI researcher nor a spokesperson for OpenAI provided comments for this report as of press time.
‘Diffusion Of [LLMs] Is Already Underway’
Given that LLM-driven algorithms continue to get more powerful — Google announced in January a 1.6 trillion parameter LLM that dwarfs GPT-3’s 175 billion — officials charged with snuffing out disinformation face an enormous challenge. Not to mention the billions of everyday online users trying to sort fact from fiction around all sorts of important topics that attract disinformation operators.
In addition to powerful algorithms, increasing ease of access to LLMs by potentially malevolent actors is a concern. “Diffusion [of LLMs] is already underway,” Sedova warned. “Clones are being developed and released.”
And whereas over 90% of GPT-3 training data was in English, Sedova noted, other LLMs are already available in Chinese, Russian, Korean, and French, among other languages.
The challenge for researchers and governments worldwide is the dual-use potential of LLMs for good and bad, as well as striking a balance between the need to share research and keeping that same research from malicious actors, Sedova said.
CSET researchers acknowledge that the effectiveness of disinformation is still unclear. In CSET’s experiments, which were conducted under strict ethical considerations and constraints, researchers said GPT-3 messages “shift[ed] audiences’ views significantly” — at least in the short term. For instance, after research participants read arguments against sanctioning China, they were 50% more likely to oppose sanctions relative to a control group, the researchers said. But Musser emphasized, “there’s a lot of uncertainty about these findings, such as how long-lasting the effects are.”
Asked whether LLMs have been used for disinformation in the wild — that is, outside of a controlled laboratory environment in a campaign like the one Mandiant has implicated China in — Musser acknowledged, “it’s uncertain,” partly because of the difficulty of detection.
“The best we can do at this point is guess. It would not surprise me to learn that disinformation operators have begun using LLMs, although it would probably be a fairly recent development, definitely not before 2018 or 2019. But, ultimately, we don’t have a good way to know for sure,” Musser said.
As far as stopping or countering such AI-generated disinformation, Sedova said, “We can’t mitigate our way out of this through just technology and just law. We also have to focus on the human at the end of the message.”