Brain-computer interfaces are a breakthrough technology that could help paralyzed people regain functions they’ve lost, such as moving a hand. These devices record signals from the brain and decipher the user’s intended action, bypassing damaged or degraded nerves that would normally transmit brain signals to control muscles.
Since 2006, demonstrations of brain-computer interfaces in humans have focused primarily on restoring arm and hand movements by allowing people to control computer cursors or robotic arms. Recently, researchers have begun developing speech brain-computer interfaces to restore communication for people who cannot speak.
As the user attempts to speak, these brain-computer interfaces record the brain signals associated with the muscle movements the person is attempting, then translate those signals into words. These words can then be displayed as text on a screen or spoken aloud using text-to-speech software.
I’m a researcher in the Neuroprosthetics Lab at the University of California, Davis, which is part of the BrainGate2 clinical trial. My colleagues and I recently demonstrated a speech brain-computer interface that deciphers the speech attempts of a man with ALS, or amyotrophic lateral sclerosis, also known as Lou Gehrig’s disease. The interface converts neural signals into text with greater than 97% accuracy. The key to our system is a set of artificial intelligence language models: artificial neural networks that help interpret natural ones.
Recording brain signals
The first step in our speech brain-computer interface is to record brain signals. There are several sources of brain signals, some of which require surgery to record. Surgically implanted recording devices can record high-quality brain signals because they are placed closer to neurons, resulting in stronger signals with less interference. These neural recording devices include grids of electrodes that are placed on the surface of the brain and electrodes that are implanted directly into brain tissue.
In our study, we used electrode arrays surgically placed in the speech motor cortex, the part of the brain that controls muscles associated with speech, of participant Casey Harrell. We recorded neural activity from 256 electrodes as Harrell attempted to speak.
Decoding brain signals
The next challenge is to relate the complex brain signals to the words the user is trying to say.
One approach is to map neural activity patterns directly onto spoken words. This method requires recording brain signals corresponding to each word multiple times to identify the average relationship between neural activity and specific words. While this strategy works well for small vocabularies, as demonstrated in a 2021 study using a 50-word vocabulary, it becomes impractical for larger ones. Imagine asking the user of the brain-computer interface to try to say every word in the dictionary multiple times. It could take months, and it still wouldn’t work for new words.
Instead, we use an alternative strategy: mapping brain signals to phonemes, the basic units of sound that make up words. In English, there are 39 phonemes, including ch, er, oo, and sh, that can be combined to form any word. We can measure the neural activity associated with each phoneme multiple times by asking the participant to attempt to read a few sentences aloud. By accurately mapping neural activity to phonemes, we can assemble them into any English word, even those the system has not been explicitly trained on.
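To make the idea of assembling words from phonemes concrete, here is a minimal sketch in Python. It is an illustration only, not the system described in this article: the tiny pronunciation dictionary, the ARPAbet-style phoneme symbols, and the function name segment are all assumptions chosen for the example.

```python
from functools import lru_cache

# Hypothetical toy lexicon: ARPAbet-style phoneme tuples mapped to words.
# A real pronunciation dictionary would hold many thousands of entries.
LEXICON = {
    ("AY",): "I",
    ("AE", "M"): "am",
    ("G", "UH", "D"): "good",
    ("T", "AH", "D", "EY"): "today",
}


def segment(phonemes):
    """Return every way to split a phoneme sequence into lexicon words."""
    phonemes = tuple(phonemes)

    @lru_cache(maxsize=None)
    def helper(start):
        # Reaching the end of the sequence means one valid (empty) remainder.
        if start == len(phonemes):
            return [[]]
        results = []
        for end in range(start + 1, len(phonemes) + 1):
            word = LEXICON.get(phonemes[start:end])
            if word is not None:
                for rest in helper(end):
                    results.append([word] + rest)
        return results

    return [" ".join(words) for words in helper(0)]


decoded = ["AY", "AE", "M", "G", "UH", "D", "T", "AH", "D", "EY"]
print(segment(decoded))
```

Because the lookup works on phonemes rather than whole words, adding a new word only requires adding its pronunciation to the dictionary, not recording new brain data for it.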
To map brain signals into phonemes, we use advanced machine learning models. These models are particularly well-suited for this task because of their ability to find patterns in large amounts of complex data that are impossible for humans to discern. Think of these models as super-intelligent listeners that can pick up important information from noisy brain signals, just as you might focus on a conversation in a busy room. Using these models, we were able to decipher phoneme sequences during attempted speech with over 90% accuracy.
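One common way to score a phoneme decoder is the phoneme error rate: the number of substitutions, insertions and deletions needed to turn the decoded sequence into the intended one, divided by the length of the intended sequence. The sketch below shows that calculation. It is an assumption that accuracy would be measured this way, since the article does not say how the figure was computed, and the function names and phoneme symbols are hypothetical.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two sequences of phoneme symbols."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_symbol in enumerate(reference, start=1):
        current = [i]
        for j, hyp_symbol in enumerate(hypothesis, start=1):
            cost = 0 if ref_symbol == hyp_symbol else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def phoneme_error_rate(reference, hypothesis):
    """Edit distance divided by the number of intended phonemes."""
    return edit_distance(reference, hypothesis) / len(reference)


intended = ["G", "UH", "D", "T", "AH", "D", "EY"]
decoded = ["G", "UH", "D", "T", "AH", "T", "EY"]  # one phoneme decoded wrongly
print(phoneme_error_rate(intended, decoded))
```

Under this metric, an error rate below 0.10 would correspond to the "over 90% accuracy" described above.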
From phonemes to words
Once we have the decoded phoneme sequences, we need to convert them into words and sentences. This is a challenge, especially if the decoded phoneme sequence is not perfectly accurate. To solve this puzzle, we use two complementary types of machine learning language models.
The first are n-gram language models, which predict which word is most likely to come next given the words that precede it. We trained a 5-gram, or five-word, language model on millions of sentences to predict the likelihood of a word based on the preceding four words, capturing local context and common phrases. For example, after “I am very good,” it might suggest that “today” is more likely than “potato.” Using this model, we convert our phoneme sequences into the 100 most likely word sequences, each with an associated probability.
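The counting idea behind an n-gram model can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not our trained model: the four-sentence corpus, the use of a 3-gram instead of a 5-gram, the add-alpha smoothing, and the function names are all chosen only to keep the example small.

```python
from collections import Counter, defaultdict


def train_ngram(sentences, n):
    """Count how often each word follows each (n-1)-word context."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        # Pad the start so the first words also have a full context.
        words = ["<s>"] * (n - 1) + sentence.lower().split()
        for i in range(n - 1, len(words)):
            context = tuple(words[i - n + 1:i])
            counts[context][words[i]] += 1
    return counts


def next_word_prob(counts, context, word, vocab_size, alpha=1.0):
    """Add-alpha smoothed estimate of P(word | context)."""
    context = tuple(w.lower() for w in context)
    seen = counts.get(context, Counter())
    return (seen[word] + alpha) / (sum(seen.values()) + alpha * vocab_size)


# Hypothetical toy corpus; a real model is trained on millions of sentences.
corpus = [
    "I am very good today",
    "I am very good today thanks",
    "I am very good at chess",
    "she ate a potato",
]
vocab = {w for s in corpus for w in s.lower().split()}
model = train_ngram(corpus, n=3)

p_today = next_word_prob(model, ["very", "good"], "today", len(vocab))
p_potato = next_word_prob(model, ["very", "good"], "potato", len(vocab))
print(p_today > p_potato)
```

The smoothing keeps unseen continuations such as “potato” from getting a probability of exactly zero, so an unusual but intended sentence can still be recovered.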
The second are large language models, which power AI chatbots and also predict which words are likely to follow others. We use large language models to refine our choices. These models, trained on large amounts of diverse text, have a broader understanding of language structure and meaning. They help us determine which of our 100 candidate sentences makes the most sense in a broader context.
By carefully balancing the probabilities of the n-gram model, the large language model, and our initial phoneme predictions, we can make a very educated guess about what the user of the brain-computer interface is trying to say. This multi-step process allows us to handle the uncertainties in decoding phonemes and produce coherent, contextually appropriate sentences.
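One simple way to picture this balancing act is as a weighted sum of log-probabilities, with the highest-scoring candidate sentence winning. The sketch below assumes that form. The candidate scores and the weights are invented for illustration and are not values from our system.

```python
def rescore(candidates, w_phoneme=1.0, w_ngram=0.5, w_llm=0.5):
    """Pick the candidate with the best weighted sum of log-probabilities.

    The weights are hypothetical; in practice they would be tuned on
    held-out data.
    """
    def total(candidate):
        return (
            w_phoneme * candidate["phoneme_logp"]
            + w_ngram * candidate["ngram_logp"]
            + w_llm * candidate["llm_logp"]
        )

    return max(candidates, key=total)


# Invented scores: the second candidate matches the decoded phonemes
# slightly better, but both language models find it far less plausible.
candidates = [
    {"text": "I am very good today",
     "phoneme_logp": -4.0, "ngram_logp": -6.0, "llm_logp": -5.0},
    {"text": "eye am vary good to day",
     "phoneme_logp": -3.5, "ngram_logp": -14.0, "llm_logp": -16.0},
]

print(rescore(candidates)["text"])
```

With these numbers the first sentence scores -9.5 and the second -18.5, so the language models override the small advantage the second candidate had at the phoneme level.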
Real-world benefits
In practice, this speech decoding strategy has proven remarkably successful. We have enabled Casey Harrell, a man with ALS, to “speak” with his thoughts alone with over 97% accuracy. This breakthrough has allowed him to easily converse with his family and friends for the first time in years, all in the comfort of his own home.
Speech brain-computer interfaces represent a significant step forward in restoring communication. As we continue to refine these devices, they hold the promise of giving a voice to those who have lost the ability to speak, reconnecting them with their loved ones and the world around them.
However, challenges remain, such as making the technology more accessible, portable, and sustainable over years of use. Despite these obstacles, speech brain-computer interfaces are a powerful example of how science and technology can come together to solve complex problems and dramatically improve people’s lives.
This article is republished from The Conversation, a nonprofit, independent news organization that brings you facts and reliable analysis to help you understand our complex world. It was written by: Nicholas Card, University of California, Davis
Nicholas Card is not an employee of, an advisor to, an owner of stock in, or a recipient of funding from any company or organization that would benefit from this article, and has disclosed no relevant affiliations beyond his academic appointment.