By Yin Nwe Ko

 

I INITIALLY began my academic journey with a strong focus on engineering because I found great satisfaction in seeing how my work could be applied to solve real-world problems. The tangible impact of engineering was gratifying, but I sought a way to leverage this knowledge to contribute to health improvements. My interest gradually shifted towards the biological applications of engineering, specifically in the study of otoacoustic emissions – the sounds our ears generate. This fascinating phenomenon is used to understand the functionality of the hearing system, which is particularly crucial for assessing hearing in infants or individuals with cognitive impairments who cannot effectively communicate their hearing issues. This objective method of evaluating hearing health intrigued me and motivated me to explore more objective tools and techniques for assessing human health overall.

 

This interest eventually led me to the field of voice science and voice disorders, areas closely related to the hearing system. Our hearing system is integral to voice perception, as it processes the sounds we hear when someone speaks. My research now encompasses both voice production and perception, utilizing my engineering background to develop the methodologies and tools essential for this work. By studying how we produce and perceive voice, I aim to enhance our understanding of these processes and contribute to the development of better diagnostic and therapeutic tools.

The connection between vocal production and perception is deeply rooted in the brain, which controls both systems. For instance, infants need to hear sounds to learn to produce them, highlighting the interplay between these systems during speech development. Despite this connection, the high-level processing of voice production and perception occurs in separate systems. Voice production primarily involves the phonatory system located in the larynx, controlled by over 20 muscles, along with additional muscles linked to the jaw, sternum, and other parts of the body. The respiratory system also plays a critical role, as we need to exhale to speak, while the articulatory system, including the mouth, shapes the sound.

 

On the other hand, the hearing system comprises distinct structures, such as the peripheral and central auditory systems. These systems process sound signals at various stages, with the brain handling the high-level processing. In my research, I am exploring how artificial intelligence can replicate this complex processing. By simulating the brain’s approach to sound processing, we aim to develop AI systems that can mimic human voice perception, enhancing our understanding and capabilities in this field.

 

When we talk about dysphonia, we refer to an abnormal quality of voice. For instance, a person with dysphonia might have a breathy voice that’s difficult to hear or a voice that’s frequently interrupted. This condition indicates a problem with the biomechanics or physics of voice production in the larynx or voice box. My research focuses on neurogenic voice disorders, which are particularly challenging to diagnose because their symptoms often resemble those of other disorders. Although the underlying mechanisms affecting the vocal folds or the larynx differ, these distinctions are hard to identify in a clinical setting. That is where our research and artificial intelligence (AI) come into play.

 

To diagnose these disorders more effectively, we utilize high-speed video technology to capture thousands of images of the vocal folds’ vibrations during speech. This method provides a wealth of data critical for understanding and differentiating these disorders. We then employ AI to analyze these images and extract significant features. AI’s capability to process vast datasets quickly allows us to identify subtle patterns and hidden features that are not discernible through manual inspection. This speed and precision are particularly valuable in clinical environments, where time is often limited.
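The interview does not describe the lab's actual software, but one simple feature commonly pulled from such high-speed recordings is the glottal area waveform: the size of the dark opening between the vocal folds in each frame. The Python sketch below is a minimal, illustrative version; the function name, the frame array, and the brightness threshold are assumptions, not the researchers' pipeline.

```python
import numpy as np

def glottal_area_waveform(frames: np.ndarray, dark_threshold: int = 60) -> np.ndarray:
    """Estimate the glottal opening area in each frame of a high-speed video.

    frames: array of shape (n_frames, height, width), grayscale uint8 values.
    The glottal opening shows up as a dark gap between the vocal folds, so
    counting pixels darker than a threshold gives a rough per-frame area signal.
    """
    dark_pixels = frames < dark_threshold                        # boolean mask per frame
    return dark_pixels.reshape(len(frames), -1).sum(axis=1).astype(float)

# Toy usage: random frames stand in for one second of video at 5,000 frames per second.
rng = np.random.default_rng(0)
frames = rng.integers(0, 256, size=(5000, 128, 128), dtype=np.uint8)
area = glottal_area_waveform(frames)
print(area.shape)   # (5000,): one area estimate per frame
```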

Beyond diagnostics, AI has other significant applications in our research. Similar to facial recognition technology, which identifies facial features and emotions, AI can analyze high-speed videos of vocal folds to classify various tissue structures automatically. This process eliminates the need for manual image examination, making it feasible to analyze extensive datasets rapidly. We design AI tools to identify and track specific features within these videos, enabling us to study the movement and behaviour of different tissues comprehensively.
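Tracking is a natural extension of the same idea: once a structure can be picked out in each frame, its position can be followed through time. The sketch below follows the centroid of the dark glottal region frame by frame; it is a crude, hypothetical stand-in for the far more sophisticated tracking tools the researchers describe.

```python
import numpy as np

def track_glottal_centroid(frames: np.ndarray, dark_threshold: int = 60) -> np.ndarray:
    """Follow the centre of the dark glottal region from frame to frame.

    Returns an (n_frames, 2) array of (row, col) centroids, with NaN where no
    dark pixels are found. Illustrative only; not the lab's actual software.
    """
    centroids = np.full((len(frames), 2), np.nan)
    for i, frame in enumerate(frames):
        rows, cols = np.nonzero(frame < dark_threshold)
        if rows.size:
            centroids[i] = rows.mean(), cols.mean()
    return centroids
```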

 

AI operates in two primary ways in our research. First, it captures and processes information that is visible but difficult to analyze manually due to the volume of data. Second, AI uncovers hidden information that aids in the differential diagnosis of voice disorders. For instance, AI can detect subtle variations in voice quality, such as breathiness or strain, which are challenging to identify otherwise.

 

One remarkable capability of AI is its ability to analyze temporal data – information spread across time. In just one minute of recording, we might have around 300,000 images. Instead of teaching the AI specific features to look for, we can provide it with raw data and let it perform a complex, trial-and-error analysis. Although we guide the AI in this process, it largely operates independently, extracting features and creating a comprehensive information map. This map highlights critical time intervals and specific voice qualities, such as breathiness or strain, enabling us to make more accurate diagnoses and understand the underlying issues more thoroughly.
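For scale, 300,000 images per minute works out to about 5,000 frames per second. The interview does not name the models used, but the sketch below shows, in broad strokes, what a sequence model that scores every time step of a per-frame feature signal can look like; the architecture, layer sizes, and random input are all illustrative assumptions, not the lab's method.

```python
import torch
import torch.nn as nn

class TemporalScorer(nn.Module):
    """Toy sequence model that assigns a score to every time step of a
    per-frame feature signal, a rough analogue of the 'information map'
    described above. Purely illustrative."""

    def __init__(self, n_features: int = 1, hidden: int = 32):
        super().__init__()
        self.rnn = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                    # x: (batch, time, n_features)
        out, _ = self.rnn(x)                 # out: (batch, time, hidden)
        return self.head(out).squeeze(-1)    # (batch, time) per-frame scores

# One minute at 5,000 frames/s is 300,000 frames; a 1,000-frame excerpt is scored here.
signal = torch.randn(1, 1000, 1)             # e.g. a normalized glottal area waveform
scores = TemporalScorer()(signal)
print(scores.shape)                          # torch.Size([1, 1000])
```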

 

To record high-speed videos of vocal fold vibrations while someone is talking, we use two primary methods. The first method involves a rigid endoscope, which is a cylindrical device inserted through the mouth and positioned towards the back of the tongue. This endoscope is connected to a high-speed camera located outside the mouth. When the endoscope is in place, the person can produce vowels or perform a pitch glide, moving up and down in pitch. However, this method limits the range of speech the person can produce.

 

The second method employs a flexible scope connected to the high-speed camera, which is inserted through the nose to a position just above the vocal folds. This approach allows for a broader range of speech, enabling the person to speak various words, vowels, and consonants. We recently utilized this technique for data collection at the Mayo Clinic in Arizona. This method is particularly beneficial for studying task-specific voice disorders, where symptoms vary significantly based on what the person is saying. For instance, someone might be able to say “Aaaaah” without issue, but problems arise when they start speaking in sentences. This disorder, known as laryngeal dystonia, is rare and particularly interesting because it can affect specific activities like singing while leaving normal speech unaffected. We are currently studying a subject with this condition to gain a deeper understanding of the disorder.

 

Our ongoing research into laryngeal dystonia has yielded intriguing results. We are examining how the content of speech, specifically the transition between different vowels and consonants, affects the severity of symptoms. One notable observation is the significant variability in symptoms among individuals with the same disorder. Clinically, patients are often categorized similarly and receive the same treatment, typically Botox injections, based on the collective experience of laryngologists. However, we aim to understand these individual differences better to develop more personalized treatment approaches. By analyzing the variability in laryngeal behaviour during speech, we hope to determine if a uniform treatment approach is effective for all patients or if individualized treatments are necessary.

 

Our ultimate goal is to design tailored treatment plans that account for the unique characteristics of each patient’s condition. We are working to model voice production to answer critical questions about the effectiveness of treatments across different individuals. For example, we want to know if the same treatment will work for patients with overlapping symptoms. By addressing these questions, we aim to enhance the effectiveness of treatments for laryngeal dystonia and improve the overall quality of life for those affected by this disorder.

 

We are designing mechanical models to investigate how airflow from the lungs interacts with the vocal folds and how tissue movements generate sound. These models help us understand what happens if there is a disruption in these systems, how it manifests in a person’s voice, and how we might develop treatment strategies for such anomalies.
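One classic way such models are built is with lumped elements: the vocal fold tissue is reduced to a mass on a spring and damper, pushed by subglottal air pressure whenever the glottis is open. The sketch below is a toy single-mass version with made-up parameter values; it is meant only to show the structure of this kind of model, not the researchers' actual simulations.

```python
import numpy as np

def one_mass_fold(duration=0.05, fs=50_000, m=0.5e-3, k=200.0, b=0.02,
                  p_sub=800.0, drive_area=1e-4, x_rest=1e-4):
    """Toy single-mass model of vocal fold motion.

    m [kg], k [N/m], b [N*s/m]: lumped tissue mass, stiffness, and damping.
    p_sub [Pa]: subglottal pressure, applied over drive_area [m^2] only while
    the glottis is open (displacement above zero). All numbers are illustrative
    placeholders, not measured tissue properties.
    """
    n = int(duration * fs)
    dt = 1.0 / fs
    x, v = x_rest, 0.0
    trace = np.empty(n)
    for i in range(n):
        drive = p_sub * drive_area if x > 0.0 else 0.0   # aerodynamic push while open
        a = (drive - b * v - k * (x - x_rest)) / m       # Newton's second law
        v += a * dt                                      # explicit Euler integration
        x += v * dt
        trace[i] = x
    return trace

displacement = one_mass_fold()
print(displacement.min(), displacement.max())
```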

 

Regarding singer’s dystonia, which affects only the singing voice, one hypothesis is that it could be an early indicator of a neurological disease. Voice production involves the precise coordination of the respiratory system and vocal fold vibrations. Any disorder affecting these systems, such as a neurological or mental health issue, will reflect in the voice. For instance, many research groups are exploring voice evaluation for diagnosing conditions like Parkinson’s disease.

 

In our lab, we aim to use voice analysis to assess depression and anxiety in patients. We are developing a tool that can evaluate an individual’s voice as a biomarker for various health issues. Voice production involves complex physiological processes, and any weakness in the muscles can affect the voice, although these changes may not be easily perceivable. Therefore, having objective tools to evaluate voice can reveal information that is not evident through simple listening or acoustic signals.
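Two of the simplest objective measures of this kind are cycle-to-cycle variability in pitch and in loudness, rough analogues of the jitter and shimmer measures used in voice clinics. The sketch below computes crude versions of both from a recording with the librosa audio library; the file name, frequency range, and the measures themselves are illustrative stand-ins, not a validated clinical tool.

```python
import numpy as np
import librosa

def crude_voice_measures(path: str):
    """Compute rough pitch-variability and amplitude-variability measures
    from a voice recording. Illustrative only; not validated clinical metrics."""
    y, sr = librosa.load(path, sr=None)                   # load audio at its native rate
    f0, voiced, _ = librosa.pyin(y, fmin=60, fmax=400, sr=sr)
    f0 = f0[voiced & ~np.isnan(f0)]                       # keep pitch estimates from voiced frames
    rms = librosa.feature.rms(y=y)[0]                     # frame-by-frame loudness
    pitch_var = np.std(np.diff(f0)) / np.mean(f0) if f0.size > 1 else float("nan")
    amp_var = np.std(np.diff(rms)) / np.mean(rms) if rms.size > 1 else float("nan")
    return pitch_var, amp_var

# Hypothetical usage (the file name is a placeholder):
# print(crude_voice_measures("sustained_vowel.wav"))
```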

 

Artificial intelligence (AI) holds great potential for creating personalized models of a person’s voice by summarizing various measurements and information. For example, surgeons dealing with laryngeal cancer currently use extensive questionnaires to decide on treatment approaches. AI could streamline this process by providing summarized, relevant information based on the disorder’s pathophysiology. However, AI will likely not replace the nuanced decision-making of doctors, who rely on real experience to make informed choices.

 

Our research goal is to develop protocols that can be used clinically. We aim to understand the fundamentals of voice production and how various factors, such as mental health status or treatments like radiation therapy for laryngeal cancer, might affect it. Laryngeal cancer is of particular interest because it has one of the highest suicide rates among cancers, possibly due to its impact on the ability to communicate.

We strive to use our understanding of voice to improve clinical diagnoses and create accurate, individualized treatment approaches. Additionally, we are exploring the use of voice as a biomarker for other health issues. We have started data collection at Henry Ford Health and anticipate obtaining intriguing results from these projects soon. Our ultimate aim is to enhance the diagnostic and treatment capabilities for voice-related disorders and other health conditions reflected in the voice.

 

Reference:

Excerpt from an interview with Maryam Naghibolhosseini, director of the Analysis of Voice and Hearing Laboratory at Michigan State University, conducted by the editor-in-chief of American Scientist magazine (July-August 2024 issue).