In 2013, Hatsune Miku sold out a worldwide stadium tour. Her songs have been featured on Japan’s Top 40 music chart, and she’s sold millions of albums. Sounds pretty standard for an international superstar... but Hatsune Miku is artificial intelligence. Created by Crypton Future Media in 2007, Hatsune Miku, whose name means “the first sound from the future,” is a Vocaloid software voicebank that quickly became a “collaboratively constructed cyber celebrity with a growing user community.”
Our obsession with artificial intelligence spans decades, with more and more products hitting the market designed to do the job of a human being. While current discourse fixates on whether AI will be able to replace human work, especially in creative industries, I am more concerned with our obsession with personifying machinery and mechanizing humanity.
The robot-ifying of music reminds me a lot of the physical perfection we seek through plastic surgery, skincare products, and filters. Jessica Defino, in her predictions for 2023 beauty trends, said: “I predict that the skincare industry will continue to push consumers further from the purpose of life – AKA being fully present in your one wild and precious human body – with ‘cyborg skin’, the futuristic follow-up to 2022’s ‘jello skin’ and ‘glazed donut skin’. Inspired by Metaverse avatars, AI art, and the democratization of photo-editing software, cyborg skin will seek to flatten any and all signs of life (wrinkles, pimples, pores) into a one-dimensional approximation of perfection: skin with no deviation in tone or texture, finished with a screen-like sheen, perhaps courtesy of more ‘NASA-backed’ skincare devices promising to ‘optimize’ human existence? The look marks a cultural shift from self-objectification (emulating inanimate foodstuffs) to self-mechanization (emulating humanoid machinery).” The same perspective can be applied to the music industry. When used to achieve perfection, AI strips the humanity from music, replacing it with flat, robotic sounds and unnervingly on-point pitches.
Vocaloid software programs and voicebanks can be used by anyone (for a price) to create any song. Hatsune Miku is featured on 100,000 songs released worldwide, all by different artists. This is shocking when you consider how much variability those songs lack simply because they share the same featureless voice. It reminds me of the scene in I, Robot where thousands of identical robots stand in perfect rows. Eerie, cold, and devoid of creativity.
The prevalence of artificial intelligence in music pushes music further and further from human origin. Pop culture seems to be enamored with anything that can smooth over what makes human beings human. We’re replacing imperfect, messy art with standardized, artificial products.
I, Robot. © 2004 Twentieth Century Fox Film Corporation.
I love imperfections in music, especially in artists’ voices. Gritty voice cracks, slight deviations in pitch, timbre that characterizes an artist’s sound. Who are Bob Dylan, Freddie Mercury, Ella Fitzgerald, and Britney Spears without their distinctive voices? Entire genres can be characterized by the vocal stylings of lead singers, and many techniques have been developed by artists to achieve a unique sound.
Vocal tracks are limited without a lead singer writing melodies in their own unique timbre and style, but they are likely to be more successful on social media, which, as I wrote about last week, is all that matters to the current mainstream music industry. Harling Ross Anton wrote in a recent Gumshoe post, “A social media algorithm is designed to identify patterns and therefore – you guessed it – consistencies. When online creators behave in consistent ways, algorithms know exactly what to do with them. In other words, how to categorize them, and who to serve their content to... it’s no surprise then, that in an age where social media dictates so much of our thinking around how we get dressed,... style would be swept up by the gravitational pull toward categorization as well.” Everything that used to be an expression of ourselves is moving toward homogenization. This flattening of voices allows the algorithm to swiftly categorize music and find an audience for it.
Beyond artificially generated voices, voice-copying software like iMyFone VoxBox allows users to steal established artists’ voices and create songs with them to share online without the original artist’s knowledge or permission. While the voices aren’t robotic like those created by Vocaloid software, this technology presents many of the same problems. The voice may be human, but the product is not. Stealing someone’s voice and making them say whatever you want is right out of a horror movie (or The Little Mermaid). Beyond the grossness of listening to an artist sing a song they didn’t write and had no part in, this places a marketable value on singers who have already had success, framing their voices as better to use than an artist’s own. It is unnatural and disorienting to replace your voice with another, even if that voice is human.
Dr. Toni Pikoos, speaking to Rolling Stone about Lensa AI (the app that creates animated avatars that drastically alter your face and body to adhere to beauty standards), said, “When there’s a bigger discrepancy between ideal and perceived appearance, it can fuel body dissatisfaction, distress, and a desire to fix or change one’s appearance.” I think the same can be said for using someone else’s voice and musical stylings for your own music, only to walk away distressed by your own artistic capabilities. Lensa AI allows you to see yourself as the animated, beauty-industry version of yourself, and now AI voicebank programs allow you to replace your own voice with a “better model.”
More than a decade after Hatsune Miku’s creation, her success remains emblematic of society’s obsession with artificial perfection. The music industry has continued to follow the toxic trends of the beauty and fashion industries, favoring inhuman perfection over natural humanity. The beauty and fashion industries are thriving because we don’t want to look like ourselves. Why wouldn’t the music industry follow suit with increasingly narrow vocal standards?