Holly Herndon is mother to an AI baby. Its name is Spawn and for the past few years Herndon has been training its machine-learning program on her vocals, letting it mutate and create its own noises, not unlike a gurgling toddler just developing its voice.
Spawn is an imperfect system because AI is often imperfect, despite prevailing ideas in pop culture that align the term with Earth-conquering robots. On Herndon’s newest record PROTO (out May 10), Spawn is a collaborator joining an ensemble of vocalists, artists, and writers, who come together in jovial chorus across the album. Looking to folk vocal traditions that privilege guttural, communal release, Herndon draws a direct line between the history of singing and Spawn’s burgeoning capabilities as it begins to find its voice. Across her glitchy, choral productions, Herndon creates a community of what art might look like if people used AI not to recreate tired ideas of music-making but to celebrate its unknowable sounds, and to envision a world in which AI was taught with ethics and care.
Herndon, who recently completed her doctoral program at Stanford’s Center for Computer Research in Music and Acoustics, has explored the sticky ethical ramifications of artificial intelligence and its relationship to humankind over the course of her career as an artist. Her highly conceptual records Movement (2012) and Platform (2015) ruminated on her relationship with her laptop, musically and personally, and the latter explored issues of computer surveillance.
Here, Jezebel talks to Herndon about the perceived problems in contemporary AI, communal singing, and why we need to build ethics into our tech before it’s too late. Our conversation has been condensed and edited for clarity.
JEZEBEL: The first time listeners really heard Spawn was on the album’s song “Godmother” featuring Jlin. How have her abilities changed or expanded since then?
HOLLY HERNDON: The voice model was the third technique that we tried. Basically, Spawn is a metaphor for all the different machine-learning techniques that we have tried. It felt like the most developed technique we tried because it was the third and I had been working with Jlin for a while with neural networks, trying these different methods with her music. This was the first time it really made sense and I felt like it was ready to present to the public. Even though “Godmother” was still kind of ugly or rough or monstrous in a way, to my ears I was really excited when I heard it for the first time because it’s not at all like how I trained it. I trained Spawn on my speech and on some singing, and when it was interpreting Jlin’s music it created this really weird, almost like a beat-boxing thing, which is hilarious.
Spawn really responds to more percussive sounds than anything. She loves Jlin’s music for this reason because she loves transients, when you have a sound wave on a file and it has a really big spike at the front, a major shift from no energy, silence, to a lot of energy. Anything with a big peak like that she really likes because she can understand that, she can see it going from silence to a big change. It’s almost like primary colors or shapes that kids use.
The note about being ugly, when you had released the song you had said that it felt sort of “liberating” to release something ugly. Where do you see ugliness as fitting into your work?
I think especially around this AI conversation, I feel like so much of the stuff that we see is this glossy, perfected, the sort of femme-bot. Or, not even a gendered thing, this perfected version of whatever is being presented. I think it’s a bit disingenuous. The technology is impressive and it’s cool but it’s really early still. We really wanted to be honest about that and show its mistakes and show how kind of rough the technology is still because I think it’s more honest and more interesting to allow it to have its own aesthetic.
[People] come up with this really exciting new technology and then [they’re] like, how do we make it sound like something we already did? It’s like the synthesizer: put a keyboard on it, make it sound like a piano! No, don’t make it sound like a piano, let it be something entirely new, it’s a new instrument. We’re not trying to have it sound like this perfect thing but allowing it to be ugly and be fucked up.
I’ve seen some other experiments with AI and music, what Sony or Google are currently doing for example. One of the first “AI songs” out of Sony was a fake Beatles song, where they had fed it all of this Beatles music to create a new song. And it reminds me of exactly what you’re talking about, where someone has this new technology but they’re trying to make it sound like the Beatles.
And that’s like, beyond. A lot of people like what you’re talking about are dealing with MIDI data and forms, so automated composing, and we were dealing with sound as material so dealing with the audio files themselves. A lot of this MIDI data stuff, it reduces music down to these key kind of parameters which are like machine-readable and then extracting the logic of whatever Beatles or Bach song or whatever and then recreating that forever. That’s really problematic for multiple reasons: one, it’s boring! [Laughs] And two, it gets us in kind of a feedback loop culturally which does not move us forward. It doesn’t respond to what’s happening now and music should be responsive to the politic and the material world around it.
There are some pretty big ethical questions there. If we can extract the kind of logic and aesthetic of a dead person and then reanimate them in future versions of themselves, they weren’t able to opt into that. We could write an entirely new catalog of Tupac music that he could perform in this voice model and we could maybe give it really fucked up lyrics that Tupac would have hated. Not only can he not opt into that, but also what does that say for our culture if we’re constantly reanimating our dead for our entertainment?
Miles Davis had this quote, even though it’s a little fucked up and problematic, but he talked about sampling in hip-hop and he called it “artistic necrophilia.” He says every generation must create a new sound for itself and I think he gets it a little bit wrong because hip-hop does do this and creates an exciting new genre. But if you take what he says and apply it to the voice model, it almost comes across as this prophetic understanding of what sampling could do; if you have an audio file you could create a sample of something else and the sample can sprout legs and run off into a different direction.
Obviously there’s a misunderstanding of the term AI, people hear it and immediately think of humanoid robots taking over jobs. But that sort of artistic necrophilia you’re talking about, using these systems to recreate someone else or their sound, really feeds into those fears. It’s interesting to me that people would be so afraid of AI replacing them, but then people are creating all these projects where they’re sort of recreating or duplicating other people.
Yeah and I think that fear is very real because when you do that there’s a great entitlement there. The history of music and our shared, human, intellectual project that leads up to today, is a shared resource that we all tap into and we all learn from. So if an individual can just scrape that and then claim so much of that as their own because they hold the keys to this AI, and then they can recreate it, of course it’s going to give people anxiety because there’s an ethical issue with that. We don’t know how to share. [Laughs] I like to look at the history of sampling as a precursor to what we’re going into now, because some of these same issues came up. Once we had recording technology, musique concrète came in and Pierre Schaeffer’s grand vision [that] sound could be decontextualized from its source and you could just enjoy sound for its pure sonic qualities. That’s a really beautiful vision but this decontextualization is also one of the problems with sampling, you’re taking something away from its origins. We saw this a lot in the ’90s with Moby sampling the Alan Lomax recording series and all of the Delta Blues musicians that remain unnamed in his work.
There’s that famous “Return to Innocence” song from the ’90s and that’s a Taiwanese farmer couple [Kuo Ying-nan and Kuo Hsiu-chu] who became a folk sensation and this German band Enigma just sampled it and made it their own. Of course, they then took them to court so they ended up getting paid, but there are so many examples of this where people feel this entitlement to just take some emotionally performed music that might even have some sort of religious meaning and stick a beat on it and make it their electronica thing. That’s our history that we have to contend with, but AI takes that and puts that on a super highway to insanity and there are no laws in place for how we deal with attribution.
I think as well about the tech or the devices people interact with on a daily basis, whether it’s their laptop or their phone, they don’t think about the people behind them or who are teaching those devices how to interact with us, especially given issues of surveillance. Has creating Spawn changed the way you think about people behind systems of AI?
That’s one of the biggest problems of AI; it’s this kind of opaque, black box technology, and when we have this glossy press release where it’s like “the machine just wrote this song” you’re totally discounting all the human labor that went into the training set that the thing learns on. That was a really important part of how we set up the project and the way that we did. We wanted the people training Spawn to be visible, to be audible, to be named, to be compensated, because I think that’s a huge part of what we’re facing with this thing today.
Do you feel a strong sense of correction there?
There’s a desire we have to create a counter narrative. It’s a bit of a David and Goliath situation. I have one GPU unit and a little ensemble and we’re doing our thing and then there’s Facebook which has a data center full of machines and it’s hoovering up data from everyone all the time. There’s no comparison there but I think music and art is about storytelling and creating fantasies and hopefully opening up space for different approaches to things, so that’s where I feel like my job is.
You said before that you see Spawn almost as an extension of your own voice. What’s exciting to you about extending your voice this way?
I often joke that my best attribute is my mediocre voice because it forced me to create all these digital appendages to make it more interesting. Had I been born with, I don’t know, Adele’s voice, I’d probably just be singing but I’m like, no I have to make this shit interesting somehow. [Laughs] So it’s always been about me trying to transcend my physical limitations, whether it’s a delay line that makes me sing longer or pitch shifting, whatever it is that I’m using. It’s been about me trying to transcend the limitations of my physical body. So I feel like Spawn is kind of a development on that: if I can create different models to sing through, my voice can kind of become anything.
You get asked all the time about the Cyborg Manifesto and this idea of extending one’s existence outside of one’s body. I feel like sometimes reading the Cyborg Manifesto in 2019 it’s describing what’s already happened as opposed to what will happen given everyone’s close relationship to their technology.
Haraway wrote that in the ’80s and she was very prophetic. I think her most recent writing is really applicable to what we’re doing today where she’s writing about kin. She’s looking at inhuman intelligences, animal intelligences, other kinds of intelligences, and asking humans to open up their families to other kinds of intelligences to see us as sharing this planet with other species and how can we learn from them and not see ourselves as necessarily the apex of intelligence. In recent years there’s been this conversation around the idea of the inhuman and it’s almost like an extension of the Cyborg Manifesto, seeing the inhuman as this way of allowing ourselves to redefine what human is always. Instead of it being this kind of fixed entity, looking at this inhuman intelligence we’re developing and reflecting on that and allowing us to update what it means to be human.
A lot of your albums have been very collaborative, but something that strikes me about PROTO is the many different voices you bring into it, working with Martine Syms and Jenna Sutela among others. What was it about this album that made you want to bring in these different vocalists?
Coming off of a pretty long tour for Platform I was noticing a lot of the electronic music scene felt like things were getting more and more fully automated. I was kind of missing the live performer in electronic music, I was seeing less and less human bodies on the stage playing live music. So I was really kind of craving that. I also get really lonely in the studio and it can just straight up be depressing to spend hours and hours a day, every day alone in the studio, so I was really wanting human contact. I know it seems like ironic or something to be working with an artificial intelligence because I want more human contact but that’s why I started working with an ensemble and then Spawn kind of came up and she joined as an ensemble member rather than as an automated composer writing music for us.
I wanted to work with human vocalists in this expanded ensemble but I also wanted to work with some writers as well who were also dealing with AI to kind of open up my own understanding of the topic. Martine Syms has been working with AI for a couple years now and she has this concept of digital versions of herself that live online that maybe some future intelligence might find and can recreate a copy of herself in some weird, mutated form because the digital self is not the actual self. And then Jenna [Sutela] is this crazy artist who has been working with decentralized computational systems in nature, so she deals with slime mold, this really fascinating organism. She’ll create this big maze and she’ll put oats in the maze and this organism, this slime mold, even though it doesn’t have a brain, will find its way to the oats because it really likes oats. That was a natural fit.
In craving that human, face-to-face singing aspect of your work, I’m curious how that will translate to your live performances.
When you start thinking about artificial intelligence you start thinking about the history of human intelligence and the evolution of society through singing and the evolution of language. We’re tapping into a lot of vocal traditions without trying to be nostalgic about it or trying to recreate any earlier vocal period. We’re still trying to figure out how to get Spawn to perform in real time. We don’t want to do any kind of fake system. I feel like there’s so much of that in AI, people say they’re doing one thing but it’s actually something else. We’re really trying to be as honest and upfront about the process as possible. The rendering time is pretty long, the shortest time we’ve had is a minute or two and it’s hard for an audience to be like, “and wait a minute!” If things work out then Spawn will go on tour. But there’s a lot of love in the ensemble. My earliest music moments were in church situations and I feel like part of me is resisting but another part is craving that kind of public ecstasy and release that I experience in singing with people in public.
What vocal traditions are you drawing on?
I was trying to be careful not to just kind of wholesale take something that’s not mine in a way but instead be inspired by things. I was really interested in this idea of dissonance, so you hear that a lot in Bulgarian folk music they’ll use intervals of seconds or sevenths and you’ll also hear that in Appalachian folk music. I wanted less of this Western idea of beautiful singing. Of course I enjoy that as well, but I also wanted to tap into a more kind of guttural and really direct approach to singing you often find with vocal traditions where they’re not concerned about the audience as much. It’s more about singing for each other and the people who are doing it rather than the audience watching it.
Other than in the choir like you mentioned, there is a real lack of communal singing in American culture.
Right and it’s usually tied to religious ceremony and now that we’re in this secular society it’s hard to find a way to celebrate and emote and sing together publicly without tapping into some kind of religious ideology that we don’t believe in. I’d love it if at our shows people would sing.
You mentioned trying to get Spawn ready for a live show. Is it hard to work with AI when people have all these expectations of what it should be or what it should look like?
I think it’s hard to communicate because there’s so much bullshit out there, so it’s hard to be like “no, this is for real!” Also, because a lot of really interesting neural network research was released around 2016, 2017, a lot of people have started to work with it because there were these breakthroughs.
What do you think it’s going to take for there to be less “bullshit” as you said?
I think it just has to become part of the public conversation. When I was doing Platform back in the day and we were talking about surveillance, capitalism, issues around privacy and social media manipulation, at shows we’d tell people to leave Facebook, and then the elections come around and it’s like: you’ve been manipulated through social media! That wasn’t a moment of, I told you so, it was more of like a fuck, you guys, we knew that already. You feel a little bit like a tinfoil hat conspiracy theorist but then it becomes a part of the public consciousness when it’s too late almost? That could happen with this AI conversation, so we have to start figuring out as a society what our ethics are when it comes to this and start hard coding that into the program.
I hate to be the kind of person who reads your tweets back to you, but I was really interested in some things you were saying recently about Spotify and automated music and who that could even be a threat to given how basic some of the algorithms can be. Obviously there’s this whole conversation about “fake artists” and “lean back music” and I wonder how much that conversation is at the front of your mind navigating the music industry.
Streaming culture obviously prioritizes passive listening over active listening, because it’s a per-play valuation system. I think that’s fundamentally at odds with how music should be valued. We get into these arguments about how this platform pays more per-play than this platform and I’m like I don’t care about that, I think the logic of the per-play valuation is wrong. I don’t think that’s how culture works, I don’t think that’s how value works.
I have so many thoughts about Spotify, where to even begin? I think ultimately it’s like how all major platforms on the Internet now will just ultimately become a large ad network where a person and their listening habits, their emotional state of mind, becomes something that’s quantifiable and then marketable. I think that’s something we should all worry about. The Internet started out as this beautiful utopian dream of connecting humanity and providing humanity with access to information and we’ve turned it into a giant fucking mall. That sucks, it doesn’t have to be that way, and I see Spotify as a continuation of that logic and an even perhaps more insidious version of that because it’s tied to music which is tied to people’s emotional state.
As a musician I live in this society and I am working with the infrastructure that is available to me and we’ve been told that music isn’t valuable anymore, it’s all of the things around music. I still very much value music. I am an album artist, singles don’t capture the whole story for me, and I think Spotify is kind of obliterating the album in a lot of ways. It’s also not necessarily entirely new, whatever kind of distribution network music has been a part of has always shaped the music to some degree. The pop song came out of radio time and the LP length is the size of how much music you could fit on a two-sided wax disc. I think what is specifically new about this particular…
Yeah, whatever this particular thing is, we’re seeing a shift from the focus of the artist and their ideas and their expression to focusing on what the consumer wants. So how is the consumer feeling, what’s the heart rate of the consumer, let’s have the music be at the same heart rate, how fast does the consumer want to jog, and let’s play music that fits their running pattern. Those are fundamentally different things: what does a composer want, and how does a neoliberal worker want to spend their day and how can we lubricate that in some way.
If a system was trying to please my 16-year-old self forever I never would have developed the aesthetics I have today because my 16-year-old self wasn’t exposed to that much! You sometimes have to encounter things that make you uncomfortable or that you don’t understand in order to grow. I’m so glad I heard things that I didn’t understand or that scared me or that made me curious about what was going on in that person’s mind and curious about other world views. Music has to take risks in order to develop. Spotify loves to have kind of ambient [music] but ambient came from somewhere, ambient wasn’t a thing until someone took a risk. If we want to continue and develop and have whatever the next ambient is and enjoy that, we have to be able to go outside whatever the consumer expectation is.