AI is coming for music, too

Artificial intelligence was barely a term in 1956, when top scientists from the field of computing arrived at Dartmouth College for a summer conference. The computer scientist John McCarthy had coined the phrase in the funding proposal for the event, a gathering to work through how to build machines that could use language, solve problems like humans, and improve themselves. But it was a good choice, one that captured the organizers’ founding premise: Any feature of human intelligence could “in principle be so precisely described that a machine can be made to simulate it.”
In their proposal, the group had listed several “aspects of the artificial intelligence problem.” The last item on their list, and in hindsight perhaps the most difficult, was building a machine that could exhibit creativity and originality.
At the time, psychologists were grappling with how to define and measure creativity in humans. The prevailing theory—that creativity was a product of intelligence and high IQ—was fading, but psychologists weren’t sure what to replace it with. The Dartmouth organizers had one of their own. “The difference between creative thinking and unimaginative competent thinking lies in the injection of some randomness,” they wrote, adding that such randomness “must be guided by intuition to be efficient.”
Nearly 70 years later, following a number of boom-and-bust cycles in the field, we now have AI models that more or less follow that recipe. While large language models that generate text have exploded in the last three years, a different type of AI, based on what are called diffusion models, is having an unprecedented impact on creative domains. By transforming random noise into coherent patterns, diffusion models can generate new images, videos, or speech, guided by text prompts or other input data. The best ones can create outputs indistinguishable from the work of people, as well as bizarre, surreal results that feel distinctly nonhuman.
Now these models are marching into a creative field that is arguably more vulnerable to disruption than any other: music. AI-generated creative works—from orchestra performances to heavy metal—are poised to suffuse our lives more thoroughly than any other product of AI has done yet. The songs are likely to blend into our streaming platforms, party and wedding playlists, soundtracks, and more, whether or not we notice who (or what) made them.
For years, diffusion models have stirred debate in the visual-art world about whether what they produce reflects true creation or mere replication. Now this debate has come for music, an art form that is deeply embedded in our experiences, memories, and social lives. Music models can now create songs capable of eliciting real emotional responses, presenting a stark example of how difficult it’s becoming to define authorship and originality in the age of AI.
The courts are actively grappling with this murky territory. Major record labels are suing the top AI music generators, alleging that diffusion models do little more than replicate human art without compensation to artists. The model makers counter that their tools are made to assist in human creation.
In deciding who is right, we’re forced to think hard about our own human creativity. Is creativity, whether in artificial neural networks or biological ones, merely the result of vast statistical learning and drawn connections, with a sprinkling of randomness? If so, then authorship is a slippery concept. If not—if there is some distinctly human element to creativity—what is it? What does it mean to be moved by something without a human creator? I had to wrestle with these questions the first time I heard an AI-generated song that was genuinely fantastic—it was unsettling to know that someone merely wrote a prompt and clicked “Generate.” That predicament is coming soon for you, too.
Making connections
After the Dartmouth conference, its participants went off in different research directions to create the foundational technologies of AI. At the same time, cognitive scientists were following a 1950 call from J.P. Guilford, president of the American Psychological Association, to tackle the question of creativity in human beings. They came to a definition, first formalized in 1953 by the psychologist Morris Stein in the Journal of Psychology: Creative works are both novel, meaning they present something new, and useful, meaning they serve some purpose to someone. Some have called for “useful” to be replaced by “satisfying,” and others have pushed for a third criterion: that creative things are also surprising.
Later, in the 1990s, the rise of functional magnetic resonance imaging made it possible to study more of the neural mechanisms underlying creativity in many fields, including music. Computational methods in the past few years have also made it easier to map out the role that memory and associative thinking play in creative decisions.
What has emerged is less a grand unified theory of how a creative idea originates and unfolds in the brain and more an ever-growing list of powerful observations. We can first divide the human creative process into phases, including an ideation or proposal step, followed by a more critical and evaluative step that looks for merit in ideas. A leading theory on what guides these two phases is called the associative theory of creativity, which posits that the most creative people can form novel connections between distant concepts.

“It could be like spreading activation,” says Roger Beaty, a researcher who leads the Cognitive Neuroscience of Creativity Laboratory at Penn State. “You think of one thing; it just kind of activates related concepts to whatever that one concept is.”
These connections often hinge specifically on semantic memory, which stores concepts and facts, as opposed to episodic memory, which stores memories from a particular time and place. Recently, more sophisticated computational models have been used to study how people make connections between concepts across great “semantic distances.” For example, the word apocalypse is more closely related to nuclear power than to celebration. Studies have shown that highly creative people may perceive very semantically distinct concepts as close together. Artists have been found to generate word associations across greater distances than non-artists. Other research has supported the idea that creative people have “leaky” attention—that is, they often notice information that might not be particularly relevant to their immediate task.
Neuroscientific methods for evaluating these processes do not suggest that creativity unfolds in a particular area of the brain. “Nothing in the brain produces creativity like a gland secretes a hormone,” Dean Keith Simonton, a leader in creativity research, wrote in the Cambridge Handbook of the Neuroscience of Creativity.
The evidence instead points to a few dispersed networks of activity during creative thought, Beaty says—one to support the initial generation of ideas through associative thinking, another involved in identifying promising ideas, and another for evaluation and modification. A new study, led by researchers at Harvard Medical School and published in February, suggests that creativity might even involve the suppression of particular brain networks, like ones involved in self-censorship.
So far, machine creativity—if you can call it that—looks quite different. Though at the time of the Dartmouth conference AI researchers were interested in machines inspired by human brains, that focus had shifted by the time diffusion models were invented, about a decade ago.
The best clue to how they work is in the name. If you dip a paintbrush loaded with red ink into a glass jar of water, the ink will diffuse and swirl into the water seemingly at random, eventually yielding a pale pink liquid. Diffusion models simulate this process in reverse, reconstructing legible forms from randomness.
For a sense of how this works for images, picture a photo of an elephant. To train the model, you make a copy of the photo, adding a layer of random black-and-white static on top. Make a second copy and add a bit more, and so on hundreds of times until the last image is pure static, with no elephant in sight. For each image in between, a statistical model predicts how much of the image is noise and how much is really the elephant. It compares its guesses with the right answers and learns from its mistakes. Over millions of these examples, the model gets better at “de-noising” the images and connecting these patterns to descriptions like “male Borneo elephant in an open field.”
Now that it’s been trained, generating a new image means reversing this process. If you give the model a prompt, like “a happy orangutan in a mossy forest,” it generates an image of random white noise and works backward, using its statistical model to remove bits of noise step by step. At first, rough shapes and colors appear. Details come after, and finally (if it works) an orangutan emerges, all without the model “knowing” what an orangutan is.
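The noising-and-recovery idea above can be sketched in a few lines of code. This is a toy illustration of the standard diffusion forward process, not Udio's or Suno's actual systems; the linear noise schedule and the "oracle" noise prediction are assumptions for demonstration, since a real model learns to predict the noise with a neural network trained on millions of examples.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "image": a 1-D signal standing in for the elephant photo.
x0 = np.sin(np.linspace(0, 2 * np.pi, 64))

# An assumed linear noise schedule; real models tune this carefully.
T = 200
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)

def q_sample(x0, t, noise):
    """Forward process: jump straight to step t by mixing signal with static."""
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise

noise = rng.standard_normal(x0.shape)
x_mid = q_sample(x0, T // 2, noise)   # partly noised: signal still visible
x_end = q_sample(x0, T - 1, noise)    # nearly pure static

# Training teaches a network to predict `noise` from a noisy input.
# Here we cheat with an oracle prediction, to show that a perfect guess
# lets you undo the mixing and recover the clean signal exactly.
def recover(x_t, t, predicted_noise):
    return (x_t - np.sqrt(1.0 - alpha_bar[t]) * predicted_noise) / np.sqrt(alpha_bar[t])

x0_hat = recover(x_end, T - 1, noise)
print(np.allclose(x0_hat, x0))  # → True
```

In practice the model's noise prediction is imperfect, so generation removes noise gradually over many steps rather than in one jump, which is why rough shapes appear before fine details.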
Musical images
The approach works much the same way for music. A diffusion model does not “compose” a song the way a band might, starting with piano chords and adding vocals and drums. Instead, all the elements are generated at once. The process hinges on the fact that the many complexities of a song can be depicted visually in a single waveform, representing the amplitude of a sound wave plotted against time.
Think of a record player. By traveling along a groove in a piece of vinyl, a needle mirrors the path of the sound waves engraved in the material and transmits it into a signal for the speaker. The speaker simply pushes out air in these patterns, generating sound waves that convey the whole song.
From a distance, a waveform might look as if it just follows a song’s volume. But if you were to zoom in closely enough, you could see patterns in the spikes and valleys, like the 49 waves per second for a bass guitar playing a low G. A waveform contains the summation of the frequencies of all different instruments and textures. “You see certain shapes start taking place,” says David Ding, cofounder of the AI music company Udio, “and that kind of corresponds to the broad melodic sense.”
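The claim that one waveform carries every instrument at once can be demonstrated directly: a mix is just the sum of the individual waves, and the component frequencies remain recoverable from the sum. The instrument frequencies and amplitudes below are illustrative assumptions, with the bass set at 49 Hz as in the low-G example.

```python
import numpy as np

SR = 44_100                 # samples per second (CD quality)
t = np.arange(SR) / SR      # one second of time stamps

# A low G on a bass guitar vibrates about 49 times per second.
bass = 0.6 * np.sin(2 * np.pi * 49.0 * t)

# Two more "instruments" at higher pitches; the full waveform is
# simply the sum of all the individual sound waves.
chord = (bass
         + 0.3 * np.sin(2 * np.pi * 196.0 * t)
         + 0.1 * np.sin(2 * np.pi * 392.0 * t))

# The 49 Hz pattern survives inside the mix: the dominant peak of
# the spectrum sits at the bass frequency.
spectrum = np.abs(np.fft.rfft(chord))
freqs = np.fft.rfftfreq(len(chord), d=1 / SR)
print(round(float(freqs[np.argmax(spectrum)])))  # → 49
```

This is why a diffusion model can treat a whole song as one object: every texture is encoded in a single amplitude-versus-time curve.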
Since waveforms, or similar charts called spectrograms, can be treated like images, you can create a diffusion model out of them. A model is fed millions of clips of existing songs, each labeled with a description. To generate a new song, it starts with pure random noise and works backward to create a new waveform. The path it takes to do so is shaped by what words someone puts into the prompt.
Ding worked at Google DeepMind for five years as a senior research engineer on diffusion models for images and videos, but he left to found Udio, based in New York, in 2023. The company and its competitor Suno, based in Cambridge, Massachusetts, are now leading the race for music generation models. Both aim to build AI tools that enable nonmusicians to make music. Suno is larger, claiming more than 12 million users, and raised a $125 million funding round in May 2024. The company has partnered with artists including Timbaland. Udio raised a seed funding round of $10 million in April 2024 from prominent investors like Andreessen Horowitz as well as musicians will.i.am and Common.
The results of Udio and Suno so far suggest there’s a sizable audience of people who may not care whether the music they listen to is made by humans or machines. Suno has artist pages for creators, some with large followings, who generate songs entirely with AI, often accompanied by AI-generated images of the artist. These creators are not musicians in the conventional sense but skilled prompters, creating work that can’t be attributed to a single composer or singer. In this emerging space, our normal definitions of authorship—and our lines between creation and replication—all but dissolve.
The music industry is pushing back. Both companies were sued by major record labels in June 2024, and the lawsuits are ongoing. The labels, including Universal and Sony, allege that the AI models have been trained on copyrighted music “at an almost unimaginable scale” and generate songs that “imitate the qualities of genuine human sound recordings” (the case against Suno cites one ABBA-adjacent song called “Prancing Queen,” for example).
Suno did not respond to requests for comment on the litigation, but in a statement responding to the case posted on Suno’s blog in August, CEO Mikey Shulman said the company trains on music found on the open internet, which “indeed contains copyrighted materials.” But, he argued, “learning is not infringing.”
A representative from Udio said the company would not comment on pending litigation. At the time of the lawsuit, Udio released a statement mentioning that its model has filters to ensure that it “does not reproduce copyrighted works or artists’ voices.”
Complicating matters even further is guidance from the US Copyright Office, released in January, that says AI-generated works can be copyrighted if they involve a considerable amount of human input. A month later, an artist in New York received what might be the first copyright for a piece of visual art made with the help of AI. The first song could be next.
Novelty and mimicry
These legal cases wade into a gray area similar to one explored by other court battles unfolding in AI. At issue here is whether training AI models on copyrighted content is allowed, and whether generated songs unfairly copy a human artist’s style.
But AI music is likely to proliferate in some form regardless of these court decisions; YouTube has reportedly been in talks with major labels to license their music for AI training, and Meta’s recent expansion of its agreements with Universal Music Group suggests that licensing for AI-generated music might be on the table.
If AI music is here to stay, will any of it be any good? Consider three factors: the training data, the diffusion model itself, and the prompting. The model can only be as good as the library of music it learns from and the descriptions of that music, which must be detailed enough to capture it well. A model’s architecture then determines how well it can use what’s been learned to generate songs. And the prompt you feed into the model—as well as the extent to which the model “understands” what you mean by “turn down that saxophone,” for example—is pivotal too.
Arguably the most important issue is the first: How extensive and diverse is the training data, and how well is it labeled? Neither Suno nor Udio has disclosed what music has gone into its training set, though these details will likely have to be disclosed during the lawsuits.
Udio says the way those songs are labeled is essential to the model. “An area of active research for us is: How do we get more and more refined descriptions of music?” Ding says. A basic description would identify the genre, but then you could also say whether a song is moody, uplifting, or calm. More technical descriptions might mention a two-five-one chord progression or a specific scale. Udio says it does this through a combination of machine and human labeling.
“Since we want to target a broad range of target users, that also means that we need a broad range of music annotators,” he says. “Not just people with music PhDs who can describe the music on a very technical level, but also music enthusiasts who have their own informal vocabulary for describing music.”
Competitive AI music generators must also learn from a constant supply of new songs made by people, or else their outputs will be stuck in time, sounding stale and dated. For this, today’s AI-generated music relies on human-generated art. In the future, though, AI music models may train on their own outputs, an approach being experimented with in other AI domains.
Because models start with a random sampling of noise, they are nondeterministic; giving the same AI model the same prompt will result in a new song each time. That’s also because many makers of diffusion models, including Udio, inject additional randomness throughout the process—essentially taking the waveform generated at each step and distorting it ever so slightly in hopes of adding imperfections that make the output more interesting or real. The organizers of the Dartmouth conference themselves recommended such a tactic back in 1956.
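Both sources of randomness can be sketched with a hypothetical stand-in for a sampler (this is not Udio's actual code; `generate_sample` and its jitter amount are invented for illustration): the output depends on the prompt, on freshly drawn starting noise, and on extra noise injected at each step, so repeated runs with one prompt never match.

```python
import numpy as np

def generate_sample(prompt, rng):
    """Hypothetical stand-in for a diffusion sampler: the result mixes a
    deterministic prompt-derived target with random starting static, plus
    a little deliberately injected jitter at every denoising step."""
    prompt_part = np.full(8, sum(ord(c) for c in prompt) % 7)  # deterministic
    sample = rng.standard_normal(8)                # random starting noise
    for _ in range(10):                            # toy "denoising" loop
        sample = 0.5 * sample + 0.5 * prompt_part  # pull toward the prompt
        sample += 0.05 * rng.standard_normal(8)    # extra injected randomness
    return sample

rng = np.random.default_rng()
a = generate_sample("moody synthwave, slow tempo", rng)
b = generate_sample("moody synthwave, slow tempo", rng)
print(np.array_equal(a, b))  # → False: same prompt, two different "songs"
```

Fixing the random seed would make the output repeatable, which is exactly why the same prompt yields a fresh song each time in a live system that draws new noise on every run.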
According to Udio cofounder and chief operating officer Andrew Sanchez, it’s this randomness inherent in generative AI programs that comes as a shock to many people. For the past 70 years, computers have executed deterministic programs: Give the software an input and receive the same response every time.
“Many of our artist partners will be like, ‘Well, why does it do this?’” he says. “We’re like, well, we don’t really know.” The generative era requires a new mindset, even for the companies creating it: that AI programs can be messy and inscrutable.
Is the result creation or simply replication of the training data? Fans of AI music told me we could ask the same question about human creativity. As we listen to music through our youth, neural mechanisms for learning are weighted by these inputs, and memories of these songs influence our creative outputs. In a recent study, Anthony Brandt, a composer and professor of music at Rice University, pointed out that both humans and large language models use past experiences to evaluate possible future scenarios and make better choices.
Indeed, much of human art, especially in music, is borrowed. This often results in litigation, with artists alleging that a song was copied or sampled without permission. Some artists suggest that diffusion models should be made more transparent, so we could know that a given song’s inspiration is three parts David Bowie and one part Lou Reed. Udio says there is ongoing research to achieve this, but right now, no one can do it reliably.
For great artists, “there is that combination of novelty and influence that is at play,” Sanchez says. “And I think that that’s something that is also at play in these technologies.”
But there are lots of areas where attempts to equate human neural networks with artificial ones quickly fall apart under scrutiny. Brandt carves out one domain where he sees human creativity clearly soar above its machine-made counterparts: what he calls “amplifying the anomaly.” AI models operate in the realm of statistical sampling. They do not work by emphasizing the exceptional but, rather, by reducing errors and finding probable patterns. Humans, on the other hand, are intrigued by quirks. “Rather than being treated as oddball events or ‘one-offs,’” Brandt writes, the quirk “permeates the creative product.”

He cites Beethoven’s decision to add a jarring off-key note in the last movement of his Symphony no. 8. “Beethoven could have left it at that,” Brandt says. “But rather than treating it as a one-off, Beethoven continues to reference this incongruous event in various ways. In doing so, the composer takes a momentary aberration and magnifies its impact.” One could look to similar anomalies in the backward loop sampling of late Beatles recordings, pitched-up vocals from Frank Ocean, or the incorporation of “found sounds,” like recordings of a crosswalk signal or a door closing, favored by artists like Charlie Puth and by Billie Eilish’s producer Finneas O’Connell.
If a creative output is indeed defined as one that’s both novel and useful, Brandt’s interpretation suggests that the machines may have us matched on the second criterion while humans reign supreme on the first.
To explore whether that is true, I spent a few days playing around with Udio’s model. It takes a minute or two to generate a 30-second sample, but if you have paid versions of the model you can generate whole songs. I decided to pick 12 genres, generate a song sample for each, and then find similar songs made by people. I built a quiz to see if people in our newsroom could spot which songs were made by AI.
The average score was 46%. And for a few genres, especially instrumental ones, listeners were wrong more often than not. When I watched people do the test in front of me, I noticed that the qualities they confidently flagged as a sign of composition by AI—a fake-sounding instrument, a weird lyric—rarely proved them right. Predictably, people did worse in genres they were less familiar with; some did okay on country or soul, but many stood no chance against jazz, classical piano, or pop. Beaty, the creativity researcher, scored 66%, while Brandt, the composer, finished at 50% (though he answered correctly on the orchestral and piano sonata tests).
Remember that the model doesn’t deserve all the credit here; these outputs could not have been created without the work of human artists whose work was in the training data. But with just a few prompts, the model generated songs that few people would pick out as machine-made. A few could easily have been played at a party without raising objections, and I found two I genuinely loved, even as a lifelong musician and generally picky music person. But sounding real is not the same thing as sounding original. The songs did not feel driven by oddities or anomalies—certainly not on the level of Beethoven’s “jump scare.” Nor did they seem to bend genres or cover great leaps between themes. In my test, people sometimes struggled to decide whether a song was AI-generated or simply bad.
How much will this matter in the end? The courts will play a role in deciding whether AI music models serve up replications or new creations—and how artists are compensated in the process—but we, as listeners, will decide their cultural value. To appreciate a song, do we need to picture a human artist behind it—someone with experience, ambitions, opinions? Is a great song no longer great if we find out it’s the product of AI?
Sanchez says people may wonder who is behind the music. But “at the end of the day, however much AI component, however much human component, it’s going to be art,” he says. “And people are going to react to it on the quality of its aesthetic merits.”
In my experiment, though, I saw that the question really mattered to people—and some vehemently resisted the idea of enjoying music made by a computer model. When one of my test subjects instinctively started bobbing her head to an electro-pop song on the quiz, her face expressed doubt. It was almost as if she was trying her best to picture a human rather than a machine as the song’s composer. “Man,” she said, “I really hope this isn’t AI.”
It was.