How to Steal a Whale


Maybe that title isn’t quite the right description of what I’ve been asked to explain lately, but bear with me. It has to do with a question I hear more and more from colleagues, asked with equal parts concern and curiosity.
It boils down to this: if an AI trained on bird song can recognize a whale, can it catch a music thief?

There is a quietly amusing irony at the heart of modern AI research. Google’s Perch 2.0 model, trained almost entirely on bird sounds, can accurately identify whale vocalizations. Not only does it recognize that whales exist, it can even distinguish between types. It can tell a Southern Resident killer whale from a Northern Resident killer whale without ever having formally studied either one.

If you are uploading fifty thousand AI-generated tracks a day to streaming platforms as part of a fraud operation, that should be deeply unsettling.

What whales and fake playlists have in common

The mechanism behind Perch 2.0 is transfer learning.

A bird call and a whale song are both pressure waves. Both decompose through a Fast Fourier Transform into their component frequencies. Both produce spectrograms — visual maps of sound broken into frequency bands over time — carrying structural signatures: harmonic overtones, rhythmic spacing, and tonal envelope shapes.
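
To make the spectrogram idea concrete, here is a minimal sketch in plain Python. It is not how Perch 2.0 or any production system computes its features. It uses a naive discrete Fourier transform, which gives the same result as a Fast Fourier Transform but far more slowly, and the sample rate, frame size, and test tone are all assumptions chosen for illustration.

```python
import cmath
import math

def dft_magnitudes(frame):
    """Magnitudes of the non-negative frequency bins of a naive DFT."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        mags.append(abs(acc))
    return mags

def spectrogram(samples, frame_size=64, hop=32):
    """Slide a Hann-windowed frame over the signal; one spectrum per frame."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_size)
              for i in range(frame_size)]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + i] * window[i] for i in range(frame_size)]
        frames.append(dft_magnitudes(frame))
    return frames

SAMPLE_RATE = 8000  # Hz, an assumed value for this toy example
FRAME_SIZE = 64

# A pure 1 kHz tone stands in for a bird call, whale song, or guitar note.
tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(256)]
spec = spectrogram(tone, frame_size=FRAME_SIZE, hop=32)

# The strongest bin in the first frame should sit at the tone's frequency.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * SAMPLE_RATE / FRAME_SIZE
print(len(spec), "frames,", len(spec[0]), "bins, peak at", peak_hz, "Hz")
```

Each row of `spec` is one slice of time and each column is one frequency band, which is exactly the "visual map" a model looks at, whatever animal or instrument made the sound.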

An AI trained to distinguish thousands of bird species develops a generalized understanding of acoustic structure that naturally transfers to any sound sharing the same physical grammar.

Music is no different.

A guitar riff, a synthesized bassline, a generated melody, or a stolen chorus all travel as pressure waves and break down into spectrograms. A large model trained on millions of musical works creates an embedding space (a multi-dimensional numerical representation of acoustic structure) where true stylistic relationships cluster together and anomalies drift apart. When a derivative work is analyzed, its embedding often lands strikingly close to the original. Even if the tempo is changed, the key shifted, or the mix compressed to hide the theft, that closeness tends to remain measurable as a short distance between the two vectors.
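
Here is a rough sketch of what "lands close in embedding space" means in practice. The four-dimensional vectors, the track names, and the 0.95 threshold are all made up for illustration; a real model would produce embeddings with hundreds of dimensions, and a real threshold would have to be tuned on labeled data. Cosine similarity is one common way to measure closeness, not necessarily the one any given system uses.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def nearest_registered(query, catalog):
    """Return (track_id, similarity) for the catalog entry closest to query."""
    scored = ((track_id, cosine_similarity(query, emb))
              for track_id, emb in catalog.items())
    return max(scored, key=lambda pair: pair[1])

# Hypothetical embeddings of two registered works.
catalog = {
    "original_song": [0.9, 0.1, 0.3, 0.0],
    "unrelated_song": [0.0, 0.8, 0.1, 0.6],
}

# Hypothetical embedding of an upload derived from "original_song".
suspect = [0.85, 0.15, 0.35, 0.05]

THRESHOLD = 0.95  # assumed cut-off, not a published figure
match_id, score = nearest_registered(suspect, catalog)
flagged = score >= THRESHOLD
print(match_id, round(score, 3), "flagged" if flagged else "clear")
```

With a large catalog, the linear scan in `nearest_registered` would normally be replaced by an approximate nearest-neighbor index, but the comparison itself stays the same.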

The scale of the problem

Fraud detection cannot rely on human reviewers. In blind tests, 97% of consumers fail to distinguish AI-generated music from human-created music by ear. Automated analysis, which takes less than 200 milliseconds per track and achieves over 94% detection accuracy, is the only scalable solution.
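
A quick back-of-the-envelope calculation shows why that speed matters. It combines the two figures already mentioned here, fifty thousand uploads a day and 200 milliseconds per track, and assumes a single worker analyzing one track at a time with no other overhead.

```python
TRACKS_PER_DAY = 50_000   # upload volume of the fraud operation described above
MS_PER_TRACK = 200        # upper bound on analysis time per track
MS_PER_DAY = 24 * 60 * 60 * 1000

# Total analysis time needed to screen one day's fraudulent uploads.
compute_seconds = TRACKS_PER_DAY * MS_PER_TRACK / 1000
compute_hours = compute_seconds / 3600

# How many tracks one always-on worker could screen in a day.
worker_capacity = MS_PER_DAY // MS_PER_TRACK

print(f"{compute_seconds:.0f} s (about {compute_hours:.1f} h) of compute")
print(f"one worker can screen {worker_capacity} tracks per day")
```

Under those assumptions, a single worker clears the whole day's flood in under three hours and still has capacity to spare.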

Detecting the generator, not just the copy

Different AI music platforms leave distinct spectral signatures. Suno AI produces vocal artifacts in the 2–5 kHz range. Udio shows phase-coherence anomalies at 32-second boundaries. These are like whale dialects — patterns that reveal not just what was created, but which system created it.
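
As a loose illustration of how a band-specific signature could be measured, the sketch below computes what share of a spectrum's energy falls between 2 and 5 kHz. This is not the method any real detector is known to use. The toy spectrum and the 16 kHz sample rate are assumptions, and a single energy ratio would be only one weak feature among many.

```python
def band_energy_ratio(magnitudes, sample_rate, low_hz, high_hz):
    """Fraction of spectral energy inside [low_hz, high_hz].

    `magnitudes` is a one-sided magnitude spectrum covering 0 Hz up to
    sample_rate / 2, so it has n_fft / 2 + 1 entries.
    """
    n_fft = 2 * (len(magnitudes) - 1)
    bin_hz = sample_rate / n_fft
    total = sum(m * m for m in magnitudes)
    if total == 0:
        return 0.0
    in_band = sum(m * m for k, m in enumerate(magnitudes)
                  if low_hz <= k * bin_hz <= high_hz)
    return in_band / total

# Toy spectrum: 9 bins at 0, 1000, ..., 8000 Hz for a 16 kHz sample rate,
# with extra energy placed in the 2-5 kHz bins.
spectrum = [1.0, 1.0, 2.0, 2.0, 2.0, 2.0, 1.0, 1.0, 1.0]
ratio = band_energy_ratio(spectrum, 16000, 2000, 5000)
print(f"{ratio:.3f} of the energy sits in 2-5 kHz")
```

A classifier would compare values like this across many frames and many tracks before attributing anything to a particular generator.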

Making the spectrogram stick

Identifying infringement is only half the battle. Each track needs to be registered with a permanent blockchain record that includes a cryptographic identifier, ownership data, and an acoustic fingerprint captured at creation. When a later upload lands suspiciously close in embedding space to a registered work, the timestamped blockchain record provides proof for enforcement.
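
The record described above can be sketched with nothing more than hashing. The example below is not a real blockchain and does not use any particular ledger's API. It is an append-only hash chain standing in for one, and the field names, owners, and fingerprints are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, asdict, replace

GENESIS = "0" * 64  # placeholder "previous hash" for the first record

@dataclass(frozen=True)
class Registration:
    content_id: str     # SHA-256 of the audio bytes (cryptographic identifier)
    owner: str          # ownership data
    fingerprint: list   # acoustic fingerprint captured at creation
    registered_at: str  # ISO 8601 timestamp
    prev_hash: str      # hash of the previous record, linking the chain

    def record_hash(self):
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

def register(chain, audio_bytes, owner, fingerprint, registered_at):
    """Append a new record that commits to the one before it."""
    prev = chain[-1].record_hash() if chain else GENESIS
    record = Registration(hashlib.sha256(audio_bytes).hexdigest(),
                          owner, list(fingerprint), registered_at, prev)
    chain.append(record)
    return record

def verify(chain):
    """True if every record still points at the hash of its predecessor."""
    prev = GENESIS
    for record in chain:
        if record.prev_hash != prev:
            return False
        prev = record.record_hash()
    return True

chain = []
register(chain, b"audio-a", "Alice", [0.9, 0.1], "2024-01-01T00:00:00Z")
register(chain, b"audio-b", "Bob", [0.2, 0.7], "2024-01-02T00:00:00Z")
print(verify(chain))  # the untouched chain checks out

# Rewriting history breaks the link from the next record.
tampered = [replace(chain[0], owner="Mallory"), chain[1]]
print(verify(tampered))
```

Because each record commits to the one before it, changing an owner or a timestamp after the fact is detectable, which is the property that makes the timestamp useful as evidence.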

What the whales are actually telling us

Perch 2.0 shows us that structural relationships in sound leave lasting traces in embedding space, and the same holds for a copy made with intent to deceive. Post-processing struggles to erase those relationships. Streaming fraud has long relied on sheer scale to hide illicit activity. Transfer learning takes that cover away.

A model trained on the acoustic grammar of music can process spectrograms, generate embeddings, measure distances in vector space, and flag anomalies continuously — all in under 200 milliseconds per track.

Over the years, I have learned to expect surprises in copyright. I did not expect whales to teach anyone about it. But here we are.

And perhaps that is the most reassuring part: the same technologies that make imitation easier may also make accountability inevitable.

#MusicTech #Copyright #AI