🔬 How the AI Music Detector Works

A step-by-step breakdown of the pipeline behind our free AI music detector — from Web Audio decoding through spectrum averaging, residual features, and the client-side classifier — with notes on lossless vs MP3 behavior.

📂 Step 1: Upload Your Audio File

Drag and drop or select an MP3, WAV, FLAC, OGG, or M4A file. Files are processed entirely in your browser — nothing is uploaded to any server.

🔄 Step 2: Browser Decodes Audio Buffer

The Web Audio API decodes your file into a raw PCM buffer, giving access to the full waveform data for analysis — including both stereo channels and the complete frequency range.
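
Downstream analysis works on a single sample stream, so the decoded channels can be averaged into mono. A minimal sketch; `toMono` is an illustrative helper, not the tool's actual code:

```javascript
// Average an AudioBuffer's channels into one mono Float32Array.
// Only `numberOfChannels`, `length`, and `getChannelData` are used,
// so plain objects work for testing outside the browser.
function toMono(buffer) {
  const mono = new Float32Array(buffer.length);
  for (let ch = 0; ch < buffer.numberOfChannels; ch++) {
    const data = buffer.getChannelData(ch);
    for (let i = 0; i < buffer.length; i++) mono[i] += data[i];
  }
  for (let i = 0; i < buffer.length; i++) mono[i] /= buffer.numberOfChannels;
  return mono;
}
```

In the browser this would run on the `AudioBuffer` returned by `audioContext.decodeAudioData(arrayBuffer)`.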

📡 Step 3: Average Spectrum & Detrending

We compute an averaged magnitude spectrum (Hann-windowed FFT, 8192-point frames), convert to log magnitude, and subtract a sliding local minimum to emphasize narrow residual structure over broad musical form.
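
The sliding-minimum detrend can be sketched as follows; the window size here is illustrative:

```javascript
// Detrend a log-magnitude spectrum by subtracting, at each bin, the
// minimum value within +/- halfWindow bins. Broad spectral contours
// are removed; narrow peaks survive because the local floor around
// them stays low.
function detrend(logMag, halfWindow) {
  const out = new Float32Array(logMag.length);
  for (let i = 0; i < logMag.length; i++) {
    let floor = Infinity;
    const lo = Math.max(0, i - halfWindow);
    const hi = Math.min(logMag.length - 1, i + halfWindow);
    for (let j = lo; j <= hi; j++) floor = Math.min(floor, logMag[j]);
    out[i] = logMag[i] - floor;
  }
  return out;
}
```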

🧮 Step 4: Feature Vector Extraction

Codec-scale peaks, baseline bands, and residual statistics are combined into a fixed-length feature vector (24 dimensions) designed to separate AI codec fingerprints from typical human recordings — especially on cleaner masters.
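
As a simplified stand-in for the full 24-dimensional vector, a sketch of residual statistics over a detrended spectrum; the function and threshold are illustrative:

```javascript
// Illustrative residual statistics: mean, standard deviation, and the
// fraction of bins exceeding a peak threshold. The real feature vector
// adds codec-scale peak and baseline-band features; these three are a
// simplified stand-in.
function residualStats(residual, peakThreshold) {
  const n = residual.length;
  let sum = 0;
  for (const v of residual) sum += v;
  const mean = sum / n;
  let varSum = 0, peaks = 0;
  for (const v of residual) {
    varSum += (v - mean) ** 2;
    if (v > peakThreshold) peaks++;
  }
  return [mean, Math.sqrt(varSum / n), peaks / n];
}
```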

⚖️ Step 5: On-Device Classifier

A bundled modelWeights.json file supplies either an AdaBoost ensemble of decision stumps or a linear fallback. Scoring runs entirely in JavaScript, so your audio never leaves the page.
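
A decision-stump ensemble reduces to a weighted vote of one-feature threshold tests. A minimal scorer, assuming each stump carries a feature index, threshold, polarity, and weight; this schema is illustrative, not modelWeights.json's actual format:

```javascript
// Score a feature vector with an AdaBoost ensemble of decision stumps.
// Each stump votes +weight or -weight depending on whether its feature
// crosses its threshold; the signed sum is squashed to a probability.
function scoreStumps(features, stumps) {
  let margin = 0;
  for (const s of stumps) {
    const fires = features[s.featureIndex] > s.threshold;
    margin += (fires === (s.polarity > 0) ? 1 : -1) * s.weight;
  }
  return 1 / (1 + Math.exp(-margin)); // logistic squash to [0, 1]
}
```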

🎯 Step 6: Probability & Evidence

You get a model probability plus a breakdown of strong versus weak evidence. When lossy compression creates broadband spectral speckle, the background peak rate runs high, and the UI marks the result as inconclusive rather than overconfident.
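
The inconclusive gate can be sketched as a rule layered over the model probability; the thresholds shown are illustrative, not the tool's actual cutoffs:

```javascript
// Turn a model probability into a verdict, falling back to
// "inconclusive" when broadband speckle (a high background peak rate)
// means codec-fingerprint evidence cannot be trusted.
function verdict(probability, backgroundPeakRate) {
  if (backgroundPeakRate > 0.15) return "inconclusive"; // speckled spectrum
  if (probability > 0.7) return "likely AI-generated";
  if (probability < 0.3) return "likely human-made";
  return "inconclusive";
}
```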

🧪 The Science Behind AI Music Detection

📊 Why Spectral Flatness Reveals AI Music

Spectral flatness (also called Wiener entropy) is a measure of how uniform the energy distribution is across the frequency spectrum. A perfectly flat spectrum has a spectral flatness of 1.0 (pure white noise), while a pure sine wave has near-zero flatness. Human music occupies a complex middle ground, with characteristic peaks corresponding to fundamental frequencies and harmonics, surrounded by micro-variations from room acoustics, bow friction, breath noise, and finger placement.
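
Spectral flatness is the ratio of the geometric mean to the arithmetic mean of the magnitude spectrum; a direct implementation, computed in log space for numerical stability:

```javascript
// Spectral flatness: geometric mean / arithmetic mean of magnitudes.
// Close to 1.0 for a flat (white-noise) spectrum; near 0 for a
// spectrum dominated by a single peak. Log space avoids underflow
// when multiplying thousands of small magnitudes.
function spectralFlatness(magnitudes) {
  const eps = 1e-12; // guard against log(0)
  let logSum = 0, sum = 0;
  for (const m of magnitudes) {
    logSum += Math.log(m + eps);
    sum += m;
  }
  const n = magnitudes.length;
  return Math.exp(logSum / n) / (sum / n + eps);
}
```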

AI audio generation models — particularly diffusion-based vocoders used by Suno and Udio — synthesize audio by predicting spectral tokens from a learned distribution. This process, while impressive, produces spectral envelopes that are statistically smoother than physically produced sound. The variance in spectral flatness across a track is a key discriminative signal in our AI music detector algorithm.

🎼 Fourier Artifacts: The Mathematical Fingerprint

Research presented at ISMIR 2025 (International Society for Music Information Retrieval) identified a remarkable finding: AI music generators leave periodic artifacts in the Fourier domain that arise directly from the architecture of neural audio codecs. When AI models compress and reconstruct audio using discrete codebook tokens (as in EnCodec or SoundStream), the quantization process introduces regular patterns in the frequency domain — mathematical fingerprints that are absent in recordings of physical instruments and human voices.

These Fourier artifacts appear as subtle periodic peaks in the high-frequency range (typically 12–20 kHz) that don't correspond to any musical harmonic series. Our detector's harmonic artifact analysis specifically targets this frequency band, comparing the energy distribution pattern against known profiles of AI-generated and human-recorded audio.

🥁 Microtiming: The Human Feel an AI Can't Fake

Perhaps the most intuitive feature for non-technical users: human musicians never play perfectly in time. A professional drummer playing at 120 BPM will land individual beats anywhere from 5 to 30 milliseconds ahead of or behind the theoretical perfect grid position. This "microtiming" creates the characteristic groove that distinguishes a live performance from a quantized MIDI sequence — and from AI-generated music.

AI music generators, by design, produce rhythmically quantized outputs because they generate audio in chunks and token sequences that align to regular time grids. The coefficient of variation (CV) of inter-onset intervals is a reliable discriminative metric: human recordings typically show CV values of 0.05–0.20, while AI-generated rhythms often show CV values below 0.03. Our AI music detection algorithm uses onset detection followed by interval statistics to quantify this human-vs-machine rhythmic signature.
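
Given onset timestamps from an upstream detector, the CV itself is a few lines:

```javascript
// Coefficient of variation of inter-onset intervals: std / mean of the
// gaps between successive onset times (in seconds). A perfectly
// quantized rhythm gives 0; human playing typically lands around
// 0.05 to 0.20.
function onsetIntervalCV(onsetTimes) {
  const intervals = [];
  for (let i = 1; i < onsetTimes.length; i++) {
    intervals.push(onsetTimes[i] - onsetTimes[i - 1]);
  }
  const mean = intervals.reduce((a, b) => a + b, 0) / intervals.length;
  const variance =
    intervals.reduce((a, b) => a + (b - mean) ** 2, 0) / intervals.length;
  return Math.sqrt(variance) / mean;
}
```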

🔗 Stereo Correlation: The Room That Wasn't There

When music is recorded in a physical space, the acoustic environment creates natural decorrelation between left and right microphone channels. Sound waves arrive at slightly different times, bounce off walls and surfaces differently, and interact with microphone capsule proximity effects — all producing a stereo field with measurable cross-channel independence. Even direct-injection recordings (guitar plugged straight into a mixing desk) show deliberate stereo processing that introduces channel differences.

AI-generated stereo audio frequently shows near-perfect left-right correlation because the model generates both channels from the same latent representation without a physical acoustic environment to create natural decorrelation. A Pearson correlation coefficient above 0.95 between stereo channels is a strong indicator of synthetic origin. Combined with other features, this contributes 15% of the total AI music detector confidence score.
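
The Pearson coefficient over the two channels' samples can be computed directly:

```javascript
// Pearson correlation between left and right channel samples.
// Values near 1.0 mean the channels are nearly identical copies,
// which is common in AI-generated stereo; real rooms decorrelate.
function stereoCorrelation(left, right) {
  const n = left.length;
  let mL = 0, mR = 0;
  for (let i = 0; i < n; i++) { mL += left[i]; mR += right[i]; }
  mL /= n; mR /= n;
  let cov = 0, vL = 0, vR = 0;
  for (let i = 0; i < n; i++) {
    const dL = left[i] - mL, dR = right[i] - mR;
    cov += dL * dR; vL += dL * dL; vR += dR * dR;
  }
  return cov / Math.sqrt(vL * vR);
}
```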

⚠️ Limitations and Honest Caveats

No AI music detector — including ours — achieves perfect accuracy. Several factors can reduce reliability: heavy lossy compression (MP3 at 128 kbps or below) degrades the high-frequency artifacts that are key discriminative signals; some AI tools (especially those trained on large datasets with sophisticated post-processing) can produce more naturalistic acoustic profiles; and some human music (particularly electronic, quantized, or heavily produced pop) may exhibit features that superficially resemble AI generation.

Our tool provides a probabilistic estimate, not a legal determination. For professional copyright disputes or formal platform submissions, consult a qualified audio forensics specialist. We recommend using our detector as a first-pass screening tool and treating Medium-confidence results as inconclusive.

⚖️ Detection Algorithm Weights

📊 Spectral Flatness Analysis: 25%
🥁 Rhythmic Deviation (Microtiming): 25%
📉 Dynamic Range Variance: 20%
🎼 Pitch & Harmonic Artifacts: 15%
🔗 Stereo Channel Correlation: 15%
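
Given per-feature scores normalized to [0, 1], the weighted blend amounts to a dot product with these weights; the score names here are illustrative:

```javascript
// Combine normalized per-feature AI scores (each in [0, 1]) using the
// published weights. Weights sum to 1.0, so the result stays in [0, 1].
const WEIGHTS = {
  spectralFlatness: 0.25,
  microtiming: 0.25,
  dynamicRange: 0.20,
  harmonicArtifacts: 0.15,
  stereoCorrelation: 0.15,
};

function combinedScore(scores) {
  let total = 0;
  for (const [name, weight] of Object.entries(WEIGHTS)) {
    total += weight * scores[name];
  }
  return total;
}
```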

📺 AI Music Artifacts Explained

▶ Audio Fingerprinting vs AI Detection — Why Content ID Can't Prove AI Music

▶ Using AI to Detect AI Music

❓ Frequently Asked Questions

What is spectral flatness and why does it matter for AI detection?

Spectral flatness measures how noise-like versus tone-like an audio signal is. AI-generated music often shows unnaturally smooth spectral transitions because neural networks synthesize audio token-by-token, lacking the micro-variations caused by physical instruments, room acoustics, and human performance imperfections.

Can AI music generators fool the detector?

Advanced AI models are continuously improving, and some high-end outputs may score lower on AI probability. However, fundamental physical constraints — like the absence of real mechanical resonance and natural acoustic decay — still leave detectable traces in frequency analysis.

What is a Fourier artifact in AI music?

Research published at ISMIR 2025 found that AI music generators produce specific periodic patterns in the Fourier domain — essentially mathematical fingerprints in the frequency spectrum that arise from the way neural audio models quantize and reconstruct sound waveforms.

Does audio compression affect detection accuracy?

Yes. Heavy lossy compression (MP3 at 128 kbps or below) can partially mask AI-specific artifacts. For the most reliable results, analyze uncompressed WAV or FLAC files. If only compressed formats are available, the tool will flag reduced confidence in the result.