Fourier Transform and Deepfake Detection, From First Principles
I was a good math student growing up. Got high grades, went to top universities, got a master’s degree. But looking back, I never really understood where any of it was used. Fourier transforms, Laplace transforms, eigenvalues: I could solve them on paper, but I never asked the simple question: where is this actually used? I never tried to research it or connect it to anything real. As Feynman said, “You can know the name of a bird in all the languages of the world, but when you’re finished, you’ll know absolutely nothing whatever about the bird.” I knew the names. I didn’t know the birds.
So out of curiosity, I’ve decided to go back and revisit some of these topics and see how they relate to modern innovations like generative AI, cybersecurity, and deepfake detection.
Let’s start with Fourier Transforms. You don’t need any math background to follow along. If you can picture a smoothie and a guitar, you’re good.
The Smoothie
Imagine someone hands you a smoothie. It’s blended into one color, one texture. You want to find out what fruits were used to make it, the recipe. You could guess by tasting it, but imagine a machine that could analyze it and tell you exactly what went in: 40% strawberry, 30% banana, 20% mango, 10% spinach.
That’s the Fourier Transform. It takes something that’s all mixed together and tells you exactly what individual pieces make it up. The smoothie is the mix. The fruits are the pieces. And the machine separates them without physically un-blending anything.
It works the other way too. Give someone the recipe, they can blend it back into the same smoothie. That’s the Inverse Fourier Transform. You can go back and forth between the blended version and the ingredient list.
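If you want to see the round trip for yourself, here is a minimal sketch using NumPy's FFT routines. The "smoothie" is a made-up mix of three sine waves (the frequencies and amounts are my own illustration, not anything special), and the point is only that transforming and then inverse-transforming gives back the original signal.

```python
import numpy as np

# A made-up "smoothie": three sine waves blended into one signal.
sample_rate = 1000                          # samples per second
t = np.arange(sample_rate) / sample_rate    # one second of time stamps
smoothie = (0.5 * np.sin(2 * np.pi * 50 * t)
            + 0.3 * np.sin(2 * np.pi * 120 * t)
            + 0.2 * np.sin(2 * np.pi * 300 * t))

recipe = np.fft.fft(smoothie)               # Fourier Transform: mix -> ingredient list
rebuilt = np.fft.ifft(recipe).real          # Inverse transform: ingredient list -> mix

# Largest difference between the original and the rebuilt signal.
round_trip_error = np.max(np.abs(smoothie - rebuilt))
print(round_trip_error)                     # tiny, limited only by floating-point rounding
```

Nothing is lost going back and forth, which is why the recipe and the smoothie can be treated as two views of the same thing.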
Now Think About Sound
Same idea, applied to something you already know.
When you pluck a guitar string, it vibrates. That vibration travels through the air as a wave and reaches your ear as a note.
A thick bass string vibrates slowly, maybe 100 times per second, and you hear a deep sound. A thin string vibrates faster, maybe 1,000 times per second, and you hear a sharp, high sound. Not louder, just higher. Think of the difference between thunder (deep, slow vibration) and a whistle (sharp, fast vibration). They can both be loud or quiet. The speed of the vibration determines the type of sound, not the volume.
That “times per second” is frequency. Low frequency, slow vibration, deep sound. High frequency, fast vibration, high pitch.
When you strum a chord, multiple strings vibrate at the same time. Your ear receives all of it as one combined wave. But your brain somehow pulls it apart. You hear the bass and the treble separately, even though they arrived together.
Your ear and brain are doing something very close to a Fourier Transform. Taking one complex wave and separating it into individual frequencies. The same thing the smoothie machine does, just with sound waves instead of fruit.
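Here is a small sketch of that chord idea in code. The three "notes" and their loudness values are invented for illustration. The transform receives only the combined wave and still reports which frequencies are inside it.

```python
import numpy as np

sample_rate = 8000                           # samples per second
t = np.arange(sample_rate) / sample_rate     # one second of audio

# Hypothetical chord: frequency in Hz -> loudness (amplitude).
notes = {110: 1.0, 220: 0.6, 330: 0.3}
chord = sum(amp * np.sin(2 * np.pi * f * t) for f, amp in notes.items())

# rfft is the FFT for real-valued signals; dividing by N/2 turns
# the raw magnitudes back into the amplitude of each sine wave.
spectrum = np.abs(np.fft.rfft(chord)) / (chord.size / 2)
freqs = np.fft.rfftfreq(chord.size, d=1 / sample_rate)

# The three strongest frequencies in the combined wave.
loudest = np.argsort(spectrum)[-3:]
found = sorted(int(round(freqs[i])) for i in loudest)
print(found)                                 # [110, 220, 330]
```

The heights of the three peaks also match the loudness values that went in, so the spectrum recovers both which notes were played and how strongly.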
What Does This Have to Do With Images?
Sound is a wave that changes over time. That makes sense. But an image is just pixels on a screen. It doesn’t move. So where are the waves?
Think about a photo of a clear blue sky. The color barely changes from one pixel to the next. Smooth, gradual. If you graphed the pixel brightness from left to right, you’d get a nearly flat line.
Now think about a photo of a striped shirt. The color alternates rapidly: white, blue, white, blue, over and over. The pixel values are changing fast, jumping back and forth with every few pixels.
That’s frequency in an image.
Low frequency means things change slowly across the image. A clear sky, a blurred background, a smooth wall. The color stays roughly the same from one pixel to the next.
High frequency means things change fast. The sharp edge where your face ends and the background begins, that’s a sudden jump in color. Individual hair strands against skin, each one is a rapid back-and-forth between dark and light, just like the striped shirt.
Every image is a mix of these, just like every chord is a mix of notes. The Fourier Transform separates an image into its frequency components and gives you a map (also called a frequency spectrum) showing which ones are present and how strong they are.
What Does That Map Look Like?
When you run the Fourier Transform on an image, you get a new image called a frequency map. In the way it is usually displayed, think of it as a layout where:
- Low frequencies sit in the center
- High frequencies sit around the edges
- The brighter a spot is, the more of that frequency exists in the original image. Dark means that frequency is barely there.
Brightness doesn’t mean low or high frequency. It means “how much.” A bright spot in the center means the image has a lot of smooth, gradual areas. A bright spot near the edges means the image has a lot of sharp detail.
So a landscape photo (mostly smooth sky and soft ground) will have almost all of its brightness in the center, because most of the image is low frequency. A photo of a striped shirt would also have bright spots well away from the center, because so much of the image is rapid back-and-forth detail.
Most real-world photos have a mix of both, and the pattern on the map looks organic and a bit random, because real scenes have a natural, irregular combination of smooth areas and sharp details.
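To make the sky-versus-stripes comparison concrete, here is a rough sketch with two synthetic images that I made up as stand-ins: a gentle gradient for the sky and a pattern that flips every two pixels for the shirt. The helper names are my own. `np.fft.fftshift` is what moves the low frequencies to the center of the map.

```python
import numpy as np

def frequency_map(image):
    """Magnitude spectrum with low frequencies shifted to the center."""
    return np.abs(np.fft.fftshift(np.fft.fft2(image)))

def high_frequency_share(image, radius=8):
    """Fraction of spectral energy farther than `radius` from the center."""
    energy = frequency_map(image - image.mean()) ** 2   # drop overall brightness
    h, w = image.shape
    y, x = np.ogrid[:h, :w]
    dist = np.hypot(y - h // 2, x - w // 2)
    return energy[dist > radius].sum() / energy.sum()

size = 64
x = np.arange(size)
sky = np.tile(np.linspace(0.4, 0.6, size), (size, 1))       # slow left-to-right change
stripes = np.tile((x // 2) % 2, (size, 1)).astype(float)    # flips every two pixels

sky_share = high_frequency_share(sky)
stripes_share = high_frequency_share(stripes)
print(f"sky: {sky_share:.2f}, stripes: {stripes_share:.2f}")
```

The gradient keeps nearly all of its energy near the center of the map, while the stripes put theirs far from it. The `radius` cutoff is an arbitrary choice for this toy example.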
Now that you know what the Fourier Transform does and what its output looks like, let’s look at where this actually gets used.
Where Fourier Shows Up in the Real World
This idea of breaking things into their individual pieces is one of the most widely used concepts in engineering. Once you see the pattern, you’ll notice it everywhere.
Music and audio. When you adjust the equalizer on Spotify, you’re working with frequencies. Slide the bass up, you’re making the low frequencies louder. Slide the treble down, you’re making the high frequencies quieter. You’re reaching into the smoothie and adjusting individual ingredients. Noise-canceling headphones do something similar: they figure out the frequency of the noise around you, generate the exact opposite wave at that frequency, and the two cancel each other out. Silence.
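A toy version of the equalizer idea, as a sketch only: real equalizers use carefully designed filters, not the blunt cutoff I use here, and the function name and gain values are my own. The "song" is just one bass tone plus one treble tone.

```python
import numpy as np

sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
bass = np.sin(2 * np.pi * 100 * t)       # low, slow vibration
treble = np.sin(2 * np.pi * 2000 * t)    # high, fast vibration
song = bass + treble

def equalize(signal, sample_rate, cutoff_hz, bass_gain, treble_gain):
    """Scale everything below `cutoff_hz` by one gain and everything above by another."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1 / sample_rate)
    gains = np.where(freqs < cutoff_hz, bass_gain, treble_gain)
    return np.fft.irfft(spectrum * gains, n=len(signal))

# Slide the bass up, slide the treble down.
adjusted = equalize(song, sample_rate, cutoff_hz=500, bass_gain=2.0, treble_gain=0.5)
expected = 2.0 * bass + 0.5 * treble
print(np.allclose(adjusted, expected))
```

The adjustment happens entirely on the ingredient list, and the inverse transform blends it back into a sound wave.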
Medical imaging. This one surprised me. MRI (Magnetic Resonance Imaging) machines don’t take pictures the way a camera does. A camera captures light. An MRI uses magnetic fields to capture frequency data from your body. It collects the “recipe” first, then uses the Fourier Transform to build the “smoothie,” which is the image your doctor looks at. The image doesn’t exist until Fourier constructs it. Without this math, there’s no MRI.
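Here is a heavily simplified sketch of the "recipe first, image second" idea. I fake the scanner's output by transforming a made-up rectangle. A real MRI measures this frequency data physically and reconstruction involves far more than one function call. The second half shows what happens if only the low-frequency part of the recipe is kept.

```python
import numpy as np

# Stand-in "body slice": a bright rectangle on a dark background (made up).
body = np.zeros((32, 32))
body[10:22, 12:20] = 1.0

# Pretend scanner output: frequency-domain samples of the slice.
k_space = np.fft.fft2(body)

# Reconstruction: the inverse transform turns frequency data into a picture.
full_image = np.fft.ifft2(k_space).real

# Keep only the low frequencies and reconstruct again.
fy = np.fft.fftfreq(32)[:, None]
fx = np.fft.fftfreq(32)[None, :]
low_only = (np.abs(fy) < 0.1) & (np.abs(fx) < 0.1)
blurry_image = np.fft.ifft2(k_space * low_only).real

full_error = np.abs(full_image - body).max()
blurry_error = np.abs(blurry_image - body).max()
print(full_error, blurry_error)
```

With all the frequency data the picture comes back essentially exact. With only the low frequencies the rectangle is still there, but its sharp edges are smeared.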
Telecommunications. When you’re on a phone call while streaming Spotify and loading a webpage, all of that data travels through the same connection. How? In the simplest version, each signal gets its own band of frequencies, and they all travel together. On the receiving end, Fourier-style frequency separation pulls them back apart. Your phone gets the music, the voice, and the webpage data, all from one combined signal. Same principle as your brain separating bass from treble in a chord.
Image compression. When you take a photo on your phone, the original file is huge. To make it smaller for storage and sharing, your phone saves it as a JPEG. Here’s how that works: it runs a close relative of the Fourier Transform (the discrete cosine transform) on small blocks of the image to separate each block into low-frequency parts (the big shapes and colors) and high-frequency parts (tiny details like individual skin pores or the grain in a wooden table). Then it throws away or coarsens the high-frequency parts that your eyes wouldn’t notice anyway. The photo looks almost the same, but the file might go from 10MB to 500KB. You’re removing ingredients from the smoothie that you wouldn’t taste.
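The sketch below shows the "throw away what you won't miss" idea, with big caveats: real JPEG works on small blocks with a cosine transform and then quantizes the result, while I use a plain FFT over a whole synthetic image and keep only a low-frequency square. The image and the cutoff are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
size = 64
y, x = np.mgrid[:size, :size]

# Made-up "photo": big smooth shapes plus a little fine grain.
photo = (np.sin(2 * np.pi * x / size) + np.cos(2 * np.pi * y / size)
         + 0.05 * rng.standard_normal((size, size)))

coeffs = np.fft.fft2(photo)
fy = np.fft.fftfreq(size)[:, None]
fx = np.fft.fftfreq(size)[None, :]
keep = (np.abs(fy) <= 0.1) & (np.abs(fx) <= 0.1)    # low-frequency coefficients only

compressed = np.fft.ifft2(coeffs * keep).real

kept_fraction = keep.mean()                          # share of coefficients stored
relative_error = np.linalg.norm(photo - compressed) / np.linalg.norm(photo)
print(f"kept {kept_fraction:.1%} of coefficients, error {relative_error:.1%}")
```

Only a few percent of the coefficients survive, yet the rebuilt image stays close to the original because the discarded ones carried only the fine grain.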
Earthquake detection. When the ground shakes, it’s not just one vibration. It’s a mix of different waves at different frequencies, some from the earthquake, some from traffic, construction, ocean waves. Seismologists use Fourier to separate the earthquake signal from all the background noise, the same way you’d separate strawberry from spinach in the smoothie.
Same pattern every time. Take something complex, break it into its pieces, work with them individually.
What Are Deepfakes and Why Should You Care?
A deepfake is media generated or manipulated by AI to show something that never happened. A politician saying words they never said. A CEO on a video call authorizing a wire transfer. A celebrity in content they were never part of.
The technology behind deepfakes has evolved fast. Earlier deepfakes were built using Generative Adversarial Networks, or GANs. A GAN is an AI system designed to generate realistic fake content. It works by having two neural networks compete: one generates fake images, the other tries to spot them. They train together, pushing each other to improve until the fakes are very hard to tell apart from real ones.
More recently, diffusion models have become the dominant approach. Instead of two networks competing, a diffusion model starts with random static (think of a TV with no signal) and learns to gradually remove the noise, step by step, until a clear image appears. This is the technology behind tools like Stable Diffusion, DALL-E, and Midjourney. The technique varies, but the result is the same: AI-generated content that looks real.
The real-world impact is already here.
In 2024, a finance worker in Hong Kong transferred about $25 million after a video call with what appeared to be the company’s CFO and several colleagues. Everyone else on that call was a deepfake.
That same year, a fake robocall impersonating President Biden told New Hampshire voters not to vote in the primary. Deepfake audio and video targeting political leaders has shown up in elections across dozens of countries.
Reported deepfakes went from roughly 500,000 in 2023 to an estimated 8 million in 2025. A 1,500% increase in two years.
And maybe the scariest part isn’t the fakes themselves. Once people know deepfakes exist, they start doubting real content too. “That video of me? That’s a deepfake.” Plausible deniability for everyone.
So how do you catch something that looks perfect to the human eye?
How Fourier Catches Deepfakes
A real photo and a deepfake might look identical. Same face, same lighting, same expression. Just looking at them, you can’t tell the difference.
But run both through the Fourier Transform, and the frequency map tells a different story.
Real Photos Have a Natural Frequency Pattern
When a camera takes a photo, here’s what actually happens. Light from the sun or a lamp hits a surface, say someone’s face. That light bounces off and travels into the camera lens. The lens focuses it onto a sensor. Think of the sensor as a grid of millions of tiny buckets, each one catching light. How much light lands in each bucket becomes one pixel. Millions of pixels together make the photo.
The key difference from AI-generated images is that this whole process is physical. The light is real, the reflections are real, the sensor has tiny random electrical noise. The lens has subtle imperfections. The scene itself has natural randomness: irregular skin texture, uneven hair, tiny wrinkles that aren’t perfectly symmetrical.
All of this produces a frequency map that looks organic. The high-frequency parts, the fine details and textures, have a natural, somewhat random distribution. No repeating patterns, no suspicious regularity. Like the difference between a forest where trees grow wherever they happen to grow, and a tree farm where they’re planted in perfectly spaced rows.
Deepfakes Look Different in the Frequency Map
AI-generated images are built through a different process, and that process leaves fingerprints.
The upsampling problem. Most GANs generate a small image and then scale it up to a much larger size. That scaling step is called upsampling.
Think about what that means. You have a small image and need to make it bigger. You need to fill in pixels that don’t exist yet. The network has to invent values between the ones it already has.
The common way to do this applies a learned pattern across the image at regular intervals. Same pattern, stamped repeatedly, like a rubber stamp pressed in a grid. In the image itself, you can’t see it. The face looks fine.
But in the frequency map, repetition creates spikes. If something repeats every N pixels, it produces a sharp peak at that frequency. Like a drummer hitting a perfectly steady beat. In the music, it blends in. In the frequency map, it’s an unmistakable spike.
Real photos don’t have these repeating spikes. Real scenes don’t repeat at precise mathematical intervals. So when the Fourier Transform shows regular peaks, something artificial is going on.
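Here is a toy demonstration of why repetition shows up as spikes. I use random noise as a stand-in for irregular real texture and add a small pattern tiled every 8 pixels as a stand-in for the rubber stamp. Both are my own simplifications, not output from an actual GAN.

```python
import numpy as np

rng = np.random.default_rng(1)
size, period = 128, 8

natural = rng.standard_normal((size, size))       # irregular stand-in texture
stamp = rng.standard_normal((period, period))     # hypothetical repeated pattern
stamped = natural + 0.5 * np.tile(stamp, (size // period, size // period))

def peakiness(image):
    """Strongest spectral magnitude divided by the typical (median) one."""
    spectrum = np.abs(np.fft.fft2(image - image.mean()))
    return spectrum.max() / np.median(spectrum)

natural_peakiness = peakiness(natural)
stamped_peakiness = peakiness(stamped)

# Where is the strongest spike? A pattern repeating every `period` pixels
# can only land on multiples of size // period in the frequency grid.
spectrum = np.abs(np.fft.fft2(stamped - stamped.mean()))
spike_row, spike_col = np.unravel_index(np.argmax(spectrum), spectrum.shape)
step = size // period
print(natural_peakiness, stamped_peakiness, (spike_row, spike_col), step)
```

The stamped image has a far spikier spectrum, and the strongest spike sits exactly on the grid of frequencies that an 8-pixel repeat can produce.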
The missing texture problem. Real skin has pores. Real hair has strands that catch light differently. Real fabric has a weave with tiny imperfections. All of this shows up as rich, complex detail in the high-frequency part of the map.
GANs are good at large-scale features like face shape and eye color. But fine-grained texture is often too smooth, too uniform. The high-frequency region comes out suspiciously clean.
Like a counterfeit bill. The design looks right, but under a magnifying glass, the micro-printing is blurry. The Fourier Transform is that magnifying glass.
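A rough way to put a number on "too smooth" is to measure how much of an image's energy sits in the high-frequency part of the map. In this sketch, random noise stands in for real fine texture and a simple 3x3 blur stands in for an over-smooth generated image. The cutoff value and function name are arbitrary choices of mine.

```python
import numpy as np

rng = np.random.default_rng(2)
size = 64
textured = rng.standard_normal((size, size))      # stand-in for pores, hair, weave

# Crude 3x3 box blur built from shifted copies of the image.
smooth = sum(np.roll(textured, (dy, dx), axis=(0, 1))
             for dy in (-1, 0, 1) for dx in (-1, 0, 1)) / 9

def high_band_energy(image, cutoff=0.25):
    """Share of spectral energy above `cutoff` cycles per pixel."""
    energy = np.abs(np.fft.fft2(image - image.mean())) ** 2
    fy = np.fft.fftfreq(image.shape[0])[:, None]
    fx = np.fft.fftfreq(image.shape[1])[None, :]
    high = np.hypot(fy, fx) > cutoff
    return energy[high].sum() / energy.sum()

textured_share = high_band_energy(textured)
smooth_share = high_band_energy(smooth)
print(f"textured: {textured_share:.2f}, smooth: {smooth_share:.2f}")
```

The blurred image keeps far less of its energy in the high band, which is the "suspiciously clean" signature described above.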
The checkerboard pattern. That rubber stamp from above? When it overlaps with itself slightly on each pass, it creates a faint grid pattern across the image, like a checkerboard. You can’t see it by looking at the photo. But in the frequency map, it lights up clearly.
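The checkerboard effect can be imitated in a few lines. This is a simplified sketch: I scale a small made-up image up by 2 with `np.kron`, which stamps one 2x2 kernel per pixel, and compare a perfectly even kernel with a slightly uneven one. The uneven kernel stands in for the imbalance that overlapping stamps create, and its values are invented.

```python
import numpy as np

rng = np.random.default_rng(3)
small = 0.5 + 0.1 * rng.random((32, 32))          # hypothetical low-resolution output

even_kernel = np.ones((2, 2))                      # every new pixel treated alike
uneven_kernel = np.array([[1.00, 0.95],
                          [0.95, 1.00]])           # slight imbalance between neighbors

def upsample(image, kernel):
    """Stride-2 'stamp' upsampling: each pixel becomes a scaled copy of the kernel."""
    return np.kron(image, kernel)

def corner_strength(image):
    """Energy at the fastest possible alternation, relative to overall brightness."""
    spectrum = np.abs(np.fft.fft2(image))
    h, w = image.shape
    return spectrum[h // 2, w // 2] / spectrum[0, 0]

even_strength = corner_strength(upsample(small, even_kernel))
uneven_strength = corner_strength(upsample(small, uneven_kernel))
print(even_strength, uneven_strength)
```

A 5% imbalance is hard to see in the pixels, but it puts a clear spot at the pixel-by-pixel alternation frequency, which appears at the corners of a centered frequency map. The even kernel leaves that spot empty.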
What about newer techniques? GANs aren’t the only way to make deepfakes anymore. Diffusion-based tools like Stable Diffusion and DALL-E take a different approach. Instead of building a small image and scaling it up, they start with random static (like a TV with no signal) and clean it up step by step until a face appears. Different process, so the rubber stamp pattern is much weaker or absent. But the cleanup isn’t perfect either. It leaves its own traces in the frequency map, just subtler ones, and frequency analysis can often still pick them up.
How Detection Works
The pipeline is straightforward:
- Take the suspect image.
- Run the Fourier Transform on it. This converts the image into its frequency map.
- Look for signs of AI generation: repeating peaks, suspiciously clean high-frequency regions, checkerboard patterns.
- A trained AI examines the map and makes the call: real or fake.
Research from 2024 reported this approach reaching over 99% accuracy on GAN-generated faces in its test setting. The frequency map makes the invisible visible.
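The four steps above can be sketched end to end on toy data. To be clear about the assumptions: the "real" and "fake" images are synthetic noise (the fakes are blurred to mimic missing texture), the feature is a ring-averaged spectrum, and the "trained AI" is replaced by a nearest-centroid rule. Published detectors use real datasets and neural networks, and the function names are my own.

```python
import numpy as np

rng = np.random.default_rng(4)

def radial_profile(image, bins=16):
    """Average log-spectrum in rings around the center of the frequency map."""
    spectrum = np.log1p(np.abs(np.fft.fftshift(np.fft.fft2(image - image.mean()))))
    h, w = image.shape
    y, x = np.ogrid[:h, :w]
    radius = np.hypot(y - h // 2, x - w // 2)
    ring = np.minimum((radius / radius.max() * bins).astype(int), bins - 1)
    sums = np.bincount(ring.ravel(), weights=spectrum.ravel(), minlength=bins)
    counts = np.bincount(ring.ravel(), minlength=bins)
    return sums / counts

def make_real(size=32):
    """Toy 'camera' image: irregular texture at every scale."""
    return rng.standard_normal((size, size))

def make_fake(size=32):
    """Toy 'generated' image: same texture, blurred to remove fine detail."""
    img = rng.standard_normal((size, size))
    return (img + np.roll(img, 1, 0) + np.roll(img, 1, 1)
            + np.roll(img, (1, 1), (0, 1))) / 4

# "Training": average frequency profile of each class.
centroids = {
    "real": np.mean([radial_profile(make_real()) for _ in range(20)], axis=0),
    "fake": np.mean([radial_profile(make_fake()) for _ in range(20)], axis=0),
}

def classify(image):
    """Label an image by whichever class profile its own profile is closest to."""
    feature = radial_profile(image)
    return min(centroids, key=lambda label: np.linalg.norm(feature - centroids[label]))

test_set = ([(make_real(), "real") for _ in range(10)]
            + [(make_fake(), "fake") for _ in range(10)])
accuracy = np.mean([classify(img) == label for img, label in test_set])
print(accuracy)
```

On data this clean the toy classifier separates the two classes easily. That says nothing about real-world accuracy. It only shows the shape of the pipeline: image in, frequency features out, decision at the end.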
What About Video? Fourier in Three Dimensions
Everything above applies to a single image. But deepfakes are usually videos. And video gives Fourier one more thing to work with: time.
A video is just a sequence of images played back fast enough to look like motion, typically 24 or 30 per second. For a real person on camera, things change smoothly between frames. Your head turns gradually, your expression shifts naturally, even a quick blink follows a smooth curve. That’s because real movement follows physics.
Deepfakes don’t simulate physics. The AI predicts what each frame should look like based on patterns it learned during training, and those predictions aren’t always perfectly consistent. A single hair might shift by one pixel between frames. A skin pore might appear, disappear, and reappear. The boundary between the face and the background might wobble slightly. You won’t notice any of this. Your brain smooths it out, the same way it ignores minor video glitches.
But Fourier sees it. For a single image, Fourier analyzes how things change across the width and height. For video, it adds time as a third dimension. Now it can measure how fast things change from one frame to the next.
Real faces change slowly between frames, like a smooth wave. Deepfake faces have rapid, erratic changes between frames, like static. Fourier picks up that static as noise that shouldn’t be there.
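Here is a small sketch of the time dimension. Both clips are synthetic: the "real" one is a random image whose brightness drifts slowly, and the "fake" one is the same clip with a little random jitter added to every frame. I only transform along the time axis here. `np.fft.fftn` would handle all three dimensions at once. The 5 Hz cutoff is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(5)
fps, seconds, size = 30, 2, 16
t = np.arange(fps * seconds) / fps

# Made-up "real" clip: brightness drifts slowly over time (0.5 Hz).
base = rng.random((size, size))
real_clip = base[None] * (1 + 0.2 * np.sin(2 * np.pi * 0.5 * t))[:, None, None]

# Made-up "fake" clip: the same clip plus frame-to-frame jitter.
fake_clip = real_clip + 0.05 * rng.standard_normal(real_clip.shape)

def flicker_share(clip, fps, cutoff_hz=5.0):
    """Share of over-time energy above `cutoff_hz`, summed over all pixels."""
    centered = clip - clip.mean(axis=0)
    energy = np.abs(np.fft.rfft(centered, axis=0)) ** 2   # FFT along the time axis
    freqs = np.fft.rfftfreq(clip.shape[0], d=1 / fps)
    return energy[freqs > cutoff_hz].sum() / energy.sum()

real_flicker = flicker_share(real_clip, fps)
fake_flicker = flicker_share(fake_clip, fps)
print(real_flicker, fake_flicker)
```

The smooth clip has essentially no fast temporal energy, while the jittery one has a measurable share, even though the jitter is small compared with the image itself.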
The Heartbeat Test
Your heart beats roughly once per second, somewhere between 60 and 100 beats per minute at rest for most adults. Every time it beats, blood rushes to your face and changes your skin color by a tiny amount. You can’t see it. Nobody can. But it’s there.
Researchers discovered that if you record someone’s face on video and track the average color of their skin over time, you get a faint rhythmic pattern that matches the heartbeat. Run a Fourier Transform on that signal, and you see a clear peak at the person’s heart rate, around once per second.
Deepfakes don’t have a heartbeat. The AI copies the appearance of skin but doesn’t simulate blood flow underneath it. So when you run the same analysis on a deepfake video, the heartbeat peak is either missing entirely or replaced by random noise.
The Fourier Transform turns the skin-color signal from a video of a face into a frequency spectrum, and then the detector checks: is there a heartbeat in there? If yes, probably real. If no, probably fake.
It’s not about looking for a glitch you can see. It’s about listening for a pulse that should be there.
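A sketch of the heartbeat check on simulated data. I skip the hard parts (finding the face, tracking skin pixels, dealing with lighting and motion) and start from a made-up "average skin color per frame" signal: a faint 1.2 Hz pulse buried in noise for the real face, noise alone for the fake. The band limits and function name are my own assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)
fps, seconds = 30, 20
t = np.arange(fps * seconds) / fps
heart_rate_hz = 1.2                        # 72 beats per minute (made up)

# Simulated average skin color per frame.
real_face = (0.02 * np.sin(2 * np.pi * heart_rate_hz * t)
             + 0.02 * rng.standard_normal(t.size))
fake_face = 0.02 * rng.standard_normal(t.size)

def pulse_peak(signal, fps, low_hz=0.7, high_hz=3.0):
    """Strongest frequency in the heart-rate band and how far it stands out."""
    spectrum = np.abs(np.fft.rfft(signal - signal.mean()))
    freqs = np.fft.rfftfreq(signal.size, d=1 / fps)
    band = (freqs >= low_hz) & (freqs <= high_hz)
    peak = np.argmax(spectrum[band])
    return freqs[band][peak], spectrum[band][peak] / np.median(spectrum[band])

real_freq, real_ratio = pulse_peak(real_face, fps)
fake_freq, fake_ratio = pulse_peak(fake_face, fps)
print(f"real: {60 * real_freq:.0f} bpm, peak ratio {real_ratio:.1f}")
print(f"fake: peak ratio {fake_ratio:.1f}")
```

In the simulated real signal the peak lands on the pulse frequency and towers over the rest of the band. In the noise-only signal no frequency stands out. A practical detector would need a threshold tuned on real footage.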
Other Ways to Catch Deepfakes
Fourier isn’t the only way to detect deepfakes. Some detectors skip the frequency map entirely and look at the image directly for visual problems: blending around the edges of the face, lighting that doesn’t match, skin texture that changes between the face and neck. Others step back and check the big picture: do the reflections in the eyes match the room? Is the shadow direction consistent across the whole image? The best detectors today combine all of these, frequency analysis, visual checks, and big-picture consistency, and run them at the same time. A deepfake might fool one check, but fooling all of them at once is much harder.
The Cat and Mouse Game
This isn’t a solved problem. As detectors get better, the AI that generates deepfakes gets better too. Early deepfakes left obvious traces in the frequency map. Newer ones have learned to reduce them.
Researchers keep adapting. They train detectors to not rely on just one type of flaw, so they can catch fakes from AI models they’ve never seen before. They also deliberately try to fool their own detectors during training to make them tougher.
Nobody expects a perfect detector. The goal is to stay one step ahead.
Stepping Back
Joseph Fourier published this idea in 1822 while studying how heat moves through metal. He couldn’t have imagined it would help catch AI-generated fake videos of world leaders two centuries later.
That’s what I find interesting about revisiting these fundamentals. The concepts don’t expire. They just keep finding new uses. Fourier’s core insight, that any complex signal is really just simple waves combined, works as well today as it did two hundred years ago.
References
- Discrete Fourier Transform in Unmasking Deepfake Images - MDPI, 2024.
- Frequency-Aware Deepfake Detection: Improving Generalizability through Frequency Space Domain Learning - AAAI 2024.
- A Deepfake Detection Algorithm Based on Fourier Transform of Biological Signal - ResearchGate, 2024.
- A Review of Deep Learning-based Approaches for Deepfake Content Detection - arXiv, 2024.
- The Danger of Deepfakes to Democracy - Brennan Center, 2024.
- 2024 Deepfakes and Election Disinformation Report - Recorded Future.
- Deepfakes are here to stay and we should remain vigilant - World Economic Forum, 2025.
- AI Fake News 2025: Cases and Detection - Fake Off, 2025.
- An Introduction to the Fourier Transform: Relationship to MRI - American Journal of Roentgenology.