
digitalmars.D.learn - result of FFT

reply Matthew <monkeytonk115 gmail.com> writes:
Hi,

I'm writing a program where I'm trying to decode DTMF tones. I 
already completed the wave file decoder and I know I'm supposed 
to use an FFT to transform the samples from time domain to 
frequency domain but I'm stuck on determining which of the DTMF 
frequencies are present.

Consider the following, the input is the sound wave samples as an 
array of floats between -1 and +1.

```D
void decode_sound(float[] samples)
{
	import std.numeric;
	import std.math;
	import std.complex;
	import std.algorithm;
	import std.range;

	size_t fft_size = 4096;
	
	auto f = new Fft(fft_size);

	foreach(ch; samples.chunks(fft_size))
	{
		auto res = f.fft(ch);
		// res is an array of 4096 complex numbers
	}
}
```

I can't figure out how the 4096 results of the FFT relate to the 
frequencies in the input.

I tried taking the magnitude of each element, I tried taking just 
the real or imaginary parts.  I plotted them but the graph 
doesn't look how I'm expecting.

What do the 4096 resulting complex numbers represent?
How should I use the result to check whether the 1209Hz, 1336Hz, 
1477Hz, or 1633Hz tones are present in that part of the sound?

Thanks,
Matthew
Jul 08
next sibling parent reply Dennis <dkorpel gmail.com> writes:
On Tuesday, 8 July 2025 at 18:11:27 UTC, Matthew wrote:
 I can't figure out how the 4096 results of the FFT relate to 
 the frequencies in the input.

 I tried taking the magnitude of each element,
That's correct!
 What do the 4096 resulting complex numbers represent?
The magnitude of each element (computed with `std.complex.abs`) corresponds to the amplitude of each frequency component; the angle in the complex plane represents the phase (computed with `std.complex.arg`, in radians).

The frequencies are all relative to the FFT window. `res[0]` is 0 Hz, `res[1]` corresponds to a sine wave that fits 1 cycle inside your window, `res[2]` is 2 cycles, etc. The frequency in Hz depends on your sample rate. If it's 44100, then 44100 / 4096 ≈ 10.8, so your window fits about 10.8 times in 1 second. That means `res[1]` is around 10.8 Hz, `res[2]` around 21.5 Hz, etc., up to `res[4095]` just below 44100 Hz. Although everything from `res[2048]` onwards is just a mirrored copy, since 44100 samples/s can only capture frequencies up to 22 kHz (for more info, search 'Nyquist frequency' and 'aliasing').
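To make that mapping concrete, here is a small sketch (the helper names are mine, not from the thread) that converts between bin index and frequency under these assumptions (44100 Hz sample rate, 4096-point window):

```d
import std.math : round;

enum sampleRate = 44100.0;
enum fftSize = 4096;

// Frequency in Hz at the center of bin `idx` of the FFT output.
double binToFrequency(size_t idx)
{
    return idx * sampleRate / fftSize;
}

// Nearest bin index for a target frequency in Hz.
size_t frequencyToBin(double hz)
{
    return cast(size_t) round(hz * fftSize / sampleRate);
}
```

For example, `frequencyToBin(1209)` lands on bin 112, matching the calculation below.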
 How should I use the result to check whether the 1209Hz, 
 1336Hz, 1477Hz, or 1633Hz tones are present in that part of the 
 sound?
The closest bucket to 1209 Hz is 1209 * (4096 / 44100) = 112.3, which is not an exact match, so it will leak into all the other bins; but it will still mostly contribute to bins 112 and 113, so it's probably good enough to just check those.

If you need better frequency resolution, you can try applying a 'window function' to reduce spectral leakage, or increasing the window size, either by including more samples (reducing the time resolution) or by padding the window with 0's, which essentially adds interpolated bins. I haven't programmed pitch detection myself yet (still on my todo list!) so I don't know how much of this is needed for your use case, but you can just start by checking the closest bin and see how far you get. Good luck!
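As a concrete illustration of the window-function suggestion, here is a Hann window sketch (the helper names are mine; Phobos has no built-in window functions):

```d
import std.math : cos, PI;

// Hann window coefficients for an n-sample window (assumes n > 1).
double[] hannWindow(size_t n)
{
    auto w = new double[n];
    foreach (i, ref x; w)
        x = 0.5 * (1.0 - cos(2.0 * PI * i / (n - 1)));
    return w;
}

// Multiply a chunk by the window in place before handing it to Fft.fft.
void applyWindow(float[] chunk, const double[] w)
{
    foreach (i, ref s; chunk)
        s = cast(float)(s * w[i]);
}
```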
Jul 08
parent reply Matthew <monkeytonk115 gmail.com> writes:
On Tuesday, 8 July 2025 at 19:39:37 UTC, Dennis wrote:

 The magnitude of each element (computed with `std.complex.abs`) 
 corresponds to the amplitude of each frequency component, the 
 angle in the complex plane represents the phase (computed with 
 `std.complex.arg` in radians).
This is what I picked up from Wikipedia.
 The frequencies are all relative to the FFT window. `res[0]` is 
 0 Hz, `res[1]` corresponds to a sine wave that fits 1 cycle 
 inside your window, `res[2]` is 2 cycles, etc. The frequency in 
 Hz depends on your sample rate. If it's 44100, then 44100 / 4096 
 ≈ 10.8, so your window fits about 10.8 times in 1 second. That 
 means `res[1]` is around 10.8 Hz, `res[2]` around 21.5 Hz, etc., 
 up to `res[4095]` just below 44100 Hz. Although everything from 
 `res[2048]` onwards is just a mirrored copy, since 44100 
 samples/s can only capture frequencies up to 22 kHz (for more 
 info, search 'Nyquist frequency' and 'aliasing').
This was the key bit of information I didn't know.
 The closest bucket to 1209Hz is 1209 * (4096 / 44100) = 112.3, 
 which is not an exact match so it will leak frequencies in all 
 other bins, but it will still mostly contribute to bins 112 and 
 113 so it's probably good enough to just check those.
Checking the closest bins to each frequency seems to work well, at least with clean sound from a wav file. It remains to be seen how it fares against noisy real life signals. I'll likely need a window function or interpolation, but this is a good start.

```D
void decode_sound(float[] samples)
{
	import std.numeric;
	import std.math;
	import std.complex;
	import std.algorithm;
	import std.array;
	import std.range;

	size_t fft_size = 4096;

	auto f = new Fft(fft_size);

	foreach(ch; samples.chunks(fft_size))
	{
		auto res = f.fft(ch);
		double[] magnitudes = res.map!(x => x.abs).array;
		auto high = [magnitudes[112], magnitudes[124], magnitudes[137], magnitudes[151]].maxIndex;
		auto low = [magnitudes[65], magnitudes[72], magnitudes[78], magnitudes[87]].maxIndex;
		// high and low are 0,1,2,3 depending on which DTMF tone pair was detected
	}
}
```

Thanks,
Matthew
Jul 09
parent Andy Valencia <dont spam.me> writes:
On Wednesday, 9 July 2025 at 19:39:43 UTC, Matthew wrote:

 Checking the closest bins to each frequency seems to work well, 
 at least with clean sound from a wav file. It remains to be 
 seen how it fares against noisy real life signals. I'll likely 
 need a window function or interpolation but this is a good 
 start.
Many moons ago I had to fix Asterisk's DTMF detection/decode. Ultimately fixes (but not my code) made it into their source repo, and if you need guidance on a very mature DTMF treatment, main/dsp.c in the Asterisk source would be a solid resource. Andy
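For reference, the core technique in that code (and, as far as I know, in most production DTMF decoders) is the Goertzel algorithm: it measures the energy at a single known frequency without computing a full FFT, which suits DTMF's eight fixed tones well. A minimal sketch (identifiers are mine, not Asterisk's):

```d
import std.math : cos, PI;

// Goertzel: power of `samples` at `targetHz`, via the standard
// two-pole recurrence s0 = x + k*s1 - s2.
double goertzelPower(const float[] samples, double targetHz, double sampleRate)
{
    immutable k = 2.0 * cos(2.0 * PI * targetHz / sampleRate);
    double s1 = 0.0, s2 = 0.0;
    foreach (x; samples)
    {
        immutable s0 = x + k * s1 - s2;
        s2 = s1;
        s1 = s0;
    }
    return s1 * s1 + s2 * s2 - k * s1 * s2;
}
```

Running it once per candidate frequency on each chunk replaces the FFT-and-pick-bins step entirely.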
Jul 09
prev sibling next sibling parent reply Serg Gini <kornburn yandex.ru> writes:
On Tuesday, 8 July 2025 at 18:11:27 UTC, Matthew wrote:
 Hi,
 What do the 4096 resulting complex numbers represent?
 How should I use the result to check whether the 1209Hz, 
 1336Hz, 1477Hz, or 1633Hz tones are present in that part of the 
 sound?

 Thanks,
 Matthew
The result of FFT should be the same as in NumPy[1], I suppose. D's module doesn't have the functionality that you want, though, so you need to write it yourself.

ChatGPT showed this Python code, which you can port to D:

```python
n = len(data)
fft_result = np.fft.fft(data)
frequencies = np.fft.fftfreq(n, d=1/fs)
magnitude = np.abs(fft_result)

dtmf_freqs = [697, 770, 852, 941, 1209, 1336, 1477]
detected_freqs = []
for target in dtmf_freqs:
    idx = np.where((frequencies > target - resolution) & (frequencies < target + resolution))
    if magnitude[idx].max() > threshold:  # adjust threshold if needed
        detected_freqs.append(target)
print("Detected DTMF frequencies:", detected_freqs)
```

There is no command for `fftfreq` in D, but you can write your own implementation based on the NumPy documentation[2].

You may also be interested in some other packages and examples:
- audio-formats[3] for reading and decoding different formats
- numir[4], a package with NumPy-like functionality (linked: an audio example)

From my perspective, solving it in NumPy would be the safer approach, but it should be doable in D as well.

References:
[1] https://numpy.org/doc/2.1/reference/generated/numpy.fft.fft.html
[2] https://numpy.org/doc/2.1/reference/generated/numpy.fft.fftfreq.html
[3] https://github.com/AuburnSounds/audio-formats
[4] https://github.com/libmir/numir/blob/master/example/audio_separation/source/app.d
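A D sketch of such an `fftfreq` implementation, following the ordering described in the NumPy documentation (non-negative bins first, then the negative-frequency half; the name mirrors NumPy's, nothing like it exists in Phobos):

```d
// Sample frequencies for an n-point FFT with sample spacing d
// (seconds per sample), mirroring numpy.fft.fftfreq.
double[] fftfreq(size_t n, double d = 1.0)
{
    auto freqs = new double[n];
    immutable half = (n + 1) / 2; // number of non-negative bins
    foreach (i; 0 .. half)
        freqs[i] = i / (d * n);
    foreach (i; half .. n)
        freqs[i] = -(cast(double)(n - i)) / (d * n);
    return freqs;
}
```

With `d = 1.0 / 44100` and `n = 4096`, `fftfreq` gives the center frequency in Hz of each bin of the earlier `Fft` output.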
Jul 08
parent reply Serg Gini <kornburn yandex.ru> writes:
On Tuesday, 8 July 2025 at 19:59:39 UTC, Serg Gini wrote:
 On Tuesday, 8 July 2025 at 18:11:27 UTC, Matthew wrote:
 From my perspective - solve it in NumPy will be safer approach, 
 but it should be doable in D as well.
Or even better - create D library for this :)
It seems (after fast googling) there are plenty of examples on the Web:
https://github.com/gdereese/dtmf
https://github.com/PlasterPate/DTMF_Decoder
https://github.com/hfeeki/dtmf
Jul 08
parent Ferhat =?UTF-8?B?S3VydHVsbXXFnw==?= <aferust gmail.com> writes:
On Tuesday, 8 July 2025 at 21:10:35 UTC, Serg Gini wrote:
 On Tuesday, 8 July 2025 at 19:59:39 UTC, Serg Gini wrote:
 On Tuesday, 8 July 2025 at 18:11:27 UTC, Matthew wrote:
 From my perspective - solve it in NumPy will be safer 
 approach, but it should be doable in D as well.
 Or even better - create D library for this :)
 It seems (after fast googling) there are plenty of examples on the Web:
 https://github.com/gdereese/dtmf
 https://github.com/PlasterPate/DTMF_Decoder
 https://github.com/hfeeki/dtmf
I believe dplug exists for this kind of thing.

https://github.com/AuburnSounds/Dplug
https://github.com/AuburnSounds/Dplug/tree/0e96b27db15bd0a3d6c69e4d5827175337310225/fft/dplug/fft
Jul 09
prev sibling next sibling parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 7/8/25 20:11, Matthew wrote:
 Hi,
 
 I'm writing a program where I'm trying to decode DTMF tones. I already 
 completed the wave file decoder and I know I'm supposed to use an FFT to 
 transform the samples from time domain to frequency domain but I'm stuck 
 on determining which of the DTMF frequencies are present.
 
 Consider the following, the input is the sound wave samples as an array 
 of floats between -1 and +1.
 
 ```D
 void decode_sound(float[] samples)
 {
      import std.numeric;
      import std.math;
      import std.complex;
      import std.algorithm;
 
      size_t fft_size = 4096;
 
      auto f = new Fft(fft_size);
 
      foreach(ch; samples.chunks(fft_size))
      {
          auto res = f.fft(ch);
          // res is an array of 4096 complex numbers
      }
 }
 ```
 
 I can't figure out how the 4096 results of the FFT relate to the 
 frequencies in the input.
 
 I tried taking the magnitude of each element, I tried taking just the 
 real or imaginary parts.  I plotted them but the graph doesn't look how 
 I'm expecting.
 
 What do the 4096 resulting complex numbers represent?
 How should I use the result to check whether the 1209Hz, 1336Hz, 1477Hz, 
 or 1633Hz tones are present in that part of the sound?
 
 Thanks,
 Matthew
Well, with `N=fft_size`, `fft` computes

```
output[j] = ∑ₖ e^(-2πijk/N)·input[k],
```

and we know that `ifft` gives:

```
input[k] = ∑ⱼ e^(2πijk/N)·output[j]/N.
```

You are trying to write your input as a linear combination of sine waves, so we interpret `k` as time and `j/N` as frequency:

```
input[k] = ∑ⱼ e^(2πi·k·j/N)·output[j]/N.
```

Or, with `f_j[k] = e^(2πi·k·j/N)`:

```
input[k] = ∑ⱼ (output[j]/N)·f_j[k].
```

One complication here is that both `f_j` and `output/N` are complex-valued. Note the relationship:

```
cos(ω·t+φ) = ½·(e^(i(ωt+φ)) + e^(-i(ωt+φ)))
           = ½·(e^(iωt)·e^(iφ) + e^(-iωt)·e^(-iφ))
```

In our setup, e^(iωt) corresponds to `f_j`, e^(-iωt) corresponds to `f_(N-j)`, and φ corresponds to the phase component of our outputs. Interestingly, as your `input` signal is real, we actually know that `output[j]` and `output[N-j]` must be conjugate to each other, as this is the only way to cancel out the imaginary parts. This also implies that `output[N/2]` is real, which works out because `f_(N/2)[k]=(-1)^k`.

Putting it in practice:

```d
import std;
alias K=double;
enum sample_rate=44100;

auto fToWave(R)(size_t N,R coefficients_f){
    return iota(N/2+1).map!(j=>
        tuple!("magnitude","frequency","phase")(
            (j==N/2?1.0:2.0)*abs(coefficients_f[j]).re,
            K(j)*sample_rate/N,
            std.complex.log(coefficients_f[j]).im)
        );
}

void main(){
    enum N=4096;
    auto input=wave(N,1336.0f);
    auto output=fft(input);
    auto coefficients_f=output.map!(o=>o/N);
    //auto input2=linearCombination(N,coefficients_f,f(N)); // (slow, just for illustration)
    //assert(isClose(input,input2,1e-7,1e-7));
    auto coefficients_w=fToWave(N,coefficients_f);
    //auto input3=reconstructWave(N,coefficients_w); // (slow, just for illustration)
    //assert(isClose(input,input3,1e-3,1e-3));
    auto sorted=coefficients_w.array.sort.retro;
    foreach(magnitude,frequency,phase;sorted.until!(coeff=>coeff[0]<1e-2)){
        writefln!"%s·cos(2π·%s·t%s%s)"(magnitude,frequency,text(phase).startsWith("-")?"":"+",phase);
    }
}

// helper functions for illustration
auto time(size_t N)=>iota(N).map!(i=>K(i)/sample_rate);
auto wave(size_t N,K frequency,K phase=0.0f){
    K angularVelocity=2*PI*frequency;
    return time(N).map!(t=>cos(angularVelocity*t+phase));
}

/+
// helper functions for slow illustrations:
enum I=Complex!K(0,1);
auto f(size_t N){
    return iota(N).map!(j=>iota(N).map!(k=>exp(2*PI*I*j*k/N)));
}
auto linearCombination(R,RoR)(size_t N,R coefficients,RoR functions){
    // compute ∑ⱼcoefficients[j]·functions[j]
    return iota(N).map!(k=>sum(iota(coefficients.length).map!(j=>coefficients[j]*functions[j][k])));
}
auto reconstructWave(R)(size_t N,R coefficients_w){
    auto coefficients=coefficients_w.map!(coeff=>coeff.magnitude);
    auto functions=coefficients_w.map!(coeff=>wave(N,coeff.frequency,coeff.phase));
    return linearCombination(N,coefficients,functions);
}
+/
```

(Note that additional processing may be helpful, as `fft` is based on the assumption that signals repeat with a period of `N`.)
Jul 08
parent Timon Gehr <timon.gehr gmx.ch> writes:
On 7/8/25 23:06, Timon Gehr wrote:
 
 auto fToWave(R)(size_t N,R coefficients_f){
      return iota(N/2+1).map!(j=>
          tuple!("magnitude","frequency","phase")(
              (j==N/2?1.0:2.0)*abs(coefficients_f[j]).re,
              K(j)*sample_rate/N,
              std.complex.log(coefficients_f[j]).im)
          );
 }
As `f_0` is also real, similar to `f_(N/2)`, this should of course have been:

```d
auto fToWave(R)(size_t N,R coefficients_f){
    return iota(N/2+1).map!(j=>
        tuple!("magnitude","frequency","phase")(
            (j==0||j==N/2?1.0:2.0)*abs(coefficients_f[j]).re,
            K(j)*sample_rate/N,
            std.complex.log(coefficients_f[j]).im)
        );
}
```
Jul 09
prev sibling next sibling parent Ferhat =?UTF-8?B?S3VydHVsbXXFnw==?= <aferust gmail.com> writes:
On Tuesday, 8 July 2025 at 18:11:27 UTC, Matthew wrote:
 Hi,

 I'm writing a program where I'm trying to decode DTMF tones. I 
 already completed the wave file decoder and I know I'm supposed 
 to use an FFT to transform the samples from time domain to 
 frequency domain but I'm stuck on determining which of the DTMF 
 frequencies are present.

 Consider the following, the input is the sound wave samples as 
 an array of floats between -1 and +1.

 ```D
 void decode_sound(float[] samples)
 {
 	import std.numeric;
 	import std.math;
 	import std.complex;
 	import std.algorithm;

 	size_t fft_size = 4096;
 	
 	auto f = new Fft(fft_size);

 	foreach(ch; samples.chunks(fft_size))
 	{
 		auto res = f.fft(ch);
 		// res is an array of 4096 complex numbers
 	}
 }
 ```

 I can't figure out how the 4096 results of the FFT relate to 
 the frequencies in the input.

 I tried taking the magnitude of each element, I tried taking 
 just the real or imaginary parts.  I plotted them but the graph 
 doesn't look how I'm expecting.

 What do the 4096 resulting complex numbers represent?
 How should I use the result to check whether the 1209Hz, 
 1336Hz, 1477Hz, or 1633Hz tones are present in that part of the 
 sound?

 Thanks,
 Matthew
I have not been doing DSP stuff since [2018]. I am not sure how DTMF is similar to a regular sound signal, but I remember a few things from sound signals. The FFT result is symmetrical, so you probably only need to take the first 2048 of the 4096. A spectrogram[1] would be useful to visualize the components[2]. If this is not the issue, sorry, and please ignore this.

1: https://en.wikipedia.org/wiki/Spectrogram
2: https://medium.com/adler7210/manually-decoding-dtmf-through-spectrogram-562e4b0b99c3
2018: Kurtulmuş, F., Öztüfekçi, S., & Kavdır, İ. (2018). Classification of chestnuts according to moisture levels using impact sound analysis and machine learning. Journal of Food Measurement and Characterization, 12(4), 2819-2834.
Jul 09
prev sibling next sibling parent claptrap <clap trap.com> writes:
On Tuesday, 8 July 2025 at 18:11:27 UTC, Matthew wrote:
 Hi,

 I'm writing a program where I'm trying to decode DTMF tones. I 
 already completed the wave file decoder and I know I'm supposed 
 to use an FFT to transform the samples from time domain to 
 frequency domain but I'm stuck on determining which of the DTMF 
 frequencies are present.

 Consider the following, the input is the sound wave samples as 
 an array of floats between -1 and +1.

 ```D
 void decode_sound(float[] samples)
 {
 	import std.numeric;
 	import std.math;
 	import std.complex;
 	import std.algorithm;

 	size_t fft_size = 4096;
 	
 	auto f = new Fft(fft_size);

 	foreach(ch; samples.chunks(fft_size))
 	{
 		auto res = f.fft(ch);
 		// res is an array of 4096 complex numbers
 	}
 }
 ```

 I can't figure out how the 4096 results of the FFT relate to 
 the frequencies in the input.

 I tried taking the magnitude of each element, I tried taking 
 just the real or imaginary parts.  I plotted them but the graph 
 doesn't look how I'm expecting.

 What do the 4096 resulting complex numbers represent?
 How should I use the result to check whether the 1209Hz, 
 1336Hz, 1477Hz, or 1633Hz tones are present in that part of the 
 sound?

 Thanks,
 Matthew
The output is an array of "bins". Each bin holds a complex number: the magnitude of the complex number is the magnitude of that bin, and the phase of the complex number is the phase of that bin. The frequency the bin represents is

    (samplerate/2) * idx/res.length

where idx is the bin index. I.e. the first bin = 0 Hz, and the last bin is samplerate/2, and they are linearly spaced across that range. This is for a "real FFT", rather than a "complex FFT"; looks like that's what you're doing, tbh. With a complex FFT you get negative frequencies too, I think, but I never really got my head around that.

Note that FFTs aren't perfect: unless the frequencies line up exactly on the bin frequencies, you get leakage into neighbouring bins. A single sine wave between two bins will show up as a peak at those two bins, with leakage tailing off into some of the bins above and below. You can adjust the distribution of this leakage by applying a window function to the sample before you FFT it, but it may or may not matter in your case, if the tones you're looking for are not too close to each other. You might need to average a few bins around the frequency of interest for best results.

I'd probably just make a few test samples, each with a single sine wave, and see what the FFT bins look like, and adjust my detection algo based on that. Tweaking FFT length and window function will give you enough separation between the tones you're interested in.
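The test-sample idea above can be sketched like this (the name and the 0.5 amplitudes are my own choices; a full DTMF digit is the sum of one low-group and one high-group tone, and passing the same frequency twice gives a single sine wave for the per-tone experiments suggested):

```d
import std.math : sin, PI;

// Synthesize n samples of a two-tone test signal at the given sample rate.
float[] dtmfTone(double lowHz, double highHz, size_t n, double sampleRate = 44100)
{
    auto s = new float[n];
    foreach (i, ref x; s)
        x = cast(float)(0.5 * sin(2 * PI * lowHz * i / sampleRate)
                      + 0.5 * sin(2 * PI * highHz * i / sampleRate));
    return s;
}
```

Feeding `dtmfTone(770, 1336, 4096)` (digit '5') into the decoder should light up bins near 72 and 124.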
Jul 10
prev sibling parent Guillaume Piolat <first.nam_e gmail.com> writes:
On Tuesday, 8 July 2025 at 18:11:27 UTC, Matthew wrote:
 What do the 4096 resulting complex numbers represent
Bin 0 is energy at 0 Hz.
Bins 1 to 2047 are energy at (bin * samplingRate / 4096) Hz.
Bin 2048 is energy at the Nyquist frequency.
Bins 2049 to 4095 are the energy for negative frequencies, the conjugates of bins 2047 down to 1.

```d
float fftBinToFrequency(float fftBin, int fftSize, float samplingRate)
{
    return (samplingRate * fftBin) / fftSize;
}
```

An FFT that operates only on real numbers (aka "RealFFT") will thus give you 2048 + 1 bins instead of 4096, and is twice as fast.

Each of these bins carries both amplitude (the quantity of energy attributed to that frequency) and phase information, relative to the START of your time-domain window. If you need energy and phase at the center of the window, you will need "zero-phase" windowing, which is basically an offset (see Dplug's FFTAnalyzer).

Each bin of a time-frequency transform basically answers the question: "How much does this signal look like a sinusoid that would make k rotations inside the time-window, and with how much offset?"

Now, it seems you want to do peak picking in a spectrum. The best method I know is called the "2nd Quinn estimator", from the paper "Estimation of frequency, amplitude and phase from the DFT of a time series" (1997).

Since your sound isn't complex, it's going to be _way_ simpler to find and classify those peaks (no need for the best estimator either) than real pitch detection that would work on voice. You should resist the temptation to interpolate in the bin domain; it's better to oversample the FFT results with generous "zero-padding" instead, then use an estimation method like the one above.
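The zero-padding mentioned above can be as simple as copying the (windowed) chunk into a larger zero-filled buffer before the FFT, which yields a finer-grained (interpolated) spectrum. A sketch (the name is mine):

```d
// Copy `chunk` into a zero-filled buffer of `paddedSize` samples
// (paddedSize should be >= chunk.length, typically a power of 2).
float[] zeroPad(const float[] chunk, size_t paddedSize)
{
    auto buf = new float[paddedSize];
    buf[] = 0.0f;                        // new float[] starts as NaN in D
    buf[0 .. chunk.length] = chunk[];
    return buf;
}
```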
 How should I use the result to check whether the 1209Hz, 
 1336Hz, 1477Hz, or 1633Hz tones are present in that part of the 
 sound?
1. Detect all peaks larger in magnitude than their 2 or 4 neighbours,
2. if sufficiently above the mean energy of the spectrum,
3. then optionally refine their amplitude and frequency with an estimator like the one above,
4. from which it will be very clear whether 1336 Hz or a nearby tone was present.

Your FFT doesn't need to operate on much more than 20 ms in general.
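Steps 1-3 could be sketched like this (all names are mine, and a crude parabolic interpolation stands in for the Quinn estimator from the cited paper):

```d
import std.algorithm : sum;
import std.math : log;

// Steps 1-2: indices of local maxima (over 2 neighbours each side) that
// stand sufficiently above the mean energy. Assumes mags.length > 4.
size_t[] findPeaks(const double[] mags, double factor = 4.0)
{
    immutable mean = mags.sum / mags.length;
    size_t[] peaks;
    foreach (i; 2 .. mags.length - 2)
        if (mags[i] > factor * mean
            && mags[i] >= mags[i - 1] && mags[i] >= mags[i + 1]
            && mags[i] >= mags[i - 2] && mags[i] >= mags[i + 2])
            peaks ~= i;
    return peaks;
}

// Step 3, crudely: parabolic interpolation over log magnitudes gives a
// fractional bin index (assumes positive magnitudes; the 2nd Quinn
// estimator is more accurate than this).
double refinePeak(const double[] mags, size_t peak)
{
    immutable a = log(mags[peak - 1]), b = log(mags[peak]), c = log(mags[peak + 1]);
    return peak + 0.5 * (a - c) / (a - 2 * b + c);
}
```

Multiplying the refined bin index by samplingRate / fftSize then gives the estimated frequency in Hz.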
Jul 13