
digitalmars.D.learn - result of FFT

reply Matthew <monkeytonk115 gmail.com> writes:
Hi,

I'm writing a program where I'm trying to decode DTMF tones. I 
already completed the wave file decoder and I know I'm supposed 
to use an FFT to transform the samples from time domain to 
frequency domain but I'm stuck on determining which of the DTMF 
frequencies are present.

Consider the following, the input is the sound wave samples as an 
array of floats between -1 and +1.

```D
void decode_sound(float[] samples)
{
	import std.numeric;
	import std.math;
	import std.complex;
	import std.algorithm;
	import std.range;

	size_t fft_size = 4096;
	
	auto f = new Fft(fft_size);

	foreach(ch; samples.chunks(fft_size))
	{
		auto res = f.fft(ch);
		// res is an array of 4096 complex numbers
	}
}
```

I can't figure out how the 4096 results of the FFT relate to the 
frequencies in the input.

I tried taking the magnitude of each element, I tried taking just 
the real or imaginary parts.  I plotted them but the graph 
doesn't look how I'm expecting.

What do the 4096 resulting complex numbers represent?
How should I use the result to check whether the 1209Hz, 1336Hz, 
1477Hz, or 1633Hz tones are present in that part of the sound?

Thanks,
Matthew
Jul 08
next sibling parent reply Dennis <dkorpel gmail.com> writes:
On Tuesday, 8 July 2025 at 18:11:27 UTC, Matthew wrote:
 I can't figure out how the 4096 results of the FFT relate to 
 the frequencies in the input.

 I tried taking the magnitude of each element,
That's correct!
 What do the 4096 resulting complex numbers represent?
The magnitude of each element (computed with `std.complex.abs`) corresponds to the amplitude of each frequency component; the angle in the complex plane represents the phase (computed with `std.complex.arg`, in radians).

The frequencies are all relative to the FFT window. `res[0]` is 0 Hz, `res[1]` corresponds to a sine wave that fits 1 cycle inside your window, `res[2]` is 2 cycles, etc. The frequency in Hz depends on your sample rate. If it's 44100, then 44100 / 4096 ≈ 10.8, so your window fits about 10.8 times in 1 second. That means `res[1]` is around 10.8 Hz, `res[2]` around 21.5 Hz, etc., up to `res[4095]` just below 44100 Hz. Although everything from `res[2048]` onwards is just a mirrored copy, since 44100 samples/s can only capture frequencies up to 22 kHz (for more info, search 'Nyquist frequency' and 'aliasing').
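To make that mapping concrete, here is a small sketch (the helper names are mine, not from the thread) that converts between bin index and frequency under these assumptions (44100 Hz sample rate, 4096-point window):

```d
import std.math : round;

enum sampleRate = 44100.0;
enum fftSize = 4096;

// Frequency in Hz at the center of bin `idx` of the FFT output.
double binToFrequency(size_t idx)
{
    return idx * sampleRate / fftSize;
}

// Nearest bin index for a target frequency in Hz.
size_t frequencyToBin(double hz)
{
    return cast(size_t) round(hz * fftSize / sampleRate);
}
```

For example, `frequencyToBin(1209)` lands on bin 112, matching the calculation below.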
 How should I use the result to check whether the 1209Hz, 
 1336Hz, 1477Hz, or 1633Hz tones are present in that part of the 
 sound?
The closest bucket to 1209 Hz is 1209 * (4096 / 44100) = 112.3, which is not an exact match, so it will leak into all the other bins; but it will still mostly contribute to bins 112 and 113, so it's probably good enough to just check those.

If you need better frequency resolution, you can try applying a 'window function' to reduce spectral leakage, or increasing the window size, either by including more samples (reducing the time resolution) or by padding the window with 0's, which essentially adds interpolated bins. I haven't programmed pitch detection myself yet (still on my todo list!) so I don't know how much of this is needed for your use case, but you can just start by checking the closest bin and see how far you get. Good luck!
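As a concrete illustration of the window-function suggestion, here is a Hann window sketch (the helper names are mine; Phobos has no built-in window functions):

```d
import std.math : cos, PI;

// Hann window coefficients for an n-sample window (assumes n > 1).
double[] hannWindow(size_t n)
{
    auto w = new double[n];
    foreach (i, ref x; w)
        x = 0.5 * (1.0 - cos(2.0 * PI * i / (n - 1)));
    return w;
}

// Multiply a chunk by the window in place before handing it to Fft.fft.
void applyWindow(float[] chunk, const double[] w)
{
    foreach (i, ref s; chunk)
        s = cast(float)(s * w[i]);
}
```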
Jul 08
parent reply Matthew <monkeytonk115 gmail.com> writes:
On Tuesday, 8 July 2025 at 19:39:37 UTC, Dennis wrote:

 The magnitude of each element (computed with `std.complex.abs`) 
 corresponds to the amplitude of each frequency component, the 
 angle in the complex plane represents the phase (computed with 
 `std.complex.arg` in radians).
This is what I picked up from Wikipedia.
 The frequencies are all relative to the FFT window. `res[0]` is 
 0 Hz, `res[1]` corresponds to a sine wave that fits 1 cycle 
 inside your window, `res[2]` is 2 cycles, etc. The frequency in 
 Hz depends on your sample rate. If it's 44100, then 44100 / 4096 
 ≈ 10.8, so your window fits about 10.8 times in 1 second. That 
 means `res[1]` is around 10.8 Hz, `res[2]` around 21.5 Hz, etc., 
 up to `res[4095]` just below 44100 Hz. Although everything from 
 `res[2048]` onwards is just a mirrored copy, since 44100 
 samples/s can only capture frequencies up to 22 kHz (for more 
 info, search 'Nyquist frequency' and 'aliasing').
This was the key bit of information I didn't know.
 The closest bucket to 1209Hz is 1209 * (4096 / 44100) = 112.3, 
 which is not an exact match so it will leak frequencies in all 
 other bins, but it will still mostly contribute to bins 112 and 
 113 so it's probably good enough to just check those.
Checking the closest bins to each frequency seems to work well, at least with clean sound from a wav file. It remains to be seen how it fares against noisy real life signals. I'll likely need a window function or interpolation, but this is a good start.

```D
void decode_sound(float[] samples)
{
	import std.numeric;
	import std.math;
	import std.complex;
	import std.algorithm;
	import std.array;
	import std.range;

	size_t fft_size = 4096;

	auto f = new Fft(fft_size);

	foreach(ch; samples.chunks(fft_size))
	{
		auto res = f.fft(ch);
		double[] magnitudes = res.map!(x => x.abs).array;
		auto high = [magnitudes[112], magnitudes[124], magnitudes[137], magnitudes[151]].maxIndex;
		auto low = [magnitudes[65], magnitudes[72], magnitudes[78], magnitudes[87]].maxIndex;
		// high and low are 0,1,2,3 depending on which DTMF tone pair was detected
	}
}
```

Thanks,
Matthew
Jul 09
parent Andy Valencia <dont spam.me> writes:
On Wednesday, 9 July 2025 at 19:39:43 UTC, Matthew wrote:

 Checking the closest bins to each frequency seems to work well, 
 at least with clean sound from a wav file. It remains to be 
 seen how it fares against noisy real life signals. I'll likely 
 need a window function or interpolation but this is a good 
 start.
Many moons ago I had to fix Asterisk's DTMF detection/decode. Ultimately fixes (but not my code) made it into their source repo, and if you need guidance on a very mature DTMF treatment, main/dsp.c in the Asterisk source would be a solid resource. Andy
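For reference, the core technique in that code (and, as far as I know, in most production DTMF decoders) is the Goertzel algorithm: it measures the energy at a single known frequency without computing a full FFT, which suits DTMF's eight fixed tones well. A minimal sketch (identifiers are mine, not Asterisk's):

```d
import std.math : cos, PI;

// Goertzel: power of `samples` at `targetHz`, via the standard
// two-pole recurrence s0 = x + k*s1 - s2.
double goertzelPower(const float[] samples, double targetHz, double sampleRate)
{
    immutable k = 2.0 * cos(2.0 * PI * targetHz / sampleRate);
    double s1 = 0.0, s2 = 0.0;
    foreach (x; samples)
    {
        immutable s0 = x + k * s1 - s2;
        s2 = s1;
        s1 = s0;
    }
    return s1 * s1 + s2 * s2 - k * s1 * s2;
}
```

Running it once per candidate frequency on each chunk replaces the FFT-and-pick-bins step entirely.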
Jul 09
prev sibling next sibling parent reply Serg Gini <kornburn yandex.ru> writes:
On Tuesday, 8 July 2025 at 18:11:27 UTC, Matthew wrote:
 Hi,
 What do the 4096 resulting complex numbers represent?
 How should I use the result to check whether the 1209Hz, 
 1336Hz, 1477Hz, or 1633Hz tones are present in that part of the 
 sound?

 Thanks,
 Matthew
The result of FFT should be the same as in NumPy[1], I suppose. D's module doesn't have the functionality that you want, though, so you need to write it yourself.

ChatGPT showed this Python code, which you can port to D:

```python
n = len(data)
fft_result = np.fft.fft(data)
frequencies = np.fft.fftfreq(n, d=1/fs)
magnitude = np.abs(fft_result)

dtmf_freqs = [697, 770, 852, 941, 1209, 1336, 1477]
detected_freqs = []
for target in dtmf_freqs:
    idx = np.where((frequencies > target - resolution) & (frequencies < target + resolution))
    if magnitude[idx].max() > threshold:  # adjust threshold if needed
        detected_freqs.append(target)
print("Detected DTMF frequencies:", detected_freqs)
```

There is no command for `fftfreq` in D, but you can write your own implementation based on the NumPy documentation[2].

You may also be interested in some other packages and examples:
- audio-formats[3] for reading and decoding different formats
- numir[4], a package with NumPy-like functionality (linked: an audio example)

From my perspective, solving it in NumPy would be the safer approach, but it should be doable in D as well.

References:
[1] https://numpy.org/doc/2.1/reference/generated/numpy.fft.fft.html
[2] https://numpy.org/doc/2.1/reference/generated/numpy.fft.fftfreq.html
[3] https://github.com/AuburnSounds/audio-formats
[4] https://github.com/libmir/numir/blob/master/example/audio_separation/source/app.d
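A D sketch of such an `fftfreq` implementation, following the ordering described in the NumPy documentation (non-negative bins first, then the negative-frequency half; the name mirrors NumPy's, nothing like it exists in Phobos):

```d
// Sample frequencies for an n-point FFT with sample spacing d
// (seconds per sample), mirroring numpy.fft.fftfreq.
double[] fftfreq(size_t n, double d = 1.0)
{
    auto freqs = new double[n];
    immutable half = (n + 1) / 2; // number of non-negative bins
    foreach (i; 0 .. half)
        freqs[i] = i / (d * n);
    foreach (i; half .. n)
        freqs[i] = -(cast(double)(n - i)) / (d * n);
    return freqs;
}
```

With `d = 1.0 / 44100` and `n = 4096`, `fftfreq` gives the center frequency in Hz of each bin of the earlier `Fft` output.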
Jul 08
parent reply Serg Gini <kornburn yandex.ru> writes:
On Tuesday, 8 July 2025 at 19:59:39 UTC, Serg Gini wrote:
 On Tuesday, 8 July 2025 at 18:11:27 UTC, Matthew wrote:
 From my perspective - solve it in NumPy will be safer approach, 
 but it should be doable in D as well.
Or even better - create D library for this :)
It seems (after fast googling) there are plenty of examples on the Web:
https://github.com/gdereese/dtmf
https://github.com/PlasterPate/DTMF_Decoder
https://github.com/hfeeki/dtmf
Jul 08
parent Ferhat =?UTF-8?B?S3VydHVsbXXFnw==?= <aferust gmail.com> writes:
On Tuesday, 8 July 2025 at 21:10:35 UTC, Serg Gini wrote:
 On Tuesday, 8 July 2025 at 19:59:39 UTC, Serg Gini wrote:
 On Tuesday, 8 July 2025 at 18:11:27 UTC, Matthew wrote:
 From my perspective - solve it in NumPy will be safer 
 approach, but it should be doable in D as well.
 Or even better - create D library for this :)
 It seems (after fast googling) there are plenty of examples on the Web:
 https://github.com/gdereese/dtmf
 https://github.com/PlasterPate/DTMF_Decoder
 https://github.com/hfeeki/dtmf
I believe dplug exists for this kind of thing.

https://github.com/AuburnSounds/Dplug
https://github.com/AuburnSounds/Dplug/tree/0e96b27db15bd0a3d6c69e4d5827175337310225/fft/dplug/fft
Jul 09
prev sibling next sibling parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 7/8/25 20:11, Matthew wrote:
 Hi,
 
 I'm writing a program where I'm trying to decode DTMF tones. I already 
 completed the wave file decoder and I know I'm supposed to use an FFT to 
 transform the samples from time domain to frequency domain but I'm stuck 
 on determining which of the DTMF frequencies are present.
 
 Consider the following, the input is the sound wave samples as an array 
 of floats between -1 and +1.
 
 ```D
 void decode_sound(float[] samples)
 {
      import std.numeric;
      import std.math;
      import std.complex;
      import std.algorithm;
 
      size_t fft_size = 4096;
 
      auto f = new Fft(fft_size);
 
      foreach(ch; samples.chunks(fft_size))
      {
          auto res = f.fft(ch);
          // res is an array of 4096 complex numbers
      }
 }
 ```
 
 I can't figure out how the 4096 results of the FFT relate to the 
 frequencies in the input.
 
 I tried taking the magnitude of each element, I tried taking just the 
 real or imaginary parts.  I plotted them but the graph doesn't look how 
 I'm expecting.
 
 What do the 4096 resulting complex numbers represent?
 How should I use the result to check whether the 1209Hz, 1336Hz, 1477Hz, 
 or 1633Hz tones are present in that part of the sound?
 
 Thanks,
 Matthew
Well, with `N=fft_size`, `fft` computes

```
output[j] = ∑ₖ e^(-2πijk/N)·input[k],
```

and we know that `ifft` gives:

```
input[k] = ∑ⱼ e^(2πijk/N)·output[j]/N.
```

You are trying to write your input as a linear combination of sine waves, so we interpret `k` as time and `j/N` as frequency:

```
input[k] = ∑ⱼ e^(2πi·k·j/N)·output[j]/N.
```

Or, with `f_j[k] = e^(2πi·k·j/N)`:

```
input[k] = ∑ⱼ (output[j]/N)·f_j[k].
```

One complication here is that both `f_j` and `output/N` are complex-valued. Note the relationship:

```
cos(ω·t+φ) = ½·(e^(i(ωt+φ)) + e^(-i(ωt+φ)))
           = ½·(e^(iωt)·e^(iφ) + e^(-iωt)·e^(-iφ))
```

In our setup, e^(iωt) corresponds to `f_j`, e^(-iωt) corresponds to `f_(N-j)`, and φ corresponds to the phase component of our outputs. Interestingly, as your `input` signal is real, we actually know that `output[j]` and `output[N-j]` must be conjugate to each other, as this is the only way to cancel out the imaginary parts. This also implies that `output[N/2]` is real, which works out because `f_(N/2)[k]=(-1)^k`.

Putting it in practice:

```d
import std;
alias K=double;
enum sample_rate=44100;

auto fToWave(R)(size_t N,R coefficients_f){
    return iota(N/2+1).map!(j=>
        tuple!("magnitude","frequency","phase")(
            (j==N/2?1.0:2.0)*abs(coefficients_f[j]).re,
            K(j)*sample_rate/N,
            std.complex.log(coefficients_f[j]).im)
        );
}

void main(){
    enum N=4096;
    auto input=wave(N,1336.0f);
    auto output=fft(input);
    auto coefficients_f=output.map!(o=>o/N);
    //auto input2=linearCombination(N,coefficients_f,f(N)); // (slow, just for illustration)
    //assert(isClose(input,input2,1e-7,1e-7));
    auto coefficients_w=fToWave(N,coefficients_f);
    //auto input3=reconstructWave(N,coefficients_w); // (slow, just for illustration)
    //assert(isClose(input,input3,1e-3,1e-3));
    auto sorted=coefficients_w.array.sort.retro;
    foreach(magnitude,frequency,phase;sorted.until!(coeff=>coeff[0]<1e-2)){
        writefln!"%s·cos(2π·%s·t%s%s)"(magnitude,frequency,text(phase).startsWith("-")?"":"+",phase);
    }
}

// helper functions for illustration
auto time(size_t N)=>iota(N).map!(i=>K(i)/sample_rate);
auto wave(size_t N,K frequency,K phase=0.0f){
    K angularVelocity=2*PI*frequency;
    return time(N).map!(t=>cos(angularVelocity*t+phase));
}

/+
// helper functions for slow illustrations:
enum I=Complex!K(0,1);
auto f(size_t N){
    return iota(N).map!(j=>iota(N).map!(k=>exp(2*PI*I*j*k/N)));
}
auto linearCombination(R,RoR)(size_t N,R coefficients,RoR functions){
    // compute ∑ⱼcoefficients[j]·functions[j]
    return iota(N).map!(k=>sum(iota(coefficients.length).map!(j=>coefficients[j]*functions[j][k])));
}
auto reconstructWave(R)(size_t N,R coefficients_w){
    auto coefficients=coefficients_w.map!(coeff=>coeff.magnitude);
    auto functions=coefficients_w.map!(coeff=>wave(N,coeff.frequency,coeff.phase));
    return linearCombination(N,coefficients,functions);
}
+/
```

(Note that additional processing may be helpful, as `fft` is based on the assumption that signals repeat with a period of `N`.)
Jul 08
parent Timon Gehr <timon.gehr gmx.ch> writes:
On 7/8/25 23:06, Timon Gehr wrote:
 
 auto fToWave(R)(size_t N,R coefficients_f){
      return iota(N/2+1).map!(j=>
          tuple!("magnitude","frequency","phase")(
              (j==N/2?1.0:2.0)*abs(coefficients_f[j]).re,
              K(j)*sample_rate/N,
              std.complex.log(coefficients_f[j]).im)
          );
 }
As `f_0` is also real, similar to `f_(N/2)`, this should of course have been:

```d
auto fToWave(R)(size_t N,R coefficients_f){
    return iota(N/2+1).map!(j=>
        tuple!("magnitude","frequency","phase")(
            (j==0||j==N/2?1.0:2.0)*abs(coefficients_f[j]).re,
            K(j)*sample_rate/N,
            std.complex.log(coefficients_f[j]).im)
        );
}
```
Jul 09
prev sibling next sibling parent Ferhat =?UTF-8?B?S3VydHVsbXXFnw==?= <aferust gmail.com> writes:
On Tuesday, 8 July 2025 at 18:11:27 UTC, Matthew wrote:
 Hi,

 I'm writing a program where I'm trying to decode DTMF tones. I 
 already completed the wave file decoder and I know I'm supposed 
 to use an FFT to transform the samples from time domain to 
 frequency domain but I'm stuck on determining which of the DTMF 
 frequencies are present.

 Consider the following, the input is the sound wave samples as 
 an array of floats between -1 and +1.

 ```D
 void decode_sound(float[] samples)
 {
 	import std.numeric;
 	import std.math;
 	import std.complex;
 	import std.algorithm;

 	size_t fft_size = 4096;
 	
 	auto f = new Fft(fft_size);

 	foreach(ch; samples.chunks(fft_size))
 	{
 		auto res = f.fft(ch);
 		// res is an array of 4096 complex numbers
 	}
 }
 ```

 I can't figure out how the 4096 results of the FFT relate to 
 the frequencies in the input.

 I tried taking the magnitude of each element, I tried taking 
 just the real or imaginary parts.  I plotted them but the graph 
 doesn't look how I'm expecting.

 What do the 4096 resulting complex numbers represent?
 How should I use the result to check whether the 1209Hz, 
 1336Hz, 1477Hz, or 1633Hz tones are present in that part of the 
 sound?

 Thanks,
 Matthew
I have not been doing DSP stuff since [2018]. I am not sure how DTMF is similar to a regular sound signal, but I remember a few things from sound signals. The FFT result is symmetrical, so you probably only need to take the first 2048 of the 4096. A spectrogram[1] would be useful to visualize the components[2]. If this is not the issue, sorry, and please ignore this.

1: https://en.wikipedia.org/wiki/Spectrogram
2: https://medium.com/adler7210/manually-decoding-dtmf-through-spectrogram-562e4b0b99c3
2018: Kurtulmuş, F., Öztüfekçi, S., & Kavdır, İ. (2018). Classification of chestnuts according to moisture levels using impact sound analysis and machine learning. Journal of Food Measurement and Characterization, 12(4), 2819-2834.
Jul 09
prev sibling next sibling parent claptrap <clap trap.com> writes:
On Tuesday, 8 July 2025 at 18:11:27 UTC, Matthew wrote:
 Hi,

 I'm writing a program where I'm trying to decode DTMF tones. I 
 already completed the wave file decoder and I know I'm supposed 
 to use an FFT to transform the samples from time domain to 
 frequency domain but I'm stuck on determining which of the DTMF 
 frequencies are present.

 Consider the following, the input is the sound wave samples as 
 an array of floats between -1 and +1.

 ```D
 void decode_sound(float[] samples)
 {
 	import std.numeric;
 	import std.math;
 	import std.complex;
 	import std.algorithm;

 	size_t fft_size = 4096;
 	
 	auto f = new Fft(fft_size);

 	foreach(ch; samples.chunks(fft_size))
 	{
 		auto res = f.fft(ch);
 		// res is an array of 4096 complex numbers
 	}
 }
 ```

 I can't figure out how the 4096 results of the FFT relate to 
 the frequencies in the input.

 I tried taking the magnitude of each element, I tried taking 
 just the real or imaginary parts.  I plotted them but the graph 
 doesn't look how I'm expecting.

 What do the 4096 resulting complex numbers represent?
 How should I use the result to check whether the 1209Hz, 
 1336Hz, 1477Hz, or 1633Hz tones are present in that part of the 
 sound?

 Thanks,
 Matthew
The output is an array of "bins". Each bin holds a complex number: the magnitude of the complex number is the magnitude of that bin, and the phase of the complex number is the phase of that bin. The frequency the bin represents is

    (samplerate/2) * idx/res.length

where idx is the bin index. I.e. the first bin = 0 Hz, and the last bin is samplerate/2, and they are linearly spaced across that range. This is for a "real FFT", rather than a "complex FFT"; looks like that's what you're doing, tbh. With a complex FFT you get negative frequencies too, I think, but I never really got my head around that.

Note that FFTs aren't perfect: unless the frequencies line up exactly on the bin frequencies, you get leakage into neighbouring bins. A single sine wave between two bins will show up as a peak at those two bins, with leakage tailing off into some of the bins above and below. You can adjust the distribution of this leakage by applying a window function to the sample before you FFT it, but it may or may not matter in your case, if the tones you're looking for are not too close to each other. You might need to average a few bins around the frequency of interest for best results.

I'd probably just make a few test samples, each with a single sine wave, and see what the FFT bins look like, and adjust my detection algo based on that. Tweaking FFT length and window function will give you enough separation between the tones you're interested in.
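The test-sample idea above can be sketched like this (the name and the 0.5 amplitudes are my own choices; a full DTMF digit is the sum of one low-group and one high-group tone, and passing the same frequency twice gives a single sine wave for the per-tone experiments suggested):

```d
import std.math : sin, PI;

// Synthesize n samples of a two-tone test signal at the given sample rate.
float[] dtmfTone(double lowHz, double highHz, size_t n, double sampleRate = 44100)
{
    auto s = new float[n];
    foreach (i, ref x; s)
        x = cast(float)(0.5 * sin(2 * PI * lowHz * i / sampleRate)
                      + 0.5 * sin(2 * PI * highHz * i / sampleRate));
    return s;
}
```

Feeding `dtmfTone(770, 1336, 4096)` (digit '5') into the decoder should light up bins near 72 and 124.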
Jul 10
prev sibling parent Guillaume Piolat <first.nam_e gmail.com> writes:
On Tuesday, 8 July 2025 at 18:11:27 UTC, Matthew wrote:
 What do the 4096 resulting complex numbers represent
Bin 0 is energy at 0 Hz.
Bins 1 to 2047 are energy at (bin * samplingRate / 4096) Hz.
Bin 2048 is energy at the Nyquist frequency.
Bins 2049 to 4095 are the energy for negative frequencies, the conjugates of bins 2047 down to 1.

```d
float fftBinToFrequency(float fftBin, int fftSize, float samplingRate)
{
    return (samplingRate * fftBin) / fftSize;
}
```

An FFT that operates only on real numbers (aka "RealFFT") will thus give you 2048 + 1 bins instead of 4096, and is twice as fast.

Each of these bins carries both amplitude (the quantity of energy attributed to that frequency) and phase information, relative to the START of your time-domain window. If you need energy and phase at the center of the window, you will need "zero-phase" windowing, which is basically an offset (see Dplug's FFTAnalyzer).

Each bin of a time-frequency transform basically answers the question: "How much does this signal look like a sinusoid that would make k rotations inside the time-window, and with how much offset?"

Now, it seems you want to do peak picking in a spectrum. The best method I know is called the "2nd Quinn estimator", from the paper "Estimation of frequency, amplitude and phase from the DFT of a time series" (1997).

Since your sound isn't complex, it's going to be _way_ simpler to find and classify those peaks (no need for the best estimator either) than real pitch detection that would work on voice. You should resist the temptation to interpolate in the bin domain; it's better to oversample the FFT results with generous "zero-padding" instead, then use an estimation method like the one above.
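The zero-padding mentioned above can be as simple as copying the (windowed) chunk into a larger zero-filled buffer before the FFT, which yields a finer-grained (interpolated) spectrum. A sketch (the name is mine):

```d
// Copy `chunk` into a zero-filled buffer of `paddedSize` samples
// (paddedSize should be >= chunk.length, typically a power of 2).
float[] zeroPad(const float[] chunk, size_t paddedSize)
{
    auto buf = new float[paddedSize];
    buf[] = 0.0f;                        // new float[] starts as NaN in D
    buf[0 .. chunk.length] = chunk[];
    return buf;
}
```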
 How should I use the result to check whether the 1209Hz, 
 1336Hz, 1477Hz, or 1633Hz tones are present in that part of the 
 sound?
1. Detect all peaks larger in magnitude than their 2 or 4 neighbours,
2. if sufficiently above the mean energy of the spectrum,
3. then optionally refine their amplitude and frequency with an estimator like the one above,
4. from which it will be very clear whether 1336 Hz or a nearby tone was present.

Your FFT doesn't need to operate on much more than 20 ms in general.
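Steps 1-3 could be sketched like this (all names are mine, and a crude parabolic interpolation stands in for the Quinn estimator from the cited paper):

```d
import std.algorithm : sum;
import std.math : log;

// Steps 1-2: indices of local maxima (over 2 neighbours each side) that
// stand sufficiently above the mean energy. Assumes mags.length > 4.
size_t[] findPeaks(const double[] mags, double factor = 4.0)
{
    immutable mean = mags.sum / mags.length;
    size_t[] peaks;
    foreach (i; 2 .. mags.length - 2)
        if (mags[i] > factor * mean
            && mags[i] >= mags[i - 1] && mags[i] >= mags[i + 1]
            && mags[i] >= mags[i - 2] && mags[i] >= mags[i + 2])
            peaks ~= i;
    return peaks;
}

// Step 3, crudely: parabolic interpolation over log magnitudes gives a
// fractional bin index (assumes positive magnitudes; the 2nd Quinn
// estimator is more accurate than this).
double refinePeak(const double[] mags, size_t peak)
{
    immutable a = log(mags[peak - 1]), b = log(mags[peak]), c = log(mags[peak + 1]);
    return peak + 0.5 * (a - c) / (a - 2 * b + c);
}
```

Multiplying the refined bin index by samplingRate / fftSize then gives the estimated frequency in Hz.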
Jul 13