Voice 5. Comparison with other programs

Voicing, jitter, and shimmer measurements made by Praat cannot always be compared directly with those made by other programs such as MDVP. The causes are the voicing decision strategy and the accuracy of period and peak determination.

5.1. Voicing decisions: slightly different

Different programs use very different methods for deciding whether an irregular part of the signal is voiced or not. A comparison of Boersma (1993) for Praat and Deliyski (1993) for MDVP leads to the following considerations. Both Praat and MDVP use an autocorrelation method for pitch analysis, but MDVP quantizes the amplitudes into the values -1, 0, and +1 before computing the autocorrelation, whereas Praat uses the original amplitudes. Also, Praat corrects the autocorrelation function by dividing it by the autocorrelation function of the window, unlike any other program. Lastly, Praat uses sinc interpolation to compute an accurate estimate of the height of the autocorrelation peaks, unlike any other program. All three of these differences (and there are more) influence the measurement of the height of the autocorrelation peak at 1/F0. This height is generally taken as a criterion for voicing: if it is more than the voicing threshold (which you can change with Pitch settings...), the frame is considered voiced, otherwise voiceless. In Praat, the standard voicing threshold is 0.45; in MDVP it is 0.29, which suggests that MDVP tends to regard more frames as voiced than Praat. But the difference between these two numbers may partly be explained by the fact that MDVP does not correct the autocorrelation function and does not do an accurate sinc interpolation: both of these properties cause the measured height of the peak at 1/F0 (in MDVP) to be lower than the real height, as explained by Boersma (1993).
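The effect of the window correction on the measured peak height can be illustrated with a minimal Python sketch. It is not the code of Praat or MDVP: the Hann window, the frame length of six periods, and all names are assumptions chosen for illustration, and sinc interpolation is left out because the lag falls exactly on a sample. For a perfectly periodic signal the true peak height at the period is 1; the uncorrected windowed autocorrelation comes out clearly lower, and dividing by the autocorrelation of the window brings it back close to 1.

```python
import math

def autocorr(x, lag):
    """Unnormalized autocorrelation of x at the given lag (in samples)."""
    return sum(x[i] * x[i + lag] for i in range(len(x) - lag))

def normalized_autocorr(x, lag):
    """Autocorrelation at `lag`, scaled so that the value at lag 0 is 1."""
    return autocorr(x, lag) / autocorr(x, 0)

n = 480       # frame length in samples (assumed: six periods)
period = 80   # period of the test signal in samples (assumed)

# A perfectly periodic test signal: its true peak height at lag = period is 1.
signal = [math.sin(2 * math.pi * i / period) for i in range(n)]

# Hann window (an assumption made for this sketch).
window = [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]
windowed = [s * w for s, w in zip(signal, window)]

# Peak height without correction: lowered by the taper of the window.
raw_peak = normalized_autocorr(windowed, period)

# Peak height after dividing by the autocorrelation of the window itself.
corrected_peak = raw_peak / normalized_autocorr(window, period)

print(f"uncorrected peak height: {raw_peak:.3f}")
print(f"corrected peak height:   {corrected_peak:.3f}")
```

Because the uncorrected height underestimates the real height, a program that skips the correction needs a lower voicing threshold to make the same voicing decision, which is one reason why the two thresholds cannot be compared directly.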

5.2. Jitter measurements: sometimes very different

The jitter measures in various programs may yield different results, with Praat often giving much lower values than MDVP, especially for noisy sounds. I will now explain where the difference comes from. A more elaborate explanation with pictures is given in Boersma (2009a).

If a sound is computer-generated as a glottal source signal with a random period duration variation of 1 percent (around a constant F0), then filtered with the characteristics of a vocal tract configuration corresponding to a sustained vowel, both Praat and MDVP will measure this sound as having a "jitter" of 1 percent. For non-noisy jittery signals, therefore, the two programs give equally accurate results.

If a sound is computer-generated as a glottal source signal with a constant period, then filtered with the characteristics of a vocal tract configuration corresponding to a sustained vowel, both Praat and MDVP will measure this sound as having a "jitter" of less than 0.01 percent. The two programs, therefore, have a comparable sensitivity in measuring small jitter values.
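For readers who want to see how a relative jitter figure is obtained from a sequence of measured periods, here is a minimal Python sketch. It assumes the common "local" definition (the mean absolute difference between consecutive periods, divided by the mean period); the text above does not say which jitter variant is meant, so this choice and the function name are assumptions.

```python
def jitter_local(periods):
    """Mean absolute difference between consecutive periods,
    divided by the mean period (a fraction; multiply by 100 for percent)."""
    if len(periods) < 2:
        raise ValueError("need at least two periods")
    diffs = [abs(b - a) for a, b in zip(periods, periods[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_period = sum(periods) / len(periods)
    return mean_diff / mean_period

# A constant period gives no jitter at all.
constant = [0.010] * 10
print(jitter_local(constant))

# Periods alternating between 9.9 ms and 10.1 ms around a 10 ms mean.
alternating = [0.0099, 0.0101] * 5
print(jitter_local(alternating))
```

The measure depends only on the list of period durations, so any difference between programs must come from how those durations are determined in the first place.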

So far, the two programs give comparable results. The difference between the two programs comes when noise is added.

If a sound is computer-generated as a glottal source signal with a constant period, then filtered with the characteristics of a vocal tract configuration corresponding to a sustained vowel, and if then 1 percent additive "white" noise (a quite usual amount) is added, Praat will measure this sound as having a "jitter" of 0.02 percent, whereas MDVP will measure this sound as having a "jitter" of 0.6 percent. In other words, Praat will tell you that there is almost no jitter, whereas MDVP will tell you that the jitter is of an almost pathological level. The relevant curves can be seen in my papers "Stemmen meten met Praat" and Boersma (2009a), and the numbers are confirmed by Deliyski, Shaw & Evans (Journal of Voice, 2005: 23).

One can see that Praat's "jitter" measure attempts to separate the influence of period duration variation (which it reports as "jitter") from the influence of additive noise (which it does not report as "jitter"), and that MDVP's "jitter" measure combines the influence of period duration variation with the influence of additive noise (both of which it reports as "jitter").

The difference between Praat's and MDVP's jitter measures is due to a difference in the way in which periods are measured. Praat uses waveform-matching, in which the duration of a period is determined by looking for best matching wave shapes (a "cross-correlation" maximum). MDVP uses peak-picking instead, where the duration of a period is determined by measuring the time difference between two locally highest peaks in the waveform. The waveform-matching method averages away much of the influence of additive noise, whereas peak-picking is highly sensitive to additive noise. For detailed illustrations, see Boersma (2009a).
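The contrast between the two strategies can be reproduced in a toy simulation. The Python sketch below is a rough illustration under stated assumptions, not the algorithm of either program: the test signal (two harmonics with a constant period of 100 samples), the noise level, the lag search range, the parabolic interpolation, and all function names are hypothetical choices, and the resulting numbers are not meant to match the figures quoted above.

```python
import math
import random

PERIOD = 100      # constant period in samples (assumed)
N_PERIODS = 50    # number of periods to measure
NOISE_SD = 0.01   # standard deviation of the additive white noise (assumed)

def make_signal(rng):
    """Strictly periodic two-harmonic signal plus additive white noise."""
    out = []
    for i in range((N_PERIODS + 3) * PERIOD):
        phase = 2 * math.pi * i / PERIOD
        clean = math.sin(phase) + 0.3 * math.sin(2 * phase)
        out.append(clean + rng.gauss(0.0, NOISE_SD))
    return out

def parabolic_offset(left, mid, right):
    """Sub-sample position of the extremum of a parabola through 3 points."""
    denom = left - 2 * mid + right
    return 0.0 if denom == 0 else 0.5 * (left - right) / denom

def jitter_local(periods):
    """Mean absolute consecutive period difference over the mean period."""
    diffs = [abs(b - a) for a, b in zip(periods, periods[1:])]
    return (sum(diffs) / len(diffs)) / (sum(periods) / len(periods))

def periods_by_peak_picking(x):
    """Period = time between the highest peaks of successive cycles."""
    peak_times = []
    for k in range(N_PERIODS + 1):
        start = k * PERIOD
        i = max(range(start, start + PERIOD), key=lambda j: x[j])
        offset = parabolic_offset(x[i - 1], x[i], x[i + 1]) if i > 0 else 0.0
        peak_times.append(i + offset)
    return [b - a for a, b in zip(peak_times, peak_times[1:])]

def periods_by_waveform_matching(x):
    """Period = lag at which a cycle best matches the following stretch."""
    lags = range(PERIOD - 10, PERIOD + 11)
    periods = []
    for k in range(N_PERIODS):
        start = k * PERIOD
        corr = [sum(x[start + i] * x[start + lag + i] for i in range(PERIOD))
                for lag in lags]
        best = max(range(1, len(corr) - 1), key=lambda j: corr[j])
        offset = parabolic_offset(corr[best - 1], corr[best], corr[best + 1])
        periods.append(lags[best] + offset)
    return periods

signal = make_signal(random.Random(1))
jitter_peak = jitter_local(periods_by_peak_picking(signal))
jitter_match = jitter_local(periods_by_waveform_matching(signal))
print(f"peak-picking jitter:      {100 * jitter_peak:.3f} percent")
print(f"waveform-matching jitter: {100 * jitter_match:.3f} percent")
```

In this sketch the true period never varies, so any reported jitter is caused by the noise alone. The peak location depends on a handful of samples near a flat maximum, whereas the cross-correlation sums over a whole period, which is why the second estimate is much less affected.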