The results showed that the judges in this study generally preferred
Gruberova's rendering of the cadenza and thought that it expressed more
"tender passion" and "sadness" than the renderings of the other singers.
Acoustically, Gruberova's voice differed markedly from those of all the other
singers, showing lower energy in a predicted singer's formant band and
stronger high frequency energy components in
the spectrum. Her voice was like that of Sutherland and Callas in being
characterized by high energy in the F0 band and little variation in lower

Reviewers of the above study raised the possibility that the strong higher
frequency energy measured for Gruberova (which might have affected the judges'
ratings) could be due to the sound engineers' selective boosting of specific
frequency bands. To study this possibility, we recorded the two excerpts of the
cadenza live in Gruberova's dressing room before a performance of Lucia at the
Zurich opera, having obtained the artist's cooperation for this study. In this
paper we compare the acoustic analyses for this recording with those for several
professional recordings: a new CD recording and a radio broadcast from a
concert hall, in addition to the cassette recording used in the earlier study. The
recording of Sutherland (1971 CD) from the original study is also included for

This paper reports the acoustic results and reviews the role of the higher
frequency spectral energy bands and the singer's formant for the expression of
emotion in speech and singing.

Table 1. Acoustic analysis results for the five recordings. Rows 1-3 are given in
dB, normalized with respect to overall intensity. Rows 4-7 are given in Hertz.

                          Gruberova                             Sutherland
                          Cassette      Live        CD     Radio        CD
1  Singer's formant            -36       -46       -32       -36       -36
2  3500-10000 Hz.              -28       -32       -36       -27       -36
3  3500-4500 Hz.               -29       -32       -32       -27       -38
4  Peak 3 Mean                3779      3618      3546      3797      3443
5  Peak 3 Std. Dev.            236       399       321       229       355
6  Peak 4 Mean                4982      4826      5427      4934      5320
7  Peak 4 Std. Dev.           1640      1253      1981      1427       403

With the exception of the CD recording, the energy in the high frequency band
(3500 to 10000 Hz.) is higher in the Gruberova recordings than in the recording
of Sutherland (Figure 1).
Figure 1. Average spectra for Sutherland (broken line) and the live recording of
Gruberova (solid line). The spectra are normalized to overall intensity.
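
The band figures in rows 1-3 of Table 1 are levels relative to overall intensity. The exact procedure used to obtain them is not described in this section, so the following is only a minimal sketch of one common way to compute such a figure: the ratio of spectral power inside a band to total spectral power, expressed in dB. The function name `band_level_db`, the Hann window, and the synthetic test signal are all assumptions made for illustration.

```python
import numpy as np

def band_level_db(x, fs, lo_hz, hi_hz):
    """Power in [lo_hz, hi_hz) relative to total power, in dB (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    # Hann window to limit spectral leakage (an assumption, not the paper's stated method).
    power = np.abs(np.fft.rfft(x * np.hanning(len(x)))) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    in_band = (freqs >= lo_hz) & (freqs < hi_hz)
    return 10.0 * np.log10(power[in_band].sum() / power.sum())

# Synthetic check: a strong 1000 Hz partial plus a partial at 4000 Hz
# with one tenth of the amplitude (one hundredth of the power).
fs = 32000
t = np.arange(fs) / fs
signal = np.sin(2 * np.pi * 1000 * t) + 0.1 * np.sin(2 * np.pi * 4000 * t)

level_wide = band_level_db(signal, fs, 3500, 10000)
level_narrow = band_level_db(signal, fs, 3500, 4500)
print(round(level_wide, 1), round(level_narrow, 1))
```

Because the only high frequency component of the test signal lies at 4000 Hz, both bands give the same level, roughly 20 dB below overall intensity.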
The CSL LPC formant tracking program was then applied to each recording,
yielding means and standard deviations of the frequencies of tracked formants
(Table 1, rows 4 to 7). The figures for the fourth tracked formant are included
for purposes of comparison with the data from the original study. The figures
pertaining to this formant should be regarded with some caution, however, as the
formant was not consistently identified by the tracking algorithm and the number
of valid samples varied substantially between the different recordings.
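
The internals of the CSL LPC formant tracking program are not described here. As a rough illustration of the general idea behind LPC based formant estimation, the sketch below fits an all-pole model with the textbook autocorrelation (Yule-Walker) method and reads resonance frequencies from the angles of the model's poles. The function names, the model order, and the synthetic two-resonance test signal are assumptions for illustration and are not taken from the study.

```python
import numpy as np

def lpc_autocorr(x, order):
    """Return [1, a1, ..., a_order] from the autocorrelation (Yule-Walker) method."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    r = np.correlate(x, x, mode="full")[n - 1 : n + order]  # lags 0..order
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, -r[1:])
    return np.concatenate(([1.0], a))

def resonance_freqs(a, fs):
    """Frequencies (Hz) of the poles in the upper half plane, sorted ascending."""
    roots = np.roots(a)
    roots = roots[roots.imag > 0]
    return np.sort(np.angle(roots) * fs / (2 * np.pi))

def all_pole_impulse_response(freqs_hz, radius, fs, n):
    """Impulse response of an all-pole filter with resonances at freqs_hz."""
    poles = []
    for f in freqs_hz:
        p = radius * np.exp(2j * np.pi * f / fs)
        poles += [p, np.conj(p)]
    a = np.real(np.poly(poles))
    x = np.zeros(n)
    for i in range(n):
        acc = 1.0 if i == 0 else 0.0  # unit impulse at n = 0
        for k in range(1, len(a)):
            if i - k >= 0:
                acc -= a[k] * x[i - k]
        x[i] = acc
    return x

# Two closely spaced resonances, chosen arbitrarily for the demonstration.
fs = 16000
x = all_pole_impulse_response([3000.0, 3600.0], radius=0.97, fs=fs, n=2000)
estimated = resonance_freqs(lpc_autocorr(x, order=4), fs)
print(np.round(estimated))
```

On real singing the model order, windowing, and frame length all matter, and with widely spaced partials the poles can follow individual harmonics rather than vocal tract resonances, which is one reason tracked formant figures deserve caution.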
The formant figures also show general agreement with the original study.
Specifically, the recordings of Gruberova consistently show a third formant
located at a higher frequency than that of Sutherland. The recordings of
Gruberova also have a higher standard deviation of the fourth formant than the
Sutherland recording, as in . Contrary to the previous study, however, there
was no significant difference between the two singers in the mean positions of
the fourth formant.
An examination of the relationship between the different measured parameters
can help determine the nature of the measured high frequency energy.
Correlations were calculated between the two formant frequencies and the
energy in the first two frequency bands. It was found that the frequency of the
third peak correlates strongly with the amount of energy in the spectrum above
3500 Hz (Pearson's r = 0.97). Examination of the long term average spectrum
for the five recordings shows that the amplitude decreases sharply above 4500
Hz., so energy in this region does not contribute substantially to the
high frequency band measured. Thus it would seem that the large amount of
high frequency energy in the Gruberova recordings is due to the higher
frequency position of the third spectral peak. This was confirmed by measuring
the energy in the frequency range from 3500 to 4500 Hz. (Table 1, row 3). All
the Gruberova recordings were characterized by more energy in this band than
the recording of Sutherland. Importantly, Sutherland's (and the other Dive
studied in ) third formant is located under the 3500 Hz. cutoff for the two
measured high frequency bands.
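
The reported correlation can be checked approximately against Table 1. The sketch below computes Pearson's r between the mean frequency of the third peak and the energy in the 3500 to 10000 Hz band across the five recordings. It assumes that the table columns are, in order, the Gruberova cassette, live, CD, and radio recordings followed by the Sutherland CD, and that the correlation was computed over these five summary values; neither point is stated explicitly in the text.

```python
import numpy as np

# Values transcribed from Table 1 (assumed column order: Gruberova cassette,
# live, CD, radio, then Sutherland CD).
peak3_mean_hz = np.array([3779, 3618, 3546, 3797, 3443], dtype=float)
energy_3500_10000_db = np.array([-28, -32, -36, -27, -36], dtype=float)

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    dx = x - x.mean()
    dy = y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

r = pearson_r(peak3_mean_hz, energy_3500_10000_db)
print(round(r, 2))  # 0.97
```

With only five data points the coefficient is sensitive to any single recording, so it is best read as descriptive rather than as a strong statistical result.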
Figure 2. Normalized spectrograms of Sutherland (top) and Gruberova
An examination of spectrograms can be used to better understand the nature of
the formant structure in the different recordings. Figure 2 shows the
spectrograms for the live recording of Gruberova and the recording of Sutherland (only
one spectrogram of Gruberova is displayed here, although spectrograms of the
other Gruberova recordings were very similar, showing concentrations of
energy in the same regions). The spectrograms reveal quite a different spectral
energy distribution for Gruberova as compared to Sutherland. Gruberova shows
a concentration of energy in two closely spaced bands between 2900 and 4100
Hz., with relatively low energy in the 1500 to 2500 Hz. range. In comparison,
Sutherland shows a more constant spectral slope, with three bands of decreasing
energy located between 1500 and 3900 Hz. It is also apparent that the automatic
formant tracking program was not able to distinguish between the third and
fourth formants of both singers, thus compounding the two into one measured
formant track (the third formant as given in Table 1, rows 4 and 5).
An explanation of the high energy region in the recordings of Gruberova might
be the presence of a singer's formant centred at about 3600 Hz. As discussed in
, the singer's formant is not a single formant as such, but rather a clustering
of formants around a predicted frequency of about 3000 Hz (in sopranos).
When clustered sufficiently closely, individual formants tend to reinforce each
other, leading to a spectral region with increased overall resonance. In the case
of soprano singing, the partials are spaced widely apart, which makes the exact
positions of the formants relative to the partials crucial.
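
A small numerical illustration of this point: the sketch below lists the harmonics of a given fundamental that fall inside a frequency band. The band limits of 2900 to 4100 Hz are taken from the spectrogram description of Gruberova; the two fundamental frequencies (880 Hz for a high soprano note, 220 Hz for a much lower voice) are arbitrary values chosen for illustration and are not measurements from the recordings.

```python
def harmonics_in_band(f0_hz, lo_hz, hi_hz):
    """Return the harmonic frequencies k * f0_hz that lie within [lo_hz, hi_hz]."""
    highest = int(hi_hz // f0_hz)
    return [k * f0_hz for k in range(1, highest + 1) if k * f0_hz >= lo_hz]

high_note = harmonics_in_band(880, 2900, 4100)
low_note = harmonics_in_band(220, 2900, 4100)
print(high_note)  # [3520]
print(low_note)   # [3080, 3300, 3520, 3740, 3960]
```

At the higher fundamental only one partial falls in the band, so a formant cluster adds energy only if it sits close to that partial; at the lower fundamental several partials sample the same region.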
Whilst most sopranos may be able to vary the formants to follow the positions
of the harmonics, the way in which this is done may vary between singers. Thus
some singers might raise the fourth formant in order to make it coincide with a
harmonic, thus separating it from the lower formants, which typically might
drop (, pp. 125-129). Such a separation of third and fourth formants, which
would prevent the development of a singer's formant, would seem to be the case
with Sutherland. In the recordings of Gruberova, however, the fourth formant
drops along with the third formant, thus maintaining a close distance and
allowing the formants to reinforce each other. The presence of a singer's
formant will not necessarily ensure high energy in that region of the spectrum;
the spectral drop-off of the harmonics must also be sufficiently gradual.
The question of whether sopranos possess a singer's formant has been discussed
recently by Berndtsson and Sundberg . Berndtsson and Sundberg compared
the classification by trained judges of synthesized soprano voices for various
manipulated singer's formant positions. Also included in the study was one
recording resynthesized using the formant positions from a professional
soprano. The study found that the perceived quality of the synthesized voices
increased as the centre frequency of the singer's formant increased. The recording
resynthesized from the professional soprano's formant positions was, however,
judged as natural as the best of the synthesized recordings, despite its lack of a
singer's formant.

The strong correlation found in  between the proportion of energy above
3500 Hz. and judgments of emotional expressivity may well be due to the
presence of a singer's formant at about 3600 Hz. This would fit well with the results
using synthesized recordings in . As that study only examined singer's
formant positions up to 3500 Hertz, it was unclear whether even higher
positions for the singer's formant might be judged even better. This study indicates
that perceived quality and expressivity might continue to increase with an even
higher singer's formant. The finding in  that the resynthesized recording with
no singer's formant was judged to be as natural as the synthesized recordings
might have been due to the more natural formant spacing, rather than the lack of
a singer's formant per se. The relative positions of the formants in relation to the
harmonic structure might be crucial to perceived quality, and thus the
synthesized recordings using formant spacing taken from baritones might have
suffered from their somewhat arbitrary relative formant positions. As admitted
by the authors of that study, none of the recordings in their study were judged as
being particularly natural.
Many of the conclusions drawn here concerning higher frequency spectral
regions and the formant structure of sopranos remain speculative. In particular,
the temporal changes to these features have not been examined. It is clear that
much further empirical research is required in order to better understand the
processes involved in emotional expression in singing.