The Acoustic Phonetics English Language Essay

The chapter "Vowels: acoustic events with a relatively open vocal tract" examines the acoustic properties that can result when the vocal tract is in a relatively open configuration. It deals primarily with sounds produced when the narrowest point in the vocal tract is not constricted enough for the average airflow to cause a significant pressure drop at the constriction. This configuration is normally associated with vowel sounds. The author explains formant bandwidths for vowels by modelling the vocal tract as a tube: when the tube has no side branches or cross modes and the sound source is a volume velocity source at the glottis, the transfer function from that source to the volume velocity at the mouth opening is an all-pole function. When the shape of the vocal tract changes because of the position of the tongue body or any other structure, the frequencies of these poles (the formants) also change. Acoustic losses also occur in the vocal tract, arising from the vocal tract walls, viscosity, heat conduction and radiation. The author uses a graph to show the acoustic loss contributed by each of these factors as a function of frequency. The data in the graph were obtained from sweep-tone measurements, in which the transfer function was estimated by applying a transducer to the neck surface as a sinusoidal source and measuring the sound pressure radiated from the mouth. The glottis was closed when the measurements were made. The graph indicates that there is a difference in frequency between male and female speakers and that radiation causes most of the acoustic loss. The two figures also show that the average bandwidths of the first three formants over several vowel configurations were 54, 65 and 70 Hz respectively, with the first formant bandwidth varying from 39 to 73 Hz for different vowels.
In the high frequency range above about 2000 Hz, a major contributor to the bandwidth is acoustic loss due to radiation, but there is also considerable variability in the formant bandwidths at these frequencies, depending primarily on the size of the mouth opening and the cavity affiliation of the first formant frequency.
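
The all-pole description above can be made concrete with a short sketch. It assumes the textbook form in which each formant contributes one complex-conjugate pole pair whose real part is set by the bandwidth. The formant frequencies (500, 1500, 2500 Hz) are hypothetical values chosen for illustration; only the bandwidths (54, 65, 70 Hz) come from the averages quoted above. The function name is likewise an assumption.

```python
import math

def all_pole_gain_db(freq_hz, formants_hz, bandwidths_hz):
    """Magnitude in dB of an all-pole transfer function at freq_hz.

    Each formant contributes a conjugate pole pair
    s_n = -pi*B_n +/- j*2*pi*F_n, scaled so the gain at 0 Hz is 1 (0 dB).
    """
    s = 1j * 2 * math.pi * freq_hz
    h = 1.0 + 0j
    for f_n, b_n in zip(formants_hz, bandwidths_hz):
        pole = complex(-math.pi * b_n, 2 * math.pi * f_n)
        h *= (pole * pole.conjugate()) / ((s - pole) * (s - pole.conjugate()))
    return 20 * math.log10(abs(h))

# Hypothetical formant frequencies (assumption, not from the text).
FORMANTS = [500.0, 1500.0, 2500.0]
# Average bandwidths of the first three formants quoted in the text.
BANDWIDTHS = [54.0, 65.0, 70.0]

if __name__ == "__main__":
    for f in (0, 250, 500, 1000, 1500, 2000, 2500):
        gain = all_pole_gain_db(f, FORMANTS, BANDWIDTHS)
        print(f"{f:5d} Hz  {gain:6.1f} dB")
```

In this model a narrower bandwidth gives a sharper and taller spectral peak at the corresponding formant, which is why the acoustic losses discussed above matter for the shape of the vowel spectrum.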

High vowels:

A number of acoustic, physiological and auditory factors combine to define a category of vowels that are produced with a high tongue body position and a low first formant frequency. The impedance of the vocal tract walls contributes to the stability of the first formant; the tongue surface can be shaped in the lateral direction to produce a stable acoustic output (at least for certain tongue body positions) that is insensitive to the degree of contraction of the muscles controlling tongue height; and the auditory response to sounds with a low F1 appears to have distinctive properties.

Front-back distinction

We find a common acoustic consequence of front-back displacements of the tongue body, independent of tongue height. Forward movement of the tongue body raises the second formant frequency to a maximum value consistent with the types of constriction that are possible for the different tongue heights. This maximum value is higher for high vowels than for low vowels. For the highest tongue body position and, to some extent, for the intermediate position, the third and fourth formants combine with the second to produce a higher-frequency spectral prominence whose center of gravity is higher than F2. Front vowels, then, are always characterized by a broad minimum or empty space in the mid-frequency region of the spectrum between F1 and F2. For a backed tongue body, on the other hand, F2 is displaced to a value that is maximally low and close to F1, given a proper selection of the tongue body position. In the case of the non-low vowels, a value of F2 that is lowest and closest to F1 can be reached by rounding the lips. An acoustic consequence of an F2 value that is low and close to F1 is that the amplitudes of the higher-frequency peaks in the spectrum are low relative to the amplitudes of the F1 and F2 peaks and probably do not play a significant role in determining vowel quality. Electromyographic data show a sharp distinction in the muscle activity involved in producing front and back vowels. Data reported by Baer et al. show that all back vowels exhibit activity of the styloglossus muscle, which is oriented to displace the tongue body backward and upward. This muscle is especially active for non-low back vowels. Front vowels, on the other hand, show no activity of the styloglossus muscle.

A “neutral” vowel is defined as a vowel produced by a vocal tract configuration that has uniform cross-sectional area along its entire length. Whilst no vowel articulation can actually meet this requirement accurately, the vowel in “heard” and some productions of schwa can approximate this configuration. For such vowels, and only for such vowels, the vocal tract can be treated mathematically as a single uniform tube closed at one end (the glottis) and open at the other (the lips) for the purposes of calculating the resonances of the vocal tract. The acoustics of vowels are fairly well understood. The different vowel qualities are realized in acoustic analyses of vowels by the relative values of the formants, acoustic resonances of the vocal tract which show up as dark bands on a spectrogram. The vocal tract acts as a resonant cavity, and the position of the jaw, lips, and tongue affect the parameters of the resonant cavity, resulting in different formant values. The acoustics of vowels can be visualized using spectrograms, which display the acoustic energy at each frequency, and how this changes with time.
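
The single-tube treatment mentioned above can be sketched numerically. A uniform tube closed at one end and open at the other resonates at odd multiples of the quarter-wavelength frequency, F_n = (2n - 1)c / 4L. The tube length of 17.5 cm and the speed of sound of 35000 cm/s are illustrative assumptions, not values given in the text.

```python
def uniform_tube_formants(length_cm, n_formants=3, speed_of_sound_cm_s=35000.0):
    """Resonances (Hz) of a uniform tube closed at the glottis, open at the lips.

    Uses the quarter-wavelength formula F_n = (2n - 1) * c / (4 * L).
    """
    return [
        (2 * n - 1) * speed_of_sound_cm_s / (4.0 * length_cm)
        for n in range(1, n_formants + 1)
    ]

if __name__ == "__main__":
    # Assumed vocal tract length of 17.5 cm (illustrative).
    for n, f in enumerate(uniform_tube_formants(17.5), start=1):
        print(f"F{n} = {f:.0f} Hz")
```

With these assumed values the resonances are evenly spaced at 500, 1500 and 2500 Hz. A shorter tube raises all of them in proportion.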

The first formant, abbreviated “F1”, corresponds to vowel openness (vowel height). Open vowels have high F1 frequencies while close vowels have low F1 frequencies: [i] and [u] have similarly low first formants, whereas [ɑ] has a higher first formant.

The second formant, F2, corresponds to vowel frontness. Back vowels have low F2 frequencies while front vowels have high F2 frequencies. For example, the front vowel [i] has a much higher F2 frequency than the other two vowels, [u] and [ɑ]. However, in open vowels the high F1 frequency forces a rise in the F2 frequency as well, so an alternative measure of frontness is the difference between the first and second formants. For this reason, some people prefer to plot F1 against F2 – F1. (This dimension is usually called ‘backness’ rather than ‘frontness’, but the term ‘backness’ can be counterintuitive when discussing formants.)
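
The two plotting conventions can be compared with a small sketch. The formant values are approximate adult male averages of the kind commonly cited from Peterson and Barney (1952); they are included here as an illustrative assumption and are not taken from this essay.

```python
# Approximate (F1, F2) values in Hz; illustrative assumption.
VOWELS = {
    "i": (270, 2290),
    "u": (300, 870),
    "ɑ": (730, 1090),
}

def plot_coordinates(f1_hz, f2_hz):
    """Return the point for a plain F1/F2 plot and for an F1 vs F2 - F1 plot."""
    return (f1_hz, f2_hz), (f1_hz, f2_hz - f1_hz)

if __name__ == "__main__":
    for vowel, (f1, f2) in VOWELS.items():
        plain, difference = plot_coordinates(f1, f2)
        print(f"[{vowel}]  F1/F2: {plain}   F1/(F2 - F1): {difference}")
```

With these values, raw F2 places [ɑ] (1090 Hz) further forward than [u] (870 Hz), whereas F2 – F1 places [ɑ] (360 Hz) further back than [u] (570 Hz). This is the effect of the high F1 of open vowels described above.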

In the third edition of his textbook, Peter Ladefoged recommended use of plots of F1 against F2 – F1 to represent vowel quality. [4] However, in the fourth edition, he changed to adopt a simple plot of F1 against F2, [5] and this simple plot of F1 against F2 was maintained for the fifth (and final) edition of the book. [6] Katrina Hayward compares the two types of plots and concludes that plotting of F1 against F2 – F1 “is not very satisfactory because of its effect on the placing of the central vowels”, [7] so she also recommends use of a simple plot of F1 against F2. In fact, this kind of plot of F1 against F2 has been used by analysts to show the quality of the vowels in a wide range of languages, including RP British English, [8] [9] the Queen’s English, [10] American English, [11] Singapore English, [12] Brunei English, [13] North Frisian, [14] Turkish Kabardian, [15] and various indigenous Australian languages. [16]

Rounding is generally realized by a complex relationship between F2 and F3 that tends to reinforce vowel backness. One effect of this is that back vowels are most commonly rounded while front vowels are most commonly unrounded; another is that rounded vowels tend to plot to the right of unrounded vowels in vowel charts. That is, there is a reason for plotting vowel pairs the way they are.

The usual description of vowels in respect to their “phonetic quality” requires the linguist to locate them within a so-called “vowel space,” apparently articulatory in nature, and having three dimensions labeled high-low (or close-open), front-back, and unrounded-rounded. The first two are coordinates of tongue with associated jaw position, while the third specifies the posture of the lips. It is recognized that vowels can vary qualitatively in ways that this three-dimensional space does not account for. So, for example, vowels may differ in degree of nasalization, and they may be rhotacized or r-colored. Moreover, it is recognized that while this vowel space serves important functions within the community of linguists, both the two measures of tongue position and the one for the lips inadequately identify those aspects of vocal tract shapes that are primarily responsible for the distinctive phonetic qualities of vowels (Ladefoged 1971). With all this said, it remains true enough that almost any vowel pair of different qualities can be described as occupying different positions within the space. Someone hearing two vowels in sequence and detecting a quality difference will presumably also be able to diagnose the nature of the articulatory shift executed in going from one vowel to the other.

Esophageal talkers may have reduced intelligibility due to both time domain and frequency domain variability. The unpredictable nature of esophageal speech can cause problems when automatic procedures are used in applications such as long-distance telephone messages. The current study compared a standard coding algorithm (LPC-10e) with a novel approach to determining voiced periods (vocal tract area functions) in the speech of esophageal talkers. The results of the study showed that the sentences synthesized with the vocal tract area function algorithm were more intelligible than those synthesized with the standard LPC-10e algorithm. Supplemental information, such as vocal tract area functions, may be useful in determining voiced epochs when variability in vocal parameters is high.

In the last 40 years, many vocal pedagogy authors have written about the need for appropriate vowel modification. Modification involves shading vowels with respect to the location of vowel formants, so that the sung pitch or one of its harmonics receives an acoustical boost by being near a formant. The goals of such modification include a unified quality throughout the entire range, smoother transitions between registers, enhanced dynamic range and control and improved intelligibility. Elite singers, whether they consciously recognize they are modifying vowels or not, become experts at making subtle changes in vowels as they sing, or they do not have consistent careers. Modification concepts which have been widely accepted are summarized below:

Although there is a strong correlation between voice classification and formant frequencies, due to subtle articulation and anatomical differences, formant frequencies are unique to each individual.

The amount of modification needed varies with the size of the voice, the “weight” of the voice, the duration of the note being considered, the dynamic level, and how the note in question is approached. Sensitive singers report that the amount of modification they need may vary daily and also during the day, depending on how much they have warmed up.

Vowel formants are frequency bands, not one specific pitch.

Precise tuning of each note in a piece is not very practical nor is it acoustically beneficial. During a rapid passage, a singer may not have enough time to adjust for optimal resonance on each vowel on each note; moving on to the next note in the passage smoothly is a greater priority than exact tuning of each tone.

Males and females “tune” differently. In general, males seek to match harmonics above the fundamental to a formant, while females, especially in the upper voice, tend to reinforce the fundamental itself by matching it to the first or lowest formant.

Several general “rules” for modifying vowels exist (as summarized by Titze): (a) formant frequencies are lowered uniformly by lengthening the vocal tract (either by lowering the larynx or protruding the lips or some combination of both); (b) formant frequencies are lowered uniformly by lip rounding and raised by lip spreading; (c) fronting and arching the tongue lowers the first formant and raises the second formant, while backing and lowering the tongue raises the first formant and lowers the second formant; (d) opening the jaw raises the first formant and lowers the second formant.
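
The idea that a sung pitch or one of its harmonics gains a boost by lying near a formant can be illustrated with a minimal sketch. The function name, the pitches (110 Hz and 880 Hz) and the first formant value (600 Hz) are hypothetical choices for illustration only.

```python
def nearest_harmonic(f0_hz, formant_hz):
    """Return (harmonic number, offset in Hz) of the harmonic closest to a formant.

    A positive offset means the harmonic lies above the formant,
    a negative offset means it lies below.
    """
    n = max(1, round(formant_hz / f0_hz))
    return n, n * f0_hz - formant_hz

if __name__ == "__main__":
    first_formant = 600  # hypothetical F1 in Hz
    for label, f0 in (("low voice, 110 Hz", 110), ("high voice, 880 Hz", 880)):
        n, offset = nearest_harmonic(f0, first_formant)
        print(f"{label}: harmonic {n} is {offset:+d} Hz from F1")
```

In the low-pitched case a harmonic above the fundamental falls close to the formant, while in the high-pitched case the fundamental itself is the nearest component and already lies above the formant. This is consistent with the different tuning strategies for male and female voices described above.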

Vocal fold vibration for voicing is achieved by the combined efforts of muscular tension, tissue elasticity and aerodynamic forces. The vocal folds are initially drawn together by the activities of the various laryngeal adductor muscles. As the folds come together the velocity of air passing through the glottis increases which results in a pressure drop between the medial edges of the folds (Bernoulli effect) causing them to be sucked together. Pressure then builds up below the closed glottis until the folds are forced apart and the cycle repeats (Van den Berg, 1958; 1968). One necessary condition of voicing is that subglottal pressure exceeds supraglottal pressure (the transglottal pressure difference) (Ohala, 1983; Sawashima and Hirose, 1983).

The activity of the larynx during phonation causes the airstream flowing out of the lungs to be broken up into a rapid series of puffs due to the opening and closing of the vocal folds. Each burst of compressed air escapes through the glottis at high speed and collides with the column of air inside the vocal tract. This causes an acoustic shock wave which is propagated to the outside.

The spectrum of the periodic glottal waveform is a line spectrum comprising harmonics which occur at multiples of the fundamental frequency. According to theoretical calculations (Fant, 1960; Rosenberg 1971), the glottal tone for normal phonation has a spectrum that falls off at about 12dB per octave. Other phonation types, as described by Laver (1980), display different glottal tone characteristics.
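
The line spectrum described above can be tabulated with a short sketch, assuming an idealised slope of exactly 12 dB per octave relative to the first harmonic. The fundamental frequency of 100 Hz is an illustrative assumption.

```python
import math

def glottal_harmonics(f0_hz, n_harmonics, slope_db_per_octave=-12.0):
    """Return (frequency in Hz, level in dB re the fundamental) per harmonic.

    Harmonic n lies log2(n) octaves above the fundamental, so its level
    is slope_db_per_octave * log2(n) under the idealised roll-off.
    """
    harmonics = []
    for n in range(1, n_harmonics + 1):
        level_db = slope_db_per_octave * math.log2(n) if n > 1 else 0.0
        harmonics.append((n * f0_hz, level_db))
    return harmonics

if __name__ == "__main__":
    # Assumed fundamental frequency of 100 Hz (illustrative).
    for freq, level in glottal_harmonics(100, 8):
        print(f"{freq:5d} Hz  {level:6.1f} dB")
```

Each doubling of frequency (harmonics 1, 2, 4, 8) lowers the level by a further 12 dB in this idealisation.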

Vowel sounds are most frequently described with reference to their formant characteristics which provide an indication of the resonance positions and hence the articulatory shape for the vowel production.

Early speech perception studies (Delattre, Liberman, Cooper and Gerstman, 1952; Miller, 1953) showed that the frequencies of the first three formants were the most important cues to vowel identification. These findings have been supported by several subsequent analyses (Fox, 1985; Kewley-Port and Atal, 1989; Klein, Plomp and Pols, 1970; Rakerd and Verbrugge, 1985; Shepard, 1972; Terbeek, 1977). The first formant has been shown to be associated with the auditory quality of height and the second formant with the auditory impression of the front/back dimension, or, more correctly, degree of constriction and point of maximal constriction. Ladefoged, De Clerk, Lindau and Papçun (1972) remind us that degree of lip opening or protrusion, pharyngeal width and larynx height also contribute to modifications of acoustic output. Lindblom and Sundberg (1971) found that all formants were lowered by lip rounding but that for palatal configurations, F3 was particularly affected. Högberg (1995) also found that lip area was an important factor in the determination of F3 for the front vowels. When the first two formants are plotted on axes with certain directional and scaling characteristics, the vowel relationships closely resemble the traditional auditory vowel map. Such vowel spaces, with axes F1 and F2, rely on the concept of the vowel target, which is the part of the vowel least influenced by its surrounding phonetic context. The vowel target is where the articulators, and therefore the formants, are moving the least and is referred to as the steady-state component of the vowel. The target is considered to be either a point in the time course of the vowel or else a section of time during which the vowel position remains stable. A single point is often used to provide an estimate of the target position, and for most vowels this can be assumed to be approximately midway through the nucleus.
Several authors have noted the problems inherent in the target theory for vowels citing the difficulties often encountered in establishing steady state components by eye or by automatic extraction procedures (Benguerel and McFadden, 1989; Nearey and Assmann, 1986). Van Son and Pols (1990), however, examined five different methods of identifying vowel targets and found that the use of the different methods made little difference to the results of their experiments.
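
Two simple ways of estimating a vowel target from a formant track can be sketched as follows: taking the temporal midpoint, or taking the frame at which F1 and F2 change least. The function names and the track values are hypothetical, and these are not the five methods compared by Van Son and Pols.

```python
def midpoint_target(track):
    """Return the (F1, F2) frame at the temporal midpoint of the track."""
    return track[len(track) // 2]

def steady_state_target(track):
    """Return the interior frame where F1 and F2 move least.

    Movement at frame i is the summed absolute change in F1 and F2
    between frames i - 1 and i + 1. Requires at least three frames.
    """
    best_index, best_change = None, float("inf")
    for i in range(1, len(track) - 1):
        change = sum(abs(track[i + 1][k] - track[i - 1][k]) for k in range(2))
        if change < best_change:
            best_index, best_change = i, change
    return track[best_index]

# Hypothetical (F1, F2) values in Hz, one pair per analysis frame.
TRACK = [
    (400, 1800), (350, 2000), (310, 2150), (300, 2200),
    (305, 2190), (330, 2100), (380, 1900),
]

if __name__ == "__main__":
    print("midpoint:    ", midpoint_target(TRACK))
    print("steady state:", steady_state_target(TRACK))
```

For a track of this shape both estimates select the same frame, which is in keeping with the observation above that the choice of method may make little difference in practice.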

The conventional method of depicting vowels in the F1/F2 plane does not adequately represent the multi-dimensional nature of vowel quality. Delattre et al. (1952) showed that the third formant influenced listeners' judgements of vowel quality, and more recent experiments have determined that the higher formants have a combined influence on vowel perception. The combined upper formant is referred to as F2 prime (F2′) (Bladon, 1983; Bladon and Fant, 1978; Carlson, Fant and Granström, 1975; Paliwal, Lindsay and Ainsworth, 1983). Delattre et al. (1952) suggested that the ear averages formants that are close together. Carlson, Granström and Fant (1970) tested this hypothesis for Swedish vowels, concluding that all vowels could be effectively synthesised using two-formant approximations. Chistovich and colleagues found that formant averaging or integration occurred only if two formants were situated within a critical distance of 3 to 3.5 Bark (Chistovich and Lublinskaya, 1979; Chistovich, Sheikin and Lublinskaya, 1979). More recent studies have examined global spectral features, suggesting that the F3 – F2 difference is a more accurate way of identifying vowel frontedness. Syrdal and Gopal (1986) have shown that the separation between back and front vowels is more closely linked to the F3 – F2 difference than to the F2 – F1 difference. It is important to recognise, however, that F3 and F4 vary more than F1 and F2 as a result of speaker characteristics, whereas they are relatively stable across vowel categories, in contrast to F1 and F2, which vary greatly as a result of vowel quality. The higher formants are therefore less effective carriers of phonetic information than the lower formants (Harrington and Cassidy, 1999).
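
The critical-distance idea can be sketched by converting formant frequencies to the Bark scale and comparing the F3 – F2 separation with the 3.5 Bark limit. The Hz-to-Bark conversion used here is Traunmüller's (1990) approximation, which is an assumption: the studies cited above may have used a different conversion. The formant values are approximate adult male averages of the kind commonly cited from Peterson and Barney (1952), also an illustrative assumption.

```python
def hz_to_bark(freq_hz):
    """Convert Hz to Bark using Traunmueller's (1990) approximation."""
    return 26.81 * freq_hz / (1960.0 + freq_hz) - 0.53

def bark_distance(f_low_hz, f_high_hz):
    """Separation of two frequencies on the Bark scale."""
    return hz_to_bark(f_high_hz) - hz_to_bark(f_low_hz)

def within_critical_distance(f_low_hz, f_high_hz, critical_bark=3.5):
    """True if two formants are close enough to be integrated perceptually."""
    return bark_distance(f_low_hz, f_high_hz) <= critical_bark

# Approximate (F2, F3) values in Hz; illustrative assumption.
VOWEL_F2_F3 = {"i": (2290, 3010), "ɑ": (1090, 2440), "u": (870, 2240)}

if __name__ == "__main__":
    for vowel, (f2, f3) in VOWEL_F2_F3.items():
        d = bark_distance(f2, f3)
        print(f"[{vowel}]  F3 - F2 = {d:.2f} Bark  "
              f"within 3.5 Bark: {within_critical_distance(f2, f3)}")
```

With these values only the front vowel [i] has F2 and F3 within the critical distance, while the back vowels [ɑ] and [u] do not, in line with the front/back separation attributed above to the F3 – F2 difference.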

Vowels can be described in terms of the centre frequencies of the first three formants at the vowel target (or targets for diphthongs). Vowel duration and other dynamic spectral information contribute to a more complete description but the extent of this contribution remains unclear. Contextual environment as well as suprasegmental factors plays an important role in the ultimate realisation of the vowel phoneme and so such characteristics must be carefully controlled in phonetic research.

Physiological differences between speakers also affect vowel characteristics and such effects must be accounted for in phonetic research and minimised if necessary. One method of minimising physiological effects is to use one of the many normalisation procedures available to reduce variance but care must always be taken when manipulating data to ensure that phonetic accuracy is preserved. The question of sex specific articulations remains open as researchers have been unable to adequately model male to female vowel behaviour.
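
As one example of a normalisation procedure, the sketch below applies a z-score transformation to each speaker's formant values, in the manner usually attributed to Lobanov (1971). The choice of this method is an assumption, since the text does not name a specific procedure, and the speaker data are hypothetical.

```python
import statistics

def zscore_normalise(values_hz):
    """Rescale one speaker's values for one formant to mean 0, SD 1.

    Uses the population standard deviation over that speaker's vowels.
    """
    mean = statistics.mean(values_hz)
    sd = statistics.pstdev(values_hz)
    return [(v - mean) / sd for v in values_hz]

# Hypothetical F1 values (Hz) for three vowels from two speakers.
# Speaker B's values are speaker A's scaled by 1.2, mimicking a
# shorter vocal tract.
SPEAKER_A_F1 = [270.0, 300.0, 730.0]
SPEAKER_B_F1 = [324.0, 360.0, 876.0]

if __name__ == "__main__":
    print([round(z, 3) for z in zscore_normalise(SPEAKER_A_F1)])
    print([round(z, 3) for z in zscore_normalise(SPEAKER_B_F1)])
```

In this constructed case the two speakers differ only by a uniform scale factor, so their normalised values coincide. Real speaker differences are rarely this simple, which is why care is needed to preserve phonetic accuracy.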

Acoustic data provide an accessible means for hypothesising about articulatory behaviour, and it is customary, in phonetic discussions of vowel characteristics, to use articulatory labels to refer to auditory and acoustic properties (Ladefoged and Maddieson, 1990). Articulatory discussions provide convenient global labels for describing acoustic effects; however, specific articulatory detail should not be ascribed to acoustic vowel data.
