BAAP 2004 Colloquium: Oral Presentation Abstracts

Patricia Ashby (University of Westminster): Primary Cardinal Vowels: the effects of contextualisation and length of training on identification success rates

A recent longitudinal study measured the correlation between learners’ identification success rate (ISR) and a) the perception of isolated vs contextualised tokens of primary Cardinal Vowels and b) length of training (measured in “actual hours of ear-training”). Results suggest that while learners find it easier to identify isolated tokens, there is a possible saturation point for contextualised ones beyond which no further progress is made. The study followed 125 learners through a 24-week taught programme involving traditional ear-training techniques and assessed their accuracy in Cardinal Vowel identification at two points: mid-way through the programme (after 12 weeks of training and practice) and at the end (after 24 weeks). Identification of isolated tokens was not especially problematic, but once vowels were contextualised in nonsense words, success rates plummeted and remained consistently lower regardless of length of training. In year-long courses, then, little further progress in ISR appears to be made after the first 12 weeks. The findings suggest 1) that, for any given learner, ISR may be governed at least in part by that learner’s perceptual capacity, and 2) that the extent of this capacity may be reached after only a short period of training.
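The abstract does not define how ISR was computed. As a minimal sketch, assuming ISR is simply the proportion of tokens a learner identifies correctly in a given condition at a given test point, the following Python tabulates ISRs and the change between the week-12 and week-24 tests. The record layout, the condition labels and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical record layout: (learner, condition, week, target, response),
# where target/response are Cardinal Vowel numbers such as "1".."8".
def isr_table(responses):
    """Return {(learner, condition, week): proportion of correct identifications}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for learner, condition, week, target, response in responses:
        key = (learner, condition, week)
        total[key] += 1
        correct[key] += int(target == response)
    return {key: correct[key] / total[key] for key in total}


def isr_gain(table, learner, condition, early=12, late=24):
    """ISR change between the mid-point (week 12) and final (week 24) tests."""
    return table[(learner, condition, late)] - table[(learner, condition, early)]


responses = [
    ("L01", "isolated", 12, "1", "1"), ("L01", "isolated", 12, "5", "6"),
    ("L01", "isolated", 24, "1", "1"), ("L01", "isolated", 24, "5", "5"),
    ("L01", "nonsense-word", 12, "2", "3"), ("L01", "nonsense-word", 12, "7", "7"),
    ("L01", "nonsense-word", 24, "2", "3"), ("L01", "nonsense-word", 24, "7", "7"),
]
table = isr_table(responses)
print(isr_gain(table, "L01", "isolated"))       # 0.5 (1.0 - 0.5)
print(isr_gain(table, "L01", "nonsense-word"))  # 0.0 (0.5 - 0.5)
```

Under this reading, a saturation point of the kind described would show up as gains close to zero for contextualised tokens across most learners.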

Lluisa Astruc (University of Cambridge): Intonation of sentential adverbs in English and Catalan

Sentential adverbs are adverbs such as ‘frankly’, ‘sadly’ or ‘fortunately’. Traditionally, they are considered a type of sentence-external element. They differ from phrasal adverbs in their wider syntactic and semantic scope (they modify the whole clause) and in their intonation. They are frequently described in the literature as lacking prominence and forming independent tonal units. There are, however, reasons to believe that the intonation of sentential adverbs differs substantially between Catalan and English. According to data gathered in a pilot experiment, Catalan sentential adverbs are mostly accented and form independent tonal units. English sentential adverbs also form separate tonal units, but tend to be deaccented. The experiment reported in this paper tests the hypothesis that English and Catalan sentential adverbs are indeed intonationally different.

Eva Liina Asu (University of Cambridge): A comparison of intonational peak alignment in two Estonian dialects

The dialects of Estonian are popularly known to be characterised, in part, by prosodic differences. Of immediate interest in this paper is the intonation of the variety of Estonian spoken on the island of Saaremaa, which is normally singled out as differing from the standard dialect in its ‘sing-songy’ melody. It has been suggested in previous work (e.g. Niit 1980) that this is due to a difference in peak alignment. Asu (2004) presents the first systematic study of standard Estonian intonation, but does not specifically quantify peak alignment except in connection with pitch cues to the three-way quantity distinction of Estonian. This paper will therefore present an entirely new analysis comparing the alignment of H*+L accents in the standard dialect and the Saaremaa dialect. The study will use some of the same read materials used in Asu (2004), and also spontaneous data.

Zoe Butterfint (University of Manchester): The speaker discriminating potential of intra-speaker variation in fundamental frequency

A large number of studies have examined the potential use of F0 in forensic speaker identification. However, considerable intra-speaker variability has led to disappointing results in discrimination trials.

This paper presents the results of an investigation into individuality in intra-speaker variation and the extent to which this may be a useful tool with which to characterize speakers.

F0 was analysed in speech from four speakers, covering different speaking styles and emotional content. Correlation analysis was used to determine whether the range and distribution of F0 values exploited by individuals could distinguish between speakers.
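The abstract does not say exactly which quantities were correlated. One plausible reading, sketched here as an assumption, is to bin each sample's F0 values into a normalised histogram and compute Pearson's r between the histograms of two samples, so that samples using a similar range and distribution of F0 correlate highly. The bin limits (50 to 400 Hz in 5 Hz steps), the synthetic data and the function names are illustrative only.

```python
import math
from statistics import fmean


def f0_histogram(f0_values, lo=50.0, hi=400.0, width=5.0):
    """Proportion of F0 values in each `width`-Hz bin between lo and hi.

    Bin limits are illustrative assumptions; values outside [lo, hi) are ignored.
    """
    n_bins = int((hi - lo) / width)
    counts = [0] * n_bins
    kept = 0
    for f in f0_values:
        if lo <= f < hi:
            counts[int((f - lo) // width)] += 1
            kept += 1
    return [c / kept for c in counts] if kept else [0.0] * n_bins


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def distribution_similarity(sample_a, sample_b):
    """Correlation between the binned F0 distributions of two samples."""
    return pearson(f0_histogram(sample_a), f0_histogram(sample_b))


# Synthetic F0 tracks (Hz): two styles from one speaker, one from another.
speaker_a_read = [100 + (i % 20) for i in range(200)]
speaker_a_emotional = [102 + (i % 20) for i in range(200)]
speaker_b_read = [200 + (i % 20) for i in range(200)]
same = distribution_similarity(speaker_a_read, speaker_a_emotional)
diff = distribution_similarity(speaker_a_read, speaker_b_read)
print(same > diff)  # True
```

Correlating whole distributions rather than single summary values (such as mean F0) is what lets a measure of this kind reflect both range and shape.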

Results indicate that in the majority of cases, correlations between samples by the same speaker are high, even across the different speaking styles and emotional contents. This suggests consistency in the range and distribution of F0 values used by each of the individuals. Much lower correlations are found between samples from different speakers, suggesting good levels of discrimination are possible.

Further study is required to determine whether large numbers of speakers can be discriminated in this way, and how this technique is affected by other forensically pertinent factors such as situational stress and fatigue.

Paul Carter & John Local (University of York): Rhythm and resonance: metrical structure and the sub-F3 spectrum in liquids

The liquids are distinguished in varieties of English not only by F3 but also by the local and temporally-distributed characteristics of the formant structure and spectral balance below F3. In Newcastle, [l] typically has relatively high F2 and a relatively large F2-F1 space whereas [r] typically has relatively low F2 and a relatively small F2-F1 space. Leeds exhibits a systematic reversal of this pattern.

We present a spectro-temporal analysis of [l] and [r] in the speech of 16 young speakers from Newcastle and Leeds. The acoustic details we identify are partly dependent on innovations in the liquid system (labiodental productions of [r]) and partly on prosodic structure: in particular, the Newcastle F2 distinction between [l] and [r] is attenuated by labiodentality (which itself varies with phonological context) and is also attenuated in trochaic contexts. In Newcastle English the acoustic distinction is maintained precisely in the contexts where there is most phonological motivation for the differentiation of [l] and [r].

Yiya Chen (University of Edinburgh): Signature of prosody in tonal realization: evidence from Standard Chinese

It is by now widely accepted that the articulation of speech is influenced by the prosodic structure into which an utterance is organized. However, the effect of prosody on F0 realization has been shown to be mainly phonological (Beckman & Pierrehumbert 1986; Selkirk & Shen 1990). In this paper, I present data on the F0 realization of lexical tones in Standard Chinese and show that prosodic factors may influence the articulation of a lexical tone and induce phonetic variation in its surface F0 contour, similar to the phonetic effect of prosody on segmental articulation (de Jong 1995; Keating & Fougeron 1997). Data were elicited from four native speakers of Standard Chinese producing all four lexical tones in different tonal contexts and under various focus conditions (i.e. under focus, with no focus, and post-focus), with three renditions for each condition. The observed F0 variations are argued to be best analyzed as resulting from prosodically driven differences in the phonetic implementation of the lexical tonal targets, which are in turn induced by pragmatically driven differences in how distinctively an underlying tonal target should be realized. Implications of this study for the phonetic implementation of phonological tonal targets will also be discussed.

Martha Dalton & Ailbhe Ní Chasaide (Trinity College Dublin): The ups and downs of Irish intonation

The work presented here is part of a new project on the prosody of four Irish dialects: Donegal, Mayo, Aran Islands and Kerry Irish. The project sets out to model how the three phonetic dimensions of pitch, voice quality and temporal features are exploited for the linguistic and paralinguistic functions of prosody. Results to date show striking differences between the contours of the Northern and Southern dialects, and this paper focuses on the tonal contours of Northern Irish as compared with those of the Southern dialects studied. Questions of tonal alignment and of the temporal and rhythmic differences between the dialects will also be discussed.

Jana Dankovicova, Jill House, Anna Crooks & Katie Jones (UCL, North Surrey NHS PCT): Relationship between musical skills, music training and intonation analysis skills

Beck (2003) reports on the relationship between musical and phonetic skills, but without focussing specifically on intonation. We report the results of two pilot studies that investigated whether musical skills and musical training affect the ability to analyse intonation in English. Both studies used a similar set of music tasks, designed to match the intonation tasks as closely as possible. The music tasks targeted pitch discrimination and tonal memory at various levels of difficulty. The intonation tasks, based on a single speaker, involved locating the nucleus, identifying the nuclear tone in stimuli of varying length and complexity, and making same/different contour judgements. The subjects were UCL students with basic training in intonational analysis. Both studies showed an overall significant positive correlation between musical training and intonation task scores, and between music test scores and intonation test scores. A more detailed analysis of the relationship between the individual music and intonation tests yielded a more complicated picture. Further investigation is needed to clarify the implications for training in intonational analysis and to establish whether a similar relationship holds for production skills.

John Dawson (University College London): Word-final obstruents and vowel durations in the interlanguage English of native speakers of Modern Greek

The paper presents research on the durations of vowels preceding word-final obstruents in the interlanguage English of native speakers of Modern Greek, and on speakers’ realisations of the voice value of the obstruents following the vowels.

Acoustic analysis shows a statistically significant tendency among Greek learners of English to realise the obstruents as voiceless irrespective of their voice value in English, and to realise the vowels as durationally short irrespective of their duration in English. These data are compared with data from native English speakers producing identical tokens.

In accounting for the interlanguage realisations I look at possible inputs from the L1, L2 and Universal Grammar, and I advance the hypothesis that, in the early interlanguage of Greek learners of English, it is the vowel duration that is the physical marker of the voice contrast, and hence that vowel duration enables one to predict the voice value, not vice versa.

The paper also discusses the degree to which there is in Greek a measurable variation in the duration of a vowel preceding an intervocalic voicing contrast, compared to that which occurs in English.

Gerry Docherty, Paul Foulkes & Ghada Khattab (U. Newcastle, York, Newcastle): Social-indexical variability and speech perception: an experimental study

A number of recent studies (Docherty & Foulkes 2000, Docherty et al 2004, Hay, Jannedy & Mendoza-Denton, Hawkins 2003, Pierrehumbert 2003) have suggested that rich, substance-based lexical representations, as proposed for example by the advocates of episodic models, may provide a plausible framework within which to account for the systematic social-indexical patterns found abundantly within speakers’ performance. One feature of such models is the assumption that, as a result of experience, individuals develop an implicit awareness of the variability associated with a lexical item and of the social meaning associated with that variability. Evidence of this awareness has been provided in a recent study by Warren, Rae & Hay (2003). Nevertheless, in general our understanding of the perceptual evaluation of variable forms by listeners is based more on anecdote than on firm empirical grounds. In the present paper we describe a pilot experiment designed to investigate the extent to which individuals’ performance in a simple word-spotting task reflects their awareness of key aspects of social-indexical variability.

Gerry Docherty, Paul Foulkes & Dom Watt (Universities of Newcastle, York, Aberdeen): The phonetic properties of ‘pre-aspirated’ variants of (p, t, k) in the North-East of England

A striking feature of the recent literature on phonological variation and change in British English is the growing evidence for ‘pre-aspirated’ variants of voiceless plosives (especially (t)) in the urban varieties of the north-eastern quadrant of England. Hitherto absent from accounts of these accents, pre-aspirated variants have been reported in Newcastle (Docherty & Foulkes 1999; Watt & Allen, forthcoming), Middlesbrough (Jones & Llamas 2003) and Hull (Kerswill & Williams 1999). The evidence from these areas suggests that the occurrence of these forms is heavily influenced, among other factors, by speaker age and sex, the forms being most prevalent among young female speakers. However, it is not clear exactly what the phonetic properties of these variants are. Docherty & Foulkes (1999), for example, point to a range of acoustic properties associated with these forms, not all of which are encountered in every token. Furthermore, not all of the properties identified are typically associated with the term ‘pre-aspirated’ as conventionally used in phonetic theory. The issue, then, is whether ‘pre-aspirated’ is an appropriate cover label for a related set of phonetic realisations, or whether the term masks significant phonetic differentiation.

This paper reports the findings of a further acoustic study of the phonetic properties of ‘pre-aspirated’ variants of (t) in a number of varieties of English. The results highlight the diversity of acoustic properties associated with these variants and provide a basis from which to infer the corresponding articulatory properties. The results are discussed in terms of their implications for accounts of phonological variation and change, as well as for phonological accounts of final consonant weakening.

Ellen Douglas-Cowie & Roddy Cowie (Queen’s University Belfast): The description of naturally occurring emotional speech

Most studies of the vocal signs of emotion depend on acted data. This paper reports the development of a vocal coding system to describe the signs of emotion in naturally occurring emotional speech. The system has been driven by empirical observation, not by a priori assumptions based on acted or laboratory data, and was developed using the Belfast Naturalistic Database. It takes a multi-level approach to coding, starting with broad-brush description and moving through progressive layers to finer resolution. The first level uses broad categories that apply to each clip as a whole. Thereafter it uses a tiered approach, starting with an outer tier of relatively coarse descriptors and progressing through successive tiers to more detailed descriptors tied more precisely to locations within a clip. The coding system shows that there are vocal signs of naturally occurring emotion that have not previously been identified in acted data.

Bronwen Evans & Paul Iverson (University College London): Vowel normalization for accent: an investigation of perceptual plasticity in young adults.

Previous research found that listeners are able to change their vowel categorization decisions to adjust to different accents of British English (Evans and Iverson 2004). The patterns of normalization were affected by individual differences in language background (e.g. whether listeners grew up in the north or south of England), and were linked to the changes in production that speakers typically make due to sociolinguistic factors when living in multidialectal environments. This paper presents the results from a longitudinal study which investigated whether listeners who had no previous experience of living in multidialectal environments adapted their speech perception and production when attending university. Participants were tested before beginning university and then again 3 months later. An acoustic analysis of production was carried out and perceptual tests were used to investigate changes in word intelligibility and vowel categorization. Preliminary results confirm that listeners – even at such a late stage in their language development – are able to adjust their phonetic representations, and that these patterns of adjustment are linked to changes in production that speakers typically make when interacting with speakers of a different accent.

Peter French & Philip Harrison (JP French Associates, York): Adapting the Praat speech analysis programme to the purposes of forensic phonetic casework and research

This is a practical demonstration of script developments and modifications to the Praat speech analysis programme. The initial motivation for the work was to facilitate casework and research in forensic phonetics. However, the developments have much wider applications within phonetics teaching and research.

The modified version of the programme allows one to track and log the centre frequency values of vowel formants. When the tracking and logging procedure is complete, the programme automatically calculates the following for f1, f2, f3 and f4 across the tokens of each vowel phoneme analysed:
Average frequency; Maximum and minimum values; Range; Standard deviation

The same information is provided for the delta scores: f2 – f1, f3 – f2 and f3 – f1.

On the basis of their f1 and f2 – f1 delta values, the vowel tokens analysed are also mapped into vowel space on an automatically-generated vowel quadrilateral.
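The per-vowel summary described above can be sketched in Python, assuming formant values have already been tracked and exported (for example from Praat) as one record per token. This is not the authors' Praat script: the record layout, the use of the sample (rather than population) standard deviation, the token values and the function names are all assumptions.

```python
from statistics import fmean, stdev

FORMANTS = ("f1", "f2", "f3", "f4")
# Delta scores named in the abstract: (higher formant, lower formant).
DELTAS = {"f2-f1": ("f2", "f1"), "f3-f2": ("f3", "f2"), "f3-f1": ("f3", "f1")}


def summarise(values):
    """Mean, max, min, range and standard deviation of a list of values in Hz."""
    return {
        "mean": fmean(values),
        "max": max(values),
        "min": min(values),
        "range": max(values) - min(values),
        "sd": stdev(values) if len(values) > 1 else 0.0,  # sample SD: an assumption
    }


def vowel_report(tokens):
    """Summaries for f1-f4 and the three delta scores over one vowel's tokens.

    tokens: list of dicts with keys "f1".."f4" in Hz (hypothetical layout).
    """
    report = {f: summarise([tok[f] for tok in tokens]) for f in FORMANTS}
    for name, (upper, lower) in DELTAS.items():
        report[name] = summarise([tok[upper] - tok[lower] for tok in tokens])
    return report


def quadrilateral_points(tokens):
    """(f2 - f1, f1) per token, for plotting with both axes reversed."""
    return [(tok["f2"] - tok["f1"], tok["f1"]) for tok in tokens]


# Hypothetical token values for one vowel phoneme.
fleece = [
    {"f1": 300, "f2": 2200, "f3": 2900, "f4": 3800},
    {"f1": 340, "f2": 2100, "f3": 2800, "f4": 3700},
]
report = vowel_report(fleece)
print(report["f1"]["mean"], report["f1"]["range"])  # 320.0 40
print(quadrilateral_points(fleece))                 # [(1900, 300), (1760, 340)]
```

Plotting the (f2 – f1, f1) points with both axes reversed follows the usual convention of placing close front vowels at the top left of the quadrilateral.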

The modifications allow one to generate large amounts of elegantly presented useable data with a small number of keystrokes.

Frank Gooding (University of Wales, Bangor): The perceptual prominence of the upper formant region of front vowels

It is well known that the phonetic front–back distinction in vowels correlates with the position of the second and higher formants: front vowels have higher frequencies of F2 in particular, and also of F3, than corresponding back vowels. It has also long been noted that the upper formant region of front vowels is perceptually particularly salient compared with that of back vowels: front vowels are associated with ‘sharp’ and ‘bright’ judgements, while back vowels are associated with the ‘dull’, ‘dark’ end of the scale.

The basis for the perceptual prominence of the upper formant region of front vowels, despite its acoustically low level, is investigated. Subjects matched the loudness of F1 and F2 in two two-formant front vowels. Matched loudness levels of F2 relative to F1 were well below those predicted by current psychoacoustic models, but at a fairly consistent level above the F2 threshold. Having quantified this discrepancy, the paper discusses whether it can be accounted for by peripheral or central mechanisms.

Olga Gordeeva & James Scobbie (Queen Margaret University College, Edinburgh): Non-normative preaspiration of voiceless fricatives in Scottish English: a comparison with Swedish preaspiration.

Helgason, and Gobl and Ní Chasaide, have both observed what Helgason calls “non-normative” (i.e. transitional, non-phonological) preaspiration in Swedish between vowels and following voiceless stops. The preaspiration of voiceless fricatives is far less studied, although in various languages devoicing is accomplished earlier before voiceless fricatives than before voiced ones.

We examined the vowel-consonant transitions in “bus”, “fish” and “goose” in five adult female speakers of Scottish Standard English. Any whispered or [h]-like transition longer than 20 ms was classified as preaspiration.

All speakers have preaspiration. It appears almost categorically in “bus”, in 25% of “fish” (mainly due to one speaker) and unsystematically in “goose”. The mean duration of all preaspirated cases of “bus” (n=71) is quite large, at 60ms, compared to the fully voiced preceding vowel portion (102ms). Preaspiration is substantially more frequent in phrase-final position than initially or medially. Comparable data from spontaneous child and adult speech will also be presented.
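The classification rule and the kind of summary figures reported above can be sketched as follows, assuming the duration of the whispered or [h]-like portion of each transition has already been measured by hand. The abstract says “above” 20 ms, so a transition of exactly 20 ms is treated here as not preaspirated; that boundary choice, the example durations and the function names are assumptions.

```python
from statistics import fmean

THRESHOLD_MS = 20.0  # from the abstract; "above" taken as strictly greater


def is_preaspirated(transition_ms, threshold_ms=THRESHOLD_MS):
    """Classify one whispered/[h]-like V-C transition by its duration."""
    return transition_ms > threshold_ms


def preaspiration_summary(transitions_ms):
    """Rate of preaspiration and mean duration of the preaspirated cases."""
    hits = [d for d in transitions_ms if is_preaspirated(d)]
    return {
        "n": len(hits),
        "rate": len(hits) / len(transitions_ms),
        "mean_ms": fmean(hits) if hits else None,
    }


# Hypothetical transition durations (ms) for five tokens of one word.
print(preaspiration_summary([0.0, 15.0, 25.0, 60.0, 95.0]))
# {'n': 3, 'rate': 0.6, 'mean_ms': 60.0}
```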

We will consider the possible conditioning factors of this non-normative preaspiration, such as the tense/lax distinction, vowel height, phrasal position and consonant type. The dialectal nature of this phenomenon will also be addressed with reference to other recent studies of preaspiration in the UK.

Esther Grabe, Greg Kochanski & John Coleman (University of Oxford): Quantitative modelling of intonational variation

The shape of intonation is affected by a large number of factors. Particularly influential are language, dialect, speaking style, utterance type, gender and speaker. We investigated the effect of dialect and utterance type on one acoustic correlate of intonation, fundamental frequency. Our speech data were taken from the IViE corpus, an existing corpus of recordings from seven dialects of English spoken in the British Isles (Grabe, Post and Nolan 2001). Six speakers of each dialect read a list of declaratives, wh-questions, yes/no (polar) questions and declarative questions. From these data, fundamental frequency (f0) values were extracted and modelled mathematically. For each utterance type in each dialect, we generated orthogonal polynomial models of f0.

Two findings emerged. Firstly, dialect and utterance type affected the shape of f0. In the four utterance types, speakers typically produced different contours. Distinctions between statements and questions were made throughout the utterance (i.e. not only in intonation phrase final position, as traditionally described), and involved a variety of patterns. In some dialects, the four utterance types were more clearly distinguished than in others.

Secondly, we found some common behaviours across all dialects: mean f0 in questions was higher than in statements. The slope of the utterance distinguished declaratives from declarative questions.
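The abstract does not specify which orthogonal polynomials were used. As one illustration, assumed here rather than taken from the paper, Legendre polynomials can be fitted to an f0 contour after rescaling time to the interval [-1, 1], using NumPy's `numpy.polynomial.legendre.legfit`. With evenly spaced samples, the zeroth coefficient approximates the mean f0 and the first coefficient the overall linear slope, which correspond to the register and slope differences described above. The function name and the synthetic contour are hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre


def fit_f0_contour(times_s, f0_hz, degree=3):
    """Least-squares Legendre fit to one utterance's f0 track.

    Assumes times_s is sorted, unvoiced frames are already removed and
    f0 is in Hz (a semitone or log scale would be an equally valid choice).
    """
    t = np.asarray(times_s, dtype=float)
    y = np.asarray(f0_hz, dtype=float)
    x = 2.0 * (t - t[0]) / (t[-1] - t[0]) - 1.0  # rescale utterance to [-1, 1]
    return legendre.legfit(x, y, degree)         # coefficients c0..c_degree


# Synthetic rising contour: 200 Hz mean level plus a linear rise of +/-20 Hz.
times = np.linspace(0.0, 1.5, 60)
f0 = 200.0 + 20.0 * (2.0 * times / 1.5 - 1.0)
coeffs = fit_f0_contour(times, f0)
print(np.round(coeffs, 3))  # approximately [200, 20, 0, 0]
```

Comparing c0 and c1 across utterance types, with the higher coefficients capturing curvature, is one way to quantify register and slope separately from the detailed contour shape.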

Our findings challenge autosegmental-metrical models of intonational phonology. In these models, the phonological structure of intonation is equivalent to a sequence of f0 targets; register and overall slope play no part. Our data suggest that some contour differences between English dialects may not be phonological but allophonic, comparable to the difference between rhotic and non-rhotic dialects. The contribution of register and slope, by contrast, appears to be ‘phonological’: these parameters provide consistent cues to the distinction between questions and statements.

Nina Grønnum (University of Copenhagen): Why is Danish so hard to understand?

What is so special about Danish pronunciation? Every language is special, and most languages have characteristics that make them recognizable once you have made their acquaintance. So the question really is: is Danish particularly special? Yes, it is. But the answer makes sense only against the background of another language with which it is meaningful, for independent reasons, to compare Danish. That is what I shall do: you will learn something about Danish from a Swedish perspective. In this way I also lend credence to a common experience: Swedes have a harder time understanding Danish than vice versa, so comprehension is asymmetric.

Barry Heselwood (University of Leeds): Instrumental evidence for the auditory quality of the Arabic ‘ayn

The particular phonetic properties of the Arabic ‘ayn were remarked on by the Arab grammarians of the eighth century, and in more recent times have been investigated instrumentally by several researchers. Stop, fricative and approximant variants have all been reported, the latter in particular often accompanied by creak or creaky voice. But notably, many observers have also commented on its auditory quality using such impressionistic terms as ‘strong’, ‘rough’, and ‘jarring’.

This paper briefly presents cross-dialectal acoustic and laryngographic evidence that ‘ayn can be realised by almost any possible combination of manner of pharyngeal articulation and glottal state. It then attempts to explain the auditory quality of a certain commonly-encountered class of approximant realisations in terms of the temporal theory of pitch perception. These realisations exhibit, in addition to a lowering of pitch, an extreme reduction in the amplitude of the lowest harmonics. The ‘strong’ auditory quality is attributed at least in part to the beating of the remaining unaffected harmonics within the same auditory critical band. Possible physiological causes of this pattern of harmonic damping will be addressed.
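To make the critical-band argument concrete: the abstract does not say which critical-band estimate underlies the account, so the sketch below assumes the equivalent rectangular bandwidth (ERB) formula of Glasberg and Moore (1990), ERB = 24.7(4.37f/1000 + 1) Hz. It lists the harmonics of a given f0 that fall within one ERB around a centre frequency; harmonics sharing a band are unresolved, and adjacent ones beat at a rate equal to their spacing, f0. Treating the band as centre ± ERB/2 and stopping at 5 kHz are simplifications, and the function names are hypothetical.

```python
def erb_hz(freq_hz):
    """Equivalent rectangular bandwidth at freq_hz (Glasberg & Moore 1990)."""
    return 24.7 * (4.37 * freq_hz / 1000.0 + 1.0)


def harmonics_in_band(f0_hz, centre_hz, max_freq_hz=5000.0):
    """Harmonics of f0 lying within +/- ERB/2 of centre_hz (simplified band)."""
    half_width = erb_hz(centre_hz) / 2.0
    harmonics = []
    n = 1
    while n * f0_hz <= max_freq_hz:
        h = n * f0_hz
        if abs(h - centre_hz) <= half_width:
            harmonics.append(h)
        n += 1
    return harmonics


print(round(erb_hz(1000.0), 3))         # 132.639
print(harmonics_in_band(100.0, 500.0))   # [500.0]
print(harmonics_in_band(100.0, 2000.0))  # [1900.0, 2000.0, 2100.0]
```

With f0 = 100 Hz, a band centred at 500 Hz holds a single harmonic, whereas a band centred at 2 kHz holds three, so when the lowest, resolved harmonics are damped, more of the remaining energy would fall in bands where harmonics interact and beat.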

Jan Hognestad (Agder University College): Tone on the South-Western coast: some observations from a research project on Norwegian tonal accent

It is well known that any Norwegian word of more than one syllable is pronounced with one of two possible tonal melodies, commonly referred to as accent 1 and accent 2. In structuralist terms, therefore, tone can be lexically contrastive. Less attention has been paid to the considerable variation in the realization of the two accentual melodies across dialects, particularly in Western Norway. It is thus likely that, for native speakers, tone constitutes a primary cue to dialect identification, both regionally and locally.

My Ph.D. project belongs to ‘A Typology of Norwegian Tonal Accents’, an ongoing research project supported by the Norwegian Research Council. Its aim is to describe the range of tonal variation across Norwegian dialects and to establish whether we are dealing with differing realizations of a uniform underlying system, or whether the phonetic variation reflects deeper, structural differences among the varieties involved.

In my talk, I will briefly present the project, give a few examples to illustrate different accent realizations in present-day varieties and then focus on the Stavanger dialect, where it is possible, thanks to earlier investigations dating as far back as the 1920s, to study the Norwegian accent system diachronically. In the course of the 20th century, a change in accent 1 realization seems to have taken place in this dialect, which may be of interest for the study of Norwegian prosodic variation generally.

Sara Howard (University of Sheffield): Between-word junctures in children with impaired speech production

The phonetic and phonological events which occur at word boundaries in normal adult speech production are well-documented (see, for example, Brown 1991 for a summary), but much less is known about the development of these connected speech processes in either normal or atypical speech development. A small number of perceptual studies address this issue in normally developing children (e.g. Newton & Wells, 1998, 2002; Stemberger, 1988; Wells, 1994), but our knowledge of connected speech processes in children with atypical speech development is very limited. However, the realizations which children with speech problems use to negotiate between-word junctures can shed light not only on the nature of developmental speech impairments, but also on phonetic and phonological issues about word junctures in normal speech production.

This study uses perceptual and electropalatographic (EPG) analysis to explore between-word junctures in the speech of five children with impaired speech production. It uses speech data taken from picture description tasks to investigate a range of the connected speech processes which might be expected to occur in normal speech production (e.g. assimilation, elision, coalescence, and liaison). A number of unusual behaviours are identified in the data, and there is significant inter- and intra-speaker variation.

Mark Huckvale (University College London): Accent characterisation and recognition using self normalisation

Previous approaches to accent recognition match the speech signal to one or more acoustic reference models. For example Barry, Hoequist & Nolan (1989) align a known utterance to a reference template to extract acoustic parameters for accent classification, while Arslan & Hansen (1996) use multiple accent-specific phone recognisers. The problem with such approaches is that acoustic distances contain information about differences in speaker characteristics other than just accent – for example differences in vocal tract length, voice pitch, voice quality, speaking style and speaking rate.

Recently, Nobuaki Minematsu (University of Tokyo) has developed a new means of accent characterisation using not the form of sound elements but their mutual similarity. Speech from a single person is analysed into a phonetic tree based on the acoustic distances between mean productions of sound elements for that speaker.

This paper will report on a study of how this kind of self-normalising analysis can be applied to regional British English accents taken from the ABI corpus. Phonetically balanced sentences recorded from 10 male and 10 female speakers of the Birmingham, East Anglia, Glasgow, Liverpool, Newcastle, Republic of Ireland and Ulster varieties were automatically labelled so that vowel distance matrices could be extracted. The resulting data can be plotted as both maps and trees and used for both the characterisation and the recognition of accents.
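A simplified sketch of the self-normalising idea: each speaker is represented only by the distances between their own mean vowel productions, and speakers are compared through those distance matrices rather than through the raw acoustics. Euclidean distance between mean (F1, F2) values in log Hz is used here purely for illustration; the features and distance measure used in Minematsu's method and in this study may well differ. With log-frequency input, scaling all of a speaker's formants by one factor (a crude stand-in for vocal tract length) shifts every point equally, so the distance matrix is unchanged. The formant values and function names are hypothetical.

```python
import math


def distance_matrix(vowel_means):
    """Pairwise distances between one speaker's mean vowels.

    vowel_means: {vowel: (log F1, log F2)} -> {(v1, v2): Euclidean distance}.
    """
    vowels = sorted(vowel_means)
    return {
        (a, b): math.dist(vowel_means[a], vowel_means[b])
        for i, a in enumerate(vowels)
        for b in vowels[i + 1:]
    }


def structure_distance(means_x, means_y):
    """Compare two speakers via their own distance matrices, not raw formants."""
    dx, dy = distance_matrix(means_x), distance_matrix(means_y)
    shared = dx.keys() & dy.keys()  # vowel pairs measured for both speakers
    return math.sqrt(sum((dx[k] - dy[k]) ** 2 for k in shared))


def to_log(means_hz, scale=1.0):
    """Log-frequency means; `scale` multiplies every formant (vocal tract proxy)."""
    return {v: (math.log(scale * f1), math.log(scale * f2))
            for v, (f1, f2) in means_hz.items()}


# Hypothetical mean formants (Hz); speaker B is speaker A with all formants
# raised by 15%, as a crude model of a shorter vocal tract.
means_hz = {"i": (300.0, 2300.0), "a": (750.0, 1300.0), "u": (320.0, 900.0)}
speaker_a = to_log(means_hz)
speaker_b = to_log(means_hz, scale=1.15)
print(round(structure_distance(speaker_a, speaker_b), 9))  # 0.0
```

A speaker whose vowel system differs in structure (for example, with a fronted /u/) would produce a non-zero distance even after such scaling, which is the property an accent classifier would exploit.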

Mark Jones & Carmen Llamas (Universities of Cambridge, Aberdeen): Dialect-specific assibilation patterns: Dublin and Middlesbrough compared

Cross-linguistic and historical data indicate that a plosive in one variety may be related to a fricative in another, e.g. High German /ts/ or /s/ = English or Dutch /t/. Cases of assibilation of /t/ to [s] occur in a wide range of languages such as Ancient Greek, Turkana, Korean and Finnish. An acoustic study of /t/ assibilation in Dublin English and Middlesbrough English is reported here which compares the assibilated /t/ with the fricatives /s/ and /∫/. The assibilated /t/ results in two different acoustic patterns: in Dublin it resembles /∫/, in Middlesbrough it broadly resembles /s/. These results are interpreted as indicative of a phonetic process in Middlesbrough, but of a stable phonological pattern in Dublin, probably rooted in historical contact with Irish.

Miho Kamata (University of Leeds): English and Japanese voiceless alveolar/dental stops and their affricated realisations

The aim of this study is to examine the differences and similarities between English and Japanese voiceless alveolar/dental stops and their affricated realisations, focusing on their acoustic properties: stop-gap, noise and burst durations, and VOT. The interest in this topic stems from (1) the intuitive impression that an aspirated voiceless alveolar stop [tʰ] in English sounds more like a Japanese voiceless alveolar affricate [ts] than a Japanese dental stop, and (2) the knowledge that the voiceless alveolar stop /t/ is sometimes given an affricated realisation [ts] in certain contexts and accents of English. Two native speakers of English from London and two native speakers of Japanese read out words containing the target sounds in carrier phrases in their native language. Digital recordings were then used to measure the durations of closure, burst and friction. The results indicate that English aspirated stops [tʰ], as well as affricated stops [ts] and /t/+/s/ sequences, are more similar in duration and VOT to the Japanese alveolar affricate [ts] than to the Japanese dental stop [t̪], which is more similar to English [t] following [s].

Elinor Keane & Ron Asher (Universities of Oxford, Edinburgh): Diphthongs and diglossia

Tamil is a diglossic language, and impressionistic descriptions note that in some environments a monophthong ([e] or [a]) is the informal realization of /ai/, which is invariably produced as a diphthong in formal speech. Instances of /ai/ in both formal and informal varieties were therefore compared to establish its internal structure in various contexts. Three native speakers produced three repetitions each of 28 words containing /ai/ embedded in simple carrier phrases. Informal tokens were elicited by asking subjects to respond to pre-recorded questions on the basis of visual stimuli, while formal sentences were presented orthographically. Analysis of duration and formant frequencies confirmed the occurrence of monophthongs in informal speech in the final syllable of polysyllabic words, but only in a single speaker’s data. This illustrates the difficulty of eliciting authentically colloquial utterances of Tamil, even with a methodology that specifically precludes orthographic influence. Statistical analysis of the diphthongs revealed significant effects of word length and of the lexical or inflectional status of the /ai/. Nevertheless, the overall configuration (a glide followed by an offset steady state), and specifically the rate of change of F2, remained stable across conditions.
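The rate-of-change measure mentioned above can be illustrated as follows, assuming an F2 track has already been extracted at known frame times. The peak frame-to-frame rate gives a rough index of how sharp the glide is; the steady-state criterion (the absolute rate staying below a threshold for the rest of the token) is a hypothetical operationalisation, not the authors' definition, and the threshold, example values and function names are assumptions.

```python
def f2_rates(times_s, f2_hz):
    """Frame-to-frame rate of change of F2, in Hz per second."""
    return [
        (f2_hz[i + 1] - f2_hz[i]) / (times_s[i + 1] - times_s[i])
        for i in range(len(f2_hz) - 1)
    ]


def peak_f2_rate(times_s, f2_hz):
    """Largest-magnitude F2 rate: a rough index of glide sharpness."""
    return max(f2_rates(times_s, f2_hz), key=abs)


def steady_state_onset(times_s, f2_hz, max_rate=1000.0):
    """First frame time after which |F2 rate| stays below max_rate.

    Hypothetical criterion; returns None if no steady state is found.
    """
    rates = f2_rates(times_s, f2_hz)
    for i in range(len(rates)):
        if all(abs(r) < max_rate for r in rates[i:]):
            return times_s[i]
    return None


# Hypothetical /ai/ token: F2 rises through the glide, then levels off.
times = [0.00, 0.01, 0.02, 0.03, 0.04]
f2 = [1200.0, 1500.0, 1800.0, 1850.0, 1860.0]
print(peak_f2_rate(times, f2))                         # about 30000 Hz/s
print(steady_state_onset(times, f2, max_rate=6000.0))  # 0.02
```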

Ghada Khattab (University of Newcastle upon Tyne): Accent features in English-Arabic bilingual children and adults

This study presents details from a phonetic analysis of vowel production by English-Arabic bilingual children aged between five and ten and living in Yorkshire. A total of 23 subjects (including bilinguals and monolinguals) were recorded in picture-naming, story-telling, and reading activities. The phonological variables being studied include the vowels in the BATH, STRUT, FACE, PALM, START, and GOAT lexical sets (Wells, 1982). These were chosen because of their context- and dialect-specific phonetic realisations.

On the whole, the bilingual children’s choice of phonetic realisations for each of the target vowels is similar to that of the monolingual controls, and shows signs that the bilinguals have acquired phonological patterns appropriate to their community. The bilinguals’ wider linguistic repertoire was also reflected in a number of vowel realisations found only in their productions. Some of these realisations may be due to the influence of the parents’ foreign accent, which challenges Chambers’ (2003) suggestion that bilinguals have an “accent filter” that enables them to filter out foreign-accent features in the input they receive.

Mikhail Kissine, Hans Van de Velde & Roeland van Hout (Universities of Cambridge, Utrecht, Nijmegen): Variationist contributions to phonetics: a case study of the /v/-/f/ contrast in Dutch

This paper on the devoicing of /v/ in Dutch demonstrates how quantitative sociolinguistics can help to address important theoretical questions in phonetics. In standard Dutch, devoicing of /v/ is observed all over the Dutch language area, but it varies in its strength and in its acoustic correlates (Kissine, Van de Velde & Van Hout 2003). Data are taken from a reading experiment that was part of a large sociolinguistic study. The 160 subjects are stratified for community (The Netherlands vs. Flanders), region, age and sex. For each subject, two occurrences of word-initial /v/ and /f/ are analysed. The following acoustic measurements were made in Praat: f0 extent in the fricative, fricative duration, noise intensity of the fricative, duration of the following vowel, f0 frequency in the vowel, and f0 slope. The statistical analyses show that the /v/-/f/ distinction cannot be described in terms of “open/closed glottis” gestures, nor in terms of vocal fold tension. A more detailed acoustic examination of some prototypical speakers confirms that the acoustic properties regulating the /v/-/f/ distinction cannot be byproducts of the state of the vocal folds during the supra-glottal constriction, but are produced by controlled articulatory gestures aimed at enhancing the auditory contrast.

Rachael-Anne Knight (Roehampton University of Surrey and UCL): Nuclear accent shape and perceived prominence

It is well known that prominence judgments are affected by the fundamental frequency of the intonation contour associated with individual syllables. Syllables associated with accent peaks of higher frequency sound higher in pitch and are judged to be more prominent than those with lower-frequency accents. One issue that has received little attention is how the shape of the fundamental frequency contour affects judgments of prominence. Since Knight (2003) demonstrated that a flat plateau in the contour makes an accent sound higher in pitch than a sharp peak of the same frequency, the present study investigates whether this effect extends to listeners’ judgments of within-utterance syllable prominence.

Results from a perception study show that plateau stimuli sound both higher and more prominent than peak stimuli at every frequency, indicating that, in addition to frequency, the shape of the accent influences the perception of prominence. In natural speech, plateaux have consistently been observed in nuclear position, and also in prenuclear position when a new topic is introduced. The present results suggest that plateaux may act as a substitute variable for peak height, allowing speakers to increase the perceived height and prominence of accents by sustaining a particular fundamental frequency rather than, or in addition to, producing a higher fundamental frequency.

Greg Kochanski, Chilin Shih & Tan Lee (Universities of Oxford, Illinois; CUHK): Connecting acoustics to linguistics in Chinese intonation

We report a new model of intonation applied to Mandarin and Cantonese speech. The model is based on the neurophysiological literature on muscle control strategies for voluntary motions. It gives a quantitative prediction of the pitch contour in terms of discrete tone categories (5 for Mandarin; 6 or 9 for Cantonese), the ideal shape of each category, and a prosodic strength for each syllable that expresses how carefully its tone is realized.

Our model minimises the sum of a motion cost and an error cost that measures the deviation from a set of ideal pitch patterns. The model works accurately even with rapid speech, where individual syllable gestures overlap enough to convert nominal falling tones into rising tones.
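The abstract describes the cost function only in outline. A minimal sketch of the general idea, assuming a sampled pitch contour, a quadratic motion cost (squared second differences, i.e. curvature) and an error cost weighted by the square of each sample's prosodic strength: this is a simplified stand-in, not the authors' implementation, and the function and parameter names are hypothetical. Because both costs are quadratic, the minimum is the solution of a single linear system.

```python
import numpy as np

def realise_pitch(targets_hz, strengths, smoothness=1.0):
    """Pitch contour minimising motion cost + strength-weighted error cost.

    targets_hz: ideal pitch at each sample (from hypothetical tone templates)
    strengths:  prosodic strength of the syllable each sample belongs to (> 0)
    smoothness: weight on the motion (effort) cost
    """
    y = np.asarray(targets_hz, dtype=float)
    s = np.asarray(strengths, dtype=float)
    n = y.size
    # Second-difference operator: (D @ p)[k] = p[k] - 2*p[k+1] + p[k+2]
    D = np.diff(np.eye(n), n=2, axis=0)
    W = np.diag(s ** 2)
    # Setting the gradient of smoothness*|D p|^2 + (p - y)^T W (p - y) to zero
    # gives (smoothness * D^T D + W) p = W y.
    A = smoothness * D.T @ D + W
    return np.linalg.solve(A, W @ y)

# Invented example: a step between two tone targets.
targets = [220, 220, 220, 160, 160, 160]
careful = realise_pitch(targets, [10] * 6)   # stays close to the targets
sloppy = realise_pitch(targets, [0.3] * 6)   # pulled towards smooth motion
```

With high strength the error cost dominates and the contour tracks the ideal shapes; with low strength the motion cost dominates and the contour is smoothed away from them, which illustrates how weak syllables can lose their nominal tone shape.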

We trained this model on a corpus of Mandarin and another of Cantonese speech. In the training process, a computer program adjusted the ideal shape of each tone category, the prosodic strength of each word, and the metrical patterns to match the corpora. The model was able to reproduce the data accurately, with an RMS error of less than 15 Hz. More importantly, the resulting tone shapes, strengths and metrical patterns can be associated with abstract linguistic structures.

We found, in both languages, that sentences, clauses, phrases and words are all marked by a higher prosodic strength (more careful/broader intonation) at their beginnings, and a lower prosodic strength (less close to ideal/narrower intonation) at their ends. Nouns tend to be more carefully articulated (in the sense of intonation) than most other classes of words, especially particles, and longer words tend to be more carefully articulated than single-syllable words. There is some reason to believe that the prosodic strength may reflect the predictability of the next syllable, following the general rule that one doesn’t need to articulate carefully if one is saying something expected and predictable. The results show that some syllables are much more important than others and are executed much more carefully, and that the care of execution is controlled by larger-scale linguistic structures.

Ee Ling Low, David Deterding & Fiona Ong (Nanyang Technological University, Singapore): Rhythm indexes: a comparative study of their reliability

Research on speech rhythm in the last 5 years has been largely concerned with the search for acoustic correlates of rhythmic classification. Researchers have sought to find out whether there is experimental evidence to support the traditional rhythmic classification of languages as stress- or syllable-based (Low 1998, Ramus, Nespor & Mehler 1999, Low, Grabe & Nolan 2000, Deterding 2001, Grabe & Low 2002, Cummins 2002, Ramus 2002). These studies have been able to provide some acoustic justification for the classification of languages into traditional rhythmic categories, which had hitherto been considered a largely perceptual phenomenon.

This paper focuses on the rhythm indexes developed by Ramus et al. (1999), Low et al. (2000) and Deterding (2001). Ramus et al. showed that the proportion of utterance duration taken up by vocalic intervals (%V) and the standard deviation of consonantal intervals (ΔC) provide good clues to rhythmic classification. Low et al. used the pairwise variability index (PVI) to measure the variability in duration between successive vowels in the read speech of Singaporean English (SE) and British English (BE) speakers, and found that successive vowels in what has often been labeled a syllable-timed language like SE exhibited less variability in duration than those in a so-called stress-timed language like BE. Deterding (2001) provided empirical support for lower variability in successive syllable durations in conversational speech for SE compared to BE.
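The interval-based measures above can be made concrete. A minimal sketch, assuming vocalic and consonantal interval durations (in ms) have already been segmented, computes %V, ΔC and the normalised PVI (nPVI) over successive vowel durations, the vowel-based form of the PVI associated with Low et al. (2000). Deterding's (2001) syllable-based measure is not reproduced here. The function names are hypothetical, and the use of the population (rather than sample) standard deviation for ΔC is an assumption.

```python
from statistics import pstdev

def percent_v(vocalic_ms, consonantal_ms):
    """%V: percentage of total utterance duration that is vocalic."""
    v, c = sum(vocalic_ms), sum(consonantal_ms)
    return 100.0 * v / (v + c)

def delta_c(consonantal_ms):
    """ΔC: standard deviation of consonantal interval durations (population SD)."""
    return pstdev(consonantal_ms)

def npvi(durations_ms):
    """Normalised PVI: mean normalised difference between successive intervals, x100."""
    if len(durations_ms) < 2:
        raise ValueError("need at least two intervals")
    pairs = zip(durations_ms, durations_ms[1:])
    # Each pairwise difference is divided by the mean of the pair,
    # which factors out overall speaking rate.
    terms = [abs(a - b) / ((a + b) / 2) for a, b in pairs]
    return 100.0 * sum(terms) / len(terms)

# Invented vowel durations alternating long/short vs. all equal.
print(round(npvi([100, 50, 100, 50]), 1))  # → 66.7
print(npvi([80, 80, 80, 80]))              # → 0.0
```

A lower nPVI indicates less durational contrast between neighbouring vowels, which is the direction reported for SE relative to BE.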

In this paper, we apply the three rhythm indexes to the conversational speech data of British and Singapore English speakers obtained from the National Institute of Education Corpus of Spoken Singapore English (NIECSSE). We aim, first, to investigate the degree of variation in the results obtained by each index when the same data are measured by two different researchers and, secondly, to study the correlation between the results generated by the different indexes and the rhythmic perception of the utterances by a group of listeners.

John Local & Gareth Walker (University of York): Prosody, focus and repair in talk-in-interaction

Most studies of the vocal signs of emotion depend on acted data. This paper reports the development of a vocal coding system to describe the vocal signs of naturally occurring emotion. The system has been driven by empirical observation, not by a priori assumptions based on acted or laboratory data. The data used to develop it come from the Belfast Naturalistic Database. The system takes a multi-level approach to coding, starting broad brush and moving through progressive layers to finer resolution. The first level uses broad categories that apply to each clip as a whole. Thereafter it uses a tiered approach, starting with an outer tier of relatively coarse descriptors and progressing through successive tiers to more detailed descriptors associated more precisely with locations in a clip. The coding system shows that there are vocal signs of naturally occurring emotion that have not previously been identified in acted data.

Kirsty McDougall (University of Cambridge): Individual variation in vowel-to-vowel coarticulation in British English

In vowel-to-vowel coarticulation, one vowel influences another vowel across any intervening consonants (Öhman, 1966). This phenomenon has been observed to extend beyond adjacent syllables, for example, in studies of English by Magen (1997) and Huffman (1985). In those studies, considerable variation in the coarticulatory behaviour of individual speakers was observed, although the numbers of subjects investigated were small. The present study examines vowel-to-vowel coarticulatory effects produced by a group of 10 Standard Southern British English speakers. The subjects were recorded reading sentences containing nonsense sequences of the form /bV1bV2bV3b/, where V1 and V3 are the four “corner” vowels in all combinations, V2 is schwa, and either V1 or V3 receives nuclear stress. F1, F2 and F3 frequencies were measured at the midpoints of V1, schwa and V3. This study will analyse differences among the speakers in the coarticulatory effects of the full vowels across medial schwa, focussing on the relationship between the directionality of vowel-to-vowel coarticulation and level of stress.

Inneke Mennen & Margit Aufterbeck (QMUC Edinburgh, University of Cambridge): Pitch-accent realisation in Southern British and Scottish English

Although recent research has highlighted differences in the realisation of pitch accents across different varieties of British English (e.g. Grabe et al., 2000), Scottish English is relatively under-investigated. This paper systematically compares the realisation of pitch accents in Southern British English and Edinburgh Scottish English. There is some suggestion that Southern British English and Scottish English may differ in the alignment (i.e. timing) and scaling (i.e. height) of pitch accents (Ladd & Schepman, 2003, for alignment; Patterson, 2000, for scaling). Our data will establish whether this is the case by providing results for the alignment and scaling of certain target points (initial low, first accent peak, post-accentual valley, second accent peak, and utterance-final low). The materials consisted of short question-answer pairs designed to elicit narrow and broad focus in short statements, of the type:

Who did Mona see? Ramona.
What happened? Mona saw Ramona.

Measurements were taken from the readings of these sets of materials by twenty-four male speakers (twelve Southern British English and twelve Edinburgh Scottish English speakers). Preliminary results support our hypothesis of both alignment and scaling differences between the two varieties.

Peggy Mok (University of Cambridge): Vowel-to-vowel coarticulation in Cantonese and Mandarin

This experiment tested the hypothesis that the density of phonemic vowels in a language influences the degree of vowel-to-vowel coarticulation it exhibits. Cantonese has more phonemic vowels than Mandarin, but fewer phonetic vowels. Cantonese should therefore show less vowel-to-vowel coarticulation than Mandarin if the number of phonemes is critical, but more if the number of allophones is critical. Eight native speakers of Cantonese and eight of Mandarin read nonsense trisyllables with symmetrical contexts (e.g. /pipapi/) in carrier phrases. F1 and F2 frequencies were measured at the onset, midpoint and offset of the middle target vowel. Preliminary results show that vowel space density is not a determining factor of vowel-to-vowel coarticulation in Cantonese and Mandarin. Speaking style (and rate) also does not seem to strongly influence the degree of vowel-to-vowel coarticulation. Different vowels exhibited different degrees of vowel-to-vowel coarticulation, with /a/ being the most susceptible. There were also coarticulatory differences between male and female speakers in the F1 and F2 dimensions. Unexpectedly, target vowels appear to be more centralized in homogeneous contexts (e.g. /papapa/) than in different contexts (e.g. /pipapi/). Effects of stress and direction are being investigated. These results show that factors besides physiological constraints and the distribution of vowel phonemes influence vowel-to-vowel coarticulation.

Richard Mullooly (Queen Margaret University College, Edinburgh): An EMA analysis of [r]-insertion in English

The intervocalic insertion of alternating [r] in non-rhotic English dialects, following the deletion of the consonant in codas, is perhaps a unique diachronic change: there is no phonetic explanation for it. Alternating [r] could be an underlying phoneme, as Gick (1999) assumes, but there is no instrumental evidence for this from non-rhotic dialects.

Electromagnetic articulograph (EMA) data from three English speakers were analysed. Stimuli were designed to elicit articulatory evidence of an underlying geminate /rr/ by comparing phrases containing a double orthographic <r> (tuner rant) with phrases containing only a word-initial <r> (tune a rant). Eighteen repetitions of each phrase were recorded for all speakers. The durations of labial, apical and dorsal gestures were measured, on the prediction that if there is a geminate, gestural duration should be longer in tuner rant. No difference was found.

There is an alternative explanation for the insertion of [r]. Motor sequences become stereotyped through repetition, due partly to a reduction in neurones’ threshold to synaptic impulses (Hardcastle 1976: 8-9). The motor sequence [r] is necessary in words with intervocalic /r/, e.g. story, staring. Alternating [r] could be a stereotyped motor sequence with no underlying /r/, its insertion being treated postlexically. The proprioceptive system may be what causes [r], rather than a different consonant such as [l], to be inserted. Acoustic data from vowels preceding laterals and rhotics suggest that speakers have different vocal tract configurations in the two contexts (Local 2000).

Elinor Payne (University of Cambridge): Connected speech processes in Italian: a study in phonetic motifs

This paper introduces the concept of phonetic ‘motifs’ as an approach to formalising language-specific variability in phonetic interpretation. In particular, I present an analysis of diversity in connected speech strategies for regional variants of Italian, looking at the degree and direction of place assimilation in word-boundary consonant clusters. Evidence for both gradient and categorical assimilation is found, raising the question of whether assimilation occurs at the phonetic or the cognitive level. There is also evidence of diverse prosodic influences on assimilation. I shall also consider the role of phonetic motifs in structural innovation and the implications of this for mechanisms of sound change.

Richard Ogden, Auli Hakulinen & Liisa Tainio (U. York, U. Helsinki, U. Helsinki): Indexing ‘no news’ with stylization in Finnish

In this paper, we consider the phonetic, linguistic, and interactional properties of one stylized figure in Finnish. Our data come mostly from the collection of everyday and institutional interaction held in the Department of Finnish at the University of Helsinki; other data come from the collection of institutional interaction held at The Research Centre for the Languages of Finland, and from material recorded from the radio. The data consist of both phone calls and face-to-face interaction. There are 130 instances of the stylized figure in our collection, of which 69 come from everyday conversation. While the stylized figure is phonetically salient to Finnish speakers (indeed, we delimited our data primarily by listening for cases of it), we show that it also involves a constellation of other linguistic features. These features and the sequential positioning of the stylized figure work together to downplay the importance of what is said, to mark it as not worthy of comment, or as something obvious and already known: in other words, to index ‘no news’. This paper contributes to a more empirical grounding of the linguistic properties, and in particular of the ‘meaning’, of stylization.

Sue Peppe, Fiona Gibbon & Joanne McCann (Queen Margaret Univ. College, Edinburgh): How do you pitch-accent in uptalk?

Little research has been done on the perception of pitch accents in rising contours of English. Using data collected in the course of administering a prosody assessment procedure, we show that it is apparently harder to identify the location of pitch accent in rising contours than in falling ones, and that when an utterance with a rising contour is spliced into alternative contexts, the pitch accent is likely to be assigned to a different place within the utterance, according to the expectations that the context creates in the listener. Furthermore, using data from judgment reliability testing, we show that listeners tend to be certain of where pitch accent occurs in what they hear, and are reluctant to opt for a judgment of ‘ambiguous’. We explore possible reasons why judgments are more likely to concur in falling than in rising contours, and suggest some implications for communication of errors in pitch-accent assignment in varieties of English where rising contours/high terminals are the norm.

Ninik Poedjianto (University of Glasgow): The perception of Indonesian English stops: a bidirectional investigation

This paper presents results of two perceptual tasks on word-initial English stops, conducted with 30 Indonesian learners of English at a private English language school in Surabaya, East Java, Indonesia (3 proficiency levels, with 5 female and 5 male learners at each level), and with 6 native Southern British English (SBE) speakers (3 female and 3 male). The study aims to investigate: (1) how Indonesian learners identify English stops produced by a native SBE speaker and by an Indonesian speaker of English, (2) how native SBE speakers identify English stops produced by a native SBE speaker and by Indonesian learners of English, and (3) how native SBE listeners judge the goodness of fit of Indonesian English stops relative to their SBE stop categories. To this end, the study focuses on /p/ and /b/ in 2 English target words (‘pet’-‘bet’) presented to the Indonesian listeners, and 6 English target words (‘path’-‘bath’, ‘push’-‘bush’, ‘pass’, and ‘bus’) presented to the SBE listeners. Cross-reference is made to the perception of Indonesian /p-b/ and to the production of Indonesian English /p-b/ to help account for the findings for the Indonesian and SBE listeners respectively.

My Segerup (University of Lund): Word accents in Gothenburg Swedish

One salient characteristic which differentiates Swedish dialects is their prosody, and in particular the realization of the two contrastive word accents. This paper investigates the word accents in Gothenburg Swedish. Disyllabic word pairs contrasted by accent and exemplifying phonologically long and short stressed vowels were placed in focal, sentence-final position. The materials were read by seven native speakers of Gothenburg Swedish aged between sixty and eighty. In order to test which features of the F0 contour are essential for characterizing the word accents, the materials were recorded at two different speaking levels. To achieve this, the subjects were seated either close to an interlocutor (with whom they also did an interactive task in the same recording session), or further away, in which case they were instructed to speak up. This method was successful in eliciting two loudness levels. Preliminary results show that both word classes are characterized in focal position by a phrase accent which involves a high target at the end of the second vowel. The main distinction between the accents is found in the first syllable. In accent I words the pitch falls from the start of the first vowel, whereas in accent II words the fall is delayed until partway through the vowel. Additionally, accent II words have a higher pitch on the first syllable. This higher pitch may cause the phrase accent to be higher in accent II words as well, giving them a higher pitch overall. Results concerning the stability of pitch alignment across the two loudness levels will also be discussed, as will durational phenomena correlated with the accent distinction.

Tomasina Oh & Ee Ling Low (NUS, NTU, Singapore): Rhythm in the disorganized speech of schizophrenic patients: a preliminary study

Schizophrenia affects one person in every hundred. Approximately a fifth of schizophrenia patients exhibit the symptom of formal thought disorder (FTD); these patients produce disorganized speech that is particularly difficult to follow. The speech of FTD patients has been investigated at different levels to determine what it is that makes much of their speech incomprehensible, but to date analyses at the level of prosody have been relatively sparse. Existing work suggests that prosody is abnormal, but whether this is a feature of schizophrenia in general or specific to FTD (the symptom associated with incomprehensible speech) is unclear.

Prosodic aspects of speech, including speech rhythm, contribute to speech being comprehensible and help listeners decode speakers’ intended meaning(s). This preliminary study examines whether abnormalities in speech rhythm, if any, are specific to FTD, and therefore a contributing factor in patients’ incomprehensibility. Spontaneous speech from an initial sample of one matched pair of schizophrenia patients, with and without FTD, was analysed using the Pairwise Variability Index (PVI), which measures the difference in duration between successive vowels and so allows the rhythm of the two speech samples to be compared. The PVI has been widely applied in capturing rhythmic differences between languages, between varieties of English, and rhythmic variation in the speech of children.

Claire Timmins & Jane Stuart-Smith (University of Glasgow): An acoustic investigation of L-vocalization in Glaswegian adolescents

L-vocalization (Wells 1982: 258) is typical of Southern English accents such as Cockney, and also occurs elsewhere in the UK. Early reports of it in Glaswegian have been confirmed more recently (Stuart-Smith 1999). L-vocalization here refers to the auditory identification of vowels where [l] is expected. There seems to have been very little work on the acoustic characteristics of L-vocalization, though /l/ itself has been studied: Carter (2002: 86) notes, among other features, high F3 and variable lower formants, with a particularly low F2 for dark [l], which “acoustically resembles vowels much more than does clear [l]”.

Auditory analysis of L-vocalization in read speech from 18 adolescent male Glaswegian speakers reveals most vocalization in postconsonantal (syllabic) /l/ (e.g. sidle), then coda /l/ (e.g. fill), and least in preconsonantal /l/ (e.g. milk). Formant tracks of F1, F2 and F3 were taken for all instances of /l/ and the preceding (high)/front vowel. F1 and F2 were also measured for BOOT, COT, COAT, and CAUGHT. With this information, we present an acoustic analysis of /l/ in these contexts, and consider the extent to which acoustically distinct groups emerge that correspond to auditory categories. We also look at the acoustic qualities of these realizations of /l/ and how they relate to existing (high)/back vowels.

Ian Watson (University of Oxford): Post-focal intonation in English and French

A commonplace of many intonational theories has been that tonal and/or accentual information after the focussed item in an IP is either lacking or predictable. This raises the question of whether linguistic items usually differentiated by stress or accent can be distinguished at all in post-focal position, and if so, how. In this study, I address these issues using elicited data from English and from French, for which it has recently been shown that a degree of tonal information may persist in post-focal contexts.

Bill Wells (University of Sheffield): Prosody, focus and repair: a developmental perspective

In some languages, the main point of prosodic prominence is fixed at the end of the turn constructional unit, serving a turn-delimitative function, while in others, such as English, speakers can deploy prosodic prominence at variable places in the turn constructional unit. The received view is that this marks ‘focus’. However, Local and Walker (this conference) argue that in English the prosodic prominence traditionally associated with ‘focus’ is a resource deployed by participants to handle other-initiated repair. In the present paper, evidence is presented that children learning English start out with a turn-delimitative system of fixed final accentual prominence, and that they learn the English system of mobile prominence as they are confronted with the need to handle other-initiated repair in their earliest multiword utterances. The evidence is taken from detailed interactional and phonetic analysis of young children talking with caregivers.

Anne Wichmann (University of Central Lancashire): “Sorry” in casual conversation: Functions and segmental / prosodic realisation

The word “sorry” is known to have a number of different functions in casual conversation, including ‘real’ apologies, indicating repair sequences, requests for clarification, and acknowledgement of communication difficulties.

The data are the spoken part of the British component of the International Corpus of English (ICE-GB), approximately 400,000 words of transcribed speech. A lexical search was used to identify all occurrences of “sorry” in the transcripts, whether as isolates constituting a complete utterance or as part of longer utterances. Based on analysis of both cotext and context, the “sorry”-utterances were categorised according to their syntactic structure (e.g. initial or final position of “sorry”) and their apparent pragmatic function.

The sound files were then analysed auditorily, and each utterance was transcribed segmentally and prosodically (in most cases, intonation contours only). Most of the recordings are too noisy for instrumental analysis, but instrumental analysis of a limited number of items is planned.

The segmental and prosodic characteristics of each utterance were identified and related to the functional categories already established. Both prosodic and segmental characteristics appear to be closely related to pragmatic function.

I attempt to account for some of the observed differences in terms of models of intonational meaning, referring both to phonological models (e.g. Pierrehumbert and Hirschberg 1990) and to accounts of global phonetic realisation (e.g. the ‘biological codes’ of Gussenhoven 2002).

Melissa Wright (University of York): The Phonetic Properties of ‘Multi-Unit First Closing Turns’ in British-English Telephone Call Closing Sequences

When speaking on the telephone, participants are faced with the interactionally delicate task of negotiating how and when to close the call. This paper describes the interactional and phonetic properties of one device, ‘multi-unit first closing turns’ such as ‘yes + okay then’, which British-English speakers regularly employ to manoeuvre the call from some on-topic talk into the closing sequence. It demonstrates that these turns consist of two separate units of talk, each performing a distinct action: the first, which typically contains ‘yes’, receipts the preceding sequence, and the second, which mostly contains ‘okay then’, offers call closure. Moreover, these two units/actions are shown to differ systematically in phonetic design: the first is typically produced with a narrow pitch span and placed relatively low in the speaker’s pitch range, whereas the second has a wider pitch span and is placed relatively high in the speaker’s pitch range. In addition, audible clicks and portions of glottality regularly occur between the two units and further index the action structure of the talk. These findings contribute to our understanding of the phonetic organisation of conversation and demonstrate the fruitfulness of conducting interactional and phonetic investigations hand-in-hand.