18.01.08

Introduction to Linguistics- 7.11.2007

PHONETICS: REALISING SOUNDS

Speech:
Production
Transmission
Perception


Phonetic domains: the Phonetic Cycle
● The Articulatory Domain
– The IPA (A = Alphabet / Association)
– The Source-Filter Model of Speech Production

● The Acoustic Domain
– The Speech Wave-Form
– Basic Speech Signal Parameters
– The Time Domain: the Speech Wave-Form
– The Frequency Domain: simple & complex signals
* Fourier Analysis: the Spectrum
* Pitch extraction
– Analog-to-Digital (A/D) Conversion

● The Auditory Domain: Anatomy of the Ear


The domains of phonetics:
The articulatory organs:
– Lungs
– Vocal cords in the larynx (Adam’s Apple)

● Positions:
– Uvula
o with back of tongue
– Pharynx
o with velum (nasals)
– Velum (soft palate)
o contact with tongue: velars
– Palate (hard palate)
o with tongue
– Alveolar ridge
– Upper teeth
o with tongue
o with lower lip
– Upper lip
o with lower lip
o perhaps with tongue


Speech production: Source-Filter Model:
Description of sounds: two levels
● For general pronunciation representation in the lexicon:
– phonemic transcription
– just enough phonetic detail to distinguish words
● For detailed representation of speech pronunciation:
– phonetic transcription based on
o articulatory phonetics (about speech production)
– remember the other dimensions of speech description:
o acoustic phonetics (about speech wave transmission)
o auditory phonetics (about speech perception)
The ear:

15.01.08

How to make a dictionary- 15.1.2007

COMPUTATIONAL LEXICOGRAPHY




Criteria for Good Lexicography:

• Quantity:
– Completeness of coverage:
o extensional coverage: number of entries
o intensional coverage: number of types of lexical information

• Quality:
– Correctness of information:
o Types of lexical information
– Consistency of structure:
o Macrostructure
o Microstructure
o Mesostructure



Concordance :
• A KWIC (KeyWord In Context) concordance is a special kind of preliminary, corpus-based dictionary:
– each word in a text corpus is paired with its contexts of occurrence in this corpus
• Note: Google search results, which show each search term in its context, are a special form of KWIC concordance
• Example text:
“My first sight of England was on a foggy March night in 1973 when I arrived on the midnight ferry from Calais.”



Alphabetically ordered KWIC:




Simplest KWIC procedure:
1. Corpus creation: make a corpus of texts in electronic format
2. Tokenisation (pre-process each text):
- process punctuation marks
- break the text into context units (lines/sentences)
3. Keyword list extraction (all words in text)
4. Context collation (for each keyword)
5. Search for KWIC in corpus
6. Store output and format it for printing or hypertext (CD, web)




KWIC: Dictionary Making
• The function of a KWIC is
– to make searching for lexical information more efficient by putting context information about words in one place
– for making “Word Sketches” (Adam Kilgarriff)
• grammatical descriptions: parts of speech
• dictionaries: examples of use, collocations, ...
• Project: Make concordances from your text corpora and use them to collect lexical information for your Toolbox lexical databases


The Status of Dictionaries:
• Remember that the dictionary is
– one of the three main components of language documentation:
• corpus of recordings and texts
• dictionary
• sketch grammar
– the central component of any linguistic description
– the most useful linguistic product for use by the speech community, or non-linguists in general



The Ibibio Dictionary:
• The Ibibio Dictionary
– uses information from Elaine Kaufman's Ibibio Dictionary
– the information was re-typed into an Office table format
– this was converted into
• Toolbox format for further lexicographic extension
• LaTeX for formatting (cf. the Ibibio Concordance)
• Project: extend the Ibibio corpus, concordance, dictionary in scope & context





QUIZ :
• What are the 6 main steps in KWIC concordance construction?
• Explain each of these steps:




KWIC procedure: 1. Corpus collation
• My first sight of England was on a foggy March night in 1973 when I arrived on the midnight ferry from Calais.



KWIC procedure: 2. Tokenisation
• In the text:
My first sight of England was on a foggy March night in 1973 when I arrived on the midnight ferry from Calais.
• Process
– upper case (capital) letters
– punctuation marks
• To produce:
my first sight of england was on a foggy march night in 1973 when i arrived on the midnight ferry from calais
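
The tokenisation step above can be sketched in Python. This is a minimal sketch, not the course's own tool: the function name `tokenise` is hypothetical, and it assumes that "processing" capital letters means lower-casing and that "processing" punctuation marks means deleting them.

```python
import string


def tokenise(text: str) -> list[str]:
    """Lower-case the text, delete punctuation marks, split on whitespace."""
    lowered = text.lower()
    # Assumption: every ASCII punctuation mark is simply removed.
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    return no_punct.split()


text = ("My first sight of England was on a foggy March night in 1973 "
        "when I arrived on the midnight ferry from Calais.")
tokens = tokenise(text)
print(" ".join(tokens))
```

Printing the joined tokens reproduces the tokenised line shown above.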


KWIC procedure: 3. Keyword List
• Replace each SP (space) sequence by a LF (linefeed) / NL (newline)
• Sort the list alphabetically
• Remove duplicate words
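
The three keyword-list operations can be tried out on the example text. The following is only a sketch under the assumption that the text has already been tokenised; the variable names are illustrative.

```python
tokenised = ("my first sight of england was on a foggy march night in 1973 "
             "when i arrived on the midnight ferry from calais")

# 1. Replace each space by a newline: one word per line.
one_per_line = tokenised.replace(" ", "\n")

# 2. + 3. Sort alphabetically and remove duplicate words.
# (A set drops duplicates; sorted() orders by character code,
#  so the digits of "1973" come before the letters.)
keywords = sorted(set(one_per_line.split("\n")))

print("\n".join(keywords))
```

The word "on" occurs twice in the text but only once in the keyword list.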









KWIC procedure: 4. Contexts




KWIC procedure: 5. Search
• For example:
– on is found in the middle of the following context units:
• was on a
• arrived on the
– arrived is found in the middle of the following context units:
• i arrived on
– etc.
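
The search step can be sketched as a small function. This is an illustrative sketch only: the name `kwic` and the `window` parameter (number of context words on each side) are assumptions, and the context unit is taken to be one word to the left and one to the right, as in the examples above.

```python
def kwic(tokens: list[str], keyword: str, window: int = 1) -> list[str]:
    """Return every context unit that has `keyword` in the middle."""
    hits = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = tokens[max(0, i - window):i]      # words before the keyword
            right = tokens[i + 1:i + 1 + window]     # words after the keyword
            hits.append(" ".join(left + [token] + right))
    return hits


tokens = ("my first sight of england was on a foggy march night in 1973 "
          "when i arrived on the midnight ferry from calais").split()

print(kwic(tokens, "on"))       # ['was on a', 'arrived on the']
print(kwic(tokens, "arrived"))  # ['i arrived on']
```

A larger `window` gives longer context units, which is what a printed concordance would normally show.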



KWIC procedure: 6. Output format

How to make a dictionary- 27.11.2007

TYPES OF LEXICAL INFORMATION: GRAMMAR (PARTS OF SPEECH CATEGORIES & SUBCATEGORIES)





● Types of lexical information: syntax
– Sentence structure - “syntax”, “phrasal syntax”
– Syntactic categories
o parts of speech (POS)
o subcategories
o phrasal categories

● The structure of language: constitutive relations:
– structural relations
o syntagmatic relations
o paradigmatic relations
– semiotic relations
o interpretation relations
o realisation relations

● Text structure - “text syntax”





Grammar:
● “Grammar” is a rather broad term
– It covers
o orthography
o phonology
o morphology
o syntax (sentence structure)
o lexical idiosyncrasies
– Sometimes “grammar” is restricted to mean just
o sentence structure.

● The term “syntax” originally meant structure
– However, “syntax” is also sometimes restricted to mean just
o sentence structure

● However there are other meanings:
– word grammar, word syntax
– text grammar, text syntax


Sentence structure:
● A structure is an arrangement of objects in a certain order in relation to each other.
● This applies to
– architecture
– traffic systems
– paintings
– music
– written and spoken language
– ...

● A structure consists of relations of two kinds:
– paradigmatic relations
o classificatory relations of similarity and difference between objects
– syntagmatic relations
o compositional relations between parts of a larger whole

● Sentences consist of
– Words
o which are grouped into larger phrases
– Phrases
● which are grouped into
o even larger phrases
o and into sentences
– Sentences
● which may also be grouped into more complex sentences:
o with subordinate clauses
* relative clauses
* adverbial clauses
o or with coordinate clauses
* and
* but
* for


Definition of a sentence:
– Simple sentence:
● A sentence is a simple sentence
o The Pepsi worker allegedly assaulted the Coca-Cola employee.

– Coordinating sentence:
● A sentence is a sentence linked with a sentence by means of a coordinating conjunction
o An assembly worker hid screws in a specially designed hiding place and took up to 7,000 home with him every day.

– Subordinating sentence:
● A sentence is a sentence with a subordinate simple sentence (clause) inserted into it
o e.g. relative clause, adverbial clause
o A car dealership owner killed two employees because they kept asking for more pay.

– Exclusion condition:
● Nothing else is a sentence.
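
The definition above is recursive: a sentence may contain sentences. One way to see this is to write it as a recursive data type. The sketch below is only an illustration; the class names `Simple`, `Coordinated` and `Subordinated` and the function `count_clauses` are hypothetical, and the word strings are not analysed any further.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Simple:
    text: str                 # base case: a simple sentence


@dataclass
class Coordinated:
    left: "Sentence"          # a sentence ...
    conjunction: str          # ... linked by a coordinating conjunction ...
    right: "Sentence"         # ... with a sentence


@dataclass
class Subordinated:
    main: "Sentence"          # a sentence ...
    subordinator: str
    clause: Simple            # ... with a subordinate simple sentence


Sentence = Union[Simple, Coordinated, Subordinated]


def count_clauses(s: Sentence) -> int:
    """Count the simple sentences (clauses) inside a sentence."""
    if isinstance(s, Simple):
        return 1
    if isinstance(s, Coordinated):
        return count_clauses(s.left) + count_clauses(s.right)
    if isinstance(s, Subordinated):
        return count_clauses(s.main) + 1
    raise TypeError("Nothing else is a sentence.")  # exclusion condition


example = Subordinated(
    main=Simple("A car dealership owner killed two employees"),
    subordinator="because",
    clause=Simple("they kept asking for more pay"),
)
print(count_clauses(example))  # 2
```

The final `raise` plays the role of the exclusion condition: anything that is not built by one of the three rules is rejected.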





Syntactic categories (parts of speech):
● Nominal categories:
– Nouns
– Pronouns (special glue: co-reference)
– Articles
– Adjectives

● Verbal categories:
– Verbs
– Adverbs

● Glue categories:
– Prepositions (intra-sentence glue)
– Conjunctions (inter-sentence glue)
– Interjections (dialogue glue)






Syntagmatic relations in:
- Syllables
- Words
- Sentences



How to make a dictionary- 20.11.2007


TYPES OF LEXICAL INFORMATION: MORPHOLOGY (INFLECTION AND WORD FORMATION)


Reasons for word formation:
- New concepts require new words
- Sometimes new words are invented on the spot

Who needs word formation?
- Scientists
- Engineers
- Product branding companies
- Everybody
* including poets...


Branches of morphology:

Morphology:
- Inflection:
* Functionality (external structure):
o marks the relation of words to their contexts
o no change in the basic meaning of words
* Form (internal structure):
o affix (prefix, suffix, infix), superfix, stem vowel change
- Word formation:
* Functionality (external structure):
o creation of new words / parts of speech / meanings
o in principle infinite extendability of the lexicon
* Form (internal structure):
o Root/morpheme creation (blending, abbreviation, ...)
o Derivation: 1 stem + affix (prefix, suffix, infix), superfix, vowel change
o Compounding: 2 stems, perhaps with interfix or inflection-like affix

- Morphemes are:
* smallest meaningful parts of words
- There are 2 main morpheme types:
* lexical morpheme (content morpheme, root):
o open set: girl, boy, car, box, spoon, grass, sky
* grammatical morpheme (structural morpheme):
o closed set
~ free: grammatical words: prepositions, conjunctions, auxiliary verbs
~ bound: affixes such as prefixes and suffixes (inflection and derivation)


Properties of inflection:
- External structure:
* marks the syntagmatic relation of words to their contexts
o syntactic contexts (agreement in person, number, case):
~ subject-verb (English)
~ subject-verb; determiner-adjective-noun; preposition-nominal (German)
o situational contexts:
~ Verbs: temporal relations, spatial relations
~ Nominals: quantity and definiteness relations
- Internal structure: stem + affix
* Prefix
* Suffix
* Circumfix
* Infix
* Superfix

Construction of inflected words:
- a stem + an inflection

* the stem has lexical meaning, e.g.:
o table, chair, cabbage, happiness, wonderful, blog
* the inflection has grammatical meaning
o relates a word to its syntactic context:
~ subject-verb agreement (person, case, number)
o relates a word to its semantic context:
~ tense/time, quantity, speaker-addressee, ...
o e.g.: cats, dogs, horses, sheep, oxen, men, women, children



How words are built - form + function:

- Derivation:
* derivations (based on one root):
o unable, impossible, antidisestablishmentarianism
o skilful, reddish, happiness
* Internal structure of derivations:
o 1 stem + affixes: prefix, suffix, circumfix, infix, superfix
* External structure of derivations:
o suffixes in English may create new Parts of Speech (POS)
o all affixes create new meanings

- Derivations consist of one stem with an affix.
* However, the stem itself may consist of a stem with an affix
* Therefore the stem has to be defined in a recursive definition

- A stem is:
* a root (simplest case), or
* a stem plus an affix (complex cases)

Example: beautifully

stem = root = beauty
stem = stem + affix = beauty + ful
stem = stem + affix = (beauty + ful) + ly
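
The recursive definition of the stem can be written down as a recursive data type. This is a minimal sketch with hypothetical names (`Root`, `Derived`, `parts`); for simplicity it assumes that every affix is a suffix, which is enough for the example beautifully but not for prefixes or infixes.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Root:
    form: str            # simplest case: a stem is a root


@dataclass
class Derived:
    stem: "Stem"         # complex case: a stem ...
    affix: str           # ... plus an affix (assumed here to be a suffix)


Stem = Union[Root, Derived]


def parts(stem: Stem) -> list[str]:
    """Unfold a stem into its root and affixes, innermost first."""
    if isinstance(stem, Root):
        return [stem.form]
    return parts(stem.stem) + [stem.affix]


beautifully = Derived(Derived(Root("beauty"), "ful"), "ly")
print(" + ".join(parts(beautifully)))  # beauty + ful + ly
```

Because `Derived` contains a `Stem`, which may itself be `Derived`, the definition allows any number of affixes, just as in the recursive definition.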

Another example: Work out the derivation of antidisestablishmentarianism

- Compounding:
* Form (internal structure): from at least 2 existing stems
o lamp-post
o whisky-soda
o red-head


Four main types of compound:
- endocentric (tatpurusa):
* jam-jar, honeypot, harddisk, bus-stop, ...
o An armchair is a chair
- bicentric (dvandva):
* fighter-bomber, gentleman-farmer
o whisky-soda: A whisky-soda is a whisky and a soda.
- exocentric (bahuvrihi):
* blue-stocking, redskin, ...
o red-head: A redhead has a red head.
- synthetic compounds (both derivation and compound): bus-driver, steam-roller


- compounding: a stem plus another stem
- three main types:
* endocentric: armchair (An armchair is a chair.)
* bicentric: whisky-soda (A whisky-soda is whisky and soda.)
* exocentric: red-head (A redhead has a red head.)
- Synthetic compounds combine compounds and derivations:
* a derivation plus a stem, e.g.:
~ bus-driver
~ steam-roller




A hierarchy of words and their parts:

Some simplex and complex words:
- simplex:
* oh, ah, eh, oo, I, err, owe, ewe
* pa, ma, far, car, star
- complex:
* blends, abbreviations (simplex roots based on more than one stem):
o brunch, ... ; NATO, ...
* derivations (based on one root):
o unable, impossible, happiness, temerity, antidisestablishmentarianism
* compounds (based on more than one root/stem):
o tatpurusa (endocentric): jam-jar, honeypot, harddisk, bus-stop, ...
o dvandva (bicentric): whisky-soda, gentleman-farmer, ...
o bahuvrihi (exocentric): red-head, redskin, blue-stocking, ...

How to make a dictionary- 13.11.2007

TYPES OF LEXICAL INFORMATION: PRONUNCIATION



Surface structure:
- 2 levels:
(1) linguistic description --> metalanguage
(2) units of language --> object language

- Surface structure of:
* dictionaries --> metalanguage: the typography and layout of a book, hypertext…
* words in dictionaries --> object language: spelling, pronunciation…

Metalanguage: a language used to talk about another language
e.g.: German (= metalanguage) used to talk about English (= object language)

Semasiological dictionary --> metalanguage (e.g. headword)
Onomasiological dictionary --> object language



Transcription: different views of sounds:
(1) Narrow phonetic transcription
(2) Phonemic representation
(3) Broad phonetic transcription
(4) Morphophonemic representation
Note: (1) and (2) are more important than (3) and (4).

Goals:
(1) represent as many phonetic details of phones (i.e. the allophones of phonemes) as needed
(2) represent phonemes (generalised from allophones using only information on phonetic context) as needed in the lexicon
(3) represent phonemes (generalised from allophones using information on phonetic context) as they occur in texts
(4) represent morphophonemes (generalised from phonemes with additional information on morphological context) as they occur in grammatical contexts



Phonemic transcription:
- the transcription used in dictionaries
- preferably in IPA (not in “ad hoc” pseudo-spelling)
- the minimum amount of pronunciation detail needed
* to distinguish words
* for a native (or other competent) speaker of the language



Phonetic transcription:
- the transcription used to give as many details of pronunciation as possible
- actual pronunciation of phonemes (varies in different contexts)

For phonemic pronunciation representation in the lexicon:
- phonemic transcription
- just enough phonetic detail to distinguish words

For detailed phonetic representation of pronunciation:
- phonetic transcription based on
- articulatory phonetics (about speech production)
- remember the other dimensions of speech description:
* acoustic phonetics (about speech wave transmission)
* auditory phonetics (about speech perception)



Sounds in dictionaries:
Prosodic hierarchy:

Phonemes:
Function: smallest word-distinguishing segments

Internal structure: configurations of distinctive phonetic features

External structure: smallest part of syllables

Rendering: contextual variants, allophones

Syllables:

Function: word-distinguishing phoneme configuration

Internal structure: configurations of sequential features and simultaneous features

External structure: (word)

Rendering: a function of the rendering of phonemes



Phonemes:
Several ways of defining phonemes:
(1) the minimal word-distinguishing sound segment
(2) the smallest unit of a syllable
(3) consists of distinctive features
(4) consists of a set of allophones



English syllables: basics:
- Basic syllable structure: CCCVVCCC, e.g. /streIndZ/ - but the affricate /dZ/ counts as 1 phoneme, though phonetically it has 2 parts
- More detailed syllable structure, like a map: this kind of map is sometimes called a “transition network” or a “state diagram”
--> each transition from one circle (node, state) to the next describes a permissible position for one phoneme
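
The basic CCCVVCCC template can be checked with a regular expression, which is equivalent to a simple transition network. The sketch below rests on several assumptions that are not in the notes: every C position is taken to be optional, at least one V is required, and `VOWELS` is a small hypothetical inventory of SAMPA-like vowel symbols. It checks only the C/V template, not which consonants may actually combine.

```python
import re

# Hypothetical partial vowel inventory (SAMPA-like symbols).
VOWELS = {"a", "e", "i", "o", "u", "I", "U", "@"}

# Up to 3 consonants, 1 or 2 vowels, up to 3 consonants.
SYLLABLE = re.compile(r"C{0,3}V{1,2}C{0,3}")


def cv_skeleton(phonemes: list[str]) -> str:
    """Map each phoneme to C or V; an affricate like dZ is one phoneme."""
    return "".join("V" if p in VOWELS else "C" for p in phonemes)


def is_possible_syllable(phonemes: list[str]) -> bool:
    return SYLLABLE.fullmatch(cv_skeleton(phonemes)) is not None


strange = ["s", "t", "r", "e", "I", "n", "dZ"]           # /streIndZ/
print(cv_skeleton(strange))                              # CCCVVCC
print(is_possible_syllable(strange))                     # True
print(is_possible_syllable(["s", "t", "r", "s", "e"]))   # False: 4 initial consonants
```

A more detailed transition network would replace the single symbol C by the individual phonemes allowed at each position.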











Spelling-to-sound rules:
- spelling: VISUAL modality
* ghoti --> pronounced: fish
- i before e except after c, consonant doubling



More about English Syllable Structure:





How to make a dictionary - 6.11.2007


LEXICAL DATABASES


Semasiological dictionaries:
– the basic form is a table
● the rows are lexical entries, with a specific microstructure
● the columns are single types of lexical information
– if the orthography or phonology of a lexical item is ambiguous, then
● either the item is repeated with the new information
● or a sub-table is created

– but this depends on the kind of ambiguity:
● homonymy (homography, homophony)
● polysemy
Homonym --> spelling & pronunciation of two words are the same
Homograph --> spelling the same & pronunciation different
Homophone --> spelling different & pronunciation the same
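
The three definitions amount to a two-way comparison of spelling and pronunciation. The following sketch makes that explicit; the function name `classify` is hypothetical and the SAMPA-style transcriptions are illustrative only.

```python
def classify(spelling_a: str, pron_a: str, spelling_b: str, pron_b: str) -> str:
    """Compare two words by spelling and by pronunciation."""
    same_spelling = spelling_a == spelling_b
    same_pron = pron_a == pron_b
    if same_spelling and same_pron:
        return "homonym"
    if same_spelling:
        return "homograph"
    if same_pron:
        return "homophone"
    return "neither"


print(classify("bank", "b{Nk", "bank", "b{Nk"))     # homonym
print(classify("lead", "li:d", "lead", "led"))      # homograph
print(classify("knight", "naIt", "night", "naIt"))  # homophone
```

In a lexical database the same comparison decides whether an item is repeated or whether a sub-table is needed.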
The basic model of a table:
- Table: a list of rows
- Row: a list of fields
- Column: a list of fields in the same row position
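
The basic table model can be expressed directly with lists. This is a sketch only; the column headings and the two entries are invented for illustration and do not come from any real dictionary.

```python
Row = list[str]      # Row: a list of fields
Table = list[Row]    # Table: a list of rows

table: Table = [
    ["headword", "pos",  "gloss"],                   # hypothetical column headings
    ["cat",      "noun", "small domestic animal"],   # hypothetical lexical entry
    ["run",      "verb", "move fast on foot"],       # hypothetical lexical entry
]


def column(table: Table, position: int) -> list[str]:
    """Column: the fields that share the same position in every row."""
    return [row[position] for row in table]


print(column(table, 0))  # ['headword', 'cat', 'run']
print(column(table, 1))  # ['pos', 'noun', 'verb']
```

Each row corresponds to one lexical entry (microstructure) and each column to one type of lexical information.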
Illustration of table construction in OpenOffice / MS Excel: