19.410, Disc: Automatical Metrical Markup

Mon Feb 4 22:08:10 UTC 2008

LINGUIST List: Vol-19-410. Mon Feb 04 2008. ISSN: 1068 - 4875.

Moderators: Anthony Aristar, Eastern Michigan U <aristar at linguistlist.org>
            Helen Aristar-Dry, Eastern Michigan U <hdry at linguistlist.org>

Reviews: Randall Eggert, U of Utah  
         <reviews at linguistlist.org> 

Homepage: http://linguistlist.org/

The LINGUIST List is funded by Eastern Michigan University, 
and donations from subscribers and publishers.

Editor for this issue: Ann Sawyer <sawyer at linguistlist.org>
================================================================  

===========================Directory==============================  

1)
Date: 01-Feb-2008
From: Klemens Bobenhausen < klemens.bobenhausen at germanistik.uni-freiburg.de >
Subject: Automatical Metrical Markup

-------------------------Message 1 ---------------------------------- 
Date: Mon, 04 Feb 2008 17:06:23
From: Klemens Bobenhausen [klemens.bobenhausen at germanistik.uni-freiburg.de]
Subject: Automatical Metrical Markup

All,

Automatic metrical markup (AMM) of written (not spoken) poetry aims at a
fully computer-based analysis of a poem's metrical information. The analysis
begins with identifying poems (not yet achieved), strophes, verse lines,
words and syllables, and ends with distinguishing stressed (+) from
unstressed (-) syllables and determining rhyme schemes. All of these
analyses are stored in a TEI P5-compatible XML document.
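To illustrate the TEI P5 target, here is a minimal Python sketch that
serializes the strophe analysed in this message. It is not the AMM
implementation. It assumes that the stress pattern goes into the `met`
attribute of each `<l>` and the rhyme scheme into the `rhyme` attribute of
the enclosing `<lg>`; the post does not say which TEI elements or attributes
are actually used, and the function name `stanza_to_tei` is hypothetical.

```python
import xml.etree.ElementTree as ET

# One strophe as (verse text, stress pattern) pairs, taken from the
# example output in this message.
STANZA = [
    ("Friedlich bekämpfen", "+--+-"),
    ("Nacht sich und Tag.", "+--+"),
    ("Wie das zu dämpfen,", "+--+-"),
    ("Wie das zu lösen vermag!", "+--+--+"),
]


def stanza_to_tei(lines, rhyme):
    """Build a TEI-style <lg> element with the analysis stored in attributes.

    Assumption: `met` on <l> holds the stress pattern and `rhyme` on <lg>
    holds the rhyme scheme. The original system may encode this differently.
    """
    lg = ET.Element("lg", {"type": "stanza", "rhyme": rhyme})
    for text, stress in lines:
        verse = ET.SubElement(lg, "l", {"met": stress})
        verse.text = text
    return lg


tei_xml = ET.tostring(stanza_to_tei(STANZA, "abab"), encoding="unicode")
print(tei_xml)
```

A complete TEI document would also need the TEI namespace, a `teiHeader`
and the surrounding `text`/`body` elements, which this fragment leaves out.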

Strophe 1:
Syllabification (Silbenzerlegung):
  (Fried|lich) (be|käm|pfen) 
  (Nacht) (sich) (und) (Tag) . 
  (Wie) (das) (zu) (däm|pfen) , 
  (Wie) (das) (zu) (lö|sen) (ver|mag) ! 
Meter (Metrik):
  Syllables=5, Stress=''+--+-''
  Syllables=4, Stress=''+--+''
  Syllables=5, Stress=''+--+-''
  Syllables=7, Stress=''+--+--+''
Rhyme (Reim):
   End rhyme=''abab'' (Kreuzreim, i.e. cross rhyme)
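
The notation in this example output can be read mechanically: parentheses
delimit words, `|` separates syllables, and each stress string has one
character per syllable. The following Python sketch parses the strophe above
and checks that the syllable counts and stress patterns agree. The function
names `parse_line` and `align` are made up for this illustration and say
nothing about how the actual system is written.

```python
import re

# (syllabified verse, stress pattern) pairs copied from the output above.
STANZA = [
    ("(Fried|lich) (be|käm|pfen)", "+--+-"),
    ("(Nacht) (sich) (und) (Tag) .", "+--+"),
    ("(Wie) (das) (zu) (däm|pfen) ,", "+--+-"),
    ("(Wie) (das) (zu) (lö|sen) (ver|mag) !", "+--+--+"),
]


def parse_line(marked):
    """Return a list of words, each a list of syllables.

    Punctuation outside the parentheses is ignored.
    """
    return [word.split("|") for word in re.findall(r"\(([^)]*)\)", marked)]


def align(marked, stress):
    """Pair every syllable with its stress mark ('+' or '-')."""
    syllables = [syl for word in parse_line(marked) for syl in word]
    if len(syllables) != len(stress):
        raise ValueError(
            f"{len(syllables)} syllables but {len(stress)} stress marks"
        )
    return list(zip(syllables, stress))


aligned = [align(marked, stress) for marked, stress in STANZA]
counts = [len(verse) for verse in aligned]
print(counts)
```

For the strophe above, the counts come out as 5, 4, 5 and 7, matching the
syllable values in the output.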

Having collected a large number of prosodic predictions for written German,
I can now analyse regular poems (those with a regular sequence of stressed
and unstressed syllables in each verse/strophe) for nearly 100% of their
syllables, and irregular poems (those with an irregular sequence of stressed
and unstressed syllables in each verse/strophe) for about 98% of their
syllables. That 98% breaks down by how each syllable's stress is determined:

a) determined from stressed syllables (60%)
b) determined from euphonic rules (25%)
c) determined from analogies to other verses (7%)
d) determined from unstressed syllables (5%)
e) determined from rhymes (1%)
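
The five shares listed above add up to the 98% coverage reported for
irregular poems. The Python sketch below checks that sum and illustrates one
possible reading of step c), resolution by analogy to other verses. The post
does not describe how the analogy step works, so the `?` placeholder for an
undetermined syllable and the function `fill_by_analogy` are assumptions
made only for this example.

```python
# Shares of syllables resolved by each kind of evidence, from the list above.
SHARES = {
    "stressed syllables": 60,
    "euphonic rules": 25,
    "analogies to other verses": 7,
    "unstressed syllables": 5,
    "rhymes": 1,
}
total_coverage = sum(SHARES.values())


def fill_by_analogy(patterns):
    """Resolve '?' positions from a fully determined verse of equal length.

    A model verse is accepted only if it agrees with every position that is
    already known. Verses without a matching model are returned unchanged.
    """
    models = [p for p in patterns if "?" not in p]
    result = []
    for pattern in patterns:
        if "?" in pattern:
            for model in models:
                same_length = len(model) == len(pattern)
                if same_length and all(
                    a == "?" or a == b for a, b in zip(pattern, model)
                ):
                    pattern = model
                    break
        result.append(pattern)
    return result


print(total_coverage)
print(fill_by_analogy(["+--+-", "+--+", "+?-+-"]))
```

In the second call, the open position in the third verse is taken from the
first verse, because both have five syllables and agree wherever the third
verse is already determined.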

I'm not using any kind of POS or morphological tagging, because the system
should also work with historical texts and their orthography. The missing
2% come from foreign or non-Germanic words (like 'Musik' or 'Natur') and
from compounds. German compounds are mostly stressed on the part that
modifies the other part: 'Biergarten', for example, is stressed on the
first syllable, because 'Bier' specifies which kind of 'Garten' a
'Biergarten' is.
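
The two problem cases can be written down as a small Python sketch. It is a
guess at one possible treatment, not part of the system described here: the
exception lexicon `LOANWORD_STRESS` and both function names are
hypothetical, and the compound rule only works once the boundary between the
two parts is already known, which is the hard part without morphological
analysis.

```python
# Hypothetical exception lexicon: stress patterns for loanwords that are
# stressed on the final syllable. The two entries are the examples from
# the text; a real list would have to be compiled separately.
LOANWORD_STRESS = {"musik": "-+", "natur": "-+"}


def lookup_stress(word):
    """Return the listed stress pattern for a known loanword, else None."""
    return LOANWORD_STRESS.get(word.lower())


def compound_stress(modifier_pattern, head_syllables):
    """Stress a two-part compound on its modifying (first) part.

    The modifier keeps its own pattern and every syllable of the head is
    marked unstressed. Secondary stress is ignored in this binary notation.
    """
    return modifier_pattern + "-" * head_syllables


# 'Bier' (one stressed syllable) + 'Garten' (two syllables)
print(compound_stress("+", 2))
print(lookup_stress("Musik"))
```

With these inputs, 'Biergarten' comes out as `+--` and 'Musik' as `-+`.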

And now I'm out of ideas and need assistance. Is anyone interested in this
kind of work? The algorithm will not work for languages other than German,
but the ideas may.

Klemens (+-) 

Linguistic Field(s): Computational Linguistics
                     Ling & Literature
                     Phonology
                     Text/Corpus Linguistics
