[Corpora-List] tagset for latin in tree-tagger

BOFÍAS ALBERCH, EVA eva.bofias at upf.edu
Wed Mar 20 11:10:37 UTC 2013


Thank you, Marco, for your reply.
Our corpus contains Classical as well as Medieval Latin.
It would be very good if you could provide us with the Index Thomisticus
Treebank; we might get better results with it.

If anyone is interested, the corpus can be consulted at:
http://parles.upf.edu/llocs/cqp/latin/
user:guest
password:guest

Eva Bofias

2013/3/19 Passarotti Marco Carlo <marco.passarotti at unicatt.it>

> Hi Eva,
>
> the following paper reports the results of an experiment on PoS-tagging
> Latin with TreeTagger.
>
> Bamman, D. & Crane, G. (2008). Building a Dynamic Lexicon from a Digital
> Library. In Proceedings of the 8th ACM/IEEE-CS Joint Conference on Digital
> Libraries (JCDL 2008).
>
> The authors use the tagset of the Perseus Digital Library.
>
> The training set features Classical Latin texts.
>
> However, this is not the training set used to train the Latin parameter
> file available on the TreeTagger website.
>
> The TreeTagger homepage provides no documentation of the Latin tagset.
> Judging from the parameter file, it seems to use the tagset of William
> Whitaker's Words, but I am not sure.
>
> The Latin TreeTagger was trained using resources (treebanks) that share
> the same syntactic annotation style but use different morphological
> tagsets. Further, the language of the three Latin resources used to train
> the tagger varies considerably (Classical, Late and Medieval Latin; prose
> and poetry; different authors).
>
> In our experience, genre, author and era strongly affect the performance
> of PoS taggers (at least for ancient languages). Thus, it may be better to
> train a tool on less, but more homogeneous, data.
>
> It depends on "which kind of Latin" you want to tag.
>
> If you are interested in tagging Medieval Latin, I can provide you with
> the Index Thomisticus Treebank, and you can train the HunPos tagger
> yourself (it works very well with our data).
>
> Hope it helps.
>
> Best,
>
> Marco
>
> From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On behalf
> of BOFÍAS ALBERCH, EVA
> Sent: Tuesday, 19 March 2013 13:06
> To: corpora at uib.no
> Subject: [Corpora-List] tagset for latin in tree-tagger
>
> Hi,
> I am using the Tree-Tagger for tagging a Latin corpus. I haven't been able
> to find the tagset. Does anyone have it or know where to find
> documentation on the tags it uses for Latin?
>
> Thanks
> Eva Bofias
>