new transcription norms?

Brian MacWhinney macw at cmu.edu
Wed Sep 17 03:21:28 UTC 2003


On 9/15/03 10:00 AM, "Camilla Bardel" <camilla.bardel at fraita.su.se> wrote:

> Hello,
> we have been working on a large corpus of transcribed and coded Italian
> interlanguage for a couple of years.
>

Interlanguage is a somewhat ambiguous term.  For some folks, it means "a
stage in L2 acquisition".  For others it means some complex sort of
interlanguage code-switching or mixing.


> We redownloaded CHILDES on September 4, 2003, and suddenly CHECK doesn't
> accept our transcribed files anymore. It reacts to the tabs following the
> participant names. We also noticed that the symbol "-:" wasn't accepted
> anymore. Has it been replaced by another symbol or do we have to remove all
> the occurrences of this symbol from our files?
>

There have been truly massive changes in CHAT over the last two months.  The
manual has been updated to cover the new codes, but it is a very good idea
to discuss these changes in info-chibolts, since it is often difficult to
work through all the details of the manual.

First, regarding the tabs on the header tiers and elsewhere, the standard
has always been that immediately after the colon there should be one tab and
no other character.  However, until recently, CHECK did not enforce this
standard.  It does now.  It is pretty easy to fix these globally with either
CHSTRING or some form of regular-expression replacement.  For example, some
corpora have lots of spaces after the tab, so I replace tab+space with tab,
and that replacement can't really hurt anything.
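
As a rough sketch of the regular-expression route, assuming Python is
available: the function name normalize_tabs and the sample lines are
hypothetical and not part of CLAN.  It forces a single tab after the colon
of a line that starts with @, * or %, and collapses tab+space into a tab
elsewhere.  It assumes the tier label itself contains no colon or tab.

```python
import re

# A tier prefix: '@', '*' or '%', a label, and the first colon on the line.
# Assumption: the label itself contains neither a colon nor a tab.
_TIER_PREFIX = re.compile(r"^([@*%][^:\t]*:)[ \t]+")


def normalize_tabs(text: str) -> str:
    """Return text with one tab after each tier colon and no tab+space runs."""
    fixed_lines = []
    for line in text.split("\n"):
        # Replace whatever mix of spaces and tabs follows the colon with one tab.
        line = _TIER_PREFIX.sub(lambda m: m.group(1) + "\t", line)
        # Collapse a tab followed by spaces into a single tab anywhere else.
        line = re.sub(r"\t +", "\t", line)
        fixed_lines.append(line)
    return "\n".join(fixed_lines)


if __name__ == "__main__":
    sample = "@Participants:  CHI Target_Child\n*CHI:\t  ciao ."
    print(repr(normalize_tabs(sample)))
```

Run it on a copy of one file first and then rerun CHECK, since a corpus may
contain layouts that this sketch does not anticipate.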

Regarding the -: symbol, the problem was that it was redundant with the
word-internal colon that represents drawling or lengthening.  For the new
XML schema, this kind of ambiguity is fatal and had to be fixed.  The easiest
way to fix this is to globally replace " -:" with ":".  This moves the
lengthening to the end of the word, which is legal.  The colon can be placed
anywhere inside a word to represent lengthening of a segment, and it will be
ignored by programs such as FREQ.
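
The global replacement described here can be sketched in Python as follows.
The function name convert_lengthening and the sample utterance are
hypothetical.  The sketch assumes the old marker always appears as a space
followed by "-:", so that dropping the space and the hyphen attaches the
colon to the preceding word.

```python
from typing import Tuple


def convert_lengthening(text: str) -> Tuple[str, int]:
    """Replace every ' -:' with ':' and report how many were changed."""
    old, new = " -:", ":"
    # Count first so the caller can check the number of changes per file.
    count = text.count(old)
    return text.replace(old, new), count


if __name__ == "__main__":
    converted, changed = convert_lengthening("*CHI:\tno -: grazie .")
    print(repr(converted), changed)
```

Returning the count makes it easy to compare against a search for "-:" in
the original file and spot occurrences that were not preceded by a space.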

> In some of our transcriptions we have placed the bullet directly after the
> participants tier.

Why did you put the bullet here?  If there is a good reason for doing this,
we could change CHECK and the Schema, so please provide a bit more detail.

> This has not been a problem before, but now CHECK comes
> up with a message saying:
> Item '<bullet>' must be preceded by text or '0'.(73)
>
> Are there other changes that we need to know about in order to get CHECK to
> accept our files?

I can't tell without seeing your files.  Try to fix one file and see if you
have any other problems.  If you have lots of problems, then send me a
sample.

> How can we update our files?

You need to correct the errors.  The best tool I have ever found for doing
this is BBEdit on the Mac.  I think that there must be similar tools on
Windows, but I don't use Windows.

> We have looked in the manual for a description of the error numbers that
> appear in the results of CHECK, but we can't find any - is there one
> available somewhere?
>

I thought that the messages that the program gave were usually descriptive
enough.  They say things like "illegal use of delimiter in a word" and then
show you which word is bad.  Are there some that are hard to understand?

> Finally, we wonder how important it is to adapt to all these changes - what
> are the technical consequences if we continue using our transcriptions the
> way they are transcribed originally?
>

If you hope to contribute your files to the database, they would have to pass
CHECK, so that they could be converted to XML.  If you hope to use newer
versions of the program that we are now developing, you would also need to
fix them.  However, if you only plan to use the older versions of the
program and don't plan on sharing your data with other people, you could
continue with files that do not pass CHECK.  There would, of course, be
various errors and inconsistencies here and there, but probably only a few
of them would affect your output results.

--Brian MacW

> Best regards,
>
> Camilla Bardel, Johanna Wahlberg and Anna Gudmundson


