new transcription norms

camilla bardel camilla.bardel at fraita.su.se
Mon Sep 22 13:50:18 UTC 2003


Dear Brian,
thank you for answering all our questions. I will
try to clarify some points:

Our project is on L2 acquisition of Italian. We
are collecting and transcribing data from
university students of Italian at the University of
Stockholm.

Regarding the tabs on the header tiers: When
transcribing, we have always used the Ctrl + 1;
Ctrl + 2 etc. function for the respective
participants of the recording, and CHECK has
never reacted to the tabs before, which is why we
found it strange that it suddenly did. Anyway, we
followed your advice with CHSTRING and fixed
it. Thank you!

We also tried to use CHSTRING for replacing ()
with (xxx). (We have been trying to avoid
guessing what the learner would have said, had
he/she completed the word - this is why we
have been using empty parentheses () for
incomplete words). This did not work, however.
We find it strange that CHSTRING
does not accept (), since it should accept all
ASCII characters.

Would you say it's OK to use "xxx" within the
parentheses, instead of the actual letters one
believes the speaker might be leaving out?

We will take your advice and replace "-:" with ":".
Then should we or should we not leave a space
before the colon at the end of a word?

As for the error numbers, we were just curious
to know if there was a list that we might have
missed. As a matter of fact the messages in
CHECK are quite clear, so that's not a
problem.

Our problems with the placement of some of
our bullets originate in the fact that one transcriber,
who used to work with us, decided to place all
the bullets at the beginning of the transcribed
material, i.e. after *XYZ:

There was actually no particular reason for this,
as far as I understand; she probably just
thought that one could place them
either before or after the transcribed material,
in either case creating a space between two
bullets where the sound was represented.

We will change it; the only way of doing that
without having to redo the sound connection is
probably to cut and paste...
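
A script might do the cut and paste mechanically. The following is only a
minimal Python sketch under stated assumptions: that each bullet is wrapped
in a pair of delimiter characters (the "\x15" default is a guess and should
be checked against the actual files), and that each utterance fits on one
line. The function name is made up for this example and is not part of CLAN.

```python
import re

# Assumption: a bullet is enclosed in a pair of delimiter characters.
# "\x15" is a guess at that delimiter; adjust it to match the real files.
BULLET_DELIM = "\x15"


def move_leading_bullet(line: str, delim: str = BULLET_DELIM) -> str:
    """Move a bullet placed right after '*XYZ:<tab>' to the end of the line.

    The line is expected without its trailing newline. Lines that do not
    start with a speaker label followed by a bullet are returned unchanged.
    """
    d = re.escape(delim)
    pattern = re.compile(rf"^(\*[^:\t]+:\t)({d}[^{d}\n]*{d})[ \t]*(.*)$")
    match = pattern.match(line)
    if match is None:
        return line
    prefix, bullet, rest = match.groups()
    if not rest.strip():
        # Nothing but the bullet on this tier: leave it for manual review.
        return line
    return f"{prefix}{rest.rstrip()} {bullet}"


before = "*STU:\t\x15snd_0_1500\x15 ciao come stai ."
after = move_leading_bullet(before)
```

Utterances that continue over several lines would still need to be handled
by hand, since the sketch only looks at one line at a time.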

Finally, I would like to add that when the corpus
is fit for it, we do hope to contribute to the
database!

Thanks again for your help.
Camilla Bardel

> Hello,
> we have been working on a large corpus of transcribed and coded Italian
> interlanguage for a couple of years.
>

Interlanguage is a somewhat ambiguous term.  For some folks, it means "a
stage in L2 acquisition".  For others it means some complex sort of
interlanguage code-switching or mixing.


> We redownloaded childes on September 4, 2003, and suddenly CHECK doesn't
> accept our transcribed files anymore. It reacts to the tabs following the
> participant names. We also noticed that the symbol "-:" wasn't accepted
> anymore. Has it been replaced by another symbol or do we have to remove all
> the occurrences of this symbol from our files?
>

There have been truly massive changes in CHAT over the last two months.  The
manual has been updated to cover the new codes, but it is a very good idea
to discuss these changes in info-chibolts, since it is often difficult to
work through all the details of the manual.

First, regarding the tabs on the header tiers and elsewhere, the standard
has always been that immediately after the colon there should be one tab and
no other character.  However, until recently, CHECK did not enforce this
standard.  It does now.  It is pretty easy to fix these globally with either
CHSTRING or else some method of applying regular expression replacement.
For example, some corpora have lots of spaces after the tab, so I replace
tab+space with tab and that replacement can't really hurt anything.
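
As a rough illustration of the tab+space replacement, here is a minimal
Python sketch of the regular-expression route. It is not part of CLAN; the
function name collapse_tab_space is made up for this example, and it simply
applies the substitution to the whole text of a file.

```python
import re


def collapse_tab_space(text: str) -> str:
    """Replace a tab followed by one or more spaces with a single tab."""
    return re.sub(r"\t +", "\t", text)


# Two tiers with stray spaces after the tab that follows the colon.
sample = "*STU:\t  buongiorno .\n@Comment:\t note\n"
cleaned = collapse_tab_space(sample)
```

Reading each file, passing its contents through the function, and writing
the result back would apply the fix to a whole corpus.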

Regarding the -: symbol, the problem was that it was redundant with the
word-internal colon that represents drawling or lengthening.  For the new
XML schema, this kind of ambiguity is fatal and had to be fixed. The easiest
way to fix this is to globally replace " -:" with ":".  This moves the
lengthening to the end of the word, which is legal.  The colon can be placed
anywhere inside a word to represent lengthening of a segment and it will be
ignored by programs such as FREQ.
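
For those who prefer a script to CHSTRING, the global replacement could be
sketched in Python roughly as follows. The function name is made up for this
example. Whether the search string should include a leading space depends on
how the marker was typed in the files, so it is left as a parameter; the
default shown here is an assumption.

```python
from typing import Tuple


def replace_dash_colon(text: str, old: str = "-:", new: str = ":") -> Tuple[str, int]:
    """Replace every occurrence of `old` with `new`.

    Returns the new text together with the number of replacements made,
    so the change can be checked file by file.
    """
    count = text.count(old)
    return text.replace(old, new), count


# Marker typed directly after the word.
fixed, n = replace_dash_colon("*STU:\tcasa-: bella .\n")

# Marker typed after a space: pass the space as part of the search string.
fixed_spaced, n_spaced = replace_dash_colon("*STU:\tcasa -: bella .\n", old=" -:")
```

Reporting the count makes it easy to spot files where the marker was typed
in some other way and nothing was replaced.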

> In some of our transcriptions we have placed the bullet directly after the
> participants tier.

Why did you put the bullet here?  If there is a good reason for doing this,
we could change CHECK and the Schema, so please provide a bit more detail.

> This has not been a problem before, but now CHECK comes
> up with a message saying:
> Item '<bullet>' must be preceded by text or '0'.(73)
>
> Are there other changes that we need to know about in order to get CHECK to
> accept our files?

I can't tell without seeing your files.  Try to fix one file and see if you
have any other problems.  If you have lots of problems, then send me a
sample.

> How can we update our files?

You need to correct the errors.  The best tool I have ever found for doing
this is BBEdit on the Mac.  I think that there must be similar tools on
Windows, but I don't use Windows.

> We have looked in the manual for a description of the error numbers that
> appear in the results of CHECK, but we can't find any - is there one
> available somewhere?
>

I thought that the messages that the program gave were usually descriptive
enough.  They say things like "illegal use of delimiter in a word" and then
show you which word is bad.  Are there some that are hard to understand?

> Finally, we wonder how important it is to adapt to all these changes - what
> are the technical consequences if we continue using our transcriptions the
> way they are transcribed originally?
>

If you hope to contribute your files to the database, they would have to pass
CHECK, so that they could be converted to XML.  If you hope to use newer
versions of the program that we are now developing, you would also need to
fix them.  However, if you only plan to use the older versions of the
program and don't plan on sharing your data with other people, you could
continue with files that do not pass CHECK.  There would, of course, be
various errors and inconsistencies here and there, but probably only a few
of them would affect your output results.

--Brian MacW

> Best regards,
>
> Camilla Bardel, Johanna Wahlberg and Anna Gudmundson









--
Camilla Bardel
Institutionen för Franska och Italienska
Stockholms Universitet
106 91 Stockholm
tel. 0046-08-16 35 88
--


