Arabic-L:LING:Followup on Concordancer Query

Dilworth Parkinson dilworth_parkinson at byu.edu
Mon Mar 7 22:48:24 UTC 2005


-------------------------------------------------------------------------
Arabic-L: Mon 07 Mar  2005
Moderator: Dilworth Parkinson <dilworth_parkinson at byu.edu>
[To post messages to the list, send them to arabic-l at byu.edu]
[To unsubscribe, send message from same address you subscribed from to
listserv at byu.edu with first line reading:
            unsubscribe arabic-l                                      ]

-------------------------Directory------------------------------------

1) Subject:Followup on Concordancer Query

-------------------------Messages-----------------------------------
1)
Date: 07 Mar  2005
From:wasamy at umich.edu
Subject:Followup on Concordancer Query

Hi Martha, nice to 'see' you again.

The following assumes you are using the latest Windows and the latest
Office :)

The first step is to create a working copy of a small corpus, say an
hour's worth of kalaam from al-Jazira, or some such, which you paste
into Word.

Make no formatting changes to this file; any formatting can be done
later in a working copy of the file.

You will need to decide which encoding scheme to save the file under,
and there are important considerations.  I hope others on the list
will say something about this.
Select either Windows (Arabic), also called code page 1256, which works
well, or Unicode, which is said to be the future for all things
computerese.

Thus, for strategic purposes, it might be better to do your work with
Unicode, instead of Windows (1256).  I have not worked with Unicode yet,
believing as I do that I can convert my files later if I have to.
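
A later conversion of this kind can be scripted.  What follows is a
minimal Python sketch, not a description of any particular tool: the
function names are made up for illustration, and it assumes the source
file really was saved in Windows-1256.

```python
from pathlib import Path


def convert_cp1256_to_utf8(data: bytes) -> bytes:
    """Re-encode Windows-1256 (code page 1256) bytes as UTF-8."""
    # "cp1256" is Python's name for the Windows (Arabic) code page.
    return data.decode("cp1256").encode("utf-8")


def convert_file(src: str, dst: str) -> None:
    """Read a cp1256 text file and write a UTF-8 copy (hypothetical helper)."""
    Path(dst).write_bytes(convert_cp1256_to_utf8(Path(src).read_bytes()))


if __name__ == "__main__":
    sample = "كلام من الجزيرة".encode("cp1256")
    print(convert_cp1256_to_utf8(sample).decode("utf-8"))
```

Writing to a separate destination file keeps the original untouched,
in the spirit of leaving the source file alone and working on a copy.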

You will want to save your working file as "text" so as to take out all
the formatting codes from your working file, which you don't need.

I have found that Notepad is the application that I am comfortable doing
this with.  In other words, I select all the text in Word (ctrl-a), copy
it, then paste it into Notepad.
Now, the Notepad document can be saved with the Windows 1256 encoding
scheme or Unicode.

To save the Notepad working file as a Unicode file, select Save As; in
the "Encoding" box, select UTF-8, and in the "Save as type" box, select
"Text Documents".  Otherwise just save it as text.
(It is also helpful to save the file with line breaks (LF), an option
which seems to have disappeared from my latest Office(!))
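
One practical wrinkle when such a file is read by a script rather than
by concordance software: Notepad of this era usually writes a byte-order
mark (BOM) at the start of a UTF-8 file and uses CR+LF line endings.
The Python sketch below assumes both of those things about the file;
the function name is hypothetical.

```python
def read_notepad_text(data: bytes) -> str:
    """Decode UTF-8 bytes as saved by Notepad into a clean string."""
    # "utf-8-sig" drops a leading BOM if present and is harmless otherwise.
    text = data.decode("utf-8-sig")
    # Assumption: plain LF line endings are wanted downstream.
    return text.replace("\r\n", "\n")


if __name__ == "__main__":
    raw = b"\xef\xbb\xbf" + "سطر أول\r\nسطر ثان".encode("utf-8")
    print(read_notepad_text(raw).split("\n"))
```

Without the BOM handling, the first word of the file carries an
invisible extra character and will not match in a search.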

Next, download trial versions of different concordance software.  Give
yourself plenty of time to 'fiddle'.  Mine is currently set up, so I
don't remember exactly what I did.

Basically, start the concordance software and have it load your work
file.  If you see garbage, then it is an encoding problem.  Concordance
will let you change the encoding settings.  With Concordance, it was
also necessary to swap the location of the two context columns so as to
display the text in correct sequential order.  You will find there to be
a lot of 'adjustments' to do until you are comfortable.
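
To make the column swap concrete, here is a minimal keyword-in-context
sketch in Python.  It is not how Concordance works internally; the
function names and the whitespace-only tokenizing are assumptions made
for illustration.  The idea is that in a table laid out left to right,
the context that follows the keyword in a right-to-left text has to go
in the left-hand column.

```python
def kwic(text: str, keyword: str, width: int = 3):
    """Return (preceding, keyword, following) for each exact token match."""
    tokens = text.split()  # naive whitespace tokenizing, no morphology
    rows = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            before = " ".join(tokens[max(0, i - width):i])
            after = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((before, tok, after))
    return rows


def swap_for_rtl(row):
    """Reorder one row as (left column, keyword, right column) for RTL text."""
    before, key, after = row
    # Reading right to left, the preceding words belong on the right.
    return (after, key, before)


if __name__ == "__main__":
    text = "قال الرجل إن الكلام كثير وإن الكلام قليل"
    for row in kwic(text, "الكلام", width=2):
        print(swap_for_rtl(row))
```

Note that exact token matching will miss prefixed forms such as
"والكلام", which is one reason some fiddling is usually needed.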

I have been able to do work with Concordance, but it has not been
completely compatible with Arabic.  So I have had to develop some
workarounds.  RJWatt, the author, has recently released an upgrade, but
I've not yet installed it.

Our friends at Nijmegen have a lot of experience with this, and to the
best of my knowledge, they have been using MonoConc.  Download each
program and try it out.  I experimented with both about 4 years ago.  I
don't remember why I chose Concordance.

Good luck, Martha.

Waheed

--------------------------------------------------------------------------
End of Arabic-L:  07 Mar  2005


