[Corpora-List] Linguistic Tree Constructor

Ulrik Petersen ulrikp at hum.aau.dk
Wed Aug 29 15:08:57 UTC 2007


maxwell at umiacs.umd.edu wrote:
> Hanane wrote:
>
>> My data is a Word file, but I didn't succeed in opening it through LTC
>> ...
>> How can I change the extension of a .doc file to .txt or .gen? And
>> does it help if I put my file in a format other than Word?
>>
>
> I don't know anything about ltc, but I can't imagine any program other
> than Word being able to read a Word doc file.  (Or any comp ling program
> being able to read any other word processing file, for that matter.)
>
> As for making this Word doc file usable, it's not (just) the file
> extension that you want to change, it's the contents of the file. 
> Probably you want to do something like 'File | Save As...' to save it in
> some kind of text format.  The particular text format you want to use will
> depend on your application; Word can save-as text files where new lines
> happen at each paragraph, or it can break paragraphs into lines at the
> points where you would get an apparent line break on-screen (or in a
> printed version of the document).
>
> Word will also let you choose whether to use LF or CR-LF as your line
> break characters (I would suggest LF, assuming you'll be working with
> Linux programs).
>
> And finally, unless the file is vanilla English, you'll probably need to
> choose the encoding.  Again, the correct choice depends on your
> application program, but for most purposes today, Unicode in the UTF-8
> encoding would be appropriate (and if it gives you a choice, don't save it
> with a BOM).
>
> If this doesn't give you a file your program can work with, you may want
> to sit down with someone who understands more about the nature of
> application data and file formats.
>
>    Mike Maxwell
>    CASL/ U MD
>   


Thanks, Dr. Maxwell.  As the author of Linguistic Tree Constructor, I
had already sent Hanane a reply off-list, saying much the same thing as
you did, only in less detail.  Thanks again.
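
For anyone reading this in the archive, the clean-up described in the
quoted message (LF line breaks, UTF-8, no BOM) can also be applied after
the fact to a file that Word has already saved as text.  What follows is
only a minimal sketch in Python, not part of LTC.  The names
normalize_text and convert_file are made up for illustration, and the
source_encoding argument is an assumption you have to supply yourself:
it must match whatever encoding Word actually used when saving.

```python
from pathlib import Path


def normalize_text(raw: bytes, source_encoding: str = "utf-8") -> bytes:
    """Return raw re-encoded as UTF-8 with LF line breaks and no BOM.

    source_encoding is an assumption supplied by the caller; it must
    match the encoding the text file was actually saved in.
    """
    text = raw.decode(source_encoding)
    # Drop a leading byte order mark if the decoder left one in place.
    if text.startswith("\ufeff"):
        text = text[1:]
    # CR-LF (Windows) and bare CR both become a single LF.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    return text.encode("utf-8")


def convert_file(src: str, dst: str, source_encoding: str = "utf-8") -> None:
    """Read src as bytes, normalize it, and write the result to dst."""
    Path(dst).write_bytes(
        normalize_text(Path(src).read_bytes(), source_encoding)
    )
```

A call such as convert_file("mydata.txt", "mydata-clean.txt", "cp1252")
(both file names hypothetical) would then write the normalized copy.
This does nothing for a real .doc file; that still has to go through
'File | Save As...' in Word first.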

Ulrik Petersen
--
Ulrik Petersen, PhD candidate
University of Aalborg, Denmark
http://ulrikp.org -- Homepage
http://emdros.org -- Emdros is a corpus query system

