[Corpora-List] Linguistic Tree Constructor

Michael Maxwell maxwell at umiacs.umd.edu
Wed Aug 29 14:46:56 UTC 2007


Hanane wrote:
> My data is a Word file but I didn't succeed in opening it through ltc
> ...
> how can I change the extension of a .doc file to .txt or .gen file? and
> does it help if I put my file under a format other than Word?

I don't know anything about ltc, but I can't imagine any program other
than Word being able to read a Word doc file.  (Or any comp ling program
being able to read any other word processing file, for that matter.)

As for making this Word doc file usable, it's not (just) the file
extension that you want to change, it's the contents of the file.
Probably you want to do something like 'File | Save As...' to save it in
some kind of text format.  The particular text format you want to use will
depend on your application; Word can save text files with one line per
paragraph, or it can break paragraphs into lines at the points where you
would get an apparent line break on-screen (or in a printed version of
the document).
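
To make the difference between those two layouts concrete, here is a
rough Python sketch that takes text saved with one line per paragraph and
re-wraps each paragraph into shorter lines.  The function name and the
72-column width are my own assumptions for illustration; Word breaks
lines according to the on-screen layout, which this does not try to
reproduce.

```python
import textwrap

def wrap_paragraphs(text: str, width: int = 72) -> str:
    """Re-wrap text that has one paragraph per line.

    Each input line is treated as a whole paragraph and broken into
    lines of at most `width` characters (hypothetical default).
    """
    wrapped = [textwrap.fill(paragraph, width=width)
               for paragraph in text.split("\n")]
    return "\n".join(wrapped)
```

Which of the two layouts you need still depends on what the application
expects to find on each line.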

Word will also let you choose whether to use LF or CR-LF as your line
break characters (I would suggest LF, assuming you'll be working with
Linux programs).
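
If the file has already been saved with CR-LF line breaks, they can also
be converted afterwards.  A minimal Python sketch, assuming the text is
in ASCII, UTF-8, or another single-byte-compatible encoding (it would
corrupt UTF-16); the function name is just illustrative:

```python
def to_lf(data: bytes) -> bytes:
    """Convert CR-LF and bare CR line breaks to LF."""
    # Replace CR-LF first so that a pair does not become two line breaks.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
```

To use it, read the file in binary mode (open(path, "rb")), pass the
bytes through to_lf, and write the result back in binary mode.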

And finally, unless the file is vanilla English, you'll probably need to
choose the encoding.  Again, the correct choice depends on your
application program, but for most purposes today, Unicode in the UTF-8
encoding would be appropriate (and if it gives you a choice, don't save it
with a BOM).
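
If the file was saved in some other encoding, or with a BOM, it can be
re-encoded afterwards.  The following Python sketch is one way to do
that under stated assumptions: the function name is made up, and the
fallback encoding "cp1252" is only a guess at what a Western-language
Windows installation might have used, so replace it with whatever
encoding was actually chosen in Word.

```python
import codecs

def to_utf8_no_bom(data: bytes, source_encoding: str = "cp1252") -> bytes:
    """Return the text re-encoded as UTF-8 without a BOM.

    `source_encoding` is used only when no BOM is found; its default
    is an assumption, not something the file itself tells us.
    """
    if data.startswith(codecs.BOM_UTF8):
        # Already UTF-8: just drop the three BOM bytes.
        text = data[len(codecs.BOM_UTF8):].decode("utf-8")
    elif data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec reads the BOM to pick the byte order
        # and does not include it in the decoded text.
        text = data.decode("utf-16")
    else:
        text = data.decode(source_encoding)
    # Python's "utf-8" codec never writes a BOM when encoding.
    return text.encode("utf-8")
```

This does not detect UTF-32 or guess among the many 8-bit encodings;
for those you need to know what the file was saved as.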

If this doesn't give you a file your program can work with, you may want
to sit down with someone who understands more about the nature of
application data and file formats.

   Mike Maxwell
   CASL/ U MD


_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora


