[Corpora-List] workstation advice for corpus linguistics work

Michal Ptaszynski ptaszynski at media.eng.hokudai.ac.jp
Tue Jan 18 13:16:44 UTC 2011


Dear Don

You might look for a configuration similar to the one below.

1. RAM: 24GB or more. If you are aiming for processing speed and have a
large budget, you might order higher-grade RAM than the usual crap they
put into PCs in stores.
2. Hard disk, RAID: if you wish to run lots of queries on the corpus in a
short time, I'd recommend SSDs, for example 4x256GB SSDs in RAID 0 (= 1 TB
of SSD storage). However, since SSDs have limited durability, I'd also
make frequent backups to a traditional hard drive. I was told that a year
is a long life for an SSD if you are a hard-core corpus analysis maniac. :)
3. CPU: depending on how much RAM you want to stuff your PC with, the CPU
and therefore the motherboard will also differ. For example, it is said
that Intel's i7 processors cannot effectively use more than 24GB of RAM.
If you want more, you should choose a Xeon, etc.
4. OS: Linux and Windows, both in x64 versions. Also, I'd recommend using
64-bit software, like Excel 2010. As for using Perl on 64-bit machines, a
couple of years ago there were still some problems with compiling, but
they should be resolved by now.
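
To make the trade-off in point 2 concrete, here is a rough sketch of the
RAID 0 arithmetic in Python. The function names and the 5% per-drive
failure probability are made up for illustration, and the failure estimate
assumes drives fail independently, which real drives may not do. RAID 0
stripes data without redundancy, so losing any one drive loses the array.

```python
def raid0_capacity_gb(drive_sizes_gb):
    """Usable RAID 0 capacity: number of drives times the smallest drive."""
    return len(drive_sizes_gb) * min(drive_sizes_gb)


def raid0_survival_probability(n_drives, p_drive_failure):
    """Chance that no drive fails in a given period.

    Assumes independent failures with the same probability per drive.
    RAID 0 has no redundancy, so one failed drive means the array is lost.
    """
    return (1.0 - p_drive_failure) ** n_drives


if __name__ == "__main__":
    drives = [256, 256, 256, 256]  # the 4x256GB example from point 2
    print("capacity:", raid0_capacity_gb(drives), "GB")  # 1024 GB, i.e. 1 TB

    p = 0.05  # hypothetical per-drive failure probability, not a measured value
    print("single drive survives:", 1.0 - p)
    print("4-drive RAID 0 survives:", raid0_survival_probability(len(drives), p))
```

With four drives the array is noticeably more likely to fail than any
single drive, which is why the regular backups to a traditional hard drive
matter.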

Best regards and good luck!
--
Michal PTASZYNSKI
Institute of Engineering, Hokkai-Gakuen University
High-Tech Research Center, Intelligent Techniques Laboratory 6,
Minami 26, Nishi 11, Chuo-ku, Sapporo, 064-0926, Japan
ptaszynski at hgu.jp, ptaszynski at ieee.org
TEL: +81-11-841-1161 (ext.: 7796), FAX: +81-11-551-2951
http://arakilab.media.eng.hokudai.ac.jp/~ptaszynski/

----------------------------
From: Justin Washtell <lec3jrw at leeds.ac.uk>
To: Donald E Hardy <donhardy at unr.edu>, "CORPORA at UIB.NO" <CORPORA at uib.no>
Date: Mon, 17 Jan 2011 21:17:32 +0000
Subject: Re: [Corpora-List] workstation advice for corpus linguistics work

Dear all,

I’m looking for advice on purchasing a workstation for corpus work.

This is the software that I will be using and the operating systems that
I think I will need:

R (e.g., for multiple runs of Fisher’s exact test)
Word
Windows
Linux
Perl programs (multiple text manipulation programs)
Excel
Access
Perhaps other SQL applications
XAIRA
ICECUP 3.1

I’m sure there will be other software packages added to the list.

Corpora include data gathered from Corpus of Contemporary American  
English, Corpus of Historical American English, BNC, Treebank, ICE-GB,  
Brown, Frown

I’m looking at Dell workstations.

Recommendations I’m looking for are operating system(s), CPU, RAM, Video  
card, hard disk, RAID.

I am relatively computer literate (I program in Perl and manage a server),
and I do have expert technicians for help and advice locally. However, I
don’t have anyone locally for advice on the best system setup for corpus
linguistic work.

Thanks very much,

Don Hardy

_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora

