[Corpora-List] workstation advice for corpus linguistics work

Florian Petran florian.petran at googlemail.com
Tue Jan 18 15:43:58 UTC 2011


Basically what Michal said, but a little more on SSDs: there are
different options if you go down that road. Multi-Level Cell (MLC) SSDs
are cheaper (about half the price per GB) but less durable than
Single-Level Cell (SLC) ones, meaning they tolerate fewer writes. They
are also slightly slower. But according to Intel, even their MLC X25-M
SSD will last at least 5 years at a rate of 20GB written per day.[1] If
you expect to exceed that by much, you could consider using the SSD for
read-only data and maybe an HDD RAID or a plain HDD for the writes, if
that's feasible for the type of work you're doing.
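
To put the Intel figure in perspective, here is a rough back-of-the-envelope
sketch in Python. It assumes that "5 years at 20GB per day" can be read as a
fixed write budget, which is a simplification: the figure is a lower bound, and
real wear depends on how the drive handles the writes. The function names and
the 100GB/day workload are made up for illustration.

```python
def total_writes_gb(gb_per_day: float, years: float) -> float:
    """Total data written over the period, ignoring leap days."""
    return gb_per_day * 365 * years


def years_at_rate(budget_gb: float, gb_per_day: float) -> float:
    """How long a fixed write budget lasts at a given daily write rate."""
    return budget_gb / (gb_per_day * 365)


# Intel's quoted figure: 20 GB/day for at least 5 years.
budget = total_writes_gb(20, 5)       # 36500 GB, about 36.5 TB

# Hypothetical heavier workload (an assumption): 100 GB written per day.
heavy = years_at_rate(budget, 100)    # 1.0 year

print(f"write budget: {budget:.0f} GB; at 100 GB/day: {heavy:.1f} years")
```

So if you write five times as much per day, the same budget is used up in
about a year instead of five.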

RAID: An SSD RAID, according to my information, doesn't gain that much
in performance over a single large SSD; at least the gain is not
comparable to what you get from an HDD RAID. I also understand that
SSDs can't use their TRIM feature when in a RAID. That means that their
performance degrades further over time, since TRIM tells the drive
which blocks are no longer in use, so it can clean them up in advance
and avoid unnecessary read-and-rewrite work when writing. They even
make 1TB SSDs now, or if you want to combine several smaller ones, you
can maybe use something like LVM or its Windows equivalent.
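
If you end up on Linux, one way to check whether TRIM actually reaches a
device (a single drive or a RAID/LVM volume) might look like the sketch below.
It assumes a reasonably recent kernel that exposes discard_max_bytes under
/sys/block; the device names "sda" and "md0" are placeholders, and the
function name is my own.

```python
from pathlib import Path


def supports_discard(device: str, sysfs_root: str = "/sys/block") -> bool:
    """Return True if the kernel reports a non-zero discard limit for the device."""
    limit_file = Path(sysfs_root) / device / "queue" / "discard_max_bytes"
    try:
        return int(limit_file.read_text().strip()) > 0
    except (OSError, ValueError):
        # Missing file or unreadable value: treat as "no TRIM available".
        return False


if __name__ == "__main__":
    # Placeholder device names (assumptions, adjust to your system).
    for name in ("sda", "md0"):
        print(name, supports_discard(name))
```

A value of 0 (or a missing file) suggests that discard requests are not
passed through to that device.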

OS: If you decide to use an SSD with Windows, you should definitely use
Windows 7, as it is the only Windows version that is optimized for both
HDDs and SSDs. Also, you might be unable to use the drive's TRIM feature
with the drivers supplied with older versions of Windows.

Best regards,

Florian Petran

[1] http://www.electroiq.com/index/display/article-display/337689/articles/solid-state-technology/semiconductors/industry-news/business-news/2008/08/intels-take-on-the-hdd-vs-ssd-debate.html

2011/1/18 Michal Ptaszynski <ptaszynski at media.eng.hokudai.ac.jp>:
> Dear Don
>
> You might look for a configuration similar to the one below.
>
> 1. RAM: 24GB or more. If you are aiming for processing speed and have a
> large budget, you might order higher-grade RAM than the usual crap they put
> into PCs in stores.
> 2. Hard disk, RAID: if you wish to run lots of queries on the corpus in a
> short time, I'd recommend SSDs, for example 4x256GB SSDs in RAID 0 (= 1 TB
> SSD). However, since SSDs have limited durability, I'd also make frequent
> backups to a traditional hard drive. I was told that a year is long if you
> are a hard-core corpus analysis maniac. :)
> 3. CPU: depending on how much RAM you want to stuff your PC with, the CPU
> and therefore the motherboard will also differ. For example, it is said that
> Intel's i7 processors effectively handle no more than 24GB of RAM. If you
> want more, you should choose a Xeon, etc.
> 4. OS: Linux and Windows, both in x64. Also, I'd recommend using 64-bit
> software, like Excel 2010. As for using Perl on 64-bit machines, a couple of
> years ago there were still some problems with compiling, but they should be
> resolved by now.
>
> Best regards and good luck!
> --
> Michal PTASZYNSKI
> Institute of Engineering, Hokkai-Gakuen University
> High-Tech Research Center, Intelligent Techniques Laboratory 6,
> Minami 26, Nishi 11, Chuo-ku, Sapporo, 064-0926, Japan
> ptaszynski at hgu.jp, ptaszynski at ieee.org
> TEL: +81-11-841-1161 (ext.: 7796), FAX: +81-11-551-2951
> http://arakilab.media.eng.hokudai.ac.jp/~ptaszynski/
>
> ----------------------------
> From: Justin Washtell <lec3jrw at leeds.ac.uk>
> To: Donald E Hardy <donhardy at unr.edu>, "CORPORA at UIB.NO" <CORPORA at uib.no>
> Date: Mon, 17 Jan 2011 21:17:32 +0000
> Subject: Re: [Corpora-List] workstation advice for corpus linguistics work
>
> Dear all,
>
> I’m looking for advice on purchasing a workstation for corpus work.
>
> This is the software that I will be using, along with the operating systems
> that I think I will need:
>
> R (e.g., for multiple runs of Fisher’s exact test)
> Word
> Windows
> Linux
> Perl programs (multiple text manipulation programs)
> Excel
> Access
> Perhaps other SQL applications
> XAIRA
> ICECUP 3.1
>
> I’m sure there will be other software packages added to the list.
>
> Corpora include data gathered from Corpus of Contemporary American English,
> Corpus of Historical American English, BNC, Treebank, ICE-GB, Brown, Frown
>
> I’m looking at Dell workstations.
>
> Recommendations I’m looking for are operating system(s), CPU, RAM, Video
> card, hard disk, RAID.
>
> I am relatively computer literate (program in Perl, manage a server); and, I
> do have expert technicians for help and advice locally.  However, I don’t
> have anyone locally for advice on the best system setup for corpus
> linguistic work.
>
> Thanks very much,
>
> Don Hardy
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>



