[Corpora-List] Cleaning text to take word frequency

True Friend true.friend2004 at gmail.com
Sun Jun 1 11:07:38 UTC 2008


Hi,
I am a corpus linguistics student, and I am learning C# for this purpose as well.
I've created a simple application to find the frequency of a given word in
two files. This application is actually a practice version, in C#, of a
Perl script that a respected subscriber of this list (Alexander Schutz) wrote
for me at my request on this list. I needed it then; now I am trying to
program on my own, so I tried to implement the same idea in C#.
I have done all that and it works, but it does not give me exactly the same
word frequencies as the Perl script does. Here is what the application does:
it takes three files as input. The first is a wordlist, which it reads line by
line and stores in an array. The other two are plain text files, which are
split with the C# String.Split() method using an array of separator characters
such as ';', ',' etc. The resulting string array was cleaned of such
characters, but I still couldn't get the same counts: the frequency of most
words is lower than in the Perl script (which does the same thing). After
trying on my own, I am asking here whether someone can help me. I am attaching
both files (the Perl script and the C# .cs file) so you can examine the code
and point out where I am wrong.
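For comparison, here is a minimal sketch of a different tokenization approach. It is not the attached frequency.cs, and since the Perl script is not shown here, it is not a claim about what that script does. Two common reasons for lower counts with String.Split() are that it only splits on the characters listed in the separator array, so any unlisted punctuation (quotes, '?', '!', brackets, tabs, line breaks) stays attached to the word, and that C# string comparison is case-sensitive by default, so "The" does not match "the". For example, splitting "The cat; the CAT, the dog." on ' ', ';' and ',' yields "The", "cat", "", "the", "CAT", "", "the", "dog.", so an exact match misses "The", "CAT" and "dog.". The sketch instead extracts words with the regular expression \w+ and counts them case-insensitively. The class and method names (WordFrequency, Count), the command-line layout and the choice to ignore case are assumptions for illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical example class, not the attached frequency.cs.
public static class WordFrequency
{
    // \w+ matches runs of word characters (Unicode letters, digits, underscore and similar),
    // so every other character, punctuation or whitespace, acts as a separator.
    private static readonly Regex WordPattern = new Regex(@"\w+", RegexOptions.Compiled);

    // Counts how often each wordlist entry occurs in the text.
    // Assumption: matching is case-insensitive ("The" counts as "the").
    public static Dictionary<string, int> Count(string text, IEnumerable<string> wordlist)
    {
        var counts = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        foreach (string entry in wordlist)
        {
            string word = entry.Trim();   // drop surrounding whitespace from the wordlist line
            if (word.Length > 0)
                counts[word] = 0;
        }

        foreach (Match m in WordPattern.Matches(text))
        {
            // Only tokens that appear in the wordlist are counted.
            if (counts.TryGetValue(m.Value, out int n))
                counts[m.Value] = n + 1;
        }
        return counts;
    }

    // Hypothetical command line: <wordlist file> <text file> [<text file> ...]
    public static void Main(string[] args)
    {
        if (args.Length < 2)
        {
            Console.Error.WriteLine("usage: WordFrequency <wordlist> <textfile> [<textfile> ...]");
            return;
        }

        string[] wordlist = File.ReadAllLines(args[0]);
        for (int i = 1; i < args.Length; i++)
        {
            Dictionary<string, int> counts = Count(File.ReadAllText(args[i]), wordlist);
            Console.WriteLine(args[i]);
            foreach (KeyValuePair<string, int> pair in counts)
                Console.WriteLine("{0}\t{1}", pair.Key, pair.Value);
        }
    }
}
```

With this sketch, Count("The cat; the CAT, the dog.", new[] { "the", "cat" }) gives 3 for "the" and 2 for "cat". Note that \w does not include the apostrophe, so a form like "don't" is counted as "don" and "t"; whether that matches the Perl results depends on how that script tokenizes. If the Perl script turns out to be case-sensitive, dropping the StringComparer.OrdinalIgnoreCase argument restores exact matching.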
Regards
-- 
Muhammad Shakir Aziz محمد شاکر عزیز
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: frequency.cs
URL: <http://listserv.linguistlist.org/pipermail/corpora/attachments/20080601/2afaca2b/attachment-0001.ksh>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: wordlist_corpus_freq.pl
Type: application/octet-stream
Size: 5433 bytes
Desc: not available
URL: <http://listserv.linguistlist.org/pipermail/corpora/attachments/20080601/2afaca2b/attachment-0001.obj>
-------------- next part --------------
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora