[Corpora-List] Cleaning text to take word frequency

True Friend true.friend2004 at gmail.com
Tue Jun 3 07:59:39 UTC 2008


Thanks for your message. I did not write the Perl script myself; a subscriber
of this list wrote it at my request. It is obviously better to use hashtables,
and C# does have hashtables too. I used arrays only to practise what I've
learnt. Your valuable suggestions about delimiters and changing the code will
help me improve it. Next time I'll use hashtables, take a better approach to
structuring the methods, and remove the duplicate methods.
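
As a note to self, here is a minimal sketch of the approach suggested: one
delimiter definition shared by all the code, and a hash table for the counts.
It is written in Python rather than C# or Perl purely for brevity, and it is
not taken from either script. The names DELIMS and word_frequencies, the
delimiter set, and the sample sentence are all assumptions made for
illustration.

```python
import re
from collections import Counter

# Assumed delimiter set, declared once so every caller splits text the same way.
DELIMS = re.compile(r"[\s.,;:!?\"()\[\]]+")


def word_frequencies(text):
    """Count lower-cased tokens in a Counter (a dict, i.e. a hash table)."""
    # Splitting can leave empty strings at the edges, so filter them out.
    tokens = [t for t in DELIMS.split(text.lower()) if t]
    return Counter(tokens)


# Small sample text for comparing output against another implementation.
freqs = word_frequencies("The cat sat. The dog sat, too!")
for word, count in freqs.most_common():
    print(word, count)
```

Running the same short text through both programs should make it easier to
see where their tokenisation differs.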
Thanks for your kind suggestions.
Regards

On Tue, Jun 3, 2008 at 11:28 AM, jeremy ellman <jeremyellman at gmail.com>
wrote:

> Hi,
>
> The two implementations are quite different. In Perl you are using hashes,
> and in C# you are using arrays. C# has hashtables and most other features
> that Perl has, including iteration over hashtables.
>
> Split and regular expressions are identical between C# and Perl (leastwise,
> I've never found a difference), but I do notice that you are using different
> delimiters, as you declare them twice. I suggest that you make delims an
> instance variable and declare it once. Then you can test your application
> with a small, paragraph-length text to understand what the two programs
> do differently.
>
> It is also a bad idea to use static methods in C# unless you need to (and
> you don't).
>
> Incidentally, it is much faster to write Corpus applications in Perl,
> although C# apps are more robust.
>
> Jeremy
>
>
>> _______________________________________________
>> Corpora mailing list
>> Corpora at uib.no
>> http://mailman.uib.no/listinfo/corpora
>>
>>
>


-- 
Muhammad Shakir Aziz محمد شاکر عزیز