[Corpora-List] Keywords Generator (fwd)

Jason Baldridge jbaldrid at mail.utexas.edu
Mon Feb 18 15:16:26 UTC 2008


If you'd like to learn more about the *nix commands and how to roll your
own, check out Chapter 3 of Chris Brew and Mark Moens's book draft:
http://www.ling.ohio-state.edu/~cbrew/2007/spring/684.02/dilbook.pdf

We also have a tips and tricks wiki for UT Austin's compling lab that
includes some notes on Unix commands:

http://comp.ling.utexas.edu/wiki/doku.php/tips_and_tricks#handy_unix_commands

Also, on a related note, we put Peyton Todd's corpus linguistics compilation
(posted to the Corpora list some time ago) on our wiki and added to it:

http://comp.ling.utexas.edu/wiki/doku.php/corpus_linguistics

Others are welcome to add to the wiki if they wish.

Jason

On Feb 18, 2008 8:44 AM, Trevor Jenkins <trevor.jenkins at suneidesis.com>
wrote:

> On Mon, 18 Feb 2008, True Friend <true.friend2004 at gmail.com> asked for
> help:
>
> Antconc has a word frequency count feature. Why not use that?
>
> Ben Allison has given you a UNIX solution. Here's mine:
>
> tr "[:space:]" "\n" < Sense\ and\ Sensibility.txt |
>   tr "[:upper:]" "[:lower:]" | tr -d "[:punct:]" |
>   sort | uniq -c | sort -n > SS-list
>
> Change "Sense\ and\ Sensibility.txt" and "SS-list" to whatever your own
> files are called. You can tell what I've been playing with recently. ;-)
>
> The difference between mine and Ben's is mine relies solely upon standard
> filters that should be available on every UNIX machine. You might not have
> Perl installed, which is required by Ben's version. Of course, you might
> not have the GNU version of textutils, which I'm relying upon. We're both
> sorting on ascending frequency.
>
> > Hi Folks
> I need a program/script (even for *nix) that can provide the frequency of
> a wordlist from two corpora. Actually I have made this list by comparing
> two word lists, one from general English (specifically of Pakistani
> origin) and one from law English (also of Pakistani origin). I now want to
> present these keywords with their frequencies in both corpora as proof
> that these words are more frequent in law. Keywords are generated by
> Antconc.
> Is there any script/tool that can generate a parallel list of
> frequencies of each word in both corpora?
> Regards
> M Shakir Aziz
> A Corpus Linguistics Student
> Pakistan
>
> --
> محمد شاکر عزیز
>
>
> Regards, Trevor
>
> <>< Re: deemed!
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>

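For the original question (a parallel list of frequencies for each keyword
in both corpora), something along these lines might do as a starting point.
It is only a sketch built from the same kind of standard filters as the
pipeline quoted above: the function names wordfreq and parallel_freq are
made up for illustration, and it assumes the keyword file holds one word
per line.

```shell
#!/bin/sh
# Sketch only: wordfreq and parallel_freq are hypothetical helper names.

# Print "word count" lines for one text file: split on whitespace,
# lowercase, delete punctuation, drop empty lines, then count.
wordfreq() {
    tr -s '[:space:]' '[\n*]' < "$1" |
        tr '[:upper:]' '[:lower:]' |
        tr -d '[:punct:]' |
        grep -v '^$' |
        sort | uniq -c |
        awk '{ print $2, $1 }'
}

# parallel_freq KEYWORDS CORPUS_A CORPUS_B
# Print each keyword with its frequency in both corpora, tab-separated.
# Assumption: KEYWORDS has one word per line.
parallel_freq() {
    {
        # Tag each stream so a single awk pass can tell them apart.
        wordfreq "$2" | sed 's/^/A /'
        wordfreq "$3" | sed 's/^/B /'
        sed 's/^/K /' "$1"
    } | awk '
        $1 == "A" { a[$2] = $3; next }
        $1 == "B" { b[$2] = $3; next }
        $1 == "K" && NF > 1 {
            w = tolower($2)
            # A word missing from a corpus prints as 0.
            printf "%s\t%d\t%d\n", w, a[w], b[w]
        }
    '
}
```

Calling it as "parallel_freq keywords.txt general.txt law.txt" (file names
are placeholders) prints one line per keyword: the word, its count in the
first corpus, and its count in the second. Because punctuation is deleted
before counting, a keyword such as "don't" would have to be listed as
"dont" to match.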


-- 
Jason Baldridge
Assistant Professor, Department of Linguistics
The University of Texas at Austin
http://comp.ling.utexas.edu/jbaldrid