[Corpora-List] Common connectors

Yuanyong Wang wyy at cse.unsw.EDU.AU
Sun Apr 24 09:18:49 UTC 2005


    Hi Wallace:

           I'm doing a bit of research in NLP as well. I think it depends
on what you mean by "common", since there are different
sets of common words for different domains. If the connectors play
a role similar to that of function words, then I would suggest they are all
common (irrespective of domain). In any case, 270 words extracted
from a five-million-word corpus does not seem to be a very big set. I guess
you want to make some differentiation within the set itself, in which case
relative frequency would be useful. I don't know much about your
research context, but I hope this sheds a little light on the matter.
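
To make the relative-frequency idea concrete, here is a minimal Python sketch.
It is only an illustration under my own assumptions: the connector labels and
counts are invented placeholders, the per-million normalisation and the 90%
coverage cut-off are arbitrary choices, and the function names are hypothetical
rather than taken from any library.

```python
def relative_frequencies(counts, corpus_size, per=1_000_000):
    """Normalise raw counts to occurrences per `per` tokens of the corpus."""
    return {w: c * per / corpus_size for w, c in counts.items()}


def common_by_coverage(counts, coverage=0.9):
    """Return the most frequent items that together reach the given share
    of all connector tokens (one possible, arbitrary notion of "common")."""
    total = sum(counts.values())
    selected, running = [], 0
    for w, c in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        if running >= coverage * total:
            break
        selected.append(w)
        running += c
    return selected


# Invented placeholder counts, NOT real corpus data.
counts = {"conn_a": 5000, "conn_b": 3000, "conn_c": 1500,
          "conn_d": 400, "conn_e": 100}
corpus_size = 5_000_000  # tokens in the whole corpus

rel = relative_frequencies(counts, corpus_size)
common = common_by_coverage(counts, coverage=0.9)
print(rel["conn_a"])  # occurrences per million tokens
print(common)
```

A cumulative-coverage cut-off like this is just one way to split the list;
a fixed per-million threshold or a rank cut-off would be equally defensible,
depending on the research question.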




     Regards
     Robin.



On Fri, 22 Apr 2005, Wallace Chen wrote:

> Dear Corpora colleagues,
>
> I am currently doing research on Chinese connectors, which comprise around 270 types and broadly include conjunctions and sentence adverbs. These are derived from a five-million-word corpus of contemporary Chinese. My question is how to determine which ones are "common". Are there statistical criteria (e.g. a cut-off point) for identifying "common connectors" in such a list? Do I look at their frequencies or rankings? I would appreciate it if anyone could help me answer these questions or direct me to relevant resources. Thanks in advance for all your help!
>
> Wallace Chen


