[Corpora-List] complete list of closed-class words in English

Siddhartha Jonnalagadda sid.kgp at gmail.com
Wed Nov 23 10:15:10 UTC 2011


Okay, before the conversation is completely hijacked... :)

I'm looking for an authentic and comprehensive list which is more like an
ontology, if you will. I would prefer a structured representation rather than
just a laundry list. For example, Determiners --> Articles --> Definite
Articles --> "the". I'm more of an applied NLP researcher. Sticking to some
standard version might work for people like me, even if it is slightly
controversial.
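
As a rough sketch of the kind of structure I have in mind: the class and
method names below are purely illustrative, and the tree holds only the
single example path given above, not a vetted inventory of closed-class
words.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.Set;

/** Illustrative sketch only: all names here are hypothetical. */
final class ClosedClassOntology {

    /** A node in the category tree, e.g. "Determiners" or "Articles". */
    static final class Category {
        final String name;
        final List<Category> children = new ArrayList<>();
        final Set<String> words = new LinkedHashSet<>();

        Category(String name) {
            this.name = name;
        }

        /** Creates a subcategory under this node and returns it. */
        Category addChild(String childName) {
            Category child = new Category(childName);
            children.add(child);
            return child;
        }

        /** Attaches word forms (stored lowercased) directly to this node. */
        Category addWords(String... newWords) {
            for (String w : newWords) {
                words.add(w.toLowerCase(Locale.ROOT));
            }
            return this;
        }

        /**
         * Depth-first search for a word. Returns the category names from
         * this node down to the node that lists the word, or empty if the
         * word is not found anywhere below this node.
         */
        Optional<List<String>> pathTo(String word) {
            String key = word.toLowerCase(Locale.ROOT);
            if (words.contains(key)) {
                List<String> path = new ArrayList<>();
                path.add(name);
                return Optional.of(path);
            }
            for (Category child : children) {
                Optional<List<String>> sub = child.pathTo(key);
                if (sub.isPresent()) {
                    List<String> path = new ArrayList<>();
                    path.add(name);
                    path.addAll(sub.get());
                    return Optional.of(path);
                }
            }
            return Optional.empty();
        }
    }

    /** Builds only the example path: Determiners, Articles, Definite Articles, "the". */
    static Category sample() {
        Category determiners = new Category("Determiners");
        Category articles = determiners.addChild("Articles");
        articles.addChild("Definite Articles").addWords("the");
        return determiners;
    }

    public static void main(String[] args) {
        Category root = sample();
        System.out.println(root.pathTo("the")
                .map(p -> String.join(" --> ", p))
                .orElse("(not listed)"));
    }
}
```

A tree like this would let a tool ask both "which class does this word
belong to?" and "which words fall under Determiners?", which a flat list
cannot answer.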

I would return the favor by releasing Java source code preloaded with this
information. If nothing turns up, I will have to go through some books like
Quirk et al. and compile one myself.

Sincerely,
Siddhartha Jonnalagadda, Ph.D.
sjonnalagadda.wordpress.com




On Wed, Nov 23, 2011 at 3:08 AM, Alexander Yeh <asy at mitre.org> wrote:

> One thing about citing or making use of Wikipedia contents: I find that I
> need to give a date for the "version" of the article that I am referring
> to. I have found that some articles of interest change drastically in
> content in less than a year.
>
> -Alex
>
> Yannick Versley wrote:
>
>>    But I disagree with your assumption that Wikipedia is not
>>    "authentic" - Wikipedia has sophisticated mechanisms for fostering
>>    and monitoring supervised collaboration, producing a resource which
>>    is arguably more authoritative and unbiased than a single-authored
>>    source; e.g. see [...]
>>
>> I think what *was* meant was something closer to the original research.
>>
>> Wikipedia is meant as an encyclopedia - not a primary source (i.e.
>> original research) or a secondary source (survey papers, textbooks).
>> Encyclopedias (tertiary sources) gain credibility by looking for (or
>> formulating) a consensus between texts inside a domain and making them
>> accessible to people outside a domain.
>>
>> WP often gets coverage for obscure topics that have no secondary sources
>> and then people delete the article for non-notability or lack of sources
>> (these are part of the "sophisticated mechanisms" - Wikipedia has long
>> drifted away from the initial anarchy, but it also introduced
>> scary-looking people with truncheons in the process).
>>
>> If you look at the WP page on "Closed class", the article is relatively
>> short and incoherent (but the WP:Administrators don't seem to have noticed
>> it), whereas the one for "function word" is a lot longer and has "citation
>> needed" and "original research" stuck to its top. The "function word"
>> article links to a page with a list of "function words" where they include
>>
>>    Auxiliary Verbs
>>    Conjunctions
>>    Determiners
>>    Prepositions
>>    Pronouns
>>    Quantifiers
>>
>>
>> And, of course, some people think that prepositions don't really fit the
>> function word criteria and say that they're somewhere between function
>> words (which usually have no meaning that is independent of context) and
>> lexical words (which do).
>> http://www.atsweb.neu.edu/hlittlefield/ResearchDocs/Chapter1.pdf
>> seems to give a sensible overview of who claims what in that discussion.
>>
>> Best,
>> Yannick Versley
>>
>>
>> _______________________________________________
>> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
>> Corpora mailing list
>> Corpora at uib.no
>> http://mailman.uib.no/listinfo/corpora
>>
>

