[Corpora-List] Sentence Initial Constituents

Adam Kilgarriff adam at lexmasterclass.com
Fri Nov 23 09:53:10 UTC 2007


Tom,

1.	Get a Sketch Engine account (http://www.sketchengine.co.uk ) 
2.	Load your corpora using the CorpusBuilder facility
	  Choose the template to enable TreeTagger
	   (which does sentence-breaking)
3.	Search (in the CQL box under Concordance/Keyword) for 
		<s> ".*"
	(i.e. the start of a sentence followed by any word)
4.	Use the "Frequency" button to get a frequency list of the words matched
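If you would rather count offline, steps 3 and 4 can be sketched in Python. This is a minimal sketch that assumes a vertical file: one token per line as word<TAB>tag[<TAB>lemma], with sentences wrapped in <s> ... </s> markup (the kind of output a TreeTagger-based pipeline produces). The function name and the sample data are hypothetical. Counting the tag of each first token as well as the word gives a rough first sort of the concordance lines. It does not tell you whether the constituent is a subject; that still needs a look at the lines themselves.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# Matches an opening sentence tag: "<s>" or "<s id=...>", but not "</s>" or "<sub>".
SENT_OPEN = re.compile(r"<s[\s>]")

def sentence_initial_counts(lines: Iterable[str]) -> Tuple[Counter, Counter]:
    """Count the first token (word and POS tag) of every <s> ... </s> sentence.

    Assumes hypothetical vertical input: word<TAB>tag[<TAB>lemma] per line.
    """
    words: Counter = Counter()
    tags: Counter = Counter()
    at_start = False
    for raw in lines:
        line = raw.rstrip("\n")
        if SENT_OPEN.match(line):      # a new sentence begins
            at_start = True
            continue
        if line.startswith("<"):       # other structure tags: </s>, <doc>, ...
            continue
        if not line.strip():           # ignore blank lines
            continue
        if at_start:                   # first token after <s>
            fields = line.split("\t")
            words[fields[0]] += 1
            tags[fields[1] if len(fields) > 1 else "?"] += 1
            at_start = False
    return words, tags

# Hypothetical three-sentence sample in vertical format.
sample = """<doc>
<s>
The\tDT\tthe
cat\tNN\tcat
sat\tVVD\tsit
.\tSENT\t.
</s>
<s>
Yesterday\tNN\tyesterday
it\tPP\tit
rained\tVVD\train
.\tSENT\t.
</s>
<s>
The\tDT\tthe
dog\tNN\tdog
barked\tVVD\tbark
.\tSENT\t.
</s>
</doc>"""

words, tags = sentence_initial_counts(sample.splitlines())
for word, n in words.most_common():
    print(n, word)  # prints "2 The" then "1 Yesterday"
```

To run it on a real file, pass an open file handle instead of `sample.splitlines()`; `tags.most_common()` gives the matching list by part of speech.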

Adam

-----Original Message-----
From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf Of
Tom Rankin
Sent: 22 November 2007 14:36
To: CORPORA at uib.no
Subject: [Corpora-List] Sentence Initial Constituents


Dear all,

I need to find all the sentence-initial constituents in a number of 
smallish corpora (each c. 200k words). The problem is that I want to know 
whether they are subjects or some other constituent, so I can't just use 
a list of words. The corpora are tagged but not parsed (and aren't likely 
to be). Do I just have to do lots of manual sorting of concordances 
with full stops, or can someone suggest a better labour-saving idea?

cheers

tom


_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora

