[Corpora-List] How long is the sentence?

Krishnamurthy, Ramesh r.krishnamurthy at aston.ac.uk
Thu Oct 6 16:57:50 UTC 2011


Counting 'number of words' to measure sentence-length surely ignores compounding and agglutinative tendencies?
Counting morphemes might be a more accurate lexico-grammatical measure, but is not yet feasible computationally?
I'm not sure whether counting characters would be an adequate approximation?
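To make the contrast between the measures concrete, here is a minimal sketch. It assumes a "word" is simply a whitespace-delimited token. The function name `length_measures` and the Hungarian example ("házaimban", roughly "in my houses") are illustrations of mine, not taken from the TELRI data.

```python
import unicodedata


def length_measures(sentence: str) -> dict:
    """Return two crude length measures for one sentence."""
    # Normalise to NFC so an accented letter counts as a single character.
    text = unicodedata.normalize("NFC", sentence)
    # Assumption: a "word" is any whitespace-delimited token.
    words = text.split()
    return {
        "words": len(words),
        "characters": sum(len(w) for w in words),  # whitespace excluded
    }


print(length_measures("in my houses"))  # {'words': 3, 'characters': 10}
print(length_measures("házaimban"))     # {'words': 1, 'characters': 9}
```

The word counts differ by a factor of three while the character counts are nearly equal, which is the kind of discrepancy that agglutination would produce.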

To illustrate the problem, the following comparisons of George Orwell's 1984 came from TELRI, perhaps via Multext-East:


Language    Sentences   Paragraphs   Words
English     6701        1286         104,302
Bulgarian   6649        1321         87,235
Czech       6714        1285         80,366
Estonian    6658        1289         79,334
Hungarian   6732        1292         81,147
Romanian    6487        1335         101,460
Slovene     6689        1288         91,619
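As a rough check on the figures quoted above, mean sentence length in words can be derived by dividing words by sentences. This is only a sketch: the dictionary simply re-keys the numbers from the comparison, and the names `counts` and `mean_sentence_length` are my own.

```python
# (sentences, paragraphs, words) per language, as quoted in the comparison
counts = {
    "English":   (6701, 1286, 104302),
    "Bulgarian": (6649, 1321, 87235),
    "Czech":     (6714, 1285, 80366),
    "Estonian":  (6658, 1289, 79334),
    "Hungarian": (6732, 1292, 81147),
    "Romanian":  (6487, 1335, 101460),
    "Slovene":   (6689, 1288, 91619),
}

# Mean number of words per sentence for each translation
mean_sentence_length = {
    language: words / sentences
    for language, (sentences, _paragraphs, words) in counts.items()
}

for language, mean in mean_sentence_length.items():
    print(f"{language:<10} {mean:5.2f}")
```

On these figures the same novel averages roughly 12 words per sentence in Czech, Estonian and Hungarian but over 15 in English and Romanian, although the sentence counts are nearly identical.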


best


Ramesh Krishnamurthy
Visiting Academic Fellow, School of Languages and Social Sciences, Aston University, Birmingham B4 7ET
Room: NX01. Tel: 0121-204-3812.
Director, ACORN (Aston Corpus Network project): http://acorn.aston.ac.uk/
Corpus Consultant, GeWiss (Volkswagen Foundation) project: http://www1.aston.ac.uk/lss/research/research-projects/gewiss-spoken-academic-discourse/


Message: 7
Date: Wed, 5 Oct 2011 19:39:51 +0100
From: Trevor Jenkins <trevor.jenkins at suneidesis.com>
Subject: Re: [Corpora-List] How long is the sentence?
To: Corpora List <corpora at uib.no>



It depends upon many factors not all of which fall within your categories rather it is possible the detailed argument that the sentence(s) are intended to convey to the reader/speaker requires long or longer sentences instead of shorter ones than might normally be expected so that the meaning is given. The foregoing being a deliberately long sentence as an exemplar. Some authors will use different length sentences to provide pace and variety to their writing.  Short sentences are boring. Short sentences do not help. Sequences of short sentences lack flavour. There is no colour. The discourse becomes stilted.



Some languages seem to encourage the use of long sentences. The Anglo-Irish author George Bernard Shaw wrote an English example that parodied what he perceived to be the Germanic style of very long sentences. His sample contained in excess of 140 words. My opening sentence is an attempt to mimic his critique within the context of answering your question. Personally I allow a little more latitude in this by saying that some cultures have writings that are paratactic while others are hypotactic. The argumentation style is different. And the culture of each reinforces the paratactic or hypotactic style of its authors, which might be seen as a taught trait. Sentences in one therefore could be longer than those in another. López Guix and Minett Wilkinson argue this explicitly when comparing English and Spanish (see their 1997 text Manual de traducción).



In addition to Shaw's Germanic parody there are real examples of long, very long sentences in literature. The Apostle Paul, writing in Koine Greek during the first century AD, commonly constructs long and intricate sentences. In one of his epistles he exceeds Shaw's word count. The writer of the Gospel of Mark, by contrast, uses shorter sentences, principally because he is writing in a second language and not in his native tongue. Indeed his shorter sentences mark him out as a second-language user.



Which authors did you look at? What genre(s) were they writing in? Writing fiction could produce sentences of different lengths than writing non-fiction. In what periods did the writers live? Dickens, as an example of 19th-century English writing, appears to construct much longer sentences than contemporary English writers do. One might also ask another chronologically focused question: what age were the writers whose examples you analysed? Younger writers, children especially, pen short sentences. The elongation of sentences is regarded as an indication of maturity in language use and ability. How did you define "word"? Some people count a hyphenated word as a single lexeme whereas others consider it to be two distinct lexemes. Numbers can also cause word count "inaccuracies". The European convention of using a space between three-digit groups would give a different count from the English or American convention of using the comma to separate those same groups. All of those questions and their consequent answers may well have affected your results.
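The effect of the definition of "word" on a count can be shown with a short sketch. It assumes whitespace tokenisation, with an optional switch that treats hyphens as word boundaries. The function `count_words` and the sample strings are hypothetical illustrations, not a description of how any particular corpus tool counts.

```python
def count_words(text: str, split_hyphens: bool = False) -> int:
    """Count whitespace-delimited tokens, optionally splitting on hyphens."""
    if split_hyphens:
        # Treat each part of a hyphenated word as a separate token.
        text = text.replace("-", " ")
    return len(text.split())


# Hyphenated words: one lexeme or two?
print(count_words("a well-known author"))                      # 3
print(count_words("a well-known author", split_hyphens=True))  # 4

# Digit grouping: comma-separated versus space-separated thousands
print(count_words("104,302 words"))  # 2
print(count_words("104 302 words"))  # 3
```

The same text yields different totals under each convention, so sentence lengths are comparable across studies only when the tokenisation rules are stated.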



Regards, Trevor.



<>< Re: deemed!






On 5 Oct 2011, at 11:25, "Yuri Tambovtsev" <yutamb at mail.ru> wrote:



> Dear colleagues, how long is the sentence of every writer? I measured how many words the sentence contains. I took British and American writers. Really I cannot understand why different writers have different length of sentences. Is it connected with their brains? Or is it because they were taught differently? Are there many articles published on that? Looking forward to hearing from you to yutamb at mail.ru  Be well, Yuri Tambovtsev, Novosibirsk, Russia
