[Corpora-List] Call for applications to the PhD Program in Linguistics at Pontificia Universidad Católica de Valparaíso

Thu Sep 4 14:21:08 UTC 2008

Dear members of the Corpora List:
We are happy to inform you that the call for applications to the PhD Program
in Linguistics at Pontificia Universidad Católica de Valparaíso, Chile, is
now open. More information (in Spanish) is available at:
http://www.linguistica.cl/prontus_linguistica/site/edic/base/port/programas.html

René Venegas

-----Original Message-----
From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On behalf of
corpora-request at uib.no
Sent: Thursday, 4 September 2008 9:00
To: corpora at uib.no
Subject: Corpora Digest, Vol 15, Issue 4

Today's Topics:

   1. Re:  concordance program for large files (Max Silberztein)
   2. Re:  concordance program for large files (James Thomas)
   3. Re:  concordance program for large files (Francis Tyers)
   4.  algo for semantic structure (Vrone)
   5. Re:  concordance program for large files (Max Silberztein)
   6. Re:  concordance program for large files (Francis Tyers)
   7. Re:  algo for semantic structure (John F. Sowa)
   8. Re:  Chinese texts (Gemma Boleda) (Xiaolin Wang)
   9.  CLIN 19: Call for abstracts (Barbara Plank)

----------------------------------------------------------------------

Message: 1
Date: Wed, 3 Sep 2008 16:05:39 +0200
From: "Max Silberztein" <max.silberztein at univ-fcomte.fr>
Subject: Re: [Corpora-List] concordance program for large files
To: <Corpora at uib.no>

Francis,

I agree with you: no flame war!

However, you should know that the only people who ever bought INTEX did not
buy it from my university, but from an organization called ASSTRIL run by
the same guys who brought you Unitex; Unitex was born a few weeks after a
lawyer forced them to stop selling INTEX, after a year-long battle.

I am puzzled by your argument: does GPL philosophy really advocate copying
other people's work without proper authorization or citation? Call me naive
but I thought GPL licensing was created to *protect* original authors, not
to pirate them. It took me 10 years to write INTEX and I was devastated when
unscrupulous colleagues plagiarized it.

Actually I believe that Unitex is an insult to the GPL community: how does
Unitex-style behavior help academics feel confident that they can indeed
give away the source of their work without fearing that they will be
plagiarized?

Finally, your argument against the right of a small lab (which has no access
to CNRS funding) to decide on its distribution policy is used by pirates to
advocate copying movies/CDs/software. Don't you think there ought to be some
minimal respect of author's rights, especially within the academic
community?

--Max

-----Original Message-----
From: Francis Tyers [mailto:ftyers at prompsit.com] 
Sent: Wednesday, September 03, 2008 11:11 AM
To: max.silberztein at univ-fcomte.fr
Cc: Corpora at uib.no
Subject: Re: [Corpora-List] concordance program for large files

On Wed, 03-09-2008 at 10:53 +0200, Max Silberztein wrote:
> Unitex is a non-authorized copy of a free software named INTEX; it has not
> been "designed". See:

By "free software" I assume you mean free as in price as opposed to free
as in freedom. Unitex is licensed under the GPL and is truly free
software, which means people are free to: 

* Use it for any purpose
* Study how it works
* Share it with their friends and colleagues
* Improve it and adapt it to their needs, and then release these changes
so that everyone can benefit.

Compare that with the INTEX licence, under which people:

* Can use it for unambiguously non-commercial purposes only
* Cannot study how it works
* Cannot share it with friends and colleagues
* Cannot improve it and adapt it to their needs, and then release these
changes so that everyone can benefit.

I'd write more, but I think the above speaks for itself and don't want
to risk getting into a flame war ;)

Fran

------------------------------

Message: 2
Date: Wed, 3 Sep 2008 16:10:24 +0200
From: "James Thomas" <jaedth at gmail.com>
Subject: Re: [Corpora-List] concordance program for large files
To: Corpora at uib.no

Not quite sure how one defines "wasting money". In this context it is
probably equivalent to spending any money at all when there is a free
alternative available.

I have seen the Sketch Engine <http://www.sketchengine.co.uk> (c. 50 Euro
p.a.) perform very rapidly on a corpus of 1,000 mill (a.k.a. billion) words.
And it includes Corpus Builder which allows you to, erm, build your own
corpus.

Speaking of wasting money, I am writing from the Eurocall Conference where
my Hungarian hotel costs are far in excess of 50 Euro per night. But one
deserves a little luxury, n'est-ce pas?

James Thomas
Faculty of Arts
Masaryk University
A. Novaka 1
602 00 Brno
Czech Republic
----
Tel: +420 54949 7614

------------------------------

Message: 3
Date: Wed, 03 Sep 2008 16:16:28 +0200
From: Francis Tyers <ftyers at prompsit.com>
Subject: Re: [Corpora-List] concordance program for large files
To: max.silberztein at univ-fcomte.fr
Cc: Corpora at uib.no

On Wed, 03-09-2008 at 16:05 +0200, Max Silberztein wrote:
> Francis,
> 
> I agree with you: no flame war!

Cool! :)

> However, you should know that the only people who ever bought INTEX did
> not buy it from my university, but from an organization called ASSTRIL run
> by the same guys who brought you Unitex; Unitex was born a few weeks after
> a lawyer forced them to stop selling INTEX, after a year-long battle.

No comment. This is about the software, not the company / organisation.

> I am puzzled by your argument: does GPL philosophy really advocate copying
> other people's work without proper authorization or citation? Call me naive
> but I thought GPL licensing was created to *protect* original authors, not
> to pirate them. It took me 10 years to write INTEX and I was devastated
> when unscrupulous colleagues plagiarized it.

From what I understood from your post, they made a GPL version of a
non-GPL piece of software. No code was copied. If code was copied then
that is another issue.

> Actually I believe that Unitex is an insult to the GPL community: how does
> Unitex-style behavior help academics feel confident that they can indeed
> give away the source of their work without fearing that they will be
> plagiarized?

GPL licence violations are another issue. They can be, and have been,
successfully pursued through the court system.[1]

> Finally, your argument against the right of a small lab (which has no
> access to CNRS funding) to decide on its distribution policy is used by
> pirates to advocate copying movies/CDs/software. Don't you think there
> ought to be some minimal respect of author's rights, especially within the
> academic community?

I do not condone unauthorised copying of software code (what you refer
to as "piracy"). However, if someone writes something that works the
same as your software with their own code and their own hands, this
should not be a problem. I don't believe in software patents.

My understanding of the original post was that someone had replicated
the functionality of your software as a piece of free software. This is
something I completely endorse. There are many pieces of free software
which have started out as "clones" of other pieces of non-free software,
and have in time become better. If this wasn't the case, my apologies
for the "holier than thou" earlier response.

Regards,

Fran

1. http://gpl-violations.org/

------------------------------

Message: 4
Date: Wed, 3 Sep 2008 14:20:41 +0000
From: Vrone <vrone at hotmail.co.uk>
Subject: [Corpora-List] algo for semantic structure
To: <corpora at uib.no>

Hi all,

Can anyone suggest the best algorithm for transforming a syntactic
structure into a semantic structure, in an NLP context?

Regards,
Vrone

------------------------------

Message: 5
Date: Wed, 3 Sep 2008 17:33:12 +0200
From: "Max Silberztein" <max.silberztein at univ-fcomte.fr>
Subject: Re: [Corpora-List] concordance program for large files
To: <ftyers at prompsit.com>
Cc: Corpora at uib.no

Obviously it is impossible to prove that they used the copy/paste command.
However, there are clues that they had full access to INTEX's whole source
and resources while "designing" (their word) Unitex. The least technical
ones:

-- INTEX was sold by ASSTRIL (the same guys). How could they sell it without
my knowing, and yet be able to generate the right decryption license keys
for their customers to run INTEX?

-- You really need to check out the analysis report on the "similarities"
between the two pieces of software, manuals, file formats and linguistic
resources at:

http://mshe.univ-fcomte.fr/intex/Unitex.htm

If you don't read French, at least check out the screen shots, for instance
of the Sentence graph (search for section "2.4.2 Découpage en phrases (p.
13)"): a few weeks of work, just stolen. Do you really believe the Unitex
graph was "designed" to be compatible with INTEX, rather than just copied?
By the way, the comments "regle #1", "regle #2", etc. are references to the
INTEX manual; why would Unitex contain links to INTEX's manual?

-- Half a dozen features of INTEX are exactly the same in Unitex, even
though they no longer make any sense in a Unicode environment. For instance,
INTEX, which is based on 8-bit ASCII, needs functions to deal with
alphabetical order, accented and foreign letters; why do these functions
even exist in a Unicode Unitex? Don't they know about Char.IsALetter()?
How come they ended up writing C functions with the same names as INTEX's if
they did not have access to INTEX's source?

-- etc. (there are other funny clues)
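
As a side illustration of the Unicode point in the third clue above: in a
Unicode-aware environment, letter classification and accent handling can come
from standard character properties rather than hand-written 8-bit tables. The
following is only a minimal Python sketch using the standard unicodedata
module; it is not code from INTEX or Unitex, and the function names
(is_letter, strip_accents, sort_key) are hypothetical.

```python
import unicodedata


def is_letter(ch: str) -> bool:
    """Return True if the single character ch is in a Unicode letter category (L*)."""
    return unicodedata.category(ch).startswith("L")


def strip_accents(text: str) -> str:
    """Decompose accented letters (NFD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def sort_key(word: str) -> tuple:
    """Rough alphabetical key: accent- and case-insensitive first, original word as tie-break."""
    return (strip_accents(word).casefold(), word)


words = ["été", "zèbre", "eau", "Écrit"]
sorted_words = sorted(words, key=sort_key)
print(sorted_words)
```

The accent-stripping key is only a rough stand-in for alphabetical order;
real collation rules are language-specific.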

Just in case you wonder how they could do it and even capitalize on it,
really big time: in the French academic world, a researcher has no rights to
his/her work, which is owned solely by his/her university. Therefore he/she
cannot defend himself/herself legally. And French universities do not sue
other French universities, because they are all state-controlled by the same
minister. And then some researchers just love open software so much that
they don't want to know! There is also the big-CNRS-vs-little-provincial-lab
thing. And other considerations. Checkmate.

--Max

------------------------------

Message: 6
Date: Wed, 03 Sep 2008 17:42:47 +0200
From: Francis Tyers <ftyers at prompsit.com>
Subject: Re: [Corpora-List] concordance program for large files
To: max.silberztein at univ-fcomte.fr
Cc: Corpora at uib.no

On Wed, 03-09-2008 at 17:33 +0200, Max Silberztein wrote:
> Obviously it is impossible to prove that they used the copy/paste command.
> However, there are clues that they had full access to INTEX's whole source
> and resources while "designing" (their word) Unitex. The least technical
> ones:
>
> -- INTEX was sold by ASSTRIL (the same guys). How could they sell it without
> my knowing, and yet be able to generate the right decryption license keys
> for their customers to run INTEX?
>
> -- You really need to check out the analysis report on the "similarities"
> between the two pieces of software, manuals, file formats and linguistic
> resources at:
>
> http://mshe.univ-fcomte.fr/intex/Unitex.htm
>
> If you don't read French, at least check out the screen shots, for instance
> of the Sentence graph (search for section "2.4.2 Découpage en phrases (p.
> 13)"): a few weeks of work, just stolen. Do you really believe the Unitex
> graph was "designed" to be compatible with INTEX, rather than just copied?
> By the way, the comments "regle #1", "regle #2", etc. are references to the
> INTEX manual; why would Unitex contain links to INTEX's manual?
>
> -- Half a dozen features of INTEX are exactly the same in Unitex, even
> though they no longer make any sense in a Unicode environment. For instance,
> INTEX, which is based on 8-bit ASCII, needs functions to deal with
> alphabetical order, accented and foreign letters; why do these functions
> even exist in a Unicode Unitex? Don't they know about Char.IsALetter()?
> How come they ended up writing C functions with the same names as INTEX's if
> they did not have access to INTEX's source?
>
> -- etc. (there are other funny clues)
>
> Just in case you wonder how they could do it and even capitalize on it,
> really big time: in the French academic world, a researcher has no rights to
> his/her work, which is owned solely by his/her university. Therefore he/she
> cannot defend himself/herself legally. And French universities do not sue
> other French universities, because they are all state-controlled by the same
> minister. And then some researchers just love open software so much that
> they don't want to know! There is also the big-CNRS-vs-little-provincial-lab
> thing. And other considerations. Checkmate.

If they stole the code, they stole the code. Nobody is condoning that.

My arguments are only valid for work-the-same or work-a-like programs
where the code has been written from scratch.

Fran

------------------------------

Message: 7
Date: Thu, 04 Sep 2008 01:35:17 -0400
From: "John F. Sowa" <sowa at bestweb.net>
Subject: Re: [Corpora-List] algo for semantic structure
To: Vrone <vrone at hotmail.co.uk>
Cc: corpora at uib.no

That is the Holy Grail:

 > Can anyone suggest the best algorithm for transforming a
 > syntactic structure into a semantic structure, in an NLP context?

A general algorithm for that task would be a solution to the
problem of "language understanding".  And no computer system
can truly be said to understand natural language with anything
remotely resembling human ability.

In any case, I would first ask what you mean by "semantic structure".
Most people who propose a theory of semantic structure have also
proposed some method for relating a syntactic structure, such as a
parse tree or a dependency graph, to a semantic structure of the
type specified by the theory.

There are many good parsers that produce correct syntactic
structures for a reasonable percentage of sentences of some genre.
However, the percentage of completely correct semantic structures
generated from those syntactic structures is much, much lower.
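
To make that gap concrete, here is a toy sketch in Python of the kind of
mapping involved, assuming dependency triples as input and a hypothetical
table (ROLE_FOR_RELATION) from syntactic relations to semantic roles. The
relation labels and role names are illustrative assumptions, not part of any
particular theory or parser.

```python
from typing import Dict, List, Tuple

# (head, relation, dependent) triples, as a dependency parser might emit them.
Dependency = Tuple[str, str, str]

# Hypothetical, deliberately naive table from syntactic relations to semantic roles.
ROLE_FOR_RELATION: Dict[str, str] = {
    "nsubj": "agent",
    "obj": "patient",
    "iobj": "recipient",
}


def to_frames(deps: List[Dependency]) -> Dict[str, Dict[str, str]]:
    """Group role-bearing dependents under their head as a predicate-argument frame."""
    frames: Dict[str, Dict[str, str]] = {}
    for head, relation, dependent in deps:
        role = ROLE_FOR_RELATION.get(relation)
        if role is not None:  # relations without a role (e.g. determiners) are skipped
            frames.setdefault(head, {})[role] = dependent
    return frames


# "Mary gave John a book."
gave = to_frames([
    ("gave", "nsubj", "Mary"),
    ("gave", "iobj", "John"),
    ("gave", "obj", "book"),
    ("book", "det", "a"),
])

# "The door opened."
opened = to_frames([("opened", "nsubj", "door"), ("door", "det", "the")])

print(gave)
print(opened)
```

The first sentence comes out plausibly, but for "The door opened" the same
table labels the door as an agent, which shows why a fixed syntax-to-role
table does not amount to a general algorithm.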

John Sowa

------------------------------

Message: 8
Date: Thu, 4 Sep 2008 10:28:19 +0800
From: "Xiaolin Wang" <arthur_general at sjtu.edu.cn>
Subject: Re: [Corpora-List] Chinese texts (Gemma Boleda)
To: <corpora at uib.no>,	<gboleda at lsi.upc.edu>

Dear Boleda:

I'm a PhD candidate in China. As for the novels (literary 
texts), there are many web novels by anonymous writers which 
meet your requirement. They are very popular nowadays and 
have a lot of fans. Usually they consist of more than a 
hundred episodes, so the length is not a problem. If you 
need them, I could download some and send them to you.

    Sincerely,

Xiaolin Wang

Jiaotong University, Shanghai, China

arthur_general at sjtu.edu.cn

 http://bcmi.sjtu.edu.cn/~wangxiaolin/

----------------------------------------------------------------------
>Date: Tue, 02 Sep 2008 15:20:20 +0200
>From: Gemma Boleda <gboleda at lsi.upc.edu>
>Subject: [Corpora-List] Chinese texts
>To: Corpora at uib.no

>Dear members,

>I am looking for a couple of texts in Chinese that have the properties
>listed below. Finding them is more difficult than I had foreseen (e.g.,
>I have checked the Gutenberg project, but most texts are in classical
>Chinese; those that are not, for instance those by Lu Xun, are too short
>for my purposes). Any pointers would be appreciated.

>The texts should be:

>- freely available for research purposes;
>- written in modern Chinese;
>- by a single author (no translations);
>- long (the longer, the better), at least 100 thousand words;

>Any topic/genre would do; ideally, I'd like to have at least one novel
>and one non-literary piece (e.g., a textbook on economy or history).

>Thank you,

>Gemma Boleda
>Universitat Politècnica de Catalunya

------------------------------

Message: 9
Date: Thu, 04 Sep 2008 14:32:28 +0200
From: Barbara Plank <b.plank at rug.nl>
Subject: [Corpora-List] CLIN 19: Call for abstracts
To: corpora at uib.no

----------------------------------------------------------------------------
CLIN 19 - CALL FOR ABSTRACTS

19th Meeting of Computational Linguistics in The Netherlands
Thursday 22 January 2009
http://www.let.rug.nl/clin/
----------------------------------------------------------------------------

The Nineteenth Annual Meeting of Computational Linguistics in
The Netherlands (CLIN) will be held on Thursday 22 January 2009
in Groningen, The Netherlands. We invite abstract submissions on
all aspects of computational linguistics and related language
technologies.

SUBMISSION INSTRUCTIONS

Authors should submit an abstract in English. The abstract should contain:

    * author name, address, affiliation, and email address
    * abstract title
    * abstract text (250 words maximum)
    * preference for oral presentation or a poster

Abstracts should be sent by email to clin at rug.nl by Monday 17 November 2008.

IMPORTANT DATES

Monday 17 November 2008: Deadline for submitting abstracts
Friday 21 November 2008: Notification of acceptance
Thursday 22 January 2009: CLIN in Groningen

INVITED TALK

The invited talk at CLIN 19 will be presented by Mirella Lapata from
the University of Edinburgh.

CO-LOCATED EVENT

CLIN 19 will be co-located with TLT 7, the 7th International
Workshop on Treebanks and Linguistic Theories, which will be
held on 23-24 January 2009, in Groningen.

ORGANIZATION

Barbara Plank
Çağrı Çöltekin
Erik Tjong Kim Sang
Gertjan van Noord
Gosse Bouma
Jelena Prokić
Tim van de Cruys

----------------------------------------------------------------------
Send Corpora mailing list submissions to
	corpora at uib.no

To subscribe or unsubscribe via the World Wide Web, visit
	http://mailman.uib.no/listinfo/corpora
or, via email, send a message with subject or body 'help' to
	corpora-request at uib.no

You can reach the person managing the list at
	corpora-owner at uib.no

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Corpora digest..."

_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora

End of Corpora Digest, Vol 15, Issue 4
**************************************
