[Corpora-List] (no subject)

Mike Maxwell maxwell at umiacs.umd.edu
Fri May 25 12:33:50 UTC 2012


On 5/23/2012 2:24 AM, fatima zuhra wrote:
> Some of my work includes the development of a corpus, a
> morphological analyzer, a parser and a transliterator for Pashto language. I have also worked on
> a part of speech tagger for Pashto and the work is in progress. I am interested in the knowledge
> and discussions about copyright rules. In my view, a more severe problem is that if someone
> integrates in his/her software an algorithm (or even the software code) from another scholar's
> work (e.g. my morphological analyzer code and methodology) without the knowledge of the scholar.
> It will be very hard to check the code of such a larger software for 'plagiarism'!!!!

Very few researchers today would write a new algorithm to do morphological parsing of some language.
Rather, most morphological analyzers these days are built from three components: a language-agnostic
parsing engine (which contains the algorithms); a set of grammar rules for morphology; and a lexicon.
Commonly used parsing engines include the Xerox finite-state tool (xfst) and the Stuttgart
Finite State Transducer tools (SFST), among others.
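
To make the three-way split concrete, here is a minimal sketch in Python. It is not how xfst or SFST work internally (those compile rules into finite state transducers); it only illustrates the separation between engine, rules and lexicon. The names `SuffixRule` and `analyze`, and the toy English data, are made up for illustration.

```python
from typing import NamedTuple


class SuffixRule(NamedTuple):
    suffix: str    # surface form of the affix
    category: str  # part of speech the affix attaches to
    features: str  # morphosyntactic tag contributed by the affix


def analyze(word, lexicon, rules):
    """Toy language-agnostic engine: it knows nothing about any one language."""
    analyses = []
    # A bare stem found in the lexicon is an analysis by itself.
    if word in lexicon:
        analyses.append(f"{word}+{lexicon[word]}")
    # Try to strip each suffix and look up what remains.
    for rule in rules:
        if not rule.suffix or not word.endswith(rule.suffix):
            continue
        stem = word[: -len(rule.suffix)]
        if lexicon.get(stem) == rule.category:
            analyses.append(f"{stem}+{rule.category}+{rule.features}")
    return analyses


# Language-specific parts (hypothetical toy data, not a real grammar).
LEXICON = {"cat": "N", "walk": "V"}
RULES = [SuffixRule("s", "N", "PL"), SuffixRule("ed", "V", "PAST")]

print(analyze("cats", LEXICON, RULES))
print(analyze("walked", LEXICON, RULES))
```

Only `LEXICON` and `RULES` would change from one language to the next; `analyze` stays the same.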

If two groups use the same engine for the same language, there will be significant similarities in 
their code--the same affixes, for example.  It could be hard to demonstrate plagiarism there, simply 
because the code *has* to be similar.  Even morphosyntactic feature names will often be the same 
(how many ways can you say "tense" or "number"?).
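
As a rough illustration of why overlap by itself proves little, the sketch below compares two affix inventories with a plain Jaccard measure. The two inventories are invented toy data standing in for two independent teams, and `jaccard` is just a helper name chosen here; nothing in it is a real plagiarism test.

```python
def jaccard(a, b):
    """Share of entries common to both inventories (1.0 means identical)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Each entry is (affix, category, feature name); hypothetical toy data.
team_a = {("s", "N", "PL"), ("ed", "V", "PAST"), ("ing", "V", "PROG")}
team_b = {("s", "N", "PL"), ("ed", "V", "PAST"), ("ing", "V", "PROG"),
          ("er", "A", "COMP")}

print(jaccard(team_a, team_b))
```

Two teams describing the same language honestly would be expected to score high on a measure like this, since the affixes and the obvious feature names are given by the language, not by the author.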

On the other hand, if there are significant morpho-phonological processes, that part of the grammar 
could and probably would differ in analysis, because there are different ways to describe the 
natural classes involved, or to order the rules.  Or if there is not an agreed-on set of declension 
classes (as there is not, for Pashto), there would likely be differences in that part of the grammar 
on the part of different teams.
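
A small sketch of the point about differing analyses, using ordered regular-expression rewrites in Python as a stand-in for a real rule compiler. The data is a simplified English plural in rough phonemic spelling, not Pashto, and both rule sets are made up for illustration. One analysis assumes an underlying /z/ and devoices it; the other assumes an underlying /s/ and voices it. They produce the same surface forms from visibly different grammars.

```python
import re


def derive(underlying, rules):
    """Apply ordered rewrite rules (pattern, replacement) to an underlying form."""
    form = underlying
    for pattern, replacement in rules:
        form = re.sub(pattern, replacement, form)
    return form


# Analysis A: the plural suffix is underlyingly /z/.
ANALYSIS_A = [
    (r"([sz])\+z", r"\1ez"),    # epenthesis after a sibilant
    (r"([ptkf])\+z", r"\1s"),   # devoicing after a voiceless consonant
    (r"\+", ""),                # erase any remaining morpheme boundary
]

# Analysis B: the plural suffix is underlyingly /s/.
ANALYSIS_B = [
    (r"([sz])\+s", r"\1ez"),    # epenthesis after a sibilant
    (r"([^ptkf+])\+s", r"\1z"), # voicing after anything not voiceless
    (r"\+", ""),                # erase any remaining morpheme boundary
]

for stem in ["kat", "dog", "bus"]:
    print(stem, derive(stem + "+z", ANALYSIS_A), derive(stem + "+s", ANALYSIS_B))
```

Identical output with different underlying forms and different rules is what one would expect from independent work; identical rules, in the same order, with the same natural classes, would be more surprising.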
-- 
	Mike Maxwell
	maxwell at umiacs.umd.edu
	"My definition of an interesting universe is
	one that has the capacity to study itself."
	  --Stephen Eastmond
