[Corpora-List] (no subject)
Mike Maxwell
maxwell at umiacs.umd.edu
Fri May 25 12:33:50 UTC 2012
On 5/23/2012 2:24 AM, fatima zuhra wrote:
> Some of my work includes the development of a corpus, a
> morphological analyzer, a parser and a transliterator for the Pashto language. I have also worked on
> a part of speech tagger for Pashto and the work is in progress. I am interested in the knowledge
> and discussions about copyright rules. In my view, a more severe problem arises if someone
> integrates in his/her software an algorithm (or even the software code) from another scholar's
> work (e.g. my morphological analyzer code and methodology) without the knowledge of the scholar.
> It will be very hard to check the code of such larger software for 'plagiarism'!!!!
Very few researchers today would write a new algorithm for morphological parsing of a particular language.
Rather, most morphological analyzers these days are built from three components: a language-agnostic
parsing engine (which contains the algorithms); a set of grammar rules for morphology; and a lexicon.
Commonly used parsing engines include the Xerox finite-state tool (xfst) and the Stuttgart Finite State
Transducer tools (SFST), among others.
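To make the three-way split concrete, here is a minimal sketch in Python. It is not how xfst or sfst
work internally (those compile rules into finite state transducers); it only shows the division of
labor. The function name analyze, the rule format, and the toy English data are all assumptions made
up for this illustration.

```python
from typing import Dict, List, Tuple

# Component 2: grammar rules. Each rule is (suffix, part of speech, features).
# Toy English data, invented for illustration only.
SUFFIX_RULES: List[Tuple[str, str, str]] = [
    ("s", "N", "number=pl"),
    ("ed", "V", "tense=past"),
    ("", "N", "number=sg"),
    ("", "V", "tense=nonpast"),
]

# Component 3: lexicon, mapping each stem to its part of speech (also invented).
LEXICON: Dict[str, str] = {"cat": "N", "dog": "N", "walk": "V"}


def analyze(word: str,
            rules: List[Tuple[str, str, str]],
            lexicon: Dict[str, str]) -> List[str]:
    """Component 1: the engine. It knows nothing about any particular language;
    it strips each candidate suffix and checks the remaining stem in the lexicon."""
    analyses = []
    for suffix, pos, feats in rules:
        if not word.endswith(suffix):
            continue
        stem = word[: len(word) - len(suffix)]
        if lexicon.get(stem) == pos:
            analyses.append(f"{stem}+{pos}[{feats}]")
    return analyses


print(analyze("cats", SUFFIX_RULES, LEXICON))
print(analyze("walked", SUFFIX_RULES, LEXICON))
```

Only SUFFIX_RULES and LEXICON are language-specific here; the engine could be reused unchanged for
another language, which is the point of the architecture.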
If two groups use the same engine for the same language, there will be significant similarities in
their code--the same affixes, for example. It could be hard to demonstrate plagiarism there, simply
because the code *has* to be similar. Even morphosyntactic feature names will often be the same
(how many ways can you say "tense" or "number"?).
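A rough way to see why surface similarity is weak evidence: compare the affix inventories of two
independently written grammars. The sketch below uses Jaccard overlap on sets; the two "teams" and
their rule entries are hypothetical, and Jaccard overlap is just one simple measure chosen for the
illustration, not an established plagiarism test.

```python
from typing import Iterable, Hashable


def jaccard(a: Iterable[Hashable], b: Iterable[Hashable]) -> float:
    """Size of the intersection divided by size of the union (1.0 for two empty sets)."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


# Hypothetical (affix, feature) entries from two teams working independently.
team_a = {("s", "number=pl"), ("ed", "tense=past"), ("ing", "aspect=prog")}
team_b = {("s", "number=pl"), ("ed", "tense=past"), ("ing", "aspect=progressive")}

# The affixes themselves are dictated by the language, so they overlap completely.
affix_overlap = jaccard({affix for affix, _ in team_a}, {affix for affix, _ in team_b})

# Full entries differ only where a feature was named differently.
entry_overlap = jaccard(team_a, team_b)

print(affix_overlap, entry_overlap)
```

With this toy data the affix inventories are identical even though neither team saw the other's
work, so a high overlap score on its own says little about copying.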
On the other hand, if there are significant morpho-phonological processes, that part of the grammar
could and probably would differ between analyses, because there are different ways to describe the
natural classes involved, or to order the rules. Likewise, if there is no agreed-on set of declension
classes (as is the case for Pashto), different teams would likely differ in that part of the
grammar.
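As an illustration of how two analyses can differ while producing the same forms, here is a sketch
using the English plural in ordinary spelling (a toy stand-in, not Pashto, and not real xfst or sfst
rule syntax). The first function applies ordered rewrite rules to an underlying form with a "^"
boundary marker; the second selects a suffix allomorph directly. Both function names and the list of
sibilant endings are assumptions for the example.

```python
import re

# Analysis A: ordered rewrite rules over an underlying form "stem^s".
def plural_team_a(stem: str) -> str:
    form = stem + "^s"
    # Rule 1: insert "e" between a sibilant-final stem and the suffix.
    form = re.sub(r"(s|x|z|ch|sh)\^s$", r"\1^es", form)
    # Rule 2: erase the morpheme boundary. It must apply after rule 1,
    # because rule 1 refers to the boundary.
    form = form.replace("^", "")
    return form


# Analysis B: no rewrite rules; pick the suffix allomorph from the stem's ending.
SIBILANT_ENDINGS = ("s", "x", "z", "ch", "sh")

def plural_team_b(stem: str) -> str:
    suffix = "es" if stem.endswith(SIBILANT_ENDINGS) else "s"
    return stem + suffix


for stem in ["cat", "fox", "church", "bus", "dog"]:
    assert plural_team_a(stem) == plural_team_b(stem)
    print(stem, "->", plural_team_a(stem))
```

The outputs agree on these stems, but the code does not resemble each other much, which is the kind
of difference one would expect between teams that worked independently.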
--
Mike Maxwell
maxwell at umiacs.umd.edu
"My definition of an interesting universe is
one that has the capacity to study itself."
--Stephen Eastmond