[Corpora-List] phrase alignment (Saeed Farzi)

Nadir Durrani aquadurrani at gmail.com
Thu Feb 13 11:49:51 UTC 2014


Hi,

Moses outputs phrase alignments by default, which are then removed in a
subsequent step. For example,

by a government |19-21| authority |22-22| of their state |23-24|

means that the phrase made up of the source words at positions 19, 20 and 21
was translated into the target phrase "by a government".
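If you want to post-process this annotated output yourself, a minimal Python sketch could split each line into (target phrase, source positions) pairs. It assumes only the "phrase |first-last|" layout shown above, with inclusive spans as described; the regex and the name parse_segmentation are illustrative and not part of Moses.

```python
import re

# One "target phrase |first-last|" unit; spans are assumed to be inclusive.
SEGMENT = re.compile(r"\s*([^|]*?)\s*\|(\d+)-(\d+)\|")


def parse_segmentation(line):
    """Return (target_phrase, source_positions) pairs from annotated output."""
    pairs = []
    for m in SEGMENT.finditer(line):
        first, last = int(m.group(2)), int(m.group(3))
        pairs.append((m.group(1), list(range(first, last + 1))))
    return pairs


for phrase, positions in parse_segmentation(
        "by a government |19-21| authority |22-22| of their state |23-24|"):
    print(f"{positions} -> {phrase!r}")
```

For the example line this prints [19, 20, 21] -> 'by a government', then [22] -> 'authority', then [23, 24] -> 'of their state'. Text without a trailing |first-last| marker is simply ignored.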

If you want phrase-internal (word-to-word) alignments, add one of the
following options to the decoder command to get alignment information:

-alignment-output-file
-print-alignment-info-in-n-best
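Word alignments are commonly written as whitespace-separated "i-j" links. Assuming the source index comes first and the target index second (check this against the output of the options above before relying on it), a small sketch for reading one such line and grouping the links by source word follows. The function names are hypothetical, and the link string is hand-written for Saeed's example sentence pair, not real decoder output.

```python
from collections import defaultdict


def parse_word_alignment(line):
    """Parse whitespace-separated 'i-j' links into (source, target) index pairs."""
    links = []
    for token in line.split():
        src, tgt = token.split("-")
        links.append((int(src), int(tgt)))
    return links


def targets_per_source(links):
    """Map each aligned source position to its sorted target positions."""
    grouped = defaultdict(list)
    for src, tgt in links:
        grouped[src].append(tgt)
    return {src: sorted(tgts) for src, tgts in sorted(grouped.items())}


# Hand-written links for "I go to the home" / "man be khaneh miravam".
links = parse_word_alignment("0-0 1-3 2-1 3-1 4-2")
print(targets_per_source(links))
```

This prints {0: [0], 1: [3], 2: [1], 3: [1], 4: [2]}, i.e. both "to" and "the" link to "be". Unaligned source words simply do not appear as keys.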


If the target string is known in advance, you can use forced decoding, which
gives you the best phrasal alignment.
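To make the idea behind forced decoding concrete, here is a toy Python search over Saeed's example (quoted below). It keeps only segmentations that cover every source word exactly once and reproduce the given target string exactly, and returns the highest-scoring one. This is a sketch, not how Moses implements it: the phrase table and its probabilities are made up, and a real decoder would also score with a language model, a reordering model and distortion limits.

```python
import math
from functools import lru_cache

# Hypothetical toy phrase table (not Moses' on-disk format):
# source phrase (tuple of words) -> list of (target phrase, probability).
PHRASE_TABLE = {
    ("I",): [(("man",), 0.9)],
    ("go",): [(("miravam",), 0.7)],
    ("to", "the"): [(("be",), 0.6)],
    ("to",): [(("be",), 0.3)],
    ("the", "home"): [(("khaneh",), 0.2)],
    ("home",): [(("khaneh",), 0.8)],
}


def forced_align(source, target, table):
    """Best (log_score, alignment) that covers every source word once and
    reproduces `target` exactly, or None if no such segmentation exists.
    Alignment entries are ((i, j), source_phrase, target_phrase) in target order."""
    src, tgt = tuple(source.split()), tuple(target.split())
    full = (1 << len(src)) - 1

    @lru_cache(maxsize=None)
    def best(covered, tpos):
        # Target fully produced: succeed only if every source word was used.
        if tpos == len(tgt):
            return (0.0, ()) if covered == full else None
        result = None
        for i in range(len(src)):
            for j in range(i + 1, len(src) + 1):
                span = ((1 << j) - 1) ^ ((1 << i) - 1)  # bits i..j-1
                if covered & span:
                    continue  # overlaps an already translated source word
                for phrase, prob in table.get(src[i:j], []):
                    if tgt[tpos:tpos + len(phrase)] != phrase:
                        continue  # would not reproduce the reference target
                    rest = best(covered | span, tpos + len(phrase))
                    if rest is None:
                        continue
                    score = math.log(prob) + rest[0]
                    if result is None or score > result[0]:
                        entry = ((i, j), " ".join(src[i:j]), " ".join(phrase))
                        result = (score, (entry,) + rest[1])
        return result

    return best(0, 0)


score, alignment = forced_align("I go to the home", "man be khaneh miravam", PHRASE_TABLE)
for (i, j), src_phrase, tgt_phrase in alignment:
    print(f"[{src_phrase} --> {tgt_phrase}]  source words {i}..{j - 1}")
```

With these made-up probabilities it prints [I --> man], [to the --> be], [home --> khaneh], [go --> miravam], beating the competing "to" / "the home" segmentation. The memo key is (covered source words, target position), so the number of states grows as 2^n for n source words; that is fine for a sentence this short but not for long ones.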

Cheers,
Nadir

On Thu, Feb 13, 2014 at 11:00 AM, <corpora-request at uib.no> wrote:

> Today's Topics:
>
>    1.  phrase alignment (Saeed Farzi)
>    2.  First Announcement: The Fifth Swedish Language Technology
>       Conference (SLTC-14) (Jörg Tiedemann)
>    3.  SEPLN 2013 - 1st Call for Papers (Horacio Saggion)
>    4.  Call for Demos: NLDB'2014, Montpellier - France (Mathieu Roche)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Wed, 12 Feb 2014 17:55:15 +0330
> From: Saeed Farzi <saeedfarzi at gmail.com>
> Subject: [Corpora-List] phrase alignment
> To: "corpora at uib.no" <corpora at uib.no>, moses-support
>         <moses-support at mit.edu>
>
> Dear all,
>
>
> I have a question about finding the best phrase alignments, i.e. the
> alignments used by Moses during the decoding phase.
>
> I have a pair of parallel sentences (a source and a target). I need the best
> phrase alignment between the source and target sentences, where the best
> phrase alignment is the one Moses uses to translate the source sentence
> into the target sentence.
>
> Let me use an example to explain what I want.
> Example:
> I have a sentence pair:
> Source: I go to the home
> Target: man be khaneh miravam (in Farsi)
>
> I need the following alignment:
>
> The best alignment: [I --> man] [to the --> be] [home --> khaneh]
> [go --> miravam]
> The result includes two sorts of information:
>
> 1- the best segments
> 2- the best alignment
>
> We can use Moses to extract the alignments by feeding the training
> sentences to the Moses decoder as input, but there is a problem: the
> decoder output is not exactly the same as the target sentence.
>
> I know that GIZA++ is used for word alignments. I need a solution for
> phrase alignments.
> Tnx
> --
>            S.Farzi, Ph.D. Student
>     Natural Language Processing Lab,
>   School of Electrical and Computer Eng.,
>                Tehran University
>              Tel: +9821-6111-9719
>
> ------------------------------
>
> Message: 2
> Date: Wed, 12 Feb 2014 14:50:43 +0000
> From: Jörg Tiedemann <Jorg.Tiedemann at lingfil.uu.se>
> Subject: [Corpora-List] First Announcement: The Fifth Swedish Language
>         Technology Conference (SLTC-14)
> To: sltc2014 <sltc2014 at lingfil.uu.se>
> Cc: "elsnet-list at elsnet.org" <elsnet-list at elsnet.org>,
>         "acl at aclweb.org" <acl at aclweb.org>,      "nordlingnet at uib.no"
>         <nordlingnet at uib.no>,   "alla at gslt.hum.gu.se" <alla at gslt.hum.gu.se>,
>         "nodali at helsinki.fi" <nodali at helsinki.fi>,      "corpora at uib.no"
>         <corpora at uib.no>
>
> The Fifth Swedish Language Technology Conference (SLTC-14)
> http://www2.lingfil.uu.se/SLTC2014/
>
>
> Uppsala, Sweden
> November 13-14, 2014
>
> The Fifth Swedish Language Technology Conference (SLTC-14) will be held in
> Uppsala, November 13-14, 2014, organized by the Computational Linguistics
> Group at the Department of Linguistics and Philology at Uppsala University.
> Papers and workshops will be invited on all aspects of language technology,
> including natural language processing, speech technology, and relevant
> neighboring areas. Calls for workshops and papers will be issued in early
> March.
>
>
> Important dates:
>
> Workshop Proposal Submission: May 31, 2014
> Workshop Notification of Acceptance: June 15, 2014
> Abstract Submission: September 1, 2014
> Notification of Acceptance: September 22, 2014
> Final Abstract Submission: October 13, 2014
> Registration (Early Bird): October 13, 2014
> Conference and Workshops: November 13-14, 2014
>
>
> URL:
> http://www2.lingfil.uu.se/SLTC2014/
>
>
> Contact:
> Scientific issues: sltc2014 at lingfil.uu.se
> Practical issues: sltc2014 at akademikonferens.uu.se
>
>
>
>
> ------------------------------
>
> Message: 3
> Date: Wed, 12 Feb 2014 16:28:18 +0100
> From: Horacio Saggion <horacio.saggion at upf.edu>
> Subject: [Corpora-List] SEPLN 2013 - 1st Call for Papers
> To: corpora <corpora at uib.no>
>
> --------------------------------------------------------------------
>
> CALL FOR PAPERS:
>
> 30th CONFERENCE OF THE SPANISH SOCIETY
> FOR NATURAL LANGUAGE PROCESSING (SEPLN 2014)
> September 17-19, 2014
> Universitat de Girona
> http://www.taln.upf.edu/pages/sepln2014/es/index.html
>
> --------------------------------------------------------------------
>
>
> INTRODUCTION
> -----------------------
>
>
> The 30th edition of the Annual Conference of the Spanish Society for
> Natural Language Processing (SEPLN) will take place in Universitat de
> Girona, Girona, Spain on 17-19 September 2014.  We also expect to organize
> associated workshops.
>
>
> The huge amount of information available in digital format and in different
> languages calls for systems that let us access this vast library in an
> increasingly structured way.
>
> In this same area, there is a renewed interest in improving information
> accessibility and information exploitation in multilingual environments.
> Many of the formal foundations for dealing appropriately with these
> needs have been, and are still being, established in the area of
> Natural Language Processing and its many branches: Information extraction
> and retrieval, Question answering systems, Machine translation, Automatic
> analysis of textual content, Text summarization, Text generation, and
> Speech recognition and synthesis.
>
> The aim of the conference is to provide a forum for discussion and
> communication where the latest research work and developments in the field
> of Natural Language Processing (NLP) can be presented by scientific and
> business communities. The conference also aims at exposing new
> possibilities of real applications and R&D projects in this field.
>
> Moreover, as in previous editions, the conference intends to identify
> future directions for basic research and foreseeable software
> applications, in order to compare them against market needs. Finally,
> the conference intends to be an appropriate forum for helping new
> professionals become active members of this field.
>
>
> TOPICS
> -----------
>
>
> Researchers and companies are encouraged to send communications, project
> abstracts or demonstrations related to any language technology topic
> including but not limited to the following:
>
> * Linguistic, mathematical and psycholinguistic models of language.
> * Machine learning in NLP.
> * Computational lexicography and terminology.
> * Corpus linguistics.
> * Development of linguistic resources and tools.
> * Grammars and formalisms for morphological and syntactic analysis.
> * Semantics, pragmatics and discourse.
> * Lexical ambiguity resolution.
> * Monolingual and multilingual text generation.
> * Machine translation.
> * Speech synthesis and recognition.
> * Dialogue systems.
> * Audio indexing.
> * Monolingual and multilingual information extraction and retrieval.
> * Question answering systems.
> * Evaluation of NLP systems.
> * Automatic textual content analysis.
> * Sentiment analysis and opinion mining.
> * Plagiarism detection.
> * Negation and speculation processing.
> * Text mining in blogosphere and social networks.
> * Text summarization.
> * Image retrieval.
> * NLP in biomedical domain.
> * NLP-based generation of teaching resources.
> * NLP for languages with limited resources.
> * NLP industrial applications.
>
>
> CONTACT
> --------------
>
> All information related to the conference can be found on the website:
>
>
> http://www.taln.upf.edu/pages/sepln2014/en/index.html
>
>
> STRUCTURE OF THE CONFERENCE
> --------------------------------------------------
>
>
> The conference will last three days, and will consist of sessions devoted
> to presenting papers, posters, tutorials, ongoing research projects and
> prototype or product demonstrations connected with topics addressed in the
> conference. In addition, we expect to organize associated workshops.
>
>
> SUBMISSION OF CONTRIBUTIONS
> -------------------------------------------------
>
>
> Authors are encouraged to send theoretical or application-oriented
> proposals related to NLP. The proposals must include the following
> sections:
>
> * The title of the communication.
> * An abstract in English and Spanish (maximum 150 words) and a list of
> keywords.
> * The paper can be written in Spanish or English. Its overall maximum
> length will be 8 pages, including references.
> * The documents must not include headers or footers.
> * Papers should NOT include the names of the authors.
>
> Proposed papers will be reviewed by at least three reviewers and can
> be accepted for presentation either as posters or as communications,
> depending on the program's needs. However, no distinction will be made
> between communications and posters in the printed version of the SEPLN
> journal.
>
>
>
>
>
> *** IMPORTANT NOTE ON CAMERA READY ****
>
> The final version of the paper (camera ready) should be submitted together
> with a cover letter explaining how the suggestions of the reviewers were
> implemented in the final version.
>
> **********************************************
>
>
>
> Please send your proposals using the following link:
>
> http://www.sepln.org/myreview-sepln53/
>
>
> The format of the SEPLN journal must be followed:
>
> http://www.sepln.org/?page_id=1285&lang=en
>
> In addition, all proposals must comply with the following
> requirements, depending on whether they are papers, demos or projects.
>
>
> PROJECTS AND DEMOS
> ----------------------------------
>
>
> As in previous editions, the organizers encourage participants to give oral
> presentations of R&D projects and demos of systems or tools related to the
> NLP field. For oral presentations on R&D projects to be accepted, the
> following information must be included:
>
> * Project title.
> * Name, affiliation, address, e-mail and phone number of the project
> director.
> * Funding institutions.
> * Groups participating in the project.
> * Abstract (4 pages maximum, including references).
>
>
> For demonstrations to be accepted, the following information is mandatory:
>
> * Demo title.
> * Name, affiliation, e-mail and phone number of the authors.
> * Abstract (4 pages maximum, including references).
> * Time estimation for the whole presentation.
>
>  **** SEE NOTE ON CAMERA READY ABOVE ****
>
> IMPORTANT DATES
> ----------------------------
>
>
>  Deadline for full papers, demos, and projects: 10th April 2014
>
> Notifications: 26th May 2014
>
> Camera Ready: 7th June 2014
>
>
>
>
> --
> Dr. Horacio Saggion
> TALN / DTIC
> Universitat Pompeu Fabra
> http://www.dtic.upf.edu/~hsaggion/
>
> ------------------------------
>
> Message: 4
> Date: Wed, 12 Feb 2014 18:16:32 +0100
> From: Mathieu Roche <Mathieu.Roche at lirmm.fr>
> Subject: [Corpora-List] Call for Demos: NLDB'2014, Montpellier -
>         France
> To: <corpora at uib.no>, <acl at aclweb.org>, <ISWORLD at listserv.heanet.ie>,
>         <IRList at lists.shef.ac.uk>, <bionlp at bionlp.org>, <dbworld at cs.wisc.edu>,
>         <ln at cines.fr>, <liste-egc at polytech.univ-nantes.fr>, <bull-i3 at irit.fr>
>
>  *******************************************
>
>  Call for Demos - NLDB'2014
>
>  18-20 June 2014 - Montpellier, France
>
>  http://www.nldb.org/
>
>  *******************************************
>
>  The 19th International Conference on Application of Natural Language to
>  Information Systems (NLDB'2014) invites submissions of demonstrations of
>  state-of-the-art research or industrial prototypes related to all
>  aspects of Natural Language in the Database and Information Systems
>  field.
>  Topics of interest include but are not limited to:
>  - Applications of NLP in Information Systems
>  - Social Media and Web Data
>  - Big Data and Natural Language
>  - Semantic Web and Open Linked Data
>  - Question Answering (QA)
>  - Natural language and Ubiquitous Computing
>  - Natural Language in Conceptual Modeling
>  - NLP Applications (Opinion Mining, Information Extraction, ...)
>
>  Demo submissions will be handled online via the EasyChair conference
>  management system:
>  https://www.easychair.org/conferences/?conf=nldb2014demonstratio
>
>  Demonstration paper submissions should have 4 pages (LNCS format).
>  Developers should outline the design of their system and provide
>  details to allow the evaluation of its validity, quality, originality,
>  and relevance to NLP in Information Systems.
>
>  The accepted papers for demos will be included in the conference
>  proceedings, to be published by Springer Verlag in the "Lecture Notes in
>  Computer Science" (LNCS) Series. The demos will be presented in a
>  special demonstration session. At least one of the demo submitters must
>  register for the conference, and perform the demo on site.
>
>  Important Dates:
>  - Demo submission deadline (firm): March 13, 2014
>  - Notification of acceptance: March 28, 2014
>  - Camera-ready paper due: April 7, 2014
>
>
>
>
> ----------------------------------------------------------------------
> Send Corpora mailing list submissions to
>         corpora at uib.no
>
> To subscribe or unsubscribe via the World Wide Web, visit
>         http://mailman.uib.no/listinfo/corpora
> or, via email, send a message with subject or body 'help' to
>         corpora-request at uib.no
>
> You can reach the person managing the list at
>         corpora-owner at uib.no
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Corpora digest..."
>
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>
>
> End of Corpora Digest, Vol 80, Issue 14
> ***************************************
>