30.3540, Software: Release of the DSM-parameter-analysis GitHub repository

The LINGUIST List linguist at listserv.linguistlist.org
Fri Sep 20 09:14:27 UTC 2019


LINGUIST List: Vol-30-3540. Fri Sep 20 2019. ISSN: 1069 - 4875.

Moderator: Malgorzata E. Cavar (linguist at linguistlist.org)
Student Moderator: Jeremy Coburn
Managing Editor: Becca Morris
Team: Helen Aristar-Dry, Everett Green, Sarah Robinson, Peace Han, Nils Hjortnaes, Yiwen Zhang, Julian Dietrich
Jobs: jobs at linguistlist.org | Conferences: callconf at linguistlist.org | Pubs: pubs at linguistlist.org

Homepage: http://linguistlist.org

Editor for this issue: Everett Green <everett at linguistlist.org>
================================================================


Date: Fri, 20 Sep 2019 05:10:04
From: András Dobó [dobo at inf.u-szeged.hu]
Subject: Release of the DSM-parameter-analysis GitHub repository

 
Dear Colleagues,

We are pleased to announce the release of the GitHub repository connected to
the PhD dissertation of András Dobó:
Dobó, A.: A comprehensive analysis of the parameters in the creation and
comparison of feature vectors in distributional semantic models for multiple
languages. University of Szeged (2019)
http://doktori.bibl.u-szeged.hu/10120/1/AndrasDoboThesis2019.pdf

The GitHub repository, which includes the source code together with the
libraries, resources and test datasets used, is available at:
https://github.com/doboandras/dsm-parameter-analysis

The project implements a distributional semantic model (DSM) with 10 freely
adjustable parameters. For some of the parameters more than a thousand
possible settings are implemented, resulting in trillions of possible
configurations. This freely configurable DSM accepts any corpus or set of
word vectors as input, and can be tested on multiple standard test datasets.
It currently supports the following languages: English, Spanish and Hungarian.
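
To illustrate why the number of configurations grows so quickly, here is a
minimal Python sketch of how a configuration space is the product of the
settings available for each parameter. The parameter names and settings below
are hypothetical placeholders chosen for illustration; they are not taken from
the repository, and the repository's own code may be organised differently.

```python
from itertools import product
from math import prod

# Hypothetical parameters and settings, for illustration only.
# They are NOT the ten parameters implemented in the repository.
PARAMETERS = {
    "weighting": ["none", "ppmi"],
    "similarity": ["cosine", "jaccard"],
    "window_size": [1, 2, 5],
}


def count_configurations(parameters):
    """Return the number of distinct configurations.

    Each configuration picks one setting per parameter, so the total is
    the product of the number of settings of every parameter.
    """
    return prod(len(settings) for settings in parameters.values())


def iter_configurations(parameters):
    """Yield every configuration as a dict mapping parameter -> setting."""
    names = list(parameters)
    for values in product(*(parameters[name] for name in names)):
        yield dict(zip(names, values))


if __name__ == "__main__":
    print(count_configurations(PARAMETERS))
    for config in iter_configurations(PARAMETERS):
        print(config)
```

Because the counts multiply, a handful of parameters with more than a thousand
settings each is already enough to push the total into the trillions, which is
why an exhaustive search over all configurations is usually not feasible.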

Abstract of the dissertation:

Measuring the semantic similarity and relatedness of words is important for
many natural language processing tasks. Although distributional semantic
models designed for this task have many different parameters, such as vector
similarity measures, weighting schemes and dimensionality reduction
techniques, there is no truly comprehensive study simultaneously evaluating
these parameters while also analysing the differences in the findings for
multiple languages.
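
As a rough illustration of two of the parameter types named above (a
weighting scheme and a vector similarity measure), the following Python sketch
builds co-occurrence vectors from a toy corpus, weights them with positive
pointwise mutual information (PPMI) and compares words with cosine similarity.
PPMI and cosine are common textbook choices used here as assumptions; the
function names, the window size and the toy corpus are all hypothetical and
are not the dissertation's configuration or the repository's API.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Count, for each word, the words occurring within `window` positions."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, target in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vectors[target][tokens[j]] += 1
    return vectors


def ppmi(vectors):
    """Weighting scheme: replace raw counts with positive PMI values."""
    total = sum(sum(ctx.values()) for ctx in vectors.values())
    word_totals = {w: sum(ctx.values()) for w, ctx in vectors.items()}
    context_totals = Counter()
    for ctx in vectors.values():
        context_totals.update(ctx)
    weighted = {}
    for word, ctx in vectors.items():
        weighted[word] = {}
        for context, count in ctx.items():
            expected = word_totals[word] * context_totals[context]
            pmi = math.log2(count * total / expected)
            if pmi > 0:  # keep only positive associations
                weighted[word][context] = pmi
    return weighted


def cosine(u, v):
    """Similarity measure: cosine of the angle between two sparse vectors."""
    dot = sum(value * v.get(key, 0.0) for key, value in u.items())
    norm_u = math.sqrt(sum(value * value for value in u.values()))
    norm_v = math.sqrt(sum(value * value for value in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy corpus, for illustration only.
CORPUS = [
    ["the", "cat", "drinks", "milk"],
    ["the", "dog", "drinks", "water"],
    ["the", "cat", "chases", "the", "dog"],
]

if __name__ == "__main__":
    weighted = ppmi(cooccurrence_vectors(CORPUS, window=2))
    print(cosine(weighted["cat"], weighted["dog"]))
```

Swapping `ppmi` or `cosine` for another function with the same inputs and
outputs changes one parameter while leaving the rest of the pipeline intact,
which is the kind of variation a systematic parameter study explores.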

We address this gap with a systematic study that searches for the best
configuration in the creation and comparison of feature vectors in
distributional semantic models for English, Spanish and Hungarian separately,
and then compares the findings across these languages.

During our extensive analysis we test a large number of possible settings for
all parameters, including more than a thousand novel variants for some of
them. As a result, we were able to find configurations that significantly
outperform conventional configurations and achieve state-of-the-art results.

For more information, please see the publications below:

Dobó, A.: A comprehensive analysis of the parameters in the creation and
comparison of feature vectors in distributional semantic models for multiple
languages. University of Szeged (2019)
http://doktori.bibl.u-szeged.hu/10120/1/AndrasDoboThesis2019.pdf

Dobó A., Csirik J.: Comparison of the Best Parameter Settings in the Creation
and Comparison of Feature Vectors in Distributional Semantic Models Across
Multiple Languages. In: MacIntyre J., Maglogiannis I., Iliadis L., Pimenidis
E. (eds) Artificial Intelligence Applications and Innovations. AIAI 2019. IFIP
Advances in Information and Communication Technology, vol 559. 487-499.
Springer, Cham. (2019)
http://www.inf.u-szeged.hu/~dobo/Publications/Comparison%20of%20the%20best%20parameter%20settings%20of%20DSMs%20across%20languages.pdf

Dobó A., Csirik J.: A Comprehensive Study of the Parameters in the Creation
and Comparison of Feature Vectors in Distributional Semantic Models. Journal
of Quantitative Linguistics (2019)
https://doi.org/10.1080/09296174.2019.1570897

If you have any questions or comments, please email me at:
dobo at inf.u-szeged.hu.

Best regards,
Andras Dobo
Institute of Informatics
University of Szeged
dobo at inf.u-szeged.hu
http://www.inf.u-szeged.hu/~dobo/


Linguistic Field(s): Computational Linguistics
                     Semantics

Subject Language(s): English (eng)
                     Hungarian (hun)
                     Spanish (spa)


