Using Semantic Data to Improve Cross-Lingual Linking of Article Clusters

8 Pages. Posted: 17 Jan 2020. Publication Status: Accepted.

Evgenia Belyaeva

Jožef Stefan Institute; Jožef Stefan Institute - International Postgraduate School

Aljaz Kosmerlj

Jožef Stefan Institute

Andrej Muhic

Jožef Stefan Institute

Jan Rupnik

Jožef Stefan Institute

Flavio Fuart

Jožef Stefan Institute

Abstract

This paper presents a system that uses semantic data to improve cross-lingual linking of news article clusters. Two approaches are compared. The first is based on two different Canonical Correlation Analysis (CCA) feature vector definitions, MAX-CCA and SUM-CCA; the second combines the better-performing CCA approach with entity vectors. The aim of the comparison was to determine whether taking the semantic aspect of news into account increases performance and improves linking. Evaluations of these techniques on a news corpus, both against Google News and manually, showed good performance of our system. The overall gain in precision and recall when using entity vectors was significant.

Keywords: semantic data, natural language processing, cross-linguality, canonical correlation analysis

Suggested Citation

Belyaeva, Evgenia and Kosmerlj, Aljaz and Muhic, Andrej and Rupnik, Jan and Fuart, Flavio, Using Semantic Data to Improve Cross-Lingual Linking of Article Clusters (2015). Available at SSRN: https://ssrn.com/abstract=3198921 or http://dx.doi.org/10.2139/ssrn.3198921

Evgenia Belyaeva (Contact Author)

Jožef Stefan Institute

Jamova cesta 39
Ljubljana, 1000
Slovenia

Jožef Stefan Institute - International Postgraduate School

Jamova 39
Ljubljana, SI-1000
Slovenia

Aljaz Kosmerlj

Jožef Stefan Institute

Jamova cesta 39
Ljubljana, 1000
Slovenia

Andrej Muhic

Jožef Stefan Institute

Jamova cesta 39
Ljubljana, 1000
Slovenia

Jan Rupnik

Jožef Stefan Institute

Jamova cesta 39
Ljubljana, 1000
Slovenia

Flavio Fuart

Jožef Stefan Institute

Jamova cesta 39
Ljubljana, 1000
Slovenia
