Multilingual Media Summarization and Sentiment Analysis

Project description

News aggregators gather thousands of news articles every day from across the world and cluster them into news stories comprising large numbers of articles about the same event. This volume is further increased by the growing amount of information in social media, where mass opinions about news events can be monitored. A promising way to reduce this bulk of highly redundant data is offered by the language technologies known as multi-document text summarisation and sentiment analysis. A major problem is that news articles are written in many languages, while current technology has mostly dealt with English, and the question of how current research can be applied in a heavily multilingual context has barely been addressed.

Most multi-document summarisation research has focused on the news domain, and MediaGist will do likewise in order to build on existing techniques and resources. However, summarising posts in social media, which represent complementary and unbiased information, will be considered as well. Media professionals will also want to go beyond summaries from sources in one language and consider how news events are reported in other countries and from other perspectives. Identifying differences in opinion towards entities and events may provide some clues as to the disagreements in reporting across languages. MediaGist will perform multilingual sentiment analysis and will thus make it possible to generate summaries that reveal these disagreements.

The goal of MediaGist is to make significant advances in multilingual research so as to extract and present the GIST (the main content and opinions) of online multilingual news and the corresponding content in social media.

Recipient: University of West Bohemia, Pilsen, Czech Republic
Funded by: European Commission
Program: FP7 People Programme - Marie Curie Actions - Career Integration Grants
Project: MediaGist, no. 630786
Duration: 03-2014 - 02-2018

Gu, Y., Anderson, A., Steinberger, J., Celli, F., Strapparava, C. and Poesio, M. (2014): Using Brain Data for Sentiment Analysis. In: Journal for Language Technology and Computational Linguistics 29(1), pages 79-94, ISSN 2190-6858.
Brychcin, T., Konkol, M. and Steinberger, J. (2014): UWB: Machine Learning Approach to Aspect-Based Sentiment Analysis. In: Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), pages 817–822, ACL, Dublin, Ireland. ISBN 978-1-941643-24-2
Steinberger, J., Brychcin, T. and Konkol, M. (2014): Aspect-Level Sentiment Analysis in Czech. In: Proceedings of the 5th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pages 24-30, ACL. Baltimore, USA. ISBN 978-1-941643-11-2
Habernal, I., Ptacek, T. and Steinberger, J. (2014): Supervised Sentiment Analysis in Czech Social Media. In: Information Processing & Management 50(5), pages 693–707, ISSN 0306-4573, DOI: 10.1016/j.ipm.2014.05.001, Elsevier.
Steinberger, J., Tanev, H., Zavarella, V., Steinberger, R., Turchi, M. (2014): Aspects of Multilingual News Summarisation. In: Alessandro Fiori (ed.): Innovative Document Summarization Techniques: Revolutionizing Knowledge Understanding, Advances in Data Mining and Database Management series, pages 277-294, ISSN: 2327-1981, ISBN 978-1-4666-5019-0, DOI: 10.4018/978-1-4666-5019-0.ch012, IGI Global.
Giannakopoulos, G., Kubina, J., Conroy, J., Steinberger, J., Favre, B., Kabadjov, M., Kruschwitz, U. and Poesio, M. (2015): MultiLing 2015: Multilingual Summarization of Single and Multi-Documents, On-line Fora, and Call-center Conversations. In: Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SigDial'15), pages 270-274. Prague, Czech Republic, ACL, ISBN 978-1-941643-75-4.
Steinberger, J., Tanev, H. (2015): Towards Multilingual Event Extraction Evaluation: A Case Study for the Czech Language. In: Proceedings of the 10th International Conference Recent Advances in Natural Language Processing (RANLP'15), pages 254-260. Hissar, Bulgaria, Incoma Ltd., ISSN 1313-8502.
Kabadjov, M., Steinberger, J., Barker, E., Kruschwitz, U. and Poesio, M. (2015): OnForumS: The Shared Task on Online Forum Summarisation at MultiLing’15. In: Proceedings of the Forum for Information Retrieval Evaluation (FIRE 2015), pages 21-26. ACM. ISBN: 978-1-4503-4004-5
Krejzl, P., Steinberger, J., Hercig, T., and Brychcin, T. (2016): Online Forum Summarization. In: Proceedings of Data a znalosti 2015, pages 85-88. VSB Ostrava. ISBN 978-80-248-3824-3.
Steinberger, J., Kabadjov, M. and Poesio, M. (2016): Coreference Applications to Summarization. In: Massimo Poesio, Roland Stuckardt and Yannick Versley (eds.), Anaphora Resolution: Algorithms, Resources, and Evaluation, Chapter 15, Springer, ISBN: 978-3-662-47908-7.
Kabadjov, M., Kruschwitz, U., Poesio, M., Steinberger, J. and Zaragoza, H. (2016): The OnForumS corpus from the Shared Task on Online Forum Summarisation at MultiLing 2015. In: Proceedings of the 10th Language Resources and Evaluation Conference, pages 814-818, ELRA, Portoroz, Slovenia, May 2016. ISBN 978-2-9517408-9-1.
Krejzl, P. and Steinberger, J. (2016): UWB at SemEval-2016 Task 6: Stance Detection. In: The 10th International Workshop on Semantic Evaluation (SemEval'16), pages 408-412, ACL. San Diego, USA, June 2016. ISBN: 978-1-941643-95-2.
Steinberger, J. (2016): MediaGist: A cross-lingual analyser of aggregated news and commentaries. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics - System Demonstrations, pages 145-150, ACL. Berlin, Germany, August, 2016. ISBN: 978-1-945626-03-6.
Hercig, T., Brychcin, T., Svoboda, L., Konkol, M. and Steinberger, J. (2016): Unsupervised methods to improve aspect-based sentiment analysis in Czech. In: Computacion y Sistemas, Vol. 20, No. 3, pages 365-375, ISSN 1405-5546.
Krejzl, P., Hourova, B., and Steinberger, J. (2016): Stance detection in online discussions. In: Proceedings of WIKT/DaZ'16, Smolenice, Slovakia, November 2016.
Giannakopoulos, G., Conroy, J., Kubina, J., Rankel, P., Lloret, E., Steinberger, J., Litvak, M., Favre, B. (2017): MultiLing 2017 Overview. In: Proceedings of the MultiLing 2017 Workshop on Summarization and Summary Evaluation Across Source Types and Genres, Varna, Bulgaria, September 2017.
Piskorski, J., Pivovarova, L., Snajder, J., Steinberger, J., Yangarber, R. (2017): The First Cross-Lingual Challenge on Recognition, Normalization, and Matching of Named Entities in Slavic Languages. In: Proceedings of the Sixth Workshop on Balto-Slavic Natural Language Processing, ACL, Valencia, Spain, April 2017.
Tanev, H., Zavarella, V. and Steinberger, J. (2017): Monitoring disaster impact: detecting micro-events and eyewitness reports in mainstream and social media. In: Proceedings of the 14th International Conference on Information Systems for Crisis Response and Management, Albi, France, May 2017.
Hercig, T., Krejzl, P., Hourova, B., Steinberger, J. and Lenc, L. (2017): Detecting stance in Czech news commentaries. In: Proceedings of the 17th conference ITAT 2017: Slovenskocesky NLP workshop (SloNLP 2017), CreateSpace Independent Publishing Platform, Bratislava, Slovakia, September 2017.
Steinberger, J., Brychcin, T., Hercig, T., Krejzl, P. (2017): Cross-lingual flames detection in news discussions. In: Proceedings of the International Conference Recent Advances in Natural Language Processing, Varna, Bulgaria, September 2017.
Steinberger, J., Hercig, T., Brychcin, T. (2017): Pyramid-based Summary Evaluation Using Abstract Meaning Representation. In: Proceedings of the International Conference Recent Advances in Natural Language Processing, Varna, Bulgaria, September 2017.
Josef Steinberger, jstein[at]kiv[dot]zcu[dot]cz

University of West Bohemia

Faculty of Applied Sciences

Department of Computer Science and Engineering

NTIS - New Technologies for the Information Society, Research Center

Supported by

European Commission

Seventh Framework Programme