Automatic Keyphrase Extraction

Keyphrases (KPs) are phrases that “capture the main topic discussed on a given document”. More specifically, KPs are phrases, typically one to five words long, that appear verbatim in a document and can be used to briefly summarize its content. The task of finding such KPs is called Automatic Keyphrase Extraction (AKE). AKE has recently received a lot of attention because it has been successfully used in many natural language processing (NLP) tasks, such as text summarization and document clustering, as well as in non-NLP tasks such as social network analysis and user modeling. AKE approaches have also been applied to the retrieval of relevant documents from digital archives, which can contain heterogeneous types of items, such as books, articles, and papers. However, given the wide variety of lexical, linguistic and semantic aspects that can contribute to defining a keyphrase, it is difficult to design hand-crafted features, and even the best-performing algorithms rarely reach F1-scores of 50% on the most common evaluation sets. […]
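To make the definition of a keyphrase and the F1 evaluation mentioned above more concrete, here is a minimal Python sketch (assuming Python 3.9+). `candidate_phrases` enumerates every contiguous run of one to five words that occurs verbatim in a text, i.e. the candidate space the definition implies, and `exact_match_f1` scores a predicted keyphrase set against a gold set using exact, case-insensitive matching. Both function names and the simple regex tokenizer are illustrative assumptions, not part of any particular AKE system; real pipelines usually filter candidates (for example with part-of-speech patterns), and some benchmarks stem phrases before matching, which this sketch does not do.

```python
import re
from typing import Iterable


def candidate_phrases(text: str, max_len: int = 5) -> set[str]:
    """Return every contiguous 1..max_len word n-gram that appears verbatim in text."""
    # Hypothetical tokenizer: lowercase alphanumeric words, keeping inner hyphens.
    tokens = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())
    return {
        " ".join(tokens[i:i + n])
        for n in range(1, max_len + 1)
        for i in range(len(tokens) - n + 1)
    }


def exact_match_f1(predicted: Iterable[str], gold: Iterable[str]) -> tuple[float, float, float]:
    """Precision, recall and F1 of predicted vs. gold keyphrases (exact, case-insensitive)."""
    pred = {p.lower().strip() for p in predicted}
    ref = {g.lower().strip() for g in gold}
    tp = len(pred & ref)  # phrases present in both sets
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    doc = "Keyphrase extraction helps document clustering and text summarization."
    print("document clustering" in candidate_phrases(doc))  # True
    predicted = ["keyphrase extraction", "document clustering", "text", "summarization"]
    gold = ["keyphrase extraction", "document clustering", "text summarization"]
    p, r, f1 = exact_match_f1(predicted, gold)
    print(p, round(r, 3), round(f1, 3))  # 0.5 0.667 0.571
```

In the usage example two of the four predictions match, so precision is 0.5, recall is 2/3 and F1 is 4/7. Splitting "text summarization" into two separate words earns no credit at all, which illustrates how strict exact matching can be.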


Predicting the Usefulness of Amazon Reviews Using Argumentation Mining

Argumentation is the discipline that studies the way in which humans debate and articulate their opinions and beliefs. Argumentation mining is a research area at the crossroads of many fields, such as computational linguistics, machine learning, artificial intelligence, and natural language processing. Its main goal is the automatic extraction and identification of arguments, and of the relations between them, from natural language text documents.

Internet users generate content at unprecedented rates, so building intelligent systems capable of discriminating useful content within this ocean of information is becoming an urgent need. In this paper, we aim to predict the usefulness of Amazon reviews; to do so, we exploit features produced by an off-the-shelf argumentation mining system. We argue that the usefulness of a review is in fact strictly related to its argumentative content. Moreover, using an already-trained system avoids the costly need to label a new dataset. Results obtained on a large publicly available corpus support this hypothesis. Related […]
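As a purely hypothetical illustration of how the output of an argument miner can be turned into review-level features, the sketch below assumes that the off-the-shelf system tags each sentence of a review as `"claim"`, `"premise"`, or `"none"`. The label names, the function name `argumentative_features`, and the chosen features (counts and fractions of each argumentative label) are assumptions made for illustration; they are not the feature set used in this paper.

```python
from collections import Counter

# Assumed (hypothetical) label set emitted by the argument miner for each sentence;
# any sentence not tagged with one of these labels counts as non-argumentative.
ARG_LABELS = ("claim", "premise")


def argumentative_features(sentence_labels: list[str]) -> dict[str, float]:
    """Map the per-sentence argument labels of one review to a flat feature dict."""
    n = len(sentence_labels)
    counts = Counter(sentence_labels)
    features: dict[str, float] = {}
    for label in ARG_LABELS:
        # Absolute count and share of sentences carrying this label.
        features[f"n_{label}"] = float(counts[label])
        features[f"frac_{label}"] = counts[label] / n if n else 0.0
    # Share of sentences that carry any argumentative label at all.
    n_arg = sum(counts[label] for label in ARG_LABELS)
    features["frac_argumentative"] = n_arg / n if n else 0.0
    return features


if __name__ == "__main__":
    # Hypothetical miner output for a four-sentence review.
    labels = ["claim", "premise", "premise", "none"]
    print(argumentative_features(labels))
    # {'n_claim': 1.0, 'frac_claim': 0.25, 'n_premise': 2.0, 'frac_premise': 0.5, 'frac_argumentative': 0.75}
```

Vectors like these, possibly concatenated with other review features, could then be fed to any standard classifier or regressor trained on usefulness labels. Because the miner itself is already trained, only those usefulness labels are needed, which matches the motivation for reusing an off-the-shelf system.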