Video Question Answering

Video Question Answering (VideoQA) is a task that requires analyzing and jointly reasoning over both the given video data and a question related to its visual content, in order to produce a meaningful and coherent answer. Solving this task would bring models close to human-level capability in dealing with both complex video data and visual content-related text, since it requires learning to isolate and pinpoint objects of interest in the video, to identify and reason about their interactions in both the spatial and temporal domains, and to find the essential bindings with the given question. VideoQA thus represents a challenging task at the interface between Computer Vision and Natural Language Processing (NLP). Modern approaches to this task involve a wide selection of techniques, such as: temporal and spatial attention, to learn which frames, and which regions in each frame, are more important for solving the task; and, given the multimodal nature of the data, cross-modality fusion mechanisms, question-answer-aware […]
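As a toy illustration of the temporal-attention idea mentioned above (not the project's actual model; the dot-product scoring and all names are assumptions), the sketch below weights per-frame features by their relevance to a question embedding:

```python
import math

def temporal_attention(frame_feats, question_vec):
    """Score each frame against the question embedding (dot product),
    softmax the scores, and return the attention-weighted sum of
    frame features together with the attention weights."""
    scores = [sum(f * q for f, q in zip(frame, question_vec))
              for frame in frame_feats]
    m = max(scores)                       # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    attended = [sum(w * frame[d] for w, frame in zip(weights, frame_feats))
                for d in range(len(frame_feats[0]))]
    return attended, weights

frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # toy per-frame features
question = [1.0, 0.0]                            # toy question embedding
vec, w = temporal_attention(frames, question)
print([round(x, 3) for x in w])                  # frames 0 and 2 weighted highest
```

The same scoring pattern, applied over regions within a frame instead of over frames, gives spatial attention.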


Adverse Drug Events (ADE) Extraction

Regulators, such as the Food and Drug Administration (FDA) and the European Medicines Agency (EMA), approve dozens of drugs every year, after verifying their safety and therapeutic effectiveness in clinical trials. Sometimes, however, clinical trials are not sufficient to discover all potential Adverse Drug Events (ADE). Pharmacovigilance therefore monitors the drugs on the market to ensure that unexpected effects are promptly identified and actions are taken to minimize their harm. This process relies on formal reporting methods, such as physician notes. However, a constantly growing number of patients prefer to describe side effects on social media platforms, health forums and similar outlets, often using informal language. Given the need to monitor these sources for pharmacovigilance purposes, systems for the automatic extraction of ADE are becoming an important research topic in the NLP community. Recent shared tasks on the topic of ADE extraction have […]


Predictive Maintenance

The remaining useful life (RUL) estimation of a component is an interesting problem within the Prognostics and Health Management (PHM) field; it consists of estimating the number of time steps between the current time step and the end of the component’s life. Being able to reliably estimate this value can lead to improved maintenance scheduling and a reduction of the associated costs. Data-driven approaches are often used in the literature and are the preferred choice over model-based approaches: not only are they easier to build, but the data on which they are built can be gathered easily in many industrial applications. Over the last few years, neural networks such as Long Short-Term Memory (LSTM) networks and Convolutional Neural Networks (CNNs) have found many applications in this area, owing to their ability to uncover hidden patterns within sensor data. In recent years, a greater availability of high-quality sensors and ease of data […]
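To make the RUL notion concrete, here is a minimal model-free baseline (not the LSTM/CNN approaches mentioned above; the health-index signal and failure threshold are illustrative assumptions): fit a linear degradation trend by ordinary least squares and extrapolate to the cycle at which the failure threshold is crossed.

```python
def estimate_rul(signal, failure_level):
    """Fit a linear trend to a degradation signal (ordinary least squares)
    and return the estimated number of cycles left before the signal
    crosses the failure threshold."""
    n = len(signal)
    xs = list(range(n))
    x_mean = sum(xs) / n
    y_mean = sum(signal) / n
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, signal))
             / sum((x - x_mean) ** 2 for x in xs))
    intercept = y_mean - slope * x_mean
    if slope <= 0:
        return None  # no upward degradation trend detected
    cycle_at_failure = (failure_level - intercept) / slope
    return max(0.0, cycle_at_failure - (n - 1))

# toy example: health index degrading by 0.5 per cycle, failure at 10.0
signal = [0.5 * t for t in range(10)]   # last observed value 4.5 at cycle 9
print(estimate_rul(signal, 10.0))       # → 11.0 cycles remaining
```

Learned models replace this fixed linear trend with patterns extracted from multi-sensor histories, which is where LSTMs and CNNs come in.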


Pathology Classification based on Limbs Kinematics

With the advancement of portable sensor technology, it is now possible to gather data on the arm movements made by people with shoulder pathologies in a non-invasive way. Such data can then be used to train a classifier to establish whether a person who reports problems with a limb is actually affected by a limb pathology. AILab Udine is cooperating with the NCS Lab of the NCS Company of Carpi to develop a neural-network-based classifier able to detect problems in a patient’s shoulders from the movements of the limbs (abduction and adduction) performed by the patient. The project is based on the Showmotion technology of the NCS Company.


Generalized Born radii computation using linear models and neural networks

Implicit solvent models play an important role in describing the thermodynamics and the dynamics of biomolecular systems. Key to an efficient use of these models is the computation of Generalized Born (GB) radii, which is accomplished by algorithms based on the electrostatics of inhomogeneous dielectric media. The speed and accuracy of such computations are still an issue, especially for their intensive use in classical molecular dynamics. Here, we propose an alternative approach that encodes the physics of the phenomena and the chemical structure of the molecules in model parameters which are learned from examples. In our project, GB radii have been computed using i) a linear model and ii) a neural network. The input is the atom’s element together with the histogram of counts of neighbouring atoms within 16 Å, broken down by element. Linear models are ca. 8 times faster than the most widely used reference method, and the accuracy is higher, with a correlation coefficient with the inverse of “perfect” GB radii […]
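The histogram feature described above can be sketched as follows (the element alphabet and the handling of the central atom are illustrative assumptions; the project's actual feature construction may differ):

```python
import math

CUTOFF = 16.0                      # Å, as stated in the text
ELEMENTS = ["C", "N", "O", "H"]    # illustrative element alphabet

def neighbour_histogram(atoms, index):
    """Count the neighbours of each element within CUTOFF of atom `index`.
    `atoms` is a list of (element, x, y, z) tuples."""
    _, xi, yi, zi = atoms[index]
    counts = {e: 0 for e in ELEMENTS}
    for j, (elem, x, y, z) in enumerate(atoms):
        if j == index:
            continue
        if math.dist((xi, yi, zi), (x, y, z)) <= CUTOFF and elem in counts:
            counts[elem] += 1
    return [counts[e] for e in ELEMENTS]

# toy molecule: the N atom at x = 30 Å falls outside the cutoff
atoms = [("C", 0, 0, 0), ("O", 1.2, 0, 0), ("H", 0, 1.0, 0), ("N", 30, 0, 0)]
print(neighbour_histogram(atoms, 0))   # → [0, 0, 1, 1]
```

A linear model would then map this histogram, together with an encoding of the central atom's element, to (the inverse of) the GB radius.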


Fake News Detection (AILAB-Udine – MIT Boston)

In the last few years, we have witnessed an explosion of news sharing and commenting on social networks. While this practice has positive aspects, as it stimulates debate, it has been polluted by the diffusion of unreliable news, generally referred to as Fake News. Since these contents are often produced with malicious intent and have a tremendous real-world political and social impact, the Natural Language Processing (NLP) community has been called upon to propose algorithms for their identification. Most existing works are so far based on stylistic and linguistic peculiarities of Fake News texts (such as excessive use of emphasis and hyperbolic expressions). As time passes, however, Fake News tends to become stylistically and linguistically more similar to Real News, so that Fact Checking remains the only reliable approach to isolate it. In this project, we employ Artificial Intelligence to assess the reliability of news on the basis of not only intrinsic criteria […]


Visual Saliency Prediction

When human observers look at an image, attentive mechanisms drive their gazes towards salient regions. Emulating this ability has been studied for more than 80 years by neuroscientists and computer vision researchers, but only recently, thanks to the widespread adoption of deep learning, have saliency prediction models achieved considerable improvement. Data-driven saliency has recently gained a lot of attention thanks to the use of Convolutional Neural Networks for predicting gaze fixations. In this project we go beyond standard approaches to saliency prediction, in which gaze maps are computed with a feed-forward network, and we present a novel model that predicts accurate saliency maps by incorporating neural attentive mechanisms. The core of our solution is a Convolutional LSTM that focuses on the most salient regions of the input image to iteratively refine the predicted saliency map. Additionally, to tackle the center bias present in human eye fixations, our model can learn a set of prior maps generated with […]
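The center bias mentioned above can be illustrated with a fixed 2-D Gaussian prior map (a simplification: the model described here learns its prior maps, whereas this sketch hard-codes one; the sigma fraction is an assumption):

```python
import math

def gaussian_prior(h, w, sigma_frac=0.25):
    """Build an h x w map peaking at the image centre, a common way to
    model the centre bias of human fixations. sigma_frac sets the
    Gaussian width as a fraction of each image dimension."""
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    sy, sx = sigma_frac * h, sigma_frac * w
    return [[math.exp(-((y - cy) ** 2 / (2 * sy ** 2)
                        + (x - cx) ** 2 / (2 * sx ** 2)))
             for x in range(w)] for y in range(h)]

prior = gaussian_prior(5, 5)
print(round(prior[0][0], 3), prior[2][2])   # corner is attenuated, centre is 1.0
```

In a learned setting, both the number of such priors and their parameters would be optimized jointly with the saliency network.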


Ambient Assisted Living ChatBot

A chatbot is a computer program or an Artificial Intelligence agent that conducts a conversation via auditory or textual methods. Such programs are often designed to convincingly simulate how a human would behave as a conversational partner. Chatbots are typically used in dialog systems for various practical purposes, including customer service or information acquisition. They may use sophisticated natural language processing systems and are accessed via virtual assistants, messaging apps, or individual organizations’ apps and websites. The aim of our project is to study and analyze a series of innovative technologies which, integrated together in a prototype managed by a virtual assistant, allow us to renew the concept of domotics as we know it. Also, since chatbots are used more and more in working environments, our aim includes developing a virtual assistant integrated with a chatbot to support maintenance activities in an industrial setting. The project is supported by the POR FESR FVG Project and conducted as […]


Egocentric Vision for Cultural Heritage

Augmented Reality presents the opportunity for more customization of the museum experience, such as new varieties of self-guided tours or real-time translation of interpretive materials. At the end of this year, several companies (such as Google and Vuzix) will release wearable computers with a head-mounted display. We would like to investigate the usage of these devices for Cultural Heritage applications. Augmented Reality is a real-time direct or indirect view of a physical real-world environment that has been enhanced/augmented by adding virtual computer-generated information to it. Augmented Reality aims at simplifying the user’s life by bringing virtual information not only to their immediate surroundings, but also to any indirect view of the real-world environment, such as a live video stream. AR enhances the user’s perception of, and interaction with, the real world. While Virtual Reality technology (or Virtual Environment, as Milgram calls it) completely immerses users in a synthetic world, without seeing the real world, AR technology augments the sense of reality by superimposing […]


Arabic Keyphrase Extraction

Arabic keyphrase extraction is a crucial task due to the significant and growing amount of Arabic text on the web, generated by a huge population. It is becoming a challenge for the Arabic natural language processing community because of the severe shortage of resources and published processing systems. In this paper we propose a deep-learning-based approach for Arabic keyphrase extraction that achieves better performance compared to related competitive approaches. We also provide the community with an annotated large-scale dataset of about 6000 scientific abstracts, which can be used for training, validating and evaluating deep learning approaches for Arabic keyphrase extraction. Related publications: Helmy M., Vigneshram R. M., Serra G., Tasso C. Applying Deep Learning for Arabic Keyphrase Extraction. In: Proc. of the 4th International Conference on Arabic Computational Linguistics (ACLing 2018), November 17-19 2018, Dubai, United Arab Emirates. Resources: Arabic Abstracts Dataset


Egocentric Vision for Detecting Social Relationships

Social interactions are so natural that we rarely stop to wonder who is interacting with whom, or which people are gathering into a group and which are not. Nevertheless, humans do this naturally, overlooking the fact that the complexity of the task increases when only visual cues are available. Different situations call for different behaviors: while we accept standing in close proximity to strangers when we attend some kind of public event, we would feel uncomfortable having people we do not know close to us when we have a coffee. In fact, we rarely exchange mutual gaze with people we are not interacting with, an important clue when trying to discern different social clusters. We address the problem of partitioning the people in a video sequence into socially related groups from an egocentric vision (from now on, ego-vision) perspective. Human behavior is by no means random: when interacting with each other, we generally stand in determined positions to avoid occlusions in our […]


Automatic Keyphrase Extraction

Keyphrases (KPs) are phrases that “capture the main topic discussed in a given document”. More specifically, KPs are phrases, typically one to five words long, that appear verbatim in a document and can be used to briefly summarize its content. The task of finding such KPs is called Automatic Keyphrase Extraction (AKE). Recently, AKE has received a lot of attention, because it has been successfully used in many natural language processing (NLP) tasks, such as text summarization and document clustering, as well as in non-NLP tasks such as social network analysis and user modeling. AKE approaches have also been applied to the Information Retrieval of relevant documents in digital document archives, which can contain heterogeneous types of items, such as books, articles, papers, etc. However, given the wide variety of lexical, linguistic and semantic aspects that can contribute to defining a keyphrase, it is difficult to design hand-crafted features, and even the best performing algorithms hardly reach F1-scores of 50% on the most common evaluation sets. […]
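As a minimal illustration of the "appear verbatim" requirement, the naive baseline below collects stopword-free n-grams from a document and scores them by frequency (the stopword list, tokenization, and scoring are assumptions for illustration; real AKE systems, including the deep learning ones, are far more sophisticated):

```python
import re
from collections import Counter

STOPWORDS = {"the", "of", "a", "an", "and", "to", "in", "is", "for", "on"}

def extract_candidates(text, max_len=3):
    """Collect contiguous n-grams (1..max_len tokens) that contain no
    stopwords and appear verbatim in the text, scored by frequency."""
    tokens = re.findall(r"[a-z]+", text.lower())
    candidates = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if any(t in STOPWORDS for t in gram):
                continue
            candidates[" ".join(gram)] += 1
    return candidates.most_common()

text = ("Keyphrase extraction finds keyphrases in a document. "
        "Keyphrase extraction is useful for summarization.")
print(extract_candidates(text)[:3])
```

Frequency alone is a weak signal; this is exactly the kind of hand-crafted scoring whose limits motivate the learned approaches discussed above.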


Predicting the Usefulness of Amazon Reviews Using Argumentation Mining

Argumentation is the discipline that studies the way in which humans debate and articulate their opinions and beliefs. Argumentation mining is a research area at the crossroads of many fields, such as computational linguistics, machine learning, artificial intelligence and natural language processing. The main goal of argumentation mining is the automatic extraction and identification of arguments and their relations from natural language text documents. Internet users generate content at unprecedented rates, so building intelligent systems capable of discriminating useful content within this ocean of information is becoming an urgent need. In this paper, we aim to predict the usefulness of Amazon reviews, and to do so we exploit features coming from an off-the-shelf argumentation mining system. We argue that the usefulness of a review is, in fact, strictly related to its argumentative content, while the use of an already trained system avoids the costly need of relabeling a novel dataset. Results obtained on a large publicly available corpus support this hypothesis. Related […]


Local Pyramidal Descriptors for Image Recognition

In this paper we present a novel method to improve the flexibility of descriptor matching for image recognition by using local multiresolution pyramids in feature space. We propose that image patches be represented at multiple levels of descriptor detail and that these levels be defined in terms of local spatial pooling resolution. Preserving multiple levels of detail in local descriptors is a way of hedging one’s bets on which levels will be most relevant for matching during learning and recognition. We introduce the Pyramid SIFT (P-SIFT) descriptor and show that its use in four state-of-the-art image recognition pipelines improves accuracy and yields state-of-the-art results. Our technique is applicable independently of spatial pyramid matching, and we show that spatial pyramids can be combined with local pyramids to obtain further improvement. We achieve state-of-the-art results on Caltech-101 (80.1%) and Caltech-256 (52.6%) when compared to other approaches based on SIFT features over intensity images. Our technique is efficient and extremely easy to integrate into […]
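The idea of keeping multiple levels of spatial pooling resolution can be sketched as follows (a toy average-pooling example over a scalar feature grid, not the actual P-SIFT construction, which pools gradient orientation histograms):

```python
def pool_levels(grid, levels=(4, 2, 1)):
    """Average-pool a square feature grid at several spatial resolutions
    and concatenate the results: fine levels keep detail, coarse levels
    hedge against misalignment between matched patches."""
    n = len(grid)
    descriptor = []
    for cells in levels:
        step = n // cells
        for by in range(cells):
            for bx in range(cells):
                block = [grid[y][x]
                         for y in range(by * step, (by + 1) * step)
                         for x in range(bx * step, (bx + 1) * step)]
                descriptor.append(sum(block) / len(block))
    return descriptor

grid = [[float(x + y) for x in range(4)] for y in range(4)]  # toy 4x4 responses
desc = pool_levels(grid)
print(len(desc))   # 16 + 4 + 1 = 21 values
```

A matcher can then compare descriptors level by level, falling back to the coarser levels when the fine ones disagree.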


Human action categorization in unconstrained videos

Building a general human activity recognition and classification system is a challenging problem, because of the variations in environment, people and actions. Environment variation can be caused by cluttered or moving backgrounds, camera motion and illumination changes, while people may differ in size, shape and posture appearance. Recently, interest-point-based models have been successfully applied to the human action classification problem, because they overcome some limitations of holistic models, such as the need to perform background subtraction and tracking. We are working on a novel method based on the visual bag-of-words model and on a new spatio-temporal descriptor. First, we define a new 3D gradient descriptor that, combined with optic flow, outperforms the state of the art without requiring fine parameter tuning. Second, we show that for spatio-temporal features the popular k-means algorithm is insufficient, because cluster centers are attracted to the denser regions of the sample distribution, providing a non-uniform description of the feature space and thus failing to code other informative […]
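The bag-of-words encoding mentioned above can be sketched as follows (hard assignment of toy 2-D descriptors to a toy codebook; the project's actual descriptors are spatio-temporal and the codebook construction differs, precisely because of the k-means limitation noted in the text):

```python
import math

def bag_of_words(descriptors, codebook):
    """Hard-assign each local descriptor to its nearest codeword
    (Euclidean distance) and return the normalised histogram of
    assignments, i.e. the video's bag-of-words representation."""
    hist = [0] * len(codebook)
    for d in descriptors:
        nearest = min(range(len(codebook)),
                      key=lambda k: math.dist(d, codebook[k]))
        hist[nearest] += 1
    total = sum(hist) or 1          # guard against an empty descriptor set
    return [h / total for h in hist]

codebook = [(0.0, 0.0), (1.0, 1.0)]                       # toy codewords
descs = [(0.1, 0.0), (0.9, 1.1), (1.0, 0.9), (0.0, 0.2)]  # toy descriptors
print(bag_of_words(descs, codebook))   # → [0.5, 0.5]
```

When the codebook comes from k-means, codewords crowd the dense regions of descriptor space, so rarer but informative descriptors all collapse into a few bins, which is the non-uniformity problem the text describes.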