Sunday, 24 November 2019

Cognitive Modelling and Learning for Multimedia Mining and Understanding

RGB-D Scene Classification via Multi-modal Feature Learning

Abstract

Most past deep learning methods proposed for RGB-D scene classification use global information, directly considering all pixels of the whole image for high-level tasks. Such methods retain little information about local feature distributions and simply concatenate RGB and depth features without exploring the correlation and complementarity between the raw RGB and depth images. From the human vision perspective, we recognize the category of an unknown scene mainly by relying on object-level information, which includes appearance, texture, shape, and depth; the structural distribution of the different objects is also taken into consideration. Based on this observation, constructing mid-level representations with discriminative object parts is generally more attractive for scene analysis. In this paper, we propose a new convolutional neural network (CNN)-based local multi-modal feature learning framework (LM-CNN) for RGB-D scene classification. The method effectively captures much of the local structure in RGB-D scene images and automatically learns a fusion strategy for the object-level recognition step, instead of simply training a classifier on top of features extracted from both modalities. Experimental results on two popular datasets, the NYU v1 depth dataset and the SUN RGB-D dataset, show that our method with local multi-modal CNNs outperforms state-of-the-art methods.
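
As a concrete illustration of the local multi-modal idea, the sketch below shows a two-branch network over local patches in PyTorch: separate RGB and depth branches, a learned gating layer that fuses the two modalities per patch instead of plain concatenation, and pooling over patches before classification. The patch extraction, branch depth, and gating fusion are illustrative assumptions, not the paper's exact LM-CNN architecture.

```python
# Hypothetical sketch of a two-branch local multi-modal fusion network.
import torch
import torch.nn as nn

class LocalFusionCNN(nn.Module):
    def __init__(self, num_classes=10, feat_dim=128):
        super().__init__()
        # One small convolutional branch per modality (RGB and depth).
        def branch(in_ch):
            return nn.Sequential(
                nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(64, feat_dim),
            )
        self.rgb_branch = branch(3)
        self.depth_branch = branch(1)
        # Learned fusion instead of plain concatenation: a gating layer
        # weights each modality's features before classification.
        self.gate = nn.Sequential(nn.Linear(2 * feat_dim, 2), nn.Softmax(dim=1))
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, rgb_patches, depth_patches):
        # Inputs: (batch, patches, C, H, W) local patches per image.
        b, p = rgb_patches.shape[:2]
        f_rgb = self.rgb_branch(rgb_patches.flatten(0, 1))
        f_d = self.depth_branch(depth_patches.flatten(0, 1))
        w = self.gate(torch.cat([f_rgb, f_d], dim=1))   # per-patch weights
        fused = w[:, :1] * f_rgb + w[:, 1:] * f_d       # weighted fusion
        fused = fused.view(b, p, -1).mean(dim=1)        # pool over patches
        return self.classifier(fused)

model = LocalFusionCNN()
out = model(torch.rand(2, 9, 3, 32, 32), torch.rand(2, 9, 1, 32, 32))  # (2, 10)
```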

Ensemble p-Laplacian Regularization for Scene Image Recognition

Abstract

Recently, manifold regularized semi-supervised learning (MRSSL) has received considerable attention because it successfully exploits the geometry of the intrinsic data probability distribution to improve the performance of a learning model. As a natural nonlinear generalization of the graph Laplacian, the p-Laplacian has been proved to have rich theoretical foundations for better preserving local structure. However, it is difficult to determine the fitting graph p-Laplacian, i.e., the parameter p, which is a critical factor for the performance of graph p-Laplacian regularization. Therefore, we develop an ensemble p-Laplacian regularization (EpLapR) to more fully approximate the intrinsic manifold of the data distribution. EpLapR incorporates multiple graphs into a regularization term in order to sufficiently explore the complementarity of different graph p-Laplacians. Specifically, we construct a fused graph by introducing an optimization approach that assigns suitable weights to graphs with different p values, and we then apply a semi-supervised learning framework on the fused graph. Extensive experiments on the UC-Merced dataset and the Scene 15 dataset demonstrate the effectiveness and efficiency of the proposed method.
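
For intuition, a minimal numpy sketch of the ensemble regularizer follows: the graph p-Laplacian regularizer sum_ij W_ij |f_i - f_j|^p is evaluated for several candidate p values and combined with simplex weights. The toy graph, candidate p values, and fixed equal weights are assumptions; EpLapR learns the weights by optimization, which is only stubbed here.

```python
# Minimal sketch of an ensemble p-Laplacian regularizer (toy data).
import numpy as np

def p_laplacian_reg(W, f, p):
    """Graph p-Laplacian regularizer: 0.5 * sum_ij W_ij * |f_i - f_j|^p."""
    diff = np.abs(f[:, None] - f[None, :])
    return 0.5 * np.sum(W * diff ** p)

# Toy similarity graph over 4 points and a candidate label function f.
W = np.array([[0.0, 1.0, 0.2, 0.0],
              [1.0, 0.0, 0.5, 0.1],
              [0.2, 0.5, 0.0, 1.0],
              [0.0, 0.1, 1.0, 0.0]])
f = np.array([1.0, 0.9, -0.8, -1.0])

ps = [1.5, 2.0, 2.5, 3.0]           # ensemble of candidate p values
mu = np.full(len(ps), 1 / len(ps))  # simplex weights (learned in EpLapR)
reg = sum(m * p_laplacian_reg(W, f, p) for m, p in zip(mu, ps))
print(f"ensemble regularizer: {reg:.4f}")
```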

CSA-DE/EDA: A Novel Bio-inspired Algorithm for Function Optimization and Segmentation of Brain MR Images

Abstract

The clonal selection algorithm (CSA), which describes the basic features of an immune response to an antigenic stimulus, has drawn much attention in the biologically inspired computing community due to its highly adaptive and easy-to-implement nature. Despite many successful applications, CSA still suffers from a limited ability to explore the solution space. In this paper, we incorporate the differential evolution (DE) algorithm and the estimation of distribution algorithm (EDA) into CSA, and thus propose a novel bio-inspired algorithm referred to as CSA-DE/EDA. In the proposed algorithm, the hypermutation and receptor-editing processes are implemented based on DE and EDA, which provide improved local and global search ability, respectively. We have applied the proposed algorithm to the optimization of five commonly used benchmark functions and to brain magnetic resonance (MR) image segmentation. Our comparative experimental results show that the proposed CSA-DE/EDA algorithm outperforms several bio-inspired computing techniques. CSA-DE/EDA is a compelling bio-inspired algorithm for optimization tasks.
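
A hedged sketch of the hybrid loop on a toy benchmark follows: a clonal selection iteration whose hypermutation step uses a DE/rand/1-style perturbation for local search and whose receptor-editing step resamples the worst antibodies from a Gaussian model fitted to the elites, in the spirit of EDA. Population size, rates, and the sphere function are assumptions, not the paper's settings.

```python
# Illustrative CSA loop with DE-based hypermutation and EDA-based editing.
import numpy as np

rng = np.random.default_rng(0)
sphere = lambda x: np.sum(x ** 2, axis=-1)   # benchmark to minimize

dim, pop_size, n_gen, F = 10, 30, 200, 0.5
pop = rng.uniform(-5, 5, (pop_size, dim))

for _ in range(n_gen):
    pop = pop[np.argsort(sphere(pop))]        # best antibodies first
    half = pop_size // 2
    # DE-based hypermutation of the better half: x + F * (a - b).
    idx = rng.integers(0, half, (half, 2))
    mutants = pop[:half] + F * (pop[idx[:, 0]] - pop[idx[:, 1]])
    better = sphere(mutants) < sphere(pop[:half])
    pop[:half][better] = mutants[better]
    # EDA-based receptor editing of the worse half: resample from a
    # Gaussian fitted to the current elite antibodies.
    mean, std = pop[:half].mean(0), pop[:half].std(0) + 1e-6
    pop[half:] = rng.normal(mean, std, (pop_size - half, dim))

print("best value:", sphere(pop).min())
```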

Clustering of Remote Sensing Imagery Using a Social Recognition-Based Multi-objective Gravitational Search Algorithm

Abstract

Cognitively inspired swarm intelligence algorithms (SIAs) have attracted much attention in clustering research, since they can give machines a self-learning ability that achieves better classification results. Recently, SIA-based multi-objective optimization (MOO) methods have shown their superiority in data clustering. However, their performance is limited when applied to the clustering of remote sensing imagery (RSI). To construct an effective MOO-based clustering method, this paper presents a social recognition-based multi-objective gravitational search algorithm (SMGSA) that simultaneously optimizes two conflicting cluster validity indices, the Xie-Beni (XB) index and the Jm index. In SMGSA, searching particles are not only guided by the gravitational force of elite particles stored in an external archive but also learn from the social recognition of the whole population through position differences, which gives SMGSA outstanding exploitation ability. Comparison experiments on two public RSI datasets, a moderate-resolution aerial image and a hyperspectral image, validate that MOO-based clustering methods obtain more accurate results than single-validity-index-based methods. Moreover, the SMGSA-based method achieves superior results to a multi-objective gravitational search algorithm without social recognition. The proposed SMGSA strikes a favorable balance between the two conflicting cluster validity indices and achieves preferable classification of the RSI. In addition, this study indicates that swarm intelligence-based cognitive computing has potential for the intelligent interpretation and understanding of complicated remote sensing scenes.
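
For reference, the two conflicting validity indices can be computed as in the standard fuzzy c-means literature: the compactness objective Jm = sum_i sum_k u_ik^m ||x_k - v_i||^2 and the Xie-Beni index XB = Jm / (n * min_{i!=j} ||v_i - v_j||^2). The sketch below uses the conventional FCM membership formula and fuzzifier m = 2; SMGSA itself (archive, gravitational update, social term) is not reproduced here.

```python
# Standard fuzzy c-means validity indices Jm and XB (toy data).
import numpy as np

def fuzzy_memberships(X, centers, m=2.0):
    """Standard FCM memberships u_ik from distances to cluster centers."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)   # (n_points, n_clusters)

def jm_index(X, centers, m=2.0):
    u = fuzzy_memberships(X, centers, m)
    d2 = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) ** 2
    return np.sum((u ** m) * d2)                  # compactness: lower is better

def xb_index(X, centers, m=2.0):
    sep = min(np.sum((a - b) ** 2)
              for i, a in enumerate(centers)
              for b in centers[i + 1:])           # min squared center distance
    return jm_index(X, centers, m) / (len(X) * sep)

X = np.random.default_rng(1).normal(size=(100, 2))
centers = np.array([[-1.0, 0.0], [1.0, 0.0]])
print(jm_index(X, centers), xb_index(X, centers))
```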

Neural Image Caption Generation with Weighted Training and Reference

Abstract

Image captioning, which aims to automatically generate a sentence description for an image, has attracted much research attention in cognitive computing. The task is rather challenging, since it requires cognitively combining techniques from both the computer vision and natural language processing domains. Existing CNN-RNN framework-based methods suffer from two main problems: in the training phase, all the words of a caption are treated equally, without considering the importance of different words; in the caption generation phase, semantic objects or scenes might be misrecognized. In this paper, we propose a method based on the encoder-decoder framework, named Reference-based Long Short-Term Memory (R-LSTM), which aims to lead the model to generate a more descriptive sentence for a given image by introducing reference information. Specifically, during the training phase we assign different weights to words according to the correlation between the words and the image. We additionally maximize the consensus score between the captions generated by the captioning model and the reference information from the neighboring images of the target image, which reduces the misrecognition problem. We have conducted extensive experiments and comparisons on the benchmark datasets MS COCO and Flickr30k. The results show that the proposed approach outperforms state-of-the-art approaches on all metrics, achieving in particular a 10.37% improvement in terms of CIDEr on MS COCO. By analyzing the quality of the generated captions, we conclude that, through the introduction of reference information, our model learns the key information of images and generates more descriptive and relevant words for them.
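
The weighted-training idea can be sketched as a per-word weighted cross-entropy, where each target word carries a weight reflecting its estimated correlation with the image. The weight source and normalization below are assumptions; R-LSTM's exact weighting scheme may differ.

```python
# Sketch of per-word weighted cross-entropy for caption training.
import torch
import torch.nn.functional as F

def weighted_caption_loss(logits, targets, word_weights):
    """
    logits:       (batch, seq_len, vocab) decoder outputs
    targets:      (batch, seq_len) ground-truth word indices
    word_weights: (batch, seq_len) per-word importance weights
    """
    ce = F.cross_entropy(logits.flatten(0, 1), targets.flatten(),
                         reduction="none")        # per-token loss
    ce = ce.view(targets.shape)
    return (word_weights * ce).sum() / word_weights.sum()

batch, seq_len, vocab = 2, 5, 1000
logits = torch.randn(batch, seq_len, vocab)
targets = torch.randint(0, vocab, (batch, seq_len))
weights = torch.rand(batch, seq_len) + 0.5        # e.g., higher for visual words
print(weighted_caption_loss(logits, targets, weights))
```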

A Novel Deep Density Model for Unsupervised Learning

Abstract

Density models are fundamental in machine learning and have found widespread application in practical cognitive modeling tasks and learning problems. In this work, we introduce a novel deep density model, referred to as deep mixtures of factor analyzers with common loadings (DMCFA), together with an efficient greedy layer-wise unsupervised learning algorithm. The model employs a mixture of factor analyzers sharing common component loadings in each layer. The common loadings can be viewed as a feature selection or reduction matrix, which makes the new model more physically meaningful. Importantly, sharing common components remarkably reduces both the number of free parameters and the computational complexity. Consequently, DMCFA bases inference and learning on a dramatically more succinct model, while utilizing Gaussian distributions as the priors avoids sacrificing flexibility in estimating the data density. Our model is evaluated on five real datasets and compared to three other competitive models, namely mixtures of factor analyzers (MFA), MFA with common loadings (MCFA), and deep mixtures of factor analyzers (DMFA), as well as their collapsed counterparts. The results demonstrate the superiority of the proposed model in density estimation, clustering, and generation tasks.
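
For reference, a single mixture of factor analyzers with common loadings (the MCFA building block that DMCFA stacks layer-wise) models the density as

```latex
p(\mathbf{x}) = \sum_{c=1}^{C} \pi_c\,
  \mathcal{N}\!\left(\mathbf{x};\; \mathbf{A}\boldsymbol{\xi}_c,\;
  \mathbf{A}\boldsymbol{\Omega}_c\mathbf{A}^{\top} + \boldsymbol{\Psi}\right),
\qquad \sum_{c=1}^{C}\pi_c = 1,
```

where the loading matrix A is shared by all C components, the latent means and covariances ξ_c and Ω_c vary per component, and Ψ is a diagonal noise covariance. Sharing A is what cuts the free-parameter count relative to standard MFA; the deep model's layer-wise stacking adds structure beyond this single-layer form.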

Unsupervised Object Transfiguration with Attention

Abstract

Object transfiguration is a subtask of image-to-image translation that translates between two independent image sets, and it has a wide range of applications. Recently, some studies based on Generative Adversarial Networks (GANs) have achieved impressive results in image-to-image translation. However, the object transfiguration task should translate only the regions containing target objects rather than whole images; most existing methods never consider this issue, which results in mistranslation of image backgrounds. To address this problem, we present a novel pipeline called Deep Attention Unit Generative Adversarial Networks (DAU-GAN). During translation, the DAU computes attention masks that point out where the target objects are, making the GAN concentrate on translating target objects while ignoring meaningless backgrounds. Additionally, we construct an attention-consistent loss and a background-consistent loss that compel our model to focus translation on target objects while effectively preserving backgrounds. Comparison experiments on three popular related datasets demonstrate that DAU-GAN achieves superior performance to the state-of-the-art. We also export attention masks at different stages to confirm their effect during the object transfiguration task. The proposed DAU-GAN translates objects effectively while preserving background information. In our model, the DAU learns to focus on the most important information by producing attention masks; these masks compel DAU-GAN to effectively distinguish target objects from backgrounds during translation and to achieve impressive results on two subsets of ImageNet and CelebA. Moreover, the results show that we can investigate the model not only from the image itself but also from other modal information.
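
A hedged sketch of the attention-masked translation idea follows: the generator's raw output is composed through a predicted mask so that only masked regions change, and a background-consistency loss penalizes any change outside the mask. The composition rule and the L1 loss form are illustrative assumptions, not necessarily DAU-GAN's exact formulation.

```python
# Sketch of mask-based composition and a background-consistency loss.
import torch

def compose_with_attention(x, translated, mask):
    """Keep the background of x; apply the translation only inside the mask."""
    return mask * translated + (1.0 - mask) * x

def background_consistent_loss(x, output, mask):
    """L1 penalty on any change in the (1 - mask) background region."""
    return torch.mean((1.0 - mask) * torch.abs(output - x))

x = torch.rand(1, 3, 64, 64)            # input image
translated = torch.rand(1, 3, 64, 64)   # raw generator output
mask = torch.rand(1, 1, 64, 64)         # DAU attention mask in [0, 1]
out = compose_with_attention(x, translated, mask)
print(background_consistent_loss(x, out, mask))
```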

A New Algorithm for SAR Image Target Recognition Based on an Improved Deep Convolutional Neural Network

Abstract

In an attempt to exploit the automatic feature extraction ability of biologically-inspired deep learning models, and enhance the learning of target features, we propose a novel deep learning algorithm. This is based on a deep convolutional neural network (DCNN) trained with an improved cost function, and combined with a support vector machine (SVM). Specifically, class separation information, which explicitly facilitates intra-class compactness and inter-class separability in the process of learning features, is added to an improved cost function as a regularization term, to enhance the DCNN’s feature extraction ability. The enhanced DCNN is applied to learn the features of Synthetic Aperture Radar (SAR) images, and the SVM is utilized to map features into output labels. Simulation experiments are performed using benchmark SAR image data from the Moving and Stationary Target Acquisition and Recognition (MSTAR) database. Comparative results demonstrate the effectiveness of our proposed method, with an average accuracy of 99% on ten types of targets, including variants and articulated targets. We conclude that our proposed DCNN method has significant potential to be exploited for SAR image target recognition, and can serve as a new benchmark for the research community.
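
The class-separation regularization can be sketched as a center-based term added to softmax cross-entropy: pull features toward their class centers (intra-class compactness) and push centers apart (inter-class separability). This center-based form is an assumption standing in for the paper's exact regularization term; the learned features would then be passed to an SVM for the final classification.

```python
# Sketch of a cost function with a class-separation regularizer.
import torch
import torch.nn.functional as F

def class_separation_loss(features, logits, labels, centers, lam=0.1, margin=4.0):
    ce = F.cross_entropy(logits, labels)
    # Intra-class compactness: distance of each feature to its class center.
    intra = ((features - centers[labels]) ** 2).sum(dim=1).mean()
    # Inter-class separability: hinge on pairwise center distances.
    dist = torch.cdist(centers, centers)
    off_diag = dist[~torch.eye(len(centers), dtype=torch.bool)]
    inter = F.relu(margin - off_diag).mean()
    return ce + lam * (intra + inter)

features = torch.randn(8, 16)                 # DCNN penultimate-layer features
logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
centers = torch.randn(10, 16, requires_grad=True)
print(class_separation_loss(features, logits, labels, centers))
```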

Determination of Temporal Stock Investment Styles via Biclustering Trading Patterns

Abstract

Due to the effects of many deterministic and stochastic factors, it has always been challenging to gain good profits from the stock market. Many methods based on different theories have been proposed in the past decades. However, there has been little research on determining the temporal investment style (i.e., short term, middle term, or long term) for a stock. In this paper, we propose a method to find suitable stock investment styles in terms of investment time. Firstly, biclustering is applied to a matrix composed of the technical indicators of each trading day to discover trading patterns (regarded as trading rules). Subsequently, a k-nearest neighbor (KNN) algorithm is employed to transform the trading rules into trading actions (i.e., buy, sell, or no-action signals). Finally, a min-max and quantization strategy is designed to determine the temporal investment style of the stock. The proposed method was tested on 30 stocks from US bear, bull, and flat markets. The experimental results validate its usefulness.
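
The KNN step can be illustrated as follows: a day's technical-indicator vector is mapped to a trading action by its nearest neighbors among pattern rows labeled buy/sell/no-action. The indicators and labels below are toy assumptions; the biclustering that discovers the patterns and the min-max quantization of styles are not reproduced here.

```python
# Toy sketch of the KNN rule-to-action step with scikit-learn.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(42)
# Rows: trading-day indicator vectors (e.g., RSI, MACD, momentum), toy data.
patterns = rng.normal(size=(60, 3))
actions = rng.choice(["buy", "sell", "no-action"], size=60)

knn = KNeighborsClassifier(n_neighbors=5).fit(patterns, actions)
today = rng.normal(size=(1, 3))
print("signal for today:", knn.predict(today)[0])
```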
