Author Profiling using Semantic and Syntactic Features: Notebook for PAN at CLEF 2019
In this paper we present an approach for the PAN 2019 Author Profiling challenge. The task here is to detect Twitter bots and also to classify the gender of human Twitter users as male or female, based on a hundred select tweets from their profile. Focusing on feature engineering, we explore the semantic categories present in tweets. We combine these semantic features with part of speech tags and other stylistic features–eg character floodings and the use of capital letters–for our eventual feature set. We have experimented with different machine learning techniques, including ensemble techniques, and found AdaBoost to be the most successful (attaining an F1-score of 0.99 on the development set). Using this technique, we achieved an accuracy score of 89.17% for English language tweets in the bot detection subtask.
Subword semantic hashing for intent classification on small datasets
In this paper, we introduce the use of Semantic Hashing as embedding for the task of Intent Classification and achieve state-of-the-art performance on three frequently used benchmarks. Intent Classification on a small dataset is a challenging task for data-hungry state-of-the-art Deep Learning based systems. Semantic Hashing is an attempt to overcome such a challenge and learn robust text classification. Current word embedding based methods , ,  are dependent on vocabularies. One of the major drawbacks of such methods is out-of-vocabulary terms, especially when having small training datasets and using a wider vocabulary. This is the case in Intent Classification for chatbots, where typically small datasets are extracted from internet communication. Two problems arise with the use of internet communication. First, such datasets miss a lot of terms in the vocabulary to use word embeddings efficiently. Second, users frequently make spelling errors. Typically, the models for intent classification are not trained with spelling errors and it is difficult to think about ways in which users will make mistakes. Models depending on a word vocabulary will always face such issues. An ideal classifier should handle spelling errors inherently. With Semantic Hashing, we overcome these challenges and achieve state-of-the-art results on three datasets: Chatbot, Ask Ubuntu, and Web Applications . Our benchmarks are available online.Read More
ProbAct: A Probabilistic Activation Function for Deep Neural Networks
Activation functions play an important role in training artificial neural networks. The majority of currently used activation functions are deterministic in nature, with their fixed input-output relationship. In this work, we propose a novel probabilistic activation function, called ProbAct. ProbAct is decomposed into a mean and variance and the output value is sampled from the formed distribution, making ProbAct a stochastic activation function. The values of mean and variances can be fixed using known functions or trained for each element. In the trainable ProbAct, the mean and the variance of the activation distribution is trained within the back-propagation framework alongside other parameters. We show that the stochastic perturbation induced through ProbAct acts as a viable generalization technique for feature augmentation. In our experiments, we compare ProbAct with well-known activation functions on classification tasks on different modalities: Images(CIFAR-10, CIFAR-100, and STL-10) and Text (Large Movie Review). We show that ProbAct increases the classification accuracy by +2-3% compared to ReLU or other conventional activation functions on both original datasets and when datasets are reduced to 50% and 25% of the original size. Finally, we show that ProbAct learns an ensemble of models by itself that can be used to estimate the uncertainties associated with the prediction and provides robustness to noisy inputs.
Uncertainty Estimations by Softplus normalization in Bayesian Convolutional Neural Networks with Variational Inference
We introduce a novel uncertainty estimation for classification tasks for Bayesian convolutional neural networks with variational inference. By normalizing the output of a Softplus function in the final layer, we estimate aleatoric and epistemic uncertainty in a coherent manner. The intractable posterior probability distributions over weights are inferred by Bayes by Backprop. Firstly, we demonstrate how this reliable variational inference method can serve as a fundamental construct for various network architectures. On multiple datasets in supervised learning settings (MNIST, CIFAR-10, CIFAR-100), this variational inference method achieves performances equivalent to frequentist inference in identical architectures, while the two desiderata, a measure for uncertainty and regularization are incorporated naturally. Secondly, we examine how our proposed measure for aleatoric and epistemic uncertainties is derived and validate it on the aforementioned datasets.
A comprehensive guide to bayesian convolutional neural network with variational inference
Artificial Neural Networks are connectionist systems that perform a given task by learning on examples without having prior knowledge about the task. This is done by finding an optimal point estimate for the weights in every node. Generally, the network using point estimates as weights perform well with large datasets, but they fail to express uncertainty in regions with little or no data, leading to overconfident decisions. In this paper, Bayesian Convolutional Neural Network (BayesCNN) using Variational Inference is proposed, that introduces probability distribution over the weights. Furthermore, the proposed BayesCNN architecture is applied to tasks like Image Classification, Image Super-Resolution and Generative Adversarial Networks. The results are compared to point-estimates based architectures on MNIST, CIFAR-10 and CIFAR-100 datasets for Image CLassification task, on BSD300 dataset for Image Super Resolution task and on CIFAR10 dataset again for Generative Adversarial Network task. BayesCNN is based on Bayes by Backprop which derives a variational approximation to the true posterior. We, therefore, introduce the idea of applying two convolutional operations, one for the mean and one for the variance. Our proposed method not only achieves performances equivalent to frequentist inference in identical architectures but also incorporate a measurement for uncertainties and regularisation. It further eliminates the use of dropout in the model. Moreover, we predict how certain the model prediction is based on the epistemic and aleatoric uncertainties and empirically show how the uncertainty can decrease, allowing the decisions made by the network to become more deterministic as the training accuracy increases. Finally, we propose ways to prune the Bayesian architecture and to make it more computational and time effective.Read More