Fields: Title, URL, Institute, Topic, Conf, Country, Abstract

Title: Merging Deep Neural Networks for Mobile Devices
URL: http://openaccess.thecvf.com/content_cvpr_2018_workshops/papers/w33/Chou_Merging_Deep_Neural_CVPR_2018_paper.pdf
Institute: Academia Sinica, MOST Joint Research Center, National Chiayi University
Topic: model compression
Conf: CVPR18 workshop
Country: Taiwan
Abstract: In this paper, a novel method to merge convolutional neural networks for the inference stage is introduced. Given two feed-forward networks already trained for different tasks, our method aligns the layers of these networks and merges them into a unified model by sharing the representative weights. The performance of the merged model can be restored or improved via re-training. Without requiring high-performance hardware, the proposed method effectively produces a compact model that runs the original tasks simultaneously on resource-limited devices. System development time, as well as training overhead, is substantially reduced because our method leverages the co-used weights and preserves the general architectures of the well-trained networks. The merged model is jointly compressed and runs faster than the original models with comparable accuracy. When combining the VGG-Avg and ZF-Net models, our approach achieves compression and speedup ratios higher than 12x and 2.5x, respectively, compared with the original whole models, while accuracy remains approximately the same.

Title: An Architecture Combining Convolutional Neural Network (CNN) and Support Vector Machine (SVM) for Image Classification
URL: https://arxiv.org/abs/1712.03541
Institute: Adamson University
Topic: CNN, SVM
Country: Philippines
Abstract: Convolutional neural networks (CNNs) are similar to "ordinary" neural networks in the sense that they are made up of hidden layers consisting of neurons with "learnable" parameters. These neurons receive inputs, perform a dot product, and follow it with a non-linearity. The whole network expresses the mapping between raw image pixels and their class scores. Conventionally, the Softmax function is the classifier used at the last layer of this network. However, there have been studies (Alalshekmubarak and Smith, 2013; Agarap, 2017; Tang, 2013) challenging this norm. The cited studies introduce the use of a linear support vector machine (SVM) in an artificial neural network architecture. This project is yet another take on the subject, inspired by (Tang, 2013). Empirically, the CNN-SVM model achieved a test accuracy of ~99.04% on the MNIST dataset (LeCun, Cortes, and Burges, 2010), while CNN-Softmax achieved ~99.23% on the same dataset. Both models were also tested on the recently published Fashion-MNIST dataset (Xiao, Rasul, and Vollgraf, 2017), which is supposed to be a more difficult image classification dataset than MNIST (zalandoresearch, 2017). This proved to be the case: CNN-SVM reached a test accuracy of ~90.72%, while CNN-Softmax reached ~91.86%. These results may be improved by employing data preprocessing techniques on the datasets and by using a relatively more sophisticated base CNN model than the one used in this study.

Title: Deep Learning using Rectified Linear Units (ReLU)
URL: https://arxiv.org/abs/1803.08375
Institute: Adamson University
Topic: classification
Country: Philippines
Abstract: We introduce the use of rectified linear units (ReLU) as the classification function in a deep neural network (DNN). Conventionally, ReLU is used as an activation function in DNNs, with the Softmax function as their classification function. However, there have been several studies on using a classification function other than Softmax, and this study is an addition to those. We take the activation of the penultimate layer $h_{n - 1}$ in a neural network, then multiply it by weight parameters $\theta$ to get the raw scores $o_{i}$. Afterwards, we threshold the raw scores $o_{i}$ at $0$, i.e. $f(o) = \max(0, o_{i})$, where $f(o)$ is the ReLU function. We obtain class predictions $\hat{y}$ through the argmax function, i.e. $\hat{y} = \arg\max_{i} f(o_{i})$.
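
The formulation in this abstract is concrete enough to sketch directly. Below is a minimal NumPy illustration (not the authors' code); the shapes and random inputs are placeholders. Note that when the maximum raw score is positive, the argmax over the ReLU'd scores coincides with the argmax over the raw scores.

```python
# Minimal sketch of ReLU as a "classification function":
# threshold the final raw scores at 0, then take the argmax.
import numpy as np

rng = np.random.default_rng(0)
h = rng.standard_normal((4, 32))        # penultimate activations h_{n-1}, batch of 4
theta = rng.standard_normal((32, 10))   # weight parameters of the final layer

o = h @ theta                  # raw scores o_i
f = np.maximum(0.0, o)         # f(o) = max(0, o_i)
y_hat = np.argmax(f, axis=1)   # class predictions via argmax
print(y_hat)
```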

Title: DynMat, a network that can learn after learning
URL: https://arxiv.org/abs/1806.06253
Institute: Allen Institute for Brain Science
Topic: lifelong learning
Country: USA
Abstract: To survive in the dynamically-evolving world, we accumulate knowledge and improve our skills based on experience. In the process, gaining new knowledge does not disrupt our vigilance to external stimuli. In other words, our learning process is 'accumulative' and 'online' without interruption. However, despite the recent success, artificial neural networks (ANNs) must be trained offline, and they suffer catastrophic interference between old and new learning, indicating that ANNs' conventional learning algorithms may not be suitable for building intelligent agents comparable to our brain. In this study, we propose a novel neural network architecture (DynMat) consisting of dual learning systems, inspired by the complementary learning system (CLS) theory suggesting that the brain relies on short- and long-term learning systems to learn continuously. Our experiments show that 1) DynMat can learn a new class without catastrophic interference and 2) it does not strictly require offline training.

Title: Compatibility Family Learning for Item Recommendation and Generation
URL: https://arxiv.org/abs/1712.01262
Institute: Appier Inc, National Taiwan University, National Tsing Hua University
Topic: GAN, recommendation
Conf: AAAI18
Country: Taiwan
Abstract: Compatibility between items, such as clothes and shoes, is a major factor in customers' purchasing decisions. However, learning "compatibility" is challenging due to (1) broader notions of compatibility than those of similarity, (2) the asymmetric nature of compatibility, and (3) the fact that only a small set of compatible and incompatible items is observed. We propose an end-to-end trainable system to embed each item into a latent vector and project a query item into K compatible prototypes in the same space. These prototypes reflect the broad notions of compatibility. We refer to both the embedding and the prototypes as the "Compatibility Family". In our learned space, we introduce a novel Projected Compatibility Distance (PCD) function which is differentiable and ensures diversity by aiming for at least one prototype to be close to a compatible item, whereas none of the prototypes are close to an incompatible item. We evaluate our system on a toy dataset, two Amazon product datasets, and the Polyvore outfit dataset. Our method consistently achieves state-of-the-art performance. Finally, we show that we can visualize the candidate compatible prototypes using a Metric-regularized Conditional Generative Adversarial Network (MrCGAN), where the input is a projected prototype and the output is a generated image of a compatible item. We ask human evaluators to judge the relative compatibility between our generated images and images generated by CGANs conditioned directly on query items. Our generated images are significantly preferred, receiving roughly twice as many votes as the alternatives.
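
The abstract's requirement, that at least one prototype be close to a compatible item while remaining differentiable, suggests a smooth minimum over prototype distances. The sketch below is a hedged reading, not the paper's exact PCD definition: the soft-minimum form, the temperature `tau`, and the shapes are all assumptions.

```python
# Hedged sketch of a "projected compatibility distance":
# a differentiable soft minimum over squared distances between the
# K projected prototypes of a query and an item embedding.
import torch

def soft_min_pcd(prototypes, item_emb, tau=1.0):
    """prototypes: (K, d) projections of the query; item_emb: (d,)."""
    sq_dists = ((prototypes - item_emb) ** 2).sum(dim=1)      # (K,)
    return -tau * torch.logsumexp(-sq_dists / tau, dim=0)     # smooth min over K

K, d = 4, 16
prototypes = torch.randn(K, d, requires_grad=True)
compatible_item = torch.randn(d)
print(soft_min_pcd(prototypes, compatible_item))  # small when any prototype is close
```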

Title: A comparison of activation functions in artificial neural networks
URL: https://ieeexplore.ieee.org/abstract/document/8404724/
Institute: Bahçeşehir Üniversitesi
Topic: activation function
Conf: 2018 26th Signal Processing and Communications Applications Conference (SIU)
Country: Turkey
Abstract: In this study, the effects of activation functions (AFs) in artificial neural networks (ANNs) on regression and classification performance are compared. The comparisons evaluate success rates on test data and training duration for both problems. A total of 11 AFs (10 commonly used in the literature and the Square function proposed in this study) are compared using 7 different datasets: 2 for regression and 5 for classification. Three different ANN architectures, considered the most appropriate for each dataset, are employed in the experiments. Across a total of 231 training procedures, the effects of AFs are examined for different datasets and architectures. Similarly, the effects of AFs on training time are shown for different datasets. The experiments show that ReLU is the most successful AF for general purposes. In addition to ReLU, the Square function gives better results on image datasets.
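
For concreteness, here is an illustrative sketch of the study's Square activation next to ReLU. The definition Square(x) = x^2 is an assumption; the paper's exact form or scaling may differ.

```python
# ReLU and an assumed Square activation, side by side.
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def square(x):
    return x ** 2  # assumption: Square(x) = x^2

x = np.linspace(-2, 2, 5)
print(relu(x))    # [0. 0. 0. 1. 2.]
print(square(x))  # [4. 1. 0. 1. 4.]
```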

Title: An Improved Method of Identifying Mislabeled Data and the Mislabeled Data in MNIST and CIFAR-10 (Appendix: Findings in Fashion-MNIST)
URL: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3097307
Institute: Beijing University of Posts and Telecommunications
Topic: dataset, classification
Country: China
Abstract: Object classification is an important part of machine learning, and the quality of the training data plays an important role in it. Some mislabeled-data detection techniques have been proposed; however, no such work has been done on MNIST and CIFAR-10, results on which are an important criterion for machine learning models and algorithms. In this paper I develop an improved method to identify mislabeled data and find 675 mislabeled items in MNIST, 118 in CIFAR-10, and some in Fashion-MNIST.

Title: A New Structure and New Critiques for Dataset of Image Classification
URL: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3103621
Institute: Beijing University of Posts and Telecommunications
Topic: dataset, classification
Country: China
Abstract: Datasets are an important part of image classification. The current dataset structure has the following problems: there is no mutually exclusive, explicit definition for each class; the definitions lack features that distinguish the classes; instances are incoherent with the definitions; labels cannot express uncertainty; and the only criterion is algorithm accuracy. In this paper, I demonstrate a new dataset structure for image classification, including mutually exclusive, explicit definitions of labels and a "label of labels" that classifies labels into coherent, wrong, and uncertain (including multi-object, mid-object, unknown, and unclear), and I explore three datasets (MNIST, CIFAR-10 and Fashion-MNIST) to explain why this is necessary. I also define a new criterion to assess algorithm performance on uncertainty and attack resistance. Finally, I provide some advice on how to build such a dataset.

Title: A New Structure and Criterion for Dataset of Image Classification
URL: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3107322
Institute: Beijing University of Posts and Telecommunications
Topic: dataset, classification
Country: China
Abstract: What is the correct label for an image in machine learning? I investigate the label-assignment process of MNIST, CIFAR-10 and Fashion-MNIST, and the results show that even in these simple datasets classes are not always easy to distinguish, and sometimes it is impossible. The reasons include: the classes lack mutually exclusive definitions; the definitions lack features that distinguish the classes; instances are incoherent with the definitions and overlap; and labels cannot express uncertainty. In current research, such images and labels are treated as label noise. To give a clearer definition for these cases, I propose a new dataset structure including mutually exclusive definitions of classes, a "label of labels" that classifies labels into coherent, wrong and uncertain (including multi-object, mid-object, unknown and unclear), instances of the different labels, and a criterion to assess algorithm performance. Such a structure could also apply to more complex datasets, such as ChestX-ray14 in the medical field, to make learning and prediction more accurate and meaningful.

Title: A Generalized Active Learning Approach for Unsupervised Anomaly Detection
URL: https://arxiv.org/abs/1805.09411
Institute: Belo Horizonte
Topic: anomaly detection
Country: Brazil
Abstract: This work formalizes a new framework for anomaly detection, called active anomaly detection. In practice, this framework has the same cost as unsupervised anomaly detection but with the possibility of much better results. We show that unsupervised anomaly detection is an undecidable problem and that a prior on the anomaly probability distribution must be assumed in order to have performance guarantees. Finally, we present a new layer that can be attached to any deep learning model designed for unsupervised anomaly detection, transforming it into an active anomaly detection method, and present results on both synthetic and real anomaly detection datasets.

Title: Generation of Synthetic Images with Generative Adversarial Networks
URL: http://www.diva-portal.org/smash/record.jsf?pid=diva2:1180839
Institute: Blekinge Institute of Technology
Topic: classification, deep learning, GAN, machine learning
Country: Sweden
Abstract: Machine learning is a fast-growing area that revolutionizes computer programs by providing systems with the ability to automatically learn and improve from experience. In most cases, the training process begins with extracting patterns from data. Data is a key factor for machine learning algorithms; without data the algorithms will not work. Thus, having sufficient and relevant data is crucial for performance. In this thesis, the researcher tackles the problem of not having a sufficient dataset, in terms of the number of training examples, for an image classification task. The idea is to use Generative Adversarial Networks to generate synthetic images similar to the ground truth, and in this way expand a dataset. Two types of experiments were conducted: the first was used to fine-tune a Deep Convolutional Generative Adversarial Network for a specific dataset, while the second was used to analyze how synthetic data examples affect the accuracy of a Convolutional Neural Network in a classification task. Three well-known datasets were used in the first experiment, namely MNIST, Fashion-MNIST and Flower photos, while two datasets were used in the second experiment: MNIST and Fashion-MNIST. The generated MNIST and Fashion-MNIST images had good overall quality; some classes had clear visual errors while others were indistinguishable from ground truth examples. When it comes to the Flower photos, the generated images suffered from poor visual quality, and one can easily tell the synthetic images from the real ones. One reason for the poor performance is the large quantity of noise in the Flower photos dataset, which made it difficult for the model to spot the important features of the flowers. The results from the second experiment show that accuracy does not increase when the two datasets, MNIST and Fashion-MNIST, are expanded with synthetic images. This is not because the generated images had bad visual quality, but because accuracy turned out not to be highly dependent on the number of training examples. It can be concluded that Deep Convolutional Generative Adversarial Networks are capable of generating synthetic images similar to the ground truth and thus can be used to expand a dataset. However, this approach does not completely solve the initial problem of not having adequate datasets, because Deep Convolutional Generative Adversarial Networks may themselves require, depending on the dataset, a large quantity of training examples.

Title: Conditional Information Gain Networks
URL: https://arxiv.org/abs/1807.09534
Institute: Bogazici University, PerceptiveIO Inc
Topic: mixture of experts, decision tree
Conf: ICPR18
Country: Turkey
Abstract: Deep neural network models owe their representational power to the high number of learnable parameters. It is often infeasible to run these heavily parametrized deep models in limited resource environments, like mobile phones. Network models employing conditional computing are able to reduce computational requirements while achieving high representational power, with their ability to model hierarchies. We propose Conditional Information Gain Networks, which allow feed-forward deep neural networks to execute conditionally, skipping parts of the model based on the sample and the decision mechanisms inserted in the architecture. These decision mechanisms are trained using cost functions based on differentiable information gain, inspired by the training procedures of decision trees. The information gain based decision mechanisms are differentiable and can be trained end-to-end using a unified framework with a general cost function, covering both classification and decision losses. We test the effectiveness of the proposed method on the MNIST and the recently introduced Fashion-MNIST datasets and show that our information gain based conditional execution approach can achieve better or comparable classification results using significantly fewer parameters, compared to standard convolutional neural network baselines.

Title: Learning relevant features of data with multi-scale tensor networks
URL: http://iopscience.iop.org/article/10.1088/2058-9565/aaba1a/meta
Institute: Center for Computational Quantum Physics
Topic: coarse-grained modeling
Conf: [J] Quantum Science and Technology
Country: USA
Abstract: Inspired by coarse-graining approaches used in physics, we show how similar algorithms can be adapted for data. The resulting algorithms are based on layered tree tensor networks and scale linearly with both the dimension of the input and the training set size. Computing most of the layers with an unsupervised algorithm, then optimizing just the top layer for supervised classification of the MNIST and Fashion-MNIST data sets gives very good results. We also discuss mixing a prior guess for supervised weights together with an unsupervised representation of the data, yielding a smaller number of features nevertheless able to give good performance.

Title: Conceptual alignment deep neural networks
URL: https://content.iospress.com/articles/journal-of-intelligent-and-fuzzy-systems/ifs169457
Institute: Central South University, Guangzhou University, Hubei University of Education, Providence University
Topic: interpretability, conceptual alignment
Conf: [J] Intelligent and Fuzzy Systems
Country: China, Taiwan
Abstract: Deep Neural Networks (DNNs) have powerful recognition abilities to classify different objects. Although DNN models can reach very high accuracy, even beyond human level, they are regarded as black boxes that lack interpretability. In the training process of DNNs, abstract features can be automatically extracted from high-dimensional data, such as images. However, the extracted features are usually mapped into a representation space that is not aligned with human knowledge. In some cases, interpretability is necessary, e.g. medical diagnoses. For the purpose of aligning the representation space with human knowledge, this paper proposes a kind of DNN, termed Conceptual Alignment Deep Neural Networks (CADNNs), which can produce interpretable representations in the hidden layers. In CADNNs, some hidden neurons are selected as conceptual neurons to extract human-formed concepts, while the other hidden neurons, called free neurons, can be trained freely. All hidden neurons contribute to the final classification results. Experiments demonstrate that CADNNs can keep up with the accuracy of DNNs, even though CADNNs have the extra constraints of conceptual neurons. Experiments also reveal that the free neurons can, in some cases, learn concepts aligned with human knowledge.

Title: On Batch Adaptive Training for Deep Learning: Lower Loss and Larger Step Size
URL: https://openreview.net/forum?id=SybqeKgA-
Institute: Chinese Academy of Sciences
Topic: gradient, regularization technique, batch size
Country: China
Abstract: Mini-batch gradient descent and its variants are commonly used in deep learning. The principle of mini-batch gradient descent is to use a noisy gradient calculated on a batch to estimate the true gradient, balancing the computation cost per iteration against the uncertainty of the noisy gradient. However, the batch size is a fixed hyper-parameter that must be set manually before training the neural network. Yin et al. (2017) proposed a batch adaptive stochastic gradient descent (BA-SGD) that can dynamically choose a proper batch size as learning proceeds. We extend BA-SGD to the momentum algorithm and evaluate both BA-SGD and batch adaptive momentum (BA-Momentum) on two deep learning tasks: natural language processing and image classification. Experiments confirm that batch adaptive methods can achieve a lower loss than mini-batch methods after scanning the same number of epochs. Furthermore, BA-Momentum is more robust against large step sizes, in that it can dynamically enlarge the batch size to reduce the larger uncertainty brought by larger step sizes. We also identify an interesting phenomenon, the batch-size boom. The code implementing the batch adaptive framework is open source and applicable to any gradient-based optimization problem.

Title: Provable defenses against adversarial examples via the convex outer adversarial polytope
URL: http://proceedings.mlr.press/v80/wong18a.html
Institute: CMU
Topic: adversarial learning
Conf: ICML18
Country: USA
Abstract: We propose a method to learn deep ReLU-based classifiers that are provably robust against norm-bounded adversarial perturbations on the training data. For previously unseen examples, the approach is guaranteed to detect all adversarial examples, though it may flag some non-adversarial examples as well. The basic idea is to consider a convex outer approximation of the set of activations reachable through a norm-bounded perturbation, and we develop a robust optimization procedure that minimizes the worst case loss over this outer region (via a linear program). Crucially, we show that the dual problem to this linear program can be represented itself as a deep network similar to the backpropagation network, leading to very efficient optimization approaches that produce guaranteed bounds on the robust loss. The end result is that by executing a few more forward and backward passes through a slightly modified version of the original network (though possibly with much larger batch sizes), we can learn a classifier that is provably robust to any norm-bounded adversarial attack. We illustrate the approach on a number of tasks to train classifiers with robust adversarial guarantees (e.g. for MNIST, we produce a convolutional classifier that provably has less than 5.8% test error for any adversarial attack with bounded ℓ∞ norm less than ϵ=0.1).

Title: Cross Domain Image Generation through Latent Space Exploration with Adversarial Loss
URL: https://arxiv.org/abs/1805.10130
Institute: CMU
Topic: VAE
Country: USA
Abstract: Conditional domain generation is a good way to interactively control the sample generation process of deep generative models. However, once a conditional generative model has been created, it is often expensive to adapt it to new conditional controls, especially when the network structure is relatively deep. We propose a conditioned latent domain transfer framework across the latent spaces of unconditional variational autoencoders (VAEs). With this framework, unconditionally trained VAEs can generate images in their domain with conditionals provided by a latent representation of another domain. The framework does not assume commonalities between the two domains. We demonstrate the effectiveness and robustness of our model on widely used image datasets.

Title: BourGAN: Generative Networks with Metric Embeddings
URL: https://arxiv.org/abs/1805.07674
Institute: Columbia University
Topic: GAN
Conf: NIPS18
Country: USA
Abstract: This paper addresses mode collapse in generative adversarial networks (GANs). We view modes as a geometric structure of the data distribution in a metric space. Under this geometric lens, we embed subsamples of the dataset from an arbitrary metric space into the ℓ2 space, while preserving their pairwise distance distribution. Not only does this metric embedding determine the dimensionality of the latent space automatically, it also enables us to construct a mixture of Gaussians to draw latent space random vectors. We use the Gaussian mixture model in tandem with a simple augmentation of the objective function to train GANs. Every major step of our method is supported by theoretical analysis, and our experiments on real and synthetic data confirm that the generator is able to produce samples spreading over most of the modes while avoiding unwanted samples, outperforming several recent GAN variants on a number of metrics and offering new features.

Title: PCA and Autoencoders
URL: http://tylermd.com/pdf/pca_ae.pdf
Institute: Concordia University
Topic: PCA, dimensionality reduction
Country: Canada
Abstract: In this paper, I compare two dimensionality reduction techniques for processing images before learning a multinomial logistic regression model. Principal component analysis is used as a linear mapping, and an autoencoder, a neural network technique, is used as a non-linear mapping. The reconstruction differences and classification errors are compared using the Fashion-MNIST dataset.
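
The linear half of this comparison is easy to sketch with scikit-learn. This is a minimal stand-in, not the paper's code: random data replaces the Fashion-MNIST images, and the autoencoder side would swap the linear map for an encoder/decoder network.

```python
# PCA as a linear dimensionality reduction, with reconstruction error.
import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(1000, 784)        # stand-in for flattened 28x28 images
pca = PCA(n_components=50)
Z = pca.fit_transform(X)             # linear mapping to 50 dimensions
X_rec = pca.inverse_transform(Z)     # reconstruction from the reduced space
print(np.mean((X - X_rec) ** 2))     # reconstruction MSE
```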

Title: Generalized Cross Entropy Loss for Training Deep Neural Networks with Noisy Labels
URL: https://arxiv.org/abs/1805.07836
Institute: Cornell University
Topic: loss function
Conf: NIPS18
Country: USA
Abstract: Deep neural networks (DNNs) have achieved tremendous success in a variety of applications across many disciplines. Yet, their superior performance comes at the cost of requiring correctly annotated large-scale datasets. Moreover, due to DNNs' rich capacity, errors in training labels can hamper performance. To combat this problem, mean absolute error (MAE) has recently been proposed as a noise-robust alternative to the commonly used categorical cross entropy (CCE) loss. However, as we show in this paper, MAE can perform poorly with DNNs and challenging datasets. Here, we present a theoretically grounded set of noise-robust loss functions that can be seen as a generalization of MAE and CCE. The proposed loss functions can be readily applied with any existing DNN architecture and algorithm, while yielding good performance in a wide range of noisy label scenarios. We report results from experiments conducted with CIFAR-10, CIFAR-100 and Fashion-MNIST datasets and synthetically generated noisy labels.
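
A sketch of the generalized loss described above, assuming the commonly cited form $L_q(p_y) = (1 - p_y^q)/q$ for the true-class probability $p_y$: as $q \to 0$ it recovers categorical cross entropy, and $q = 1$ gives a scaled MAE. The value $q = 0.7$ and the clamping constant are illustrative choices, not the paper's settings.

```python
# Generalized cross entropy ("L_q") loss, interpolating between CCE and MAE.
import torch
import torch.nn.functional as F

def lq_loss(logits, targets, q=0.7):
    probs = F.softmax(logits, dim=1)
    p_y = probs.gather(1, targets.unsqueeze(1)).squeeze(1)  # prob of true class
    return ((1.0 - p_y.clamp_min(1e-7) ** q) / q).mean()

logits = torch.randn(8, 10, requires_grad=True)
targets = torch.randint(0, 10, (8,))
loss = lq_loss(logits, targets)
loss.backward()
print(loss.item())
```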

Title: Asynchronous Evolution of Convolutional Networks
URL: http://ceur-ws.org/Vol-2203/80.pdf
Institute: Czech Academy of Sciences
Topic: architecture search
Conf: ITAT18
Country: Czech Republic
Abstract: Due to many successful practical applications, deep neural networks and convolutional networks have recently become the state-of-the-art machine learning methods. The choice of network architecture for the task at hand is typically made by trial and error. This work deals with automatic data-dependent architecture design. We propose an algorithm for optimizing the architecture of a convolutional network based on asynchronous evolution. The algorithm is inspired by and designed directly for the Keras library, one of the most common implementations of deep neural networks. The proposed algorithm is successfully tested on the MNIST and Fashion-MNIST data sets.

Title: Fast Factorization-free Kernel Learning for Unlabeled Chunk Data Streams
URL: https://www.ijcai.org/proceedings/2018/0393.pdf
Institute: Dalian University of Technology, University of Rochester
Topic: data stream, kernel method
Conf: IJCAI18
Country: China, USA
Abstract: Data stream analysis aims at extracting discriminative information for classification from continuously incoming samples. It is extremely challenging to detect novel data while updating the model in an efficient and stable fashion, especially for chunk data. This paper proposes a fast factorization-free kernel learning method to unify novelty detection and incremental learning for unlabeled chunk data streams in one framework. The proposed method constructs a joint reproducing kernel Hilbert space from known class centers by solving a linear system in kernel space. Naturally, unlabeled data can be detected and classified among multiple classes by a single decision model. Projecting samples into the discriminative feature space turns out to be the product of two small-sized kernel matrices, without needing time-consuming factorizations such as QR decomposition or singular value decomposition. Moreover, the insertion of a novel class can be treated as the addition of a new orthogonal basis to the existing feature space, resulting in fast and stable updating schemes. Both theoretical analysis and experimental validation on real-world datasets demonstrate that the proposed methods learn chunk data streams with significantly lower computational costs and comparable or superior accuracy than the state of the art.

Title: Self-Organization adds application robustness to deep learners
URL: https://openreview.net/forum?id=BJ8lbVAfz
Institute: Dallas University
Topic: self-organizing map
Country: USA
Abstract: While self-organizing principles have motivated many early learning models, such principles have rarely been included in deep learning architectures. Indeed, from a supervised learning perspective it seems that topographic constraints are rather detrimental to optimal performance. Here we study a network model that incorporates self-organizing maps into a supervised network and show how gradient learning results in a form of self-organizing learning rule. Moreover, we show that such a model is robust in the sense of its applicability to a variety of areas, which is believed to be a hallmark of biological learning systems.

Title: Separable explanations of neural network decisions
URL: http://www.interpretable-ml.org/nips2017workshop/papers/05.pdf
Institute: Technical University of Denmark
Topic: interpretability
Conf: NIPS17 workshop
Country: Denmark
Abstract: Deep Taylor Decomposition is a method used to explain neural network decisions. When applying this method to non-dominant classifications, the resulting explanation does not reflect important features for the chosen classification. We propose that this is caused by the dense layers and propose a method to alleviate the effect by applying regularization. We assess the result by measuring the quality of the resulting explanations objectively and subjectively.

Title: Neural Clustering By Predicting And Copying Noise
URL: https://openreview.net/forum?id=BJvVbCJCb
Institute: DigitalGenius
Topic: unsupervised learning, clustering, deep learning
Country: UK
Abstract: We propose a neural clustering model that jointly learns both latent features and how they cluster. Unlike similar methods, our model does not require a predefined number of clusters. Using a supervised approach, we agglomerate latent features towards randomly sampled targets within the same space while progressively removing targets until we are left only with targets that represent cluster centroids. To show the behavior of our model across different modalities, we apply it to both text and image data and achieve very competitive results on MNIST. Finally, we also provide results against baseline models for Fashion-MNIST, the 20 Newsgroups dataset, and a Twitter dataset we created ourselves. TL;DR: neural clustering without needing a number of clusters.

Title: Low-Rank Sparse Preserving Projections for Dimensionality Reduction
URL: https://ieeexplore.ieee.org/abstract/document/8410623/
Institute: Sichuan University
Topic: dimensionality reduction, feature extraction, manifold learning, low-rank representation, image classification
Conf: [J] IEEE Transactions on Image Processing
Country: China
Abstract: Learning an efficient projection to map high-dimensional data into a lower dimensional space is a rather challenging task in the community of pattern recognition and computer vision. Manifold learning is widely applied because it can disclose the intrinsic geometric structure of data. However, it concerns only the geometric structure and may lose its effectiveness in the case of corrupted data. To address this challenge, we propose a novel dimensionality reduction method combining manifold learning and low-rank sparse representation, termed low-rank sparse preserving projections (LSPP), which can simultaneously preserve the intrinsic geometric structure and learn a robust representation to reduce the negative effects of corruptions. LSPP is therefore advantageous for extracting robust features. Because the formulated LSPP problem has no closed-form solution, we use the linearized alternating direction method with adaptive penalty and eigen-decomposition to obtain the optimal projection. The convergence of LSPP is proven, and we also analyze its complexity. To validate the effectiveness and robustness of LSPP in feature extraction and dimensionality reduction, we make a critical comparison between LSPP and a series of related dimensionality reduction methods. The experimental results demonstrate the effectiveness of LSPP.

Title: Biased Dropout and Crossmap Dropout: Learning towards effective Dropout regularization in convolutional neural network
URL: https://www.sciencedirect.com/science/article/pii/S0893608018301096
Institute: Dongseo University
Topic: Dropout, regularization technique, CNN
Conf: [J] Neural Networks
Country: South Korea
Abstract: Training a deep neural network with a large number of parameters often leads to overfitting. Recently, Dropout has been introduced as a simple yet effective regularization approach to combat overfitting in such models. Although Dropout has shown remarkable results in many deep neural network cases, its actual effect on CNNs has not been thoroughly explored. Moreover, training a Dropout model significantly increases training time, as it takes longer to converge than a non-Dropout model with the same architecture. To deal with these issues, we propose Biased Dropout and Crossmap Dropout, two novel extensions of Dropout based on the behavior of hidden units in CNN models. Biased Dropout divides the hidden units in a certain layer into two groups based on their magnitude and applies a different Dropout rate to each group appropriately. Hidden units with higher activation value, which contribute more to the network's final performance, are retained with a lower Dropout rate, while units with lower activation value are exposed to a higher Dropout rate to compensate. The second approach is Crossmap Dropout, an extension of regular Dropout in the convolution layer. Feature maps in a convolution layer are strongly correlated with each other, particularly at identical pixel locations across feature maps. Crossmap Dropout maintains this important correlation while breaking the correlation between adjacent pixels with respect to all feature maps, by applying the same Dropout mask to all feature maps, so that units at equivalent positions in each feature map are either all dropped or all active during training. Our experiments with various benchmark datasets show that our approaches provide better generalization than regular Dropout. Moreover, Biased Dropout converges faster during the training phase, suggesting that assigning noise appropriately to hidden units can lead to effective regularization.
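
The Biased Dropout rule described above lends itself to a short sketch. This is a hedged NumPy illustration only: the split point (the per-example median) and the two drop rates are assumptions, not the paper's exact settings.

```python
# Biased Dropout sketch: partition units by activation magnitude and drop
# the low-activation group at a higher rate than the high-activation group.
import numpy as np

rng = np.random.default_rng(0)

def biased_dropout(a, p_high=0.2, p_low=0.6):
    """a: (batch, units) activations; p_*: drop probabilities per group."""
    median = np.median(a, axis=1, keepdims=True)
    high = a >= median                       # higher-activation group
    p_drop = np.where(high, p_high, p_low)   # lower drop rate for strong units
    keep = rng.random(a.shape) >= p_drop
    return a * keep / (1.0 - p_drop)         # inverted-dropout rescaling

a = np.abs(rng.standard_normal((4, 8)))
print(biased_dropout(a))
```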

Title: Comparative study of deep learning methods for one-shot image classification
URL: https://pure.tue.nl/ws/files/91222834/Comparative_study_of_Deep_Learning_methds_for_One_shot_Image_Classifications.pdf
Institute: Eindhoven University of Technology
Topic: zero/few-shot learning
Conf: DBDBD17
Country: The Netherlands
Abstract: Training deep learning models for image classification requires a large amount of labeled data to overcome the challenges of overfitting and underfitting. In many practical applications, such labeled data are not available. In an attempt to solve this problem, the one-shot learning paradigm tries to create machine learning models capable of learning well from one or at most a few labeled examples per class. To better understand the behavior of various deep learning models and approaches for one-shot learning, in this abstract we perform a comparative study of the most used ones on a challenging real-world dataset, i.e., Fashion-MNIST.

Title: Scalable training of artificial neural networks with adaptive sparse connectivity inspired by network science
URL: https://www.nature.com/articles/s41467-018-04316-3
Institute: Eindhoven University of Technology, University of Derby
Topic: training algorithm, sparsity, regularization technique
Conf: [J] Nature Communications
Country: UK, The Netherlands
Abstract: Through the success of deep learning in various domains, artificial neural networks are currently among the most used artificial intelligence methods. Taking inspiration from the network properties of biological neural networks (e.g. sparsity, scale-freeness), we argue that (contrary to general practice) artificial neural networks, too, should not have fully-connected layers. Here we propose sparse evolutionary training of artificial neural networks, an algorithm which evolves an initial sparse topology (Erdős–Rényi random graph) of two consecutive layers of neurons into a scale-free topology during learning. Our method replaces the fully-connected layers of artificial neural networks with sparse ones before training, reducing the number of parameters quadratically, with no decrease in accuracy. We demonstrate our claims on restricted Boltzmann machines, multi-layer perceptrons, and convolutional neural networks for unsupervised and supervised learning on 15 datasets. Our approach has the potential to enable artificial neural networks to scale up beyond what is currently possible.
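
The evolve step of such a scheme can be sketched compactly. This is a hedged reading of the prune-and-regrow idea (initialize an Erdős–Rényi sparse mask, prune the smallest-magnitude weights, regrow the same number at random positions); the density, the pruning fraction `zeta`, and the regrowth initialization are illustrative, not the paper's exact settings.

```python
# Sparse-evolutionary-style mask update for one weight matrix.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out, density = 256, 128, 0.1
mask = rng.random((n_in, n_out)) < density          # Erdos-Renyi topology
W = rng.standard_normal((n_in, n_out)) * mask

def evolve(W, mask, zeta=0.3):
    """Prune a fraction zeta of the weakest links, regrow as many at random."""
    thresh = np.quantile(np.abs(W[mask]), zeta)
    keep = mask & (np.abs(W) > thresh)              # prune smallest weights
    n_regrow = mask.sum() - keep.sum()
    free = np.flatnonzero(~keep)                    # candidate new positions
    new = rng.choice(free, size=n_regrow, replace=False)
    keep.flat[new] = True
    W_new = W * keep
    W_new.flat[new] = rng.standard_normal(n_regrow) * 0.01  # init regrown links
    return W_new, keep

W, mask = evolve(W, mask)
print(mask.mean())  # overall sparsity level is preserved
```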

Title: Differentiable Abstract Interpretation for Provably Robust Neural Networks
URL: http://proceedings.mlr.press/v80/mirman18b.html
Institute: ETH Zurich
Topic: robust learning, abstract interpretation
Conf: ICML18
Country: Switzerland
Abstract: We introduce a scalable method for training robust neural networks based on abstract interpretation. We present several abstract transformers which balance efficiency with precision and show these can be used to train large neural networks that are certifiably robust to adversarial perturbations.

Title: Defending Against Adversarial Attacks by Leveraging an Entire GAN
URL: https://arxiv.org/abs/1805.10652
Institute: ETH Zurich
Topic: GAN, adversarial attack
Conf: ICML18
Country: Switzerland
Abstract: Recent work has shown that state-of-the-art models are highly vulnerable to adversarial perturbations of the input. We propose cowboy, an approach to detecting and defending against adversarial attacks by using both the discriminator and generator of a GAN trained on the same dataset. We show that the discriminator consistently scores the adversarial samples lower than the real samples across multiple attacks and datasets. We provide empirical evidence that adversarial samples lie outside of the data manifold learned by the GAN. Based on this, we propose a cleaning method which uses both the discriminator and generator of the GAN to project the samples back onto the data manifold. This cleaning procedure is independent of the classifier and type of attack and thus can be deployed in existing systems.
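
The "project back onto the data manifold" step can be sketched as a latent-space search. This is a hedged PyTorch sketch under assumptions: `G` is a placeholder pretrained generator, the objective here is plain image-space reconstruction, and the paper's actual cleaning objective also involves the discriminator score, which could be added as an extra term.

```python
# Project a (possibly adversarial) input onto the generator's manifold by
# optimizing a latent code so that G(z) reconstructs the input.
import torch

def clean(x, G, z_dim=64, steps=200, lr=0.05):
    z = torch.zeros(x.shape[0], z_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((G(z) - x) ** 2).mean()  # reconstruction in image space
        loss.backward()
        opt.step()
    return G(z).detach()                 # cleaned sample on G's manifold

# usage (G: any pretrained generator mapping (batch, z_dim) -> images):
# x_clean = clean(x_adv, G); logits = classifier(x_clean)
```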

Title: Efficient Image Dataset Classification Difficulty Estimation for Predicting Deep-Learning Accuracy
URL: https://arxiv.org/abs/1803.09588
Institute: ETH Zurich, IBM Research, Università di Bologna, Queen's University of Belfast
Topic: difficulty estimation
Country: Switzerland, Italy, UK
Abstract: In the deep-learning community, new algorithms are published at an incredible pace. Solving an image classification problem for a new dataset therefore becomes a challenging task, as it requires re-evaluating published algorithms and their different configurations in order to find a close-to-optimal classifier. To facilitate this process, before biasing our decision towards a class of neural networks or running an expensive search over the network space, we propose to estimate the classification difficulty of the dataset. Our method computes a single number characterizing the dataset difficulty 27x faster than training state-of-the-art networks. The proposed method can be used in combination with network topology and hyper-parameter search optimizers to efficiently drive the search towards promising neural-network configurations.

Title: Deep Self-Organization: Interpretable Discrete Representation Learning on Time Series
URL: https://arxiv.org/abs/1806.02199
Institute: ETH Zurich, Max Planck Institute, UCL
Topic: VAE
Country: Switzerland, Germany, UK
Abstract: Human professionals are often required to make decisions based on complex multivariate time series measurements in an online setting, e.g. in health care. Since human cognition is not optimized to work well in high-dimensional spaces, these decisions benefit from interpretable low-dimensional representations. However, many representation learning algorithms for time series data are difficult to interpret. This is due to non-intuitive mappings from data features to salient properties of the representation and non-smoothness over time. To address this problem, we propose to couple a variational autoencoder to a discrete latent space and introduce a topological structure through the use of self-organizing maps. This allows us to learn discrete representations of time series, which give rise to smooth and interpretable embeddings with superior clustering performance. Furthermore, to allow for a probabilistic interpretation of our method, we integrate a Markov model in the latent space. This model uncovers the temporal transition structure, improves clustering performance even further and provides additional explanatory insights as well as a natural representation of uncertainty. We evaluate our model on static (Fashion-)MNIST data, a time series of linearly interpolated (Fashion-)MNIST images, a chaotic Lorenz attractor system with two macro states, as well as on a challenging real world medical time series application. In the latter experiment, our representation uncovers meaningful structure in the acute physiological state of a patient.

Title: Evaluation of the impact of deep learning architectural components selection and dataset size on a medical imaging task
URL: https://www.spiedigitallibrary.org/conference-proceedings-of-spie/10579/1057911/Evaluation-of-the-impact-of-deep-learning-architectural-components-selection/10.1117/12.2293395.short
Institute: GE Healthcare
Topic: evaluation, classification
Conf: SPIE Medical Imaging 2018
Country: USA
Abstract: Deep Learning (DL) has been successfully applied in numerous fields, fueled by increasing computational power and access to data. However, for medical imaging tasks, limited training set size is a common challenge when applying DL. This paper explores the applicability of DL to the task of classifying a single axial slice from a CT exam into one of six anatomy regions. A total of ~29,000 images selected from 223 CT exams were manually labeled for ground truth. An additional 54 exams were labeled and used as an independent test set. The network architecture developed for this application is composed of 6 convolutional layers and 2 fully connected layers with ReLU non-linear activations between each layer. Max-pooling was used after every second convolutional layer, and a softmax layer was used at the end. Given this base architecture, the effect of including network architecture components such as Dropout and Batch Normalization on network performance and training is explored. The network performance as a function of training and validation set size is characterized by training each network architecture variation using 5, 10, 20, 40, 50 and 100% of the available training data. The performance comparison of the various network architectures was done for anatomy classification as well as two computer vision datasets. The anatomy classifier accuracy varied from 74.1% to 92.3% in this study, depending on the training size and network layout used. Dropout layers improved model accuracy for all training sizes.

Title: Hierarchical Convolutional Neural Networks for Fashion Image Classification
URL: https://www.sciencedirect.com/science/article/pii/S0957417418305992
Institute: Ewha Womans University
Topic: CNN, classification
Conf: [J] Expert Systems with Applications
Country: South Korea
Abstract: Deep learning can be applied in various business fields for better performance. In particular, fashion-related businesses have started to apply deep learning techniques to their e-commerce, for example apparel recognition, apparel search and retrieval engines, and automatic product recommendation. The most important backbone of these applications is the image classification task. However, apparel classification can be difficult due to the variety of apparel properties and the complexity and depth of categorization; multi-class apparel classification can be hard and ambiguous among similar classes. Here, we identify the need for image classification that reflects the hierarchical structure of apparel categories. In most previous studies, hierarchy has not been considered in image classification with Convolutional Neural Networks (CNNs), nor in fashion image classification using other methodologies. In this paper, we propose to apply Hierarchical Convolutional Neural Networks (HCNN) to apparel classification. This study contributes the first attempt to apply hierarchical classification of apparel using CNNs, and its significance lies in the proposed model being a knowledge-embedded classifier that outputs hierarchical information. We implement HCNN using VGGNet on the Fashion-MNIST dataset. Results show that with the HCNN model, the loss decreases and the accuracy improves compared with the base model without hierarchical structure. We conclude that HCNN brings better performance in classifying apparel.
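
An illustrative PyTorch sketch of a hierarchical classifier in this spirit: a shared convolutional trunk with a coarse head (higher-level apparel categories) and a fine head (the 10 Fashion-MNIST classes). The paper builds on VGGNet; this tiny trunk, the number of coarse classes, and the loss weighting mentioned in the comment are all stand-in assumptions.

```python
# Two-head hierarchical CNN: coarse categories plus fine (leaf) classes.
import torch
import torch.nn as nn

class TinyHCNN(nn.Module):
    def __init__(self, n_coarse=4, n_fine=10):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),
        )
        self.coarse = nn.Linear(32 * 7 * 7, n_coarse)  # higher-level categories
        self.fine = nn.Linear(32 * 7 * 7, n_fine)      # leaf classes

    def forward(self, x):
        h = self.trunk(x)
        return self.coarse(h), self.fine(h)

model = TinyHCNN()
logits_coarse, logits_fine = model(torch.randn(2, 1, 28, 28))
# Training would combine the two cross-entropy losses, e.g. a weighted sum;
# the exact combination used in the paper may differ.
```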

Title: SIN2: Stealth infection on neural network: A low-cost agile neural Trojan attack methodology
URL: https://ieeexplore.ieee.org/abstract/document/8383920/
Institute: Florida International University, University of Florida
Topic: cloud computing, invasive software, learning algorithm/theory, neural nets
Conf: HOST18
Country: USA
Abstract: Deep Neural Networks (DNNs) have recently become the "de facto" technique driving the artificial intelligence (AI) industry. However, many security issues also emerge as DNN-based intelligent systems become increasingly prevalent. Existing DNN security studies, such as adversarial attacks and poisoning attacks, are usually narrowly conducted at the software algorithm level, with misclassification as their primary goal. The more realistic system-level attacks introduced by the emerging intelligent service supply chain, e.g. third-party cloud-based machine learning as a service (MLaaS) along with portable DNN computing engines, have never been discussed. In this work, we propose a low-cost modular methodology, Stealth Infection on Neural Network ("SIN2"), to demonstrate novel and practical intelligent-supply-chain-triggered neural Trojan attacks. SIN2 leverages attacking opportunities built upon the static neural network model and the underlying dynamic runtime system of the neural computing framework through a set of neural Trojaning techniques. We implement a variety of neural Trojan attacks in a Linux sandbox following the proposed SIN2. Experimental results show that our modular design can rapidly produce and trigger various Trojan attacks that easily evade existing defenses.

Title: OCEAN: An On-Chip Incremental-Learning Enhanced Artificial Neural Network Processor with Multiple Gated-Recurrent-Unit Accelerators
URL: https://ieeexplore.ieee.org/abstract/document/8403217/
Institute: Fudan University
Topic: RNN, inference, on-chip training, deep learning processor, energy-efficient accelerator, gradient
Conf: [J] IEEE Journal on Emerging and Selected Topics in Circuits and Systems
Country: China
Abstract: This paper presents OCEAN: an artificial neural network processor designed for accelerating gated-recurrent-unit (GRU) inference and on-chip incremental learning for sequential modeling. Implemented in 65-nm CMOS with a silicon area of $2.9 \times 3.5\,\mathrm{mm}^{2}$, the OCEAN processor features a 32-bit reduced instruction set computing core, 64 KB of on-chip SRAM, and eight 16-bit four-cell GRU accelerators for inference and gradient computation. Each GRU accelerator is optimized and enhanced for efficient gradient computation. The processor is measured to consume 155 mW at the peak clock rate of 400 MHz and a supply of 1.2 V, or 6.6 mW at 20 MHz/0.8 V. Both inference and on-chip incremental learning are accomplished on well-known AI tasks such as handwritten digit recognition, semantic natural language processing, and biomedical waveform-based seizure detection.

Title: Global Semantic Consistency for Zero-Shot Learning
URL: https://arxiv.org/abs/1806.08503
Institute: Fudan University, Tongji University
Topic: zero/few-shot learning, global semantic consistency, loss function, parametric anomaly detection
Country: China
Abstract: In image recognition, there are many cases where training samples cannot cover all target classes. Zero-shot learning (ZSL) utilizes the class semantic information to classify samples of the unseen categories that have no corresponding samples contained in the training set. In this paper, we propose an end-to-end framework, called Global Semantic Consistency Network (GSC-Net for short), which makes complete use of the semantic information of both seen and unseen classes, to support effective zero-shot learning. We also adopt a soft label embedding loss to further exploit the semantic relationships among classes. To adapt GSC-Net to a more practical setting, Generalized Zero-shot Learning (GZSL), we introduce a parametric novelty detection mechanism. Our approach achieves the state-of-the-art performance on both ZSL and GZSL tasks over three visual attribute datasets, which validates the effectiveness and advantage of the proposed framework.

Title: Evaluation of generative networks through their data augmentation capacity
URL: https://openreview.net/forum?id=HJ1HFlZAb
Institute: Université de Montréal, ParisTech
Topic: GAN, evaluation
Country: Canada, France
Abstract: Generative networks are known to be difficult to assess. Recent works on generative models, especially generative adversarial networks, produce nice samples of varied categories of images, but the validation of their quality is highly dependent on the method used. A good generator should generate data which contain meaningful and varied information and which fit the distribution of a dataset. This paper presents a new method to assess a generator. Our approach is based on training a classifier with a mixture of real and generated samples. We train a generative model on a labeled training set, then use this generative model to sample new data points that we mix with the original training data. This mixture of real and generated data is then used to train a classifier, which is afterwards tested on a given labeled test dataset. We compare this result with the score of the same classifier trained on the real training data mixed with noise. By computing the classifier's accuracy with different ratios of samples from both distributions (real and generated), we can estimate whether the generator successfully fits and generalizes the distribution of the dataset. Our experiments compare different generators from the VAE and GAN frameworks on the MNIST and Fashion-MNIST datasets.

Title: A Generative Deep Recurrent Model for Exchangeable Data
URL: https://arxiv.org/abs/1802.07535
Institute: Ghent University, Google, Twitter, UCL, University of Oxford
Topic: Bayesian inference, zero/few-shot learning
Conf: NIPS18
Country: UK
Abstract: We present a novel model architecture which leverages deep learning tools to perform exact Bayesian inference on sets of high dimensional, complex observations. Our model is provably exchangeable, meaning that the joint distribution over observations is invariant under permutation: this property lies at the heart of Bayesian inference. The model does not require variational approximations to train, and new samples can be generated conditional on previous samples, with cost linear in the size of the conditioning set. The advantages of our architecture are demonstrated on learning tasks that require generalisation from short observed sequences while modelling sequence variability, such as conditional image generation, few-shot learning, and anomaly detection.

Title: Are GANs Created Equal? A Large-Scale Study
URL: https://arxiv.org/abs/1711.10337
Institute: Google
Topic: GAN, evaluation
Conf: NIPS18
Country: USA
Abstract: Generative adversarial networks (GANs) are a powerful subclass of generative models. Despite a very rich research activity leading to numerous interesting GAN algorithms, it is still very hard to assess which algorithms perform better than others. We conduct a neutral, multi-faceted, large-scale empirical study of state-of-the-art models and evaluation measures. We find that most models can reach similar scores with enough hyperparameter optimization and random restarts. This suggests that improvements can arise from a higher computational budget and tuning more than from fundamental algorithmic changes. To overcome some limitations of the current metrics, we also propose several data sets on which precision and recall can be computed. Our experimental results suggest that future GAN research should be based on more systematic and objective evaluation procedures. Finally, we did not find evidence that any of the tested algorithms consistently outperforms the original one.

Title: Learning to Learn Without Labels
URL: https://openreview.net/forum?id=ByoT9Fkvz
Institute: Google
Topic: meta-learning, unsupervised learning
Conf: ICLR18
Country: USA
Abstract: A major goal of unsupervised learning is for algorithms to learn representations of data, useful for subsequent tasks, without access to supervised labels or other high-level attributes. Typically, these algorithms minimize a surrogate objective, such as reconstruction error or likelihood of a generative model, with the hope that representations useful for subsequent tasks will arise as a side effect (e.g. semi-supervised classification). In this work, we propose using meta-learning to learn an unsupervised learning rule, and meta-optimize the learning rule directly to produce good representations for a desired task. Here, our desired task (meta-objective) is the performance of the representation on semi-supervised classification, and we meta-learn an algorithm, an unsupervised weight update rule, that produces representations that perform well under this meta-objective. We examine the performance of the learned algorithm on several datasets and show that it learns useful features, generalizes across both network architectures and a wide array of datasets, and outperforms existing unsupervised learning techniques.

Title: Clustering Small Samples with Quality Guarantees: Adaptivity with One2all pps
URL: https://arxiv.org/abs/1706.03607
Institute: Google, Tel Aviv University
Topic: clustering
Conf: AAAI18
Country: USA, Israel
Abstract: Clustering of data points is a fundamental tool in data analysis. We consider points $X$ in a relaxed metric space, where the triangle inequality holds within a constant factor. The {\em cost} of clustering $X$ by $Q$ is $V(Q)=\sum_{x\in X} d_{xQ}$. Two basic tasks, parametrized by $k \geq 1$, are {\em cost estimation}, which returns (approximate) $V(Q)$ for queries $Q$ such that $|Q|=k$, and {\em clustering}, which returns an (approximate) minimizer of $V(Q)$ of size $|Q|=k$. With very large data sets $X$, we seek efficient constructions of small samples that act as surrogates to the full data for performing these tasks. Existing constructions that provide quality guarantees are either worst-case, and unable to benefit from structure of real data sets, or make explicit strong assumptions on the structure. We show here how to avoid both these pitfalls using adaptive designs. At the core of our design is the {\em one2all} construction of multi-objective probability-proportional-to-size (pps) samples: Given a set $M$ of centroids and $\alpha \geq 1$, one2all efficiently assigns probabilities to points so that the clustering cost of {\em each} $Q$ with cost $V(Q) \geq V(M)/\alpha$ can be estimated well from a sample of size $O(\alpha |M|\epsilon^{-2})$. For cost queries, we can obtain worst-case sample size $O(k\epsilon^{-2})$ by applying one2all to a bicriteria approximation $M$, but we adaptively balance $|M|$ and $\alpha$ to further reduce sample size. For clustering, we design an adaptive wrapper that applies a base clustering algorithm to a sample $S$. Our wrapper uses the smallest sample that provides statistical guarantees that the quality of the clustering on the sample carries over to the full data set. We demonstrate experimentally the huge gains of using our adaptive instead of worst-case methods.

Title: Dropout with Tabu Strategy for Regularizing Deep Neural Networks
URL: https://arxiv.org/abs/1808.09907
Institute: Griffith University, Jinan University
Topic: regularization technique
Country: Australia, China
Abstract: Dropout has proven to be an effective technique for regularization and preventing the co-adaptation of neurons in deep neural networks (DNNs). It randomly drops units with a probability $p$ during the training stage of a DNN. Dropout also provides a way of approximately combining exponentially many different neural network architectures efficiently. In this work, we add a diversification strategy to dropout, which aims at generating more diverse neural network architectures over the training iterations. The units dropped in the last forward propagation (FP) are marked; units selected for dropping in the current FP are then kept active if they were marked in the last FP. We only mark the units from the last forward propagation. We call this new technique Tabu Dropout. Tabu Dropout has no extra parameters compared with standard Dropout and is computationally cheap. Experiments conducted on the MNIST and Fashion-MNIST datasets show that Tabu Dropout improves the performance of standard dropout.
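
The tabu rule as described is simple enough to sketch. This is a hedged NumPy illustration: shapes and the drop probability are placeholders, and the inverted-dropout rescaling by `1/(1-p)` is approximate here because the tabu mask lowers the effective drop rate slightly.

```python
# Tabu Dropout sketch: units dropped in the previous forward pass are
# marked and are kept active in the current pass.
import numpy as np

rng = np.random.default_rng(0)

def tabu_dropout(a, tabu, p=0.5):
    """a: activations; tabu: bool mask of units dropped last time."""
    drop = rng.random(a.shape) < p
    drop &= ~tabu                       # tabu: don't re-drop last pass's victims
    keep = ~drop
    return a * keep / (1.0 - p), drop   # new tabu mask = units dropped now

a = np.ones((2, 6))
tabu = np.zeros_like(a, dtype=bool)
out, tabu = tabu_dropout(a, tabu)
out, tabu = tabu_dropout(a, tabu)       # second pass avoids previously dropped units
```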
50
Contractive Slab and Spike Convolutional Deep Boltzmann Machinehttps://www.sciencedirect.com/science/article/pii/S0925231218301887Harbin Engineering UniversityRBM, Restricted Boltzmann Machine, robust learning
[J] Neurocomputing
ChinaDeep unsupervised learning for robust and effective feature extraction from high-resolution images remains highly challenging. Although the Deep Boltzmann Machine (DBM) has demonstrated an impressive capacity for feature extraction, there is still much room for improvement in scaling such models to full-sized images and in the robustness and quality of the learned features. In this paper, we propose a Contractive Slab and Spike Convolutional Deep Boltzmann Machine to settle these issues. First, the proposed model extends the convolution operation to the DBM in order to deal with real-size images. Second, we introduce element-wise multiplication between real-valued slab hidden units and binary spike hidden units in order to enhance the quality of feature extraction in the receptive field. Then, we add the Frobenius norm of the Jacobian of the features as a regularization term to the maximum-likelihood objective in order to enhance the robustness of the features during training. The proposed regularization term results in a localized space contraction, which in turn yields robust features in the hidden layer. Last, we use a new block Contractive Slab and Spike Convolutional Restricted Boltzmann Machine to pretrain the proposed model. The proposed deep model shows a stronger capacity to extract high-level representations, and results on various visual tasks demonstrate that it achieves improved performance over several state-of-the-art methods.
51
End-to-end Learning of Deterministic Decision Treeshttps://arxiv.org/abs/1712.02743Heidelberg Universitydecision treeGermanyConventional decision trees have a number of favorable properties, including interpretability, a small computational footprint and the ability to learn from little training data. However, they lack a key quality that has helped fuel the deep learning revolution: that of being end-to-end trainable, and of learning from scratch those features that best allow solving a given supervised learning problem. Recent work (Kontschieder 2015) has addressed this deficit, but at the cost of losing a main attractive trait of decision trees: the fact that each sample is routed along a small subset of tree nodes only. Here we propose a model and an Expectation-Maximization training scheme for decision trees that are fully probabilistic at train time but, after a deterministic annealing process, become deterministic at test time. We also analyze the learned oblique split parameters on image datasets and show that neural networks can be trained at each split node. In summary, we present the first end-to-end learning scheme for deterministic decision trees and present results on par with or superior to published standard oblique decision tree algorithms.
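A sketch of the core idea, under the assumption that each oblique split routes by a sigmoid whose steepness is annealed during training; the exact parameterization and EM updates are the paper's, this only illustrates how annealing makes the split deterministic:

```python
import numpy as np

def route_left_probability(x, w, b, gamma):
    """Probability of routing sample x to the left child of an oblique
    split node. Annealing gamma upward during training turns the soft,
    probabilistic split into the hard test w @ x + b > 0 used at test time.
    """
    return 1.0 / (1.0 + np.exp(-gamma * (x @ w + b)))
```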
52
Applied Deep Learning: A Case-Based Approach to Understanding Deep Neural Networks
nullHelsana AG
[B] Book from Apress
Switzerland
Work with advanced topics in deep learning, such as optimization algorithms, hyper-parameter tuning, dropout, and error analysis, as well as strategies to address typical problems encountered when training deep neural networks. You’ll begin by studying the activation functions mostly with a single neuron (ReLU, sigmoid, and Swish), seeing how to perform linear and logistic regression using TensorFlow, and choosing the right cost function. The next section talks about more complicated neural network architectures with several layers and neurons and explores the problem of random initialization of weights. An entire chapter is dedicated to a complete overview of neural network error analysis, giving examples of solving problems originating from variance, bias, overfitting, and datasets coming from different distributions. Applied Deep Learning also discusses how to implement logistic regression completely from scratch without using any Python library except NumPy, to let you appreciate how libraries such as TensorFlow allow quick and efficient experiments. Case studies for each method are included to put into practice all theoretical information. You’ll discover tips and tricks for writing optimized Python code (for example vectorizing loops with NumPy). What You Will Learn: implement advanced techniques in the right way in Python and TensorFlow; debug and optimize advanced methods (such as dropout and regularization); carry out error analysis (to realize if one has a bias problem, a variance problem, a data offset problem, and so on); set up a machine learning project focused on deep learning on a complex dataset. Who This Book Is For: readers with a medium understanding of machine learning, linear algebra, calculus, and basic Python programming.
53
Entropy Estimates for Generative Modelshttps://openreview.net/forum?id=Hk0ZOFkwf
Higher School of Economics
GANRussiaDifferent approaches to generative modeling entail different approaches to evaluation. While some models admit test likelihood estimation, for others only proxy metrics for visual quality are being reported. In this paper, we propose a simple method to compute differential entropy of an arbitrary decoder-based generative model. Using this approach, we found that models with qualitatively different samples are distinguishable in terms of entropy. In particular, adversarially trained generative models typically have higher entropy than variational autoencoders. Additionally, we provide support for the application of entropy as a measure of sample diversity.
54
Tuning the Layers of Neural Networks for Robust Generalizationhttps://csce.ucmss.com/cr/books/2018/LFS/CSREA2018/ICD8017.pdfHong Kong University of Science and Technologygeneralization, weak layer identification, architecture search, data augmentation, random epochs trainingICDATA18Hong KongNeural networks are known to have generalization ability for test data. This generalization ability depends on fine tuning of the network architecture, which has mainly depended on design experience. In this work, we explore a simple way to identify the network layer responsible for the lack of performance robustness under translationally displaced input patterns, and hence provide evidence for improving the translational robustness of the network by modifying that particular layer for small datasets. The approach achieves a significant improvement in the weighted average error, with modification hints provided by the random epochs training process, on the MNIST and Fashion-MNIST datasets. This method also provides a way to understand the weight-space development of neural networks.
55
Finding Competitive Network Architectures Within a Day Using UCThttps://arxiv.org/abs/1712.07420IBM Researcharchitecture searchIrelandThe design of neural network architectures for a new data set is a laborious task which requires human deep learning expertise. In order to make deep learning available for a broader audience, automated methods for finding a neural network architecture are vital. Recently proposed methods can already achieve human expert level performances. However, these methods have run times of months or even years of GPU computing time, ignoring hardware constraints as faced by many researchers and companies. We propose the use of Monte Carlo planning in combination with two different UCT (upper confidence bound applied to trees) derivations to search for network architectures. We adapt the UCT algorithm to the needs of network architecture search by proposing two ways of sharing information between different branches of the search tree. In an empirical study we are able to demonstrate that this method is able to find competitive networks for MNIST, SVHN and CIFAR-10 in just a single GPU day. Extending the search time to five GPU days, we are able to outperform human architectures and our competitors which consider the same types of layers.
56
A Visual Interaction Framework for Dimensionality Reduction Based Data Exploration
https://dl.acm.org/citation.cfm?id=3174209IBM ResearchDimensionality reduction, visualizationCHI18USADimensionality reduction is a common method for analyzing and visualizing high-dimensional data. However, reasoning dynamically about the results of a dimensionality reduction is difficult. Dimensionality-reduction algorithms use complex optimizations to reduce the number of dimensions of a dataset, but these new dimensions often lack a clear relation to the initial data dimensions, thus making them difficult to interpret. Here we propose a visual interaction framework to improve dimensionality-reduction based exploratory data analysis. We introduce two interaction techniques, forward projection and backward projection, for dynamically reasoning about dimensionally reduced data. We also contribute two visualization techniques, prolines and feasibility maps, to facilitate the effective use of the proposed interactions. We apply our framework to PCA and autoencoder-based dimensionality reductions. Through data-exploration examples, we demonstrate how our visual interactions can improve the use of dimensionality reduction in exploratory data analysis.
57
Defending Against Model Stealing Attacks Using Deceptive Perturbationshttps://arxiv.org/abs/1806.00054IBM Research adversarial learning, defendingUSAMachine learning models are vulnerable to simple model stealing attacks if the adversary can obtain output labels for chosen inputs. To protect against these attacks, it has been proposed to limit the information provided to the adversary by omitting probability scores, significantly impacting the utility of the provided service. In this work, we illustrate how a service provider can still provide useful, albeit misleading, class probability information, while significantly limiting the success of the attack. Our defense forces the adversary to discard the class probabilities, requiring significantly more queries before they can train a model with comparable performance. We evaluate several attack strategies, model architectures, and hyperparameters under varying adversarial models, and evaluate the efficacy of our defense against the strongest adversary. Finally, we quantify the amount of noise injected into the class probabilities to measure the loss in utility, e.g., adding 1.26 nats per query on CIFAR-10 and 3.27 on MNIST. Our evaluation shows our defense can degrade the accuracy of the stolen model by at least 20%, or require up to 64 times more queries, while keeping the accuracy of the protected model almost intact.
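A toy sketch of the deceptive-perturbation idea, assuming only that noise is injected into the returned probabilities while the top-1 label is preserved; the paper's exact perturbation scheme may differ:

```python
import numpy as np

def deceive(probs, noise_scale=0.5, rng=np.random.default_rng()):
    """Return misleading class probabilities whose argmax matches the
    model's true prediction, so the service stays useful for labels
    while the probability values mislead a model-stealing adversary.
    probs : softmax output, shape (num_classes,)
    """
    top = int(np.argmax(probs))
    noisy = probs * np.exp(noise_scale * rng.standard_normal(probs.shape))
    noisy /= noisy.sum()
    if int(np.argmax(noisy)) != top:      # keep the top-1 label intact
        noisy[top] = noisy.max() + 1e-6
        noisy /= noisy.sum()
    return noisy
```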
58
SGAN: An Alternative Training of Generative Adversarial Networkshttp://openaccess.thecvf.com/content_cvpr_2018/papers_backup/Chavdarova_SGAN_An_Alternative_CVPR_2018_paper.pdfIdiap Research, EPFLGANECCV18SwitzerlandGenerative Adversarial Networks (GANs) have demonstrated impressive performance for data synthesis and are now used in a wide range of computer vision tasks. In spite of this success, they have gained a reputation for being difficult to train, which results in a time-consuming and human-involved development process. We consider an alternative training process, named SGAN, in which several adversarial “local” pairs of networks are trained independently so that a “global” supervising pair of networks can be trained against them. The goal is to train the global pair with the corresponding ensemble opponent for improved performance in terms of mode coverage. This approach aims at increasing the chances that learning will not stop for the global pair, preventing both networks from being trapped in an unsatisfactory local minimum and from the oscillations often observed in practice. To guarantee the latter, the global pair never affects the local ones. The rules of SGAN training are thus as follows: the global generator and discriminator are trained using the local discriminators and generators, respectively, whereas the local networks are trained with their fixed local opponents. Experimental results on both toy and real-world problems demonstrate that this approach outperforms standard training in terms of better mitigating mode collapse and stability while converging, and that it, surprisingly, increases the convergence speed as well.
59
Adversarial Data Programming: Using GANs to Relax the Bottleneck of Curated Labeled Data
http://openaccess.thecvf.com/content_cvpr_2018/papers/Pal_Adversarial_Data_Programming_CVPR_2018_paper.pdfIIT HyderabadGAN, data programmingECCV18IndiaPaucity of large curated hand-labeled training data forms a major bottleneck in the deployment of machine learning models in computer vision and other fields. Recent work (Data Programming) has shown how distant supervision signals in the form of labeling functions can be used to obtain labels for given data in near-constant time. In this work, we present Adversarial Data Programming (ADP), which presents an adversarial methodology to generate data as well as a curated aggregated label, given a set of weak labeling functions. We validated our method on the MNIST, Fashion MNIST, CIFAR 10 and SVHN datasets, and it outperformed many state-of-the-art models. We conducted extensive experiments to study its usefulness, as well as showed how the proposed ADP framework can be used for transfer learning as well as multi-task learning, where data from two domains are generated simultaneously using the framework along with the label information. Our future work will involve understanding the theoretical implications of this new framework from a game-theoretic perspective, as well as explore the performance of the method on more complex datasets.
60
Classification of fashion article images using convolutional neural networkshttps://ieeexplore.ieee.org/abstract/document/8313740/IIT PatnaCNN, classificationICIIP17IndiaIn this paper, we propose a state-of-the-art model for the classification of fashion article images. We trained convolutional neural network based deep learning architectures to classify images in the Fashion-MNIST dataset. We propose three different convolutional neural network architectures and use batch normalization and residual skip connections for ease and acceleration of the learning process. Our model shows impressive results on the Fashion-MNIST benchmark dataset. Comparisons show that our proposed model achieves an improvement in accuracy of around 2% over the existing state-of-the-art systems in the literature.
61
Dense and Diverse Capsule Networks: Making the Capsules Learn Betterhttps://arxiv.org/abs/1805.04001IIT RoparCapsNet, classificationIndiaThe past few years have witnessed exponential growth of interest in deep learning methodologies, with rapidly improving accuracies and reduced computational complexity. In particular, architectures using Convolutional Neural Networks (CNNs) have produced state-of-the-art performance for image classification and object recognition tasks. Recently, Capsule Networks (CapsNets) achieved a significant increase in performance by addressing an inherent limitation of CNNs in encoding pose and deformation. Inspired by such advancement, we asked ourselves: can we do better? We propose Dense Capsule Networks (DCNet) and Diverse Capsule Networks (DCNet++). The two proposed frameworks customize the CapsNet by replacing the standard convolutional layers with densely connected convolutions. This helps in incorporating feature maps learned by different layers when forming the primary capsules. DCNet essentially adds a deeper convolution network, which leads to learning of discriminative feature maps. Additionally, DCNet++ uses a hierarchical architecture to learn capsules that represent spatial information in a fine-to-coarse manner, which makes it more efficient for learning complex data. Experiments on image classification tasks using benchmark datasets demonstrate the efficacy of the proposed architectures. DCNet achieves state-of-the-art performance (99.75%) on the MNIST dataset with a twentyfold decrease in total training iterations over the conventional CapsNet. Furthermore, DCNet++ performs better than CapsNet on the SVHN dataset (96.90%), and outperforms the ensemble of seven CapsNet models on CIFAR-10 by 0.31% with a sevenfold decrease in the number of parameters.
62
Distributed One-class Learninghttps://arxiv.org/abs/1802.03583
Imperial College London, Queen Mary University of London
Distributed Learning, One-Class Autoencoder, PrivacyUKWe propose a cloud-based filter trained to block third parties from uploading privacy-sensitive images of others to online social media. The proposed filter uses Distributed One-Class Learning, which decomposes the cloud-based filter into multiple one-class classifiers. Each one-class classifier captures the properties of a class of privacy-sensitive images with an autoencoder. The multi-class filter is then reconstructed by combining the parameters of the one-class autoencoders. The training takes place on edge devices (e.g. smartphones) and therefore users do not need to upload their private and/or sensitive images to the cloud. A major advantage of the proposed filter over existing distributed learning approaches is that users cannot access, even indirectly, the parameters of other users. Moreover, the filter can cope with the imbalanced and complex distribution of the image content and the independent probability of addition of new users. We evaluate the performance of the proposed distributed filter using the exemplar task of blocking a user from sharing privacy-sensitive images of other users. In particular, we validate the behavior of the proposed multi-class filter with non-privacy-sensitive images, the accuracy when the number of classes increases, and the robustness to attacks when an adversary user has access to privacy-sensitive images of other users.
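A small sketch of the filtering step, assuming each one-class classifier is an autoencoder whose membership decision thresholds the reconstruction error; the paper's contribution of rebuilding one multi-class filter from the one-class parameters is not reproduced here, the classifiers are simply queried in turn:

```python
import numpy as np

def filter_decision(x, autoencoders, thresholds):
    """Decide whether image x may be uploaded.

    autoencoders : dict user_id -> callable mapping x to its reconstruction
    thresholds   : dict user_id -> reconstruction-error threshold learned
                   on that user's edge device
    """
    for user, ae in autoencoders.items():
        err = float(np.mean((ae(x) - x) ** 2))
        if err < thresholds[user]:        # x matches this user's class
            return "block", user          # privacy-sensitive: block upload
    return "allow", None
```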
63
Catastrophic Importance of Catastrophic Forgettinghttps://arxiv.org/abs/1808.07049Independent ResearcherCatastrophic Forgetting, Reinforcement Learning This paper describes some of the possibilities of artificial neural networks that open up after solving the problem of catastrophic forgetting. A simple model and reinforcement learning applications of existing methods are also proposed.
64
Practical Computer Vision: Extract insightful information from images using TensorFlow, Keras, and OpenCV
https://dl.acm.org/citation.cfm?id=3217393Independent Researcher
[B] from Packt Publishing
A practical guide designed to get you from the basics to the current state of the art in computer vision systems. Key Features: master the different tasks associated with computer vision and develop your own computer vision applications with ease; leverage the power of Python, TensorFlow, Keras, and OpenCV to perform image processing, object detection, feature detection, and more; with real-world datasets and fully functional code, this book is your one-stop guide to understanding computer vision. Book Description: In this book, you will find several recently proposed methods in various domains of computer vision. You will start by setting up the proper Python environment to work on practical applications. This includes setting up libraries such as OpenCV, TensorFlow, and Keras using Anaconda. Using these libraries, you'll start to understand the concepts of image transformation and filtering. You will find a detailed explanation of feature detectors such as FAST and ORB; you'll use them to find similar-looking objects. With an introduction to convolutional neural nets, you will learn how to build a deep neural net using Keras and how to use it to classify the Fashion-MNIST dataset. With regard to object detection, you will learn the implementation of a simple face detector as well as the workings of complex deep-learning-based object detectors such as Faster R-CNN and SSD using TensorFlow. You'll get started with semantic segmentation using FCN models and track objects with Deep SORT. Not only this, you will also use Visual SLAM techniques such as ORB-SLAM on a standard dataset. By the end of this book, you will have a firm understanding of the different computer vision techniques and how to apply them in your applications. What You Will Learn: learn the basics of image manipulation with OpenCV; implement and visualize image filters such as smoothing, dilation, histogram equalization, and more; set up various libraries and platforms, such as OpenCV, Keras, and TensorFlow, in order to start using computer vision, along with appropriate datasets for each chapter, such as MSCOCO, MOT, and Fashion-MNIST; understand image transformation and downsampling with practical implementations; explore neural networks for computer vision and convolutional neural networks using Keras; understand working with deep-learning-based object detection such as Faster R-CNN, SSD, and more; explore deep-learning-based object tracking in action; understand Visual SLAM techniques such as ORB-SLAM. Who This Book Is For: machine learning practitioners and deep learning enthusiasts who want to understand and implement various tasks associated with computer vision and image processing in the most practical manner possible. Some programming experience would be beneficial, while knowing Python would be an added bonus.
65
Regression to MLP in Kerashttps://link.springer.com/chapter/10.1007/978-1-4842-3516-4_5Independent Researcher[B] from ApressIndiaYou have been working on regression while solving machine learning applications. Linear regression and nonlinear regression are used to predict numeric targets, while logistic regression and other classifiers are used to predict non-numeric target variables. In this chapter, I will discuss the evolution of multilayer perceptrons.
66
Deep Learning with Applications Using Pythonhttps://link.springer.com/content/pdf/10.1007/978-1-4842-3516-4.pdfIndependent Researcher[B] from ApressIndiaBuild deep learning applications, such as computer vision, speech recognition, and chatbots, using frameworks such as TensorFlow and Keras. This book helps you to ramp up your practical know-how in a short period of time and focuses you on the domain, models, and algorithms required for deep learning applications. Deep Learning with Applications Using Python covers topics such as chatbots, natural language processing, and face and object recognition. The goal is to equip you with the concepts, techniques, and algorithm implementations needed to create programs capable of performing deep learning. This book covers intermediate and advanced levels of deep learning, including convolutional neural networks, recurrent neural networks, and multilayer perceptrons. It also discusses popular APIs such as IBM Watson, Microsoft Azure, and scikit-learn.
67
Adversarial Network Compressionhttps://arxiv.org/abs/1803.10750
Innovation OSRAM GmbH, Technical University of Munich
network compression, adversarial learning, teacher-studentGermanyNeural network compression has recently received much attention due to the computational requirements of modern deep models. In this work, our objective is to transfer knowledge from a deep and accurate model to a smaller one. Our contributions are threefold: (i) we propose an adversarial network compression approach to train the small student network to mimic the large teacher, without the need for labels during training; (ii) we introduce a regularization scheme to prevent a trivially strong discriminator without reducing the network capacity; and (iii) our approach generalizes across different teacher-student models. In an extensive evaluation on five standard datasets, we show that our student suffers only a small accuracy drop, achieves better performance than other knowledge transfer approaches, and surpasses the performance of the same network trained with labels. In addition, we demonstrate state-of-the-art results compared to other compression strategies.
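A compressed sketch of the adversarial transfer loop, assuming a feature-level discriminator D that separates teacher features (label 1) from student features (label 0); the networks, feature dimension, and optimizers below are placeholders, not the paper's architectures:

```python
import torch
import torch.nn as nn

feat_dim = 256                                     # assumed feature size
D = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, 1))
bce = nn.BCEWithLogitsLoss()

def compression_step(x, teacher, student, opt_d, opt_s):
    """One label-free training step: D learns to tell teacher features
    from student features, and the student learns to fool D."""
    with torch.no_grad():
        f_t = teacher(x)                           # frozen, pretrained teacher
    f_s = student(x)
    ones, zeros = torch.ones(len(x), 1), torch.zeros(len(x), 1)

    opt_d.zero_grad()
    loss_d = bce(D(f_t), ones) + bce(D(f_s.detach()), zeros)
    loss_d.backward(); opt_d.step()

    opt_s.zero_grad()
    loss_s = bce(D(f_s), ones)                     # student mimics teacher
    loss_s.backward(); opt_s.step()
```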
68
Deep Anomaly Detection Using Geometric Transformationshttps://arxiv.org/abs/1805.10917Israel Institute of Technologyanomaly detectionNIPS18IsraelWe consider the problem of anomaly detection in images, and present a new detection technique. Given a sample of images, all known to belong to a "normal" class (e.g., dogs), we show how to train a deep neural model that can detect out-of-distribution images (i.e., non-dog objects). The main idea behind our scheme is to train a multi-class model to discriminate between dozens of geometric transformations applied on all the given images. The auxiliary expertise learned by the model generates feature detectors that effectively identify, at test time, anomalous images based on the softmax activation statistics of the model when applied on transformed images. We present extensive experiments using the proposed detector, which indicate that our algorithm improves state-of-the-art methods by a wide margin.
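A sketch of the scoring rule implied by the abstract: a classifier is trained to recognize which of k geometric transformations was applied, and a test image is scored by softmax statistics over the transformed copies. The paper uses richer activation statistics; this assumed simplification scores by the probability of the correct transformation, with `model` and `transforms` as placeholders:

```python
import numpy as np

def normality_score(image, model, transforms):
    """Average probability the model assigns to the correct
    transformation; low scores flag out-of-distribution images."""
    score = 0.0
    for idx, t in enumerate(transforms):
        probs = model(t(image))   # softmax over the k transformations
        score += probs[idx]
    return score / len(transforms)
```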
69
Finding Flatter Minima with SGDhttps://openreview.net/forum?id=r1VF9dCUGJagiellonian University, Université de Montréal, Facebook, University of Bonn, The University of Edinburghoptimization, gradientICLR18
Poland, Canada, UK, Germany
It has been discussed that over-parameterized deep neural networks (DNNs) trained using stochastic gradient descent (SGD) with smaller batch sizes generalize better compared with those trained with larger batch sizes. Additionally, model parameters found by small batch size SGD tend to be in flatter regions. We extend these empirical observations and experimentally show that both large learning rate and small batch size contribute towards SGD finding flatter minima that generalize well. Conversely, we find that small learning rates and large batch sizes lead to sharper minima that correlate with poor generalization in DNNs.
70
OS-ELM-FPGA: An FPGA-Based Online Sequential Unsupervised Anomaly Detector
http://www.arc.ics.keio.ac.jp/~matutani/papers/tsukada_heteropar2018.pdf
Keio University, The University of Tokyo
FPGA, AutoencoderJapanAutoencoder, a neural-network-based dimensionality reduction algorithm, has demonstrated its effectiveness in anomaly detection. It can detect whether an input sample is normal or abnormal after training only with normal data. In general, Autoencoders are built on backpropagation-based neural networks (BP-NNs). When BP-NNs are implemented in edge devices, they are typically specialized only for prediction, with weight matrices precomputed offline. However, such a system cannot be immediately adapted to trend changes in input data that the system has never encountered. In this paper, we propose an FPGA-based unsupervised anomaly detector, called OS-ELM-FPGA, that combines an Autoencoder with the online sequential learning algorithm OS-ELM. Based on our theoretical analysis of the algorithm, the proposed OS-ELM-FPGA completely eliminates matrix pseudoinversions while improving the learning throughput. Simulation results using open-source datasets show that OS-ELM-FPGA achieves favorable anomaly detection accuracy compared to CPU and GPU implementations of BP-NNs. The learning throughput of OS-ELM-FPGA is 3.47x to 27.99x and 5.22x to 78.06x higher than that of CPU and GPU implementations of OS-ELM, respectively, and 3.62x to 36.15x and 1.53x to 43.44x higher than that of CPU and GPU implementations of BP-NNs.
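For reference, the standard OS-ELM sequential step in its recursive-least-squares form, which the FPGA design reworks to avoid explicit pseudoinversion; this CPU sketch keeps the small matrix inverse for clarity, and for an autoencoder the targets T simply equal the inputs:

```python
import numpy as np

def oselm_update(P, beta, H, T):
    """One OS-ELM update on a new batch.

    P    : (L, L) inverse correlation matrix of hidden outputs
    beta : (L, out) output weights
    H    : (n, L) hidden-layer outputs for the batch
    T    : (n, out) targets (T = inputs for an autoencoder)
    """
    n = H.shape[0]
    K = np.linalg.inv(np.eye(n) + H @ P @ H.T)  # small n x n inverse
    P = P - P @ H.T @ K @ H @ P
    beta = beta + P @ H.T @ (T - H @ beta)
    return P, beta
```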
71
Gradient Based Evolution to Optimize the Structure of Convolutional Neural Networks
https://ieeexplore.ieee.org/abstract/document/8451394/KIT, silicon-softwareGenetic algorithm, differential evolutionICIP18GermanyDue to decreasing hardware prices, machine learning is becoming increasingly interesting for industrial applications such as automatic visual inspection (AVI). This paper presents a metaheuristic approach to the automatic generation of a well-suited convolutional neural network (CNN) based on differential evolution. This makes it possible to find a suitable architecture of a CNN for a given task with little prior knowledge. Another aim is to reduce the resources needed for inference as much as possible. Therefore, we choose a fitness function that considers both the accuracy of a CNN and the resources it uses. For typical industrial datasets, we obtain CNNs with an accuracy of more than 98% on average within a relatively short processing time.
72
Morphing architectures for pose-based image generation of people in clothinghttp://www.diva-portal.org/smash/get/diva2:1239446/FULLTEXT01.pdfKTH Royal Institute of TechnologyThesisSweden
73
Why Should I Trust Interactive Learners? Explaining Interactive Queries of Classifiers to Users
https://arxiv.org/abs/1805.08578
KU Leuven, TU Darmstadt
interactive learning, interpretability, active learningNIPS18
Belgium, Germany
Although interactive learning puts the user into the loop, the learner remains mostly a black box for the user. Understanding the reasons behind queries and predictions is important when assessing how the learner works and, in turn, building trust. Consequently, we propose the novel framework of explanatory interactive learning: in each step, the learner explains its interactive query to the user, and the user can query any active classifier for visual explanations of the corresponding predictions. We demonstrate that this can boost the predictive and explanatory power of, and the user's trust in, the learned model, using text (e.g. SVMs) and image classification (e.g. neural networks) experiments as well as a user study.
74
Client Selection for Federated Learning with Heterogeneous Resources in Mobile Edge
https://arxiv.org/abs/1804.08333
Kyoto University, The University of Tokyo
federated learning, mobile edge computingGlobecom18JapanWe envision a mobile edge computing (MEC) framework for machine learning (ML) technologies, which leverages distributed client data and computation resources for training high-performance ML models while preserving client privacy. Toward this future goal, this work aims to extend Federated Learning (FL), which enables privacy-preserving training of models, to work with heterogeneous clients in a practical cellular network. The FL protocol iteratively asks random clients to download a trainable model from a server, update it with their own data, and upload the updated model to the server, while asking the server to aggregate multiple client updates to further improve the model. While clients in this protocol are free from disclosing their own private data, the overall training process can become inefficient when some clients have limited computational resources (i.e., requiring longer update time) or are under poor wireless channel conditions (longer upload time). Our new FL protocol, which we refer to as FedCS, mitigates this problem and performs FL efficiently while actively managing clients based on their resource conditions. Specifically, FedCS solves a client selection problem with resource constraints, which selects the maximum possible number of clients who can complete the FL download, update, and upload steps within a certain deadline. This selection strategy results in the server aggregating as many client updates as possible and accelerating performance improvement in ML models (e.g., classification accuracy). We conducted an experimental evaluation using publicly available large-scale image datasets to train deep neural networks in MEC environment simulations. The experimental results show that FedCS is able to complete its training process in a significantly shorter time compared to the original FL protocol.
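A greedy sketch of the selection step, under the simplifying assumption that each client contributes a fixed (update_time, upload_time) cost and the round time is their running sum; the actual FedCS formulation models shared bandwidth and overlapping steps:

```python
def select_clients(clients, deadline):
    """Pick as many clients as possible that fit within the deadline.

    clients : dict client_id -> (update_time, upload_time)
    """
    selected, elapsed = [], 0.0
    remaining = dict(clients)
    while remaining:
        cid = min(remaining, key=lambda c: sum(remaining[c]))  # cheapest next
        cost = sum(remaining.pop(cid))
        if elapsed + cost > deadline:
            break
        elapsed += cost
        selected.append(cid)
    return selected
```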
75
Learning with a Generative Adversarial Network from a Positive Unlabeled Dataset for Image Classification
https://hal.archives-ouvertes.fr/hal-01811008/Laboratoire des signaux et systèmes, Institut VEDECOM, Institut franco-allemand de recherches de Saint-LouisPU LearningICIP18FranceIn this paper, we propose a new approach that addresses the Positive Unlabeled learning challenge for image classification. It relies on the ability of GANs to generate fake image samples whose distribution approaches the distribution of the negative samples included in the available unlabeled dataset, while remaining different from the distribution of the unlabeled positive samples. We then train a CNN classifier with the positive samples and the fake generated samples, as would be done with a classic Positive Negative dataset. Tests performed on three different image classification datasets show that the system is stable up to an acceptable fraction of positive samples present in the unlabeled dataset. Although very different, this method outperforms state-of-the-art PU learning on the RGB dataset CIFAR-10.
76
Image classification by learning from positive and unlabeled samples with a generative adversarial network
https://hal.archives-ouvertes.fr/hal-01811036/Laboratoire des signaux et systèmes, Institut VEDECOM, Institut franco-allemand de recherches de Saint-LouisPU learning
Conférence Nationale en Intelligence Artificielle et Rencontres des Jeunes Chercheurs en Intelligence Artificielle 2018
FranceIn this paper, we propose a new approach addressing the task of image classification by learning from positive and unlabeled data. Its functioning relies on certain properties of generative adversarial networks (GANs), which allow us to generate fake images whose distribution approaches the distribution of the negative samples included in the available unlabeled dataset, while remaining different from the distribution of the unlabeled positive samples. We then train a convolutional classifier with the positive samples and the generated fake samples, as would be done with a classic positive-negative dataset. Tests performed on three different image classification datasets show that the system remains stable in its behavior up to a substantial fraction of positive samples present in the unlabeled dataset. Although very different, this method outperforms state-of-the-art PU learning on the RGB dataset CIFAR-10.
77
Exploration of Capsule Networkshttps://nigel-schuster.de/uploads/capsnets.pdfLawrence UniversityCapsNetUSACapsule Networks provide a novel approach to understanding data, with especially promising results in the field of computer vision. Capsule Networks discover successive transformations of an image and thereby allow a deeper understanding than traditional ConvNet architectures. On MNIST this technique was successfully applied to produce state-of-the-art results. CapsNets were further found to be particularly adept at deciphering overlapping digits. We use CapsNets first to verify state-of-the-art results and then to explore their ability to classify novel testing sets using MNIST training. Subsequently we apply CapsNets to CIFAR-10 and to building a CAPTCHA recognizer for reCAPTCHAs, leveraging the ability of CapsNets to tolerate lateral shifts and zooms in character datasets.
78
Progressive prune network for memory efficient continual learninghttps://openreview.net/forum?id=HyuNGQkvG
Leapmind, Ascent Robotics
Deep learning, transfer learning, compression, classificationJapan
We present a method for the transfer of knowledge between tasks in memory-constrained devices. In this setting, the per-parameter performance over multiple tasks is a critical objective. Specifically, we consider continual training and pruning of a progressive neural network. This type of multi-task network was introduced in Rusu et al. (2016), which optimised for performance while the number of parameters grew quadratically with the number of tasks. Our preliminary results demonstrate that it is possible to limit the parameter growth to be linear, while still achieving a performance boost and sharing knowledge across different tasks.
79
Dissecting Adam: The Sign, Magnitude and Variance of Stochastic Gradientshttps://arxiv.org/abs/1705.07774max planck instituteoptimization, gradientICML18GermanyThe ADAM optimizer is exceedingly popular in the deep learning community. Often it works very well, sometimes it doesn't. Why? We interpret ADAM as a combination of two aspects: for each weight, the update direction is determined by the sign of stochastic gradients, whereas the update magnitude is determined by an estimate of their relative variance. We disentangle these two aspects and analyze them in isolation, gaining insight into the mechanisms underlying ADAM. This analysis also extends recent results on adverse effects of ADAM on generalization, isolating the sign aspect as the problematic one. Transferring the variance adaptation to SGD gives rise to a novel method, completing the practitioner's toolbox for problems where ADAM fails.
80
Assessing Generative Models via Precision and Recallhttps://arxiv.org/abs/1806.00035max planck institute, googleGAN, evaluationNIPS18Germany, USARecent advances in generative modeling have led to an increased interest in the study of statistical divergences as means of model comparison. Commonly used evaluation methods, such as Fr\'echet Inception Distance (FID), correlate well with the perceived quality of samples and are sensitive to mode dropping. However, these metrics are unable to distinguish between different failure cases since they yield one-dimensional scores. We propose a novel definition of precision and recall for distributions which disentangles the divergence into two separate dimensions. The proposed notion is intuitive, retains desirable properties, and naturally leads to an efficient algorithm that can be used to evaluate generative models. We relate this notion to total variation as well as to recent evaluation metrics such as Inception Score and FID. To demonstrate the practical utility of the proposed approach we perform an empirical study on several variants of Generative Adversarial Networks and the Variational Autoencoder. In an extensive set of experiments we show that the proposed metric is able to disentangle the quality of generated samples from the coverage of the target distribution.
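A sketch of the resulting procedure, assuming (as in the paper's practical algorithm) that real and generated samples have been reduced to histograms p and q over shared bins such as cluster assignments; the curve follows the alpha/beta formulation alpha(lambda) = sum_i min(lambda * p_i, q_i) and beta(lambda) = alpha(lambda) / lambda:

```python
import numpy as np

def prd_curve(p, q, num_angles=201):
    """Precision-recall pairs between a target histogram p (real data)
    and a model histogram q (generated data) over the same bins.
    Returns arrays of precisions alpha(lam) and recalls beta(lam).
    """
    angles = np.linspace(1e-6, np.pi / 2 - 1e-6, num_angles)
    lams = np.tan(angles)                 # sweep the trade-off parameter
    alphas = np.array([np.minimum(lam * p, q).sum() for lam in lams])
    return alphas, alphas / lams
```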
81
Reproducibility of “Three Factors Influencing Minima in SGD”https://www.cs.mcgill.ca/~kkutsc/reproduce.pdfMcGillCanadaIn this project, we sought to reproduce the results of the paper “Three Factors Influencing Minima in SGD” [1]. This paper used a mathematical approach to draw conclusions about the effect of batch size and learning rate on the generalization of neural networks. Three experiments that tested this relationship were recreated.
82
Constructing Unrestricted Adversarial Examples with Generative Modelshttps://arxiv.org/abs/1805.07894Microsoft, Stanford Universityadversarial learning, GANNIPS18USAAdversarial examples are typically constructed by perturbing an existing data point within a small matrix norm, and current defense methods are focused on guarding against this type of attack. In this paper, we propose a new class of adversarial examples that are synthesized entirely from scratch using a conditional generative model, without being restricted to norm-bounded perturbations. We first train an Auxiliary Classifier Generative Adversarial Network (AC-GAN) to model the class-conditional distribution over inputs. Then, conditioned on a desired class, we search over the AC-GAN latent space to find images that are likely under the generative model and are misclassified by a target classifier. We demonstrate through human evaluation that this new kind of adversarial images, which we call Generative Adversarial Examples, are legitimate and belong to the desired class. Our empirical results on the MNIST, SVHN, and CelebA datasets show that generative adversarial examples can easily bypass strong adversarial training and certified defense methods which can foil existing adversarial attacks.
83
Learning Network Size While Training with ShrinkNetshttps://www.sysml.cc/doc/161.pdfMITmodel compressionSYSML18USA
84
Data-Dependent Coresets for Compressing Neural Networks with Applications to Generalization Bounds
https://arxiv.org/abs/1804.05345MIT, University of Haifacoresets, compressionUSA, IsraelWe present an efficient coresets-based neural network compression algorithm that provably sparsifies the parameters of a trained fully-connected neural network in a manner that approximately preserves the network's output. Our approach is based on an importance sampling scheme that judiciously defines a sampling distribution over the neural network parameters, and as a result, retains parameters of high importance while discarding redundant ones. We leverage a novel, empirical notion of sensitivity and extend traditional coreset constructions to the application of compressing parameters. Our theoretical analysis establishes guarantees on the size and accuracy of the resulting compressed neural network and gives rise to new generalization bounds that may provide novel insights on the generalization properties of neural networks. We demonstrate the practical effectiveness of our algorithm on a variety of neural network configurations and real-world data sets.
85
Fortified Networks: Improving the Robustness of Deep Networks by Modeling the Manifold of Hidden Representations
https://arxiv.org/abs/1804.02485
Montreal Institute for Learning Algorithms
manifold learning, DAE, adversarial learningCanadaDeep networks have achieved impressive results across a variety of important tasks. However, a known weakness is their failure to perform well when evaluated on data which differ from the training distribution, even if these differences are very small, as is the case with adversarial examples. We propose Fortified Networks, a simple transformation of existing networks which fortifies the hidden layers in a deep network by identifying when the hidden states are off the data manifold and mapping these hidden states back to parts of the data manifold where the network performs well. Our principal contribution is to show that fortifying these hidden states improves the robustness of deep networks, and our experiments (i) demonstrate improved robustness to standard adversarial attacks in both black-box and white-box threat models; (ii) suggest that our improvements are not primarily due to the gradient masking problem; and (iii) show the advantage of doing this fortification in the hidden layers instead of the input space.
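A minimal sketch of one fortified block, assuming the fortification is a small denoising autoencoder on a hidden state whose reconstruction error is added to the task loss; dimensions and noise level are placeholders:

```python
import torch
import torch.nn as nn

class FortifiedLayer(nn.Module):
    """Denoising autoencoder inserted on a hidden representation; a large
    reconstruction error signals the hidden state is off the manifold."""
    def __init__(self, dim, bottleneck=64, sigma=0.1):
        super().__init__()
        self.enc = nn.Linear(dim, bottleneck)
        self.dec = nn.Linear(bottleneck, dim)
        self.sigma = sigma

    def forward(self, h):
        noisy = h + self.sigma * torch.randn_like(h)     # corrupt
        h_rec = self.dec(torch.relu(self.enc(noisy)))    # map back
        rec_loss = ((h_rec - h) ** 2).mean()
        return h_rec, rec_loss    # add rec_loss to the training objective
```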
86
Rectify Heterogeneous Models with Semantic Mappinghttp://proceedings.mlr.press/v80/ye18c.htmlNanjing Universityzero/few-shot learning, robust learningICML18ChinaOn the way to a robust learner for real-world applications, there are still great challenges, including handling unknown environments with limited data. Learnware (Zhou, 2016) describes a novel perspective and claims that learning models should have reusable and evolvable properties. We propose to Encode Meta InformaTion of features (EMIT) as the model specification for characterizing the changes, which grants the model the evolvability to bridge heterogeneous feature spaces. Then, pre-trained models from related tasks can be reused by our REctiFy via heterOgeneous pRedictor Mapping (REFORM) framework. In summary, the pre-trained model is adapted to a new environment with different features through model refining on only a small amount of training data in the current task. Experimental results over both synthetic and real-world tasks with diverse feature configurations validate the effectiveness and practical utility of the proposed framework.
87
Fast Dynamic Routing Based on Weighted Kernel Density Estimationhttps://arxiv.org/abs/1805.10807Nanjing University of Post and Telecommunication, Chinese Academy of SciencesCapsNet, dynamic-routing, clusteringChinaCapsules, as well as the dynamic routing between them, are recently proposed structures for deep neural networks. A capsule groups data into vectors or matrices as poses, rather than conventional scalars, to represent specific properties of a target instance. Besides its pose, a capsule should be attached with a probability (often denoted as activation) for its presence. Dynamic routing helps capsules achieve more generalization capacity with many fewer model parameters. However, the bottleneck that prevents widespread application of capsules is the expense of computation during routing. To address this problem, we generalize existing routing methods within the framework of weighted kernel density estimation, and propose two fast routing methods with different optimization strategies. Our methods improve the time efficiency of routing by nearly 40\% with negligible performance degradation. By stacking a hybrid of convolutional layers and capsule layers, we construct a network architecture to handle inputs at a resolution of $64\times{64}$ pixels. The proposed models achieve performance on par with other leading methods on multiple benchmarks.
88
Deep learning for case-based reasoning through prototypes: A neural network that explains its predictions
https://arxiv.org/abs/1710.04806Nanjing University, Duke UniversityinterpretabilityAAAI18China, USADeep neural networks are widely used for classification. These deep models often suffer from a lack of interpretability -- they are particularly difficult to understand because of their non-linear nature. As a result, neural networks are often treated as "black box" models, and in the past, have been trained purely to optimize the accuracy of predictions. In this work, we create a novel network architecture for deep learning that naturally explains its own reasoning for each prediction. This architecture contains an autoencoder and a special prototype layer, where each unit of that layer stores a weight vector that resembles an encoded training input. The encoder of the autoencoder allows us to do comparisons within the latent space, while the decoder allows us to visualize the learned prototypes. The training objective has four terms: an accuracy term, a term that encourages every prototype to be similar to at least one encoded input, a term that encourages every encoded input to be close to at least one prototype, and a term that encourages faithful reconstruction by the autoencoder. The distances computed in the prototype layer are used as part of the classification process. Since the prototypes are learned during training, the learned network naturally comes with explanations for each prediction, and the explanations are loyal to what the network actually computes.
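A sketch of the four-term objective exactly as described in the abstract, with placeholder weighting coefficients; `z` denotes the encoded inputs and `prototypes` the learned prototype vectors:

```python
import torch
import torch.nn.functional as F

def prototype_loss(logits, labels, x, x_rec, z, prototypes,
                   lams=(1.0, 1.0, 1.0)):
    """Cross-entropy + two prototype-proximity terms + reconstruction."""
    d2 = torch.cdist(z, prototypes) ** 2         # (batch, num_prototypes)
    ce = F.cross_entropy(logits, labels)         # accuracy term
    r1 = d2.min(dim=0).values.mean()  # each prototype near some input
    r2 = d2.min(dim=1).values.mean()  # each input near some prototype
    rec = ((x_rec - x) ** 2).mean()   # faithful autoencoder reconstruction
    return ce + lams[0] * r1 + lams[1] * r2 + lams[2] * rec
```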
89
Multi-task Learning on MNIST Image Datasetshttps://openreview.net/forum?id=S1PWi_lC-National Sun Yat-sen Universitymulti-task learningTaiwanWe apply multi-task learning to image classification tasks on MNIST-like datasets. The MNIST dataset has been referred to as the {\em drosophila} of machine learning and has been the testbed of many learning theories. The NotMNIST dataset and the FashionMNIST dataset have been created with the MNIST dataset as reference. In this work, we exploit these MNIST-like datasets for multi-task learning. The datasets are pooled together for learning the parameters of joint classification networks. Then the learned parameters are used as the initial parameters to retrain disjoint classification networks. The baseline recognition models are all-convolutional neural networks. Without multi-task learning, the recognition accuracies for MNIST, NotMNIST and FashionMNIST are 99.56\%, 97.22\% and 94.32\% respectively. With multi-task learning to pre-train the networks, the recognition accuracies are respectively 99.70\%, 97.46\% and 95.25\%. The results re-affirm that the multi-task learning framework, even with data of different genres, does lead to significant improvement.
90
Neural Networks in an Adversarial Setting and Ill-Conditioned Weight Spacehttps://www.researchgate.net/profile/Brett_Drury/publication/327655305_Proceedings_Of_the_2nd_International_Workshop_on_AI_In_Security/links/5b9bf0b745851574f7cb44fc/Proceedings-Of-the-2nd-International-Workshop-on-AI-In-Security.pdfNational University of Ireland
2nd International Workshop on A.I. In Security
Ireland
91
Fine-Pruning: Defending Against Backdooring Attacks on Deep Neural Networks
https://arxiv.org/abs/1805.12185New York Universitydeep learning, backdoor, trojanRAID18USADeep neural networks (DNNs) provide excellent performance across a wide range of classification tasks, but their training requires high computational resources and is often outsourced to third parties. Recent work has shown that outsourced training introduces the risk that a malicious trainer will return a backdoored DNN that behaves normally on most inputs but causes targeted misclassifications or degrades the accuracy of the network when a trigger known only to the attacker is present. In this paper, we provide the first effective defenses against backdoor attacks on DNNs. We implement three backdoor attacks from prior work and use them to investigate two promising defenses, pruning and fine-tuning. We show that neither, by itself, is sufficient to defend against sophisticated attackers. We then evaluate fine-pruning, a combination of pruning and fine-tuning, and show that it successfully weakens or even eliminates the backdoors, i.e., in some cases reducing the attack success rate to 0% with only a 0.4% drop in accuracy for clean (non-triggering) inputs. Our work provides the first step toward defenses against backdoor attacks in deep neural networks.
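A sketch of the pruning half of the defense, under the assumption that backdoor behavior hides in neurons that stay dormant on clean inputs; the channels least activated on clean validation data are pruned, and fine-tuning on clean data follows:

```python
import numpy as np

def prune_dormant(weights, clean_activations, frac=0.1):
    """Zero out the output channels least active on clean data.

    weights           : array whose first axis indexes output channels
    clean_activations : (num_samples, num_channels) activations recorded
                        while running clean validation inputs
    """
    mean_act = clean_activations.mean(axis=0)
    k = int(frac * mean_act.size)
    dormant = np.argsort(mean_act)[:k]   # least-activated channels
    weights[dormant, ...] = 0.0
    return weights, dormant
```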
92
Structured Disentangled Representationshttp://www.ccs.neu.edu/home/jwvdm/assets/pdf/esmaeli_arxiv_2018.pdf
Northeastern University, University of Cambridge, University of Oxford
VAEUSADeep latent-variable models learn representations of high-dimensional data in an unsupervised manner. A number of recent efforts have focused on learning representations that disentangle statistically independent axes of variation by introducing modifications to the standard objective function. These approaches generally assume a simple diagonal Gaussian prior and as a result are not able to reliably disentangle discrete factors of variation. We propose a two-level hierarchical objective to control relative degree of statistical independence between blocks of variables and individual variables within blocks. We derive this objective as a generalization of the evidence lower bound, which allows us to explicitly represent the trade-offs between mutual information between data and representation, KL divergence between representation and prior, and coverage of the support of the empirical data distribution. Experiments on a variety of datasets demonstrate that our objective can not only disentangle discrete variables, but that doing so also improves disentanglement of other variables and, importantly, generalization even to unseen combinations of factors.
93
Hierarchical Disentangled Representationshttps://arxiv.org/abs/1804.02086
Northeastern University, University of Cambridge, University of Oxford
VAEUSA, UKDeep latent-variable models learn representations of high-dimensional data in an unsupervised manner. A number of recent efforts have focused on learning representations that disentangle statistically independent axes of variation by introducing modifications to the standard objective function. These approaches generally assume a simple diagonal Gaussian prior and as a result are not able to reliably disentangle discrete factors of variation. We propose a two-level hierarchical objective to control relative degree of statistical independence between blocks of variables and individual variables within blocks. We derive this objective as a generalization of the evidence lower bound, which allows us to explicitly represent the trade-offs between mutual information between data and representation, KL divergence between representation and prior, and coverage of the support of the empirical data distribution. Experiments on a variety of datasets demonstrate that our objective can not only disentangle discrete variables, but that doing so also improves disentanglement of other variables and, importantly, generalization even to unseen combinations of factors.
95
Cognitive Consistency Routing Algorithm of Capsule-networkhttps://arxiv.org/abs/1808.09062
Northern Arizona University
CapsNet, routingUSAArtificial Neural Networks (ANNs) are computational models inspired by the central nervous system (especially the brain) of animals, used to estimate or approximate unknown functions that depend on large amounts of inputs. The Capsule Neural Network (Sabour et al., 2017) is a novel structure of Convolutional Neural Networks which simulates the visual processing system of the human brain. In this paper, we introduce a psychological theory called Cognitive Consistency to optimize the routing algorithm of CapsNet, making it closer to the working pattern of the human brain. Experiments show that progress has been made compared with the baseline.
96
Autonomous Deep Learning: A Genetic DCNN Designer for Image Classificationhttps://arxiv.org/abs/1807.00284Northwestern Polytechnical Universityarchitecture searchChinaRecent years have witnessed the breakthrough success of deep convolutional neural networks (DCNNs) in image classification and other vision applications. Although DCNNs free users from troublesome handcrafted feature extraction by providing a uniform feature extraction-classification framework, they still require a handcrafted design of their architectures. In this paper, we propose the genetic DCNN designer, an autonomous learning algorithm that can generate a DCNN architecture automatically based on the data available for a specific image classification problem. We first partition a DCNN into multiple stacked meta convolutional blocks and fully connected blocks, each containing the operations of convolution, pooling, fully connected layers, batch normalization, activation and dropout, and thus convert the architecture into an integer vector. Then, we use refined evolutionary operations, including selection, mutation and crossover, to evolve a population of DCNN architectures. Our results on the MNIST, Fashion-MNIST, EMNIST-Digit, EMNIST-Letter, CIFAR-10 and CIFAR-100 datasets suggest that the proposed genetic DCNN designer is able to automatically produce DCNN architectures whose performance is comparable to, if not better than, that of state-of-the-art DCNN models.
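A toy sketch of the integer encoding and the evolutionary operators; the per-block fields and value ranges below are illustrative placeholders, not the paper's actual gene layout:

```python
import random

# Assumed per-block genes: (filters, kernel size, use-pooling flag,
# dropout rate in tenths); a full architecture is their concatenation.
GENE_RANGES = [(16, 256), (1, 7), (0, 1), (0, 9)]

def mutate(arch, rate=0.1):
    """Point mutation: resample each gene with a small probability."""
    child = list(arch)
    for i in range(len(child)):
        lo, hi = GENE_RANGES[i % len(GENE_RANGES)]
        if random.random() < rate:
            child[i] = random.randint(lo, hi)
    return child

def crossover(a, b):
    """One-point crossover between two architecture vectors."""
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]
```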
97
Maximum principle based algorithms for deep learninghttp://www.jmlr.org/papers/volume18/17-653/17-653.pdfPeking University, A-Stardeep learning, optimal control, Pontryagin’s maximum principle, method of successive approximations
[J] Journal of Machine Learning Research
China, SingaporeThe continuous dynamical system approach to deep learning is explored in order to devise alternative frameworks for training algorithms. Training is recast as a control problem, and this allows us to formulate necessary optimality conditions in continuous time using Pontryagin’s maximum principle (PMP). A modification of the method of successive approximations is then used to solve the PMP, giving rise to an alternative training algorithm for deep learning. This approach has the advantage that rigorous error estimates and convergence results can be established. We also show that it may avoid some pitfalls of gradient-based methods, such as slow convergence on flat landscapes near saddle points. Furthermore, we demonstrate that it obtains a favorable initial convergence rate per iteration, provided Hamiltonian maximization can be efficiently carried out - a step which is still in need of improvement. Overall, the approach opens up new avenues to attack problems associated with deep learning, such as trapping in slow manifolds and the inapplicability of gradient-based methods for discrete trainable variables.
98
Adversarial Noise Layer: Regularize Neural Network By Adding Noisehttps://arxiv.org/abs/1805.08000
Peking University, University of Electronic Science and Technology of China, Australian National University
adversarial attack, robust learningNIPS18China, AustraliaIn this paper, we introduce a novel regularization method called Adversarial Noise Layer (ANL), which significantly improves a CNN's generalization ability by adding adversarial noise in the hidden layers. ANL is easy to implement and can be integrated with most CNN-based models. We compare the impact of different types of noise and visually demonstrate that adversarial noise guides CNNs to learn to extract cleaner feature maps, further reducing the risk of over-fitting. We also conclude that models trained with ANL are more robust to FGSM and IFGSM attacks. Code is available at: this https URL
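A rough PyTorch sketch of a hidden-layer adversarial noise module, under the assumption that the injected noise follows the sign of the loss gradient recorded at the previous iteration; the paper's exact noise construction may differ:

```python
import torch
import torch.nn as nn

class AdversarialNoise(nn.Module):
    """Inject sign-of-gradient noise into a hidden activation while
    training; acts as the identity at evaluation time."""
    def __init__(self, eps=0.05):
        super().__init__()
        self.eps = eps
        self.grad_sign = None          # direction remembered between steps

    def forward(self, h):
        if self.training:
            if self.grad_sign is not None and self.grad_sign.shape == h.shape:
                h = h + self.eps * self.grad_sign
            if h.requires_grad:
                # Record the gradient direction during the backward pass.
                h.register_hook(
                    lambda g: setattr(self, "grad_sign", g.sign().detach()))
        return h
```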
99
Conducting Credit Assignment by Aligning Local Representationshttps://arxiv.org/abs/1803.01834Penn State Universitylearning algorithm/theoryUSAUsing back-propagation and its variants to train deep networks is often problematic for new users. Issues such as exploding gradients, vanishing gradients, and high sensitivity to weight initialization strategies often make networks difficult to train, especially when users are experimenting with new architectures. Here, we present Local Representation Alignment (LRA), a training procedure that is much less sensitive to bad initializations, does not require modifications to the network architecture, and can be adapted to networks with highly nonlinear and discrete-valued activation functions. Furthermore, we show that one variation of LRA can start with a null initialization of network weights and still successfully train networks with a wide variety of nonlinearities, including tanh, ReLU-6, softplus, signum and others that may draw their inspiration from biology. A comprehensive set of experiments on MNIST and the much harder Fashion MNIST data sets show that LRA can be used to train networks robustly and effectively, succeeding even when back-propagation fails and outperforming other alternative learning algorithms, such as target propagation and feedback alignment.
100
Sanny: A Distributable Approximate Nearest-Neighbor Search Engine Balancing Accuracy and Speed for Large-Scale E-Commerce Siteshttps://rand.pepabo.com/papers/iot42-proceeding-miyakey.pdfPepabo R&DNearest Neighbour searchJapan
101
New hybrid kernel architectures for deep learninghttps://upcommons.upc.edu/handle/2117/119034Polytechnic University of CataloniaCNN, kernel methodThesisIn this work we explore the possibilities of combining neural network architectures and kernel methods by introducing hybrid kernel blocks. We present hybrid architectures which can be trained as traditional neural networks and introduce novel training and regularization methodologies for them.
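As a concrete (if generic) picture of what a trainable kernel block can look like, the sketch below implements an RBF layer with learnable landmark points that trains end-to-end like any other layer. It is an assumption-laden illustration of the hybrid idea, not the architecture from the thesis.

```python
import torch
import torch.nn as nn

class RBFKernelBlock(nn.Module):
    """Generic 'kernel block': maps inputs to RBF-kernel similarities
    against a set of learnable landmarks, so the block can be trained
    with backprop like an ordinary layer. Illustrative only."""

    def __init__(self, in_features, n_landmarks, gamma=1.0):
        super().__init__()
        self.landmarks = nn.Parameter(torch.randn(n_landmarks, in_features))
        self.log_gamma = nn.Parameter(torch.tensor(float(gamma)).log())

    def forward(self, x):
        # squared Euclidean distances to each landmark: (batch, n_landmarks)
        d2 = torch.cdist(x, self.landmarks).pow(2)
        return torch.exp(-self.log_gamma.exp() * d2)
```

Such a block composes freely with standard layers, e.g. `nn.Sequential(nn.Flatten(), RBFKernelBlock(784, 64), nn.Linear(64, 10))`.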
102
Featurized Bidirectional GAN: Adversarial Defense via Adversarially Learned Semantic Inference
https://arxiv.org/abs/1805.07862Princeton UniversityGANUSADeep neural networks have been demonstrated to be vulnerable to adversarial attacks, where small perturbations are intentionally added to the original inputs to fool the classifier. In this paper, we propose a defense method, Featurized Bidirectional Generative Adversarial Networks (FBGAN), to capture the semantic features of the input and filter out the non-semantic perturbation. FBGAN is pre-trained on the clean dataset in an unsupervised manner, adversarially learning a bidirectional mapping between the high-dimensional data space and the low-dimensional semantic space, with mutual information applied to disentangle the semantically meaningful features. After the bidirectional mapping, adversarial data can be reconstructed into denoised data, which can then be fed into the classifier. We empirically show the quality of the reconstructed images and the effectiveness of the defense.
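The inference-time defense the abstract describes (encode to the semantic space, reconstruct, then classify) is simple to state in code. The sketch below assumes pre-trained encoder, generator, and classifier modules; all names are illustrative.

```python
import torch

@torch.no_grad()
def defend_and_classify(x, encoder, generator, classifier):
    """FBGAN-style defense at inference time (illustrative names):
    project the input onto the learned semantic code space, reconstruct
    it, and classify the reconstruction."""
    z = encoder(x)            # encode the (possibly perturbed) input
    x_rec = generator(z)      # reconstruct: non-semantic perturbations
                              # are filtered out by the latent bottleneck
    return classifier(x_rec)  # classify the denoised reconstruction
```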
103
Pushing the limits of capsule networkshttps://pdfs.semanticscholar.org/de7f/27677ae04bf1f09c223f68cea71af90be3d4.pdfPrinceton UniversityCapsNetUSAConvolutional neural networks use pooling and other downscaling operations to maintain translational invariance for detection of features, but in their architecture they do not explicitly maintain a representation of the locations of the features relative to each other. This means they do not represent two instances of the same object in different orientations the same way, like humans do, and so training them often requires extensive data augmentation and exceedingly deep networks.
104
Symmetric Rectified Linear Units for Fully Connected Deep Modelshttps://link.springer.com/chapter/10.1007/978-3-319-99247-1_26Renmin University of Chinaactivation functionKSEM18ChinaThe Rectified Linear Unit (ReLU) is one of the key factors behind the success of deep learning models; it has been shown that deep networks can be trained efficiently using ReLU without pre-training. In this paper, we compare and analyze various ReLU variants in fully-connected deep neural networks. We test ReLU, LReLU, ELU, SELU, mReLU and vReLU on two popular datasets: MNIST and Fashion-MNIST. We find that vReLU, a symmetric ReLU variant, shows promising results in most experiments. Fully-connected networks (FCN) with vReLU activation are able to achieve higher accuracy, with a relative improvement in test error rate of 39.9% over ReLU on MNIST and a relative improvement of 6.3% over ReLU on Fashion-MNIST.
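For reference, here are minimal definitions of some of the compared activations. ReLU, LReLU, and ELU are standard; vReLU is written here as the V-shaped absolute value, a plausible reading of "symmetric ReLU variant" that the abstract itself does not pin down (mReLU is omitted for the same reason).

```python
import torch

def relu(x):             # max(0, x)
    return torch.clamp(x, min=0)

def lrelu(x, a=0.01):    # leaky ReLU: small slope for x < 0
    return torch.where(x > 0, x, a * x)

def elu(x, a=1.0):       # exponential linear unit
    return torch.where(x > 0, x, a * (torch.exp(x) - 1))

def vrelu(x):            # "V-shaped" symmetric ReLU (assumed |x| here)
    return torch.abs(x)
```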
105
DEFRAG: Deep Euclidean Feature Representations through Adaptation on the Grassmann Manifold
https://arxiv.org/abs/1806.07688Rochester Institute of Technologyclustering
CVPR18 workshop
USAWe propose a novel technique for training deep networks with the objective of obtaining feature representations that exist in a Euclidean space and exhibit strong clustering behavior. Our desired feature representations have three traits: they can be compared using a standard Euclidean distance metric, samples from the same class are tightly clustered, and samples from different classes are well separated. However, most deep networks do not enforce such feature representations. The DEFRAG training technique consists of two steps: first, good feature clustering behavior is encouraged through an auxiliary loss function based on the Silhouette clustering metric; then the feature space is retracted onto a Grassmann manifold to ensure that the L_2 norm forms a similarity metric. The DEFRAG technique achieves state-of-the-art results on standard classification datasets using a relatively small network architecture with significantly fewer parameters than many standard networks.
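A minimal sketch of the first DEFRAG step, a silhouette-style auxiliary loss on a batch of features, is below. It simplifies the silhouette's "nearest other cluster" term to all other-class points and stands in for the Grassmann retraction with plain L2 normalization, so read it as an illustration of the idea rather than the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def silhouette_auxiliary_loss(features, labels):
    """Silhouette-style clustering penalty on a batch of feature vectors
    (illustrative; the paper's formulation may differ). Features are
    L2-normalized first, a simple stand-in for the retraction step."""
    z = F.normalize(features, dim=1)          # unit-norm embeddings
    d = torch.cdist(z, z)                     # pairwise Euclidean distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    eye = torch.eye(len(z), dtype=torch.bool, device=z.device)
    # a(i): mean distance to same-class points; b(i): to other-class points
    a = (d * (same & ~eye)).sum(1) / (same & ~eye).sum(1).clamp(min=1)
    b = (d * ~same).sum(1) / (~same).sum(1).clamp(min=1)
    s = (b - a) / torch.maximum(a, b).clamp(min=1e-8)
    return 1.0 - s.mean()                     # small when clusters are tight
```

In training, such a term would be added to the task loss with a weighting coefficient, encouraging tight, well-separated clusters in the feature space.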