NIPS2017 タイトル＆概要訳一覧

	A	B	C	D	E	F
1		URL	Title	Title訳	Abstract	Abstract訳
2	1	https://papers.nips.cc//paper/6606-wider-and-deeper-cheaper-and-faster-tensorized-lstms-for-sequence-learning	Wider and Deeper, Cheaper and Faster: Tensorized LSTMs for Sequence Learning	より広く、より深く、より安く、より速く: シーケンス学習のためのテンソル化 LSTM	Long Short-Term Memory (LSTM) is a popular approach to boosting the ability of Recurrent Neural Networks to store longer term temporal information. The capacity of an LSTM network can be increased by widening and adding layers. However, usually the former introduces additional parameters, while the latter increases the runtime. As an alternative we propose the Tensorized LSTM in which the hidden states are represented by tensors and updated via a cross-layer convolution. By increasing the tensor size, the network can be widened efficiently without additional parameters since the parameters are shared across different locations in the tensor; by delaying the output, the network can be deepened implicitly with little additional runtime since deep computations for each timestep are merged into temporal computations of the sequence. Experiments conducted on five challenging sequence learning tasks show the potential of the proposed model.	Long Short-Term Memory (LSTM) は、リカレントニューラルネットワークの長期の時間情報を保存する能力を高めるための一般的なアプローチです。 LSTM ネットワークの容量は、層を広げたり追加したりすることで増加できます。ただし、通常、前者では追加のパラメーターが導入され、後者では実行時間が増加します。代替案として、隠れ状態がテンソルで表され、層間畳み込みを介して更新される Tensorized LSTM を提案します。テンソルのサイズを大きくすると、パラメータがテンソル内の異なる場所で共有されるため、パラメータを追加することなくネットワークを効率的に拡張できます。出力を遅延させることで、各タイムステップの深い計算がシーケンスの時間的な計算にマージされるため、追加のランタイムをほとんど使用せずにネットワークを暗黙的に深くすることができます。 5 つの困難なシーケンス学習タスクに対して行われた実験は、提案されたモデルの可能性を示しています。
3	2	https://papers.nips.cc//paper/6607-concentration-of-multilinear-functions-of-the-ising-model-with-applications-to-network-data	Concentration of Multilinear Functions of the Ising Model with Applications to Network Data	イジングモデルの多線形関数の集中化とネットワークデータへの応用	We prove near-tight concentration of measure for polynomial functions of the Ising model, under high temperature, improving the radius of concentration guaranteed by known results by polynomial factors in the dimension (i.e.~the number of nodes in the Ising model). We show that our results are optimal up to logarithmic factors in the dimension. We obtain our results by extending and strengthening the exchangeable-pairs approach used to prove concentration of measure in this setting by Chatterjee. We demonstrate the efficacy of such functions as statistics for testing the strength of interactions in social networks in both synthetic and real world data.	我々は、高温下でイジングモデルの多項式関数の測定値がほぼ厳密に集中していることを証明し、次元（つまり、イジングモデルのノードの数）の多項式因数によって既知の結果によって保証される集中半径を改善します。結果が次元の対数因数まで最適であることを示します。我々は、Chatterjee がこの設定における測定の集中を証明するために使用した交換可能ペアのアプローチを拡張および強化することによって結果を取得します。私たちは、合成データと現実世界のデータの両方で、ソーシャルネットワークにおけるインタラクションの強さをテストするための統計などの関数の有効性を実証します。
4	3	https://papers.nips.cc//paper/6608-deep-subspace-clustering-networks	Deep Subspace Clustering Networks	深亜空間クラスタリングネットワーク	We present a novel deep neural network architecture for unsupervised subspace clustering. This architecture is built upon deep auto-encoders, which non-linearly map the input data into a latent space. Our key idea is to introduce a novel self-expressive layer between the encoder and the decoder to mimic the ""self-expressiveness"" property that has proven effective in traditional subspace clustering. Being differentiable, our new self-expressive layer provides a simple but effective way to learn pairwise affinities between all data points through a standard back-propagation procedure. Being nonlinear, our neural-network based method is able to cluster data points having complex (often nonlinear) structures. We further propose pre-training and fine-tuning strategies that let us effectively learn the parameters of our subspace clustering networks. Our experiments show that the proposed method significantly outperforms the state-of-the-art unsupervised subspace clustering methods.	教師なし部分空間クラスタリングのための新しいディープニューラルネットワークアーキテクチャを紹介します。このアーキテクチャは、入力データを潜在空間に非線形にマッピングするディープオートエンコーダーに基づいて構築されています。私たちの重要なアイデアは、エンコーダとデコーダの間に新しい自己表現層を導入して、従来の部分空間クラスタリングで効果的であることが証明されている「自己表現力」特性を模倣することです。微分可能であるため、新しい自己表現層は、標準的な逆伝播手順を通じてすべてのデータポイント間のペアごとの類似性を学習するためのシンプルかつ効果的な方法を提供します。非線形であるため、ニューラルネットワークベースの方法は、複雑な (多くの場合非線形) 構造を持つデータポイントをクラスター化できます。さらに、部分空間クラスタリングネットワークのパラメーターを効果的に学習できる事前トレーニングおよび微調整戦略を提案します。私たちの実験は、提案された方法が最先端の教師なし部分空間クラスタリング方法よりも大幅に優れていることを示しています。
5	4	https://papers.nips.cc//paper/6609-attentional-pooling-for-action-recognition	Attentional Pooling for Action Recognition	アクション認識のための注意プーリング	We introduce a simple yet surprisingly powerful model to incorporate attention in action recognition and human object interaction tasks. Our proposed attention module can be trained with or without extra supervision, and gives a sizable boost in accuracy while keeping the network size and computational cost nearly the same. It leads to significant improvements over state of the art base architecture on three standard action recognition benchmarks across still images and videos, and establishes new state of the art on MPII (12.5% relative improvement) and HMDB (RGB) datasets. We also perform an extensive analysis of our attention module both empirically and analytically. In terms of the latter, we introduce a novel derivation of bottom-up and top-down attention as low-rank approximations of bilinear pooling methods (typically used for fine-grained classification). From this perspective, our attention formulation suggests a novel characterization of action recognition as a fine-grained recognition problem.	アクション認識および人間オブジェクトのインタラクションタスクに注意を組み込むための、シンプルでありながら驚くほど強力なモデルを紹介します。私たちが提案するアテンションモジュールは、追加の監視の有無にかかわらずトレーニングでき、ネットワークサイズと計算コストをほぼ同じに保ちながら、精度を大幅に向上させます。これにより、静止画像とビデオにわたる 3 つの標準的なアクション認識ベンチマークにおける最先端のベースアーキテクチャに対する大幅な改善がもたらされ、MPII (12.5% の相対的改善) および HMDB (RGB) データセットに関する新しい最先端の技術が確立されました。また、経験的および分析的にアテンションモジュールの広範な分析を実行します。後者に関しては、双線形プーリング法 (通常は詳細な分類に使用される) の低ランク近似として、ボトムアップとトップダウンの注目の新しい導出を導入します。この観点から、私たちの注意の定式化は、きめの細かい認識問題としてのアクション認識の新しい特徴付けを示唆しています。
6	5	https://papers.nips.cc//paper/6610-on-the-consistency-of-quick-shift	On the Consistency of Quick Shift	クイックシフトの一貫性について	Quick Shift is a popular mode-seeking and clustering algorithm. We present finite sample statistical consistency guarantees for Quick Shift on mode and cluster recovery under mild distributional assumptions. We then apply our results to construct a consistent modal regression algorithm.	Quick Shift は、一般的なモードシークおよびクラスタリングアルゴリズムです。穏やかな分布仮定の下で、Quick Shift オンモードとクラスター回復に対する有限サンプルの統計的一貫性保証を示します。次に、結果を適用して一貫したモーダル回帰アルゴリズムを構築します。
7	6	https://papers.nips.cc//paper/6611-breaking-the-nonsmooth-barrier-a-scalable-parallel-method-for-composite-optimization	Breaking the Nonsmooth Barrier: A Scalable Parallel Method for Composite Optimization	非スムーズな障壁を打ち破る: 複合最適化のためのスケーラブルな並列手法	Due to their simplicity and excellent performance, parallel asynchronous variants of stochastic gradient descent have become popular methods to solve a wide range of large-scale optimization problems on multi-core architectures. Yet, despite their practical success, support for nonsmooth objectives is still lacking, making them unsuitable for many problems of interest in machine learning, such as the Lasso, group Lasso or empirical risk minimization with convex constraints. In this work, we propose and analyze ProxASAGA, a fully asynchronous sparse method inspired by SAGA, a variance reduced incremental gradient algorithm. The proposed method is easy to implement and significantly outperforms the state of the art on several nonsmooth, large-scale problems. We prove that our method achieves a theoretical linear speedup with respect to the sequential version under assumptions on the sparsity of gradients and block-separability of the proximal term. Empirical benchmarks on a multi-core architecture illustrate practical speedups of up to 12x on a 20-core machine.	確率的勾配降下の並列非同期バリアントは、そのシンプルさと優れたパフォーマンスにより、マルチコアアーキテクチャ上で広範囲にわたる大規模な最適化問題を解決するための一般的な方法となっています。しかし、実際には成功しているにもかかわらず、滑らかでない目標に対するサポートがまだ不足しているため、なげなわ、グループなげなわ、凸型制約による経験的リスクの最小化など、機械学習で関心のある多くの問題には適していません。この研究では、分散を低減した増分勾配アルゴリズムである SAGA からインスピレーションを得た完全非同期スパース手法である ProxASAGA を提案し、分析します。提案された方法は実装が簡単で、いくつかの滑らかでない大規模な問題に関して最先端技術を大幅に上回ります。勾配のスパース性と近位項のブロック分離性に関する仮定の下で、私たちの方法が逐次バージョンに関して理論的な線形高速化を達成することを証明します。マルチコアアーキテクチャの実証ベンチマークでは、20 コアマシンで最大 12 倍の実質的な高速化が示されています。
8	7	https://papers.nips.cc//paper/6612-dual-agent-gans-for-photorealistic-and-identity-preserving-profile-face-synthesis	Dual-Agent GANs for Photorealistic and Identity Preserving Profile Face Synthesis	フォトリアリスティックでアイデンティティを保持したプロファイル顔合成のためのデュアルエージェント GAN	Synthesizing realistic profile faces is promising for more efficiently training deep pose-invariant models for large-scale unconstrained face recognition, by populating samples with extreme poses and avoiding tedious annotations. However, learning from synthetic faces may not achieve the desired performance due to the discrepancy between distributions of the synthetic and real face images. To narrow this gap, we propose a Dual-Agent Generative Adversarial Network (DA-GAN) model, which can improve the realism of a face simulator's output using unlabeled real faces, while preserving the identity information during the realism refinement. The dual agents are specifically designed for distinguishing real v.s. fake and identities simultaneously. In particular, we employ an off-the-shelf 3D face model as a simulator to generate profile face images with varying poses. DA-GAN leverages a fully convolutional network as the generator to generate high-resolution images and an auto-encoder as the discriminator with the dual agents. Besides the novel architecture, we make several key modifications to the standard GAN to preserve pose and texture, preserve identity and stabilize training process: (i) a pose perception loss; (ii) an identity perception loss; (iii) an adversarial loss with a boundary equilibrium regularization term. Experimental results show that DA-GAN not only presents compelling perceptual results but also significantly outperforms state-of-the-arts on the large-scale and challenging NIST IJB-A unconstrained face recognition benchmark. In addition, the proposed DA-GAN is also promising as a new approach for solving generic transfer learning problems more effectively.	リアルな横顔の顔を合成すると、サンプルに極端なポーズを取り込み、面倒な注釈を回避することで、大規模な制約のない顔認識用のディープポーズ不変モデルをより効率的にトレーニングできることが期待されます。ただし、合成顔からの学習では、合成顔画像と実際の顔画像の分布の不一致により、望ましいパフォーマンスが得られない可能性があります。このギャップを狭めるために、私たちはデュアルエージェント敵対生成ネットワーク (DA-GAN) モデルを提案します。このモデルは、リアリズムの改良中にアイデンティティ情報を維持しながら、ラベルのない実際の顔を使用して顔シミュレーターの出力のリアリズムを向上させることができます。デュアルエージェントは、本物のエージェントと本物のエージェントを区別するために特別に設計されています。偽物とアイデンティティを同時に。特に、既製の 3D 顔モデルをシミュレータとして使用し、さまざまなポーズの横顔画像を生成します。 DA-GAN は、高解像度画像を生成するジェネレータとして完全な畳み込みネットワークを利用し、デュアルエージェントによる識別器としてオートエンコーダを利用します。新しいアーキテクチャに加えて、姿勢とテクスチャを保持し、アイデンティティを保持し、トレーニングプロセスを安定させるために、標準 GAN にいくつかの重要な変更を加えています。(i) 姿勢知覚の損失。 (ii) アイデンティティ認識の喪失。 (iii) 境界均衡正則化項を伴う敵対的損失。実験結果は、DA-GAN が説得力のある知覚結果を提示するだけでなく、大規模で困難な NIST IJB-A 制約のない顔認識ベンチマークにおいて最先端技術を大幅に上回るパフォーマンスを示していることを示しています。さらに、提案された DA-GAN は、一般的な転移学習問題をより効果的に解決するための新しいアプローチとしても有望です。
9	8	https://papers.nips.cc//paper/6613-dilated-recurrent-neural-networks	Dilated Recurrent Neural Networks	拡張リカレントニューラルネットワーク	Learning with recurrent neural networks (RNNs) on long sequences is a notoriously difficult task. There are three major challenges: 1) complex dependencies, 2) vanishing and exploding gradients, and 3) efficient parallelization. In this paper, we introduce a simple yet effective RNN connection structure, the DilatedRNN, which simultaneously tackles all of these challenges. The proposed architecture is characterized by multi-resolution dilated recurrent skip connections and can be combined flexibly with diverse RNN cells. Moreover, the DilatedRNN reduces the number of parameters needed and enhances training efficiency significantly, while matching state-of-the-art performance (even with standard RNN cells) in tasks involving very long-term dependencies. To provide a theory-based quantification of the architecture's advantages, we introduce a memory capacity measure, the mean recurrent length, which is more suitable for RNNs with long skip connections than existing measures. We rigorously prove the advantages of the DilatedRNN over other recurrent neural architectures. The code for our method is publicly available at https://github.com/code-terminator/DilatedRNN.	リカレントニューラルネットワーク (RNN) を使用して長いシーケンスを学習することは、非常に難しいことで知られています。 1) 複雑な依存関係、2) 勾配の消失と爆発、3) 効率的な並列化という 3 つの大きな課題があります。このペーパーでは、これらすべての課題に同時に取り組む、シンプルかつ効果的な RNN 接続構造である DiratedRNN を紹介します。提案されたアーキテクチャは、多重解像度拡張リカレントスキップ接続を特徴とし、多様な RNN セルと柔軟に組み合わせることができます。さらに、DiratedRNN は必要なパラメーターの数を減らし、トレーニング効率を大幅に向上させながら、非常に長期的な依存関係を伴うタスクで最先端のパフォーマンス (標準 RNN セルを使用した場合でも) に匹敵します。アーキテクチャの利点を理論に基づいて定量化するために、メモリ容量の尺度である平均反復長を導入します。これは、既存の尺度よりも長いスキップ接続を持つ RNN に適しています。私たちは、他のリカレントニューラルアーキテクチャと比較した DiratedRNN の利点を厳密に証明しています。私たちのメソッドのコードは、https://github.com/code-terminator/DiratedRNN で公開されています。
10	9	https://papers.nips.cc//paper/6614-hunt-for-the-unique-stable-sparse-and-fast-feature-learning-on-graphs	Hunt For The Unique, Stable, Sparse And Fast Feature Learning On Graphs	グラフで学習する、ユニークで安定したスパースかつ高速な機能を探す	For the purpose of learning on graphs, we hunt for a graph feature representation that exhibit certain uniqueness, stability and sparsity properties while also being amenable to fast computation. This leads to the discovery of family of graph spectral distances (denoted as FGSD) and their based graph feature representations, which we prove to possess most of these desired properties. To both evaluate the quality of graph features produced by FGSD and demonstrate their utility, we apply them to the graph classification problem. Through extensive experiments, we show that a simple SVM based classification algorithm, driven with our powerful FGSD based graph features, significantly outperforms all the more sophisticated state-of-art algorithms on the unlabeled node datasets in terms of both accuracy and speed; it also yields very competitive results on the labeled datasets - despite the fact it does not utilize any node label information.	グラフで学習する目的で、特定の一意性、安定性、およびスパース性の特性を示しながら、高速計算に適したグラフ特徴表現を探します。これにより、グラフスペクトル距離のファミリー (FGSD として示される) とそれに基づくグラフ特徴表現の発見につながり、これらの望ましい特性のほとんどを備えていることが証明されます。 FGSD によって生成されたグラフ特徴の品質を評価し、その有用性を実証するために、それらをグラフ分類問題に適用します。広範な実験を通じて、強力な FGSD ベースのグラフ機能で駆動されるシンプルな SVM ベースの分類アルゴリズムが、精度と速度の両方の点で、ラベルなしノードデータセットに対するすべてのより洗練された最先端のアルゴリズムよりも大幅に優れていることがわかりました。また、ノードラベル情報をまったく利用していないにもかかわらず、ラベル付きデータセットで非常に競争力のある結果が得られます。
11	10	https://papers.nips.cc//paper/6615-scalable-generalized-linear-bandits-online-computation-and-hashing	Scalable Generalized Linear Bandits: Online Computation and Hashing	スケーラブルな一般化線形バンディット: オンライン計算とハッシュ	Generalized Linear Bandits (GLBs), a natural extension of the stochastic linear bandits, has been popular and successful in recent years. However, existing GLBs scale poorly with the number of rounds and the number of arms, limiting their utility in practice. This paper proposes new, scalable solutions to the GLB problem in two respects. First, unlike existing GLBs, whose per-time-step space and time complexity grow at least linearly with time $t$, we propose a new algorithm that performs online computations to enjoy a constant space and time complexity. At its heart is a novel Generalized Linear extension of the Online-to-confidence-set Conversion (GLOC method) that takes \emph{any} online learning algorithm and turns it into a GLB algorithm. As a special case, we apply GLOC to the online Newton step algorithm, which results in a low-regret GLB algorithm with much lower time and memory complexity than prior work. Second, for the case where the number $N$ of arms is very large, we propose new algorithms in which each next arm is selected via an inner product search. Such methods can be implemented via hashing algorithms (i.e., ``hash-amenable'') and result in a time complexity sublinear in $N$. While a Thompson sampling extension of GLOC is hash-amenable, its regret bound for $d$-dimensional arm sets scales with $d^{3/2}$, whereas GLOC's regret bound scales with $d$. Towards closing this gap, we propose a new hash-amenable algorithm whose regret bound scales with $d^{5/4}$. Finally, we propose a fast approximate hash-key computation (inner product) with a better accuracy than the state-of-the-art, which can be of independent interest. We conclude the paper with preliminary experimental results confirming the merits of our methods.	確率的線形バンディットの自然な拡張である一般化線形バンディット (GLB) は、近年人気があり、成功を収めています。しかし、既存の GLB は弾数やアームの数に応じて拡張性が低く、実際の実用性が制限されています。この文書では、GLB 問題に対する新しいスケーラブルな解決策を 2 つの点で提案します。まず、タイムステップごとの空間と時間の複雑さが時間 $t$ とともに少なくとも線形に増加する既存の GLB とは異なり、一定の空間と時間の複雑さを享受するためにオンライン計算を実行する新しいアルゴリズムを提案します。その中心となるのは、\emph{any} オンライン学習アルゴリズムを GLB アルゴリズムに変換する、オンラインから信頼セットへの変換 (GLOC メソッド) の新しい一般化線形拡張です。特別なケースとして、オンラインニュートンステップアルゴリズムに GLOC を適用します。これにより、以前の研究よりも時間とメモリの複雑さがはるかに少なく、後悔の少ない GLB アルゴリズムが得られます。第二に、アームの数 $N$ が非常に大きい場合のために、次の各アームが内積検索によって選択される新しいアルゴリズムを提案します。このような方法は、ハッシュアルゴリズム (つまり、「ハッシュに適した」アルゴリズム) を介して実装でき、結果として $N$ のサブリニアな時間計算量が得られます。 GLOC の Thompson サンプリング拡張はハッシュに適していますが、$d$ 次元のアームセットのリグレス限界は $d^{3/2}$ でスケールするのに対し、GLOC のリグロング限界は $d$ でスケールします。このギャップを埋めるために、我々はリグレスバウンドが $d^{5/4}$ に応じてスケールする新しいハッシュに適したアルゴリズムを提案します。最後に、最新技術よりも精度が高く、独立して興味深い可能性がある、高速な近似ハッシュキー計算 (内積) を提案します。この論文は、私たちの方法の利点を確認する予備的な実験結果で締めくくられています。
12	11	https://papers.nips.cc//paper/6616-probabilistic-models-for-integration-error-in-the-assessment-of-functional-cardiac-models	Probabilistic Models for Integration Error in the Assessment of Functional Cardiac Models	心臓機能モデルの評価における積分誤差の確率モデル	This paper studies the numerical computation of integrals, representing estimates or predictions, over the output $f(x)$ of a computational model with respect to a distribution $p(\mathrm{d}x)$ over uncertain inputs $x$ to the model. For the functional cardiac models that motivate this work, neither $f$ nor $p$ possess a closed-form expression and evaluation of either requires $\approx$ 100 CPU hours, precluding standard numerical integration methods. Our proposal is to treat integration as an estimation problem, with a joint model for both the a priori unknown function $f$ and the a priori unknown distribution $p$. The result is a posterior distribution over the integral that explicitly accounts for dual sources of numerical approximation error due to a severely limited computational budget. This construction is applied to account, in a statistically principled manner, for the impact of numerical errors that (at present) are confounding factors in functional cardiac model assessment.	この論文は、不確実な入力 $x$ にわたる分布 $p(\mathrm{d}x)$ に関する計算モデルの出力 $f(x)$ に対する推定または予測を表す積分の数値計算を研究します。モデル。この研究を動機付ける機能的心臓モデルの場合、$f$ も $p$ も閉じた形式の式を持たず、どちらの評価にも $\およそ$ 100 CPU 時間が必要となり、標準的な数値積分法が不可能になります。私たちの提案は、先験的未知関数 $f$ と先験的未知分布 $p$ の両方の結合モデルを使用して、積分を推定問題として扱うことです。結果は積分上の事後分布となり、計算予算が厳しく制限されているために生じる数値近似誤差の 2 つの原因を明確に説明します。この構造は、（現時点では）心臓機能モデルの評価における交絡因子となっている数値誤差の影響を統計的に原理的に説明するために適用されます。
13	12	https://papers.nips.cc//paper/6617-machine-learning-with-adversaries-byzantine-tolerant-gradient-descent	Machine Learning with Adversaries: Byzantine Tolerant Gradient Descent	敵対者による機械学習: ビザンチン耐性勾配降下法	We study the resilience to Byzantine failures of distributed implementations of Stochastic Gradient Descent (SGD). So far, distributed machine learning frameworks have largely ignored the possibility of failures, especially arbitrary (i.e., Byzantine) ones. Causes of failures include software bugs, network asynchrony, biases in local datasets, as well as attackers trying to compromise the entire system. Assuming a set of $n$ workers, up to $f$ being Byzantine, we ask how resilient can SGD be, without limiting the dimension, nor the size of the parameter space. We first show that no gradient aggregation rule based on a linear combination of the vectors proposed by the workers (i.e, current approaches) tolerates a single Byzantine failure. We then formulate a resilience property of the aggregation rule capturing the basic requirements to guarantee convergence despite $f$ Byzantine workers. We propose \emph{Krum}, an aggregation rule that satisfies our resilience property, which we argue is the first provably Byzantine-resilient algorithm for distributed SGD. We also report on experimental evaluations of Krum.	私たちは、確率的勾配降下法 (SGD) の分散実装のビザンチン障害に対する回復力を研究します。これまで、分散機械学習フレームワークは、失敗の可能性、特に恣意的な (ビザンチン的な) 失敗の可能性をほとんど無視してきました。障害の原因には、ソフトウェアのバグ、ネットワークの非同期、ローカルデータセットの偏り、システム全体を侵害しようとする攻撃者などが含まれます。 $n$ ワーカーのセット ($f$ までがビザンチンである) を仮定して、次元やパラメーター空間のサイズを制限することなく、SGD がどの程度回復力を持つことができるかを尋ねます。まず、ワーカーによって提案されたベクトルの線形結合に基づく勾配集計ルール (つまり、現在のアプローチ) は単一のビザンチン障害を許容できないことを示します。次に、$f$ ビザンチンワーカーにもかかわらず収束を保証するための基本要件を捉えた集計ルールの回復力特性を定式化します。私たちは、復元力特性を満たす集計ルール \emph{Krum} を提案します。これは、分散 SGD の証明された最初のビザンチン復元アルゴリズムであると主張します。また、Krum の実験評価についても報告します。
14	13	https://papers.nips.cc//paper/6618-dynamic-safe-interruptibility-for-decentralized-multi-agent-reinforcement-learning	Dynamic Safe Interruptibility for Decentralized Multi-Agent Reinforcement Learning	分散型マルチエージェント強化学習のための動的安全な中断機能	In reinforcement learning, agents learn by performing actions and observing their outcomes. Sometimes, it is desirable for a human operator to interrupt an agent in order to prevent dangerous situations from happening. Yet, as part of their learning process, agents may link these interruptions, that impact their reward, to specific states and deliberately avoid them. The situation is particularly challenging in a multi-agent context because agents might not only learn from their own past interruptions, but also from those of other agents. Orseau and Armstrong defined safe interruptibility for one learner, but their work does not naturally extend to multi-agent systems. This paper introduces dynamic safe interruptibility, an alternative definition more suited to decentralized learning problems, and studies this notion in two learning frameworks: joint action learners and independent learners. We give realistic sufficient conditions on the learning algorithm to enable dynamic safe interruptibility in the case of joint action learners, yet show that these conditions are not sufficient for independent learners. We show however that if agents can detect interruptions, it is possible to prune the observations to ensure dynamic safe interruptibility even for independent learners.	強化学習では、エージェントはアクションを実行し、その結果を観察することによって学習します。場合によっては、危険な状況の発生を防ぐために人間のオペレーターがエージェントを中断することが望ましい場合があります。しかし、エージェントは学習プロセスの一環として、報酬に影響を与えるこれらの中断を特定の状態に関連付け、意図的に回避する場合があります。エージェントは自分自身の過去の中断からだけでなく、他のエージェントの中断からも学習する可能性があるため、この状況は特にマルチエージェントのコンテキストでは困難です。 Orseau と Armstrong は 1 人の学習者に対する安全な中断可能性を定義しましたが、彼らの研究は自然にマルチエージェントシステムに拡張されるわけではありません。この論文では、分散学習の問題により適した代替定義である動的安全割り込み性を紹介し、共同行動学習者と独立学習者という 2 つの学習フレームワークでこの概念を研究します。共同行動学習者の場合に動的に安全な中断可能性を可能にするための現実的な十分条件を学習アルゴリズムに与えますが、これらの条件は独立した学習者にとっては十分ではないことを示します。ただし、エージェントが中断を検出できれば、独立した学習者であっても、動的に安全な中断可能性を確保するために観察を刈り取ることが可能であることを示します。
15	14	https://papers.nips.cc//paper/6619-interactive-submodular-bandit	Interactive Submodular Bandit	インタラクティブなサブモジュラーバンディット	In many machine learning applications, submodular functions have been used as a model for evaluating the utility or payoff of a set such as news items to recommend, sensors to deploy in a terrain, nodes to influence in a social network, to name a few. At the heart of all these applications is the assumption that the underlying utility/payoff function is known a priori, hence maximizing it is in principle possible. In real life situations, however, the utility function is not fully known in advance and can only be estimated via interactions. For instance, whether a user likes a movie or not can be reliably evaluated only after it was shown to her. Or, the range of influence of a user in a social network can be estimated only after she is selected to advertise the product. We model such problems as an interactive submodular bandit optimization, where in each round we receive a context (e.g., previously selected movies) and have to choose an action (e.g., propose a new movie). We then receive a noisy feedback about the utility of the action (e.g., ratings) which we model as a submodular function over the context-action space. We develop SM-UCB that efficiently trades off exploration (collecting more data) and exploration (proposing a good action given gathered data) and achieves a $O(\sqrt{T})$ regret bound after $T$ rounds of interaction. Given a bounded-RKHS norm kernel over the context-action-payoff space that governs the smoothness of the utility function, SM-UCB keeps an upper-confidence bound on the payoff function that allows it to asymptotically achieve no-regret. Finally, we evaluate our results on four concrete applications, including movie recommendation (on the MovieLense data set), news recommendation (on Yahoo! Webscope dataset), interactive influence maximization (on a subset of the Facebook network), and personalized data summarization (on Reuters Corpus). In all these applications, we observe that SM-UCB consistently outperforms the prior art.	多くの機械学習アプリケーションでは、サブモジュール関数は、いくつか例を挙げると、推奨するニュース項目、地形に展開するセンサー、ソーシャルネットワークに影響を与えるノードなどのセットの効用や利益を評価するためのモデルとして使用されています。これらすべてのアプリケーションの中心となるのは、基礎となる効用/利得関数が事前にわかっており、したがってそれを最大化することが原理的に可能であるという仮定です。ただし、現実の状況では、効用関数は事前に完全にはわかっておらず、相互作用を通じてのみ推定できます。たとえば、ユーザーが映画を好きかどうかは、映画を見せられた後にのみ確実に評価できます。あるいは、ソーシャルネットワークにおけるユーザーの影響範囲は、そのユーザーが製品の宣伝に選ばれた後にのみ推定できます。このような問題をインタラクティブなサブモジュールバンディット最適化としてモデル化します。各ラウンドでコンテキスト (たとえば、以前に選択した映画) を受け取り、アクション (たとえば、新しい映画の提案) を選択する必要があります。次に、アクションの有用性 (評価など) に関するノイズの多いフィードバックを受け取り、コンテキストアクション空間上のサブモジュール関数としてモデル化します。私たちは、探索 (より多くのデータを収集する) と探索 (収集されたデータを基に適切なアクションを提案する) を効率的にトレードオフし、$T$ 回のインタラクション後に $O(\sqrt{T})$ の後悔限界を達成する SM-UCB を開発します。ユーティリティ関数の滑らかさを制御するコンテキスト-アクション-ペイオフ空間上の有界 RKHS ノルムカーネルが与えられると、SM-UCB はペイオフ関数に対して信頼の上限を維持し、漸近的に後悔のない状態を達成できるようにします。最後に、映画の推奨 (MovieLense データセット上)、ニュースの推奨 (Yahoo! Webscope データセット上)、インタラクティブな影響力の最大化 (Facebook ネットワークのサブセット上)、およびパーソナライズされたデータの要約 (ロイターコーパスに掲載されています）。これらすべてのアプリケーションにおいて、SM-UCB が従来技術よりも一貫して優れていることが観察されています。
16	15	https://papers.nips.cc//paper/6620-learning-to-see-physics-via-visual-de-animation	Learning to See Physics via Visual De-animation	ビジュアルデアニメーションによる物理の見方の学習	We introduce a paradigm for understanding physical scenes without human annotations. At the core of our system is a physical world representation that is first recovered by a perception module and then utilized by physics and graphics engines. During training, the perception module and the generative models learn by visual de-animation --- interpreting and reconstructing the visual information stream. During testing, the system first recovers the physical world state, and then uses the generative models for reasoning and future prediction. Even more so than forward simulation, inverting a physics or graphics engine is a computationally hard problem; we overcome this challenge by using a convolutional inversion network. Our system quickly recognizes the physical world state from appearance and motion cues, and has the flexibility to incorporate both differentiable and non-differentiable physics and graphics engines. We evaluate our system on both synthetic and real datasets involving multiple physical scenes, and demonstrate that our system performs well on both physical state estimation and reasoning problems. We further show that the knowledge learned on the synthetic dataset generalizes to constrained real images.	人間による注釈なしで物理的なシーンを理解するためのパラダイムを紹介します。私たちのシステムの中核となるのは、最初に知覚モジュールによって回復され、次に物理エンジンとグラフィックエンジンによって利用される物理世界表現です。トレーニング中に、知覚モジュールと生成モデルは、視覚的なデアニメーション、つまり視覚情報ストリームの解釈と再構築によって学習します。テスト中、システムはまず物理世界の状態を回復し、次に生成モデルを推論と将来予測に使用します。フォワードシミュレーションよりもさらに、物理エンジンまたはグラフィックエンジンの反転は計算的に難しい問題です。私たちは、畳み込み反転ネットワークを使用することで、この課題を克服しました。私たちのシステムは、外観や動きの手がかりから物理世界の状態を迅速に認識し、微分可能および微分不可能な物理エンジンとグラフィックスエンジンを組み込む柔軟性を備えています。複数の物理シーンを含む合成データセットと実際のデータセットの両方でシステムを評価し、システムが物理状態の推定と推論の問題の両方で良好に機能することを実証します。さらに、合成データセットで学習された知識が制約付きの実画像に一般化されることを示します。
17	16	https://papers.nips.cc//paper/6621-label-efficient-learning-of-transferable-representations-acrosss-domains-and-tasks	Label Efficient Learning of Transferable Representations acrosss Domains and Tasks	ラベルのドメインとタスク間で転送可能な表現の効率的な学習	We propose a framework that learns a representation transferable across different domains and tasks in a data efficient manner. Our approach battles domain shift with a domain adversarial loss, and generalizes the embedding to novel task using a metric learning-based approach. Our model is simultaneously optimized on labeled source data and unlabeled or sparsely labeled data in the target domain. Our method shows compelling results on novel classes within a new domain even when only a few labeled examples per class are available, outperforming the prevalent fine-tuning approach. In addition, we demonstrate the effectiveness of our framework on the transfer learning task from image object recognition to video action recognition.	私たちは、データ効率の高い方法で、さまざまなドメインやタスク間で転送可能な表現を学習するフレームワークを提案します。私たちのアプローチは、ドメイン敵対的損失によるドメインシフトと闘い、メトリクス学習ベースのアプローチを使用して新しいタスクへの埋め込みを一般化します。私たちのモデルは、ラベル付きソースデータと、ターゲットドメイン内のラベルなしまたはまばらにラベル付けされたデータに対して同時に最適化されます。私たちの方法は、クラスごとに少数のラベル付きサンプルしか利用できない場合でも、新しいドメイン内の新しいクラスに関して説得力のある結果を示し、一般的な微調整アプローチを上回ります。さらに、画像オブジェクト認識からビデオアクション認識への転移学習タスクにおけるフレームワークの有効性を実証します。
18	17	https://papers.nips.cc//paper/6622-decoding-with-value-networks-for-neural-machine-translation	Decoding with Value Networks for Neural Machine Translation	ニューラル機械翻訳のためのバリューネットワークによるデコード	Neural Machine Translation (NMT) has become a popular technology in recent years, and beam search is its de facto decoding method due to the shrunk search space and reduced computational complexity. However, since it only searches for local optima at each time step through one-step forward looking, it usually cannot output the best target sentence. Inspired by the success and methodology of AlphaGo, in this paper we propose using a prediction network to improve beam search, which takes the source sentence $x$, the currently available decoding output $y_1,\cdots, y_{t-1}$ and a candidate word $w$ at step $t$ as inputs and predicts the long-term value (e.g., BLEU score) of the partial target sentence if it is completed by the NMT model. Following the practice in reinforcement learning, we call this prediction network \emph{value network}. Specifically, we propose a recurrent structure for the value network, and train its parameters from bilingual data. During the test time, when choosing a word $w$ for decoding, we consider both its conditional probability given by the NMT model and its long-term value predicted by the value network. Experiments show that such an approach can significantly improve the translation accuracy on several translation tasks.	ニューラル機械翻訳 (NMT) は近年人気のテクノロジーとなっており、検索空間が縮小され計算量が軽減されるため、ビーム検索が事実上のデコード方法となっています。ただし、1 ステップ先読みで各タイムステップの局所最適値を探索するだけであるため、通常は最適な目的文を出力できません。 AlphaGo の成功と方法論に触発されて、この論文では、ソース文 $x$、現在利用可能なデコード出力 $y_1,\cdots, y_{t-1}$ を取得するビーム検索を改善するために予測ネットワークを使用することを提案します。ステップ $t$ の候補単語 $w$ を入力として、部分ターゲット文が NMT モデルによって完成された場合、その長期値 (BLEU スコアなど) を予測します。強化学習の実践に従って、この予測ネットワークを \emph{値ネットワーク} と呼びます。具体的には、値ネットワークの再帰構造を提案し、バイリンガルデータからそのパラメーターをトレーニングします。テスト中に、デコードする単語 $w$ を選択するとき、NMT モデルによって与えられるその条件付き確率と、値ネットワークによって予測されるその長期値の両方を考慮します。実験によると、このようなアプローチにより、いくつかの翻訳タスクで翻訳精度が大幅に向上することが示されています。
19	18	https://papers.nips.cc//paper/6623-parametric-simplex-method-for-sparse-learning	Parametric Simplex Method for Sparse Learning	スパース学習のためのパラメトリックシンプレックス法	High dimensional sparse learning has imposed a great computational challenge to large scale data analysis. In this paper, we investiage a broad class of sparse learning approaches formulated as linear programs parametrized by a {\em regularization factor}, and solve them by the parametric simplex method (PSM). PSM offers significant advantages over other competing methods: (1) PSM naturally obtains the complete solution path for all values of the regularization parameter; (2) PSM provides a high precision dual certificate stopping criterion; (3) PSM yields sparse solutions through very few iterations, and the solution sparsity significantly reduces the computational cost per iteration. Particularly, we demonstrate the superiority of PSM over various sparse learning approaches, including Dantzig selector for sparse linear regression, sparse support vector machine for sparse linear classification, and sparse differential network estimation. We then provide sufficient conditions under which PSM always outputs sparse solutions such that its computational performance can be significantly boosted. Thorough numerical experiments are provided to demonstrate the outstanding performance of the PSM method.	高次元のスパース学習は、大規模なデータ分析に大きな計算上の課題を課しています。この論文では、{\em 正則化係数} によってパラメータ化された線形プログラムとして定式化された広範なクラスのスパース学習アプローチを調査し、それらをパラメトリックシンプレックス法 (PSM) によって解決します。 PSM には、他の競合する方法に比べて、次のような大きな利点があります。(1) PSM は、正則化パラメータのすべての値に対する完全な解パスを自然に取得します。 (2) PSM は高精度の二重証明書停止基準を提供します。 (3) PSM は、非常に少ない反復で疎な解を生成し、解の疎性により反復あたりの計算コストが大幅に削減されます。特に、スパース線形回帰用の Dantzig セレクター、スパース線形分類用のスパースサポートベクターマシン、スパース差分ネットワーク推定など、さまざまなスパース学習アプローチに対する PSM の優位性を実証します。次に、PSM が常にスパース解を出力する十分な条件を提供し、その計算パフォーマンスを大幅に向上させることができます。 PSM 法の優れた性能を実証するために、徹底的な数値実験が提供されています。
20	19	https://papers.nips.cc//paper/6624-group-sparse-additive-machine	Group Sparse Additive Machine	グループスパース添加機	A family of learning algorithms generated from additive models have attracted much attention recently for their flexibility and interpretability in high dimensional data analysis. Among them, learning models with grouped variables have shown competitive performance for prediction and variable selection. However, the previous works mainly focus on the least squares regression problem, not the classification task. Thus, it is desired to design the new additive classification model with variable selection capability for many real-world applications which focus on high-dimensional data classification. To address this challenging problem, in this paper, we investigate the classification with group sparse additive models in reproducing kernel Hilbert spaces. A novel classification method, called as \emph{group sparse additive machine} (GroupSAM), is proposed to explore and utilize the structure information among the input variables. Generalization error bound is derived and proved by integrating the sample error analysis with empirical covering numbers and the hypothesis error estimate with the stepping stone technique. Our new bound shows that GroupSAM can achieve a satisfactory learning rate with polynomial decay. Experimental results on synthetic data and seven benchmark datasets consistently show the effectiveness of our new approach.	相加的モデルから生成された学習アルゴリズムのファミリーは、高次元データ分析における柔軟性と解釈可能性により、最近大きな注目を集めています。その中でも、グループ化された変数を含む学習モデルは、予測と変数選択において競合するパフォーマンスを示しています。ただし、これまでの研究では、分類タスクではなく、主に最小二乗回帰問題に焦点を当てていました。したがって、高次元データ分類に焦点を当てた多くの実世界のアプリケーション向けに、変数選択機能を備えた新しい加法的分類モデルを設計することが望まれています。この困難な問題に対処するために、この論文では、カーネルヒルベルト空間を再現する際のグループスパース加法モデルを使用した分類を調査します。 \emph{group sparse additive machine} (GroupSAM) と呼ばれる新しい分類方法が、入力変数間の構造情報を探索して利用するために提案されています。一般化誤差限界は、サンプル誤差分析と経験的カバー数、および仮説誤差推定と飛び石手法を統合することによって導出され、証明されます。新しい限界は、GroupSAM が多項式減衰で満足のいく学習率を達成できることを示しています。合成データと 7 つのベンチマークデータセットに関する実験結果は、私たちの新しいアプローチの有効性を一貫して示しています。
21	20	https://papers.nips.cc//paper/6625-uprooting-and-rerooting-higher-order-graphical-models	Uprooting and Rerooting Higher-Order Graphical Models	高次グラフィカルモデルのルートの削除と再ルート	The idea of uprooting and rerooting graphical models was introduced specifically for binary pairwise models by Weller (2016) as a way to transform a model to any of a whole equivalence class of related models, such that inference on any one model yields inference results for all others. This is very helpful since inference, or relevant bounds, may be much easier to obtain or more accurate for some model in the class. Here we introduce methods to extend the approach to models with higher-order potentials and develop theoretical insights. In particular, we show that the triplet-consistent polytope TRI is unique in being `universally rooted'. We demonstrate empirically that rerooting can significantly improve accuracy of methods of inference for higher-order models at negligible computational cost.	グラフィカルモデルの根絶と再ルートのアイデアは、モデルを関連モデルの同値クラス全体のいずれかに変換する方法として、特に Weller (2016) によってバイナリペアワイズモデルに導入されました。これにより、任意の 1 つのモデルに対する推論がすべてのモデルの推論結果が得られます。その他。クラス内の一部のモデルでは、推論または関連する境界を取得するのがはるかに簡単、またはより正確になる可能性があるため、これは非常に役立ちます。ここでは、高次のポテンシャルを持つモデルへのアプローチを拡張し、理論的な洞察を開発する方法を紹介します。特に、我々は、三重項一貫性のあるポリトープ TRI が「普遍的に根を張った」という点で独特であることを示します。私たちは、再ルート化により、無視できる計算コストで高次モデルの推論方法の精度が大幅に向上することを経験的に示します。
22	21	https://papers.nips.cc//paper/6626-the-unreasonable-effectiveness-of-structured-random-orthogonal-embeddings	The Unreasonable Effectiveness of Structured Random Orthogonal Embeddings	構造化ランダム直交埋め込みの不合理な効果	We examine a class of embeddings based on structured random matrices with orthogonal rows which can be applied in many machine learning applications including dimensionality reduction and kernel approximation. For both the Johnson-Lindenstrauss transform and the angular kernel, we show that we can select matrices yielding guaranteed improved performance in accuracy and/or speed compared to earlier methods. We introduce matrices with complex entries which give significant further accuracy improvement. We provide geometric and Markov chain-based perspectives to help understand the benefits, and empirical results which suggest that the approach is helpful in a wider range of applications.	次元削減やカーネル近似などの多くの機械学習アプリケーションに適用できる、直交行を持つ構造化ランダム行列に基づく埋め込みのクラスを調べます。ジョンソン・リンデンシュトラウス変換と角度カーネルの両方について、以前の方法と比較して精度および/または速度の向上が保証された行列を選択できることを示します。精度をさらに大幅に向上させる複雑なエントリを含む行列を導入します。私たちは、利点を理解するのに役立つ幾何学的およびマルコフ連鎖ベースの視点と、このアプローチがより広範囲のアプリケーションで役立つことを示唆する経験的結果を提供します。
23	22	https://papers.nips.cc//paper/6627-from-parity-to-preference-based-notions-of-fairness-in-classification	From Parity to Preference-based Notions of Fairness in Classification	パリティから優先ベースの分類における公平性の概念へ	The adoption of automated, data-driven decision making in an ever expanding range of applications has raised concerns about its potential unfairness towards certain social groups. In this context, a number of recent studies have focused on defining, detecting, and removing unfairness from data-driven decision systems. However, the existing notions of fairness, based on parity (equality) in treatment or outcomes for different social groups, tend to be quite stringent, limiting the overall decision making accuracy. In this paper, we draw inspiration from the fair-division and envy-freeness literature in economics and game theory and propose preference-based notions of fairness -- given the choice between various sets of decision treatments or outcomes, any group of users would collectively prefer its treatment or outcomes, regardless of the (dis)parity as compared to the other groups. Then, we introduce tractable proxies to design margin-based classifiers that satisfy these preference-based notions of fairness. Finally, we experiment with a variety of synthetic and real-world datasets and show that preference-based fairness allows for greater decision accuracy than parity-based fairness.	自動化されたデータ主導の意思決定がアプリケーションの範囲を拡大し続けているため、特定の社会グループに対する潜在的な不公平性についての懸念が生じています。これに関連して、最近の多くの研究は、データ駆動型意思決定システムの不公平性を定義、検出、除去することに焦点を当てています。しかし、さまざまな社会集団に対する治療や結果の平等（平等）に基づく既存の公平性の概念は非常に厳格である傾向があり、全体的な意思決定の正確性が制限されています。この論文では、経済学とゲーム理論における公平な分割と羨望のない自由度の文献からインスピレーションを得て、好みに基づいた公平性の概念を提案します。さまざまな決定処理または結果のセットの間で選択が与えられた場合、ユーザーのグループは集合的に、他のグループと比較した（格差）に関係なく、その治療や結果を好みます。次に、扱いやすいプロキシを導入して、これらの好みに基づく公平性の概念を満たすマージンに基づく分類器を設計します。最後に、さまざまな合成データセットと現実世界のデータセットを実験し、優先ベースの公平性の方がパリティベースの公平性よりも高い決定精度を実現できることを示します。
24	23	https://papers.nips.cc//paper/6628-inferring-generative-model-structure-with-static-analysis	Inferring Generative Model Structure with Static Analysis	静的解析による生成モデル構造の推測	Obtaining enough labeled data to robustly train complex discriminative models is a major bottleneck in the machine learning pipeline. A popular solution is combining multiple sources of weak supervision using generative models. The structure of these models affects the quality of the training labels, but is difficult to learn without any ground truth labels. We instead rely on weak supervision sources having some structure by virtue of being encoded programmatically. We present Coral, a paradigm that infers generative model structure by statically analyzing the code for these heuristics, thus significantly reducing the amount of data required to learn structure. We prove that Coral's sample complexity scales quasilinearly with the number of heuristics and number of relations identified, improving over the standard sample complexity, which is exponential in n for learning n-th degree relations. Empirically, Coral matches or outperforms traditional structure learning approaches by up to 3.81 F1 points. Using Coral to model dependencies instead of assuming independence results in better performance than a fully supervised model by 3.07 accuracy points when heuristics are used to label radiology data without ground truth labels.	複雑な識別モデルを堅牢にトレーニングするために十分なラベル付きデータを取得することは、機械学習パイプラインの大きなボトルネックです。一般的なソリューションは、生成モデルを使用して弱い監視の複数のソースを組み合わせることです。これらのモデルの構造はトレーニングラベルの品質に影響しますが、グラウンドトゥルースラベルなしで学習することは困難です。代わりに、プログラムでエンコードされているため、何らかの構造を持つ弱い監視ソースに依存します。我々は、これらのヒューリスティックのコードを静的に分析することで生成モデルの構造を推論し、構造の学習に必要なデータ量を大幅に削減するパラダイムである Coral を紹介します。 Coral のサンプルの複雑さは、ヒューリスティックの数と特定された関係の数に応じて準線形にスケールし、n 次の関係を学習する場合に n で指数関数的に増加する標準的なサンプルの複雑さを改善することを証明します。経験的には、Coral は従来の構造学習アプローチと同等か、それを最大 3.81 F1 ポイント上回るパフォーマンスを示します。 Coral を使用して独立性を仮定するのではなく依存関係をモデル化すると、ヒューリスティックを使用してグラウンドトゥルースラベルを使用せずに放射線医学データにラベルを付ける場合、完全教師ありモデルよりも 3.07 精度ポイント高いパフォーマンスが得られます。
25	24	https://papers.nips.cc//paper/6629-structured-embedding-models-for-grouped-data	Structured Embedding Models for Grouped Data	グループ化されたデータの構造化埋め込みモデル	Word embeddings are a powerful approach for analyzing language, and exponential family embeddings (EFE) extend them to other types of data. Here we develop structured exponential family embeddings (S-EFE), a method for discovering embeddings that vary across related groups of data. We study how the word usage of U.S. Congressional speeches varies across states and party affiliation, how words are used differently across sections of the ArXiv, and how the co-purchase patterns of groceries can vary across seasons. Key to the success of our method is that the groups share statistical information. We develop two sharing strategies: hierarchical modeling and amortization. We demonstrate the benefits of this approach in empirical studies of speeches, abstracts, and shopping baskets. We show how SEFE enables group-specific interpretation of word usage, and outperforms EFE in predicting held-out data.	単語埋め込みは言語を分析するための強力なアプローチであり、指数関数ファミリー埋め込み (EFE) はそれを他のタイプのデータに拡張します。ここでは、関連するデータグループ間で異なるエンベディングを発見するための方法である、構造化指数関数ファミリーエンベディング (S-EFE) を開発します。私たちは、米国議会の演説での単語の使用が州や党派によってどのように異なるか、ArXiv のセクション間で単語の使用がどのように異なるか、食料品の共同購入パターンが季節によってどのように変化するかを研究しています。私たちの方法の成功の鍵は、グループが統計情報を共有することです。私たちは、階層モデリングと償却という 2 つの共有戦略を開発します。私たちは、スピーチ、要約、買い物かごの実証的研究において、このアプローチの利点を実証します。 SEFE がどのように単語使用のグループ固有の解釈を可能にし、保持されたデータの予測において EFE よりも優れているのかを示します。
26	25	https://papers.nips.cc//paper/6630-a-linear-time-kernel-goodness-of-fit-test	A Linear-Time Kernel Goodness-of-Fit Test	線形時間カーネル適合度テスト	We propose a novel adaptive test of goodness-of-fit, with computational cost linear in the number of samples. We learn the test features that best indicate the differences between observed samples and a reference model, by minimizing the false negative rate. These features are constructed via Stein's method, meaning that it is not necessary to compute the normalising constant of the model. We analyse the asymptotic Bahadur efficiency of the new test, and prove that under a mean-shift alternative, our test always has greater relative efficiency than a previous linear-time kernel test, regardless of the choice of parameters for that test. In experiments, the performance of our method exceeds that of the earlier linear-time test, and matches or exceeds the power of a quadratic-time kernel test. In high dimensions and where model structure may be exploited, our goodness of fit test performs far better than a quadratic-time two-sample test based on the Maximum Mean Discrepancy, with samples drawn from the model.	計算コストがサンプル数に線形となる、新しい適合度の適応テストを提案します。偽陰性率を最小限に抑えることで、観察されたサンプルと参照モデルの間の違いを最もよく示すテスト特徴を学習します。これらの特徴は Stein の方法によって構築されます。つまり、モデルの正規化定数を計算する必要はありません。新しいテストの漸近バハードゥル効率を分析し、平均値シフト代替法の下では、テストのパラメーターの選択に関係なく、テストの相対効率が常に以前の線形時間カーネルテストよりも高いことを証明します。実験では、私たちの方法のパフォーマンスは以前の線形時間テストのパフォーマンスを上回り、二次時間カーネルテストの検出力と同等かそれを超えています。高次元でモデル構造が利用される可能性がある場合、適合度テストは、モデルから抽出されたサンプルを使用した最大平均乖離に基づく二次時間の 2 サンプルテストよりもはるかに優れたパフォーマンスを発揮します。
27	26	https://papers.nips.cc//paper/6631-cortical-microcircuits-as-gated-recurrent-neural-networks	Cortical microcircuits as gated-recurrent neural networks	ゲートリカレントニューラルネットワークとしての皮質微小回路	Cortical circuits exhibit intricate recurrent architectures that are remarkably similar across different brain areas. Such stereotyped structure suggests the existence of common computational principles. However, such principles have remained largely elusive. Inspired by gated-memory networks, namely long short-term memory networks (LSTMs), we introduce a recurrent neural network in which information is gated through inhibitory cells that are subtractive (subLSTM). We propose a natural mapping of subLSTMs onto known canonical excitatory-inhibitory cortical microcircuits. Our empirical evaluation across sequential image classification and language modelling tasks shows that subLSTM units can achieve similar performance to LSTM units. These results suggest that cortical circuits can be optimised to solve complex contextual problems and proposes a novel view on their computational function. Overall our work provides a step towards unifying recurrent networks as used in machine learning with their biological counterparts.	皮質回路は、さまざまな脳領域にわたって非常に類似した複雑な反復構造を示します。このようなステレオタイプの構造は、共通の計算原理の存在を示唆しています。しかし、そのような原則は依然としてほとんどとらえどころのないままです。ゲート型記憶ネットワーク、すなわち長短期記憶ネットワーク (LSTM) に触発されて、減算的な抑制性細胞 (subLSTM) を通じて情報がゲート制御されるリカレントニューラルネットワークを紹介します。我々は、既知の標準的な興奮性抑制性皮質微小回路上へのsubLSTMの自然なマッピングを提案します。逐次画像分類タスクと言語モデリングタスクにわたる経験的評価では、subLSTM ユニットが LSTM ユニットと同様のパフォーマンスを達成できることが示されています。これらの結果は、複雑な文脈上の問題を解決するために皮質回路を最適化できることを示唆しており、皮質回路の計算機能に関する新しい見解を提案しています。全体として、私たちの研究は、機械学習で使用されるリカレントネットワークと生物学的対応物を統合するための一歩を提供します。
28	27	https://papers.nips.cc//paper/6632-k-support-and-ordered-weighted-sparsity-for-overlapping-groups-hardness-and-algorithms	k-Support and Ordered Weighted Sparsity for Overlapping Groups: Hardness and Algorithms	重複グループの k サポートと順序重み付きスパース性: 硬度とアルゴリズム	The k-support and OWL norms generalize the l1 norm, providing better prediction accuracy and better handling of correlated variables. We study the norms obtained from extending the k-support norm and OWL norms to the setting in which there are overlapping groups. The resulting norms are in general NP-hard to compute, but they are tractable for certain collections of groups. To demonstrate this fact, we develop a dynamic program for the problem of projecting onto the set of vectors supported by a fixed number of groups. Our dynamic program utilizes tree decompositions and its complexity scales with the treewidth. This program can be converted to an extended formulation which, for the associated group structure, models the k-group support norms and an overlapping group variant of the ordered weighted l1 norm. Numerical results demonstrate the efficacy of the new penalties.	k-support および OWL ノルムは l1 ノルムを一般化し、より優れた予測精度と相関変数のより適切な処理を提供します。 k-support ノルムと OWL ノルムを重複するグループが存在する設定に拡張することで得られるノルムを研究します。結果として得られる規範は、一般に NP で計算するのが困難ですが、グループの特定のコレクションでは扱いやすくなります。この事実を実証するために、固定数のグループによってサポートされるベクトルのセットに投影する問題に対する動的プログラムを開発します。私たちの動的プログラムはツリー分解を利用しており、その複雑さはツリー幅に応じて変化します。このプログラムは、関連するグループ構造について、k グループのサポートノルムと順序付けされた重み付き l1 ノルムの重複するグループバリアントをモデル化する拡張定式化に変換できます。数値結果は、新しい罰則の有効性を示しています。
29	28	https://papers.nips.cc//paper/6633-a-simple-model-of-recognition-and-recall-memory	A simple model of recognition and recall memory	記憶の認識と想起の単純なモデル	We show that several striking differences in memory performance between recognition and recall tasks are explained by an ecological bias endemic in classic memory experiments - that such experiments universally involve more stimuli than retrieval cues. We show that while it is sensible to think of recall as simply retrieving items when probed with a cue - typically the item list itself - it is better to think of recognition as retrieving cues when probed with items. To test this theory, by manipulating the number of items and cues in a memory experiment, we show a crossover effect in memory performance within subjects such that recognition performance is superior to recall performance when the number of items is greater than the number of cues and recall performance is better than recognition when the converse holds. We build a simple computational model around this theory, using sampling to approximate an ideal Bayesian observer encoding and retrieving situational co-occurrence frequencies of stimuli and retrieval cues. This model robustly reproduces a number of dissociations in recognition and recall previously used to argue for dual-process accounts of declarative memory.	我々は、認識課題と想起課題の間の記憶能力におけるいくつかの顕著な違いが、古典的な記憶実験に特有の生態学的バイアスによって説明されること、つまりそのような実験には一般に、検索手がかりよりも多くの刺激が含まれることを示す。我々は、想起を、手がかり (通常は項目リスト自体) で調べたときに単に項目を取得するものと考えるのが賢明である一方、認識は項目で調べたときの手がかりを取得すると考える方がよいことを示します。この理論を検証するために、記憶実験で項目と手がかりの数を操作することにより、項目の数が手がかりの数よりも多い場合、認識性能が想起性能よりも優れているなど、被験者内の記憶能力におけるクロスオーバー効果を示します。逆が成り立つ場合、再現パフォーマンスは認識パフォーマンスよりも優れています。我々は、この理論に基づいて単純な計算モデルを構築し、サンプリングを使用して理想的なベイジアン観察者エンコーディングを近似し、刺激と検索キューの状況に応じた共起頻度を取得します。このモデルは、宣言的記憶の二重プロセス説明を議論するために以前に使用された、認識と想起における多くの解離を堅牢に再現します。
30	29	https://papers.nips.cc//paper/6634-on-structured-prediction-theory-with-calibrated-convex-surrogate-losses	On Structured Prediction Theory with Calibrated Convex Surrogate Losses	校正された凸サロゲート損失を使用した構造化予測理論について	We provide novel theoretical insights on structured prediction in the context of efficient convex surrogate loss minimization with consistency guarantees. For any task loss, we construct a convex surrogate that can be optimized via stochastic gradient descent and we prove tight bounds on the so-called ""calibration function"" relating the excess surrogate risk to the actual risk. In contrast to prior related work, we carefully monitor the effect of the exponential number of classes in the learning guarantees as well as on the optimization complexity. As an interesting consequence, we formalize the intuition that some task losses make learning harder than others, and that the classical 0-1 loss is ill-suited for structured prediction.	私たちは、一貫性を保証した効率的な凸型サロゲート損失の最小化という文脈での構造化予測に関する新しい理論的洞察を提供します。あらゆるタスク損失に対して、確率的勾配降下法によって最適化できる凸型サロゲートを構築し、過剰なサロゲートリスクを実際のリスクに関連付けるいわゆる「キャリブレーション関数」の厳しい限界を証明します。これまでの関連研究とは対照的に、学習の保証におけるクラスの指数関数的な数の影響と、最適化の複雑さへの影響を注意深く監視しています。興味深い結果として、一部のタスクの損失は他のタスクよりも学習を難しくし、古典的な 0-1 の損失は構造化された予測には不向きであるという直観を形式化しました。
31	30	https://papers.nips.cc//paper/6635-best-of-both-worlds-transferring-knowledge-from-discriminative-learning-to-a-generative-visual-dialog-model	Best of Both Worlds: Transferring Knowledge from Discriminative Learning to a Generative Visual Dialog Model	両方の長所: 識別学習から生成視覚対話モデルへの知識の伝達	We present a novel training framework for neural sequence models, particularly for grounded dialog generation. The standard training paradigm for these models is maximum likelihood estimation (MLE), or minimizing the cross-entropy of the human responses. Across a variety of domains, a recurring problem with MLE trained generative neural dialog models (G) is that they tend to produce 'safe' and generic responses like ""I don't know"", ""I can't tell""). In contrast, discriminative dialog models (D) that are trained to rank a list of candidate human responses outperform their generative counterparts; in terms of automatic metrics, diversity, and informativeness of the responses. However, D is not useful in practice since it can not be deployed to have real conversations with users. Our work aims to achieve the best of both worlds -- the practical usefulness of G and the strong performance of D -- via knowledge transfer from D to G. Our primary contribution is an end-to-end trainable generative visual dialog model, where G receives gradients from D as a perceptual (not adversarial) loss of the sequence sampled from G. We leverage the recently proposed Gumbel-Softmax (GS) approximation to the discrete distribution -- specifically, a RNN is augmented with a sequence of GS samplers, which coupled with the straight-through gradient estimator enables end-to-end differentiability. We also introduce a stronger encoder for visual dialog, and employ a self-attention mechanism for answer encoding along with a metric learning loss to aid D in better capturing semantic similarities in answer responses. Overall, our proposed model outperforms state-of-the-art on the VisDial dataset by a significant margin (2.67% on recall@10). The source code can be downloaded from https://github.com/jiasenlu/visDial.pytorch	ニューラルシーケンスモデル、特に根拠のあるダイアログ生成のための新しいトレーニングフレームワークを紹介します。これらのモデルの標準的なトレーニングパラダイムは、最尤推定 (MLE)、つまり人間の応答のクロスエントロピーを最小化することです。さまざまなドメインにわたって、MLE でトレーニングされた生成ニューラルダイアログモデル (G) で繰り返し発生する問題は、「わかりません」、「わかりません」などの「安全な」一般的な応答を生成する傾向があることです。 ")。対照的に、人間の応答候補のリストをランク付けするように訓練された識別対話モデル (D) は、対応する生成型対話モデルよりも優れたパフォーマンスを発揮します。自動メトリクス、多様性、応答の有益性の観点から。ただし、D はユーザーと実際に会話するために導入できないため、実際には役に立ちません。私たちの仕事は、D から G への知識の伝達を通じて、G の実用的な有用性と D の強力なパフォーマンスという両方の長所を達成することを目的としています。私たちの主な貢献は、エンドツーエンドのトレーニング可能な生成ビジュアルダイアログモデルです。 G は、G からサンプリングされたシーケンスの知覚的な (敵対的ではない) 損失として D から勾配を受け取ります。離散分布に対する最近提案されたガンベルソフトマックス (GS) 近似を活用します。特に、RNN は GS サンプラーのシーケンスで強化されます。、ストレートスルー勾配推定器と組み合わせることで、エンドツーエンドの微分可能性が可能になります。また、視覚的な対話のためのより強力なエンコーダを導入し、回答のエンコーディングにセルフアテンションメカニズムを採用し、メトリック学習損失とともに、D が回答の意味的類似性をより適切に捕捉できるようにします。全体として、私たちが提案したモデルは、VisDial データセットにおける最先端のモデルを大幅に上回っています (再現率 @10 で 2.67%)。ソースコードは https://github.com/jiasenlu/visDial.pytorch からダウンロードできます。
32	31	https://papers.nips.cc//paper/6636-maskrnn-instance-level-video-object-segmentation	MaskRNN: Instance Level Video Object Segmentation	MaskRNN: インスタンスレベルのビデオオブジェクトセグメンテーション	Instance level video object segmentation is an important technique for video editing and compression. To capture the temporal coherence, in this paper, we develop MaskRNN, a recurrent neural net approach which fuses in each frame the output of two deep nets for each object instance - a binary segmentation net providing a mask and a localization net providing a bounding box. Due to the recurrent component and the localization component, our method is able to take advantage of long-term temporal structures of the video data as well as rejecting outliers. We validate the proposed algorithm on three challenging benchmark datasets, the DAVIS-2016 dataset, the DAVIS-2017 dataset, and the Segtrack v2 dataset, achieving state-of-the-art performance on all of them.	インスタンスレベルのビデオオブジェクトのセグメンテーションは、ビデオの編集と圧縮にとって重要な技術です。時間的コヒーレンスを捉えるために、この論文では、MaskRNN を開発します。MaskRNN は、オブジェクトインスタンスごとに 2 つのディープネットの出力を各フレームで融合するリカレントニューラルネットアプローチです。マスクを提供するバイナリセグメンテーションネットとバウンディングボックスを提供するローカリゼーションネットです。。反復コンポーネントとローカリゼーションコンポーネントにより、私たちの方法は外れ値を排除するだけでなく、ビデオデータの長期の時間構造を利用することができます。私たちは、DAVIS-2016 データセット、DAVIS-2017 データセット、Segtrack v2 データセットという 3 つの困難なベンチマークデータセットで提案されたアルゴリズムを検証し、それらすべてで最先端のパフォーマンスを達成しました。
33	32	https://papers.nips.cc//paper/6637-gated-recurrent-convolution-neural-network-for-ocr	Gated Recurrent Convolution Neural Network for OCR	OCR 用のゲート型リカレント畳み込みニューラルネットワーク	Optical Character Recognition (OCR) aims to recognize text in natural images. Inspired by a recently proposed model for general image classification, Recurrent Convolution Neural Network (RCNN), we propose a new architecture named Gated RCNN (GRCNN) for solving this problem. Its critical component, Gated Recurrent Convolution Layer (GRCL), is constructed by adding a gate to the Recurrent Convolution Layer (RCL), the critical component of RCNN. The gate controls the context modulation in RCL and balances the feed-forward information and the recurrent information. In addition, an efficient Bidirectional Long Short-Term Memory (BLSTM) is built for sequence modeling. The GRCNN is combined with BLSTM to recognize text in natural images. The entire GRCNN-BLSTM model can be trained end-to-end. Experiments show that the proposed model outperforms existing methods on several benchmark datasets including the IIIT-5K, Street View Text (SVT) and ICDAR.	光学式文字認識 (OCR) は、自然画像内のテキストを認識することを目的としています。一般的な画像分類用に最近提案されたモデルであるリカレント畳み込みニューラルネットワーク (RCNN) に触発され、この問題を解決するために、ゲート付き RCNN (GRCNN) と呼ばれる新しいアーキテクチャを提案します。その重要なコンポーネントであるゲート再帰畳み込み層 (GRCL) は、RCNN の重要なコンポーネントである再帰畳み込み層 (RCL) にゲートを追加することによって構築されます。ゲートは RCL のコンテキスト変調を制御し、フィードフォワード情報と反復情報のバランスをとります。さらに、効率的な双方向長期短期メモリ (BLSTM) がシーケンスモデリング用に構築されています。 GRCNN は BLSTM と組み合わされて、自然画像内のテキストを認識します。 GRCNN-BLSTM モデル全体をエンドツーエンドでトレーニングできます。実験の結果、提案されたモデルは、IIIT-5K、ストリートビューテキスト (SVT)、ICDAR を含むいくつかのベンチマークデータセットで既存の方法よりも優れたパフォーマンスを発揮することが示されています。
34	33	https://papers.nips.cc//paper/6638-towards-accurate-binary-convolutional-neural-network	Towards Accurate Binary Convolutional Neural Network	正確なバイナリ畳み込みニューラルネットワークを目指して	We introduce a novel scheme to train binary convolutional neural networks (CNNs) -- CNNs with weights and activations constrained to \{-1,+1\} at run-time. It has been known that using binary weights and activations drastically reduce memory size and accesses, and can replace arithmetic operations with more efficient bitwise operations, leading to much faster test-time inference and lower power consumption. However, previous works on binarizing CNNs usually result in severe prediction accuracy degradation. In this paper, we address this issue with two major innovations: (1) approximating full-precision weights with the linear combination of multiple binary weight bases; (2) employing multiple binary activations to alleviate information loss. The implementation of the resulting binary CNN, denoted as ABC-Net, is shown to achieve much closer performance to its full-precision counterpart, and even reach the comparable prediction accuracy on ImageNet and forest trail datasets, given adequate binary weight bases and activations.	バイナリ畳み込みニューラルネットワーク (CNN)、つまり実行時に \{-1,+1\} に制約された重みとアクティベーションを持つ CNN をトレーニングするための新しいスキームを導入します。バイナリの重みとアクティベーションを使用すると、メモリサイズとアクセスが大幅に削減され、算術演算をより効率的なビット単位の演算に置き換えることができるため、テスト時間の推論が大幅に高速化され、消費電力が削減されることが知られています。ただし、CNN の 2 値化に関するこれまでの研究では、通常、予測精度が大幅に低下します。この論文では、2 つの主要な革新によってこの問題に対処します。(1) 複数のバイナリ重みベースの線形結合を使用して完全精度の重みを近似します。 (2) 情報損失を軽減するために複数のバイナリアクティベーションを採用します。 ABC-Net として示される、結果として得られるバイナリ CNN の実装は、完全精度の対応物にはるかに近いパフォーマンスを達成し、適切なバイナリ重みベースとアクティベーションが与えられた場合には、ImageNet および林道データセット上で同等の予測精度にさえ達することが示されています。
35	34	https://papers.nips.cc//paper/6639-semi-supervised-learning-for-optical-flow-with-generative-adversarial-networks	Semi-Supervised Learning for Optical Flow with Generative Adversarial Networks	敵対的生成ネットワークを使用したオプティカルフローの半教師あり学習	Convolutional neural networks (CNNs) have recently been applied to the optical flow estimation problem. As training the CNNs requires sufficiently large ground truth training data, existing approaches resort to synthetic, unrealistic datasets. On the other hand, unsupervised methods are capable of leveraging real-world videos for training where the ground truth flow fields are not available. These methods, however, rely on the fundamental assumptions of brightness constancy and spatial smoothness priors which do not hold near motion boundaries. In this paper, we propose to exploit unlabeled videos for semi-supervised learning of optical flow with a Generative Adversarial Network. Our key insight is that the adversarial loss can capture the structural patterns of flow warp errors without making explicit assumptions. Extensive experiments on benchmark datasets demonstrate that the proposed semi-supervised algorithm performs favorably against purely supervised and semi-supervised learning schemes.	最近、畳み込みニューラルネットワーク (CNN) がオプティカルフロー推定問題に適用されています。 CNN のトレーニングには十分な大規模なグラウンドトゥルーストレーニングデータが必要であるため、既存のアプローチは合成された非現実的なデータセットに頼っています。一方、教師なし手法では、グラウンドトゥルースフローフィールドが利用できないトレーニングに現実世界のビデオを活用できます。ただし、これらの方法は、動きの境界付近では保持されない、明るさの一定性と空間的滑らかさの事前分布という基本的な仮定に依存しています。この論文では、敵対的生成ネットワークを使用したオプティカルフローの半教師あり学習にラベルのないビデオを活用することを提案します。私たちの重要な洞察は、明示的な仮定をすることなく、敵対的損失によってフローワープエラーの構造パターンを捉えることができるということです。ベンチマークデータセットに関する広範な実験により、提案された半教師ありアルゴリズムが純粋教師あり学習スキームおよび半教師あり学習スキームに対して有利に機能することが実証されました。
36	35	https://papers.nips.cc//paper/6640-learning-a-multi-view-stereo-machine	Learning a Multi-View Stereo Machine	マルチビューステレオマシンの学習	We present a learnt system for multi-view stereopsis. In contrast to recent learning based methods for 3D reconstruction, we leverage the underlying 3D geometry of the problem through feature projection and unprojection along viewing rays. By formulating these operations in a differentiable manner, we are able to learn the system end-to-end for the task of metric 3D reconstruction. End-to-end learning allows us to jointly reason about shape priors while conforming to geometric constraints, enabling reconstruction from much fewer images (even a single image) than required by classical approaches as well as completion of unseen surfaces. We thoroughly evaluate our approach on the ShapeNet dataset and demonstrate the benefits over classical approaches and recent learning based methods.	多視点立体視のための学習されたシステムを紹介します。 3D 再構成のための最近の学習ベースの方法とは対照的に、視線に沿った特徴の投影と非投影を通じて問題の基礎となる 3D ジオメトリを活用します。これらの操作を微分可能な方法で定式化することにより、メトリック 3D 再構築のタスクのためにシステムをエンドツーエンドで学習することができます。エンドツーエンドの学習により、幾何学的制約に準拠しながら形状事前分布について共同で推論できるため、古典的なアプローチで必要とされるよりもはるかに少ない画像 (単一の画像であっても) からの再構成や、目に見えない表面の完成が可能になります。私たちは、ShapeNet データセットに対するアプローチを徹底的に評価し、従来のアプローチや最近の学習ベースの手法と比較した利点を実証します。
37	36	https://papers.nips.cc//paper/6641-phase-transitions-in-the-pooled-data-problem	Phase Transitions in the Pooled Data Problem	プールされたデータ問題における相遷移	In this paper, we study the {\em pooled data} problem of identifying the labels associated with a large collection of items, based on a sequence of pooled tests revealing the counts of each label within the pool. In the noiseless setting, we identify an exact asymptotic threshold on the required number of tests with optimal decoding, and prove a {\em phase transition} between complete success and complete failure. In addition, we present a novel {\em noisy} variation of the problem, and provide an information-theoretic framework for characterizing the required number of tests for general random noise models. Our results reveal that noise can make the problem considerably more difficult, with strict increases in the scaling laws even at low noise levels. Finally, we demonstrate similar behavior in an {\em approximate recovery} setting, where a given number of errors is allowed in the decoded labels.	このペーパーでは、プール内の各ラベルの数を明らかにする一連のプールされたテストに基づいて、アイテムの大規模なコレクションに関連付けられたラベルを識別するという {\em プールされたデータ} 問題を研究します。ノイズのない設定では、最適なデコードを使用して必要なテスト数で正確な漸近しきい値を特定し、完全な成功と完全な失敗の間の {\em 段階遷移} を証明します。さらに、この問題の新しい {\em noisy} バリエーションを提示し、一般的なランダムノイズモデルに必要なテスト数を特徴付けるための情報理論的フレームワークを提供します。私たちの結果は、低ノイズレベルでもスケーリング則が厳密に増加するため、ノイズが問題をかなり困難にする可能性があることを明らかにしています。最後に、{\em 近似回復} 設定で同様の動作を示します。この設定では、デコードされたラベルで指定された数のエラーが許可されます。
38	37	https://papers.nips.cc//paper/6642-universal-style-transfer-via-feature-transforms	Universal Style Transfer via Feature Transforms	フィーチャー変換によるユニバーサルスタイルの転送	Universal style transfer aims to transfer arbitrary visual styles to content images. Existing feed-forward based methods, while enjoying the inference efficiency, are mainly limited by inability of generalizing to unseen styles or compromised visual quality. In this paper, we present a simple yet effective method that tackles these limitations without training on any pre-defined styles. The key ingredient of our method is a pair of feature transforms, whitening and coloring, that are embedded to an image reconstruction network. The whitening and coloring transforms reflect direct matching of feature covariance of the content image to a given style image, which shares similar spirits with the optimization of Gram matrix based cost in neural style transfer. We demonstrate the effectiveness of our algorithm by generating high-quality stylized images with comparisons to a number of recent methods. We also analyze our method by visualizing the whitened features and synthesizing textures by simple feature coloring.	ユニバーサルスタイル転送は、任意のビジュアルスタイルをコンテンツ画像に転送することを目的としています。既存のフィードフォワードベースの手法は、推論効率を享受できますが、目に見えないスタイルや妥協したビジュアル品質に一般化できないことによって主に制限されています。このペーパーでは、事前定義されたスタイルのトレーニングを行わずにこれらの制限に対処する、シンプルかつ効果的な方法を紹介します。私たちの方法の重要な要素は、画像再構成ネットワークに埋め込まれた一対の特徴変換、ホワイトニングとカラーリングです。ホワイトニングとカラーリングの変換は、コンテンツ画像の特徴共分散と特定のスタイル画像の直接的なマッチングを反映しており、ニューラルスタイル転送におけるグラム行列ベースのコストの最適化と同様の精神を共有しています。多くの最近の方法と比較して、高品質の様式化された画像を生成することにより、アルゴリズムの有効性を実証します。また、白色化された特徴を視覚化し、単純な特徴の色付けによってテクスチャを合成することによって、この方法を分析します。
39	38	https://papers.nips.cc//paper/6643-on-the-model-shrinkage-effect-of-gamma-process-edge-partition-models	On the Model Shrinkage Effect of Gamma Process Edge Partition Models	ガンマ処理エッジパーティションモデルのモデル収縮効果について	The edge partition model (EPM) is a fundamental Bayesian nonparametric model for extracting an overlapping structure from binary matrix. The EPM adopts a gamma process ($\Gamma$P) prior to automatically shrink the number of active atoms. However, we empirically found that the model shrinkage of the EPM does not typically work appropriately and leads to an overfitted solution. An analysis of the expectation of the EPM's intensity function suggested that the gamma priors for the EPM hyperparameters disturb the model shrinkage effect of the internal $\Gamma$P. In order to ensure that the model shrinkage effect of the EPM works in an appropriate manner, we proposed two novel generative constructions of the EPM: CEPM incorporating constrained gamma priors, and DEPM incorporating Dirichlet priors instead of the gamma priors. Furthermore, all DEPM's model parameters including the infinite atoms of the $\Gamma$P prior could be marginalized out, and thus it was possible to derive a truly infinite DEPM (IDEPM) that can be efficiently inferred using a collapsed Gibbs sampler. We experimentally confirmed that the model shrinkage of the proposed models works well and that the IDEPM indicated state-of-the-art performance in generalization ability, link prediction accuracy, mixing efficiency, and convergence speed.	エッジ分割モデル (EPM) は、バイナリ行列から重複構造を抽出するための基本的なベイジアンノンパラメトリックモデルです。 EPM は、アクティブな原子の数を自動的に縮小する前に、ガンマプロセス ($\Gamma$P) を採用します。ただし、EPM のモデル縮小は通常は適切に機能せず、過剰適合したソリューションにつながることが経験的にわかりました。 EPM の強度関数の期待値の分析により、EPM ハイパーパラメーターのガンマ事前分布が内部 $\Gamma$P のモデル収縮効果を妨害することが示唆されました。 EPM のモデル収縮効果が適切に機能することを保証するために、我々は、EPM の 2 つの新しい生成構造を提案しました。制約付きガンマ事前分布を組み込んだ CEPM と、ガンマ事前分布の代わりにディリクレ事前分布を組み込んだ DEPM です。さらに、$\Gamma$P 事前の無限原子を含むすべての DEPM のモデルパラメーターを周辺化することができ、したがって、コラプスされた Gibbs サンプラーを使用して効率的に推論できる真の無限 DEPM (IDEPM) を導出することができました。提案されたモデルのモデル縮小がうまく機能し、IDEPM が汎化能力、リンク予測精度、混合効率、収束速度において最先端のパフォーマンスを示すことを実験的に確認しました。
40	39	https://papers.nips.cc//paper/6644-pose-guided-person-image-generation	Pose Guided Person Image Generation	ポーズガイド付き人物画像の生成	This paper proposes the novel Pose Guided Person Generation Network (PG$^2$) that allows to synthesize person images in arbitrary poses, based on an image of that person and a novel pose. Our generation framework PG$^2$ utilizes the pose information explicitly and consists of two key stages: pose integration and image refinement. In the first stage the condition image and the target pose are fed into a U-Net-like network to generate an initial but coarse image of the person with the target pose. The second stage then refines the initial and blurry result by training a U-Net-like generator in an adversarial way. Extensive experimental results on both 128$\times$64 re-identification images and 256$\times$256 fashion photos show that our model generates high-quality person images with convincing details.	本稿では、人物の画像と新しいポーズに基づいて、任意のポーズの人物画像を合成できる新しいポーズ誘導人物生成ネットワーク (PG$^2$) を提案します。私たちの生成フレームワーク PG$^2$ はポーズ情報を明示的に利用し、ポーズの統合と画像の洗練という 2 つの主要な段階で構成されます。第 1 段階では、条件画像とターゲットポーズが U-Net のようなネットワークに入力され、ターゲットポーズを持つ人物の初期ではあるが粗い画像が生成されます。次に、第 2 段階では、U-Net のようなジェネレーターを敵対的な方法でトレーニングすることにより、初期のぼやけた結果を洗練します。 128$\times$64 枚の再識別画像と 256$\times$256 枚のファッション写真の両方に関する広範な実験結果は、私たちのモデルが説得力のある詳細を備えた高品質の人物画像を生成することを示しています。
41	40	https://papers.nips.cc//paper/6645-inference-in-graphical-models-via-semidefinite-programming-hierarchies	Inference in Graphical Models via Semidefinite Programming Hierarchies	半明確なプログラミング階層を介したグラフィカルモデルの推論	Maximum A posteriori Probability (MAP) inference in graphical models amounts to solving a graph-structured combinatorial optimization problem. Popular inference algorithms such as belief propagation (BP) and generalized belief propagation (GBP) are intimately related to linear programming (LP) relaxation within the Sherali-Adams hierarchy. Despite the popularity of these algorithms, it is well understood that the Sum-of-Squares (SOS) hierarchy based on semidefinite programming (SDP) can provide superior guarantees. Unfortunately, SOS relaxations for a graph with $n$ vertices require solving an SDP with $n^{\Theta(d)}$ variables where $d$ is the degree in the hierarchy. In practice, for $d\ge 4$, this approach does not scale beyond a few tens of variables. In this paper, we propose binary SDP relaxations for MAP inference using the SOS hierarchy with two innovations focused on computational efficiency. Firstly, in analogy to BP and its variants, we only introduce decision variables corresponding to contiguous regions in the graphical model. Secondly, we solve the resulting SDP using a non-convex Burer-Monteiro style method, and develop a sequential rounding procedure. We demonstrate that the resulting algorithm can solve problems with tens of thousands of variables within minutes, and outperforms BP and GBP on practical problems such as image denoising and Ising spin glasses. Finally, for specific graph types, we establish a sufficient condition for the tightness of the proposed partial SOS relaxation.	グラフィカルモデルにおける最大事後確率 (MAP) 推論は、グラフ構造の組み合わせ最適化問題を解くことになります。信念伝播 (BP) や一般化信念伝播 (GBP) などの一般的な推論アルゴリズムは、シェラリ-アダムス階層内の線形計画法 (LP) 緩和と密接に関連しています。これらのアルゴリズムの人気にもかかわらず、半定値計画法 (SDP) に基づく二乗和 (SOS) 階層が優れた保証を提供できることはよく理解されています。残念ながら、$n$ 頂点を持つグラフの SOS 緩和には、$n^{\Theta(d)}$ 変数 ($d$ は階層の次数) を使用して SDP を解く必要があります。実際には、$d\ge 4$ の場合、このアプローチは数十の変数を超えて拡張できません。この論文では、計算効率に焦点を当てた 2 つの革新を備えた SOS 階層を使用した MAP 推論のためのバイナリ SDP 緩和を提案します。まず、BP とその変形と同様に、グラフィカルモデル内の連続領域に対応する決定変数のみを導入します。次に、非凸 Burer-Monteiro スタイルの手法を使用して結果の SDP を解き、逐次丸め手順を開発します。結果として得られたアルゴリズムが数分以内に数万の変数を伴う問題を解決でき、画像ノイズ除去やイジングスピングラスなどの実際的な問題では BP および GBP よりも優れたパフォーマンスを発揮することを実証します。最後に、特定のグラフタイプについて、提案されている部分的な SOS 緩和の緊密さのための十分条件を確立します。
42	41	https://papers.nips.cc//paper/6646-variable-importance-using-decision-trees	Variable Importance Using Decision Trees	デシジョンツリーを使用した変数重要度	Decision trees and random forests are well established models that not only offer good predictive performance, but also provide rich feature importance information. While practitioners often employ variable importance methods that rely on this impurity-based information, these methods remain poorly characterized from a theoretical perspective. We provide novel insights into the performance of these methods by deriving finite sample performance guarantees in a high-dimensional setting under various modeling assumptions. We further demonstrate the effectiveness of these impurity-based methods via an extensive set of simulations.	デシジョンツリーとランダムフォレストは確立されたモデルであり、優れた予測パフォーマンスを提供するだけでなく、豊富な特徴の重要性情報も提供します。実務者は、この不純物ベースの情報に依存する可変重要度手法を採用することがよくありますが、これらの手法は理論的な観点からは十分に特徴付けられていません。私たちは、さまざまなモデリング仮定の下で高次元の設定で有限サンプルのパフォーマンス保証を導き出すことで、これらのメソッドのパフォーマンスに関する新しい洞察を提供します。さらに、一連の広範なシミュレーションを通じて、これらの不純物ベースの手法の有効性を実証します。
43	42	https://papers.nips.cc//paper/6647-preventing-gradient-explosions-in-gated-recurrent-units	Preventing Gradient Explosions in Gated Recurrent Units	ゲート反復ユニットでの勾配爆発の防止	A gated recurrent unit (GRU) is a successful recurrent neural network architecture for time-series data. The GRU is typically trained using a gradient-based method, which is subject to the exploding gradient problem in which the gradient increases significantly. This problem is caused by an abrupt change in the dynamics of the GRU due to a small variation in the parameters. In this paper, we find a condition under which the dynamics of the GRU changes drastically and propose a learning method to address the exploding gradient problem. Our method constrains the dynamics of the GRU so that it does not drastically change. We evaluated our method in experiments on language modeling and polyphonic music modeling. Our experiments showed that our method can prevent the exploding gradient problem and improve modeling accuracy.	ゲート付きリカレントユニット (GRU) は、時系列データに対する成功したリカレントニューラルネットワークアーキテクチャです。 GRU は通常、勾配ベースの方法を使用してトレーニングされます。これは、勾配が大幅に増加する爆発勾配問題の影響を受けます。この問題は、パラメータのわずかな変動による GRU のダイナミクスの突然の変化によって引き起こされます。本論文では、GRU のダイナミクスが劇的に変化する条件を発見し、爆発勾配問題に対処するための学習方法を提案します。私たちの方法では、GRU のダイナミクスが大幅に変化しないように制限します。私たちは言語モデリングとポリフォニック音楽モデリングの実験でこの方法を評価しました。私たちの実験では、私たちの方法が勾配爆発の問題を回避し、モデリングの精度を向上できることが示されました。
44	43	https://papers.nips.cc//paper/6648-on-the-power-of-truncated-svd-for-general-high-rank-matrix-estimation-problems	On the Power of Truncated SVD for General High-rank Matrix Estimation Problems	一般的な高位行列推定問題に対する切り捨て SVD の威力について	We show that given an estimate $\widehat{\mat A}$ that is close to a general high-rank positive semi-definite (PSD) matrix $\mat A$ in spectral norm (i.e., $\\|\widehat{\mat A}-\mat A\\|_2 \leq \delta$), the simple truncated Singular Value Decomposition of $\widehat{\mat A}$ produces a multiplicative approximation of $\mat A$ in Frobenius norm. This observation leads to many interesting results on general high-rank matrix estimation problems: 1.High-rank matrix completion: we show that it is possible to recover a {general high-rank matrix} $\mat A$ up to $(1+\varepsilon)$ relative error in Frobenius norm from partial observations, with sample complexity independent of the spectral gap of $\mat A$. 2.High-rank matrix denoising: we design algorithms that recovers a matrix $\mat A$ with relative error in Frobenius norm from its noise-perturbed observations, without assuming $\mat A$ is exactly low-rank. 3.Low-dimensional estimation of high-dimensional covariance: given $N$ i.i.d.~samples of dimension $n$ from $\mathcal N_n(\mat 0,\mat A)$, we show that it is possible to estimate the covariance matrix $\mat A$ with relative error in Frobenius norm with $N\approx n$,improving over classical covariance estimation results which requires $N\approx n^2$.	与えられた推定値 $\widehat{\mat A}$ が、スペクトルノルムにおける一般的な高位正半定値 (PSD) 行列 $\mat A$ に近いことを示します (つまり $\\|\widehat{\ mat A}-\mat A\\|_2 \leq \delta$)、$\widehat{\mat A}$ の単純な切り捨て特異値分解は、フロベニウスノルムにおける $\mat A$ の乗法近似を生成します。この観察は、一般的な高位行列の推定問題に関する多くの興味深い結果につながります。 1.高位行列の補完: {一般的な高位行列} $\mat A$ を $(1 まで) 回復できることを示します。 +\varepsilon)$ 部分観測からのフロベニウスノルムの相対誤差。サンプルの複雑さは $\mat A$ のスペクトルギャップに依存しません。 2.高ランク行列のノイズ除去: $\mat A$ がまさに低ランクであると仮定することなく、ノイズ摂動観測からフロベニウスノルムの相対誤差を持つ行列 $\mat A$ を復元するアルゴリズムを設計します。 3.高次元共分散の低次元推定: $\mathcal N_n(\mat 0,\mat A)$ から次元 $n$ の $N$ i.i.d.~samples が与えられた場合、共分散を推定できることを示します。行列 $\mat A$ は $N\estimate n$ のフロベニウスノルムに相対誤差を持ち、$N\estimate n^2$ を必要とする古典的な共分散推定結果よりも改善されています。
45	44	https://papers.nips.cc//paper/6649-f-gans-in-an-information-geometric-nutshell	f-GANs in an Information Geometric Nutshell	情報幾何学の要約における f-GAN	Nowozin \textit{et al} showed last year how to extend the GAN \textit{principle} to all $f$-divergences. The approach is elegant but falls short of a full description of the supervised game, and says little about the key player, the generator: for example, what does the generator actually converge to if solving the GAN game means convergence in some space of parameters? How does that provide hints on the generator's design and compare to the flourishing but almost exclusively experimental literature on the subject? In this paper, we unveil a broad class of distributions for which such convergence happens --- namely, deformed exponential families, a wide superset of exponential families ---. We show that current deep architectures are able to factorize a very large number of such densities using an especially compact design, hence displaying the power of deep architectures and their concinnity in the $f$-GAN game. This result holds given a sufficient condition on \textit{activation functions} --- which turns out to be satisfied by popular choices. The key to our results is a variational generalization of an old theorem that relates the KL divergence between regular exponential families and divergences between their natural parameters. We complete this picture with additional results and experimental insights on how these results may be used to ground further improvements of GAN architectures, via (i) a principled design of the activation functions in the generator and (ii) an explicit integration of proper composite losses' link function in the discriminator.	Nowozin \textit{et al} は昨年、GAN \textit{principle} をすべての $f$-divergences に拡張する方法を示しました。このアプローチは洗練されていますが、教師ありゲームの完全な説明には及ばず、主要なプレーヤーであるジェネレーターについてはほとんど述べていません。たとえば、GAN ゲームを解くことがパラメータの空間での収束を意味する場合、ジェネレーターは実際に何に収束するのでしょうか?それは発電機の設計に関するヒントを提供し、この主題に関する盛んではあるがほぼ実験的な文献と比較してどうなるでしょうか?この論文では、そのような収束が起こる分布の広範なクラス、つまり変形指数関数族、指数関数族の幅広いスーパーセットを明らかにします。現在のディープアーキテクチャは、特にコンパクトな設計を使用して非常に多くの密度を因数分解できることを示し、$f$-GAN ゲームにおけるディープアーキテクチャのパワーとその一致性を示します。この結果は、\textit{活性化関数} に関する十分条件を前提として成立します --- 一般的な選択肢によって満たされることが判明しました。私たちの結果の鍵は、正規指数関数族間の KL 発散とそれらの自然パラメータ間の発散を関連付ける古い定理の変分一般化です。追加の結果と、(i) ジェネレーターの活性化関数の原理的な設計、および (ii) 適切な複合損失の明示的な統合を介して、これらの結果を GAN アーキテクチャのさらなる改善にどのように使用できるかについての実験的洞察を加えて、この全体像を完成させます。 ' 識別子のリンク関数。
46	45	https://papers.nips.cc//paper/6650-toward-multimodal-image-to-image-translation	Toward Multimodal Image-to-Image Translation	マルチモーダルな画像から画像への変換に向けて	Many image-to-image translation problems are ambiguous, as a single input image may correspond to multiple possible outputs. In this work, we aim to model a distribution of possible outputs in a conditional generative modeling setting. The ambiguity of the mapping is distilled in a low-dimensional latent vector, which can be randomly sampled at test time. A generator learns to map the given input, combined with this latent code, to the output. We explicitly encourage the connection between output and the latent code to be invertible. This helps prevent a many-to-one mapping from the latent code to the output during training, also known as the problem of mode collapse, and produces more diverse results. We explore several variants of this approach by employing different training objectives, network architectures, and methods of injecting the latent code. Our proposed method encourages bijective consistency between the latent encoding and output modes. We present a systematic comparison of our method and other variants on both perceptual realism and diversity.	単一の入力画像が複数の可能な出力に対応する可能性があるため、画像間の変換の問題の多くはあいまいです。この作業では、条件付き生成モデリング設定で可能な出力の分布をモデル化することを目的としています。マッピングの曖昧さは低次元の潜在ベクトルに抽出され、テスト時にランダムにサンプリングできます。ジェネレーターは、指定された入力をこの潜在コードと組み合わせて出力にマッピングすることを学習します。出力と潜在コードの間の接続が反転可能であることを明示的に推奨します。これは、モード崩壊の問題としても知られる、トレーニング中の潜在コードから出力への多対 1 マッピングを防止し、より多様な結果を生成するのに役立ちます。私たちは、さまざまなトレーニング目標、ネットワークアーキテクチャ、潜在コードを注入する方法を採用することで、このアプローチのいくつかのバリエーションを検討します。私たちが提案した方法は、潜在的な符号化モードと出力モードの間の全単射一貫性を促進します。私たちは、知覚的リアリズムと多様性の両方について、私たちの方法と他のバリアントの体系的な比較を示します。
47	46	https://papers.nips.cc//paper/6651-mixture-rank-matrix-approximation-for-collaborative-filtering	Mixture-Rank Matrix Approximation for Collaborative Filtering	協調フィルタリングのための混合ランク行列の近似	Low-rank matrix approximation (LRMA) methods have achieved excellent accuracy among today's collaborative filtering (CF) methods. In existing LRMA methods, the rank of user/item feature matrices is typically fixed, i.e., the same rank is adopted to describe all users/items. However, our studies show that submatrices with different ranks could coexist in the same user-item rating matrix, so that approximations with fixed ranks cannot perfectly describe the internal structures of the rating matrix, therefore leading to inferior recommendation accuracy. In this paper, a mixture-rank matrix approximation (MRMA) method is proposed, in which user-item ratings can be characterized by a mixture of LRMA models with different ranks. Meanwhile, a learning algorithm capitalizing on iterated condition modes is proposed to tackle the non-convex optimization problem pertaining to MRMA. Experimental studies on MovieLens and Netflix datasets demonstrate that MRMA can outperform six state-of-the-art LRMA-based CF methods in terms of recommendation accuracy.	低ランク行列近似 (LRMA) 手法は、今日の協調フィルタリング (CF) 手法の中でも優れた精度を実現しています。既存の LRMA 方法では、ユーザー/アイテムの特徴行列のランクは通常固定されています。つまり、すべてのユーザー/アイテムを記述するために同じランクが採用されます。ただし、私たちの研究では、ランクの異なる部分行列が同じユーザーアイテム評価行列内に共存する可能性があるため、固定ランクによる近似では評価行列の内部構造を完全に記述することができず、そのため推奨精度が低下することが示されています。この論文では、混合ランク行列近似 (MRMA) 方法を提案します。この方法では、ユーザー項目の評価を、異なるランクの LRMA モデルの混合によって特徴付けることができます。一方、MRMA に関連する非凸最適化問題に取り組むために、反復条件モードを利用した学習アルゴリズムが提案されています。 MovieLens と Netflix データセットに関する実験研究では、MRMA が推奨精度の点で 6 つの最先端の LRMA ベースの CF 手法を上回るパフォーマンスを発揮できることが実証されています。
48	47	https://papers.nips.cc//paper/6652-non-monotone-continuous-dr-submodular-maximization-structure-and-algorithms	Non-monotone Continuous DR-submodular Maximization: Structure and Algorithms	非単調な連続 DR サブモジュール最大化: 構造とアルゴリズム	DR-submodular continuous functions are important objectives with wide real-world applications spanning MAP inference in determinantal point processes (DPPs), and mean-field inference for probabilistic submodular models, amongst others. DR-submodularity captures a subclass of non-convex functions that enables both exact minimization and approximate maximization in polynomial time. In this work we study the problem of maximizing non-monotone DR-submodular continuous functions under general down-closed convex constraints. We start by investigating geometric properties that underlie such objectives, e.g., a strong relation between (approximately) stationary points and global optimum is proved. These properties are then used to devise two optimization algorithms with provable guarantees. Concretely, we first devise a ""two-phase'' algorithm with 1/4 approximation guarantee. This algorithm allows the use of existing methods for finding (approximately) stationary points as a subroutine, thus, harnessing recent progress in non-convex optimization. Then we present a non-monotone Frank-Wolfe variant with 1/e approximation guarantee and sublinear convergence rate. Finally, we extend our approach to a broader class of generalized DR-submodular continuous functions, which captures a wider spectrum of applications. Our theoretical findings are validated on synthetic and real-world problem instances.	DR サブモジュラー連続関数は、決定点プロセス (DPP) における MAP 推論や、確率的サブモジュラーモデルの平均場推論など、幅広い現実世界のアプリケーションに関わる重要な目的です。 DR サブモジュール性は、多項式時間での正確な最小化と近似最大化の両方を可能にする非凸関数のサブクラスをキャプチャします。この研究では、一般的な下閉凸制約の下で非単調 DR サブモジュラー連続関数を最大化する問題を研究します。私たちは、そのような目的の基礎となる幾何学的特性を調査することから始めます。たとえば、(ほぼ) 静止点と大域最適との間の強い関係が証明されます。これらのプロパティは、証明可能な保証を持つ 2 つの最適化アルゴリズムを考案するために使用されます。具体的には、まず1/4近似を保証する「2フェーズ」アルゴリズムを考案します。このアルゴリズムを使用すると、サブルーチンとして (ほぼ) 静止点を見つけるための既存の方法を使用できるため、非凸最適化における最近の進歩を活用できます。次に、1/e 近似保証と線形未満の収束率を備えた非単調 Frank-Wolfe バリアントを提示します。最後に、私たちはアプローチをより広範なクラスの一般化された DR サブモジュール連続関数に拡張し、より広範囲のアプリケーションを捉えます。私たちの理論的発見は、合成および現実世界の問題インスタンスで検証されています。
49	48	https://papers.nips.cc//paper/6653-learning-with-average-top-k-loss	Learning with Average Top-k Loss	平均上位 k 損失による学習	In this work, we introduce the average top-$k$ (\atk) loss as a new ensemble loss for supervised learning. The \atk loss provides a natural generalization of the two widely used ensemble losses, namely the average loss and the maximum loss. Furthermore, the \atk loss combines the advantages of them and can alleviate their corresponding drawbacks to better adapt to different data distributions. We show that the \atk loss affords an intuitive interpretation that reduces the penalty of continuous and convex individual losses on correctly classified data. The \atk loss can lead to convex optimization problems that can be solved effectively with conventional sub-gradient based method. We further study the Statistical Learning Theory of \matk by establishing its classification calibration and statistical consistency of \matk which provide useful insights on the practical choice of the parameter $k$. We demonstrate the applicability of \matk learning combined with different individual loss functions for binary and multi-class classification and regression using synthetic and real datasets.	この研究では、教師あり学習の新しいアンサンブル損失として、平均上位 $k$ (\atk) 損失を導入します。 \atk 損失は、広く使用されている 2 つのアンサンブル損失、つまり平均損失と最大損失を自然に一般化したものです。さらに、\atk loss はそれらの利点を組み合わせ、対応する欠点を軽減して、さまざまなデータ分布に適切に適応できます。 \atk 損失により、正しく分類されたデータに対する連続的かつ凸状の個別損失のペナルティを軽減する直観的な解釈が可能になることを示します。 \atk 損失は、従来のサブ勾配ベースの方法で効果的に解決できる凸最適化問題を引き起こす可能性があります。 \matk の分類校正と統計的一貫性を確立することにより、\matk の統計的学習理論をさらに研究します。これにより、パラメータ $k$ の実際的な選択に関する有用な洞察が得られます。合成データセットと実際のデータセットを使用して、バイナリおよびマルチクラスの分類と回帰に対するさまざまな個別の損失関数と組み合わせた \matk 学習の適用可能性を実証します。
50	49	https://papers.nips.cc//paper/6654-learning-multiple-visual-domains-with-residual-adapters	Learning multiple visual domains with residual adapters	残りのアダプターを使用して複数の視覚ドメインを学習する	There is a growing interest in learning data representations that work well for many different types of problems and data. In this paper, we look in particular at the task of learning a single visual representation that can be successfully utilized in the analysis of very different types of images, from dog breeds to stop signs and digits. Inspired by recent work on learning networks that predict the parameters of another, we develop a tunable deep network architecture that, by means of adapter residual modules, can be steered on the fly to diverse visual domains. Our method achieves a high degree of parameter sharing while maintaining or even improving the accuracy of domain-specific representations. We also introduce the Visual Decathlon Challenge, a benchmark that evaluates the ability of representations to capture simultaneously ten very different visual domains and measures their ability to recognize well uniformly.	さまざまな種類の問題やデータに適した学習データ表現への関心が高まっています。この論文では、犬種から一時停止標識や数字に至るまで、非常に異なる種類の画像の分析にうまく利用できる単一の視覚表現を学習するタスクに特に注目します。別のパラメータを予測する学習ネットワークに関する最近の研究に触発され、アダプター残差モジュールを使用してさまざまな視覚領域にオンザフライで操作できる、調整可能なディープネットワークアーキテクチャを開発しました。私たちの方法は、ドメイン固有の表現の精度を維持または向上させながら、高度なパラメータ共有を実現します。また、10 の非常に異なる視覚領域を同時に捕捉する表現の能力を評価し、均一に適切に認識する能力を測定するベンチマークである Visual Decathlon Challenge も紹介します。
51	50	https://papers.nips.cc//paper/6655-dykstras-algorithm-admm-and-coordinate-descent-connections-insights-and-extensions	Dykstra's Algorithm, ADMM, and Coordinate Descent: Connections, Insights, and Extensions	ディクストラのアルゴリズム、ADMM、座標降下法: 接続、洞察、拡張機能	We study connections between Dykstra's algorithm for projecting onto an intersection of convex sets, the augmented Lagrangian method of multipliers or ADMM, and block coordinate descent. We prove that coordinate descent for a regularized regression problem, in which the penalty is a separable sum of support functions, is exactly equivalent to Dykstra's algorithm applied to the dual problem. ADMM on the dual problem is also seen to be equivalent, in the special case of two sets, with one being a linear subspace. These connections, aside from being interesting in their own right, suggest new ways of analyzing and extending coordinate descent. For example, from existing convergence theory on Dykstra's algorithm over polyhedra, we discern that coordinate descent for the lasso problem converges at an (asymptotically) linear rate. We also develop two parallel versions of coordinate descent, based on the Dykstra and ADMM connections.	凸集合の交点に投影するためのダイクストラのアルゴリズム、乗算器の拡張ラグランジュ法または ADMM、およびブロック座標降下の間の関係を研究します。ペナルティがサポート関数の分離可能な合計である正則化回帰問題の座標降下が、双対問題に適用されるダイクストラのアルゴリズムとまったく同じであることを証明します。双対問題に関する ADMM も、1 つが線形部分空間である 2 つのセットの特殊なケースでは等価であることがわかります。これらの関係は、それ自体が興味深いだけでなく、座標降下を分析および拡張する新しい方法を示唆しています。たとえば、多面体上のダイクストラのアルゴリズムに関する既存の収束理論から、なげなわ問題の座標降下が (漸近的に) 線形速度で収束することがわかります。また、Dykstra 接続と ADMM 接続に基づいて、座標降下法の 2 つの並列バージョンも開発します。
52	51	https://papers.nips.cc//paper/6656-learning-spherical-convolution-for-fast-features-from-360-imagery	Learning Spherical Convolution for Fast Features from 360° Imagery	360° 画像からの高速特徴のための球面畳み込みの学習	While 360° cameras offer tremendous new possibilities in vision, graphics, and augmented reality, the spherical images they produce make core feature extraction non-trivial. Convolutional neural networks (CNNs) trained on images from perspective cameras yield “flat"" filters, yet 360° images cannot be projected to a single plane without significant distortion. A naive solution that repeatedly projects the viewing sphere to all tangent planes is accurate, but much too computationally intensive for real problems. We propose to learn a spherical convolutional network that translates a planar CNN to process 360° imagery directly in its equirectangular projection. Our approach learns to reproduce the flat filter outputs on 360° data, sensitive to the varying distortion effects across the viewing sphere. The key benefits are 1) efficient feature extraction for 360° images and video, and 2) the ability to leverage powerful pre-trained networks researchers have carefully honed (together with massive labeled image training sets) for perspective images. We validate our approach compared to several alternative methods in terms of both raw CNN output accuracy as well as applying a state-of-the-art “flat"" object detector to 360° data. Our method yields the most accurate results while saving orders of magnitude in computation versus the existing exact reprojection solution.	360° カメラは、ビジョン、グラフィックス、拡張現実において大きな新たな可能性をもたらしますが、生成される全天球画像により、中核となる特徴の抽出が簡単ではなくなります。パースペクティブカメラからの画像でトレーニングされた畳み込みニューラルネットワーク (CNN) は「フラット」フィルターを生成しますが、360 度の画像を大きな歪みなしに単一の平面に投影することはできません。すべての接平面に表示球を繰り返し投影する単純なソリューションは正確ですが、実際の問題に対しては計算量が多すぎます。私たちは、平面 CNN を変換して正距円筒図法で 360° 画像を直接処理する球面畳み込みネットワークを学習することを提案します。私たちのアプローチは、表示領域全体にわたるさまざまな歪み効果に敏感な、360° データ上のフラットフィルター出力を再現することを学習します。主な利点は、1) 360° 画像とビデオの効率的な特徴抽出、2) 遠近感のある画像に対して研究者が (大規模なラベル付き画像トレーニングセットと合わせて) 慎重に磨き上げた強力な事前トレーニング済みネットワークを活用できることです。私たちは、生の CNN 出力精度と、最先端の「平面オブジェクト検出器」を 360° データに適用するという両方の観点から、いくつかの代替方法と比較してアプローチを検証します。私たちの方法は、既存の正確な再投影ソリューションと比較して、計算量を大幅に節約しながら、最も正確な結果をもたらします。
53	52	https://papers.nips.cc//paper/6657-marrnet-3d-shape-reconstruction-via-25d-sketches	MarrNet: 3D Shape Reconstruction via 2.5D Sketches	MarrNet: 2.5D スケッチによる 3D 形状の再構築	3D object reconstruction from a single image is a highly under-determined problem, requiring strong prior knowledge of plausible 3D shapes. This introduces challenge for learning-based approaches, as 3D object annotations in real images are scarce. Previous work chose to train on synthetic data with ground truth 3D information, but suffered from the domain adaptation issue when tested on real data. In this work, we propose an end-to-end trainable framework, sequentially estimating 2.5D sketches and 3D object shapes. Our disentangled, two-step formulation has three advantages. First, compared to full 3D shape, 2.5D sketches are much easier to be recovered from a 2D image, and to transfer from synthetic to real data. Second, for 3D reconstruction from the 2.5D sketches, we can easily transfer the learned model on synthetic data to real images, as rendered 2.5D sketches are invariant to object appearance variations in real images, including lighting, texture, etc. This further relieves the domain adaptation problem. Third, we derive differentiable projective functions from 3D shape to 2.5D sketches, making the framework end-to-end trainable on real images, requiring no real-image annotations. Our framework achieves state-of-the-art performance on 3D shape reconstruction.	単一の画像からの 3D オブジェクトの再構成は、十分に解明されていない問題であり、妥当な 3D 形状についての強力な事前知識が必要です。実際の画像には 3D オブジェクトの注釈がほとんどないため、学習ベースのアプローチには課題が生じます。以前の研究では、グラウンドトゥルース 3D 情報を使用して合成データでトレーニングすることを選択しましたが、実際のデータでテストするとドメイン適応の問題が発生しました。この研究では、2.5D スケッチと 3D オブジェクト形状を逐次推定する、エンドツーエンドのトレーニング可能なフレームワークを提案します。当社のほぐされた 2 段階の配合には 3 つの利点があります。まず、完全な 3D 形状と比較して、2.5D スケッチは 2D 画像からの復元や、合成データから実際のデータへの転送がはるかに簡単です。次に、2.5D スケッチからの 3D 再構築の場合、レンダリングされた 2.5D スケッチは照明やテクスチャなどの実画像のオブジェクトの外観の変化に対して不変であるため、合成データで学習したモデルを実際の画像に簡単に転送できます。これにより、さらに負担が軽減されます。ドメイン適応の問題。 3 番目に、3D 形状から 2.5D スケッチへの微分可能な射影関数を導出し、フレームワークを実画像上でエンドツーエンドでトレーニングできるようにし、実画像の注釈を必要としません。私たちのフレームワークは、3D 形状再構築において最先端のパフォーマンスを実現します。
54	53	https://papers.nips.cc//paper/6658-multimodal-learning-and-reasoning-for-visual-question-answering	Multimodal Learning and Reasoning for Visual Question Answering	視覚的な質問応答のためのマルチモーダル学習と推論	Reasoning about entities and their relationships from multimodal data is a key goal of Artificial General Intelligence. The visual question answering (VQA) problem is an excellent way to test such reasoning capabilities of an AI model and its multimodal representation learning. However, the current VQA models are over-simplified deep neural networks, comprised of a long short-term memory (LSTM) unit for question comprehension and a convolutional neural network (CNN) for learning single image representation. We argue that the single visual representation contains a limited and general information about the image contents and thus limits the model reasoning capabilities. In this work we introduce a modular neural network model that learns a multimodal and multifaceted representation of the image and the question. The proposed model learns to use the multimodal representation to reason about the image entities and achieves a new state-of-the-art performance on both VQA benchmark datasets, VQA v1.0 and v2.0, by a wide margin.	マルチモーダルデータからエンティティとその関係について推論することは、汎用人工知能の重要な目標です。ビジュアル質問応答 (VQA) 問題は、AI モデルとそのマルチモーダル表現学習の推論能力をテストする優れた方法です。ただし、現在の VQA モデルは過度に単純化されたディープニューラルネットワークであり、質問理解のための長期短期記憶 (LSTM) ユニットと、単一の画像表現を学習するための畳み込みニューラルネットワーク (CNN) で構成されています。私たちは、単一の視覚的表現には画像の内容に関する限られた一般的な情報が含まれており、したがってモデルの推論能力が制限されていると主張します。この研究では、画像と質問のマルチモーダルかつ多面的な表現を学習するモジュール式ニューラルネットワークモデルを導入します。提案されたモデルは、マルチモーダル表現を使用して画像エンティティについて推論することを学習し、VQA ベンチマークデータセット (VQA v1.0 と v2.0) の両方で大幅な差をつけて新しい最先端のパフォーマンスを達成します。
55	54	https://papers.nips.cc//paper/6659-adversarial-surrogate-losses-for-ordinal-regression	Adversarial Surrogate Losses for Ordinal Regression	順序回帰に対する敵対的代理損失	Ordinal regression seeks class label predictions when the penalty incurred for mistakes increases according to an ordering over the labels. The absolute error is a canonical example. Many existing methods for this task reduce to binary classification problems and employ surrogate losses, such as the hinge loss. We instead derive uniquely defined surrogate ordinal regression loss functions by seeking the predictor that is robust to the worst-case approximations of training data labels, subject to matching certain provided training data statistics. We demonstrate the advantages of our approach over other surrogate losses based on hinge loss approximations using UCI ordinal prediction tasks.	順序回帰では、ラベルの順序に従ってミスに対するペナルティが増加する場合のクラスラベルの予測を求めます。絶対誤差は標準的な例です。このタスクに対する既存の手法の多くは、バイナリ分類問題に帰着し、ヒンジ損失などの代理損失を採用しています。代わりに、提供された特定のトレーニングデータ統計と一致することを条件として、トレーニングデータラベルの最悪の場合の近似に対して堅牢な予測子を求めることにより、独自に定義された代理順序回帰損失関数を導出します。 UCI順序予測タスクを使用したヒンジ損失近似に基づく他の代理損失に対する私たちのアプローチの利点を実証します。
56	55	https://papers.nips.cc//paper/6660-hypothesis-transfer-learning-via-transformation-functions	Hypothesis Transfer Learning via Transformation Functions	変換関数による仮説転移学習	We consider the Hypothesis Transfer Learning (HTL) problem where one incorporates a hypothesis trained on the source domain into the learning procedure of the target domain. Existing theoretical analysis either only studies specific algorithms or only presents upper bounds on the generalization error but not on the excess risk. In this paper, we propose a unified algorithm-dependent framework for HTL through a novel notion of transformation functions, which characterizes the relation between the source and the target domains. We conduct a general risk analysis of this framework and in particular, we show for the first time, if two domains are related, HTL enjoys faster convergence rates of excess risks for Kernel Smoothing and Kernel Ridge Regression than those of the classical non-transfer learning settings. We accompany this framework with an analysis of cross-validation for HTL to search for the best transfer technique and gracefully reduce to non-transfer learning when HTL is not helpful. Experiments on robotics and neural imaging data demonstrate the effectiveness of our framework.	ソースドメインでトレーニングされた仮説をターゲットドメインの学習手順に組み込む、仮説転移学習 (HTL) 問題を検討します。既存の理論分析は、特定のアルゴリズムのみを研究するか、一般化誤差の上限のみを提示し、超過リスクについては提示しません。この論文では、ソースドメインとターゲットドメイン間の関係を特徴付ける変換関数の新しい概念を通じて、HTL のアルゴリズムに依存する統一フレームワークを提案します。私たちはこのフレームワークの一般的なリスク分析を実施し、特に 2 つのドメインが関連している場合、HTL は古典的な非転移学習よりもカーネルスムージングとカーネルリッジ回帰の超過リスクの収束速度が速いことを初めて示しました。設定。このフレームワークには HTL の相互検証の分析が付属しており、最適な転送手法を検索し、HTL が役に立たない場合には非転送学習に適切に削減します。ロボット工学と神経画像データに関する実験は、私たちのフレームワークの有効性を実証しています。
57	56	https://papers.nips.cc//paper/6661-controllable-invariance-through-adversarial-feature-learning	Controllable Invariance through Adversarial Feature Learning	敵対的特徴学習による制御可能な不変性	Learning meaningful representations that maintain the content necessary for a particular task while filtering away detrimental variations is a problem of great interest in machine learning. In this paper, we tackle the problem of learning representations invariant to a specific factor or trait of data. The representation learning process is formulated as an adversarial minimax game. We analyze the optimal equilibrium of such a game and find that it amounts to maximizing the uncertainty of inferring the detrimental factor given the representation while maximizing the certainty of making task-specific predictions. On three benchmark tasks, namely fair and bias-free classification, language-independent generation, and lighting-independent image classification, we show that the proposed framework induces an invariant representation, and leads to better generalization evidenced by the improved performance.	有害な変動をフィルタリングしながら特定のタスクに必要なコンテンツを維持する意味のある表現を学習することは、機械学習において非常に興味深い問題です。この論文では、データの特定の要素または特性に対して不変な表現を学習するという問題に取り組みます。表現学習プロセスは、敵対的なミニマックスゲームとして定式化されます。私たちはそのようなゲームの最適な均衡を分析し、それが与えられた表現から有害な要因を推測する不確実性を最大化すると同時に、タスク固有の予測を行う確実性を最大化することにつながることを発見しました。公平でバイアスのない分類、言語に依存しない生成、および照明に依存しない画像分類という 3 つのベンチマークタスクに関して、提案されたフレームワークが不変表現を誘導し、パフォーマンスの向上によって証明されるより良い一般化につながることを示します。
58	57	https://papers.nips.cc//paper/6662-convergence-analysis-of-two-layer-neural-networks-with-relu-activation	Convergence Analysis of Two-layer Neural Networks with ReLU Activation	ReLU Activation を使用した 2 層ニューラルネットワークの収束解析	In recent years, stochastic gradient descent (SGD) based techniques has become the standard tools for training neural networks. However, formal theoretical understanding of why SGD can train neural networks in practice is largely missing. In this paper, we make progress on understanding this mystery by providing a convergence analysis for SGD on a rich subset of two-layer feedforward networks with ReLU activations. This subset is characterized by a special structure called ""identity mapping"". We prove that, if input follows from Gaussian distribution, with standard $O(1/\sqrt{d})$ initialization of the weights, SGD converges to the global minimum in polynomial number of steps. Unlike normal vanilla networks, the ""identity mapping"" makes our network asymmetric and thus the global minimum is unique. To complement our theory, we are also able to show experimentally that multi-layer networks with this mapping have better performance compared with normal vanilla networks. Our convergence theorem differs from traditional non-convex optimization techniques. We show that SGD converges to optimal in ""two phases"": In phase I, the gradient points to the wrong direction, however, a potential function $g$ gradually decreases. Then in phase II, SGD enters a nice one point convex region and converges. We also show that the identity mapping is necessary for convergence, as it moves the initial point to a better place for optimization. Experiment verifies our claims.	近年、確率的勾配降下法 (SGD) ベースの手法がニューラルネットワークをトレーニングするための標準ツールとなっています。ただし、SGD が実際にニューラルネットワークを訓練できる理由についての正式な理論的理解はほとんど不足しています。この論文では、ReLU アクティベーションを備えた 2 層フィードフォワードネットワークの豊富なサブセット上で SGD の収束解析を提供することで、この謎の理解が進みました。このサブセットは、「アイデンティティマッピング」と呼ばれる特別な構造によって特徴付けられます。入力がガウス分布に従う場合、重みの標準 $O(1/\sqrt{d})$ 初期化を使用して、SGD が多項式ステップ数のグローバル最小値に収束することを証明します。通常のバニラネットワークとは異なり、「アイデンティティマッピング」によりネットワークが非対称になるため、グローバル最小値は一意になります。私たちの理論を補完するために、このマッピングを備えた多層ネットワークが通常のバニラネットワークと比較してパフォーマンスが優れていることを実験的に示すこともできました。私たちの収束定理は、従来の非凸最適化手法とは異なります。 SGD が「2 段階」で最適に収束することを示します。段階 I では、勾配は間違った方向を指しますが、ポテンシャル関数 $g$ は徐々に減少します。次にフェーズ II では、SGD は良好な 1 点凸領域に入り、収束します。また、初期点を最適化に適した場所に移動するため、恒等マッピングが収束に必要であることも示します。実験により私たちの主張が検証されます。
59	58	https://papers.nips.cc//paper/6663-doubly-accelerated-stochastic-variance-reduced-dual-averaging-method-for-regularized-empirical-risk-minimization	Doubly Accelerated Stochastic Variance Reduced Dual Averaging Method for Regularized Empirical Risk Minimization	二重加速確率的分散低減二重平均法による正規化された経験的リスク最小化	We develop a new accelerated stochastic gradient method for efficiently solving the convex regularized empirical risk minimization problem in mini-batch settings. The use of mini-batches has become a golden standard in the machine learning community, because the mini-batch techniques stabilize the gradient estimate and can easily make good use of parallel computing. The core of our proposed method is the incorporation of our new ``double acceleration'' technique and variance reduction technique. We theoretically analyze our proposed method and show that our method much improves the mini-batch efficiencies of previous accelerated stochastic methods, and essentially only needs size $\sqrt{n}$ mini-batches for achieving the optimal iteration complexities for both non-strongly and strongly convex objectives, where $n$ is the training set size. Further, we show that even in non-mini-batch settings, our method achieves the best known convergence rate for non-strongly convex and strongly convex objectives.	ミニバッチ設定における凸正則化経験的リスク最小化問題を効率的に解くための新しい加速確率的勾配法を開発します。ミニバッチ技術は勾配推定を安定させ、並列コンピューティングを簡単に活用できるため、ミニバッチの使用は機械学習コミュニティの黄金標準となっています。私たちが提案する手法の中核は、新しい「二重加速」手法と分散低減手法を組み込んだことです。私たちは提案した方法を理論的に分析し、私たちの方法が以前の加速確率的方法のミニバッチ効率を大幅に改善し、本質的に非強力な両方の最適な反復複雑さを達成するためにサイズ $\sqrt{n}$ のミニバッチのみを必要とすることを示します。 $n$ はトレーニングセットのサイズです。さらに、非ミニバッチ設定でも、私たちの方法は、非強凸および強凸の目的に対して最もよく知られている収束率を達成することを示します。
60	59	https://papers.nips.cc//paper/6664-langevin-dynamics-with-continuous-tempering-for-training-deep-neural-networks	Langevin Dynamics with Continuous Tempering for Training Deep Neural Networks	ディープニューラルネットワークをトレーニングするための連続テンパリングを使用したランジュバンダイナミクス	Minimizing non-convex and high-dimensional objective functions is challenging, especially when training modern deep neural networks. In this paper, a novel approach is proposed which divides the training process into two consecutive phases to obtain better generalization performance: Bayesian sampling and stochastic optimization. The first phase is to explore the energy landscape and to capture the `fat'' modes; and the second one is to fine-tune the parameter learned from the first phase. In the Bayesian learning phase, we apply continuous tempering and stochastic approximation into the Langevin dynamics to create an efficient and effective sampler, in which the temperature is adjusted automatically according to the designed ``temperature dynamics''. These strategies can overcome the challenge of early trapping into bad local minima and have achieved remarkable improvements in various types of neural networks as shown in our theoretical analysis and empirical experiments.	非凸および高次元の目的関数を最小限に抑えることは、特に最新のディープニューラルネットワークをトレーニングする場合には困難です。この論文では、トレーニングプロセスを 2 つの連続するフェーズに分割して、より優れた汎化パフォーマンスを得る新しいアプローチ、つまりベイジアンサンプリングと確率的最適化を提案します。最初の段階は、エネルギーの状況を調査し、「脂肪」モードを捉えることです。 2 つ目は、最初のフェーズで学習したパラメータを微調整することです。ベイジアン学習フェーズでは、連続テンパリングと確率的近似をランジュバンダイナミクスに適用して、設計された「温度ダイナミクス」に従って温度が自動的に調整される、効率的で効果的なサンプラーを作成します。これらの戦略は、悪い極小値に早期にトラップされるという課題を克服することができ、理論分析と実証実験で示されているように、さまざまなタイプのニューラルネットワークで顕著な改善を達成しました。
61	60	https://papers.nips.cc//paper/6665-efficient-online-linear-optimization-with-approximation-algorithms	Efficient Online Linear Optimization with Approximation Algorithms	近似アルゴリズムを使用した効率的なオンライン線形最適化	We revisit the problem of Online Linear Optimization in case the set of feasible actions is accessible through an approximated linear optimization oracle with a factor $\alpha$ multiplicative approximation guarantee. This setting is in particular interesting since it captures natural online extensions of well-studied offline linear optimization problems which are NP-hard, yet admit efficient approximation algorithms. The goal here is to minimize the $\alpha$-regret which is the natural extension of the standard regret in online learning to this setting. We present new algorithms with significantly improved oracle complexity for both the full information and bandit variants of the problem. Mainly, for both variants, we present $\alpha$-regret bounds of $O(T^{-1/3})$, were $T$ is the number of prediction rounds, using only $O(\log(T))$ calls to the approximation oracle per iteration, on average. These are the first results to obtain both average oracle complexity of $O(\log(T))$ (or even poly-logarithmic in $T$) and $\alpha$-regret bound $O(T^{-c})$ for a positive constant $c$, for both variants.	係数 $\alpha$ の乗法近似保証を備えた近似線形最適化オラクルを通じて実行可能なアクションのセットにアクセスできる場合に備えて、オンライン線形最適化の問題を再検討します。この設定は、NP 困難でありながら効率的な近似アルゴリズムを可能にする、よく研究されたオフライン線形最適化問題の自然なオンライン拡張を捉えるため、特に興味深いものです。ここでの目標は、オンライン学習における標準的な後悔をこの設定に自然に拡張した $\alpha$-regret を最小限に抑えることです。私たちは、問題の完全な情報と盗賊の変種の両方に対して、オラクルの複雑さが大幅に改善された新しいアルゴリズムを提示します。主に、両方のバリアントについて、$O(T^{-1/3})$ の $\alpha$-regret 境界を提示します。$T$ は予測ラウンド数であり、$O(\log(T ))$ は、平均して反復ごとに近似オラクルを呼び出します。これらは、平均オラクル複雑度 $O(\log(T))$ (または $T$ の多対数) と $\alpha$-regret バウンド $O(T^{-c}) の両方を取得した最初の結果です。 )$ は、正の定数 $c$ の場合、両方のバリアントで使用されます。
62	61	https://papers.nips.cc//paper/6666-geometric-descent-method-for-convex-composite-minimization	Geometric Descent Method for Convex Composite Minimization	凸複合最小化のための幾何降下法	In this paper, we extend the geometric descent method recently proposed by Bubeck, Lee and Singh to tackle nonsmooth and strongly convex composite problems. We prove that our proposed algorithm, dubbed geometric proximal gradient method (GeoPG), converges with a linear rate $(1-1/\sqrt{\kappa})$ and thus achieves the optimal rate among first-order methods, where $\kappa$ is the condition number of the problem. Numerical results on linear regression and logistic regression with elastic net regularization show that GeoPG compares favorably with Nesterov's accelerated proximal gradient method, especially when the problem is ill-conditioned.	この論文では、Bubeck、Lee、Singh によって最近提案された幾何学的降下法を拡張して、非滑らかで凸面の強い複合問題に取り組みます。幾何近接勾配法 (GeoPG) と呼ばれる私たちの提案したアルゴリズムが線形速度 $(1-1/\sqrt{\kappa})$ で収束し、したがって一次法の中で最適な速度を達成することを証明します。 kappa$ は問題の条件番号です。弾性ネット正則化を使用した線形回帰およびロジスティック回帰の数値結果は、特に問題の条件が悪い場合に、GeoPG が Nesterov の加速近位勾配法と比べて優れていることを示しています。
63	62	https://papers.nips.cc//paper/6667-diffusion-approximations-for-online-principal-component-estimation-and-global-convergence	Diffusion Approximations for Online Principal Component Estimation and Global Convergence	オンライン主成分推定と大域収束のための拡散近似	In this paper, we propose to adopt the diffusion approximation tools to study the dynamics of Oja's iteration which is an online stochastic gradient method for the principal component analysis. Oja's iteration maintains a running estimate of the true principal component from streaming data and enjoys less temporal and spatial complexities. We show that the Oja's iteration for the top eigenvector generates a continuous-state discrete-time Markov chain over the unit sphere. We characterize the Oja's iteration in three phases using diffusion approximation and weak convergence tools. Our three-phase analysis further provides a finite-sample error bound for the running estimate, which matches the minimax information lower bound for PCA under the additional assumption of bounded samples.	この論文では、主成分分析のためのオンライン確率的勾配法である Oja の反復のダイナミクスを研究するために拡散近似ツールを採用することを提案します。 Oja の反復では、ストリーミングデータからの真の主成分の実行推定値が維持され、時間的および空間的な複雑さが軽減されます。最上位固有ベクトルに対する Oja の反復により、単位球上で連続状態の離散時間マルコフ連鎖が生成されることを示します。拡散近似と弱い収束ツールを使用して、Oja の反復を 3 つのフェーズで特徴付けます。我々の三相解析はさらに、実行推定値の有限サンプル誤差限界を提供します。これは、有限サンプルの追加仮定の下で PCA のミニマックス情報の下限と一致します。
64	63	https://papers.nips.cc//paper/6668-avoiding-discrimination-through-causal-reasoning	Avoiding Discrimination through Causal Reasoning	因果推論による差別の回避	Recent work on fairness in machine learning has focused on various statistical discrimination criteria and how they trade off. Most of these criteria are observational: They depend only on the joint distribution of predictor, protected attribute, features, and outcome. While convenient to work with, observational criteria have severe inherent limitations that prevent them from resolving matters of fairness conclusively. Going beyond observational criteria, we frame the problem of discrimination based on protected attributes in the language of causal reasoning. This viewpoint shifts attention from ""What is the right fairness criterion?"" to ""What do we want to assume about our model of the causal data generating process?"" Through the lens of causality, we make several contributions. First, we crisply articulate why and when observational criteria fail, thus formalizing what was before a matter of opinion. Second, our approach exposes previously ignored subtleties and why they are fundamental to the problem. Finally, we put forward natural causal non-discrimination criteria and develop algorithms that satisfy them.	機械学習の公平性に関する最近の研究は、さまざまな統計的識別基準とそれらのトレードオフに焦点を当てています。これらの基準のほとんどは観察的なものであり、予測子、保護された属性、特徴、結果の結合分布のみに依存します。観察基準には作業には便利ですが、公平性の問題を最終的に解決することを妨げる重大な固有の制限があります。観察基準を超えて、保護された属性に基づく差別の問題を因果推論の言語で組み立てます。この視点は、「正しい公平性の基準は何か?」から「因果関係のデータ生成プロセスのモデルについて何を想定したいのか?」に注意を移し、因果関係のレンズを通して、私たちはいくつかの貢献をします。まず、観察基準が満たされない理由と時期を明確に明確にし、意見の問題以前の内容を形式化します。第二に、私たちのアプローチは、これまで無視されてきた微妙な点と、それが問題の根本である理由を明らかにします。最後に、自然因果的無差別基準を提案し、それを満たすアルゴリズムを開発します。
65	64	https://papers.nips.cc//paper/6669-nonparametric-online-regression-while-learning-the-metric	Nonparametric Online Regression while Learning the Metric	メトリック学習中のノンパラメトリックオンライン回帰	We study algorithms for online nonparametric regression that learn the directions along which the regression function is smoother. Our algorithm learns the Mahalanobis metric based on the gradient outer product matrix $\boldsymbol{G}$ of the regression function (automatically adapting to the effective rank of this matrix), while simultaneously bounding the regret ---on the same data sequence--- in terms of the spectrum of $\boldsymbol{G}$. As a preliminary step in our analysis, we extend a nonparametric online learning algorithm by Hazan and Megiddo enabling it to compete against functions whose Lipschitzness is measured with respect to an arbitrary Mahalanobis metric.	私たちは、回帰関数がより滑らかになる方向を学習するオンラインノンパラメトリック回帰のアルゴリズムを研究します。私たちのアルゴリズムは、回帰関数の勾配外積行列 $\boldsymbol{G}$ に基づいてマハラノビス計量を学習し (この行列の有効ランクに自動的に適応します)、同時に後悔を同じデータシーケンスに制限します。 -- $\boldsymbol{G}$ のスペクトルに関して。分析の予備ステップとして、Hazan と Megiddo によるノンパラメトリックオンライン学習アルゴリズムを拡張し、任意のマハラノビス計量に関してリプシッツネスが測定される関数と競合できるようにします。
66	65	https://papers.nips.cc//paper/6670-recycling-privileged-learning-and-distribution-matching-for-fairness	Recycling Privileged Learning and Distribution Matching for Fairness	公平性のための特権学習と分配マッチングのリサイクル	Equipping machine learning models with ethical and legal constraints is a serious issue; without this, the future of machine learning is at risk. This paper takes a step forward in this direction and focuses on ensuring machine learning models deliver fair decisions. In legal scholarships, the notion of fairness itself is evolving and multi-faceted. We set an overarching goal to develop a unified machine learning framework that is able to handle any definitions of fairness, their combinations, and also new definitions that might be stipulated in the future. To achieve our goal, we recycle two well-established machine learning techniques, privileged learning and distribution matching, and harmonize them for satisfying multi-faceted fairness definitions. We consider protected characteristics such as race and gender as privileged information that is available at training but not at test time; this accelerates model training and delivers fairness through unawareness. Further, we cast demographic parity, equalized odds, and equality of opportunity as a classical two-sample problem of conditional distributions, which can be solved in a general form by using distance measures in Hilbert Space. We show several existing models are special cases of ours. Finally, we advocate returning the Pareto frontier of multi-objective minimization of error and unfairness in predictions. This will facilitate decision makers to select an operating point and to be accountable for it.	機械学習モデルに倫理的および法的制約を与えることは深刻な問題です。これがなければ、機械学習の将来は危険にさらされます。このペーパーでは、この方向に一歩前進し、機械学習モデルが公正な意思決定を確実に行えるようにすることに焦点を当てています。法学においては、公平性の概念自体が進化し、多面的になっています。私たちは、公平性のあらゆる定義、その組み合わせ、さらには将来規定される可能性のある新しい定義を処理できる統合機械学習フレームワークを開発するという包括的な目標を設定しました。私たちの目標を達成するために、私たちは 2 つの確立された機械学習技術、特権学習と分散マッチングを再利用し、多面的な公平性の定義を満たすためにそれらを調和させます。当社では、人種や性別などの保護された特性は、トレーニング時には利用可能ですがテスト時には利用できない特権的な情報であると考えています。これにより、モデルのトレーニングが加速され、無意識によって公平性が実現されます。さらに、人口統計上のパリティ、均等化されたオッズ、機会の平等を、条件付き分布の古典的な 2 サンプル問題としてキャストします。これは、ヒルベルト空間の距離測度を使用することで一般的な形式で解くことができます。いくつかの既存モデルが当社の特別なケースであることを示します。最後に、予測における誤差と不公平性を多目的に最小化するパレートフロンティアに戻すことを主張します。これにより、意思決定者が動作点を選択し、それに対して責任を負いやすくなります。
67	66	https://papers.nips.cc//paper/6671-safe-and-nested-subgame-solving-for-imperfect-information-games	Safe and Nested Subgame Solving for Imperfect-Information Games	不完全情報ゲームの安全なネストされたサブゲーム解決	In imperfect-information games, the optimal strategy in a subgame may depend on the strategy in other, unreached subgames. Thus a subgame cannot be solved in isolation and must instead consider the strategy for the entire game as a whole, unlike perfect-information games. Nevertheless, it is possible to first approximate a solution for the whole game and then improve it by solving individual subgames. This is referred to as subgame solving. We introduce subgame-solving techniques that outperform prior methods both in theory and practice. We also show how to adapt them, and past subgame-solving techniques, to respond to opponent actions that are outside the original action abstraction; this significantly outperforms the prior state-of-the-art approach, action translation. Finally, we show that subgame solving can be repeated as the game progresses down the game tree, leading to far lower exploitability. These techniques were a key component of Libratus, the first AI to defeat top humans in heads-up no-limit Texas hold'em poker.	不完全情報ゲームでは、サブゲームの最適な戦略は、他の到達されていないサブゲームの戦略に依存する可能性があります。したがって、完全情報ゲームとは異なり、サブゲームを単独で解決することはできず、ゲーム全体の戦略を全体として考慮する必要があります。それにもかかわらず、最初にゲーム全体の解決策を近似し、次に個々のサブゲームを解決することでそれを改善することは可能です。これはサブゲーム解決と呼ばれます。理論的にも実践的にも従来の方法よりも優れたサブゲーム解決テクニックを紹介します。また、元のアクション抽象化の範囲外にある敵のアクションに対応するために、それらを適応させる方法と、過去のサブゲーム解決テクニックも示します。これは、従来の最先端のアプローチであるアクション変換を大幅に上回ります。最後に、ゲームがゲームツリーの下位に進むにつれてサブゲームの解決が繰り返され、悪用可能性が大幅に低下する可能性があることを示します。これらのテクニックは、ヘッズアップノーリミットテキサスホールデムポーカーでトップの人間を破った最初の AI である Libratus の重要なコンポーネントでした。
68	67	https://papers.nips.cc//paper/6672-unsupervised-image-to-image-translation-networks	Unsupervised Image-to-Image Translation Networks	教師なし画像間変換ネットワーク	Unsupervised image-to-image translation aims at learning a joint distribution of images in different domains by using images from the marginal distributions in individual domains. Since there exists an infinite set of joint distributions that can arrive the given marginal distributions, one could infer nothing about the joint distribution from the marginal distributions without additional assumptions. To address the problem, we make a shared-latent space assumption and propose an unsupervised image-to-image translation framework based on Coupled GANs. We compare the proposed framework with competing approaches and present high quality image translation results on various challenging unsupervised image translation tasks, including street scene image translation, animal image translation, and face image translation. We also apply the proposed framework to domain adaptation and achieve state-of-the-art performance on benchmark datasets. Code and additional results are available in https://github.com/mingyuliutw/unit.	教師なし画像間変換は、個々のドメインの周辺分布からの画像を使用して、異なるドメインの画像の結合分布を学習することを目的としています。与えられた周辺分布に到達できる結合分布の無限のセットが存在するため、追加の仮定がなければ周辺分布から結合分布について何も推測できません。この問題に対処するために、共有潜在空間を仮定し、結合 GAN に基づく教師なし画像間変換フレームワークを提案します。提案されたフレームワークを競合するアプローチと比較し、街路画像翻訳、動物画像翻訳、顔画像翻訳など、さまざまな困難な教師なし画像翻訳タスクにおける高品質の画像翻訳結果を提示します。また、提案されたフレームワークをドメイン適応に適用し、ベンチマークデータセットで最先端のパフォーマンスを実現します。コードと追加の結果は https://github.com/mingyuliutw/unit で入手できます。
69	68	https://papers.nips.cc//paper/6673-coded-distributed-computing-for-inverse-problems	Coded Distributed Computing for Inverse Problems	逆問題のためのコード化分散コンピューティング	Computationally intensive distributed and parallel computing is often bottlenecked by a small set of slow workers known as stragglers. In this paper, we utilize the emerging idea of ``coded computation'' to design a novel error-correcting-code inspired technique for solving linear inverse problems under specific iterative methods in a parallelized implementation affected by stragglers. Example machine-learning applications include inverse problems such as personalized PageRank and sampling on graphs. We provably show that our coded-computation technique can reduce the mean-squared error under a computational deadline constraint. In fact, the ratio of mean-squared error of replication-based and coded techniques diverges to infinity as the deadline increases. Our experiments for personalized PageRank performed on real systems and real social networks show that this ratio can be as large as $10^4$. Further, unlike coded-computation techniques proposed thus far, our strategy combines outputs of all workers, including the stragglers, to produce more accurate estimates at the computational deadline. This also ensures that the accuracy degrades ``gracefully'' in the event that the number of stragglers is large.	計算量が多い分散並列コンピューティングは、ストラグラーと呼ばれる少数の遅いワーカーによってボトルネックになることがよくあります。この論文では、「コード化計算」という新たなアイデアを利用して、ストラグラーの影響を受ける並列実装で特定の反復法の下で線形逆問題を解くための、誤り訂正コードにヒントを得た新しい手法を設計します。機械学習アプリケーションの例には、パーソナライズされた PageRank やグラフのサンプリングなどの逆問題が含まれます。私たちのコード化計算手法が、計算期限の制約の下で平均二乗誤差を削減できることを証明します。実際、レプリケーションベースの技術とコード化された技術の平均二乗誤差の比率は、期限が長くなるにつれて無限大に広がります。実際のシステムと実際のソーシャルネットワークで実行されたパーソナライズされた PageRank の実験では、この比率が $10^4$ に達する可能性があることが示されています。さらに、これまでに提案されたコード化された計算手法とは異なり、私たちの戦略は、落伍者を含むすべてのワーカーの出力を組み合わせて、計算期限までにより正確な推定値を生成します。これにより、落伍者の数が多い場合に精度が「緩やかに」低下することも保証されます。
70	69	https://papers.nips.cc//paper/6674-a-screening-rule-for-l1-regularized-ising-model-estimation	A Screening Rule for l1-Regularized Ising Model Estimation	l1 正則化イジングモデル推定のスクリーニングルール	We discover a screening rule for l1-regularized Ising model estimation. The simple closed-form screening rule is a necessary and sufficient condition for exactly recovering the blockwise structure of a solution under any given regularization parameters. With enough sparsity, the screening rule can be combined with various optimization procedures to deliver solutions efficiently in practice. The screening rule is especially suitable for large-scale exploratory data analysis, where the number of variables in the dataset can be thousands while we are only interested in the relationship among a handful of variables within moderate-size clusters for interpretability. Experimental results on various datasets demonstrate the efficiency and insights gained from the introduction of the screening rule.	l1 正則化イジングモデル推定のスクリーニング規則を発見しました。単純な閉じた形式のスクリーニング規則は、与えられた正則化パラメーターの下で解のブロックごとの構造を正確に回復するための必要十分条件です。十分なスパース性があれば、スクリーニングルールをさまざまな最適化手順と組み合わせて、実際にソリューションを効率的に提供できます。スクリーニングルールは、データセット内の変数の数が数千にもなる可能性がある大規模な探索的データ分析に特に適していますが、解釈可能性のために中程度のサイズのクラスター内の少数の変数間の関係のみに関心があります。さまざまなデータセットの実験結果は、スクリーニングルールの導入から得られる効率と洞察を示しています。
71	70	https://papers.nips.cc//paper/6675-improved-dynamic-regret-for-non-degenerate-functions	Improved Dynamic Regret for Non-degenerate Functions	非縮退関数の動的リグレットの改善	Recently, there has been a growing research interest in the analysis of dynamic regret, which measures the performance of an online learner against a sequence of local minimizers. By exploiting the strong convexity, previous studies have shown that the dynamic regret can be upper bounded by the path-length of the comparator sequence. In this paper, we illustrate that the dynamic regret can be further improved by allowing the learner to query the gradient of the function multiple times, and meanwhile the strong convexity can be weakened to other non-degenerate conditions. Specifically, we introduce the squared path-length, which could be much smaller than the path-length, as a new regularity of the comparator sequence. When multiple gradients are accessible to the learner, we first demonstrate that the dynamic regret of strongly convex functions can be upper bounded by the minimum of the path-length and the squared path-length. We then extend our theoretical guarantee to functions that are semi-strongly convex or self-concordant. To the best of our knowledge, this is the first time that semi-strong convexity and self-concordance are utilized to tighten the dynamic regret.	最近、一連のローカルミニマイザーに対するオンライン学習者のパフォーマンスを測定する動的後悔の分析に対する研究の関心が高まっています。強い凸面性を利用することにより、これまでの研究では、動的なリグレスの上限がコンパレータシーケンスのパス長によって制限される可能性があることが示されています。この論文では、学習者が関数の勾配を複数回クエリできるようにすることで動的なリグレスがさらに改善され、同時に強い凸性が他の非縮退条件に弱められることを示します。具体的には、比較器シーケンスの新しい規則性として、パス長よりもはるかに小さい可能性がある 2 乗パス長を導入します。学習者が複数の勾配にアクセスできる場合、最初に、強凸関数の動的リグレスが経路長の最小値と経路長の 2 乗によって上限が制限される可能性があることを示します。次に、理論的保証を、半強凸関数または自己一致関数に拡張します。私たちの知る限り、準強凸と自己一致を利用して動的後悔を締めるのはこれが初めてです。
72	71	https://papers.nips.cc//paper/6676-learning-efficient-object-detection-models-with-knowledge-distillation	Learning Efficient Object Detection Models with Knowledge Distillation	知識の蒸留による効率的な物体検出モデルの学習	Despite significant accuracy improvement in convolutional neural networks (CNN) based object detectors, they often require prohibitive runtimes to process an image for real-time applications. State-of-the-art models often use very deep networks with a large number of floating point operations. Efforts such as model compression learn compact models with fewer number of parameters, but with much reduced accuracy. In this work, we propose a new framework to learn compact and fast ob- ject detection networks with improved accuracy using knowledge distillation [20] and hint learning [34]. Although knowledge distillation has demonstrated excellent improvements for simpler classification setups, the complexity of detection poses new challenges in the form of regression, region proposals and less voluminous la- bels. We address this through several innovations such as a weighted cross-entropy loss to address class imbalance, a teacher bounded loss to handle the regression component and adaptation layers to better learn from intermediate teacher distribu- tions. We conduct comprehensive empirical evaluation with different distillation configurations over multiple datasets including PASCAL, KITTI, ILSVRC and MS-COCO. Our results show consistent improvement in accuracy-speed trade-offs for modern multi-class detection models.	畳み込みニューラルネットワーク (CNN) ベースの物体検出器の精度は大幅に向上しましたが、リアルタイムアプリケーションの画像を処理するには法外なランタイムが必要になることがよくあります。最先端のモデルでは、多くの浮動小数点演算を伴う非常に深いネットワークが使用されることがよくあります。モデル圧縮などの取り組みでは、パラメーターの数が少ないコンパクトなモデルを学習しますが、精度は大幅に低下します。この研究では、知識蒸留 [20] とヒント学習 [34] を使用して、精度を向上させたコンパクトで高速な物体検出ネットワークを学習するための新しいフレームワークを提案します。知識の蒸留により、より単純な分類設定に対する優れた改善が実証されましたが、検出の複雑さにより、回帰、領域の提案、およびボリュームの少ないラベルという形で新たな課題が生じています。私たちは、クラスの不均衡に対処するための重み付きクロスエントロピー損失、回帰コンポーネントを処理するための教師限定損失、中間の教師分布からより適切に学習するための適応層など、いくつかの革新を通じてこの問題に対処します。 PASCAL、KITTI、ILSVRC、MS-COCO などの複数のデータセットにわたって、さまざまな蒸留構成を使用して包括的な実証評価を実施します。私たちの結果は、最新のマルチクラス検出モデルの精度と速度のトレードオフが一貫して向上していることを示しています。
73	72	https://papers.nips.cc//paper/6677-one-sided-unsupervised-domain-mapping	One-Sided Unsupervised Domain Mapping	片側の監視されていないドメインマッピング	In unsupervised domain mapping, the learner is given two unmatched datasets $A$ and $B$. The goal is to learn a mapping $G_{AB}$ that translates a sample in $A$ to the analog sample in $B$. Recent approaches have shown that when learning simultaneously both $G_{AB}$ and the inverse mapping $G_{BA}$, convincing mappings are obtained. In this work, we present a method of learning $G_{AB}$ without learning $G_{BA}$. This is done by learning a mapping that maintains the distance between a pair of samples. Moreover, good mappings are obtained, even by maintaining the distance between different parts of the same sample before and after mapping. We present experimental results that the new method not only allows for one sided mapping learning, but also leads to preferable numerical results over the existing circularity-based constraint. Our entire code is made publicly available at~\url{https://github.com/sagiebenaim/DistanceGAN}.	教師なしドメインマッピングでは、学習者には 2 つの一致しないデータセット $A$ と $B$ が与えられます。目標は、$A$ のサンプルを $B$ のアナログサンプルに変換するマッピング $G_{AB}$ を学習することです。最近のアプローチでは、$G_{AB}$ と逆写像 $G_{BA}$ の両方を同時に学習すると、説得力のある写像が得られることが示されています。この研究では、$G_{BA}$ を学習せずに $G_{AB}$ を学習する方法を紹介します。これは、サンプルのペア間の距離を維持するマッピングを学習することによって行われます。さらに、マッピングの前後で同じサンプルの異なる部分間の距離を維持することによっても、良好なマッピングが得られます。我々は、新しい方法が片側マッピング学習を可能にするだけでなく、既存の円形性ベースの制約よりも好ましい数値結果をもたらすという実験結果を提示します。私たちのコード全体は、~\url{https://github.com/sagiebenaim/DistanceGAN} で公開されています。
74	73	https://papers.nips.cc//paper/6678-deep-mean-shift-priors-for-image-restoration	Deep Mean-Shift Priors for Image Restoration	画像復元のための深い平均シフト事前分布	In this paper we introduce a natural image prior that directly represents a Gaussian-smoothed version of the natural image distribution. We include our prior in a formulation of image restoration as a Bayes estimator that also allows us to solve noise-blind image restoration problems. We show that the gradient of our prior corresponds to the mean-shift vector on the natural image distribution. In addition, we learn the mean-shift vector field using denoising autoencoders, and use it in a gradient descent approach to perform Bayes risk minimization. We demonstrate competitive results for noise-blind deblurring, super-resolution, and demosaicing.	この論文では、自然画像分布のガウス平滑化バージョンを直接表す自然画像事前分布を導入します。画像復元の定式化にベイズ推定器として事前の情報を含めます。これにより、ノイズブラインド画像復元問題も解決できます。事前分布の勾配が自然画像分布の平均シフトベクトルに対応することを示します。さらに、ノイズ除去オートエンコーダーを使用して平均シフトベクトル場を学習し、それを勾配降下法で使用してベイズリスク最小化を実行します。当社は、ノイズブラインドブラー除去、超解像度、デモザイク処理において優れた結果を示しています。
75	74	https://papers.nips.cc//paper/6679-greedy-algorithms-for-cone-constrained-optimization-with-convergence-guarantees	Greedy Algorithms for Cone Constrained Optimization with Convergence Guarantees	収束保証を備えたコーン制約最適化のための貪欲なアルゴリズム	Greedy optimization methods such as Matching Pursuit (MP) and Frank-Wolfe (FW) algorithms regained popularity in recent years due to their simplicity, effectiveness and theoretical guarantees. MP and FW address optimization over the linear span and the convex hull of a set of atoms, respectively. In this paper, we consider the intermediate case of optimization over the convex cone, parametrized as the conic hull of a generic atom set, leading to the first principled definitions of non-negative MP algorithms for which we give explicit convergence rates and demonstrate excellent empirical performance. In particular, we derive sublinear (O(1/t)) convergence on general smooth and convex objectives, and linear convergence (O(e^{-t})) on strongly convex objectives, in both cases for general sets of atoms. Furthermore, we establish a clear correspondence of our algorithms to known algorithms from the MP and FW literature. Our novel algorithms and analyses target general atom sets and general objective functions, and hence are directly applicable to a large variety of learning settings.	Matching Pursuit (MP) アルゴリズムや Frank-Wolfe (FW) アルゴリズムなどの貪欲な最適化手法は、そのシンプルさ、有効性、理論的保証により、近年再び人気を取り戻しています。 MP と FW は、それぞれ原子セットの線形スパンと凸包にわたる最適化に取り組みます。この論文では、一般的な原子セットの円錐包としてパラメータ化された凸円錐上の最適化の中間ケースを検討します。これにより、非負 MP アルゴリズムの最初の原理的な定義が得られ、それに対して明示的な収束率が与えられ、優れた経験的結果が実証されます。パフォーマンス。特に、一般的な滑らかな凸対物レンズでの準線形収束 (O(1/t)) と、強い凸対物レンズでの線形収束 (O(e^{-t})) を、どちらの場合も一般的な原子セットに対して導出します。さらに、MP および FW の文献からの既知のアルゴリズムに対するアルゴリズムの明確な対応関係を確立します。私たちの新しいアルゴリズムと分析は、一般的な原子セットと一般的な目的関数を対象としているため、さまざまな学習設定に直接適用できます。
76	75	https://papers.nips.cc//paper/6680-a-new-theory-for-matrix-completion	A New Theory for Matrix Completion	行列補完のための新しい理論	Prevalent matrix completion theories reply on an assumption that the locations of the missing data are distributed uniformly and randomly (i.e., uniform sampling). Nevertheless, the reason for observations being missing often depends on the unseen observations themselves, and thus the missing data in practice usually occurs in a nonuniform and deterministic fashion rather than randomly. To break through the limits of random sampling, this paper introduces a new hypothesis called \emph{isomeric condition}, which is provably weaker than the assumption of uniform sampling and arguably holds even when the missing data is placed irregularly. Equipped with this new tool, we prove a series of theorems for missing data recovery and matrix completion. In particular, we prove that the exact solutions that identify the target matrix are included as critical points by the commonly used nonconvex programs. Unlike the existing theories for nonconvex matrix completion, which are built upon the same condition as convex programs, our theory shows that nonconvex programs have the potential to work with a much weaker condition. Comparing to the existing studies on nonuniform sampling, our setup is more general.	一般的な行列補完理論は、欠損データの位置が一様かつランダムに分布している (つまり、一様なサンプリング) という仮定に基づいて答えています。それにもかかわらず、観測値が欠落する理由は、目に見えない観測値自体に依存することが多いため、実際のデータの欠落は通常、ランダムではなく不均一で決定的な形で発生します。ランダムサンプリングの限界を突破するために、この論文では \emph{異性条件} と呼ばれる新しい仮説を導入しました。この仮説は、一様サンプリングの仮定よりも弱いことが証明されており、欠損データが不規則に配置されている場合でもおそらく成り立ちます。この新しいツールを使用して、欠落データの回復と行列の補完に関する一連の定理を証明します。特に、ターゲット行列を特定する正確な解が、一般的に使用される非凸プログラムによって臨界点として含まれていることを証明します。凸プログラムと同じ条件に基づいて構築される非凸行列補完に関する既存の理論とは異なり、私たちの理論は、非凸プログラムがはるかに弱い条件でも機能する可能性があることを示しています。不均一なサンプリングに関する既存の研究と比較して、私たちの設定はより一般的です。
77	76	https://papers.nips.cc//paper/6681-robust-hypothesis-test-for-nonlinear-effect-with-gaussian-processes	Robust Hypothesis Test for Nonlinear Effect with Gaussian Processes	ガウス過程を使用した非線形効果の堅牢な仮説検定	This work constructs a hypothesis test for detecting whether an data-generating function $h: \real^p \rightarrow \real$ belongs to a specific reproducing kernel Hilbert space $\mathcal{H}_0$, where the structure of $\mathcal{H}_0$ is only partially known. Utilizing the theory of reproducing kernels, we reduce this hypothesis to a simple one-sided score test for a scalar parameter, develop a testing procedure that is robust against the mis-specification of kernel functions, and also propose an ensemble-based estimator for the null model to guarantee test performance in small samples. To demonstrate the utility of the proposed method, we apply our test to the problem of detecting nonlinear interaction between groups of continuous features. We evaluate the finite-sample performance of our test under different data-generating functions and estimation strategies for the null model. Our results revealed interesting connection between notions in machine learning (model underfit/overfit) and those in statistical inference (i.e. Type I error/power of hypothesis test), and also highlighted unexpected consequences of common model estimating strategies (e.g. estimating kernel hyperparameters using maximum likelihood estimation) on model inference.	この研究は、データ生成関数 $h: \real^p \rightarrow \real$ が特定の再現カーネルヒルベルト空間 $\mathcal{H}_0$ に属するかどうかを検出するための仮説検定を構築します。ここで、$\mathcal の構造は{H}_0$ は部分的にしか知られていません。カーネルの再現理論を利用して、この仮説をスカラーパラメーターの単純な片側スコアテストに還元し、カーネル関数の指定ミスに対して堅牢なテスト手順を開発し、また、カーネル関数のアンサンブルベースの推定器を提案します。 null モデルを使用して、小さなサンプルでのテストのパフォーマンスを保証します。提案された方法の有用性を実証するために、連続特徴のグループ間の非線形相互作用を検出する問題にテストを適用します。さまざまなデータ生成関数とヌルモデルの推定戦略の下で、テストの有限サンプルのパフォーマンスを評価します。私たちの結果は、機械学習の概念 (モデルのアンダーフィット/オーバーフィット) と統計的推論の概念 (つまり、タイプ I エラー/仮説検定の検出力) の間の興味深い関係を明らかにし、また、一般的なモデル推定戦略 (例: 最大値を使用したカーネルハイパーパラメーターの推定) の予期せぬ結果も強調しました。尤度推定) をモデル推論に適用します。
78	77	https://papers.nips.cc//paper/6682-lower-bounds-on-the-robustness-to-adversarial-perturbations	Lower bounds on the robustness to adversarial perturbations	敵対的な摂動に対するロバスト性の下限	The input-output mappings learned by state-of-the-art neural networks are significantly discontinuous. It is possible to cause a neural network used for image recognition to misclassify its input by applying very specific, hardly perceptible perturbations to the input, called adversarial perturbations. Many hypotheses have been proposed to explain the existence of these peculiar samples as well as several methods to mitigate them. A proven explanation remains elusive, however. In this work, we take steps towards a formal characterization of adversarial perturbations by deriving lower bounds on the magnitudes of perturbations necessary to change the classification of neural networks. The bounds are experimentally verified on the MNIST and CIFAR-10 data sets.	最先端のニューラルネットワークによって学習される入出力マッピングは大幅に不連続です。敵対的摂動と呼ばれる、非常に特殊でほとんど知覚できない摂動を入力に適用することにより、画像認識に使用されるニューラルネットワークが入力を誤分類する可能性があります。これらの特異なサンプルの存在を説明するために、多くの仮説と、それらを軽減するためのいくつかの方法が提案されています。しかし、証明された説明は依然としてわかりにくい。この研究では、ニューラルネットワークの分類を変更するために必要な摂動の大きさの下限を導出することで、敵対的な摂動の正式な特徴付けに向けた一歩を踏み出します。境界は、MNIST および CIFAR-10 データセットで実験的に検証されています。
79	78	https://papers.nips.cc//paper/6683-minimizing-a-submodular-function-from-samples	Minimizing a Submodular Function from Samples	サンプルからのサブモジュール関数の最小化	In this paper we consider the problem of minimizing a submodular function from training data. Submodular functions can be efficiently minimized and are conse- quently heavily applied in machine learning. There are many cases, however, in which we do not know the function we aim to optimize, but rather have access to training data that is used to learn the function. In this paper we consider the question of whether submodular functions can be minimized in such cases. We show that even learnable submodular functions cannot be minimized within any non-trivial approximation when given access to polynomially-many samples. Specifically, we show that there is a class of submodular functions with range in [0, 1] such that, despite being PAC-learnable and minimizable in polynomial-time, no algorithm can obtain an approximation strictly better than 1/2 － o(1) using polynomially-many samples drawn from any distribution. Furthermore, we show that this bound is tight using a trivial algorithm that obtains an approximation of 1/2.	この論文では、トレーニングデータから部分モジュール関数を最小化する問題を検討します。サブモジュール関数は効率的に最小化できるため、機械学習に頻繁に適用されます。ただし、最適化しようとしている関数が分からないが、その関数を学習するために使用されるトレーニングデータにアクセスできる場合も多くあります。この論文では、そのような場合にサブモジュール関数を最小化できるかどうかの問題を検討します。多項式的に多くのサンプルへのアクセスが与えられた場合、学習可能な部分モジュラー関数であっても、自明でない近似内で最小化することはできないことを示します。具体的には、範囲が [0, 1] の部分モジュラー関数のクラスがあり、PAC で学習可能で多項式時間で最小化できるにもかかわらず、厳密には 1/2 － o( 1) 任意の分布から抽出された多項式の多数のサンプルを使用します。さらに、1/2 の近似値を得る簡単なアルゴリズムを使用して、この境界が狭いことを示します。
80	79	https://papers.nips.cc//paper/6684-introspective-classification-with-convolutional-nets	Introspective Classification with Convolutional Nets	畳み込みネットによる内省的分類	We propose introspective convolutional networks (ICN) that emphasize the importance of having convolutional neural networks empowered with generative capabilities. We employ a reclassification-by-synthesis algorithm to perform training using a formulation stemmed from the Bayes theory. Our ICN tries to iteratively: (1) synthesize pseudo-negative samples; and (2) enhance itself by improving the classification. The single CNN classifier learned is at the same time generative --- being able to directly synthesize new samples within its own discriminative model. We conduct experiments on benchmark datasets including MNIST, CIFAR-10, and SVHN using state-of-the-art CNN architectures, and observe improved classification results.	私たちは、畳み込みニューラルネットワークに生成機能を持たせることの重要性を強調する内省的畳み込みネットワーク (ICN) を提案します。合成による再分類アルゴリズムを採用し、ベイズ理論に由来する定式化を使用してトレーニングを実行します。私たちの ICN は、次のことを反復的に試みます。(1) 擬似陰性サンプルを合成します。 (2) 分類を改善することで自身を強化します。学習された単一の CNN 分類器は同時に生成的であり、独自の識別モデル内で新しいサンプルを直接合成できます。私たちは、最先端の CNN アーキテクチャを使用して、MNIST、CIFAR-10、SVHN などのベンチマークデータセットに対して実験を実施し、分類結果の向上を観察しています。
81	80	https://papers.nips.cc//paper/6685-label-distribution-learning-forests	Label Distribution Learning Forests	ラベル配布ラーニングフォレスト	Label distribution learning (LDL) is a general learning framework, which assigns to an instance a distribution over a set of labels rather than a single label or multiple labels. Current LDL methods have either restricted assumptions on the expression form of the label distribution or limitations in representation learning, e.g., to learn deep features in an end-to-end manner. This paper presents label distribution learning forests (LDLFs) - a novel label distribution learning algorithm based on differentiable decision trees, which have several advantages: 1) Decision trees have the potential to model any general form of label distributions by a mixture of leaf node predictions. 2) The learning of differentiable decision trees can be combined with representation learning. We define a distribution-based loss function for a forest, enabling all the trees to be learned jointly, and show that an update function for leaf node predictions, which guarantees a strict decrease of the loss function, can be derived by variational bounding. The effectiveness of the proposed LDLFs is verified on several LDL tasks and a computer vision application, showing significant improvements to the state-of-the-art LDL methods.	ラベル分布学習 (LDL) は一般的な学習フレームワークであり、単一のラベルまたは複数のラベルではなくラベルのセットにわたる分布をインスタンスに割り当てます。現在の LDL 手法には、ラベル分布の表現形式に関する仮定が制限されているか、表現学習 (エンドツーエンドの方法で深い特徴を学習するなど) に制限があります。この論文では、ラベル分布学習フォレスト (LDLF) について説明します。これは、微分可能なデシジョンツリーに基づく新しいラベル分布学習アルゴリズムであり、次のような利点があります。 1) デシジョンツリーには、リーフノード予測の混合によってラベル分布の一般的な形式をモデル化できる可能性があります。。 2) 微分可能な決定木の学習は、表現学習と組み合わせることができます。フォレストの分布ベースの損失関数を定義し、すべてのツリーを共同で学習できるようにし、損失関数の厳密な減少を保証する葉ノード予測の更新関数が変分境界によって導出できることを示します。提案された LDLF の有効性は、いくつかの LDL タスクとコンピュータービジョンアプリケーションで検証され、最先端の LDL 手法に対する大幅な改善が示されています。
82	81	https://papers.nips.cc//paper/6686-unsupervised-learning-of-object-frames-by-dense-equivariant-image-labelling	Unsupervised learning of object frames by dense equivariant image labelling	高密度等変画像ラベリングによる物体フレームの教師なし学習	One of the key challenges of visual perception is to extract abstract models of 3D objects and object categories from visual measurements, which are affected by complex nuisance factors such as viewpoint, occlusion, motion, and deformations. Starting from the recent idea of viewpoint factorization, we propose a new approach that, given a large number of images of an object and no other supervision, can extract a dense object-centric coordinate frame. This coordinate frame is invariant to deformations of the images and comes with a dense equivariant labelling neural network that can map image pixels to their corresponding object coordinates. We demonstrate the applicability of this method to simple articulated objects and deformable objects such as human faces, learning embeddings from random synthetic transformations or optical flow correspondences, all without any manual supervision.	視覚認識の重要な課題の 1 つは、視点、オクルージョン、動き、変形などの複雑な迷惑要因の影響を受ける視覚測定から 3D オブジェクトとオブジェクトカテゴリの抽象モデルを抽出することです。視点因数分解の最近のアイデアから出発して、我々は、物体の多数の画像が与えられ、他の監視がない場合に、密な物体中心の座標フレームを抽出できる新しいアプローチを提案します。この座標フレームは画像の変形に対して不変であり、画像ピクセルを対応するオブジェクト座標にマッピングできる高密度等変ラベリングニューラルネットワークが付属しています。我々は、この手法が単純な関節オブジェクトや人間の顔などの変形可能なオブジェクトに適用可能であることを実証し、ランダムな合成変換やオプティカルフローの対応から埋め込みを学習します。すべて手動による監視は必要ありません。
83	82	https://papers.nips.cc//paper/6687-compression-aware-training-of-deep-networks	Compression-aware Training of Deep Networks	ディープネットワークの圧縮を意識したトレーニング	In recent years, great progress has been made in a variety of application domains thanks to the development of increasingly deeper neural networks. Unfortunately, the huge number of units of these networks makes them expensive both computationally and memory-wise. To overcome this, exploiting the fact that deep networks are over-parametrized, several compression strategies have been proposed. These methods, however, typically start from a network that has been trained in a standard manner, without considering such a future compression. In this paper, we propose to explicitly account for compression in the training process. To this end, we introduce a regularizer that encourages the parameter matrix of each layer to have low rank during training. We show that accounting for compression during training allows us to learn much more compact, yet at least as effective, models than state-of-the-art compression techniques.	近年、ますます深くなったニューラルネットワークの開発のおかげで、さまざまな応用分野で大きな進歩が見られました。残念ながら、これらのネットワークのユニット数は膨大であるため、計算とメモリの両方で高価になります。これを克服するために、ディープネットワークが過剰にパラメータ化されているという事実を利用して、いくつかの圧縮戦略が提案されています。ただし、これらの方法は通常、そのような将来の圧縮を考慮せずに、標準的な方法でトレーニングされたネットワークから開始します。この論文では、トレーニングプロセスにおける圧縮を明示的に考慮することを提案します。この目的を達成するために、トレーニング中に各層のパラメーター行列が低いランクになるように奨励する正則化子を導入します。トレーニング中に圧縮を考慮すると、最先端の圧縮技術よりもはるかにコンパクトでありながら、少なくとも同等の効果的なモデルを学習できることを示します。
84	83	https://papers.nips.cc//paper/6688-multiscale-semi-markov-dynamics-for-intracortical-brain-computer-interfaces	Multiscale Semi-Markov Dynamics for Intracortical Brain-Computer Interfaces	皮質内ブレインコンピューターインターフェースのためのマルチスケールセミマルコフダイナミクス	Intracortical brain-computer interfaces (iBCIs) have allowed people with tetraplegia to control a computer cursor by imagining the movement of their paralyzed arm or hand. State-of-the-art decoders deployed in human iBCIs are derived from a Kalman filter that assumes Markov dynamics on the angle of intended movement, and a unimodal dependence on intended angle for each channel of neural activity. Due to errors made in the decoding of noisy neural data, as a user attempts to move the cursor to a goal, the angle between cursor and goal positions may change rapidly. We propose a dynamic Bayesian network that includes the on-screen goal position as part of its latent state, and thus allows the person’s intended angle of movement to be aggregated over a much longer history of neural activity. This multiscale model explicitly captures the relationship between instantaneous angles of motion and long-term goals, and incorporates semi-Markov dynamics for motion trajectories. We also introduce a multimodal likelihood model for recordings of neural populations which can be rapidly calibrated for clinical applications. In offline experiments with recorded neural data, we demonstrate significantly improved prediction of motion directions compared to the Kalman filter. We derive an efficient online inference algorithm, enabling a clinical trial participant with tetraplegia to control a computer cursor with neural activity in real time. The observed kinematics of cursor movement are objectively straighter and smoother than prior iBCI decoding models without loss of responsiveness.	皮質内脳コンピューターインターフェイス (iBCI) により、四肢麻痺の人は、麻痺した腕や手の動きを想像してコンピューターカーソルを制御できるようになりました。人間の iBCI に導入されている最先端のデコーダーは、意図された動きの角度に関するマルコフ力学と、神経活動の各チャネルの意図された角度に対する単峰性の依存性を仮定するカルマンフィルターから派生しています。ノイズの多いニューラルデータのデコード時に発生したエラーにより、ユーザーがカーソルをゴールに移動しようとすると、カーソルとゴールの位置の間の角度が急速に変化する可能性があります。私たちは、画面上の目標位置を潜在状態の一部として含む動的なベイジアンネットワークを提案します。これにより、人間の意図した動きの角度を、はるかに長い神経活動の履歴にわたって集約できるようになります。このマルチスケールモデルは、瞬間的な動きの角度と長期的な目標の間の関係を明示的に捉え、動きの軌跡にセミマルコフ力学を組み込んでいます。また、臨床応用のために迅速に校正できる神経集団の記録のためのマルチモーダル尤度モデルも紹介します。記録されたニューラルデータを使用したオフライン実験では、カルマンフィルターと比較して運動方向の予測が大幅に改善されたことを実証しました。私たちは効率的なオンライン推論アルゴリズムを導き出し、四肢麻痺の臨床試験参加者が神経活動によってコンピューターカーソルをリアルタイムで制御できるようにします。観測されたカーソル移動の運動学は、応答性を損なうことなく、以前の iBCI デコードモデルよりも客観的により真っ直ぐで滑らかです。
85	84	https://papers.nips.cc//paper/6689-predrnn-recurrent-neural-networks-for-predictive-learning-using-spatiotemporal-lstms	PredRNN: Recurrent Neural Networks for Predictive Learning using Spatiotemporal LSTMs	PredRNN: 時空間 LSTM を使用した予測学習のためのリカレントニューラルネットワーク	The predictive learning of spatiotemporal sequences aims to generate future images by learning from the historical frames, where spatial appearances and temporal variations are two crucial structures. This paper models these structures by presenting a predictive recurrent neural network (PredRNN). This architecture is enlightened by the idea that spatiotemporal predictive learning should memorize both spatial appearances and temporal variations in a unified memory pool. Concretely, memory states are no longer constrained inside each LSTM unit. Instead, they are allowed to zigzag in two directions: across stacked RNN layers vertically and through all RNN states horizontally. The core of this network is a new Spatiotemporal LSTM (ST-LSTM) unit that extracts and memorizes spatial and temporal representations simultaneously. PredRNN achieves the state-of-the-art prediction performance on three video prediction datasets and is a more general framework, that can be easily extended to other predictive learning tasks by integrating with other architectures.	時空間シーケンスの予測学習は、空間的外観と時間的変動が 2 つの重要な構造である過去のフレームから学習することで将来の画像を生成することを目的としています。この論文では、予測リカレントニューラルネットワーク (PredRNN) を提示することにより、これらの構造をモデル化します。このアーキテクチャは、時空間予測学習が空間的外観と時間的変化の両方を統合メモリプールに記憶する必要があるという考えによって啓発されています。具体的には、メモリ状態は各 LSTM ユニット内で制約されなくなりました。代わりに、スタックされた RNN 層を垂直に横断することと、すべての RNN 状態を水平に通過するという 2 つの方向にジグザグに移動することが許可されます。このネットワークの中核は、空間表現と時間表現を同時に抽出して記憶する新しい時空間 LSTM (ST-LSTM) ユニットです。 PredRNN は 3 つのビデオ予測データセットで最先端の予測パフォーマンスを実現し、より一般的なフレームワークであり、他のアーキテクチャと統合することで他の予測学習タスクに簡単に拡張できます。
86	85	https://papers.nips.cc//paper/6690-detrended-partial-cross-correlation-for-brain-connectivity-analysis	Detrended Partial Cross Correlation for Brain Connectivity Analysis	脳接続性分析のための傾向除去偏相互相関	Brain connectivity analysis is a critical component of ongoing human connectome projects to decipher the healthy and diseased brain. Recent work has highlighted the power-law (multi-time scale) properties of brain signals; however, there remains a lack of methods to specifically quantify short- vs. long- time range brain connections. In this paper, using detrended partial cross-correlation analysis (DPCCA), we propose a novel functional connectivity measure to delineate brain interactions at multiple time scales, while controlling for covariates. We use a rich simulated fMRI dataset to validate the proposed method, and apply it to a real fMRI dataset in a cocaine dependence prediction task. We show that, compared to extant methods, the DPCCA-based approach not only distinguishes short and long memory functional connectivity but also improves feature extraction and enhances classification accuracy. Together, this paper contributes broadly to new computational methodologies in understanding neural information processing.	脳の接続性分析は、健康な脳と病気の脳を解読する進行中のヒトコネクトームプロジェクトの重要な要素です。最近の研究では、脳信号のべき乗則 (複数時間スケール) 特性が強調されています。しかし、短期範囲と長期範囲の脳の接続を具体的に定量化する方法は依然として不足しています。この論文では、トレンド除去偏相互相関分析 (DPCCA) を使用して、共変量を制御しながら、複数の時間スケールで脳の相互作用を描写するための新しい機能的接続性の尺度を提案します。豊富なシミュレートされた fMRI データセットを使用して提案された方法を検証し、それをコカイン依存予測タスクの実際の fMRI データセットに適用します。既存の方法と比較して、DPCCA ベースのアプローチは、短いメモリ機能の接続性と長いメモリ機能の接続性を区別するだけでなく、特徴抽出が改善され、分類精度が向上することを示します。合わせて、この論文は神経情報処理を理解する際の新しい計算方法論に広く貢献します。
87	86	https://papers.nips.cc//paper/6691-contrastive-learning-for-image-captioning	Contrastive Learning for Image Captioning	画像キャプションのための対照学習	Image captioning, a popular topic in computer vision, has achieved substantial progress in recent years. However, the distinctiveness of natural descriptions is often overlooked in previous work. It is closely related to the quality of captions, as distinctive captions are more likely to describe images with their unique aspects. In this work, we propose a new learning method, Contrastive Learning (CL), for image captioning. Specifically, via two constraints formulated on top of a reference model, the proposed method can encourage distinctiveness, while maintaining the overall quality of the generated captions. We tested our method on two challenging datasets, where it improves the baseline model by significant margins. We also showed in our studies that the proposed method is generic and can be used for models with various structures.	コンピュータビジョンの人気トピックである画像キャプションは、近年大幅な進歩を遂げています。しかし、自然描写の独自性は以前の研究では見落とされがちでした。特徴的なキャプションは画像をその独自の側面で説明する可能性が高いため、これはキャプションの品質と密接に関係しています。この研究では、画像キャプションのための新しい学習方法である対照学習 (CL) を提案します。具体的には、提案された方法は、参照モデルに基づいて定式化された 2 つの制約によって、生成されるキャプションの全体的な品質を維持しながら、識別性を促進できます。私たちは 2 つの困難なデータセットでメソッドをテストし、ベースラインモデルを大幅に改善しました。また、提案手法が汎用的であり、さまざまな構造のモデルに使用できることも研究で示しました。
88	87	https://papers.nips.cc//paper/6692-safe-model-based-reinforcement-learning-with-stability-guarantees	Safe Model-based Reinforcement Learning with Stability Guarantees	安定性が保証された安全なモデルベースの強化学習	Reinforcement learning is a powerful paradigm for learning optimal policies from experimental data. However, to find optimal policies, most reinforcement learning algorithms explore all possible actions, which may be harmful for real-world systems. As a consequence, learning algorithms are rarely applied on safety-critical systems in the real world. In this paper, we present a learning algorithm that explicitly considers safety, defined in terms of stability guarantees. Specifically, we extend control-theoretic results on Lyapunov stability verification and show how to use statistical models of the dynamics to obtain high-performance control policies with provable stability certificates. Moreover, under additional regularity assumptions in terms of a Gaussian process prior, we prove that one can effectively and safely collect data in order to learn about the dynamics and thus both improve control performance and expand the safe region of the state space. In our experiments, we show how the resulting algorithm can safely optimize a neural network policy on a simulated inverted pendulum, without the pendulum ever falling down.	強化学習は、実験データから最適なポリシーを学習するための強力なパラダイムです。ただし、最適なポリシーを見つけるために、ほとんどの強化学習アルゴリズムは、現実世界のシステムにとって有害となる可能性のあるすべての可能なアクションを調査します。その結果、現実世界では安全性が重要なシステムに学習アルゴリズムが適用されることはほとんどありません。この論文では、安定性の保証の観点から定義された安全性を明示的に考慮した学習アルゴリズムを紹介します。具体的には、リアプノフ安定性検証に関する制御理論の結果を拡張し、ダイナミクスの統計モデルを使用して、証明可能な安定性証明書を備えた高性能制御ポリシーを取得する方法を示します。さらに、事前のガウス過程に関する追加の規則性の仮定の下で、ダイナミクスについて学習するためにデータを効果的かつ安全に収集できることを証明し、それによって制御性能の向上と状態空間の安全領域の拡大の両方が可能になります。私たちの実験では、結果として得られるアルゴリズムが、シミュレートされた倒立振子上で、振り子が決して落下することなく、ニューラルネットワークポリシーを安全に最適化できる方法を示します。
89	88	https://papers.nips.cc//paper/6693-online-multiclass-boosting	Online multiclass boosting	オンラインマルチクラスブースティング	Recent work has extended the theoretical analysis of boosting algorithms to multiclass problems and to online settings. However, the multiclass extension is in the batch setting and the online extensions only consider binary classification. We fill this gap in the literature by defining, and justifying, a weak learning condition for online multiclass boosting. This condition leads to an optimal boosting algorithm that requires the minimal number of weak learners to achieve a certain accuracy. Additionally, we propose an adaptive algorithm which is near optimal and enjoys an excellent performance on real data due to its adaptive property.	最近の研究では、ブースティングアルゴリズムの理論的分析がマルチクラス問題とオンライン設定に拡張されました。ただし、マルチクラス拡張はバッチ設定にあり、オンライン拡張はバイナリ分類のみを考慮します。私たちは、オンラインマルチクラスブースティングの弱い学習条件を定義し、正当化することで、文献のこのギャップを埋めます。この条件により、特定の精度を達成するために最小限の数の弱学習器を必要とする最適なブースティングアルゴリズムが導き出されます。さらに、最適に近く、その適応特性により実データ上で優れたパフォーマンスを享受できる適応アルゴリズムを提案します。
90	89	https://papers.nips.cc//paper/6694-matching-on-balanced-nonlinear-representations-for-treatment-effects-estimation	Matching on Balanced Nonlinear Representations for Treatment Effects Estimation	治療効果推定のためのバランスの取れた非線形表現のマッチング	Estimating treatment effects from observational data is challenging due to the missing counterfactuals. Matching is an effective strategy to tackle this problem. The widely used matching estimators such as nearest neighbor matching (NNM) pair the treated units with the most similar control units in terms of covariates, and then estimate treatment effects accordingly. However, the existing matching estimators have poor performance when the distributions of control and treatment groups are unbalanced. Moreover, theoretical analysis suggests that the bias of causal effect estimation would increase with the dimension of covariates. In this paper, we aim to address these problems by learning low-dimensional balanced and nonlinear representations (BNR) for observational data. In particular, we convert counterfactual prediction as a classification problem, develop a kernel learning model with domain adaptation constraint, and design a novel matching estimator. The dimension of covariates will be significantly reduced after projecting data to a low-dimensional subspace. Experiments on several synthetic and real-world datasets demonstrate the effectiveness of our approach.	反事実が欠けているため、観察データから治療効果を推定することは困難です。マッチングは、この問題に対処するための効果的な戦略です。最近傍マッチング (NNM) などの広く使用されているマッチング推定器は、共変量の観点から治療対象ユニットと最も類似したコントロールユニットを組み合わせ、それに応じて治療効果を推定します。ただし、既存のマッチング推定器は、対照群と治療群の分布が不均衡な場合にはパフォーマンスが低下します。さらに、理論分析は、因果効果推定の偏りは共変量の次元とともに増加することを示唆しています。この論文では、観測データの低次元平衡非線形表現 (BNR) を学習することで、これらの問題に対処することを目的としています。特に、反事実予測を分類問題として変換し、ドメイン適応制約を備えたカーネル学習モデルを開発し、新しいマッチング推定器を設計します。共変量の次元は、データを低次元の部分空間に投影した後、大幅に減少します。いくつかの合成データセットと現実世界のデータセットでの実験により、私たちのアプローチの有効性が実証されました。
91	90	https://papers.nips.cc//paper/6695-learning-overcomplete-hmms	Learning Overcomplete HMMs	過完全な HMM の学習	We study the basic problem of learning overcomplete HMMs---those that have many hidden states but a small output alphabet. Despite having significant practical importance, such HMMs are poorly understood with no known positive or negative results for efficient learning. In this paper, we present several new results---both positive and negative---which help define the boundaries between the tractable-learning setting and the intractable setting. We show positive results for a large subclass of HMMs whose transition matrices are sparse, well-conditioned and have small probability mass on short cycles. We also show that learning is impossible given only a polynomial number of samples for HMMs with a small output alphabet and whose transition matrices are random regular graphs with large degree. We also discuss these results in the context of learning HMMs which can capture long-term dependencies.	私たちは、過完全な HMM、つまり多くの隠れ状態を持つが出力アルファベットが小さい HMM を学習するという基本的な問題を研究します。このような HMM は実用上重要な意味を持っているにもかかわらず、効率的な学習に関して肯定的な結果も否定的な結果も得られず、十分に理解されていません。この論文では、肯定的なものと否定的なものの両方で、扱いやすい学習環境と扱いにくい学習環境の間の境界を定義するのに役立ついくつかの新しい結果を紹介します。遷移行列が疎でよく条件付けされており、短いサイクルで確率質量が小さい HMM の大規模なサブクラスに対して肯定的な結果が得られます。また、出力アルファベットが小さく、遷移行列が次数の大きいランダムな正規グラフである HMM のサンプルの多項式数だけが与えられた場合、学習が不可能であることも示します。また、これらの結果について、長期的な依存関係を捕捉できる HMM の学習との関連で説明します。
92	91	https://papers.nips.cc//paper/6696-gp-cake-effective-brain-connectivity-with-causal-kernels	GP CaKe: Effective brain connectivity with causal kernels	GP CaKe: 因果核との効果的な脳接続	A fundamental goal in network neuroscience is to understand how activity in one brain region drives activity elsewhere, a process referred to as effective connectivity. Here we propose to model this causal interaction using integro-differential equations and causal kernels that allow for a rich analysis of effective connectivity. The approach combines the tractability and flexibility of autoregressive modeling with the biophysical interpretability of dynamic causal modeling. The causal kernels are learned nonparametrically using Gaussian process regression, yielding an efficient framework for causal inference. We construct a novel class of causal covariance functions that enforce the desired properties of the causal kernels, an approach which we call GP CaKe. By construction, the model and its hyperparameters have biophysical meaning and are therefore easily interpretable. We demonstrate the efficacy of GP CaKe on a number of simulations and give an example of a realistic application on magnetoencephalography (MEG) data.	ネットワーク神経科学の基本的な目標は、ある脳領域の活動が他の領域の活動をどのように推進するのか、つまり効果的な接続と呼ばれるプロセスを理解することです。ここでは、効果的な接続性の豊富な分析を可能にする積分微分方程式と因果カーネルを使用して、この因果的相互作用をモデル化することを提案します。このアプローチは、自己回帰モデリングの扱いやすさと柔軟性と、動的因果モデリングの生物物理学的解釈可能性を組み合わせています。因果カーネルはガウス過程回帰を使用してノンパラメトリックに学習され、因果推論のための効率的なフレームワークが得られます。私たちは、因果カーネルの望ましい特性を強制する因果共分散関数の新しいクラスを構築します。これは GP CaKe と呼ばれるアプローチです。構造上、モデルとそのハイパーパラメータには生物物理学的意味があるため、簡単に解釈できます。我々は、多くのシミュレーションで GP CaKe の有効性を実証し、脳磁図 (MEG) データの現実的な応用例を示します。
93	92	https://papers.nips.cc//paper/6697-decoupling-when-to-update-from-how-to-update	Decoupling "when to update" from "how to update"	「いつ更新するか」を切り離す「アップデート方法」より	Deep learning requires data. A useful approach to obtain data is to be creative and mine data from various sources, that were created for different purposes. Unfortunately, this approach often leads to noisy labels. In this paper, we propose a meta algorithm for tackling the noisy labels problem. The key idea is to decouple ``when to update'' from ``how to update''. We demonstrate the effectiveness of our algorithm by mining data for gender classification by combining the Labeled Faces in the Wild (LFW) face recognition dataset with a textual genderizing service, which leads to a noisy dataset. While our approach is very simple to implement, it leads to state-of-the-art results. We analyze some convergence properties of the proposed algorithm.	ディープラーニングにはデータが必要です。データを取得するための有効なアプローチは、創造性を発揮して、さまざまな目的で作成されたさまざまなソースからデータをマイニングすることです。残念ながら、このアプローチではラベルにノイズが多くなることがよくあります。この論文では、ノイズの多いラベルの問題に取り組むためのメタアルゴリズムを提案します。重要な考え方は、「いつ更新するか」と「更新方法」を切り離すことです。 Labeled Faces in the Wild (LFW) 顔認識データセットとテキストの性別化サービスを組み合わせて性別分類用のデータをマイニングすることで、アルゴリズムの有効性を実証します。これにより、ノイズの多いデータセットが生成されます。私たちのアプローチは実装が非常に簡単ですが、最先端の結果が得られます。提案されたアルゴリズムのいくつかの収束特性を分析します。
94	93	https://papers.nips.cc//paper/6698-self-normalizing-neural-networks	Self-Normalizing Neural Networks	自己正規化ニューラルネットワーク	Deep Learning has revolutionized vision via convolutional neural networks (CNNs) and natural language processing via recurrent neural networks (RNNs). However, success stories of Deep Learning with standard feed-forward neural networks (FNNs) are rare. FNNs that perform well are typically shallow and, therefore cannot exploit many levels of abstract representations. We introduce self-normalizing neural networks (SNNs) to enable high-level abstract representations. While batch normalization requires explicit normalization, neuron activations of SNNs automatically converge towards zero mean and unit variance. The activation function of SNNs are ""scaled exponential linear units"" (SELUs), which induce self-normalizing properties. Using the Banach fixed-point theorem, we prove that activations close to zero mean and unit variance that are propagated through many network layers will converge towards zero mean and unit variance -- even under the presence of noise and perturbations. This convergence property of SNNs allows to (1) train deep networks with many layers, (2) employ strong regularization, and (3) to make learning highly robust. Furthermore, for activations not close to unit variance, we prove an upper and lower bound on the variance, thus, vanishing and exploding gradients are impossible. We compared SNNs on (a) 121 tasks from the UCI machine learning repository, on (b) drug discovery benchmarks, and on (c) astronomy tasks with standard FNNs and other machine learning methods such as random forests and support vector machines. For FNNs we considered (i) ReLU networks without normalization, (ii) batch normalization, (iii) layer normalization, (iv) weight normalization, (v) highway networks, (vi) residual networks. SNNs significantly outperformed all competing FNN methods at 121 UCI tasks, outperformed all competing methods at the Tox21 dataset, and set a new record at an astronomy data set. The winning SNN architectures are often very deep.	ディープラーニングは、畳み込みニューラルネットワーク (CNN) による視覚とリカレントニューラルネットワーク (RNN) による自然言語処理に革命をもたらしました。ただし、標準のフィードフォワードニューラルネットワーク (FNN) を使用した深層学習の成功例はまれです。良好なパフォーマンスを発揮する FNN は通常、浅いため、多くのレベルの抽象表現を利用できません。自己正規化ニューラルネットワーク (SNN) を導入して、高レベルの抽象表現を可能にします。バッチ正規化には明示的な正規化が必要ですが、SNN のニューロンの活性化は、ゼロ平均と単位分散に向かって自動的に収束します。 SNN の活性化関数は「スケーリングされた指数線形単位」(SELU) であり、自己正規化特性を誘発します。バナッハの不動点定理を使用して、多くのネットワーク層を伝播する平均と単位分散がゼロに近い活性化は、ノイズや摂動が存在する場合でも、平均と単位分散がゼロに収束することを証明します。 SNN のこの収束特性により、(1) 多くの層で深いネットワークをトレーニングすること、(2) 強力な正則化を採用すること、(3) 学習を非常に堅牢にすることが可能になります。さらに、単位分散に近づかない活性化については、分散の上限と下限を証明するため、勾配の消失や爆発は不可能です。 (a) UCI 機械学習リポジトリの 121 タスク、(b) 創薬ベンチマーク、および (c) 天文学タスクに関する SNN を、標準 FNN およびランダムフォレストやサポートベクターマシンなどの他の機械学習手法と比較しました。 FNN については、(i) 正規化なしの ReLU ネットワーク、(ii) バッチ正規化、(iii) 層正規化、(iv) 重み正規化、(v) ハイウェイネットワーク、(vi) 残差ネットワークを検討しました。 SNN は、121 の UCI タスクで競合するすべての FNN メソッドを大幅に上回り、Tox21 データセットではすべての競合メソッドを上回り、天文学データセットでは新記録を樹立しました。優れた SNN アーキテクチャは、多くの場合、非常に奥深いものです。
95	94	https://papers.nips.cc//paper/6699-learning-to-pivot-with-adversarial-networks	Learning to Pivot with Adversarial Networks	敵対的ネットワークでピボットする方法を学ぶ	Several techniques for domain adaptation have been proposed to account for differences in the distribution of the data used for training and testing. The majority of this work focuses on a binary domain label. Similar problems occur in a scientific context where there may be a continuous family of plausible data generation processes associated to the presence of systematic uncertainties. Robust inference is possible if it is based on a pivot -- a quantity whose distribution does not depend on the unknown values of the nuisance parameters that parametrize this family of data generation processes. In this work, we introduce and derive theoretical results for a training procedure based on adversarial networks for enforcing the pivotal property (or, equivalently, fairness with respect to continuous attributes) on a predictive model. The method includes a hyperparameter to control the trade-off between accuracy and robustness. We demonstrate the effectiveness of this approach with a toy example and examples from particle physics.	トレーニングとテストに使用されるデータの分布の違いを考慮して、ドメイン適応のためのいくつかの手法が提案されています。この作業の大部分は、バイナリドメインラベルに焦点を当てています。同様の問題は、体系的な不確実性の存在に関連するもっともらしいデータ生成プロセスが連続して存在する可能性がある科学の分野でも発生します。ピボット (この一連のデータ生成プロセスをパラメータ化する迷惑パラメータの未知の値に分布が依存しない量) に基づいている場合、確実な推論が可能です。この研究では、予測モデルで極めて重要な特性 (または同等に、連続属性に関する公平性) を強制するための敵対的ネットワークに基づくトレーニング手順の理論的結果を導入し、導き出します。この方法には、精度と堅牢性の間のトレードオフを制御するハイパーパラメーターが含まれています。おもちゃの例と素粒子物理学の例を使用して、このアプローチの有効性を示します。
96	95	https://papers.nips.cc//paper/6700-schnet-a-continuous-filter-convolutional-neural-network-for-modeling-quantum-interactions	SchNet: A continuous-filter convolutional neural network for modeling quantum interactions	SchNet: 量子相互作用をモデル化するための連続フィルター畳み込みニューラルネットワーク	Deep learning has the potential to revolutionize quantum chemistry as it is ideally suited to learn representations for structured data and speed up the exploration of chemical space. While convolutional neural networks have proven to be the first choice for images, audio and video data, the atoms in molecules are not restricted to a grid. Instead, their precise locations contain essential physical information, that would get lost if discretized. Thus, we propose to use continuous-filter convolutional layers to be able to model local correlations without requiring the data to lie on a grid. We apply those layers in SchNet: a novel deep learning architecture modeling quantum interactions in molecules. We obtain a joint model for the total energy and interatomic forces that follows fundamental quantum-chemical principles. Our architecture achieves state-of-the-art performance for benchmarks of equilibrium molecules and molecular dynamics trajectories. Finally, we introduce a more challenging benchmark with chemical and structural variations that suggests the path for further work.	深層学習は、構造化データの表現を学習し、化学空間の探索を高速化するのに理想的に適しているため、量子化学に革命をもたらす可能性があります。畳み込みニューラルネットワークは、画像、音声、およびビデオデータの最初の選択肢であることが証明されていますが、分子内の原子はグリッドに制限されません。その代わりに、それらの正確な位置には、離散化されると失われてしまう重要な物理情報が含まれています。したがって、データをグリッド上に配置する必要なく、局所相関をモデル化できるように、連続フィルター畳み込み層を使用することを提案します。私たちはこれらの層を SchNet (分子内の量子相互作用をモデル化する新しい深層学習アーキテクチャ) に適用します。基本的な量子化学原理に従って、総エネルギーと原子間力の共同モデルが得られます。当社のアーキテクチャは、平衡分子と分子動力学軌道のベンチマークにおいて最先端のパフォーマンスを実現します。最後に、さらなる研究への道筋を示唆する、化学的および構造的変化を伴うより挑戦的なベンチマークを紹介します。
97	96	https://papers.nips.cc//paper/6701-active-bias-training-more-accurate-neural-networks-by-emphasizing-high-variance-samples	Active Bias: Training More Accurate Neural Networks by Emphasizing High Variance Samples	アクティブバイアス: 分散の高いサンプルを強調して、より正確なニューラルネットワークをトレーニングする	Self-paced learning and hard example mining re-weight training instances to improve learning accuracy. This paper presents two improved alternatives based on lightweight estimates of sample uncertainty in stochastic gradient descent (SGD): the variance in predicted probability of the correct class across iterations of mini-batch SGD, and the proximity of the correct class probability to the decision threshold. Extensive experimental results on six datasets show that our methods reliably improve accuracy in various network architectures, including additional gains on top of other popular training techniques, such as residual learning, momentum, ADAM, batch normalization, dropout, and distillation.	マイペース学習とハードサンプルマイニングにより、トレーニングインスタンスの重みが再調整され、学習の精度が向上します。この論文では、確率的勾配降下法 (SGD) におけるサンプルの不確実性の軽量推定値に基づく 2 つの改善された代替案を示します。それは、ミニバッチ SGD の反復にわたる正しいクラスの予測確率の分散と、決定閾値への正しいクラスの確率の近さです。。 6 つのデータセットに関する広範な実験結果は、私たちの手法が、残差学習、運動量、ADAM、バッチ正規化、ドロップアウト、蒸留などの他の一般的なトレーニング手法に加えて追加のゲインを含め、さまざまなネットワークアーキテクチャで精度を確実に向上させることを示しています。
98	97	https://papers.nips.cc//paper/6702-differentiable-learning-of-submodular-functions	Differentiable Learning of Submodular Functions	サブモジュール関数の微分可能学習	Can we incorporate discrete optimization algorithms within modern machine learning models? For example, is it possible to use in deep architectures a layer whose output is the minimal cut of a parametrized graph? Given that these models are trained end-to-end by leveraging gradient information, the introduction of such layers seems very challenging due to their non-continuous output. In this paper we focus on the problem of submodular minimization, for which we show that such layers are indeed possible. The key idea is that we can continuously relax the output without sacrificing guarantees. We provide an easily computable approximation to the Jacobian complemented with a complete theoretical analysis. Finally, these contributions let us experimentally learn probabilistic log-supermodular models via a bi-level variational inference formulation.	最新の機械学習モデルに離散最適化アルゴリズムを組み込むことはできるでしょうか?たとえば、出力がパラメーター化されたグラフの最小部分であるレイヤーをディープアーキテクチャで使用することは可能ですか?これらのモデルが勾配情報を活用してエンドツーエンドでトレーニングされることを考えると、出力が非連続であるため、このようなレイヤーの導入は非常に困難であるように思えます。この論文では、サブモジュールの最小化の問題に焦点を当て、そのような層が実際に可能であることを示します。重要なアイデアは、保証を犠牲にすることなく出力を継続的に緩和できるということです。完全な理論分析によって補完された、簡単に計算できるヤコビアンの近似値を提供します。最後に、これらの貢献により、バイレベル変分推論定式化を介して確率的対数スーパーモジュラーモデルを実験的に学習できるようになります。
99	98	https://papers.nips.cc//paper/6703-inductive-representation-learning-on-large-graphs	Inductive Representation Learning on Large Graphs	大きなグラフでの帰納的表現学習	Low-dimensional embeddings of nodes in large graphs have proved extremely useful in a variety of prediction tasks, from content recommendation to identifying protein functions. However, most existing approaches require that all nodes in the graph are present during training of the embeddings; these previous approaches are inherently transductive and do not naturally generalize to unseen nodes. Here we present GraphSAGE, a general, inductive framework that leverages node feature information (e.g., text attributes) to efficiently generate node embeddings. Instead of training individual embeddings for each node, we learn a function that generates embeddings by sampling and aggregating features from a node's local neighborhood. Our algorithm outperforms strong baselines on three inductive node-classification benchmarks: we classify the category of unseen nodes in evolving information graphs based on citation and Reddit post data, and we show that our algorithm generalizes to completely unseen graphs using a multi-graph dataset of protein-protein interactions.	大規模なグラフへのノードの低次元埋め込みは、コンテンツの推奨からタンパク質の機能の特定まで、さまざまな予測タスクで非常に役立つことが証明されています。ただし、既存のアプローチのほとんどでは、埋め込みのトレーニング中にグラフ内のすべてのノードが存在する必要があります。これらの以前のアプローチは本質的に変換的であり、目に見えないノードに自然に一般化するものではありません。ここでは、ノードの特徴情報 (テキスト属性など) を活用してノードの埋め込みを効率的に生成する一般的な帰納的フレームワークである GraphSAGE を紹介します。各ノードの個別のエンベディングをトレーニングする代わりに、ノードのローカル近傍から特徴をサンプリングして集約することによってエンベディングを生成する関数を学習します。私たちのアルゴリズムは、3 つの帰納的ノード分類ベンチマークで強力なベースラインを上回っています。引用と Reddit 投稿データに基づいて、進化する情報グラフ内の未見のノードのカテゴリを分類し、次のマルチグラフデータセットを使用して、アルゴリズムが完全に未見のグラフに一般化していることを示します。タンパク質間の相互作用。
100	99	https://papers.nips.cc//paper/6704-subset-selection-and-summarization-in-sequential-data	Subset Selection and Summarization in Sequential Data	連続データのサブセットの選択と要約	Subset selection, which is the task of finding a small subset of representative items from a large ground set, finds numerous applications in different areas. Sequential data, including time-series and ordered data, contain important structural relationships among items, imposed by underlying dynamic models of data, that should play a vital role in the selection of representatives. However, nearly all existing subset selection techniques ignore underlying dynamics of data and treat items independently, leading to incompatible sets of representatives. In this paper, we develop a new framework for sequential subset selection that finds a set of representatives compatible with the dynamic models of data. To do so, we equip items with transition dynamic models and pose the problem as an integer binary optimization over assignments of sequential items to representatives, that leads to high encoding, diversity and transition potentials. Our formulation generalizes the well-known facility location objective to deal with sequential data, incorporating transition dynamics among facilities. As the proposed formulation is non-convex, we derive a max-sum message passing algorithm to solve the problem efficiently. Experiments on synthetic and real data, including instructional video summarization, show that our sequential subset selection framework not only achieves better encoding and diversity than the state of the art, but also successfully incorporates dynamics of data, leading to compatible representatives.	サブセット選択は、大規模なグラウンドセットから代表的なアイテムの小さなサブセットを見つけるタスクであり、さまざまな分野で多数の用途があります。時系列データや順序付けされたデータなどのシーケンシャルデータには、基礎となるデータの動的モデルによって強制される項目間の重要な構造的関係が含まれており、代表の選択において重要な役割を果たすはずです。ただし、既存のほぼすべてのサブセット選択手法は、データの基礎となるダイナミクスを無視し、項目を独立して扱うため、互換性のない代表セットが生成されます。この論文では、データの動的モデルと互換性のある代表セットを見つける、逐次サブセット選択のための新しいフレームワークを開発します。そのために、項目に遷移動的モデルを装備し、連続した項目の代表への割り当てに対する整数バイナリ最適化として問題を提起します。これにより、高いエンコード、多様性、および遷移の可能性がもたらされます。私たちの定式化は、施設間の遷移ダイナミクスを組み込んで、連続データを処理するためのよく知られた施設の位置目標を一般化します。提案された定式化は非凸であるため、問題を効率的に解決するための最大合計メッセージパッシングアルゴリズムを導出します。教育用ビデオの要約を含む合成データと実際のデータに関する実験では、当社の逐次サブセット選択フレームワークが最先端技術よりも優れたエンコードと多様性を達成するだけでなく、データのダイナミクスをうまく組み込んで、互換性のある代表につながることを示しています。