The \(\alpha\)-entmax transformation of a score vector \(\boldsymbol{z} \in \mathbb{R}^n\) is defined as:
\[ \alpha\text{-entmax}(\boldsymbol{z}) = \arg\max_{\boldsymbol{p} \,\in\, \triangle_n} \boldsymbol{p}^\top \boldsymbol{z} + H_\alpha(\boldsymbol{p}), \quad \triangle_n = \bigl\{\boldsymbol{p} \in \mathbb{R}^n_{+}:\sum_{i=1}^n p_i = 1\bigr\}, \] where \(H_\alpha(\boldsymbol{p})\) is the Tsallis \(\alpha\)-entropy, \(H_\alpha(\boldsymbol{p}) = \frac{1}{\alpha(\alpha-1)} \sum_{j=1}^n \bigl(p_j - p_j^\alpha\bigr)\) for \(\alpha \neq 1\), which recovers the Shannon entropy in the limit \(\alpha \to 1\). For \(\alpha > 1\), the solution has the closed form:
\[ p_i^\star = \bigl[(\alpha-1)\,z_i - \tau(\boldsymbol{z})\bigr]_+^{\tfrac{1}{\alpha-1}}, \quad \sum_{i=1}^n p_i^\star = 1, \quad [\cdot]_+ = \max(0,\cdot). \] Here, \(\tau(\boldsymbol{z})\) is the unique threshold chosen so that \(\boldsymbol{p}^\star\) sums to 1. The family interpolates between softmax (\(\alpha \to 1\)) and sparsemax (\(\alpha = 2\)); for any \(\alpha > 1\), coordinates whose scaled scores fall below the threshold are truncated to exactly zero, so the output can be sparse.
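Since the closed form only determines \(p_i^\star\) up to the threshold \(\tau(\boldsymbol{z})\), computing it reduces to a one-dimensional root-finding problem: the normalization \(\sum_i p_i^\star\) is monotonically decreasing in \(\tau\), so bisection converges. Below is a minimal NumPy sketch along these lines (the function name `entmax` and the fixed iteration count are illustrative choices, not from the paper):

```python
import numpy as np

def entmax(z, alpha=1.5, n_iter=50):
    """alpha-entmax via bisection on the threshold tau (assumes alpha > 1).

    Solves p_i = [(alpha-1) z_i - tau]_+^(1/(alpha-1)) with sum(p) = 1.
    """
    z = np.asarray(z, dtype=float)
    s = (alpha - 1.0) * z  # scaled scores (alpha-1) z_i
    # tau lies in [max(s) - 1, max(s)): at the left end the largest
    # coordinate alone contributes 1, so sum(p) >= 1; at the right end
    # every coordinate is clipped to zero, so sum(p) = 0.
    lo, hi = s.max() - 1.0, s.max()
    for _ in range(n_iter):
        tau = 0.5 * (lo + hi)
        p = np.clip(s - tau, 0.0, None) ** (1.0 / (alpha - 1.0))
        if p.sum() >= 1.0:
            lo = tau  # sum too large: threshold must move up
        else:
            hi = tau  # sum too small: threshold must move down
    p = np.clip(s - lo, 0.0, None) ** (1.0 / (alpha - 1.0))
    return p / p.sum()  # renormalize away residual bisection error
```

For \(\alpha = 2\) this reproduces sparsemax: with \(\boldsymbol{z} = (0.5, 0.1)\), both coordinates stay in the support and \(\tau = -0.2\), giving \(\boldsymbol{p}^\star = (0.7, 0.3)\). Peters et al. (2019) likewise use bisection for general \(\alpha\), with exact sorting-based algorithms available in the special cases \(\alpha \in \{1, 1.5, 2\}\).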
Ben Peters, Vlad Niculae, and André F. T. Martins. 2019. Sparse sequence-to-sequence models. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 1504–1519, Florence, Italy. Association for Computational Linguistics. https://arxiv.org/abs/1905.05702