The \(\alpha\)-entmax transformation of a score vector \(\boldsymbol{z} \in \mathbb{R}^n\) is defined as:
\[ \alpha\text{-entmax}(\boldsymbol{z}) = \arg\max_{\boldsymbol{p} \,\in\, \triangle_n} \boldsymbol{p}^\top \boldsymbol{z} + H_\alpha(\boldsymbol{p}), \quad \triangle_n = \bigl\{\boldsymbol{p} \in \mathbb{R}^n_{+}:\sum_{i=1}^n p_i = 1\bigr\}, \] where \(H_\alpha(\boldsymbol{p})\) is the Tsallis \(\alpha\)-entropy, \(H_\alpha(\boldsymbol{p}) = \frac{1}{\alpha(\alpha-1)} \sum_{j=1}^n \bigl(p_j - p_j^\alpha\bigr)\) for \(\alpha \neq 1\), which recovers the Shannon entropy in the limit \(\alpha \to 1\). For \(\alpha > 1\), the solution has the closed form:
\[ p_i^\star = \bigl[(\alpha-1)\,z_i - \tau(\boldsymbol{z})\bigr]_+^{\tfrac{1}{\alpha-1}}, \quad \sum_{i=1}^n p_i^\star = 1, \quad [\cdot]_+ = \max(0,\cdot). \] Here, \(\tau(\boldsymbol{z})\) is the unique threshold chosen so that \(\boldsymbol{p}^\star\) sums to 1. The family interpolates between softmax (\(\alpha \to 1\)) and sparsemax (\(\alpha = 2\)); for any \(\alpha > 1\), coordinates whose scaled scores fall below the threshold are truncated to exactly zero, so the output can be sparse.
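Since the closed form only determines \(p_i^\star\) up to the threshold \(\tau(\boldsymbol{z})\), computing it reduces to a one-dimensional root-finding problem: the normalization \(\sum_i p_i^\star\) is monotonically decreasing in \(\tau\), so bisection converges. Below is a minimal NumPy sketch along these lines (the function name `entmax` and the fixed iteration count are illustrative choices, not from the paper):

```python
import numpy as np

def entmax(z, alpha=1.5, n_iter=50):
    """alpha-entmax via bisection on the threshold tau (assumes alpha > 1).

    Solves p_i = [(alpha-1) z_i - tau]_+^(1/(alpha-1)) with sum(p) = 1.
    """
    z = np.asarray(z, dtype=float)
    s = (alpha - 1.0) * z  # scaled scores (alpha-1) z_i
    # tau lies in [max(s) - 1, max(s)): at the left end the largest
    # coordinate alone contributes 1, so sum(p) >= 1; at the right end
    # every coordinate is clipped to zero, so sum(p) = 0.
    lo, hi = s.max() - 1.0, s.max()
    for _ in range(n_iter):
        tau = 0.5 * (lo + hi)
        p = np.clip(s - tau, 0.0, None) ** (1.0 / (alpha - 1.0))
        if p.sum() >= 1.0:
            lo = tau  # sum too large: threshold must move up
        else:
            hi = tau  # sum too small: threshold must move down
    p = np.clip(s - lo, 0.0, None) ** (1.0 / (alpha - 1.0))
    return p / p.sum()  # renormalize away residual bisection error
```

For \(\alpha = 2\) this reproduces sparsemax: with \(\boldsymbol{z} = (0.5, 0.1)\), both coordinates stay in the support and \(\tau = -0.2\), giving \(\boldsymbol{p}^\star = (0.7, 0.3)\). Peters et al. (2019) likewise use bisection for general \(\alpha\), with exact sorting-based algorithms available in the special cases \(\alpha \in \{1, 1.5, 2\}\).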
Ben Peters, Vlad Niculae, and André F. T. Martins. 2019. Sparse sequence-to-sequence models. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 1504–1519, Florence, Italy. Association for Computational Linguistics. https://arxiv.org/abs/1905.05702