Self-attentive model for headline generation D Gavrilov, P Kalaidin, V Malykh Advances in Information Retrieval: 41st European Conference on IR Research …, 2019 | 69 | 2019 |
Implicit Unlikelihood Training: Improving Neural Text Generation with Reinforcement Learning E Lagutin, D Gavrilov, P Kalaidin Proceedings of the 16th Conference of the European Chapter of the …, 2021 | 18 | 2021 |
Learn your reference model for real good alignment A Gorbatovski, B Shaposhnikov, A Malakhov, N Surnachev, Y Aksenov, ... arXiv preprint arXiv:2404.09656, 2024 | 12 | 2024 |
PALBERT: Teaching ALBERT to Ponder N Balagansky, D Gavrilov Advances in Neural Information Processing Systems 35 (NeurIPS 2022), 14002 …, 2022 | 8 | 2022 |
Diffusion Language Models Generation Can Be Halted Early SM Lo Cicero Vaina, N Balagansky, D Gavrilov arXiv preprint arXiv:2305.10818, 2023 | 5 | 2023 |
Classifiers are better experts for controllable text generation A Sitdikov, N Balagansky, D Gavrilov, A Markov arXiv preprint arXiv:2205.07276, 2022 | 4 | 2022 |
Weight squeezing: Reparameterization for extreme compression and fast inference A Chumachenko, D Gavrilov, N Balagansky, P Kalaidin arXiv preprint arXiv:2010.06993, 2020 | 2 | 2020 |
Linear Transformers with Learnable Kernel Functions are Better In-Context Models Y Aksenov, N Balagansky, SM Lo Cicero Vaina, B Shaposhnikov, A Gorbatovski, ... arXiv preprint arXiv:2402.10644, 2024 | 1 | 2024 |
Ahead-of-Time P-Tuning D Gavrilov, N Balagansky arXiv preprint arXiv:2305.10835, 2023 | 1 | 2023 |
Linear interpolation in parameter space is good enough for fine-tuned language models M Rofin, N Balagansky, D Gavrilov arXiv preprint arXiv:2211.12092, 2022 | 1 | 2022 |
FastRPB: a Scalable Relative Positional Encoding for Long Sequence Tasks M Zubkov, D Gavrilov arXiv preprint arXiv:2202.11364, 2022 | | 2022 |