Alexander Turner
Unknown affiliation
Verified email at mit.edu
Title
Cited by
Year
Robustness may be at odds with accuracy
D Tsipras, S Santurkar, L Engstrom, A Turner, A Madry
arXiv preprint arXiv:1805.12152, 2018
Cited by: 1721; Year: 2018
Label-consistent backdoor attacks
A Turner, D Tsipras, A Madry
arXiv preprint arXiv:1912.02771, 2019
Cited by: 444*; Year: 2019
There is no free lunch in adversarial robustness (but there are unexpected benefits)
D Tsipras, S Santurkar, L Engstrom, A Turner, A Madry
arXiv preprint arXiv:1805.12152 2 (3), 2018
Cited by: 91; Year: 2018
Optimal policies tend to seek power
AM Turner, L Smith, R Shah, A Critch, P Tadepalli
arXiv preprint arXiv:1912.01683, 2019
Cited by: 50; Year: 2019
Robustness may be at odds with accuracy. arXiv
D Tsipras, S Santurkar, L Engstrom, A Turner, A Madry
arXiv preprint arXiv:1805.12152 10, 2018
Cited by: 21; Year: 2018
Parametrically retargetable decision-makers tend to seek power
A Turner, P Tadepalli
Advances in Neural Information Processing Systems 35, 31391-31401, 2022
Cited by: 12; Year: 2022
Steering Llama 2 via contrastive activation addition
N Rimsky, N Gabrieli, J Schulz, M Tong, E Hubinger, AM Turner
arXiv preprint arXiv:2312.06681, 2023
Cited by: 6; Year: 2023
On avoiding power-seeking by artificial intelligence
AM Turner
arXiv preprint arXiv:2206.11831, 2022
Cited by: 2; Year: 2022
Understanding and Controlling a Maze-Solving Policy Network
U Mini, P Grietzer, M Sharma, A Meek, M MacDiarmid, AM Turner
arXiv preprint arXiv:2310.08043, 2023
Cited by: 1; Year: 2023
Formalizing the problem of side effect regularization
AM Turner, A Saxena, P Tadepalli
arXiv preprint arXiv:2206.11812, 2022
Cited by: 1; Year: 2022