Matthew Rahtz
Google DeepMind
Verified email at google.com
Title
Cited by
Year
Ensembl 2016
A Yates, W Akanni, MR Amode, D Barrell, K Billis, D Carvalho-Silva, ...
Nucleic acids research 44 (D1), D710-D716, 2016
Cited by: 1633; Year: 2016
Gemini: a family of highly capable multimodal models
Gemini Team, R Anil, S Borgeaud, Y Wu, JB Alayrac, J Yu, R Soricut, ...
arXiv preprint arXiv:2312.11805, 2023
Cited by: 467; Year: 2023
Specification gaming: the flip side of AI ingenuity
V Krakovna, J Uesato, V Mikulik, M Rahtz, T Everitt, R Kumar, Z Kenton, ...
Cited by: 95; Year: 2020
Tracr: Compiled transformers as a laboratory for interpretability
D Lindner, J Kramár, S Farquhar, M Rahtz, T McGrath, V Mikulik
Advances in Neural Information Processing Systems 36, 2024
Cited by: 30; Year: 2024
Does circuit analysis interpretability scale? Evidence from multiple choice capabilities in Chinchilla
T Lieberum, M Rahtz, J Kramár, G Irving, R Shah, V Mikulik
arXiv preprint arXiv:2307.09458, 2023
Cited by: 26; Year: 2023
The hydra effect: Emergent self-repair in language model computations
T McGrath, M Rahtz, J Kramár, V Mikulik, S Legg
arXiv preprint arXiv:2307.15771, 2023
Cited by: 17; Year: 2023
Safe deep RL in 3D environments using human feedback
M Rahtz, V Varma, R Kumar, Z Kenton, S Legg, J Leike
arXiv preprint arXiv:2201.08102, 2022
Cited by: 7; Year: 2022
A Mechanism-Based Approach to Mitigating Harms from Persuasive Generative AI
S El-Sayed, C Akbulut, A McCroskery, G Keeling, Z Kenton, Z Jalan, ...
arXiv preprint arXiv:2404.15058, 2024
Cited by: 1; Year: 2024
Evaluating Frontier Models for Dangerous Capabilities
M Phuong, M Aitchison, E Catt, S Cogan, A Kaskasoli, V Krakovna, ...
arXiv preprint arXiv:2403.13793, 2024
Cited by: 1; Year: 2024
An extensible interactive interface for agent design
M Rahtz, J Fang, AD Dragan, D Hadfield-Menell
arXiv preprint arXiv:1906.02641, 2019
Cited by: 1; Year: 2019