Deep reinforcement learning in large discrete action spaces G Dulac-Arnold, R Evans, H van Hasselt, P Sunehag, T Lillicrap, J Hunt, ... arXiv preprint arXiv:1512.07679, 2015 | 236 | 2015 |
Value-Decomposition Networks For Cooperative Multi-Agent Learning Based On Team Reward. P Sunehag, G Lever, A Gruslys, WM Czarnecki, VF Zambaldi, ... AAMAS, 2085-2087, 2018 | 137 | 2018 |
Value-decomposition networks for cooperative multi-agent learning P Sunehag, G Lever, A Gruslys, WM Czarnecki, V Zambaldi, M Jaderberg, ... arXiv preprint arXiv:1706.05296, 2017 | 108 | 2017 |
Variable metric stochastic approximation theory P Sunehag, J Trumpf, SVN Vishwanathan, N Schraudolph Artificial Intelligence and Statistics, 560-566, 2009 | 37 | 2009 |
Wearable sensor activity analysis using semi-Markov models with a grammar O Thomas, P Sunehag, G Dror, S Yun, S Kim, M Robards, A Smola, ... Pervasive and Mobile Computing 6 (3), 342-350, 2010 | 36 | 2010 |
The sample-complexity of general reinforcement learning T Lattimore, M Hutter, P Sunehag Proceedings of the 30th International Conference on Machine Learning, 2013 | 34 | 2013 |
Deep reinforcement learning with attention for slate Markov decision processes with high-dimensional states and actions P Sunehag, R Evans, G Dulac-Arnold, Y Zwols, D Visentin, B Coppin arXiv preprint arXiv:1512.01124, 2015 | 17 | 2015 |
Semi-Markov kMeans clustering and activity recognition from body-worn sensors MW Robards, P Sunehag 2009 Ninth IEEE International Conference on Data Mining, 438-446, 2009 | 16 | 2009 |
Malthusian reinforcement learning JZ Leibo, J Perolat, E Hughes, S Wheelwright, AH Marblestone, ... arXiv preprint arXiv:1812.07019, 2018 | 15 | 2018 |
Consistency of feature Markov processes P Sunehag, M Hutter Algorithmic Learning Theory, 360-374, 2010 | 15 | 2010 |
Feature Reinforcement Learning In Practice P Nguyen, P Sunehag, M Hutter arXiv preprint arXiv:1108.3614, 2011 | 14 | 2011 |
Optimistic agents are asymptotically optimal P Sunehag, M Hutter Australasian Joint Conference on Artificial Intelligence, 15-26, 2012 | 13 | 2012 |
Adaptive context tree weighting A O'Neill, M Hutter, W Shao, P Sunehag 2012 Data Compression Conference, 317-326, 2012 | 13 | 2012 |
Feature reinforcement learning: state of the art M Daswani, P Sunehag, M Hutter Sequential decision-making with big data: papers from the AAAI-14 workshop, 2014 | 12 | 2014 |
Rationality, optimism and guarantees in general reinforcement learning P Sunehag, M Hutter The Journal of Machine Learning Research 16 (1), 1345-1390, 2015 | 11 | 2015 |
(Non-) equivalence of universal priors I Wood, P Sunehag, M Hutter Algorithmic Probability and Friends. Bayesian Prediction and Artificial …, 2013 | 11 | 2013 |
Context tree maximizing reinforcement learning P Nguyen, P Sunehag, M Hutter Proceedings of the 26th AAAI Conference on Artificial Intelligence, 2012 | 11 | 2012 |
Axioms for rational reinforcement learning P Sunehag, M Hutter Algorithmic Learning Theory, 338-352, 2011 | 10 | 2011 |
Optimistic AIXI P Sunehag, M Hutter International Conference on Artificial General Intelligence, 312-321, 2012 | 9 | 2012 |
Sparse Kernel-SARSA(λ) with an eligibility trace M Robards, P Sunehag, S Sanner, B Marthi Machine Learning and Knowledge Discovery in Databases, 1-17, 2011 | 9 | 2011 |