B. D. Argall, S. Chernova, M. Veloso & B. Browning (2009):
A survey of robot learning from demonstration.
Robotics and Autonomous Systems 57(5),
pp. 469–483,
doi:10.1016/j.robot.2008.10.024.
A. Barto & S. Mahadevan (2003):
Recent Advances in Hierarchical Reinforcement Learning.
Discrete Event Dynamic Systems 13,
pp. 41–77,
doi:10.1023/A:1022140919877.
S. Bhatnagar, R. Sutton, M. Ghavamzadeh & M. Lee (2009):
Natural Actor-Critic Algorithms.
Automatica 45(11),
pp. 2471–2482,
doi:10.1016/j.automatica.2009.07.008.
A. Cimatti, M. Pistore & P. Traverso (2008):
Automated planning.
In: F. van Harmelen, V. Lifschitz & B. Porter, editors: Handbook of Knowledge Representation.
Elsevier,
doi:10.1016/S1574-6526(07)03022-2.
S. T. Erdoğan (2008):
A Library of General-Purpose Action Descriptions.
Ph.D. thesis, University of Texas at Austin.
S. Griffith, K. Subramanian, J. Scholz, C. L. Isbell & A. L. Thomaz (2013):
Policy shaping: Integrating human feedback with reinforcement learning.
In: Advances in Neural Information Processing Systems (NeurIPS),
pp. 2625–2633.
M. Hanheide, M. Göbelbecker & G. S. Horn (2015):
Robot task planning and explanation in open and uncertain worlds.
Artificial Intelligence,
doi:10.1016/j.artint.2015.08.008.
M. Helmert (2006):
The Fast Downward planning system.
Journal of Artificial Intelligence Research 26,
pp. 191–246,
doi:10.1613/jair.1705.
C. Hogg, U. Kuter & H. Muñoz-Avila (2010):
Learning Methods to Generate Good Plans: Integrating HTN Learning and Reinforcement Learning.
In: Association for the Advancement of Artificial Intelligence (AAAI).
D. Inclezan & M. Gelfond (2016):
Modular action language ALM.
Theory and Practice of Logic Programming 16(2),
pp. 189–235,
doi:10.1017/S1471068415000095.
Y. Jiang, F. Yang, S. Zhang & P. Stone (2019):
Task-Motion Planning with Reinforcement Learning for Adaptable Mobile Service Robots.
In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
P. Khandelwal, F. Yang, M. Leonetti, V. Lifschitz & P. Stone (2014):
Planning in Action Language BC while Learning Action Costs for Mobile Robots.
In: International Conference on Automated Planning and Scheduling (ICAPS).
P. Khandelwal, S. Zhang, J. Sinapov, M. Leonetti, J. Thomason, F. Yang, I. Gori, M. Svetlik, P. Khante & V. Lifschitz (2017):
BWIBots: A platform for bridging the gap between AI and human–robot interaction research.
The International Journal of Robotics Research 36(5-7),
pp. 635–659,
doi:10.1177/0278364916688949.
W. B. Knox & P. Stone (2009):
Interactively shaping agents via human reinforcement: The TAMER framework.
In: Proceedings of the Fifth International Conference on Knowledge Capture.
ACM,
pp. 9–16,
doi:10.1145/1597735.1597738.
W. B. Knox & P. Stone (2010):
Combining manual feedback with subsequent MDP reward signals for reinforcement learning.
In: Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems - Volume 1.
International Foundation for Autonomous Agents and Multiagent Systems,
pp. 5–12.
W. B. Knox & P. Stone (2012):
Reinforcement learning from simultaneous human and MDP reward.
In: Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems - Volume 1.
International Foundation for Autonomous Agents and Multiagent Systems,
pp. 475–482.
J. Lee, V. Lifschitz & F. Yang (2013):
Action Language BC: A Preliminary Report.
In: International Joint Conference on Artificial Intelligence (IJCAI).
M. Leonetti, L. Iocchi & F. Patrizi (2012):
Automatic generation and learning of finite-state controllers.
In: International Conference on Artificial Intelligence: Methodology, Systems, and Applications.
Springer,
pp. 135–144.
M. Leonetti, L. Iocchi & P. Stone (2016):
A synthesis of automated planning and reinforcement learning for efficient, robust decision-making.
Artificial Intelligence 241,
pp. 103–130,
doi:10.1016/j.artint.2016.07.004.
V. Lifschitz & W. Ren (2006):
A modular action description language.
In: Association for the Advancement of Artificial Intelligence (AAAI),
pp. 853–859.
D. Lyu, F. Yang, B. Liu & S. Gustafson (2019):
SDRL: Interpretable and Data-efficient Deep Reinforcement Learning Leveraging Symbolic Planning.
In: Association for the Advancement of Artificial Intelligence (AAAI).
J. MacGlashan, M. K. Ho, R. Loftin, B. Peng, G. Wang, D. L. Roberts, M. E. Taylor & M. L. Littman (2017):
Interactive Learning from Policy-Dependent Human Feedback.
In: International Conference on Machine Learning (ICML).
J. MacGlashan, M. L. Littman, D. L. Roberts, R. Loftin, B. Peng & M. E. Taylor (2016):
Convergent Actor Critic by Humans.
In: International Conference on Intelligent Robots and Systems.
J. McCarthy (1987):
Generality in Artificial Intelligence.
Communications of the ACM (CACM) 30(12),
doi:10.1145/33447.33448.
V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland & G. Ostrovski (2015):
Human-level control through deep reinforcement learning.
Nature 518(7540),
pp. 529–533,
doi:10.1038/nature14236.
A. Y. Ng & S. J. Russell (2000):
Algorithms for inverse reinforcement learning.
In: International Conference on Machine Learning (ICML),
pp. 663–670.
R. Parr & S. J. Russell (1998):
Reinforcement learning with hierarchies of machines.
In: Advances in Neural Information Processing Systems (NeurIPS),
pp. 1043–1049.
J. Peters & S. Schaal (2008):
Natural actor-critic.
Neurocomputing 71(7),
pp. 1180–1190,
doi:10.1016/j.neucom.2007.11.026.
S. Rosenthal, M. M. Veloso & A. K. Dey (2011):
Learning Accuracy and Availability of Humans Who Help Mobile Robots.
In: Association for the Advancement of Artificial Intelligence (AAAI).
S. L. Rosenthal (2012):
Human-centered planning for effective task autonomy.
Technical Report.
School of Computer Science, Carnegie Mellon University, Pittsburgh, PA.
M. R. K. Ryan (2002):
Using abstract models of behaviours to automatically generate reinforcement learning hierarchies.
In: Proceedings of the 19th International Conference on Machine Learning (ICML).
Morgan Kaufmann,
pp. 522–529.
M. R. K. Ryan & M. D. Pendrith (1998):
RL-TOPs: An Architecture for Modularity and Re-Use in Reinforcement Learning.
In: Proceedings of the Fifteenth International Conference on Machine Learning (ICML).
Morgan Kaufmann,
pp. 481–487.
J. Schulman, S. Levine, P. Abbeel, M. Jordan & P. Moritz (2015):
Trust region policy optimization.
In: Proceedings of the 32nd International Conference on Machine Learning (ICML),
pp. 1889–1897.
J. Schulman, P. Moritz, S. Levine, M. Jordan & P. Abbeel (2015):
High-dimensional continuous control using generalized advantage estimation.
arXiv preprint arXiv:1506.02438.
A. Schwartz (1993):
A Reinforcement Learning Method for Maximizing Undiscounted Rewards.
In: International Conference on Machine Learning (ICML).
Morgan Kaufmann, San Francisco, CA,
doi:10.1016/B978-1-55860-307-3.50045-9.
R. S. Sutton & A. G. Barto (2018):
Reinforcement learning: An introduction.
MIT Press.
R. S. Sutton, D. Precup & S. Singh (1999):
Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning.
Artificial Intelligence 112(1-2),
pp. 181–211,
doi:10.1016/S0004-3702(99)00052-1.
A. L. Thomaz & C. Breazeal (2008):
Teachable robots: Understanding human teaching behavior to build more effective robot learners.
Artificial Intelligence 172(6-7),
pp. 716–737,
doi:10.1016/j.artint.2007.09.009.
A. L. Thomaz & C. Breazeal (2006):
Reinforcement learning with human teachers: Evidence of feedback and guidance with implications for learning performance.
In: Association for the Advancement of Artificial Intelligence (AAAI).
Boston, MA,
pp. 1000–1005.
P. A. Tsividis, T. Pouncy, J. L. Xu, J. B. Tenenbaum & S. J. Gershman (2017):
Human learning in Atari.
In: AAAI Spring Symposium Series.
R. J. Williams (1992):
Simple statistical gradient-following algorithms for connectionist reinforcement learning.
Machine Learning 8(3-4),
pp. 229–256,
doi:10.1023/A:1022672621406.
F. Yang, D. Lyu, B. Liu & S. Gustafson (2018):
PEORL: Integrating Symbolic Planning and Hierarchical Reinforcement Learning for Robust Decision-Making.
In: International Joint Conference on Artificial Intelligence (IJCAI),
doi:10.24963/ijcai.2018/675.