Imitation learning of car driving skills with decision trees and random forests

Paweł Cichosz; Łukasz Pawełczak

International Journal of Applied Mathematics and Computer Science (2014)

  • Volume: 24, Issue: 3, page 579-597
  • ISSN: 1641-876X

Abstract

Machine learning is an appealing and useful approach to creating vehicle control algorithms, for both simulated and real vehicles. One common and widely applicable learning scenario is learning by imitation, in which the behavior of an exemplary driver provides training instances for a supervised learning algorithm. This article follows this approach in the domain of simulated car racing, using the TORCS simulator. In contrast to most prior work on imitation learning, a symbolic decision tree knowledge representation is adopted, which combines potentially high accuracy with human readability, an advantage that can be important in many applications. Decision trees are demonstrated to be capable of representing high-quality control models, reaching the performance level of sophisticated pre-designed algorithms. This is achieved by enhancing the basic imitation learning scenario to include active retraining, automatically triggered on control failures. It is also demonstrated how better stability and generalization can be achieved by sacrificing human readability and using decision tree model ensembles. The methodology for learning control models contributed by this article can hopefully be applied to solve real-world control tasks, as well as to develop video game bots.
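
To make the learning scenario concrete, here is a minimal sketch of the pipeline the abstract describes: recording (state, action) training instances from an exemplary driver, fitting a human-readable decision tree controller, retraining whenever the cloned controller fails, and finally trading readability for stability with a random forest. All of it is illustrative: the ToySimulator stub, the rule-based expert, and the two-feature state are invented stand-ins for the TORCS interface, and scikit-learn stands in for the R packages (rpart, randomForest) cited in the references below.

    # Illustrative sketch only: ToySimulator, expert_action and the two-feature
    # state are hypothetical stand-ins for the TORCS interface; the paper itself
    # used R's rpart and randomForest rather than scikit-learn.
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.ensemble import RandomForestClassifier

    class ToySimulator:
        """Stand-in for TORCS: state = (track angle, distance to track edge)."""
        def __init__(self, rng):
            self.rng = rng
        def get_state(self):
            return self.rng.uniform(-1.0, 1.0, size=2)
        def apply_action(self, action):
            pass  # a real simulator would advance the car physics here
        def off_track(self):
            return self.rng.random() < 0.001  # rare, random "control failure"

    def expert_action(state):
        """Toy exemplary driver: steer against the track angle (left/straight/right)."""
        angle = state[0]
        return 0 if angle < -0.1 else (2 if angle > 0.1 else 1)

    def record_expert_trace(sim, n_steps):
        """Collect (state, action) training instances by watching the expert."""
        X = np.array([sim.get_state() for _ in range(n_steps)])
        y = np.array([expert_action(s) for s in X])
        return X, y

    def drive(sim, model, n_steps):
        """Drive with the learned controller; report whether a failure occurred."""
        for _ in range(n_steps):
            s = sim.get_state()
            sim.apply_action(model.predict(s.reshape(1, -1))[0])
            if sim.off_track():
                return True
        return False

    # Basic imitation learning: a readable decision tree cloned from the expert.
    sim = ToySimulator(np.random.default_rng(0))
    X, y = record_expert_trace(sim, 5000)
    model = DecisionTreeClassifier(max_depth=8).fit(X, y)

    # Active retraining: each control failure triggers fresh expert data and a refit.
    while drive(sim, model, n_steps=2000):
        X_new, y_new = record_expert_trace(sim, 1000)
        X, y = np.vstack([X, X_new]), np.concatenate([y, y_new])
        model = DecisionTreeClassifier(max_depth=8).fit(X, y)

    # Sacrificing readability for stability: an ensemble over the same data.
    forest = RandomForestClassifier(n_estimators=100).fit(X, y)

The final forest model corresponds to the abstract's last step: an ensemble that generalizes more stably than a single tree, at the cost of no longer being readable by a human.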

How to cite


Cichosz, Paweł, and Łukasz Pawełczak. "Imitation learning of car driving skills with decision trees and random forests." International Journal of Applied Mathematics and Computer Science 24.3 (2014): 579-597. <http://eudml.org/doc/271897>.

@article{CichoszPawelczak2014,
abstract = {Machine learning is an appealing and useful approach to creating vehicle control algorithms, for both simulated and real vehicles. One common and widely applicable learning scenario is learning by imitation, in which the behavior of an exemplary driver provides training instances for a supervised learning algorithm. This article follows this approach in the domain of simulated car racing, using the TORCS simulator. In contrast to most prior work on imitation learning, a symbolic decision tree knowledge representation is adopted, which combines potentially high accuracy with human readability, an advantage that can be important in many applications. Decision trees are demonstrated to be capable of representing high-quality control models, reaching the performance level of sophisticated pre-designed algorithms. This is achieved by enhancing the basic imitation learning scenario to include active retraining, automatically triggered on control failures. It is also demonstrated how better stability and generalization can be achieved by sacrificing human readability and using decision tree model ensembles. The methodology for learning control models contributed by this article can hopefully be applied to solve real-world control tasks, as well as to develop video game bots.},
author = {Paweł Cichosz and Łukasz Pawełczak},
journal = {International Journal of Applied Mathematics and Computer Science},
keywords = {imitation learning; behavioral cloning; decision trees; model ensembles; random forest; control; autonomous driving; car racing},
language = {eng},
number = {3},
pages = {579-597},
title = {Imitation learning of car driving skills with decision trees and random forests},
url = {http://eudml.org/doc/271897},
volume = {24},
year = {2014},
}

TY - JOUR
AU - Cichosz, Paweł
AU - Pawełczak, Łukasz
TI - Imitation learning of car driving skills with decision trees and random forests
JO - International Journal of Applied Mathematics and Computer Science
PY - 2014
VL - 24
IS - 3
SP - 579
EP - 597
AB - Machine learning is an appealing and useful approach to creating vehicle control algorithms, for both simulated and real vehicles. One common and widely applicable learning scenario is learning by imitation, in which the behavior of an exemplary driver provides training instances for a supervised learning algorithm. This article follows this approach in the domain of simulated car racing, using the TORCS simulator. In contrast to most prior work on imitation learning, a symbolic decision tree knowledge representation is adopted, which combines potentially high accuracy with human readability, an advantage that can be important in many applications. Decision trees are demonstrated to be capable of representing high-quality control models, reaching the performance level of sophisticated pre-designed algorithms. This is achieved by enhancing the basic imitation learning scenario to include active retraining, automatically triggered on control failures. It is also demonstrated how better stability and generalization can be achieved by sacrificing human readability and using decision tree model ensembles. The methodology for learning control models contributed by this article can hopefully be applied to solve real-world control tasks, as well as to develop video game bots.
LA - eng
KW - imitation learning; behavioral cloning; decision trees; model ensembles; random forest; control; autonomous driving; car racing
UR - http://eudml.org/doc/271897
ER -

References

  1. Anderson, C.W., Draper, B.A. and Peterson, D.A. (2000). Behavioral cloning of student pilots with modular neural networks, Proceedings of the 17th International Conference on Machine Learning (ML-2000), Stanford, CA, USA, pp. 25-32. 
  2. Atkeson, C.G. and Schaal, S. (1997). Robot learning from demonstration, Proceedings of the 14th International Conference on Machine Learning (ML-97), Nashville, TN, USA, pp. 12-20. 
  3. Baluja, S. (1996). Evolution of an artificial neural network based autonomous land vehicle controller, IEEE Transactions on Systems, Man and Cybernetics 26(3): 450-463. 
  4. Bratko, I., Urbancic, T. and Sammut, C. (1998). Behavioural cloning of control skill, in R.S. Michalski, I. Bratko and M. Kubat (Eds.), Machine Learning and Data Mining, John Wiley & Sons, Chichester. 
  5. Breiman, L. (1996). Bagging predictors, Machine Learning 24(2): 123-140. Zbl0858.68080
  6. Breiman, L. (2001). Random forests, Machine Learning 45(1): 5-32. 
  7. Breiman, L., Friedman, J.H., Olshen, R.A. and Stone, C.J. (1984). Classification and Regression Trees, Chapman and Hall, New York, NY. Zbl0541.62042
  8. Buehler, M., Iagnemma, K. and Singh, S. (Eds.) (2007). The 2005 DARPA Grand Challenge: The Great Robot Race, Springer, Berlin. 
  9. Buehler, M., Iagnemma, K. and Singh, S. (Eds.) (2009). The DARPA Urban Challenge: Autonomous Vehicles in City Traffic, Springer, Berlin. 
  10. Cardamone, L., Loiacono, D. and Lanzi, P. (2009a). On-line neuroevolution applied to The Open Racing Car Simulator, Proceedings of the 2009 IEEE Congress on Evolutionary Computation (CEC-09), Trondheim, Norway, pp. 2622-2629. 
  11. Cardamone, L., Loiacono, D. and Lanzi, P. (2010). Learning to drive in The Open Racing Car Simulator using online neuroevolution, IEEE Transactions on Computational Intelligence and AI in Games 2(3): 176-190. 
  12. Cardamone, L., Loiacono, D. and Lanzi, P.L. (2009b). Learning drivers for TORCS through imitation using supervised methods, Proceedings of the 2009 IEEE Symposium on Computational Intelligence and Games (CIG-09), Milano, Italy, pp. 148-155. 
  13. Chambers, R.A. and Michie, D. (1969). Man-machine co-operation on a learning task, in R. Parslow, R. Prowse and R. Elliott-Green (Eds.), Computer Graphics: Techniques and Applications, Plenum, London, pp. 179-186. 
  14. Cichosz, P. (1995). Truncating temporal differences: On the efficient implementation of TD(λ) for reinforcement learning, Journal of Artificial Intelligence Research 2: 287-318. 
  15. Cichosz, P. (2007). Learning Systems, 2nd Edn., WNT, Warsaw (in Polish). Zbl0930.93048
  16. D'Este, C., O'Sullivan, M. and Hannah, N. (2003). Behavioural cloning and robot control, Proceedings of the International Conference on Robotics and Applications, Salzburg, Austria, pp. 179-182. 
  17. Dietterich, T.G. (2000). Ensemble methods in machine learning, Proceedings of the 1st International Workshop on Multiple Classifier Systems, Cagliari, Italy, pp. 1-15. 
  18. Esposito, F., Malerba, D. and Semeraro, G. (1997). A comparative analysis of methods for pruning decision trees, IEEE Transactions on Pattern Analysis and Machine Intelligence 19(5): 476-491. 
  19. Forbes, J.R.N. (2002). Reinforcement Learning for Autonomous Vehicles, Ph.D. thesis, University of California at Berkeley, Berkeley, CA. 
  20. Guizzo, E. (2011). How Google's self-driving car works, IEEE Spectrum, http://spectrum.ieee.org. 
  21. Han, J. and Kamber, M. (2006). Data Mining: Concepts and Techniques, 2nd Edn., Morgan Kaufmann, San Francisco, CA. Zbl05951239
  22. Hertz, J., Krogh, A. and Palmer, R.G. (1991). Introduction to the Theory of Neural Computation, Addison-Wesley, Boston, MA. 
  23. John, G.H. (1996). Robust linear discriminant trees, in D. Fisher and H. Lenz (Eds.), Learning from Data: Artificial Intelligence and Statistics V, Springer, New York, NY, pp. 375-385. 
  24. Kaelbling, L.P., Littman, M.L. and Moore, A.W. (1996). Reinforcement learning: A survey, Journal of Artificial Intelligence Research 4: 237-285. 
  25. Kohl, N., Stanley, K., Miikkulainen, R., Samples, M. and Sherony, R. (2006). Evolving a real-world vehicle warning system, Proceedings of the 8th Annual Conference on Genetic and Evolutionary Computation (GECCO-06), Seattle, WA, USA, pp. 1681-1688. 
  26. Krödel, M. and Kuhnert, K.-D. (2002). Reinforcement learning to drive a car by pattern matching, Proceedings of the 24th DAGM Symposium on Pattern Recognition, Zurich, Switzerland, pp. 322-329. Zbl1017.68762
  27. Levinson, J., Askeland, J., Becker, J., Dolson, J., Held, D., Kammel, S., Kolter, J., Langer, D., Pink, O., Pratt, V., Sokolsky, M., Stanek, G., Stavens, D., Teichman, A., Werling, M. and Thrun, S. (2011). Towards fully autonomous driving: Systems and algorithms, Proceedings of the IEEE Intelligent Vehicles Symposium (IV-11), Baden-Baden, Germany, pp. 163-168. 
  28. Liaw, A. and Wiener, M. (2002). Classification and regression by randomForest, R News 2(3): 18-22. 
  29. Loiacono, D., Cardamone, L. and Lanzi, P.L. (2009). Simulated car racing championship 2009: Competition software manual, Technical report, Dipartimento di Elettronica e Informazione, Politecnico di Milano, Milano. 
  30. Loiacono, D., Prete, A., Lanzi, P.L. and Cardamone, L. (2010). Learning to overtake in TORCS using simple reinforcement learning, Proceedings of the 2010 IEEE Congress on Evolutionary Computation (CEC-2010), Barcelona, Spain, pp. 1-8. 
  31. Mitchell, T. (1997). Machine Learning, McGraw Hill, New York, NY. Zbl0913.68167
  32. Munoz, J., Gutierrez, G. and Sanchis, A. (2009). Controller for TORCS created by imitation, Proceedings of the 2009 IEEE Symposium on Computational Intelligence and Games (CIG-09), Milano, Italy, pp. 271-278. 
  33. Park, B.-H. and Kargupta, H. (2002). Constructing simpler decision trees from ensemble models using Fourier analysis, Proceedings of the 7th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, Madison, WI, USA, pp. 18-23. 
  34. Pomerleau, D. (1988). ALVINN: An autonomous land vehicle in a neural network, Advances in Neural Information Processing Systems 1 (NIPS-88), Denver, CO, USA, pp. 305-313. 
  35. Quinlan, J.R. (1986). Induction of decision trees, Machine Learning 1(1): 81-106. 
  36. Quinlan, J.R. (1993). C4.5: Programs for Machine Learning, Morgan Kaufmann, San Mateo, CA. 
  37. Quinlan, J.R. (1999). Simplifying decision trees, International Journal of Human-Computer Studies 51(2): 497-510. 
  38. R Development Core Team (2010). R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, www.R-project.org. 
  39. Sammut, C. (1996). Automatic construction of reactive control systems using symbolic machine learning, Knowledge Engineering Review 11(1): 27-42. 
  40. Sammut, C., Hurst, S., Kedzier, D. and Michie, D. (1992). Learning to fly, Proceedings of the 9th International Conference on Machine Learning (ML-92), Aberdeen, UK, pp. 385-393. 
  41. Stavens, D.M. (2011). Learning to Drive: Perception for Autonomous Cars, Ph.D. thesis, Stanford University, Stanford, CA. 
  42. Sutton, R.S. and Barto, A.G. (1998). Reinforcement Learning: An Introduction, MIT Press, Cambridge, MA. 
  43. Therneau, T.M. and Atkinson, E.J. (1997). An introduction to recursive partitioning using the RPART routines, Technical report, Mayo Clinic, Rochester, MN. 
  44. Thrun, S. (2010). What we're driving at, Google Official Blog, http://googleblog.blogspot.com/2010/10/what-were-driving-at.html. 
  45. Togelius, J., De Nardi, R. and Lucas, S.M. (2006). Making racing fun through player modeling and track evolution, Proceedings of the SAB-06 Workshop on Adaptive Approaches for Optimizing Player Satisfaction in Computer and Physical Games, Rome, Italy, pp. 61-70. 
  46. Triviño Rodriguez, J.L., Ruiz-Sepúlveda, A. and Morales-Bueno, R. (2008). How an ensemble method can compute a comprehensible model, Proceedings of the 10th International Conference on Data Warehousing and Knowledge Discovery (DaWaK-08), Turin, Italy, pp. 368-378. 
  47. Urbancic, T. and Bratko, I. (1994). Reconstructing human skill with machine learning, Proceedings of the 11th European Conference on Artificial Intelligence (ECAI-94), Amsterdam, The Netherlands, pp. 498-502. 
  48. Utgoff, P.E. (1989). Incremental induction of decision trees, Machine Learning 4(2): 161-186. 
  49. Van Assche, A. and Blockeel, H. (2007). Seeing the forest through the trees: Learning a comprehensible model from an ensemble, Proceedings of the 18th European Conference on Machine Learning (ECML-07), Warsaw, Poland, pp. 418-429. Zbl1136.68506
  50. Witten, I. H. and Frank, E. (2005). Data Mining: Practical Machine Learning Tools and Techniques, 2nd Edn., Morgan Kaufmann, San Francisco, CA. Zbl1076.68555
  51. Wymann, B. (2006). TORCS manual installation and robot tutorial, http://www.berniw.org/aboutme/publications/torcs.pdf. 
  52. Zajdel, R. (2013). Epoch-incremental reinforcement learning algorithms, International Journal of Applied Mathematics and Computer Science 23(3): 623-635, DOI: 10.2478/amcs-2013-0047. Zbl1281.93113
