1. Introduction
You may find it difficult to make your software adapt to interaction with humans. The main reason is that, in general, it is very hard to predict in advance how people will respond in various situations. How can this problem be solved? If a machine can change its internal parameters based on a user's intentions, an adaptable system can be built. One solution is a machine learning algorithm, and Q-learning is a good candidate. Since it is a reinforcement learning algorithm, a system using it can adapt and change itself based on its interactions with humans. In this lecture, I will cover the theory of Q-learning, a software architecture for it, and examples.
2. Theory
Machine learning algorithms can be divided into several categories by learning method: supervised learning, reinforcement learning, unsupervised learning, and so on. Q-learning belongs to reinforcement learning, because it acts on the environment and receives feedback from it to guide the learning process. Watkins introduced it in 1989, and it has since been used in numerous areas such as robotics, games, and industry. The algorithm has become widespread because it is very simple, easy to implement, and shows good results in many applications. However, it also has flaws. If the state and action spaces are very large, performance degrades. In addition, once an action has yielded good results in a particular state, a purely greedy policy will keep repeating that action and will not try potentially better alternatives.
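The second flaw is the classic trade-off between exploration and exploitation. A common mitigation, which goes beyond the basic algorithm described in this lecture, is an epsilon-greedy policy: with a small probability the agent picks a random action instead of the best-known one. The sketch below uses hypothetical names and a simple `std::rand()`-based choice:

```cpp
#include <cstdlib>
#include <vector>

// Index of the action with the highest Q-value (pure exploitation).
std::size_t greedyAction(const std::vector<double>& qValues)
{
    std::size_t best = 0;
    for (std::size_t i = 1; i < qValues.size(); ++i)
        if (qValues[i] > qValues[best]) best = i;
    return best;
}

// With probability epsilon, pick a random action (exploration);
// otherwise pick the greedy one (exploitation).
std::size_t epsilonGreedyAction(const std::vector<double>& qValues,
                                double epsilon)
{
    double u = static_cast<double>(std::rand()) / RAND_MAX;
    if (u < epsilon)
        return std::rand() % qValues.size();  // explore
    return greedyAction(qValues);             // exploit
}
```

With epsilon set to zero this reduces to the always-greedy behavior criticized above; a small epsilon such as 0.1 lets the agent keep sampling alternative actions.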
Let's take a look at the Q-learning algorithm itself. As mentioned above, Q-learning is composed of actions, states, rewards, Q-values, and parameters. Their definitions are:
- An action changes one state into another state.
- A state describes a certain situation.
- A reward is the feedback received for taking an action in a state.
- A Q-value is the value used to choose among the available actions in a state.
The Q-values, which are the final result of Q-learning, are obtained by the learning process described in Equation-1. The core of the algorithm is to update the Q-values based on newly observed rewards.
Equation-1: Q(s, a) ← Q(s, a) + α · (r + γ · max_a' Q(s', a') − Q(s, a)), where α is the learning rate and γ is the discount factor.
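The update step of Equation-1 can be written as a single function. The names below are illustrative, not taken from any particular codebase:

```cpp
// One Q-learning update step: the old Q-value is moved toward the
// new estimate (reward plus discounted best next Q-value) by a
// fraction alpha.
double updatedQ(double oldQ,      // current Q(s, a)
                double reward,    // r observed after taking a in s
                double maxNextQ,  // max over a' of Q(s', a')
                double alpha,     // learning rate, 0 < alpha <= 1
                double gamma)     // discount factor, 0 <= gamma < 1
{
    return oldQ + alpha * (reward + gamma * maxNextQ - oldQ);
}
```

Note the two extremes: with alpha = 1 the old value is replaced entirely by the new estimate, and with alpha = 0 the Q-value never changes no matter what reward is observed.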
3. Implementation
In this chapter, I will describe how to implement reusable Q-learning in C++. To make it reusable, I designed a software architecture that separates the core algorithm from the configurable parts. If you mix them together, you have to write a new Q-learning implementation for every application.

Fig.1 shows the Q-learning class diagram; it is composed of a state class, an action class, and a Q-learning class. The state class contains all the variables that make up a state and holds all of its related actions. The action class holds the reward and the Q-value, which are updated by the Q-learning algorithm. The Q-learning class manages all states and actions and updates the Q-values.

The configurable parts are the states and actions, because they are defined differently for each problem. Therefore, I placed all information about states and actions in an XML file, so that the core source code of Q-learning never has to change. Unfortunately, writing a configuration file by hand is an onerous job, because solving even one problem may require tens or hundreds of states and actions. Therefore, I made a GUI to create configuration files with ease. Using this GUI, you can define the states and actions for a given problem and export an XML file containing all the Q-learning information, which is then imported by the core Q-learning code.
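The class layout described above can be sketched as follows. All names are hypothetical, and the XML loading is omitted; the actual code behind Fig.1 may differ in detail:

```cpp
#include <map>
#include <string>
#include <vector>

// Action: holds the reward and the Q-value updated by the algorithm.
struct Action {
    std::string name;
    double reward = 0.0;
    double qValue = 0.0;
};

// State: the variables describing a situation plus its related actions
// (here reduced to a name for brevity).
struct State {
    std::string name;
    std::vector<Action> actions;
};

// QLearning: manages all states and actions and updates the Q-values.
class QLearning {
public:
    void addState(const State& s) { states_[s.name] = s; }

    // Apply Equation-1 to the chosen action after observing a reward
    // and the resulting next state.
    void update(const std::string& state, std::size_t actionIdx,
                double reward, const std::string& nextState,
                double alpha, double gamma)
    {
        Action& a = states_[state].actions[actionIdx];
        a.reward = reward;
        a.qValue += alpha * (reward + gamma * maxQ(nextState) - a.qValue);
    }

    // Best Q-value available in a state (simplified: assumes
    // non-negative Q-values).
    double maxQ(const std::string& state)
    {
        double best = 0.0;
        for (const Action& a : states_[state].actions)
            if (a.qValue > best) best = a.qValue;
        return best;
    }

    double q(const std::string& state, std::size_t actionIdx)
    {
        return states_[state].actions[actionIdx].qValue;
    }

private:
    std::map<std::string, State> states_;  // in practice, loaded from XML
};
```

In a real application, `addState` would be called once per state parsed from the configuration file, keeping the class itself free of problem-specific code.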
Fig.1 Class Diagram of Q-learning
Fig.2 Graphical User Interface for Q-learning
4. Conclusion
Understanding a variety of algorithms and applying them to the right problems is very important, but making reusable implementations of those algorithms is crucial, too. If you don't write reusable software, you will spend enormous amounts of time rewriting the same program for each new problem. At first, when you try to make a program reusable, you may feel that it is useless and a waste of time. However, the next time you face a problem that an earlier program for the same algorithm could have solved, you will think, "I should have made it reusable." Consequently, when you design a software program, you should spend enough time to design it as a reusable one. If you do, I strongly believe you will save a great deal of time later.