New random state e is selected
New random state b is selected
Q value = average of the returns observed so far for that state-action pair.
Update the policy so that each state picks the action with the highest Q value.
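These steps read like Monte Carlo control with exploring starts. That is an interpretation, not something the notes state. The sketch below shows the loop under that assumption. Each episode starts from a random state and action, Q is the running average of first-visit returns, and the policy is made greedy with respect to Q. The five-state chain environment (states "a" to "e", reward 1 for moving right off "e"), GAMMA, and every function name are hypothetical, chosen only for illustration.

```python
import random
from collections import defaultdict

# Hypothetical environment: a chain of states a-b-c-d-e.
STATES = ["a", "b", "c", "d", "e"]
ACTIONS = ["left", "right"]
GAMMA = 0.9  # assumed discount factor


def step(state, action):
    """Return (next_state, reward, done) for the hypothetical chain."""
    i = STATES.index(state)
    if action == "right":
        if i == len(STATES) - 1:
            return None, 1.0, True  # moving right off "e" ends with reward 1
        return STATES[i + 1], 0.0, False
    if i == 0:
        return None, 0.0, True  # moving left off "a" ends with reward 0
    return STATES[i - 1], 0.0, False


def run_episode(start_state, start_action, policy, max_steps=50):
    """Generate (state, action, reward) triples; max_steps caps policy loops."""
    episode = []
    state, action = start_state, start_action
    for _ in range(max_steps):
        next_state, reward, done = step(state, action)
        episode.append((state, action, reward))
        if done:
            break
        state = next_state
        action = policy[state]
    return episode


def mc_exploring_starts(num_episodes=2000, seed=0):
    rng = random.Random(seed)
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    Q = defaultdict(float)
    policy = {s: rng.choice(ACTIONS) for s in STATES}

    for _ in range(num_episodes):
        # Exploring start: a new random state (and action) is selected.
        episode = run_episode(rng.choice(STATES), rng.choice(ACTIONS), policy)

        # Index of the first visit to each (state, action) pair.
        first_visit = {}
        for t, (s, a, _) in enumerate(episode):
            first_visit.setdefault((s, a), t)

        # Walk backward, accumulating the discounted return G.
        G = 0.0
        for t in range(len(episode) - 1, -1, -1):
            s, a, r = episode[t]
            G = GAMMA * G + r
            if first_visit[(s, a)] == t:
                # Q value = average of the returns observed for (s, a).
                returns_sum[(s, a)] += G
                returns_count[(s, a)] += 1
                Q[(s, a)] = returns_sum[(s, a)] / returns_count[(s, a)]
                # Greedy improvement: pick the action with the highest Q.
                policy[s] = max(ACTIONS, key=lambda act: Q[(s, act)])
    return Q, policy
```

Calling `Q, policy = mc_exploring_starts()` and printing `policy` should show "right" in every state of this toy chain. The random starts matter here: they make every state-action pair get averaged, even the ones the current greedy policy would never choose.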