Simple Embedded Architecture for Robot Learning and Emotion

Jeff Kroll

April, 2010


  1. Introduction/Overview

The goal of this project is to construct a framework of data structures and algorithms that will allow a “pet” type robot to be perceived by its owner as possessing emotion, personality and the ability to learn from its environment. The initial goal is to construct an architecture that can run on an 8-bit embedded processor with extensive serial-connected off-chip memory (a 2GB SD card, for example). If the computational complexity or memory requirements become untenable, though, it may be necessary to increase the processor capabilities, or even to keep sensing and actuation local to the robot while using a wireless link to a more capable computer for memory and cognition.

In order for a robot to interact meaningfully with its owner it must be able to do certain things:

Trade-offs must be made in the number and complexity of inputs, outputs and algorithms. More complexity will allow the robot to sense and react in more life-like ways, but will require more and faster hardware. As an extreme example, no one would want a pet that consisted of a single LED that could only be on or off, no matter how much sensing or cognition it had. Conversely, robots with numerous degrees of freedom to act but without any way to sense the environment are nothing more than programmable toys or industrial automatons. However, every increase in capabilities requires a corresponding increase in memory and computing capability, usually with some form of power law defining the size of the increase (twice as much capability is four times as complex, three times as many sensors require nine times as much memory, etc.).

The architecture described in this paper is based on four elements: states, actions, results and predictions. The first three of these form a tree structure that will enable the robot to model its environment by recording the results of an action when it is performed from a specific state. Predictions will have to be generated when the robot is in an unfamiliar state or wishes to perform a previously untried action from a known state.

  1. States

A state will be a snapshot of all current parameters, internal and external, to which the robot has access. Some parameters may be quantized more coarsely to reduce storage requirements.

Modification of state quantization levels would allow demonstration of unique abilities among individuals (good hearing, vision, time sense, etc.). To accomplish this, state variables would need to be stored in a generic manner, and state evaluation and comparison would need to be similarly generic.
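
As a rough illustration of generic, per-parameter quantization, the following is a minimal sketch. The parameter names, ranges and level counts in `LEVELS`, and the helper names `quantize` and `make_state`, are assumptions made for this example only and are not part of the architecture.

```python
# Hypothetical per-parameter quantization table: (min, max, number of levels).
# Fewer levels means coarser sensing and less storage per state.
LEVELS = {
    "light":   (0.0, 100.0, 4),
    "battery": (0.0, 100.0, 8),
    "sound":   (0.0, 100.0, 16),   # an individual with "good hearing"
}

def quantize(name, value, levels=LEVELS):
    """Map a raw reading to an integer bucket in [0, n_levels - 1]."""
    lo, hi, n = levels[name]
    clamped = min(max(value, lo), hi)
    bucket = int((clamped - lo) / (hi - lo) * n)
    return min(bucket, n - 1)      # a reading equal to hi lands in the top bucket

def make_state(readings, levels=LEVELS):
    """Build a hashable state that does not depend on the order of readings."""
    return tuple(sorted((k, quantize(k, v, levels)) for k, v in readings.items()))

state = make_state({"light": 62.0, "battery": 100.0, "sound": 10.0})
```

Because every variable goes through the same table-driven path, giving one individual sharper hearing would only mean changing one entry in its table.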

  1. Actions

Actions are the set of all output settings for the robot. Examples of actions could include, but are not limited to:

Note that emotions are not actions. The robot is not able to directly manipulate its emotions; it must attempt to induce an emotional change by acting appropriately to change its state in the desired manner.

  1. State-Action-Result (SAR) tree

The State-Action-Result tree will store the experience of the robot as linked lists of State-Action-Result sequences. Essentially: “I was in state X 100 times, 10 of those times I did Y, 2 of those times I ended up in state Z.” (See Figure 1.)

Figure 1
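
The counting scheme described above can be sketched as follows. This is only an illustration: nested dictionaries stand in for the linked lists, and the class and method names (`SARTree`, `record`, `action_count`, `probability`) are hypothetical.

```python
from collections import defaultdict

class SARTree:
    """Counts of state -> action -> result observations."""

    def __init__(self):
        # counts[state][action][result] = number of times observed
        self.counts = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))

    def record(self, state, action, result):
        self.counts[state][action][result] += 1

    def action_count(self, state, action):
        """How many times the action was performed from the state."""
        return sum(self.counts.get(state, {}).get(action, {}).values())

    def probability(self, state, action, result):
        """Observed fraction of (state, action) trials that ended in result."""
        total = self.action_count(state, action)
        if total == 0:
            return 0.0
        return self.counts[state][action].get(result, 0) / total

tree = SARTree()
for _ in range(2):
    tree.record("X", "Y", "Z")
for _ in range(8):
    tree.record("X", "Y", "W")
```

With these ten recordings, `tree.probability("X", "Y", "Z")` is the "2 out of 10" from the example sentence.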

The list of states would be easier to search if it were ordered.

Should the list of states be ordered in real time, or should new states be added to the end of the list and ordered later, perhaps while dreaming? If the list is not ordered in real time, search methods must be reliable (if not as efficient) with an unordered list. If we store recent SAR entries in a separate tree, searched first for planning, then integrated and sorted into the main tree later, we would essentially be mimicking short-term vs. long-term memory.

Do we need a method to list the SA sequences that lead to each state? It might make it easier to perform planning.

  1. Prediction

When the robot is in an unfamiliar state, or wishes to perform a previously untried action from a known state, it must search for a similar SA sequence and use this to predict a result. This will be necessary in order to allow the robot to plan in the face of unknown states and actions, as well as to enable it to reliably generate “surprise” when the results are significantly different from similar SA sequences.

How do we find “similar” states or actions? Maybe a kind of least squares fit? Hopefully something less computationally intensive.
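
One cheaper candidate than a least squares fit is a weighted sum of absolute differences, which needs no multiplication by the difference itself and no square roots. The sketch below is an assumption about how similarity might be measured, not a decision; the variable names and weights are made up for the example.

```python
def distance(a, b, weights):
    """Weighted sum of absolute differences over the weighted variables."""
    return sum(w * abs(a[k] - b[k]) for k, w in weights.items())

def most_similar(target, known_states, weights):
    """Return the known state with the smallest distance to the target."""
    return min(known_states, key=lambda s: distance(target, s, weights))

weights = {"x": 1, "y": 1, "light": 2}   # hypothetical importance of each variable
known = [
    {"x": 3, "y": 4, "light": 1},
    {"x": 9, "y": 9, "light": 3},
]
nearest = most_similar({"x": 4, "y": 4, "light": 3}, known, weights)
```

The result recorded for `nearest` could then serve as the prediction, and a large gap between the prediction and the actual result could drive "surprise".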

  1. State Comparison

  1. State evaluation

A state will be evaluated recursively on the parameters of the state itself as well as the evaluations of reachable states weighted by their probability and distance from the current state. The following parameters will be used to evaluate a state:


How do we define, learn and store good/bad values and states?
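
The recursive evaluation described in this section might look something like the sketch below. It assumes each state already has a score of its own (`own_value`), that `transitions` lists the states reachable from a state with an overall probability for each, and that distance is handled by a fixed `discount` per step. All of these names and values are assumptions for illustration.

```python
def evaluate(state, own_value, transitions, depth, discount=0.5):
    """Score of a state: its own value plus the discounted, probability-weighted
    scores of the states reachable from it, down to a fixed depth."""
    value = own_value[state]
    if depth == 0:
        return value
    future = 0.0
    for result, prob in transitions.get(state, []):
        future += prob * evaluate(result, own_value, transitions, depth - 1, discount)
    return value + discount * future

own_value = {"A": 0, "B": 10, "C": -4}
transitions = {"A": [("B", 0.5), ("C", 0.5)]}   # from A: B or C, equally likely
score = evaluate("A", own_value, transitions, depth=1)
```

The depth limit keeps the recursion finite even when the states form loops.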

  1. Goals & Planning

  1. Goal Structure

A goal is a modified state, where all status variables have a goal value and a weighting for how important that variable is to the goal. Below are a few simplified example goals:

“I don’t care where I am, as long as I’m happy”

X=0, Xweight=0

Y=0, Yweight=0

Happy=90, Happyweight=90

Battery=0, Batteryweight=0

Light=0, Lightweight=0

“I need to be somewhere around (34,22) location”

X=34, Xweight=75

Y=22, Yweight=75

Happy=0, Happyweight=0

Battery=0, Batteryweight=0

Light=0, Lightweight=0

“I need to be in the light”

X=0, Xweight=0

Y=0, Yweight=0

Happy=0, Happyweight=0

Battery=0, Batteryweight=0

Light=75, Lightweight=95
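
Using the first two example goals above, one way to measure how far a state is from a goal is a weighted sum of differences, so that variables with zero weight are ignored. The function name `goal_error` and the sample state are assumptions for this sketch.

```python
def goal_error(state, goal):
    """Weighted distance between a state and a goal; 0 means fully satisfied.
    goal maps each variable to a (target value, weight) pair."""
    return sum(w * abs(state[k] - target) for k, (target, w) in goal.items())

happy_goal = {"X": (0, 0), "Y": (0, 0), "Happy": (90, 90),
              "Battery": (0, 0), "Light": (0, 0)}
place_goal = {"X": (34, 75), "Y": (22, 75), "Happy": (0, 0),
              "Battery": (0, 0), "Light": (0, 0)}

# Hypothetical current state: happy, but four units away from (34, 22) in X.
state = {"X": 30, "Y": 22, "Happy": 90, "Battery": 50, "Light": 10}
happy_error = goal_error(state, happy_goal)
place_error = goal_error(state, place_goal)
```

Here the "happy" goal is already met regardless of position, while the location goal still has a nonzero error.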

What about sub-goals? How finely should we resolve goals into sub-goal steps? Is it “I need food” or “I need to get money, then get in the car, then drive to the store...”, etc? Needs to be resolved to an actionable level eventually, where actionable means within the average search depth.

  1. Goal Generation

How do we generate initial goals? Maybe have a set of basic “givens” e.g. Happy, Fed, etc.

Then how do we resolve them into sub-goals for planning purposes?

  1. Goal priority

At any one time the robot may have multiple goals. The currently active goal will be determined by goal priority. The priority of all current goals will be re-evaluated during each goal generation sequence.

Is there a good (maybe not optimal) way to accomplish A that also facilitates B?

  1. Play

A low priority default goal will be “play.” In the absence of high priority goals, this goal will become dominant, which will result in performing a semi-random new action from the current state (avoid actions that lead to “bad” states). Playing will allow the robot to increase the scope of knowledge stored in the SAR tree. Once the robot has acquired extensive experiences, it will be capable of generating high level goals in a larger percentage of situations, and will fall back on the default “play” behavior less often (Awww, they grow up so fast ;-).
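
A minimal sketch of the play behavior, assuming the robot keeps a record of which actions it has tried from each state and which states it considers "bad". The function name, the `tried` mapping and the example states are hypothetical.

```python
import random

def choose_play_action(state, all_actions, tried, bad_states, rng=random):
    """Pick a random action not yet tried from this state.
    tried maps (state, action) -> set of result states seen so far."""
    untried = [a for a in all_actions if (state, a) not in tried]
    if untried:
        return rng.choice(untried)
    # Everything has been tried: fall back to actions never seen to end badly.
    safe = [a for a in all_actions if not (tried[(state, a)] & bad_states)]
    return rng.choice(safe) if safe else None

actions = ["left", "right", "beep"]
tried = {("s", "left"): {"wall"}, ("s", "right"): {"open"}}
first_choice = choose_play_action("s", actions, tried, bad_states={"wall"})

tried[("s", "beep")] = {"wall"}
second_choice = choose_play_action("s", actions, tried, bad_states={"wall"})
```

In this example only one action is untried at first, and only one known action avoids the bad state afterwards, so both choices are forced.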

  1. Planning

Planning will be performed by searching the SAR tree for a sequence of actions that will take the robot from the current state to the goal state with a reasonable probability.

  1. Search Depth

Search depth will be a measure of how far through the SAR tree the robot will search before choosing an action that has the highest probability of advancing towards the goal.
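
The sketch below shows one possible depth-limited search, assuming the SAR tree has been reduced to a `model` of result probabilities per state and action. The function name, the `model` layout and the example states are assumptions, and exact goal matching is used here for simplicity.

```python
def best_action(state, goal, model, depth):
    """Return (action, probability) for the first action that maximizes the
    chance of reaching goal within depth steps.
    model[state][action] = {result: probability}."""
    best = (None, 0.0)
    if depth == 0:
        return best
    for action, results in model.get(state, {}).items():
        p_total = 0.0
        for result, p in results.items():
            if result == goal:
                p_total += p
            else:
                p_total += p * best_action(result, goal, model, depth - 1)[1]
        if p_total > best[1]:
            best = (action, p_total)
    return best

model = {
    "home": {"go": {"hall": 1.0}, "wait": {"home": 1.0}},
    "hall": {"go": {"dock": 0.5, "home": 0.5}},
}
shallow = best_action("home", "dock", model, depth=1)
deeper = best_action("home", "dock", model, depth=2)
```

With a depth of 1 no plan is found, while a depth of 2 finds that "go" has a 50% chance of reaching the dock. The cost grows quickly with depth, which is the trade-off discussed next.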

“Smarter” individuals may have a deeper search depth. Extreme search depth may result in an inability to act because of too much planning (I know some people like that...).

Search depth will need to be variable depending on urgency. (Wait, what is urgency?)

Changing goals. Need a way to remove a goal from the list if it is no longer required.


  1. Inductive

During the natural course of events, situations will arise where multiple state-action sequences will lead to the same result. It would be useful to be able to generalize these situations to eliminate state variables that appear to be independent of the result.
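
As a very rough sketch of this kind of generalization, assume we have collected several observations of the same action together with the result each produced. A variable that changed between observations while the result stayed the same is a candidate for elimination. The function name and the example variables are hypothetical, and a real version would need many more observations before trusting the conclusion.

```python
def irrelevant_variables(observations):
    """observations: list of (state_dict, result) pairs for one action.
    A variable is flagged as apparently independent of the result if it
    took more than one value while the result never changed."""
    results = {r for _, r in observations}
    if len(results) != 1:
        return set()
    names = observations[0][0].keys()
    return {n for n in names if len({s[n] for s, _ in observations}) > 1}

obs = [
    ({"x": 1, "light": 0, "hour": 9}, "fed"),
    ({"x": 1, "light": 1, "hour": 14}, "fed"),
]
candidates = irrelevant_variables(obs)
```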

Analogies. If X behaves like Y in these other situations, maybe X will behave like Y in this situation.

Generalization. Need some way to recognize a sub-pattern in state variables and generate a new structure to represent it.

  1. Deductive

Deductive reasoning is performed by following the SAR tree from one state to another. Since the result in a SAR sequence is actually another state, any sequence of SARAR...AR would be an example of deductive reasoning.

Need a way to do a lookup of all states that match certain conditions of the current state (e.g. position~(3,4), time~noon, everything else = don’t care), then extrapolate from each starting point to find a path forward.
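
A lookup with "don't care" variables could be sketched as a pattern that simply omits the variables that do not matter, with a tolerance standing in for the "~" in the conditions above. The names `matches` and `lookup`, the tolerance value and the sample states are assumptions.

```python
def matches(state, pattern, tolerance=1):
    """True if every variable named in pattern is within tolerance of the
    state's value. Variables missing from pattern are "don't care"."""
    return all(abs(state[k] - v) <= tolerance for k, v in pattern.items())

def lookup(states, pattern, tolerance=1):
    """Return all states that match the pattern."""
    return [s for s in states if matches(s, pattern, tolerance)]

states = [
    {"x": 3, "y": 4, "hour": 12, "light": 7},
    {"x": 3, "y": 5, "hour": 11, "light": 0},
    {"x": 8, "y": 4, "hour": 12, "light": 7},
]
hits = lookup(states, {"x": 3, "y": 4, "hour": 12})   # light is "don't care"
```

Each hit would then be a starting point for following the SAR tree forward.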

Maybe use a form of inattentional blindness to construct the search terms.


From (3,4) at noon if I do X, Y will happen. Except on Thursdays. Is today Thursday? No? Ok, do X.

  1. Emotions

Emotions in this model will be a set of variables that will influence reactions and decisions of the robot. Emotional states will be cumulative, and modified in real-time by evaluation of past, current and future states.

Situations will modify emotions, emotions will modify actions, and actions will result in new situations. Emotions in this architecture are represented purely for the purpose of enabling anthropomorphization by the user, and are not intended to assist in the learning process.

  1. Anxiety

From Wikipedia:

Anxiety (also called angst or worry) is a psychological and physiological state characterized by somatic, emotional, cognitive, and behavioral components. It is the displeasing feeling of fear and concern. The root meaning of the word anxiety is 'to vex or trouble'; in either presence or absence of psychological stress, anxiety can create feelings of fear, worry, uneasiness, and dread. Anxiety is considered to be a normal reaction to a stressor. It may help an individual to deal with a demanding situation by prompting them to cope with it.

  1. Fear

From Wikipedia:

Fear almost always relates to future events, such as worsening of a situation, or continuation of a situation that is unacceptable. Fear could also be an instant reaction to something presently happening.

The most common physical reactions of fear include:

The facial expression of fear includes the widening of the eyes (out of anticipation for what will happen next); the pupils dilate (to take in more light); the upper lip rises, the brows draw together, and the lips stretch horizontally. The physiological effects of fear can be better understood from the perspective of the sympathetic nervous responses (fight-or-flight), as compared to the parasympathetic response, which is a more relaxed state. Muscles used for physical movement are tightened and primed with oxygen, in preparation for a physical fight-or-flight response. Perspiration occurs due to blood being shunted from body's viscera to the peripheral parts of the body. Blood that is shunted from the viscera to the rest of the body will transfer, along with oxygen and nutrients, heat, prompting perspiration to cool the body. When the stimulus is shocking or abrupt, a common reaction is to cover (or otherwise protect) vulnerable parts of the anatomy, particularly the face and head. When a fear stimulus occurs unexpectedly, the victim of the fear response could possibly jump or give a small start. The person's heart-rate and heartbeat may quicken.

  1. Pleasure

From Wikipedia:

Many pleasurable experiences are associated with satisfying basic biological drives. Other pleasurable experiences are associated with social experiences and social drives.

  1. Interest

  1. Effects of emotions on actions

  1. Effects of state evaluation on emotions

  1. Anxiety

Anxiety can be caused by the following:

  1. Fear

Fear is increased if the future is predicted to be undesirable. This can be calculated by accumulating the state evaluations resulting from the current state to a certain depth. Fear will be increased proportional to:

  1. Happiness

Happiness will be increased by:

  1. Personality

Personality will be modeled as a set of resilient modifiers for how likely the robot is to experience and display certain emotions. In other words, the matrices that define how state evaluations will affect emotions and how emotions affect outputs will define the robot’s personality. The personality of a robot will initially be fixed at instantiation of the robot (birth), but future models may incorporate the ability to modify a personality matrix over long periods of time and/or extensive experience. Modifying personality matrices will allow demonstration of unique individuals in ways that are separate from the sum of their learned experiences (nature vs. nurture).

These modifiers will be a subset of the Big 5 personality model.
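
To make the idea of a personality matrix concrete, the sketch below treats it as a table of gains from evaluation signals to changes in emotions. The signal names (`novelty`, `predicted_value`), the gain values and the two example individuals are all assumptions for illustration.

```python
# Hypothetical personality matrices: rows are emotions, columns are
# evaluation signals, entries are gains fixed at "birth".
CAUTIOUS = {
    "anxiety":   {"novelty": 0.75, "predicted_value": -0.25},
    "happiness": {"novelty": 0.25, "predicted_value": 0.5},
}
OPEN = {
    "anxiety":   {"novelty": 0.25, "predicted_value": -0.25},   # less anxious when things are new
    "happiness": {"novelty": 0.25, "predicted_value": 0.5},
}

def emotion_deltas(signals, personality):
    """Change in each emotion for one set of evaluation signals."""
    return {
        emotion: sum(gain * signals.get(name, 0.0) for name, gain in row.items())
        for emotion, row in personality.items()
    }

signals = {"novelty": 1.0, "predicted_value": 2.0}   # new but promising situation
cautious_change = emotion_deltas(signals, CAUTIOUS)
open_change = emotion_deltas(signals, OPEN)
```

The same situation raises anxiety in one individual and lowers it in the other, while both become equally happier.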

  1. Openness

A robot with a high openness factor will not increase anxiety as much in new situations.

  1. Conscientiousness

  1. Extraversion

  1. Agreeableness

  1. Neuroticism

  1. Dreaming

  1. Deletion of low-probability SAR sequences. Except that low-probability sequences may be the interesting ones. Have to look into that.
  2. Memory defragmentation
  3. Ordering of SAR list for improved search speed (Integration of short term memory)
  4. Generalizing SAR sequences to eliminate independent state variables
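
The integration of short-term memory (item 3) might be sketched as below, assuming both memories are kept as flat count tables keyed by (state, action, result). The function name `integrate` and the table layout are assumptions; deletion of low-probability sequences is deliberately left out, given the open question in item 1.

```python
def integrate(long_term, short_term):
    """Merge short-term counts into long-term memory, empty the short-term
    store, and return the long-term keys in sorted order for faster search.
    Both arguments map (state, action, result) -> count."""
    for key, n in short_term.items():
        long_term[key] = long_term.get(key, 0) + n
    short_term.clear()
    return sorted(long_term)

long_term = {("B", "go", "C"): 3}
short_term = {("A", "go", "B"): 1, ("B", "go", "C"): 2}
ordered = integrate(long_term, short_term)
```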

  1. Acknowledgements/Bibliography

  1. Steve Grand’s Blog


Try to hard code as little as possible. Maybe instead use a “default” SAR tree to initially define responses in a logical manner. This could be the equivalent of inherited or instinctual responses that could, if necessary, be “unlearned” at a later date. Given the right experimental circumstances, an SAR tree could function as a genome in a GA scenario. Will probably need some utility for uploading, viewing, modifying and downloading SAR structures.

Behavioural variance – acting differently according to our environment – is a celebrated part of being human. Anyone who lacks it is boring.