Physical AI via Hierarchical Decision Processes
Yihao Liu
December 12, 2025
Hierarchical Decision Processes
Recent Trends
RL in IsaacLab [1]
IL Demonstration Trajectories (Sim) [2]
VLA (Nvidia GR00T) Synthetic and Teleop Data [3]
[1] Internship at Astera Institute
[2] Mu, T., Liu, Y., & Armand, M. (2025). Look Before You Leap: Using Serialized State Machine for Language Conditioned Robotic Manipulation. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
[3] Bjorck, J., Castañeda, F., Cherniadev, N., Da, X., Ding, R., Fan, L., ... & Zhu, Y. (2025). Gr00t n1: An open foundation model for generalist humanoid robots. arXiv preprint arXiv:2503.14734.
Recent Trends
Question 1. Will emergent behavior arise as it did in Large Language Models (LLMs)?
Question 2. Is the reactive, reflex-like behavior sufficient?
Ganguli, D., Hernandez, D., Lovitt, L., Askell, A., Bai, Y., Chen, A., ... & Clark, J. (2022, June). Predictability and surprise in large generative models. In Proceedings of the 2022 ACM conference on fairness, accountability, and transparency (pp. 1747-1764).
Scaling Law
1 Vision-Language Models
2 Vision-Language-Action Models
Models’ Characteristics – Imitation Learning
[1] Zhao, T. Z., Kumar, V., Levine, S., & Finn, C. (2023). Learning fine-grained bimanual manipulation with low-cost hardware. arXiv preprint arXiv:2304.13705.
[2] Chi, C., Xu, Z., Feng, S., Cousineau, E., Du, Y., Burchfiel, B., Tedrake, R., & Song, S. (2025). Diffusion policy: Visuomotor policy learning via action diffusion. The International Journal of Robotics Research, 44(10-11), 1684-1704.
Action Chunking Transformer [1]
Diffusion Policy [2]
Question 2. Is the reactive, reflex-like behavior sufficient?
”Scoop Raisins into Bowl”
”Scoop Pretzels into Bowl”
Both are sequence-to-sequence policies
Question 2. Is the reactive, reflex-like behavior sufficient?
[Diagram] Images → Encoder; Robot Action → Encoder; Latent Vector → Decoder → Future Robot Action
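One way to read the encoder / latent vector / decoder labels on this slide: the policy maps the current observation to a short chunk of future actions, executes part of it, then queries again, with no explicit task-level plan in between. A minimal sketch of that reactive loop follows. The names `encode`, `decode`, `CHUNK`, `EXECUTE`, and the toy one-dimensional dynamics are all hypothetical stand-ins for illustration; this is not the ACT or Diffusion Policy implementation.

```python
from typing import List

CHUNK = 4    # hypothetical: number of future actions predicted per query
EXECUTE = 2  # hypothetical: actions executed before the policy is re-queried


def encode(image: float, robot_state: float) -> float:
    """Toy 'encoder': compress the observation into a scalar latent.

    The 'image' is reduced to the observed target position, so the latent
    is simply the remaining error.
    """
    return image - robot_state


def decode(latent: float) -> List[float]:
    """Toy 'decoder': turn the latent into a chunk of future actions."""
    step = 0.5 * latent / CHUNK
    return [step] * CHUNK


def rollout(target: float, start: float, n_queries: int) -> List[float]:
    """Reactive loop: observe, predict a chunk, execute part of it, repeat."""
    state = start
    trajectory = [state]
    for _ in range(n_queries):
        chunk = decode(encode(target, state))  # purely observation-driven
        for action in chunk[:EXECUTE]:
            state += action
            trajectory.append(state)
    return trajectory


traj = rollout(target=1.0, start=0.0, n_queries=5)
```

Each query shrinks the remaining error by a fixed factor, which works for a single reach but carries no memory of which sub-task has already been completed. That is the limitation Question 2 points at.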
Models’ Characteristics – VLA Models
[1] Bjorck, J., Castañeda, F., Cherniadev, N., Da, X., Ding, R., Fan, L., ... & Zhu, Y. (2025). Gr00t n1: An open foundation model for generalist humanoid robots. arXiv preprint arXiv:2503.14734.
[2] Chi, C., Xu, Z., Feng, S., Cousineau, E., Du, Y., Burchfiel, B., Tedrake, R., & Song, S. (2025). Diffusion policy: Visuomotor policy learning via action diffusion. The International Journal of Robotics Research, 44(10-11), 1684-1704.
[3] Physical Intelligence. (2025). π_{0.5}: a Vision-Language-Action Model with Open-World Generalization. arXiv preprint arXiv:2504.16054.
GR00T N1 [1], SmolVLA [2], pi0.5 [3]
Question 2. Is the reactive, reflex-like behavior sufficient?
[Diagram] Images → Pretrained Vision-Language Encoder; Robot Action → Encoder; Latent Vector → Decoder → Future Robot Action
pi0.5
“Put Green Pepper into Pot”
All are reactive visuomotor policies with some learned implicit planning, and still struggle on long-horizon tasks
How Animals Learn to Act and Decide
[1] Sun, W., Winnubst, J., Natrajan, M., Lai, C., Kajikawa, K., Bast, A., ... & Spruston, N. (2025). Learning produces an orthogonalized state machine in the hippocampus. Nature, 640(8057), 165-175.
“State Cells” in Hippocampus
“Learning produces an orthogonalized state machine in the hippocampus”
Two Layers Formalism
States and Transitions
Markovian
A state machine is a special case of a “decision process” defined by a tuple
Other frameworks may include
Different state spaces and time spaces
We could define S as a hybrid state space via a Cartesian product
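To illustrate the tuple view, a deterministic finite state machine can be written as (S, A, T, s_0) with a transition function T: S × A → S, and a composite state space can be built as the Cartesian product of two component sets. The tuple layout, the class name `StateMachine`, and the door/gripper example below are assumptions made for illustration; the slide's own tuple is not reproduced in this text. Both factors are kept finite here so the product can be enumerated; a continuous factor would replace the second set with a numeric type.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, FrozenSet, Hashable, Iterable, Tuple

State = Hashable
Action = str


@dataclass(frozen=True)
class StateMachine:
    """Hypothetical container for the tuple (S, A, T, s_0)."""
    states: FrozenSet[State]
    actions: FrozenSet[Action]
    transitions: Dict[Tuple[State, Action], State]
    initial: State

    def step(self, state: State, action: Action) -> State:
        # Undefined (state, action) pairs leave the state unchanged.
        return self.transitions.get((state, action), state)

    def run(self, actions: Iterable[Action]) -> State:
        state = self.initial
        for action in actions:
            state = self.step(state, action)
        return state


# Composite state space S = door x gripper (Cartesian product).
door = {"closed", "open"}
gripper = {"empty", "holding"}
S = frozenset(product(door, gripper))

T: Dict[Tuple[State, Action], State] = {}
for d, g in S:
    if d == "closed":
        T[((d, g), "open_door")] = ("open", g)
    if d == "open":
        T[((d, g), "close_door")] = ("closed", g)
    if d == "open" and g == "empty":
        T[((d, g), "grasp")] = (d, "holding")  # can only reach in when open
    if g == "holding":
        T[((d, g), "release")] = (d, "empty")

A = frozenset({"open_door", "close_door", "grasp", "release"})
fsm = StateMachine(states=S, actions=A, transitions=T, initial=("closed", "empty"))

# The first "grasp" is ignored because the door is still closed.
final = fsm.run(["grasp", "open_door", "grasp", "close_door"])
```

Because the next state depends only on the current state and the action, the machine is Markovian in the sense of the preceding slide.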
An Intuitive Example using Finite State Machines
[1] Mu, T., Liu, Y., & Armand, M. (2025). Look Before You Leap: Using Serialized State Machine for Language Conditioned Robotic Manipulation. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
[2] Liu, Y., & Armand, M. (2024). A roadmap towards automated and regulated robotic systems. arXiv preprint arXiv:2403.14049.
Text
Executables
Agent
Input to Agent: Task Description, State Constraints, Operation Constraints
Large Language Model
Generate Scripts
Sim Task Demonstrations [1], [2]
Imitation Learning
Robot Models
LLMs
What About the Skill Library?
Generating Robotic Tasks for Imitation Learning via LLM in the Literature
[1] Shridhar et al. "Cliport: What and where pathways for robotic manipulation." Conference on robot learning. PMLR, 2022.
[2] Wang et al. "Gensim: Generating robotic simulation tasks via large language models." arXiv:2310.01361 2023.
Challenges
Handling state-dependent, long-horizon tasks is particularly challenging
Key Insight: Utilize FSM as a Planner and to Structure Demonstrations
Experiments
Hanoi Tower Task
River Crossing Task
Chess Task
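To make the FSM-as-planner idea concrete for the Hanoi Tower task, the sketch below enumerates peg configurations as states, treats legal disk moves as transitions, and finds a shortest move sequence by breadth-first search. This is an illustrative assumption about how such a state machine could be searched; it is not the serialized state machine format or the planner from the cited paper, and the function names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

State = Tuple[Tuple[int, ...], ...]  # one tuple of disks per peg, bottom to top
Move = Tuple[int, int]               # (source peg index, destination peg index)


def legal_moves(state: State) -> List[Move]:
    """Transitions available in a state: move a top disk onto a larger one."""
    moves = []
    for src, src_peg in enumerate(state):
        if not src_peg:
            continue
        disk = src_peg[-1]
        for dst, dst_peg in enumerate(state):
            if dst != src and (not dst_peg or dst_peg[-1] > disk):
                moves.append((src, dst))
    return moves


def apply_move(state: State, move: Move) -> State:
    """Return the successor state after moving one disk."""
    src, dst = move
    pegs = [list(p) for p in state]
    pegs[dst].append(pegs[src].pop())
    return tuple(tuple(p) for p in pegs)


def plan(start: State, goal: State) -> Optional[List[Move]]:
    """Breadth-first search over the state machine; returns a shortest plan."""
    parent: Dict[State, Optional[Tuple[State, Move]]] = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == goal:
            moves: List[Move] = []
            while parent[state] is not None:
                state, move = parent[state]
                moves.append(move)
            return moves[::-1]
        for move in legal_moves(state):
            nxt = apply_move(state, move)
            if nxt not in parent:
                parent[nxt] = (state, move)
                queue.append(nxt)
    return None


start: State = ((3, 2, 1), (), ())
goal: State = ((), (), (3, 2, 1))
moves = plan(start, goal)
```

For three disks the search returns seven moves, the known optimum of 2^3 − 1. Each move can then serve as one labeled segment when structuring demonstrations.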
[1] Shojaee, P., Mirzadeh, I., Alizadeh, K., Horton, M., Bengio, S., & Farajtabar, M. (2025). The illusion of thinking: Understanding the strengths and limitations of reasoning models via the lens of problem complexity. arXiv preprint arXiv:2506.06941.
Architecture to Handle Complexity
Generalize from closed “finite” states to open space
An Architecture to Handle Complexity
SMSL – enumerative
Symbolic Planning – factored
Conceptual frameworks
Handling Complexity by Open Action and Hierarchy
Abstraction
Concretization
Pipeline
Overall Architecture
Symbolic Components
Symbolic Planner
Inputs: Current state S, Goal G, Option Library A
Outputs: Symbolic Plan [a_0, …, a_{K-1}]
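The planner interface above (state S, goal G, option library A in; plan [a_0, …, a_{K-1}] out) can be sketched with a factored representation, where a state is a set of facts and each option has preconditions and effects. The `Option` structure, the breadth-first search, and the pepper-and-pot facts are hypothetical choices for illustration under a STRIPS-like assumption; the slides do not specify the planner's internals.

```python
from collections import deque
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Sequence

Fact = str
SymState = FrozenSet[Fact]


@dataclass(frozen=True)
class Option:
    """Hypothetical symbolic option with preconditions and add/delete effects."""
    name: str
    pre: FrozenSet[Fact]
    add: FrozenSet[Fact]
    delete: FrozenSet[Fact]

    def applicable(self, s: SymState) -> bool:
        return self.pre <= s

    def apply(self, s: SymState) -> SymState:
        return (s - self.delete) | self.add


def symbolic_plan(S: SymState, G: SymState, A: Sequence[Option],
                  max_depth: int = 10) -> Optional[List[str]]:
    """Breadth-first search from S until every fact in G holds."""
    queue = deque([(S, [])])
    seen = {S}
    while queue:
        state, plan = queue.popleft()
        if G <= state:
            return plan
        if len(plan) >= max_depth:
            continue
        for opt in A:
            if opt.applicable(state):
                nxt = opt.apply(state)
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, plan + [opt.name]))
    return None


f = frozenset
A = [
    Option("remove_lid", f({"hand_empty", "lid_on_pot"}),
           f({"pot_open"}), f({"lid_on_pot"})),
    Option("pick_pepper", f({"hand_empty", "pepper_on_table"}),
           f({"holding_pepper"}), f({"hand_empty", "pepper_on_table"})),
    Option("place_in_pot", f({"holding_pepper", "pot_open"}),
           f({"pepper_in_pot", "hand_empty"}), f({"holding_pepper"})),
]
S = f({"hand_empty", "pepper_on_table", "lid_on_pot"})
G = f({"pepper_in_pot"})
plan = symbolic_plan(S, G, A)
```

Picking up the pepper first leads to a dead end because the lid can no longer be removed, so the search returns the order remove lid, pick pepper, place in pot. This is the kind of state-dependent ordering a purely reactive policy has no explicit mechanism for.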
Mu, T., Liu, Y., & Armand, M. (2025). Look Before You Leap: Using Serialized State Machine for Language Conditioned Robotic Manipulation. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
Liu, Y., Kheradmand, A., & Armand, M. (2023). Toward process controlled medical robotic system. arXiv preprint arXiv:2308.05809.
Liu, W., Chen, G., Hsu, J., Mao, J., & Wu, J. (2024). Learning planning abstractions from language. ICLR.
Symbolic Visualizer
Symbolic Predictor
Silver, T., Chitnis, R., Tenenbaum, J., Kaelbling, L. P., & Lozano-Pérez, T. (2021, September). Learning symbolic operators for task and motion planning. In 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (pp. 3182-3189). IEEE.
Symbolic Estimator
Liu, Y. Unpublished work.
Sub-Symbolic Components
Sensor and Actuator
Visual servoing
Sub-Symbolic Estimator and Actor
Sub-Symbolic Estimator
Liu, Y., Zhang, J., Diaz-Pinto, A., Li, H., Martin-Gomez, A., Kheradmand, A., & Armand, M. (2024, April). Segment any medical model extended. In Medical Imaging 2024: Image Processing (Vol. 12926, pp. 411-422). SPIE.
Zhang, J., Zhang, Z., Liu, Y., Chen, Y., Kheradmand, A., & Armand, M. (2024, May). Realtime robust shape estimation of deformable linear object. In 2024 IEEE International Conference on Robotics and Automation (ICRA) (pp. 10734-10740). IEEE.
Sub-Symbolic Actor
Sub-Symbolic Predictor
Internship at Astera Institute
Sub-Symbolic Visualizer
Ai, L., Liu, Y., Armand, M., Kheradmand, A., & Martin-Gomez, A. (2024, May). On the Fly Robotic-Assisted Medical Instrument Planning and Execution Using Mixed Reality. In 2024 IEEE International Conference on Robotics and Automation (ICRA). IEEE.
Sub-Symbolic Planner and Dispatcher
Infrastructure
Robotic System, Data Platform, Calibrations
Robotic Systems
Liu, Y., Zhang, J., Ai, L., Tian, J., Sefati, S., Liu, H., ... & Armand, M. (2025). An Image-Guided Robotic System for Transcranial Magnetic Stimulation: System Development and Experimental Evaluation. IEEE Robotics and Automation Letters.
Data Platforms / Digital Twin
Liu, Y., Ku, Y. C., Zhang, J., Ding, H., Kazanzides, P., & Armand, M. (2025). dARt Vinci: Egocentric Data Collection for Surgical Robot Learning at Scale. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
Geometric Calibrations
Liu, Y., Zhang, J., She, Z., Kheradmand, A., & Armand, M. (2024, May). Gbec: Geometry-based hand-eye calibration. In 2024 IEEE International Conference on Robotics and Automation (ICRA) (pp. 16698-16705). IEEE.
Ai, L., Liu, Y., Armand, M., & Martin-Gomez, A. (2024, October). Calibration of Augmented Reality Headset with External Tracking System Using AX= YB. In 2024 IEEE International Symposium on Mixed and Augmented Reality (ISMAR) (pp. 210-218). IEEE.
Hand-eye calibration
Virtual-to-real calibration
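For orientation, both calibration problems are classically posed as matrix equations over rigid transforms. The block below gives the standard textbook forms; the AX = YB form also appears in the title of the ISMAR paper cited on this slide. Which frames the symbols denote in each system is an assumption left open here, and the geometry-based method in the ICRA paper may not use the AX = XB form directly.

```latex
\begin{aligned}
\text{Hand-eye (classical form):}\quad & A_i X = X B_i, \\
\text{Two unknown transforms:}\quad & A_i X = Y B_i, \\
& A_i,\, B_i,\, X,\, Y \in SE(3), \quad i = 1, \dots, n .
\end{aligned}
```

Here A_i and B_i are measured transform pairs (for example, robot poses and tracker or camera observations), while X and Y are the fixed unknown transforms to be estimated from n measurements.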
Contributions
Future Work
Publications
Acknowledgement
Thesis Committee
My family