NAIL122 AI for Games – Modeling RTS Combat Situations II
Jakub Gemrot
Faculty of Mathematics and Physics
Charles University
20.12.2021
MCTS – Reminder
MCTS – High-level description
MCTS iteration
Image adapted from: Santos, A., Santos, P. A., & Melo, F. S. (2017, August). Monte Carlo tree search experiments in Hearthstone. In 2017 IEEE Conference on Computational Intelligence and Games (CIG) (pp. 272-279). IEEE.
MCTS formulated for sequential games.
How about simultaneous ones?
=> UCT considering durations (UCTCD)
Churchill, D., & Buro, M. (2013, August). Portfolio greedy search and simulation for large-scale combat in StarCraft. In 2013 IEEE Conference on Computational Intelligence in Games (CIG) (pp. 1-8). IEEE.
UCT Considering Durations – a.k.a. UCTCD
Main idea: do not store states, only actions; mark nodes as FIRST, SECOND, or SOLO. Do not apply actions in FIRST nodes; wait for SECOND.
UCT Considering Durations – a.k.a. UCTCD
Top level of MCTS, limited both by a maximum number of iterations and by a time budget.
MCTS iteration == Traverse.
At the beginning of each iteration, we call Clone(s) so that we always work with a fresh clone of the state.
UCT Considering Durations – a.k.a. UCTCD
The Traverse method contains selection, expansion, and back-propagation; it is formulated as a recursive procedure (row 19).
UCT Considering Durations – a.k.a. UCTCD
SelectNode implements the MCTS tree policy using standard UCB, a.k.a. UCB-1.
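The UCB-1 tree policy used by SelectNode can be sketched as follows; a minimal Python sketch, where the `value`/`visits` node fields and the exploration constant `c` are illustrative, not the paper's actual code:

```python
import math

def ucb1(child_value, child_visits, parent_visits, c=1.414):
    """Standard UCB-1: average reward plus an exploration bonus."""
    if child_visits == 0:
        return float("inf")  # unvisited children get tried first
    exploitation = child_value / child_visits
    exploration = c * math.sqrt(math.log(parent_visits) / child_visits)
    return exploitation + exploration

def select_node(children, parent_visits):
    """Tree policy: descend into the child with the highest UCB-1 score."""
    return max(children,
               key=lambda ch: ucb1(ch["value"], ch["visits"], parent_visits))
```

With `c = 0` the score reduces to the pure average reward; the exploration term grows for rarely visited children, balancing the two.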
UCT Considering Durations – a.k.a. UCTCD
1. UpdateState is used to advance the (cloned) state using “move” information that is maintained in each node; i.e. a node is associated with a move that was used to get to it.
2. Each node is also labeled as:
FIRST – simultaneous node, first player to decide
SECOND – simultaneous node, second player to decide
3. SOLO nodes are "not created" as such; a sequence of SOLO moves is aggregated into the single node move of the corresponding player.
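The labeling rule can be sketched like this; the `can_act` state API and the `ToyState` class are illustrative assumptions, not the paper's interface:

```python
def child_kind(parent_kind, state):
    """UCTCD node labels: children of a FIRST node are SECOND (the other
    player's reply); otherwise a node is FIRST iff both players can act
    simultaneously, and SOLO when only one of them can."""
    if parent_kind == "FIRST":
        return "SECOND"  # both players' moves get applied below SECOND
    if state.can_act(0) and state.can_act(1):
        return "FIRST"   # simultaneous decision point
    return "SOLO"        # only one player has units ready to act

class ToyState:
    """Minimal stand-in state for illustration."""
    def __init__(self, p0_ready, p1_ready):
        self.ready = (p0_ready, p1_ready)
    def can_act(self, player):
        return self.ready[player]
```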
UCT Considering Durations – a.k.a. UCTCD
Rows 9-11:
We are at a leaf visited for the first time; do the move (row 10) and perform a playout (row 11).
UCT Considering Durations – a.k.a. UCTCD
Otherwise, update the state with the move, and…
UCT Considering Durations – a.k.a. UCTCD
Rows 14-15:
For a terminal node, just return the value of the node.
UCT Considering Durations – a.k.a. UCTCD
Rows 17-19:
Otherwise, ensure children exist by enumerating the possible actions (rows 17-18), i.e., expand the node "fully", and continue the traversal (row 19).
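Putting the steps together, the whole Traverse recursion can be sketched in Python; the `Node` fields and the state API (`moves`, `apply`, `is_terminal`, `evaluate`, `playout`) are illustrative assumptions, not the paper's code:

```python
import math

class Node:
    """UCTCD-style node: stores a move (not a state) plus a kind label."""
    def __init__(self, move=None, kind="SOLO"):
        self.move, self.kind = move, kind
        self.children, self.visits, self.value = [], 0, 0.0

def traverse(node, state, c=1.414):
    """One MCTS iteration on a fresh state clone: selection, expansion,
    playout, and back-propagation in a single recursion."""
    if node.visits == 0 and node.move is not None:
        state.apply(node.move)        # first visit: do the move...
        value = state.playout()       # ...and perform a playout
    else:
        if node.move is not None and node.kind != "FIRST":
            state.apply(node.move)    # FIRST nodes wait for the SECOND reply
        if state.is_terminal():
            value = state.evaluate()  # terminal: return the node's value
        else:
            if not node.children:     # expand "fully": one child per move
                node.children = [Node(m) for m in state.moves()]
            def ucb(ch):              # standard UCB-1 tree policy
                if ch.visits == 0:
                    return float("inf")
                return ch.value / ch.visits + c * math.sqrt(
                    math.log(node.visits) / ch.visits)
            value = traverse(max(node.children, key=ucb), state, c)
    node.visits += 1                  # back-propagation on the way up
    node.value += value
    return value
```

Each call from the top level should be given a fresh clone of the root state, mirroring the Clone(s) step in the pseudocode.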
UCT Considering Durations – a.k.a. UCTCD
Open questions
The question is whether alternative approaches would perform better.
Portfolio Greedy Search
Combining scripts and playouts while avoiding tree search.
Churchill, D., & Buro, M. (2013, August). Portfolio greedy search and simulation for large-scale combat in StarCraft. In 2013 IEEE Conference on Computational Intelligence in Games (CIG) (pp. 1-8). IEEE.
Portfolio Greedy Search – a.k.a. PGS
Motivation
An extreme solution for RTS combat is to use fixed scripts, e.g., NOK-AV or KITER. However, using a single script is too restrictive, while some scripts can be decomposed into sub-scripts (KITER == NOK-AV + RETREAT). Can we extend this?
Use a set of scripts; try to find a good assignment of scripts to units using a greedy approach, simulating possible exploitation by the opponent.
Portfolio Greedy Search – a.k.a. PGS
Portfolio Greedy Search – a.k.a. PGS
Portfolio Greedy Search – a.k.a. PGS
PortfolioGreedySearch sections:
Row 7 <-> [A] … default enemy
Row 8 <-> [B] … seed yourself
Row 9 <-> [C] … seed enemy
Rows 10-13 <-> [D] … improve
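Sections [A]-[D] can be sketched in Python as follows; the `playout(me, enemy)` evaluator (returning a score from the searching player's perspective) is an assumed interface, and all names are illustrative, not the paper's code:

```python
def improve(assign, units, portfolio, evaluate):
    """One greedy pass: for each unit in turn, keep the script that
    scores best with all other assignments held fixed."""
    for u in units:
        assign[u] = max(portfolio, key=lambda s: evaluate({**assign, u: s}))
    return assign

def portfolio_greedy_search(units_me, units_enemy, portfolio, playout, iters=1):
    default = portfolio[0]
    enemy = {u: default for u in units_enemy}                 # [A] default enemy
    me = improve({u: default for u in units_me}, units_me,    # [B] seed yourself
                 portfolio, lambda a: playout(a, enemy))
    enemy = improve(enemy, units_enemy, portfolio,            # [C] seed enemy
                    lambda a: -playout(me, a))
    for _ in range(iters):                                    # [D] improve
        me = improve(me, units_me, portfolio, lambda a: playout(a, enemy))
        enemy = improve(enemy, units_enemy, portfolio, lambda a: -playout(me, a))
    return me
```

Note that the enemy is improved by *minimizing* our playout score, which is what "simulating possible exploitation by the opponent" amounts to here.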
Portfolio Greedy Search – a.k.a. PGS
Portfolio Greedy Search – a.k.a. PGS
Experiments and Results
Comparing PGS to UCTCD, ABCD, and scripts.
Churchill, D., & Buro, M. (2013, August). Portfolio greedy search and simulation for large-scale combat in StarCraft. In 2013 IEEE Conference on Computational Intelligence in Games (CIG) (pp. 1-8). IEEE.
Portfolio Greedy Search – Results
Portfolio Greedy Search – Results
Scenario setup
Groups used
100 random battles per group configuration, per symmetric/separated state, per number of units.
Portfolio Greedy Search – Results
Search settings
More details in the paper…
Fun fact: ABCD and UCTCD work with low-level actions, while PGS works with scripts only.
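To see why this matters, compare the per-decision branching factors; the numbers below are purely illustrative, not from the paper:

```python
units = 8
moves_per_unit = 10   # low-level: e.g. attack targets + move directions
portfolio_size = 4    # PGS: scripts in the portfolio

# low-level search branches over joint unit moves,
# PGS only over joint script assignments
low_level_branching = moves_per_unit ** units   # 10^8 joint moves
script_branching = portfolio_size ** units      # 4^8 = 65,536 assignments
```

Even with a modest unit count, the low-level joint-action space dwarfs the script-assignment space by orders of magnitude.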
Portfolio Greedy Search – Results
Portfolio Greedy Search – Results
PGS is much better for separated states.
What are your hypotheses?
What would you do in real life then?
Limitation of PGS
Possibility of non-convergence.
Churchill, D., & Buro, M. (2013, August). Portfolio greedy search and simulation for large-scale combat in StarCraft. In 2013 IEEE Conference on Computational Intelligence in Games (CIG) (pp. 1-8). IEEE.
Portfolio Greedy Search – Limitations
[Figure: P1 vs. P2 example illustrating non-convergence]
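A toy illustration of the non-convergence: with a rock-paper-scissors-like portfolio, alternating greedy best responses cycle forever. This example is purely illustrative, not the paper's scenario:

```python
# each script beats exactly the one it maps to
BEATS = {"ROCK": "SCISSORS", "SCISSORS": "PAPER", "PAPER": "ROCK"}

def best_response(opponent_script):
    """The unique script that beats the opponent's current choice."""
    return next(s for s, beaten in BEATS.items() if beaten == opponent_script)

p1 = p2 = "ROCK"
history = []
for _ in range(6):
    p1 = best_response(p2)  # P1 greedily re-assigns against P2...
    p2 = best_response(p1)  # ...and P2 immediately exploits P1 again
    history.append((p1, p2))
# the joint assignment cycles with period 3 instead of converging
```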
Nested Greedy Search
Using PGS in a hierarchy.
Moraes, R., Mariño, J., & Lelis, L. (2018, September). Nested-greedy search for adversarial real-time games. In Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (Vol. 14, No. 1).
Nested Greedy Search – a.k.a. NGS
Nested Greedy Search – a.k.a. NGS
BTW, the notation could not have been more confusing here…
NestedGreedySearch sections:
Rows 1, 3 <-> [A] … default me
Rows 2, 4 <-> [B] … default enemy
Row 5 <-> [C] … while we have time
Rows 6-10 <-> [C1] … check on reassignments
Row 9 <-> [C2] … should we reassign?
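The loop above can be sketched in Python as follows; the `playout` evaluator and the time-budget handling are assumed interfaces, and the nested enemy reply is mirrored here with a single greedy pass over the enemy units:

```python
import time

def nested_greedy_search(units_me, units_enemy, portfolio, playout, budget=0.04):
    default = portfolio[0]
    me = {u: default for u in units_me}                   # [A] default me

    def enemy_response(me_assign):
        """Nested greedy search on the second ply: the enemy's greedy
        best reply to a candidate assignment of ours."""
        enemy = {u: default for u in units_enemy}         # [B] default enemy
        for u in units_enemy:
            enemy[u] = max(portfolio,
                           key=lambda s: -playout(me_assign, {**enemy, u: s}))
        return enemy

    def value(me_assign):
        """Evaluate a candidate against the enemy's best reply, not
        against a fixed enemy assignment (the key difference from PGS)."""
        return playout(me_assign, enemy_response(me_assign))

    deadline, improved = time.time() + budget, True
    while improved and time.time() < deadline:            # [C] while we have time
        improved = False
        for u in units_me:                                # [C1] try reassignments
            best = max(portfolio, key=lambda s: value({**me, u: s}))
            if best != me[u] and value({**me, u: best}) > value(me):  # [C2]
                me[u], improved = best, True
    return me
```

The extra `enemy_response` call inside every evaluation is what makes NGS more expensive per iteration than PGS.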
Nested Greedy Search – a.k.a. NGS
Nested Greedy Search – a.k.a. NGS
Nested Greedy Search – a.k.a. NGS
NestedGreedySearch notes:
Having GS on the second ply of the game tree effectively means that we push the non-convergence problem one ply down.
Experiments and Results
Comparing NGS to other searches.
Moraes, R., Mariño, J., & Lelis, L. (2018, September). Nested-greedy search for adversarial real-time games. In Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (Vol. 14, No. 1).
Nested Greedy Search – Results
Nested Greedy Search – Results
Nested Greedy Search – Results
NGS was the usual winner.
AHT – Adversarial HTN, NAV – Naive MCTS, PS – Puppet Search, STT – Strategy Tactics
… all adversaries are search-based.
PGS / NGS – Final thoughts
Is it all but smoke and mirrors?
Searches using low-level actions are probably at an overwhelming disadvantage compared to script-based approaches.
Should we really be comparing stuff across “time limits”? How about using “iterations” instead?
Where is the breaking point?
What if we put scripts into MCTS instead of low-level actions (psst, we actually did, and it clearly outperformed PGS)?
Or otherwise used scripts to prune MCTS so we get to the “breaking point”?
=> Any of this could effectively become your master thesis topic!
That’s it for today!