1 of 41

NAIL122 AI for Games – Modeling RTS Combat Situations II – Jakub Gemrot

Faculty of Mathematics and Physics

Charles University

20.12.2021

2 of 41

MCTS – Reminder

3 of 41

MCTS – High-level description

MCTS iteration

 

Image adapted from: Santos, A., Santos, P. A., & Melo, F. S. (2017, August). Monte Carlo tree search experiments in Hearthstone. In 2017 IEEE Conference on Computational Intelligence and Games (CIG) (pp. 272-279). IEEE.

4 of 41

MCTS is formulated for sequential games.

How about simultaneous ones?

=> UCT considering durations (UCTCD)

Churchill, D., & Buro, M. (2013, August). Portfolio greedy search and simulation for large-scale combat in StarCraft. In 2013 IEEE Conference on Computational Intelligence in Games (CIG) (pp. 1-8). IEEE.

5 of 41

UCT Considering Durations – a.k.a. UCTCD

Main idea: do not store states, only actions; mark nodes as FIRST, SECOND, or SOLO. Do not apply actions in a FIRST node; wait for the SECOND.

6 of 41

UCT Considering Durations – a.k.a. UCTCD

Top level of MCTS, limited both by a maximum number of iterations and by time.

One MCTS iteration == one Traverse call.

At the beginning of each iteration, we call Clone(s) so that we always work with a fresh clone of the state.
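The top-level loop above can be sketched as follows (a sketch, not the paper's code; `traverse` and `clone` stand in for the pseudocode's Traverse and Clone, and the root's node fields are assumptions):

```python
import time

def uctcd(state, max_iterations, time_limit_ms, root, traverse, clone):
    """Top-level UCTCD loop: run Traverse on a fresh state clone each
    iteration, stopping at whichever budget runs out first."""
    deadline = time.time() + time_limit_ms / 1000.0
    iterations = 0
    while iterations < max_iterations and time.time() < deadline:
        traverse(root, clone(state))  # always work with a fresh clone
        iterations += 1
    # Recommend the most-visited move at the root.
    return max(root.children, key=lambda ch: ch.visits).move
```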

7 of 41

UCT Considering Durations – a.k.a. UCTCD

The Traverse method contains selection, expansion, and back-propagation; it is formulated as a recursive procedure (row 19).

8 of 41

UCT Considering Durations – a.k.a. UCTCD

SelectNode implements the MCTS tree policy using standard UCB, a.k.a. UCB1.
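UCB1 balances a child's average reward against an exploration bonus; a minimal sketch (the exploration constant `c` and the node fields `value`/`visits` are assumptions):

```python
import math

def ucb1(child_value, child_visits, parent_visits, c=math.sqrt(2)):
    """UCB1 score: average reward plus an exploration bonus."""
    if child_visits == 0:
        return float("inf")  # unvisited children are tried first
    exploitation = child_value / child_visits
    exploration = c * math.sqrt(math.log(parent_visits) / child_visits)
    return exploitation + exploration

def select_node(children, parent_visits):
    """Tree policy: pick the child with the highest UCB1 score."""
    return max(children, key=lambda ch: ucb1(ch.value, ch.visits, parent_visits))
```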

9 of 41

UCT Considering Durations�a.k.a. UCTCD

1. UpdateState is used to advance the (cloned) state using “move” information that is maintained in each node; i.e. a node is associated with a move that was used to get to it.

2. Each node is also labeled as:�FIRST – simultaneous node, first player to decide�SECOND – simultaneous node, second player to decide

3. SOLO nodes „not created“; sequence of SOLO node moves is aggregated within single node move of corresponding player.
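The labeling rule can be sketched like this (a hypothetical helper; `can_act` reflecting action durations is an assumed state method):

```python
def child_node_type(parent_type, state):
    """Label a child node in UCTCD's tree (a sketch)."""
    if parent_type == "FIRST":
        # The parent held back the first player's move;
        # this node answers it with the second player's move.
        return "SECOND"
    if state.can_act(0) and state.can_act(1):
        return "FIRST"  # simultaneous decision point: first player decides
    return "SOLO"       # durations differ, so only one player can act now
```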

10 of 41

UCT Considering Durations – a.k.a. UCTCD

Rows 9-11:
We are at a leaf visited for the first time; apply the move (row 10) and perform a playout (row 11).
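The playout is typically a cheap simulation to the end of combat; a minimal random-playout sketch (all state methods and the step cap are assumptions, and `legal_moves` is assumed non-empty in non-terminal states):

```python
import random

def playout(state, evaluate, max_steps=100):
    """Play uniformly random moves until the combat ends (or a step
    cap is hit), then score the final state with `evaluate`."""
    steps = 0
    while not state.is_terminal() and steps < max_steps:
        state.apply(random.choice(state.legal_moves()))
        steps += 1
    return evaluate(state)
```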

11 of 41

UCT Considering Durations – a.k.a. UCTCD

Otherwise, update the state with the move, and…

12 of 41

UCT Considering Durations – a.k.a. UCTCD

Rows 14-15:
For a terminal node, just return the value of the node.

13 of 41

UCT Considering Durations – a.k.a. UCTCD

Rows 17-19:
Otherwise, ensure children exist by enumerating the possible actions (rows 17-18), i.e., expand the node “fully”, and continue the traversal (row 19).

14 of 41

UCT Considering Durations – a.k.a. UCTCD

Open questions

  1. Generating all children at a node (row 18) must be costly.
  2. Recomputing the state on each traversal must be costly.
  3. Is it necessary to keep FIRST / SECOND nodes separate?

The question is whether alternative approaches would perform better.

15 of 41

Portfolio Greedy Search

Combining scripts and playouts while avoiding tree search.

Churchill, D., & Buro, M. (2013, August). Portfolio greedy search and simulation for large-scale combat in StarCraft. In 2013 IEEE Conference on Computational Intelligence in Games (CIG) (pp. 1-8). IEEE.

16 of 41

Portfolio Greedy Search – a.k.a. PGS

Motivation

An extreme solution for RTS combat is to use fixed scripts, e.g., NOK-AV or KITER. However, a single script is too restrictive, and some scripts can be decomposed into sub-scripts (KITER == NOK-AV + RETREAT). Can we extend this?

  • Portfolio Greedy Search

Use a set of scripts; try to find a good assignment of scripts to units with a greedy approach, simulating possible exploitation by the opponent.
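One greedy improvement pass over the units can be sketched as follows (a sketch of the idea, not the paper's pseudocode; `evaluate` is an assumed playout-based scoring function returning our score):

```python
def greedy_improve(units, portfolio, assignment, enemy_assignment, evaluate):
    """For each unit in turn, keep the portfolio script that scores best
    while all other script assignments stay fixed (one PGS-style pass)."""
    for u in units:
        best_script, best_value = assignment[u], None
        for script in portfolio:
            assignment[u] = script
            value = evaluate(assignment, enemy_assignment)
            if best_value is None or value > best_value:
                best_script, best_value = script, value
        assignment[u] = best_script  # commit the best script for this unit
    return assignment
```

In PGS this pass is repeated, alternating between our own and the opponent's assignment, until the budget runs out.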

17 of 41

Portfolio Greedy Search – a.k.a. PGS

 

18 of 41

Portfolio Greedy Search – a.k.a. PGS

 

19 of 41

Portfolio Greedy Search – a.k.a. PGS

PortfolioGreedySearch sections:

Row 7 <-> [A] … default enemy

Row 8 <-> [B] … seed yourself

Row 9 <-> [C] … seed enemy

Rows 10-13 <-> [D] … improve

20 of 41

Portfolio Greedy Search – a.k.a. PGS

 

21 of 41

Portfolio Greedy Search – a.k.a. PGS

 

22 of 41

Experiments and Results

Comparing PGS to UCTCD, ABCD, and scripts.

Churchill, D., & Buro, M. (2013, August). Portfolio greedy search and simulation for large-scale combat in StarCraft. In 2013 IEEE Conference on Computational Intelligence in Games (CIG) (pp. 1-8). IEEE.

23 of 41

Portfolio Greedy Search – Results

 

24 of 41

Portfolio Greedy Search – Results

Scenario setup

Groups used

100 random battles per group configuration per sy/se-state per number of units.

25 of 41

Portfolio Greedy Search – Results

Search settings

More details in the paper…

Fun fact: ABCD and UCTCD work with low-level actions, while PGS works with scripts only.

26 of 41

Portfolio Greedy Search – Results

27 of 41

Portfolio Greedy Search – Results

PGS is much better for separated states.
What are your hypotheses?
What would you do in real life, then?

28 of 41

Limitation of PGS

Possibility of non-convergence.

Churchill, D., & Buro, M. (2013, August). Portfolio greedy search and simulation for large-scale combat in StarCraft. In 2013 IEEE Conference on Computational Intelligence in Games (CIG) (pp. 1-8). IEEE.

29 of 41

Portfolio Greedy Search – Limitations

 

[Figure: sequence of P1 and P2 responses]
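Iterated best responses need not converge whenever the underlying game has no pure-strategy equilibrium; a toy illustration on matching pennies (not from the paper):

```python
def best_response(payoff, opponent_action):
    """Best pure response in a matrix game (rows = our actions)."""
    return max(payoff, key=lambda a: payoff[a][opponent_action])

# Matching pennies has no pure-strategy equilibrium, so alternating
# greedy best responses cycle forever instead of converging.
payoff_p1 = {"H": {"H": 1, "T": -1}, "T": {"H": -1, "T": 1}}
payoff_p2 = {"H": {"H": -1, "T": 1}, "T": {"H": 1, "T": -1}}

a1, a2 = "H", "H"
history = []
for _ in range(4):
    a1 = best_response(payoff_p1, a2)  # P1 improves against P2
    a2 = best_response(payoff_p2, a1)  # P2 improves against P1
    history.append((a1, a2))
# history now alternates between the same two joint actions.
```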

30 of 41

Nested Greedy Search

Using PGS in hierarchy

Moraes, R., Mariño, J., & Lelis, L. (2018, September). Nested-greedy search for adversarial real-time games. In Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (Vol. 14, No. 1).

31 of 41

Nested Greedy Search – a.k.a. NGS

 

32 of 41

Nested Greedy Search – a.k.a. NGS

BTW, the notation could not have been more confusing here…

NestedGreedySearch sections:

Row 1,3 <-> [A] … default me

Row 2,4 <-> [B] … default enemy

Row 5 <-> [C] … while we have time

Rows 6-10 <-> [C1] … check on reassignments

Row 9 <-> [C2] … should we reassign?

33 of 41

Nested Greedy Search – a.k.a. NGS

 

34 of 41

Nested Greedy Search – a.k.a. NGS

 

35 of 41

Nested Greedy Search – a.k.a. NGS

NestedGreedySearch notes:
Having GS on the second ply of the game tree effectively means that we move the non-convergence down by one ply.
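The nesting idea can be sketched as follows (names and signatures are assumptions): score your assignment against an enemy that first greedily optimizes its own assignment, rather than against a fixed default enemy.

```python
def nested_evaluate(my_assignment, enemy_units, portfolio, evaluate):
    """NGS-style evaluation sketch: the enemy runs one greedy pass over
    its own script assignment (minimizing our score from `evaluate`)
    before our assignment is scored against it."""
    enemy = {u: portfolio[0] for u in enemy_units}  # seed enemy with default script
    for u in enemy_units:
        best, best_value = enemy[u], None
        for script in portfolio:
            enemy[u] = script
            value = evaluate(my_assignment, enemy)  # our score; enemy minimizes it
            if best_value is None or value < best_value:
                best, best_value = script, value
        enemy[u] = best
    return evaluate(my_assignment, enemy)
```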

 

36 of 41

Experiments and Results

Comparing NGS to other searches.

Moraes, R., Mariño, J., & Lelis, L. (2018, September). Nested-greedy search for adversarial real-time games. In Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (Vol. 14, No. 1).

37 of 41

Nested Greedy Search – Results

 

38 of 41

Nested Greedy Search – Results

 

39 of 41

Nested Greedy Search – Results

NGS was the usual winner.

AHT – Adversarial HTN, NAV – Naive MCTS, PS – Puppet Search, STT – Strategy Tactics

… all adversaries are search-based.

40 of 41

PGS / NGS – Final thoughts

Is it all just smoke and mirrors?

Searching over low-level actions probably puts a search at a disadvantage against script-based approaches.

Should we really be comparing approaches across “time limits”? How about using “iterations” instead?

Where is the breaking point?

What if we put scripts into MCTS instead of low-level actions (psst, we actually did, and it clearly outperformed PGS)?

Or otherwise used scripts to prune MCTS so we get to the “breaking point”?

=> Any of this could effectively become your master's thesis topic!

41 of 41

That’s it for today!