Analysing game learning analytics of players in the Ethers games – a summer internship report

0. Preface

I’m Vishesh Kumar, a design student at the Indian Institute of Technology, Guwahati, soon beginning my fourth (and final) year of undergraduate study. I am interested in making things (hardware or software, technological or not) and in the learning sciences, and I find myself particularly excited about making educational tools and games. I spent this summer (after completing my third year) working as a remote intern at Iridescent, under the guidance of Dr. Kevin Miklasz.

The stated goal of the internship was “to engage in an exploratory analysis into game learning analytics by designing and developing a series of techniques to analyze data from the Ether game series”. This report summarizes the work I conducted over the summer towards that goal.

1. Background

The Ethers games – currently The Fluid Ether and The Gravity Ether – are a series of games that allow students to play with complex physics forces that are difficult to simulate in a traditional classroom environment. Each game focuses the player on manipulating environments through an individual mechanic. The Fluid Ether involves controlling 'jets' – fans that create currents inside a water tank – with the aim of moving balls to accomplish goals like collecting coins and breaking obstacles. The Gravity Ether depicts a scenario in outer space where planets move around (or are ejected from a spawner), and their motion is controlled by placing and removing black holes over the maps of the levels. The aim of controlling the planets' motion is largely similar to that in The Fluid Ether – collecting coins, breaking obstacles, 'charging' up glow regions, and so on.

I went through several readings to build context for the internship and the assessment system we were to design. I began with Iridescent’s Assessment Framework, which outlines the target traits of Iridescent’s events and programs. This also gave direction to the goal of the internship project: to analyse the data of students’ play patterns with the aim of identifying traits exhibited by the students, such as Persistence, Creativity, Curiosity, Conceptual Understanding, and Engineering Design Patterns.

The further readings described how to construct models for Evidence Centred Design (ECD): identifying the traits that are to be measured (e.g. persistence), the tasks or activities that would represent them in the designated game or scenario (repeatedly attempting levels to attain more goals), and how to measure these tasks (the number of level restarts, and how many goals are accomplished cumulatively).

There were examples of how this measurement of traits was done with a few different games; some of those examples included the ECD models built and used by the testers as well. These readings gave me an understanding of the value of stealth assessment, particularly in video games – as we were going to be building for the Ethers games. Measuring learning contextualized within an immersive activity is more appropriate than measuring it in a test environment. To build a good stealth assessment, especially in the context of video games, a commonly used method is to build an Evidence Centred Design (ECD) model. An ECD model comprises a Competency Model (with regard to the Ethers games, the traits that Iridescent focuses on – Persistence, Creativity, …); an Evidence Model – the evidence in a player's activities that could be said to represent a related trait or competency; and a Task Model – the tasks that elicit the listed behaviors and evidence, such that one can remark on the player's competency.

The readings described a variety of assessments, in games like World of Goo, Oblivion, and Taiga Park, across different papers. Some of them involved non-stealth, explicit assessments, revealing the difference between competency in conceptual understanding and certain traits, and competency in playing the game itself.

Another interesting takeaway from the readings was that these in-game assessments were designed as summative assessments (a measured and final set of values) with respect to the games themselves – but were particularly valuable as formative assessments in the scenario of a classroom. Learning about students' traits in engaging with the game would better inform and prepare teachers about their students' behaviors, and how to manage or adapt to them.

2. Diving into the data

With the aim of eventually building an ECD model that could be implemented on the data logs of the Ether games, we began with an exploratory analysis to see what kinds of extractable information and patterns were hidden inside the hundreds of thousands of lines of data.

We began by reconstructing the time distribution charts already available on the teacher’s dashboard, for me to gain gradual familiarity with the nature of the data – the variety of logs, and how each kind could be used.

2.1. Choosing events, and making plots

We began by scouring through the Gravity Ether game’s data. In thinking about what kinds of plots could give us information, we first made histograms to see when a variety of events (goal completes, obstacle breaks, etc.) happened. Since the main activity in Gravity Ether (the level play section) is placing (and removing) black holes, we attempted to identify patterns in that interaction – where the black holes were placed on the levels’ maps, and where they were placed with respect to the other objects, among a few others. We engaged in similar plots and explorations with the playlogs of the level editor.

A similar procedure was followed for the eventual analyses of Fluid Ether: the same kinds of time plots were made, and a different set of plots was made to visualize the affordances offered in Fluid – like rotating jets, and toggling jets or changing their power level.

This was a gradual and iterative process of making plots, looking for patterns in them, and then thinking up and making further plots that might carry related meaning or work in conjunction with the earlier graphs. The thinking behind the different plots included finding visualizations of how the students engaged with the game (over time, conceptually, etc.), and representations that could relate to the traits identified as goals of Iridescent activities (Persistence, Curiosity, …).

2.2. Cleaning up data

After making the first few plots across all the data, and divided across the levels, we divided the data across the students. After a few rough attempts at dividing the data cleanly – made challenging by several aspects of how the data logging took place (programmatically as well as logistically) – I wrote some preparatory scripts to selectively print lists of relevant row numbers (for the different kinds of divisions), and additional code to keep them ready for repeated use across all the other scripts.

This is explained in greater detail in the data preparation and flow described in Appendix A.

After seeing that some students’ records frequently provided no useful data, I delved into the records of all students with too few rows, or with older kinds of data logging (the beta version of the game’s logs were of a different nature compared to the post-release logs, and we chose not to work with them), and manually made a list of the student indices that should be excluded from the analyses. This significantly helped in producing cleaner and more informative plots. A descriptive, though slightly encoded, account of how the students were filtered has been documented in the project files.

2.3. Preliminary investigation for correlations

While making some of the graphs, we thought about how to reduce these visualizations to single numbers that we could use to compare the students and find trends. We encapsulated the different representations in varying styles – the time measurements were averaged over time, whereas the other black hole plots were reduced to frequency arrays. We then looked for correlations in these measures across students. For example, we checked whether completing a greater number of goals correlated with placing black holes closer to the planets.
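As a rough illustration of this step (the two vectors below are hypothetical stand-ins for per-student measures), such a correlation can be computed with base R:

goalsPerStudent = c(12, 7, 15, 9, 4) #hypothetical: total goals completed, one element per student
meanBHDistance = c(3.1, 5.6, 2.8, 4.0, 6.2) #hypothetical: mean distance from each placed black hole to the nearest planet

cor(goalsPerStudent, meanBHDistance) #Pearson correlation coefficient
cor.test(goalsPerStudent, meanBHDistance) #the same, with a significance test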

2.4. Constructing an ECD model, making the measures matrix

After having seen the sorts of data that could be extracted (from the earlier step of making numerous plots), we took a step back to list what (measurable) tasks could correspond to the traits that we wanted to (affect, and) assess. A preliminary version of this list had been made earlier (before the plotting, and after the readings on ECD models), and it was extended and modified in this round. Alongside these tasks, we also noted what our 'measure' would be – the actual number that would be calculated from the data. The measure can be thought of as the “evidence” that links the tasks to the learning claims. For instance, if persistence could be assessed by seeing how long the students worked at each of the levels (task), then the time played divided by the number of restarts (average time per run) could be a measure for it – such that one number corresponded to each student. This enabled us to see trends over the students across different measures, and also to compare the measures with each other.
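To make that concrete, a minimal sketch of this persistence measure (the vectors are hypothetical, one element per student):

timePlayed = c(420, 610, 95, 300) #hypothetical: total seconds spent on a level
restarts = c(3, 5, 1, 2) #hypothetical: restarts of that level

avgTimePerRun = timePlayed / restarts #one persistence-related number per student; students with zero restarts would need special handling (e.g. dividing by restarts + 1)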

A table was built listing tasks and measures. All the measures were then computed, and a matrix of these measures was compiled, with numerous columns (some measures had one value for each level each student had played!) and one row per student.

This matrix was used for a more thorough analysis of correlations and identifying trends – in the same vein as described in 2.3.

2.5. Future steps

After finding a few correlations among measures associated with the same trait, and across different traits, we need to verify our findings by matching them with subjective observations. A common way of doing this involves observing students exhibiting the desirable traits during play. These observations are then matched with common data patterns across the different students. This would tell us whether the identified patterns, when found in analysing different students, do reflect the traits being assessed for. The similarity of these patterns to the measures and correlations we found would let us remark on how accurate or fitting our ECD model is, and how it can be improved. Unfortunately, the summer internship did not allow enough time for me to complete this step.

3. Exploratory Analysis Results

As described in section 2, we began working with the Gravity Ether data.

After gathering some familiarity with the data, we began delving into building histograms and other plots.

3.1. Information over time

Our initial histograms involved the time distribution of different events – when goals were completed, and when level objects were broken. This was to observe whether there were any noticeable differences in when in-game events happened, across different levels and/or different students.

goalTimes 0 .png

Goal times histograms for level 1, without (left) and with (right) student breakup

The goal times graphs above depict the time vs. frequency distribution of when “goalComplete” events occurred since the start of the level’s play.

And:

clickTimes 0 .png

Click times’ histograms of level 0, without (left) and with (right) student breakup

These are histograms of the click times within each level – which indicate how long the players stayed engaged with those levels, or with the game.

3.2. Understanding the black holes

The other event worth paying attention to and trying to understand was the placement and removal of black holes. For this, we made heatmaps of where the black holes were placed (and removed). Seeing the black hole placement maps right above the levels’ maps is an interesting sight too! We noticed how the black hole heatmaps tended to correlate with the positions of level objects.

Levels 1, 2, and 3; heatmaps (above) and level maps (below)

bhPlace-level 5 .png

Levels 4 (first challenge level), 5 and 6

Levels 7, 8 (second challenge), and 9

We also reduced the black hole heatmaps to numerical arrays, so as to be able to compare different students’ heatmaps with objective measures. The map was reduced to an 8x5 grid, and each student’s heatmap was represented as an array of forty elements, the value of each element being the number of black holes placed in the corresponding subgrid. We then conducted a spatial cross-correlation between each student’s heatmap and the average heatmap summed over all students. Using this numerical representation, we could find an ‘average’ placement of the black holes for each of the level maps, as played by the many students. Representing each student’s play (in terms of spatial placement of black holes) and the average play in the same terms enables us to compare how similar students’ play was to the mean, and to each other (done in our analyses by finding correlations between the respective arrays).
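A minimal sketch of this reduction, assuming xs and ys are hypothetical vectors of one student’s black hole placement coordinates, and the map spans roughly 0–37 by 0–25 (the limits used in the plot guidelines of Appendix B):

binX = cut(xs, breaks = seq(0, 37, length.out = 9), include.lowest = TRUE) #8 columns
binY = cut(ys, breaks = seq(0, 25, length.out = 6), include.lowest = TRUE) #5 rows
studentArray = as.vector(table(binX, binY)) #this student's forty-element array

avgArray = colMeans(studentMatrix) #with the students' arrays stacked as rows of a (hypothetical) studentMatrix, the 'average' placement is the column mean
similarity = cor(studentArray, avgArray) #how close this student's placement is to the mean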

We explored a variety of relations between other objects and these black hole events, to extract greater meaning or more (measurable) patterns – for example, heatmaps of the black holes placed shortly before completing a goal, broken up by student number;

Heatmaps of level 1’s play for different students

and histograms of the distance between the planets and the black holes – to see where the black holes were placed with respect to the planets.

Histograms of distances of the closest planet to a placed black hole, with student break-up (left) and without (right)

This had meaning with respect to an articulated concept as well – that nearer black holes cause greater acceleration of the planets – and implementing this concept aptly could be said to represent effective play towards achieving some of the level objectives in the game. Most players did intuitively place the black holes near the planets more often than not, as was visible in the graphs.
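The distance underlying these histograms is straightforward to compute; a sketch with hypothetical names, applied at every black hole placement event:

nearestPlanetDist = function(bx, by, planetXs, planetYs){ #(bx, by) is where the black hole was placed; planetXs and planetYs hold the planets' positions at that moment
        min(sqrt((planetXs - bx)^2 + (planetYs - by)^2))
}

nearestPlanetDist(10, 5, c(8, 20, 31), c(5, 14, 2)) #returns 2, the distance to the planet at (8, 5)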

3.3. Dissecting the editor

In visualizing the editor’s usage, we thought of two approaches – seeing the time distribution of how long players designed levels, playtested, and iterated; and seeing heatmaps of where they placed different game elements. The latter was to gain some idea of the method or thought that motivated student activities as designers. Seeing all the placed and removed elements in one plot wasn’t very informative, so we broke each of the plots up across time spans of 15 seconds – which gave a much better picture of how students placed elements, removed them, playtested, and then modified their designs.

Fig. Student ‘-6’ ‘s editor usage

This plot contains quite a bit of information about editor usage. Each block covers fifteen seconds of editor use, and in the hi-res version of the image, different classes of placed and removed items are represented by different letters (blocks, light regions, balls, spawners, etc. – the entire legend is inside the corresponding code file timebreak-labs.R); green letters represent placement of these items, and red letters represent removal.

A few blocks have green circles, which represent the playtesting black hole positioning done by those players.

This also motivated us to do a time breakup of the black hole placement in the levels – giving a much more granular picture of how students’ play unfolded, though one that was seemingly difficult to make objective or measurable sense of.

Student ‘-15' ‘s play of Level 1 in the Gravity Ether

These plots have a rather high density of information too. Each block represents 15 seconds of play. Green circles are for placed black holes, and red ones for removed black holes. On the right of each block, two lines of information relay the number of coins collected, blocks broken, planets swallowed by black holes, and level restarts in that time block. Extra lines are printed when a goal is completed in that time block, with the goal’s text written in that line.
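A minimal sketch of the 15-second binning behind these block plots (eventTimes is a hypothetical vector of each event’s seconds since the start of play):

blockLength = 15
blockIndex = floor(eventTimes / blockLength) + 1 #which block each event falls into

eventsPerBlock = split(seq_along(eventTimes), blockIndex) #event indices grouped per block, to be drawn one block at a time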

Two more graphs were made to explore editor usage as well: one called ‘EditorTimes’ (left graph), and the other an ‘obstacle count’ (right graph).

Editor Times plot (left) and Obstacle Count (right)

The Editor Times plot was a time distribution of how long the players were designing the level (editing obstacles and other elements), and how long they were playtesting. The title above each block also has the number of switches to playtests – and the graphs themselves have three color-coded lines representing play time (red), design time (green), and ‘other’ (blue), as logged in the game.

The obstacle count is a count of how many obstacles there were on the level being designed. Some of the graphs, due to unfortunate glitches in the logging, happen to be downward slopes indicating negative numbers of obstacles. Apart from those, the lines show when and how many obstacles (‘grid elements’ like blocks, walls, etc.) were placed and removed during editor usage. Multiple dips thus indicate greater revision – removing and replacing obstacles – by the player.
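The obstacle count line is essentially a running sum over placement and removal events; a minimal sketch (the deltas vector is hypothetical):

deltas = c(+1, +1, +1, -1, +1, -1, -1) #+1 for each obstacle placed, -1 for each removed, in log order
obstacleCount = cumsum(deltas) #the running count plotted above

plot(obstacleCount, type = "s") #step plot; repeated dips indicate revision, and logging glitches can drag the count negative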

3.4. Parallels in Fluid – over time, and editor

The initial time distribution charts were repeated for the Fluid Ether – seeing when goal completes, player clicks, and obstacle breaks happened in the games. As with Gravity, we saw these cumulatively over the whole game, broken up by levels, by students, and by each student’s activity in each level.

We did not get adequate time to delve any deeper into the editor play of Fluid separately (and the number of usable editor logs for Fluid was also much lower than for the Gravity Ether), so we just replicated our earlier time breakup of editor activities, with adjustments made for the unique elements in the Fluid game.

goalTimes 0 .png

Level 1’s goal times, without (left) and with (right) student break up, in the Fluid Ether.

3.5. Understanding the jets

In comparison to the simple interaction of placing and removing black holes in Gravity, Fluid provided the player with a more complex repertoire of controls. There were three kinds of jets, with significantly different behaviors. Unlike black holes, whose main (and only) feature was location (and which were thus easily represented in heatmaps), jets were generally constrained in location. Instead, they could be rotated, ‘slid’ along predefined paths, toggled on and off, or assigned a ‘power level’ with which they would push water.

A variety of plots were made, attempting to meaningfully visualize the interactions with the different jets. The following were some of the interesting ones.

Heatmaps of the rotating jets:

The map of rotating jets’ level (left), and a ‘heatmap’ of the angles of the rotating jets

Heatmap of rotating jets’ angles with student break-up

These heat maps, from level 6 of the Fluid Ether, were made by drawing arrows at every logged angle of the rotating jets. Denser green regions thus mean that the jets were held at, or rotated past, those angles more often.
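A sketch of how such an angle ‘heatmap’ can be drawn over a plot of the level map with base R (jx, jy, and angles are hypothetical names for a jet’s position and its logged angles, in radians):

r = 3 #arrow length in map units
for(a in angles){
        arrows(jx, jy, jx + r * cos(a), jy + r * sin(a), length = 0.05, col = rgb(0, 0.6, 0, 0.1)) #one translucent green arrow per logged angle; overlaps build up the denser regions
}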

Heatmaps of the sliding jets’ positions:

The map of sliding jets’ level (left), and a heatmap of the sliding jets

The heatmaps of the sliding jets for different students

Similar to the earlier rotating jets’ heat maps, these maps were made by drawing a translucent circle at every logged position of the sliding jets (the depicted maps are for Level 0 of Fluid). Denser red regions thus denote a greater frequency of presence of the sliding jets at those places.

Sliding jets’ relative positions without (left) and with (right) student break-up

A heatmap of the sliding jets’ ‘end points’ relative to each other: the points marked are wherever a jet was left after some sliding – and lines are drawn from these points to wherever the other jet was at that point of time (again for level 0).

Heatmaps of the balls’ positions, when jets were toggled, or their power level was changed:

The map of the level with ‘level jets’ (left), and the jets’ heatmaps

The level jets’ heatmaps with student break-up

As the jets are fixed in location, we hoped that seeing the balls’ positions when the jets were manipulated might provide some information. The above plots are from level 1 of Fluid Ether.

After these, we tried to visualize how the different jets inside a level were being used relative to each other. We attempted to make a time distribution of which jet was in what state, and when.

Time series plots for level 1 and level 5

For the plots above: the left one, of level 1, shows when the five jets on the map were on or off over the duration of the students’ play. The right one, of level 5, shows the quadrant in which each of the four rotating jets was over the play time, using four different colors corresponding to the quadrants.

Apart from that, we attempted a network analysis of the jets – which sets of jets were operating in synchrony for greater amounts of time.

Network analysis of jets being switched on

For level 1, this shows the orders in which jets were switched on. A directed arrow from jet A to jet B indicates that jet B was toggled on after jet A was. The network analysis appeared too confusing to be helpful in increasing our understanding of the players’ activities.
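The edge data underlying such a network is just a table of consecutive switch-on events; a minimal sketch with a hypothetical sequence:

onSequence = c("A", "B", "A", "C", "B") #the jets, in the order they were switched on

edgeCounts = table(from = head(onSequence, -1), to = tail(onSequence, -1)) #how often each jet was toggled on right after each other jet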

Taking some thought from the previous two plots, we came up with two other constructs – one a histogram showing how often all the other jets were switched on relative to each of the jets; the other a time flow of the ‘cumulative value’ of all switched-on jets.

Relative jets’ histograms

In the figure above, the last plot is the legend of which color corresponds to which jet. In each plot (this one being for level 1, with 5 level jets), each ‘bunch’ of bars corresponds to one jet (say, jet A), from the first to the last one, and each column represents how many times the colored jet was on while ‘jet A’ was on too.

This last figure was made to make it possible to compare the plays of different students in a numerical manner (similar to how the black hole heatmaps were reduced to numerical arrays). To this end, each jet was assigned a power of two. Thus the 'total value' – the sum of the values of the jets that are switched on – at any point of time uniquely identifies the jets' configuration. The line chart shows the change of this ‘total value’ of switched-on jets over time, and red circles along the lines indicate goal complete events. This can more easily show whether certain jet configurations were particularly preferred by students.

Above described ‘jetValues’ graph for level 1’s plays.
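A minimal sketch of the powers-of-two encoding (jetOn is a hypothetical logical vector of the current jet states):

jetOn = c(TRUE, FALSE, TRUE, FALSE, TRUE) #jets 1, 3 and 5 switched on

totalValue = sum(2^(which(jetOn) - 1)) #1 + 4 + 16 = 21; every configuration maps to a unique total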

These plots were preceded by some data preparation, which itself appears to be a useful encapsulation of how players played each level. These 'state matrices' have a row for every time a jet is toggled or has its power level changed, and also for every goal complete event. This gives a fairly good non-graphic depiction of how the players played with the toggle and level jets.

Student ‘-30’ ‘s play on level 2 logged as state matrices.
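A rough sketch of how such a state matrix could be accumulated (the names and column layout here are hypothetical):

nJets = 5
jetState = rep(0, nJets) #the current power level / on-off state of each jet
stateMatrix = matrix(nrow = 0, ncol = 2 + nJets) #columns: time, goal flag, then one column per jet

jetState[3] = 1 #e.g. jet 3 switched on at t = 42
stateMatrix = rbind(stateMatrix, c(42, 0, jetState)) #append a row on every toggle, power change, or goal complete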

4. Conclusion

The visualizations that we created carried varying levels of meaning, which could be extracted either measurably, or subjectively on viewing.

The heat maps appear to be rather interesting and valuable artifacts for logging play. It is noticeable how meaningfully different they are across levels, as well as across students. This suggests that substantial information is encoded in the heat maps – making comparisons across levels or students potentially meaningful.

The editor heatmaps similarly had potential meaning: one can see when the obstacles are being placed in fancy shapes or to spell things out, and when they are being modified with deliberation regarding play.

Finally, the measures table (so far, for the Gravity Ether) appears to have potential as a comparison tool to establish benchmarks for levels of persistence, creativity, and other traits. Some logical correlations, and sometimes an unexpected lack of correlation between related-seeming measures, reveal information about how the players are engaging with the game, and how well our assumptions are in accordance with the same.


Appendix A
How the preparatory scripts work, and what the prep files mean

studentRows.R (inside Data prep/) prints studentRows.txt (every row in this text file is of the format "<student number> \t <row numbers of this student's play>")

levelGroups.R prints levelPlay.txt (every row in this text file is of the same format as studentRows.txt, except each row covers only an individual level's plays. Every empty line represents a change in the level whose records are being printed)

editorRows.R prints editorPlays.txt (every row in this file is a collection of continuous editor play sessions by an individual student. Every empty line represents a change in the student whose editor plays' rows are being printed)

launch.R runs the scripts prepStudents, prepLevels, and prepEditor (and additionally levelSort.R in Fluid)

All the data structures provided by these prep-scripts are documented inside launch.R

prepStudents.R has manually entered data structures (toSelect or toReject) – lists of student numbers whose entries should (or should not, respectively) be considered, based on suitability in terms of adequate detail and usability.

lengthsandrejects.txt (in Data prep/) lists the number of rows each student number had, and which of these students were selected or not.
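Assuming the tab-separated format described above (with the row numbers themselves separated by tabs), the prep output can be read back in with something like:

lines = readLines("studentRows.txt")
parts = strsplit(lines, "\t")
studentRows = lapply(parts, function(p) as.numeric(p[-1])) #row numbers per student
names(studentRows) = sapply(parts, `[`, 1) #keyed by student number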

Appendix B
Commonly used code snippets

levels = union(leveljetLevels, toggleLevels) #specific to Fluid Ether: leveljetLevels is the list of level numbers which had level jets; toggleLevels, those which had toggle jets; and so on (specified in the launch.R of Fluid)

for(g in levels){ #g goes through all the level numbers relevant to the analysis

        png(file=paste("jetValues -", g,".png"), width=3000, height=1500, res=150) #this opens a png to plot the images. The dimensions were subject to change: equal height and width for most histograms, and a ratio of approximately 8:5 for heatmaps to match the aspect ratio of the levels' maps

        opar = par(mfcol=c(5,5), mar=c(2,2,1,1)) #this was used to divide the printed png into a grid of multiple plots. Done when multiple students' plots were to be made separately, but collated into the same levels' file.

        for(h in (allRows[which(levelList == g)] + 1):(allRows[which(levelList == g)+1] - 1)){ #There are multiple steps happening here, which can be made sense of by understanding the data structures from launch.R. which(levelList == g) tells us where this level number is in the list of levels (level 2 might be the 2nd level, or the 4th).

        #allRows[which(levelList == g)] is the first (and empty) row in the lists of row numbers, where level g's plays have been logged. Thus we start from 1 row ahead of this, and close 1 row before the next empty list.

        lplay = unlist(allLevels[h]) #lplay gets the level logs (the row numbers in the 'original' table of all the data) of the corresponding student from allLevels

        ...

        #plot function here

        legend(x="topright", legend=paste(levelStudentList[h]), bg = "white") #levelStudentList[h] is the student number whose level logs are there in the allRows[h]-th row of allLevels[[]].

        }

        dev.off() #close this level's png file, opened at the top of this loop

}

for (h in 2:length(editorStudentRows)){ #editorStudentRows, like allRows, has the start points of different students' editor logs.

        edPlay = unlist(editorRows[(editorStudentRows[h - 1] + 1): (editorStudentRows[h] - 1)]) #edPlay here gets all the editor logs of one student. Different rows in this group of rows in editorRows refer to different times that the editor was opened by the same student.

        studentNo = finalList[h - 1] #similar to levelStudentList above

}

Plot guidelines

#histogram

hist(ctimes, breaks = 20, col = "coral3", main = NULL, xlab = NULL, ylab = NULL)

#heatmap

plot(xs, ys, pch=19, cex = 4, col = rgb(0.7, 0.4, 0.5, 0.1), xlim = c(0, 37), ylim = c(0, 25), xlab = NULL, ylab = NULL, main = paste("Student", studentNumber))