Video Games and CS Education: Data analysis to understand how students think about game design

By Kristina Holsapple

Problem Description:

This data analysis explores how computer science students, specifically introductory students, think about video games and video game design. In designing application programming interfaces (APIs) for online tools, it is becoming more common for developers to consult their intended audience throughout the design process. When data about user preconceptions exists, developers can use it to guide the design principles of an API. Although tools already exist to help programmers develop video games, the broad problem motivating this analysis is that prior data on student preconceptions of game design does not exist. Thus, current game development tools have not been designed with respect to how users understand and expect the tool to work. This analysis has the potential to guide the future design of a student- and classroom-friendly library for learning computer science through game design.

Problem Background:

It is well known in Computer Science Education literature that games are a motivating context for students to learn computer science. For example, positive factors of game development in educational settings include enthusiastic responses to instruction and appeal across student demographics, according to Mark Overmars's "Teaching Computer Science through Game Design" (2004), and improved problem-solving skills, according to Mete Akcaoglu's "Learning problem-solving through making games at the game design and learning summer program" (2014).

Students use game development tools to develop games. However, users of programming tools' APIs can experience a challenging learning curve, according to Myers and Stylos (2016), who discuss how human-centered design can improve API usability. User-centered design involves the intended users in the design process of a tool.

The most prominent example of user-centered design in Computer Science Education is the development of the programming language Quorum. Quorum's design was evidence-oriented, meaning that design choices were made with user perceptions of programming in mind. Quorum was designed to support computer science for all, a movement to make computer science more inclusive and reach more students with computer science education.

Python is a popular beginner programming language, and game development libraries do exist for it. Pygame claims to be lightweight, simple, and easy to use, although there is no peer-reviewed evidence to support this claim. Arcade promotes its Python game development library as easy to learn and ideal for people learning to program. Although this intention supports the learning experience, Arcade's design lacks the evidence base that characterizes user-centered design.

By designing a tool around the audience's understanding of it, the audience's learning and use of the tool ideally feels more natural, compensating for the learning curves of both game development and API usage. Research that supports students' learning experience is therefore valuable.

Dataset:

Data analyzed comes from two surveys of introductory computer science students. In the fall semester of 2020, my co-PI and I designed and conducted our first survey (referred to later and in code as survey version 1/v1), asking CISC108: Introduction to Computer Science I students about their preconceptions of game design vocabulary and logic (see further details in Research Questions). After reviewing how students understood and answered the questions, we acknowledged shortcomings of our survey.

Based on the limitations of survey v1, we altered some question phrasing to create our second survey (survey version 2/v2). Changes were minimal and intended to mitigate bias; overall, the game vocabulary and logic topics from version 1 remained the same in this second version. We launched this survey with CISC108 and CISC106: Introduction to Computer Science for Engineers students in the spring semester of 2021.

The survey data consists of two types of questions: free-response and multiple choice. Free-response questions (referred to later and in code as 'Open' questions) prompted students to answer with a text response. Multiple choice questions (referred to later and in code as 'Closed' questions) offered students 3-8 predefined options.

This data analysis comprehensively examines responses from versions 1 and 2 of our survey, with specific focus on differences in responses between survey versions as well as differences based on students' self-reported prior experience with programming and game design.

Research Questions:

To design an evidence-oriented game development library for students, it is necessary to understand how students think about video games and game development.

The questions I answer with this dataset are:

RQ1. What vocabulary do novices use when thinking about video games?

Specifically, how do students think about the concepts commonly referred to as sprites, game state, events, steps (iteration), the screen or window, drawn shapes, and animations? These are the concepts probed by the survey questions analyzed below.

This question and these concepts were justified in a couple of ways. Anecdotally, as an introductory student who learned CS through game development, I struggled for weeks with the vocabulary and concept of sprites (interactive game graphics). More importantly, existing game library APIs such as Pygame's and Arcade's refer to many of these concepts with different terms. For multiple choice questions on these topics, the options were drawn from existing game library APIs. Without an existing consensus on how students perceive these concepts, this survey data can offer insight into which terms students most relate to.

RQ2. Is there significant difference between survey results based on survey version (1 or 2) or participants' prior programming and game development experience?

This question is important because it indicates how we can treat the data for further analysis. We initially decided to conduct a second iteration of the survey due to the small sample size of version 1. If there is no significant difference based on survey version, we can pool data from both versions for a larger sample size and further analysis. Questions that differ significantly by survey version offer insight into the implications of the survey changes.

Analyzing results based on prior programming and game design experience offers insight into how students with different prior knowledge may interact with game design libraries. Students were sorted into four groups based on self-reported prior experience: no programming or game development experience, programming experience only, game development experience only, and both.

Although this research emphasizes novice preconceptions (the less experience, the better), significant differences between these groups may indicate design challenges in catering to different users' needs, challenges that are important to be transparent and forthcoming about throughout the design process.

RQ3. Do results indicate game development vocabulary that best aligns with how novices think about video games? If so, what vocabulary is ideal?

Based on answers to RQ2, it will be interesting to see if there are clear results in support of API design for a game design library. The goals of this analysis are to (1) understand how students think about game design before learning game design in a classroom setting and (2) use these results to guide the evidence-based design of a student-friendly game library.

We hope our results support the development of a game library with the same intentions as libraries such as Pygame and Arcade: understanding, valuing, and supporting the learning experience of students as they navigate learning computer science, an already difficult journey. Our process is unique in seeking out evidence from the students we hope to help in order to include them in the design process. In this way, rather than only claiming to support learners once the library has been designed, our design process is comprehensively supportive of students.

Ethical Concerns:

Ethical considerations of this investigation include the delicate nature of student education, which should not be unnecessarily interrupted or disturbed. Educational research runs the risk of disrupting students' education, and this should be acknowledged and accounted for. Additionally, video games in the media tend to target male audiences; a concern is that game development in education may favor male students if not presented appropriately.

Regarding data science specifically, ethical concerns also exist in analyzing this particular data. We acknowledge bias throughout our design, and want to acknowledge that this data is only the start of evidence-oriented game libraries. Placing too much emphasis on only the data offered here overvalues a predominantly male sample from a predominantly white institution such as the University of Delaware.

Additionally, although we hope to take a more scientific, evidence-based approach to game library development, not every decision was scientifically justified. Specifically, the way we chose the concepts investigated in our survey questions has no scientific justification. We chose concepts that we anecdotally observed causing students problems, as well as concepts for which multiple terms exist across current game libraries, but the concepts we chose are not comprehensive. Focusing on these concepts risks disregarding other concepts that students struggle with. This ethical concern can be mitigated in future stages of the design process. For example, once a prototype of the game library exists and students test it, we can ask broader questions to evaluate the challenges they faced, during which concepts we have not yet collected data on may become apparent. For now, we acknowledge this concern, aware that our data is not exhaustive, and commit to monitoring it as we move forward with the data.

Helper functions and data

During the analysis I noticed I was repeating code, so I defined these functions to help decompose and organize my work.
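As an illustration, here is a minimal sketch of what such helpers might look like, assuming pandas and scipy; the function and column names are hypothetical stand-ins, not the actual code:

```python
import pandas as pd
from scipy.stats import chi2_contingency

def response_counts(df: pd.DataFrame, question: str) -> pd.Series:
    """Tally non-empty responses to one survey question."""
    return df[question].dropna().value_counts()

def chi_square_independence(df: pd.DataFrame, question: str, group_col: str):
    """Chi-square test of independence between a question's responses
    and a grouping variable (e.g., survey version or experience group)."""
    table = pd.crosstab(df[question], df[group_col])
    chi2, p, dof, _ = chi2_contingency(table)
    return chi2, p, dof
```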

Analysis process

My data comes from two surveys, which I refer to as survey versions one and two. Analysis is done on each survey individually; then testing determines when it is appropriate to combine survey results.

Data Cleaning/Transformation Survey v1

Load fall 2020 survey 1 data from CSV file. Drop data irrelevant to data analysis.

Drop rows without survey responses. Filter prior game development experience to yes or no.

Define list of headers for survey questions.
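A minimal sketch of these cleaning steps, assuming pandas; the file name and column names (e.g., PriorGameDev) are hypothetical stand-ins for the actual survey export:

```python
import pandas as pd

# Load the fall 2020 v1 data (hypothetical file name).
v1 = pd.read_csv("fall2020_survey_v1.csv")

# Drop columns irrelevant to the analysis (timestamps, metadata, etc.).
v1 = v1.drop(columns=["Timestamp", "Duration"], errors="ignore")

# Headers for the survey questions analyzed below.
survey_headers = [c for c in v1.columns if c.startswith(("Open", "Closed"))]

# Drop rows without any survey responses.
v1 = v1.dropna(how="all", subset=survey_headers)

# Keep only yes/no answers to the prior game development question.
v1 = v1[v1["PriorGameDev"].isin(["Yes", "No"])]
```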

Data Distribution Survey v1

Make table observing distribution of experience levels.
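Continuing the sketch above, a crosstab of the two hypothetical experience columns produces such a table:

```python
# Cross-tabulate the self-reported experience questions to see
# how participants fall into the four experience groups.
experience_table = pd.crosstab(v1["PriorProgramming"], v1["PriorGameDev"])
print(experience_table)
```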

Data Cleaning Spring 2021 Survey v2

Load spring 2021 survey 2 data from CSV file and format file.

Remove columns from results not pertinent to data analysis.

Remove non-consenting responses and determine number of participants.

Filter prior experience question to True/False.

Get gender demographics.
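A sketch of the v2 cleaning steps, again with hypothetical file and column names (Consent, PriorGameDev, Gender):

```python
import pandas as pd

v2 = pd.read_csv("spring2021_survey_v2.csv")

# Remove columns not pertinent to the analysis.
v2 = v2.drop(columns=["StartDate", "EndDate", "IPAddress"], errors="ignore")

# Keep only consenting participants, then count them.
v2 = v2[v2["Consent"] == "Yes"]
print("participants:", len(v2))

# Reduce the prior-experience answers to clean True/False values.
v2 = v2[v2["PriorGameDev"].isin([True, False])]

# Gender demographics.
print(v2["Gender"].value_counts())
```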

Data Distribution Survey v2 and Comparison

Edit survey_headers to account for changes to survey.

Find participants per levels of experience and compare to v1.

Compared to v1, the distribution of experience differs: noticeably more participants report no prior experience.

Compare Survey v1 and v2 Distributions

Compare responses from surveys 1 and 2 with Chi-Square Independence Test.

Perform Chi-Square Independence Test for every shared survey question.

If there is a significant difference based on survey version, investigate the relationship between responses per survey version and prior experience.

Null hypothesis: Survey results are independent of the variable they are sliced upon (either survey version or experience level).

For p-values less than our significance level of 0.05, we reject the null hypothesis that the results are independent. For example, ClosedMovement's p-value based on survey version is 0.000001, so we reject the null hypothesis and conclude that responses are not independent of survey version.
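A sketch of this testing loop, assuming the v1 and v2 frames from earlier and a hypothetical shared_questions list of question headers present in both versions:

```python
import pandas as pd
from scipy.stats import chi2_contingency

ALPHA = 0.05  # significance level

# Tag each response with its survey version, then combine.
combined = pd.concat([v1.assign(version=1), v2.assign(version=2)])

p_values = {}
for question in shared_questions:
    table = pd.crosstab(combined[question], combined["version"])
    _, p, _, _ = chi2_contingency(table)
    p_values[question] = p
    verdict = "depends on version" if p < ALPHA else "no evidence of dependence"
    print(f"{question}: p = {p:.6f} ({verdict})")
```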

Pooling V1 and V2 when appropriate

For questions whose p-values lead us to fail to reject the null hypothesis (no evidence that responses depend on survey version), we pooled v1 and v2 results together, yielding 194 responses. Questions with significant differences, along with unusable data ('OpenScreen', which did not measure the intended concept), kept their per-survey totals.
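The pooling decision might look like the following sketch, reusing the hypothetical p_values mapping from the loop above:

```python
# Pool v1 and v2 answers where we failed to reject independence,
# skipping the unusable 'OpenScreen' question.
pooled = {}
for question in shared_questions:
    if p_values[question] >= ALPHA and question != "OpenScreen":
        pooled[question] = pd.concat([v1[question], v2[question]]).dropna()
        # With our data, pooling yielded 194 responses.
```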

For each question:

ClosedWave

No significant difference based on experience.

animation and action were nearly equally frequent.

ClosedStep

No significant difference based on experience.

repeat was most frequent; iteration was also fairly popular.

ClosedEvent

No significant difference based on experience.

'when' evidently most common.

ClosedState

Significant difference based on experience.

As initially noticed in V1, moment is more popular among participants without prior game development experience and not as popular among students with game development and programming experience. state is still most popular overall.

OpenSprite

Significant difference based on experience. All results shown were entered at least twice. Counts account for plurals; for example, character represents responses of both character and characters.
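A naive sketch of this plural handling, assuming the pooled responses from earlier:

```python
from collections import Counter

def normalize(term: str) -> str:
    """Collapse simple plurals so 'characters' counts as 'character'.
    A naive rule, but adequate for the short responses seen here."""
    term = term.strip().lower()
    return term[:-1] if term.endswith("s") else term

counts = Counter(normalize(r) for r in pooled["OpenSprite"])
# Keep only terms entered at least twice.
counts = {term: n for term, n in counts.items() if n >= 2}
```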

ClosedSprite

Significant difference based on experience.

More participants without prior game development experience preferred characters. Objects most popular overall.

When V1 and V2 data cannot be pooled

Investigating responses with significant differences between V1 and V2.

OpenShape

rectangle more popular in V2, draw_rectangle more popular in V1.

ClosedScreen

Significant difference based on survey version. Visualizations are skewed by the option added in v2.

Not including window as an option in V1 affected results. V2 suggests window is the most popular option, with screen second.

ClosedMovement

animation was more popular. The question's wording changed from V1 to V2 to mitigate bias; the significant difference suggests that wording affects participant perception.

Open-Ended Coding Question

The final question of both the v1 and v2 surveys showed students an animation of a frog moving across the screen and asked them to write code to animate it. What follows is the analysis of these responses. Observe distributions and patterns in the separate V1 and V2 data.

All p-values are less than 0.05, meaning we reject the null hypothesis that the line or character lengths of either survey version are normally distributed. Because these values are not normally distributed, we perform Mann-Whitney U tests to compare the version one and version two responses.

Both p-values comparing line and character counts are below 0.05, so we reject the null hypothesis that response length does not differ by survey version. This suggests the responses cannot be compared together and should instead be examined separately.
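A sketch of these tests, assuming a Shapiro-Wilk normality test (the test is not named above) and hypothetical lists code_v1 and code_v2 holding the raw response strings:

```python
from scipy.stats import shapiro, mannwhitneyu

# Line and character lengths of each free-form code response.
lines_v1 = [r.count("\n") + 1 for r in code_v1]
lines_v2 = [r.count("\n") + 1 for r in code_v2]
chars_v1 = [len(r) for r in code_v1]
chars_v2 = [len(r) for r in code_v2]

# Normality check: small p-values reject normality for each measure.
for name, sample in [("v1 lines", lines_v1), ("v2 lines", lines_v2),
                     ("v1 chars", chars_v1), ("v2 chars", chars_v2)]:
    print(name, shapiro(sample).pvalue)

# Non-parametric comparison of response lengths across versions.
print("lines p =", mannwhitneyu(lines_v1, lines_v2).pvalue)
print("chars p =", mannwhitneyu(chars_v1, chars_v2).pvalue)
```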

Classify responses based on conventions

Notice an increase in the number of responses that did not compile from v1 to v2. Is this a significant difference? Perform a chi-square test for independence to determine whether this is dependent on survey version.

This is greater than our significance level of 0.05, so we fail to reject the null hypothesis that the conventions of responses are independent of survey version. Therefore, even though the number of responses that did not compile increased between survey versions, the increase is not significant.
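This test can reuse the chi-square helper sketched earlier, assuming a hypothetical Convention column holding each response's classification (compiled, did not compile, etc.):

```python
# Test whether response conventions depend on survey version.
chi2, p, dof = chi_square_independence(combined, "Convention", "version")
print(p)  # > 0.05 here, so we fail to reject independence
```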

Results

RQ1: What vocabulary do novices use when thinking about video games? and RQ2: Is there significant difference between survey results based on survey version (1 or 2) or participants' prior programming and game development experience?

Results to RQ2 support broader conclusions that answer RQ1.

Regarding RQ2, some survey questions showed significant differences based on survey version. For example, results for the question asking what students call an animation of a character walking across the screen differed significantly between survey versions. This difference may be explained by a survey design change from v1 to v2: we realized after launching v1 that the wording of the question could introduce bias, so when designing v2 we reworded the question to mitigate it. The significantly different results between survey versions suggest that the vocabulary students use to think about video games depends on how game design concepts are framed to them. This insight highlights not only the importance of choosing consistent phrasing for the game library API, but also the importance of intentional consistency when teaching game design concepts in the classroom.

Also on the topic of RQ2, analysis of some survey questions suggests there is no significant difference in responses based on students' prior experience or survey version - suggesting that students think about some game design concepts similarly regardless of their prior experience with programming and game development. Such concepts include in-place animations, such as a character waving, which the most popular response among students in our sample conceptualized as an animation. Additionally, most participants, regardless of prior experience, consider an iterative occurrence a repeat.

Analysis of other survey responses suggests that students may conceptualize some game design concepts differently depending on their prior programming and game design experience. For example, when referring to a game's state at one particular moment, students without prior game development experience significantly preferred the term moment compared to students with prior experience. When asked about vocabulary regarding sprites, students without prior programming or game design experience chose character more than students with other levels of experience.

RQ3. Do results indicate game development vocabulary that best aligns with how novices think about video games? If so, what vocabulary is ideal?

Much like the answer to RQ2: it's complicated. Some results already discussed were clearly favored, such as repeat for an iteration. However, for the question about a character waving, animation was the most popular response, but action was almost equally favored; no response received a majority. As a result, there is no single "ideal", or most popular, term favored by all students for an in-place animation.

Results to the question asking about sprites showed general favorability towards both character and object. Although these were favored, they call into question what "ideal" means in this research question and this context. Although this is ideal in the sense that most students preferred these terms, are they ideal in a teaching environment? object and character both have different meanings in computer science contexts that students may learn in classes after introductory computer science - what are the risks of choosing these terms for a game library API that may interfere with students' further learning? These findings offer preliminary insight into students' thinking regarding game design and also offer further questions to be explored in future design of a game library API.

Conclusion

Analysis of the survey one and survey two responses offers preliminary insight into how students think about game design before formally learning about game development in the classroom. In conclusion, the way students think about game design, when considering prior experience, is complicated and nuanced. Acknowledging our ethical concerns, this data is not conclusive, and depending too heavily on this analysis alone ignores issues such as a sample drawn from a predominantly white institution and the anecdotal selection of terms to investigate. From this data analysis I can confidently conclude that there is always room to explore and ask more questions regarding how students learn computer science.