By Kristina Holsapple
This data analysis explores how computer science students, specifically introductory students, think about video games and video game design. In designing application programming interfaces (APIs) for online tools, it is becoming more common for developers to consult their intended audience throughout the design process. When data about user preconceptions exists, developers can use it to guide the design principles of an API. Although tools currently exist to help programmers develop video games, the broad problem motivating this analysis is that prior data on student preconceptions of game design does not exist. Thus, current game development tools have not been designed with respect to how users understand and expect the tool to work. Analysis of this data has the potential to guide the future design of a student- and classroom-friendly library for learning computer science through game design.
It is well known in Computer Science Education literature that games are a motivating context for students to learn computer science. For example, positive factors of game development in educational settings include enthusiastic response to instruction and appeal to different student demographics, according to "Teaching Computer Science through Game Design" by Mark Overmars (2004), and improved problem-solving skills, according to Mete Akcaoglu's "Learning problem-solving through making games at the game design and learning summer program" (2014).
Students use game development tools to build their games. However, users of programming tools' APIs can experience a challenging learning curve, according to Myers and Stylos (2016), who discuss how human-centered design can improve API usability. User-centered design involves the intended users in the design process of a tool.
The most prominent example of user-centered design in Computer Science Education is the development of the programming language Quorum. Quorum's design was evidence-oriented, meaning that design choices were made with user perceptions of programming in mind. Quorum was designed to support Computer Science for All, a movement to make computer science more inclusive and reach more students with computer science education.
Python is a popular beginner programming language to learn, and game development libraries do exist for Python. Pygame claims to be lightweight, simple, and easy to use, although there is no peer-reviewed evidence to support this claim. Arcade promotes its Python game development library as being easy to learn and ideal for people learning to program. Although this intention is supportive of the learning experience, Arcade's design lacks the evidence base that characterizes user-centered design.
By designing a tool around its audience's understanding of it, the audience's learning and use of the tool should ideally feel more natural, compensating for the learning curve of both game development and API usage. Research that supports the learning experience of students is valuable because learning is valuable.
The data analyzed comes from two surveys of introductory computer science students. In the fall semester of 2020, my co-PI and I designed and conducted our first survey (referred to later and in code as survey version 1/v1), asking CISC108: Introduction to Computer Science I students about their preconceptions of game design vocabulary and logic (see further details later in Research Questions). After reviewing how students understood and answered questions, we acknowledged shortcomings of our survey.
Based on the limitations of survey v1, we altered some question phrasing to establish our second survey (survey version 2/v2). Changes were minimal and meant to mitigate bias; overall, the game vocabulary and logic topics from version 1 remained the same in this second version. We launched this survey with CISC108 and CISC106: Introduction to Computer Science for Engineers students in the spring semester of 2021.
The survey data consists of two types of questions: free-response and multiple choice. Free-response questions (referred to later and in code as 'Open' questions) prompted students to answer questions with a text response. Multiple choice questions (referred to later and in code as 'Closed' questions) asked students questions with a list of 3-8 multiple choice options.
This data analysis comprehensively examines responses from versions 1 and 2 of our survey, with specific focus on differences in responses between survey versions as well as differences based on students' self-reported prior experience with programming and game design.
To design an evidence-oriented game development library for students, it is necessary to understand how students think about video games and game development.
The questions I answer with this dataset are:
1. (RQ1) How do introductory students think about common game design concepts before formal instruction in game development?
2. (RQ2) Do responses differ between survey versions, or based on students' self-reported prior programming and game design experience?
3. (RQ3) Are there clear, "ideal" terms favored by students that can guide the design of a game library API?
Specifically, for RQ1, how do students think about the concepts commonly referred to as sprites, the game screen or window, movement across the screen, in-place animation (such as a character waving), iterative steps, event handling, and game state, as well as the naming of functions that draw shapes?
This question and these concepts were justified in a couple of ways. Anecdotally, as an introductory student who learned CS through game development, I struggled for weeks with the vocabulary and concept of sprites (interactive game graphics). More importantly, existing game library APIs such as Pygame's and Arcade's refer to many of these concepts with different terms. For multiple choice questions inquiring about these topics, the answer options were chosen from existing game library APIs. Without an existing consensus on how students perceive these concepts, this survey data can offer insight into which terms students most relate to.
The second question (RQ2) is of importance because it indicates how we can treat the data for further analysis. We initially decided to conduct a second iteration of the survey due to the small sample size of version 1. If there is no significant difference based on survey version, we can pool data from both versions together for a larger sample size and more analysis. Questions that are significantly different based on survey version offer insight into the implications of the survey changes.
Analyzing results based on prior programming and game design experience offers insight into how students with different prior knowledge may interact with game design libraries. Students were sorted into four groups: (1) no prior programming and no prior game development experience, (2) no prior programming but prior game development experience, (3) prior programming but no prior game development experience, and (4) both prior programming and prior game development experience.
Although this research has an emphasis on novice preconceptions (the less experience the better), significant differences based on these groups may indicate design challenges in catering to different users' needs that are important to be transparent and forthcoming about throughout the design process.
Based on answers to RQ2, it will be interesting to see if there are clear results in support of API design for a game design library. The goals of this analysis are to (1) understand how students think about game design before learning game design in a classroom setting and (2) use these results to guide the evidence-based design of a student-friendly game library.
We hope our results support the development of a game library with the same intentions as libraries such as Pygame and Arcade. These intentions include understanding, valuing, and supporting the learning experience of students as they navigate learning computer science, an already difficult journey. Our process is unique in seeking out evidence from the students we hope to help in order to include them in the design process. In this way, rather than only claiming to support learners once the library has been designed, our design process is comprehensively supportive of students.
Ethical considerations of this investigation include the delicate nature of student education. Student education is important and should not be unnecessarily interrupted or disturbed. Research regarding education runs the risk of disrupting students' education, and this should be acknowledged and accounted for. Additionally, video games in the media tend to target male audiences. A concern is that game development in education may favor male students if not presented appropriately.
Regarding data science specifically, ethical concerns also exist in analyzing this particular dataset. We acknowledge bias throughout our design and note that this data is only the start of evidence-oriented game libraries. Placing too much emphasis on only the data offered here overvalues a predominantly male sample from a predominantly white institution such as the University of Delaware.
Additionally, although we hope to take a more scientific and evidence-based approach to game library development, not everything was justified by scientific decision-making. Specifically, the way we chose the concepts investigated in our survey questions does not have scientific justification. We chose concepts that we anecdotally observed causing students problems, as well as concepts for which multiple terms exist in current game libraries, but the concepts we chose are not comprehensive. Choosing to focus on these concepts runs the risk of completely disregarding other concepts that students struggle with. This ethical concern can be mitigated in future stages of the design process. For example, once a prototype of the game library exists and we have students test it, we can ask them broader questions to evaluate the challenges they faced using the library, during which concepts we have not yet collected data for may become apparent. For now, we can acknowledge this ethical concern, aware that our data is not exhaustive, and commit to monitoring it as we move forward with the data.
import pandas as pd
import scipy.stats as st
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import ast
from pedal import *
While doing the analysis, I noticed I was repeating code. I defined the following helper functions to decompose and organize my analysis.
clear_report()
def get_called_function(node):
# use Pedal to get called function
if node.ast_name == "Attribute":
left = get_called_function(node.value)
right = node.attr
return f"{left}.{right}"
elif node.ast_name == "Name":
return node.id
def make_cont_tables(df, headers):
    # print response totals and a prior-experience contingency table for each question header
for header in headers:
print(header + ' totals and contingency table based on prior experience')
print(df[header].str.strip().str.lower().value_counts())
display(make_exp_table(df, header))
def make_table_by_q(df, header):
# make frequency table for one question sliced by survey version
print(header + ' contingency table')
table = (pd.crosstab(df.Version, df[header], margins = True))
# display(table)
return table
def make_exp_table(df, header):
# make frequency table for one question sliced by prior experience
print(header + ' by experience')
table = pd.crosstab([df.Prior, df.PriorGameDev], df[header].str.lower().str.strip(), margins=True)
# display(table)
return table
def perform_all_chi(df, headers):
# perform chi-square independence test on each question sliced by survey version.
p_values = []
for header in headers:
cont_tbl = pd.crosstab(df['Version'], df[header], margins=False)
chi = st.chi2_contingency(cont_tbl.values.tolist())
p_values.append(chi[1])
return p_values
def perform_exp_chi(df, headers):
    # perform chi-square independence test on each question sliced by experience.
p_values = []
for header in headers:
cont_tbl = pd.crosstab([df.Prior, df.PriorGameDev], df[header], margins = False)
chi = st.chi2_contingency(cont_tbl.values.tolist())
p_values.append(chi[1])
return p_values
def analyze_free_response(df, header, exp):
# analyze free response questions
pd.set_option("display.max_rows", None, "display.max_columns", None)
if exp:
table = make_exp_table(df, header)
table = table.drop('All', axis=1).dropna().sort_values(('All',''), axis=1, ascending=False)
else:
table = make_table_by_q(df, header)
table = table.drop('All', axis=1).sort_values(by='All', axis=1, ascending=False)
return table
def analyze_open_end(df, header):
    # tally code constructs (calls, loops, conditionals, definitions, etc.) in open-ended code responses
total = {
'methods': 0,
'functions': 0,
'both': 0,
'test cases': 0,
'graphics': 0,
'pens': 0,
'for': 0,
'while': 0,
'if': 0,
'func def': 0
}
for code in df[header]:
methods = False
functions = False
clear_report()
code = str(code)
calls = find_asts("Call", code)
num_for = len(find_asts("For", code))
num_while = len(find_asts("While", code))
called_functions = [get_called_function(c.func) for c in calls]
for c in calls:
if c.func.ast_name == "Name":
methods = True
if c.func.ast_name == "Attribute":
functions = True
if functions & methods:
total['both'] += 1
methods = 0
functions = 0
if methods:
total['methods'] += 1
if functions:
total['functions'] += 1
if "assert_equal" in code:
total['test cases'] += 1
if "graphics" in code:
total['graphics'] += 1
if " pen" in code:
total['pens'] += 1
if 'if' in code:
total['if'] += 1
if 'def' in code:
total['func def'] += 1
if num_for != 0:
total['for'] += 1
if num_while != 0:
total['while'] += 1
lines_char = avg_lines_char(df, header)
show_line_char_graph_calc(lines_char)
analyze_open_code(df)
return total
def chi_per_version(headers):
    # perform chi-square tests for independence based on prior experience, separately for v1 and v2
for header in headers:
cont_tbl = pd.crosstab([v1.Prior, v1.PriorGameDev], v1[header], margins = False)
chi = st.chi2_contingency(cont_tbl.values.tolist())
print(f"p-value of v1 based on prior experience: {chi[1]}")
cont_tbl = pd.crosstab([v2.Prior, v2.PriorGameDev], v2[header], margins = False)
chi = st.chi2_contingency(cont_tbl.values.tolist())
print(f"p-value of v2 based on prior experience: {chi[1]}")
def clean_text(text):
    # applied to open-ended responses to uniformize them: lowercase,
    # drop 'def' and '()', remove a trailing plural 's', and strip whitespace
    text = str(text)
    text = text.lower()
    if 'def' in text:
        text = text.replace('def', '')
    if '()' in text:
        text = text.replace('()', '')
    if text and text[-1] == 's':
        text = text[:-1]
    text = text.strip()
    return text
def analyze_open_code(df):
# process open-ended responses
# function not written by KH, written by co-PI
student_programs = df['GameDevOpenCode']
broken, working = 0, 0
for program in student_programs:
try:
program = str(program)
ast.parse(program)
working += 1
except SyntaxError:
broken += 1
print(f"\nBroken: {broken}; Working: {working}")
def show_line_char_graph_calc(lines_char):
# shows graphs for distribution of open-ended responses based on line and character length
    # performs a Shapiro-Wilk test for normality on responses
plt.hist(lines_char[0])
plt.title("Distribution of Number of Lines per Response")
plt.show()
plt.hist(lines_char[1])
plt.title("Distribution of Number of Characters per Response")
plt.show()
lines_stat, line_p = st.shapiro(lines_char[0])
char_stat, char_p = st.shapiro(lines_char[1])
print("Num Lines Normality on an interval of .05: ", lines_stat, line_p, line_p < 0.05)
print("Num Characters Normality on an interval of .05: ", char_stat, char_p, char_p < 0.05)
def avg_lines_char(df, header):
    # collects per-response line and character counts and returns [line_counts, char_counts]
    # (averages are computed along the way but not returned)
lines_count = 0
char_count = 0
student_codes = df[header].dropna()
num_codes = len(student_codes)
lines_all = []
char_all = []
for code in student_codes:
code = str(code)
len_code = len(code)
lines = (code.count('\n') + 1)
lines_count += lines
char_count += len_code
lines_all.append(lines)
char_all.append(len_code)
lines_avg = lines_count / num_codes
char_avg = char_count / num_codes
return [lines_all, char_all]
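As a quick illustration of how a couple of these helpers behave, here is a small example on made-up values (the toy DataFrame and its contents are hypothetical, not survey data).
# Illustrate clean_text and make_table_by_q on hypothetical values.
print(clean_text("def draw_rectangles()"))  # -> "draw_rectangle"
toy = pd.DataFrame({
    'Version': [1, 1, 2, 2, 2],
    'ClosedWave': ['Animation', 'Action', 'Animation', 'Animation', 'Movement'],
})
display(make_table_by_q(toy, 'ClosedWave'))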
My data comes from two surveys, which I will refer to as survey version one and two. Analysis will be done on each survey individually. Then testing will be conducted to determine when it is appropriate to combine survey results.
Load fall 2020 survey 1 data from CSV file. Drop data irrelevant to data analysis.
v1 = pd.read_csv('f20_game_dev_library_data.csv')
v1 = v1.drop(columns=['FinalGrade', 'Midterm', 'CanvasGrade', 'Section', 'Lab', 'TAEngagement', 'TACommunication', 'MeanDiffRank', 'Points', 'Hours Spent in BlockPy', 'Median # of Days Submitted Before Due Date', 'Number of Days Spent Working', 'Assignments With 0s', 'Native Speaker?', 'Location', 'Race/Ethnicity', 'coarse_race', 'PointsGrade', 'MockMidterm14', 'Level', 'OpenPython', 'OpenCode'])
Drop rows without survey responses. Filter prior game development experience to yes or no.
v1 = v1.dropna(subset=['PriorGameDev'])
v1['PriorGameDev'] = v1['PriorGameDev'].mask(v1['PriorGameDev'] != 'No.', 'Yes.')
Define list of headers for survey questions.
survey_headers = ['OpenShape', 'OpenScreen', 'ClosedScreen', 'OpenMovement', 'ClosedMovement', 'ClosedWave', 'ClosedStep', 'ClosedEvent', 'ClosedState', 'OpenSprite',
'ClosedSprite']
Make table observing distribution of experience levels.
v1_dist = pd.crosstab(v1['Prior'], v1['PriorGameDev'], margins=False)
display(v1_dist)
PriorGameDev | No. | Yes. |
---|---|---|
Prior | ||
False | 19 | 1 |
True | 21 | 35 |
Load spring 2021 survey 2 data from CSV file and format file.
v2 = pd.read_csv('s21_game_dev_library_data.csv')
column_names = v2.iloc[0]
v2.columns = column_names
v2 = v2.drop([0, 1])
v2 = v2.reset_index(drop=True)
Remove columns from results not pertinent to data analysis.
v2 = v2.drop(columns=['Status', 'IPAddress', 'Duration (in seconds)', 'Progress', 'Finished', 'RecordedDate', 'ResponseId', 'RecipientLastName',
'RecipientFirstName', 'RecipientEmail', 'ExternalReference', 'LocationLatitude', 'LocationLongitude', 'DistributionChannel', 'UserLanguage'])
Remove non-consenting responses and determine number of participants.
v2 = v2[v2.Q16 != 'I do not consent to participate in the research study.']
print('Number of consenting participants: ' + str(len(v2)))
Number of consenting participants: 118
Filter prior experience question to True/False
# any response other than 'No.' counts as prior game development experience
v2['PriorGameDev'] = v2['PriorGameDev'].mask(v2['PriorGameDev'] != 'No.', 'Yes.')
# any response other than 'None' counts as prior programming experience
v2['Prior'] = v2['Prior'].mask(v2['Prior'] != 'None', True)
v2['Prior'] = v2['Prior'].mask(v2['Prior'] != True, False)
Get gender demographics
v2['Gender'].value_counts()
Man                         67
Woman                       39
Man,Prefer not to answer     1
Name: Gender, dtype: int64
Edit survey_headers to account for changes to the survey.
survey_headers.remove('OpenMovement')
Find participants per levels of experience and compare to v1.
v2_dist = pd.crosstab(v2['Prior'], v2['PriorGameDev'], margins=False)
display(v2_dist)
pi_wedges = v1_dist.groupby(['Prior', 'PriorGameDev']).sum().values.flatten()
wedge_labels = ['No prior programming, no prior game dev.', 'No prior programming, prior game dev.', 'Prior programming, no prior game dev.', 'Prior programming, prior game dev.']
plt.figure(figsize=(6, 6))
plt.pie(x=pi_wedges, colors=['lightcoral', 'lavender', 'lightblue', 'moccasin'], autopct='%.2f%%', textprops={'fontsize': 12})
plt.legend(wedge_labels, bbox_to_anchor=(1.05, 1), prop={'size': 12})
plt.title('Percentages of Prior Experience Groups for V1', fontsize=15)
plt.show()
pi_wedges_2 = v2_dist.groupby(['Prior', 'PriorGameDev']).sum().values.flatten()
plt.figure(figsize=(6, 6))
plt.pie(x=pi_wedges_2, colors=['lightcoral', 'lavender', 'lightblue', 'moccasin'], autopct='%.2f%%', textprops={'fontsize': 12})
plt.legend(wedge_labels, bbox_to_anchor=(1.05, 1), prop={'size': 12})
plt.title('Percentages of Prior Experience Groups for V2', fontsize=15)
plt.show()
PriorGameDev | No. | Yes. |
---|---|---|
Prior | ||
False | 51 | 5 |
True | 31 | 31 |
Compared to v1, the distribution of experience differs: noticeably more participants report no prior experience.
Compare responses from surveys 1 and 2 with Chi-Square Independence Test.
Perform Chi-Square Independence Test for every shared survey question.
If significant difference based on survey version, investigate relationship between responses per survey version and prior experience.
Null hypothesis: Survey results are independent of the variable they are sliced upon (either survey version or experience level).
v1['Version'] = 1
v2['Version'] = 2
total = pd.concat([v1, v2])
total['ClosedSprite'] = total['ClosedSprite'].str.strip().str.lower()
total['OpenShape'] = total['OpenShape'].str.lower()
all_chi = perform_all_chi(total, survey_headers)
exp_chi = perform_exp_chi(total, survey_headers)
total_p_values = pd.DataFrame(list(zip(all_chi, exp_chi)), columns=['sliced by survey version', 'sliced by experience'], index=survey_headers).transpose()
print("P-Values from Chi Square Independence Test performed individually by survey version and experience level: ")
display(total_p_values)
print("\n\nP-Values from Chi Square Indpendence Test performed by experience level on responses with significant difference based on survey version: ")
version_headers = ['OpenShape', 'ClosedScreen', 'ClosedMovement']
exp_chi_version1 = perform_exp_chi(v1, version_headers)
exp_chi_version2 = perform_exp_chi(v2, version_headers)
pd.DataFrame(list(zip(exp_chi_version1, exp_chi_version2)), columns=['v1', 'v2'], index=version_headers).transpose()
P-Values from Chi Square Independence Test performed individually by survey version and experience level:
OpenShape | OpenScreen | ClosedScreen | ClosedMovement | ClosedWave | ClosedStep | ClosedEvent | ClosedState | OpenSprite | ClosedSprite | |
---|---|---|---|---|---|---|---|---|---|---|
sliced by survey version | 0.008233 | 0.422553 | 1.857338e-12 | 0.000001 | 0.499849 | 0.222499 | 0.815481 | 0.069710 | 0.158248 | 0.647583 |
sliced by experience | 0.333725 | 0.011658 | 7.237847e-03 | 0.249262 | 0.152158 | 0.237950 | 0.149999 | 0.040338 | 0.019545 | 0.001064 |
P-Values from Chi Square Independence Test performed by experience level on responses with significant difference based on survey version:
OpenShape | ClosedScreen | ClosedMovement | |
---|---|---|---|
v1 | 0.185385 | 0.297430 | 0.786123 |
v2 | 0.818702 | 0.168839 | 0.494844 |
For p-values less than our significance level of 0.05, we reject the null hypothesis that the results are independent. For example, ClosedMovement's p-value based on survey version is 0.000001, so we reject the null hypothesis and conclude that responses to that question are not independent of survey version.
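To make this interpretation concrete, the ClosedMovement test can be recomputed directly from its contingency table; the counts below are the same ones shown in the ClosedMovement table later in this analysis, so this is only a re-derivation of the value reported above.
# Recompute the ClosedMovement chi-square test from its contingency table
# (counts taken from the ClosedMovement table shown later in this analysis).
closed_movement_counts = [[3, 7, 33, 33],   # V1: Action, Animation, Glide, Movement
                          [5, 50, 30, 23]]  # V2: Action, Animation, Glide, Movement
stat, p, dof, expected = st.chi2_contingency(closed_movement_counts)
print(f"chi-square = {stat:.2f}, dof = {dof}, p = {p:.6f}")
# p is far below 0.05, so we reject the null hypothesis of independence:
# ClosedMovement responses depend on survey version.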
For questions where we fail to reject the null hypothesis that responses are independent of survey version, we pooled the v1 and v2 results together; when pooled, there were 194 responses. Questions with a significant difference based on survey version, along with 'OpenScreen' (whose wording did not measure the intended concept), were kept separate rather than pooled.
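As a sketch of that pooling rule, the questions kept for pooled analysis could be selected programmatically from the p-values computed above; ALPHA and pooled_headers are illustrative names of my own, and the cell below applies the equivalent column drop by hand.
# Sketch of the pooling rule: keep questions whose responses do not differ
# significantly by survey version; 'OpenScreen' is excluded by hand because
# its wording did not measure the intended concept.
ALPHA = 0.05
pooled_headers = [h for h, p in zip(survey_headers, all_chi)
                  if p >= ALPHA and h != 'OpenScreen']
print("Questions pooled across v1 and v2:", pooled_headers)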
Each remaining question is analyzed individually below.
upd_total = total.drop(['OpenShape', 'OpenScreen', 'ClosedScreen', 'ClosedMovement', 'OpenMovement'], axis=1)
No significant difference based on experience.
wave_total = upd_total['ClosedWave'].value_counts()
print(wave_total)
plt.figure(figsize=(6,6))
plt.pie(x=wave_total, colors=['lightcoral', 'lightblue', 'moccasin'], autopct='%.2f%%', textprops={'fontsize': 14})
plt.title('ClosedWave results from V1 and V2', fontsize=16)
plt.legend(['Animation', 'Action', 'Movement'], bbox_to_anchor=(1.05, 1), prop={'size':14})
plt.show()
Animation    82
Action       73
Movement     29
Name: ClosedWave, dtype: int64
animation and action were nearly equally frequent.
No significant difference based on experience.
step_total = upd_total['ClosedStep'].value_counts()
print(step_total)
plt.figure(figsize=(6,6))
plt.pie(x=step_total, colors=['lightcoral', 'lavender', 'lightblue', 'moccasin'], autopct='%.2f%%', textprops={'fontsize': 14})
plt.title('ClosedStep results from V1 and V2', fontsize = 16)
plt.legend(['Repeat', 'Iteration', 'Step', 'Update'], bbox_to_anchor=(1.05, 1), prop={'size':14})
plt.show()
Repeat       99
Iteration    62
Step         19
Update        4
Name: ClosedStep, dtype: int64
repeat was the most frequent; iteration was also fairly popular.
No significant difference based on experience.
event_total = upd_total['ClosedEvent'].value_counts()
print(event_total)
plt.figure(figsize=(6,6))
plt.pie(x=event_total, colors=['lightcoral', 'lavender', 'lightblue', 'moccasin', 'darkseagreen'], autopct='%.2f%%', textprops={'fontsize': 14})
plt.title('ClosedEvent results from V1 and V2', fontsize = 16)
plt.legend(['When', 'Attach', 'Connect', 'Register', 'On'], bbox_to_anchor=(1.05, 1), prop={'size':14})
plt.show()
When        75
Attach      36
Connect     31
Register    24
On          18
Name: ClosedEvent, dtype: int64
'when' evidently most common.
Significant difference based on experience.
state_total = make_exp_table(upd_total, 'ClosedState')
state_total = state_total.drop(labels="All", axis=0)
display(state_total)
moment = state_total['moment']
labels = ['No Prgm.,\nNo Game Dev.', 'No Prgm.,\nGame Dev.', 'Prgm.,\nNo Game Dev.', 'Prgm.,\nGame Dev.']
x = np.arange(len(labels))
width = .20
fig, ax = plt.subplots(figsize=(8,8))
moment_bar = ax.bar(x - width*(3/2), moment, width, label="moment", color='lightcoral')
state = state_total['state']
state_bar = ax.bar(x - width*.5, state, width, label="state", color='lightblue')
status = state_total['status']
status_bar = ax.bar(x+width*.5, status, width, label='status', color='moccasin')
world = state_total['world']
world_bar = ax.bar(x + width*1.5, world, width, label='world', color='darkseagreen')
ax.set_title('ClosedState Responses by Experience', fontsize=20)
ax.set_ylabel('Responses', fontsize=20)
ax.set_xlabel('Prior Experience Level', fontsize = 20)
ax.set_xticks(x)
ax.set_xticklabels(labels, fontsize=12)
ax.legend(prop={'size':14})
fig.tight_layout()
plt.show()
ClosedState by experience
ClosedState | moment | state | status | world | All | |
---|---|---|---|---|---|---|
Prior | PriorGameDev | |||||
False | No. | 28 | 22 | 18 | 2 | 70 |
Yes. | 3 | 1 | 2 | 0 | 6 | |
True | No. | 12 | 21 | 19 | 0 | 52 |
Yes. | 7 | 30 | 18 | 1 | 56 |
As initially noticed in V1, moment is more popular among participants without prior game development experience and less popular among students with both game development and programming experience. state is still the most popular overall.
Significant difference based on experience.
All results shown were entered at least two times. This accounts for plurals; for example, character represents responses of character or characters.
upd_total['OpenSprite'] = upd_total['OpenSprite'].apply(clean_text)
open_results = (analyze_free_response(upd_total, 'OpenSprite', True))
open_results = open_results.drop(open_results.columns[16:], axis=1)
open_results
OpenSprite by experience
OpenSprite | object | character | sprite | nan | figure | element | item | candy | animation | entitie | asset | characters | game element | bubble | component | objective | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Prior | PriorGameDev | ||||||||||||||||
False | No. | 11 | 14 | 2 | 0 | 4 | 4 | 1 | 3 | 0 | 2 | 2 | 2 | 1 | 2 | 2 | 1 |
Yes. | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |
True | No. | 12 | 9 | 4 | 1 | 4 | 0 | 2 | 1 | 2 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
Yes. | 12 | 10 | 12 | 11 | 0 | 3 | 2 | 2 | 2 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | |
All | 36 | 34 | 18 | 12 | 8 | 7 | 6 | 6 | 4 | 4 | 4 | 2 | 2 | 2 | 2 | 2 |
Significant difference based on experience.
sprite = make_exp_table(upd_total, 'ClosedSprite').sort_values(('All',''), axis=1, ascending=False)
sprite = sprite.drop(labels="All", axis=0)
display(sprite)
objects = sprite['objects']
labels = ['No Prog.,\nNo Game', 'No Prog.,\nGame', 'Prog.,\nNo Game', 'Prog.,\nGame']
x = np.arange(len(labels))
width = .20
fig, ax = plt.subplots(figsize=(8,8))
objects_bar = ax.bar(x - width, objects, width, label="objects", color='lightcoral')
characters = sprite['characters']
char_bar = ax.bar(x, characters, width, label="characters", color='lightblue')
sprites = sprite['sprites']
sprites_bar = ax.bar(x + width, sprites, width, label='sprites', color='moccasin')
ax.set_title('ClosedSprite Responses by Experience',fontsize=20)
ax.set_ylabel('Responses', fontsize=20)
ax.set_xlabel('Prior Experience Level', fontsize=20)
ax.set_xticks(x)
ax.set_xticklabels(labels, fontsize=12)
ax.legend(prop={'size':14})
fig.tight_layout()
ClosedSprite by experience
ClosedSprite | All | objects | characters | sprites | images | drawables | pictures | pixels | |
---|---|---|---|---|---|---|---|---|---|
Prior | PriorGameDev | ||||||||
False | No. | 70 | 28 | 25 | 14 | 1 | 0 | 1 | 1 |
Yes. | 6 | 3 | 2 | 0 | 0 | 1 | 0 | 0 | |
True | No. | 52 | 20 | 17 | 14 | 1 | 0 | 0 | 0 |
Yes. | 55 | 26 | 10 | 19 | 0 | 0 | 0 | 0 |
More participants without prior game development experience preferred characters; objects was the most popular overall.
Investigating responses with significant differences between V1 and V2.
total['OpenShape'] = total['OpenShape'].apply(clean_text)
open_shape_results = analyze_free_response(total, 'OpenShape', False)
open_shape_results = open_shape_results.drop(open_shape_results.columns[15:], axis=1)
open_shape_results
OpenShape contingency table
OpenShape | rectangle | draw_rectangle | nan | make_rectangle | create_rectangle | generate_rectangle | shape | draw_square | rect | generate_rect | square | make_rect | makerectangle | draw_rect | i don't know |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Version | |||||||||||||||
1 | 11 | 16 | 0 | 11 | 7 | 2 | 2 | 4 | 2 | 1 | 2 | 1 | 1 | 1 | 0 |
2 | 51 | 6 | 14 | 1 | 5 | 4 | 3 | 0 | 2 | 1 | 0 | 1 | 1 | 1 | 2 |
All | 62 | 22 | 14 | 12 | 12 | 6 | 5 | 4 | 4 | 2 | 2 | 2 | 2 | 2 | 2 |
rectangle was more popular in V2; draw_rectangle was more popular in V1.
Significant difference based on survey version. Visualizations differ because V2 included an answer option (window) that V1 lacked.
closed_tot = (make_table_by_q(total, 'ClosedScreen').sort_values(by='All', axis=1, ascending=False))
display(closed_tot)
closed_tot = closed_tot.drop(labels="All", axis=1)
display(closed_tot)
labels = ['window', 'screen', 'view', 'scene', 'world', 'level']
labels_v1 = labels[1:]
v1_data = closed_tot.iloc[0]
v1_data = v1_data[1:6]
v2_data = closed_tot.iloc[1]
v2_data = v2_data[:6]
plt.figure(figsize=(6,6))
plt.title('ClosedScreen Results V1', fontsize = 16)
v1_pi = plt.pie(v1_data, colors=['lightcoral', 'lavender', 'lightblue', 'moccasin', 'darkseagreen'], autopct='%.2f%%', textprops={'fontsize': 14})
plt.legend(labels_v1, bbox_to_anchor=(1.05, 1), prop={'size':14})
plt.show()
plt.figure(figsize=(6,6))
plt.title('ClosedScreen Results V2', fontsize = 16)
v2_pi = plt.pie(v2_data, colors=['sandybrown', 'lightcoral', 'lavender', 'lightblue', 'moccasin', 'darkseagreen'], autopct='%.2f%%', textprops={'fontsize': 12})
plt.legend(labels, bbox_to_anchor=(1.05, 1), prop={'size':14})
plt.show()
ClosedScreen contingency table
ClosedScreen | All | Window | Screen | View | Scene | World | Level | Camera |
---|---|---|---|---|---|---|---|---|
Version | ||||||||
1 | 76 | 0 | 30 | 23 | 15 | 4 | 2 | 2 |
2 | 107 | 53 | 22 | 6 | 12 | 7 | 7 | 0 |
All | 183 | 53 | 52 | 29 | 27 | 11 | 9 | 2 |
ClosedScreen | Window | Screen | View | Scene | World | Level | Camera |
---|---|---|---|---|---|---|---|
Version | |||||||
1 | 0 | 30 | 23 | 15 | 4 | 2 | 2 |
2 | 53 | 22 | 6 | 12 | 7 | 7 | 0 |
All | 53 | 52 | 29 | 27 | 11 | 9 | 2 |
window was not an answer option in V1, which affected the results. V2 suggests window is the most popular option, with screen second.
movement_tot = make_table_by_q(total, 'ClosedMovement')
movement_tot = movement_tot.drop(labels="All", axis=1)
display(movement_tot)
labels = ['action', 'animation', 'glide', 'movement']
v1_data = movement_tot.iloc[0]
v2_data = movement_tot.iloc[1]
plt.figure(figsize=(6,6))
plt.title('ClosedMovement Results V1', fontsize = 16)
v1_pi = plt.pie(v1_data, colors=['lightcoral', 'lavender', 'lightblue', 'moccasin'], autopct='%.2f%%', textprops={'fontsize': 14})
plt.legend(labels, bbox_to_anchor=(1.05, 1), prop={'size':14})
plt.show()
plt.figure(figsize=(6,6))
plt.title('ClosedMovement Results from V2', fontsize = 16)
v2_pi = plt.pie(v2_data, colors=['lightcoral', 'lavender', 'lightblue', 'moccasin'], autopct='%.2f%%', textprops={'fontsize': 14})
plt.legend(labels, bbox_to_anchor=(1.05, 1), prop={'size':14})
plt.show()
ClosedMovement contingency table
ClosedMovement | Action | Animation | Glide | Movement |
---|---|---|---|---|
Version | ||||
1 | 3 | 7 | 33 | 33 |
2 | 5 | 50 | 30 | 23 |
All | 8 | 57 | 63 | 56 |
animation was more popular in V2. The difference between V1 and V2 was a wording change meant to mitigate bias. The significant difference suggests that question wording affects participant perception.
The final question of both the v1 and v2 surveys showed students an animation of a frog moving across the screen and asked them to write code to animate it. The analysis of these responses follows, observing distributions and patterns in the V1 and V2 data separately.
print('V1: ')
print(analyze_open_end(v1, 'GameDevOpenCode'))
print('V2: ')
print(analyze_open_end(v2, 'GameDevOpenCode'))
V1:
Num Lines Normality on an interval of .05:  0.8001936078071594 2.4016699171625078e-06 True
Num Characters Normality on an interval of .05:  0.7900149822235107 1.4448777392317425e-06 True
Broken: 6; Working: 70
{'methods': 18, 'functions': 15, 'both': 2, 'test cases': 0, 'graphics': 0, 'pens': 0, 'for': 3, 'while': 2, 'if': 2, 'func def': 7}
V2:
Num Lines Normality on an interval of .05:  0.4858640432357788 7.884292787326171e-16 True
Num Characters Normality on an interval of .05:  0.7452929019927979 6.568793425865138e-11 True
Broken: 34; Working: 84
{'methods': 31, 'functions': 11, 'both': 1, 'test cases': 0, 'graphics': 0, 'pens': 0, 'for': 0, 'while': 1, 'if': 2, 'func def': 5}
All p-values are less than 0.05, meaning we reject the null hypothesis that the line or character lengths in either survey version are normally distributed. Because these values are not normally distributed, we can use Mann-Whitney U tests to compare the version one and version two responses.
v1_char_line = avg_lines_char(v1, 'GameDevOpenCode')
v2_char_line = avg_lines_char(v2, 'GameDevOpenCode')
line_mwu, line_mwu_p = st.mannwhitneyu(x=v1_char_line[0], y=v2_char_line[0])
char_mwu, char_mwu_p = st.mannwhitneyu(x=v1_char_line[1], y=v2_char_line[1])
print(f"P-value of lines per response: {line_mwu_p},\nP-value of characters per response: {char_mwu_p}")
P-value of lines per response: 1.1529999803013193e-16,
P-value of characters per response: 1.8128938277980706e-06
Both p-values comparing line and character counts are well below 0.05, so we reject the null hypothesis that response length does not differ between survey versions. This suggests the responses should not be pooled and should instead be analyzed separately by version.
# code not written by me, written by co-PI
def attrib(code):
clear_report()
code = str(code)
calls = find_asts("Call", code)
num_for = len(find_asts("For", code))
num_while = len(find_asts("While", code))
num_assign = len(find_asts("Assign", code))
num_def = len(find_asts("FunctionDef", code))
called_functions = [get_called_function(c.func) for c in calls]
call = False
for c in calls:
if (c.func.ast_name == "Name") or (c.func.ast_name == "Attribute"):
call = True
else:
call = False
try:
code = str(code)
ast.parse(code)
working = True
except SyntaxError:
working = False
if code == 'nan':
return "empty"
elif num_def:
return "function definition"
elif num_for or num_while:
return "other construct"
elif call:
return "method/function call"
elif num_assign:
return "assignment"
elif '#' in code or "'''" in code or '"""' in code:
return "comment"
elif not working:
return "did not compile"
Classify responses based on conventions
def open_by_version(df):
classified = df.assign(classification=df.GameDevOpenCode.apply(attrib))
grouped_classes = classified.groupby(['Prior', 'PriorGameDev']).classification
classes_counts = grouped_classes.value_counts()
return classes_counts.unstack().fillna(0).astype(int)
v1_open = open_by_version(v1).loc[False,'No.']
# v1_open = v1_open.div(v1_open.agg(sum)).multiply(100).round(1)
v2_open = open_by_version(v2).loc[False, 'No.']
# v2_open = v2_open.div(v2_open.agg(sum)).multiply(100).round(1)
open_total = pd.concat([v1_open, v2_open], axis = 1)
open_total.columns = ['V1', 'V2']
open_total = open_total.fillna(0)
print("Code conventions present in version 1 and version 2 responses: ")
open_total
Code conventions present in version 1 and version 2 responses:
V1 | V2 | |
---|---|---|
classification | ||
assignment | 0 | 1.0 |
comment | 0 | 0.0 |
did not compile | 3 | 19.0 |
empty | 7 | 11.0 |
function definition | 1 | 0.0 |
method/function call | 6 | 17.0 |
other construct | 1 | 0.0 |
Notice the increase in the number of responses that did not compile from v1 to v2. Is this a significant difference? Perform a chi-square test for independence to determine whether this is dependent on survey version.
chi = st.chi2_contingency(pd.crosstab(open_total['V1'], open_total['V2']))
print(f'P-value of response conventions based on survey version: {chi[1]}.')
P-value of response conventions based on survey version: 0.10511012676905432.
This is greater than our significance level of 0.05, so we fail to reject the null hypothesis that the conventions of responses are independent of survey version. Therefore, even though there is an increase in responses that did not compile between survey versions, it is not a statistically significant increase.
Results to RQ2 support broader conclusions that answer RQ1.
Regarding RQ2, there were some survey questions with significant differences based on survey version. For example, results for our survey question asking what students call an animation of a character walking across the screen were significantly different between survey versions. This difference may be explained by a survey design change from v1 to v2: we realized after launching v1 that the wording of the question may introduce bias, so when designing v2 we reworded the question to mitigate that bias. The significantly different results between survey versions suggest that the vocabulary students use to think about video games depends on how game design concepts are framed to them. This insight highlights not only the importance of choosing consistent phrasing for the game library API, but also the importance of intentional consistency when teaching game design concepts in the classroom.
Also on the topic of RQ2, analysis of some survey questions suggests that there is no significant difference in responses based on students' prior experience or survey version, suggesting that students think about some game design concepts similarly regardless of their prior experience with programming and game development. Based on our analysis, such concepts include in-place animations such as a character waving, which the most popular response among students in our sample conceptualized as an animation. Additionally, most participants, regardless of prior experience, consider an iterative occurrence a repeat.
Analysis of other survey responses suggests that there are some game design concepts that students may conceptualize differently depending on their prior levels of programming and game design experience. For example, when referring to a game's state at one particular moment, students without prior game development experience significantly preferred the term moment compared to students with prior experience. When asked about vocabulary regarding sprites, students without prior programming or game design experience chose character more than students with other levels of experience.
Much like the answer to RQ2: it's complicated. Some results already discussed were clearly favored, such as repeat for an iteration. However, for the question asking about a character waving, animation was the most popular response, but action was almost equally favored; no response received a majority. As a result, there is no single "ideal" (most popular) term favored by all students for an in-place animation.
Results for the question asking about sprites showed general favorability toward both character and object. Although these terms were favored, they call into question what "ideal" means for this research question and context. They are ideal in the sense that most students preferred them, but are they ideal in a teaching environment? object and character both have different meanings in computer science contexts that students may encounter after introductory computer science - what are the risks that choosing these terms for a game library API interferes with students' further learning? These findings offer preliminary insight into students' thinking regarding game design and raise further questions to be explored in the future design of a game library API.
Analysis of the survey one and survey two responses offers preliminary insight into how students think about game design before formally learning about game development in the classroom. In conclusion, the way students think about game design, when considering prior experience, is complicated and nuanced. Acknowledging our ethical concerns, this data is not conclusive, and depending too heavily on this analysis alone ignores issues such as the data coming from a predominantly white institution and the anecdotal choice of terms to investigate. From this data analysis I can confidently conclude that there is always room to explore and ask more questions regarding how students learn computer science.