## User description

It is becoming increasingly difficult for game developers to manage the cost of developing a game, while meeting the high expectations of gamers. One way to balance the increasing gamer expectation and development stress is to build an active modding community around the game. There exist several examples of games with an extremely active and successful modding community, with the Minecraft game being one of the most notable ones.

This paper reports on an empirical study of 1,114 popular and 1,114 unpopular Minecraft mods from the CurseForge mod distribution platform, one of the largest distribution platforms for Minecraft mods. We analyzed the relationship between the popularity of mods and 33 features across 5 dimensions of mod characteristics (i.e., mod category, mod documentation, environmental context of the mod, remuneration for the mod, and community contribution for the mod), to understand the characteristics of popular Minecraft mods. We first verified that the studied dimensions have significant explanatory power in distinguishing the popularity of the studied mods. Then we evaluated the contribution of each of the 33 features across the 5 dimensions. We observed that popular mods tend to have a high-quality description and promote community contribution.

Keywords: mods, mod development, CurseForge, Minecraft

The team size, cost and complexity in game development can grow exponentially as the user requirements increase Shumovsky (2018). Thus, it has become challenging to develop a successful game, and game developers are constantly under an immense amount of stress Phillips (2018).

One approach to balance the increasing gamer expectation and development stress is to build an active modding community around the game. Skyrim and Minecraft are examples of games that have been successful in building active modding communities Hackman and Björkqvist (2014); Zorn et al (2013) to increase the longevity of the games.
For example, the Skyrim game still has a median of 86 new mods released per day 8 years after its initial game release in 2011, along with more than 514M total unique downloads of mods Scott (2007). Prior work also shows that an active modding community can contribute to the increased sales of the original game Poretski and Arazy (2017).

There are two key components of an active modding community of a game: the active development of mods, and the active adoption of mods by gamers. In our prior work, we looked at how game developers can help maintain the active development of mods, and observed that games from developers with consistent modding support within the same or different game franchises were associated with faster releases of mods Lee et al (2018). In this paper, we identify the characteristics that distinguish popular mods from unpopular ones. To do so, we study 33 characteristics along 5 dimensions of 1,114 popular and 1,114 unpopular mods for the Minecraft game from the CurseForge mod distribution platform, one of the largest distribution platforms for Minecraft mods. We focus on the mods from the Minecraft game because it has one of the largest and most active modding communities Mills, Aaron (2015). In particular, we answer the following two research questions (RQs):

RQ1: Do our studied dimensions have enough explanatory power to distinguish popular mods from unpopular ones?

Motivation: The goal of this research question is to investigate how well each studied dimension of characteristics (i.e., features) of mods can individually distinguish the popular mods from unpopular ones. We also investigate how well all the studied dimensions together can distinguish popular mods from unpopular ones. Prior work Tian et al (2015) used similar dimensions to identify the characteristics that distinguish mobile apps with high ratings from the ones with low ratings.
The results of this research question lay the foundation for further investigations of the characteristics of popular mods.

Findings: We observed that each studied dimension of characteristics of a mod has significant explanatory power in distinguishing popular from unpopular mods. Among the studied dimensions, the community contribution for the mod dimension has the largest explanatory power. However, our combined model, which uses all the features across the five dimensions, outperforms the best model using an individual dimension by 10% (median).

RQ2: What features best characterize a popular mod?

Motivation: The goal of this research question is to investigate which features of mods can best characterize popular mods. The results of RQ1 show that the studied features have a strong explanatory power for the popularity of a mod. In this RQ, we further investigate the characteristics of popular mods at a granular level.

Findings: We observed that 18 of the 33 (54.5%) studied features help in distinguishing popular mods from unpopular ones. Simplifying the mod development is positively correlated with mod popularity. In addition, popular mods tend to promote community contribution with a source code repository URL and an issue tracking URL, and have a richer mod description.

The remainder of the paper is outlined as follows. Section 2 gives background information about the Minecraft game and the CurseForge mod distribution platform. Section 3 gives an overview of related work. Section 4 discusses our methodology. Section 5 discusses the results of our empirical study. Section 6 outlines threats to the validity of our findings.
Section 7 concludes our study.

This section provides a brief overview of the Minecraft game and the CurseForge mod distribution platform.

#### 2.1 The Minecraft Game

The Minecraft game is an open-ended 3D sandbox game, initially developed in the Java programming language, where gamers can use various resources (e.g., blocks) to create their own worlds Mojang (2019). Developed by the Mojang (https://mojang.com/) game studio, the Minecraft game is one of the best-selling video games of all time as of 2019, with over 176 million copies sold since its release in 2011 Blake, Vikki (2019). Mods are considered one of the most popular aspects of the Minecraft game, and are credited for the great success of the game Geere and Copeland (2019); Finley, Klint (2014); O’Brien, Chris (2013).

#### 2.2 The CurseForge Mod Distribution Platform

Minecraft mods on CurseForge. The CurseForge mod distribution platform hosts one of the largest online Minecraft mod repositories with more than 12,000 downloadable mods CurseForge (2006). Table 1 shows a comparison of the CurseForge mod distribution platform to other Minecraft mod distribution platforms with respect to the number of mods. The CurseForge mod distribution platform provides a dedicated page for each mod. The dedicated page contains detailed information about a mod including contributors, releases, and dependencies, while categorizing the mod under at least one mod category. Furthermore, mod developers can provide their PayPal (https://www.paypal.com/) or Patreon (https://www.patreon.com/) donation URLs on their mod’s page. Patreon is a crowdfunding platform where content creators such as mod developers can promote themselves, and receive monthly donations.

Mod contributors on CurseForge. A mod on the CurseForge mod distribution platform can have multiple contributors, and each contributor is assigned a role for the mod (i.e., artist, author, contributor, documenter, former author, maintainer, mascot, owner, tester, ticket manager, or translator).
There can be multiple contributors of a mod with the same role, except for the “owner” role which is only assigned to the user that creates the mod on the platform. Unfortunately, the CurseForge mod distribution platform does not provide any official definition for the roles. Furthermore, we observed that the number of mod developers in a mod does not always accurately represent the actual number of contributors. For example, the Fossils and Archeology Revival mod (https://minecraft.curseforge.com/projects/fossils) shows 10 mod developers on the CurseForge page, but the mod has 17 contributors on GitHub. Hence, we do not use the mod developer roles or the number of mod developers in our study.

Mod releases and dependencies on CurseForge. The dedicated page of each mod on the CurseForge mod distribution platform lists the mod releases with corresponding upload dates and supported Minecraft, Java, and Bukkit versions (Bukkit is a Minecraft server mod that helps in the running and modification of a Minecraft server; see https://bukkit.org/pages/about-us/ for more details). In addition, the dependencies for each release are also listed on a mod’s page. The CurseForge mod distribution platform supports the declaration of several types of dependencies of a mod release, including “incompatible”, “tool”, “required”, “embedded library”, and “optional dependencies”.

This section discusses prior studies that are related to our study. We discuss related work on (1) empirical studies of game mods, (2) games and software engineering, (3) studies of the Minecraft game, and (4) mining online software distribution platforms.

#### 3.1 Empirical Studies of Game Mods

Several prior studies examined the modding community to identify and analyze the relationship between mod developers and the game industry, yielding insights on collaborative practices and strategies, as well as capturing the value of mods Arakji and Lang (2007); Jeppesen (2004); Nieborg and Van der Graaf (2008).
A few prior studies mined data from the Nexus Mods distribution platform to quantitatively study the motivation behind mod developers based on the users’ expectations, and to understand how to build and maintain an active modding community Dey et al (2016); Lee et al (2018). Particularly, Dey et al. Dey et al (2016) study the metadata available for popular and unpopular mods of six famous PC games across several popular online mod distribution platforms to investigate the motivations of mod developers. They find that user demands and the content created by the mod developers correlate very weakly and suggest that more effort needs to be undertaken to bridge this gap. Furthermore, similar to our study, they also seek to investigate what features make a mod popular. However, they consider only the general tags associated with a given mod and they do so across multiple games without any consideration of the game-specific characteristics.

Additionally, Poretski and Arazy Poretski and Arazy (2017) conducted an empirical study on 45 games from the Nexus Mods distribution platform and observed that mods increased the sales of the original game. Targett et al. Targett et al (2012) empirically studied user-interface mods of the World of Warcraft (https://worldofwarcraft.com/en-us/) game to gather insights on how mods contribute to the World of Warcraft game and its modding community. They observed that modifications helped the interface of video games meet the needs of users, since every user has their own ideal interface.

Similarly, Wu et al. Wu (2016) studied popular Reddit threads on Minecraft mod discussions to uncover the knowledge learnt by Minecraft modders. They assert that these threads contain vast peer-generated knowledge on how to create artifacts in the Minecraft environment. Leavitt Leavitt (2013) studied the evolution of the creative process around the creation of Minecraft mods.
Additionally, several studies Lane et al (2017); Nguyen (2016) investigated Minecraft mods and their role in enhancing individual creativity and general interest in the field of Science, Technology, Engineering and Mathematics (STEM). They found that modding in the context of the Minecraft game positively influenced both of these aspects. Beggs Beggs (2012) studied how the dynamics between producers and consumers within the game industry are impacted by modding, using Minecraft mods as the case study. Beggs observed that Minecraft modders in total spend close to 3 million hours weekly creating and maintaining mods. Furthermore, Beggs also noted that the modding culture pushes game consumers into generally preferring games that allow modding.

Different from the aforementioned studies, we study the characteristics that distinguish popular mods from unpopular ones specific to a particular game (Minecraft) in order to better understand the characteristics of popular mods.

#### 3.2 Games and Software Engineering

Several studies investigated open source game projects to relate them to software engineering aspects Ahmed et al (2017); Pascarella et al (2018). For instance, Pascarella et al. Pascarella et al (2018) investigated how the developers contribute to video games in an open source setting. A few studies analyzed the development of the authors’ own video games Graham and Roberts (2006); Köhler et al (2012), while Guana et al. Guana et al (2015) studied the development of their own game engine. In particular, Guana et al. Guana et al (2015) outline how game development is more complicated than traditional software development and present a model-driven approach to simplify the development of game engines. Bécares et al. Bécares et al (2017) investigated the gameplay of the Time and Space game and outlined an approach to automate the game tests.

A few prior studies examined the videos of game-related bugs Lewis et al (2010). Notably, Lin et al.
Lin et al (2019a) identified gameplay videos that showcase game bugs, as naïve methods such as keyword search are inaccurate. They proposed a random forest classifier that outperforms other classifiers (i.e., logistic regression and neural network), and provides a precision that is 43% higher than the naïve keyword search approach. Furthermore, several studies Lewis and Whitehead (2011); Politowski et al (2016); Washburn Jr et al (2016) have been conducted on the postmortems of games based on articles/magazines to draw insights on the dos and don’ts of game development.

Ampatzoglou and Stamelos Ampatzoglou and Stamelos (2010) provided researchers with a systematic review of the available literature. In addition, Scacchi and Cooper Scacchi and Cooper (2015) extensively analyzed the software engineering literature of games.

Rather than investigating the software engineering aspect of the original game, in this paper we conduct an empirical study by mining the software engineering aspects of game mods that are available in the CurseForge platform.

#### 3.3 Studies of the Minecraft Game

Several prior studies have examined the Minecraft game for pedagogical uses Nebel et al (2016); Stone et al (2019); Lenig and Caporusso (2018); Al-Washmi et al (2014); Bayliss (2012); Bebbington (2014); Brand and Kinash (2013); Duncan (2011); Ekaputra et al (2013); Hanghøj et al (2014); Petrov (2014); Short (2012); Siko et al (2011); Zorn et al (2013). In addition, Nebel et al. Nebel et al (2016) conducted an extensive literature review on the usage of the Minecraft game in education. A few prior studies primarily focused on using the Minecraft game to study the players of the game Canossa et al (2013); Müller et al (2015); Quiring (2015).
Furthermore, a few prior studies primarily focused on using the Minecraft game to streamline the development of software Balogh and Beszédes (2013); Saito et al (2014).

In our study, we analyze Minecraft mods to provide an empirical understanding of the characteristics of popular mods.

#### 3.4 Mining Online Software Distribution Platforms

Mining online software distribution platforms to provide useful information and insights about the popularity of software has been a fundamental part of software engineering research. We present a brief summary of how mining online software distribution platforms has been carried out in the context of traditional software, games and mobile apps.

Traditional software. GitHub is one of the most popular online code hosting distribution platforms for traditional software. Several prior studies investigated the popularity of software projects in GitHub to provide insights to software developers Zhu et al (2014); Blincoe et al (2016); Kalliamvakou et al (2014); Borges et al (2016a, b); Borges and Valente (2018). For example, Borges et al. Borges et al (2016b) outline how a GitHub repository gathers popularity over time. In addition, Borges et al. outline the characteristics of successful GitHub repositories for other software developers to mimic. Similarly, Zhu et al. Zhu et al (2014) suggest that better folder organizational practices lead to better project popularity in GitHub.

Mobile apps. Many prior studies investigated features that impact the success of a mobile app by mining data from mobile app stores to provide useful guidelines to mobile app developers Guerrouj et al (2015); Tian et al (2015); Bavota et al (2014); Linares-Vásquez et al (2013); Taba et al (2014); Chia et al (2012). For example, Tian et al.
Tian et al (2015) studied the differences between popular and unpopular mobile apps and found that popular apps generally have more complex code and better exploit the latest features of the target Android SDK (Software Development Kit). Taba et al. Taba et al (2014) studied how the complexity of the UI of a mobile app affects its popularity and provided guidelines to developers on the amount of UI complexity they should strive for in order to keep their users happy. Similarly, Bavota et al. Bavota et al (2014) and Linares-Vásquez et al. Linares-Vásquez et al (2013) studied the characteristics of the APIs used by popular and unpopular apps and recommended developers to use less defect-prone and change-prone APIs to ensure the popularity of their mobile apps.

Games. Prior studies that mine data from online game distribution platforms primarily focused on extrapolating useful insights for game developers from platforms such as Steam Sifa et al (2014); Blackburn et al (2014); Lin et al (2019b). For example, Lin et al. Lin et al (2017) studied urgent updates on the Steam platform and observed several update patterns to help developers avoid undesirable updates. Lin et al. Lin et al (2018) also studied the early access model on the Steam platform and suggested that game developers use the early access model to elicit early feedback and gather more positive feedback. Cheung et al. Cheung et al (2014) investigated over 200 Xbox 360 game reviews to understand how the first hour of gameplay engages new players. Similarly, Ahn et al.
Ahn et al (2017) analyzed game reviews between popular and unpopular games on the Steam platform to better understand the characteristics of popular Steam games, and offered guidance to game developers on how to make their game popular.

Though many studies mined various software repositories and provided insights to developers, these insights do not directly translate to mod developers, as software such as mobile apps and games is developed from the ground up for the consumption of users. In contrast, game mods are software that was built to enhance, extend or provide (new) features to an existing game in a meaningful way by hacking the source code of the original game or through official APIs. Several prior studies Murphy-Hill et al (2014); Pascarella et al (2018); Petrillo et al (2009, 2008) show that video game development is starkly different from other types of software development. Therefore, by extension, we expect game mod development (which is a subset of game development) to be different from mobile app and video game development. For instance, consider these two studies by Tian et al. Tian et al (2015) and Ahn et al. Ahn et al (2017). Both studies examine the characteristics of popular mobile apps and video games by mining the Google Play store and the Steam platform respectively to provide insights to mobile app and video game developers. For the mobile app developers, Tian et al. Tian et al (2015) suggest that the size of the app, the number of promotional images and the target SDK are the three key elements that are associated with the popularity of a mobile app. In contrast, Ahn et al. Ahn et al (2017) recommend that developers improve the gameplay, the challenge, the motivational aspects and the emotional connection of the video game while lowering the price and improving the game’s storyline.
However, different from both of these studies, from studying the CurseForge platform we find that popular mods are likely to have a better mod description, ease other mod development and welcome community contributions. Such a result further signifies that game mods are different from other types of software.

Hence, the findings and recommendations for mobile developers, game developers and traditional software developers to ensure the popularity of their software as prescribed by prior studies cannot be directly transferred to game mod developers. Therefore, a study such as ours is pivotal in understanding the characteristics of popular mods. We envision future studies to build on our work in order to help developers improve the popularity of their mods.

We did however conduct our study in the same vein as the aforementioned studies by mining the CurseForge mod distribution platform to gain an empirical understanding of the characteristics of popular mods. To the best of our knowledge, the study by Dey et al. Dey et al (2016) is the only other study that mines online mod distribution platforms to study the characteristics of popular mods. However, they focus only on the tags that are provided for the mods on the distribution platforms and do not endeavour to provide insights to mod developers.

We study the characteristics of popular and unpopular mods specific to a particular game (Minecraft) to better understand what characterizes popular mods. These characteristics can be further explored by future work to assist mod developers in improving the quality of their mods. Furthermore, we are the first to conduct a statistically rigorous analysis on 33 features collected across 5 dimensions to generate insights for mod developers.

This section discusses the methodology of our empirical study of the characteristics of popular and unpopular Minecraft mods.
Figure 1 gives an overview of our methodology.

#### 4.1 Collecting Data

We collected the dataset for our study from the CurseForge mod distribution platform on June 6, 2019, using a customized crawler. Table 2 shows an overview of our Minecraft mod dataset.

Collecting Mods. We collected the information of 12,710 mods. In particular, we collected the name, categories, number of total comments, source code URL, issue tracking URL, PayPal URL, and Patreon URL for each mod.

Collecting Mod Releases. We collected the information of 111,574 releases across all mods. In particular, we collected the type, upload date, size, number of downloads, and supported Minecraft, Java, and Bukkit versions for each mod release.

Collecting Dependencies. We collected 76,453 mod dependencies across all mod releases. In particular, we collected the type, mods, and the direction for each dependency.

#### 4.2 Filtering Mods

To ensure the quality of the studied mods, we first removed 295 inactive mods that have no mod releases. Then, we removed 6,845 mods that were created before 2014 or after 2016 to ensure the studied mods all have an equal chance to obtain a high number of downloads. For the remaining 5,570 mods, we selected the top and bottom 20% of the mods based on their total number of downloads for our study. We consider the top 20% of mods (1,114 mods) as popular mods, and the bottom 20% of mods (1,114 mods) as unpopular mods. Hence, the claims that are made about a mod being (un)popular are about the likelihood of the mod belonging to the most/least popular group of mods.

We do not take into account the lifetime of a mod (despite some mods being created in 2014 and some mods being created in 2016) when separating the mods into popular and unpopular groups. We do so because the median number of downloads across the studied years for mods in the popular and unpopular groups remains relatively consistent, as we can observe from Figure 2.
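The filtering and selection steps of Section 4.2 can be sketched in Python as follows. This is a minimal illustration, not the crawler or scripts used in the study: the `Mod` record, its field names and the `select_study_groups` helper are hypothetical names introduced here, and the sketch assumes each mod carries its creation year, release count and total downloads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mod:
    # Hypothetical record; field names are assumptions for illustration.
    name: str
    created_year: int
    n_releases: int
    total_downloads: int


def select_study_groups(mods, first_year=2014, last_year=2016, percent=20):
    """Return (popular, unpopular) lists following the steps in Section 4.2."""
    # Step 1: drop inactive mods, i.e. mods without any release.
    active = [m for m in mods if m.n_releases > 0]
    # Step 2: keep only mods created inside the studied window.
    in_window = [m for m in active if first_year <= m.created_year <= last_year]
    # Step 3: rank by total downloads (highest first).
    ranked = sorted(in_window, key=lambda m: m.total_downloads, reverse=True)
    # Integer arithmetic avoids floating-point rounding of the group size.
    k = len(ranked) * percent // 100
    popular = ranked[:k]
    unpopular = ranked[len(ranked) - k:]
    return popular, unpopular
```

With the numbers reported above, 5,570 remaining mods and `percent=20` give a group size of `5570 * 20 // 100`, which is 1,114 mods per group.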
Furthermore, we observed that the number of popular mods that were created each year in the studied period also remains consistent. More specifically, among the 1,114 popular mods, 279 were created in 2014, and 415 and 418 mods were created in 2015 and 2016 respectively. In total, we studied 2,228 mods. Our selection approach is similar to that of a prior study Tian et al (2015), which selected the highest and lowest rated mobile apps for study.

We choose to study the number of downloads as a proxy for the popularity of a mod, as this number acts as a good indicator of the need within the Minecraft community for the features/alterations provided by the mod. Furthermore, a mod becoming popular in an online platform like CurseForge is pivotal for the mod developers. For instance, as Postigo et al. Postigo (2007) outline, mod developers want their mods to be popular as being known in the modding community may open up potentially lucrative job opportunities. Finally, identifying features that affect the popularity of software in online distribution platforms is widely regarded as an important software engineering challenge Nagappan and Shihab (2016). This importance is, for example, demonstrated by the many software engineering studies that examine the characteristics of popular mobile apps in app stores (e.g., Harman et al (2012); Tian et al (2015); Bavota et al (2014); Linares-Vásquez et al (2013)).

For each of the 2,228 mods, we used the information of the mod’s latest release and dependencies in our study.

#### 4.3 Selecting Features

Starting from prior work on the popularity of mobile apps Tian et al (2015) and our own intuition, we defined 5 dimensions that might be associated with the popularity of mods (i.e., mod category, mod documentation, environmental context of the mod, remuneration for the mod, and community contribution for the mod). Then, we defined for each dimension the features that are available on the CurseForge platform and that we can extract in an automated fashion.
We ended up with 33 features (characteristics) that we leverage to understand the differences between the characteristics of popular and unpopular Minecraft mods.

Table 3 shows an overview of the 33 features and their associated dimensions, along with their corresponding explanation and rationale. In addition, we normalized all features with the ‘numeric’ type in Table 3 using a log(1 + x) transformation to reduce the bias caused by the outliers.

### 5 Characteristics of Popular and Unpopular Minecraft Mods

In this section, we present the results of our empirical study of the characteristics of popular and unpopular Minecraft mods.

#### 5.1 RQ1: Do our studied dimensions have enough explanatory power to distinguish popular mods from unpopular ones?

Motivation: In this research question, we investigate how well each studied dimension of characteristics (i.e., features) of mods can individually distinguish the popular mods from unpopular ones. We also investigate how well all the studied dimensions together can distinguish popular mods from unpopular ones. A prior study Tian et al (2015) used similar dimensions to identify the characteristics that distinguish mobile apps with high ratings from the ones with low ratings. The results of this research question lay the foundation for further investigations of the characteristics of popular mods.

Approach: To investigate how well the individual dimensions can distinguish popular mods from unpopular ones (i.e., their explanatory power), we built a logistic regression model for each dimension in Table 3. We used logistic regression, instead of other more complex techniques (e.g., a neural network), as logistic regression is transparent and interpretable Ruiz and Villa (2008); Molnar (2018). In particular, for each dimension’s model, we used the features in a dimension as independent variables and whether the mod is popular as the dependent variable.
We consider the given dimension to have significant explanatory power if the AUC (Area Under the Receiver Operating Characteristic Curve) of the model constructed with the dimension is greater than 0.5, which means that the dimension can distinguish popular from unpopular mods. The dimension that results in the largest AUC is deemed to have the most explanatory power and vice versa. We used the glm function (https://www.rdocumentation.org/packages/stats/versions/3.6.1/topics/glm) from the stats package (https://www.rdocumentation.org/packages/stats/versions/3.6.1) to create the logistic regression models.

To validate the performance of our built models, we performed 100 out-of-sample bootstrap iterations to compute the AUC for each model. A prior study Tantithamthavorn et al (2016) showed that the out-of-sample bootstrap technique had the best balance between the bias and variance of estimates. The out-of-sample bootstrap technique randomly samples data with replacement for n iterations. The sampled data in an iteration is used as the training set for that iteration, while the data that was not sampled in that iteration is used as the testing set for that iteration. We then trained a model with the training set and calculated the AUC of the model with the testing set for each iteration.

In addition, to investigate how well all studied dimensions combined can distinguish popular mods from unpopular mods, we built a logistic regression model using all 33 features from the 5 dimensions in Table 3. We evaluated the performance of this combined model using the same process of computing the AUC of the model with 100 out-of-sample bootstrap iterations.

Furthermore, we used the Scott-Knott effect size difference test to statistically sort and rank the distributions of the AUCs of all studied dimensions Tantithamthavorn et al (2016).
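The out-of-sample bootstrap evaluation described above can be sketched in plain Python. This is a hedged illustration rather than the R pipeline used in the study: it assumes each bootstrap sample has the same size as the dataset, skips iterations whose test set contains a single class, and uses `fit_single_feature`, a hypothetical toy stand-in for the logistic regression fitted with `glm`. All function names are introduced here for illustration.

```python
import random


def auc(labels, scores):
    """AUC as the chance that a random positive outscores a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs both classes in the evaluated set")
    # Ties between a positive and a negative count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def out_of_sample_bootstrap_auc(X, y, fit, iterations=100, seed=0):
    """Train on a bootstrap sample, test on the rows that were not drawn."""
    rng = random.Random(seed)
    n = len(y)
    aucs = []
    for _ in range(iterations):
        # Assumption: the sample has the same size as the dataset.
        train_idx = [rng.randrange(n) for _ in range(n)]
        drawn = set(train_idx)
        test_idx = [i for i in range(n) if i not in drawn]
        test_y = [y[i] for i in test_idx]
        if len(set(test_y)) < 2:
            continue  # AUC is undefined without both classes
        predict = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        aucs.append(auc(test_y, [predict(X[i]) for i in test_idx]))
    return aucs


def fit_single_feature(train_X, train_y):
    """Toy stand-in for a fitted model: orient one feature by class means."""
    pos = [x for x, label in zip(train_X, train_y) if label == 1]
    neg = [x for x, label in zip(train_X, train_y) if label == 0]
    sign = 1.0
    if pos and neg and sum(pos) / len(pos) < sum(neg) / len(neg):
        sign = -1.0
    return lambda x: sign * x
```

Calling `out_of_sample_bootstrap_auc(X, y, fit_single_feature)` returns one AUC per usable iteration; the median of that list corresponds to the kind of per-model summary reported in Figure 3.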
We used the sk_esd function (https://www.rdocumentation.org/packages/ScottKnottESD/versions/1.2.2/topics/%22sk_esd%22) from the ScottKnottESD package (https://www.rdocumentation.org/packages/ScottKnottESD/versions/1.2.2) for the Scott-Knott effect size difference test.

Findings: Each studied dimension has significant explanatory power to individually identify popular mods. Figure 3 shows the distribution of AUCs per studied dimension. The lowest median AUC among the studied dimensions was 0.66, implying that every dimension has significant explanatory power (i.e., the model has an AUC > 0.5) in distinguishing popular mods from unpopular ones. In addition, the Scott-Knott effect size difference test shows a statistically significant difference between the studied dimensions, with non-negligible effect sizes. Among the studied dimensions, the community contribution for the mod dimension is ranked as having the largest explanatory power, whereas the remuneration for the mod dimension is ranked as having the lowest explanatory power.

The combined model has a larger explanatory power than each of the studied dimensions individually. Figure 3 shows the distribution of AUCs of the combined model that combines all studied dimensions together. The combined model has the largest median AUC of 0.91, outperforming every one of the studied dimensions on their own. The Scott-Knott effect size difference test confirms that the combined model has the highest ranking in explanatory power compared to the individual studied dimensions.

In addition, Figure 3 shows that the combined model has a 10% higher median AUC than the community contribution for the mod dimension (the dimension with the highest explanatory power among the studied dimensions), and a 38% higher median AUC than the remuneration for the mod dimension (the dimension with the lowest explanatory power among the studied dimensions).
A prior study Tian et al (2015) also observed that a combined model with all the dimensions has a larger explanatory power than models with individual dimensions in the context of distinguishing mobile apps with high ratings from mobile apps with low ratings.

Each studied dimension of characteristics of a mod has significant explanatory power in distinguishing popular from unpopular mods. Among the studied dimensions, the community contribution for the mod dimension has the largest explanatory power. However, our combined model, which uses all the features across the five dimensions, outperforms the best model using an individual dimension by 10% (median).

#### 5.2 RQ2: Which features best characterize a popular mod?

Motivation: In this research question, we investigate which mod features can best characterize popular mods. The results of RQ1 show that the studied dimensions have a strong explanatory power for the popularity of a mod. In this RQ, we further investigate the characteristics of popular mods at the feature level across 33 features and 5 dimensions to systematically quantify the association between the studied features and the number of downloads for a mod.

Approach: To investigate which features can best characterize popular mods, in this research question we focus on analyzing the combined model with all dimensions of features, as RQ1 shows that the combined model has the most explanatory power for mod popularity.

Figure 4 shows an overview of our approach to construct, evaluate and analyze the combined model. Below we explain each step in detail:

1. Correlation analysis. We performed correlation analysis to reduce collinearity between the features before we built the models, since correlated features can affect the interpretation of the model Midi et al (2010); McIntosh et al (2016).
We used the varclus function (https://www.rdocumentation.org/packages/Hmisc/versions/4.2-0/topics/varclus) from the Hmisc package (https://cran.r-project.org/web/packages/Hmisc/index.html) in R to filter out highly correlated features. We calculated Spearman's correlation coefficients among the studied features. We consider a pair of features with a Spearman correlation coefficient $\geq 0.7$ as highly correlated. We did not observe high correlations among our studied features.

2. Redundancy analysis. Before building the models, we also performed redundancy analysis to eliminate redundant features that can interfere with the relationship between the independent variables (i.e., features), which in turn may distort the relationship the independent variables have with the dependent variable (i.e., popularity) McIntosh et al (2016). We used the redun function (https://www.rdocumentation.org/packages/Hmisc/versions/4.2-0/topics/redun) from the Hmisc package in R to filter out features that can be linearly predicted by other features. We removed the 'number of categories' feature as it is redundant, leaving 32 features for the remainder of the study.

3. Building the combined model. We used all the remaining features after step 2 to build a logistic regression model. However, the model's regression coefficients could vary or be estimated incorrectly based on the sample of data and the underlying assumptions Fox and Monette (2002). Hence, to avoid biasing the estimated regression coefficients, we used the bootcov function from the rms package with 100 bootstrap iterations to adjust the regression coefficients with bootstrap estimates, to ensure the non-arbitrariness of the estimated regression coefficients in the combined model Harrell Jr et al (1984); Harrell Jr and Slaughter (2001).

4a. Explanatory power of features.
We used Wald's $\chi^2$ to measure the explanatory power of the features in the model from step 3. The larger the Wald $\chi^2$, the larger the explanatory power of the feature Harrell Jr et al (1984). A prior study by Thongtanunam and Hassan (2018) used the same approach to compute the explanatory power of features. We computed the Wald $\chi^2$ with the Anova function (https://www.rdocumentation.org/packages/car/versions/3.0-3/topics/Anova) from the car package (https://www.rdocumentation.org/packages/car/versions/3.0-3) in R using the parameter test.statistic='Wald'. Table 4 shows the explanatory power of each feature (Wald $\chi^2$).

4b. Explanatory power of dimensions. Although we observed in RQ1 that each dimension of features of a mod has explanatory power, we are uncertain of the unique explanatory power that each dimension contains in relation to the other dimensions. Understanding the unique explanatory power of each dimension is critical to assert which of these dimensions matter the most for characterizing the popularity of a mod. For example, from Figure 3 we observe that the environmental context of the mod and mod documentation dimensions by themselves can explain the popularity of a mod with a median AUC of 0.74. However, we are uncertain of how much unique power each of these dimensions contributes to the model built on all the studied dimensions, which had a median AUC of 0.92.

Therefore, we conducted a chunk test on each of the studied dimensions in the combined model from step 3, to quantify the explanatory power of each studied dimension Harrell Jr (2001); McIntosh et al (2016).
For each of the studied dimensions (given in Table 3), the chunk test estimates the difference in goodness of fit (by computing the difference in log-likelihood) between the full model (i.e., the combined model from step 3) and the combined model that was built without the one studied dimension whose explanatory power we are computing. The chunk test reports a Chi-squared value ($\Delta LR\chi^2$), which is derived from the difference in log-likelihood and compared against the Chi-squared distribution, and a p-value. The Chi-squared value quantifies the unique explanatory power that was lost due to the removal of the given dimension (in relation to the other dimensions), and a low p-value ($\leq 0.05$) signifies the dimension's significance.

We used the lrtest function (https://www.rdocumentation.org/packages/lmtest/versions/0.9-37/topics/lrtest) from the lmtest package (https://www.rdocumentation.org/packages/lmtest/versions/0.9-37) in R to conduct the chunk test. Table 4 shows the explanatory power of each dimension ($\Delta LR\chi^2$).

5a. Backward feature selection. We performed backward feature selection to ensure the parsimony of the constructed model, as suggested by Harrell Jr et al (1984). For instance, if a model contains a large number of independent features, the model becomes too complex to draw explanations from. Hence, Harrell Jr et al (1984) suggest using backward feature selection when the goal of the model is interpretation. We used the fastbw function (https://www.rdocumentation.org/packages/rms/versions/5.1-3.1/topics/fastbw) from the rms package in R to perform a backward elimination of features. The fastbw function takes the model that was constructed on all 32 features and eliminates the features that do not significantly contribute to reducing the AIC of the model.
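The chunk test of step 4b can be illustrated with a small, self-contained sketch. It assumes the usual likelihood-ratio form, in which the statistic is twice the difference in log-likelihood between the full and the reduced model, with one degree of freedom per removed feature. The function names `chi2_sf` and `chunk_test` and the log-likelihood values are hypothetical; the study itself used lrtest in R.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of the Chi-squared distribution for integer df.

    Uses the recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1)
    on the regularized upper incomplete gamma function, with z = x / 2.
    """
    if x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)             # Q(1, z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))  # Q(1/2, z)
    while a < df / 2.0 - 1e-9:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


def chunk_test(loglik_full, loglik_reduced, n_removed_features):
    """Likelihood-ratio statistic and p-value for dropping a group of features."""
    statistic = 2.0 * (loglik_full - loglik_reduced)
    return statistic, chi2_sf(statistic, n_removed_features)


if __name__ == "__main__":
    # Hypothetical log-likelihoods: full model vs. model without one
    # dimension that contains three features.
    stat, p = chunk_test(-100.0, -110.0, 3)
    print(f"delta LR chi2 = {stat:.1f}, p = {p:.2e}")
```

A larger statistic means that more explanatory power is lost when the dimension is removed, which is how the dimensions are compared against each other.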
We removed 14 of the 32 features (44%) using the fastbw function. As a result, we obtained a new combined model with 18 features.

5b. Build the final model. With the reduced feature set from step 5a, we reconstructed the final combined model. Similar to step 3, we adjusted the regression coefficients with the bootstrap estimate, as outlined by Harrell Jr et al (1984).

5c. Performance evaluation. To demonstrate the quality of the constructed model from step 5b, we calculated the AUC of the model using 100 out-of-sample bootstrap iterations.

5d. Nomogram analysis. We used the final combined model from step 5b to create and analyze a nomogram using the nomogram function (https://www.rdocumentation.org/packages/rms/versions/5.1-3.1/topics/nomogram) from the rms package in R, which provides a way to measure the explanatory power of each feature in distinguishing popular from unpopular mods. A nomogram provides a graphical visualization of the parsimonious logistic regression model that we built in step 5b. Although the Wald $\chi^2$ can provide insight into the explanatory power of each feature in the combined model, the nomogram provides us with an exact interpretation of how the variation in each feature affects the outcome probability. For instance, while the Wald $\chi^2$ may indicate that the number of words in the long description of a mod is important, it does not provide insights into how the exact number of words in the long description contributes to the explanatory power in distinguishing popular from unpopular mods.
Furthermore, the Wald $\chi^2$ does not show whether a certain feature has a positive or negative role in distinguishing popular from unpopular mods, whereas the nomogram does. For instance, if the feature "latest_num_bukkit_versions" is 0 for a given mod, then it has a positive role in distinguishing popular from unpopular mods. Several prior studies Shariat et al (2009); Chun et al (2007) showed that nomograms are one of the most accurate discriminatory tools for interpreting a logistic regression model. Hence, we constructed a nomogram to observe the exact role of features in classifying whether a given mod is popular or unpopular. Another key difference between the Wald $\chi^2$ and the nomogram is that the nomogram can show the contribution of each feature towards the outcome probability for each of the studied mods, whereas the Wald $\chi^2$ only shows the overall contribution (which is not specific to each mod). Figure 5 shows the results of the nomogram analysis.

5e. Partial effects analysis. We used the final combined model from step 5b and the nomogram analysis from step 5d to create partial effects plots. These plots show how different values of a numeric feature contribute to the outcome probability while the other features are held constant (at the median for numeric features and at the mode for boolean features). Hence, the partial effects analysis provides a deeper explanation of how the variation in certain features can contribute to the probability of a mod being popular or unpopular.

In addition, to measure whether two distributions are significantly different, we used Wilcoxon tests.
The Wilcoxon signed-rank test is a paired, non-parametric statistical test, whereas the Wilcoxon rank-sum test is an unpaired, non-parametric statistical test, where the null hypothesis is that it is equally likely that a randomly selected value from one sample will be less than or greater than a randomly selected value from a second sample Wilcoxon (1945). If the p-value of the used Wilcoxon test on the two distributions is less than 0.05, we reject the null hypothesis and conclude that the two distributions are significantly different. In addition, to quantify the magnitude of the difference, we calculate the Cliff's delta $d$ effect size Long et al (2003), with the following thresholds Romano et al (2006):

$$
\text{Effect size} =
\begin{cases}
\textit{negligible (N)}, & \text{if } |d| \leq 0.147. \\
\textit{small (S)}, & \text{if } 0.147 < |d| \leq 0.33. \\
\textit{medium (M)}, & \text{if } 0.33 < |d| \leq 0.474. \\
\textit{large (L)}, & \text{if } 0.474 < |d| \leq 1. \\
\end{cases}
$$
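As a rough illustration of the effect size computation, the sketch below computes Cliff's delta for two samples and maps its magnitude to a label. The function names `cliffs_delta` and `effect_size_label` and the sample values are hypothetical, and the cut-offs 0.147, 0.33 and 0.474 are assumed to be the thresholds of Romano et al (2006) cited above.

```python
def cliffs_delta(xs, ys):
    """Cliff's delta: (#pairs with x > y minus #pairs with x < y) / (all pairs).

    The result lies in [-1, 1]; 0 means the two samples overlap completely.
    """
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))


def effect_size_label(d):
    """Map |d| to a label using the assumed Romano et al (2006) thresholds."""
    magnitude = abs(d)
    if magnitude <= 0.147:
        return "negligible"
    if magnitude <= 0.33:
        return "small"
    if magnitude <= 0.474:
        return "medium"
    return "large"


if __name__ == "__main__":
    # Hypothetical feature values for two groups of mods.
    popular = [1, 2, 3, 4]
    unpopular = [2, 3, 4, 5]
    d = cliffs_delta(popular, unpopular)
    print(d, effect_size_label(d))
```

The sign of the delta indicates the direction of the difference, while only its absolute value is compared against the thresholds.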