Diversity dilemmas: uncovering gender and nationality biases in graduate admissions across top North American computer science programs

Abstract

Although different organizations have defined policies towards diversity in academia, many argue that minorities are still disadvantaged in university admissions due to biases. Extensive research has been conducted on detecting partiality patterns in the academic community; however, in recent decades, limited research has focused on assessing gender and nationality biases in universities’ graduate admission results. In this study, we collected a novel and comprehensive dataset containing information on approximately 14,000 graduate students majoring in computer science (CS) at the top 25 North American universities. We used statistical hypothesis tests to determine whether there is a preference based on students’ gender and nationality in the admission process. In addition to partiality patterns, we discuss the relationship between gender/nationality diversity and the scientific achievements of research teams. Consistent with previous studies, our findings show that there is no gender bias in the admission of graduate students to research groups, but we observed bias based on students’ nationality.

1 Introduction

Every year, many students from all over the world apply to pursue their graduate studies at top universities in North America [1]. Despite the committee-based nature of admission to many of these universities, professors still play a prominent role in accepting students and providing them with financial support [2]. As a result, students often contact faculty members directly to enhance their chances of admission. Furthermore, students who are admitted by a committee must find an academic advisor and research group, and faculty members have the authority to approve or reject these requests. Consequently, a faculty member’s research group may show a preference for accepting students who share the advisor’s gender, country of origin, or previous universities. In this study, we aim to examine the existing biases in interactions with computer science faculty members at top North American universities and their preferences regarding nationality and gender when selecting graduate students for their research groups.

In addition to establishing fair admission systems, it is crucial to enhance diversity in academia. Promoting diversity within universities enables them to have a greater impact on societies [3]. This is because institutions aim to address social issues, which cannot be effectively achieved without embracing diversity [4]. Furthermore, it is argued that being in a diverse environment can broaden students’ horizons [5].

Most prestigious universities typically strive to ensure fairness in the admission process for their graduate programs. Various factors, such as merit, gender equality, and diversity, contribute to establishing a fair graduate admission system [2, 6]. However, it is argued that admitting a greater number of marginalized students for graduate education at U.S. universities remains a contentious issue [7].

To the best of our knowledge, only three studies have focused on assessing gender or nationality bias in graduate admissions, and all of them were conducted prior to 2000. Bickel and Hammel [8] analyzed admission results from various schools at the University of California, Berkeley to examine the presence of a gender gap. They found statistically significant favoritism towards female applicants. Maxwell and Jones [9] employed adjustment techniques to compare admission rates between women and men in four graduate programs at the University of North Carolina, Chapel Hill. Their findings suggested that gender was not a significant factor in admission decisions. Subsequently, the authors of [10] discussed the influence of demographic attributes, such as gender and country of citizenship, on graduate admission decisions at top-ranked American universities. Their results indicated that these universities placed greater emphasis on admitting U.S. students, and female applicants received some degree of preference. Our work builds upon these studies by addressing questions regarding gender/nationality bias in more recent and comprehensive graduate admissions data. The dataset we collected for this study encompasses a larger number of students and includes a greater number of universities.

Some studies have examined the impact of gender/nationality diversity on the performance of research teams. In [11], the authors investigated the level of cultural diversity at which a research group achieves the highest performance. AlShebli et al. [12] analyzed author lists of research papers to explore the influence of diversity in characteristics such as gender and ethnicity on the success of research teams. Llorens et al. [13] demonstrated the existence of gender bias throughout scholars’ academic careers, affecting aspects such as career opportunities, promotion, and grant allocation. They also proposed solutions at various levels to enhance diversity, highlighting its importance for scientific success. The authors of [14] examined different facets of gender diversity and reported its positive impact on creativity and performance in scientific domains. Kamerlin [15] addressed bias issues in academia and presented strategies to promote gender diversity in academic environments. Powell [16] utilized citation count to quantify the success of research papers and investigated its relationship with various aspects of diversity, such as gender, age, ethnicity, and affiliation, among the authors. In addition to citation count, we consider faculty members’ h-index and publication count as measures of success for their research groups.

Many initiatives have been undertaken to enhance diversity in computer science. The author of [17] emphasized that these efforts should not be limited to achieving gender equity alone. Wilson [18] highlighted how his team in Hour of Code decided to translate their lectures into multiple languages and establish branches in more countries to promote diversity in computer science. One of the primary objectives of their program is to globalize computer science [19]. Increasing students’ awareness of diversity and inclusion is a crucial step towards fostering a more diverse community of computer scientists [20]. These studies collectively underscore the significance of addressing diversity issues in academia.

In this study, we aim to address the following questions:

  • Do professors exhibit a preference for admitting students of the same gender to their research group?

  • Are they inclined to accept students who share their country of origin?

  • How do these bias patterns evolve over time?

  • Is there any correlation between the diversity of gender or nationality among team members and the research team’s productivity?

Our contributions can be summarized as follows:

  1. We provide a comprehensive description of the dataset collected for this study, highlighting its various features.

  2. We analyze the gender distributions of students and faculty members and conduct hypothesis tests to examine the presence of gender bias in the selection of students for graduate study.

  3. We investigate the distributions of advisors’ and students’ home countries and explore the existence of bias in this variable.

  4. We construct an advisor-student relationship network using our dataset and calculate centrality metrics to identify the most influential countries in higher education.

  5. We examine the trends in gender/nationality biases and diversities among advisor-student pairs over time using Mann-Kendall tests.

  6. We assess the correlations between academic success and diversity measures to analyze the relationship between gender/nationality diversity and the performance of research groups.

The rest of this paper is structured as follows. The “Materials and methods” section offers an overview of the data collection process and delineates the diverse features within our dataset. Following this, our discoveries are outlined and analyzed in the “Results and discussion” section. In the “Future work” section, potential directions for future research are proposed. Finally, the “Conclusion” section succinctly summarizes the key takeaways of the paper.

2 Materials and methods

In this section, we define the techniques and metrics that we use in answering our research questions. Moreover, we describe the dataset that we collected for this study.

2.1 Methods

In this part, we introduce the algorithms and statistical tests utilized in our study.

2.1.1 Disparity filter

The disparity filter is a graph sparsification algorithm utilized to effectively reduce the number of edges in a network while preserving its multi-scale nature [21]. We apply this algorithm to remove insignificant edges from the advisor-student relationship network. Figure 1 provides an example of the application of the disparity filter algorithm.

Figure 1. A sampled subgraph of the advisor-student relationship network. The subgraph is shown before (a) and after (b) the application of the disparity filter algorithm
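
The following is a minimal sketch of how the disparity filter test can be applied to a weighted NetworkX graph. The significance level `alpha`, the `weight` attribute name, and the graph representation are illustrative assumptions rather than the exact implementation used in this study.

```python
import networkx as nx

def disparity_filter(G, alpha=0.05):
    """Keep only edges that are significant under the disparity filter
    (Serrano et al., 2009) for at least one of their endpoints.
    Assumes an undirected graph with a 'weight' edge attribute."""
    significant = set()
    for u in G.nodes():
        k = G.degree(u)
        if k <= 1:
            continue  # a node with a single edge carries no disparity information
        strength = sum(d["weight"] for _, _, d in G.edges(u, data=True))
        for _, v, d in G.edges(u, data=True):
            p_uv = d["weight"] / strength        # locally normalized edge weight
            if (1 - p_uv) ** (k - 1) < alpha:    # null model: weights split uniformly at random
                significant.add((u, v))
    # edge_subgraph keeps only the listed edges and their endpoints,
    # so nodes left without significant edges disappear automatically
    return G.edge_subgraph(significant).copy()
```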

2.1.2 Louvain community detection

The Louvain community detection algorithm is utilized to identify communities within a large-scale network by optimizing the modularity. The algorithm aims to maximize the difference between the actual number of edges within communities and the number expected under a random null model. It employs a greedy approach with heuristics to solve the problem efficiently in polynomial time [22]. We apply this algorithm to detect communities within the advisor-student relationship network.

2.1.3 Leiden community detection

The Leiden community detection algorithm is an advancement of the Louvain algorithm. It employs a fast local move approach and iteratively refines partitions to ensure the connectedness of all detected communities. Compared to the Louvain algorithm, it offers improved speed and provides more accurate partitions [23]. We use this algorithm to identify communities within the advisor-student relationship network.

Figure 2 shows example outputs of the Louvain and Leiden community detection algorithms.

Figure 2. The sampled subgraph of the advisor-student relationship network with specified communities. Communities are detected using Louvain (c) and Leiden (d) community detection algorithms
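
A small sketch of running both algorithms, assuming NetworkX (version 2.8 or later) for Louvain and the igraph and leidenalg packages for Leiden; the toy edge list and the seed are purely illustrative.

```python
import networkx as nx
import igraph as ig
import leidenalg as la

# Toy weighted graph standing in for the cross-country advisor-student network.
G = nx.Graph()
G.add_weighted_edges_from([
    ("USA", "China", 10), ("USA", "India", 8), ("USA", "Canada", 5),
    ("China", "India", 3), ("Iran", "USA", 4), ("Iran", "Canada", 2),
])

# Louvain: greedy modularity optimization (built into recent NetworkX versions).
louvain_communities = nx.community.louvain_communities(G, weight="weight", seed=0)

# Leiden: a refinement of Louvain that guarantees connected communities.
g = ig.Graph.from_networkx(G)
leiden_partition = la.find_partition(g, la.ModularityVertexPartition, weights="weight")
leiden_communities = [[g.vs[i]["_nx_name"] for i in block] for block in leiden_partition]

print(louvain_communities)
print(leiden_communities)
```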

2.2 Statistical analysis

In this part, we provide a description of the statistical methods employed in this study.

2.2.1 Proportion hypothesis test

The proportion hypothesis test is a statistical method that compares the ratio of an attribute in a population with a reference proportion. It also establishes a range of values that are likely to include the population proportion [24]. We utilize this technique to assess our research questions regarding biases in graduate admission.
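
As a generic illustration of this test (the counts and reference proportion below are made up, not values from this study), a two-sided one-sample proportion z-test with an accompanying confidence interval can be computed as follows:

```python
import numpy as np
from scipy import stats

def proportion_z_test(successes, n, p0):
    """Two-sided z-test of an observed proportion against a reference value p0,
    with a 95% confidence interval for the population proportion."""
    p_hat = successes / n
    se_null = np.sqrt(p0 * (1 - p0) / n)          # standard error under H0
    z = (p_hat - p0) / se_null
    p_value = 2 * stats.norm.sf(abs(z))
    margin = 1.96 * np.sqrt(p_hat * (1 - p_hat) / n)
    return z, p_value, (p_hat - margin, p_hat + margin)

z, p_value, ci = proportion_z_test(successes=690, n=1000, p0=0.65)
```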

2.2.2 Mann-Kendall test

The Mann-Kendall test is a nonparametric method that assesses the presence and direction of trends. It is particularly suitable for detecting monotonic trends that exhibit consistent increases or decreases over time [25]. We employ this technique to evaluate the trends of variables such as gender/nationality diversity over time.
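
A short sketch using the pymannkendall package; the package choice is our illustrative assumption (the paper does not name the software), and the yearly values are made up.

```python
import pymannkendall as mk

# Hypothetical yearly series, e.g., a same-gender ratio per admission year.
series = [0.72, 0.71, 0.70, 0.69, 0.69, 0.68, 0.66, 0.65, 0.64, 0.63]

result = mk.original_test(series)
print(result.trend, result.p)  # e.g., "decreasing" together with its p-value
```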

2.3 Metrics

In this part, we provide the definitions of the measures that we calculate in this study.

2.3.1 Weighted degree centrality

Weighted degree centrality is defined for each node in a network by summing the weights of the edges connected to that node. The formula for weighted degree centrality is as follows:

$$\begin{aligned} WD(u) = \sum_{v}{w(v, u)}, \end{aligned}$$
(1)

where the sum runs over the neighbors v of u, and \(w(v, u)\) is the weight of the edge between v and u [26]. We employ this measure to examine which countries’ faculty members accept the greatest numbers of students from other countries.

2.3.2 Closeness centrality

For each node, closeness centrality is defined as the reciprocal of the average distance between that node and all other reachable nodes in the network. The formula for closeness centrality is as follows:

$$\begin{aligned} C(u) = \frac{n - 1}{\sum_{v=1}^{n-1} d(v, u)}, \end{aligned}$$
(2)

where n represents the number of nodes from which node u is reachable, and \(d(v, u)\) denotes the geodesic distance between nodes v and u [27]. We utilize this metric to determine which countries are closest to the rest of the world in terms of admission results.
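
A brief NetworkX sketch of the two centrality measures defined above (Eqs. (1) and (2)); the toy country graph is illustrative only.

```python
import networkx as nx

G = nx.Graph()
G.add_weighted_edges_from([("USA", "China", 10), ("USA", "India", 8),
                           ("Canada", "China", 2), ("Iran", "USA", 4)])

# Weighted degree (Eq. (1)): sum of the weights of edges incident to each node.
weighted_degree = dict(G.degree(weight="weight"))   # e.g., {'USA': 22, 'China': 12, ...}

# Closeness centrality (Eq. (2)), based on hop-count shortest paths.
closeness = nx.closeness_centrality(G)
```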

2.3.3 Entropy

The entropy of a variable is defined as the average uncertainty of that variable based on its probability distribution. The formula for entropy is as follows:

$$\begin{aligned} E(X) = -\sum_{i=1}^{n} p_{i} \log p_{i}, \end{aligned}$$
(3)

where the base of the logarithm is e, and \(p_{i}\) represents the probability of the i-th outcome in variable X [28]. We employ this measure to calculate the diversity of an advisor’s research team.
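
A minimal sketch of computing Eq. (3) for one advisor’s research group; the label list is illustrative.

```python
import numpy as np
from collections import Counter

def entropy(labels):
    """Shannon entropy (natural logarithm) of a list of categorical labels,
    e.g., the nationalities of one advisor's students."""
    counts = np.array(list(Counter(labels).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum())

entropy(["China", "China", "India", "Iran", "USA"])  # larger values indicate a more diverse group
```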

2.4 Dataset

Data collection was the most challenging aspect of this study. We collected data from multiple websites, each with its own unique structure, using a combination of manual and automated approaches.

The data collection procedure consists of four steps: manual data gathering, data collection using crawlers, removal of unnecessary data, and preprocessing. We collected data from the top 25 universities in North America, as ranked by Quacquarelli Symonds (QS) in 2021 for computer science [29].

2.4.1 Manual data collection

Among all the faculty members in the computer science departments of each university, we randomly selected approximately 30 professors. We collected information such as the professor’s academic rank, home country, gender, research areas, and academic performance metrics (h-index and citation count). We also completed the prior universities (alma maters) column by referring to the professors’ resumes and information available on their websites, LinkedIn, and Google Scholar. To determine the gender, we relied on images or pronouns specified on their websites. If the birthplace was not explicitly stated, we used the location of their undergraduate university to determine their home country. We also gathered academic records, such as citation counts and h-indexes, from Google Scholar.

The academic rank of faculty members, including Assistant Professor, Associate Professor, and Professor, was typically available on the university’s website. Table 1 presents the key information about faculty members that we collected from the university homepage and the professors’ personal pages.

Table 1 Essential professor information

For the professor’s field column, we initially obtained the professor’s research interests from their website, resume, or in some cases, from Google Scholar. Next, we manually determined whether the professor’s research interests were associated with one or more of the 13 top-level fields of the Association for Computing Machinery (ACM) Computing Classification System [30]. Table 2 presents a sample mapping between professor interests and ACM subareas within our dataset.

Table 2 Mapping between professor interests and ACM subareas

After obtaining all the necessary information for each professor, we proceeded to gather the names of their students and any additional available information from their profiles. If any student-related information was available, we used it to populate the corresponding column; otherwise, we left it blank and planned to update it later with data collected from our crawlers in the next stage. Furthermore, after running the crawlers, we manually cross-checked the data to fill in any gaps using information available from other sources. The process of finding the information and collecting the data proved to be challenging and time-consuming, leading to the development of crawlers for different sections. Table 3 presents the student information available in our dataset.

Table 3 Student information in our dataset

2.4.2 Data collection using crawlers

We used the list of all students as input for the Google search engine to locate their websites and resumes, including their LinkedIn accounts. The next challenge was to automatically extract the required data from these websites and resumes to populate the information columns, such as degree, admission year, and alma maters. We also performed data cleaning on the output from the crawler and merged it with the primary dataset to ensure consistency and completeness.

We used the Name2GAN website [31] to label a person’s gender when it had not been identified manually. We checked the results of this tool against 3000 previously labeled records; the results show that the gender detection tool has an accuracy higher than 90%. For cases in which the tool’s uncertainty was high, we used manual labeling to improve the quality of our dataset.

2.4.3 Irrelevant data removal

Since we recorded information about all students associated with each randomly selected professor, including visiting students, undergraduates, postdocs, and master’s and PhD students, it was important to filter out irrelevant data and include only graduate students in our analysis. The final version of the dataset was completed on August 2, 2022, and it consisted of a total of 13,936 graduate students.

2.4.4 Preprocessing

The preprocessing stage consists of two phases:

  1. Preparing the input for the crawlers.

  2. Preparing the data for analysis.

The most crucial component of the preprocessing stage was creating a consistent list of institutions that could be used for analysis and for the Google Maps crawler. We also double-checked the address results for each university to ensure that the mapping between university and address was unique. As mentioned earlier, we used these addresses to identify the students’ countries of origin. In some cases, a student’s home country was improperly reported as a state rather than a country, and we corrected this during the preprocessing stage. Once the home country column was filled out, we standardized the names of the countries and prepared them for analysis. Additionally, the admission year column required cleaning, as it contained a few implausible values, which were quickly corrected.

A student’s home country was determined from explicit information when available. If it was not explicitly specified, we first considered the country in which the student earned an associate degree; if that information was also unavailable, we used the location of their undergraduate university to determine their home country. Additionally, we utilized a crawler for the Google Maps API to search for the locations of universities and schools, which provided us with the addresses needed for further analysis.

2.5 Data exploration

In this part, we present an overview of the key features of our dataset in order to gain insights into their distributions.

2.5.1 Advisors’ gender

In this part, we examine the distributions of advisors’ gender across other attributes. Figure 3 displays the mosaic plot depicting the relationship between advisors’ gender and their academic rank. The majority of advisors in our dataset hold the professor rank, and the highest proportion of male advisors is also observed at the professor level. This finding aligns with the results of [32], which suggest that men have a greater likelihood of being promoted to the professor rank compared to women.

Figure 3. Mosaic plot of advisors’ academic rank and gender. The numbers displayed on the bars indicate the percentage of advisors’ gender, while the numbers on the gray section represent the percentage of faculty members in each rank. The width of each bar corresponds to the number of faculty members with a specific rank

Figure 4 illustrates the distribution of female and male faculty members across different subfields of computer science in our dataset. The graph shows that the computing methodologies subfield has the highest number of advisors. This observation can be attributed to the growing significance of Artificial Intelligence, which falls under the computing methodologies category and is an interdisciplinary field [33, 34]. The theory of computation and computer systems organization subfields represent the second and third largest groups, respectively.

Figure 4. Back-to-back bar plot of advisors’ gender and their research fields

Figure 5 shows the distribution of gender among computer science faculty members across different universities.

Figure 5. Gender-disaggregated bar plot showing the count of advisors for each university. The numbers on each column represent the percentage of different genders in that specific university

2.5.2 Advisors’ academic performance metrics

In this part, we present the dispersion of academic performance metrics of the faculty members, including publication count, h-index, and citation count, which are crucial indicators of the success of their research teams. Figure 6 displays the boxplots of advisors’ publication counts for each university. To enhance the resolution, advisors with more than 1000 publications were excluded from this plot.

Figure 6. Boxplots of advisors’ publication counts for each university

Figure 7 illustrates the distribution of citation counts for faculty members at each university. To improve the clarity of the diagram, faculty members with a citation count exceeding 100,000 were excluded.

Figure 7. Boxplots of advisors’ citation counts for each university

Figure 8 presents the boxplots of h-indexes for faculty members at each university. It is worth noting that the h-index metric has fewer outliers compared to the previous metrics, indicating that it may be a better indicator for assessing the success of research groups [35].

Figure 8. Boxplots of advisors’ h-indexes for each university

2.5.3 Students’ gender

In this part, we illustrate the distribution of students’ gender against other features. Figure 9 presents a mosaic plot depicting the distribution of students’ gender based on the degree they are pursuing (or have pursued) under the supervision of their advisor. The plot reveals that there are fewer women in graduate computer science programs, which aligns with the findings of Cuny and Aspray’s study [36]. Additionally, the female-to-male ratio decreases as the degree level progresses from masters to doctorate, potentially indicating a lower tendency among women to pursue higher education [37].

Figure 9. Mosaic plot of students’ degree and gender. The numbers on the bars represent the percentage of students of each gender, while the numbers on the gray sections indicate the percentage of students in each degree category. The width of each bar is proportional to the number of students in that particular degree category

Figure 10 displays the gender distribution of CS students across different universities.

Figure 10. Gender-disaggregated bar plot showing the count of students for each university. The numbers on each column represent the percentage of different genders in that particular university

2.5.4 Nationality distributions

In this part, we explore the distribution of nationalities among students and faculty members. Figure 11 presents the distribution of students’ citizenship for each degree. It shows that the majority of students apply for doctoral programs, and the percentage of international students is higher than that of American and Canadian students. This finding is in line with the result of a study by Okahana and Zhou [38], which states that in Fall 2015, approximately 55% of students majoring in computer science or related programs were international students.

Figure 11. Mosaic plot of students’ degree and citizenship. The numbers on each bar represent the percentage of different citizenships, and the numbers on the gray section indicate the percentage of students in each degree. The width of each bar is proportional to the number of students in that particular degree

Figure 12 displays the distribution of students’ nationalities on the world map. The United States and Canada have been excluded to focus solely on international students. The map reveals that most international students come from China, India, and Iran, in that order. This finding aligns with the results of [39], which indicate that graduate programs are predominantly composed of Chinese and Indian students.

Figure 12. Distribution of students’ nationalities

Figure 13 displays the distribution of faculty members’ home countries on the world map. The map reveals that most advisors originated from the United States, followed by India, China, and Canada, in that order.

Figure 13. Distribution of advisors’ countries of origin

Figure 14 depicts the sorted bar plots of the 15 most common countries among faculty members and students.

Figure 14. Distributions of countries of origin among faculty members and students

3 Results and discussion

In this section, we provide a comprehensive explanation of our analyses and interpret the results we obtained.

3.1 Assessing gender partiality

In this part, we evaluate the presence of gender bias in admission decisions. We conduct a two-sided hypothesis test with a significance level of 0.05 to examine whether there is gender bias in the acceptance of graduate students into advisors’ research groups. To accomplish this, we employ a simulation-based approach with 500 iterations [24]. In each iteration, we generate 13,759 advisor-student pairs, where the gender of each component is selected based on the observed ratio in our dataset. Specifically, the probability of an advisor being male is 0.788, and the probability of a student being male is 0.771. This simulation yields an approximately normal distribution with a mean of 0.6562 and a standard deviation of 0.0212, as depicted in Fig. 15. It is important to note that this distribution represents the values for the ratio of advisor-student pairs with the same gender, assuming no gender bias in admitting graduate students. In our dataset, the observed ratio of advisor-student pairs with the same gender is 0.6896. We will now test whether this observed value is likely to occur in the simulated distribution. Thus, our hypothesis test is formulated as follows:

$$ \begin{gathered} H_{0}: p_{\mathrm{common} \: \mathrm{gender} \: \mathrm{ratio}} = 0.6562, \\ H_{a}: p_{\mathrm{common} \: \mathrm{gender} \: \mathrm{ratio}} \neq 0.6562. \end{gathered} $$
(4)
Figure 15. Histogram of advisor-student common gender proportion

Using a z-test, we obtained a p-value of 0.1152, which is higher than the significance level of 0.05. Therefore, we cannot reject the null hypothesis. In other words, the data does not provide strong evidence of gender bias in the admissions of graduate students. This finding is consistent with the results of Maxwell’s study [9], which also concluded that gender is not a significant factor in graduate student acceptance.
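
The following sketch reproduces the null simulation described above, using the marginal probabilities, pair count, and observed ratio reported in this section; the random-number generator and seed are our own illustrative choices rather than details specified in the paper.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)
n_pairs, n_iterations = 13_759, 500

# Under H0, advisor and student genders are drawn independently with the
# marginal probabilities observed in the dataset.
ratios = []
for _ in range(n_iterations):
    advisor_is_male = rng.random(n_pairs) < 0.788
    student_is_male = rng.random(n_pairs) < 0.771
    ratios.append(np.mean(advisor_is_male == student_is_male))

mu, sigma = np.mean(ratios), np.std(ratios)
observed = 0.6896                      # same-gender ratio in the collected data
z = (observed - mu) / sigma
p_value = 2 * stats.norm.sf(abs(z))    # two-sided test at the 0.05 level
```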

3.2 Evaluating nationality bias

In this part, we aim to investigate the presence of nationality bias in advisor-student relationships. We conduct a two-sided hypothesis test with a significance level of 0.05 to assess the existence of such bias. Similar to the previous analysis, we employ a simulation-based approach with 500 iterations. For this analysis, we only consider international students who are not from the United States or Canada. At each iteration, we generate 4839 advisor-student pairs, where the nationality of each individual is selected with a probability equal to the observed ratio in the dataset. In each iteration, we calculate the ratio of advisor-student pairs with the same nationality. The resulting distribution, shown in Fig. 16, approximates a normal distribution with a mean of 0.0682 and a standard deviation of 0.0113. In our dataset, the proportion of advisor-student pairs with the same nationality is 0.1593. To assess the likelihood of observing such a ratio in the simulated distribution, we formulate the following hypothesis test:

$$ \begin{gathered} H_{0}: p_{\mathrm{common} \: \mathrm{nationality} \: \mathrm{ratio}} = 0.0682, \\ H_{a}: p_{\mathrm{common} \: \mathrm{nationality} \: \mathrm{ratio}} \neq 0.0682. \end{gathered} $$
(5)
Figure 16. Histogram of advisor-student common nationality ratio

Using a z-test, we obtain a p-value below \(10^{-15}\), which is far lower than the chosen significance level of 0.05. Therefore, we reject the null hypothesis and conclude that there is strong evidence of nationality bias in admitting international graduate students. This bias may be attributed to advisors’ familiarity with universities in their home country, which may allow them to make more accurate assessments of students who have graduated from those universities.

3.3 Advisor-student relationship network

In this part, we present a cross-country advisor-student relationship network based on our dataset. The network is constructed by connecting the nationalities of students and their advisors with weighted edges. We apply the disparity filter algorithm [21] to eliminate insignificant edges and remove isolated nodes from the network. Figure 17 provides an overview of the advisor-student relationship network. In this visualization, the size of the nodes and labels corresponds to the weighted degree and closeness centralities, respectively. The thickness of the edges represents their weight, which indicates the number of advisor-student pairs between the respective countries. Additionally, the nodes are color-coded based on their community assignment, determined using the Louvain community detection algorithm [22].

Figure 17. Cross-country advisor-student relationship network, with communities detected via Louvain algorithm
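
A compact sketch of how such a network can be assembled from a pair-level table before applying the disparity filter; the column names and rows are hypothetical, not the actual schema of our dataset.

```python
import pandas as pd
import networkx as nx

# One row per advisor-student pair (hypothetical records).
pairs = pd.DataFrame({
    "advisor_country": ["USA", "USA", "China", "India", "USA", "Iran"],
    "student_country": ["China", "India", "China", "India", "Iran", "Iran"],
})

# Edge weights count the advisor-student pairs between two countries.
edges = (pairs.value_counts(["advisor_country", "student_country"])
              .reset_index(name="weight"))

G = nx.Graph()
for _, row in edges.iterrows():
    G.add_edge(row["advisor_country"], row["student_country"], weight=row["weight"])
```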

The countries with the highest values for both centrality metrics are the United States, India, China, Canada, and Iran, respectively. This observation aligns with the previous findings that faculty members from these countries are prevalent in top universities. It serves as further evidence of the potential existence of nationality bias in advisor-student relationships.

In Fig. 18, the advisor-student relationship network is depicted with similar settings, but the Leiden algorithm [23] is utilized for community detection. According to the results, Sweden and Romania are assigned to different communities compared to Fig. 17.

Figure 18. Cross-country advisor-student relationship network, with communities detected via Leiden algorithm

3.4 Exploring time effect

In this part, we analyze the changes in bias patterns over time. Specifically, we examine admissions from 2000 to 2021. For each year, we calculate the ratios of advisor-student pairs with the same gender and nationality. Figure 19 illustrates the time series of the identical gender ratio. The results of a Mann-Kendall test indicate that this time series exhibits a statistically significant decreasing trend (\(p < 0.01\)).

Figure 19. Time series of advisor-student identical gender ratio

Figure 20 illustrates the proportions of advisor-student pairs with the same nationality across different acceptance years. We observe an increasing trend in these proportions, which is consistent with the results of a Mann-Kendall test (\(p < 0.01\)).

Figure 20. Time series of advisor-student similar nationality ratio

3.5 Investigating relationship between academic success and diversity

In this part, we aim to investigate whether there is a correlation between diversity in advisors’ research groups and their academic success. To assess this relationship, we employ scientometrics, which are described in Table 4, as measures of research group success.

Table 4 Scientometrics and their explanations

Moreover, we consider the entropy of genders and nationalities among an advisor’s students as measures of diversity within their research group. We calculate the academic success and diversity measures for 737 advisors in our dataset. Subsequently, we compute the correlations between these variables, as shown in Fig. 21. To assess the statistical significance of each correlation, we conduct a hypothesis test with a significance level of 0.01. Based on the results, the correlations between gender entropy and other variables are close to zero and not statistically significant. This suggests that there is no significant linear correlation between gender diversity and the performance of research groups. On the other hand, nationality entropy exhibits a moderate positive correlation with advisors’ h-index, suggesting that research teams with greater nationality diversity tend to have higher research productivity. Additionally, there are weak positive linear relationships between nationality diversity and the remaining academic success metrics. It is important to note that the h-index is considered a more reliable measure of academic success [35].

Figure 21. Correlogram of academic success measures and gender/nationality diversity
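
As a sketch of one such correlation test, the snippet below uses the Pearson correlation coefficient, which is our assumption consistent with the linear relationships discussed above; the per-advisor numbers are made up for illustration.

```python
import pandas as pd
from scipy import stats

# Hypothetical per-advisor table; in the paper this holds 737 advisors.
df = pd.DataFrame({
    "nationality_entropy": [0.0, 0.6, 0.9, 1.1, 1.4, 1.6],
    "h_index":             [12, 25, 30, 41, 48, 55],
})

r, p = stats.pearsonr(df["nationality_entropy"], df["h_index"])
is_significant = p < 0.01   # significance level used in this section
```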

3.6 Analyzing trends of diversity

In this section, we discuss how gender and nationality diversities have changed over the past two decades. Once again, we employ the Mann-Kendall test to assess the strength of the observed trend. Figure 22 illustrates the increasing trend in gender entropy over time. According to the results of the Mann-Kendall test, the observed trend is highly statistically significant (\(p < 10^{-5}\)).

Figure 22. Students’ gender entropy across admission years

Figure 23 shows the time series of nationality entropy. As depicted, there has been a decrease in nationality diversity over time. The decline from 2016 to 2020 is particularly noticeable. The results of the Mann-Kendall test confirm that the observed trend is statistically significant (\(p< 0.01\)).

Figure 23. Students’ nationality entropy across admission years

4 Future work

While our work presents a novel study analyzing gender and nationality biases in graduate admissions over recent decades, future research should explore other crucial factors influencing admission decisions, such as academic background, religion, and politics, to provide a more comprehensive understanding of bias in graduate admissions. To achieve this, researchers could consider integrating our dataset with additional sources, such as institutional reports and the social media profiles of students and faculty members on platforms like Twitter, to gain new insights into this matter.

Moreover, another promising avenue for future research involves evaluating whether specific stages of the admissions process accentuate gender and nationality biases, and how these biases manifest diversely across various universities. For example, researchers could concentrate on distinct phases of the admissions process, such as committee decisions, to discern differing bias patterns.

Additionally, future investigations might delve into the correlation between gender and nationality diversity within computer science faculty and observed biases in graduate admissions. This analysis could yield insights into potential strategies for addressing these biases effectively.

Lastly, a valuable topic for future research could be assessing whether significant variations in gender and nationality biases exist across different subfields within computer science (e.g., artificial intelligence, systems, theory). Furthermore, exploring how these biases correlate with broader trends could provide valuable insights into the dynamics of bias within the field.

5 Conclusion

In this study, we analyzed the distribution of genders and nationalities among students and their advisors. We conducted two-sided hypothesis tests to examine the presence of bias in gender and home country within advisor-student relationships. Our findings indicate that there is no gender bias in admission results. However, our results confirm the existence of bias against international applicants based on nationality. Additionally, we explored centrality metrics in the advisor-student relationship network, revealing that the United States, India, and China are the dominant countries in CS academia, influencing the composition of students and faculty members in top North American universities. We investigated the trends in gender and nationality bias over time and observed a reduction in gender bias, while nationality bias has shown an increasing pattern. Furthermore, we established a positive relationship between diversity in the nationalities of research group members and their academic performance. Lastly, we demonstrated an increase in gender diversity over time, alongside a decline in nationality diversity.

We acknowledge a limitation regarding the data collected for this study. We cannot guarantee that each faculty member consistently includes all individuals on their webpage. While the majority of computer science professors at high-ranking universities update their homepage at least once a year, some faculty members may not update information about newly admitted students as frequently.

Universities can utilize the findings of this study to formulate and implement policies aimed at promoting diversity and equality among their graduate students. Furthermore, they can raise awareness among faculty members regarding the benefits, particularly in terms of scientific achievement, that arise from having a diverse research team. Universities can also encourage faculty members to actively consider admitting students from a variety of nationalities.

Availability of data and materials

The dataset generated and analyzed during the current study is available in the Advisor Student Data repository, https://github.com/kalhorghazal/Advisor-Student-Data.

Abbreviations

CS:

Computer Science

QS:

Quacquarelli Symonds

ACM:

Association for Computing Machinery

MS:

Master of Science

Berkeley:

University of California, Berkeley

Caltech:

California Institute of Technology

CMU:

Carnegie Mellon University

Columbia:

Columbia University

Cornell:

Cornell University

Georgia Tech:

Georgia Institute of Technology

Harvard:

Harvard University

McGill:

McGill University

MIT:

Massachusetts Institute of Technology

NYU:

New York University

Princeton:

Princeton University

Stanford:

Stanford University

U of T:

University of Toronto

UBC:

University of British Columbia

UChicago:

University of Chicago

UCLA:

University of California, Los Angeles

UCSD:

University of California, San Diego

UIUC:

University of Illinois at Urbana-Champaign

UMich:

University of Michigan-Ann Arbor

UPenn:

University of Pennsylvania

USC:

University of Southern California

UT Austin:

University of Texas at Austin

UW:

University of Washington

Waterloo:

University of Waterloo

Yale:

Yale University

References

  1. Sharaievska I, Kono S, Mirehie MS (2019) Are we speaking the same language? The experiences of international students and scholars in North American higher education. SCHOLE, J Leis Stud Recreat Educ 34(2):120–131

  2. Posselt JR (2014) Toward inclusive excellence in graduate education: constructing merit and diversity in PhD admissions. Am J Educ 120(4):481–514

  3. Bollinger L (2007) Why diversity matters. Educ Dig 73(2):26

  4. Smith DG (2020) Diversity’s promise for higher education: making it work

  5. Maruyama G, Moreno JF (2000) University faculty views about the value of diversity on campus and in the classroom. Does diversity make a difference? Three research studies on diversity in college classrooms, 9–35

  6. Pitman T (2016) Understanding ‘fairness’ in student selection: are there differences and does it make a difference anyway? Stud High Educ 41(7):1203–1216

  7. Barrera CR (2006) Making U.S. graduate education more diverse. Science 313:614

  8. Bickel PJ, Hammel EA, O’Connell JW (1975) Sex bias in graduate admissions: data from Berkeley: measuring bias is harder than is usually assumed, and the evidence is sometimes contrary to expectation. Science 187(4175):398–404

  9. Maxwell SE, Jones LV (1976) Female and male admission to graduate school: an illustrative inquiry. J Educ Stat 1(1):1–37

  10. Attiyeh G, Attiyeh R (1997) Testing for bias in graduate school admissions. J Hum Resour 32(3):524–548

  11. Barjak F, Robinson S (2008) International collaboration, mobility and team diversity in the life sciences: impact on research performance. Soc Geogr 3(1):23–36

  12. AlShebli BK, Rahwan T, Woon WL (2018) The preeminence of ethnic diversity in scientific collaboration. Nat Commun 9(1):5163

  13. Llorens A, Tzovara A, Bellier L, Bhaya-Grossman I, Bidet-Caulet A, Chang WK, Cross ZR, Dominguez-Faus R, Flinker A, Fonken Y et al. (2021) Gender bias in academia: a lifetime problem that needs solutions. Neuron 109(13):2047–2074

  14. Nielsen MW, Bloch CW, Schiebinger L (2018) Making gender diversity work for scientific discovery and innovation. Nat Hum Behav 2(10):726–734

  15. Kamerlin SCL (2020) When we increase diversity in academia, we all win. EMBO Rep 21(12):e51994

  16. Powell K (2018) These labs are remarkably diverse—here’s why they’re winning at science. Nature 558(7708):19–23

  17. Larsen EA, Stubbs ML (2005) Increasing diversity in computer science: acknowledging, yet moving beyond, gender. J Women Minor Sci Eng 11(2):139–170

  18. Wilson C (2014) Hour of code. ACM Inroads 5(4):22

  19. Partovi H (2015) A comprehensive effort to expand access and diversity in computer science. ACM Inroads 6(3):67–72

  20. Garcia-Holgado A, Vazquez-Ingelmo A, Verdugo-Castro S, Gonzalez C, Gomez MCS, Garcia-Penalvo FJ (2019) Actions to promote diversity in engineering studies: a case study in a computer science degree. In: 2019 IEEE global engineering education conference (EDUCON). IEEE, New York

  21. Serrano MÁ, Boguná M, Vespignani A (2009) Extracting the multiscale backbone of complex weighted networks. Proc Natl Acad Sci 106(16):6483–6488

  22. Blondel VD, Guillaume J-L, Lambiotte R, Lefebvre E (2008) Fast unfolding of communities in large networks. J Stat Mech Theory Exp 2008(10):10008

  23. Traag VA, Waltman L, van Eck NJ (2019) From Louvain to Leiden: guaranteeing well-connected communities. Sci Rep 9(1):5233

  24. Diez DM, Barr CD, Cetinkaya-Rundel M (2012) OpenIntro statistics. OpenIntro, Boston

  25. Mann HB (1945) Nonparametric tests against trend. Econometrica 13(3):245

  26. Wei D, Li Y, Zhang Y, Deng Y (2012) Degree centrality based on the weighted network. In: 2012 24th Chinese control and decision conference (CCDC). IEEE, New York

  27. Freeman LC (1978) Centrality in social networks conceptual clarification. Soc Netw 1(3):215–239

  28. Rényi A (1959) On the dimension and entropy of probability distributions. Acta Math Acad Sci Hung 10(1):193–215

  29. QS world university rankings for computer science and information systems 2021. https://www.topuniversities.com/university-rankings/university-subject-rankings/2021/computer-science-information-systems. Accessed 30 Jan 2023

  30. ACM Computing Classification System. https://dl.acm.org/ccs. Accessed 30 Jan 2023

  31. Acua: audience, customer, and user analytics. https://acua.qcri.org/tool/Name2GAN. Accessed 30 Jan 2023

  32. Li B, Jacob-Brassard J, Dossa F, Salata K, Kishibe T, Greco E, Baxter NN, Al-Omran M (2021) Gender differences in faculty rank among academic physicians: a systematic review and meta-analysis. BMJ Open 11(11):050322

  33. Liu J, Kong X, Xia F, Bai X, Wang L, Qing Q, Lee I (2018) Artificial intelligence in the 21st century. IEEE Access 6:34403–34421

  34. Schönemann PH (1985) On artificial intelligence. Behav Brain Sci 8(2):241–242

  35. Sharma B, Boet S, Grantcharov T, Shin E, Barrowman NJ, Bould MD (2013) The h-index outperforms other bibliometrics in the assessment of research performance in general surgery: a province-wide study. Surgery 153(4):493–501

  36. Cuny J, Aspray W (2002) Recruitment and retention of women graduate students in computer science and engineering: results of a workshop organized by the computing research association. SIGCSE Bull 34(2):168–174

  37. Berg HM, Ferber MA (1983) Men and women graduate students: who succeeds and why? J High Educ 54(6):629–648

  38. Okahana H, Feaster K, Allum J (2016) Graduate enrollment and degrees: 2005 to 2015. Council of Graduate Schools, Washington

  39. Sun Q, Nguyen TD, Ganesh G (2019) Exploring the study abroad journey: Chinese and Indian students in U.S. higher education. J Int Consum Mark 32(3):210–227

Acknowledgements

Thanks to Baharan Khatami for providing some helpful ideas for this research.

Funding

No funding was received for conducting this study.

Author information

Contributions

GK: Data curation, Formal analysis, Methodology, Investigation, Validation, Visualization, Writing- Original draft. TZ: Data curation, Investigation, Validation, Software, Writing- Original draft. BB: Conceptualization, Project administration, Supervision, Writing- Reviewing and Editing. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Ghazal Kalhor.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Kalhor, G., Zeraati, T. & Bahrak, B. Diversity dilemmas: uncovering gender and nationality biases in graduate admissions across top North American computer science programs. EPJ Data Sci. 12, 44 (2023). https://doi.org/10.1140/epjds/s13688-023-00422-5

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1140/epjds/s13688-023-00422-5

Keywords