The higher education space: connecting degree programs from individuals’ choices

Data on applicants’ revealed preferences when entering higher education are used as a proxy to build the Higher Education Space (HES) of Portugal (2008–2015) and Chile (2006–2017). The HES is a network that connects pairs of degree programs according to their co-occurrence in the applicants’ preferences. We show that both HES network structures reveal positive assortment in features such as gender balance, application scores, unemployment levels, academic demand/supply ratio, geographical mobility, and first-year dropout rates. For instance, if a degree program exhibits a high prevalence of female candidates, its nearest degree programs in the HES will also tend to exhibit a higher prevalence than the system as a whole. These patterns extend up to two or three links of separation and vanish or invert at larger distances. Moreover, we show that a similar pattern holds for the time variations of the demand/supply ratio and application scores. Finally, we provide evidence that the information embedded in the HES is not accessible by merely considering the features of degree programs independently. These findings contribute to a better understanding of higher education systems by revealing and leveraging their non-trivial underlying organizing principles. To the best of our knowledge, this is the first network science approach for improving decision-making and governance in higher education systems.


I. INTRODUCTION
While many factors are known to determine applicants' choices when entering Higher Education and to contribute to their educational attainment (examples range from the socio-economic background of applicants [1][2][3][4] and their gender [5][6][7] to the expected earnings differentials between education fields [8,9]; self-identification and career opportunities [10,11]; ability beliefs and heterogeneous tastes [12][13][14]; and political views and applicants' personality [15]), little is known about how these factors translate into higher-order principles of higher education systems.
Linking individual actions to higher-order organizational principles of social systems has been a long-standing problem in computational social science [16][17][18][19][20][21]. Such a link plays a key role in our ability to design effective governance instruments and interventions, since their effectiveness is arguably bounded by our understanding of how elements in a system can affect each other. In the context of higher education [22,23], a lack of such knowledge materializes in our inability to answer simple questions, such as: how do changes in the demand for a given degree program spill over throughout the system? Would such variation be observed equally across all degree programs, or would we instead observe predictable and structured spillover dynamics? And what should we expect regarding other measurable features?
Here, we propose the Higher Education Space (HES) as a way to map the interplay and similarity between degree programs and as an instrument to improve the effectiveness of policy-making in higher education. Similarity between degree programs is inferred, by proxy, from the revealed preferences of applicants to higher education. The emerging structure, the HES, is a network that connects pairs of degree programs according to the likelihood that they co-occur in the applicants' preferences. Therefore, the HES represents 'how students, not administrators or faculty, think about the grouping of' degree programs [24]. This structure contrasts with the state-of-the-art classification, the International Standard Classification of Education (ISCED) [25,26], which groups degree programs according to their expected course content.
Our work briefly presents findings that illustrate the relevance of the HES for different topics in the context of the Portuguese and Chilean higher education systems. Both countries have a similar, centrally run application process to higher education, although they differ in many socio-economic respects.
The HES reveals the existence of positive autocorrelations [27] among features¹ of degree programs. These features include gender balance, application scores, demand-supply ratio, unemployment level, first-year dropout rate, and mobility. The autocorrelation patterns indicate that features tend to be positively assorted throughout the network structure of the HES: if a degree program exhibits a high prevalence of, say, female applicants, then degree programs up to two or three links away will also show a similar prevalence. Furthermore, while some features (e.g., application scores and demand-supply ratio) also exhibit autocorrelation patterns with respect to temporal variations, others (e.g., gender balance) do not.
Results also show that autocorrelations in unemployment cannot be explained simply by matching elements with similar features. Indeed, the connectivity structure of a degree program in the HES appears to play a determining role in its reported unemployment level.
In that respect, we observe that connected degree programs tend to have more similar unemployment levels than feature-matched but unconnected degree programs. Naturally, this finding has implications for applicants, since the choice of certain degree programs might later translate into social and capital costs associated with labor mobility.
This manuscript is organized as follows: Section II briefly describes the data used in this study; Section III presents the results along with a detailed discussion; and Section IV presents concluding remarks, summarizing the main contributions of this work and its societal implications.

II. DATA
The dataset consists of the preferences of applicants to the Portuguese (PHES) and Chilean (CHES) Higher Education Systems. While each preference in the application process corresponds to a pair of institution and degree program, here we limit the analysis to the choices of degree programs only.
Data for the PHES were obtained from the General Directorate for Education and Science Statistics (DGEEC)², through a collaboration with the Agency for Assessment and Accreditation of Higher Education (A3ES)³. This dataset includes application records to all public higher education institutions between 2008 and 2015.
Data for the CHES were provided by the Department of Evaluation, Measurement and Educational Record (DEMRE). A detailed description of the similarities and differences between each higher education system can be found in Appendix A. More information on the data cleaning procedures can be found in Appendix B.

¹ Features here correspond to students' aggregated characteristics in a particular degree program.

A. Descriptive Features of Degree Programs
For each degree program we collect different descriptive features aiming to explore autocorrelation patterns that might explain the organization of the HES. These features are assembled from the aggregated data of applicants (application scores, gender, demand, and geographical origin) or from institutional reports (unemployment levels, supply levels, and first-year dropout rates).
Each feature is standardized by year, across all degree programs that make up the Higher Education Space in each country. For instance, the gender balance of each degree program is estimated by i) computing the fraction of enrolled female students in each degree program, and then standardizing these values by ii) subtracting the average fraction of enrolled female students among all degree programs and iii) dividing by the standard deviation, thus obtaining a Z-score. Standardizing the features not only makes results comparable across time but also expresses each feature as a deviation from the average of the entire system.
In this work we focus on the following features: Gender Balance (PHES and CHES), given by the fraction of female applicants in each degree program who actually enrolled at the end of the application process; Application Scores (PHES and CHES), given by the average score of applicants who enrolled in a degree program; Demand-Supply Ratio (PHES and CHES), given by the ratio between the number of applicants who chose a given degree program as their first choice⁶ and the number of open positions in that same degree program (this normalization corrects demand for size effects, i.e., cases in which the sheer size of supply can drive demand; the indicator is similar in spirit to the "strength index" [28] sometimes computed to quantify institutions' ability to fill their available positions with applicants' first options); Geographical Mobility (CHES only), given by the driving distance, in km, between the candidate's city of origin and the location of the main campus of the institution of enrollment; Unemployment Level (PHES only), compiled and reported by institutions; and, finally, First-Year Dropout Rate (PHES only), estimated by proxy from the enrollment status of applicants at the end of the first year. Data on both unemployment and first-year dropout levels are publicly available at http://infocursos.mec.pt.
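The per-year standardization described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `standardize_by_year`, the nested-dictionary input format, and the use of the population standard deviation are assumptions, and the values in the example are made up.

```python
from statistics import mean, pstdev


def standardize_by_year(values_by_year):
    """Return Z-scores of a feature, computed separately for each year.

    values_by_year maps year -> {degree_program: raw_value}, e.g. the fraction
    of enrolled female students in each program (hypothetical input format).
    """
    z_scores = {}
    for year, values in values_by_year.items():
        raw = list(values.values())
        mu = mean(raw)       # system-wide average for that year
        sigma = pstdev(raw)  # population standard deviation (an assumption)
        z_scores[year] = {p: (v - mu) / sigma for p, v in values.items()}
    return z_scores


# Illustrative values only, not data from the study.
female_share = {2015: {"Civil Eng.": 0.30, "Architecture": 0.60, "Nursing": 0.90}}
print(standardize_by_year(female_share)[2015])
```

Here Nursing obtains a Z-score of about +1.2, i.e., roughly 1.2 standard deviations above the system-wide share of female students in that year, which is the sense in which features are compared across programs and years.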
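As a worked illustration of the Demand-Supply Ratio defined above, the sketch below divides first-choice counts by the number of open positions. The input formats (ranked lists of program names, a vacancies dictionary) and the function name are hypothetical, and institutions are ignored, as in the rest of the analysis.

```python
from collections import Counter


def demand_supply_ratio(preference_lists, vacancies):
    """First-choice demand divided by open positions, per degree program.

    preference_lists: one ranked list of degree programs per applicant.
    vacancies: {degree_program: number of open positions}.
    """
    first_choices = Counter(prefs[0] for prefs in preference_lists if prefs)
    return {p: first_choices[p] / n for p, n in vacancies.items() if n > 0}


applicants = [
    ["Civil Eng.", "Architecture"],
    ["Architecture"],
    ["Civil Eng."],
    ["Nursing", "Civil Eng."],
]
print(demand_supply_ratio(applicants, {"Civil Eng.": 4, "Architecture": 1, "Nursing": 2}))
# {'Civil Eng.': 0.5, 'Architecture': 1.0, 'Nursing': 0.5}
```

A ratio above 1 means more first-choice applicants than places; dividing by supply is what removes the size effect mentioned in the text.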

III. RESULTS AND DISCUSSION

A. The Higher Education Space
The Higher Education Space (HES) is estimated by identifying which pairs of degree programs exhibit a statistically significant co-occurrence in the applicants' preference lists [29,30]. To that end, we start by counting the number of times each pair of different degree programs co-occurs (see Appendix C for more details) and compare the observed number of co-occurrences with the number expected by random chance, given the total number of observations of each degree program.
Hence, we start by computing the φ-correlation index $\phi_{ij}$ between pairs of degree programs. For a pair of options $i$ and $j$, we compute

$$\phi_{ij} = \frac{M_{ij}\,N - M_i M_j}{\sqrt{M_i M_j (N - M_i)(N - M_j)}},$$

where $M_{ij}$ represents the number of co-occurrences of options $i$ and $j$ in the preferences of a candidate, $M_i$ is the total number of observations of option $i$ ($M_i = \sum_j M_{ij}$), and $N = \sum_i M_i$ is the total number of observations. We discard all negatively correlated relationships, since these correspond to pairs that co-occur less often than expected by random chance. Moreover, since the number of observations varies across degree programs, we use a t-test to infer whether the positive correlations are significantly different from zero.
To that end, we compute

$$t_{ij} = \frac{\phi_{ij}\sqrt{D - 2}}{\sqrt{1 - \phi_{ij}^2}},$$

where $D - 2$ represents the degrees of freedom (here we take a conservative approach and set $D = \max(M_i, M_j)$). All relationships with a p-value greater than 0.05 are discarded, as are all nodes that are not connected to the giant component. Figure 1 shows the network structures of the HES for Portugal and Chile. Nodes represent degree programs and are colored according to the first level of the ISCED classification, which groups degree programs into one of nine education fields, namely: Arts and Humanities (dark blue), Social Sciences (dark green), Sciences (dark purple), Engineering (dark yellow), Agriculture (pink), Education (red), Services (light purple), and Health (light blue). Node size is proportional to the number of observations. Links connect pairs of degree programs with a statistically significant co-occurrence pattern, and their thickness is proportional to the t-value associated with the φ-correlation.
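The estimation steps of this subsection (counting co-occurrences, computing the φ-correlation, testing it with a t-statistic, and keeping the giant component) can be sketched in plain Python as follows. This is a hedged sketch, not the authors' implementation, and several details are assumptions: a program listed at several institutions counts once per applicant; N is taken as the sum of all M_i; the p-value is two-sided and approximates Student's t by a standard normal, which is reasonable only when D is large; and all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict, deque
from itertools import combinations


def count_cooccurrences(preference_lists):
    """M_ij: number of applicants whose list contains both programs i and j.

    Each list holds (institution, program) pairs; institutions are dropped and a
    program listed at several institutions counts once (an assumption).
    """
    m = Counter()
    for prefs in preference_lists:
        programs = sorted({program for _institution, program in prefs})
        for i, j in combinations(programs, 2):
            m[(i, j)] += 1
    return m


def candidate_links(m):
    """Yield (i, j, phi_ij, D) with M_i = sum_j M_ij, N = sum_i M_i, D = max(M_i, M_j)."""
    m_i = Counter()
    for (i, j), c in m.items():
        m_i[i] += c
        m_i[j] += c
    n = sum(m_i.values())
    for (i, j), c in m.items():
        # phi_ij = (M_ij N - M_i M_j) / sqrt(M_i M_j (N - M_i)(N - M_j))
        den = math.sqrt(m_i[i] * m_i[j] * (n - m_i[i]) * (n - m_i[j]))
        phi = (c * n - m_i[i] * m_i[j]) / den
        yield i, j, phi, max(m_i[i], m_i[j])


def t_statistic(phi, d):
    """t_ij = phi_ij sqrt(D - 2) / sqrt(1 - phi_ij^2), with D - 2 degrees of freedom."""
    if d <= 2:
        return 0.0  # too few observations to test
    if phi >= 1:
        return math.inf
    return phi * math.sqrt(d - 2) / math.sqrt(1 - phi ** 2)


def p_value(t):
    """Two-sided p-value using a normal approximation to Student's t (assumption)."""
    return math.erfc(abs(t) / math.sqrt(2))


def build_hes(links, alpha=0.05):
    """Keep positive, significant links; return the giant component as {node: neighbours}."""
    adj = defaultdict(set)
    for i, j, phi, d in links:
        if phi <= 0 or p_value(t_statistic(phi, d)) > alpha:
            continue  # negative or non-significant relationships are discarded
        adj[i].add(j)
        adj[j].add(i)
    seen, giant = set(), set()
    for start in adj:  # breadth-first search over connected components
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    component.add(v)
                    queue.append(v)
        if len(component) > len(giant):
            giant = component
    return {u: set(adj[u]) for u in giant}


applicants = [
    [("U1", "Math"), ("U2", "Math"), ("U1", "Physics")],
    [("U1", "Physics"), ("U3", "Chemistry")],
    [("U2", "Math"), ("U2", "Physics"), ("U1", "Chemistry")],
]
m = count_cooccurrences(applicants)
print(m[("Math", "Physics")])  # 2

# Hand-made (i, j, phi, D) tuples, only to show the filtering step.
links = [("A", "B", 0.20, 500), ("B", "C", 0.15, 400), ("C", "D", 0.01, 300),
         ("E", "F", 0.30, 200), ("G", "H", -0.10, 900)]
print(sorted(build_hes(links)))  # ['A', 'B', 'C']
```

In the toy filtering example, the C-D link is dropped as non-significant, G-H as negative, and E-F survives the test but lies outside the giant component.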
The PHES network (Fig. 1a) results from all application preferences between 2008 and 2015, since no major changes occurred in the system during that time interval. By contrast, the CHES network analysis is divided into two periods, due to the addition of nine new universities in 2012 (see Appendix A). As such, the first period (Fig. 1b) considers all applications between 2006 and 2011, and the second period (Fig. 1c) all applications between 2012 and 2017.
The PHES and CHES networks are sparse (between 2% and 5% of the maximum number of possible relationships) and highly clustered (clustering coefficients between 0.48 and 0.51). The high clustering invites the use of network science methods (e.g., modularity-based network partition algorithms) to derive a classification/grouping of degree programs (see Figure 2 and related discussion below). Each network has a diameter of 6 to 7 links and an average path length (APL) between 3.06 and 3.61. Both CHES networks have fewer nodes than the PHES network (177 and 175 against 301) but a relatively similar connectivity per degree program: 9.50 and 8.18 against 8.15. All three networks share common topological motifs discernible by visual inspection, viz. the existence of three main clusters: one dominated by degree programs in Engineering; a second involving degree programs in Biology, Sciences, and Health; and a third with a strong representation of degree programs in Arts and Humanities and Social Sciences.
Overall, the HES is characterized by a doughnut-shaped structure, with a few degree programs occupying a central region that connects opposite sides of the network. This topology is not new: similar networks were obtained when mapping science and research areas [31,32]. Nonetheless, the above common motifs can have relevant implications for higher education policy development. For example, the central role of Economics and Management (Commercial Engineering in Chile), which connects the Engineering, Arts and Humanities, and Social Sciences clusters, might hint at potential trans-disciplinary crossings when designing future changes in the system [33][34][35].
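The descriptive statistics quoted above (density, mean connectivity, clustering coefficient, average path length, and diameter) can be computed from an adjacency structure as in the sketch below. It assumes an unweighted, undirected, connected network stored as a hypothetical {node: set_of_neighbours} dictionary; whether the reported values used link weights is not stated, and a library such as networkx offers equivalent functions.

```python
from collections import deque
from itertools import combinations


def density(adj):
    """Share of realised links among the n(n-1)/2 possible ones."""
    n = len(adj)
    links = sum(len(nb) for nb in adj.values()) / 2
    return 2 * links / (n * (n - 1))


def mean_degree(adj):
    """Average number of links per degree program."""
    return sum(len(nb) for nb in adj.values()) / len(adj)


def average_clustering(adj):
    """Mean local clustering; nodes with fewer than two neighbours count as 0."""
    total = 0.0
    for node, nb in adj.items():
        k = len(nb)
        if k < 2:
            continue
        closed = sum(1 for v, w in combinations(nb, 2) if w in adj[v])
        total += closed / (k * (k - 1) / 2)
    return total / len(adj)


def path_length_stats(adj):
    """(average shortest-path length, diameter) of a connected network, via BFS."""
    lengths = []
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        lengths.extend(d for target, d in dist.items() if target != source)
    return sum(lengths) / len(lengths), max(lengths)


# A square A-B-C-D with the diagonal A-C (toy example).
square = {"A": {"B", "C", "D"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"A", "C"}}
print(density(square), mean_degree(square), average_clustering(square))
# density ≈ 0.83, mean degree 2.5, clustering ≈ 0.83
print(path_length_stats(square))  # (≈ 1.17, 2)
```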
As mentioned above, the high clustering levels in all three networks invite a classification/grouping of degree programs based on the network structure of the HES. Figure 2 shows the best partitions obtained using the Louvain algorithm [36], where nodes of the PHES (a) and CHES (c and d) are colored according to the partition they belong to⁷. The best PHES partition has a modularity of 0.72 and explains 86% of the intra-group connectivity. When compared with the ISCED classification, these values correspond to an improvement of 33% in modularity and of 23% in intra-group connectivity. Likewise, the best partitions of the CHES networks exhibit a modularity of 0.62 (2006/11) and 0.63 (2012/17), explaining 85% (2006/11) and 86% (2012/17) of the intra-group connectivity, an improvement of 5% (2006/11) and 9% (2012/17) over the ISCED classification (see Figure 2).

⁷ To estimate the best partition, we ran the Louvain algorithm 10³ independent times and selected the partition with the highest modularity.

Figure 2 (caption fragment): [...] according to the ISCED classification of its constituents. Colors of similar groups (C1 to C8) in the different HES are kept consistent to ease comparison: groups of similar color match groups located in similar regions of the PHES and CHES. For example, group 1 (C1) in the PHES is composed of 11 degree programs from the Sciences education field, 1 degree program from Services, and 40 degree programs from Engineering. A similar composition is found in C1 of the CHES, and likewise for all other comparable groups (C1 to C6).

As expected, there are both differences and similarities among the three HES. First, the number of communities differs between the PHES (8) and the CHES (6), which might be explained simply by the size of each network (see Appendix A for more details about each system). Second, the organization of the CHES network seems to have changed in the second time period, becoming more similar to the PHES network.
This conjecture is backed up by visual inspection only and needs future validation, but it raises interesting questions: 1) does the globalization of higher education [37][38][39][40] lead different HES to evolve towards similar structures? and 2) since these structures are based on applicants' choices, are they adapting quickly to societal transformations, and is higher education policy able to follow suit?
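The modularity and intra-group connectivity figures discussed in this subsection can be illustrated with the following sketch, which scores a given partition and keeps the best of several candidate partitions, mirroring the selection of the highest-modularity partition over repeated runs. It does not implement Louvain itself; candidate partitions could come, for example, from repeated runs of networkx's `louvain_communities` with different seeds. Treating the network as unweighted and reading "intra-group connectivity" as the share of links that fall inside groups are our assumptions, and the function names are hypothetical.

```python
def modularity(adj, membership):
    """Newman modularity Q = sum_c [L_c / m - (d_c / 2m)^2].

    adj: {node: set_of_neighbours} (undirected, unweighted);
    membership: {node: group label}.
    """
    m = sum(len(nb) for nb in adj.values()) / 2
    inside, degree_sum = {}, {}
    for u, nb in adj.items():
        c = membership[u]
        degree_sum[c] = degree_sum.get(c, 0) + len(nb)
        # Each internal link is seen from both endpoints, hence the / 2.
        inside[c] = inside.get(c, 0) + sum(1 for v in nb if membership[v] == c) / 2
    return sum(inside[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


def intra_group_share(adj, membership):
    """Fraction of links whose two endpoints belong to the same group."""
    m = sum(len(nb) for nb in adj.values()) / 2
    same = sum(1 for u, nb in adj.items() for v in nb if membership[u] == membership[v]) / 2
    return same / m


def best_partition(adj, candidates):
    """Among candidate partitions (e.g. from repeated Louvain runs), keep the max-Q one."""
    return max(candidates, key=lambda part: modularity(adj, part))


# Two triangles joined by a single link (toy example).
adj = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"},
       "D": {"C", "E", "F"}, "E": {"D", "F"}, "F": {"D", "E"}}
split = {"A": 0, "B": 0, "C": 0, "D": 1, "E": 1, "F": 1}
single = {node: 0 for node in adj}
best = best_partition(adj, [single, split])
print(modularity(adj, best), intra_group_share(adj, best))  # 5/14 ≈ 0.357 and 6/7 ≈ 0.857
```

Comparing these two scores for a network-derived partition against the same scores for the ISCED grouping is the kind of comparison reported above.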

B. Feature Assortment in the Higher Education Space
The Higher Education Space (HES) is estimated solely from applicants' choices and is completely agnostic to the particular features that characterize each degree program. Thus, the emergence of three coherent and similar networks, in two different countries and for different time periods, naturally raises the question of what explains these common structures.
The answer likely lies in a multiplicity of factors, some of which we briefly explore here by matching the HES network structures with available data on descriptive features of degree programs, e.g., gender balance or unemployment levels (cf. Section II A). Other factors involved in applicants' choices can certainly help to explain the structure of the HES; however, due to data limitations and the scope of this manuscript, such exploration is left for future work.

[...] mobility, and first-year dropout rates (see Appendix D). Figure 3d-q explores these clustering patterns (i.e., positive assortment) quantitatively over the HES. To that end, we compute, for each feature, the autocorrelations between pairs of degree programs at different distances in the HES network, measured by the minimum number n of links in a path from one degree program to the other. Bars represent the autocorrelation averaged over all observation years, and error bars the standard error in the estimation of the coefficients. For example, an autocorrelation of 0.75 at n = 1 for gender balance means that degree programs separated by one link exhibit, on average, 75% of the focal degree program's deviation from the system-wide average proportion of female students. Positive (negative) autocorrelation coefficients are shown in green (red). Bars in light colors indicate an autocorrelation that is not significantly different from zero (t-test, p > 0.01).
These positive and negative relationships between pairs of degree programs seem to corroborate previous findings [3,4], in that some groups of students tend to make similar choices based on similar determinants of choice. For example, the positive assortment in gender balance (Fig. 3d-f) confirms that individuals from different gender groups have different preferences, which are revealed in their choices of degree programs, as found in [5][6][7][41]. More importantly, a non-trivial contribution of this approach is the ability to show How and Where these similarities spread through the network and how neighbouring degree programs (nodes) influence or contaminate each other; in other words, how features spill over throughout the network structure of the HES. Returning to the gender balance example, Figure 3d-f confirms what was already suggested by visual inspection of the network: the more female applicants a degree program has, the more female applicants are observed in neighboring degree programs, when compared with the average prevalence of female applicants in the entire system. This relationship is positive, significant up to two links, and holds for both Portugal and Chile. Positive autocorrelations up to two links, although weaker, are also found for Application Scores (Figure 3g-i) and Demand-Supply Ratio (Figure 3j-l) in both countries. Due to data availability, autocorrelation patterns for Unemployment Levels (Figure 3m) and Retention Rates (Figure 3q) are calculated for the PHES only. Both show patterns similar to those of the previous features, although the positive relationship in unemployment levels extends to three links of distance instead of two. Again, due to data constraints, the Student Mobility feature is only analyzed for the CHES (Figure 3n-o).
Contrary to the other features, the positive relationship observed in Geographical Mobility vanishes quickly and becomes negative or zero between degree programs separated by more than one link. Two possible explanations for the lack of positive autocorrelation beyond the first neighbors are: 1) most applicants assign a small weight to distance as a factor in the choice of a degree program, or 2) the majority of applicants tend to apply to degree programs that minimize the distance to their place of origin. Although previous research seems to support the second hypothesis [42][43][44][45][46][47], a more in-depth analysis is needed to answer this question conclusively.
In sum, and with the exception of Geographical Mobility, all features exhibit positive autocorrelations that extend up to two or three links of separation. The Higher Education Space captures information embedded in the interplay between degree programs, which is revealed by studying the preference patterns of applicants. These results are a natural outcome of all the information applicants carry at the moment of their choices [48] (i.e., either contextual information used in decision-making or inherent characteristics of applicants), which in turn shapes the topology of the HES.
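The distance-resolved autocorrelation behind Figure 3 can be sketched as follows: compute shortest-path distances with breadth-first search, pool all ordered pairs of degree programs at distance n, and take the Pearson correlation of their standardized feature values. This is one plausible reading of the procedure, not the authors' code; the exact estimator of reference [27], the averaging over years, and the t-test for significance are not reproduced, and the function names are hypothetical.

```python
from collections import deque
from math import sqrt


def bfs_distances(adj, source):
    """Shortest-path distance (in links) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy) if vx > 0 and vy > 0 else float("nan")


def autocorrelation_by_distance(adj, z, max_n=4):
    """Correlation of z-scored feature values over ordered pairs at distance n."""
    pairs = {n: ([], []) for n in range(1, max_n + 1)}
    for i in adj:
        for j, n in bfs_distances(adj, i).items():
            if 1 <= n <= max_n:
                pairs[n][0].append(z[i])
                pairs[n][1].append(z[j])
    return {n: pearson(xs, ys) for n, (xs, ys) in pairs.items() if len(xs) > 1}


# Toy chain A-B-C-D-E whose feature increases steadily along the chain.
chain = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C", "E"}, "E": {"D"}}
z = {"A": -2, "B": -1, "C": 0, "D": 1, "E": 2}
print({n: round(r, 3) for n, r in autocorrelation_by_distance(chain, z).items()})
# {1: 0.667, 2: -0.2, 3: -0.8, 4: -1.0}
```

In this toy chain, the correlation is positive among first neighbours and turns negative for more distant pairs, a qualitative shape similar to the "vanishing or inverting" behaviour described in the text.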

C. Temporal Variations in Features
In the previous section, we showed How and Where certain degree programs are positively correlated in several features, as a function of the network distance between them. In this section, we examine how temporal changes in these features can spill over throughout the HES. By understanding the When of the autocorrelation patterns, it is possible, for instance, to perceive how external shocks propagate through the system. As an example, we take the particular case of the building sector in Portugal, one of the sectors most affected by the financial crisis that hit the country between 2010 and 2014 (a crisis preceded by a downward path since the beginning of the millennium and by the global financial crisis of 2008 [49]). Figure 4a-b shows, for the PHES, the temporal variation in the demand-supply ratio for Civil Engineering (a) and Architecture (b) between 2008 and 2015. Also shown (light gray) are the temporal variations of their direct neighbors in the Higher Education Space network, with their average highlighted in red. After the economic and financial crisis, the construction industry was one of the most negatively affected [50,51]. A priori (without knowing the structure of the network), one could expect both Civil Engineering and Architecture to suffer a similar impact on their demand-supply ratio, given their close market relationship. However, a closer inspection of Figure 4a-b shows that the negative impact on the demand for Civil Engineering is not observed for Architecture. More importantly, in both cases, the variations are consistent with the average behavior of the nearest connected degree programs (temporal spillovers). This is consistent with the two degree programs belonging to different clusters (Architecture being closer to degree programs in Arts and Humanities than to Engineering; cf. Figure 1).
The spatial autocorrelation patterns of the temporal variations of features help to explain how observed changes affect entire regions of the network in different ways and in different time periods. For example, a clearly discernible pattern in Figure 4c-d reveals that variations in the demand-supply ratio reversed from one part of the network to the other in two distinct time periods (2010/11, Fig. 4c, and 2014/15, Fig. 4d). These temporal spillovers are confirmed by the autocorrelation patterns of the yearly variations of each feature over all degree programs in the PHES (Figure 5a-b). Positive autocorrelations persist up to two links of separation for the Demand-Supply Ratio and Application Scores, suggesting not only that these two features changed over time (thus reacting to conjunctural changes) but also that those changes spilled over to their neighbors.
However, not all features exhibit autocorrelation patterns in their temporal variations. Certain features, such as the demand-supply ratio (Figure 5a, d, and g) and application scores (Figure 5b, e, and h), show synchronous variations over time, suggesting that they respond to contextual changes. On the other hand, gender balance (Figure 5c, f, and i) does not change over time, suggesting that it is likely to respond to more long-term structural changes, e.g., cultural mechanisms and other socio-economic factors. In the CHES, although not all autocorrelation coefficients are statistically significant, the results lead to similar conclusions (see Figure 5d-i).
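The comparison in Figure 4a-b, between a program's demand-supply trajectory and the average trajectory of its direct neighbours, can be sketched as below, together with the year-to-year variations whose autocorrelations are discussed in this section. The function names, input formats, and numbers are illustrative assumptions, not values from the PHES.

```python
def neighbour_average_series(adj, series, program):
    """Year-by-year mean of a feature over the direct neighbours of `program`.

    adj: {program: set_of_neighbours}; series: {program: {year: value}}.
    """
    neighbours = sorted(adj[program])
    years = sorted(series[program])
    return {y: sum(series[v][y] for v in neighbours) / len(neighbours) for y in years}


def yearly_variation(values):
    """Change x(t) - x(t-1), keyed by the later year t."""
    years = sorted(values)
    return {t: values[t] - values[s] for s, t in zip(years, years[1:])}


# Illustrative numbers only.
adj = {"Civil Eng.": {"Mech. Eng.", "Electrical Eng."}}
series = {
    "Civil Eng.": {2010: 1.0, 2011: 0.6},
    "Mech. Eng.": {2010: 1.2, 2011: 0.8},
    "Electrical Eng.": {2010: 0.8, 2011: 0.6},
}
own = yearly_variation(series["Civil Eng."])
neighbours = yearly_variation(neighbour_average_series(adj, series, "Civil Eng."))
print(own, neighbours)  # both negative in 2011: the program moves with its neighbours
```

The yearly variations produced this way can then be fed to the same distance-resolved autocorrelation used for the feature levels, which is how temporal spillovers can be quantified across the whole network.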

D. Measuring Unemployment Similarity
Thus far, we have identified several prevailing autocorrelation patterns both in the spatial distribution of features but also in their temporal variations. However, at this point, it is not clear what explains the distinctive behaviour of a degree program in any given characterizing feature. For example, is the unemployment level of a degree program better explained by the connections it has in the HES or by other degree programs with similar features?
To explore this question, we compare the difference in unemployment levels in a treatment group (pairs of degree programs that are connected in the HES) against several control groups (pairs with similar behaviour in one or more features but that are not connected). To generate the control groups, we sample, for each pair in the treatment group, a second pair with an equivalent level of similarity in the available features, namely 1) gender balance, 2) application scores, 3) demand-supply ratio, and 4) all three features combined. In addition, we built 1) a random control group, in which pairs of nodes are drawn at random regardless of any similarity, and 2) a control group of degree programs from the same ISCED education field.
In Figure 6, rows show the average absolute difference in unemployment levels between pairs of degree programs for each control group. In all cases, the differences are smaller for the treatment group (vertical black line) than for the control groups (all differences are statistically significant; t-test between the averages of the two groups, p-value < 0.001). These findings support the hypothesis that the HES represents a similarity mapping between degree programs, as perceived by applicants to higher education, that cannot be recovered by estimating similarities from traditional features alone (e.g., gender, application scores, or demand-supply).

Figure 6. Comparison between the absolute differences in unemployment levels of pairs of degree programs in a treatment group (black vertical line) and different control groups (horizontal). The treatment group corresponds to 1177 connected pairs of degree programs in the Higher Education Space. Each control group (of the same size as the treatment group) is a set of pairs of degree programs matched with the pairs in the treatment group through propensity score matching [52]. Similarities measure the Euclidean distance between pairs of degree programs in the different control groups: random (dark orange), education fields of the ISCED classification (purple), gender (red), application scores (green), demand-supply ratio (orange), and the last three features combined (blue). Error bars in the control groups indicate the standard error in the estimation of the averages therein, and the shaded area is the standard error for the treatment group. Statistical significance was measured by a t-test between the treatment and control averages; all differences are significant with p-values < 0.01.
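The treatment-versus-control comparison can be sketched as follows: for each connected pair, the sketch picks an unconnected pair whose feature distance is as close as possible to that of the connected pair, and then compares mean absolute unemployment gaps. Note that it uses simple nearest-neighbour matching on feature distance as a stand-in for the propensity score matching [52] used in the study, omits the t-test, and relies on hypothetical names and toy values.

```python
import random
from itertools import combinations
from statistics import mean


def feature_distance(features, i, j):
    """Euclidean distance between the feature vectors of programs i and j."""
    return sum((a - b) ** 2 for a, b in zip(features[i], features[j])) ** 0.5


def matched_pairs(adj, features, seed=0):
    """Return (treated, controls): connected pairs and matched unconnected pairs.

    Each control pair is the unconnected pair whose feature distance is closest
    to that of the treated pair (nearest-neighbour matching without replacement).
    """
    nodes = sorted(adj)
    treated = [(i, j) for i, j in combinations(nodes, 2) if j in adj[i]]
    pool = [(i, j) for i, j in combinations(nodes, 2) if j not in adj[i]]
    random.Random(seed).shuffle(pool)  # random tie-breaking
    controls = []
    for i, j in treated:
        target = feature_distance(features, i, j)
        best = min(pool, key=lambda p: abs(feature_distance(features, *p) - target))
        pool.remove(best)
        controls.append(best)
    return treated, controls


def mean_abs_gap(pairs, unemployment):
    """Average absolute difference in unemployment levels over a set of pairs."""
    return mean(abs(unemployment[i] - unemployment[j]) for i, j in pairs)


# Toy network A-B-C-D with one (standardized) feature per program.
adj = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
features = {"A": [0.0], "B": [0.1], "C": [1.0], "D": [1.2]}
unemployment = {"A": 5.0, "B": 6.0, "C": 12.0, "D": 11.0}
treated, controls = matched_pairs(adj, features)
print(mean_abs_gap(treated, unemployment), mean_abs_gap(controls, unemployment))
```

Matching without replacement keeps the control group the same size as the treatment group, as in Figure 6; with real data, the pool of unconnected pairs is far larger than the set of connected ones.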
We should note that nodes in these networks do not incorporate any information about the institutions. Institutional specificities could potentially change the results of the current model, especially in cases where factors such as the prestige of higher education institutions, the societal value of degree programs (e.g., medicine), and the location of institutions relative to their recruitment base can greatly impact applicants' choices [53] and, consequently, the structural organization of the higher education space.

IV. CONCLUSIONS
The ever-growing worldwide complexity ensuing from technological, social, cultural, and economic changes demands highly effective governance instruments that can support the management and policy development of higher education systems. Designing such instruments requires novel data-driven approaches [22,23] that are able to capture the complex interplay between existing elements of the system and report new, comprehensive, and reliable information about its functioning.
Here, we examined the potential of exploring the higher education system through the lens of network science, by looking at applicants' conjoint choices and the emergent organizational structures. The rationale behind this approach stems from the assumption that students are not isolated beings when choosing their educational paths. The underlying intricacies of their choices are reflected not only in their individual decisions but also in society's organizational structure, as captured and materialized in higher education systems and, more specifically, in the interdependencies among degree programs as viewed from the students' point of view.
By leveraging the information carried by the applicants to higher education in Portugal and Chile at the time of their application we have derived wider organizing principles common to both systems. We show that the Higher Education Space (HES) is sparse, highly ordered, modular and able to capture multi-factorial information about the applicants' choices.
The HES reveals the existence of autocorrelation patterns among many features describing degree programs (gender balance, application scores, unemployment, mobility, demand-supply ratios, and retention rates) that stem from the aggregated characteristics of applicants and/or enrolled students. By construction, the methodology is blind to applicants' individual information and, as such, these patterns serve as evidence of the HES's utility as a source of non-trivial information about the system. For example, degree programs that are closer in the HES tend to be more similar in their features, and these similarities have a "contagious" effect among their closest neighbours. Both spatial and temporal spillovers are identified in features that reflect conjunctural changes (application scores and the demand-supply ratio). For features that reflect structural changes, such as gender balance, only spatial spillovers are identified.
Moreover, the connectivity structure of the HES has greater explanatory power for certain features, such as unemployment levels, than a proximity mapping based on other traditional variables. This has an important implication for applicants: above-average unemployment tends to be prevalent across entire regions of the HES (i.e., sets of interconnected degree programs), which can later manifest as a job-mobility cost for graduates.
As Baker [24] stresses, perception mismatches between students or applicants and educators or decision-makers need to be taken into consideration when developing new policies. In that sense, we propose here a network-driven classification of degree programs that can serve as a complement to the ISCED classification. In our classification, degree programs are grouped according to the applicants' perspectives, not their curricular content. The HES stems from a much richer and multi-factorial decision-making process than the ISCED classification, reflecting how actors in society perceive higher education.
As stated at the beginning of this work, we aimed to show the potential of the Higher Education Space in supporting policy development. Admittedly, much was left for future work. In this respect, we identify three main areas for future development: 1) exploring the practical application of the HES in designing effective governance actions; 2) exploring the topological features of the HES for a wider spectrum of countries, which can either highlight the universality of the structures identified or help us understand how distinct HES are shaped by different cultural contexts and perspectives; and 3) in countries where application systems are not governed by a central body, such as the USA and Brazil, replicating the methodology herein through nationwide surveys that mimic the application process of countries such as Portugal and Chile.

Appendix A

There are three key aspects in common between the Portuguese and Chilean Higher Education Systems. First, the application process is centralized and controlled by state-governed bodies. Second, the application consists of the submission of a ranked list of preferences (up to six in Portugal and ten in Chile) that correspond to pairs of institution and degree program. Third, candidates are allocated to open positions, which are set prior to the competition process, in descending order of score, according to the candidates' preferences.
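The third common aspect (allocating candidates, in descending order of score, to their highest-ranked option that still has an open position) can be illustrated with a simplified sketch. It assumes a single round (the Portuguese process, unlike the Chilean one, has more than one), ignores tie-breaking rules, minimum-score thresholds, and quotas, and uses hypothetical names and toy data; it is not a description of the actual allocation procedures in either country.

```python
def allocate(candidates, vacancies):
    """Single-round allocation in descending order of application score.

    candidates: list of (candidate_id, score, ranked list of options), where an
    option is an (institution, degree_program) pair.
    vacancies: {option: open positions}, fixed before the process starts.
    """
    remaining = dict(vacancies)
    placement = {}
    for cid, _score, prefs in sorted(candidates, key=lambda c: c[1], reverse=True):
        for option in prefs:
            if remaining.get(option, 0) > 0:  # first listed option with a free place
                remaining[option] -= 1
                placement[cid] = option
                break
    return placement  # candidates left out were not placed in this round


med, bio = ("U1", "Medicine"), ("U2", "Biology")
candidates = [
    ("ana", 190, [med, bio]),
    ("rui", 185, [med, bio]),
    ("eva", 170, [bio]),
    ("joao", 160, [med]),
]
print(allocate(candidates, {med: 1, bio: 2}))
# {'ana': ('U1', 'Medicine'), 'rui': ('U2', 'Biology'), 'eva': ('U2', 'Biology')}
```

Because lower-scoring candidates can only take what remains, a candidate's placement depends on the whole ranked list rather than the first choice alone, which is one reason the full preference lists, and not only first choices, carry information about how applicants relate degree programs.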
The Portuguese higher education system is organized into Universities and Polytechnics, and the Public sector represented more than 70% of total student enrollments (first year, first time, for all education levels).
The Chilean higher education system is organized into Universities, Professional Institutes, and Technical Training Centers. Universities are classified as either 'Traditional' or Private. 'Traditional' universities, created before the 1981 educational reform, include public universities belonging to the CRUCH and private universities with state funding. The competition process, called Sistema Único de Admisión (SUA), was implemented in 2003 and is managed by the DEMRE. It initially covered admission to just 27 'Traditional' universities, but in 2012, following an educational movement led by secondary school students, 9 other private universities were included. Unlike its Portuguese counterpart, the Chilean application process takes place in a single round. Until 2011, the SUA represented around 44.4% of total university enrollments, and since 2012 around 67.9%.
Figure A1a and A1b, as well as Figure A1c and A1d, show the yearly evolution of the number of candidates (blue), open positions (green), and enrolled students (red) for Portugal and Chile. Note that the Portuguese Higher Education system experienced an overall decrease in demand during the period of analysis, followed by a recovery (although by 2018 demand had not yet returned to 2008 values), in contrast with the steady growth observed in Chile over the same period. The decline in Portuguese demand may originate not only in demographic trends but also in a wide range of socio-economic factors; their analysis is, however, beyond the scope of this work. Figure A1e and A1f show the average application score of candidates, aggregated by age, for the Portuguese (e) and Chilean (f) Higher Education Systems. In both cases, average scores peak at age sixteen and decrease for older candidates; the exception is fifteen-year-old candidates, whose average scores are lower than those of sixteen-year-olds. In both systems, more than 90% of candidates are twenty-one or younger, and roughly 80% are eighteen or nineteen years old. Figure A2 shows, for each year, the correlations among the standardized features that characterize degree programs.
These features include the aggregated characteristics of enrolled students (gender, application scores, mobility), the first option of candidates (demand over supply ratio), and output variables such as the Unemployment level. In all cases, each feature is very strongly correlated with itself across years, and for that reason the main manuscript discusses the results for a single year for all networks (typically the last year available). Interesting correlation patterns between different features, some more 'common sense' than others, include the following: Unemployment shows a small positive correlation with the proportion of girls and a negative correlation with application scores; for Chile, the proportion of girls is positively correlated with demand and negatively correlated with mobility; and in both countries, demand is positively correlated with application scores.

Appendix B: Data Cleaning
In this appendix, we describe the data transformation procedures used to clean and filter the original raw data. We divide this appendix into two subsections, as the data cleaning process differed between the two systems under study.

Portuguese Higher Education System Data Set
The high level of organization of the raw data greatly facilitated the cleaning and filtering tasks. Besides a name, each degree program is encoded by a unique 4-digit ID code and has an associated 3-digit ISCED classification code. Degree programs come in several types that are important to mention. First, there are 3-year (BA) and 5-year (BA+MA) degree programs, both of which are available as options to candidates applying to Higher Education for the first time, but which lead to different degrees. Second, degree programs can be taught under different regimes: in Portuguese or in a foreign language (e.g. English or French), and during the daytime (normal) or in a nocturnal (special) regime. In some cases, degree programs with the same name have different IDs to differentiate between these types/scenarios. In order to clean and disambiguate degree programs, we took the following steps: Other important manipulations include discarding all applicants older than 21, in order to exclude applicants who entered the Higher Education System via special programs. Additionally, we only considered the first round of the Portuguese application process in all subsequent analyses.
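The two applicant-level filters above (age and application round) can be sketched as follows. The `Application` record and its field names are hypothetical; the raw files use their own layout.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class Application:
    # Hypothetical record layout, for illustration only.
    candidate_id: str
    age: int
    phase: int                                            # application round (1 = first)
    preferences: List[str] = field(default_factory=list)  # ranked program IDs


def keep_for_analysis(apps: Iterable[Application], max_age: int = 21) -> List[Application]:
    """Keep first-round applicants aged max_age or younger.

    Applicants older than max_age are dropped to exclude entries through
    special programs; later rounds are dropped to keep the first round only.
    """
    return [a for a in apps if a.age <= max_age and a.phase == 1]
```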

Chilean Higher Education System Data Set
Data from Chile come from multiple sources and, as such, were not disambiguated to the same level as the Portuguese data set. One major issue is that there is a unique ID for each degree program/institution pair; for instance, the degree program in Physics has a different ID at each institution. A second issue is that we only have information on the first two levels of the ISCED classification. To address these issues, we applied the following steps: 1) we discarded all degree programs not taught in Portuguese during the normal (daytime) regime; 2) we aggregated degree programs with the same name but different IDs.
As in the Portuguese dataset, here we have also discarded all applicants older than 21 years old.
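The two disambiguation steps can be sketched at the level of a single preference list. The catalogue mapping each program ID to a (name, regime) pair is hypothetical. A side effect worth noting is that collapsing several IDs onto one name can leave repeated entries in a candidate's list.

```python
from typing import Dict, List, Tuple

# Hypothetical catalogue: program ID -> (program name, teaching regime).
Catalogue = Dict[str, Tuple[str, str]]


def clean_preferences(preferences: List[str], catalogue: Catalogue) -> List[str]:
    """Drop non-daytime programs, then replace each ID by its program name.

    Ranking order is preserved. Because several IDs (e.g. one per
    institution) can share a name, the output may contain repeats.
    """
    cleaned = []
    for pid in preferences:
        name, regime = catalogue[pid]
        if regime == "day":  # keep only the normal (daytime) regime
            cleaned.append(name)
    return cleaned
```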

Appendix C: Higher Education Space Construction and Network Science Methods
The initial data set comprises a list of preferences for each unique candidate. From it, a list of co-occurring pairs is generated from the preferences of each candidate. We do this by constructing, for each candidate, a list of all possible 2-subsets of their preferences and discarding all 2-subsets that have repeated entries. This process is depicted in Figure C1a; we refer to the final list of 2-subsets as the observations.
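A minimal sketch of the observation-extraction step, using `itertools.combinations`. It assumes pairs are unordered and that each candidate contributes a given pair at most once; both are readings of the text above rather than confirmed details of the authors' pipeline.

```python
from itertools import combinations
from typing import Iterable, List, Tuple


def observations(preference_lists: Iterable[List[str]]) -> List[Tuple[str, str]]:
    """All 2-subsets of each candidate's preferences, minus repeated entries."""
    obs: List[Tuple[str, str]] = []
    for prefs in preference_lists:
        # Unordered pairs (sorted tuples); {X, X} pairs are discarded, and the
        # set keeps each pair at most once per candidate (an assumption).
        pairs = {tuple(sorted((a, b))) for a, b in combinations(prefs, 2) if a != b}
        obs.extend(sorted(pairs))
    return obs


# observations([["Physics", "Law", "Physics"], ["Law", "Math"]])
# -> [("Law", "Physics"), ("Law", "Math")]
```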
After obtaining the list of all observations, we build a symmetric matrix that counts the co-occurrences of each pair of degree programs in the candidates' preference lists, and then: 1) ignore self-connections; 2) compute (phi) correlations, keeping only positive values; 3) compute t-statistics; 4) select all significant edges (p-value < 0.05); and 5) discard all nodes that are disconnected from the giant component of the network. This process is illustrated in Figure C1c-g.
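The five steps can be sketched as below. This is an illustration under stated assumptions, not the authors' code: the phi coefficient is taken as the standard one for two binary variables over N candidates, with P_i candidates listing program i and C_ij listing both, phi = (C_ij N - P_i P_j) / sqrt(P_i P_j (N - P_i)(N - P_j)); the t-statistic uses N - 2 degrees of freedom; and the one-sided p-value uses a large-N normal approximation instead of the exact t distribution. The function names are hypothetical.

```python
import math
from collections import Counter, defaultdict, deque
from itertools import combinations
from statistics import NormalDist
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def significant_edges(preference_lists: List[List[str]], alpha: float = 0.05) -> List[Edge]:
    """Steps 1-4: co-occurrence counts -> phi -> t-statistic -> significant edges."""
    n = len(preference_lists)            # N: number of candidates (assumed >= 3)
    prevalence: Counter = Counter()      # P_i: candidates listing program i
    cooc: Counter = Counter()            # C_ij: candidates listing both i and j
    for prefs in preference_lists:
        unique = sorted(set(prefs))
        prevalence.update(unique)
        cooc.update(combinations(unique, 2))  # i < j, so no self-connections
    edges = []
    for (i, j), c in cooc.items():
        pi, pj = prevalence[i], prevalence[j]
        denom = math.sqrt(pi * pj * (n - pi) * (n - pj))
        if denom == 0:
            continue                     # a program listed by every candidate
        phi = (c * n - pi * pj) / denom
        if phi <= 0:
            continue                     # keep positive correlations only
        t = math.inf if phi >= 1 else phi * math.sqrt(n - 2) / math.sqrt(1 - phi ** 2)
        p = 1 - NormalDist().cdf(t)      # one-sided, normal approximation (large N)
        if p < alpha:
            edges.append((i, j))
    return edges


def giant_component(edges: List[Edge]) -> List[Edge]:
    """Step 5: keep only the edges of the largest connected component (BFS)."""
    adj: Dict[str, Set[str]] = defaultdict(set)
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    seen: Set[str] = set()
    best: Set[str] = set()
    for start in list(adj):
        if start in seen:
            continue
        comp, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in comp:
                    comp.add(nb)
                    queue.append(nb)
        seen |= comp
        if len(comp) > len(best):
            best = comp
    return [(i, j) for i, j in edges if i in best]
```

Counting prevalences and co-occurrences directly from deduplicated preference lists is equivalent to counting the observations, provided each candidate contributes a pair at most once.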

Network Science Methods
A network is a system composed of a set of nodes (vertices) and a set of links (edges), where each link connects a pair of nodes. In this manuscript, nodes represent degree programs, and links represent a statistically significant co-occurrence relationship between pairs of degree programs. The connectivity (degree) $k_i$ of a degree program $i$ is the number of degree programs it is connected to by a link. Figure 1 shows the graphical representation of the Higher Education Space network. Alternatively, one can represent the network through an adjacency matrix $A$, whose entry $a_{ij}$ is one if a link connects degree programs $i$ and $j$, and zero otherwise. Given the nature of the Higher Education Space, the adjacency matrix is symmetric and all diagonal elements are zero. The degree distribution $D(k)$ gives the fraction of nodes in the network with degree $k$. Hence, the average degree $\langle k \rangle$ of the network is
$$\langle k \rangle = \sum_k k\,D(k),$$
while the degree variance is
$$\mathrm{var}(k) = \langle k^2 \rangle - \langle k \rangle^2 .$$
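The adjacency-matrix representation and the degree statistics above can be sketched in plain Python; the node ordering and function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def adjacency_matrix(nodes: List[str], edges: List[Tuple[str, str]]) -> List[List[int]]:
    """Symmetric 0/1 matrix with a zero diagonal (self-connections ignored)."""
    index = {node: k for k, node in enumerate(nodes)}
    a = [[0] * len(nodes) for _ in nodes]
    for i, j in edges:
        if i != j:
            a[index[i]][index[j]] = a[index[j]][index[i]] = 1
    return a


def degree_stats(a: List[List[int]]) -> Tuple[Dict[int, float], float, float]:
    """Degree distribution D(k), mean degree <k>, and var(k) = <k^2> - <k>^2."""
    degrees = [sum(row) for row in a]  # k_i = sum_j a_ij
    z = len(degrees)
    dist = {k: c / z for k, c in sorted(Counter(degrees).items())}
    mean_k = sum(k * p for k, p in dist.items())
    var_k = sum(k * k * p for k, p in dist.items()) - mean_k ** 2
    return dist, mean_k, var_k


# A 4-node star: D(k) = {1: 0.75, 3: 0.25}, <k> = 1.5, var(k) = 0.75.
```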
The Average Path Length (APL) measures the average shortest distance, measured in links, between any two nodes in the network, that is, the minimum number of links a random walker would have to traverse to go from one node to the other [54], averaged over all possible pairs of nodes. The APL can be formally computed as
$$\mathrm{APL} = \frac{1}{Z(Z-1)} \sum_{i \neq j} d(i,j),$$

Figure C1. a) Illustration of the procedure used to extract observations from the initial list of preferences of each candidate (panels: List of Preferences, 2-Subsets, Observations). From each candidate's preference list, we construct a list of all possible 2-subsets; from the latter, we discard all sets in which the same degree program is repeated. The final list corresponds to a list of pairs of degree programs. b-g) Depiction of the steps conducted to generate the PHES and CHES networks.
Step 1 (b,c): observations are collected. Step 2 (d,e): we compute the phi correlation, discarding all edges with negative values, and then compute the t-statistics, discarding all edges that are not significant at p < 0.05.
Step 3 (f,g): we discard all nodes that are disconnected from the giant component.
where $Z$ is the number of nodes in the network and $d(i,j)$ is the shortest distance between nodes $i$ and $j$, measured in number of links. The Clustering Coefficient $C$ measures the average fraction of triangles a node participates in over the total number of possible triangles [55]. Formally, it can be computed as
$$C = \frac{1}{Z} \sum_i \frac{\lambda_i}{\Lambda_i},$$
where $\lambda_i$ is the number of triangles that involve $i$, and $\Lambda_i$ is the number of connected triples centered on $i$, that is, the number of sets of three nodes in which $i$ is linked to the other two (open or closed), which equals $k_i(k_i-1)/2$. The Modularity $Q$ measures the quality of a particular partition (i.e., grouping of nodes) of a graph by estimating how many links between elements of the same group are captured by the partition, compared with a random, uncorrelated network with the same degree distribution [56]. Formally, $Q$ is computed as
$$Q = \frac{1}{2m} \sum_{i,j} \left[ a_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j),$$
where $m$ is the total number of links in the network, $a_{ij}$ equals one if $i$ and $j$ are connected and zero otherwise, $k_i$ is the number of links in which node $i$ participates (its degree), $\delta(X, Y)$ equals one if $X = Y$ and zero otherwise, and $c_i$ is the community (partition) to which node $i$ belongs.
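The three measures above can be sketched in plain Python over adjacency sets. The following assumptions apply: the graph is connected (as the giant component is), nodes with fewer than two links contribute zero to $C$ while still counting in $Z$, and the partition for $Q$ is given rather than detected. The function names are hypothetical.

```python
from collections import deque
from itertools import combinations
from typing import Dict, Hashable, Set

Graph = Dict[Hashable, Set[Hashable]]  # undirected adjacency sets


def average_path_length(g: Graph) -> float:
    """APL = sum_{i != j} d(i, j) / (Z (Z - 1)), via BFS from every node."""
    z, total = len(g), 0
    for source in g:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in g[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())  # d(source, source) = 0 adds nothing
    return total / (z * (z - 1))


def clustering_coefficient(g: Graph) -> float:
    """C = (1/Z) sum_i lambda_i / Lambda_i, with Lambda_i = k_i (k_i - 1) / 2."""
    total = 0.0
    for nbrs in g.values():
        k = len(nbrs)
        if k < 2:
            continue  # assumption: such nodes contribute 0
        triangles = sum(1 for u, v in combinations(nbrs, 2) if v in g[u])
        total += triangles / (k * (k - 1) / 2)
    return total / len(g)


def modularity(g: Graph, community: Dict[Hashable, int]) -> float:
    """Q = (1/2m) sum_ij [a_ij - k_i k_j / 2m] delta(c_i, c_j)."""
    two_m = sum(len(n) for n in g.values())  # each link is counted twice
    q = 0.0
    for i in g:
        for j in g:
            if community[i] == community[j]:
                a_ij = 1 if j in g[i] else 0
                q += a_ij - len(g[i]) * len(g[j]) / two_m
    return q / two_m
```

For two triangles joined by a single link, split into the two triangles, these functions give APL = 1.8, C = 7/9, and Q = 5/14.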