Feature analysis of multidisciplinary scientific collaboration behaviors: A case study on PNAS

The features of collaboration behaviors are often considered to differ from discipline to discipline. Meanwhile, collaboration among disciplines is a prominent feature of modern scientific research, and it has incubated several interdisciplines, such as sustainability science. The features of collaborations within and among the biological, physical and social sciences are analyzed based on 52,803 papers published in the multidisciplinary journal PNAS from 1999 to 2013. In terms of similarities, the data show similar transitivity and assortativity of collaboration behaviors, and an identical distribution type for the number of collaborators per author and for the number of papers per author. In terms of interactions, the data show that a considerable proportion of authors engage in interdisciplinary research, and that the more collaborators and papers an author has, the more likely the author is to pursue interdisciplinary research. The analysis of paper contents illustrates that the development of each science category has a long-run equilibrium relationship with the developments of typical research paradigms and transdisciplinary disciplines. Hence, those unified methodologies can be viewed as grounds for the interactions.


Introduction
Aristotle divided the sciences into three divisions according to their purposes, namely theoretical sciences (mathematics, physics,...), practical sciences (ethics, politics,...) and productive sciences (poetry, rhetoric,...) [1]. This is an early version of the classification into natural sciences, social sciences and humanities. Natural and social sciences provide methodical approaches to study, predict and explain natural phenomena and sociality (human behaviors and psychological states) respectively. Humanities present a methodology to study human culture, including history, literature, music and arts, with an emphasis on understanding particular individuals, events, or eras [2]. Differences among the three methodical types exist on several dimensions, including research problems, the evidence on which inferences are based, and the vocabularies used to present concepts and theories. J. Kagan even described them as three cultures [3].
Collaboration level is one of the differences among the three cultures. It is commonly held that the collaboration activities of researchers in the social sciences are closer to those of researchers in the natural sciences than to those in the humanities, which can be verified by bibliometric analysis of papers [4,5]. In reality, the natural sciences rely mainly on scientific papers as publication media, whereas the social sciences and humanities use more diverse media, e.g. books [6,7]. However, the collaboration level in books is usually lower than that in papers. Hence, measuring collaboration level based on papers is an informative and reliable approach [8]. Bibliometric studies based on papers also show that the collaboration level in each of the three cultures has grown steadily over the years worldwide, especially the level of collaboration among nations and institutions [4].
Wide and frequent interaction between disciplines is a new feature of modern research [9][10][11][12][13][14][15][16]. To explore the hidden mechanisms behind complex phenomena in nature and society, researchers often need to integrate data, techniques, concepts, and theories from several disciplines to solve problems whose solutions are beyond the scope of a single discipline [17,18]. Therefore, collaborations occur not only within a single discipline, but also across disciplines [19][20][21]. Consequently, interactions between disciplines and cross-sector projects incubate several interdisciplines, and blur the boundary between natural and social sciences [22][23][24][25]. A range of important scientific discoveries and breakthroughs have been produced by these interactions, which have positive influences on academic, economic and social fields [26][27][28][29][30][31][32].
Quantitatively analyzing the interactions requires a fine dataset, e.g. the papers of an influential multidisciplinary journal, together with the discipline information of the papers. The Proceedings of the National Academy of Sciences (PNAS) is one such journal; its content spans the social sciences and two principal branches of the natural sciences, viz. biological and physical sciences. Moreover, the journal provides a fine data platform for analyzing the interactions at a global level, because nearly half of its papers come from authors outside the United States. The data considered here consist of 52,803 papers published in PNAS over the years 1999-2013. Based on the data, we analyze the similarities of collaboration behaviors in the three science categories and the interactions between each two of them.
Collaboration relationships can be expressed by graphs (called coauthorship networks), where nodes represent authors, and edges coauthor relationships [33][34][35]. In the language of social networks, the data show specific similarities of the collaboration behaviors in the three categories of sciences, viz. partial transitivity of coauthorship, homophily on the number of collaborators and the distribution-type of collaborators/papers per author. The type is a mixture of generalized Poisson and power-law distributions. A presumed explanation is given to show the type can be deduced from a range of "yes/no" decisions of authors.
The data show that a considerable proportion of authors and papers in physical and social sciences involve interdisciplinary research. Meanwhile, the authors having more collaborators and papers are more likely to carry out interdisciplinary research, and to introduce their collaborators to coauthor with each other in future. From a network view, the interactions give the coauthorship network extracted from the data a giant component and the small-world property. More than 88%, 80% and 71% of the authors in biological, physical and social sciences respectively belong to the giant component. Note that the author misidentification caused by initial-based methods inflates the size of giant components relative to the ground truth [36]. Hence we identify authors by the names provided on their papers (which may split one author into two) to obtain a conservative result.
Research paradigms of sciences can be classified into four categories, namely theoretical research, experiment, simulation and data-driven research. Meanwhile, transdisciplinary disciplines (e.g. systems science) integrate the theoretical and methodological perspectives drawn from all disciplines to build a unified methodology. The universality of research paradigms and transdisciplinary disciplines gives grounds for the interactions and the emergence of giant components in coauthorship networks. To validate the universality and transdisciplinarity, we analyze the contents of papers, and find that in each science category, the papers containing one of the seven selected topic words of research paradigms and transdisciplinary disciplines account for a considerable fraction. Moreover, the quarterly number of papers in each science category and the quarterly number of those containing each of the selected words are cointegrated and positively correlated.
This report is structured as follows: the data processing is described in Section 2; the similarities and interactions are analyzed in Sections 3-5; the content analysis is shown in Section 6; and the conclusion is drawn in Section 7.
2 The Data
2.1 Reason for using the data
A multidisciplinary journal whose scope covers natural and social sciences can be used to analyze the interactions between science categories. Such a journal can also be used to compare the collaboration behaviors of multiple disciplines and find similarities. PNAS publishes high-quality research papers, commentaries, reviews, etc. Most importantly, it provides reliable discipline information of papers.
Multidisciplinary journals such as Science, Nature and Nature Communications do not provide discipline information of papers. Journal of the Royal Society Interface focuses on cross-disciplinary research at the interface between the physical and life sciences, but does not involve social sciences. Our study is restricted to one journal, which is a limitation. Moreover, knowledge media of social sciences are not limited to research papers [6,7]. Hence the results obtained must be carefully interpreted as describing the behaviors of researchers who publish papers in the chosen journal. Nevertheless, due to the influence and representativeness of PNAS, the case study potentially contributes to understanding aspects of multidisciplinary collaboration behaviors.

Discipline information
Almost all papers of the dataset have been classified into three first-class disciplines (biological, physical, and social sciences) and 39 second-class disciplines, e.g. mathematics (Fig. 1). A small fraction of papers are only classified into the first-class disciplines. For those papers, we regard their second-class discipline to be the same as their first-class discipline. The data have 43,304 biological papers (including 3,957 papers of biophysics), which account for 77% of the total [38]. The data also contain 5,987 physical papers and 1,310 social papers. There are 3,007 interdisciplinary papers belonging to more than one of the second-class disciplines, which account for 5.7% of the total. The large difference in discipline proportions does not reflect a preference of PNAS. In reality, the number of researchers working in natural sciences (especially biological sciences) is far larger than that of researchers working in social sciences [3].

Coauthorship
Analyzing coauthorship requires identifying ground-truth authors. Current methods of author disambiguation fall into two classes: methods that use only the information of provided names (e.g. initial-based methods), and methods that require additional information (e.g. email addresses). The second class of methods is often hard to implement, due to the difficulty of collecting additional information. The dominant misidentification of initial-based methods is caused by merging two or more different authors into one. Hence, it deflates the number of unique authors, and inflates the size of the giant component relative to the ground truth. However, it does not much affect the distribution type of collaborators per author and that of papers per author [36].
The result in Section 5 is based on giant components, and would be inflated if initial-based methods were used. Hence we identify authors by the names provided on their papers; the effectiveness analysis of this method is given in the Appendix.
The splitting risk (treating an author who uses different names as different authors) of the adopted method is higher than that of initial-based methods. Hence the result obtained here is conservative.

Topic words
The contents of papers are analyzed to explore the deep reasons for interactions between disciplines, and for the formation of giant components in coauthorship networks. The Python package Natural Language Toolkit (NLTK, www.nltk.org) is used to extract specific words (nouns and the words whose synsets contain nouns) from the data. The words are called topic words if they can express topics. A high proportion of papers containing a topic word reflects, to some extent, the typicality of the topic. Over half the papers contain the topic words "model", "experiment", "data", "system" and "control", which suggests that the common research paradigms and transdisciplinary sciences (e.g. systems science) reflected by those words can be regarded as the grounds for the interactions. To confirm the universality and transdisciplinarity, we use statistical techniques (e.g. cointegration tests of time series) to analyze the relationships between the developmental trend of each science category and that of each considered research paradigm and transdiscipline. To this end, we extract the papers' publication time, and count the quarterly numbers of the papers (for each of the three science categories) containing specific words, which are listed in the Appendix.
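The quarterly counting step can be sketched as follows. This is a minimal illustration that uses only the Python standard library rather than the NLTK pipeline of the study; the record layout (`date`, `category`, `text`), the function names and the toy records are assumptions made for the example.

```python
import re
from collections import Counter
from datetime import date

TOPIC_WORDS = {"model", "experiment", "simulation", "data", "system", "network", "control"}


def quarter_of(d):
    """Return a (year, quarter) key for a publication date."""
    return (d.year, (d.month - 1) // 3 + 1)


def quarterly_counts(papers, topic_words=TOPIC_WORDS):
    """Count, per (category, quarter, word), the papers whose text contains the word.

    A paper is counted at most once per word, however often the word occurs.
    """
    counts = Counter()
    for paper in papers:
        tokens = set(re.findall(r"[a-z]+", paper["text"].lower()))
        for word in topic_words & tokens:
            counts[(paper["category"], quarter_of(paper["date"]), word)] += 1
    return counts


# Hypothetical toy records, not taken from the PNAS data.
papers = [
    {"date": date(2001, 2, 10), "category": "biological",
     "text": "A model of protein folding. The model fits the data."},
    {"date": date(2001, 3, 5), "category": "biological",
     "text": "Experiment on gene control."},
    {"date": date(2001, 5, 20), "category": "social",
     "text": "Network data from a survey."},
]
counts = quarterly_counts(papers)
print(counts[("biological", (2001, 1), "model")])
```

The sketch matches exact tokens only, so inflected forms such as "models" are not counted; how such forms were handled in the study is not stated here.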

Statistical properties of multidisciplinary coauthorship networks
Collaboration relationships can be expressed by a hypergraph, where nodes represent authors, and the authors of a paper (paper-team) form a hyperedge. The data show that the average paper-team-size of biological sciences (6.624) and that of physical sciences (5.254) are larger than that of social sciences (4.634). The size relation fits the reality that the sizes of research teams are usually larger in natural sciences, and smaller in social sciences [3].
A coauthorship network is extracted from a hypergraph as a simple graph, in which edges are formed between every two nodes in each hyperedge, and the multiple edges are treated as one. The technical terms "degree" and "hyperdegree" of nodes in theory of hypergraph are used to express the number of authors' collaborators and papers respectively.
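The extraction of a coauthorship network from paper-teams can be sketched as below. It is a minimal illustration in plain Python; the function name `project` and the toy paper-teams are assumptions made for the example, not part of the study.

```python
from collections import defaultdict
from itertools import combinations


def project(papers):
    """Build the coauthorship network (simple graph) from paper-teams (hyperedges).

    Returns (neighbors, hyperdegree): neighbors maps each author to the set of
    collaborators, hyperdegree maps each author to the number of papers.
    """
    neighbors = defaultdict(set)
    hyperdegree = defaultdict(int)
    for team in papers:
        authors = set(team)
        for a in authors:
            hyperdegree[a] += 1
            neighbors[a]  # touching the key keeps single-author papers as nodes
        for a, b in combinations(sorted(authors), 2):
            neighbors[a].add(b)  # sets merge repeated coauthorships into one edge
            neighbors[b].add(a)
    return neighbors, hyperdegree


# Hypothetical paper-teams.
papers = [["A", "B", "C"], ["A", "B"], ["C", "D"], ["E"]]
neighbors, hyperdegree = project(papers)
degree = {a: len(nbrs) for a, nbrs in neighbors.items()}
mean_team_size = sum(len(set(team)) for team in papers) / len(papers)
print(degree, dict(hyperdegree), mean_team_size)
```

In this toy example authors A and B coauthor twice, yet the edge between them is stored once, while their hyperdegree counts both papers.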
We consider the coauthorship networks of the papers in specific disciplines or science categories. All of the networks are highly clustered and assortative, and their average shortest path lengths scale as the logarithm of their numbers of nodes (Table 1). Those properties do not mean that all of the networks are small-world. The network of social sciences is an exception: it has no component containing more than 10% of its authors. However, this does not mean that research in social sciences is done in isolation. In fact, 71.5% of the authors in social sciences belong to the giant component of the network generated by the whole data. Therefore, analyzing the collaboration of authors restricted to a single discipline has limitations, and we carry out the analysis on all disciplines together.
Table 2. The regions "G-P", "C-O", "P-L" stand for generalized Poisson, cross-over and power-law respectively.

Distribution type of degrees and hyperdegrees
Counted over the whole data (not restricted to each science category), the shapes of the degree distributions for the authors in the three first-class disciplines are quite similar, and so are the shapes of the hyperdegree distributions (Fig. 2). Although collaboration level differs from one discipline to another, all of the distributions show a hook head, a fat tail, and a cross-over between them, which could be viewed as a common feature of coauthorship networks [39]. In the language of statistics, those distributions belong to one type: a mixture of generalized Poisson and power-law distributions. Regarding authors as samples, such a mixture distribution means that those samples come from different populations, namely the collaboration behaviors of authors with few collaborators and papers differ from those of authors with many collaborators and papers. In reality, authors are mainly teachers and students in institutes and universities, who can be viewed as two populations. Students on average write only a few papers with a few collaborators, whereas teachers do the opposite.
An illustration independent of disciplines is given in Reference [40] to explain the emergence of this type of degree distribution. With the same general ideas, a similar illustration can be adopted for hyperdegree distributions as follows. Whether a researcher collaborates with another to publish a paper can be regarded as a "yes/no" decision. So the hyperdegree of a researcher is equal to the number of successes in a sequence of decisions made by the candidates who want to coauthor with that researcher. Denote the number of those candidates by n. Suppose that the collaboration probability of each candidate is p, and that those "yes/no" decisions are independent. Then the hyperdegrees follow a binomial distribution B(n, p). The Poisson limit theorem shows that when n is large and p is small, B(n, p) can be approximated by a Poisson distribution with expected value np. The value of np varies from author to author, due to the diversity of authors' ability to attract collaborators.
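The quality of the Poisson approximation can be checked numerically. The sketch below compares the two probability mass functions for one hypothetical choice of n and p (these values are assumptions for illustration, not estimates from the data).

```python
from math import comb, exp, factorial


def binomial_pmf(k, n, p):
    """Probability of k successes in n independent yes/no decisions."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """Poisson probability of k events with expected value lam."""
    return exp(-lam) * lam**k / factorial(k)


n, p = 2000, 0.001  # hypothetical: many candidates, small collaboration probability
lam = n * p         # expected value np of the approximating Poisson distribution
max_gap = max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(21))
print(max_gap)
```

For these values the largest pointwise gap is bounded by 2np² = 0.004 (Le Cam's inequality), which shows why the binomial count of successful decisions is well described by a Poisson distribution with expected value np.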
In reality, the decisions of authors could be affected by previous decisions, e.g. collaborating with researchers who have publishing experience helps to publish another paper. Hence, it is reasonable to regard small hyperdegrees as random variables drawn from a generalized Poisson distribution, which allows the occurrence probability of an event to be affected by previous events [41]. For the researchers with large hyperdegrees, the numbers of their candidates are large enough that the "yes/no" decisions of candidates can be regarded as independent. So their hyperdegrees could be regarded as random variables drawn from a range of Poisson distributions with sufficiently large expected values. The diversity of those expected values guarantees that authors with large hyperdegree are relatively common, which appears as the fat tails of hyperdegree distributions.
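For reference, one widely used parameterization of the generalized Poisson distribution (due to Consul and Jain) is written below. It is given as an assumption about the form meant in the text; Reference [41] may use different symbols or a different parameterization.

```latex
\[
P(X = x) \;=\; \frac{\theta\,(\theta + \lambda x)^{x-1}\, e^{-(\theta + \lambda x)}}{x!},
\qquad x = 0, 1, 2, \dots, \quad \theta > 0, \quad 0 \le \lambda < 1,
\]
\[
\mathbb{E}[X] = \frac{\theta}{1-\lambda},
\qquad
\operatorname{Var}[X] = \frac{\theta}{(1-\lambda)^{3}} .
\]
```

In this form the parameter \lambda carries the dependence on previous events: \lambda = 0 recovers the ordinary Poisson distribution with expected value \theta, and \lambda > 0 makes the variance exceed the mean.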

Transitivity of coauthorship
Transitivity in society means that "the friend of my friend is also my friend", which is a typical feature of social affiliation networks. In academic society, the collaborators of an author are likely to become acquainted and so to coauthor with each other. Organizational and institutional contexts drive the formation of transitive coauthorship, and so contribute to the emergence of clusters and communities of authors.
The transitivity of a network can be quantified by two indexes in graph theory, namely global clustering coefficient (the fraction of connected triples of nodes which also form "triangles") and local clustering coefficient (the probability of a node's two neighbors connecting). High transitivity is a common feature of scientific collaboration networks [42], which can be reflected by the high values of global clustering coefficients in Table 3.
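Both indexes can be computed directly from an adjacency structure. The following is a minimal sketch in plain Python; the function names and the toy network are assumptions made for illustration.

```python
from itertools import combinations


def local_clustering(adj, v):
    """Fraction of pairs of v's neighbors that are themselves connected."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2 * links / (k * (k - 1))


def global_clustering(adj):
    """Fraction of connected triples (paths of length two) that are closed."""
    closed = 0
    triples = 0
    for v, nbrs in adj.items():
        k = len(nbrs)
        triples += k * (k - 1) // 2
        closed += sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return closed / triples if triples else 0.0


# Hypothetical network: a triangle A-B-C plus a pendant node D attached to C.
adj = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
print(local_clustering(adj, "C"), global_clustering(adj))
```

In the toy network, node C has three neighbors of which one pair is connected, so its local coefficient is 1/3; the graph has five connected triples, three of them closed by the single triangle, so the global coefficient is 0.6.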
To what extent is the transitivity due to the activity of authors in academic society? The activity can be partly reflected by the number of collaborators, namely degree. Hence, the extent can be sketched through the correlation coefficients between degree and local clustering coefficient. Note that a correlation coefficient indicates the extent of a linear relationship between two variables or their ranks. The coefficient of variables X and Y generally does not completely characterize their correlation, unless the conditional expected value of Y given X, denoted by E(Y|X), is a linear or approximately linear function of X. The conditional expected value of local clustering coefficient given degree is the average local clustering coefficient of k-degree nodes, denoted by CC(k). The approximately linear trends of CC(k) shown in Fig. 3 guarantee the effectiveness of the correlation analysis in Table 3.
The indicators in Table 3 are the local clustering coefficient (LCC), the local transitivity of collaboration (LTC), the average degree of node neighbors (DN), and the average hyperdegree of node neighbors (HN). We calculate the mean of those indicators over authors, and the Spearman rank correlation coefficient (SCC) and Pearson product-moment correlation coefficient (PCC) between each indicator and degree. For the two indicators with small PCC, we calculate their standard deviation (Std).
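The difference between the two coefficients can be seen on a small example. The sketch below implements both in plain Python and applies them to hypothetical values in which the indicator decreases monotonically, but not linearly, with degree; the data are invented for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for idx in order[i : j + 1]:
            result[idx] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: the Pearson coefficient of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical data: the indicator falls as 1/k with degree k.
degree = [1, 2, 3, 4, 5, 6]
cc = [1 / k for k in degree]
print(pearson(degree, cc), spearman(degree, cc))
```

Because the relationship is monotone, the rank coefficient equals -1, while the Pearson coefficient is weaker since the trend is not linear. This is why the approximately linear shape of CC(k) matters when interpreting the PCC values.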
Do the negative correlation coefficients, or equivalently the decreasing trends of CC(k), mean that activity depresses transitivity? A positive answer would be against common sense. In fact, they reflect the reality that the local clustering coefficients of students are larger than those of teachers on average. Many students leave their research teams after graduation, so students studying in different periods are unlikely to collaborate. Because of this turnover, the neighbors of a student are probably from the same period, and even from the same paper team, so students have high local clustering coefficients on average. The same turnover lowers the local clustering coefficients of teachers. Hence the puzzling result does not contradict common sense, but is due to the inadequacy of measuring transitivity, a dynamical property, by counting "triangles" on a static network.
To design a more reasonable index for measuring transitivity, we come back to the original meaning of transitivity in coauthorship: the probability that two collaborators of a researcher who have not coauthored with each other will coauthor in future. The probability can be calculated for dynamic hypergraphs of collaborations by using time information. Averaging the probability over authors measures the global transitivity, the value of which is quite low in each science category (Table 3). Note that the calculation is limited to the dataset, and transitivity may happen in other journals. In fact, 74.62% of the authors have only one paper in PNAS 1999-2013, and their local transitivity is zero. So the transitivity values here are underestimated.
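One way to compute such a probability from time-stamped paper-teams is sketched below. The precise definition used in the study is not given in this section, so the notion of an "open pair" in the docstring, the function name and the toy papers are assumptions made for illustration.

```python
from itertools import combinations


def local_transitivity(papers):
    """Return {author: fraction of the author's open pairs that coauthor later}.

    papers is a list of (time, authors) tuples. An open pair of author v is a
    pair of v's collaborators who had not yet coauthored with each other when
    both had become collaborators of v (a definition assumed for this sketch).
    """
    neighbors = {}  # author -> collaborators so far
    open_now = {}   # author -> open pairs that have not coauthored yet
    opened = {}     # author -> number of open pairs ever recorded
    closed = {}     # author -> number of open pairs that coauthored later

    for _, team in sorted(papers, key=lambda rec: rec[0]):
        team = set(team)
        for a in team:
            neighbors.setdefault(a, set())
            open_now.setdefault(a, set())
            opened.setdefault(a, 0)
            closed.setdefault(a, 0)
        # Step 1: this paper closes every open pair whose two members are on it.
        links = {frozenset(pair) for pair in combinations(team, 2)}
        for v, pairs in open_now.items():
            hit = pairs & links
            closed[v] += len(hit)
            pairs -= hit  # in-place removal from the stored set
        # Step 2: add the new coauthorship edges.
        for a, b in combinations(team, 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
        # Step 3: record newly created open pairs around each team member.
        for v in team:
            for a, b in combinations(neighbors[v], 2):
                pair = frozenset((a, b))
                if b not in neighbors[a] and pair not in open_now[v]:
                    open_now[v].add(pair)
                    opened[v] += 1
    return {v: (closed[v] / opened[v] if opened[v] else 0.0) for v in neighbors}


# Hypothetical papers: teacher T writes with S1, then S2; S1 and S2 coauthor later.
papers = [(1, ["T", "S1"]), (2, ["T", "S2"]), (3, ["S1", "S2"]), (4, ["T", "S3"])]
result = local_transitivity(papers)
global_transitivity = sum(result.values()) / len(result)
print(result["T"], global_transitivity)
```

In the toy example T has three open pairs, of which only (S1, S2) coauthors later, so T's value is 1/3. Authors with a single paper have no open pair and get zero, which mirrors the underestimation noted above.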
The increasing trends of the transitivity probability of k-degree nodes (TC(k) in Fig. 3) mean the neighbors of the authors with many collaborators tend to coauthor in future on average. It means the activity contributes to transitivity. It fits common sense: a teacher (usually with a large degree) is more likely to introduce two students (who haven't worked together) to collaborate.

Homophily in coauthorship
Coauthorship is based on specific features that researchers have in common, including interest, geography, academic reputation, etc. The homophily phenomenon appears in many social relations, and is called assortative mixing in network science [43]. Do authors prefer to coauthor with others who are similar in social activity or productivity in each science category? The social activity and productivity of authors can be quantified by two scalar indexes of nodes, namely degree and hyperdegree respectively. Then the preference with respect to an index can be sketched through the correlation coefficient between two variables, namely the index of nodes and the average index of each node's neighbors. Positive correlation means assortative, negative disassortative, and zero no preference.
Degree assortativity is a feature of scientific collaboration networks [43]. It means that sociable researchers (with many collaborators) preferentially coauthor with other sociable researchers, and unsociable researchers with unsociable ones. Hence one can expect a core of sociable authors who tend to stick together, surrounded by a less dense periphery of less sociable authors. Such a core has emerged in the data. Among the top 5.99% most sociable authors (measured by degree), 99.5% have coauthored with another such author [44]. The proportion may even be underestimated, because those authors probably coauthored before 1999 or in other situations. Note that the splitting and merging errors of the name disambiguation method used affect the proportion to some extent. Even so, the proportion is still remarkable.
Fig. 3 The relation between degree and specific indicators. For k = 1 to max(degree), specific indicators are averaged over the authors with degree k, viz. local clustering coefficient (CC(k)), local transitivity of collaboration (TC(k)), the average degree of node neighbors (DN(k)), the average hyperdegree of node neighbors (HN(k)). The data are binned on abscissa axes to extract the trends hidden in noise.
If sociable researchers preferentially coauthored with sociable ones, there would exist many sociable researchers, which is against the empirical data. Now we analyze the influence of the social activity of authors on degree assortativity. For the nodes with degree k, denote the average degree of their neighbors by DN(k). There exists a transition in DN(k) of each empirical dataset: the head part has a clear increasing trend, but the tail part does not (Fig. 3). It means that degree assortativity is mainly contributed by the authors with small degree.
Actually, the transition fits common sense. Consider the research team of a teacher. Members of small research teams are likely to write papers together on average. Hence the authors in a small research team may have similar degrees. As the cumulative size of the research team increases over time, the degrees of students, on average, do not increase with the degree of their teacher, which leads to the non-positive and positive slopes of DN(k) in large and small k regions respectively.
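The quantity DN(k) discussed above can be computed as in the following sketch. It is a plain-Python illustration without the binning used for Fig. 3; the function name and the toy network (a teacher with three students) are assumptions.

```python
from collections import defaultdict


def average_neighbor_degree_by_k(adj):
    """Return {k: DN(k)}: the mean, over k-degree nodes, of their neighbors' mean degree."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    per_k = defaultdict(list)
    for v, nbrs in adj.items():
        if nbrs:  # isolated nodes have no neighbors to average over
            per_k[degree[v]].append(sum(degree[u] for u in nbrs) / len(nbrs))
    return {k: sum(vals) / len(vals) for k, vals in sorted(per_k.items())}


# Hypothetical team: teacher H with three students, two of whom also coauthor.
adj = {
    "H": {"S1", "S2", "S3"},
    "S1": {"H", "S2"},
    "S2": {"H", "S1"},
    "S3": {"H"},
}
dn = average_neighbor_degree_by_k(adj)
print(dn)
```

In this toy team the teacher's neighbors have a lower average degree than the students' neighbors, a small-scale analogue of the non-increasing tail of DN(k).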
The correlation coefficient between hyperdegree and the average hyperdegree of neighbors is around zero in each of the three science categories (Table 3). It means that the choice of collaborators is free of the productivity factor. For the nodes with hyperdegree k, denote the average hyperdegree of their neighbors by HN(k). In reality, members of a research team may have various scientific ages (newcomers, incumbents), and so different hyperdegrees. Since collaborations mainly happen within a research team, the collaborators of an author could have various hyperdegrees, which appears as the stable trend of HN(k).
Since the average value of HN(k) is larger than 2, and 74.62% of the authors have only one paper in the data, we infer that most newly appearing authors of the data (which does not mean that the authors had not published papers in PNAS before 1999) collaborate with at least one author who has published a paper in the data. The proportions of those authors are 79.22%, 71.17% and 65.12% in biological, physical and social sciences respectively.

Interdisciplinarity of disciplines
In the data, 58.1% and 48.2% of the papers of social and physical sciences respectively belong to interdisciplinary research (Fig. 4). Only 7.4% of the papers of biological sciences belong to interdisciplinary research, but they make up 40.8% of the interdisciplinary papers. Papers of biophysics (which belong to biological sciences) make up 37.1% of the interdisciplinary papers, so 90.9% of the interdisciplinary papers of biological sciences belong to biophysics. Those interactions are between biological and physical sciences, namely they happen within natural sciences (Fig. 5). In addition, 22.5% of the interdisciplinary papers lie between natural and social sciences.
The discipline information can be used to classify authors into science categories: if one of his/her papers belongs to a discipline, an author can be classified into the discipline, and so into the corresponding science category. The giant component of the coauthorship network PNAS 1999-2013 contains more than 86.8% of the authors. It contains 71.5%, 76.7% and 88.9% of the authors of social, physical and biological sciences respectively. Note that 85.9% of the authors of interdisciplinary papers belong to the giant component (a higher share than that of the physical sciences as a whole), which is underestimated due to the data boundary. The proportions of authors who published interdisciplinary papers are 49.2%, 46.0% and 7.3% in social, physical and biological sciences respectively. The indicator is statistically significantly high in social sciences, which undermines the common belief that social scientists do research in isolation. In fact, there has been a move towards increased interdisciplinarity in recent decades in social sciences, especially in information science and library science [45].
It seems that the above analysis could be applied to second-class disciplines to obtain a higher-resolution result. However, since some disciplines have only a few papers, e.g. 17 papers in political science, the analysis would lose statistical meaning. Here we provide a bird's-eye view of the interactions between second-class disciplines (Fig. 6), where two disciplines are connected if there is a paper belonging to both of them. As the view shows, no discipline is isolated. More details of the view are presented in our previous work [38].
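The share of each category's authors inside the giant component can be computed as sketched below. The breadth-first search, the function names and the toy network are assumptions made for illustration; as in the classification rule above, an author may belong to more than one category.

```python
from collections import deque


def components(adj):
    """Return the connected components of an undirected graph as a list of sets."""
    seen = set()
    result = []
    for start in adj:
        if start in seen:
            continue
        comp = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for u in adj[v]:
                if u not in seen:
                    seen.add(u)
                    comp.add(u)
                    queue.append(u)
        result.append(comp)
    return result


def share_in_giant(adj, category_of):
    """Per category, the fraction of its authors inside the largest component."""
    giant = max(components(adj), key=len)
    totals, inside = {}, {}
    for author, cats in category_of.items():
        for c in cats:
            totals[c] = totals.get(c, 0) + 1
            inside[c] = inside.get(c, 0) + (author in giant)
    return {c: inside[c] / totals[c] for c in totals}


# Hypothetical network: chain A-B-C, pair D-E, isolated author F.
adj = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}, "D": {"E"}, "E": {"D"}, "F": set()}
category_of = {
    "A": {"biological"}, "B": {"biological", "physical"}, "C": {"physical"},
    "D": {"social"}, "E": {"social"}, "F": {"biological"},
}
shares = share_in_giant(adj, category_of)
print(shares)
```

In the toy data the two social scientists form their own small component, so their share in the giant component is zero even though they do collaborate.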

Which authors and papers tend to involve interdisciplinary research?
The data show that in each science category, an author with more collaborators is more likely to engage in interdisciplinary research (Fig. 7a). The paper productivity of authors also has a significantly positive effect on the formation of interdisciplinary collaborations (Fig. 7b). The probability that a paper is an interdisciplinary one is an increasing function of paper team size in physical and social sciences, but not in biological sciences (Fig. 7c). However, in all of the science categories, papers with a very large paper team size almost always involve interdisciplinary research.
The hub authors of the data neither concentrate on high-degree nodes, nor distribute uniformly according to degree. This phenomenon differs from those of the ER graph and of the network generated by the BA model (Fig. 8a). In this sense, the model in Reference [40] fits this phenomenon well. Authors with a comparably large hyperdegree are also responsible for the existence of giant components (Fig. 8b). Hence a considerable number of authors, not just the authors with numerous collaborators and papers, give coauthorship networks the small-world property. It means that interdisciplinary research is widely performed, and can be viewed as a reason for the emergence of the giant component of authors in all disciplines.
Fig. 7 Panels (a,b) show that the more collaborators and papers an author has, the more likely the author does interdisciplinary research. Panel (c) shows the relationship between the probability of a paper to be an interdisciplinary one and the number of the paper's authors.
Fig. 8 In panels (a,b), nodes are removed from high degree and hyperdegree to low respectively. In the BA model, each new node connects to one old node. In the ER graph, the connecting probability is 0.3. In both models, the number of nodes is 10,000. The parameter of the geometric model is that of the modeled network 1 in Reference [40].
The above view cannot be verified at the level of papers, because after removing the edges generated by interdisciplinary papers, the coauthorship network PNAS 1999-2013 still has a giant component. The reason is that some interdisciplinary authors acting as hubs also publish papers in a single discipline, so those hubs cannot be deleted from the network by removing interdisciplinary papers.

Exploring the grounds of interdisciplinary research
With the development of sciences, disciplines tend to fragment: they split into sub-disciplines and specific topics. Although the research objects are different, the research paradigms are shared, and can be grouped into four categories, namely theoretical research (e.g. modelling), experiment, simulation, and data-driven research [46].
Many scientific problems are too complex to be understood through the methodology of a single discipline. Integrating theoretical and methodological perspectives drawn from different disciplines creates a unified methodology for research problems, and even a unified vocabulary for presenting the concepts of specific disciplines [47]. The integration drives the formation of transdisciplinary disciplines [48]. For example, systems science, as a methodology, studies systems from simple to complex, and from natural to social sciences. Complexity science, which arose in the 1980s, is a new stage in the development of systems science.
Common research paradigms and methodologies, especially those integrated as transdisciplinary disciplines, give grounds for the interactions between science categories and for the formation of giant components in coauthorship networks. To validate the universality of those paradigms and methodologies in the three science categories, we analyze the contents of papers at the level of words. A paper containing a topic word is taken to mean that the corresponding topic is utilized or discussed by that paper. We choose "model", "experiment", "simulation" and "data" to represent the four basic research paradigms, and choose "system", "network" and "control" to represent three typical topics of systems science.
We extract the quarterly numbers of papers containing those words in each of the three science categories, which illustrate the developmental trends of the topics expressed by the selected words in each category (Fig. 9). Those time series are non-stationary, but integrated of order one (Table 4). Within each science category, we test the cointegration between the quarterly number of papers containing each word and the quarterly number of all papers in the category. The Johansen test shows that all tested pairs are cointegrated (Table 4). The results demonstrate that the development of each science category and that of each research paradigm or transdisciplinary topic obey a long-run equilibrium relationship.
Fig. 9 The proportions of papers containing specific topic words in each year. The words are "system", "network", "data" etc.
Over half the papers contain "system" and "control". The proportions of papers containing "network" show an increasing trend in all categories of sciences (Table 5). The high proportions also verify the transdisciplinarity of systems science. To understand the deep reason for the transdisciplinarity, we must fall back on the definitions and functions of systems science. A system is composed of several parts, and has specific structure and functions. Systems broadly exist in nature and society, such as aerospace systems, ecosystems, commercial systems, etc. The core view of studying systems is "the whole is greater than the sum of the parts" (an immortal line of Aristotle). The aim of studying systems is to reveal their operation rules, and to understand their macro behavior, etc. To achieve this goal, the relations between parts need to be analyzed deeply. Without regard for the functionalities and features of parts, systems can be abstracted as networks. Therefore, since the beginning of the 21st century, the rapid development of research on networks (models, algorithms, etc.) has bred a new discipline, namely network science. Some researchers from biological, physical and social fields investigate their respective problems under the network framework. For example, social network analysis has become an important research technique in social sciences. Since our understanding of natural and social systems is reflected in our ability to control them, control theory (cybernetics) has a distinctly transdisciplinary mission: to provide theories and approaches for comprehending complex systems [49].

Conclusion
The case study on PNAS 1999-2013 verifies the similar transitivity and assortativity of collaboration behaviors in the biological, physical and social sciences. The data demonstrate that the degree distributions of the three science categories are of the same type, namely a mixture of generalized Poisson and power-law distributions. The property also holds for hyperdegree. An illustration shows that this type can be generated through authors' "yes/no" decisions for collaborations and the diversity of authors' ability to attract collaborations.
The data show that a considerable number of authors pursue interdisciplinary research. The interactions between disciplines are often regarded as a reason for the formation of giant components in coauthorship networks. Meanwhile, the data show that this formation does not depend only on the authors with many collaborators and papers, but relies on a considerable number of authors. Therefore, the interactions can be considered to have developed widely.
The data show that a substantial fraction of papers involve four typical research paradigms and three transdisciplinary topics in systems science. The development of each science category and that of each of the considered research paradigms and transdisciplinary topics obey a long-run equilibrium relationship. Based on those statistical analyses, we can regard the universal research paradigms and transdisciplinary disciplines as grounds for the interactions.
The case study potentially provides a window for understanding aspects of multidisciplinary collaboration modes, owing to the importance of PNAS in multidisciplinary research and to the accurate discipline information it provides for papers. The selection of data might affect the details of our research findings: our results should not be interpreted as describing the behaviors of all researchers. Some aspects indicate the need for further research: analysis of the interactions between sciences using citation information; relationships between sciences or disciplines at the level of topics; and prediction of interdisciplinary fields via the trends of collaboration strength.

Author name disambiguation
Current methods of author disambiguation can be classified into two classes: methods using only the information in the provided names, and methods requiring additional information. There are two traditional methods in the first class, namely using the surname and the initial of the first name token, and using the surname and the initials of all name tokens. S. Milojević provided a hybrid method, which uses the surname and the initial of the first name token; however, if a name with a given first initial matches two or more names with different initials of the second name token, then all of those names are treated as different authors [50].
The misidentification caused by initial-based methods arises mainly from merging different authors into one [36]. In PNAS 1999-2013, 93.1% of authors provide a full first name, so we use the name tokens per se, rather than their initials, to identify authors. However, this can still produce merging errors if different authors provide exactly the same name. Chinese names were found to account for the repetition of names. We count the number and appearance frequency of names with a given name of fewer than six characters and a surname among the 100 major Chinese surnames. Note that people with those surnames account for 84.77% of the total population, where the statistical data come from Wikipedia. The small proportion of such authors in the data (2.7%), and especially that of such authors publishing more than one paper (1.1%), makes us quite confident that the impact of name repetition is limited. The details of the analysis for each science category and for specific disciplines are listed in Table 6. The indicators are the numbers of authors identified by first initial (FI), hybrid method (HI), all initials (AI) and provided name (PN); the percentage of authors providing only a first initial and surname, and that of those authors with more than one paper (a, b); the percentage of authors providing a full first name (c) and providing middle names (d); and the percentage of authors with one of the 100 major Chinese surnames and a given name of fewer than six characters, and that of those authors with more than one paper (e, f).
The method adopted here will split one author into two or more if the author does not provide his/her name consistently. Splitting underestimates the indexes used as evidence for the universality of interdisciplinary research. Meanwhile, the misidentification caused by initial-based methods arises mainly from merging different authors into one, which makes the initial-based methods unsuitable for analyzing the most productive authors and the size of the largest component. Wrong merging will overestimate the indexes in Fig. 4. We prefer a conservative result to a wrong one, so we use the provided names to identify authors. In addition, the inaccuracy caused by the adopted method does not much affect the distribution type of collaborators per author or that of papers per author, which we will show elsewhere.
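The name-only identification schemes can be illustrated with a short sketch that builds an author key from the first initial, from all initials, or from the provided name. The helper names, the assumption that the surname is the last whitespace-separated token, and the toy name list are all hypothetical; the hybrid method of [50] is not implemented here.

```python
def split_name(full_name):
    """Split 'Given Middle Surname' into (given tokens, surname).

    Assumes the surname is the last token; periods after initials are dropped.
    """
    tokens = full_name.replace(".", " ").split()
    return tokens[:-1], tokens[-1]


def first_initial_key(name):
    """Surname plus the initial of the first name token."""
    given, surname = split_name(name)
    return (surname.lower(), given[0][0].lower() if given else "")


def all_initials_key(name):
    """Surname plus the initials of all given-name tokens."""
    given, surname = split_name(name)
    return (surname.lower(), "".join(t[0].lower() for t in given))


def provided_name_key(name):
    """Surname plus the given-name tokens exactly as provided (case-folded)."""
    given, surname = split_name(name)
    return (surname.lower(), " ".join(t.lower() for t in given))


def count_authors(names, key):
    """Number of distinct authors under a given identification scheme."""
    return len({key(n) for n in names})


# Invented names, used only to show how the schemes merge or split records.
names = ["John A. Smith", "Jane B. Smith", "John Smith", "J. A. Smith"]
n_fi = count_authors(names, first_initial_key)   # merges all four records
n_ai = count_authors(names, all_initials_key)
n_pn = count_authors(names, provided_name_key)   # keeps all four apart
```

On this toy list the first-initial key yields the fewest authors and the provided-name key the most, which mirrors the merging tendency of initial-based keys and the splitting tendency of provided names.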

6.2 The geometric hypergraph model of collaborations in Reference [40]
Denote the probability density functions of the generalized Poisson and power-law distributions by $f_1(x) = a(a + bx)^{x-1} e^{-a-bx}/x!$ and $f_2(x) = cx^{-d}$ respectively, where $a, b, c, d \in \mathbb{R}^+$, and $x$ belongs to some subsets of $\mathbb{Z}^+$. Generate random variables of an $f(x)$ with head $f_1(x)$ and tail $f_2(x)$ by sampling random variables of $f_1(x)$ and $f_2(x)$ with probability $q$ and $1 - q$ respectively. The modelled hypergraph is built on a cluster of concentric circles $S^1_t$, $t = 1, 2, \dots, T$ ($T \in \mathbb{Z}^+$) as follows:

1. Coordinate and influential zone assignment
For time $t = 1, 2, \dots, T$ do: Sprinkle $N_1$ nodes uniformly and randomly on $S^1_t$. Identify each node, e.g. $i$, by its coordinates $(\theta_i, t_i)$, where $t_i$ is the generating time of $i$; Select $N_2$ new nodes randomly as lead nodes to attach specific zones: the zone of a lead node, e.g. $j$, is an interval of angular coordinate with center $\theta_j$ and length $\alpha t_j^{-\beta} t^{\beta-1}$, where $\alpha \in \mathbb{R}^+$, and $\beta \in [0.5, 1]$;

2. Connection rules
For time $t = 1, 2, \dots, T$ do:
(a) For each new node $i$, search the existing lead nodes whose zones cover $i$. For each such lead node $j$, generate a hyperedge with size $m$ by grouping together $i$, $j$ and $m - 2$ neighbors of $j$ nearest to $i$, where $m$ is a random variable drawn from a given $f(x)$ or the number of $j$'s neighbors plus two if the former is larger than the latter.
(b) Select $N_3$ existing nodes with non-zero degree randomly. For each selected node $l$, generate a hyperedge by grouping together $l$ and $m - 1$ randomly selected nodes with the same degree of $l$, where $m$ is a random variable drawn from a given $f(x)$ or the number of nodes with the same degree of $l$ if the former is larger than the latter.
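Some building blocks of the model above can be sketched in code: drawing a hyperedge size from the head/tail mixture, capping it as in rule (a), and checking whether a lead node's zone covers a new node. This is only a sketch under stated assumptions, not the implementation of Reference [40]: the finite supports chosen for the head and tail, the numerical normalization over those supports, all parameter values, and the function names are hypothetical.

```python
import math
import random


def generalized_poisson_pmf(x, a, b):
    """f1(x) = a (a + b x)^(x-1) e^(-a - b x) / x!, computed in log form."""
    log_p = (math.log(a) + (x - 1) * math.log(a + b * x)
             - a - b * x - math.lgamma(x + 1))
    return math.exp(log_p)


def make_sampler(support, weight):
    """Sampler over a finite support; random.choices normalizes the weights."""
    xs = list(support)
    ws = [weight(x) for x in xs]
    return lambda rng: rng.choices(xs, weights=ws, k=1)[0]


def sample_mixture(rng, q, head_sampler, tail_sampler):
    """Draw from the head with probability q, otherwise from the tail."""
    return head_sampler(rng) if rng.random() < q else tail_sampler(rng)


def hyperedge_size(draw, n_neighbors):
    """Rule (a): use the draw unless it exceeds the number of neighbors plus two."""
    return min(draw, n_neighbors + 2)


def zone_length(alpha, beta, t_j, t):
    """Angular length alpha * t_j^(-beta) * t^(beta - 1) of a lead node's zone."""
    return alpha * t_j ** (-beta) * t ** (beta - 1)


def zone_covers(theta_i, theta_j, length):
    """True if angle theta_i lies in the interval of the given length centered at theta_j."""
    diff = abs(theta_i - theta_j) % (2 * math.pi)
    diff = min(diff, 2 * math.pi - diff)  # shortest way around the circle
    return diff <= length / 2


# Hypothetical parameters: head on {1..10}, tail on {11..100}, q = 0.9.
rng = random.Random(0)
head = make_sampler(range(1, 11), lambda x: generalized_poisson_pmf(x, 2.0, 0.3))
tail = make_sampler(range(11, 101), lambda x: x ** -3.0)
draws = [sample_mixture(rng, 0.9, head, tail) for _ in range(1000)]
```

Restricting each component to a finite support keeps the sampler simple; a different cut-off between head and tail would change the shape of the mixture but not the sampling logic.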