We aim to assess the novelty of technological constituents in each patent, and then compare aspects of this novelty to the patent’s impact. We measure how technology codes are combined in the empirical data and compare the observed combinations to what would be expected if the combinations were randomly configured. In this way, we can discern recurring themes within invention space and also those combinations that are unconventional or novel. These features, measured by well-established standard scores, are then related to the patents’ future impact.

### 3.1 Standard scores (z-scores) of code pairs

The patent data *P* can be represented by a collection of sets of classification codes, where each set corresponds to an individual patent and contains its classification codes. The z-score for a pair of codes, *α* and *β*, is expressed as:

$$ z_{\alpha\beta} = \frac{o_{\alpha\beta} - \mu_{\alpha\beta }}{\sigma _{\alpha\beta}}, $$

(1)

where \(o_{\alpha\beta}\) is the observed number of times the codes *α* and *β* appear together within a patent (a set) in the actual data. \(\mu_{\alpha \beta}\) and \(\sigma_{\alpha\beta}\) are the expected number of co-occurrences of the codes and its standard deviation, derived from a null model which randomises the arrangement of codes while preserving each code’s usage and the number of patents in the data (Section 3.2 provides the details).

The observed co-occurrence count \(o_{\alpha\beta}\) in the patent record is compared with \(\mu_{\alpha\beta}\). If the two codes appear together more often than expected, Eq. (1) yields a positive value; if they are rarely paired within a patent relative to their expected co-occurrences, their z-score is negative. The significance of the deviation is obtained by normalising it by the expected standard deviation *σ* [8, 9, 34, 35]. We can thus associate high z-scores with very typical code pairings, and conversely, a negative z-score indicates an atypical, or novel, pairing of codes.

### 3.2 Expected co-occurrences

The null model acts as the baseline against which we deem an aspect of the data statistically significant, beyond what would occur by chance with no underlying pattern or law. The aspect under consideration is the arrangement of the codes among the patents. The premise of the null model is that every such arrangement of codes is equally likely. From these possible arrangements the expected pairing counts can be computed.

Consider codes *α* and *β*, occurring \(n_{\alpha}\) and \(n_{\beta}\) times within the set of patents *P*. Noting that a patent cannot be classified with the same code twice, the number of possible configurations of the *α* and *β* into the \(\vert P\vert \) possible patents is \(\binom{\vert P\vert }{n_{\alpha}} \binom{\vert P\vert }{n_{\beta}}\). Now consider those arrangements which contain exactly *x* co-occurrences, within a patent, of *α* and *β*. There are \(\binom{\vert P\vert }{n_{\alpha}} \binom{n_{\alpha}}{x} \binom {\vert P\vert -n_{\alpha }}{n_{\beta}-x}\) such configurations: first distribute the *α* into the \(\vert P\vert \) patents, then *x* of the *β* into those \(n_{\alpha}\) patents already assigned an *α*, and finally distribute the remaining *β* into the patents without an *α*. This gives a hypergeometric probability distribution for the number of co-occurrences:

$$ p(o_{\alpha\beta}=x) = \frac{ \binom{n_{\alpha}}{x} \binom {\vert P\vert -n_{\alpha }}{n_{\beta}-x} }{ \binom{\vert P\vert }{n_{\beta}} } $$

(2)

thus the expected number of patents that have both *α* and *β* is:

$$ \mu_{\alpha\beta} = \frac{n_{\alpha}n_{\beta}}{ \vert P\vert } $$

(3)

and the variance of the co-occurrence count is:

$$ \sigma^{2}_{\alpha\beta} = \mu_{\alpha\beta} \biggl(1- \frac {n_{\alpha }}{\vert P\vert } \biggr) \biggl(\frac{\vert P\vert -n_{\beta}}{ \vert P\vert -1} \biggr). $$

(4)
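As a numerical sketch (with made-up counts, not data from the paper), the closed forms of Eqs. (3) and (4) can be checked against the moments of the hypergeometric distribution of Eq. (2) computed by direct summation:

```python
from math import comb

# Illustrative counts (hypothetical): |P| patents, code alpha used n_a times,
# code beta used n_b times.
P, n_a, n_b = 1000, 30, 50

# Eq. (2): hypergeometric probability of exactly x co-occurrences.
def p_co(x):
    return comb(n_a, x) * comb(P - n_a, n_b - x) / comb(P, n_b)

# Moments by direct summation over the support of x.
support = range(0, min(n_a, n_b) + 1)
mean = sum(x * p_co(x) for x in support)
var = sum((x - mean) ** 2 * p_co(x) for x in support)

# Closed forms of Eq. (3) and Eq. (4) agree with the summed moments.
mu = n_a * n_b / P
sigma2 = mu * (1 - n_a / P) * ((P - n_b) / (P - 1))
print(mu, sigma2)
```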

### 3.3 Incorporating temporal evolution

As new technologies become successful, they may subsequently become established areas of inventive activity. Following z-scores through time allows us to observe cases where an invention was exceptionally novel at its time of creation, but its novelty ‘washes out’ as many similar inventions subsequently follow it over time.

To capture this time dependence we consider z-scores specific to time-ordered subsets of the entire data. We choose cumulatively increasing subsets in yearly steps, letting \(P(t)\) be the sub-collection of patents in *P* up to the year *t*. So \(P(2000)\) contains all patents issued up to the year 2000, and the z-scores calculated using this set are specific to that year. Thus for a given year, the newly added patents’ z-scores are determined by all the patents that precede them, while the older patents’ z-scores continue to evolve and change based on subsequently issued inventions.
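The cumulative construction can be sketched as follows, using a hypothetical mini-record of (year, code set) patents; the pair counts entering Eq. (1) grow year by year as \(P(t)\) expands:

```python
from collections import Counter
from itertools import combinations

# Hypothetical mini-record: (grant year, set of classification codes) per patent.
patents = [
    (1998, {"A", "C"}), (1999, {"A", "C"}), (1999, {"B", "E"}),
    (2000, {"A", "B"}), (2001, {"A", "C"}), (2001, {"B", "E"}),
]

def cooccurrences(t):
    """Pair counts over the cumulative subset P(t): all patents up to year t."""
    counts = Counter()
    for year, codes in patents:
        if year <= t:
            counts.update(frozenset(pair) for pair in combinations(sorted(codes), 2))
    return counts

# The same pair accumulates co-occurrences as the record grows, so its
# z-score is recomputed against each year's cumulative statistics.
print(cooccurrences(1999)[frozenset({"A", "C"})])  # → 2
print(cooccurrences(2001)[frozenset({"A", "C"})])  # → 3
```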

### 3.4 A schematic for three cases: atypical, typical and neutral

We provide a schematic to aid understanding of how the atypicality embedded in code combinations is captured and expressed by the z-score measure.

Suppose a collection of 40 inventions \(P(t)\) at time *t*, indexed by order of entry *i*. Each invention is expressed as a combination of codes:

$$\begin{aligned} P (t) ={} & \{P_{1}, P_{2}, P_{3}, \ldots, P_{40}\} \\ = {}& \bigl\{ \{A,C\}\times19, \{C,D\}, \{B,E,F\} \times4, \\ &{} \{B, E\} \times6, \{E, F\} \times5, \{B,F\} \times5 \bigr\} . \end{aligned}$$

Figure 1 illustrates the collection of patents at *t*, \(P(t)\), represented as a network structure where pairwise combinations are represented as weighted links (solid lines). We then consider three cases where a new patent \(P(t+1)\) arrives with two codes, that is, (i) \(P(t+1)-P(t) = \Delta P(t) \supset\{A,B\}\), denoted as a black dashed line, (ii) \(\Delta P(t) \supset\{B,E\}\), as a red dashed line, or (iii) \(\Delta P(t) \supset\{A,D\}\), as a green dashed line. Simply put, links are solid when they are present at time *t*, and dashed when they are added at time \(t+1\).

In case (i), the link *a* bridges the two most frequently used codes, *A* and *B*, which have nevertheless never been combined up to time *t*. We therefore find the appearance of link *a* *atypical* given the current statistics, and naturally expect a *negative* z-score. Indeed, the calculated \(z_{a}\) is negative, −4.3, with \(\mu_{AB} = 7.8\) and \(\sigma_{AB}=1.6\) in Eq. (1). Note that the frequencies of *A* and *B* are, respectively, \(n_{A}(t+1) = 20 \) and \(n_{B}(t+1) = 16\). On the other hand, the link *b* reinforces an existing pair that is already well connected, or established, hence a *convention*, yielding a positive z-score of 3. This indicates that the codes are combined three standard deviations more often than expected under the random configurations. Finally, the link *c* yields a statistically *neutral* pair, with a z-score around zero, indicating that its occurrence is indistinguishable from the random configurations.
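The three cases above can be reproduced directly from Eqs. (1), (3) and (4). The sketch below (a plain-Python illustration, not the authors’ implementation) encodes the 40 schematic inventions and evaluates each candidate pair:

```python
from math import sqrt

# P(t): the 40 schematic inventions of Section 3.4, as sets of codes.
base = ([{"A", "C"}] * 19 + [{"C", "D"}] + [{"B", "E", "F"}] * 4
        + [{"B", "E"}] * 6 + [{"E", "F"}] * 5 + [{"B", "F"}] * 5)

def z_score(patents, a, b):
    """Eq. (1) with the hypergeometric moments of Eqs. (3)-(4)."""
    n = len(patents)
    n_a = sum(a in p for p in patents)
    n_b = sum(b in p for p in patents)
    o = sum(a in p and b in p for p in patents)
    mu = n_a * n_b / n
    sigma = sqrt(mu * (1 - n_a / n) * ((n - n_b) / (n - 1)))
    return (o - mu) / sigma

# Each case adds one new two-code patent at t+1 and evaluates its pair.
z_a = z_score(base + [{"A", "B"}], "A", "B")  # atypical: ~ -4.3
z_b = z_score(base + [{"B", "E"}], "B", "E")  # conventional: ~ +3
z_c = z_score(base + [{"A", "D"}], "A", "D")  # neutral: ~ 0
print(round(z_a, 1), round(z_b, 1), round(z_c, 2))
```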

This method was employed successfully by Uzzi *et al.* [8] for academic paper citations rather than classification codes for inventions. It is worth noting that technology code combinations in patent records carry different implications than citation relationships do: technology codes specify the discrete units of technological capability used to create an output, so every code represents a definite constituent of the whole. In contrast, a paper citation can imply a much broader range of relations, for instance direct reliance on the previous work, correspondence with previous results, criticism of flaws in the cited paper, or parallel methods applied to a different subject. Thus, when it comes to an invention, we believe codes more accurately capture the parts of that invention.

### 3.5 Coarse-graining over classification codes

The classification codes are organised in a nested structure (a hierarchical tree). There are 473 classes at the highest level, 16,087 subclasses at the first level down, and 168,743 codes at the deepest and most detailed level. The codes can be extremely detailed in their content, making results pertaining to specific codes very narrow in scope, or they can be quite broad. For example, the class 257, ‘ACTIVE SOLID-STATE DEVICES’, reaches the greatest depth of 16, ending in codes such as ‘Floating gate layer used for peripheral FET (EPO)’ and ‘Floating gate dielectric layer used for peripheral FET’, while the class 245, ‘Wire fabrics and structure’, reaches a depth of only 2, ending in ‘Chain’ and ‘Coil’. As these examples show, the level of differentiation between two codes in a class can be qualitatively different depending on the depth and the class.

In addition, for a code pairing to appear novel the codes must have been used often enough, relative to the number of patents, for the null model to predict a high number of co-occurrences by chance; it is then the relative lack of co-occurrences in the actual data that signifies a novel combination. Individual code usages (or frequencies) are far smaller than the total number of patents up to *t*, \(\vert P(t)\vert \). This means that the expected co-occurrence value \(\mu_{\alpha\beta}\) between two different codes *α* and *β* becomes vanishingly small: \(\mu_{\alpha\beta}\ll1\). If two codes do co-occur, then by definition \(o_{\alpha\beta}\geq1\), giving a positive z-score. Hence using fine-grained codes yields almost entirely positive z-scores, and we cannot identify novel combinations.
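A quick numerical illustration of this effect (with hypothetical usage counts) shows why the fine-grained regime forces positive z-scores:

```python
from math import sqrt

# Hypothetical fine-grained regime: half a million patents, each detailed
# code used only a handful of times.
P, n_a, n_b = 500_000, 4, 6

mu = n_a * n_b / P                                        # Eq. (3): << 1
sigma = sqrt(mu * (1 - n_a / P) * ((P - n_b) / (P - 1)))  # Eq. (4)

# A single observed co-occurrence already exceeds the expectation by many
# standard deviations, so the z-score is large and positive by construction.
z = (1 - mu) / sigma
print(mu, z)
```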

To create a consistent level of detail in the analysis, and gain broader and more intelligible insights, we coarse-grain over the codes. This method is also employed by Uzzi *et al.* [8], who coarse-grain over individual papers up to the journal level. We look at pairings at the highest and second-highest levels of the code hierarchy [36].

We consider each patent as a combination of classes, the highest level. We also consider it as a combination of subclasses, i.e. one level below the class level, to gain insight into a slightly more fine-grained technology space, as well as for comparison with the class-level results. At the code level a patent is never assigned the same code twice, so no self-pairing is possible. This is not the case for classes and subclasses, as multiple codes from the same class are often assigned to the same patent.

Let \(\alpha_{i}\) and \(\beta_{j}\) each represent a code, where *α* and *β* denote the class and *i* and *j* represent the rest of the code, specifying the low-level detail. Rather than coarse-grain over the data before computing the resulting statistics, we first compute \(\mu_{\alpha_{i}\beta_{j}}\) and \(o_{\alpha_{i}\beta_{j}}\) for the most detailed structure at the code level, and only then coarse-grain these values up to the higher levels of subclass and class pairings. The observed co-occurrences \(o_{\alpha\beta}\) and expected co-occurrences \(\mu_{\alpha\beta}\) of a class/subclass pair are calculated by summing over all code pairings that result in the considered class/subclass pairing. Hence for the observed co-occurrences:

$$ o_{\alpha\beta}=\biggl(1-\frac{\delta_{\alpha\beta}}{2}\biggr)\sum _{i,j}o_{\alpha _{i}\beta_{j}}, $$

(5)

where the bracketed term accounts for double counting when the classes considered are the same. Similarly for the expected co-occurrences:

$$ \mu_{\alpha\beta}=\biggl(1-\frac{\delta_{\alpha\beta}}{2}\biggr)\sum _{i,j}\mu _{\alpha_{i}\beta_{j}} $$

(6)

and the variance:

$$ \sigma^{2}_{\alpha\beta}=\biggl(1-\frac{\delta_{\alpha\beta}}{2}\biggr)\sum _{i,j}\sigma^{2}_{\alpha_{i}\beta_{j}} $$

(7)

from which the class/subclass pair z-score can be calculated via Eq. (1).
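The aggregation of Eqs. (5)–(7) can be sketched as below; the code-level statistics and class labels here are made up for illustration, and the \((1-\delta_{\alpha\beta}/2)\) prefactor halves the sums for self-pairs:

```python
from math import sqrt

# Hypothetical code-level statistics. Keys are ((class, detail), (class, detail))
# code pairs; values are the code-level observed count o, expectation mu, and
# variance sigma^2 (all invented for this sketch).
code_stats = {
    (("257", "i1"), ("438", "j1")): (3, 1.2, 1.10),
    (("257", "i2"), ("438", "j2")): (1, 0.4, 0.39),
    (("257", "i1"), ("257", "i2")): (2, 0.8, 0.75),
}

def class_pair_z(a, b):
    """z-score of a class pair from summed code-level statistics, Eqs. (5)-(7)."""
    factor = 0.5 if a == b else 1.0              # the (1 - delta_ab/2) prefactor
    o = mu = var = 0.0
    for ((ca, _), (cb, _)), (o_ij, mu_ij, var_ij) in code_stats.items():
        if {ca, cb} == {a, b}:                   # code pair maps to this class pair
            o += o_ij
            mu += mu_ij
            var += var_ij
    return (factor * o - factor * mu) / sqrt(factor * var)

print(class_pair_z("257", "438"))                # cross-class pair
print(class_pair_z("257", "257"))                # self-pair, halved sums
```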