 Regular article
 Open Access
The shocklet transform: a decomposition method for the identification of local, mechanism-driven dynamics in sociotechnical time series
EPJ Data Science volume 9, Article number: 3 (2020)
Abstract
We introduce a qualitative, shape-based, timescale-independent time-domain transform used to extract local dynamics from sociotechnical time series—termed the Discrete Shocklet Transform (DST)—and an associated similarity search routine, the Shocklet Transform And Ranking (STAR) algorithm, that indicates time windows during which panels of time series display qualitatively similar anomalous behavior. After distinguishing our algorithms from other methods used in anomaly detection and time series similarity search, such as the matrix profile, seasonal-hybrid ESD, and discrete-wavelet-transform-based procedures, we demonstrate the DST’s ability to identify mechanism-driven dynamics at a wide range of timescales and its relative insensitivity to functional parameterization. As an application, we analyze a sociotechnical data source (usage frequencies for a subset of words on Twitter) and highlight our algorithms’ utility by using them to extract both a typology of mechanistic local dynamics and a data-driven narrative of socially important events as perceived by English-language Twitter.
Introduction
The tasks of peak detection, similarity search, and anomaly detection in time series are often accomplished using the discrete wavelet transform (DWT) [1] or matrix-based methods [2, 3]. For example, wavelet-based methods have been used for outlier detection in financial time series [4], similarity search and compression of various correlated time series [5], signal detection in meteorological data [6], and homogeneity of variance testing in time series with long memory [7]. Wavelet transforms have far superior localization in the time domain than do pure frequency-space methods such as the short-time Fourier transform [8]. Similarly, the chirplet transform is used in the analysis of phenomena displaying periodicity-in-perspective (linearly- or quadratically-varying frequency), such as images and radar signals [9–12]. Thus, when analyzing time series that are partially composed of exogenous shocks and endogenous shock-like local dynamics, a natural choice of analysis kernel is a small sample of such a function—a “shock”, examples of which are depicted in Fig. 1—and functions generated by concatenation of these building blocks, such as that shown in Fig. 2.
In this work, we introduce the Discrete Shocklet Transform (DST), generated by cross-correlations of a time series with a shock-like kernel function (a shocklet). As an immediate example (and before any definitions or technical discussion), we contrast the DWT with the DST of a sociotechnical time series—popularity of the word “trump” on the social media website Twitter—in Fig. 3, a visual display of what we claim is the DST’s suitability for detection of local mechanism-driven dynamics in time series.
We will show that the DST can be used to extract shock and shock-like dynamics of particular interest from time series through construction of an indicator function that compresses timescale-dependent information into a single spatial dimension using prior information on timescale and parameter importance. Using this indicator, we are able to highlight windows in which underlying mechanistic dynamics are hypothesized to contribute a stronger component of the signal than purely stochastic dynamics, and demonstrate an algorithm—the Shocklet Transform and Ranking (STAR) algorithm—by which we are able to automate post facto detection of endogenous, mechanism-driven dynamics. As a complement to techniques of changepoint analysis, methods by which one can detect changes in the level of time series [13, 14], the DST and STAR algorithm detect changes in the underlying mechanistic local dynamics of the time series. Finally, we demonstrate a potential usage of the shocklet transform by applying it to the LabMT Twitter dataset [15] to extract word usage time series matching the qualitative form of a shock-like kernel at multiple timescales.
Data and theory
Data
Twitter is a popular microblogging service that allows users to share thoughts and news with a global community via short messages (up to 140 characters or, from around November 2017 on, 280). We purchased access to Twitter’s “decahose” streaming API and used it to collect a random 10% sample of all public tweets authored between September 9, 2008 and April 4, 2018 [16]. We then parsed these tweets to count appearances of words included in the LabMT dataset, a set of roughly 10,000 of the most commonly used words in English [15]. The dataset has been used to construct nonparametric sentiment analysis models [17] and forecast mental illness [18], among other applications [19–21]. From these counts, we analyze the time series of word popularity as measured by rank of word usage: on day t, the most-used word is assigned rank 1, the second-most-used rank 2, and so on, creating a time series of word rank \(r_{t}\) for each word.
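The rank construction can be sketched in a few lines. The following is a minimal illustration with synthetic counts (the arrays and sizes are placeholders, not the production pipeline):

```python
import numpy as np

# Synthetic daily word counts: rows are days, columns are words.
rng = np.random.default_rng(0)
counts = rng.integers(1, 1000, size=(5, 4))

# On each day, the most-used word receives rank 1, the second-most rank 2,
# and so on. argsort of the descending counts orders words by usage; a
# second argsort inverts that permutation into per-word ranks.
order = np.argsort(-counts, axis=1)
ranks = np.argsort(order, axis=1) + 1  # r_t, one column per word
```

Each row of `ranks` is then a permutation of \(1, \dots, N\) over the words counted on that day.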
Theory
Algorithmic details: description of the method
There are multiple fundamentally deterministic mechanistic models for local dynamics of sociotechnical time series. Nonstationary local dynamics are generally well-described by exponential, biexponential, or power-law decay functions; mechanistic models thus usually generate one of these few functional forms. For example, Wu and Huberman described a stretched-exponential model for collective human attention [22], and Candia et al. derived a biexponential function for collective human memory on longer timescales [23]. Crane and Sornette assembled a Hawkes process for video views that produces power-law behavior by using power-law excitement kernels [24], and Lorenz-Spreen et al. demonstrated a speeding-up dynamic in collective social attention mechanisms [25], while De Domenico and Altmann put forward a stochastic model incorporating social heterogeneity and influence [26], and Ierly and Kostinsky introduced a rank-based, signal-extraction method with applications to meteorology data [27]. In Sect. 1.2.2 we conduct a literature review, contrasting our methods with existing anomaly detection and similarity search time series data mining algorithms and demonstrating that the DST and associated STAR algorithm differ substantially from these existing algorithms. We have open-sourced implementations of the DST and STAR algorithm; code is available at a publicly accessible repository.^{Footnote 1}
We do not assume any specific model in our work. Instead, by default we define a kernel \(\mathcal{K}^{(\cdot )}\) as one of a few basic functional forms: exponential growth,
$$ \mathcal{K}^{(E)}(\tau \mid W,\theta ) = \mathrm{rect}(\tau ) e^{\theta \tau }, $$(1)
monomial growth,
$$ \mathcal{K}^{(M)}(\tau \mid W,\theta ) = \mathrm{rect}(\tau ) \tau ^{\theta }, $$(2)
power-law decay,
$$ \mathcal{K}^{(P)}(\tau \mid W,\theta ) = \mathrm{rect}(\tau ) \bigl(\varepsilon + \vert \tau \vert \bigr)^{-\theta }, $$(3)
or sudden level change (corresponding with a changepoint detection problem),
$$ \mathcal{K}^{(C)}(\tau \mid W) = \mathrm{rect}(\tau ) \varTheta (\tau ), $$(4)
where \(\varTheta (\cdot )\) is the Heaviside step function. The function rect is the rectangular function (\(\mathrm{rect}(x)=1\) for \(\vert x \vert < W/2\) and \(\mathrm{rect}(x) = 0\) otherwise), while in the case of the power-law kernel we add a constant ε to ensure nonsingularity. The parameter W controls the support of \(\mathcal{K}^{(\cdot )}(\tau \mid W,\theta )\); the kernel is identically zero outside of the interval \([\tau - W/2, \tau + W/2]\). We define the window parameter W as follows: moving from a window size of W to a window size of \(W + \Delta W\) is equivalent to upsampling the kernel signal by the factor \(W + \Delta W\), applying an ideal low-pass filter, and downsampling by the factor W. In other words, if the kernel function \(\mathcal{K}^{(\cdot )}\) is defined for each of W linearly spaced points between \(-N/2\) and \(N/2\), moving from a window size of W to \(W + \Delta W\) is equivalent to computing \(\mathcal{K}^{(\cdot )}\) for each of \(W + \Delta W\) linearly spaced points between \(-N/2\) and \(N/2\). This holds the dynamic range of the kernel constant while accounting for the dynamics described by the kernel at all timescales of interest. We enforce the condition that \(\sum_{t=-\infty }^{\infty } \mathcal{K}^{(\cdot )}(t \mid W,\theta ) = 0\) for any window size W.
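A minimal sketch of this window-resampling convention, assuming NumPy (the decay form, θ, and ε below are illustrative placeholders): sampling the kernel at W linearly spaced points over a fixed nominal support holds its dynamic range constant, and subtracting the mean enforces the zero-sum condition.

```python
import numpy as np

N = 500  # fixed nominal support [-N/2, N/2]

def sample_kernel(w, theta=3.0, eps=0.5):
    # Resample the kernel at w linearly spaced points on [-N/2, N/2];
    # increasing w stretches the same shape over a longer window.
    tau = np.linspace(-N / 2, N / 2, w)
    k = (eps + np.abs(tau)) ** -theta  # placeholder decay shape
    return k - k.mean()                # enforce sum_t K(t | W) = 0

kernels = {w: sample_kernel(w) for w in (10, 50, 250)}
```

Every resampled kernel sums to zero and spans the same dynamic range, differing only in how many samples cover it.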
It is decidedly not our intent to delve into the question of how and why deterministic underlying dynamics in sociotechnical systems arise. However, we will provide a brief justification for the functional forms of the kernels presented in the last paragraph as scaling solutions to a variety of parsimonious models of local deterministic dynamics:
If the time series \(x(t)\) exhibits exponential growth with a statedependent growth damper \(D(x)\), the dynamics can be described by
$$ \frac{{d}x(t)}{{d}t} = \frac{\lambda }{D(x(t))}x(t),\qquad x(0) = x _{0}. $$(5)If \(D(x) = x^{1/n}\), the solution to this IVP scales as \(x(t) \sim t ^{n}\), which is the functional form given in Eq. (2). When \(D(x) \propto 1\) (i.e., there is no damper on growth) then the solution is an exponential function, the functional form of Eq. (1).
If instead the underlying dynamics correspond to exponential decay with a time- and state-dependent half-life \(\mathcal{T}\), we can model the dynamics by the system
$$\begin{aligned} &\frac{{d}x(t)}{{d}t} = -\frac{x(t)}{\mathcal{T}(t)},\qquad x(0) = x_{0}, \end{aligned}$$(6)$$\begin{aligned} &\frac{{d}\mathcal{T}(t)}{{d}t} = f\bigl(\mathcal{T}(t), x(t)\bigr),\qquad \mathcal{T}(0) = \mathcal{T}_{0}. \end{aligned}$$(7)If f is particularly simple and given by \(f(\mathcal{T}, x) = c\) with \(c > 0\), then the solution to Eq. (6) scales as \(x(t) \sim t^{-1/c}\), the functional form of Eq. (3). The limit \(c \rightarrow 0^{+}\) is singular and results in dynamics of exponential decay, given by reversing time in Eq. (1) (about which we expound later in this section).
As another example, the dynamics could be essentially static except when a latent variable φ changes state or moves past a threshold of some sort:
$$\begin{aligned} &\frac{{d}x(t)}{{d}t} = \delta \bigl( \varphi (t) - \varphi ^{*} \bigr),\qquad x(0) = x_{0}, \end{aligned}$$(8)$$\begin{aligned} &\frac{{d}\varphi (t)}{{d}t} = g\bigl(\varphi (t), x(t)\bigr),\qquad \varphi (0) = \varphi _{0}. \end{aligned}$$(9)In this case the dynamics are given by a step function from \(x_{0}\) to \(x_{0} + 1\) the first time \(\varphi (t)\) changes position relative to \(\varphi ^{*}\), and so on; these are the dynamics we present in Eq. (4).
This list is obviously not exhaustive and we do not intend it to be so.
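The first of these scaling claims can be checked numerically. The sketch below (assuming only NumPy; the step size and horizon are arbitrary choices) Euler-integrates Eq. (5) with \(D(x) = x^{1/n}\) and confirms that the late-time log-log slope of \(x(t)\) approaches n.

```python
import numpy as np

# Euler integration of dx/dt = lam * x**(1 - 1/n), i.e. D(x) = x**(1/n).
lam, n, dt = 1.0, 2.0, 1e-2
t = np.arange(dt, 500.0, dt)
x = np.empty_like(t)
x[0] = 1.0
for i in range(1, len(t)):
    x[i] = x[i - 1] + dt * lam * x[i - 1] ** (1.0 - 1.0 / n)

# Late-time slope of log x against log t should approach n = 2
# (the exact solution here is x(t) = (1 + t/2)**2).
sel = t > 50.0
slope = np.polyfit(np.log(t[sel]), np.log(x[sel]), 1)[0]
```

Euler integration is crude, but it suffices to recover the exponent to within a few percent at late times.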
We can use kernel functions \(\mathcal{K}^{(\cdot )}\) as basic building blocks of richer local mechanistic dynamics through function concatenation and the operation of the two-dimensional reflection group \(R_{4}\). Elements of this group correspond to \(r_{0} = \mathrm{id}\), \(r_{1} = \) reflection across the vertical axis (time reversal), \(r_{2} = \) negation (e.g., from an increase in usage frequency to a decrease in usage frequency), and \(r_{3} = r_{1} \cdot r_{2} = r_{2} \cdot r_{1}\). We can also model new dynamics by concatenating kernels, i.e., “gluing” kernels back-to-back. For example, we can generate “cusplets” with both anticipatory and relaxation dynamics by concatenating a shocklet \(\mathcal{K}^{(S)}\) with a time-reversed copy of itself:
$$ \mathcal{K}^{(C)}(\tau \mid W,\theta ) = \mathcal{K}^{(S)}(\tau + W/2 \mid W,\theta ) + \bigl(r_{1}\mathcal{K}^{(S)}\bigr)(\tau - W/2 \mid W,\theta ). $$(10)
We display an example of this concatenation operation in Fig. 2. For much of the remainder of the work, we conduct analysis using this symmetric kernel.
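On discretely sampled kernels, the group actions and the concatenation operation reduce to simple array manipulations. A sketch (the shock shape below is purely illustrative):

```python
import numpy as np

def r1(k):            # reflection across the vertical axis (time reversal)
    return k[::-1]

def r2(k):            # negation
    return -k

# Illustrative asymmetric shock kernel: power-law rise, then a sharp drop.
tau = np.linspace(-1.0, 1.0, 200)
shock = (0.05 + np.abs(tau)) ** -2.0 * (tau <= 0)
shock -= shock.mean()

# Concatenating the shock with its time-reversed copy gives a symmetric
# "cusplet" with both anticipatory and relaxation dynamics.
cusplet = np.concatenate([shock, r1(shock)])
```

The resulting array is invariant under \(r_{1}\), and the actions \(r_{1}\) and \(r_{2}\) commute, consistent with \(r_{3} = r_{1} \cdot r_{2} = r_{2} \cdot r_{1}\).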
The discrete shocklet transform (DST) of the time series \(x(t)\) is defined by
$$ \mathrm{CC}_{\mathcal{K}}(t, W \mid \theta ) = \sum_{\tau } x(\tau ) \mathcal{K}(\tau - t \mid W,\theta ), $$(11)
which is the cross-correlation of the sequence and the kernel. This defines a \(T \times N_{W}\) matrix containing an entry for each point in time t and window width W considered.
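Operationally, the DST is a loop of cross-correlations over window sizes. A compact sketch assuming NumPy (the kernel form here is illustrative, not one of the kernels defined above):

```python
import numpy as np

def dst(x, kernel_fn, windows):
    """Sketch of the discrete shocklet transform: cross-correlate the
    series with the kernel resampled at each window size W."""
    out = np.empty((len(x), len(windows)))  # T x N_W matrix
    for j, w in enumerate(windows):
        # mode="same" yields one coefficient per time index t
        out[:, j] = np.correlate(x, kernel_fn(w), mode="same")
    return out

def kernel(w, theta=3.0):
    tau = np.linspace(-1.0, 1.0, w)
    k = np.abs(tau) ** theta          # illustrative shock-like shape
    return k - k.mean()               # zero-sum normalization

rng = np.random.default_rng(1)
walk = rng.normal(size=1000).cumsum()            # null random-walk series
C = dst(walk, kernel, windows=range(10, 251, 10))
```

Each column of `C` records how strongly the series locally resembles the kernel at one timescale.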
To convey a visual sense of what the DST looks like when using a shock-like, asymmetric kernel, we compute the DST of a random walk \(x_{t} - x_{t-1} = z_{t}\) (we define \(z_{t} \sim \mathcal{N}(0,1)\)) using a kernel function \(\mathcal{K}^{(S)}(\tau \mid W, \theta ) \sim \mathrm{rect}(\tau )\tau ^{\theta }\) with \(\theta = 3\) and display the resulting matrix for window sizes \(W \in [10, 250]\) in Fig. 4.
The effects of time reversal by action of \(r_{1}\) are visible when comparing the first and third panels with the second and fourth panels, and the result of negating the kernel by acting on it with \(r_{2}\) is apparent in the negation of the matrix values when comparing the first and second panels and the third and fourth. For this figure, we used a random walk as the example time series because there is, by definition, no underlying generative mechanism causing any shock-like dynamics; these dynamics appear only as a result of integrated noise. For this reason, large upward-pointing shocks are as likely as large downward-pointing shocks, which allows us to see the activation of both upward-pointing and downward-pointing kernel functions.
As a comparison with this null example, we computed the DST of a sociotechnical time series, the rank of the word “bling” among the LabMT words on Twitter, and of two draws from a null random walk model, and display the results in Fig. 5. Here, we calculated the DST using the symmetric kernel given in Eq. (10). (For more statistical details of the null model, see Appendix 1.) We also computed the DWT of each of these time series and display the resulting wavelet transform matrices next to the shocklet transform matrices in Fig. 5. Direct comparison of the sociotechnical time series (\(r_{t}\)) with the draws from the null model reveals \(r_{t}\)’s moderate autocovariance as well as the large, shock-like fluctuation that occurs in late July of 2015. (The underlying driver of this fluctuation was the release of a popular song entitled “Hotline Bling” on July 31st, 2015.) In comparison, the draws from the null model have a covariance with much more prominent time scaling and do not exhibit dramatic shock-like fluctuations as does \(r_{t}\). Comparing the DWT of these time series with the respective DST provides more evidence that the DST exhibits superior localization of shock-like dynamics in both time and timescale than does the DWT.
To aggregate deterministic behavior across all timescales of interest, we first weight the DST by a prior over window sizes,
$$ \mathrm{C}_{\mathcal{K}}(t \mid \theta ) = \sum_{W} p(W \mid \theta ) \mathrm{CC}_{\mathcal{K}}(t, W \mid \theta ), $$(12)
summing over all windows W considered. The function \(p(W \mid \theta )\) is a probability mass function that encodes prior beliefs about the importance of particular values of W. For example, if we are interested primarily in time series that display shock or shock-like behavior that usually lasts for approximately one month, we might specify \(p(W \mid \theta )\) to be sharply peaked about \(W = 28\) days. Throughout this work we take an agnostic view on all possible window widths and so set \(p(W \mid \theta ) \propto 1\), reducing our analysis to a strictly maximum-likelihood based approach. Summing over all values of the shocklet parameter θ defines the shock indicator function,
$$ \mathrm{C}_{\mathcal{K}}(t) = \sum_{\theta } p(\theta ) \mathrm{C}_{\mathcal{K}}(t \mid \theta ). $$(13)
In analogy with \(p(W \mid \theta )\), the function \(p(\theta )\) is a probability density function describing our prior beliefs about the importance of various values of θ. As we will show later in this section, and graphically in Fig. 6, the shock indicator function is relatively insensitive to choices of θ, possessing a nearly identical \(\ell _{1}\) norm for wide ranges of θ and different functional forms of \(\mathcal{K}^{(S)}\).
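With uniform priors \(p(W \mid \theta ) \propto 1\) and \(p(\theta ) \propto 1\), the reduction from DST matrices to the shock indicator function amounts to averaging over windows and over θ, followed by a renormalization (zero sum, dynamic range of 2). A sketch, with synthetic stand-in matrices:

```python
import numpy as np

def shock_indicator(C_matrices):
    """Average DST matrices over theta and over window sizes (uniform
    priors), then renormalize to zero sum and a max-minus-min range of 2."""
    c = np.mean([C.mean(axis=1) for C in C_matrices], axis=0)
    c = c - c.mean()                        # re-center: indicator sums to zero
    return c * (2.0 / (c.max() - c.min()))  # fix dynamic range at 2

# Stand-in T x N_W matrices for two values of theta (synthetic data).
t = np.linspace(0, 6, 500)
Cs = [np.outer(np.sin(t), np.ones(25)), np.outer(0.9 * np.sin(t), np.ones(25))]
c = shock_indicator(Cs)
```

A non-uniform prior would simply replace the unweighted means with weighted averages over W and θ.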
After calculation, we normalize \(\mathrm{C}_{\mathcal{K}^{(S)}}(t)\) so that it again integrates to zero and has \(\max_{t} \mathrm{C}_{\mathcal{K}^{(S)}}(t) - \min_{t} \mathrm{C}_{\mathcal{K}^{(S)}}(t) = 2\). The shock indicator function is used to find windows in which the time series displays anomalous shock or shock-like behavior. These windows are defined as
$$ \mathcal{W}_{s} = \bigl\{ t : \mathrm{C}_{\mathcal{K}^{(S)}}(t) > s \bigr\} , $$(15)
where the parameter \(s > 0\) sets the sensitivity of the detection.
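Extracting the anomalous windows from the normalized indicator is then a run-length computation. A sketch, with s as in the text and toy indicator values:

```python
import numpy as np

def shock_windows(c, s):
    """Return (start, end) index pairs (end exclusive) of the maximal
    contiguous runs where the shock indicator exceeds the threshold s."""
    padded = np.concatenate([[False], np.asarray(c) > s, [False]])
    d = np.diff(padded.astype(int))
    starts = np.flatnonzero(d == 1)   # off -> on transitions
    ends = np.flatnonzero(d == -1)    # on -> off transitions
    return list(zip(starts.tolist(), ends.tolist()))

c = np.array([0.0, 0.2, 0.9, 0.8, 0.1, 0.0, 0.7, 0.0])
windows = shock_windows(c, s=0.5)   # two exceedance windows
```

Larger s yields fewer, shorter windows; smaller s admits weaker candidate shocks.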
The DST is relatively insensitive to quantitative changes in its functional parameterization; it is a qualitative tool that highlights time periods of unusual events in a time series. In other words, it does not detect statistical anomalies, but rather time periods during which the time series appears to take on certain qualitative characteristics, without being too sensitive to a particular functional form. We analyzed two example sociotechnical time series—the rank of the word “bling” on Twitter (for reasons we will discuss presently) and the price time series of Bitcoin (symbol BTC) [28], the most actively used cryptocurrency [29]—and one null model, a pure random walk. For each time series, we computed the shock indicator function using two kernels of different functional forms (one kernel given by the function of Eq. (10), and one of identical form but constructed by setting \(\mathcal{K}^{(S)}(\tau \mid W,\theta )\) to the function given in Eq. (1)), and evaluated each kernel over a wide range of its parameter θ. We also varied the maximum window size from \(W = 100\) to \(W = 1000\) to explore the sensitivity of the shock indicator function to this parameter. We display the results of this comparative analysis in Fig. 6. For each time series, we plot the \(\ell _{1}\) norm of the shock indicator function for each \((\theta , W)\) combination. We find that, as stated earlier in this section, the shock indicator function is relatively insensitive to both functional parameterization and the value of the parameter θ; for any fixed W, the \(\ell _{1}\) norm of the shock indicator function barely changes regardless of the value of θ or choice of \(\mathcal{K}^{(\cdot )}\). However, the maximum window size does have a notable effect on the magnitude of the shock indicator function; higher values of W are associated with larger magnitudes.
This is a reasonable finding: a higher maximum W means that the DST is able to capture shock-like behavior that occurs over longer timespans, and hence the indicator may take values of higher magnitude over longer periods than for a comparatively lower maximum W.
That the shock indicator function is a relative quantity is both beneficial and problematic. The utility of this feature is that the dynamic behavior of time series derived from systems of widely varying time and length scales can be directly compared; while the rank of a word on Twitter and—for example—the volume of trades in an equity security are entirely different phenomena measured in different units, their shock indicator functions are unitless and share similar properties. On the other hand, the shock indicator function carries with it no notion of dynamic range. Two time series \(x_{t}\) and \(y_{t}\) could have identical shock indicator functions but have spans differing by many orders of magnitude, i.e., \(\operatorname{diam} x_{t} \equiv \max_{t} x_{t} - \min_{t} x_{t} \gg \operatorname{diam} y_{t}\). (In other words, the diameter of a time series over an interval I is just the dynamic range of the time series over that interval.) We can directly compare time series inclusive of their dynamic range by computing a weighted version of the shock indicator function, \(\mathrm{C}_{\mathcal{K}}(t) \Delta x(t)\), which we term the weighted shock indicator function (WSIF). A simple choice of weight is
$$ \Delta x(t) = \max_{t_{b} \leq t' \leq t_{e}} x_{t'} - \min_{t_{b} \leq t' \leq t_{e}} x_{t'}, $$(16)
where \(t_{b}\) and \(t_{e}\) are the beginning and end times of a particular window. We use this definition for the remainder of our paper, but one could easily imagine using other weighting functions, e.g., maximum percent change (perhaps applicable for time series hypothesized to increment geometrically instead of arithmetically).
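Under this choice of weight, the WSIF computation is a per-window rescaling of the indicator by the series’ dynamic range over that window. A self-contained sketch, with window boundaries supplied directly and toy values throughout:

```python
import numpy as np

def weighted_indicator(c, x, windows):
    """Weight the shock indicator c by the dynamic range (diameter) of the
    series x over each anomalous window; zero weight elsewhere."""
    weight = np.zeros_like(c)
    for tb, te in windows:                       # te is exclusive
        seg = x[tb:te]
        weight[tb:te] = seg.max() - seg.min()    # Delta x for this window
    return c * weight

c = np.array([0.0, 0.9, 0.8, 0.0])   # toy indicator values
x = np.array([5.0, 9.0, 3.0, 5.0])   # toy series
wsif = weighted_indicator(c, x, [(1, 3)])
```

Outside the detected windows the WSIF is zero, so two series can be ranked directly by the magnitudes of their WSIFs.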
These final weighted shock indicator functions are the ultimate output of the shocklet transform and ranking (STAR) algorithm. The weighting corresponds to the actual magnitude of the dynamics and constitutes the “ranking” portion of the algorithm, and it will be substantially larger than zero only if there exist intervals of time during which the time series exhibited shock-like behavior as indicated in Eq. (15). We present a conceptual, bird’s-eye view of the STAR algorithm (of which the DST is a core component) in Fig. 7. Though this diagram lacks technical detail, we include it to orient the reader to the conceptual process underpinning the algorithm.
Algorithmic details: comparison with existing methods
On a coarse scale, there are five non-exclusive categories of time series data mining tasks [30]: similarity search (also termed indexing), clustering, classification, summarization, and anomaly detection. The STAR algorithm is a qualitative, shape-based, timescale-independent similarity search algorithm. As we have shown in the previous section, the discrete shocklet transform (a core part of the overarching STAR algorithm) is qualitative, meaning that it does not depend too strongly on the values of functional parameters, or even on the functions used in the cross-correlation operation themselves, as long as those functions share the same qualitative dynamics (e.g., increasing rates of increase followed by decreasing rates of decrease for cusp-like dynamics); hence, it is primarily shape-based rather than reliant on the quantitative definition of a particular functional form. STAR is timescale-independent as it is able to detect shock-like dynamics over a wide range of timescales, limited only by the maximum window size for which it is computed. Finally, we categorize STAR as a similarity search algorithm because this is the best-fitting of the five labels listed at the beginning of this section: STAR is designed to search within sociotechnical time series for dynamics that are similar to the shock kernel, albeit similar in a qualitative sense and over any arbitrary timescale, not functionally similar in numerical value and characteristic timescale. However, it could also be considered a type of qualitative, shape-based anomaly detection algorithm, because we are searching for behavior that is, in some sense, anomalous compared to the usual baseline behavior of many time series (though see the discussion at the beginning of the anomaly detection subsection near the end of this section: STAR is an algorithm that detects defined anomalous behavior, not an algorithm for detecting arbitrary statistical anomalies).
As such, we are unaware of any existing algorithm that satisfies these four criteria and believe that STAR represents an entirely new class of algorithms for sociotechnical time series analysis. Nonetheless, we now provide a detailed comparison of the DST with other algorithms that solve related problems, and in Sect. 2.1 provide an in-depth quantitative comparison with another nonparametric algorithm (Twitter’s anomaly detection algorithm) that one could attempt to use to extract shock-like dynamics from sociotechnical time series.
Similarity search—here the objective is to find time series that minimize some similarity criterion between candidate time series and a given reference time series. Algorithms to solve this problem include nearest-neighbor methods (e.g., k-nearest neighbors [31] or a locality-sensitive-hashing-based method [32, 33]); the discrete Fourier and wavelet transforms [5, 34–36]; and bit-, string-, and matrix-based representations [30, 37–39]. With suitable modification, these algorithms can also be used to solve time series clustering problems. Generic dimensionality-reduction techniques, such as singular value decomposition/principal components analysis [40–42], can also be used for similarity search by searching through a dataset of lower dimension. Each of these classes of algorithms differs substantially in scope from the discrete shocklet transform. Chief among the differences is the focus on the entire time series. While the discrete shocklet transform implicitly searches the time series for similarity with the kernel function at all (user-defined) relevant timescales and returns qualitatively matching behavior at the corresponding timescale, most of the algorithms considered above do no such thing; the user must break the time series into sliding windows of length τ and execute the algorithm on each sliding window, and if the user desires timescale independence, they must then vary τ over a desired range. An exception to this statement is Mueen’s subsequence similarity search algorithm (MSS) [43], which computes sliding dot products (cross-correlations) between a long time series of length T and a shorter kernel of length M before defining a Euclidean distance objective for the similarity search task. When this sliding dot product is computed using the fast Fourier transform, the computational complexity of this task is \(\mathcal{O}(T \log T)\).
This computational step is also at the core of the discrete shocklet transform, but is performed for multiple kernel function arrays (more precisely, for the kernel function resampled at multiple user-defined timescales). Unlike the discrete shocklet transform, MSS does not subsequently compute an indicator function and does not have the self-normalizing property, while the matrix profile algorithm [39] computes an indicator function of sorts (the “matrix profile”) but is not timescale-independent and is quantitative in nature; it does not search for a qualitative shape match as does the discrete shocklet transform. We are unaware of a similarity-search algorithm aside from STAR that is both qualitative in nature and timescale-independent.
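For reference, the FFT-based sliding dot product underlying both MSS and the DST’s cross-correlation step can be sketched as follows; it matches the direct \(\mathcal{O}(TM)\) loop while costing \(\mathcal{O}(T \log T)\):

```python
import numpy as np

def sliding_dot(x, k):
    """All full-overlap dot products of kernel k against series x,
    computed via the FFT cross-correlation theorem."""
    T, M = len(x), len(k)
    n = T + M - 1                     # zero-pad to avoid circular wraparound
    spec = np.fft.rfft(x, n) * np.conj(np.fft.rfft(k, n))
    return np.fft.irfft(spec, n)[: T - M + 1]

rng = np.random.default_rng(2)
x, k = rng.normal(size=256), rng.normal(size=16)
fast = sliding_dot(x, k)
direct = np.array([x[i : i + len(k)] @ k for i in range(len(x) - len(k) + 1)])
```

The two computations agree to floating-point precision, which is why the FFT route is preferred for long series.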
Clustering—given a set of time series, the objective is to partition them into groups, or clusters, that are more homogeneous within each cluster than between clusters. Viewing a collection of N time series of length T as a set of vectors in \(\mathbb{R}^{T}\), any clustering method that can be effectively used on high-dimensional data has potential applicability to clustering time series. Some of these general clustering methods include k-means and k-medians algorithms [44–46], hierarchical methods [47–49], and density-based methods [47, 50–52]. There are also methods designed for clustering time series data specifically, such as error-in-measurement models [53], hidden Markov models [54], simulated-annealing-based methods [55], and methods designed for time series that are well-fit by particular classes of parametric models [56–59]. Although the discrete shocklet transform component of the STAR algorithm could be coerced into performing a clustering task by using different kernel functions and elements of the reflection group, clustering is not the intended purpose of the discrete shocklet transform or of STAR more generally. In addition, none of the clustering methods mentioned replicate the results of the STAR algorithm. These clustering methods uncover groups of time series that exhibit similar behavior over their entire domain; application of clustering methods to time series subsequences leads to meaningless results [60]. Clustering algorithms are also shape-independent in the sense that they cluster data into groups that share similar features, but do not search for specific known features or shapes in the data. In contrast, when using the STAR algorithm we have already specified a specific shape—for example, the shock shape demonstrated above—and are searching the data across timescales for occurrences of that shape.
The STAR algorithm also does not require multiple time series in order to function effectively, differing from any clustering algorithm in this respect; a clustering algorithm applied to \(N=1\) data points trivially returns a single cluster containing the single data point. The STAR algorithm operates identically on one or many time series as it treats each time series independently.
Classification—classification is the canonical supervised statistical learning problem in which data \(x_{i}\) is observed along with a discrete label \(y_{i}\) that is taken to be a function of the data, \(y_{i} = f(x_{i}) + \varepsilon \); the goal is to recover an approximation to f that precisely and accurately reproduces the labels for new data [61]. This is the category of time series data mining algorithms that least corresponds with the STAR algorithm. The STAR algorithm is unsupervised—it does not require training examples (“correct labels”) in order to find subsequences that qualitatively match the desired shape. As above, the STAR algorithm also does not require multiple time series to function well, while (non-Bayesian) classification algorithms rely on multiple data points in order to learn an approximation to f.^{Footnote 2}
Summarization—since time series can be arbitrarily large and composed of many intricately related features, it may be desirable to have a summary of their behavior that encompasses the time series’ “most interesting” features. These summaries can be numerical, graphical, or linguistic in nature. Underlying methodologies for time series summary tasks include wavelet-based approaches [62, 63], genetic algorithms [64, 65], fuzzy logic and systems [66–68], and statistical methods [69]. Though intermediate steps of the STAR algorithm can certainly be seen as time series summarization mechanisms (for example, the matrix computed by the DST, or the weighted shock indicator functions used in determining the rank relevance of individual time series at different points in time), the STAR algorithm was not designed for time series summarization and should not be used for this task, as it will be outperformed by essentially any algorithm that was actually designed for summarization. Any “summary” derived from the STAR algorithm will have utility only in summarizing segments of the time series whose behavior matches the kernel shape, or in distinguishing segments of the time series that have a shape similar to the kernel from ones that do not.
Anomaly detection—if a “usual” model can be defined for the system under study, an anomaly detection algorithm is a method that finds deviations from this usual behavior. Before we briefly review time series anomaly detection algorithms and compare them with the STAR algorithm, we distinguish between two subtly different concepts: the data mining notion of anomaly detection, and the physical or social scientific notion of anomalous behavior. In the first sense, any deviation from the “ordinary” model is termed an anomaly and marked as such. The ordinary model need not be a parametric model to which the data is compared; for example, it may be implicitly defined as the behavior that the data exhibits most of the time [70]. In the physical and social sciences, on the other hand, it may be observed that, given a particular set of laboratory or observational conditions, a material, state vector, or collection of agents exhibits phenomena that are anomalous when compared to a specific reference situation, even if this behavior is “ordinary” for the conditions under which the phenomena are observed.
Examples of such anomalous behavior in physics and economics include: spectral behavior of polychromatic waves that is very unusual compared to the spectrum of monochromatic waves (even though it is typical for polychromatic waves near points where the wave’s phase is singular) [71]; the entire concept of anomalous diffusion, in which diffusive processes with mean square displacement \(\langle r^{2}(t)\rangle \sim t^{\alpha }\) are said to diffuse anomalously if \(\alpha \not \approx 1\) (since \(\alpha = 1\) is the scaling of the Wiener process’s mean square displacement) [72, 73], even though anomalous diffusion is the rule rather than the exception in intracellular and climate dynamics, as well as financial market fluctuations; and behavior that deviates substantially from the “rational expectations” of noncooperative game theory, even though such deviations are regularly observed among human game players [74, 75]. The distinction between algorithms designed for the task of anomaly detection and algorithms or statistical procedures that test for the existence of anomalous behavior, as defined here, is thus a subtle but significant one. The DST and STAR algorithm fall into the latter category: we designed the STAR algorithm to extract windows of anomalous behavior as defined by comparison with a particular null qualitative time series model (absence of clear shock-like behavior), not to perform the task of anomaly detection writ large by indicating the presence of arbitrary samples or dynamics in a time series that do not in some way comport with the statistics of the entire time series.
With these caveats stated, it is not the case that there is no overlap between anomaly detection algorithms and algorithms that search for some physically-defined anomalous behavior in time series; in fact, as we show in Sect. 2.1, there is significant convergence between windows of shocklike behavior indicated by STAR and windows of anomalous behavior indicated by Twitter’s anomaly detection algorithm when the underlying time series exhibits relatively low variance. Statistical anomaly detection algorithms typically propose a semiparametric model or nonparametric test and confront data with the model or test; if certain datapoints are very unlikely under the model or exceed certain theoretical boundaries derived in constructing the test, then these datapoints are said to be anomalous. Examples of algorithms that operate in this way include: Twitter’s anomaly detection algorithm (ADV), which relies on the generalized seasonal-hybrid ESD test [76, 77]; the EGADS algorithm, which relies on explicit time series models and outlier tests [78]; time-series-model and graph methodologies [79, 80]; and probabilistic methods [81, 82]. Each of these methods is strictly focused on solving the first problem that we outlined at the beginning of this subsection: finding points in one or more time series at which the series exhibits behavior deviating substantially from the “usual” or assumed behavior for time series of a certain class. As we outlined, this goal differs substantially from the one for which we designed STAR: searching for segments of time series (which may vary widely in length) during which the time series exhibits behavior that is qualitatively similar to underlying deterministic dynamics (shocklike behavior) that we believe are anomalous when compared to non-sociotechnical time series.
Empirical results
Comparison with Twitter’s anomaly detection algorithm
Through the literature review in Sect. 1.2 we have demonstrated that, to our knowledge, there exists no algorithm that solves the problem for which STAR was designed—providing a qualitative, shapebased, timescaleindependent measure of similarity between multivariate time series and a hypothesized shape generated by mechanistic dynamics. However, existing algorithms designed for nonparametric anomaly detection could be used to alert to the presence of shocklike behavior in sociotechnical time series, which is the application for which we originally designed STAR. One leading example of such an algorithm is Twitter’s Anomaly Detection Vector (ADV) algorithm.^{Footnote 3} This algorithm uses an underlying statistical test, the seasonal-hybrid ESD test, to detect outliers in periodic and nonstationary time series [76, 77]. We perform a quantitative and qualitative comparison between STAR and ADV to compare their effectiveness at the task for which we designed STAR—determining qualitative similarity between shocklike shapes over a wide range of timescales—and to contrast the signals picked up by each algorithm, which, as we show, differ substantially. Before presenting results of this analysis, we note that this comparison is not entirely fair; though ADV is a state-of-the-art anomaly detection algorithm, it was not designed for the task for which we designed STAR, and so it is not reasonable to expect ADV to perform as well as STAR on this task. To ameliorate this problem, we have chosen a quantitative benchmark for which our a priori beliefs did not favor the efficacy of either algorithm.
As both STAR and ADV are unsupervised algorithms, we compare their quantitative performance by assessing their utility in generating features for use in a supervised learning problem. Since the macroeconomy is a canonical example of a sociotechnical system, we consider the problem of predicting the probability of a U.S. economic recession using only a minimal set of indicators from financial market data. Models for predicting economic recessions variously use only real economic indicators [83–85], only financial market indicators [86, 87], or a combination of real and financial economic indicators [88, 89]. We take an approach that is both simple and relatively granular, focusing on the ability of statistics of individual equity securities to jointly model U.S. economic recession probability. For each of the equities that was in the Dow Jones Industrial Average between 1999-07-01 and 2017-12-31 (a total of \(K=32\) securities), we computed the DST (outputting the shock indicator function), the STAR algorithm (outputting windows of shocklike behavior), and the ADV routine on that equity’s volume-traded time series (number of shares transacted), which we sampled at a daily resolution for a total of \(T = 6759\) observations per security. We then fit linear models of the form
\[ \mathbf{p} = X \boldsymbol{\beta } + \boldsymbol{\varepsilon }, \tag{17} \]
where \(p_{t}\) is the recession probability on day t as given by the U.S. Federal Reserve (hence p is the length-T vector of recession probabilities).^{Footnote 4} When we fit the model represented by Eq. (17) using ADV or STAR as the algorithm generating features, the design matrix X is a binary matrix of shape \(T \times (K + 1)\) with entry \(X_{tk}\) equal to one if the algorithm indicated an anomaly or shocklike behavior, respectively, in security k at time t, and equal to zero otherwise (the +1 in the dimensionality of the matrix corresponds to the prepended column of ones that is necessary to fit an intercept in the regression). When we fit the model using the shock indicator function generated by the DST, column k of the matrix X is instead given by the shock indicator function of security k.
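As an illustration, the design-matrix construction and least-squares fit described above can be sketched as follows. This is a minimal sketch: the indicator windows and recession-probability vector below are synthetic stand-ins (the real features come from the DST/STAR/ADV outputs and the Federal Reserve series); only the dimensions \(T = 6759\) and \(K = 32\) are taken from the text.

```python
import numpy as np

# Dimensions from the text: T daily observations, K securities.
T, K = 6759, 32
rng = np.random.default_rng(0)

# Stand-in for algorithm output: windows[k] is a list of (start, end)
# index pairs during which security k showed anomalous / shocklike behavior.
windows = {k: [(100 * k, 100 * k + 30)] for k in range(K)}

# Binary design matrix of shape T x (K + 1); the prepended column of
# ones fits the regression intercept.
X = np.zeros((T, K + 1))
X[:, 0] = 1.0
for k, wins in windows.items():
    for start, end in wins:
        X[start:end, k + 1] = 1.0

# Stand-in for the Fed's recession-probability series (percent).
p = rng.uniform(0.0, 100.0, size=T)

# Ordinary least-squares fit and proportion of variance explained (R^2).
beta, *_ = np.linalg.lstsq(X, p, rcond=None)
p_hat = X @ beta
r2 = 1.0 - np.sum((p - p_hat) ** 2) / np.sum((p - p.mean()) ** 2)
```

For the DST-based variant, the binary columns would simply be replaced by each security's real-valued shock indicator function.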
We evaluate the goodness of fit of these linear models using the proportion of variance explained (\(R^{2}\)) statistic; these results are summarized graphically in Fig. 8. The linear model using ADV-indicated anomalies as features had \(R^{2}_{ \mathrm{ADV}} = 0.341\), while the model using the shock indicator function as columns of the design matrix had \(R^{2}_{\mathrm{DST}} = 0.455\) and the model using STAR-indicated shocks as features had \(R^{2}_{\mathrm{STAR}}= 0.496\). This relative ranking of feature importance remained constant when we used model log-likelihood ℓ as the performance metric instead of \(R^{2}\), with ADV, DST, and STAR respectively exhibiting \(\ell _{\mathrm{ADV}} = -16{,}278\), \(\ell _{\mathrm{DST}} = -15{,}633\), and \(\ell _{\mathrm{STAR}} = -15{,}372\). Each linear model exhibited a distribution of residuals \(\varepsilon _{t}\) that did not drastically violate the zero-mean and distributional-shape assumptions of least-squares regression; a maximum likelihood fit of a normal probability density to the empirical error probability distribution \(p( \varepsilon _{t})\) gave mean and variance of \(\mu = 0\) to within numerical precision and \(\sigma ^{2} \approx 6.248\), while a maximum likelihood fit of a skew-normal probability density [90] to the empirical error probability distribution gave mean, variance, and skew of \(\mu \approx 0.043\), \(\sigma ^{2} \approx 6.025\), and \(a \approx 2.307\). Taken in the aggregate, these results constitute evidence that features generated by the DST and STAR algorithms are superior to features derived from the ADV method for the task of classifying time periods as recessionary or not.
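The residual-distribution checks above can be reproduced with standard maximum-likelihood fits; a minimal sketch using scipy, with synthetic residuals standing in for the actual regression errors (note that `scipy.stats.skewnorm.fit` returns the shape, location, and scale parameters, from which the distribution's mean and variance would then be derived):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Stand-in for the regression residuals; the real ones come from Eq. (17).
residuals = rng.normal(0.0, 2.5, size=6759)

# Maximum-likelihood fit of a normal density: returns (mean, std. dev.).
mu, sigma = stats.norm.fit(residuals)

# Maximum-likelihood fit of a skew-normal density: returns
# (shape a, location, scale) -- location and scale are parameters of the
# density, not the distribution's mean and variance.
a, loc, scale = stats.skewnorm.fit(residuals)
```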
As a further comparison of the STAR algorithm and ADV, we generated anomaly windows (in the case of ADV) and windows of shocklike behavior (in the case of STAR) for the usage rank time series of each of the 10,222 words in the LabMT dataset. For each word w, we computed the Jaccard similarity index (also known as the intersection over union) between the set of STAR windows \(\{I_{i}^{\mathrm{STAR}}(w)\}_{i}\) and the set of ADV windows \(\{I_{i}^{\mathrm{ADV}}(w)\}_{i}\),
\[ J(w) = \frac{ \vert \bigcup_{i} I_{i}^{\mathrm{STAR}}(w) \cap \bigcup_{i} I_{i}^{\mathrm{ADV}}(w) \vert }{ \vert \bigcup_{i} I_{i}^{\mathrm{STAR}}(w) \cup \bigcup_{i} I_{i}^{\mathrm{ADV}}(w) \vert }. \]
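Such a Jaccard coefficient over window sets can be computed by treating each algorithm's windows as the set of days they cover; a minimal sketch (the representation of a window as a `(start, end)` index pair, with `end` exclusive, is our assumption):

```python
def jaccard_windows(star_windows, adv_windows):
    """Jaccard similarity (intersection over union) between two
    collections of integer time windows, each given as (start, end)
    pairs with end exclusive."""
    star, adv = set(), set()
    for start, end in star_windows:
        star.update(range(start, end))  # days covered by STAR windows
    for start, end in adv_windows:
        adv.update(range(start, end))   # days covered by ADV windows
    if not star and not adv:
        return 0.0  # convention: no windows at all -> no overlap
    return len(star & adv) / len(star | adv)

# Two 20-day windows sharing 10 of the 30 covered days:
jaccard_windows([(0, 20)], [(10, 30)])  # -> 1/3
```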
We display the word time series and ADV and STAR windows for a selection of words pertaining to the 2016 U.S. presidential election in Fig. 9. (These words display shocklike behavior in a time interval surrounding the election, as we demonstrate in the next section, hence our selection of them as examples here.)
We display the distribution of all Jaccard similarity coefficients in Fig. 10. Most words have relatively little overlap between anomaly windows returned by ADV and windows of shocklike dynamics returned by STAR, but there are notable exceptions. In particular, a review of the figures contained in the online index suggests that ADV’s and STAR’s windows overlap most when the shocklike dynamics are particularly strong and surrounded by a time series with relatively low variance; they agree the most when hypothesized underlying deterministic mechanics are strongest and the effects of noise are lowest. The pronounced spikes in the words “crooked” and “stein” in Fig. 9 are an example of this phenomenon. However, when the time series has high variance or exhibits strong nonstationarity, ADV often does not indicate that there are windows of anomalous behavior while STAR does indicate the presence of shocklike dynamics; the panels of the words “trump”, “jill”, and “hillary” in Fig. 9 demonstrate these behaviors.
Taken in the aggregate, these results suggest that a state-of-the-art anomaly detection algorithm, such as Twitter’s ADV, and a qualitative, shapebased, timescaleindependent similarity search algorithm, such as STAR, do have some overlapping properties but are largely mutually complementary approaches to identifying and analyzing the behavior of sociotechnical time series. While ADV and STAR both identify strongly shocklike dynamics that occur when the surrounding time series has relatively low variance, their behavior diverges when the time series is strongly nonstationary or has high variance. In this case, ADV is an excellent tool for indicating the presence of strong outliers in the data, while STAR continues to indicate the presence of shocklike dynamics in a manner that is less sensitive to the time series’s stationarity or variance.
Social narrative extraction
We seek both an understanding of the intertemporal semantic meaning imparted by windows of shocklike behavior indicated by the STAR algorithm and a characterization of the dynamics of the shocks themselves. We first compute the shock indicator and weighted shock indicator functions (WSIFs) for each of the 10,222 LabMT words filtered from the gardenhose dataset, described in Sect. 1.1, using a power kernel with \(\theta =3\). At each point in time, words are sorted by the value of their WSIF. The jth highest-valued WSIF at each temporal slice, when concatenated across time, defines a new time series. We perform this computation for the top-ranked \(k = 20\) words for the entire time under study. We also perform this process using the “spike” kernel of Eq. (4) and display each resulting time series in Fig. 11 (shock kernel) and Fig. 12 (spike kernel). (We term the spike kernel as such because on the domain \([-W/2, W/2]\) we have \(\frac{\mathrm{d} \mathcal {K}^{(\mathrm{Sp})}(\tau )}{\mathrm{d}\tau } = \delta (\tau )\), the Dirac delta function; its underlying mechanistic dynamics are completely static except for one point in time during which the system is driven by an ideal impulse function.)
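The rank-ordering construction above can be sketched as follows, assuming the WSIFs are held in a words-by-days array (random stand-in data here); the order-40 relative maxima used for the annotations in Fig. 11 can be located with scipy:

```python
import numpy as np
from scipy.signal import argrelmax

rng = np.random.default_rng(2)

# Stand-in WSIF matrix: one row per word, one column per day.
wsif = rng.random((1000, 3000))

# The rank-j time series: at each day, the j-th largest WSIF value,
# concatenated across time.
j = 1
rank_j_series = np.sort(wsif, axis=0)[-j, :]

# Relative maxima of order 40: points strictly larger than every
# neighbor within 40 time steps on either side.
peaks, = argrelmax(rank_j_series, order=40)

# Which word attains the top WSIF value at each peak (used for annotation).
peak_words = wsif[:, peaks].argmax(axis=0)
```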
The \(j=1\) word time series is annotated with the corresponding word at relative maxima of order 40. (A relative maximum \(x_{s}\) of order k in a time series is a point that satisfies \(x_{s} > x_{t}\) for all t such that \(0 < |t - s| \leq k\).) This annotation reveals a dynamic social narrative concerning popular events, social movements, and geopolitical fluctuation over the past near-decade. Interactive versions of these visualizations are available on the authors’ website.^{Footnote 5} To further illuminate the often-turbulent dynamics of the top-j-ranked weighted shock indicator functions, we focus on two particular 60-day windows of interest, denoted by shading in the main panels of Figs. 11 and 12. In Fig. 11, we outline a period in late 2011 during which multiple events competed for collective attention:
the 2012 U.S. presidential election (the word “herman”, referring to Herman Cain, a presidential election contender);
Occupy Wall Street protests (“occupy” and “protestors”);
and the U.S. holiday of Thanksgiving (“thanksgiving”).
Each of these competing narratives is reflected in the top-left inset. In the top-right inset, we focus on a time period during which the most distinct anomalous dynamics corresponded to the 2014 Gaza conflict with Israel (“gaza”, “israeli”, “palestinian”, “palestinians”, “gathered”). In Fig. 12, we also outline two periods of time: one, in the top-left panel, demonstrates the competition for social attention between geopolitical concerns:
street protests in Egypt (“protests”, “protesters”, “egypt”, “response”);
and popular artists and popular culture (“rebecca”, referring to Rebecca Black, a musician, and “@ddlovato”, referring to another musician, Demi Lovato).
In the top-right panel, we demonstrate that the most prominent dynamics during late 2015 are those of the language surrounding the 2016 U.S. presidential election immediately after Donald Trump announced his candidacy (“trump”, “sanders”, “donald”, “hillary”, “clinton”, “maine”).
We note that these social narratives uncovered by the STAR algorithm might not emerge if we used a different algorithm in an attempt to extract shocklike dynamics in sociotechnical time series. We have already shown (in the previous section) that at least one state-of-the-art anomaly detection algorithm is unlikely to detect abrupt, shocklike dynamics that occur in time series that are nonstationary or have high variance. We display side-by-side comparisons of the indicator windows generated by each algorithm for every word in the LabMT dataset in the online appendix (http://compstorylab.org/shocklets/all_word_plots/). A review of figures in the online appendix corresponding with words annotated in Figs. 11 and 12 provides evidence that an anomaly detection algorithm, such as ADV, may not necessarily capture the same dynamics as does STAR. We include selected panels of these figures in Appendix 3, displaying words corresponding with some peaks of the weighted shock and spike indicator functions. (We hasten to note that this of course does not preclude the possibility that anomaly detection algorithms might indicate dynamics that are not captured by STAR.)
Typology of local mechanistic dynamics
To further understand divergent dynamic behavior in word rank time series, we analyze regions of these time series for which Eq. (15) is satisfied—that is, where the value of the shock indicator function is greater than the sensitivity parameter. We focus on shocklike dynamics since these dynamics qualitatively describe aggregate social focusing and subsequent defocusing of attention mediated by the algorithmic substrate of the Twitter platform.
We extract shock segments from the time series of all words that made it into the top \(j = 20\) ranked shock indicator functions at least once. Since shocks exist on a wide variety of dynamic ranges and timescales, we normalize all extracted shock segments to lie on the time range \(t_{\mathrm{shock}} \in [0, 1]\) and to have (spatial) mean zero and variance unity. Shocks have a focal point about their maxima by definition, but in the context of stochastic time series (as considered here), the observed maximum of the time series may not be the “true” maximum of the hypothesized underlying deterministic dynamics. Shock points—hypothesized deterministic maxima—of the extracted shock segments were thus determined by two methods: the maxima of the within-window time series,
and the maxima of the time series’s shock indicator function. Denoting the shock indicator function of a segment by \(C_{t_{\mathrm{shock}}}\), these two shock-point estimates are \(t_{1}^{*} = \operatorname{arg\,max}_{t_{\mathrm{shock}}} x_{t_{\mathrm{shock}}}\) and \(t_{2}^{*} = \operatorname{arg\,max}_{t_{\mathrm{shock}}} C_{t_{\mathrm{shock}}}\), respectively.
We then computed empirical probability density functions of \(t_{1}^{*}\) and \(t_{2}^{*}\) across all words in the LabMT dataset. While the empirical distribution of \(t^{*}_{1}\) is unimodal, the corresponding empirical distribution of \(t^{*}_{2}\) demonstrates clear bimodality with peaks in the first and last quartiles of normalized time. To better characterize these maximum a posteriori (MAP) estimates, we sample those shock segments \(x_{t}\) whose maxima are temporally close to the MAPs and calculate spatial means of these samples,
where
The number ε is a small value which we set here to \(\varepsilon = 10 / 503\).^{Footnote 6} We plot these curves in Fig. 13. Shock segments that are close in spatial norm to the \(\langle x_{t_{ \mathrm{shock}}} \rangle _{n}\)—that is, shock segments \(x_{t_{ \mathrm{shock}}}\) that satisfy
where \(F^{\leftarrow }_{Z}(q)\) is the quantile function of the random variable Z—are plotted in thinner curves. From this process, three distinct classes of shock segments emerge, corresponding with the three relative maxima of the shock point distributions outlined above:

Type I: exhibiting a slow buildup (anticipation) followed by a fast relaxation;

Type II: exhibiting a fast buildup (shock) followed by a slow relaxation;

Type III: exhibiting a relatively symmetric shape.
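The segment normalization described in this section (rescaling each extracted segment onto \(t_{\mathrm{shock}} \in [0, 1]\) with zero spatial mean and unit variance) can be sketched as below; resampling every segment to 503 points follows footnote 6, while the use of linear interpolation for the resampling is our assumption:

```python
import numpy as np

def normalize_segment(segment, length=503):
    """Rescale an extracted shock segment onto t in [0, 1], resampled to
    a common number of points, then standardize to zero mean and unit
    variance. 503 is the length of the longest LabMT shock segment."""
    segment = np.asarray(segment, dtype=float)
    t_old = np.linspace(0.0, 1.0, num=segment.size)  # original time grid
    t_new = np.linspace(0.0, 1.0, num=length)        # common time grid
    resampled = np.interp(t_new, t_old, segment)     # linear interpolation
    return (resampled - resampled.mean()) / resampled.std()
```

Segments normalized this way can be compared, averaged, and binned by shock point regardless of their original duration or amplitude.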
Words corresponding to these classes of shock segments differ in semantic context. Type I dynamics are related to known and anticipated societal and political events and subjects, such as:
“hampshire” and “republican”, concerning U.S. presidential primaries and general elections,
“labor”, “labour”, and “conservatives”, likely concerning U.K. general elections,
“voter”, “elected”, and “ballot”, concerning voting in general, and
“grammy”, the music awards show.
In contrast, Type II (shocklike) dynamics describe events that are partially or entirely unexpected, often in the context of national or international crises, such as:
“tsunami” and “radiation”, relating to the Fukushima Daiichi tsunami and nuclear meltdown,
“bombing”, “gun”, “pulse”, “killings”, and “connecticut”, concerning acts of violence and mass shootings, in particular the Sandy Hook elementary school shooting in the United States;
“jill” (Jill Stein, a 2016 U.S. presidential election competitor), “ethics”, and “fbi”, pertaining to surprising events surrounding the 2016 U.S. presidential election, and
“turkish”, “army”, “israeli”, “civilian”, and “holocaust”, concerning international protests, conflicts, and coups.
Type III dynamics are associated with anticipated events that typically reoccur and are discussed substantially after their passing, such as
“sleigh”, “xmas”, “wrapping”, “rudolph”, “memorial”, “costumes”, “costume”, “veterans”, and “bunny”, having to do with major holidays, and
“olympic” and “olympics”, relating to the Olympic games.
We give a full list of words satisfying the criteria given in Eqs. (22) and (23) in Table 1. We note that, though the above discussion defines and distinguishes three fundamental signatures of word rank shock segments, these classes are only the MAP estimates of the true distributions of shock segments, our empirical observations of which are displayed as histograms in Fig. 13; there is an effective continuum of dynamics that is richer, but more complicated, than our parsimonious description here.
Discussion
We have introduced a nonparametric pattern detection method, termed the discrete shocklet transform (DST) for its particular application in extracting shock and shocklike dynamics from noisy time series, and demonstrated its particular suitability for the analysis of sociotechnical data. Though extracted social dynamics display a continuum of behaviors, we have shown that maximum a posteriori estimates of shock likelihood fall into three distinct classes of dynamics: anticipatory dynamics with long buildups and quick relaxations, such as political contests (Type I); “surprising” events with fast (shocklike) buildups and long relaxation times, examples of which are acts of violence, natural disasters, and mass shootings (Type II); and quasi-symmetric dynamics, corresponding with anticipated and talked-about events such as holidays and major sporting events (Type III). We analyzed the most “important” shocklike dynamics—those words that were among the top-20 most significant at least once during the decade of study—and found that Type III dynamics were the most common among these words (40.9%), followed by Type II (36.4%) and Type I (22.7%). We then showcased the discrete shocklet transform’s effectiveness in extracting coherent intertemporal narratives from word usage data on the social microblog Twitter, developing a graphical methodology for examining meaningful fluctuations in word—and hence topic—popularity. We used this methodology to create document-free nonparametric topic models, represented by pruned networks based on shock indicator similarity between pairs of words, with topics defined by the networks’ community structures. This construction, while retaining artifacts from its construction using intrinsically temporal data, presents topics possessing qualitatively sensible semantic structure.
There are several areas in which future work could improve on and extend that presented here. Though we have shown that the discrete shocklet transform is a useful tool for understanding nonstationary local behavior when applied to a variety of sociotechnical time series, there is reason to suspect that one can generalize this method to essentially any kind of noisy time series in which mechanistic local dynamics can be hypothesized to contribute a substantial component to the overall signal. In addition, the DST suffers from noncausality, as do all convolution- or frequency-space transforms. In order to compute an accurate transformed signal at time t, information about time \(t + \tau \) must be known to avoid edge effects or spectral effects such as ringing. In practice this may not be an impediment to the DST’s usage, since, empirically, the transform still finds “important” local dynamics, as shown in Fig. 11 near the very beginning (the words “occupy” and “slumdog” are annotated) and the end (the words “stormy” and “cohen” are annotated) of the time studied. Furthermore, when used with more frequently-sampled data, the lag needed to avoid edge effects may have decreasing length relative to the longer timescale over which users interact with the data. However, to avoid the problem of edge effects entirely, it may be possible to train a supervised learning algorithm to learn the output of the DST at time t using only past (and possibly present) data. The DST could also serve as a useful counterpart to phrase- and sentence-tracking algorithms such as MemeTracker [93, 94]. Instead of applying the DST to time series of simple words, one could apply it to arbitrary n-grams (including whole sentences) or sentence-structure pattern matches to uncover the frequency of usage of verb tenses, passive/active voice construction, and other higher-order natural language constructs.
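The noncausality issue discussed above can be illustrated with a simple moving average (a stand-in for any convolution-based transform, not the DST itself): a centered kernel requires future samples at every time step, while a shifted, causal variant uses only present and past samples at the cost of a lag. A minimal sketch:

```python
import numpy as np

x = np.arange(10.0)          # toy signal
kernel = np.ones(3) / 3.0    # 3-point moving-average kernel

# Centered (non-causal) moving average: the value at time t depends on
# samples at t-1, t, and t+1, so future information is required.
centered = np.convolve(x, kernel, mode="same")

# Causal variant: the value at time t depends only on samples at
# t-2, t-1, and t -- no future information, but the output lags.
causal = np.convolve(x, kernel, mode="full")[: len(x)]
```

Here `centered[5]` averages samples 4, 5, and 6 (value 5.0), while `causal[5]` averages samples 3, 4, and 5 (value 4.0): the causal output is a lagged copy of the centered one in the signal's interior.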
Other work could apply the DST to more and different natural language data sources or other sociotechnical time series, such as asset prices, economic indicators, and election polls.
Notes
 1.
Python implementations of the DST and STAR algorithms are located at this git repository: https://gitlab.com/compstorylab/discreteshocklettransform.
 2.
Bayesian classification algorithms can perform classification based only on prior information, but this is also not similar to the STAR algorithm, since the STAR algorithm is a maximum-likelihood method that by definition requires at least one time series to operate.
 3.
 4.
Data is available at https://fred.stlouisfed.org/series/RECPROUSM156N.
 5.
 6.
This value comes from an arbitrary but small number of indices (five) we allow a shock segment to vary (±) about the index of the MAP estimate of the distributions of shock points, each of which can be considered as multinomial distributions supported on a 503dimensional vector space. The number 503 is the dimension of each shock segment after time normalization since the longest original shock segment in the labMT dataset was 503 days.
 7.
The dataset is available for purchase from Twitter at http://support.gnip.com/apis/firehose/overview.html. The on-disk memory statistic is the result of du -h <dirname> | tail -n 1 on the authors’ computing cluster and so may vary by machine or storage system.
Abbreviations
 ADV: anomaly detection vector, Twitter’s anomaly detection algorithm
 DST: Discrete Shocklet Transform, the analytical method at the core of this article
 STAR: Shocklet Transform And Ranking algorithm, a method that uses the DST to provide qualitative, shapebased similarity search in multivariate time series
 DWT: Discrete Wavelet Transform, a transform that performs a task similar to, yet fundamentally different from, that performed by the DST
 LabMT: Lab Mechanical Turk dataset, a set of over 10,000 words used here and in multiple studies of human language and behavior
 MAP: maximum a posteriori, the estimated maximum of a posterior distribution
 WSIF: weighted shock indicator function, one of the outputs of the STAR algorithm
References
 1.
Chaovalit P, Gangopadhyay A, Karabatis G, Chen Z (2011) Discrete wavelet transformbased time series analysis and mining. ACM Comput Surv (CSUR) 43(2):6
 2.
Yeh CCM, Kavantzas N, Keogh E (2017) Matrix profile vi: meaningful multidimensional motif discovery. In: 2017 IEEE international conference on data mining (ICDM). IEEE Press, New York, pp 565–574
 3.
Zhu Y, Imamura M, Nikovski D, Keogh E (2018) Introducing time series chains: a new primitive for time series data mining. Knowl Inf Syst: 1–27
 4.
Struzik ZR, Siebes AP (2002) Wavelet transform based multifractal formalism in outlier detection and localisation for financial time series. Phys A, Stat Mech Appl 309(3–4):388–402
 5.
Popivanov I, Miller RJ (2002) Similarity search over timeseries data using wavelets. In: Proceedings 18th international conference on data engineering. IEEE Press, New York, pp 212–221
 6.
Lau KM, Weng H (1995) Climate signal detection using wavelet transform: how to make a time series sing. Bull Am Meteorol Soc 76(12):2391–2402
 7.
Whitcher B, Byers SD, Guttorp P, Percival DB (2002) Testing for homogeneity of variance in time series: long memory, wavelets, and the Nile river. Water Resour Res 38(5)
 8.
Benítez R, Bolós V, Ramírez M (2010) A waveletbased tool for studying nonperiodicity. Comput Math Appl 60(3):634–641
 9.
Mann S, Haykin S (1991) The chirplet transform: a generalization of Gabor’s logon transform. In: Vision interface, vol 91, pp 205–212
 10.
Wang G, Xia XG, Root BT, Chen VC (2002) Moving target detection in overthehorizon radar using adaptive chirplet transform. In: Proceedings of the 2002 IEEE radar conference (IEEE cat. no. 02CH37322). IEEE Press, New York, pp 77–84
 11.
Spanos P, Giaralis A, Politis N (2007) Time–frequency representation of earthquake accelerograms and inelastic structural response records using the adaptive chirplet decomposition and empirical mode decomposition. Soil Dyn Earthq Eng 27(7):675–689
 12.
Taebi A, Mansy H (2016) Effect of noise on timefrequency analysis of vibrocardiographic signals. J Bioeng & Biomed Sci 6(4)
 13.
Page E (1955) A test for a change in a parameter occurring at an unknown point. Biometrika 42(3/4):523–527
 14.
Mallat S, Hwang WL (1992) Singularity detection and processing with wavelets. IEEE Trans Inf Theory 38(2):617–643
 15.
Dodds PS, Harris KD, Kloumann IM, Bliss CA, Danforth CM (2011) Temporal patterns of happiness and information in a global social network: hedonometrics and Twitter. PLoS ONE 6(12):26752
 16.
Li Q, Shah S, Thomas M, Anderson K, Liu X, Nourbakhsh A, Fang R (2017) How much data do you need? Twitter decahose data analysis
 17.
Reagan AJ, Danforth CM, Tivnan B, Williams JR, Dodds PS (2017) Sentiment analysis methods for understanding largescale texts: a case for using continuumscored words and word shift graphs. EPJ Data Sci 6(1):28
 18.
Reece AG, Reagan AJ, Lix KL, Dodds PS, Danforth CM, Langer EJ (2017) Forecasting the onset and course of mental illness with Twitter data. Sci Rep 7(1):13006
 19.
Frank MR, Mitchell L, Dodds PS, Danforth CM (2013) Happiness and the patterns of life: a study of geolocated tweets. Sci Rep 3:2625
 20.
Mitchell L, Frank MR, Harris KD, Dodds PS, Danforth CM (2013) The geography of happiness: connecting Twitter sentiment and expression, demographics, and objective characteristics of place. PLoS ONE 8(5):64417
 21.
Lemahieu R, Van Canneyt S, De Boom C, Dhoedt B (2015) Optimizing the popularity of Twitter messages through user categories. In: 2015 IEEE international conference on data mining workshop (ICDMW). IEEE Press, New York, pp 1396–1401
 22.
Wu F, Huberman BA (2007) Novelty and collective attention. Proc Natl Acad Sci 104(45):17599–17601
 23.
Candia C, JaraFigueroa C, RodriguezSickert C, Barabási AL, Hidalgo CA (2019) The universal decay of collective memory and attention. Nat Hum Behav 3(1):82
 24.
Crane R, Sornette D (2008) Robust dynamic classes revealed by measuring the response function of a social system. Proc Natl Acad Sci 105(41):15649–15653
 25.
LorenzSpreen P, Mønsted BM, Hövel P, Lehmann S (2019) Accelerating dynamics of collective attention. Nat Commun 10(1):1759
 26.
De Domenico M, Altmann EG (2019) Unraveling the origin of social bursts in collective attention. arXiv preprint. arXiv:1903.06588
 27.
Ierley G, Kostinski A (2019) A universal rankorder transform to extract signals from noisy data. arXiv preprint. arXiv:1906.08729
 28.
Nakamoto S, et al. (2008) Bitcoin: a peertopeer electronic cash system
 29.
Al Shehhi A, Oudah M, Aung Z (2014) Investigating factors behind choosing a cryptocurrency. In: 2014 IEEE international conference on industrial engineering and engineering management. IEEE Press, New York, pp 1443–1447
 30.
Lin J, Keogh E, Wei L, Lonardi S (2007) Experiencing sax: a novel symbolic representation of time series. Data Min Knowl Discov 15(2):107–144
 31.
Yang K, Shahabi C (2007) An efficient k nearest neighbor search for multivariate time series. Inf Comput 205(1):65–98
 32.
Kale DC, Gong D, Che Z, Liu Y, Medioni G, Wetzel R, Ross P (2014) An examination of multivariate time series hashing with applications to health care. In: 2014 IEEE international conference on data mining. IEEE Press, New York, pp 260–269
33. Driemel A, Silvestri F (2017) Locality-sensitive hashing of curves. arXiv preprint. arXiv:1703.04040
34. Keogh EJ, Pazzani MJ (2000) A simple dimensionality reduction technique for fast similarity search in large time-series databases. In: Pacific-Asia conference on knowledge discovery and data mining. Springer, Berlin, pp 122–133
35. Wu YL, Agrawal D, El Abbadi A (2000) A comparison of DFT and DWT based similarity search in time-series databases. In: Proceedings of the ninth international conference on information and knowledge management. ACM, New York, pp 488–495
36. Chan FP, Fu AC, Yu C (2003) Haar wavelets for efficient similarity search of time-series: with and without time warping. IEEE Trans Knowl Data Eng 15(3):686–705
37. Ratanamahatana C, Keogh E, Bagnall AJ, Lonardi S (2005) A novel bit level time series representation with implication of similarity search and clustering. In: Pacific-Asia conference on knowledge discovery and data mining. Springer, Berlin, pp 771–777
38. Keogh E, Lin J, Fu A (2005) HOT SAX: efficiently finding the most unusual time series subsequence. In: Fifth IEEE international conference on data mining (ICDM'05). IEEE Press, New York, p 8
39. Yeh CCM, Zhu Y, Ulanova L, Begum N, Ding Y, Dau HA, Silva DF, Mueen A, Keogh E (2016) Matrix profile I: all pairs similarity joins for time series: a unifying view that includes motifs, discords and shapelets. In: 2016 IEEE 16th international conference on data mining (ICDM). IEEE Press, New York, pp 1317–1322
40. Eastman JR, Fulk M (1993) Long sequence time series evaluation using standardized principal components. Photogramm Eng Remote Sens 59(6)
41. Harris D (1997) Principal components analysis of cointegrated time series. Econom Theory 13(4):529–557
42. Lansangan JRG, Barrios EB (2009) Principal components analysis of nonstationary time series data. Stat Comput 19(2):173
43. Mueen A, Viswanathan K, Gupta C, Keogh E (2017) The fastest similarity search algorithm for time series subsequences under Euclidean distance
44. Seref O, Fan YJ, Chaovalitwongse WA (2013) Mathematical programming formulations and algorithms for discrete k-median clustering of time-series data. INFORMS J Comput 26(1):160–172
45. Vlachos M, Lin J, Keogh E, Gunopulos D (2003) A wavelet-based anytime algorithm for k-means clustering of time series. In: Proc. workshop on clustering high dimensionality data and its applications. Citeseer
46. Goutte C, Toft P, Rostrup E, Nielsen F, Hansen LK (1999) On clustering fMRI time series. NeuroImage 9(3):298–310
47. Jiang D, Pei J, Zhang A (2003) DHC: a density-based hierarchical clustering method for time series gene expression data. In: Third IEEE symposium on bioinformatics and bioengineering, 2003. Proceedings. IEEE Press, New York, pp 393–400
48. Rodrigues PP, Gama J, Pedroso JP (2006) ODAC: hierarchical clustering of time series data streams. In: Proceedings of the 2006 SIAM international conference on data mining. SIAM, Philadelphia, pp 499–503
49. Rodrigues PP, Gama J, Pedroso J (2008) Hierarchical clustering of time-series data streams. IEEE Trans Knowl Data Eng 20(5):615–627
50. Denton A (2005) Kernel-density-based clustering of time series subsequences using a continuous random-walk noise model. In: Fifth IEEE international conference on data mining (ICDM'05). IEEE Press, New York, p 8
51. Birant D, Kut A (2007) ST-DBSCAN: an algorithm for clustering spatial–temporal data. Data Knowl Eng 60(1):208–221
52. Çelik M, Dadaşer-Çelik F, Dokuz AŞ (2011) Anomaly detection in temperature data using DBSCAN algorithm. In: 2011 international symposium on innovations in intelligent systems and applications. IEEE Press, New York, pp 91–95
53. Kumar M, Patel NR, Woo J (2002) Clustering seasonality patterns in the presence of errors. In: Proceedings of the eighth ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York, pp 557–563
54. Oates T, Firoiu L, Cohen PR (1999) Clustering time series with hidden Markov models and dynamic time warping. In: Proceedings of the IJCAI-99 workshop on neural, symbolic and reinforcement learning methods for sequence learning, pp 17–21. Citeseer
55. Schreiber T, Schmitz A (1997) Classification of time series data with nonlinear similarity measures. Phys Rev Lett 79(8):1475
56. Kalpakis K, Gada D, Puttagunta V (2001) Distance measures for effective clustering of ARIMA time-series. In: Proceedings 2001 IEEE international conference on data mining. IEEE Press, New York, pp 273–280
57. Bagnall AJ, Janacek GJ (2004) Clustering time series from ARMA models with clipped data. In: Proceedings of the tenth ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York, pp 49–58
58. Xiong Y, Yeung DY (2004) Time series clustering with ARMA mixtures. Pattern Recognit 37(8):1675–1689
59. Fröhwirth-Schnatter S, Kaufmann S (2008) Model-based clustering of multiple time series. J Bus Econ Stat 26(1):78–89
60. Keogh E, Lin J (2005) Clustering of time-series subsequences is meaningless: implications for previous and future research. Knowl Inf Syst 8(2):154–177
61. Bagnall A, Lines J, Bostrom A, Large J, Keogh E (2017) The great time series classification bake off: a review and experimental evaluation of recent algorithmic advances. Data Min Knowl Discov 31(3):606–660
62. Gilbert AC, Kotidis Y, Muthukrishnan S, Strauss M (2001) Surfing wavelets on streams: one-pass summaries for approximate aggregate queries. In: VLDB, vol 1, pp 79–88
63. Ahmad S, Taskaya-Temizel T, Ahmad K (2004) Summarizing time series: learning patterns in 'volatile' series. In: International conference on intelligent data engineering and automated learning. Springer, Berlin, pp 523–532
64. Castillo-Ortega R, Marín N, Sánchez D, Tettamanzi AG (2011) A multi-objective memetic algorithm for the linguistic summarization of time series. In: Proceedings of the 13th annual conference companion on genetic and evolutionary computation. ACM, New York, pp 171–172
65. Castillo-Ortega R, Marín N, Sánchez D, Tettamanzi AG (2011) Linguistic summarization of time series data using genetic algorithms. In: EUSFLAT, vol 1. Atlantis Press, pp 416–423
66. Kacprzyk J, Wilbik A, Zadrożny S (2007) Linguistic summarization of time series under different granulation of describing features. In: International conference on rough sets and intelligent systems paradigms. Springer, Berlin, pp 230–240
67. Kacprzyk J, Wilbik A, Zadrożny S (2008) Linguistic summarization of time series using a fuzzy quantifier driven aggregation. Fuzzy Sets Syst 159(12):1485–1499
68. Kacprzyk J, Wilbik A, Zadrożny S (2010) An approach to the linguistic summarization of time series using a fuzzy quantifier driven aggregation. Int J Intell Syst 25(5):411–439
69. Li L, McCann J, Pollard NS, Faloutsos C (2009) DynaMMo: mining and summarization of coevolving sequences with missing values. In: Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York, pp 507–516
70. Chandola V, Banerjee A, Kumar V (2009) Anomaly detection: a survey. ACM Comput Surv 41(3):15
71. Gbur G, Visser T, Wolf E (2001) Anomalous behavior of spectra near phase singularities of focused waves. Phys Rev Lett 88(1):013901
72. Plerou V, Gopikrishnan P, Amaral LAN, Gabaix X, Stanley HE (2000) Economic fluctuations and anomalous diffusion. Phys Rev E 62(3):3023
73. Jeon JH, Tejedor V, Burov S, Barkai E, Selhuber-Unkel C, Berg-Sørensen K, Oddershede L, Metzler R (2011) In vivo anomalous diffusion and weak ergodicity breaking of lipid granules. Phys Rev Lett 106(4):048103
74. Palfrey TR, Prisbrey JE (1997) Anomalous behavior in public goods experiments: how much and why? Am Econ Rev:829–846
75. Capra CM, Goeree JK, Gomez R, Holt CA (1999) Anomalous behavior in a traveler's dilemma? Am Econ Rev 89(3):678–690
76. Rosner B (1983) Percentage points for a generalized ESD many-outlier procedure. Technometrics 25(2):165–172
77. Vallis O, Hochenbaum J, Kejariwal A (2014) A novel technique for long-term anomaly detection in the cloud. In: 6th USENIX workshop on hot topics in cloud computing (HotCloud 14)
78. Laptev N, Amizadeh S, Flint I (2015) Generic and scalable framework for automated time-series anomaly detection. In: Proceedings of the 21st ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York, pp 1939–1947
79. Chan PK, Mahoney MV (2005) Modeling multiple time series for anomaly detection. In: Fifth IEEE international conference on data mining (ICDM'05). IEEE Press, New York, p 8
80. Cheng H, Tan PN, Potter C, Klooster S (2009) Detection and characterization of anomalies in multivariate time series. In: Proceedings of the 2009 SIAM international conference on data mining. SIAM, Philadelphia, pp 413–424
81. Qiu H, Liu Y, Subrahmanya NA, Li W (2012) Granger causality for time-series anomaly detection. In: 2012 IEEE 12th international conference on data mining. IEEE Press, New York, pp 1074–1079
82. Akouemo HN, Povinelli RJ (2016) Probabilistic anomaly detection in natural gas time series data. Int J Forecast 32(3):948–956
83. Chauvet M (1998) An econometric characterization of business cycle dynamics with factor structure and regime switching. Int Econ Rev:969–996
84. Dueker M (2005) Dynamic forecasts of qualitative variables: a Qual VAR model of US recessions. J Bus Econ Stat 23(1):96–104
85. Österholm P (2012) The limited usefulness of macroeconomic Bayesian VARs when estimating the probability of a US recession. J Macroecon 34(1):76–86
86. Hamilton JD, Lin G (1996) Stock market volatility and the business cycle. J Appl Econom 11(5):573–593
87. Estrella A, Mishkin FS (1998) Predicting US recessions: financial variables as leading indicators. Rev Econ Stat 80(1):45–61
88. Qi M (2001) Predicting US recessions with leading indicators via neural network models. Int J Forecast 17(3):383–401
89. Berge TJ (2015) Predicting recessions with leading indicators: model averaging and selection over the business cycle. J Forecast 34(6):455–471
90. O'Hagan A, Leonard T (1976) Bayes estimation subject to uncertainty about parameter constraints. Biometrika 63(1):201–203
91. Cramer JS (1987) Mean and variance of R2 in small and moderate samples. J Econom 35(2–3):253–266
92. Carrodus ML, Giles DE (1992) The exact distribution of R2 when the regression disturbances are autocorrelated. Econ Lett 38(4):375–380
93. Kleinberg J (2003) Bursty and hierarchical structure in streams. Data Min Knowl Discov 7(4):373–397
94. Leskovec J, Backstrom L, Kleinberg J (2009) Meme-tracking and the dynamics of the news cycle. In: Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York, pp 497–506
95. Blei DM, Ng AY, Jordan MI (2003) Latent Dirichlet allocation. J Mach Learn Res 3:993–1022
96. Dou W, Wang X, Chang R, Ribarsky W (2011) ParallelTopics: a probabilistic approach to exploring document collections. In: 2011 IEEE conference on visual analytics science and technology (VAST). IEEE Press, New York, pp 231–240
97. Serrano MÁ, Boguná M, Vespignani A (2009) Extracting the multiscale backbone of complex weighted networks. Proc Natl Acad Sci 106(16):6483–6488
98. Clauset A, Newman ME, Moore C (2004) Finding community structure in very large networks. Phys Rev E 70(6):066111
Acknowledgements
The authors acknowledge the computing resources provided by the Vermont Advanced Computing Core and financial support from the Massachusetts Mutual Life Insurance Company, and are grateful for web hosting assistance from Kelly Gothard and useful conversations with Jane Adams and Colin Van Oort.
Availability of data and materials
The datasets analysed during the current study and our source code are available in the https://gitlab.com/compstorylab/discreteshocklettransform repository.
Funding
The authors acknowledge financial support from NSF Big Data Grant #1447634 and MassMutual Life Insurance.
Author information
Affiliations
Contributions
DRD, DK, CMD, and PSD conceived of the idea; DRD and DK developed the theory; DRD, TA, MVA, and JRM analyzed data; DRD, TA, MVA, JRM, and PSD wrote the paper; DRD, TA, MVA, JRM, CMD, and PSD edited the paper. All authors read and approved the final manuscript.
Corresponding author
Correspondence to David Rushing Dewhurst.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix 1: Statistical details
In this appendix we outline some statistical details of the DST and the STAR algorithm that are not necessary for a qualitative understanding of either, but that may be useful for more in-depth study or for efforts to generalize them.
We first give an illustrative example of how a sociotechnical time series can differ substantially from two null models of time series with some similar statistical properties, displayed in Fig. 14 (a more information-rich version of Fig. 5, displayed in the main body), panels (A) and (B). In panel (A), the red curve displays an example sociotechnical time series, the usage rank of the word "bling" within the LabMT subset of words on Twitter (denoted \(r_{t}\)), along with \(\sigma r_{t}\), a randomly shuffled version of this time series. Here \(\sigma \in \mathcal{S}_{T}\), where \(\mathcal{S}_{T}\) is the symmetric group on T elements, and σ is drawn from the uniform distribution over \(\mathcal{S}_{T}\). It is immediately apparent that \(r_{t}\) and \(\sigma r_{t}\) differ radically in autocorrelation structure (both in levels and in differences), and we do not investigate this admittedly naïve null model any further.
We next consider a random walk null model constructed as follows: first-differencing \(r_{t}\) to obtain \(\Delta r_{t} = r_{t} - r_{t-1}\), we apply random elements \(\sigma _{i} \in \mathcal{S}_{T}\) to the differences and integrate, displaying the resulting \(r_{\sigma _{i},t}= \sum_{t' \leq t}(\sigma _{i}\Delta r)_{t'}\) in panel (C) of Fig. 14. Visual inspection (i.e., the "eye test") demonstrates that these time series do not replicate the behavior displayed by the original \(r_{t}\): they become negative, have a dynamic range almost an order of magnitude larger, and are more highly autocorrelated. We contrast the results of the DST on \(r_{t}\) and on draws from this random walk null model in panels (D)–(G) of Fig. 14. In panel (D) we display the DST of \(r_{t}\), while in panels (E)–(G) we display the DSTs of three random \(\sigma _{i} r_{t}\). The DSTs of the draws from the random walk model are more irregular than the DST of \(r_{t}\), displaying many time-domain fluctuations between large positive and large negative values. In contrast, the DST of \(r_{t}\) is relatively constant except near August 2015, where it exhibits a large positive fluctuation across a wide range of W. The underlying dynamics of this fluctuation were driven by the release of a popular song called "Hotline Bling" on July 31st, 2015.
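For concreteness, the two null models can be constructed in a few lines. The sketch below uses a synthetic stand-in for \(r_{t}\); variable names are ours, not the authors':

```python
import numpy as np

rng = np.random.default_rng(0)
T = 500
r = np.cumsum(rng.normal(size=T)) + 100.0  # synthetic stand-in for a rank series r_t

# Null model 1: a uniformly random permutation sigma applied to the series itself
sigma_r = rng.permutation(r)

# Null model 2: shuffle the first differences, then re-integrate,
# anchoring at the observed initial value
dr = np.diff(r)
r_null = np.concatenate(([r[0]], r[0] + np.cumsum(rng.permutation(dr))))
```

Note that the second null model preserves the net change of the series exactly, since the sum of the differences is invariant under permutation; it is the intermediate path, not the endpoints, that differs from the original.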
As a counterpoint to the DST, we computed the discrete wavelet transform (DWT) of \(r_{t}\) and of the same \(\sigma _{i} r_{t}\). We computed the wavelet transform using the Ricker (Mexican hat) wavelet, \( \psi (t) = \frac{2}{\sqrt{3\sigma }\,\pi ^{1/4}} \left(1 - \frac{t^{2}}{\sigma ^{2}}\right) e^{-t^{2}/(2\sigma ^{2})}. \)
We chose to compare the DST with the DWT because the two transforms are similar in many respects: both depend on two parameters (a location parameter τ and a scale parameter W), and both output a matrix with \(N_{W}\) rows (one for each value of W) and T columns (one for each value of τ). There are some key differences between the transforms, however. The kernels of the wavelet transform (the wavelets themselves) have properties not shared by our shock-like kernels: wavelets \(\psi (t)\) are defined on all of \(\mathbb{R}\), satisfy \(\lim_{t \rightarrow \pm \infty } \psi (t) = 0\), and are orthonormal. Our shock-like kernels satisfy none of these properties: they are defined on a finite interval \([-W/2, W/2]\), do not vanish at the endpoints of this interval, and are not orthogonal functions. Hence, differences between the DST and the DWT of a time series are due primarily to the choice of convolution function: a shock-like kernel in the case of the DST and a wavelet in the case of the DWT. We display the DWT of \(r_{t}\) and of the same \(\sigma _{i} r_{t}\) in panels (H)–(K) of Fig. 14. Comparing these transforms with the DSTs displayed in panels (D)–(G), we see that the DST has better time localization than the DWT in intervals during which the time series exhibits shock-like dynamics.
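A minimal implementation of the Ricker-wavelet transform used in this comparison might look as follows. This is a generic continuous-wavelet-transform sketch (the width grid and truncation of the wavelet's support are illustrative choices, not the authors' exact configuration):

```python
import numpy as np

def ricker(t, sigma):
    """Ricker ('Mexican hat') wavelet: the negative normalized second
    derivative of a Gaussian with scale sigma."""
    a = 2.0 / (np.sqrt(3.0 * sigma) * np.pi ** 0.25)
    return a * (1.0 - (t / sigma) ** 2) * np.exp(-t ** 2 / (2.0 * sigma ** 2))

def cwt_ricker(x, widths):
    """Wavelet transform of x: one row per width W, one column per
    location tau -- the same N_W x T layout as the DST."""
    out = np.zeros((len(widths), len(x)))
    for i, w in enumerate(widths):
        t = np.arange(-5 * w, 5 * w + 1)  # wavelet decays fast; truncate support
        # the Ricker wavelet is symmetric, so convolution equals cross-correlation
        out[i] = np.convolve(x, ricker(t, w), mode="same")
    return out

x = np.sin(np.linspace(0, 8 * np.pi, 256))
transform = cwt_ricker(x, widths=[2, 4, 8, 16])
```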
As we note in Sect. 2.1 (there when comparing STAR to Twitter's ADV anomaly detection algorithm), this observation should not be construed as a claim that the DST is in some way superior to the DWT or should supersede it for general time series processing tasks; rather, it is evidence that the DST is better suited than the DWT to finding shock-like dynamics in sociotechnical time series, a task for which it was designed and the DWT was not.
We finally note an analytical property of the DST that, while likely not useful in practice, should be recorded and may prove useful in constructing theoretical extensions of the DST. The DST is defined in Eq. (11), which we reproduce here for ease of reference:

\( C_{\mathcal{K}^{(\cdot )}}\{x\}(t, W \mid \theta ) = \sum_{\tau = -W/2 + t}^{W/2 + t} x(\tau )\, \mathcal{K}^{(\cdot )}(\tau \mid W, \theta ), \)
defined for each t. The function \(\mathcal{K}^{(\cdot )}\) is the shock kernel, which is nonzero only on \(\tau \in [-W/2 + t, W/2 + t]\). For \(t \in [-T, T]\), this can be rewritten equivalently as

\( \mathbf{C}_{\mathcal{K}^{(\cdot )}}(W \mid \theta ) = \mathbf{K}(W \mid \theta )\, \mathbf{x}, \)
where \(\mathbf{K}(W \mid \theta )\) is a \((2T + 1) \times (2T + 1)\) W-diagonal (banded) matrix, \(\mathbf{C}_{\mathcal{K}^{(\cdot )}}(W \mid \theta )\) is the Wth row of the cusplet transform matrix, and \(\mathbf{x}\) is the entire time series \(x(t)\) considered as a vector in \(\mathbb{R}^{2T + 1}\). The matrix \(\mathbf{K}(W \mid \theta )\) is simply the convolution matrix corresponding to the cross-correlation operation with \(\mathcal{K}^{(\cdot )}\). If \(\mathbf{K}(W \mid \theta )\) is invertible, then it is clear that

\( \mathbf{x} = \mathbf{K}(W \mid \theta )^{-1}\, \mathbf{C}_{\mathcal{K}^{(\cdot )}}(W \mid \theta ) \)
for any \(1 < W < T\), and hence also

\( \mathbf{x} = \frac{1}{T - 2} \sum_{W = 2}^{T - 1} \mathbf{K}(W \mid \theta )^{-1}\, \mathbf{C}_{\mathcal{K}^{(\cdot )}}(W \mid \theta ). \)
This is an inversion formula similar to the inversion formulae of overcomplete transforms such as the DWT and discrete chirplet transform.
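The finite-dimensional inversion formula can be checked numerically. The sketch below uses a toy exponentially decaying kernel, chosen so that the cross-correlation matrix \(\mathbf{K}\) is strictly diagonally dominant and hence invertible; the paper's cusp-like kernels carry no such guarantee, and the circular boundary handling is an illustrative simplification:

```python
import numpy as np

rng = np.random.default_rng(7)
T, W = 64, 9
x = rng.normal(size=T)

# toy kernel of width W; its fast decay makes the matrix K strictly
# diagonally dominant, hence invertible (not true of cusp-like kernels)
tau = np.arange(-(W // 2), W // 2 + 1)
kernel = np.exp(-2.0 * np.abs(tau))

# K: row t holds the kernel centered at time t (circular boundary)
K = np.zeros((T, T))
for t in range(T):
    K[t, (t + tau) % T] = kernel

C_row = K @ x                      # one row of the transform matrix (fixed W)
x_rec = np.linalg.solve(K, C_row)  # the inversion formula x = K^{-1} C
```

For this well-conditioned toy kernel, `x_rec` recovers `x` to machine precision.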
When \(T \rightarrow \infty \) (that is, when the signal \(x(t)\) is turned on in the infinite past and continues into the infinite future), this equation becomes the formal operator equation

\( \mathbf{C}_{\mathcal{K}^{(\cdot )}}(W \mid \theta ) = \mathbf{K}(W \mid \theta )\, x, \)
and hence (as long as the operator inverses are well-defined),

\( x = \mathbf{K}(W \mid \theta )^{-1}\, \mathbf{C}_{\mathcal{K}^{(\cdot )}}(W \mid \theta ). \)
These inversion formulae are, in our estimation, of little practical utility. Whereas inverting a wavelet transform is a common task (for example, decompressing an image compressed with the JPEG 2000 algorithm, which uses the wavelet transform for compact representation), we estimate the probability of being presented with an arbitrary shocklet transform and needing to recover the original signal from it to be quite low; the shocklet transform is designed to amplify features of signals to which we already have access, not to recreate time-domain signals from their representations in other domains.
Appendix 2: Documentfree topic networks
An important application of the DST is the partial recovery of context- or document-dependent information from aggregated time series data. In natural language processing, many models of human language are statistical in nature and require original documents from which to infer parameter values and perform estimation [95, 96]. However, such corpora can be expensive to purchase and require a large amount of physical storage. For example, the tweet corpus from which the labMT rank dataset used throughout this paper was derived is not inexpensive and requires approximately 55 TB of disk space for storage.^{Footnote 7} In contrast, the dataset used here is derived from the freely available labMT word set and is less than 400 MB in size. If topics of comparable quality can be extracted from this smaller and less expensive dataset, the potential utility to the scientific community at large could be high.
We demonstrate that a reasonable topic model for Twitter during the time period of study can be inferred from the panel of rank time series alone. This is accomplished via a multistep meta-algorithm. First, the weighted shock indicator function \(R_{i}\) is calculated for each word i. At each time step t, words are sorted by their indicator values, the top k words are taken, and these words are linked pairwise, yielding at most \(\binom{k}{2}\) additional edges in the network; if an edge between words i and j already exists, its weight is incremented by the mean \(\frac{R_{i,t} + R_{j,t}}{2}\) of the two words' indicator values. Performing this process for all time steps results in a weighted network of related words. The weights \(w_{ij} = \sum_{t} \frac{R_{i,t} + R_{j,t}}{2}\) are large when a word's weighted shock indicator function takes large values or when a pair of words appears together in the top k frequently, even if their indicator values are never especially large. The resulting network can be large; to reduce its size, its backbone is extracted using the method of Serrano et al. [97] and further pruned by retaining only those nodes and edges whose edge weights are at or above the pth percentile of all weights in the backboned network. Topics are associated with communities in the resulting pruned network, found using the modularity algorithm of Clauset et al. [98].
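The edge-accumulation step of this meta-algorithm can be sketched as follows. The indicator values here are synthetic and the variable names are ours; the backbone-extraction step of Serrano et al. [97] and the community detection of Clauset et al. [98] are omitted:

```python
from collections import defaultdict
from itertools import combinations
import numpy as np

rng = np.random.default_rng(1)
words = [f"w{i}" for i in range(50)]
T, k, p = 200, 5, 50
# R[t, i]: weighted shock indicator of word i at time t (synthetic here)
R = rng.random((T, len(words)))

weights = defaultdict(float)
for t in range(T):
    top_k = np.argsort(R[t])[-k:]  # indices of the k largest indicators at time t
    for i, j in combinations(sorted(top_k), 2):
        # increment the pairwise edge by the mean of the two indicator values
        weights[(words[i], words[j])] += (R[t, i] + R[t, j]) / 2.0

# prune: keep only edges at or above the p-th percentile of all edge weights
cutoff = np.percentile(list(weights.values()), p)
edges = {e: w for e, w in weights.items() if w >= cutoff}
```

In the full pipeline, backbone extraction would be applied before the percentile pruning, and topics would then be read off as modularity communities of the pruned network.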
Figures 15 and 16 display the results of this procedure for \(k = 20\) and \(p = 50\). Unique communities (topics) are indicated by node color. In the network of Fig. 15, topics include, among others:
Winter holidays and events (“valentines”, “superbowl”, “vday”, …);
U.S. presidential elections (“republicans”, “barack”, “clinton”, “presidential”, …);
Events surrounding the 2016 U.S. presidential election in particular (“clinton’s”, “crooked”, “giuliani”, “jill”, “stein”, …);
while the network of Fig. 16 displays topics pertaining to:
popular culture and music (“bieber”, “#nowplaying”, “@nickjonas”, “@justinbieber”);
U.S. domestic politics (“clinton”, “hillary”, “trump”, “sanders”, “iran”, “sessions”, …);
and conflict in the Middle East ("gaza", "iraq", "israeli", "gathered").
The predominance of U.S. politics to the exclusion of the politics of other nations is likely due to the labMT dataset containing predominantly English words.
Appendix 3: STAR and ADV comparison figures
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Dewhurst, D.R., Alshaabi, T., Kiley, D. et al. The shocklet transform: a decomposition method for the identification of local, mechanismdriven dynamics in sociotechnical time series. EPJ Data Sci. 9, 3 (2020). https://doi.org/10.1140/epjds/s136880200220x
Received:
Accepted:
Published:
Keywords
 Nonparametric statistics
 Sociotechnical time series
 Timedomain filtering
 Social media