Social Media Would Not Lie: Prediction of the 2016 Taiwan Election via Online Heterogeneous Data

The prevalence of online media has attracted researchers from various domains to explore human behavior and make interesting predictions. In this research, we leverage heterogeneous social media data collected from various online platforms to predict Taiwan's 2016 presidential election. In contrast to most existing research, we take a "signal" view of heterogeneous information and adopt the Kalman filter to fuse multiple signals into daily vote predictions for the candidates. We also consider events that influenced the election in a quantitative manner based on the so-called event study model that originated in the field of financial research. We obtained the following interesting findings. First, public opinions in online media dominate traditional polls in Taiwan election prediction in terms of both predictive power and timeliness, although offline polls can still help alleviate the sample bias of online opinions. Second, although online signals converge as election day approaches, the simple Facebook "Like" is consistently the strongest indicator of the election result. Third, most influential events have a strong connection to cross-strait relations, and the Chou Tzu-yu flag incident followed by the apology video one day before the election increased the vote share of Tsai Ing-wen by 3.66%. This research justifies the predictive power of online media in politics and the advantages of information fusion. The combined use of the Kalman filter and the event study method contributes to the data-driven political analytics paradigm for both prediction and attribution purposes.


INTRODUCTION
Recent years have witnessed the rapid development of social media and their innovative applications in many fields [1]. For instance, it has been found that the volumes of tweets related to protests on Twitter are associated with real-life protest events [2]. Moreover, film mentions on Twitter can reflect box office revenues [1]. Additionally, public moods extracted from tweets can predict changes in stock markets [3], and a real-time earthquake reporting system was developed by analyzing only tweets [4].
The unprecedented prevalence of social media has driven politicians to make use of this channel to propagate their ideas and political views [5][6][7][8] to more directly approach potential voters. It is not unusual to see election candidates post their daily activities and political ideas on social media and even debate on social media before and during the campaign. These behaviors can attract online discussion from massive numbers of netizens and, compared with traditional polls, are an easier way to gather wide-ranging public opinions about the candidates. Some research has shown the predictability of election results based on social media information in various countries and regions, including the United States [4,9,10,12], the United Kingdom [13,14], Germany [3], the Netherlands [16], and Korea [17], where netizens' behaviors and posts on social media were analyzed to infer the election results.
The existing research, however, usually exploits a single information source and uses simple descriptive statistics for election predictions, which easily results in hindsight bias and lacks generality. The way to ameliorate these issues is two-fold. On one hand, multiple sources should be included to obtain heterogeneous information for robust predictions. For instance, the keywords searched in Google represent the attention of the public, and the aggregated volumes can be used to predict the trends of influenza [18], stock markets [19,20], consumer behaviors [21], etc. On the other, massive heterogeneous data obtained in real time are often too chaotic to provide consistent predictions; therefore, a method that can fuse the data and deliver robust predictions is indispensable. Our work in this paper is a novel attempt on this front.
We take Taiwan's 2016 presidential election as a real-life case. Taiwan adopted direct election in 1996, and since then, Kuomintang (KMT) and the Democratic Progressive Party (DPP) have become the two major competing political parties. KMT pursues a "One China Policy" and the political legitimacy of the "Republic of China", whereas DPP takes "Taiwan Independence" as its party program. In 2016, three candidates ran for the presidential election, including Eric Chu from KMT, Tsai Ing-wen from DPP, and James Soong from the People First Party (PFP). The election regulations adopt the "one man one vote" principle and execute the majority rule [22]. This research leverages time series data collected from various mainstream online platforms (i.e., Facebook, Twitter and Google) and visitation traffic to candidates' campaign pages. These heterogeneous signals represent public opinions and are fed into a Kalman filter [10] to estimate the vote shares of each candidate dynamically. The most efficient signals are then identified based on the signal strengths characterized by the Kalman gain.
In addition to prediction, this research attempts to automatically identify the events that most influenced the election by leveraging the event study model [24] that originated in the field of financial research.
The results show that the prediction errors for every candidate one day, one week, and one month before the election are no greater than 2.59%, 4.58% and 5.87%, respectively. The results include some interesting findings. First, online signals appear to be more accurate than traditional polls in election prediction, although the polls can still help mitigate the sample bias of netizens. In particular, a simple Facebook "Like" on a candidate's post is the most significant predictor, whereas the seemingly more informative "Comments" function is much less important. Second, online signals show clear convergence as the final election day approaches. For example, Google keyword searches fluctuated initially but became a strong indicator in the final stage. Third, the bursty events most influential to the campaign have a strong relationship with cross-strait relations. For instance, while the Xi-Ma meeting reduced support for Tsai Ing-wen by 0.55%, the Chou Tzu-yu flag incident followed by the apology video one day before the election increased her vote share by 3.66%.

DATA AND MEASUREMENTS
To identify the most popular Internet applications in Taiwan, we referred to professional Internet surveys [1] and web traffic reports from Alexa, comScore and Digital Age (see SI, TABLE 1). We selected Facebook, Twitter, Google, and candidates' campaign homepages as the "online sensors" of public opinions towards the election and designed various daily updated measurements to characterize the signals over the consecutive period from Oct. 31, 2015 to Jan. 16, 2016. A 30-day moving average was applied to each measure to avoid excessive fluctuation.
Facebook. Facebook is the most popular social platform in Taiwan and provides an easy way for candidates to reach out to a large audience. For each post by a candidate, users can click the "Like" button to indicate a positive reaction. Hence, we can use the "daily average number of Likes per post" to measure a candidate's popularity, where $\mathrm{like}^c_{k-j,i}$ is the number of Likes of post $i$ published by candidate $c$ on day $k-j$, $n^c_{k,FA}$ is the total number of the candidate's posts, and $m$ is the window length of the moving average. Analogously, we compute the "daily average number of Comments per post" for each candidate as another signal from Facebook, where $\mathrm{Comment}^c_{k-j,i}$ is the number of comments on post $i$ published by candidate $c$ on day $k-j$.
Twitter. We use the three candidates' names in both Simplified and Traditional Chinese as keywords (see SI, Sect. 1.2) to retrieve tweets from Twitter. The measurement "number of tweets mentioning the candidate" is calculated from $tw^c_{k-j}$, the volume of tweets about candidate $c$ on day $k-j$.
Search Engine. We also obtained search data from Google Trends to trace the evolution of a keyword's search volume. We used the three candidates' names in both Simplified and Traditional Chinese as keywords and restricted the search source to Taiwan. The measurement "search index ratio" is defined in terms of $\mathrm{search}^c_{k-j}$, the aggregated search index of the keywords about candidate $c$ on day $k-j$.
Campaign Homepages. We collected the daily traffic data for the candidates' campaign homepages from Alexa and used the "IP traffic ratio" as an opinion measure, where $IP^c_{k-j}$ is the IP traffic volume to candidate $c$'s campaign homepage on day $k-j$. The above measurements convey different signals for continuous election prediction.
We also collected offline election polls published by nineteen authoritative pollsters during the period from Aug. 1, 2015 to Jan. 16, 2016 (see SI, Sect. 1.1) for comparison. These polls were published aperiodically and infrequently, so we assume that the opinions from a poll remain unchanged until a new poll is released.
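The assumption that a poll's opinions hold until the next release amounts to a forward fill of a sparse series. The following is a minimal sketch of that idea, not the authors' code; the function name `forward_fill_polls` and the sample support rates are hypothetical.

```python
from datetime import date, timedelta

def forward_fill_polls(polls, start, end):
    """Expand sparse poll readings into a daily series.

    polls: dict mapping a release date to a poll value (e.g., a support rate).
    Each value is carried forward until the next poll is released.
    Days before the first poll are reported as None.
    """
    daily = {}
    latest = None
    day = start
    while day <= end:
        if day in polls:
            latest = polls[day]
        daily[day] = latest
        day += timedelta(days=1)
    return daily

# Hypothetical support rates, for illustration only.
example = forward_fill_polls(
    {date(2015, 12, 1): 0.45, date(2015, 12, 4): 0.47},
    start=date(2015, 11, 30),
    end=date(2015, 12, 5),
)
```

The resulting daily series can then be compared with the online signals on a day-by-day basis.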

Vote Prediction Model
The goal of election prediction is to infer the underlying vote shares of various candidates based on heterogeneous noisy signals. A model that can fuse the signals in such a way to debias the prediction from noise and make dynamic predictions to reflect the evolution of public opinion is desired. We exploit the Kalman filter, a linear dynamic model, for this purpose. The filter was adopted in [26][27][28] for election analysis, but previous studies were mostly based on polls and assumed only two candidates.
In general, a Kalman filter maps hidden states to observed variables with noise, and the current hidden states are assumed to transition from previous states with noise. That is,

$$x^c_k = f_k x^c_{k-1} + q^c_k, \qquad s^c_k = h_k x^c_k + r^c_k, \tag{6}$$

where $h_k$ is a vector that maps the hidden state $x^c_k$ to the observed multiple signals in $s^c_k$, $f_k$ is the state transition coefficient, and $x^c_0$ is the initial value of the hidden state. $r^c_k$ and $q^c_k$ denote independent Gaussian random noise.
In our case, $x^c_k$ is the genuine vote share of candidate $c$ on day $k$, and $s^c_k = (s^c_{k,GO}, s^c_{k,FAL}, s^c_{k,TW}, s^c_{k,IP})$ contains the observed multiple signals. We set $f_k = 1$ and $h_k = \mathbf{1}$ (a vector of ones) for scale equivalence of the variables. The initial vote $m^c_0$ is set as the latest poll result of TISR (see SI, Sect. 2.1), with $p^c_0 = 1$ to allow fluctuation. Nevertheless, the final prediction is insensitive to the initial values when the time series is sufficiently long (see SI, Sect. 2.2 and Sect. 2.3).
The remaining challenge is to estimate the noise parameters $R^c_k$ and $\sigma^2_{c,k}$. To reduce the model complexity, we assume $R^c_k = R_k$ and $\sigma^2_{c,k} = \sigma^2_k$, $\forall c$. The maximum a posteriori estimation can then be obtained by maximizing the conditional density function $J = p(x^{tsai}_{1:k}, x^{chu}_{1:k}, x^{soong}_{1:k}, \sigma^2_k, R_k \mid s^{tsai}_{1:k}, s^{chu}_{1:k}, s^{soong}_{1:k})$ with $\sum_c x^c_k = 1$ and $\sum_c s^c_k = I_{4\times 1}$. The resulting estimators are derived in SI, Sect. 2.1, where $\hat{p}_{j|j-1}$ denotes the estimated variance of $x^c_{j|j-1}$.
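To make the fusion step concrete, the sketch below performs one predict/update cycle for a scalar hidden state observed through several signals, with a unit transition coefficient and a mapping vector of ones. It is an illustration under stated assumptions rather than the authors' implementation: the observation noise covariance is taken to be diagonal, the noise variances are passed in as known values instead of being estimated by maximum a posteriori, and the sum-to-one constraint across candidates is not enforced. The function name `kalman_step` and all numbers are hypothetical.

```python
def kalman_step(x_prev, p_prev, signals, obs_var, proc_var):
    """One predict/update step for a scalar state observed by several signals.

    Assumes f_k = 1, h_k = vector of ones, and a diagonal observation
    covariance (obs_var holds one variance per signal). The diagonal
    assumption is ours, made to keep the sketch dependency-free.
    """
    # Predict: random-walk transition, variance grows by the process noise.
    x_prior = x_prev
    p_prior = p_prev + proc_var
    # Update in information form: precisions of prior and signals add up.
    p_post = 1.0 / (1.0 / p_prior + sum(1.0 / r for r in obs_var))
    # Kalman gain per signal; noisier signals receive smaller gains.
    gains = [p_post / r for r in obs_var]
    x_post = x_prior + sum(g * (s - x_prior) for g, s in zip(gains, signals))
    return x_post, p_post, gains

# Hypothetical signal values and variances for one candidate on one day.
x, p, k = kalman_step(
    x_prev=0.5, p_prev=1.0,
    signals=[0.6, 0.4, 0.55, 0.5],
    obs_var=[0.01, 0.04, 0.02, 0.04],
    proc_var=0.001,
)
```

Under these assumptions the gain vector `k` ranks the signals by how strongly each one pulls the estimate, which is the sense in which Kalman gains can be read as signal strengths.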

Event Detection Method
Twitter, as an online plaza, aggregates information about different candidates during an election campaign. The volatility of tweets can thus signal influential events. A three-step detection method is designed as follows.
Step I is to perceive events based on massive numbers of tweets. To this end, we watch the statistic $tw^c_k$, i.e., the number of tweets about candidate $c$ on day $k$, and trace its volatility in the past $m$ days by comparing it with an upper bound $u^c_{k+1} = \bar{n} + \frac{s}{\sqrt{m}}\, t_{\alpha/2}(m-1)$, where $\bar{n}$ is the average of $tw^c_k$ over the $m$ days and $s$ is the standard deviation. Based on a t-test with significance level $\alpha$, there exists an influential event if $tw^c_{k+1}$ surpasses $u^c_{k+1}$. We assume that only one new event is dominant in each burst, which is reasonable for political campaigns.
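Step I can be sketched as a simple threshold test. The sketch assumes the upper bound is read as the window mean plus the standard error times the t critical value, and it takes the critical value as an argument to avoid a dependency on a statistics library. The function names and the tweet counts are hypothetical.

```python
import statistics

def burst_upper_bound(history, t_crit):
    """Upper bound n_bar + (s / sqrt(m)) * t_crit over the past m days.

    history: tweet counts for the past m days.
    t_crit: t critical value t_{alpha/2}(m - 1), supplied by the caller.
    """
    m = len(history)
    n_bar = statistics.mean(history)
    s = statistics.stdev(history)  # sample standard deviation
    return n_bar + s / m ** 0.5 * t_crit

def is_burst(history, today, t_crit):
    """Flag an influential event if today's count exceeds the upper bound."""
    return today > burst_upper_bound(history, t_crit)

# Hypothetical counts; 2.447 is t_{0.025}(6), i.e., m = 7 and alpha = 0.05.
past_week = [100, 110, 95, 105, 98, 102, 104]
bound = burst_upper_bound(past_week, 2.447)
```

With these numbers, a day with 300 tweets would be flagged as a burst, whereas a day with 100 tweets would not.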
Step II is to estimate the event time window. The daily tweets about each candidate are first integrated into a single document; then, the terms in the document are weighted by the tf-idf method (see SI, Sect. 3.1). The 30 terms with the highest weights in the burst are selected as the typical words for that event. We then proceed to check the overlaps of typical words on the burst day plus or minus five days. The first day with non-zero overlap is deemed to be the start day of the event, and the last day with non-zero overlap is the closing day, which defines the event time window (see SI, TABLE 9, TABLE 10, and TABLE 11). We remove suspicious events with a time window of one day.
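The window estimation in Step II can be sketched as follows. This is one plausible reading, not the procedure documented in the SI: it uses raw term counts with a logarithmic inverse document frequency, and it compares the typical words of the burst day with the typical words of each neighboring day. The function names and the toy documents are hypothetical, and the burst-day document is assumed to be non-empty.

```python
import math
from collections import Counter

def tfidf_top_terms(docs, day, top_n=30):
    """Return the top_n tf-idf terms of docs[day] as a set.

    docs: list of token lists, one merged document per day.
    Uses tf = raw count and idf = log(N / df), one common tf-idf variant.
    Terms that occur on every day get zero weight and are dropped.
    """
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    tf = Counter(docs[day])
    weights = {t: c * math.log(n_docs / df[t]) for t, c in tf.items()}
    ranked = sorted((t for t in weights if weights[t] > 0),
                    key=lambda t: (-weights[t], t))
    return set(ranked[:top_n])

def event_window(docs, burst_day, top_n=30, radius=5):
    """First and last day within burst_day +/- radius sharing a typical word."""
    typical = tfidf_top_terms(docs, burst_day, top_n)
    lo = max(0, burst_day - radius)
    hi = min(len(docs) - 1, burst_day + radius)
    hits = [d for d in range(lo, hi + 1)
            if typical & tfidf_top_terms(docs, d, top_n)]
    return hits[0], hits[-1]

# Toy daily documents, for illustration only.
docs = [
    ["tsai", "rally"],
    ["chou", "flag", "tsai"],
    ["chou", "flag", "apology", "chou"],
    ["apology", "video", "tsai"],
    ["soong", "debate"],
]
window = event_window(docs, burst_day=2, top_n=2)
```

A window whose start and end coincide would correspond to the one-day events that the method discards.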
Step III is to measure the impact of events on public opinion. We denote the estimated $x^c_k$ initially transited from the previous day as $\hat{x}^c_{k|k-1}$ (see the transition function in (6)) and the final $x^c_k$ calibrated with multiple signals as $\hat{x}^c_{k|k}$ (see the mapping function in (6)). Intuitively, $\hat{x}^c_{k|k}$ has absorbed the information about all pertinent events on day $k$; hence, the change from $\hat{x}^c_{k|k-1}$ (equaling $\hat{x}^c_{k-1|k-1}$ for $f_k = 1$ and $E(q^c_k) = 0$) to $\hat{x}^c_{k|k}$ indicates the impact of an event. To measure the significance of the impact, we apply the event study model [29] from the field of finance as follows:

$$\hat{x}^c_{k|k} - \hat{x}^c_{k|k-1} = a + \sum_{j=1}^{J^c} \gamma^c_j D^c_{j,k} + \varepsilon^c_k, \tag{9}$$

where $D^c_{j,k}$ is a dummy variable equal to 1 if day $k$ is within the time window of event $j$ for candidate $c$ and equal to 0 otherwise, $J^c$ is the total number of events detected for candidate $c$, $a$ is a regression constant, and $\varepsilon^c_k$ is the residual. $\gamma^c_j$ is the estimator of the effect of event $j$ on candidate $c$, which passes the t-test if event $j$ has a significant effect on public opinion (see SI, TABLE 12, TABLE 13, and TABLE 14). In this way, we can identify the events that actually influence the election.
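The dummy-variable regression in Step III can be illustrated with a small sketch. It assumes that the event windows do not overlap, in which case ordinary least squares reduces to comparing group means: the constant is the mean daily change on non-event days, and each event effect is the mean change inside its window minus that constant. The function name `event_effects` and the data are hypothetical, and overlapping windows would require a general least-squares solver instead.

```python
import statistics

def event_effects(delta, windows):
    """OLS estimates for delta_k = a + sum_j gamma_j * D_{j,k} + eps_k.

    delta: daily changes between the predicted and the updated vote share.
    windows: list of (start, end) day indices, inclusive, assumed disjoint.
    Returns (a, [(gamma_j, t_stat_j), ...]).
    """
    in_event = set()
    groups = []
    for start, end in windows:
        days = range(start, end + 1)
        in_event.update(days)
        groups.append([delta[k] for k in days])
    base = [d for k, d in enumerate(delta) if k not in in_event]
    a = statistics.mean(base)
    # Pooled residual variance with n - J - 1 degrees of freedom.
    rss = sum((d - a) ** 2 for d in base)
    rss += sum((d - statistics.mean(g)) ** 2 for g in groups for d in g)
    sigma2 = rss / (len(delta) - len(windows) - 1)
    results = []
    for g in groups:
        gamma = statistics.mean(g) - a
        se = (sigma2 * (1 / len(g) + 1 / len(base))) ** 0.5
        results.append((gamma, gamma / se))
    return a, results

# Hypothetical daily changes (in percentage points) with one event on days 4-5.
delta = [0.0, 0.1, -0.1, 0.0, 2.0, 2.2, 0.0, 0.1, -0.1, 0.0]
a, effects = event_effects(delta, [(4, 5)])
```

Each t-statistic would then be compared with the critical value of a t distribution with the stated degrees of freedom to decide whether the event is significant.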

Prediction Performance
Figs. 1(a)-(c) show various online signals two months before election day. Intuitively, the user behavior in different channels is related to the public opinion towards a candidate, but the signals have vastly different volatilities. This justifies the value of information fusion for election prediction. We therefore fuse the four signals, i.e., $s^c_{k,FAL}$, $s^c_{k,TW}$, $s^c_{k,GO}$ and $s^c_{k,IP}$, by the Kalman filter. Although the four signals behave differently, the fused signal representing the predicted vote share for each candidate is relatively stable and exhibits a clear tendency, confirming the effectiveness of the prediction system for information aggregation. The final result is impressive: while Tsai's win is easy to predict even in October, the prediction errors for every candidate one day, one week, and one month before election day are no greater than 2.59%, 4.58% and 5.87%, respectively.
To further justify the predictive power of online signals, we also compare our results with offline polls. As shown in Fig. 3, during the last two weeks of the election, our predictions (M1) substantially outperform most of the pollsters (P1-P10) and improve continuously by absorbing up-to-date information. This is possibly because the anonymity of the Internet enables individuals to express their opinions freely and voluntarily, which could reduce the bias relative to that in the tele-interview setting of a traditional poll.
Furthermore, currently, news usually breaks online first and then spreads at a tremendously fast pace from online to offline via physical social networks. Therefore, online information can also influence offline voting blocs during campaigns, which mitigates the bias effect of using only the netizen population in our method.
We also try to reduce the sample bias by mixing the prediction results from online signals with those from offline pollsters in older groups (see SI, Sect. 2.4). As shown in Fig. 3, the online-offline data fusion method (M2) indeed outperforms the online data fusion method (M1) in the early stage of the final two weeks, which indicates the power of sample bias correction. But the advantage disappears gradually as the final election day approaches, which again exposes the drawback of offline polls in responding to newly emerging information.
Fig. 3: In each interval between two gray dashed lines, there are two bars. The lower bar represents the absolute error of the online data fusion method, and the upper bar represents the absolute error of the online-offline data fusion method. The interval between two gray horizontal dashed lines indicates one day. The bars on the right side of the timeline show the prediction errors of the final polls from ten pollsters. Comparison of the bars on both sides shows that the absolute prediction errors of the signal fusion methods are smaller than those of the polls.

Signal Evaluation
We also explore the predictive power of various online signals via their daily Kalman gains. As shown in Fig. 4, Facebook "Likes" are consistently the strongest indicator among all the signals. This demonstrates the power of social media in collecting public opinions via a simple mechanism, although it is vulnerable to shilling attacks. The predictive power of the Google index appears to be time-sensitive, contributing less initially and becoming the second best indicator one month before the election. One possible explanation is that the election might not be a focal topic in the early stage of the campaign, making Google searches rather random. However, as the election day approaches, the campaign becomes the central topic and drives the public to search for information about the candidates. The two remaining signals, i.e., tweet volumes and homepage traffic, appear to be of much weaker predictive value, which may be due to their lack of popularity in Taiwan (see SI, TABLE 1) and diverse attitudes about candidates.
We further explore the distinct value of the "Like" function on Facebook. We compare it with the "Comment" function by substituting $s^c_{k,FAL}$ with $s^c_{k,FAC}$ in the Kalman filter. The prediction outcomes become significantly worse: the one-day-ahead prediction errors for Tsai and Chu increase to 5.42% and 4.86%, respectively (see SI, Sect. 2.5). These results indicate the superiority of "Like" over "Comment". To understand this result, we identify the population of Facebook users who have ever liked or commented on the candidates' posts and obtain the overlapping users who have both liked and commented on a candidate. Fig. 5 shows that these users constitute only a small proportion of the "Like" users but a much larger proportion of the "Comment" ones. Therefore, a considerable proportion of users who have commented on a post also choose to like the post, but not vice versa. In other words, the "Like" signal represents the positive attitude of a much larger population than the "Comment" signal does, which may be attributed to the fact that a "Like" is a more direct and more widely used way for online users to express positive opinions without great effort. Another disadvantage of "Comment" lies in its diversity of expression, which can be a blend of contradictory attitudes, including support, praise, opposition and even insult (see SI, Sect. 2.6).
The overlapping users indeed constitute a group of firm supporters for each candidate who show their support by not only clicking "Like" but also going through the effort to publish comments. By further tracking the changes in the overlap ratios during the election, as shown in Fig. 5, we find that the ratio for Tsai is relatively stable, indicating that Tsai has a firm group of supporters regardless of her behavior during the campaign. By contrast, for Chu and Soong, the overlap ratios remain small until election day approaches, suggesting Tsai should partially attribute her success to her firm supporters rather than swing voters.
This also explains why we can predict the victory of Tsai two months before election day.
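The overlap ratios discussed above are simple set computations. The sketch below shows the calculation for one candidate; the function name `overlap_ratios` and the user ids are hypothetical.

```python
def overlap_ratios(likers, commenters):
    """Share of overlapping users among "Like" users and among "Comment" users.

    likers, commenters: sets of user ids for one candidate.
    Returns (overlap / likers, overlap / commenters).
    """
    both = likers & commenters
    return len(both) / len(likers), len(both) / len(commenters)

# Hypothetical user ids, for illustration only.
like_ratio, comment_ratio = overlap_ratios(
    likers={"u1", "u2", "u3", "u4", "u5", "u6", "u7", "u8"},
    commenters={"u1", "u9"},
)
```

In this toy case the overlapping user is a small share of the "Like" population but half of the "Comment" population, which mirrors the asymmetry described in the text.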

Influential Events
We apply the event detection method to each candidate's Twitter data to identify influential events. Fig. 6 shows the results and event descriptions. The most influential events detected with p-values less than 0.05 include the meeting between Xi Jinping and Ma Ying-jeou (Xi-Ma Meeting), the emergence of negative comments on Tsai Ing-wen's Facebook homepage possibly by users from mainland China, and the Chou Tzu-yu flag incident. All these events share a common feature; that is, they all belong to the category of cross-strait relations, which is always subtle and controversial in Taiwan's political circle. Other seemingly important events from the perspective of the election campaign, such as the TV broadcast of the candidates' debates and various types of electioneering activities in local areas, have insignificant influences on public opinion. We further assess the influence level of the events, which is measured by the coefficient $\gamma^c_j$ in (9). The Xi-Ma Meeting resulted in a 0.55% decrease in the vote share of Tsai Ing-wen. This result is not surprising because Tsai was believed to favor Taiwan independence over the "One China Policy", and the meeting thus prompted the public to doubt Tsai's ability to handle cross-strait relations. This same event increased Eric Chu's vote share by 0.58% because he was thought to be more able to develop cross-strait peace after the meeting.
Despite the abundance of events during the campaign, the Chou Tzu-yu flag incident from the entertainment domain is the most influential. Chou Tzu-yu, a 16-year-old Taiwan singer, sparked huge controversy in social media for showing the Taiwan flag as the national flag of China. As the uproar intensified online, Chou's company released a video in which Chou apologized for her behavior by stating that "there is only one China" and identifying herself as Chinese. The most subtle point is that the video was released the day before the election, which was described as a humiliation to Taiwan and spread quickly in Taiwan's online social media. As a consequence, this incident increased the vote share of Tsai Ing-wen by approximately 3.66% and lowered the vote share of Eric Chu by approximately 2.62%.
Fig. 5: The number of overlapping users accounts for less than 1% of all the users who have "liked" on average, with the maximum proportions being 3.51%, 3.74%, and 9.25% for the three candidates. By contrast, the overlapping users constitute more than 37.16%, 14.90%, and 12.03% of all users who have commented, on average, for the three candidates, and the maximum ratios are 73.05%, 59.75%, and 83.01%.

DISCUSSION
The accurate prediction of Taiwan's 2016 Presidential Election suggests an interesting viewpoint that public opinions towards political campaigns can be determined via online user-generated content. This indeed coincides with some recent studies reporting that social media such as Facebook [6,8,9,12,13], Twitter [2-8, 13, 14, 16, 17] and YouTube [6] are able to aggregate public opinions about political matters. Donald Trump winning the 2016 US Presidential Election was also considered to be a victory for the heavy use of social media such as Twitter [30]. Nevertheless, this finding remains controversial in academia, and the above studies have often been criticized for the unreliability of single-source information [31] and/or the unrepresentativeness of online user populations [32]. Our study attempts to address these concerns.
First, we introduce multiple online channels as different types of signals to produce more robust predictions. These signals, while reflecting latent public opinions to varying degrees, fluctuate differently because of their different sensitivities to campaign dynamics and possible fake responses from the Internet "water army" (see Fig. 1). The fusion of these signals can help to filter out some noise by consensus learning to highlight the tendencies. Moreover, although one signal might contribute more to some specific election prediction, such as the Facebook "Like" for the Taiwan election, it is unlikely to be omnipotent across different elections. The fusion of these signals could help to mitigate the risk of selection bias. This information fusion scheme gives our study some important extensibility: the four channels, namely, Facebook, Twitter, Google Trends and campaign homepages, could be considered to be the fundamental and preemptive online information sources for different elections.
We also find that although selection bias of the online voting population exists, its influence on the prediction results is limited. Prediction based on pure online information is much more accurate than the polls released by Taiwan's mainstream pollsters (see Fig. 3).
The reason behind this may be two-fold. On one hand, online users who pay close attention to election campaigns likely become active voters and constitute a large voting population on election day [33,34]. On the other hand, we should not underestimate the information exchange between online social networks and offline physical networks [35,36]. Older people who seldom interact with the Internet still have access to online information via ordinary family communications or traditional media's reports on Internet opinions. This communication contributes to the opinion conformance across online and offline networks and further improves the representativeness of the online voting population. In fact, compared with traditional polls, which are susceptible to questionnaire wording [37], reporting error [38], ballot order [39], and social desirability bias [38,40], online big data enables a much larger sample and thus can improve the sample resistance to human manipulation. The real-time availability of online data, which enables the timely update of predictions based on continuously incoming information, is another major advantage relative to polls.
Our study also suggests that the Kalman filter with the event detection model (see Materials and Methods) could be packaged as a fundamental kit for political vote analytics.
Specifically, the Kalman filter is responsible for the dynamic prediction of vote shares given multi-source time-varying signals and multiple candidates. Meanwhile, the event detection model is responsible for the automatic identification of influential events during the campaign, which provides a causal explanation for the predictions. In other words, the two models together could provide interpretable predictions to political vote analytics, which is deemed particularly valuable for a big-data-driven research paradigm [41].
The Kalman filter has been adopted in previous studies but either for backward review given the final result or for forward prediction given multiple historical elections data. Our study shows that while we cannot obtain the true vote shares until election day, we can still fine-tune the model parameters by using up-to-date time series signal data for the current election, which solves the problems in leveraging the Kalman filter for election prediction.
Moreover, given the sum-to-one constraint in a statistical learning framework (see (8)), the Kalman filter is capable of building models for more than two election candidates. One may consider the inclusion of some other relatively stable factors, such as the globalization trend, economic status, the technology environment, etc., in the prediction model, which can be achieved by setting appropriate initial values of the Kalman filter. Nevertheless, our study shows that the Kalman filter is insensitive to the initial values as long as the prediction is based on a sufficiently long time series (see SI,Sect 2.2). In this case, the signals should have fully "absorbed" the influences of the macro factors.
Our study provides some political insight into the Taiwan presidential election. It is interesting that the simple "Like" function on Facebook collects the public opinions about candidates (see Signal Evaluation in Results), although it has been reported to be vulnerable to shilling attacks in electronic commerce [42]. The "Like" function is more beneficial than the "Comment" function, although the latter actually expresses more complex sentiments and richer opinions. This difference is attributed to the widespread use of Facebook in Taiwan (see SI, TABLE 1) and the ease of use and emotional unambiguity of the "Like" function. Another interesting finding is that the most influential events during the Taiwan election campaign are all closely related to cross-strait relations (see Influential Events in Results). In particular, in line with the findings in [43], the events more closely associated with public sentiment (such as the Chou Tzu-yu flag incident) appear to have a greater impact than those with merely political meaning (such as the Xi-Ma Meeting).
We provide accurate prediction and automatic causal analysis of the 2016 Taiwan Presidential Election, which illustrates the feasibility of applying a data-driven paradigm for political vote analytics. Although our focus is on Taiwan, the proposed signal fusion approach and the event detection model can be applied to other elections or referendums, especially those using majority rule. Considering the different Internet applications used across countries and areas, we may need to adjust the input online information sources and design new measurements for the new signals. Furthermore, we should consider how the election systems of particular countries or areas differ and require adjustment of the prediction model. For example, the US election system is not a direct election but relies on the Electoral College system with 538 electoral votes. Hence, we have to incorporate information about the states and locations of online users into the prediction. However, this information is often unavailable. Nevertheless, we can still consider online users as the voters for a "virtual" direct election and obtain the predictive results as the popular votes for the candidates, which could still indicate the winner if there is a large difference in vote share among candidates. The recent 2016 US Presidential Election demonstrates the power of voices on social media.

DATA
To measure the public opinions about the three candidates, we collected offline data from pollsters and online data from social media, search engines, and campaign homepages. The TISR poll, which is conducted every 15 days using random-digit telephone surveys, is the most frequently updated poll during the election. Thus, we select TISR as the representative of the polls. In each survey, in addition to the overall vote preference for the candidates, the polls investigate the opinions with regard to six age groups (i.e., people aged between 20 and 30, 30 and 40, 40 and 50, 50 and 60, 60 and 70, and above 70 years). In this study, we collect the support rate in each age group from the polls published by TISR [1].

Online Data
Generally, public opinions are concentrated on popular social media sites. To identify the most popular websites in Taiwan, we referred to professional Internet surveys (i.e., Internet Usage in Taiwan: Summary Report of October 2015 Survey [2]) and web traffic reports (i.e., Alexa, comScore and Digital Age). The top websites are listed in TABLE 1. We selected several sites primarily composed of user-generated content and classified them into three categories: social network (i.e., Facebook and Twitter), search engine (i.e., Google), and campaign homepages. We extracted the daily measurements from the above platforms as signals of collective opinions during the period of October 1, 2015 to January 16, 2016. To weaken the effects of measurement loss and violent fluctuations on prediction and to capture the trends, we apply a moving average to the measurements. The number of moving days used in the following analysis is 30.
Social Network. Facebook, which provides an easy way for candidates to reach out to a large audience, is the most popular social service platform in Taiwan. For each published post, users can "Like" or "Comment" on it to indicate their attitudes towards the candidate.
In our study, we use the Facebook API to retrieve all the posts published by the three candidates during the election period, the corresponding timestamps, the number of Likes, the id list of users who pressed the "Like" button, the number of Comments, the Comment content and user id, etc. To reflect the candidates' popularity in terms of Facebook "Like", we calculate the daily average number of Likes per post for each candidate, $s^c_{k,FAL}$, where $like^c_{k-j,i}$ is the number of Likes of the $i$-th post published by candidate $c$ on day $k-j$, and $n^c_{k-j,FA}$ is the number of posts published by candidate $c$ on day $k-j$. To compare "Like" with "Comment" on Facebook, we also extract the daily average number of Comments per post for each candidate, $s^c_{k,FAC}$, as an indicator, where $Comment^c_{k-j,i}$ is the number of comments on the $i$-th post published by candidate $c$ on day $k-j$.
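As a rough illustration of the per-post averages described above, the sketch below groups posts by day and divides the total count by the number of posts. The record schema (`day`, `likes`, `comments`) and the sample values are hypothetical, and the 30-day smoothing and any normalization across candidates are left out.

```python
from collections import defaultdict


def daily_average_per_post(posts, field="likes"):
    """Average of `field` over the posts published on each day.

    posts: iterable of dicts with a 'day' key and a numeric `field`
    (a hypothetical schema, not the Facebook API response format).
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for post in posts:
        totals[post["day"]] += post[field]
        counts[post["day"]] += 1
    return {day: totals[day] / counts[day] for day in totals}


# Illustrative records for one candidate (made-up numbers).
posts = [
    {"day": "2015-10-01", "likes": 1200, "comments": 30},
    {"day": "2015-10-01", "likes": 800, "comments": 50},
    {"day": "2015-10-02", "likes": 500, "comments": 10},
]
likes_per_post = daily_average_per_post(posts, "likes")
comments_per_post = daily_average_per_post(posts, "comments")
print(likes_per_post)
print(comments_per_post)
```

Averaging per post rather than summing keeps a candidate who posts more often from being credited for volume alone.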
Some recent reports [3][4][5][6] have suggested that there is a statistically significant correlation between Twitter and election outcomes in terms of volume and sentiment. Thus, although Twitter is not as popular as Facebook in Taiwan, we select it as a measurement of voting preference. By querying the Twitter API in real time, we obtained 283,412 candidate-related Twitter messages posted from October 1, 2015 to January 16, 2016. To focus on public opinion in Taiwan, we use only party names, the three candidates' names, and their morphs in Simplified and Traditional Chinese as keywords to retrieve tweets. The keyword list is presented in TABLE 2. By analyzing Twitter sentiment, we find that more than 80% of the retrieved tweets are news and do not represent public opinion. Thus, we use tweet volume rather than sentiment: the signal $s^c_{k,TW}$ represents the number of tweets mentioning a specific candidate, where $tw^c_{k-j}$ is the Twitter volume about candidate $c$ on day $k-j$.
Search Engine. Search queries usually reflect information that users hope to obtain, which can be regarded as an indicator of their opinions and preferences. Thus, we also investigate Google, the most used search engine in Taiwan. We obtained the search data from Google Trends [7], which provides the time series of the search index for given words.
Google Trends allows users not only to view the search index for given words in a specific region but also to compare the search frequencies of multiple keywords. In our study, we take the candidates' names in Simplified and Traditional Chinese as keywords and restrict the search source to Taiwan. Then, we calculate the ratio of search indexes $s^c_{k-j,GO}$ as a signal reflecting the vote share for candidate $c$, where $search\_volume^c_{k-j}$ is the search index of keywords about candidate $c$ on day $k-j$.
Campaign Homepages. The three candidates set up campaign websites [8] to promote themselves and raise funds. The IP traffic of the candidates' homepages can be used to gauge their popularity. We collect the daily traffic data and take the proportion $s^c_{k-j,IP}$ as a measurement of opinions about candidate $c$, where $IP^c_{k-j}$ is the IP traffic of candidate $c$'s campaign homepage on day $k-j$. The traffic data are from Alexa [9], a subsidiary of Amazon that provides commercial web traffic data for given sites.
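The search-index ratio and the traffic proportion are both shares across candidates. A minimal sketch of that normalization follows; the function name, the equal-split fallback for a day with no data, and the numbers are assumptions made for illustration.

```python
def candidate_shares(raw):
    """Normalize one day's raw values so the candidates' shares sum to 1."""
    total = sum(raw.values())
    if total == 0:
        # Assumption: split equally when no candidate has any activity.
        return {c: 1.0 / len(raw) for c in raw}
    return {c: v / total for c, v in raw.items()}


# Made-up search index values for a single day.
search_index = {"Tsai": 60, "Chu": 30, "Soong": 10}
shares = candidate_shares(search_index)
print(shares)
```

Expressing each signal as a share puts search, traffic, and social network measures on the same scale as a vote share, which is what allows them to be fused later.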

Kalman Filter
The core of the Kalman filter is defined by the following set of equations, in which $r^c_k$ and $q^c_k$ are independent zero-mean Gaussian noise terms with observation covariance $R^c_k$ and transition variance $\sigma^2_{k,c}$, respectively. Equation 17 is the starting value that sets the dynamic system in motion. The logic behind the set of equations is that the online measures are noisy signals whose mean is the true vote state. The goal of the model is to fuse these flawed signals to estimate the daily state and then to carry the estimate forward to the next day to make a prediction.
To recursively estimate the daily vote state at time $k$, the prediction of vote share $\hat{x}^c_{k|k-1}$ is first derived by the state transition equation, a variation of equation 16, where $\hat{x}^c_{k|k-1}$ is the vote state prediction for candidate $c$ at time $k$ given the signals up to $k-1$, and $\hat{x}^c_{k-1|k-1}$ is the updated estimate of the vote state at time $k-1$ given the signals up to $k-1$. $p^c_{k|k-1}$ and $p^c_{k-1|k-1}$ are the prediction covariance and the updated estimate covariance, respectively. Meanwhile, the online measure $s^c_k$ is observed. It is then feasible to update the state estimate $\hat{x}^c_{k|k}$ by absorbing the new signals $s^c_k$ into the prediction $\hat{x}^c_{k|k-1}$.
We use a weighting function to express the combination of the state prediction and the signals, where $k^c_k$ is the Kalman gain, which weights the state prediction and the various signals in the fusion, and $p^c_{k|k}$ is the updated estimate covariance. By minimizing the updated state estimation error $x^c_k - \hat{x}^c_{k|k}$, we can derive the Kalman gain. When the updated estimate is obtained, we can use equation 18 to predict the next-day vote share.
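The predict and update cycle described above can be sketched with the textbook Kalman recursion for a scalar state observed through several signals. This is not a reproduction of the paper's equations: it assumes that each signal measures the state directly ($h = 1$) and that the signal noises are uncorrelated, so the vector update can be applied one signal at a time. All names and numbers are illustrative.

```python
def kalman_step(x_prev, p_prev, signals, r_diag, sigma2, f=1.0, h=1.0):
    """One day of a scalar-state Kalman filter with several noisy signals.

    Assumes uncorrelated signal noise (diagonal R), which makes sequential
    scalar updates equivalent to a single vector update.
    """
    # Predict: carry yesterday's updated estimate forward.
    x_pred = f * x_prev
    p_pred = f * f * p_prev + sigma2
    # Update: absorb each signal in turn, weighting it by the Kalman gain.
    x_upd, p_upd = x_pred, p_pred
    for s, r in zip(signals, r_diag):
        gain = p_upd * h / (h * p_upd * h + r)
        x_upd = x_upd + gain * (s - h * x_upd)
        p_upd = (1.0 - gain * h) * p_upd
    return x_pred, p_pred, x_upd, p_upd


# Prior 0.5 with variance 1, two signals with unit noise variance.
x_pred, p_pred, x_upd, p_upd = kalman_step(0.5, 1.0, [0.6, 0.4], [1.0, 1.0], sigma2=0.0)
print(x_upd, p_upd)
```

In this example the prior and the two signals carry equal precision, so the fused estimate is their plain average and the variance drops to one third. A noisier signal (larger entry in `r_diag`) would receive a smaller gain.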
As shown in equation 18 to equation 22, daily vote state estimation depends on the linear dynamic system parameters, such as $R_k$, $\sigma^2_{k,c}$, $f_k$, $h_k$, $x^c_0$, and $p^c_0$. In particular, the two noise parameters $R^c_k$ and $\sigma^2_{k,c}$ determine the impact of the signals on the state update. To estimate the two parameters, we adopt maximum posterior estimation, which is obtained by maximizing the conditional density function. As $\sum_c x^c_k = 1$ and $\sum_c s^c_k = I_{4\times 1}$, there is a trade-off among the vote states and measurements of the three candidates: a change in one candidate's vote state is offset by the combined changes in the other two candidates' vote states. To characterize this constraint, we assume that the measurement noise and the state transition noise are the same for the three candidates, that is, $R^c_k = R_k$ and $\sigma^2_{k,c} = \sigma^2_k$. Then, we express the conditional density function and expand it according to the multiplication rule of probability, where $n$ is the dimension of the measurement vector.
Similar to equation 24, we can readily obtain equation 25. Inserting equation 24 and equation 25 into equation 23 yields equation 26 for the objective $J$. Because $J$ and $\ln J$ attain their maxima at the same parameter values, we transform equation 26 into its logarithmic form. Taking the partial derivatives of $\ln J$ with respect to $\sigma^2_k$ and $R_k$ and setting them to zero gives the maximum posterior estimates of $\sigma^2_k$ and $R_k$.
In the following, we construct unbiased estimates of the variance $\sigma^2_k$ and the covariance matrix $R_k$. Since the daily vote states $x^c_j$ and $x^c_{j-1}$ are normally unavailable, a suboptimal estimator $\hat{\sigma}^2_k$ can be obtained by replacing them with the filtering estimates $\hat{x}_{j|j}$ and $\hat{x}_{j-1|j-1}$ and the prediction $\hat{x}_{j|j-1}$. Introducing the innovation vector and using equation 20 and equation 35, we obtain the unbiased estimate $\hat{\sigma}^2_k$ of $\sigma^2_k$. To ensure that $\hat{\sigma}^2_k > 0$, we take a suboptimal estimate. The unbiased estimate of $R_k$ is denoted by $\hat{R}_k$.
At time $k$, we use equation 41, equation 42, and the signals up to time $k$ to update $\sigma^2_k$ and $R_k$ and then apply the updated estimates to the next-day state calculations. In addition to the covariance of the observation noise $R_k$ and the variance of the state transition noise $\sigma^2_k$, the mapping vector $h_k$, the state transition coefficient $f_k$, and the initial state values $x^c_0$ and $p^c_0$ affect the state estimation. However, as shown by equation 41 and equation 42, those parameters determine the update of $R_k$ and $\sigma^2_k$, which indicates that it is not feasible to estimate them simultaneously via the maximum posterior method. Instead, we set the parameters as follows based on some assumptions.
Mapping vector $h_k$. Since the signals are normalized, the scale of the signals and that of the state estimates are the same. Thus, $h_k$ is set to be a unit vector.
State transition coefficient $f_k$. We introduce signals and update the prediction at a daily frequency, which makes it possible to absorb the latest information into the prediction. Therefore, we assume that the daily prediction is equal to the previous updated state estimate, and the state transition coefficient $f_k$ is set to 1.
Initial vote share $m^c_0$. At the beginning of the prediction, there is little information about the three candidates' vote ratios. We use the latest poll result of the pollster TISR as a rough estimate of the vote share. We also change the value to the mean of each candidate's signals and to an equal value $m^c_0 = 1/3$. However, the changes have almost no effect on the prediction results. Details are provided in Sec. 2.2.
Initial state variance $p^c_0$. The initial state variance reflects our confidence in the state value $x^c_0$, which is only a rough estimate. Thus, we make the variance large, $p^c_0 = 1$. We test different settings and find that the fusion result is not sensitive to this choice. The test results are shown in Sec. 2.3.

Robustness Tests for Initial Candidate Vote Ratio
The initial candidate vote ratio $m^c_0$ starts the prediction process. Since the candidate vote preference is unknown at the beginning of the prediction, we make a rough estimate. To test the influence of the initial estimate on the prediction, we examine three different settings. First, we set $m^c_0 = 1/3$. The logic of this setting is that when little information about the candidates' vote ratios is available, we assume that the three candidates receive equal vote shares. In contrast to this uninformed first setting, the second and third settings assume that polls and social media signals provide clues about the candidates' popularity. Therefore, in the second and third settings, we take the latest poll results of the pollster TISR and the mean of the social media signals as the initial state values, respectively. The results are shown in FIG. 7, FIG. 8, and FIG. 9.
All the figures show that the predictions under the three settings are consistent with respect to both trend and numerical value. Since mid-November 2015, the differences between the three cases are no greater than 5%. On election day, the differences are no greater than 3%. The small differences between the three cases indicate that the prediction is insensitive to the initial state value, especially more than one month after the start of the prediction.
FIG. 9. The predicted vote share of James Soong based on three sets of initial vote ratios. After November 4, 2015, the differences between the three predictions are no greater than 5%. On election day, the differences are less than 2.70%.

Robustness Tests for Process Noise
As described in Sec. 2.1, the daily state prediction $\hat{x}^c_{k|k-1}$ depends on the updated previous state estimate. Then, the prediction is fused with the signals $s^c_k$ to update the current state estimate $\hat{x}^c_{k|k}$. As shown in equation 20 and equation 22 [10], the fusion weight primarily relies on the process noise $p^c_{k|k-1}$ and the measurement noise $R^c_k$. The process noise $p^c_{k|k-1}$ can be calculated iteratively. As $\sigma^2_{k,c}$ and $R^c_k$ are derived by maximum posterior estimation, the fusion weight can be obtained as long as $p^c_0$ is determined. We consider two settings for $p^c_0$.
First, we set $p^c_0 = 0$. This setting assumes that the variance of the initial state distribution is 0, which implies that we treat the mean value of the initial state as known.
Second, we set $p^c_0 = 1$. This setting assumes that the variance of the initial state distribution is 1. Because the state value is greater than 0 but less than 1, the setting implies that we are relatively uncertain about the mean value of the initial state.
The time series of the predictions for the three candidates based on the above two settings are shown in FIG. 10-FIG. 12. In the three figures, the curve of the first setting almost coincides with that of the second setting. This coincidence indicates that the daily prediction is not sensitive to the setting of $p^c_0$.
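The insensitivity to $p^c_0$ can be illustrated with a toy scalar filter. The sketch below is not the paper's experiment: it uses a single synthetic signal, fixed noise levels instead of the adaptive estimates, and a deliberately poor initial mean so that the effect of the initial variance is visible. All values are assumptions.

```python
def run_filter(signals, m0, p0, sigma2=0.01, r=0.05):
    """Scalar Kalman filter with f = h = 1 and fixed (assumed) noise levels."""
    x, p = m0, p0
    estimates = []
    for s in signals:
        p = p + sigma2          # predict: state carried over, variance grows
        gain = p / (p + r)      # Kalman gain
        x = x + gain * (s - x)  # update with the day's signal
        p = (1.0 - gain) * p
        estimates.append(x)
    return estimates


# Synthetic signal oscillating around 0.5 for 60 days.
signals = [0.5 + 0.05 * (-1) ** k for k in range(60)]
certain = run_filter(signals, m0=0.3, p0=0.0)
uncertain = run_filter(signals, m0=0.3, p0=1.0)
gaps = [abs(a - b) for a, b in zip(certain, uncertain)]
print(f"gap on day 1: {gaps[0]:.3f}, gap on day 60: {gaps[-1]:.2e}")
```

In this toy setup the two runs differ only in the first days, because the variance recursion forgets its starting value at a geometric rate once process noise is added each day.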

Online and Offline Data Fusion Method
In the results part of the main body, we compare our predictions with those of the polls and with the outcomes of the online and offline data fusion method. The comparison is shown in Fig. 2 in the main body, which illustrates that our online data fusion method is superior in terms of prediction accuracy. Due to space constraints, the main body does not elaborate on the comparative method, the online and offline data fusion method. We now present the details.
According to the Internet survey report of Taiwan [1], more than 90% of Taiwan where $w_i$ is the population proportion of age group $i$, which is obtained from the Ministry of the Interior of Taiwan [11]. $z^c_{i,j}$ is the most recent TISR poll result of age group $i$ for candidate $c$ on day $k$.
The results are shown in FIG. 13. The errors of the online-offline fusion method are greater than those of the online fusion method, which range from 0.23% to 3.07%.

Prediction Based on Comment Volume
To compare the use of "Like" and "Comments" in election prediction, we substitute $s^c_{k,FAL}$ with $s^c_{k,FAC}$ in the Kalman filter. The results are plotted in FIG. 14. The prediction becomes worse: the prediction errors for the three candidates are 5.42%, 4.86%, and 0.56%. The figure also indicates that there is a negative correlation between the number of comments and the number of "Likes" before December 14, 2015. After that point, however, the two series move together. This change also provides evidence that the various types of signals become consistent as election day approaches.

Topics in Comments
To compare the behavior patterns of overlapping users and users who only commented on the candidates, we apply the Latent Dirichlet Allocation (LDA) model to extract topics from their corresponding comments. The results are presented in TABLE 3 to TABLE 8. The representative words from the topics of overlapping users mainly express supportive attitudes, while the words from the topics of users who only commented on candidates are mixed, with both positive and negative words. The negative words are marked in red.

Event Detection and Event Study
We adopt an event study to measure the effects of campaign events on public opinion.
First, we use daily Twitter volume to detect bursty days. Twitter is an online platform that aggregates information about candidates during the campaign. Therefore, the number of tweets related to candidates is a signal that is related to events. We select candidate keywords to retrieve tweets via the Twitter API, as illustrated in Section 1, and then count the number of tweets $tw^c_k$ for each candidate $c$ on each day $k$. Applying a moving average model, we calculate the confidence interval bound $u^c_{k+1}$ for daily Twitter volume, where $\bar{n}$ is the average of $tw^c_k$ over the previous $m$ days and $s$ is the standard deviation. Based on a t-test with significance level $\alpha$, there exists an influential event if $tw^c_k$ exceeds $u^c_k$. Here, we set $m = 7$ and $\alpha = 0.05$. Daily Twitter volume and the confidence interval are plotted in FIG. 15.
Second, we estimate the event timespan of each detected burst. The daily tweets for each candidate are first integrated into a single document, and the terms in the documents are weighted by the tf-idf method. tf-idf is a numerical statistic that is intended to reflect how important a word is to a document in a collection or corpus. The tf-idf value increases proportionally with the number of times a word appears in a document but is offset by the frequency of the word in the corpus, which helps to adjust for the fact that some words appear more frequently in general. In the tf-idf calculation, $f_{t,d^c_k}$ is the term count of $t$ in the daily Twitter document $d^c_k$ of candidate $c$ at time $k$, $D^c$ is the collection of daily Twitter documents of candidate $c$ with $N^c = |D^c|$, and $|\{d^c_k \in D^c : t \in d^c_k\}|$ is the number of documents in which the term $t$ appears.
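The burst-detection step can be sketched as follows. The exact form of the bound is not recoverable from the text, so this sketch assumes an upper bound of the form mean plus a t quantile times the sample standard deviation of the previous $m$ days. The constant 1.943 is the one-sided t quantile for $\alpha = 0.05$ with $m - 1 = 6$ degrees of freedom, and the volumes are made up.

```python
from statistics import mean, stdev

# One-sided t quantile for alpha = 0.05 and 6 degrees of freedom (m = 7).
# The one-sided form of the test is an assumption of this sketch.
T_CRIT = 1.943


def bursty_days(volumes, m=7, t_crit=T_CRIT):
    """Indices of days whose volume exceeds the bound built from the previous m days."""
    bursts = []
    for k in range(m, len(volumes)):
        window = volumes[k - m:k]
        upper = mean(window) + t_crit * stdev(window)
        if volumes[k] > upper:
            bursts.append(k)
    return bursts


# Made-up daily tweet counts with a spike on day index 7.
volumes = [100, 110, 90, 105, 95, 100, 100, 400, 120, 100]
print(bursty_days(volumes))  # [7]
```

Note that a spike inflates the standard deviation of the following windows, so days right after a burst are less likely to be flagged. The timespan estimation in the next step partly compensates for this.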
We extract the top 30 words as keywords at the burst. Then, for each bursty day $k$, we compare the keywords to those of each day from day $k-5$ to day $k+5$. If none of the keywords from the bursty day appears in the time window, the event is eliminated. Otherwise, we take the first date on which any of the keywords appears as the start date of the event and the date on which the keywords no longer appear as the end date. If the keywords on the bursty day also appear in the keywords for day $k-5$ (or day $k+5$), we continue to check the keywords of the 5 days before day $k-5$ (or the 5 days after day $k+5$). The results are shown in TABLE 9, TABLE 10, and TABLE 11. In the three tables, the column 'timespan' represents the event window derived by keyword matching. The column 'event' provides a descriptive summary of events based on the keywords. The 'Checking date' column lists the dates whose keywords coincide with those of the bursty days. Dates with extended event timespans (that is, dates for which the search continued to the five days before $k-5$ or the five days after $k+5$) are marked in purple. The overlapping keywords are listed in the last column. Bursty days detected based on Twitter volume are highlighted in red.
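The keyword-matching procedure can be sketched under simplifying assumptions. The sketch uses the common tf-idf form (raw term count times the log of the inverse document frequency), which may differ from the paper's exact weighting. It takes the first and last matching days in the window as the timespan and does not implement the extension beyond $k \pm 5$. The tokens and function names are hypothetical.

```python
import math
from collections import Counter


def top_keywords(docs, day, top_n=30):
    """Top tf-idf terms of one daily document; docs maps day index -> token list."""
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens))
    counts = Counter(docs[day])
    scores = {t: c * math.log(n_docs / doc_freq[t]) for t, c in counts.items()}
    ranked = sorted(scores, key=lambda t: (-scores[t], t))  # ties broken by term
    return set(ranked[:top_n])


def event_timespan(docs, burst_day, top_n=30, half_window=5):
    """(start, end) of the event around a bursty day, or None if no day matches."""
    burst_kw = top_keywords(docs, burst_day, top_n)
    matching = [
        d for d in range(burst_day - half_window, burst_day + half_window + 1)
        if d != burst_day and d in docs and burst_kw & top_keywords(docs, d, top_n)
    ]
    if not matching:
        return None  # isolated burst: the event is eliminated
    days = matching + [burst_day]
    return min(days), max(days)


# Toy daily documents (made-up tokens), with a burst on day 2.
docs = {
    0: ["weather", "rain"],
    1: ["flag", "apology", "video"],
    2: ["flag", "flag", "apology", "video", "video"],
    3: ["apology", "vote"],
    4: ["vote", "count"],
}
print(event_timespan(docs, burst_day=2, top_n=2, half_window=2))  # (1, 2)
```

In the toy data, the term "apology" appears on three of five days and therefore receives a low weight, so it does not link day 3 to the burst even though the word is shared.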
To promote their political views, the three candidates held two television debates on December 27, 2015 and January 2, 2016. However, we only obtain Twitter references to Tsai Ing-wen during the first debate. To examine and compare the influence of the two debates on perceptions of all three candidates, we modify the debate timespan of Tsai Ing-wen and James Soong to be the period from December 27, 2015 to January 4, 2016, which coincides with that of Eric Chu.
After determining the events and their timespans, we calculate the impact of events on public opinion using the vote state estimates derived via the Kalman filter. We augment the state transition equation with dummy variables to capture the effects of events. We use a dummy variable $D^c_{j,k}$ to indicate each detected event, where $D^c_{j,k} = 1$ during the event timespan of event $j$ of candidate $c$, and $D^c_{j,k} = 0$ otherwise. $J^c$ is the number of detected events of candidate $c$. The coefficient $\gamma_j$ captures the event effect, which is an estimate of the average effect across event $j$. If the event has a significant effect on public opinion, then $\gamma_j$ will pass the t-test. It is worth mentioning that $p^c_k$ is equal to the posterior estimate of the state value $\hat{x}^c_{k|k}$.
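The dummy-variable regression can be sketched in a reduced form. The sketch assumes a transition coefficient of 1, no intercept, and non-overlapping events, so the least-squares estimate of each $\gamma_j$ is the mean daily change of the state estimate inside the event window. The paper's augmented equation may include further terms, and the state values and event window below are made up.

```python
import math


def event_effects(states, events):
    """Least-squares event effects on daily state changes (no intercept).

    states: daily fused vote-share estimates.
    events: {name: (start, end)} with inclusive day indices, assumed non-overlapping.
    Returns {name: (gamma, t_statistic)}.
    """
    # changes[i] is the change from day i to day i + 1.
    changes = [states[k] - states[k - 1] for k in range(1, len(states))]
    residuals = list(changes)
    fitted = {}
    for name, (start, end) in events.items():
        idx = [k - 1 for k in range(max(start, 1), end + 1)]
        gamma = sum(changes[i] for i in idx) / len(idx)
        for i in idx:
            residuals[i] = changes[i] - gamma
        fitted[name] = (gamma, len(idx))
    dof = len(changes) - len(events)
    s2 = sum(r * r for r in residuals) / dof  # residual variance
    return {
        name: (gamma, gamma / math.sqrt(s2 / n) if s2 > 0 else float("inf"))
        for name, (gamma, n) in fitted.items()
    }


# Made-up state estimates with a rise during days 4 to 6.
states = [0.50, 0.50, 0.51, 0.50, 0.52, 0.55, 0.57, 0.57, 0.56, 0.57]
effects = event_effects(states, {"debate": (4, 6)})
gamma, t_stat = effects["debate"]
print(f"gamma = {gamma:.4f}, t = {t_stat:.2f}")
```

The t-statistic compares the estimated effect with the day-to-day variation of the state estimate. With overlapping events, the dummies would need to be estimated jointly rather than window by window.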
The results are shown in [15]). BBC News cited remarks of Xiao Xinhuang, a Taiwan sociologist, and reported that "Ms Tsai would have won even if the video hadn't been posted, but the incident may have contributed another one or two percentage points" [12]. In addition, traditional pollsters also investigated the influence of the incident. TVBS announced on January 20th that "Polls show that about 500 thousand voters, accounting for 4%, who previously did not want to go out to vote might show up to cast their ballots to Taiwan-centric candidates" [16]. Taiwan Think Tank also released a poll in January, which reported that 11.9% of the respondents were influenced by the incident in the legislative election, and 11.4% of the respondents were influenced in the presidential election [17].
The meeting provoked diverse responses from parties, civil society, and countries. Ma Ying-Jeou said that "this meeting with Xi is not aimed at boosting his personal legacy after he steps down in May 2016, nor is it intended to salvage the flagging ruling Kuomintang campaign in the run-up to the Jan. 16 presidential election, but is designed entirely for the good of the next generation" [18]. But the presidential candidate and incumbent chairwoman of the DPP, Tsai Ing-wen, countered by stating that this meeting "is a manipulation of the January elections and labelling the decision-making process as opaque" [19]. Tank polls show that the meeting was supported by a majority.
***, **, and * indicate that the coefficients are significantly different from zero at the levels of 1%, 5%, and 10%, respectively.