2.1 Identification of the discursive communities
Users can interact on Twitter in different ways: for example, one can retweet the content of another user, thereby endorsing it [43] and raising its visibility. To infer the membership of the various accounts, in the present paper we leverage this activity, following the procedure adopted in [17, 18].
2.1.1 Discursive communities of verified users
On Twitter there are essentially two kinds of accounts: those that are verified, whose authenticity is certified by Twitter itself and which belong to journalists, politicians and VIPs or are the official accounts of ministries, political parties, newspapers and TV channels, and those that are not verified. About the former we have the most information available: interestingly enough, verified accounts are more devoted to producing original posts than to sharing existing ones [6]. Indeed, these accounts act like seeds, proposing new arguments for the public debate.
First, we divided the users of our data set into two groups: the verified and the non-verified accounts. Then, we represented the system as a bipartite network, where verified users are gathered on one layer and the non-verified users are gathered on the other; an edge between vertices of different layers indicates that one has retweeted the other’s content at least once during the period of study. To infer the membership of the verified users to a certain discursive community, we projected the bipartite network onto the layer of verified accounts. The procedure consists in counting, for each pair of verified accounts, how many non-verified users have retweeted both of them. The rationale is the following: the larger the number of non-verified users interacting (via tweet or retweet) with the same pair of verified accounts, the greater the possibility that the two are perceived as similar by the audience of unverified ones [17]. Nevertheless, the number of common neighbours alone is not enough to state whether two verified accounts are similar: two users may have a great number of common neighbours just because they are both extremely active on Twitter or because their nearest neighbours are among the most active unverified accounts. The statistical significance of the number of common neighbours between two verified accounts can be evaluated by comparing it with its expected value according to a null-model [32]; once the number of common nearest neighbours is deemed statistically significant, we connect the considered pair of nodes in the projected network.
In the present case, the adopted benchmark is the entropy-based null-model constraining the degree sequences of the bipartite network i.e. the Bipartite Configuration Model (BiCM [25]). The details about the whole procedure, i.e. the null-model construction and the validation, can be found in Sect. 3.
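As an illustration of the first step of the projection only, the following minimal sketch counts, for each pair of verified accounts, the non-verified users that retweeted both. The function name and the toy input are hypothetical, and the BiCM-based validation of these counts (Sect. 3) is deliberately not reproduced here.

```python
from collections import defaultdict
from itertools import combinations


def count_common_retweeters(retweets):
    """Count shared non-verified neighbours for each pair of verified accounts.

    `retweets` is an iterable of (non_verified_user, verified_user) pairs,
    one per observed interaction; repeated pairs are counted only once,
    matching the 'at least once' definition of an edge.
    """
    # Bipartite adjacency: non-verified user -> set of verified accounts.
    audience = defaultdict(set)
    for user, verified in retweets:
        audience[user].add(verified)

    # Each non-verified user contributes 1 to every pair it is linked to.
    common = defaultdict(int)
    for verified_set in audience.values():
        for a, b in combinations(sorted(verified_set), 2):
            common[(a, b)] += 1
    return dict(common)


# Toy data (hypothetical account names, not taken from the data set).
toy_retweets = [
    ("u1", "A"), ("u1", "B"), ("u1", "A"),
    ("u2", "A"), ("u2", "B"), ("u2", "C"),
    ("u3", "C"),
]
toy_common = count_common_retweeters(toy_retweets)
```

These raw counts are only the input of the validation: a pair is linked in the projected network only if its count is statistically significant with respect to the null model.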
The result of the projection is a monopartite network of 3786 edges and 576 different verified users; we used the Louvain algorithm [44] to detect the various communities.Footnote 1 The algorithm provided different communities of verified users with an overall modularity equal to 0.61. Discursive communities with a clear political orientation were already detected in other works [17, 18], but the topics studied there (i.e. the 2018 Italian elections and the political debate about migration policies) were political in nature. Remarkably, as already observed in [6], the political discursive communities shape even the wider debate targeting the Covid-19 pandemic, which includes different health, scientific, societal, economic and political facets. In order to gain more insight into the partition and spot the presence of sub-communities, we re-ran the Louvain algorithm inside each of the largest groups: as a result, we identified five major modules that can be associated with the main Italian political parties (more details on the Italian political scenario and the identity of the verified users mentioned below can be found in Appendix A):
- The M5S community contains 85 accounts of supporters and politicians of the Movimento 5 Stelle party. This community includes the official account of the movement and the accounts of personalities like Beppe Grillo, Luigi Di Maio and Virginia Raggi. Interestingly, the account of the former premier Giuseppe Conte is in this community. There are also some official accounts of ministries (like Ministero della Giustizia and Ministero del LavoroFootnote 2) and newspapers and TV-channels like Il Fatto Quotidiano and Report Rai 3.
- The right-wing (DX) community is constituted by the supporters and the politicians of the right-wing parties Lega Nord and Fratelli d’Italia. It is much smaller than the previous one (only 32 elements) and contains the accounts of Matteo Salvini, Giorgia Meloni, Lorenzo Fontana, Vittorio Sgarbi and the Russian embassy.
- The Democratic Party (Partito Democratico, or PD) community contains politicians and supporters of the main center-left party. It contains 37 nodes, among which are the official accounts of politicians such as Nicola Zingaretti, Paolo Gentiloni and Enrico Letta, and the official party account (pdnetwork).
- The Italia Viva (IV) group contains accounts of politicians affiliated to the homonymous center-left party. Here we can find the official account of the party and the accounts of Matteo Renzi, Ivan Scalfarotto and Maria Elena Boschi. Interestingly, we also note the presence of Roberto Burioni, one of the most popular Italian virologists, who is particularly active in the popularization of subjects related to the pandemic. The group contains 24 accounts in total.
- The Forza Italia (FI) community contains only 11 accounts, all of them belonging to politicians affiliated to the center-right Forza Italia party; for example, it contains the accounts of Silvio Berlusconi, Antonio Tajani and Renato Brunetta.
In addition to the political groups described above, we also considered the MEDIA community, i.e. the community that contains the official accounts of newspapers (like Repubblica and Agenzia Ansa), TV-channels (like La7TV), radio and other media. This group contains 33 verified accounts.
The other, discarded, discursive communities are mainly very small groups with fewer than 5 elements. Only three of them are larger: the largest one (38 elements) contains accounts of Italian sports journalists and newspapers, football players and clubs (for instance, the official accounts of Sky Sport, the football club AS Roma or the player Giorgio Chiellini); the second one (19 users) consists of accounts related to the digital world, like the official accounts of IBM, TIM or XIAOMI; the smallest one (10 elements) includes accounts of Swiss politicians and media. Given their nature, we speculated that these accounts (and the people interacting with them) do not actively participate in the political debates around the Covid-19 epidemic; therefore, we chose not to include them in our analysis.
The Largest Connected Component (LCC) of the validated network of verified users is shown in Fig. 1. The main communities are depicted with different colors.
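The modularity quoted above measures how much denser the edges inside the communities are than expected from the node degrees alone. As a hedged illustration, the sketch below evaluates the standard Newman modularity of a given partition for an undirected, unweighted graph without repeated edges; it does not implement the Louvain optimization itself, and the function name and toy graph are hypothetical.

```python
from collections import defaultdict


def modularity(edges, membership):
    """Newman modularity Q of a partition.

    `edges` is a list of undirected edges (u, v), each listed once;
    `membership` maps every node to its community label.
    Q = sum over communities of  L_c / L - (d_c / (2 L))**2,
    with L_c the internal edges and d_c the total degree of community c.
    """
    n_edges = len(edges)
    internal = defaultdict(int)    # edges with both ends in community c
    degree_sum = defaultdict(int)  # sum of node degrees in community c
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(
        internal[c] / n_edges - (degree_sum[c] / (2 * n_edges)) ** 2
        for c in degree_sum
    )


# Toy graph: two triangles joined by a single bridge edge.
toy_edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
two_groups = {1: "a", 2: "a", 3: "a", 4: "b", 5: "b", 6: "b"}
one_group = {node: "a" for node in range(1, 7)}
q_two = modularity(toy_edges, two_groups)
q_one = modularity(toy_edges, one_group)
```

In the toy graph, splitting the two triangles gives a positive modularity, while placing all nodes in a single community gives zero.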
2.1.2 Political orientation of non-verified users
Once verified accounts are associated to the various discursive communities, following the approach of [43], we can infer the membership of unverified ones by considering their interactions in the retweet network. As in [6, 18, 19], we use the membership of verified users as (fixed) seeds for the label propagation proposed by Raghavan et al. [46]. Recall that when this algorithm cannot find a dominant label for a specific vertex (i.e. in case of a tie), it randomly removes some of the edges attached to that vertex and repeats the procedure. Due to this intrinsic stochasticity, we ran the label propagation 500 times and assigned to each node the most frequent label (actually, the noise in the assignment of the labels is extremely limited). As a result, approximately 89% of the users in the network were assigned to one of the 6 discursive communities described in the subsection above. They are distributed as follows:
- 117,798 users in MEDIA group;
- 27,989 users in DX group;
- 7230 users in M5S group;
- 1685 users in IV group;
- 1408 users in PD group;
- 430 users in FI group.
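The seeded label propagation with repeated runs described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation used for the results: ties are broken by a random choice among the tied labels instead of by removing edges, and all function names and the toy graph are hypothetical.

```python
import random
from collections import Counter


def propagate_labels(adjacency, seeds, rng, max_sweeps=100):
    """One run of label propagation with fixed seed labels.

    `adjacency` maps each node to the set of its neighbours;
    `seeds` maps verified accounts to their (fixed) community label.
    """
    labels = dict(seeds)
    free_nodes = [node for node in adjacency if node not in seeds]
    for _ in range(max_sweeps):
        rng.shuffle(free_nodes)  # asynchronous updates in random order
        changed = False
        for node in free_nodes:
            counts = Counter(
                labels[nb] for nb in adjacency[node] if nb in labels
            )
            if not counts:
                continue  # no labelled neighbour yet
            top = max(counts.values())
            best = sorted(lab for lab, c in counts.items() if c == top)
            if labels.get(node) in best:
                continue  # current label is already a dominant one
            labels[node] = rng.choice(best)  # assumption: random tie-break
            changed = True
        if not changed:
            break
    return labels


def consensus_labels(adjacency, seeds, runs=500, seed=0):
    """Repeat the propagation and keep the most frequent label per node."""
    rng = random.Random(seed)
    votes = {node: Counter() for node in adjacency}
    for _ in range(runs):
        for node, label in propagate_labels(adjacency, seeds, rng).items():
            votes[node][label] += 1
    return {n: v.most_common(1)[0][0] for n, v in votes.items() if v}


# Toy retweet network: A and B are seeds, u4 has no interactions.
toy_adjacency = {
    "A": {"u1", "u2"}, "B": {"u3"},
    "u1": {"A", "u2"}, "u2": {"A", "u1"}, "u3": {"B"}, "u4": set(),
}
toy_seeds = {"A": "M5S", "B": "DX"}
toy_labels = consensus_labels(toy_adjacency, toy_seeds, runs=20)
```

Nodes that never receive a label, such as the isolated `u4` in the toy example, are left out of the result, which mirrors the fact that only a fraction of the users ends up in one of the discursive communities.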
As expected, the MEDIA community is the biggest one: it represents users who extensively share news from the accounts of newspapers, radios or newscasts. Looking at the political groups, it is interesting to see that the M5S group contains fewer elements than the DX community, whereas, when only verified accounts are considered, the M5S community includes more than 1.5 times the total number of users of the DX one. The center-left (PD and IV) and FI communities are quite small; the vertices with the largest degree are mostly verified accounts belonging to the MEDIA group (e.g. newspapers or press agencies such as La Repubblica, La Stampa, Ansa).
The most retweeted accounts are those of Giorgia Meloni (DX community) and Roberto Burioni (IV community). Remarkably, the vertex with the largest degree, which the label propagation algorithm assigns to the DX community, is a non-verified user whose number of neighbours amounts to 27,000 and whose activity consists of sharing news every day.Footnote 3 It often shares racially-motivated news that the debunking web-site Bufale.net has identified as lacking in sources.
Overall, our results confirm the ones observed in other works, i.e. that communication on Online Social Networks (OSNs) is characterized by a strong polarization, in turn inducing a strongly modular system [6, 16–19, 43, 47–49].
2.1.3 Social-bots
Social bots, or simply bots, are social accounts governed, completely or partly, by pieces of software that automatically create, share and like contents on Twitter and other platforms. In general, the usage of automated accounts is allowed by the Twitter platform for promotional purposes by various companies (see Twitter Developer’s Automation Rules). Nevertheless, bots often pretend to be human accounts and aim at influencing and diverting the course of discussions by inflating the visibility of some genuine accounts [10, 14].
In the present manuscript we used Botometer [14], a tool based on supervised machine learning: given a Twitter account, Botometer extracts over 1000 features and produces a classification score called “bot score”: according to the algorithm, the higher the score, the greater the likelihood that the account is controlled completely or in part by software. Botometer revealed that in our data set social-bots shared 52,054 different tweets (approximately 3%–4% of the entire data set) with 74,884 hashtags. The hashtags most used by bots are: #iorestoacasa, #italia, #news, #quarantena and #conte.Footnote 4
In Fig. 2 the temporal evolution of the number of tweets shared by social-bots during the period of study is displayed. Remarkably, this trend is similar to the trend for the entire set of users, except for the presence of two peaks, on the 14th and on the 17th of April. The intense activity of bots on these two days is linked to the MEDIA discursive community. Indeed, 78% and 76% of their retweets on April 14 and 17, respectively, were directed to accounts of this group and only 16% to users of the DX one (with much lower percentages for the other communities). Among the hashtags most posted by bots, there are some references to Coronabonds, to Giuseppe Conte (the prime minister at the time of the data collection) and to the right-oriented Italian party Lega Nord. Indeed, in mid-April the political debate in Italy became more intense, in particular about the European Stability Mechanism (ESM). The ESM is an international organization born as a European financial fund for the financial stability of the euro area; in those days the Italian government was considering the possibility of using these funds in order to limit the impact of the pandemic, analysing the possible consequences. Right-wing and center-right parties and the M5S were against its usage, while both PD and IV were in favour of it. In particular, the prime minister Giuseppe Conte proposed to avoid the usage of the ESM in favour of European Bonds (or Coronabonds). Lega, while still against the usage of the ESM, voted against Coronabonds in the European Parliament and was therefore strongly criticized. The extremely technical nature of the problem could explain the lower interest of the other discursive communities with respect to that of the MEDIA group.
Then, we looked at the interactions of social bots with the political groups identified before; in particular, we examined how many bots retweeted contents from verified accounts: while most of the accounts retweeted by bots belong to the MEDIA group (e.g. La Repubblica, Agenzia Ansa, Sky TG 24), the most retweeted people are Roberto Burioni and Giorgia Meloni. In general, bots interact most with the MEDIA group, followed by the verified accounts of M5S, DX, IV, PD and FI.
The community with the largest percentage of social-bots is the FI community with 21.9%, followed by IV with 10%, PD with 8.4%, M5S with 6.5%, MEDIA with 5.3% and DX with 4.1%. When absolute numbers are considered, a strong prevalence of bots from the MEDIA community appears, followed by those of the DX community (see Fig. 3). More details about the temporal evolution of the activity of automated accounts can be found in Appendix B.
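The reversal between the ranking by percentage and the ranking by absolute numbers can be checked with a rough back-of-the-envelope computation. The sketch below assumes that the percentages above apply to the community sizes reported in Sect. 2.1.2; the text does not state this explicitly, so the resulting counts are only indicative estimates and are not figures from the paper.

```python
# Community sizes from Sect. 2.1.2 (users assigned by label propagation).
community_size = {
    "MEDIA": 117798, "DX": 27989, "M5S": 7230,
    "IV": 1685, "PD": 1408, "FI": 430,
}
# Percentage of social-bots per community, as quoted in the text.
bot_percentage = {
    "FI": 21.9, "IV": 10.0, "PD": 8.4,
    "M5S": 6.5, "MEDIA": 5.3, "DX": 4.1,
}

# Assumption: percentage x community size approximates the number of bots.
estimated_bots = {
    name: round(community_size[name] * bot_percentage[name] / 100)
    for name in community_size
}

ranking_by_share = sorted(bot_percentage, key=bot_percentage.get, reverse=True)
ranking_by_count = sorted(estimated_bots, key=estimated_bots.get, reverse=True)
```

Under this assumption, FI leads the ranking by share but comes last by estimated count, while MEDIA and DX lead the ranking by count, consistently with the statement above.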
2.2 The semantic network
Let us now analyse the evolution of the narratives characterizing the online debate during the first peak of the contagion. As in [19, 20], we start from the bipartite network of accounts and hashtags, where a link between the user u and the hashtag h is present if user u has used hashtag h at least once. Then, by using the same procedure implemented to determine the discursive communities of the validated accounts, we can extract the (validated) semantic network. As already mentioned in the Introduction, the approach that we follow in the present manuscript is slightly different from the one used in [19, 20]: there, the authors analysed the semantic network defined by each discursive community, while here we consider the “global” semantic network and how the various discursive communities interact with it. The resulting network is formed by 5666 different hashtags, linked by 90,560 connections.
Interestingly, even if the main topic is not strictly political, the most connected hashtags, i.e. those with the highest values of the degree, refer to political parties and politicians: #pd, #oms, #m5s, #lamorgese, #regione, #lazio, #dimaio, #governo, #zingaretti, #mes and #conte.Footnote 5
We then ran the Louvain algorithm again to detect the various semantic communities; the algorithm provided 61 different communities of hashtags (with modularity \(Q\simeq 0.56\)). We focused only on the most populated ones (see Fig. 4). The biggest communities refer to some of the most debated themes and subjects during the pandemic, in particular:
- the Red community contains mostly political subjects: here we can find the names of the governing political parties at the moment of the data collection. In this sense, heavy criticisms towards the Prime Minister are present;
- the Violet community includes subjects related to the Catholic Church and to Pope Francis;
- the Yellow community is the most crowded one, with pieces of news related to either the local (e.g. at regional level) or the global response to the epidemic;
- in Blue, we find updates on the Covid-19 situation (number of deaths, number of contagions, etc.);
- the Cyan community includes hashtags related to trade unions, remote working and the actions adopted by the government at the time of data collection to sustain employment. Those arguments quickly became hot topics, since Covid-19 had a heavy impact on employment and forced firms to take countermeasures such as remote working [7];
- the Green community includes hashtags related to sports (in particular football, the most followed sport in Italy) that, like many other activities, had to stop.
More details about the hashtags in the various communities of the semantic network can be found in Appendix C.
2.2.1 Temporal activities over the semantic network
After the identification of the most important topics within the main communities, we examined the temporal evolution, on a daily scale, of the number of published hashtags belonging to each community (see Fig. 5). Tracking these temporal behaviours is important for understanding which events may have caused an increase in Twitter activity about a specific topic. Looking at the temporal evolution, the first thing that catches the eye is that all the trends are upwards, indicating that the Twitter conversations about Covid-19 became more intense from mid-April. We identified some events on specific days, which are strictly related to the main topic of the community under examination:
- 17/04/2020: on this day there was the vote on the activation of “corona bonds” (joint debt issued to member states of the EU) at the European Parliament. The parties of Lega Nord and Forza Italia voted against, and this caused a lot of comments on Twitter as well. There is a peak on this day in the yellow community, which also contains the hashtags #coronabond, #eurobond, #lega and #salvinisciacallo.Footnote 6
- 19/04/2020: on this day there is a peak in the curve of the violet community. On that day the hashtag #25aprile (i.e. the Italian Liberation Day from Nazi-fascist occupation) was published many times. In particular, some statements of the senator Ignazio La Russa about the nature of the commemorations on that day caused debates and controversies on Twitter.Footnote 7 Indeed, the hashtag #ignaziolarussa is also contained in the violet community.
- 20/04/2020: the so-called “second phase”, during which less severe measures were implemented and some shops and workplaces reopened, started on this day. The cyan community contains the hashtag #fase2 (phase 2) and consequently its trend shows a peak on 20/04. On this day the US President at the time of data collection, Donald Trump, declared in a tweet that he would sign an executive order suspending immigration to the United States to stop the virus, and this announcement caused comments and debates on social networks. Again in the yellow community, which also contains the hashtag #trump, there is a peak on this day. The green community also shows a maximum on the 20th of April, due to the statements of the Sport Minister Vincenzo Spadafora, in which he expressed doubts about the resumption of the Italian football championship Serie A.
- 21/04/2020: the temporal evolution of the red community, i.e. the “political” one, shows a peak on this day. The most-shared hashtag of this community on this day is #quartarepublica, an Italian TV-program. The day before, the main guest of the program was Silvio Berlusconi, leader of the “Forza Italia” party and former Prime Minister. Other guests were the Governor of the Veneto region Luca Zaia, the Councilor for Welfare of Lombardy Giulio Gallera and the Mayor of Naples Luigi De Magistris. As observed in similar studies [19, 20], right and center-right wing users are particularly active on mediated events, i.e. public events such as TV interviews that are heavily covered by users on Twitter.
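The daily curves discussed in this subsection amount to counting, for each day, the hashtags that fall in each semantic community and then locating the maxima. The following minimal sketch illustrates this under stated assumptions: the function names are hypothetical, the hashtag-to-community assignments reuse the examples given in the text, and the records are toy data rather than the actual counts.

```python
from collections import Counter, defaultdict


def daily_counts(records, hashtag_community):
    """Count published hashtags per semantic community and per day.

    `records` is an iterable of (iso_date, hashtag) pairs, one per use;
    `hashtag_community` maps lower-case hashtags to a community label.
    Hashtags outside the considered communities are ignored.
    """
    series = defaultdict(Counter)
    for day, hashtag in records:
        community = hashtag_community.get(hashtag.lower())
        if community is not None:
            series[community][day] += 1
    return series


def peak_day(counter):
    """Day with the largest count (earliest day in case of ties)."""
    return min(counter, key=lambda day: (-counter[day], day))


# Assignments mentioned in the text; the records themselves are toy data.
toy_communities = {
    "#coronabond": "yellow", "#lega": "yellow", "#trump": "yellow",
    "#fase2": "cyan",
}
toy_records = [
    ("2020-04-17", "#coronabond"), ("2020-04-17", "#lega"),
    ("2020-04-20", "#trump"), ("2020-04-19", "#fase2"),
    ("2020-04-20", "#Fase2"), ("2020-04-20", "#fase2"),
    ("2020-04-20", "#unrelated"),
]
toy_series = daily_counts(toy_records, toy_communities)
toy_peaks = {name: peak_day(counts) for name, counts in toy_series.items()}
```

A single maximum per curve is a simplification: a community may show several peaks over the period, as the yellow one does in the list above.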
2.2.2 Semantic activity of the discursive communities
Figure 6 shows the semantic activity of the various discursive communities, including both verified and unverified users. In the DX group (and, similarly, in the FI group) there is a sharp prevalence of the red community, due to the presence of hashtags against the government. As for the M5S and the PD, the yellow community is the most shared one. Within the MEDIA group, hashtags are homogeneously distributed among the violet, yellow, blue and cyan communities.
We repeated the same analysis for social-bots and observed that hashtags by bots are quite evenly distributed across all communities (with the exception of the green one): still, social-bots interact mostly with the verified accounts of MEDIA group and, therefore, they share news and contents of different types. In general the situation changes when considering only verified accounts: more details can be found in Appendix D.
2.2.3 Tracking conspiracy theories and d/misinformation campaigns
In the literature, “disinformation” and “misinformation” have different meanings, both referring to the spread of false information: the former concerns deliberate diffusion, while the latter refers to an unintentional mechanism [50, 51]. At the present level, we cannot distinguish between the two different natures, thus we always use the term d/misinformation. One of the most interesting aspects of our analysis is the possibility of investigating the spread of forms of d/misinformation online. We have identified 2 sub-communities of hashtags related to d/misinformation campaigns regarding the origin and the diffusion of the coronavirus.
Looking at the hashtags of the first community, connections between Bill Gates, vaccines, 5G, nano-/micro-chips and, naturally, the Coronavirus emerge.Footnote 8 Indeed, one of the most widespread false claims seems to be the theory according to which the pandemic is a plan masterminded by Bill Gates to implant microchips into humans along with a Coronavirus vaccine. Other interesting connections are those between the hashtags #colao and #montagnier: the first refers to Vittorio Colao, former CEO of Vodafone and new director of the task force formed by the premier Giuseppe Conte, and the latter refers to the 2008 Nobel laureate in medicine, Dr. Montagnier. Conspiracy theorists extracted one phrase from a 2019 video, in which Colao said something about a “medical substance” that could be injected thanks to 5G. Dr. Montagnier, for his part, stated in an interview that the spread of the Coronavirus was due to a human error by scientists trying to develop a vaccine, precisely against AIDS (in other sub-communities we also found hashtags like #HIV and #AIDS). There are also some references to the “Immuni” app, the application developed by the company Bending Spoons, appointed by the Italian government, for contact tracing in order to control the spread of the epidemic; shortly after the release of the app, there were worries about privacy and some users argued that the app was created for spying on people.
The second community is about the creation of the virus in a laboratory by Chinese scientists.Footnote 9 We plotted the temporal evolution of the daily number of published hashtags for these communities as well (Fig. 7), trying again to identify those events that may have caused increased attention on Twitter about these topics. For the community about 5G there is a peak on April 21st and 22nd; on the former there was a trial in the Hague court against the Dutch government for the introduction of 5G, brought by the group “Stop 5GNL”. Moreover, rumours about Bill Gates started to spread from mid-April, when conspiracy theorists used a “TED Talk” from 2015, in which Gates warned that the world was not prepared for an epidemic, to confirm their theories. The second community has a peak on the 16th of April, when a news report was published about American intelligence investigating the alleged creation of the Coronavirus in a laboratory in Wuhan.
Confirming the results of Ref. [2], those conspiracy communities represent a minority in the semantic network: the hashtags included in the conspiracy communities are respectively 62 and 15, out of a total of 5666 different hashtags. Even in terms of popularity, their impact is limited: the hashtags of the first conspiracy community were shared 10,452 times and those of the second one 1287 times, against a total of 602,299 messages containing at least one hashtag.
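To make the orders of magnitude explicit, the figures above can be turned into percentages. The sketch below is simple arithmetic on the quoted numbers; the community names used as dictionary keys are shorthand labels introduced here, and the second ratio compares hashtag uses with messages, so it should be read as a rough indication rather than an exact fraction of messages.

```python
# Figures quoted in the text.
total_hashtags = 5666
messages_with_hashtag = 602299

# Shorthand labels (introduced here) for the two conspiracy communities.
conspiracy_hashtags = {"5G and Bill Gates": 62, "laboratory origin": 15}
conspiracy_uses = {"5G and Bill Gates": 10452, "laboratory origin": 1287}

# Share of distinct hashtags belonging to the conspiracy communities.
hashtag_share = 100 * sum(conspiracy_hashtags.values()) / total_hashtags

# Uses of conspiracy hashtags relative to messages with at least one
# hashtag (a message may contain several hashtags, hence a rough ratio).
usage_share = 100 * sum(conspiracy_uses.values()) / messages_with_hashtag
```

Both ratios stay below 2%, in line with the statement that these communities are a minority both in the semantic network and in the overall activity.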
Remarkably, not all discursive communities share conspiracy hashtags in the same way. Fig. 8 shows the fractions of users of the 6 discursive communities listed above that shared the hashtags of the three communities of d/misinformation. The DX community is the one most affected by d/misinformation, followed by the MEDIA group, which however contains many more users than the DX one. Moreover, the discursive communities not included in our analysis do not participate at all in the sharing of d/misinformation contents. Even in this sense, our analysis confirms the findings of Ref. [6]: there the Non Reliable sources, as tagged by NewsGuard, were almost exclusively shared by the DX community (in Ref. [6] the MEDIA community was not analysed).