Arab reactions towards Russo-Ukrainian war

The aim of this paper is to analyze the reactions and attitudes of Arab people towards the Russo-Ukrainian War through tweets posted on social media, a fast means of expressing opinions. We scraped over 3 million tweets using keywords related to the war and performed sentiment, emotion, and partiality analyses. For sentiment analysis, we employed a voting technique over several pre-trained Arabic language foundational models. For emotion analysis, we utilized a pre-constructed emotion lexicon. Partiality is analyzed by classifying tweets as ‘Pro-Russia’, ‘Pro-Ukraine’, or ‘Neither’; it indicates the bias or empathy towards either of the conflicting parties. This was achieved by constructing a weighted lexicon of n-grams related to either side. We found that the majority of the tweets carried ‘Negative’ sentiment. Emotions were less clear-cut, with many tweets carrying ‘Mixed Feelings’. The more decisive tweets conveyed either ‘Joy’ or ‘Anger’. This may be attributed to celebrating victory (‘Joy’) or complaining about destruction (‘Anger’). Finally, for partiality analysis, the number of tweets classified as ‘Pro-Ukraine’ was slightly greater than the number classified as ‘Pro-Russia’ at the beginning of the war (specifically from February 2022 until April 2022); it then slowly decreased until the two nearly converged at the start of June 2022, with empathy shifting towards Russia in August 2022. Our interpretation is that the fierce and surprising initial Russian attack, together with the number of refugees who escaped to neighboring countries, gained Ukraine much empathy. However, by April 2022 the intensity of the Russian attack had decreased, and with the heavy sanctions that the U.S. and the West applied to Russia, Russia began to gain empathy while empathy for the Ukrainian side decreased.


Introduction
The Ukrainian crisis is one of the most complicated and unfortunate events of this decade, with many aspects that must be considered to form an informed opinion about it. Social media platforms (e.g., Facebook, Twitter, etc.) are currently the main data source for public opinion analysis [1]. A great deal of work has been done to analyze public opinion on ongoing affairs and to study the influence of such events on people. For instance, the authors in [2] showed that the 2012 Summer Olympic Games, held in London, increased the life satisfaction and happiness of Londoners during the Olympics period, particularly around the opening and closing ceremonies. There were no consistent changes (either positive or negative) in anxiety during this period in comparison to residents of other cities such as Paris and Berlin.
Amid the Brexit controversy, the researchers in [3] studied public attitudes towards the European Union (EU), testing the effect of "real world" arguments from both sides of the campaign that attempted to influence the vote through pro-EU or anti-EU messages. Their main finding was that the pro-EU arguments had the potential to significantly increase support for "remaining" in the union, whilst the anti-EU arguments had less potential to affect support for either "remaining" or "leaving".
As mentioned in [4], Twitter data are an important source for studying public response, and were thus utilized to examine the COVID-19 related discussions, concerns, and sentiments that emerged from tweets. The results indicated that the dominant sentiment regarding the spread of coronavirus was "anticipation", followed by mixed feelings of "trust", "anger", and "fear" for different topics, and significant feelings of "fear" when new cases and deaths were discussed. In [5], the authors presented a Twitter dataset of the Russo-Ukrainian war. The majority of the tweets (about 60%) are written in English. By the day their paper was written (7th of April, 2022), the dataset had reached 57.3 million tweets written by 7.7 million users.
In [6], the authors also provided a Twitter dataset of the Russo-Ukrainian conflict. The data collection process was not filtered by language or geographical location. Thus, the dataset includes tweets in several languages from different regions. The authors performed some descriptive analysis of this dataset. For example, an analysis of the daily volume of tweets revealed that an average of about 200,000 tweets were posted daily. The authors also presented the number of tweets containing each of the keywords used in data crawling; this revealed that most of the tweets contained the keyword 'putin' (328,186 tweets), followed by 'zelensky' (86,122 tweets). Moreover, the authors presented the top-10 used hashtags and mentions. This analysis revealed that Zelensky had the highest number of mentions, followed by NATO and other western leaders. A word cloud of the tweet text was also provided, showing the significance of tokens like 'breaking', 'news', and 'suspensions'.
The main interest of the current work is detecting and analyzing how people in the Arabic-speaking Middle and Near East reacted towards the Russo-Ukrainian War and its related parties as the conflict unfolded. It is true that the Middle/Near East, and the Arab World in particular, are not directly involved in this conflict, but the conflict has dire direct and indirect consequences for the region, such as its effect on the prices of oil, food, and other vital commodities and goods, and the pressure from either side of the conflict to attract support from the region. As mentioned in [7], African economies, including Arab African countries, stand to be the worst losers aside from Ukraine should this war escalate further. While the region's oil, gas, and commodity-exporting countries have benefited from rising energy and commodity prices, strong negative effects have befallen other Arab as well as African countries because of their heavy dependence on Russian and Ukrainian food imports and other essential metal and oil products.
The different reactions towards the conflict are apparent. In order to better understand them, we aim to learn the public's perception of the war, which side people favor, and to what extent. The approach we followed was to collect social media posts, particularly tweets, related to the war incidents and to apply the required preprocessing. Specifically, our

كييف (en: Kiev). The total number of queries used in the search was 40, incorporating uni-grams, bi-grams, and tri-grams. Table 1 presents the exact search queries used in the data scraping procedure for the tweets in our dataset.
Compared with survey polls, social media posts can yield a better and more thorough, scientifically grounded perception of public opinion about specific topics [1]. Moreover, social media (such as Twitter) currently plays a major role in shaping public opinion and attitudes, as has been observed by several studies [8]. The results of that study showed, by identifying users with high in-degree centrality among those participating in a political discussion, how an election candidate in the U.S. can influence other users and change the course of an election, as happened in the 2012 and 2016 U.S. presidential elections. The authors in [9] concluded that automated public opinion monitoring using social media is a very powerful tool, able to provide interested parties with valuable insights for more fruitful decision making. Twitter has been gaining significant attention in this respect, since people use it to express their views and politicians use it to reach their voters in a short, concise, yet effective way.
In the current work we performed three different kinds of analyses over the collected Twitter data. The first is concerned with sentiment: what people's attitudes towards the war have been, as expressed through short, condensed tweet text. The sentiment can be positive, negative, or neutral. The second kind of analysis is concerned with emotion and covers six different emotions. We adopted a lexicon-based approach for emotion analysis. This approach calculates the semantic orientation of a specific text (e.g., documents, tweets, posts, etc.) from the semantic orientation of its lexicon entries (by aggregating the scores of the individual n-grams in the text). A learning-based approach is harder at this stage, as emotion datasets for the Arabic language are very scarce and very limited in size. The final analysis is concerned with people's bias/partiality towards either of the two parties directly involved in the conflict. This gives an indication of the credibility and propaganda success of each of the conflicting parties, at least throughout the Arab region.
Performing several kinds of analyses aims at detecting the impressions and opinions expressed in different forms. More specifically, sentiment analysis aims to capture the main attitude behind the tweet, emotion analysis searches for specific strong feelings in the tweet, and partiality analysis aims to determine how people favor either side, which in the current geopolitical context can convey significant trends in public opinion that may influence decision makers regarding the current and future international situation. It is important to note that, under our assumption, being 'Pro-Russia' involves favoring Russia and/or its supposed allies, disapproving of the narrative of the U.S., Ukraine, and/or the West, and standing by the war. On the other hand, being 'Pro-Ukraine' involves favoring Ukraine, the U.S., or the West, disapproving of Russia and/or its supposed allies, and disapproving of the war and/or the Russian narrative.
The paper is organized as follows. Section 1 is an introduction. Section 2 presents a background on the techniques and approaches utilized in the proposed analysis. In addition, this section presents some related works that utilize Twitter data to analyze how people react towards specific topics and events. Section 3 presents our methodology, including the data collection process and the three types of analyses performed over the data. Section 4 presents the experimental work performed to analyze the reactions with respect to the three different aspects, along with the results and discussions. Finally, Section 6 concludes the paper with pointers to future work.

Background and related works
Natural Language Processing (NLP) has significantly increased its potential in the last several years, with the newly trained large foundational models and their impact on NLP applications. NLP is a subfield of computer science, artificial intelligence, and linguistics that is concerned with developing computational tools to understand text and speech in a way similar to humans. This includes the interactions between computing devices and humans, as well as programming computers to process and analyze large amounts of human language. Arabic NLP is the application of NLP tools and technologies, particularly artificial intelligence and text mining, to understand the Arabic language in general, and Modern Standard Arabic (MSA) and the different Arabic dialects in particular. Applications include Arabic text search, Part-of-Speech (PoS) tagging, translation, diacritization, sentiment and emotion analyses, topic modeling, document summarization, etc.
Our focus in this article is to apply Arabic NLP techniques to extract semantic insights from Twitter text data.

Sentiment analysis
Sentiment analysis accounts for a sizable share of Arabic NLP applications, which have led to impressive findings. In [10] the authors described different approaches to Twitter sentiment analysis, including machine learning, lexicon-based, and hybrid approaches. In our work we utilized pre-trained machine learning models to perform sentiment analysis. We determine the sentiment of a tweet by taking the majority vote of three of the best-known state-of-the-art models used to predict sentiment in Arabic: (1) the Mazajak model [11], built on a Convolutional Neural Network (CNN) [12] followed by a Long Short-Term Memory (LSTM) network [13]; (2) AraBERT [14], a transformer-based model inspired by the Google BERT model [15]; and (3) CaMeL-Tools [16], whose driving design principles were largely inspired by MADAMIRA [17], Farasa [18], CoreNLP [19], and NLTK [16]. AraBERT can be fine-tuned on different datasets; we decided to fine-tune it on ArSAS [20], a multi-class dataset where sentences are classified under one of the following classes: 'Positive', 'Negative', 'Neutral', or 'Mixed'.
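The majority vote over the three model predictions can be illustrated with a minimal sketch. The function name `majority_vote` and the tie-handling rule (returning 'Unspecified' when no label wins outright) are assumptions made for illustration; the source does not state how ties among the three models are resolved.

```python
from collections import Counter


def majority_vote(labels, fallback="Unspecified"):
    """Return the label predicted by most models.

    `labels` holds one predicted label per model (three in this paper).
    Returning `fallback` on a tie is an assumption, not the authors' rule.
    """
    if not labels:
        return fallback
    ranked = Counter(labels).most_common()
    # A tie for first place means there is no majority label.
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return fallback
    return ranked[0][0]


if __name__ == "__main__":
    # Hypothetical predictions from three sentiment models for one tweet.
    print(majority_vote(["Negative", "Negative", "Positive"]))
```

With three voters and three or four possible classes, a tie occurs only when every model predicts a different label, so the fallback is reached relatively rarely.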
Mazajak [11] is built on a Convolutional Neural Network (CNN) followed by a Long Short-Term Memory (LSTM) network. The word embeddings of Mazajak were built from a corpus of 250 million different Arabic tweets. The tweets were scraped over time periods between 2013 and 2016. LSTM is a recurrent neural network (RNN) used heavily in AI and deep learning applications, especially for modeling and analyzing sequential data. Unlike standard feedforward neural networks, LSTM has feedback connections to handle sequential data and realizes memorization for that purpose. This recurrent network can process single data points (e.g., images) as well as entire sequences of data (e.g., text, speech, video, etc.). The notion of LSTM stems from the analogy that a typical RNN should have both "long-term memory" and "short-term memory". The weights and biases in the network connections change once per epoch of training. This is analogous to how physiological changes in synaptic strengths store long-term memories. The activation patterns in the network, in contrast, change once per time-step.
The second pre-trained model is AraBERT [14], an Arabic language model based on the Google BERT model [15]. AraBERT was pre-trained on text manually extracted from Arabic news websites. The authors also used two publicly available large Arabic corpora: the Arabic Corpus [21], consisting of 1.5 billion words from more than 5 million articles collected from 10 main news sources covering 8 countries, and the Open Source International Arabic News Corpus (OSIAN) [22], consisting of 3.5 million articles (∼1B tokens) extracted from 31 news sources covering 24 Arab countries. The final size of the pre-training dataset is 70 million sentences (after duplicate sentences were removed), that is, ∼24 GB of text. Bidirectional Encoder Representations from Transformers (BERT) is a transformer-based ML model for NLP pre-training developed by Google [15]. The original English-language BERT comes in two sizes: (1) BERT-BASE, which consists of 12 encoders with 12 bidirectional self-attention heads, and (2) BERT-LARGE, which consists of 24 encoders with 16 bidirectional self-attention heads. Both models are pre-trained on unlabeled data extracted from the BooksCorpus with 800M words and English Wikipedia with 2500M words. The last tool used in the sentiment vote is CaMeL-Tools [16], whose driving design principles were largely inspired by MADAMIRA [17], Farasa [18], CoreNLP [19], and NLTK [16].
The authors in [23] presented a linguistically accurate, large-scale morphological analyzer for Egyptian Arabic, which differs from Modern Standard Arabic (MSA) phonologically, morphologically, and lexically, and has no standardized orthography. The authors in [24] presented ADIDA, a system for automatic dialect identification of Arabic text that distinguishes between the dialects of 25 Arab cities in addition to MSA. A dialect identification system in CaMeL-Tools [16] was used as the back-end component of ADIDA for computing the dialect probabilities of the given input. Arabic is distinguished by a complex structure that a computational system has to deal with at each linguistic level [25]. Besides the structure, one of the most challenging difficulties is how the analysis may differ across Arabic dialects.
Twitter sentiment analysis is concerned with analyzing users' tweets in terms of thoughts and opinions in a variety of domains. This analysis can be very important for researchers who need to understand people's views about a particular topic or event [10].
In [26] the authors collected a dataset that was clustered into three categories highlighting the use of social media in the 'Jammu and Kashmir' conflict. They identified how people utilized Twitter to reach many others at the same time, without geographical restrictions, and to raise awareness of the situation in Kashmir by using hashtags, retweets, or replies to tweets. In [27] the authors collected more than 43,000 tweets by Donald Trump, trying to identify patterns in his tweets, changes over time, and how entering politics affected his behavior on social media. They also identified the topics that the 45th president of the U.S. discussed on Twitter.
In [28] the authors collected 1,433,032 tweets posted between August and October 2018, from which they extracted 57,842 tweets related to Hurricane Florence. Their analysis showed that, in online communication, human sentiment plays an important role in spreading disaster information compared to the news of the hurricane. Moreover, people actively utilized Twitter to share many emotions, opinions, and pieces of information about the hurricane. The authors concluded that governments and decision makers should monitor Twitter data to understand the human environment.
In [29] the authors collected a dataset for the analysis of the 2013 Brazilian protests, with sentiment annotated by three raters. Each document was classified into one of three classes: positive, negative, or neutral, with 56% classified as Neutral and only 4% as Negative.
Regarding work related to the Arab world and the Arabic language, in [30] the authors tried to understand the roots of the ISIS terrorist group and its supporters using data collected from Twitter. They classified tweets as "Pro-ISIS" or "Anti-ISIS" and then went back to analyze the historical timelines of both supporting and opposing users, looking at their pre-ISIS period. One of the conclusions reached was that ISIS supporters refer far more often to the Arab Spring uprisings that failed.
In [31] the authors aimed to predict online Islamophobic behavior after the Paris terrorist attacks of the 13th of November 2015 by collecting millions of tweets related to these attacks. Tweets mentioning Islam and Muslims were then identified, and attitudes towards Islam and Muslims before the attacks were examined. The authors built a classifier to predict post-event stance towards Muslims utilizing pre-event interactions.
In [32] the authors investigated the emotional intensity of students' public opinion on the Internet. The authors studied the challenges of feature selection in sentiment tendency analysis. Sentiment analysis of students' cross-media written text was done through an improved MapReduce combinator model. In [33] the authors utilized the Guardian newspaper and extracted useful information about the world's points of view on the important events in Egypt, from early 2011 onwards, to detect the world's perception of such events. The authors performed sentiment analysis on the articles in the 'World' section spanning the period from the start of 2010 to the end of 2017, using the 'Egypt' keyword. The authors extracted the uni-gram tokens from every article and utilized these tokens to infer the sentiment using three lexicons: AFINN, NRC, and Bing. The analysis indicated that the common trend was slightly negative during the whole selected period. Some conflicting feelings appeared during this time span, e.g., positive, negative, trust, fear, anger, and anticipation. The findings also showed that the years 2011 and 2013 had the peaks in both positive and negative sentiments, attributed to the two uprisings in Egypt.
The lexicon-based approach is one of the main approaches for semantic analysis; it measures the semantic class of a specific text from the semantic orientation of its words [42]. The semantic class can be a sentiment class (positive, neutral, or negative) [42] or an emotion class (e.g., anger, disgust, fear, joy, sadness). Specifically, the lexicon-based approach uses a semantic lexicon to score a document by aggregating the semantic scores (or taking the majority class(es)) of all the words in the document. The semantic lexicon contains words and their corresponding semantic scores [43].
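The aggregation step of the lexicon-based approach can be sketched as follows. This is a minimal illustration under stated assumptions: the toy lexicon, its English placeholder words, its scores, and the zero threshold are all hypothetical and are not taken from any of the cited lexicons.

```python
def lexicon_score(tokens, lexicon):
    """Sum the semantic scores of all tokens found in the lexicon.

    Tokens missing from the lexicon contribute 0 (an assumption).
    """
    return sum(lexicon.get(token, 0.0) for token in tokens)


def lexicon_class(tokens, lexicon, threshold=0.0):
    """Map the aggregated score to a sentiment class."""
    score = lexicon_score(tokens, lexicon)
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    # Hypothetical toy lexicon: word -> semantic score.
    toy_lexicon = {"good": 1.0, "bad": -1.0, "terrible": -2.0}
    print(lexicon_class(["a", "terrible", "day"], toy_lexicon))
```

Taking the majority class instead of summing scores, as the text also mentions, would replace the sum with a count of class labels per token.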
Most works on opinion analysis in English can depend successfully on a sentiment lexicon like SentiWordNet, e.g., [44][45][46][47]. However, Arabic sentiment lexicons face some challenges, e.g., limited size, usability issues arising from the rich morphology of Arabic, lack of public availability, and the huge diversity among the different dialects. The authors in [48] addressed these issues and created a publicly available large-scale Standard Arabic sentiment lexicon (ArSenL) using a combination of existing resources: English SentiWordNet, Arabic WordNet, and the Standard Arabic Morphological Analyzer (SAMA). The authors evaluated their proposal in terms of subjectivity and sentiment analysis.
In [49] the authors presented a way to build an electronic Arabic lexicon by using a hash function that converts each input word to a corresponding unique integer, which is then used as a lexicon entry. In [50] a large-scale sentiment lexicon called MoAr-Lex was presented; it was built through a novel technique for automatically expanding an Arabic sentiment lexicon using word embeddings. The authors evaluated the quality of the automatically added terms in multiple ways. One of the advantages of this technique is its ability to incorporate terms that are commonly used in social media but would normally be considered misspelled, such as جميلل (beautiful) with the last Arabic letter wrongly repeated twice. In [51], the authors showed that the use of a sentiment lexicon (whether scored or not) improved the sentiment classification results, while the use of a scored lexicon consistently gave the best classification results. Their experiments also showed that the use of a scored lexicon can increase the sentiment classifier's ability to generalize across multiple datasets.
In [52] the authors presented AraVec, a collection of pre-trained Arabic word embedding models that can be used for Arabic NLP tasks (e.g., sentiment analysis, emotion analysis, etc.). AraVec is an open-source and free-to-use project. The first version of AraVec contains six word embedding models. These models are built on top of three Arabic content channels: Twitter, World Wide Web pages, and Arabic Wikipedia articles. The total number of tokens used to build the models is more than 3,300,000,000.
In [34] the authors presented a deep learning approach for multi-label emotion classification of Arabic tweets. The proposed model is a multilayer Bidirectional Long Short-Term Memory (BiLSTM) network trained on top of pre-trained word embedding vectors using the SemEval2018 Task1 dataset [53]. Several pre-processing steps are applied, e.g., normalization, stemming, and replacing the most common emojis with their meanings using a manually constructed emoji lexicon. Word embedding was found to be the best method for feature generation. The AraVec [52] pre-trained word embedding model with Continuous Bag of Words (CBoW) provides a 300-dimensional word vector for each word in the dataset [53]. The average embedded word vector is then calculated for each tweet, and the BiLSTM is used for classification. The proposed method achieved the best results compared with Support Vector Machines (SVM), Random Forests (RF), and a fully connected deep Neural Network (DNN). It achieved a 9% increase in the validation results compared to the previous best results, obtained by SVM.
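The per-tweet averaging of word vectors described for [34] can be sketched in plain Python. The function name, the 2-dimensional toy vectors, and the choice to skip out-of-vocabulary tokens and to return a zero vector for an empty tweet are assumptions for illustration; AraVec vectors would be 300-dimensional and loaded from the released models.

```python
def average_embedding(tokens, embeddings, dim):
    """Average the word vectors of the tokens that have an embedding.

    Out-of-vocabulary tokens are skipped; if none remain, a zero vector
    of length `dim` is returned (both choices are assumptions).
    """
    vectors = [embeddings[token] for token in tokens if token in embeddings]
    if not vectors:
        return [0.0] * dim
    # zip(*vectors) groups the i-th component of every vector together.
    return [sum(component) / len(vectors) for component in zip(*vectors)]


if __name__ == "__main__":
    # Hypothetical 2-dimensional embeddings for two tokens.
    toy_embeddings = {"a": [1.0, 2.0], "b": [3.0, 4.0]}
    print(average_embedding(["a", "b", "unknown"], toy_embeddings, dim=2))
```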
In [35] the authors provided a practical overview of developing an Arabic language model for emotion classification of Arabic tweets. In [36] the authors classified four emotions in Arabic tweets: joy, anger, sadness, and fear. The proposed model is based on a deep Convolutional Neural Network (CNN) and word vectors trained specifically on the dataset used. The proposed deep learning approach was evaluated on the Arabic tweets dataset provided by SemEval for the EI-oc task [53]. The model achieved a high training accuracy of 99.90% and a validation accuracy of 99.82%. The authors compared their results with three other ML approaches: SVM, Naïve Bayes (NB), and Multi-Layer Perceptron (MLP), implemented using three different Arabic stemmers (Light stemmer, ISRI, and Snowball) and two basic feature extractors (word count and TF-IDF).
In [38] the authors proposed a Bayesian inference method for emotion analysis in different semantic dimensions and inferred the co-occurrence of multiple emotion labels from the words in the document. The experiment was performed on the Chinese emotion corpus Ren-CECps [54]; the method showed high accuracy and robustness in word and document emotion predictions.
In [39] the authors used hashtags to label emotions. The method was evaluated through two subject studies: one with psychology experts and one with a general crowd. The labels generated by experts were consistent with the hashtag labels of Twitter messages in more than 87% of the cases. The authors developed Emotex, a supervised learning approach that classifies Twitter messages by the emotion classes they represent. Emotex correctly classifies the emotions expressed in more than 90% of the text messages.
In [40] the authors studied various ML-based methods for emotion detection. The methods include ANN and DL approaches. The ANN approaches were the Perceptron and the Multilayer Perceptron. The DL approaches were CNN-LSTM, CNN-BiLSTM, CNN-GRU, CNN-BiGRU, BiLSTM, and CNN. The authors used various feature representation approaches such as n-grams, TF-IDF, word embeddings, and contextualized embeddings. The authors evaluated the algorithms on the "International Survey on Emotion Antecedents and Reactions" (ISEAR) dataset [55]. The results showed that the model consisting of BERT with a dense layer outperformed all other methods, with a macro-average F1-measure of 0.71 for seven emotions, 0.76 for five emotions, and 0.8 for four emotions.
In [56] the author studied the Arabic songs and lyrics of the very famous singer Abd ElHalim Hafez. The work of the artist is highly varied, with a wide range of genres spanning romanticism, nationalism, spiritualism, etc. The author analyzed the common characteristics of the artist's work, including the composers and lyricists that the artist had worked with. In [57], the same author analyzed the most important words, idioms, and tokens performed in the songs using word clouds and term frequency-inverse document frequency (TF-IDF). The author showed a tight correlation between the statistical analysis and the political and social status in Egypt and the Arab region at that time. The author also studied the effectiveness of Part-of-Speech (PoS) tagging in genre analysis and classification.

Partiality analysis
Our final analysis is concerned with people's bias/partiality towards either of the two parties involved in the conflict. Partiality analysis mainly aims to determine how people favor either side. This analysis can convey significant trends in public opinion that may have an impact on decision makers. It may also be the most challenging analysis, as it attempts to determine the amount of empathy each party receives across the data.
For partiality analysis, we first filter out the tweets with 'Neutral' or 'Unspecified' sentiment, as these tweets are not expected to carry any bias towards the conflicting parties. The resulting dataset contained about 0.4M tweets, which are used to pretrain an LSTM-based neural network. The aim is to classify the tweets of the dataset as 'Pro-Russia', 'Pro-Ukraine', or 'Neither'.
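The filtering step can be sketched as a simple selection over sentiment-labelled tweets. The record layout (a dictionary with `text` and `sentiment` keys) and the function name are hypothetical; only the two excluded labels come from the text above.

```python
# Sentiment labels assumed not to carry bias, as stated in the text.
EXCLUDED_SENTIMENTS = {"Neutral", "Unspecified"}


def filter_for_partiality(tweets):
    """Keep only the tweets whose sentiment may carry partiality.

    Each tweet is assumed to be a dict with 'text' and 'sentiment' keys.
    """
    return [t for t in tweets if t["sentiment"] not in EXCLUDED_SENTIMENTS]


if __name__ == "__main__":
    # Hypothetical sentiment-labelled tweets.
    sample = [
        {"text": "tweet 1", "sentiment": "Negative"},
        {"text": "tweet 2", "sentiment": "Neutral"},
        {"text": "tweet 3", "sentiment": "Positive"},
    ]
    print(len(filter_for_partiality(sample)))
```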
In [58] the authors studied the effect of feature selection metrics on the performance of Decision Trees, Naïve Bayes classifiers, and Support Vector Machines. The evaluation was done through bias analysis of highly skewed data. The three types of bias considered are metric bias, class bias, and classifier bias. Experiments were performed to study how these biases can be employed together efficiently to achieve good classification performance. The authors reported the results and the best methods for text classification based on bias analysis. Over-sampling was found to be an effective way of handling class bias. In [59] the authors analyzed real instances of manual edits aimed at removing bias from Wikipedia pages. In [60] the author analyzed the partiality in Italian translations of three articles on Italian politics published in 2015 in the New York Times and the Financial Times. The work looks at the discursive re-localization of these three translations as they were distributed within Italy's politics and media.
In [61] the authors proposed a statistical model to identify biased users and social bots sharing biased Twitter content. The authors used an annotated Twitter dataset, compared the results of sentiment analysis with and without the biased tweets, and studied the effects of biased users at the micro and macro levels. The results showed that the proposed approach is effective in distinguishing biased users and bots from authentic users using sentiment analysis. In [62] the authors used Twitter data from the 2018 U.S. midterm elections. The authors proposed a method to detect voters on Twitter and compared their behavior with that of randomly sampled accounts. Some accounts flood the public data stream with political content, drowning out the majority of voters. Consequently, these hyperactive accounts were over-represented in the overall sample volume. The work gave insights into the characteristics of these biased voters, using Twitter data to analyze such political issues.

Methodology
In this section, we present our methodology in collecting tweets, data pre-processing, sentiment analysis, emotion analysis, and partiality analysis.
Generally, there are two main approaches to text analysis: (1) a data-oriented approach based on machine learning methods and (2) a more classical NLP approach based on lexicon analysis. Both paradigms go hand in hand; however, the current trend leans more towards machine learning, given the availability of more datasets and the huge successes of large foundational language models. However, Arabic is a low-resource language for which large, high-quality annotated datasets are still missing. In addition, the computational resources needed to build or fine-tune pre-trained models are still huge and beyond the capabilities of many research institutions in the Arab world. So, we had to be careful in our choice of analysis paradigm.
Sentiment analysis was the easiest; there is an abundance of work in the Arabic NLP literature that treats this problem, and there are already several well-established pre-trained Arabic models for sentiment analysis. So we resorted to using such models for supervised classification of the tweet's sentiment, with a voting-based criterion to reach a final decision regarding the target sentiment.
Partiality analysis was the hardest. There are no pre-trained models, no annotated datasets, and no lexicon built for that purpose. To the best of our knowledge, our work is the first methodical Arabic work to handle this problem in any context. Therefore, we consider it our main contribution, and we tried our best to take a data-oriented approach based on supervised classification. To do so, we took the following steps: (1) we filtered out a large portion of the tweets using the sentiment analysis part (removing neutral tweets), (2) we built an n-gram lexicon database, and (3) we manually annotated a small part of the tweets using several subjects and took their majority vote. All of these procedures were ultimately used to predict the bias of the major portion of the tweets. The manual annotation is basically used for verification; it was feasible because the number of classes is small: Pro-Russia, Pro-Ukraine, or Neither. In addition, the decision on the class was rather easy, as determining the bias of a given tweet is not that subjective. So, as indicated, sentiment and partiality analyses were tightly coupled in our work.
Emotion analysis was in the middle regarding its difficulty. There are still no pre-trained or fine-tuned models for emotion analysis in the Arabic language. However, there is a constructed emotion lexicon that has already been used in published work. So, we used that lexicon for our analysis. We could not follow the same procedures as in the partiality analysis, as the number of classes here is too large for manual annotation (6 compared with 3 in partiality). In addition, emotion is much more subjective than partiality, making the annotation process more daunting and more demanding of human resources.

Collecting tweets
We scraped tweets from the 23rd of February 2022 (the day before the start of the war) till the 31st of January 2023. Every scraped tweet contains the following raw information: Date-time, Tweet Id, and Text (the text of the tweet including the tweet hyperlink, any hashtags included in the tweet, and any account mentioned using the @ symbol in case of 'Replying to' another tweet). All tweets are in Arabic. Figure 1 presents the number of Arabic tweets in our dataset concerning the Russo-Ukrainian War. It is clear from the figure that the rate of tweets kept decaying until nearly converging at a consistent level starting from June 2022 (except for a relatively small spike in Sept-Oct 2022, the beginning of Fall 2022, when higher worldwide demand for oil and the inflation of its prices led to more comments among Twitter users). This can be attributed to the initial unexpected turn of events at the beginning of the conflict, followed by relative stability after the main motives, consequences, and outcomes became clearer. In addition, there are long periods in the conflict where the involved parties seem to be at a stalemate.
The total number of tweets collected is 3,167,210, covering nearly the first 11 months of the still-ongoing conflict. As shown in Fig. 1, the peak in the number of tweets was on the 24th of February 2022, the day Russia initiated the war, with over 156k tweets written in Arabic on that day. As stated earlier, another relatively smaller peak exists in Sept 2022 and Oct 2022, with the entrance of Fall 2022, the higher worldwide demand for oil, and the inflation of its prices.

Data preprocessing
The main aim of the data preprocessing step is to present the text of tweets in a consistent form and reduce any potential noise (e.g., special symbols of hashtags). The data preprocessing procedure, which uses regex for parsing and CaMeL-Tools to deal specifically with Arabic, can be summarized in the following steps:
• Removing usernames: any word starting with "@" is removed (e.g., @moe123).
• Removing links: any text starting with "www." or "http" is removed.
• Removing emojis: any emoji in the tweet has been removed using the emoji's unicode. (In the future, we will use these for further investigating emotions and bias.)
• Removing hashtags' octothorpe, underscores, and hyphens (-) from tweets.
• Removing non-Arabic characters and words from tweets.
• Normalizing different forms of letters in Arabic to one consistent form; e.g., all the different forms of the Arabic letter Alef (أ, إ, آ) were converted to the unified form (ا).
Table 2 presents some examples of the data pre-processing procedure on a few tweets in our dataset with their English translation.
In the boycott, it is a must to take the permission from the authority. Did Biden give them the permission to boycott the vodka? by Jamia Salem Al-Taweel
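The preprocessing steps listed above can be sketched with plain regular expressions. This is only a minimal illustration under stated assumptions: the exact patterns, the order of the steps, and the function name `preprocess` are ours, CaMeL-Tools is not used here, and emojis are dropped as part of the non-Arabic filter rather than by a dedicated emoji table.

```python
import re

# Assumed patterns: the steps are described in the text, the exact expressions are not.
USERNAME = re.compile(r"@\w+")                      # words starting with "@"
LINK = re.compile(r"(?:http|www\.)\S*")             # text starting with "http" or "www."
HASHTAG_MARKS = re.compile(r"[#_\-]")               # octothorpe, underscore, hyphen
ALEF_VARIANTS = re.compile("[\u0623\u0625\u0622]")  # the Alef forms with hamza or madda
NON_ARABIC = re.compile("[^\u0621-\u064A\\s]")      # anything but Arabic letters/whitespace
WHITESPACE = re.compile(r"\s+")


def preprocess(tweet: str) -> str:
    """Apply the cleaning steps in sequence and return the normalized text."""
    text = USERNAME.sub(" ", tweet)
    text = LINK.sub(" ", text)
    text = HASHTAG_MARKS.sub(" ", text)
    text = ALEF_VARIANTS.sub("\u0627", text)  # unify to bare Alef
    text = NON_ARABIC.sub(" ", text)          # also removes emojis, digits, Latin words
    return WHITESPACE.sub(" ", text).strip()
```

Usernames and links are removed before the hashtag marks so that underscores and hyphens inside them do not leave stray fragments behind. Note that the assumed Arabic range also drops diacritics, which may or may not match the original pipeline.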

Sentiment analysis
Identifying the sentiment behind text is used to measure the attitude and feeling behind each tweet on the individual level, as well as to analyze the aggregate statistical pattern of the trending sentiments. In addition, such analysis can facilitate the subsequent types of analyses. An example is identifying and consequently removing the 'Neutral'/'Mixed' tweets along with the 'Unspecified' ones (as these tweets are expected not to carry any emotions/biases towards the conflicting parties), then utilizing the remaining attitudinal tweets (i.e., 'Positive' and 'Negative') in the partiality analysis step (i.e., 'Pro-Russia', 'Pro-Ukraine', or 'Neither').
In our 'Sentiment Analysis', the tweets are classified into four mutually exclusive labels: 'Positive', 'Negative', 'Neutral'/'Mixed', or 'Unspecified'. A tweet is labeled as 'Unspecified' when the three sentiment models, AraBERT, Mazajak, and CaMeL-Tools, annotate it with three different sentiments. It is interesting to note that 53.34% of the tweets were given the same label by the three sentiment models (assuming that 'Neutral' and 'Mixed' are the same). In other words, the three models exactly agreed on their sentiment decisions in more than half of the tweets.
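The voting rule described above can be illustrated with a short sketch. The function name `vote` is hypothetical, and merging 'Mixed' into 'Neutral' before counting is our assumption, based on the statement that the two are treated as the same label; the calls to the three actual models are not shown.

```python
from collections import Counter


def vote(labels):
    """Majority vote over the labels of three sentiment models.

    'Mixed' is merged into 'Neutral' (assumption). If all three models
    disagree, the tweet is labeled 'Unspecified'.
    """
    merged = ["Neutral" if label == "Mixed" else label for label in labels]
    label, count = Counter(merged).most_common(1)[0]
    return label if count >= 2 else "Unspecified"
```

For example, `vote(["Negative", "Negative", "Positive"])` yields 'Negative', while three different labels yield 'Unspecified'.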

Emotion analysis
Although 'sentiment' and 'emotion' are distinct notions that require distinct analyses, there is no agreed-upon definition to distinguish between them [63]. Hence, our aim for emotion analysis is to determine the presence of words showing strong intolerance towards specific feeling(s). So, we take an operational stance towards such an analysis.
The main difference between the 'sentiment' and 'emotion' analysis approaches used in this research is that emotion analysis depends mainly on searching for previously specified n-grams within each tweet, while sentiment analysis uses pre-trained models to identify the sentiment. On one hand, these are two different technical paradigms (classical lexicon-based and data-oriented ML-based) for tackling two seemingly similar problems. On the other hand, it was not feasible to take a data-oriented approach to emotion analysis, simply due to the lack of quality annotated Arabic emotion datasets and the lack of pre-trained models for such tasks in the Arabic language. Hence, we adopted a machine learning approach only for 'sentiment' analysis, inspired by the abundance of related work, and performed the 'emotion' analysis task using a lexicon-based approach instead.
Each emotion has its own lexicon list [64] (see footnote 1). The lexicon is manually translated into Arabic from the WordNet-Affect emotion lexicon [65], which is a subset of the English WordNet. Each entry in this lexicon is labeled with one of six emotions: Joy, Anger, Sadness, Fear, Surprise, and Disgust. The emotion with the most associated Arabic words is 'Joy' with 1156 words, followed by 'Anger' with 748 words, as shown in Table 3.
Table 4 presents some examples of tokens for each emotion. This table includes the emotion itself (Anger, Disgust, Fear, Joy, Sadness, and Surprise), example tokens in Arabic, and example tokens in English.
We observed that the word 'war' (الحرب) is repeated in thousands of tweets without showing any specific emotion behind it. This is mainly caused by the fact that many tweets talking about the war, regardless of their opinion or how they feel, can be considered [...]. Moreover, any tweet that was labeled with an emotion must contain at least two n-grams (the maximum value of n is 3) from the specific emotion it was labeled with. If a tweet failed to meet this condition, either by having fewer than two n-grams per emotion or by having multiple emotions each with at least two n-grams, it was labeled as 'Null' or 'Mixed feelings', respectively. The 'Null' class represents not having enough lexical strength for expressing any target emotion in the given tweet, whereas the 'Mixed feelings' class represents having enough lexical strength for multiple feelings in the same tweet.
1 https://github.com/motazsaad/emotion-lexicon
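The labeling rule above (at least two lexicon n-grams for a single emotion, otherwise 'Null' or 'Mixed feelings') can be sketched as follows. This is an illustrative sketch only: the function names, the whitespace tokenization, the English toy lexicon in the usage note, and the choice to count every occurrence of an n-gram are all assumptions, not the published implementation.

```python
def ngrams(tokens, max_n=3):
    """Yield all n-grams of the token list for n = 1..max_n as strings."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def label_emotion(text, lexicon, min_hits=2):
    """Label a tweet from a {emotion: set_of_ngrams} lexicon.

    An emotion qualifies when at least `min_hits` of the tweet's n-grams
    appear in its list. No qualifying emotion gives 'Null'; more than one
    gives 'Mixed feelings'.
    """
    tokens = text.split()  # assumption: plain whitespace tokenization
    grams = list(ngrams(tokens))
    hits = {emotion: sum(1 for g in grams if g in entries)
            for emotion, entries in lexicon.items()}
    strong = [emotion for emotion, count in hits.items() if count >= min_hits]
    if not strong:
        return "Null"
    if len(strong) > 1:
        return "Mixed feelings"
    return strong[0]
```

With a hypothetical toy lexicon `{"Joy": {"happy", "victory"}, "Anger": {"angry", "fed up"}}`, the text "happy victory today" is labeled 'Joy', while "happy day" falls below the threshold and is labeled 'Null'.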

Partiality analysis
Partiality analysis aims to detect whether the author of a tweet has some bias towards one of the two parties of the conflict. The most challenging step in 'Partiality Analysis' is to determine the amount of empathy each party receives across the data and the need for validation. There is a lack of pre-trained models for 'Partiality Analysis', in addition to the lack of annotated data. Pre-trained models may detect 'Sentiment', 'Emotion', or 'Sarcasm', but they will not necessarily be able to detect such partiality/bias. Table 5 presents our methodology for bias analysis.
Step 1 in Table 5 aims at building our own weighted n-gram lexicon (see footnote 2). A total of 223 uni-grams, bi-grams, and tri-grams have been collected in this lexicon, with 118 considered 'Pro-Ukraine' (n-grams with positive weights) and 105 considered 'Pro-Russia' (n-grams with negative weights). The magnitude of the weight indicates the strength of the bias in either of the two directions. Table 6 presents some examples of particular n-grams and their weights.
2 The weighted n-grams lexicon used in partiality analysis is currently available upon request.
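One way such a signed lexicon could be turned into a label is sketched below. The text only states that positive weights mark 'Pro-Ukraine' n-grams and negative weights mark 'Pro-Russia' n-grams; summing the weights of matched n-grams, the `threshold` parameter, the function names, and the placeholder lexicon entries in the usage note are all assumptions made for illustration.

```python
def score_tweet(tokens, weights, max_n=3):
    """Sum the weights of all uni-, bi-, and tri-grams found in the lexicon."""
    total = 0.0
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            total += weights.get(" ".join(tokens[i:i + n]), 0.0)
    return total


def label_partiality(tokens, weights, threshold=1.0):
    """Map the summed score to a class; the threshold is a hypothetical parameter."""
    score = score_tweet(tokens, weights)
    if score >= threshold:
        return "Pro-Ukraine"   # positive weights lean Pro-Ukraine
    if score <= -threshold:
        return "Pro-Russia"    # negative weights lean Pro-Russia
    return "Neither"           # weak or cancelling evidence
```

With placeholder weights such as `{"u1": 1.5, "r1": -1.5}`, a tweet containing both n-grams scores zero and falls into 'Neither', which mirrors the case of unsubstantial support to either party.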

Table 5 Partiality analysis procedure
Step 1. Building a weighted lexicon of n-grams directly related to the conflict.
Step 2. Filtering out the tweets with 'Neutral' and 'Unspecified' sentiment (their count is 1.379M tweets till Jan 31st 2023), as these tweets are expected not to carry any bias towards the conflicting parties. This first mandates collecting the Arabic tweets that are potentially linked to the conflict and classifying them according to their sentiment.
Step 3. Lemmatizing the remaining 1.789M tweets, then extracting those tweets containing the n-grams, resulting in 449K tweets that will be used to train the LSTM neural network model in Step 5.
Step 4. Annotating each tweet of the 449K according to the pre-assigned weights as being 'Pro-Russia' (157K tweets), 'Pro-Ukraine' (140K tweets), or 'Neither' (152K tweets) ('Neither' means that the tweet could be irrelevant to the topic, shows strong opposition to both parties, or contains relatively unsubstantial support to either party).
Step 5. Using the formerly identified tweets as a training dataset for an LSTM neural network model. This machine learning model will then annotate a testing dataset of about 1.471M tweets as being 'Pro-Russia', 'Pro-Ukraine', or 'Neither'.

[...] 'Pro-Russia' or 'Pro-Ukraine', then a majority vote was taken for each tweet based on the labeling of the four individuals. It turned out that 81.25% of the tweet labels were consistent with the labels assigned by the n-gram lexicon-based approach.
The disagreements between the manual annotation and the lexicon-based annotation are most noticeable in tweets mentioning countries or personnel involved in the conflict, which can appear in one context criticizing Russia and in another context criticizing Ukraine. As an example, countries like Syria [...]
[...] each row represents a word, and each word is represented by the corresponding word embedding.
The embedding matrix mentioned in the architecture is critical to ensure that the Arabic tweets are properly encoded when fed to the LSTM layer. It is based on the skip-gram model trained on 100M tweets, as used in Mazajak [11]. The bidirectional LSTM layer allows the model to capture the long-term dependencies and context of the text. We have not used CNN layers for feature extraction, as the embedding matrices already contain enough feature content; instead, we connected the LSTM layers directly to a dense layer with a ReLU activation function. Our model showed 95.07% test accuracy and 95.21% validation accuracy after running it with the hyperparameters shown in Table 9.
Besides training an LSTM model from scratch, we fine-tuned an AraBERT model on the same task of labeling a tweet as 'Pro-Russia', 'Pro-Ukraine', or 'Neither' on the same dataset. The hyperparameters of fine-tuning are presented in Table 10. The differences between the hyperparameters used in training the LSTM model and those used in fine-tuning the AraBERT model were the smaller number of epochs and the lower learning rate. This mainly follows the recommendation of the BERT authors: according to [66], in order to achieve good performance across all tasks, the number of epochs can be 2, 3, or 4, and the learning rate values for the Adam optimizer can be 5e-5, 3e-5, or 2e-5. The fine-tuned AraBERT model achieved a test accuracy of 94.69% and a validation accuracy of 94.64%. In order to compare the results of both the LSTM model and the fine-tuned AraBERT model, 50,000 unlabeled tweets were chosen at random for labeling. Both models agreed on the annotation of 45,113 tweets, which means that both models gave the same label for over 90% of the chosen sample.
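The agreement figure above is a simple ratio, which the following sketch makes explicit. The function name `agreement_rate` is hypothetical; only the two counts (45,113 agreeing tweets out of 50,000) come from the text.

```python
def agreement_rate(labels_a, labels_b):
    """Fraction of items on which two label sequences agree."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have the same length")
    if not labels_a:
        raise ValueError("label sequences must not be empty")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


# Reported counts: 45,113 agreements in a random sample of 50,000 tweets.
reported_agreement = 45113 / 50000  # 0.90226, i.e. just over 90%
```

Raw agreement does not correct for agreement expected by chance; a chance-corrected measure could be considered as a complement, although the text does not report one.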

Experimental work
Sentiment analysis
We have collected 3,167,208 tweets starting from Feb 23rd, 2022 till Jan 31st, 2023. The results of the sentiment analysis are shown in Table 11.
It is obvious that the 'Negative' and 'Neutral' labels dominate, comprising nearly 95% of the tweets. Most of the tweets (53%) showed a 'Negative' attitude towards the war. This can be attributed to two things: (1) war by itself is a negative human experience involving casualties, especially among civilians, and the destruction of civil facilities, so civilians suffer the most from wars; and (2) pragmatic reasons, due to the direct negative effects of the war on the economy and lifestyle of people in most of the Arabic-speaking region. In addition, a large percentage (about 40%) have a 'Neutral' attitude towards the conflict. These people could be oblivious to the conflict, as it is happening at a geographical distance and the relevant parties have no strong ties to the region. Some people could also be somewhat unaware of the dire consequences of this conflict for the region, namely, at least, high inflation rates, the crisis in food supplies, and the soaring energy prices. A small fraction of the tweets (3.45%) show a positive attitude towards this conflict. They could originate from people in the Gulf area, which benefited economically from the ongoing war. However, this attitude is worth studying thoroughly.
Table 12 presents examples of sentiment tweets of each kind (Positive, Negative, Neutral). Figures 3 and 4 illustrate the evolution of each individual sentiment over time since the beginning of the war and its normalized version, respectively. It is important to note that the 'Unspecified' labeled tweets were removed before plotting.
Table 13 presents some examples of the change in each sentiment over time, that is, the difference between each two consecutive points. Figure 5 illustrates the evolution of this difference over time. It is clear from the table and the figure that at the beginning of the war the 'Negative' sentiment increased intensively, then decreased dramatically in the first few days. This may be attributed to the initial shock at the beginning of the war, which lightened after a few days, after which the sentiment started oscillating between slight increases and decreases. Notice that as the hype of the events decreased over time, tweets relevant to the subject faded from the trending subjects (except for relatively small spikes in Sept 2022 and Oct 2022, i.e., the entrance of Fall 2022, the higher worldwide demand for oil, and the inflation of its prices, manifested in the sharp increase in the 'Negative' sentiment). The shrinkage can be seen to follow a power-law pattern ($\propto 1/k^{d}$ for some positive $d$, where $k$ represents the time, for example, in days) since the beginning of the conflict. The total number of relevant tweets as well as the corresponding sentiments stabilize nearly two months after the beginning of the conflict.
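A power-law decay of the form $C/k^{d}$ becomes a straight line in log-log space, so the exponent $d$ can be estimated with an ordinary least-squares fit. The sketch below illustrates that idea only; the function name `fit_power_law` and the fitting procedure are assumptions, since the text does not state how the power-law pattern was assessed, and the data in the usage note are synthetic.

```python
import math


def fit_power_law(counts):
    """Fit counts[k-1] ~ C / k**d by least squares on log-log values.

    `counts` holds one positive value per time step k = 1, 2, ...
    Returns the pair (C, d).
    """
    xs = [math.log(k) for k in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # slope of the log-log line equals -d
    intercept = mean_y - slope * mean_x    # intercept equals log(C)
    return math.exp(intercept), -slope
```

On synthetic counts generated as `1000 / k**0.5`, the fit recovers `C` close to 1000 and `d` close to 0.5. Real daily counts are noisy and include the Fall 2022 spike, so such a fit would only be a rough description.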
The most striking thing we notice in Fig. 3 is how big the difference between the 'Negative' and 'Neutral' tweets was at the beginning of the war and how it decreased till Jan 2023 (again except for the beginning of Fall 2022, for the reasons stated earlier). In proportion to the total number of tweets, the difference between the 'Negative' and 'Neutral' tweets was about 25% in February and March 2022, then decreased to below 10% in Jan 2023; specifically, the percentage of 'Negative' decreased from 62% to 44%, while the percentage of 'Neutral' increased from 36% to 53%.
Figure 3 Sentiment of Arabic tweets on the Russo-Ukrainian war

Emotion analysis
In emotion analysis, a 'Null' label indicates that the tweet contains fewer than two n-grams from the emotion lexicon. A 'Mixed' label indicates that the tweet contains enough n-grams from at least two different emotions (e.g., 'Anger' and 'Sad'). 89.1% of the tweets did not contain any n-grams from the emotion lexicon; consequently, these tweets were annotated as 'Null'. This may be attributed to writing in different dialects or to the rather limited size of the currently adopted lexicon. As shown in Table 3, the total number of tokens in Arabic is 3207, which is considered for future extension. The results of emotion analysis are shown in Table 14, which can be extrapolated to uncharted tweets due to the limited lexicon size. It is apparent that there is no strong feeling, as the conflict may seem a bit far from next door. Also, there are many tweets with mixed feelings, indicating the perplexity towards that conflict and the confusion it causes amongst the public, or reflecting naturally co-occurring emotions such as fear and anger. Generally, we can say that nearly 10.93% of the tweets were considered to contain strong expressions of emotion (i.e., Anger, Joy, Fear, Sad, Surprise, Disgust, or Mixed Feelings).
Figure 4 Normalized Sentiments of Arabic Tweets on the Russo-Ukrainian War
Table 15 dives deeper into the 'Mixed Feelings' category, showing the frequency of each emotion in the tweets labeled as 'Mixed Feelings'. Each row gives the number of tweets in the mixed feelings category that carry the emotion listed in the first column. For example, among the 234,384 'Mixed feelings' tweets, 174,548 carry an 'Anger' feeling. It is clear from the table that 'Anger', 'Sad', and 'Fear' are the most frequent emotions in the 'Mixed feelings' category. These emotions are no doubt the most relevant emotions in any violent conflict such as war.
Table 16 presents the results of different combinations of all emotions. In this table, the combination ('Anger', 'Sad') is the most frequently occurring one, followed by ('Anger', 'Fear'). These co-occurring emotions are natural as, for example, anger is usually accompanied by sadness and/or fear; similarly for the other combinations. This supports the efficacy of our tweet emotion annotation scheme and the effectiveness of the developed classification models.
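The two kinds of counts discussed above (per-emotion frequencies within 'Mixed feelings' tweets, and frequencies of emotion combinations) can be sketched as follows. The function names and the input representation, one list of qualifying emotions per tweet, are assumptions for illustration; the example data in the usage note are synthetic, not taken from the tables.

```python
from collections import Counter


def emotion_frequencies(mixed_tweets):
    """Count how many 'Mixed feelings' tweets carry each emotion."""
    return Counter(emotion
                   for emotions in mixed_tweets
                   for emotion in set(emotions))


def combination_frequencies(mixed_tweets):
    """Count how often each exact combination of emotions occurs."""
    return Counter(tuple(sorted(set(emotions))) for emotions in mixed_tweets)
```

For instance, with the synthetic input `[["Anger", "Sad"], ["Sad", "Anger"], ["Anger", "Fear"]]`, the combination `("Anger", "Sad")` is counted twice regardless of the order in which the emotions were detected.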
For a deeper dive into a sample emotion (like 'Disgust') and its presence with all other emotions: interestingly, 'Anger' then 'Fear' are the emotions most often present with 'Disgust', which can be attributed to a person feeling anger or fear while expressing a disgusting [...]
Figures 6 and 7 illustrate the evolution of each individual emotion over time since the beginning of the war and its normalized version, respectively. Similar to the tweets' temporal trend shown in Fig. 1, the amount of emotions kept decaying until nearly converging at a consistent level starting from June 2022. Again, this can be attributed to the initial unexpected surprise at the beginning of the conflict, followed by relative stability after the main motives, consequences, and outcomes became clearer. However, starting from late September there is another upsurge in the emotions. This may be attributed to the new tide of the war in the opposite direction, with the counter-attack of Ukraine seizing territories from Russian troops, which suffered several defeats and started receding. Within this decaying pattern, the majority of the emotions is still either 'Joy' or 'Anger'. This is mainly attributed to the highest numbers of Arabic tokens in the adopted emotion lexicon being associated with 'Joy' (1156 tokens) followed by 'Anger' (748 tokens), as presented previously in Table 3. So, there is some sort of bias, which means the emotion lexicon needs to be enhanced; this is left for our future work. In addition, the 'Joy' emotion was found in tweets related to the war that include hashtags relevant to other emotional subjects in the Arab region, e.g., (Union Victory). This can be attributed to attempts to increase tweet visibility by mentioning more than one hashtag (especially trending ones) in the tweet (even if not relevant to the Russo-Ukrainian war).
Figure 6 Emotions of Arabic Tweets on the Russo-Ukrainian War
It is apparent that both parties, namely Russia and Ukraine, are almost equal regarding their support (among opinionated people) in the Arab region, with a slight shift towards Russia. It seems a bit surprising that the amount of empathy each party gained is almost equal (or even leaning more towards Russia), even though Russia is the aggressor. To try to understand that, we carried out a finer investigation of the temporal evolution of the partiality/empathy over the course of the war. Figures 8 and 9 illustrate the evolution of each individual partiality over time since the beginning of the war and its normalized version, respectively. As shown in these figures, the amount of tweets classified as 'Pro-Ukraine' was slightly greater than 'Pro-Russia' at the beginning of the war (specifically from Feb 2022 till April 2022), then slowly began to decrease until they nearly converged at the start of June 2022, with a shift in empathy towards Russia in August 2022. Our interpretation is that, with the initial fierce and surprise Russian attack and the number of refugees who escaped to neighbouring countries, Ukraine gained much empathy. However, by April 2022, Russian intensity had decreased, and with the heavy sanctions that the U.S. and the West applied on Russia, Russia began to gain such empathy, with a decrease on the Ukrainian side.
There is an upsurge in 'Pro-Russia' in August 2022, as Putin, the President of Russia, signed a decree on Thursday the 25th of August 2022 to increase the size of Russia's armed forces from 1.9M to 2.04M [67]. There is more 'Pro-Ukraine' partiality towards the end of the year (about one year from the start of the war), as Russia has lost a reported 200,000 subjects, including many high-ranking military officials, and Putin was confounded by the successes of the Ukrainian army/citizens [68].
For results validation, a representative dataset of 1000 tweets was selected. Out of these, 780 tweets belong to the testing dataset annotated by the LSTM model, while 220 tweets belong to the training dataset annotated by the lexicon-based approach. The ratio between the two subsets complies with the sizes of the LSTM testing dataset (1.471M) and the lexicon-based training dataset (449K).
Each of the 1000 tweets was manually annotated by each of four individuals based on the following criteria:
• Pro-Russia: the tweet either defends "Russia, the Russian army, the Russian president, Russian affiliated organizations/personnel, allies" or strongly criticizes "Ukraine, Ukrainian affiliated organizations/personnel, allies".
• Pro-Ukraine: the tweet either defends "Ukraine, the Ukrainian army, the Ukrainian president, Ukrainian affiliated organizations/personnel, allies" or strongly criticizes "Russia, Russian affiliated organizations/personnel, allies".
• Neither: the tweet does not side with either of the two parties, criticizes both, or is not related to the conflict.
Majority voting over the four individuals' decisions was used to select the final label. The manual annotation matched the sample labels given by either the LSTM model or the lexicon-based approach in 70.1% of the cases. The LSTM model was then retrained with the training dataset size increased to about 80% of the whole dataset, and the accuracy increased to 77.3%, strongly indicating that increasing the dataset size allowed the model to learn wider patterns, leading to improved performance.
[...] of confusion, especially with the contradicting reports from both sides about the conflict and the uncertainties regarding the unfolding events. The most common emotion is 'anger'.
We wanted to contribute to the intuitive political understanding of public opinion towards the two conflicting parties, which can be roughly phrased as a confrontation between the East and the West. We did a preliminary investigation using partiality analysis: the bias and/or the empathy of the people towards either of the conflicting parties. We found that the collected tweets were not strongly oriented towards a specific side, with a slight lean towards Russia, especially towards the latter phases of the war.
For future work, we aim to collect more data in order to cover a larger temporal span of the conflict, and hopefully to cover the whole conflict, wishing it to end very soon. We also would like to dive deeper and do a more thorough analysis of emotions and partiality. We want to tackle these two analyses using a machine learning approach with smarter methods for data collection and annotation. We also plan to incorporate media and political theories to gain a deeper and more thorough understanding of the historical context and the evolution of the conflict over time. We would like to analyze other sources of textual media such as Facebook posts and blog articles, as well as other media types such as audio and video.

Figure 1 Distribution of Arabic tweets on the Russo-Ukrainian war

Figure 2 Partiality analysis LSTM model architecture

Figure 5 Difference in sentiment of Arabic tweets on the Russo-Ukrainian war

Figure 7 Normalized Emotions of Arabic Tweets on the Russo-Ukrainian War

Figure 8 Evolution of partiality in Arabic Tweets on the Russo-Ukrainian War

Figure 9 Normalized Partiality of Arabic Tweets on the Russo-Ukrainian War

Table 1 Search queries in our tweets dataset scraping procedure
[...] studied the lexical density and diversity of the same singer Abd ElHalim Hafez ([...]

Table 2 Data pre-processing procedure on a few tweets in our dataset

Table 3 Number of Tokens in Arabic for each emotion sorted in descending order

Table 4 Example of tokens associated with each emotion

Table 6 Examples of particular n-grams and their weights

Table 10 Hyperparameters used in fine-tuning AraBERT for the 'Partiality Analysis' task

Table 14 Emotion analysis obtained according to the presence of different n-grams in each tweet
[...] 'Disgust'. The table includes an example in Arabic, the translation in English, and the emotion annotation. As stated earlier in Sect. 3.4, each emotion has its own tokens list; for some examples, the reader can refer to Table 4.

Table 15 'Mixed feelings' in emotion analysis obtained according to the presence of different n-grams in each tweet. The size of 'Mixed feelings' is 234,384

Table 16 Different combinations of all emotions obtained according to the presence of different n-grams in each tweet where the total size of 'Mixed feelings' is 234,384

Table 18 Partiality analysis obtained according to Step 4 and Step 5 mentioned in Table 5