Arab reactions towards Russo-Ukrainian war

EPJ Data Science

Table 5 Partiality analysis procedure

Step No.	Step
1.	Building a weighted lexicon of n-grams directly related to the conflict.
2.	Filtering out tweets with ‘Neutral’ or ‘Unspecified’ sentiment (1.379M tweets as of January 31, 2023), since these tweets are not expected to carry any bias towards either party to the conflict. This step presupposes that the Arabic tweets potentially linked to the conflict have already been collected and classified by sentiment.
3.	Lemmatizing the remaining 1.789M tweets, then extracting the tweets that contain n-grams from the lexicon built in Step 1. This yields 449K tweets, which are used to train the LSTM neural network model in Step 5.
4.	Annotating each of the 449K tweets according to the pre-assigned weights as ‘Pro-Russia’ (157K tweets), ‘Pro-Ukraine’ (140K tweets), or ‘Neither’ (152K tweets). ‘Neither’ means that the tweet may be irrelevant to the topic, may show strong opposition to both parties, or may contain only relatively insubstantial support for either party.
5.	Using the tweets annotated in Step 4 as the training dataset for an LSTM neural network model. The trained model then annotates a testing dataset of about 1.471M tweets as ‘Pro-Russia’, ‘Pro-Ukraine’, or ‘Neither’.
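
The lexicon-based filtering and annotation in Steps 3 and 4 can be illustrated with a minimal sketch. The table does not specify how the weights are combined, so everything below is an assumption made for illustration: the English placeholder n-grams, the signed-weight convention (positive leans ‘Pro-Russia’, negative leans ‘Pro-Ukraine’), the summation rule, the `THRESHOLD` value, and the function names. The real lexicon consists of Arabic n-grams and is applied to lemmatized tweets.

```python
from typing import Dict, List, Tuple

# Hypothetical lexicon: n-gram (tuple of lemmas) -> signed weight.
# Assumed convention: positive leans 'Pro-Russia', negative leans 'Pro-Ukraine'.
LEXICON: Dict[Tuple[str, ...], float] = {
    ("support", "russia"): 2.0,
    ("nato", "aggression"): 1.0,
    ("support", "ukraine"): -2.0,
    ("russian", "invasion"): -1.0,
}

# Assumed minimum absolute score for a partisan label; weaker support
# falls into 'Neither'.
THRESHOLD = 1.5


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous n-grams of the token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def lexicon_hits(tokens: List[str]) -> List[Tuple[str, ...]]:
    """Return every n-gram in the tweet that appears in the lexicon."""
    sizes = {len(key) for key in LEXICON}
    return [g for n in sizes for g in ngrams(tokens, n) if g in LEXICON]


def contains_lexicon_ngram(tokens: List[str]) -> bool:
    """Step 3 filter: keep only tweets with at least one lexicon n-gram."""
    return bool(lexicon_hits(tokens))


def annotate(tokens: List[str]) -> str:
    """Step 4 labelling: sum the weights of matched n-grams and threshold."""
    score = sum(LEXICON[g] for g in lexicon_hits(tokens))
    if score >= THRESHOLD:
        return "Pro-Russia"
    if score <= -THRESHOLD:
        return "Pro-Ukraine"
    return "Neither"


# Tweets are assumed to be lemmatized and tokenized already.
tweets = [
    "we support russia".split(),
    "stop the russian invasion".split(),
    "support ukraine against russian invasion".split(),
    "nice weather today".split(),
]
kept = [t for t in tweets if contains_lexicon_ngram(t)]
labels = [annotate(t) for t in kept]
```

In this sketch the fourth tweet is dropped by the Step 3 filter, and the second is labelled ‘Neither’ because its single weak match stays below the assumed threshold, which mirrors the "relatively insubstantial support" case described in Step 4.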