Table 1 Guidelines and examples for the annotations

From: Analysis and classification of privacy-sensitive content in social media posts

Category

Guidelines

Examples

Sensitive

A post is “sensitive” if the text is understandable, i.e., written in clear English, and the annotator is certain that it contains information that violates a person’s privacy, not necessarily that of the post’s author. A text violates a person’s privacy if it contains any of the following types of information (non-exhaustive list):

• current or upcoming moves;

• information on events in the private sphere;

• information on health or mental status;

• information about one’s habits;

• information that can help geolocate the author of the post or other people mentioned;

• information on relationship status;

• considerations that may hint at the political orientation or religious belief of a mentioned person.

In general, given the subjectivity of the topic, a post can be considered sensitive if the person reading it feels discomfort because of the private content it contains (and not because of other moral considerations).

“...heading to the gym with *PROPNAME*, *PROPNAME* and my sista!!”

“is feeling uninspired and unmotivated. Can someone else please pay her bills and move her into her new apartment?”

“is very sore and very tired...”

“Just wanted to thank everyone for all the support (and great tips) yesterday, it meant a lot! made it through yesterday without smoking at all...and still going strong! :)”

“Lazy day around the house after the family has left.”

“ARGH. 2 whole years! Congratulations, *PROPNAME*! You’ve tolerated me for a total of 730 days! Plus ‘getting to know you’ time... hahaha!”

“is shaking his head wondering when some of his conservative christian friends became so hate filled that they will join any anti-obama group on facebook.”

Non-sensitive

A post is “non-sensitive” if the text is understandable, i.e., written in clear English, and the annotator is sure that it does not contain information that violates privacy, according to the indications of the “sensitive” category.

“Fabulous weekend :-)”

“When we are no longer able to change a situation – we are challenged to change ourselves. Viktor E. Frankl”

“loves summer evenings”

Unknown

A post is of “unknown sensitivity” if the text is understandable, i.e., written in clear English, but the annotator is unable to tell whether it contains privacy-sensitive information, for reasons such as (non-exhaustive list):

• the context is not sufficient to understand the sensitivity of the message;

• the post is incomplete, i.e., the text does not contain the whole post, and from the available portion one is unable to understand its sensitivity;

• the post refers to media (an image, a link, a GIF) that is essential for understanding the message, and the text alone is not sufficient to determine its sensitivity.

“black”

“Goodbye *PROPNAME*. :(”

“I know 6 sick people at the moment, and now I’m...”

“Check out what I’ve got written for The Book of *PROPNAME*. [link]”

Unintelligible

A post can be marked as “unintelligible” when:

• it is written in slang, abbreviations, or grammar that makes it impossible to understand at the lexical level;

• the post is written in a language other than English.

“hooked on PBS”

“fml”

“wahhhh,. di na ko. hurot na jud ako kwarta aning AI. huhuhu”

“Pas de mauvaise nouvelle pour l’instant! Je presume donc que c’est une bonne chose!”
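The four categories above can be read as a small decision procedure: first check whether the text is understandable, then record how certain the annotator is about a privacy violation. The following is a minimal sketch of that reading, not code from the article. The names `Label` and `assign_label` and the encoding of the annotator's judgment as `True`, `False`, or `None` are assumptions made for illustration. The judgments themselves remain manual.

```python
from enum import Enum
from typing import Optional


class Label(Enum):
    """The four annotation categories of Table 1 (identifier names are hypothetical)."""
    SENSITIVE = "sensitive"
    NON_SENSITIVE = "non-sensitive"
    UNKNOWN = "unknown"
    UNINTELLIGIBLE = "unintelligible"


def assign_label(understandable: bool, violates_privacy: Optional[bool]) -> Label:
    """Map an annotator's two judgments to a category.

    understandable   -- True if the text is written in clear English.
    violates_privacy -- True if the annotator is certain the post violates
                        someone's privacy, False if certain it does not,
                        None if the annotator cannot tell (assumed encoding).
    """
    # Slang-heavy or non-English posts are set aside before any privacy judgment.
    if not understandable:
        return Label.UNINTELLIGIBLE
    # Missing context, a truncated post, or essential media leave the question open.
    if violates_privacy is None:
        return Label.UNKNOWN
    return Label.SENSITIVE if violates_privacy else Label.NON_SENSITIVE


# Example: a clear post whose sensitivity cannot be judged from the text alone.
print(assign_label(understandable=True, violates_privacy=None).value)
```

Checking intelligibility first mirrors the guidelines, in which the other three categories all require the text to be understandable.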