1 of 28

ICWSM 2022 TRENDS

A Summary of Proceedings to Observe Research Trends at the ICWSM 2022 Conference

2 of 28

Agenda

  • Introduction
  • 20 ICWSM Papers Summary
  • Research Map
  • Discussion

1/19/23

3 of 28

Introduction

This presentation analyzes research trends in the proceedings of the International AAAI Conference on Web and Social Media (ICWSM) 2022. The objective is to identify the domains addressed and the Artificial Intelligence techniques used to solve various social problems.

4 of 28

20 ICWSM PAPERS

A Summary on Trends

5 of 28

P1: Correcting Sociodemographic Selection Biases for Population Prediction from Social Media

  • Social media population sampling leads to selection bias and misrepresentation
  • Current solutions reweight the sample population to correct for under- or over-sampled demographic groups
  • These solutions degrade prediction accuracy because they rely on sparse or shrunken estimates of the population’s socio-demographics
  • ROBUST POSTSTRATIFICATION consisting of:
    • Estimator redistribution to handle shrinkage
    • Adaptive binning
    • Informed smoothing to handle sparse socio-demographic estimates
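
As a rough illustration of the reweighting idea that poststratification builds on, the sketch below weights each group by its population share divided by its sample share and then takes a weighted mean. This is plain poststratification, not the paper's robust variant; the group names, shares, and scores are made-up values, and the additive `smoothing` parameter is only an assumed stand-in for handling sparse bins.

```python
from collections import Counter


def poststrat_weights(sample_groups, population_share, smoothing=0.0):
    """Return weight per group = population share / (smoothed) sample share."""
    counts = Counter(sample_groups)
    n = len(sample_groups)
    k = len(population_share)
    weights = {}
    for group, pop_share in population_share.items():
        # Additive smoothing keeps the sample share away from zero for sparse groups.
        sample_share = (counts.get(group, 0) + smoothing) / (n + smoothing * k)
        weights[group] = pop_share / sample_share if sample_share > 0 else 0.0
    return weights


def weighted_mean(values, groups, weights):
    """Weighted average of values, using each value's group weight."""
    total_weight = sum(weights[g] for g in groups)
    return sum(v * weights[g] for v, g in zip(values, groups)) / total_weight


# Hypothetical sample: "young" users are over-represented relative to the population.
groups = ["young", "young", "young", "old"]
scores = [6.0, 6.0, 6.0, 8.0]
population_share = {"young": 0.5, "old": 0.5}

weights = poststrat_weights(groups, population_share)
estimate = weighted_mean(scores, groups, weights)
print(estimate)
```

The unweighted mean of this toy sample is 6.5; reweighting pulls the estimate toward the under-sampled group.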

PROBLEM/MOTIVE

APPROACH/METHOD

  • Significant prediction accuracy improvement over current techniques
  • 53% increase in explained variance in a life satisfaction study on Twitter

SOLUTION/FINDINGS

CATEGORY: STATISTICS / SOCIAL MEDIA

REFERENCE: Giorgi, S., Lynn, V. E., Gupta, K., Ahmed, F., Matz, S., Ungar, L. H., & Schwartz, H. A. (2022). Correcting Sociodemographic Selection Biases for Population Prediction from Social Media. Proceedings of the International AAAI Conference on Web and Social Media, 16(1), 228-240. https://doi.org/10.1609/icwsm.v16i1.19287

6 of 28

P2: SAFER: Social Capital-Based Friend Recommendation to Defend against Phishing Attacks

  • Online Social Networks are susceptible to phishing attacks due to recent advanced cyber-attack techniques
  • Current Friend Recommendation Systems (FRSs) do not adequately reduce the probability of phishing attacks
  • Users’ friending decisions are prone to phishing attacks
  • SAFER: Social cApital-based FriEnd Recommendation
  • Measured social capital by 3 dimensions: structural, cognitive, relational capital based on 2 Twitter Datasets
  • Developed 4 FRS based on 4 social capitals (relational, cognitive, structural, multidimensional), compared performance with 3 non-social capitals (social attributes-based, topic-based, trust-based) for friending decisions

PROBLEM/MOTIVE

APPROACH/METHOD

  • Key Findings:
    • Users having friends with high social capital (SC) can self-defend against phishing attacks better than users having friends with the same topic interests or social attributes.
    • SC-based FRSs can enable users to combat phishing attacks better than non-SC-based counterparts because the friends of the users with high social capital can help them defend against the phishing attacks.
    • Bot-based phishing attacks can be more easily detected and defended than human-based phishing attacks under all FRSs because bot attackers show more distinctive characteristics than human attackers in social capital.
    • Although SC-based FRSs can allow more attackers to infect (engage) users in the attacks due to users with high social capital attracting more attacks, even the infected users can be easily recovered with the help of their friends in their social networks.
    • All SC-based FRSs perform comparably in detecting phishing attacks, with cognitive SC performing slightly better than the others. This suggests that if a weighted linear model is used, cognitive SC can be assigned a higher weight than relational SC and structural SC.
  • Future Work: using SC for FRS

SOLUTION/FINDINGS

CATEGORY: STATISTICS / SOCIAL MEDIA

REFERENCE: Guo, Z., Cho, J.-H., Chen, I.-R., Sengupta, S., Hong, M., & Mitra, T. (2022). SAFER: Social Capital-Based Friend Recommendation to Defend against Phishing Attacks. Proceedings of the International AAAI Conference on Web and Social Media, 16(1), 241-252. https://doi.org/10.1609/icwsm.v16i1.19288

7 of 28

P3: Effect of Popularity Shocks on User Behaviour

  • MOTIVE: To study the changes in user activity in terms of frequency of posting and content posted around popularity shocks
  • Previous studies suggested that reputation is the key driving factor of repeated online activity
  • This paper aims to fill the gap by extending the research studying popularity shock as an essential aspect of the reputation theory
  • Little work has been done on after-effects of posts going viral
  • Used a generic social media platform to test the following research questions:
  • RQ1: [Engagement Response to Popularity] Do users increase their posting behavior after receiving a popularity shock?
  • RQ2: [Content Response to Popularity] Do users alter their content after receiving a popularity shock?
  • RQ3: [Longevity of Effect] How long do the effects of a popularity shock last?
  • RQ4: [Sustained Shock Effect] What type of activity characterizes long-term sustainability of the effects of a popularity shock?

PROBLEM/MOTIVE

APPROACH/METHOD

  • FINDINGS:
  • RQ1: (Increased Posting) Users increase their posting after the shock, but the increase fades over time
  • RQ2: After the shock, users generate content more similar to the shock-inducing posts.
  • RQ3: Popularity shocks are short-lived. The increased response received by users returns to the pre-shock level very quickly after the shock.
  • RQ4: Maintaining a high posting frequency helps retain the long-term effect. Users who deviate from the content that produced the shock see the effect survive for a shorter time; at the same time, high similarity between consecutive posts can lead to repetitiveness, which also shortens survival. High audience engagement helps maintain the effects long-term.

SOLUTION/FINDINGS

CATEGORY: MACHINE LEARNING / SOCIAL MEDIA

REFERENCE: Gurjar, O., Bansal, T., Jangra, H., Lamba, H., & Kumaraguru, P. (2022). Effect of Popularity Shocks on User Behaviour. Proceedings of the International AAAI Conference on Web and Social Media, 16(1), 253-263. https://doi.org/10.1609/icwsm.v16i1.19289

8 of 28

P4: Are Proactive Interventions for Reddit Communities Feasible?

  • Reddit has been found to propagate problematic socio-political discourse
  • Reddit admins find it difficult to contain/prevent these discourses due to:
    • Too few admins to track and react to millions of posts daily
    • Fear of the negative consequences of banning or quarantining hateful communities on the platform
  • This paper investigates proactive means as a solution
  • Using most active subreddits with admin interventions as dataset to:
    • Measure community stability through community vocabulary, in order to propose computational techniques for monitoring subreddits
    • Identify means to predict the evolution of problematic subreddits
  • Using features such as: community, moderator, user, & language to build interpretable models to predict problematic subreddits
  • 80% dataset used for training & 20% used for testing to evaluate findings
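
The 80/20 evaluation split mentioned above can be sketched as follows. This is only a minimal illustration under assumed names: `train_test_split` here is a hand-written helper (not a library call), the subreddit identifiers are placeholders, and the slide does not say whether the paper shuffled or stratified its data.

```python
import random


def train_test_split(items, test_fraction=0.2, seed=0):
    """Shuffle a copy of items and hold out test_fraction of them for testing."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder identifiers standing in for the subreddits in the dataset.
subreddits = [f"subreddit_{i}" for i in range(10)]
train, test = train_test_split(subreddits)
print(len(train), len(test))
```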

PROBLEM/MOTIVE

APPROACH/METHOD

  • The model was able to identify problematic subreddits based on their evolutionary data collected
  • Proactive measures using predictive strategies in Machine Learning can be used to assist human administrators in moderating and monitoring of subreddit accounts
  • The limited number of admins and policy fears can be mitigated by automated measures
  • Future Work: extend the study to other social media platforms (SMPs)

SOLUTION/FINDINGS

CATEGORY: MACHINE LEARNING - LR / SOCIAL MEDIA

REFERENCE: Habib, H., Musa, M. B., Zaffar, M. F., & Nithyanand, R. (2022). Are Proactive Interventions for Reddit Communities Feasible? Proceedings of the International AAAI Conference on Web and Social Media, 16(1), 264-274. https://doi.org/10.1609/icwsm.v16i1.19290

9 of 28

P5: Exploring the Magnitude and Effects of Media Influence on Reddit Moderation

  • This paper investigates the difference in effect between media-driven and proactive moderation strategies
  • Social media platforms such as Reddit face a dilemma:
    • Do they risk angering their communities by banning problematic/toxic subreddits, or do they risk losing advertising revenue due to negative media reactions on toxic communities?
  • This paper seeks to answer the question (dilemma)
  • 2 hypotheses set to investigate the answer:
    • H1: In communities with toxic content, Reddit’s administrative interventions for violating the content policy related to toxicity occur because of media pressure.
    • H2: Prior media attention on communities which receive interventions for toxic content: (1) increases the prevalence of problematic activity on the platform and (2) reduces the effectiveness of the issued interventions.
  • Used subreddit datasets to compute averages of toxicity, negative media mentions, and media pressure

PROBLEM/MOTIVE

APPROACH/METHOD

  • H1 was confirmed
  • H2 (1) was confirmed and H2 (2) was not confirmed
  • In summary, media-driven and proactive intervention/moderation strategies do not differ in their effect on social media platforms

SOLUTION/FINDINGS

CATEGORY: STATISTICS / SOCIAL MEDIA

REFERENCE: Habib, H., & Nithyanand, R. (2022). Exploring the Magnitude and Effects of Media Influence on Reddit Moderation. Proceedings of the International AAAI Conference on Web and Social Media, 16(1), 275-286. https://doi.org/10.1609/icwsm.v16i1.19291

10 of 28

P6: On the Infrastructure Providers that Support Misinformation Websites

  • This paper analyzes the infrastructure providers that directly or indirectly assist the spread of misinformation by supplying misinformation websites with services such as hosting, domain registration, CDN, DDoS protection, advertising, donation processing, and email.
  • These providers make misinformation mitigation more difficult or practically impossible.
  • Manually investigated the service providers of misinformation websites for hosting, domain registration, CDN, DDoS protection, advertising, donation processing, and email service
  • Ranking the service providers on different types of services for various misinformation websites

PROBLEM/MOTIVE

APPROACH/METHOD

  • Cloudflare tops the list of service providers powering misinformation sites
  • Hosting providers have weak policies that are rarely enforced until publicly visible incidents occur
  • Misinformation sites rely on monetization platforms such as advertising networks

SOLUTION/FINDINGS

CATEGORY: STATISTICS / MISINFORMATION

REFERENCE: Han, C., Kumar, D., & Durumeric, Z. (2022). On the Infrastructure Providers That Support Misinformation Websites. Proceedings of the International AAAI Conference on Web and Social Media, 16(1), 287-298. https://doi.org/10.1609/icwsm.v16i1.19292

11 of 28

P7: No Calm in the Storm: Investigating QAnon Website Relationships

  • QAnon is an online conspiracy theory movement with many followers in the US
  • This paper aims to identify and understand pro-QAnon websites by building a QAnon-centered domain-based hyperlink graph
  • It also studied the link of these websites to mainstream misinformation content online
  • The goal is to find an effective way to study the growing presence of conspiracy theories and misinformation online
  • Used web crawlers targeting 2 major QAnon websites as samples: 8kun.top & voat.co
  • Used the Hypertext Induced Topic Selection (HITS) algorithm to find similar websites
  • Used a hyperlink graph to study relationships & misinformation propagation
  • Used a Random Forest classifier to distinguish misinformation sites from authentic news sites
  • Holistic approach: web crawling, hyperlink graph, & Random Forest classifier
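
To make the Hypertext Induced Topic Selection step more concrete, here is a minimal generic sketch of the algorithm on a tiny hyperlink graph. The domain names and the three links are invented for illustration, and the slide does not describe how the paper configured or seeded the algorithm, so this should be read as the textbook iteration only.

```python
import math


def _normalize(scores):
    """Scale scores to unit Euclidean length (no-op if all scores are zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {node: v / norm for node, v in scores.items()}


def hits(edges, iterations=50):
    """Return (hub, authority) scores for a directed graph given as (src, dst) pairs."""
    nodes = {n for edge in edges for n in edge}
    hubs = {n: 1.0 for n in nodes}
    auths = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority score: sum of hub scores of the pages linking to this page.
        auths = {n: 0.0 for n in nodes}
        for src, dst in edges:
            auths[dst] += hubs[src]
        auths = _normalize(auths)
        # Hub score: sum of authority scores of the pages this page links to.
        hubs = {n: 0.0 for n in nodes}
        for src, dst in edges:
            hubs[src] += auths[dst]
        hubs = _normalize(hubs)
    return hubs, auths


# Hypothetical hyperlink graph between placeholder domains.
edges = [
    ("site_a.example", "site_c.example"),
    ("site_b.example", "site_c.example"),
    ("site_a.example", "site_d.example"),
]
hubs, auths = hits(edges)
top_authority = max(auths, key=auths.get)
top_hub = max(hubs, key=hubs.get)
print(top_authority, top_hub)
```

In this toy graph the most-linked-to domain ends up with the highest authority score, and the domain linking to the most authorities ends up as the top hub.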

PROBLEM/MOTIVE

APPROACH/METHOD

  • The paper successfully found that:
  • A holistic approach combining web crawling, hyperlink graph analysis, and a Random Forest classifier is effective for identifying and studying QAnon-centric websites and misinformation propagation, and can therefore be an effective method for identifying and analyzing similar growing online conspiracy theory and misinformation movements.

SOLUTION/FINDINGS

CATEGORY: STATISTICS / MISINFORMATION / SOCIAL MEDIA

REFERENCE: Hanley, H. W. A., Kumar, D., & Durumeric, Z. (2022). No Calm in the Storm: Investigating QAnon Website Relationships. Proceedings of the International AAAI Conference on Web and Social Media, 16(1), 299-310. https://doi.org/10.1609/icwsm.v16i1.19293

12 of 28

P8: Time after Time: Longitudinal Trends in Nostalgic Listening

  • Nostalgia is an emotional feeling of affinity towards something enjoyable
  • This paper studies trends in nostalgic emotions among music listeners
  • It investigates whether nostalgic feelings are prompted by changes in a person’s life or by certain events across a population
  • Data was collected through a cross-national survey: listeners identified tracks from their listening histories that they found nostalgic
  • Used a Random Forest classifier to identify nostalgic tracks for individual listeners over a 5-year period
  • Compared results with listeners across 4 countries to observe consistency in behavior on nostalgic tracks
  • Dataset used gathered various elements of the listeners such as: age, sex, country, nostalgic track, etc.

PROBLEM/MOTIVE

APPROACH/METHOD

  • The research found that:
  • People listen to nostalgic music more often as they age
  • There is consistency of nostalgic music listening in people’s day-to-day lives
  • Events or traditions had no effect on increasing personally nostalgic music listening
  • A novel methodological approach for nostalgic listening studies

SOLUTION/FINDINGS

CATEGORY: STATISTICS / ENTERTAINMENT / SOCIAL LIFE

REFERENCE: Hanson, C., Anderton, J., Way, S. F., Anderson, I., Wolf, S., & Wang, A. (2022). Time after Time: Longitudinal Trends in Nostalgic Listening. Proceedings of the International AAAI Conference on Web and Social Media, 16(1), 311-322. https://doi.org/10.1609/icwsm.v16i1.19294

13 of 28

P9: The Impact of Viral Posts on Visibility and Behavior of Professionals: A Longitudinal Study of Scientists on Twitter

  • This paper studies the effect of post virality on scientists on Twitter: how viral posts affect their subsequent behavior and long-term visibility
  • The study is grounded in the causal effects of unusual attention to posts or individuals
  • Used a dataset of Twitter activity from 17,157 scientists
  • Identified scientists who experienced unusual first-time virality
  • Quantified how virality influenced their subsequent behavior on Twitter using the follower graph and statistics
  • Compared pre-virality and post-virality posts and number of followers to study the effects and subsequent behavior and popularity

PROBLEM/MOTIVE

APPROACH/METHOD

  • The study found that:
  • The scientists increase their tweet frequency & popularity after virality
  • Their tweets became more objective and centered on fewer topics
  • They expressed positive sentiment on their pre-virality posts
  • Their subsequent tweets were more aligned with their professional expertise

SOLUTION/FINDINGS

CATEGORY: STATISTICS / SOCIAL MEDIA

REFERENCE: Hasan, R., Cheyre, C., Ahn, Y.-Y., Hoyle, R., & Kapadia, A. (2022). The Impact of Viral Posts on Visibility and Behavior of Professionals: A Longitudinal Study of Scientists on Twitter. Proceedings of the International AAAI Conference on Web and Social Media, 16(1), 323-334. https://doi.org/10.1609/icwsm.v16i1.19295

14 of 28

P10: Post Approvals in Online Communities

  • Post approval is a setting under which community moderators or administrators decide whether a post is published or discarded
  • Many online communities adopt post approval; however, little research has been done on its impact on community members
  • This paper seeks to identify those effects through a longitudinal analysis of a large set of Facebook groups
  • 233,402 Facebook Groups analyzed on:
    • The factors leading to post approval adoption
    • The effect of post approval on subsequent user activities and moderations
  • Research Questions:
    • RQ1: What leads communities to adopt post approvals?
    • RQ2: How do post approvals shape user activity and moderation in online communities?
    • RQ3: Does the impact of post approvals depend on community properties and on how the setting is used?

PROBLEM/MOTIVE

APPROACH/METHOD

  • RQ1 Answer:
    • Increase in user activity (comments) & reported posts
  • RQ2 Answer:
    • Posts become more mature and responsible, with fewer reported posts
  • RQ3 Answer:
    • Impact varied
  • Future Work:
    • How does post approval affect different kinds of users and friendship relations?

SOLUTION/FINDINGS

CATEGORY: STATISTICS / SOCIAL MEDIA

REFERENCE: Horta Ribeiro, M., Cheng, J., & West, R. (2022). Post Approvals in Online Communities. Proceedings of the International AAAI Conference on Web and Social Media, 16(1), 335-346. https://doi.org/10.1609/icwsm.v16i1.19296

15 of 28

P11: Rules and Rule-Making in the Five Largest Wikipedias

  • This paper studies the comparative and relational dimensions of rules and rule-making in self-governing online communities
  • It compares the 5 largest Wikipedia language editions: English, French, German, Japanese, and Spanish
  • Based on 2 Research Questions:
  • RQ1: How do patterns of rule-making over time compare across autonomous communities with shared goals and technical infrastructure?
  • RQ2: How do the sets of rules become more or less similar over time among communities with shared goals and technical infrastructure?
  • Compared the rule-making patterns and rules evolution amongst the 5 editions
  • Used Knowledge Graph & Statistics

PROBLEM/MOTIVE

APPROACH/METHOD

  • Results:
  • RQ1:
    • Rule creation follows similar, but distinct patterns across the wikis in the sample.
  • RQ2:
    • Rule evolution patterns become less similar over time among the different language edition communities
  • Wikipedia language editions share common rules but domesticate them in distinct ways

SOLUTION/FINDINGS

CATEGORY: GRAPH / STATISTICS / SOCIAL MEDIA

REFERENCE: Hwang, S., & Shaw, A. (2022). Rules and Rule-Making in the Five Largest Wikipedias. Proceedings of the International AAAI Conference on Web and Social Media, 16(1), 347-357. https://doi.org/10.1609/icwsm.v16i1.19297

16 of 28

P12: Twitter User Representation Using Weakly Supervised Graph Embedding

  • This paper proposes a method for identifying user types using weakly supervised graph embedding, learning about users’ lifestyles from their activity
  • The dataset consists of Twitter users’ tweets on lifestyle topics such as yoga and the ketogenic diet, posted by practitioners or advocates
  • The goal is to automatically classify users based on tweet content
  • Dataset gathered for 2 classes: ketogenic diet and yoga
  • Collected data using keywords related to or including the 2 classes
  • Used a knowledge graph to find links and patterns among users with similar tweets
  • Built a binary classifier (“Keto” or “Yoga”) using tweet training and testing data
  • Used pretrained BERT as the baseline model

PROBLEM/MOTIVE

APPROACH/METHOD

  • The results showed that:
  • Users can be identified and classified automatically based on lifestyle tweets
  • The method can be adapted to other types of corpora for similar classification tasks
  • Future Work: to expand the work to detect communities based on different lifestyle decisions and understand their motivations.

SOLUTION/FINDINGS

CATEGORY: NLP / SOCIAL MEDIA

REFERENCE: Islam, T., & Goldwasser, D. (2022). Twitter User Representation Using Weakly Supervised Graph Embedding. Proceedings of the International AAAI Conference on Web and Social Media, 16(1), 358-369. https://doi.org/10.1609/icwsm.v16i1.19298

17 of 28

P13: The Hipster Paradox in Electronic Dance Music: How Musicians Trade Mainstream Success off against Alternative Status

  • This paper studies the hipster paradox in EDM using large-scale and longitudinal digital traces of musicians.
  • EDM is known for seeking independence from the mainstream music industry to preserve its liberty and autonomy
  • The hipster paradox refers to EDM musicians needing the mainstream industry for success while actively opposing that industry’s beliefs and values
  • Collected dataset from live performances and album releases of EDM musicians from 2001 to 2018
  • Construct network model based on sociological approach: bipartite network
  • Run regression on network to use network positioning to make observations and conclusions regarding artists’ success on autonomy vs. commercial success through mainstream industry
  • Linear Regression algorithm used

PROBLEM/MOTIVE

APPROACH/METHOD

  • The study found evidence for a structural trade-off among success and autonomy. Musicians in EDM embed into exclusive performance-based communities for autonomy but, in earlier career stages, seek the mainstream for commercial success.
  • The approach highlights how Computational Social Science can benefit from a close connection of data analysis and theory.

SOLUTION/FINDINGS

CATEGORY: MACHINE LEARNING / ENTERTAINMENT / SOCIOLOGY

REFERENCE: Jadidi, M., Lietz, H., Samory, M., & Wagner, C. (2022). The Hipster Paradox in Electronic Dance Music: How Musicians Trade Mainstream Success off against Alternative Status. Proceedings of the International AAAI Conference on Web and Social Media, 16(1), 370-380. https://doi.org/10.1609/icwsm.v16i1.19299

18 of 28

P14: Two-Face: Adversarial Audit of Commercial Face Recognition Systems

  • This paper conducts an adversarial audit of commercial face recognition systems (FRSs) to detect biases against minority groups in their predictions
  • These biases have persisted for quite some time despite numerous past studies on the subject
  • This paper aims to show the increase in bias against minority groups and the effects these biases can have on society
  • Tested cases on FRS:
    • Amazon AWS Rekognition
    • Microsoft Azure Face
    • Face++ Detect
  • Dataset: CELEBSET: 1600 images of 80 Black & White Celebrities
  • Predictions on gender, age, smile detection
  • Measured accuracy of each of the above systems on the selected features during test cases
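
A per-group accuracy comparison of the kind described above can be sketched as follows. The records, group names, and labels are invented toy data, and `accuracy_by_group` is a hypothetical helper; the paper's actual audit pipeline and metrics may differ.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """records: iterable of (group, true_label, predicted_label) tuples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, truth, predicted in records:
        total[group] += 1
        correct[group] += int(truth == predicted)
    return {group: correct[group] / total[group] for group in total}


# Toy predictions from a hypothetical smile-detection API.
records = [
    ("group_a", "smile", "smile"),
    ("group_a", "no_smile", "no_smile"),
    ("group_b", "smile", "no_smile"),
    ("group_b", "no_smile", "no_smile"),
]
accuracy = accuracy_by_group(records)
# The gap between the best- and worst-served group is one simple disparity measure.
gap = max(accuracy.values()) - min(accuracy.values())
print(accuracy, gap)
```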

PROBLEM/MOTIVE

APPROACH/METHOD

  • Strong discriminatory bias against individuals of minority groups, especially Black individuals, in age, gender, and smile prediction
  • Significant accuracy reduction on adversarial inputs from the same dataset, with the disparity mainly affecting Black individuals
  • Social consequence: real life implications
  • Future work: study FRS skin color sensitivity of images and impact of color filters on FRSs

SOLUTION/FINDINGS

CATEGORY: COMPUTER VISION - CNN / SOCIOLOGY

REFERENCE: Jaiswal, S., Duggirala, K., Dash, A., & Mukherjee, A. (2022). Two-Face: Adversarial Audit of Commercial Face Recognition Systems. Proceedings of the International AAAI Conference on Web and Social Media, 16(1), 381-392. https://doi.org/10.1609/icwsm.v16i1.19300

19 of 28

P15: Sunshine with a Chance of Smiles: How Does Weather Impact Sentiment on Social Media?

  • This paper studies the impact of weather on sentiment on social media by leveraging the contextual cues of location and time alongside weather cues to obtain better sentiment analysis accuracy over current BERT models

  • Dataset: Snapchat & Twitter data
  • Use RoBERTa as base model

PROBLEM/MOTIVE

APPROACH/METHOD

  • Test Results:

  • Historical weather has a lasting impact on sentiment and contextual cues improve model accuracy by 3%

SOLUTION/FINDINGS

CATEGORY: NLP / SOCIAL MEDIA

REFERENCE: Jiang, J., Murrugara-Llerena, N., Bos, M. W., Liu, Y., Shah, N., Neves, L., & Barbieri, F. (2022). Sunshine with a Chance of Smiles: How Does Weather Impact Sentiment on Social Media? Proceedings of the International AAAI Conference on Web and Social Media, 16(1), 393-404. https://doi.org/10.1609/icwsm.v16i1.19301

20 of 28

P16: Many Ways to Be Lonely: Fine-Grained Characterization of Loneliness and Its Potential Changes in COVID-19

  • This paper studies different kinds of loneliness exhibited by different age groups to understand how different forms of loneliness and coping mechanisms can be manifested in loneliness self-disclosure in online forums
  • The study was conducted during the COVID-19 pandemic, using the long-lasting lockdowns as an opportunity to collect data on 4 different groups
  • Built the FIG-Loneliness dataset from Reddit forums (r/lonely, r/loneliness, r/youngadults, r/college) to collect expressions of loneliness
  • Used BERT as the baseline to classify Reddit posts as lonely or not lonely, then further classified posts by context with a multiclass classifier, also using BERT as the baseline
  • Using Hierarchical Distributional Learning to examine how different forms of loneliness manifest on the above Reddit forums

PROBLEM/MOTIVE

APPROACH/METHOD

  • Binary classification of lonely/not lonely achieved > 97% accuracy
  • Multilabel classification on context: location, duration, interaction features obtained average accuracy of 77%
  • Loneliness expressions in young adults differ from those of adults, and young adults are more likely to express concerns about the health effects of the COVID-19 lockdown
  • Showed that different forms of loneliness are associated with different coping mechanisms

SOLUTION/FINDINGS

CATEGORY: NLP / SOCIAL MEDIA

REFERENCE: Jiang, Y., Jiang, Y., Leqi, L., & Winkielman, P. (2022). Many Ways to Be Lonely: Fine-Grained Characterization of Loneliness and Its Potential Changes in COVID-19. Proceedings of the International AAAI Conference on Web and Social Media, 16(1), 405-416. https://doi.org/10.1609/icwsm.v16i1.19302

21 of 28

P17: Out of the Shadows: Analyzing Anonymous’ Twitter Resurgence during the 2020 Black Lives Matter Protests

  • This paper studies the resurgence of Anonymous, a hacktivist group that seemed to have been dismantled on Twitter after police arrested prominent members in 2013, following major attacks on government facilities in the US
  • After the murder of George Floyd in 2020, reports indicated that the group had resurged. The study gathers evidence to examine and understand the factors behind this resurgence
  • Collected samples of Anonymous-affiliated accounts data as dataset
  • Annotated & classified as anonymous/non-anonymous and trained & tested on the following algorithms:

  • Random Forest adopted for sentiment analysis and subsequent tasks
  • Used an algorithm to detect automated (bot) activity in the posts

PROBLEM/MOTIVE

APPROACH/METHOD

  • Found evidence of a united approach amongst the group, with positive tweets typically being used to express support towards BLM and negative tweets typically being used to criticize police actions
  • Found indications of bot-like behavior across the majority of Anonymous accounts.
  • Findings show that whilst the group has seen a resurgence during the protests, bot activity may be responsible for exaggerating the extent of this resurgence

SOLUTION/FINDINGS

CATEGORY: MACHINE LEARNING / SOCIAL MEDIA

REFERENCE: Jones, K., Nurse, J. R., & Li, S. (2022). Out of the Shadows: Analyzing Anonymous’ Twitter Resurgence during the 2020 Black Lives Matter Protests. Proceedings of the International AAAI Conference on Web and Social Media, 16(1), 417-428. https://doi.org/10.1609/icwsm.v16i1.19303

22 of 28

P18: Are You Robert or RoBERTa? Deceiving Online Authorship Attribution Models Using Neural Text Generators

  • This paper examines the degree to which Natural Language Generator models like GPT-2 can generate texts capable of deceiving online Authorship Attribution (AA) models
  • AA plays a role in real-world applications such as spam detection, the investigation of criminal activity online, and verifying the authors of online texts for various purposes
  • Dataset:
    • 19,320 users’ blog-posts sampled from blogger.com
    • 11,698 tweets sampled from Twitter
  • Used GPT-2 to generate texts
  • Tested and compared results from GPT-2 generated texts vs Human generated texts (blog, twitter)
  • Tested on BERT-AA, BERTweet-AA, Random Forest-AA models

PROBLEM/MOTIVE

APPROACH/METHOD

  • The study found that:
  • Current AI-based text generators are able to successfully mimic authorship on both datasets (blogs, tweets)
  • The AI-generated texts can sufficiently deceive popular AA models that are based on authorial style

SOLUTION/FINDINGS

CATEGORY: NLP / SOCIAL MEDIA

REFERENCE: Jones, K., Nurse, J. R., & Li, S. (2022). Are You Robert or RoBERTa? Deceiving Online Authorship Attribution Models Using Neural Text Generators. Proceedings of the International AAAI Conference on Web and Social Media, 16(1), 429-440. https://doi.org/10.1609/icwsm.v16i1.19304

23 of 28

P19: Local News Online and COVID in the U.S.: Relationships among Coverage, Cases, Deaths, and Audience

  • This paper conducted analyses to study the relationship between online local news coverage of COVID cases and deaths in an area and the properties of the local news outlets and their audiences
  • The study seeks to understand the factors associated with the degree of local news coverage over time and the variance of COVID topics across news outlets during the pandemic
  • Collected 750,000 articles from 300 online local news sites from Apr 2020 to Feb 2021
  • Used statistical analyses to address the following questions:
    • RQ1: What factors are associated with the degree of local news coverage, over time, of the COVID-19 pandemic?
    • RQ2: What were the primary COVID-related topics covered during the pandemic, and how did this coverage vary across outlets?
  • Regression to address RQ1
  • Structural Topic Model to address RQ2

PROBLEM/MOTIVE

APPROACH/METHOD

  • RQ1 Result:
    • The rate of COVID coverage over time by local news outlets was primarily associated with death rates at the national level, but this effect dissipated over the course of the pandemic
  • RQ2 Result:
    • The volume and content of COVID coverage differed depending on local politics and outlet audience size, and there is evidence that more vulnerable populations received less pandemic-related news

SOLUTION/FINDINGS

CATEGORY: STATISTICS / SOCIAL MEDIA / SOCIOLOGY

REFERENCE: Joseph, K., Horne, B. D., Green, J., & Wihbey, J. P. (2022). Local News Online and COVID in the U.S.: Relationships among Coverage, Cases, Deaths, and Audience. Proceedings of the International AAAI Conference on Web and Social Media, 16(1), 441-452. https://doi.org/10.1609/icwsm.v16i1.19305

24 of 28

P20: Supporting Human Memory by Reconstructing Personal Episodic Narratives from Digital Traces

  • This paper proposes an application that leverages Personal Digital Traces (PDT), the aspects of people’s lives captured in digital form, to generate models that can retrace episodic narratives of past events for patients suffering from neurodegenerative disease

  • Data collected from numerous sources about the patient

PROBLEM/MOTIVE

APPROACH/METHOD

  • Results on w5h model

SOLUTION/FINDINGS

CATEGORY: NLP / MEDICINE / SOCIOLOGY

REFERENCE: Kalokyri, V., Borgida, A., & Marian, A. (2022). Supporting Human Memory by Reconstructing Personal Episodic Narratives from Digital Traces. Proceedings of the International AAAI Conference on Web and Social Media, 16(1), 453-464. https://doi.org/10.1609/icwsm.v16i1.19306

25 of 28

RESEARCH MAP

Domain – Technique Relationship

26 of 28

Research Map

Statistics

  • P1: Correcting Sociodemographic Selection Biases for Population Prediction from Social Media (pp. 228-240)
  • P2: SAFER: Social Capital-Based Friend Recommendation to Defend against Phishing Attacks (pp. 241-252)
  • P5: Exploring the Magnitude and Effects of Media Influence on Reddit Moderation (pp. 275-286)
  • P6: On the Infrastructure Providers that Support Misinformation Websites (pp. 287-298)
  • P7: No Calm in the Storm: Investigating QAnon Website Relationships (pp. 299-310)
  • P8: Time after Time: Longitudinal Trends in Nostalgic Listening (pp. 311-322)
  • P9: The Impact of Viral Posts on Visibility and Behavior of Professionals: A Longitudinal Study of Scientists on Twitter (pp. 323-334)
  • P10: Post Approvals in Online Communities (pp. 335-346)
  • P11: Rules and Rule-Making in the Five Largest Wikipedias (pp. 347-357)
  • P19: Local News Online and COVID in the U.S.: Relationships among Coverage, Cases, Deaths, and Audience (pp. 441-452)

NLP

  • P12: Twitter User Representation Using Weakly Supervised Graph Embedding (pp. 358-369)
  • P15: Sunshine with a Chance of Smiles: How Does Weather Impact Sentiment on Social Media? (pp. 393-404)
  • P16: Many Ways to Be Lonely: Fine-Grained Characterization of Loneliness and Its Potential Changes in COVID-19 (pp. 405-416)
  • P18: Are You Robert or RoBERTa? Deceiving Online Authorship Attribution Models Using Neural Text Generators (pp. 429-440)
  • P20: Supporting Human Memory by Reconstructing Personal Episodic Narratives from Digital Traces (pp. 453-464)

Machine Learning

  • P3: Effect of Popularity Shocks on User Behaviour (pp. 253-263)
  • P4: Are Proactive Interventions for Reddit Communities Feasible? (pp. 264-274)
  • P13: The Hipster Paradox in Electronic Dance Music: How Musicians Trade Mainstream Success off against Alternative Status (pp. 370-380)
  • P17: Out of the Shadows: Analyzing Anonymous’ Twitter Resurgence during the 2020 Black Lives Matter Protests (pp. 417-428)

Computer Vision

  • P14: Two-Face: Adversarial Audit of Commercial Face Recognition Systems (pp. 381-392)

27 of 28

Discussion

  • Of the 20 papers reviewed:
  • 50% used Statistical techniques (10 papers)
  • 25% used NLP techniques (5 papers)
  • 20% used Machine Learning techniques (4 papers)
  • 5% used Computer Vision techniques (1 paper)
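
The shares above can be reproduced with a short tally. The mapping below takes the first technique listed on each paper's CATEGORY line as its primary technique; treating P11 (GRAPH / STATISTICS) as Statistics is an assumption made so that the counts match the figures on this slide.

```python
from collections import Counter

# Primary technique per paper, read off the CATEGORY lines of the summary slides.
# Assumption: P11 (GRAPH / STATISTICS) is counted under Statistics.
primary_technique = {
    "P1": "Statistics", "P2": "Statistics", "P3": "Machine Learning",
    "P4": "Machine Learning", "P5": "Statistics", "P6": "Statistics",
    "P7": "Statistics", "P8": "Statistics", "P9": "Statistics",
    "P10": "Statistics", "P11": "Statistics", "P12": "NLP",
    "P13": "Machine Learning", "P14": "Computer Vision", "P15": "NLP",
    "P16": "NLP", "P17": "Machine Learning", "P18": "NLP",
    "P19": "Statistics", "P20": "NLP",
}

counts = Counter(primary_technique.values())
shares = {tech: 100 * n / len(primary_technique) for tech, n in counts.items()}

for tech, n in counts.most_common():
    print(f"{tech}: {n} papers ({shares[tech]:.0f}%)")
```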

28 of 28

Thank you

Ebrima Hydara (ID: 34414090)

Ozono Lab

Dept. of Computer Science

Graduate School of Engineering

Nagoya Institute of Technology