
With Maps and Mobs: Searching for Trustworthiness using Belief Spaces

A Dissertation Proposal by Philip Feldman

University of Maryland, Baltimore County

Abstract

As information explodes into existence all around us, we must increasingly depend on algorithmically mediated information retrieval. These processes are opaque to us, yet we routinely make assumptions about the trustworthiness of their results. Since there is a strong tendency in human interaction to polarize into ever more oppositional groups, it is important that our tools do not amplify this tendency, but rather encourage us to expand our information horizons.

Using digital news aggregation as a proxy for the general problem of search and retrieval, I propose to research a model for information retrieval that integrates two levels of information interaction. On the individual level, I will leverage Munson and Resnick’s diversity-seeker, confirmer, and avoider user types. At the group level, I propose to integrate individual behaviors according to Sunstein’s Law of Group Polarization. Following a research-through-design methodology, my work will iterate between model development, agent-based simulation, and interface instantiation. A starting place for the former is biological models of sexual selection; for the latter, the NewsCube platform.

To date, a proof-of-concept information exploration prototype has been developed and tested with a literature synthesis task across a modest corpus. The next iterative cycle broadens the user base, exploring both individual and group behaviors. The final cycle, based on a validated model, will involve a large-scale evaluation using a historical news corpus. This work will provide considerable insight into effective ways to include the user more fully in their information choices, pulling back the curtain of algorithmic selection. The ability to alert users when they are slipping into the self-reinforcing echo chamber of an information bubble will be a transformative contribution.

1 - Introduction

The interactions between humans and computers are becoming ever more sophisticated and intertwined. The amount of communication that is mediated or affected by algorithms is (currently?) astonishing. According to Cisco (Cisco Whitepaper 2016), internet traffic will exceed one zettabyte (Wikipedia definition 2016) in 2016. This information can be placed on a continuum ranging from direct communication between two people (say Skype or SMS), through platform algorithms placing targeted advertising in front of potential customers, to the chatter of embedded devices colluding to provide the sensation of poise and responsiveness to the driver of a high-end sports car while not allowing a crash (Sutcliffe, S. 2013).

Using Shannon’s calculation of entropy for the average word (Shannon, C. E. 1951), all these bits moving back and forth are the equivalent of a world population of ten trillion people talking continuously all day, every day, non-stop[1]. To put that in more personal terms, your share of the information pie is a room with about 1,400 people in it, with more coming in the door all the time. Clearly, we cannot handle even a fraction of what is available to us. As a result, we choose tools and strategies that allow us to organize and process the digital world around us. And as the amount of information continues to increase along its inexorable exponential curve, we are driven to trust algorithms to deliver ever more precise information. With that need for increasing precision comes a corresponding risk that the information provided will have less context and framing. We will get what we ask for, but it may not be what we need. Or, even more troubling, we may get what our ever-attentive, personalized systems think that we would like. Put a different way, as each of us becomes a smaller element in the growing oceans of information, we stop being navigators and start being pulled by currents we don’t understand.
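
To make the arithmetic behind this illustration explicit, the rough calculation below assumes about 10 bits of entropy per word (in the range implied by Shannon’s estimates) and a continuous speaking rate of 150 words per minute; the constants are assumptions for a back-of-envelope check, not measurements.

    # Back-of-envelope check of the "ten trillion talkers" figure.
    ZETTABYTE_BITS = 8e21            # one zettabyte of annual traffic, in bits
    SECONDS_PER_YEAR = 365 * 24 * 3600
    BITS_PER_WORD = 10.0             # assumed average entropy per word
    WORDS_PER_SECOND = 150 / 60.0    # assumed continuous speaking rate

    traffic_bits_per_second = ZETTABYTE_BITS / SECONDS_PER_YEAR
    speaker_bits_per_second = BITS_PER_WORD * WORDS_PER_SECOND
    equivalent_speakers = traffic_bits_per_second / speaker_bits_per_second

    WORLD_POPULATION = 7.4e9         # rough 2016 estimate
    print(f"equivalent non-stop speakers: {equivalent_speakers:.2e}")
    print(f"speakers per living person:   {equivalent_speakers / WORLD_POPULATION:.0f}")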

What does that mean for Human Centered Computing? What kind of system-wide design considerations must we take into account so that our cybernetic interactions with each other are trustworthy and empowering? This is an enormous question, but Ben Shneiderman says that we should embrace enormous questions (Shneiderman, B. 2016). Still, are there proxies that can stand in for these issues and serve to illuminate the fundamental questions of identity, anonymity, and system and social trust that are the implicit foundations of communication (Kress, G. 2009), be it computer-mediated or otherwise?

1.1 - News as a proxy

I submit that the production and consumption of news contains many of the issues and challenges in the mediated want/need/like framework mentioned above. News, at its best, is a commons, an information space that we share within our everyday lives (Lefebvre, H. 1991). And to the extent that these commons are diminished, we as a society are diminished.

1.2 - What is news?

News, or rather news as expressed through the practice of journalism, is the delivery of information, generally about current events, using affordances, practices and culture developed over the centuries. News needs to be timely, informative, trustworthy and entertaining  enough to be consumed as a product[2]. A key aspect is the concept of ‘enough’. In this case, ‘enough’ is similar to Simon’s term ‘satisficing’ (Simon, H. A. 1996), or meeting an acceptability threshold. In other words, timely does not have to mean immediate. Informative does not have to be exhaustive. Not all information can be independently verifiable. Every story doesn’t have to be an engrossing page turner. The decision of what is ‘enough’ is made by the editors based on a complex interplay between components such as  editorial perspective, advertiser attractiveness, and readership.

When writ large, news, or ‘The Press’, also has an implicit goal to support civic engagement (Madison, J. 1789). The Pulitzer Prize embodies this; it was founded to recognize "the most disinterested and meritorious public service rendered by any American newspaper during the preceding year" (Topping, S., & Gissler, S. 2016). An example of this civic feedback loop is the prize-winning stories on rogue narcotics police by the Philadelphia Daily News, which were followed up by political action (Van Aelst, P., & Walgrave, S. 2016), in this case an FBI probe. This role of news as ‘the fourth estate’ (Schultz, J. 2009) - a flawed but effective watchdog - has been part of Western political discourse at least since Edmund Burke coined the term in 1787.

But, for political action to occur (Binder 2014), news cannot degenerate into the relaying of static opposing views[3]. Readers need to be aware of a range of reportage that goes beyond what they and their social group are interested in. This is true in the case of traditional news organizations as well as the more ad-hoc structures that can arise, such as targeted platforms like Ushahidi, or personalized news aggregators such as Google News. For news to realize its civic purpose, it cannot provide a bubble of harmonious agreement. Effective news needs to support ‘agonistic pluralism’, “in which the facts, beliefs, and practices of a society are forever examined and challenged” (Korn & Voida 2015).

To support this type of contestation, news producers (human or otherwise) need to be responsive to the information needs of their readers, and readers should be able to share their views productively with other readers, reporters and editors. The design and cultural mechanisms for these interactions have accreted over the centuries to produce the current affordances. And even though the technology differs (paper, radio, television, web pages and apps), the human process is the same - reporters interact with people to produce stories, editors fit the stories into the current ‘edition’ based on business, logistical and technical constraints, whereupon the product is sold back to the people. The foundations of this process are the same, whether the story is from the Topeka Capital on the Dust Bowl of the 1930s (Panhandle PBS 2012) or a Buzzfeed article on the 2016 hurricane season (Broder Van Dyke 2016).

1.3 - Designing for future news systems

The delivery of news is changing from an editorially packaged and integrated product into something that is accessed piecemeal through computer-mediated search and social recommendation. How can future news systems continue to support agonistic pluralism in the absence of the traditional integrated news production organizations? Other mechanisms need to be found or built that can support a more ad-hoc interplay between news production and news consumption, and that detect and amplify the contributions of trustworthy sources, reporters, editors and readers. In this way, news does indeed represent a proxy for many aspects of the sociotechnical issues that are emerging in the human-centered part of the internet, and would profit from deeper study.

Based on the premise that the communication structure known as networked digital journalism can be regarded as a reasonable microcosm of its enclosing system of computer-mediated communication, success in designing mechanisms that address issues of communication quality, trust and identity within the news production and consumption context may be broadly generalizable.

1.4 - Contributions

Particularly with respect to news, but possibly more generally, I believe that this work will provide a more sophisticated understanding of how designing for serendipitous search can uncover latent information retrieval behaviors in individual users. These behaviors in turn may reveal how individuals act collectively through computer-mediated information. This research should provide a model-validated framework for understanding how individual information retrieval patterns (explorer, confirmer, avoider) can manifest in larger scale patterns such as information bubbles and antibubbles. The feedback between individual and group behaviors may in turn have implications for determining which sources of information are more correlated with a ‘grounded’ understanding of newsworthy events and which are not.

Relatedly, this work should demonstrate the idea that the quantified behaviors of many users exploring information in the context of a serendipitous design may be as effective as a relatively small number of deliberate fact checkers. This has implications for the design of scalable, self-organizing, trustworthy news production and consumption systems.

1.5 - Proposal Structure

The rest of this proposal is organized around the following research questions:

  1. Can a model be built that allows for an in-depth evaluation of group polarization behaviors such as echo chambers, information bubbles and antibubbles? Further, can this model be extended and refined to reflect and evaluate the sorts of designs being proposed in this dissertation? The model built in this step will be continuously refined over the period of this research, with the results being fed back into the model with the goal of creating a fully validated and grounded model.
  2. Is it possible to design an interface for searching news that elicits explorer, confirmer and avoider behaviors? What are these elements? Are these behaviors associated with people or with sessions?
  3. Can design influence the way that users interact with information such that explorer, confirmer and avoider patterns can be detected statistically or programmatically?
  4. When looking at populations of explorers, confirmers and avoiders, is it possible to detect any differences in how they interact with news? Are there patterns of interaction that allow the detection of information bubbles and antibubbles and, indirectly, the trustworthiness value of this information? At what scale(s) can these patterns be detected, and can they feed back into the search of a news corpus?

2 - Motivations

The initial motivation for this work arises from contemplating the synergy between Information Systems and Information Bubbles, particularly in polarizing news items such as climate change and gun control (Lewandowsky, Ecker, Seifert, Schwarz, & Cook, 2012). In these and in many other cases, the facts are well known, but there are groups who interact with this information in dramatically different ways, ranging from deep exploration of the available information (Hobson & Niemeyer, 2012, p. 399) to the creation of alternate ‘facts’ (Garrett, Weeks, & Neo, 2016, p. 2). Even outside of ‘hot-button’ issues, information bubbles seem to be remarkably easy to slide into. This can happen in two ways. In the first, we select sources of information that are most agreeable to us. In the second, a system or person that we trust learns our information biases and preferences, and begins to tailor the information we receive to fit that perspective. An example of the former could be the intentional selection of a particular set of news sources from cable, radio and the internet. An example of the latter could be the use of a political advisor, or an information retrieval system that tailors its delivery of results to the user (Pariser, 2012) based on any number of personalized observations. Although the effects are similar in that an information bubble or echo chamber (Key & Cummings, 1966) - a self-referential data entity that does not depend on additional outside information for continued viability - may be created, the first is the result of explicit action on the part of the user (explicitly choosing a channel with a comforting and agreeable bias), while the second is an implicit choice to accept the selection of others, be they people or systems. This implicit choice has deep implications, since we are unlikely to realize its ramifications.

Now, there is nothing to say that our choice of information providers necessitates a narrowing of information horizons. A user may choose a news source that provides wide, varied and challenging content that opens up areas hitherto unknown. Similarly, a politician may choose advisors who have differing viewpoints and who force discussion, analysis and development (e.g. Abraham Lincoln and his team of rivals - Goodwin, 2005).

2.1 - The evolution of news as a civic good

The past and current production of news exemplify these mechanisms clearly. After the advent of the printing press, it became possible to affordably disseminate news and other information to the masses. As they developed, newspapers invented affordances and processes to quickly and compellingly deliver the news. The concept of the lede, the front page, cross-checking sources, headline size, and the human-interest perspective all led to a politically essential and commercially viable platform to inform citizens. Publications embodied any number of perspectives on what was editorially important, and the consumer was able to choose the source(s) that most closely matched their views or aspirations. This process of choosing a source continued through radio, film and television, and was, for the most part, a well-choreographed broadcasting mechanism with little opportunity for customization other than coarse measures such as geographic zoning to accommodate local stories and advertising inserts.

2.2 - How search changed news consumption

How we access news is changing with the ascendancy of cheap data processing, cheaper storage and planetwide interactivity. Using a search engine on a news topic returns results from across a number of publications, blogs, and other sources. The resulting list of relevant articles is often based on personal browsing history. Optimally, the goal of the search is to return a highly targeted set of relevant links that best matches the user’s information need (Manning, Raghavan, & Schütze, 2008, p. 152).

But news should also strive to tell you things you need to know. Excessively personalized search runs the risk of only presenting what the searcher wants to hear. This can make search-based news less like news and more like some weird fawning of a technological apparatchik wrapping us in a comfortable filter bubble.

There is a tension here - the ability to customize information for individual consumption is built into the DNA of the web. The original PageRank paper (Page & Brin, 1999) describes how to personalize search results. Since then, personalization has grown, particularly in the context of social networks. The Pew Research Foundation reports that in 2016 an increasing majority of Americans (62%) access their news indirectly through search engines, aggregators or social networks (Gottfried & Shearer, 2016). This means that the affordances of traditional news journalism (headline size, position, section, etc.) are being replaced by the affordances of IR - the list and the link. It also means that, more and more, we are trusting algorithms to deliver ever more precise information with correspondingly less context and framing.

2.3 - Trust, system trust and social trust

Merriam-Webster’s primary definition of trust is the “belief that someone or something is reliable, good, honest, effective, etc”. Belief, in turn, is defined as the “conviction of the truth of some statement or the reality of some being or phenomenon especially when based on examination of evidence”. Another way of looking at this is that we build an internal model of what we expect in our interactions with people or things based on our experiences. If our model matches our observations, then we trust our model. If that model is of someone or something that is “reliable, good, honest, effective, etc”, then we call that someone or something trustworthy. If that model accurately describes someone or something that is not “reliable, good, honest, effective, etc”, then we distrust them. The erratic and unpredictable lie between these two poles.

Merriam-Webster, again, defines trustworthiness as the quality of being deserving of trust. We expect a trustworthy person or thing to behave honestly and reliably. This manifests differently for things and for people. Within computer-mediated communication, we can refer to trust in things as system trust, and the trust we place in people and groups of people as social trust.

Haciyakupoglu and Zhang define system trust as the faith invested in the functioning of technological systems based on the technological affordances they provide (Haciyakupoglu & Zhang, 2015, p. 451). I trust my bicycle to handle high-speed mountain descents because I can see and feel its solidity and responsiveness, and I’ve spent many miles feeling out its performance limits. Information retrieval systems like Google search have high system trust because what is returned so often matches our expectations. We trust the information these systems deliver for a variety of reasons: the unambiguous nature of the search box and results page, the often remarkable relevance of the search results, and the consistency of behavior over time. We trust the GPS because it’s so much more effective than using a map to find our way. Our trust in IR systems is the trust that we place in tools, and we regularly trust our tools, placing our lives in their hands(?) on a daily basis. However, though the process of IR is mediated through computers, it is the indexing and retrieving of human-produced or influenced content. Should we instead be using some hybrid of system trust and social trust, which Haciyakupoglu and Zhang define as “the feeling of connection to a certain social group or organization” (Haciyakupoglu & Zhang, 2015, p. 451)? After all, the trust I feel towards my bicycle is entirely different from the trust that I feel for a group of riders that I’ve hooked up with for a club ride or race. My interaction with computer-mediated information is actually an interaction with a large group of loosely coupled individuals. So who are these people that we’re trusting?

Further, note that the trust we place in an information retrieval system is based on our internal model of our expectations; it does not need to be based on any externally verifiable facts. This means that if you expect to see conspiracies, you’ll find conspiracies, often without any context or framing. Search engines simply retrieve documents that we and other people in the network have placed into the system. If there are people in the network who create documents that match your beliefs, a trustworthy information retrieval system will accurately and reliably retrieve those documents.

Note that this differs from the information retrieval inherent in maps, because maps inevitably have an understandable relationship to the terrain. Even though the map is not the terrain, there must be a correlation or the map has no utility.

2.4 - Current search and the things unseen

Consider Google’s response to a ‘Washington DC’ query[4][5]:


Figure 1 - Google Search Engine Results Page (SERP) for Washington DC, not accounting for personal history

This is a surprisingly linear extension of the list framework first implemented by Gerard Salton in 1964 with the SMART system (Salton, 1964), only instead of one list, we have multiple overlapping lists - the Query Autocomplete (QAC), then a mixture of lists for ads and news items, and lastly, the traditional(?) list of relevant documents. As with SMART, the IR system has had to identify and retrieve relevant information based on some set of criteria, but the criteria for ranking the search results have had some odd side effects in this case. According to these results, Washington DC is most relevant as a tourist destination.

It’s possible that the majority of searches and/or writing with respect to Washington DC are about travel. But if we trust people’s popular searches to be the lens through which we see information, we may miss a tremendous amount that we would find highly pertinent, if we ever got the chance to access it. There is much more to the city that is beyond the ‘information horizon’ (Sonnenwald & Wildemuth, 2001) of this presentation of search results.

With social networks, topic search is somewhat different, though still (primarily? exclusively?) list based. Perhaps most interesting are the ‘implicit searches’ (Figure 2) that produce items such as trending lists. In a way, these are completely frictionless searches answering an implied question - ‘what is interesting/entertaining?’ - as determined by generally proprietary and confidential algorithms.

Figure 2 - Trending items on the author’s Twitter and Facebook feeds, June 3, 2016

From an information horizon perspective, these lists are like small spotlights piercing the darkness based on some unknown set of rules. Items appear with little contextual information, and even less opportunity for serendipity. A set of algorithms (Tufeki, 2016), and by extension their platform designers (Momeni, Cardie, & Diakopoulos, 2015, p. 42), decide what the user sees. This decision can be based on computational efficiency, desired user behavior (clicks, etc.), or even the results of experiments (Gedikli, 2013, p. 20).

2.5 - Is search an echo chamber?

We’ve seen that depending on their design, indexing systems such as these can perform as reliable advisors or fawning apparatchiks. One of the reasons that this matters beyond one individual’s search results, particularly in news, is the Law of Group Polarization (Sunstein, 2002). Group polarization states that when a group of individuals discuss an item with a particular point of view, the act of discussion will cause that opinion to shift such that the average opinion of the group at the end of the discussion is more extreme than it was at the beginning. This property was first observed by Le Bon in his 1896 book ‘The Crowd’. Of crowds, he states that ‘one of their general characteristics was an excessive suggestibility, and we have shown to what an extent suggestions are contagious in every human agglomeration; a fact which explains the rapid turning of the sentiments of a crowd in a definite direction’ (Le Bon, 2009, p. 28). The existence of this phenomenon was demonstrated in studies by Moscovici and Doise, who showed that the consensus reached will be most extreme with less cohesive, homogeneous groups (Moscovici, Doise, & Halls, 1994, p. 73). In addition to being a long-standing phenomenon, group-polarization-like behavior appears widespread, with manifestations from the individual level (e.g. confirmation bias), to political organizations (e.g. epistemic closure [Sanchez, 2010]), to the nation-state (e.g. nationalism [Dieckhoff & Jaffrelot, 2006]). As such, it seems entirely reasonable that these behaviors should also manifest in computer-mediated communication.
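
As a minimal sketch of how this dynamic might be captured in the kind of agent-based simulation proposed in this work, the toy model below lets agents drift toward the group mean while each round of discussion nudges everyone a little further in the direction the group already leans. The update rule and parameters are purely illustrative, not a validated model of polarization.

    import random

    def discuss(opinions, rounds=10, conformity=0.3, amplification=0.05):
        """Toy group-polarization dynamic: agents drift toward the group mean,
        and each round of discussion pushes them further in the direction of the lean."""
        ops = list(opinions)
        for _ in range(rounds):
            mean = sum(ops) / len(ops)
            lean = 1.0 if mean >= 0 else -1.0
            ops = [max(-1.0, min(1.0, o + conformity * (mean - o) + amplification * lean))
                   for o in ops]
        return ops

    random.seed(1)
    group = [random.uniform(0.0, 0.6) for _ in range(20)]   # a mildly positive-leaning group
    after = discuss(group)
    print(round(sum(group) / len(group), 2), "->", round(sum(after) / len(after), 2))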

As we have seen, the feedback that results in the production of the individualized lists is effectively a social construct - a discussion - mediated somewhat anonymously through computer networks. And since these lists are based on providing the most relevant, useful result, the sense of what there is beyond the immediate results is obscured. In other words, our computer-mediated discussion includes only those who agree with what we are looking for.

The SMART system produced lists because that was all that it was capable of doing. It could be argued that the dominance of list-based presentation is more an accident of history than any well-thought-out plan or design. Regardless of origin, the ‘ten blue links’ is a restrictive way of looking at information (Bar-Ilan, Keenoy, Levene, & Yaari, 2009). There are, in fact, many other models for human interfaces to information systems. Indeed, if we look at the physical library as just one prior model of information retrieval, we see a different, physical/spatial approach to mapping information using the architecture and culture of the library, where collection layout, selection and even the position of the reference desk all influence which resources are used (Farmer, 2016).

2.6 - Designing for serendipity

Library design has accreted for thousands of years and has many affordances for both information retrieval and serendipitous discovery. The physicality of the library means that even the most focussed of searches entails at least a peripheral awareness of large amounts of ambient information. This process is known as Serendipity in the Stacks, anecdotally described as "wandering through vast corridors of deserted stacks and then happening on a passage in some long-dormant volume that unexpectedly reveals a special insight" (Carr, 2015). This improvisational aspect of accidental discovery implicit in library search is perceived by the library community as under threat from computer-mediated search (Alves, 2013).

Digital maps are also a form of information retrieval that uses many of the same IR mechanisms as list production, but because of their geographic nature, they need to place items into an identifiable context that a user can navigate through the use of query, zoom, and scroll. There is still bias in this system - maps that are cast on a Mercator projection will imply to users that Greenland is far larger than it actually is (Stockton, 2013), for example.

Consider again the query ‘Washington, DC’, but this time using geographic context:

Figure 3 - Bing Maps search for Washington DC

The same sorts of retrieval mechanisms that pull data from lists (PageRank, etc.) are used to extract and order the information presented. Here though, there is an underlying geographic context, so the ordering of the information is spatial and relative. Here also, the affordances of cartography, again developed over thousands of years, are being used (Jacob & Dahl, 2006, p. 11-53). Major roads and cities are indicated with bolder strokes and fonts. Zooming in and out presents different levels of information, from the hyperlocal to the global. Scrolling is used to swap information that is at the same scale. These direct manipulation actions are not queries so much as reframing mechanisms. Maps allow the user to create a mental model that is shared across the population (Mark, Freksa, Hirtle, Lloyd, & Tversky, 1999). Ask someone to use a map, digital or paper, to find the relative location of Baltimore and Washington, and it’s highly unlikely that they will come up with an answer that is so unique to them that they would not be able to share it with others. Further, there are opportunities to serendipitously discover new information while engaged in a geographic search. For example, one might stumble upon the town of Brookville, MD. It lies north of Washington, about halfway to Baltimore, and was the U.S. capital for one day during the War of 1812. It is visible at all scales below 5km on Google maps and at all scales below 2km on Bing Maps.

Personalization is applied to GIS systems as well. Cyclists, walkers, drivers, and mass transit users have different routes calculated for them based on a source and destination. However, they all exist in the same geospatial context. One could imagine the personalization of digital maps such that a person who consistently uses one type of transportation is simply not given visibility into the other types of transportation infrastructure[6]. Imagine the group polarization that would occur between transportation-using groups that have no insight into how the other group gets around. And yet, at least for now, that level of personalization does not exist, even though it might create a more frictionless experience. Perhaps, from a cultural perspective, we feel that we need to know and share more about the physical world that we all inhabit.

This need appears to manifest in the multiple attempts to ‘map’ the internet. Lima, in Visual Complexity - Mapping Patterns of Information, describes numerous approaches and graphical patterns used to show the relationships between nodes of information in a network (Lima, 2011, p. 58). However, these do not behave as maps, where one of the defining characteristics is to ‘stabilize and bring uniformity to an image of space’ (Jacob & Dahl, 2006, p. 32). These information graphs are dynamic and uniquely configured. They support serendipitous discovery, but not repeatable navigation.

The lack of serendipity has been an ongoing, if peripheral (Figure 4), concern in the IR/CHI community at least since 2006, when two papers came out that began to address the issue. McNee et al. discussed how overly accurate IR metrics are creating too-constrained results by relying too much on similarity without allowing for serendipity (McNee, Riedl, & Konstan, 2006). Russell Beale showed that “ambient intelligence” (a combination of data mining, clustering and visualization) might be used to create a more serendipitous web browsing experience (Beale, 2007). Since then, numerous approaches to extracting serendipity have been introduced. For example, De Gemmis et al. recently showed how a Knowledge Infusion (KI) approach might provide a mechanism for better results. They also discuss efforts at incorporating serendipity at Google, Amazon, Facebook and Ebay as early as 2011 (De Gemmis, Lops, Semeraro, & Musto, 2015). This means that mechanisms for providing serendipitous returns have existed for several years now, and yet only recently has Google added a level of serendipity to image search as a ribbon across the top of the SERP.
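
One common way the literature operationalizes this tension - a simplified formulation for illustration, not McNee et al.’s specific metric - is to weight an item’s relevance to the query by its unexpectedness relative to what the user has already seen:

    def jaccard(a, b):
        a, b = set(a), set(b)
        return len(a & b) / len(a | b) if a | b else 0.0

    def unexpectedness(item_terms, history):
        """1 minus the item's maximum similarity to anything already seen."""
        if not history:
            return 1.0
        return 1.0 - max(jaccard(item_terms, seen) for seen in history)

    def serendipity_score(item_terms, query_terms, history):
        relevance = jaccard(item_terms, query_terms)
        return relevance * unexpectedness(item_terms, history)

    history = [{"washington", "dc", "hotels"}, {"washington", "dc", "museums"}]
    print(serendipity_score({"washington", "dc", "statehood", "vote"},
                            {"washington", "dc"}, history))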

Figure 4 - papers with “serendipity” and “relevance” by year.

So how much and what kind of information to share or show ‘in the background’ of a search must be more of a design[7] than a technical problem. Because some of the information shown will be outside of the scope of ‘relevant’, what to show and how to show it has critical design aspects.

2.7 - The benefits of some friction

Consider the parable of Simon’s ant, in which he describes the path of an ant moving along a beach. Rather than a simple straight line towards the goal, the path is complex and twisting. “...why is it not straight; why does it not aim directly from its starting point to its goal? In the case of the ant (and for that matter the others) we know the answer. He has a general sense of where home lies, but he cannot foresee all the obstacles between. He must adapt his course repeatedly to the difficulties he encounters and often detour uncrossable barriers. His horizons are very close, so that he deals with each obstacle as he comes to it; he probes for ways around or over it, without much thought for future obstacles” (Simon, 1996, p. 51). The point of this parable, for Simon, is that the interaction of simple rules and a complex environment results in complex behavior. Patrick Winston describes this relationship (Winston, 2014) as

Complexity(behavior) = g(Complexity(program) | Complexity(environment))

But this relationship can be looked at in another way - given the complexity of the behavior and the program, we can infer the environment:

Complexity(environment) = g(Complexity(program) | Complexity(behavior))

If the ant’s path were straight, but the ant is capable of changing its motion to avoid obstacles, we could infer that the environment was simple, with a clear path to the goal. In general, a complex path should describe a complex environment. However, if the environment is complex, but there is a simple and straight path to the goal, then we will never know how complex the surrounding environment is. Some impediments or frictions aid in the exploration of an environment. Just as the placement of stones or food in the path of the ant will change its path, the inclusion of serendipitous items within the search process can influence the outcome and direction of the search.
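
A minimal sketch of Simon’s point, using an invented gridded ‘beach’: the ant’s rule never changes, but its path grows longer and more convoluted as obstacles are added, so the path reveals the environment rather than the ant.

    def ant_path(obstacles, start=(0, 0), goal=(9, 9), max_steps=200):
        """Simple rule: step toward the goal; if that cell is blocked, try a detour step."""
        x, y = start
        path = [(x, y)]
        for _ in range(max_steps):
            if (x, y) == goal:
                break
            dx = (goal[0] > x) - (goal[0] < x)
            dy = (goal[1] > y) - (goal[1] < y)
            candidates = [(x + dx, y + dy), (x + dx, y), (x, y + dy),
                          (x + dy, y - dx), (x - dy, y + dx)]   # sideways detours
            for nx, ny in candidates:
                if (nx, ny) not in obstacles and (nx, ny) != (x, y):
                    x, y = nx, ny
                    break
            path.append((x, y))
        return path

    flat_beach = set()
    rocky_beach = {(i, 4) for i in range(1, 9)} | {(5, j) for j in range(5, 9)}
    for name, beach in (("flat", flat_beach), ("rocky", rocky_beach)):
        print(name, "beach:", len(ant_path(beach)) - 1, "moves")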

Like the ant’s travels, search is an ‘everyday’ activity, and creating explorational affordances such as a more frictional design should reflect an awareness of this (Korn & Voida, 2015). One part of this thesis will examine the kind of design that might provide a greater level of shared information that could inform and contextualize a search without interrupting the flow of the activity, while still allowing for ‘Serendipity in the Stacks’ moments. The focus of this exploration will be on news items, but the approach may be generalizable to other kinds of SERPs.

2.8 - Tracking the paths of frictional search

The other side of such a design process is the impact that it can have on the way different types of users interact with information. Grabe and Myrick discuss differences based on demographics and involvement (Grabe & Myrick, 2016). Munson and Resnick show that news readers can belong to statistically-distinguishable groups consisting of explorers, confirmers, and avoiders (Munson & Resnick, 2010). Winter et al. approach these groups from a slightly different context: they primed users to search the news for particular information with respect to determining accuracy, defending a point, or getting an impression of the story’s content. To track this, they used a custom interface that tracked a user’s marking of news stories for later reading[8] (Winter, Metzger, & Flanagin, 2016).

My intuition is that most non-geographic SERPs currently reflect and amplify the confirmer pattern as a result of design decisions made by the system designers, though not necessarily for the purpose of supporting confirmers over other types of searchers. These same system designers could configure the presentation of results to support and amplify the explorer pattern[9], which would implicitly lead to broader information engagement.

Korn and Voida suggest that designs for civic engagement can be categorized into Deliberation, Disruption, Situated Participation, and Friction. Of these design patterns, they find that frictional designs are underrepresented. As with the beach for Simon’s ant, frictional designs ‘get in the way’ but do not ‘take away agency’ (Korn & Voida, 2015). While these types of designs may be an inconvenience to confirmer and avoider behavior patterns, they may be helpful to explorer behaviors, since they provide opportunities for exploration. By designing to accommodate all of these behaviors, we could then look for different patterns of usage among users that reflect them. If such patterns are sufficiently unique and recognisable, then we can look to see whether certain behaviors are more or less associated with information bubbles and/or ‘antibubbles’, where an antibubble is a set of behaviors that looks to a larger set of information that is related to a topic but is not consistent with only one point of view (Zhou & Sornette, 2004).
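
As a purely hypothetical illustration of what detecting these patterns programmatically might look like, the sketch below classifies a session from two invented interaction features - breadth of sources and willingness to click on opposing views. The feature names and thresholds are placeholders, not findings from this research.

    def classify_session(session):
        """Toy rule-based classifier over invented session features."""
        breadth = len(set(session["sources_clicked"]))    # distinct sources visited
        challenge = session["opposing_view_clicks"] / max(1, session["total_clicks"])
        if session["total_clicks"] == 0:
            return "avoider"
        if breadth >= 4 and challenge >= 0.25:
            return "explorer"
        if breadth <= 2 and challenge <= 0.05:
            return "confirmer"
        return "mixed"

    print(classify_session({"sources_clicked": ["a", "b", "c", "d", "e"],
                            "opposing_view_clicks": 3, "total_clicks": 10}))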

Why would recognising these behaviors be important?

2.9 - Bubbles, antibubbles  and group polarization

Information bubbles, powered by group polarization, drift. Over time, they become less and less connected to any kind of external reality because that reality must at some point exist outside of the range of information that the members of the bubble can accept. The bubble seals off and continues to drift until it encounters something it cannot ignore or deny (Sunstein, 2002). Then it ‘crashes’, potentially into smaller/different bubbles (Brunnermeier & Oehmke, 2012). Based on this premise, if an antibubble drifts at all, it should drift with respect to diverse sources of information that are in general agreement or consensus. In short, antibubbles should be more likely to reflect a more accurate state of ‘reality’ across larger domains than bubbles. The affordances of journalism should help to make these patterns more visible, and the presence of a large historical data set allows for examination over prolonged timeframes and multiple locations.

A critical assumption here is that by looking at these different patterns of interactions, we should be able to assign some level of relative credibility/trustworthiness to a source. Sources that are strongly correlated with bubbles should be less trustworthy. Sources that are strongly correlated with antibubbles should be more so. This implies that there may be identifiable ‘trustworthy readers’ or ‘trustworthy sessions’.
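
A hypothetical scoring sketch of this assumption: if sessions could be labeled as bubble-like or antibubble-like, a source’s relative trustworthiness might be the normalized difference between its antibubble and bubble appearances. The labels, sources and counts below are invented for illustration.

    from collections import Counter

    def source_scores(sessions):
        """Score each source from -1 (bubble-correlated) to +1 (antibubble-correlated)."""
        bubble, anti = Counter(), Counter()
        for label, sources in sessions:            # label is "bubble" or "antibubble"
            for s in sources:
                (bubble if label == "bubble" else anti)[s] += 1
        return {s: (anti[s] - bubble[s]) / (anti[s] + bubble[s])
                for s in set(bubble) | set(anti)}

    sessions = [("bubble", ["source_a", "source_b"]),
                ("antibubble", ["wire_service", "source_b"]),
                ("antibubble", ["wire_service", "source_c"])]
    print(source_scores(sessions))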

2.10 - Breaking group polarization with virtuous cycles

Particularly in a news context, where anonymity may be important, such trustworthy patterns could then be used to evaluate and rank anonymous writers, and a virtuous cycle could be built[10]. I expect this fundamental interplay between search design and searcher response with respect to news to be the foundational base of my research.

If these effects can be found and incorporated into effective designs, then this work could then contribute to an overall approach to designs that enhance the generation of trustworthy information[11] as well as data trustworthiness evaluation, such as Lukoianova and Rubin's Veracity Roadmap (Lukoianova & Rubin, 2014).

3 - Literature Review

I approached the literature review for this proposal similarly to a qualitative study, using Google Scholar searches for initial orientation in the information space and then snowball sampling as reading the literature brought new concepts and approaches to light. New documents were found based on keyword, citation, or author. Documents were collected and organized using AtlasTi. I used an open coding approach, since the boundaries of the corpus were undefined at the beginning of this process. Once I reached what I felt was saturation (also indicated by no additional codes being created), I began to analyze the relationships using term-based centrality, based on codes shared between documents. This process is similar to the topic-extraction technique developed by Kurland and Lee, with the difference that codes/topics are manually developed (Kurland & Lee, 2010). The process of developing this technique is discussed in the next chapter (The Lit Review that Ate Itself).
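
A minimal sketch of this code-sharing centrality, with placeholder documents and codes standing in for the actual AtlasTi coding: documents are nodes, edge weights count shared codes, and a document’s centrality is the sum of its edge weights.

    from itertools import combinations
    from collections import defaultdict

    doc_codes = {                       # placeholder documents and codes
        "paper_a": {"polarization", "trust", "design"},
        "paper_b": {"trust", "serendipity"},
        "paper_c": {"polarization", "serendipity", "design"},
    }

    weight = defaultdict(int)           # edge weight = number of shared codes
    for (d1, c1), (d2, c2) in combinations(doc_codes.items(), 2):
        shared = len(c1 & c2)
        if shared:
            weight[frozenset((d1, d2))] = shared

    centrality = {d: sum(w for pair, w in weight.items() if d in pair)
                  for d in doc_codes}
    print(sorted(centrality.items(), key=lambda kv: -kv[1]))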

3.1 - Overall themes this thesis aims to support

The foundation of this thesis is that

  1. Interface design can be used to encourage a wider perspective on news items.
  2. With the proper interface, information browsing behavior on the part of news consumers can be used to infer a trustworthiness value for a given news story.

For this to be possible, an IR system needs to provide news consumers with an information environment that is inviting and worth exploring. In other words, the information provided by the system needs to be relevant to the broader information context(s) of the users, while providing attractive and compelling affordances for the user to access the stories that best meet their information needs.

3.2 - Conceptual grounding

It is a truism that the world’s population is becoming more and more connected through information networks, backed by near-infinite storage. This appears to be affecting the way that we handle knowledge, and Siemens examines the ramifications of these trends when laying out his learning theory of Connectivism (Siemens, 2005).

With connectivism, Siemens states that “learning and knowledge has become a process of connecting specialized nodes or information sources”. It follows then that the “ability to see connections between fields, ideas and concepts is a core skill”. For the purposes of this work, there are two interesting implications that arise from these statements. The first is that the tools that we use to find the links that we use to build our knowledge greatly affect the knowledge we build. The second is that the network that we build as part of our learning and knowledge creation should both be uniquely identifiable and have elements that are common across multiple individuals. For example, consider individuals who follow a particular sports team. The other aspects of their knowledge networks may be completely different (authors, directors,  and politicians, for example), but it seems reasonable to look for common and adjacent information nodes when it comes to their favorite team and the sport.

3.3 - Relationship to design

This process of extracting smaller, more focussed information networks from larger ones proceeds through several steps that all have implications for design. A good example of this is the Wikipedia. At this point in time, many searches on popular search engines (Google, Bing, DuckDuckGo, etc.) will scour their ranked site lists (our enclosing set of indexed pages) for a term match, which is generally presented to the user as a list. Often high on this list is a page from the Wikipedia, which has its own set of design constraints (high-level navigation on the left, main article, tabs across the top for editing, discussion and history). The article in turn contains links to other pages in the Wikipedia as well as external sources. As the amount of interrelated information that makes up the particular network of interest shrinks, the interface can become less general. At the other extreme from the general search screen are ‘twitch’ videogames, which require a small set of physical controls that are easily learned and mastered so that the user can search out and defeat a sequence of progressively more difficult opponents requiring progressively more sophistication in the use of the controls.

This largest, ‘containing network’ (the searchable internet, or in our case searchable news providers) consists of information that is accessible using low-friction methods - usually some kind of query that is entered using a simple, generalized interface such as the search box. Because it is general, the task of the search engine is straightforward - find the highest-ranking page(s) that match the query. This matching based on relevance means that the search engine does not have to provide any relationship between the pages other than their rank. For example, adjacency is part of the PageRank calculation, but that information is not provided to the user in the ‘ten blue links’ of the typical result.
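
For reference, the textbook PageRank calculation - power iteration over the link graph, not Google’s production ranking - makes the point concrete: adjacency drives the scores, but only the resulting order ever reaches the user.

    def pagerank(links, damping=0.85, iterations=50):
        """Power-iteration PageRank over a small {page: [outlinks]} graph."""
        pages = list(links)
        rank = {p: 1.0 / len(pages) for p in pages}
        for _ in range(iterations):
            new = {p: (1.0 - damping) / len(pages) for p in pages}
            for p, outlinks in links.items():
                targets = outlinks or pages        # dangling page: spread rank evenly
                for q in targets:
                    new[q] += damping * rank[p] / len(targets)
            rank = new
        return rank

    links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
    print(sorted(pagerank(links).items(), key=lambda kv: -kv[1]))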

From a connectivist perspective, the information in a search result set is a human construct resting on a tower of interlinked human constructs - all digital information is directly produced or mediated through designed devices. Though this is an immense source of information, it is by no means all information - one can’t yet query the natural world, for example. A query to that set is mediated through information retrieval systems that must filter and rank the information in a way that the system designer (Momeni, Cardie, & Diakopoulos, 2015) believes provides utility to the user. A user can then take that visible (accessible?) part of the network and incorporate the personally relevant aspect(s) of it into their knowledge representation. This representation is dynamic - “Choosing what to learn and the meaning of incoming information is seen through the lens of a shifting reality. While there is a right answer now, it may be wrong tomorrow due to alterations in the information climate affecting the decision” (Siemens, 2005).

3.4 - Relevance and Pertinence

Saracevic describes this difference between information returned from a query and the information used to extend an individual’s knowledge network as the difference between relevance and pertinence (Saracevic, 1975, p. 332):

“Relevance is the property which assigns certain members of a file (e.g., documents) to the question; pertinence is the property which assigns them to the information need. Subsequently, (as known from experience) some relevant answers are also pertinent; but there could be relevant answers that are not pertinent and pertinent answers that are not relevant. It has often been argued that, from the user's point of view, desirable answers are pertinent answers; but, in reality, an IR system can only provide relevant answers.”

Adding pertinence, or the answers to questions that haven’t been asked yet, has found widespread expression in Query Autocomplete (QAC), initially developed by Amazon in 2000 (Ortega, Avery, & Frederick, 2000). Based on aggregations of prior searches and other vaguely disclosed algorithms, the QAC provides a level of insight into what other people have searched for that begins with the same text. Below are examples from Google, Bing, DuckDuckGo and Yahoo[12]:


Figure 5 - Query Autocomplete examples

As the string lengthens, the number of options shrinks, becoming more and more relevant as the query becomes more specific. Often, a user can simply select one of the suggested queries at some point. I personally find this effective for conversions - metric to standard, HDMI to whatever.

To a degree, this addresses the issue raised by Saracevic that the user be exposed to potentially irrelevant information that may wind up being pertinent. Users with an implicit information need, when presented with a search engine, have to interact with the engine’s set of relevant responses to produce a pertinent result. However, the mechanism of autocomplete - where the search is anchored in the initial text - reflects the bias of its shopping roots. It’s very effective to be able to type “SONY…” and get a list of ever narrowing items. This way of searching reflects the physical manifestation of the store, where one may in fact go to the bread aisle of a supermarket and progressively narrow a search until the actual goal is reached, or a serendipitous item is discovered and substituted.
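
A minimal sketch of that anchoring, using an invented query log: candidate completions must share the typed prefix and are ordered only by popularity, so pertinent queries that do not share the prefix can never surface.

    from collections import Counter

    query_log = Counter({                     # invented prior-search counts
        "washington dc hotels": 90, "washington dc weather": 70,
        "washington dc statehood": 15, "washington post": 120,
    })

    def autocomplete(prefix, log, k=3):
        matches = [(q, n) for q, n in log.items() if q.startswith(prefix)]
        return [q for q, _ in sorted(matches, key=lambda qn: -qn[1])[:k]]

    print(autocomplete("washington dc", query_log))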

Since each user often has unique information needs, it should follow that the pattern of extracting pertinence from relevance is also unique and not limited to shopping metaphors, for example. As discussed in the motivations section, the design of the user interface (the SERP or its equivalent) should be able to support the discovery of a broadly pertinent answer. Otherwise the pertinent but irrelevant information is hidden from the user, which can result in a sparser, less well developed personal knowledge network.

3.5 - Information needs of news consumers

This is as true in finding news as it is in other forms of IR. According to a study by the American Press Institute about the trustworthiness factors in news, survey respondents ranked Accuracy, Completeness, Transparency, Balance and Presentation as the five most important factors among twelve when asked the question “Thinking about the sources you consider trustworthy, how important is each of the following factors?” (Media Insight Project, 2016).  The affordances of traditional journalism in the context of a news platform attempt to address all of these issues, though perhaps with varying levels of success and commitment. Criteria for addressing these elements are often explicitly addressed in items such as ethics guidelines (American Society of News Editors, 2016), and style and usage manuals (Siegal & Connolly, 1999).

Because a news platform is an integrated presentation of information under editorial control, a consistent approach to Accuracy, Completeness, Transparency, Balance and Presentation can generally be achieved. But what happens when articles are accessed by search? News search engines balance a variety of factors when retrieving news. The content needs to match, the source should be recognizable as news[13], and timeliness of the information is emphasized. These mechanisms can work to ensure that the results page accurately reflects the stated need of the user, and that the information is presented in a legible way, often as a list of headlines, snippets, associated pictures, source and timeliness information. The other, more editorial factors are more difficult to address in such a context. Consider a search in Google News for ‘Vince Foster’. The suicide of Mr. Foster, a friend and advisor to the Clintons, is a source of conspiracy theories among the right wing, where it finds its way into websites such as Infowars, Daily Mail, and Fox News. These allegations have long been disproven, yet the search brings up several links with stories promoting these discredited accusations along with stories that in turn again discredit the accusations. The information is not flagged as credible or not, so the user has to evaluate it on their own. In this way, the factors of Completeness, Transparency and Balance are confused by the presence of false information, which an editor would normally exclude. In a more recent example, the removal of the human editorial team from Facebook’s ‘trending’ feature has resulted in the promotion of patently false rumours, such as Megyn Kelly being fired from Fox News because of her support for Hillary Clinton (Ohlheiser, 2016). In these and similar situations, where ranking algorithms are a primary interface between the user and the information, the delivery of high-quality, trustworthy information cannot be relied upon.

3.6 - Issues with using search for news

Since there are bad actors in the real world, an issue that needs to be considered is the role of misinformation, particularly with respect to timely information like news (Lewandowsky, Ecker, Seifert, Schwarz, & Cook, 2012). Connectivism states that “there is now an emphasis on learning how to find information as opposed to knowing the information (since information obsolescence happens more rapidly, the value of the information is lower than the knowledge of how to find current knowledge)”. This sets up a tension between traditional mechanisms to produce trustworthy articles such as journalism, and more ad hoc systems such as blogs and feeds that do not have the fact-checking mechanisms of traditional media.

There is a high cost for fact-checking stories and vetting sources, but little immediate payoff. Consistently reporting on stories that turn out to be true builds credibility and a long-term audience, but it inhibits good storytelling. Good stories are always attractive to the editorial staff of a news organization, and since the primary responsibility for fact checking is on the reporter, the tension between running a compelling read and spending additional resources on fact checking is an ongoing struggle for the editors. One need only look to feature pieces in major news organizations to find examples where a lack of fact checking led to fiction or hearsay being published as vetted news. Examples include the Washington Post’s “Jimmy’s World”, a fabricated story about a child addict (Maraniss, 1981); Rolling Stone’s insufficiently vetted ‘A Rape on Campus’ story (Rolling Stone, 2014); and the New York Times, where staff reporter Jayson Blair committed multiple acts of journalistic fraud (Barry et al., 2003). The effects of these disclosures, on the other hand, are significant for these organizations, resulting in a loss of prestige and credibility when reporting the next major story.

As imperfect as it is, fact checking by media channels is affordable because of paid subscriptions and advertising. Aggregating stories together as a news organization allows for important but less popular stories to be supported by more popular ones. But this requires that the user be peripherally aware of what stories are available on the platform. When information was physical, it could be bundled. You may buy a paper for one item - the headline, for example - but wind up serendipitously discovering other items. Once in the ‘newspaper pattern’, it’s easier to stay there and explore using the newspaper affordances.

Search changes this dynamic. It lowers the friction of finding information across multiple relevant sources rather than within a single source that may also contain irrelevant yet pertinent information. Blogs and other free sources make this even more difficult for the consumer, since what appears credible may not be, but may be confused with an actual information source nonetheless (Haciyakupoglu & Zhang, 2015). Also, a free, easily produced and emotionally appealing story may have higher value for a consumer than a (non-free), well-researched story that disputes the reader’s beliefs. If the friction of finding less-vetted free alternatives is low, then the economic value of producing the accurate, well-researched story becomes lower. This leads to a vicious cycle where traditional high-cost journalism is forced out, leaving the user to determine what to believe among a sea of relevant results. There are ways of fact-checking available to the individual, such as Snopes.com, but this increases the effort of integrating the information, along with concomitant issues such as confirmation bias (Eil & Rao, 2011).

3.7 - How news is changing in a Connectivist world

At the limit, this implies that economic forces could reconstruct news into a system without institutions, where the primary production of stories is user-generated by authors with varying levels of identity, ranging from completely anonymous to famous. To a degree, this is happening already. Storyful was founded in 2010 to incorporate social media into news, while on the other end of the spectrum,  Automated Insights is developing technologies to automatically generate news stories from financial reports and sports scores.  

Generally, the highest quality news occurs when news organizations have the opportunity and time to set up a bureau and have reporters situated for prolonged periods, with editors and fact-checking desks to back them up. More recently, the use of ‘stringers’, local freelance reporters with an ongoing relationship with the news media, has become a more cost-effective alternative. This type of journalism may be the best source for fact-checked information - except when the journalists aren’t at the scene. Regardless of the source, the issues with credibility and trustworthiness are the same. Crisis informatics, the study of unvetted news from crisis zones, as developed by Palen, Meier, van der Windt and others, looks at the extraction and integration of citizen-based reporting (Haciyakupoglu & Zhang, 2015) where traditional media are unable to provide coverage (Meier, 2008).

3.8 - Previous work on fact-checking and trustworthiness

Fact checking as a traditional journalistic activity is a relatively formalized process that depends primarily on the journalist with potential backup by a ‘fact-desk’ or copy editor. The elements of good fact checking are spelled out in the New York Times Integrity Statement:

“Concrete facts – distances, addresses, phone numbers, people’s titles – must be verified by the writer with standard references like telephone books, city or legislative directories and official Web sites. More obscure checks may be referred to the research desk. If deadline pressure requires skipping a check, the editors should be alerted with a flag like "desk, please verify," but ideally the writer should double back for the check after filing; usually the desk can accommodate a last-minute repair. It is especially important that writers verify the spelling of names, by asking . A person who sees his or her own name misspelled in The Times is likely to mistrust whatever else we print” (New York Times, 2003).

Here, credibility is obtained through extensive fact checking, particularly with respect to the smaller personal items. Flubbing an easily checked fact like a name can bring the rest of an article’s credibility into doubt.

The Times uses an explicit, three-layer fact-checking approach that starts with the journalist, then the editor, then the fact-check desk. Current research into fact checking takes a different, more implicit approach. There are roughly three broad categories - Credibility-based, Computational, and Crowdsourced.

Credibility-based approaches look at the cues that people provide with respect to the trustworthiness of their information. This is the oldest field of study, and precedes computational approaches by thousands of years. Indeed, there are many mechanisms that can be applied to either induce or detect trustworthy (or at least believed) information in humans. An overview of these techniques is provided in Eliciting Information and Detecting Lies in Intelligence Interviewing: An Overview Of Recent Research (Vrij & Granhag, 2014). One finding that is common across much recent research is that fabrication requires a higher cognitive load than truth-telling. As such, distinguishing truthful information from lies consists of setting up situations that take advantage of this fact. One approach is asking unanticipated questions that are outside the domain of the acts in question and require an on-the-fly fabrication of an answer rather than reliance on pre-considered answers: “Liars, compared with truth tellers, gave significantly more detail to the expected questions and significantly less detail to the unexpected questions”. Other methods add to the mental load of the interview, such as asking the interviewee to describe a trip in reverse order or to draw a picture of a location they claimed to have visited. Tentative research also points to evidence that liars are more engaged by 'why' questions that establish motivations, while truth tellers are more concerned with questions about how a goal had to be pursued.

Credibility analysis can also be performed computationally; such methods broadly focus on quantifiable credibility cues. For example, in Adaptive Faceted Ranking for Social Media Comments (Momeni, Braendle, & Adar, 2015), NLP is used to group YouTube comments by topic relatedness and subjectivity/objectivity. Comments are re-ranked so that those containing named entities mentioned in the original description and objective information such as timestamps are ranked highest in the comment list. Test users then evaluated the top 30 comments ranked by adaptive filtering against the default reverse-chronological and crowd-based ranking algorithms. A significant number of users found the new ranking scheme more relevant and interesting.

Zafar et al., in On the Wisdom of Experts vs. Crowds - Discovering Trustworthy Topical News in Microblogs, implement a computational version of 'the appeal to authority'[14]. In other words, they define a set of 'trustworthy experts' and then use that information to evaluate the inferred trustworthiness of tweets. Trustworthy 'experts' are identified by their occurrence in Twitter Lists, processed with the TrustRank algorithm to find the most central sources by topic. Results indicate that this is an effective strategy: human evaluation of quality and relevance, when compared with the default 'Twitter-top' search, showed a distinct preference for expert-sourced stories. One of the most interesting points that the paper uncovers is that the indirect actions of Twitter consumers (the creation of lists) allow a more efficient inference of the news by producing a ranked matrix of experts and topics (Zafar et al., 2016).

Computational approaches encompass methods that analyze data that is already present, without additional input from users. This is particularly difficult, since it requires a computational system to have a mechanism for calculating what is true. The state of this research is covered extensively in Veracity Roadmap: Is Big Data Objective, Truthful and Credible?, which proposes that data sources and content should be evaluated on a scale from Objective, Truthful and Credible (OTC) to Subjective, Deceptive and Implausible (SDI). It then describes a set of potential approaches centered around natural language processing (NLP) to address these issues in a scalable way (Lukoianova & Rubin, 2014). As an alternative to NLP-based approaches, the structure of information can also be used for evaluation. In Computational Fact Checking from Knowledge Networks, knowledge graphs like those created from Wikipedia are used to evaluate veracity. In this approach, short paths between sparsely linked nodes are strongly associated with independently verifiable facts, while longer chains, particularly those passing through highly linked nodes, are much less likely to be factual (Ciampaglia et al., 2015).

Crowdsourced. Crowdsourcing focuses on extracting high-quality content from participants in some kind of related action. These participants are different from Le Bon's Crowd, which might more rightly be termed a mob (Le Bon, 2009). In a mob, the rules of group psychology prevail, which entail issues such as loss of individual identity. Conversely, the concept of crowdsourcing as popularized by Surowiecki in his 2004 book, The Wisdom of Crowds, concerns aggregating the decisions of large numbers of individuals (Surowiecki, 2004). For this research, we will focus on research and applications of crowdsourcing that provide mechanisms to enhance the sharing of information from individuals sharing a particular context. Such contexts can range from reports from conflict zones to comments about a YouTube video.

In the workshop report Sensing and Shaping Emerging Conflicts, the focus is on 'communicating ground truths that facilitate change'. This means overcoming numerous issues ranging from access to technology to trust between the sources and users of the information. A good example of an incomplete loop was a project in the Eastern Congo called Voix des Kivus. In it, individuals were selected and trained and information was effectively gathered, but no humanitarian effort was ever mounted in response to the gathered data (Robertson & Olson, 2013). A more successful example is Ushahidi, which started as the efforts of a Kenyan blogger named Ory Okolloh and has grown into a deployable system for mapping human rights abuses, illegal pollution and sexual violence (Ushahidi, 2008).

Crowdsourcing contributes to both depth and accuracy in reporting. Traditional news media do not have the resources to be everywhere at once, and at more remote, local levels accuracy can suffer. In On the Accuracy of Media-based Conflict Event Data, data from traditional media in Afghanistan is correlated with US military SIGACTS (SIGnificant ACTivities); more than 50 km from a major population center, the accuracy of reporting declines dramatically (Weidmann, 2014). Microblogging by citizens can fill this gap, though fact checking and trust issues become critical. Seeking the Trustworthy Tweet: Can Microblogged Data Fit the Information Needs of Disaster Response and Humanitarian Relief Organizations discusses the issues blocking the use of microblogging as an actionable resource for humanitarian organizations. Primary among these are the costs of '...committing to the mobilization of valuable and time sensitive relief supplies and personnel, based on what may turn out be illegitimate claims...'. Potential solutions include bounded microblogging, which is limited to trusted sources; contextual approaches, which integrate microblogging into a more generalized contextual frame rather than providing directly actionable information; and computational solutions, which involve machine learning approaches to promoting trustworthy information (Tapia et al., 2011).

Lastly, game-theoretic and economic models have been explored as ways of increasing the quality of user-generated content. The issue with low-quality content, as Ghosh and McAfee present in Incentivizing High-quality User-Generated Content, is that the barriers to contributing are low, resulting in a 'free-entry Nash Equilibrium'. If the primary motivator is exposure, then flooding a site with low-quality contributions is an effective and rational choice. They propose a model that links exposure to content quality independent of the identity of the user. This results in an equilibrium that can be shown to alter the ratio of low- to high-quality content[15] (Ghosh & McAfee, 2011).

3.9 - The state of User-Generated-Content in News

News reporting from, for example, local citizens in a crisis zone using social media requires automated, scalable mechanisms to filter and process the user-generated content (UGC). Ideally, these mechanisms should be able to provide context, ground truth and filtering of streams like social network feeds.

In the previous section, we saw that current research into fact checking and trustworthiness ranges from interviewing techniques to NLP-extractable credibility cues to economic and game-theoretic approaches to encouraging high-quality content. Looking to capitalize on existing information streams, Momeni, Cardie and Diakopoulos surveyed and classified the current state of computationally evaluating UGC directly.

They found that the value of UGC can be determined in aggregate via crowdsourcing, by an individual user with respect to particular information-seeking needs, or by the platform designer. They then describe a taxonomy that identifies a set of influential features that can be used for evaluating UGC. Some of these features lend themselves well to a machine-centered approach; others are better suited to a human-centered method. The system designer is responsible for determining the mix of these methods and features used in the production system. They find that hybrid systems, which combine aspects of unsupervised machine learning (generally due to scale issues) with interface design that leverages the personal interests of the end user, can provide more accurate and complete organization of UGC. With respect to transparency, the use of humans to evaluate information provides provenance and justification for a particular answer (Momeni, Cardie, & Diakopoulos, 2015). An excellent example of this running at moderate scales is the Consider.it system, which implements a web-based platform for considering the pros and cons of difficult political questions (Kriplean, Morgan, Freelon, Borning, & Bennett, 2012).

3.10 - The limits of UGC and user feedback

Cha et al. describe UGC as a form of deliberately published information (text, pictures, videos, etc.) by non-professionals (Cha, Kwak, Rodriguez, Ahn, & Moon, 2007). Cha's study and subsequent work such as Cheng et al. have characterized the video and comment sections of YouTube, finding that the production of UGC far exceeds professional content generation and that it follows a power-law model with respect to user participation and content production. Simply put, a power-law model would predict that 10% of the users provide 90% of the content, 10% of those users provide 90% of that content, and so on (Cheng, Dale, & Liu, 2008).

What this means is that only a relative few users contribute enough content to support extensive credibility analysis. One tweet does not necessarily indicate a crisis, but with more messages comes increased credibility. As we've seen above, the vast majority of users do not contribute meaningful amounts of UGC. They do, however, engage in many other forms of data generation, such as interactions with search engines (Weber & Castillo, 2010). Although not without issues, Google's analysis of influenza-related searches has provided a remarkably credible alternative to CDC-collated reports (Cook, Conrad, Fowlkes, & Mohebbi, 2011).

Added to text interaction is the user's clickstream and cursor motion, among other trackable items. For most users, this collection of textual and physical interaction with computer interfaces makes up the majority of the content they produce. Can this be used instead of 'traditional' UGC for fact-checking purposes? Two parts of this question need to be examined: 1) the explicit interaction of the user with the system, where queries are issued and results are evaluated, and 2) the implicit interaction of the user with the interface that might inform the system of the value of the information presented.

Relevance feedback is the process by which a user interacts textually with a search system to find desired information. Manning, Raghavan and Schütze succinctly describe the process as an iterative loop: the user issues a query, marks some of the returned documents as relevant or non-relevant, and the system computes a revised query representation and returns an updated result set (Manning, Raghavan, & Schütze, 2008, p. 178).

The issue here is that although effective in refining a result, getting users to rate feedback is difficult. Are there methods that exploit actions that all users take?
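To make the mechanics concrete, the following Java sketch implements the standard Rocchio query update that underlies this style of explicit relevance feedback (new query = a*query + b*centroid of relevant documents - c*centroid of non-relevant documents). The vectors and coefficients are illustrative assumptions, not values taken from the cited text.

    /** Sketch of explicit relevance feedback using the standard Rocchio update.
     *  The toy vectors and the coefficients are illustrative assumptions. */
    public class RocchioSketch {
        public static void main(String[] args) {
            double[] query = {1.0, 0.0, 0.5};               // toy query vector over 3 terms
            double[][] relevant = {{0.9, 0.1, 0.6}, {0.8, 0.0, 0.7}};
            double[][] nonRelevant = {{0.1, 0.9, 0.2}};
            double a = 1.0, b = 0.75, c = 0.15;             // commonly cited coefficient choices

            double[] updated = new double[query.length];
            for (int i = 0; i < query.length; i++) {
                updated[i] = a * query[i]
                        + b * centroid(relevant, i)
                        - c * centroid(nonRelevant, i);
                updated[i] = Math.max(0, updated[i]);       // negative weights are clipped
            }
            System.out.println(java.util.Arrays.toString(updated));
        }

        // Mean value of one dimension across a set of document vectors.
        static double centroid(double[][] docs, int dim) {
            double s = 0;
            for (double[] d : docs) s += d[dim];
            return s / docs.length;
        }
    }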

3.11 - Implicit Feedback

Techniques such as pseudo relevance feedback and indirect relevance feedback are used where clickstream and query history are mined to produce the most relevant results. This has turned out to be quite effective in improving the precision of results with respect to a particular user. Might this also be possible with classes of users? Are there group-level browsing and searching behaviors that could be used to vet UGC? In Presenting Diverse Political Opinions: How and How Much, Munson and Resnick describe how they were able to isolate three information navigation behaviors with respect to political news among a population of Mechanical Turk workers. These patterns were Diversity-Seeking, Support-Seeking and Challenge-Averse. The populations could be distinguished by the level of satisfaction that they indicated when presented with a list of articles that had varying degrees of disagreeable political content. The users were presented with a list of stories in a news-aggregator format and asked, using a 5-point Likert scale, whether they felt that their viewpoints were represented and whether the list appeared biased. Though the Support-Seeking and Challenge-Averse populations were more difficult to disentangle, the difference between those two behaviors and the Diversity-Seeking pattern was straightforward to distinguish (Munson & Resnick, 2010). We will discuss how the three elements of Diversity-Seeking, Confirming and Avoiding may map to the biologically-based behaviors of exploring and exploiting in the next section.

3.12 - Exploring, Confirming, and Avoiding vs. Exploring and Exploiting

The fact that diversity-seeking or 'exploring' behavior is more distinctive than confirming/avoiding correlates strongly with research into explore/exploit patterns of behavior. This is a significant field of study that encompasses biology, game theory and neuroscience.

Confirming, avoiding and exploring behaviors embody different ways of searching, and apply to physical and informational systems equally. A route from one location to another can be found by exploring alternate routes, by using familiar roads, avoiding toll roads, or combinations of the three. In each case there are costs and rewards for each choice. More generally, this is known as the explore-exploit problem, and is discussed in detail by Cohen et al. In their review, they describe both theoretical and biological framings for the tradeoff between exploring a system to find increased reward or sticking with and exploiting a current choice:

...the distinction between expected and unexpected forms of uncertainty may be an important element in choosing between exploitation versus exploration. As long as prediction errors can be accounted for in terms of expected uncertainty—that is the amount that we expect a given outcome to vary—then all other things being equal (e.g. ignoring potential non-stationarities in the environment), we should persist in our current behaviour (exploit). However, if errors in prediction begin to exceed the degree expected—i.e. unexpected uncertainty mounts—then we should revise our strategy and consider alternatives (explore) (Cohen, McClure, & Yu, 2007).

From a theoretical perspective, the canonical model is the 'multi-armed bandit' model, devised by John Gittins. In it he proposes a set of 'one-armed bandit' gambling machines that a user can choose from. Each has its own probability of payout and a reservoir of cash. The user can either explore by testing some number of machines or stick with the current machine (exploit). It turns out that under special conditions (two bandits), an optimal answer can be calculated.
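To make the tradeoff concrete, here is a minimal Java sketch of an epsilon-greedy agent on a two-armed bandit. This is only an illustration of the explore/exploit dilemma, not the Gittins index calculation; the payout probabilities and the value of epsilon are arbitrary assumptions.

    import java.util.Random;

    /** Minimal epsilon-greedy simulation of a two-armed bandit. Illustrative only. */
    public class BanditSketch {
        public static void main(String[] args) {
            double[] payoutProb = {0.3, 0.6};   // hidden probability of reward per machine
            double epsilon = 0.1;               // fraction of pulls spent exploring
            double[] estimate = new double[2];  // running estimate of each machine's value
            int[] pulls = new int[2];
            Random rng = new Random(42);
            double totalReward = 0;

            for (int t = 0; t < 10_000; t++) {
                // Explore with probability epsilon, otherwise exploit the best current estimate.
                int arm = (rng.nextDouble() < epsilon)
                        ? rng.nextInt(2)
                        : (estimate[0] >= estimate[1] ? 0 : 1);
                double reward = rng.nextDouble() < payoutProb[arm] ? 1.0 : 0.0;
                pulls[arm]++;
                estimate[arm] += (reward - estimate[arm]) / pulls[arm]; // incremental mean
                totalReward += reward;
            }
            System.out.printf("pulls: [%d, %d], total reward: %.0f%n",
                    pulls[0], pulls[1], totalReward);
        }
    }

Even in this crude form, almost all pulls end up on the better machine while a small, fixed budget of exploration continues to test the alternative, which is the essence of the tradeoff described above.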

Biological systems seem to have evolved to address this tradeoff. In studies of birds, where the bandits were represented by automated feeders, birds were able to develop near-optimal strategies for balancing exploring multiple feeders against exploiting the current one. More recently, Badre, Doll, Long and Frank[16] have mapped areas in the human brain that are activated during explore or exploit behaviors (Badre, Doll, Long, & Frank, 2012), while Wilson et al. were able to develop simple games that manipulated the ratio of time spent in the explore or exploit behavior pattern (Wilson, Geana, White, Ludvig, & Cohen, 2014).

For our purposes, confirm and avoid are included in the exploit pattern. One can imagine this being applied to the multi-armed bandit problem where 'bad luck' machines are avoided and 'good luck' machines are embraced. There does not appear to be game-theoretic or biological work addressing this level of granularity within the exploit pattern. That said, a more in-depth analysis that unpacks how confirming and disconfirming (avoided) news is incorporated by individuals was performed by Eil and Rao. They show that “subjects incorporated favorable news into their existing beliefs in a fundamentally different manner than unfavorable news. In response to favorable news, subjects tended to respect signal strength ... albeit with an optimistic bias. In contrast, subjects discounted or ignored signal strength in processing unfavorable news”. This implies that rather than an overall confirmation bias, the strongest (and therefore potentially detectable) effect is a 'disconfirmation bias'. They attribute this to “the ego utility consequences of "being right." … Confirming signals are always good news and disconfirming signals are always bad news” (Eil & Rao, 2011).

The concepts of explore/confirm/avoid and explore/exploit are straightforward to conceptualize but difficult to quantify. How similar is similar? What does it mean to be different? To improve my understanding of the issue, I have begun to model these behaviors in a simple agent-based system (discussed in more detail in section 5.4). In the model, a similarity measure is calculated by comparing the number of shared ‘statements’ between individuals. Similarity and difference are manifested as attractive and repulsive forces in a 2D environment. In this way, the group manifestations of these behaviors can be explored. Some initial visualizations are shown in figure 6, below.

Figure 6 - Simulation of Explore/Confirm/Avoid Behaviors

In the first frame, the population is initialized with a set of ‘reddish’, ‘greenish’ and ‘blueish’ statements (Strings) that are collected into a ‘belief’ (a Collection of Strings). Positions are randomly allocated. In the second frame, confirming behaviors are triggered, causing red to be attracted to red, blue to blue, etc. Since the beliefs of the agents are not identical, the clustering is obvious but loose. In the last frame, a form of Group Polarization is triggered: agents actively avoid those who are more different and adjust the statements in their beliefs to become more congruent with similar agents. As a result, the agents cluster more tightly, cast out a few who do not adapt, and move further away from dissimilar groups. I find it particularly interesting that this sort of geometric approach to visualizing these behaviors appears to make it possible to measure straightforward distance relationships between groups. One of the main goals of this research will be to extend and improve this model based on measured activities by actual users navigating news.
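A minimal Java sketch of the core mechanic described above, under the assumption that similarity is simply the count of shared statements and that similar agents attract while dissimilar agents repel. The class, the force constant and the toy beliefs are hypothetical; the actual simulation (section 5.4) is richer than this.

    import java.util.*;

    /** Sketch of the similarity/force idea behind the explore/confirm/avoid simulation.
     *  Names, beliefs and force constants are illustrative assumptions. */
    public class BeliefAgentSketch {
        final Set<String> belief = new HashSet<>();
        double x, y;  // position in the 2D environment

        BeliefAgentSketch(double x, double y, String... statements) {
            this.x = x; this.y = y;
            belief.addAll(Arrays.asList(statements));
        }

        /** Similarity = number of statements two agents share. */
        static int similarity(BeliefAgentSketch a, BeliefAgentSketch b) {
            Set<String> shared = new HashSet<>(a.belief);
            shared.retainAll(b.belief);
            return shared.size();
        }

        /** One force step: shared statements attract, differing statements repel. */
        static void step(BeliefAgentSketch a, BeliefAgentSketch b, double gain) {
            double dx = b.x - a.x, dy = b.y - a.y;
            double dist = Math.max(1e-6, Math.hypot(dx, dy));
            int shared = similarity(a, b);
            int differing = a.belief.size() + b.belief.size() - 2 * shared;
            double force = gain * (shared - differing) / dist; // positive attracts, negative repels
            a.x += force * dx; a.y += force * dy;
            b.x -= force * dx; b.y -= force * dy;
        }

        public static void main(String[] args) {
            BeliefAgentSketch r1 = new BeliefAgentSketch(0, 0, "red-1", "red-2");
            BeliefAgentSketch r2 = new BeliefAgentSketch(5, 5, "red-1", "red-2", "red-3");
            BeliefAgentSketch b1 = new BeliefAgentSketch(2, 2, "blue-1", "blue-2");
            for (int i = 0; i < 100; i++) {
                step(r1, r2, 0.01);
                step(r1, b1, 0.01);
                step(r2, b1, 0.01);
            }
            // Print pairwise distances after the run to inspect the resulting geometry.
            System.out.printf("r1-r2: %.2f, r1-b1: %.2f%n",
                    Math.hypot(r1.x - r2.x, r1.y - r2.y),
                    Math.hypot(r1.x - b1.x, r1.y - b1.y));
        }
    }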

3.13 - Implicit feedback in news browsing

Another study that looked at the news-browsing behaviors of users was Park et al.'s NewsCube: Delivering Multiple Aspects of News to Mitigate Media Bias. This paper focuses on a framework to solve the 'media bias problem' computationally. To this end they develop text processing and classification into what they call 'aspects', conceptually designed to run at scale within the constraints of the rapid news cycle. 'Aspect-level browsing' is a concept where a diverse (hopefully with cancelling bias) set of articles is presented to the user. In addition to titles and snippets, users also get the keywords that were used to determine the aspects. This represents a more sophisticated mechanism for aggregating news that may be useful in identifying the populations described by Munson and Resnick. NewsCube also provides user feedback with respect to the aspects already explored, so that the user is 'nudged' in the direction of reading more diverse articles. As discussed in the introduction, Marco de Gemmis et al. approach a similar issue in recommender systems by using knowledge infusion from sources such as Wikipedia and WordNet to introduce serendipity into movie recommendations.

One of the most interesting elements of the NewsCube implementation was the concept of structure-based extraction. Journalism has a set of particular practices, such as the lede, that are designed to place the most important information at the beginning of the story and place less important details towards the end. Taking this into account allows for more targeted keyword extraction than could be performed on more general purpose text. This allows for better clustering of news articles based on similar keywords found across titles, headings and ledes so that story diversity is maintained in the presentation. When evaluated against a RandomCube implementation that presented random articles using the NewsCube interface, clickstream history showed that users not only read more diverse stories but also read more stories. However, the reading of these stories did not appear to change the bias of the readers, who claimed that the stories that they disagreed with had low credibility (Park, Kang, Chung, & Song, 2009).
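As an illustration of the structure-based idea (and emphatically not Park et al.'s actual implementation), the following Java sketch weights terms by the part of the article in which they appear, so that title and lede terms dominate the extracted keywords. The section weights and the crude length-based stopword filter are arbitrary assumptions.

    import java.util.*;

    /** Illustrative sketch of structure-weighted keyword extraction: terms that occur
     *  in the title or lede count more than terms from the body. Weights are assumptions. */
    public class StructureWeightSketch {
        public static void main(String[] args) {
            Map<String, Double> sectionWeight = Map.of("title", 3.0, "lede", 2.0, "body", 1.0);
            Map<String, String> article = Map.of(
                    "title", "Protests spread across capital",
                    "lede", "Protests spread to the capital on Tuesday as talks stalled",
                    "body", "Officials said talks may resume next week");

            Map<String, Double> keywordScore = new HashMap<>();
            for (Map.Entry<String, String> section : article.entrySet()) {
                double w = sectionWeight.get(section.getKey());
                for (String term : section.getValue().toLowerCase().split("\\W+"))
                    if (term.length() > 3)                      // crude stopword proxy
                        keywordScore.merge(term, w, Double::sum);
            }
            // Top five structure-weighted keywords for this article.
            keywordScore.entrySet().stream()
                    .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
                    .limit(5)
                    .forEach(e -> System.out.println(e.getKey() + " " + e.getValue()));
        }
    }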

3.14 - Framework for the proposed system

To explore how these apparently fundamental aspects of human information behavior integrate, I propose to build on some of the major elements developed for NewsCube. Bias may not be something that can be eliminated, but as Sonnenwald shows, information horizons can be extended (Sonnenwald & Wildemuth, 2001). The mechanisms of aspect-level browsing may be used to provide a supportive environment for diversity-seeking users to increase their visibility into the greater information networks that they have access to, while not forcing averse and confirming users out of their comfort zones.

As discussed earlier, this approach attempts to embody frictional design. It is not about forcing people to change their behaviors. “Frictional infrastructures do not stop citizens from carrying on as intended; they do not break down completely in the face of inaction or disinterest. Rather, they serve to make citizens pause, to be disruptive without bringing things to a halt. Friction happens in a flicker of a moment in the product-residue of everyday life” (Korn & Voida, 2015). At the same time, this friction allows users to inform the underlying system about what it is that they value and how they incorporate information that might be over their normal information horizon into their information gathering process. These signals can in turn be foregrounded to inform the information horizons of others. This also embodies frictional design by “employing trace data of infrastructural use in order to critique those infrastructures, or even to reveal them in the first place” (Korn & Voida, 2015).

As such, the proposed approach has two goals: (1) increase the diversity of information presented to the user such that pertinent information may be discovered, and (2) harness the user's browsing patterns within this more diverse space to look for implicit indicators of 'trustworthy information'. With that in mind, let's look at some initial work that I've done in this area to support a more diverse, frictional interaction with an information corpus that could be part of a search result.

4 - Literature Review 2 - The Lit Review That Ate Itself

User-generated content is generally considered to be statements from a user. But statements are a small part of computer-mediated communication. Statements can be truthful, false, incomplete or confused. Actions are also mediated and evaluated by computers as we interact through them with others, and the amount of information contained in these actions is certainly larger than what we deliberately write and say. Yet as can be seen in the previous lit review, the vast majority of work with respect to credibility and trustworthiness has to do with the content we deliberately create - our postings. As the truism goes, 'actions speak louder than words'. What does that mean for the evaluation of trustworthiness and pertinence in news?

Regardless of whether we use the terms Explore/Exploit or Diversity-Seeking/Confirming/Avoiding, these terms describe actions, not statements. Actions are guided and suggested through the use of affordances (Norman, 1999)[17]. Currently, the prevalent affordances in IR are the search box and the 'ten blue links' search results page. As I have stated earlier, I believe that this presentation best supports a 'confirming/exploiting' action pattern, since the only actions available are to select from the list or to submit a new or modified search.

But what would an interface look like that affords exploration as well as exploiting? How would the explorable information be presented? What additional actions should the user be able to perform? For that matter, what is a good use case to evaluate these concepts?

This chapter describes a research through design (RtD)/autobiographical design (AD) process that I undertook to explore these questions. In both research through design and autobiographical design, the focus of the effort is the creation of a usable artifact. RtD typically involves groups of designers, ethnographers, engineers and users iterating over a design until it provides sufficient usability. AD on the other hand collapses all the roles onto a single researcher/user. The goal in AD is also to produce a usable output, though applicability beyond the author is not a requirement (Neustaedter & Sengers, 2012).

The use case is the lit review for the proposal that you are reading. Because of the focus of this proposal on creating affordances for exploration, it seemed justified to evaluate what kinds of user interfaces might encourage exploration of search results. As a result of developing personal tools for this effort, I believe that I can demonstrate that direct manipulation techniques that act on network representations of terms and documents provide a mechanism to explore a corpus interactively, rather than as a series of queries and results. In particular, this chapter shows that dynamic, interactive reranking of pertinent concepts and documents is possible for useful-sized corpora on non-specialized hardware and software, ranging from Java on a PC to smartphone browsers. This opens up a different way for users to interact with search results that may provide a pathway for better supporting exploring-oriented behavior.

4.1 - Background

As mentioned in the previous Literature Review section, I’ve been treating my collection of papers as a corpus that I’ve been coding, using an open coding approach as described by Merriam in Qualitative Research (Merriam, 2009, p. 178). The tool that I chose to structure my lit review was AtlasTi. Although not common, others have used AtlasTi for lit reviews as well (Pope, 2016). In addition, I was particularly interested in the fact that AtlasTi can create a number of outputs, including a code/document table that can be exported to Excel. This means that network analytics, such as centrality and adjacency, can be calculated directly from such term/document tables.

4.2 - Centrality as a technique for evaluating qualitative data

Term and topic centrality are particularly interesting ways to explore a corpus. I had already used PageRank to calculate relationships between topics (Haveliwala, 2002) and speakers using Politifact’s data API. These calculations in Java were taking fractions of a second, so the possibility of running them online and interactively seemed reasonable. To explore this, I extended a news-exploring webapp I had written to interactively rate text-containing nodes in a network and dynamically rank the nodes using the PageRank algorithm. Remarkably, it turns out that interactive rates for matrices consisting of tens to hundreds of nodes are easily achievable in JavaScript and Java; on a wide variety of platforms, from high-performance desktops to iPhones, the application typically maintained 30 Hz update rates.

Currently, my ‘corpus’ includes 37 papers and a bit under 200 codes. Code frequencies appear to follow a logarithmic distribution, with a minority of codes (Information Networks, Bubble Patterns, Credibility Cues) accounting for the majority of the overall code frequencies, and broader documents such as survey papers containing the majority of codes. The distribution frequencies are shown in figures 7 and 8 below:

Figure 7 - Number of codes per document

Figure 8 - Code frequency across all documents

A point worth restating is that these charts represent a ranking based purely on frequencies. How these codes and documents interrelate with respect to influence and centrality is not visible. And indeed, if documents and codes are ranked purely on frequency, the result lists long papers (with lots of codes) first, intermingled with codes (in green) that have no clear association with the adjacent papers:

Name (Code Count)

P61: A Survey on Assessment and Ranking Methodologies for User-Generated Content on the Web.pdf (158)
P13: The Egyptian Blogosphere.pdf (100)
P10: Sensing_And_Shaping_Emerging_Conflicts.pdf (93)
P 1: Social Media and Trust during the Gezi Protests in Turkey.pdf (88)
P 5: Saracevic_relevance_75.pdf (88)
P77: The Law of Group Polarization.pdf (83)
P43: On the Accuracy of Media-based Conflict Event Data.pdf (75)
P37: Security-control methods for statistical databases - a comparative study.pdf (62)
P30: Crowdseeding Conflict Data.pdf (56)
P 4: PageRank without Hyperlinks - Structural Reranking using Links Induced by Language Models.pdf (54)
Information Networks (54)
P46: Participatory journalism – the (r)evolution that wasn’t. Content and user behavior in Sweden 2007–2013.pdf (53)
P59: Incentivizing High-quality User-Generated Content.pdf (53)
P 9: JunJul10_Palen_Starbird_Vieweg_Hughes.pdf (49)
P33: The Future of Journalism - Networked Journalism.pdf (47)
P40: The Hybrid Representation Model for Web Document Classification.pdf (46)
P54: Siemens_2005_Connectivism_A_learning_theory_for_the_digital_age.pdf (46)
Bubble Pattern (45)
P32: Communication Power and Counter-power in the Network Society.pdf (43)
Credibility Cues (43)

Table 1 - List of top 20 documents and codes by coding frequency (codes in green).

My intuition was that PageRank should convey more of the relationships between terms and documents. The reason for this is that the algorithm calculates 'centrality', which can be viewed as the probability that a hypothetical 'random web surfer' will wind up at a particular page. Pages with more links are more central, as are pages that receive links from well-linked pages (Page & Brin, 1999). Links do not have to be hyperlinks; any item that can be viewed as 'connecting' documents can be used. In this case, the codes were used as conceptual 'implicit links'. Other approaches based on textual analysis have also been found to be productive (Kurland & Lee, 2010).

4.3 - Initial Implementation

PageRank, as described by Page and Brin, works by repeatedly squaring a matrix that consists of a set of web pages and the links between them. In this case the links are terms, so I produce a symmetric matrix that contains the documents and the terms in its rows and columns. A document therefore does not link directly to another document, but links through a term. The values in the matrix are unitized with respect to the highest value in the matrix and summed with an identity matrix. The resulting matrix is then repeatedly squared and re-unitized until the difference in the sums of two sequential eigenvectors is less than a specified epsilon (currently 0.1, derived empirically). As can be seen in figure 8, the eigenvectors converge quickly, taking 4 to 7 iterations to stabilize to the 0.1 epsilon:

Figure 8: Eigenvectors from term-document matrix converging.
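A minimal Java sketch of the square-and-unitize loop just described. The 4x4 matrix is a toy stand-in for the real term-document matrix, and row sums are used as the ranking vector; the testbed's actual implementation and convergence test may differ in detail.

    /** Sketch of the matrix-squaring convergence loop described in the text.
     *  The toy matrix and the use of row sums as the ranking vector are assumptions. */
    public class RankSketch {
        public static void main(String[] args) {
            double[][] m = {            // symmetric document/term co-occurrence counts (toy)
                {0, 3, 1, 0},
                {3, 0, 2, 1},
                {1, 2, 0, 4},
                {0, 1, 4, 0}
            };
            int n = m.length;
            // Unitize against the largest value and add the identity matrix.
            double max = 0;
            for (double[] row : m) for (double v : row) max = Math.max(max, v);
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++)
                    m[i][j] = m[i][j] / max + (i == j ? 1 : 0);

            double epsilon = 0.1, diff = Double.MAX_VALUE;
            double[] prev = rowSums(m);
            while (diff > epsilon) {
                m = square(m);
                unitize(m);
                double[] cur = rowSums(m);
                diff = 0;
                for (int i = 0; i < n; i++) diff += Math.abs(cur[i] - prev[i]);
                prev = cur;
            }
            for (int i = 0; i < n; i++) System.out.printf("node %d rank: %.3f%n", i, prev[i]);
        }

        static double[][] square(double[][] a) {
            int n = a.length;
            double[][] out = new double[n][n];
            for (int i = 0; i < n; i++)
                for (int k = 0; k < n; k++)
                    for (int j = 0; j < n; j++)
                        out[i][j] += a[i][k] * a[k][j];
            return out;
        }

        static void unitize(double[][] a) {
            double max = 0;
            for (double[] row : a) for (double v : row) max = Math.max(max, v);
            for (double[] row : a) for (int j = 0; j < row.length; j++) row[j] /= max;
        }

        static double[] rowSums(double[][] a) {
            double[] sums = new double[a.length];
            for (int i = 0; i < a.length; i++)
                for (double v : a[i]) sums[i] += v;
            return sums;
        }
    }

Because repeated squaring drives the matrix toward its dominant eigenvector, the row sums settle quickly, which is why this loop can run at interactive rates for matrices of this size.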

The resultant ranked list from this first run turned out to be remarkably different from the list shown above. The results are shown in Table 2. The eigenvalues are omitted, since they are used only for ranking purposes and do not provide additional information:

Name

P85: Technology Humanness and Trust-Rethinking Trust in Technology.pdf
System Trust
Social Trust
P61: A Survey on Assessment and Ranking Methodologies for User-Generated Content on the Web.pdf
P84: What is Trust_ A Conceptual Analysis--AMCIS-2000.pdf
P 1: Social Media and Trust during the Gezi Protests in Turkey.pdf
Credibility Cues
P13: The Egyptian Blogosphere.pdf
P10: Sensing_And_Shaping_Emerging_Conflicts.pdf
P82: The ‘like me’ framework for recognizing and becoming an intentional agent.pdf
Information Networks
Pertinence
P 5: Saracevic_relevance_75.pdf
Physical Presence
P77: The Law of Group Polarization.pdf
Bubble Pattern
Misinformation
P43: On the Accuracy of Media-based Conflict Event Data.pdf
P 8: Rumors, False Flags, and Digital Vigilantes - Misinformation on Twitter after the 2013 Boston Marathon Bombing.pdf
Relevance
Information Quality

Table 2 - Centrality Rank of Papers and Codes (codes in green)

Note first that the papers and terms are much more intermingled. Also, and more importantly, terms are close in rank to papers that are pertinent to that term. For example, Technology Humanness and Trust-Rethinking Trust in Technology is ranked closely to System Trust and Social Trust, two terms that I coded heavily in that paper. Similarly, Credibility Cues is ranked closely to Social Media and Trust during the Gezi Protests in Turkey and The Egyptian Blogosphere, and Saracevic_relevance_75 is associated with the code Pertinence.

These relationships strongly reflect my understanding of the relationships between the documents as I coded them, with the advantage of now having a persistent (not just in my head) representation that I can refer to. As such, I can regard this as a reasonable representation of the way that I made sense of the corpus as I went through it.

The initial pass through the term-document ranker included all the codes that I had used including meta codes such as ’definitions’ and ‘methods’ that merely pointed to a section in a document where a code was defined, or where the authors used a particular research method. These were some of the most common codes, but they had no thematic component. Realizing this, I excluded them and similar codes from subsequent calculations. This resulted in a subtle reordering of the calculated ranks that better reflected my understanding of the relationships of the documents.

4.4 - Advantages of the approach

After reviewing the presentation of codes and documents and reflecting on the process, it became apparent to me that this principle of representing the relationships between qualitative elements could have some distinct advantages:

4.5 Adding interactivity (tool building)

The next step was to explore the contribution of direct manipulation to the understanding of the relationships expressed in this network. Since the live manipulation of one set of data (term-document matrices) to produce a dynamically updated list of items is difficult to model or approximate, due to the unconstrained nature of the design space, it seemed most appropriate to use the methods of Research Through Design to evaluate and iterate on interaction mechanisms and displays (Zimmerman, Forlizzi, & Evenson, 2007). To this end I developed the Java testbed shown in figure 9:

Figure 9: Java application with ‘Context Need’ code selected

The application consists of two primary areas. In the column on the left, there are a set of controls that provide for the direct manipulation of the weight[18] of a node. The right two-thirds of the screen contain the ranked sets of terms and documents. What is visible can be filtered by entering text into the text field in the top right of this area. Within the program, the rank is calculated across both documents and codes, but separating the terms and documents into different columns seems to provide more clarity (The saved output of a session, an Excel workbook, contains the full, integrated matrix as calculated and manipulated).

Selecting a term in either the Term or Document table causes the details of that node and its connections to be shown in the control panel on the left. Manipulating the slider then changes the weight of the node, causing a recalculation of the entire matrix. A measure of the statistical difference between the original matrix and the current, manipulated matrix is shown in the bottom-left corner. If the average of the means as computed using the bootstrap method[19] differs by more than one standard deviation, the color of the top line changes from green to red.

As an example of the interaction effects, figure 10 shows the change in the ranking when the weight of ‘Context Need’ is increased from 1.0 to 1.93, which moved it to the top of the Term list. Note that both the term and document lists have re-ordered. My AtlasTi codes (Information Networks, Relevance and Physical Presence) and their corresponding documents (P10: Sensing_And_Shaping_Emerging_Conflicts, P 1: Social Media and Trust during the Gezi Protests in Turkey, and P 9: JunJul10_Palen_Starbird_Vieweg_Hughes) have moved up in rank. This is entirely reasonable, as these are papers that discuss groups of people sharing and seeking relevant, highly contextual information.

Figure 10 - Re-weighted network. Items that have moved up in rank are in green.

4.6 - Determining the amount of manipulation

One of the first questions asked when I started to show the system and get comments was whether or not such manipulation fundamentally changes the representation of the corpus. In other words, is the network being forced to represent a relationship that is not actually present? There are a number of ways that such a difference could be measured. Weber has developed a method for comparing ranked lists (Weber & Castillo, 2010). Valente, in How Correlated Are Network Centrality Measures?, finds significant but varied correlations between the network measures of degree, closeness, betweenness, and eigenvector centrality, so in principle any of these measures should also be usable (Valente, Coronges, Lakon, & Costenbader, 2010). These measures are meaningful and commonly used, but they are more geared towards comparing two different networks (the graph isomorphism problem: devising a good algorithm for determining whether two graphs are isomorphic). Since the manipulations in this concept involve reweighting the same network, I wondered if there might be a way to compare the networks more directly.

Bootstrapping is a method of statistical analysis that can be used to analyze systems that have complicated or unknown distributions (Efron & Tibshirani, 1994, p. 17). Since the term/document matrix represents a population with a complex distribution, it should be amenable to bootstrapping. Because the matrix that is manipulated is the derived, symmetric matrix, I use it as the underlying population for bootstrap resampling. The original and manipulated matrices differ only by the weights that have been applied to the manipulated matrix, so measures such as the adjacency and diameter of the two networks being compared remain unchanged[20]. A good question to ask is 'how much manipulation results in a meaningful difference in centrality?' The intuition used here is that if the mean of the values in the manipulated matrix is more than one standard deviation from the mean of the original population, then the two matrices can be considered substantially different. One standard deviation is an arbitrary choice, but it does mean that the new population's mean will differ from roughly 84% of the bootstrapped means drawn from the original matrix.
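A minimal Java sketch of this bootstrap check, assuming the matrices are flattened to their cell values, that the bootstrap means are resampled from the original matrix, and that the flag trips when the manipulated matrix's mean moves more than one standard deviation of those bootstrap means. The toy values and resample count are arbitrary.

    import java.util.Random;

    /** Sketch of the bootstrap check used to flag how far a manipulated matrix has
     *  drifted from the original. Values, resample count and test form are assumptions. */
    public class BootstrapSketch {
        public static void main(String[] args) {
            double[] original = {0.1, 0.4, 0.4, 0.2, 0.9, 0.2, 0.3, 0.7, 0.3}; // toy cells
            double[] manipulated = original.clone();
            manipulated[4] *= 3.0; // simulate strongly reweighting one node

            Random rng = new Random(7);
            int resamples = 2_000;
            double[] bootMeans = new double[resamples];
            for (int b = 0; b < resamples; b++) {
                double sum = 0;
                for (int i = 0; i < original.length; i++)
                    sum += original[rng.nextInt(original.length)];
                bootMeans[b] = sum / original.length;
            }
            double meanOfMeans = mean(bootMeans);
            double sd = stdDev(bootMeans, meanOfMeans);
            double manipulatedMean = mean(manipulated);

            boolean substantiallyDifferent = Math.abs(manipulatedMean - meanOfMeans) > sd;
            System.out.printf("bootstrap mean %.3f (sd %.3f), manipulated mean %.3f -> %s%n",
                    meanOfMeans, sd, manipulatedMean,
                    substantiallyDifferent ? "flag (red)" : "ok (green)");
        }

        static double mean(double[] xs) {
            double s = 0;
            for (double x : xs) s += x;
            return s / xs.length;
        }

        static double stdDev(double[] xs, double mu) {
            double s = 0;
            for (double x : xs) s += (x - mu) * (x - mu);
            return Math.sqrt(s / xs.length);
        }
    }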

It follows from this approach that manipulating an item of high centrality should have more effect on moving the mean than manipulating an item of low rank. In this collection, adjusting the weight of the highest-ranked item (A Survey on Assessment and Ranking Methodologies for User-Generated Content on the Web) by a factor of 1.31 causes the means to diverge by more than the allowed criterion. On the other hand, scaling the lowest-ranked item (ACLED_User-Guide_2016) by the maximum factor allowed by the interface (10.0) does not move the mean beyond the one-standard-deviation criterion. This allows for the creation of a GUI component that shows how much manipulation via reweighting has occurred. The example described above is shown in figure 11, below:


Figure 11 - deviation from original network using bootstrapping

As can be seen in the above figure, there are two 'original' means. One is calculated with respect to the term frequencies, the other is calculated using the normalized values. In this case, the calculations are being done with respect to the non-normalized means. When the current dynamic mean differs by more than one standard deviation, the values are rendered in red; otherwise, the values are rendered in green.

4.7 Comparison with traditional open coding analysis

Once I had begun using centrality as a way of analyzing codes, I realized that looking at relationships this way externalizes some complex relationships that can get lost using more traditional coding methods. One of the main goals of coding is to group codes into categories so that theoretical constructs can be generated and evaluated. Merriam describes this as ‘When categories and their properties are reduced and refined and then linked together, the analysis is moving toward the development of a model or theory to explain the data’s meaning’ (Merriam, 2009, p. 192). Although not discussed in these terms, this is a form of data compression (through abstraction), with the associated loss in information (Gray & Tall, 2007). For example, consider a set of codes that are closely related (and sequentially ranked) in my schema: Information Quality, Credibility Cues, Relevance, Misinformation and News Credibility. A natural clustering from my perspective might be to collapse Information Quality and Misinformation into one ‘Information Quality’ cluster, collapse the two credibility codes into a ‘Credibility’ cluster, and keep Relevance separate. This makes intuitive sense, but would result in the loss of some interesting implicit relationships that are exposed by the centrality calculations.

For example, if I manipulate the Information Quality weights, I can see that this has little influence on the rank of Misinformation, but does ‘pull up’ Credibility Cues and News Credibility. The relative motion of the codes can be seen in figure 12, where the credibility codes are in green and the Misinformation code is in red.

 

Figure 12 - effects of re-weighting Information Quality

Note that there are more ramifications in the ranking than just the relative motion of the four codes mentioned above. Items such as Game Theory and Databases have entered the list, while the two Anonymity-related codes have dropped off the list.

Coding is subjective and interpretive, and based on much more than a straightforward implementation of the codebook. My codebook for this lit review describes Information Quality as “In this context, a 'measure' of the consistency and trustworthiness of a corpus or other information source”. There is also an example to provide additional guidance. But examining the re-ranking effects implies that this code may have a more subtle and implicit relationship to a more data-driven view of information. These latent relationships would be lost if the original, explicit grouping discussed above were used. Further, I can hand the term-document table to any other researcher, and they can explore these relationships to clarify the original researcher’s implicit perspective on the corpus used in the generation of a given theoretical construct.

4.8 - Automated analysis

Up to this point, I’ve discussed the analysis of a hand-coded corpus. Since automated term-document classification is a common practice, it seems reasonable to see how this technique applies to a ‘machine-coded’ set of documents. Understanding that topic extraction is an intensively studied research area (e.g. Kurland & Lee, 2010), the goal here is not to develop a new topic extraction method. Rather, it is to prototype enough of a system to explore the area of interest in this paper: the use of centrality and interaction to explore relationships between documents. As such, this is a deliberately simplistic approach. It seems reasonable to assume that if naive term/topic extraction can be shown to work, then that result should generalize to more sophisticated forms of topic extraction[21] as they apply to producing term-document matrices and their associated networks that can then be interactively explored.

Continuing with the approach of research through design, I developed an additional set of software that scans a set of text, PDF or web/HTML documents and performs term frequency analysis (Figure 13). The output of this program is an Excel spreadsheet that can then be read into the testbed described above. At this point, the interactivity of this application is low, serving merely to collate documents into a corpus for the testbed to use, but it may turn out that these capabilities would also benefit from a direct manipulation approach.

Figure 13 - Document scanning and term extraction interface

4.9 - Determining terms

In this approach, PDF, HTML or text documents are selected by the user and stored as statistical representations built from lemmatized[22] words produced by the Stanford NLP Library. Once all documents have been read in, TF-IDF is calculated for each document, and Latent Semantic Indexing (LSI) is performed over the superset of all documents (the corpus). This produces a set of terms that are shared across the corpus (Manning, Raghavan, & Schütze, 2008, pp. 116-121, 382). The output of this process is an Excel spreadsheet with documents listed by row and terms by column[23]. The terms are ordered from highest to lowest LSI score, so the user can then edit the spreadsheet to select the top n terms (since the number of terms can extend into the thousands). This new term-document matrix can then be read into the testbed.
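A minimal Java sketch of the term-document weighting step, using toy string documents. Lemmatization with the Stanford NLP Library and the LSI pass over the corpus are deliberately omitted; only the TF-IDF cell values that would feed those later steps are shown.

    import java.util.*;

    /** Minimal TF-IDF term-document sketch. Lemmatization and LSI are omitted,
     *  and the documents are toy strings rather than parsed PDFs. */
    public class TfIdfSketch {
        public static void main(String[] args) {
            List<String> docs = Arrays.asList(
                    "news trust credibility news",
                    "crowdsourcing trust crisis reports",
                    "news aggregation and exploration");

            // Term frequency per document and document frequency per term.
            List<Map<String, Integer>> tf = new ArrayList<>();
            Map<String, Integer> docFreq = new HashMap<>();
            for (String doc : docs) {
                Map<String, Integer> counts = new HashMap<>();
                for (String term : doc.toLowerCase().split("\\s+"))
                    counts.merge(term, 1, Integer::sum);
                for (String term : counts.keySet())
                    docFreq.merge(term, 1, Integer::sum);
                tf.add(counts);
            }

            // TF-IDF weight for each (document, term) cell of the matrix.
            int n = docs.size();
            for (int d = 0; d < n; d++) {
                for (Map.Entry<String, Integer> e : tf.get(d).entrySet()) {
                    double idf = Math.log((double) n / docFreq.get(e.getKey()));
                    double weight = e.getValue() * idf;
                    System.out.printf("doc %d, %-14s %.3f%n", d, e.getKey(), weight);
                }
            }
        }
    }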

4.10 - Initial results from automated analysis

The first results from this approach as they appear in the testbed are shown in Figures 16 and 17. The documents consisted of four recent papers on which Dr. Lutters was the last author. The terms were calculated according to the description above and, in his estimation, appropriately represent the content of the papers. The highest-ranking 52 terms were then selected in the resulting spreadsheet and the rest of the 2,000+ terms were deleted. This sheet was read into the testbed, and the screenshot shows the initial result. The ranking is not the same as the raw LSI ranking in the spreadsheet (micronote, solicitation, acceptance, etc., vs. micronote, lifecycle, memory, etc.). Instead, the PageRank algorithm used the term frequencies within and between documents to determine the centrality of the LSI-extracted terms with respect to the documents. And, as with the manually coded lit review corpus described above, the result is manipulatable, either by document or by term.

The primary difference between the manually coded documents and the LSI parsing is what ‘counts’ can mean. In the case of the manually coded documents, ‘count’ is the actual count of the code within and across documents. This represents the distribution of concepts across the corpus as interpreted by the reader. Further, these codes may or may not use grounded terms within the document, so terms in the document may not directly map to codes used to describe an understanding of the document.

In the case of the auto-generated list, the frequencies of terms in the documents are counted. Latent Semantic Indexing then creates an eigenvector of representative terms. This should more closely (though by no means completely) encode the concepts of the authors, as reflected in the terms that they chose and that the system determined were most uniquely representative. The question then arises of how to weight the strength of the connections. One option is to use the term counts, as with the manual coding described above. The other option is to preserve the ranking information generated in the LSI process. Since the output of this application is an Excel spreadsheet that is then read into the testbed, both variations are produced, each using the term ordering produced by LSI.

In the case of the matrix using the LSI ranking for node weight, the final LSI eigenvector is normalized to a 0-100 scale with respect to the highest value in the list, which in this particular case is ‘micronote’. This gives a more readable indication of the rank the item received without affecting the relationship between the ranked items.

Figure 14 shows the (0-100) weight given to each item in the ranking vector. Note that this distribution(?) is best fit with a power function as opposed to the logarithmic best fit that emerged in the manual coding.

Figure 14 - LSI term distribution

When counts are used instead of calculated weights, a different distribution is apparent in figure 15. The sorting of terms (within the LSI-selected and truncated to 52 items) is somewhat different, and the best curve fit to the distribution is logarithmic. This may imply that term counts may reflect underlying relationships in a way that more closely matches the manual coding effort described earlier.

Figure 15 - LSI extracted term frequency across documents

Once prepared as described above, the spreadsheet can be read into the testbed. The LSI example of this approach to ranking and centrality based on weights is shown in figure 16.  The calculation of centrality using counts while maintaining the same set of terms is shown in figure 17. Note that the centrality parsing does result in different rankings.

Figure 16 - Initial results of LSI parsing in testbed, using LSI rank for centrality (‘micronote’ highlighted for comparison)

Figure 17 - Initial results of LSI parsing in testbed, using counts for centrality (‘micronote’ highlighted for comparison)

Since the LSI representation is the summation of the somewhat arbitrary TF-IDF values, the term-frequency-based display seems more intuitive and useful.

At this point I have shown that a centrality-based approach to ranking documents and terms/codes is a potentially useful and intuitive way of exploring a corpus. This is particularly interesting because manual coding represents an extremely sophisticated form of ‘topic extraction’, while LSI is one of the simplest. This strongly implies that any form of topic extraction that lies between these two extremes should also work. It is my speculation that centrality is effective in this application because it is a measure of the ‘connectedness’ of nodes across a network. In this case the nodes are documents, linked by concepts, represented as terms or codes. This implicit linking reflects an important element of the lit review process, which is to synthesize relationships between publications.

4.11 - Some Thoughts about this tool

Tools affect how we interact with the world. The search box and the 'ten blue links' create a context for interacting with information that I believe is fundamentally confirmatory. The goal of this effort was to create an alternative to the dominant paradigm of interactive search, one that could support a more direct, interactive manipulation and navigation of a corpus. Being able to manipulate and observe the relationships between, for example, terms and documents could provide a user more opportunities to discover relationships that would be hidden by other, more static interfaces. More opportunities for discovery in turn encourage exploration.

To support dynamic interaction I revisited the PageRank algorithm and discovered that a matrix large enough to support tens to hundreds of documents could be manipulated on modern hardware at interactive framerates. Using the framework laid out by Kurland and Lee, I then used Language Model Linking to produce term/document matrices where the ‘weight’ of an item could be directly manipulated by a user.

Consistent with the research through design framework, I then began to iterate on a tool, based on adjusting weights, that would support dynamic interaction with a term/document matrix. Although not in anything that could be regarded as a final or generally usable form, I believe that I have developed a system that supports exploration as well as more ‘traditional’ information retrieval, in that any result set that has a collection of text organized by source, author, topic, etc., can be fed through this system to provide exploratory navigability.

The next, and deeper, question is whether this additional capability will provide a way to gain greater insight into how users can be encouraged, through design, to explore information (like search results) in a way that lets their patterns of interaction expose their approach to the results they are presented with. Are they explorers, reaching outside and across information bubbles? Do they seek confirmation? Or do they avoid conflicting information (Munson & Resnick, 2010)?

In my case, this process led to the creation of a tool that gave me greater insight into the literature review that I had done. The ability to emphasize codes and see the relationship of that code with the underlying documents has turned out to be so useful that it has in turn affected the way that I think about the literature I’ve sampled and my relationship with it. It well may be that these patterns of interacting with the interface are distinct in a way that can be characterized and generalized in a domain independent way. This is a core component of my research and will be discussed in detail in the research plan section.

5 - Research Plan

In the previous chapters, I’ve discussed how list-based information retrieval presentation may be implicitly confirming to the user’s point of view. It is ironic that the unending quest for the most relevant, frictionless way to obtain and present information may in fact be contributing to polarization and isolation. This polarization process appears to be accelerating. Lelkes, Sood and Iyengar show that as bandwidth increases, affective polarization follows (Lelkes, Sood, & Iyengar, 2015).

However, this may not have to be the case. As Park et al. show with NewsCube (Park, Kang, Chung, & Song, 2009) and Munson and Resnick show with their news browser (Munson & Resnick, 2010), small changes in design can affect the way that individuals interact with the news in particular and potentially with search in general. Can we create designs that encourage explore behavior over exploit? The multi-armed bandit problem shows that there is value in exploring, and that the process of exploration contributes to a better understanding of the environment and increased reward.

If design approaches can be developed that support exploratory behaviors, this may in turn change the way that we interact with the information we are presented with. Although not everyone needs to have a design that supports a level of serendipity, map-makers, newspaper editors, librarians and even supermarket architects all understand the value of affording it.

This is my fundamental research question - what does a design that supports exploration look like? What does it look like when it’s used? Do the behaviors of individuals exploring the news ramify through the behaviors of larger groups that they belong to or participate in?

The research plan below is broken into several sections. Since each research question builds on the answer to the previous one, the sections are ordered by the sequence in which each needs to be completed. What follows, then, is a rough description of how the research is to be conducted within the iterative, spiral framework of Research Through Design and Autobiographical Design discussed in Chapter 4. Section 5.1 lists the overall research questions. The hypotheses that derive from these questions are listed in section 5.2. Last is the actual plan for conducting the research in section 5.3.

The broad research question has two parts - what are the design considerations for an information browsing display that affords exploration (RQ2), and can aggregated browsing behaviors provide insight into information bubbles and antibubbles and, by indirection, the trustworthiness of news (RQ4)? However, for clarity, the questions will be discussed below in sequential order.

5.1 - Research Questions

There are four basic research questions that this proposal is trying to answer. Each basic question has several additional sub-questions that clarify particular aspects of the parent issue. These questions, their ramifications and evaluation criteria are discussed in detail in section 5.3:

RQ1: What is a simple model that can serve as a proxy for behaviors that indicate Group Polarization?

RQ1a: Within the model, what direct and indirect behavioral indicators can be determined to map to the concept of trustworthiness? For example, could the deterioration of information, as exemplified by the game of ‘telephone’, be used as a marker? How different should the derived information be from the original before it is considered wrong?

RQ1b: Is it possible to adjust the degree of polarization by making changes to the social network, such as varying degree? To continue the metaphor above, what happens if a loop of nodes forms in the telephone game and loses all reference to the source data?

RQ2: What are the design considerations for an information browsing display that affords exploring/diversity-seeking search behavior[24]?

RQ2a: How does a diversity-seeking supporting interface affect user search behavior with respect to avoidance and confirmation patterns?

RQ3: What are the detectable patterns of user actions that can be correlated with exploring/confirming/avoiding behavior?

RQ3a: How do we extract them from the computer mediated data stream?

RQ4: Is it possible to use these patterns and the principle of group polarization to isolate trustworthy information as defined in the model of RQ1?

RQ4a: Is it possible to do this at scale with domain independence?

RQ4b: Is it possible to use this principle in turn to re-rank documents, particularly news documents with respect to trustworthiness?

5.2 - Hypotheses

These questions generalize to four hypotheses:

H1(RQ1, RQ3) - Group Polarization can be modeled using the Explorer/Confirmer/Avoider pattern.

H2(RQ2) - A search interface that supports diversity-seeking behavior (without inhibiting other behavior patterns) will result in greater exploration of news (and potentially other IR) items than a standard ‘ten blue links’ interface.

H3(RQ3) - Clickstream data gained from browsing behavior with the H2 interface will provide a domain independent way of vetting the truthfulness and trustworthiness of browsed documents.

H4(RQ4) - Trustworthiness data from H3 will allow for otherwise unvetted documents to be ranked with respect to trustworthiness as well as relevance.

5.3 - Plan detail

Figure 18 - Full Plan with Interactions

Let’s explore the research questions in detail, and how they might be answered. For each research question, there will be a brief discussion of the intended approach and of the evaluation criteria that can be used to determine when the question can be regarded as satisfactorily answered.

5.3.1 - RQ1: What is a simple model that can serve as a proxy for behaviors that indicate Group Polarization?

Figure 19 - RQ1 Components

The goal of building a computer model in this case is to have a reference to inform the design process and, later, as data is gathered from users, to be updated and refined by real-world results. Such models based on simplifying assumptions have been used in fields as widespread as climate and economics. For social systems that are based on the interaction between individuals, agent-based approaches are widely used. Agent-based computer models for such ‘generative social science’ are appropriate for building systems where complex behavior emerges as the result of the interaction of simple rules with an environment. In many respects this is the direct descendent of Simon’s Ant. Joshua Epstein provides a thorough overview of such systems in Agent-Based Computational Models and Generative Social Science (Epstein, 2006, p. 5-10).

Eric Bonabeau, in a colloquium paper on agent-based simulation, describes five situations where agent-based simulation is appropriate. The model in this proposal aligns with the social network situation, where social networks “are characterized by clusters, leading to deviations from the average behavior” (Bonabeau, 2002, p. 7287). Macal and North state that Agent-Based Modelling and Simulation (ABMS) is “a preferred mechanism to represent social interaction, collaboration, group behavior, and the emergence of higher order social structure” (Macal & North, 2005, p. 3). The social characteristics of Group Polarization as supported by information bubbles and anti-bubbles fall into this situation.

Group Polarization is a social phenomenon that emerges from individual behaviors. Work on modelling the social aspect of this has been done by Guillaume Deffuant, who explored the creation of extreme, polarized opinions (Deffuant, Amblard, & Weisbuch, 2004). Yu and Dayan developed a neurological behavior model for explore/exploit behavior in individuals (Yu & Dayan, 2005).

The core agent-based simulation for this effort will begin with a small set of components and be refined from this initial point as user data becomes available.

Even in this basic form, the system may be sufficient to see effects. The rules for individual agents would be something along the following lines (a minimal sketch of these rules in code follows the list):

  1. Initial belief formation - An agent randomly samples a set of SOURCEs and draws a set of statements from the SOURCE beliefs.
  2. Iteration - At each subsequent turn, each agent interacts with its neighboring agents such that the differences in behavior are expressed. EXPLORERS should sample more widely, CONFIRMERS should look for matching beliefs/statements, while AVOIDERS should look only at agents who already share similar beliefs. An example of a potential selection algorithm is described by Lande, who creates a model for runaway sexual selection as originally described by Darwin and Fisher[25] (Lande, 1981). Runaway selection is an instance of a confirming pattern, and aspects of Lande’s model will be included in the model.
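
The following is a minimal, illustrative Python sketch of how these two rules might be encoded; the SOURCE beliefs, the similarity threshold, and the update step are placeholders that will be replaced as the model is refined.

```python
import random

SOURCES = [{"climate": 0.9, "economy": 0.4},   # toy 'world' beliefs
           {"climate": 0.2, "economy": 0.7}]

class Agent:
    def __init__(self, behavior):
        self.behavior = behavior              # EXPLORER / CONFIRMER / AVOIDER
        src = random.choice(SOURCES)          # initial belief formation
        self.beliefs = dict(src)

    def similarity(self, other):
        shared = set(self.beliefs) & set(other.beliefs)
        return -sum(abs(self.beliefs[k] - other.beliefs[k]) for k in shared)

    def step(self, neighbors):
        if self.behavior == "EXPLORER":       # sample widely
            partner = random.choice(neighbors)
        elif self.behavior == "CONFIRMER":    # seek matching beliefs
            partner = max(neighbors, key=self.similarity)
        else:                                 # AVOIDER: only already-similar agents
            similar = [n for n in neighbors if self.similarity(n) > -0.5]
            if not similar:
                return
            partner = random.choice(similar)
        # Move beliefs a small step toward the chosen partner.
        for k in self.beliefs:
            if k in partner.beliefs:
                self.beliefs[k] += 0.1 * (partner.beliefs[k] - self.beliefs[k])

agents = [Agent(random.choice(["EXPLORER", "CONFIRMER", "AVOIDER"]))
          for _ in range(50)]
for _ in range(100):                          # iteration phase
    for a in agents:
        a.step([n for n in agents if n is not a])
```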

5.3.1.1 - Evaluation Criteria:

A successful model should display two qualities:

  1. The differences between EXPLORER, CONFIRMER and AVOIDER should be statistically significant. Within the model, this means that the quantitative data about agent sources should be distinguishable. For example, EXPLORERS should have a wider range of sources, which should manifest as greater variance. CONFIRMERS should have tighter variance, with a higher total of sources than AVOIDERS. This should be calculable using bootstrapping, with Efron and Tibshirani’s An Introduction to the Bootstrap as my primary authority[26] (Efron & Tibshirani, 1994). I would expect 95% confidence in the difference between EXPLORERS and the other types; I am less sure that there will be as large a difference between CONFIRMERS and AVOIDERS. To a degree this may be moot, as the simulation can be run over sufficiently large populations to empirically determine the needed statistical power. (A minimal bootstrap sketch follows this list.)
  2. Group polarization should emerge during simulation, and the degree of polarization should depend on the distribution of agents. To confirm the creation of bubbles, I propose the use of dp-means or DBSCAN cluster detection. The repeatable creation of distinct clusters centered around shared ‘beliefs’ should indicate the presence of group polarization. Again, this should be confirmable by comparing the members of each cluster to each of the other clusters using bootstrap analysis of means and variances. My intuition is that there will be considerable ‘tuning’ of parameters to achieve a statistically meaningful separation of an unknown number of clusters. My hope is that these parameters will provide insight into the analysis of human interaction in finding answers to the following research questions. This type of interaction between model and ‘real world’ data has been explored using cell phone data by Roehner (2005), though I am having difficulty finding other examples.
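
As referenced in item 1, the following is a minimal sketch (assuming NumPy, with fabricated Poisson counts standing in for simulated agent source data) of the bootstrap comparison of variances between behavior types; an interval that excludes zero would indicate the kind of separation the criterion requires.

```python
import numpy as np

def bootstrap_diff(a, b, stat=np.var, n_boot=10000, seed=0):
    """Bootstrap the difference in a statistic (variance by default)
    between two samples and return a 95% percentile interval."""
    rng = np.random.default_rng(seed)
    diffs = [stat(rng.choice(a, a.size, replace=True)) -
             stat(rng.choice(b, b.size, replace=True))
             for _ in range(n_boot)]
    return np.percentile(diffs, [2.5, 97.5])

# Toy data: number of distinct sources sampled per agent, by behavior.
explorers  = np.random.default_rng(1).poisson(12, 200)
confirmers = np.random.default_rng(2).poisson(5, 200)

lo, hi = bootstrap_diff(explorers, confirmers)
print(f"95% CI for variance difference: [{lo:.2f}, {hi:.2f}]")
# An interval that excludes zero supports a statistically meaningful
# separation between the behavior types.
```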

5.3.2 - RQ1a: Within the model, what behavior can be determined to be most trustworthy?

My intuition is that given enough iterations in a large enough simulated population, each behavior pattern will result in a statistically significant difference in a ‘final’ belief or set of statements. Those behaviors that best reflect the mean and variance of all SOURCE beliefs should be ‘most trustworthy’.  In the extremely simple naive implementation, there is no communication between agents other than sampling sources, so all agents with the same behavior may have their ‘opinions formed’ with the same values.

5.3.2.1 - Evaluation Criteria

In this case, populations of clusters will be compared against SOURCE data using the bootstrap. Here, we are looking for which agent behavior results in an internal model that best reflects the objective, ‘world’ model contained in the sources. This would be a between-subjects analysis, comparing SOURCE iteratively to EXPLORER, AVOIDER and CONFIRMER. We are looking for 95% confidence that the values held by agents using a particular behavior match the SOURCE data. Since this question as written does not incorporate additional complexities such as social interactions, the ‘trustworthiness’ test will be revisited as the model gains complexity. This will allow for a longitudinal, within-subject comparison of ‘trustworthiness’, based on successively more sophisticated criteria.

5.3.3 - RQ1b: Is it possible to adjust the degree of polarization by making changes to the social network, such as varying degree?

The limitations of the previous approach lead to this question. Since information consumers listen to other people besides just sources, we need to factor in inter-agent communication. I think that this should be done in two stages. The first stage would be a broadcast style of communication, where all nodes are equally accessible to all other nodes (i.e. a complete graph). The second would use a partially connected network such as a small-world network with cliques. This would require extending the CA model in several ways. Let’s consider the broadcast/fully connected model first.

Once these sorts of group communication behaviors are enabled, the proxy of group initialization can be removed.

The final level of model sophistication would be the addition of social network interaction. Building a simulated social network is straightforward, requiring agents to become nodes that can be connected by directed edges to other nodes. Once set up, additional rules can be applied as to the priority of local information, cluster structure and so forth (Scott & Wasserman, 2005). For example, it has been posited that contact with opposing positions, even well-supported factual ones, may reinforce an opposing individual’s belief (Yardi & Boyd, 2010). In this context, an asymmetric difference-based repulsion interaction could be trivially implemented and examined under a variety of scenarios. These values can then be adjusted to examine the sensitivity of Group Polarization to them.
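
A minimal sketch of the second-stage network follows, assuming NetworkX; an undirected Watts-Strogatz graph stands in for the directed social network described above, and the attraction/repulsion parameters are illustrative placeholders rather than calibrated values.

```python
import random
import networkx as nx

# Small-world network with local cliques; parameters are illustrative.
g = nx.watts_strogatz_graph(n=100, k=6, p=0.1)
opinions = {node: random.uniform(-1, 1) for node in g.nodes}

def interact(a, b, attract=0.1, repel=0.05, threshold=0.8):
    """Bounded-confidence-style update with an asymmetric repulsion term:
    agents close in opinion move together, agents far apart move further
    apart (the 'backfire' effect posited by Yardi & Boyd)."""
    diff = opinions[b] - opinions[a]
    if abs(diff) < threshold:
        opinions[a] += attract * diff     # attraction toward the similar
    else:
        opinions[a] -= repel * diff       # repulsion from the dissimilar
    opinions[a] = max(-1.0, min(1.0, opinions[a]))

for _ in range(200):
    for a in g.nodes:
        b = random.choice(list(g.neighbors(a)))
        interact(a, b)

# Varying k (degree) or p (rewiring) lets one probe RQ1b: how network
# structure changes the degree of polarization that emerges.
```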

Once these simulations are in place, questions can be asked that inform the following research questions. For example, different styles of user interface can be proxied by adjusting the sampling process. Traditional ‘ten blue links’ search can be simulated by returning to the sampling agents only nodes that contain similar variable values. How the different agent types react to this ‘information retrieval system’ should indicate whether changes in UI can be expected to work outside of the simulation.

5.3.3.1 - Evaluation Criteria

Using the dimensional property of agents, clustering should be straightforward to observe and to calculate using the cluster analysis described in RQ1. Clusters of agents should have properties that can be analysed statistically, such as the distance between cluster centroids and the average distance between elements within a cluster (Manning, Raghavan, & Schütze, 2008, p. 321-367). Results from runs where the network parameters are varied can then be analyzed using a between-subjects analysis. If 95% confidence is achieved, then this question can be considered to have been satisfactorily answered.
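
A minimal sketch of this cluster analysis, assuming scikit-learn and fabricated two-dimensional belief vectors, is shown below; eps and min_samples are illustrative and would need the ‘tuning’ discussed under RQ1.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def cluster_stats(beliefs, eps=0.3, min_samples=5):
    """Cluster agents by their belief vectors and report each cluster's
    centroid plus the mean within-cluster distance, the quantities
    compared across simulation runs."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(beliefs)
    stats = {}
    for label in set(labels) - {-1}:          # -1 marks noise points
        members = beliefs[labels == label]
        centroid = members.mean(axis=0)
        within = np.linalg.norm(members - centroid, axis=1).mean()
        stats[label] = (centroid, within)
    return labels, stats

# Toy belief vectors: two polarized groups plus scattered noise.
rng = np.random.default_rng(0)
beliefs = np.vstack([rng.normal(-0.8, 0.1, (40, 2)),
                     rng.normal(0.8, 0.1, (40, 2)),
                     rng.uniform(-1, 1, (10, 2))])
labels, stats = cluster_stats(beliefs)
print({k: round(v[1], 3) for k, v in stats.items()})
```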

5.3.4 - RQ2: What are the design considerations for an information browsing display that affords exploring/diversity-seeking behavior?

Figure 20 - RQ2 Components

Recall from Section 2.5 that list-based retrieval presentations support confirmation and avoidance behavior by virtue of their specificity and lack of an ‘information horizon’. List-based information retrieval dates back to the SMART system of the 1960s and is currently implemented almost universally as variations of the ‘ten blue links’.

The issue is to create the capacity to return a more diverse information horizon. Since my current efforts with the testbed from the Lit Review 2 section appear compelling, my plan is to selectively distribute the tool to other users in my student cohort or committee and incorporate them into the design evaluation process. This population represents a set of ‘UX experts’, and their input can be regarded as a heuristic evaluation of the current system with respect to what features work, don’t work, need improvement, or are missing with respect to RQ2 (RQ questions and surveys are in Appendix 1).

In addition to the above evaluation, the current tool will be extended to support its use within this community.

Based on usage and ongoing feedback from the ‘expert user’ sample (as ascertained by surveys and log activity[27] thresholds), the design will be iterated until there is a consensus that this approach provides some level of affordance for diversity-seeking behavior.

5.3.4.1 - Evaluation Criteria

Gaver states that research through design is essentially a study of convergence (Gaver, 2012). I expect that there will be a diverse set of opinions and behaviors with respect to users’ interaction with the tool. This diversity will be captured using the logging capabilities of the tool itself and a set of short surveys[28] that provide a longitudinal dataset of each user’s impressions of the tool and its capabilities. Semi-structured interviews will augment the survey data (TODO details). A bootstrap analysis of the variance over time should indicate whether there is convergence. The advantage of looking at variance rather than means is that users can adopt different interaction patterns as long as each converges to a particular behavior pattern with respect to usage. A decrease of variance within subjects but not between subjects over time would indicate individual patterns of behavior, while a decrease in both would mean that usage patterns are consistent across users.
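
The following is a minimal sketch, assuming NumPy and fabricated usage logs, of the within-subject versus between-subject variance comparison described above; in practice each variance estimate would be wrapped in the bootstrap to obtain confidence intervals.

```python
import numpy as np

def convergence_trend(usage):
    """usage[user][week] = a scalar usage measure (e.g. fraction of
    'explore' interactions that week). Returns per-week within-subject
    and between-subject variance so a downward trend can be inspected."""
    n_users, weeks = usage.shape
    within = [np.mean([np.var(usage[u, :w + 1]) for u in range(n_users)])
              for w in range(weeks)]
    between = [np.var(usage[:, w]) for w in range(weeks)]
    return within, between

# Toy log data: 6 users, 8 weeks of a made-up usage measure.
rng = np.random.default_rng(3)
usage = rng.uniform(0, 1, (6, 8))
within, between = convergence_trend(usage)
print("within-subject variance by week: ", np.round(within, 3))
print("between-subject variance by week:", np.round(between, 3))
```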

5.3.4.2 - Model Feedback

The model consists of two basic parts - the explicit encoding of agent behaviors and the patterns that emerge from their interactions. This stage in the process is focussed on the refinement of explicit agent behaviors. I am expecting that agents may have to switch or blend explorer/confirmer/avoider behaviors based on external factors such as source. For example, as the son of a Holocaust survivor, I tend to avoid anything that involves Nazi concentration camps, though in the case of this proposal, my exploration patterns were quite wide-ranging. This in turn will mean that detection of the agent behavior patterns may have to become more sophisticated. For example, agent behavior may be detectable through interaction effects rather than directly.

5.3.5 - RQ2a: How does an interface that supports diversity-seeking behavior support search?

Figure 21 -  Interface Strawman Design

To answer this question, an interface more focussed on the lay news consumer will need to be developed; the evaluating population will be people who normally search for news on the web. Using the algorithms developed in the RQ2 stage, research through design methods will be used to design a search interface for news. What I would really like to do is to work with a class such as HCC 613/729 to develop a set of candidate interfaces, wire them up, and have them evaluated by experiments that the class determines.

Alternatively, the process will consist of the following steps, again adhering to the process of research through design (Zimmerman, Forlizzi, & Evenson, 2007).

5.3.5.1 - Evaluation Criteria

Evaluating the UI requires determining user behavior and then evaluating it across the traditional and putative UIs. I propose to do this by replicating the evaluation section of the Munson and Resnick study, Presenting Diverse Political Opinions: How and How Much, with both the version developed for this effort and an equivalent of M&R’s version (Munson & Resnick, 2010). Rather than coding open-ended responses, this study will use survey questions and Likert scale responses to describe user information-seeking behavior.

For this evaluation, I would expect that a medium sample size that could be recruited from the university population should be sufficient. However, the client-server system will be set up to scale in case a larger sample size, such as one that can be reached through Mechanical Turk, is required. Comparing user-satisfaction scores[29] between the new interface and the replicated M&R interface should provide a user-centered evaluation of the new interface versus the traditional one. This can be correlated against clickstream and other user behavior to determine how interaction patterns compare to the behavior models defined by the surveys.

5.3.5.2 - Model Feedback

As with RQ2, this is a question that is focussed on agent behavior. Since the output is more data at a finer granularity, a more direct comparison of user vs. agent behavior can be performed. In this case, the affordances of the UI should provide some insight into patterns of usage. The goal of the design is for diversity-seeking users to interact with the explore UI components, confirming users to interact with the ‘more-like-this’ UI components, and avoiding users to interact with the ‘less-like-this’ UI components. These frequencies may be associated with users, or they may be associated with sessions based on context. These new patterns and ratios, once uncovered, can be explicitly added to the agent behaviors.

5.3.6 - RQ3: Is it possible to detect these explorer, confirmer and avoider patterns?

Figure 22 - RQ3 Components

Munson and Resnick used survey questions to capture user response to list-based presentation of news items. Furthermore, each user and each item presented had already been characterized with respect to bias. The question here is whether it is possible to determine explorer, confirmer and avoider patterns purely from information that can be obtained through the human-computer interaction between the user and the presented information.

Hopefully, the types of interactions that can produce detectable patterns can be explored using the model developed in RQ1. I expect that by this time it should be possible to describe synthetic agents that exhibit the explorer, confirmer, and avoider patterns, and that the traces of their interactions in the simulated environment should be identifiable using statistical methods. These sorts of patterns are what I will be looking for first. If other patterns are found instead, then the model will be refined to reflect these actual user patterns. (TODO - rework bullets to make more clear)

5.3.6.1 - Evaluation Criteria

The evaluation depends on which approach winds up being successful. If statistical analysis of the user data proves effective, then a repeated-measures bootstrap analysis that returns 95% confidence of a difference between explorer and the other patterns would certainly be a satisfactory result. Since the difference between confirmer and avoider appears to be more subtle, distinguishing between these two may be beyond the power of a study that can reasonably be performed.

If a machine learning approach is used, then the initial pass at evaluation will be to use held-back test data from the RQ2a study to evaluate the performance of a classifier trained on the rest of the data. For this to be effective, the data tagged with each of the patterns will need to be roughly equivalent in quantity. Munson and Resnick indicate that the explorer pattern is less common (approximately 25%) than the confirmer/avoider patterns. This means that for effective training and testing to occur, getting a sufficient quantity of explorer data may well be the limiting factor, so balancing the training data may become an issue. Batista, Prati, and Monard (2004) show that oversampling and cleaning can be used to address these concerns.
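
The following is a minimal sketch, assuming scikit-learn and entirely synthetic clickstream features, of the oversample-then-classify workflow; simple random oversampling stands in for the richer balancing methods that Batista et al. describe.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.utils import resample

# Toy clickstream features (e.g. map interactions, 'more-like' clicks)
# and labels: 1 = explorer (~25% of users, per Munson & Resnick), 0 = other.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 4))
y = (rng.uniform(size=400) < 0.25).astype(int)
X[y == 1] += 0.8                                   # give explorers a signal

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Random oversampling of the minority (explorer) class so the training
# set is balanced before fitting the classifier.
minority = np.where(y_tr == 1)[0]
majority = np.where(y_tr == 0)[0]
upsampled = resample(minority, replace=True, n_samples=majority.size,
                     random_state=0)
idx = np.concatenate([majority, upsampled])

clf = LogisticRegression().fit(X_tr[idx], y_tr[idx])
print("held-out accuracy:", clf.score(X_te, y_te))
```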

Successfully answering this question using machine learning does not have a cookbook answer. Machine learning may be able to determine whether there is a pattern to be extracted and which variables would provide the most sensitivity. Under ideal conditions, a better understanding of what to look for and how to train the systems can then be implemented. This iteration can conceivably continue until the problem is sufficiently well described that machine learning is no longer needed and the patterns can be extracted directly. That being said, 90% effectiveness in classification would certainly be sufficient, while 60% would not.

5.3.6.2 - Model Feedback

Up to this point, the model should reflect the best intuition with respect to behavior of agents. As such, the model should be adjusted so that the evaluation measures performed above in section 5.3.6.1 should occur with the same frequencies when applied to model-generated data. This means either modifying the existing explorer, avoider, and confirmer agents, or developing new agents that reflect the observed behavior. The goal is that the agents reflect separable classes of behavior on the part of the users. Once the agents in the system produce such detectable patterns of behavior, the model can be considered effectively updated.

5.3.7 - RQ4: Is it possible to use these patterns and the principle of group polarization to isolate trustworthy information as defined in the model of RQ1?

Figure 23 - RQ4 Components

Since group polarization states that enclave deliberation causes opinion shift, it may be possible to compare opinion shift in diversity-seeking and non-diversity-seeking populations with respect to the documents they consume. A pattern of progressive drift in the terms of the read corpus by non-diversity-seeking users, when compared to diversity-seeking users, could provide a scalable, non-domain-specific way to vet documents, particularly documents that follow the journalistic affordances exploited by the NewsCube researchers.

5.3.7.1 - Evaluation Criteria

As with other questions that are based on deciding ground truth, comparison with the model created for RQ1 will be a starting point. Since agent ‘beliefs’ are made up of textual ‘statements’, a statement/agent matrix can be constructed so that the patterns of centrality in the simulated and actual data can be compared. This is a two-way street, in that the model should be able to inform the direction of analysis on the user results, while the results should in turn refine the model.
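
A minimal sketch of this comparison follows, assuming NumPy and random stand-in matrices; the statement/agent incidence matrices, the co-occurrence construction, and the power-iteration centrality are illustrative choices rather than the final analysis.

```python
import numpy as np

def statement_centrality(sa):
    """sa is a statement x agent incidence matrix (1 if an agent holds a
    statement). Eigenvector centrality of the statement co-occurrence
    graph is computed by power iteration, so the same measure can be
    applied to simulated and observed data."""
    co = sa @ sa.T                       # statement-statement co-occurrence
    np.fill_diagonal(co, 0)
    v = np.ones(co.shape[0])
    for _ in range(100):
        v = co @ v
        norm = np.linalg.norm(v)
        if norm == 0:
            break
        v /= norm
    return v

sim_matrix = np.random.default_rng(0).integers(0, 2, (20, 50))   # model run
obs_matrix = np.random.default_rng(1).integers(0, 2, (20, 50))   # user data
corr = np.corrcoef(statement_centrality(sim_matrix),
                   statement_centrality(obs_matrix))[0, 1]
print("centrality pattern correlation:", round(corr, 3))
```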

Successfully answering this question has several parts. Showing that there is a difference between the populations of users (explorers, confirmers and avoiders) with respect to the use of known trustworthy sources would be a significant outcome. As with Munson and Resnick, I expect that it will be easier to find a difference between explorer (diversity-seeking) behavior and the others. In this case, the populations that will be examined are the terms that the users interact with. The corpus can be tagged with respect to trustworthiness, and then the term populations from the different behavior patterns can be compared against the source material using a repeated-measures, between-subjects approach. The final result should ideally show this difference (at least between explorer and other) in the user data and replicate the patterns in the model.

5.3.7.2 - Model Feedback

We always know what is trustworthy knowledge in the model because we define it to be so. In this particular study of historic news stories, we have a similar situation in that the information in the corpus has been vetted by decades of historical research. As such, at this point there is a functional correspondence between information sources in the model and the study, as well as a correspondence between the model agents and our study subjects. A first pass at refining model correspondence will be to ensure that the proportions of the overall explorer/avoider/confirmer patterns found in the study are replicated in the model. The preferred way to do this would be to ensure that the observed percentages of explorer/confirmer/avoider/unclassifiable users are replicated in the agents. Additionally, the ratios of different browsing behaviors within agents can also be replicated. These adjustments should in turn ramify into the emergence of similar overall observed patterns of behavior.

5.3.8 - RQ4a: Is it possible to do this at scale with domain independence?

Domain independence in this context means that the meaning of the terms and the content of the documents do not need to be known by the ranking system beyond counts. If some weighting has to be given to particular documents, then complete domain independence begins to be lost. If the entire corpus has to be manually read and tagged into a set of structured data, then there will be no domain independence. The more the text of the documents has to be manually processed, the less the system can scale as well. Other issues, such as those involved in calculating LSI and centrality across larger corpora, will need to be evaluated.

The determination of scalability will depend on the amount of domain independence and the sizing and timing of the system with respect to corpora. This will be a somewhat qualitative answer with supporting data.

5.3.8.1 - Evaluation Criteria

The answer to this question will come from the amount of ‘special-case’ handling (TODO - How, why, etc - and add this to the slide!)  that is required to answer RQ4. To a degree, this is a qualitative judgement - if no additional human intervention is required and trustworthiness can be determined entirely from the behavior of terms within a language model, then this indicates a high level of domain independence. As more and more contextually aware, ‘special-case’ processing is required, the claim of domain independence is diminished.

Scalability is a different question, but also one that is substantially qualitative. The heuristic generally applied here is the use of human assets to process the information and determine an answer. In this case, the context is slightly different, in that humans are always at the core of the process; they just aren’t being tasked to do anything. Their contribution comes from their browsing patterns and the system’s ability to determine which users or sessions have the characteristics of effective fact checking. So, as with the domain independence question, the claim for scalability is strongest when no additional users need to be employed as special-purpose fact checkers and weakest if nothing can be done without special-purpose tasking.

5.3.9 - RQ4b: Is it possible to use this principle in turn to re-rank documents, particularly news documents with respect to trustworthiness?

The task here would be to determine a way to re-weight query results based on trustworthiness. If the top results for a collection of ‘trusted browsers’ can be collected, then these results would need to be able to re-weight a query result so that pages closely associated with them are pulled up while non-associated documents are pushed down.

5.3.9.1 - Evaluation Criteria

The goal here is to see if the strawman LSI (or other[30]?) ranking of a given corpus is significantly different from a ‘browsed’ ranking. Conceptually, search in this context is a two-step process. The first step consists of ranking the entire corpus using centrality. The centrality eigenvector is then used to reweight the search result. The re-incorporation of browsing information adds an additional reweighting step, so a comparison of the results with and without the additional step should be straightforward.
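
The following is a minimal sketch, assuming NumPy, of the two-step reweighting described above; the baseline scores, the linear blending, and the alpha/beta parameters are illustrative placeholders for the actual reranking function.

```python
import numpy as np

def rerank(relevance, centrality, browse_trust, alpha=0.5, beta=0.5):
    """Two-step reweighting sketch: baseline relevance scores are first
    scaled by corpus centrality, then by a browsing-derived trust score.
    alpha/beta control how strongly each step reweights the baseline."""
    step1 = relevance * (1 - alpha + alpha * centrality)
    step2 = step1 * (1 - beta + beta * browse_trust)
    return np.argsort(-step1), np.argsort(-step2)

relevance    = np.array([0.9, 0.8, 0.7, 0.6])   # e.g. LSI query scores
centrality   = np.array([0.2, 0.9, 0.5, 0.7])   # corpus centrality eigenvector
browse_trust = np.array([0.1, 0.9, 0.8, 0.9])   # from 'trusted browser' sessions

baseline, browsed = rerank(relevance, centrality, browse_trust)
print("centrality-only ranking:", baseline)
print("browsed ranking:        ", browsed)
```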

For this evaluation to occur, a set of queries that refer to sections of a (manually vetted for accuracy) corpus known a priori to have browsing data associated with them needs to be constructed. For the purposes of this discussion, we will assume that we are using ‘Red Scare’ news stories from the 1950s, and an example of a query might be “loyalty oath”.

Searches will be run against each ranking system, and the population of search results will be analyzed for accuracy with respect to the initial vetting process. The ‘accuracy scores’ can then be compared using a between-subjects analysis. If the ‘browsed’ ranking system returns highly rated items at a frequency that exceeds chance, then the answer to RQ4b can be shown to be true.

5.3.9.2 - Model Feedback

Reranking of the information in a corpus depends on the centrality of the information and the weight of the rating. The more central a piece of information is, the harder it is to move; the more central the rating, the more powerful it is. A well-grounded rating should be able to significantly rerank an item of low centrality, but not move a central item by much. Based on the results of this part of the study, we should have insight into the ‘fact-checking’ network and how it interacts with the original ‘news statement network’. These ratios should be incorporated back into the model not only to improve the accuracy of the model, but also to let the model perform as a sophisticated ‘sensitivity test’. Once the model’s validity is determined, the effects of larger and smaller populations of fact-checking behaviors in the agent population can be explored, with an eye towards determining the minimal thresholds at which an arbitrarily scaled system should begin to function effectively.

6 - Implications for Design

Let’s explore a potential instantiation of the concepts discussed in this proposal. This assumes that the hypotheses of chapter 5 are realized, and that an effective, scalable system can be implemented.

The fundamental premise of this proposal is that the tendencies for users to explore or exploit (confirm/avoid) can 1) be encouraged and identified through interface design that affords and targets such activity, and 2) inform the detection of information bubbles and antibubbles by analyzing the behavior of many individuals. If proven successful, these insights can practically guide the development of new, less-opaque news reading tools. At the conclusion of the full research project I anticipate being able to generate implications for the design of a proof-of-concept system similar to the one mocked up in this section.


Figure 24 - News Aggregator Mockup 

As an expression of the material discussed in this proposal, this mockup (Figure 24) is geared towards the casual consumer of news, as opposed to the more ‘research-oriented’ system described in chapter 4. This is similar to current news aggregators such as Inkl, Digg, and Google News. Innately, the format takes advantage of traditional journalistic practices by showing the headline and the first paragraph or so of the story. This allows the reader to evaluate whether or not to read the story before clicking on the link, which will open the story in a new tab. Traditional search behavior is supported at the top of the screen, with additional ranking criteria exposed. In the case of the mockup, the choices are to rank by story popularity and then trustworthiness. Other user-modifiable options could include timeliness, region, area of interest, and others.

Explore, confirm, and avoid behaviors are afforded in the display in a variety of ways. Starting at the top right of the display is a ‘map’ that shows the relationships of the displayed stories. In the case of the mockup, the map is from Andrews et al.’s VisIslands visualization from their xFIND system. With respect to an implementation, many of the features of the xFIND system, such as its gatherer/indexer/broker framework, may be appropriate for this system (Andrews, Gutl, Moser, Sabol, & Lackner, 2001). A detail of the map view is shown below:


Figure 25 - Map Detail of Mockup

To afford detailed exploration, the map can be expanded out of its inset. Selecting areas and items in the map would cause new items to be displayed in the article collection below. Such interaction would be recorded by the server as ‘explorer activity’ for the current session.

Confirmation and avoiding behavior would be supported by the information bar above each story:


Figure 26: News Story InfoBar

There are three sections of the infobar to describe. On the left is the icon and name of the news source. Clicking on it will bring the user to the ‘about’ page of the organization, which in this case states that the “World News Daily Report is an American Jewish Zionist newspaper based in Tel Aviv and dedicated on covering biblical archeology news and other mysteries around the Globe”, which, considering the story is about an affair between Yoko Ono and Hillary Clinton, might give a reader pause. Moving to the right, the next section consists of a “more like” green check and a “less like” red x. Clicking on the check would keep the desired story and find more like it; this would be logged as a confirming behavior. Clicking the red x would cause the current article to be deleted from the display, along with similar articles; this would be logged as an avoiding behavior. Lastly, the anonymous, aggregated contribution of other users’ browsing behavior would be shown in the bubble/antibubble section. Clicking on either of these links would open a tab with a similar view of the articles that led to the bubble/antibubble percentage calculation. It is not clear how these actions should be logged, though in the instantiation described, it should be possible to see whether such interaction contains a bias towards any of the mentioned behaviors or points to something new.
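
A minimal, hypothetical sketch of how these infobar and map interactions might be logged as behavior signals on the server side is shown below; the event names and the mapping to behavior categories are mine, chosen only to illustrate the logging idea.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical mapping from infobar / map interactions to the behavior
# categories used throughout this proposal.
EVENT_TO_BEHAVIOR = {
    "map_select": "explore",
    "more_like_this": "confirm",
    "less_like_this": "avoid",
    "bubble_link": "unclassified",
    "antibubble_link": "unclassified",
}

@dataclass
class SessionLog:
    session_id: str
    counts: Counter

    def record(self, event):
        """Translate a raw UI event into a behavior count."""
        self.counts[EVENT_TO_BEHAVIOR.get(event, "unclassified")] += 1

    def profile(self):
        """Return the session's behavior mix as proportions."""
        total = sum(self.counts.values()) or 1
        return {k: v / total for k, v in self.counts.items()}

log = SessionLog("session-42", Counter())
for e in ["map_select", "more_like_this", "map_select", "less_like_this"]:
    log.record(e)
print(log.profile())
```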

For completeness, a strawman implementation of the server side of this instantiation is shown below:


Figure 27 - Strawman Server-side Configuration

This implementation shows that all communication for the browser is mediated through dedicated servers. This allows user-selected web pages to be lazily scraped and added to the database for later evaluation. At these moderate scales, search would be handled through Google’s Custom Search Engine interface, which can be configured to look for particular domains, sites and schemas. Additional information, such as inbound links that could be useful for evaluating bubble characteristics, is available from providers such as Semrush.com, who periodically crawl and publish searchable backlink data. This information is then fed back to the user interface in the form of the (e.g.) VisIslands map and the bubble/antibubble statistics.

7 - Discussion

This proposal attempts to lay out a plan for determining if a larger information horizon in a search display could lead to deeper browsing behavior on the part of diversity-seeking users.

The foundational principle laid out by Tefko Saracevic is the idea that relevant information may not include pertinent information, and that pertinent information need not be relevant to a particular search. By presenting an ‘information horizon’ display in search results, the user is exposed to additional, potentially pertinent information. This view should be traversable using map-like affordances to support exploring an information space.

This in turn may lead to a mechanism for inferring the trustworthiness of sources and documents in a corpus. These trustworthy documents may in turn allow reranking of a search result that is only indirectly related to the trustworthy documents found through recognisably trustworthy behaviors.

In the fantasy world where everything works even better than planned, this principle provides a basis for a domain independent way of evaluating trustworthy information on the web, informed by recognisably trustworthy browsing behaviors.

8 - References

Alvarez, R. M. (2004). Party system compactness: Measurement and consequences. Political Analysis, 12(1), 46-62. doi:10.1093/pan/mph003

Alves, J. (2013, June 23). Unintentional Knowledge - What we find when we're not looking. Retrieved from http://www.chronicle.com/article/unintentional-knowledge/139891

American Society of News Editors. (2016). Ethics code index. Retrieved October 4, 2016, from http://asne.org/content.asp?contentid=236

Andrews, K., Gutl, C., Moser, J., Sabol, V., & Lackner, W. (2001). Search result visualisation with xFIND. Proceedings Second International Workshop on User Interfaces in Data Intensive Systems. UIDIS 2001. doi:10.1109/uidis.2001.929925

Badre, D., Doll, B., Long, N., & Frank, M. (2012). Rostrolateral prefrontal cortex and individual differences in uncertainty-driven exploration. Neuron, 73(3), 595-607. doi:10.1016/j.neuron.2011.12.025

Bar-Ilan, J., Keenoy, K., Levene, M., & Yaari, E. (2009). Presentation bias is significant in determining user preference for search results-A user study. J. Am. Soc. Inf. Sci, 60(1), 135-149. doi:10.1002/asi.20941

Barry, D., Barstow, D., Glater, G., Liptak, A., & Steinberg, J. (2003, May 11). CORRECTING THE RECORD; Times reporter who resigned leaves long trail of deception - The New York Times. Retrieved from http://www.nytimes.com/2003/05/11/us/correcting-the-record-times-reporter-who-resigned-leaves-long-trail-of-deception.html?scp=1&sq=jayson+blair&st=cse&pagewanted=all&_r=0

Batista, G. E., Prati, R. C., & Monard, M. C. (2004). A study of the behavior of several methods for balancing machine learning training data. ACM SIGKDD Explorations Newsletter, 6(1), 20. doi:10.1145/1007730.1007735

Beale, R. (2007). Supporting serendipity: Using ambient intelligence to augment user exploration for data mining and web browsing. International Journal of Human-Computer Studies, 65(5), 421-433. doi:10.1016/j.ijhcs.2006.11.012

Binder, S. (2014, May 1). Polarized we govern?. Retrieved from https://www.brookings.edu/wp-content/uploads/2016/06/BrookingsCEPM_Polarized_figReplacedTextRevTableRev.pdf

Bonabeau, E. (2002). Agent-based modeling: Methods and techniques for simulating human systems. Proceedings of the National Academy of Sciences, 99(Supplement 3), 7280-7287. doi:10.1073/pnas.082080899

Broder Van Dyke, M. (2016, August 31). Hurricane warning issued as storms take aim at florida and hawaii - BuzzFeed News. Retrieved from https://www.buzzfeed.com/mbvd/hurricane-watches-warnings-issued-as-storms-take-aim-at-flor?utm_term=.ts99dGP4Dr#.lrqNm7Xk5y

Brunnermeier, M., & Oehmke, M. (2012). Bubbles, financial crises, and systemic risk. doi:10.3386/w18398

Carr, P. L. (2015). Serendipity in the stacks: Libraries, information architecture, and the problems of accidental discovery. College & Research Libraries, 76(6), 831-842. doi:10.5860/crl.76.6.831

Cha, M., Kwak, H., Rodriguez, P., Ahn, Y., & Moon, S. (2007). I tube, you tube, everybody tubes. Proceedings of the 7th ACM SIGCOMM conference on Internet measurement - IMC '07, 1-14. doi:10.1145/1298306.1298309

Cheng, X., Dale, C., & Liu, J. (2008). Statistics and social network of YouTube videos. 2008 16th Interntional Workshop on Quality of Service. doi:10.1109/iwqos.2008.32

Ciampaglia, G. L., Shiralkar, P., Rocha, L. M., Bollen, J., Menczer, F., & Flammini, A. (2015). Computational fact checking from knowledge networks. PLOS ONE, 10(6), e0128193. doi:10.1371/journal.pone.0128193

Cisco. (2016, June 2). The zettabyte era—trends and analysis - Cisco. Retrieved from http://www.cisco.com/c/en/us/solutions/collateral/service-provider/visual-networking-index-vni/vni-hyperconnectivity-wp.html

Cohen, J. D., McClure, S. M., & Yu, A. J. (2007). Should I stay or should I go? How the human brain manages the trade-off between exploitation and exploration. Philosophical Transactions of the Royal Society B: Biological Sciences, 362(1481), 933-942. doi:10.1098/rstb.2007.2098

Cook, S., Conrad, C., Fowlkes, A. L., & Mohebbi, M. H. (2011). Assessing Google Flu trends performance in the United States during the 2009 influenza virus A (H1N1) pandemic.PLoS ONE, 6(8), e23610. doi:10.1371/journal.pone.0023610


De Gemmis, M., Lops, P., Semeraro, G., & Musto, C. (2015). An investigation on the serendipity problem in recommender systems. Information Processing & Management,51(5), 695-717. doi:10.1016/j.ipm.2015.06.008

Deffuant, G., Amblard, F., & Weisbuch, G. (2004). Modelling group opinion shift to extreme : the smooth bounded confidence model. Retrieved from https://arxiv.org/ftp/cond-mat/papers/0410/0410199.pdf

Dieckhoff, A., & Jaffrelot, C. (2006). For a theory of nationalism. In Revisiting nationalism: Theories and processes (pp. 10-61). New York, NY: Palgrave Macmillan.

Efron, B., & Tibshirani, R. (1994). An introduction to the bootstrap. New York, NY: Chapman & Hall.

Eil, D., & Rao, J. M. (2011). The good news-bad news effect: Asymmetric processing of objective information about yourself. American Economic Journal: Microeconomics,3(2), 114-138. doi:10.1257/mic.3.2.114

Elsbach, K. D., & Bhattacharya, C. B. (2001). Defining who you are by what you're not: Organizational disidentification and the National Rifle Association. Organization Science, 12(4), 393-413. doi:10.1287/orsc.12.4.393.10638

Epstein, J. M. (2006). Generative social science: Studies in agent-based computational modeling. Princeton, NJ: Princeton University Press.

Farmer, L. S. (2016). Library space: Its role in research. The Reference Librarian, 57(2), 87-99. doi:10.1080/02763877.2016.1120620

Garrett, R. K., Weeks, B. E., & Neo, R. L. (2016). Driving a wedge between evidence and beliefs: How online ideological news exposure promotes political misperceptions.Journal of Computer-Mediated Communication, 21(5), 331-348. doi:10.1111/jcc4.12164

Gaver, W. (2012). What should we expect from research through design? Proceedings of the 2012 ACM annual conference on Human Factors in Computing Systems - CHI '12, 937-946. doi:10.1145/2207676.2208538

Gedikli, F. (2013). Recommender systems and the social web: Leveraging tagging data for recommender systems (pp. 15-21). Wiesbaden, Germany: Springer Vieweg.

Ghosh, A., & McAfee, P. (2011). Incentivizing high-quality user-generated content. Proceedings of the 20th international conference on World wide web - WWW '11, 137-146. doi:10.1145/1963405.1963428

Goodwin, D. K. (2005). Team of rivals: The political genius of Abraham Lincoln. New York, NY: Simon & Schuster.

Gottfried, J., & Shearer, E. (2016). News use across social media platforms 2016 | Pew Research Center. Retrieved from Pew Research Center website: http://www.journalism.org/2016/05/26/news-use-across-social-media-platforms-2016/

Grabe, M. E., & Myrick, J. G. (2016). Informed citizenship in a media-centric way of life.Journal of Communication, 66(2), 215-235. doi:10.1111/jcom.12215

Gray, E., & Tall, D. (2007). Abstraction as a natural process of mental compression.Mathematics Education Research Journal, 19(2), 23-40. doi:10.1007/bf03217454

Haciyakupoglu, G., & Zhang, W. (2015). Social media and trust during the gezi protests in turkey. Journal of Computer-Mediated Communication, 20(4), 450-466. doi:10.1111/jcc4.12121

Haveliwala, T. H. (2002). Topic-sensitive PageRank. Proceedings of the eleventh international conference on World Wide Web - WWW '02, 517-526. doi:10.1145/511446.511513

Hobson, K., & Niemeyer, S. (2012). "What sceptics believe": The effects of information and deliberation on climate change scepticism. Public Understanding of Science, 22(4), 396-412. doi:10.1177/0963662511430459

In Gieseking, J. J., In Mangold, W., In Katz, C., In Low, S. M., & In Saegert, S. (2014). The production of space. In The people, place, and space reader (pp. 289-293). Oxford, England: Blackwell.

Jacob, C., & Dahl, E. H. (2006). The sovereign map: Theoretical approaches in cartography throughout history. Chicago, IL: University of Chicago Press.

Key, V. O., & Cummings, M. C. (1966). The responsible electorate: Rationality in presidential voting, 1936-1960. Cambridge, MA: Belknap Press of Harvard University Press.

Korn, M., & Voida, A. (2015). Creating friction: Infrastructuring civic engagement in everyday life. Aarhus Series on Human Centered Computing, 1(1), 12. doi:10.7146/aahcc.v1i1.21198

Kress, G. R. (2010). Shaping the domain of meaning. In Multimodality: A social semiotic approach to contemporary communication (pp. 32-37). London, England: Routledge.

Kriplean, T., Morgan, J., Freelon, D., Borning, A., & Bennett, L. (2012). Supporting reflective public thought with ConsiderIt. Proceedings of the ACM 2012 conference on Computer Supported Cooperative Work - CSCW '12, 1-10. doi:10.1145/2145204.2145249

Kurland, O., & Lee, L. (2010). PageRank without hyperlinks. ACM Transactions on Information Systems, 28(4), 1-38. doi:10.1145/1852102.1852104

Landay, J. A., & Myers, B. A. (1995). Interactive sketching for the early stages of user interface design. Proceedings of the SIGCHI conference on Human factors in computing systems - CHI '95, 43-50. doi:10.1145/223904.223910

Lande, R. (1981). Models of speciation by sexual selection on polygenic traits. Proceedings of the National Academy of Sciences, 78(6), 3721-3725. doi:10.1073/pnas.78.6.3721

Le Bon, G. (1996). The crowd: A study of the popular mind. Salt Lake City, UT: Project Gutenberg Literary Archive Foundation.

Lelkes, Y., Sood, G., & Iyengar, S. (2015). The hostile audience: The effect of access to broadband internet on partisan affect. American Journal of Political Science, 1 - 16. doi:10.1111/ajps.12237

Lewandowsky, S., Ecker, U. K., Seifert, C. M., Schwarz, N., & Cook, J. (2012). Misinformation and its correction: Continued influence and successful debiasing. Psychological Science in the Public Interest, 13(3), 106-131. doi:10.1177/1529100612451018

Lima, M. (2011). Visual complexity: Mapping patterns of information. New York: Princeton Architectural Press.

Lukoianova, T., & Rubin, V. L. (2014). Veracity roadmap: Is big data objective, truthful and credible? Advances in Classification Research Online, 24(1), 4. doi:10.7152/acro.v24i1.14671

Macal, C., & North, M. (2005). Tutorial on agent-based modeling and simulation. Agent-based Modeling and Simulation, 2-15. doi:10.1057/9781137453648.0004

Madison, J. (1789, September 25). Bill of rights transcript text. Retrieved from http://www.archives.gov/exhibits/charters/bill_of_rights_transcript.html

Manning, C. D., Raghavan, P., & Schütze, H. (2008). Introduction to information retrieval. New York, NY: Cambridge University Press.

Maraniss, D. (1981, April 16). Post reporter's pulitzer prize is withdrawn - The Washington Post. Retrieved from https://www.washingtonpost.com/archive/1981/04/16/post-reporters-pulitzer-prize-is-withdrawn/9cf4b4dc-c9a9-438d-8fa1-c2e1cf53fcf9/

Mark, D. M., Freksa, C., Hirtle, S. C., Lloyd, R., & Tversky, B. (1999). Cognitive models of geographical space. International Journal of Geographical Information Science, 13(8), 747-774. doi:10.1080/136588199241003

McNee, S. M., Riedl, J., & Konstan, J. A. (2006). Being accurate is not enough. CHI '06 extended abstracts on Human factors in computing systems - CHI EA '06, 1097- 1101. doi:10.1145/1125451.1125659

Media Insight Project. (2016). A new understanding: What makes people trust and rely on news. Retrieved from American Press Institute website: https://www.americanpressinstitute.org/publications/reports/survey-research/trust-news/

Meier, P. (2008, October 23). Crisis mapping Kenya’s election violence | iRevolutions [Web log post]. Retrieved from https://irevolutions.org/2008/10/23/mapping-kenyas-election-violence/

Merriam Webster. (n.d.). Apparatchik | Definition of Apparatchik by Merriam-Webster. Retrieved from http://www.merriam-webster.com/dictionary/apparatchik

Merriam, S. B. (2009). Qualitative research: A guide to design and implementation. San Francisco, CA: Jossey-Bass.

Momeni, E., Braendle, S., & Adar, E. (2015). Adaptive faceted ranking for social media comments. Lecture Notes in Computer Science, 789-792. doi:10.1007/978-3-319-16354-3_86

Momeni, E., Cardie, C., & Diakopoulos, N. (2015). A survey on assessment and ranking methodologies for user-generated content on the web. CSUR, 48(3), 1-49. doi:10.1145/2811282

Moscovici, S., Doise, W., & Halls, W. D. (1994). Conflict and consensus: A general theory of collective decisions. London, England: Sage.

Munson, S. A., & Resnick, P. (2010). Presenting diverse political opinions: How and how much. Proceedings of the 28th international conference on Human factors in computing systems - CHI '10. doi:10.1145/1753326.1753543

Neustaedter, C., & Sengers, P. (2012). Autobiographical design in HCI research. Proceedings of the Designing Interactive Systems Conference on - DIS '12, 514-523. doi:10.1145/2317956.2318034

New York Times. (2003). Guidelines on integrity. Retrieved from http://www.nytco.com/wp-content/uploads/Guidelines-on-Integrity.pdf

Norman, D. A. (1999). Affordance, conventions, and design. Interactions, 6(3), 38-43. doi:10.1145/301153.301168

Ohlheiser, A. (2016, August 29). Three days after removing human editors, Facebook is already trending fake news - The Washington Post. Retrieved from https://www.washingtonpost.com/news/the-intersect/wp/2016/08/29/a-fake-headline-about-megyn-kelly-was-trending-on-facebook/

Ortega, R., Avery, J., & Frederick, R. (2000). Search query autocompletion (US Patent 6564213). Retrieved from United States Patent Office website: http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&p=1&u=%2Fnetahtml%2FPTO%2Fsearch-bool.html&r=1&f=G&l=50&co1=AND&d=PTXT&s1=6564213.PN.&OS=PN/6564213&RS=PN/6564213

Page, L., & Brin, S. (1999). The PageRank citation ranking: Bringing order to the web (SIDL-WP-1999-0120). Retrieved from Stanford InfoLab Publication Server website: http://ilpubs.stanford.edu:8090/422/1/1999-66.pdf

Panhandle PBS. (2012). Days of dust. Retrieved from http://www.kacvtv.org/dustbowl/newspaperclippings.php

Pariser, E. (2012). The filter bubble: How the new personalized web is changing what we read and how we think. New York, NY: Penguin Books.

Park, S., Kang, S., Chung, S., & Song, J. (2009). NewsCube. Proceedings of the 27th international conference on Human factors in computing systems - CHI 09, 443-452. doi:10.1145/1518701.1518772

Pope, E. (2016, September 1). On conducting a literature review with ATLAS.ti [Web log post]. Retrieved from http://atlasti.com/2016/09/01/litreview/?utm_source=CleverReach&utm_medium=email&utm_campaign=Newsletter+2016%2F6+sept&utm_content=Mailing_10575821

Potter, W. J., & Levine‐Donnerstein, D. (1999). Rethinking validity and reliability in content analysis. Journal of Applied Communication Research, 27(3), 258-284. doi:10.1080/00909889909365539

Pulitzer Prize Board. (2016). The 2010 Pulitzer Prize winner in investigative reporting. Retrieved from http://www.pulitzer.org/winners/barbara-laker-and-wendy-ruderman

Robertson, A., & Olson, S. (2013, July 30). Sensing and shaping emerging conflicts | United States Institute of Peace. Retrieved from http://www.usip.org/publications/sensing-and-shaping-emerging-conflicts

Roehner, B. (2005). Consensus formation: The case of using cell phones while driving. Retrieved from https://arxiv.org/pdf/physics/0502046v1.pdf

Rolling Stone. (2014, December 5). A note to our readers - Rolling Stone. Retrieved from http://www.rollingstone.com/culture/news/a-note-to-our-readers-20141205

Salton, G. (1964). A document retrieval system for man-machine interaction. Proceedings of the 1964 19th ACM national conference on -. doi:10.1145/800257.808923

Sanchez, J. (2010, April 7). Epistemic closure, technology, and the end of distance [Web log post]. Retrieved from http://www.juliansanchez.com/2010/04/07/epistemic-closure-technology-and-the-end-of-distance/

Saracevic, T. (1975). RELEVANCE: A review of and a framework for the thinking on the notion in information science. Journal of the American Society for Information Science,26(6), 321-343. doi:10.1002/asi.4630260604

Schultz, J. (2009). Reviving the fourth estate: Democracy, accountability and the media. Cambridge, United Kingdom: Cambridge University Press.

Scott, J., & Wasserman, S. (2005). Models and methods in social network analysis. P. Carrington (Ed.). New York, NY: Cambridge University Press.

Siegal, A. M., & Connolly, W. G. (1999). The New York Times manual of style and usage. New York, NY: Times Books.

Siemens, G. (2005). Connectivism: A learning theory for the digital age. lnstructional Technology and Distance Learning, 2(1). Retrieved from http://www.itdl.org

Simon, H. A. (1996). The sciences of the artificial. Cambridge, MA: MIT Press.

Sonnenwald, D., & Wildemuth, B. (2001). Investigating information seeking behavior using the concept of information horizons. ALISE Methodology Paper Competition. Retrieved from http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.26.2993

Stockton, N. (2013, July 29). Get to know a projection: Mercator | WIRED. Retrieved from https://www.wired.com/2013/07/projection-mercator/

Sunstein, C. R. (2002). The law of group polarization. Journal of Political Philosophy, 10(2), 175-195. doi:10.1111/1467-9760.00148

Surowiecki, J. (2004). The wisdom of crowds: Why the many are smarter than the few and how collective wisdom shapes business, economies, societies, and nations. New York, NY: Doubleday.

Sutcliffe, S. (2013, April 10). What happens to a Lamborghini Gallardo when you switch traction control off? [Video file]. Retrieved from https://www.youtube.com/watch?v=I86mQZJWabU&feature=youtu.be&t=93

Tapia, A., Bajpai, K., Jansen, B., Yen, J., & Giles, L. (2011). Seeking the trustworthy tweet: Can microblogged data fit the information needs of disaster response and humanitarian relief organizations. Proceedings of the 8th International ISCRAM Conference, 1-10. Retrieved from http://www.iscramlive.org/ISCRAM2011/proceedings/papers/161.pdf

Topping, S., & Gissler, S. (2016). The Pulitzer Prizes. Retrieved from http://www.pulitzer.org/page/history-pulitzer-prizes

Tufeki, Z. (2016, May 19). The real bias built in at Facebook - The New York Times. Retrieved from http://www.nytimes.com/2016/05/19/opinion/the-real-bias-built-in-at-facebook.html

Ushahidi. (2008). About Ushahidi - Ushahidi. Retrieved October 4, 2016, from https://www.ushahidi.com/about

Valente, T., Coronges, K., Lakon, C., & Costenbader, E. (2010). How correlated are network centrality measures? (PMC2875682) Retrieved from National Institutes of Health website: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2875682/

Van Aelst, P., & Walgrave, S. (2016). Information and arena: The dual function of the news media for political elites. Journal of Communication, 66(3), 496-518. doi:10.1111/jcom.12229

Vrij, A., & Granhag, P. A. (2014). Eliciting information and detecting lies in intelligence interviewing: An overview of recent research. Applied Cognitive Psychology, 28(6), 936-944. doi:10.1002/acp.3071

Webber, W., Moffat, A., & Zobel, J. (2010). A similarity measure for indefinite rankings. ACM Transactions on Information Systems, 28(4), 1-38. doi:10.1145/1852102.1852106

Weber, I., & Castillo, C. (2010). The demographics of web search. Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval - SIGIR '10, 523-530. doi:10.1145/1835449.1835537

Weidmann, N. B. (2014). On the accuracy of media-based conflict event data. Journal of Conflict Resolution, 59(6), 1129-1149. doi:10.1177/0022002714530431

Wikipedia. (2016, September 23). Zettabyte - Wikipedia, the free encyclopedia. Retrieved October 4, 2016, from https://en.wikipedia.org/wiki/Zettabyte

Wilson, R. C., Geana, A., White, J. M., Ludvig, E. A., & Cohen, J. D. (2014). Humans use directed and random exploration to solve the explore–exploit dilemma. Journal of Experimental Psychology: General, 143(6), 2074-2081. doi:10.1037/a0038199

Winston, P. (2014, January 10). Reasoning: Goal trees and rule-based expert systems [Video file]. Retrieved from https://www.youtube.com/watch?v=leXa7EKUPFk#t=16m43s

Winter, S., Metzger, M. J., & Flanagin, A. J. (2016). Selective use of news cues: A multiple-motive perspective on information selection in social media environments. Journal of Communication, 66(4), 669-693. doi:10.1111/jcom.12241

Yardi, S., & Boyd, D. (2010). Dynamic debates: An analysis of group polarization over time on Twitter. Bulletin of Science, Technology & Society, 30(5), 316-327. doi:10.1177/0270467610380011

Yu, A. J., & Dayan, P. (2005). Uncertainty, neuromodulation, and attention. Neuron, 46(4), 681-692. doi:10.1016/j.neuron.2005.04.026

Zafar, M. B., Bhattacharya, P., Ganguly, N., Ghosh, S., & Gummadi, K. P. (2016). On the wisdom of experts vs. crowds: Discovering trustworthy topical news in microblogs. Proceedings of the 19th ACM Conference on Computer-Supported Cooperative Work & Social Computing - CSCW '16. doi:10.1145/2818048.2819968

Zhou, W., & Sornette, D. (2004). Antibubble and prediction of China's stock market and real-estate. Physica A: Statistical Mechanics and its Applications, 337(1-2), 243-268. doi:10.1016/j.physa.2004.01.051

Zimmerman, J., Forlizzi, J., & Evenson, S. (2007). Research through design as a method for interaction design research in HCI. Proceedings of the SIGCHI conference on Human factors in computing systems - CHI '07. doi:10.1145/1240624.1240704

Appendix 1 - RQ surveys

RQ2 semi-structured interview questions


[1] 8 Zbits per year / (11.82 bits per word * 125 words per minute * 1,440 minutes per day * 365 days per year) ≈ 10,301,658,309,446 people

[2] The concept of ‘consumed’ is somewhat tricky. There is a qualitative difference between news that is bought as a distinct item, consumed for ‘free’ (supported by advertising), and state-supported or state-run. Because search blurs the cues that indicate the source of the information (and its bias), my intuition is that the interaction of news consumers with news items would be affected only when the diversity of sources becomes restricted or inaccessible.

[3] Such as what seems to be happening on gun rights/gun control (Elsbach & Bhattacharya, 2001).

[4] Query run June 6, 2016

[5] A query on Google News, run on June 23, 2016, returned similar results, which were very different from those in the Washington Post Metro Section (archive viewer?).

Sound Mind Sound Body Academy: Washington D.C. Primer

Severe Weather Turns Washington, DC, Metro Station Into a Waterfall

How Washington, DC, Is Preparing for the Next Terrorist Attack

Where to Eat Peruvian Food Around Washington, DC

Zeeland students to attend conference in Washington, DC

Washington, DC announces special events during US Travel ...

Scammer or Entrepreneur? Washington DC's Kushgod Defends His ...

Veterans reflect on experience after Honor Flight to Washington, D.C.

Washington DC Train Station Flooded

[6] Consider the specialized DC metro map vs a USGS topo map of the same region.

[7] GUI, socio-technical, or political? Possibly a context-sensitive blend.

[8] The concepts of confirmer, avoider, and explorer are discussed in more detail in section 3.13.

[9] This may be a Pleasurable Troublemaker design. Possibly using gamification techniques, such as BuzzFeed- or Pew-style quizzes?

[10] And possibly amplified. Since trustworthy readers/sessions can be regarded as a type of classifier, one avenue to explore may be to use machine-learning classifier amplification strategies such as adaptive boosting (AdaBoost).
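As a rough illustration of what that amplification might look like (a sketch only, assuming scikit-learn and synthetic stand-in data rather than anything from the prototype), trustworthy reader sessions could be treated as weak learners whose votes AdaBoost combines and re-weights:

    # Hedged sketch: boosting weak 'reader session' classifiers with AdaBoost.
    # The data here are synthetic placeholders, not prototype output.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    # Stand-in data: each row is a reader session, the label is whether that
    # session's relevance judgments turned out to be trustworthy.
    X, y = make_classification(n_samples=500, n_features=12, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Shallow decision stumps play the role of the weak learners that
    # boosting amplifies into a stronger combined classifier.
    booster = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                                 n_estimators=100, random_state=0)
    booster.fit(X_train, y_train)
    print("held-out accuracy:", booster.score(X_test, y_test))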

[11] Such as a ‘temporary follow’ feature? I think this would be easy to write: some kind of disposable persona/bot mediated through a dashboard. The Twitter API would have no problem with this, though it might have to go through a database.
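One hedged sketch of how such a disposable persona might be scripted, using the Tweepy wrapper around the Twitter REST API; the credentials, handle, and duration below are placeholders, and the endpoint names follow Tweepy 3.x rather than any code in the prototype:

    import time
    import tweepy

    # Placeholder credentials for the disposable persona's account.
    auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
    auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
    api = tweepy.API(auth)

    def temporary_follow(screen_name, hours=24):
        """Follow an account, sample its recent tweets, then unfollow."""
        api.create_friendship(screen_name=screen_name)
        try:
            time.sleep(hours * 3600)
            return api.user_timeline(screen_name=screen_name, count=200)
        finally:
            api.destroy_friendship(screen_name=screen_name)

    # tweets = temporary_follow("example_handle", hours=1)

In practice the collected tweets (and the persona's state) would be written to the database mentioned above rather than held in memory.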

[12] Screenshots taken on September 15, 2016

[13] A variety of mechanisms can be used for this, ranging from semantic web marking by the publisher to machine learning textual analysis to link analysis.

[14] Their webapp is available at http://twitter-app.mpi-sws.org/what-is-happening

[15] As an interesting validation of this approach, I would submit the PBS Idea Channel, which promotes high-quality comments on an episode by producing a second, follow-up episode that presents and responds to the best user content. YouTube comments in general are notoriously low quality and often trollish; the Idea Channel comments are remarkable for their high ratio of quality user-generated content.

[16] Watch the embedded video for an excellent overview of the research.

[17] Because at some point, we all have to cite Don Norman

[18] Weights are calculated by taking the desired scalar and multiplying it by the value of all the links that connect a node to its neighbors. So if node A were connected to node B with a value of 10 and to node C with a value of 30, doubling the weight of node A would scale those values to 20 and 60, respectively.
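A minimal sketch of that scaling step, reproducing the node A example above; the use of the networkx library here is an illustrative assumption, not the prototype's actual graph code:

    import networkx as nx

    # Toy graph matching the example: A-B has a value of 10, A-C a value of 30.
    G = nx.Graph()
    G.add_edge("A", "B", weight=10)
    G.add_edge("A", "C", weight=30)

    def scale_node_weight(graph, node, scalar):
        """Multiply the value of every link touching `node` by the scalar."""
        for neighbor in graph.neighbors(node):
            graph[node][neighbor]["weight"] *= scalar

    scale_node_weight(G, "A", 2)
    print(G["A"]["B"]["weight"], G["A"]["C"]["weight"])  # prints: 20 60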

[19] 50 samples per iteration, 1,000 iterations. The bootstrapping approach is discussed in more detail later in this section.

[20] Unless a weight is set to zero. For the purposes of this discussion, we assume that if this is the case, then the network has been altered so substantially that different measures may be needed.

[21] I would argue that hand-coding is an extremely sophisticated form of topic extraction. As such, the validity of this approach to corpus exploration can be shown to work with both naive and sophisticated forms of term generation.

[22] Words shorter than 4 characters are currently ignored due to PDF artifacts that would pollute the list. Other part-of-speech information is stored, but not currently used.
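A sketch of that filtering step; the four-character cutoff is the one described above, while the function name and sample text are purely illustrative:

    import re

    MIN_WORD_LENGTH = 4  # shorter tokens are skipped to avoid PDF artifacts

    def extract_terms(text):
        """Return candidate terms, dropping the short fragments that PDF
        extraction tends to leave behind."""
        tokens = re.findall(r"[A-Za-z]+", text.lower())
        return [t for t in tokens if len(t) >= MIN_WORD_LENGTH]

    print(extract_terms("Trustworthiness in news aggregation and retrieval"))
    # -> ['trustworthiness', 'news', 'aggregation', 'retrieval']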

[23] Limited to 16,376 as per Apache’s POI documentation.

[24] As of (at least) 6/8/16, Google appears to be doing this for image search

[25] This algorithm implements Fisher's solution to Darwin's problem of why, in many species with polygamous mating systems, females should prefer mates with extreme characters that are apparently useless or deleterious for survival, such as the plumage of some male birds and the horns and tusks of certain male mammals.

Fisher showed that a positive correlation between female mating preferences and male secondary sexual characters will arise in the population because of genetic variance in the preferences of females for more extreme males. The evolution of mating preferences may be self-reinforcing because, once started, females are selecting not only for more extreme males but also, indirectly through the genetic correlation, for a higher intensity of mating preferences.

Fisher described this as a runaway process that must eventually be stopped by severe counterselection against extreme males or against the most discriminating females because of their difficulty in finding a suitable mate.
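A toy simulation of the runaway dynamic is sketched below; it illustrates the mechanism only, not the agent-based model proposed here, and every parameter is an assumption. Each individual carries both an ornament gene t and a preference gene p, females weight potential mates by p * t, and offspring inherit midparent values with mutation, so the genetic correlation (and the direction of any runaway) emerges from early random drift:

    import random

    N, GENERATIONS, MUTATION = 100, 50, 0.05

    def individual():
        # Every individual carries both genes; males express the ornament t,
        # females express the preference p.
        return {"t": random.gauss(0, 0.1), "p": random.gauss(0, 0.1)}

    males = [individual() for _ in range(N)]
    females = [individual() for _ in range(N)]

    def choose_mate(female, males):
        # Stronger preferences weight more extreme males more heavily.
        weights = [max(1e-6, 1.0 + female["p"] * m["t"]) for m in males]
        return random.choices(males, weights=weights, k=1)[0]

    for _ in range(GENERATIONS):
        offspring = []
        for female in females:
            mate = choose_mate(female, males)
            for _ in range(2):  # two offspring per pair keeps numbers stable
                offspring.append({g: (female[g] + mate[g]) / 2
                                  + random.gauss(0, MUTATION)
                                  for g in ("t", "p")})
        random.shuffle(offspring)
        males, females = offspring[:N], offspring[N:]

    mean_t = sum(i["t"] for i in males + females) / (2 * N)
    mean_p = sum(i["p"] for i in males + females) / (2 * N)
    print(f"mean ornament: {mean_t:.2f}  mean preference: {mean_p:.2f}")

Note that this sketch has no counterselection against extreme males, so nothing stops the runaway; adding such a cost is where Fisher's braking effect would enter.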

[26] For an overview of the bootstrap method, which works by repeatedly resampling a small observed sample (with replacement) to estimate the distribution of a statistic, refer to https://www.youtube.com/watch?v=CKNEgikQRkw
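A minimal numpy sketch of that resampling, using the 50-samples-per-iteration, 1,000-iteration parameters from note 19; the 'observed' values are synthetic placeholders for whatever small measured sample is being bootstrapped:

    import numpy as np

    rng = np.random.default_rng(0)

    # Placeholder 'small sample', standing in for e.g. a centrality measure
    # computed over a handful of sessions.
    observed = rng.normal(loc=0.6, scale=0.15, size=50)

    SAMPLES_PER_ITERATION, ITERATIONS = 50, 1000
    means = []
    for _ in range(ITERATIONS):
        resample = rng.choice(observed, size=SAMPLES_PER_ITERATION, replace=True)
        means.append(resample.mean())

    low, high = np.percentile(means, [2.5, 97.5])
    print(f"bootstrap mean {np.mean(means):.3f}, 95% interval [{low:.3f}, {high:.3f}]")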

[27] My initial assumption is that more interaction with the data per ‘query’ is correlated with diversity-seeking behavior.

[28] The surveys are listed in the appendices.

[29] See Table 1 and Figure 4 in the M&R paper for examples of ‘diversity-seeking’ vs. ‘other’.

[30] As an alternative, it would be possible to use a Google or Bing custom search that just points at the web repository of documents. The confounding factor would be that results could change over time as the search algorithms shift. A further alternative is the use of an open-source search engine such as TNTsearch or Solr, which would provide consistent results but with more integration complexity.
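If the Solr route were taken, the integration might look roughly like the sketch below, using the pysolr client; the core name, field names, and URL are placeholders rather than part of the proposed system:

    import pysolr

    # Placeholder URL and core; documents would be indexed from the web repository.
    solr = pysolr.Solr("http://localhost:8983/solr/documents", timeout=10)

    solr.add([
        {"id": "doc-1", "title": "Example article", "text": "Body text ..."},
    ])
    solr.commit()

    # A consistent, locally controlled search over the same corpus.
    for hit in solr.search("text:trustworthiness", rows=10):
        print(hit["id"], hit.get("title"))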