Filter bubbles, echo chambers, and online news consumption
Uploaded by: Alt-Tab
Upload date: 2019-04-26 15:55:25


Very good article on segregation phenomena, as measured through online news consumption.

### Introduction

* question asked: what is the impact of technological changes on ideological segregation?
* two conflicting hypotheses: either increased consumption of like-minded opinions (echo chambers), e.g. Sunstein, 2009; or access to a broader spectrum of information, implying more consumption of opposing opinions, e.g. Benkler, 2006
* work proposed: study 50,000 anonymized users from the US who regularly consume online news
* ML algorithms identify hard news, then divide it into descriptive reporting vs. opinion pieces
* defines ideological segregation as the difference in the expected share of conservative news consumption between two random individuals
* observes that segregation tends to be higher when users come from social media
* observes that individual users tend to read news from only one side of the spectrum
* observes, counter-intuitively, that reading from the opposite side tends to happen more often through the channels with the highest segregation (social, search)
* descriptive reporting corresponds to about 75% of the traffic
* online news consumption is still dominated by mainstream media

### Data and methods

* data collection: from the Bing toolbar for IE => 1.2M US citizens from March to May 2013
* focus on 50,000 regular news readers => 2.3 billion pages (median: ~1000 pages per user)
* selection bias: individuals who agree to share their info; IE users are in general older
* test representativeness by measuring the Spearman coefficient between consumption in the dataset and the Quantcast and Alexa rankings: 0.67 and 0.7, while Spearman(Quantcast, Alexa) ~ 0.64

##### identifying news and opinion articles

* use the Open Directory Project => identify ~8000 domains as news, politics, etc.
* these contain major national sources, important regional outlets, important blogs
* isolate 4.1M articles, but not always relevant in terms of ideology (e.g. sports, weather, ...)
=> isolate with ML 1.9M "front section news" articles, among which 200,000 opinion stories (Tab1 lists terms highly predictive of the categories)

##### measuring the political slant of publishers

* impossible to do manually, and no easy way to do it automatically for all 1.9M articles => assign each article the slant of its outlet
* use the slant of the outlet's readers, inferred from the vote at the presidential election, which is itself inferred from the readers' location through their IP address
* robustness check: Tab2 lists the top 20 outlets; the ranking is consistent with common knowledge and with previous studies (Gentzkow and Shapiro, 2011)

##### inferring consumption channels

* 4 info channels: direct (visit the domain), social (Facebook, Twitter, mail), search (Google, Bing, Yahoo), aggregator (Google News)
* use the referrer domain to define the channel (interpretation problems to solve, e.g. if ref=Facebook and 4 articles are read, do all of them count as coming from social?)

##### limiting to active news consumers

* threshold = at least 10 news articles and 2 opinion pieces during the 3-month period => from 1.2M to 50,000 users (about 4%)
* remark: some conclusions still hold with a looser threshold

### Results

#### overall segregation

* individual polarity = average of the polarities of the outlets consumed
* segregation = distance between polarity scores
* naive estimation is insufficient => use of a hierarchical Bayesian model

##### Bayesian model

* process standard in the literature?
see Gelman and Hill, 2007
* look for the global dispersion $\sigma_d$
* polarity of user $i$ is assumed to follow a normal distribution with latent variables
* evaluate parameters using an approximate marginal likelihood estimate

##### segregation

* distribution of user polarity obtained: see Fig2
* segregation = $\sqrt{2}\,\sigma_p = 0.11$
* 2/3 of the scores are between 0.41 and 0.54 => most people are moderate

#### segregation by channel and article subjectivity

* data scarcity problem, exacerbated by dividing the data into channels
* but a given user's polarity is probably correlated across channels
* same type of Bayesian model, but with an 8-dimensional vector
* 8 dimensions = 4 channels * 2 classes (opinion and report)

##### results on Fig3: segregation per channel

* trend: segregation effect stronger for opinions
* trend: social media tend to increase segregation effects
* strongest segregation for search; possible explanations: 1) search queries are already oriented, 2) once the search is formulated, users read like-minded media
* since it is the technology (social, search) that gives access to a large variety of media, these channels drive the segregation effect
* trend: aggregators => less segregation
* interpretation of the weakness of the overall segregation effect: even after pre-filtering, many news items are not polarizing
* general conclusion: there is a filter bubble effect, but it is still limited

#### ideological isolation

##### two conflicting hypotheses

* moderate polarization and individuals consume a large spectrum of opinions
* moderate polarization but individuals consume a narrow spectrum of opinions
* dispersion $\sigma_d = 0.06$, very small => rather the second hypothesis
* explanation: 78% of users use only 1 source, 94% one or two sources
* remark: still true for users with a larger number of sources

##### Dispersion per user and per channel: Fig4a

* more or less identical for news and opinions
* direct: lowest dispersion, search: highest dispersion

##### Dispersion per individual polarity: Fig4b

* most polarized individuals
are also the ones with the highest dispersion

##### Does it mean that highly polarized individuals see opposite opinions? (Fig5)

* test by ranking media with $l$ from left (0) to right (1)
* define opposing partisan exposure $o_i = \min(l_i, 1 - l_i)$
* Fig5: percentage of exposure to opposite-opinion articles, depending on the channel and on the user polarity
* lower than 20% in all cases
* weaker for opinion pieces than for reports
* lowest for the most partisan users
* conclusion: users read ideologically homogeneous outlets, and partisan users are in general exposed only to their side of the spectrum

### Discussion and conclusion

##### Overall

* with social media and web search, in general more segregation than with direct consumption
* however, the channels with more segregation are, counter-intuitively, related to a wider range of opinions
* the majority of online behaviors mimic traditional reading habits: most users go to their favorite outlets (which are predominantly mainstream)

##### Limits

* measures the slant of the outlet, not of an article
* focuses only on consumption, not on the vote itself
* no measure of the amplifying effect of social media or search engines
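
To make the quantities in the Results section concrete, here is a minimal Python sketch of the naive versions of individual polarity, segregation ($\sqrt{2}\,\sigma_p$) and opposing partisan exposure ($o_i$). It is not the paper's method: the notes above say the naive estimate is insufficient and that a hierarchical Bayesian model is used instead. The outlet names, slant values, the 0.5 threshold, and the reading of $l_i$ as the share of a user's articles coming from one side of the spectrum are all assumptions made for illustration (the `min` makes the choice of side irrelevant).

```python
from math import sqrt
from statistics import mean, pstdev

# Hypothetical outlet slants on the 0 (left) to 1 (right) scale; not values from the paper.
OUTLET_SLANT = {"outlet_a": 0.30, "outlet_b": 0.45, "outlet_c": 0.60}


def user_polarity(visits):
    """Individual polarity: average slant of the outlets behind the articles read."""
    return mean(OUTLET_SLANT[outlet] for outlet in visits)


def naive_segregation(polarities):
    """sqrt(2) * sigma_p, with sigma_p the plain population std of the polarity scores."""
    return sqrt(2) * pstdev(polarities)


def opposing_exposure(visits, threshold=0.5):
    """o_i = min(l_i, 1 - l_i); l_i is assumed to be the share of articles
    read from outlets whose slant is below `threshold` (hypothetical cut-off)."""
    l_i = mean(1.0 if OUTLET_SLANT[outlet] < threshold else 0.0 for outlet in visits)
    return min(l_i, 1.0 - l_i)


# Toy reading histories: one entry per article read, labelled by its outlet.
users = {
    "u1": ["outlet_a", "outlet_a", "outlet_a", "outlet_c"],
    "u2": ["outlet_c", "outlet_c"],
}

polarities = [user_polarity(visits) for visits in users.values()]
print("polarities:", polarities)
print("naive segregation:", naive_segregation(polarities))
print("opposing exposure:", {u: opposing_exposure(v) for u, v in users.items()})
```

The factor $\sqrt{2}$ comes from the fact that, for two scores drawn independently from the same distribution, the variance of their difference is $2\sigma_p^2$, so the root-mean-square distance between two random individuals is $\sqrt{2}\,\sigma_p$.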
Alt-Tab at 2019-04-26 16:21:25
Edited by Alt-Tab at 2019-04-26 17:44:19
I agree that it is a very good article. I wish the Algodiv project had the same data… The main contribution is the result on echo chambers: **people _are_ in echo chambers, but the influence remains limited because all users massively consume news content from mainstream media**. I find the "segregation" metric interesting; it should be compared (and merged?) with other diversity-related metrics.

I see a few more limitations in the methodology. The main one is the reliability of the _slant_ of a news outlet, obtained from the location of the readers' IP addresses (not exactly reliable) matched with the county-level results of the 2012 presidential election(!).

Besides the reliability of the metric, it is very hard to build any intuition for this made-up scale. Is a 0.11 interval large or small? What does it mean to have BBC at 0.3 and FoxNews at 0.59? Is a difference between 0.3 and 0.32 truly the same as between 0.48 and 0.5?
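
To illustrate the concern about the scale, here is a minimal Python sketch of how an outlet slant could be built from readers' counties. The aggregation (a plain average of the county vote share over readers), the county names, and the vote shares are assumptions for illustration only, not the paper's exact procedure.

```python
from statistics import mean

# Hypothetical conservative vote share per county (toy values, not real data).
COUNTY_VOTE_SHARE = {"county_x": 0.35, "county_y": 0.60}


def outlet_slant(reader_counties):
    """Assumed slant: average vote share of the counties the outlet's readers are located in."""
    return mean(COUNTY_VOTE_SHARE[county] for county in reader_counties)


# One entry per reader, labelled by the county inferred from the IP address.
print(outlet_slant(["county_x", "county_x", "county_y"]))
print(outlet_slant(["county_x"]))  # read only in the most left-leaning county
```

Under this assumed construction, an outlet's score can never fall outside the range of the county vote shares, whatever its editorial line, which is one reason why the values are compressed toward the middle and hard to interpret as distances.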
