Filter bubbles, echo chambers, and online news consumption
Uploaded by: Alt-Tab
Upload date: 2019-04-26 15:55:25


Very good article on segregation phenomena, as measured on online news consumption.

### Introduction
* question asked: impact of technological changes on ideological segregation
* two conflicting hypotheses: either increased consumption of like-minded opinions (echo chambers), e.g. Sunstein, 2009; or access to a broader spectrum of information implies more consumption of opposite opinions, e.g. Benkler, 2006
* work proposed: study 50,000 anonymized users from the US who regularly consume online news
* ML algorithms identify hard news, then divide it into descriptive reporting vs opinion pieces
* defines ideological segregation as the difference in the expected share of conservative news consumption between two random individuals
* observes that segregation tends to be higher when users come from social media
* observes that individual users tend to read news from only one side of the spectrum
* observes, counter-intuitively, that reading of opposite sides tends to happen more often through the channels with the highest segregation (social, search)
* descriptive reporting corresponds to about 75% of the traffic
* online news consumption is still dominated by mainstream media

### Data and methods
* data collection: from the Bing toolbar for IE => 1.2M US citizens from March to May 2013
* focus on 50,000 regular news readers => 2.3 billion pages (median: ~1000 pages per user)
* selection bias: individuals who agree to share their info; IE users are in general older
* test representativeness by measuring the Spearman coefficient between consumption in the dataset and the Quantcast and Alexa rankings: 0.67 and 0.7, while Spearman(Quantcast, Alexa) ~ 0.64

##### identifying news and opinion articles
* use the Open Directory Project => identify ~8000 domains as news, politics, etc.
* these contain major national sources, important regional outlets, important blogs
* isolate 4.1M articles, but not always relevant in terms of ideology (e.g. sports, weather, ...)
* => with ML, isolate 1.9M "front section news" articles, among which 200,000 opinion stories (Tab1 indicates terms highly predictive of the categories)

##### measuring the political slant of publishers
* impossible to do manually, and no easy way to do it automatically for all 1.9M articles => assign each article the slant of its outlet
* use the slant of the outlet's readers, inferred from the vote at the presidential election, which is itself inferred from the location given by the IP address
* robustness check: Tab2 lists the top 20 outlets; the ranking is consistent with common knowledge and with previous studies (Gentzkow and Shapiro, 2011)

##### inferring consumption channels
* 4 info channels: direct (visit the domain), social (FB, Twitter, mail), search (Google, Bing, Yahoo), aggregator (Google News)
* use the referrer domain to define the channel (interpretation problems to solve, e.g. if ref=Facebook and 4 articles are read, are all of them of social origin?)

##### limiting to active news consumers
* threshold: at least 10 news articles and 2 opinion pieces during the 3-month period => from 1.2M to 50,000 users (about 4%)
* Note: some conclusions still hold with a looser threshold

### Results

#### overall segregation
* individual polarity = average of the polarities of the outlets consumed
* segregation = distance between polarity scores
* naive estimation is insufficient => use of a hierarchical Bayesian model

##### bayesian model
* is the process standard in the literature? see Gelman and Hill, 2007
* look for sigma_d, the global dispersion
* polarity of user i is assumed to follow a normal distribution with latent variables
* evaluate the parameters using an approximate marginal likelihood estimate

##### segregation
* distribution of user polarity obtained: see fig2
* segregation = sqrt(2) × sigma_p = 0.11
* 2/3 of the scores are between 0.41 and 0.54 => most people are moderate

#### segregation by channel and article subjectivity
* the data scarcity problem is exacerbated by dividing the data into channels
* but for a given user, polarity is probably correlated across channels
* same type of Bayesian model, but with an 8-dimensional vector
* 8 dimensions = 4 channels × 2 classes (opinion and report)

##### results on fig3: segregation per channel
* trend: segregation effect is stronger for opinions
* trend: social media tend to increase segregation effects
* strongest segregation for search; possible explanations: 1) search queries are already oriented, 2) once the search is formulated, users read like-minded media
* since these technologies (social, search) are what give access to a large variety of media, they are seen as driving the segregation effect
* trend: aggregators => less segregation
* why the overall segregation effect is weak: even after pre-filtering, many news articles are not polarizing
* general conclusion: there is a filter bubble effect, but it is still limited

#### ideological isolation

##### two conflicting hypotheses
* moderate polarization and individuals consume a large spectrum of opinions
* moderate polarization but individuals consume a thin spectrum of opinions
* dispersion sigma_d=0.06, very small => rather the second hypothesis
* explanation: 78% of users use only 1 source, 94% one or two sources
* Note: still true for users with a larger number of sources

##### Dispersion per user and per channel: Fig4a
* more or less identical for news and opinions
* direct: lowest dispersion, search: highest dispersion

##### Dispersion per individual polarity: Fig4b
* the most polarized individuals are also the ones with the highest dispersion

##### Does it mean that highly polarized individuals see opposite opinions? (Fig5)
* test by ranking media with l from left (0) to right (1)
* define opposing partisan exposure o_i = min(l_i, 1-l_i)
* fig5: percentage of exposure to opposite-opinion articles, depending on the channel and on the user polarity
* lower than 20% in all cases
* weaker for opinion pieces than for reports
* lowest for the most partisan users
* conclusion: users read ideologically homogeneous outlets, and partisan users are in general exposed only to their side of the spectrum

### Discussion and conclusion

##### Overall
* social media and web search come, in general, with more segregation than direct consumption
* however, the channels with more segregation are, counter-intuitively, associated with a wider range of opinions
* the majority of online behaviors mimic reading habits: most users go to their favorite outlets (which are predominantly mainstream)

##### Limits
* measures the slant of the outlet, not of an article
* focuses only on consumption, not on the vote itself
* no measure of the amplifying effect of social media or search engines
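
The quantities in the Results notes above can be sketched in a few lines of Python. This is only a naive illustration, not the paper's estimator: the paper fits a hierarchical Bayesian model, while the sketch uses plain sample means and standard deviations. The function names, the 0.5 midpoint used to split left from right, the reading of l_i as the share of a user's articles from left-of-midpoint outlets, and the toy slant values are all assumptions made for the example.

```python
from math import sqrt
from statistics import mean, pstdev


def user_polarity(outlet_slants):
    """Naive polarity: mean slant of the outlets behind a user's articles."""
    return mean(outlet_slants)


def user_dispersion(outlet_slants):
    """Naive within-user dispersion: spread of the slants one user reads."""
    return pstdev(outlet_slants)


def segregation(polarities):
    """sqrt(2) * sigma_p.

    For two independent users drawn from a distribution with standard
    deviation sigma_p, E[(p_i - p_j)^2] = 2 * sigma_p^2, so this is the
    root-mean-square gap between two random users.
    """
    return sqrt(2) * pstdev(polarities)


def opposing_exposure(outlet_slants, midpoint=0.5):
    """o_i = min(l_i, 1 - l_i).

    Assumption: l_i is the share of the user's articles that come from
    outlets whose slant is left of `midpoint`.
    """
    left_share = sum(s < midpoint for s in outlet_slants) / len(outlet_slants)
    return min(left_share, 1 - left_share)


# Toy data (hypothetical): one outlet slant per article read by each user.
users = {
    "u1": [0.30, 0.30, 0.30, 0.59],  # mostly one left-leaning outlet
    "u2": [0.59, 0.59, 0.59, 0.59],  # a single right-leaning outlet
    "u3": [0.45, 0.48, 0.52, 0.55],  # several moderate outlets
}

polarities = {u: user_polarity(s) for u, s in users.items()}
dispersions = {u: user_dispersion(s) for u, s in users.items()}
exposures = {u: opposing_exposure(s) for u, s in users.items()}
seg = segregation(list(polarities.values()))
```

In this toy data, `u2` reads a single outlet and so has zero dispersion and zero opposing exposure, which is the pattern the notes report for most users (78% use only one source).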
Alt-Tab at 2019-04-26 16:21:25
Edited by Alt-Tab at 2019-04-26 17:44:19
I agree that it is a very good article. I wish the Algodiv project had the same data… The main contributions are the results on echo chambers: **people _are_ in echo chambers, but the influence remains limited because all users massively consume news content from mainstream media**. I find the "segregation" metric interesting; it should be compared (and merged?) with other diversity-related metrics. I see a few more limits with the methodology. The main one is the reliability of the _slant_ of a news outlet, obtained from the location of the readers' IP addresses (not exactly reliable) matched with the county-level results of the 2012 presidential election(!). Besides the reliability of the metric, it is very hard to build any intuition for this made-up scale. Is a 0.11 interval large or small? What does it mean to have BBC at 0.3 and FoxNews at 0.59? Is a difference between 0.3 and 0.32 truly the same as one between 0.48 and 0.5?
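
To make the concern about the scale concrete, here is a minimal sketch of the slant construction as the review describes it: each visitor is mapped to a county through their IP address, and the outlet's slant is the average conservative vote share of those counties. The function name, the county names and the vote shares are hypothetical; only the 0.11, 0.3 and 0.59 figures come from the discussion above.

```python
def outlet_slant(visitor_counties, county_vote_share):
    """Mean conservative vote share over the counties of an outlet's visitors.

    Visitors whose county is unknown are skipped (an assumption of this sketch).
    """
    shares = [county_vote_share[c] for c in visitor_counties if c in county_vote_share]
    if not shares:
        raise ValueError("no visitor could be matched to a county")
    return sum(shares) / len(shares)


# Hypothetical county-level conservative vote shares.
county_vote_share = {"county_a": 0.35, "county_b": 0.50, "county_c": 0.65}

# Hypothetical visits to one outlet, already geolocated to counties.
visits = ["county_a", "county_a", "county_b", "county_c"]
slant = outlet_slant(visits, county_vote_share)

# Putting the 0.11 segregation value next to the BBC / FoxNews gap quoted above.
gap_bbc_fox = 0.59 - 0.30
ratio = 0.11 / gap_bbc_fox
```

Under these assumptions, the segregation value of 0.11 is a bit more than a third of the BBC to FoxNews gap. Because every outlet's slant is an average of county vote shares, scores are pulled toward the middle of the scale, which may be part of why the numbers are hard to interpret.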
