Nice article: easy to read, the recommendation method is straightforward and efficient for the task, and the dataset is impressive.

## Summary

### Introduction

* two aspects are underexplored in the field of news consumption: the growing number of shortcuts to news portals, and browsing behavior during news consumption
* focus on the influence of the referrer on user behavior (to predict where users go next)
* data: 500 million view logs from Yahoo News
* compares 24 recommendation methods (really 24 flavors, since the principle barely changes from one method to the next)
* contributions: definition of the browse graph, study of the browse graph on their dataset, a recommendation method for the next article to read

### Related work

* cold-start recommendation problem: one way to circumvent it is to use a small set of preferences ("warm start") => how to use the little information available (such as the social network)
* here, the available information is the referrer URL and the article currently being read
* the news recommendation literature relies predominantly on user history; the approach here is closer to collaborative filtering
* browse graph and referrer URL: references on analyzing browsing sessions; the browse graph has already been used in the literature

### Browse graph in the news domain

##### dataset:

- Yahoo pages show an endless stream of recommendations when scrolling
- cookie contains: current URL, referrer URL, temporal information, browser information
- split into sessions with a 25-minute timeout
- 22 article topics (editorially assigned)
- see Tab. 3 for examples

##### about the browse graph:

- definition: aggregated graph of transitions between pages, with weights
- paper focus: how content differs depending on the origin of the visit
- process: breaking down the browse graph (BG) by referrer (Twitter, Facebook, search engine, etc.) into referrer graphs (RG)
- hourly separation

### Analysis

##### description of the referrer graphs:

- browsing sessions are short, but the typical distance in the giant weakly connected component (GWCC) of the browse graph is 5 (cf. Tab. 2)
- every referrer graph has a GWCC that contains roughly 90% of the nodes or more
- degree distributions vary a lot from one RG to another, weight distributions don't (Fig. 2)
- do the RGs differ in terms of nodes? => overlap measured with the Jaccard index, and Kendall tau between PageRank rankings (Fig. 3) => two major groups: search vs. social
- most popular topics per RG: see Tab. 3

##### evolution through time:

- Fig. 4: cumulative number of views => 80% of visits happen during the first 30 hours and the first 20% of the lifespan (consistent with the literature)
- does it vary with the RG considered?
- Fig. 5: 3 categories homepage / search / social => rapid decay in all three cases, social exhibits consumption spikes later in the article's lifespan
- Fig. 6: topic influence => in most cases homepage > search > social; standard dynamic (sports, movies, blogs): search starts close to social and then gets closer to homepage
- Fig. 7: change through time, Kendall tau (rank at time t=0 vs. rank at time t) => decrease, then steady after roughly 24h (questionable observation from my point of view), smaller offset for search than for the others

### Cold-start prediction of next view

##### problem definition:

- predicting the page seen after the starting page
- restricted to users who have at least one page view after the starting page

##### selection of candidate pages:

- full: all out-neighbors of the starting page in the browse graph
- ref: out-neighbors in the referrer graph only
- mixed: referrer graph first; if it yields no candidate, fall back to the full BG

##### topical filtering:

- for Twitter and Facebook: optional additional constraint restricting candidates to the same topic category

##### prediction method:

- random (baseline)
- cb: most similar in content, using text-based metrics
- pop: highest view count at the previous timestep
- edge: maximum-weight link from the same node at the previous timestep

##### results (Fig. 8):

- quality measures: precision and mean reciprocal rank@3
- trend: random < pop ~< cb < edge => (overall conclusion) the weights of the BG are effective at anticipating user needs
- trend: full < ref < mixed
- precision increases with smaller domains, simply because the set of possibilities is smaller
- trend: using topical filtering => drop in precision (because of the high probability of topic transitions)
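
To make the candidate selection (full / ref / mixed) and the `edge` predictor more concrete, here is a minimal Python sketch. It is my own illustration, not the authors' code: the function names and the session format (a referrer label plus an ordered list of pages) are hypothetical, and it ignores the hourly slicing, whereas the paper ranks by the weights of the previous timestep.

```python
from collections import defaultdict


def build_graphs(sessions):
    """Build the full browse graph and one referrer graph per referrer.

    sessions: iterable of (referrer, [page_1, page_2, ...]) -- assumed format.
    Each graph is a nested dict {source: {target: transition_count}}.
    """
    full = defaultdict(lambda: defaultdict(int))
    by_ref = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for referrer, pages in sessions:
        # Every pair of consecutive page views is one weighted transition.
        for src, dst in zip(pages, pages[1:]):
            full[src][dst] += 1
            by_ref[referrer][src][dst] += 1
    return full, by_ref


def candidates(page, referrer, full, by_ref, mode="mixed"):
    """Out-neighbors of `page` under the full / ref / mixed strategies."""
    ref_nbrs = by_ref.get(referrer, {}).get(page, {})
    if mode == "ref":
        return ref_nbrs
    if mode == "full":
        return full.get(page, {})
    # mixed: referrer graph first, full browse graph as a fallback.
    return ref_nbrs or full.get(page, {})


def predict_edge(page, referrer, full, by_ref, mode="mixed", k=3):
    """Rank candidates by edge weight (ties broken by page id), keep top k."""
    nbrs = candidates(page, referrer, full, by_ref, mode)
    return sorted(nbrs, key=lambda p: (-nbrs[p], p))[:k]


def mrr_at_k(predictions, truths, k=3):
    """Mean reciprocal rank, counting only hits within the top k."""
    total = 0.0
    for ranked, truth in zip(predictions, truths):
        top = ranked[:k]
        if truth in top:
            total += 1.0 / (top.index(truth) + 1)
    return total / len(truths) if truths else 0.0


# Toy sessions, invented for illustration only.
sessions = [
    ("twitter", ["a", "b", "c"]),
    ("twitter", ["a", "b"]),
    ("search", ["a", "c"]),
    ("search", ["a", "c"]),
    ("search", ["a", "c"]),
    ("search", ["d", "a"]),
]
full, by_ref = build_graphs(sessions)
full_pred = predict_edge("a", "twitter", full, by_ref, mode="full")
ref_pred = predict_edge("a", "twitter", full, by_ref, mode="ref")
fallback_pred = predict_edge("d", "twitter", full, by_ref, mode="mixed")
```

On this toy data, a Twitter visitor on page `a` gets `c` first under `full` (search traffic dominates the aggregated weights) but `b` under `ref`, which is the intuition behind ref beating full in the paper; page `d` has no Twitter transitions, so `mixed` falls back to the full graph.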
Alt-Tab at 2019-04-23 10:02:03
Edited by Alt-Tab at 2019-04-23 10:08:03

