Nice article: easy to read, the recommendation method is quite straightforward and efficient given the task, and the dataset is impressive.
## Summary :
### Introduction
* two aspects under-explored in the field of news consumption: the increase of shortcuts to news portals, and browsing behavior in news consumption
* focus on the influence of the referrer on user behavior (to predict where users go next)
* data: 500 million view logs from Yahoo News
* compare 24 types of recommendation (actually 24 flavors, as the principle is quite similar from one method to another)
* contributions: definition of the browse graph, study of the browse graph on their dataset, a recommendation method for the next article to read
### Related work
* problem of cold-start recommendation; one possible way to circumvent it is to use a small set of preferences ("warm start")
=> how to use the small amount of information available (like the social network)
* here, info = referrer URL and the article currently being read
* literature on news recommendation: predominantly uses user history; here, a sort of collaborative filtering
* browse graph and referrer URL: references analyzing browsing sessions; the browse graph has already been used in the literature
### Browse graph in the news domain
##### dataset :
- Yahoo pages offer infinite recommendations when scrolling
- cookie contains : current URL, referrer URL, temporal information, browser information
- logs split into sessions with a 25-minute timeout (see the sketch after this list)
- 22 article topics (editorially assigned) - see tab3 for examples
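For concreteness, a minimal sketch of the 25-minute session split on one user's chronologically ordered view log; the tuple layout is my assumption, not the paper's schema:

```python
from datetime import timedelta

SESSION_TIMEOUT = timedelta(minutes=25)  # timeout value reported in the paper

def split_sessions(views):
    """Split a chronologically sorted list of (timestamp, url, referrer)
    tuples for one user into sessions: a gap > 25 minutes starts a new one."""
    sessions, current = [], []
    for view in views:
        if current and view[0] - current[-1][0] > SESSION_TIMEOUT:
            sessions.append(current)
            current = []
        current.append(view)
    if current:
        sessions.append(current)
    return sessions
```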
##### about the browsegraph :
- definition : aggregated, weighted graph of transitions between pages
- paper focus : differences in consumed content depending on the origin
- process : breaking down the browse graph according to the referrer (Twitter, Facebook, search engines, etc.), as sketched below
- hourly separation
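A hedged sketch of how such referrer-split browse graphs could be assembled from the sessions above; the use of networkx and the referrer-classification helper are my assumptions, not the paper's implementation:

```python
import networkx as nx

def build_referrer_graphs(sessions, referrer_class):
    """Build one weighted, directed browse graph per referrer class.
    `sessions` yields lists of (timestamp, url, referrer) views;
    `referrer_class` maps the landing referrer to e.g. 'facebook', 'twitter', 'search'.
    Edge weights count observed transitions between consecutive views."""
    graphs = {}
    for session in sessions:
        origin = referrer_class(session[0][2])  # referrer of the landing page
        g = graphs.setdefault(origin, nx.DiGraph())
        for (_, src, _), (_, dst, _) in zip(session, session[1:]):
            w = g[src][dst]["weight"] + 1 if g.has_edge(src, dst) else 1
            g.add_edge(src, dst, weight=w)
    return graphs
```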
### Analysis
##### description of the referrer graphs :
- browsing sessions are short, but the typical distance in the GWCC (giant weakly connected component) of the browse graph is 5 (cf tab2)
- all referrer graphs have a GWCC containing roughly 90% of the nodes or more
- degree distributions vary a lot from one to another, weight distributions don't (fig2)
- testing whether RGs vary from one to another in terms of nodes => overlap measured with the Jaccard index and Kendall tau between PageRank scores (fig3); see the sketch after this list
=> two major groups : search vs social
- most popular topics per RG : see tab3
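A quick sketch of the two comparison measures over a pair of referrer graphs (PageRank settings are library defaults, not necessarily the paper's):

```python
import networkx as nx
from scipy.stats import kendalltau

def jaccard_nodes(g1, g2):
    """Node-set overlap between two referrer graphs."""
    a, b = set(g1.nodes), set(g2.nodes)
    return len(a & b) / len(a | b)

def pagerank_kendall(g1, g2):
    """Kendall tau between the PageRank scores of the pages
    the two referrer graphs have in common."""
    pr1 = nx.pagerank(g1, weight="weight")
    pr2 = nx.pagerank(g2, weight="weight")
    common = sorted(set(pr1) & set(pr2))
    tau, _ = kendalltau([pr1[n] for n in common], [pr2[n] for n in common])
    return tau
```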
##### evolution through time :
- fig4 : cumulative number of views => 80% of visits occur during the first 30 hours and the first 20% of the lifespan (consistent with the literature)
- does it vary with the RG considered?
- fig5 : 3 categories homepage / search / social
=> rapid decay in all three cases; social exhibits consumption spikes later in the life span
- fig6 : topic influence
=> in most cases homepage > search > social; standard dynamic (sports, movies, blogs): search starts close to social and then gets closer to homepage
- fig7 : change through time, Kendall tau (rank at time t=0 vs rank at time t)
=> decreases, then steady after roughly 24h (a questionable observation from my point of view); smaller offset for search than for the rest
### Cold-start prediction of next view
##### problem definition :
- predicting the page seen after the starting page
- restricted to users who have at least one page view after the starting page
##### selection of candidate pages :
- full : all (out-)neighbors in the Browse graph
- ref : all (out-)neighbors in the Referrer graph only
- mixed : if the RG offers no candidate, fall back to the full BG
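A minimal sketch of the three candidate-selection strategies, reusing the graphs built earlier (function and variable names are mine):

```python
def candidates(page, browse_graph, referrer_graph, strategy="mixed"):
    """Candidate next pages for `page`:
    'full'  -> out-neighbors in the global browse graph,
    'ref'   -> out-neighbors in the referrer graph only,
    'mixed' -> referrer graph if it has candidates, otherwise the full graph."""
    full = list(browse_graph.successors(page)) if page in browse_graph else []
    ref = list(referrer_graph.successors(page)) if page in referrer_graph else []
    if strategy == "full":
        return full
    if strategy == "ref":
        return ref
    return ref if ref else full  # 'mixed'
```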
##### topical filtering :
- case of Twitter and Facebook : possible additional constraint restricting candidates to the same topical category
##### prediction method :
- random (baseline)
- cb : most similar in content, using text-based metrics
- pop: highest view count at previous timestep
- edge: maximum-weight link from the current page at the previous timestep
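A rough sketch of the four ranking rules over a candidate set; the view counts and the text-similarity function stand in for whatever the paper computes at the previous time step:

```python
import random

def rank_candidates(page, cands, method, graph=None, views=None, similarity=None):
    """Order candidate pages by the chosen method:
    'random' -> shuffled baseline,
    'cb'     -> text similarity to the current page,
    'pop'    -> view count of the candidate at the previous time step,
    'edge'   -> weight of the (page -> candidate) edge at the previous time step."""
    if method == "random":
        return random.sample(cands, len(cands))
    if method == "cb":
        return sorted(cands, key=lambda c: similarity(page, c), reverse=True)
    if method == "pop":
        return sorted(cands, key=lambda c: views.get(c, 0), reverse=True)
    return sorted(cands, key=lambda c: graph[page][c].get("weight", 0), reverse=True)
```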
##### results (Fig8) :
- quality measures : precision and mean reciprocal rank @3 (sketched after this list)
- trend : random < pop ~< cb < edge => (overall conclusion) the weights of the BG are effective for anticipating user needs
- trend : full < ref < mixed
- precision increases with smaller domains, just because the set of possibilities is smaller
- trend : using topical filtering => drop in precision (because the probability of a topic transition is high)
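For reference, a small sketch of the two quality measures over top-3 recommendation lists; this is my own formulation (precision of the top hit, MRR within the top 3) and may differ in detail from the paper's exact definitions:

```python
def precision_at_1(rec_lists, truths):
    """Fraction of test cases where the top recommendation is the page actually viewed next."""
    hits = sum(1 for recs, truth in zip(rec_lists, truths) if recs and recs[0] == truth)
    return hits / len(truths)

def mrr_at_3(rec_lists, truths):
    """Mean reciprocal rank of the true next page within the top 3 recommendations
    (contributes 0 if it does not appear there)."""
    total = 0.0
    for recs, truth in zip(rec_lists, truths):
        for rank, rec in enumerate(recs[:3], start=1):
            if rec == truth:
                total += 1.0 / rank
                break
    return total / len(truths)
```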
The paper describes the architecture of YouTube's current recommendation system (as of 2016), based on Deep Learning, which replaced the previous Matrix-Factorization-based architecture (ref 23). The architecture uses Google Brain's TensorFlow tool and achieves significantly better performance than the former one (see fig6 for example).
The paper is a high-level description of the architecture and sometimes lacks the technical details that would allow a precise understanding. However, it provides very interesting insight into the problems faced and solutions contemplated by YouTube engineers, and into the actual constraints of industrial recommendation systems. It also helps one realize that there is as much craft as science in the process.
Overall, the architecture consists of two main phases :
- a coarse candidate generation (creating a set of a few hundred videos per user from the corpus of several million videos available),
- precise ranking of the candidates.
Both steps use a DNN architecture.
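To make the two-stage funnel concrete, a toy sketch of the overall flow; the scoring callables and the sizes are placeholders, not YouTube's actual models:

```python
def recommend(user, corpus, candidate_score, ranking_score,
              n_candidates=500, n_final=20):
    """Two-stage funnel: a cheap model prunes millions of videos down to a few
    hundred candidates, then a heavier model ranks only those candidates."""
    cands = sorted(corpus, key=lambda v: candidate_score(user, v),
                   reverse=True)[:n_candidates]
    return sorted(cands, key=lambda v: ranking_score(user, v),
                  reverse=True)[:n_final]
```

(In the paper, the candidate stage is actually served as an approximate nearest-neighbor lookup in embedding space rather than an exhaustive scoring pass, but the funnel shape is the same.)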
Some technical details :
- the user profile is summarized as a heterogeneous embedding of watched-video identifiers, search tokens, demographic information, etc. (see the sketch after this list)
- a decisive advantage of DNNs highlighted in the paper is their ability to deal with heterogeneous signals (sources of information of various natures); another is that they (partly) circumvent manual feature engineering
- while development of the method relies on offline evaluation metrics (precision, recall, etc.), the final evaluation relies on live A/B testing experiments; the discussion of this point in Sec 3.4 is very interesting
- for candidate generation, YouTube uses implicit feedback (e.g. watch times) rather than explicit feedback (e.g. thumbs up) because much more of it is available
- taking into account the "freshness" of a video has an important impact on the efficacy of the candidate generation (fig 4)
- taking into account the context of a watch (i.e., the sequence of watches) is also important, as the co-watch probability distribution is very asymmetric; in particular, taking into account the user's previous actions on similar items matters
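A rough sketch of how such heterogeneous signals might be assembled into the fixed-width input of the first fully connected layer, in the spirit of the paper; the dimensions, field names, and simple averaging are my assumptions:

```python
import numpy as np

def user_input_vector(watched_ids, search_tokens, age, gender_onehot,
                      video_emb, token_emb):
    """Average the embeddings of watched videos and of search tokens, then
    concatenate them with demographic features into one dense input vector.
    Assumes non-empty watch and search histories for simplicity."""
    watch_vec = np.mean([video_emb[v] for v in watched_ids], axis=0)
    search_vec = np.mean([token_emb[t] for t in search_tokens], axis=0)
    return np.concatenate([watch_vec, search_vec, [age], gender_onehot])
```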
There is no 👍-style explicit feedback for the comments yet.
So, I'll use the implicit one.
Alt-Tab's comment is a very condensed one, and it matters: I even think this comment is better than the original article's abstract. Now I really want to drop everything, sit down, and read this article in detail.