Cyrus Shepard asked the SEO community:
How do you think Google’s Helpful Content system actually works? How do you believe they identify “Helpful Content?”
Here is how SEO experts believe Google’s helpful content system works and how Google determines “helpful content”:
Machine learning and quality rater data:
Google’s helpful content system is believed to be a machine learning system that is trained on a vast amount of labeled examples and fine-tuned using satisfaction data from quality raters. This system can predict which content is likely to be helpful, and sites with predominantly unhelpful content may be negatively classified.
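As a rough illustration of the site-level step described above, the sketch below flags a site when most of its pages are predicted to be unhelpful. The function name, the per-page scores, and both thresholds are hypothetical; nothing here reflects Google's actual classifier.

```python
def classify_site(page_scores, page_threshold=0.5, site_threshold=0.5):
    """Flag a site when most of its pages look unhelpful.

    page_scores: hypothetical per-page helpfulness predictions in [0, 1],
    standing in for the output of some trained model.
    Both thresholds are illustrative assumptions.
    """
    if not page_scores:
        raise ValueError("need at least one page score")
    # Count pages whose predicted helpfulness falls below the page threshold.
    unhelpful = sum(1 for score in page_scores if score < page_threshold)
    share = unhelpful / len(page_scores)
    # The site is flagged only when unhelpful pages predominate.
    return {"unhelpful_share": share, "flagged": share > site_threshold}
```

With scores of 0.9, 0.2, 0.1 and 0.3, three of four pages fall below the page threshold, so the site would be flagged.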
Quality rater guidelines as signals:
Everything that quality raters are instructed to evaluate may be what Google looks at to identify helpful content. On this view, the criteria in the Quality Rater Guidelines (QRG) are treated as signals or ranking factors.
Correlation with ranking factors:
The characteristics listed in the QRG correlate strongly with actual ranking factors. This does not mean that QRG characteristics directly influence ranking changes, only that they correlate with the factors that do.
Content assessment and behavioral signals:
Google is thought to compare its own content assessment with user behavioral signals, such as content engagement, to measure information gain. Content that does not lead to subsequent searches or refinements is rewarded.
Penalty system for unhelpful content:
The Helpful Content Update (HCU) started as a filter for unhelpful content, essentially a penalty system. “Hidden gems” are considered a later addition that uses reverse scoring.
Multilayered probability mathematics:
Identifying unhelpful content involves multilayered probability mathematics: a multitude of small signals, each only a possible indicator of unhelpful or questionable content on its own, but very effective collectively.
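One way to picture many weak indicators adding up is a naive Bayes style combination of likelihood ratios. This is only a sketch under a strong independence assumption; the prior, the ratios, and the function name are made up for illustration and are not known parameters of any Google system.

```python
import math

def combine_indicators(prior, likelihood_ratios):
    """Combine weak, assumed-independent indicators into one probability.

    prior: assumed base rate of unhelpful content, strictly between 0 and 1.
    likelihood_ratios: for each observed indicator, the hypothetical ratio
    P(indicator | unhelpful) / P(indicator | helpful).
    """
    # Work in log-odds so each indicator contributes an additive term.
    log_odds = math.log(prior / (1 - prior))
    for ratio in likelihood_ratios:
        log_odds += math.log(ratio)
    # Convert the accumulated log-odds back to a probability.
    return 1 / (1 + math.exp(-log_odds))
```

Under these assumptions, a single indicator with a ratio of 1.5 barely moves a 10% prior, while ten such indicators together push the probability above 80%.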
Modeling based on trusted sample sites:
The system may be modeled on the characteristics of sample sites marked as trusted by testers. Click data is likely considered, but it cannot be the whole explanation, because domains that receive few clicks provide too small a sample.
Page view depth:
One factor considered is page view depth. If a user quickly returns to Google, that suggests the page did not help and the user is looking for another source.
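The quick-return idea can be sketched as a simple rate over visit durations. The 10-second cutoff and the input format are assumptions chosen for illustration, not values from any Google source.

```python
def quick_return_rate(dwell_seconds, threshold=10.0):
    """Share of visits that ended with a fast return to the search results.

    dwell_seconds: hypothetical time, in seconds, between the click on a
    result and the user's return to the results page, one entry per visit.
    threshold: illustrative cutoff below which a visit counts as a quick return.
    """
    if not dwell_seconds:
        return 0.0
    quick = sum(1 for seconds in dwell_seconds if seconds < threshold)
    return quick / len(dwell_seconds)
```

For dwell times of 3, 5, 120 and 45 seconds, half of the visits count as quick returns.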
Content created solely for search traffic:
The system attempts to identify websites created exclusively to receive search traffic from Google. News sites are rarely affected because they also publish a lot of content that interests their users and is not just keyword-targeted.
Niche sites focused on keywords and SEO optimization:
Content considered unhelpful includes niche sites that have been too aggressive in pursuing keywords and optimization.
Aggregated click data and website vectors:
The system may apply machine learning to aggregated click data, comparing website vectors with benchmark groups to distinguish helpful from unhelpful content.
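A minimal sketch of comparing a website vector with benchmark groups, assuming each site is already represented as a numeric vector and that similarity is measured by cosine similarity to each group's centroid. The vectors, group names, and choice of cosine similarity are all illustrative assumptions.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def closest_group(site_vector, benchmark_groups):
    """Return the benchmark group whose centroid is most similar to the site.

    benchmark_groups: hypothetical mapping of group name to example vectors.
    """
    scores = {
        name: cosine(site_vector, centroid(vectors))
        for name, vectors in benchmark_groups.items()
    }
    best = max(scores, key=scores.get)
    return best, scores

# Toy two-dimensional vectors, invented for this example.
benchmarks = {
    "helpful": [[0.9, 0.1], [0.8, 0.2]],
    "unhelpful": [[0.1, 0.9], [0.2, 0.8]],
}
label, scores = closest_group([0.7, 0.3], benchmarks)
```

In this toy setup the site vector lands closer to the "helpful" benchmark group.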