Adjust keyphrase distribution scoring for short texts#22694

Merged
marinakoleva merged 10 commits into feature/off-the-bat-analysis from adjust-keyphrase-distribution-scoring-for-short-texts
Nov 13, 2025

@agnieszkaszuba
Contributor

@agnieszkaszuba agnieszkaszuba commented Nov 6, 2025

Context

  • After making the keyphrase distribution assessment available off the bat, we want to adjust the scoring criteria for shorter texts, which the assessment was previously not available for. The keyphrase distribution formula does not work well with shorter texts, so instead we want to give a good score if the keyphrase is found at least once in the text, and a bad score otherwise.
  • This PR also fixes a separate issue: Off the bat: Keyphrase distribution inaccurately returns a good score
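
The short-text rule described above can be sketched as follows. This is a minimal illustration, not the yoastseo implementation: the function name `scoreShortText` and the `SHORT_TEXT_SCORES` object are hypothetical, and the numeric values are borrowed from the assessment scores (9, 6, 1) mentioned under "Relevant technical choices".

```javascript
// Hypothetical sketch: names are illustrative and not taken from the yoastseo codebase.
const SHORT_TEXT_SCORES = { good: 9, bad: 1 };

/**
 * Scores a short text on keyphrase presence only.
 *
 * @param {number} keyphraseOccurrences How often the keyphrase (or a synonym) was found in the text.
 * @returns {number} The good score if the keyphrase occurs at least once, the bad score otherwise.
 */
function scoreShortText( keyphraseOccurrences ) {
	return keyphraseOccurrences >= 1 ? SHORT_TEXT_SCORES.good : SHORT_TEXT_SCORES.bad;
}
```

Under this rule the position and number of the occurrences do not matter, which is what the short-text test steps check by moving a sentence and by switching to a keyphrase that occurs several times.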

Summary

This PR can be summarized in the following changelog entry:

  • Adjusts the keyphrase distribution assessment's scoring criteria for texts that are shorter than 15 sentences.
  • [shopify-seo] Adjusts the keyphrase distribution assessment's scoring criteria for texts that are shorter than 15 sentences.
  • [yoast-doc-extension] Adjusts the keyphrase distribution assessment's scoring criteria for texts that are shorter than 15 sentences.
  • Fixes an unreleased bug where keyphrase distribution would inaccurately return a green traffic light when there was a keyphrase set and the content consisted solely of excluded blocks.
  • [yoastseo enhancement] Renames keyphraseDistributionScore to keyphraseDistractionPercentage to make its function clearer.
  • [yoastseo enhancement] Adjusts the distraction percentage criteria in the keyphraseDistribution research for texts that are shorter than 15 sentences.

Relevant technical choices:

  • keyphraseDistributionScore is renamed to keyphraseDistractionPercentage to make its function clearer: using "score" for both the assessment score (9, 6, 1) and the distraction score (0-100) is confusing.
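
To illustrate why the two values deserve different names, here is a small sketch of how a 0-100 distraction percentage might be mapped onto the 9/6/1 assessment score. The function name and both threshold constants are placeholders chosen for illustration; the actual thresholds are not stated in this PR. The direction (a higher percentage is worse) is also an assumption, based on the name and on the feedback strings.

```javascript
// Placeholder thresholds for illustration only; the real values are not stated in this PR.
const GOOD_MAX_DISTRACTION = 30;
const OKAY_MAX_DISTRACTION = 50;

/**
 * Maps a distraction percentage (0-100) onto an assessment score (9, 6 or 1).
 *
 * @param {number} keyphraseDistractionPercentage The distraction percentage returned by the research.
 * @returns {number} The assessment score.
 */
function toAssessmentScore( keyphraseDistractionPercentage ) {
	if ( keyphraseDistractionPercentage <= GOOD_MAX_DISTRACTION ) {
		return 9;
	}
	if ( keyphraseDistractionPercentage <= OKAY_MAX_DISTRACTION ) {
		return 6;
	}
	return 1;
}
```

The research output and the assessment result live on different scales, so calling both a "score" makes code like the above harder to read.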

Test instructions

Test instructions for the acceptance test before the PR gets merged

This PR can be acceptance tested by following these steps:

Test keyphrase distribution scoring for short texts

Note

Only test this scenario for Google Docs

  • Activate Yoast SEO and Yoast SEO Premium
    • Acceptance tester: Build the feature/off-the-bat-analysis branch in Premium
  • Create a post (or product if in Shopify)
  • Add the following text with 14 sentences:
    Text with 14 sentences:
    Giant pandas conservation efforts have significantly improved their survival prospects over the past few decades. These giant pandas conservation efforts include the creation of more than 60 protected reserves in China. The reserves safeguard essential bamboo forest habitats. Breeding programs, habitat restoration, and international partnerships are all key components of the conservation efforts of giant pandas. They help to boost both wild and captive populations. The Chinese government, along with global wildlife organizations, continues to prioritize these initiatives. This will help to ensure long-term species stability. As a result of sustained conservation efforts of giant pandas, the IUCN reclassified the species from “Endangered” to “Vulnerable” in 2016. Ongoing monitoring and community engagement remain crucial to maintaining this conservation success. In 2020, the giant panda population of the new national park was already above 1,800 individuals. That's roughly 80 percent of the entire panda population in China. Establishing the new protected area in the Sichuan Province also gives various other endangered or threatened species, like the Siberian tiger, the possibility to improve their living conditions by offering them a habitat. Other species who benefit from the protection of its habitat include the snow leopard, the golden snub-nosed monkey, the red panda and the complex-toothed flying squirrel. In July 2021, Chinese conservation authorities announced that giant pandas are no longer endangered in the wild following years of conservation efforts, with a population in the wild exceeding 1,800.
  • Set the keyphrase to: animal conservation efforts
  • Confirm that the keyphrase distribution assessment returns a red traffic light and the following feedback:
    • Keyphrase distribution: Please add both a keyphrase and some text containing the keyphrase or its synonyms.
  • Change the keyphrase to: panda population in China
    • This keyphrase occurs once in the text.
  • Confirm that the keyphrase distribution assessment returns a green traffic light and the following feedback:
    • Keyphrase distribution: Good job!
  • Move the sentence "That's roughly 80 percent of the entire panda population in China." to the end of the text.
  • Confirm that the keyphrase distribution assessment feedback stays the same.
  • Change the keyphrase to: giant panda conservation
    • This keyphrase occurs in the text 5 times.
  • Confirm that the keyphrase distribution feedback stays the same.

Test keyphrase distribution scoring for long texts (regression)

  • Create a new post or remove the text you have in your current one
  • Start with the text from the beginning of the previous section:
    Text with 14 sentences:
    Giant pandas conservation efforts have significantly improved their survival prospects over the past few decades. These giant pandas conservation efforts include the creation of more than 60 protected reserves in China. The reserves safeguard essential bamboo forest habitats. Breeding programs, habitat restoration, and international partnerships are all key components of the conservation efforts of giant pandas. They help to boost both wild and captive populations. The Chinese government, along with global wildlife organizations, continues to prioritize these initiatives. This will help to ensure long-term species stability. As a result of sustained conservation efforts of giant pandas, the IUCN reclassified the species from “Endangered” to “Vulnerable” in 2016. Ongoing monitoring and community engagement remain crucial to maintaining this conservation success. In 2020, the giant panda population of the new national park was already above 1,800 individuals. That's roughly 80 percent of the entire panda population in China. Establishing the new protected area in the Sichuan Province also gives various other endangered or threatened species, like the Siberian tiger, the possibility to improve their living conditions by offering them a habitat. Other species who benefit from the protection of its habitat include the snow leopard, the golden snub-nosed monkey, the red panda and the complex-toothed flying squirrel. In July 2021, Chinese conservation authorities announced that giant pandas are no longer endangered in the wild following years of conservation efforts, with a population in the wild exceeding 1,800.
  • Add this sentence to the text so that it now contains 15 sentences:
    • China has received international praise for its conservation of the species, which has also helped the country establish itself as a leader in endangered species conservation.
  • Set the keyphrase to: animal conservation efforts
  • Confirm that the keyphrase distribution assessment returns a red traffic light and the following feedback:
    • Keyphrase distribution: Please add both a keyphrase and some text containing the keyphrase or its synonyms.
  • Change the keyphrase to: panda population in China
    • This keyphrase occurs once in the text.
  • Confirm that the keyphrase distribution assessment returns a red traffic light and the following feedback:
    • Keyphrase distribution: Very uneven. Large parts of your text do not contain the keyphrase or its synonyms. Distribute them more evenly.
  • Change the sentence "Breeding programs, habitat restoration, and international partnerships are all key components of the conservation efforts of giant pandas." to:
    • Breeding programs, habitat restoration, and international partnerships are all key components of the conservation efforts of the giant pandas population in China.
  • Confirm that the keyphrase distribution assessment returns an orange traffic light and the following feedback:
    • Keyphrase distribution: Uneven. Some parts of your text do not contain the keyphrase or its synonyms. Distribute them more evenly.
  • Change the keyphrase to: giant panda
    • This keyphrase occurs six times in the text.
  • Confirm that the keyphrase distribution assessment returns a green traffic light and the following feedback:
    • Keyphrase distribution: Good job!
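
The two test sections differ by a single sentence because the changelog puts the boundary at "shorter than 15 sentences". A minimal sketch of that boundary check follows; the constant and function names are hypothetical and not taken from the yoastseo codebase.

```javascript
// Hypothetical names; only the limit of 15 sentences comes from the changelog entry.
const SHORT_TEXT_SENTENCE_LIMIT = 15;

/**
 * Decides which scoring criteria apply to a text.
 *
 * @param {number} sentenceCount The number of sentences in the text.
 * @returns {string} "presence" for short texts, "distribution" for texts of 15 sentences or more.
 */
function selectScoringCriteria( sentenceCount ) {
	return sentenceCount < SHORT_TEXT_SENTENCE_LIMIT ? "presence" : "distribution";
}
```

With this sketch, the 14-sentence fixture selects "presence" and the 15-sentence fixture selects "distribution". That matches the expected results above, where the keyphrase that occurs once turns green in the short text and red in the long one.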

Test the fix for the unreleased bug where keyphrase distribution inaccurately returned a good score

Scenario 1

  1. Create an empty post
  2. Set the focus keyphrase
  3. Add spaces to the post, OR add a Yoast Breadcrumb block
  4. Confirm the keyphrase distribution assessment returns a red 🔴 bullet and the feedback string Keyphrase distribution: Please add both a keyphrase and some text containing the keyphrase or its synonyms.

Scenario 2

  1. Add a new product in Woo (no text)
  2. Add a keyphrase
  3. Confirm the keyphrase distribution assessment returns a red 🔴 bullet and the feedback string Keyphrase distribution: Please add both a keyphrase and some text containing the keyphrase or its synonyms.

Relevant test scenarios

  • Changes should be tested with the browser console open
  • Changes should be tested on different posts/pages/taxonomies/custom post types/custom taxonomies
  • Changes should be tested on different editors (Default Block/Gutenberg/Classic/Elementor/other)
  • Changes should be tested on different browsers
  • Changes should be tested on multisite

Test instructions for QA when the code is in the RC

  • QA should use the same steps as above.

QA can test this PR by following these steps:

Impact check

This PR affects the following parts of the plugin, which may require extra testing:

Other environments

  • This PR also affects Shopify. I have added a changelog entry starting with [shopify-seo], added test instructions for Shopify and attached the Shopify label to this PR.

Documentation

  • I have written documentation for this change. For example, comments in the Relevant technical choices, comments in the code, documentation on Confluence / shared Google Drive / Yoast developer portal, or other.

Quality assurance

  • I have tested this code to the best of my abilities.
  • During testing, I had activated all plugins that Yoast SEO provides integrations for.
  • I have added unit tests to verify the code works as intended.
  • If any part of the code is behind a feature flag, my test instructions also cover cases where the feature flag is switched off.
  • I have written this PR in accordance with my team's definition of done.
  • I have checked that the base branch is correctly set.

Innovation

  • No innovation project is applicable for this PR.
  • This PR falls under an innovation project. I have attached the innovation label.
  • I have added my hours to the WBSO document.

Fixes https://github.com/Yoast/lingo-other-tasks/issues/626

@agnieszkaszuba agnieszkaszuba added the changelog: non-user-facing Needs to be included in the 'Non-userfacing' category in the changelog label Nov 6, 2025
@agnieszkaszuba agnieszkaszuba added the Shopify This PR impacts Shopify. label Nov 6, 2025
@agnieszkaszuba agnieszkaszuba marked this pull request as ready for review November 6, 2025 15:17
…ress-seo into adjust-keyphrase-distribution-scoring-for-short-texts
@coveralls

coveralls commented Nov 6, 2025

Pull Request Test Coverage Report for Build 3492c9ba44fde6b7ffe320362a46e826663ae9aa

Details

  • 15 of 15 (100.0%) changed or added relevant lines in 2 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.007%) to 53.442%

Totals Coverage Status
Change from base Build d11c2d337cd7e7e52815a482629c59e679f0d1a7: 0.007%
Covered Lines: 32294
Relevant Lines: 60668

💛 - Coveralls

Comment on lines -236 to +264
- // It is a valid sentence if the last token of the current sentence is ending with a sentence delimiter and if next sentence exists,
- // it should start with a valid sentence beginning.
- if ( nextSentence ) {
- 	const nextSentenceFirstToken = nextSentence.getFirstToken();
- 	return sentenceDelimiterRegex.test( currentSentenceLastToken.text ) &&
- 		( nextSentenceFirstToken && sentenceTokenizer.isValidSentenceBeginning( nextSentenceFirstToken.text[ 0 ] ) );
- }
- return sentenceDelimiterRegex.test( currentSentenceLastToken.text );
+ // It is a valid sentence if the last token of the current sentence is ending with a sentence delimiter and if the next
+ // sentence starts with a valid sentence beginning.
+ const nextSentenceFirstToken = nextSentence.getFirstToken();
+ return sentenceDelimiterRegex.test( currentSentenceLastToken.text ) &&
+ 	( nextSentenceFirstToken && sentenceTokenizer.isValidSentenceBeginning( nextSentenceFirstToken.text[ 0 ] ) );
Contributor Author

Unrelated to this specific issue, but I noticed that we'd never reach the line return sentenceDelimiterRegex.test( currentSentenceLastToken.text ); so I simplified the code
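
As a rough illustration of why such a simplification is safe, the sketch below compares the old and new shapes of the check. It assumes the check is only ever called when a next sentence exists; the diff does not show this, but it would explain why the fallback return was never reached. The regex and `isValidBeginning` are simplified stand-ins for the real `sentenceDelimiterRegex` and `sentenceTokenizer.isValidSentenceBeginning`, and both function names are hypothetical.

```javascript
// Simplified stand-ins for illustration; these are not the real yoastseo helpers.
const delimiterRegex = /[.!?]$/;
const isValidBeginning = ( character ) => /[A-Z0-9]/.test( character );

// Old shape: guards against a missing next sentence and falls back to the delimiter check alone.
function isValidSentenceOld( lastTokenText, nextFirstTokenText ) {
	if ( nextFirstTokenText !== undefined ) {
		return delimiterRegex.test( lastTokenText ) && isValidBeginning( nextFirstTokenText[ 0 ] );
	}
	// Dead code if every caller passes a next sentence.
	return delimiterRegex.test( lastTokenText );
}

// New shape: no guard and no fallback, because the next sentence is assumed to exist.
function isValidSentenceNew( lastTokenText, nextFirstTokenText ) {
	return delimiterRegex.test( lastTokenText ) && isValidBeginning( nextFirstTokenText[ 0 ] );
}
```

For any call where the next sentence is present, both functions return the same result, so dropping the guard changes no behaviour under that assumption.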

@marinakoleva
Contributor

CR & AT done ✔️

Note: This PR hasn't been tested on Shopify, but the behaviour on a product page in Woo has been checked.

@marinakoleva marinakoleva merged commit 42effbd into feature/off-the-bat-analysis Nov 13, 2025
19 checks passed
@marinakoleva marinakoleva deleted the adjust-keyphrase-distribution-scoring-for-short-texts branch November 13, 2025 15:50
@FAMarfuaty FAMarfuaty added the Google Docs Add-on If this PR is also relevant or has an impact on the Google Docs Add-on label Dec 8, 2025