Adjust keyphrase distribution scoring for short texts #22694
Merged
marinakoleva merged 10 commits into feature/off-the-bat-analysis on Nov 13, 2025
…ress-seo into adjust-keyphrase-distribution-scoring-for-short-texts
agnieszkaszuba commented Nov 6, 2025
Comment on lines -236 to +264
- // It is a valid sentence if the last token of the current sentence is ending with a sentence delimiter and if next sentence exists,
- // it should start with a valid sentence beginning.
- if ( nextSentence ) {
-     const nextSentenceFirstToken = nextSentence.getFirstToken();
-     return sentenceDelimiterRegex.test( currentSentenceLastToken.text ) &&
-         ( nextSentenceFirstToken && sentenceTokenizer.isValidSentenceBeginning( nextSentenceFirstToken.text[ 0 ] ) );
- }
- return sentenceDelimiterRegex.test( currentSentenceLastToken.text );
+ // It is a valid sentence if the last token of the current sentence is ending with a sentence delimiter and if the next
+ // sentence starts with a valid sentence beginning.
+ const nextSentenceFirstToken = nextSentence.getFirstToken();
+ return sentenceDelimiterRegex.test( currentSentenceLastToken.text ) &&
+     ( nextSentenceFirstToken && sentenceTokenizer.isValidSentenceBeginning( nextSentenceFirstToken.text[ 0 ] ) );
Unrelated to this specific issue, but I noticed that we'd never reach the line `return sentenceDelimiterRegex.test( currentSentenceLastToken.text );`, so I simplified the code.
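The simplification described in this comment can be sketched in isolation. This is a minimal illustration, not the plugin's code: `sentenceDelimiterRegex`, `sentenceTokenizer.isValidSentenceBeginning` and `makeSentence` below are simplified stand-ins, and the sketch assumes the caller always supplies a next sentence, which would explain why the fallback `return` was never reached.

```javascript
// Simplified stand-ins (assumptions, not the plugin's actual implementations).
const sentenceDelimiterRegex = /[.?!]$/;
const sentenceTokenizer = {
	// Assumption: a sentence may begin with an uppercase letter or a digit.
	isValidSentenceBeginning: ( character ) => /[A-Z0-9]/.test( character ),
};

// Hypothetical minimal sentence object exposing only getFirstToken().
function makeSentence( text ) {
	const words = text.split( /\s+/ ).filter( Boolean );
	return {
		getFirstToken: () => ( words.length > 0 ? { text: words[ 0 ] } : undefined ),
	};
}

// Shape of the check before the change: the final return only runs when nextSentence is missing.
function isValidSentenceOld( currentSentenceLastToken, nextSentence ) {
	if ( nextSentence ) {
		const nextSentenceFirstToken = nextSentence.getFirstToken();
		return sentenceDelimiterRegex.test( currentSentenceLastToken.text ) &&
			( nextSentenceFirstToken && sentenceTokenizer.isValidSentenceBeginning( nextSentenceFirstToken.text[ 0 ] ) );
	}
	return sentenceDelimiterRegex.test( currentSentenceLastToken.text );
}

// Shape of the check after the change: nextSentence is assumed to always be defined.
function isValidSentenceNew( currentSentenceLastToken, nextSentence ) {
	const nextSentenceFirstToken = nextSentence.getFirstToken();
	return sentenceDelimiterRegex.test( currentSentenceLastToken.text ) &&
		( nextSentenceFirstToken && sentenceTokenizer.isValidSentenceBeginning( nextSentenceFirstToken.text[ 0 ] ) );
}

// Usage: both versions agree whenever a next sentence is passed in.
const lastToken = { text: "decades." };
const next = makeSentence( "These efforts include reserves." );
console.log( Boolean( isValidSentenceOld( lastToken, next ) ) === Boolean( isValidSentenceNew( lastToken, next ) ) );
```

Note that the expression can evaluate to `undefined` rather than `false` when the next sentence has no first token, so callers that need a strict boolean would have to coerce the result.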
…st/wordpress-seo into adjust-keyphrase-distribution-scoring-for-short-texts
…in an effort to make what it does clearer
CR & AT done ✔️ Note: This PR hasn't been tested on Shopify, but the behaviour on a product page in Woo has been checked.
This was referenced Nov 17, 2025
Context
Summary
This PR can be summarized in the following changelog entry:
- Renames `keyphraseDistributionScore` to `keyphraseDistractionPercentage` to make its function clearer.
- Adjusts the scoring of the `keyphraseDistribution` research for texts that are shorter than 15 sentences.
Relevant technical choices:
- `keyphraseDistributionScore` is renamed to `keyphraseDistractionPercentage` to make its function clearer, because using `score` to refer to both the assessment score (9, 6, 1) and the distraction score (0-100) is confusing.
Test instructions
Test instructions for the acceptance test before the PR gets merged
This PR can be acceptance tested by following these steps:
Test keyphrase distribution scoring for short texts
Note
Only test this scenario for Google docs
`feature/off-the-bat-analysis` branch in Premium
Text with 14 sentences
Giant pandas conservation efforts have significantly improved their survival prospects over the past few decades. These giant pandas conservation efforts include the creation of more than 60 protected reserves in China. The reserves safeguard essential bamboo forest habitats. Breeding programs, habitat restoration, and international partnerships are all key components of the conservation efforts of giant pandas. They help to boost both wild and captive populations. The Chinese government, along with global wildlife organizations, continues to prioritize these initiatives. This will help to ensure long-term species stability. As a result of sustained conservation efforts of giant pandas, the IUCN reclassified the species from “Endangered” to “Vulnerable” in 2016. Ongoing monitoring and community engagement remain crucial to maintaining this conservation success. In 2020, the giant panda population of the new national park was already above 1,800 individuals. That's roughly 80 percent of the entire panda population in China. Establishing the new protected area in the Sichuan Province also gives various other endangered or threatened species, like the Siberian tiger, the possibility to improve their living conditions by offering them a habitat. Other species who benefit from the protection of its habitat include the snow leopard, the golden snub-nosed monkey, the red panda and the complex-toothed flying squirrel. In July 2021, Chinese conservation authorities announced that giant pandas are no longer endangered in the wild following years of conservation efforts, with a population in the wild exceeding 1,800.
Test keyphrase distribution scoring for long texts (regression)
Text with 14 sentences
Giant pandas conservation efforts have significantly improved their survival prospects over the past few decades. These giant pandas conservation efforts include the creation of more than 60 protected reserves in China. The reserves safeguard essential bamboo forest habitats. Breeding programs, habitat restoration, and international partnerships are all key components of the conservation efforts of giant pandas. They help to boost both wild and captive populations. The Chinese government, along with global wildlife organizations, continues to prioritize these initiatives. This will help to ensure long-term species stability. As a result of sustained conservation efforts of giant pandas, the IUCN reclassified the species from “Endangered” to “Vulnerable” in 2016. Ongoing monitoring and community engagement remain crucial to maintaining this conservation success. In 2020, the giant panda population of the new national park was already above 1,800 individuals. That's roughly 80 percent of the entire panda population in China. Establishing the new protected area in the Sichuan Province also gives various other endangered or threatened species, like the Siberian tiger, the possibility to improve their living conditions by offering them a habitat. Other species who benefit from the protection of its habitat include the snow leopard, the golden snub-nosed monkey, the red panda and the complex-toothed flying squirrel. In July 2021, Chinese conservation authorities announced that giant pandas are no longer endangered in the wild following years of conservation efforts, with a population in the wild exceeding 1,800.
Test fix of unreleased bug where Keyphrase distribution inaccurately returned a good score
Scenario 1
Keyphrase distribution: Please add both a keyphrase and some text containing the keyphrase or its synonyms.
Scenario 2
Keyphrase distribution: Please add both a keyphrase and some text containing the keyphrase or its synonyms.
Relevant test scenarios
Test instructions for QA when the code is in the RC
QA can test this PR by following these steps:
Impact check
This PR affects the following parts of the plugin, which may require extra testing:
Other environments
[shopify-seo], added test instructions for Shopify and attached the Shopify label to this PR.
Documentation
Quality assurance
Innovation
innovation label.
Fixes https://github.com/Yoast/lingo-other-tasks/issues/626
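As a rough illustration of the naming distinction made under "Relevant technical choices", the sketch below keeps the 0-100 distraction percentage and the 9/6/1 assessment score as two separate values. Everything in it is an assumption for illustration: the function names, the way the percentage is derived (longest run of sentences without the keyphrase, relative to the total), and the threshold values are hypothetical and not taken from the plugin. Only the 15-sentence limit comes from this PR, and how short texts are scored differently is not modelled here.

```javascript
// From the PR description: texts shorter than 15 sentences get adjusted scoring.
const SHORT_TEXT_SENTENCE_LIMIT = 15;

// Hypothetical helper: true when the text counts as short.
function isShortText( sentences ) {
	return sentences.length < SHORT_TEXT_SENTENCE_LIMIT;
}

// Hypothetical calculation: the longest run of sentences without the keyphrase,
// expressed as a percentage (0-100) of all sentences.
// Input: one boolean per sentence, true when the sentence contains the keyphrase or a synonym.
function computeKeyphraseDistractionPercentage( sentenceHasKeyphrase ) {
	if ( sentenceHasKeyphrase.length === 0 ) {
		// Assumption: an empty text is treated as fully distracted.
		return 100;
	}
	let longestGap = 0;
	let currentGap = 0;
	for ( const hasKeyphrase of sentenceHasKeyphrase ) {
		currentGap = hasKeyphrase ? 0 : currentGap + 1;
		longestGap = Math.max( longestGap, currentGap );
	}
	return ( longestGap * 100 ) / sentenceHasKeyphrase.length;
}

// Hypothetical thresholds mapping the percentage onto the assessment score (9, 6, 1).
function toAssessmentScore( keyphraseDistractionPercentage ) {
	if ( keyphraseDistractionPercentage > 50 ) {
		return 1;
	}
	if ( keyphraseDistractionPercentage > 30 ) {
		return 6;
	}
	return 9;
}

// Usage: the two values live on different scales and carry different names.
const flags = [ true, true, false, true, false, false, false, true, false, false ];
const keyphraseDistractionPercentage = computeKeyphraseDistractionPercentage( flags );
const assessmentScore = toAssessmentScore( keyphraseDistractionPercentage );
console.log( { keyphraseDistractionPercentage, assessmentScore, short: isShortText( flags ) } );
```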