fixes a bug where yoast markers break html of content in shopify and classic#19499

Closed
hdvos wants to merge 28 commits into feature/lingo-fixes from
19477-yoast-markers-break-html-of-content-in-shopify-and-classic

Conversation

@hdvos
Contributor

@hdvos hdvos commented Dec 30, 2022

Context

  • Highlighting that used the "markWordsInSentences" strategy (such as keyphrase highlighting) did not work when there was an anchor in the text that contained the string that should be marked (e.g. the keyphrase catnip also occurred in the url in the anchor).
  • This problem was solved previously in the following PR: PC-965 Fixes highlighting when inline HTML tags are present in Classic editor and in Shopify #19162. That PR was reverted because it caused a problem with classic blocks in the block editor.
  • The current PR uses a different (less convoluted) strategy than that previous PR.
  • There is an unsolved edge case: if a closing html bracket (>) appears in the same sentence after a word that must be marked, then this word cannot be marked. The decision was made that solving this would cost too much time compared to the gains.
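
For illustration, one way to skip matches inside tags is a negative lookahead that rejects a word when a > follows it with no < in between. This is a minimal sketch and not the code in this PR: the helper name markWordOutsideTags and the regex are assumptions. It also reproduces the edge case above, because a literal > later in the sentence blocks the match.

```javascript
// Hypothetical helper for illustration only; not the actual Yoast implementation.
function markWordOutsideTags( html, word ) {
	// Escape regex metacharacters in the word to be marked.
	const escaped = word.replace( /[.*+?^${}()|[\]\\]/g, "\\$&" );
	// Match the word only when it is NOT followed by a ">" without a "<" in between,
	// i.e. when it does not sit inside a tag such as <a href="...">.
	const pattern = new RegExp( "\\b" + escaped + "\\b(?![^<]*>)", "gi" );
	return html.replace( pattern, "<yoastmark class='yoast-text-mark'>$&</yoastmark>" );
}

// The keyphrase in the href is left alone; the one in the link text is marked.
console.log( markWordOutsideTags( "<a href=\"https://example.com/catnip\">catnip flowers</a>", "catnip" ) );

// Edge case: a literal ">" after the word prevents marking.
console.log( markWordOutsideTags( "catnip is > grass", "catnip" ) );
```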

Summary

This PR can be summarized in the following changelog entry:

  • Fixes a bug where the highlighting feature in the Classic editor would not work when inline HTML tags were present.
  • [shopify-seo] Fixes a bug where the yoastmark tags broke the HTML when applied to inline HTML attributes.
  • [yoastseo] Excludes applying yoastmark to anchor tag attributes.

Relevant technical choices:

  • In getSentencesFromTokens, a check was added. Previously, if there was an opening tag at the beginning of the text and a closing tag at the end, both would be removed, regardless of whether they belonged together. This PR adds a rudimentary (and not watertight) check of whether they belong together. Once we have the HTML parser, this check should be replaced.
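
A rough sketch of such a check, assuming the first and last tokens are available as strings. The function name tagsBelongTogether and the exact tagTypeRegex are assumptions; only the final return expression mirrors the snippet quoted in the review thread of this PR.

```javascript
// Assumed regex: captures the tag name of an opening or closing tag.
const tagTypeRegex = /^<\/?([a-zA-Z][a-zA-Z0-9]*)/;

// Hypothetical helper: decides whether the first and last tag may be stripped together.
function tagsBelongTogether( firstTokenText, lastTokenText ) {
	const firstMatch = firstTokenText.match( tagTypeRegex );
	const lastMatch = lastTokenText.match( tagTypeRegex );
	if ( ! firstMatch || ! lastMatch ) {
		return false;
	}
	const firstTagType = firstMatch[ 1 ].toLowerCase();
	const lastTagType = lastMatch[ 1 ].toLowerCase();

	// Only same-type block-like tags are considered safe to remove as a pair.
	return firstTagType === lastTagType &&
		[ "p", "div", "h1", "h2", "h3", "h4", "h5", "h6", "span" ].includes( firstTagType );
}

console.log( tagsBelongTogether( "<p>", "</p>" ) );   // true
console.log( tagsBelongTogether( "<i>", "</i>" ) );   // false: stylistic tags are excluded
console.log( tagsBelongTogether( "<p>", "</div>" ) ); // false: tag types differ
```

The check stays rudimentary: it compares tag types only and does not verify that the two tags actually form one pair in the document.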

Test instructions

Test instructions for the acceptance test before the PR gets merged

This PR can be acceptance tested by following these steps:

NOTE: while testing, you might notice different results for the same text between a post and a page in the default editor. This is reproducible on trunk, and this issue was created for it.

WordPress

  • Install and activate Yoast SEO
  • Set the site language to English
  • Create a post in Classic editor and add this text:
    catnip.txt

NOTE: Add the text in Text editor mode in Classic editor, and in Code editor mode in Block editor

Keyword density assessment
  • Set the word catnip as the focus keyphrase
  • Confirm that the keyphrase density assessment detects 8 occurrences of the keyphrase in the text
  • Embed this link below to the phrase "alternative plants exist" (the link contains the focus keyphrase):

https://en.wikipedia.org/wiki/Catnip#Felines_not_affected_by_catnip

  • Confirm that the keyword density assessment still detects 8 occurrences of the keyphrase in the text
  • Click the eye icon of keyword density assessment
  • Confirm that the "alternative plants exist" is not highlighted
  • Check the anchor link of the phrase and confirm that the href value doesn't contain yoast mark tags, e.g. "<yoastmark class='yoast-text-mark'>" or "</yoastmark>"
  • Embed this link below to the phrase "catnip flowers" (the link contains the focus keyphrase)

https://hort.extension.wisc.edu/articles/catnip-nepeta-cataria/

  • Confirm that the keyword density assessment still detects 8 occurrences of the keyphrase in the text
  • Click the eye icon of keyword density assessment
  • Confirm that the "catnip" in "catnip flowers" is highlighted
  • Check the anchor link of the phrase and confirm that the href value doesn't contain yoast mark tags, e.g. "<yoastmark class='yoast-text-mark'>" or "</yoastmark>"
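
The href check in the steps above can also be done on the HTML copied from the Text editor. This is only a convenience sketch: findMarkedHrefs is a hypothetical helper, it assumes quoted href attributes, and it is not part of the plugin.

```javascript
// Hypothetical helper: returns every href value that contains a yoastmark tag.
function findMarkedHrefs( html ) {
	// Assumes href values are wrapped in single or double quotes.
	const hrefPattern = /href\s*=\s*(["'])(.*?)\1/gi;
	const marked = [];
	for ( const match of html.matchAll( hrefPattern ) ) {
		if ( /<\/?yoastmark/i.test( match[ 2 ] ) ) {
			marked.push( match[ 2 ] );
		}
	}
	return marked;
}

// An empty array means no href was polluted by mark tags.
console.log( findMarkedHrefs( "<a href=\"https://example.com/catnip\">catnip flowers</a>" ) );
```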
Keyphrase distribution assessment
  • Install and activate Yoast SEO premium
    • If building the plugin, run composer require yoast/wordpress-seo:dev-PC-965-yoast-markers-break-html-of-content@dev before building it
  • Go to the previous post
  • Set the word catnip as the focus keyphrase
  • Embed this link below to the phrase "alternative plants exist" (the link contains the focus keyphrase):

https://en.wikipedia.org/wiki/Catnip#Felines_not_affected_by_catnip

  • Click the eye icon of keyphrase distribution assessment
  • Confirm that the "alternative plants exist" is not highlighted
  • Check the anchor link of the phrase and confirm that the href value doesn't contain yoast mark tags, e.g. "<yoastmark class='yoast-text-mark'>" or "</yoastmark>"
  • Embed this link below to the phrase "catnip flowers":

https://hort.extension.wisc.edu/articles/catnip-nepeta-cataria/

  • Click the eye icon of keyphrase distribution assessment
  • Confirm that the "catnip" in "catnip flowers" is highlighted
  • Check the anchor link of the phrase and confirm that the href value doesn't contain yoast mark tags, e.g. "<yoastmark class='yoast-text-mark'>" or "</yoastmark>"
Synonyms:
  • add catmint as a keyphrase synonym.
  • toggle the highlighting button for keyphrase density
  • Make sure all occurrences of catnip are marked
  • toggle the highlighting button for the keyphrase distribution
  • Make sure all occurrences of catnip as well as all occurrences of catmint are marked
Word complexity assessment
  • Install and activate Yoast SEO premium
  • Go to the previous post
  • Go to the Readability analysis tab
  • Embed this link below to the word "hereditary" in the text:

https://en.wikipedia.org/wiki/hereditary-defects

  • Click the eye icon on the Word complexity assessment
  • Confirm that the word "hereditary" is highlighted
  • Check the anchor link of the phrase and confirm that the href value doesn't contain yoast mark tags, e.g. "<yoastmark class='yoast-text-mark'>" or "</yoastmark>"
Sentence length, Passive voice, and Transition words assessment
  • Go to the previous post
  • Add this sentence to the post:

However, the compounds were found to repel <a href="https://en.wikipedia.org/wiki/Mosquito" rel="nofollow">mosquitos</a>, and it is hypothesized that rubbing against the plants provides the cats with a chemical coat that protects them against mosquito bites.

NOTE: Add the text in Text editor mode in Classic editor, and in Code editor mode in Block editor

  • Click on the eye icon of the sentence length assessment
  • Confirm that the sentence above is highlighted
  • Click on the eye icon of the passive sentence assessment
  • Confirm that the sentence above is highlighted
  • Click on the eye icon of the transition words assessments
  • Confirm that the sentence above is highlighted
Consecutive sentences assessment
  • (Assessment not available for E-commerce content types)
  • Go to the previous post
  • Add these three sentences containing anchor links that start with the same words:

<a href="https://en.wikipedia.org/wiki/Cat">Cats</a> detect nepetalactone through their <a href="https://en.wikipedia.org/wiki/Olfactory_epithelium">olfactory epithelium</a>, not through their vomeronasal organ. <a href="https://en.wikipedia.org/wiki/Cat">Cats</a> detect nepetalactone through their <a href="https://en.wikipedia.org/wiki/Olfactory_epithelium">olfactory epithelium</a>, not through their vomeronasal organ. <a href="https://en.wikipedia.org/wiki/Cat">Cats</a> detect nepetalactone through their <a href="https://en.wikipedia.org/wiki/Olfactory_epithelium">olfactory epithelium</a>, not through their vomeronasal organ.

NOTE: Add the text in Text editor mode in Classic editor, and in Code editor mode in Block editor

  • Click on the eye icon of the consecutive sentences assessment
  • Confirm that the three sentences above are highlighted
Paragraph length assessment
  • Go to the previous post
  • Make one of the paragraphs that contain links longer than 150 words
  • Click on the paragraph length assessment
  • Confirm that the long paragraph that contains links is highlighted
Paragraph length assessment
  • Combine the 2nd, 3rd and 4th paragraph to 1 paragraph.
  • Activate highlighting for paragraph length assessment
  • Make sure the newly combined paragraph is highlighted.
Inclusive language analysis
  • Go to the previous post
  • In one of the sentences that contain links, add a non-inclusive word like, "seniors", "policemen" etc.
  • Click on the eye icon of the feedback
  • Confirm that sentence that contains the non-inclusive word is highlighted

Upgrade routine

  • Install and activate the previous version of Yoast SEO

  • Set the site language to English

  • Create a post in Classic editor and add this text:
    catnip.txt
    NOTE: Add the text in Text editor mode in Classic editor

  • Set "catnip" as the focus keyphrase

  • Embed this link below to the phrase "catnip flowers":

https://hort.extension.wisc.edu/articles/catnip-nepeta-cataria/

  • Click the eye icon of keyphrase distribution assessment or keyphrase density
  • Confirm that the "catnip" in "catnip flowers" is NOT highlighted
  • Upgrade Yoast SEO version to the one that includes this fix
  • Go to the previous post
  • Click the eye icon of keyphrase distribution assessment or keyphrase density
  • Confirm that the "catnip" in "catnip flowers" is highlighted
Smoke test Advanced custom fields
Smoke test classic editor block in block editor
  • Install and activate the classic editor.
  • Create a post and add text and save the post.
  • Deactivate the classic editor.
  • Open the post in the block editor. Do NOT try block recovery.
  • Smoke test all highlighting.

Test in Shopify

  • Install and activate Yoast SEO for Shopify

    • If building the app:
      • Build the app from main in shopify-seo
      • Link the wordpress-seo before building
  • Set the store language to English

  • Create a product and add this text:
    catnip.txt

  • NOTE: the test instruction below should be repeated in all content types in Shopify

Keyword density and Keyphrase distribution assessment
  • Set the word catnip as the focus keyphrase
  • Confirm that the keyword density assessment detects 8 occurrences of the keyphrase in the text
  • Embed this link below to the phrase "alternative plants exist":

https://en.wikipedia.org/wiki/Catnip#Felines_not_affected_by_catnip

  • Confirm that the keyword density assessment still detects 8 occurrences of the keyphrase in the text
  • Click the eye icon of keyword density assessment
  • Confirm that the "alternative plants exist" is not highlighted
  • Check the anchor link of the phrase and confirm that the href value doesn't contain yoast mark tags, e.g. "<yoastmark class='yoast-text-mark'>" or "</yoastmark>"
  • Click the eye icon of keyphrase distribution assessment
  • Confirm that the "alternative plants exist" is not highlighted
  • Check the anchor link of the phrase and confirm that the href value doesn't contain yoast mark tags, e.g. "<yoastmark class='yoast-text-mark'>" or "</yoastmark>"
  • Embed this link below to the phrase "catnip flowers":

https://hort.extension.wisc.edu/articles/catnip-nepeta-cataria/

  • Confirm that the keyword density assessment still detects 8 occurrences of the keyphrase in the text

  • Click the eye icon of keyword density assessment

  • Confirm that the "catnip" in "catnip flowers" is highlighted

  • Check the anchor link of the phrase and confirm that the href value doesn't contain yoast mark tags, e.g. "<yoastmark class='yoast-text-mark'>" or "</yoastmark>"

  • Click the eye icon of keyphrase distribution assessment

  • Confirm that the "catnip" in "catnip flowers" is highlighted

  • Check the anchor link of the phrase and confirm that the href value doesn't contain yoast mark tags, e.g. "<yoastmark class='yoast-text-mark'>" or "</yoastmark>"

Word complexity assessment
  • Go to the Readability analysis tab
  • Embed this link below to the word "hereditary" in the text:

https://en.wikipedia.org/wiki/hereditary-defects

  • Click the eye icon on the Word complexity assessment
  • Confirm that the word "hereditary" is highlighted
  • Check the anchor link of the phrase and confirm that the href value doesn't contain yoast mark tags, e.g. "<yoastmark class='yoast-text-mark'>" or "</yoastmark>"
(Smoke) Test impact by sentence tokenizer
  • Add the following sentence as a separate paragraph (use text editing mode (as opposed to visual editing mode)): <i>The cat</i> was greeted by <i>the dog</i>.
  • Toggle highlighting for passive voice analysis. Make sure that the above-mentioned sentence is properly highlighted.
    • "Properly" means:
    • The sentence is fully highlighted.
    • The previous and next sentences are not highlighted (unless they are passive voice as well).
    • While highlighting is active, 'the cat' and 'the dog' are still in italics.
    • After undoing the highlighting, 'the cat' and 'the dog' are still in italics.
  • Add the following sentence as a separate paragraph (use text editing mode (as opposed to visual editing mode)): <div>The cat was greeted by the dog</div>.
  • Toggle highlighting for passive voice analysis. Make sure that the above-mentioned sentence is properly highlighted.
  • Add the word midget to a few random sentences in the post. This should trigger the inclusive language assessment for midget.
  • Toggle highlighting for midget.
  • Make sure that all sentences containing midget are properly highlighted.

Relevant test scenarios

  • Changes should be tested with the browser console open
  • Changes should be tested on different posts/pages/taxonomies/custom post types/custom taxonomies
  • Changes should be tested on different editors (Block/Classic/Elementor/other)
  • Changes should be tested on different browsers
  • Changes should be tested on multisite

Test instructions for QA when the code is in the RC

  • QA should use the same steps as above.

QA can test this PR by following these steps:

Impact check

This PR affects the following parts of the plugin, which may require extra testing:

  • This PR impacts all assessments that use highlighting. Tests are added for this.
  • This PR impacts classic editor and shopify. Block editor and Elementor need to be smoke tested.
  • This PR impacts the sentence tokenizer. An acceptance testing scenario was added to cover this.

UI changes

  • This PR changes the UI in the plugin. I have added the 'UI change' label to this PR.

Other environments

  • This PR also affects Shopify. I have added a changelog entry starting with [shopify-seo], added test instructions for Shopify and attached the Shopify label to this PR.

Documentation

  • I have written documentation for this change.

Quality assurance

  • I have tested this code to the best of my abilities
  • I have added unit tests to verify the code works as intended
  • If any part of the code is behind a feature flag, my test instructions also cover cases where the feature flag is switched off.
  • I have written this PR in accordance with my team's definition of done.

Innovation

  • No innovation project is applicable for this PR.
  • This PR falls under an innovation project. I have attached the innovation label and noted the work hours.

Fixes #19477

@hdvos hdvos changed the base branch from trunk to feature/lingo-fixes December 30, 2022 11:50
@hdvos hdvos changed the title from "19477 yoast markers break html of content in shopify and classic" to "yoast markers break html of content in shopify and classic" Dec 30, 2022
@hdvos hdvos added the "changelog: bugfix" and "Shopify" labels Dec 30, 2022
@hdvos hdvos marked this pull request as ready for review December 30, 2022 15:12
Comment thread packages/js/src/decorator/tinyMCE.js Outdated
@hdvos hdvos linked an issue Dec 30, 2022 that may be closed by this pull request
@hdvos hdvos changed the title from "yoast markers break html of content in shopify and classic" to "fixes a bug where yoast markers break html of content in shopify and classic" Jan 2, 2023
Contributor

@FAMarfuaty FAMarfuaty left a comment

Good job for finding that smart and neat logic for not applying yoastmark in html tags! 🙌🏽

I have a suggestion for checking whether ACF plugin is used. It needs to be adapted, since it's currently not working that way :)

Comment thread packages/js/src/decorator/tinyMCE.js Outdated
@FAMarfuaty
Contributor

FAMarfuaty commented Jan 2, 2023

NOTES:

  • Currently, the content of the custom fields is not included in the analysis in the Classic editor. Hence, we cannot test the highlighting feature for the Custom fields in Classic editor.
  • The content of the custom fields is included in the analysis in Block editor. However, the highlighting is not applied for the match inside the custom field. This is related to this bug issue.

The two points above are not in the scope of this PR.

@manuelaugustin manuelaugustin added this to the feature/lingo-fixes milestone Jan 3, 2023
…into 19477-yoast-markers-break-html-of-content-in-shopify-and-classic
@hdvos hdvos requested a review from a team as a code owner January 12, 2023 13:20
Contributor

@FAMarfuaty FAMarfuaty left a comment

It's super nice that you found a solution to the mind-boggling situation 🐡 🙌🏽
I have a few comments and minor suggestions (that I think can be directly committed? you're the judge 😸 )

Another thing, since we introduce an additional check in the sentence tokenizer which we use everywhere (as proven by the adapted fullTextTests), maybe you can think of an additional impact to check?

Otherwise, the code is good to go!

Comment thread packages/js/src/decorator/tinyMCE.js Outdated
const element = document.createElement( "body" );
element.innerHTML = str;
return element.innerHTML;
}
Contributor

Nice that you rename the function and also make the description even clearer! 🤩

const lastTagType = lastTokenText.match( tagTypeRegex )[ 1 ];

return firstTagType === lastTagType && [ "p", "div", "h1", "h2", "h3", "h4", "h5", "h6", "span" ].includes( firstTagType );
}
Contributor

@hdvos Is there a reason why we only check the tags in the array and not the others? 🤔

Contributor Author

@FAMarfuaty : yes. This is the list of types for which I deem it safe to remove the first and the last tag. If you were to include stylistic tags such as <i> and <b>, the chance of creating corrupted html would be too big as it is not unlikely that only the first and last word of a paragraph/block are italics. For example <i>First words</i> ... <i> last words.</i> would be normalized to the corrupted First words</i> ... <i> last words.

Also, I think these types of 'semantic' tags are the situation this function was intended for.

Comment thread packages/yoastseo/src/languageProcessing/helpers/sentence/SentenceTokenizer.js Outdated
hdvos and others added 2 commits January 16, 2023 09:07
Co-authored-by: Aida Marfuaty <48715883+FAMarfuaty@users.noreply.github.com>
Co-authored-by: Aida Marfuaty <48715883+FAMarfuaty@users.noreply.github.com>
@hdvos
Contributor Author

hdvos commented Jan 16, 2023

@FAMarfuaty : regarding the extra impact check. I will add one that smoke tests the sentence tokenizer.

@hdvos hdvos closed this Jan 17, 2023
@mhkuu mhkuu deleted the 19477-yoast-markers-break-html-of-content-in-shopify-and-classic branch February 10, 2023 08:10
@mhkuu mhkuu removed this from the feature/lingo-fixes milestone Feb 10, 2023

Labels

changelog: bugfix (Needs to be included in the 'Bugfixes' category in the changelog)
Shopify (This PR impacts Shopify.)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Yoast markers break HTML of content in Shopify

4 participants