Skip to content

Re-instate background indexing via WP Cron#20005

Merged
thijsoo merged 35 commits intofeature/cron-background-indexationfrom
19950-revisit-background-indexing-with-wp-cron
Apr 21, 2023
Merged

Re-instate background indexing via WP Cron#20005
thijsoo merged 35 commits intofeature/cron-background-indexationfrom
19950-revisit-background-indexing-with-wp-cron

Conversation

@leonidasmi
Copy link
Copy Markdown
Contributor

@leonidasmi leonidasmi commented Mar 14, 2023

Context

Summary

This PR can be summarized in the following changelog entry:

  • Sets up background indexation via WP Cron
  • Fixes a bug where indexables were created when using the wp yoast index WP CLI command on a staging site

Relevant technical choices:

  • Prominent words need frontend libraries in order to be indexed, so the cron indexation doesn't cover those
  • We don't deal with notifications at all, because these look to be related to the manual SEO Optimization.
    • Even with the production version, after a user runs and completes the SEOO, the relevant notification will be cleared only for him. For any other user, that notification number (in the admin bar and the menu item) will stay until the user visits the notification center.
    • A task has been created to cover this in the future.
  • Test coverage for the background indexing class is as close to 100% as possible, not covering register_shutdown_function. See the old PR for details around that: https://github.com/Yoast/wordpress-seo/pull/18050/files

Test instructions

Test instructions for the acceptance test before the PR gets merged

This PR can be acceptance tested by following these steps:

For all tests:

  • have the following snippet in functions.php to ensure we're not running any admin requests in the background (because that could mess up with our conclusions)
add_action( 'init', 'stop_heartbeat', 1 );
function stop_heartbeat() {
	wp_deregister_script('heartbeat');
}
  • make sure you dont have any other plugins enabled (because we dont have any potential background admin requests to run), other than Yoast SEO (or any other Yoast plugins) and Query Monitor

Note:
When selected pages is mentioned in the instructions, we mean:

  • The WP's Dashboard->Home page
  • The WP's Dashboard->Updates page
  • The WP's Plugins->Installed plugins page
  • The WP's Settings->Permalinks page
  • All of Yoast settings pages

For the main feature:

  • Yoast SEO should not be active when installing/checking out this RC/branch.
  • Also, in case you tested this in the past, delete any remaining indexation crons via WP CLI wp cron event delete "wpseo_indexable_index_batch"
  • Use a fresh WP install or truncate the tables and transients via queries:
TRUNCATE wp_yoast_indexable;
TRUNCATE wp_yoast_indexable_hierarchy;
TRUNCATE wp_yoast_primary_term;
TRUNCATE wp_yoast_seo_links;
TRUNCATE wp_yoast_prominent_words;
DELETE FROM wp_options WHERE option_name LIKE ('%\_transient\_%');
  • Make sure your site has at least 100 posts (wp post generate --count=100 --post_author=admin with WP CLI). Once the posts are created, add some internal links in them. A couple of them should have at least some content (for testing prominent words, when Premium is enabled, in the second run of the tests)
  • Activate the plugin after you've generated some content and visit one of the selected pages.
  • Using WP CLI, run wp cron event list and confirm that you see the wpseo_indexable_index_batch scheduled action there, that's to run in 1 hour.
  • Using WP CLI, run wp cron event run "wpseo_indexable_index_batch".
  • This should yield output similar to this:
    Executed the cron event 'wpseo_indexable_index_batch' in 0.072s.
    Success: Executed a total of 1 cron event.
    
  • Check in your database; you should see at least 15 more indexables now.
  • Run wp cron event list and confirm that you see the wpseo_indexable_index_batch scheduled action there again, that's to run in 15mins now.
  • Add the following line to your wp-config.php or functions.php: add_filter( 'Yoast\WP\SEO\enable_cron_indexing', '__return_false' );
  • Repeat the WP CLI step.
  • See that there are no added indexables.
  • Repeat the WP CLI step.
  • See that you get a different response in the terminal: Error: Invalid cron event 'wpseo_indexable_index_batch'
  • Remove the line of code and visit one of the selected pages on your site.
  • Keep repeating the process of running wp cron event run "wpseo_indexable_index_batch" and checking the database until you get a different response: Error: Invalid cron event 'wpseo_indexable_index_batch'
  • Check that the indexable table is complete and that all your posts/pages/terms/system-pages/homepage are there.
  • Disable Yoast SEO again
  • Disable WP-Cron by adding the following line to your wp-config.php or functions.php define( 'DISABLE_WP_CRON', true );.
  • Generate another 10 posts
  • Delete transients - DELETE FROM wp_options WHERE option_name LIKE ('%\_transient\_%');
  • Enable Yoast SEO
  • Visit any of visit one of the selected pages.
  • Check that the indexable table is complete and that all your posts/pages/terms/system-pages/homepage are there. If Premium is not enabled, the SEOO button should not be visible in Yoast SEO->Tools
  • Repeat the whole test with Premium enabled.

For re-enabling the cron but only upon admin page load, not on frontend load:

  • In a fresh site (or in a site with indexables/options purged)
  • Using WP CLI, run wp cron event run "wpseo_indexable_index_batch" until you get Error: Invalid cron event 'wpseo_indexable_index_batch'. This means that indexables are all created and the cron is now unscheduled
  • Truncate the yoast tables (aka, delete all rows there) NOT via the test helper but via a db tool and delete transients with this query:
TRUNCATE wp_yoast_indexable;
TRUNCATE wp_yoast_indexable_hierarchy;
TRUNCATE wp_yoast_primary_term;
TRUNCATE wp_yoast_seo_links;
TRUNCATE wp_yoast_prominent_words;
DELETE FROM wp_options WHERE option_name LIKE ('%\_transient\_%');
  • In WP CLI, run wp cron event list and confirm that the wpseo_indexable_index_batch is NOT in that list
  • Load a frontend page and again run wp cron event list and confirm that the wpseo_indexable_index_batch is NOT in that list (this means that we're not performing heavy queries on the frontend)
  • Load one of the selected pages and again run wp cron event list but now confirm that the wpseo_indexable_index_batch IS in that list
  • Run wp cron event run "wpseo_indexable_index_batch" until you get Error: Invalid cron event 'wpseo_indexable_index_batch' and repeat this test with Premium enabled and confirm the same - Note that the cron indexation will finish but the SEO optimization button will still be enabled because prominent words are not able to be indexed via background indexing
  • Repeat the test but this time with add_filter( 'Yoast\WP\SEO\enable_cron_indexing', '__return_false' ); added. This time the wpseo_indexable_index_batch will never be created and there's no need to check the last step (the one that begins with running wp cron event run "wpseo_indexable_index_batch")

For re-enabling the cron indexing when indexable versions are increased:

  • In a fresh site (or in a site with indexables/options purged)
  • Using WP CLI, run wp cron event run "wpseo_indexable_index_batch" until you get Error: Invalid cron event 'wpseo_indexable_index_batch'. This means that indexables are all created and the cron is now unscheduled
  • In WP CLI, run wp cron event list and confirm that the wpseo_indexable_index_batch is NOT in that list
  • In the Indexable_Builder_Versions class (in the src\values\indexables\indexable-builder-versions.php file) increase the version of post indexables in the indexable_builder_versions_by_type, aka. 'post' => 3,
  • Delete transients
DELETE FROM wp_options WHERE option_name LIKE ('%\_transient\_%');
  • Refresh one of the selected pages, run wp cron event list and confirm that the wpseo_indexable_index_batch IS in that list.
  • Run wp cron event run "wpseo_indexable_index_batch" until you get Error: Invalid cron event 'wpseo_indexable_index_batch'.
  • Once you save, go to the db's indexable table and confirm that the rows that have post as their object_type have 3 as their version
  • This means that cron indexation will automatically run when a plugin upgrade brings increases in indexable versions
  • (Make sure to clean up indexables and option before continuing testing in that site)

For further checking that heavy queries will NEVER run in the frontend:

  • In a fresh site (or in a site with indexables/options purged), first add 30 posts
  • Delete the indexation cron via WP CLI: wp cron event delete "wpseo_indexable_index_batch" and delete transients: DELETE FROM wp_options WHERE option_name LIKE ('%\_transient\_%');
  • Load the frontend of the site and via Query monitor confirm that you don't get a query with the Yoast\WP\SEO\Actions\Indexing\Abstract_Indexing_Action caller.
  • You can also delete transients and refresh and you still shouldnt get it.
  • Repeat the test with Premium enabled.

For the staging sites rule of not indexing there:

  • Add define( 'WP_ENVIRONMENT_TYPE', 'staging' ); to your wp-config.php (or remove the yoast_seo_development_mode if you're on Local WP)
  • Run wp yoast index.
  • You should get a notice stating that nothing will be indexed because of the staging env your site is set to.
  • Check that no indexables were created in the database
  • Run wp cron event run wpseo_indexable_index_batch
  • Check that no indexables were created in the database
  • Remove the WP_ENV... line or change the value to production. Refresh one of the selected pages.
  • Run wp cron event run wpseo_indexable_index_batch
  • Check that 15 indexables were created
  • Run wp yoast index
  • Check that all indexables were created.
  • Now, set the site as staging and then clean up indexables via the test helper.
  • Set the site as production, load the homepage and check the db to see you have created the indexable for the homepage.
  • Now set the site as staging.
  • Run wp cron event run wpseo_indexable_index_batch, confirm that you got Executed the cron event 'wpseo_indexable_index_batch' response but also confirm that no new indexables were created in the db.
  • Now cleanup indexables, set the site as staging, disable cron (DISABLE_WP_CRON is set to true) and add this filter (so that we set a really large number for the shutdown indexing, so we make sure it is supposed to run on every page load):
add_filter( 'wpseo_shutdown_indexation_limit', 'custom_limit', 1 );
function custom_limit() {
  return 1000000;
}
  • Refresh one of the selected pages and the check the db to confirm that there were no indexables created.
  • Set the site as production, refresh the page and confirm that now indexables did get created.

Relevant test scenarios

  • Changes should be tested with the browser console open
  • Changes should be tested on different posts/pages/taxonomies/custom post types/custom taxonomies
  • Changes should be tested on different editors (Block/Classic/Elementor/other)
  • Changes should be tested on different browsers
  • Changes should be tested on multisite

Test instructions for QA when the code is in the RC

  • QA should use the same steps as above.

QA can test this PR by following these steps:

  • Test the above, but also an extra test below:

For testing the actual cron indexation without WP CLI:

  • Have a fresh install of WordPress without Yoast SEO
  • Add the following line to your wp-config.php or functions.php: add_filter( 'Yoast\WP\SEO\enable_cron_indexing', '__return_false' );
  • Create 100 posts for your site (wp post generate --count=45 --post_author=admin with WP CLI). Once the posts are created, add a couple of internal links in them.
  • Activate the plugin after you've generated some content and visit one of the selected pages.
  • After a bit over 60 minutes, visit the homepage of your site
  • In the database, check that there are fewer than 15 records in the wp_yoast_indexables table. It's likely just the home-page
  • Remove the line from the wp-config that you added in the second step.
  • Visit the homepage of you site. And after a bit over 60 minutes, visit the homepage of your site again
  • In the database, check that there are 15 more records in the wp_yoast_indexables table.
  • Add the following filters in your functions.php file:
add_filter( 'wpseo_cron_indexing_limit_size', 'custom_cron_indexing_limit_size', 10 );
function custom_cron_indexing_limit_size() {
    return 1000;
}

add_filter( 'wpseo_cron_link_indexing_limit_size', 'custom_cron_link_indexing_limit_size', 10 );
function custom_cron_link_indexing_limit_size() {
    return 1000;
}
  • After a bit over 15 minutes, visit the homepage of your site again.
  • The indexables should now be completely built (unless you have Premium, because prominent words aren't built via the cron indexation). Confirm that in the Yoast SEO->Tools page.

Impact check

This PR affects the following parts of the plugin, which may require extra testing:

  • The SEO optimization

Also:

  • Installing Yoast SEO for the first time on small sites with fewer than 25 posts+authors+terms in total. This used to index the site in one go during the first wp-admin request that hits the plugin's admin pages. This is still the case when DISABLE_WP_CRON is set to true. If that is not the case, the site should be indexed gradually; 15 posts every 15 minutes (granted the site gets visitors every 15 minutes).
    • To regression test the background indexing when define( 'DISABLE_WP_CRON', true ); add this filter (so that we set a really large number for the shutdown indexing, so we make sure it is supposed to run on every page load):
add_filter( 'wpseo_shutdown_indexation_limit', 'custom_limit', 1 );
function custom_limit() {
  return 1000000;
}
  • Clean out indexables and once the test helper page refreshes, check the db and confirm that no indexables got created.
  • Go to Dashboard->Home, check the db and confirm that some indexables got created.
  • Go to Plugins->Install plugins, check the db and confirm that some indexables got created.
  • Go to any plugin's (other than Yoast's) settings pages, check the db and confirm that no indexables got created.
  • Go to any of Yoast's settings pages, check the db and confirm that some indexables got created.
  • Repeat the test but without define( 'DISABLE_WP_CRON', true ); and confirm that the background indexation never run (aka no indexables were created when visiting the Dashboard->Home, Plugins->Install plugins, Yoast's settings pages. But make sure you check that in under 15mins, because the cron indexation might create the indexables and mess up with our tests)
  • Also, remove the wpseo_shutdown_indexation_limit filter above and regression test this PR.
  • Also regression test the notifications about the SEO optimization

UI changes

  • This PR changes the UI in the plugin. I have added the 'UI change' label to this PR.

Other environments

  • This PR also affects Shopify. I have added a changelog entry starting with [shopify-seo], added test instructions for Shopify and attached the Shopify label to this PR.

Documentation

  • I have written documentation for this change.

Quality assurance

  • I have tested this code to the best of my abilities.
  • During testing, I had activated all plugins Yoast SEO provides integrations for.
  • I have added unit tests to verify the code works as intended.
  • If any part of the code is behind a feature flag, my test instructions also cover cases where the feature flag is switched off.
  • I have written this PR in accordance with my team's definition of done.

Innovation

  • No innovation project is applicable for this PR.
  • This PR falls under an innovation project. I have attached the innovation label and noted the work hours.

Fixes #

diedexx and others added 5 commits February 10, 2022 10:47
The previous background indexing hook had "\" characters in its handle.
This may cause issues with automations that use bash (in combination with WP-CLI)
for instance, which sees that as a escaping character.
@leonidasmi leonidasmi linked an issue Mar 15, 2023 that may be closed by this pull request
@leonidasmi leonidasmi added the changelog: enhancement Needs to be included in the 'Enhancements' category in the changelog label Apr 12, 2023
@leonidasmi leonidasmi changed the base branch from trunk to feature/cron-background-indexation April 12, 2023 13:32
@leonidasmi leonidasmi marked this pull request as ready for review April 13, 2023 08:44
Copy link
Copy Markdown
Contributor

@thijsoo thijsoo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CR 🚧 Did not do the ACC yet. Might be nice to run a quick code coverage report on the unit tests you touched because a couple of methods in those files are not covered yet and I think that is a quick win to add those :)

Comment thread src/helpers/indexing-helper.php
Comment thread src/helpers/indexing-helper.php Outdated
Comment thread src/integrations/admin/background-indexing-integration.php
Comment thread src/integrations/admin/background-indexing-integration.php Outdated
Comment thread src/integrations/admin/background-indexing-integration.php Outdated
Comment thread src/integrations/admin/background-indexing-integration.php Outdated
Comment thread tests/unit/commands/index-command-test.php
@@ -127,9 +168,7 @@ protected function set_up() {
public function test_get_conditionals() {
static::assertEquals(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am confused why this is suddenly a static?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR just adds an additional test following the pattern of the existing ones. I'm not sure if it's rightfully a static, but maybe it's out of the scope to investigate this here?

Indexing_Helper::class,
$this->getPropertyValue( $this->instance, 'indexing_helper' )
);
static::assertInstanceOf(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above? Maybe check the entire file.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above

*/
public function test_register_hooks() {
Monkey\Actions\expectAdded( 'admin_init' );
Monkey\Actions\expectAdded( 'wpseo_indexable_index_batch' );
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Normally I think we always use has( instead of expectAdded, not sure if there is a reason you chose to use this except that there was already a use here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's something that I moved from the original PR, but I think it is because we already have been using expectAdded here, so for uniformity reasons?

@thijsoo thijsoo merged commit 0615d63 into feature/cron-background-indexation Apr 21, 2023
@thijsoo thijsoo deleted the 19950-revisit-background-indexing-with-wp-cron branch April 21, 2023 12:58
leonidasmi added a commit that referenced this pull request May 8, 2023
…-indexing-with-wp-cron"

This reverts commit 0615d63, reversing
changes made to 95bf91e.
@leonidasmi leonidasmi added changelog: reverted Needs to be excluded from all categories in the changelog, should be used for reverted PR and removed changelog: enhancement Needs to be included in the 'Enhancements' category in the changelog labels May 9, 2023
@leonidasmi leonidasmi mentioned this pull request May 10, 2023
15 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

changelog: reverted Needs to be excluded from all categories in the changelog, should be used for reverted PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Revisit background indexing with WP Cron

4 participants