
Upgrade SETUPTOOLS_VERSION to improve memory rss#1429

Closed
goinnn wants to merge 1 commit into heroku:main from we-are-Joinup:main

Conversation

@goinnn

@goinnn goinnn commented Mar 24, 2023

We have been using Heroku since 2015. We are currently upgrading our stack, and we have upgraded heroku-buildpack-python from v184 to v229.

In our stack we have several dynos: web (with Django), worker (a celery worker), worker-slow (a celery worker for slow tasks) and beat (celery beat). This last one is very, very stable, which makes it easy to measure. The following measurements are from this dyno in our preproduction environment, which is even more stable:

  • v184: 126.45 MB
  • v200: 126.44 MB
  • v205: 126.46 MB
  • v208: 126.48 MB
  • v209: 126.23 MB

Memory increased:

  • v210: 130.83 MB
  • v212: 130.96 MB
  • v214: 130.99 MB

Memory increased again:

  • v215: 134.20 MB
  • v220: 133.92 MB
  • v229: 133.95 MB

To find the reason, we ran more tests:

  • v229 with setuptools==39.2.0: 126.45 MB
  • v229 with setuptools==59.8.0: 126.70 MB (previous version to 60.0.0)

Memory increased:

  • v229 with setuptools==60.0.0: 128.24 MB

Memory decreased:

  • v229 with setuptools==60.0.0 and env variable SETUPTOOLS_USE_DISTUTILS=stdlib: 126.45 MB
  • v229 with setuptools==61.0.0 and env variable SETUPTOOLS_USE_DISTUTILS=stdlib: 126.89 MB

Memory increased:

  • v229 with setuptools==62.3.2 and env variable SETUPTOOLS_USE_DISTUTILS=stdlib: 128.41 MB
  • v229 with setuptools==63.0.0 and env variable SETUPTOOLS_USE_DISTUTILS=stdlib: 128.73 MB
  • v229 with setuptools==63.4.3 and env variable SETUPTOOLS_USE_DISTUTILS=stdlib: 128.41 MB
  • v229 with setuptools==65.0.0 and env variable SETUPTOOLS_USE_DISTUTILS=stdlib: 128.36 MB
  • v229 with setuptools==66.0.0 and env variable SETUPTOOLS_USE_DISTUTILS=stdlib: 128.44 MB
  • v229 with setuptools==66.1.1 and env variable SETUPTOOLS_USE_DISTUTILS=stdlib: 128.43 MB

Memory decreased:

  • v229 with setuptools==67.0.0 and env variable SETUPTOOLS_USE_DISTUTILS=stdlib: 125.37 MB
  • v229 with setuptools==67.6.0 and env variable SETUPTOOLS_USE_DISTUTILS=stdlib: 125.63 MB
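
For anyone reproducing these measurements, here is a small sketch of how to confirm which distutils implementation a given process actually imported. The Heroku install paths in the assertions are hypothetical examples, not guaranteed buildpack layouts:

```python
# Setuptools' vendored distutils lives under setuptools/_distutils,
# while the stdlib implementation lives directly in the Python lib
# directory, so the imported module's file path tells them apart.

def is_local_distutils(module_path: str) -> bool:
    """True if the path points at setuptools' vendored distutils copy."""
    return "setuptools" in module_path and "_distutils" in module_path

# Hypothetical example paths for the two cases:
assert is_local_distutils(
    "/app/.heroku/python/lib/python3.10/site-packages/setuptools/_distutils/__init__.py"
)
assert not is_local_distutils(
    "/app/.heroku/python/lib/python3.10/distutils/__init__.py"
)

# In a live process you would inspect the imported module itself:
#   import distutils
#   print(is_local_distutils(distutils.__file__))
# Note that SETUPTOOLS_USE_DISTUTILS must be set before the interpreter
# starts (e.g. as a Heroku config var) for it to take effect.
```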

So, if we upgrade setuptools and use (or document) the SETUPTOOLS_USE_DISTUTILS environment variable, we can reduce our project's memory usage by 9 MB (7%). And that 9 MB is just for celery beat, a process that consumes little memory. In our stack, memory decreased by the following amounts per dyno:

  • Web decreases by about 40 MB
  • Worker decreases by about 50 MB
  • Worker-slow decreases by about 50 MB too
  • Beat decreases by about 9 MB
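
The per-dyno figures above come from platform memory metrics; for local comparisons between setuptools versions, a minimal stdlib-only sketch can print a comparable number (assuming a Unix-like system, since the resource module is unavailable on Windows):

```python
import resource
import sys

def peak_rss_mb() -> float:
    """Peak resident set size of the current process, in MB.
    ru_maxrss is reported in KB on Linux but in bytes on macOS."""
    rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if sys.platform == "darwin":
        rss /= 1024
    return rss / 1024

# Import your application here, then print the figure to compare runs
# made under different setuptools versions / env var settings:
print(f"peak RSS: {peak_rss_mb():.2f} MB")
```

Note this reports the peak RSS rather than the current value that dyno metrics sample, so it is only a rough stand-in for the measurements above.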

@goinnn goinnn requested a review from a team as a code owner March 24, 2023 12:27
@edmorley
Member

edmorley commented Apr 3, 2023

Hi! Thank you for the detailed write up.

I'm quite surprised to hear that the setuptools version is affecting your application's memory usage at runtime. In general setuptools only affects the build process itself, and not anything after that. The only thing I can think of is that setuptools does runtime patching of distutils, and if your application or any of its dependencies rely upon distutils at runtime (and not only at build time), then perhaps this is the reason? (Which would seem to be the case, given you mention that adjusting SETUPTOOLS_USE_DISTUTILS changes memory usage.)

From reading the PR description, it seems you found the memory usage improved between v66.1.1 and v67.0.0. However, reading the changes between the two, there doesn't seem to be anything that could be affecting the distutils implementation:
pypa/setuptools@v66.1.1...v67.0.0

Is it possible there is a mistake in the versions listed above (and the memory usage improvement occurred in a different version), or there is some other factor not being taken into account with the test approach?

In terms of updating the setuptools version, we will be doing so at some point soon. However, there are multiple breaking changes in newer setuptools, so we have been deliberately holding off updating for as long as possible, in order to give packages time to adapt (and for Stack Overflow answers to exist for common failure modes), to try to reduce the amount of customer app breakage caused.

@edmorley
Member

Setuptools has been updated as part of #1437 (along with an update to wheel, test assertions being updated, etc).

@edmorley edmorley closed this Apr 11, 2023
@goinnn
Author

goinnn commented Apr 24, 2023

Thanks @edmorley, and sorry for the delay.

Is it possible there is a mistake in the versions listed above (and the memory usage improvement occurred in a different version), or there is some other factor not being taken into account with the test approach?

Everything is possible, but no :-). We studied the memory of our application for several weeks. We saw a memory increase in v210, and we understand the reason: setuptools changed the default value of SETUPTOOLS_USE_DISTUTILS from stdlib to local. But in v215 we saw another memory increase, and it is true that we don't understand the reason for that one.

In terms of updating setuptools version, we will be doing so at some point soon - however, there are multiple breaking changes in newer setuptools, so have been deliberately holding off updating as long as possible, in order to give packages time to adapt (along with stack overflow answers to exist for common failure modes) - to try and reduce the amount of customer app breakage caused.

Thanks! I think a test would be good, to be able to detect a memory increase and change the setuptools dependency without fear. Don't you think? Would you like me to do a "pip freeze", or do you need something else?

@edmorley
Member

edmorley commented Apr 24, 2023

No problem about the delay! The setuptools version was updated a couple of weeks ago, which should have resolved the issue you were experiencing.

I think a test would be good to be able to detect an increase in memory and be able to change setuptools dependencies without fear.

This buildpack already has quite a few integration tests; the issue with setuptools is that it intentionally made changes that broke a small proportion of packages out in the wild. PyPI has about half a million packages - it's not viable to test all of them here. And even if we attempted to, all it would prove is what we already know from setuptools' changelog: that newer setuptools breaks some older packages.

The reason for holding off updating setuptools is not for lack of tests, but to allow time for:

  • users' local machines and CI runner images to be updated to newer setuptools, so that hopefully they see the errors in their affected packages there first, and don't just blame Heroku if we instead were to update our setuptools to a newer version than they had locally
  • newer versions of affected packages to be published, so that users who do encounter errors in their pinned packages have a workaround other than forking a package
  • Stack Overflow posts and other resources to be created and discoverable, when users search for the error messages they encountered

Regarding adding a test for memory usage, that feels like something that might be better suggested to the upstream setuptools project? It's not really viable for us to test every single implementation detail of our dependencies beyond the major functionality - at a certain point we have to trust that our dependencies have reasonable tests themselves. Also, for this specific case, I'm presuming it requires a very specific scenario to be affected - since in general setuptools should not affect anything at runtime (only build time) - plus for most apps a 9MB memory usage change would be insignificant.

@goinnn
Author

goinnn commented Apr 24, 2023

Ok, thanks for the quick answer. Just one detail about your answer: "... apps a 9MB memory usage change would be insignificant". The 9 MB is only for one of our dynos.

So, if we upgrade setuptools and use (or document) the SETUPTOOLS_USE_DISTUTILS environment variable, we can reduce our project's memory usage by 9 MB (7%). And that 9 MB is just for celery beat, a process that consumes little memory. In our stack, memory decreased by the following amounts per dyno:
Web decreases by about 40 MB
Worker decreases by about 50 MB
Worker-slow decreases by about 50 MB too
Beat decreases by about 9 MB

The decrease is 50 MB in worker and worker-slow, or, put another way, since we have 1 GB per dyno, a decrease of 5%. The measurements in this issue are from the beat dyno because it is very, very stable, which makes it very easy to measure.

But I understand your point of view.

Thanks!

@edmorley
Member

Yeah, I meant that 9MB per process would not be significant for some projects (but of course would be for others), depending on the type of concurrency used for the app and the memory usage of the rest of the application. For your app, it sounds like you are using process-based concurrency set to 5, which would be more affected than the default Heroku standard-1x dyno concurrency of 3. Also, apps that use features like gunicorn's --preload would presumably be less affected.
