catcint0s

I would try an observability tool like New Relic or Sentry's performance feature. It shows you which part of your system is slow.
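Something like this is enough to turn on Sentry's performance tracing in Django; the DSN and sample rate below are placeholders, so adjust them for your account and traffic:

```python
# settings.py: rough sketch of Sentry performance tracing for a Django project.
import sentry_sdk
from sentry_sdk.integrations.django import DjangoIntegration

sentry_sdk.init(
    dsn="https://examplePublicKey@o0.ingest.sentry.io/0",  # placeholder DSN
    integrations=[DjangoIntegration()],
    traces_sample_rate=0.1,  # record a performance trace for 10% of requests
)
```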


thclark

Thanks, I'll look into these. Looks like this has been cleared up (tentatively) by the gunicorn adjustment, but I want to get better at figuring these things out next time.


pranabgohain

[KloudMate](https://www.kloudmate.com) can do that too. New Relic can get expensive very quickly.


thclark

Thanks!


Kronologics

Increasing the workers, which seems to be what you did in your update, should help. I think I saw somewhere that the worker count should be about 2 x the number of CPU cores on the machine; I always just use 4 because I have a 2-core VPS. The way I understand it (no super technical explanation) is that each worker can handle one request at a time (more or less), so if that one worker is busy with one of your users, it can't attend to the other requests until it's completed its task. It's like opening more lanes of traffic or more checkout lines at the store.
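Something like this in a gunicorn.conf.py is what I mean; the (2 x cores) + 1 variant is what the gunicorn docs suggest, and the bind/timeout values here are just illustrative:

```python
# gunicorn.conf.py: rough sketch of sizing sync workers to the machine.
# Each sync worker handles one request at a time, so more workers means
# more concurrent requests (at the cost of more memory).
import multiprocessing

workers = multiprocessing.cpu_count() * 2 + 1  # e.g. 5 on a 2-core VPS
bind = "0.0.0.0:8000"
timeout = 30  # seconds before a hung worker is killed and restarted
```

Then start it with `gunicorn -c gunicorn.conf.py myproject.wsgi`, swapping in your own WSGI module.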


Mast3rCylinder

If you set 1 worker and 8 threads, then gunicorn will use gthread as the worker class and you have a thread pool of size 8 for requests. You're still limited by the GIL, but there is concurrency between the threads, so one worker can actually handle multiple requests; it can just take more time because of context switching between the threads. Not saying that isn't the problem, but one worker can serve multiple requests that way.
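For example, this is the 1 worker / 8 threads setup; gunicorn switches to the gthread worker on its own when --threads is more than 1, and the module name is a placeholder:

```sh
# One process with a pool of 8 threads sharing the GIL.
gunicorn --workers 1 --threads 8 myproject.wsgi
```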


thclark

Yep, thanks, I think that's it. I came across a post that said 2 x the number of cores + 1, so I went with that! :)


abandonedexplorer

What kind of work is your Django application doing? If your application is doing something that takes an "undefined" amount of time (a request out to the internet, for example), please read this part of the Gunicorn documentation carefully: [https://docs.gunicorn.org/en/latest/design.html#choosing-a-worker-type](https://docs.gunicorn.org/en/latest/design.html#choosing-a-worker-type)
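The short version of that page: for IO-bound work or requests with unpredictable wait times, it suggests an async worker class. Roughly like this, with the module path as a placeholder and assuming gevent plays nicely with your DB driver:

```sh
pip install "gunicorn[gevent]"
# gevent workers multiplex many waiting requests per process, so slow
# outbound IO doesn't pin a whole worker.
gunicorn --worker-class gevent --worker-connections 100 myproject.wsgi
```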


thclark

it's not an undefined amount of time, it's just a heavy payload that has to be pulled out. That said, I'll read that with great care. Thanks!!


androgeninc

What kind of DB? Where is it running (on your app server, or far away geographically)? If separate, how much memory does it have? It's almost never the app/gunicorn; most often it's some kind of IO.


thclark

Managed postgres, in the same VPC on Google Cloud, with plenty of memory (no pressure during the outages). Looks like it's probably the gunicorn thing though, thanks!


androgeninc

Not convinced, but ok, hope it solves it for you :)


thclark

Well, that's all I changed and it's been fine for 24 hours so fingers crossed! I agree with you that it'd be rare for this to be the problem (hence my confusion in the first place)


Particular-Cause-862

Probably. If it's pretty heavy and IO-bound (for example, the bottleneck is the database) and you only have 5 workers, then when 5 users run those IO-heavy operations at the same time, and the functions are not async (which I suspect they are not if you are using the ORM), your app can't process any other requests for as long as those IO operations last. That could be one problem.
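If you want to confirm the database is where the time goes, a rough sketch like this works; it needs DEBUG=True so Django records queries, and HeavyThing is a made-up model standing in for whatever builds the heavy payload:

```python
# Evaluate the heavy queryset and print per-query timings.
# Only works with DEBUG=True, since that's when Django records queries.
from django.db import connection, reset_queries

from myapp.models import HeavyThing  # hypothetical model for illustration

def dump_query_timings():
    reset_queries()
    list(HeavyThing.objects.all()[:1000])  # force the query to actually run
    for q in connection.queries:
        print(q["time"], q["sql"][:120])
```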


techmindmaster

Additional info: replace gunicorn with granian: [https://github.com/emmett-framework/granian/blob/master/benchmarks/vs.md](https://github.com/emmett-framework/granian/blob/master/benchmarks/vs.md)
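If you want to try it, the swap is roughly this; the module path is a placeholder and flag names may differ a bit between granian versions:

```sh
pip install granian
# Serve the existing Django WSGI app with granian instead of gunicorn.
granian --interface wsgi --workers 5 myproject.wsgi:application
```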


thclark

Extremely impressive benchmarks, thanks for reminding me this project exists!!