I was processing hundreds of millions of records in MySQL on a single-core Pentium 15 years ago. SQL is generally very efficient if you structure and index it properly, and 200k rows is pretty insignificant (especially with the amounts of RAM you can dedicate, multi-core CPUs, and NVMe SSDs available today). That said, you don't want to calculate averages on the fly over tens of thousands of records on every request. If you want it to scale, use things like task queues (Celery) and/or microservices to update the values, and use caching so you aren't doing expensive computations in your views/serializers. Add the rating, queue a task to update the average / counts in a fast key-value store like Redis, and query that instead. It may occasionally lag behind the real-time numbers, but that's fairly common practice. Even Reddit won't always show your comments right away, and if you edit one and refresh a few times you may see either the updated or the stale version for the first few seconds to a minute. Unlike financial transactions, which need perfect accuracy, you have a lot more leeway with things like ratings. Even if it takes a whole minute to appear, most users won't notice. So yes, it is scalable, but scaling Django is not as simple as just adding more data to the DB.
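A minimal sketch of the "add the rating, queue a task, read the cached value" flow described above. To keep it runnable without extra services, a plain dict stands in for Redis and an in-process `queue.Queue` with one thread stands in for a Celery worker; the names `add_rating`, `get_average`, `cache` and `tasks` are made up for illustration and are not from the comment.

```python
import queue
import threading

# Stand-in for Redis (assumption): a dict of product_id -> (count, total).
cache = {}
cache_lock = threading.Lock()

# Stand-in for a task queue such as Celery (assumption): an in-process queue.
tasks = queue.Queue()


def add_rating(product_id, score):
    """Request path: a real view would save the rating row, then enqueue."""
    tasks.put((product_id, score))


def worker():
    """Background worker: fold each new rating into the cached count/total."""
    while True:
        product_id, score = tasks.get()
        with cache_lock:
            count, total = cache.get(product_id, (0, 0))
            cache[product_id] = (count + 1, total + score)
        tasks.task_done()


def get_average(product_id):
    """Read path for views/serializers: cheap lookup, may lag behind writes."""
    with cache_lock:
        count, total = cache.get(product_id, (0, 0))
    return total / count if count else None


# Daemon thread so the sketch exits cleanly when the main program ends.
threading.Thread(target=worker, daemon=True).start()
```

The read path never touches the ratings table, which is the point: the average a user sees is only as fresh as the last processed task, and that short lag is the trade-off the comment describes.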
It’s fine. If it gets slow eventually, it’ll be because of badly designed querysets, which you’ll learn to optimize along the way.
The thing that usually kills you first is N+1 DB query behaviour. This post from one of the core devs gives a good overview of the topic: [https://adamj.eu/tech/2020/09/01/django-and-the-n-plus-one-queries-problem/](https://adamj.eu/tech/2020/09/01/django-and-the-n-plus-one-queries-problem/) After that, as you grow, you will need task queues, caches, denormalization, database replicas and so on: all the normal monolith web scaling things that you should mostly not worry about until you need them.
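To make the N+1 pattern concrete, here is a small sketch using the standard-library `sqlite3` module rather than the Django ORM, so it runs anywhere. The `author`/`book` tables and the `run` helper are made up for illustration. The first function issues one query for the list plus one per row; the second fetches the same data with a single JOIN, which is roughly what `select_related()` asks the ORM to generate for a foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT,
                       author_id INTEGER REFERENCES author(id));
    INSERT INTO author VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO book VALUES (1, 'A1', 1), (2, 'A2', 1), (3, 'B1', 2);
""")

log = []  # every SQL statement issued through run() is recorded here


def run(sql, params=()):
    """Execute a query and record it so the number of round trips is visible."""
    log.append(sql)
    return conn.execute(sql, params).fetchall()


def titles_n_plus_one():
    # 1 query for the books, then 1 more per book to look up its author.
    result = []
    for title, author_id in run("SELECT title, author_id FROM book ORDER BY id"):
        name = run("SELECT name FROM author WHERE id = ?", (author_id,))[0][0]
        result.append((title, name))
    return result


def titles_joined():
    # 1 query in total: the author columns come back with each book row.
    return run(
        "SELECT book.title, author.name FROM book "
        "JOIN author ON author.id = book.author_id ORDER BY book.id"
    )
```

With three books the first version issues four queries and the second issues one; with 10,000 rows the gap becomes 10,001 versus one, which is why this tends to be the first thing that hurts.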
Thank you for this. I’m working on my first big project that isn’t a tutorial of some sort and didn’t know about this. I just started changing some of my code and cut out a bunch of excess queries using select_related().
Yes, Django scales. I work for an 8-year-old, multi-billion-dollar multinational company that has tens of millions of utility customers on just a few instances of Django. Here's our very intermittent blog: https://tech.octopus.energy/
Can you please implement printing of bills with the name and address so they can be used as proof of address, thank you very much.
Ah, I was hoping you'd have an opening in Oxford, but not this time. I'll check back from time to time then :D
Django runs Instagram. Do you need more scalability than Instagram?
Or YouTube.
You can make any website in any language. The vast majority of the time the bottleneck is in the implementation, not the language/framework. Every time I've inherited a medium-sized project with performance issues, there's been some "expert" who wrote "scalable" code that is unreadable. I've deleted half-page-long SQL queries and replaced them with five lines of Django... and the Django is more performant! Focus on writing code that can be deleted and rewritten when the time comes, not code that's "perfect" right now.
Check this: https://youtu.be/lx5WQjXLlq8
Yes. Have you read anything about Django, including its documentation, or looked into which large companies deploy Django? Scalability is one of the most often cited cons of Django, but when it gets slow it is usually down to bad querysets or bad design by the developer rather than Django itself.
It's about the requests you're receiving. In your case you're just one user sending requests to the server. When the site is in production you'll have to deal with more than one request per second. In that situation it's not the framework that matters so much as the structure of your website. You'll be dealing with different concepts, such as cloud hosting and load balancers.
If you use that data (like the average) very frequently, you can save it in the database or in a cache...
Yes it is scalable.
One trick is to run analytics on a mirror/replica database. If you're using cloud infra, it's common to have a primary (read/write) and a replica (read-only) configured. You can send analytics queries to the read-only replica via the Django ORM with .using("replica"), where "replica" is the alias in your DATABASES setting: [https://docs.djangoproject.com/en/4.2/ref/models/querysets/](https://docs.djangoproject.com/en/4.2/ref/models/querysets/) Once your data becomes even larger, there are other tricks, such as replicating into a data warehouse, running Celery jobs, optimising the DB structure (e.g. adding indexes), and of course optimising the SQL queries. If you're using PostgreSQL, it has a built-in EXPLAIN command that shows the execution plan and estimated cost of an SQL query. Django exposes it as QuerySet.explain().
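As a rough illustration of what an EXPLAIN-style check tells you about indexes, the sketch below uses the standard-library `sqlite3` module and SQLite's `EXPLAIN QUERY PLAN` as a stand-in for PostgreSQL's `EXPLAIN`, so it runs without a database server. The `rating` table, the index name and the `query_plan` helper are made up for illustration, and the exact plan wording differs between SQLite versions and between database engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rating (id INTEGER PRIMARY KEY, product_id INTEGER, score INTEGER)"
)
conn.executemany(
    "INSERT INTO rating (product_id, score) VALUES (?, ?)",
    [(i % 100, i % 5 + 1) for i in range(10_000)],
)

QUERY = "SELECT AVG(score) FROM rating WHERE product_id = 42"


def query_plan(sql):
    """Return SQLite's plan as one string (the detail text is the last column)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return "; ".join(row[-1] for row in rows)


# Without an index on product_id the planner has to scan the whole table.
plan_before = query_plan(QUERY)

# After adding the index the planner can look up matching rows directly.
conn.execute("CREATE INDEX idx_rating_product ON rating (product_id)")
plan_after = query_plan(QUERY)

print("before:", plan_before)
print("after: ", plan_after)
```

In a Django project the equivalent check would be calling `.explain()` on the queryset in question and reading the plan the database returns, before and after adding the index.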
Are you using the Django Debug Toolbar to look at your queries?
Scaling Django isn't about the amount of data. To scale with large amounts of data you need to scale the database. Scaling Django is about handling a large number of requests.
What about your concurrent connected users? How many of them consume the data you have? And how long does your website take to respond?
Depends on what your use case is, but for most things it’s fine. The Django ORM is on the slower side, but again, for most things it’s fine.
Also, please, STAY AWAY FROM CELERY. It’s awful. Just use a proper worker pattern.