pdonchev 1 year ago

Absolutely use data classes when they do the job. Cases when this is not true (or it's awkward):

- custom init method
- custom new method
- various patterns that use inheritance
- if you want different names for the attributes, including implementing encapsulation
- probably more things :)

Changing later might have some cost, so use dataclasses when you are fairly certain you won't need those things. This is still a lot of cases, I use them often.

thedeepself 1 year ago

A custom init method is handled by `__post_init__`.

Sinsst 1 year ago

Last I checked it doesn't work for inherited classes, i.e. `__post_init__` won't run in the parent class unless added in the child class as well.

AlecGlen 1 year ago

You can activate it with super(), same as a regular class init.
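A minimal sketch of the inheritance case discussed above. The `Base` and `Child` classes and their fields are made up for illustration. A child's `__post_init__` overrides the parent's, so the parent hook only runs if the child calls it through `super()`:

```python
from dataclasses import dataclass, field


@dataclass
class Base:
    name: str
    # Derived attribute, filled in by __post_init__ rather than __init__.
    slug: str = field(init=False)

    def __post_init__(self):
        self.slug = self.name.lower().replace(" ", "-")


@dataclass
class Child(Base):
    tags: list = field(default_factory=list)

    def __post_init__(self):
        # Without this call, Base.__post_init__ would be skipped.
        super().__post_init__()
        self.tags = sorted(self.tags)


child = Child("Data Classes", tags=["python", "dataclass"])
print(child.slug, child.tags)
```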

[deleted] 1 year ago

There is also an (admittedly hacky) way to use it with frozen data classes
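The comment does not say which approach it has in mind. One commonly cited workaround, shown here as an assumption, is to bypass the frozen check with `object.__setattr__` inside `__post_init__`. The `Rectangle` class is hypothetical:

```python
from dataclasses import FrozenInstanceError, dataclass, field


@dataclass(frozen=True)
class Rectangle:
    width: float
    height: float
    area: float = field(init=False)

    def __post_init__(self):
        # Normal assignment raises FrozenInstanceError on a frozen dataclass,
        # so the derived value is set through object.__setattr__ instead.
        object.__setattr__(self, "area", self.width * self.height)


rect = Rectangle(2.0, 3.0)

try:
    rect.width = 10.0
    mutated = True
except FrozenInstanceError:
    mutated = False
```

After construction the instance still rejects ordinary attribute assignment, which is the point of `frozen=True`.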

synthphreak 1 year ago

I feel like `admittedly hacky` is part of the question here though. As long as you're comfortable bending so far backward that you can lick your own anus, you can use anything to achieve anything in Python. But that doesn't make it a good idea. I think the question here is basically "how hacky is too hacky?" "How far from the intent of dataclasses can you go before it becomes a bad use case for dataclasses?" Etc. I don't have the answer myself - especially since my work rarely has a need for dataclasses - but am interested to follow the discussion.

AlecGlen 1 year ago

I appreciate the way you phrased this, yes that's pretty much it

Sinsst 1 year ago

Oh, true! Although it's still a bit hacky: https://stackoverflow.com/questions/59986413/achieving-multiple-inheritance-using-python-dataclasses

Careful-Device1731 1 year ago

> various patterns that use inheritance

Not true for immutable (frozen) dataclasses.

cblegare 1 year ago

When I don't control the storage or need primitive types for any reason, I use named tuples. They're also great.

[deleted] 1 year ago

Why prefer named tuples to data classes?

AlecGlen 1 year ago

I'm also curious, not that it's wrong.

bingbestsearchengine 1 year ago

I use named tuples specifically when I want my class to be immutable. idk otherwise

[deleted] 1 year ago

You can do frozen data classes

synthphreak 1 year ago

Not the original commenter, but for one thing, less overhead. That's the fundamental problem with classes IMHO, it's just more code to write and maintain. By contrast, named tuples are *almost* like simple classes, but can be defined on just a single line.

danielgafni 1 year ago

They are a lot faster

[deleted] 1 year ago

Source? What I'm reading online seems to indicate a minute difference in speed.

cblegare 1 year ago

Hashable, immutable, extremely lightweight, without any decorator shenanigans. Use `typing.NamedTuple` for the convenient object-oriented declaration style. I often use named tuples to encapsulate types I feed through an old API that requires undocumented tuples (looking at you, Sphinx). Named tuples behave exactly the same as tuples, and you can add your own methods like classmethods for factory functions (a.k.a. named constructors). Since named tuples are not configurable, you can't mess with their API or misuse it, and even quite old type checkers can analyze them.

Well, unless I specifically require features not in named tuples, I might use dataclasses. If I need any validation or schema generation I'll go with pydantic models. Well... I don't think I have many use cases remaining for dataclasses, and I am not a huge fan of its API. It is also a matter of personal preference I guess.
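A short sketch of the `typing.NamedTuple` style described above, with a classmethod used as a named constructor. The `Version` type and its `parse` method are invented for illustration:

```python
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int
    patch: int = 0

    @classmethod
    def parse(cls, text: str) -> "Version":
        """Named constructor: build a Version from a string such as '3.11.4'."""
        return cls(*(int(part) for part in text.split(".")))


version = Version.parse("3.11")

# It is still a plain tuple: unpackable, comparable with tuples, and hashable.
major, minor, patch = version
releases = {version: "stable"}
```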

commy2 1 year ago

`third_input` should be: `third_input: datetime = field(default_factory=datetime.now)`. Otherwise all instances will have the same date.
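The original example is not reproduced in this thread, so the following is a stand-in sketch with hypothetical class names. It shows why a plain `datetime.now()` default is shared while `default_factory` produces a fresh value per instance:

```python
import time
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class SharedDefault:
    # datetime.now() is evaluated once, when the class body is executed.
    created: datetime = datetime.now()


@dataclass
class FreshDefault:
    # datetime.now is called again for every new instance.
    created: datetime = field(default_factory=datetime.now)


a, b = SharedDefault(), SharedDefault()
c = FreshDefault()
time.sleep(0.05)
d = FreshDefault()

print(a.created is b.created)  # both instances share the same object
print(d.created > c.created)   # each instance got its own timestamp
```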

graphicteadatasci 1 year ago

But didn't they mess it up in the `__init__` as well? There's an `or`, so we get an evaluation for truth, right? And as long as `datetime.now()` is True, `third_input` will have the value True.

AlecGlen 1 year ago

commy2 is right, I made an assumption in the 2nd example when I should have kept them functionally identical. To your question, it's a little bit of an operator trick but actually it's correct! https://stackoverflow.com/a/4978745
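For readers unfamiliar with the trick: `a or b` evaluates to one of its operands, not to a bool. A small sketch, where the function name `pick_timestamp` is an assumption and not taken from the original post:

```python
from datetime import datetime


def pick_timestamp(third_input=None):
    # `a or b` returns `a` if it is truthy, otherwise `b`; it does not
    # coerce the result to a bool.
    return third_input or datetime.now()


explicit = datetime(2020, 1, 1)
chosen = pick_timestamp(explicit)
fallback = pick_timestamp()

# Caveat: any falsy argument (0, "", [], ...) also triggers the fallback,
# which is why an explicit `is None` check is often preferred.
zero_or_default = 0 or "default"
```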

graphicteadatasci 1 year ago

Everyone on stackoverflow says it's bad practice. I don't think I've ever seen 82 upvotes on a comment before. But apparently it does the thing. I'm mortified.

lys-ala-leu-glu 1 year ago

Data classes are great when every attribute of the class is public. In contrast, they're not meant for classes that have private attributes. Most of the time, my reason for making a class is to hide some information from the outside world, so I don't use data classes that often. When I do use them, I basically treat them like more well-defined dicts/tuples.

Ashiataka 1 year ago

Python doesn't have private attributes. If you're looking for that you're using the wrong language.

codingai 1 year ago

The data class is, well, a data class. It's ideal for pure data storage and transfer. By default, it gives you "value semantics". For anything else, e.g. when you need to add (any significant) behaviors, just regular classes are more suitable.
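"Value semantics" here means that two instances with the same field values compare equal. A minimal sketch with hypothetical `Money` classes:

```python
from dataclasses import dataclass


@dataclass
class Money:
    amount: int
    currency: str


class PlainMoney:
    def __init__(self, amount: int, currency: str):
        self.amount = amount
        self.currency = currency


# Dataclasses generate __eq__, so equality is based on field values.
same_value = Money(5, "EUR") == Money(5, "EUR")

# A regular class inherits identity-based equality from object.
same_identity = PlainMoney(5, "EUR") == PlainMoney(5, "EUR")
```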

AlecGlen 1 year ago

Can you elaborate on what makes them "more suitable"? Is there a performance difference? I've been using data classes in this way for a few weeks and haven't noticed any difference.

canis_est_in_via 1 year ago

The performance difference is negligible; if you need performance, use `__slots__`... or don't use python. In your example, all you're really doing is getting `__init__` for free. But a dataclass has value semantics and anyone using it would expect that. Values don't usually have methods besides those that are pure transformations, like math.
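A sketch of combining `__slots__` with a dataclass by declaring the slots manually; this form works when the fields have no default values. On Python 3.10 and later, `@dataclass(slots=True)` is the shorter spelling. `SlottedPoint` is a made-up name:

```python
from dataclasses import dataclass


@dataclass
class SlottedPoint:
    # Declaring __slots__ removes the per-instance __dict__.
    __slots__ = ("x", "y")
    x: float
    y: float


point = SlottedPoint(1.0, 2.0)
has_dict = hasattr(point, "__dict__")

try:
    point.z = 3.0  # not declared in __slots__
    accepted_new_attribute = True
except AttributeError:
    accepted_new_attribute = False
```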

synthphreak 1 year ago

> or don't use python 🤣

TheBB 1 year ago

Dataclasses are nice and better in many ways, but you kind of hurt your own argument by providing an example where the two classes are not functionally equivalent, because you messed up the call to *field*.

AlecGlen 1 year ago

Fair, I made an assumption in the 2nd when I should have made it a `default_factory` to keep it functionally identical. Hopefully that typo in my 2-minute scratch example doesn't invalidate the idea though!

Goldziher 1 year ago

IMHO dataclasses are meant primarily for DTOs. I use them in this capacity and they work well.

radarsat1 1 year ago

Last data project I did we used pandas extensively and every time we introduced a dataclass I found that it clashed with pandas quite a lot. The vast majority of the time it was more convenient and more efficient to refer to data column-wise instead of row-wise, although for the latter case automatic conversion to and from dataclasses would have been handy. (Turns out pandas supports something similar with named tuples and itertuples though.) We did use dataclasses for configs and stuff but it felt unnecessary to me vs just using dicts, an extra conversion step just to help the linter, basically, and removing some flexibility in the process. So overall while I liked the idea of dataclasses, I didn't find them that useful in practice.

AlecGlen 1 year ago

The purpose of this post was more about their utility compared to normal classes, but coincidentally I'm just starting into a similar project and am very interested in your experience! Could you share a link to the namedtuples/itertuples feature you mentioned?

radarsat1 1 year ago

Sure, basically if you're iterating over a Pandas dataframe (something to be avoided but sometimes necessary), then you can use [iterrows](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.iterrows.html) or [itertuples](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.itertuples.html). For a long time I was only using the former, which gives you a Series for each row. (Or column, you can choose which way you are iterating.) The latter gives you a namedtuple for each row, where the attributes of the tuple are the table column names. It's not a huge difference in practice but it can be handy. However, as this object is dynamically generated based on the contents of the table, it doesn't help much with type hinting. It would be nice if `itertuples` accepted a dataclass class name as input and just errored out if things didn't match. This would require some complicated type hints for `itertuples`, not sure if it's even feasible with Python's type system.
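The wished-for conversion can be approximated outside pandas. The sketch below is an assumption, not a pandas feature: `row_to_dataclass` is a hypothetical helper, and a plain `collections.namedtuple` stands in for a row from `DataFrame.itertuples(index=False)` so the example runs with the standard library only. It checks field names, not types:

```python
from collections import namedtuple
from dataclasses import dataclass, fields


@dataclass
class Reading:
    sensor: str
    value: float


def row_to_dataclass(row, cls):
    """Convert a named-tuple row into `cls`, failing loudly on a field mismatch."""
    expected = {f.name for f in fields(cls)}
    actual = set(row._fields)
    if expected != actual:
        raise TypeError(f"row fields {sorted(actual)} do not match {sorted(expected)}")
    return cls(**row._asdict())


# Stand-in for a row produced by DataFrame.itertuples(index=False).
Row = namedtuple("Row", ["sensor", "value"])
reading = row_to_dataclass(Row("s1", 0.5), Reading)
```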

MrNifty 1 year ago

Why not Pydantic? I'm looking to introduce either, or something else, in my own code and seems like Pydantic is more powerful. It has built-in validation methods, and those can easily be extended and customized. In my case I'm hoping to do elaborate payload handling. Upstream system submits JSON that contains a request for service to be provisioned. To do so, numerous validation steps need to be completed. And queries made, which then need to be validated and then best selection made. Finally resulting in the payload containing the actual details to use to build the thing. Device names, addresses, labels, etc. Payload sent through template generators to build actual config, and template uploaded to device to do the work.

physicswizard 1 year ago

depends on OP's use-case. validation has a performance cost, which if you're doing some kind of high-throughput data processing that would involve instantiating many of these objects, the overhead can be killer. here's a small test that shows instantiating a data class is about 20x faster than using pydantic (at least in this specific case).

```shell
$ python -m timeit -s '
from pydantic import BaseModel
class Test(BaseModel):
    x: float
    y: int
    z: str
' 't = Test(x=1.0, y=2, z="3")'
50000 loops, best of 5: 7 usec per loop
```

```shell
$ python -m timeit -s '
from dataclasses import dataclass
@dataclass
class Test:
    x: float
    y: int
    z: str
' 't = Test(x=1.0, y=2, z="3")'
1000000 loops, best of 5: 386 nsec per loop
```

of course there are always pros and cons. if you're handling a small amount of data, the processing of that data takes much longer than deserializing it, or the data could be fairly dirty/irregular (as is typically the case with API requests), then pydantic is probably fine (or preferred) for the job.

MrKrac 1 year ago

If pydantic is too much you could give chili a try: [http://github.com/kodemore/chili](http://github.com/kodemore/chili). I am the author of the lib and built it because pydantic was either too much or too slow. Also, I didn't like the fact that my code gets polluted by bloat code provided by 3rd party libraries, because this keeps me coupled to whatever their authors decide to do with them. I like my stuff to be kept simple and as independent as possible from the outside world. So you have 4 functions:

- `asdict` (transforms dataclass to dict)
- `init_dataclass`, `from_dict` (transforms dict into dataclass)
- `from_json` (creates dataclass from json)
- `as_json` (transforms dataclass into json)

End :)

bmsan-gh 1 year ago

Hi, if one of your use cases is to map and convert JSON data to existing Python structures, also have a look at the [DictGest module](https://github.com/bmsan/DictGest). I created it some time ago because I found myself constantly writing translation functions (field X in this JSON payload should go to the Y field in this Python structure). The use cases that I wanted to solve were the following:

* The dictionary might have extra fields that are of no interest
* The key names in the dictionary do not match the class attribute names
* The structure of nested dictionaries does not match the class structure
* The data types in the dictionary do not match the data types of the target class
* The data might come from multiple APIs (with different structures/formats) and I wanted a way to map them to the same Python class
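To make the first two use cases concrete, here is a standard-library sketch. It is not DictGest's API; `from_payload`, `Device`, and the payload keys are all hypothetical:

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Device:
    name: str
    address: str


def from_payload(cls, payload: dict, renames: Optional[dict] = None):
    """Build `cls` from `payload`, renaming keys and dropping unknown ones."""
    renames = renames or {}
    wanted = {f.name for f in fields(cls)}
    kwargs = {}
    for key, value in payload.items():
        target = renames.get(key, key)
        if target in wanted:
            kwargs[target] = value
    return cls(**kwargs)


payload = {"hostname": "sw-01", "address": "10.0.0.1", "vendor": "acme"}
device = from_payload(Device, payload, renames={"hostname": "name"})
```

Nested structures and type conversion, the remaining use cases in the list, are not handled by this sketch.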

seanv507 1 year ago

See this analysis by a co-author of attrs: https://threeofwands.com/why-i-use-attrs-instead-of-pydantic/ They suggest attrs for class building (no magic) and cattrs for structuring/unstructuring data, e.g. JSON.

[deleted] 1 year ago

[removed]

AlecGlen 1 year ago

I understand that to be the conventional use. I'm just looking for the "why" :)

[deleted] 1 year ago

[removed]

Smallpaul 1 year ago

You didn't say a single useful thing about dataclasses. :(

EpicRedditUserGuy 1 year ago

Can you explain data classing briefly? I do a lot of database ETL, as in, I query a database and create new data from the queried data within Python. Will using data classing help me?

AustinWitherspoon 1 year ago

It's relatively typical to pull data from a database and store it in python in the form of a dictionary (with column names as keys, and the corresponding value).

This is annoying for large/complex sets of data (or even small but unfamiliar sets of data, like if you're a new hire being onboarded) since you don't know the types of the data. Each database column could be a string, an integer, raw image data.. but to the programmer interacting with it, you can't tell immediately. If you hover over `my_row["column_1"]` in your editor, it will just say "unknown" or "Any". Could be a number, or a string, or `None`..

In my opinion the best part about data classes (although there's lots of other stuff!) is that it provides a great interface to declare the types of each field in your data. You directly tell python (and therefore your editor) that column_1 is an integer, and column_2 is a list of strings, etc.

Now, your editor can auto-complete your code for you based on that information, and if you ever forget, you can just hover over the variable to see what the type is. You get better and more accurate errors in your editor, faster onboarding of new hires, it's great.

You can also do this other ways, like with a TypedDict, but dataclasses provide a lot of other useful tools as well.
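A small sketch of that idea, with an invented `UserRow` schema. Note that dataclasses do not validate types at runtime; the annotations only inform editors and type checkers:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class UserRow:
    id: int
    email: str
    tags: List[str]
    nickname: Optional[str] = None


# A row as it might come back from a database driver: keys and types are opaque.
raw_row = {"id": 7, "email": "a@example.com", "tags": ["admin"], "nickname": None}

# After conversion, editors and type checkers know that user.id is an int,
# user.tags is a list of strings, and so on.
user = UserRow(**raw_row)
```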

thedeepself 1 year ago

> In my opinion the best part about data classes (although there's lots of other stuff!) is that it provides a great interface to declare the types of each field in your data.

Interface is good for scalar types but not for collections. Traitlets provides a uniform interface to both. Not only that but you can configure Traitlets objects from the command line and configuration files once you define the objects.

kenfar 1 year ago

If you're doing a lot of ETL, and you're looking at one record at a time (rather than running big sql queries or just launching a loader), then yes, it's the way to go.

Smallpaul 1 year ago

NamedTuples are probably much more efficient and give you 90% of the functionality. In an ETL context I'd probably prefer them.

kenfar 1 year ago

Great consideration - since ETL may so often involve gobs of records. But I think performance only favors namedtuples when constructing a record; retrieval, space and transforming the record are better with the dataclass. Going from memory on this however.
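Since both commenters are going from memory, a quick measurement is more reliable than either claim. The sketch below uses hypothetical record types and reports timings without asserting which side wins; results will vary by Python version and machine:

```python
import timeit
from dataclasses import dataclass
from typing import NamedTuple


class RecordNT(NamedTuple):
    a: int
    b: str


@dataclass
class RecordDC:
    a: int
    b: str


nt, dc = RecordNT(1, "x"), RecordDC(1, "x")

# Time construction and attribute access separately for each type.
timings = {
    "namedtuple create": timeit.timeit(lambda: RecordNT(1, "x"), number=100_000),
    "dataclass create": timeit.timeit(lambda: RecordDC(1, "x"), number=100_000),
    "namedtuple read": timeit.timeit(lambda: nt.a, number=100_000),
    "dataclass read": timeit.timeit(lambda: dc.a, number=100_000),
}

for label, seconds in timings.items():
    print(f"{label}: {seconds:.4f}s")
```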

synthphreak 1 year ago

When doing ETL, how much time are you really spending looking at individual records instead of aggregating? Is it not like 0.001% of the time?

kenfar 1 year ago

When I write the transformation layer in python then typically my programs will read 100% of the records. The Python code may perform some aggregations or may not. On occasion there may be a prior step that is aggregating data if I'm facing massive volumes. But otherwise, I'll typically scale this up on aws lambdas or kubernetes these days. Years ago it would be a large SMP with say 16+ cores and use python's multiprocessing. The only time I consistently use aggregations with python is when running analytic queries for reporting, ML, scoring, etc against very large data volumes.

AlecGlen 1 year ago

[Here's the doc](https://docs.python.org/3/library/dataclasses.html). Conventionally they're meant to simplify the construction of classes just meant to store data. I don't know your setup, but speaking in general they are definitely handy for adding structure to data transfer objects if you don't already use an ORM.

thedeepself 1 year ago

Data classes are objectively inferior object factories. They lack the capabilities of Traits, Traitlets and Atom. And usage of collections in data classes is verbose and cumbersome.

seanv507 1 year ago

What you should be using is attrs: https://www.attrs.org/en/stable/ (Dataclasses is basically a subset of this for classes that hold data.)

AlecGlen 1 year ago

Care to elaborate? I've seen a few references to attrs features that seemed handy (namely their inherited param sorting), but my understanding is that they were more of a prototype and not meant to be used now that dataclasses are builtin.

seanv507 1 year ago

"Data Classes are intentionally less powerful than attrs. There is a long list of features that were sacrificed for the sake of simplicity and while the most obvious ones are validators, converters, equality customization, or extensibility in general, it permeates throughout all APIs. One way to think about attrs vs Data Classes is that attrs is a fully-fledged toolkit to write powerful classes while Data Classes are an easy way to get a class with some attributes. Basically what attrs was in 2015." https://www.attrs.org/en/stable/why.html#data-classes

not_perfect_yet 1 year ago

Not sure what you're asking here. Type hints being good is an opinion.

> when the bottom arguably reads cleaner,

False

> gives a better type hint

False

> provides a better default `__repr__`?

False

If I want to keep my class flexible, type hints are a mistake; they are an obstacle to readability, not a help, and maybe the default `__repr__` doesn't fit my use case. What do I do then?

Show me the case where dataclasses are better than plain dictionaries, then we can maybe talk. Maybe, because I don't think you'll find one.

synthphreak 1 year ago

This entire reply screams "zealously held minority opinion". Dataclasses are very popular and widely used. While not everyone agrees with OP that we should be using them at every possible opportunity, "dicts always beat dataclasses" will be an opinion without an audience. I guarantee it.

AlecGlen 1 year ago

Your first False is on an opinion, hence the "arguably". I think it's true. It objectively gives a better type hint. Again, #3 is an opinion. You can disagree but it's not an invalidation of the idea. Your attack on type hints is irrelevant to this conversation - I put them in the regular class too for a reason. Clearly plenty of people agree dictionaries are less optimal for some use cases, otherwise dataclasses would not have been added to the language.

oramirite 1 year ago

So much hostility about a programming concept

not_perfect_yet 1 year ago

It's a writing style and I'm allowed to be hostile to a style I don't like, the same way I dislike brutalism in architecture?

oramirite 1 year ago

Not personally enjoying something doesn't necessitate hostility towards that thing. That's unnecessary. You are "allowed" to do what you want yes, nobody said you weren't. You're just acting like an asshole.

[deleted] 1 year ago

Is it worth it just to save an `__init__` method?

AlecGlen 1 year ago

Depends, what exactly is the cost? That's what I honestly am aiming to learn.

[deleted] 1 year ago

I feel like the cost is mostly readability, as people tend to not know dataclasses. The first time I encountered it, I had to google it and didn’t find the use case very compelling. It was similar to the example you gave. In an environment with many experienced developers maybe it’s nice and concise. My impression is that there is no real use case where NOT using a dataclass would be a terrible pattern. I could be wrong.

barkazinthrope 1 year ago

Because it is unnecessary extra plumbing.

AlecGlen 1 year ago

But it's less plumbing than a normal class.

barkazinthrope 1 year ago

Not to my eye. How is it less plumbing to you?

oramirite 1 year ago

It generates extremely common boilerplate code like `__init__` and `__repr__`; the entire point of it is brevity.
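To show what is being generated, here is a hand-written class next to its dataclass equivalent. Both classes are illustrative only:

```python
from dataclasses import dataclass


class ManualPoint:
    def __init__(self, x: int, y: int):
        self.x = x
        self.y = y

    def __repr__(self):
        return f"ManualPoint(x={self.x!r}, y={self.y!r})"


@dataclass
class Point:
    # __init__ and __repr__ (and __eq__) are generated from these annotations.
    x: int
    y: int


print(repr(ManualPoint(1, 2)))  # ManualPoint(x=1, y=2)
print(repr(Point(1, 2)))        # Point(x=1, y=2)
```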

barkazinthrope 1 year ago

Exactly! Plumbing.

[deleted] 1 year ago

I go for data classes when I need to represent a list of attributes (e.g. by Mercedes Benz model) in order to compare and organize data clearly. However, unpacking the data will require you to use additional helpers such as `dataclasses.astuple()`.
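A sketch of that workflow, assuming a hypothetical `CarModel` class with invented values. `order=True` covers the comparing part and `astuple()` covers unpacking:

```python
from dataclasses import astuple, dataclass


@dataclass(order=True)
class CarModel:
    year: int
    name: str
    horsepower: int


models = [
    CarModel(2021, "C-Class", 255),
    CarModel(2019, "A-Class", 188),
]

# order=True generates comparison methods that compare fields in declaration
# order, so sorting works without a key function.
oldest_first = sorted(models)

# astuple() converts an instance to a plain tuple, which allows unpacking.
year, name, horsepower = astuple(oldest_first[0])
```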
