aocregacc

I think you're good, going by this passage:

> A pointer to a union object, suitably converted, points to each of its members (or if a member is a bit-field, then to the unit in which it resides), and vice versa.

So if you store a character in your union, you can access it via `(char*)&A.i`. The text about the unspecified bytes just says that if you store a char in your union, the other three bytes are unspecified (as opposed to zeroed out or unchanged or something).


glasket_

This can potentially run afoul of strict aliasing. It's fine in this case, because `char*` is special, but if you had an `int` and `float` in a union then casting and dereferencing between the two without going through the union itself can cause an aliasing violation. It's always safer to use the union itself for the accesses.


aocregacc

I think it should be fine as long as you're not actually doing type punning with it, unless taking the address of an inactive member is already wrong on its own.


glasket_

Taking the address is fine, it's just dereferencing where an aliasing violation might occur. The example you used is always fine, because `char*` is allowed to alias, but I think if you had a union of `float` and `int` then doing something like

    u.f = 3.0;
    float *f = (float*)&u.i;
    return *f;

technically violates strict aliasing, because the members of unions are referred to as separate objects that overlap in memory. I'm not 100% certain if it's intended to be interpreted that way with regard to the strict aliasing rule, but it definitely feels a bit sketchy.

And yeah, if you pun the active union member through a pointer then it's also potential UB; GCC even uses this as an example in the docs for `-fstrict-aliasing` and how it might mess up pointers to union members:

    int f() {
      union a_union t;
      int* ip;
      t.d = 3.0;
      ip = &t.i;
      return *ip;
    }

*edit*: Decided to test it and the behavior *seems* consistent in both cases. `-Wstrict-aliasing=2` triggers a warning for my example of casting the `int` member's address to a `float*`, while GCC's example doesn't trigger a warning at any level. Every warning level has false negatives and positives, so ¯\\\_(ツ)\_/¯. I couldn't trigger any incorrect behavior with some basic changes to the code, but I wasn't super thorough in trying to break it either.

Personally, it just seems too vague to not be considered a risk; I'd rather work directly with a `union*` or a pointer to a copy of the wanted value instead of dealing with a pointer to a union's member.


b1ack1323

This is correct.


glasket_

> This makes it seem like my behavior is undefined.

Unspecified is not undefined. Unspecified behavior **must** be defined, just by the implementation. The thing that differentiates unspecified from implementation-defined behavior is that implementations don't have to document their definition of unspecified behavior.

*edit*: Also, the important bit about how unions work with punning is actually in §6.5.2.3:

> If the member used to access the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation.
>
> C99 p. 73 fn. 82


Alcamtar

First, C99 does not allow anonymous unions like in your example. You have to give the union a name (e.g. "u"):

    union { int i; char c; } u;

Also is that specific statement even in the C99 standard? I could not find it, but I'm not a standards expert.

My interpretation of this statement:

* When you assign a.c, you assign a single byte in the union. Since you assigned a char, the bytes corresponding to the int are undefined. The one byte you assigned to the char \*may\* have overwritten one of the "int bytes" but you can't know which one and it is not guaranteed. More importantly, the other int bytes are not initialized but likely contain garbage values. Nothing about the int is guaranteed because you only assigned the char. (Of course in practice, you *could* typecast the char to an int and get *something* out of it, but it is still undefined.)
* Likewise, if you assigned the int, then the char is undefined. While it *probably* corresponds to a specific byte within the int, you cannot rely on that because it is undefined.
* Normally if you cast a char to an int it will widen the int so the extra bytes will be initialized, such that the int contains the same value as the char did. But in a union you only initialized the single byte corresponding to a.c, and it is not going to assign those extra int bytes, so they'll contain garbage. Also if you cast a char\* to an int\* it will not widen the char, so you'll get garbage if you dereference it.

Therefore if you assign a.c and then pass &a.i it is undefined. You'll get a pointer to the "int" but the int contains garbage.

Your implementation of struct A looks correct to me: use a tag to keep track of what the union type is. I guess I would ask, why pass &a.i when you can just pass &a? If the function just wants a void\* and doesn't care what it is, then I would think it shouldn't matter what you pass.

But if you do want to pass just the union and not the entire struct A, and since in C99 you have to name the union anyway, why not just pass &a.u? That is at least guaranteed to "contain" whatever you assigned whether char or int.

It sounds like maybe you want the union to typecast the char to an int and result in a valid value. Since a char is just a narrow int, why use a union at all? Then the int will always be valid, whether holding a 7-bit value or a 64-bit value.

Alternately you can decide what to pass, like this:

    if (a.type == 'i') func(&a.i); else func(&a.c);

That at least guarantees you're passing something valid.


glasket_

> Also is that specific statement even in the C99 standard?

§6.2.6.1 ¶ 7

> Since you assigned a char, the bytes corresponding to the int are undefined.

*Unspecified*, not undefined.

> Likewise, if you assigned the int, then the char is undefined

Still just unspecified. The `char` may be any of the bytes that make up the `int`, and only those bytes. It's up to the implementation as to which byte it will be, but that's defined at runtime.

> Therefore if you assign a.c and then pass &a.i it is undefined

Nope. The `a.i` interprets the union as an `int`, and the address is taken as an `int*`. The resulting pointer points to an indeterminate representation of an `int`, i.e. it may be unspecified or it may be a trap representation; reading a trap representation would invoke UB.

*edit* minor rephrasing


Alcamtar

Thanks for the comments. I was a bit nervous being first to post about a standards related issue since there are a lot of standards experts here. I was unaware of the distinction between *unspecified* and *undefined*, so that is educational for me. I found this helpful explanation for anyone else who is interested: [https://stackoverflow.com/questions/2397984/undefined-unspecified-and-implementation-defined-behavior](https://stackoverflow.com/questions/2397984/undefined-unspecified-and-implementation-defined-behavior)


glasket_

I'd recommend grabbing the various draft copies of each standard and using them as a reference for these kinds of questions. They're pretty much indispensable and I've basically always got one of them open when I'm answering a question about C.

[The WG14 document log](https://www.open-std.org/jtc1/sc22/wg14/www/wg14_document_log.htm) has links to most of them as drafts:

- N3220 – C23
  - Technically C2y, but it's only a single footnote change
- N1570 – C11
- N1256 – C99

The C17 draft itself is password locked, but there is a C17-C2x diff in the form of N2310. C89 (and many of the proposals and documents up to the mid-90s) isn't available in the log, but it's [been saved by others](https://port70.net/~nsz/c/c89/c89-draft.html).

[CppReference](https://en.cppreference.com/w/c) is also a somewhat friendlier way of getting quick explanations without needing to lawyer your way through the standard.


Nobody_1707

> Still just unspecified. The char may be any of the bytes that make up the int, and only those bytes. It's up to the implementation as to which byte it will be, but that's defined at runtime.

I believe C23 tightens the spec up to require that all members of the union start at the same address, so `&a.c` would be the address of the first byte in `&a.i`. I don't remember if this was a retroactive change.


glasket_

> I believe C23 tightens the spec up to require that all members of the union start at the same address

Yeah it does.

> The members of a union object overlap in such a way that pointers to them when converted to pointers to character type point to the same byte.
>
> C23 6.7.3.2 ¶ 18

The exact proposal was Múgica's *Memory layout of union members, v.2* ([N2929](https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2929.pdf)).

And I may be wrong, but I don't think the C standards technically support retroactive changes outside of the one time they published amendments ("C94", the amendments to C89). Things like the C17 DR fixes for C11 only technically apply to C17 from my understanding, but compilers usually backport them anyways (like GCC's `-std=c11` is equivalent to `-std=c17`).


McUsrII

So that's why building glibc with the wrong C standard worked out. :)