The Preprocessor Iceberg Meme (jadlevesque.github.io)
273 points by camel-cdr on July 14, 2022 | 135 comments


Since some people missed this, every entry is a clickable link.



It's a huge reminder C++ is missing a proper, hygienic, macro system. Too many things that are a pain to do in C++ would be easy with a real macro language. I have hope we'll get it sometime this decade, seeing all the work on the language since C++11 still happening more than 10 years later.

It's worth noting that a macro system plus basic reflection is where the real power of macros lies.


Actually, some of the problem may be with C++ itself. When the C preprocessor is used in a more flexible, dynamic language, you can do surprising things.

In the cppawk project, which combines the preprocessor with Awk, I used the preprocessor to create an iteration syntax with a vocabulary of useful clauses that combine together for parallel or cross-product iteration.

Furthermore, the clauses are user-definable.

https://www.kylheku.com/cgit/cppawk/about/

What helps is that you don't have to deal with types and declaration syntax. So many of the ideas in cppawk will not translate back to C or C++, or not without wrecking the syntax with additional arguments and whatnot.

The man page for the <iter.h> header has a section on defining a clause; I provided an example of defining a clause that iterates on alpha-numeric string ranges like from "A00" to "Z99".

As an experienced Lisp programmer (and implementor), I had to rub my eyes several times to believe I had such a thing working, under such a universally maligned and reviled preprocessor.


Other than some more clever hacks, templates alongside constexpr already cover a lot of possibilities.


You don't get the guarantees for what code is generated that you do with macros and they take a lot longer to compile. Also you can't just modify the AST of the current scope like you can with macros - pretty much every single time I have to use a macro it's because I need to generate code within the current scope. Fortunately they are usually very short.

The two for me fill different niches - for generic type-safe functions, parametrised types, etc. - templates. For text generation - macros. A hygienic macro system that lets you generate AST nodes and gives you access to type information would be absolutely divine, but it doesn't seem like we're getting it. Imagine if we had a script language that had full access to the compiler's internals.


I don't need to imagine when we have Circle, and experimental implementations of C++ reflection.

That is the missing piece of the puzzle.


Most of the abuses of #define I have seen, a template or constexpr takes care of. There are still some cases where it is nice. But many times you probably should just write a function/method/template out of it anyway.


> It's a huge reminder C++ is missing a proper, hygienic, macro system.

Templates, man.


In case someone reads this comment as sarcasm: I got into a convo with Stroustrup about this once, back in the 90s. I said one thing I missed was the lack of macros, and he made a glancing comment about the preprocessor which I obviously dismissed and said didn’t even count. He bitterly said, “Yeah, unfortunately when something like that pollutes an ecological niche it becomes impossible to eradicate. The best I could get away with was templates.”

A good perspective.


Bjarne was right, although C++ has been slow to adopt replacements for the preprocessor (such as modules, and conditional compilation).


I'm standing by for the announcement that some caffeine-addled Boost metaprogramming madman has implemented the Rust borrow checker as a C++ template, or at least thinks that he may have, when the compilation completes sometime in the 2030s.



It's not clear how much boost.org magic they used. Failing that, a GCC extension could be needful.

From the ref:

---

Conclusion

We attempted to represent ownership and borrowing through the C++ type system, however the language does not lend itself to this. Thus memory safety in C++ would need to be achieved through runtime checks.

Example code

    #include <type_traits>
    #include <utility>
    #include <assert.h>
    #include <stddef.h>


Why use template metaprogramming? You can use compiler extensions for that since the code will be valid either way


C++ has the Maximum Pain Rule.

Templates it must be.


There is a branch of clang with support for "-Wlifetime" which might be a simpler alternative.


It is not like I didn't try...


Don't forget expression templates! D can do them, too, but we strongly discourage their use. The trouble is that people use them to create DSLs that are indistinguishable from C++ code. For example:

    #include <boost/spirit.hpp>
    using namespace boost;
    int main() {
      spirit::rule<> group, fact, term, expr;
      group   = '(' >> expr >> ')';
      fact    = spirit::int_p   | group;
      term    = fact >> *(('*' >> fact) | ('/' >> fact));
      expr    = term >> *(('+' >> term) | ('-' >> term));
      assert( spirit::parse("2*(3+4)", expr).full );
      assert( ! spirit::parse("2*(3+4", expr).full );
    }
https://studylib.net/doc/10029968/text-processing-with-boost... slide 40

What's C++ and what's regex?


Aren't these just normal parser combinators?


A >> means right shift, except when it doesn't in the example.


Yes, obviously operations with parser combinators are different from those with numbers. (Also, I find it kind of dumb to reserve short symbols for low-level operations that are rarely used in normal programming.)


Well, the problem here is the re-use of existing operators.

(That's why it's great that Haskell and other languages in that family allow you to define your own operators, instead of eg re-using bit-shifting for IO.)


Defining your own operators means combining the lexer with the semantic analysis, which comes with all sorts of complexities.


Actually, no. At least the way it's handled in Haskell and OCaml avoids this problem.

(Btw, parsing C++ is already Turing complete.)


Templates are nice, but they have shortcomings a more generic macro system wouldn't. They also have the issue that the more complex your task is, the more convoluted the code has to look; compilation times also increase, and parsers (ergo, IDEs too) have trouble giving meaningful info on parameters. Don't even get me started on template errors because that's an atrocity on another level :(


The problems of no info on parameters and horrible template errors are related:

The root cause is that template expansion is duck-typed. 'Concepts' are supposed to fix that, I heard.


Can it be used to implement automatic serialization on simple struct / class types?


If you only use std::tuple, then yes xD


There is a nasty trick which uses structured binding to convert an arbitrary aggregate to a tuple.

That still doesn't help if you want the field names.


You have any link?



Thanks. This may be useful for me someday.


No, because C++ templates are basically a Lisp, and only things that are list-like can be processed by templates. (Like tuples, for example.)


That's a bit confusing. Lisp has all kinds of datatypes.

But you are right that C++ templates are a bad functional programming language with duck typing.


But there are automated ways to convert aggregates to tuples.


Recently started hiding "... do" and "while (0);..." in macros to write nicely bracketed start and end C macros eg this set for generating HTML:

https://github.com/libguestfs/libguestfs-common/blob/master/...

You can write:

  start_element ("memory") {
    attribute ("unit", "MiB");
    string_format ("%d", g->memsize);
  } end_element ();
to generate <memory unit="MiB">1024</memory>


That's pretty cool. It's been a while since I've done C, but couldn't you use a `for` loop instead of a while and perform any necessary cleanup in the "update" section? i.e. https://gcc.godbolt.org/z/jq84jondh

(The condition is optimized away by the big three: msvc, clang, and gcc)
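A minimal sketch of that `for` variant, for anyone who doesn't want to open the link. The `emit()` helper and the `element()` macro are made up for illustration (they are not from libguestfs), and the sketch assumes the tag name is a string literal:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical output buffer standing in for a real XML writer. */
static char out[128];

static void emit(const char *s)
{
  strncat(out, s, sizeof out - strlen(out) - 1);
}

/* The init clause opens the tag, the body runs exactly once, and the
 * "update" clause closes the tag. `name` must be a string literal
 * because it is pasted next to other literals. */
#define element(name)                                   \
  for (int once_ = (emit("<" name ">"), 1); once_;      \
       once_ = 0, emit("</" name ">"))

static const char *demo_nested(void)
{
  out[0] = '\0';
  element("memory") {
    element("unit")
      emit("MiB");
  }
  return out;
}
```

Calling `demo_nested()` should leave `<memory><unit>MiB</unit></memory>` in the buffer. One caveat with this shape: a `break` (or `return`) inside the body skips the update clause, so the closing tag is never emitted.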


Looks good. I wonder why this trick is invariably implemented with "do {" and "} while (0)" and never with "if(1) {" and "} else" ?


I think in part because do … while expects a ; at the end so you are obliged to provide one, which makes the macro feel more like a “real” function call.

if … else {} you could omit the ;


Good point, thank you. The while (0) demands the expected ; The trailing else hopes for the expected ; but would tolerate a wide range of nonsense instead.
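To make the difference concrete, here is a small sketch (`SWAP_DO` and `SWAP_IF` are names made up for this example). Both wrappers behave as a single statement, but only the `do`/`while (0)` form turns a forgotten semicolon into a compile error:

```c
#include <assert.h>

/* Two ways to make a multi-statement macro act like one statement. */
#define SWAP_DO(a, b) do { int t_ = (a); (a) = (b); (b) = t_; } while (0)
#define SWAP_IF(a, b) if (1) { int t_ = (a); (a) = (b); (b) = t_; } else

/* The do/while form drops cleanly into an unbraced if/else. */
static int demo_do_while(int flag)
{
  int x = 1, y = 2;
  if (flag)
    SWAP_DO(x, y);
  else
    x = -1;
  return x;
}

/* With the if/else form, a forgotten ';' still compiles: the next
 * statement silently becomes the else branch and never runs. */
static int demo_if_else_swallow(void)
{
  int x = 1, y = 2, after = 0;
  SWAP_IF(x, y)
  after = 1;
  return after + 0 * (x + y);
}
```

`demo_if_else_swallow()` returns 0 because `after = 1;` was absorbed as the dead `else` branch. Writing `SWAP_DO(x, y)` without the semicolon would not compile at all.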


Why do you need to start with while (0); ?



  #define end_element()                         \
    while (0);                                  \
    do {                                        \
      if (xmlTextWriterEndElement (xo) == -1) { \
        xml_error ("xmlTextWriterEndElement");  \
      }                                         \
    } while (0)


Talking about the first while (0);
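As far as I can tell, the leading `while (0);` closes the `do` that `start_element` leaves dangling, so the user's braces become the body of that loop. A stripped-down sketch of the pairing, where `emit()`, `open_tags` and `depth` are hypothetical stand-ins for the real xmlTextWriter calls:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for the real XML writer state. */
static char out[128];
static const char *open_tags[8];
static int depth;

static void emit(const char *s)
{
  strncat(out, s, sizeof out - strlen(out) - 1);
}

/* Ends with a dangling "do": the caller's { ... } becomes its body. */
#define start_element(name)                     \
  do {                                          \
    open_tags[depth++] = (name);                \
    emit("<"); emit(name); emit(">");           \
  } while (0);                                  \
  do

/* The leading "while (0);" finishes the "do" opened by start_element. */
#define end_element()                           \
  while (0);                                    \
  do {                                          \
    emit("</"); emit(open_tags[--depth]); emit(">"); \
  } while (0)

static const char *demo_memory(int mib)
{
  char text[32];
  out[0] = '\0';
  depth = 0;
  start_element("memory") {
    snprintf(text, sizeof text, "%d", mib);
    emit(text);
  } end_element();
  return out;
}
```

After expansion the body of `demo_memory` is three consecutive `do { ... } while (0);` statements: open the tag, run the user block, close the tag.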


I wrote a useful extension to the C preprocessor for GCC, and submitted it to the gcc-patches mailing list in April. This went unnoticed, as have my subsequent pings since. I'm planning to ping once a month from now on until the rest of 2022, and then switch to quarterly.

https://gcc.gnu.org/pipermail/gcc-patches/2022-April/593473....

__EXP_COUNTER__ gives a macro expansion to a numeric value which uniquely enumerates that expansion.

The sister macro __UEXP_COUNTER__ allows a macro expansion to access the parent's value: if a macro is being expanded in the body of another macro, one level up, it provides the __EXP_COUNTER__ value of that parent macro.

This feature solves the problem of producing unique names in a macro. (Unique within a translation unit.)

The __LINE__ symbol gets abused for this. The problem is that it's not unique. A macro can be called two or more times in the same line of code. Moreover, the same line number like 42 can occur multiple times in the same translation unit due to #include; a line number is not unique within a translation unit.

__COUNTER__ is next to useless because on each access, its value changes. It's useful in a situation in which a name is needed syntactically, and has to be unique, but is otherwise never referenced: just mentioned once and that's it.

Multiple references to __EXP_COUNTER__ in the same macro expansion context produce the same value.
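A rough sketch of both complaints, using only what existing compilers offer (`__COUNTER__` is a GCC/Clang/MSVC extension, and the helper macro names are made up). Stringifying the generated names makes the behaviour visible without triggering a redefinition error:

```c
#include <assert.h>
#include <string.h>

#define CAT_(a, b) a##b
#define CAT(a, b) CAT_(a, b)
#define STR_(x) #x
#define STR(x) STR_(x)

#define LINE_NAME(prefix) CAT(prefix, __LINE__)
#define COUNTER_NAME(prefix) CAT(prefix, __COUNTER__)

/* Two expansions on one source line get the same "unique" name. */
static const char *line_a = STR(LINE_NAME(tmp_)), *line_b = STR(LINE_NAME(tmp_));

/* Every mention of __COUNTER__ yields a new value, so a name built
 * from it cannot be spelled a second time. */
static const char *ctr_a = STR(COUNTER_NAME(tmp_));
static const char *ctr_b = STR(COUNTER_NAME(tmp_));

static int line_names_collide(void) { return strcmp(line_a, line_b) == 0; }
static int counter_names_differ(void) { return strcmp(ctr_a, ctr_b) != 0; }
```

Both helper functions return 1: the `__LINE__` names are identical, and the `__COUNTER__` names never repeat.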


As someone who understands C macros relatively well and who makes frequent use of them, I don't think I understand what __EXP_COUNTER__ does, and how it is different from __COUNTER__. I would have to experiment each time before using it, and would then quickly forget again the intricate details about expansion order, etc., similar to how every time I do some kind of STRINGIFY macro I have to make sure to use the right number of forwarding macro calls.

Is there a concrete use case for this that really can't be solved by __LINE__? I've used __LINE__ in the past to generate unique identifiers used in macro-generated code chunks. I don't see that non-uniqueness thing you mentioned causing any problems except for global variables (so not really an issue in my book).

As much as I love the C preprocessor as a crude tool that can solve many practical issues that are solved in other languages with a magnitude more complexity (besides solving problems that other languages don't have), I think the value doesn't come from its unintelligible execution model. And if __EXP_COUNTER__ is so difficult to understand, I personally don't like it.
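For reference, the "forwarding macro call" dance for stringification looks roughly like this (`STR_DIRECT`, `STR` and `BUFFER_SIZE` are illustrative names). An argument that is the operand of `#` is not macro-expanded, so one extra level is needed to expand it first:

```c
#include <assert.h>
#include <string.h>

#define STR_DIRECT(x) #x          /* stringifies the argument as written */
#define STR(x) STR_DIRECT(x)      /* extra level: x is expanded first */

#define BUFFER_SIZE 64

static const char *direct = STR_DIRECT(BUFFER_SIZE);  /* "BUFFER_SIZE" */
static const char *forwarded = STR(BUFFER_SIZE);      /* "64" */

static int forwarded_is_expanded(void)
{
  return strcmp(direct, "BUFFER_SIZE") == 0 && strcmp(forwarded, "64") == 0;
}
```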


The parent comment explained pretty well the advantages over __LINE__ imo.

Crafting macros is often black magic, but using them shouldn't be (if they're well crafted). Having an implicit rule in your macro that it cannot be used twice in the same line is surprising and potentially dangerous.

Another example is where you have a macro doing lots of work, if it needs to use a submacro multiple times that itself needs a unique identifier, then __LINE__ is no longer sufficient.

__UEXP_COUNTER__ is a little more difficult to imagine a use-case for, I'll admit (I can see it allows passing counters around, but I can't see why a parameter couldn't do the same).

Again, preprocessor macros are black magic, these additions seem a lot simpler to understand than `__VA_OPT__` (and its predecessor `##__VA_ARGS__`) or MSVCs awful stringify problems, as you brought up.


__UEXP_COUNTER__ is required so that you can abstract creating unique symbols:

  #define MAC(...) WHATEVER UNIQ(X) WHATEVER
without __UEXP_COUNTER__, you would have to pass __EXP_COUNTER__ as an extra argument to UNIQ(X).

You cannot make a nickname for __EXP_COUNTER__; e.g.

  #define EC __EXP_COUNTER__
because the expansion of EC itself has its own counter!

Using __UEXP_COUNTER__, UNIQ(X) can just do everything necessary: obtain MAC's counter, and combine it with the prefix X.


__EXP_COUNTER__ has a stable value in a given token replacement sequence (right hand side of a macro).

__COUNTER__ __COUNTER__ __COUNTER__ might give you 42 43 44, whereas __EXP_COUNTER__ __EXP_COUNTER__ __EXP_COUNTER__ will produce 73 73 73.

We can imagine that every macro has a hidden parameter:

  #define MAC(A, B, C, __EXP_COUNTER__)
we don't pass this parameter when calling the macro; the macro expander does that, and it passes an integer token whose value is incremented for each such call.


Thanks, it's a little clearer in my mind now. But wouldn't this work for you?

    #include <stdio.h>

    #define FOO_(c) printf("%d %d %d\n", c, c, c);
    #define FOO() FOO_(__COUNTER__)
    #define FOO2() FOO_(__COUNTER__)

    int main(void)
    {
            printf("%d %d %d\n", __COUNTER__, __COUNTER__, __COUNTER__);
            FOO();
            FOO();
            FOO2();
            FOO2();
            return 0;
    }
Output:

    0 1 2
    3 3 3
    4 4 4
    5 5 5
    6 6 6


Right, but now you can't use that COUNTER value in a macro and have it stay the same value. Think about concatenating a variable name with the counter and trying to use that same new name later in the same macro. Like NAME ## __COUNTER__ = NAME ## __COUNTER__ - 1; this won't work without some extra state.


This code won't work in any case. You can't do arithmetic like that. And I think it's much better code anyway to create the expansion once, because otherwise you have to construct NAME ## __COUNTER__ at each use and it quickly becomes unmaintainable and hard to change how you construct that name.

IMO the best solution if you want to avoid an extra indirection to inject some state, would be preprocessor variables that you can assign to in a macro expansion. Procedural preprocessor code basically. But the preprocessor doesn't work like that.


I'm probably missing something, but can't you use counter to generate a unique name once and then forward it to another macro so that it can be used multiple times?


Possibly.

Macro definitions are lazily evaluated, so that if we have:

  #define COUNT __COUNTER__
each occurrence of COUNT behaves like an invocation of __COUNTER__.

But, this is not true of parameters because they get expanded before substitution.

  $ gcc -E -
  #define MAC(PARAM) PARAM PARAM PARAM

  MAC(__COUNTER__)

  # 1 "<stdin>"
  # 1 "<built-in>"
  # 1 "<command-line>"
  # 31 "<command-line>"
  # 1 "/usr/include/stdc-predef.h" 1 3 4
  # 32 "<command-line>" 2
  # 1 "<stdin>"


  0 0 0
Thus perhaps you may be able to get something like __EXP_COUNTER__ by splitting your macros into interface and implementation:

  #define MAC_IMPL(A, B, EXP_COUNTER)

  #define MAC(A, B) MAC_IMPL(A, B, __COUNTER__)
I'm guessing this is what you mean by forwarding.

This could be a pretty major inconvenience, if you have to do it in the middle of a situation that is already stuffed with preprocessing contortions. Like say you had to define 32 macros that are similar to each other, for whatever reason, and you want this hack: now you have 64.
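A minimal sketch of that interface/implementation split, relying on the `__COUNTER__` extension and on the argument being expanded once before substitution, as in the `gcc -E` output above. `SWAP_INT` and `SWAP_INT_IMPL` are made-up names:

```c
#include <assert.h>

#define CAT_(a, b) a##b
#define CAT(a, b) CAT_(a, b)

/* Implementation: `id` arrives as an already-expanded integer token,
 * so every use of it inside this body sees the same value. */
#define SWAP_INT_IMPL(x, y, id)           \
  do {                                    \
    int CAT(swap_tmp_, id) = (x);         \
    (x) = (y);                            \
    (y) = CAT(swap_tmp_, id);             \
  } while (0)

/* Interface: hands a fresh counter value to the implementation. */
#define SWAP_INT(x, y) SWAP_INT_IMPL(x, y, __COUNTER__)

static int demo_swap(void)
{
  int a = 1, b = 2;
  SWAP_INT(a, b);
  SWAP_INT(a, b); SWAP_INT(a, b);   /* same line, distinct ids */
  return a * 10 + b;
}
```

Three swaps leave `a == 2` and `b == 1`, so `demo_swap()` returns 21. The cost is the one described above: every public macro needs a second `_IMPL` macro behind it.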


Yes the interface/impl split is what I had in mind.

Even only moderately complex PP metaprogramming requires multiple rounds of wrapping, so I'm not sure it is such a big burden.


By the way, I'm also interested in solving the "no recursive macro" problem hinted at in this submission. While working on __EXP_COUNTER__, I looked into it a bit.

The big issue is that the GNU C preprocessor uses global state for tracking expansion. In effect, it takes advantage of the no-recursion rule and says that during a macro's expansion, only one context for that expansion needs to exist. That context is patched into the macro definition, or something like that. (I don't have the code in front of me and it's been a few months.) The preprocessor knows that there is a current macro being expanded, and there is a stack of those; but that is referenced by its static definition, which has a 1:1 relationship to expansion state, like parameters, location and whatnot. That might have to turn into a stack, perhaps; there is a refactoring job there, and the code is a bit of a hornet's nest.

In terms of syntax/deployment, it would be easy. I envision that there could be a #defrec directive that is like #define, but which creates a macro that is blessed for recursive expansion. Or other possibilities: #pragma rec(names, of, macros, ...) which is better for code that has to work without the extension, since it uses #define.


We use this (trick taken from stackoverflow) to make __COUNTER__ usefully provide unique, reusable names:

https://gitlab.com/nbdkit/nbdkit/-/blob/master/common/includ...

Example use for MIN and MAX macros which evaluate the parameters once and can be nested to any depth:

https://gitlab.com/nbdkit/nbdkit/-/blob/master/common/includ...

I don't know what __EXP_COUNTER__ would add.


Your NBDKIT_UNIQUE_NAME(name) cannot produce the same name twice because it doesn't take a counter as a parameter.

__EXP_COUNTER__ adds the ability for a macro expansion to have its own counter for that expansion instance, without some other macro having to hand it one as an extra, visible parameter.


>I'm planning to ping once a month from now on until the rest of 2022, and then switch to quarterly.

I would do the opposite: keep halving the ping interval until they notice.


Macro systems inevitably wind up being used to create a specialized undocumented language that nobody but its creator understands.

I know how enticing they are, I designed and implemented one myself for the ABEL programming language. I used lots of clever C macros in my C programming, and was proud of them.

But, eventually, I removed all the macro usage, and quite preferred the resulting code. It was cleaner and easier to read.

It's not just C macros. It's the same for assembler macros. I've heard from others it's the same for other languages that rely on macros.

Essentially, macros are a cheap way to add power to a language. A better way is to add proper metaprogramming features. This is the route we chose to go with D, and it is satisfyingly successful.

Macros - just say no.


I have seen this kind of ”flip-flop” behaviour people have with macros a few times. First you go all in, burn yourself, and then go to the other extreme.

Personally, I think macros are a good way to automate some common tasks, but you have to be careful to keep them short. Also it is a good idea to prune macros periodically to remove what you don't need.

In C++, if you find yourself choosing whether to use a macro or a template, choose the one that is more terse!

Also macros will always inline in debug while templates will generate functions in debug builds, without optimizations. This may be an important performance consideration at times.


D has an "always inline" annotation for functions, so they'll be inlined even in debug builds if so desired.

Using a symbolic debugger with macros is just another facet of the slow moving disaster of macro usage.


I can hear the sound of a thousand LISP devs hurting in parens reading this comment lol.

Macros, as with most things (including even goto!) have their place, the problem is when they’re abused. But to say they’re never useful ever and you should instead always rely on language features is not something I agree with, and could even lead to language bloat if you need a full fledged feature for every little thing which would be trivially solved with a macro.


My unfettered opinion is that Lisp has not really caught on because it relies on macros to make it useful. Every project invents their own language on top of Lisp, incompatible with anyone else's.

It's like the problem with C++ before C++98. It had no string class, so everybody invented their own, all incompatible with everyone else's.

BTW, everyone says that they understand my point and use macros modestly and responsibly. Nearly all of them go on to create their own undocumented impenetrable language out of those macros.

It takes a programmer about 10 years of creating and using macros and dealing with other peoples' macros to come to the conclusion that the whole feature needs to be scrapped. Sadly, there aren't any shortcuts to this realization :-)


> My unfettered opinion is that Lisp has not really caught on because it relies on macros to make it useful. Every project invents their own language on top of Lisp, incompatible with anyone else's.

Been saying this for years - this way of programming is powerful for the lone hacker, but lethal for team efforts. I will never forget the guy who ported some weird function evaluation framework from Clojure to a Java app and then left for greener pastures, what he left behind was the gnarliest of mindfucks.


It also took the C++ community about 10 years to realize that the way iostreams was doing operator overloading to do pipelining was an abomination as well.

In the D community, we also strongly discourage operator overloading for any purpose other than creating arithmetic types.


I am doubtful that even WG21 as originally constituted would have accepted I/O Streams with its "Look at me, I've got operator overloading" operator abuse if it wasn't Stroustrup's own code. If some outsider had come along and said "Look at this slower, clumsier, operator abusing alternative to C's stdio" the committee might have quoted Stroustrup's own words condemning such abuse: "the ability to define new meanings for old operators can be used to write programs that are well nigh incomprehensible".

I'm with you up to a point on overloading, if it were up to me for example Rust would not implement Add and AddAssign on String, and certainly Java wouldn't special case += but we are where we are.

However Rust has several operators (fewer than C++ but still several) that aren't just for arithmetic types. Deref and DerefMut of course (used to implement smart pointers such as Arc), Index and IndexMut (for the indexing operator []) but also Try (implementation of the ? operator) and (though rather more distant into your future than Try if you write Stable Rust) the Function operator traits Fn, FnMut and FnOnce which represent callables.

Of course arguably Rust isn't overloading operators at all. Rust has no subtyping, and so whether you can Add or Multiply or Try something is a matter only of whether that type implements the associated Trait.


Not everyone shares that point of view.

Gladly using iostreams since 1993.


Fortunately, I said "community" not "everyone"!


https://www.merriam-webster.com/dictionary/community

Unless you define "community" as those that do not share your point of view.


Sorry, "everyone" is your strawman.


Whatever fits your concept of community.


I think Rust's hygienic and declarative "by example" macros are very nice actually. You could of course do the same things with its procedural macros but that's messy and harder to maintain. Appropriate tools for the job, don't use a chainsaw to trim your rosebush.


Give it 10 years!


What are these meta programming features if I may ask?


Introspection combined with Compile Time Function Execution combined with the ability to compile string literals into code.


I don't know D but it sounds like you do a lot of work at compilation which is good. I never understood why people took away the preprocessor but then forced the use of reflection which then breaks at runtime instead of breaking at compile time. When I write C# there would be so many opportunities for short preprocessor macros. Instead you either have to create a reflection monstrosity or copy/paste the same piece of code dozens of times.


what if instead of the macros being a side language, they _are_ the language? (github.com/civboot/fngi)


Relatedly, this article is claimed to be a description of the algorithm that the C standard intended; it is surprisingly succinct: https://news.ycombinator.com/item?id=22444447


Can someone explain what

  #if static_cast<bool>(-1)
is about?

My thoughts were "that's just #if true, no?", then "wait, static_cast is not part of the preprocessor, that can't work" to "wtf, it actually compiles"...

Edit: As people point out, you can click on it to get context. And yeah, that one is an oof.


> The #if statement replaces, after macro expansion, every remaining identifier with the pp-number 0. So #if static_cast<bool>(-1) is equivalent to #if 0<0>(-1), #if 0 > -1, and #if 1.

Wow, I have no idea why this would ever be done.
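The rule is easy to check with a few lines. In this sketch the macro names `CAST_BRANCH` and `UNDEF_BRANCH` and the identifier `NOT_DEFINED_ANYWHERE` are made up; it also compiles as plain C, since the preprocessor treats `static_cast` and `bool` as ordinary unknown identifiers:

```c
#include <assert.h>

/* Seen by the preprocessor as 0 < 0 > (-1), i.e. (0 < 0) > -1, i.e. 1. */
#if static_cast<bool>(-1)
#define CAST_BRANCH 1
#else
#define CAST_BRANCH 0
#endif

/* The same rule in its everyday form: an undefined name is just 0. */
#if NOT_DEFINED_ANYWHERE
#define UNDEF_BRANCH 1
#else
#define UNDEF_BRANCH 0
#endif

static int cast_branch_taken(void) { return CAST_BRANCH; }
static int undef_branch_taken(void) { return UNDEF_BRANCH; }
```

So the `static_cast` line takes the "true" branch, but only by accident of how `<` and `>` parse as comparison operators.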


> I have no idea why this would ever be done.

You'd never literally write

  #if static_cast<bool>(-1)
but it could be the result of a macro expansion.


well it's the consequence of

    #if SOME_STUFF
evaluating to 0 if SOME_STUFF isn't defined


Because the preprocessor doesn't know what static_cast is, so in an if it just evaluates to false as do any values that haven't been defined.


I suspect the point is that the preprocessor's expression syntax is out of whack with the host languages it is integrated into. If you hoist an expression of the language proper into a preprocessing directive, you may get gibberish.

This could happen by accident, particularly through layers of macros:

Say you have:

  #if SOME_MACRO(ARG)
originally, this expands to a constant expression in which everything is an integer; then someone edits the macro. Things may still compile, but the expression is gibberish, not doing what it looks like it's doing.

The macro could be used in non-preprocessing contexts:

  int x = SOME_MACRO(X);

  if (SOME_MACRO(Y)) ...
so that programmer might have a good reason for editing it; just they didn't notice it's also used in an #if directive.
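A small sketch of how that can go wrong quietly. `CACHE_LEVEL` and `USE_FAST_PATH` are hypothetical names; imagine the macro used to compare a `#define`d integer and was later edited to use an enum constant:

```c
#include <assert.h>

/* After the edit: an enum constant instead of a #define'd number. */
enum { CACHE_LEVEL = 3 };
#define USE_FAST_PATH (CACHE_LEVEL > 2)

/* The preprocessor knows nothing about enums: CACHE_LEVEL is an
 * unknown identifier here, so this is evaluated as (0 > 2). */
#if USE_FAST_PATH
#define PP_VIEW 1
#else
#define PP_VIEW 0
#endif

/* The compiler proper sees the enum and evaluates (3 > 2). */
static int language_view(void)
{
  return USE_FAST_PATH ? 1 : 0;
}
```

Everything compiles without complaint by default, yet `PP_VIEW` is 0 while `language_view()` returns 1. GCC's `-Wundef` warns about the undefined identifier in the `#if`.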


This one has bit me twice in the past 35 years.

I have some program I'm working on, doing the usual edit/compile/run/debug cycle. At some point I decide to compare two versions of some section of code, so I write out temporary files of the old section named "old" and the new section named "new". Then compiles start failing, but oddly it is a file that I haven't edited recently.

The issue is that some code (not necessarily even mine) has an "#include <new>" and it is picking up my temporary file named "new".


One of the oddest issues I have encountered was a test case that would fail if one random log line was deleted (which would normally mean UB or timing issues) but, wildly, not when the log line was commented out. Turns out it was interaction between the use of __LINE__ in a macro to generate unique identifiers and a violation of the One Definition Rule.


Heh. If I had a nickel for every time I shot myself in the foot over the years by dropping a temporary file named "test.py" somewhere... I don't know about rich, but I'd probably at least be able to buy myself a coffee.


Anyone know what SIMD at the bottom layer means here? I know what SIMD is; I mean in the context of the preprocessor (and being the worst offender, apparently).


It is not obvious, but the entries are clickable! It [0] is apparently a technique for speeding up processing of preprocessor lists.

edit: it is basically loop unrolling.

[0] https://jadlevesque.github.io/PPMP-Iceberg/explanations#sing...


Sorry, I misread your comment as clickbait instead of clickable.

~~I can see how others are somewhat clickbait, but this is literally single instruction/continuation multiple data. It uses a single step in the continuation mechanism to compute on multiple elements of data to speed up the computation.~~


No worries, I probably misread your comment too :). I thought you were asking what SIMD/SCMD is in this context, but as the submitter you probably already did, and already knew that the links were clickable.

BTW: I don't think the titles of each entry are particularly clickbait-y.


You can click on each item for an explanation.


Unrelated, but yesterday I found out you can have two bools in C++ that are both true but do not equal each other, by reinterpret casting them from (u)ints. I think this was for the standard bool type too... Now I'm questioning the most basic of things.


I'm very sure that's just UB. The standard requires unique representations of true and false (e.g. byte values of 1 and 0, but for all you care it could be 13 and 37). Converting an integer to bool (even implicitly, as in `if (3)`) is required to lead to those values in the abstract machine.

If you somehow force a bool to be a different value (e.g. `*(int*)(&myBool) = 7`), that's UB.


It's operating on a boolean still though, here's an example - https://www.onlinegdb.com/tfHPZ_FOfY


Lines 21 and 22 are UB, so the program has no meaning according to the standard.


This is undefined behavior. Compile with -fsanitize=undefined and run it, you will get a runtime error.


And so it does! Thank you, will see if I can keep it to prevent me from running into this issue. I have a feeling the Unreal codebase will be full of UB abuse though.


Unfortunately most large C++ codebases are...


Clang, MSVC and GCC all have options to turn off various flavours of UB, or rather to define the behaviour in those cases. I strongly suggest using -fwrapv -fno-strict-aliasing -fno-delete-null-pointer-checks (and the equivalent in MSVC) in every large project. That is the easiest UB to hit and while the program will have a bug, it will at least be easier to reason about and optimised vs debug builds will have the same behaviour. Debugging "the compiler deleted my if meant to catch and log an error condition because after the inlining pass some function dereferences a pointer, thus the pointer cannot be null, thus the if can be deleted" is... hard.


Additionally all of them have the ability to enable bounds checking and iterator validation in the standard library.


It boils down to C compatibility, where every non-zero numeric value is true, in that language where good developers make no errors.


I know that's the case for ints, but in this case [0] the int was cast to a boolean, which I thought would ensure comparisons would perform as expected, but no such luck.

[0] https://www.onlinegdb.com/tfHPZ_FOfY


You are not casting an int to a bool (which would indeed do the right thing) but casting a pointer to int to a pointer to bool which violates strict aliasing.


This is not strict aliasing as far as I understand. Strict aliasing is about inference of distinctness of pointers. The case here is that an invalidly typed pointer is created (a bool pointer pointing to where there is no bool). Not sure what this situation is called in standardese.


The type aliasing rule is also known as the strict aliasing rule, see for example: https://en.cppreference.com/w/c/language/object


After skimming this, I still think the strict aliasing rule is used by the compiler to avoid re-reads. What you were talking about is probably something else, maybe an "invalid lvalue access" as per your link.


From cpp reference: "Strict Aliasing: Given an object with effective type T1, using an lvalue expression (typically, dereferencing a pointer) of a different type T2 is undefined behavior, unless [non relevant exceptions omitted]".

In this case the expression has type bool and the underlying object has type int, so it is a straightforward strict aliasing violation.

With GCC you can compile with -fno-strict-aliasing to ignore this rule. But now you fall afoul of the rule that prevents accessing an invalid representation (i.e. a trap-representation) of an object. This rule is also described in the link I posted before, under the object representation paragraph.


ok, so in the case above, it's both (if strict aliasing is active). Makes sense now.


Ah fair, thanks for pointing it out! Wasn't my issue, just the minimal reproduction of behaviour that happened over a few external modules, causing this issue.


The strict aliasing violation is incidental. You can change int8_t to char in the code and get the same result with no strict aliasing violation.


Sure, as per my other comment, it then would be an object representation violation.

Even ignoring both rules, there is still no reason to expect the assertion not to fire:

    int x = 2;
    assert(*(bool*)&x == true);
This is the same as:

    float x = 2.0;
    assert(*(int*)&x == 2);
Just because two objects are convertible, there is no reason to expect their representations to be the same.
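
A well-defined way to see the difference between converting a value and reinterpreting its bytes is to copy the representation with memcpy. This is only a sketch: the helper names are invented, and it assumes a 32-bit IEEE 754 float, which is common but not guaranteed by the standard.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Value conversion: any nonzero int converts to true. */
static bool convert_to_bool(int x) {
    return (bool)x;
}

/* Representation copy: move the bytes of a float into a 32-bit integer.
   Unlike *(int*)&f this is well-defined. Assumes sizeof(float) == 4 and
   IEEE 754 encoding. */
static uint32_t float_bits(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

Under those assumptions float_bits(2.0f) is 0x40000000, nowhere near 2, while (int)2.0f is 2 and convert_to_bool(2) is true.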


Sometimes people indeed come up with stuff where I can't figure out why anyone would ever write that.


It wasn't really one particular function, but an aggregation of behaviours over some external modules. None of it written by me, luckily :)


This illustration is missing a layer. Buried down in the silt, under the abyssal plain at the bottom of the ocean... Ken Thompson.


Love this iceberg with explanations for PLs. Does anyone know another one?


I made a general one about C++ some time ago: https://fouronnes.github.io/cppiceberg/


"C++0x concepts were rust traits" was an interesting read, leading me to look at P2279, and then at P2437 which may end up in C++ 26. Thanks.



This is awesome!

The C/C++ preprocessor is probably the esoteric programming language that sees the most real-world use. Or would that be C++ template metaprogramming...?


i don't know which is worse: pre-processor abuse in C land or object oriented design abuse in C++ land, both can lead to code that is quite hard to maintain...

(i know that you can have pre-processor abuse in C++ too, but that's not common practice)


Even with the explanations, I still don’t understand “no argument means one argument”
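
One possible reading of that entry, sketched below: writing F() for a macro that takes a parameter does not pass zero arguments, it passes a single empty one. STR, NARGS, NARGS_PICK and the helper functions are hypothetical names, and this NARGS only handles up to three arguments.

```c
#include <string.h>

/* STR() is a valid call: x is bound to one empty argument, giving "". */
#define STR(x) #x

/* Hypothetical counting macro: the arguments shift the list 3, 2, 1, 0
   so that the count lands in the slot named n. */
#define NARGS_PICK(a, b, c, n, ...) n
#define NARGS(...) NARGS_PICK(__VA_ARGS__, 3, 2, 1, 0)

static size_t empty_arg_length(void) { return strlen(STR()); }
static int count_none(void) { return NARGS(); }      /* one empty argument */
static int count_two(void)  { return NARGS(a, b); }  /* a and b are discarded */
```

Because the preprocessor sees NARGS() as one empty argument, the naive count comes out as 1 instead of 0, which is why argument-counting macros need extra tricks to detect the empty case.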


and... this is why I prefer Go. No macros.


Go has even uglier half-supported things like //go:embed.


#embed coming to c distributions near you in 2023 [0]

[0] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3017.htm


Seems fine to me:

https://godocs.io/embed

Difference is it's just data, not executable code. I don't understand why people need macros, when functions exist.


//go:generate


I started but couldn't finish. For example __FILE__ (and __LINE__ and similar) certainly can be used in user-written macros, but they evaluate to the file/line they appear at in the macro source, not at the line the macro is invoked.


This is not correct. They evaluate at the point where they are expanded.

I’ve used them countless times to implement “log” and “trace” macros that show file/line.
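
A minimal sketch of that kind of macro, with TRACE and HERE as made-up names. The two helper functions only exist to show that the line number comes from wherever the macro is written out, not from the #define.

```c
#include <stdio.h>

/* Hypothetical trace macro: __FILE__ and __LINE__ are replaced when the
   macro is expanded, so they describe the call site, not this line. */
#define TRACE(msg) fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__, (msg))

/* Hypothetical helper macro: yields the line on which it is invoked. */
#define HERE() (__LINE__)

/* Two invocations on consecutive lines report consecutive numbers. */
static int first_call_line(void)  { return HERE(); }
static int second_call_line(void) { return HERE(); }
```

Calling TRACE("something happened") from any function prints that function's file and line, which is what makes these macros useful for logging.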


I have used them countless times myself, but not within another macro.

Edit: Sorry I am on multiple drugs at the moment for a severe fracture - I'm a bit confused, nurse has just been. I will shut up now.

To prove I am not completely out of it (though wrong), here's one I wrote a lot earlier: https://latedev.wordpress.com/2012/08/09/c-debug-macros/


Sorry to hear that! Rest up and don’t worry about cpp macros for now ;) they’ll still be there after you heal. Good luck!


Honestly, I suspect that cpp macros will outlive us all


Most commonly, they are used in the definition of the assert macro. For example: https://github.com/lattera/glibc/blob/master/assert/assert.h .


> they evaluate to the file/line they appear at in the macro source, not at the line the macro is invoked.

What are you talking about? Maybe I'm missing your point, but:

    #define A __LINE__
    A
    A
Expands to:

    2
    3



