Does -march=native actually run faster?
---------------------------------------

I decided it was time to find out what benefit (if any) -march=native was giving me. I already know the cost (the binaries cannot run on CPUs which are not a superset of the build CPU). For this build there was also a possibility that using the full cheap hardening (specifically, -D_FORTIFY_SOURCE=2) on all of texinfo (instead of just the perl parts) might break something (it didn't).

So, as the first build, I just changed -march= to -mtune=, removed the "do not need fortify on texinfo, because perl brings it in" 'fix', and let it run.

Unlike with the rest of hardening, nothing much here warranted extensive testing; all I really cared about was build timings (generally slower, as expected) and the few measurable runtime tests.

However, looking at build times it appeared to me that LibRaw, and perhaps also ImageMagick, were compiled faster than I had expected. Those might be outliers, but I decided to add an extra test. I have a selection of example raw files from camera reviews. LibRaw is used by ImageMagick, but many raw files are described by 'identify' as tiff files, which might mean it uses libtiff to open them. So I selected some raw files where that did not apply (canon, fuji, minolta, olympus, panasonic and sigma) and used 'convert' to convert them to pngs of the same size. I then went back to my original build, and also restored the previous hardened build from a backup to run the script on that.

Long story short: with -march=native the runtime is faster but there is variation; hardened runs can be faster than non-hardened (and since hardening only adds extra runtime tests, other variability must be involved).

For the second build, I then stripped out all the cheap hardening so that I could compare -O2 -march=native with -O2 -mtune=native.
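
Because the runtime measurements vary from run to run, one way to compare two builds is to repeat each measurement and look at the median and the spread. What follows is only a minimal sketch, not the script I actually used: the function names (time_command, summarize, percent_change) are hypothetical, and the workload in the example is a placeholder rather than a real 'convert' invocation.

```python
import statistics
import subprocess
import sys
import time


def time_command(argv, runs=5):
    """Run argv `runs` times and return the wall-clock seconds of each run."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # Discard the command's output so terminal I/O does not skew timings.
        subprocess.run(argv, check=True, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Return (median, spread), where spread is (max - min) / median."""
    med = statistics.median(timings)
    return med, (max(timings) - min(timings)) / med


def percent_change(baseline, candidate):
    """Percentage by which candidate's median differs from baseline's."""
    base_med, _ = summarize(baseline)
    cand_med, _ = summarize(candidate)
    return 100.0 * (cand_med - base_med) / base_med


if __name__ == "__main__":
    # Placeholder workload (an assumption): substitute the real command,
    # e.g. a 'convert' call on one of the raw files.
    cmd = [sys.executable, "-c", "sum(range(100000))"]
    med, spread = summarize(time_command(cmd, runs=5))
    print(f"median {med:.3f}s, spread {100 * spread:.1f}% of median")
```

If the spread within one build is larger than the percentage change between two builds, a single run of each cannot tell them apart, which would be consistent with hardened runs sometimes appearing faster than non-hardened ones.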

In general, some compiles are a little faster with -mtune=native, but by the time I got to the end of the build (qt, texlive-source) the builds with -march=native were significantly faster.

For runtimes there is generally an improvement with -march=native. However, 'sox' (which had already given me a lot of pain in the hardening) was a surprise - when built with -O2 -mtune=native instead of -O2 -march=native it does run faster! And to double the surprise, when compiled with -O2 -mtune=native and the cheap hardening, it runs faster still! In all other tests that I have run, it appears that using -march=native with cheap hardening runs faster than -mtune=native without cheap hardening.

Looking again at some more of my build times for individual packages (including tests, where I run them), the times continue to show variation. But the following packages seem to show excessive variation in their build times:

1. Anything using rustc, including rustc itself. Further testing on a different machine, with 1.35.0, suggests that most times are reasonably consistent, but that there are outliers. Testing builds of 1.35.0 on _this_ machine (after the -mtune=native build without hardening) suggests that build times with the default settings might vary by +/- 0.5 SBU, or +/- 1.0 SBU when tests are run - but on this machine I used gcc-8.3.0, and with 8 cores the timings were much worse than on the other 4-core machine which is running gcc-9.1.0.

2. fftw: this has an extreme amount of variation; perhaps building it three times for the different variants magnifies this.

3. fdk-aac and ffmpeg show wide variations in their build times.

4. Packages using qt (falkon, g'mic, vlc) take much longer to build when qt is using -mtune=native instead of -march=native, but for falkon the build with cheap hardening was a lot faster than the build without. As with sox and some of my other build measurements, something else seems to contribute to odd variations. But qt itself, and qtwebengine, did not take significantly longer to build.

5. TeXLive (source) takes significantly longer to build (including the tests) when not using -march=native. To the extent that I can measure the runtime performance (single runs, because of the overheads to get the files to a usable state) it does not seem to be more than a couple of percent slower.

There might be other packages with similar wide variations; I don't check the times for every package.

Beyond that, the general rule is that although NOT using -march=native might speed up compilations a little (although not for complex C++ code), it slows down runtime performance.

Conclusions:
------------

If you need to be able to use the binaries on a different machine which is not a superset of the build micro-architecture, do not use -march=native.

If all your machines use CPUs from the same manufacturer (Intel or AMD) and are compatible, and you wish to use binaries on different machines, use the highest compatible version of -march (e.g. SandyBridge for that and all later desktop/server Intels, amdfam10 for Barcelona and later including both Kaveri and Ryzen which are not mutually compatible). But if you specify -march, please also be aware that the compiler will not be able to bootstrap a more generic system.

With those warnings out of the way - for reasonably-recent CPUs (i.e. those where the gcc developers have had time to tune things for them), -march=native does seem to bring a few percent of runtime speed. And using cheap hardening with -march=native seems to generally run faster than using -mtune=native without cheap hardening.

Where -march=native cannot be used, cheap hardening tends to be slightly more expensive at runtime (in the cases I repeatedly measured, up to 4% instead of up to 2%). For most uses, I consider that this is still worthwhile (most attacks are a result of leveraging a series of vulnerabilities).

2019-07-06.