Alignment possibilities
-----------------------

At last, I got to look at what I had originally planned to investigate.

First, I had read that on Intel machines since SandyBridge, things ran faster if functions were aligned to 32 bytes instead of the default 16 (apparently, on AMD the default 16-byte alignment worked well). That was written quite a while ago, so even if it was true then, it might not be true for Zen, Zen+ or Zen2.

Second, I had seen the options for aligning data: ABI, or compat (don't waste space), or cacheline (the last implies that large items will be aligned in their own cache line).

The haswell i7 I'm using has adequate memory (15.5 GB after deducting the graphics, for 8 cores (4 real, 4 hyperthreads)), so I decided to try what ought to be the fastest approach: -falign-functions=32 -malign-data=cacheline. This is on top of -march=native and the cheap hardening.

Summary of my results
---------------------

There is _one_ concrete result: clang does not support -malign-data=cacheline (possibly it doesn't support -malign-data at all). But I only discovered that near the end of the build, when I tried to build potrace for inkscape. I worked around that by forcing CC=gcc CXX=g++ when I invoked configure. I'm surprised that this didn't bite anywhere else; it probably depends on exactly what gets built.

Beyond that, the results were very variable. In my series of repeated tests (I've updated the spreadsheet; this build is shown as 'E' and inserted after 'B', which is -march=native and the cheap hardening, to assist in comparing them), the test of the time taken to untar glibc was faster (mean time 1.2683s instead of 1.2758s), and also faster than the non-hardened build (1.2694s). The test using ImageMagick to convert raw photos (using LibRaw) to png was slower than the hardened build, but faster than the unhardened build. All the other series of tests had slower mean times.
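To put the untar figures in proportion, here is a minimal sketch that turns the three mean times quoted above into percentage differences. The function name percent_change and the variable names are my own for illustration, not anything taken from the test scripts or the spreadsheet.

```python
def percent_change(new_mean: float, old_mean: float) -> float:
    """Relative change of new_mean against old_mean, in percent.

    A negative value means the new build was faster.
    """
    return (new_mean - old_mean) / old_mean * 100.0


# Mean times in seconds for untarring glibc, as quoted in the text.
untar_aligned = 1.2683     # build 'E': alignment flags on top of 'B'
untar_hardened = 1.2758    # build 'B': -march=native and the cheap hardening
untar_unhardened = 1.2694  # the non-hardened build

vs_hardened = percent_change(untar_aligned, untar_hardened)
vs_unhardened = percent_change(untar_aligned, untar_unhardened)

print(f"vs hardened build:     {vs_hardened:+.2f}%")
print(f"vs non-hardened build: {vs_unhardened:+.2f}%")
```

Both differences come out at well under 1%, so even the one test series that looks like a win is a very small one.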
When I started this exercise, I had assumed that running individual tests once would be enough to show variations. Having given up on that, I'm still running the tests (most of them prove that something works, or occasionally that a particular item will hang). With one exception (converting a particular file to mp4 using ffmpeg, which took 5.876s instead of 6.191s), all of these results were slower. We're only talking fractions of a second, but in almost every case there was no benefit.

I also keep an eye on the times for some of the packages to build, including tests if I run them (my scripts keep a note of this), and compare them. Again, I had wrongly assumed this might show characteristics, but all it really shows is variance. Anyway, a few packages built faster than in the cheap hardening build; as is my custom, I will ignore any variations within +/- 2%. OpenSSH was 4.3% faster, ffmpeg 2.9% faster, xine-lib 4.2% faster. But on the other side, python2 was 2.8% slower, gimp 5.7% slower, qt-5.12.4 4.8% slower (qtwebengine was within the 2% tolerance), and falkon was 14.3% slower. Given the wide variance across previous builds, I'm ignoring fftw and texlive except to say that they were slower.

Summary:
--------

The case for using these flags is not proven.

2019-07-09.