Introduction
____________

This is my test suite for finding and resolving differences between files
created in successive builds of Linux From Scratch (LFS).  I use it to
compare a first full build against a second.  Normally, this means building
once, then booting into that build and using it to build a second time.  It
can also be used to test new build methods (revised instructions, or altered
scripts).  Indeed, there is no requirement that a full system be completed -
you could use it to test altered instructions for a set of packages installed
into a separate directory, or built upon copies of the same existing files.

 The tools we use to build the new system (/tools, and /cross-tools in
Cross-LFS) are thrown away, so I don't test them.  In principle, farce can
test any readable files, but in practice C++ libraries and programs sometimes
show binary differences which I cannot yet explain - I flag known examples
of these separately (libstdc++, lynx, etc.), but if you are building other
packages you may need to add further entries.  An explanation of why these
items differ, and hints on how to resolve the differences, would be welcome.
You will probably need to add extra code to deal with Python, if that crops
up in what you are testing.


Requirements
____________

Apart from the directory trees (either where they were installed, or a copy),
this suite requires:

/bin/bash
make
perl
sed
diff
cmp
binutils for the architecture you are examining (if I build for x86, I can't
expect to use 'ar' and 'strip' on x86_64-64 or ppc archives and binaries!).

These are all installed in a standard LFS installation.
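
 As a quick sanity check before a run, the list above can be tested with a
few lines of shell.  This is only a sketch and is not part of farce; the
function name missing_tools is made up for this example.

```shell
# Print the name of every listed tool that cannot be found on PATH.
# (missing_tools is a hypothetical helper, not something farce provides.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# /bin/bash is required by path, so test it directly.
[ -x /bin/bash ] || printf '%s\n' /bin/bash

# The remaining requirements, with 'ar' and 'strip' standing in for binutils.
missing_tools make perl sed diff cmp ar strip
```

 No output means everything was found.  Remember that 'ar' and 'strip' must
also match the architecture of the files being examined, which this check
cannot tell you.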

 At the moment, the script will only work for native builds, because it just
calls 'ar' and 'strip'.  Perhaps it could be extended to use cross-compiled
tools.

 Probably, the version of binutils used on the system where you run this
should be at least as new as the version on the system you are testing, in
case new sections get added to the files.


How to use this
_______________

As a user, run filelist for each tree (these runs will typically be some
hours, or days, apart).  The default output files are named in the format
filelist-CCYYMMDDhhmm (date and time), or you can pass an extra parameter to
supply the output filename.  Then run farce (with an optional --directory
dirname).  The output will be in various files with names starting 'farce-'.
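
 For illustration, a name in the filelist-CCYYMMDDhhmm format can be produced
with date(1) as below.  This is a sketch of the naming scheme only; how
filelist itself builds the name is not shown here, and the function name is
hypothetical.

```shell
# Build a default-style output name: filelist-CCYYMMDDhhmm.
# (default_filelist_name is a made-up helper for this example.)
default_filelist_name() {
    date +filelist-%Y%m%d%H%M
}

default_filelist_name
```

 Because the name only resolves whole minutes, two runs started within the
same minute would produce the same name, which is a good reason to supply
your own output filename when running filelist on small trees back to back.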


Examples
________

farce / filelist-200510051612 /mnt/lfs filelist-200510052359

(two builds only a few hours apart, running the first build, the second
mounted at /mnt/lfs)

farce build1 filelist-200510061400 build2 filelist-200510061402

(compare two partial builds which have been copied to ~/build1 and
~/build2 and then had filelist run on each of them - NB the time in
filelist's name only resolves whole minutes).

farce --directory results-0-1 build0 filelist-0 build1 filelist-1

(compare two copies of builds in directories ~/build0 and ~/build1)

Understanding farce's output
____________________________

Error messages appear on the console.  Either something in farce broke,
or it's a permissions problem (permissions failures are normally harmless
for farce's intended usage).  Obviously, if you get a lot of these, you
ought to review them.

If files are only in one of the builds, these are reported on the console
as well as in the results.

Failure messages appear on the console.

At the end, you get the totals -
number of files in only one of the builds
number of files compared
number of these which could not be read
number of identical files
number of files where the difference was expected
  (see the expected function in farce - this includes certain C++ libraries
  and precompiled headers)
number of files where the difference was allowed ('allowed')
number of other files which differed unexpectedly ('differed')

The file farce-results contains all this information, plus detailed messages
explaining why a particular file was treated the way it was (obviously,
files which are identical don't get listed). e.g.

accept: /usr/bin/perl after --strip-debug and processing

 This is a binary.  We tried strip --strip-debug and it still differed, so
then we tried replacing common date/time/kernel-version regexes by tokens
(the 'processing' part) and the results matched.  Neither strip nor the regex
replacement is ever run on the real files; both are applied to copies.
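
 The 'processing' idea can be sketched in shell as follows.  The two patterns
and the token names %DATE% and %TIME% are invented for this example and are
not farce's real regexes or tokens; the function names are hypothetical too.
The point is only that both files are rewritten into temporary copies, and
the copies are what get compared.

```shell
# Write a tokenised copy of a file to stdout; the file itself is untouched.
# The patterns here are assumptions for the example (ISO dates, hh:mm:ss).
tokenise() {
    LC_ALL=C sed -E \
        -e 's/[0-9]{4}-[0-9]{2}-[0-9]{2}/%DATE%/g' \
        -e 's/[0-9]{2}:[0-9]{2}:[0-9]{2}/%TIME%/g' \
        "$1"
}

# Return 0 if the two files match once dates and times are tokenised.
same_after_tokens() {
    local tmp status
    tmp=$(mktemp -d) || return 2
    tokenise "$1" > "$tmp/a"
    tokenise "$2" > "$tmp/b"
    if cmp -s "$tmp/a" "$tmp/b"; then status=0; else status=1; fi
    rm -rf "$tmp"
    return "$status"
}
```

 A call such as 'same_after_tokens build1/file build2/file' then succeeds
when the only differences are strings matched by the patterns.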

accept: /usr/lib/libiberty.a after diffing the members

 All ar archives contain timestamps for the members.  Farce extracts the 
members and then compares them.
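
 A minimal sketch of that approach, assuming a native 'ar' and archives
without duplicate member names, is shown below.  It is not farce's own code
and the function name is hypothetical; it also stops at a plain byte
comparison of the members, whereas farce goes on to process members that
differ.

```shell
# Return 0 if two ar archives hold the same members with the same contents.
# (same_ar_members is a made-up helper for this example.)
same_ar_members() {
    local first second dir_a dir_b status
    # 'ar x' extracts into the current directory, so use absolute paths.
    case $1 in /*) first=$1 ;; *) first=$PWD/$1 ;; esac
    case $2 in /*) second=$2 ;; *) second=$PWD/$2 ;; esac
    dir_a=$(mktemp -d) || return 2
    dir_b=$(mktemp -d) || { rm -rf "$dir_a"; return 2; }
    status=2
    if (cd "$dir_a" && ar x "$first") && (cd "$dir_b" && ar x "$second"); then
        # diff -r reports members found in only one archive as well as
        # members whose contents differ; the archive timestamps play no part.
        if diff -r "$dir_a" "$dir_b" >/dev/null; then
            status=0
        else
            status=1
        fi
    fi
    rm -rf "$dir_a" "$dir_b"
    return "$status"
}
```

 The return value is 0 for matching members, 1 for a difference, and 2 if an
archive could not be extracted.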

accept: /usr/share/i18n/charmaps/CP949.gz after ignoring gzip timestamp

 Gzipped files contain the original timestamp near the start of the
file - we compare the data after this.
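
 In the gzip file format the timestamp (MTIME) occupies bytes 5 to 8 of the
header, so one way to sketch this comparison is to skip the first 8 bytes of
each file and compare the rest.  This is an illustration rather than farce's
own code, and the function name is hypothetical.

```shell
# Return 0 if two gzip files are identical after the first 8 bytes
# (magic number, method, flags and the 4-byte MTIME field) are skipped.
same_gzip_payload() {
    local tmp status
    tmp=$(mktemp -d) || return 2
    # tail -c +9 starts output at the 9th byte, just past the timestamp.
    tail -c +9 "$1" > "$tmp/a"
    tail -c +9 "$2" > "$tmp/b"
    if cmp -s "$tmp/a" "$tmp/b"; then status=0; else status=1; fi
    rm -rf "$tmp"
    return "$status"
}
```

 Skipping the whole 8-byte prefix also skips the method and flag bytes; in
this sketch that is accepted because real content differences still show up
in the compressed data and trailing checksum.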
 
accepted /usr/share/doc/groff/1.19.2/examples/grnexmpl.ps after processing

 A file that isn't gzipped, not an ar archive, not using shared stuff. In
this case, a postscript file that includes date/time information.

archive member wlocale-inst.o differs after processing

 We ran this member through the regexes, but it still differs.

FAIL: /usr/bin/updatedb is different

 A failure to look at.  In this case, an incorrect build order had encoded
/tools/bin/sort into the updatedb script instead of /usr/bin/sort; this
only came to light after changing the build order.

 Every failure is diffed into the farce-extras file (for members of ar
archives, we do this for each failing member).  This shows the real file
name, then a diff between two temporary files containing the data when
we decided to fail them - so known date/time regexes have been replaced by
tokens, and binaries have had strip --strip-all run on them.

 Sometimes, the diff output in farce-extras is meaningful: if you can see
some sort of date/time/kernel information, perhaps containing a token like
%DT05%, a new pattern needs to be added to the regexes.  I'll take patches,
or make suggestions about how to change this, if you still have the files
and filelists.  Other times, it's just binary data - these usually end up
being marked as expected if I'm reasonably sure that they always differ.

 Details of the text before and after the regexes were applied are in
farce-substitutions - you should look at this to make sure you are happy
that nothing important is being swept under the carpet (e.g. libc version
strings match one of the kernel regexes, but I don't think this is a
problem).

Issues
______

 This seems to work on x86, but some processors/hosts report differences in
C++ programs/libraries and others don't for the same files (copied over as a
tarball).  These differences, when they do show up, are not understood but
are assumed to be something specific to C++ files.

 This has only had minimal testing on processors other than x86.  On multilib
builds (both x86_64 and powerpc64) the build results were very different when
last tested, but that might be a result of a particular glibc snapshot.

To Do
_____

1. Review multilib and pure64 results.

2. Do a bit of configure stuff, so that e.g. the version is generated in
 both scripts.

3. i18n.


Ken Moffat, 2006-01-15

