Henceforth, the term CmpBO is used to mean "comparing build outputs".
Opening
This article is based on my experience in a project that I led. The project was successful, and the techniques described here worked well in a real situation. Here is a rough description of the environment.
- Software is built on Linux host.
- Most source code is written in C and built with GNU Make.
- Some of it comes from external sources (e.g. third parties or open-source projects), and most of those use autotools for building.
- Various toolchains are used for cross-compiling.
- Java, Python and other tools take part in the build.
Purpose
Imagine the following cases.
- A developer says that the code changes committed to the software (henceforth SW) repository affect only one executable binary. Packages are then built from the latest source code in the repository and released to the verification team. In this situation, can the verification team believe what the developer says and skip testing everything except the affected binary?
- The build system and infrastructure are improved. Can we then say that these improvements don't affect our SW packages?
If we can say "this build output is semantically exactly the same as that build output", we can improve the build system and infrastructure freely and with confidence, and the verification team can save the effort of re-testing the same software.
Difficulties
Workspace path for build
Even though the cloud has become popular, many build systems still use bare-metal machines to build SW packages, especially for large SW products such as file-system images for mobile devices (e.g. Android). The main reason is that building such huge SW uses system resources so heavily that using a container or VM may cost build performance (time). And to separate build jobs safely, that is, to avoid potential issues caused by sharing the same workspace among several build jobs, a build agent usually uses a unique directory path as the build workspace, for example something like /build/PROJECT-NAME/BUILD-ID/ (BUILD-ID is a unique string identifying the build job). This means that the directory path changes at every build even if the build job uses the same source code.
This causes problems for CmpBO in combination with, for example, the following.
- The __FILE__ or __BASE_FILE__ macro in C/C++.
- Debugging information (e.g. DWARF sections in ELF).
- Any code that uses a file path.
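One way to see which outputs are affected is to search them for the workspace path. The sketch below is a minimal illustration and was not part of the original project; the function name files_embedding_path and the example path are hypothetical.

```python
from pathlib import Path


def files_embedding_path(output_dir, workspace):
    """Return the output files whose bytes contain the workspace path."""
    needle = str(workspace).encode()
    hits = []
    for path in sorted(Path(output_dir).rglob("*")):
        # Reads each file fully; fine for a sketch, slow for huge images.
        if path.is_file() and needle in path.read_bytes():
            hits.append(path)
    return hits
```

Calling files_embedding_path("out", "/build/PROJECT-NAME/BUILD-ID") lists the files that would differ between two builds only because of the workspace path.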
Order of files in a directory
On some filesystems (e.g. ext4), the order of files in a directory is not deterministic. For example, the following commands may show different results on different machines.
$ touch a b c d e f
$ ls -U
Because of this, a file created by linking, zip or tar on another machine may not be the same at the binary level, even if the files in the directory are exactly the same. But semantically, as you know, they are the same.
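The effect can be reproduced without touching a filesystem. The following is a minimal sketch, assuming only that the archiver stores members in the order it receives them; the helper names make_zip and names_in are made up for this illustration.

```python
import io
import zipfile


def make_zip(names):
    """Build an in-memory zip with empty members added in the given order."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in names:
            # ZipInfo defaults to a fixed 1980-01-01 timestamp, so only the order varies.
            zf.writestr(zipfile.ZipInfo(name), b"")
    return buf.getvalue()


def names_in(data):
    """Return the sorted member names of an in-memory zip."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return sorted(zf.namelist())


first = make_zip(["a", "b", "c"])
second = make_zip(["c", "a", "b"])
print(first == second)                        # byte-level comparison
print(names_in(first) == names_in(second))    # content-level comparison
```

The two archives hold the same members, yet their bytes differ because the members were added in a different order.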
Time
In some projects, the current time is used during the build. It may be used as a seed for random number generation or as debugging information. Either way, the time is different at every build even if the source code is the same. For example, you may find the __TIME__ or __DATE__ macro in C/C++ files. Then the build results differ at the binary level at every build.
Inappropriate command line
Usually, various types of host machines are used as build agents. So, even if everything else in the environment is the same, the host machine may differ. In this case, if someone uses a command line that depends on the host machine, the build outputs may differ depending on which host machine is used. For example,
$ gcc -march=native ...
is a well-known compile option of this kind.
Unexpected behavior of compiler
Some C/C++ compilers generate different object files even if there are no logical changes.
For instance, if some header files are changed but those changes don't logically affect a C source file, we may expect the same object file to be generated.
But some compilers (take gcc as an example) may generate outputs that differ in terms of ELF comparison if the source files are compiled with optimization enabled (e.g. the -O2 option). Even when a compiler generates different object files with -O2, it may create the same object files with -O0, because the sources are logically unchanged even though some header files changed.
Therefore, even if compile outputs differ in a binary comparison, it's very difficult to say that the build outputs have changed, because logically they have not.
Workaround
Because of the difficulties above, it's almost impossible to compare outputs as they are. But usually the main purpose of comparing is to tell which outputs are semantically changed. That is, we don't need to compare the outputs of a releasable build. Let me introduce 4 ways I used to work around these difficulties and generate outputs only for CmpBO (again, this is NOT FOR RELEASE!).
Fix workspace path for build
One popular option is to let build agents run in a container with a fixed workspace path. But in this project, that was rejected by the infrastructure team. So, instead of running the build agent in a container, we run a build container inside the build agent with volume mapping (mapped to a fixed path).
Fix order of files in a directory
In my case, the ext4 filesystem is used for the build partition. As mentioned above, the file order in a directory needs to be fixed across build agents. This is possible by using the same hash seed in ext4. For details, please refer to my previous article.
Do not use inappropriate command line for release build.
A command line that generates outputs dependent on the host machine is usually not correct for a general release build, so it should be fixed to suit a general release. Even at a quick glance, -march=native is not a proper option for a release build.
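Such options can be spotted mechanically in recorded compiler command lines. The sketch below is only an illustration: the flag list is an assumption to extend for the toolchains in use, and host_dependent_flags is a hypothetical helper name.

```python
import shlex

# Assumed list of GCC-style options that resolve to the build host's CPU.
HOST_DEPENDENT_FLAGS = {"-march=native", "-mtune=native", "-mcpu=native"}


def host_dependent_flags(command_line):
    """Return the host-dependent options found in one compiler command line."""
    return [arg for arg in shlex.split(command_line) if arg in HOST_DEPENDENT_FLAGS]


print(host_dependent_flags("gcc -O2 -march=native -c a.c"))
```

Running this over every command in a build log gives a list of places to fix before comparing outputs from different hosts.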
Fix time
This is very hacky and dangerous. But time is always the biggest headache in CmpBO. Time always changes and is unpredictable! Furthermore, a system cannot even run correctly without time! So we need to fix the time only for the build. I used a hack based on LD_PRELOAD (for details, please refer to my previous short article). Here are the steps.
Create a library overriding the POSIX functions that get the time.
Code compiled to libfixtm.so
#include <time.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <stdio.h>
#include <errno.h>

// #define D(args...) printf(args)
#define D(args...) do {} while(0)

/*
 * IMPORTANT NOTE
 * ==============
 *
 * CASE: 'LD_PRELOAD=./libfixtm.so java XXX'
 *     tv_sec = 1000000
 *     tv_nsec / tv_usec = 0
 * --------------------------------------------------------
 * Env: openjdk 11.0.3 2019-04-16
 *
 * 'java' or 'javac' process never ended if tv_sec is large enough.
 * But, it's ok if 'tv_sec' is 0, 1 or 2 etc.
 * It's based on experimental results.
 * Detail analysis is NOT performed yet.
 *
 * >>> 'tv_sec' should be small enough.
 *
 *
 * CASE: 'LD_PRELOAD=./libfixtm.so python XXX' (Python 2.x)
 *     tv_sec = 0
 *     tv_nsec / tv_usec = 0
 * --------------------------------------------------------
 * Env: Python 2.7.15+
 *
 * time.time() fails at python 2.x with following error.
 *     Traceback (most recent call last):
 *       File "main.py", line 5, in <module>
 *         ts = time.time()
 *     IOError: [Errno 0] Error
 *
 * >>> 'tv_sec' > 0 for python 2.x.
 */
#define MAGIC_TIME_SEC 1

time_t
time(time_t *t) {
    D("*** time\n");
    errno = 0;
    if (t) *t = MAGIC_TIME_SEC;
    return MAGIC_TIME_SEC;
}

int
timespec_get(struct timespec *ts, int base) {
    D("*** timespec_get\n");
    errno = 0;
    if (!ts) {
        errno = EFAULT;
        return 0;
    }
    ts->tv_sec = MAGIC_TIME_SEC;
    ts->tv_nsec = 0;
    return base;
}

int
clock_gettime(clockid_t clk_id, struct timespec *ts) {
    D("*** clock_gettime\n");
    errno = 0;
    if (!ts) {
        errno = EFAULT;
        return -1;
    }
    ts->tv_sec = MAGIC_TIME_SEC;
    ts->tv_nsec = 0;
    return 0;
}

int
gettimeofday(struct timeval *tv, struct timezone *tz) {
    D("*** gettimeofday\n");
    errno = 0;
    if (!tv) return 0;
    tv->tv_sec = MAGIC_TIME_SEC;
    tv->tv_usec = 0;
    return 0;
}
Verification of time-overriding
With the libfixtm.so generated above, I observed that the time is fixed in most of the major cases in my build project. Please see below for details.
C
File: main.c
#include <stdio.h>
#include <time.h>
#include <sys/time.h>
#include <string.h>

int
main(void) {
    struct timespec ts;
    struct timeval tv;

    /* Both calls must report exactly MAGIC_TIME_SEC (1) and no fraction. */
    clock_gettime(CLOCK_MONOTONIC, &ts);
    if ((1 != ts.tv_sec) || ts.tv_nsec)
        return 1;
    gettimeofday(&tv, NULL);
    if ((1 != tv.tv_sec) || tv.tv_usec)
        return 1;
    return 0;
}
Console:
$ gcc main.c
$ LD_PRELOAD=$(pwd)/libfixtm.so ./a.out
$ echo $?
0
Summary: the functions clock_gettime and gettimeofday are successfully overridden by libfixtm.so.
GCC
File: a.c
#include <stdio.h>

int main(void) {
    printf("%s\n", __TIME__);
    return 0;
}
Console:
$ gcc --version
gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ LD_PRELOAD=$(pwd)/libfixtm.so gcc -o a0 a.c
$ sleep 2
$ LD_PRELOAD=$(pwd)/libfixtm.so gcc -o a1 a.c
$ cmp a0 a1
$ echo $?
0
Summary: the values of the time-related macros are fixed in gcc.
Java
File: Main.java
import java.util.Date;

public class Main {
    public static void main(String[] args) {
        if (1000 != System.currentTimeMillis())
            System.exit(1);
        if (1000 != (new Date()).getTime())
            System.exit(1);
    }
}
Console:
$ javac --version
javac 11.0.3
$ java --version
openjdk 11.0.3 2019-04-16
OpenJDK Runtime Environment (build 11.0.3+7-Ubuntu-1ubuntu218.04.1)
OpenJDK 64-Bit Server VM (build 11.0.3+7-Ubuntu-1ubuntu218.04.1, mixed mode, sharing)
$ javac Main.java
$ LD_PRELOAD=$(pwd)/libfixtm.so java Main
$ echo $?
0
Python
File: main.py
import time
import datetime
import calendar

ts = time.time()
assert ts == 1.0
dt = datetime.datetime.now()
# datetime.timestamp() exists only in Python 3.
if hasattr(dt, 'timestamp'):
    ts = dt.timestamp()
    assert ts == 1.0
ts = calendar.timegm(time.gmtime())
assert ts == 1
Console:
$ python --version
Python 2.7.15+
$ python3 --version
Python 3.6.8
$ LD_PRELOAD=$(pwd)/libfixtm.so python main.py
$ echo $?
0
$ LD_PRELOAD=$(pwd)/libfixtm.so python3 main.py
$ echo $?
0
Applying to build
Some tools may never end (e.g. when they use a timeout). The workaround is to override such a tool with a proxy by changing PATH during the build for CmpBO. This is a sample bash script of a proxy executable for ping.
#!/bin/bash
# Run the real ping without the time-fixing library preloaded.
unset LD_PRELOAD
exec /bin/ping "$@"
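Putting the library and the proxies together, the build for CmpBO needs an environment with LD_PRELOAD set and the proxy directory placed first in PATH. The following is a minimal sketch of that setup; the function name cmpbo_env and the directory layout are hypothetical and not taken from the original project.

```python
import os


def cmpbo_env(lib_path, proxy_dir, base_env=None):
    """Return an environment for a CmpBO build: time fixed, proxies first in PATH."""
    env = dict(os.environ if base_env is None else base_env)
    # Preload the time-fixing library into every dynamically linked tool.
    env["LD_PRELOAD"] = os.path.abspath(lib_path)
    # Proxies (such as the ping wrapper) must shadow the real tools.
    env["PATH"] = os.pathsep.join([os.path.abspath(proxy_dir), env.get("PATH", "")])
    return env
```

A build could then be started with something like subprocess.run(["make"], env=cmpbo_env("libfixtm.so", "proxies"), check=True), so that the normal shell environment stays untouched.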
Change known C/C++ macros to others having fixed value.
Some toolchains used for cross-compiling are NOT dynamic executables, which means the hack above doesn't work for them (LD_PRELOAD only affects dynamically linked executables). So, before the build, all __DATE__ and __TIME__ macros in the C/C++ source code are changed to __VERSION__. This is possible under the assumption that __DATE__ and __TIME__ are used only for debugging or informational purposes, that is, they don't have any semantic meaning in the software.
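The replacement can be done with a simple text substitution before the build. This is a minimal sketch, assuming the macros in question are the standard __DATE__ and __TIME__ and the replacement is GCC's predefined __VERSION__; neutralize_time_macros is a hypothetical helper name. Note that it also rewrites occurrences inside comments and string literals.

```python
import re

# \b keeps longer identifiers such as MY__DATE__X or __TIMESTAMP__ untouched.
TIME_MACRO = re.compile(r"\b(__DATE__|__TIME__)\b")


def neutralize_time_macros(source):
    """Replace __DATE__ and __TIME__ with __VERSION__ in C/C++ source text."""
    return TIME_MACRO.sub("__VERSION__", source)


print(neutralize_time_macros('printf("%s %s\\n", __DATE__, __TIME__);'))
```

Since __VERSION__ expands to a string literal just like __DATE__ and __TIME__ do, code that prints or stores the value keeps compiling.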
Check compiler's characteristics
Your compiler may not be well suited to comparing build outputs if it generates different outputs for the same logic, as mentioned above. In this case, you may have to give up comparing or disable optimization; most compilers generate the same outputs when the code is not logically changed and optimization is disabled.
Comparing
Even if every part of the environment looks the same, it's very rare that two build outputs are the same at the binary level. For example, tar stores the last modification time of each file in its header (see Wikipedia). And in the case of JSON, the following two are semantically the same but different at the binary level.
{"a": 1,"b": 2}
{"b": 2,"a": 1}
So, for CmpBO, tools for semantic comparison are needed for each file type. For example, eu-elfcmp may be used to compare ELF files.
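For the JSON and tar cases mentioned above, a semantic comparison can be sketched in a few lines. This is an illustration under simplifying assumptions, not the tooling used in the project: same_json, tar_summary and same_tar are hypothetical names, and the tar comparison looks only at member type, mode, size and content (link targets and ownership are ignored).

```python
import io
import json
import tarfile


def same_json(a, b):
    """Compare two JSON documents by parsed value, ignoring key order and whitespace."""
    return json.loads(a) == json.loads(b)


def tar_summary(data):
    """Map member name -> (type, mode, size, content), ignoring mtime and member order."""
    summary = {}
    with tarfile.open(fileobj=io.BytesIO(data)) as tar:
        for member in tar.getmembers():
            content = tar.extractfile(member).read() if member.isfile() else None
            summary[member.name] = (member.type, member.mode, member.size, content)
    return summary


def same_tar(a, b):
    """Compare two tar archives given as bytes by their summaries."""
    return tar_summary(a) == tar_summary(b)


print(same_json('{"a": 1,"b": 2}', '{"b": 2,"a": 1}'))
```

A real comparison tool would dispatch on file type and recurse, for example comparing the members of an archive with the comparator that fits each member.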
Comments
If you are lucky enough, you can do this with very little effort and without anything above, as follows.
- Check out the base source code.
- Build it and record the timestamps of the outputs.
- Check out the changes committed after the base.
- Build again incrementally.
- Find the outputs whose timestamps have changed.
But this works only with a well-structured and well-defined build system. And the cooperation of the teams in charge of the modules taking part in the build is essential.
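The timestamp-based steps listed above could be sketched as follows. This is a minimal illustration that assumes the incremental build rewrites only the outputs that really changed; snapshot_mtimes and changed_outputs are hypothetical helper names.

```python
import os


def snapshot_mtimes(root):
    """Map each file under root (relative path) to its modification time in ns."""
    result = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            result[os.path.relpath(path, root)] = os.stat(path).st_mtime_ns
    return result


def changed_outputs(before, after):
    """Return outputs that are new or whose timestamp differs between snapshots."""
    return sorted(p for p, mtime in after.items() if before.get(p) != mtime)
```

Take one snapshot after the base build and another after the incremental build, then pass both to changed_outputs to get the candidate list of changed binaries.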
Consequence
In my project, I successfully built a system that can tell which binaries are semantically changed between two different versions of the source code. I hope these hacks and workarounds are helpful to you. Enjoy!