Comparing Floats: How To Determine if Floating Quantities Are Close Enough Once a Tolerance Has Been Reached
UNLIKE INTEGERS, which are either different or equal, floating point numbers may be different, equal, or "close enough." This is a down-to-earth practitioner's dis-cussion on how to determine, in a robust manner, whether or not floating quantities are close enough, once a tolerance (the meaning of "close enough") has been assigned. The goal is a comparison procedure that is correct more often than not and reasonably efficient. Floating point jargon is taken from C/C++, and float means a 32-bit IEEE float. All reasoning can be extended to double precision and other floating point quantities.
While the inequality of floating point numbers is easily established using operators > or <, "close enough," or fuzzy equality requires more work. Consider the following numbers:
float a1 = 0.123456
float b1 = 0.123457
equal? Strictly speaking, a1==b1
will return false, but are a1
close enough to be considered equal? This decision requires some knowledge of the problem domain, usually represented by a tolerance. Let's call it feps
ilon. The tolerance is related to how many significant digits must match so that two numbers are close enough to be, for practical purposes, equal.
Since IEEE floats have at least six significant digits in decimal notation, this discussion will arbitrarily allow losing approximately one digit to error and will require five matching digits for equality by establishing a tolerance like
float feps = 0.00001
will return true, indicating that a1
are equal, as will feps>(b1-a1)
. The "close enough" equality test has become an inequality test against a difference. However, if you consider:
float a2 = 0.123456
float b2 = -0.123457
will return feps>0.246913
, which is false, while feps>(b2-a2)
will return feps> -0.246913
, which is incorrectly trueoops! Equality (and inequality as well) is symmetric: if a!=b
is true, then b!=a
must also be true. Fix: feps>fabsf(a2-b2)
will return false and so will feps>fabsf(b2-a2)
, at the cost of one fabsf
(or your favorite work-alike) per comparison.
Turning a comparison into an inequality against a difference, however, has shortcomings. Let:
float big1 = 1.23456e28
float big2 = 1.23457e28
equal? Their difference is a quantity of the order of 1.0e23
, or, in U.S. dollars, much larger than the net worth of Bill Gates. If both are the result of computations involving thousands of floating point operations, however, big1
are most likely the same number. Yet feps>fabsf(big1-big2)
will return false. How about setting feps=2.0e23
? Just kidding. There's morelet:
float small1 = 1.23456e-10
float small2 = 3.45678e-11
are truly different, and generously bigger than the smallest IEEE float larger than zero. However, feps>fabsf(small1-small2)
returns true so that small1
incorrectly appear to be equal. The same error happens if:
float small3 = 0.000004
float small4 = -0.000004
are used instead of small1
. Of course, you can always change feps to a suitably smaller value. Or can you? Let:
feps = 1.0e-10
is false, so they're different. But feps>fabsf(a1-b1)
is also false, sigh. A feps this small tests for too many significant digits (more than a float can afford). Consequently, it is too often as persnickety as the ==
operator. And feps>fabsf(small1-small2)
is still incorrectly true; the same happens with feps>fabsf(big1-big2)
, which is still (incorrectly) false.
Here is the point: a tolerance tested for inequality against a difference is a practical way to test "close enough" equality to zero. This method cannot account for "close enough" equality of two float numbers over their entire dynamic range.
To test fuzzy equality over the possible range of two float numbers, the ratio (vs. the absolute difference) of the numbers must be "close enough" to unity (vs. smaller than feps). Taking a ratio for a comparison is perhaps expensive in some cases, but is better than returning an incorrect result.
The value of fabsf(a1/b1) is ~0.999992. That of fabsf(b1/a1) is ~1.000008; 1+feps is 1.00001, and 1-feps is 0.99999. A seemingly correct ratio-test predicate such as:
(((1-feps) < fabsf(n/d)) && (fabsf(n/d) < (1+feps)))
=denominator is true for n<d
indifferently if n
is "close enough" to d
The value of fabsf(big1/big2) is the same as fabsf(a1/b1). The value of fabsf(small1/small2) is 3.57146, that of fabsf(small2/small1) is 0.280001; the ratio-test predicate is correctly false for both, indicating that small1 and small2 are different.
Here is the catch: the value of fabsf(a2/b2) is ~1.000008, that of fabsf(b2/a2) is ~0.999992 and the value of fabsf(small3/small4) is 1.0: in all three cases the seemingly correct ratio-test predicate is incorrectly true. But wait: fabsf is a carryover from the difference-test predicate. In fact, the ratio-test must not use fabsf, so the ratio must be signed: (-)/(-) or (+)/(+) is positive, hence potentially close to positive 1.0 and capable of scoring true, but (-)/(+) or (+)/(-) is negative and automatically disqualified from scoring true, as it should be (there are exceptions as noted later on).
The correct predicate for the ratio-test is then:
(((1-feps) < (n/d)) && ((n/d) < (1+feps)))
Now the value of (small3/small4)
is -1.0 and false, which is correct. The same is true for a2
. But don't breathe a sigh of relief just yet! Now, let:
float a3 = 1.0e36
float b3 = 1.0e-4
In this case, n/d
can be either 1.0e40
(oops: IEEE float overflow), or 1.0e-40
(oops: IEEE float underflow). Perhaps 1.0e40
will be interpreted as 0.0F
, and perhaps not. Almost certainly 1.0e40
will cause indigestion to your code. Here is the fix: before taking the ratio, check for overflow or underflow. Here is how.
The predicate ((d<1.0)&&(n>d*maxFloat)) is true (and safe, at least in C/C++) if n/d would overflow; if true, obviously the equality is false. The part (d<1.0) is needed because d*maxFloat may otherwise cause overflow.
The predicate ((d>1.0)&&(n<d*minFloat)) is true (and safe) if n/d would underflow. If d>1.0, then d*minFloat will not underflow. So long to floats.
Doubles, long doubles, and whatever else the future may provide require simple mechanical adjustments that a C++ compiler can be trained to take care of automatically. What about mixed-mode equality tests, like (double==float)? Your compiler will let you do it, as it will all sorts of other nasty things. Consider this admittedly contrived example:
#define PI 3.1415926535897932
double dpi = PI
float fpi = PI
double deps = 8.742278E-8
is false, but deps>fabsf(dpi-fpi)
is true. Is fpi
close enough to dpi
, or not? How can this be decided? In case it matters, who's looking forward to debugging this? Or even worse, would you ever know it happened? A C++ compiler can be trained to complain when attempting mixed-mode testing. Let's make it whine, and let's give heed to its advice.
Listing 1 gives the code to make it all happen. The ANSI standard class numeric_limits<T> provides basic machinery; the class nTol adds typed tolerances for floating point types using specialization. Exact types are dismissed early and nTol<type> needs no specialization for them. If specialization for a type is needed, but missing, the code will abort.
The template function testNumEq takes two arguments of the same type, hence mixed-type comparisons will not compile. It will accept integer arguments (just in case), and get rid of them as soon as possible, as none of the "close enough" code is necessary. All the static inline function calls should be optimized away by the compiler, leaving constant values of the correct floating point type in their place.
As listed, the code compiles on Microsoft Visual C++ 6.0 under NT 4.0. Removing the namespace std:: from calls to numeric_limits<> allows the code to compile happily on HP aCC 1.15 under HPUX 10.20. Results are equivalent on both platforms.
What is the runtime cost of all this?
At best, a function call and a comparison are required if a==b is true. When comparing to 0.0, a function call and no more than six comparisons are required. In most other cases, a function call plus one division, one multiplication, and no more than 14 comparisons are necessary.
Last but not least, user beware: all "close enough" or fuzzy evaluations have room for ambiguity. For instance, testNumEq(a, b) will return true for (a=0.0, b=1.0e-20), (a=0.0, b=1.0e-30), (a=0.0, b=1.0e-10), and (a=0.0, b=-1.0e-10), but will return false for (a=1.0e-20, b=1.0e-30), and (a=1.0e-10, a=-1.0e-10). That is to say equality is not transitive when 0.0 is in the picture, but sometimes it makes sense. For instance, if for some problem domains 1.e-6 and -1.e-6 are indeed "close enough" to each other to be the same number, it is important to understand that this truly means each of them is "close enough" to zero, rather than both being equal to each other. The correct test is, therefore:
(testNumEq(1.E-6F, 0.0F) && testNumEq(-1.E-6F, 0.0F))
which is true (depending on your tolerance of course). Indeed, testNumEq(-1.E-6F, 1.E-6F)
is false, as it should be.