Float Rounding Error
by David Goldberg, published in the March 1991 issue of Computing Surveys. Copyright 1991, Association for Computing Machinery, Inc., reprinted by permission.

Abstract

Floating-point arithmetic is considered an esoteric subject by many people. This is rather surprising because floating-point is ubiquitous
in computer systems. Almost every language has a floating-point datatype; computers from PCs to supercomputers have floating-point accelerators; most compilers will be called upon to
compile floating-point algorithms from time to time; and virtually every operating system must respond to floating-point exceptions such as overflow. This paper presents a tutorial on those aspects of floating-point that have a direct impact on designers of computer systems. It begins with background on floating-point representation and rounding error, continues with a discussion of the IEEE floating-point standard, and concludes with numerous examples of how computer builders can better support floating-point.

Categories and Subject Descriptors: (Primary) C.0 [Computer Systems Organization]: General -- instruction set design; D.3.4 [Programming Languages]: Processors -- compilers, optimization; G.1.0 [Numerical Analysis]: General -- computer arithmetic, error analysis, numerical algorithms. (Secondary) D.2.1 [Software Engineering]: Requirements/Specifications -- languages; D.3.4 [Programming Languages]: Formal Definitions and Theory -- semantics; D.4.1 [Operating Systems]: Process Management -- synchronization.

General Terms: Algorithms, Design, Languages

Additional Key Words and Phrases: Denormalized number, exception, floating-point, floating-point standard, gradual underflow, guard digit, NaN, overflow, relative error, rounding error, rounding mode, ulp, underflow.

Introduction

Builders of computer systems often need information about floating-point arithmetic. There are, however, remarkably few sources of detailed information about it. One of the few books on the subject, Floating-Point Computation by Pat Sterbenz, is long out of print.
Round-Off Error

A round-off error, also called a rounding error, is the difference between the calculated approximation of a number and its exact mathematical value due to rounding. This is a form of quantization error.[3] One of the goals of numerical analysis is to estimate errors in calculations, including round-off error, when using approximation equations and/or algorithms, especially when finitely many digits are used to represent real numbers (which in theory have infinitely many digits).[4] When a sequence of calculations subject to rounding error is made, errors may accumulate, sometimes dominating the calculation. In ill-conditioned problems, significant error may accumulate.[5]

Representation error

The error introduced by attempting to represent a number using a finite string of digits is a form of round-off error called representation error.[6] Here are some examples of representation error in decimal representations:

Notation   Representation               Approximation       Error
1/7        0.142857142857...            0.142857            0.000000142857...
ln 2       0.69314718055994530941...    0.693147            0.00000018055994530941...
log10 2    0.30102999566398119521...    0.3010              0.00002999566398119521...
∛2         1.25992104989487316476...    1.25992             0.00000104989487316476...
√2         1.41421356237309504880...    1.41421             0.00000356237309504880...
e          2.71828182845904523536...    2.718281828459045   0.00000000000000023536...
π          3.14159265358979323846...    3.141592653589793   0.00000000000000023846...

Increasing the number of digits allowed in a representation reduces the magnitude of possible round-off errors, but any representation limited to finitely many digits will still cause some degree of round-off error for uncountably many real numbers. Additional digits used for intermediary steps of a calculation are known as guard digits.[7]

Rounding multiple times can cause error to accumulate.[8] For example, if 9.945309 is rounded to two decimal places (9.95), then rounded again to one decimal place (10.0), the total error is 0.054691. Rounding 9.945309 to one decimal place (9.9) in a single step introduces less error (0.045309). Accumulation of this kind commonly occurs when performing arithmetic operations (see Loss of significance); the sketch after the references below works through this double-rounding example.

See also

Precision (arithmetic), Truncation, Rounding, Loss of significance, Floating point, Kahan summation algorithm, Machine epsilon, Wilkinson's polynomial

References

Butt, Rizwan (2009), Introduction to Numerical Analysis Using MATLAB.
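The double-rounding example above can be checked directly. The following is a minimal sketch, assuming Python (the language used in the next section); it uses the standard decimal module rather than float round(), because 9.95 is not exactly representable in binary and the stored float, which is slightly below 9.95, would round down rather than up:

    from decimal import Decimal, ROUND_HALF_UP

    x = Decimal("9.945309")

    # Round twice: first to two decimal places, then to one.
    step1 = x.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)     # 9.95
    twice = step1.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)  # 10.0

    # Round once, directly to one decimal place.
    once = x.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)       # 9.9

    print(twice, abs(twice - x))  # 10.0 0.054691
    print(once, abs(once - x))    # 9.9  0.045309

Rounding in a single step keeps the error below half of the final rounding increment (0.05); rounding in two steps can push it above that bound.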
Floating-Point in Python

Floating-point numbers are represented in computer hardware as base 2 (binary) fractions. For example, the decimal fraction 0.125 has value 1/10 + 2/100 + 5/1000, and in the same way the binary fraction 0.001 has value 0/2 + 0/4 + 1/8. These two fractions have identical values, the only real difference being that the first is written in base 10 fractional notation, and the second in base 2.

Unfortunately, most decimal fractions cannot be represented exactly as binary fractions. A consequence is that, in general, the decimal floating-point numbers you enter are only approximated by the binary floating-point numbers actually stored in the machine.

The problem is easier to understand at first in base 10. Consider the fraction 1/3. You can approximate it as a base 10 fraction: 0.3, or, better, 0.33, or, better, 0.333, and so on. No matter how many digits you're willing to write down, the result will never be exactly 1/3, but will be an increasingly better approximation of 1/3.

In the same way, no matter how many base 2 digits you're willing to use, the decimal value 0.1 cannot be represented exactly as a base 2 fraction. In base 2, 1/10 is the infinitely repeating fraction

0.0001100110011001100110011001100110011001100110011...

Stop at any finite number of bits, and you get an approximation. On a typical machine running Python, there are 53 bits of precision available for a Python float, so the value stored internally when you enter the decimal number 0.1 is the binary fraction

0.00011001100110011001100110011001100110011001100110011010

which is close to, but not exactly equal to, 1/10.

It's easy to forget that the stored value is an approximation to the original decimal fraction, because of the way floats are displayed at the interpreter prompt: Python only prints a decimal approximation to the true decimal value of the binary approximation stored by the machine.
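A short sketch using only the standard library makes the stored approximation visible; the printed values are what CPython produces for IEEE 754 double precision:

    from decimal import Decimal
    from fractions import Fraction

    # The exact decimal value of the binary float nearest to 0.1.
    print(Decimal(0.1))
    # 0.1000000000000000055511151231257827021181583404541015625

    # The same stored value as an exact ratio of integers.
    print(Fraction(0.1))  # 3602879701896397/36028797018963968

    # Representation error surfaces in ordinary arithmetic:
    print(0.1 + 0.2)         # 0.30000000000000004
    print(0.1 + 0.2 == 0.3)  # False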
Number Representation

A digital computer can represent exactly integers in a certain range, depending on the word size used for integers. Certain floating-point numbers may also be represented exactly, depending on the representation scheme in use on the computer in question and the word size used for floating-point numbers. Certain floating-point numbers cannot be represented exactly, regardless of the word size used.

Errors due to rounding have long been the bane of analysts trying to solve equations and systems. Such errors may be introduced in many ways, for instance:

- inexact representation of a constant
- integer overflow resulting from a calculation with a result too large for the word size
- overflow resulting from a calculation with a result too large for the number of bits used to represent the mantissa of a floating-point number
- accumulated error resulting from repeated use of numbers stored inexactly

Integer Representation

A digital computer can represent exactly the integers in the range 0..2^n - 1, where n is the number of bits used to represent an integer (typically 16 or 32; 2^16 - 1 = 65,535 and 2^32 - 1 = 4,294,967,295). Usually, half of these bit patterns are used to represent negative numbers, so the effective range is -2^(n-1)..2^(n-1) - 1 (-32,768..32,767 for 16 bits; -2,147,483,648..2,147,483,647 for 32 bits). To accomplish this, "two's complement" representation is typically used, so that a negative number k is represented by adding a "bias term" of 2^n to get k + 2^n. For instance, if n = 4 bits (a word size too small to be practical, but useful for illustration), the numbers -8..7 may be represented by adding a bias term of 2^4 = 16, so that the negative numbers -8..-1 are represented as 8..15 (see below).

Four bits:
  0 0000    4 0100     8 1000    12 1100
  1 0001    5 0101     9 1001    13 1101
  2 0010    6 0110    10 1010    14 1110
  3 0011    7 0111    11 1011    15 1111

Two's complement:
  0 0000    4 0100    -8 1000    -4 1100
  1 0001    5 0101    -7 1001    -3 1101
  2 0010    6 0110    -6 1010    -2 1110
  3 0011    7 0111    -5 1011    -1 1111

Overflow

Overflow occurs when an arithmetic calculation results in an integer too large for the word size. For instance, with n = 4 bits, the result of adding 6 + 7 is 13, which exceeds the maximum positive integer (7); the resulting bit pattern, 1101, is interpreted as -3. In a more realistic example, adding 20,000 + 20,000 produces a result too large for 16-bit integers (it is interpreted as -25,536). The sketch at the end of this section reproduces both wraparounds.

Floating-Point Representation

The exact way floating-point numbers are represented varies between computing platforms, although the same basic ideas apply in general. The general representation scheme used for floating-point numbers is based on the notion of scientific notation: a significand scaled by a power of the base.
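The following is a minimal Python sketch (the helper names are illustrative, not from the text) that reproduces the two's-complement table and the two overflow examples, then uses math.frexp to show the significand/exponent split behind scientific-notation-style floating-point formats:

    import math

    def to_twos_complement(k, n=4):
        # Bit pattern of integer k in n-bit two's complement.
        return format(k & ((1 << n) - 1), "0{}b".format(n))

    def from_twos_complement(bits):
        # Interpret a bit string as a two's-complement integer.
        value = int(bits, 2)
        return value - (1 << len(bits)) if bits[0] == "1" else value

    # The 4-bit table: -8..-1 land on the patterns for 8..15.
    for k in range(-8, 8):
        print(k, to_twos_complement(k))

    # Overflow: 6 + 7 = 13 wraps to -3 in 4 bits,
    # and 20000 + 20000 wraps to -25536 in 16 bits.
    print(from_twos_complement(to_twos_complement(6 + 7)))      # -3
    print(from_twos_complement(to_twos_complement(40000, 16)))  # -25536

    # A float is stored scientific-notation style: significand * 2**exponent.
    print(math.frexp(0.1))  # (0.8, -3), i.e. 0.1 == 0.8 * 2**-3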