
 Long double format
 ==================

   Each long double is made up of two IEEE doubles.  The value of the
 long double is the sum of the values of the two parts (except for
 -0.0).  The most significant part is required to be the value of the
 long double rounded to the nearest double, as specified by IEEE.  For
 Inf values, the least significant part is required to be one of +0.0
 or -0.0.  No other requirements are made; so, for example, 1.0 may be
 represented as (1.0, +0.0) or (1.0, -0.0), and the low part of a NaN
 is don't-care.
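
 The rules above can be sketched in C.  The struct name 'dd', its field
 names and the helper 'dd_is_valid' are hypothetical names chosen for
 this illustration; they are not part of libgcc.  The check assumes the
 default round-to-nearest mode and that 'double' arithmetic is carried
 out without excess precision.

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical view of one long double as its two IEEE double parts. */
struct dd {
    double hi;  /* most significant part */
    double lo;  /* least significant part */
};

/* Returns true if (hi, lo) satisfies the requirements stated above. */
static bool dd_is_valid(struct dd x)
{
    if (isnan(x.hi))
        return true;              /* low part of a NaN is don't-care */
    if (isinf(x.hi))
        return x.lo == 0.0;       /* Inf requires a +0.0 or -0.0 low part */
    if (isnan(x.lo) || isinf(x.lo))
        return false;             /* a finite high part needs a finite low part */

    /* The high part must equal the sum rounded to the nearest double.
       'volatile' forces the sum to be rounded to double precision. */
    volatile double sum = x.hi + x.lo;
    return sum == x.hi;
}
```

 Under these assumptions, (1.0, +0.0), (1.0, -0.0) and (1.0, 2^-60) all
 pass the check, while (1.0, 1.0) does not, because its sum rounds to
 2.0 rather than to the high part.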

 Classification
 --------------

 A long double can represent any value of the form
   s * 2^e * sum(k=0...105: f_k * 2^(-k))
 where 's' is +1 or -1, 'e' is between -968 and 1022 inclusive, f_0 is
 1, and f_k for k>0 is 0 or 1.  These are the 'normal' long doubles.

 A long double can also represent any value of the form
   s * 2^-968 * sum(k=0...105: f_k * 2^(-k))
 where 's' is +1 or -1, f_0 is 0, and f_k for k>0 is 0 or 1.  These are
 the 'subnormal' long doubles.

 There are four long doubles that represent zero, two that represent
 +0.0 and two that represent -0.0.  The sign of the high part is the
 sign of the long double, and the sign of the low part is ignored.

 Likewise, there are four long doubles that represent infinities, two
 for +Inf and two for -Inf.

 Each NaN, quiet or signalling, that can be represented as a 'double'
 can be represented as a 'long double'.  In fact, there are 2^64
 equivalent representations for each one.
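
 For zeros, infinities and NaNs the class is decided by the high part
 alone, which the following sketch illustrates.  The names 'dd',
 'dd_class', 'dd_classify' and 'dd_signbit' are hypothetical, and the
 input is assumed to be a valid long double in the sense described
 above.

```c
#include <math.h>

/* Hypothetical view of one long double as its two IEEE double parts. */
struct dd {
    double hi;
    double lo;
};

enum dd_class { DD_NAN, DD_INF, DD_ZERO, DD_FINITE_NONZERO };

/* Classify a valid long double by inspecting only its high part. */
static enum dd_class dd_classify(struct dd x)
{
    if (isnan(x.hi))
        return DD_NAN;            /* any low part is accepted */
    if (isinf(x.hi))
        return DD_INF;            /* low part is +0.0 or -0.0 */
    if (x.hi == 0.0)
        return DD_ZERO;           /* matches both +0.0 and -0.0 */
    return DD_FINITE_NONZERO;     /* normal or denormal */
}

/* The sign of the long double is the sign of the high part;
   the sign of the low part is ignored. */
static int dd_signbit(struct dd x)
{
    return signbit(x.hi) != 0;
}
```

 With this sketch, (-0.0, +0.0) and (-0.0, -0.0) both classify as zero
 and both report a negative sign.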

 There are certain other valid long doubles where both parts are
 nonzero but the low part represents a value which has a bit set below
 2^(e-105).  These, together with the subnormal long doubles, make up
 the denormal long doubles.

 Many possible long double bit patterns are not valid long doubles.
 These do not represent any value.

 Limits
 ------

 The maximum representable long double is 2^1024-2^918.  The smallest
 *normal* positive long double is 2^-968.  The smallest denormalised
 positive long double is 2^-1074 (this is the same as for 'double').

 Conversions
 -----------

 A double can be converted to a long double by adding a zero low part.

 A long double can be converted to a double by removing the low part.
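
 Both conversions are short because the high part is already the value
 rounded to the nearest double.  A minimal sketch, using the
 hypothetical names 'dd', 'dd_from_double' and 'dd_to_double':

```c
/* Hypothetical view of one long double as its two IEEE double parts. */
struct dd {
    double hi;
    double lo;
};

/* double -> long double: add a zero low part. */
static struct dd dd_from_double(double d)
{
    struct dd r = { d, 0.0 };
    return r;
}

/* long double -> double: remove the low part. */
static double dd_to_double(struct dd x)
{
    return x.hi;
}
```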

 Comparisons
 -----------

 Two long doubles can be compared by comparing the high parts, and if
 those compare equal, comparing the low parts.
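
 The two-step comparison can be sketched as a pair of predicates.  The
 names 'dd', 'dd_less' and 'dd_equal' are hypothetical, and both inputs
 are assumed to be valid long doubles.  Ordinary IEEE comparison of the
 parts means that a NaN compares unordered and that +0.0 and -0.0
 compare equal.

```c
#include <stdbool.h>

/* Hypothetical view of one long double as its two IEEE double parts. */
struct dd {
    double hi;
    double lo;
};

/* x < y: decided by the high parts unless they compare equal. */
static bool dd_less(struct dd x, struct dd y)
{
    if (x.hi < y.hi)
        return true;
    return x.hi == y.hi && x.lo < y.lo;
}

/* x == y: both parts must compare equal. */
static bool dd_equal(struct dd x, struct dd y)
{
    return x.hi == y.hi && x.lo == y.lo;
}
```

 For example, (1.0, +0.0) and (1.0, -0.0) compare equal, and (1.0, 0.0)
 compares less than (1.0, 2^-60).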

 Arithmetic
 ----------

 The unary negate operation negates both the high and low parts.

 An absolute or absolute-negate operation must be done by comparing
 against zero and negating if necessary.
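
 Clearing the sign bit of the high part alone is not enough for an
 absolute value, because the low part may have the opposite sign and
 must be flipped together with the high part.  A minimal sketch, with
 the hypothetical names 'dd', 'dd_neg' and 'dd_abs':

```c
/* Hypothetical view of one long double as its two IEEE double parts. */
struct dd {
    double hi;
    double lo;
};

/* Negate: flip the sign of both parts. */
static struct dd dd_neg(struct dd x)
{
    struct dd r = { -x.hi, -x.lo };
    return r;
}

/* Absolute value: compare against zero and negate if necessary.
   With this plain comparison a -0.0 input is returned unchanged;
   testing the sign bit of the high part instead would be one way
   to handle that case. */
static struct dd dd_abs(struct dd x)
{
    return (x.hi < 0.0) ? dd_neg(x) : x;
}
```

 For example, the absolute value of (-1.0, 2^-60) is (1.0, -2^-60),
 not (1.0, 2^-60).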

 Addition and subtraction are performed using library routines.  They
 are not at present performed perfectly accurately: the result produced
 will be within 1ulp of the range generated by adding or subtracting
 1ulp from the input values, where a 'ulp' is 2^(e-106) given the
 exponent 'e'.  In the presence of cancellation, the result may be
 arbitrarily inaccurate.  Subtraction is done by negation and addition.
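
 To give a feel for how such a routine can work, here is a sketch of
 one well-known way to add two values held as pairs of doubles, based
 on the error-free transformation commonly called TwoSum.  It is not
 the libgcc routine and makes no claim to match the error bound given
 above.  The names 'dd', 'two_sum', 'dd_add' and 'dd_sub' are
 hypothetical.  The sketch assumes round-to-nearest, no excess
 precision, no value-changing optimisations such as -ffast-math, and
 finite inputs whose sum does not overflow.

```c
/* Hypothetical view of one long double as its two IEEE double parts. */
struct dd {
    double hi;
    double lo;
};

/* TwoSum: returns s = fl(a + b) in .hi and the rounding error in .lo,
   so that a + b == s + error exactly (barring overflow). */
static struct dd two_sum(double a, double b)
{
    double s = a + b;
    double bv = s - a;                       /* part of b absorbed in s */
    double err = (a - (s - bv)) + (b - bv);  /* what rounding discarded */
    struct dd r = { s, err };
    return r;
}

/* Add two pairs: sum the high parts exactly, fold in the low parts,
   then renormalise so the high part carries the rounded sum. */
static struct dd dd_add(struct dd x, struct dd y)
{
    struct dd s = two_sum(x.hi, y.hi);
    double e = s.lo + (x.lo + y.lo);
    double hi = s.hi + e;
    double lo = e - (hi - s.hi);
    struct dd r = { hi, lo };
    return r;
}

/* Subtraction by negation and addition, as described above. */
static struct dd dd_sub(struct dd x, struct dd y)
{
    struct dd ny = { -y.hi, -y.lo };
    return dd_add(x, ny);
}
```

 For example, adding (1.0, 0.0) and (2^-60, 0.0) yields (1.0, 2^-60): a
 sum that a single double cannot hold is kept exactly in the pair.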

 Multiplication is also performed using a library routine.  Its result
 will be within 2ulp of the correct result.

 Division is also performed using a library routine.  Its result will
 be within 3ulp of the correct result.


 Copyright (C) 2004-2025 Free Software Foundation, Inc.

 Copying and distribution of this file, with or without modification,
 are permitted in any medium without royalty provided the copyright
 notice and this notice are preserved.