Class FP16
The FP16 class is a wrapper and a utility class to manipulate half-precision 16-bit IEEE 754 floating point data types (also called fp16 or binary16). A half-precision float can be created from or converted to single-precision floats, and is stored in a short data type.
The IEEE 754 standard specifies an fp16 as having the following format:
- Sign bit: 1 bit
- Exponent width: 5 bits
- Significand: 10 bits
The format is laid out as follows:
1   11111   1111111111
^   --^--   -----^----
sign  |          |_______ significand
      |
      -- exponent
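The layout above can be unpacked with shifts and masks. The sketch below is a standalone illustration, not the class's own code: the class name `Fp16Layout` and its three helpers are hypothetical, and the shift and mask values are derived from the 1/5/10 bit split rather than read from the FP16 constants.

```java
// Hypothetical helper, not part of FP16: splits a half-precision bit pattern
// into its three fields using the 1/5/10 layout described above.
class Fp16Layout {
    static int sign(short h) {
        return (h >>> 15) & 0x1;     // top bit
    }

    static int exponent(short h) {
        return (h >>> 10) & 0x1f;    // next 5 bits, still biased
    }

    static int significand(short h) {
        return h & 0x3ff;            // low 10 bits
    }

    public static void main(String[] args) {
        short h = (short) 0xC000;    // bit pattern 1 10000 0000000000
        // prints "1 16 0"
        System.out.println(sign(h) + " " + exponent(h) + " " + significand(h));
    }
}
```

The masks are applied after the shift so that the sign extension Java performs when widening a `short` to `int` does not leak into the result.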
Half-precision floating points can be useful to save memory and/or bandwidth at the expense of range and precision when compared to single-precision floating points (fp32).
To help you decide whether fp16 is the right storage type for your needs, refer to the table below, which shows the available precision throughout the range of possible values. The precision column indicates the step size between two consecutive numbers in a specific part of the range.
Range start | Precision |
---|---|
0 | 1 ⁄ 16,777,216 |
1 ⁄ 16,384 | 1 ⁄ 16,777,216 |
1 ⁄ 8,192 | 1 ⁄ 8,388,608 |
1 ⁄ 4,096 | 1 ⁄ 4,194,304 |
1 ⁄ 2,048 | 1 ⁄ 2,097,152 |
1 ⁄ 1,024 | 1 ⁄ 1,048,576 |
1 ⁄ 512 | 1 ⁄ 524,288 |
1 ⁄ 256 | 1 ⁄ 262,144 |
1 ⁄ 128 | 1 ⁄ 131,072 |
1 ⁄ 64 | 1 ⁄ 65,536 |
1 ⁄ 32 | 1 ⁄ 32,768 |
1 ⁄ 16 | 1 ⁄ 16,384 |
1 ⁄ 8 | 1 ⁄ 8,192 |
1 ⁄ 4 | 1 ⁄ 4,096 |
1 ⁄ 2 | 1 ⁄ 2,048 |
1 | 1 ⁄ 1,024 |
2 | 1 ⁄ 512 |
4 | 1 ⁄ 256 |
8 | 1 ⁄ 128 |
16 | 1 ⁄ 64 |
32 | 1 ⁄ 32 |
64 | 1 ⁄ 16 |
128 | 1 ⁄ 8 |
256 | 1 ⁄ 4 |
512 | 1 ⁄ 2 |
1,024 | 1 |
2,048 | 2 |
4,096 | 4 |
8,192 | 8 |
16,384 | 16 |
32,768 | 32 |
This table shows that numbers higher than 1024 lose all fractional precision.
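The precision column follows directly from the 10-bit significand: within a power-of-two interval starting at 2^e, consecutive values differ by 2^(e - 10), and everything below 1/16,384 shares the subnormal step. A minimal sketch of that rule, assuming the value is supplied as a `float` within the finite fp16 range; the class and method names are hypothetical:

```java
// Hypothetical helper, not part of FP16: reproduces the "Precision" column
// of the table for a value given as a float.
class Fp16Precision {
    // Assumes |x| is within the finite fp16 range (at most 65,504).
    static float stepSize(float x) {
        // Unbiased binary exponent of x, clamped to -14 because all values
        // below 1/16,384 share the subnormal step.
        int e = Math.max(Math.getExponent(x), -14);
        // With 10 significand bits, consecutive values differ by 2^(e - 10).
        return (float) Math.scalb(1.0, e - 10);
    }

    public static void main(String[] args) {
        System.out.println(stepSize(1.0f) == 1.0f / 1024.0f);   // row "1"
        System.out.println(stepSize(1024.0f) == 1.0f);          // row "1,024"
    }
}
```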
-
Field Summary
Modifier and Type | Field | Description |
---|---|---|
static final int | BYTES | The number of bytes used to represent a half-precision float value. |
static final short | EPSILON | Epsilon is the difference between 1.0 and the next value representable by a half-precision float. |
static final int | EXPONENT_BIAS | The offset of the exponent from the actual value. |
static final int | EXPONENT_SHIFT | The offset to shift by to obtain the exponent bits. |
static final int | EXPONENT_SIGNIFICAND_MASK | The bitmask to AND with to obtain exponent and significand bits. |
static final short | LOWEST_VALUE | Smallest negative value a half-precision float may have (the finite value closest to negative infinity). |
static final int | MAX_EXPONENT | Maximum exponent a finite half-precision float may have. |
static final short | MAX_VALUE | Maximum positive finite value a half-precision float may have. |
static final int | MIN_EXPONENT | Minimum exponent a normalized half-precision float may have. |
static final short | MIN_NORMAL | Smallest positive normal value a half-precision float may have. |
static final short | MIN_VALUE | Smallest positive non-zero value a half-precision float may have. |
static final short | NaN | A Not-a-Number representation of a half-precision float. |
static final short | NEGATIVE_INFINITY | Negative infinity of type half-precision float. |
static final short | NEGATIVE_ZERO | Negative 0 of type half-precision float. |
static final short | POSITIVE_INFINITY | Positive infinity of type half-precision float. |
static final short | POSITIVE_ZERO | Positive 0 of type half-precision float. |
static final int | SHIFTED_EXPONENT_MASK | The bitmask to AND a number shifted by EXPONENT_SHIFT right, to obtain exponent bits. |
static final int | SIGN_MASK | The bitmask to AND a number with to obtain the sign bit. |
static final int | SIGN_SHIFT | The offset to shift by to obtain the sign bit. |
static final int | SIGNIFICAND_MASK | The bitmask to AND a number with to obtain significand bits. |
static final int | SIZE | The number of bits used to represent a half-precision float value. |
-
Method Summary
Modifier and Type | Method | Description |
---|---|---|
static short | ceil(short h) | Returns the smallest integral half-precision float value (the one closest to negative infinity) that is greater than or equal to the specified half-precision float value. |
static int | compare(short x, short y) | Compares the two specified half-precision float values. |
static boolean | equals(short x, short y) | Returns true if the two half-precision float values are equal. |
static short | floor(short h) | Returns the largest integral half-precision float value (the one closest to positive infinity) that is less than or equal to the specified half-precision float value. |
static boolean | greater(short x, short y) | Returns true if the first half-precision float value is greater (larger toward positive infinity) than the second half-precision float value. |
static boolean | greaterEquals(short x, short y) | Returns true if the first half-precision float value is greater (larger toward positive infinity) than or equal to the second half-precision float value. |
static boolean | isInfinite(short h) | Returns true if the specified half-precision float value represents infinity, false otherwise. |
static boolean | isNaN(short h) | Returns true if the specified half-precision float value represents a Not-a-Number, false otherwise. |
static boolean | isNormalized(short h) | Returns true if the specified half-precision float value is normalized (does not have a subnormal representation). |
static boolean | less(short x, short y) | Returns true if the first half-precision float value is less (smaller toward negative infinity) than the second half-precision float value. |
static boolean | lessEquals(short x, short y) | Returns true if the first half-precision float value is less (smaller toward negative infinity) than or equal to the second half-precision float value. |
static short | max(short x, short y) | Returns the larger of two half-precision float values (the value closest to positive infinity). |
static short | min(short x, short y) | Returns the smaller of two half-precision float values (the value closest to negative infinity). |
static short | rint(short h) | Returns the closest integral half-precision float value to the specified half-precision float value. |
static float | toFloat(short h) | Converts the specified half-precision float value into a single-precision float value. |
static short | toHalf(float f) | Converts the specified single-precision float value into a half-precision float value. |
static String | toHexString(short h) | Returns a hexadecimal string representation of the specified half-precision float value. |
static short | trunc(short h) | Returns the truncated half-precision float value of the specified half-precision float value. |
-
Field Details
-
SIZE
public static final int SIZE
The number of bits used to represent a half-precision float value.
-
BYTES
public static final int BYTES
The number of bytes used to represent a half-precision float value.
-
EPSILON
public static final short EPSILON
Epsilon is the difference between 1.0 and the next value representable by a half-precision float.
-
MAX_EXPONENT
public static final int MAX_EXPONENT
Maximum exponent a finite half-precision float may have.
-
MIN_EXPONENT
public static final int MIN_EXPONENT
Minimum exponent a normalized half-precision float may have.
-
LOWEST_VALUE
public static final short LOWEST_VALUE
Smallest negative value a half-precision float may have (the finite value closest to negative infinity).
-
MAX_VALUE
public static final short MAX_VALUE
Maximum positive finite value a half-precision float may have.
-
MIN_NORMAL
public static final short MIN_NORMAL
Smallest positive normal value a half-precision float may have.
-
MIN_VALUE
public static final short MIN_VALUE
Smallest positive non-zero value a half-precision float may have.
-
NaN
public static final short NaN
A Not-a-Number representation of a half-precision float.
-
NEGATIVE_INFINITY
public static final short NEGATIVE_INFINITY
Negative infinity of type half-precision float.
-
NEGATIVE_ZERO
public static final short NEGATIVE_ZERO
Negative 0 of type half-precision float.
-
POSITIVE_INFINITY
public static final short POSITIVE_INFINITY
Positive infinity of type half-precision float.
-
POSITIVE_ZERO
public static final short POSITIVE_ZERO
Positive 0 of type half-precision float.
-
SIGN_SHIFT
public static final int SIGN_SHIFT
The offset to shift by to obtain the sign bit.
-
EXPONENT_SHIFT
public static final int EXPONENT_SHIFT
The offset to shift by to obtain the exponent bits.
-
SIGN_MASK
public static final int SIGN_MASK
The bitmask to AND a number with to obtain the sign bit.
-
SHIFTED_EXPONENT_MASK
public static final int SHIFTED_EXPONENT_MASK
The bitmask to AND a number shifted by EXPONENT_SHIFT right, to obtain exponent bits.
-
SIGNIFICAND_MASK
public static final int SIGNIFICAND_MASK
The bitmask to AND a number with to obtain significand bits.
-
EXPONENT_SIGNIFICAND_MASK
public static final int EXPONENT_SIGNIFICAND_MASK
The bitmask to AND with to obtain exponent and significand bits.
-
EXPONENT_BIAS
public static final int EXPONENT_BIAS
The offset of the exponent from the actual value.
-
-
Method Details
-
compare
public static int compare(short x, short y)
Compares the two specified half-precision float values. The following conditions apply during the comparison:
- NaN is considered by this method to be equal to itself and greater than all other half-precision float values (including POSITIVE_INFINITY)
- POSITIVE_ZERO is considered by this method to be greater than NEGATIVE_ZERO.
Parameters:
x - The first half-precision float value to compare.
y - The second half-precision float value to compare.
Returns:
The value 0 if x is numerically equal to y, a value less than 0 if x is numerically less than y, and a value greater than 0 if x is numerically greater than y
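The ordering described for compare can be illustrated by mapping each bit pattern to an integer key whose natural order already satisfies both rules. This is a sketch of the documented behavior, not the FP16 source; the class `Fp16Compare` and its `key` helper are hypothetical.

```java
// Hypothetical sketch, not the FP16 source: orders raw fp16 bit patterns
// according to the two rules listed for compare.
class Fp16Compare {
    static boolean isNaN(short h) {
        // All-ones exponent with a non-zero significand.
        return (h & 0x7fff) > 0x7c00;
    }

    // Maps a bit pattern to an int whose natural order matches the rules:
    // negatives below positives, -0 just below +0, NaN above everything.
    static int key(short h) {
        if (isNaN(h)) {
            return Integer.MAX_VALUE;
        }
        int magnitude = h & 0x7fff;
        return (h & 0x8000) != 0 ? -magnitude - 1 : magnitude;
    }

    static int compare(short x, short y) {
        return Integer.compare(key(x), key(y));
    }

    public static void main(String[] args) {
        // +0 ranks above -0, so the result is positive.
        System.out.println(compare((short) 0x0000, (short) 0x8000) > 0);
        // Any two NaN patterns compare as equal.
        System.out.println(compare((short) 0x7e00, (short) 0x7e01) == 0);
    }
}
```

Because every NaN collapses to a single key, this ordering is total, which is what makes a comparator of this shape usable for sorting.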
-
rint
public static short rint(short h)
Returns the closest integral half-precision float value to the specified half-precision float value. Special values are handled in the following ways:
- If the specified half-precision float is NaN, the result is NaN
- If the specified half-precision float is infinity (negative or positive), the result is infinity (with the same sign)
- If the specified half-precision float is zero (negative or positive), the result is zero (with the same sign)
Parameters:
h - A half-precision float value
Returns:
The value of the specified half-precision float rounded to the nearest integral half-precision float value
-
ceil
public static short ceil(short h)
Returns the smallest integral half-precision float value (the one closest to negative infinity) that is greater than or equal to the specified half-precision float value. Special values are handled in the following ways:
- If the specified half-precision float is NaN, the result is NaN
- If the specified half-precision float is infinity (negative or positive), the result is infinity (with the same sign)
- If the specified half-precision float is zero (negative or positive), the result is zero (with the same sign)
Parameters:
h - A half-precision float value
Returns:
The smallest integral half-precision float value greater than or equal to the specified half-precision float value
-
floor
public static short floor(short h)
Returns the largest integral half-precision float value (the one closest to positive infinity) that is less than or equal to the specified half-precision float value. Special values are handled in the following ways:
- If the specified half-precision float is NaN, the result is NaN
- If the specified half-precision float is infinity (negative or positive), the result is infinity (with the same sign)
- If the specified half-precision float is zero (negative or positive), the result is zero (with the same sign)
Parameters:
h - A half-precision float value
Returns:
The largest integral half-precision float value less than or equal to the specified half-precision float value
-
trunc
public static short trunc(short h)
Returns the truncated half-precision float value of the specified half-precision float value. Special values are handled in the following ways:
- If the specified half-precision float is NaN, the result is NaN
- If the specified half-precision float is infinity (negative or positive), the result is infinity (with the same sign)
- If the specified half-precision float is zero (negative or positive), the result is zero (with the same sign)
Parameters:
h - A half-precision float value
Returns:
The truncated half-precision float value of the specified half-precision float value
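Truncation can be done directly on the bit pattern, since the fractional part of a value occupies the low bits of the significand. The following is a minimal sketch of that idea under the 1/5/10 layout, with a hypothetical class name; it is not presented as the library's implementation.

```java
// Hypothetical sketch, not the FP16 source: truncates toward zero by
// clearing the significand bits that encode the fractional part.
class Fp16Trunc {
    static short trunc(short h) {
        int bits = h & 0xffff;
        int abs = bits & 0x7fff;            // drop the sign bit
        if (abs < 0x3c00) {
            // |h| < 1.0: only the sign survives, giving +0 or -0.
            return (short) (bits & 0x8000);
        }
        if (abs < 0x6400) {
            // 1.0 <= |h| < 1024.0: unbiased exponent e is in 0..9, so the
            // lowest (10 - e) significand bits hold the fraction.
            int e = (abs >> 10) - 15;
            int fractionMask = 0x3ff >> e;
            bits &= ~fractionMask;
        }
        // |h| >= 1024.0, infinity and NaN are returned unchanged.
        return (short) bits;
    }

    public static void main(String[] args) {
        // 0x4100 encodes 2.5; prints "4000", the encoding of 2.0.
        System.out.println(Integer.toHexString(trunc((short) 0x4100) & 0xffff));
    }
}
```

The three special cases in the list above fall out of the same two comparisons: zeros take the first branch and keep their sign, while NaN and infinities are above 0x6400 and pass through untouched.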
-
min
public static short min(short x, short y)
Returns the smaller of two half-precision float values (the value closest to negative infinity). Special values are handled in the following ways:
- If either value is NaN, the result is NaN
- NEGATIVE_ZERO is smaller than POSITIVE_ZERO
Parameters:
x - The first half-precision value
y - The second half-precision value
Returns:
The smaller of the two specified half-precision values
-
max
public static short max(short x, short y)
Returns the larger of two half-precision float values (the value closest to positive infinity). Special values are handled in the following ways:
- If either value is NaN, the result is NaN
- POSITIVE_ZERO is greater than NEGATIVE_ZERO
Parameters:
x - The first half-precision value
y - The second half-precision value
Returns:
The larger of the two specified half-precision values
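The special cases for min and max (NaN propagation and the ordering of the two zeros) can be sketched on raw bit patterns as follows. The class name, the `ordered` helper, and the NaN bit pattern 0x7e00 are assumptions made for this illustration; only the behavior is taken from the descriptions above.

```java
// Hypothetical sketch, not the FP16 source: min and max on raw fp16 bits.
class Fp16MinMax {
    static final short NAN = (short) 0x7e00;   // assumed NaN bit pattern

    static boolean isNaN(short h) {
        return (h & 0x7fff) > 0x7c00;
    }

    // Converts sign-magnitude bits to an int that orders like the value.
    static int ordered(short h) {
        return (h & 0x8000) != 0 ? 0x8000 - (h & 0xffff) : h & 0xffff;
    }

    static boolean bothZero(short x, short y) {
        return (x & 0x7fff) == 0 && (y & 0x7fff) == 0;
    }

    static short min(short x, short y) {
        if (isNaN(x) || isNaN(y)) return NAN;
        if (bothZero(x, y)) {
            return (x & 0x8000) != 0 ? x : y;  // prefer the negative zero
        }
        return ordered(x) < ordered(y) ? x : y;
    }

    static short max(short x, short y) {
        if (isNaN(x) || isNaN(y)) return NAN;
        if (bothZero(x, y)) {
            return (x & 0x8000) != 0 ? y : x;  // prefer the positive zero
        }
        return ordered(x) > ordered(y) ? x : y;
    }

    public static void main(String[] args) {
        // min(+0, -0) is -0; prints "8000".
        System.out.println(Integer.toHexString(min((short) 0, (short) 0x8000) & 0xffff));
    }
}
```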
-
less
public static boolean less(short x, short y)
Returns true if the first half-precision float value is less (smaller toward negative infinity) than the second half-precision float value. If either of the values is NaN, the result is false.
Parameters:
x - The first half-precision value
y - The second half-precision value
Returns:
True if x is less than y, false otherwise
-
lessEquals
public static boolean lessEquals(short x, short y)
Returns true if the first half-precision float value is less (smaller toward negative infinity) than or equal to the second half-precision float value. If either of the values is NaN, the result is false.
Parameters:
x - The first half-precision value
y - The second half-precision value
Returns:
True if x is less than or equal to y, false otherwise
-
greater
public static boolean greater(short x, short y)
Returns true if the first half-precision float value is greater (larger toward positive infinity) than the second half-precision float value. If either of the values is NaN, the result is false.
Parameters:
x - The first half-precision value
y - The second half-precision value
Returns:
True if x is greater than y, false otherwise
-
greaterEquals
public static boolean greaterEquals(short x, short y)
Returns true if the first half-precision float value is greater (larger toward positive infinity) than or equal to the second half-precision float value. If either of the values is NaN, the result is false.
Parameters:
x - The first half-precision value
y - The second half-precision value
Returns:
True if x is greater than or equal to y, false otherwise
-
equals
public static boolean equals(short x, short y)
Returns true if the two half-precision float values are equal. If either of the values is NaN, the result is false. POSITIVE_ZERO and NEGATIVE_ZERO are considered equal.
Parameters:
x - The first half-precision value
y - The second half-precision value
Returns:
True if x is equal to y, false otherwise
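Unlike compare, the relational methods follow IEEE 754 semantics: any comparison involving NaN is false, and the two zeros are equal. The sketch below shows one way to get that behavior for less and equals on raw bit patterns; the class name and the `ordered` helper are hypothetical, and the remaining three methods would follow the same pattern.

```java
// Hypothetical sketch, not the FP16 source: IEEE-style less and equals.
class Fp16Relational {
    static boolean isNaN(short h) {
        return (h & 0x7fff) > 0x7c00;
    }

    // Converts sign-magnitude bits to an int that orders like the value;
    // both zeros map to 0, so neither is "less" than the other.
    static int ordered(short h) {
        return (h & 0x8000) != 0 ? 0x8000 - (h & 0xffff) : h & 0xffff;
    }

    static boolean less(short x, short y) {
        if (isNaN(x) || isNaN(y)) return false;
        return ordered(x) < ordered(y);
    }

    static boolean equals(short x, short y) {
        if (isNaN(x) || isNaN(y)) return false;
        // Same bits, or both operands are zeros of either sign.
        return x == y || ((x | y) & 0x7fff) == 0;
    }

    public static void main(String[] args) {
        System.out.println(equals((short) 0x0000, (short) 0x8000)); // true
        System.out.println(equals((short) 0x7e00, (short) 0x7e00)); // false
    }
}
```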
-
isInfinite
public static boolean isInfinite(short h)
Returns true if the specified half-precision float value represents infinity, false otherwise.
Parameters:
h - A half-precision float value
Returns:
True if the value is positive infinity or negative infinity, false otherwise
-
isNaN
public static boolean isNaN(short h)
Returns true if the specified half-precision float value represents a Not-a-Number, false otherwise.
Parameters:
h - A half-precision float value
Returns:
True if the value is a NaN, false otherwise
-
isNormalized
public static boolean isNormalized(short h)
Returns true if the specified half-precision float value is normalized (does not have a subnormal representation). If the specified value is POSITIVE_INFINITY, NEGATIVE_INFINITY, POSITIVE_ZERO, NEGATIVE_ZERO, NaN or any subnormal number, this method returns false.
Parameters:
h - A half-precision float value
Returns:
True if the value is normalized, false otherwise
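All three classification predicates depend only on the exponent and significand fields. A minimal sketch, using mask values derived from the 1/5/10 layout and a hypothetical class name:

```java
// Hypothetical sketch, not the FP16 source: classification by bit fields.
class Fp16Classify {
    static boolean isInfinite(short h) {
        // All-ones exponent, zero significand, either sign.
        return (h & 0x7fff) == 0x7c00;
    }

    static boolean isNaN(short h) {
        // All-ones exponent, non-zero significand.
        return (h & 0x7fff) > 0x7c00;
    }

    static boolean isNormalized(short h) {
        // Exponent field is neither all zeros (zero, subnormal)
        // nor all ones (infinity, NaN).
        int exponentField = h & 0x7c00;
        return exponentField != 0 && exponentField != 0x7c00;
    }

    public static void main(String[] args) {
        System.out.println(isNormalized((short) 0x0400)); // true, smallest normal
        System.out.println(isNormalized((short) 0x0001)); // false, subnormal
    }
}
```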
-
toFloat
public static float toFloat(short h)
Converts the specified half-precision float value into a single-precision float value. The following special cases are handled:
- If the input is NaN, the returned value is Float.NaN
- If the input is POSITIVE_INFINITY or NEGATIVE_INFINITY, the returned value is respectively Float.POSITIVE_INFINITY or Float.NEGATIVE_INFINITY
- If the input is 0 (positive or negative), the returned value is +/-0.0f
- Otherwise, the returned value is a normalized single-precision float value
Parameters:
h - The half-precision float value to convert to single-precision
Returns:
A single-precision float value, normalized unless one of the special cases above applies
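The widening conversion is exact, because every half-precision value fits in a single-precision float. The sketch below walks through the cases listed above on the raw bit pattern; it is an illustration with a hypothetical class name, not the FP16 source.

```java
// Hypothetical sketch, not the FP16 source: widens fp16 bits to a float.
class Fp16ToFloat {
    static float toFloat(short h) {
        int bits = h & 0xffff;
        boolean negative = (bits & 0x8000) != 0;
        int exponent = (bits >>> 10) & 0x1f;
        int significand = bits & 0x3ff;

        float magnitude;
        if (exponent == 0x1f) {
            if (significand != 0) {
                return Float.NaN;
            }
            magnitude = Float.POSITIVE_INFINITY;
        } else if (exponent == 0) {
            // Zero or subnormal: the value is significand * 2^-24, which is
            // either 0 or a normalized float.
            magnitude = significand * (1.0f / 16777216.0f);
        } else {
            // Normal: rebias the exponent from 15 to 127 and widen the
            // significand from 10 to 23 bits.
            int floatBits = ((exponent - 15 + 127) << 23) | (significand << 13);
            magnitude = Float.intBitsToFloat(floatBits);
        }
        return negative ? -magnitude : magnitude;
    }

    public static void main(String[] args) {
        System.out.println(toFloat((short) 0x3c00) == 1.0f);      // true
        System.out.println(toFloat((short) 0x7bff) == 65504.0f);  // true
    }
}
```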
-
toHalf
public static short toHalf(float f)
Converts the specified single-precision float value into a half-precision float value. The following special cases are handled:
- If the input is NaN (see Float.isNaN(float)), the returned value is NaN
- If the input is Float.POSITIVE_INFINITY or Float.NEGATIVE_INFINITY, the returned value is respectively POSITIVE_INFINITY or NEGATIVE_INFINITY
- If the input is 0 (positive or negative), the returned value is POSITIVE_ZERO or NEGATIVE_ZERO
- If the input is less than MIN_VALUE in magnitude, the returned value is flushed to POSITIVE_ZERO or NEGATIVE_ZERO
- If the input is less than MIN_NORMAL in magnitude, the returned value is a subnormal half-precision float
- Otherwise, the returned value is rounded to the nearest representable half-precision float value
Parameters:
f - The single-precision float value to convert to half-precision
Returns:
A half-precision float value
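The narrowing conversion is where precision is lost, so each case in the list above needs its own branch. The sketch below follows that list literally, with several assumptions worth flagging: the class name is hypothetical, NaN is mapped to the bit pattern 0x7e00, ties are broken toward the even significand, and any magnitude below 2^-24 (taken here as MIN_VALUE) is flushed to zero. The actual library may differ in these details.

```java
// Hypothetical sketch, not the FP16 source: narrows a float to fp16 bits.
class Fp16ToHalf {
    static short toHalf(float f) {
        int bits = Float.floatToRawIntBits(f);
        int sign = bits >>> 31;
        int exponent = (bits >>> 23) & 0xff;
        int significand = bits & 0x7fffff;

        int outExponent = 0;
        int outSignificand = 0;

        if (exponent == 0xff) {
            // Infinity or NaN (assumed NaN pattern: quiet bit only).
            outExponent = 0x1f;
            outSignificand = significand != 0 ? 0x200 : 0;
        } else {
            int e = exponent - 127 + 15;     // rebias from 127 to 15
            if (e >= 0x1f) {
                outExponent = 0x1f;          // too large: infinity
            } else if (e < -9) {
                // Magnitude below 2^-24: flush to zero, keeping the sign.
            } else if (e <= 0) {
                // Subnormal result: make the implicit leading 1 explicit,
                // then shift it into the 10-bit significand.
                int m = significand | 0x800000;
                int shift = 14 - e;          // 14..23
                outSignificand = m >> shift;
                int dropped = m & ((1 << shift) - 1);
                int halfway = 1 << (shift - 1);
                if (dropped + (outSignificand & 1) > halfway) {
                    outSignificand++;        // round to nearest, ties to even
                }
            } else {
                outExponent = e;
                outSignificand = significand >> 13;
                if ((significand & 0x1fff) + (outSignificand & 1) > 0x1000) {
                    outSignificand++;        // round to nearest, ties to even
                }
            }
        }
        // Adding (not OR-ing) the significand lets a rounding carry move
        // into the exponent field, e.g. 65520.0f becomes infinity.
        return (short) ((sign << 15) | ((outExponent << 10) + outSignificand));
    }

    public static void main(String[] args) {
        // prints "2e66", the fp16 value nearest to 0.1
        System.out.println(Integer.toHexString(toHalf(0.1f) & 0xffff));
    }
}
```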
-
toHexString
public static String toHexString(short h)
Returns a hexadecimal string representation of the specified half-precision float value. If the value is a NaN, the result is "NaN", otherwise the result follows this format:
- If the sign is positive, no sign character appears in the result
- If the sign is negative, the first character is '-'
- If the value is infinity, the string is "Infinity"
- If the value is 0, the string is "0x0.0p0"
- If the value has a normalized representation, the exponent and significand are represented in the string in two fields. The significand starts with "0x1." followed by its lowercase hexadecimal representation. Trailing zeroes are removed unless all digits are 0, then a single zero is used. The significand representation is followed by the exponent, represented by "p", itself followed by a decimal string of the unbiased exponent
- If the value has a subnormal representation, the significand starts with "0x0." followed by its lowercase hexadecimal representation. Trailing zeroes are removed unless all digits are 0, then a single zero is used. The significand representation is followed by the exponent, represented by "p-14"
Parameters:
h - A half-precision float value
Returns:
A hexadecimal string representation of the specified value
-