icyllis.arc3d.core.FP16

public final class FP16 extends Object

The FP16 class is a wrapper and a utility class to manipulate half-precision 16-bit IEEE 754 floating point data types (also called fp16 or binary16). A half-precision float can be created from or converted to single-precision floats, and is stored in a short data type.

The IEEE 754 standard specifies an fp16 as having the following format:

Sign bit: 1 bit
Exponent width: 5 bits
Significand: 10 bits

The format is laid out as follows:

 1   11111   1111111111
 ^   --^--   -----^----
 sign  |          |_______ significand
       |
       -- exponent
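The layout above can be read back out of a raw `short` with a few shifts and masks. The following is a minimal sketch, not the class's own code: the class name `Fp16FieldsSketch` is hypothetical, and the constant values are assumptions derived from the 1/5/10-bit layout rather than values read from FP16.

```java
// Sketch only: constant values are assumed from the binary16 layout,
// not copied from FP16 itself.
final class Fp16FieldsSketch {
    static final int SIGN_SHIFT = 15;              // sign is the top bit
    static final int EXPONENT_SHIFT = 10;          // exponent sits above 10 significand bits
    static final int SHIFTED_EXPONENT_MASK = 0x1f; // 5 exponent bits
    static final int SIGNIFICAND_MASK = 0x3ff;     // 10 significand bits
    static final int EXPONENT_BIAS = 15;           // standard binary16 bias

    /** Returns 1 if the sign bit is set, 0 otherwise. */
    static int sign(short h) {
        return (h >>> SIGN_SHIFT) & 0x1;
    }

    /** Returns the 5 exponent bits as stored (biased). */
    static int biasedExponent(short h) {
        return (h >>> EXPONENT_SHIFT) & SHIFTED_EXPONENT_MASK;
    }

    /** Returns the exponent with the bias removed (meaningful for normal values). */
    static int unbiasedExponent(short h) {
        return biasedExponent(h) - EXPONENT_BIAS;
    }

    /** Returns the 10 stored significand bits. */
    static int significand(short h) {
        return h & SIGNIFICAND_MASK;
    }
}
```

For example, the bit pattern 0x3c00 (1.0) decodes to sign 0, biased exponent 15 and significand 0.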

Half-precision floating points can be useful to save memory and/or bandwidth at the expense of range and precision when compared to single-precision floating points (fp32).

To help you decide whether fp16 is the right storage type for your needs, please refer to the table below, which shows the available precision throughout the range of possible values. The precision column indicates the step size between two consecutive numbers in a specific part of the range.

Range start	Precision
0	1 ⁄ 16,777,216
1 ⁄ 16,384	1 ⁄ 16,777,216
1 ⁄ 8,192	1 ⁄ 8,388,608
1 ⁄ 4,096	1 ⁄ 4,194,304
1 ⁄ 2,048	1 ⁄ 2,097,152
1 ⁄ 1,024	1 ⁄ 1,048,576
1 ⁄ 512	1 ⁄ 524,288
1 ⁄ 256	1 ⁄ 262,144
1 ⁄ 128	1 ⁄ 131,072
1 ⁄ 64	1 ⁄ 65,536
1 ⁄ 32	1 ⁄ 32,768
1 ⁄ 16	1 ⁄ 16,384
1 ⁄ 8	1 ⁄ 8,192
1 ⁄ 4	1 ⁄ 4,096
1 ⁄ 2	1 ⁄ 2,048
1	1 ⁄ 1,024
2	1 ⁄ 512
4	1 ⁄ 256
8	1 ⁄ 128
16	1 ⁄ 64
32	1 ⁄ 32
64	1 ⁄ 16
128	1 ⁄ 8
256	1 ⁄ 4
512	1 ⁄ 2
1,024	1
2,048	2
4,096	4
8,192	8
16,384	16
32,768	32

This table shows that values of 1,024 and above have no fractional precision: from that point on, the step between consecutive values is at least 1.
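The rows of the table follow a simple rule: within each power-of-two interval the step is the interval start divided by 1,024, and below the smallest normal value (1 ⁄ 16,384) the step stays fixed. A minimal sketch of that rule, using a hypothetical helper `step` that is not part of FP16 and assuming the input is finite and within the fp16 range:

```java
// Sketch only: reproduces the precision table for finite magnitudes
// within the fp16 range (up to 65,504).
final class Fp16StepSketch {
    /** Returns the spacing between consecutive fp16 values near |x|. */
    static double step(double x) {
        double magnitude = Math.abs(x);
        // Below 2^-14 (the smallest normal value) the step no longer shrinks.
        int exponent = Math.max(Math.getExponent(magnitude), -14);
        // 10 significand bits: step = 2^(exponent - 10).
        return Math.scalb(1.0, exponent - 10);
    }
}
```

With this rule, `step(1.0)` is 1 ⁄ 1,024 and `step(1024.0)` is 1, matching the corresponding rows.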

Field Summary

Fields

Modifier and Type

Field

Description

static final int

BYTES

The number of bytes used to represent a half-precision float value.

static final short

EPSILON

Epsilon is the difference between 1.0 and the next value representable by a half-precision floating-point.

static final int

EXPONENT_BIAS

The offset of the exponent from the actual value.

static final int

EXPONENT_SHIFT

The offset to shift by to obtain the exponent bits.

static final int

EXPONENT_SIGNIFICAND_MASK

The bitmask to AND with to obtain exponent and significand bits.

static final short

LOWEST_VALUE

Smallest negative value a half-precision float may have (the finite value closest to negative infinity).

static final int

MAX_EXPONENT

Maximum exponent a finite half-precision float may have.

static final short

MAX_VALUE

Maximum positive finite value a half-precision float may have.

static final int

MIN_EXPONENT

Minimum exponent a normalized half-precision float may have.

static final short

MIN_NORMAL

Smallest positive normal value a half-precision float may have.

static final short

MIN_VALUE

Smallest positive non-zero value a half-precision float may have.

static final short

NaN

A Not-a-Number representation of a half-precision float.

static final short

NEGATIVE_INFINITY

Negative infinity of type half-precision float.

static final short

NEGATIVE_ZERO

Negative 0 of type half-precision float.

static final short

POSITIVE_INFINITY

Positive infinity of type half-precision float.

static final short

POSITIVE_ZERO

Positive 0 of type half-precision float.

static final int

SHIFTED_EXPONENT_MASK

The bitmask to AND a number shifted by EXPONENT_SHIFT right, to obtain exponent bits.

static final int

SIGN_MASK

The bitmask to AND a number with to obtain the sign bit.

static final int

SIGN_SHIFT

The offset to shift by to obtain the sign bit.

static final int

SIGNIFICAND_MASK

The bitmask to AND a number with to obtain significand bits.

static final int

SIZE

The number of bits used to represent a half-precision float value.

Method Summary

Modifier and Type

Method

Description

static short

ceil(short h)

Returns the smallest integral half-precision float value (the one closest to negative infinity) that is greater than or equal to the specified half-precision float value.

static int

compare(short x, short y)

Compares the two specified half-precision float values.

static boolean

equals(short x, short y)

Returns true if the two half-precision float values are equal.

static short

floor(short h)

Returns the largest integral half-precision float value (the one closest to positive infinity) that is less than or equal to the specified half-precision float value.

static boolean

greater(short x, short y)

Returns true if the first half-precision float value is greater (larger toward positive infinity) than the second half-precision float value.

static boolean

greaterEquals(short x, short y)

Returns true if the first half-precision float value is greater (larger toward positive infinity) than or equal to the second half-precision float value.

static boolean

isInfinite(short h)

Returns true if the specified half-precision float value represents infinity, false otherwise.

static boolean

isNaN(short h)

Returns true if the specified half-precision float value represents a Not-a-Number, false otherwise.

static boolean

isNormalized(short h)

Returns true if the specified half-precision float value is normalized (does not have a subnormal representation).

static boolean

less(short x, short y)

Returns true if the first half-precision float value is less (smaller toward negative infinity) than the second half-precision float value.

static boolean

lessEquals(short x, short y)

Returns true if the first half-precision float value is less (smaller toward negative infinity) than or equal to the second half-precision float value.

static short

max(short x, short y)

Returns the larger of two half-precision float values (the value closest to positive infinity).

static short

min(short x, short y)

Returns the smaller of two half-precision float values (the value closest to negative infinity).

static short

rint(short h)

Returns the closest integral half-precision float value to the specified half-precision float value.

static float

toFloat(short h)

Converts the specified half-precision float value into a single-precision float value.

static short

toHalf(float f)

Converts the specified single-precision float value into a half-precision float value.

static String

toHexString(short h)

Returns a hexadecimal string representation of the specified half-precision float value.

static short

trunc(short h)

Returns the truncated half-precision float value of the specified half-precision float value.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Details
- SIZE
  
  public static final int SIZE
  
  The number of bits used to represent a half-precision float value.
  See Also:
  
  Constant Field Values
- BYTES
  
  public static final int BYTES
  
  The number of bytes used to represent a half-precision float value.
  See Also:
  
  Constant Field Values
- EPSILON
  
  public static final short EPSILON
  
  Epsilon is the difference between 1.0 and the next value representable by a half-precision floating-point.
  See Also:
  
  Constant Field Values
- MAX_EXPONENT
  
  public static final int MAX_EXPONENT
  
  Maximum exponent a finite half-precision float may have.
  See Also:
  
  Constant Field Values
- MIN_EXPONENT
  
  public static final int MIN_EXPONENT
  
  Minimum exponent a normalized half-precision float may have.
  See Also:
  
  Constant Field Values
- LOWEST_VALUE
  
  public static final short LOWEST_VALUE
  
  Smallest negative value a half-precision float may have (the finite value closest to negative infinity).
  See Also:
  
  Constant Field Values
- MAX_VALUE
  
  public static final short MAX_VALUE
  
  Maximum positive finite value a half-precision float may have.
  See Also:
  
  Constant Field Values
- MIN_NORMAL
  
  public static final short MIN_NORMAL
  
  Smallest positive normal value a half-precision float may have.
  See Also:
  
  Constant Field Values
- MIN_VALUE
  
  public static final short MIN_VALUE
  
  Smallest positive non-zero value a half-precision float may have.
  See Also:
  
  Constant Field Values
- NaN
  
  public static final short NaN
  
  A Not-a-Number representation of a half-precision float.
  See Also:
  
  Constant Field Values
- NEGATIVE_INFINITY
  
  public static final short NEGATIVE_INFINITY
  
  Negative infinity of type half-precision float.
  See Also:
  
  Constant Field Values
- NEGATIVE_ZERO
  
  public static final short NEGATIVE_ZERO
  
  Negative 0 of type half-precision float.
  See Also:
  
  Constant Field Values
- POSITIVE_INFINITY
  
  public static final short POSITIVE_INFINITY
  
  Positive infinity of type half-precision float.
  See Also:
  
  Constant Field Values
- POSITIVE_ZERO
  
  public static final short POSITIVE_ZERO
  
  Positive 0 of type half-precision float.
  See Also:
  
  Constant Field Values
- SIGN_SHIFT
  
  public static final int SIGN_SHIFT
  
  The offset to shift by to obtain the sign bit.
  See Also:
  
  Constant Field Values
- EXPONENT_SHIFT
  
  public static final int EXPONENT_SHIFT
  
  The offset to shift by to obtain the exponent bits.
  See Also:
  
  Constant Field Values
- SIGN_MASK
  
  public static final int SIGN_MASK
  
  The bitmask to AND a number with to obtain the sign bit.
  See Also:
  
  Constant Field Values
- SHIFTED_EXPONENT_MASK
  
  public static final int SHIFTED_EXPONENT_MASK
  
  The bitmask to AND a number shifted by EXPONENT_SHIFT right, to obtain exponent bits.
  See Also:
  
  Constant Field Values
- SIGNIFICAND_MASK
  
  public static final int SIGNIFICAND_MASK
  
  The bitmask to AND a number with to obtain significand bits.
  See Also:
  
  Constant Field Values
- EXPONENT_SIGNIFICAND_MASK
  
  public static final int EXPONENT_SIGNIFICAND_MASK
  
  The bitmask to AND with to obtain exponent and significand bits.
  See Also:
  
  Constant Field Values
- EXPONENT_BIAS
  
  public static final int EXPONENT_BIAS
  
  The offset of the exponent from the actual value.
  See Also:
  
  Constant Field Values

Method Details
- compare
  
  public static int compare(short x, short y)
  Compares the two specified half-precision float values. The following conditions apply during the comparison:
  
  NaN is considered by this method to be equal to itself and greater than all other half-precision float values (including POSITIVE_INFINITY).
  
  POSITIVE_ZERO is considered by this method to be greater than NEGATIVE_ZERO.
  Parameters:
  
  x - The first half-precision float value to compare.
  
  y - The second half-precision float value to compare
  
  Returns:
  
  The value 0 if x is numerically equal to y, a value less than 0 if x is numerically less than y, and a value greater than 0 if x is numerically greater than y
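The total order described for compare can be sketched by mapping each bit pattern to an integer key. This is an illustration of the documented ordering only, not the method's actual implementation; `Fp16CompareSketch` and `sortKey` are hypothetical names.

```java
// Sketch only: one way to obtain the documented ordering, where NaN is
// equal to itself and above everything, and -0 sorts below +0.
final class Fp16CompareSketch {
    /** Maps a half-precision bit pattern to an int that sorts in the documented order. */
    static int sortKey(short h) {
        int bits = h & 0xffff;
        int magnitude = bits & 0x7fff;
        if (magnitude > 0x7c00) {
            return Integer.MAX_VALUE;      // every NaN collapses to the largest key
        }
        // Negative values sort in reverse magnitude order; -0 maps to -1, +0 to 0.
        return (bits & 0x8000) != 0 ? -magnitude - 1 : magnitude;
    }

    static int compare(short x, short y) {
        return Integer.compare(sortKey(x), sortKey(y));
    }
}
```

Under this sketch, comparing the NaN pattern 0x7e00 with positive infinity (0x7c00) yields a positive result, and comparing 0x0000 with 0x8000 (positive and negative zero) also yields a positive result.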
- rint
  
  public static short rint(short h)
  Returns the closest integral half-precision float value to the specified half-precision float value. Special values are handled in the following ways:
  
  If the specified half-precision float is NaN, the result is NaN
  
  If the specified half-precision float is infinity (negative or positive), the result is infinity (with the same sign)
  
  If the specified half-precision float is zero (negative or positive), the result is zero (with the same sign)
  Parameters:
  
  h - A half-precision float value
  
  Returns:
  
  The value of the specified half-precision float rounded to the nearest integral half-precision float value
- ceil
  
  public static short ceil(short h)
  Returns the smallest integral half-precision float value (the one closest to negative infinity) that is greater than or equal to the specified half-precision float value. Special values are handled in the following ways:
  
  If the specified half-precision float is NaN, the result is NaN
  
  If the specified half-precision float is infinity (negative or positive), the result is infinity (with the same sign)
  
  If the specified half-precision float is zero (negative or positive), the result is zero (with the same sign)
  Parameters:
  
  h - A half-precision float value
  
  Returns:
  
  The smallest integral half-precision float value (the one closest to negative infinity) that is greater than or equal to the specified half-precision float value
- floor
  
  public static short floor(short h)
  Returns the largest integral half-precision float value (the one closest to positive infinity) that is less than or equal to the specified half-precision float value. Special values are handled in the following ways:
  
  If the specified half-precision float is NaN, the result is NaN
  
  If the specified half-precision float is infinity (negative or positive), the result is infinity (with the same sign)
  
  If the specified half-precision float is zero (negative or positive), the result is zero (with the same sign)
  Parameters:
  
  h - A half-precision float value
  
  Returns:
  
  The largest integral half-precision float value (the one closest to positive infinity) that is less than or equal to the specified half-precision float value
- trunc
  
  public static short trunc(short h)
  Returns the truncated half-precision float value of the specified half-precision float value. Special values are handled in the following ways:
  
  If the specified half-precision float is NaN, the result is NaN
  
  If the specified half-precision float is infinity (negative or positive), the result is infinity (with the same sign)
  
  If the specified half-precision float is zero (negative or positive), the result is zero (with the same sign)
  Parameters:
  
  h - A half-precision float value
  
  Returns:
  
  The truncated half-precision float value of the specified half-precision float value
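Truncation can be done directly on the bit pattern by clearing the significand bits that encode the fraction. The sketch below illustrates the documented behavior, including the special cases for NaN, infinity and zero; it is not taken from FP16, and `Fp16TruncSketch` is a hypothetical name.

```java
// Sketch only: truncation toward zero by masking fractional significand bits.
final class Fp16TruncSketch {
    static short trunc(short h) {
        int bits = h & 0xffff;
        int magnitude = bits & 0x7fff;
        if (magnitude < 0x3c00) {
            // |h| < 1.0 (including zeros and subnormals): result is a signed zero.
            return (short) (bits & 0x8000);
        }
        if (magnitude >= 0x6400) {
            // |h| >= 1024 is already integral; infinity and NaN pass through unchanged.
            return h;
        }
        int exponent = (magnitude >>> 10) - 15;  // unbiased exponent, 0..9 here
        int fractionMask = 0x3ff >>> exponent;   // significand bits below the binary point
        return (short) (bits & ~fractionMask);
    }
}
```

For example, 0x3e00 (1.5) truncates to 0x3c00 (1.0), and 0xc100 (-2.5) truncates to 0xc000 (-2.0).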
- min
  
  public static short min(short x, short y)
  Returns the smaller of two half-precision float values (the value closest to negative infinity). Special values are handled in the following ways:
  
  If either value is NaN, the result is NaN
  
  NEGATIVE_ZERO is smaller than POSITIVE_ZERO
  Parameters:
  
  x - The first half-precision value
  
  y - The second half-precision value
  
  Returns:
  
  The smaller of the two specified half-precision values
- max
  
  public static short max(short x, short y)
  Returns the larger of two half-precision float values (the value closest to positive infinity). Special values are handled in the following ways:
  
  If either value is NaN, the result is NaN
  
  POSITIVE_ZERO is greater than NEGATIVE_ZERO
  Parameters:
  
  x - The first half-precision value
  
  y - The second half-precision value
  
  Returns:
  
  The larger of the two specified half-precision values
- less
  
  public static boolean less(short x, short y)
  
  Returns true if the first half-precision float value is less (smaller toward negative infinity) than the second half-precision float value. If either of the values is NaN, the result is false.
  
  Parameters:
  
  x - The first half-precision value
  
  y - The second half-precision value
  
  Returns:
  
  True if x is less than y, false otherwise
- lessEquals
  
  public static boolean lessEquals(short x, short y)
  
  Returns true if the first half-precision float value is less (smaller toward negative infinity) than or equal to the second half-precision float value. If either of the values is NaN, the result is false.
  
  Parameters:
  
  x - The first half-precision value
  
  y - The second half-precision value
  
  Returns:
  
  True if x is less than or equal to y, false otherwise
- greater
  
  public static boolean greater(short x, short y)
  
  Returns true if the first half-precision float value is greater (larger toward positive infinity) than the second half-precision float value. If either of the values is NaN, the result is false.
  
  Parameters:
  
  x - The first half-precision value
  
  y - The second half-precision value
  
  Returns:
  
  True if x is greater than y, false otherwise
- greaterEquals
  
  public static boolean greaterEquals(short x, short y)
  
  Returns true if the first half-precision float value is greater (larger toward positive infinity) than or equal to the second half-precision float value. If either of the values is NaN, the result is false.
  
  Parameters:
  
  x - The first half-precision value
  
  y - The second half-precision value
  
  Returns:
  
  True if x is greater than or equal to y, false otherwise
- equals
  
  public static boolean equals(short x, short y)
  
  Returns true if the two half-precision float values are equal. If either of the values is NaN, the result is false. POSITIVE_ZERO and NEGATIVE_ZERO are considered equal.
  
  Parameters:
  
  x - The first half-precision value
  
  y - The second half-precision value
  
  Returns:
  
  True if x is equal to y, false otherwise
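Unlike compare, equals treats NaN as unequal to everything and the two zeros as equal. A minimal sketch of those two rules on raw bit patterns follows; it is an assumption about one possible implementation, and `Fp16EqualsSketch` is a hypothetical name.

```java
// Sketch only: equality on half-precision bit patterns with the
// documented NaN and signed-zero rules.
final class Fp16EqualsSketch {
    static boolean isNaN(short h) {
        return (h & 0x7fff) > 0x7c00;
    }

    static boolean equals(short x, short y) {
        if (isNaN(x) || isNaN(y)) {
            return false;                 // NaN never equals anything, itself included
        }
        if (((x | y) & 0x7fff) == 0) {
            return true;                  // both values are zero, signs ignored
        }
        return x == y;                    // otherwise the bit patterns must match
    }
}
```

Here `equals((short) 0x0000, (short) 0x8000)` is true, while `equals((short) 0x7e00, (short) 0x7e00)` is false.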
- isInfinite
  
  public static boolean isInfinite(short h)
  
  Returns true if the specified half-precision float value represents infinity, false otherwise.
  
  Parameters:
  
  h - A half-precision float value
  
  Returns:
  
  True if the value is positive infinity or negative infinity, false otherwise
- isNaN
  
  public static boolean isNaN(short h)
  
  Returns true if the specified half-precision float value represents a Not-a-Number, false otherwise.
  
  Parameters:
  
  h - A half-precision float value
  
  Returns:
  
  True if the value is a NaN, false otherwise
- isNormalized
  
  public static boolean isNormalized(short h)
  
  Returns true if the specified half-precision float value is normalized (does not have a subnormal representation). If the specified value is POSITIVE_INFINITY, NEGATIVE_INFINITY, POSITIVE_ZERO, NEGATIVE_ZERO, NaN or any subnormal number, this method returns false.
  
  Parameters:
  
  h - A half-precision float value
  
  Returns:
  
  True if the value is normalized, false otherwise
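The three classification predicates (isInfinite, isNaN and isNormalized) all reduce to tests on the exponent and significand fields. The following sketch shows the standard binary16 tests; it is illustrative rather than the class's own code, and `Fp16ClassifySketch` is a hypothetical name.

```java
// Sketch only: classification of half-precision bit patterns.
final class Fp16ClassifySketch {
    /** All exponent bits set and a non-zero significand. */
    static boolean isNaN(short h) {
        return (h & 0x7fff) > 0x7c00;
    }

    /** All exponent bits set and a zero significand, either sign. */
    static boolean isInfinite(short h) {
        return (h & 0x7fff) == 0x7c00;
    }

    /** Exponent field neither all zeros (zero, subnormal) nor all ones (infinity, NaN). */
    static boolean isNormalized(short h) {
        int exponent = (h >>> 10) & 0x1f;
        return exponent != 0 && exponent != 0x1f;
    }
}
```

For example, 0x0400 (the smallest normal value) is normalized, while 0x0001 (the smallest subnormal) and 0x0000 are not.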
- toFloat
  
  public static float toFloat(short h)
  Converts the specified half-precision float value into a single-precision float value. The following special cases are handled:
  
  If the input is NaN, the returned value is Float.NaN
  
  If the input is POSITIVE_INFINITY or NEGATIVE_INFINITY, the returned value is respectively Float.POSITIVE_INFINITY or Float.NEGATIVE_INFINITY
  
  If the input is 0 (positive or negative), the returned value is +/-0.0f
  
  Otherwise, the returned value is a normalized single-precision float value
  Parameters:
  
  h - The half-precision float value to convert to single-precision
  
  Returns:
  
  A normalized single-precision float value
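The cases listed for toFloat can be sketched by decoding the three fields and rebuilding the value arithmetically. This is a readable illustration under the standard binary16 encoding, not the conversion FP16 actually performs; `Fp16ToFloatSketch` is a hypothetical name.

```java
// Sketch only: widening a half-precision bit pattern to a float.
final class Fp16ToFloatSketch {
    static float toFloat(short h) {
        int bits = h & 0xffff;
        int exponent = (bits >>> 10) & 0x1f;
        int significand = bits & 0x3ff;
        float magnitude;
        if (exponent == 0) {
            // Zero or subnormal: significand * 2^-24 (exact in single precision).
            magnitude = significand * 0x1.0p-24f;
        } else if (exponent == 0x1f) {
            // All exponent bits set: infinity or NaN.
            magnitude = significand == 0 ? Float.POSITIVE_INFINITY : Float.NaN;
        } else {
            // Normal: (1 + significand / 1024) * 2^(exponent - 15).
            magnitude = Math.scalb((float) (1024 + significand), exponent - 25);
        }
        // Negating preserves signed zero and signed infinity.
        return (bits & 0x8000) != 0 ? -magnitude : magnitude;
    }
}
```

With this sketch, 0x3c00 converts to 1.0f, 0x7bff to 65504.0f, and 0x8000 to -0.0f.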
- toHalf
  
  public static short toHalf(float f)
  Converts the specified single-precision float value into a half-precision float value. The following special cases are handled:
  
  If the input is NaN (see Float.isNaN(float)), the returned value is NaN
  
  If the input is Float.POSITIVE_INFINITY or Float.NEGATIVE_INFINITY, the returned value is respectively POSITIVE_INFINITY or NEGATIVE_INFINITY
  
  If the input is 0 (positive or negative), the returned value is POSITIVE_ZERO or NEGATIVE_ZERO
  
  If the input is less than MIN_VALUE, the returned value is flushed to POSITIVE_ZERO or NEGATIVE_ZERO
  
  If the input is less than MIN_NORMAL, the returned value is a subnormal (denormalized) half-precision float
  
  Otherwise, the returned value is rounded to the nearest representable half-precision float value
  Parameters:
  
  f - The single-precision float value to convert to half-precision
  
  Returns:
  
  A half-precision float value
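The cases listed for toHalf can be sketched on the raw float bits. The code below follows the documented case order; the tie-breaking rule (round half to even), the NaN bit pattern 0x7e00, and the overflow of large finite inputs to infinity are assumptions about details this page does not spell out. `Fp16ToHalfSketch` is a hypothetical name.

```java
// Sketch only: narrowing a float to a half-precision bit pattern.
// Assumed details: ties round to even, NaN maps to 0x7e00, and values
// too large for fp16 become infinity.
final class Fp16ToHalfSketch {
    static short toHalf(float f) {
        int bits = Float.floatToRawIntBits(f);
        int sign = (bits >>> 16) & 0x8000;
        int magnitude = bits & 0x7fffffff;
        if (magnitude > 0x7f800000) {
            return (short) 0x7e00;              // NaN
        }
        if (magnitude < 0x33800000) {
            return (short) sign;                // below 2^-24: flush to signed zero
        }
        if (magnitude < 0x38800000) {
            // Below 2^-14: subnormal result, significand = |f| * 2^24 rounded.
            int significand = (int) Math.rint(Math.abs(f) * 0x1.0p24f);
            return (short) (sign | significand);
        }
        // Normal range: round at bit 13, then rebias the exponent from 127 to 15.
        int rounded = magnitude + 0x0fff + ((magnitude >>> 13) & 1);
        int half = (rounded - 0x38000000) >>> 13;
        if (half >= 0x7c00) {
            return (short) (sign | 0x7c00);     // overflow or infinite input
        }
        return (short) (sign | half);
    }
}
```

Under these assumptions, 1.0f converts to 0x3c00, 65504.0f to 0x7bff, and 65520.0f (halfway to the next power of two) to 0x7c00, positive infinity.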
- toHexString
  
  public static String toHexString(short h)
  Returns a hexadecimal string representation of the specified half-precision float value. If the value is a NaN, the result is "NaN", otherwise the result follows this format:
  
  If the sign is positive, no sign character appears in the result
  
  If the sign is negative, the first character is '-'
  
  If the value is infinity, the string is "Infinity"
  
  If the value is 0, the string is "0x0.0p0"
  
  If the value has a normalized representation, the exponent and significand are represented in the string in two fields. The significand starts with "0x1." followed by its lowercase hexadecimal representation. Trailing zeroes are removed unless all digits are 0, then a single zero is used. The significand representation is followed by the exponent, represented by "p", itself followed by a decimal string of the unbiased exponent
  
  If the value has a subnormal representation, the significand starts with "0x0." followed by its lowercase hexadecimal representation. Trailing zeroes are removed unless all digits are 0, then a single zero is used. The significand representation is followed by the exponent, represented by "p-14"
  Parameters:
  
  h - A half-precision float value
  
  Returns:
  
  A hexadecimal string representation of the specified value
