Binary Integer Fixed-Point Numbers

01 Mar 1998

NOTE: THIS DOCUMENT IS OBSOLETE, PLEASE CHECK THE NEW VERSION: "Mathematics of the Discrete Fourier Transform (DFT), with Audio Applications --- Second Edition", by Julius O. Smith III, W3K Publishing, 2007, ISBN 978-0-9745607-4-8. - Copyright © 2017-09-28 by Julius O. Smith III - Center for Computer Research in Music and Acoustics (CCRMA), Stanford University

Most prevalent computer languages only offer two kinds of numbers, floating-point and integer fixed-point. On present-day computers, all numbers are encoded using binary digits (called ''bits'') which are either 1 or 0.^4.7 In C, C++, and Java, floating-point variables are declared as float (32 bits) or double (64 bits), while integer fixed-point variables are declared as short int (typically 16 bits and never less), long int (typically 32 bits and never less), or simply int (typically the same as a long int, but sometimes between short and long). For an 8-bit integer, one can use the char datatype (8 bits).

Since C was designed to accommodate a wide range of hardware, including old mini-computers, some latitude was historically allowed in the choice of these bit-lengths. The sizeof operator is officially the ''right way'' for a C program to determine the number of bytes in various data types at run-time, e.g. sizeof(long). (The word int can be omitted after short or long.) Nowadays, however, shorts are always 16 bits (at least on all the major platforms), ints are 32 bits, and longs are typically 32 bits on 32-bit computers and 64 bits on 64-bit computers (although some C/C++ compilers use long long int to declare 64-bit ints). Table 4.2 gives the lengths currently used by GNU C/C++ compilers (usually called ''gcc'' or ''cc'') on 64-bit processors.^4.8

Table 4.2: Byte sizes of GNU C/C++ data types for 64-bit architectures.

Type          Bytes   Notes
char          1
short         2
int           4
long          8       (4 bytes on 32-bit machines)
long long     8       (may become 16 bytes)
type *        8       (any pointer)
float         4
double        8
long double   8       (may become 10 bytes)
size_t        8       (type of sizeof())
T* - T*       8       (pointer arithmetic)

Java, which is designed to be platform independent, defines a long int as equivalent in precision to 64 bits, an int as 32 bits, a short int as 16 bits, and additionally a byte int as an 8-bit int. Similarly, the ''Structured Audio Orchestra Language'' (SAOL, pronounced ''sail''), the sound-synthesis component of the new MPEG-4 audio compression standard, requires only that the underlying number system be at least as accurate as 32-bit floats.

All ints discussed thus far are signed integer formats. C and C++ also support unsigned versions of all int types, and they range from $0$ to $2^N-1$ instead of $-2^{N-1}$ to $2^{N-1}-1$, where $N$ is the number of bits. Finally, an unsigned char is often used for integers that only range between 0 and 255.

Subsections

One's Complement Fixed-Point Format

Two's Complement Fixed-Point Format

General Formula for Two's-Complement, Integer Fixed-Point Numbers

''Little Endian'' Formula for Two's-Complement, Integer Fixed-Point Numbers
