Chapter - 1

Key Words: overview, pipeline, audio processing

Vector Quantization

🎯 Learning Objectives

By the end of this lesson, you should be able to:

  • Explain why scalar quantization is suboptimal for correlated signals.
  • Form vectors from sample streams and interpret them as points in $\mathbb{R}^N$.
  • Describe Voronoi regions and the nearest-neighbor decision rule in vector form.
  • State and apply the Linde–Buzo–Gray (LBG) algorithm to train a codebook.
  • Compute centroid updates (sample averages) for codewords.
  • Explain the geometric advantage of VQ (denser packing, lower distortion).

Introduction:

Scalar quantization assumes a memoryless source: individual samples are statistically independent. Many real signals (speech, audio, images, temperature series) exhibit memory: neighboring samples are statistically dependent. Vector Quantization (VQ) groups consecutive samples into vectors and quantizes those vectors jointly, which captures correlation and reduces distortion.


Vector formation:

For vector dimension $N$, form non-overlapping block vectors:

$$\mathbf{x}_n = \big[x_{nN},\, x_{nN+1},\, \ldots,\, x_{nN+N-1}\big]^\top \in \mathbb{R}^N$$

Each block $\mathbf{x}_n$ is a point in $N$-dimensional Euclidean space. Writing vectors in bold ($\mathbf{x}$), codewords as $\mathbf{y}_k$, and regions as $R_k$ keeps the notation unambiguous.


Example (scalar sequence):

$$s = [\,23,\; 45,\; 21,\; 4,\; -23,\; -4\,]$$

With $N=2$ we form:

$$\mathbf{x}_0 = [23,\,45]^\top,\quad \mathbf{x}_1 = [21,\,4]^\top,\quad \mathbf{x}_2 = [-23,\,-4]^\top$$
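
As a concrete illustration, here is a minimal NumPy sketch of this blocking step (the helper name form_vectors is ours, chosen for illustration):

```python
import numpy as np

def form_vectors(samples, N):
    """Split a 1-D sample stream into non-overlapping N-dimensional block vectors.

    Trailing samples that do not fill a complete block are discarded.
    """
    samples = np.asarray(samples, dtype=float)
    num_blocks = len(samples) // N
    return samples[:num_blocks * N].reshape(num_blocks, N)

s = [23, 45, 21, 4, -23, -4]
print(form_vectors(s, 2))
# [[ 23.  45.]
#  [ 21.   4.]
#  [-23.  -4.]]
```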

Why Vector Quantization?

  • Scalar quantizers treat each scalar independently and typically form rectangular cells aligned with axes.
  • Vector quantizers place codewords $\mathbf{y}_k \in \mathbb{R}^N$ where the data actually lies and partition space into Voronoi regions.
  • This enables denser packing (reduced average distance to a codeword) and lower mean-squared error for the same bit rate.

Nearest-neighbor decision rule (vector form)

A vector $\mathbf{x}$ is assigned to the region corresponding to the codeword with minimum Euclidean distance:

$$k(\mathbf{x}) \;=\; \arg\min_{j} \; \big\|\mathbf{x} - \mathbf{y}_j\big\|_2$$

where the Euclidean norm is

$$\big\|\mathbf{x} - \mathbf{y}_k\big\|_2 \;=\; \sqrt{\sum_{i=1}^N \big(x_i - y_{k,i}\big)^2}$$

Use this to build Voronoi partitions:

$$R_k \;=\; \big\{\,\mathbf{x}\in\mathbb{R}^N : \|\mathbf{x}-\mathbf{y}_k\|_2 \le \|\mathbf{x}-\mathbf{y}_j\|_2 \ \ \forall j \,\big\}$$
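
A minimal NumPy sketch of this decision rule (nearest_codeword is an illustrative name, not a library function):

```python
import numpy as np

def nearest_codeword(x, codebook):
    """Return the index of the codeword closest to x in Euclidean distance."""
    dists = np.linalg.norm(codebook - x, axis=1)  # ||x - y_j||_2 for every j
    return int(np.argmin(dists))

codebook = np.array([[1.0, 2.0], [5.0, 6.0]])
print(nearest_codeword(np.array([3.0, 2.0]), codebook))  # -> 0
```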

Voronoi region boundary (midpoint hyperplane between two codewords)

The boundary between neighboring codewords $\mathbf{y}_k$ and $\mathbf{y}_{k+1}$ is the set of points equidistant from both. The midpoint (the point on the line segment joining the two codewords) is:

$$\mathbf{b}_{k,k+1} \;=\; \frac{\mathbf{y}_k + \mathbf{y}_{k+1}}{2}$$

The decision hyperplane is perpendicular to $\mathbf{y}_{k+1}-\mathbf{y}_k$ and passes through $\mathbf{b}_{k,k+1}$.
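
Written out, this perpendicular-bisector condition (stated here for completeness; it follows from expanding the equal-distance condition $\|\mathbf{x}-\mathbf{y}_k\|_2 = \|\mathbf{x}-\mathbf{y}_{k+1}\|_2$) is:

$$\big(\mathbf{y}_{k+1} - \mathbf{y}_k\big)^\top \Big(\mathbf{x} - \frac{\mathbf{y}_k + \mathbf{y}_{k+1}}{2}\Big) \;=\; 0$$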


Linde–Buzo–Gray (LBG) Algorithm — Vector Lloyd-Max

The LBG algorithm iteratively minimizes average squared error by alternating partition and centroid update steps.

  1. Choose the number of codewords $M$ and initialize the codebook $\{\mathbf{y}_k\}_{k=1}^M$ (randomly or from seed vectors).
  2. Partition step: assign each training vector $\mathbf{x}_i$ to the closest codeword:
$$R_k \leftarrow \big\{ \mathbf{x}_i : k = \arg\min_j \|\mathbf{x}_i - \mathbf{y}_j\|_2 \big\}$$
  3. Centroid update: recompute each codeword as the centroid (sample average) of its assigned vectors:
$$\mathbf{y}_k \leftarrow \frac{1}{|R_k|}\sum_{\mathbf{x}\in R_k} \mathbf{x},\qquad \text{if } |R_k|>0$$
  4. Repeat steps 2–3 until the codeword changes fall below a threshold $\epsilon$ or the distortion converges (a code sketch follows the centroid formula below).

If you have access to a continuous PDF $p(\mathbf{x})$, the centroid becomes the conditional expectation:

$$\mathbf{y}_k \;=\; \frac{\int_{R_k} \mathbf{x}\, p(\mathbf{x})\, d\mathbf{x}}{\int_{R_k} p(\mathbf{x})\, d\mathbf{x}}$$
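
A minimal NumPy sketch of the training loop above, assuming squared-error distortion and random-subset initialization (the function name lbg and its defaults are ours, for illustration):

```python
import numpy as np

def lbg(train, M, eps=1e-6, max_iter=100, seed=None):
    """Train an M-codeword codebook on training vectors `train` (shape: num_vectors x N)."""
    rng = np.random.default_rng(seed)
    train = np.asarray(train, dtype=float)
    # Step 1: initialize with M distinct training vectors chosen at random.
    codebook = train[rng.choice(len(train), size=M, replace=False)].copy()
    prev_distortion = np.inf
    for _ in range(max_iter):
        # Step 2 (partition): nearest codeword for every training vector.
        dists = np.linalg.norm(train[:, None, :] - codebook[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        distortion = np.mean(np.min(dists, axis=1) ** 2)
        # Step 3 (centroid update): sample average per region; empty cells are re-seeded.
        for k in range(M):
            members = train[labels == k]
            codebook[k] = members.mean(axis=0) if len(members) > 0 else train[rng.integers(len(train))]
        # Step 4: stop once the distortion improvement falls below eps.
        if prev_distortion - distortion < eps:
            break
        prev_distortion = distortion
    return codebook
```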

Worked numerical example

Training data (as vectors, $N=2$):

$$\{\mathbf{x}_1,\mathbf{x}_2,\mathbf{x}_3,\mathbf{x}_4\} = \big\{[3,2]^\top,\,[4,5]^\top,\,[7,8]^\top,\,[8,9]^\top\big\}$$

Initial codewords:

$$\mathbf{y}_1^{(0)}=[1,2]^\top,\quad \mathbf{y}_2^{(0)}=[5,6]^\top$$

Compute distances and assign:

  • $\mathbf{x}_1=[3,2]^\top$ is closest to $\mathbf{y}_1^{(0)}$
  • $\mathbf{x}_2,\mathbf{x}_3,\mathbf{x}_4$ are closest to $\mathbf{y}_2^{(0)}$

Centroid updates:

$$\mathbf{y}_1^{(1)}=\tfrac{1}{1}[3,2]^\top=[3,2]^\top$$
$$\mathbf{y}_2^{(1)}=\tfrac{1}{3}\big([4,5]^\top+[7,8]^\top+[8,9]^\top\big) = \big[\tfrac{4+7+8}{3},\; \tfrac{5+8+9}{3}\big]^\top = [6.\overline{3},\; 7.\overline{3}]^\top$$

Repeat partition/update until convergence.
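
This single iteration can be checked numerically with a few lines of NumPy (self-contained; it repeats the partition and centroid steps for the data above):

```python
import numpy as np

train = np.array([[3, 2], [4, 5], [7, 8], [8, 9]], dtype=float)
codebook = np.array([[1, 2], [5, 6]], dtype=float)

# Partition step: nearest codeword for each training vector.
dists = np.linalg.norm(train[:, None, :] - codebook[None, :, :], axis=2)
labels = np.argmin(dists, axis=1)          # -> [0, 1, 1, 1]

# Centroid update: sample average of each region.
updated = np.array([train[labels == k].mean(axis=0) for k in range(2)])
print(updated)
# [[3.         2.        ]
#  [6.33333333 7.33333333]]
```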


Encoder & Decoder

Encoder

  1. Form non-overlapping vectors $\mathbf{x}_n$.
  2. For each $\mathbf{x}_n$ compute $k_n = \arg\min_j \|\mathbf{x}_n - \mathbf{y}_j\|_2$.
  3. Transmit the index $k_n$ ($\log_2 M$ bits per vector if fixed-length indices are used).

Decoder

  1. Receive the index $k_n$.
  2. Reconstruct the vector $\widehat{\mathbf{x}}_n = \mathbf{y}_{k_n}$.
  3. Concatenate the $\widehat{\mathbf{x}}_n$ vectors to form the output sample stream.

Reconstruction rule:

$$\widehat{\mathbf{x}} = \mathbf{y}_k \quad\text{when } k = \arg\min_j \|\mathbf{x}-\mathbf{y}_j\|_2$$
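
A minimal encoder/decoder pair following this flow (vq_encode and vq_decode are illustrative names; the codebook values are taken from the worked example above):

```python
import numpy as np

def vq_encode(samples, codebook, N):
    """Block the sample stream into N-vectors and map each to its nearest codeword index."""
    samples = np.asarray(samples, dtype=float)
    vectors = samples[: (len(samples) // N) * N].reshape(-1, N)
    dists = np.linalg.norm(vectors[:, None, :] - codebook[None, :, :], axis=2)
    return np.argmin(dists, axis=1)                   # one index k_n per vector

def vq_decode(indices, codebook):
    """Look up each index in the codebook and concatenate the codewords."""
    return codebook[indices].reshape(-1)

codebook = np.array([[3.0, 2.0], [6.33, 7.33]])
idx = vq_encode([3, 2, 7, 8, 8, 9], codebook, N=2)    # -> [0, 1, 1]
print(vq_decode(idx, codebook))                       # -> [3. 2. 6.33 7.33 6.33 7.33]
```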

Practical considerations

  • Training set: choose representative samples that match the target signal statistics; evaluate on a separate test set.
  • Empty cells: if a region $R_k$ becomes empty, reinitialize $\mathbf{y}_k$ (e.g., split a high-distortion codeword or pick a random training vector).
  • High dimensionality: as $N$ increases, the data becomes sparse, so larger training sets are needed to estimate centroids robustly.
  • Complexity: nearest-neighbor search can be accelerated with k-d trees or approximate nearest-neighbor methods for large $M$; see the sketch below.
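
For example, assuming SciPy is available, the nearest-neighbor search can use scipy.spatial.cKDTree instead of brute force:

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
codebook = rng.normal(size=(1024, 8))    # M = 1024 codewords of dimension N = 8
vectors = rng.normal(size=(100_000, 8))  # vectors to encode (synthetic data for the demo)

tree = cKDTree(codebook)
dists, indices = tree.query(vectors)     # nearest codeword index for every vector
print(indices[:5])
```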

Key Takeaways

  • Write vectors as $\mathbf{x}$ and codewords as $\mathbf{y}_k$ to avoid ambiguity.
  • VQ partitions $\mathbb{R}^N$ into Voronoi regions and assigns vectors by nearest neighbor.
  • LBG iteratively updates codewords to centroids (sample averages) — a vector Lloyd–Max.
  • VQ can reduce distortion significantly for correlated sources (speech, images).

Visual Demonstration

Vector Quantizer — LBG (Advanced): interactive animation of codebook training. Tip: use Split Init to initialize codewords by iterative splitting (a common LBG trick), then use Play to animate iterations until convergence.

Visual Demonstration

Audio Vector Quantizer (LBG) — Interactive: trains a codebook on audio vectors and plots distortion vs. iteration. Reconstruction uses simple frame energy scaling for demonstration (it keeps the original duration).

🧠 Quick Quiz

1) What is the main purpose of vector quantization?

2) In the LBG algorithm, what does the centroid update step compute?

3) What geometric structure defines the decision regions in VQ?

4) What criterion determines vector-to-codeword assignment in VQ?

5) Why is the codebook stored at both the encoder and the decoder in VQ?