Chapter - 1

Key Words: overview, pipeline, audio processing

Vector Quantization

🎯 Learning Objectives

By the end of this lesson, you should be able to:

  • Explain why scalar quantization is suboptimal for correlated signals.
  • Form vectors from sample streams and interpret them as points in $\mathbb{R}^N$.
  • Describe Voronoi regions and the nearest-neighbor decision rule in vector form.
  • State and apply the Linde–Buzo–Gray (LBG) algorithm to train a codebook.
  • Compute centroid updates (sample averages) for codewords.
  • Explain the geometric advantage of VQ (denser packing, lower distortion).

Introduction:

Scalar quantization assumes a memoryless source: individual samples are statistically independent. Many real signals (speech, audio, images, temperature series) exhibit memory: neighboring samples are statistically dependent. Vector Quantization (VQ) groups consecutive samples into vectors and quantizes those vectors jointly, which captures correlation and reduces distortion.


Vector formation:

For vector dimension $N$, form non-overlapping block vectors:

$$\mathbf{x}_n = \big[x_{nN},\, x_{nN+1},\, \ldots,\, x_{nN+N-1}\big]^\top \in \mathbb{R}^N$$

Each block $\mathbf{x}_n$ is a point in $N$-dimensional Euclidean space. Writing vectors in bold ($\mathbf{x}$), codewords as $\mathbf{y}_k$, and regions as $R_k$ keeps the notation unambiguous.


Example (scalar sequence):

$$s = [\,23,\; 45,\; 21,\; 4,\; -23,\; -4\,]$$

With $N=2$ we form:

$$\mathbf{x}_0 = [23,\,45]^\top,\quad \mathbf{x}_1 = [21,\,4]^\top,\quad \mathbf{x}_2 = [-23,\,-4]^\top$$
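
As a concrete illustration, here is a minimal NumPy sketch of this blocking step (the helper name form_vectors is ours, chosen for illustration):

```python
import numpy as np

def form_vectors(samples, N):
    """Split a 1-D sample stream into non-overlapping N-dimensional block vectors.

    Trailing samples that do not fill a complete block are discarded.
    """
    samples = np.asarray(samples, dtype=float)
    num_blocks = len(samples) // N
    return samples[:num_blocks * N].reshape(num_blocks, N)

s = [23, 45, 21, 4, -23, -4]
print(form_vectors(s, 2))
# [[ 23.  45.]
#  [ 21.   4.]
#  [-23.  -4.]]
```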

Why Vector Quantization?

  • Scalar quantizers treat each scalar independently and typically form rectangular cells aligned with axes.
  • Vector quantizers place codewords $\mathbf{y}_k \in \mathbb{R}^N$ where the data actually lies and partition space into Voronoi regions.
  • This enables denser packing (reduced average distance to a codeword) and lower mean-squared error for the same bit rate.

Nearest-neighbor decision rule (vector form)

A vector $\mathbf{x}$ is assigned to the region corresponding to the codeword with minimum Euclidean distance:

$$k(\mathbf{x}) \;=\; \arg\min_{j} \; \big\|\mathbf{x} - \mathbf{y}_j\big\|_2$$

where the Euclidean norm is

$$\big\|\mathbf{x} - \mathbf{y}_k\big\|_2 \;=\; \sqrt{\sum_{i=1}^N \big(x_i - y_{k,i}\big)^2}$$

Use this to build Voronoi partitions:

$$R_k \;=\; \big\{\,\mathbf{x}\in\mathbb{R}^N : \|\mathbf{x}-\mathbf{y}_k\|_2 \le \|\mathbf{x}-\mathbf{y}_j\|_2 \ \ \forall j \,\big\}$$
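
A minimal NumPy sketch of this decision rule (nearest_codeword is an illustrative name, not a library function):

```python
import numpy as np

def nearest_codeword(x, codebook):
    """Return the index of the codeword closest to x in Euclidean distance."""
    dists = np.linalg.norm(codebook - x, axis=1)  # ||x - y_j||_2 for every j
    return int(np.argmin(dists))

codebook = np.array([[1.0, 2.0], [5.0, 6.0]])
print(nearest_codeword(np.array([3.0, 2.0]), codebook))  # -> 0
```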

Voronoi region boundary (midpoint hyperplane between two codewords)

The boundary between neighboring codewords $\mathbf{y}_k$ and $\mathbf{y}_{k+1}$ is the set of points equidistant from both. The midpoint (the point on the line segment joining the two codewords) is:

$$\mathbf{b}_{k,k+1} \;=\; \frac{\mathbf{y}_k + \mathbf{y}_{k+1}}{2}$$

The decision hyperplane is perpendicular to $\mathbf{y}_{k+1}-\mathbf{y}_k$ and passes through $\mathbf{b}_{k,k+1}$.
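
Written out, this perpendicular-bisector condition (stated here for completeness; it follows from expanding the equal-distance condition $\|\mathbf{x}-\mathbf{y}_k\|_2 = \|\mathbf{x}-\mathbf{y}_{k+1}\|_2$) is:

$$\big(\mathbf{y}_{k+1} - \mathbf{y}_k\big)^\top \Big(\mathbf{x} - \frac{\mathbf{y}_k + \mathbf{y}_{k+1}}{2}\Big) \;=\; 0$$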


Linde–Buzo–Gray (LBG) Algorithm — Vector Lloyd-Max

The LBG algorithm iteratively minimizes average squared error by alternating partition and centroid update steps.

  1. Choose the number of codewords $M$ and initialize the codebook $\{\mathbf{y}_k\}_{k=1}^M$ (randomly or from seed vectors).
  2. Partition step: assign each training vector $\mathbf{x}_i$ to the closest codeword:
$$R_k \leftarrow \big\{ \mathbf{x}_i : k = \arg\min_j \|\mathbf{x}_i - \mathbf{y}_j\|_2 \big\}$$
  3. Centroid update: recompute each codeword as the centroid (sample average) of its assigned vectors:
$$\mathbf{y}_k \leftarrow \frac{1}{|R_k|}\sum_{\mathbf{x}\in R_k} \mathbf{x},\qquad \text{if } |R_k|>0$$
  4. Repeat steps 2–3 until the codeword changes fall below a threshold $\epsilon$ or the distortion converges (a code sketch follows the centroid formula below).

If you have access to a continuous PDF $p(\mathbf{x})$, the centroid becomes the conditional expectation:

$$\mathbf{y}_k \;=\; \frac{\int_{R_k} \mathbf{x}\, p(\mathbf{x})\, d\mathbf{x}}{\int_{R_k} p(\mathbf{x})\, d\mathbf{x}}$$
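
A minimal NumPy sketch of the training loop above, assuming squared-error distortion and random-subset initialization (the function name lbg and its defaults are ours, for illustration):

```python
import numpy as np

def lbg(train, M, eps=1e-6, max_iter=100, seed=None):
    """Train an M-codeword codebook on training vectors `train` (shape: num_vectors x N)."""
    rng = np.random.default_rng(seed)
    train = np.asarray(train, dtype=float)
    # Step 1: initialize with M distinct training vectors chosen at random.
    codebook = train[rng.choice(len(train), size=M, replace=False)].copy()
    prev_distortion = np.inf
    for _ in range(max_iter):
        # Step 2 (partition): nearest codeword for every training vector.
        dists = np.linalg.norm(train[:, None, :] - codebook[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        distortion = np.mean(np.min(dists, axis=1) ** 2)
        # Step 3 (centroid update): sample average per region; empty cells are re-seeded.
        for k in range(M):
            members = train[labels == k]
            codebook[k] = members.mean(axis=0) if len(members) > 0 else train[rng.integers(len(train))]
        # Step 4: stop once the distortion improvement falls below eps.
        if prev_distortion - distortion < eps:
            break
        prev_distortion = distortion
    return codebook
```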

Worked numerical example

Training data (as vectors, $N=2$):

$$\{\mathbf{x}_1,\mathbf{x}_2,\mathbf{x}_3,\mathbf{x}_4\} = \big\{[3,2]^\top,\,[4,5]^\top,\,[7,8]^\top,\,[8,9]^\top\big\}$$

Initial codewords:

$$\mathbf{y}_1^{(0)}=[1,2]^\top,\quad \mathbf{y}_2^{(0)}=[5,6]^\top$$

Compute distances and assign:

  • $\mathbf{x}_1=[3,2]^\top$ is closest to $\mathbf{y}_1^{(0)}$
  • $\mathbf{x}_2,\mathbf{x}_3,\mathbf{x}_4$ are closest to $\mathbf{y}_2^{(0)}$

Centroid updates:

$$\mathbf{y}_1^{(1)}=\tfrac{1}{1}[3,2]^\top=[3,2]^\top$$
$$\mathbf{y}_2^{(1)}=\tfrac{1}{3}\big([4,5]^\top+[7,8]^\top+[8,9]^\top\big) = \big[\tfrac{4+7+8}{3},\; \tfrac{5+8+9}{3}\big]^\top = [6.\overline{3},\; 7.\overline{3}]^\top$$

Repeat partition/update until convergence.
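
This single iteration can be checked numerically with a few lines of NumPy (self-contained; it repeats the partition and centroid steps for the data above):

```python
import numpy as np

train = np.array([[3, 2], [4, 5], [7, 8], [8, 9]], dtype=float)
codebook = np.array([[1, 2], [5, 6]], dtype=float)

# Partition step: nearest codeword for each training vector.
dists = np.linalg.norm(train[:, None, :] - codebook[None, :, :], axis=2)
labels = np.argmin(dists, axis=1)          # -> [0, 1, 1, 1]

# Centroid update: sample average of each region.
updated = np.array([train[labels == k].mean(axis=0) for k in range(2)])
print(updated)
# [[3.         2.        ]
#  [6.33333333 7.33333333]]
```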


Encoder & Decoder

Encoder

  1. Form non-overlapping vectors $\mathbf{x}_n$.
  2. For each $\mathbf{x}_n$ compute $k_n = \arg\min_j \|\mathbf{x}_n - \mathbf{y}_j\|_2$.
  3. Transmit the index $k_n$ ($\log_2 M$ bits per vector if fixed-length indices are used).

Decoder

  1. Receive the index $k_n$.
  2. Reconstruct the vector $\widehat{\mathbf{x}}_n = \mathbf{y}_{k_n}$.
  3. Concatenate the $\widehat{\mathbf{x}}_n$ vectors to form the output sample stream.

Reconstruction rule:

$$\widehat{\mathbf{x}} = \mathbf{y}_k \quad\text{when } k = \arg\min_j \|\mathbf{x}-\mathbf{y}_j\|_2$$
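
A minimal encoder/decoder pair following this flow (vq_encode and vq_decode are illustrative names; the codebook values are taken from the worked example above):

```python
import numpy as np

def vq_encode(samples, codebook, N):
    """Block the sample stream into N-vectors and map each to its nearest codeword index."""
    samples = np.asarray(samples, dtype=float)
    vectors = samples[: (len(samples) // N) * N].reshape(-1, N)
    dists = np.linalg.norm(vectors[:, None, :] - codebook[None, :, :], axis=2)
    return np.argmin(dists, axis=1)                   # one index k_n per vector

def vq_decode(indices, codebook):
    """Look up each index in the codebook and concatenate the codewords."""
    return codebook[indices].reshape(-1)

codebook = np.array([[3.0, 2.0], [6.33, 7.33]])
idx = vq_encode([3, 2, 7, 8, 8, 9], codebook, N=2)    # -> [0, 1, 1]
print(vq_decode(idx, codebook))                       # -> [3. 2. 6.33 7.33 6.33 7.33]
```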

Practical considerations

  • Training set: choose representative samples that match the target signal statistics; evaluate on a separate test set.
  • Empty cells: if a region $R_k$ becomes empty, reinitialize $\mathbf{y}_k$ (e.g., split a high-distortion codeword or pick a random training vector).
  • High dimensionality: as $N$ increases, the data becomes sparse, so larger training sets are needed to estimate centroids robustly.
  • Complexity: nearest-neighbor search can be accelerated with k-d trees or approximate nearest-neighbor methods for large $M$; see the sketch below.
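
For example, assuming SciPy is available, the nearest-neighbor search can use scipy.spatial.cKDTree instead of brute force:

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
codebook = rng.normal(size=(1024, 8))    # M = 1024 codewords of dimension N = 8
vectors = rng.normal(size=(100_000, 8))  # vectors to encode (synthetic data for the demo)

tree = cKDTree(codebook)
dists, indices = tree.query(vectors)     # nearest codeword index for every vector
print(indices[:5])
```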

Key Takeaways

  • Write vectors as $\mathbf{x}$ and codewords as $\mathbf{y}_k$ to avoid ambiguity.
  • VQ partitions $\mathbb{R}^N$ into Voronoi regions and assigns vectors by nearest neighbor.
  • LBG iteratively updates codewords to centroids (sample averages) — a vector Lloyd–Max.
  • VQ can reduce distortion significantly for correlated sources (speech, images).

Visual Demonstration

Vector Quantizer — LBG (Advanced): interactive animation of codebook training. Tip: use Split Init to initialize codewords by iterative splitting (a common LBG trick), then use Play to animate iterations until convergence.

Visual Demonstration

Audio Vector Quantizer (LBG) — Interactive: trains a codebook on audio vectors and plots distortion vs. iteration. Reconstruction uses simple frame energy scaling for demonstration (it keeps the original duration).

🧠 Quick Quiz

1) What is the main purpose of vector quantization?

2) In the LBG algorithm, what does the centroid update step compute?

3) What geometric structure defines the decision regions in VQ?

4) What criterion determines vector-to-codeword assignment in VQ?

5) Why is the codebook stored at both the encoder and the decoder in VQ?