Convolutional Decoder
Overview
National Instruments has partnered with Prentice Hall to bring you large portions of in-depth technical topics from several PTR RF and Communications books, including Digital Communications: Fundamentals and Applications, 2nd Edition. This series of content is designed for a broad range of audiences, from experts who want to review a specific topic to students who need easy-to-understand documentation for their projects.
For the complete list of RF topics, please visit the RF and Communications Resource Main Page.
Table of Contents
- 7.3 FORMULATION OF THE CONVOLUTIONAL DECODING PROBLEM
- 7.3.2 Channel Models: Hard versus Soft Decisions
- 7.3.2.1 Binary Symmetric Channel
- 7.3.2.2 Gaussian Channel
- 7.3.3 The Viterbi Convolutional Decoding Algorithm
- 7.3.4 An Example of Viterbi Convolutional Decoding
- 7.3.5 Decoder Implementation
- 7.3.5.1 Add-Compare-Select Computation
- 7.3.5.2 Add-Compare-Select as seen on the Trellis
- 7.3.6 Path Memory and Synchronization
- 7.4 PROPERTIES OF CONVOLUTIONAL CODES
- 7.4.1.1 Error-Correcting Capability of Convolutional Codes
- 7.4.2 Systematic and Nonsystematic Convolutional Codes
- 7.4.3 Catastrophic Error Propagation in Convolutional Codes
- 7.4.4 Performance Bounds for Convolutional Codes
- 7.4.5 Coding Gain
- 7.4.6 Best Known Convolutional Codes
- 7.4.7 Convolutional Code Rate Trace-Off
- 7.4.7.2 Performance with Noncoherent Orthogonal Signaling
- 7.4.8 Soft-Decision Viterbi Decoding
- Relevant NI products
- Buy the Book
7.3 FORMULATION OF THE CONVOLUTIONAL DECODING PROBLEM
7.3.1 Maximum Likelihood Decoding
If all input message sequences are equally likely, a decoder that achieves the minimum probability of error is one that compares the conditional probabilities, also called the likelihood functions P(Z|U(m)), where Z is the received sequence and
U(m) is one of the possible transmitted sequences, and chooses the maximum. The decoder chooses U(m´) if

The maximum likelihood concept, as stated in Equation (7.1), is a fundamental development of decision theory (see Appendix B); it is the formalization of a “common-sense” way to make decisions when there is statistical knowledge of the possibilities. In the binary demodulation treatment in Chapters 3 and 4 there wereonly two equally likely possible signals, s1(t) or s2(t ), that might have been transmitted.
Therefore, to make the binary maximum likelihood decision, given a received signal, meant only to decide that s1(t ) was transmitted if
![]()
otherwise, to decide that s2(t) was transmitted. The parameter z represents z(T),
the receiver predetection value at the end of each symbol duration time t = T. However,
when applying maximum likelihood to the convolutional decoding problem,
we observe that the convolutional code has memory (the received sequence represents
the superposition of current bits and prior bits). Thus, applying maximum
likelihood to the decoding of convolutionally encoded bits is performed in the context
of choosing the most likely sequence, as shown in Equation (7.1). There are
typically a multitude of possible codeword sequences that might have been transmitted.
To be specific, for a binary code, a sequence of L branch words is a member
of a set of 2L possible sequences. Therefore, in the maximum likelihood context, we
can say that the decoder chooses a particular U(m´) as the transmitted sequence if
the likelihood P(ZU(m´)) is greater than the likelihoods of all the other possible
transmitted sequences. Such an optimal decoder, which minimizes the error probability
(for the case where all transmitted sequences are equally likely), is known as
a maximum likelihood decoder. The likelihood functions are given or computed
from the specifications of the channel.
We will assume that the noise is additive white Gaussian with zero mean and
the channel is memoryless, which means that the noise affects each code symbol
independently of all the other symbols. For a convolutional code of rate 1/n, we
can therefore express the likelihood as

where Zi is the ith branch of the received sequence Z, Ui
(m) is the ith branch of a particular codeword sequence U(m), zji is the jth code symbol of Zi, andu ji ( m ) is the jth code symbol of Ui(m), and each branch comprises n code symbols. The decoder problem consists of
choosing a path through the trellis of Figure 7.7 (each possible path defines a codeword sequence) such
that

Generally, it is computationally more convenient to use the logarithm of the
likelihood function since this permits the summation, instead of the multiplication,
of terms. We are able to use this transformation because the logarithm is a monotonically
increasing function and thus will not alter the final result in our codeword
selection. We can define the log-likelihood function as

The decoder problem now consists of choosing a path through the tree of Figure
7.6 or the trellis of Figure 7.7 such that
is maximized. For the decoding of
convolutional codes, either the tree or the trellis structure can be used. In the tree
representation of the code, the fact that the paths remerge is ignored. Since for a
binary code, the number of possible sequences made up of L branch words is 2L,
maximum likelihood decoding of such a received sequence, using a tree diagram,
requires the “brute force” or exhaustive comparison of 2L accumulated loglikelihood
metrics, representing all the possible different codeword sequences that
could have been transmitted. Hence it is not practical to consider maximum likelihood
decoding with a tree structure. It is shown in a later section that with the use
of the trellis representation of the code, it is possible to configure a decoder which
can discard the paths that could not possibly be candidates for the maximum likelihood
sequence. The decoded path is chosen from some reduced set of surviving
paths. Such a decoder is still optimum in the sense that the decoded path is the
same as the decoded path obtained from a “brute force” maximum likelihood decoder,
but the early rejection of unlikely paths reduces the decoding complexity.
For an excellent tutorial on the structure of convolutional codes, maximum
likelihood decoding, and code performance, see Reference [8]. There are several
algorithms that yield approximate solutions to the maximum likelihood decoding
problem, including sequential [9, 10] and threshold [11]. Each of these algorithms is
suited to certain special applications, but are all suboptimal. In contrast, the Viterbi
decoding algorithm performs maximum likelihood decoding and is therefore optimal.
This does not imply that the Viterbi algorithm is best for every application;
there are severe constraints imposed by hardware complexity. The Viterbi
algorithm is considered in Sections 7.3.3 and 7.3.4.
7.3.2 Channel Models: Hard versus Soft Decisions
Before specifying an algorithm that will determine the maximum likelihood decision,
let us describe the channel. The codeword sequence U(m), made up of branch
words, with each branch word comprised of n code symbols, can be considered to
be an endless stream, as opposed to a block code, in which the source data and
their codewords are partitioned into precise block sizes. The codeword sequence
shown in Figure 7.1 emanates from the convolutional encoder and enters the modulator,
where the code symbols are transformed into signal waveforms. The modulation
may be baseband (e.g., pulse waveforms) or bandpass (e.g., PSK or FSK). In
general,
symbols at a time, where
is an integer, are mapped into signal waveforms
si(t), where i = 1, 2, . . . ,
When
= 1, the modulator maps each code
symbol into a binary waveform. The channel over which the waveform is transmitted
is assumed to corrupt the signal with Gaussian noise. When the corrupted signal
is received, it is first processed by the demodulator and then by the decoder.
Consider that a binary signal transmitted over a symbol interval (0, T) is represented
by s1(t) for a binary one and s2(t) for a binary zero. The received signal is r(t) =
si(t) + n(t), where n(t) is a zero-mean Gaussian noise process. In Chapter 3 we described
the detection of r(t) in terms of two basic steps. In the first step, the received
waveform is reduced to a single number, z(T) = ai + n0, where ai is the signal component
of z(T) and n0 is the noise component. The noise component, n0, is a zero-mean
Gaussian random variable, and thus z(T) is a Gaussian random variable with a mean
of either a1 or a2 depending on whether a binary one or binary zero was sent. In the
second step of the detection process a decision was made as to which signal was transmitted,
on the basis of comparing z(T) to a threshold. The conditional probabilities
of z(T), p(z|s1), and p(z|s2) are shown in Figure 7.8, labeled likelihood of s1 and likelihood
of s2. The demodulator in Figure 7.1, converts the set of time-ordered random
variables {z(T)} into a code sequence Z, and passes it on to the decoder. The demodulator
output can be configured in a variety of ways. It can be implemented to make
a firm or hard decision as to whether z(T) represents a zero or a one. In this case, the
output of the demodulator is quantized to two levels, zero and one, and fed into the
decoder (this is exactly the same threshold decision that was made in Chapters 3
Figure 7.8 Hard and soft decoding decisions.
and 4). Since the decoder operates on the hard decisions made by the demodulator,
the decoding is called hard-decision decoding.
The demodulator can also be configured to feed the decoder with a quantized
value of z(T) greater than two levels. Such an implementation furnishes the decoder
with more information than is provided in the hard-decision case. When the quantization
level of the demodulator output is greater than two, the decoding is called
soft-decision decoding. Eight levels (3-bits) of quantization are illustrated on the
abscissa of Figure 7.8. When the demodulator sends a hard binary decision to the
decoder, it sends it a single binary symbol. When the demodulator sends a soft binary
decision, quantized to eight levels, it sends the decoder a 3-bit word describing
an interval along z(T). In effect, sending such a 3-bit word in place of a single binary
symbol is equivalent to sending the decoder a measure of confidence along
with the code-symbol decision. Referring to Figure 7.8, if the demodulator sends
1 1 1 to the decoder, this is tantamount to declaring the code symbol to be a one
with very high confidence, while sending a 1 0 0 is tantamount to declaring the code
symbol to be a one with very low confidence. It should be clear that ultimately,
every message decision out of the decoder must be a hard decision; otherwise, one
might see computer printouts that read: “think it’s a 1,” “think it’s a 0,” and so on.
The idea behind the demodulator not making hard decisions and sending more
data (soft decisions) to the decoder can be thought of as an interim step to provide
the decoder with more information, which the decoder then uses for recovering the
message sequence (with better error performance than it could in the case of harddecision
decoding). In Figure 7.8, the 8-level soft-decision metric is often shown as
−7, −5, −3, −1, 1, 3, 5, 7. Such a designation lends itself to a simple interpretation of
the soft decision: The sign of the metric represents a decision (e.g., choose s1 if
positive, choose s2 if negative), and the magnitude of the metric represents the
confidence level of that decision. The only advantage for the metric shown in
Figure 7.8 is that it avoids the use of negative numbers.
For a Gaussian channel, eight-level quantization results in a performance improvement
of approximately 2 dB in required signal-to-noise ratio compared to
two-level quantization. This means that eight-level soft-decision decoding can provide
the same probability of bit error as that of hard-decision decoding, but
requires 2 dB less Eb/N0 for the same performance. Analog (or infinite-level quantization)
results in a 2.2-dB performance improvement over two-level quantization;
therefore, eight-level quantization results in a loss of approximately 0.2 dB compared
to infinitely fine quantization. For this reason, quantization to more than
eight levels can yield little performance improvement [12]. What price is paid for
such improved soft-decision-decoder performance? In the case of hard-decision
decoding, a single bit is used to describe each code symbol, while for eight-level
quantized soft-decision decoding 3 bits are used to describe each code symbol;
therefore, three times the amount of data must be handled during the decoding
process. Hence the price paid for soft-decision decoding is an increase in required
memory size at the decoder (and possibly a speed penalty).
Block decoding algorithms and convolutional decoding algorithms have been
devised to operate with hard or soft decisions. However, soft-decision decoding is
generally not used with block codes because it is considerably more difficult than
hard-decision decoding to implement. The most prevalent use of soft-decision
decoding is with the Viterbi convolutional decoding algorithm, since with Viterbi
decoding, soft decisions represent only a trivial increase in computation.
7.3.2.1 Binary Symmetric Channel
A binary symmetric channel (BSC) is a discrete memoryless channel (see
Section 6.3.1) that has binary input and output alphabets and symmetric transition
probabilities. It can be described by the conditional probabilities

as illustrated in Figure 7.9. The probability that an output symbol will differ from
the input symbol is p, and the probability that the output symbol will be identical to
the input symbol is (1 − p). The BSC is an example of a hard-decision channel,
which means that, even though continuous-valued signals may be received by the
demodulator, a BSC allows only firm decisions such that each demodulator output
symbol, zji, as shown in Figure 7.1, consists of one of two binary values. The indexing
of zji pertains to the jth code symbol of the ith branch word, Zi. The demodulator
then feeds the sequence Z = {Zi} to the decoder.
Let U(m) be a transmitted codeword over a BSC with symbol error probability
p, and let Z be the corresponding received decoder sequence. As noted previously,
a maximum likelihood decoder chooses the codeword U(m´) that maximizes the
likelihood P(ZU(m)) or its logarithm. For a BSC, this is equivalent to choosing the
codeword U(m´) that is closest in Hamming distance to Z [8]. Thus Hamming distance
is an appropriate metric to describe the distance or closeness of fit between
U(m) and Z. From all the possible transmitted sequences U(m), the decoder chooses
the U(m´) sequence for which the distance to Z is minimum.
Suppose that U(m) and Z are each L-bit-long sequences and that they differ in
dm positions [i.e., the Hamming distance between U(m) and Z is dm]. Then, since the

Figure 7.9 Binary symmetric channel (hard-decision channel).
channel is assumed to be memoryless, the probability that this U(m) was transformed
to the specific received Z at distance dm from it can be written as
![]()
and the log-likelihood function is

If we compute this quantity for each possible transmitted sequence, the last term in
the equation will be constant in each case. Assuming that p < 0.5, we can express
Equation (7.7) as
![]()
where A and B are positive constants. Therefore, choosing the codeword U(m´),
such that the Hamming distance dm to the received sequence Z is minimized, corresponds
to maximizing the likelihood or log-likelihood metric. Consequently, over a
BSC, the log-likelihood metric is conveniently replaced by the Hamming distance,
and a maximum likelihood decoder will choose, in the tree or trellis diagram, the
path whose corresponding sequence U(m´) is at the minimum Hamming distance to
the received sequence Z.
7.3.2.2 Gaussian Channel
For a Gaussian channel, each demodulator output symbol zji, as shown in
Figure 7.1, is a value from a continuous alphabet. The symbol zji cannot be labeled
as a correct or incorrect detection decision. Sending the decoder such soft decisions
can be viewed as sending a family of conditional probabilities of the different symbols
(see Section 6.3.1). It can be shown [8] that maximizing P(Z|U(m)) is equivalent
to maximizing the inner product between the codeword sequence U(m) (consisting
of binary symbols represented as bipolar values) and the analog-valued received
sequence Z. Thus, the decoder chooses the codeword U(m) if it maximizes

This is equivalent to choosing the codeword U(m´) that is closest in Euclidean distance
to Z. Even though the hard- and soft-decision channels require different metrics,
the concept of choosing the codeword U(m´) that is closest to the received
sequence, Z, is the same in both cases. To implement the maximization of Equation
(7.9) exactly, the decoder would have to be able to handle analog-valued arithmetic
operations. This is impractical because the decoder is generally implemented
digitally. Thus it is necessary to quantize the received symbols zji. Does Equation
(7.9) remind you of the demodulation treatment in Chapters 3 and 4? Equation
(7.9) is the discrete version of correlating an input received waveform, r(t), with a
reference waveform, si(t), as expressed in Equation (4.15). The quantized Gaussian
channel, typically referred to as a soft-decision channel, is the channel model
assumed for the soft-decision decoding described earlier.
7.3.3 The Viterbi Convolutional Decoding Algorithm
The Viterbi decoding algorithm was discovered and analyzed by Viterbi [13] in 1967.
The Viterbi algorithm essentially performs maximum likelihood decoding; however,
it reduces the computational load by taking advantage of the special structure
in the code trellis. The advantage of Viterbi decoding, compared with brute-force
decoding, is that the complexity of a Viterbi decoder is not a function of the number
of symbols in the codeword sequence. The algorithm involves calculating a
measure of similarity, or distance, between the received signal, at time ti, and all the
trellis paths entering each state at time ti. The Viterbi algorithm removes from consideration
those trellis paths that could not possibly be candidates for the maximum
likelihood choice. When two paths enter the same state, the one having the
best metric is chosen; this path is called the surviving path. This selection of surviving
paths is performed for all the states. The decoder continues in this way to advance
deeper into the trellis, making decisions by eliminating the least likely paths.
The early rejection of the unlikely paths reduces the decoding complexity. In 1969,
Omura [14] demonstrated that the Viterbi algorithm is, in fact, maximum likelihood.
Note that the goal of selecting the optimum path can be expressed, equivalently,
as choosing the codeword with the maximum likelihood metric, or as
choosing the codeword with the minimum distance metric.
7.3.4 An Example of Viterbi Convolutional Decoding
For simplicity, a BSC is assumed; thus Hamming distance is a proper distance measure.
The encoder for this example is shown in Figure 7.3, and the encoder trellis
diagram is shown in Figure 7.7. A similar trellis can be used to represent the decoder,
as shown in Figure 7.10. We start at time t1 in the 00 state (flushing the encoder
between messages provides the decoder with starting-state knowledge).
Since in this example, there are only two possible transitions leaving any state, not
all branches need be shown initially. The full trellis structure evolves after time t3.
The basic idea behind the decoding procedure can best be understood by examining
the Figure 7.7 encoder trellis in concert with the Figure 7.10 decoder trellis. For
the decoder trellis it is convenient at each time interval, to label each branch with
the Hamming distance between the received code symbols and the branch word
corresponding to the same branch from the encoder trellis. The example in Figure
7.10 shows a message sequence m, the corresponding codeword sequence U, and a
noise corrupted received sequence Z = 11 01 01 10 01 . . . . The branch words seen
on the encoder trellis branches characterize the encoder in Figure 7.3, and are
known a priori to both the encoder and the decoder. These encoder branch words
are the code symbols that would be expected to come from the encoder output as a
result of each of the state transitions. The labels on the decoder trellis branches are
accumulated by the decoder on the fly. That is, as the code symbols are received,

Figure 7.10 Decoder trellis diagram (rate 1/2 , K = 3).
each branch of the decoder trellis is labeled with a metric of similarity (Hamming
distance) between the received code symbols and each of the branch words for that
time interval. From the received sequence Z, shown in Figure 7.10, we see that the
code symbols received at (following) time t1 are 11. In order to label the decoder
branches at (departing) time t1 with the appropriate Hamming distance metric, we
look at the Figure 7.7 encoder trellis. Here we see that a state 00 → 00 transition
yields an output branch word of 00. But we received 11. Therefore, on the decoder
trellis we label the state 00 → 00 transition with Hamming distance between them,
namely 2. Looking at the encoder trellis again, we see that a state 00 → 10 transition
yields an output branch word of 11, which corresponds exactly with the code
symbols we received at time t1. Therefore, on the decoder trellis, we label the state
00 → 10 transition with a Hamming distance of 0. In summary, the metric entered
on a decoder trellis branch represents the difference (distance) between what was
received and what “should have been” received had the branch word associated
with that branch been transmitted. In effect, these metrics describe a correlationlike
measure between a received branch word and each of the candidate branch
words. We continue labeling the decoder trellis branches in this way as the symbols
are received at each time ti. The decoding algorithm uses these Hamming distance
metrics to find the most likely (minimum distance) path through the trellis.
The basis of Viterbi decoding is the following observation: If any two paths in
the trellis merge to a single state, one of them can always be eliminated in the
search for an optimum path. For example, Figure 7.11 shows two paths merging at
time t5 to state 00. Let us define the cumulative Hamming path metric of a given
path at time ti as the sum of the branch Hamming distance metrics along that path
up to time ti. In Figure 7.11 the upper path has metric 4; the lower has metric 1. The
upper path cannot be a portion of the optimum path because the lower path, which

Figure 7.11 Path metrics for two merging paths.
enters the same state, has a lower metric. This observation holds because of the
Markov nature of the encoder state: The present state summarizes the encoder history
in the sense that previous states cannot affect future states or future output
branches.
At each time ti there are 2K − 1 states in the trellis, where K is the constraint
length, and each state can be entered by means of two paths. Viterbi decoding consists
of computing the metrics for the two paths entering each state and eliminating
one of them. This computation is done for each of the 2K − 1 states or nodes at time ti;
then the decoder moves to time ti + 1 and repeats the process. At a given time, the
winning path metric for each state is designated as the state metric for that state at
that time. The first few steps in our decoding example are as follows (see Figure
7.12). Assume that the input data sequence m, codeword U, and received sequence
Z are as shown in Figure 7.10. Assume that the decoder knows the correct initial
state of the trellis. (This assumption is not necessary in practice, but simplifies the
explanation.) At time t1 the received code symbols are 11. From state 00 the only
possible transitions are to state 00 or state 10, as shown in Figure 7.12a. State
00 → 00 transition has branch metric 2; state 00 → 10 transition has branch metric 0.
At time t2 there are two possible branches leaving each state, as shown in Figure 7.12b.
The cumulative metrics of these branches are labeled state metrics a , b, c, and
d, corresponding to the terminating state. At time t3 in Figure 7.12c there are again
two branches diverging from each state. As a result, there are two paths entering
each state at time t4. One path entering each state can be eliminated, namely, the
one having the larger cumulative path metric. Should metrics of the two entering
paths be of equal value, one path is chosen for elimination by using an arbitrary
rule. The surviving path into each state is shown in Figure 7.12d. At this point in
the decoding process, there is only a single surviving path, termed the common
stem, between times t1 and t2. Therefore, the decoder can now decide that the state
transition which occurred between t1 and t2 was 00 → 10. Since this transition is
produced by an input bit one, the decoder outputs a one as the first decoded bit.
Figure 7.12 Selection of survivor paths, (a) Survivors at t2. (b) Survivors at t3. (c) Metric comparisons at t4. (d) Survivors at t4. (e) Metric comparisons at t5.(f) Survivors at t5. (g) Metric comparisons at t6. (h) Survivors at t6.
Here we can see how the decoding of the surviving branch is facilitated by having
drawn the trellis branches with solid lines for input zeros and dashed lines for input
ones. Note that the first bit was not decoded until the path metric computation had
proceeded to a much greater depth into the trellis. For a typical decoder implementation,
this represents a decoding delay which can be as much as five times the constraint
length in bits.
At each succeeding step in the decoding process, there will always be two
possible paths entering each state; one of the two will be eliminated by comparing
the path metrics. Figure 7.12e shows the next step in the decoding process. Again,
at time t5 there are two paths entering each state, and one of each pair can be eliminated.
Figure 7.12f shows the survivors at time t5. Notice that in our example we
cannot yet make a decision on the second input data bit because there still are two
paths leaving the state 10 node at time t2. At time t6 in Figure 7.12g we again see the
pattern of remerging paths, and in Figure 7.12h we see the survivors at time t6.
Also, in Figure 7.12h the decoder outputs one as the second decoded bit, corre-

Figure 7.12 (Continued)
sponding to the single surviving path between t2 and t3. The decoder continues in
this way to advance deeper into the trellis and to make decisions on the input data
bits by eliminating all paths but one.
Pruning the trellis (as paths remerge) guarantees that there are never more
paths than there are states. For this example, verify that after each pruning in Figures
7.12b, d, f, and h, there are only 4 paths. Compare this to attempting a “brute
force” maximum-likelihood sequence estimation without using the Viterbi algorithm.
In that case, the number of possible paths (representing possible sequences)
is an exponential function of sequence length. For a binary codeword sequence that
has a length of L branch words, there are 2L possible sequences.
7.3.5 Decoder Implementation
In the context of the trellis diagram of Figure 7.10, transitions during any one time
interval can be grouped into 2v − 1 disjoint cells, each cell depicting four possible
transitions, where v = K − 1 is called the encoder memory. For the K = 3 example,
v = 2 and 2v − 1 = 2 cells. These cells are shown in Figure 7.13, where a, b, c, and d
refer to the states at time ti, and a’, b’, c’, and d’ refer to the states at time ti + 1.
Shown on each transition is the branch metric
, where the subscript indicates that
the metric corresponds to the transition from state x to state y. These cells and the
associated logic units that update the state metrics
, where x designates a particular
state, represent the basic building blocks of the decoder.
7.3.5.1 Add-Compare-Select Computation
Continuing with the K = 3, 2-cell example, Figure 7.14 illustrates the logic unit
that corresponds to cell 1. The logic executes the special purpose computation
called add-compare-select (ACS). The state metric
is calculated by adding the
previous-time state metric of state a,
, to the branch metric
and the previoustime
state metric of state c,
, to the branch metric
. This results in two possible
path metrics as candidates for the new state metric
. The two candidates are
compared in the logic unit of Figure 7.14. The largest likelihood (smallest distance)
of the two path metrics is stored as the new state metric
for state a . Also stored
is the new path history
for state a, where
is the message-path history of the
state augmented by the data of the winning path.

Figure 7.14 Logic unit that implements the add-compare-select functions corresponding to cell #1.
Also shown in Figure 7.14 is the cell-1 ACS logic that yields the new state
metric
and the new path history
. This ACS operation is similarly performed
for the paths in other cells. The oldest bit on the path with the smallest state metric
forms the decoder output.
7.3.5.2 Add-Compare-Select as seen on the Trellis
Consider the same example that was used for describing Viterbi decoding in
Section 7.3.4. The message sequence was m = 1 1 0 1 1, the codeword sequence was
U = 11 01 01 00 01, and the received sequence was Z = 11 01 01 10 01. Figure 7.15
depicts a decoding trellis diagram similar to Figure 7.10. A branch metric that labels
each branch is the Hamming distance between the received code symbols and
the corresponding branch word from the encoder trellis. Additionally, the Figure
7.15 trellis indicates a value at each state x, and for each time from time t2 to t6,
which is a state metric
. We perform the add-compare-select (ACS) operation
when there are two transitions entering a state, as there are for times t4 and later.
For example at time t4, the value of the state metric for state a is obtained by incrementing
the state metric
at time t3 with the branch metric
yielding a
candidate value of 4. Simultaneously, the state metric
at time t3 is incremented
with the branch metric
yielding a candidate value of 3. The select
operation of the ACS process selects the largest-likelihood (minimum distance)
path metric as the new state metric; hence, for state a at time t4, the new state metric
is
. The winning path is shown with a heavy line and the path that has
been dropped is shown with a lighter line. On the trellis of Figure 7.15, observe the
state metrics from left to right. Verify that at each time, the value of each state metric
is obtained by incrementing the connected state metric from the previous time
along the winning path (heavy line) with the branch metric between them. At some

Figure 7.15 Add-compare-select computations in Viterbi decoding.
point in the trellis (after a time interval of 4 or 5 times the constraint length), the
oldest bits can be decoded. As an example, looking at time t6 in Figure 7.15, we see
that the minimum-distance state metric has a value of 1. From this state d, the winning
path can be traced back to time t1, and one can verify that the decoded message
is the same as the original message, by the convention that dashed and solid
lines represent binary ones and zeros respectively.
7.3.6 Path Memory and Synchronization
The storage requirements of the Viterbi decoder grow exponentially with constraint
length K. For a code with rate 1/n, the decoder retains a set of 2K − 1 paths
after each decoding step. With high probability, these paths will not be mutually
disjoint very far back from the present decoding depth [12]. All of the 2K − 1 paths
tend to have a common stem which eventually branches to the various states. Thus
if the decoder stores enough of the history of the 2K − 1 paths, the oldest bits on all
paths will be the same. A simple decoder implementation, then, contains a fixed
amount of path history and outputs the oldest bit on an arbitrary path each time it
steps one level deeper into the trellis. The amount of path storage required is [12]
![]()
where h is the length of the information bit path history per state. A refinement,
which minimizes the value of h, uses the oldest bit on the most likely path as the
decoder output, instead of the oldest bit on an arbitrary path. It has been demonstrated
[12] that a value of h of 4 or 5 times the code constraint length is sufficient
for near-optimum decoder performance. The storage requirement u is the basic
limitation on the implementation of Viterbi decoders. Commercial decoders are
limited to a constraint length of about K = 10. Efforts to increase coding gain by
further increasing constraint length are met by the exponential increase in memory
requirements (and complexity) that follows from Equation (7.10).
Branch word synchronization is the process of determining the beginning of a
branch word in the received sequence. Such synchronization can take place without
new information being added to the transmitted symbol stream because the received
data appear to have an excessive error rate when not synchronized. Therefore, a simple
way of accomplishing synchronization is to monitor some concomitant indication
of this large error rate, that is, the rate at which the state metrics are increasing or the
rate at which the surviving paths in the trellis merge. The monitored parameters are
compared to a threshold, and synchronization is then adjusted accordingly.
7.4 PROPERTIES OF CONVOLUTIONAL CODES
7.4.1 Distance Properties of Convolutional Codes
Consider the distance properties of convolutional codes in the context of the simple
encoder in Figure 7.3 and its trellis diagram in Figure 7.7. We want to evaluate
the distance between all possible pairs of codeword sequences. As in the case of
block codes (see Section 6.5.2), we are interested in the minimum distance between
all pairs of such codeword sequences in the code, since the minimum distance is related
to the error-correcting capability of the code. Because a convolutional code is
a group or linear code [6], there is no loss in generality in simply finding the minimum
distance between each of the codeword sequences and the all-zeros sequence.
In other words, for a linear code, any test message is just as “good” as any other
test message. So, why not choose one that is easy to keep track of—namely, the allzeros
sequence? Assuming that the all-zeros input sequence was transmitted, the
paths of interest are those that start and end in the 00 state and do not return to the
00 state anywhere in between. An error will occur whenever the distance of any
other path that merges with the a = 00 state at time ti is less than that of the allzeros
path up to time ti, causing the all-zeros path to be discarded in the decoding
process. In other words, given the all-zeros transmission, an error occurs whenever
the all-zeros path does not survive. Thus, an error of interest is associated with a
surviving path that diverges from and then remerges to the all-zeros path. One
might ask, Why is it necessary for the path to remerge? Isn’t the divergence enough
to indicate an error? Yes, of course, but an error characterized by only a divergence
means that the decoder, from that point on, will be outputting “garbage” for
the rest of the message duration. We want to quantify the decoder’s capability in
terms of errors that will usually take place—that is, we want to learn the “easiest”
way for the decoder to make an error. The minimum distance for making such an
error can be found by exhaustively examining every path from the 00 state to the
00 state. First, let us redraw the trellis diagram, shown in Figure 7.16, labeling each
branch with its Hamming distance from the all-zeros codeword instead of with its
branch word symbols. The Hamming distance between two unequal-length sequences
will be found by first appending the necessary number of zeros to the
shorter sequence to make the two sequences equal in length. Consider all the paths
that diverge from the all-zeros path and then remerge for the first time at some arbitrary
node. From Figure 7.16 we can compute the distances of these paths from
the all-zeros path. There is one path at distance 5 from the all-zeros path; this path
departs from the all-zeros path at time t1 and merges with it at time t4. Similarly,
there are two paths at distance 6, one which departs at time t1 and merges at time t5,
and the other which departs at time t1 and merges at time t6, and so on. We can also
see from the dashed and solid lines of the diagram that the input bits for the
distance 5 path are 1 0 0; it differs in only one input bit from the all-zeros input
sequence. Similarly, the input bits for the distance 6 paths are 1 1 0 0 and 1 0 1 0 0;
each differs in two positions from the all-zeros path. The minimum distance in the
set of all arbitrarily long paths that diverge and remerge, called the minimum free
distance, or simply the free distance, is seen to be 5 in this example, as shown with
the heavy lines in Figure 7.16. For calculating the error-correcting capability of the
code, we repeat Equation (6.44) with the minimum distance dmin replaced by the
free distance df as


Figure 7.16 Trellis diagram, labeled with distances from the all-zeros path.
where
means the largest integer no greater than x. Setting df = 5, we see that the
code, characterized by the Figure 7.3 encoder, can correct any two channel errors.
(See Section 7.4.1.1.)
A trellis diagram represents “the rules of the game.” It is a shorthand description
of all the possible transitions and their corresponding start and finish states associated
with a particular finite-state machine. The trellis diagram offers some
insight into the benefit (coding gain) when using error-correction coding. Consider
Figure 7.16 and the possible divergence-remergence error paths. From this picture
one sees that the decoder cannot make an error in any arbitrary way. The error
path must follow one of the allowable transitions. The trellis pinpoints all such allowable
paths. By having encoded the data in this way, we have placed constraints
on the transmitted signal. The decoder knows these constraints, and this knowledge
enables the system to more easily (using less Eb/N0) meet some error performance
requirements.
Although Figure 7.16 presents the computation of free distance in a straightforward
way, a more direct closed-form expression can be obtained by starting with
the state diagram in Figure 7.5. First, we label the branches of the state diagram as
either D0 = 1, D1, or D2, shown in Figure 7.17, where the exponent of D denotes the
Hamming distance from the branch word of that branch to the all-zeros branch.
The self-loop at node a can be eliminated since it contributes nothing to the distance
properties of a codeword sequence relative to the all-zeros sequence. Furthermore,
node a can be split into two nodes (labeled a and e), one of which
represents the input and the other the output of the state diagram. All paths originating
at a = 00 and terminating at e = 00 can be traced on the modified state diagram
of Figure 7.17. We can calculate the transfer function of path a b c e (starting
and ending at state 00) in terms of the indeterminate “placeholder” D, as D2 D D2
= D5. The exponent of D represents the cumulative tally of the number of ones in
the path, and hence the Hamming distance from the all-zeros path. Similarly, the

Figure 7.17 State diagram, labeled according to distance from the all-zeros path.
pathsa b d c e and a b c b c e each have the transfer function D 6 and thus a Hamming
distance of 6 from the all-zeros path. We now write the state equations as

where Xa, . . . , Xe are dummy variables for the partial paths to the intermediate
nodes. The transfer function, T(D), sometimes called the generating function of the
code can be expressed as T(D) = Xe/Xa. By solving the state equations shown in
Equation (7.12), we obtain [15, 16]

The transfer function for this code indicates that there is a single path of distance 5
from the all-zeros path, two of distance 6, four of distance 7, and in general, there
are
paths of distance
from the all-zeros path, where
The free
distance df of the code is the Hamming weight of the lowest-order term in the expansion
of T(D). In this example df = 5. In evaluating distance properties, the transfer
function, T(D), cannot be used for long constraint lengths since the complexity
of T(D) increases exponentially with constraint length.
The transfer function can be used to provide more detailed information than
just the distance of the various paths. Let us introduce a factor L into each branch
of the state diagram so that the exponent of L can serve as a counter to indicate the
number of branches in any given path from state a = 00 to state e = 00. Furthermore,
we can introduce a factor N into all branch transitions caused by the input bit
one. Thus, as each branch is traversed, the cumulative exponent on N increases by
one, only if that branch transition is due to an input bit one. For the convolutional
code characterized in the Figure 7.3 example, the additional factors L and N are
shown on the modified state diagram of Figure 7.18. Equations (7.12) can now be
modified as follows:

The transfer function of this augmented state diagram is

Thus, we can verify some of the path properties displayed in Figure 7.16. There is
one path of distance 5, length 3, which differs in one input bit from the all-zeros
path. There are two paths of distance 6, one of which is length 4, the other length 5,
and both differ in two input bits from the all-zeros path. Also, of the distance 7
paths, one is of length 5, two are of length 6, and one is of length 7; all four paths
correspond to input sequences that differ in three input bits from the all-zeros path.
Thus if the all-zeros path is the correct path and the noise causes us to choose one
of the incorrect paths of distance 7, three bit errors will be made.

Figure 7.18 State diagram, labeled according to distance, length, and number of input ones.
7.4.1.1 Error-Correcting Capability of Convolutional Codes
In the study of block codes in Chapter 6, we saw that the error-correcting capability,
t, represented the number of code symbol errors that could, with maximum
likelihood decoding, be corrected in each block length of the code. However,
when decoding convolutional codes, the error-correcting capability cannot be
stated so succinctly. With regard to Equation (7.11), we can say that the code can,
with maximum likelihood decoding, correct t errors within a few constraint lengths,
where “few” here means 3 to 5. The exact length depends on how the errors are
distributed. For a particular code and error pattern, the length can be bounded
using transfer function methods. Such bounds are described later.
7.4.2 Systematic and Nonsystematic Convolutional Codes
A systematic convolutional code is one in which the input k-tuple appears as part of
the output branch word n-tuple associated with that k-tuple. Figure 7.19 shows a binary,
rate 1/2, K = 3 systematic encoder. For linear block codes, any nonsystematic
code can be transformed into a systematic code with the same block distance properties.
This is not the case for convolutional codes. The reason for this is that convolutional
codes depend largely on free distance; making the convolutional code
systematic, in general, reduces the maximum possible free distance for a given
constraint length and rate.
Table 7.1 shows the maximum free distance for rate 1/2 systematic and nonsystematic
codes for K = 2 through 8. For large constraint lengths the results are even
more widely separated [17].

Figure 7.19 Systematic convolutionalencoder, rate 1/2, K = 3.

7.4.3 Catastrophic Error Propagation in Convolutional Codes
A catastrophic error is defined as an event whereby a finite number of code symbol
errors cause an infinite number of decoded data bit errors. Massey and Sain [18]
have derived a necessary and sufficient condition for convolutional codes to display
catastrophic error propagation. For rate 1/n codes with register taps designated by
polynomial generators, as described in Section 7.2.1, the condition for catastrophic
error propagation is that the generators have a common polynomial factor (of
degree at least one). For example, Figure 7.20a illustrates a rate 1/2, K = 3 encoder
with upper polynomial g1(X) and lower polynomial g2(X), as follows:

The generators g1(X) and g2(X) have in common the polynomial factor 1 + X, since
![]()
Therefore, the encoder in Figure 7.20a can manifest catastrophic error propagation.

Figure 7.20 Encoder displaying catastrophic error propagation. (a) Encoder. (b) State diagram.
In terms of the state diagram for any-rate code, catastrophic errors can occur
if, and only if, any closed-loop path in the diagram has zero weight (zero distance
from the all-zeros path). To illustrate this, consider the example of Figure 7.20. The
state diagram in Figure 7.20b is drawn with the state a = 00 node split into two
nodes, a and e, as before. Assuming that the all-zeros path is the correct path, the
incorrect path a b d d . . . d c e has exactly 6 ones, no matter how many times we go
around the self-loop at node d. Thus for a BSC, for example, three channel errors
may cause us to choose this incorrect path. An arbitrarily large number of errors
(two plus the number of times the self-loop is traversed) can be made on such a
path. We observe that for rate 1/n codes, if each adder in the encoder has an even
number of connections, the self-loop corresponding to the all-ones data state will
have zero weight, and consequently, the code will be catastrophic.
The only advantage of a systematic code, described earlier, is that it can never
be catastrophic, since each closed loop must contain at least one branch generated
by a nonzero input bit, and thus each closed loop must have a nonzero code symbol.
However, it can be shown [19] that only a small fraction of nonsystematic
codes (excluding those where all adders have an even number of taps) are
catastrophic.
7.4.4 Performance Bounds for Convolutional Codes
The probability of bit error, PB, for a binary convolutional code using hard-decision
decoding can be shown [8] to be upper bounded as follows:

where p is the probability of channel symbol error. For the example of Figure 7.3,
T(D, N) is obtained from T(D, L, N ) by setting L = 1 in Equation (7.15).

and

Combining Equations (7.17) and (7.19), we can write

For coherent BPSK modulation over an additive white Gaussian noise
(AWGN) channel, it can be shown [8] that the bit error probability is bounded by

where
Ec /N0 = rEb /N0
Eb /N0 = ratio of information bit energy to noise power spectral density
Ec /N0 = ratio of channel symbol energy to noise power spectral density
r = k/n = rate of the code
and Q(x) is defined in Equations (3.43) and (3.44) and tabulated in Table B.1.
Therefore, for the rate code with free distance df = 5, in conjunction with coherent
BPSK and hard-decision decoding, we can write

7.4.5 Coding Gain
Coding gain, as presented in Equation (6.19), is defined as the reduction, usually
expressed in decibels, in the required Eb/N0 to achieve a specified error probability
of the coded system over an uncoded system with the same modulation and channel
characteristics. Table 7.2 lists an upper bound on the coding gains, compared to
uncoded coherent BPSK, for several maximum free distance convolutional codes
with constraint lengths varying from 3 to 9 over a Gaussian channel with harddecision
decoding. The table illustrates that it is possible to achieve significant coding
gain even with a simple convolutional code. The actual coding gain will vary
with the required bit error probability [20].
Table 7.3 lists the measured coding gains, compared to uncoded coherent
BPSK, achieved with hardware implementation or computer simulation over a
Gaussian channel with soft-decision decoding [21]. The uncoded Eb/N0 is given in
the leftmost column. From Table 7.3 we can see that coding gain increases as the
bit error probability is decreased. However, the coding gain cannot increase indefinitely;
it has an upper bound as shown in the table. This bound in decibels can be
shown [21] to be
![]()
where r is the code rate and df is the free distance. Examination of Table 7.3 also
reveals that at PB = 10−7, for code rates of 1/2 and 2/3, the weaker codes tend to be
closer to the upper bound than are the more powerful codes.
Typically, Viterbi decoding is used over binary input channels with either
hard or 3-bit soft quantized outputs. The constraint lengths vary between 3 and 9,
the code rate is rarely smaller than 1/3 , and the path memory is usually a few con-
straint lengths [12]. The path memory refers to the depth of the input bit history
stored by the decoder. From the Viterbi decoding example in Section 7.3.4, one
might question the notion of a fixed path memory. It seems from the example that
the decoding of a branch word, at any arbitrary node, can take place as soon as
there is only a single surviving branch at that node. That is true; however, to actually
implement the decoder in this way would entail an extensive amount of processing
to continually check when the branch word can be decoded. Instead, a fixed
delay is provided, after which the branch word is decoded. It has been shown [12,
22] that tracing back from the state with the lowest state metric, over a fixed
amount of path history (about 4 or 5 times the constraint length), is sufficient to
limit the degradation from the optimum decoder performance to about 0.1 dB for
the BSC and Gaussian channels. Typical error performance simulation results are
shown in Figure 7.21 for Viterbi decoding with hard decision quantization [12]. Notice
that each increment in constraint length improves the required Eb/N0 by a factor
of approximately 0.5 dB at PB = 10−5.
Figure 7.21 Bit error probability versus Eb /N0 for rate 1/2 codes using coherent BPSK over a BSC, Viterbi decoding, and a 32-bit path memory. (Reprinted with permission from J. A. Heller and I. M. Jacobs, “Viterbi Decoding for Satellite and Space Communication,” IEEE Trans. Commun. Technol., vol. COM19, no. 5, October 1971, Fig. 7, p. 84. © 1971 IEEE.)
7.4.6 Best Known Convolutional Codes
The connection vectors or polynomial generators of a convolutional code are usually
selected based on the code’s free distance properties. The first criterion is to
select a code that does not have catastrophic error propagation and that has the
maximum free distance for the given rate and constraint length. Then the number
of paths at the free distance df, or the number of data bit errors the paths represent,
should be minimized. The selection procedure can be further refined by considering
the number of paths or bit errors at df + 1, at df + 2, and so on, until only one
code or class of codes remains. A list of the best known codes of rate 1/2, K = 3 to 9,
and rate 1/3, K = 3 to 8, based on this criterion was compiled by Odenwalder [3, 23]
and is given in Table 7.4. The connection vectors in this table represent the pres-
ence or absence (1 or 0) of a tap connection on the corresponding stage of the convolutional
encoder, the leftmost term corresponding to the leftmost stage of the encoder
register. It is interesting to note that these connections can be inverted
(leftmost and rightmost can be interchanged in the above description). Under
the condition of Viterbi decoding, the inverted connections give rise to codes
with identical distance properties, and hence identical performance, as those in
Table 7.4.

7.4.7 Convolutional Code Rate Trace-Off
7.4.7.1 Performance with Coherent PSK Signaling
The error-correcting capability of a coding scheme increases as the number of
channel symbols n per information bit k increases, or the rate k/n decreases. However,
the channel bandwidth and the decoder complexity both increase with n. The
advantage of lower code rates when using convolutional codes with coherent PSK,
is that the required Eb /N0 is decreased (for a large range of code rates), permitting
the transmission of higher data rates for a given amount of power, or permitting reduced
power for a given data rate. Simulation studies have shown [16, 22] that for a
fixed constraint length, a decrease in the code rate from 1/2 to 1/3 results in a reduction
of the required Eb /N0 of roughly 0.4 dB. However, the corresponding increase in
decoder complexity is about 17%. For smaller values of code rate, the improvement
in performance relative to the increased decoding complexity diminishes
rapidly [22]. Eventually, a point is reached where further decrease in code rate is
characterized by a reduction in coding gain. (See Section 9.7.7.2.)
7.4.7.2 Performance with Noncoherent Orthogonal Signaling
In contrast to PSK, there is an optimum code rate of about 1/2 for noncoherent
orthogonal signaling. Error performance at rates of 1/3, 2/3, 3/4 and are each worse than
those for rate 1/2. For a fixed constraint length, the rate 1/3, 2/3, 3/4 and codes typically
degrade by about 0.25, 0.5, and 0.3 dB, respectively, relative to the rate 1/2 performance
[16].
7.4.8 Soft-Decision Viterbi Decoding
For a rate 1/2 binary convolutional code system, the demodulator delivers two code
symbols at a time to the decoder. For hard-decision (2-level) decoding, each pair of
received code symbols can be depicted on a plane, as one of the corners of a
square, as shown in Figure 7.22a. The corners are labeled with the binary numbers
(0,0), (0,1), (1,0), and (1,1), representing the four possible hard-decision values that
the two code symbols might have. For 8-level soft-decision decoding, each pair of
code symbols can be similarly represented on an equally spaced 8-level by 8-level
plane, as a point from the set of 64 points shown in Figure 7.22b. In this
soft-decision case, the demodulator no longer delivers firm decisions; it delivers
quantized noisy signals (soft decisions).
The primary difference between hard-decision and soft-decision Viterbi decoding,
is that the soft-decision algorithm cannot use a Hamming distance metric
because of its limited resolution. A distance metric with the needed resolution is
Euclidean distance, and to facilitate its use, the binary numbers of 1 and 0 are
transformed to the octal numbers 7 and 0, respectively. This can be seen in Figure
7.22c, where the corners of the square have been re-labeled accordingly; this allows
us to use a pair of integers, each in the range of 0 to 7, for describing any point in
the 64-point set. Also shown in Figure 7.22c is the point 5,4, representing an
example of a pair of noisy code-symbol values that might stem from a

Figure 7.22 (a) Hard-decision plane (b) 8-level by 8-level soft-decision plane (c) Example of soft code symbols (d) Encoding trellis section (e) Decoding trellis section.
demodulator. Imagine that the square in Figure 7.22c has coordinates x and y.
Then, what is the Euclidean distance between the noisy point 5,4 and the noiseless
point 0,0? It is![]()
Similarly, if we ask what is the Euclidean distance between the noisy point 5,4 and the
noiseless point 7,7?
It is![]()
Soft-decision Viterbi decoding, for the most part, proceeds in the same way as
hard-decision decoding (as described in Sections 7.3.4 and 7.3.5). The only difference
is that Hamming distances are not used. Consider how soft-decision decoding is performed
with the use of Euclidean distances. Figure 7.22d shows the first section of an
encoding trellis, originally presented in Figure 7.7, with the branch words transformed
from binary to octal. Suppose that a pair of soft-decision code symbols with
values 5,4 arrives at a decoder during the first transition interval. Figure 7.22e shows
the first section of a decoding trellis. The metric![]()
representing the Euclidean distance between the arriving 5,4 and the 0,0 branch word, is placed
on the solid line. Similarly, the metric
representing the Euclidean distance between
the arriving 5,4 and the 7,7 code symbols, is placed on the dashed line. The rest of the task,
pruning the trellis in search of a common stem, proceeds in the same way as harddecision
decoding. Note that in a real convolutional decoding chip, the Euclidean distance
is not actually used for a soft-decision metric; instead, a monotonic metric that
has similar properties and is easier to implement is used. An example of such a metric
is the Euclidean distance-squared, in which case the square-root operation shown
above is eliminated. Further, if the binary code symbols are represented with bipolar
values, then the inner-product metric in Equation (7.9) can be used. With such a metric,
we would seek maximum correlation rather than minimum distance.
Relevant NI products
Customers interested in this topic were also interested in the following NI products:
For the complete list of tutorials, return to the NI RF and Communications Fundamentals Main page.
Buy the Book
Purchase Digital Communications from Prentice Hall Professional through this link and receive the following:
-
Between 15% and 30% Off
-
Free Shipping and Handling
Reader Comments | Submit a comment »
Legal
Excerpt from the book published by Prentice Hall Professional (http://www.phptr.com).
Copyright Prentice Hall Inc., A Pearson Education Company, Upper Saddle River, New Jersey 07458.
This material is protected under the copyright laws of the U.S. and other countries and any uses not in conformity with the copyright laws are prohibited, including but not limited to reproduction, DOWNLOADING, duplication, adaptation and transmission or broadcast by any media, devices or processes.






