Academic Company Events NI Developer Zone Support Solutions Products & Services Contact NI MyNI

Document Type: Prentice Hall
Author: Bernard Sklar
Book: Digital Communications: Fundamentals and Applications (2nd Edition)
Copyright: 2001
ISBN: 0130847887
NI Supported: No
Publish Date: Jan 1, 2008


Feedback


Yes No

Related Categories

Related Links - Developer Zone

Related Links - Products and Services

Convolutional Decoder

2 ratings | 4.50 out of 5
Print | PDF

Overview

National Instruments has partnered with Prentice Hall to bring you large portions of in-depth technical topics from several PTR RF and Communications books, including Digital Communications: Fundamentals and Applications, 2nd Edition. This series of content is designed for a broad range of audiences, from experts who want to review a specific topic to students who need easy-to-understand documentation for their projects.

For the complete list of RF topics, please visit the RF and Communications Resource Main Page.

7.3 FORMULATION OF THE CONVOLUTIONAL DECODING PROBLEM

7.3.1 Maximum Likelihood Decoding

If all input message sequences are equally likely, a decoder that achieves the minimum probability of error is one that compares the conditional probabilities, also called the likelihood functions P(Z|U(m)), where Z is the received sequence and

U(m) is one of the possible transmitted sequences, and chooses the maximum. The decoder chooses U()  if

The maximum likelihood concept, as stated in Equation (7.1), is a fundamental development of decision theory (see Appendix B); it is the formalization of a “common-sense” way to make decisions when there is statistical knowledge of the possibilities. In the binary demodulation treatment in Chapters 3 and 4 there wereonly two equally likely possible signals, s1(t) or s2(t ), that might have been transmitted.

Therefore, to make the binary maximum likelihood decision, given a received signal, meant only to decide that s1(t ) was transmitted if

otherwise, to decide that s2(t) was transmitted. The parameter z represents z(T),

the receiver predetection value at the end of each symbol duration time t = T. However,

when applying maximum likelihood to the convolutional decoding problem,

we observe that the convolutional code has memory (the received sequence represents

the superposition of current bits and prior bits). Thus, applying maximum

likelihood to the decoding of convolutionally encoded bits is performed in the context

of choosing the most likely sequence, as shown in Equation (7.1). There are

typically a multitude of possible codeword sequences that might have been transmitted.

To be specific, for a binary code, a sequence of L branch words is a member

of a set of 2L possible sequences. Therefore, in the maximum likelihood context, we

can say that the decoder chooses a particular U() as the transmitted sequence if

the likelihood P(ZU()) is greater than the likelihoods of all the other possible

transmitted sequences. Such an optimal decoder, which minimizes the error probability

(for the case where all transmitted sequences are equally likely), is known as

a maximum likelihood decoder. The likelihood functions are given or computed

from the specifications of the channel.

We will assume that the noise is additive white Gaussian with zero mean and

the channel is memoryless, which means that the noise affects each code symbol

independently of all the other symbols. For a convolutional code of rate 1/n, we

can therefore express the likelihood as

where Zi is the ith branch of the received sequence Z, Ui

(m) is the ith branch of a particular codeword sequence U(m), zji is the jth code symbol of Zi, andu ji ( m ) is the jth code symbol of Ui(m), and each branch comprises n code symbols. The decoder problem consists of

choosing a path through the trellis of Figure 7.7 (each possible path defines a codeword sequence) such

that

Generally, it is computationally more convenient to use the logarithm of the

likelihood function since this permits the summation, instead of the multiplication,

of terms. We are able to use this transformation because the logarithm is a monotonically

increasing function and thus will not alter the final result in our codeword

selection. We can define the log-likelihood function as

The decoder problem now consists of choosing a path through the tree of Figure

7.6 or the trellis of Figure 7.7 such that  is maximized. For the decoding of

convolutional codes, either the tree or the trellis structure can be used. In the tree

representation of the code, the fact that the paths remerge is ignored. Since for a

binary code, the number of possible sequences made up of L branch words is 2L,

maximum likelihood decoding of such a received sequence, using a tree diagram,

requires the “brute force” or exhaustive comparison of 2L accumulated loglikelihood

metrics, representing all the possible different codeword sequences that

could have been transmitted. Hence it is not practical to consider maximum likelihood

decoding with a tree structure. It is shown in a later section that with the use

of the trellis representation of the code, it is possible to configure a decoder which

can discard the paths that could not possibly be candidates for the maximum likelihood

sequence. The decoded path is chosen from some reduced set of surviving

paths. Such a decoder is still optimum in the sense that the decoded path is the

same as the decoded path obtained from a “brute force” maximum likelihood decoder,

but the early rejection of unlikely paths reduces the decoding complexity.

For an excellent tutorial on the structure of convolutional codes, maximum

likelihood decoding, and code performance, see Reference [8]. There are several

algorithms that yield approximate solutions to the maximum likelihood decoding

problem, including sequential [9, 10] and threshold [11]. Each of these algorithms is

suited to certain special applications, but are all suboptimal. In contrast, the Viterbi

decoding algorithm performs maximum likelihood decoding and is therefore optimal.

This does not imply that the Viterbi algorithm is best for every application;

there are severe constraints imposed by hardware complexity. The Viterbi

algorithm is considered in Sections 7.3.3 and 7.3.4.

7.3.2 Channel Models: Hard versus Soft Decisions

Before specifying an algorithm that will determine the maximum likelihood decision,

let us describe the channel. The codeword sequence U(m), made up of branch

words, with each branch word comprised of n code symbols, can be considered to

be an endless stream, as opposed to a block code, in which the source data and

their codewords are partitioned into precise block sizes. The codeword sequence

shown in Figure 7.1 emanates from the convolutional encoder and enters the modulator,

where the code symbols are transformed into signal waveforms. The modulation

may be baseband (e.g., pulse waveforms) or bandpass (e.g., PSK or FSK). In

general, symbols at a time, where  is an integer, are mapped into signal waveforms

si(t), where i = 1, 2, . . . ,  When = 1, the modulator maps each code

symbol into a binary waveform. The channel over which the waveform is transmitted

is assumed to corrupt the signal with Gaussian noise. When the corrupted signal

is received, it is first processed by the demodulator and then by the decoder.

Consider that a binary signal transmitted over a symbol interval (0, T) is represented

by s1(t) for a binary one and s2(t) for a binary zero. The received signal is r(t) =

si(t) + n(t), where n(t) is a zero-mean Gaussian noise process. In Chapter 3 we described

the detection of r(t) in terms of two basic steps. In the first step, the received

waveform is reduced to a single number, z(T) = ai + n0, where ai is the signal component

of z(T) and n0 is the noise component. The noise component, n0, is a zero-mean

Gaussian random variable, and thus z(T) is a Gaussian random variable with a mean

of either a1 or a2 depending on whether a binary one or binary zero was sent. In the

second step of the detection process a decision was made as to which signal was transmitted,

on the basis of comparing z(T) to a threshold. The conditional probabilities

of z(T), p(z|s1), and p(z|s2) are shown in Figure 7.8, labeled likelihood of s1 and likelihood

of s2. The demodulator in Figure 7.1, converts the set of time-ordered random

variables {z(T)} into a code sequence Z, and passes it on to the decoder. The demodulator

output can be configured in a variety of ways. It can be implemented to make

a firm or hard decision as to whether z(T) represents a zero or a one. In this case, the

output of the demodulator is quantized to two levels, zero and one, and fed into the

decoder (this is exactly the same threshold decision that was made in Chapters 3

 

 


[+] Enlarge Image

                                     Figure 7.8 Hard and soft decoding decisions.

and 4). Since the decoder operates on the hard decisions made by the demodulator,

the decoding is called hard-decision decoding.

The demodulator can also be configured to feed the decoder with a quantized

value of z(Tgreater than two levels. Such an implementation furnishes the decoder

with more information than is provided in the hard-decision case. When the quantization

level of the demodulator output is greater than two, the decoding is called

soft-decision decoding. Eight levels (3-bits) of quantization are illustrated on the

abscissa of Figure 7.8. When the demodulator sends a hard binary decision to the

decoder, it sends it a single binary symbol. When the demodulator sends a soft binary

decision, quantized to eight levels, it sends the decoder a 3-bit word describing

an interval along z(T). In effect, sending such a 3-bit word in place of a single binary

symbol is equivalent to sending the decoder a measure of confidence along

with the code-symbol decision. Referring to Figure 7.8, if the demodulator sends

1 1 1 to the decoder, this is tantamount to declaring the code symbol to be a one

with very high confidence, while sending a 1 0 0 is tantamount to declaring the code

symbol to be a one with very low confidence. It should be clear that ultimately,

every message decision out of the decoder must be a hard decision; otherwise, one

might see computer printouts that read: “think it’s a 1,” “think it’s a 0,” and so on.

The idea behind the demodulator not making hard decisions and sending more

data (soft decisions) to the decoder can be thought of as an interim step to provide

the decoder with more information, which the decoder then uses for recovering the

message sequence (with better error performance than it could in the case of harddecision

decoding). In Figure 7.8, the 8-level soft-decision metric is often shown as

−7, −5, −3, −1, 1, 3, 5, 7. Such a designation lends itself to a simple interpretation of

the soft decision: The sign of the metric represents a decision (e.g., choose s1 if

positive, choose s2 if negative), and the magnitude of the metric represents the

confidence level of that decision. The only advantage for the metric shown in

Figure 7.8 is that it avoids the use of negative numbers.

For a Gaussian channel, eight-level quantization results in a performance improvement

of approximately 2 dB in required signal-to-noise ratio compared to

two-level quantization. This means that eight-level soft-decision decoding can provide

the same probability of bit error as that of hard-decision decoding, but

requires 2 dB less Eb/N0 for the same performance. Analog (or infinite-level quantization)

results in a 2.2-dB performance improvement over two-level quantization;

therefore, eight-level quantization results in a loss of approximately 0.2 dB compared

to infinitely fine quantization. For this reason, quantization to more than

eight levels can yield little performance improvement [12]. What price is paid for

such improved soft-decision-decoder performance? In the case of hard-decision

decoding, a single bit is used to describe each code symbol, while for eight-level

quantized soft-decision decoding 3 bits are used to describe each code symbol;

therefore, three times the amount of data must be handled during the decoding

process. Hence the price paid for soft-decision decoding is an increase in required

memory size at the decoder (and possibly a speed penalty).

Block decoding algorithms and convolutional decoding algorithms have been

devised to operate with hard or soft decisions. However, soft-decision decoding is

generally not used with block codes because it is considerably more difficult than

hard-decision decoding to implement. The most prevalent use of soft-decision

decoding is with the Viterbi convolutional decoding algorithm, since with Viterbi

decoding, soft decisions represent only a trivial increase in computation.

7.3.2.1 Binary Symmetric Channel

A binary symmetric channel (BSC) is a discrete memoryless channel (see

Section 6.3.1) that has binary input and output alphabets and symmetric transition

probabilities. It can be described by the conditional probabilities

as illustrated in Figure 7.9. The probability that an output symbol will differ from

the input symbol is p, and the probability that the output symbol will be identical to

the input symbol is (1 − p). The BSC is an example of a hard-decision channel,

which means that, even though continuous-valued signals may be received by the

demodulator, a BSC allows only firm decisions such that each demodulator output

symbol, zji, as shown in Figure 7.1, consists of one of two binary values. The indexing

of zji pertains to the jth code symbol of the ith branch word, Zi. The demodulator

then feeds the sequence Z = {Zi} to the decoder.

Let U(m) be a transmitted codeword over a BSC with symbol error probability

p, and let Z be the corresponding received decoder sequence. As noted previously,

a maximum likelihood decoder chooses the codeword U() that maximizes the

likelihood P(ZU(m)) or its logarithm. For a BSC, this is equivalent to choosing the

codeword U() that is closest in Hamming distance to Z [8]. Thus Hamming distance

is an appropriate metric to describe the distance or closeness of fit between

U(m) and Z. From all the possible transmitted sequences U(m), the decoder chooses

the U() sequence for which the distance to Z is minimum.

Suppose that U(m) and Z are each L-bit-long sequences and that they differ in

dm positions [i.e., the Hamming distance between U(m) and Z is dm]. Then, since the

                                                         Figure 7.9 Binary symmetric channel (hard-decision channel).

channel is assumed to be memoryless, the probability that this U(m) was transformed

to the specific received Z at distance dm from it can be written as

and the log-likelihood function is

If we compute this quantity for each possible transmitted sequence, the last term in

the equation will be constant in each case. Assuming that p < 0.5, we can express

Equation (7.7) as

where A and B are positive constants. Therefore, choosing the codeword U(),

such that the Hamming distance dm to the received sequence Z is minimized, corresponds

to maximizing the likelihood or log-likelihood metric. Consequently, over a

BSC, the log-likelihood metric is conveniently replaced by the Hamming distance,

and a maximum likelihood decoder will choose, in the tree or trellis diagram, the

path whose corresponding sequence U() is at the minimum Hamming distance to

the received sequence Z.

7.3.2.2 Gaussian Channel

For a Gaussian channel, each demodulator output symbol zji, as shown in

Figure 7.1, is a value from a continuous alphabet. The symbol zji cannot be labeled

as a correct or incorrect detection decision. Sending the decoder such soft decisions

can be viewed as sending a family of conditional probabilities of the different symbols

(see Section 6.3.1). It can be shown [8] that maximizing P(Z|U(m)) is equivalent

to maximizing the inner product between the codeword sequence U(m) (consisting

of binary symbols represented as bipolar values) and the analog-valued received

sequence Z. Thus, the decoder chooses the codeword U(m) if it maximizes

This is equivalent to choosing the codeword U() that is closest in Euclidean distance

to Z. Even though the hard- and soft-decision channels require different metrics,

the concept of choosing the codeword U() that is closest to the received

sequence, Z, is the same in both cases. To implement the maximization of Equation

(7.9) exactly, the decoder would have to be able to handle analog-valued arithmetic

operations. This is impractical because the decoder is generally implemented

digitally. Thus it is necessary to quantize the received symbols zji. Does Equation

(7.9) remind you of the demodulation treatment in Chapters 3 and 4? Equation

(7.9) is the discrete version of correlating an input received waveform, r(t), with a

reference waveform, si(t), as expressed in Equation (4.15). The quantized Gaussian

channel, typically referred to as a soft-decision channel, is the channel model

assumed for the soft-decision decoding described earlier.

7.3.3 The Viterbi Convolutional Decoding Algorithm

The Viterbi decoding algorithm was discovered and analyzed by Viterbi [13] in 1967.

The Viterbi algorithm essentially performs maximum likelihood decoding; however,

it reduces the computational load by taking advantage of the special structure

in the code trellis. The advantage of Viterbi decoding, compared with brute-force

decoding, is that the complexity of a Viterbi decoder is not a function of the number

of symbols in the codeword sequence. The algorithm involves calculating a

measure of similarity, or distance, between the received signal, at time ti, and all the

trellis paths entering each state at time ti. The Viterbi algorithm removes from consideration

those trellis paths that could not possibly be candidates for the maximum

likelihood choice. When two paths enter the same state, the one having the

best metric is chosen; this path is called the surviving path. This selection of surviving

paths is performed for all the states. The decoder continues in this way to advance

deeper into the trellis, making decisions by eliminating the least likely paths.

The early rejection of the unlikely paths reduces the decoding complexity. In 1969,

Omura [14] demonstrated that the Viterbi algorithm is, in fact, maximum likelihood.

Note that the goal of selecting the optimum path can be expressed, equivalently,

as choosing the codeword with the maximum likelihood metric, or as

choosing the codeword with the minimum distance metric.

7.3.4 An Example of Viterbi Convolutional Decoding

For simplicity, a BSC is assumed; thus Hamming distance is a proper distance measure.

The encoder for this example is shown in Figure 7.3, and the encoder trellis

diagram is shown in Figure 7.7. A similar trellis can be used to represent the decoder,

as shown in Figure 7.10. We start at time t1 in the 00 state (flushing the encoder

between messages provides the decoder with starting-state knowledge).

Since in this example, there are only two possible transitions leaving any state, not

all branches need be shown initially. The full trellis structure evolves after time t3.

The basic idea behind the decoding procedure can best be understood by examining

the Figure 7.7 encoder trellis in concert with the Figure 7.10 decoder trellis. For

the decoder trellis it is convenient at each time interval, to label each branch with

the Hamming distance between the received code symbols and the branch word

corresponding to the same branch from the encoder trellis. The example in Figure

7.10 shows a message sequence m, the corresponding codeword sequence U, and a

noise corrupted received sequence Z = 11 01 01 10 01 . . . . The branch words seen

on the encoder trellis branches characterize the encoder in Figure 7.3, and are

known a priori to both the encoder and the decoder. These encoder branch words

are the code symbols that would be expected to come from the encoder output as a

result of each of the state transitions. The labels on the decoder trellis branches are

accumulated by the decoder on the fly. That is, as the code symbols are received,

 

                                                      Figure 7.10 Decoder trellis diagram (rate 1/2 , K = 3).

each branch of the decoder trellis is labeled with a metric of similarity (Hamming

distance) between the received code symbols and each of the branch words for that

time interval. From the received sequence Z, shown in Figure 7.10, we see that the

code symbols received at (following) time t1 are 11. In order to label the decoder

branches at (departing) time t1 with the appropriate Hamming distance metric, we

look at the Figure 7.7 encoder trellis. Here we see that a state 00 → 00 transition

yields an output branch word of 00. But we received 11. Therefore, on the decoder

trellis we label the state 00 → 00 transition with Hamming distance between them,

namely 2. Looking at the encoder trellis again, we see that a state 00 → 10 transition

yields an output branch word of 11, which corresponds exactly with the code

symbols we received at time t1. Therefore, on the decoder trellis, we label the state

00 → 10 transition with a Hamming distance of 0. In summary, the metric entered

on a decoder trellis branch represents the difference (distance) between what was

received and what “should have been” received had the branch word associated

with that branch been transmitted. In effect, these metrics describe a correlationlike

measure between a received branch word and each of the candidate branch

words. We continue labeling the decoder trellis branches in this way as the symbols

are received at each time ti. The decoding algorithm uses these Hamming distance

metrics to find the most likely (minimum distance) path through the trellis.

The basis of Viterbi decoding is the following observation: If any two paths in

the trellis merge to a single state, one of them can always be eliminated in the

search for an optimum path. For example, Figure 7.11 shows two paths merging at

time t5 to state 00. Let us define the cumulative Hamming path metric of a given

path at time ti as the sum of the branch Hamming distance metrics along that path

up to time ti. In Figure 7.11 the upper path has metric 4; the lower has metric 1. The

upper path cannot be a portion of the optimum path because the lower path, which

                                                                  Figure 7.11 Path metrics for two merging paths.

enters the same state, has a lower metric. This observation holds because of the

Markov nature of the encoder state: The present state summarizes the encoder history

in the sense that previous states cannot affect future states or future output

branches.

At each time ti there are 2K  1 states in the trellis, where K is the constraint

length, and each state can be entered by means of two paths. Viterbi decoding consists

of computing the metrics for the two paths entering each state and eliminating

one of them. This computation is done for each of the 2K  1 states or nodes at time ti;

then the decoder moves to time ti + 1 and repeats the process. At a given time, the

winning path metric for each state is designated as the state metric for that state at

that time. The first few steps in our decoding example are as follows (see Figure

7.12). Assume that the input data sequence m, codeword U, and received sequence

Z are as shown in Figure 7.10. Assume that the decoder knows the correct initial

state of the trellis. (This assumption is not necessary in practice, but simplifies the

explanation.) At time t1 the received code symbols are 11. From state 00 the only

possible transitions are to state 00 or state 10, as shown in Figure 7.12a. State

00 → 00 transition has branch metric 2; state 00 → 10 transition has branch metric 0.

At time t2  there are two possible branches leaving each state, as shown in Figure 7.12b.

The cumulative metrics of these branches are labeled state metrics a , b, c, and

d, corresponding to the terminating state. At time t3 in Figure 7.12c there are again

two branches diverging from each state. As a result, there are two paths entering

each state at time t4. One path entering each state can be eliminated, namely, the

one having the larger cumulative path metric. Should metrics of the two entering

paths be of equal value, one path is chosen for elimination by using an arbitrary

rule. The surviving path into each state is shown in Figure 7.12d. At this point in

the decoding process, there is only a single surviving path, termed the common

stem, between times t1 and t2. Therefore, the decoder can now decide that the state

transition which occurred between t1 and t2 was 00 → 10. Since this transition is

produced by an input bit one, the decoder outputs a one as the first decoded bit.


[+] Enlarge Image

Figure 7.12 Selection of survivor paths, (a) Survivors at t2. (b) Survivors at t3. (c) Metric comparisons at t4. (d) Survivors at t4. (e) Metric comparisons at t5.(f) Survivors at t5. (g) Metric comparisons at t6. (h) Survivors at t6.

Here we can see how the decoding of the surviving branch is facilitated by having

drawn the trellis branches with solid lines for input zeros and dashed lines for input

ones. Note that the first bit was not decoded until the path metric computation had

proceeded to a much greater depth into the trellis. For a typical decoder implementation,

this represents a decoding delay which can be as much as five times the constraint

length in bits.

At each succeeding step in the decoding process, there will always be two

possible paths entering each state; one of the two will be eliminated by comparing

the path metrics. Figure 7.12e shows the next step in the decoding process. Again,

at time t5 there are two paths entering each state, and one of each pair can be eliminated.

Figure 7.12f shows the survivors at time t5. Notice that in our example we

cannot yet make a decision on the second input data bit because there still are two

paths leaving the state 10 node at time t2. At time t6 in Figure 7.12g we again see the

pattern of remerging paths, and in Figure 7.12h we see the survivors at time t6.

Also, in Figure 7.12h the decoder outputs one as the second decoded bit, corre-

 

Figure 7.12 (Continued)

sponding to the single surviving path between t2 and t3. The decoder continues in

this way to advance deeper into the trellis and to make decisions on the input data

bits by eliminating all paths but one.

Pruning the trellis (as paths remerge) guarantees that there are never more

paths than there are states. For this example, verify that after each pruning in Figures

7.12b, d, f, and h, there are only 4 paths. Compare this to attempting a “brute

force” maximum-likelihood sequence estimation without using the Viterbi algorithm.

In that case, the number of possible paths (representing possible sequences)

is an exponential function of sequence length. For a binary codeword sequence that

has a length of L branch words, there are 2L possible sequences.

7.3.5 Decoder Implementation

In the context of the trellis diagram of Figure 7.10, transitions during any one time

interval can be grouped into 2v 1 disjoint cells, each cell depicting four possible

transitions, where v = K − 1 is called the encoder memory. For the K = 3 example,

v = 2 and 2v 1 = 2 cells. These cells are shown in Figure 7.13, where a, b, c, and d

refer to the states at time ti, and a’, b’, c’, and d’ refer to the states at time ti + 1.

Shown on each transition is the branch metric , where the subscript indicates that

the metric corresponds to the transition from state x to state y. These cells and the

associated logic units that update the state metrics , where x designates a particular

state, represent the basic building blocks of the decoder.


[+] Enlarge Image

7.3.5.1 Add-Compare-Select Computation

Continuing with the K = 3, 2-cell example, Figure 7.14 illustrates the logic unit

that corresponds to cell 1. The logic executes the special purpose computation

called add-compare-select (ACS). The state metric  is calculated by adding the

previous-time state metric of state a, , to the branch metric  and the previoustime

state metric of state c, , to the branch metric . This results in two possible

path metrics as candidates for the new state metric . The two candidates are

compared in the logic unit of Figure 7.14. The largest likelihood (smallest distance)

of the two path metrics is stored as the new state metric for state a . Also stored

is the new path history  for state a, where  is the message-path history of the

state augmented by the data of the winning path.

Figure 7.14 Logic unit that implements the add-compare-select functions corresponding to cell #1.

 

Also shown in Figure 7.14 is the cell-1 ACS logic that yields the new state

metric  and the new path history . This ACS operation is similarly performed

for the paths in other cells. The oldest bit on the path with the smallest state metric

forms the decoder output.

7.3.5.2 Add-Compare-Select as seen on the Trellis

Consider the same example that was used for describing Viterbi decoding in

Section 7.3.4. The message sequence was m = 1 1 0 1 1, the codeword sequence was

U = 11 01 01 00 01, and the received sequence was Z = 11 01 01 10 01. Figure 7.15

depicts a decoding trellis diagram similar to Figure 7.10. A branch metric that labels

each branch is the Hamming distance between the received code symbols and

the corresponding branch word from the encoder trellis. Additionally, the Figure

7.15 trellis indicates a value at each state x, and for each time from time t2 to t6,

which is a state metric  . We perform the add-compare-select (ACS) operation

when there are two transitions entering a state, as there are for times t4 and later.

For example at time t4, the value of the state metric for state a is obtained by incrementing

the state metric   at time t3 with the branch metric  yielding a

candidate value of 4. Simultaneously, the state metric  at time t3 is incremented

with the branch metric  yielding a candidate value of 3. The select

operation of the ACS process selects the largest-likelihood (minimum distance)

path metric as the new state metric; hence, for state a at time t4, the new state metric

is . The winning path is shown with a heavy line and the path that has

been dropped is shown with a lighter line. On the trellis of Figure 7.15, observe the

state metrics from left to right. Verify that at each time, the value of each state metric

is obtained by incrementing the connected state metric from the previous time

along the winning path (heavy line) with the branch metric between them. At some

 

                Figure 7.15 Add-compare-select computations in Viterbi decoding.

point in the trellis (after a time interval of 4 or 5 times the constraint length), the

oldest bits can be decoded. As an example, looking at time t6 in Figure 7.15, we see

that the minimum-distance state metric has a value of 1. From this state d, the winning

path can be traced back to time t1, and one can verify that the decoded message

is the same as the original message, by the convention that dashed and solid

lines represent binary ones and zeros respectively.

7.3.6 Path Memory and Synchronization

The storage requirements of the Viterbi decoder grow exponentially with constraint

length K. For a code with rate 1/n, the decoder retains a set of 2K 1 paths

after each decoding step. With high probability, these paths will not be mutually

disjoint very far back from the present decoding depth [12]. All of the 2K 1 paths

tend to have a common stem which eventually branches to the various states. Thus

if the decoder stores enough of the history of the 2K 1 paths, the oldest bits on all

paths will be the same. A simple decoder implementation, then, contains a fixed

amount of path history and outputs the oldest bit on an arbitrary path each time it

steps one level deeper into the trellis. The amount of path storage required is [12]

where h is the length of the information bit path history per state. A refinement,

which minimizes the value of h, uses the oldest bit on the most likely path as the

decoder output, instead of the oldest bit on an arbitrary path. It has been demonstrated

[12] that a value of h of 4 or 5 times the code constraint length is sufficient

for near-optimum decoder performance. The storage requirement u is the basic

limitation on the implementation of Viterbi decoders. Commercial decoders are

limited to a constraint length of about K = 10. Efforts to increase coding gain by

further increasing constraint length are met by the exponential increase in memory

requirements (and complexity) that follows from Equation (7.10).

Branch word synchronization is the process of determining the beginning of a

branch word in the received sequence. Such synchronization can take place without

new information being added to the transmitted symbol stream because the received

data appear to have an excessive error rate when not synchronized. Therefore, a simple

way of accomplishing synchronization is to monitor some concomitant indication

of this large error rate, that is, the rate at which the state metrics are increasing or the

rate at which the surviving paths in the trellis merge. The monitored parameters are

compared to a threshold, and synchronization is then adjusted accordingly.

7.4 PROPERTIES OF CONVOLUTIONAL CODES

                7.4.1 Distance Properties of Convolutional Codes

Consider the distance properties of convolutional codes in the context of the simple

encoder in Figure 7.3 and its trellis diagram in Figure 7.7. We want to evaluate

the distance between all possible pairs of codeword sequences. As in the case of

block codes (see Section 6.5.2), we are interested in the minimum distance between

all pairs of such codeword sequences in the code, since the minimum distance is related

to the error-correcting capability of the code. Because a convolutional code is

a group or linear code [6], there is no loss in generality in simply finding the minimum

distance between each of the codeword sequences and the all-zeros sequence.

In other words, for a linear code, any test message is just as “good” as any other

test message. So, why not choose one that is easy to keep track of—namely, the allzeros

sequence? Assuming that the all-zeros input sequence was transmitted, the

paths of interest are those that start and end in the 00 state and do not return to the

00 state anywhere in between. An error will occur whenever the distance of any

other path that merges with the a = 00 state at time ti is less than that of the allzeros

path up to time ti, causing the all-zeros path to be discarded in the decoding

process. In other words, given the all-zeros transmission, an error occurs whenever

the all-zeros path does not survive. Thus, an error of interest is associated with a

surviving path that diverges from and then remerges to the all-zeros path. One

might ask, Why is it necessary for the path to remerge? Isn’t the divergence enough

to indicate an error? Yes, of course, but an error characterized by only a divergence

means that the decoder, from that point on, will be outputting “garbage” for

the rest of the message duration. We want to quantify the decoder’s capability in

terms of errors that will usually take place—that is, we want to learn the “easiest”

way for the decoder to make an error. The minimum distance for making such an

error can be found by exhaustively examining every path from the 00 state to the

00 state. First, let us redraw the trellis diagram, shown in Figure 7.16, labeling each

branch with its Hamming distance from the all-zeros codeword instead of with its

branch word symbols. The Hamming distance between two unequal-length sequences

will be found by first appending the necessary number of zeros to the

shorter sequence to make the two sequences equal in length. Consider all the paths

that diverge from the all-zeros path and then remerge for the first time at some arbitrary

node. From Figure 7.16 we can compute the distances of these paths from

the all-zeros path. There is one path at distance 5 from the all-zeros path; this path

departs from the all-zeros path at time t1 and merges with it at time t4. Similarly,

there are two paths at distance 6, one which departs at time t1 and merges at time t5,

and the other which departs at time t1 and merges at time t6, and so on. We can also

see from the dashed and solid lines of the diagram that the input bits for the

distance 5 path are 1 0 0; it differs in only one input bit from the all-zeros input

sequence. Similarly, the input bits for the distance 6 paths are 1 1 0 0 and 1 0 1 0 0;

each differs in two positions from the all-zeros path. The minimum distance in the

set of all arbitrarily long paths that diverge and remerge, called the minimum free

distance, or simply the free distance, is seen to be 5 in this example, as shown with

the heavy lines in Figure 7.16. For calculating the error-correcting capability of the

code, we repeat Equation (6.44) with the minimum distance dmin replaced by the

free distance df as

 

 Figure 7.16 Trellis diagram, labeled with distances from the all-zeros path.

 

where  means the largest integer no greater than x. Setting df = 5, we see that the

code, characterized by the Figure 7.3 encoder, can correct any two channel errors.

(See Section 7.4.1.1.)

A trellis diagram represents “the rules of the game.” It is a shorthand description

of all the possible transitions and their corresponding start and finish states associated

with a particular finite-state machine. The trellis diagram offers some

insight into the benefit (coding gain) when using error-correction coding. Consider

Figure 7.16 and the possible divergence-remergence error paths. From this picture

one sees that the decoder cannot make an error in any arbitrary way. The error

path must follow one of the allowable transitions. The trellis pinpoints all such allowable

paths. By having encoded the data in this way, we have placed constraints

on the transmitted signal. The decoder knows these constraints, and this knowledge

enables the system to more easily (using less Eb/N0) meet some error performance

requirements.

Although Figure 7.16 presents the computation of free distance in a straightforward

way, a more direct closed-form expression can be obtained by starting with

the state diagram in Figure 7.5. First, we label the branches of the state diagram as

either D0 = 1, D1, or D2, shown in Figure 7.17, where the exponent of D denotes the

Hamming distance from the branch word of that branch to the all-zeros branch.

The self-loop at node a can be eliminated since it contributes nothing to the distance

properties of a codeword sequence relative to the all-zeros sequence. Furthermore,

node a can be split into two nodes (labeled a and e), one of which

represents the input and the other the output of the state diagram. All paths originating

at a = 00 and terminating at e = 00 can be traced on the modified state diagram

of Figure 7.17. We can calculate the transfer function of path a b c e (starting

and ending at state 00) in terms of the indeterminate “placeholder” D, as D2 D D2

= D5. The exponent of D represents the cumulative tally of the number of ones in

the path, and hence the Hamming distance from the all-zeros path. Similarly, the

Figure 7.17 State diagram, labeled according to distance from the all-zeros path.

pathsa b d c e and a b c b c e each have the transfer function D 6 and thus a Hamming

distance of 6 from the all-zeros path. We now write the state equations as

where Xa, . . . , Xe are dummy variables for the partial paths to the intermediate

nodes. The transfer function, T(D), sometimes called the generating function of the

code can be expressed as T(D) = Xe/Xa. By solving the state equations shown in

Equation (7.12), we obtain [15, 16]

The transfer function for this code indicates that there is a single path of distance 5

from the all-zeros path, two of distance 6, four of distance 7, and in general, there

are  paths of distance  from the all-zeros path, where  The free

distance df of the code is the Hamming weight of the lowest-order term in the expansion

of T(D). In this example df = 5. In evaluating distance properties, the transfer

function, T(D), cannot be used for long constraint lengths since the complexity

of T(D) increases exponentially with constraint length.

The transfer function can be used to provide more detailed information than

just the distance of the various paths. Let us introduce a factor L into each branch

of the state diagram so that the exponent of L can serve as a counter to indicate the

number of branches in any given path from state a = 00 to state e = 00. Furthermore,

we can introduce a factor N into all branch transitions caused by the input bit

one. Thus, as each branch is traversed, the cumulative exponent on N increases by

one, only if that branch transition is due to an input bit one. For the convolutional

code characterized in the Figure 7.3 example, the additional factors L and N are

shown on the modified state diagram of Figure 7.18. Equations (7.12) can now be

modified as follows:

The transfer function of this augmented state diagram is

Thus, we can verify some of the path properties displayed in Figure 7.16. There is

one path of distance 5, length 3, which differs in one input bit from the all-zeros

path. There are two paths of distance 6, one of which is length 4, the other length 5,

and both differ in two input bits from the all-zeros path. Also, of the distance 7

paths, one is of length 5, two are of length 6, and one is of length 7; all four paths

correspond to input sequences that differ in three input bits from the all-zeros path.

Thus if the all-zeros path is the correct path and the noise causes us to choose one

of the incorrect paths of distance 7, three bit errors will be made.

 

Figure 7.18 State diagram, labeled according to distance, length, and number of input ones.

7.4.1.1 Error-Correcting Capability of Convolutional Codes

In the study of block codes in Chapter 6, we saw that the error-correcting capability,

t, represented the number of code symbol errors that could, with maximum

likelihood decoding, be corrected in each block length of the code. However,

when decoding convolutional codes, the error-correcting capability cannot be

stated so succinctly. With regard to Equation (7.11), we can say that the code can,

with maximum likelihood decoding, correct t errors within a few constraint lengths,

where “few” here means 3 to 5. The exact length depends on how the errors are

distributed. For a particular code and error pattern, the length can be bounded

using transfer function methods. Such bounds are described later.

7.4.2 Systematic and Nonsystematic Convolutional Codes

A systematic convolutional code is one in which the input k-tuple appears as part of

the output branch word n-tuple associated with that k-tuple. Figure 7.19 shows a binary,

rate 1/2, K = 3 systematic encoder. For linear block codes, any nonsystematic

code can be transformed into a systematic code with the same block distance properties.

This is not the case for convolutional codes. The reason for this is that convolutional

codes depend largely on free distance; making the convolutional code

systematic, in general, reduces the maximum possible free distance for a given

constraint length and rate.

Table 7.1 shows the maximum free distance for rate 1/2 systematic and nonsystematic

codes for K = 2 through 8. For large constraint lengths the results are even

more widely separated [17].

Figure 7.19 Systematic convolutionalencoder, rate 1/2, K = 3.

7.4.3 Catastrophic Error Propagation in Convolutional Codes

A catastrophic error is defined as an event whereby a finite number of code symbol

errors cause an infinite number of decoded data bit errors. Massey and Sain [18]

have derived a necessary and sufficient condition for convolutional codes to display

catastrophic error propagation. For rate 1/n codes with register taps designated by

polynomial generators, as described in Section 7.2.1, the condition for catastrophic

error propagation is that the generators have a common polynomial factor (of

degree at least one). For example, Figure 7.20a illustrates a rate 1/2, K = 3 encoder

with upper polynomial g1(X) and lower polynomial g2(X), as follows:

The generators g1(X) and g2(X) have in common the polynomial factor 1 + X, since

Therefore, the encoder in Figure 7.20a can manifest catastrophic error propagation.

 

Figure 7.20 Encoder displaying catastrophic error propagation. (a) Encoder. (b) State diagram.

In terms of the state diagram for any-rate code, catastrophic errors can occur

if, and only if, any closed-loop path in the diagram has zero weight (zero distance

from the all-zeros path). To illustrate this, consider the example of Figure 7.20. The

state diagram in Figure 7.20b is drawn with the state a = 00 node split into two

nodes, a and e, as before. Assuming that the all-zeros path is the correct path, the

incorrect path a b d d . . . d c e has exactly 6 ones, no matter how many times we go

around the self-loop at node d. Thus for a BSC, for example, three channel errors

may cause us to choose this incorrect path. An arbitrarily large number of errors

(two plus the number of times the self-loop is traversed) can be made on such a

path. We observe that for rate 1/n codes, if each adder in the encoder has an even

number of connections, the self-loop corresponding to the all-ones data state will

have zero weight, and consequently, the code will be catastrophic.

The only advantage of a systematic code, described earlier, is that it can never

be catastrophic, since each closed loop must contain at least one branch generated

by a nonzero input bit, and thus each closed loop must have a nonzero code symbol.

However, it can be shown [19] that only a small fraction of nonsystematic

codes (excluding those where all adders have an even number of taps) are

catastrophic.

7.4.4 Performance Bounds for Convolutional Codes

The probability of bit error, PB, for a binary convolutional code using hard-decision

decoding can be shown [8] to be upper bounded as follows:

where p is the probability of channel symbol error. For the example of Figure 7.3,

T(D, N) is obtained from T(D, L, N ) by setting L = 1 in Equation (7.15).

and

Combining Equations (7.17) and (7.19), we can write

For coherent BPSK modulation over an additive white Gaussian noise

(AWGN) channel, it can be shown [8] that the bit error probability is bounded by

 

where

Ec /N0 = rEb /N0

Eb /N0 = ratio of information bit energy to noise power spectral density

Ec /N0 = ratio of channel symbol energy to noise power spectral density

r = k/n = rate of the code

and Q(x) is defined in Equations (3.43) and (3.44) and tabulated in Table B.1.

Therefore, for the rate code with free distance df = 5, in conjunction with coherent

BPSK and hard-decision decoding, we can write

 

7.4.5 Coding Gain

Coding gain, as presented in Equation (6.19), is defined as the reduction, usually

expressed in decibels, in the required Eb/N0 to achieve a specified error probability

of the coded system over an uncoded system with the same modulation and channel

characteristics. Table 7.2 lists an upper bound on the coding gains, compared to

uncoded coherent BPSK, for several maximum free distance convolutional codes

with constraint lengths varying from 3 to 9 over a Gaussian channel with harddecision

decoding. The table illustrates that it is possible to achieve significant coding

gain even with a simple convolutional code. The actual coding gain will vary

with the required bit error probability [20].

Table 7.3 lists the measured coding gains, compared to uncoded coherent

BPSK, achieved with hardware implementation or computer simulation over a

Gaussian channel with soft-decision decoding [21]. The uncoded Eb/N0 is given in

the leftmost column. From Table 7.3 we can see that coding gain increases as the

bit error probability is decreased. However, the coding gain cannot increase indefinitely;

it has an upper bound as shown in the table. This bound in decibels can be

shown [21] to be

where r is the code rate and df is the free distance. Examination of Table 7.3 also

reveals that at PB = 10−7, for code rates of 1/2 and 2/3, the weaker codes tend to be

closer to the upper bound than are the more powerful codes.

Typically, Viterbi decoding is used over binary input channels with either

hard or 3-bit soft quantized outputs. The constraint lengths vary between 3 and 9,

the code rate is rarely smaller than 1/3 , and the path memory is usually a few con-


[+] Enlarge Image

straint lengths [12]. The path memory refers to the depth of the input bit history

stored by the decoder. From the Viterbi decoding example in Section 7.3.4, one

might question the notion of a fixed path memory. It seems from the example that

the decoding of a branch word, at any arbitrary node, can take place as soon as

there is only a single surviving branch at that node. That is true; however, to actually

implement the decoder in this way would entail an extensive amount of processing

to continually check when the branch word can be decoded. Instead, a fixed

delay is provided, after which the branch word is decoded. It has been shown [12,

22] that tracing back from the state with the lowest state metric, over a fixed

amount of path history (about 4 or 5 times the constraint length), is sufficient to

limit the degradation from the optimum decoder performance to about 0.1 dB for

the BSC and Gaussian channels. Typical error performance simulation results are

shown in Figure 7.21 for Viterbi decoding with hard decision quantization [12]. Notice

that each increment in constraint length improves the required Eb/N0 by a factor

of approximately 0.5 dB at PB = 10−5.


[+] Enlarge Image

 


[+] Enlarge Image

Figure 7.21 Bit error probability versus Eb /N0 for rate 1/2 codes using coherent BPSK over a BSC, Viterbi decoding, and a 32-bit path memory. (Reprinted with permission from J. A. Heller and I. M. Jacobs, “Viterbi Decoding for Satellite and Space Communication,” IEEE Trans. Commun. Technol., vol. COM19, no. 5, October 1971, Fig. 7, p. 84. © 1971 IEEE.)

7.4.6 Best Known Convolutional Codes

The connection vectors or polynomial generators of a convolutional code are usually

selected based on the code’s free distance properties. The first criterion is to

select a code that does not have catastrophic error propagation and that has the

maximum free distance for the given rate and constraint length. Then the number

of paths at the free distance df, or the number of data bit errors the paths represent,

should be minimized. The selection procedure can be further refined by considering

the number of paths or bit errors at df + 1, at df + 2, and so on, until only one

code or class of codes remains. A list of the best known codes of rate 1/2, K = 3 to 9,

and rate 1/3, K = 3 to 8, based on this criterion was compiled by Odenwalder [3, 23]

and is given in Table 7.4. The connection vectors in this table represent the pres-

ence or absence (1 or 0) of a tap connection on the corresponding stage of the convolutional

encoder, the leftmost term corresponding to the leftmost stage of the encoder

register. It is interesting to note that these connections can be inverted

(leftmost and rightmost can be interchanged in the above description). Under

the condition of Viterbi decoding, the inverted connections give rise to codes

with identical distance properties, and hence identical performance, as those in

Table 7.4.

7.4.7 Convolutional Code Rate Trace-Off

          7.4.7.1 Performance with Coherent PSK Signaling

The error-correcting capability of a coding scheme increases as the number of

channel symbols n per information bit k increases, or the rate k/n decreases. However,

the channel bandwidth and the decoder complexity both increase with n. The

advantage of lower code rates when using convolutional codes with coherent PSK,

is that the required Eb /N0 is decreased (for a large range of code rates), permitting

the transmission of higher data rates for a given amount of power, or permitting reduced

power for a given data rate. Simulation studies have shown [16, 22] that for a

fixed constraint length, a decrease in the code rate from 1/2 to 1/3 results in a reduction

of the required Eb /N0 of roughly 0.4 dB. However, the corresponding increase in

decoder complexity is about 17%. For smaller values of code rate, the improvement

in performance relative to the increased decoding complexity diminishes

rapidly [22]. Eventually, a point is reached where further decrease in code rate is

characterized by a reduction in coding gain. (See Section 9.7.7.2.)

7.4.7.2 Performance with Noncoherent Orthogonal Signaling

In contrast to PSK, there is an optimum code rate of about 1/2 for noncoherent

orthogonal signaling. Error performance at rates of 1/3, 2/3, 3/4 and are each worse than

those for rate 1/2. For a fixed constraint length, the rate 1/3, 2/3, 3/4 and codes typically

degrade by about 0.25, 0.5, and 0.3 dB, respectively, relative to the rate 1/2 performance

[16].

7.4.8 Soft-Decision Viterbi Decoding

For a rate 1/2 binary convolutional code system, the demodulator delivers two code

symbols at a time to the decoder. For hard-decision (2-level) decoding, each pair of

received code symbols can be depicted on a plane, as one of the corners of a

square, as shown in Figure 7.22a. The corners are labeled with the binary numbers

(0,0), (0,1), (1,0), and (1,1), representing the four possible hard-decision values that

the two code symbols might have. For 8-level soft-decision decoding, each pair of

code symbols can be similarly represented on an equally spaced 8-level by 8-level

plane, as a point from the set of 64 points shown in Figure 7.22b. In this

soft-decision case, the demodulator no longer delivers firm decisions; it delivers

quantized noisy signals (soft decisions).

The primary difference between hard-decision and soft-decision Viterbi decoding,

is that the soft-decision algorithm cannot use a Hamming distance metric

because of its limited resolution. A distance metric with the needed resolution is

Euclidean distance, and to facilitate its use, the binary numbers of 1 and 0 are

transformed to the octal numbers 7 and 0, respectively. This can be seen in Figure

7.22c, where the corners of the square have been re-labeled accordingly; this allows

us to use a pair of integers, each in the range of 0 to 7, for describing any point in

the 64-point set. Also shown in Figure 7.22c is the point 5,4, representing an

example of a pair of noisy code-symbol values that might stem from a

 

Figure 7.22 (a) Hard-decision plane (b) 8-level by 8-level soft-decision plane (c) Example of soft code symbols (d) Encoding trellis section (e) Decoding trellis section.

demodulator. Imagine that the square in Figure 7.22c has coordinates x and y.

Then, what is the Euclidean distance between the noisy point 5,4 and the noiseless

point 0,0? It is

Similarly, if we ask what is the Euclidean distance between the noisy point 5,4 and the

noiseless point 7,7?

It is

Soft-decision Viterbi decoding, for the most part, proceeds in the same way as

hard-decision decoding (as described in Sections 7.3.4 and 7.3.5). The only difference

is that Hamming distances are not used. Consider how soft-decision decoding is performed

with the use of Euclidean distances. Figure 7.22d shows the first section of an

encoding trellis, originally presented in Figure 7.7, with the branch words transformed

from binary to octal. Suppose that a pair of soft-decision code symbols with

values 5,4 arrives at a decoder during the first transition interval. Figure 7.22e shows

the first section of a decoding trellis. The metric

representing the Euclidean distance between the arriving 5,4 and the 0,0 branch word, is placed

on the solid line. Similarly, the metric  representing the Euclidean distance between

the arriving 5,4 and the 7,7 code symbols, is placed on the dashed line. The rest of the task,

pruning the trellis in search of a common stem, proceeds in the same way as harddecision

decoding. Note that in a real convolutional decoding chip, the Euclidean distance

is not actually used for a soft-decision metric; instead, a monotonic metric that

has similar properties and is easier to implement is used. An example of such a metric

is the Euclidean distance-squared, in which case the square-root operation shown

above is eliminated. Further, if the binary code symbols are represented with bipolar

values, then the inner-product metric in Equation (7.9) can be used. With such a metric,

we would seek maximum correlation rather than minimum distance.

Relevant NI products


Customers interested in this topic were also interested in the following NI products:

For the complete list of tutorials, return to the NI RF and Communications Fundamentals Main page.

Buy the Book


Purchase Digital Communications from Prentice Hall Professional through this link and receive the following:

  • Between 15% and 30% Off
  • Free Shipping and Handling
2 ratings | 4.50 out of 5
Print | PDF

Reader Comments | Submit a comment »

 

Legal
Excerpt from the book published by Prentice Hall Professional (http://www.phptr.com).
Copyright Prentice Hall Inc., A Pearson Education Company, Upper Saddle River, New Jersey 07458.
This material is protected under the copyright laws of the U.S. and other countries and any uses not in conformity with the copyright laws are prohibited, including but not limited to reproduction, DOWNLOADING, duplication, adaptation and transmission or broadcast by any media, devices or processes.