Blind Reduced State MAP Equalization

 

Constant Modulus Algorithm (CMA) [1] and its derivatives are often used to equalize 2D linear modulated signals.  However, even when using more sophisticated variants to shorten the duration of data needed for the filter to converge, the performance of CMA is limited by the fact that it is a linear transversal filter.  For some channels, a blind Viterbi type algorithm may prove to be more successful.  If you are not familiar with Viterbi Equalization (VE) [2] or Baum-Welch (BW) [3], read [2] and [3] before continuing.

 

Reduced State Viterbi

            The first challenge is to reduce the complexity of the Viterbi or Maximum Likelihood Sequence Estimation (MLSE) algorithm.  The exponent corresponding to the number of states in the trellis can be reduced by using the Reduced State Sequence Estimation (RSSE) [4] and the Decision Feedback Sequence Equalization (DFSE) [5] algorithms.

Figure 1 shows a reduced state QAM constellation and trellis for a length three channel.  Subset zero is represented by a plus symbol and subset one by a dot symbol.  The symbols within a subset are chosen to be a maximum distance from each other.  By maximizing the intra-subset distance, adjacent symbol errors, which are the most likely to occur, will have distinct paths through the trellis.  If the sets were chosen so as to compact states into the smallest area in the IQ plane, adjacent symbol error paths would immediately be merged with the correct paths since they share the same subset state.

Notice that the trellis only encodes the subset in each state.  The symbols either need to be carried along with the path metrics or the element within each subset needs to be carried along with the path metrics so that the symbols can be computed from the subset and the element within each subset.  For example, the subset transition 01→10 encodes all the following possible symbol states: 010, 012, 210, 212, 030, 032, 230, 232.

Figure 1:  (Left) QAM divided into + and . subsets.  (Right) Reduced state trellis for channel memory length three.

 

                    RSSE is more general than this.  It allows you to partition the symbols into variable sized subsets such that the number of subsets decreases for older symbols.  This can increase the implementation complexity quite a bit.  If the constellation order is relatively low, a simpler implementation might be to use N/2 subsets in combination with decision feedback to cancel any residual energy rather than truncating the channel.  Finite Impulse Response (FIR) decision feedback is a special case of RSSE in which there is only one subset; the other extreme is classic Viterbi in which each symbol is in its own subset.  DFSE requires no additional states in the trellis; however, we need to carry around more symbols.  In the previous example, suppose we wanted to model a channel with memory length five but still using only states of length three.  We would have to carry around two additional symbols.  So, the state transition 01→10 might now encode the sequences 22010 or 10010.

The branch metrics for these extra symbols need not necessarily incur additional computation.  The dot product of the symbols and the channel can be broken into blocks and pre-computed.  Figure 2 shows a length nine channel broken into three blocks.  Each block has 43 precomputed entries for the QAM signal we have been using.  So, the total complexity is three table lookups and three Multiply-Accumulate (MAC) operations.  If the trellis had a channel memory of length seven, carrying the last two symbols along would not incur additional computational overhead in this implementation.

Figure 2:  A dot product broken into blocks of three symbols.

 

The reduced number of trellis states creates the problem of early merging.  The less states there are in the trellis, the quicker paths merge, increasing the number of error events.  The solution is to convert the received signal to its minimum phase equivalent.  This accomplishes two things.  First, it whitens the channel response, which is a requirement of Viterbi.  More importantly, it compacts the channel impulse energy into the start of the channel.  This reduces the length of the channel we need to model while allocating the symbols with the largest number of subsets (more subsets yield better performance) onto the taps with the greatest magnitude.  Recall that the number of subsets can decrease for older symbols in each state.

The first step in converting to the minimum phase equivalent signal is to decompose the channel into its minimum and maximum phase components.  There is a depiction of this decomposition in figure 3.  This can be done by zeroing out the negative frequencies of the cepstrum.  See [6 (p 62), 7] for a discussion and an example implementation.

Figure 3:  Minimum and maximum phase decomposition.

 

The next step is converting the received signal to its minimum phase equivalent.  This yields a signal which has the same power spectral density as the received signal matched filtered, but the transfer function for the entire system is the minimum phase system.  Converting to the minimum phase equivalent signal requires a pre-filter with the maximum-phase filter in the divisor [8].  However, this is an unstable filter.  Instead, you can pre-filter in reverse time with the minimum phase filter.  This has the unfortunate side-effect of forcing the signal processing to be done on discontinuous blocks.

From [8], we have Hmin(z) = A(z)H(z) where H is the system transfer function, A is the desired pre-filter, and Hmin is the desired minimum phase equivalent system.  Also, we have Hmin(z)H*min(1/z*) = H(z)H*(1/z*).  Inversion of z is equivalent to a time reversal.  The double conjugation of z and H cancels so that only the coefficients of the filter are conjugated.  What this tells us is that the system transfer function convolved with the matched filter (the final term or the right side) is the same as the convolution of the minimum phase and maximum phase filters; these are two equally valid spectral decompositions for the magnitude response of the system transfer function.  The time reversed and coefficient conjugated minimum phase filter is equal to the maximum phase filter except for a delay.  Since we are filtering in reverse time to ensure stability, we want A(1/z).  From the previous equations, A(1/z) = H*(z*)/H*min(z*).  Following pre-filtering, the resultant signal is again time reversed before processing by Viterbi.

In general, the spread of the multipath signals will not fall exactly on multiples of the symbol duration [9].  Further, the sampler phase may be unknown, and there will always be at least some excess bandwidth that aliases in a symbol spaced equalizer.  For these reasons, we use a fractionally spaced VE.  Fractional spacing means there is more than one sample per symbol.  Often you will see 2/T spaced equalizers.  This means there are two samples per symbol which will normally ensure there is no aliasing.  The 2/T spaced VE can be implemented by computing the branch metric as the sum of two subsampled VEs with a single sample phase offset [10].  This doubles the number of MACs but does not change the number of states.  Occasionally you will see the subsampled signals in a fractionally spaced channel referred to as a “Virtual Receiver” or “Virtual Channel”.  This is to emphasize the fact that, mathematically, there is no difference between a 2x oversampled channel on a single receiver and two receivers sampling the same signal with a half symbol delay.

For a 2/T VE, if the sample phase is known, one of the virtual channels can be aligned to fall exactly at the symbol peak with the other channel falling exactly between symbols.  Empirically, so long as you choose the channel that falls near the peak of each symbol, RSSE will work well.  However, if you choose the other channel, it will not work at all.  One possible ad hoc solution is to run the VE twice with the second run having a one sample delay.  However, this has the problem that you need to know which equalizer is correct.  Another ad hoc solution, that seems to work well in practice, is to use a 3/T equalizer and select the minimum branch metric over all symbols and phases at each state for each time step as shown in figure 4.

Figure 4:  The branch matric for the 3/T VE is minimized over each new symbol in the sequence and over the three possible delays for each transition in the trellis.  S is the symbol sequence (unknown transmitted sequence) for each state and Y is the observed signal.

 

Baum-Welch

When the channel is unknown, the Viterbi Equalizer can be applied iteratively to demodulate the symbol sequence; however, Expectation Maximization (EM) is a more elegant and higher performance solution [11].  The BW algorithm is EM applied to a Hidden Markov Model (HMM).  Figure 5 shows the reduced state QAM trellis drawn as an HMM.  Each state of the trellis is now a hidden node of the HMM.  The transition probabilities are fixed at 1/2 because there are two transitions out of each old state into each new state.  Each state also has a conditional PDF (not shown): p(Y|S=Sk,θ) in which theta is the parameter set made up of the noise (σᵥ²) and channel (h).  The conditional distribution of the observations given the symbol sequence is thus a function of σᵥ² and h.

Figure 5:  HMM of reduced state QAM.  The PDF of observations conditioned on the state are not shown.

 

EM aims to maximize the marginal likelihood [12] arg max[θ] P(Y,Sk|θ) indirectly by first estimating the expectation of S conditioned on Y and the current estimate of θ, θ’, (the E step) and then maximizing the likelihood over θ (the M step).  The new θ is then used in the next iteration, and the process is repeated until convergence.  The reason for this iterative process is that we can’t directly compute the Maximum Likelihood (ML) of p(Y|θ) due to the latent variable S.  Instead we compute the Expected Sufficient Statistic (ESS) over all states with respect to p(S|Y,θ’) and then maximize over θ.

The first step of BW is to compute the Maximum A-Posteriori (MAP) estimate p(S|Y,θ’) for every state and observation in the trellis.  This is done with the Forward-Backward (FB) algorithm.  It is also referred to as BCJR; however, there is one significant difference.  BCJR computes the likelihood ratios of received bits.  These likelihood ratios are used as the soft information exchanged between the equalizer and the decoder of a Turbo Equalizer (TE).  For BW, we are only concerned with probabilities of sequences of symbols or states.  Otherwise, the algorithms are effectively the same.

The derivation of the FB recursions is rarely explained in adequate detail.  I will present one step of the forward recursion here; the backward recursion and the computation of the joint distribution over all observations (γ in [3]) can be worked out similarly.  The derivation requires modeling the state transitions as a Dynamic Bayesian Network (DBN).  The HMM is represented as a DBN in figure 6.  The square boxes are the latent variables (states), and the circles are the observables (received signal).  The entire HMM in figure 5 is represented by the nodes in the dashed box of figure 6.  Thus, we lose some of the detail in the HMM, but we can now see the dependencies between time steps.

Figure 6:  DBN representation of the MAP problem.  (Top) The dashed box is a compact graphical representation of the HMM.  The DBN is unrolled three time steps so that the independences across time steps are apparent.  (Middle) Observations Y1 and Y2 are made independent of state S3 by conditioning on state S2.  (Bottom) Y1 and Y2 are made independent of Y3 by conditioning on S3.

 

The forward probabilities are αt(i) = p(Yt<,St=Si) in which t is the time step through the trellis and i is the state index.  The notation p(Yt<) is shorthand for all observations up to t (i.e. p(Y1,…,Yt)).  The transition probabilities are p(St+1=Sj|St=Si), and the conditional distribution of the output symbol for a given state j is p(Yt+1|St+1=Sj).  The recurrence relation to get from time step t=2 to t=3 is

p(Y3|S3=Sj)p(Y2<,S2=Si)p(S3=Sj|S2=Si)         # Recurrence relation
p(Y
3|S3=Sj)p(Y2<|S2=Si)p(S2=Si)p(S3=Sj|S2=Si) # Def of cond prob
p(Y
3|S3=Sj )p(Y2<,S3=Sj|S2=Si)p(S2=Si)        # Figure 6 (middle)

p(Y3|S3=Sj)p(Y2<,S2=Si,S3=Sj)                 # Def of cond prob

p(Y2<S3=Sj) p(Y3|S3=Sj)                        # Marginalization

p(Y3<S3=Sj)                                    # Figure 6, def of c.p.

α3(j)

                    

The third and sixth line are a result of conditional independence.  Two variables, a and b, are conditionally independent of a third variable, c, (denoted ab|c) if p(a,b|c) = p(a|c)p(b|c).  The conditioned upon variable is shaded in figure 6 (middle and bottom).  The independence relation is encoded in the graphical structure of the DBN.  In figure 6 (bottom), Y1 and Y2 are d-separated from Y3 by conditioning on S2 [13 (p 16)].

            Unlike Viterbi, MAP requires a backward pass over the trellis.  Because we’re using a reduced state trellis, the backward pass will need the full symbol sequence for each subset sequence to compute the backward probabilities and the re-estimation quantities.  So, the forward pass constructs the trellis, and the backwards pass iterates over the survivor map [14,15] as depicted in figure 7.  VE and MAP are very similar as shown in figure 8.  The main differences are that first, VE computes the minimum of branches coming into each node, whereas MAP computes a summation of the branches, and second, MAP explicitly computes the probabilities at every node, whereas VE computes only the branch metric (which is the negative of the probabilities in the log domain).  If there were enough states such that each state could fully encode the symbol sequence needed to evaluate the Gaussian (e.g. 43 states for a QAM signal with memory length three), MAP could be performed by computing the forward and backward computations at each state without any additional memory requirements or computations.  In the reduced state algorithm, MAP effectively computes VE as a sub step.

Figure 7:  MAP Forward-Backward recursions.  (Left) Forward pass.  (Right) Backward pass using the same symbol sequence.

 

Figure 8:  Viterbi and MAP perform similar computations on the trellis, but Viterbi saves only the survivor at each step, whereas MAP sums over all branches.

 

            Following the FB recursions, the E step computes the auxiliary function from the probability of the observation sequence given the state and the estimate of σᵥ² and h from the prior iteration.  The M step re-estimates σᵥ² and h by maximizing the auxiliary function.  The new channel estimate is computed from the normal equations where the autocorrelation matrix (E{ssT|σᵥ²,h}) and cross-correlation vector are conditioned on the old σᵥ² and h.  This a posteriori Least Squares (LS) [16] estimate computes the expectation over the ensemble of states in the trellis and then averages over each time step in the trellis.  The new expected residual, σᵥ², is likewise computed from the time-averaged ensemble of states in the trellis.

            One remaining detail is that the cross-correlation vector has a delay ambiguity of three samples since we chose the minimum branch metric such that it ranges over the three sample delays within each symbol.  It’s worse than this since the delay ambiguity can change with every node and time step.  Empirically, choosing the middle sample of the received signal as the fixed delay to compute the cross-correlation vector seems to work well.  Choosing the first or last sample tends to cause the delay of the estimated system transfer function to drift.

            Finally, there is a precision issue to consider when computing the probabilities of the observations over the trellis.  The dynamic range of the probabilities quickly exceeds the size of the mantissa of double precision.  For example, P(Y40<,S) is an extremely small number for a large number of states.  One solution is to compute the probabilities in the log domain.  This is known as the Log-Map algorithm [17].  At each node in the trellis, the probabilities need to be converted back to the linear domain by exponentiation, summed, and then converted back to the log domain.  This is done by recursively applying the Jacobian logarithm:  ln(ep1+ep2) = max(p1,p2) + ln(1 + e-|p2-p1|).

            The EM algorithm is only guaranteed to converge to a local optimum.  Further, for this type of blind demodulation problem, there is no guaranteed unique solution.  That is, the solution is not identifiable [11].  At the very least, even in the absence of noise, there is a rotational ambiguity in the IQ constellation and a delay ambiguity in the channel [16] which can result in misconvergence.  In experiments with little to no noise, when using the reduced state BW algorithm, it’s possible to initialize the algorithm with a known channel, and the algorithm will quickly converge to the wrong channel with a lower expected residual than the correct channel.  Thus, at least for some channels, there may be no way to identify the correct symbols and channels.  When the problem is caused by local optima, seeding instantiations of the algorithm with different initial channel estimates may mitigate the problem.

 

Further Work

(1)    There is a generalization of Viterbi known as Per-Survivor Processing (PSP) [18].  The idea is simply that you can embed the various components of the demodulator into every path of the trellis rather than applying them only to the entire trellis.  For example, you might have an adaptive transversal filter and Phase-Locked Loop (PLL) for every state.  This would be a straightforward modification of the BW equalizer but at a significant increase in complexity.

(2)    The block processing of the BW equalizer is undesirable for a couple of reasons.  Mainly, it can be difficult to stitch consecutive blocks of output symbols together due to the unknown delay difference between blocks.  The standard VE can be made adaptive [19].  However, due to the reverse time filtering needed for RSSE, you would also need continuous input time reversed filtering [20].  Further, the backwards pass of MAP would need to be eliminated, necessitating the use of a type-II MAP algorithm [21].  Because the re-estimation process involves a LS type problem, it should be amenable to adaptive solutions such as Recursive Least Squares (RLS) and Least Mean Squares (LMS) [6].

 


References

 

[1] D. N. Godard.  “Self-Recovering Equalization and Carrier Tracking in Two-Dimensional Data Communications Systems.”  IEEE Communications.  Vol 28 (11), pp 1867-1875.  November 1980.

[2] B. Sklar.  “How I learned to Love the Trellis.”  IEEE Signal Processing Magazine.  Vol 20 (93), pp 87-102.  May 2003.

[3] L. R. Rabiner.  “A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition.”   Proceedings of the IEEE.  Vol 77 (2), pp 257-286.  February 1989.

[4] M. V. Eyuboglu.  S.U. H. Qureshi.  “Reduced-State Sequence Estimation with Set Partitioning and Decision Feedback.”  IEEE Transactions on Communications.  Vol 36 (1), pp 13-20.  January 1988.

[5] A. Duel-Hallen.  C. Heegard.  “Delayed Decision-Feedback Sequence Estimation.  IEEE Transactions on Communications.”  Vol 37 (5), pp 428-436.  May 1989.

[6] D. G. Manolakis.  V. K. Ingle.  S. M. Kogon.  “Statistical and Adaptive Signal Processing.”  Artech House, Inc.  2005.

[7] J. O. Smith.  https://www.dsprelated.com/showcode/20.php

[8] W. H. Gerstacker.  F. Obernosterer.  R. Meyer.  H. B. Huber.  “An Efficient Method for Prefilter Computation for Reduced-State Equalization.!  The 11th IEEE Symposium on Personal, Indoor and Mobile Radio Communications.  PP 604-609.  September 2000.

[9] Y. Wan.  Q. Liu.  A. M. Sendyk.  “A Fractionally-Spaced Maximum-Likelihood Sequence Estimation Receiver in a Multipath Fading Environment.”  IEEE ICASSP. PP 689-692.  March 1992.

[10] G. M. Vachula.  F. S. Hill, Jr.  “On Optimal Detection of Band-Limited PAM Signals with Excess Bandwidth.”  IEEE Transactions on Communications.  Vol 29 (6), pp 886-890.  June 1981.

[11] H. A. Cirpan.  M. K. Tsatsanis.  “Maximum Likelihood Blind Channel Estimation in the Presence of Doppler Shifts.”  IEEE Transactions on Signal Processing.  Vol 47 (6), pp 1559-1569.  June 1999.

[12] Expectation Maximization.  https://en.wikipedia.org/wiki/Expectation%E2%80%93maximization_algorithm

[13] J. Pearl.  “Causality: Models, Reasoning, and Inference.”  Cambridge University Press.  2009.

[14] M. J. Lopez.  K. Zangi.  J. F. Cheng.  “Reduced-Complexity MAP Equalizer for Dispersive Channels.”  IEEE Vehicular Technology Conference.  PP 1371-1375.  September 2000

[15] G. Colavolpe.  G. Ferrari.  R. Raheli.  “Reduced-State BCJR-Type Algorithms.”  IEEE Journal on Selected Areas in Communications.  Vol 19 (5), pp 848-859.  May 2001.

[16] R. R. Lopes.  J. R. Barry.  “Blind Iterative Channel Identification and Equalization.”  IEEE International Conference on Communications.  June 2001.

[17] P. Robertson.  E. Villebrun.  P. Hoeher.  “A Comparison of Optimal and Sub-Optimal MAP Decoding Algorithms Operating in the Log Domain.”  IEEE International Conference on Communications.  PP 1009-1013.  June 1995.

[18] A. Polydoros.  K. M. Chugg.  “Per-Survivor Processing (PSP).”  http://csi.usc.edu/chugg/papers/PoCh97.pdf

[19] F. R. Magee, Jr.  H. G. Proakis.  “Maximum-Likelihood Sequence Estimation for Digital Signaling in the Presence of Intersymbol Interference.”  IEEE Transactions on Information Theory.  Vol 19 (1), pp 120-124.  January 1973.

[20] S. R. Powell.  P. M. Chau.  “A Technique for Realizing Linear Phase IIR Filters.”  IEEE Transactions on Signal Processing.  Vol 39 (11), pp 2425-2434.  November 1991.

[21] Y. Li.  B. Vucetic.  Y. Sato.  “Optimum Soft-Output Detection for Channels with Intersymbol Interference.”  IEEE Transactions on Information Theory.  Vol 41 (3), pp 704-713.  May 1995.