# Analog DFT Processors for OFDM Receivers: Circuit Mismatch and System Performance Analysis

Nima Sadeghi, Student Member, IEEE, Vincent C. Gaudet, Member, IEEE, and Christian Schlegel, Senior Member, IEEE

Abstract—An N-symbol discrete Fourier transform (N-DFT) processor based on analog CMOS current mirrors that operate in the strong inversion region is presented. It is shown that transistor mismatch can be modeled as an input-referred noise source that can be used in system-level studies. Simulations of a radix-2, 256-symbol fast Fourier transform (FFT) show that the model produces equivalent results to those of a model that incorporates a mismatch term into each current mirror. It is shown that in general, high-radix FFT structures and specifically the full-radix DFT have reduced sensitivity to mismatch and a reduced number of current mirrors compared to radix-2 structures and have some key advantages in terms of transistor count with respect to comparable digital implementations. Simulations of an orthogonal frequency-division multiplexing system with forward error control coding, that take into account current mirror nonidealities such as mismatch, show that an analog DFT front end loses only 0.5 dB with respect to an ideal circuit.

Index Terms—Analog circuits, current mirrors, fast Fourier transform, mismatch, orthogonal frequency division multiplexing.

## I. INTRODUCTION AND BASIC CIRCUIT

Ever since the discovery of capacity-approaching forward error control codes such as low-density parity-check codes and turbo codes, and iterative decoding algorithms, there has been considerable effort put into energy-efficient, high-speed implementations of such decoders. Analog implementations of iterative decoders have been widely reported [1]–[12], with promising results in terms of circuit complexity, power, and speed. A natural extension of analog decoding research is to attempt to integrate these decoders with other analog front-end processing blocks. The target application of this paper is a radio receiver and decoder that would operate with extremely low power levels such that energy scavenging methods [13] could provide sufficient power to operate the entire receiver. Such an analog receiver might well find application in ad hoc sensor networks or medical monitoring, where communication devices with extremely low power consumption will be a necessity.

N. Sadeghi is with the Department of Electrical and Computer Engineering, University of British Columbia, Vancouver, BC V6T 1Z4, Canada (e-mail: nimas@ece.ubc.ca).

V. Gaudet and C. Schlegel are with the Department of Electrical and Computer Engineering, University of Alberta, Edmonton, AB T6G 2V4, Canada (e-mail: vgaudet@ece.ualberta.ca; schlegel@ece.ualberta.ca).

Digital Object Identifier 10.1109/TCSI.2008.2011582


Fig. 1. OFDM communication transceiver system model.

In recent years, orthogonal frequency-division multiplexing (OFDM) has received considerable attention for high-speed wireless communication systems due to its effective transmission capability and robustness to frequency-selective channels when dealing with a wide range of channel impairments, such as impulse noise and severe multi-path fading [14]–[16]. A fast Fourier transform (FFT) processing block is required for OFDM receivers [17], [18]. The communication system model shown in Fig. 1, in which an OFDM transmission format is used with differential binary phase-shift keying (DBPSK) modulation and forward error control, is used in this paper in order to evaluate potential all-analog implementations of the FFT.

## A. Analog Fourier Transform Circuits

Until now there has been limited research into analog implementations of the FFT. In [19], an analog FFT circuit topology based on analog multipliers is reported. An analog current mode FFT circuit is presented in [20] that uses switching current delay flip-flop circuits that act as current memories. The analog FFT and discrete Fourier transform (DFT) designs presented in this work are based on analog current mirrors and hence the circuit complexity is lower than that of the previously reported circuits. The impact of transistor pair mismatch on system performance is mathematically modeled as an input-referred mismatch source. Compelling evidence is presented that higher radix FFT structures such as the full-radix DFT are more suitable for analog implementation than the regular radix-2 FFT. An earlier version of the circuits presented in this paper was reported in [21], and an earlier chip implementation was published in [22].

A DFT contains only two operations, namely, addition and multiplication with constants, as expressed by the following equation:

$$S_n = \sum_{k=0}^{N-1} x_k e^{\frac{j2\pi kn}{N}}, \qquad 0 \le n \le N-1 \tag{1}$$

1549-8328/\$26.00 © 2009 IEEE

Manuscript received July 30, 2008; revised October 08, 2008. First published December 22, 2008; current version published September 11, 2009. This work was supported by the Alberta Informatics Circle of Research Excellence. This paper was recommended by Associate Editor Y. Massoud.



Fig. 2. Radix-2, 8-FFT butterfly structure. Inputs and outputs are complex differential values and  $W_8^1$ ,  $W_8^2$ , and  $W_8^3$  are complex constants representing WFs. cp1 and cp2 subscripts represent two copies of the input current.

where $x_k$ are complex inputs for different frequency channels in an OFDM communication system, $N$ is the number of points in the DFT, and $n$ is the output index. The $e^{j2\pi k/N}$ term can be expressed as a real and an imaginary constant on the unit circle as follows:

$$W_N^k = e^{\frac{j2\pi k}{N}} = \cos\frac{2\pi k}{N} + j\sin\frac{2\pi k}{N} = \mathrm{WF}_i + j\mathrm{WF}_q \tag{2}$$

where $\mathrm{WF}_i$ and $\mathrm{WF}_q$ are weight factors (WFs) that multiply the real and imaginary parts of the corresponding inputs $x_k$. With differential signaling, a negative WF is realized by interchanging the positive and negative wires, so every WF has a magnitude between 0 and 1. Therefore, a DFT only requires summation and scaling operations.
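As an illustration of this point, the following minimal Python sketch evaluates (1) using only real-valued scalings and additions, with the weight factors of (2). The function names `weight_factor` and `dft_scale_and_sum` and the test vector are hypothetical and chosen only for this example; the sketch keeps the positive exponent sign used in (1).

```python
import math

def weight_factor(k, N):
    """Return (WF_i, WF_q) for W_N^k = exp(j*2*pi*k/N), as in (2)."""
    angle = 2.0 * math.pi * k / N
    return math.cos(angle), math.sin(angle)

def dft_scale_and_sum(x):
    """Evaluate (1) using only real scalings and additions."""
    N = len(x)
    out = []
    for n in range(N):
        re, im = 0.0, 0.0
        for k, xk in enumerate(x):
            wf_i, wf_q = weight_factor((k * n) % N, N)
            # (a + jb)(WF_i + jWF_q) split into four real scalings
            re += xk.real * wf_i - xk.imag * wf_q
            im += xk.real * wf_q + xk.imag * wf_i
        out.append(complex(re, im))
    return out

# Hypothetical test input: x_k = j^k, a single complex tone
x = [1 + 0j, 0 + 1j, -1 + 0j, 0 - 1j]
S = dft_scale_and_sum(x)
```

With the sign convention of (1), all the energy of this tone lands in output $n = 3$, and the remaining outputs are zero to within rounding.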

Digital designers typically implement the DFT using FFT structures because of a decrease in complexity from  $O(N^2)$  to  $O(N \log N)$ . However, the radix-2 FFT structure is less preferable in the analog case, as discussed later in this paper. One advantage of using analog circuitry over comparable digital implementations is that the summation of currents in the analog domain is free; however, in the digital domain, addition requires many logic gates and incurs delay. Also, in analog designs the number of wires per input is only two; in digital implementations this number is dependent on quantization. In an analog FFT circuit, symbols can be represented as currents; if differential signaling is chosen, then a difference of two currents is proportional to the signal it represents.

## B. Analog FFT Example

As mentioned earlier, the only required operations are addition and scaling, and these are achievable using current mirrors. Consider the radix-2, 8-FFT shown in Fig. 2. In this diagram, the white bubbles represent a duplication of inputs, the black bubbles represent summation of inputs, the crossed bubbles represent scaling by a specified WF value, and the -1 bubbles represent interchanging the negative/positive input signals if using differential signaling. All the bubbles can be implemented using only current mirrors, as shown in Figs. 3 and 4. For white bubbles, the input is copied twice for the next addition step. For black bubbles, the corresponding inputs are tied together, relying on Kirchhoff's current law for summation. For WFs, the



Fig. 3. (a) Graphical illustration. (b) Actual implementation of copy bubbles in the complex differential FFT butterfly structure using only analog current mirrors. cp1 and cp2 subscripts represent two copies of the input current, while the i+, i-, q+, and q- subscripts represent the positive/negative and real/imaginary part of the complex differential signals.



Fig. 4. (a) Graphical illustration. (b) Actual implementation of summing bubbles in the complex differential FFT butterfly structure using only analog current mirrors. cp1 and cp2 subscripts represent two copies of the input current, while the i+, i-, q+, and q- subscripts represent the positive/negative and real/imaginary part of the complex differential signals.

$W/L$ ratio of the output transistor of each mirror is chosen to realize the required scaling factor. Each complex signal requires four wires: two for the real part, represented using differential signaling where $\operatorname{Re}(x) = x_{i+} - x_{i-}$, and two for the imaginary part, again represented using differential signaling where $\operatorname{Im}(x) = x_{q+} - x_{q-}$. The actual implementations of the copy and summing bubbles are shown in Figs. 3 and 4. The number of transistors used in each FFT stage is explained in detail in [23], and the results are summarized in Table I.
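The four-wire representation can be sketched behaviorally in Python as follows. This is only an idealized model under stated assumptions: each real value is encoded as a pair of non-negative currents, a negative weight factor is realized by swapping the two wires (the role of the -1 bubbles), and summation is a wire-wise addition of currents. The helper names `to_diff`, `scale`, `add`, `multiply_by_wf`, and `decode` are hypothetical.

```python
import math

def to_diff(value):
    """Encode a real value as a pair of non-negative currents (pos, neg)."""
    return (value, 0.0) if value >= 0 else (0.0, -value)

def decode(pair):
    """Recover the represented value as the difference of the two currents."""
    return pair[0] - pair[1]

def scale(pair, wf):
    """Scale a differential pair by wf; a negative wf swaps the two wires."""
    pos, neg = pair
    if wf < 0:
        pos, neg = neg, pos
    return (abs(wf) * pos, abs(wf) * neg)

def add(p, q):
    """Summation: currents tied to the same wire add (Kirchhoff's current law)."""
    return (p[0] + q[0], p[1] + q[1])

def multiply_by_wf(x_i, x_q, wf_i, wf_q):
    """Multiply (x_i + j x_q) by (wf_i + j wf_q) using only scale and add."""
    out_i = add(scale(x_i, wf_i), scale(x_q, -wf_q))
    out_q = add(scale(x_i, wf_q), scale(x_q, wf_i))
    return out_i, out_q

# Hypothetical example: x = 0.6 - 0.8j multiplied by W_8^3
x = complex(0.6, -0.8)
wf_i, wf_q = math.cos(2 * math.pi * 3 / 8), math.sin(2 * math.pi * 3 / 8)
out_i, out_q = multiply_by_wf(to_diff(x.real), to_diff(x.imag), wf_i, wf_q)
result = complex(decode(out_i), decode(out_q))
```

Every intermediate current in this sketch stays non-negative, which is what allows each operation to map onto a current mirror with a suitably sized output transistor.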

## C. Paper Organization

This paper: 1) demonstrates the feasibility of an analog DFT processor; 2) analyzes its tolerance to current mirror nonlinearities and mismatch in the context of an OFDM communication system with forward error control; and 3) demonstrates the superiority of a full-radix DFT processor over small-radix implementations, both in terms of implementation cost and tolerance to mismatch. The rest of the paper is organized as follows. In Section II, the nonidealities of current mirrors are modeled. A mathematical input-referred mismatch model is derived in Section III. In Section IV, results of a Monte Carlo simulation study that incorporates the mismatch model are presented. Finally, Section V concludes this paper.

Fig. 5. $I_{out}$ versus $I_{in}$ dc transfer characteristic of NMOS current mirror with PMOS load, showing three regions of operation. Circles represent dc circuit simulation data points. Lower line represents ideal outputs.

TABLE I Number of Transistors Used in Each Radix-2 FFT

| Radix-2 FFT Size | Number of Transistors |
|------------------|-----------------------|
| 2-FFT            | 24                    |
| 4-FFT            | 96                    |
| 8-FFT            | 328                   |
| 16-FFT           | 968                   |
| 32-FFT           | 2520                  |
| 64-FFT           | 6248                  |
| 128-FFT          | 14952                 |
| 256-FFT          | 34856                 |

## II. CURRENT MIRROR FFT BUILDING BLOCK

This section investigates the nonideal behavior of the current mirror, which is the basic building block in analog DFT processors.

## A. Current Mirror Model

The proposed analog FFT uses both NMOS and PMOS current mirrors. A current mirror's transfer characteristic depends on the drain-to-source voltage $V_{ds}$ at the output. To find the actual output of each mirror, an NMOS mirror with a PMOS diode-connected load and a PMOS mirror with an NMOS diode-connected load are simulated using BSIM3v3 models in a typical 180-nm CMOS technology. Such a model can be used since there is a diode-connected load as a next stage everywhere in the FFT. The $I_{out}$ versus $I_{in}$ dc transfer characteristic of the NMOS current mirror with a PMOS load is shown in Fig. 5. As the input current increases, $V_{ds}$ decreases and $V_{gs}$ increases, and the current mirror goes through three different operating regions.

From Fig. 5, the actual mirror's outputs are slightly larger than its inputs. This is mostly due to different values of  $V_{\rm ds}$  on each branch. In order to use the results in the system-level Matlab simulation of analog FFTs in Section IV, a linear curve was fitted to the circuit simulation values. The line for the PMOS mirror is slightly closer to the ideal mirror since a PMOS transistor has smaller mobility that causes less difference in the values of  $V_{\rm ds}$  on the mirror branches.

## B. Current Scaling

Since currents are copied and then summed at each FFT stage, the total current is doubled every time. This results in increased power consumption and nonlinearity. To cancel this, the $W/L$ ratios of the output transistors of each current mirror participating in summation in every stage of the FFT are scaled by a factor of 1/2, resulting in the same input/output current range at each stage. The current range can then be chosen to operate the transistors in the selected region of operation.
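The effect of this scaling can be illustrated with a short Python sketch. It assumes a worst case in which the two currents summed at each stage are equal, eight radix-2 stages (a 256-FFT), and a 100-nA starting level; the function name `peak_current_per_stage` is hypothetical.

```python
def peak_current_per_stage(i_in, stages, scale):
    """Peak current after each stage when two equal, scaled currents are summed."""
    levels = [i_in]
    for _ in range(stages):
        # Two mirror outputs, each scaled by `scale`, are tied together
        levels.append(2 * scale * levels[-1])
    return levels

# Assumed values: 100 nA input level, 8 stages (radix-2, 256-FFT)
unscaled = peak_current_per_stage(100e-9, 8, scale=1.0)
scaled = peak_current_per_stage(100e-9, 8, scale=0.5)
```

Without scaling, the worst-case level grows by a factor of $2^8 = 256$ over eight stages (from 100 nA to 25.6 µA in this example), whereas a 1/2 scaling keeps every stage at the input level.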

## C. Temperature Effects

The behavior of a current mirror is largely insensitive to temperature, as long as both transistors operate at the same temperature. Differences in temperature can be modeled as a mismatch term. Since Section III discusses the impact of transistor mismatch in analog FFTs, the topic of temperature sensitivity is not further discussed in this paper.

## III. N-FFT MISMATCH MODEL

There are many considerations to take into account in analog design, such as device mismatch, body effect, and channel-length modulation. However, most of the error due to channel-length modulation and body effect is common mode. Thus, by using a fully differential circuit, these errors can be eliminated and only differential errors remain. Device mismatch is one such source of error. Since the FFT structure is based on current mirrors, mismatch is an important impairment that needs to be modeled.

As described shortly, mismatch in threshold voltage  $V_{\rm th}$  is modeled as an additive white Gaussian random variable. Also, to analyze the impact of mismatch over an entire *N*-FFT block, the mismatch due to all interior current mirror nodes in the block is modeled as a *signal-dependent additive input-referred* mismatch source. The per-mirror model and the input-referred model produce comparable bit-error-rate (BER) simulation performance, as discussed later in Section IV.

## A. Transistor Mismatch Model

This section explains the mismatch model for strong and weak inversion modes of operation. We note that the impact of mismatch is between the input and output transistors of a single current mirror; in other words a current mirror transfer function relies on having equal  $V_{\rm th}$  values. The PMOS load acts as an input node into a separate current mirror, whose mismatch can be evaluated separately. The PMOS load does not require matched W/L or  $V_{\rm th}$  with the output of the previous current mirror. A separate issue is the one of finite impedances (channel-length modulation) of multiple current mirror outputs driving the same load. This can be accounted for in the sizing of transistors.

1) Mismatch for Strong Inversion: $V_{\rm th}$ variation is the dominant source of mismatch, compared to other sources like $W/L$ or $\mu_n C_{\rm ox}$ [24]. $V_{\rm th}$ variation can be modeled as a normally distributed random variable with zero mean and a unitless variance of $\delta_{\varepsilon}^2$ [25]. Assuming a variation $\Delta V_{\rm th}$, transistor current $I_D$ in saturated strong inversion can be expressed as

$$I_D = \frac{\mu C_{\rm ox}}{2} \frac{W}{L} (V_{\rm gs} - V_{\rm th} - \Delta V_{\rm th})^2. \tag{3}$$

The square term can be expanded, producing

$$I_D = \frac{\mu C_{\rm ox}}{2} \frac{W}{L} (V_{\rm gs} - V_{\rm th})^2 - \mu C_{\rm ox} \frac{W}{L} (V_{\rm gs} - V_{\rm th}) \Delta V_{\rm th} + \frac{\mu C_{\rm ox}}{2} \frac{W}{L} \Delta V_{\rm th}^2. \tag{4}$$

The first term is the ideal current, and the second term is the dominant mismatch term since the last term can be ignored due to small variance of  $\Delta V_{\rm th}^2$ . Equation (4) can be rewritten as follows:

$$I_D = I_{\text{ideal}} \left( 1 - \frac{2\Delta V_{\text{th}}}{V_{\text{gs}} - V_{\text{th}}} \right). \tag{5}$$

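$\Delta V_{\rm th}$ can be represented as $V_{\rm th}\varepsilon$, where $\varepsilon$ is a normally distributed random variable, $\varepsilon \sim N(0, \delta_{\varepsilon}^2)$. Because this distribution is symmetric about zero, the sign of the mismatch term can be absorbed into $\varepsilon$, and (5) becomes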

$$I_D = I_{\text{ideal}}(1 + \zeta_{\text{str}}\varepsilon) \tag{6}$$

where $\zeta_{\rm str}$ is a normalizing factor in strong inversion equal to $2V_{\rm th}/(V_{\rm gs} - V_{\rm th})$. Therefore, a normally distributed additive mismatch term for $I_D$ is obtained.

A numerical example is now provided to highlight the impact of threshold voltage variance $\delta_{\varepsilon}^2$ on $I_D$. For $V_{\rm th} = 450$ mV in a 180-nm CMOS process technology, a reasonable threshold voltage variation value $\Delta V_{\rm th}$ of $\pm 45$ mV is chosen, which corresponds to 10% of $V_{\rm th}$ [26]. For a normal distribution, since 99.7% of outcomes fall within three standard deviations, it can be assumed that the maximum variation of $\varepsilon$ is equal to $3\delta_{\varepsilon}$. Therefore, $\delta_{\varepsilon}$ is $10\%/3 = 0.033$ and $\delta_{\varepsilon}^2$ is 0.001. The overall $I_D$ variance is equal to $(I_{\rm ideal}\delta_{\varepsilon}\zeta_{\rm str})^2$. By choosing $V_{\rm gs} = 2V_{\rm th}$, the normalizing factor becomes 2 and $\zeta_{\rm str}^2 = 4$. Hence, the worst-case $I_D$ variance due to the threshold voltage mismatch is equal to $0.004 \times I_{\rm ideal}^2$.
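The arithmetic of this example can be reproduced with a few lines of Python. The variable names are hypothetical; the values are those quoted in the text. Note that the unrounded result is closer to 0.0044, and the 0.004 figure follows from rounding $\delta_{\varepsilon}^2$ to 0.001.

```python
# Values quoted in the numerical example
v_th = 0.450          # threshold voltage, V
delta_v_max = 0.045   # assumed maximum threshold variation (10% of v_th), V
v_gs = 2 * v_th       # chosen gate-source voltage, V

# Maximum variation is taken as three standard deviations
sigma_eps = (delta_v_max / v_th) / 3
var_eps = sigma_eps ** 2

# Strong-inversion normalizing factor, zeta_str = 2*V_th / (V_gs - V_th)
zeta_str = 2 * v_th / (v_gs - v_th)

# Variance of I_D relative to I_ideal**2
relative_current_variance = (zeta_str * sigma_eps) ** 2
```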

2) Mismatch for Weak Inversion: For a transistor operating in weak inversion [27] with a threshold variation  $\Delta V_{\text{th}}$ , the drain current  $I_D$  is

$$I_D = I_{\text{ideal}} e^{\frac{-\Delta V_{\text{th}}}{nV_T}} \tag{7}$$

where $V_T$ is the thermal voltage equal to $kT/q$, and $n$ is the subthreshold slope factor. Since $\Delta V_{\rm th}$ is a normally distributed random variable, $e^{\zeta_{\rm weak} \cdot \varepsilon}$ is a log-normal random variable, where $\zeta_{\rm weak}$, the subthreshold normalizing factor, is equal to $V_{\rm th}/V_T$ for $n = 1$. For $V_{\rm th} = 450$ mV and $V_T = 25$ mV, $\zeta_{\rm weak} = 18$, which is about an order of magnitude higher than $\zeta_{\rm str}$. This makes weak inversion operation much more sensitive to mismatch. This observation is confirmed in Section IV, where an N-FFT that uses weak-inversion current mirrors with $V_{\rm th}$ mismatch is simulated and is found to incur too much of a performance loss to be considered feasible.
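To make the comparison concrete, the following Python sketch evaluates both normalizing factors and the resulting relative spread of the drain current for a given $\delta_{\varepsilon}$. It assumes $V_{\rm gs} = 2V_{\rm th}$ for the strong-inversion case (as in the earlier numerical example) and $n = 1$ for weak inversion; the spread in weak inversion uses the standard closed-form standard deviation of a zero-location log-normal variable. The function names and the value $\delta_{\varepsilon} = 0.01$ are hypothetical.

```python
import math

V_TH = 0.450   # threshold voltage, V
V_T = 0.025    # thermal voltage, V

ZETA_STR = 2 * V_TH / (2 * V_TH - V_TH)   # strong inversion, assuming V_gs = 2*V_th
ZETA_WEAK = V_TH / V_T                    # weak inversion, assuming n = 1

def strong_relative_std(sigma_eps):
    """Std of I_D / I_ideal for the additive model (6)."""
    return ZETA_STR * sigma_eps

def weak_relative_std(sigma_eps):
    """Std of I_D / I_ideal for the log-normal model (7)."""
    s2 = (ZETA_WEAK * sigma_eps) ** 2
    return math.sqrt((math.exp(s2) - 1.0) * math.exp(s2))

sigma_eps = 0.01  # hypothetical threshold mismatch
strong = strong_relative_std(sigma_eps)
weak = weak_relative_std(sigma_eps)
```

For this hypothetical $\delta_{\varepsilon}$, the weak-inversion spread is roughly nine times the strong-inversion spread, in line with the ratio $\zeta_{\rm weak}/\zeta_{\rm str} = 9$.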

## B. N-FFT Input-Referred Mismatch Model

To model the mismatch due to interior current mirror nodes of the N-FFT block as an additive input-referred mismatch source, the first step is to derive the output currents of the two-FFT block with mismatch shown in Fig. 6. For example,  $I_{2out1}$  is found to be equal to

$$\begin{aligned} I_{2out1} &= I_{in21}(1 + \varepsilon_{211}) + I_{in22}(1 + \varepsilon_{212}) \\ &= I_{in21} + I_{in22} + I_{in21}\varepsilon_{211} + I_{in22}\varepsilon_{212} \end{aligned} \tag{8}$$

with similar equations for the other outputs, where the $\varepsilon$ subscripts denote the stage number, copy number, and input. The FFT mismatch at each output can be modeled as an external additive term. The above equation can be seen as an ideal FFT output



Fig. 6. Analysis of mismatch for the two-FFT block. White nodes are current mirrors, each of which has its  $\varepsilon_{sci}$  where *s* indicates the stage, and *c* is either 1 or 2 representing the first or second copy of an input *i*.



Fig. 7. Analysis of mismatch for the radix-2, 4-FFT block. White nodes are current mirrors, each of which has its  $\varepsilon_{sci}$  where s indicates the stage, and c is either 1 or 2 representing the first or second copy of an input *i*.

plus an external mismatch term with double mismatch variance.

Similarly, the mismatch in the four-FFT pictured in Fig. 7 can be derived. The following equation is obtained, which has the same form as (8)

$$\begin{aligned} I_{4out1} &= I_{in41}(1 + \varepsilon_{411})(1 + \varepsilon_{211}) + I_{in42}(1 + \varepsilon_{412})(1 + \varepsilon_{212}) \\ &\quad + I_{in43}(1 + \varepsilon_{413})(1 + \varepsilon_{211}) + I_{in44}(1 + \varepsilon_{414})(1 + \varepsilon_{212}) \\ &= I_{in41}(1 + \alpha_{41}) + I_{in42}(1 + \alpha_{42}) + I_{in43}(1 + \alpha_{43}) + I_{in44}(1 + \alpha_{44}) \end{aligned} \tag{9}$$

where  $\alpha$  is in general in the form of  $\varepsilon + \varepsilon' + \varepsilon \varepsilon'$ . The mismatch model for the four-FFT thus has the same structure as for the two-FFT, and the only differences are the coefficients in the model.

The input-referred mismatch variance for the 4-FFT can now be derived from the above equations. In general, for any $I_{4\text{out}-k}$ there are four $I_{\text{in}-l}(1 + \alpha_l)$ terms added together, which have the same mean and variance. Since the input signals are roughly in the same range, the total variance should be 4 times the variance of $\alpha$. Also, $\alpha$ itself has roughly twice the variance of each individual transistor pair, since the term $\varepsilon \cdot \varepsilon'$ is negligible. Thus the input variance of the mismatch model for the 4-FFT is roughly 8 times that of each transistor pair. This procedure can be repeated to find the overall variance of any larger FFT, as shown in Table II.
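The factor of 8 can be checked with a small Monte Carlo sketch of the per-mirror form of (9). The sketch rests on assumptions that are not part of the original analysis: inputs are drawn as independent, equiprobable $\pm 1$ values, each $\varepsilon$ is Gaussian with standard deviation 0.01, and 50 000 trials are used. The function names are hypothetical.

```python
import random

def four_fft_output_error(inputs, sigma, rng):
    """Mismatch error on I_4out1 following the per-mirror form of (9)."""
    e4 = [rng.gauss(0.0, sigma) for _ in range(4)]  # eps_411 .. eps_414
    e2 = [rng.gauss(0.0, sigma) for _ in range(2)]  # eps_211, eps_212
    # Inputs 1 and 3 share eps_211; inputs 2 and 4 share eps_212
    shared = [e2[0], e2[1], e2[0], e2[1]]
    actual = sum(i * (1 + a) * (1 + b) for i, a, b in zip(inputs, e4, shared))
    return actual - sum(inputs)

def estimate_variance_ratio(sigma=0.01, trials=50_000, seed=1):
    """Estimate (output error variance) / sigma**2 for random +/-1 inputs."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        inputs = [rng.choice((-1.0, 1.0)) for _ in range(4)]
        total += four_fft_output_error(inputs, sigma, rng) ** 2
    return (total / trials) / sigma ** 2

ratio = estimate_variance_ratio()
```

Under these assumptions the estimated ratio comes out close to 8. With identical (fully correlated) inputs the shared second-stage terms would add coherently and the ratio would be larger, so the result depends on the input statistics assumed here.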

The input-referred mismatch variance for an N-FFT can be expressed as follows:

$$\delta_N^2 = (N m) \cdot 3 \cdot (3\mu_{\rm WF})^{(m-3)} \cdot \delta_\varepsilon^2 \tag{10}$$

where $m$ is the number of stages in the butterfly structure, the first factor of 3 is the average input factor for the current stage due to the WFs used in the 8-FFT and larger FFTs, and the second factor of 3 is the average input factor for previous stages due to WFs. The

TABLE II INPUT-REFERRED MISMATCH VARIANCE FOR RADIX-2 N-FFT FROM 2-FFT UP TO 256-FFT

| $\delta_N^2$     | Input Mismatch Variance                                                                                                    |
|------------------|----------------------------------------------------------------------------------------------------------------------------|
| $\delta_{2}^{2}$   | $(2\cdot 1)\cdot \delta_{\varepsilon}^2 = 2\delta_{\varepsilon}^2$                                                         |
| $\delta_{4}^{2}$   | $(4\cdot 2)\cdot\delta_{\varepsilon}^2 = 8\delta_{\varepsilon}^2$                                                          |
| $\delta_{8}^{2}$   | $(8\cdot3)\cdot3\cdot\delta_{\varepsilon}^2 = 72\delta_{\varepsilon}^2$                                                    |
| $\delta_{16}^{2}$  | $(16\cdot 4)\cdot 3\cdot (3\cdot 0.7)\cdot \delta_{\varepsilon}^2 = 403.2\delta_{\varepsilon}^2$                           |
| $\delta_{32}^{2}$  | $(32 \cdot 5) \cdot 3 \cdot (3 \cdot 0.7)^2 \cdot \delta_{\varepsilon}^2 = 2.1 \cdot (10^3) \cdot \delta_{\varepsilon}^2$  |
| $\delta_{64}^{2}$  | $(64 \cdot 6) \cdot 3 \cdot (3 \cdot 0.7)^3 \cdot \delta_{\varepsilon}^2 = 10.7 \cdot (10^3) \cdot \delta_{\varepsilon}^2$ |
| $\delta_{128}^{2}$ | $(128\cdot7)\cdot3\cdot(3\cdot0.7)^4\cdot\delta_{\varepsilon}^2 = 52.3\cdot(10^3)\cdot\delta_{\varepsilon}^2$              |
| $\delta_{256}^{2}$ | $(256\cdot 8)\cdot 3\cdot (3\cdot 0.7)^5\cdot \delta_{\varepsilon}^2 = 251\cdot (10^3)\cdot \delta_{\varepsilon}^2$        |



Fig. 8. Input-referred mismatch model for the N-FFT block.

TABLE III COMPARISON OF THE INPUT-REFERRED MISMATCH VARIANCES FOR DIFFERENT RADIX STRUCTURES

| 256-FFT Type | Input Mismatch Variance                                                                                            |
|--------------|--------------------------------------------------------------------------------------------------------------------|
| Radix-2      | $(2^8 \cdot 8) \cdot 3 \cdot 2.1^5 \cdot \delta_{\varepsilon}^2 = 251 \cdot (10^3) \cdot \delta_{\varepsilon}^2$   |
| Radix-4      | $(4^4 \cdot 4) \cdot 3 \cdot 2.1^3 \cdot \delta_{\varepsilon}^2 = 28.4 \cdot (10^3) \cdot \delta_{\varepsilon}^2$  |
| Radix-16     | $(16^2 \cdot 2) \cdot 3 \cdot 2.1^2 \cdot \delta_{\varepsilon}^2 = 6.76 \cdot (10^3) \cdot \delta_{\varepsilon}^2$ |
| DFT          | $(256^1 \cdot 1) \cdot 3 \cdot 2.1 \cdot \delta_{\varepsilon}^2 = 1.61 \cdot (10^3) \cdot \delta_{\varepsilon}^2$  |

constant $\mu_{\rm WF}$ is the average value of WFs in every stage, which is equal to 0.7. For $m$ less than 4, the $(3\mu_{\rm WF})^{(m-3)}$ factor should be taken as 1 since there is no previous stage having WFs. Therefore, the mismatch of an entire *N*-FFT can be modeled as a *signal-dependent input-referred* mismatch source, as shown in Fig. 8, which is a random vector with this equivalent variance. This input-referred noise vector is multiplied by the inputs and then added at the output of the ideal FFT without mismatch to provide the same output as an FFT having mismatch at each current mirror. Simulation results using both models (per-mirror and input-referred) are presented in Section IV.
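The entries of Table II can be regenerated with a short Python sketch of (10). One point is an interpretation rather than something stated explicitly in the text: for the 2-FFT and 4-FFT ($m < 3$), the sketch follows the table rows and omits the factor of 3 altogether. The function name `input_referred_variance_factor` is hypothetical, and the returned value is the multiplier of $\delta_{\varepsilon}^2$.

```python
MU_WF = 0.7  # average weight-factor value quoted in the text

def input_referred_variance_factor(n_points):
    """Multiplier of delta_eps**2 for a radix-2 N-FFT, following (10) and Table II."""
    m = n_points.bit_length() - 1  # number of radix-2 stages, N = 2**m
    if n_points < 2 or 2 ** m != n_points:
        raise ValueError("n_points must be a power of two, at least 2")
    if m < 3:
        # Assumption: no WF-related factors for the 2-FFT and 4-FFT rows
        return float(n_points * m)
    return n_points * m * 3 * (3 * MU_WF) ** (m - 3)

table_ii = {n: input_referred_variance_factor(n)
            for n in (2, 4, 8, 16, 32, 64, 128, 256)}
```

For example, `table_ii[16]` evaluates to about 403.2 and `table_ii[256]` to about $2.51 \times 10^5$, matching the corresponding rows of Table II.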

From Table II, the input-referred mismatch variance is highly dependent on the number of stages; for the 256-FFT, for example, the variance is on the order of $10^5$ times the variance of a single transistor pair. To make the *N*-FFT less sensitive to transistor mismatch, one idea is to decrease the number of stages in the FFT butterfly structure.

A higher radix FFT structure could be the solution. For instance, in a radix-4 FFT, four currents are summed at each stage rather than two. This change in the FFT diagram halves the number of stages, meaning that only four stages are required for the 256-FFT. Similarly, a radix-16 FFT only needs two stages, and the 256-FFT could even be realized in one single (full-radix) step, which essentially implements a full DFT. The input-referred mismatch variances for the radix-4 FFT, the radix-16 FFT, and the full-radix DFT are provided in Table III.

It is clear that the DFT has the least variance, and its implementation is less sensitive to mismatch compared to other FFT

TABLE IV Comparison of the Number of Current Mirrors in the 256-FFT for Different Radix Structures

| 256-FFT Type | Number of Current Mirrors per Output                 |
|--------------|------------------------------------------------------|
| Radix-2      | $2 \cdot (1 + 2 + 4 + 8 + 16 + 32 + 64 + 128) = 510$ |
| Radix-4      | $4 \cdot (1 + 4 + 16 + 64) = 340$                    |
| Radix-16     | $16 \cdot (1+16) = 272$                              |
| DFT          | $256 \cdot (1) = 256$                                |

structures. Using a full-radix DFT instead of a radix-2 structure reduces the input-referred mismatch variance by a factor of about 156 (Table III), the impact of which is discussed in Section IV. Also, from Table IV, the number of current mirrors decreases as the radix of the 256-FFT increases. Thus, implementing an analog DFT processor is a win-win scenario in terms of both circuit complexity and mismatch sensitivity.
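The counts in Table IV follow a simple pattern, sketched below in Python: for radix $r$ and $m$ stages, the number of current mirrors per output is $r \cdot (1 + r + \dots + r^{m-1})$. The function name `mirrors_per_output` is hypothetical.

```python
def mirrors_per_output(radix, n_points=256):
    """Current mirrors per output for an n_points FFT of the given radix (Table IV)."""
    stages, size = 0, 1
    while size < n_points:
        size *= radix
        stages += 1
    if size != n_points:
        raise ValueError("n_points must be an integer power of radix")
    # radix * (1 + radix + ... + radix**(stages - 1))
    return radix * sum(radix ** k for k in range(stages))

table_iv = {radix: mirrors_per_output(radix) for radix in (2, 4, 16, 256)}
```

Here radix 256 corresponds to the full-radix DFT, which needs a single stage.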

## C. Analog/Digital FFT Design Tradeoffs

The ease of implementation of a full-radix DFT using analog circuitry is due to the different cost structure for analog circuits than for digital ones. In a digital implementation, additions are costly. By sorting the inputs and using several stages the total number of additions can be minimized, thus obtaining the FFT structure. Increasing the number of stages does not increase the cost of copying signals in digital implementation, since bits are represented as voltages, and these can be tapped as many times as necessary. However in an analog FFT implementation, copying current signals is costly since the required current mirrors may incur losses due to mismatch. Furthermore, there is no cost for summations in analog design. Therefore, it is preferable to decrease the number of stages in order to decrease the number of current mirrors as well as the impact of corresponding mismatch. Clearly the low-radix FFT structure is not an ideal way to perform the DFT using analog circuitry.

Due to the large number of wires summing at one node, high-radix FFT structures may have design limitations caused by large capacitances and finite output impedances. However, the critical path delay of the receiver is most likely set by the error control decoder because of its iterative structure; hence the FFT is not the main area of concern. Also, a proper output impedance can be obtained by sizing the load transistors at each FFT stage.

To mitigate the mismatch sensitivity of an analog FFT processor, one idea is to increase the $V_{\rm gs}$ of transistors to decrease the normalizing factor $\zeta_{\rm str}$, which is equal to $2V_{\rm th}/(V_{\rm gs} - V_{\rm th})$. For fixed values of $W/L$, however, increasing $V_{\rm gs}$ incurs extra power consumption.

## D. Power Consumption

Using a 180-nm CMOS technology, the power consumption of the analog FFT for the worst-case, radix-2, 256-FFT, for bias currents equal to 100 nA in strong inversion with 1% mismatch, and 0.5 dB loss in BER performance for a coded system, is about 16 mW [23], which is still remarkably less than that of a comparable digital FFT implementation, which is about 340 mW in [28]. For higher radix FFT structures, the power consumption decreases even further. For instance, the power consumption of the full-radix analog 256-DFT is about 1.6 mW, which is more than an order of magnitude less compared to that of



Fig. 9. Monte Carlo simulation of radix-2, 256-FFT having the nonideal current mirror model for different bias currents. The higher the bias current, the better the system performance.

the radix-2, 256-FFT above [23]. A digital implementation of a radix-4, 1024-FFT presented in [29] consumes 2.3 mW/MHz.

## IV. SIMULATIONS AND SYSTEM PERFORMANCE

This section presents simulation results of the OFDM communication system model with forward error control shown in Fig. 1, using Matlab and C programs to characterize the FFT/DFT blocks. Monte Carlo simulations were run over an additive white Gaussian noise channel, measuring the BER performance of the system versus signal-to-noise ratio (SNR) $= E_b/N_0$. Unless otherwise specified, each data point was generated using $10^6$ bits.

## A. Current Mirror Model for 256-FFT Simulations

The performance of the 256-FFT using the current mirror linear curve model provided in Section II-A is presented in Fig. 9. This model is used at each current mirror, and the statistical simulation was run for different bias currents. The bias current range was varied from 100 pA to 100 nA based on the model shown in Fig. 5. As is clear from Fig. 9, performance improves with increased bias current values. We also simulated bias currents up to 100 $\mu$A, but the results were nearly ideal and were not plotted for the sake of clarity.

## B. Mismatch Model Simulations

Both mismatch models provided in Section III are now simulated for a radix-2, 256-FFT, to show that the simulation results match. The first model is the FFT that has a mismatch component at each current mirror, the per-mirror model, and its results are shown as solid lines in Fig. 10. The other model is the input-referred mismatch model shown as dashed lines in Fig. 10. This model has the input-referred noise vector with an equivalent variance defined in (10), which is multiplied by the inputs and then added at the output of the ideal FFT. The simulation results show that the output of the input-referred mismatch



Fig. 10. Mismatch simulation for radix-2 FFT structures with varying numbers of symbols N and with 1% mismatch. The input-referred mismatch model matches the per-mirror mismatch model.



Fig. 11. Strong inversion and weak inversion mismatch simulation comparison for a radix-2, 8-FFT. Weak inversion operation is much more sensitive to mismatch compared to strong inversion operation.

model matches that of the FFT having mismatch at every current mirror.

## C. Strong and Weak Inversion Mismatch Sensitivity

As shown in Fig. 11, an eight-FFT operating in strong inversion (normal distribution of current outputs) with 1% mismatch does not lose significant BER performance. For weak inversion operation (log-normal distribution of current outputs), however, the BER with only 0.2% mismatch is still about $10^{-1}$ at an SNR value of 10 dB. Weak inversion operation is therefore too sensitive to mismatch for proper operation, and it will not be further considered in this paper.



Fig. 12. Mismatch simulations for the 256-FFT with radix-2, radix-4, radix-16, and full-radix structures, with 2% mismatch. The higher the radix, the better the system performance due to reduced mismatch.

# D. High-Radix FFT Simulations

Fig. 10 shows that the radix-2, 256-FFT is very sensitive to mismatch, losing about 4 dB at $10^{-2}$ BER for 1% mismatch. To mitigate the impact of mismatch, a higher radix structure should be used to reduce the number of stages. Fig. 12 shows simulation results of radix-2, radix-4, radix-16, and full-radix structures. The simulation assumes 2% mismatch for every current mirror in each version of the 256-FFT. For the radix-2, 256-FFT the performance degradation is unacceptable; the radix-4 structure performs far better, with a loss of 2 dB at $10^{-3}$ BER, and the full-radix DFT loses only 0.5 dB at a BER of under $10^{-4}$. As shown in Table III, for a 256-FFT the ratio of the input-referred mismatch variance of the radix-2 structure to that of the full-radix structure is $251/1.61 \approx 156$, which is more than two orders of magnitude. Based on the numerical example of (6) provided earlier, the square root of the mismatch variance can be considered as a mismatch percentage. Therefore, for a 256-FFT, a full-radix structure with $\sqrt{156} \approx 12.5$ times the mismatch percentage of a radix-2 structure should have performance equivalent to that of the radix-2 structure.
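The arithmetic behind this comparison can be checked with a short sketch. The two variance values are the Table III figures quoted above; the helper `stages` and the 1% example are illustrative only.

```python
import math

def stages(n, radix):
    """Number of FFT stages: how many times n can be divided by radix."""
    count = 0
    while n > 1:
        n //= radix
        count += 1
    return count

N = 256
# Radix 256 stands for the full-radix (single-stage) DFT
stage_counts = {r: stages(N, r) for r in (2, 4, 16, 256)}

var_radix2 = 251.0  # input-referred mismatch variance, radix-2 (quoted from Table III)
var_full = 1.61     # input-referred mismatch variance, full radix (quoted from Table III)

ratio = var_radix2 / var_full  # about 156
scale = math.sqrt(ratio)       # about 12.5

# Illustrative: full-radix mismatch expected to match a radix-2 design at 1%
equiv_full_pct = 1.0 * scale

print(stage_counts)
print(round(ratio, 1), round(scale, 2), round(equiv_full_pct, 1))
```

The stage counts make the design trade-off explicit: moving from radix-2 to full radix reduces the number of cascaded stages for N = 256 from eight to one.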

Fig. 13 presents BER versus transistor pair mismatch simulations for the 256-FFT with radix-2, radix-4, radix-16, and full-radix structures, at a fixed SNR of 5 dB, for mismatch values from 0% to 10%. These curves illustrate how system performance degrades with small increases in the transistor pair mismatch. Again, the higher the radix, the better the system performance, owing to the reduced overall mismatch.
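For orientation, a mismatch-free reference level at a fixed SNR can be computed in closed form. The sketch below assumes that the reference is uncoded BPSK over an AWGN channel and that the quoted SNR is $E_b/N_0$; neither assumption is confirmed by the text here, and the function name `bpsk_ber` is illustrative.

```python
import math

def bpsk_ber(snr_db):
    """Uncoded BPSK bit error rate over AWGN, with snr_db interpreted as Eb/N0."""
    ebn0 = 10 ** (snr_db / 10)
    return 0.5 * math.erfc(math.sqrt(ebn0))

ref_5db = bpsk_ber(5.0)    # reference level at the fixed SNR used for Fig. 13
ref_10db = bpsk_ber(10.0)  # for comparison with the 10 dB point discussed for Fig. 11

print(ref_5db, ref_10db)
```

Under these assumptions, the 0% mismatch end of each curve would sit at `ref_5db`, and the vertical distance from that level indicates the degradation caused by mismatch alone.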

# E. Coded System Performance

Fig. 14 presents the simulation of a 256-DFT with 5% mismatch concatenated with a $(16, 11)^2$ turbo product code (TPC) as a forward error control mechanism, using BPSK modulation. Each data point was generated using $10^7$ bits. The decoder mitigates the mismatch loss compared to the performance of the uncoded system. The system loses only about 0.5 dB at a BER of $10^{-6}$, which can reasonably be considered a system noise component in an analog OFDM receiver. It is generally recognized in the literature that 5%–10% mismatch is fairly straightforward to realize [24]–[26].

Fig. 13. BER versus transistor pair mismatch simulations for the 256-FFT with radix-2, radix-4, radix-16, and full-radix structures, at fixed SNR = 5 dB. The higher the radix, the smaller the difference between the system performance and theoretical reference line due to reduced overall mismatch.

Fig. 14. Mismatch simulation results for a coded system using a $(16, 11)^2$ TPC. The decoder mitigates the mismatch loss, which is about 0.5 dB at a BER of $10^{-6}$ for 5% mismatch at every current mirror.
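The parameters of the coded simulation can be summarized numerically. The sketch below uses only the standard property that the product of an $(n, k)$ code with itself is an $(n^2, k^2)$ code, together with the $10^7$ bits per data point stated above; the variable names are illustrative.

```python
# Component code of the (16, 11)^2 turbo product code
n, k = 16, 11

block_len = n * n              # coded bits per TPC block
info_bits = k * k              # information bits per TPC block
rate = info_bits / block_len   # overall code rate

bits_per_point = 10 ** 7       # bits simulated per data point (from the text)
target_ber = 1e-6              # BER at which the 0.5 dB loss is quoted

# Average number of bit errors observed per data point at the target BER
expected_errors = bits_per_point * target_ber

print(block_len, info_bits, rate, expected_errors)
```

The block length equals 256, so one TPC block would fill one 256-point transform if each subcarrier carried a single BPSK symbol; that mapping is an assumption and is not stated in the text.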

## V. CONCLUSION

A novel analog N-FFT processor has been proposed, along with a mathematical model of its mismatch as an input-referred noise source. The output of the mismatch model closely matches the output of an N-FFT having mismatch at each current mirror. It was shown that an analog front end operating in strong inversion significantly outperforms one operating in weak inversion. It was demonstrated that the higher the radix of the N-FFT, the lower its sensitivity to mismatch. Therefore, the full-radix DFT is more appropriate than the radix-2 FFT for analog design. Finally, simulations demonstrated that, in an OFDM receiver concatenated with a forward error control code, an analog 256-DFT with 5% mismatch at each current mirror can be used with a total system loss of only 0.5 dB at a BER of $10^{-6}$. Because 5% mismatch is readily achievable, such an analog receiver system is feasible.

Future work will include the design, fabrication, and testing of a full system, and a study comparing the performance and silicon area of finite-precision digital implementations to analog implementations with mismatch. We foresee that the regular structure that consists only of current mirrors could make it feasible to exploit design automation tools to generate a compact layout.

## ACKNOWLEDGMENT

The authors would like to thank S. F. Fard, H. M. Nik, K. Boyle, and other colleagues working in the High-Capacity Digital Communications Laboratory for their kind support.

#### REFERENCES

- [1] C. Winstead, "Analog iterative error control decoders," Ph.D. dissertation, University of Alberta, Edmonton, AB, Canada, 2005.
- [2] J. Hagenauer and M. Winklhofer, "The analog decoder," in Proc. Int. Symp. Inf. Theory, Aug. 1998, p. 145.
- [3] M. Moerz, T. Gabara, R. Yan, and J. Hagenauer, "An analog 0.25-μm BiCMOS tailbiting MAP decoder," in *Proc. IEEE Int. Solid-State Circuits Conf. Digital Tech. Papers*, Feb. 2000, pp. 356–357.
- [4] F. Lustenberger, "On the design of analog VLSI iterative decoders," Ph.D. dissertation, Swiss Federal Inst. of Technol. Zurich (ETH), Zurich, Switzerland, Nov. 2000.
- [5] H. A. Loeliger, F. Lustenberger, M. Helfenstein, and F. Tarköy, "Probability propagation and decoding in analog VLSI," *IEEE Trans. Inf. Theory*, vol. 47, no. 2, pp. 837–843, Feb. 2001.
- [6] S. Hemati and A. H. Banihashemi, "Full CMOS min-sum analog iterative decoder," in *Proc. IEEE Int. Symp. Inf. Theory*, 2003, p. 347.
- [7] V. C. Gaudet and P. G. Gulak, "A 13.3-Mb/s 0.35-μm CMOS analog turbo decoder IC with a configurable interleaver," *IEEE J. Solid-State Circuits*, vol. 38, no. 11, pp. 2010–2015, Nov. 2003.
- [8] C. Winstead, J. Dai, S. Yu, C. Myers, R. R. Harrison, and C. Schlegel, "CMOS analog map decoder for (8,4) Hamming code," *IEEE J. Solid-State Circuits*, vol. 39, no. 1, pp. 122–131, Jan. 2004.
- [9] D. Vogrig, A. Gerosa, A. Neviani, A. G. I. Amat, G. Montorsi, and S. Benedetto, "A 0.35-μm CMOS analog turbo decoder for the 40-bit rate 1/3 UMTS channel code," *IEEE J. Solid-State Circuits*, vol. 40, no. 3, pp. 753–762, Mar. 2005.
- [10] C. Winstead, N. Nguyen, V. C. Gaudet, and C. Schlegel, "Low-voltage CMOS circuits for analog iterative decoders," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 53, no. 4, pp. 829–841, Apr. 2006.
- [11] S. Hemati, A. H. Banihashemi, and C. Plett, "A 0.18-μm CMOS analog min-sum decoder for a (32,8) low-density parity-check (LDPC) code," *IEEE J. Solid-State Circuits*, vol. 41, no. 11, pp. 2531–2540, Nov. 2006.
- [12] M. Yiu, C. Winstead, V. C. Gaudet, and C. Schlegel, "Design for testability of CMOS analog sum-product error-control decoders," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 54, no. 8, pp. 675–679, Aug. 2007.
- [13] J. A. Paradiso and T. Starner, "Energy scavenging for mobile and wireless electronics," *IEEE Trans. Pervasive Comput.*, vol. 4, no. 1, pp. 18–25, Jan.-Mar. 2005.
- [14] J. A. C. Bingham, "Multicarrier modulation for data transmission: An idea whose time has come," *IEEE Commun. Mag.*, vol. 28, no. 5, pp. 5–14, May 1990.
- [15] W. Y. Zou and Y. Wu, "COFDM: An overview," *IEEE Trans. Broadcast.*, vol. 41, pp. 1–8, Mar. 1995.
- [16] R. V. Nee and R. Prasad, OFDM for Wireless Multimedia Communications. Boston, MA: Artech House, 2000.
- [17] Y. W. Lin and C. Y. Lee, "Design of an FFT/IFFT processor for MIMO OFDM systems," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 54, no. 4, pp. 807–815, Apr. 2007.
- [18] J. Y. Oh and M. S. Lim, "Fast Fourier transform algorithm for low-power and area-efficient implementation," *IEICE Trans. Commun.*, vol. E89B, no. 4, pp. 1425–1429, Apr. 2006.
- [19] M. Lehne and S. Raman, "An analog/mixed-signal FFT processor for wideband OFDM systems," in *Proc. IEEE Sarnoff Symp.*, Mar. 2006, pp. 1–4.
- [20] S. K. Kim, J. S. Cha, H. Nakase, and K. Tsubouchi, "Novel FFT LSI for orthogonal frequency division multiplexing using current mode circuit," *Jpn. J. Appl. Phys. Solid State Devices Mater.*, vol. 40, no. 4B, pp. 2859–2865, Apr. 2001.
- [21] N. Sadeghi, H. M. Nik, C. Schlegel, V. C. Gaudet, and K. Iniewski, "Analog FFT interface for ultra-low power analog receiver architectures," in *Proc. Analog Decoding Workshop*, Turin, IT, June 2006, pp. 11–14.
- [22] K. Boyle, P. Mercier, N. Sadeghi, V. Gaudet, C. Schlegel, C. Winstead, and M. Kashyap, "Design and implementation of an all-analog fast-Fourier transform processor," in *Proc. IEEE Midwest Symp. Circuits Syst.*, Aug. 2007, pp. 1532–1535.
- [23] N. Sadeghi, "Analog FFT interface for ultra-low power analog receiver architectures," M.Sc. thesis, Univ. Alberta, Edmonton, AB, Canada, 2007.
- [24] M. J. M. Pelgrom, A. C. J. Duinmaijer, and A. P. G. Welbers, "Matching properties of MOS transistors," *IEEE J. Solid-State Circuits*, vol. 24, no. 5, pp. 1433–1440, Oct. 1989.
- [25] P. G. Drennan and C. C. McAndrew, "Understanding MOSFET mismatch for analog design," *IEEE J. Solid-State Circuits*, vol. 38, no. 3, pp. 450–456, Mar. 2003.
- [26] A. Oruganti and N. Ranganathan, "Leakage power reduction in dual-Vdd and dual-Vth designs through probabilistic analysis of vth variation," in *Proc. IEEE Int. Conf. VLSI Des.*, 2006, pp. 1–4.
- [27] Y. Tsividis, Operation and Modeling of the MOS Transistor, 2nd ed. New York: Oxford Univ. Press, 1999, pp. 170–175.
- [28] M. Hasan and T. Arslan, "Implementation of low-power FFT processor cores using a novel order-based processing scheme," *IEE Proc. on Circuits, Devices and Syst.*, vol. 150, no. 3, pp. 149–154, June 2003.
- [29] K. Pagiamtzis and G. Gulak, "Empirical performance prediction for IFFT/FFT cores for OFDM systems-on-a-chip," in *Proc. IEEE Midwest Symp. Circuits and Systems*, Aug. 2002, pp. I-583–I-586.



Nima Sadeghi (S'07) received the B.Sc. degree in electrical engineering from the Sharif University of Technology, Tehran, Iran, in 2004, and the M.Sc. degree in electrical and computer engineering from the University of Alberta, Edmonton, AB, Canada, in 2007. He is currently working toward the Ph.D. degree in electrical and computer engineering at the University of British Columbia, Vancouver, BC, Canada.

He has received two Full Graduate Scholarships from the Electrical and Computer Engineering Department, University of Alberta and the University of British Columbia during his M.Sc. studies and his Ph.D. studies, respectively. His current research interests include analog and mixed signal processing, specifically for digital communication applications. Other research interests include jointly considering system and circuit levels to optimize design performance. He is also engaged in research on submicron CMOS system-on-a-chip sensor interface design for high-temperature and low-power applications.



Vincent C. Gaudet (S'97–M'03) received the B.Sc. degree in computer engineering from the University of Manitoba, Winnipeg, Canada, in 1995, the Master of Applied Science degree from the University of Toronto, Toronto, Canada, in 1997, and the Ph.D. degree from the University of Toronto, in 2003.

From February 2002 to July 2002, he was a Research Associate at the Ecole Nationale Superieure des Telecommunications de Bretagne, Brest, France. He was the University of Alberta's Canadian Microelectronics Corporation Faculty Representative. He is currently an Associate Professor in the Department of Electrical and Computer Engineering, University of Alberta, Edmonton, AB, Canada, and during 2008–2009 is on sabbatical leave at Northeastern University, Boston, MA. His current research interests include information processing microsystems and, more specifically, energy-efficient graph-based decoding of error control codes such as turbo codes and low-density parity-check codes.

Dr. Gaudet is licensed as a Professional Engineer in the Province of Ontario, Canada. He was a member of the Natural Sciences and Engineering Research Council Scholarships and Fellowships Committee 176. He has served as an Associate Editor of the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—II: EXPRESS BRIEFS and a Digest Editor of the IEEE International Solid-State Circuits Conference.



**Christian Schlegel** (S'86–M'88–SM'97) received the Dipl. El. Ing. degree from the Swiss Federal Institute of Technology, Zurich, Switzerland, in 1984, and the M.S. and Ph.D. degrees in electrical engineering from the University of Notre Dame, Notre Dame, IN, in 1986 and 1989, respectively.

He held academic positions at the University of South Australia, the University of Texas, and the University of Utah, Salt Lake City. In 2001, he was appointed Chaired Professor for Digital Communications at the University of Alberta, Edmonton, Canada, heading a large research laboratory funded by the Alberta Informatics Circle of Research Excellence (iCORE). He is currently with the Department of Electrical and Computer Engineering, University of Alberta. He is the author or coauthor of the research monographs *Trellis Coding* and *Trellis and Turbo Coding* (IEEE/Wiley) and coauthor of *Coordinated Multiple User Communications* (Springer-Verlag). His current research interests include error control coding and applications, multiple access communications, modulation and detection, as well as analog and digital implementations of communications systems.

Dr. Schlegel served as an Associate Editor for coding theory and techniques of the IEEE TRANSACTIONS ON COMMUNICATIONS from 1999 to 2008, and as a Guest Editor of the PROCEEDINGS OF THE IEEE for its special issue on turbo coding. He has been a Technical Program Cochair for the IEEE Information Theory Workshop 2001 and the IEEE International Symposium on Information Theory 2005, and the General Chair for the IEEE Communications Theory Workshop 2005. He has also served on numerous other IEEE conference program committees, and is an IEEE Distinguished Lecturer. He received a 1997 Career Award, a Canada Research Chair in 2001, and an iCORE Professorship in 2002 and 2007, providing \$5M in research funding.