A Quad-Core-Coupled Triple-Push 295-to-301 GHz Source with 1.25 mW Peak Output Power in 65nm CMOS Using Slow-Wave Effect

Amir Hossein Masnadi Shirazi, Amir Nikpaik, Shahriar Mirabbasi, and Sudip Shekhar
University of British Columbia, Vancouver, Canada
{amirms, amirnik, shahriar, sudip}@ece.ubc.ca

Abstract—Achieving high output power in (sub-)THz voltage-controlled oscillators (VCOs) has been a severe design challenge in CMOS technology. In this work, an architecture for coupled terahertz (THz) VCOs is presented. The architecture uses four coupled triple-push VCOs and combines the generated third-harmonic currents at 300 GHz using slow-wave coplanar waveguides (S-CPW). Coupling four cores increases the output power, and the use of S-CPW reduces the loss and increases the quality factor of the VCO tank. It is shown that using S-CPW results in ~2.6 dB lower loss compared to the conventional CPW or grounded-CPW (GCPW) structures. The VCO is tuned using a parasitic tuning technique and achieves a 1.7% frequency tuning range (FTR). The proposed structure is designed and fabricated in a 65-nm bulk CMOS process. The measured peak output power of the 295-to-301 GHz VCO is 0.9 dBm (≈1.25 mW) at 300 GHz while consuming 235 mW (a DC-to-RF efficiency of 0.52%).

Index Terms—CMOS, coupled oscillator, slow-wave, triple push, harmonic generation, power combining, terahertz, VCO.

I. INTRODUCTION

In recent years, the continued scaling of CMOS processes has encouraged designers to implement sub-THz and THz building blocks [1]−[10]. Ranging from 300 GHz to 3 THz, the THz band lies between the domains of electronics and optics and can be used for a wide range of applications, e.g., high-data-rate communication and high-resolution imaging.

THz signal detection in CMOS has been shown to be quite feasible [9]; however, efficient THz signal generation in CMOS remains a challenge and an open research topic. Most of the abovementioned and emerging applications require a relatively high-power (>1 mW) THz source, which is currently implemented in costly non-CMOS processes using HBT/HEMT oscillators, quantum cascade lasers, and group III–V based multipliers. Generating a mW-level THz signal in CMOS using direct signal synthesis is difficult due to the limited maximum frequency of oscillation of a MOSFET (for example, \( f_{\text{max}} \approx 200 \text{ GHz} \) in a 65-nm process [1], [2], [6]). Hence, indirect signal synthesis or multiplication is required, but this reduces the generated output power. High-power signal generation typically requires constructive summation of weak harmonics of a lower-frequency signal, which in turn often requires a completely symmetric layout. Any mismatch between cores degrades the superposition of the signals and lowers the output power. Losses in the combiner also reduce the output power. As a result, the indirect signal synthesis approach has a poor DC-to-RF efficiency [1]−[10] and a small frequency tuning range (FTR), limited by the parasitic capacitances. Using an explicit varactor is not easily viable due to its poor quality factor [1].

Most CMOS-based THz sources use coupled harmonic generators, such as the N-push architecture, to generate and combine THz harmonics (as shown in Fig. 1). Among them, the push-push VCO (PPV) structure has attracted particular attention due to its even-harmonic extraction capability and ease of symmetric layout design [1]−[4]. However, the maximum achievable fundamental oscillation frequency \( f_{\text{osc-max}} \) of a PPV is architecture dependent and in most cases is a fraction of \( f_{\text{max}} \) (e.g., \( f_{\text{osc-max}} \approx f_{\text{max}}/2 \) for a class-B PPV, which results in \( f_{\text{osc-max}} \approx 100 \text{ GHz} \) for a 65-nm CMOS process [1], [2]). Thus, the 2nd-harmonic extraction technique is usually employed for generating harmonic frequencies close to \( f_{\text{max}} \). To further increase the frequency of operation beyond \( f_{\text{max}} \), a push-push approach with 4th-harmonic extraction has been used [2]. Although this approach benefits from a lower fundamental frequency and from the symmetric structure of the PPV, it has a relatively weak output power and poor DC-to-RF efficiency (0.13% at 320 GHz [1] and 0.03% at 256 GHz [2]). For odd harmonics, the triple-push VCO (TPV) structure can be used [6]−[8]. It has been shown that the TPV is a candidate architecture for boosting \( f_{\text{osc-max}} \) towards the \( f_{\text{max}} \) of the process [2], [7]. Most of the reported THz TPVs are single-stage (step 2 in Fig. 1) and are not coupled. Thus, the reported output powers are in the range of $-10$ to $-6$ dBm [7], [8]. This can be partly attributed to the layout complexity of TPV structures, which may force the designer to use an asymmetric tank layout. For example, laying out transmission lines at the required $60^\circ$ angles is not allowed in many standard CMOS processes.

![Fig. 1. Steps towards generating a high power THz source](image-url)

In this paper, to improve the output power and DC-to-RF efficiency, we propose a quad-core passively coupled TPV which extracts the third harmonic and delivers 0.9 dBm (1.25 mW) at that harmonic. As will be shown, compared to the 4th harmonic, 3rd-harmonic extraction results in a $1.5\times$ higher efficiency in the 250-to-300 GHz band. To further boost the efficiency, each VCO uses a slow-wave inductor for its tank and combiner. Compared to the conventional CPW and grounded-CPW (GCPW) structures, the proposed slow-wave structure can reach a 40\% higher quality factor (Q) and 2.6 dB lower insertion loss. Measurement results confirm that, compared to CPW or GCPW, S-CPW can deliver 2.6 dB higher output power (all three variants are measured). The paper is organized as follows: Section II briefly compares the efficiency of PPV and TPV structures. Section III presents the proposed slow-wave triple-push VCO. Measurement results and concluding remarks are provided in Sections IV and V, respectively.
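
The headline figures quoted so far (0.9 dBm peak output power, 235 mW DC consumption, about 0.52% DC-to-RF efficiency) can be cross-checked with a short calculation. The following is only an illustrative sketch; the function names are hypothetical and are not part of the original work.

```python
def dbm_to_mw(p_dbm: float) -> float:
    """Convert a power level in dBm to milliwatts."""
    return 10 ** (p_dbm / 10)


def dc_to_rf_efficiency_percent(p_out_dbm: float, p_dc_mw: float) -> float:
    """DC-to-RF efficiency in percent: RF output power divided by DC power."""
    return 100 * dbm_to_mw(p_out_dbm) / p_dc_mw


# Values quoted in the abstract: 0.9 dBm output power, 235 mW DC power.
p_out_mw = dbm_to_mw(0.9)
efficiency = dc_to_rf_efficiency_percent(0.9, 235.0)

print(f"Output power: {p_out_mw:.2f} mW")
print(f"DC-to-RF efficiency: {efficiency:.2f} %")
```

The exact conversion of 0.9 dBm gives roughly 1.23 mW, which is in line with the rounded ≈1.25 mW figure quoted in the text, and the resulting efficiency is close to the stated 0.52%.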

![Fig. 2. (a) Simulated PPV and TPV with their (b) DC-to-RF efficiency in 65nm process with ideal combiner](image)

II. PPV AND TPV EFFICIENCY COMPARISON

To compare the harmonic efficiency of the PPV and TPV over frequency, the two structures shown in Fig. 2(a) are designed and simulated in a 65-nm CMOS process. At each frequency, the component values of the core oscillator of each structure are adjusted to optimize the DC-to-RF efficiency. For the purpose of simulation, ideal passive components such as the RF choke (RFC) and combiner are used. The Q of the LC tanks is chosen to be $\approx 30$ at all frequencies. In addition, both architectures utilise $L_{\text{Gate}}$ to boost their effective $f_{\text{max}}$ [1], [2]. Fig. 2(b) plots the DC-to-RF efficiency for the 2nd, 3rd, and 4th harmonics. For the simulated PPV structure, the maximum fundamental oscillation frequency $f_{\text{osc-max}}$ is about 140 GHz, which would generate 2nd and 4th harmonics at 280 and 560 GHz, respectively. Fig. 2(b) suggests that for frequencies higher than 210 GHz, 2nd-harmonic generation is not as efficient as its 3rd-harmonic counterpart. This can be attributed to the fact that the fundamental frequency is approaching the $f_{\text{max}}$ of the transistor. For frequencies higher than 360 GHz, 4th-harmonic generation is preferred. In this work, the target frequency is below 360 GHz, and thus we focus on the TPV architecture, which offers superior efficiency according to Fig. 2(b).
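
Why a push-push core extracts even harmonics while a triple-push core extracts the third harmonic can be illustrated with an idealized phasor sum. The sketch below assumes N identical cores whose fundamental phases are exactly 2πk/N apart and whose harmonic currents have unit amplitude; the function name is hypothetical, and no device nonlinearity or loss is modelled.

```python
import cmath
import math


def combined_harmonic_amplitude(n_cores: int, harmonic: int) -> float:
    """Magnitude of the summed unit phasors of one harmonic at the common node.

    Core k runs at a fundamental phase of 2*pi*k/n_cores, so its h-th harmonic
    appears at h times that phase.
    """
    total = sum(
        cmath.exp(1j * harmonic * 2 * math.pi * k / n_cores)
        for k in range(n_cores)
    )
    return abs(total)


for n_cores, name in ((2, "push-push"), (3, "triple-push")):
    amplitudes = [
        round(combined_harmonic_amplitude(n_cores, h), 6) for h in range(1, 7)
    ]
    print(f"{name:12s} harmonics 1..6 -> {amplitudes}")
```

Under these assumptions, the two-core sum is non-zero only for even harmonics and the three-core sum only for multiples of three, so the fundamental cancels at the combining node in both cases.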

III. PROPOSED SLOW-WAVE QUAD-CORE-COUPLED TPV

Fig. 3(a) shows the proposed TPV. Four triple-push oscillators are coupled in-phase, and the third harmonics are combined and matched to 50 \(\Omega\). Each oscillator is tuned to 100 GHz, and the tank (40 pH inductor) is implemented using a slot-type floating S-CPW as shown in Fig. 3(b) [11]. $L_{\text{Gate}}$ is implemented using a CPW line to control the drain-gate phase and boost the $g_{\text{m}}$ of the devices [1], [2]. At the third harmonic, $L_{\text{Gate}}$ presents a high impedance (ideally an open circuit), and the generated harmonic current flows to the centre-tap (CT) node. The oscillators are coupled in-phase at the fundamental frequency by coupling the drain node of each transistor to the consecutive stage. The generated 3rd harmonics are then combined using four shielded S-CPWs and connected to the output pad. A 5-port electromagnetic (EM) simulation is carried out to match the combiner to 50 \(\Omega\). Fig. 3(c) shows the output matching ($S_{11}$) of the combiner. As discussed next, using the slotted slow-wave structure for the inductor and combiner results in a higher quality factor, which in turn relaxes the start-up condition of the oscillator and reduces the insertion loss of the combiner.

A. Slotted S-CPW and Comparison with Conventional CPW

Fig. 4 illustrates the difference between GCPW and S-CPW lines. The primary goal of using patterned CPW lines is to isolate the lossy substrate from the signal path and reduce the associated eddy-current loss in the substrate. Theoretically, a GCPW is able to fully isolate the substrate from the signal path; however, in practice, providing a true 0 V reference is impossible, and thus a signal can be induced on the ground plane, which degrades the substrate/signal isolation [11]. An alternative solution is to use a slotted S-CPW. Since the shield is a good conductor, there is no electric field tangential to the strips; the voltage on the shield is therefore zero with respect to the CPW, and the strips can provide better shielding than a GCPW [11]. The phase velocity is given by:

\[ v_p = \frac{\omega}{\beta} = \frac{c_0}{\sqrt{\varepsilon_{r,eff}}} = \frac{1}{\sqrt{LC}} , \]

where \( \omega \) is the angular frequency, \( \beta \) is the propagation constant, \( c_0 \) is the speed of light in vacuum, \( \varepsilon_{r,eff} \) is the effective relative permittivity, and \( L \) and \( C \) are the inductance and capacitance per unit length. Using slotted strips under the CPW increases the effective \( C \) without significantly affecting the inductance. Consequently, \( \varepsilon_{r,eff} \) (or, equivalently, \( \beta \)) increases. It can be shown that the quality factor of the transmission line can be written as [11]:

\[ Q = \frac{\beta}{2\alpha} = \frac{\omega\sqrt{\varepsilon_{r,eff}}}{2\alpha c_0} , \]

where \( \alpha \) is the attenuation constant of the line. Increasing \( \varepsilon_{r,eff} \) using floating strips in turn boosts \( Q \). To validate this effect, the slotted S-CPW is designed using the thick top metal (M9) with slots on the next metal layer (M8). The structure is EM simulated and compared with conventional CPW and GCPW. Fig. 5 shows the simulated insertion loss and quality factor of the structures. The S-CPW attains around 50% better quality factor and 2 dB lower insertion loss at 300 GHz.
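
The slow-wave argument above can be illustrated numerically. The sketch below evaluates the phase velocity, effective permittivity, propagation constant, and \( Q = \beta / (2\alpha) \) for a line before and after its capacitance per unit length is increased. All numeric values are hypothetical placeholders rather than data from this work, and the attenuation constant is assumed to stay unchanged when the floating strips are added, which is a simplification.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum, m/s


def line_metrics(l_per_m, c_per_m, freq_hz, alpha_np_per_m):
    """Return (v_p, eps_r_eff, beta, Q) for a low-loss transmission line."""
    v_p = 1.0 / math.sqrt(l_per_m * c_per_m)  # v_p = 1 / sqrt(L * C), m/s
    eps_r_eff = (C0 / v_p) ** 2               # from v_p = c0 / sqrt(eps_r_eff)
    beta = 2.0 * math.pi * freq_hz / v_p      # from v_p = omega / beta, rad/m
    q = beta / (2.0 * alpha_np_per_m)         # Q = beta / (2 * alpha)
    return v_p, eps_r_eff, beta, q


# Hypothetical line parameters (assumptions, not taken from the paper).
L_PER_M = 3.3e-7    # inductance per unit length, H/m
C_PER_M = 1.3e-10   # capacitance per unit length, F/m
ALPHA = 230.0       # attenuation constant, Np/m (about 2 dB/mm)
FREQ = 300e9        # evaluation frequency, Hz
CAP_BOOST = 2.0     # assumed capacitance increase from the floating strips

cpw = line_metrics(L_PER_M, C_PER_M, FREQ, ALPHA)
scpw = line_metrics(L_PER_M, CAP_BOOST * C_PER_M, FREQ, ALPHA)

print(f"eps_r_eff ratio (slow-wave / plain): {scpw[1] / cpw[1]:.3f}")
print(f"Q ratio (slow-wave / plain):         {scpw[3] / cpw[3]:.3f}")
```

With the inductance and attenuation held fixed, scaling the capacitance by a factor k scales \( \varepsilon_{r,eff} \) by k and both \( \beta \) and \( Q \) by \( \sqrt{k} \). In a real structure the shield also changes the loss, so this only shows the trend.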

IV. IMPLEMENTATION AND MEASUREMENT RESULTS

As a proof of concept, the proposed slow-wave TPV (Fig. 3) is designed and implemented in a 65-nm CMOS process. Fig. 6 shows chip micrographs of the single-stage and quad-core-coupled TPVs. The active die area (including pads) of the quad-core-coupled TPV is about 290×316 μm² (Fig. 6(c)). To confirm the advantages of the slow-wave design, the same TPV structure is replicated using GCPW and CPW combiners and tanks (Fig. 6(a) and Fig. 6(b)). All passive components are simulated using the Sonnet 3D electromagnetic (EM) simulator. Fig. 7 shows the test setup used for the measurements. For frequency measurements, the chip is probed and the VCO signal is downconverted using an OML M03HWD harmonic mixer (the chip is also measured using a VDI WR3.4 sub-harmonic mixer). The LO is provided by an Agilent E4448A PSA spectrum analyser, which can also map the downconverted signal back to its original frequency. The output power is measured using an Erickson PM4 power meter. Table I summarizes the measurement results for the different flavours of the implemented structure. As can be seen from the table, the slow-wave TPV has the best performance, with 2.6 dB

![Fig. 4. Grounded and slow-wave CPW structures](image1)

![Fig. 5. Q and S21 insertion loss of S-CPW, GCPW, and CPW](image2)

![Fig. 6. Chip micrographs for proposed TPVs.](image3)

![Fig. 7. Measurement test setup](image4)

### TABLE I. COMPARISON OF CPW, GCPW, AND S-CPW TPV

<table>
<thead>
<tr>
<th>TPV Architecture</th>
<th>Frequency (GHz)</th>
<th>Peak Output Power (dBm)</th>
<th>DC-to-RF Efficiency (%)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Single with CPW tank</td>
<td>298</td>
<td>-11.9</td>
<td>0.30</td>
</tr>
<tr>
<td>Single with GCPW tank</td>
<td>301</td>
<td>-11.2</td>
<td>0.22</td>
</tr>
<tr>
<td>Single with S-CPW tank</td>
<td>300</td>
<td>-10.8</td>
<td>0.68</td>
</tr>
<tr>
<td>Quad-core with CPW Combiner</td>
<td>297</td>
<td>-3.2</td>
<td>0.15</td>
</tr>
<tr>
<td>Quad-core with GCPW Combiner</td>
<td>301</td>
<td>-1.7</td>
<td>0.21</td>
</tr>
<tr>
<td>Quad-core with S-CPW Combiner</td>
<td>299</td>
<td>0.9</td>
<td>0.51</td>
</tr>
</tbody>
</table>

higher output power and almost 2× better DC-to-RF efficiency compared to the other implementations. The advantage of using a quad-core-coupled architecture is also apparent. Fig. 8(a) shows the captured 300 GHz signal (note that the spectrum analyser has mapped the downconverted IF signal back to the RF band). The VCO is tuned by changing the supply voltage, which in turn changes the gate parasitics of the MOS devices. Fig. 8(b) shows the output power and tuning range of the proposed prototype. Table II summarizes the performance of the proposed slow-wave TPV prototype alongside that of related state-of-the-art designs for comparison. The proposed design compares favourably with the state of the art and achieves 2.4 dB higher output power and 2× better efficiency than the best-performing prior design at 300 GHz [6].
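
The power comparisons drawn from Table I can be reproduced directly from the tabulated dBm values, since a difference of two dBm readings is a ratio expressed in dB. The sketch below only restates the Table I numbers; the dictionary layout and helper names are hypothetical.

```python
# Peak output power in dBm, copied from Table I.
TABLE_I_DBM = {
    ("single", "CPW"): -11.9,
    ("single", "GCPW"): -11.2,
    ("single", "S-CPW"): -10.8,
    ("quad", "CPW"): -3.2,
    ("quad", "GCPW"): -1.7,
    ("quad", "S-CPW"): 0.9,
}


def gain_db(design, reference):
    """Output power of `design` relative to `reference`, in dB."""
    return TABLE_I_DBM[design] - TABLE_I_DBM[reference]


def db_to_power_ratio(value_db):
    """Convert a dB difference to a linear power ratio."""
    return 10 ** (value_db / 10)


comparisons = {
    "quad S-CPW vs quad GCPW": gain_db(("quad", "S-CPW"), ("quad", "GCPW")),
    "quad S-CPW vs quad CPW": gain_db(("quad", "S-CPW"), ("quad", "CPW")),
    "quad S-CPW vs single S-CPW": gain_db(("quad", "S-CPW"), ("single", "S-CPW")),
}

for label, delta in comparisons.items():
    print(f"{label}: {delta:+.1f} dB ({db_to_power_ratio(delta):.2f}x in power)")
```

The first two differences correspond to the 2.6 dB and 4.1 dB advantages of the S-CPW design over its GCPW and CPW counterparts, and the third quantifies the gain of the quad-core-coupled version over the single S-CPW core.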

![Fig. 8. (a) Measured signal at 300.8 GHz and (b) frequency tuning and output power versus supply voltage](image)

V. CONCLUSION

A high-output-power, passively coupled, tunable triple-push source is presented. A 3rd-harmonic 295-to-301 GHz proof-of-concept prototype is designed and measured in a 65-nm CMOS process. The structure uses S-CPW to increase the quality factor of the fundamental tank and the combiner, and couples four cores together to increase the output power. The measurement results show that the S-CPW-based design achieves 2.6 dB and 4.1 dB lower power loss compared to the equivalent GCPW-based and CPW-based structures, respectively. The performance of the proposed design compares favourably with that of the state of the art.

ACKNOWLEDGMENT

We would like to thank Professor S. Safavi-Naeini and A. Nabavi at the University of Waterloo and Professor A. Niknejad at BWRC, UC Berkeley, for providing access to measurement equipment, and Andrew Townley for his help with the test setup.

REFERENCES