# An All-Digital Phase-Frequency Tunable Clock Generator for Wireless OFDM Communications Systems

Jui-Yuan Yu, Juinn-Ting Chen, Mei-Hui Yang, Ching-Che Chung, and Chen-Yi Lee

Institution of Electronics, National Chiao Tung University, Taiwan

Email: blues@si2lab.org, cylee@si2lab.org

## ABSTRACT

An all-digital clock generator is designed to allow the clock phase and frequency to be tuned dynamically while the wireless communications system is in operation. This phase-frequency tunable clock generator (PFTCG) provides 8 selectable clock phases and lets the ADC circuits sample signals at a lower frequency and with a better sampling phase, resulting in lower power consumption. The PFTCG provides a frequency tuning range of ±150ppm centered at 5MHz, which improves performance by reducing the sampling clock offset. The PFTCG is simulated in a wireless body area network system and shows a 6.3dB SNR improvement at BER=$10^{-3}$; the hardware is simulated with a power consumption of 77.56$\mu$W in a standard-process 90nm CMOS technology.

### **I. INTRODUCTION**

All-digital clock generators have been adopted in more and more SoC systems. With the all-digital design approach, the system can be integrated at lower cost and scales more easily across process nodes. In wireless communications systems, especially those using OFDM modulation, the overall decoding performance depends strongly on the synchronization (timing and frequency) accuracy. To maintain sampling timing accuracy, existing designs [1-2] apply oversampling approaches to acquire sufficient signal integrity. This oversampling rate, however, cannot be an exact multiple of the signal bandwidth in a real implementation. Thus a sampling clock offset (SCO) exists in the sampled digital signals, which decreases the signal-to-noise ratio (SNR) because of inter-carrier interference (ICI). To guarantee stable and correct decoding, phase-error tracking [3] is applied to compensate the residual phase rotation in these noisy signals.

However, the oversampling scheme consumes too much power in the analog-to-digital converter (ADC) circuits, and phase-error tracking cannot reduce the noise level caused by the ICI effect. Therefore, the proposed all-digital phase-frequency tunable clock generator (PFTCG) is designed to reduce power and improve performance in the face of these problems. When the oversampling scheme is replaced by a symbol-rate sampling approach, dynamic sample-timing control (DSTC) [4] is used to calculate the best sampling instant. The PFTCG is then controlled by commands from the DSTC and supplies a different clock sampling phase to the ADC circuits. Once the sampling timing is decided, the next step is to compensate the clock frequency offset. In phase-error tracking, the phase slope across the subcarriers corresponds to the SCO value, and this information is used to change the frequency generated by the PFTCG. The overall control algorithm can be found in [5].

To enable the operation discussed above, the clock generator is designed to change the generated clock phase and frequency dynamically with a response time of several cycles. Existing analog frequency synthesizers require a longer tracking time, so it is difficult to shift the resulting clock phase and frequency after lock. As a result, an all-digital scheme is proposed in this paper. Unlike the previous designs [4] [6], which are capable only of clock phase adjustment, this paper addresses both clock phase and frequency control to achieve robust signal decoding, lower power consumption, and higher system performance.

This paper is organized as follows. Section II presents the system overview, and Section III describes the design of the proposed phase-frequency tunable clock generator. The system performance and hardware profile are then simulated in Section IV, followed by the conclusion in Section V.



Fig. 1. Block diagram of the system operation with the aid of the proposed all-digital PFTCG

### **II. SYSTEM OVERVIEW**

The overall system operation with the aid of the all-digital PFTCG is illustrated in Fig. 1. The signals are transmitted through the channel, corrupted by environmental noise, and received at the receiver side. The PFTCG provides the driving clock to the ADC for sampling the incoming signal. The sampled signals are processed by the timing synchronization block, which includes packet detection and boundary detection. When a packet is detected, the timing-error detector (TED) determines whether the sampling instant is the best one within a symbol period. If not, a control code $\hat{\varepsilon}$ is sent to the PFTCG, and an updated clock signal with a timing offset of a fraction of the period is selected to drive the ADC. The subsequent signals are therefore sampled at the same frequency but with a small timing offset. This TED computation loop keeps updating $\hat{\varepsilon}$ until the best timing offset is reached.
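The TED loop described above can be pictured as a small search over the available clock phases. The following is only a minimal sketch under stated assumptions: the function `track_best_phase` and the quality measure `metric` are hypothetical names introduced here for illustration, and the hill-climbing rule is a generic stand-in, not the DSTC algorithm of [4] or the control algorithm of [5]. Only the number of phases (8) comes from the paper.

```python
NUM_PHASES = 8  # from the paper: the PFTCG offers 8 selectable clock phases


def track_best_phase(metric, start=0, max_iters=NUM_PHASES):
    """Hill-climb over phase indices until no neighbouring phase is better.

    `metric` is a hypothetical callable mapping a phase index (0..7) to a
    sampling-quality score (larger is better). It stands in for whatever
    the timing-error detector actually measures.
    """
    phase = start % NUM_PHASES
    for _ in range(max_iters):
        up = (phase + 1) % NUM_PHASES    # next later phase (wraps around)
        down = (phase - 1) % NUM_PHASES  # next earlier phase (wraps around)
        # max() keeps the first maximal candidate, so ties favour staying put.
        best = max((phase, up, down), key=metric)
        if best == phase:
            break  # the current phase is a local optimum; stop updating
        phase = best
    return phase
```

Each iteration corresponds to one update of the control code sent to the phase-selection multiplexer; the loop stops as soon as the current phase scores at least as well as both neighbours.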

After the TED, the frequency-error detector (FED) takes the frequency-domain signals (after the FFT) and computes the sampling frequency offset $\hat{\xi}$ from the phase rotation of the complex signals. The offset $\hat{\xi}$ is then fed back to the PFTCG, which generates an updated clock with a slight frequency shift to reduce the SCO value. In summary, the PFTCG takes $\hat{\varepsilon}$ and $\hat{\xi}$ to generate a clock source with the desired frequency and phase, and both values are calculated in the synchronization loop under overall system considerations. This defines the role and the specifications of the all-digital PFTCG.
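To illustrate how a sampling frequency offset can be read from the phase rotation of the subcarriers, the sketch below uses the common textbook OFDM model, in which the phase rotation between two consecutive symbols at subcarrier $k$ is approximately $2\pi k \xi (N + N_g)/N$. This model, the function name `estimate_sco`, and the parameters `fft_size` and `guard_len` are assumptions made for illustration; the paper does not state the estimator it uses (see [5] for the actual algorithm).

```python
import math


def estimate_sco(phase_diffs, fft_size, guard_len):
    """Estimate the relative SCO from per-subcarrier phase rotations.

    phase_diffs: dict mapping subcarrier index k (signed, DC excluded) to the
        phase rotation in radians observed between two consecutive symbols.
    fft_size, guard_len: hypothetical OFDM parameters N and Ng.

    Assumed model: phase_diffs[k] ~= 2*pi*k*xi*(N + Ng)/N.
    """
    # Least-squares slope of a line through the origin: phi_k = slope * k.
    num = sum(k * phi for k, phi in phase_diffs.items())
    den = sum(k * k for k in phase_diffs)
    slope = num / den  # radians per subcarrier index
    # Invert the assumed model to recover the relative offset xi.
    return slope * fft_size / (2.0 * math.pi * (fft_size + guard_len))
```

The returned value is a relative offset (for example, 150e-6 for 150ppm), which is the kind of quantity that would be fed back to the PFTCG.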

### **III. PHASE-FREQUENCY TUNABLE CLOCK GENERATOR**

#### A. PFTCG Architecture

The proposed architecture is shown in Fig. 2. The PFTCG is composed of two main parts: (i) 5MHz clock generation from the reference clock and (ii) multi-phase clock generation from the generated 5MHz clock source. The phase-frequency detector (PFD) and digitally controlled oscillator (DCO) are used for 5MHz clock generation. The DCO is also designed with 8 delay buffers, each providing a delay of $T_{FREF}/8$, resulting in 8 clock signals with an equal spacing of $T_{FREF}/8$ between Phase\_N and Phase\_(N-1). One of these 8 clocks is then chosen by the phase-selection multiplexer. To dynamically generate a clock with variable frequency and phase, the estimated frequency offset $\hat{\xi}$ and phase offset $\hat{\varepsilon}$ are sent to the controller and the phase-selection multiplexer, respectively. In this architecture, the correct frequency is generated by the closed loop formed by the PFD, the controller, and Phase\_0 of the DCO. When this loop reaches steady state, the DCO divides the total delay into 8 partitions with different clock phases.

Fig. 2. The proposed PFTCG architecture
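As a quick numerical illustration of the multi-phase output, the sketch below lists the nominal delay of each of the 8 phases for a 5MHz clock, i.e. a 200ns period split into 25ns steps. The helper names `phase_delays_ns` and `select_phase`, and the idea that the phase control code is a plain wrap-around index, are assumptions for illustration only.

```python
F_CLK_HZ = 5e6   # generated clock frequency, from the paper
NUM_PHASES = 8   # number of selectable phases, from the paper


def phase_delays_ns(f_clk_hz=F_CLK_HZ, num_phases=NUM_PHASES):
    """Nominal delay of Phase_0..Phase_(N-1) relative to Phase_0, in ns."""
    period_ns = 1e9 / f_clk_hz
    return [n * period_ns / num_phases for n in range(num_phases)]


def select_phase(code, num_phases=NUM_PHASES):
    """Hypothetical mux model: map a signed control code to a phase index."""
    return code % num_phases
```

With the default arguments, adjacent phases are 25ns apart, which matches the segment delay quoted for the 1<sup>st</sup> tuning stage.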

The control loop of this architecture is depicted in Fig. 3. At the beginning of the clock generation and locking loop, the PFD generates an UP or DOWN command to modify the delay in the tracking loop. When a new DCO code is calculated, the present DCO and PFD control signals are first cleared and then updated to the latest DCO code. This *clear* $\rightarrow$ *update* sequence, rather than a direct update, prevents the loop from diverging in frequency and phase. When the loop reaches the lock state, the resulting clock frequency corresponds to the desired 5MHz clock; this is regarded as the coarse tuning loop. To match the signal sampling frequency of the transmitter (the SCO may come from the TX or the RX itself), another fine-tuning loop is applied to approach the TX clock frequency according to the information $\hat{\xi}$. The SCO is then minimized down to the resolution provided by the delay hardware.
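The coarse locking behaviour can be mimicked with a small behavioural model. This is a sketch under several assumptions that are not taken from the paper: the DCO is reduced to a single stage whose period is `code * step_ns`, 'UP' is taken to mean "speed the DCO up", and the search step is halved whenever the PFD command changes polarity (a common ADPLL practice, chosen here only for illustration). All function and parameter names are hypothetical.

```python
def pfd(ref_period_ns, dco_period_ns, tol_ns):
    """Behavioural PFD: compare periods and return 'UP', 'DOWN', or 'LOCK'.

    Assumed convention: 'UP' means the DCO is too slow and must speed up.
    """
    error = dco_period_ns - ref_period_ns
    if abs(error) <= tol_ns:
        return "LOCK"
    return "UP" if error > 0 else "DOWN"


def coarse_lock(ref_period_ns, step_ns, max_code=255, max_cycles=128):
    """Search for a DCO code whose period matches the reference period.

    Returns (code, cycles_used). The 128-cycle budget mirrors the lock-in
    time quoted in the paper; everything else is an illustrative assumption.
    """
    code = max_code // 2
    search_step = max_code // 4
    last_cmd = None
    for cycle in range(max_cycles):
        cmd = pfd(ref_period_ns, code * step_ns, tol_ns=step_ns / 2.0)
        if cmd == "LOCK":
            return code, cycle
        if last_cmd is not None and cmd != last_cmd:
            # Polarity change: the target was crossed, so refine the step.
            search_step = max(1, search_step // 2)
        # Stand-in for the clear -> update sequence: the old code is dropped
        # and replaced by the newly computed one in a single assignment.
        code += -search_step if cmd == "UP" else search_step
        code = min(max(code, 0), max_code)
        last_cmd = cmd
    return code, max_cycles
```

For example, `coarse_lock(200.0, 1.16)` models locking to a 200ns period with an illustrative 1.16ns step. A fine-tuning loop driven by $\hat{\xi}$ would then adjust the period further, which this model does not cover.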

#### B. Circuit Designs

The PFD design follows the circuit topology proposed in [7]. The DCO, shown in Fig. 4, is composed of 3 tuning stages to cover a wide range of delay resolutions, from several tens of nanoseconds down to the ten-picosecond scale. To provide 8 phases from the generated 5MHz clock source, the buffers in the 1<sup>st</sup> tuning stage divide the total delay into delay segments of 25ns each. The 2<sup>nd</sup> tuning stage provides delays that compensate for delay changes due to process-voltage-temperature (PVT) variations. The circuit topology of the 2<sup>nd</sup> tuning stage follows that of the 1<sup>st</sup> one except for the delay resolution.

The 3<sup>rd</sup> tuning stage provides the highest delay resolution, i.e. the smallest delay step. This stage is used to shift the frequency of the generated clock slightly toward the TX clock frequency. In the 1<sup>st</sup> tuning stage, the clock signals between the buffers in the delay line are connected to 8 multiplexer groups; each multiplexer group selects a clock signal spaced $T_{FREF}/8$ from its neighbors, and together they form the vector out0~out7. Every outN signal is then fine-tuned individually by the 2<sup>nd</sup> and 3<sup>rd</sup> stages. The SCO between TX and RX is assumed to be 150ppm at 5MHz, so the frequency variation is 750Hz, which corresponds to a PFTCG tuning range of ±30ps in clock period. To achieve this resolution, the load capacitance of a NAND logic gate is varied, as shown in Fig. 5 [8]. A 3-input NAND is selected with one input node tied to zero to cut off the paths of the NMOS and PMOS from ground and the voltage supply, respectively. The on-off state of ON3 then decides whether an additional capacitance $\Delta C$ appears at the output node of the delay cells, changing the charge and discharge time by the desired delay resolution.
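The 750Hz and ±30ps figures quoted above follow directly from the 150ppm assumption. As a short worked check, with $f = 5$MHz and $T = 1/f$ (the first-order approximation $\Delta T / T \approx \Delta f / f$ is the only step not spelled out in the text):

$$
\Delta f = 150 \times 10^{-6} \times 5\,\mathrm{MHz} = 750\,\mathrm{Hz}, \qquad
T = \frac{1}{5\,\mathrm{MHz}} = 200\,\mathrm{ns}, \qquad
\Delta T \approx T \cdot \frac{\Delta f}{f} = 200\,\mathrm{ns} \times 150 \times 10^{-6} = 30\,\mathrm{ps}.
$$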

### **IV. SIMULATION RESULTS**

The simulation results in this paper are examined in terms of system performance and hardware profile. The target application is the new-generation wireless body area network (WBAN) reported in [6], which uses a multi-tone CDMA (MT-CDMA) modulation scheme. In this application, the chip must be highly integrated in a tiny area, and the power consumption is limited to the µW scale. To shrink the area, the crystal reference clock is replaced by an oscillator pad [9], which has a worse SCO. Therefore, the tolerated SCO in this system is extended and defined as ±150ppm relative to the 5MHz sampling frequency. Under this system constraint, the proposed PFTCG in the receiver is able to generate a clock whose frequency approaches that of the transmitted signals. Fig. 6 shows the performance difference before and after the PFTCG is applied. Without the PFTCG, the noise level due to ICI in the received signals is too high to recover the original signals, so an SNR loss of 7dB appears in the decoded results at BER=$10^{-3}$. With the PFTCG, the clock frequency at the receiver side is adjusted to approach that of the transmitter side with a remaining SCO within ±25ppm. Therefore, the system achieves a 6.3dB SNR improvement.

In the generated waveforms, Fig. 7(a) shows an overview of the operation scenario. After the RESET trigger, the PFTCG starts to converge the generated clock toward 5MHz until the LOCK signal is asserted. During this time, the CLEAR\_DCO signal is sent frequently to update the loop to a new delay path. After this, the circuit enters the steady state. When the PFTCG is required to change the frequency by the control signal (corresponding to the signals TUNE VALID and TUNE CODE), the resulting clock takes on an updated frequency. Fig. 7(b) shows the 8 evenly spaced clock waveforms in the steady state. Adjacent waveforms are separated by a delay of about 25ns. Looking at the exact delay spacing, the phase slots from Phase 0 to Phase 7 occupy 10%, 10%, 10%, 12%, 14%, 14%, 14%, and 15% of one period, respectively.

Fig. 5. The design of the 3<sup>rd</sup> fine tuning stage

Table 1 summarizes the hardware information. The PFTCG is implemented in a standard-process 90nm high-threshold-voltage (SPHVT) CMOS technology. The generated clock frequency is 5MHz, with 8 phases. The lock-in time is designed to be within 128 cycles. The delay cell resolutions of the 1<sup>st</sup> to 3<sup>rd</sup> tuning stages in the DCO circuit are 37.16ns, 1.16ns, and 18ps, respectively. With the supply voltage scaled to 0.8V in SPHVT CMOS, the PFTCG consumes $77.56\mu W$ with a maximum RMS jitter of 62ps.

Finally, the layout view of the design is shown in Fig. 8. The DCO circuits occupy the main part of the PFTCG area, largely because of the delay cells that make up the 25ns delay of each phase. The rest of the area is taken mainly by the control circuits, because the long delay line has multiple delay stages to control and requires many circuits to decode the control signals. The PFTCG is integrated in a test system for system verification, with a PFTCG area of $125\mu m \times 252\mu m$.
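The phase-slot percentages reported for Fig. 7(b) can be converted into absolute slot widths and compared with the ideal 25ns spacing. The sketch below does only that arithmetic; the function name `slot_report` is hypothetical, and the 200ns period is simply the period of the 5MHz clock. Note that the reported percentages sum to 99%, presumably because of rounding in the source figures.

```python
PERIOD_NS = 200.0                              # period of the 5MHz clock
SLOT_PERCENT = [10, 10, 10, 12, 14, 14, 14, 15]  # Phase 0..7, from the paper


def slot_report(slot_percent=SLOT_PERCENT, period_ns=PERIOD_NS):
    """Return (phase index, slot width in ns, deviation from ideal in ns)."""
    ideal_ns = period_ns / len(slot_percent)  # 25ns for 8 equal slots
    rows = []
    for n, pct in enumerate(slot_percent):
        width_ns = period_ns * pct / 100.0
        rows.append((n, width_ns, width_ns - ideal_ns))
    return rows


if __name__ == "__main__":
    for n, width_ns, dev_ns in slot_report():
        print(f"Phase {n}: {width_ns:.1f} ns ({dev_ns:+.1f} ns vs ideal)")
```

Under these numbers the slot widths range from 20ns to 30ns, i.e. within 5ns of the ideal spacing.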

### **V. CONCLUSION**

A phase-frequency tunable clock generator is designed in this paper. The design specification takes into account the system performance, the integrated chip area, and the system power consumption. With this PFTCG, the system maintains the same performance even with a worse sampling clock offset, and a smaller, highly integrated reference clock source can be used. The proposed PFTCG therefore enables robust, high-performance SoC design in wireless applications.

### **VI. ACKNOWLEDGEMENT**

This work has been supported by the MOEA project and the MTK university program. The authors would like to acknowledge CAD tool support from Cadence and thank their colleagues in the SI2 group of National Chiao Tung University for fruitful discussions.

### **REFERENCES**

1. F. M. Gardner, "Interpolation in Digital Modems Part I: Fundamentals," *IEEE Trans. Communications*, vol. 41, pp. 501-507, Mar. 1993.
2. F. M. Gardner, "Interpolation in Digital Modems Part II: Implementation and Performance," *IEEE Trans. Communications*, vol. 41, pp. 998-1008, Jun. 1993.
3. Hung-Kuo Wei, "A Frequency Estimation and Compensation Method for High Speed OFDM-based WLAN System," M.S. thesis, Dept. Electron. Eng., NCTU, Hsinchu, Taiwan, 2003.
4. J. Y. Yu, Ching-Che Chung, Hsuan-Yu Liu, Yu-Wei Lin, Wan-Chun Liao, Terng-Yin Hsu, and Chen-Yi Lee, "A 31.2mW UWB Baseband Transceiver with All-Digital I/Q mismatch Calibration and Dynamic Sampling," in *Proc. 2006 IEEE Symposium VLSI Circuits*, pp. 290-291, Jun. 2006.
5. Mei-Hui Yang, Jui-Yuan Yu, Juinn-Ting Chen, and Chen-Yi Lee, "A Dynamic Phase-Frequency Recovery for Power Reduction in OFDM Systems," in *Proc. 2007 IEEE VLSI-DAT*, pp. 107-110, Apr. 2007.
6. Jui-Yuan Yu, et al., "A sub-mW Multi-Tone CDMA Baseband Transceiver Chipset for Wireless Body Area Network Applications," *ISSCC Dig. Tech. Papers*, pp. 364-365, Feb. 2007.
7. Ching-Che Chung and Chen-Yi Lee, "An All-Digital Phase-Locked Loop for High-Speed Clock Generation," *IEEE J. Solid-State Circuits*, vol. 38, no. 2, Feb. 2003.
8. Pao-Lung Chen, Ching-Che Chung, Jyh-Neng Yang, and Chen-Yi Lee, "A clock generator with cascaded dynamic frequency counting loops for wide multiplication range applications," *IEEE J. Solid-State Circuits*, vol. 41, no. 6, Jun. 2006.
9. UMC, *90nm SP\_RVT Process 2.5V/3.3V Tolerant In-Line I/O Library Databook*.

Table 1: The proposed PFTCG hardware profile.

| Parameter | Value |
|---|---|
| Technology | 90nm SPHVT CMOS |
| Target Frequency | 5MHz |
| Supply Voltage | 0.8V |
| Lock-In Time | < 128 cycles |
| Power Consumption | 77.56μW |
| 1<sup>st</sup> tuning stage delay resolution | 37.16ns |
| 2<sup>nd</sup> tuning stage delay resolution | 1.16ns |
| 3<sup>rd</sup> tuning stage delay resolution | 18ps |
| Phase Number | 8 |
| RMS Jitter | 62ps |
| Core Area | 125µm × 252µm |









Fig. 7. The generated waveforms from the PFTCG: (a) operation overview; (b) multi-phase waveforms