# A New DLL-Based Approach for All-Digital Multiphase Clock Generation

Ching-Che Chung and Chen-Yi Lee

Abstract: A new DLL-based approach for all-digital multiphase clock generation is presented. By using a time-to-digital converter (TDC) with a fixed-step search scheme, the proposed all-digital, cell-based solution overcomes the false-lock problem of conventional designs. Furthermore, the proposed all-digital multiphase clock generator (ADMCG) can be ported to different processes in a short time, which reduces design time and design complexity in many applications. Test-chip measurements show a wide frequency range that meets the needs of many digital communication applications.

Index Terms: Delay-locked loops (DLLs), digitally controlled delay line (DCDL), multiphase clock generation, phase synchronization.

## I. INTRODUCTION

Multiphase clocks are useful in many applications. In high-speed serial link applications [5], [6], [11], multiphase clocks are used to process data streams at a bit rate higher than the internal clock frequency. In clock multiplier applications [1], [4], [10], multiphase clocks are combined to produce the desired output frequency for the synthesizer, and in microprocessors, multiphase clocks can ease the clock constraints in pre-charged logic to achieve higher operating speed [8]. In wireless LAN baseband design, multiphase clocks can be used to find a better sampling point for the analog-to-digital converter (ADC) and so improve overall system performance.

Both phase-locked loops (PLLs) [11] and delay-locked loops (DLLs) can be employed for multiphase clock generation. A DLL offers better jitter performance than a PLL because jitter induced by power-supply or substrate noise disappears at the end of the delay line. In contrast, the ring oscillator of a PLL accumulates jitter: any uncertainty in an earlier transition affects all the following transitions, and its effect persists indefinitely [3], [6], [7], [9]. Thus, DLLs are good alternatives to PLLs in multiphase clock generation applications.

However, conventional DLLs have two major drawbacks. One is their limited phase capture range [7], and the other is the restricted voltage-controlled delay line (VCDL) range needed to avoid false lock to harmonics [3], [4]. By increasing the VCDL delay range and changing the phase alignment algorithm, the phase capture range can be made infinite, but the false-lock problem still cannot be overcome. Thus, in [3] and [4], a self-correcting circuit is employed to prevent the DLL from locking to an incorrect delay and to bring the DLL back into a correct locked state. However, the self-correcting circuit in [3] is sensitive to the duty cycle of the reference clock, since it makes decisions based on sampled values of the multiphase clock signals.

The authors are with the Department of Electronics Engineering, National Chiao Tung University, Hsinchu 300, Taiwan, R.O.C. (e-mail: wildwolf@ si2lah org)

Digital Object Identifier 10.1109/JSSC.2003.822890

Fig. 1. Proposed ADMCG architecture.

The register-controlled digital DLL proposed in [13] provides an all-digital solution for DLL design. For multiphase clock generation applications, this DLL can overcome the false-lock problem by setting the delay line to its minimum delay at the beginning of phase acquisition. However, its long lock-in time makes it unsuitable for wide-range operation.

In this paper, a new DLL-based approach for multiphase clock generation is presented. The proposed all-digital multiphase clock generator (ADMCG) uses a time-to-digital converter (TDC) to choose a reasonable delay range rather than using a self-correcting circuit. Thus, its operation is very robust and avoids the possible false lock of conventional designs. Adding the TDC module also reduces the lock-in time of the proposed ADMCG. After the TDC operation, a fixed-step search scheme is used in the ADMCG to fine-tune the output phase accuracy. The proposed architecture is all-digital and can be realized with standard cells. Thus, it yields good testability, programmability, stability, and portability over different processes, and the design time for the multiphase clock generator can also be reduced.

A test chip for the proposed ADMCG has been verified on silicon using a standard 0.35-$\mu$m one-poly four-metal (1P4M) CMOS process with a 3.3-V power supply. In this test chip, the seven-phase ADMCG is applied to design a 7:1 data channel compression transceiver. The chip measurement results show that the proposed ADMCG has a wide frequency range of 20–85 MHz, and the transceiver can achieve a maximum data rate of 595 (= $85 \times 7$) Mb/s (at 85 MHz). The maximum peak-to-peak ($P_{\rm K}$-$P_{\rm K}$) output jitter of the ADMCG is <310 ps over the frequency range of the ADMCG with a noisy reference clock ($P_{\rm K}$-$P_{\rm K}$ jitter: 180 ps). Power dissipation is <75.1 mW for the transmitter and <85.5 mW for the receiver (at 20–85 MHz).

Manuscript received March 18, 2003; revised November 20, 2003. This work was supported by the National Science Council of Taiwan, R.O.C., under Grant NSC90-2215-E-009-105.

Fig. 2. Proposed ADMCG control algorithm.

This paper is organized as follows. Section II describes the proposed ADMCG. Section III shows the implementation of the proposed ADMCG using standard cells and the test chip design for a 7:1 data channel compression transceiver. Simulation and chip measurement results of the ADMCG test chip are shown in Section IV. Section V concludes this paper with a summary.

## II. PROPOSED ADMCG

The proposed ADMCG architecture for multiphase clock generation is shown in Fig. 1. The ADMCG consists of four major modules: a phase detector (PD), a TDC, a digitally controlled delay line (DCDL), and the ADMCG controller. The DCDL is divided into K equal delay stages, and all delay stages are controlled by the same control code. The TDC estimates the period of the reference clock and passes it to the ADMCG controller, which selects a suitable delay range for the DCDL. The PD detects the phase error between the reference clock and the delay line output ($P_{K-1}$). It generates UP and DOWN signals to indicate that the ADMCG controller should decrease or increase the delay time of the DCDL, respectively. When the phase error between the reference clock and $P_{K-1}$ is less than the dead zone of the PD, the LOCK signal is asserted and the multiphase clock signals $P_0$-$P_{K-1}$ are generated.

The delay range problem of conventional DLLs is discussed in [3], [4], and [7]. A DLL may lock to a multiple of the reference clock's period because only the phases of the delay line output and the reference clock are compared. Thus, when the delay line has a wide controllable range, the unpredictable initial delay of the delay line and the unknown relationship between the delay line output and the reference clock may result in locking to a multiple of the reference clock's period, in which case multiphase clock generation fails.
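The false-lock mechanism can be illustrated with a minimal sketch. It assumes an idealized detector that only compares edge positions (the function names and the 50-MHz reference are hypothetical, chosen for illustration): a delay of any whole number of reference periods produces zero observed phase error, yet only one period gives the intended tap spacing of $T_{\mathrm{REF}}/K$.

```python
def wrapped_phase_error_ps(t_dcdl_ps: int, t_ref_ps: int) -> int:
    """Phase error seen by an edge-only detector, wrapped to (-T/2, T/2]."""
    err = t_dcdl_ps % t_ref_ps
    if err > t_ref_ps // 2:
        err -= t_ref_ps
    return err


def phase_spacing_ps(t_dcdl_ps: int, k_stages: int) -> float:
    """Spacing between adjacent taps of a delay line with K equal stages."""
    return t_dcdl_ps / k_stages


T_REF_PS = 20_000  # hypothetical 50-MHz reference clock
K = 7              # seven-phase case used in the test chip

for n in (1, 2, 3):
    t_dcdl = n * T_REF_PS
    # The detector reports zero error for every n, but the tap spacing
    # is n * T_REF / K, which is only correct for n = 1.
    print(n, wrapped_phase_error_ps(t_dcdl, T_REF_PS), phase_spacing_ps(t_dcdl, K))
```

Because the detector output is identical for every multiple, some additional information about the reference period is needed to tell the cases apart.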

False lock is caused by a wrong operating delay range for the delay line together with a lack of information about the reference clock's period. The challenge in multiphase clock generator design is therefore to adjust the delay line's operating range dynamically to a suitable range.

Fig. 2 describes the proposed ADMCG control algorithm. As discussed in [3], [4], and [7], to avoid false lock, the DCDL should always operate within the delay range $0.5 \times T_{\mathrm{REF}} < T_{\mathrm{DCDL}} < 1.5 \times T_{\mathrm{REF}}$, where $T_{\mathrm{REF}}$ is the period of the reference clock and $T_{\mathrm{DCDL}}$ is the delay time of the delay line. In the proposed ADMCG architecture, the TDC shown in Fig. 3 converts the reference clock's period ($T_{\mathrm{REF}}$) into a multiple of the range delay unit (RDU) delay time. After the TDC encoder, the DCDL range selection control code (range [M-1:0]) is sent to the ADMCG controller, which makes the DCDL first operate in the delay range $0.5 \times T_{\mathrm{REF}} < T_{\mathrm{DCDL}} < T_{\mathrm{REF}}$. After the TDC operation, the ADMCG controller enters phase tracking mode and increases the delay time of the DCDL until the residual phase error between the reference clock and $P_{K-1}$ has disappeared, that is, until the PD's output changes from DOWN to UP (or LOCK is asserted). The ADMCG controller then enters phase maintaining mode, in which it decreases or increases the delay time of the DCDL according to the PD's UP or DOWN signal, respectively. To shorten the lock-in time, the phase search step in phase tracking mode is set to half of one coarse-tuning delay time; after the ADMCG controller enters phase maintaining mode, the phase search step is reduced to one fine-tuning step.

Fig. 3. Architecture of the time-to-digital converter (TDC).

Fig. 4. Architecture of the delay stage.
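The control flow described above can be sketched as a small behavioral model. This is an illustration under stated assumptions, not the authors' controller: the delay values are the test-chip figures quoted elsewhere in the paper, the intrinsic delay of the DCDL is ignored, and the mapping from the TDC result to a range code (`tdc_range_code`) is a hypothetical choice that happens to satisfy $0.5 \times T_{\mathrm{REF}} < T_{\mathrm{DCDL}} \le T_{\mathrm{REF}}$ for the 20–85 MHz range.

```python
K = 7               # delay stages (seven-phase case)
T_RDU_PS = 1600     # range delay unit per stage
T_CDU_PS = 160      # coarse-tuning delay unit per stage
T_FINE_STEP_PS = 3  # average fine-tuning step per stage
RANGE_PATHS = 4     # 4-to-1 range selector
DEAD_ZONE_PS = 50   # PD dead zone


def phase_detector(t_dcdl_ps: int, t_ref_ps: int) -> str:
    """UP means the delay is too long, DOWN means it is too short."""
    err = t_dcdl_ps - t_ref_ps
    if abs(err) < DEAD_ZONE_PS:
        return "LOCK"
    return "UP" if err > 0 else "DOWN"


def tdc_range_code(t_ref_ps: int) -> int:
    """Hypothetical range selection: whole RDUs per stage that fit in T_REF."""
    return min(RANGE_PATHS - 1, t_ref_ps // (K * T_RDU_PS))


def acquire(t_ref_ps: int, max_updates: int = 500) -> tuple[int, int]:
    """Return (locked DCDL delay in ps, number of controller updates)."""
    delay = tdc_range_code(t_ref_ps) * K * T_RDU_PS  # start below T_REF
    tracking = True
    for update in range(max_updates):
        out = phase_detector(delay, t_ref_ps)
        if out == "LOCK":
            return delay, update
        if tracking and out == "DOWN":
            delay += K * T_CDU_PS // 2               # half a coarse step
        else:
            tracking = False                         # phase maintaining mode
            step = K * T_FINE_STEP_PS
            delay += -step if out == "UP" else step
    raise RuntimeError("no lock within max_updates")


for t_ref in (11_765, 20_000, 31_250, 50_000):
    print(t_ref, acquire(t_ref))
```

Starting below $T_{\mathrm{REF}}$ and only ever increasing the delay during tracking is what keeps this model from reaching a multiple of the reference period.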

Since the proposed ADMCG does not depend on the relationship among the multiphase clock signals and needs no start-up control to avoid false lock, the design is very robust to process, voltage, and temperature (PVT) variations. Moreover, it is insensitive to the duty cycle of the reference clock, since only the rising edge of the reference clock is used.

The output phase accuracy of the generated multiphase clock signals is dependent on the phase resolution of the DCDL and the dead zone of the PD. The operating frequency range of the proposed ADMCG is limited by the minimal delay time of the DCDL and the controllable range of each delay stage.

The proposed DCDL consists of K equal delay stages, and the architecture of one delay stage is shown in Fig. 4. The delay time of one delay stage is controlled by three cascaded stages: a range selection stage, a coarse-tuning stage, and a fine-tuning stage. They are controlled by the range selection control code (range [M-1:0]), the coarse-tuning control code (coarse [N-1:0]), and the fine-tuning control code (fine [5:0]), respectively. The range selection and coarse-tuning stages are both implemented as path selectors. The difference between these two stages is that the RDU has a larger delay than the coarse-tuning delay unit (CDU). The (M, N) parameters adjust the operating range of each path selector by changing its number of selectable paths. To improve the phase resolution, the fine-tuning delay cell [12] is added after the coarse-tuning stage. The fine-tuning delay cell uses six control bits (EN1, A1, B1, EN2, A2, and B2) to alter the delay time finely.

The proposed TDC architecture is shown in Fig. 3. All RDUs are cleared to low after system reset, and during the first reference clock cycle, the TDC's input (PULSE\_IN) stays high. This high level propagates through the RDUs. When the falling edge of the PULSE\_IN signal arrives, marking the end of the pulse, the D flip-flops sample the current state of each RDU's output. After the TDC encoder, the reference clock's period ($T_{\rm REF}$) is thus converted into a multiple of the RDU delay time. The ADMCG controller uses this information to select a suitable range for the DCDL.

Fig. 5. Proposed 7:1 data channel compression transceiver. (a) Transmitter circuit. (b) Receiver circuit.
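The sampling step described above can be sketched as follows. This is a minimal behavioral model, assuming ideal RDUs with identical delay and a hypothetical chain length of 32; the flip-flop outputs form a thermometer code whose number of ones is the reference period expressed in whole RDU delays. How the encoder maps this count to range [M-1:0] is not specified in the text and is not modeled here.

```python
def tdc_sample(t_ref_ps: int, n_rdu: int, t_rdu_ps: int) -> list[int]:
    """Flip-flop outputs when PULSE_IN falls, one reference period after rising.

    Bit i is 1 if the high level has already propagated through RDU i.
    """
    return [1 if (i + 1) * t_rdu_ps <= t_ref_ps else 0 for i in range(n_rdu)]


def tdc_encode(thermometer: list[int]) -> int:
    """Number of RDUs traversed within one reference period."""
    return sum(thermometer)


T_RDU_PS = 1600  # RDU delay reported for the target process
N_RDU = 32       # hypothetical chain length, for illustration only

for t_ref_ps in (11_765, 50_000):  # 85-MHz and 20-MHz reference periods
    code = tdc_sample(t_ref_ps, N_RDU, T_RDU_PS)
    print(t_ref_ps, tdc_encode(code))
```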

The phase detector used in the ADMCG is the same as the one proposed in [12]. By using the digital amplifier [12] in the PD design, the dead zone of the PD can be reduced to 50 ps in the target process. The ADMCG controller is described in a hardware description language (HDL) and then synthesized with a logic synthesizer. All function blocks in the proposed ADMCG are cell-based designs. Thus, the proposed design can be easily ported to different processes with cell library support, which also reduces the design time and design complexity of the multiphase clock generator.

## III. TEST CHIP DESIGN

The ADMCG test chip is fabricated in a standard 0.35-$\mu$m 1P4M CMOS process. To reduce the area and power consumption of the DCDL, the RDU is implemented with delay cells provided in the cell library. In those delay cells, the MOS channel length is longer than in normal cells, so they have a much larger delay than normal cells. The delay time of one RDU ($T_{\rm RDU}$) is 1.6 ns in the target process. The delay time of the coarse-tuning delay cell ($T_{\rm CDU}$) is 0.16 ns. After adding the fine-tuning delay cell, the phase resolution of each delay stage is improved to 3 ps on average, and the total controllable range of the fine-tuning delay cell ($T_{\rm FINE}$) is 0.174 ns. To avoid a large phase jump when the path selection of the coarse-tuning stage changes, $T_{\rm FINE}$ must be larger than or equal to $T_{\rm CDU}$, and the total controllable range of the coarse-tuning stage must be larger than $T_{\rm RDU}$. Thus, a 16-to-1 path selector is used in the coarse-tuning stage (i.e., $(16-1) \times T_{\rm CDU} > T_{\rm RDU}$). With careful selection of the delay cells in the delay line design, the jitter caused by the path selector can be minimized and the possibility of changing the path selection can also be reduced.
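The two overlap conditions stated above can be checked numerically. The sketch below uses the delay values quoted in the text (in picoseconds); the helper names are hypothetical. It also computes the smallest coarse path count that satisfies $(P-1) \times T_{\rm CDU} > T_{\rm RDU}$, which shows that the 16-to-1 selector leaves some margin.

```python
T_RDU_PS = 1600    # range delay unit
T_CDU_PS = 160     # coarse-tuning delay unit
T_FINE_PS = 174    # total fine-tuning range
COARSE_PATHS = 16  # 16-to-1 path selector


def fine_covers_coarse_step(t_fine_ps: int, t_cdu_ps: int) -> bool:
    """Fine range must span one coarse step to avoid a phase jump."""
    return t_fine_ps >= t_cdu_ps


def coarse_covers_range_step(paths: int, t_cdu_ps: int, t_rdu_ps: int) -> bool:
    """Coarse range must exceed one RDU so adjacent ranges overlap."""
    return (paths - 1) * t_cdu_ps > t_rdu_ps


def min_coarse_paths(t_cdu_ps: int, t_rdu_ps: int) -> int:
    """Smallest P with (P - 1) * T_CDU > T_RDU."""
    return t_rdu_ps // t_cdu_ps + 2


print(fine_covers_coarse_step(T_FINE_PS, T_CDU_PS))
print(coarse_covers_range_step(COARSE_PATHS, T_CDU_PS, T_RDU_PS))
print(min_coarse_paths(T_CDU_PS, T_RDU_PS))
```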

In the test chip, the proposed ADMCG is applied to design a 7:1 data channel compression transceiver. The architecture of the transceiver is shown in Fig. 5. From the design specifications, the reference clock period ($T_{\rm REF}$) ranges from 50 ns (20 MHz) to 11.765 ns (85 MHz), and a seven-phase multiphase clock generator is needed in the transceiver design. Thus, a 4-to-1 path selector is used in the range selection stage to provide a maximal DCDL delay time of 50.4 ns (= $7 \times (4 - 1) \times T_{\rm RDU} + 7 \times (16 - 1) \times T_{\rm CDU}$), which is larger than the maximum $T_{\rm REF}$.
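The maximal delay above follows directly from the selector sizes. As a quick check, the sketch below reproduces the 50.4-ns figure in integer picoseconds and searches for the smallest range selector that covers the 50-ns period; like the formula in the text, it ignores the fine-tuning range and any intrinsic delay, and the function names are hypothetical.

```python
K = 7
T_RDU_PS = 1600
T_CDU_PS = 160
COARSE_PATHS = 16
T_REF_MAX_PS = 50_000  # 20-MHz reference


def max_dcdl_delay_ps(k: int, range_paths: int, coarse_paths: int) -> int:
    """K stages, each with (paths - 1) selectable RDUs and CDUs."""
    per_stage = (range_paths - 1) * T_RDU_PS + (coarse_paths - 1) * T_CDU_PS
    return k * per_stage


def min_range_paths(k: int, coarse_paths: int, t_ref_max_ps: int) -> int:
    """Smallest range selector whose maximal delay reaches t_ref_max_ps."""
    paths = 1
    while max_dcdl_delay_ps(k, paths, coarse_paths) < t_ref_max_ps:
        paths += 1
    return paths


print(max_dcdl_delay_ps(K, 4, COARSE_PATHS))
print(min_range_paths(K, COARSE_PATHS, T_REF_MAX_PS))
```

With three range paths the maximal delay would be 39.2 ns, which is short of the 50-ns period, so four paths is the smallest choice under these assumptions.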

The transmitter (TX) and the receiver (RX) are fabricated on the same test chip. The transmitter's outputs, TX\_DATA and TX\_CLK, are sent to the receiver's inputs, RX\_DATA and RX\_CLK, respectively. In the transmitter, the generated seven-phase clock signals are used to serialize 7-bit data (DATA[6:0]) into one data channel (TX\_DATA), and the transmitted data's reference clock (TX\_CLK) is also sent to the receiver. The "TX delay mirror" shown in Fig. 5(a) compensates for the delay time of the parallel-to-serial converter.

The receiver shown in Fig. 5(b) recovers the received data stream (RX\_DATA) back to the original 7-bit data (DATA\_OUT[6:0]). The two-phase ADMCG shown in Fig. 5(b) is used to estimate an accurate delay of $T_{\rm REF}/14$. It aligns to two adjacent phases of the seven-phase ADMCG's outputs (i.e., $P_6$ and $P_0$) to measure the $T_{\rm REF}/14$ delay, and the received data stream is first delayed by $T_{\rm REF}/14$ and then sampled by the seven-phase clock signals. Thus, the multiphase clock signals sample the received data stream at the center of each bit, which maximizes the timing margin of the receiver circuit.
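The effect of the $T_{\rm REF}/14$ delay can be verified with exact fractions. The sketch below is an idealized illustration: it assumes the phase edges coincide with the bit boundaries of the undelayed stream, works in units of $T_{\rm REF}$, and pairs each delayed bit with the next phase edge (this pairing is an assumption, not taken from the paper).

```python
from fractions import Fraction


def sampling_positions(k: int) -> list[Fraction]:
    """Position of each sampling edge inside its delayed bit, as a fraction of the bit width."""
    bit_width = Fraction(1, k)           # T_REF / K
    data_delay = Fraction(1, 2 * k)      # T_REF / (2K), i.e. T_REF / 14 for K = 7
    positions = []
    for i in range(k):
        bit_start = i * bit_width + data_delay  # delayed bit i begins here
        edge = (i + 1) * bit_width              # next phase edge
        positions.append((edge - bit_start) / bit_width)
    return positions


print(sampling_positions(7))
```

Every position evaluates to one half, meaning each edge lands at the center of a bit, which is the maximal-margin sampling point described above.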

Since RX\_CLK may not have a 50% duty cycle, the inverted multiphase clock signals cannot be applied directly to sample the received data stream. Thus, to make the receiver robust, the two-phase ADMCG is necessary in the proposed receiver circuit design.



Fig. 6. Transient response of the ADMCG (at 85 MHz).



Fig. 7. Post-layout simulation of the receiver (at 85 MHz).

## IV. EXPERIMENTAL RESULTS

Fig. 6 shows the post-layout simulation waveform of the proposed ADMCG. To make sure that the proposed design does not fail with a noisy reference clock, an 85-MHz noisy reference clock ($P_{\rm K}$-$P_{\rm K}$ jitter: $\pm 500$ ps) is used in this simulation. After system reset (i.e., PDWN = 1), the TDC measures the period of the reference clock and makes the DCDL operate in a suitable delay range (i.e., $0.5 \times T_{\rm REF} < T_{\rm DCDL} < T_{\rm REF}$). Then the ADMCG controller continues fine-tuning the output phase accuracy according to the PD's UP/DOWN signal. When the phase error between the delay line's output (PHASE[6]) and the reference clock (CLK\_IN) is minimized, the multiphase clock generation is complete.

The worst-case lock-in time of the proposed ADMCG, in reference clock cycles, is $T_{\text{UPDATE}} \times (T_{\text{TDC}} + (P_{\text{COARSE}} - 1) \times 2)$, where $T_{\text{UPDATE}}$ is the ADMCG controller update interval, $T_{\text{TDC}}$ is the TDC operation time, and $P_{\text{COARSE}}$ is the total number of paths in the coarse-tuning stage. To make sure that the previous update of the DCDL control code has taken effect on the delay line's output, the ADMCG controller cannot update the DCDL control code every cycle. Hence, $T_{\text{UPDATE}}$ is chosen as 4. The TDC needs only one clock cycle to estimate the reference clock's period. Therefore, the total lock-in time for the seven-phase ADMCG is at most 124 (= $4 \times (1 + (16 - 1) \times 2)$) reference clock cycles.
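The lock-in bound above is simple to evaluate. The sketch below codes the stated formula and, as an added convenience, converts the cycle count into time at the two ends of the frequency range; the function names are hypothetical. The factor of 2 reflects the tracking step of half a coarse-tuning delay.

```python
def worst_case_lock_cycles(t_update: int, t_tdc: int, p_coarse: int) -> int:
    """T_UPDATE * (T_TDC + (P_COARSE - 1) * 2), in reference clock cycles."""
    return t_update * (t_tdc + (p_coarse - 1) * 2)


def lock_time_us(cycles: int, f_ref_mhz: float) -> float:
    """Convert reference clock cycles to microseconds."""
    return cycles / f_ref_mhz


cycles = worst_case_lock_cycles(t_update=4, t_tdc=1, p_coarse=16)
print(cycles)
print(lock_time_us(cycles, 20.0), lock_time_us(cycles, 85.0))
```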

Fig. 7 shows the operation of the receiver. In the receiver, the seven-phase ADMCG generates seven-phase clock signals (PHASE[6:0]) from the data's reference clock (RCLK). After the ADMCG is locked, the two-phase ADMCG estimates the $T_{\rm REF}/14$ delay, and the received data stream (RA\_DATA) is then delayed by $T_{\rm REF}/14$, which is shown in Fig. 7 as INT\_RA\_DATA. As a result, the receiver can directly use the generated multiphase clock signals to sample the delayed received data stream (INT\_RA\_DATA) at the center of each bit and achieve the maximal timing margin in the receiver circuit.

Fig. 8. Measured multiphase clock signals (at 32 MHz). (a) PHASE[6] and PHASE[0]. (b) PHASE[0] and PHASE[1].

Fig. 9. Measured long-term jitter of the transmitted data (at 32 MHz).

Fig. 8 shows the measured multiphase clock signals with noisy digital circuitry ($\approx 600$ mVpp supply noise). The reference clock is a 32-MHz oscillator with rms jitter of 79 ps and $P_{\rm K}$-$P_{\rm K}$ jitter of 180 ps. Due to the limitations of the digital oscilloscope, only two channels can be displayed simultaneously. Therefore, PHASE[6] and PHASE[0] are shown in Fig. 8(a), and PHASE[0] and PHASE[1] are shown in Fig. 8(b). The long-term $P_{\rm K}$-$P_{\rm K}$ jitter histogram of the output multiphase clock signals and the measured delay time between two adjacent phases are also shown. Ideally, two adjacent phases should be 4.464 ns (= (1/32 MHz)/7) apart, and the measured results show that the maximum error is less than 0.36% (= $(4.48\ \text{ns} - 4.464\ \text{ns})/4.464\ \text{ns}$, where 4.48 ns is the measured PHASE[0]-to-PHASE[1] spacing). The long-term rms jitter and $P_{\rm K}$-$P_{\rm K}$ jitter of the ADMCG's output are 154 and 310 ps, respectively.
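The spacing figures above can be reproduced with a few lines of arithmetic. The sketch uses the 32-MHz reference and the 4.48-ns measured spacing quoted in the text; the function names are hypothetical.

```python
def ideal_spacing_ns(f_ref_mhz: float, k: int) -> float:
    """Ideal delay between adjacent phases: (1 / f_ref) / K, in ns."""
    return 1e3 / f_ref_mhz / k


def spacing_error_percent(measured_ns: float, ideal_ns: float) -> float:
    """Relative deviation of the measured spacing from the ideal one."""
    return 100.0 * (measured_ns - ideal_ns) / ideal_ns


ideal = ideal_spacing_ns(32.0, 7)
print(round(ideal, 3))
print(spacing_error_percent(4.48, ideal))
print(spacing_error_percent(4.48, 4.464))  # using the rounded ideal value from the text
```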

A repetitive data stream "10101010..." is applied to the transmitter, so the transmitted data (TX\_DATA) have a transition at every rising edge of the multiphase clock signals. The transmitted data therefore look like a clock signal whose frequency is 3.5 (= 7/2) times the reference clock frequency. This test pattern is used to measure the output data jitter and to check the stability of the ADMCG's output. Fig. 9 shows the measured long-term $P_{\rm K}$-$P_{\rm K}$ jitter histogram of the transmitted data.



Fig. 10. Microphotograph of the ADMCG test chip.

From the chip measurement, the transmitted data's rms jitter and  $P_{\rm K} - P_{\rm K}$  jitter are 254 and 670 ps, respectively.

Since the ADMCG needs to continue tracking the phase of the reference clock, the jitter of the reference clock will influence the measurement for the output jitter of the ADMCG and the transmitted data jitter.

The total gate counts of the transmitter and the receiver are 7343 and 9683, respectively, where the gate count of the seven-phase ADMCG is 7203. The power consumption of the transmitter is 17.3 mW at 20 MHz and 75.1 mW at 85 MHz. The power consumption of the receiver is 23.6 mW at 20 MHz and 85.5 mW at 85 MHz. Fig. 10 shows a microphotograph of the test chip. The core area of the test chip is 1380 $\mu$m × 1380 $\mu$m.

## V. CONCLUSIONS

In this paper, an all-digital cell-based multiphase clock generator architecture is presented. The proposed ADMCG can overcome the false-lock problem in conventional designs. In the test chip, the ADMCG is applied to design a 7:1 data channel compression transceiver. The test chip shows that the proposed ADMCG has a wide frequency range (20–85 MHz) and is very robust to PVT variations and reference clock jitter. The proposed ADMCG can reduce both design time and circuit complexity. Therefore, it is very suitable for many digital communication applications.

## ACKNOWLEDGMENT

The authors would like to thank their colleagues within the SI2 group of National Chiao Tung University for many fruitful discussions. The multiproject chip support from Chip Implementation Center is acknowledged as well.

## REFERENCES

- [1] D. Birru, "A novel delay-locked loop based CMOS clock multiplier," *IEEE Trans. Consumer Electron.*, vol. 44, pp. 1319–1322, Nov. 1998.

- [2] Y.-S. Song and J.-K. Kang, "A delay locked loop circuit with mixed-mode tuning," in *1st IEEE Asia Pacific Conf. ASICs*, Aug. 1999, pp. 347–350.
- [3] D. J. Foley and M. P. Flynn, "CMOS DLL based 2 V, 3.2 ps jitter, 1 GHz clock synthesizer and temperature compensated tunable oscillator," in *Proc. IEEE Custom Integrated Circuits Conf.*, May 2000, pp. 371–374.
- [4] D. J. Foley and M. P. Flynn, "A 3.3 V, 1.6 GHz, low-jitter, self-correcting DLL based clock synthesizer in 0.5 μm CMOS," in *Proc. IEEE Int. Symp. Circuits and Systems*, vol. 2, May 2000, pp. 249–252.
- [5] M.-J. E. Lee, W. J. Dally, J. W. Poulton, P. Chiang, and S. F. Greenwood, "An 84-mW 4-Gb/s clock and data recovery circuit for serial link applications," in *Symp. VLSI Circuits, Dig. Tech. Papers*, June 2001, pp. 149–152.
- [6] Y. Moon, D.-K. Jeong, and G. Ahn, "A 0.6–2.5-Gbaud CMOS tracked 3× oversampling transceiver with dead-zone phase detection for robust clock/data recovery," *IEEE J. Solid-State Circuits*, vol. 36, pp. 1974–1983, Dec. 2001.
- [7] Y. Moon, J. Choi, K. Lee, D.-K. Jeong, and M.-K. Kim, "An all-analog multiphase delay-locked loop using a replica delay line for wide-range operation and low-jitter performance," *IEEE J. Solid-State Circuits*, vol. 35, pp. 377–384, Mar. 2000.
- [8] K. Yamaguchi, M. Fukaishi, T. Sakamoto, N. Akiyama, and K. Nakamura, "A 2.5-GHz four-phase clock generator with scalable no-feedback-loop architecture," *IEEE J. Solid-State Circuits*, vol. 36, pp. 1666–1672, Nov. 2001.
- [9] A. Hajimiri, S. Limotyrakis, and T. H. Lee, "Jitter and phase noise in ring oscillators," *IEEE J. Solid-State Circuits*, vol. 34, pp. 790–804, June 1999.
- [10] L. J. Cheng and Q. Y. Lin, "The performances comparison between DLL and PLL based RF CMOS oscillators," in *Proc. 4th Int. Conf. ASIC*, Oct. 2001, pp. 827–830.
- [11] W.-H. Chen, G.-K. Dehng, J.-W. Chen, and S.-I. Liu, "A CMOS 400-Mb/s serial link for AS-memory systems using a PWM scheme," *IEEE J. Solid-State Circuits*, vol. 36, pp. 1498–1505, Oct. 2001.
- [12] C.-C. Chung and C.-Y. Lee, "An all-digital phase-locked loop for high-speed clock generation," *IEEE J. Solid-State Circuits*, vol. 38, pp. 347–351, Feb. 2003.
- [13] A. Hatakeyama, H. Mochizuki, T. Aikawa, M. Takita, Y. Ishii, H. Tsuboi, S. Fujioka, S. Yamaguchi, M. Koga, Y. Serizawa, K. Nishimura, K. Kawabata, Y. Okajima, M. Kawano, H. Kojima, K. Mizutani, T. Anezaki, M. Hasegawa, and M. Taguchi, "A 256-Mb SDRAM using a register-controlled digital DLL," *IEEE J. Solid-State Circuits*, vol. 32, pp. 1728–1734, Nov. 1997.



**Ching-Che Chung** received the B.S. degree from National Chiao Tung University, Hsinchu, Taiwan, R.O.C., in 1997. Since September 1998, he has been working toward the Ph.D. degree in the Si2 research group of the Department of Electronics Engineering, National Chiao Tung University.

His research interests include system-on-chip design methodologies, cell-based and fully custom VLSI design, high-speed interface circuit design, and wireless baseband processor design.

**Chen-Yi Lee** received the B.S. degree from National Chiao Tung University, Hsinchu, Taiwan, R.O.C., in 1982, and the M.S. and Ph.D. degrees from Katholieke University Leuven, Belgium, in 1986 and 1990, respectively, all in electrical engineering.

From 1986 to 1990, he was with IMEC/VSDM, working in the area of architecture synthesis for DSP. In February 1991, he joined the faculty of the Electronics Engineering Department, National Chiao Tung University, Hsinchu, where he is currently a Professor. His research interests mainly include VLSI algorithms and architectures for high-throughput DSP applications. He is also active in various aspects of high-speed networking, system-on-chip design technology, very low-bit-rate coding, and multimedia signal processing.