# A Fast-Lock-In ADPLL with High-Resolution and Low-Power DCO for SoC Applications

Duo Sheng, Ching-Che Chung, and Chen-Yi Lee Dept. of Electronics Engineering National Chiao Tung University Hsinchu, Taiwan, R.O.C. hysteria@si2lab.org

Abstract: In this paper, we propose a fast-lock-in all-digital phase-locked loop (ADPLL) that is designed with a cell library and described in a hardware description language (HDL). The proposed ADPLL uses a novel 2-level flash time-to-digital converter (TDC) to lock in within 2 reference clock cycles. The novel digitally controlled oscillator (DCO) achieves a resolution of 0.93ps, and its controllable range can be extended easily. In addition, the power consumption of the proposed DCO can be lowered to 110µW (@200MHz). The proposed ADPLL can be easily ported to different processes as a soft intellectual property (IP), making it very suitable for System-on-Chip (SoC) applications as well as system-level power management.

# I. INTRODUCTION

The phase-locked loop (PLL) is a very important clocking IP for many digital systems such as digital communication systems and microprocessors. It can be used for frequency synthesis, clock de-skew, and duty-cycle enhancement. Traditionally, the PLL is designed by analog approaches. However, in the SoC era with deep-submicron technology, integrating an analog block into a digital system takes more design effort. Furthermore, as the technology migrates, the analog blocks need to be re-designed. In contrast, the all-digital phase-locked loop (ADPLL) uses cell-based design approaches, so it can be easily integrated into a digital system. In addition, the ADPLL has higher immunity to switching noise and to process, voltage, and temperature (PVT) variations.

As chip density increases, the power management scheme becomes more and more important in SoC design. Many systems have a sleep mode to save power. When the system exits the sleep mode and enters the normal active mode, the clock generator should provide the system clock as soon as possible [1]. The proposed ADPLL uses a novel 2-level flash TDC to achieve a lock-in time within two reference clock cycles. This feature is very attractive for low-power system design with power management techniques that rely on fast entry and exit.

The digitally controlled oscillator (DCO) is the heart of the ADPLL. Just like the voltage-controlled oscillator (VCO) in a PLL, the DCO provides the ADPLL output clock signal. The most important design consideration of the ADPLL is how to design a high-resolution and low-power DCO. Since the major jitter source of the ADPLL is the DCO, a high-resolution DCO can reduce the jitter of the ADPLL significantly. The proposed DCO achieves 0.93ps resolution, and its controllable range can be extended easily, so the proposed ADPLL can achieve low-jitter operation. In addition to resolution and operation range, power consumption is becoming more important in SoC applications. Since the DCO accounts for over 50% of the power consumption of an ADPLL [2], reducing DCO power consumption is very important for low-power ADPLL design.

This paper presents a new ADPLL solution for SoC applications. The proposed ADPLL can be implemented with a cell library. All blocks of the ADPLL, including the DCO, can be described in HDL. Because of its portability, it can be used as a soft IP, and it is very suitable for SoC design since both design time and complexity can be reduced. Since the proposed DCO achieves both high resolution and low power, it can meet the demands of system-level integration.



Figure 1. The proposed ADPLL architecture (#: TDC\_code[5:0]; \*: DCO\_code[13:0])

# II. ARCHITECTURE OVERVIEW

Fig. 1 shows the proposed ADPLL architecture. It has several functional blocks: the TDC, the phase/frequency detector (PFD), the ADPLL controller, the DCO, and two frequency dividers (pre-divider and DCO divider). The signal DCO\_M is the DCO output divided by M by the DCO divider, and Ref\_N is the reference clock divided by N. Once the ADPLL is enabled, the TDC provides the coarse DCO control code (TDC\_code [5:0]) to the ADPLL controller after two reference clock cycles, and the DCO then generates the desired output frequency from this coarse control code. After the TDC operation is completed, the PFD generates a "lead" or "lag" signal depending on the phase and frequency difference between Ref\_N and DCO\_M. If DCO\_M leads Ref\_N, the PFD generates a "lead" signal to slow down the DCO. Conversely, when DCO\_M lags Ref\_N, the PFD generates a "lag" signal to speed up the DCO. When the ADPLL controller receives "lead" or "lag" from the PFD, it changes the DCO control code (DCO\_code [13:0]), which in turn controls the DCO to generate the output clock (DCO\_CLK). These blocks form a closed loop to achieve the "phase-locked" function.

# III. FAST-LOCK-IN MECHANISM

In SoC design, the power management scheme becomes more and more important. Many systems have a sleep mode to save power. To reduce power consumption in the sleep mode, the clock signal should be turned off. To reduce power consumption further, the clock generator should be shut down at the same time. When the system exits the sleep mode and enters the normal active mode, the clock generator should provide the system clock as soon as possible [1]. Here, we propose a novel 2-level flash TDC to obtain the desired DCO output frequency within two reference clock cycles. After the TDC operation is completed, the ADPLL enters the phase tracking mode.

## A. A Novel 2-level Flash TDC Architecture

The TDC converts timing information to digital codes. During TDC operation, the period of the reference clock divided by N (Ref\_N) is quantized as a multiple of the delay of the delay cell that forms the DCO coarse-tuning stage. After the ADPLL controller receives the TDC output code (TDC\_code [5:0]), the DCO control codes are determined to generate the desired DCO output clock. There are many approaches to implementing a TDC [2]-[4]. The counter-based TDC [2] uses a high-frequency clock or multi-phase clock to sample the timing interval. Utilizing a Vernier delay line can enhance TDC resolution [3]. Because every delay cell output in a single-level flash TDC is sent to its own D-flip-flop [4], it needs many D-flip-flops, which increases area and power considerably. In contrast to the single-level type, our proposed design takes only 12 D-flip-flops (8+4), reducing hardware complexity and power consumption.



Figure 2. The proposed 2-level flash TDC architecture

Fig. 2 shows the proposed 2-level flash TDC architecture. It has several functional blocks, namely one long delay chain, one short delay chain, the 1<sup>st</sup> level flash TDC, the 2<sup>nd</sup> level flash TDC, a path selection multiplexer, and a cycle time calculator. The long delay chain consists of 32 delay cells, which are partitioned into four sections (Secs. 0~3). In contrast, the short delay chain has only 8 delay cells. All delay cells used in the long and short delay chains are the same as those in the DCO coarse-tuning stage. When the TDC is enabled, Ref\_N is sent to the long delay chain, and all section outputs (DL [3:0]) are sent to the 1<sup>st</sup> level flash TDC. When the first falling edge of Ref\_N arrives, the 1<sup>st</sup> level flash TDC generates the section selection signal (L1\_SEL) to select one of the section outputs for the short delay chain. Then the 2<sup>nd</sup> level flash TDC generates the delay selection signal (L2\_SEL) based on the delay outputs (BL [7:0]). The section and delay outputs are thermometer codes, from which the selection signals can be generated easily. When both L1\_SEL and L2\_SEL have been generated, the cycle time calculator can estimate the period of Ref\_N. The conversion equation is given as

$$Tr = (L1\_SEL \times 8 + L2\_SEL) \times 2 \tag{1}$$

where Tr is the period of Ref\_N, expressed in units of one delay-cell delay. For example, as shown in Fig. 3, if the period equals 36 delay-cell delays, L1\_SEL and L2\_SEL are both 2, since (2 × 8 + 2) × 2 = 36. To reduce lock-in time, the TDC measures only half a period of Ref\_N, so the measured value is shifted left by one bit (the factor of 2 in (1)) to obtain one full Ref\_N cycle time.
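The conversion in (1) can be illustrated with a small behavioral model. The sketch below is not the authors' circuit: the function names are hypothetical, the half period is assumed to be given directly in delay-cell units, and the thermometer codes are assumed to be LSB first with DL[i] going high once the edge has passed the end of section i.

```python
def thermometer_to_count(bits):
    """Count the leading ones of a thermometer code given LSB first."""
    count = 0
    for bit in bits:
        if not bit:
            break
        count += 1
    return count


def sample_tdc(half_period_cells, sections=4, cells_per_section=8):
    """Behavioral model (assumption): return (L1_SEL, L2_SEL) for a half period in cell delays."""
    if not 0 <= half_period_cells < sections * cells_per_section:
        raise ValueError("half period is outside the modeled delay-chain range")
    # 1st level: DL[i] is high once the edge has passed the end of section i.
    dl = [half_period_cells >= (i + 1) * cells_per_section for i in range(sections)]
    l1_sel = thermometer_to_count(dl)
    # 2nd level: the short chain resolves the remainder inside the selected section.
    remainder = half_period_cells - l1_sel * cells_per_section
    bl = [remainder >= (j + 1) for j in range(cells_per_section)]
    l2_sel = thermometer_to_count(bl)
    return l1_sel, l2_sel


def cycle_time(l1_sel, l2_sel):
    """Equation (1): full Ref_N period in cell delays (shift left by one bit)."""
    return (l1_sel * 8 + l2_sel) << 1


l1, l2 = sample_tdc(18)  # half of a 36-cell period, as in the Fig. 3 example
print(l1, l2, cycle_time(l1, l2))  # prints: 2 2 36
```

Under these assumptions the model reproduces the example in the text: a half period of 18 cell delays gives L1\_SEL = 2 and L2\_SEL = 2, and a full period of 36.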



Figure 3. Simulation of the proposed 2-level flash TDC architecture

In the proposed ADPLL architecture, the frequency of Ref\_N should equal the frequency of the DCO divided by M (DCO\_M). The delay time of the DCO coarse-tuning stage therefore equals Tr divided by M. To reduce the hardware complexity of division, we propose a novel method to approximate the result of this division. The simplified operation has two steps. First, if the division ratio (M) is a power of two, the division is simply a shift-right operation. If not, we extract the exponent of the power of two given by the MSB of M (MS) and add one to it to obtain ML (= MS+1). Second, Tr is shifted right by MS and by ML, and the division result (TM) is the average of the two shifted values (TS and TL). Simulation results of the 2-level flash TDC are shown in Fig. 3. The TDC takes only two reference clock cycles to complete the lock-in operation. Because the TDC resolution equals the delay of one delay cell (165ps), the frequency error in the lock-in state is 3.3% (@200MHz), i.e., 165ps relative to the 5ns output period.
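The two-step division approximation described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' HDL: the function name `approx_divide` is hypothetical, Tr is assumed to be a non-negative integer in delay-cell units, and TS and TL are taken to be Tr shifted right by MS and ML respectively.

```python
def approx_divide(tr, m):
    """Approximate tr / m using only shifts and one addition (illustrative sketch)."""
    if tr < 0 or m < 1:
        raise ValueError("tr must be non-negative and m must be at least 1")
    ms = m.bit_length() - 1      # exponent of the MSB power of two in M
    if m == 1 << ms:             # M is a power of two: a single right shift
        return tr >> ms
    ml = ms + 1
    ts = tr >> ms                # Tr shifted right by MS
    tl = tr >> ml                # Tr shifted right by ML = MS + 1
    return (ts + tl) >> 1        # TM: average of the two shifted values


print(approx_divide(64, 8))   # power-of-two ratio, prints 8
print(approx_divide(64, 10))  # non-power-of-two ratio, prints 6
```

Before truncation, the average for a non-power-of-two M equals Tr × 3 / 2^(MS+2), so the effective divisor is 2^(MS+2)/3 (about 10.67 for M = 10). The result is therefore only a coarse estimate for such ratios.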

## B. Phase Tracking Mode

After the TDC operation is completed, the ADPLL takes one reference cycle to align phase and then enters the phase tracking mode. In this mode, the ADPLL tracks the phase of the reference clock according to the PFD outputs. With pulse amplifiers and signal expanders, the minimum detectable phase error of the PFD is 5ps [5]. When the PFD output changes from "lead" to "lag" (or vice versa), the search step is reduced to one half of the previous step. If the direction remains the same eight times in a row, the search step is doubled to accelerate phase tracking.
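The search-step rule of the phase tracking mode can be sketched as a small state machine. The class name, the initial step, and the step limits below are illustrative assumptions, as is the sign convention that "lead" increases the DCO code (longer delay, lower frequency). Whether the doubled step applies to the eighth update or the one after it is not specified in the text; this sketch applies it immediately.

```python
class SearchStep:
    """Illustrative model: halve the step on a direction change, double it after a run."""

    def __init__(self, step=64, min_step=1, max_step=1 << 13, run_length=8):
        self.step = step
        self.min_step = min_step
        self.max_step = max_step
        self.run_length = run_length
        self.last = None   # previous PFD decision
        self.run = 0       # consecutive decisions in the same direction

    def update(self, direction):
        """Return the signed DCO code change for one PFD decision ('lead' or 'lag')."""
        if direction not in ("lead", "lag"):
            raise ValueError("direction must be 'lead' or 'lag'")
        if self.last is not None and direction != self.last:
            # Direction reversed: the target was crossed, so halve the step.
            self.step = max(self.min_step, self.step // 2)
            self.run = 1
        else:
            self.run += 1
            if self.run == self.run_length:
                # Same direction eight times in a row: double the step.
                self.step = min(self.max_step, self.step * 2)
                self.run = 0
        self.last = direction
        return self.step if direction == "lead" else -self.step


tracker = SearchStep(step=16)
print([tracker.update("lead") for _ in range(8)])
print(tracker.update("lag"))
```

With an initial step of 16, eight consecutive "lead" decisions keep the step at 16 for seven updates and raise it to 32 on the eighth. A following "lag" halves it back to 16.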

# IV. DIGITALLY CONTROLLED OSCILLATOR

The most important design consideration of the ADPLL is how to achieve a high-resolution and low-power DCO. In [1] and [6], the DCO consists of many MOS transistors with binary-weighted widths to achieve high resolution. However, this approach needs full-custom design and takes longer design time whenever the process changes. Some proposals use tri-state buffers to build the DCO [7]-[8], where different delay paths are selected to meet the target timing. The proposed DCO achieves 0.93ps resolution, and its power consumption is only 110µW (@200MHz).



Figure 4. (a) Architecture of the proposed DCO (b) 1<sup>st</sup> and 2<sup>nd</sup> fine-tuning stage

## A. High-Resolution and Low-Power DCO Architecture

Fig. 4(a) shows the architecture of the proposed DCO, which consists of three stages: the coarse-tuning stage, the 1<sup>st</sup> fine-tuning stage, and the 2<sup>nd</sup> fine-tuning stage. First, in the coarse-tuning stage, there are 64 different paths, and only one path is selected by the 64-to-1 path selector MUX. To reduce the loading capacitance at the MUX output, we use multi-stage tri-state buffers to form this MUX [5], [7]. The supply voltage of the DCO can be lowered to reduce its power consumption. Since the cell delay increases at a lower supply voltage, fewer delay cells are needed to obtain the same delay time, so both area and power can be saved. To further reduce the power consumption of the DCO, the 64 coarse-tuning delay cells are divided into several delay groups. When the DCO operates at high frequency, the desired delay time is short, so some of the segmented delay groups can be disabled to reduce power consumption.

Second, to increase the frequency resolution of the DCO, the 1<sup>st</sup> and 2<sup>nd</sup> fine-tuning stages are added to the DCO design, as shown in Fig. 4(b). The 1<sup>st</sup> stage is composed of eight 1<sup>st</sup> fine-tuning delay cells, each of which contains one inverter and one tri-state inverter. These delay cells are controlled by the control signals F1ON[0] ~ F1ON[7]. When the tri-state inverter in a 1<sup>st</sup> fine-tuning delay cell is enabled, its output signal exhibits hysteresis, so the delay cell can produce a different delay time. Finally, to further increase the DCO resolution, the 2<sup>nd</sup> fine-tuning stage is added. It is composed of 32 tri-state inverters. The gate capacitance of each tri-state inverter is changed by the control signals F2ON[0] ~ F2ON[31] [5], [8]. As the gate capacitance of the tri-state inverters changes, the delay of the 2<sup>nd</sup> fine-tuning stage can be adjusted.
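To make the three-stage structure concrete, the sketch below splits a 14-bit DCO code into one field per stage and estimates the tunable part of the delay. The bit allocation (6 coarse bits, 3 bits for the 1<sup>st</sup> fine stage, 5 bits for the 2<sup>nd</sup> fine stage) is an assumption chosen only because it sums to the 14 bits of DCO\_code[13:0]; the paper does not state the field layout. The step sizes are the typical-case values from Table I, the model is assumed linear, and the fixed intrinsic delay of the oscillator loop is not modeled.

```python
from typing import NamedTuple


class DcoFields(NamedTuple):
    coarse: int  # selects one of 64 coarse paths
    fine1: int   # setting of the 1st fine-tuning stage
    fine2: int   # setting of the 2nd fine-tuning stage


# Typical-case step sizes in picoseconds, taken from Table I.
COARSE_STEP_PS = 165.13
FINE1_STEP_PS = 24.93
FINE2_STEP_PS = 0.93


def split_dco_code(code):
    """Split a 14-bit code into stage fields (hypothetical 6/3/5 bit layout)."""
    if not 0 <= code < (1 << 14):
        raise ValueError("DCO code must fit in 14 bits")
    return DcoFields(coarse=(code >> 8) & 0x3F,
                     fine1=(code >> 5) & 0x07,
                     fine2=code & 0x1F)


def tunable_delay_ps(code):
    """Estimate the code-dependent delay, assuming each stage is linear."""
    fields = split_dco_code(code)
    return (fields.coarse * COARSE_STEP_PS
            + fields.fine1 * FINE1_STEP_PS
            + fields.fine2 * FINE2_STEP_PS)


example = (10 << 8) | (3 << 5) | 4
print(split_dco_code(example), round(tunable_delay_ps(example), 2))
```

Because each fine stage's range is larger than the step of the stage before it, a flat code like this is not monotonic across field boundaries, so a controller would presumably treat the fields separately.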

TABLE I. DCO SIMULATION RESULTS

|                  | Best Case (ps) |        | Typical Case (ps) |        | Worst Case (ps) |        |
|------------------|----------------|--------|-------------------|--------|-----------------|--------|
|                  | Step           | Range  | Step              | Range  | Step            | Range  |
| Coarse-tuning    | 98.75          |        | 165.13            |        | 334.26          |        |
| 1st. Fine-tuning | 14.51          | 101.54 | 24.93             | 174.51 | 53.22           | 372.52 |
| 2nd. Fine-tuning | 0.64           | 19.69  | 0.93              | 28.86  | 1.72            | 53.29  |



Figure 5. Linearity of DCO

TABLE II. COMPARISON WITH EXISTING DCOS

| Performance Parameter | Proposed        | [10]          | [2]         | [8]            | [6]                       | [9]         |
|-----------------------|-----------------|---------------|-------------|----------------|---------------------------|-------------|
| Process               | 0.13µm CMOS     | 0.13µm CMOS   | 0.35µm CMOS | 0.35µm CMOS    | 0.18µm CMOS               | 0.18µm CMOS |
| Operation range (MHz) | 98 ~ 599        | 150           | 152 ~ 366   | 18 ~ 214       | 413 ~ 485                 | 140 ~ 1030  |
| LSB resolution (ps)   | 0.93            | 40            | 10 ~ 150    | 1.55           | 2                         | 22          |
| Power consumption     | 110µW (@200MHz) | 1mW (@150MHz) | NA          | 18mW (@200MHz) | 170 ~ 340µW (static only) | NA          |
| Portability           | Yes             | No            | Yes         | Yes            | No                        | Yes         |

## B. The Proposed DCO Performance

Table I shows the DCO HSPICE simulation results in the typical case (TT, 0.8V, 25°C), the best case (FF, 0.88V, -40°C), and the worst case (SS, 0.72V, 125°C). Note that the controllable range of each stage should cover the step of the previous stage. As a result, the DCO does not have any unreachable zone. The controllable range of the DCO can be extended easily by changing the number of coarse delay cells. The finest step of the 2<sup>nd</sup> fine-tuning stage determines the DCO resolution, so the proposed DCO achieves a resolution of 0.93ps. In addition, the proposed DCO has good linearity, as shown in Fig. 5. Because the DCO operates under a lower supply voltage and uses segmented delay groups, it consumes only 110µW (@200MHz, with the fine-tuning stages providing the maximum delay). Table II compares the proposed DCO with state-of-the-art DCOs. The proposed DCO has the finest resolution, the lowest power consumption, and the best portability.
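The coverage condition stated above (the range of each stage covers the step of the previous stage) can be checked directly against the values in Table I. The following sketch uses only the table data; the dictionary layout and the function name are illustrative.

```python
# Step and range values in picoseconds, copied from Table I.
TABLE_I_PS = {
    "best": {"coarse_step": 98.75, "fine1_step": 14.51, "fine1_range": 101.54,
             "fine2_step": 0.64, "fine2_range": 19.69},
    "typical": {"coarse_step": 165.13, "fine1_step": 24.93, "fine1_range": 174.51,
                "fine2_step": 0.93, "fine2_range": 28.86},
    "worst": {"coarse_step": 334.26, "fine1_step": 53.22, "fine1_range": 372.52,
              "fine2_step": 1.72, "fine2_range": 53.29},
}


def covers_previous_stage(corner):
    """True when each fine stage's range is at least the step of the stage before it."""
    return (corner["fine1_range"] >= corner["coarse_step"]
            and corner["fine2_range"] >= corner["fine1_step"])


for name, corner in TABLE_I_PS.items():
    print(name, covers_previous_stage(corner))
```

All three corners satisfy the condition, although the worst-case margin for the 2<sup>nd</sup> fine-tuning stage is small (53.29ps of range against a 53.22ps step).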

# V. SIMULATION RESULTS

The proposed ADPLL is designed with a cell-based design flow. We use a hardware description language (HDL) to design and describe the proposed ADPLL. Fig. 6 shows the transient response of the ADPLL, where the reference clock is 20MHz and the division ratio (M) is 10. Thus the output frequency is 200MHz (= 20MHz × 10). The TDC takes 2 reference clock cycles to complete the lock-in operation and one cycle to align phase. After the TDC operation is completed, the DCO control code is adjusted according to the PFD output to generate the desired DCO output frequency. In Fig. 6, we can see that the DCO control code converges to a stable value, completing the lock function.



Figure 6. Transient response of the proposed ADPLL

# VI. CONCLUSIONS

In this paper, a fast-lock-in ADPLL has been proposed. With the proposed 2-level flash TDC, the lock-in operation takes 2 reference clock cycles. In addition, the high-resolution and low-power DCO has three tuning stages and segmented delay groups, resulting in 0.93ps resolution and a power consumption of 110µW (@200MHz). Since all blocks of the proposed ADPLL are described in HDL, it can be ported to different processes, making our proposal very suitable for system-level and SoC applications.

# ACKNOWLEDGMENT

The authors would like to thank their colleagues within the SI2 group of National Chiao Tung University for many fruitful discussions in design and implementation.

# REFERENCES

- [1] J. Dunning, G. Garcia, J. Lundberg, and E. Nuckolls, "An all-digital phase-locked loop with 50-cycle lock time suitable for high-performance microprocessors," IEEE J. Solid-State Circuits, vol. 30, pp. 412–422, Apr. 1995.
- [2] T. Olsson and P. Nilsson, "A digitally controlled PLL for SoC applications," IEEE J. Solid-State Circuits, vol. 39, no. 5, pp. 751–760, May 2004.
- [3] P. Dudek, S. Szczepański, and J. V. Hatfield, "A high-resolution CMOS time-to-digital converter utilizing a Vernier delay line," IEEE J. Solid-State Circuits, vol. 35, no. 2, pp. 240–247, Feb. 2000.
- [4] P. M. Levine and G. W. Roberts, "High-resolution flash time-to-digital conversion and calibration for system-on-chip testing," IEE Proc.-Comput. Digit. Tech., vol. 152, no. 3, pp. 415–426, May 2005.
- [5] D. Sheng, C.-C. Chung, and C.-Y. Lee, "An all-digital phase-locked loop with high-resolution for SoC applications," IEEE VLSI-DAT International Symposium, Apr. 2006, in press.
- [6] M. Maymandi-Nejad and M. Sachdev, "A monotonic digitally controlled delay element," IEEE J. Solid-State Circuits, vol. 40, no. 11, pp. 2212–2219, Nov. 2005.
- [7] C.-C. Chung and C.-Y. Lee, "An all digital phase-locked loop for high-speed clock generation," IEEE J. Solid-State Circuits, vol. 38, no. 2, pp. 347–351, Feb. 2003.
- [8] P.-L. Chen, C.-C. Chung, and C.-Y. Lee, "A novel digitally-controlled oscillator using novel varactors," IEEE Trans. Circuits Syst. II, Express Briefs, vol. 52, no. 5, pp. 233–237, May 2005.
- [9] C.-T. Wu, W. Wang, I-C. Wey, and A.-Y. Wu, "A scalable DCO design for portable ADPLL designs," IEEE International Symposium on Circuits and Systems, pp. 5449–5452, May 2005.
- [10] P. Raha, S. Randall, R. Jennings, B. Helmick, A. Amerasekera, and B. Haroun, "A robust digital delay line architecture in a 0.13-µm CMOS technology node for reduced design and process sensitivities," *ISQED*'02, pp. 148–153, Mar. 2002.