# AN ALL-DIGITAL PLL WITH CASCADED DYNAMIC PHASE AVERAGE LOOP FOR WIDE MULTIPLICATION RANGE APPLICATIONS\*

Pao-Lung Chen<sup>1,2</sup>, Ching-Che Chung<sup>1</sup>, Chen-Yi Lee<sup>1</sup>

<sup>1</sup>Dept. of Electronics Engineering, National Chiao Tung University, 1001, University Road, Hsinchu, 300, Taiwan, ROC
<sup>2</sup>Dept. of Electronics Engineering, Chin-Min Institute of Technology, 110, Shei-Fu Road, Toufen, Miaoli 351, Taiwan, ROC

Email:{pchen, cylee}@royals.ee.nctu.edu.tw

## ABSTRACT

An all-digital phase-locked loop (ADPLL) with a cascaded dynamic phase average (DPA) loop for wide multiplication range applications is presented in this paper. The multiplication factor can range from 4 to 65025 (255 × 255). The proposed architecture requires minimal hardware and improves jitter performance by reducing the noise and jitter associated with the input reference. The DPA loop control, which employs digital phase estimators (DPEs), enhances frequency detection resolution and loop stability. A (Q.R) vector counter and an additional state counter serve as the phase estimators. The proposed ADPLL includes cascaded DPA loops: the first stage is a low-frequency loop and the second stage is a high-frequency loop. A prototype chip has been implemented in a 0.18-µm 1P6M CMOS process; its output can operate from 2 MHz to 500 MHz, and the input frequency ranges from 5 kHz to 50 MHz. The approach thus not only reduces the cost and design complexity of the ADPLL, but also offers particular advantages for wide multiplication range applications.

## **1. INTRODUCTION**

Numerous applications, such as video graphics cards and telecommunication systems, require a frequency synthesizer with a high multiplication ratio. Quartz oscillators operating at low frequency frequently require such frequency conversion. Several methods exist for realizing frequency multiplication: the analog phase-locked loop (PLL) [1-2] and the all-digital phase-locked loop (ADPLL) [3-4]. The major components of a PLL circuit, as illustrated in Fig. 1, are a phase-frequency detector (PFD), a charge pump, a loop filter (generally a 2<sup>nd</sup>-order RC filter), a programmable divider, and a voltage-controlled oscillator (VCO). While the PLL approach offers flexible frequency multiplication, it requires a complex sampled feedforward filter network and a multistage inverse-linear programmable current mirror to keep the loop dynamics constant and independent of the multiplication factor over the 1-to-4096 range, as indicated in [1]. Moreover, the locking time of an analog PLL using a PFD and a charge pump is limited by the large time constant of the analog filter. In contrast, an ADPLL can provide fast locking owing to a binary search algorithm, as indicated in [3-4]. However, the specific transistor sizing of the digitally-controlled oscillator (DCO) in [3] must be redone whenever the design specifications change.

Thus, the effort required at the physical design level remains an unsolved problem. A complete clock generator built from standard cells only, delivered as a portable IP block as in [5-7], can partially solve the problem. A portable clock multiplier generator using digital CMOS standard cells and based on a delay-locked loop is presented in [5]. However, its multiplication factor is limited to between 4 and 20. Additionally, three large register files are required for storing the history of the previous 256 cycles. To generate a low-jitter clock output, two identical DCOs are utilized in [6-7], as shown in Fig. 2, which leads to high power consumption and large silicon area. Thus, this work proposes an all-digital phase-locked loop with a simple structure and a novel frequency control algorithm.



Fig.1. Conventional block diagram of an analog PLL.

Consider a typical all-digital phase-locked loop, which can be divided into five main parts: a PFD, a loop controller, a loop filter, a DCO, and a programmable divider (see Fig. 2). The function of the programmable divider is simply to divide down the DCO output frequency for comparison with the reference. When the multiplication factor is high, the programmable divider becomes very long and induces noise on the DCO output. The loop controller generates the digital commands that steer the DCO output clock based on the results from the PFD. Two extra digital pulse amplifier circuits are required to minimize the dead zone of the PFD, as indicated in [7]. An averaging loop filter is necessary to filter out ripple and produce a smoother digital control word with fewer jumps. These requirements lead to a highly complex and expensive design. A simple way to solve the problem is to base the loop on power-of-2 integer operations; this not only simplifies the phase calculation but also significantly reduces circuit complexity.

This work proposes a cascaded DPA loop control algorithm for frequency search that reduces hardware cost and enhances frequency detection resolution. The proposed method can reduce the noise and jitter associated with the input reference through dynamic phase averaging. Rather than using a PFD, a programmable divider, and a loop filter as in conventional approaches, it applies a DPA loop controller and two digital phase estimators.

<sup>\*</sup>This work was supported by the National Science Council of Taiwan, R.O.C., under Grant NSC93-2220-E-009-033.



Figure 3 shows the proposed approach, which operates in the phase domain with a newly proposed DPA loop controller [8-9]. The result of the digital arithmetic comparator is used to speed up or slow down the DCO output clock. No separate loop filter is needed in the proposed structure because the DPA loop controller provides similar functionality. The proposed DPE and DPA algorithms for the all-digital phase-locked loop have been verified in a 0.18-µm CMOS process with a frequency range of 2 MHz to 500 MHz at 1.8 V, which demonstrates the effectiveness of the proposed mechanism.



Fig.3. Architecture of proposed ADPLL with cascaded DPA.

The remainder of this paper is organized as follows. Section 2 describes the proposed architecture of the cascaded DPA loop controller, together with the proposed DPE and cascaded DPA algorithms. Implementation and chip simulation results of the all-digital phase-locked loop are presented in Section 3. Finally, Section 4 offers a summary and conclusions.

#### 2. ARCHITECTURE AND ALGORITHM

#### 2.1. Architecture of Proposed Cascaded DPA Loop

Figure 3 shows the proposed block diagram of the DPA loop control. It consists of four main functional units: a state counter, a (Q.R) vector counter, a DPA loop controller, and a DCO. This approach does not require a programmable divider because the (Q.R) vector counter acts not only as a phase estimator of the DCO output frequency but also as a programmable divider. Similarly, the state counter acts as a phase estimator of the input reference clock. The DPA loop control algorithm performs adaptive bandwidth control based on the average phase error, as illustrated in Fig. 4. During frequency acquisition, adaptive loop gain control with binary search is applied to achieve fast locking. The state counter is clocked by the input reference clock and counts up, initially from zero, at every rising edge of the reference. Similarly, the (Q.R) vector counter is clocked by the DCO output clock. The value of the (Q.R) vector counter is compared with the input multiplication control word whenever the value of the state counter is a power-of-2 integer, so the phase sampling period is a power-of-2 number of input reference clocks. If the phase comparison shows no error, both counters continue phase accumulation. Otherwise, the phase error signal is used to change the current DCO control word (DCW) so that the DCO output frequency is adjusted. Meanwhile, both the state counter and the (Q.R) vector counter are reset (i.e. zero phase). Therefore, the zero-phase point and the averaging window of both counters move according to the phase sampling period and the result of the phase comparison.



Fig.4. The proposed DPA loop control algorithm.
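The loop behaviour described above can be illustrated with a small behavioural sketch in Python. It is not the authors' implementation: the linear DCO model (`dco_period_ps`), the fixed tolerance of one count (`tol`), and the rule of halving the search step after every DCW update are assumptions made only for illustration, and all function and parameter names are hypothetical.

```python
def dco_period_ps(dcw):
    """Hypothetical linear DCO model: 1000 ps base delay plus 1 ps per DCW step."""
    return 1000 + dcw


def dpa_lock(t_ref_ps, n, dcw_bits=16, max_state=6, tol=1, max_updates=200):
    """Return (dcw, updates) once every state up to max_state passes the phase check."""
    dcw_max = (1 << dcw_bits) - 1
    dcw = 1 << (dcw_bits - 1)      # start in the middle of the DCW range
    step = 1 << (dcw_bits - 2)     # binary-search step, halved after each update
    state = 1                      # first comparison after 2^1 reference clocks
    updates = 0
    while updates < max_updates:
        # Counters run from zero phase, so the (Q.R) count after 2^state
        # reference clocks is the number of whole DCO cycles in that window.
        window_ps = (1 << state) * t_ref_ps
        count = window_ps // dco_period_ps(dcw)
        error = count - (n << state)   # deviation from N * 2^state
        if abs(error) <= tol:
            if state == max_state:
                return dcw, updates
            state += 1                 # keep accumulating, double the window
            continue
        # Too few DCO cycles means the DCO is slow: shorten its period.
        dcw += -step if error < 0 else step
        dcw = max(0, min(dcw_max, dcw))
        step = max(1, step // 2)
        state = 1                      # counters are reset to zero phase
        updates += 1
    raise RuntimeError("no lock within max_updates")


if __name__ == "__main__":
    dcw, updates = dpa_lock(t_ref_ps=250_000, n=80)
    print(dcw, dco_period_ps(dcw), updates)
```

Because the tolerance is a fixed number of counts while the window doubles from state to state, the allowed relative frequency error shrinks at each state, which is one way to read the resolution gain attributed to phase averaging.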

#### 2.2. Structure of proposed DPA loop controller

Figure 5 illustrates the structure of the DPA loop controller; the structure of the (Q.R) vector counter is discussed in Section 2.3. The decision unit performs the digital arithmetic comparisons and generates the control signals for updating the DCW. It compares the (Q.R) vector counter value with the multiplication control word at phase sampling periods that are power-of-2 multiples of the input reference clock. The decision unit also controls the frequency acquisition process and the fine-tuning process.



Fig.5. Structure of proposed DPA loop controller.

Figure 6 illustrates the state transition diagram of the DPA loop controller. State 0 is the initial state, in which both the state counter and the (Q.R) vector counter are reset. After a phase sampling period of $2^1 \times T_{REF\_clk}$, state 0 switches to state 1 for phase comparison. If the phase error is below the threshold region (namely the R vector), the DCO control word is left unchanged and state 1 switches to state 2 once the sampling period reaches $2^2 \times T_{REF\_clk}$. Otherwise, the controller changes the DCW according to the phase comparison result and switches back to state 0 to reset the phase. The other states function in the same way as state 1.



Fig.6. State transition diagram of the proposed DPA loop controller.

## 2.3. Structure of (Q.R) Vector Counter

The length of the proposed (Q.R) vector counter is related to the multiplication control word (namely N1, N2) and to the maximum value of the state counter in each separate loop. If the maximum input multiplication control word is P, the number of bits L needed to hold it is

$$L = \left\lceil \log_2 P \right\rceil \tag{1}$$

where $\lceil X \rceil$ represents the least integer greater than or equal to X. If the length of the state counter is M, the length of the (Q.R) vector counter equals L+M. Figure 7 shows the block diagram of the (Q.R) vector counter with bits (A<sub>L+M-1</sub> A<sub>L+M-2</sub> ... A<sub>2</sub> A<sub>1</sub> A<sub>0</sub>). The values of the Q and R vectors in the *i-th* state are

$$Q_{i} = (A_{L+i-1}, \dots, A_{i+1}, A_{i}) \tag{2}$$

and

$$R_{i} = (A_{i-1}, \dots, A_{1}, A_{0}) \tag{3}$$

where  $A_i$  denotes the *i-th* bit of (Q.R) vector counter. The phase cycle time of the DPA loop control is defined as

$$C_i = 2^i \times T_{REF\_clk} \tag{4}$$

where *i* represents the *i*-th state and $T_{REF\_clk}$ is the cycle time of the input reference clock. To simplify the architecture and avoid complex division, the measured phase period is a power-of-2 multiple of the input reference clock period. The measured phase values of Q<sub>i</sub> and R<sub>i</sub> can then be calculated as follows:

$$[\mathbf{Q}_i \cdot 2^i + \mathbf{R}_i] = \left[\frac{C_i}{T_{DCO\_clk}}\right] \tag{5}$$

where  $T_{DCO\_clk}$  denotes the cycle time of DCO generated frequency.
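As a rough illustration of Eqs. (1) to (5), the following Python sketch computes the counter length and splits a raw counter value into its Q and R parts. It assumes that R<sub>i</sub> is the low *i* bits of the counter, which is what the form Q<sub>i</sub>·2<sup>i</sup> + R<sub>i</sub> in Eq. (5) implies, and that the bracket in Eq. (5) means truncation to whole DCO cycles. The function names and the state counter length M = 6 used in the example are hypothetical.

```python
import math


def qr_counter_length(p_max, m):
    """Eq. (1): L = ceil(log2(P)); the full (Q.R) counter is L + M bits long."""
    l_bits = math.ceil(math.log2(p_max))
    return l_bits, l_bits + m


def measured_count(i, t_ref_ps, t_dco_ps):
    """Eqs. (4)-(5): whole DCO cycles counted during C_i = 2^i * T_REF_clk."""
    return ((1 << i) * t_ref_ps) // t_dco_ps


def split_qr(count, i, l_bits):
    """Split a raw counter value into (Q_i, R_i) for state i."""
    q = (count >> i) & ((1 << l_bits) - 1)   # bits A_{L+i-1} .. A_i
    r = count & ((1 << i) - 1)               # low i bits (assumed R_i)
    return q, r


if __name__ == "__main__":
    l_bits, total = qr_counter_length(p_max=255, m=6)
    count = measured_count(i=6, t_ref_ps=250_000, t_dco_ps=3124)
    print(l_bits, total, count, split_qr(count, 6, l_bits))
```

In this example the Q part recovers the multiplication factor of 80, while the R part carries the residual phase that the controller compares against its threshold.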

![](_page_2_Figure_11.jpeg)

#### 2.4. Structure of Digitally-Controlled Oscillator (DCO)

A high-resolution DCO is the key component of a low-jitter frequency multiplier. To address this need, a novel digitally-controlled varactor (DCV) using NAND gates was proposed in [10] for portable delay cell design. It uses the difference in gate capacitance of NAND gates under different digital control inputs to form a digitally-controlled varactor.

![](_page_2_Figure_14.jpeg)

Fig.8. Structure of the digitally-controlled oscillator (DCO).

Figure 8 illustrates the basic structure of the proposed cell-based DCO2 with 16 bits of binary-weighted control ($0000_{16} \sim \text{FFFF}_{16}$). The DCO is implemented with a standard 0.18-µm 1P6M CMOS cell library. It is separated into two stages: a coarse-tuning stage and a fine-tuning stage. The upper 7 bits of the control code drive the coarse-tuning stage, while the lower 9 bits drive the fine-tuning stage. The coarse-tuning stage includes 128 buffer stages for delay-chain selection, and the number of delay cells in the path is chosen through a 128-to-1 path selector. This selector is implemented using multistage tri-state buffers to reduce the loading effects of the coarse-tuning buffers. The coarse decoder of the DCO decodes the 7 ($=\log_2(128)$) control-code bits into 128 control signals. This architecture enables the operating frequency of the DCO to be easily modified to meet different specifications. The $T_{PHL} + T_{PLH}$ (= $T_{buffer}$) of one coarse delay cell is around 135 ps in the 0.18-µm 1P6M CMOS standard cell library.

To improve the frequency resolution of the DCO, 512 digitally-controlled varactors (DCVs), each with capacitance difference $\Delta C$, are added after the coarse-tuning stage. Different types of NAND gates are used, which together are equivalent to 512 DCVs with capacitance difference $\Delta C$ in the fine-tuning stage. Therefore, the proposed NAND-gate varactors in the fine-tuning stage can improve the delay resolution by a factor of 512 compared with a simple buffer design.
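The split of the 16-bit control word into coarse and fine fields can be sketched in Python as follows. The 7/9-bit split and the 135 ps coarse step come from the text; the fine step of 135/512 ps is only an assumption derived from the stated 512-fold resolution improvement, fixed delay offsets are ignored, and all names are hypothetical.

```python
COARSE_BITS = 7        # upper bits of the 16-bit control word (from the text)
FINE_BITS = 9          # lower bits of the 16-bit control word (from the text)
T_BUFFER_PS = 135.0    # delay of one coarse cell, as quoted in the text
# Assumption: one fine step is 1/512 of a coarse step.
T_FINE_PS = T_BUFFER_PS / (1 << FINE_BITS)


def decode_dcw(dcw):
    """Split a 16-bit DCO control word into (coarse, fine) fields."""
    if not 0 <= dcw <= 0xFFFF:
        raise ValueError("DCW must fit in 16 bits")
    coarse = dcw >> FINE_BITS
    fine = dcw & ((1 << FINE_BITS) - 1)
    return coarse, fine


def coarse_one_hot(coarse):
    """Model the coarse decoder: 7-bit code to one of 128 select lines."""
    if not 0 <= coarse < (1 << COARSE_BITS):
        raise ValueError("coarse code must fit in 7 bits")
    return 1 << coarse


def tunable_delay_ps(dcw):
    """Hypothetical tunable part of the delay, ignoring fixed offsets."""
    coarse, fine = decode_dcw(dcw)
    return coarse * T_BUFFER_PS + fine * T_FINE_PS


if __name__ == "__main__":
    print(decode_dcw(0x0201), tunable_delay_ps(0x0201))
```

Under this assumption the delay is monotonic across the coarse/fine boundary: the largest fine code of one coarse setting stays just below the next coarse setting.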

## 3. SIMULATION RESULTS AND CHIP IMPLEMENTATION

Post-layout simulations of the proposed all-digital PLL are performed at 1.8 V in 0.18-µm 1P6M CMOS technology. Simulations have been performed successfully under different multiplication factors. For example, with an input reference frequency of 4 MHz and a target frequency of 320 MHz (period 3.125 ns), the multiplication factor is 80 (N = N1 × N2 = 8 × 10).
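The arithmetic of this example can be checked with a short Python sketch that lists the ways a target factor N can be split across the two cascaded loops. The per-stage word range of 2 to 255 is an assumption inferred from the overall range of 4 to 65025 quoted in the abstract, and the function names are hypothetical.

```python
def factor_pairs(n, min_word=2, max_word=255):
    """List (N1, N2) pairs with N1 * N2 == n and both words inside the assumed range."""
    pairs = []
    for n1 in range(min_word, max_word + 1):
        if n % n1 == 0:
            n2 = n // n1
            if min_word <= n2 <= max_word:
                pairs.append((n1, n2))
    return pairs


def output_period_ps(f_ref_hz, n):
    """Ideal output period in picoseconds for f_out = n * f_ref (truncated)."""
    return 10**12 // (f_ref_hz * n)


if __name__ == "__main__":
    print(factor_pairs(80))
    print(output_period_ps(4_000_000, 80))
```

The pair (8, 10) used in the simulation is one of several valid splits for N = 80; the text does not state how the split is chosen.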

![](_page_3_Figure_3.jpeg)

Fig.9. Post-layout simulation of phase lock after 53 clock cycles.

![](_page_3_Picture_5.jpeg)

Fig.10. Layout view of our proposed ADPLL with cascaded DPA loop.

A prototype of the all-digital PLL with cascaded DPA loop for wide multiplication range has been implemented with the above architecture and algorithm. The chip is designed in a 0.18-µm 1P6M CMOS process. It is a cell-based design for fast tape-out to verify the proposed architecture and algorithm. Figure 10 shows the layout of the proposed low-cost clock multiplier. The chip size is 935 × 935 µm<sup>2</sup> (core: 340 × 340 µm<sup>2</sup>); it is I/O-pad limited, and the power consumption is 12 mW at 400 MHz.

## 4. CONCLUSION

An all-digital phase-locked loop with a cascaded dynamic phase average loop for wide multiplication range applications has been implemented in a 0.18-µm 1P6M CMOS standard cell library. The proposed mechanism can be implemented with two (Q.R) vector counters, two state counters, two DPA loop controllers, and two DCOs. Conventional frequency comparison is replaced with a digital arithmetic comparator in the proposed algorithm. The length of the (Q.R) vector counter is determined by the maximum multiplication control word and the number of states in each separate loop. The proposed approach does not need a phase-frequency detector, loop filter, or programmable divider. Since all of the circuits are all-digital and cell-based, the design has better stability and portability across different processes than conventional approaches. This work also designed a prototype chip. The output frequency of the prototype chip ranges from 2 MHz to 500 MHz at 1.8 V. The input frequency ranges from 5 kHz to 500 MHz. The multiplication factor can range from 2 to 65025 (255 × 255). The proposed all-digital phase-locked loop can be treated as soft IP to accelerate turnaround time. Therefore, it is very suitable for system-on-chip applications.

#### **5. ACKNOWLEDGEMENT**

The authors would like to thank the Chip Implementation Center (CIC) of the National Science Council in Taiwan for chip fabrication.

#### 6. REFERENCES

- [1] J. G. Maneatis, J. Kim, I. McClatchie, J. Maxey and M. Shankaradas, "Self-Biased High-Bandwidth Low-Jitter 1-to-4096 Multiplier Clock Generator PLL," *IEEE J. Solid-State Circuits*, Vol. 38, pp. 1795-1803, Nov. 2003.
- [2] H. T. Ahn and D. J. Allstot, "A low-jitter 1.9-V CMOS PLL for ultra-SPARC Microprocessor Applications," *IEEE J. Solid-State Circuits*, Vol. 35, pp. 450-454, May 1999.
- [3] T. Watanabe and S. Yamauchi, "An All-digital PLL for Frequency Multiplication by 4 to 1022 with Seven-cycle Lock Time," *IEEE J. Solid-State Circuits*, Vol. 38, pp. 198-204, Feb. 2003.
- [4] J. Dunning, G. Garcia, J. Lundberg, and Ed Nuckolls, "An All-digital Phase-Locked Loop with 50-cycle Lock Time Suitable for High Performance Microprocessors," *IEEE J. Solid-State Circuits*, Vol. 30, pp. 412-422, Apr. 1995.
- [5] M. Combes, K. Dioury, and A. Greiner, "A Portable Clock Multiplier Generator Using Digital CMOS Standard Cells," *IEEE J. Solid-State Circuits*, Vol. 31, pp. 958-965, July, 1996.
- [6] T.-Y Hsu, C.-C Wang, and C.-Y. Lee, "Design and Analysis of a Portable High-Speed Clock Generator," *IEEE Trans. Circuit and Syst. II*, Vol. 36, pp. 1574-1581, Oct. 2001.
- [7] C.-C Chung and C.-Y. Lee, "An All-digital Phase-Locked Loop for High-Speed Clock Generation," *IEEE J. Solid-State Circuits*, Vol. 38, pp. 347-351, Feb. 2003.
- [8] A. Kajiwara and M. Nakagawa, "A New PLL Frequency Synthesizer with High Switching Speed," *IEEE Trans. Vehicular Technology*, Vol. 41, pp. 407-413, Nov. 1992.
- [9] R. B. Staszewski, et al., "All-Digital TX Frequency Synthesizer and Discrete-Time Receiver for Bluetooth Radio in 130-nm CMOS," *IEEE Journal of Solid-State Circuits*, vol. 39, pp. 2278-2291, Dec. 2004.
- [10] P.-L. Chen, C.-C. Chung and C.-Y. Lee, "A Novel Digitally-Controlled Varactor for Portable Delay Cell Design," *IEICE Trans. Fundamentals*, Vol. E-87A, pp. 3324-3326, Dec. 2004.