# A Clock Generator With Cascaded Dynamic Frequency Counting Loops for Wide Multiplication Range Applications

Pao-Lung Chen, Member, IEEE, Ching-Che Chung, Member, IEEE, Jyh-Neng Yang, Member, IEEE, and Chen-Yi Lee, Member, IEEE

Abstract—This work presents a clock generator with cascaded dynamic frequency counting (DFC) loops for wide multiplication range applications. The DFC loop, which uses variable time period to estimate and tune the frequency of the digitally controlled oscillator (DCO), enhances the resolution of frequency detection. The conventional phase-frequency detector (PFD) and programmable divider are replaced with a digital arithmetic comparator and a DCO timing counter. The value in the DCO timing counter is separated into quotient and remainder vectors. A threshold region is set in the remainder vector to reduce the influence of jitter variation in frequency detection. The loop stability can be retained by cascading two DFC loops when the multiplication factor (N) is large. The proposed clock generator achieves a multiplication range from 4 to 13888 with output peak-to-peak jitter less than 2.8% of clock period. A test chip for the proposed clock generator is fabricated in 0.18- $\mu$ m CMOS process with core area of 0.16 mm<sup>2</sup>. Power consumption is 15 mW @ 378 MHz with 1.8-V supply voltage.

*Index Terms*—Clock generator, digitally controlled oscillator (DCO), digitally controlled varactor (DCV), dynamic frequency counting (DFC), phase-locked loop (PLL).

#### I. INTRODUCTION

Programmable multifunction unit clock generators are becoming popular for system-level integration, including processors and video/chip interfaces. Conventional approaches [1]–[4] utilize phase-locked loops (PLLs) to generate different high-frequency outputs from a low-frequency crystal clock by setting the multiplying factor. To reduce the cost and enhance the stability, as shown in [5], a capacitor-resistor (CR) oscillator can be incorporated into the integrated circuit (IC) to obtain a stable low-frequency clock around 10 kHz. Thus, the external reference clock, e.g., the crystal clock, can be eliminated. However, high-speed clock output is still necessary (e.g., 100 MHz). In this case, the frequency multiplication ratio becomes very large (e.g., 10000). The multiplication ratio is also over 6000 for a signal generator with input frequency of 32.768 kHz as indicated in [6]. As the

Manuscript received July 8, 2005; revised November 14, 2005. This work was supported by the National Science Council of Taiwan, R.O.C., under Contract NSC-93-2220-E-009-033.

P.-L. Chen was with the Department of Electronics Engineering, National Chiao Tung University, Hsinchu 300, Taiwan, R.O.C. He is now with the Department of Electronics Engineering, Chin Min Institute of Technology, Miaoli 350, Taiwan, R.O.C. (e-mail: paoo@ms.chinmin.edu.tw).

C.-C. Chung, J.-N. Yang, and C.-Y. Lee are with the Department of Electronics Engineering, National Chiao Tung University, Hsinchu 300, Taiwan, R.O.C. (e-mail: cylee@si2lab.org).

Digital Object Identifier 10.1109/JSSC.2006.874273

input-to-output multiplication factor increases, the prior art PLLs [1], [3] become increasingly sensitive to noise sources resulting in additional output jitter. Thus, this work develops an all-digital cell-based clock generator for wide multiplication range frequency synthesis using the proposed cascaded dynamic frequency counting loops.

Fig. 1 shows the major components of a second-order PLL circuit, which consists of a phase-frequency detector (PFD), charge pump, loop filter (RC filter), programmable divider (N), and voltage-controlled oscillator (VCO). While a PLL provides flexible frequency multiplication, the loop parameters, such as the damping factor and loop bandwidth, must be adjusted to minimize jitter and to ensure that the loop is stable for each output frequency and multiplication factor. The loop bandwidth should be around 1/20 of the reference frequency, as demonstrated by Maneatis [2]. To minimize period jitter, the third-order pole should be set at about 1/2 of the reference frequency. Furthermore, the damping factor $\zeta$ is given by

$$\zeta = \frac{1}{2} \sqrt{\frac{C}{N} \cdot I_{\rm CP} \cdot R^2 \cdot K_{\rm VCO}}. \tag{1}$$

The damping factor affects the loop stability, and the loop bandwidth significantly influences the system response rate. The damping factor should be approximately 1 in the second-order PLL loop [2], which creates a challenge for wide multiplication ranges as revealed in (1).
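To make the dependence in (1) concrete, the following minimal sketch evaluates the damping factor over several multiplication factors. The component values are hypothetical (they are not taken from any cited design) and are chosen only so that $\zeta = 1$ at N = 64; units are assumed to be mutually consistent.

```python
import math


def damping_factor(c, n, i_cp, r, k_vco):
    """Damping factor from (1): zeta = 0.5 * sqrt((C / N) * I_CP * R^2 * K_VCO)."""
    return 0.5 * math.sqrt((c / n) * i_cp * r ** 2 * k_vco)


# Hypothetical loop components (illustrative assumptions only).
C = 100e-12      # loop filter capacitance
I_CP = 10e-6     # charge-pump current
K_VCO = 1e9      # VCO gain
N_REF = 64       # multiplication factor at which zeta is tuned to 1

# Choose R so that zeta == 1 at N_REF, then keep it fixed.
R = 2.0 / math.sqrt((C / N_REF) * I_CP * K_VCO)

for n in (4, 64, 1024, 13888):
    print(f"N = {n:5d}  zeta = {damping_factor(C, n, I_CP, R, K_VCO):.3f}")
```

With fixed components, $\zeta$ scales as $1/\sqrt{N}$, so a loop tuned for a moderate N becomes heavily overdamped at small N and underdamped at large N.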

To cope with a wide multiplication range, [3] utilized a current mode filter, while [4] decomposed capacitor C into smaller $C_1$ and $C_2$ with a cascaded loop. However, both multiplication factors (N) were less than 255. The work in [2] applied a scalable charge-pump current to compensate for the damping factor and bandwidth dependence of the multiplication factor. The multiplication factor (N) of [2] ranges from 1 to 4096. A signal generator [6] used a frequency multiplier in the first stage, followed by an analog PLL to achieve a multiplication factor over 6000 with an input frequency of 32.768 kHz. The method of [6] requires a D/A converter with an additional high-speed oscillator for the frequency multiplier of the first stage.

The time constant of the loop filter limits the locking time of the analog PLL with a PFD and a charge pump. By contrast, an all-digital PLL (ADPLL) can achieve fast locking due to an efficient searching algorithm and digital loop filter techniques [5],



Fig. 1. Block diagram of a conventional second-order PLL-based clock generator.



Fig. 2. Loop control algorithm for the proposed DFC loop.

[7]–[10]. However, each of those proposed methods has a multiplication factor (N) smaller than 1024. Furthermore, changing specifications requires transistor resizing of DCO in some proposed methods [7]–[11]. Thus, the physical design effort must still be considered.

A full clock generator design using only standard cells as the intellectual property (IP) block in [11]–[14] can partially solve the portability problem. A portable clock multiplier generator using digital CMOS standard cells based on a delay-locked loop is presented in [12]. However, the multiplication factor in the generator of [12] is limited to $4\sim20$. Additionally, three large register files are needed to store the history of the previous 256 cycles. The work in [13] and [14] applied two identical DCOs to generate low-jitter clock output, leading to higher power consumption and larger silicon area.

This work proposes a cascaded DFC loop for a wide multiplication range over 10 000 and a minimum input frequency around 20 kHz. The cascaded DFC loops consist of two DFC loops in series. Each DFC loop has a similar first-order transfer function, but different loop parameters and requirements in DCO 1 and DCO 2. To enhance frequency detection resolution, the DFC loop adopts a variable period to measure and control the DCO output frequency. The variable sampling period is  $2^n$  reference clock cycles, where *n* ranges from 1 to 8. The proposed frequency detection loop can simplify the frequency calculation and significantly reduce circuit complexity. The first DFC loop generates an intermediate frequency for next DFC loop to enhance the overall loop stability because the cascading DFC loops result in smaller loop gains than a single loop.

The proposed all-digital clock generator for wide multiplication range was verified in a $0.18\ \mu\text{m}$ one-poly six-metal (1P6M) CMOS process with a frequency range of $2.4 \sim 378$ MHz at 1.8 V. When the multiplication factor is less than 13888, the peak-to-peak ($P_k$-$P_k$) jitter is less than 2.8% of the output clock period. This demonstrates the effectiveness of the proposed mechanism for wide multiplication range applications.

The rest of this paper is organized as follows. Section II describes the proposed DFC loop acquisition algorithm and the structure of the DFC loop controller. The quotient and remainder vectors in the DCO timing counter and the threshold decision zone are also addressed. Section III considers the loop design issues in cascaded DFC loops for wide multiplication range. Section IV then describes the structure of DCO 1 and the high-resolution DCO 2 with digitally controlled varactors (DCVs). Section V discusses the chip measurement results of the proposed clock generator with cascaded DFC loops. Finally, Section VI presents a summary and conclusions.

#### II. DFC LOOP CONTROL ALGORITHM

#### A. Algorithm of Dynamic Frequency Counting Loop

If a loop uses a phase-frequency detector (PFD) to detect the phase difference, the loop's order may become greater than 1. Note that phase is the integral of frequency, implying a 90-degree shifted response from the DCO control word (DCW). A mathematical analysis shows that the phase measurement in the PFD creates a pole in the loop response. Hence, a zero is needed to cancel the inherent pole. Consequently, the loop is at least a second-order control system [7], [11].

The dynamic frequency counting loop is a first-order system that uses a variable time period to estimate and tune the frequency of the DCO. Fig. 2 shows the loop control algorithm of the proposed DFC loop. The reference timing counter (REF\_timing\_counter) operates at the reference clock rate. The counter starts at zero and counts up at every rising edge



Fig. 3. Timestamp of frequency comparison.

of the reference clock. Similarly, the DCO timing counter (DCO\_timing\_counter) operates at the speed of the DCO output clock. The quotient and remainder vectors in the DCO timing counter are compared with the input multiplication factor (N) when the reference timing counter reaches a power-of-2 number of reference clock cycles. The sampling cycle time of the DFC loop control is defined as

$$C_i = 2^i \cdot T_{\text{REF}} \tag{2}$$

where *i* represents the *i*th DFC sampling state and $T_{\text{REF}}$ is the cycle time of the input reference clock ($F_{\text{REF}}$). The sampling cycle time $C_i$ is the timestamp at which the DCO clock output frequency is compared, and M is the maximum DFC sampling state, as shown in Fig. 3. If the result of the frequency comparison remains unchanged and the DFC sampling state is less than M + 1, both counters continue frequency accumulation. Otherwise, the frequency error signal is asserted to update the current DCO control word (DCW) and adjust the DCO output frequency. Meanwhile, both the reference timing counter and the DCO timing counter are reset to zero.
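As a small illustration of (2), the sketch below lists the comparison timestamps $C_i$ and the DCO edge count that an ideally locked loop would accumulate at each one. The function names are hypothetical, and the example reference period and N are only illustrative inputs.

```python
def sampling_times(t_ref, m):
    """Timestamps C_i = 2**i * T_REF for sampling states i = 0..M, per (2)."""
    return [(2 ** i) * t_ref for i in range(m + 1)]


def expected_count(n, i):
    """Ideal DCO edge count at C_i when the DCO runs at exactly N * F_REF."""
    return n * (2 ** i)


T_REF = 1.0 / 19.26e3   # example reference period in seconds
for i, c_i in enumerate(sampling_times(T_REF, 4)):
    print(f"state {i}: C_i = {c_i * 1e6:8.2f} us, ideal count = {expected_count(224, i)}")
```

Each additional sampling state doubles the observation window, which is what lets the loop resolve smaller frequency errors without changing the comparator.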

Fig. 4 shows the basic structure of the dynamic frequency counting loop. It consists of four main functional units: reference timing counter, DCO timing counter, DCO, and DFC loop controller. The reference timing counter serves as a variable timer for the decision unit to estimate and control the DCO. The DCO timing counter acts as an estimator of the DCO output frequency. The DFC loop controller performs loop control, frequency error accumulation, and gain control based on the measured frequency in the DCO timing counter. The DFC loop is a discrete-time sampled system implemented with all-digital components. Consequently, the z-domain representation is a more natural description than the damping factor of (1) used for the analog PLL. Fig. 5 illustrates the signal model of the proposed clock generator with the DFC loop control. The transfer function of Fig. 5 is given by

$$H[z(t)] = \frac{K(i) \cdot [z(t)]^{-1}}{1 - (1 - K(i)) \cdot [z(t)]^{-1}} \tag{3}$$

where K(i) is the loop gain (i.e., $K_i \cdot K_{\text{DCO}} \cdot 2^i / F_{\text{REF}}$) and $t = 2^i \times T_{\text{REF}}$, i = 0, 1, 2, ..., M. From (3), K(i) must lie in the range $0 \sim 2$ for loop stability. Fig. 6 illustrates the pole displacement with gain variation. The maximum value of M is bounded by the loop gain K(i). The DFC loop control is a first-order time-varying system. It only accumulates the frequency error; thus, it generally features faster dynamics and greater stability than higher order loops. The digital arithmetic comparator replaces conventional PFD conversion mechanisms. The variable frequency $F_{\rm DCO}[i]$ is determined by counting the number of rising-edge clock transitions of the digitally controlled oscillator. The sampled $F_{\rm DCO}[i]$ is compared with the multiplication factor (N) in a digital arithmetic comparator. If the comparison is equal, the comparator outputs 0; otherwise, it outputs +1 (-1) if the reference frequency is faster (slower) than the DCO frequency. The result is then multiplied by gain $K_i$. Then, the DCO's output frequency is adjusted.

Fig. 4. Basic structure of the proposed dynamic frequency counting loop.

Fig. 7 shows the step response of the proposed clock generator with DFC algorithm as compared with the sequential search. The step size of the sequential search is 1/128 (i.e., DCO is 8 bits) and the initial condition is set to 0.5 of normalized frequency to improve visualization. The proposed all-digital clock generator with DFC loop achieves fast locking in less than 10 iterations as illustrated in Fig. 7(a). By contrast, the sequential search requires at least 50 iterations to achieve frequency acquisition, as shown in Fig. 7(b). Furthermore, jitter variations will occur as indicated in Fig. 7(c) if the fine resolution of the DCO is inadequate.
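The stability bound on K(i) can be checked with a short sketch. It iterates the difference equation implied by (3), $y[k] = (1 - K)\,y[k-1] + K\,x[k-1]$, with a fixed gain K. This is a simplification: the actual loop is time-varying and uses gain control, so the code below is not the authors' simulation, only an illustration of why the pole at $z = 1 - K$ requires $0 < K < 2$.

```python
def is_stable(k):
    """The single pole of (3) sits at z = 1 - K; stable when |1 - K| < 1."""
    return abs(1.0 - k) < 1.0


def step_response(k, steps, y0=0.5, target=1.0):
    """Iterate y[n] = (1 - K) * y[n-1] + K * target from initial value y0."""
    y = [y0]
    for _ in range(steps):
        y.append((1.0 - k) * y[-1] + k * target)
    return y


for gain in (0.5, 1.0, 1.5, 2.5):
    final = step_response(gain, 10)[-1]
    print(f"K = {gain}: stable = {is_stable(gain)}, y[10] = {final:.4f}")
```

A gain of 1 settles in a single update, gains between 1 and 2 ring around the target while converging, and gains above 2 diverge.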

#### B. Structure of the Reference and DCO Timing Counters

The reference timing counter is a ripple counter with a reset function. Its length depends on the maximum DFC sampling state used to estimate the DCO frequency. The DCO timing counter is also a ripple counter with a reset function. Therefore, most bits of the DCO timing counter operate at a low clock rate, which saves power. The length of the DCO timing counter is related to the multiplication factor (N) and the length of the reference timing counter. If the maximum value of the multiplication factor is P, the quotient length L is formulated as


$$L = \lceil \log_2 P \rceil \tag{4}$$



Fig. 5. Signal model of the proposed dynamic frequency counting loop.



Fig. 6. Pole displacement by gain variation.



Fig. 7. Step response of the proposed clock generator with DFC loop control versus sequential search. (a) Proposed. (b) Sequential search. (c) Jitter variation.

where  $\lceil X \rceil$  represents the least integer greater than or equal to X. If the length of the reference timing counter is M, which is equal to the maximum number of the DFC sampling state, the length of the DCO timing counter equals L + M. Fig. 8 shows the block diagram of the quotient and remainder vectors in the DCO timing counter with  $(A_{L+M-1}A_{L+M-2}...A_2A_1A_0)$ 



Fig. 8. Structure of quotient and remainder vectors in the DCO timing counter.

bits. The values of the quotient and remainder vectors in the *i*th sampling state are

$$Q_i = (A_{L+i-1} \dots A_{i+1} A_i) \tag{5}$$

$$R_i = (A_{i-1} \dots A_1 A_0) \tag{6}$$

where  $A_i$  denotes the *i*th bit of the DCO timing counter. The measured values of  $Q_i$  and  $R_i$  in the DCO timing counter can then be calculated as follows:

$$[Q_i \cdot 2^i + R_i] = \left\lceil \frac{C_i}{T_{\text{DCO}}} \right\rceil \tag{7}$$

where  $T_{\rm DCO}$  denotes the cycle time of the DCO generated frequency.
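The bit slicing in (4)–(6) can be sketched in a few lines. The helper names are hypothetical; the counter value is treated as a plain integer whose bit *k* corresponds to $A_k$.

```python
def quotient_length(p):
    """L = ceil(log2(P)) from (4), computed with integer arithmetic."""
    return (p - 1).bit_length()


def split_counter(count, i, length):
    """Split a DCO timing counter value into (Q_i, R_i) per (5) and (6).

    Q_i is the `length` bits starting at bit i; R_i is the low i bits.
    """
    quotient = (count >> i) & ((1 << length) - 1)
    remainder = count & ((1 << i) - 1)
    return quotient, remainder


L = quotient_length(255)                 # maximum factor P = 255
count = 224 * 2 ** 4 + 5                 # N = 224 observed at state i = 4, 5 extra edges
q, r = split_counter(count, 4, L)
print(L, q, r)
```

Because the sampling window is a power of two, dividing the count by $2^i$ reduces to selecting bits, so no divider circuit is needed to recover the multiplication factor.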

#### C. Structure of the DFC Loop Controller

The structure of the DFC loop controller is shown in Fig. 9. The decision unit performs the digital arithmetic comparisons and updates the DCO control word (DCW). The decision unit compares the DCO timing counter based on frequency sampling period with power-of-2 input reference clock cycles as shown in Fig. 3. The decision unit also controls the frequency acquisition process and fine-tuning process. During the acquisition, loop gain control with binary search is applied to achieve fast locking. A multiplexer is used to select the DCO control code. Fig. 10(a)



Fig. 9. Structure of the proposed DFC loop controller.

illustrates the state transition diagram of the DFC loop controller for frequency search. Fig. 10(b) shows the sampling-period states, which are used in the acquisition, fine-tuned, and locked states. Sampling state 0 is the initial state, and both the reference timing counter and the DCO timing counter are reset in this state. After a sampling period of $2^1 \times T_{\text{REF}}$, the DFC loop controller switches from sampling state 0 into sampling state 1 for frequency comparison. If the quotient vector in the DCO timing counter equals the multiplication factor (N) and the frequency error in the remainder vector is below the threshold region, the DCO control word is left unchanged and the controller advances from sampling state 1 to sampling state 2, where the comparison is repeated after a sampling period of $2^2 \times T_{\text{REF}}$. Otherwise, the DFC loop controller changes the DCW depending on the frequency comparison result, switches back to sampling state 0, and resets both the reference and DCO timing counters. Sampling states 2 to M-1 operate in the same way as sampling state 1. When the DFC loop controller enters the maximum sampling state M, it automatically switches back to sampling state 0.
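The sampling-state behavior described above can be approximated by the following behavioral sketch. Several assumptions are made that are not in the paper: the DCO frequency is modeled as linear in the DCW (`hz_per_lsb` is a hypothetical parameter), the binary-search gain control is reduced to halving the step after every update, and lock is declared once the comparison passes at state M. It is a simplified model, not the fabricated controller.

```python
def ceil_div(a, b):
    """Integer ceiling division, matching the ceiling in (7)."""
    return -(-a // b)


def dfc_lock(n, f_ref_hz, hz_per_lsb, dcw_bits=8, m=4, threshold=2, max_updates=64):
    """Search for a DCW such that F_DCO ~= N * F_REF; returns (dcw, updates)."""
    dcw = 1 << (dcw_bits - 1)          # start at mid-scale
    step = dcw // 2                    # binary-search step, halved per update
    updates = 0
    state = 1                          # first comparison after 2**1 reference cycles
    while updates < max_updates:
        # DCO edges counted during 2**state reference cycles (idealized, jitter-free).
        count = ceil_div((1 << state) * dcw * hz_per_lsb, f_ref_hz)
        quotient = count >> state
        remainder = count & ((1 << state) - 1)
        if quotient == n and remainder < threshold:
            if state == m:
                return dcw, updates    # passed every sampling state
            state += 1                 # keep the DCW, lengthen the window
            continue
        # Frequency error: raise the DCW if the DCO is slow, lower it otherwise.
        dcw += step if quotient < n else -step
        step = max(step // 2, 1)
        updates += 1
        state = 1                      # counters reset, restart from the short window
    return dcw, updates


print(dfc_lock(n=100, f_ref_hz=1000, hz_per_lsb=1000))
```

In this model a coarse mismatch is rejected in the short windows, while a small residual error only shows up in the remainder at the longer windows, which mirrors the intent of the variable sampling period.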

#### D. Threshold Decision

Considering the influence of jitter variation on frequency detection, accumulating the frequency count helps minimize the jitter effect. A time-varying averaging process and a threshold region scheme can be applied. For simplicity and without loss of generality, the average period over n cycles can be represented as

$$\overline{T} = T_{\text{ideal}} + \frac{\sum_{j=0}^{n-1} T_{\text{jitter}(j)}}{n} \tag{8}$$

where $T_{\text{ideal}}$ denotes the period of the jitterless clock, i.e., the ideal period, while $T_{\text{jitter}(j)}$ represents the period jitter in the *j*th cycle. All jitters are assumed to be independent and identically distributed (i.i.d.). The jitter contribution can be reduced to an acceptable range when the averaging length *n* is large. Therefore, (7) can be rewritten in terms of the average periods of the input reference clock and the DCO output clock by using (8):

$$\begin{aligned}
[2^{i} \cdot Q_{i} + R_{i}] &= \left\lceil \frac{2^{i} \cdot \overline{T_{\text{REF}}}}{\overline{T_{\text{DCO}}}} \right\rceil \\
&= \left\lceil \frac{2^{i} \cdot \left(T_{\text{REF\_ideal}} + \frac{\sum\limits_{j=1}^{2^{i}} T_{\text{REF\_jitter}(j)}}{2^{i}}\right)}{T_{\text{DCO\_ideal}} + \frac{\sum\limits_{j=1}^{N \cdot 2^{i}} T_{\text{DCO\_jitter}(j)}}{2^{i} \cdot N}} \right\rceil \\
&\approx 2^{i} \cdot N + \text{threshold}(i, N).
\end{aligned} \tag{9}$$

Here, $T_{\text{DCO\_ideal}}$ and $T_{\text{REF\_ideal}}$ denote the ideal periods of the DCO output and the input reference clock, respectively. Similarly, $T_{\text{DCO\_jitter}(j)}$ and $T_{\text{REF\_jitter}(j)}$ represent the jitter of the DCO output clock and the reference clock in the *j*th cycle. Therefore, a threshold value can be defined to absorb the jitter variation in frequency detection. The resolution of frequency detection can be expressed as follows:

$$\operatorname{Resolution}(i, N) = 1 - \frac{\text{threshold}(i, N)}{2^{i} \cdot N}. \tag{10}$$

The resolution of frequency detection is enhanced by increasing the number of sampling state and multiplication factor (N). Therefore, the maximum value of sampling state M is a tradeoff between loop gain K(i) and resolution of frequency detection.
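A direct evaluation of (10) shows how the resolution improves with the sampling state. The threshold values and factors used in the example (8 with N = 224 and 2 with N = 62) are the settings reported later in Sections III and V; the function name is a hypothetical helper.

```python
def resolution(i, n, threshold):
    """Frequency detection resolution from (10)."""
    return 1.0 - threshold / ((2 ** i) * n)


# First loop: threshold 8, N1 = 224, sampling states 1..4.
for i in range(1, 5):
    print(f"loop 1, i = {i}: {resolution(i, 224, 8):.6f}")

# Second loop: threshold 2, N2 = 62, sampling states 1..8.
for i in range(1, 9):
    print(f"loop 2, i = {i}: {resolution(i, 62, 2):.6f}")
```

Each extra sampling state halves the relative uncertainty, at the cost of a larger loop gain K(i) in (3).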

#### III. CASCADED DFC LOOPS FOR WIDE MULTIPLICATION RANGE

Cascaded DFC loops are similar in design to multistage amplifiers. The dynamic frequency counting loop uses variable sampling period to estimate and adjust the DCO frequency, thereby enhancing the frequency detection resolution. However, the loop gain K(i) in (3) caused by  $K_{\text{DCO}}$  becomes large when multiplication factor (N) has a wide range. The stable



Fig. 10. State transition of the proposed DFC loop controller. (a) State of frequency search. (b) State of sampling period.

region of K(i) is 0~2. To ensure that the loop remains stable for applications with wide multiplication ranges, the transfer function of (3) can be decomposed into

$$H[z(t)] = \frac{K_1(i) \cdot [z(t_1)]^{-1}}{1 - (1 - K_1(i)) \cdot [z(t_1)]^{-1}} \cdot \frac{K_2(i) \cdot [z(t_2)]^{-1}}{1 - (1 - K_2(i)) \cdot [z(t_2)]^{-1}} \tag{11}$$

where  $K_1(i)$  denotes the loop gain of the first stage, and equals  $(K_{i1} \cdot K_{\text{DCO1}} \cdot 2^i / F_{\text{REF}}), t_1 = 2^i \cdot T_{\text{REF}}, i = 0, 1, 2, \dots, M_1$ . Similarly,  $K_2(i)$  represents the loop gain of the second stage, and equals  $(K_{i2} \cdot K_{\text{DCO2}} \cdot 2^i / F_{\text{DCO1}})$  and  $t_2 = 2^i \cdot T_{\text{DCO1}}, i = 0, 1, 2, \dots, M_2$ . In these formulas,  $M_1$  and  $M_2$  represent the maximum numbers of sampling state in each DFC loop.

The DFC loop in the first stage generates a low-frequency output, or an intermediate frequency for the next stage. The output frequency range of a single DCO can be further divided



Fig. 11. Structure of the proposed clock generator with cascaded DFC loops.

into two DCO ranges. Hence, the loop gain  $K_1(i)$  and  $K_2(i)$  are easier to control in a stable region than K(i) using a single loop in (3). This approach not only decreases the length of the DCO timing counter and reference timing counter, but also the gain of the DCO. The proposed cascaded loops generate output frequency by averaging process in each stage. Hence, the noise influence or jitter variation on the DFC loop in the first stage has less influence on the output frequency of the DFC loop in the second stage.
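The decomposition in (11) can be illustrated by chaining two first-order sections. The sketch assumes fixed gains and a common sample index for both stages, whereas the actual loops run on different time bases ($t_1$ and $t_2$) with time-varying gains, so it only indicates that the cascade settles when each stage gain lies within $0 \sim 2$.

```python
def first_order(k, x, y0=0.0):
    """One section of (11): y[n] = (1 - K) * y[n-1] + K * x[n-1]."""
    y = [y0]
    for sample in x[:-1]:
        y.append((1.0 - k) * y[-1] + k * sample)
    return y


def cascaded_response(k1, k2, x):
    """Output of stage 2 when it is driven by the output of stage 1."""
    return first_order(k2, first_order(k1, x))


step = [1.0] * 40
print(cascaded_response(0.5, 0.5, step)[-1])   # both gains inside the stable region
print(cascaded_response(2.5, 0.5, step)[-1])   # stage 1 gain outside the stable region
```

Splitting the multiplication between two stages lets each gain be kept in the stable region independently, instead of forcing one loop to cover the full DCO range.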

#### A. System Block Diagram and Loop Parameters Design

Fig. 11 shows a system block diagram of the cascaded dynamic frequency counting loops for wide multiplication range applications. The system consists of two DFC loops in series, called DFC loop 1 and DFC loop 2. The first DFC loop 1 generates low-frequency output, or intermediate frequencies for the DFC loop 2. DFC loops 1 and 2 have different loop parameters and DCO requirements. DFC loop 1 only requires a low-frequency DCO 1 and a low-frequency detector. By contrast, the second stage is a high-resolution DCO 2, and also requires a high-resolution frequency detector. Both DFC loops 1 and 2 can be disabled, depending on output requirement, by the mode control. To prevent false locking, the DFC loop 2 is enabled after DFC loop 1 is locked in the acquisition process when two loops are employed. Because the DCO is divided into two DCO ranges, DCO 1 has a smaller control code than the DCO of a single loop, thus shortening the locking time.

To produce a maximum multiplication factor over 10 000 with a lowest input frequency around 20 kHz, the multiplication factor (N) needs to be evenly distributed between the first and second stages. Therefore, the multiplication factor (N1) in the first stage's DFC loop is in the range 2~255, and the multiplication factor (N2) in the second stage is in the range 2~128. However, the lengths of the reference timing and DCO timing counters are different because of differences between $K_1(i)$ and $K_2(i)$ in (11). The variable sampling period in the DFC loop improves frequency detection as indicated in (10), but also affects the loop stability. After trading off (10) against (11) with simulations, the length of the reference timing counter is 4 bits in DFC loop 1 (i.e., $M_1 = 4$), and 8 bits in DFC loop 2 (i.e., $M_2 = 8$). Therefore, the DCO timing counters are 12 and 16 bits long in the first and second stages, respectively. The value of threshold(i, N) is set to 8 in DFC loop 1, and 2 in DFC loop 2. Additionally, the least significant bit (LSB) resolutions of DCO 1 and DCO 2 are 65 ps and 0.55 ps, respectively.
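Distributing N between the two stages amounts to choosing a factor pair within the per-stage ranges given above. The helper below is a hypothetical illustration of that choice; it enumerates the admissible (N1, N2) pairs and the resulting intermediate frequency for a given reference.

```python
def factor_pairs(n, n1_range=(2, 255), n2_range=(2, 128)):
    """All (N1, N2) with N1 * N2 == N and each factor inside its stage range."""
    pairs = []
    for n1 in range(n1_range[0], n1_range[1] + 1):
        if n % n1 == 0 and n2_range[0] <= n // n1 <= n2_range[1]:
            pairs.append((n1, n // n1))
    return pairs


def intermediate_frequency(f_ref_hz, n1):
    """Ideal output frequency of the first DFC loop."""
    return f_ref_hz * n1


for n1, n2 in factor_pairs(13888):
    f_mid = intermediate_frequency(19.26e3, n1)
    print(f"N1 = {n1:3d}, N2 = {n2:3d}, intermediate = {f_mid / 1e6:.3f} MHz")
```

The pair N1 = 224, N2 = 62 used in the measurements of Section V is one of the admissible splits for N = 13888.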

#### IV. DESIGN OF DCO 1 AND 2 FOR CASCADED DFC LOOPS

#### A. Structure of DCO 1

In the proposed design, the gain of DCO 1 ($K_{\text{DCO1}}$) is 65 ps/LSB, and the output frequency of DCO 1 is in the range 2~34 MHz (i.e., a period of 29.4~500 ns). The proposed design only needs 13 binary weighted control bits, and achieves LSB resolution below 65 ps. The LSB resolution of DCO 1 can be easily implemented using standard cells in a standard 0.18-$\mu$m 1P6M CMOS process. Fig. 12 illustrates the basic structure of DCO 1, which is controlled by three cascaded stages: range selection (DCW1[12:7]), coarse-tuning (DCW1[6:1]), and fine-tuning (DCW1[0]). The range-selection and coarse-tuning stages are implemented with a path selector. The difference between these two stages is that the unit delay in the range-selection stage is larger than that in the coarse-tuning stage. To lower the chip area and power consumption of DCO 1, the unit delay in the range-selection stage is implemented with a delay cell in the cell library. Delay cells have a longer MOS channel than normal cells and therefore a much larger delay. A coarse-tuning delay cell is simply one buffer delay cell. The fine-tuning stage uses 1 bit to control parallel tri-state buffers.
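The three control fields of DCO 1 can be extracted from the 13-bit control word as sketched below. Only the bit positions (DCW1[12:7], DCW1[6:1], DCW1[0]) come from the text; the function and key names are hypothetical.

```python
def decode_dcw1(dcw1):
    """Split the 13-bit DCO 1 control word into its three stage fields."""
    if not 0 <= dcw1 < (1 << 13):
        raise ValueError("DCW1 must fit in 13 bits")
    return {
        "range": (dcw1 >> 7) & 0x3F,    # DCW1[12:7], range selection
        "coarse": (dcw1 >> 1) & 0x3F,   # DCW1[6:1], coarse tuning
        "fine": dcw1 & 0x1,             # DCW1[0], fine tuning
    }


word = (5 << 7) | (9 << 1) | 1
print(decode_dcw1(word))
```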

#### B. High-Resolution DCO 2 With Digitally Controlled Varactors

The high-resolution DCO 2 is the key component in the proposed low-jitter clock generator for a wide multiplication range. To achieve 0.55 ps/LSB, [15] has developed a novel DCV using three-input NAND gates for DCO design. The DCV [15] uses the gate capacitance difference of NAND gates under different digital control inputs to build a digitally controlled varactor, as shown in Fig. 13(a). Fig. 13(b) shows the equivalent circuit of Fig. 13(a): an initial capacitance ($C_I$) in parallel with a capacitance difference ($\Delta C$). The D input controls the capacitance ($\Delta C$) at the output (Out) node. The marked transistor (M1) produces a large capacitance difference under different D states. 256 DCVs with different NAND gates are utilized to achieve a high-resolution DCO. Therefore, the



Fig. 12. Structure of DCO 1.



Fig. 13. Using three-input NAND gate as DCV. (a) Circuit with digital control. (b) Equivalent circuit with  $\Delta C$  capacitance.



Fig. 14. 256 DCV's in the fine-tuning stage of DCO 2.

LSB resolution of DCO 2 for fine-tuning stage can be improved by 256 times as compared with a simple buffer design.

The structure of DCO 2 is similar to DCO 1 shown in Fig. 12. It is also separated into three stages, namely range selection, coarse-tuning, and fine-tuning. The coarse-tuning stage is the same as that of DCO 1, and the fine-tuning stage is replaced by 256 DCV's as shown in Fig. 14. DCO 2 has 16 bits of binary weighted control code ( $0000_{16} \sim FFFF_{16}$ ). The coarse-tuning stage includes 32 buffer stages for delay-chain selection. The output frequency of DCO 2 is in the range 28~335 MHz. This architecture allows the operating frequency of DCO 2 to be modified easily to meet different specifications.

Both DCO 1 and DCO 2 are designed in gate-level Hardware Description Language (HDL) codes. The DFC loop controller is also designed in HDL code and synthesized by a target library. The final circuit layout is generated by auto placement and routing (APR) tools. As a result, the design cycle time is tremendously reduced during process migration.

#### V. EXPERIMENTAL RESULTS

The proposed clock generator with cascaded DFC loops is fabricated and tested under different input reference frequencies and multiplication factors. A LeCroy LC584A is used to measure the output frequency with noisy digital circuitry. Fig. 15 illustrates the measured results with an input reference frequency of 19.26 kHz from an HP-3312 signal generator, multiplication factor N = 13888 (i.e., N1 = 224, N2 = 62) and test output divided by 2 at 134.7 MHz under a supply voltage of 1.8 V. The signal at Channel 2 displays the DCO generated test output signal, while Channel D shows the long-term cycle-to-cycle jitter histogram. The measured rms jitter and peak-to-peak ($P_k$-$P_k$) jitter of DCO 1 shown in Fig. 15(a) are 45 and 110 ps, respectively, and the measured rms jitter and P<sub>k</sub>-P<sub>k</sub> jitter of DCO 2 shown in Fig. 15(b) are 76 and 200 ps, respectively. Fig. 16 shows the measured long-term cycle-to-cycle jitter of clock output versus multiplication factor (N). The dashed line indicates the output jitter with the input reference clock set to 19.26 kHz and different multiplication factors. The solid line shows the output jitter with the test output clock fixed at 134.76 MHz while both the multiplication factor (N) and the input reference clock are changed. The $P_k-P_k$



Fig. 15. Measured results with N = 13888 (N1 = 224, N2 = 62), REF\_clk = 19.26 kHz. (a) DCO 1 output @ 4.30 MHz $\pm$ 55 ps. (b) DCO 2 output @ 134.7 MHz $\pm$ 100 ps.

jitter ratio is represented as the percentage of the output clock period, and is always less than 2.8% when multiplication factor (N) is less than 13 888. The output clock frequency limits the performance owing to noise induced by the I/O pad transition. When the input clock frequency is set to 19.26 kHz, the P<sub>k</sub>-P<sub>k</sub> jitter ratio increases with the multiplication factor (N), and remains less than 2.8% of output clock period. The output clock period dominates the P<sub>k</sub>-P<sub>k</sub> jitter ratio as output frequency is increased.
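The jitter ratio quoted above is simply the peak-to-peak jitter divided by the output period. The short sketch below applies that definition to the measured values reported for Fig. 15; the function name is a hypothetical helper.

```python
def jitter_ratio_percent(pk_pk_jitter_s, f_out_hz):
    """Peak-to-peak jitter as a percentage of the output clock period."""
    return 100.0 * pk_pk_jitter_s * f_out_hz


dco1 = jitter_ratio_percent(110e-12, 4.30e6)    # DCO 1: 110 ps at 4.30 MHz
dco2 = jitter_ratio_percent(200e-12, 134.7e6)   # DCO 2 test output: 200 ps at 134.7 MHz
print(f"DCO 1: {dco1:.3f} %   DCO 2: {dco2:.3f} %")
```

Because the ratio is proportional to the output frequency, a fixed amount of jitter in picoseconds weighs more heavily as the output clock gets faster.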

The maximum multiplication factor of N1 is 224 instead of 255 owing to the overflow in the DCO timing counter 1, which causes false frequency locking. In addition, the overflow also limits the lowest input frequency of DCO 1. Similarly, overflow also restricts the multiplication factor of N2. To prevent overflow, one more bit has to be extended in the most significant bit (MSB) of (5), as well as the DCO timing counter. Bit 0 is added to both the MSB of multiplication factors N1 and N2 for comparison.
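One way to picture the overflow problem is to model the DCO timing counter as wrapping modulo $2^{\text{bits}}$. The numbers below are an illustrative assumption (a DCO running about 2% fast during acquisition, observed at sampling state 4 of a 12-bit counter), not measured data.

```python
def observed_quotient(true_count, i, bits):
    """Quotient seen by the comparator after the counter wraps at 2**bits."""
    wrapped = true_count % (1 << bits)
    return wrapped >> i


# N1 = 255: ideal count 255 * 16 = 4080; about 2% fast gives 4162, which wraps past 4095.
print(observed_quotient(4162, 4, 12))   # small value, so the DCO looks far too slow
# N1 = 224: ideal count 224 * 16 = 3584; about 2% fast gives 3656, still below 4096.
print(observed_quotient(3656, 4, 12))   # slightly above 224, so the DCO looks fast
# With one extra MSB the N1 = 255 case is reported correctly.
print(observed_quotient(4162, 4, 13))
```

In this model a wrapped count makes a fast DCO appear slow, which would push the control word in the wrong direction; lowering the maximum N1 or adding one MSB restores the headroom.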

Fig. 17 shows a chip microphotograph of the proposed design. Table I lists the summary of chip features. The total gate count is



Fig. 16. Measured output cycle-to-cycle jitter versus multiplication factor (N) for fixed input clock = 19.26 kHz and fixed output clock = 134.7 MHz.



Fig. 17. Microphotograph of the proposed clock generator with cascaded DFC loops.

TABLE I SUMMARY OF CHIP FEATURES

| Items                 | Specification              |
|-----------------------|----------------------------|
| Technology            | UMC 0.18 µm CMOS           |
| Function              | Cell-based clock generator |
| Loop Bandwidth        | Dynamic                    |
| Frequency Range       | 2.4 MHz ~ 378 MHz          |
| Multiplication Factor | 4 ~ 13888                  |
| Lock-in Time          | < 75 cycles                |
| Power Consumption     | 15 mW @ 378 MHz output     |
| Power Supply          | 1.8 V                      |
| Gate Count            | 6400 gates                 |

6400 and the core area is  $400 \times 400 \,\mu \text{m}^2$ . Table II compares different clock generators: analog PLL [2] and all-digital PLLs [5],

| Performance Parameter | This work | JSSC 03 [2] | ISSCC 04 [7] | JSSC 03 [5] | JSSC 03 [14] |
|-----------------------|-----------|-------------|--------------|-------------|--------------|
| Process | 0.18 µm CMOS | 0.13 µm CMOS | 90 nm CMOS | 0.65 µm CMOS | 0.35 µm CMOS |
| Area | 0.16 mm<sup>2</sup> | 0.182 mm<sup>2</sup> | 0.18 mm<sup>2</sup> | 1.17 mm<sup>2</sup> | 0.71 mm<sup>2</sup> |
| Power | 15 mW @ 378 MHz | 7 mW @ 240 MHz | 1.7 mW @ 520 MHz | | 100 mW @ 500 MHz |
| Approach | DFC, cascaded loops | 12-b DAC, analog filter | PFD+TDC, digital loop | TDC, digital loop | PFD, digital loop |
| Input Range | 19.26 kHz ~ 60 MHz | | 30 kHz ~ 65 MHz | 11.2 ~ 339.7 kHz | 0.5 ~ 60 MHz |
| Output Range | 2.4 ~ 378 MHz | 30 ~ 650 MHz | 0.18 ~ 600 MHz | 0.0449 ~ 61.3 MHz | 40 ~ 510 MHz |
| Multiplication Factor | 4 ~ 13888 | 1 ~ 4096 | 1 ~ 1023 | 4 ~ 1022 | 2 ~ 1023 |
| Max. Lock Time | < 75 cycles | | > 150 cycles | < 7 cycles | < 46 cycles |
| Supply V<sub>dd</sub> | 1.8 V | 1.5 V | 1.0 V | 5 V | 3.3 V |
| Output Jitter (P<sub>k</sub>-P<sub>k</sub>) (% output period) | 2.8% @ 134.7 MHz, N = 13888 | 1.7% @ 240 MHz, N = 4096 | 1.2% @ 30.7 MHz, N = 1023 | 4.8% @ 30.4 MHz, N = 1022 | 3.2% @ 450 MHz, N = 137 |

TABLE II Performance Comparisons

[7], [14]. The proposed all-digital clock generator with the proposed cascaded DFC loops has the largest multiplication range and the smallest chip area among the published PLLs, since it does not use analog capacitance and programmable divider. The work in [5] achieves faster locking than the proposed clock generator, but it has worse jitter performance, less multiplication range, and higher cost.
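Because Table II reports peak-to-peak jitter as a percentage of the output period, it can help to convert each entry to an absolute time. The following is a minimal sketch of that arithmetic; the function name `jitter_ps` and the dictionary layout are illustrative assumptions, and the input values are copied from the jitter row of Table II.

```python
def jitter_ps(freq_mhz: float, jitter_percent: float) -> float:
    """Convert jitter given as a percentage of the clock period to picoseconds."""
    period_ps = 1e6 / freq_mhz          # 1 MHz corresponds to a 1e6 ps period
    return period_ps * jitter_percent / 100.0


# (output frequency in MHz, Pk-Pk jitter in % of period), taken from Table II
table_ii_jitter = {
    "This work": (134.7, 2.8),
    "[2]": (240.0, 1.7),
    "[7]": (30.7, 1.2),
    "[5]": (30.4, 4.8),
    "[14]": (450.0, 3.2),
}

if __name__ == "__main__":
    for design, (freq_mhz, percent) in table_ii_jitter.items():
        print(f"{design}: {jitter_ps(freq_mhz, percent):.1f} ps")
```

For this work, 2.8% of the 134.7-MHz output period corresponds to roughly 208 ps. The entries are measured at different output frequencies and multiplication factors, so the absolute values are not a like-for-like ranking.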

# VI. CONCLUSION

In this paper, a clock generator with cascaded dynamic frequency counting loops for wide multiplication range applications is presented. The DFC loop is constructed with ripple counters and a digital comparator, which replace the conventional programmable divider and phase-frequency detector. A threshold region is set to reduce the influence of jitter variation on frequency detection. Different DCO requirements and loop parameters have been investigated in each DFC loop to maintain loop stability over a wide range of multiplication factors (N). A test chip demonstrates that the proposed cascaded DFC loops operate with an input reference frequency as low as 19.26 kHz, and the corresponding $P_k$-$P_k$ jitter is less than 2.8% of the output clock period as the multiplication factor (N) changes from 4 to 13888. Hence, the proposed cell-based clock generator with the cascaded DFC loops lowers circuit cost and improves testability. The design can also be treated as a soft IP to accelerate turnaround time, making it suitable for system-on-chip applications.
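The role of the threshold region in frequency detection can be illustrated with a simplified behavioral model. The sketch below is not the circuit described in this paper: it compares the raw DCO count against the target with a symmetric dead zone rather than splitting the count into quotient and remainder vectors, and the names `dfc_decision`, `window`, and `threshold` are hypothetical.

```python
def dfc_decision(dco_count: int, n: int, window: int, threshold: int) -> str:
    """Decide how to tune the DCO after one counting window.

    dco_count: DCO cycles counted during `window` reference cycles.
    n:         target multiplication factor.
    window:    length of the counting period in reference cycles (assumed).
    threshold: dead zone in DCO cycles; smaller errors are treated as jitter.
    """
    error = dco_count - n * window      # positive means the DCO runs too fast
    if abs(error) <= threshold:
        return "hold"                   # inside the threshold region: no update
    return "down" if error > 0 else "up"


if __name__ == "__main__":
    n, window, threshold = 13888, 4, 2
    target = n * window
    for count in (target - 3, target - 1, target + 2, target + 5):
        print(count, dfc_decision(count, n, window, threshold))
```

In this model, counts within two DCO cycles of the target leave the control code unchanged, so small count variations caused by jitter do not trigger a frequency update. A longer counting window gives a finer frequency estimate at the cost of a slower decision.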

#### ACKNOWLEDGMENT

The authors would like to thank their colleagues within the SI2 group of National Chiao Tung University for many fruitful discussions. The multiproject chip (MPC) support from the National Chip Implementation Center is acknowledged as well.

#### REFERENCES

- [1] H. T. Ahn and D. J. Allstot, "A low-jitter 1.9-V CMOS PLL for ultra-SPARC microprocessor applications," *IEEE J. Solid-State Circuits*, vol. 35, no. 5, pp. 450–454, May 1999.
- [2] J. G. Maneatis, J. Kim, I. McClatchie, J. Maxey, and M. Shankaradas, "Self-biased high-bandwidth low-jitter 1-to-4096 multiplier clock generator PLL," *IEEE J. Solid-State Circuits*, vol. 38, no. 11, pp. 1795–1803, Nov. 2003.
- [3] G. Yan, C. Ren, Z. Guo, Q. Ouyang, and Z. Chang, "A self-biased PLL with current-mode filter for clock generation," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2005, pp. 420–421.
- [4] K. L. Wang, E. Fayneh, E. Knoll, R. H. Law, C. H. Lim, R. J. Parker, F. Wang, and C. Zhao, "Cascaded PLL design for a 90 nm CMOS high-performance microprocessor," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2004, pp. 346–347.
- [5] T. Watanabe and S. Yamauchi, "An all-digital PLL for frequency multiplication by 4 to 1022 with seven-cycle lock time," *IEEE J. Solid-State Circuits*, vol. 38, no. 2, pp. 198–204, Feb. 2003.
- [6] M. Zarubinsky, K. Berman, and E. Zipper, "Signal Generator, and Method," U.S. Patent 6,380,811, Apr. 30, 2002.
- [7] J. Lin, B. Haroun, T. Foo, J.-S. Wang, B. Helmick, S. Randall, T. Mayhugh, C. Barr, and J. Kirkpatrick, "A PVT tolerant 0.18 MHz to 600 MHz self-calibrated digital PLL in 90 nm CMOS process," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2004, pp. 488–489.
- [8] R. B. Staszewski et al., "Digitally controlled oscillator (DCO)-based architecture for RF frequency synthesis in a deep-submicrometer CMOS process," *IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process.*, vol. 50, no. 11, pp. 815–828, Nov. 2003.
- [9] J. Dunning, G. Garcia, J. Lundberg, and E. Nuckolls, "An all-digital phase-locked loop with 50-cycle lock time suitable for high performance microprocessors," *IEEE J. Solid-State Circuits*, vol. 30, no. 4, pp. 412–422, Apr. 1995.
- [10] I. Hwang, S. Lee, and S. Kim, "A digitally controlled phase-locked loop with a digital phase-frequency detector for fast acquisition," *IEEE J. Solid-State Circuits*, vol. 36, no. 10, pp. 1574–1581, Oct. 2001.
- [11] T. Olsson and P. Nilsson, "A digitally controlled PLL for SoC applications," *IEEE J. Solid-State Circuits*, vol. 39, no. 5, pp. 751–760, May 2004.
- [12] M. Combes, K. Dioury, and A. Greiner, "A portable clock multiplier generator using digital CMOS standard cells," *IEEE J. Solid-State Circuits*, vol. 31, no. 7, pp. 958–965, Jul. 1996.
- [13] T.-Y. Hsu, C.-C. Wang, and C.-Y. Lee, "Design and analysis of a portable high-speed clock generator," *IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process.*, vol. 48, no. 4, pp. 367–375, Apr. 2001.

- [14] C.-C. Chung and C.-Y. Lee, "An all-digital phase-locked loop for high-speed clock generation," *IEEE J. Solid-State Circuits*, vol. 38, no. 2, pp. 347–351, Feb. 2003.
- [15] P.-L. Chen, C.-C. Chung, and C.-Y. Lee, "A portable digitally-controlled oscillator using novel varactors," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 52, no. 5, pp. 233–237, May 2005.



**Pao-Lung Chen** (S'01–M'05) received the B.S. degree from Chung-Yuan Christian University, Chung-Li, Taiwan, R.O.C., in 1987, and the M.S. degree from Texas A&M University, College Station, TX, in 1992, both in electrical engineering. He received the Ph.D. degree from the Department of Electronics Engineering, National Chiao Tung University, Hsinchu, Taiwan, in 2005.

He is an Assistant Professor in the Department of Electronics Engineering, Chin-Min Institute of Technology, Toufen, Taiwan. His interests include all-digital PLL design, cell-based frequency synthesizers, and all-digital DLL design.



**Ching-Che Chung** (S'02–M'05) received the Ph.D. degree in electrical engineering from National Chiao Tung University, Hsinchu, Taiwan, R.O.C., in 2003.

Since January 2004, he has been a Postdoctoral Researcher in the Department of Electronics Engineering, National Chiao Tung University. His research interests include system-on-chip design methodologies, low-power wireless baseband processor design, and high-speed interface circuit design.



**Jyh-Neng Yang** (M'05) was born in Miao-Li, Taiwan, R.O.C., in 1959. He received the B.S. and M.S. degrees from the Department of Electronic Engineering, National Taiwan University of Science and Technology, Taipei, Taiwan, in 1983 and 1991, respectively. He has been a Lecturer in the Department of Electronics Engineering, Ming Hsin University of Science and Technology, since 1991, and he received the Ph.D. degree from the Department of Electronics Engineering and the Institute of Electronics, National Chiao-Tung University, Hsinchu, Taiwan, in 2006.

His research interests include analog integrated circuits and RF integrated circuits design.



**Chen-Yi Lee** (M'90) received the B.S. degree from National Chiao Tung University, Hsinchu, Taiwan, R.O.C., in 1982, and the M.S. and Ph.D. degrees from Katholieke University Leuven (KUL), Belgium, in 1986 and 1990, respectively, all in electrical engineering.

From 1986 to 1990, he was with IMEC/VSDM, working in the area of architecture synthesis for DSP. In February 1991, he joined the faculty of the Electronics Engineering Department, National Chiao Tung University, where he is currently a Professor and Department Chair. His research interests include VLSI algorithms and architectures for high-throughput DSP applications. He is also active in various aspects of high-speed networking, system-on-chip design technology, very low-power designs, and multimedia signal processing. He served as the Director of the Chip Implementation Center (CIC), an organization for IC design promotion in Taiwan. He is now the microelectronics program coordinator of the Engineering Division under the National Science Council of Taiwan.

Dr. Lee is a former IEEE Circuits and Systems Taipei Chapter Chair.