# A Monotonic and Low-Power Digitally Controlled Oscillator Using Standard Cells for SoC Applications

Duo Sheng<sup>1</sup>, Ching-Che Chung<sup>2</sup>, and Jhih-Ci Lan<sup>1</sup>

<sup>1</sup>Department of Electrical Engineering, Fu Jen Catholic University, 510, Zhongzheng Road, Xinzhuang Dist., New Taipei City 24205, Taiwan, ROC

<sup>2</sup>Department of Computer Science and Information Engineering, National Chung Cheng University, 168 University Road, Minhsiung Township, Chiayi County, 621, Taiwan, ROC

<sup>1</sup>E-mail: duosheng@mail.fju.edu.tw

# Abstract

This paper presents a monotonic, low-power digitally controlled oscillator (DCO) with a cell-based design for System-on-Chip (SoC) applications. The proposed DCO employs a cascade-stage structure to achieve high resolution and a wide range at the same time. In addition, the proposed two-level controlled interpolation structure provides monotonic delay with lower power consumption and lower circuit complexity than conventional approaches. Simulation results show that the power consumption of the proposed DCO is reduced to 0.337mW at 1118MHz with 0.82ps resolution. Moreover, the proposed DCO can be implemented with standard cells, making it easily portable to different processes and well suited to SoC applications.

# Keywords

Digitally controlled oscillator (DCO), standard cells, delay monotonicity, portable, low power

# 1. Introduction

The phase-locked loop (PLL) is an important clocking circuit in many electronic systems, such as digital communication systems and microprocessors. Traditional PLLs are designed with analog approaches. However, as the supply voltage decreases, gain and frequency range must be traded off in the voltage-controlled oscillator (VCO), which is the most important block in a PLL. In addition, because of serious leakage current, it is hard to design a charge-pump circuit in more advanced process technologies. Integrating analog PLLs into an SoC with a lower supply voltage and an advanced process therefore takes more design effort. Furthermore, as technology migrates, the analog blocks in the PLL need to be redesigned. In contrast, the all-digital phase-locked loop (ADPLL) [1]-[3] does not use any passive components and relies on digital design approaches, making it easy to integrate into digital, low-supply-voltage systems.

The conventional ADPLL architecture is shown in Figure 1. A phase/frequency detector (PFD) compares the frequency and phase of the reference clock (Ref. CLK) with those of the ADPLL output clock (DCO CLK) and provides the control signals (UP and DN) to an ADPLL controller. Based on the PFD comparison results, the ADPLL controller generates the DCO control code (DCO Code) for a digitally controlled oscillator (DCO), which changes the frequency of DCO CLK accordingly. Among the functional blocks of all-digital clock generators, the DCO is the kernel module because it dominates the overall performance and power consumption of the all-digital clock generator [1]-[4]. For example, the DCO accounts for over 50% of the power consumption of an all-digital clock generator [2], and its delay resolution and operating range affect the jitter performance and output frequency range of the all-digital clock generator, respectively. All-digital clock generators therefore require a high-performance, low-jitter DCO. Recently, different architectural solutions have been proposed to implement the DCO. The current-starved DCO [4] controls the supply current of the delay cell to obtain different delay values. Although it has high resolution, it needs a static current source that dissipates static power. In addition, this approach demands high circuit-level complexity, resulting in a long design cycle and low portability.

Figure 1: The block diagram of ADPLL

To shorten the design cycle when the process or specification changes, many DCOs implemented with standard cells have been proposed to enhance portability [2], [3], [5]. Driving capability modulation (DCM) changes the driving current of each delay cell by controlling the number of enabled tri-state buffers/inverters [2]. The design concept of this approach is straightforward, but it performs poorly in linearity and power consumption, and its resolution is insufficient. OR-AND-inverter (OAI) cells have been proposed to enhance resolution through different input pattern combinations; however, the linearity problem remains unsolved [3]. Although the digitally controlled varactor (DCV) performs well in resolution and linearity [5], it is hard to cover a wide operation range with only a few cells. As a result, many DCV cells are needed to maintain an acceptable operation range, which leads to large power consumption.

This work was supported in part by the National Science Council of Taiwan, R.O.C., under Grant NSC 100-2221-E-030-012-

978-1-4673-2688-9/12/\$31.00 ©2012 IEEE

Figure 2: Non-monotonic phenomenon in DCO

Figure 3: Architecture of the proposed DCO.

To improve the control code resolution and extend the operation range at the same time, the cascading structure DCO has been proposed [2], [3], [5]. However, this structure requires the controllable range of each stage to be larger than the finest delay step of the previous stage, to ensure that there is no dead zone larger than the LSB resolution of the DCO. Because of this design constraint, the cascading structure DCO not only needs over-design but also suffers from a non-monotonic problem when the DCO code switches at the boundary between tuning stages, as shown in Figure 2. Because a non-monotonic DCO induces a large delay change, it increases the jitter of the DCO. Moreover, when a non-monotonic DCO is used in a feedback control system such as a PLL, the feedback loop may get stuck and toggle forever between two control codes, so that the loop fails to lock. Furthermore, in some frequency modulation applications such as the spread spectrum clock generator (SSCG), the control code of the DCO is required to span evenly to reduce electromagnetic interference (EMI); thus, a non-monotonic DCO is not suitable for SSCG applications [6], [7].
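The boundary effect can be illustrated with a small numerical sketch. The model below is not taken from the paper: it assumes a hypothetical two-stage cascaded DCO whose delay is simply the sum of a coarse and a fine contribution, and the step sizes (100 ps coarse, 8 ps fine, 4-bit fine code) are made-up values chosen so that the fine range exceeds one coarse step, as the over-design constraint requires.

```python
# Hypothetical two-stage cascaded DCO (illustrative values, not from the paper).
COARSE_STEP_PS = 100.0   # assumed delay of one coarse step
FINE_STEP_PS = 8.0       # assumed delay of one fine step
FINE_CODES = 16          # assumed 4-bit fine-tuning code


def cascaded_delay(code: int) -> float:
    """Delay for a flat control code: upper bits select coarse, lower bits fine."""
    coarse, fine = divmod(code, FINE_CODES)
    return coarse * COARSE_STEP_PS + fine * FINE_STEP_PS


def non_monotonic_codes(num_codes: int) -> list:
    """Return the codes whose delay is smaller than that of the preceding code."""
    return [c for c in range(1, num_codes)
            if cascaded_delay(c) < cascaded_delay(c - 1)]


# The fine range (15 * 8 = 120 ps) covers the 100 ps coarse step, so there is
# no dead zone, but the delay falls back whenever the coarse code increments.
drops = non_monotonic_codes(4 * FINE_CODES)
```

In this toy model the delay decreases at every coarse-code boundary, which is the kind of behaviour Figure 2 describes.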

In this paper, a monotonic, low-power, high-resolution, and wide-range DCO with high portability is proposed for SoC applications. In contrast to [6], the proposed design does not need an extra calibration block to maintain delay monotonicity. The proposed DCO not only uses the cascading structure to preserve the control code resolution and operation range, but also employs a novel two-level controlled interpolation structure to save power and obtain a monotonic gain curve. In addition, the entire design can be described in a hardware description language (HDL) and implemented with standard cells, making it easily portable to different processes and well suited to SoC applications.

# 2. Architecture overview

Figure 3 illustrates the architecture of the proposed monotonic and low-power DCO, which consists of three stages: the coarse-tuning stage, the $1^{st}$ fine-tuning stage, and the $2^{nd}$ fine-tuning stage. The proposed DCO employs the cascading structure to achieve fine frequency resolution and a wide operation range. The coarse-tuning stage extends the operation range, and the fine-tuning stages improve the delay resolution. Based on the frequency range and resolution required for our application, the delays of the coarse-tuning stage, the $1^{st}$ fine-tuning stage, and the $2^{nd}$ fine-tuning stage are controlled by the coarse-tuning control code (C[15:0], EN[15:0]), the $1^{st}$ fine-tuning control code (F1A[6:0] and F1B[5:0]), and the $2^{nd}$ fine-tuning control code (F2[3:0]), respectively.

Figure 4: The ladder-shaped coarse-tuning stage

Figure 5: Proposed coarse-tuning stage.

To maintain monotonicity in the cascading structure, the controllable range of each stage should be correlated with the finest delay step of the previous stage. First, the coarse-tuning stage outputs two signals (CA\_OUT and CB\_OUT) whose time difference equals the delay of one coarse delay cell (CDC). Second, the 1<sup>st</sup> fine-tuning stage interpolates these two signals to generate two signals (F1A\_OUT and F1B\_OUT) whose time difference is 1/6 of the delay of one CDC. Finally, because the resolution of the 1<sup>st</sup> fine-tuning stage is not sufficient for typical DCO applications, a 2<sup>nd</sup> fine-tuning stage is added to further improve the overall delay resolution of the DCO. The 2<sup>nd</sup> fine-tuning stage receives these two signals and then generates F2\_OUT by delay interpolation, with a step of 1/16 of the delay of one cell in the 1<sup>st</sup> fine-tuning stage.
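As a rough illustration of how the three stages nest, the sketch below models each stage as an ideal linear interpolator. It assumes ideal behaviour (exact 1/6 and 1/16 ratios, no intrinsic delay); the 95.2 ps coarse step is the simulated value from Table 2, while the function name and the integer indices are hypothetical and do not represent the actual control-code encoding.

```python
COARSE_STEP_PS = 95.2      # simulated coarse step from Table 2
FINE1_DIVISIONS = 6        # 1st fine stage: 1/6 of one CDC delay
FINE2_DIVISIONS = 16       # 2nd fine stage: 1/16 of one 1st-stage step


def ideal_delay_ps(coarse: int, fine1: int, fine2: int) -> float:
    """Ideal delay of the three-stage interpolating cascade (assumed model)."""
    assert 0 <= fine1 < FINE1_DIVISIONS and 0 <= fine2 < FINE2_DIVISIONS
    # Coarse stage: CA_OUT and CB_OUT are one CDC delay apart.
    t_ca = coarse * COARSE_STEP_PS
    t_cb = t_ca + COARSE_STEP_PS
    # 1st fine stage: F1A_OUT and F1B_OUT are one step S = (TCB - TCA) / 6 apart.
    s = (t_cb - t_ca) / FINE1_DIVISIONS
    t_f1a = t_ca + fine1 * s
    t_f1b = t_f1a + s
    # 2nd fine stage: F2_OUT interpolates between F1A_OUT and F1B_OUT.
    return t_f1a + fine2 * (t_f1b - t_f1a) / FINE2_DIVISIONS


# Sweep every index combination in order and check that the delay only grows.
delays = [ideal_delay_ps(c, f1, f2)
          for c in range(16) for f1 in range(6) for f2 in range(16)]
is_monotonic = all(b > a for a, b in zip(delays, delays[1:]))
```

Because each stage only subdivides the interval handed to it by the previous stage, the ideal model cannot overshoot at a stage boundary. Its step is 95.2/96, about 0.99 ps; real cells do not divide the delay exactly, so simulated steps will differ.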

# 3. Circuit design

### 3.1. Two-output coarse-tuning stage

In the cascading structure DCO, the coarse-tuning stage determines the overall frequency operating range of the DCO. Generally, the coarse-tuning stage consists of CDCs, and its total delay is determined by the number of CDCs and the delay of each cell. There are two types of coarse-tuning stage structure. The ladder-shaped coarse-tuning stage is composed of $2^{M}-1$ CDCs, each consisting of one delay buffer and one multiplexer, and the coarse-tuning control code (C[$2^{M}-1$:0]) selects among the $2^{M}$ different propagation delays of the CDCs, as shown in Figure 4 [8]. The minimum delay of the ladder-shaped coarse-tuning stage is independent of the delay range. However, the delay step of one CDC is large, which degrades the overall delay resolution of the DCO. In contrast to the ladder-shaped structure, the path-selection coarse-tuning stage has a small delay step because each CDC is only one delay buffer [3], [5], [6]. The conventional ladder-shaped coarse-tuning stage can generate only one output, which is not suitable for an interpolation-type DCO. Thus, a two-output coarse-tuning stage is proposed in this design, as shown in Figure 5. The proposed two-output coarse-tuning stage is composed of 16 CDCs, each of which is a two-input AND gate. Different delay values at the outputs (CA\_OUT and CB\_OUT) are obtained by selecting different delay paths formed by these 16 delay cells. When the delay line is required to provide a higher operating frequency, a shorter delay path is selected and the remaining CDCs are not used. However, bypassing these CDCs does not by itself disable them. To reduce power consumption as the operating frequency changes, the redundant two-input AND gates are disabled by setting their control signals (EN[15:0]) to the low level.

Figure 6: Multi-stage interpolation structure DCO [9].

Figure 7: Proposed 1<sup>st</sup> fine-tuning stage.

Figure 8: 2<sup>nd</sup> fine-tuning stage.

### 3.2. Two-level controlled interpolation fine-tuning stage

Because the resolution of the coarse-tuning stage is not sufficient for typical DCO applications, two fine-tuning stages are added to further improve the overall delay resolution of the DCO. The design challenge of the fine-tuning stage is to improve the delay resolution while keeping a monotonic delay characteristic. The multi-stage interpolation structure is the conventional solution for the fine-tuning stage, as shown in Figure 6 [9]. It employs interpolation cells, each consisting of two buffers, to improve the delay resolution. To achieve a $2^N$-fold improvement in resolution, the multi-stage interpolation fine-tuning stage needs $N$ delay stages and $2^{N+1} + N - 2$ interpolation cells. Thus, as this approach targets a finer delay resolution, it not only consumes more power but also has a long intrinsic delay.
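To give a feel for how quickly this cost grows, the short sketch below simply evaluates the cell-count expression quoted above for a few values of $N$. The function name and dictionary keys are illustrative choices, not terms from the paper.

```python
def multistage_cost(n: int) -> dict:
    """Cost of a conventional multi-stage interpolator for a 2**n resolution gain."""
    return {
        "resolution_gain": 2 ** n,                    # 2^N-fold improvement
        "delay_stages": n,                            # N cascaded stages
        "interpolation_cells": 2 ** (n + 1) + n - 2,  # 2^(N+1) + N - 2 cells
    }


# Evaluate the formula for N = 1..4.
costs = [multistage_cost(n) for n in range(1, 5)]
cell_counts = [c["interpolation_cells"] for c in costs]
```

For $N = 3$ (an 8-fold improvement), the expression gives 17 interpolation cells spread over 3 stages, and the count roughly doubles for each further bit of resolution.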

TABLE 1 Timing Control of 1<sup>st</sup> Fine-Tuning Stage

| Level One Control Code<br>(F1A[6:0]) | Level Two Control Code<br>(F1B[5:0]) | F1A_OUT Timing<br>Combination<br>(CA_OUT: CB_OUT) | F1B_OUT Timing<br>Combination<br>(CA_OUT: CB_OUT) | F1A_OUT Timing<br>Value | F1B_OUT Timing<br>Value |
|--------------------------------------|--------------------------------------|---------------------------------------------------|---------------------------------------------------|-------------------------|-------------------------|
| 0000011                              | 000001                               | 6:0                                               | 5:1                                               | TCA                     | TCA + S                 |
| 0000110                              | 000010                               | 5:1                                               | 4:2                                               | TCA + S                 | TCA + 2S                |
| 0001100                              | 000100                               | 4:2                                               | 3:3                                               | TCA + 2S                | TCA + 3S                |
| 0011000                              | 001000                               | 3:3                                               | 2:4                                               | TCA + 3S                | TCA + 4S                |
| 0110000                              | 010000                               | 2:4                                               | 1:5                                               | TCA + 4S                | TCA + 5S                |
| 1100000                              | 100000                               | 1:5                                               | 0:6                                               | TCA + 5S                | TCB                     |

TCA: Timing of CA\_OUT, TCB: Timing of CB\_OUT, S: Delay Step of 1<sup>st</sup> fine-tuning stage

TABLE 2 Simulation Results of Step/Range of Tuning Stage

|            | Coarse-Tuning | 1<sup>st</sup> Fine-Tuning | 2<sup>nd</sup> Fine-Tuning |
|------------|---------------|----------------------------|----------------------------|
| Range (ps) | 1465          | 92.7                       | 12.4                       |
| Step (ps)  | 95.2          | 14.2                       | 0.82                       |

Figure 7 illustrates the architecture of the proposed 1<sup>st</sup> fine-tuning stage, which consists of seven interpolation delay cells (IDCs) and two driving inverters. The delay of the 1<sup>st</sup> fine-tuning stage is controlled by the level one control code (F1A[6:0]) and the level two control code (F1B[5:0]). Each IDC combines the inputs (CA\_OUT and CB\_OUT) in a different ratio because it drives them with a different number of parallel tri-state inverters. Table 1 lists the combinations of the two-level control codes. To save power, only two IDCs are turned on at the same time, as selected by the level one control code. The level two control code determines which IDC outputs are passed to the outputs of the 1<sup>st</sup> fine-tuning stage (F1A\_OUT and F1B\_OUT). The control codes set the timing of F1A\_OUT and F1B\_OUT so that F1A\_OUT always has one delay step less than F1B\_OUT. The proposed 1<sup>st</sup> fine-tuning stage thus uses the novel two-level controlled structure to increase delay resolution while reducing power consumption and circuit complexity.
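The regular pattern in Table 1 can be reproduced with a few lines of code. The sketch below is only an illustration: the step index `k`, the function names, and the ideal linear weighting used in `timing` are assumptions made here, whereas the bit patterns and CA\_OUT:CB\_OUT ratios are those listed in Table 1.

```python
def two_level_codes(k: int):
    """Control codes and timing ratios for fine step k (0..5), following Table 1."""
    if not 0 <= k <= 5:
        raise ValueError("k must be in 0..5")
    f1a = format(0b11 << k, "07b")   # level one: two adjacent IDCs turned on
    f1b = format(1 << k, "06b")      # level two: one-hot output selection
    f1a_mix = (6 - k, k)             # F1A_OUT ratio (CA_OUT : CB_OUT)
    f1b_mix = (5 - k, k + 1)         # F1B_OUT ratio (CA_OUT : CB_OUT)
    return f1a, f1b, f1a_mix, f1b_mix


def timing(mix, t_ca: float, t_cb: float) -> float:
    """Ideal interpolated timing for a (CA_OUT : CB_OUT) ratio out of 6 (assumed)."""
    w_ca, w_cb = mix
    return (w_ca * t_ca + w_cb * t_cb) / 6


# Regenerate the six rows of Table 1.
rows = [two_level_codes(k) for k in range(6)]
```

With `t_ca = 0` and `t_cb = 6` the step S equals 1, so `timing` returns `k` for F1A\_OUT and `k + 1` for F1B\_OUT, matching the "TCA + kS" entries in the last two columns of the table.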

### 3.3. Second fine-tuning stage

Because the resolution of the 1<sup>st</sup> fine-tuning stage is not sufficient for typical DCO applications, a 2<sup>nd</sup> fine-tuning stage is added to further improve the overall delay resolution of the DCO. The 2<sup>nd</sup> fine-tuning stage employs a simple interpolation structure that uses two driving groups, controlled by the 2<sup>nd</sup> fine-tuning stage control code (F2[3:0]), to perform delay interpolation, as shown in Figure 8 [10]. It is composed of tri-state inverters with binary-weighted driving capability. The 2<sup>nd</sup> fine-tuning stage receives the two outputs of the 1<sup>st</sup> fine-tuning stage and then further improves the delay resolution by delay interpolation.

# 4. Implementation and experimental results

The proposed DCO is implemented in a 90nm 1P9M CMOS process. The HSPICE simulation results for the controllable delay range and the finest delay step of the different tuning stages are shown in Table 2. Because the finest step of the $2^{nd}$ fine-tuning stage determines the DCO resolution, the proposed DCO achieves a high resolution of 0.82ps. The code-to-delay simulation results of the 1<sup>st</sup> and 2<sup>nd</sup> fine-tuning stages, shown in Figure 9 and Figure 10, confirm that the proposed DCO achieves monotonic delay in each fine-tuning stage. Figure 11 shows that the proposed DCO keeps a monotonic gain curve when the DCO code crosses between different tuning stages. Because the proposed DCO employs interpolation delay stages, the non-monotonic problem does not occur in the proposed cascading structure. In addition to resolution, operation range, and monotonicity, the single delay extraction scheme reduces the power consumption to 0.337mW, including leakage power, at 1.118GHz with a 1V supply voltage. Figure 12 shows the DCO output waveform at 1.118GHz.
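One simple way to read the numbers in Table 2 is sketched below. It assumes, as a simplification not stated in the text, that crossing a stage boundary advances the delay by one step of the previous stage, so a fine stage stays monotonic across the boundary if its whole range is smaller than that step. The dictionary layout and function name are illustrative.

```python
# Simulated range and step of each tuning stage, in ps (values from Table 2).
STAGES = {
    "coarse": {"range": 1465.0, "step": 95.2},
    "fine1":  {"range": 92.7,   "step": 14.2},
    "fine2":  {"range": 12.4,   "step": 0.82},
}


def fits_within_previous_step(prev: str, cur: str) -> bool:
    """True if the full range of `cur` stays below one step of `prev`."""
    return STAGES[cur]["range"] < STAGES[prev]["step"]


checks = {
    "fine1_in_coarse_step": fits_within_previous_step("coarse", "fine1"),
    "fine2_in_fine1_step": fits_within_previous_step("fine1", "fine2"),
}

# Ratio of each stage's step to the next finer step, for comparison with the
# nominal interpolation factors of 6 and 16.
step_ratios = (STAGES["coarse"]["step"] / STAGES["fine1"]["step"],
               STAGES["fine1"]["step"] / STAGES["fine2"]["step"])
```

Under this reading both fine stages fit inside one step of the stage before them (92.7 ps < 95.2 ps and 12.4 ps < 14.2 ps), which is consistent with the monotonic behaviour reported for Figure 11.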

Figure 9: Simulation results of the proposed 1<sup>st</sup> fine-tuning stage.

Figure 10: Simulation results of the proposed $2^{nd}$ fine-tuning stage.

Figure 11: Simulation results of DCO code switches cross over different tuning stages.

Figure 12: DCO output waveform at 1.118GHz.

Table 3 lists comparison results with state-of-the-art DCOs. The proposed DCO has the finest resolution and a wide operation frequency range. The power index comparison shows that the proposed DCO provides the lowest power-to-frequency ratio (0.3 mW/GHz) among the compared designs, implying that it is more effective in saving power at a given operating frequency. Furthermore, the proposed low-power solution does not induce any performance loss. Additionally, since the proposed DCO can be implemented with standard cells, it has good portability and is very suitable for SoC integration as compared with [11], [12]. Apart from the proposed design, only [13] achieves a monotonic delay characteristic and high portability at the same time. However, [13] uses an extra calibration circuit to maintain monotonicity, resulting in more power consumption and hardware cost. As a result, the proposed DCO has the benefits of better resolution, power consumption, monotonicity, and portability.
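The power-to-frequency ratios reported in Table 3 follow directly from the listed power and frequency pairs. The sketch below recomputes them; the dictionary and function names are illustrative, and the only inputs are the values given in the table.

```python
# (power in mW, operating frequency in MHz) as reported in Table 3.
REPORTED = {
    "Proposed DCO":   (0.337, 1118.0),
    "TCASII'11 [13]": (0.205, 481.6),
    "TCASII'07 [5]":  (0.14, 200.0),
    "TCASII'08 [11]": (4.5, 950.0),
    "TCASI'09 [12]":  (7.85, 1040.0),
}


def power_to_frequency(power_mw: float, freq_mhz: float) -> float:
    """Power-to-frequency ratio in mW/GHz."""
    return power_mw / (freq_mhz / 1000.0)


ratios = {name: power_to_frequency(p, f) for name, (p, f) in REPORTED.items()}
best = min(ratios, key=ratios.get)   # design with the lowest mW/GHz
```

Rounded to the precision used in the table, the computed values agree with the listed ratios, and the proposed DCO has the lowest one.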

TABLE 3 Performance Comparisons

| Performance Indices                  | Proposed DCO   | TCASII'11 [13]  | TCASII'07 [5] | TCASII'08 [11] | TCASI'09 [12]   |
|--------------------------------------|----------------|-----------------|---------------|----------------|-----------------|
| Process                              | 90nm CMOS      | 65nm CMOS       | 90nm CMOS     | 0.18µm CMOS    | 0.35µm CMOS     |
| Operation Range (MHz)                | 424 ~ 1118     | 47.8 ~ 538.7    | 191 ~ 952     | 300 ~ 1300     | 33 ~ 1040       |
| LSB Resolution (ps)                  | 0.82           | 17.4            | 1.47          | 5.9            | NA              |
| Power Consumption (mW)               | 0.337 @1118MHz | 0.205 @481.6MHz | 0.14 @200MHz  | 4.5 @950MHz    | 7.85 @1040MHz** |
| Power-to-Frequency Ratio<br>(mW/GHz) | 0.3            | 0.43            | 0.7           | 4.7            | 7.5             |
| Monotonicity                         | Yes            | Yes*            | No            | Yes            | Yes             |
| Portability                          | Yes            | Yes             | Yes           | No             | No              |

\* With extra calibration; \*\* Power consumption calculated from 50% of PLL [1].

# 5. Conclusions

In this paper, we have proposed a monotonic and low-power DCO with a cell-based design for SoC applications. The proposed two-level controlled interpolation structure not only maintains the monotonic gain curve but also reduces the overall power consumption and circuit complexity as compared with conventional approaches. The proposed DCO employs a cascade-stage structure to achieve high resolution and a wide range at the same time. Simulation results show that the power consumption of the proposed DCO is reduced to 0.337mW at 1118MHz with 0.82ps resolution. Moreover, because the proposed DCO has good portability as a soft intellectual property (IP), it can reduce both design time and complexity. As a result, it is very suitable for SoC applications as well as system-level integration.

# Acknowledgement

The authors would like to thank the National Chip Implementation Center (CIC) for technical support.

# 6. References

- [1] J. Dunning, G. Garcia, J. Lundberg, and E. Nuckolls, "An all-digital phase-locked loop with 50-cycle lock time suitable for high-performance microprocessors," *IEEE J. Solid-State Circuits*, vol. 30, pp. 412–422, Apr. 1995.
- [2] T. Olsson and P. Nilsson, "A digitally controlled PLL for SoC applications," *IEEE J. Solid-State Circuits*, vol. 39, no. 5, pp. 751–760, May 2004.
- [3] C.-C. Chung and C.-Y. Lee, "An all digital phase-locked loop for high-speed clock generation," *IEEE J. Solid-State Circuits*, vol. 38, no. 2, pp. 347–351, Feb. 2003.
- [4] M. Maymandi-Nejad and M. Sachdev, "A monotonic digitally controlled delay element," *IEEE J. Solid-State Circuits*, vol. 40, no. 11, pp. 2212–2219, Nov. 2005.
- [5] D. Sheng, C.-C. Chung, and C.-Y. Lee, "An ultra-low-power and portable digitally controlled oscillator for SoC applications," *IEEE Trans. Circuits and Syst. II, Exp. Briefs*, vol. 54, no. 11, pp. 954–958, Nov. 2007.
- [6] D. Sheng, C.-C. Chung, and C.-Y. Lee, "A low-power and portable spread spectrum clock generator for SoC applications," *IEEE Trans. Very Large Scale Integration (VLSI) Systems*, vol. 19, no. 6, pp. 1113–1117, Jun. 2011.
- [7] D. Sheng and J.-C. Lan, "Monotonic and low-power digitally controlled oscillator with portability for SoC applications," *54th IEEE Midwest Symposium on Circuits and Systems*, Aug. 2011.
- [8] C.-T. Wu, W. Wang, I.-C. Wey, and A.-Y. Wu, "A scalable DCO design for portable ADPLL designs," *IEEE International Symposium on Circuits and Systems*, pp. 5449–5452, May 2005.
- [9] B. W. Garlepp, K. S. Donnelly, J. Kim, P. S. Chau, J. L. Zerbe, C. Huang, C. V. Tran, C. L. Portmann, D. Stark, Y.-F. Chan, T. H. Lee, and M. A. Horowitz, "A portable digital DLL for high-speed CMOS interface circuits," *IEEE J. Solid-State Circuits*, vol. 34, no. 5, pp. 632–644, May 1999.
- [10] M. Combes, K. Dioury, and A. Greiner, "A portable clock multiplier generator using digital CMOS standard cells," *IEEE J. Solid-State Circuits*, vol. 31, no. 7, pp. 958–965, Jul. 1996.
- [11] B.-M. Moon, Y.-J. Park, and D.-K. Jeong, "Monotonic wide-range digitally controlled oscillator compensated for supply voltage variation," *IEEE Trans. Circuits and Syst. II, Exp. Briefs*, vol. 55, no. 10, pp. 1036–1040, Oct. 2008.
- [12] K.-H. Choi, J.-B. Shin, J.-Y. Sim, and H.-J. Park, "An interpolating digitally controlled oscillator for a wide-range all-digital PLL," *IEEE Trans. Circuits and Syst. I, Reg. Papers*, vol. 56, no. 9, pp. 2055–2063, Sep. 2009.
- [13] C.-C. Chung, C.-Y. Ko, and S.-E. Shen, "A built-in self calibration circuit for monotonic digitally controlled oscillator design in 65nm CMOS technology," *IEEE Trans. Circuits and Syst. II, Exp. Briefs*, vol. 58, no. 3, pp. 149–153, Mar. 2011.