# A Low-Power Delay-Recycled All-Digital Duty-Cycle Corrector with Unbalanced Process Variations Tolerance

Ching-Che Chung and Chang-Jun Li

Department of Computer Science and Information Engineering, National Chung Cheng University, No. 168 University Rd., Min-Hsiung, Chia-Yi, Taiwan

Email: wildwolf@cs.ccu.edu.tw

Abstract-In this paper, a low-power delay-recycled all-digital duty-cycle corrector (ADDCC) is presented. The proposed ADDCC corrects the duty cycle of a distorted clock to 50% under process, voltage, and temperature (PVT) variations. In addition, the delay-recycled architecture reduces the required length of the delay line to 1/2 of the input clock period, which saves both chip area and power consumption. The proposed ADDCC also works properly at unbalanced process corners (i.e., SF and FS). The design is implemented in a standard-performance 90 nm CMOS process, and the active area is $70\mu m \times 70\mu m$. Simulation results show that the maximum duty-cycle error of the output clock is less than 1.9% for input duty cycles ranging from 20% to 80% and input frequencies ranging from 450 MHz to 1 GHz. The power consumption of the proposed ADDCC is 1.7 mW at 450 MHz and 3.45 mW at 1 GHz with a 1.0 V power supply.

## I. INTRODUCTION

High-speed data communication applications, such as double data rate (DDR) memories and double-sampling analog-to-digital converters (ADCs), sample the input data on both the rising and falling edges of the reference clock. Accordingly, these applications demand a reference clock with a precise 50% duty cycle. To meet this requirement, a duty-cycle corrector (DCC) is employed in the system-on-a-chip (SoC) to correct clock signals distorted by process, voltage, and temperature (PVT) variations. Further, the corrected clock signal should be phase-aligned with the input clock so that the DCC does not introduce additional clock skew.

In recent years, many DCCs have been proposed; they can be classified into two categories: analog DCCs [1] and digital DCCs [2]-[9]. Analog DCCs usually use a pulse-width control loop (PWCL) [1] to correct the input clock. The PWCL requires a relatively long lock-in time and several large on-chip capacitors, so it often occupies a relatively large chip area and has relatively high power consumption. Further, the PWCL suffers from a serious charge-pump mismatch problem at unbalanced process corners (i.e., SF or FS), and leakage current in advanced CMOS processes causes ripples on the control voltage that affect the stability of the output clock. In addition, the output clock of the PWCL is not phase-aligned with the input clock. As a result, all-digital DCCs have been proposed to overcome these drawbacks.

The synchronous mirror delay (SMD)-based all-digital duty-cycle corrector (ADDCC) [2] uses a half-cycle delay line (HCDL) to generate a 50% duty-cycle clock. The SMD-based ADDCC generates a short pulse to measure the period of the input clock. However, as the short pulse propagates through the delay line, its width is easily distorted at unbalanced process corners. Once the pulse becomes wide enough to turn on two or more AND gates, the duty-cycle error becomes larger than expected.

Time-to-digital converters (TDCs) are widely used in ADDCCs [4]-[8] to reduce the lock-in time. However, the separate TDC in [4],[5] costs extra chip area and also suffers from a delay mismatch problem. To reduce chip area, the ADDCCs in [6]-[8] merge the TDC with the delay line. However, because the delay unit limits the duty-cycle correction resolution, the output duty-cycle error depends on the TDC resolution. Since it is difficult to design a wide-range TDC with high resolution, either the operating frequency range of these ADDCCs is very limited [4],[6] or their power consumption is very large [4],[5].

The delay-recycled ADDCCs [7]-[9] reduce the number of delay cells and TDC flip-flops. Thus, the operating frequency range can be extended, and the chip area and power consumption can be further reduced. However, the duty-cycle error of the delay-recycled ADDCCs [7],[8] is still limited by the delay unit of the delay line. Therefore, without fine-tuning delay cells, the ADDCCs [7],[8] cannot achieve a small duty-cycle error at high-frequency operation. In addition, the binary-weighted delay line (BWDL) in [8] suffers from a serious linearity problem under on-chip variations. Although fine-tuning delay cells are added in the ADDCC [9] to achieve a relatively small duty-cycle error, the controllable delay range of its fine-tuning delay cell does not equal the coarse-tuning resolution under PVT variations. This causes a non-monotonic response when the controller switches the coarse-tuning control code, and thus the ADDCC [9] exhibits large cycle-to-cycle jitter during coarse-tuning code switching.
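A first-order estimate shows why the delay unit bounds the duty-cycle error. The sketch below assumes that a half-cycle delay error Δ moves the output falling edge by Δ, so the duty-cycle error is Δ/T_CLK_IN; the 50 ps coarse step is a hypothetical value chosen for illustration, not a parameter of any design discussed here.

```python
def duty_cycle_error_pct(delay_error_ps: float, freq_mhz: float) -> float:
    """First-order duty-cycle error (%) caused by a half-cycle delay error.

    Assumption: the output falling edge moves one-to-one with the delay error,
    so the high time becomes T/2 +/- delay_error_ps.
    """
    period_ps = 1e6 / freq_mhz  # a 1 MHz clock has a 1e6 ps period
    return 100.0 * delay_error_ps / period_ps


if __name__ == "__main__":
    COARSE_STEP_PS = 50.0  # hypothetical coarse delay unit
    FINE_STEPS = 32        # a 5-bit fine code splits one coarse step into 32
    for f_mhz in (450.0, 1000.0):
        coarse_only = duty_cycle_error_pct(COARSE_STEP_PS, f_mhz)
        with_fine = duty_cycle_error_pct(COARSE_STEP_PS / FINE_STEPS, f_mhz)
        print(f"{f_mhz:.0f} MHz: coarse only {coarse_only:.2f}%, "
              f"with fine tuning {with_fine:.3f}%")
```

Under these assumptions, a 50 ps step alone gives up to 5% error at 1 GHz but only 2.25% at 450 MHz, which is why coarse-only designs degrade at high frequency; splitting the step into 32 fine codes brings the 1 GHz figure down to about 0.16%.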

In this paper, a low-power delay-recycled ADDCC is presented. The proposed ADDCC uses a delay line with a coarse-tuning delay stage and a fine-tuning delay stage to improve the duty-cycle correction resolution. The delay-recycled architecture reduces the required length of the delay line to 1/2 of the input clock period, which reduces the chip area and power consumption of the ADDCC. The proposed ADDCC uses a TDC to achieve a fast lock-in time, and its delay line with balanced rise and fall times gives it a high tolerance to unbalanced process variations. In addition, the proposed monotonic delay line architecture reduces the jitter of the output clock.

The rest of the paper is organized as follows: Section II describes the system architecture, design constraints, and the locking procedure of the proposed ADDCC. The experimental results are discussed in Section III. Finally, Section IV concludes with a summary.

## II. SYSTEM ARCHITECTURE

#### A. Delay-Recycled ADDCC

This work was supported in part by the National Science Council of Taiwan, under Grant NSC101-2221-E-194-063.

Fig. 1 shows the block diagram of the proposed ADDCC. The ADDCC is composed of a pulse generator (PG), a half-cycle delay line (HCDL), a phase and frequency detector (PFD) [12], an ADDCC controller, a TDC encoder, and a D-type flip-flop (DFF). The HCDL is composed of a 4-bit TDC-embedded coarse-tuning delay line (CDL) and a 5-bit fine-tuning delay line (FDL) [11], as shown in Fig. 2. Dummy cells are added to balance the capacitive loading of the NAND gates. As a result, the CDL introduces less pulse-width distortion on the input signal under PVT variations.



FIGURE 1. THE PROPOSED ADDCC.

The FDL [11] is composed of two parallel-connected tri-state buffer arrays operating as an interpolator. The total controllable delay range of the FDL is always equal to one coarse-tuning delay step under all PVT variations. Hence, the proposed delay line always provides a monotonic response between the delay line control code (ctrl\_code[8:0]) and the output delay time. Therefore, the proposed monotonic delay line architecture reduces the jitter of the output clock when the coarse-tuning control code switches.
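The monotonicity argument can be checked with an idealized linear model of the HCDL. This sketch is an illustration under stated assumptions, not the circuit in [11]: it ignores the intrinsic delay, treats ctrl\_code[8:5] as the coarse code and ctrl\_code[4:0] as the fine code, and makes the fine-tuning range a free parameter `fine_range_ps` (a hypothetical name). Setting it equal to the coarse step models the proposed FDL; a mismatched range models the problem described for [9]. The delay values are hypothetical.

```python
N_FINE = 32  # 5-bit fine code ctrl_code[4:0]


def hcdl_delay_ps(ctrl_code: int, t_coarse_ps: float, fine_range_ps: float) -> float:
    """Idealized HCDL delay for a 9-bit ctrl_code (intrinsic delay ignored)."""
    coarse = (ctrl_code >> 5) & 0xF  # ctrl_code[8:5]
    fine = ctrl_code & 0x1F          # ctrl_code[4:0]
    # The interpolator spreads fine_range_ps evenly over the 32 fine codes.
    return coarse * t_coarse_ps + fine * fine_range_ps / N_FINE


def is_monotonic(t_coarse_ps: float, fine_range_ps: float) -> bool:
    """True if the delay strictly increases over all 512 control codes."""
    delays = [hcdl_delay_ps(c, t_coarse_ps, fine_range_ps) for c in range(512)]
    return all(later > earlier for earlier, later in zip(delays, delays[1:]))


if __name__ == "__main__":
    print(is_monotonic(20.0, 20.0))  # fine range equals one coarse step -> True
    print(is_monotonic(20.0, 25.0))  # fine range exceeds one coarse step -> False
```

In the mismatched case the delay drops at every coarse boundary (code 31 gives about 24.2 ps, code 32 gives 20 ps), which is the non-monotonic step that produces jitter. A fine range smaller than the coarse step stays monotonic in this model but leaves a gap at each boundary.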



FIGURE 2. THE TDC-EMBEDDED CDL.

#### B. Design Constraints

Let the period of the input clock (CLK\_IN) be $T_{CLK\_IN}$, the intrinsic delay of the PG be $T_{PG}$, the delay time of the HCDL be $T_{HCDL}$, and the clock-to-q delay of the DFF be $T_{DFF}$. When the ADDCC is locked, Eq. 1 must be satisfied. Therefore, the maximum delay of the HCDL restricts the minimum input frequency of the ADDCC, and the minimum delay of the HCDL limits the maximum input frequency. In the proposed ADDCC, the required delay time of the HCDL is reduced to 1/2 of the input clock period. Thus, the chip area and power consumption of the ADDCC are reduced compared with prior works [4]-[6].

$$2\,(T_{PG} + T_{HCDL} + T_{DFF}) = T_{CLK\_IN} \tag{1}$$
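Eq. 1 can be turned into a quick sizing check. The helpers below are a sketch; the function names and the 100 ps PG and DFF delays are hypothetical. They return the HCDL delay needed for a given input period and the input frequency range that an HCDL with a given minimum and maximum delay can cover.

```python
def required_hcdl_delay_ps(period_ps: float, t_pg_ps: float, t_dff_ps: float) -> float:
    """HCDL delay satisfying Eq. 1: 2 * (T_PG + T_HCDL + T_DFF) = T_CLK_IN."""
    return period_ps / 2 - t_pg_ps - t_dff_ps


def lockable_freq_range_mhz(t_pg_ps: float, t_dff_ps: float,
                            t_hcdl_min_ps: float,
                            t_hcdl_max_ps: float) -> tuple[float, float]:
    """Return (f_min, f_max) in MHz for the given HCDL delay range.

    The longest HCDL delay sets the longest lockable period (f_min);
    the shortest HCDL delay sets the shortest lockable period (f_max).
    """
    longest_period_ps = 2 * (t_pg_ps + t_hcdl_max_ps + t_dff_ps)
    shortest_period_ps = 2 * (t_pg_ps + t_hcdl_min_ps + t_dff_ps)
    return 1e6 / longest_period_ps, 1e6 / shortest_period_ps


if __name__ == "__main__":
    print(required_hcdl_delay_ps(1000.0, 100.0, 100.0))         # -> 300.0
    print(lockable_freq_range_mhz(100.0, 100.0, 300.0, 800.0))  # -> (500.0, 1000.0)
```

Because $T_{PG}$ and $T_{DFF}$ are fixed overheads, the HCDL itself only has to cover half of the input period minus those delays.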

#### C. Locking Procedure

Fig. 3 shows the overall timing diagram of the proposed ADDCC. After the ADDCC is reset, the PG generates short pulses from the input clock (CLK\_IN), and the coarse-tuning control code (ctrl\_code[8:5]) of the HCDL is set to its maximum value (i.e., 4'b1111) in this cycle. The short pulses then propagate through the HCDL. At the next rising edge of the input clock (CLK\_IN), the TDC captures the propagated pulse signals and stores them as tdc\_data[15:0]. The signal "is\_half\_cycle" indicates whether the period of the input clock (CLK\_IN) is larger than the maximum delay time of the HCDL. Next, the TDC encoder searches for the bit location of the first "1" in tdc\_data[15:0], from the most-significant bit to the least-significant bit, and outputs the initial delay control code (tdc\_code[3:0]) to the ADDCC controller to achieve fast lock-in. After the initial control code is set, the ADDCC increases or decreases the delay line control code (ctrl\_code[8:0]) according to the PFD output until the output clock (CLK\_OUT) is phase-aligned with the input clock (CLK\_IN). A binary search scheme is adopted in the ADDCC controller to speed up the fine-tuning process: whenever the PFD output changes from UP to DOWN or vice versa, the search step (step[4:0]) is divided by 2, until the step is reduced to 1. Once the step equals one fine-tuning control code, the ADDCC is locked.
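The step-halving search can be sketched as a behavioral model. This is not the authors' controller: it assumes an ideal PFD that reports UP whenever the control code is below a hypothetical real-valued `target_code` (the code that would give exact alignment), maps UP to a larger code, starts from the TDC coarse code with the fine code at zero, and uses an initial step of 16 fine codes, which is an assumed value within the 5-bit step[4:0] range.

```python
def binary_search_lock(tdc_code: int, target_code: float,
                       init_step: int = 16, max_cycles: int = 64) -> tuple[int, int]:
    """Return (final ctrl_code, cycles used) for an idealized PFD-driven search."""
    code = tdc_code << 5  # coarse code from the TDC, fine code = 0
    step = init_step      # assumed initial search step
    prev_dir = 0
    for cycle in range(1, max_cycles + 1):
        direction = 1 if code < target_code else -1  # assumed: UP -> +1, DOWN -> -1
        if prev_dir and direction != prev_dir:
            step = max(step // 2, 1)  # PFD output flipped: halve the step
        prev_dir = direction
        code = min(max(code + direction * step, 0), 511)  # keep within 9 bits
        if step == 1:
            return code, cycle  # step reached one fine code: treated as locked
    return code, max_cycles


if __name__ == "__main__":
    print(binary_search_lock(10, 330.4))  # -> (331, 5)
```

For example, starting from `tdc_code = 10` (code 320) with a target of 330.4, the model visits codes 336, 328, 332, 330, and 331, locking after five cycles within one fine code of the target.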



FIGURE 3. OVERALL TIMING DIAGRAM OF THE PROPOSED ADDCC.



FIGURE 4. TIMING DIAGRAM OF TDC AT LOW FREQUENCY OPERATION.

Fig. 4 shows the detailed timing diagram of the TDC at low-frequency operation. In Fig. 4, the period of the input clock (CLK\_IN) is larger than the maximum delay time of the HCDL. Thus, the short pulses propagate through the full delay line and trigger the DFF, which generates the feedback pulse (fb\_pulse) before the next rising edge of the input clock (CLK\_IN). In this case, the signal "is\_half\_cycle" is pulled high. In Fig. 4, the first "1" bit location of tdc\_data[15:0], searching from the most-significant bit to the least-significant bit, is 3. Therefore, the period of the input clock (CLK\_IN) is quantized as 20 (= 16 + 4) coarse-tuning delay units: 16 units for the full pass through the delay line plus 4 units for bit locations 0 to 3. Since the lock condition requires the HCDL to provide a half-cycle delay, as shown in Eq. 1, the tdc\_code[3:0] output by the TDC encoder is 10 (= 20/2) in this example.

When the period of the input clock (CLK\_IN) is smaller than the maximum delay time of the HCDL, the short pulses require more than one input clock cycle to pass through the full delay line, as shown in Fig. 5. Thus, the signal "is\_half\_cycle" is not pulled high at the next rising edge of the input clock (CLK\_IN). In Fig. 5, the first "1" bit location of tdc\_data[15:0], searching from the most-significant bit to the least-significant bit, is 11. Therefore, the period of the input clock (CLK\_IN) is quantized as 12 coarse-tuning delay units, and tdc\_code[3:0] is 6 (= 12/2) in this example. With the TDC, the proposed ADDCC achieves lock-in within 25 clock cycles. Hence, the proposed ADDCC not only saves chip area and power consumption through the delay-recycled architecture, but also reduces the lock-in time through the TDC.
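The encoder arithmetic in the two examples can be reproduced with a short sketch. The mapping is inferred from the worked examples (a first "1" at bit location k contributes k + 1 coarse units, and a high `is_half_cycle` adds 16 units for the full pass); the truncating division for odd counts and the clamp to the 4-bit range are assumptions. The examples do not specify the bits below the first "1", so the sketch only looks at the highest set bit; thermometer-coded and one-hot words give the same result.

```python
N_STAGES = 16  # width of tdc_data[15:0]


def tdc_encode(tdc_data: int, is_half_cycle: bool) -> int:
    """Return the initial coarse code tdc_code[3:0] from a captured TDC word."""
    word = tdc_data & 0xFFFF
    if word == 0:
        raise ValueError("no '1' captured in tdc_data[15:0]")
    # The highest set bit is the first '1' found when searching MSB -> LSB.
    first_one = word.bit_length() - 1
    # Period in coarse delay units; a full pass through the line adds N_STAGES.
    period_units = first_one + 1 + (N_STAGES if is_half_cycle else 0)
    # Lock condition (Eq. 1): the HCDL provides half of the period.
    return min(period_units // 2, 0xF)  # assumed: truncate, clamp to 4 bits


if __name__ == "__main__":
    print(tdc_encode(0b0000_0000_0000_1000, is_half_cycle=True))   # Fig. 4 case -> 10
    print(tdc_encode(0b0000_1000_0000_0000, is_half_cycle=False))  # Fig. 5 case -> 6
```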



FIGURE 5. TIMING DIAGRAM OF TDC AT HIGH FREQUENCY OPERATION.

## III. EXPERIMENTAL RESULTS



FIGURE 6. LAYOUT OF THE PROPOSED ADDCC.



FIGURE 7. THE DUTY-CYCLE ERROR AT TYPICAL PROCESS CORNER.

The proposed ADDCC is implemented in a standard-performance 90 nm 1P9M CMOS process with a 1.0 V power supply. Fig. 6 shows the layout of the ADDCC; the active area is $70\mu m \times 70\mu m$. Fig. 7 shows the duty-cycle error of the output clock at the typical process corner (TT) for different input frequencies and input duty cycles. The maximum duty-cycle error is always smaller than 1.9% for input frequencies from 450 MHz to 1 GHz and input duty cycles from 20% to 80%.

In the proposed ADDCC architecture, the short pulses generated by the PG propagate through the HCDL. Under unbalanced process variations (i.e., the SF or FS corner), the pulse width may therefore be enlarged or shrunk. Since the capacitive loading of the NAND gates is balanced by dummy cells, the HCDL introduces less pulse-width distortion on the pulse signals under unbalanced process variations. In addition, the short pulses must be wide enough to trigger the DFFs in the worst-case condition. The simulation results in Fig. 8 show that the proposed ADDCC works correctly under PVT variations; thus, the design is robust against unbalanced process variations.
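The pulse-width constraint can be illustrated with a simple first-order model. The sketch assumes every stage changes the pulse width by the same net amount `skew_ps` (for a non-inverting stage, t_PHL minus t_PLH); the 60 ps initial width, 40 ps minimum DFF trigger width, 16-stage path, and per-corner skews are hypothetical numbers, not values from this design.

```python
def pulse_width_after(w0_ps: float, n_stages: int, skew_ps: float) -> float:
    """Pulse width after n_stages, each changing the width by skew_ps.

    For a positive pulse through a non-inverting stage, the rising (leading)
    edge is delayed by t_PLH and the falling (trailing) edge by t_PHL, so the
    width changes by t_PHL - t_PLH per stage.
    """
    return w0_ps + n_stages * skew_ps


def triggers_dff(w0_ps: float, n_stages: int, skew_ps: float, w_min_ps: float) -> bool:
    """True if the pulse is still wide enough to trigger the DFF."""
    return pulse_width_after(w0_ps, n_stages, skew_ps) >= w_min_ps


if __name__ == "__main__":
    # Hypothetical corners: a larger |skew| stands in for poorly balanced loading.
    for corner, skew in (("TT", 0.0), ("SF, unbalanced", -1.5), ("SF, balanced", -0.5)):
        width = pulse_width_after(60.0, 16, skew)
        print(corner, width, triggers_dff(60.0, 16, skew, 40.0))
```

In this model the width error grows linearly with the number of stages, so shrinking the per-stage mismatch, which is what the dummy cells are intended to do, is what keeps the pulse above the DFF trigger width at the end of the line.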



FIGURE 8. THE DUTY-CYCLE ERROR WITH PVT VARIATIONS.

Table I compares the proposed ADDCC with prior works. Although the analog PWCL [1] has a relatively small duty-cycle error, it dissipates considerable power and has a long lock-in time. In addition, the PWCL suffers from a serious charge-pump mismatch problem at unbalanced process corners, which may worsen the duty-cycle error under unbalanced process variations. Moreover, the output clock of [1],[3] is not phase-aligned with the input clock, which makes these designs difficult to integrate in an SoC. The ADDCCs [3],[8] do not have fine-tuning delay cells, and thus they cannot achieve a small duty-cycle error at high-frequency operation.

In [4], the TDC-based ADDCC without fine-tuning delay cells has difficulty achieving a high duty-cycle correction resolution, and it is also difficult for it to maintain a wide operating frequency range. In addition, the interpolator of the ADDCC [4] is easily affected by unbalanced process variations, so the output duty-cycle error becomes worse at unbalanced process corners. In [5], a delay-locked loop is integrated with the ADDCC to align the phase of the output clock with the input clock. However, the dual-loop architecture results in higher power consumption and design complexity.

Although fine-tuning delay cells are added in the ADDCC [9] to achieve a relatively small duty-cycle error, the controllable delay range of its fine-tuning delay cell does not equal the coarse-tuning resolution under PVT variations. This causes a non-monotonic response when the controller switches the coarse-tuning control code, so the ADDCC [9] may exhibit large cycle-to-cycle jitter during coarse-tuning code switching.

In comparison with the prior designs in Table I, the proposed ADDCC has the smallest reported area, lower power consumption than [1], [4], [5], and [9], a wider operating frequency range than [4], [8], and [9], and better tolerance of PVT variations.

|                                  | Proposed                                  | JSSC'08<br>[1]       | VLSI'11<br>[3]                            | TCAS-II'07<br>[4]    | JSSC'09<br>[5]                               | VLSI-DAT'12<br>[8]                          | CASVLSI'09<br>[9]  |
|----------------------------------|-------------------------------------------|----------------------|-------------------------------------------|----------------------|----------------------------------------------|---------------------------------------------|--------------------|
| Phase<br>Alignment               | Yes                                       | No                   | No                                        | Yes                  | Yes                                          | Yes                                         | Yes                |
| Type                             | TDC/HCDL                                  | Analog<br>PWCL       | TDC/HCDL                                  | TDC                  | TDC/HCDL                                     | BWDL/SAR                                    | HCDL               |
| Process                          | 90 nm                                     | 0.18µm               | 0.18µm                                    | 0.18µm               | 0.18µm                                       | 0.18µm                                      | 90 nm              |
| Supply Voltage<br>(V)            | 1.0                                       | 1.8                  | 1.8                                       | 1.8                  | 1.8                                          | 1.8                                         | 1.2                |
| Frequency                        | 450 MHz ~<br>1 GHz                        | 1 MHz ~ 1.3<br>GHz   | 400 MHz ~<br>2 GHz                        | 800 MHz ~<br>1.2 GHz | 440 MHz ~<br>1.5 GHz                         | 68 MHz ~<br>470 MHz                         | 500 MHz            |
| Input Duty<br>Cycle Range<br>(%) | 20~80                                     | 30 ~ 70              | 10 ~ 90<br>@ 400 MHz<br>20 ~ 80<br>@2 GHz | 40 ~ 60              | 20 ~ 80<br>@ 440 MHz<br>40 ~ 60<br>@ 1.5 GHz | 2 ~ 99<br>@ 47 MHz<br>10 ~ 90<br>@ 470 MHz  | 20~80              |
| Output Duty<br>Cycle Error (%)   | 1.4<br>@ 450 MHz<br>1.9<br>@ 1 GHz        | 1.0                  | 1.0<br>@ 400MHz<br>3.5<br>@ 1GHz          | 1.5                  | 1.8                                          | 0.4<br>@ 68 MHz<br>2.27<br>@ 470 MHz        | 0.5                |
| Lock-in Time<br>(Cycle)          | < 25                                      | < 60                 | < 3.5                                     | 10                   | 10~15                                        | ≦16                                         | 46                 |
| Power<br>Consumption             | 1.7 mW<br>@ 450 MHz<br>3.45 mW<br>@ 1 GHz | 4.8 mW*<br>@ 1.3 GHz | 1.76 mW<br>@ 400 MHz<br>3.6 mW<br>@ 2 GHz | 15 mW                | 43 mW<br>@ 1.5 GHz                           | 1.13 mW<br>@ 68 MHz<br>2.09 mW<br>@ 470 MHz | 12 mW<br>@ 500 MHz |
| Area (mm <sup>2</sup> )          | 0.0049                                    | 0.2068               | 0.025                                     | 0.2236               | 0.053                                        | N/A                                         | N/A                |

TABLE I. PERFORMANCE COMPARISONS

\*: only PWCL

## IV. CONCLUSION

In this paper, a low-power delay-recycled ADDCC with a high tolerance to PVT variations is presented. The proposed ADDCC operates over a wide range, with input frequencies from 450 MHz to 1 GHz and input duty cycles from 20% to 80%. It requires a delay line covering only one-half of the input clock period to generate a 50% duty-cycle clock. Most importantly, the proposed ADDCC is robust at unbalanced process corners and is suitable for low-cost applications.

#### ACKNOWLEDGMENT

The authors would like to thank their colleagues in the Silicon Sensor and System (S3) Laboratory of National Chung Cheng University for many fruitful discussions. The EDA tools supported by National Chip Implementation Center (CIC) are acknowledged as well.

#### REFERENCES

- [1] Kuo-Hsing Cheng, Chia-Wei Su, and Kai-Fei Chang, "A high linearity, fast-locking pulsewidth control loop with digitally programmable duty cycle correction for wide range operation," *IEEE Journal of Solid-State Circuits*, vol. 43, no. 2, pp. 399-413, Feb. 2008.
- [2] Yi-Ming Wang and Jinn-Shyan Wang, "An all-digital 50% duty-cycle corrector," in Proceedings of IEEE International Symposium on Circuits and Systems (ISCAS), May 2004, pp. 925-928.
- [3] Junhui Gu, Jianhui Wu, Danhong Gu, Meng Zhang and Longxing Shi, "All-digital wide range precharge logic 50% duty cycle corrector," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 20, no. 4, pp. 760-764, Apr. 2012.
- [4] Shao-Ku Kao and Shen-Iuan Liu, "All-digital fast-locked synchronous duty-cycle corrector," *IEEE Transactions on Circuits and Systems II: Express Briefs*, vol. 53, no. 12, pp. 1363-1367, Dec. 2006.

- [5] Dongsuk Shin, Janghoon Song, Hyunsoo Chae, and Chulwoo Kim, "A 7 ps jitter 0.053 mm<sup>2</sup> fast lock all-digital DLL with a wide range and high resolution DCC," *IEEE Journal of Solid-State Circuits*, vol. 44, no. 9, pp. 2437-2451, Sep. 2009.
- [6] Shao-Ku Kao and Shen-Iuan Liu, "A wide-range all-digital duty cycle corrector with a period monitor," in Proceedings of IEEE International Conference of Electron Devices and Solid-State Circuits (EDSSC), Dec. 2007, pp. 349-352.
- [7] Yi-Ming Wang, Jen-Tsung Yu, Yuandi Surya, and Chung-Hsun Huang, "A compact delay-recycled clock skew-compensation AND/OR duty-cycle-correction circuit," in Proceedings of IEEE International SOC Conference (ISOCC), Sep. 2011, pp. 42-47.
- [8] Shih-Nung Wei, Yi-Ming Wang, Jyun-Hua Peng, and Yuandi Surya, "A range extending delay-recycled clock skew-compensation AND/OR duty-cycle-correction circuit," in Proceedings of IEEE International Symposium on VLSI Design, Automation & Test (VLSI-DAT), Apr. 2012.
- [9] R. Swathi and M. B. Srinivas, "All digital duty cycle correction circuit in 90nm based on mutex," in Proceedings of IEEE Computer Society Annual Symposium on VLSI (ISVLSI), May 2009, pp. 258-262.
- [10] Rong-Jyi Yang and Shen-Iuan Liu, "A 40-550 MHz harmonic-free all-digital delay-locked loop using a variable SAR algorithm," *IEEE Journal of Solid-State Circuits*, vol. 42, no. 2, pp. 361-373, Feb. 2007.
- [11] Ching-Che Chung, Duo Sheng, and Wei-Da Ho, "A low-power and small-area all-digital spread-spectrum clock generator in 65nm CMOS technology," in Proceedings of International Symposium on VLSI Design, Automation, and Test (VLSI-DAT), Apr. 2012.
- [12] Ching-Che Chung and Chen-Yi Lee, "An all-digital phase-locked loop for high-speed clock generation," *IEEE Journal of Solid-State Circuits*, vol. 38, no. 2, pp. 347-351, Feb. 2003.