# A Fast Phase Tracking ADPLL for Video Pixel Clock Generation in 65 nm CMOS Technology

Ching-Che Chung, Member, IEEE, and Chiun-Yao Ko

Abstract-A phase-locked loop (PLL) for analog video RGB signal acquisition interface requires precise clock generation from a very noisy and low-frequency horizontal synchronization signal (HSYNC). In such applications, the frequency multiplication ratio is always larger than 800 and can be up to over 2600. The output pixel clock has to be phase aligned to the HSYNC. Otherwise, the displayed image will become blurry. A fast phase tracking all-digital PLL (ADPLL) for video pixel clock generation in a 65 nm CMOS technology is presented in this paper. In the proposed ADPLL, the digital loop filter eliminates the reference clock jitter effects and then the period jitter of the output pixel clock can be reduced. A time-to-digital converter (TDC) and a delta-sigma modulator (DSM) are used to perform the fast phase tracking, and the tracking jitter is controlled at less than one-third of the output pixel clock period. As compared to prior studies, the proposed ADPLL does not require an extra external oscillator to overcome the reference clock jitter effects. Thus, it has a small chip area and low power consumption, and is well-suited to video pixel clock generation applications in 65 nm CMOS process.

*Index Terms*—Digital filters, frequency multiplication, jitter, phase-locked loops.

# I. INTRODUCTION

PHASE-LOCKED loops (PLLs) are widely used in many applications, such as clock and data recovery (CDR) circuits, frequency synthesizers, and on-chip clock generators. PLLs have become indispensable modules in system-on-a-chip (SoC) designs. In this paper, a PLL design for liquid crystal display (LCD) analog video RGB (Red/Green/Blue) signal acquisition interface applications is discussed. In a digital video display system, the analog video RGB signals sent from the personal computer (PC) graphics card are accompanied by a vertical synchronization clock (VSYNC) and a horizontal synchronization clock (HSYNC). The RGB acquisition interface converts the analog video signals into digital codes by analog-to-digital converters (ADCs). Subsequently, the sampled digital signals are sent to the digital video display system, where the digital RGB signals can be processed by the digital processors. The sampling pixel clock (PIXEL\_CLK) for the ADC is generated by the clock generator inside the RGB acquisition interface. In general, the clock generator is implemented by a PLL. The clock generator takes the HSYNC as a reference clock to generate a high speed pixel clock (PIXEL\_CLK). The frequency multiplication ratio of the PLL depends on the display resolution of the digital video display system [22]. Furthermore, the output pixel clock has to be phase aligned to the HSYNC.

The authors are with the Department of Computer Science and Information Engineering, National Chung Cheng University, Chia-Yi, Taiwan (e-mail: wildwolf@cs.ccu.edu.tw).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/JSSC.2011.2160789

The design challenges for a pixel clock generator in a digital video display system include a very high frequency multiplication ratio, a noisy low-frequency reference clock, and low tracking jitter and period jitter requirements. The tracking jitter (phase error) of a PLL is measured between the edges of the HSYNC and the output pixel clock (PIXEL\_CLK). The period jitter of a PLL is the time difference between a measured pixel clock period and the required pixel clock period. The HSYNC in a digital video display system ranges from 30 kHz to 100 kHz, and the output pixel clock ranges from 25 MHz to 300 MHz across display resolutions. Accordingly, the frequency multiplication ratio is always larger than 800 and can exceed 2600 in this application. In addition, the peak-to-peak period jitter of the HSYNC can reach the nanosecond level, depending on the PC graphics card. Moreover, the output pixel clock has to be phase aligned to the noisy HSYNC, and the tracking jitter should be less than one-third of the output pixel clock period. Otherwise, the displayed image will become blurry.

The jitter performance of the PLLs reported over the past decade has been summarized in [4]. For most of these PLLs, only the period jitter performance is listed because their applications do not need clock phase alignment. In addition, only a few studies on PLLs [2], [12], [14]–[18] reported that they can be used in high multiplication ratio applications. Traditionally, PLLs are implemented with charge-pump based architectures [1], [2]. However, if a charge-pump based PLL with MOS capacitors is used to implement a pixel clock generator, the leakage current problem of the MOS capacitor in 65 nm CMOS technology becomes worse with a low-frequency reference clock (< 100 kHz) and results in frequency drifting and poor jitter performance. In addition, as reported in [2], with a frequency multiplication ratio ranging from 1 to 4096, the period jitter can be kept below 2% of the output clock period with a loop bandwidth scaling scheme. However, since the loop bandwidth scales inversely with the frequency multiplication ratio, the tracking jitter scales linearly with the frequency multiplication ratio. In [2], when the frequency multiplication factor is greater than 512, the peak-to-peak tracking jitter reaches 100% of the output clock period. As a result, the conventional charge-pump based PLL architecture cannot be directly applied to implement the video pixel clock generator.

Manuscript received October 20, 2010; revised June 13, 2011; accepted June 13, 2011. Date of publication July 25, 2011; date of current version September 30, 2011. This paper was approved by Associate Editor Andreas Kaiser. This work was supported in part by the National Science Council of Taiwan under Grant NSC97-2218-E-194-009-MY2. The shuttle program was supported by the United Microelectronics Corporation.

Fig. 1. Tracking jitter in high frequency multiplication ratio.

An all-digital phase-locked loop (ADPLL) is suitable for video pixel clock generation in 65 nm CMOS technology, since by using robust digital control codes, the MOS capacitor leakage problem in charge-pump PLLs can be avoided. However, the resolution limitation of a digitally controlled oscillator (DCO) without a dithering scheme can cause a large tracking jitter in a digital video display system, as shown in Fig. 1. In Fig. 1, the HSOUT is the output pixel clock divided by the frequency divider, the frequency multiplication ratio is N, the DCO's resolution without a dithering scheme is $\Delta_{DCO}$, and the phase error at the n-th cycle of the HSYNC is zero. Then, in the next cycle, the phase error increases to P due to the frequency difference between the HSYNC and the HSOUT. In ADPLLs [4]–[7], [13], [14], [18], [20], [21], the DCO's output frequency is increased by the ADPLL controller to compensate for the phase error at the (n + 1)-th cycle of the HSYNC. As a result, the pixel clock's period is reduced from T to $(T - \Delta_{DCO})$. However, the frequency multiplication ratio in a digital video display system is very large (N > 800). For this reason, the phase error at the (n + 2)-th cycle of the HSYNC becomes $P - N \cdot \Delta_{DCO}$. The phase error cannot be effectively reduced, and its magnitude is larger than in the previous clock cycle. Therefore, unless the DCO resolution is very fine, the tracking jitter (phase error) of the output pixel clock will be very large.
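To give a feel for the size of the $N \cdot \Delta_{DCO}$ term, the following sketch plugs in numbers quoted elsewhere in this paper: the WUXGA multiplication ratio and pixel clock period from Table I, and the typical-case DCO resolution from Section III. Combining these three figures is our own illustrative choice and is not a calculation taken from the paper.

```python
# Illustrative arithmetic only; the three constants are quoted elsewhere in the paper.
N = 2592                  # WUXGA multiplication ratio (Table I)
DELTA_DCO_PS = 16.2       # typical-case DCO resolution without dithering (Section III)
PIXEL_PERIOD_PS = 5170.0  # WUXGA pixel clock period, 5.17 ns (Table I)


def phase_step_per_hsync(n: int, delta_dco_ps: float) -> float:
    """Phase shift accumulated over one HSYNC cycle when each of the
    n pixel clock periods changes by one DCO step."""
    return n * delta_dco_ps


step_ps = phase_step_per_hsync(N, DELTA_DCO_PS)
ratio = step_ps / PIXEL_PERIOD_PS

print(f"one code step shifts the phase by {step_ps / 1000:.1f} ns per HSYNC cycle")
print(f"that is {ratio:.1f} pixel clock periods")
```

Under these assumptions a single undithered code step moves the phase by several pixel clock periods per HSYNC cycle, far above the one-third period target.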

Nevertheless, it is very difficult to create an extremely high resolution DCO. As a result, in ADPLLs [3], [8]–[11], the DCO dithering scheme with a delta-sigma modulator (DSM) has been proposed to improve the frequency resolution of the DCO. A DSM produces a high-rate integer stream whose average value is equal to the low-rate fractional input, as discussed in [3]. A DSM with a DCO is essential in an ADPLL design for video pixel clock generator applications. Since the equivalent DCO resolution can be improved by the DCO dithering scheme, the tracking jitter in high frequency multiplication ratio applications can be effectively reduced.

The HSYNC in a digital video display system is very noisy, with peak-to-peak jitter at the nanosecond level. Therefore, the noisy HSYNC worsens the period jitter and the tracking jitter performance. Thus, a loop filter is often needed to filter out the reference clock jitter effects, which results in a more stable output clock. An ADPLL with a low bandwidth can reduce the period jitter, but the tracking jitter is increased due to the slow response to the reference clock edge variations. As a result, in ADPLLs [15]–[17], a two-PLL architecture is used with an external oscillator as a precise timing reference to generate a very high speed sampling clock, as shown in Fig. 2. In the flying-adder based PLL [16], a 756 MHz 10-phase timing reference is generated by the analog PLL with an external 27 MHz oscillator. Then, the digital PLL uses these high-speed multi-phase clock signals to perform clock and data recovery with the noisy HSYNC. In this case, the equivalent sampling clock rate is very high (at 7.56 Gb/s). Therefore, this design requires a large chip area and has high power consumption. In addition, an external oscillator increases the cost of the design.

A digital loop filter is essential in an ADPLL design for video pixel clock generator applications. In prior ADPLLs [4]–[11], [15]–[17], the PI digital loop filter (DLF), composed of a proportional ( $\alpha$ ) path and an integral ( $\beta$ ) path, was widely used as a trade-off between locking speed and jitter performance. The  $\alpha$ -to- $\beta$  ratio in most ADPLLs [4]–[8], [15] is a fixed value, which is optimized in the PLL loop analysis and circuit simulation, as discussed in [5], [7], [17].

There are two problems with a PI digital loop filter with a fixed $\alpha$-to-$\beta$ ratio. First, the phase detector gain ($K_{PD}$) and the DCO gain ($K_{DCO}$) are assumed to be constant in the PLL loop analysis to get an optimal $\alpha$-to-$\beta$ ratio in the PI digital loop filter. However, the phase detector gain and the DCO gain vary with process, voltage, and temperature (PVT) variations. Thus, jitter performance is worsened due to these gain variations. Second, the reference clock in most PLLs is an external oscillator with negligible jitter effects. Therefore, the reference clock jitter effects are often not included in the PLL loop analysis. For this reason, if a large phase error occurs repeatedly due to the noisy reference clock, and since the phase tracking speed is restricted by the integral path of the PI digital loop filter, an ADPLL with a PI digital loop filter cannot quickly compensate for the phase error. Nevertheless, fast phase tracking ability is very important in a video pixel clock generator in order to reduce the tracking jitter. As a result, the PI digital loop filter with a fixed $\alpha$-to-$\beta$ ratio is not suitable for video pixel clock generator applications. Therefore, an adaptive gain control is implemented in the ADPLL [16], [17] for video pixel clock generation.

Fig. 3. Proposed ADPLL architecture.

In this paper, a fast phase tracking ADPLL for video pixel clock generation in 65 nm CMOS technology is presented. The proposed digital loop filter stores the median values in the history of the DCO control codes to eliminate the noisy reference clock jitter effects in the frequency tracking and frequency maintenance. Accordingly, the period jitter of the output pixel clock is reduced by the proposed digital loop filter. In addition, in the proposed ADPLL, time-to-digital converters are applied to quantize the phase error into digital codes. Subsequently, the compensation codes for the digitized phase errors are added to the fractional bits of the DCO control codes. Hence, the phase error is immediately compensated for by the DCO dithering scheme with a delta-sigma modulator (DSM). In this way, the proposed fast phase tracking scheme effectively reduces the tracking jitter of the output pixel clock.

The rest of the paper is arranged as follows: the architecture of the proposed ADPLL is introduced in Section II. Section III describes the circuit implementations of the proposed design. Section IV shows the experimental results of the test chip. Finally, Section V concludes with a summary.

# II. THE PROPOSED ADPLL ARCHITECTURE

Fig. 3 shows the block diagram of the proposed ADPLL. The ADPLL is composed of a phase and frequency detector (PFD), a time-to-digital converter (TDC), a first-order delta-sigma modulator (DSM), an interpolation-type digitally controlled oscillator (DCO), an ADPLL controller, a digital loop filter (DLF) and a frequency divider. The HSYNC is taken as the reference clock, and the HSOUT is the output pixel clock (PIXEL\_CLK) divided by the frequency divider. In addition, the frequency multiplication ratios (N) for different video display resolutions are specified by the VESA display monitor timing standard [22].

An interpolation-type DCO, whose fine-tuning stage is implemented with interpolator circuits [19], is used in the proposed ADPLL to achieve a monotonic response between the DCO control code and the output frequency. The proposed DCO is composed of a coarse-tuning stage and a fine-tuning stage. The DCO's output frequency is controlled by a 5-bit coarse-tuning control code, followed by a 5-bit fine-tuning control code, to ensure linear and monotonic responses. The frequency resolution of the DCO is further enhanced by a DCO dithering scheme using a 9-bit first-order delta-sigma modulator (DSM). Therefore, the integral part of the DCO control code (dco\_code) has 10 bits, and the fractional part of the DCO control code has 9 bits. The DSM operates at the output pixel clock frequency divided by 8.
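The dithering idea can be sketched in a few lines of Python. The model below is a generic accumulator-overflow form of a first-order DSM; the paper does not give its DSM circuit at this level of detail, so the function names and the accumulator structure are our assumptions. It only shows that a 9-bit fractional code yields an integer code stream whose average equals the 19-bit fixed-point code.

```python
def first_order_dsm(frac: int, n_cycles: int, bits: int = 9) -> list:
    """Return a 0/1 stream whose mean approaches frac / 2**bits.

    Assumed structure: a `bits`-wide accumulator whose carry-out is the output.
    """
    modulus = 1 << bits
    acc = 0
    stream = []
    for _ in range(n_cycles):
        acc += frac
        carry = acc >> bits   # 1 when the accumulator overflows
        acc &= modulus - 1    # keep only the low `bits` bits
        stream.append(carry)
    return stream


def dithered_codes(int_code: int, frac: int, n_cycles: int) -> list:
    """Integer DCO code sequence after adding the DSM carry stream."""
    return [int_code + c for c in first_order_dsm(frac, n_cycles)]


# Hypothetical example: integer part 300, fractional part 128/512 = 0.25.
codes = dithered_codes(int_code=300, frac=128, n_cycles=512)
mean_code = sum(codes) / len(codes)
print(sorted(set(codes)), mean_code)
```

In this example the DCO only ever sees two adjacent integer codes, while the long-term average code is 300.25.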

The lock-in procedure of the proposed ADPLL controller is divided into four states: a coarse code search state, a fine code search state, a fractional code search state and a fast phase tracking state. In the coarse code search state and the fine code search state, the DSM is turned off, and the ADPLL controller adjusts the integral part of the DCO control code (dco\_code[18:9]) according to the PFD's output. Subsequently, the DSM is turned on to improve the equivalent resolution of the DCO. Then, the ADPLL controller adjusts the fractional part of the DCO control code (dco\_code[8:0]) to minimize the frequency error between the HSYNC and the HSOUT. After frequency acquisition is complete, the ADPLL controller enters the fast phase tracking state, and the phase error between the HSYNC and the HSOUT is quantized by the TDC. Then, the proposed fast phase tracking scheme is applied to reduce the phase error between the HSYNC and the HSOUT. As a result, after the ADPLL is locked, the phase error is minimized.

Fig. 4. Timing diagram in frequency search.

Fig. 4 shows the timing diagram of the proposed ADPLL in the coarse code search state. After the system is reset, the bang-bang PFD detects the phase and frequency error between the HSYNC and the HSOUT. Then, it outputs "up" and "down" control signals to the ADPLL controller to indicate that the DCO's output frequency should be sped up or slowed down, respectively. When the ADPLL controller increases the DCO control code (dco\_code), the DCO's output frequency is slowed down. Conversely, when the ADPLL controller decreases the DCO control code (dco\_code), the DCO's output frequency is sped up. A binary search scheme is used in the ADPLL controller to reduce the lock-in time needed to find the target DCO control code (dco\_code). Therefore, when the PFD's output changes from "up" to "down" or vice versa (shown as A in Fig. 4), the search step is divided by 2 (shown as **B** in Fig. 4) until the search step is reduced to 1.
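The binary search described above can be illustrated with a small behavioral model. This is a sketch under simplifying assumptions that are ours, not the paper's: the PFD is idealized and noise-free, the 10-bit integer code is searched as one word instead of in separate coarse and fine states, and the baseline code restoration is left out. The names `coarse_search`, `start_code` and `start_step` are hypothetical.

```python
def coarse_search(target_code: int, start_code: int = 512,
                  start_step: int = 256, max_iters: int = 64):
    """Search for target_code, halving the step on every PFD polarity change."""
    code, step = start_code, start_step
    prev_direction = None
    history = [code]
    for _ in range(max_iters):
        if code == target_code:
            break
        # Idealized PFD: a code above the target means the DCO runs too slowly,
        # so the PFD asks to speed up ("up"); otherwise it asks to slow down.
        direction = "up" if code > target_code else "down"
        if prev_direction is not None and direction != prev_direction and step > 1:
            step //= 2  # polarity change: divide the search step by 2
        # "up" speeds the DCO up by decreasing the code; "down" increases it.
        code += -step if direction == "up" else step
        code = max(0, min(1023, code))  # stay inside the 10-bit code range
        prev_direction = direction
        history.append(code)
    return code, history


final_code, trace = coarse_search(target_code=300)
print(final_code, trace)
```

In this idealized model the search reaches a 10-bit target in a few tens of HSYNC cycles; a real PFD with a noisy reference would not settle this cleanly.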

The reference clock (HSYNC) is very noisy in the digital video display system. Thus, in the proposed ADPLL architecture, a digital loop filter is used to produce a baseline frequency control code (avg\_dco\_code). When the phase polarity is changed, the ADPLL controller restores the baseline frequency control code (avg\_dco\_code) to the DCO control code (dco\_code) (shown as C in Fig. 4) to reduce the period jitter of the output pixel clock.

The flow chart of the proposed digital loop filter is shown in Fig. 5. The proposed DLF accepts the DCO control code (dco\_code) outputted by the ADPLL controller. Then, it stores eight DCO control codes ($C_0$ to $C_7$) to generate a baseline frequency control code (avg\_dco\_code). Every time two new DCO control codes ($C_{N1}$ and $C_{N2}$) are received by the DLF, the DLF searches for the maximum and minimum values in $C_0$ to $C_7$, $C_{N1}$ and $C_{N2}$. Subsequently, the maximum and minimum values are removed and the remaining DCO control codes are then stored in $C_0$ to $C_7$.

Fig. 5. Flow chart of the proposed digital loop filter (DLF).

For instance, suppose the initial values of the DCO control codes in ($C_0$ to $C_7$) are (9, 10, 10, 10, 10, 10, 10, 10), and the two new inputs in the $C_{N1}$ and $C_{N2}$ registers, expressed as $(C_{N1}, C_{N2})$, are (10, 11), (12, 13), (14, 13), (12, 11), (12, 13) and (12, 11) in sequence. With the (10, 11) inputs to the digital loop filter, the maximum and minimum values in $C_0$ to $C_7$, $C_{N1}$ and $C_{N2}$ are 11 and 9, respectively. Therefore, 9 and 11 are removed and the other values are stored in the $C_0$ to $C_7$ registers. Thus, the values of the DCO control codes in ($C_0$ to $C_7$) become (10, 10, 10, 10, 10, 10, 10, 10). Accordingly, the baseline frequency control code (avg\_dco\_code), which is the average value of the stored DCO control codes, is changed from 9 to 10. Subsequently, with the remaining input DCO control codes, the values of the $C_0$ to $C_7$ registers are changed to (12, 10, 10, 10, 10, 10, 10, 10), (12, 13, 10, 10, 10, 10, 10, 10), (12, 12, 11, 10, 10, 10, 10, 10), (12, 12, 11, 12, 10, 10, 10, 10) and (12, 12, 11, 12, 11, 10, 10, 10) in sequence. As a result, the baseline frequency control codes (avg\_dco\_code) are changed to 10, 10, 10, 10 and 11 in sequence.
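The worked example can be replayed with a short behavioral model of the DLF. This is a sketch only: the register ordering inside $C_0$ to $C_7$ is not modeled, and the truncating integer average is our assumption, inferred from the example where (9, 10, 10, 10, 10, 10, 10, 10) gives a baseline of 9. The function names `dlf_update` and `baseline` are hypothetical.

```python
def dlf_update(stored, c_n1, c_n2):
    """Merge two new codes, drop one maximum and one minimum, keep eight codes."""
    pool = list(stored) + [c_n1, c_n2]
    pool.remove(max(pool))  # remove a single occurrence of the maximum
    pool.remove(min(pool))  # remove a single occurrence of the minimum
    return pool


def baseline(stored):
    """Baseline code as a truncated integer average (assumed rounding)."""
    return sum(stored) // len(stored)


stored = [9, 10, 10, 10, 10, 10, 10, 10]
inputs = [(10, 11), (12, 13), (14, 13), (12, 11), (12, 13), (12, 11)]

initial_baseline = baseline(stored)
baselines = []
for c_n1, c_n2 in inputs:
    stored = dlf_update(stored, c_n1, c_n2)
    baselines.append(baseline(stored))

print(initial_baseline, baselines, sorted(stored))
```

Under these assumptions the model starts from a baseline of 9 and produces 10 after the first update, followed by 10, 10, 10, 10 and 11, in agreement with the sequence given in the text.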

Fig. 6 shows the simulation results of the proposed DLF for frequency tracking and frequency maintenance. Fig. 6(a) shows the operation of the digital loop filter in frequency acquisition, where there are large DCO control code variations. The proposed digital loop filter quickly updates the baseline DCO control code for the ADPLL controller to track the target frequency. Therefore, the lock-in time of the ADPLL is further reduced by the proposed DLF. Fig. 6(b) shows the operation of the digital loop filter in frequency maintenance. After the frequency acquisition is complete, the DCO control code converges to a small range. Thus, the proposed digital loop filter can filter out the DCO control code variations due to the reference clock jitter effects and produce a stable baseline DCO control code for the ADPLL controller to reduce the period jitter of the output clock.

Fig. 6. Simulation of the digital loop filter in (a) frequency tracking, and (b) frequency maintenance.

The proposed digital loop filter quickly updates the baseline DCO control code in frequency acquisition and maintains a stable baseline DCO control code in frequency maintenance. As compared to the ADPLLs with PI digital loop filter [4]–[11], [15]–[17], the proposed DLF does not require an adaptive loop filter gain control, and therefore, it can reduce the design complexity for the video pixel clock generator.

Fig. 7 shows the proposed fast phase tracking scheme with the TDC and the DSM. After frequency acquisition is complete, the ADPLL keeps tracking the phase error between the HSYNC and the HSOUT. In Fig. 7, the frequency multiplication ratio is N, and the DCO resolution without a dithering scheme is $\Delta_{\rm DCO}$. Moreover, in this example, the frequency error between the HSYNC and the HSOUT is assumed to be zero. If the fractional bits of the DCO control code are zero at the n-th cycle of the HSYNC, the output pixel clock period is T, and the phase error is P at the n-th cycle of the HSYNC. Since the HSOUT leads the HSYNC, the TDC outputs a digitized phase error (tdc\_code) of Q, and the value (Q/2) is added to the fractional bits of the DCO control code. Therefore, of the N pixel clock cycles, (Q/2) cycles are lengthened from T to $(T + \Delta_{\rm DCO})$ by the DCO dithering scheme, while the other cycles keep the period T. As a result, in the next HSYNC cycle, the phase error is reduced to $P - \Delta_{\rm DCO} \cdot (Q/2)$.

Fig. 7. Fast phase tracking with the TDC and the DSM.

In the proposed ADPLL, the resolution of the TDC ($\Delta_{TDC}$) is designed as $4 \cdot \Delta_{DCO}$. As a result, P is equal to $4 \cdot Q \cdot \Delta_{DCO}$ ($= Q \cdot \Delta_{TDC}$). Thus, the compensated phase error $\Delta_{DCO} \cdot (Q/2)$ is equal to P/8 ($= \Delta_{DCO} \cdot (1/8) \cdot (1/\Delta_{DCO}) \cdot P$). In the proposed fast phase tracking scheme, there may be a frequency error between the HSYNC and the HSOUT due to the jitter of the noisy reference clock (HSYNC). Therefore, only 1/8 of the measured phase error is compensated for. The proposed fast phase tracking scheme is verified with different reference clock jitter models and tested in different video display resolutions. As compared to the ADPLL with the fixed $\alpha$-to-$\beta$ ratio PI digital loop filter, the phase error is immediately compensated for with the DSM before the next cycle of the HSYNC. For this reason, the proposed fast phase tracking scheme effectively reduces the tracking jitter of the output pixel clock. In addition, since a first-order DSM is applied to dither the output pixel clock between the T and the $(T + \Delta_{DCO})$ periods, the period jitter and the cycle-to-cycle jitter are not greatly increased by the proposed fast phase tracking scheme.
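The P/8 relation can be checked numerically with a minimal model of one tracking step. The sketch assumes zero frequency error, no reference jitter, and a phase error of one sign only (HSOUT leading), as in the example of Fig. 7. Integer femtoseconds are used to avoid floating-point rounding, and the 16.2 ps value is the typical-case DCO resolution quoted in Section III. The names `track_once` and `track` are hypothetical.

```python
DELTA_DCO_FS = 16_200             # typical-case DCO resolution, 16.2 ps
DELTA_TDC_FS = 4 * DELTA_DCO_FS   # TDC resolution designed as 4 * DELTA_DCO


def track_once(phase_error_fs: int) -> int:
    """Apply one HSYNC cycle of compensation and return the new phase error."""
    q = phase_error_fs // DELTA_TDC_FS      # tdc_code Q
    compensation = (q * DELTA_DCO_FS) // 2  # DELTA_DCO * (Q / 2)
    return phase_error_fs - compensation


def track(phase_error_fs: int, n_cycles: int) -> list:
    """Phase error trace over n_cycles HSYNC cycles."""
    trace = [phase_error_fs]
    for _ in range(n_cycles):
        trace.append(track_once(trace[-1]))
    return trace


p0 = 20 * DELTA_TDC_FS   # hypothetical initial error: Q = 20, i.e. 1296 ps
trace = track(p0, 30)
print([t / 1000 for t in trace[:4]])   # phase error in ps
```

When P is an exact multiple of $\Delta_{TDC}$, each step removes exactly P/8. Once the error falls below one TDC step, Q becomes zero and this simple model stops correcting, which reflects the quantization of the TDC and not any claim about the measured chip.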

# III. CIRCUIT IMPLEMENTATIONS

The proposed ADPLL is implemented in a standard performance (SP) 65 nm CMOS technology with 1.0 V power supply to verify the effectiveness of the proposed architecture. Most of the proposed circuits are implemented with standard cells for better portability. In the proposed ADPLL, time-to-digital converters (TDCs) are applied to quantize the phase error between the HSYNC and the HSOUT into digital codes. The block diagram of the proposed TDC is shown in Fig. 8(a). The TDC is composed of two sub-TDCs and one TDC code selection circuit.



Fig. 8. Proposed TDC architecture. (a) Block diagram of the TDC. (b) Details of the sub-TDC circuits.

Since the phase error may be positive or negative, we use two sub-TDCs to quantize the phase error at the same time. The bang-bang PFD's output signals are used to select the output of the sub-TDCs. If the HSYNC leads the HSOUT, the output of the #1 sub-TDC (tdc\_code\_lead) is selected as the "tdc\_code". Conversely, if the HSYNC lags the HSOUT, the output of the #2 sub-TDC (tdc\_code\_lag) is selected as the "tdc\_code".

The details of the sub-TDC circuit are shown in Fig. 8(b). The rising edge of the "start" signal propagates through a chain of TDC delay units (TDUs). The phase decision circuit (PDC) [21] detects the lead or lag between the "stop" signal and the delayed "start" signal. When the rising edge of the "stop" signal arrives, the outputs of the PDCs (t[127:0]) are decoded as the "code[6:0]" signal by the TDC decoder.
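A behavioral sketch of the sub-TDC and the code selection is given below. Several details are our assumptions and are not stated in the paper: the TDU delay is taken as 64.8 ps (four times the typical-case DCO resolution of 16.2 ps), the decoder is modeled as a count of asserted PDC outputs, and the count saturates at 127 so that it fits in "code[6:0]". In the circuit both sub-TDCs run at the same time and the PFD selects one output; the model simply evaluates the selected one.

```python
TDU_DELAY_PS = 64.8   # assumed TDU delay: 4 x 16.2 ps (typical case)
N_TAPS = 128          # number of PDC outputs, t[127:0]


def sub_tdc(interval_ps: float) -> int:
    """Quantize a start-to-stop interval into a 7-bit code."""
    # PDC i is asserted when the start edge has passed (i + 1) TDUs
    # before the stop edge arrives, giving a thermometer pattern.
    taps = [1 if (i + 1) * TDU_DELAY_PS <= interval_ps else 0
            for i in range(N_TAPS)]
    return min(sum(taps), 127)   # assumed saturation to fit code[6:0]


def tdc(hsync_edge_ps: float, hsout_edge_ps: float):
    """Select the sub-TDC output according to the lead/lag polarity."""
    if hsync_edge_ps <= hsout_edge_ps:   # HSYNC leads: sub-TDC #1
        return "lead", sub_tdc(hsout_edge_ps - hsync_edge_ps)
    return "lag", sub_tdc(hsync_edge_ps - hsout_edge_ps)   # sub-TDC #2


print(tdc(0.0, 200.0))
print(tdc(200.0, 0.0))
```

With these assumed values, a 200 ps phase error of either polarity is quantized to a code of 3.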

In the TDC architecture, D flip-flops (DFFs) are often used to sample the outputs of the TDUs. However, if the conventional static DFFs provided by the 65 nm standard cell library are used, the timing resolution of the TDC is limited by the setup time and hold time requirements of the DFFs (i.e., the dead zone of the DFFs). Therefore, the sampling error caused by the dead zone of the DFF needs to be reduced. A sense-amplifier-based DFF is proposed in [23] to reduce the dead zone of the DFF. In addition, a conventional static DFF combined with two timing amplifiers to form a small dead zone PDC is presented in [24]. In this paper, we use the sense-amplifier-based PDC [21] to sample the outputs of the TDUs. The dead zone of this PDC is only several picoseconds, which is much smaller than the dead zone of the conventional static DFFs provided by the 65 nm standard cell library. Thus, the timing resolution of the TDC ($\Delta_{\text{TDC}}$) is improved to the delay time of one TDU.

The proposed TDC can quantize the phase error into digital codes, but how to use the TDC code (tdc\_code) to compensate for the phase error is an important issue. The TDC and the DCO are different circuits with different resolutions. As a result, a mapping gain is often needed between the TDC code and the compensation code for the DCO. However, the resolutions of the TDC ($\Delta_{TDC}$) and the DCO ($\Delta_{DCO}$) vary with PVT variations. If a fixed mapping gain is used in the ADPLL, the tracking jitter performance will become worse due to gain variations. Therefore, in the proposed ADPLL, an interpolation-type DCO is used to provide a fixed $\Delta_{TDC}$ to $\Delta_{DCO}$ ratio even with PVT variations.

Fig. 9 shows the architecture of the proposed interpolation-type DCO. The proposed DCO is composed of a coarse-tuning stage and a fine-tuning stage. In addition, a coarse encoder and a fine encoder are used to convert the binary DCO control code (int\_dco\_code[9:0]) into the thermometer codes (c\_sel[31:0] and F[31:0]). In Fig. 9(a), the coarse-tuning stage, which has 32 coarse-tuning delay units (CDUs) with 33 multiplexers, provides 32 different delays. The two adjacent branch delays are selected as the inputs to the fine-tuning stage ("O" and "E"). For example, if c\_sel[31:0] = 32'h0, "D[0]" and "D[1]" are selected, and the delay difference between the "O" signal and the "E" signal is one CDU delay. Subsequently, if c\_sel[31:0] = 32'h3, "D[2]" and "D[3]" are selected as the inputs to the fine-tuning stage, and the delay difference between the "O" signal and the "E" signal is still one CDU delay.
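The encoding step can be sketched as follows. The split of int\_dco\_code[9:0] into a 5-bit coarse field and a 5-bit fine field follows the text, but the exact mapping from a binary value to the thermometer code is our assumption, chosen so that it reproduces the two examples above (32'h0 selects D[0] and D[1]; 32'h3 selects D[2] and D[3]). All function names are hypothetical.

```python
def thermometer(value: int, width: int = 32) -> int:
    """Return a width-bit thermometer code with `value` low-order ones."""
    if not 0 <= value <= width:
        raise ValueError("value out of range for the thermometer width")
    return (1 << value) - 1


def split_code(int_dco_code: int):
    """Split the 10-bit integer DCO code into (coarse, fine) 5-bit fields."""
    coarse = (int_dco_code >> 5) & 0x1F   # int_dco_code[9:5]
    fine = int_dco_code & 0x1F            # int_dco_code[4:0]
    return coarse, fine


def selected_branches(c_sel: int):
    """Indices of the two adjacent delay branches chosen by c_sel (assumed rule)."""
    k = bin(c_sel).count("1")
    return k, k + 1


coarse, fine = split_code(0b00010_00101)   # hypothetical code: coarse 2, fine 5
c_sel = thermometer(coarse)
print(hex(c_sel), selected_branches(c_sel), hex(thermometer(fine)))
```

Whichever pair is selected, the two branches are one CDU apart, which is the property the fine-tuning stage relies on.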

The proposed DCO uses the interpolator circuit as its fine-tuning stage. The eight interpolator units are connected in parallel with the 32-bit fine-tuning control code (F[31:0]), as shown in Fig. 9(b). Fig. 9(c) shows the circuits of the interpolator unit in detail. Tri-state inverters [19] are applied to interpolate the "O" signal and the "E" signal and produce the output pixel clock (PIXEL\_CLK). The fine-tuning control code (F[31:0]) controls the relative weight of the two selected branches ("O" and "E"). Each interpolator unit has a 4-bit control. For instance, "interpolator unit 0" has four control bits (F[0], F[1], F[2], and F[3]) and can provide four different delays. Therefore, the fine-tuning stage can provide a total of 32 (= 4 × 8) different delays, and the total delay controllable range of the fine-tuning stage is one CDU delay.

Fig. 9. Proposed DCO architecture. (a) Coarse-tuning stage. (b) Fine-tuning stage. (c) Interpolator circuit.

Fig. 10. Simulation of the proposed DCO with PVT variations.

Fig. 10 shows the simulation results of the proposed DCO with PVT variations. The resolution of the coarse-tuning stage is 550 ps in the typical case, 435 ps in the best case and 891 ps in the worst case. The delay controllable range of the fine-tuning stage is 517 ps in the typical case, 406 ps in the best case and 846 ps in the worst case. In addition, the resolution of the fine-tuning stage, which is the DCO resolution ($\Delta_{\rm DCO}$), is 16.2 ps in the typical case, 12.7 ps in the best case and 26.4 ps in the worst case. In the proposed interpolation-type DCO, the interpolator units are applied to generate a fine-tuning delay. Thus, the proposed DCO can be seamlessly switched between two adjacent sub-frequency bands when the coarse-tuning control code (int\_dco\_code[9:5]) is changed, which ensures that the DCO output frequency has a monotonic response to the input DCO control code (int\_dco\_code[9:0]), as shown in Fig. 10.

To provide a fixed $\Delta_{\rm TDC}$ to $\Delta_{\rm DCO}$ ratio even with PVT variations, one coarse-tuning delay unit (CDU) is composed of four TDC delay units (TDUs) connected in series. Therefore, the resolution of the DCO coarse-tuning stage becomes $8 \cdot \Delta_{\rm TDC}$. Subsequently, the resolution of the fine-tuning stage, which is the DCO resolution ($\Delta_{\rm DCO}$), is equal to $(1/4) \cdot \Delta_{\rm TDC}$ ($= (1/32) \cdot (8 \cdot \Delta_{\rm TDC})$). As a result, if on-chip variations (OCVs) can be ignored, the resolution of the TDC ($\Delta_{\rm TDC}$) is equal to $4 \cdot \Delta_{\rm DCO}$.
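The 1/32 relation can be cross-checked against the simulated corner values quoted for Fig. 10. The sketch below divides the fine-tuning range at each corner by 32 and compares the result with the reported DCO resolution; the implied TDC resolution is then four times that value. The dictionary layout and names are ours, and the derived $\Delta_{\rm TDC}$ figures are implied by the stated ratio, not values reported in the paper.

```python
# (fine-tuning range in ps, reported DCO resolution in ps) from the Fig. 10 discussion
CORNERS = {
    "typical": (517, 16.2),
    "best": (406, 12.7),
    "worst": (846, 26.4),
}

results = {}
for name, (fine_range_ps, reported_ps) in CORNERS.items():
    delta_dco = fine_range_ps / 32   # 32 fine-tuning steps span the range
    delta_tdc = 4 * delta_dco        # implied by the designed 4:1 ratio
    results[name] = (round(delta_dco, 1), round(delta_tdc, 1))
    print(f"{name}: range/32 = {delta_dco:.2f} ps "
          f"(reported {reported_ps} ps), implied TDC step = {delta_tdc:.1f} ps")
```

At all three corners the fine-tuning range divided by 32 rounds to the reported $\Delta_{\rm DCO}$, which is consistent with the resolution tracking the delay cells across PVT.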



Fig. 11. Microphotograph of the proposed ADPLL test chip.

# IV. EXPERIMENTAL RESULTS

Fig. 11 shows a microphotograph of the proposed ADPLL test chip. This chip is fabricated in a standard performance (SP) 65 nm CMOS technology. The chip size is $910 \times 820 \ \mu m^2$ and the core size is $280 \times 250 \ \mu m^2$. Fig. 12 shows the simulation waveform of the proposed ADPLL. A binary search scheme is used in the ADPLL controller to reduce the lock-in time to search for the target DCO control code (dco\_code). After frequency acquisition is complete, the frequency error between the HSYNC and HSOUT is minimized. Then, the ADPLL controller enters the fast phase tracking state, and the phase error between the HSYNC and the HSOUT is quantized by the TDC. Then, the proposed fast phase tracking scheme is applied to minimize the tracking jitter of the output pixel clock.

Fig. 12. Simulation waveform of the proposed ADPLL.

TABLE I Measurement Results in Different Display Resolutions

| PLL's Multiplication Factor   | 1376<br>(XGA) | 1688<br>(SXGA) | 2160<br>(UXGA) | 2592<br>(WUXGA) | 5600<br>(TEST Only) |
|-------------------------------|---------------|----------------|----------------|-----------------|---------------------|
| HSYNC Period (µs)             | 14.56         | 12.50          | 13.33          | 13.41           | 10.24               |
| HSYNC Frequency (kHz)         | 68.68         | 79.98          | 75.00          | 74.56           | 97.66               |
| Pixel Clock Period (ns)       | 11.09         | 7.41           | 6.17           | 5.17            | 1.90                |
| Pixel Clock Frequency (MHz)   | 90.14         | 134.97         | 161.98         | 193.26          | 527.06              |
| Pixel Clock RMS Jitter (ps)   | 78.31         | 41.12          | 33.94          | 29.71           | 8.64                |

Fig. 13 shows the simulated maximum phase error when the TDC is turned off or turned on. The simulation results for the four video display modes, XGA, SXGA, UXGA and WUXGA, are shown. The frequency multiplication ratios (N) in these modes are 1376, 1688, 2160, and 2592, respectively. A reference clock (HSYNC) with different amounts of peak-to-peak jitter is applied to verify the effectiveness of the proposed fast phase tracking scheme. When the TDC is turned off, the ADPLL controller only uses the bang-bang PFD's output to adjust the fractional part of the DCO control code to minimize the phase error between the HSYNC and the HSOUT. Therefore, when the input jitter is very small, the phase error can be controlled within a reasonable range. However, as the input jitter increases, the phase error quickly exceeds 100% of the pixel clock period in the WUXGA mode. If the TDC is turned on, the proposed fast phase tracking scheme effectively reduces the phase error in different display resolutions.

Fig. 13. Maximum phase error when TDC is OFF or ON.

TABLE II Performance Comparisons

| Parameter                 | Proposed                                         | TVLSI'09 [12]             | JSSC'06 [18]               | JSSC'04 [17]                                     | WCCSIE'09 [15]                            |
|---------------------------|--------------------------------------------------|---------------------------|----------------------------|--------------------------------------------------|-------------------------------------------|
| Process                   | 65 nm CMOS                                       | 0.18 µm CMOS              | 0.18 µm CMOS               | 0.6 µm CMOS                                      | 0.13 µm CMOS                              |
| Approach                  | All-Digital                                      | All-Digital               | All-Digital                | Mixed-Mode                                       | Mixed-Mode                                |
| Phase Alignment<br>Method | TDC-based PFD<br>with DSM                        | Bang-bang PFD             | No                         | Counter-Based<br>PFD<br>with external<br>crystal | TDC-Based PFD<br>with external<br>crystal |
| Area                      | $0.07 \text{ mm}^2$                              | $0.14 \text{ mm}^2$       | $0.16 \text{ mm}^2$        | $1.8 \text{ mm}^2$                               | $0.2 \text{ mm}^2$                        |
| Power                     | 0.85 mW<br>(@193 MHz)<br>1.81 mW<br>(@520 MHz)   | 26.7 mW<br>(@600 MHz)     | 15 mW<br>(@378 MHz)        | 180 mW<br>(@78 MHz)                              | N/A                                       |
| Input Range               | 35.71 kHz ~<br>12.5 MHz                          | 30.3 kHz ~<br>100 MHz     | 19.26 kHz ~<br>60 MHz      | N/A                                              | N/A                                       |
| Output Range              | $90 \sim 527 \text{ MHz}$                        | $62 \sim 616 \text{ MHz}$ | $2.4 \sim 378 \text{ MHz}$ | $10 \sim 80 \text{ MHz}$                         | N/A                                       |
| Multiplication<br>Factor  | $16 \sim 5600$                                   | $1 \sim 2046$             | $4 \sim 13888$             | N/A                                              | N/A                                       |
| Jitter <sub>RMS</sub>     | 78.31 ps<br>(@90 MHz)<br>8.64 ps<br>(@527 MHz)   | 7.28 ps<br>(@600 MHz)     | 76 ps<br>(@134.7 MHz)      | N/A                                              | 32.4 ps<br>(@78 MHz)                      |

The reference clock has 39 ps root-mean-square (rms) jitter and 391 ps peak-to-peak jitter in Figs. 14 and 15 and Table I. Fig. 14 shows the measured ADPLL locking behavior in SXGA mode and in the 5600 multiplication factor mode. It shows that the phase error between the HSYNC and the HSOUT remains bounded in steady state even with a very high frequency multiplication ratio. Fig. 15 shows the measured jitter histogram of the output pixel clock in the same two modes. The output pixel clock has 41.12 ps rms jitter in SXGA mode and 8.64 ps rms jitter in the 5600 multiplication factor mode, which shows that the period jitter is effectively reduced by the proposed digital loop filter. The measurement results for the different display resolutions are summarized in Table I. The output period jitter is reduced by the proposed digital loop filter, while the tracking jitter is minimized by the proposed fast phase tracking scheme with the TDC and the DSM.
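To put the Table I jitter values in perspective, the rms jitter can be expressed as a percentage of the pixel clock period. The following sketch uses only the periods and jitter values listed in Table I; the helper name `jitter_percent` and the dictionary layout are illustrative choices.

```python
# Pixel clock period (ns) and rms jitter (ps) per mode, copied from Table I.
table_i = {
    "XGA":   (11.09, 78.31),
    "SXGA":  (7.41, 41.12),
    "UXGA":  (6.17, 33.94),
    "WUXGA": (5.17, 29.71),
    "N5600": (1.90, 8.64),   # test-only mode with multiplication factor 5600
}

def jitter_percent(period_ns, jitter_ps):
    """Return the rms jitter as a percentage of the clock period."""
    return 100.0 * jitter_ps / (period_ns * 1000.0)

ratios = {mode: jitter_percent(period, jitter)
          for mode, (period, jitter) in table_i.items()}

for mode, ratio in ratios.items():
    print(f"{mode:6s} {ratio:.2f}% of the pixel clock period")
```

For every mode listed in Table I, the computed ratio is below 1% of the pixel clock period.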

Fig. 14. Measured locking behavior with (a) SXGA mode, and (b) 5600 multiplication factor.

Fig. 15. Measured jitter histogram of the pixel clock with (a) SXGA mode, and (b) 5600 multiplication factor.

The performance comparisons with prior studies are listed in Table II. In [18], a very large frequency multiplication ratio (13,888) is reported, but the frequency counter architecture proposed there can only be used for frequency synthesis applications and not for phase tracking applications. In [15] and [17], a two-PLL architecture is used for pixel clock generation applications, but it requires a large chip area and has high power consumption. In addition, an external oscillator increases the cost of the design. In [12], the bang-bang PFD without a TDC cannot effectively track the phase of the noisy reference clock, as explained in Fig. 13. Thus, it is not suitable for pixel clock generation applications.

# V. CONCLUSION

In this paper, a fast phase tracking ADPLL for video pixel clock generation in 65 nm CMOS technology is presented. A DCO dithering scheme is applied to improve the frequency resolution of the DCO. Thus, the ADPLL can achieve phase tracking even with a high frequency multiplication ratio (>800). When the reference clock has large jitter, the proposed digital loop filter eliminates the reference clock jitter effects and reduces the period jitter of the output pixel clock. The proposed fast phase tracking scheme with the TDC and the DSM effectively reduces the tracking jitter of the output pixel clock. The maximum phase error with 1.2 ns peak-to-peak HSYNC jitter in the WUXGA mode is kept below 1.7 ns, which is less than one-third of the output pixel clock period (5.17 ns). The proposed ADPLL can generate a precise clock from a very noisy, low-frequency reference clock. Therefore, it is well suited to video pixel clock generation applications in an advanced CMOS process.

#### REFERENCES

- [1] C.-C. Hung and S.-I. Liu, "A leakage-compensated PLL in 65-nm CMOS technology," *IEEE Trans. Circuits Syst. II: Exp. Briefs*, vol. 56, no. 7, pp. 525–529, Jul. 2009.
- [2] J. G. Maneatis, J. Kim, I. McClatchie, J. Maxey, and M. Shankaradas, "Self-biased high-bandwidth low-jitter 1-to-4096 multiplier clock generator PLL," *IEEE J. Solid-State Circuits*, vol. 38, no. 11, pp. 1795–1803, Nov. 2003.
- [3] R. B. Staszewski, D. Leipold, K. Muhammad, and P. T. Balsara, "Digitally controlled oscillator (DCO)-based architecture for RF frequency synthesis in a deep-submicrometer CMOS process," *IEEE Trans. Circuits Syst. II: Analog Digit. Signal Process.*, vol. 50, no. 11, pp. 815–828, Nov. 2003.
- [4] P.-H. Hsieh, J. Maxey, and C.-K. K. Yang, "A phase-selecting digital phase-locked loop with bandwidth tracking in 65-nm CMOS technology," *IEEE J. Solid-State Circuits*, vol. 45, no. 4, pp. 781–792, Apr. 2010.

- [5] V. Kratyuk, P. K. Hanumolu, U.-K. Moon, and K. Mayaram, "A design procedure for all-digital phase-locked loops based on a charge-pump phase-locked-loop analogy," *IEEE Trans. Circuits Syst. II: Exp. Briefs*, vol. 54, no. 3, pp. 247–251, Mar. 2007.
- [6] X. Chen, J. Yang, and L.-X. Shi, "A fast locking all-digital phase-locked loop via feed-forward compensation technique," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 19, no. 5, pp. 857–868, May 2011.
- [7] S.-P. Lee and S.-H. Cho, "A background K<sub>DCO</sub> compensation technique for constant bandwidth in all-digital phase-locked loop," in *Proc. 2010 IEEE Int. Symp. Circuits and Systems (ISCAS)*, May 2010, pp. 3401–3404.
- [8] K.-H. Choi, J.-B. Shin, J.-Y. Sim, and H.-J. Park, "An interpolating digitally controlled oscillator for a wide-range all-digital PLL," *IEEE Trans. Circuits Syst. I: Reg. Papers*, vol. 56, no. 9, pp. 2055–2063, Sep. 2009.
- [9] S.-Y. Yang, W.-Z. Chen, and T.-Y. Lu, "A 7.1 mW, 10 GHz all digital frequency synthesizer with dynamically reconfigured digital loop filter in 90 nm CMOS technology," *IEEE J. Solid-State Circuits*, vol. 45, no. 3, pp. 578–586, Mar. 2010.
- [10] R. B. Staszewski and P. T. Balsara, "All-digital PLL with ultra fast settling," *IEEE Trans. Circuits Syst. II: Exp. Briefs*, vol. 54, no. 2, pp. 181–185, Feb. 2007.
- [11] C.-C. Hung, I.-F. Chen, and S.-I. Liu, "A 1.25 GHz fast-locked all-digital phase-locked loop with supply noise suppression," in *Proc. 2010 IEEE Int. Symp. VLSI Design Automation and Test (VLSI-DAT)*, Apr. 2010, pp. 237–240.
- [12] H.-J. Hsu and S.-Y. Huang, "A low-jitter ADPLL via a suppressive digital filter and an interpolation-based locking scheme," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 19, no. 1, pp. 165–170, Jan. 2011.
- [13] C.-T. Wu, W.-C. Shen, W. Wang, and A.-Y. Wu, "A two-cycle lock-in time ADPLL design based on a frequency estimation algorithm," *IEEE Trans. Circuits Syst. II: Exp. Briefs*, vol. 57, no. 6, pp. 430–434, Jun. 2010.
- [14] T. Watanabe and S. Yamauchi, "An all-digital PLL for frequency multiplication by 4 to 1022 with seven-cycle lock time," *IEEE J. Solid-State Circuits*, vol. 38, no. 2, pp. 198–204, Feb. 2003.
- [15] G.-J. Xie and C. Wang, "An all-digital PLL for video pixel clock regeneration applications," in *Proc. 2009 World Congr. Computer Science and Information Engineering (CSIE)*, Mar. 2009, pp. 392–396.
- [16] T.-Y. Oh, S.-H. Yi, S.-H. Yang, B.-C. Lim, and K.-T. Hong, "A digital PLL with 5-phase digital PFD for low long-term jitter clock recovery," in *Proc. 2006 IEEE Custom Integrated Circuits Conf. (CICC)*, Sep. 2006, pp. 745–748.
- [17] L. Xiu, W. Li, J. Meiners, and R. Padakanti, "A novel all-digital PLL with software adaptive filter," *IEEE J. Solid-State Circuits*, vol. 39, no. 3, pp. 476–483, Mar. 2004.
- [18] P.-L. Chen, C.-C. Chung, J.-N. Yang, and C.-Y. Lee, "A clock generator with cascaded dynamic frequency counting loops for wide multiplication range applications," *IEEE J. Solid-State Circuits*, vol. 41, no. 6, pp. 1275–1285, Jun. 2006.

- [19] B.-M. Moon, Y.-J. Park, and D.-K. Jeong, "Monotonic wide-range digitally controlled oscillator compensated for supply voltage variation," *IEEE Trans. Circuits Syst. II: Exp. Briefs*, vol. 55, no. 10, pp. 1036–1040, Oct. 2008.
- [20] C.-C. Chung and C.-Y. Lee, "An all-digital phase-locked loop for high-speed clock generation," *IEEE J. Solid-State Circuits*, vol. 38, no. 2, pp. 347–351, Feb. 2003.
- [21] H.-J. Hsu, C.-C. Tu, and S.-Y. Huang, "A high-resolution all-digital phase-locked loop with its application to built-in speed grading for memory," in *Proc. 2008 IEEE Int. Symp. VLSI Design Automation and Test (VLSI-DAT)*, Apr. 2008, pp. 267–270.
- [22] VESA and Industry Standards and Guidelines for Computer Display Monitor Timing, 1.0, Revision 10, Video Electronics Standards Assoc., 2004.
- [23] R. B. Staszewski *et al.*, "1.3 V 20 ps time-to-digital converter for frequency synthesis in 90-nm CMOS," *IEEE Trans. Circuits Syst. II: Exp. Briefs*, vol. 53, no. 3, pp. 220–224, Mar. 2006.
- [24] S.-Y. Lin and S.-I. Liu, "A 1.5 GHz all-digital spread-spectrum clock generator," *IEEE J. Solid-State Circuits*, vol. 44, no. 11, pp. 3111–3119, Nov. 2009.



**Ching-Che Chung** (S'01–M'03) received the B.S. and Ph.D. degrees in electronics engineering from National Chiao-Tung University, Hsinchu, Taiwan, in 1997 and 2003, respectively.

From 2004 to 2008, he served as a postdoctoral researcher at the same university, working in the area of system-on-chip design methodologies and high-speed interface circuit design. In August 2008, he joined the faculty of the Computer Science and Information Engineering Department, National Chung Cheng University, Chia-Yi, Taiwan, where he is currently an Assistant Professor. His research interests mainly include wireless and wireline communication systems, low-power and system-on-a-chip (SoC) design technology, mixed-signal IC design and sensor circuits design, all-digital phase-locked loop, all-digital delay-locked loop and its applications.



**Chiun-Yao Ko** received the M.S. degree in computer science and information engineering from National Chung Cheng University, Chia-Yi, Taiwan, in 2010.

He is currently a design engineer at the design service division of Global Unichip Corporation (GUC), Hsinchu, Taiwan, working on physical design service. His research interests include system-on-a-chip (SoC) design methodologies and all-digital phase-locked loop.