# An All-Digital Delay-Locked Loop for DDR SDRAM Controller Applications

Ching-Che Chung, Pao-Lung Chen, and Chen-Yi Lee

Dept. of Electronics Engineering, National Chiao Tung University, Hsinchu, Taiwan, R.O.C.

#### ABSTRACT

This paper presents an all-digital Delay-Locked Loop (DLL) for DDR SDRAM controller applications. The presented all-digital, cell-based, DLL-based five-phase clock generator produces the fixed timing delay (tSD) that a DDR SDRAM controller requires to capture the output data (DQ) correctly. The proposed architecture can lock to a harmonic of the input clock period and still produce a correct multi-phase clock output, which eases the design challenge of building a high-resolution delay line with minimum intrinsic delay. Simulation and chip measurement results show that the proposed DLL generates the desired tSD delay with an error below 7.6%. The power consumption of the proposed DLL is 4.1mW at DDR-200 and 9.0mW at DDR-400.

#### INTRODUCTION

Delay-Locked Loops (DLLs) are widely used in high-speed memory interface circuits and clock multipliers to perform clock de-skew [2, 4] and multi-phase clock generation [3, 5]. In those applications, a DLL offers better jitter performance than a Phase-Locked Loop (PLL) because reference clock jitter and the noise induced by the power supply or the substrate disappear at the end of the delay line, whereas the ring oscillator of a PLL accumulates jitter and noise. Thus a DLL is a good alternative to a PLL in those applications and has good phase tracking ability.

In Double Data Rate (DDR) SDRAM controller design, the output data strobe (DQS) signal must be delayed by a fixed timing delay (tSD) to capture the output data (DQ) correctly. Figure 1 shows the timing budget of this read operation. Ideally, DQS and DQ are edge-aligned by the DDR SDRAM. However, pin-to-pin skew among the DQ and DQS signals and PCB skew make the data valid window smaller than expected. As a result, generating an optimal timing delay (tSD) that satisfies both the setup and hold time budgets of the controller has become an important issue in DDR SDRAM controller design.

In [1], the timing budget calculations show that the optimal value of tSD is approximately 20 percent of the input clock period. Since the input clock frequency ranges from 100MHz to 200MHz (DDR-200/266/333/400), the tSD value varies from 2ns ($= 10\text{ns} \times 0.2$) to 1ns ($= 5\text{ns} \times 0.2$). In this paper, a five-phase, all-digital, cell-based, DLL-based multi-phase clock generator architecture is proposed to generate this desired tSD delay for the DQS signal. The proposed design generates the desired tSD delay across process, voltage, and temperature (PVT) variations.
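The tSD targets follow directly from the 20 percent rule. The following is a minimal Python sketch, assuming nominal clock periods of 10ns, 7.5ns, 6ns, and 5ns for the four speed grades (the text states only the 100MHz and 200MHz endpoints); the names `CLOCK_PERIOD_NS` and `desired_tsd_ns` are hypothetical.

```python
# Nominal clock periods per DDR speed grade, in ns. These are assumed values;
# the text states only the 100MHz (10ns) and 200MHz (5ns) endpoints.
CLOCK_PERIOD_NS = {
    "DDR-200": 10.0,
    "DDR-266": 7.5,
    "DDR-333": 6.0,
    "DDR-400": 5.0,
}

TSD_FRACTION = 0.2  # optimal tSD is about 20 percent of the clock period [1]


def desired_tsd_ns(grade: str) -> float:
    """Return the target DQS delay (tSD) in ns for a given speed grade."""
    return CLOCK_PERIOD_NS[grade] * TSD_FRACTION


if __name__ == "__main__":
    for grade in CLOCK_PERIOD_NS:
        print(f"{grade}: tSD = {desired_tsd_ns(grade):.2f} ns")
```

Under these assumed periods, the targets span the 2ns to 1ns range quoted above.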

A DLL-based multi-phase clock generator must have anti-harmonic-lock capability; otherwise the multi-phase clock generation fails. In previous DLL designs [3-4], a lock detector is used to prevent locking to a harmonic of the input clock period. A similar concept is proposed in [5], which uses a Time-to-Digital Converter (TDC) to measure the clock period and avoid false lock. After the DLL is locked, the total delay of the whole delay line equals the input clock period.

However, if the required number of output phases or the required maximum operating frequency increases, the DLL architectures in [3, 5] face the difficult challenge of building a high-resolution delay cell that also meets the minimum intrinsic delay requirement. As a result, the operating range of previous DLLs is severely limited.

In this paper, a new DLL-based multi-phase clock generator architecture is proposed that lets the DLL lock to a harmonic of the input clock period and still produce a correct multi-phase clock output. The proposed architecture eases the design challenge of building a high-resolution delay line with minimum intrinsic delay, so the operating frequency range of the DLL can be extended.



FIGURE 1. READ OPERATION TIMING BUDGET

The proposed DLL is implemented with a 0.13µm 1P8M CMOS process Structured ASIC cell-library. Its operating frequency ranges from 100MHz to 200MHz to meet the DDR-200/266/333/400 specifications. The power consumption of the proposed DLL is 4.1mW at DDR-200 and 9.0mW at DDR-400.

#### **PROPOSED DLL ARCHITECTURE**

The proposed DLL architecture is shown in Figure 2. Like most DLL-based multi-phase clock generators [3, 5], the DLL has a multi-stage delay line whose stages share the same control code (dline\_control) to generate equally spaced multi-phase clock outputs. The TDC, which is used in the closed loop to measure the reference clock (FREF) period [5], provides the range selection control for the DLL controller.

In previous DLL designs [3-5], when the DLL is locked, the total delay of the whole delay line equals one period of the reference clock ($T_{FREF}$), so each delay stage must provide a delay of $T_{FREF}/5$. As a result, the minimum delay of each delay stage must be smaller than $T_{FREF}/5$ even in the worst case.

Sometimes it is difficult to meet this minimum delay constraint when using standard cells to build a high-resolution delay cell. Thus in the proposed DLL, the TDC measures two periods of the reference clock and makes the DLL lock to $2 \times T_{FREF}$. After the DLL is locked, each delay stage has a delay of $2 \times T_{FREF}/5$, so the minimum delay constraint for each delay stage is relaxed to twice its original value. The total delay from DQS to DQSD becomes $3 \times (2 \times T_{FREF}/5) = 1.2 \times T_{FREF}$, which means the phase shift between DQS and DQSD is still $0.2 \times T_{FREF}$. As a result, the desired tSD delay can be generated by the proposed DLL.



FIGURE 2. THE PROPOSED DLL ARCHITECTURE

The output order of the multi-phase clock is rearranged. The output multi-phase clock signals are also shown in Figure 2 and are named CKOUT0 to CKOUT4. The phase shift between FREF and P2 is $3 \times (2T_{FREF}/5) \bmod T_{FREF} = 0.2 \times T_{FREF}$, so P2 is assigned to CKOUT0. Similarly, the phase shift of P0 is $0.4 \times T_{FREF}$, the phase shift of P3 is $0.6 \times T_{FREF}$, the phase shift of P1 is $0.8 \times T_{FREF}$, and the phase shift of P4 is 0. As a result, the output multi-phase clock signals are mapped as follows: {CKOUT0, CKOUT1, CKOUT2, CKOUT3, CKOUT4} = {P2, P0, P3, P1, P4}.
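The tap reordering can be checked with modular arithmetic. The following is a minimal Python sketch, assuming tap P*k* sits after *k*+1 delay stages and that the zero-phase tap is placed last, as in the mapping above; the function names `tap_phases` and `ckout_mapping` are hypothetical.

```python
from fractions import Fraction


def tap_phases(stages: int = 5, lock_multiple: int = 2) -> list:
    """Phase of taps P0..P(stages-1) as a fraction of T_FREF, modulo one period.

    Tap Pk is assumed to sit after k+1 stages, each delaying by
    lock_multiple * T_FREF / stages once the DLL is locked.
    """
    stage_delay = Fraction(lock_multiple, stages)
    return [((k + 1) * stage_delay) % 1 for k in range(stages)]


def ckout_mapping(stages: int = 5, lock_multiple: int = 2) -> list:
    """Order the taps by increasing phase, with the zero-phase tap last."""
    phases = tap_phases(stages, lock_multiple)
    order = sorted(range(stages), key=lambda k: (phases[k] == 0, phases[k]))
    return [f"P{k}" for k in order]


if __name__ == "__main__":
    print(tap_phases())     # phases of P0..P4 as fractions of T_FREF
    print(ckout_mapping())  # tap assigned to CKOUT0..CKOUT4
```

With five stages and a lock to $2 \times T_{FREF}$, `ckout_mapping()` returns `['P2', 'P0', 'P3', 'P1', 'P4']`. As a general observation rather than a claim from the text, the taps produce distinct phases only when the lock multiple and the number of stages share no common factor, which holds for 2 and 5.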

Because the proposed DLL can lock to a harmonic of the input clock period and still produce a correct multi-phase clock output, the architecture helps to raise the multi-phase clock rate and to increase the number of multi-phase clock signals with a high-speed reference clock input. It also increases the maximum operating frequency of the DLL.

### **CIRCUIT IMPLEMENTATION**

Figure 3 shows the circuit of one delay stage. The whole delay line consists of five identical delay stages. Each delay stage is built from three cascaded sub-stages: a range selection stage, a coarse-tuning stage, and a fine-tuning stage.

In the fine-tuning stage, digitally controlled varactors (DCVs) [7], implemented with standard cells, are used to achieve very fine resolution. In Figure 3, a change of the fine-tuning control code (FCON[31:0]) finely adjusts the capacitive loading on the F\_OUT net. When implemented with a 0.13µm 1P8M CMOS process Structured ASIC cell-library, the worst-case resolution of this fine-tuning stage is 1.4ps with good linearity, and its total controllable delay range is 37ps. Both the range selection stage and the coarse-tuning stage are implemented using path selectors [5] with different delay cells. The total controllable delay range of the fine-tuning stage must cover the delay step size of the coarse-tuning stage. Likewise, the coarse-tuning stage must cover the delay step size of the range selection stage.
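The coverage requirement between the three sub-stages can be written as a simple check. The following is a minimal Python sketch; only the 37ps fine-tuning range comes from the text, and the coarse and range-selection figures in the usage example are hypothetical placeholders, as is the function name `tuning_ranges_cover`.

```python
def tuning_ranges_cover(fine_range_ps: float, coarse_step_ps: float,
                        coarse_range_ps: float, range_step_ps: float) -> bool:
    """Return True when each finer stage spans one step of the next coarser one.

    Without this overlap, some delay values between two coarse (or range)
    settings could not be reached by the finer stage.
    """
    fine_covers_coarse = fine_range_ps >= coarse_step_ps
    coarse_covers_range = coarse_range_ps >= range_step_ps
    return fine_covers_coarse and coarse_covers_range


if __name__ == "__main__":
    # 37ps is the fine-tuning range from the text; the other values are
    # hypothetical and only illustrate the check.
    print(tuning_ranges_cover(fine_range_ps=37.0, coarse_step_ps=30.0,
                              coarse_range_ps=480.0, range_step_ps=400.0))
```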

The maximum delay of one delay stage must be larger than $2 \times T_{FREF}/5$ in the best case (FF, 1.32V, -40°C), and the minimum delay of one delay stage must be smaller than $2 \times T_{FREF}/5$ in the worst case (SS, 1.08V, 125°C), so that the lock point stays within the tuning range at both corners.
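To put numbers on this constraint, the following minimal Python sketch computes the per-stage delays that must be reachable over the 100MHz to 200MHz operating range. It assumes that the upper bound is set by the lowest frequency and the lower bound by the highest frequency; the function name `required_stage_delay_ns` is hypothetical.

```python
def required_stage_delay_ns(f_min_mhz: float, f_max_mhz: float,
                            stages: int = 5, lock_multiple: int = 2):
    """Return (upper_ns, lower_ns) that each delay stage must be able to reach.

    upper_ns: stage delay needed at the lowest frequency; the stage must reach
              it even at the fast corner (FF, 1.32V, -40C).
    lower_ns: stage delay needed at the highest frequency; the stage must reach
              it even at the slow corner (SS, 1.08V, 125C).
    """
    t_longest_ns = 1e3 / f_min_mhz   # clock period at the lowest frequency
    t_shortest_ns = 1e3 / f_max_mhz  # clock period at the highest frequency
    upper_ns = lock_multiple * t_longest_ns / stages
    lower_ns = lock_multiple * t_shortest_ns / stages
    return upper_ns, lower_ns


if __name__ == "__main__":
    print(required_stage_delay_ns(100.0, 200.0))                   # lock to 2 x T_FREF
    print(required_stage_delay_ns(100.0, 200.0, lock_multiple=1))  # conventional lock
```

Under these assumptions, locking to $2 \times T_{FREF}$ requires each stage to span 2ns to 4ns, whereas a conventional lock to $T_{FREF}$ would require 1ns to 2ns, which is the tighter minimum delay constraint that the proposed architecture relaxes.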



FIGURE 3. THE PROPOSED HIGH RESOLUTION DELAY LINE

The phase detector (PD) used in the proposed DLL is almost the same as the one proposed in [6], with some modifications that enhance its ability to detect small phase errors.



FIGURE 4. PHASE DETECTOR DEAD ZONE MINIMIZATION

Figure 4 shows the modified circuit and signal waveforms of the proposed PD. As in [6], the PD detects only the sign of the phase error (i.e. lead or lag). QU and QD are the outputs of the three-state PD and indicate whether IN leads or lags FB. By using the digital pulse amplifier concept in [6] to extend the pulse width of QU and by inserting a delay on the QD net, the generated OUTU pulse width can be further extended. This means the detected phase error is enlarged by this circuit, so the minimum detectable phase error of the PD is improved. Similarly, the modified circuit for OUTD pulse generation is also shown in Figure 4.

Thus the next-stage digital pulse amplifier can more easily extend the OUTU/OUTD pulse width to meet the timing requirements of the output register. By carefully designing the digital pulse amplifier to extend the output pulse width of the three-state PD, the dead zone of this PD can be reduced to 5ps when it is implemented with a 0.13µm 1P8M CMOS process Structured ASIC cell-library.

Because the delay line output signal (P4) shown in Figure 2 has more capacitive loading than the other multi-phase clock signals, dummy cells must be added to the remaining multi-phase clock signals (P0-P3) to balance the loading of each signal.

During the placement and routing of the DLL, one delay stage must be placed and routed first; its layout is then duplicated and connected to build up the whole delay line.



FIGURE 5. LAYOUT OF THE PROPOSED DLL TEST CHIP

The total gate count of the proposed DLL is 5250, and its area is $452\mu m \times 462\mu m$. Figure 5 shows the layout of the DLL test chip. A digitally controlled oscillator (DCO) is included on the test chip to serve as the reference clock for the DLL; the DLL can also take an external clock as its reference.

The final layout of the proposed design can be generated with standard auto placement and routing (APR) tools, provided that timing constraints are written carefully for each delay cell. Thus the proposed design reduces the design time and design complexity of building the DLL and multi-phase clock generator.

# SIMULATION AND MEASURED RESULTS

Figure 6 shows the transient response of the DLL after system reset. After system reset (i.e. PDN=1), the TDC measures the reference clock (FREF) period. The dline\_in signal shown in Figure 6 goes low for two reference clock periods. This pulse is sent to the TDC and converted into the delay line range control code (range\_control[2:0]), and then the delay line executes range selection.

After the delay line finishes range selection, the total delay of the whole delay line falls into the range $1.5 \times T_{FREF} \leq T_{delay-line} \leq 2.0 \times T_{FREF}$. The DLL controller then fine-tunes the output phase using the PD's UP/DOWN signals. Because the delay range of the delay line is determined first, the false-lock problem does not occur in the proposed design.

When the phase error between the reference clock (FREF) and the delay line output (P4, or CKOUT[4]) is smaller than the PD's dead zone, the DLL is locked, and both multi-phase clock generation and tSD delay generation are complete.
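The lock sequence described above (range selection first, then sign-based fine tuning until the phase error falls inside the dead zone) can be sketched behaviorally. The following minimal Python model is not the authors' controller: it assumes a linear delay line, one control-code step per update applied to all five stages, and a starting delay already inside the range left by range selection. The function name `lock_dll` and its parameters are hypothetical; the 1.4ps step and 5ps dead zone are taken from the figures quoted earlier.

```python
def lock_dll(t_ref_ps: float, initial_delay_ps: float,
             stage_step_ps: float = 1.4, stages: int = 5,
             dead_zone_ps: float = 5.0, max_cycles: int = 10_000):
    """Behavioral sketch of the fine-tuning loop that follows range selection.

    Returns (locked, code, error_ps): code is the number of control-code steps
    applied to every stage, error_ps is the remaining delay error versus 2T.
    """
    target_ps = 2.0 * t_ref_ps  # the proposed DLL locks to 2 x T_FREF
    # Range selection is assumed to have placed the line within [1.5T, 2.0T].
    assert 1.5 * t_ref_ps <= initial_delay_ps <= target_ps
    code = 0
    error_ps = initial_delay_ps - target_ps
    for _ in range(max_cycles):
        if abs(error_ps) < dead_zone_ps:
            break  # phase error is inside the PD dead zone: locked
        # Bang-bang update: the PD reports only the sign of the error.
        code += 1 if error_ps < 0 else -1
        # All stages share the same control code, so one step moves the
        # total delay by stages * stage_step_ps.
        error_ps = initial_delay_ps + code * stages * stage_step_ps - target_ps
    return abs(error_ps) < dead_zone_ps, code, error_ps


if __name__ == "__main__":
    # DDR-200: T_FREF = 10ns; start at the low end of the selected range.
    locked, code, error_ps = lock_dll(t_ref_ps=10_000.0, initial_delay_ps=15_000.0)
    print(locked, code, round(error_ps, 3))
```

In this model one control-code step changes the total delay by 7ps, which is narrower than the 10ps-wide dead-zone window, so the loop always lands inside the window once it approaches the target.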



FIGURE 6. TRANSIENT RESPONSE OF THE DLL (AT DDR-200)



FIGURE 7. DLL AT STEADY STATE (AT DDR-400)

Figure 7 shows the proposed DLL at steady state. When the DLL is locked, the generated multi-phase clock signals (CKOUT[4:0]) are equally spaced within one reference clock period, and the phase shift between DQS and DQSD is $0.2 \times T_{FREF}$. In this simulation, DQS is driven by the reference clock, so the phase shift between CKOUT[0] and DQSD should be 0 (i.e. tSD $= 0.2 \times T_{FREF}$).

The DLL continues to update the delay line control code and keeps tracking the rising edge of the reference clock. Because the proposed DLL uses only the rising edge of the reference clock, it also has good immunity to duty-cycle error. Compared with the previous DLL [4], the extra duty-cycle correction (DCC) circuit is eliminated.

Post-layout Fast-SPICE simulations are performed under different operating conditions to verify the stability of the proposed DLL. Table 1 lists the simulated tSD values against the desired tSD values. Across the input conditions (DDR-200/266/333/400) and operating environments, the generated delay error ($\Delta$tSD) stays below 7.6% of the desired tSD value, which meets the required delay error (< 13%) mentioned in [1].

TABLE 1. DLL IN DIFFERENT OPERATING CONDITIONS

| Input   | Desired tSD (ns) | Simulated tSD (ns) at 1.08V, 125 °C | Simulated tSD (ns) at 1.2V, 25 °C | Simulated tSD (ns) at 1.32V, -40 °C |
|---------|------------------|-------------------------------------|-----------------------------------|-------------------------------------|
| DDR-200 | 2.0              | 1.911                               | 1.972                             | 1.898                               |
| DDR-266 | 1.5              | 1.537                               | 1.568                             | 1.588                               |
| DDR-333 | 1.2              | 1.136                               | 1.291                             | 1.266                               |
| DDR-400 | 1.0              | 0.981                               | 0.976                             | 0.947                               |
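The 7.6% bound can be reproduced from the values in Table 1. The following minimal Python sketch computes the relative error of every entry; the data are copied from the table, while the names `tsd_errors_percent` and `CORNERS` are hypothetical.

```python
# Values copied from Table 1 (ns).
DESIRED_TSD_NS = {"DDR-200": 2.0, "DDR-266": 1.5, "DDR-333": 1.2, "DDR-400": 1.0}
CORNERS = ("1.08V, 125C", "1.2V, 25C", "1.32V, -40C")
SIMULATED_TSD_NS = {
    "DDR-200": (1.911, 1.972, 1.898),
    "DDR-266": (1.537, 1.568, 1.588),
    "DDR-333": (1.136, 1.291, 1.266),
    "DDR-400": (0.981, 0.976, 0.947),
}


def tsd_errors_percent() -> dict:
    """Map (speed grade, corner) to |simulated - desired| / desired in percent."""
    errors = {}
    for grade, values in SIMULATED_TSD_NS.items():
        target = DESIRED_TSD_NS[grade]
        for corner, value in zip(CORNERS, values):
            errors[(grade, corner)] = abs(value - target) / target * 100.0
    return errors


if __name__ == "__main__":
    errors = tsd_errors_percent()
    worst = max(errors, key=errors.get)
    print(worst, f"{errors[worst]:.2f}%")
```

The largest deviation is the DDR-333 entry at 1.2V and 25 °C, at about 7.58% of the desired 1.2ns, which is consistent with the stated bound.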

Figure 8 shows the measured DLL output waveform after lock with a 1.2V power supply at room temperature. Channel 2 shows the reference clock (FREF), and channel 1 shows CKOUT0. The reference clock is a noisy 100MHz clock source with 269ps rms jitter and 950ps peak-to-peak jitter. The jitter histogram of the reference clock is also shown in Figure 8.



FIGURE 8. MEASURED DLL OUTPUT (AT DDR-200)

Ideally, the phase shift between FREF and CKOUT0 should be $360^{\circ} \div 5 = 72^{\circ}$. Due to the delay line resolution limit, the phase detector dead zone, and reference clock jitter, the measured phase shift between FREF and CKOUT0 is $73.6^{\circ}$, which corresponds to a delay error of $(73.6^{\circ} - 72^{\circ}) \div 360^{\circ} \times 10\text{ns} = 44.4\text{ps}$ at DDR-200. The output multi-phase clock signal CKOUT0 has 267ps rms jitter and 920ps peak-to-peak jitter. The measured results show that the jitter of the DLL output multi-phase clock signals depends on the reference clock jitter. Even with a noisy reference clock input, the proposed DLL can still achieve lock and generate the required tSD delay.
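The conversion from measured phase to delay error is a one-line calculation. The following minimal Python sketch reproduces it with the measured values quoted above; the function name `phase_error_ps` is hypothetical.

```python
def phase_error_ps(measured_deg: float, ideal_deg: float, period_ns: float) -> float:
    """Convert a phase offset in degrees into a delay error in picoseconds."""
    return (measured_deg - ideal_deg) / 360.0 * period_ns * 1000.0


if __name__ == "__main__":
    ideal_deg = 360.0 / 5  # five equally spaced phases give 72 degrees
    error_ps = phase_error_ps(measured_deg=73.6, ideal_deg=ideal_deg, period_ns=10.0)
    print(f"{error_ps:.1f} ps")
```

Relative to the desired 2ns tSD at DDR-200, a 44.4ps error is roughly 2.2%.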

The power consumption of the proposed DLL is 4.1mW (at DDR-200) and is 9.0mW (at DDR-400).

# CONCLUSION

In this paper, an all-digital delay-locked loop for DDR SDRAM controller applications is presented. The proposed DLL architecture not only eases the challenge of building a high-resolution delay cell under a minimum delay constraint, but also makes it possible to implement the DLL with standard cells. Thus the proposed architecture reduces both design time and circuit complexity, and it is well suited to many high-speed interface circuits and digital communication applications.

# ACKNOWLEDGMENT

The authors would like to thank their colleagues within the SI2 group of National Chiao Tung University for many fruitful discussions. The Structured ASIC cell-library support from Faraday Technology Corporation is acknowledged as well.

#### REFERENCES

- [1] MICRON Technology Inc., "DDR SDRAM Functionality and Controller Read Data Capture", *DesignLine*, Vol.8, Issue 3, Sep. 1999.
- [2] Jae Joon Kim, Sang-Bo Lee, Tae-Sung Jung, Chang-Hyun Kim, Soo-In Chom, and Beomsup Kim, "A Low-Jitter Mixed-Mode DLL for High-Speed DRAM Applications," *IEEE J. Solid-State Circuits*, Vol.35, pp. 1430-1436, Oct. 2000.
- [3] D.J. Foley, and M. P. Flynn, "CMOS DLL-Based 2-V 3.2-ps Jitter 1-GHz Clock Synthesizer and Temperature-Compensated Tunable Oscillator," *IEEE J. Solid-State Circuits*, Vol.36, pp. 417-423, Mar. 2001.
- [4] Eunseok Song, Seung-Wook Lee, Jeong-Woo Lee, Joonbae Park, Soo-Ik Chae, "A Reset-Free Anti-Harmonic Delay-Locked Loop Using a Cycle Period Detector," *IEEE J. Solid-State Circuits*, Vol.39, pp. 2055-2061, Nov. 2004.
- [5] Ching-Che Chung and Chen-Yi Lee, "A New DLL-Based Approach for All-Digital Multiphase Clock Generation," *IEEE J. Solid-State Circuits*, Vol.39, pp. 469-475, Mar. 2004.
- [6] Ching-Che Chung and Chen-Yi Lee, "An All-Digital Phase-Locked Loop for High-Speed Clock Generation," *IEEE J. Solid-State Circuits*, Vol.38, pp. 347-351, Feb. 2003.
- [7] Pao-Lung Chen, Ching-Che Chung, and Chen-Yi Lee, "A Novel Digitally-Controlled Varactor for Portable Delay Cell Design," *IEICE Trans. Fundamentals*, Vol. E87-A, pp. 3324-3326, Dec. 2004.