# All-Digital Delay-Locked Loop for 3D-IC Die-to-Die Clock Synchronization

Ching-Che Chung and Chi-Yu Hou Department of Computer Science and Information Engineering, National Chung Cheng University, No. 168 University Rd., Min-Hsiung, Chia-Yi, Taiwan Email: wildwolf@cs.ccu.edu.tw

Abstract - In this paper, an all-digital delay-locked loop (ADDLL) for 3D-IC die-to-die clock synchronization with through-silicon vias (TSVs) is presented. The proposed ADDLL can tolerate delay variations in TSVs and synchronize the clock signals in multiple layers of a given 3D-IC. First, after the system is reset, the proposed ADDLL uses two high-resolution delay lines, which are composed of digitally controlled varactors (DCVs), to compensate for the delay variations in the TSVs. Subsequently, the proposed ADDLL compensates for the clock skew between the clock signals in multiple layers of the 3D-IC. After the ADDLL is locked, the clock skew (phase error) is eliminated, and data transfer between dies can be performed synchronously. The proposed design can operate from 300MHz to 1GHz. The proposed ADDLL is implemented in a standard performance 90nm CMOS process, and the area of the ADDLL per die is 0.045mm<sup>2</sup>. The power consumption of the proposed ADDLL is 3.27mW at 1GHz, and the maximum phase error between the clock signals in multiple layers of a given 3D-IC is 21.9ps.

*Index Terms* - all-digital delay-locked loop, through-silicon via (TSV), 3D-IC, digitally controlled delay line.

## I. INTRODUCTION

According to Moore's Law, more and more transistors can be integrated on a single chip as semiconductor technologies improve. However, in a system-on-a-chip (SoC), some modules, such as dynamic random access memory (DRAM) or radio-frequency (RF) circuits, require a special process technology rather than the logic process technology. Thus, they should be fabricated in different processes to minimize the overall design cost. To integrate these fabricated dies, bonding wires can be used for die-to-die connection. In recent years, die-to-die connection through through-silicon vias (TSVs) has become more popular because TSVs have a much smaller delay than bonding wires. TSVs can be placed anywhere on the dies; thus, the number of I/Os is increased, and the wire length between dies can be greatly reduced. However, TSVs can be faulty due to incomplete fills at pre-bond or misalignment during post-bonding. As a result, signals sometimes need to be re-routed through spare TSVs. In addition, defective TSVs can exhibit a longer delay than the ideal value. Consequently, the propagation delay through different TSVs can vary widely, and inter-die clock distribution needs a way to tolerate unexpected TSV delay variations.

Many studies [5]-[8] have addressed the TSV variation problem. In [5], the impact of TSV open defects is analyzed, and it is shown that resistive-open defects increase the TSV propagation delay. In [6]-[8], ring oscillator (RO)-based architectures are proposed to detect TSV variations. In [6], variable output thresholding is proposed: parametric delay faults are detected by dynamically switching the driving ability of the inverter and observing the output frequency of the RO. That work shows that the maximum TSV delay variation can be up to about 500ps. In [7], which focuses on pre-bond TSV test, an RO architecture is also used to detect TSV propagation delay variations caused by changes in the RC parameters. The proposed pre-bond TSV test can detect leakage and resistive-open faults during manufacturing test. In accordance with the above results, the propagation delay of TSVs varies with process variations.

In high-speed SoC design, the global clock distribution through clock tree buffers and clock network routing should be carefully designed to minimize the clock skew between modules. The delay-locked loops (DLLs) [1] and phase-locked loops (PLLs) are widely used to eliminate the clock skew between the local clock and the global clock. Therefore, for 3D IC clock synchronization, a DLL-based data self-aligner (DBDA) [2] is presented to reduce the data confliction time of the memory's outputs with stacked dies. In the DBDA, a replica TSV delay is required for the DLL circuit. However, since unexpected TSV delay variations may occur due to faulty TSVs, the DBDA may still have large phase error after the DLL is locked.

A dual-locking DLL [3] is proposed for die-to-die clock synchronization. The dual-locking DLL does not need to replicate the delay of TSVs, and therefore, the phase error caused by the mismatch in the replica TSV delay and the real TSV delay will not happen. However, the dual-locking DLL needs to continue fine-tuning the two DLLs in an interleaved manner to keep maintaining the phase alignment between the clock signals in multiple layers of a 3D-IC. As a result, the dual-locking DLL needs to regularly switch the direction of the forward path and the feedback path, and perform fine-tuning in two DLLs, which may cause a relatively large phase error during phase maintaining mode.

A dual-delay-locked loop (D-DLL) [4] is proposed for die-to-die clock deskew applications. Two analog charge-pump-based DLLs are used in this design. However, a special bidirectional buffer is required to transmit signals simultaneously in both directions on a single TSV. In addition, the two DLLs work at the same time, which increases the design complexity of the D-DLL. In advanced CMOS processes, the leakage current of the MOS transistors and the high voltage-control gain required by a voltage-controlled delay line that covers a wide frequency range at a low supply voltage become design challenges for the D-DLL.

In this paper, an all-digital delay-locked loop (ADDLL) for 3D-IC die-to-die synchronization with two TSVs is presented. The proposed ADDLL uses two high-resolution delay lines with digitally controlled varactors (DCVs) to compensate for the delay variations of the two TSVs. Then, two digitally controlled delay lines (DCDLs) are used to eliminate the phase error between the clock signals in the two dies of the 3D-IC. After the ADDLL is locked, data can be transferred between dies synchronously with a high-speed clock through a large number of TSVs concurrently.

The rest of the paper is organized as follows: Section II describes the overall architecture, circuit design details, TSV delay variation compensation procedure, and the locking mechanism of the proposed ADDLL. The experimental results at three process, voltage, and temperature (PVT) cases are discussed in Section III. Finally, Section IV concludes with a summary.

This work was supported in part by the National Science Council of Taiwan, under Grant NSC102-2221-E-194-063-MY3.

\* : dcdl\_code[10:0]

FIGURE. 1 THE PROPOSED ADDLL ARCHITECTURE.

## II. SYSTEM ARCHITECTURE

Fig. 1 shows the architecture of the proposed ADDLL for die-to-die clock synchronization in a given 3D-IC. The ADDLL is composed of two digitally controlled delay lines (DCDL\_A and DCDL\_B), two digitally controlled varactor-based delay lines (DCV\_A and DCV\_B), two ADDLL controllers (CTRL\_A and CTRL\_B), two phase detectors (PD\_A and PD\_B), two frequency-divided-by-2 circuits, and six tri-state buffer groups (buf\_A to buf\_F). The DCDL [9] is composed of a coarse-tuning delay stage and a fine-tuning stage. DCV\_A and DCV\_B are used to compensate for the TSV delay variations. DCDL\_A and DCDL\_B are used to compensate for the phase error between clk\_div2 and fb\_div2.



FIGURE. 2 THE PROPOSED DCV-BASED DELAY LINE ARCHITECTURE.

Fig. 2 shows the proposed DCV-based delay line architecture (DCV\_A and DCV\_B). It is composed of a bypass inverter chain and a DCV delay line. There are 63 NAND gates used as digitally controlled varactors to provide a fine-resolution delay line. DCV\_A and DCV\_B are used to compensate for the delay variations between the two TSVs. The dcv\_en signal controls the bypass inverter chain to provide a longer propagation delay in best-case PVT conditions. In worst-case PVT conditions, dcv\_en is set to 0 to reduce the overall delay of the DCV-based delay line. DCDL\_A and DCDL\_B are controlled by the same delay line control code (dcdl\_code[10:0]). The delay line control code adjusts the propagation delay of both the upper delay path and the lower delay path.

The proposed digitally controlled delay line [9] (DCDL\_A and DCDL\_B) is composed of a coarse-tuning delay stage and a fine-tuning stage. The coarse-tuning delay stage is composed of 63 coarse-tuning delay units. Each coarse-tuning delay unit is composed of three NAND gates and a dummy cell. The dummy NAND gate is added to balance the wire capacitance. The fine-tuning delay stage [9] is composed of two parallel-connected tri-state buffer arrays, which operate as an interpolator circuit to achieve a fine resolution.
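
As a rough behavioural illustration of a coarse/fine delay line of this kind, the Python sketch below maps an 11-bit control code to a delay. It is not the circuit in [9]: the split of dcdl\_code[10:0] into 6 coarse bits and 5 fine bits, the perfectly linear interpolation, and the zero default intrinsic delay are assumptions made for illustration. The 93 ps coarse step is the worst-case coarse-tuning resolution reported in Section III.

```python
FINE_BITS = 5            # ASSUMPTION: low 5 bits of dcdl_code drive the interpolator
CODE_BITS = 11           # dcdl_code[10:0] in the text
COARSE_STEP_PS = 93.0    # worst-case coarse resolution quoted in Section III


def dcdl_delay_ps(code: int, intrinsic_ps: float = 0.0) -> float:
    """Return the modelled delay (ps) for an 11-bit dcdl_code value."""
    if not 0 <= code < (1 << CODE_BITS):
        raise ValueError("dcdl_code must fit in 11 bits")
    coarse = code >> FINE_BITS               # number of coarse units in the path (0..63)
    fine = code & ((1 << FINE_BITS) - 1)     # interpolation weight (0..31)
    # ASSUMPTION: the interpolator splits one coarse step into 2**FINE_BITS equal parts.
    return intrinsic_ps + (coarse + fine / (1 << FINE_BITS)) * COARSE_STEP_PS
```

Under these assumptions the delay increases monotonically with the code, which is the property a bang-bang controller such as CTRL\_A relies on.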



FIGURE. 3 TIMING DIAGRAM OF TSV DELAY VARIATIONS COMPENSATION.

There are three steps in the proposed ADDLL to achieve die-to-die clock synchronization. **First**, when the ADDLL is reset, the path\_control signal is set to zero, and DCDL\_A, DCDL\_B, DCV\_A, and DCV\_B are set to provide a minimum delay time. In the upper delay path, the DIE1\_CLK signal passes through buf\_A, DCDL\_A, buf\_B, TSV1, DCV\_A, and buf\_E to the phase detector B (PD\_B); the resulting signal is denoted as dcva\_to\_pd. Similarly, in the lower delay path, the DIE1\_CLK signal passes through buf\_C, DCDL\_B, buf\_D, TSV2, DCV\_B, and buf\_F to PD\_B; the resulting signal is denoted as dcvb\_to\_pd. In the proposed ADDLL, the six tri-state buffer groups (buf\_A to buf\_F) are designed with the same tri-state buffers, and the delay time of DCDL\_A is the same as that of DCDL\_B. Therefore, the phase error between the dcva\_to\_pd signal and the dcvb\_to\_pd signal comes from the delay variations between TSV1 and TSV2. Fig. 3 shows the timing diagram of TSV delay variations compensation. In Fig. 3, the dcva\_to\_pd signal leads the dcvb\_to\_pd signal, and thus PD\_B sends the dcv\_up signal to CTRL\_B to increase the delay time of DCV\_A. After the output of PD\_B changes polarity twice (from dcv\_up to dcv\_down or from dcv\_down to dcv\_up), the dcv\_lock signal is pulled high to stop tuning the control codes of DCV\_A and DCV\_B. At that point, the phase error between the dcva\_to\_pd signal and the dcvb\_to\_pd signal is eliminated, which means that the delay variations between TSV1 and TSV2 are compensated.
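
The first step can be sketched as a simple bang-bang search. The Python model below is illustrative only: the 5 ps DCV step, the rule that a dcv\_down decision first removes delay from DCV\_A before adding delay to DCV\_B (and vice versa for dcv\_up), and the function and parameter names are assumptions, not details given in the paper. Only the 6-bit code range and the "lock after two polarity changes" rule come from the text.

```python
MAX_CODE = 63  # dcva[5:0] and dcvb[5:0] are 6-bit codes in the text


def compensate_tsv_skew(tsv1_ps, tsv2_ps, step_ps=5.0, max_cycles=256):
    """Return (code_a, code_b, residual_skew_ps, locked) for a bang-bang DCV search."""
    code_a = code_b = 0          # both DCV lines start at minimum delay after reset
    prev_dir = None
    polarity_changes = 0

    def skew():
        # Positive skew means dcva_to_pd arrives first (leads dcvb_to_pd).
        return (tsv2_ps + code_b * step_ps) - (tsv1_ps + code_a * step_ps)

    for _ in range(max_cycles):
        direction = "up" if skew() > 0 else "down"   # idealized PD_B decision
        if prev_dir is not None and direction != prev_dir:
            polarity_changes += 1
            if polarity_changes == 2:                # dcv_lock: stop tuning
                return code_a, code_b, skew(), True
        prev_dir = direction
        if direction == "up":                        # slow down path A
            if code_b > 0:
                code_b -= 1
            else:
                code_a = min(code_a + 1, MAX_CODE)
        else:                                        # slow down path B (ASSUMPTION)
            if code_a > 0:
                code_a -= 1
            else:
                code_b = min(code_b + 1, MAX_CODE)
    return code_a, code_b, skew(), False
```

For example, `compensate_tsv_skew(100.0, 276.4)` models a 176.4 ps mismatch with TSV2 slower; the search stops with DCV\_A at code 35 and a residual skew below one DCV step, which is the best a bang-bang loop of this kind can do.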

**Second**, after the TSV delay variations are compensated, the clock\_gate signal is set to zero for three consecutive clock cycles to stop the DIE1\_CLK signal from propagating to the upper delay path. The clock gating is performed so that the first positive edge of the fb\_clk signal can be recognized in the next locking procedure. **Third**, after clock gating is performed, the path\_control signal and the clock\_gate signal are pulled high, and the DIE1\_CLK signal passes through buf\_A, DCDL\_A, buf\_B, TSV1, DCV\_A, buf\_E, buf\_F, DCV\_B, TSV2, buf\_D, DCDL\_B, buf\_C, and a frequency-divided-by-2 circuit to the phase detector A (PD\_A); the resulting signal is denoted as fb\_div2. In addition, the DIE1\_CLK signal is divided by 2 and sent to PD\_A; this signal is denoted as clk\_div2. PD\_A detects the phase relationship between the clk\_div2 signal and the fb\_div2 signal and outputs the dcdl\_up and dcdl\_down signals to CTRL\_A. CTRL\_A outputs the delay line control code (dcdl\_code[10:0]) to adjust the delay time of DCDL\_A and DCDL\_B.



FIGURE. 4 TIMING DIAGRAM OF DIE-TO-DIE CLOCK SYNCHRONIZATION.

$$
\begin{aligned}
T_{buf\_A} + T_{DCDL\_A} + T_{buf\_B} + T_{TSV1} + T_{DCV\_A} + T_{buf\_E} \qquad & \\
{} + T_{buf\_F} + T_{DCV\_B} + T_{TSV2} + T_{buf\_D} + T_{DCDL\_B} + T_{buf\_C} &= 2N \times T_{DIE1\_CLK} \\
2 \times \left( T_{buf\_A} + T_{DCDL\_A} + T_{buf\_B} + T_{TSV1} + T_{DCV\_A} + T_{buf\_E} \right) &= 2N \times T_{DIE1\_CLK}
\end{aligned}
\tag{1}
$$

Fig. 4 shows the timing diagram of die-to-die clock synchronization. After the clock\_gate signal is pulled high, CTRL\_A starts to align the phases of the clk\_div2 signal and the fb\_div2 signal. Because DCDL\_A and DCDL\_B are set to a minimum delay time in the beginning, the fb\_div2 signal leads the clk\_div2 signal. CTRL\_A keeps increasing the delay time of DCDL\_A and DCDL\_B until the polarity of PD\_A changes from dcdl\_down to dcdl\_up. The lock condition of the ADDLL can be expressed as Eq. 1. Since the total delay of the upper delay path equals that of the lower delay path once the TSV delay variations are compensated, the total delay of the upper path equals N×T<sub>DIE1\_CLK</sub> after the ADDLL is locked, which means that the phase error between the DIE1\_CLK signal and the DIE2\_CLK signal is cancelled.
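
The lock condition of Eq. 1 can be checked numerically with a small sweep. The sketch below is a simplified model, not the controller in the paper: the phase detector is idealized as flipping polarity exactly when the round-trip delay reaches the next multiple of 2×T<sub>DIE1\_CLK</sub>, the buffers, TSV, and DCV line of one path are lumped into a single `fixed_one_way_ps` value, and all numeric values in the usage note are hypothetical.

```python
import math


def lock_dcdl(fixed_one_way_ps, period_ps, dcdl_min_ps, dcdl_step_ps, max_code=2047):
    """Sweep dcdl_code upward until the round trip reaches 2*N*T.

    Returns (code, n, residual_ps), where residual_ps is the remaining
    DIE1_CLK to DIE2_CLK skew of the one-way (upper) path.
    """
    def one_way(code):
        # Upper-path delay: lumped buffers + TSV + DCV, plus the DCDL delay.
        return fixed_one_way_ps + dcdl_min_ps + code * dcdl_step_ps

    # Smallest N whose target 2*N*T is not shorter than the minimum round trip.
    n = math.ceil(2 * one_way(0) / (2 * period_ps))
    for code in range(max_code + 1):
        round_trip = 2 * one_way(code)          # lower path mirrors the upper path
        if round_trip >= 2 * n * period_ps:     # idealized PD_A polarity change
            return code, n, one_way(code) - n * period_ps
    raise RuntimeError("DCDL range too small to reach lock")
```

With the hypothetical values `lock_dcdl(903, 1000, 300, 10)`, the sweep stops at code 80 with N = 2, and the one-way delay exceeds 2×T by 3 ps. In this model the residual skew is always smaller than one DCDL step, which is why the fine-tuning resolution bounds the achievable phase error.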

## III. EXPERIMENTAL RESULTS



FIGURE. 5 LAYOUT OF THE TEST CHIP.

The proposed ADDLL is implemented in a standard performance 90nm 1P9M CMOS process with a 1.0V power supply. Fig. 5 shows the layout of the test chip, and the active area of the test chip is $300\mu m \times 300\mu m$. The area of the proposed ADDLL per die is 0.045mm<sup>2</sup>, and two delay lines are added in the test chip to emulate the TSV delay. Table I shows the simulated delay time of the DCV-based delay line (DCV\_A and DCV\_B) with PVT variations. The worse resolution of the TSV delay variations. The proposed DCDL is composed of a coarse-tuning delay line and a fine-tuning delay line. The worst-case coarse-tuning resolution of the DCDL is 93ps with PVT variations.

TABLE I. DELAY TIME OF DCV-BASED DELAY LINE.

|                    | FF    | TT     | SS     |
|--------------------|-------|--------|--------|
| Intrinsic<br>Delay | 496ps | 676ps  | 1046ps |
| Maximum<br>Delay   | 762ps | 1024ps | 1585ps |
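
The values in Table I give a feel for the tuning range of the DCV-based delay line. The short Python sketch below computes the range at each corner and an average step per cell. It assumes that the difference between the maximum and intrinsic delay is produced entirely by the 63 DCV cells and is spread evenly across them; the paper does not state either point, so the per-step figures are estimates only.

```python
DCV_CELLS = 63  # number of NAND-gate varactors stated in Section II

# (intrinsic delay, maximum delay) in ps, copied from Table I
TABLE_I = {"FF": (496, 762), "TT": (676, 1024), "SS": (1046, 1585)}


def dcv_range_and_step(corner):
    """Return (tuning range in ps, average step in ps) for one PVT corner."""
    intrinsic, maximum = TABLE_I[corner]
    tuning_range = maximum - intrinsic
    # ASSUMPTION: the range is divided evenly over the 63 DCV cells.
    return tuning_range, tuning_range / DCV_CELLS


for corner in TABLE_I:
    rng, step = dcv_range_and_step(corner)
    print(f"{corner}: range = {rng} ps, average step = {step:.2f} ps")
```

Even at the FF corner the range (266 ps) is larger than the 176.4 ps TSV mismatch used in the simulation of Fig. 6.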



FIGURE. 6 SIMULATION WAVEFORM OF THE PROPOSED ADDLL.

Fig. 6 shows the simulation waveform of the proposed ADDLL with a 1GHz input clock. After the ADDLL is reset, the path\_control signal is set to zero, and DCDL\_A, DCDL\_B, DCV\_A, and DCV\_B are set to provide a minimum delay time. CTRL\_B adjusts DCV\_A and DCV\_B according to the output of PD\_B, increasing the delay time of DCV\_A or DCV\_B until the phase error between the dcva\_to\_pd signal and the dcvb\_to\_pd signal is eliminated, which means that the delay variations between TSV1 and TSV2 are compensated. After the TSV delay variations are compensated, the DCV-based delay line control codes (dcva[5:0] and dcvb[5:0]) are fixed. Then, the path\_control signal is pulled high, and CTRL\_A adjusts DCDL\_A and DCDL\_B according to the output of PD\_A to reduce the phase error between the clk\_div2 signal and the fb\_div2 signal. After the ADDLL is locked, the phase error between the clk\_div2 signal and the fb\_div2 signal is eliminated, and the phase error between the DIE1\_CLK signal and the DIE2\_CLK signal is also cancelled, as explained in Section II. In Fig. 6, the delay variation between TSV1 and TSV2 is 176.4ps. After the proposed ADDLL is locked, the phase error between the DIE1\_CLK signal and the DIE2\_CLK signal is reduced to 21.9ps.
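
To put the 21.9 ps residual phase error in perspective, it can be expressed as a fraction of the 1 GHz clock period. The helper below is plain arithmetic on the numbers reported above; the function name is only illustrative.

```python
def phase_error_fraction(error_ps, freq_hz):
    """Return the phase error as a fraction of one clock period."""
    period_ps = 1e12 / freq_hz      # 1 GHz gives a 1000 ps period
    return error_ps / period_ps


frac = phase_error_fraction(21.9, 1e9)
print(f"{frac:.2%} of the clock period, {frac * 360:.2f} degrees")
```

At 1 GHz this is roughly 2.2% of the period, or about 7.9 degrees of phase.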

TABLE II. PERFORMANCE COMPARISONS

|                                    | Proposed             | JSSC'13 [2]           | TCAS-I'13 [3]          | ISCAS'12 [4]            |
|------------------------------------|----------------------|-----------------------|------------------------|-------------------------|
| Type                               | All-Digital          | All-Digital           | All-Digital            | Analog                  |
| Process                            | 90nm                 | 130nm                 | 90nm                   | 0.18µm                  |
| Supply<br>Voltage                  | 1.0V                 | 1.2V                  | 1.0V                   | 1.8V                    |
| Frequency                          | 300MHz<br>~<br>1 GHz | 200MHz<br>~<br>1.6GHz | 50 MHz<br>~<br>600 MHz | 556 MHz<br>~<br>1.5 GHz |
| Phase<br>Error                     | < 21.9ps             | <50ps*                | < 15.8ps               | < 2ps                   |
| Area per<br>Die (mm <sup>2</sup> ) | 0.045                | 0.06                  | 0.0044                 | N/A                     |
| Power                              | 3.27mW<br>@1GHz      | 0.9mW<br>@1.6GHz      | 1.8mW<br>@600MHz       | 56mW<br>@1.5GHz         |

\*if replica delay matched perfectly
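
Because the designs in Table II report power at different clock frequencies, a crude way to compare them is power per GHz. The sketch below only divides the table entries; it ignores process node, supply voltage, and architecture differences, so it should be read as a rough normalization and not as a figure of merit used by the paper.

```python
# (power in mW, frequency in GHz at which that power is reported), from Table II
TABLE_II_POWER = {
    "Proposed": (3.27, 1.0),
    "[2]": (0.9, 1.6),
    "[3]": (1.8, 0.6),
    "[4]": (56.0, 1.5),
}


def power_per_ghz(design):
    """Return the reported power divided by its operating frequency (mW/GHz)."""
    power_mw, freq_ghz = TABLE_II_POWER[design]
    return power_mw / freq_ghz


for design in TABLE_II_POWER:
    print(f"{design}: {power_per_ghz(design):.2f} mW/GHz")
```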

Table II compares the performance of the proposed ADDLL with state-of-the-art designs. The DBDA [2] needs to replicate the inter-die TSV wire delay. Therefore, unexpected TSV delay variations can greatly increase the phase error between the clock signals in multiple layers of a 3D-IC after the DLL is locked. In the dual-locking DLL [3], after the DLL is locked, the two DLLs must continue to be fine-tuned in an interleaved manner to maintain the phase alignment. However, regularly switching the direction of the forward path and the feedback path and fine-tuning the two DLLs may cause a relatively large phase error during phase maintaining mode. The dual-delay-locked loop (D-DLL) [4] does not need to switch the direction of the forward path and the feedback path since a bidirectional buffer is applied. However, the two DLLs work at the same time, which makes it more complex to maintain the stability of the D-DLL. The relatively high power consumption is another disadvantage of the analog charge-pump-based D-DLL.

## IV. CONCLUSION

In this paper, an all-digital delay-locked loop (ADDLL) for 3D-IC die-to-die synchronization with two TSVs is presented. The delay variations of the TSVs are compensated before the ADDLL's normal operation. Compared with current DLLs for 3D-IC clock deskew applications, the proposed ADDLL does not need to switch the path during phase maintaining mode. The proposed ADDLL can operate with a 300MHz to 1GHz input clock, and the maximum phase error is smaller than 21.9ps. In addition, the lock-in time is 79 cycles at 1GHz. Furthermore, the proposed ADDLL is implemented with standard cells, so the design can be ported to a different process in a short time. Therefore, the proposed ADDLL is well suited to 3D-IC die-to-die clock synchronization applications.

## ACKNOWLEDGMENT

The authors would like to thank their colleagues in the Silicon Sensor and System (S3) Laboratory of National Chung Cheng University for many beneficial discussions. The EDA tools supported by the National Chip Implementation Center (CIC) are acknowledged as well.

## REFERENCES

- [1] Ching-Che Chung and Chia-Lin Chang, "A wide-range all-digital delay-locked loop in 65nm CMOS technology," in *Proceedings of International Symposium on VLSI Design, Automation, and Test (VLSI-DAT)*, Apr. 2010, pp. 66-69.
- [2] Soo-Bin Lim, Hyun-Woo Lee, Junyoung Song, and Chulwoo Kim, "A 247 μW 800 Mb/s/pin DLL-based data self-aligner for through silicon via (TSV) interface," *IEEE Journal of Solid-State Circuits*, vol. 48, no. 3, pp. 711-723, Mar. 2013.
- [3] Ji-Wei Ke, Shi-Yu Huang, Chao-Wen Tzeng, Ding-Ming Kwai, and Yung-Fa Chou, "Die-to-die clock synchronization for 3-D IC using dual locking mechanism," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 60, no. 4, pp. 908-917, Apr. 2013.
- [4] Ai-Jia Chuang, Yu Lee, and Ching-Yuan Yang, "A chip-to-chip clock-deskewing circuit for 3-D ICs," in *Proceedings of IEEE International Symposium on Circuits and Systems (ISCAS)*, May 2012, pp. 1652-1655.
- [5] Fangming Ye and Krishnendu Chakrabarty, "TSV open defects in 3D integrated circuits: Characterization, test, and optimal spare allocation," in *Proceedings of 49th ACM/EDAC/IEEE Design Automation Conference (DAC)*, Jun. 2012, pp. 1024-1030.
- [6] Yu-Hsiang Lin, Shi-Yu Huang, Kun-Han Tsai, Wu-Tung Cheng, Stephen Sunter, Yung-Fa Chou, and Ding-Ming Kwai, "Parametric delay test of post-bond through-silicon vias in 3-D ICs via variable output thresholding analysis," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 32, no. 5, pp. 737-747, May 2013.
- [7] Sergej Deutsch and Krishnendu Chakrabarty, "Non-invasive pre-bond TSV test using ring oscillators and multiple voltage levels," in *Proceedings of Design, Automation & Test in Europe Conference & Exhibition (DATE)*, Mar. 2013, pp. 1065-1070.
- [8] Jhih-Wei You, Shi-Yu Huang, Ding-Ming Kwai, Yung-Fa Chou, and Cheng-Wen Wu, "Performance characterization of TSV in 3D IC via sensitivity analysis," in *Proceedings of IEEE Asian Test Symposium (ATS)*, Dec. 2010, pp. 389-394.
- [9] Ching-Che Chung and Chang-Jun Li, "A low-power delay-recycled all-digital duty-cycle corrector with unbalanced process variations tolerance," in *Proceedings of International Symposium on VLSI Design, Automation, and Test (VLSI-DAT)*, Apr. 2013.