
## Microelectronics Journal

journal homepage: www.elsevier.com/locate/mejo

# An all-digital delay-locked loop for 3-D ICs die-to-die clock deskew applications



### Ching-Che Chung<sup>\*</sup>, Chi-Yu Hou

Department of Computer Science and Information Engineering, National Chung Cheng University, No. 168 University Rd., Min-Hsiung, Chia-Yi, Taiwan

Index terms: All-digital delay-locked loop; Digitally controlled delay line; Clock synchronization; Clock distribution; De-skew; Through-silicon via (TSV); 3-D IC; 3-D integration

#### Abstract

In a system on a chip (SoC), long global wires typically limit the maximum clock speed. Through-silicon via (TSV) technology has therefore been proposed to shorten these global wires. However, TSV delay variations introduced during manufacturing can disturb data transmission between dies and may prevent an SoC from working properly. In this paper, we present an all-digital delay-locked loop (ADDLL) architecture that synchronizes the clock signals of two dies. The proposed ADDLL is implemented with standard cells in a TSMC 90-nm CMOS process and tolerates process, voltage, and temperature (PVT) variations. In addition, the ADDLL overcomes the TSV delay variation problem using only a single TSV channel. The proposed ADDLL operates over an input frequency range of 195–960 MHz with a maximum measured phase error of 40.64 ps.

#### 1. Introduction

Advances in system-on-chip (SoC) technology allow more modules to be integrated into a single chip. However, two-dimensional (2-D) SoCs can contain several long global wires that make timing closure more difficult and limit the maximum SoC clock rate. To address this problem, through-silicon via (TSV) packaging technology enables the three-dimensional (3-D) integration of dies, which shortens the global wires. As shown in Fig. 1, TSVs can connect two 2-D integrated circuits (ICs) and increase the number of I/Os.

For complex applications, TSVs can be applied to network-on-chip (NoC) systems to connect multiprocessors in 3-D NoC architectures [1,2]. TSVs are also used in 3-D random access memory (RAM) designs [3,4] to increase bandwidth and lower the energy consumption of the memory interface. 3-D power-supply networks have been proposed to reduce IR drop and dynamic noise [5–7], and distributed-topology power planning yields better results than traditional 2-D power planning and clustered power planning. Clock distribution networks in 3-D ICs have been discussed in detail in previous studies [15–22]. Maintaining a symmetric interconnect structure to distribute the clock signal in 3-D ICs is difficult because of TSV delay variations.

Several researchers have evaluated scalable electrical TSV models using parameters such as the TSV diameter, the pitch between TSVs, and the TSV height [8–11]. The RC time constant of a TSV is extremely small compared with the intrinsic gate delay. However, the capacitance of a TSV (diameter > 10 $\mu$m) is much larger than that of an ordinary wire [10,11], so a TSV adds extra delay to its driving gate. As reported in Ref. [3], in a 4-stack 3-D RAM design, the increase in overall propagation delay caused by TSV loading results in a latency of 750 ps at 1333 Mb/s.

The advantages of TSV technology include improved performance, reduced power, and increased chip density. However, the design of 3-D ICs faces several challenges, including yield, power density, design flow, clock distribution, and IC testing. Defects introduced into TSVs during fabrication decrease the yield and reliability of 3-D ICs [14]. In addition, such defects alter the RC parameters of the TSVs and therefore cause propagation delay variations among TSVs. Moreover, TSV-induced stress can affect the performance of circuits in close proximity to TSVs [14]. Several non-invasive TSV testing approaches [12,13] have been proposed to measure the delay variations of nets connected to TSVs. One study reported that for a 5 $\mu$m $\times$ 5 $\mu$m faulty TSV with a fault resistance of 10 k$\Omega$, the delay added to the fault-free TSV delay can be ~400 ps [13].

https://doi.org/10.1016/j.mejo.2017.10.008

Received 19 November 2016; Received in revised form 29 June 2017; Accepted 23 October 2017

0026-2692/© 2017 Published by Elsevier Ltd.

<sup>\*</sup> Corresponding author. E-mail address: wildwolf@cs.ccu.edu.tw (C.-C. Chung).

Fig. 1. 2-D SoC and 3-D IC.

Fig. 2. Clock and data signals between two dies.

The 3-D IC architecture shown in Fig. 2 illustrates the impact of TSV delay variation. Two dies are connected by several TSVs, and data signals are transmitted between them. The clock of DIE1 (DIE1\_CLK) is also sent to DIE2 through a TSV. Fig. 3 shows DIE1\_CLK and DIE1\_DATA being transmitted from DIE1 to DIE2, where they are denoted as DIE1-2\_CLK and DIE1-2\_DATA, respectively. At this point, DIE1-2\_CLK leads DIE1-2\_DATA because of the TSV delay variation. In DIE2, DIE1-2\_DATA is sampled by the positive edge of DIE1-2\_CLK and denoted as DIE2\_DATA. DIE2\_DATA is then sent back to DIE1, where it is denoted as DIE2-1\_DATA. For DIE1-2\_DATA to be captured correctly in DIE2 and for DIE2-1\_DATA to be sampled successfully by the positive edge of DIE1\_CLK, Eqs. (1) and (2), respectively, must be satisfied.

$$T_{\text{hold}} < \Delta < T_{\text{cycle}} - T_{\text{setup}} \tag{1}$$

$$T_{\text{hold}} < T_{\text{clk1-2}} + T_{\text{data2-1}} < T_{\text{cycle}} - T_{\text{setup}} \tag{2}$$

where $T_{\text{hold}}$ and $T_{\text{setup}}$ are the hold-time and setup-time requirements of the D flip-flops, $T_{\text{cycle}}$ is the period of DIE1\_CLK, $T_{\text{clk1-2}}$ is the delay of DIE1\_CLK from DIE1 to DIE2, $T_{\text{data2-1}}$ is the delay of DIE2\_DATA from DIE2 to DIE1, and $\Delta$ is the delay variation between the TSVs. However, $T_{\text{clk1-2}}$, $T_{\text{data2-1}}$, and $\Delta$ are unknown at design time because the TSV delays are determined only after fabrication. As a result, it is difficult to guarantee that both Eqs. (1) and (2) are satisfied.
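To make these timing constraints concrete, the following minimal Python sketch checks Eqs. (1) and (2) for a given set of delays. The function names are illustrative, and all numeric values in the usage lines (hold and setup times, clock and data delays) are hypothetical placeholders, not measured values from this work.

```python
def in_sampling_window(delay_ps: float, t_hold_ps: float,
                       t_setup_ps: float, t_cycle_ps: float) -> bool:
    """Return True if T_hold < delay < T_cycle - T_setup (the form of Eqs. (1)-(2))."""
    return t_hold_ps < delay_ps < t_cycle_ps - t_setup_ps


def die_to_die_transfer_ok(delta_ps: float, t_clk12_ps: float, t_data21_ps: float,
                           t_hold_ps: float, t_setup_ps: float,
                           t_cycle_ps: float) -> bool:
    """Check Eq. (1) (capture in DIE2) and Eq. (2) (capture back in DIE1)."""
    eq1 = in_sampling_window(delta_ps, t_hold_ps, t_setup_ps, t_cycle_ps)
    eq2 = in_sampling_window(t_clk12_ps + t_data21_ps,
                             t_hold_ps, t_setup_ps, t_cycle_ps)
    return eq1 and eq2


# Hypothetical example at 960 MHz (T_cycle ~ 1041.7 ps); hold/setup values are assumed.
T_CYCLE = 1e6 / 960  # period in ps
print(die_to_die_transfer_ok(100, 450, 450, 20, 40, T_CYCLE))  # True: 900 ps round trip fits
print(die_to_die_transfer_ok(100, 450, 600, 20, 40, T_CYCLE))  # False: 1050 ps is too late
```

Because $T_{\text{clk1-2}}$ and $T_{\text{data2-1}}$ are only known after fabrication, this check cannot be evaluated at design time, which is exactly the difficulty described above.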

To simplify data transmission between the dies, the clock signals DIE1\_CLK and DIE1-2\_CLK should be phase-aligned, as illustrated in Fig. 4. Then, to sample DIE2-1\_DATA accurately with the positive edge of DIE1\_CLK, only Eq. (3) must be satisfied. Because the only unknown in Eq. (3) is $T_{\text{data2-1}}$, this constraint is much easier to meet.

$$T_{\text{hold}} < T_{\text{data2-1}} < T_{\text{cycle}} - T_{\text{setup}} \tag{3}$$
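Once the clocks are phase-aligned, the clock-path delay no longer consumes the timing budget. A minimal sketch of the resulting data-delay budget from Eq. (3) follows. The hold and setup values are assumed for illustration only.

```python
from typing import Tuple


def data_delay_budget_ps(f_mhz: float, t_hold_ps: float,
                         t_setup_ps: float) -> Tuple[float, float]:
    """Open interval (T_hold, T_cycle - T_setup) allowed for T_data2-1, per Eq. (3)."""
    t_cycle_ps = 1e6 / f_mhz  # period in ps for a frequency given in MHz
    return t_hold_ps, t_cycle_ps - t_setup_ps


# Hypothetical hold/setup of 20 ps / 40 ps at the two measured frequencies.
for f in (960, 195):
    lo, hi = data_delay_budget_ps(f, 20, 40)
    print(f"{f} MHz: {lo:.0f} ps < T_data2-1 < {hi:.1f} ps")
```

In the unaligned case of Eq. (2), a hypothetical 450 ps clock delay $T_{\text{clk1-2}}$ would lower the upper bound on $T_{\text{data2-1}}$ by the same 450 ps. Phase alignment removes that term.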

Die-to-die clock synchronization for 3-D ICs has been proposed in Refs. [23–27,29]. In the delay-locked loop (DLL)-based data self-aligner (DBDA) [23], a replica TSV delay is used in the DLL to reduce the data conflict time between the memory outputs of the stacked dies, thereby reducing the short-circuit current. However, the replica TSV delay is not reliable without calibration.

A dual-locking DLL [24] was proposed for die-to-die synchronization of 3-D ICs. It requires a long lock-in time and perfect matching between two averaging delay lines. In addition, if the temperature or voltage varies after the first DLL is locked, both DLLs of the dual-locking DLL [24] must be restarted to minimize clock skew. An all-digital delay-locked loop (ADDLL) [26] was also proposed for die-to-die synchronization of 3-D ICs. However, that architecture requires two TSV channels and two high-resolution varactor-based delay lines to compensate for the delay variations between the two TSV channels, so the ADDLL [26] occupies a large chip area. Moreover, as discussed in Ref. [29], within-die process variations in advanced CMOS processes can make the mismatch between the averaging delay lines of the DLLs [24,26] reach hundreds of picoseconds, which can lead to significant clock skew after the DLL is locked.

A dual-delay-locked loop (D-DLL) was proposed for die-to-die clock deskew in 3-D ICs [25]. This design contains two analog charge-pump-based DLLs. However, special bidirectional buffers are required to transmit signals in both directions simultaneously on a single TSV. Furthermore, the two DLLs operate at the same time, which makes it more complex to ensure the stability of both loops. Moreover, in advanced CMOS processes, the high voltage gain of the delay line at a low supply voltage and the leakage current of MOS transistors must be overcome for operation over a wide frequency range. These problems make the D-DLL circuit challenging to design.

Fig. 3. Clock and data signal timing diagram with TSV delay variations (Δ: TSV delay variation from DIE1 to DIE2).

Fig. 4. Clock and data signal timing diagram with clock synchronization.

Fig. 5. Proposed ADDLL for 3-D IC clock synchronization.

Fig. 6. Timing diagram of the locking procedure.

Fig. 7. (a) The pulse generator (PG). (b) Timing diagram of the PG.

In this paper, we present an all-digital delay-locked loop (ADDLL) that achieves die-to-die clock synchronization for 3-D ICs. The proposed ADDLL uses a single TSV channel and a time-to-digital converter (TDC) to reduce the lock-in time. Because no averaging delay lines are used, the extra clock skew caused by delay-line mismatch [24,26,29] is eliminated. The rest of this paper is organized as follows. Section 2 presents the architecture of the proposed ADDLL. Section 3 describes the circuit implementation. Section 4 presents the experimental results. Finally, Section 5 concludes the paper.

#### 2. The proposed ADDLL architecture

Fig. 5 shows the proposed ADDLL for die-to-die clock synchronization in 3-D ICs. The ADDLL is composed of an 11-bit digitally controlled delay line (DCDL), a pulse generator (PG), a TDC-embedded controlled delay line (CDL), a phase detector (PD), two delay circuits (DELAY), two ADDLL controllers (DIE1\_CTL and DIE2\_CTL), and six tri-state buffers. The DCDL is controlled by a 6-bit coarse-tuning control code (coarse\_code [5:0]) and a 5-bit fine-tuning control code (fine\_code [4:0]). DIE1\_CLK is the reference clock of DIE1 and is transmitted to DIE2 as its reference clock. The goal of the proposed ADDLL is to phase-align DIE1\_CLK and DIE2\_CLK, which it achieves in two steps.

Fig. 8. The 11-bit digitally controlled delay line (DCDL).

Fig. 9. The fine-tuning delay line (FDL).

Fig. 10. The TDC-embedded CDL.

Fig. 11. Microphotograph of the proposed ADDLL.

First, when the ADDLL is reset, pg\_en is set high, pass\_clk is set low, and the signal oe of DIE2 is set low. DIE1\_CLK then drives the PG, which generates a single-shot pulse, after which pg\_en is set low. The pulse is transmitted to DIE2 through the TSV channel, passes in DIE2 from the upper delay circuit (tri-state buffer E + DELAY) to the bottom delay circuit (DELAY + tri-state buffer F), and is then transmitted back to DIE1 through the same TSV channel. Meanwhile, the TDC-embedded CDL measures the time difference between the first pulse generated by the PG and the second pulse returned from DIE2. The delay of the TDC-embedded CDL is then set to half of this time difference so that it mirrors the sum of the delays of the TSV channel, tri-state buffer, and DELAY circuit. After the first step is finished, oe is held low permanently.
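The reason for halving the round-trip time can be illustrated with simple arithmetic. The sketch below assumes, as a simplification, that the round trip measured in step 1 consists of the TSV delay twice plus the upper (E + DELAY) and bottom (DELAY + F) paths in DIE2, and it ignores the DIE1-side buffers. The function names and delay values are hypothetical, although the TSV delay is chosen within the 100–450 ps range of the on-chip TSV model described in Section 4.

```python
def mirrored_delay_ps(t_tsv: float, t_buf_e: float, t_delay_up: float,
                      t_delay_dn: float, t_buf_f: float) -> float:
    """Half of the pulse round-trip time, as set in the TDC-embedded CDL (step 1)."""
    # DIE1 -> TSV -> (E + DELAY) -> (DELAY + F) -> same TSV -> DIE1
    round_trip = t_tsv + t_buf_e + t_delay_up + t_delay_dn + t_buf_f + t_tsv
    return round_trip / 2


def one_way_delay_ps(t_tsv: float, t_buf_e: float, t_delay_up: float) -> float:
    """Delay the ADDLL must compensate: T_TSV + T_E + T_DELAY (see Eq. (4))."""
    return t_tsv + t_buf_e + t_delay_up


# Hypothetical values in ps: matched DIE2 paths mirror the one-way delay exactly.
matched = mirrored_delay_ps(300, 15, 100, 100, 15)
target = one_way_delay_ps(300, 15, 100)
# A 10 ps mismatch between the two DELAY circuits leaves a 5 ps mirroring error.
mismatched = mirrored_delay_ps(300, 15, 100, 110, 15)
print(matched - target, mismatched - target)
```

In this model, the TSV delay itself cancels out regardless of its value, and only half of any mismatch between the upper and bottom paths appears as error. This is consistent with the later argument in this section that placing the two DELAY circuits close together in the layout keeps the residual error small.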

Next, the ADDLL locking procedure starts. The signal pass\_clk is set high, and DIE1\_CLK is fed to the TDC-embedded CDL and transmitted to DIE2 through the TSV channel. The TDC-embedded CDL generates the delayed signal mirror\_sig, which mirrors the delay of the TSV channel, tri-state buffer, and DELAY circuit. DIE1\_CTL then increases the delay of the DCDL until mirror\_sig and DIE1\_CLK are phase-aligned. Because the TDC-embedded CDL mirrors the total delay of the TSV channel, tri-state buffer, and DELAY circuit measured in the first step, mirror\_sig and DIE2\_CLK are phase-aligned. Therefore, once mirror\_sig is aligned with DIE1\_CLK, DIE2\_CLK is aligned with DIE1\_CLK as well. Finally, after the ADDLL is locked, DIE1\_CTL enters the maintain mode and fine-tunes the DCDL delay to keep DIE1\_CLK and mirror\_sig phase-aligned.

The propagation delay time between DIE1\_CLK and DIE2\_CLK ( $T_{clk12}$ ) can be expressed as in Eq. (4).

$$\begin{aligned}
T_{clk12} &= T_{DIE1} + T_{TSV} + T_{DIE2} \\
&= T_A + T_{DCDL} + T_B + T_{TSV} + T_E + T_{DELAY}
\end{aligned} \tag{4}$$

where $T_{DIE1}$ and $T_{DIE2}$ are the propagation delays of DIE1\_CLK in DIE1 and DIE2, respectively; $T_A$ and $T_B$ are the delays of tri-state buffers A and B in DIE1; $T_{DCDL}$ is the delay of the DCDL in DIE1; and $T_{TSV}$, $T_E$, and $T_{DELAY}$ are the propagation delays of the TSV channel, tri-state buffer E, and the DELAY circuit in DIE2, respectively. Because the sum $T_{TSV} + T_E + T_{DELAY}$ is mirrored by the TDC-embedded CDL, DIE1\_CTL can tune the DCDL delay to keep DIE1\_CLK and mirror\_sig phase-aligned, which eliminates the phase error between DIE1\_CLK and DIE2\_CLK.
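A short derivation makes the locking argument explicit. It assumes, as a simplification not stated in the original description, that mirror\_sig is taken after the same tri-state buffers A and B and the DCDL as the forward clock path. Under that assumption, its delay relative to DIE1\_CLK is $T_A + T_{DCDL} + T_B + T_{CDL}$. Here $T_{CDL}$ (a symbol introduced for this sketch) is the delay set in the TDC-embedded CDL, $\varepsilon$ is the mirroring error, and $k$ is a positive integer.

$$\begin{aligned}
\text{Lock condition:}\quad & T_A + T_{DCDL} + T_B + T_{CDL} = k\,T_{cycle} \\
\text{Mirroring error:}\quad & \varepsilon = T_{CDL} - \left(T_{TSV} + T_E + T_{DELAY}\right) \\
\text{Substituting into Eq. (4):}\quad & T_{clk12} = k\,T_{cycle} - \varepsilon
\end{aligned}$$

With perfect mirroring ($\varepsilon = 0$), $T_{clk12} = k\,T_{cycle}$, so DIE2\_CLK is aligned with DIE1\_CLK. Otherwise, the residual skew equals the mirroring error, up to the width of the PD locking window within which the lock condition holds.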

Fig. 6 details the timing of the ADDLL locking procedure. When the ADDLL is reset, DIE1\_CLK is fed to the PG, which is enabled by pg\_en. The signal pg\_en is set high by the first negative edge of DIE1\_CLK and set low by the second negative edge, so the PG generates its pulse between the first and second negative edges of DIE1\_CLK. The pulse travels through the TSV channel to DIE2. When the pulse arrives in DIE2, the 2-input OR gate generates the signal or\_sig. The pulse then propagates to DIE2\_CLK through the upper delay circuit (tri-state buffer E + DELAY), producing the second or\_sig pulse. At this point, oe is set high to allow the pulse to be sent back to DIE1. Once the pulse has returned to DIE1, oe is set low. After the TDC operation, the tdc\_code [5:0] of the TDC-embedded CDL is set to represent half of the pulse round-trip time, mirroring the sum of the delays of the TSV channel, tri-state buffer, and DELAY circuit.

The signal pass\_clk is set to high after the delay time of the TSV channel, tri-state buffer, and DELAY circuit is mirrored by setting the tdc\_code [5:0] of the TDC-embedded CDL. Then, DIE1\_CTL starts to increase the coarse-tuning control code of the DCDL (coarse\_code [5:0]) until the lock signal is set to high, which means the rising edge of mirror\_sig has fallen into the locking window of the PD. This is because the total delay time from DIE1\_CLK to mirror\_sig is the same as the delay time from DIE1\_CLK to DIE2\_CLK if the TDC-embedded CDL accurately mirrors the delay time of the TSV channel, tri-state buffer, and DELAY circuit. Therefore, when DIE1\_CLK is phase-aligned with mirror\_sig, DIE2\_CLK will also be phase-aligned with DIE1\_CLK. Finally, DIE1\_CTL continues to fine-tune the DCDL to maintain phase alignment between DIE1\_CLK and mirror\_sig.
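The coarse search described above can be modeled behaviorally as a linear scan. The sketch below is a hypothetical model, not the DIE1\_CTL implementation. It assumes a fixed delay `t_fixed_ps` (the buffers plus the mirrored delay), a uniform coarse step `t_coarse_ps`, and a symmetric PD locking window of ±`lock_window_ps` around a DIE1\_CLK rising edge. All numbers in the usage line are illustrative.

```python
from typing import Optional, Tuple


def coarse_search(t_fixed_ps: float, t_coarse_ps: float, t_cycle_ps: float,
                  lock_window_ps: float,
                  max_code: int = 63) -> Optional[Tuple[int, float]]:
    """Increase coarse_code from 0 until mirror_sig lands in the PD locking window.

    Returns (coarse_code, total_delay_ps) at lock, or None if no code locks.
    """
    for code in range(max_code + 1):
        total = t_fixed_ps + code * t_coarse_ps
        # Distance from the mirror_sig edge to the nearest DIE1_CLK rising edge.
        offset = total % t_cycle_ps
        phase_err = min(offset, t_cycle_ps - offset)
        if phase_err <= lock_window_ps:
            return code, total
    return None


# Hypothetical: 340 ps fixed delay, 50 ps coarse step, 960 MHz clock, +/-25 ps window.
print(coarse_search(340, 50, 1e6 / 960, 25))
```

If the window is at least half a coarse step, the scan cannot jump over the lock point. The remaining sub-step error is then removed by the fine-tuning code in the maintain mode.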

Compared with [24,26,29], the DELAY circuits in DIE2 consist of only a few buffers, which reduce the frequency of the signal tsv\_clk in the pulse-routing state shown in Fig. 6. Therefore, within-die process variations between the two DELAY circuits can easily be reduced by placing them close together in the layout. In addition, if high thermal gradients arise between DIE1 and DIE2 after the ADDLL is locked, the ADDLL can quickly be restarted to realign the clock phase.

Fig. 12. DCO circuit for providing a high-speed on-chip clock.

Fig. 13. Output frequency divider circuit.

#### 3. Circuit implementation

To mirror the delay of the TSV channel, tri-state buffer, and DELAY circuit with the TDC-embedded CDL, the single-shot pulse generated by the PG must have a sufficient pulse width. Fig. 7(a) shows the circuit of the proposed PG. Two tri-state buffers prevent a second pulse from being generated, because only one single-shot pulse is needed. Fig. 7(b) shows the timing diagram of the PG. First, pg\_en is set high and DIE1\_CLK is applied to the PG. Signal "a" is delayed by the tri-state buffer, and signal "b" is delayed by the delay chain. The AND gate then generates the pulse signal PG\_OUT, whose width can be controlled by adjusting the delay of the delay chain.

Fig. 14. Output clock at 960 MHz. (a) Phase error between DIE1\_CLK and DIE2\_CLK. (b) Jitter histogram of DIE2\_CLK.

Fig. 8 shows the 11-bit digitally controlled delay line (DCDL) circuit [28], which is composed of 63 lattice delay units (LDUs), a fine-tuning delay line (FDL), and a delay-line decoder. The coarse-tuning resolution of the delay line is the delay of two NAND gates. Dummy NAND gates balance the capacitive loading of each NAND gate. The 11-bit control code is decoded into a 63-bit thermometer coarse-tuning code (coarse [62:0]) and a 31-bit thermometer fine-tuning code (fine [30:0]), which control the LDU-based coarse delay line and the FDL, respectively.
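The binary-to-thermometer decoding described above can be sketched in a few lines. The bit ordering is an assumption: the sketch treats the 11-bit code as coarse\_code [5:0] in the upper six bits and fine\_code [4:0] in the lower five bits. It also represents each thermometer code as an integer whose lowest bits are set. The function names are hypothetical.

```python
from typing import Tuple


def to_thermometer(value: int, width: int) -> int:
    """Binary -> thermometer code: set the `value` least significant of `width` bits."""
    if not 0 <= value <= width:
        raise ValueError(f"value {value} does not fit a {width}-bit thermometer code")
    return (1 << value) - 1


def decode_dcdl_code(code11: int) -> Tuple[int, int]:
    """Split an 11-bit DCDL code into (coarse[62:0], fine[30:0]) thermometer codes."""
    coarse_code = (code11 >> 5) & 0x3F  # assumed: bits [10:5] = coarse_code[5:0]
    fine_code = code11 & 0x1F           # assumed: bits [4:0] = fine_code[4:0]
    return to_thermometer(coarse_code, 63), to_thermometer(fine_code, 31)


coarse, fine = decode_dcdl_code(0x7FF)  # all-ones control code
print(hex(coarse), hex(fine))
```

With the all-ones control code, fine [30:0] equals 31'h7FFF\_FFFF, which is the fully-on FDL setting discussed in the next paragraph.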

The FDL circuit [28] is composed of two tri-state buffer arrays connected in parallel, two input buffers, and an output inverter (Fig. 9). As shown in Fig. 8, the delay difference between CA\_OUT and CB\_OUT is one coarse-tuning step. The thermometer fine-tuning code (fine [30:0]) controls the driving strength of the tri-state buffer arrays. When the fine-tuning code is fully on (31'h7FFF\_FFFF), the rising edge of OUT is phase-aligned with the falling edge of CA\_OUT. Conversely, when the code is fully off (31'h0000\_0000), the rising edge of OUT is phase-aligned with the falling edge of CB\_OUT. The FDL thus improves the resolution of the DCDL while providing a total controllable delay range equal to one coarse-tuning step.
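As a first-order illustration of how the FDL spans one coarse step, the sketch below models the OUT edge as a linear interpolation between the CA\_OUT and CB\_OUT edges, as a function of the number of enabled bits in fine [30:0]. Linear interpolation is an idealizing assumption: the real curve depends on the driving strengths of the two arrays. Edge times are hypothetical values in ps.

```python
def fdl_edge_ps(t_ca_ps: float, t_cb_ps: float, ones: int, n_bits: int = 31) -> float:
    """Idealized FDL output edge: 0 ones -> CB_OUT edge, n_bits ones -> CA_OUT edge."""
    if not 0 <= ones <= n_bits:
        raise ValueError("ones must be between 0 and n_bits")
    weight = ones / n_bits  # fraction of the fine code that is turned on
    return t_cb_ps - weight * (t_cb_ps - t_ca_ps)


# Hypothetical 50 ps coarse step: each fine step is then about 50/31 ~ 1.6 ps.
step = fdl_edge_ps(100, 150, 0) - fdl_edge_ps(100, 150, 1)
print(fdl_edge_ps(100, 150, 31), fdl_edge_ps(100, 150, 0), round(step, 2))
```

Under this idealization, the 31 enabled-bit increments divide one coarse step into 31 equal fine steps, and the fine range exactly meets the neighboring coarse setting.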

Fig. 15. Output clock at 195 MHz. (a) Phase error between DIE1\_CLK and DIE2\_CLK. (b) Jitter histogram of DIE2\_CLK.

The proposed TDC-embedded CDL comprises 63 lattice delay units (LDUs), 64 DFFs (time-to-digital converter units), a TDC encoder, and a TDC decoder (Fig. 10). First, the delay of the TDC-embedded CDL is set to its maximum value, so that the signal passes through all of the upper-path delay cells and then through all of the bottom-path delay cells. The first pulse generated by the PG can therefore propagate through the whole delay path. When the second pulse returns from DIE2 and enters the TDC-embedded CDL, it triggers the DFFs, which record the time difference between the two pulses. The TDC encoder then encodes the sampled value (tdc\_data [63:0]) into tdc\_code [5:0] and sends it to DIE1\_CTL. Finally, the TDC decoder decodes tdc\_code [5:0] into the thermometer control code (code [62:0]), which sets the delay of the delay path to half the time difference between the pulses.
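The encoder/decoder chain can be sketched behaviorally. This is an assumption-laden model rather than the circuit in Fig. 10. It uses a ones-counting thermometer-to-binary encoder, a common choice that tolerates isolated bubbles, and saturates the count at 63 to fit tdc\_code [5:0]. It also introduces hypothetical per-tap and per-unit delays `tau_tdc_ps` and `tau_cdl_ps` to express the halving. With equal values, the decoder simply selects half as many delay units as the encoder counted.

```python
from typing import Sequence


def tdc_encode(tdc_data: Sequence[int]) -> int:
    """Ones-counting encoder: sampled tdc_data[63:0] -> 6-bit tdc_code (0..63)."""
    return min(sum(1 for bit in tdc_data if bit), 63)


def tdc_decode_half(tdc_code: int, tau_tdc_ps: float = 1.0,
                    tau_cdl_ps: float = 1.0, n_units: int = 63) -> int:
    """Pick a thermometer code[62:0] whose delay is half the measured interval.

    The measured interval is tdc_code * tau_tdc_ps; each enabled unit is assumed
    to add tau_cdl_ps. Rounds half up and clamps to the 63 available units.
    """
    units = int(tdc_code * tau_tdc_ps / (2 * tau_cdl_ps) + 0.5)
    units = max(0, min(units, n_units))
    return (1 << units) - 1  # thermometer code: lowest `units` bits set


# Hypothetical sample: the first pulse had passed 41 of the 64 taps.
sample = [1] * 41 + [0] * 23
code = tdc_encode(sample)
print(code, bin(tdc_decode_half(code)).count("1"))
```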

#### 4. Experimental results

The proposed ADDLL is fabricated in a 90-nm standard-performance (SP) CMOS process using standard cells. Fig. 11 shows a microphotograph of the chip. The core area is $170 \times 170\ \mu\text{m}^2$, and the chip area including the I/O pads is $670 \times 670\ \mu\text{m}^2$. The chip consists of the ADDLL and a tester circuit. A TSV model, implemented as a delay line, emulates the delay of the TSV channel between DIE1 and DIE2 and provides a delay ranging from 100 ps to 450 ps.

#### Table 1

|                        | This Work                              | JSSC'13<br>[23]   | TCAS-I'13<br>[24]                   | ISCAS'12<br>[25]             |
|------------------------|----------------------------------------|-------------------|-------------------------------------|------------------------------|
| Type                   | All-digital                            | All-digital       | All-digital                         | Analog                       |
| Process                | 90 nm                                  | 130 nm            | 90 nm                               | 0.18 µm                      |
| Supply voltage         | 1.0 V                                  | 1.2 V             | 1.0 V                               | 1.8 V                        |
| Number of TSV channels | 1                                      | 2                 | 2                                   | 2                            |
| Frequency              | 195–960 MHz                            | 200 MHz–1.6 GHz   | 50–600 MHz                          | 556 MHz–1.5 GHz              |
| Phase error            | 9.8 ps @ 195 MHz<br>40.64 ps @ 960 MHz | <50 ps @ 400 MHz  | 15.8 ps @ 50 MHz<br>9.6 ps @ 600 MHz | 2 ps @ 1.5 GHz <sup>b</sup>  |
| Area (mm<sup>2</sup>)  | 0.0289                                 | 2.6 <sup>a</sup>  | 0.0088                              | N/A                          |
| Power                  | 5.5 mW @ 960 MHz<br>1.12 mW @ 195 MHz  | N/A               | 1.8 mW @ 600 MHz                    | 56 mW @ 1.5 GHz <sup>b</sup> |

<sup>a</sup> Test chip area.

<sup>b</sup> Pre-layout simulation result.

Because of the speed limitations of the I/O pads provided by the cell-library vendor, signals with frequencies higher than 300 MHz cannot be transmitted through the I/O pads. Hence, we use a digitally controlled oscillator (DCO) to generate an on-chip high-speed clock (SYSTEM\_CLK) with various frequencies to test the proposed ADDLL (Fig. 12). The DCO is composed of a fast DCO and a slow DCO, and its operating frequency range is 144–960 MHz. The signals EN\_FAST\_DCO and EN\_SLOW\_DCO control the two DCOs. SYSTEM\_CLK is generated by an AND gate whose inputs are the outputs of the two DCOs, and it is then sent to the proposed ADDLL as DIE1\_CLK.

The divider circuit shown in Fig. 13 divides high-frequency clock signals down to lower frequencies. When the RESET signal is low, the top divider path divides SYSTEM\_CLK by eight. After the proposed ADDLL is locked, the clock signals from DIE1 (DIE1\_CLK) and DIE2 (DIE2\_CLK) can be divided by two or eight, and the divided clock signals are sent to the I/O pads for off-chip measurement.
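A quick check shows why the divide-by-eight path is needed at the top of the frequency range. The sketch uses only the 300 MHz pad limit and the divide ratios of two and eight stated above. The helper name is hypothetical.

```python
from typing import Optional, Sequence

PAD_LIMIT_MHZ = 300.0  # I/O pad limit stated in the text


def smallest_pad_safe_ratio(f_mhz: float,
                            ratios: Sequence[int] = (2, 8)) -> Optional[int]:
    """Smallest available divide ratio that brings f_mhz down to the pad limit."""
    for ratio in sorted(ratios):
        if f_mhz / ratio <= PAD_LIMIT_MHZ:
            return ratio
    return None


for f in (960, 195):
    r = smallest_pad_safe_ratio(f)
    print(f"{f} MHz -> divide by {r} -> {f / r} MHz at the pad")
```

At 960 MHz, only the divide-by-eight setting (120 MHz) is below the limit, whereas at 195 MHz either setting would pass. The measurements in Figs. 14 and 15 use divide-by-eight in both cases.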

Fig. 14 shows the measured output clocks of the proposed ADDLL at 960 MHz. These clock signals are divided by eight using the divider in Fig. 13. In Fig. 14(a), signal no. 2 is DIE1\_CLK\_DIV and signal no. 3 is DIE2\_CLK\_DIV. After the proposed ADDLL is locked, the phase error between DIE1\_CLK and DIE2\_CLK is reduced to 40.64 ps. As shown in Fig. 14(b), the peak-to-peak (Pk-Pk) jitter and root-mean-square (RMS) jitter of DIE2\_CLK\_DIV are 8.04 ps and 1.28 ps, respectively.

Fig. 15 shows the measured output clocks of the proposed ADDLL at 195 MHz. These clock signals are also divided by eight using the divider in Fig. 13. In Fig. 15(a), signal no. 2 is DIE1\_CLK\_DIV and signal no. 3 is DIE2\_CLK\_DIV. After the proposed ADDLL is locked, the phase error between DIE1\_CLK and DIE2\_CLK is reduced to 9.8 ps. As shown in Fig. 15(b), the Pk-Pk jitter and RMS jitter of DIE2\_CLK\_DIV are 21.49 ps and 3.63 ps, respectively. In addition, post-layout simulations with process, voltage, and temperature variations show that the lock-in time of the proposed ADDLL is less than 50 cycles at 960 MHz.
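To put the measured numbers in perspective, the phase errors and the simulated lock-in bound can be converted into degrees of a clock period and absolute time. The arithmetic below uses only values reported above. The function names are illustrative.

```python
def phase_error_deg(err_ps: float, f_mhz: float) -> float:
    """Phase error in degrees: err / T_cycle * 360, with T_cycle = 1e6 / f_mhz ps."""
    return err_ps * f_mhz * 1e-6 * 360.0


def cycles_to_ns(cycles: float, f_mhz: float) -> float:
    """Duration of `cycles` clock cycles at f_mhz, in ns."""
    return cycles * 1e3 / f_mhz


print(f"{phase_error_deg(40.64, 960):.2f} deg at 960 MHz")
print(f"{phase_error_deg(9.8, 195):.2f} deg at 195 MHz")
print(f"lock-in bound: {cycles_to_ns(50, 960):.1f} ns")
```

The 40.64 ps error at 960 MHz corresponds to about 14° (roughly 3.9% of a period), while 9.8 ps at 195 MHz is below 1°. The bound of 50 cycles at 960 MHz corresponds to about 52 ns.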

Table 1 compares the proposed ADDLL with existing designs. The proposed ADDLL can eliminate clock skew between two dies, even in the presence of TSV delay variations. In addition, its area is smaller than that reported by Lim et al. [23]. Compared with the design of Ke et al. [24], the proposed design achieves a higher measured operating frequency. Compared with the design of Chuang et al. [25], it consumes less power.

As discussed in Ref. [29], within-die process variations in advanced CMOS processes can make the mismatch between the averaging delay lines of the DLLs [24] reach hundreds of picoseconds, which can lead to significant clock skew after the DLL is locked. In Ref. [25], the two DLLs operate at the same time, and ensuring that both loops remain stable increases the design complexity. Moreover, in advanced CMOS processes, the charge-pump-based architecture must overcome the high voltage gain of the delay line at low supply voltages and the leakage current of MOS transistors to operate over a wide frequency range. In summary, the proposed ADDLL tolerates TSV delay variations, avoids delay-line mismatch problems, and is suitable for die-to-die clock synchronization in 3-D ICs.

#### 5. Conclusion

In this paper, we proposed an ADDLL design that synchronizes the clock signals of two dies. The proposed ADDLL operates at input frequencies from 195 MHz to 960 MHz with a maximum measured phase error of 40.64 ps. Because it is implemented with standard cells, the architecture can be ported to different processes quickly. The proposed ADDLL architecture is therefore suitable for die-to-die clock synchronization in 3-D ICs.

#### Acknowledgements

This work was supported in part by the Ministry of Science and Technology of Taiwan under Grant MOST-103-2221-E-194-063-MY3.

#### References

- [1] Yaoyao Ye, Jiang Xu, Baihan Huang, Xiaowen Wu, Wei Zhang, Xuan Wang, Mahdi Nikdast, Zhehui Wang, Weichen Liu, Zhe Wang, 3-D mesh-based optical network-on-chip for multiprocessor system-on-chip, IEEE Trans. Computer-Aided Des. Integr. Circuits Syst. 32 (4) (Apr. 2013) 584–596.
- [2] M.H. Jabbar, D. Houzet, O. Hammami, 3D multiprocessor with 3D NoC architecture based on Tezzaron technology, in: Proceedings of IEEE International 3D Systems Integration Conference, Jan. 2012, pp. 1–5.
- [3] Uksong Kang, Hoe-Ju Chung, Seongmoo Heo, Duk-Ha Park, Hoon Lee, Jin Ho Kim, Soon-Hong Ahn, Soo-Ho Cha, Jaesung Ahn, DukMin Kwon, Jae-Wook Lee, Han-Sung Joo, Woo-Seop Kim, Dong Hyeon Jang, Nam Seog Kim, Jung-Hwan Choi, Tae-Gyeong Chung, Jei-Hwan Yoo, Joo Sun Choi, Changhyun Kim, Young-Hyun Jun, 8 Gb 3-D DDR3 DRAM using through-silicon-via technology, IEEE J. Solid-State Circuits 45 (1) (Jan. 2010) 111–119.
- [4] Christian Weis, Igor Loi, Luca Benini, Norbert Wehn, Exploration and optimization of 3-D integrated DRAM subsystems, IEEE Trans. Computer-Aided Des. Integr. Circuits Syst. 32 (4) (Apr. 2013) 597–610.
- [5] Michael B. Healy, Sung Kyu Lim, Distributed TSV topology for 3-D power-supply networks, IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 20 (11) (Nov. 2012) 2066–2079.
- [6] Kiyeong Kim, Chulsoon Hwang, Kyoungchoul Koo, Jonghyun Cho, Heegon Kim, Joungho Kim, Junho Lee, Hyung-Dong Lee, Kun-Woo Park, Jun So Pak, Modeling and analysis of a power distribution network in TSV-based 3-D memory IC including P/G TSVs, on-chip decoupling capacitors, and silicon substrate effects, IEEE Trans. Comp., Packag. Manuf. Technol. 2 (12) (Dec. 2012) 2057–2070.
- [7] Eunseok Song, Kyoungchoul Koo, Jun So Pak, Joungho Kim, Through-silicon-via-based decoupling capacitor stacked chip in 3-D-ICs, IEEE Trans. Comp., Packag. Manuf. Technol. 3 (9) (Sep. 2013) 1467–1480.
- [8] Joohee Kim, Jun So Pak, Jonghyun Cho, Eakhwan Song, Jeonghyeon Cho, Heegon Kim, Taigon Song, Junho Lee, Hyungdong Lee, Kunwoo Park, Seungtaek Yang, Min-Suk Suh, Kwang-Yoo Byun, Joungho Kim, High-frequency scalable electrical model and analysis of a through silicon via (TSV), IEEE Trans. Comp., Packag. Manuf. Technol. 1 (2) (Feb. 2011) 181–195.
- [9] Sukeshwar Kannan, Bruce Kim, Sang-Bock Cho, Byoungchul Ahn, Analysis of propagation delay in 3-D stacked DRAM, in: Proceedings of IEEE International Symposium on Circuits and Systems, May 2012, pp. 1839–1842.

- [10] Jhih-Wei You, Shi-Yu Huang, Ding-Ming Kwai, Yung-Fa Chou, Cheng-Wen Wu, Performance characterization of TSV in 3D IC via sensitivity analysis, in: Proceedings of IEEE Asian Test Symposium, Dec. 2010, pp. 394–398.
- [11] Xiaoxia Wu, Wei Zhao, Mark Nakamoto, Chandra Nimmagadda, Durodami Lisk, Sam Gu, Riko Radojcic, Matt Nowak, Yuan Xie, Electrical characterization for intertier connections and timing analysis for 3-D ICs, IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 20 (1) (Jan. 2012) 186–191.
- [12] Sergej Deutsch, Krishnendu Chakrabarty, Non-invasive pre-bond TSV test using ring oscillators and multiple voltage levels, in: Proceedings of Design, Automation & Test in Europe Conference & Exhibition, Mar. 2013, pp. 1065–1070.
- [13] Yu-Hsiang Lin, Shi-Yu Huang, Kun-Han Tsai, Wu-Tung Cheng, Stephen Sunter, Yung-Fa Chou, Ding-Ming Kwai, Parametric delay test of post-bond through-silicon vias in 3-D ICs via variable output thresholding analysis, IEEE Trans. Computer-Aided Des. Integr. Circuits Syst. 32 (5) (May 2013) 737–747.
- [14] Thuy Dao, Dina H. Triyoso, Mike Petras, Michael Canonico, Through silicon via stress characterization, in: Proceedings of IEEE International Conference on IC Design and Technology, May 2009, pp. 39–41.
- [15] Michael Buttrick, Sandip Kundu, On testing prebond dies with incomplete clock networks in a 3D IC using DLLs, in: Proceedings of Design, Automation & Test in Europe Conference & Exhibition, Mar. 2011, pp. 1–6.
- [16] Xin Zhao, Saibal Mukhopadhyay, Sung Kyu Lim, Variation-tolerant and low-power clock network design for 3D ICs, in: Proceedings of IEEE Electronic Components and Technology Conference, May 2011, pp. 2007–2014.
- [17] Xi Chen, Ting Zhu, William Rhett Davis, Paul D. Franzon, Adaptive and reliable clock distribution design for 3-D integrated circuits, IEEE Trans. Comp., Packag. Manuf. Technol. 4 (11) (Nov. 2014) 1862–1870.
- [18] Vasilis F. Pavlidis, Ioannis Savidis, Eby G. Friedman, Clock distribution networks in 3-D integrated systems, IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 19 (12) (Dec. 2011) 2256–2266.
- [19] Jae-Seok Yang, Jiwoo Pak, Xin Zhao, Sung Kyu Lim, David Z. Pan, Robust clock tree synthesis with timing yield optimization for 3D-IC, in: Proceedings of 16th Asia and South Pacific Design Automation Conference, Jan. 2011, pp. 621–626.

- [20] Mosin Mondal, Andrew J. Ricketts, Sami Kirolos, Tamer Ragheb, Greg Link, N. Vijaykrishnan, Yehia Massoud, Thermally robust clocking schemes for 3D integrated circuits, in: Proceedings of Design, Automation & Test in Europe Conference & Exhibition, Apr. 2007, pp. 1–6.
- [21] Kwanyeob Chae, Xin Zhao, Sung Kyu Lim, Saibal Mukhopadhyay, Tier adaptive body biasing: a post-silicon tuning method to minimize clock skew variations in 3-D ICs, IEEE Trans. Comp., Packag. Manuf. Technol. 3 (10) (Oct. 2013) 1720–1730.
- [22] Hu Xu, Vasilis F. Pavlidis, Xifan Tang, Wayne Burleson, Giovanni De Micheli, Timing uncertainty in 3-D clock trees due to process variations and power supply noise, IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 21 (12) (Dec. 2013) 2226–2239.
- [23] Soo-Bin Lim, Hyun-Woo Lee, Junyoung Song, Chulwoo Kim, A 247 µW 800 Mb/s/pin DLL-based data self-aligner for through silicon via (TSV) interface, IEEE J. Solid-State Circuits 48 (3) (Mar. 2013) 711–723.
- [24] Ji-Wei Ke, Shi-Yu Huang, Chao-Wen Tzeng, Ding-Ming Kwai, Yung-Fa Chou, Die-to-die clock synchronization for 3-D IC using dual locking mechanism, IEEE Trans. Circuits Syst. I Regul. Pap. 60 (4) (Apr. 2013) 908–917.
- [25] Ai-Jia Chuang, Yu Lee, Ching-Yuan Yang, A chip-to-chip clock-deskewing circuit for 3-D ICs, in: Proceedings of IEEE International Symposium on Circuits and Systems, May 2012, pp. 1652–1655.
- [26] Ching-Che Chung, Chi-Yu Hou, All-digital delay-locked loop for 3D-IC die-to-die clock synchronization, in: Proceedings of International Symposium on VLSI Design, Automation, and Test, Apr. 2014, pp. 1–4.
- [27] Kundan Nepal, Soha Alhelaly, Jennifer Dworak, R. Iris Bahar, Theodore Manikas, Ping Gui, Repairing a 3-D die-stack using available programmable logic, IEEE Trans. Computer-Aided Des. Integr. Circuits Syst. 34 (5) (May 2015) 849–861.
- [28] Ching-Che Chung, Duo Sheng, Chang-Jun Li, A wide-range low-cost all-digital duty-cycle corrector, IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 23 (11) (Nov. 2015) 2487–2496.
- [29] Tejinder Singh Sandhu, Kamal El-Sankary, A mismatch-insensitive skew compensation architecture for clock synchronization in 3-D ICs, IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 24 (6) (Jun. 2016) 2026–2039.