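## National Chung Cheng University

## Graduate Institute of Computer Science and Information Engineering, Master's Thesis

## An all-digital delay-locked loop for 3D ICs die-to-die clock synchronization

- Graduate student: 侯紀宇
- Advisor: Dr. 鍾菁哲 (Ching-Che Chung)
- July 2014 (ROC year 103)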

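National Chung Cheng University, Master's Program Graduate Student

### Degree Examination Consent Form

The thesis submitted by graduate student 侯紀宇 of the Department of Computer Science and Information Engineering, whom I supervise,

An all-digital delay-locked loop for 3D ICs die-to-die clock synchronization

is approved for submission to the master's thesis examination.

Advisor (signature): 鍾菁哲. Date: 6 June 2014 (ROC year 103).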

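## National Chung Cheng University Master's Thesis Examination Approval Form

#### Department of Computer Science and Information Engineering

#### Thesis submitted by graduate student 侯紀宇

The thesis "An all-digital delay-locked loop for 3D ICs die-to-die clock synchronization" has been reviewed by this committee and meets the standard for a master's thesis.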



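#### Thesis and Dissertation Authorization Form

(Please bind this copy on the blank page before the title page of the printed thesis; the university library uses it for authorization management.)

ID:102CCU00392051

The thesis covered by this authorization is the one with which the licensor obtained a master's degree at the Graduate Institute of Computer Science and Information Engineering, National Chung Cheng University, in the second semester of academic year 102.

Thesis title: An all-digital delay-locked loop for 3D ICs die-to-die clock synchronization

Advisor: 鍾菁哲, Ching-Che Chung

The licensor hereby agrees that the full text of the above thesis (including the abstract), whose copyright is held by the licensor, may be provided to readers for online searching, viewing, downloading, or printing for personal, non-profit purposes. This authorization is non-exclusive and royalty-free. It is granted to the National Central Library and to the library of the licensor's graduating school, without limits on region, time, or number of uses, to reproduce the above thesis by microfilm, optical disc, or digitization, and to transmit the digital file publicly.

Printed thesis: the licensor hereby agrees that the full text of the above thesis (including the abstract), whose copyright is held by the licensor, may be provided to readers for viewing or printing for personal, non-profit purposes. This authorization is non-exclusive and royalty-free, and it is granted to the National Chung Cheng University Library for cataloguing, shelving, and public display.

□ Open immediately on and off campus

□ Open immediately on campus; open off campus after 12 August 2019
☑ Open on campus on 12 August 2019; open off campus after 12 August 2019
□ Other \_\_\_\_\_

Licensor: 侯紀宇

Signature: 侯紀宇

Date: 12 August 2014 (ROC year 103)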

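Abstract (translated from the Chinese original)

Mature system-on-a-chip (SoC) technology allows many modules to be integrated into a single chip. However, an SoC contains many long-distance wires, and these wires limit the maximum operating clock speed of the SoC. Through-silicon via (TSV) technology was therefore proposed to shorten these long interconnects. In line with Moore's Law, as more and more transistors are integrated into a single chip, TSV packaging technology has become increasingly common and important in recent years. However, TSV delay variation introduced during manufacturing may cause the SoC to malfunction, because when the upper and lower dies of a TSV-based 3D IC exchange data, the TSV delay variation affects whether the data are transferred correctly. Synchronizing the clock phases of the upper and lower dies of a 3D IC effectively simplifies this data transmission problem.

We therefore propose two all-digital delay-locked loop architectures that effectively synchronize the clock phases among the dies of a 3D IC. The proposed all-digital delay-locked loops are implemented with the TSMC 90 nm standard cell library and can tolerate process variation. The first architecture compensates for TSV delay variation with our proposed digitally controlled varactor delay line. The second architecture uses a single TSV and therefore avoids TSV delay variation directly. The first architecture, which uses two TSVs, operates over a clock range of 300 MHz to 1 GHz with a maximum phase error after lock of 21.9 ps. The second, single-TSV architecture operates over a clock range of 200 MHz to 1 GHz with a maximum phase error after lock of less than 80 ps.

**Keywords**: all-digital delay-locked loop, through-silicon via, three-dimensional integrated circuit, single TSV channel, die-to-die clock synchronization.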

## Abstract

With the advance of system-on-a-chip (SoC) technology, many transistors can be integrated into a single chip. However, an SoC may contain many long global wires, and its highest clock speed is usually limited by these wires. Therefore, through silicon via (TSV) technology is proposed to shorten the global wires. Moreover, in line with Moore's Law, TSV technology has become more and more popular in recent years as the number of transistors integrated in a single chip increases. However, TSV delay variation introduced during manufacturing may cause an SoC to work incorrectly, because it affects the data transmission between chips. Therefore, the clock signals of the chips need to be phase aligned to simplify the data transmission between chips.

Hence, two all-digital delay-locked loop (ADDLL) architectures are proposed in this thesis to synchronize the clock signals between two chips. The proposed ADDLLs are implemented in a TSMC 90nm CMOS process with standard cells, and they can tolerate PVT variations. The first ADDLL architecture, with two TSV channels, compensates for the TSV delay variation with a digitally controlled varactor-based delay line. The second ADDLL architecture avoids the TSV delay variation problem by using only a single TSV channel. The first ADDLL with two TSVs operates over an input frequency range from 300MHz to 1GHz, and its maximum phase error is 21.9ps. The second ADDLL with a single TSV operates over an input frequency range from 200MHz to 1GHz, and its maximum phase error is 80ps.

Index Terms: All-Digital Delay-Locked Loop, Through Silicon Via, 3-D ICs, Single TSV channel, Die-to-die clock synchronization.



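## Acknowledgements

First, I sincerely thank my advisor, Dr. 鍾菁哲, for his guidance over the past two years. As an undergraduate I was uncertain about my future. After joining Professor Chung's laboratory, I went through new-student training and paper-presentation training, started my research, published my first conference paper, taped out a chip, attended an academic conference and presented my work in English, and finally completed this thesis. Professor Chung deserves much of the credit for all of this. He teaches with great enthusiasm and does not mind what field a student comes from. He patiently guides each student and offers appropriate advice whenever a research question arises, which made my path in learning IC design increasingly smooth. Thanks to his careful training, I was also accepted by the company I most hoped to join when I applied for the R&D substitute service. I am very grateful to Professor Chung for his teaching.

Second, I thank my parents and family. They have always supported my choices and given me sound advice and the motivation to keep moving forward. Whenever I came home tired from research, I left recharged. I sincerely thank my parents for their hard work in supporting me, so that I could complete my studies and walk more smoothly on the road ahead.

Third, I thank my girlfriend, 王王, for staying with me through the busy life of graduate school and for encouraging me whenever I was discouraged, so that I could keep going.

Finally, I thank my lab mates, seniors, and juniors. With the help of my seniors, I was able to enter the field of IC design quickly. Discussions with my peers helped me solve the problems I ran into while learning. And thanks to the help of my juniors, we could devote ourselves fully to the final sprint of our research without having to worry much about laboratory chores.

侯紀宇

July 2014 (ROC year 103)

Written at the Institute of Computer Science and Information Engineering, National Chung Cheng University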


## Contents

- Chapter 1 Introduction
  - 1.1 Through Silicon Via
    - 1.1.1 3-D Network-on-Chip (NoC)
    - 1.1.2 Random Access Memory in 3-D ICs
    - 1.1.3 3-D ICs Power-Supply Networks
  - 1.2 Impact of TSV delay variation
  - 1.3 Die-to-Die Clock Synchronization
    - 1.3.1 3D-IC Die-to-Die Clock Synchronization
    - 1.3.2 Motivation
  - 1.4 Thesis Organization
- Chapter 2 3-D ICs clock synchronization with two TSVs
  - 2.1 Architecture Overview
  - 2.2 The Locking Procedure
  - 2.3 Circuit Design and Implementation of ADDLL with two TSVs
    - 2.3.1 Digitally Controlled Varactor (DCV)
    - 2.3.2 Phase and Frequency Detector (PFD)
    - 2.3.3 Digital Controlled Delay Line (DCDL)
    - 2.3.4 FDL
  - 2.4 Experimental Results
    - 2.4.1 Specifications of the ADDLL with two TSVs
    - 2.4.2 Simulation Results of the ADDLL with two TSVs
    - 2.4.3 Comparison Table
- Chapter 3 3-D ICs clock synchronization with a single TSV
  - 3.1 Architecture Overview
  - 3.2 The Locking Procedure
  - 3.3 Circuit Design and Implementation of ADDLL with a single TSV
    - 3.3.1 Pulse Generator (PG)
    - 3.3.2 TDC-Embedded CDL
    - 3.3.3 Phase Detector (PD)
  - 3.4 Experimental Results
    - 3.4.1 Test Circuit Implementation
    - 3.4.2 Specifications
    - 3.4.3 Simulation Results
    - 3.4.4 Phase Error
  - 3.5 Performance Comparisons
    - 3.5.1 Two Proposed ADDLLs
    - 3.5.2 Comparison with prior arts
- Chapter 4 Conclusion and Future Works
  - 4.1 Conclusion
  - 4.2 Future Works
- References



## **List of Figures**

- Figure 1.1 System on a board (SoB)
- Figure 1.2 System in a package (SiP)
- Figure 1.3 System on A Chip (SoC)
- Figure 1.4 2D-SoC and 3D-IC
- Figure 1.5 Concept of 3D-NoC Architecture
- Figure 1.6 Concept of 3D-DRAM Architecture
- Figure 1.7 Concept of Clustered and Distributed Topology
- Figure 1.8 TSV resistive-open phenomenon
- Figure 1.9 Ring-oscillator architecture for TSV delay variation detection
- Figure 1.10 3D-IC Architecture
- Figure 1.11 3D-IC timing diagram without TSV delay variation
- Figure 1.12 3D-IC timing diagram with TSV delay variation (clock leads data)
- Figure 1.13 3D-IC timing diagram with TSV delay variation (clock lags data)
- Figure 1.14 Bidirectional timing diagram with TSV delay variation (clock leads data)
- Figure 1.15 Bidirectional timing diagram with TSV delay variation (clock lags data)
- Figure 1.16 Bidirectional timing diagram with clock synchronization
- Figure 1.17 Dual-Locking DLL Architecture
- Figure 1.18 Timing diagram of Dual-Locking DLL
- Figure 1.19 Dual-Delay-Locked Loop Architecture
- Figure 2.1 The proposed ADDLL Architecture with two TSVs
- Figure 2.2 Timing diagram of TSV delay variation compensation
- Figure 2.3 Timing diagram of die-to-die clock synchronization
- Figure 2.4 The proposed DCV-based delay line architecture
- Figure 2.5 The Phase and Frequency Detector
- Figure 2.6 The timing diagram of PFD
- Figure 2.7 The digital controlled delay line
- Figure 2.8 The FDL
- Figure 2.9 Layout of The Proposed ADDLL
- Figure 2.10 Simulation waveform of the proposed ADDLL
- Figure 3.1 The proposed ADDLL architecture with a single TSV
- Figure 3.2 The timing diagram of locking procedure
- Figure 3.3 The Pulse Generator (PG)
- Figure 3.4 The timing diagram of PG
- Figure 3.5 The TDC-Embedded CDL
- Figure 3.6 The Phase Detector

## **List of Tables**

- Table 1.1 PVT conditions in this thesis
- Table 2.1 The Delay Time of DCV-Based Delay Line
- Table 2.2 The dead-zone of the proposed PFD
- Table 2.3 The properties of The Proposed DCDL
- Table 2.4 Properties of The Proposed FDL
- Table 2.5 Performance Comparisons
- Table 3.1 Pulse width of The generated pulse
- Table 3.2 The output frequency range of the slow DCO
- Table 3.3 The output frequency range of the fast DCO
- Table 3.4 I/O pins information of ADDLL with a single TSV
- Table 3.5 The TSV mirror mismatch
- Table 3.6 The deadzone of the phase detector
- Table 3.7 Comparison table of ADDLL between a single TSV and two TSVs
- Table 3.8 Gate count of ADDLL between a single TSV and two TSVs
- Table 3.9 Performance Comparison with prior arts
- Table 3.10 The power consumption of the DFFs



## Chapter 1 Introduction

## 1.1 Through Silicon Via

System on a board (SoB) package technology is widely used to integrate a system on a printed circuit board (PCB), as shown in Fig. 1.1. The advantage of SoB is low cost. However, the bonding wires and interconnection wires between modules cause RLC parasitic effects. As a result, the speed of the integrated system is restricted, and the power consumption is high because extra driving buffers are needed. In addition, the system occupies a large volume.



Figure 1.1 System on a board (SoB)

System in a package (SiP) technology addresses the disadvantages of SoB. As shown in Fig. 1.2, the modules are stacked to reduce the length of the bonding wires, and the volume of an SiP is smaller than that of a traditional SoB. However, the bonding wires are still large, and they still limit the speed of the modules.



Figure 1.2 System in a package (SiP) (source: http://javier.esilicon.com/page/3/)

To improve the speed of the system, system on a chip (SoC) technology is proposed, as shown in Fig. 1.3. The modules are integrated on a single chip, and the bonding wires are eliminated. The RLC effects caused by bonding wires or interconnection wires on the PCB are greatly reduced, and the power consumption is also reduced because the driving buffers are eliminated. According to Moore's Law, a large number of transistors can be integrated in a single chip, and the number of transistors doubles every 18 months. However, Moore's Law seems to have reached a bottleneck in recent years, because transistor size can hardly be reduced further due to the physical limitations of the transistor.



Figure 1.3 System on A Chip (SoC)

With mature SoC technology, more modules can be integrated in a single chip. However, an SoC may contain long global wires, which can limit its highest clock speed. Therefore, through silicon via (TSV) package technology is proposed to shorten the global wires. In addition, to keep up with Moore's Law, TSV [1]-[2] plays an important role in three-dimensional integrated circuits (3D-ICs). As shown in Fig. 1.4, TSV package technology can connect many two-dimensional integrated circuits (2D-ICs), and the number of I/Os can be increased. Moreover, the delay of a TSV channel is much smaller than that of a bonding wire.



Figure 1.4 2D-SoC and 3D-IC

In summary, the advantages of the TSV package are small wire delay, a large number of I/Os, low power consumption, and high circuit speed, and TSV technology has therefore become popular in recent years. With TSVs, the performance of an SoC can be improved further.

#### 1.1.1 3-D Network-on-Chip (NoC)

As shown in Fig. 1.5, TSVs can be applied to the network-on-chip (NoC) to connect multiple processors in a 3D-NoC architecture [9]-[18]. In [12], 3D-NoC architectures are proposed to compare the impact of wire delay in a traditional 2D-NoC and in a 3D-NoC. However, 3D-ICs pose many design challenges, such as yield, power density, design flow, design cost, and IC testing. In addition, the thermal problem in a 3D-NoC is caused by traffic blocking of the network packets. To address the 3D-NoC traffic blocking problem, the traffic-balanced topology-aware multiple routing adjustment (TTMRA) [16] is proposed to dynamically adjust the routing path of each packet.



Figure 1.5 Concept of 3D-NoC Architecture

#### 1.1.2 Random Access Memory in 3-D ICs

In [19]-[21], TSV package technology is applied to Random Access Memory (RAM) design. The advantages of TSV technology are improved performance, reduced power, and increased chip density. To overcome the I/O speed limitation, several buffered modules are proposed, in which additional chips are added to buffer the data pins. However, these additional chips increase the power consumption and the latency of the operation.

In [19], a Double-Data-Rate (DDR3) DRAM with TSVs is proposed to overcome the limits of conventional DRAMs. It uses a master and slave architecture to reduce the power consumption as compared to conventional quad-die packages (QDPs). This architecture also increases the I/O speed of the DRAM and reduces the noise on the power lines.



Figure 1.6 Concept of 3D-DRAM Architecture

#### **1.1.3 3-D ICs Power-Supply Networks**

IR-drop is the resistive voltage drop in the power and ground lines. In advanced CMOS processes, IR-drop is a design challenge in addition to leakage power and dynamic power consumption. A distributed TSV topology for 3-D IC power-supply networks is proposed in [22]-[25].

In [22], distributed topology power planning is shown to be better than traditional 2-D power planning and clustered power planning. It reduces the IR-drop of 3-D ICs by about 21% and lowers the dynamic noise of 3-D ICs by about 32% as compared to 2-D ICs. With 30 stacked tiers, the distributed topology achieves nearly 42% lower dynamic noise and 50% lower IR-drop than the clustered power planning topology in 3-D ICs. However, the design challenge of distributed topology power planning is that designs with many macros or array structures are not well suited to it, because small-pitch TSVs are required.



Figure 1.7 Concept of Clustered and Distributed Topology

## **1.2 Impact of TSV delay variation**

As mentioned above, TSV package technology has become more and more popular in recent years. However, its cost is high, and TSVs may become faulty during manufacturing. Many studies [3]-[8] address the TSV delay variation problem. The work in [8] analyzes the impact of TSV open defects and shows that the resistive-open phenomenon increases the propagation delay of the TSV. Fig. 1.8 shows the TSV resistive-open phenomenon.



Figure 1.8 TSV resistive-open phenomenon (source: http://www.newswire.co.kr/newsRead.php?no=463484)

In [3]-[6], a ring oscillator (RO)-based architecture is proposed to detect TSV delay variation, as shown in Fig. 1.9. In [4], a variable output threshold is proposed: parametric delay faults are detected by dynamically switching the driving strength of the inverter and observing the output frequency of the RO. That work shows that the maximum delay variation between two TSVs can be up to about 500ps. The work in [3] focuses on pre-bond TSV testing. It also uses the RO architecture to detect the TSV propagation delay variation under different RC parameters, and the proposed pre-bond TSV test can detect leakage and resistive-open faults during manufacturing test. These results show that the propagation delay of TSVs varies with process variations.



Figure 1.9 Ring-oscillator architecture for TSV delay variation detection
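The ring-oscillator idea can be illustrated with a minimal sketch. It assumes the textbook relation that a ring oscillator's period is twice the total delay around its loop, so inserting a TSV into the loop lengthens the period by twice the TSV delay. The function names and all delay values below are hypothetical and are not taken from [3]-[6].

```python
def ring_oscillator_period(stage_delays_ps, tsv_delay_ps=0.0):
    """Period of a ring oscillator: an edge must travel around the loop twice."""
    return 2.0 * (sum(stage_delays_ps) + tsv_delay_ps)


def estimate_tsv_delay(period_with_tsv_ps, period_without_tsv_ps):
    """Recover the TSV delay from the periods measured with and without the TSV."""
    return (period_with_tsv_ps - period_without_tsv_ps) / 2.0


# Hypothetical inverter-stage delays (ps) and two hypothetical TSV delays.
stages = [40.0, 40.0, 40.0]
baseline = ring_oscillator_period(stages)
good_tsv = ring_oscillator_period(stages, tsv_delay_ps=30.0)
faulty_tsv = ring_oscillator_period(stages, tsv_delay_ps=530.0)

good_delay = estimate_tsv_delay(good_tsv, baseline)
faulty_delay = estimate_tsv_delay(faulty_tsv, baseline)
print(good_delay, faulty_delay, faulty_delay - good_delay)
```

In this example the two TSVs differ by 500 ps, which is the order of magnitude of the worst-case variation reported above.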

## **1.3 Die-to-Die Clock Synchronization**

To explain the impact of TSV delay variation, a 3D-IC architecture is shown in Fig. 1.10 for illustration. Two dies are connected by many TSVs, and data signals are transmitted between the two dies.



Figure 1.10 3D-IC Architecture

DIE1\_CLK is the global reference clock of the system, and it is sent to DIE2 to trigger the circuits on DIE2. In the ideal situation shown in Fig. 1.11, the TSVs are manufactured without delay variation, so the TSV delays of the clock signal and the data signal are exactly the same. For signal transmission from DIE1 to DIE2, the DIE1-2\_DATA signal in DIE2 can be sampled correctly by the negative edge of the DIE1-2\_CLK signal. DIE2\_DATA is transmitted back to DIE1 and denoted as DIE2-1\_DATA. Then, the positive edge of DIE1\_CLK is used to sample DIE2-1\_DATA. Therefore, without TSV delay variation, data can be transmitted between the dies correctly.



Figure 1.11 3D-IC timing diagram without TSV delay variation

Fig. 1.12 shows the case in which the delay of the clock TSV is shorter than the delay of the data TSV, so DIE1-2\_CLK leads DIE1-2\_DATA. DIE1-2\_CLK can sample DIE1-2\_DATA correctly as long as Eq. 1.1 is satisfied.

$$T_{hold} < \Delta < T_{cycle} - T_{setup} \quad \text{(Eq. 1.1)}$$

where $\Delta$ is the TSV delay variation, $T_{setup}$ is the short interval before the positive clock edge during which the data signal must be stable for the DFFs to sample it, $T_{hold}$ is the short interval after the positive clock edge during which the data signal must remain stable, and $T_{cycle}$ is the period of the system clock.



 $\Delta$  : TSV delay variation

Figure 1.12 3D-IC timing diagram with TSV delay variation (clock leads data)
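The condition in Eq. 1.1 can be checked numerically. The sketch below is only an illustration: the function name is made up, and the setup time, hold time, and delay variations are hypothetical values rather than figures from this thesis.

```python
def clock_leads_sampling_ok(delta_ps, t_cycle_ps, t_setup_ps, t_hold_ps):
    """Eq. 1.1: T_hold < delta < T_cycle - T_setup."""
    return t_hold_ps < delta_ps < t_cycle_ps - t_setup_ps


T_CYCLE = 1000.0  # 1 GHz clock period in ps
T_SETUP = 50.0    # hypothetical DFF setup time in ps
T_HOLD = 30.0     # hypothetical DFF hold time in ps

for delta in (10.0, 500.0, 980.0):
    ok = clock_leads_sampling_ok(delta, T_CYCLE, T_SETUP, T_HOLD)
    print(f"delta = {delta} ps -> sampled correctly: {ok}")
```

With these numbers, a 10 ps variation violates the hold margin, 500 ps is safe, and 980 ps violates the setup margin.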

Fig. 1.13 shows the case in which the delay of the clock TSV is larger than the delay of the data TSV, so DIE1-2\_CLK lags DIE1-2\_DATA. DIE1-2\_CLK can sample DIE1-2\_DATA correctly as long as Eq. 1.2 is satisfied.





Figure 1.13 3D-IC timing diagram with TSV delay variation (clock lags data)

When there is no delay variation in the TSVs, data can easily be transmitted between the dies, as shown in Fig. 1.11. However, when there is delay variation in the TSVs, Eq. 1.1 and Eq. 1.2 must be satisfied, or the data transmission between the dies may fail.

Fig. 1.14 shows DIE1\_CLK and DIE1\_DATA being transmitted from DIE1 to DIE2, where they are denoted as DIE1-2\_CLK and DIE1-2\_DATA, respectively. Here, DIE1-2\_CLK leads DIE1-2\_DATA because of the TSV delay variation. DIE2\_DATA is transmitted back to DIE1 after DIE1-2\_DATA is sampled by the positive edge of DIE1-2\_CLK. For DIE2-1\_DATA to be sampled successfully by the positive edge of DIE1\_CLK, Eq. 1.3 must be satisfied.

$$T_{hold} < T_{clk1-2} + T_{data2-1} < T_{cycle} - T_{setup} \quad \text{(Eq. 1.3)}$$

where $T_{clk1-2}$ is the delay of the clock signal from DIE1 to DIE2 (from DIE1\_CLK to DIE1-2\_CLK), and $T_{data2-1}$ is the delay of the data signal from DIE2 to DIE1 (from DIE2\_DATA to DIE2-1\_DATA).



 $\Delta$  : TSV delay variation (DIE1\_to\_DIE2)

Figure 1.14 Bidirectional timing diagram with TSV delay variation (clock leads data)

Fig. 1.15 shows DIE1\_CLK and DIE1\_DATA being transmitted from DIE1 to DIE2, where they are denoted as DIE1-2\_CLK and DIE1-2\_DATA, respectively. Here, DIE1-2\_CLK lags DIE1-2\_DATA because of the TSV delay variation. DIE2\_DATA is transmitted back to DIE1 after DIE1-2\_DATA is sampled by the positive edge of DIE1-2\_CLK. For DIE2-1\_DATA to be sampled successfully, Eq. 1.4 must be satisfied.

$$\begin{cases} T_{clk1-2} + T_{data2-1} < T_{cycle} - T_{setup}, & \text{if } T_{clk1-2} + T_{data2-1} < T_{cycle} \\ T_{hold} < T_{clk1-2} + T_{data2-1}, & \text{otherwise} \end{cases} \quad \text{(Eq. 1.4)}$$



**θ** : TSV delay variation (DIE1\_to\_DIE2)

Figure 1.15 Bidirectional timing diagram with TSV delay variation (clock lags data)
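The two return-path conditions, Eq. 1.3 (clock leads data) and Eq. 1.4 (clock lags data), can be written as small checks. This is a sketch that transcribes the inequalities as they are stated in the text. The function names and the numerical delays are hypothetical.

```python
def return_path_ok_clock_leads(t_clk12, t_data21, t_cycle, t_setup, t_hold):
    """Eq. 1.3: T_hold < T_clk1-2 + T_data2-1 < T_cycle - T_setup."""
    total = t_clk12 + t_data21
    return t_hold < total < t_cycle - t_setup


def return_path_ok_clock_lags(t_clk12, t_data21, t_cycle, t_setup, t_hold):
    """Eq. 1.4: the condition depends on whether the sum exceeds one cycle."""
    total = t_clk12 + t_data21
    if total < t_cycle:
        return total < t_cycle - t_setup
    return t_hold < total


# Hypothetical timing values in ps.
T_CYCLE, T_SETUP, T_HOLD = 1000.0, 50.0, 30.0
print(return_path_ok_clock_leads(200.0, 300.0, T_CYCLE, T_SETUP, T_HOLD))
print(return_path_ok_clock_leads(600.0, 380.0, T_CYCLE, T_SETUP, T_HOLD))
print(return_path_ok_clock_lags(600.0, 380.0, T_CYCLE, T_SETUP, T_HOLD))
print(return_path_ok_clock_lags(700.0, 400.0, T_CYCLE, T_SETUP, T_HOLD))
```

Both checks depend on the sum of two delays that are only known after manufacturing, which is the difficulty the following paragraphs address.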

In summary, $T_{clk1-2}$, $T_{data2-1}$, $\Delta$, and $\theta$ are unpredictable because the TSV delays are only determined after manufacturing. To simplify the data transmission between dies, the phases of DIE1\_CLK and DIE1-2\_CLK should be aligned.

In Fig. 1.16, the data transmission problem is simplified because the phases of DIE1\_CLK and DIE1-2\_CLK are aligned. The values that are unpredictable after manufacturing reduce to a single value, and the data access condition simplifies to Eq. 1.5, which is easy to satisfy.

$$T_{hold} < T_{data2-1} < T_{cycle} - T_{setup} \quad \text{(Eq. 1.5)}$$



Figure 1.16 Bidirectional timing diagram with clock synchronization
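The benefit of aligning the clocks can be seen by rearranging Eq. 1.3 for $T_{data2-1}$ and comparing it with Eq. 1.5. The sketch below treats an aligned clock as the case $T_{clk1-2} = 0$ (an assumption that ignores whole-cycle offsets). The function name and all numbers are hypothetical.

```python
def data_delay_window(t_cycle, t_setup, t_hold, t_clk12=0.0):
    """Return the allowed (low, high) bounds on T_data2-1.

    t_clk12 == 0 models phase-aligned clocks (Eq. 1.5); otherwise the bounds
    are Eq. 1.3 with T_clk1-2 moved to the other side of the inequality.
    """
    low = max(t_hold - t_clk12, 0.0)  # a delay cannot be negative
    high = t_cycle - t_setup - t_clk12
    return low, high


T_CYCLE, T_SETUP, T_HOLD = 1000.0, 50.0, 30.0  # hypothetical values in ps
aligned = data_delay_window(T_CYCLE, T_SETUP, T_HOLD)
unaligned = data_delay_window(T_CYCLE, T_SETUP, T_HOLD, t_clk12=400.0)
print("aligned clocks:  ", aligned)
print("T_clk1-2 = 400ps:", unaligned)
```

Without alignment the window shifts and shrinks by an amount that is unknown before manufacturing. With alignment it depends only on the DFF timing and the clock period.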

#### **1.3.1 3D-IC Die-to-Die Clock Synchronization**

In high-speed SoC design, the clock distribution through clock tree buffers should be carefully planned to eliminate the clock skew between the global clock and the local clocks. Phase-locked loops (PLLs) and delay-locked loops (DLLs) [26]-[29] are widely applied to cancel the clock skew. As mentioned in Section 1.3, if the clock phases of the two dies are aligned, the data transmission between the dies is simplified.

Die-to-die clock synchronization for 3D-ICs is addressed in [30]-[33]. To reduce the data conflict time between the memory outputs of the stacked dies, a DLL-based data self-aligner (DBDA) [30] is proposed. The DLL circuit in the DBDA needs a replica TSV delay. However, the DBDA may still have a large phase error after the DLL is locked, because faulty TSVs can cause unpredictable TSV delay variations. As a result, the replica TSV delay is not reliable.

A dual-locking DLL [31] is proposed for 3-D IC die-to-die synchronization, as shown in Fig. 1.17. This architecture needs two loops to complete die-to-die clock synchronization. The first loop runs from Die1\_clk through the DCDL, TSV1, the divider (Div), buffer (B2), TSV2, and buffer (B2) to the phase detector (PD). $\theta$ is defined as the total delay of the divider (Div), TSV2, and the two buffers (B2). When the first loop is locked, Die1\_clk and fb are phase aligned; that is, $\Phi_1$ delayed by $\theta$ is phase aligned with Die1\_clk, as shown in Fig. 1.18. The second loop runs from Die1\_clk through the divider (Div), buffer (B1), TSV2, and buffer (B1) to the PD, while the $\Phi_1$ signal is sent to the 2-Phase DCDL to generate the Die2\_clk signal and the out signal. When the second loop is locked, Die1\_clk delayed by $\theta$ is phase aligned with the out signal. The 2-Phase DCDL is divided into two identical delay lines and generates two phases (Die2\_clk and out). Because Die1\_clk plus $\theta$ equals $\Phi_1$ plus $2\theta$, the 2-Phase DCDL generates a total delay of $2\theta$, and each of its two halves contributes $\theta$. Therefore, after the second loop is locked, Die1\_clk and Die2\_clk are phase aligned, as shown in Fig. 1.18.

The phase error caused by the delay mismatch between the real TSV and the replica TSV delay in [30] does not occur in [31], because the TSV delay does not need to be mirrored in the dual-locking DLL. However, the dual-locking DLL has to keep fine-tuning the two DLL loops in an interleaved manner to maintain the phase alignment between the clock signals in multiple layers. To fine-tune the two DLLs, the dual-locking DLL must regularly swap the direction of the forward path and the feedback path, which may cause a large phase error in the maintenance mode.



Figure 1.17 Dual-Locking DLL Architecture



Figure 1.18 Timing diagram of Dual-Locking DLL
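The phase relations of the dual-locking DLL described above can be checked with modular arithmetic. The sketch below is one reading of that description and not the implementation in [31]. It takes the phase of Die1\_clk as 0, works modulo one period of the compared clock, and assumes that each half of the 2-Phase DCDL settles to exactly $\theta$. The function name and the numbers are hypothetical.

```python
def dual_locking_phases(theta_ps, period_ps):
    """Phases (in ps, modulo one period) once both loops are locked.

    Die1_clk is the phase reference (phase 0).
    """
    # Loop 1 locked: phi1 delayed by theta lines up with Die1_clk.
    phi1 = (-theta_ps) % period_ps
    # Assumption: each identical half of the 2-Phase DCDL settles to theta.
    half_delay = theta_ps
    die2_clk = (phi1 + half_delay) % period_ps
    out = (phi1 + 2 * half_delay) % period_ps
    return phi1, die2_clk, out


phi1, die2_clk, out = dual_locking_phases(theta_ps=350, period_ps=2000)
print(phi1, die2_clk, out)
```

Here `out` equals $\theta$ (Die1\_clk delayed by $\theta$) and `die2_clk` returns to phase 0, so Die2\_clk is aligned with Die1\_clk.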

A dual-delay-locked loop (D-DLL) [32] is proposed as a 3D-IC die-to-die clock deskew circuit, as shown in Fig. 1.19. This architecture also needs two loops to complete die-to-die clock synchronization. The first loop path is from Die1\_clk, TSV1, VCDL<sub>2</sub>, Buffer, BB, TSV2, BB, VCDL<sub>1</sub> to PD, and the second loop path is from Die2\_clk, BB, TSV2, BB, VCDL<sub>1</sub>, VCDL<sub>1</sub>, BB, TSV2, BB to PD. The locking condition is given in Eq. 1.6; the total delay of the upper path equals N/2 times the Die1\_clk period.

Two analog charge-pump-based DLLs are used in this design. However, a special bidirectional buffer (BB) is needed to transmit signals in both directions simultaneously on a single TSV. Furthermore, the two DLLs work at the same time, which increases the design complexity of the D-DLL. In advanced CMOS processes, designing a voltage-controlled delay line that operates over a wide frequency range must overcome the large voltage-control gain at a low supply voltage and the leakage current of the MOS transistors. These problems are the design challenges of the D-DLL circuit.



Figure 1.19 Dual-Delay-Locked Loop Architecture

$$\begin{aligned}
T_{loop1} &= T_{TSV1} + T_{VCDL2} + T_{Buffer} + T_{BB} + T_{TSV2} + T_{BB} + T_{VCDL1} \\
T_{loop2} &= 2T_{BB} + 2T_{TSV2} + 2T_{BB} + 2T_{VCDL1} \\
T_{loop1} &= T_{loop2} = N T_{Die1\_clk} \\
T_{TSV1} + T_{VCDL2} + T_{Buffer} &= \frac{N}{2} T_{Die1\_clk}
\end{aligned} \quad \text{(Eq. 1.6)}$$
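The last line of Eq. 1.6 follows from subtracting half of $T_{loop2}$ from $T_{loop1}$. A short numerical check makes this concrete: the sketch solves the two loop equations for the VCDL delays and confirms that the upper path equals $\frac{N}{2} T_{Die1\_clk}$. The function name and all delay values are hypothetical.

```python
def solve_ddll(n, t_clk, t_tsv1, t_tsv2, t_bb, t_buffer):
    """Solve Eq. 1.6 for the two VCDL delays when both loops are locked."""
    # Loop 2: 2*T_BB + 2*T_TSV2 + 2*T_BB + 2*T_VCDL1 = N * T_clk
    t_vcdl1 = (n * t_clk - 4 * t_bb - 2 * t_tsv2) / 2
    # Loop 1: T_TSV1 + T_VCDL2 + T_Buffer + T_BB + T_TSV2 + T_BB + T_VCDL1 = N * T_clk
    t_vcdl2 = n * t_clk - (t_tsv1 + t_buffer + 2 * t_bb + t_tsv2 + t_vcdl1)
    return t_vcdl1, t_vcdl2


# Hypothetical delays in ps; N = 2 and a 1 GHz clock.
N, T_CLK = 2, 1000.0
T_TSV1, T_TSV2, T_BB, T_BUFFER = 60.0, 90.0, 40.0, 50.0

t_vcdl1, t_vcdl2 = solve_ddll(N, T_CLK, T_TSV1, T_TSV2, T_BB, T_BUFFER)
upper_path = T_TSV1 + t_vcdl2 + T_BUFFER
print(t_vcdl1, t_vcdl2, upper_path, N / 2 * T_CLK)
```

The result does not depend on the individual TSV delays, which is why the scheme tolerates a mismatch between TSV1 and TSV2.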

#### 1.3.2 Motivation

In recent years, clock distribution and clock skew have become design challenges for high-speed SoC designs, because clock skew may cause the system to fail. In addition, high-density transistor designs implemented with 3-D package technology face the same clock distribution challenges.

For 3D-IC die-to-die clock synchronization, the DBDA architecture [30] does not consider TSV delay variation. If TSV delay variation occurs, the phase error between the two clock signals may be relatively large. Besides the TSV delay variation problem, we found that all of the prior architectures require two TSV channels. Therefore, we propose a DLL architecture with a single TSV to eliminate the TSV delay variation. In addition, the cost of TSV manufacturing can be reduced.

## **1.4 Thesis Organization**

Table 1.1 lists the PVT conditions used in the following chapters. The rest of the thesis is organized as follows. Chapter 2 describes the system architecture of the proposed ADDLL with two TSV channels, the locking procedure, and the simulation results. Chapter 3 describes the system architecture of the proposed ADDLL with a single TSV channel, the locking procedure, the experimental results, and the chip planning of the test chip. It also includes a comparison table that compares the proposed ADDLLs with prior art. Finally, Chapter 4 concludes the thesis and discusses future work.

Table 1.1 PVT conditions in this thesis

| Corner | Nominal Supply Voltage (V) | Temperature (℃) |
|--------|-------------------------------|-----------------|
| FF     |                               | 0               |
| TT     | 1.0                           | 25              |
| SS     | 0.9                           | 125             |

## Chapter 2 3-D ICs clock synchronization with two TSVs

## 2.1 Architecture Overview

Fig. 2.1 shows the proposed ADDLL for 3-D ICs die-to-die clock synchronization with two TSVs. The ADDLL is composed of two digital controlled delay lines (DCDL\_A and DCDL\_B), two ADDLL controllers (CTRL\_A and CTRL\_B), two digital controlled varactor-based delay lines (DCV\_A and DCV\_B), two phase detectors (PD\_A and PD\_B), two frequency-divided-by-2 circuits, and six tri-state buffers (buf\_A to buf\_F).



Figure 2.1 The proposed ADDLL Architecture with two TSVs

There are three steps in the proposed ADDLL to achieve die-to-die clock synchronization. **First**, when the ADDLL is reset, the path\_control signal is set to zero, and DCDL\_A, DCDL\_B, DCV\_A, and DCV\_B are set to their minimum delay. In the upper delay path, the DIE1\_CLK signal passes through buf\_A, DCDL\_A, buf\_B, TSV1, DCV\_A, and buf\_E and arrives at phase detector B (PD\_B) as the dcva\_to\_pd signal. Similarly, in the lower delay path, the DIE1\_CLK signal passes through buf\_C, DCDL\_B, buf\_D, TSV2, DCV\_B, and buf\_F and arrives at PD\_B as the dcvb\_to\_pd signal. The six tri-state buffers (buf\_A to buf\_F) are identical, and the delay of DCDL\_A is the same as that of DCDL\_B. Therefore, the phase error between the dcva\_to\_pd signal and the dcvb\_to\_pd signal comes from the delay variation between TSV1 and TSV2.

**Second**, after the TSV delay variation is compensated, the clock\_gate signal is set to zero for three consecutive clock cycles to stop the DIE1\_CLK signal from propagating into the upper delay path. This clock gating makes it possible to recognize the first positive edge of the fb\_clk signal in the subsequent locking procedure.

**Third**, after the clock gating is performed, the path\_control and clock\_gate signals are pulled high, and the DIE1\_CLK signal passes through buf\_A, DCDL\_A, buf\_B, TSV1, DCV\_A, buf\_E, buf\_F, DCV\_B, TSV2, buf\_D, DCDL\_B, buf\_C, and a divide-by-2 circuit to phase detector A (PD\_A), where it is denoted as fb\_div2. In addition, the DIE1\_CLK signal is divided by 2 and sent to PD\_A, where it is denoted as clk\_div2. PD\_A detects the phase relationship between the clk\_div2 and fb\_div2 signals and outputs the dcdl\_up and dcdl\_down signals to CTRL\_A. CTRL\_A outputs the delay line control code (dcdl\_code[10:0]) to adjust the delay of DCDL\_A and DCDL\_B.

## 2.2 The Locking Procedure

Fig. 2.2 shows the timing diagram of the TSV delay variation compensation. In Fig. 2.2, the dcva\_to\_pd signal leads the dcvb\_to\_pd signal, and thus PD\_B asserts dcv\_up to CTRL\_B to increase the delay of DCV\_A. After the PD\_B output changes polarity twice (from dcv\_up to dcv\_down or from dcv\_down to dcv\_up), the dcv\_lock signal is pulled high to stop tuning the control codes of DCV\_A and DCV\_B. At that point the phase error between the dcva\_to\_pd and dcvb\_to\_pd signals is eliminated, which means that the delay variation between TSV1 and TSV2 is compensated.



Figure 2.2 Timing diagram of TSV delay variation compensation
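The compensation loop described above can be sketched behaviourally as follows. This is a minimal Python model under stated assumptions, not the thesis implementation: the function name `compensate_tsv_skew`, the uniform step `step_ps`, and the rule of delaying whichever path currently leads are hypothetical choices made for illustration. Only the rule of locking after two polarity changes of PD\_B is taken from the text.

```python
def compensate_tsv_skew(tsv1_ps, tsv2_ps, step_ps=8.4, max_code=63):
    """Behavioural model of the CTRL_B bang-bang loop (illustrative only).

    Returns (dcva_code, dcvb_code, residual_ps) at the moment dcv_lock
    would be raised, i.e. after the second polarity change of PD_B.
    """
    dcva = dcvb = 0
    previous = None
    changes = 0
    while True:
        # Arrival-time difference at PD_B; negative means dcva_to_pd leads.
        err = (tsv1_ps + dcva * step_ps) - (tsv2_ps + dcvb * step_ps)
        direction = "up" if err < 0 else "down"
        if previous is not None and direction != previous:
            changes += 1
        if changes == 2:
            return dcva, dcvb, err
        previous = direction
        # Assumption: "up" lengthens DCV_A, "down" lengthens DCV_B.
        if direction == "up":
            dcva += 1
        else:
            dcvb += 1
        if max(dcva, dcvb) > max_code:
            raise ValueError("TSV skew exceeds the DCV tuning range")


# Hypothetical TSV delays of 100 ps and 150 ps.
dcva_code, dcvb_code, residual_ps = compensate_tsv_skew(100.0, 150.0)
```

In this model the residual error at lock never exceeds one DCV step, which is why the resolution of the DCV-based delay line bounds the remaining skew.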

Fig. 2.3 shows the timing diagram of die-to-die clock synchronization. After the clock\_gate signal is pulled high, CTRL\_A starts to align the phases of the clk\_div2 and fb\_div2 signals. Because DCDL\_A and DCDL\_B are initially set to their minimum delay, the fb\_div2 signal leads the clk\_div2 signal. CTRL\_A keeps increasing the delay of DCDL\_A and DCDL\_B until the polarity of PD\_A changes from dcdl\_down to dcdl\_up. The lock condition of the ADDLL can be expressed as Eq. 2.1. Because the total delay of the upper delay path equals that of the lower delay path, the total delay of the upper path equals N\*T<sub>DIE1\_CLK</sub> after the ADDLL is locked, which means that the phase error between the DIE1\_CLK signal and the DIE2\_CLK signal is cancelled.

$$\begin{aligned}
&T_{buf\_A} + T_{DCDL\_A} + T_{buf\_B} + T_{TSV1} + T_{DCV\_A} + T_{buf\_E} + T_{buf\_F} \\
&\quad + T_{DCV\_B} + T_{TSV2} + T_{buf\_D} + T_{DCDL\_B} + T_{buf\_C} = 2N \cdot T_{DIE1\_CLK} \\
&2 \cdot (T_{buf\_A} + T_{DCDL\_A} + T_{buf\_B} + T_{TSV1} + T_{DCV\_A} + T_{buf\_E}) = 2N \cdot T_{DIE1\_CLK}
\end{aligned} \tag{Eq. 2.1}$$
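As a worked illustration of Eq. 2.1, the following sketch solves the one-way form of the lock condition for the DCDL delay. It is only a sketch: the function name `dcdl_delay_for_lock` and the 900ps fixed delay are hypothetical, and the DCDL limits are the TT values from Table 2.3. The fixed term stands for the sum of the buffer, TSV, and DCV delays on one path.

```python
import math


def dcdl_delay_for_lock(fixed_one_way_ps, period_ps, dcdl_min_ps, dcdl_max_ps):
    """Smallest N and DCDL delay such that fixed + T_DCDL = N * T_DIE1_CLK."""
    # The DCDL cannot go below its intrinsic delay, so N must cover
    # at least the fixed delay plus that minimum.
    n = max(1, math.ceil((fixed_one_way_ps + dcdl_min_ps) / period_ps))
    t_dcdl = n * period_ps - fixed_one_way_ps
    if t_dcdl > dcdl_max_ps:
        raise ValueError("required delay exceeds the DCDL range")
    return n, t_dcdl


# Hypothetical 900 ps of buffer + TSV + DCV delay, 1 GHz clock (1000 ps),
# DCDL range 315 ps to 4227 ps (TT corner of Table 2.3).
n, t_dcdl = dcdl_delay_for_lock(900.0, 1000.0, 315.0, 4227.0)
```

With these assumed numbers the loop would settle at N = 2, because N = 1 would require a DCDL delay of 100ps, which is below the intrinsic delay.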



Figure 2.3 Timing diagram of die-to-die clock synchronization



## 2.3 Circuit Design and Implementation of ADDLL with two TSVs

## 2.3.1 Digitally Controlled Varactor (DCV)

Fig. 2.4 shows the proposed digitally controlled varactor (DCV)-based delay line architecture. It is composed of a bypass inverter chain and a DCV delay line. The DCV delay line uses 63 NAND gates as digitally controlled varactors to provide a fine delay resolution. DCV\_A and DCV\_B are used to compensate for the delay variation between the two TSVs. The dcv\_en signal controls the bypass inverter chain to provide a longer propagation delay under best-case PVT conditions. Under worst-case PVT conditions, dcv\_en is set to 0 to reduce the overall delay of the DCV-based delay line.



Figure 2.4 The proposed DCV-based delay line architecture

Table 2.1 shows the simulated delay of the DCV-based delay line (DCV\_A and DCV\_B) under PVT variations. The worst-case resolution of the DCV-based delay line is 8.4ps, which is fine enough to compensate accurately for the TSV delay variation.

Table 2.1 The Delay Time of DCV-Based Delay Line

|                 | FF    | TT     | SS     |
|-----------------|-------|--------|--------|
| Intrinsic Delay | 496ps | 676ps  | 1046ps |
| Maximum Delay   | 762ps | 1024ps | 1585ps |
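The resolution can be estimated from Table 2.1 by dividing the tuning range by the number of steps. The sketch below assumes the range is split evenly over 64 steps; this step count is an assumption that happens to reproduce the reported 8.4ps at the SS corner, whereas dividing by the 63 NAND gates mentioned in the text gives about 8.6ps. The names `DCV_DELAY_PS` and `average_step_ps` are illustrative.

```python
# (intrinsic delay, maximum delay) in ps, copied from Table 2.1.
DCV_DELAY_PS = {"FF": (496, 762), "TT": (676, 1024), "SS": (1046, 1585)}


def average_step_ps(intrinsic_ps, maximum_ps, steps=64):
    """Average delay step, assuming the range is divided evenly."""
    return (maximum_ps - intrinsic_ps) / steps


for corner, (intrinsic, maximum) in DCV_DELAY_PS.items():
    print(corner, round(average_step_ps(intrinsic, maximum), 2))
```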



## 2.3.2 Phase and Frequency Detector (PFD)

To align the input clock and the feedback clock, we use a bang-bang phase and frequency detector (PFD). The DFFs triggered by CLK\_IN and CLK\_FB generate the short high pulses QU and QD. Subsequently, the NAND gates and the digital pulse amplifier (DPA) determine whether CLK\_IN is leading or lagging CLK\_FB. However, the pulse width of OUTU and OUTD is too narrow to trigger the DFFs of the digital controller. Therefore, we use a digital pulse amplifier to widen the pulses of OUTU and OUTD. The digital pulse amplifier is composed of a chain of series-connected AND gates and a buffer, as shown in Fig. 2.5.



Figure 2.5 The Phase and Frequency Detector

The advantage of the proposed PFD is its tiny dead zone, which keeps the phase error between CLK\_IN and CLK\_FB small. The dead zone is the range of phase error within which the PFD cannot determine which clock leads.



Figure 2.6 The timing diagram of PFD
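The effect of the dead zone on a bang-bang detector can be illustrated with a small behavioural model. This is a sketch under assumptions: the function name `bang_bang_pfd` and the sign convention (positive error means CLK\_IN leads and the detector answers "up") are chosen for illustration and are not taken from the circuit in Fig. 2.5.

```python
def bang_bang_pfd(phase_error_ps, dead_zone_ps):
    """Return "up", "down", or None when the error is inside the dead zone.

    Assumed convention: phase_error_ps > 0 means CLK_IN leads CLK_FB.
    """
    if abs(phase_error_ps) < dead_zone_ps:
        return None  # error too small to be resolved
    return "up" if phase_error_ps > 0 else "down"


# With the 18 ps dead zone reported for the TT corner, a 10 ps error
# goes undetected while a 25 ps error produces a decision.
undetected = bang_bang_pfd(10.0, 18.0)
detected = bang_bang_pfd(25.0, 18.0)
```

A smaller dead zone therefore lowers the phase error that can remain uncorrected after lock.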

Table 2.2 shows the dead-zone of the proposed PFD with PVT variations. It shows that the proposed PFD can correctly operate at 1.0V in three different process corners.

| PVT | Dead-Zone (ps) @ 1.0V |
|-----|-----------------------|
| FF  | 13                    |
| TT  | 18                    |
| SS  | 29                    |

Table 2.2 The dead-zone of the proposed PFD

## 2.3.3 Digital Controlled Delay Line (DCDL)

Fig. 2.7 shows the digitally controlled delay line (DCDL). It is composed of 63 lattice delay units (LDUs), a fine-tuning delay line (FDL), and a delay line decoder. The coarse-tuning resolution is the delay of two NAND gates, and dummy NAND gates are used to balance the capacitive loading of every NAND gate. The 11-bit control code is decoded into a 63-bit coarse-tuning code and a 31-bit fine-tuning code to control the CDL and the FDL.



Figure 2.7 The digital controlled delay line
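The decoding of the 11-bit control code can be sketched as follows. The bit split (upper 6 bits coarse, lower 5 bits fine) and the thermometer coding are assumptions made for illustration; the text only states the widths of the decoded codes. The function name `decode_dcdl_code` is hypothetical.

```python
def decode_dcdl_code(code):
    """Split an 11-bit code into 63-bit coarse and 31-bit fine codes.

    Assumptions: bits [10:5] select the coarse step, bits [4:0] the
    fine step, and both outputs are thermometer coded.
    """
    if not 0 <= code < 2 ** 11:
        raise ValueError("control code must fit in 11 bits")
    coarse = code >> 5    # 0..63
    fine = code & 0x1F    # 0..31
    coarse_code = [1 if i < coarse else 0 for i in range(63)]
    fine_code = [1 if i < fine else 0 for i in range(31)]
    return coarse_code, fine_code


# Coarse step 3, fine step 2.
coarse_code, fine_code = decode_dcdl_code((3 << 5) | 2)
```

A 6-bit field addresses the 63 LDUs and a 5-bit field addresses the 31 fine steps, which is consistent with the 11-bit width of dcdl\_code[10:0].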

Table 2.3 lists the properties of the proposed DCDL under process variations. The proposed DCDL provides enough delay to complete the locking procedure of the ADDLL. With a two-stage delay tuning architecture, the proposed DCDL provides a wide coarse-tuning range and a high fine-tuning resolution.

|     | 1.0V           |                      |                 |
|-----|----------------|----------------------|-----------------|
| PVT | Max Delay (ps) | Intrinsic Delay (ps) | Resolution (ps) |
| FF  | 3324           | 248                  | 48              |
| TT  | 4227           | 315                  | 62              |
| SS  | 5863           | 433                  | 86              |

Table 2.3 The properties of The Proposed DCDL



## 2.3.4 Fine-Tuning Delay Line (FDL)

In a traditional two-stage coarse-fine delay line, the range of the fine-tuning delay line (FDL) must exceed one coarse-tuning step by about 20% to 30% so that the two ranges still overlap under PVT variations. This requirement not only increases the chip area but also causes a large cycle-to-cycle jitter when the coarse-tuning control code switches. Therefore, we use an interpolator architecture, which guarantees that the controllable delay range of the FDL covers one coarse-tuning step under PVT variations.



Figure 2.8 The FDL

Fig. 2.8 shows the proposed FDL architecture. It is composed of two parallel-connected tri-state buffer arrays, two buffers, and an inverter. The delay difference between CA\_OUT and CB\_OUT is one coarse-tuning step, as shown in Fig. 2.7. The control code (code[30:0]) controls the driving strength of the tri-state buffers. When the fine-tuning code is fully opened (31'h7FFF\_FFFF), the rising edge of OUT is phase-aligned with the falling edge of CA\_OUT. Conversely, when the fine-tuning code is fully closed (31'h0000\_0000), the rising edge of OUT is phase-aligned with the falling edge of CB\_OUT. The FDL divides one coarse-tuning step into 31 parts to achieve the fine-tuning resolution, and its total controllable delay range equals one coarse-tuning step.
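The interpolation can be approximated by a first-order model in which the output edge moves linearly between the CB\_OUT and CA\_OUT edges as more tri-state buffers are enabled. This linearity is an assumption; a real interpolator is not perfectly linear. The function name `fdl_edge_time` and the example edge times are hypothetical.

```python
def fdl_edge_time(t_ca_ps, t_cb_ps, code):
    """Edge time of OUT for a 31-bit fine-tuning code (linear model).

    All ones aligns OUT with CA_OUT, all zeros aligns OUT with CB_OUT.
    """
    if not 0 <= code < 2 ** 31:
        raise ValueError("fine-tuning code must fit in 31 bits")
    enabled = bin(code).count("1")   # number of enabled buffers
    weight = enabled / 31
    return t_cb_ps + (t_ca_ps - t_cb_ps) * weight


# Hypothetical edges one coarse step (62 ps at TT) apart.
fully_open = fdl_edge_time(0.0, 62.0, 0x7FFFFFFF)
fully_closed = fdl_edge_time(0.0, 62.0, 0x00000000)
one_step = fdl_edge_time(0.0, 62.0, 0x00000001)
```

Under this model each enabled buffer shifts the edge by 62/31 = 2ps at the TT corner, which is close to the fine-tuning resolution reported for the FDL.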

| PVT | Intrinsic Delay (ps) @ 1.0V | Resolution (ps) @ 1.0V |
|-----|-----------------------------|------------------------|
| FF  | 114.3                       | 1.49                   |
| TT  | 149.9                       | 1.92                   |
| SS  | 209.4                       | 2.57                   |

Table 2.4 Properties of The Proposed FDL



## 2.4 Experimental Results



## 2.4.1 Specifications of the ADDLL with two TSVs

Figure 2.9 Layout of The Proposed ADDLL

The proposed ADDLL is implemented in a standard-performance 90nm 1P9M CMOS process with a 1.0V power supply. Fig. 2.9 shows the layout of the test chip. The active area of the test chip is 300 µm × 300 µm, and the area of the proposed ADDLL per die is 0.045 mm<sup>2</sup>. Two delay lines are added to the test chip to emulate the TSV delay.

## 2.4.2 Simulation Results of the ADDLL with two TSVs

Fig. 2.10 shows the simulation waveform of the proposed ADDLL with a 1GHz input clock. After the ADDLL is reset, the path\_control signal is set to zero, and DCDL\_A, DCDL\_B, DCV\_A, and DCV\_B are set to their minimum delay. CTRL\_B adjusts DCV\_A and DCV\_B according to the output of PD\_B, increasing the delay of DCV\_A or DCV\_B until the phase error between the dcva\_to\_pd and dcvb\_to\_pd signals is eliminated, which means that the delay variation between TSV1 and TSV2 is compensated.



Figure 2.10 Simulation waveform of the proposed ADDLL

After the TSV delay variation is compensated, the DCV-based delay line control codes (dcva[5:0] and dcvb[5:0]) are fixed. Then, the path\_control signal is pulled high, and CTRL\_A adjusts DCDL\_A and DCDL\_B according to the output of PD\_A to reduce the phase error between the clk\_div2 and fb\_div2 signals. After the ADDLL is locked, the phase error between the clk\_div2 and fb\_div2 signals is eliminated, and the phase error between the DIE1\_CLK and DIE2\_CLK signals is also cancelled, as explained in Section 2.2. In Fig. 2.10, the delay variation between TSV1 and TSV2 is 176.4ps. After the proposed ADDLL is locked, the phase error between the DIE1\_CLK signal and the DIE2\_CLK signal stays below the 21.9ps maximum phase error of the proposed design.



## 2.4.3 Comparison Table

|                               | Proposed      | JSSC'13 [30]    | TCAS-I'13 [31]  | ISCAS'12 [32]    |
|-------------------------------|---------------|-----------------|-----------------|------------------|
| Type                          | All-Digital   | All-Digital     | All-Digital     | Analog           |
| Process                       | 90nm          | 130nm           | 90nm            | 0.18µm           |
| Supply voltage                | 1.0V          | 1.2V            | 1.0V            | 1.8V             |
| Frequency                     | 300MHz ~ 1GHz | 200MHz ~ 1.6GHz | 50MHz ~ 600MHz  | 556MHz ~ 1.5GHz  |
| Phase Error                   | < 21.9ps      | < 50ps\*        | < 15.8ps        | < 2ps            |
| Area per Die (mm<sup>2</sup>) | 0.045         | 0.06            | 0.0044          | N/A              |
| Power                         | 3.27mW @1GHz  | 0.9mW @1.6GHz   | 1.8mW @600MHz   | 56mW @1.5GHz     |

Table 2.5 Performance Comparisons

\*If the replica delay is matched perfectly

Table 2.5 compares the proposed ADDLL with state-of-the-art designs. The DBDA [30] needs to replicate the inter-die TSV wire delay. Therefore, unexpected TSV delay variations can greatly increase the phase error between the clock signals in the layers of a 3D-IC after the DLL is locked. In the dual-locking DLL [31], after the DLL is locked, the two DLLs must be fine-tuned continuously in an interleaved manner to maintain the phase alignment. However, regularly switching the direction of the forward path and the feedback path and fine-tuning the two DLLs may cause a relatively large error during the phase maintaining mode. The dual-delay-locked loop (D-DLL) [32] does not need to switch the direction of the forward path and the feedback path because a bidirectional buffer is applied. However, its two DLLs work at the same time, which increases the design complexity needed to keep the D-DLL stable. The relatively high power consumption is another disadvantage of the analog charge-pump-based D-DLL.

An all-digital delay-locked loop (ADDLL) for 3D-IC die-to-die clock synchronization with two TSVs is presented. The delay variation of the TSVs is compensated before the normal operation of the ADDLL. Compared with existing DLLs for 3D-IC clock deskew applications, the proposed ADDLL does not need to switch the path during the phase maintaining mode. The proposed ADDLL operates with a 300MHz to 1GHz input clock, and the maximum phase error is smaller than 21.9ps. In addition, the lock-in time is 79 cycles at 1GHz. Furthermore, the proposed ADDLL is implemented with standard cells, so the design can be ported to a different process in a short time. Therefore, the proposed ADDLL is very suitable for 3D-IC die-to-die clock synchronization applications.

## Chapter 3 3-D ICs clock synchronization with a single TSV

## 3.1 Architecture Overview

Fig. 3.1 shows the proposed ADDLL for die-to-die clock synchronization in 3D-ICs. The ADDLL is composed of a digitally controlled delay line (DCDL), a pulse generator (PG), a time-to-digital converter (TDC) embedded controlled delay line (CDL), a phase detector (PD), two delay circuits (DELAY), two ADDLL controllers (DIE1\_CTL and DIE2\_CTL), an OR-logic gate, and six tri-state buffers.



Figure 3.1 The proposed ADDLL architecture with a single TSV

DIE1\_CLK is the reference clock of Die1, and it is transmitted to Die2 as the reference clock of Die2. The goal of the proposed ADDLL is to phase-align DIE1\_CLK and DIE2\_CLK. The proposed ADDLL achieves die-to-die clock synchronization in two steps. **First**, when the ADDLL is reset, the pg\_en signal is set to high, the pass\_clk signal is set to low, and the oe signal of Die2 is set to low. DIE1\_CLK is fed to the PG, and the PG generates a single-shot pulse. After the pulse is generated, the pg\_en signal is set to low. The pulse is transmitted to Die2 through the TSV channel, circulates in Die2 from the upper delay circuit (tri-state buffer E + DELAY) to the bottom delay circuit (DELAY + tri-state buffer F), and is then transmitted back to Die1. The TDC-Embedded CDL measures the time difference between the first pulse generated by Die1 and the pulse returned from Die2. Then, the delay of the TDC-Embedded CDL is set to half of this time difference to mirror the delay of the TSV channel. After the first step is finished, the oe signal is held low permanently.

**Second**, the ADDLL locking procedure begins. The pass\_clk signal is set to high, so that DIE1\_CLK is passed to the TDC-Embedded CDL and, through the TSV channel, to Die2. The TDC-Embedded CDL generates the delayed signal mirror\_sig, which mirrors the TSV channel delay. DIE1\_CTL increases the delay of the DCDL until mirror\_sig and DIE1\_CLK are phase-aligned. The clock that arrives at Die2 is denoted as DIE2\_CLK. Because the total delay of the TSV, the tri-state buffer, and DELAY was mirrored by the TDC-Embedded CDL in the first step, mirror\_sig and DIE2\_CLK are phase-aligned. Consequently, when DIE1\_CLK and mirror\_sig are phase-aligned, DIE1\_CLK and DIE2\_CLK are also phase-aligned. Finally, once the ADDLL is locked, DIE1\_CTL fine-tunes the delay of the DCDL to maintain the phase alignment between DIE1\_CLK and mirror\_sig in the maintain mode.

## 3.2 The Locking Procedure

Fig. 3.2 shows the locking procedure of the proposed ADDLL. **First**, when the ADDLL is reset, DIE1\_CLK is fed to the PG, which is controlled by the pg\_en signal. The pg\_en signal is set to high by the first negative edge of DIE1\_CLK and set to low by the second negative edge of DIE1\_CLK. That is, the PG generates the pulse between the first and second negative edges of DIE1\_CLK. The pulse travels through the TSV to Die2. When the pulse arrives at Die2, the OR-logic gate generates the first or\_sig pulse. Later, the pulse passes to DIE2\_CLK, and the second or\_sig pulse is generated. At this moment, oe is set to high to let the pulse return to Die1. When the pulse has returned to Die1, the oe signal is set to low. After the pulse has travelled from Die2 back to Die1, tdc\_code[5:0] is set to half of the pulse circulation time.



Figure 3.2 The timing diagram of locking procedure

**Second**, pass\_clk is set to high after the TSV delay has been mirrored by setting tdc\_code[5:0]. When mirror\_sig is received by the PD, DIE1\_CTL increases the value of coarse[5:0] until the lock signal is set to high, that is, until the rising edge of mirror\_sig falls within the locking window of the PD. The lock signal is set to high when mirror\_sig and DIE1\_CLK are phase-aligned. Because the total delay of the unidirectional path is mirrored by the TDC-Embedded CDL, mirror\_sig and DIE2\_CLK are phase-aligned. To sum up, when mirror\_sig and DIE1\_CLK are phase-aligned, the phase error between DIE1\_CLK and DIE2\_CLK is zero. Finally, DIE1\_CTL keeps fine-tuning the value of fine[4:0] to maintain the phase alignment between DIE1\_CLK and mirror\_sig.
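The coarse search in the second step can be sketched as a simple loop that raises coarse[5:0] until the mirror\_sig edge lands inside the locking window. This is a behavioural sketch under assumptions: the function name `coarse_lock`, the symmetric window around the DIE1\_CLK rising edge, and all numeric values are hypothetical and not taken from the thesis.

```python
def coarse_lock(period_ps, fixed_delay_ps, coarse_step_ps, window_ps, max_code=63):
    """Increase the coarse code until mirror_sig falls in the locking window.

    fixed_delay_ps models every delay that does not depend on the coarse
    code (DCDL intrinsic delay plus the mirrored TSV-channel delay).
    Returns (coarse_code, offset_ps) at lock.
    """
    for coarse in range(max_code + 1):
        total = fixed_delay_ps + coarse * coarse_step_ps
        phase = total % period_ps
        # Distance from the nearest rising edge of DIE1_CLK.
        offset = min(phase, period_ps - phase)
        if offset <= window_ps / 2:
            return coarse, offset
    raise ValueError("no lock within the coarse tuning range")


# Hypothetical values: 1 GHz clock, 700 ps fixed delay, 62 ps coarse
# step, and a locking window as wide as one coarse step.
coarse_code, offset_ps = coarse_lock(1000, 700, 62, 62)
```

Choosing a window at least as wide as one coarse step ensures that the search cannot step over the window, after which the fine[4:0] code removes the remaining offset.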



## 3.3 Circuit Design and Implementation of ADDLL with a single TSV

## 3.3.1 Pulse Generator (PG)

To mirror the TSV delay to the TDC-Embedded CDL, we need to generate a pulse with a sufficient pulse width. Fig. 3.3 shows the proposed PG, which is composed of two tri-state buffers, six buffers, and an inverter. The two tri-state buffers prevent a second pulse from being generated, because only a single-shot pulse is needed.



Fig. 3.4 shows the timing diagram of the proposed PG. First, we set the pg\_en signal to high, and the clock signal can input to the PG. The "a" signal is delayed after a tri-state buffer, and the "b" signal is delayed by a delay chain (one inverter and six buffers). At this moment, the AND-logic gate generates the pulse signal between "a" signal and "b" signal.



Figure 3.4 The timing diagram of PG

Table 3.1 lists the width of the pulse generated by the proposed PG under process variations. The pulse must not be too narrow, because it could vanish along the pulse routing path. Table 3.1 shows that the pulse is generated under PVT variations and that the pulse width is always larger than 200ps. In addition, when pg\_en is low, the PG does not generate any pulse.

|     | 1.0V             |
|-----|------------------|
| PVT | Pulse Width (ps) |
| FF  | 202.2            |
| TT  | 248.1            |
| SS  | 321.4            |

Table 3.1 Pulse width of The generated pulse

## 3.3.2 TDC-Embedded CDL

The proposed TDC-Embedded CDL is composed of 63 lattice delay units (LDUs), 64 DFFs (time-to-digital converter units), a TDC encoder, and a TDC decoder, as shown in Fig. 3.5. First, the delay of the TDC-Embedded CDL is set to its maximum value, so that a signal propagates through the entire upper path and then through the entire bottom path. The first pulse generated by the PG can therefore propagate through the whole delay path. When the second pulse loops back from Die2 to the TDC-Embedded CDL, the DFFs are triggered to record the time difference between the two pulses. Then, the TDC encoder encodes the sampled value into tdc\_code[5:0]. Finally, the TDC decoder decodes tdc\_code[5:0] into code[62:0], which sets the delay of the delay path to half of the time difference between the pulses.



Figure 3.5 The TDC-Embedded CDL
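The encode and decode steps can be sketched as follows. This is a minimal model under assumptions: the DFF samples are taken to form a thermometer pattern, the halving is placed in the encoder with floor rounding, and the decoder output is thermometer coded. None of these details is stated in the text, and the function name `mirror_from_tdc_samples` is hypothetical.

```python
def mirror_from_tdc_samples(samples):
    """Convert 64 DFF samples into tdc_code[5:0] and code[62:0].

    samples[i] == 1 is assumed to mean that the first pulse had already
    passed stage i when the returning pulse triggered the DFFs.
    """
    if len(samples) != 64:
        raise ValueError("expected one sample per DFF (64 in total)")
    round_trip_stages = sum(samples)          # round-trip time in LDU steps
    tdc_code = min(round_trip_stages // 2, 63)  # half of it, as a 6-bit code
    code = [1 if i < tdc_code else 0 for i in range(63)]
    return tdc_code, code


# Hypothetical capture: the first pulse travelled through 40 stages.
tdc_code, code = mirror_from_tdc_samples([1] * 40 + [0] * 24)
```

Because the pulse crosses the TSV channel twice, half of the measured count is the one-way delay that the CDL has to reproduce.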

## 3.3.3 Phase Detector (PD)

The phase detector is composed of three buffers, three DFFs, one XOR-logic gate, and one AND-logic gate, as shown in Fig. 3.6. The lock signal is high when the positive edge of CLK\_FB falls into the locking window, as shown in Fig. 3.7. The controller uses the "tune" signal to fine-tune the DCDL.



Figure 3.6 The Phase Detector



Figure 3.7 The concept of locking window

The xor\_b signal prevents the ADDLL from locking to the negative edge of CLK\_IN. As shown in Fig. 3.8, when the rising edges of CLK\_IN and CLK\_FB are aligned, the lock\_tmp signal is set to high. If the xor\_b signal is low at this moment, the rising edge of CLK\_FB has locked to the falling edge of CLK\_IN instead. Therefore, the lock signal is decided by lock\_tmp, xor\_b, and pd\_en. In addition, when the tune signal is low, the ADDLL controller increases the delay of CLK\_FB. In contrast, when the tune signal is high, the ADDLL controller decreases the delay of CLK\_FB.



Figure 3.8 The timing diagram of PD
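The decision logic described above can be summarized in a few lines. The sketch assumes that the three signals are combined with a logical AND, which is consistent with the single AND-logic gate in the PD but is not spelled out in the text. The function names are hypothetical.

```python
def pd_lock(lock_tmp, xor_b, pd_en):
    """Lock is reported only for a rising-edge alignment while the PD is enabled.

    Assumption: lock = lock_tmp AND xor_b AND pd_en.
    """
    return bool(lock_tmp and xor_b and pd_en)


def delay_action(tune):
    """Controller reaction to the tune signal, as described in the text."""
    return "decrease" if tune else "increase"


# Edges aligned but xor_b low: CLK_FB sits on the falling edge of CLK_IN,
# so no lock is reported.
false_lock = pd_lock(lock_tmp=1, xor_b=0, pd_en=1)
```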

## 3.4 Experimental Results

## 3.4.1 Test Circuit Implementation

To overcome the clock rate restriction of the I/O pads, we propose a test circuit that provides the ADDLL with a high-speed on-chip clock. The test circuit also divides the frequency of the clock signals in die1 and die2 so that the internal clock signals can be observed. In addition, the test circuit outputs the working state of the ADDLL, which makes debugging easier.



Figure 3.9 The test circuit

The test circuit is composed of a state checker, a digitally controlled oscillator (DCO), and a divider circuit, as shown in Fig. 3.9. The DCO provides a 150MHz to 1GHz clock signal for the ADDLL. When the ADDLL is locked, the high-speed clocks cannot pass through the I/O pads because of the I/O pad speed limitation. The divider circuit divides the frequency of the clock signals by two or eight so that the divided clock signals can pass through the low-speed I/O pads.



#### 3.4.1.1 DCO

The DCO is composed of a MUX-type slow DCO, a fast DCO, and a DCO decoder, as shown in Fig. 3.10. The MUX-type slow DCO is composed of a NAND-logic gate, which enables the slow DCO, and 15 delay units (each a MUX and a delay buffer), which provide different frequency ranges. The fast DCO is composed of a NAND-logic gate, a delay buffer chain, and three tri-state buffers. The DCO decoder decodes dco\_code[3:0] into SLOW[14:0] and FAST[2:0] to control the DCO frequency range. In addition, the EN\_FAST\_DCO and EN\_SLOW\_DCO signals enable and disable the fast DCO and the slow DCO, respectively.



Figure 3.10 The DCO

Table 3.2 lists the frequency range of the slow DCO. It shows that the slow DCO can provide ADDLL with a 150MHz to 770MHz clock.

| dco_code[3:0]    | Output Frequency (MHz) |
|------------------|------------------------|
| 4'b 0000 (4'd0)  | 771              |
| 4'b 0001 (4'd1)  | 603              |
| 4'b 0010 (4'd2)  | 496              |
| 4'b 0011 (4'd3)  | 421              |
| 4'b 0100 (4'd4)  | 365              |
| 4'b 0101 (4'd5)  | 323              |
| 4'b 0110 (4'd6)  | 289              |
| 4'b 0111 (4'd7)  | 262              |
| 4'b 1000 (4'd8)  | 239              |
| 4'b 1001 (4'd9)  | 220              |
| 4'b 1010 (4'd10) | 204              |
| 4'b 1011 (4'd11) | 190              |
| 4'b 1100 (4'd12) | 178              |
| 4'b 1101 (4'd13) | 167              |
| 4'b 1110 (4'd14) | 157              |
| 4'b 1111 (4'd15) | 149              |

Table 3.2 The output frequency range of the slow DCO
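Converting the frequencies of Table 3.2 into periods shows how much each additional delay unit lengthens the oscillation period. The sketch below assumes that the table values are in MHz, as the surrounding text implies, and that consecutive codes differ by one delay unit. The variable names are illustrative.

```python
# Output frequencies of the slow DCO for dco_code 0..15 (Table 3.2).
SLOW_DCO_MHZ = [771, 603, 496, 421, 365, 323, 289, 262,
                239, 220, 204, 190, 178, 167, 157, 149]


def period_ps(freq_mhz):
    """Clock period in picoseconds for a frequency given in MHz."""
    return 1e6 / freq_mhz


periods = [period_ps(f) for f in SLOW_DCO_MHZ]
# Period added by each extra delay unit in the ring.
increments = [later - earlier for earlier, later in zip(periods, periods[1:])]
mean_increment_ps = sum(increments) / len(increments)
```

The increments are nearly constant at roughly 360ps per code step. A constant period step explains why the frequency steps in Table 3.2 shrink as the code increases.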

Table 3.3 lists the frequency range of the fast DCO. It shows that the fast DCO can provide ADDLL with a 982MHz to 1011MHz clock.

| dco_code[1:0] | Output Frequency @1.0V (MHz) |
|---------------|------------------------------|
| 2'b 00 (2'd0) | 982                          |
| 2'b 01 (2'd1) | 1001                         |
| 2'b 10 (2'd2) | 1007                         |
| 2'b 11 (2'd3) | 1011                         |

Table 3.3 The output frequency range of the fast DCO



#### 3.4.1.2 Divider

The divider circuit is composed of two types of dividers, as shown in Fig. 3.11. One is composed of three DFFs, a MUX, and an inverter. The other is composed of only three DFFs. The difference between the two dividers is that the SYSTEM\_CLK signal is always divided by eight, whereas the clock signals of die1 and die2 are divided by two or eight, as selected by the divide\_select signal. In addition, the dividers start to divide the clock signals when the lock signal is high.



Figure 3.11 The Divider
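The selectable divider can be modelled as a three-stage binary counter whose first or third stage is routed to the output. This is a behavioural sketch: the function name `divided_waveform` and the use of the ratio itself (2 or 8) as the select value are assumptions, since the encoding of divide\_select is not given.

```python
def divided_waveform(num_rising_edges, ratio):
    """Output level after each input rising edge of a 3-DFF divider.

    ratio == 2 taps the first stage, ratio == 8 taps the third stage.
    """
    if ratio not in (2, 8):
        raise ValueError("only divide-by-2 and divide-by-8 are modelled")
    tap = 0 if ratio == 2 else 2
    count = 0
    levels = []
    for _ in range(num_rising_edges):
        count = (count + 1) % 8       # three DFFs hold a 3-bit count
        levels.append((count >> tap) & 1)
    return levels


by_two = divided_waveform(4, 2)
by_eight = divided_waveform(8, 8)
```

The divide-by-2 output repeats every two input cycles and the divide-by-8 output every eight, which brings a 1GHz clock down to 500MHz or 125MHz for observation through the I/O pads.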

## 3.4.2 Specifications



Figure 3.12 The layout of the ADDLL with a single TSV

The test chip is fabricated in the TSMC 90nm standard-performance CMOS process. Fig. 3.12 shows the layout of the proposed ADDLL for 3-D IC clock synchronization with a single TSV. The core area occupies  $170 \times 170 \ \mu\text{m}^2$, and the chip area including the I/O pads occupies  $670 \times 670 \ \mu\text{m}^2$. The chip is composed of an ADDLL and a test circuit. The test circuit generates an on-chip high-speed clock and divides the clock signals of die1 and die2.



Figure 3.13 The chip I/O and floorplan of the ADDLL with a single TSV

Fig. 3.13 shows the chip I/O and the floorplan of the ADDLL with a single TSV. The chip has 9 output pins, 11 input pins, and 12 power pins. Table 3.4 lists the detailed I/O pad information.

We use the I\_CLK to observe the system clock frequency to ensure the proposed DCO works normally. The I\_DIE1 and I\_DIE2 are generated to observe the phase error when the ADDLL is locked.

| Pin Number | Pin Name        | Direction | Information                           |
|------------|-----------------|--------|------------------------------------------|
| 1          | I_TSV1          | Input  | TSV Delay Control Code                   |
| 2          | I_TSV0          | Input  | TSV Delay Control Code                   |
| 3          | I_CLK           | Output | DCO CLK Divided By 8                     |
| 4          | VDDP1           | Input  | Pad Power                                |
| 5          | I_TDC_GET       | Output | TDC Mapping Finish                       |
| 6          | I_DIV           | Input  | DIE1_CLK & DIE2_CLK<br>Divided By 2 or 8 |
| 7          | VDDP0           | Input  | Pad Power                                |
| 8          | VSSC0           | Input  | Core Power                               |
| 9          | VDDC0           | Input  | Core Power                               |
| 10         | VSSP0           | Input  | Pad Power                                |
| 11         | I_RESET         | Input  | ADDLL RESET                              |
| 12         | I_DIE2          | Output | DIE2_CLK                                 |
| 13         | VDDP3           | Input  | Pad Power                                |
| 14         | I_DIE1          | Output | DIE1_CLK                                 |
| 15         | I_PULSE_BACK0   | Output | PG Pulse Out (First)                     |
| 16         | I_PULSE_BACK1   | Output | PG Pulse Back (Second)                   |
| 17         | I_DCO3          | Input  | DCO Control Code                         |
| 18         | I_DCO2          | Input  | DCO Control Code                         |
| 19         | I_DCO1          | Input  | DCO Control Code                         |
| 20         | VSSP3           | Input  | Pad Power                                |
| 21         | I_LOCK          | Output | ADDLL LOCK                               |
| 22         | I_DCO0          | Input  | DCO Control Code                         |
| 23         | VSSP2           | Input  | Pad Power                                |
| 24         | VDDC1           | Input  | Core Power                               |
| 25         | VSSC1           | Input  | Core Power                               |
| 26         | VDDP2           | Input  | Pad Power                                |
| 27         | I_EN_SLOW       | Input  | Enable Slow DCO                          |
| 28         | I_PULSE_GEN     | Output | Pulse Generated By PG                    |
| 29         | VSSP1           | Input  | Pad Power                                |
| 30         | I_PD_EN         | Output | Locking Procedure Start                  |
| 31         | I_EN_FAST       | Input  | Enable Fast DCO                          |
| 32         | I_TSV2          | Input  | TSV Delay Control Code                   |

Table 3.4 I/O pins information of ADDLL with a single TSV

## 3.4.3 Simulation Results

Fig. 3.14 shows the post-layout simulation of the ADDLL with a single TSV at an operating frequency of 1GHz. The proposed ADDLL with a single TSV operates correctly under PVT variations, and its operating frequency ranges from 200MHz to 1GHz. First, as mentioned in Section 3.2, the PG generates a pulse that is transmitted from die1 to die2, and the TDC-Embedded CDL mirrors the delay of the TSV channel. At this moment, mirror\_sig and DIE2\_CLK are phase-aligned. Second, the controller increases the coarse code of the DCDL until the lock signal of the PD is high. Finally, the controller updates the fine-tuning code of the DCDL to maintain the phase alignment between DIE1\_CLK and DIE2\_CLK.



Figure 3.14 The post-layout simulation of ADDLL with a single TSV

To verify that the ADDLL with a single TSV operates normally, we also observe the phase error between DIE1\_CLK and DIE2\_CLK. After the ADDLL is locked, we take 1000 samples of the phase error under PVT variations. At the FF corner, the proposed ADDLL operates with 420MHz and 1.5GHz clock signals, and the phase error is smaller than 70ps after the ADDLL is locked, as shown in Fig. 3.15 and Fig. 3.16. At the TT corner, the proposed ADDLL operates with 300MHz and 1GHz clock signals, and the phase error is smaller than 80ps after the ADDLL is locked, as shown in Fig. 3.17 and Fig. 3.18. At the SS corner, the proposed ADDLL operates with 187MHz and 648MHz clock signals, and the phase error is smaller than 170ps after the ADDLL is locked, as shown in Fig. 3.19 and Fig. 3.20.



Figure 3.15 The phase error between DIE1\_CLK and DIE2\_CLK at FF corner with 1GHz operation frequency.


Figure 3.16 The phase error between DIE1\_CLK and DIE2\_CLK at FF corner



Figure 3.17 The phase error between DIE1\_CLK and DIE2\_CLK at TT corner with 1GHz operation frequency.



Figure 3.18 The phase error between DIE1\_CLK and DIE2\_CLK at TT corner



Figure 3.19 The phase error between DIE1\_CLK and DIE2\_CLK at SS corner with 1GHz operation frequency.



Figure 3.20 The phase error between DIE1\_CLK and DIE2\_CLK at SS corner



## 3.4.4 Phase Error

The phase error observed in Section 3.4.3 has three sources: the TSV delay mirror mismatch, the dead zone of the phase detector, and the signal dividers. The TSV delay mirror mismatch itself has two possible causes. The first is the capacitance imbalance between the upper path and the bottom path in die2. The second is the mismatch introduced when the TDC-Embedded CDL mirrors the TSV delay, as shown in Table 3.5.

Table 3.5 The TSV mirror mismatch

|                                | FF   | TT   | SS   |
|--------------------------------|------|------|------|
| TSV Delay Mirror Mismatch (ps) | < 22 | < 30 | < 47 |

The dead zone of the phase detector is limited by the DFFs. Its value at each corner is shown in Table 3.6.

|                | FF | TT   | SS   |
|----------------|----|------|------|
| Dead zone (ps) | 19 | < 31 | < 51 |

Table 3.6 The deadzone of the phase detector
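As a rough cross-check, the bounds in Table 3.5 and Table 3.6 can be compared with the total phase-error bounds reported in Section 3.4.3. The sketch below assumes that the individual contributions add linearly in the worst case, which is a simplification introduced here and not a claim of the thesis. The leftover headroom is what remains for the signal dividers and any other effects, and it is not a measured value.

```python
# Upper bounds in picoseconds, taken from Table 3.5, Table 3.6,
# and the simulation results of Section 3.4.3.
MIRROR_MISMATCH_PS = {"FF": 22, "TT": 30, "SS": 47}
DEAD_ZONE_PS = {"FF": 19, "TT": 31, "SS": 51}
TOTAL_BOUND_PS = {"FF": 70, "TT": 80, "SS": 170}


def phase_error_budget(corner):
    """Return (accounted, headroom) in ps for one process corner.

    Assumption: mirror mismatch and dead zone add linearly in the
    worst case. The headroom is the part of the total bound that is
    not explained by these two sources.
    """
    accounted = MIRROR_MISMATCH_PS[corner] + DEAD_ZONE_PS[corner]
    headroom = TOTAL_BOUND_PS[corner] - accounted
    return accounted, headroom


for corner in ("FF", "TT", "SS"):
    accounted, headroom = phase_error_budget(corner)
    print(f"{corner}: mismatch + dead zone = {accounted} ps, headroom = {headroom} ps")
```

In every corner the two tabulated sources stay below the reported total bound, which is consistent with the signal dividers contributing the remainder.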

## 3.5 Performance Comparisons

### 3.5.1 Two Proposed ADDLLs

Table 3.7 compares the proposed ADDLL with two TSVs and the proposed ADDLL with a single TSV. Both architectures are implemented in a 90nm process. The ADDLL with two TSVs uses the DCV-based delay line to compensate for the TSV delay variation before normal operation begins, so the TSV delay variation problem is solved. Compared with the single-TSV architecture, its phase error is smaller. However, its area is larger than that of the ADDLL with a single TSV because of the high-resolution DCV-based delay line. The ADDLL with a single TSV uses only one TSV channel, and thus the TSV delay variation is avoided altogether. Its area is smaller, and the cost of TSV manufacturing is also lower. To sum up, each architecture has its own advantages, and both are suitable for 3-D ICs die-to-die clock synchronization.

|                         | ADDLL with two TSVs | ADDLL with a single TSV |
|-------------------------|---------------------|-------------------------|
| Type                    | All-Digital         | All-Digital             |
| Process                 | 90nm                | 90nm                    |
| Supply Voltage          | 1.0V                | 1.0V                    |
| Number of TSV channels  | 2                   | 1                       |
| Operation Frequency     | 300MHz ~ 1GHz       | 200MHz ~ 1GHz           |
| Phase Error             | < 21.9ps            | < 80ps                  |
| Area (mm<sup>2</sup>)   | 0.09                | 0.0289                  |
| Power (mW)              | 3.27 @ 1GHz         | 4.88 @ 1GHz             |

Table 3.7 Comparison table of ADDLL between a single TSV and two TSVs

Table 3.8 lists the gate counts of the two proposed ADDLLs and shows that they are almost the same. This means that the core utilization of the ADDLL with two TSVs is low, and its layout can be made more compact.

|                | Gate count of ADDLL with two TSVs | Gate count of ADDLL with a single TSV |
|----------------|-----------------------------------|---------------------------------------|
| DIE1           | 3102                              | 3206                                  |
| DIE2           | 427                               | 298                                   |
| Tester Circuit | N/A                               | 266                                   |

Table 3.8 Gate count of ADDLL between a single TSV and two TSVs
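The utilization argument can be made concrete by comparing the gate-count ratio with the area ratio. The sketch below uses only the numbers from Table 3.7 and Table 3.8. Excluding the tester circuit from the single-TSV total is an assumption made here so that the two designs are compared on the same basis.

```python
# Gate counts from Table 3.8 and areas (mm^2) from Table 3.7.
GATES_TWO_TSV = {"DIE1": 3102, "DIE2": 427}
GATES_SINGLE_TSV = {"DIE1": 3206, "DIE2": 298}  # tester circuit (266) excluded
AREA_MM2 = {"two_tsv": 0.09, "single_tsv": 0.0289}


def total_gates(gate_counts):
    """Sum the gate counts of all dies in one design."""
    return sum(gate_counts.values())


def ratio(numerator, denominator):
    """Return numerator / denominator as a float."""
    return numerator / denominator


gates_two = total_gates(GATES_TWO_TSV)
gates_single = total_gates(GATES_SINGLE_TSV)
gate_ratio = ratio(gates_two, gates_single)
area_ratio = ratio(AREA_MM2["two_tsv"], AREA_MM2["single_tsv"])

print(f"Gate count: {gates_two} vs {gates_single} (ratio {gate_ratio:.3f})")
print(f"Area ratio: {area_ratio:.2f}")
```

The gate counts differ by less than one percent, while the reported area differs by roughly a factor of three, which supports the observation that the two-TSV layout has room to be compacted.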

### 3.5.2 Comparison with Prior Art

Table 3.9 compares the performance of the proposed designs with prior art. Both proposed ADDLL architectures handle the TSV delay variation. The design in [30] relies on a mirrored TSV delay line, so if the TSV delay variation is not compensated, it may suffer a large phase error. The area of both proposed ADDLL architectures is also smaller than that of [30]. Compared with [31], the proposed designs operate at a higher speed. Furthermore, [31] needs to switch between the forward path and the feedback path in the phase maintain mode, which may cause a large phase error during this mode. [32] is an analog architecture, and its power consumption is higher than that of the proposed ADDLL architectures. Besides, operating two DLLs simultaneously with bidirectional buffers increases the design complexity. In advanced CMOS processes, the analog charge-pump-based D-DLL also faces design challenges, such as achieving a high voltage gain with a low supply voltage and coping with the leakage current of MOS transistors.

In summary, both proposed ADDLL architectures overcome the TSV delay variation problem, and they are suitable for 3-D ICs die-to-die clock synchronization.

|                          | ADDLL<br>with<br>two TSVs     | ADDLL<br>with a<br>single TSV | JSSC'13<br>[30] | TCAS-I'13<br>[31] | ISCAS'12<br>[32] |
|--------------------------|-------------------------------|-------------------------------|-----------------|-------------------|------------------|
| Type                     | All-Digital                   | All-Digital                   | All-Digital     | All-Digital       | Analog           |
| Process                  | 90nm                          | 90nm                          | 130nm           | 90nm              | 0.18µm           |
| Supply<br>Voltage        | 1.0V                          | 1.0V                          | 1.2V            | 1.0V              | 1.8V             |
| Number of<br>TSV channel | 2                             | 1                             | 2               | 2                 | 2                |
| Frequency                | 300MHz ~ 1GHz                 | 200MHz ~ 1GHz                 | 200MHz ~ 1.6GHz | 50MHz ~ 600MHz    | 556MHz ~ 1.5GHz  |
| Phase Error              | < 21.9ps                      | < 80ps                        | < 50ps*         | < 15.8ps          | < 2ps            |
| Area (mm<sup>2</sup>)    | 0.09                          | 0.0289                        | 0.12**          | 0.0088            | N/A              |
| Power                    | 3.27mW @1GHz<br>1.06mW @600MHz | 4.88mW @1GHz<br>2.61mW @600MHz | 0.9mW @1.6GHz   | 1.8mW @600MHz     | 56mW @1.5GHz     |

Table 3.9 Performance Comparison with prior arts

\* If the replica delay is matched perfectly. \*\* Area of two dies.
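Because the designs in Table 3.9 report power at different clock frequencies, one rough way to compare them is power divided by frequency, that is, energy per clock cycle. The sketch below is only an illustration: the figure of merit and the function name `energy_per_cycle_pj` are assumptions introduced here, and the normalization ignores differences in process and supply voltage.

```python
# (power in mW, frequency in GHz) pairs taken from Table 3.9.
DESIGNS = {
    "ADDLL with two TSVs": (3.27, 1.0),
    "ADDLL with a single TSV": (4.88, 1.0),
    "JSSC'13 [30]": (0.9, 1.6),
    "TCAS-I'13 [31]": (1.8, 0.6),
    "ISCAS'12 [32]": (56.0, 1.5),
}


def energy_per_cycle_pj(power_mw, freq_ghz):
    """Return energy per clock cycle in pJ.

    1 mW / 1 GHz = 1e-3 W / 1e9 Hz = 1e-12 J, so the ratio is in pJ.
    """
    return power_mw / freq_ghz


for name, (power_mw, freq_ghz) in DESIGNS.items():
    print(f"{name}: {energy_per_cycle_pj(power_mw, freq_ghz):.2f} pJ/cycle")
```

With this normalization the analog design in [32] stands out as the most power-hungry, in line with the discussion in this section.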

Table 3.10 shows that the power consumption of the ADDLL with a single TSV can be reduced further by clock gating the 64 DFFs of the TDC-Embedded CDL. However, the trigger clock signals of the 64 DFFs are not gated in the current tape-out version.

|                  | Power Consumption of 64 DFFs |
|------------------|------------------------------|
| 64 DFFs @ 600MHz | 0.611mW                      |
| 64 DFFs @ 1GHz   | 1.012mW                      |

Table 3.10 The power consumption of the DFFs
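The potential saving from clock gating can be estimated from Table 3.9 and Table 3.10. The sketch below assumes that the power of the 64 DFFs is included in the total power reported for the ADDLL with a single TSV and that gating removes it completely. It therefore gives an upper bound on the saving, since leakage and the overhead of the gating cells are ignored.

```python
# Power in mW for the ADDLL with a single TSV (Table 3.9)
# and for its 64 DFFs alone (Table 3.10).
TOTAL_POWER_MW = {"1GHz": 4.88, "600MHz": 2.61}
DFF_POWER_MW = {"1GHz": 1.012, "600MHz": 0.611}


def gated_power_estimate(freq):
    """Return (power with DFF clocks gated, fraction saved).

    Assumption: the DFF power is part of the reported total and is
    removed entirely by clock gating (an optimistic upper bound).
    """
    total = TOTAL_POWER_MW[freq]
    dff = DFF_POWER_MW[freq]
    return total - dff, dff / total


for freq in ("1GHz", "600MHz"):
    gated, saved = gated_power_estimate(freq)
    print(f"{freq}: about {gated:.3f} mW after gating ({100 * saved:.1f}% saved)")
```

Under these assumptions, gating the DFF clocks would save roughly one fifth of the power at both frequencies.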

# Chapter 4 Conclusion and Future Work

### 4.1 Conclusion

Two ADDLLs for 3-D ICs die-to-die clock synchronization under PVT variations are proposed in this thesis. First, the ADDLL with two TSVs compensates for the TSV delay variation with the proposed DCV-based delay line, and it operates with an input frequency ranging from 300MHz to 1GHz. Its disadvantage is the larger chip area. Second, the ADDLL with a single TSV avoids the TSV delay variation because it requires only a single TSV channel. It operates with an input frequency ranging from 200MHz to 1GHz. Because both architectures are implemented with standard cells, they can be ported to a different process in a short time. Therefore, both ADDLL architectures are suitable for 3-D ICs die-to-die clock synchronization.

### 4.2 Future Work

In recent years, the TSV package technology has become more and more popular because it offers many advantages: it increases the number of I/Os, saves power, and reduces the propagation delay. As process technology matures, the TSV package technology can improve SoC performance. However, it also brings many design challenges.

In 3-D IC RAM applications, many dies are connected with TSVs, but the TSV delay values differ from die to die. This is a serious problem for transmitting and receiving data, and it is the reason we propose the ADDLL architectures for 3-D ICs die-to-die clock synchronization. However, the proposed ADDLLs still face design challenges when more than two dies are stacked. As the number of chips implemented with TSVs keeps growing, the reference clock signal provided by the master chip must be transmitted to several slave dies. The multi-layer die-to-die clock synchronization problem must therefore be solved in the future, or the slave dies may not work correctly.

## References

- [1] Joohee Kim, Jun So Pak, Jonghyun Cho, Eakhwan Song, Jeonghyeon Cho, Heegon Kim, Taigon Song, Junho Lee, Hyungdong Lee, Kunwoo Park, Seungtaek Yang, Min-Suk Suh, Kwang-Yoo Byun, and Joungho Kim, "High-Frequency Scalable Electrical Model and Analysis of a Through Silicon Via (TSV)," *IEEE Transactions on Components, Packaging and Manufacturing Technology*, vol. 1, no. 2, pp. 181-195, Feb. 2011.
- [2] B. Xie, X. Q. Shi, C. H. Chung, and S. W. R. Lee, "Novel Sequential Electro-Chemical and Thermo-Mechanical Simulation Methodology for Annular Through-Silicon-Via (TSV) Design," in Proceedings of 60th Electronic Components and Technology Conference (ECTC), Jun. 2010, pp. 1166-1172.
- [3] Sergej Deutsch and Krishnendu Chakrabarty, "Non-Invasive Pre-Bond TSV Test Using Ring Oscillators and Multiple Voltage Levels," *in Proceedings of Design, Automation & Test in Europe Conference & Exhibition (DATE)*, Mar. 2013, pp. 1065-1070.
- [4] Yu-Hsiang Lin, Shi-Yu Huang, Kun-Han Tsai, Wu-Tung Cheng, Yung-Fa Chou, and Ding-Ming Kwai, "Parametric Delay Test of Post-Bond Through-Silicon Vias in 3-D ICs via Variable Output Thresholding Analysis," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 32, no. 5, pp. 181-195, May. 2013.
- [5] Jhih-Wei You, Shi-Yu Huang, Ding-Ming Kwai, Yung-Fa Chou, and Cheng-Wen Wu, "Performance Characterization of TSV in 3D IC via Sensitivity Analysis," *in Proceedings of 19th IEEE Asian Test Symposium (ATS)*, Dec. 2010, pp. 398-394.
- [6] Shi-Yu Huang, Yu-Hsiang Lin, Kun-Han (Hans) Tsai, Wu-Tung Cheng, Stephen Sunter, Yung-Fa Chou, and Ding-Ming Kwai, "Small delay testing for TSVs in 3-D ICs," *in Proceedings of 49th ACM/EDAC/IEEE Design Automation Conference (DAC)*, Jun. 2012, pp. 1031-1036.
- [7] Minki Cho, Chang Liu, Dae Hyun Kim, Sung Kyu Lim, and Saibal Mukhopadhyay, "Pre-Bond and Post-Bond Test and Signal Recovery Structure to Characterize and Repair TSV Defect Induced Signal Degradation in 3-D System," *IEEE Transactions on Components, Packaging and Manufacturing Technology*, vol. 1, no. 11, pp. 1718-1727, Nov. 2011.
- [8] Fangming Ye and Krishnendu Chakrabarty, "TSV Open Defects in 3D Integrated Circuits: Characterization, Test, and Optimal Spare Allocation," in Proceedings of 49th ACM/EDAC/IEEE Design Automation Conference (DAC), Jun. 2012, pp. 1024-1030.

- [9] Yaoyao Ye, Jiang Xu, Baihan Huang, Xiaowen Wu, Wei Zhang, Xuan Wang, Mahdi Nikdast, Zhehui Wang, Weichen Liu, and Zhe Wang, "3-D Mesh-Based Optical Network-on-Chip for Multiprocessor System-on-Chip," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 32, no. 4, pp. 584-596, Apr. 2013.
- [10] M.H Jabbar, D. Houzet, and O. Hammami, "3D Multiprocessor with 3D NoC Architecture Based on Tezzaron Technology," in Proceedings of IEEE International 3D Systems Integration Conference (3DIC), Jan. 2012, pp. 1-5.
- [11] Yue Qian, Zhonghai Lu, and Wenhua Dou, "From 2D to 3D NoCs: A Case Study on Worst-Case Communication Performance," in Proceedings of IEEE/ACM International Computer-Aided Design (ICCAD), Nov. 2009, pp. 555-562.
- [12] Mohamad Hairol Jabbar, Dominique Houzet, and Omar Hammami, "Impact of 3D IC on NoC Topologies: A Wire Delay Consideration," *in Proceedings of Euromicro Conference on Digital System Design (DSD)*, Nov. 2013, pp. 68-72.
- [13] Michael Buttrick and Sandip Kundu, "On Testing Prebond Dies with Incomplete Clock Networks in a 3D IC using DLLs," *in Proceedings of Design, Automation & Test in Europe Conference & Exhibition (DATE)*, Mar. 2011, pp. 1-6.
- [14] Kun-Chih Chen, En-Jui Chang, Huai-Ting Li, and An-Yeu (Andy) Wu, "RC-based Temperature Prediction Scheme for Proactive Dynamic Thermal Management in Throttle-based 3D NoCs," in press, IEEE Transactions on Parallel and Distributed Systems, Feb. 2014.
- [15] Chih-Hao Chao, Kun-Chih Chen, and An-Yeu (Andy) Wu, "Routing-Based Traffic Migration and Buffer Allocation Schemes for 3-D Network-on-Chip Systems With Thermal Limit," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 21, no. 11, pp. 2118-2131, Nov. 2013.
- [16] Kun-Chih Chen, Shu-Yen Lin, Hui-Shun Hung, and An-Yeu (Andy) Wu, "Traffic-Balanced Topology-Aware Multiple Routing Adjustment for Throttled 3D NoC Systems," in Proceedings of IEEE Workshop on Signal Processing Systems (SiPS), Oct. 2012, pp. 120-124.
- [17] Chih-Hao Chao, Tsu-Chu Yin, Shu-Yen Lin, and An-Yeu (Andy) Wu, "Transport Layer Assisted Routing for Non-Stationary Irregular Mesh of Thermal-Aware 3D Network-on-Chip Systems," *in Proceedings of IEEE International SOC Conference (SOCC)*, Sep. 2011, pp. 284-289.
- [18] Xin Zhao, Saibal Mukhopadhyay, and Sung Kyu Lim, "Variation-Tolerant and Low-Power Clock Network Design for 3D ICs," in Proceedings of IEEE 61st Electronic Components and Technology Conference (ECTC), May. 2011, pp. 2007-2014.

- [19] Uksong Kang, Hoe-Ju Chung, Seongmoo Heo, Duk-Ha Park, Hoon Lee, Jin Ho Kim, Soon-Hong Ahn, Soo-Ho Cha, Jaesung Ahn, DukMin Kwon, Jae-Wook Lee, Han-Sung Joo, Woo-Seop Kim, Dong Hyeon Jang, Nam Seog Kim, Jung-Hwan Choi, Tae-Gyeong Chung, Jei-Hwan Yoo, Joo Sun Choi, Changhyun Kim, and Young-Hyun Jun, "8 Gb 3-D DDR3 DRAM Using Through-Silicon-Via Technology," *IEEE Journal of Solid-State Circuits*, vol. 45, no. 1, pp. 111-119, Jan. 2010.
- [20] Sukeshwar Kannan, Bruce Kim, Sang-Bock Cho, and Byoungchul Ahn, "Analysis of Propagation Delay in 3-D Stacked DRAM," in Proceedings of IEEE International Symposium on Circuits and Systems (ISCAS), May. 2012, pp. 1839-1842.
- [21] Christian Weis, Igor Loi, Luca Benini, and Norbert Wehn, "Exploration and Optimization of 3-D Integrated DRAM Subsystems," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 32, no. 4, pp. 597-610, Apr. 2013.
- [22] Michael B. Healy and Sung Kyu Lim, "Distributed TSV Topology for 3-D Power-Supply Networks," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 20, no. 11, pp. 2066-2079, Nov. 2012.
- [23] Kiyeong Kim, Chulsoon Hwang, Kyoungchoul Koo, Jonghyun Cho, Heegon Kim, Joungho Kim, Junho Lee, Hyung-Dong Lee, Kun-Woo Park, and Jun So Pak, "Modeling and Analysis of a Power Distribution Network in TSV-Based 3-D Memory IC Including P/G TSVs, On-Chip Decoupling Capacitors, and Silicon Substrate Effects," *IEEE Transactions on Components, Packaging and Manufacturing Technology*, vol. 2, no. 12, pp. 2057-2070, Dec. 2012.
- [24] Wen Yueh, Subho Chatterjee, Amit R. Trivedi, and Saibal Mukhopadhyay, "Performance and Robustness of 3-D Integrated SRAM Considering Tier-to-Tier Thermal and Supply Crosstalk," *IEEE Transactions on Components, Packaging and Manufacturing Technology*, vol. 3, no. 6, pp. 943-953, Jun. 2013.
- [25] Eunseok Song, Kyoungchoul Koo, Jun So Pak, and Joungho Kim, "Through-Silicon-Via-Based Decoupling Capacitor Stacked Chip in 3-D-ICs," *IEEE Transactions on Components, Packaging and Manufacturing Technology*, vol. 3, no. 9, pp. 1467-1480, Sep. 2013.
- [26] Ching-Che Chung, Duo Sheng, and Wei-Siang Su, "A 0.5V1.0V Fast Lock-In ADPLL for DVFS Battery-Powered Devices," in Proceedings of International Symposium on VLSI Design, Automation, and Test (VLSI-DAT), Apr. 2013.
- [27] Duo Sheng, Ching-Che Chung, and Chen-Yi Lee, "A Fast-Lock-In ADPLL with High-Resolution and Low-Power DCO for SoC Applications," *in Proceedings of 2006 IEEE Asia Pacific Conference on Circuits and Systems (APCCAS)*, Dec. 2006, pp. 105-108.

- [28] Ching-Yuan Yang, and Shen-Iuan Liu, "A One-Wire Approach for Skew-Compensating Clock Distribution Based on Bidirectional Techniques," *IEEE Journal of Solid-State Circuits*, vol. 36, no. 2, pp. 266-272, Feb. 2001.
- [29] Ching-Che Chung and Chia-Lin Chang, "A Wide-Range All-Digital Delay-Locked Loop in 65nm CMOS Technology," in Proceedings of International Symposium on VLSI Design, Automation, and Test (VLSI-DAT), Apr. 2010, pp. 66-69.
- [30] Soo-Bin Lim, Hyun-Woo Lee, Junyoung Song, and Chulwoo Kim, "A 247 μW 800 Mb/s/pin DLL-Based Data Self-Aligner for Through Silicon via (TSV) Interface," *IEEE Journal of Solid-State Circuits*, vol. 48, no. 3, pp. 711-723, Mar. 2013.
- [31] Ji-Wei Ke, Shi-Yu Huang, Chao-Wen Tzeng, Ding-Ming Kwai, and Yung-Fa Chou, "Die-to-Die Clock Synchronization for 3-D IC Using Dual Locking Mechanism," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 60, no. 4, pp. 908-917, Apr. 2013.
- [32] Ai-Jia Chuang, Yu Lee, and Ching-Yuan Yang, "A Chip-to-Chip Clock-Deskewing Circuit for 3-D ICs," in Proceedings of 2012 IEEE International Symposium on Circuits and Systems (ISCAS), May. 2012, pp. 1652-1655.
- [33] Ching-Che Chung and Chi-Yu Hou, "All-Digital Delay-Locked Loop for 3D-IC Die-to-Die Clock Synchronization," *in Proceedings of International Symposium on VLSI Design, Automation, and Test (VLSI-DAT)*, Apr. 2014.