# National Chung Cheng University

Master's Thesis, Graduate Institute of Computer Science and Information Engineering

A 0.5V/1.0V Low-Power Delay-Recycled All-Digital Duty-Cycle Corrector with Unbalanced Process Variations Tolerance

| Graduate student: | 李長潤 |
|-------------------|--------|
| Advisor: | Dr. Ching-Che Chung (鍾菁哲) |

July 2013 (Republic of China year 102)

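National Chung Cheng University Master's Program

Degree Examination Approval Form

The thesis submitted by 李長潤, a graduate student of the Department of Computer Science and Information Engineering under my supervision, entitled

A 0.5V/1.0V Low-Power Delay-Recycled All-Digital Duty-Cycle Corrector with Unbalanced Process Variations Tolerance

is approved for submission to the master's degree thesis examination.

Advisor: (signature)    May 28, 2013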

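National Chung Cheng University Master's Thesis Examination Certificate

#### Department of Computer Science and Information Engineering

#### Thesis submitted by graduate student 李長潤

A 0.5V/1.0V Low-Power Delay-Recycled All-Digital Duty-Cycle Corrector with Unbalanced Process Variations Tolerance

Having been reviewed by this committee, the thesis meets the standard for a master's thesis.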



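#### Thesis and Dissertation Authorization Form

#### (Please bind this copy in the blank space before the title page of the printed thesis, for authorization management by the university library) ID:101CCU00392047

The thesis covered by this authorization is the one with which the licensor obtained the master's degree at <u>National Chung Cheng</u> University, <u>Graduate Institute of Computer Science and Information Engineering</u>, \_\_\_\_\_ division, in the second semester of academic year <u>101</u>.

Thesis title: <u>A 0.5V/1.0V Low-Power Delay-Recycled All-Digital Duty-Cycle Corrector with Unbalanced Process Variations Tolerance</u>

Advisor: <u>鍾菁哲, Ching-Che Chung</u>

The licensor agrees to make the full text of the above thesis (including the abstract), in which the licensor holds the copyright, available to readers for online search, reading, download, or printing for personal, non-profit purposes. This is a non-exclusive, royalty-free license granted to the National Central Library and the library of the licensor's graduating university, without restriction on region, time, or number of uses, to reproduce the above thesis by microform, optical disc, or digitization, and to publicly transmit the digital files.

Printed thesis: The licensor agrees to make the full text of the above thesis (including the abstract), in which the licensor holds the copyright, available to readers for reading or printing for personal, non-profit purposes. This is a non-exclusive, royalty-free license granted to the National Chung Cheng University Library for cataloging, shelving, and public display.

□ Open immediately on and off campus

□ Open immediately on campus; open off campus after July 26, 2018

☑ Open on campus on July 26, 2018; open off campus after July 26, 2018

□ Other \_\_\_\_\_

Licensor: 李長潤

Signature:

Date: July 26, 2013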

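## Abstract (Chinese)

Because the clock tree buffers have asymmetric charging and discharging times, the duty-cycle of the on-chip clock signal delivered to each module is degraded as it passes through the cascaded clock buffers of the clock tree. However, high-speed data transmission circuits, such as double data rate synchronous dynamic random access memory (DDR SDRAM) and double-sampling analog-to-digital converters (ADCs), sample data on both the positive and negative edges of the reference clock. A duty-cycle error in the reference clock causes these circuits to malfunction. Therefore, a duty-cycle corrector (DCC) must be added to the system-on-a-chip (SoC) to correct the duty-cycle of the distorted clock back to 50%.

With growing awareness of energy saving, designing low-power electronic products is necessary. According to the dynamic power dissipation equation of the transistor, $P = CV^2 f$, halving the supply voltage saves 75% of the power consumption. However, at an operating voltage close to the threshold voltage, transistors charge and discharge more slowly. The delay of the logic gates therefore becomes longer, which in turn affects the overall circuit performance.

Therefore, this thesis proposes a low-power delay-recycled all-digital duty-cycle corrector (ADDCC) that operates at two supply voltages and tolerates unbalanced process variations, implemented with a 90nm process standard cell library. The proposed ADDCC features fast lock-in, low chip area, low power consumption, and high correction accuracy, making it suitable for devices with low-power requirements.

**Keywords**: all-digital duty-cycle corrector, delay-recycled half-cycle delay line, time-to-digital converter, unbalanced process variations tolerance.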

## Abstract

Due to the unbalanced rise time and fall time of the clock tree buffers, the duty-cycle of the on-chip clock may be distorted when it is distributed through the clock buffers to every module. However, high-speed data communication applications, such as double data rate synchronous dynamic random access memory (DDR SDRAM) and double-sampling analog-to-digital converters (ADCs), sample the input data on both the positive and negative edges of the reference clock, so duty-cycle error causes these applications to malfunction. To meet this requirement, a duty-cycle corrector (DCC) is employed in the system-on-a-chip (SoC) to correct the distorted clock.

With the growing recognition of energy savings, low-power electronic devices are in demand. According to the dynamic power dissipation equation, $P = CV^2 f$, reducing the supply voltage to one half of the nominal voltage reduces the dynamic power dissipation by 75%. However, an operating voltage near the threshold voltage makes transistors charge and discharge more slowly. Hence, the intrinsic delay of logic gates becomes longer and directly affects the overall chip performance.

Hence, an all-digital duty-cycle corrector (ADDCC) with a dual supply voltage mode and unbalanced process variation tolerance is presented in this thesis. The proposed ADDCC is implemented in a TSMC 90nm CMOS process with standard cells. It has the following characteristics: fast lock-in time, low area cost, low power consumption, and high precision in duty-cycle correction. Therefore, it is very suitable for low-power applications.

Index Terms: All-Digital Duty-Cycle Corrector, Delay-Recycled Half-Cycle Delay Line, Time-to-Digital Converter, Unbalanced Process Variations Tolerance.



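## Acknowledgments

First, I sincerely thank my advisor, Dr. Ching-Che Chung, for the guidance of these two years. From the new-student training before I joined the laboratory, through the research process, publishing my first conference paper, the chip tape-out, and presenting in English at a conference, to completing this thesis, Professor Chung has been an enthusiastic teacher who spared no effort in teaching me and frequently offered encouragement and advice, so that I benefited greatly in the field of IC design. In addition, Professor Chung's teaching let me confirm, while I was looking for a Research and Development Substitute Service position, that students who graduate from S3 LAB are in no way inferior to students from National Taiwan University, National Tsing Hua University, and National Chiao Tung University.

Second, I thank my parents. From childhood they let me freely choose the major and school I wanted, and they spared no expense in cultivating my language skills. Most importantly, during my four years at a private university and two years at the CCU graduate institute, my parents did not take out student loans, so I do not have to enter society in debt and could complete my studies with peace of mind. I am deeply grateful.

Third, I thank the "Iron Triangle": 牙籤 and 阿民. We have known each other for nine years, since the first year of senior high school, and our friendship remains unchanged. Thank you both for looking after me over these nine years and for being the ones I could confide in at the low points of my life. That the Iron Triangle came together is a blessing from heaven, and I will always cherish it.

Fourth, I thank my close friends: 小書, 小可, 哈比, 阿溫, 魚丸, and 點點. Thank you for your constant encouragement and for being my support during several periods of deep discouragement, telling me to grit my teeth and keep going. Outings with you are carefree and without pressure, and everyone is still as genuine as when we first met. I will never forget this friendship.

Finally, IC design is the path I chose myself, and I will see it through to the end!

李長潤

July 2013

Written at the Graduate Institute of Computer Science and Information Engineering, National Chung Cheng University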

## Contents

- Chapter 1 Introduction
  - 1.1 Introduction to Duty-Cycle Corrector
  - 1.2 Prior Duty-Cycle Correctors
    - 1.2.1 Conventional PWCL-based Analog DCC
    - 1.2.2 Synchronous Mirror Delay Based DCC
    - 1.2.3 Time-to-Digital Converter Based DCC
    - 1.2.4 Delay-Recycled DCC
  - 1.3 Motivation
  - 1.4 Design Challenges with Dynamic Voltage and Frequency Scaling
  - 1.5 Thesis Organization
- Chapter 2 Architecture of All-Digital Duty-Cycle Corrector
  - 2.1 Architecture Overview
  - 2.2 Delay Line Architecture
  - 2.3 The Locking Procedure
  - 2.4 The Fast-Locking Mechanism
  - 2.5 Design Constraints
- Chapter 3 Circuit Design and Implementation of ADDCC
  - 3.1 Pulse Generator
  - 3.2 Phase and Frequency Detector
  - 3.3 TDC-embedded HCDL
    - 3.3.1 TDC-Embedded CDL
    - 3.3.2 FDL
  - 3.4 Design Challenges in Dynamic Voltage and Frequency Scaling
    - 3.4.1 TDC Quantization Error
    - 3.4.2 Bottleneck in Increasing the Maximum Operating Frequency
- Chapter 4 Experimental Results
  - 4.1 Test Circuit Implementation
    - 4.1.1 MUX-typed DCO
    - 4.1.2 Duty-Cycle Generator
    - 4.1.3 DIV_FOUR Circuit
    - 4.1.4 Level Shifter
  - 4.2 ADDCC Ver.1
    - 4.2.1 Specifications
    - 4.2.2 Simulation Results
    - 4.2.3 Measurement Results
  - 4.3 ADDCC Ver.2
    - 4.3.1 Specifications
    - 4.3.2 Simulation Results
  - 4.4 Performance Comparisons
    - 4.4.1 ADDCC Ver.1
    - 4.4.2 ADDCC Ver.2
- Chapter 5 Conclusion and Future Works
  - 5.1 Conclusion
  - 5.2 Future Works
- References

## List of Figures

- Figure 1.1 Duty-Cycle Distortion Phenomenon
- Figure 1.2 Duty-Cycle Distortion with (a) 7 (b) 20 buffers at 1.0V
- Figure 1.3 The conventional PWCL-based Analog DCC
- Figure 1.4 The Synchronous Mirror Delay Based DCC
- Figure 1.5 The Block Diagram of TDC
- Figure 1.6 The Block and Timing Diagram of the Delay-Recycled DCC
- Figure 1.7 The Non-monotonic Response Problem during Coarse-tuning Control Code Switching
- Figure 1.8 Duty-Cycle Distortion with (a) 7 (b) 20 buffers at 0.5V
- Figure 1.9 The Pulse (a) Stretching/ (b) Shrinking Timing Diagram of the SMD-based DCC
- Figure 1.10 The Block Diagram of the Dual Loop Architecture ADDCC [14], [15]
- Figure 2.1 The Proposed ADDCC [16]
- Figure 2.2 The TDC-Embedded HCDL
- Figure 2.3 Overall Timing Diagram of the Proposed ADDCC
- Figure 2.4 Timing Diagram of TDC at Low Frequency Operation
- Figure 2.5 Timing Diagram of TDC at High Frequency Operation
- Figure 3.1 The Block and Timing Diagram of PG
- Figure 3.2 PG generates pulses with the pulse width of the input clock larger and smaller than the buffer chain delay
- Figure 3.3 The Harmonic Lock Phenomenon
- Figure 3.4 The Phase and Frequency Detector [17]
- Figure 3.5 The MUX-based Delay Line with a low supply voltage
- Figure 3.6 Duty-Cycle Distortion Test of TDC-Embedded CDL with 1.0V power supply
- Figure 3.7 Duty-Cycle Distortion Test of TDC-Embedded CDL with 0.5V power supply
- Figure 3.8 The FDL [20]
- Figure 3.9 DNL of the FDL
- Figure 3.10 INL of the FDL
- Figure 4.1 The Block Diagram of the Test Chip Circuit
- Figure 4.2 The Block Diagram of Mux-typed DCO
- Figure 4.3 The Block Diagram of Duty-Cycle Generator
- Figure 4.4 The Block Diagram of DIV_FOUR Circuit
- Figure 4.5 The Timing Diagram of DIV_FOUR Circuit
- Figure 4.6 The Schematic Diagram of the Proposed Level Shifter
- Figure 4.7 The Layout of the Proposed Level Shifter
- Figure 4.8 The Timing Diagram of the Proposed Level Shifter
- Figure 4.9 Microphotograph of ADDCC Ver.1
- Figure 4.10 Chip I/O Planning and Floorplanning in ADDCC Ver.1
- Figure 4.11 The Output Duty-Cycle Error at Typical Process Corner
- Figure 4.12 The Output Duty-Cycle Error with PVT variations
- Figure 4.13 Convergence of the Output Duty-Cycle Error with PVT variations, a 1.0V Power Supply, and Highest Frequency Operation
- Figure 4.14 Convergence of the Output Duty-Cycle Error with PVT variations, a 1.0V Power Supply, and Lowest Frequency Operation
- Figure 4.15 Convergence of the Output Duty-Cycle Error with PVT variations, a 0.5V Power Supply, and Highest Frequency Operation
## List of Tables

- Table 1.1 PVT conditions in this thesis
- Table 3.1 The dead-zone of the proposed PFD
- Table 3.2 Properties of The Proposed TDC-Embedded HCDL
- Table 3.3 Properties of The Proposed FDL
- Table 4.1 The Controllable Output Frequency Range of the MUX-typed DCO
- Table 4.2 Controllable Duty-Cycle Range of the Proposed Duty-Cycle Generator
- Table 4.3 I/O Pins Information of ADDCC Ver.1
- Table 4.4 Properties of the Test Circuit in ADDCC Ver.1
- Table 4.5 Jitter Measurement Results of the System and Output Clock
- Table 4.6 I/O Pins Information of ADDCC Ver.2
- Table 4.7 Performance Comparisons with ADDCC Ver.1
- Table 4.8 Performance Comparisons with ADDCC Ver.2

## Chapter 1 Introduction

#### **1.1 Introduction to Duty-Cycle Corrector**

High-speed data communication applications, such as double data rate (DDR) memories and double-sampling analog-to-digital converters (ADCs), sample the input data on both the positive and negative edges of the reference clock. However, the duty-cycle of the on-chip clock may be distorted by as much as ±20% when clock signals are distributed to other module blocks through clock buffers [1].



Figure 1.1 Duty-Cycle Distortion Phenomenon

Figure 1.1 shows what happens when the PLL's output clock passes through the clock buffers: the duty-cycle is distorted by the unbalanced rise and fall delays of the clock tree buffers. Figure 1.2 shows simulation results for buffer chains with unbalanced rise and fall delays in a 90nm CMOS process. As more clock buffers are cascaded, the duty-cycle of the output clock becomes worse. Based on the simulation, the non-CLK type buffer (BUFX2) has larger duty-cycle distortion under process, voltage, and temperature (PVT) variations than the CLK type buffer (CLKBUFX2).



Figure 1.2 Duty-Cycle Distortion with (a) 7 (b) 20 buffers at 1.0V
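The accumulation of duty-cycle error along a buffer chain can be illustrated with a minimal first-order sketch. It assumes every non-inverting buffer delays the rising edge by a fixed `t_plh_ps` and the falling edge by a fixed `t_phl_ps`; the function name and all numeric values are hypothetical and are not taken from the simulations in Figure 1.2.

```python
def duty_after_buffers(duty_in, period_ps, n_buffers, t_plh_ps, t_phl_ps):
    """Duty-cycle after n non-inverting buffers with unequal edge delays.

    Each buffer delays the rising edge by t_plh_ps and the falling edge by
    t_phl_ps, so the high pulse width changes by (t_phl_ps - t_plh_ps) per
    stage (a first-order model, not a circuit simulation).
    """
    high_ps = duty_in * period_ps + n_buffers * (t_phl_ps - t_plh_ps)
    # The high time cannot be negative or longer than one period.
    high_ps = min(max(high_ps, 0.0), period_ps)
    return high_ps / period_ps


# Hypothetical numbers: 1 ns period, 50 ps rise delay, 55 ps fall delay.
for n in (7, 20):
    print(n, "buffers ->", duty_after_buffers(0.5, 1000.0, n, 50.0, 55.0))
```

In this model the error grows linearly with the number of stages, which matches the qualitative trend described above; real buffers also show edge-rate and PVT dependence that the sketch ignores.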

Duty-cycle error causes unbalanced computation time for sequential circuits. Accordingly, a clock with a 50% duty-cycle is required in many applications. To meet this requirement, a duty-cycle corrector (DCC) is employed in the system-on-a-chip (SoC) to correct the distorted clock signal under PVT variations. Further, the corrected clock signal should be phase-aligned with the input clock so that the DCC does not insert additional clock skew.

#### **1.2 Prior Duty-Cycle Correctors**

In recent years, many DCCs have been proposed and can be classified into two categories: analog DCCs [2] and digital DCCs [3]-[15].

#### 1.2.1 Conventional PWCL-based Analog DCC

Analog DCCs usually use a pulse-width control loop (PWCL) [2] to correct the input clock. Figure 1.3 shows the block diagram of the PWCL-based analog DCC. The analog DCC corrects the input clock by continuously adjusting the feedback voltage (Vctrl) of the control stage. As a result, the PWCL-based DCC takes a relatively long time to lock and needs several large on-chip capacitors to filter the control voltage, so it often occupies a relatively large chip area and has relatively high power consumption. Further, the PWCL-based DCC has a serious charge pump mismatch problem at unbalanced process corners (i.e., SF or FS). The MOS device leakage in advanced CMOS processes also causes ripples on the control voltage and affects the stability of the output clock. In addition, the output clock of the PWCL-based DCC is not phase-aligned with the input clock. All-digital DCCs have been proposed to overcome these drawbacks.



Figure 1.3 The conventional PWCL-based Analog DCC

#### **1.2.2 Synchronous Mirror Delay Based DCC**

The synchronous mirror delay (SMD)-based all-digital duty-cycle corrector (ADDCC) [4] uses a half-cycle delay line (HCDL) to generate a 50% duty-cycle clock, as illustrated in Figure 1.4. The SMD-based ADDCC generates a short pulse (A) to measure the period of the input clock. At the next input clock cycle, the new pulse selects the propagated pulse and passes it to the mirror delay line through the AND gate. The mirror delay line generates a new pulse (B), which is delayed from the input clock by half of the input clock (CLK\_IN) period. Meanwhile, the MDL circuit generates C to compensate for the intrinsic delay of the HCDL. Subsequently, the system sends B and C to a Set-Reset (SR) latch and produces a 50% duty-cycle clock.



Figure 1.4 The Synchronous Mirror Delay Based DCC

However, when the short pulse propagates through the delay line, its width is easily affected by the delay line at unbalanced process corners, as illustrated in Figure 1.2. Therefore, once the pulse becomes wide enough to turn on two or more AND gates, the duty-cycle error becomes larger than expected.



#### **1.2.3** Time-to-Digital Converter Based DCC

The time-to-digital converter (TDC) is widely used in ADDCCs [5]-[9] to reduce the lock-in time. Figure 1.5 shows the block diagram of the TDC. First, the Multi-Phase Generator generates multi-phase clock signals from the input clock (CLK\_IN). At the next positive edge of the input clock, the DFFs capture the multi-phase clock signals. Then, the TDC encoder converts the sampled multi-phase clock signals into a digital code (CLK\_Period [N-1:0]). It takes only two input clock cycles to obtain the clock period information. Therefore, the TDC reduces the locking time compared with the successive approximation register (SAR)-controlled DCC [11] and the pulse shrinking/stretching approach [12]-[15].



Figure 1.5 The Block Diagram of TDC

However, the TDC costs extra chip area [5], [6], and it also introduces a delay mismatch problem in these ADDCCs. Hence, the ADDCCs [7]-[9] integrate the TDC into the delay line so that the chip area can be further reduced. However, since the delay unit restricts the duty-cycle correction resolution, the duty-cycle error of the output clock depends on the TDC resolution. Besides, the delay line of the TDC must be long enough to quantize the input clock period. It is difficult to design a wide-range TDC with a high resolution, and thus either the operating frequency range of these ADDCCs [5], [7] is very limited or the power consumption is very large [5], [6].



#### 1.2.4 Delay-Recycled DCC

Figure 1.6 shows the block and timing diagram of the delay-recycled DCC. The HCDL provides a half-cycle delay, $\Delta$, and the DCC combines the feedback clock (CLK\_FB) and the input clock (CLK\_IN) into a 50% duty-cycle clock. The delay-recycled ADDCCs [8]-[10] reduce the number of delay cells and flip-flops needed by the TDC. Therefore, the operating frequency range can be extended and the chip area and power consumption can be further reduced. However, the duty-cycle error of the delay-recycled ADDCCs [8], [9] is still restricted by the resolution of the delay line. Therefore, the ADDCCs [8], [9], which have no fine-tuning delay cells, cannot achieve a small duty-cycle error at high-frequency operation. In addition, the binary-weighted delay line (BWDL) [9] also has a non-linearity problem with on-chip variations.



Figure 1.6 The Block and Timing Diagram of the Delay-Recycled DCC

Fine-tuning delay cells are added in the ADDCC [10] to achieve a relatively small duty-cycle error. However, in [10], the controllable delay range of the fine-tuning delay cell is not equal to the coarse-tuning resolution under PVT variations. Hence, the fine-tuning delay cell usually has to be designed to overlap the coarse-tuning step by 20% to 30% to ensure that its controllable delay range is larger than one coarse-tuning step under PVT variations. This overlap causes the non-monotonic response problem when the controller switches the coarse-tuning control code, as shown in Figure 1.7: whenever the coarse-tuning control code switches, the ADDCC [10] exhibits large cycle-to-cycle jitter.
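The non-monotonic jump can be reproduced with a minimal numeric sketch of a coarse-fine delay line. The coarse step of 100 ps, the 32 fine levels, and the 25% overlap are hypothetical values chosen only for illustration; they are not parameters reported for the design in [10].

```python
def total_delay_ps(coarse, fine, coarse_step_ps=100.0, fine_levels=32, overlap=0.25):
    """Delay of a coarse-fine delay line whose fine range exceeds one coarse step.

    The full fine-tuning range is (1 + overlap) coarse steps, spread linearly
    over fine codes 0 .. fine_levels - 1. All defaults are illustrative.
    """
    fine_range_ps = coarse_step_ps * (1.0 + overlap)
    fine_step_ps = fine_range_ps / (fine_levels - 1)
    return coarse * coarse_step_ps + fine * fine_step_ps


# Incrementing the control code from (coarse=4, fine=31) to (coarse=5, fine=0)
# should increase the delay, but with overlap the delay drops instead.
before = total_delay_ps(4, 31)
after = total_delay_ps(5, 0)
print(f"before switch: {before:.1f} ps, after switch: {after:.1f} ps")
```

In this model the delay falls by the overlap amount (25 ps here) at every coarse code switch, which is the kind of step that appears as cycle-to-cycle jitter at the output.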



Figure 1.7 The Non-monotonic Response Problem during Coarse-tuning Control Code Switching

#### **1.3 Motivation**

With the growing recognition of energy savings, low-power electronic devices are in demand. According to the dynamic power dissipation equation, $P = CV^2 f$, reducing the supply voltage to one half of the nominal voltage reduces the dynamic power dissipation by 75%. However, an operating voltage near the threshold voltage makes transistors charge and discharge more slowly. Hence, the intrinsic delay of logic gates becomes longer and directly affects the overall chip performance.
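As a worked check of the 75% figure, assume the load capacitance $C$ and the clock frequency $f$ stay unchanged when the supply is halved (the symbols $P_{\text{nom}}$ and $P_{\text{low}}$ are introduced here only for illustration):

$$
\frac{P_{\text{low}}}{P_{\text{nom}}} = \frac{C\,(V/2)^2 f}{C\,V^2 f} = \frac{1}{4}, \qquad 1 - \frac{1}{4} = 75\%.
$$

If the clock frequency is also lowered at the reduced supply, the dynamic power drops further in proportion to $f$.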

Moreover, when chips are in mass production, process variations also affect the performance. The speed ratio of PMOS and NMOS is symmetric at the FF, TT, and SS corners. However, it is asymmetric at the unbalanced process corners (i.e., SF and FS), which are rarely considered in DCC design. The asymmetric speed ratio of PMOS and NMOS may cause many DCCs to malfunction. Hence, we need to design a DCC that works correctly at all process corners with both low and nominal supply voltages.

#### **1.4 Design Challenges with Dynamic Voltage and Frequency Scaling**

When the supply voltage is reduced close to the threshold voltage, we face several design challenges. For example, the duty-cycle distortion in the clock buffer tree becomes worse than at the nominal supply voltage. Figure 1.8 shows that the duty-cycle after the clock buffer tree can be as high as 90% with 20 series-connected non-CLK type buffers at the SS corner. Such severe duty-cycle distortion emphasizes the importance of employing a DCC in the SoC.



Figure 1.8 Duty-Cycle Distortion with (a) 7 (b) 20 buffers at 0.5V

However, some of the previously discussed DCC architectures are not suitable for low-voltage applications with unbalanced process variations. For example, the mismatch in the charge pump of the PWCL-based DCC [2] at unbalanced process corners directly affects the output duty-cycle error. The cascaded buffers also contribute an accumulated duty-cycle error in the measurement delay line of the SMD-based DCC [4], as shown in Figure 1.9. Once the narrow pulse is stretched enough to turn on two or more AND gates, two or more pulses are generated and the output duty-cycle error increases (Figure 1.9(a)). In contrast, once the narrow pulse shrinks until it disappears, the SMD-based DCC malfunctions (Figure 1.9(b)). Moreover, the narrow pulse is easily widened or narrowed at a low supply voltage. Hence, the SMD-based DCC cannot be adopted in a low-supply-voltage SoC.



Figure 1.9 The Pulse (a) Stretching/ (b)Shrinking Timing Diagram of the SMD-based DCC

Although TDC-based DCCs [5]-[9] can achieve fast lock-in time at a low supply voltage, the output duty-cycle error still depends on the resolution of the delay line. Thus, the duty-cycle error of these TDC-based DCCs is worse than that of DCCs with fine-tuning delay cells. Even though the DCC [10] adopts the coarse-fine delay line architecture, it is hard to design a fine-tuning delay line whose controllable range covers one coarse-tuning step at all process corners.

Figure 1.10 shows the block diagram of the dual loop architecture ADDCC [14], [15]. The DCC and DLL dual loop architecture based ADDCC has high duty-cycle correcting accuracy and can align the phase of the input and output clock. However, it takes a long lock-in time. Besides, the accumulative duty-cycle error due to the digitally controlled delay line (DCDL) directly affects the duty-cycle error of the output clock (CLK\_OUT). Since the DLL loop is used to generate a complementary duty-cycle signal, the duty-cycle error caused by the delay line of the DLL cannot be corrected by the DCC loop. As previously discussed, the delay line with a low supply voltage distorts the pulse width of the input clock much more than that with a nominal supply voltage. Therefore, this type of ADDCC will result in a huge output duty-cycle error and is not suitable for low voltage SoC.



Figure 1.10 The Block Diagram of the Dual Loop Architecture ADDCC [14], [15]

#### **1.5 Thesis Organization**

The PVT conditions used in the following chapters are listed in Table 1.1. We have defined the best and worst cases with PVT variations in two categories: unbalanced process corners and balanced process corners.

The supply voltage and temperature at the TT corner and at the unbalanced process corners (SF and FS) are set to 1.0V in nominal supply voltage mode, 0.5V in low supply voltage mode, and 25°C in the TSMC 90nm process. By using the same voltage and temperature conditions, we can clearly see how unbalanced process corners degrade the system performance. On the other hand, the balanced corners FF and SS have +10% and -10% voltage variation relative to the TT corner, and their temperatures are set to 0°C and 75°C, respectively.

| Corner | Nominal Supply Voltage (V) | Low Supply Voltage (V) | Temperature (°C) |
|--------|----------------------------|------------------------|------------------|
| FF     | 1.1                        | 0.55                   | 0                |
| TT     | 1.0                        | 0.5                    | 25               |
| SS     | 0.9                        | 0.45                   | 75               |
| SF     | 1.0                        | 0.5                    | 25               |
| FS     | 1.0                        | 0.5                    | 25               |

Table 1.1 PVT conditions in this thesis

The rest of the thesis is organized as follows. Chapter 2 describes the system architecture, the locking procedure, the fast-locking mechanism, and the design constraints of the proposed ADDCC. Chapter 3 describes the implementation of each module of the proposed ADDCC and discusses its performance. The experimental results, including the test chip plan, the simulation results, and the measurement results, are shown in Chapter 4, together with a table comparing the proposed ADDCC with prior DCCs. Chapter 5 concludes the thesis and outlines future works.



## Chapter 2 Architecture of All-Digital Duty-Cycle Corrector

#### 2.1 Architecture Overview

Figure 2.1 shows the block diagram of the proposed ADDCC [16]. The ADDCC is composed of a multiplexer (MUX), a pulse generator (PG), an AND gate, a half-cycle delay line (HCDL), a phase and frequency detector (PFD) [17], an ADDCC controller, a TDC encoder, and a D-type flip-flop (DFF).



Figure 2.1 The Proposed ADDCC [16]

The PG transforms the input clock (CLK\_IN) and the feedback clock (CLK\_FB or CLK\_OUT) into narrow pulses (in\_pulse and fb\_pulse). The signal tdc\_start selects CLK\_FB or CLK\_OUT to generate fb\_pulse. The output of the AND gate that drives the "pulse" signal is held at logic 0 until the reset signal (RESET) is pulled low, which prevents unnecessary pulses from triggering the DFF. Once RESET is pulled low, the AND gate allows the short pulses to propagate through the digitally controlled HCDL. The ADDCC controller adjusts the delay line control code (ctrl\_code [10:0]) according to the PFD's outputs. When the ADDCC is locked, the frequency of the "pulse" and "toggle" signals is twice the reference clock frequency. Finally, the DFF divides the "toggle" signal by two and outputs an exact 50% duty-cycle clock. Consequently, the output clock (CLK\_OUT) frequency is the same as the reference clock frequency.
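The divide-by-two step can be illustrated with a minimal behavioral sketch. It assumes the locked condition described above, in which the "toggle" signal has rising edges spaced exactly half an input period apart; the function names and the 1000 ps period are hypothetical.

```python
def divide_by_two(toggle_edges_ps):
    """Model a DFF whose output inverts on every rising edge of 'toggle'.

    Returns the times at which the output rises and falls.
    """
    state = 0
    rises, falls = [], []
    for t in toggle_edges_ps:
        state ^= 1  # the DFF output inverts on each toggle edge
        (rises if state else falls).append(t)
    return rises, falls


def duty_cycle(rises, falls):
    """Duty-cycle measured over the first full output period."""
    high_ps = falls[0] - rises[0]
    period_ps = rises[1] - rises[0]
    return high_ps / period_ps


# Locked condition (assumed): toggle edges every T/2 for T = 1000 ps.
edges = [0, 500, 1000, 1500, 2000]
rises, falls = divide_by_two(edges)
print(duty_cycle(rises, falls))
```

Because only the spacing of the toggle edges matters in this model, the output duty-cycle does not depend on the duty-cycle of CLK\_IN once the HCDL delay equals half the input period.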

With the aid of the PFD, the input and output clocks are phase-aligned. Hence, our ADDCC does not insert extra clock skew between the input and output clocks. Besides, in the proposed ADDCC, the required delay of the HCDL is reduced to one half of the input clock period. Thus, the chip area and power consumption of the ADDCC can be reduced compared with prior works [5]-[7].



#### 2.2 Delay Line Architecture

The proposed TDC-embedded half-cycle delay line (HCDL) is composed of a 6-bit TDC-embedded coarse-tuning delay line (CDL) and a 5-bit fine-tuning delay line (FDL) [20], as shown in Figure 2.2. The dummy cells are added to balance the capacitance loading of the NAND gates. The proposed CDL is composed of 63 lattice delay units (LDU) [21] and embedded with a time-to-digital converter (TDC). The FDL [20] is composed of two parallel connected tri-state buffer arrays operating as an interpolator circuit.



Figure 2.2 The TDC-Embedded HCDL

The resolution of the CDL is the delay of two NAND gates. Thus, a DFF is placed after every two NAND gates to quantize the period of the input clock, and the DFF outputs form tdc\_data [63:0].

#### 2.3 The Locking Procedure

Figure 2.3 shows the overall timing diagram of the proposed ADDCC. After the ADDCC is reset, the control code (ctrl\_code [10:0]) of the HCDL is set to the maximum value (i.e., 11'b111\_1111\_1111) and tdc\_start is pulled high. Subsequently, the narrow pulses propagate through the HCDL. At the next rising edge of the input clock (CLK\_IN), the TDC captures the propagated pulse signals and stores them as tdc\_data [63:0]. The TDC encoder then searches tdc\_data [63:0] from the most-significant bit to the least-significant bit for the bit location of the first "1" and outputs the initial delay control code (tdc\_code [5:0]), which allows the ADDCC to achieve a fast lock-in time.



Figure 2.3 Overall Timing Diagram of the Proposed ADDCC

After the initial control code is set, the input clock (CLK\_IN) and the output clock (CLK\_OUT) still have a small phase error due to the finite TDC resolution. Hence, the proposed ADDCC increases or decreases the delay line control code (ctrl\_code [10:0]) according to the outputs of the PFD. A binary search scheme is adopted in the ADDCC controller to accelerate the fine-tuning process. Whenever the PFD's output changes from UP to DOWN or vice versa, the search step (step [4:0]) is divided by 2 until the step is reduced to 1. Once the step is equal to one, the ADDCC is locked, and the output clock (CLK\_OUT) is phase-aligned with the input clock (CLK\_IN).
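The step-halving search described above can be sketched behaviorally as follows. This is a simplified model under stated assumptions, not the controller's RTL: the PFD is idealized as a comparison against a hypothetical `target_code`, the initial step of 16 is assumed (the text only states that step [4:0] is halved on each direction change), and the function name and `max_iters` safety cap are illustrative.

```python
def binary_search_lock(initial_code, target_code, initial_step=16,
                       max_code=2047, max_iters=1000):
    """Step-halving search on an 11-bit delay control code.

    The PFD is idealized: it reports "UP" when the code is below the
    (hypothetical) ideal code and "DOWN" otherwise. The step is halved on
    each UP/DOWN reversal, and the loop is declared locked when the step
    reaches 1.
    """
    code, step, prev = initial_code, initial_step, None
    history = [code]
    for _ in range(max_iters):
        if step <= 1:
            break  # locked
        direction = "UP" if code < target_code else "DOWN"
        if prev is not None and direction != prev:
            step //= 2  # direction reversed: halve the search step
        code += step if direction == "UP" else -step
        code = max(0, min(max_code, code))  # stay within ctrl_code[10:0]
        history.append(code)
        prev = direction
    return code, history


# Hypothetical example: the TDC start code is 1280, the ideal code is 1291.
final_code, trace = binary_search_lock(1280, 1291)
print(final_code, trace)
```

In this example the code overshoots and reverses four times (steps 16, 8, 4, 2, 1) before the step reaches 1. Starting from the TDC estimate rather than from an arbitrary code is what keeps the number of iterations small.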

Besides, when the proposed ADDCC operates at low frequency, our precise quantization technique usually makes the rising edges of the input clock (CLK\_IN) and the output clock (CLK\_OUT) nearly phase-aligned right after the initial control code is set. In such conditions, the PFD is unable to distinguish which signal leads or lags. Hence, once the phase error between the input clock (CLK\_IN) and the output clock (CLK\_OUT) is smaller than the PFD's dead-zone, the ADDCC is also considered locked.



#### **2.4 The Fast-Locking Mechanism**

Figure 2.4 shows the detailed timing diagram of the TDC at low frequency operation. In Figure 2.4, the period of the input clock (CLK\_IN) is larger than the maximum delay time of the HCDL. Thus, the short pulses propagate through the full delay line and trigger the DFF, which generates a rising edge transition on the output clock (CLK\_OUT) before the next rising edge of the input clock (CLK\_IN). This rising edge transition indicates that the period of the input clock (CLK\_IN) is longer than the maximum delay time of the HCDL. The first rising edge transition of the "toggle" signal triggers the DFF to pull the output clock (CLK\_OUT) up to the logic 1 state. Then, the PG generates the feedback pulse (fb\_pulse) from CLK\_OUT. Subsequently, the combined "pulse" signal propagates through the HCDL and produces the next rising transition of the "toggle" signal.



Figure 2.4 Timing Diagram of TDC at Low Frequency Operation

In Figure 2.4, the first "1" bit location of tdc\_data [63:0] from the most-significant bit to the least-significant bit is 16. However, the logic 1 state of CLK\_OUT indicates that the pulse signal has already propagated through the HCDL once. Therefore, the period of the input clock (CLK\_IN) is quantized as 80 (=16+64) coarse-tuning delay units. Since the HCDL must provide a half-cycle delay time, the tdc\_code [5:0] output by the TDC encoder is 40 (=80/2) in this example.



Figure 2.5 Timing Diagram of TDC at High Frequency Operation

When the period of the input clock (CLK\_IN) is smaller than the maximum delay time of the HCDL, the short pulses require more than one input clock cycle to pass through the full delay line, as shown in Figure 2.5. Thus, at the next rising edge of the input clock (CLK\_IN), the output clock (CLK\_OUT) does not have a rising transition in this case. In Figure 2.5, the first "1" bit location of tdc\_data [63:0] from the most-significant bit to the least-significant bit is 20. Therefore, the input clock (CLK\_IN) period is quantized as 20 coarse-tuning delay units, and the tdc\_code [5:0] is 10 (=20/2) in this example.
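The two quantization examples above (Figure 2.4 and Figure 2.5) can be reproduced with a short sketch. The function names are hypothetical, and treating the "bit location" as the plain bit index of tdc\_data is an assumption; the source only gives the resulting locations (16 and 20).

```python
from typing import Optional, Tuple


def first_one_from_msb(tdc_data: int, width: int = 64) -> Optional[int]:
    """Scan from MSB to LSB and return the index of the first '1' (assumed
    to be the 'bit location' in the text), or None if no bit is set."""
    for index in range(width - 1, -1, -1):
        if (tdc_data >> index) & 1:
            return index
    return None


def quantize_period(location: int, clk_out_high: bool, width: int = 64) -> Tuple[int, int]:
    """Return (period in coarse delay units, tdc_code).

    If CLK_OUT is already high, the pulse has passed the full delay line
    once, so one full line length (64 units) is added before halving.
    """
    units = location + (width if clk_out_high else 0)
    return units, units // 2


print(quantize_period(16, clk_out_high=True))    # low-frequency example
print(quantize_period(20, clk_out_high=False))   # high-frequency example
```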

With the TDC, the proposed ADDCC can achieve fast lock-in time within 15 input clock cycles with both 1.0V and 0.5V supply voltage at all process corners.



## 2.5 Design Constraints

If the input clock period (CLK\_IN) is $T_{CLK\_IN}$, the output clock period (CLK\_OUT) is $T_{CLK\_OUT}$, the intrinsic delay of the PG is $T_{PG}$, the intrinsic delay of the AND gate is $T_{AND}$, the delay time of the HCDL is $T_{HCDL}$, and the clock-to-q delay of the DFF is $T_{DFF}$, Eq. 2.1 must be satisfied when the ADDCC is locked.

$$T_{CLK\_OUT} = T_{CLK\_IN} = 2 \times (T_{PG} + T_{AND} + T_{HCDL} + T_{DFF})$$

$$(2.1)$$

The total intrinsic delay of the path from input clock (CLK\_IN) to the output clock (CLK\_OUT) restricts the maximum operating frequency of the proposed ADDCC as shown in Eq. 2.2. On the other hand, the minimum operating frequency range can be extended by adding coarse-tuning stages in the HCDL as illustrated in Eq. 2.3. However, the coarse-tuning stages cannot be extended infinitely. The total bit number of the control code also restricts the highest clock rate of the ADDCC controller. In our design, the CDL is composed of 63 coarse delay units.

$$T_{CLK\_IN_{min}} = 2 \times (T_{PG} + T_{AND} + T_{HCDL_{min}} + T_{DFF})$$
(2.2)

$$T_{CLK\_IN_{max}} = 2 \times (T_{PG} + T_{AND} + T_{HCDL_{max}} + T_{DFF})$$
(2.3)

When the PVT variations are considered, the overlapped input frequency range in all PVT corners is given by Eq. 2.4 and Eq. 2.5. The SS corner and the FF corner dominate the maximum and minimum input operating frequency, respectively.

$$T_{CLK\_IN_{min}} = 2 \times (T_{PG, SS} + T_{AND, SS} + T_{HCDL_{min, SS}} + T_{DFF, SS})$$
(2.4)

$$T_{CLK\_IN_{max}} = 2 \times (T_{PG, FF} + T_{AND, FF} + T_{HCDL_{max, FF}} + T_{DFF, FF})$$
(2.5)
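As a rough illustration of Eq. 2.4 and Eq. 2.5, the sketch below computes the overlapped period range from per-corner delays. All delay values in the usage example are hypothetical placeholders chosen for round numbers, not measured or simulated results from this work, and the function names are illustrative only.

```python
def clock_period_ps(t_pg, t_and, t_hcdl, t_dff):
    """Eq. 2.1: locked period is twice the PG -> AND -> HCDL -> DFF path delay."""
    return 2 * (t_pg + t_and + t_hcdl + t_dff)


def overlapped_period_range_ps(ss, ff):
    """Eq. 2.4 / Eq. 2.5: the SS corner sets the minimum period (maximum
    frequency) and the FF corner sets the maximum period (minimum frequency)."""
    t_min = clock_period_ps(ss["t_pg"], ss["t_and"], ss["t_hcdl_min"], ss["t_dff"])
    t_max = clock_period_ps(ff["t_pg"], ff["t_and"], ff["t_hcdl_max"], ff["t_dff"])
    return t_min, t_max


def to_mhz(period_ps):
    """Convert a period in picoseconds to a frequency in MHz."""
    return 1e6 / period_ps


# Hypothetical delays in ps, for illustration only.
ss_corner = {"t_pg": 60, "t_and": 40, "t_hcdl_min": 300, "t_dff": 100}
ff_corner = {"t_pg": 30, "t_and": 20, "t_hcdl_max": 2900, "t_dff": 50}

t_min, t_max = overlapped_period_range_ps(ss_corner, ff_corner)
print(f"period range: {t_min} ps to {t_max} ps")
print(f"frequency range: {to_mhz(t_max):.1f} MHz to {to_mhz(t_min):.1f} MHz")
```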

# Chapter 3 Circuit Design and Implementation of ADDCC

## **3.1** Pulse Generator

Figure 3.1 depicts the block and timing diagram of the PG. The PG generates narrow pulses that propagate through the HCDL at every rising transition of the input clocks (CLK\_A and CLK\_B). For example, the inverter inverts CLK\_A to generate signal "a", and the buffer chain then delays signal "a" to produce signal "b". Subsequently, the AND gate combines CLK\_A and signal "b" into a fixed pulse width signal (a\_pulse). The OR gate merges signals "a\_pulse" and "b\_pulse" and outputs the short pulses (PG\_OUT) to the HCDL. Figure 3.1 also shows that the PG can generate pulses of fixed width whether the duty-cycle of the input clock (CLK\_A) is greater than or smaller than 50%.



Figure 3.1 The Block and Timing Diagram of PG

In addition, the PG also restricts the acceptable duty-cycle range of the proposed ADDCC, as shown in Figure 3.2. When the pulse width of the input clock is longer than the buffer chain delay ( $T_{buf}$ ), the PG generates fixed pulses whose pulse width is equal to the buffer chain delay ( $T_{buf}$ ). On the other hand, when the pulse width of the input clock is shorter than the buffer chain delay, the PG generates pulses whose pulse width is equal to that of the input clock. Hence, once the pulse width is too small to trigger the DFF, the proposed ADDCC will not work correctly.



(Panels: CLK\_A pulse width > buffer delay; CLK\_A pulse width < buffer delay)

Figure 3.2 PG generates pulses with the pulse width of the input clock larger and smaller than the buffer chain delay
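The behavior of one PG branch (inverter, buffer chain, AND gate) can be sketched as a discrete-time model. This is an idealized illustration, not the gate-level circuit: time is in arbitrary sample units, gate delays other than the buffer chain are ignored, and the clock is assumed to be low before the first sample.

```python
def pulse_generator(clk, buf_delay):
    """Idealized model of one PG branch.

    clk       : list of 0/1 samples of the input clock (assumed low before t=0)
    buf_delay : buffer chain delay T_buf, in samples
    """
    a = [1 - v for v in clk]                  # inverter output, signal "a"
    b = ([1] * buf_delay + a)[:len(clk)]      # buffer chain delays "a" into "b"
    return [c & d for c, d in zip(clk, b)]    # AND gate output, a_pulse


def pulse_widths(signal):
    """Return the length of each run of consecutive 1s."""
    runs, count = [], 0
    for value in signal:
        if value:
            count += 1
        elif count:
            runs.append(count)
            count = 0
    if count:
        runs.append(count)
    return runs


wide_clock = [0, 0] + [1] * 6 + [0] * 4      # high time longer than T_buf
narrow_clock = [0, 0, 1, 0, 0, 0]            # high time shorter than T_buf

print(pulse_widths(pulse_generator(wide_clock, buf_delay=2)))
print(pulse_widths(pulse_generator(narrow_clock, buf_delay=2)))
```

In this model the output pulse width is the smaller of the clock high time and the buffer delay, which matches the two cases of Figure 3.2.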


# **3.2 Phase and Frequency Detector**

To phase-align the input clock (CLK\_IN) and the output clock (CLK\_OUT), we use a phase and frequency detector (PFD) [17] rather than a phase detector (PD) in the proposed ADDCC. Although the sense amplifier based PD [18] has a tiny dead-zone, the quantization error of the TDC sometimes makes the ADDCC lock to a harmonic at high frequency operation, as shown in Figure 3.3.



Hence, to prevent the proposed ADDCC from locking to a harmonic, we employ a PFD [17], as shown in Figure 3.4. The adopted PFD [17] uses fewer DFFs than the conventional bang-bang PFD [19].



Figure 3.4 The Phase and Frequency Detector [17]

The rising edges of the input clock (CLK\_IN) and output clock (CLK\_OUT) trigger the DFFs to generate short high pulses (QU and QD). Subsequently, the NAND gates use the QU and QD signals to determine whether the input clock (CLK\_IN) leads or lags the feedback clock (CLK\_OUT). However, the pulse width of OUTU or OUTD is still too narrow for the ADDCC controller. Hence, we use a digital pulse amplifier (DPA), which is composed of six series-connected AND gates, to enlarge the pulse width of OUTU and OUTD and generate the UP and DOWN signals.

The PHASE\_CLK is generated by performing an AND operation on the UP and DOWN signals. We also add CLK-type buffers before the PHASE\_CLK to increase the driving strength to the ADDCC controller.

Table 3.1 illustrates the dead zone of the proposed PFD with PVT variations. With dual supply voltage mode, PFD can correctly distinguish the phase error between the input clock (CLK\_IN) and output clock (CLK\_OUT) at 1.0V and 0.5V.

| PVT | Dead-Zone @ 1.0V (ps) | Dead-Zone @ 0.5V (ps) |
|-----|-----------------------|-----------------------|
| FF  | 22                    | 30                    |
| TT  | 36                    | 110                   |
| SS  | 70                    | 300                   |
| SF  | 36                    | 70                    |
| FS  | 44                    | 90                    |

Table 3.1 The dead-zone of the proposed PFD
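To illustrate how the dead-zone interacts with the lock condition from Section 2.3, the following sketch looks up the Table 3.1 values and classifies a given phase error. The sign convention (a positive error produces UP) and the function name are assumptions made for illustration; the source does not specify them.

```python
# Dead-zone values in ps, copied from Table 3.1: (corner, supply voltage) -> ps
DEAD_ZONE_PS = {
    ("FF", 1.0): 22, ("FF", 0.5): 30,
    ("TT", 1.0): 36, ("TT", 0.5): 110,
    ("SS", 1.0): 70, ("SS", 0.5): 300,
    ("SF", 1.0): 36, ("SF", 0.5): 70,
    ("FS", 1.0): 44, ("FS", 0.5): 90,
}


def pfd_decision(phase_error_ps, corner, vdd):
    """Classify a phase error between CLK_IN and CLK_OUT.

    Errors smaller than the dead-zone cannot be resolved by the PFD, so the
    ADDCC is treated as locked. Positive error -> "UP" is an assumed convention.
    """
    if abs(phase_error_ps) < DEAD_ZONE_PS[(corner, vdd)]:
        return "LOCKED"
    return "UP" if phase_error_ps > 0 else "DOWN"


print(pfd_decision(50, "TT", 1.0))
print(pfd_decision(50, "TT", 0.5))
```

The same 50 ps phase error is resolvable at 1.0V in the TT corner but falls inside the dead-zone at 0.5V.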

# 3.3 TDC-embedded HCDL

### 3.3.1 TDC-Embedded CDL

In the proposed ADDCC architecture, the short pulses, which are generated by the PG, propagate through the HCDL. Thus, under unbalanced process variations (i.e. the SF or FS process corner), the pulse width of the short pulses will be enlarged or shrunk. Figure 3.5 shows the timing diagram of the conventional multiplexer (MUX)-based delay line [8] with a 0.5V power supply. The input clock (CLK\_IN) has a 50% duty-cycle, and we gradually increase the number of delay cells passed through by increasing the delay line control code (code [7:0]). Due to the slow charging and discharging time with a low supply voltage, the conventional MUX-based delay line [8] seriously enlarges the pulse width of the output clock (CLK\_OUT). Once the short pulses are stretched so far that the signal stays in the logic 1 state, they cannot trigger the DFF, and the proposed ADDCC will fail in this situation.



Figure 3.5 The MUX-based Delay Line with a low supply voltage

Hence, we adopt the NAND-based delay line architecture [21] in the CDL design to reduce the pulse width distortion caused by the delay cells, as shown in Figure 2.2. Each NAND gate node has an equal input capacitance loading, so the pulse width of the short pulses is less affected when they propagate through the delay line. After the TDC is integrated into the CDL, the capacitance loadings of the NAND gates are no longer equal. Nevertheless, Figure 3.6 and Figure 3.7 show that the proposed TDC-embedded CDL can still provide pulses (toggle) which are able to trigger the DFF at all PVT corners and in dual supply voltage mode. In Figure 3.6 and Figure 3.7, the input clock for the delay line has a 50% duty-cycle. The maximum pulse width distortion is about +4.8% with a 1.0V power supply. On the other hand, the proposed TDC-embedded CDL shrinks the pulse width of the input pulses with a 0.5V power supply at the FS corner. Hence, the pulses should be carefully designed to be wide enough to trigger the DFF with a 0.5V power supply at the FS corner.



Figure 3.6 Duty-Cycle Distortion Test of TDC-Embedded CDL with 1.0V power supply

Figure 3.7 Duty-Cycle Distortion Test of TDC-Embedded CDL with 0.5V power supply

The properties of the proposed TDC-embedded CDL are listed in Table 3.2. The intrinsic delay time and the resolution of the proposed TDC-embedded CDL are three NAND gate delay and two NAND gate delay, respectively. According to Eq. 2.3 in Section 2.5, the max delay time of the TDC-embedded CDL dominates the minimum operating frequency of the proposed ADDCC.

| PVT | Max Delay @ 1.0V (ps) | Intrinsic Delay @ 1.0V (ps) | Resolution @ 1.0V (ps) | Max Delay @ 0.5V (ps) | Intrinsic Delay @ 0.5V (ps) | Resolution @ 0.5V (ps) |
|-----|-----------------------|-----------------------------|------------------------|-----------------------|-----------------------------|------------------------|
| FF  | 2757                  | 73                          | 43                     | 9055                  | 212                         | 140                    |
| TT  | 3884                  | 103                         | 60                     | 19890                 | 427                         | 309                    |
| SS  | 6103                  | 157                         | 94                     | 49899                 | 1016                        | 776                    |
| SF  | 3986                  | 110                         | 61.5                   | 22650                 | 541                         | 351                    |
| FS  | 3828                  | 95                          | 59                     | 20418                 | 390                         | 318                    |

Table 3.2 Properties of The Proposed TDC-Embedded HCDL
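As a rough consistency check on Table 3.2, one can compare each listed maximum delay with the estimate "intrinsic delay + 63 × resolution", using the 63 coarse delay units mentioned in Section 2.5. This additive relation is an assumption made here for illustration; the source does not state it explicitly. Under that assumption the estimates come out within about 1% of the listed values.

```python
N_COARSE_UNITS = 63  # number of coarse delay units, from Section 2.5

# (corner, supply) -> (max delay, intrinsic delay, resolution), all in ps, from Table 3.2
TABLE_3_2 = {
    ("FF", 1.0): (2757, 73, 43),     ("FF", 0.5): (9055, 212, 140),
    ("TT", 1.0): (3884, 103, 60),    ("TT", 0.5): (19890, 427, 309),
    ("SS", 1.0): (6103, 157, 94),    ("SS", 0.5): (49899, 1016, 776),
    ("SF", 1.0): (3986, 110, 61.5),  ("SF", 0.5): (22650, 541, 351),
    ("FS", 1.0): (3828, 95, 59),     ("FS", 0.5): (20418, 390, 318),
}


def estimated_max_delay(intrinsic, resolution, units=N_COARSE_UNITS):
    """Assumed model: every coarse unit adds one resolution step to the intrinsic delay."""
    return intrinsic + units * resolution


def relative_deviation(corner, vdd):
    """Relative difference between the estimate and the tabulated maximum delay."""
    max_delay, intrinsic, resolution = TABLE_3_2[(corner, vdd)]
    return abs(estimated_max_delay(intrinsic, resolution) - max_delay) / max_delay


for key in sorted(TABLE_3_2):
    print(key, f"{100 * relative_deviation(*key):.2f}%")
```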



### 3.3.2 FDL

Conventional fine-tuning delay line (FDL) in coarse-fine delay line architecture needs to overlap 20% to 30% coarse-tuning step in design of the fine-tuning delay cell to ensure the controllable delay range of the fine-tuning delay cell is always larger than one coarse-tuning resolution with PVT variations, as discussed in Section 1.2.4. It not only increases the area cost and power consumption of the chip, but also causes a large cycle-to-cycle jitter when the coarse-tuning control code is switching. Therefore, we use a digitally controlled phase interpolator to enhance the resolution of the delay line and guarantee the controllable delay range of the FDL can cover one coarse-tuning resolution.

The phase interpolator [22] is composed of 8 parallel tri-state inverter interpolator units. Although it has monotonic frequency response and supply noise compensation technique inside, it is not cell-based and consumes large power. Hence, we adopt a lower power and cell-based phase interpolator [20] in the proposed ADDCC.

Figure 3.8 describes the architecture and the timing diagram of the FDL. The delay time of the FDL is adjusted by the driving strength of two parallel-connected tri-state buffer arrays. The rising edge of the output clock (OUT) will be phase-aligned with CA\_OUT when the fine-tuning control code (code [30:0]) is fully opened (i.e. 31'h7FFF\_FFFF). In contrast, the output clock (OUT) will be phase-aligned with CB\_OUT when the code is fully closed. By adjusting the number of turned-on tri-state buffers, the FDL enhances the resolution of the delay line to 1/31 of a coarse-tuning step. At the same time, it also reduces the output duty-cycle error.



Figure 3.8 The FDL [20]
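An idealized view of the interpolation can be written as a weighted average of the two input edge times. This linear model is only a sketch: the real FDL is not perfectly linear (see the DNL and INL results), the edge times used below are hypothetical, and treating an all-zero code as "aligned with CB\_OUT" is an assumption.

```python
def fdl_output_time(code, t_ca, t_cb, n_bits=31):
    """Ideal linear interpolation between the CA_OUT and CB_OUT edge times.

    code       : fine-tuning control code, code[30:0]
    t_ca, t_cb : rising-edge times of CA_OUT and CB_OUT (same time unit)
    The number of '1' bits sets the weight given to CA_OUT.
    """
    ones = bin(code & ((1 << n_bits) - 1)).count("1")
    weight = ones / n_bits
    return weight * t_ca + (1 - weight) * t_cb


# Hypothetical edge times in ps, one coarse step (60 ps) apart.
print(fdl_output_time(0x7FFFFFFF, t_ca=100.0, t_cb=160.0))   # all buffers on
print(fdl_output_time(0x00000000, t_ca=100.0, t_cb=160.0))   # all buffers off
print(fdl_output_time(0x0000FFFF, t_ca=100.0, t_cb=160.0))   # 16 of 31 on
```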

The properties of the FDL are listed in Table 3.3. The large intrinsic delay time of the FDL restricts the highest operating frequency of the proposed ADDCC at a 0.5V power supply.



|     | 1.0V                 | 0.5V                 |
|-----|----------------------|----------------------|
| PVT | Intrinsic Delay (ps) | Intrinsic Delay (ps) |
| FF  | 83.7                 | 258.66               |
| TT  | 113.9                | 488.6                |
| SS  | 164.9                | 1077.6               |
| SF  | 118.8                | 607.9                |
| FS  | 109                  | 458.6                |

The differential nonlinearity (DNL) and the integral nonlinearity (INL) are defined in Eq. 3.1 and Eq. 3.2. DNL describes the delay deviation between two adjacent control codes. A DNL error specification of less than or equal to one least-significant bit (LSB) guarantees a transfer function with no missing codes. On the other hand, INL describes the deviation, in LSB, of the actual transfer function from an ideal transfer curve.

$$DNL = \frac{(T_{i+1} - T_i)}{T_{LSB-IDEAL}} - 1, \text{ where } 0 < i < 2^N - 2$$
(3.1)

$$INL = \frac{(T_j - T_0)}{T_{LSB-IDEAL}} - j, \text{ where } 0 < j < 2^N - 1$$
(3.2)
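Eq. 3.1 and Eq. 3.2 can be evaluated directly from a list of measured delays, one per control code. In the sketch below the ideal LSB is taken as the end-to-end delay range divided by the number of steps, which is an assumption since the source does not define $T_{LSB-IDEAL}$ explicitly; the delay values are hypothetical.

```python
def dnl_inl(delays):
    """Compute DNL and INL (in LSB) from a list of delays T_0 .. T_{n-1}.

    Assumption: the ideal LSB is the average step, (T_last - T_0) / (n - 1).
    """
    steps = len(delays) - 1
    lsb_ideal = (delays[-1] - delays[0]) / steps
    dnl = [(delays[i + 1] - delays[i]) / lsb_ideal - 1 for i in range(steps)]
    inl = [(delays[j] - delays[0]) / lsb_ideal - j for j in range(len(delays))]
    return dnl, inl


# Hypothetical delays in ps: the second step is short, the third is long.
dnl, inl = dnl_inl([0.0, 5.0, 20.0, 30.0])
print(dnl)
print(inl)
# Every DNL value above -1 LSB means each step is positive, i.e. monotonic.
print(all(value > -1.0 for value in dnl))
```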

Figure 3.9 shows the DNL of the proposed FDL with dual supply voltages. The proposed FDL has a maximum DNL of -0.71 LSB and 1.46 LSB at 1.0V and 0.5V supply voltage, respectively. Moreover, the DNL of the proposed FDL is always higher than -1.0 LSB, which means the proposed FDL has a monotonic response.

Figure 3.10 shows the INL of the proposed FDL with dual supply voltages. Even though the linearity of the proposed FDL at unbalanced process corners is not so good, the output duty-cycle error can be still limited within 2%.

Besides, the total controllable delay range of the FDL is always equal to one coarse-tuning delay step at all PVT variations. Hence, the proposed HCDL can always provide a monotonic response between the delay line control code (ctrl\_code [10:0]) and the output delay time. Therefore, the proposed monotonic delay line architecture can reduce the jitter of the output clock during the coarse-tuning control code switching.



Figure 3.9 DNL of the Proposed FDL

- (a) The FDL with a 1.0V Power Supply
- (b) The FDL with a 0.5V Power Supply

Figure 3.10 INL of the Proposed FDL

- (a) The FDL with a 1.0V Power Supply
- (b) The FDL with a 0.5V Power Supply

# 3.4 Design Challenges in Dynamic Voltage and Frequency Scaling

The proposed ADDCC can operate with dual supply voltage mode and thus it can support the dynamic voltage and frequency scaling (DVFS) technique to further reduce the dynamic power dissipation. However, when the proposed ADDCC operates with a low voltage, it faces two design challenges: TDC quantization error and the bottleneck in increasing the maximum operating frequency.

### 3.4.1 TDC Quantization Error

The proposed TDC quantizes the input clock period into digital control codes. If the input clock period (CLK\_IN) is $T_{CLK\_IN}$, the output clock period (CLK\_OUT) is $T_{CLK\_OUT}$, the intrinsic delay of the PG is $T_{PG}$, the intrinsic delay of the AND gate is $T_{AND}$, the delay time of the HCDL is $T_{HCDL}$ including the intrinsic delay time of the FDL ( $T_{FDL}$ ) and the delay time of the CDL ( $T_{CDL}$ ), and the clock-to-q delay of the DFF is $T_{DFF}$, the output clock period will be equal to the input clock period when the ADDCC is locked, as expressed in Eq. 3.3.

$$\begin{aligned}
T_{CLK\_OUT} &= T_{CLK\_IN} \\
&= 2 \times (T_{PG} + T_{AND} + T_{HCDL} + T_{DFF}) \\
&= 2 \times (T_{PG} + T_{AND} + T_{FDL} + T_{CDL} + T_{DFF})
\end{aligned}$$
(3.3)

We first assume that the proposed ADDCC is at high frequency operation and the pulses have not yet propagated through the full HCDL, as shown in Figure 2.5. The output clock period after TDC operation is given in Eq. 3.4. The input clock period at high frequency operation ( $T_{CLK\_IN,H}$ ) can be represented as the sum of the intrinsic delays and the delay time of $n$ coarse-tuning delay units ( $T_{CDU}$ ). The half-cycle delay line provides one-half of the delay time of the input clock period. Therefore, the CDL turns on $\frac{1}{2}n$ CDUs after TDC operation. However, the output clock period has a quantization error, $\Delta_H$, as compared to the input clock period. This error is corrected by the ADDCC controller after TDC operation.

$$T_{CLK\_IN,H} = T_{PG} + T_{AND} + T_{FDL} + n \cdot T_{CDU}$$

$$\begin{aligned}
T_{CDL} &= \frac{1}{2} \cdot (n \cdot T_{CDU}) \\
&= \frac{1}{2} \times (T_{CLK\_IN,H} - T_{PG} - T_{AND} - T_{FDL})
\end{aligned}$$

$$\begin{aligned}
T_{CLK\_OUT,H} &= 2 \times (T_{PG} + T_{AND} + T_{FDL} + T_{CDL} + T_{DFF}) \\
&= T_{CLK\_IN,H} + T_{PG} + T_{AND} + T_{FDL} + 2 \times T_{DFF} \\
&= T_{CLK\_IN,H} + \Delta_{H}
\end{aligned}$$
(3.4)

When the proposed ADDCC is at low frequency operation and the pulses have propagated through the full HCDL, as shown in Figure 2.4, the output clock period ( $T_{CLK\_OUT,L}$ ) after TDC operation is given in Eq. 3.5. In this case, the output clock period has a small quantization error ( $\Delta_L$ ) which is equal to one DFF's propagation delay time.

$$T_{CLK\_IN,L} = 2 \times (T_{PG} + T_{AND} + T_{FDL}) + T_{DFF} + n \cdot T_{CDU}$$

$$\begin{aligned}
T_{CDL} &= \frac{1}{2} \cdot (n \cdot T_{CDU}) \\
&= \frac{1}{2} \times (T_{CLK\_IN,L} - 2 \times (T_{PG} + T_{AND} + T_{FDL}) - T_{DFF})
\end{aligned}$$

$$\begin{aligned}
T_{CLK\_OUT,L} &= 2 \times (T_{PG} + T_{AND} + T_{FDL} + T_{CDL} + T_{DFF}) \\
&= T_{CLK\_IN,L} + \underbrace{T_{DFF}}_{\Delta_L}
\end{aligned}$$
(3.5)

In both cases, the TDC quantization error is at least one DFF's propagation delay time. Thus, the ADDCC controller with the proposed PFD is used to reduce the residual duty-cycle error after TDC operation.
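The two quantization errors in Eq. 3.4 and Eq. 3.5 can be checked numerically by following the same substitution steps. The delay values below are hypothetical round numbers chosen only to make the arithmetic easy to follow; they are not taken from the measured or simulated results.

```python
def delta_high(t_pg, t_and, t_fdl, t_dff, t_clk_in):
    """Eq. 3.4: quantization error when the pulse has not passed the full HCDL."""
    fixed = t_pg + t_and + t_fdl
    t_cdl = 0.5 * (t_clk_in - fixed)              # half of n * T_CDU
    t_clk_out = 2 * (fixed + t_cdl + t_dff)
    return t_clk_out - t_clk_in


def delta_low(t_pg, t_and, t_fdl, t_dff, t_clk_in):
    """Eq. 3.5: quantization error when the pulse has passed the full HCDL once."""
    fixed = t_pg + t_and + t_fdl
    t_cdl = 0.5 * (t_clk_in - 2 * fixed - t_dff)  # half of n * T_CDU
    t_clk_out = 2 * (fixed + t_cdl + t_dff)
    return t_clk_out - t_clk_in


# Hypothetical delays in ps.
params = dict(t_pg=50, t_and=30, t_fdl=100, t_dff=60, t_clk_in=2000)
print(delta_high(**params))   # equals T_PG + T_AND + T_FDL + 2 * T_DFF
print(delta_low(**params))    # equals T_DFF
```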



# 3.4.2 Bottleneck in Increasing the Maximum Operating Frequency

According to the design constraints of the proposed ADDCC in Section 2.5, the maximum operating frequency of the proposed ADDCC is limited by the sum of the logic gates' intrinsic delays, as shown in Eq. 2.4. However, with a low supply voltage, the delay time of the logic gates becomes four to five times longer than at the nominal supply voltage. Hence, if we can reduce the delay time of the logic gates, we are able to increase the maximum operating frequency of the proposed ADDCC.

Some well-known techniques, for example, forward body bias (FBB) [23], reverse short channel effect (RSCE) [24], and bulk-driven techniques [25] are used to improve the driving strength of the circuits at a low supply voltage. However, these techniques are process-dependent and are less portable to different technologies.

All standard cells have a longer transition time and propagation delay with a low supply voltage, but they can still work correctly. However, the DFF may have unacceptable setup time and hold time margins and a large clock-to-Q delay with a low supply voltage [26]. Therefore, the true single phase clock (TSPC) DFF with the FBB technique [27] and the pulse-latch DFF [28] have been proposed to effectively reduce the clock-to-Q delay of the DFF in low supply voltage applications.

When the clock-to-Q delay time of the DFF is reduced, the propagation delay path of the proposed ADDCC is effectively shortened. However, the difference between the rise time and the fall time of the DFF in the proposed ADDCC will affect the output duty-cycle error. Therefore, a low power DFF [29] with balanced rise time and fall time delay can improve the overall performance of the proposed ADDCC. Nevertheless, designing a DFF with balanced rise time and fall time delay for all PVT corners is always a design challenge.

Consequently, after reviewing our system architecture, we find that we may bypass the delay path between the PG and the DFF [30], which increases the maximum operating frequency of the proposed ADDCC.



# **Chapter 4** Experimental Results

## 4.1 Test Circuit Implementation

The proposed ADDCC is implemented in TSMC 90nm 1P9M standard performance CMOS process.

Due to the clock rate restriction on the I/O pads, signals with a frequency higher than 300 MHz cannot be transmitted through the I/O pads. Hence, we propose a test circuit for chip measurement at high frequency operation, as shown in Figure 4.1. We use a digitally controlled oscillator (DCO) and a duty-cycle generator (DUTY\_GEN) to generate a high-speed clock (DCO\_CLK) with various frequencies and duty-cycles.



Figure 4.1 The Block Diagram of the Test Chip Circuit

The DCO and the DUTY\_GEN can only generate an output clock (A) with a duty-cycle > 50%. Hence, an inverted signal (I\_A) is generated to provide a clock whose duty-cycle is under 50%. Then, the duty wide selection bit (WIDE\_SELECT) selects which of the two generated clocks is used as the high-speed on-chip clock (DCO\_CLK).

The system clock selection bit (OUT\_SELECT) selects the on-chip clock (DCO\_CLK) or the external clock (INPUT\_CLK) to be the ADDCC's input clock (SYSTEM\_CLK). When the input clock frequency is lower than 300 MHz, we can directly input it from the external pin. In this case, OUT\_SELECT disables the DCO for power saving. Otherwise, the ADDCC's input clock is provided by the DCO with the DUTY\_GEN.

We also design a divide-by-four (DIV\_FOUR) circuit to divide the high frequency signal to a lower frequency for the measurement considerations. Hence, we can still measure the duty-cycle of the system input clock (SYSTEM\_CLK) and the output clock (OUTPUT\_CLK) from the external pins indirectly.

## 4.1.1 MUX-typed DCO

The DCO in the test circuit uses the MUX-typed structure to generate the output clock (CLK\_OUT) with various frequencies, as shown in Figure 4.2. The DCO is composed of a NAND gate to enable/disable the output clock (CLK\_OUT) and 15 delay units for controlling the DCO frequency range. Each delay unit consists of a delay buffer and a MUX. When the system clock selection bit (OUT\_SELECT) is in the logic 0 state, the DCO starts to produce the on-chip clock. The FREQ Decoder converts the 4-bit digital frequency selection code (FREQ\_SELECT) into a 15-bit thermometer code (FREQ\_CODE) for controlling the DCO's delay units. As more delay units are passed through, the signal travels a longer delay path, so the DCO outputs a lower-frequency clock (CLK\_OUT).



Figure 4.2 The Block Diagram of Mux-typed DCO
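The binary-to-thermometer conversion performed by the FREQ Decoder can be sketched as follows. The bit ordering and polarity (a "1" meaning that a delay unit is included in the path) are assumptions made for illustration; they are consistent with Table 4.1, where a larger FREQ\_SELECT gives a lower frequency, but the source does not specify the encoding.

```python
def freq_decoder(freq_select):
    """Convert a 4-bit FREQ_SELECT value into a 15-bit thermometer code.

    Assumed encoding: the k lowest bits are set for FREQ_SELECT = k, so that
    k of the 15 delay units are passed through.
    """
    if not 0 <= freq_select <= 15:
        raise ValueError("FREQ_SELECT must fit in 4 bits")
    return (1 << freq_select) - 1


for select in (0, 1, 3, 15):
    print(f"4'd{select:<2} -> 15'b{freq_decoder(select):015b}")
```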

Table 4.1 lists the frequency range of the MUX-typed DCO. The output frequency ranges from 156 MHz to 802 MHz with a 1.0V power supply and from 32.7 MHz to 175.5 MHz with a 0.5V power supply.

| FREQ_SELECT      | Output Frequency<br>@1.0V (MHz) | Output Frequency<br>@0.5V (MHz) |
|------------------|---------------------------------|---------------------------------|
| 4'b 0000 (4'd0)  | 802                             | 175.5                           |
| 4'b 0001 (4'd1)  | 630                             | 136.4                           |
| 4'b 0010 (4'd2)  | 517                             | 111                             |
| 4'b 0011 (4'd3)  | 440                             | 93.2                            |
| 4'b 0100 (4'd4)  | 382                             | 80                              |
| 4'b 0101 (4'd5)  | 338                             | 71.4                            |
| 4'b 0110 (4'd6)  | 302                             | 63.8                            |
| 4'b 0111 (4'd7)  | 273                             | 57.4                            |
| 4'b 1000 (4'd8)  | 250                             | 52.8                            |
| 4'b 1001 (4'd9)  | 230                             | 48.3                            |
| 4'b 1010 (4'd10) | 213                             | 44.8                            |
| 4'b 1011 (4'd11) | 198                             | 41.8                            |
| 4'b 1100 (4'd12) | 185                             | 38.9                            |
| 4'b 1101 (4'd13) | 174                             | 36.5                            |
| 4'b 1110 (4'd14) | 164                             | 34.5                            |
| 4'b 1111 (4'd15) | 156                             | 32.7                            |

Table 4.1 The Controllable Output Frequency Range of the MUX-typed DCO

### 4.1.2 Duty-Cycle Generator

The DCO has provided output clock with various frequency ranges. Then we use the Duty-Cycle Generator (DUTY\_GEN) to control the duty-cycle of the clock.

The proposed DUTY\_GEN is composed of 28 AND gates and 29 OR gates, as shown in Figure 4.3. The Duty Decoder generates a 28-bit control code (DUTY\_CODE) from the 3-bit duty-cycle selection code (DUTY\_SELECT) and the 4-bit frequency selection code (FREQ\_SELECT) to control the pulse width of the output clock (CLK\_OUT).



Figure 4.3 The Block Diagram of Duty-Cycle Generator

The pulse width of the input clock (CLK\_IN) is increased after passing each delay stage, and the result is output as CLK\_OUT. The more delay stages the input clock passes through, the larger its pulse width becomes. Even when all delay stages are closed, the input clock (CLK\_IN) still passes through one OR gate. Hence, the DUTY\_GEN provides an output clock (CLK\_OUT) with a duty-cycle over 50%. With an inverter, we can easily invert the output clock (CLK\_OUT) to achieve a duty-cycle under 50%.

Table 4.2 lists the controllable duty-cycle range of the proposed DUTY\_GEN, assuming that the frequency selection bits (FREQ\_SELECT) are fixed to 4'b1010 (4'd10). By means of the inverter and the DUTY\_GEN, the duty-cycle can be controlled in steps of 8% (@213 MHz) with a nominal 1.0V supply voltage and 10% (@44.8 MHz) with a low 0.5V supply voltage.

| Supply Voltage |                 | 1.0V                  | 0.5V                  |
|----------------|-----------------|-----------------------|-----------------------|
| FREQ_SELECT    | 4'b1010 (4'd10) | 213 MHz               | 44.8 MHz              |
| WIDE_SELECT    | DUTY_SELECT     | Output Duty-Cycle (%) | Output Duty-Cycle (%) |
| 1              | 3'b111 (3'd7)   | 0                     | 0                     |
| 1              | 3'b110 (3'd6)   | 0                     | 0                     |
| 1              | 3'b101 (3'd5)   | 7                     | 0                     |
| 1              | 3'b100 (3'd4)   | 15                    | 6                     |
| 1              | 3'b011 (3'd3)   | 23                    | 16                    |
| 1              | 3'b010 (3'd2)   | 31                    | 26                    |
| 1              | 3'b001 (3'd1)   | 39                    | 36                    |
| 1              | 3'b000 (3'd0)   | 47                    | 46                    |
| 0              | 3'b000 (3'd0)   | 53                    | 54                    |
| 0              | 3'b001 (3'd1)   | 61                    | 64                    |
| 0              | 3'b010 (3'd2)   | 69                    | 74                    |
| 0              | 3'b011 (3'd3)   | 77                    | 84                    |
| 0              | 3'b100 (3'd4)   | 85                    | 94                    |
| 0              | 3'b101 (3'd5)   | 93                    | 100                   |
| 0              | 3'b110 (3'd6)   | 100                   | 100                   |
| 0              | 3'b111 (3'd7)   | 100                   | 100                   |

Table 4.2 Controllable Duty-Cycle Range of the Proposed Duty-Cycle Generator
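The entries of Table 4.2 follow a simple pattern that can be captured in a small model: a base duty-cycle plus a fixed step per DUTY\_SELECT increment, saturating at 100%, with WIDE\_SELECT = 1 choosing the inverted clock. The base and step values below are read off the table (53% and 8% at 1.0V, 54% and 10% at 0.5V); the function itself is an empirical fit made here for illustration, not a description of the Duty Decoder logic.

```python
def modeled_duty_cycle(wide_select, duty_select, base, step):
    """Empirical model of Table 4.2 (values in percent).

    wide_select = 0 selects the clock with duty-cycle over 50%;
    wide_select = 1 selects its inverted version.
    """
    wide_duty = min(base + step * duty_select, 100)   # saturates at 100%
    return 100 - wide_duty if wide_select == 1 else wide_duty


# 1.0V, 213 MHz: base 53%, step 8%
print([modeled_duty_cycle(0, k, base=53, step=8) for k in range(8)])
print([modeled_duty_cycle(1, k, base=53, step=8) for k in range(8)])

# 0.5V, 44.8 MHz: base 54%, step 10%
print([modeled_duty_cycle(0, k, base=54, step=10) for k in range(8)])
print([modeled_duty_cycle(1, k, base=54, step=10) for k in range(8)])
```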

## 4.1.3 DIV\_FOUR Circuit



Figure 4.4 The Block Diagram of DIV\_FOUR Circuit

Due to the clock rate restriction on the I/O pad in the 90nm CMOS process, we design a divide-by-four (DIV\_FOUR) circuit to divide a high frequency signal to a lower frequency. Figure 4.4 shows the block diagram of the proposed DIV\_FOUR circuit. We use two DFFs, triggered by the positive and negative edges of CLK\_IN, to divide the input clock (CLK\_IN) frequency by four. After frequency division, the low frequency signals (CLK\_P and CLK\_N) can be sent to the output pads. When the input clock (CLK\_IN) frequency is low, we can directly send the input clock (ORIG\_CLK) to the I/O pads without frequency division. The system clock selection bit (OUT\_SELECT) blocks the unused output signals for power saving. For instance, when OUT\_SELECT is in the logic 0 state (i.e. a high-speed on-chip clock (DCO\_CLK) is input to the ADDCC), ORIG\_CLK will be stuck at the logic 1 state to save dynamic power dissipation. Conversely, CLK\_P and CLK\_N will be stuck at the logic 1 state when ORIG\_CLK is output.

The timing diagram of the DIV\_FOUR circuit is shown in Figure 4.5. Assume that the input clock (CLK\_IN) period is T and its pulse width is A. Then, the duty-cycle of the input clock (CLK\_IN) is A/T.



Figure 4.5 The Timing Diagram of DIV\_FOUR Circuit

After frequency division, the period of CLK\_P and CLK\_N (*TD*) is four times that of the input clock (CLK\_IN) (i.e.  $TD = 4 \cdot T$ ). However, the phase error between the rising edges of CLK\_P and CLK\_N is always *A*. Thus, we can derive the duty-cycle of the input clock (CLK\_IN) from *TD* and *A*, as illustrated in Eq. 4.1.

$$\text{Duty-Cycle}\,(CLK\_IN) = \frac{A}{T} = \frac{A}{\frac{TD}{4}}$$
(4.1)

Actually, the rise time and the fall time of the I/O pads are unbalanced. The pulse-width of CLK\_P and CLK\_N will be distorted when they pass through the output pads. However, the phase error between the rising edges of CLK\_P and CLK\_N is fixed and thus we can still use the proposed measurement approach to derive the duty-cycle and period of the internal signals.
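The indirect measurement in Eq. 4.1 amounts to a short calculation once *TD* and *A* have been read from the oscilloscope. The sketch below uses hypothetical measurement values and an illustrative function name; it only restates Eq. 4.1 and the relation $TD = 4 \cdot T$.

```python
def internal_clock_from_divided(td, a):
    """Recover the internal clock period and duty-cycle from the divided clocks.

    td : measured period of CLK_P / CLK_N (equal to 4 * T)
    a  : measured delay between the rising edges of CLK_P and CLK_N,
         which equals the pulse width of the internal clock
    Returns (period T, duty-cycle as a fraction).
    """
    period = td / 4
    return period, a / period


# Hypothetical measurement: TD = 8000 ps, A = 900 ps.
period, duty = internal_clock_from_divided(td=8000, a=900)
print(f"T = {period} ps, duty-cycle = {100 * duty:.1f}%")
```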

## 4.1.4 Level Shifter

The proposed ADDCC Ver.1 is implemented in TSMC 90nm CMOS process. However, the output pads cannot work correctly when the core power supply is 0.5V. That is, we cannot measure the performance of the proposed ADDCC Ver.1 after the chip fabrication at 0.5V. Hence, we have to put a level shifter before the output pad to pull up the 0.5V low-voltage swing signals back to 1.0V voltage swing in our ADDCC Ver.2.



Figure 4.6 The Schematic Diagram of the Proposed Level Shifter

Figure 4.6 shows the schematic diagram of the proposed level shifter. When the low-voltage input signal (A) is in logic 1 state, the level shifter will pull up the low-voltage ( $VDD_L$ ) swing to a high-voltage ( $VDD_H$ ) swing. On the other hand, the logic 0 state is the shared ground (VSS) and thus the voltage level does not need to be shifted.

The layout of the proposed level shifter is shown in Figure 4.7. The cell size is  $6.44\,\mu m \times 2.52\,\mu m$ . The power nets (VDD and VSS) are placed on the top and bottom of the cell for routing with other standard cells. On the other hand, the high-voltage power net (VCC) is drawn on the metal 2 layer.

A buffer is added before the output signal (Y) to improve driving strength. Otherwise, the level shifter cannot drive the input capacitance of the output pad.



Figure 4.7 The Layout of the Proposed Level Shifter

The timing diagram of the proposed level shifter is shown in Figure 4.8. The proposed level shifter is able to convert the low-voltage (VDD) swing signal (A) into the high-voltage (VCC) swing signal (Y). After that, the output pad can further output an output signal (O\_Y) with 3.3V voltage swing from Y. Consequently, we are able to measure the chip performance at low supply voltage from the external pins.



Figure 4.8 The Timing Diagram of the Proposed Level Shifter



# 4.2 ADDCC Ver.1

## 4.2.1 Specifications



Figure 4.9 Microphotograph of ADDCC Ver.1

The test chip is fabricated in TSMC 90nm 1P9M standard performance CMOS process. Figure 4.9 shows the microphotograph of ADDCC Ver.1. The core area occupies 170 X 170  $\mu$m<sup>2</sup> and the chip area including I/O pads occupies 734.24 X 734.24  $\mu$m<sup>2</sup>. The chip consists of an ADDCC and a test circuit for generating the high-speed clock. The gate count including the test circuit is about 3028 (=8546/2.8224, where 2.8224 is the size of one NAND gate in TSMC 90nm CMOS process).
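The equivalent gate count quoted above is simply the total cell area divided by the area of one NAND gate. A minimal sketch of that arithmetic follows; the function and parameter names are hypothetical, and both areas are assumed to be expressed in the same unit.

```python
def equivalent_gate_count(total_cell_area, nand_area=2.8224):
    """Total cell area expressed in units of one NAND gate.

    Both arguments must use the same area unit; 2.8224 is the NAND gate
    size quoted in the text for the TSMC 90nm CMOS process.
    """
    return total_cell_area / nand_area


if __name__ == "__main__":
    # Rounds to the "about 3028" gates stated in the text.
    print(round(equivalent_gate_count(8546)))
```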



Figure 4.10 Chip I/O Planning and Floorplanning in ADDCC Ver.1

Figure 4.10 depicts the chip I/O planning and the floorplanning of ADDCC Ver.1. The proposed ADDCC Ver.1 has 11 input pins, 7 output pins, and 14 power pins. The detailed I/O pad information is shown in Table 4.3.

We use O\_TESTCHIP\_FREQ\_CLK\_P and O\_TESTCHIP\_FREQ\_CLK\_N to calculate the duty-cycle and the period of the ADDCC's input clock (SYSTEM\_CLK). Likewise, O\_DCC\_FREQ\_CLK\_P and O\_DCC\_FREQ\_CLK\_N are used to calculate the duty-cycle and the period of the ADDCC's output clock (OUTPUT\_CLK).

| Pin<br>Number | Pin Name              | Input/<br>Output | Information                                                                   |  |
|---------------|-----------------------|------------------|-------------------------------------------------------------------------------|--|
| 1             | VSSP1                 | Input            | Pad Power                                                                     |  |
| 2             | O_DCC_FREQ_CLK_P      | Output           | Divided by CLK_OUT via<br>Positive edge                                       |  |
| 3             | VDDP1                 | Input            | Pad Power                                                                     |  |
| 4             | O_LOCK                | Output           | DCC LOCK                                                                      |  |
| 5             | VSSP0                 | Input            | Pad Power                                                                     |  |
| 6             | I_WIDE_SELECT         | Input            | Duty Wide Select<br>(over 50% or under 50%)                                   |  |
| 7             | VDDC0                 | Input            | Core Power                                                                    |  |
| 8             | VDDP0                 | Input            | Pad Power                                                                     |  |
| 9             | I_INPUT_CLK           | Input            | External CLK                                                                  |  |
| 10            | VSSC0                 | Input            | Core Power                                                                    |  |
| 11            | I_OUT_SELECT          | Input            | Select Internal or External<br>clock                                          |  |
| 12~14         | I_DUTY_SELECT[2:0]    | Input            | Duty Error Select<br>(7% ~ 93%, step 8%, @1.0V;<br>6% ~ 94%, step 10%, @0.5V) |  |
| 15            | VSSP4                 | Input            | Pad Power                                                                     |  |
| 19~16         | I_FREQ_SELECT[3:0]    | Input            | Frequency Select<br>(150 MHz~800 MHz @ 1.0V;<br>30 MHz~170 MHz @ 0.5V)        |  |
| 20            | VDDP4                 | Input            | Pad Power                                                                     |  |
| 21            | I_RESET               | Input            | DCC RESET                                                                     |  |
| 22            | O_TESTCHIP_FREQ_CLK_N | Output           | Divided by FREQ_CLK via<br>Negative edge                                      |  |
| 23            | VDDC1                 | Input            | Core Power                                                                    |  |
| 24            | O_TESTCHIP_ORIG_CLK   | Output           | Without Divided<br>TESTCHIP_CLK                                               |  |
| 25            | VSSP3                 | Input            | Pad Power                                                                     |  |
| 26            | VSSC1                 | Input            | Core Power                                                                    |  |
| 27            | O_TESTCHIP_FREQ_CLK_P | Output           | Divided by FREQ_CLK via<br>Positive edge                                      |  |
| 28            | VDDP3                 | Input            | Pad Power                                                                     |  |
| 29            | VSSP2                 | Input            | Pad Power                                                                     |  |
| 30            | O_DCC_ORIG_CLK        | Output           | Without Divided<br>ADDCC CLK_OUT                                              |  |
| 31            | VDDP2                 | Input            | Pad Power                                                                     |  |
| 32            | O_DCC_FREQ_CLK_N      | Output           | Divided by CLK_OUT via<br>Negative edge                                       |  |

Table 4.3 I/O Pins Information of ADDCC Ver.1

#### 4.2.2 Simulation Results

The operating frequency of the proposed ADDCC Ver.1 ranges from 110 MHz to 900 MHz and 20 MHz to 170 MHz with a 1.0V and a 0.5V power supply, respectively. When considering the PVT variations, the overlapped operating frequency region of the proposed ADDCC Ver.1 is 200 MHz to 550 MHz and 45 MHz to 60 MHz with a 1.0V and a 0.5V power supply, respectively. The input duty-cycle ranges from 20% to 80% in dual supply voltage mode at all PVT corners.

The proposed ADDCC consumes 5.7mW (@800MHz) and 1.23mW (@150MHz) with 1.0V power supply (including the test chip circuit). When the power supply is reduced to 0.5V, it consumes  $215\mu$ W (@170MHz) and  $75\mu$ W (@30MHz) (including the test chip circuit).

Figure 4.11 summarizes the output-duty-cycle error of the proposed ADDCC Ver.1 at TT process corner with different input frequencies and duty-cycles at dual supply voltage mode. The maximum output duty-cycle error is always smaller than 1.845% and 1.1% with a 1.0V and a 0.5V power supply, respectively.

Figure 4.12 shows the simulation results of the proposed ADDCC operating with PVT variations. The proposed ADDCC works correctly with PVT variations, and the output duty-cycle error with PVT variations is limited to within 1.6% and 1.4% with a nominal and a low supply voltage, respectively. Thus, the proposed design is robust and can tolerate unbalanced process variations.

The proposed ADDCC employs a TDC to accelerate the system lock-in time. Figure 4.13 to Figure 4.16 present the convergence of the duty-cycle correction operation and the detailed output duty-cycle with PVT variations at dual supply voltage. As shown in these figures, the proposed ADDCC corrects the reference clock to 50% duty-cycle within 15 reference clock cycles in dual supply voltage mode.

Besides, we also plot the power spectrum density and the jitter histogram to show the performance of the proposed ADDCC Ver.1 with dual supply voltage, different input frequencies, and different input duty-cycle ranges, as shown in Figure 4.17 to Figure 4.20.

With a 1.0V power supply, the peak-to-peak ( $P_k$ - $P_k$ ) jitter and the root-mean-square (RMS) jitter of the period jitter of the proposed ADDCC are smaller than 14.2ps and 1.56ps, respectively. In addition, the  $P_k$ - $P_k$  jitter and the RMS jitter of the cycle-to-cycle jitter are smaller than 24.2ps and 2.59ps, respectively, with a 1.0V power supply.

The proposed ADDCC is still robust with a 0.5V low supply voltage. The  $P_k$ - $P_k$  jitter and RMS jitter of the period jitter are less than 123ps and 14.18ps, respectively. The  $P_k$ - $P_k$  jitter and RMS jitter of the cycle-to-cycle jitter are less than 186ps and 23.76ps, respectively.





Figure 4.11 The Output Duty-Cycle Error at Typical Process Corner (a) The Proposed ADDCC Ver.1 with a 1.0V Power Supply (b) The Proposed ADDCC Ver.1 with a 0.5V Power Supply







Figure 4.12 The Output Duty-Cycle Error with PVT variations (a) The Proposed ADDCC Ver.1 with a 1.0V Power Supply (b) The Proposed ADDCC Ver.1 with a 0.5V Power Supply







Figure 4.13 Convergence of the Output Duty-Cycle Error with PVT variations, a 1.0V Power Supply, and Highest Frequency Operation

- (a) The Duty-Cycle Convergence Diagram
- (b) Detailed Output Duty-Cycle after the Proposed ADDCC Ver.1 is locked







Figure 4.14 Convergence of the Output Duty-Cycle Error with PVT variations, a 1.0V Power Supply, and Lowest Frequency Operation

- (a) The Duty-Cycle Convergence Diagram
- (b) Detailed Output Duty-Cycle after the Proposed ADDCC Ver.1 is locked







Figure 4.15 Convergence of the Output Duty-Cycle Error with PVT variations, a 0.5V Power Supply, and Highest Frequency Operation

- (a) The Duty-Cycle Convergence Diagram
- (b) Detailed Output Duty-Cycle after the Proposed ADDCC Ver.1 is locked







Figure 4.16 Convergence of the Output Duty-Cycle Error with PVT variations, a 0.5V Power Supply, and Lowest Frequency Operation

- (a) The Duty-Cycle Convergence Diagram
- (b) Detailed Output Duty-Cycle after the Proposed ADDCC Ver.1 is locked







Figure 4.17 Simulated Power Spectrum Density and Jitter Histograms of the Proposed ADDCC Ver.1 at High Frequency with a 1.0V Power Supply

- (a) Power Spectrum Density at 783.27 MHz
- (b) Peak-to-Peak Jitter Histogram at 783.27 MHz
- (c) Cycle-to-Cycle Jitter Histogram at 783.27 MHz







Figure 4.18 Simulated Power Spectrum Density and Jitter Histograms of the Proposed ADDCC Ver.1 at Low Frequency with a 1.0V Power Supply

- (a) Power Spectrum Density at 152.37 MHz
- (b) Peak-to-Peak Jitter Histogram at 152.37 MHz
- (c) Cycle-to-Cycle Jitter Histogram at 152.37 MHz







Figure 4.19 Simulated Power Spectrum Density and Jitter Histograms of the Proposed ADDCC Ver.1 at High Frequency with a 0.5V Power Supply

- (a) Power Spectrum Density at 92.64 MHz
- (b) Peak-to-Peak Jitter Histogram at 92.64 MHz
- (c) Cycle-to-Cycle Jitter Histogram at 92.64 MHz







Figure 4.20 Simulated Power Spectrum Density and Jitter Histograms of the Proposed ADDCC Ver.1 at Low Frequency with a 0.5V Power Supply

- (a) Power Spectrum Density at 34.29 MHz
- (b) Peak-to-Peak Jitter Histogram at 34.29 MHz
- (c) Cycle-to-Cycle Jitter Histogram at 34.29 MHz

#### 4.2.3 Measurement Results

| FREQ_SELECT | (DUTY_SELECT, WIDE_SELECT) | Output Frequency | Output Duty-Cycle |
|-------------|----------------------------|------------------|-------------------|
| 0000        | (010, 0)                   | 734 MHz          | 86%               |
| 0000        | (001, 0)                   | 734 MHz          | 64%               |
| 0000        | (000, 0)                   | 734 MHz          | 61%               |
| 0000        | (000, 1)                   | 734 MHz          | 49%               |
| 0000        | (001, 1)                   | 734 MHz          | 39%               |
| 0000        | (010, 1)                   | 734 MHz          | 19%               |
| 0100        | (100, 0)                   | 354 MHz          | 77%               |
| 0100        | (011, 0)                   | 354 MHz          | 71%               |
| 0100        | (010, 0)                   | 354 MHz          | 63%               |
| 0100        | (001, 0)                   | 354 MHz          | 56%               |
| 0100        | (000, 0)                   | 354 MHz          | 46%               |
| 0100        | (000, 1)                   | 354 MHz          | 43%               |
| 0100        | (001, 1)                   | 354 MHz          | 34%               |
| 0100        | (010, 1)                   | 354 MHz          | 28%               |
| 0100        | (011, 1)                   | 354 MHz          | 22%               |
| 0100        | (100, 1)                   | 354 MHz          | 9%                |
| 1111        | (100, 0)                   | 144 MHz          | 79%               |
| 1111        | (011, 0)                   | 144 MHz          | 71%               |
| 1111        | (010, 0)                   | 144 MHz          | 65%               |
| 1111        | (001, 0)                   | 144 MHz          | 58%               |
| 1111        | (000, 0)                   | 144 MHz          | 52%               |
| 1111        | (000, 1)                   | 144 MHz          | 42%               |
| 1111        | (001, 1)                   | 144 MHz          | 31%               |

Table 4.4 Properties of the Test Circuit in ADDCC Ver.1

According to the chip measurement, the DCO in the test circuit of ADDCC Ver.1 can output frequencies ranging from 144 MHz to 734 MHz. The duty-cycle generator (DUTY\_GEN) can output duty-cycles ranging from 9% to 86%.



Figure 4.21 Measured Output Duty-Cycle of the Proposed ADDCC Ver.1

Figure 4.21 summarizes the measurement results of the proposed ADDCC Ver.1. The input frequency ranges from 144 MHz to 734 MHz and the input duty-cycle ranges from 9% to 86%. The maximum output duty-cycle error is 1.78%. In addition, the core power is supplied with 1.0V and the pad power is supplied with 3.3V. The proposed ADDCC Ver.1 consumes 4.59 mW, 3.56 mW, and 1.13 mW at 734 MHz, 354 MHz, and 144 MHz, respectively.

| Signal       | Frequency | P<sub>k</sub>-P<sub>k</sub> Jitter (ps) | RMS Jitter (ps) |
|--------------|-----------|--------------------------------------------|-----------------|
| System Clock | 734 MHz   | 33.62                                      | 4.55            |
| System Clock | 354 MHz   | 46.7                                       | 6.8             |
| System Clock | 144 MHz   | 96.37                                      | 12.21           |
| Output Clock | 734 MHz   | 27.13                                      | 4.1             |
| Output Clock | 354 MHz   | 37.42                                      | 5.78            |
| Output Clock | 144 MHz   | 84.09                                      | 12.44           |

Table 4.5 Jitter Measurement Results of the System and Output Clock

Table 4.5 shows the measured  $P_k$ - $P_k$  jitter and the RMS jitter of the system and output clock at different operating frequencies. The  $P_k$ - $P_k$  jitter does not exceed 96.37 ps and the RMS jitter does not exceed 12.44 ps. In other words, the proposed ADDCC Ver.1 has a good jitter performance even though the clocks have been divided by four.



We use the indirect duty-cycle measurement approach discussed in section 4.1 to calculate the actual duty-cycle of the internal clock. Figure 4.22 shows the duty-cycle measurement results of the proposed ADDCC Ver.1 at 734 MHz and 144 MHz. The signal No.3 (DCC\_FREQ\_CLK\_P) is the divided signal from the output clock's (CLK\_OUT) positive edges and the signal No.2 (DCC\_FREQ\_CLK\_N) is the divided signal from output clock's (CLK\_OUT) negative edges. The phase difference between the signal No.3 and signal No.2 is the pulse width of the internal clock. Hence, the duty-cycle is 50.3% (=(0.6884/(5.47367/4))\*100%) at 734 MHz and the duty-cycle is 49.63% (=(3.452491/(27.826028/4))\*100%) at 144 MHz.
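The indirect duty-cycle calculation above can be reproduced with a short sketch. The function and argument names are hypothetical; the only assumption is that the phase difference and the divided-clock period are given in the same time unit.

```python
def duty_cycle_percent(phase_diff, divided_period, divide_ratio=4):
    """Duty-cycle of the internal clock from the divided-by-four signals.

    phase_diff     : delay between DCC_FREQ_CLK_P and DCC_FREQ_CLK_N,
                     i.e. the pulse width of the internal clock
    divided_period : measured period of the divided clock (same unit)
    divide_ratio   : division ratio of the on-chip divider (four in the text)
    """
    actual_period = divided_period / divide_ratio  # period of CLK_OUT
    return phase_diff / actual_period * 100.0


if __name__ == "__main__":
    # Values quoted in the text for the 734 MHz and 144 MHz measurements.
    print(round(duty_cycle_percent(0.6884, 5.47367), 1))         # 50.3
    print(round(duty_cycle_percent(3.452491, 27.826028), 2))     # 49.63
```

Both results agree with the 50.3% and 49.63% values reported for Figure 4.22.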

Figure 4.23 shows the jitter histogram of the proposed ADDCC Ver.1 at 734 MHz and 144 MHz, respectively. Since the signal is divided by four, the measured clock period is four times longer than the actual clock period. Hence, the actual frequency in Figure 4.23(a) is 734 MHz (=1/(5.43263/4)). In Figure 4.23(b), the actual frequency is 144 MHz (=1/(28.94976/4)).

The indirect duty-cycle measurement approach cannot verify whether the system clock (CLK\_IN) and the output clock (CLK\_OUT) of the proposed ADDCC Ver.1 are phase-aligned. Hence, the direct duty-cycle measurement method is used to verify that our ADDCC Ver.1 does not insert extra clock skew between the input clock and the output clock, as shown in Figure 4.24. The signal No.2 is the input clock (TESTCHIP\_ORIG\_CLK) and the signal No.3 is the output clock (DCC\_ORIG\_CLK). The proposed ADDCC Ver.1 works correctly with both 19.7% and 83.6% input duty-cycle clocks.







Figure 4.22 Duty-Cycle Measurement Results of the Proposed ADDCC Ver.1

- (a) Duty-Cycle Measurement Results at 734 MHz
- (b) Duty-Cycle Measurement Results at 144 MHz








Figure 4.23 Measured Jitter Histogram of the Proposed ADDCC Ver.1

- (a) The Jitter Histogram at 734 MHz
- (b) The Jitter Histogram at 144 MHz






Figure 4.24 Phase Alignment between Input Clock and Output Clock at 75 MHz

- (a) Duty-Cycle Correction with a 19.7% Input Duty-Cycle
- (b) Duty-Cycle Correction with a 83.6% Input Duty-Cycle

# 4.3 ADDCC Ver.2

#### 4.3.1 Specifications



Figure 4.25 Layout of ADDCC Ver.2

Figure 4.25 shows the layout of ADDCC Ver.2. The ADDCC Ver.2 integrates 7 level shifters, one for each output signal, so that the 0.5V low-voltage swing signals can be pulled up to 1.0V voltage swing signals and then transmitted through the output pads. After integrating the level shifters, the core area of the proposed ADDCC Ver.2 still occupies 170 X 170  $\mu$m<sup>2</sup> and the chip area including I/O pads still occupies 734.24 X 734.24  $\mu$m<sup>2</sup>.

Figure 4.26 depicts the chip I/O planning and the floorplanning of ADDCC Ver.2. The proposed ADDCC Ver.2 has 11 input pins, 7 output pins, 16 power pins, and 2 power-cut cells. The detailed I/O pad information is shown in Table 4.6.

Unlike a conventional single power domain design, the proposed ADDCC Ver.2 has two power domains: the 0.5V power domain (VDD<sub>L</sub>) and the 1.0V power domain (VDD<sub>H</sub>). The two power domains are separated by two power-cut cells (PRCUT\_0 and PRCUT\_1).



Figure 4.26 Chip I/O Planning and Floorplanning in ADDCC Ver.2

| Pin<br>Number | Pin Name              | Input/<br>Output | Information                                                                   |  |  |
|---------------|-----------------------|------------------|-------------------------------------------------------------------------------|--|--|
| 1             | VSSC2                 | Input            | High Voltage Core Power                                                       |  |  |
| 2             | VDDC2                 | Input            | High Voltage Core Power                                                       |  |  |
| 3             | O_DCC_FREQ_CLK_N      | Output           | Divided by CLK_OUT via<br>Negative edge                                       |  |  |
| 4             | VSSP1                 | Input            | Pad Power                                                                     |  |  |
| 5             | O_DCC_FREQ_CLK_P      | Output           | Divided by CLK_OUT via<br>Positive edge                                       |  |  |
| 6             | VDDP1                 | Input            | Pad Power                                                                     |  |  |
| 7             | O_LOCK                | Output           | DCC LOCK                                                                      |  |  |
| 8             | PRCUT_0               | N/A              | Power Cut Cell                                                                |  |  |
| 9             | I_WIDE_SELECT         | Input            | Duty Wide Select<br>(over 50% or under 50%)                                   |  |  |
| 10            | VDDC0                 | Input            | Low Voltage Core Power                                                        |  |  |
| 11            | VSSC0                 | Input            | Low Voltage Core Power                                                        |  |  |
| 12            | I_INPUT_CLK           | Input            | External CLK                                                                  |  |  |
| 13            | VSSP0                 | Input            | Pad Power                                                                     |  |  |
| 14            | VDDP0                 | Input            | Pad Power                                                                     |  |  |
| 15            | I_OUT_SELECT          | Input            | Select Internal or External<br>clock                                          |  |  |
| 16~18         | I_DUTY_SELECT[2:0]    | Input            | Duty Error Select<br>(7% ~ 93%, step 8%, @1.0V;<br>6% ~ 94%, step 10%, @0.5V) |  |  |
| 19            | VSSP4                 | Input            | Pad Power                                                                     |  |  |
| 23~20         | I_FREQ_SELECT[3:0]    | Input            | Frequency Select<br>(150 MHz~800 MHz @ 1.0V;<br>30 MHz~170 MHz @ 0.5V)        |  |  |
| 24            | VDDP4                 | Input            | Pad Power                                                                     |  |  |
| 25            | I_RESET               | Input            | DCC RESET                                                                     |  |  |
| 26            | VDDC1                 | Input            | Low Voltage Core Power                                                        |  |  |
| 27            | VSSC1                 | Input            | Low Voltage Core Power                                                        |  |  |
| 28            | PRCUT_1               | N/A              | Power Cut Cell                                                                |  |  |
| 29            | VSSP3                 | Input            | Pad Power                                                                     |  |  |
| 30            | O_TESTCHIP_ORIG_CLK   | Output           | Without Divided<br>TESTCHIP_CLK                                               |  |  |
| 31            | VDDP3                 | Input            | Pad Power                                                                     |  |  |
| 32            | O_TESTCHIP_FREQ_CLK_N | Output           | Divided by FREQ_CLK via<br>Negative edge                                      |  |  |
| 33            | O_TESTCHIP_FREQ_CLK_P | Output           | Divided by FREQ_CLK via<br>Positive edge                                      |  |  |
| 34            | VSSP2                 | Input            | Pad Power                                                                     |  |  |
| 35            | O_DCC_ORIG_CLK        | Output           | Without Divided<br>ADDCC CLK_OUT                                              |  |  |
| 36            | VDDP2                 | Input            | Pad Power                                                                     |  |  |

Table 4.6 I/O Pins Information of ADDCC Ver.2

#### 4.3.2 Simulation Results

Figure 4.27 shows the timing diagram of the proposed ADDCC Ver.2. With a level shifter placed before each output pad, the 0.5V low-voltage swing signals are first pulled up to 1.0V voltage swing signals and then to 3.3V voltage swing signals. Hence, we can measure the performance of the proposed ADDCC at 0.5V after chip fabrication.



Figure 4.27 The Timing Diagram of the Proposed ADDCC Ver.2

# 4.4 Performance Comparisons

#### 4.4.1 ADDCC Ver.1

| Parameter                                    | Ver.1                                      | JSSC'08<br>[2]                  | TVLSI'13<br>[31]                            | TVLSI'13<br>[15]                             | TVLSI'12<br>[3]                            | TCAS-II'07<br>[5]    | JSSC'09<br>[6]                               | ISSCC'08<br>[35]                       |
|----------------------------------------------|--------------------------------------------|---------------------------------|---------------------------------------------|----------------------------------------------|--------------------------------------------|----------------------|----------------------------------------------|----------------------------------------|
| Phase<br>Alignment                           | Yes                                        | No                              | Yes                                         | Yes                                          | No                                         | Yes                  | Yes                                          | Yes                                    |
| Unbalanced<br>Process<br>Tolerance           | Yes                                        | No                              | No                                          | No                                           | Yes                                        | Yes                  | Yes                                          | Yes                                    |
| Type                                         | TDC/<br>HCDL                               | Analog<br>PWCL                  | TDC/<br>HCDL/<br>Interpolator               | Sequential<br>Search/<br>HCDL                | TDC/<br>HCDL                               | TDC                  | DCC/DLL                                      | DCC/DLL                                |
| Process                                      | 90 nm                                      | 0.18 μm                         | 0.18 μm                                     | 65 nm                                        | 0.18 μm                                    | 0.18 µm              | 0.18 μm                                      | 66 nm                                  |
| Supply<br>Voltage (V)                        | 1.0                                        | 1.8                             | 1.8                                         | 1.0                                          | 1.8                                        | 1.8                  | 1.8                                          | 1.5                                    |
| Frequency<br>(MHz)                           | 75 ~ 734                                   | 50 ~ 1100                       | 250 ~ 625                                   | 262 ~ 1020                                   | 400 ~ 2000                                 | 800 ~ 1200           | 440 ~ 1500                                   | 100 ~ 1000                             |
| Input<br>Duty-Cycle<br>Range (%)             | 9~86                                       | 30 ~ 70                         | 30~70                                       | 14~86                                        | 10 ~ 90<br>@ 400 MHz<br>20 ~ 80<br>@ 2 GHz | 40 ~ 60              | 20 ~ 80<br>@ 440 MHz<br>40 ~ 60<br>@ 1.5 GHz | 40 ~ 60                                |
| Maximum<br>Output<br>Duty-Cycle<br>Error (%) | 1.78                                       | 1.0                             | 1.6                                         | 0.6<br>@ 262 MHz<br>1.4<br>@ 1.02 GHz        | 1.0<br>@ 400MHz<br>3.5<br>@ 1GHz           | 1.5                  | 1.8                                          | 1                                      |
| Duty-Cycle<br>Corrector<br>Resolution        | 1.94 ps                                    | N/A                             | <sup>1</sup> 11.36 ps                       | 3.5 ps                                       | <sup>2</sup> 78.1 ps                       | <sup>3</sup> 78.1 ps | <sup>4</sup> 17.75 ps                        | N/A                                    |
| Lock-in Time<br>(Cycle)                      | < 15                                       | < 60                            | < 36                                        | N/A                                          | < 3.5                                      | 10                   | 10 ~ 15                                      | < 400                                  |
| P <sub>k</sub> -P <sub>k</sub> Jitter        | 27.13 ps<br>@734 MHz                       | 13.2 ps<br>@ 1.3 GHz            | 21.1 ps<br>@ 625 MHz                        | 23.64 ps<br>@ 1.02 GHz                       | 28.45ps<br>@ 1 GHz                         | 12.9ps<br>@ 1 GHz    | 7ps<br>@ 1.5 GHz                             | 45.76 ps<br>@ 1GHz                     |
| Power<br>Consumption                         | 0.9 mW<br>@ 75 MHz<br>4.59 mW<br>@ 734 MHz | <sup>5</sup> 4.8 mW<br>@ 1.3GHz | 8.4 mW<br>@ 250 MHz<br>10.8 mW<br>@ 625 MHz | 1.96 mW<br>@ 262 MHz<br>6.5 mW<br>@ 1020 MHz | 1.76 mW<br>@ 400 MHz<br>3.6 mW<br>@ 2 GHz  | 15 mW                | 43 mW<br>@ 1.5 GHz                           | 4.2 mW<br>@ 100 MHz<br>20 mW<br>@ 1GHz |
| Area (mm <sup>2</sup> )                      | 0.0289                                     | 0.2068                          | 0.09                                        | 0.01                                         | 0.025                                      | 0.2236               | 0.053                                        | 0.111                                  |
| PBR (mW)                                     | 0.47                                       | 0.02                            | 0.78                                        | 1.66                                         | N/A                                        | 0.94                 | 0.95                                         | 0.89                                   |

Table 4.7 Performance Comparisons with ADDCC Ver.1

<sup>1</sup>2.5 ns (frequency range)  $\div$  22 (coarse)  $\div$  5 (fine)  $\div$  2 (interpolator) = 11.36 ps

 $^{2}2500 \text{ ps} \div 32 = 78.1 \text{ ps} \ (@400 \text{ MHz})$ 

 $^{3}1250 \text{ ps} \div 16 = 78.1 \text{ ps} \ (@800 \text{ MHz})$ 

 $^{4}\tau = 2272.7 \text{ ps} \div 16 = 142 \text{ ps} \ (@440 \text{ MHz})$ ; the resolution is  $\tau \times 0.125 = 142 \text{ ps} \times 0.125 = 17.75 \text{ ps}$ 

<sup>5</sup>PWCL only

$$\begin{split} NF: Normalized \ Frequency &= F \ \times \ \left(\frac{Technology}{0.09}\right), \\ NP: Normalized \ Power &= P \ \times \left(\frac{0.09}{Technology}\right) \ \times \left(\frac{1.0}{VDD}\right)^2 \ \times \ \frac{734}{F_{max}} \\ PBR: Power \ Bandwidth \ Ratio &= NP \ \div \ \left(\frac{NF_{max}}{NF_{min}}\right) \end{split}$$
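As a worked check of the definitions above, the following sketch evaluates NP and PBR for three columns of Table 4.7. It assumes that P is the power reported at the maximum operating frequency and that Technology is given in micrometres; the function and parameter names are hypothetical.

```python
REF_TECH_UM = 0.09   # reference technology (90 nm)
REF_VDD = 1.0        # reference supply voltage (V)
REF_FMAX_MHZ = 734   # reference maximum frequency (MHz)


def normalized_power(p_mw, tech_um, vdd, f_max_mhz):
    """NP = P x (0.09 / Technology) x (1.0 / VDD)^2 x (734 / F_max)."""
    return (p_mw * (REF_TECH_UM / tech_um)
            * (REF_VDD / vdd) ** 2
            * (REF_FMAX_MHZ / f_max_mhz))


def power_bandwidth_ratio(p_mw, tech_um, vdd, f_min_mhz, f_max_mhz):
    """PBR = NP / (NF_max / NF_min)."""
    nf_max = f_max_mhz * (tech_um / REF_TECH_UM)
    nf_min = f_min_mhz * (tech_um / REF_TECH_UM)
    # The technology factor cancels in the ratio NF_max / NF_min.
    return normalized_power(p_mw, tech_um, vdd, f_max_mhz) / (nf_max / nf_min)


if __name__ == "__main__":
    # (power in mW at F_max, technology in um, VDD in V, F_min, F_max in MHz)
    designs = {
        "Ver.1": (4.59, 0.09, 1.0, 75, 734),
        "[31]": (10.8, 0.18, 1.8, 250, 625),
        "[15]": (6.5, 0.065, 1.0, 262, 1020),
    }
    for name, args in designs.items():
        print(name, round(power_bandwidth_ratio(*args), 2))
```

Under these assumptions the sketch gives 0.47, 0.78, and 1.66 mW, which match the PBR row of Table 4.7 for Ver.1, [31], and [15].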

Table 4.7 compares the performance of the proposed ADDCC Ver.1 with prior DCC designs. Although the analog PWCL [2] has a relatively small duty-cycle error, it has a large chip area and a long lock-in time. In addition, the PWCL has a serious charge pump mismatch problem at unbalanced process corners, which may result in a worse duty-cycle error with unbalanced process variations. When the analog PWCL-based DCC is supplied with 0.5V, the charge pump may suffer a longer charging/discharging time. Moreover, the output clocks of [2] and [3] are not phase-aligned with the input clock, which makes them difficult to integrate into an SoC.

In [5], the TDC-based ADDCC without fine-tuning delay cells is difficult to achieve small duty-cycle error at high frequency operation; further, maintaining a wide operation frequency is also difficult in this architecture. In addition, the interpolator of the ADDCC [5] is easily affected by unbalanced process variations. Thus, the output duty cycle error will become worse at unbalanced process corners. Besides, when the TDC-based ADDCC without fine-tuning delay cells operates with a low supply voltage, the duty-cycle error depends on the resolution of the delay line. Thus, the duty-cycle error will become very large at a low supply voltage.

In [6], [15], and [35], a delay-locked loop (DLL) is integrated with the ADDCC to align the phase of the output clock with the input clock. However, the dual loop architecture results in more power consumption and higher design complexity.

The coarse-fine delay line architecture based ADDCCs [15], [31] can effectively improve the output duty-cycle error. In [15], the sequential search scheme takes many reference clock cycles in shrinking and stretching the pulse width. Besides, the accumulative duty-cycle error in the delay line of the DLL loop directly affects the output duty-cycle error.

In [31], the three half delay lines (HDLs) compensate for the process variations to enhance the reliability of the ADDCC at all process corners. However, the HDLs may have an on-chip mismatch problem and cause large power dissipation. Besides, the FDL in [31] should overlap the coarse-tuning step by 20% to 30% to ensure that the controllable delay range of the FDL is larger than one coarse-tuning resolution with PVT variations. Otherwise, a non-monotonic response problem will occur when the control codes switch.

Furthermore, the phase interpolator in [31] directly affects the output duty-cycle error: when the ADDCC [31] is at unbalanced process corners, the phase of the interpolated output clock is not exactly midway between the two input clocks.

Besides, we define a power bandwidth ratio (PBR) index to quantify the relationship between the operating frequency range and the power consumption. As shown in Table 4.7, the PBR of the proposed ADDCC Ver.1 is the smallest among the published designs except for [2]. Even though the analog PWCL-based DCC has a relatively small PBR, its area cost is significantly higher than that of the other DCCs. In addition, the output duty-cycle error of the DCC [3] at operating frequencies above 1 GHz is too large to be usable, and thus we exclude it from the PBR comparison.

In comparison to prior studies, the proposed ADDCC has a lower area cost, a wider operating frequency range and a better tolerance to PVT variations.

#### 4.4.2 ADDCC Ver.2

| Parameter | Ver.2<br>(1.0 V) | Ver.2<br>(0.5 V) | JSSC'08<br>[2] | TVLSI'13<br>[31] | TVLSI'13<br>[15] | TVLSI'12<br>[3] | TCAS-II'07<br>[5] | JSSC'09<br>[6] | ISSCC'08<br>[35] |
|---|---|---|---|---|---|---|---|---|---|
| Phase<br>Alignment | Yes | Yes | No | Yes | Yes | No | Yes | Yes | Yes |
| Unbalanced<br>Process<br>Tolerance | Yes | Yes | No | No | No | Yes | Yes | Yes | Yes |
| Type | TDC/<br>HCDL | TDC/<br>HCDL | Analog PWCL | TDC/<br>HCDL/<br>Interpolator | Sequential<br>Search/<br>HCDL | TDC/<br>HCDL | TDC | DCC/DLL | DCC/DLL |
| Process | 90 nm | 90 nm | 0.18 μm | 0.18 μm | 65 nm | 0.18 μm | 0.18 μm | 0.18 μm | 66 nm |
| Supply<br>Voltage (V) | 1.0 | 0.5 | 1.8 | 1.8 | 1.0 | 1.8 | 1.8 | 1.8 | 1.5 |
| Frequency<br>(MHz) | 110 ~ 900 | 20 ~ 170 | 50 ~ 1100 | 250 ~ 625 | 262 ~ 1020 | 400 ~ 2000 | 800 ~ 1200 | 440 ~ 1500 | 100 ~ 1000 |
| Input<br>Duty-Cycle<br>Range (%) | 20 ~ 80 | 20 ~ 80 | 30 ~ 70 | 30 ~ 70 | 14 ~ 86 | 10 ~ 90<br>@ 400 MHz<br>20 ~ 80<br>@ 2 GHz | 40 ~ 60 | 20 ~ 80<br>@ 440 MHz<br>40 ~ 60<br>@ 1.5 GHz | 40 ~ 60 |
| Maximum<br>Output<br>Duty-Cycle<br>Error (%) | 1.146<br>@ 110 MHz<br>1.845<br>@ 900 MHz | 0.747<br>@ 20 MHz<br>0.839<br>@ 170 MHz | 1.0 | 1.6 | 0.6<br>@ 262 MHz<br>1.4<br>@ 1.02 GHz | 1.0<br>@ 400 MHz<br>3.5<br>@ 1 GHz | 1.5 | 1.8 | 1 |
| Duty-Cycle<br>Corrector<br>Resolution | 1.94 ps | 9.97 ps | N/A | <sup>1</sup>11.36 ps | 3.5 ps | <sup>2</sup>78.1 ps | <sup>3</sup>78.1 ps | <sup>4</sup>17.75 ps | N/A |
| Lock-in Time<br>(Cycle) | < 15 | < 15 | < 60 | < 36 | N/A | < 3.5 | 10 | 10 ~ 15 | < 400 |
| P<sub>k</sub>-P<sub>k</sub> Jitter | 14.2 ps<br>@ 783.27 MHz | 142 ps<br>@ 92.64 MHz | 13.2 ps<br>@ 1.3 GHz | 21.1 ps<br>@ 625 MHz | 23.64 ps<br>@ 1.02 GHz | 28.45 ps<br>@ 1 GHz | 12.9 ps<br>@ 1 GHz | 7 ps<br>@ 1.5 GHz | 45.76 ps<br>@ 1 GHz |
| Power<br>Consumption | 1.23 mW<br>@ 150 MHz<br>5.7 mW<br>@ 800 MHz | 75 μW<br>@ 30 MHz<br>215 μW<br>@ 170 MHz | <sup>5</sup>4.8 mW<br>@ 1.3 GHz | 8.4 mW<br>@ 250 MHz<br>10.8 mW<br>@ 625 MHz | 1.96 mW<br>@ 262 MHz<br>6.5 mW<br>@ 1020 MHz | 1.76 mW<br>@ 400 MHz<br>3.6 mW<br>@ 2 GHz | 15 mW | 43 mW<br>@ 1.5 GHz | 4.2 mW<br>@ 100 MHz<br>20 mW<br>@ 1 GHz |
| Area (mm<sup>2</sup>) | 0.0289 | 0.0289 | 0.2068 | 0.09 | 0.01 | 0.025 | 0.2236 | 0.053 | 0.111 |
| PBR (mW) | 0.72 | 0.44 | 0.02 | 0.78 | 1.66 | N/A | 0.94 | 0.95 | 0.89 |
| Experimental<br>Results Type | Simulation | Simulation | Measurement | Measurement | Measurement | Measurement | Measurement | Measurement | Measurement |

Table 4.8 Performance Comparisons with ADDCC Ver.2

<sup>1</sup> 2.5 ns (frequency range) ÷ 22 (coarse) ÷ 5 (fine) ÷ 2 (interpolator) = 11.36 ps

<sup>2</sup> 2500 ps ÷ 32 = 78.1 ps (@ 400 MHz)

<sup>3</sup> 1250 ps ÷ 16 = 78.1 ps (@ 800 MHz)

<sup>4</sup> τ = 2272.7 ps ÷ 16 = 142 ps (@ 440 MHz); the resolution is τ × 0.125 = 142 ps × 0.125 = 17.75 ps

<sup>5</sup> PWCL only

$$\begin{aligned}
\text{NF (Normalized Frequency)} &= F \times \left(\frac{\text{Technology}}{0.09}\right) \\
\text{NP (Normalized Power)} &= P \times \left(\frac{0.09}{\text{Technology}}\right) \times \left(\frac{1.0}{V_{DD}}\right)^{2} \times \frac{734}{F_{max}} \\
\text{PBR (Power Bandwidth Ratio)} &= \text{NP} \div \left(\frac{\text{NF}_{max}}{\text{NF}_{min}}\right)
\end{aligned}$$
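The NF, NP, and PBR definitions can be checked numerically with a short script. The following is a minimal sketch, not the tool used to generate the table: the function and constant names are hypothetical, and it assumes that P is the power reported at the maximum operating frequency, that Technology is expressed in micrometres, and that frequencies are in MHz. Under these assumptions it reproduces, to two decimals, the PBR entries of Table 4.8 for [31], [15], [5], and the 0.5 V column of Ver.2.

```python
# Sketch of the PBR index, following the NF/NP/PBR definitions given above.
# All names are illustrative; units assumed: mW, um, V, MHz.

REFERENCE_NODE_UM = 0.09     # normalization technology node (from the NF/NP definitions)
REFERENCE_VDD_V = 1.0        # normalization supply voltage
REFERENCE_FREQ_MHZ = 734.0   # constant that appears in the NP definition


def normalized_frequency(freq_mhz, technology_um):
    """NF = F * (Technology / 0.09)."""
    return freq_mhz * (technology_um / REFERENCE_NODE_UM)


def normalized_power(power_mw, technology_um, vdd_v, f_max_mhz):
    """NP = P * (0.09 / Technology) * (1.0 / VDD)^2 * 734 / F_max."""
    return (power_mw
            * (REFERENCE_NODE_UM / technology_um)
            * (REFERENCE_VDD_V / vdd_v) ** 2
            * (REFERENCE_FREQ_MHZ / f_max_mhz))


def power_bandwidth_ratio(power_mw, technology_um, vdd_v, f_min_mhz, f_max_mhz):
    """PBR = NP / (NF_max / NF_min); a smaller value is better."""
    np_mw = normalized_power(power_mw, technology_um, vdd_v, f_max_mhz)
    nf_max = normalized_frequency(f_max_mhz, technology_um)
    nf_min = normalized_frequency(f_min_mhz, technology_um)
    return np_mw / (nf_max / nf_min)


if __name__ == "__main__":
    # (power at F_max in mW, technology in um, VDD in V, F_min, F_max in MHz),
    # taken from the corresponding columns of Table 4.8.
    designs = {
        "[31]": (10.8, 0.18, 1.8, 250.0, 625.0),
        "[15]": (6.5, 0.065, 1.0, 262.0, 1020.0),
        "[5]": (15.0, 0.18, 1.8, 800.0, 1200.0),
        "Ver.2 @ 0.5 V": (0.215, 0.09, 0.5, 20.0, 170.0),
    }
    for name, args in designs.items():
        print(f"{name}: PBR = {power_bandwidth_ratio(*args):.2f} mW")
```

Because the technology factor cancels in the ratio of NF at the maximum and minimum frequencies, the bandwidth term reduces to the plain ratio of the two frequencies; the technology and supply voltage affect the PBR only through NP.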

Table 4.8 compares the performance of the ADDCC Ver.2 with published DCCs. The proposed ADDCC supports dynamic voltage and frequency scaling (DVFS) to save chip power. With the 0.5V supply, the power consumption is reduced from the milliwatt level to the microwatt level. Most importantly, it still achieves good duty-cycle correction accuracy even when it is supplied with the low voltage.



# Chapter 5 Conclusion and Future Works

### 5.1 Conclusion

A 0.5V/1.0V low-power delay-recycled ADDCC with tolerance to PVT variations is presented in this thesis. The proposed ADDCC achieves wide-range operation, with the input frequency ranging from 75 MHz to 734 MHz at the 1.0V nominal supply voltage and from 20 MHz to 170 MHz at the 0.5V low supply voltage. The input duty cycle ranges from 20% to 80% in both supply-voltage modes. The delay-recycled architecture reduces the delay line length to one-half of the reference clock period. In addition, the proposed ADDCC supports DVFS to save chip power. When the supply voltage is 0.5V, the power consumption is at the microwatt level. Most importantly, the proposed ADDCC is robust at unbalanced process corners and is suitable for low-cost applications.

### 5.2 Future Works

Low power consumption is always a major design challenge in battery-powered applications. The proposed ADDCC reduces its power consumption from the milliwatt level to the microwatt level by lowering the supply voltage to one-half of the nominal voltage. However, the operating frequency range of the proposed ADDCC is still narrow with the 0.5V supply voltage. As discussed in Section 3.4, if the intrinsic delay of all logic gates can be further reduced, the operating frequency range can be extended.
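As a rough check of the voltage-scaling argument, assume the dynamic power model $P = CV^2f$ quoted in the abstract and hold the switched capacitance $C$ and the clock frequency $f$ fixed. This is an idealization that ignores leakage and short-circuit power:

$$\frac{P_{0.5\,\mathrm{V}}}{P_{1.0\,\mathrm{V}}} = \frac{C \times (0.5)^2 \times f}{C \times (1.0)^2 \times f} = 0.25$$

Under this model, halving the supply voltage alone saves about 75% of the dynamic power. The further reduction down to the microwatt level presumably also reflects the lower operating frequencies used at 0.5V.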

The NAND-based delay line of the proposed TDC-embedded HCDL causes glitches when four bits of the coarse-tuning delay control code switch. The glitch-free NAND-based delay line [32] has been proposed to solve this problem. Although the intrinsic delay time and the resolution of the glitch-free NAND-based delay line are the same as those of the conventional NAND-based delay line [21], the coarse-tuning control codes for each delay cell are hard to design under the discussed constraints. The glitch problem of the NAND-based delay line directly influences the performance of the system, and solving it is left for future work.

With state-of-the-art technologies, integrated circuits (ICs) can be stacked into a three-dimensional (3D) architecture to increase the density of dynamic random access memories (DRAMs). In a 3D-IC, chips are interconnected through through-silicon via (TSV) channels. TSVs can be arranged in the core area and have negligible inductance and low parasitic capacitance [33]. Hence, the data rate can be further increased by interconnecting chips with TSVs. Future work therefore needs to address die-to-die clock synchronization [34] and duty-cycle correction in 3D-ICs.

## References

- [1] Ravi Mehta, Sumantra Seth, Siddharth Shashidharan, Biman Chattopadhyay and Sujoy Chakravarty, "A programmable, multi-GHz, wide-range duty cycle correction circuit in 45nm CMOS process," *in Proceedings of European Solid-State Circuits Conference (ESSCIRC)*, Sep. 2012, pp. 257-260.
- [2] Kuo-Hsing Cheng, Chia-Wei Su and Kai-Fei Chang, "A high linearity, fast-locking pulsewidth control loop with digitally programmable duty cycle correction for wide range operation," *IEEE Journal of Solid-State Circuits*, vol. 43, no. 2, pp. 399-413, Feb. 2008.
- [3] Junhui Gu, Jianhui Wu, Danhong Gu, Meng Zhang and Longxing Shi, "All-digital wide range precharge logic 50% duty cycle corrector," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 20, no. 4, pp. 760-764, Apr. 2012.
- [4] Yi-Ming Wang and Jinn-Shyan Wang, "An all-digital 50% duty-cycle corrector," in Proceedings of IEEE International Symposium on Circuits and Systems (ISCAS), May 2004, pp. 925-928.
- [5] Shao-Ku Kao and Shen-Iuan Liu, "All-digital fast-locked synchronous duty-cycle corrector," *IEEE Transactions on Circuits and Systems II: Express Briefs*, vol. 53, no. 12, pp. 1363-1367, Dec. 2006.
- [6] Dongsuk Shin, Janghoon Song, Hyunsoo Chae, and Chulwoo Kim, "A 7 ps jitter 0.053 mm<sup>2</sup> fast lock all-digital DLL with a wide range and high resolution DCC," *IEEE Journal of Solid-State Circuits*, vol. 44, no. 9, pp. 2437-2451, Sep. 2009.

- [7] Shao-Ku Kao and Shen-Iuan Liu, "A wide-range all-digital duty cycle corrector with a period monitor," in Proceedings of IEEE International Conference of Electron Devices and Solid-State Circuits (EDSSC), Dec. 2007, pp. 349-352.
- [8] Yi-Ming Wang, Jen-Tsung Yu, Yuandi Surya, and Chung-Hsun Huang, "A compact delay-recycled clock skew-compensation AND/OR duty-cycle-correction circuit," *in Proceedings of IEEE International SOC Conference (ISOCC)*, Sep. 2011, pp. 42-47.
- [9] Shih-Nung Wei, Yi-Ming Wang, Jyun-Hua Peng, and Yuandi Surya, "A range extending delay-recycled clock skew-compensation AND/OR duty-cycle-correction circuit," *in Proceedings of IEEE International Symposium on VLSI Design, Automation & Test (VLSI-DAT)*, Apr. 2012.
- [10] R.Swathi and M.B.Srinivas, "All digital duty cycle correction circuit in 90nm based on mutex," in Proceedings of IEEE Computer Society Annual Symposium on VLSI (ISVLSI), May 2009, pp. 258-262.
- [11] Young-Jae Min, Chan-Hui Jeong, Kyu-Young Kim, Won Ho Choi, Jong-Pil Son, Chulwoo Kim, and Soo-Won Kim, "A 0.31-1 GHz fast-corrected duty-cycle corrector with successive approximation register for DDR DRAM applications," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 20, no.8, pp. 1524-1528, Aug. 2012.
- [12] Ji-Wei Ke, Shi-Yu Huang, and Ding-Ming Kwai, "A high-resolution all-digital duty-cycle corrector with a new pulse-width detector," in Proceedings of IEEE International Conference of Electron Devices and Solid-State Circuits (EDSSC), Dec. 2010.
- [13] Poki Chen, Shi-Wei Chen, and Juan-Shan Lai, "A low power wide range duty cycle corrector based on pulse shrinking/stretching mechanism," *in Proceedings of IEEE Asian Solid-State Circuit Conference (ASSCC)*, Nov. 2007, pp. 460-463.

- [14] Dong-Hoon Jung, Kyungho Ryu, Jung-Hyun Park, and Seong-Ook Jung, "A low-power and small-area all-digital delay-locked loop with closed-loop duty-cycle correction," in Proceedings of European Solid-State Circuits Conference (ESSCIRC), Sep. 2012, pp. 181-184.
- [15] Ching-Che Chung, Duo Sheng, and Sung-En Shen, "High-resolution all-digital duty-cycle corrector in 65-nm CMOS technology," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. PP, no.99, Jun. 2013.
- [16] Ching-Che Chung and Chang-Jun Li, "A low-power delay-recycled all-digital duty-cycle corrector with unbalanced process variations tolerance," in Proceedings of International Symposium on VLSI Design, Automation, and Test (VLSI-DAT), Apr. 2013.
- [17] Ching-Che Chung and Wei-Cheng Dai, "A referenceless all-digital fast frequency acquisition full-rate CDR circuit for USB 2.0 in 65nm CMOS technology," in Proceedings of International Symposium on VLSI Design, Automation, and Test (VLSI-DAT), Apr. 2011, pp. 217-220.
- [18] Hsuan-Jung Hsu, Chun-Chieh Tu, and Shi-Yu Huang, "A high-resolution all-digital phase-locked loop with its application to built-in speed grading memory," in Proceedings of International Symposium on VLSI Design, Automation, and Test (VLSI-DAT), Apr. 2008, pp. 267-270.
- [19] Ching-Che Chung and Chen-Yi Lee, "An all-digital phase-locked loop for high-speed clock generation," *IEEE Journal of Solid-State Circuits*, vol. 38, no. 2, pp. 347-351, Feb. 2003.
- [20] Ching-Che Chung, Duo Sheng, and Wei-Da Ho, "A low-power and small-area all-digital spread-spectrum clock generator in 65nm CMOS technology," in Proceedings of International Symposium on VLSI Design, Automation, and Test (VLSI-DAT), Apr. 2012.
- [21] Rong-Jyi Yang and Shen-Iuan Liu, "A 40-550 MHz harmonic-free all-digital delay-locked loop using a variable SAR algorithm," *IEEE Journal of Solid-State Circuits*, vol. 42, no. 2, pp. 361-373, Feb. 2007.
- [22] Byoung-Mo Moon, Young-June Park, and Deog-Kyoon Jeong, "Monotonic wide-range digitally controlled oscillator compensated for supply voltage variation," *IEEE Transactions on Circuits and Systems II: Express Briefs*, vol. 55, no. 10, pp. 1036-1040, Oct. 2008.
- [23] Wu-Hsin Chen, Wing-Fai Loke, Gabriel J. Thompson, and Byunghoo Jung, "A 0.5-V, 440-µW frequency synthesizer for implantable medical devices," *IEEE Journal of Solid-State Circuits*, vol. 47, no. 8, pp. 1896-1907, Aug. 2012.
- [24] Tae-Hyoung Kim, John Keane, Hanyong Eom, and Chris H. Kim, "Utilizing reverse short-channel effect for optimal subthreshold circuit design," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 15, no.7, pp. 821-829, Jul. 2007.
- [25] Yu-Lung Lo, Wei-Bin Yang, Ting-Sheng Chao, and Kuo-Hsing Cheng, "Designing an ultralow-voltage phased-locked loop using a bulk-driven technique," *IEEE Transactions on Circuits and Systems II: Express Briefs*, vol. 56, no. 5, pp. 339-343, May 2009.
- [26] Ching-Che Chung, Duo Sheng, and Wei-Siang Su, "A 0.5V/1.0V fast lock-in ADPLL for DVFS battery-powered devices," *in Proceedings of International Symposium on VLSI Design, Automation, and Test (VLSI-DAT)*, Apr. 2013.

- [27] Seungsoo Kim, and Hyunchol Shin, "An E-TSPC divide-by-2 circuit with forward body biasing in 0.25µm CMOS," *IEEE Microwave and Wireless Components Letters*, vol. 19, no. 10, pp. 656-658, Oct. 2009.
- [28] Chulwoo Kim, and Sung-Mo (Steve) Kang, "A low-swing clock double-edge triggered flip-flop," *IEEE Journal of Solid-State Circuits*, vol. 37, no. 5, pp. 648-652, May 2002.
- [29] Rubil Ahmadi, "A low power sense amplifier flip-flop with balanced rise/fall delay," in Proceedings of 16<sup>th</sup> International Conference on Electronics, Circuits and Systems (ICECS), Dec. 2006, pp. 1292-1295.
- [30] Chun-Yuan Cheng, Jinn-Shyan Wang, and Cheng-Tai Yeh, "A 0.35 V, 100 MHz, 0.19 μW/MHz, 3-locking-cycle all digital delay locked loop with asynchronous-deskewing technology in 55 nm cmos technology," *in Proceedings of 13<sup>th</sup> International Symposium on Integrated Circuits (ISIC)*, Dec. 2011, pp. 19-24.
- [31] You-Gang Chen, Hen-Wai Tsao, and Chorng-Sii Hwang, "A fast-locking all-digital deskew buffer with duty-cycle correction," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 21, no.2, pp. 270-280, Feb. 2013.
- [32] Davide De Caro, "Glitch-free NAND-based digitally controlled delay-lines," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 21, no.1, pp. 55-66, Jan. 2013.
- [33] Soo-Bin Lim, Hyun-Woo Lee, Junyoung Song, and Chulwoo Kim, "A 247µW 800 Mb/s/pin DLL-based data self-aligner for through silicon via (TSV) interface," *IEEE Journal of Solid-State Circuits*, vol. 48, no. 3, pp. 711-723, Mar. 2013.

- [34] Ji-Wei Ke, Shi-Yu Huang, Chao-Wen Tzeng, Ding-Ming Kwai, and Yung-Fa Chou, "Die-to-die clock synchronization for 3-D IC using dual locking mechanism," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 60, no. 4, pp. 908-917, Apr. 2013.
- [35] Won-Joo Yun, Hyun Woo Lee, Dongsuk Shin, Shin Deok Kang, Ji Yeon Yang, Hyeng Ouk Lee, Dong Uk Lee, Sujeong Sim, Young Ju Kim, Won Jun Choi, Keun Soo Song, Sang Hoon Shin, Hyang Hwa Choi, Hyung Wook Moon, Seung Wook Kwack, Jung Woo Lee, Young Kyoung Choi, Nak Kyu Park, Kwan Weon Kim, Young Jung Choi, Jin-Hong Ahn, and Ye Seok Yang, "A 0.1-to-1.5GHz 4.2mW all-digital DLL with dual duty-cycle correction circuit and update gear circuit for DRAM in 66nm CMOS technology," *in Digest of Technical Papers, IEEE Solid-State Circuits Conference (ISSCC)*, Feb. 2008, pp. 282-283.

