ISSN: 2395-5775

# A NOVEL CONFIGURABLE MULTI RAIL POWER FOR WAFER SCALE SYSTEM

Kusuma N., and Hareesh Reddy G.F\*

#### Abstract

In this paper, we propose a novel configurable multi-power-rail pad that combines power supply support circuits and digital input/output (I/O) buffers designed for a wafer-scale system. This wafer-scale platform includes a reconfigurable wafer-scale circuit, the WaferIC, comprising an alignment-insensitive surface that can be configured to interconnect any digital components manually deposited on its surface. The proposed multi-power-rail pad minimizes power losses and heat dissipation within the circuit. A pad fed from two distinct voltage sources, providing power at 1.8 and 3.3V, has been implemented and tested. This pad has two merged configurable control loops that can select the power source; merging takes place through shared transistors. The dual-supply pad embeds a voltage regulator that achieves a fast response time of 21.1ns and can operate over a wide range of configurable regulated output voltages, from 500mV up to 2.955V. This regulator can provide a maximum output current of 40mA while needing only a very small quiescent current of 126µA. The regulator's power supply noise rejection ranges from -25dB down to -40dB for frequencies ranging from 1kHz up to 1MHz. The embedded digital I/O pad shares a common output with the power distribution and can be configured from 0.5V up to 3.3V for a maximum speed of 250MHz.

Key Words: Configurable, I/O, LDO, multi-power rail, NanoPad, voltage regulator, WaferIC, wafer-scale.

#### INTRODUCTION

Today's electronic systems are constantly growing in size and complexity. The increasing complexity combined with decreasing time to market makes it challenging for designers to meet cost and performance constraints.

A novel electronic system prototyping platform has recently been introduced to address these issues. This platform is based on an active surface implemented using a 200mm full-wafer device. This active surface is covered with over 1.2 million tiny conductive pads, called NanoPads, interconnected by a configurable interconnection network. Every unit-cell comprises a 4×4 array of NanoPads, and a 32×32 array of unit-cells defines a reticle image. The wafer-scale assembly is called the WaferIC and is obtained by photo-repeating 76 copies of the reticle image, which are stitched together to implement wafer-scale interconnections.
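As a quick sanity check of the "over 1.2 million NanoPads" figure, the hierarchy described above can be multiplied out. The sketch below uses only the array sizes stated in the text; the constant and function names are illustrative and do not come from the platform's tooling.

```python
# Hierarchy sizes taken from the text: 4x4 NanoPads per unit-cell,
# 32x32 unit-cells per reticle image, 76 reticle images per wafer.
NANOPADS_PER_UNIT_CELL = 4 * 4
UNIT_CELLS_PER_RETICLE = 32 * 32
RETICLES_PER_WAFER = 76


def unit_cells_per_wafer():
    """Total number of unit-cells on the wafer."""
    return UNIT_CELLS_PER_RETICLE * RETICLES_PER_WAFER


def nanopads_per_wafer():
    """Total number of NanoPads on the wafer."""
    return NANOPADS_PER_UNIT_CELL * unit_cells_per_wafer()


if __name__ == "__main__":
    print(unit_cells_per_wafer())  # 77824
    print(nanopads_per_wafer())    # 1245184
```

The product, 1,245,184 NanoPads, is consistent with the "over 1.2 million" stated above.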

When using the prototyping platform, user integrated circuits (uICs) are deposited on the active surface to build the target electronic system. This surface is designed to be insensitive to the alignment of the deposited uICs (Figure 1). A flexible thermal pouch, filled with thermal grease to evacuate heat, is placed on top of the uICs, which are held firmly in place by a uniformly applied pressure. This pressure ensures good electrical contact through an anisotropic conductive film (Z-axis film) that embeds vertical conductive fibres (nickel needles). The Z-axis film also protects the NanoPads from possible mechanical damage (Figure 1). A short-circuit detection mechanism maps every uIC ball connected to more than one NanoPad, and the platform can then create all the connections specified by a user netlist (Figure 1).

The active surface must also feed power to the uICs. This is done from the bottom of the WaferIC using Through Silicon Vias (TSVs) for adequate signal integrity. The top side of the WaferIC must be free of any other mechanical or electrical structures to ensure good electrical contact between the uIC balls and the Z-axis film wires, which means that no decoupling capacitors or other external components can be used on the WaferIC. The digital interconnection between two or more distant NanoPads is accomplished by the WaferNet. The WaferNet is a very dense configurable interconnection network that spreads between unit-cells in every direction (N-S-E-W) with unidirectional connections of various lengths. These connections have lengths of 2, 4, 8, 16 and 32, where, for instance, a length of 8 means that the connected unit-cells are separated by 7 others.



Figure 1 (a) WaferIC with user ICs deposited on its alignment insensitive surface.

(b) Platform cross-section where pressure in the thermal pouch ensures good electrical contact between uIC balls and the NanoPads through a Z-axis film.

(c) Interconnection of uICs through the WaferNet.

#### Single Rail Configurable I/O PAD

The proposed design takes advantage of a hierarchical topology to minimize quiescent current and silicon area by sharing as much common circuitry as possible. A master-slave topology is used in every unit-cell. Referring to Figure 2, the top module uses a reference voltage (V<sub>SET</sub>) shared between 16 NanoPads. The fast load regulator (FLR) embedded in each NanoPad (Figure 3) uses V<sub>SET</sub> to set the output voltage within the range of 1.0V to 2.5V. In addition, V<sub>SET</sub> sets the digital I/O voltage levels. This technique reduces silicon area by sharing the fast load regulator with the digital I/O, which is powered at the configurable voltage V<sub>SET</sub>; this avoids duplicating power stages to supply the digital I/O. However, control circuits must be added to share this regulated power supply between the digital I/O and the load. This comes at the cost of speed for both the I/O and the FLR response time, since a significant parasitic capacitance is added by its gate.



Figure 2 (a) The proposed master-slave topology, where a master stage feeds a common reference voltage to 16 fast load regulators (FLRs). (b) The embedded digital I/O, where the feedback signal is controlled either by the FLR or by the digital I/O control circuits.

The configurable I/O pad proposed in Figure 3 integrates a digital I/O within the regulation loop, coupled with a boost technique using a differential pair. The digital I/O can be configured to fit standard CMOS voltages of 1.0, 1.2, 1.5, 1.8, 2.0, 2.5 and 3.3V, with a post-layout simulated bandwidth of over 300MHz with a 5pF load. This approach allows very high current capabilities within a unit-cell: more than 100mA could be supplied per NanoPad, with a theoretical maximum of 1.6A per unit-cell (16 FLRs, according to Figure 3), with adequate integrated power.



The single-rail approach is inherently limited because the maximum power efficiency of a linear regulator is $V_{OUT}/V_{DD}$. This limits the maximum output power that every NanoPad can provide within a small silicon area (such as a unit-cell).

#### A Multi-rail Power Supply for Power Efficiency Improvement

To maximize the efficiency of the embedded FLR, a multi-power-supply-rail approach can be used, in which a multilevel converter fed from a single rail generates several output voltage levels using a multiplexed voltage supply or stacked voltage cells (independent cells put in series, whose output voltage is a combination of them). This multi-rail approach can increase the power efficiency by 49%. This efficiency depends on the source power supply and output voltage, with a maximum of 50W of instantaneous power. Unfortunately, this approach uses discrete components, which makes it incompatible with our embedded FLRs, where a fully integrated solution is required.

#### A Configurable Power I/O PAD with Multi Power Rail Fast Load Regulator

To overcome the constraints on power distribution, silicon area, quiescent current and minimum required efficiency, a multi-power-rail FLR is proposed. A preliminary version was proposed in Figure 3, where an FLR uses dual 1.8 and 3.3V rails, with an overall 40% improvement in power efficiency when operating at low voltage (1.0V) compared to the solution with only a single 3.3V rail [10]. A drawback of this multi-power-rail FLR is the duplication of all the control circuitry used to assert the FLR, which is costly in terms of silicon area. Another drawback is that this architecture is optimized to operate at low voltages (such as 1.0V), where 80% of the current is provided by the 1.8V rail. This contribution from the lowest voltage rail to the output current decreases drastically as the output voltage gets close to 1.5V, where the 3.3V rail supplies the remaining current. A complementary solution is proposed in this paper, where a single rail can be selected to minimize heat dissipation and silicon area.

With the proposed solution, a silicon area similar to that of the previous solution is reserved for the power transistors, each of which is scaled down to handle only half the maximum current capability of the previous solution.

This architecture benefits from a configurable control loop, power supply and bulk biasing. The principle is that when lower output voltages are required at the NanoPad, the 1.8V rail is activated, giving a theoretical efficiency from 55% up to 83% (for 1.0V and 1.5V output voltages, respectively).
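The 55% and 83% figures follow from the ideal linear-regulator efficiency, the ratio of output voltage to supply voltage. The sketch below is a minimal illustration under that assumption: it neglects quiescent current, and the function name `ideal_efficiency` is hypothetical. It compares the two rails at the output voltages mentioned in the text.

```python
def ideal_efficiency(v_out, v_dd):
    """Ideal linear-regulator efficiency V_OUT / V_DD (quiescent current neglected)."""
    if not 0 < v_out <= v_dd:
        raise ValueError("expected 0 < v_out <= v_dd")
    return v_out / v_dd


if __name__ == "__main__":
    # Rails and output voltages taken from the text.
    for v_out in (1.0, 1.5):
        for v_dd in (1.8, 3.3):
            eff = ideal_efficiency(v_out, v_dd)
            print(f"V_OUT={v_out}V from {v_dd}V rail: {eff:.1%}")
```

From the 1.8V rail this gives about 55.6% at 1.0V and 83.3% at 1.5V, whereas the same outputs drawn from the 3.3V rail would reach only about 30.3% and 45.5%.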

A challenge with multi-power-rail systems is the potential for latch-up. To prevent any possibility of latch-up, protection transistors (switches) were added. These transistors ensure that only one power rail is tapped at a time. Specifically, transistor M14 must be turned off when V<sub>OUT</sub> is larger than the branch power supply V<sub>DDn</sub>. M14 is turned off using the voltage V<sub>BASE</sub>, which provides static bulk biasing for M13 and M14 and also feeds the control loop with a suitable supply voltage. When the V<sub>DDn</sub> rail is in operation, the M13-M14 bulks (V<sub>BASE</sub>) are set to V<sub>DDn</sub>. When it is not in operation, the bulks are biased at the highest voltage, V<sub>DD1</sub>.

Table 1 summarizes key characteristics of a previously reported solution and of the proposed solution for a configurable power I/O pad suitable for the WaferIC described in this section. It shows that, for the same silicon area, the proposed solution offers an extended output range, better power efficiency and comparable I/O speed, but at the cost of a smaller maximum output current per rail (50mA instead of 110mA). However, the same power is still available throughout the whole WaferIC.
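The latch-up protection rules stated above (the bulk bias V<sub>BASE</sub> of a branch, and the condition under which M14 must be off) can be written as simple behavioural predicates. This is only a sketch of the stated rules, not a circuit model: the function names are hypothetical, and V<sub>DD1</sub> is assumed to be the 3.3V rail, the highest voltage available.

```python
V_DD1 = 3.3  # assumed highest rail voltage (V)


def bulk_bias(v_ddn, in_operation, v_highest=V_DD1):
    """V_BASE applied to the M13-M14 bulks of one branch.

    The bulks follow the branch rail while it is in operation and are
    tied to the highest rail otherwise.
    """
    return v_ddn if in_operation else v_highest


def m14_must_be_off(v_out, v_ddn):
    """M14 must be turned off when V_OUT exceeds the branch supply V_DDn."""
    return v_out > v_ddn


if __name__ == "__main__":
    # 1.8V branch while the output is regulated at 2.5V from the other rail.
    print(bulk_bias(1.8, in_operation=False))  # 3.3
    print(m14_must_be_off(2.5, 1.8))           # True
```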

#### Power Distribution

Power distribution presents several significant problems.

First, we must design a global power distribution network that runs both VDD and VSS entirely in metal. Second, we must size wires properly so that they can handle the required currents. Third, we must ensure that the transient behavior of the power distribution network does not cause problems for the logic to which it supplies current.



Figure 4 A floorplan that isolates a ground pin

While keeping all these problems in mind, we must tackle two types of power supply loss:

- IR drops from steady state currents
- Drops from transient current
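The steady-state IR drop and the wire-sizing requirement can be illustrated with Ohm's law and the sheet-resistance model of a rectangular wire, R = R<sub>sheet</sub> · L / W. The sketch below is illustrative only: the sheet resistance, wire length and target drop are hypothetical values chosen for the example, not parameters of the design described in this paper.

```python
def wire_resistance(sheet_resistance_ohm_sq, length_um, width_um):
    """Resistance of a rectangular metal wire: R = R_sheet * (L / W)."""
    if width_um <= 0 or length_um < 0:
        raise ValueError("width must be positive and length non-negative")
    return sheet_resistance_ohm_sq * length_um / width_um


def ir_drop(current_a, resistance_ohm):
    """Steady-state voltage drop across the wire: V = I * R."""
    return current_a * resistance_ohm


def min_width_for_drop(sheet_resistance_ohm_sq, length_um, current_a, max_drop_v):
    """Smallest wire width (um) that keeps the IR drop at or below max_drop_v."""
    if max_drop_v <= 0:
        raise ValueError("max_drop_v must be positive")
    return sheet_resistance_ohm_sq * length_um * current_a / max_drop_v


if __name__ == "__main__":
    # Hypothetical numbers: 0.05 ohm/sq metal, 1000 um long, 10 um wide, 40 mA.
    r = wire_resistance(0.05, 1000.0, 10.0)
    print(f"R = {r:.2f} ohm, drop = {ir_drop(0.040, r) * 1e3:.0f} mV")
    print(f"width for 50 mV: {min_width_for_drop(0.05, 1000.0, 0.040, 0.050):.0f} um")
```

With these assumed values the 10µm wire has 5Ω of resistance and drops 200mV at 40mA; keeping the drop to 50mV would require a 40µm wide wire.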

#### Power Distribution Types

The predominant types of power distribution in use are:

- H-tree
- Balanced clock tree
- Grid

#### H-tree

The H tree is a very regular structure which allows predictable delay. The balanced tree takes the opposite approach of synthesizing a layout based on the characteristics of the circuit to be clocked.



Figure 5 H tree

An H tree is shown in Figure 5. It is a recursive construction of Hs: given one level of H structure, four smaller H structures can be added at the four endpoints of the H bars. The H tree structure can be recursively refined to any level of required detail.
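The recursive construction described above can be sketched in a few lines. The code below is a geometric illustration only: it assumes each level halves the arm length, uses hypothetical function names, and ignores wire widths and buffers.

```python
def h_tree(cx, cy, half, levels):
    """Build an H tree centred at (cx, cy).

    Returns (segments, leaves): the wire segments as pairs of points and
    the endpoint coordinates of the deepest level.
    """
    if levels == 0:
        return [], [(cx, cy)]
    left, right = cx - half, cx + half
    bottom, top = cy - half, cy + half
    segments = [
        ((left, cy), (right, cy)),        # horizontal bar of the H
        ((left, bottom), (left, top)),    # left vertical bar
        ((right, bottom), (right, top)),  # right vertical bar
    ]
    leaves = []
    # A smaller H is added at each of the four endpoints of the bars.
    for x in (left, right):
        for y in (bottom, top):
            sub_segments, sub_leaves = h_tree(x, y, half / 2, levels - 1)
            segments += sub_segments
            leaves += sub_leaves
    return segments, leaves


def path_length(half, levels):
    """Wire length from the root to any leaf (identical for every leaf)."""
    return sum(2 * half / 2**k for k in range(levels))


if __name__ == "__main__":
    segments, leaves = h_tree(0.0, 0.0, 1.0, 3)
    print(len(segments), len(leaves), path_length(1.0, 3))  # 63 64 3.5
```

Each level multiplies the number of endpoints by four, and every endpoint sits at the same wire distance from the root, which is what makes the delay of the regular structure predictable.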

The widths of the wires in the H tree can be adjusted to account for variations in load capacitance to equalize skew throughout the H tree. Buffers can also be added into the H tree network to increase drive capability.

An H tree network can be thought of as a top-down clock distribution methodology since the floorplan of the H tree determines the floorplan of the logic to which it is connected. Since skew increases with physical distance in the H tree, memory elements must be grouped together to make use of the same or nearby distribution points in the H tree network.



Figure 6 Balanced Clock Tree

A balanced clock tree network, illustrated in Figure 6, is generated by placement and routing. Memory elements are clustered into groups. The clustering is used to guide placement, and a clock tree is then synthesized based on the skew information generated during clustering. The tree is irregular in shape but has been balanced during design to minimize skew. Once again, wire widths can be varied in the tree and buffers can be added. Several tools exist for generating balanced clock trees.

#### Grid

A processor will have a large number of these local points and will require a large number of branches and therefore a deep distribution tree. A deep distribution tree will exhibit large POD delays and degraded clock performance. Subdividing the die into a smaller number of clock regions and applying a grid to serve each region can be a superior solution.



Figure 7 schematically shows a 2-dimensional grid serving one of these clock regions. This clock grid resembles a mesh with fully connected clock tracks in both dimensions and grid drivers located on all four sides. Local loads within a region can be directly connected to the grid. The grid effectively shorts the output of all drivers and helps minimize delay mismatches.

Figure 6 shows an idealized delay profile of a 2-dimensional grid assuming uniform loading. The shorted grid node helps balance the load non-uniformities and results in a more gradual delay profile across the region. Additionally, since the grid drivers are shorted, the POD delay to all the loads within a region is limited to the interconnect delay of the grid which is typically small and results in lower clock skew uncertainty across the region.

#### Introduction to DSCH

The DSCH2 program is a logic editor and simulator. DSCH2 is used to validate the architecture of the logic circuit before the microelectronics design is started. DSCH2 provides a user-friendly environment for hierarchical logic design, and fast simulation with delay analysis, which allows the design and validation of complex logic structures. Some techniques for low power design are described in the manual. DSCH also features the symbols, models and assembly support for 8051 and 18f64. DSCH also includes an interface to SPICE.

#### Introduction to MICROWIND

The MICROWIND2 program allows the student to design and simulate an integrated circuit at physical description level. The package contains a library of common logic and analog ICs to view and simulate. MICROWIND2 includes all the commands for a mask editor as well as original tools never gathered before in a single module (2D and 3D process view, VERILOG compiler, tutorial on MOS devices). You can gain access to Circuit Simulation by pressing one single key. The electric extraction of your circuit is automatically performed and the analog simulator produces voltage and current curves immediately.

#### SIMULATION RESULTS

- The schematic diagram is drawn using the DSCH2 software.
- First, the required components for the circuit are placed by drawing them from the symbol palette, available as a sidebar in the DSCH software.



Figure 8 Schematic Circuit



Figure 9 Layout diagram

**Table 1** Performance Summary of Power Rail FLR for 3.3V

| Year                                    | 2005      | 2010       | 2012       | 2014         | 2015       |
|-----------------------------------------|-----------|------------|------------|--------------|------------|
| CMOS process (µm)                       | 0.09      | 0.18       | 0.065      | 0.18         | 0.12       |
| Area (mm²)                              | 0.098     | 0.006      | 1.0908     | 0.0080       | 0.0075     |
| V<sub>IN</sub> (V)                      | 1.2       | 1.8        | 1.1        | 1.8 or 3.3   | 1.5 or 3.1 |
| V<sub>OUT</sub> (V)                     | 0.9       | 0.9 to 1.7 | 0.5 to 1.0 | 0.5 to 2.955 | 0.5 to 1.5 |
| I<sub>MAX</sub> (mA)                    | 100       | 20         | 100        | 40           | 35         |
| I<sub>Q</sub> (µA)                      | 6000      | 1.06       | 164.5      | 0.000145     | 0.0012     |
| Response time T<sub>R</sub> (µs)        | 0.00054   | 0.0015     | 0.0054     | 0.02115      | 0.011      |
| Decoupling capacitor (µF)               | 0.00060   | 0.0005     | 0.0045     | 0.0047       | 0.0025     |
| FOM (ns)                                | 0.0000314 | 0.0000632  | 0.0000969  | 0.0000133    | 0.00001    |
| Area per mA (mm²/mA)                    | 0.00098   | 0.0003     | 0.010908   | 0.0002       | 0.0001     |

- After completing the circuit, save the file with the .SCH extension.
- Go to the File menu and click the 'Make a Verilog file' option to generate a Verilog program for the designed circuit.
- The circuit drawn using the DSCH software is shown in Figure 8.
- The layout diagram is drawn using the MICROWIND software.
- First, go to the Compile menu and select the option 'Compile a Verilog file'.
- A dialogue box then opens.
- Select the text file generated by the above software and compile it to check for errors.
- If no errors are present, the layout diagram is displayed.
- The characteristics of each MOS device used can also be obtained with this software.

#### CONCLUSION

A platform for rapidly prototyping electronic systems, the wafer board, is being developed in our lab. It is based on a configurable wafer-scale active circuit. Electronic components held firmly in contact with its surface are powered and interconnected using circuits implemented in this active surface. This paper focuses on means of power delivery that mitigate heat dissipation by introducing a novel multi-power-rail voltage regulator that operates from 1.8V and 3.3V rails. The addition of a second power rail allows power savings of up to 25% while offering a wider range of operation, at the cost of reducing the total deliverable power per rail due to limitations in the available area. The proposed design merges two fast load regulators into one by using configurable power supplies, the bulk biasing technique and shared transistors. The proposed architecture was fabricated in a 0.12µm CMOS technology and occupies a small area of 0.0075mm<sup>2</sup> by combining the two control loops into one, which makes it suitable for wafer-scale integration. Moreover, the proposed design offers a fast response time of 11ns with a 35mA load on either supply rail, and a very low quiescent current of 120µA. This work also achieves the best figure of merit, outperforming its closest competitors by a factor of 3.

#### References

- Laflamme-Mayer, N., Blaquiere, Y., Savaria, Y., Sawan, M., "A Configurable Multi-Rail Power and I/O Pad Applied to Wafer-Scale Systems," IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 61, no. 11, pp. 3135-3144, Nov. 2014.
- Wolf, W., "Modern VLSI Design: IP-Based Design," Fourth Edition, Prentice Hall Modern Semiconductor Design Series.
- Norman, R., Valorge, O., Blaquiere, Y., Lepercq, E., Basile-Bellavance, Y., El-Alaoui, Y., Prytula, R., Savaria, Y., "An active reconfigurable circuit board," 2008 Joint 6th International IEEE Northeast Workshop on Circuits and Systems and TAISA Conference (NEWCAS-TAISA 2008), pp. 351-354, 22-25 June 2008.
- Laflamme-Mayer, N., Andre, W., Valorge, O., Blaquiere, Y., Sawan, M., "Configurable Input-Output Power Pad for Wafer-Scale Microelectronic Systems," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 21, no. 11, pp. 2024-2033, Nov. 2013.
- Valorge, O., Andre, W., Savaria, Y., Blaquiere, Y., "Power supply analysis of a large-area integrated circuit," 2011 IEEE 9th International New Circuits and Systems Conference (NEWCAS), pp. 398-401, 26-29 June 2011.
- Hazucha, P., Karnik, T., Bloechel, B.A., Parsons, C., Finan, D., Borkar, S., "Area-efficient linear regulator with ultra-fast load regulation," IEEE Journal of Solid-State Circuits, vol. 40, no. 4, pp. 933-940, April 2005.
