

## KGx – Known Good x

September 7, 2022



## Today's Presenters



**Tom Katsioulas** GSA Global



Marc Hutner
ProteanTecs



Jay Rathert KLA



## Technical Program Committee (TPC)











Abram Detofsky Intel

Neal Edwards AMD

**Zoe Conroy** Cisco

Dave Armstrong
Advantest

**Ira Feldman** Feldman Engineering



## Support MEPTEC

Membership and sponsorship enable MEPTEC to produce events and publications with the highest quality technical content relevant to the packaging, test, and design communities.

#### **Membership**

- Registration discounts
- MEPTEC Report subscription

Join/renew at <a href="https://www.meptec.org">www.meptec.org</a>

#### **Sponsorship**

Multiple levels of corporate sponsorships are available for virtual and upcoming in-person events.

Questions about joining or sponsoring? Please contact Bette Cooper <a href="mailto:bcooper@meptec.org">bcooper@meptec.org</a>





# **Elevating KGD with Deep Data Analytics**



Sept 2022

Marc Hutner, Sr. Director of Product Marketing



## proteanTecs Leading a New Category

Deep data health & performance monitoring for advanced electronics

Founded in 2017 by industry leaders and co-founders of Mellanox



Addressing industry-wide challenges of scale

In use and proven in 50+ Design wins



Customers in multiple key segments including **Datacenter**, **Automotive**, **Communications**, and **Mobile** 

New category with a multi-disciplinary approach

Global footprint: Germany, Ukraine, New Jersey, Taiwan, India, R&D

Backed by worldwide leading investors with a proven track record in the Electronics and SaaS industries































Responsibility moves to the machine



- Electronics everywhere
- 24/7 availability
- As-a-service

#### A siloed industry unprepared for scale

**12-18** months of chip **debugging** 

Warranty claim rates: 2.7% of product revenue

50% of fault investigations inconclusive

car failure per hour

Mega functionality



- Advanced technologies
- Quality/performance tradeoffs
- Surging costs

McKinsey & Company, Advanced Analytics in Semiconductor Manufacturing; 2017

## **Challenges in High Performance Computing Resilience**



- HW failures and service disruption
- No in-mission monitoring solution
- Expensive redundancies
- Constant replacements
- Lifetime reliability and safety concerns

#### Market leaders are looking for answers from the industry

#### facebook.

"Silent data corruption due to silicon latent defects and aging. (1,000 DPPM!)"



"Frequent unexplained HW failures with "No Issue Found" at high rates"



"Rare, short-term computational errors on systems that passed all manufacturing tests successfully"



CARIAD

"Current FuSa guidelines do not cover HW/system reliability during the operational lifetime"

- "Silent Data Corruptions at Scale", Harish Dattatraya Dixit et al., Facebook, arxiv.org/abs/2102.11245, Feb. 2021
- "Cores that Don't Count", Peter H. Hochschild et al., Google Research, Jun. 2021
- "Circuit Reliability Mitigation Techniques & EDA Requirements", Georgios Konstadinidis, Google, EDPS 2019
- "Improving Cloud Scale Hardware Fault Diagnostics", Neeraj Ladkani, Rama Bhimanadhuni, Microsoft, OCP Virtual Summit, Mar. 2020
- "Automotive semiconductor reliability and its changed importance for complex system engineering", Andreas Aal, Volkswagen, Dr. Oliver Aubel, Globalfoundries, IRPS 2021


## Visibility at Every Stage



**Faster Time to Market** 

**10X Lower DPPM** 

**Reduced Costs** 

**Failure Prevention** 

## **Multi-Pillar Solution**

Deep Data

Machine Learning

Cloud & Edge **Analytics** 











Universal Chip Telemetry™ (UCT) with on-chip Agents

Agent fusion and inference with ML algorithms

Advanced analytics for actionable insights



**Automated** insertion tools







## Universal Chip Telemetry™ (UCT)

#### On-chip Agents built for analytics

- Parametric measurements
- High coverage & high resolution
- Minimal PPA penalty
- Operate in mission-mode
- Sense the surrounding electronics
- Application optimization to HW





Interconnect Performance Monitoring



Operational Monitoring



Performance and Degradation Monitoring



Classification and Profiling

## Introducing Proteus™ Platform

Flexible, easy-to-use and deep lifecycle analytics for advanced electronics

Cloud based, enterprise grade

Built-in targeted solutions

Open for innovation

Open for integration

Actionable at edge

Cloud agnostic technologies | Secure and robust | Scalable for big data

Machine learning | Actionable insights | Dashboards and alerts

SDK enabled | DIY ML Ops development | Customizable data delivery

Multiple data sources | Automated feedback deployment | API integration

Insights on edge devices | Near real time | Minimal footprint

## **Proteus Targeted Analytics**

#### Chip NPI



Post- to pre-silicon correlation



Correlation between value chain stages



Performance Tuning



Degradation monitoring in qualification

#### **Chip Production**



DPPM Reduction (Fine grain latent defects screening)



Power reduction



Test time reduction



Lower RMAs and Fast Time-to-Resolution

## **Proteus Targeted Analytics**

#### System NPI & Production



Performance Optimization (HW-SW)



Quality: Chip as system monitor



System performance early predictions for TTR



Debug & root cause analysis

#### In-Field



**Predictive Maintenance** 



Continuous performance monitoring



Real time applications (Power, Performance & Reliability)



RMA Reduction with Fast Time-to-Resolution

## **Use Cases**

## **Estimator Based Outlier Detection**

- Personalized chip assessment
- Multi-dimensional outlier detection
- Yield reclamation
- Fast RMA with pinpoint Root Cause Analysis
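
One way to picture estimator-based outlier detection is sketched below. It assumes (this is not stated in the presentation) that a chip's measured parameter is compared with a value estimated from another on-chip measurement, and that chips with an unusually large residual are flagged. The linear estimator, the robust z-score threshold, the names `speed_index` and `leakage_ma`, and all data values are illustrative assumptions, not the proteanTecs method.

```python
from statistics import mean, median

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx

def residual_outliers(xs, ys, threshold=3.5):
    """Return indices of chips whose measurement deviates from its estimate.

    The estimate for each chip is the fitted line evaluated at xs[i].
    Residuals are scored with a robust z-score (median and MAD) so that
    the outliers themselves do not inflate the spread.
    """
    a, b = fit_line(xs, ys)
    residuals = [y - (a * x + b) for x, y in zip(xs, ys)]
    med = median(residuals)
    mad = median([abs(r - med) for r in residuals])
    if mad == 0:
        return []  # no spread at all: nothing can be ranked as an outlier
    return [i for i, r in enumerate(residuals)
            if 0.6745 * abs(r - med) / mad > threshold]

# Hypothetical data: one reading per chip (made-up units and values).
speed_index = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
leakage_ma = [3.1, 4.9, 7.05, 8.95, 11.1, 12.9, 18.0, 17.05, 18.95, 21.0]

print(residual_outliers(speed_index, leakage_ma))  # → [6]
```

Chip 6 is within the overall population range for leakage, so a single fixed limit would pass it; it is flagged only because it is far from what its own `speed_index` predicts.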



## **Correlation Across Stages**

#### Accurate correlation & validation between test stages: setup and test stress

Optimization of test conditions across all test stages (WS, FT, SLT, ST)

- Test limits and guard-bands, test equipment validation, etc.
- Accurate characterization of operational, environmental and stress effects
- ATPG structural test tuning correlated to the real application at system level
  - Common data 'language' between ATE and system test
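
As a rough illustration of stage-to-stage correlation and guard-banding, the sketch below computes the Pearson correlation between the same parameter measured at two stages and translates a final-test lower limit into a guard-banded wafer-sort limit. The guard-band rule (mean shift plus `k` standard deviations), the function names, and the data are assumptions made for illustration; the presentation does not specify how limits are derived.

```python
from math import sqrt
from statistics import mean, stdev

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return num / den

def guard_banded_ws_limit(ws, ft, ft_lower_limit, k=3.0):
    """Translate a final-test lower limit into a wafer-sort lower limit.

    delta = FT - WS per die. A die passes FT if WS + delta >= ft_lower_limit,
    so the WS limit is raised by the worst expected shift (mean - k*sigma).
    """
    deltas = [f - w for w, f in zip(ws, ft)]
    return ft_lower_limit - mean(deltas) + k * stdev(deltas)

# Hypothetical per-die readings of one parameter at wafer sort and final test.
ws_reading = [10.0, 11.2, 11.9, 13.1, 14.0]
ft_reading = [9.6, 10.6, 11.5, 12.5, 13.6]

print(f"correlation: {pearson(ws_reading, ft_reading):.3f}")
print(f"WS limit:    {guard_banded_ws_limit(ws_reading, ft_reading, 10.0):.2f}")
```

A high correlation suggests the two stages are measuring the same underlying behavior; a low one points to a setup or stress difference worth investigating before any limit is transferred.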



## Operational Savings: Vddmin Predictor at HVM

- Early & accurate binning assessment for Final Test or System @ Wafer Sort: cost reduction, operational efficiencies
- Test time reduction: avoid doing Vddmin search in Wafer Sort and/or Final Test



- Models built in proteanTecs Platform and deployed at the test floor
- Multiple Vddmin predictors @Freq (or many Freqs) for different tests
- Similar application for Fmax prediction @ V (or different Vs)
- Predictions for different tests and test stages
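
The flow above (train a model offline, then predict on the test floor instead of searching) can be sketched as follows. This is a minimal illustration that assumes a single hypothetical Agent reading and a linear relationship to Vddmin; the class name, the guard band, and the numbers are invented for the example and do not describe the actual proteanTecs models.

```python
from statistics import mean

class VddminPredictor:
    """Linear model: Vddmin (mV) ~ slope * agent_reading + intercept."""

    def __init__(self, slope, intercept, guard_band_mv=10.0):
        self.slope = slope
        self.intercept = intercept
        self.guard_band_mv = guard_band_mv

    @classmethod
    def train(cls, agent_readings, measured_vddmin_mv, guard_band_mv=10.0):
        """Least-squares fit on dies where a full Vddmin search was run."""
        mx, my = mean(agent_readings), mean(measured_vddmin_mv)
        sxx = sum((x - mx) ** 2 for x in agent_readings)
        sxy = sum((x - mx) * (y - my)
                  for x, y in zip(agent_readings, measured_vddmin_mv))
        slope = sxy / sxx
        return cls(slope, my - slope * mx, guard_band_mv)

    def predict_mv(self, agent_reading):
        """Predicted Vddmin plus guard band, used in place of a search."""
        return (self.slope * agent_reading + self.intercept
                + self.guard_band_mv)

# Hypothetical training set: (agent reading, searched Vddmin in mV).
model = VddminPredictor.train([1.0, 2.0, 3.0, 4.0],
                              [700.0, 680.0, 660.0, 640.0])
print(model.predict_mv(2.5))
```

One predictor instance per frequency (or per test) would mirror the "multiple Vddmin predictors" bullet; the same structure applies to Fmax prediction at a given voltage by swapping the target variable.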




### **Characterization and Test for D2D interfaces**

Comprehensive parametric lane grading

Go beyond just Pass/Fail testing

- 100% lane coverage
- During test and in-mission
- Data analytics capabilities
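
To make "beyond Pass/Fail" concrete, the sketch below grades every lane from a parametric margin value instead of a single pass/fail bit. The margin unit, the two thresholds, the grade names, and the lane identifiers are hypothetical; the presentation does not define the grading scheme.

```python
def grade_lane(margin_ps, fail_below=5.0, marginal_below=10.0):
    """Map a measured lane margin (hypothetical unit: ps) to a grade."""
    if margin_ps < fail_below:
        return "fail"
    if margin_ps < marginal_below:
        return "marginal"
    return "good"

def grade_lanes(margins_ps):
    """Grade every lane; `margins_ps` maps lane name -> measured margin."""
    return {lane: grade_lane(m) for lane, m in margins_ps.items()}

def weakest_lane(margins_ps):
    """Name of the lane with the smallest margin (candidate for repair)."""
    return min(margins_ps, key=margins_ps.get)

# Hypothetical margins for three lanes of a die-to-die interface.
measured = {"lane0": 12.0, "lane1": 7.5, "lane2": 3.0}
print(grade_lanes(measured))
print(weakest_lane(measured))
```

A pass/fail test would report `lane0` and `lane1` identically; keeping the parametric value exposes that `lane1` passes with little margin, which is the kind of information that can be tracked again in-mission.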









Complementary to proteanTecs comprehensive chip performance and health monitoring solutions

## proteanTecs at the Edge:

#### Models deployed for wafer sort or final product

- A subset of the proteanTecs capabilities, called the proteanTecs Library, can be run at the test floor for low latency, "real time" decisions and insight
- A proteanTecs function library is called from within the test program. This function library is integrated as a *dll* as part of the test release





## **System Performance Monitoring**

Agent data recorded throughout multiple phases of system operation:

- Configure
- Operation Start 1
- Operation End 1
- Idle 1
- Idle 2
- Operation Start 2
- Operation End 2
- Idle 3
- Idle 4

Correlating and merging different Agent measurements:

- Margins of millions of paths
- IR drop
- Workload/stress: V\*T\*Toggle rate
- Cycle to cycle clock jitter
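
The workload/stress product named on this slide (V\*T\*Toggle rate) can be evaluated per recorded phase as in the sketch below. Only the product itself comes from the slide; the `PhaseSample` structure, the units (volts, degrees Celsius, toggle rate as a 0 to 1 fraction), and the sample values are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class PhaseSample:
    phase: str            # e.g. "Operation Start 1", "Idle 1"
    voltage_v: float      # supply voltage (assumed unit: V)
    temperature_c: float  # die temperature (assumed unit: deg C)
    toggle_rate: float    # switching activity (assumed: fraction 0..1)

def stress_index(sample):
    """Workload/stress figure from the slide: V * T * toggle rate."""
    return sample.voltage_v * sample.temperature_c * sample.toggle_rate

def stress_by_phase(samples):
    """Return {phase name: stress index} for a recorded run."""
    return {s.phase: stress_index(s) for s in samples}

# Hypothetical readings for three of the recorded phases.
run = [
    PhaseSample("Operation Start 1", 0.8, 50.0, 0.5),
    PhaseSample("Operation End 1", 0.8, 70.0, 0.5),
    PhaseSample("Idle 1", 0.75, 45.0, 0.05),
]
for phase, stress in stress_by_phase(run).items():
    print(f"{phase}: {stress:.2f}")
```

Comparing the index across phases gives a simple way to line up margin or jitter readings against how hard the chip was being driven when they were taken.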



Bring up & Production EOL software optimization and performance tuning

## Health Monitoring with Alerts on Faults Before Failure

- UCT-based continuous performance and health monitoring
- In mission-mode
- High coverage critical path monitoring based on millions of internal paths and low margin alert
- Correlated to system environment and application induced stress
- Workload management for achieving longer product lifetime
- High reliability through personalized Predictive Maintenance
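
A low-margin alert of the kind described above could be approximated as follows. This sketch assumes timing-margin readings logged against operating hours and a simple straight-line extrapolation between the first and last reading; the threshold, units, and function name are hypothetical, and a production system would likely use a more robust degradation model.

```python
def hours_until_low_margin(times_h, margins_ps, low_margin_ps):
    """Estimate operating hours left before margin reaches the alert level.

    Returns 0.0 if the latest reading is already at or below the level,
    None if the margin is not decreasing, otherwise the extrapolated hours.
    """
    if len(times_h) < 2 or len(times_h) != len(margins_ps):
        raise ValueError("need at least two paired readings")
    if margins_ps[-1] <= low_margin_ps:
        return 0.0
    # Average degradation rate over the observed window (ps per hour).
    slope = (margins_ps[-1] - margins_ps[0]) / (times_h[-1] - times_h[0])
    if slope >= 0:
        return None
    return (margins_ps[-1] - low_margin_ps) / (-slope)

# Hypothetical in-field log: margin shrinking by 2 ps every 100 hours.
hours = [0.0, 100.0, 200.0, 300.0]
margin = [50.0, 48.0, 46.0, 44.0]
remaining = hours_until_low_margin(hours, margin, low_margin_ps=40.0)
print(f"estimated hours to low-margin alert: {remaining:.0f}")
```

The estimate gives maintenance or workload management a lead time before the margin is exhausted, which is the sense of "alerts on faults before failure".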





## **Customer Success Stories**

#### **FORTUNE 50 NETWORKING COMPANY**

18% power reduction in chip

IR drop recovery at SLT

"The average voltage improvement is around 6~7 steps and this is significant since it's such a big chip"

#### LEADING FABLESS SEMICONDUCTOR COMPANY

2% yield improvement due to feedback to the fab on design sensitivity to process

Inferred process parameters per die detected sensitivity of yield to process

"The proteanTecs analysis helps us to prove the correctness of external circuit modification and the chip design quality"

#### LEADING ASIC VENDOR

Detection of silicon-to-simulation miscorrelation in 5nm testchip

Post-to-pre parametric correlation

"proteanTecs speeds chip and system bring-up, significantly reducing time-to-market"

#### **FORTUNE 50 NETWORKING COMPANY**

5% system outlier rate detected at System Test

Latent defect detection of thermal effects

"Solid and leverageable data flows"

#### **AUTOMOTIVE STARTUP**

Board redesign due to detection of PDN issue

IR drop detection at SLT

"They made us look like a 15 year semi company"

## **Concluding Thoughts**

Deep Data Analytics is required for KGD

- New level of silicon and system understanding
- Enables shift left for wafer sort
- Provides visibility for interaction of chiplets within package
- Diagnoses interaction of SW with HW
- Methods can be used at every stage from Wafer Sort to in-field

## Thank you.

marc.hutner@proteanTecs.com





## **COPYRIGHT NOTICE**

The presentation in this publication was given at the **Known Good X (KGx) Workshop** (September 7, 2022). The content reflects the opinion of the author(s) and their respective companies. The inclusion of presentations in this publication does not constitute an endorsement by MEPTEC or the sponsors.

There is no copyright protection claimed by this publication. However, each presentation is the work of the authors and their respective companies and may contain copyrighted material. As such, it is strongly encouraged that any use reflect proper acknowledgement to the appropriate source. Any questions regarding the use of any materials presented should be directed to the author(s) or their companies.

www.meptec.org

