OSSP-PTA: An Online Stochastic Stepping Policy for PTA on Reinforcement Learning

  • Dan Niu
  • Yichao Dong
  • Zhou Jin
  • Chuan Zhang
  • Qi Li
  • Changyin Sun

Research output: Contribution to journal › Article › peer-review

18 Scopus citations

Abstract

DC analysis is essential and still quite challenging in large-scale nonlinear circuit simulation. Pseudo transient analysis (PTA) is a widely used solver with great potential in industry. However, PTA convergence and simulation efficiency are still seriously affected by its stepping policy. This article proposes an online stochastic stepping policy (OSSP) for PTA based on deep reinforcement learning (DRL). To achieve better policy evaluation and stronger stepping exploration, dual soft actor-critic agents work with the proposed valuation splitting and online momental scaling, enabling OSSP to intelligently encode the PTA iteration status and further adjust the forward and backward time-step sizes online for unseen test circuits, without human intervention or domain knowledge, trained solely by reinforcement learning from self-search. A public sample buffer and priority sampling are also introduced to overcome the sparsity and imbalance of the sample data. Numerical examples demonstrate that, compared with previous iteration-based and switched evolution/relaxation-based stepping methods, the proposed OSSP achieves a significant efficiency speedup (up to 47.0× fewer Newton-Raphson iterations) and convergence enhancement on unseen test circuits, in just one stepping iteration.
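To make the stepping idea concrete, below is a minimal, hypothetical sketch of a PTA loop whose forward/backward step-size adjustment is delegated to a learned policy, in the spirit of the abstract. It is not the paper's implementation: the names `newton_raphson_solve`, `encode_status`, and `policy` are illustrative placeholders, and the stub policy stands in for the trained dual soft actor-critic agents.

```python
# Hypothetical sketch of an RL-driven PTA stepping loop (not the paper's
# actual method). A real agent would map the encoded state to a
# continuous step-scaling action; here we sample a plausible scale.
import random

def encode_status(nr_iters, converged, h):
    """Encode the PTA iteration status into a small state tuple."""
    return (nr_iters, int(converged), h)

def policy(state):
    """Placeholder for the trained agents: return a step-size scale.
    Grow the step after a converged solve, shrink it after a failure."""
    nr_iters, converged, _ = state
    return random.uniform(1.2, 2.0) if converged else random.uniform(0.2, 0.6)

def pta_with_rl_stepping(newton_raphson_solve, x0, t_end, h0=1e-6):
    """Pseudo transient analysis where the forward/backward time-step
    size is adjusted online by a policy instead of a fixed rule."""
    t, h, x = 0.0, h0, x0
    while t < t_end:
        converged, nr_iters, x_new = newton_raphson_solve(x, h)
        scale = policy(encode_status(nr_iters, converged, h))
        if converged:
            x, t = x_new, t + h  # accept the pseudo-time step
            h *= scale           # forward step: enlarge and continue
        else:
            h *= scale           # backward step: shrink and retry
    return x

if __name__ == "__main__":
    def fake_nr(x, h):
        # Toy stand-in solver: converges whenever the step is small enough.
        return (h < 1e-3, 5, x + h)
    print(pta_with_rl_stepping(fake_nr, x0=0.0, t_end=1e-2))
```

The stub already shows the design choice the abstract highlights: the same policy decides both the forward enlargement after a successful Newton-Raphson solve and the backward shrinkage after a failed one, rather than relying on hand-tuned heuristics.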

Original language: English
Pages (from-to): 4310-4323
Number of pages: 14
Journal: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Volume: 42
Issue number: 11
DOIs
State: Published - 1 Nov 2023
Externally published: Yes

Keywords

  • DC analysis
  • momental scaling
  • pseudo transient analysis (PTA)
  • stochastic stepping
  • valuation splitting
