Potential Solutions When Encountering the NaN Gradient Problem with PyTorch

When working with PyTorch, NaN gradients are a common problem. Here are some potential solutions that might work; short code sketches for each step follow the list:
  1. First, make sure the inputs and the loss do not contain Inf or NaN values (e.g., by printing them or checking with torch.isnan / torch.isinf); a quick check is sketched after this list.

  2. Make sure there is no division by zero anywhere in the computational graph. Also check operations like x.sqrt() or x.pow(): when the values involved are very small (or zero), their gradients can blow up, so add a small epsilon (e.g., 1e-8) in that case, as shown in the sketch below.

  3. Sometimes the problem is caused by low numerical precision: for example, if the tensors involved in the computation are torch.float16, try casting them to float32 with tensor.to(torch.float32). This can reduce numerical instability and potentially resolve the issue, though at the cost of more memory and compute (see the short sketch below).

  4. Make sure the learning rate is not too large. Also try gradient clipping to prevent gradients from becoming too large; a training-loop sketch follows the list.

  5. As a starting point for debugging, calling torch.autograd.set_detect_anomaly(True) may help: it makes the backward pass raise an error at the operation that produced the NaN (last sketch below).
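
For step 1, a minimal sketch of such a check; the helper name check_finite and the tensors it is applied to are hypothetical:

```python
import torch

def check_finite(name, tensor):
    # Report whether the tensor contains any NaN or Inf entries.
    if torch.isnan(tensor).any() or torch.isinf(tensor).any():
        print(f"{name} contains NaN or Inf values")

# Inside a training step (inputs and loss are whatever your code produces):
# check_finite("inputs", inputs)
# check_finite("loss", loss)
```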
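
For step 2, a sketch of guarding sqrt and division with a small epsilon; the values are made up for illustration:

```python
import torch

eps = 1e-8
x = torch.tensor([0.0, 1e-12, 4.0], requires_grad=True)

# The gradient of sqrt blows up near zero; clamping to eps keeps it finite.
safe_sqrt = x.clamp(min=eps).sqrt()

# Likewise, guard divisions against zero denominators.
denom = torch.tensor([0.0, 2.0, 4.0])
safe_div = x / (denom + eps)
```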
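
For step 3, the cast itself is a one-liner; a tiny sketch:

```python
import torch

x = torch.randn(8, 8, dtype=torch.float16)

# Cast to float32 before numerically sensitive operations to reduce overflow/underflow.
x32 = x.to(torch.float32)
norm = (x32 * x32).sum().sqrt()
```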
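
For step 4, a sketch of where gradient clipping fits in a training step; the tiny model, data, and learning rate here are placeholders:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)  # keep the learning rate modest
inputs, targets = torch.randn(8, 10), torch.randn(8, 1)

optimizer.zero_grad()
loss = nn.functional.mse_loss(model(inputs), targets)
loss.backward()

# Clip the gradient norm before the optimizer step so one bad batch cannot blow it up.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```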
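
For step 5, a sketch of how anomaly detection surfaces the offending operation; the toy computation is chosen only because its backward pass produces a NaN:

```python
import torch

torch.autograd.set_detect_anomaly(True)  # enable before running forward/backward

x = torch.zeros(1, requires_grad=True)
y = x * x.sqrt()  # the gradient of sqrt at 0 is infinite, and 0 * inf gives NaN in backward
y.backward()      # with anomaly mode on, this raises an error pointing at the sqrt op
```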