Self-Supervised Learning for Lithography

masked autoencoders for hotspot detection

March - June 2026

PyTorch · Self-supervised learning

read the paper → · see the poster → · view the code →

Lithography hotspots are small regions in chip layouts that are likely to print incorrectly during semiconductor manufacturing, creating defects that can hurt yield or require expensive redesigns. While supervised machine learning methods can detect known hotspot patterns, they often fail on truly never-before-seen layouts from newer chip process nodes.

With two teammates, I built a self-supervised hotspot detection system based on masked autoencoders (MAEs). We pretrained MAEs with Vision Transformer (ViT) and CNN (ResNet-18) backbones on unlabeled binary chip layout patches from the ICCAD 2012 benchmark, masking 75% of each image and training the models to reconstruct the missing geometry. At test time, we used reconstruction error as an anomaly score: layouts that the model reconstructed poorly were more likely to be hotspots.
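The mask-reconstruct-score loop can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the project's PyTorch code: the 64×64 patch size, the 8×8 grid of 8-pixel patches, the helper names (`random_patch_mask`, `to_pixel_mask`, `masked_reconstruction_error`), and scoring by mean squared error over masked pixels only are all hypothetical choices, and a constant 0.5 array stands in for a trained model's reconstruction.

```python
import numpy as np


def random_patch_mask(grid_h, grid_w, mask_ratio=0.75, rng=None):
    """Boolean (grid_h, grid_w) array where True marks a hidden patch.

    round(mask_ratio * num_patches) patches are chosen uniformly at random,
    following the MAE-style random-masking recipe.
    """
    rng = np.random.default_rng() if rng is None else rng
    num_patches = grid_h * grid_w
    num_masked = int(round(mask_ratio * num_patches))
    mask = np.zeros(num_patches, dtype=bool)
    mask[rng.permutation(num_patches)[:num_masked]] = True
    return mask.reshape(grid_h, grid_w)


def to_pixel_mask(patch_mask, patch_size):
    """Upsample a patch-level mask to pixel resolution."""
    return np.repeat(np.repeat(patch_mask, patch_size, axis=0), patch_size, axis=1)


def masked_reconstruction_error(image, reconstruction, pixel_mask):
    """Mean squared error over masked pixels only (assumed anomaly score)."""
    sq_err = (image.astype(np.float64) - reconstruction.astype(np.float64)) ** 2
    return float(sq_err[pixel_mask].mean())


rng = np.random.default_rng(0)
layout = (rng.random((64, 64)) > 0.5).astype(np.float32)  # stand-in binary layout patch
patch_mask = random_patch_mask(8, 8, mask_ratio=0.75, rng=rng)  # 48 of 64 patches hidden
pixel_mask = to_pixel_mask(patch_mask, patch_size=8)
# Zero-filled input, as a CNN MAE might see it; a ViT MAE would drop these tokens.
masked_input = np.where(pixel_mask, 0.0, layout)

# Hypothetical placeholder for a trained MAE's output: a constant 0.5 guess.
reconstruction = np.full_like(layout, 0.5)
score = masked_reconstruction_error(layout, reconstruction, pixel_mask)
```

With a real model, layouts would be ranked by `score`, with higher values flagging likely hotspots. Scoring only the masked region mirrors the standard MAE training loss; scoring the whole image is an equally plausible variant.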

Pretraining and inference pipeline: mask 75% of a layout, reconstruct it, and score it by how badly the reconstruction fails

Original layout · masked input · reconstruction · error map, for hotspot and non-hotspot examples from the truly never-seen-before layouts

We evaluated across three levels of distribution shift: in-distribution ICCAD 2012 layouts, mildly out-of-distribution ICCAD 2019 layouts, and the ICCAD 2019 “truly never seen before” (TNSB) benchmark. The supervised ResNet-18 baseline achieved 0.989 AUROC in-distribution but dropped to 0.456 AUROC on TNSB, below the 0.5 expected from random guessing, showing how brittle supervised detectors can be on novel geometries.
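AUROC is the probability that a randomly chosen hotspot receives a higher score than a randomly chosen non-hotspot, with ties counted as half. That makes 0.5 chance level, and a value below it means hotspots were ranked below non-hotspots slightly more often than above. The pairwise (Mann-Whitney) sketch below makes the definition concrete; the `auroc` helper and the toy scores are hypothetical, and a library routine such as scikit-learn's `roc_auc_score` computes the same quantity far more efficiently on real data.

```python
def auroc(scores, labels):
    """AUROC as P(random positive outscores random negative); ties count 0.5.

    O(P * N) pairwise version, for illustration only.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy anomaly scores for two hotspots (label 1) and two non-hotspots (label 0).
perfect = auroc([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0])      # 1.0: every hotspot ranked higher
mixed = auroc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0])        # 0.75: 3 of 4 pairs ordered correctly
flipped = auroc([-0.9, -0.4, -0.6, -0.2], [1, 1, 0, 0])  # 0.25: negating scores gives 1 - 0.75
chance = auroc([0.5, 0.5], [1, 0])                       # 0.5: a tie counts as a coin flip
```

The `flipped` case shows that negating a score maps AUROC `a` to `1 - a`, which is why the direction of the anomaly score (high reconstruction error means "hotspot") has to be fixed before evaluation.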

ViT MAE reconstruction-error AUROC across ICCAD 2012, ICCAD 2019, and TNSB, as distribution shift increases (left to right). Each curve is a different pretraining setup: trained on all layouts vs. on non-hotspots only (NHS), at 25% or 75% masking.

In contrast, our best MAE reconstruction model reached 0.870 AUROC on TNSB, outperforming the supervised baseline by over 0.4 AUROC. These results suggest that self-supervised reconstruction can learn geometric layout priors that transfer better to new chip nodes, even without labeled hotspot examples from the target dataset.

To check what the pretrained encoder had actually learned, we also ran a linear probe: we froze the ViT MAE features and trained only a linear classifier on top. The probe exceeded 0.99 validation AUROC in-distribution, indicating that the self-supervised features carry a strong, linearly separable signal about layout structure rather than just reconstruction quirks.
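A linear probe fits only a linear classifier on fixed features. The sketch below assumes a logistic-regression probe trained with full-batch gradient descent in NumPy; the project's actual classifier, optimizer, and feature pooling are not specified here. `make_fake_features` is a hypothetical stand-in that generates synthetic 16-dimensional "embeddings" instead of real frozen ViT MAE outputs, and the sketch reports held-out accuracy for brevity rather than the AUROC used above.

```python
import numpy as np


def train_linear_probe(features, labels, lr=0.1, epochs=1000, weight_decay=1e-4):
    """Logistic-regression probe fit with full-batch gradient descent.

    Only the weight vector `w` and bias `b` are learned; the encoder that
    produced `features` is assumed frozen and is never updated here.
    """
    n, d = features.shape
    w = np.zeros(d)
    b = 0.0
    y = labels.astype(np.float64)
    for _ in range(epochs):
        logits = np.clip(features @ w + b, -30.0, 30.0)  # avoid exp overflow
        probs = 1.0 / (1.0 + np.exp(-logits))
        grad = probs - y  # per-example gradient of binary cross-entropy w.r.t. the logit
        w -= lr * (features.T @ grad / n + weight_decay * w)
        b -= lr * float(grad.mean())
    return w, b


def probe_predict(features, w, b):
    """Hard 0/1 predictions from the probe's logits (threshold at 0)."""
    return (features @ w + b > 0).astype(int)


def make_fake_features(rng, n_per_class, dim=16, shift=4.0):
    """Hypothetical stand-in for pooled, frozen ViT MAE embeddings."""
    x0 = rng.normal(size=(n_per_class, dim))
    x1 = rng.normal(size=(n_per_class, dim))
    x1[:, 0] += shift  # synthetic "hotspots" differ along one feature direction
    x = np.vstack([x0, x1])
    y = np.concatenate([np.zeros(n_per_class, dtype=int), np.ones(n_per_class, dtype=int)])
    return x, y


rng = np.random.default_rng(0)
train_x, train_y = make_fake_features(rng, 200)
test_x, test_y = make_fake_features(rng, 100)
w, b = train_linear_probe(train_x, train_y)
test_acc = float((probe_predict(test_x, w, b) == test_y).mean())
```

Because the encoder never receives gradients, a high probe score can only come from information already present in the frozen features, which is what makes the probe a test of what pretraining learned. In PyTorch the same idea is typically an `nn.Linear` head trained while the encoder's parameters have `requires_grad=False`.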

Linear probe on frozen ViT MAE features (pretrained on non-hotspots only), reaching >0.99 in-distribution AUROC
