Anomalies by Synthesis:
Anomaly Detection using Generative Diffusion Models for Off-Road Navigation


Sunshine Jiang* (MIT CSAIL), Siddharth Ancha* (MIT CSAIL), Travis Manderson (MIT CSAIL), Laura Brandt (MIT CSAIL), Yilun Du (MIT CSAIL), Philip R. Osteen (US Army Research Lab), Nicholas Roy (MIT CSAIL)

Video (3 min)
Pipeline for anomaly detection using analysis by synthesis

Left to right: In the synthesis step, a trained diffusion model edits the input image to remove anomalous segments without modifying the rest of the image; here, the model blends the out-of-distribution (OOD) vehicle into the dirt in the background. The analysis step then extracts anomalies by comparing the pair of images in CLIP feature space: MaskCLIP computes low-resolution CLIP features for each image, which are upsampled using FeatUp. (In this figure, features are visualized via a t-SNE projection to three dimensions.) Per-pixel cosine distances between the two feature maps produce a raw anomaly map that highlights anomalous objects; in contrast, comparing the images directly in RGB space (far right) is noisy and fails to isolate OOD segments. Finally, SegmentAnything segments the input image, and the resulting segments are used to refine and clean the anomaly map.
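
To make the analysis step concrete, below is a minimal PyTorch sketch of how the raw anomaly map and its segment-based refinement could be computed. The helper names (anomaly_map, refine_with_segments) are hypothetical; the dense features are assumed to be MaskCLIP features already upsampled by FeatUp, and mean-pooling scores within each SegmentAnything mask is only one plausible refinement rule, not necessarily the exact procedure used in this work.

    import torch
    import torch.nn.functional as F

    def anomaly_map(feats_input: torch.Tensor, feats_edited: torch.Tensor) -> torch.Tensor:
        # feats_input, feats_edited: (C, H, W) dense feature maps for the
        # input image and its diffusion-edited counterpart (assumed to come
        # from MaskCLIP, upsampled to pixel resolution by FeatUp).
        # Returns an (H, W) raw anomaly map of per-pixel cosine distances,
        # so pixels whose semantics changed under editing score highly.
        cos = F.cosine_similarity(feats_input, feats_edited, dim=0)  # (H, W)
        return 1.0 - cos

    def refine_with_segments(raw_map: torch.Tensor, masks: list) -> torch.Tensor:
        # masks: boolean (H, W) tensors, e.g. from SegmentAnything on the
        # input image. Pooling the raw scores within each segment (mean
        # here, one plausible choice) suppresses per-pixel noise.
        refined = raw_map.clone()
        for m in masks:
            refined[m] = raw_map[m].mean()
        return refined

    # Usage (features and masks assumed precomputed):
    # raw = anomaly_map(feats_input, feats_edited)
    # refined = refine_with_segments(raw, sam_masks)

The key design point this sketch illustrates is that the comparison happens in semantic feature space rather than RGB space, which is why the raw map isolates OOD segments instead of picking up low-level pixel differences.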
Anomaly detection on land navigation images
Land navigation videos with anomalies removed
Anomaly detection on underwater images
Failure cases
Acknowledgements

This material is based upon work supported by the Army Research Office under Cooperative Agreement No. W911NF-21-2-0150. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Office or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation herein.

