3 & 4: Use LLMs to generate semantic labels and visual features.

🦭 Interface showing high-error groups for DistilBERT evaluated on the Yelp dataset: (a) examples from the dataset in the high-error groups, sorted by loss; (b) tokens in the high-error groups relative to the entire evaluation set; (c) 2D visualization of model embeddings, with error groups shown in color.

Paper: https://arxiv.org/abs/2210.05839
Space: https://huggingface.co/spaces/nazneen/seal
Screencast: https://vimeo.com/736659216
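
Panels (a) and (b) of the interface rest on two simple ideas: rank examples by loss, then compare token frequencies in the high-error subset against the whole evaluation set. The following is a minimal sketch of those two ideas only. It assumes whitespace tokenization, a fixed top fraction of examples as the "high-error" subset, and a plain frequency ratio; the function names `high_error_examples` and `token_lift` are hypothetical and are not taken from the tool, which may compute these views differently.

```python
from collections import Counter


def high_error_examples(texts, losses, top_fraction=0.25):
    """Return the texts with the highest loss, sorted by loss (descending).

    Assumption: "high-error" means the top `top_fraction` of examples by loss.
    """
    ranked = sorted(zip(losses, texts), key=lambda pair: pair[0], reverse=True)
    k = max(1, int(len(ranked) * top_fraction))
    return [text for _, text in ranked[:k]]


def token_lift(high_error_texts, all_texts):
    """Relative frequency of each token in the high-error subset divided by
    its relative frequency in the whole evaluation set.

    Assumption: lowercase whitespace tokenization. A value above 1 means the
    token is over-represented among high-error examples.
    """
    group = Counter(tok for t in high_error_texts for tok in t.lower().split())
    overall = Counter(tok for t in all_texts for tok in t.lower().split())
    group_total = sum(group.values())
    overall_total = sum(overall.values())

    lifts = {}
    for tok, count in group.items():
        if overall[tok] == 0:
            # Only possible if the subset is not drawn from all_texts; skip.
            continue
        lifts[tok] = (count / group_total) / (overall[tok] / overall_total)
    return lifts


if __name__ == "__main__":
    # Toy data for illustration only; not drawn from the Yelp dataset.
    texts = ["good food", "bad service", "good service", "bad bad food"]
    losses = [0.1, 2.0, 0.2, 3.0]
    subset = high_error_examples(texts, losses, top_fraction=0.5)
    for tok, lift in sorted(token_lift(subset, texts).items(), key=lambda p: -p[1]):
        print(f"{tok}\t{lift:.2f}")
```

A ratio is used here because it is easy to read; with small subsets it is noisy, so a minimum-count filter or smoothing would be a reasonable addition.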