@poleguy The paper I linked downthread tries several approaches! It does not look like a description of the language is good enough for current models. An interactive session does help somewhat in their experiment.

I would like to see a non-esolang example, although the effect seems very strong in the esolang case.

While (b) is certainly a questionable assumption, I think it is currently borne out by this and similar results -- the models perform well when their training sets contain relevant material, rather than by developing transferable knowledge.