I'm talking specifically about the variant of RL used by systems like AlphaGo.