I mean, you could. Just encode 100ms chunks or whatever into tokens then push them through the same model. I'm pretty sure that's what the claim to do (though with MoE/routing now, maybe).