I think the article is somewhat over-representing the difficulty here. Once you're at the team selection screen and choosing your lineup, there are only 15 possible combinations to choose from. Once you factor in that many/most teams are designed around one or two specific synergies, and that your opponent's team is only partially known (you see their Pokemon species but not the moves, stat distributions, etc), which puts huge error bars around whatever prediction you're trying to make, it usually turns out that you're really only picking from 1-3 realistic choices, and there's a very paper-scissors-rock nature to it that you can't really "learn" in the ML sense.
I think you could have gotten equivalent results on such a predictor using much simpler regressions and/or heuristics, once you've already fixed the matchup.
(Also, I just think it's funny how the paper keeps citing "(Zheng, 2020)", etc, like it's a scholarly article or something. Aaron Zheng is a VGC YouTuber and what is being cited is just an online guide a la GameFAQs)
The soft prediction metric seems especially ridiculous to me. If I'm not mistaken, just picking at random gets better results than their ML selection at >= 5 predictions (1 - (2/3)^5 ≈ 0.868 > 0.8438).
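A quick sanity check of that arithmetic, as a rough sketch only. It assumes each random guess misses independently with probability 2/3 (the figure used in the comment, not something taken from the paper) and reads the expression as 1 - (2/3)^k. The function name `random_hit_rate` is made up for illustration; 0.8438 is the ML figure quoted above.

```python
def random_hit_rate(k: int, p_miss: float = 2 / 3) -> float:
    """Chance that at least one of k independent random guesses hits,
    assuming each guess misses with probability p_miss (an assumption)."""
    return 1 - p_miss ** k


ML_SOFT_SCORE = 0.8438  # figure quoted in the comment above

for k in range(1, 7):
    rate = random_hit_rate(k)
    marker = "beats" if rate > ML_SOFT_SCORE else "below"
    print(f"k={k}: {rate:.4f} ({marker} {ML_SOFT_SCORE})")
```

Under those assumptions the random baseline first crosses 0.8438 at k = 5, which is what the comment claims.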
However:
> your opponent's team is only partially known (you see their Pokemon species but not the moves, stat distributions, etc)
That's not true in the main competitive live format (e.g. NAIC 2025, which is the main case study here). These tournaments are "open team sheet", i.e. moves, abilities and held items are known (but not IVs/EVs).
I'm not sure whether that's the case on Smogon, though. If it isn't, they might even be mixing two completely different datasets...
> but not IVs/EVs
And even then, these can often be guessed, or inferred from previous battles.
> Once you're at the team selection screen and choosing your lineup, there are only 15 possible combinations to choose from.
Nit: there are 15 possible lineups (i.e. combinations of 2 Pokemon to start the battle with), but there are 90 possible teams if you also factor in the other 2 Pokemon in the back (15 lead pairs x 6 ways to pick 2 of the remaining 4).
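For anyone who wants to check the 15 vs. 90 count, here's a small enumeration sketch. It assumes a team of 6 from which you bring 4 (2 leads plus 2 in the back), with order within each pair not mattering; the placeholder names are hypothetical.

```python
from itertools import combinations

# Placeholder team of six; the names are hypothetical.
team = ["A", "B", "C", "D", "E", "F"]

# Every unordered pair of leads.
leads = list(combinations(team, 2))

# Every (lead pair, back pair) selection: the back pair is drawn
# from the four Pokemon not chosen as leads.
selections = [
    (lead, back)
    for lead in leads
    for back in combinations([p for p in team if p not in lead], 2)
]

print(len(leads))       # number of lead pairs
print(len(selections))  # number of full bring-4 selections
```

If you only count which 4 you bring and ignore who leads, that's C(6,4) = 15 again, so the 90 comes entirely from splitting each group of 4 into leads and back.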
Most of my experience is with pre team preview singles (where there was an entirely different meta of blindly choosing a lead that would match up favorably against the set of other common leads), but my understanding was that VGC has a handful of Pokemon (Smeargle...) with a P_lead/P_bring ratio of 1.