
No, but it's not clear that kernels of other sizes will hit the same bottlenecks seen in the post. It's not really shown one way or the other: is it that the ROCm kernels are just inefficient, or did the author just identify one that wasn't particularly well optimized? And do these opportunities for improvement really mean that the software is "Questionable", or just that you can't really do an equivalent comparison at the ISA level on other vendors' software stacks?

I'm not trying to minimize the work here. It's interesting, and a good example of the lengths you can go to in order to squeeze out that last little bit of performance (and, again, it shows the advantages of public ISA documentation and support for users working at that level). I just took issue with the parent comment seeming to use this work as evidence of a poor baseline.

