This is a reference for the situation:
https://www.tomshardware.com/news/intel-nukes-alder-lake-avx-512-now-fuses-it-off-in-silicon
Basically, some Alder Lake chips had the instruction set available. I would like to check whether my computer has the option. It is not exposed in the bootloader. Maybe there is a way through KeyTool, or some other way? I don’t know whether I would need to set CPU affinity, or whether CFS has a way to detect software that uses these instructions and keep it running on the P cores.
I’m mainly looking to enable this for the largest LLMs I am running. llama.cpp already looks for this instruction set and its subsets.
Speculatively, I think it has more to do with the CPU schedulers in kernels. Junk like Windows is super inflexible when it comes to the kernel. Windows 11 had to be made just because of the Intel asymmetric cores; changing the scheduler in Windows 10 just wasn’t viable. Having Alder Lake behave differently likely causes issues because Microsoft is so inflexible with their tiny OS.
I don’t know exactly how Linux handles scheduling for Intel. I’ve watched most of the 2021 Linux Plumbers Conference talks about the scheduler changes, but I don’t recall details about this.
People saying that all it takes is disabling the efficiency cores hints that it is a scheduler issue, though. I remember someone mentioning that the way cores get spun up, and the time it takes them to reach full speed, is complicated and requires management overhead. So I’m not sure the scheduler is the only factor, but I think it is likely the primary one.
I’m hoping someone comes along who can identify the mechanism that detects the available instructions, and how this can be tested or altered in practice.
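For the "how can this be tested" part, here is a minimal sketch in Python, assuming a Linux x86 system where `/proc/cpuinfo` lists a `flags` line for each logical CPU. The function name `avx512_flags_by_cpu` is made up for this example. It only shows what the kernel reports; if the firmware or microcode has masked AVX-512, or it is fused off in silicon, no `avx512*` flags will appear, and this script cannot change that.

```python
from pathlib import Path


def avx512_flags_by_cpu(cpuinfo_text: str) -> dict[int, set[str]]:
    """Map each logical CPU number to the avx512* flags it reports."""
    result: dict[int, set[str]] = {}
    cpu = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator lines between CPU entries
        key = key.strip()
        if key == "processor":
            cpu = int(value)
            result[cpu] = set()
        elif key == "flags" and cpu is not None:
            # Keep only the AVX-512 family: avx512f, avx512_vnni, and so on.
            result[cpu] = {f for f in value.split() if f.startswith("avx512")}
    return result


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")  # assumption: Linux on x86
    if path.exists():
        report = avx512_flags_by_cpu(path.read_text())
        for cpu, flags in sorted(report.items()):
            print(cpu, " ".join(sorted(flags)) or "no avx512 flags")
```

If every CPU prints "no avx512 flags", the instructions are not visible to the OS in the current configuration, so llama.cpp would not be able to use them either.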
What does the AVX-512 instruction set have to do with scheduling?
It isn’t a direct line to connect, sorry if I am bad at explaining it. The mix of P and E cores is the issue. The P cores are Xeon designs that (may) have the extra instructions. The E cores do not. If simply turning off the E cores has made AVX-512 show up on the P cores on some systems, I imagine it may have to do with the scheduler. I could be wrong.
The CPU scheduler would need some kind of management function that could bind a process to a core with the extra AVX instructions without manual intervention. This would need to override core availability, kernel threads, and things like power efficiency or spin-up optimisation. I haven’t taken a super deep dive into how the scheduler works on a 12th gen chip. I can say it appears to pin processes more, but I still see a regular rotation of most running processes across cores when they have no affinity or isolation settings, for example when I am running a large LLM.
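The manual version of that binding can be sketched in Python. This is only an illustration under assumptions: it is Linux-only, and it assumes a hybrid Intel system on a recent kernel that exposes the P-core list at `/sys/devices/cpu_core/cpus` (check that the file exists on your machine). The helper names `parse_cpu_list` and `pin_to_p_cores` are made up for this example.

```python
import os
from pathlib import Path


def parse_cpu_list(text: str) -> set[int]:
    """Parse a kernel CPU list such as "0-7,16" into a set of CPU numbers."""
    cpus: set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue
        first, _, last = part.partition("-")
        # A single number like "16" has no "-", so it is a range of one.
        cpus.update(range(int(first), int(last or first) + 1))
    return cpus


def pin_to_p_cores(pid: int = 0) -> set[int]:
    """Restrict a process (0 means the caller) to P cores; return the set used."""
    # Assumption: this sysfs file is present on hybrid Intel systems.
    p_cores = parse_cpu_list(Path("/sys/devices/cpu_core/cpus").read_text())
    # Only keep CPUs the process is currently allowed to run on.
    allowed = p_cores & os.sched_getaffinity(pid)
    if not allowed:
        raise RuntimeError("no P cores available to this process")
    os.sched_setaffinity(pid, allowed)
    return allowed
```

Calling `pin_to_p_cores()` before starting a workload keeps that process and the children it spawns afterwards off the E cores. From a shell, `taskset -c <cpu-list> <command>` does the same thing. Neither approach makes AVX-512 appear if the CPU does not report it; affinity only controls where the process runs.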