Fix FMA4 detection #262

Merged: shibatch merged 1 commit into shibatch:master from colesbury:fma4 on May 16, 2019
@colesbury
Contributor

FMA4 support is reported in bit 16 of register ECX, not EDX, of the
"extended processor info" CPUID leaf (0x80000001).

The mapping of registers to reg is:

reg[0] = eax
reg[1] = ebx
reg[2] = ecx <---
reg[3] = edx

Bit 16 of EDX is PAT (Page Attribute Table) on AMD CPUs, which is widely
supported. Intel CPUs do not set this bit. Checking EDX therefore misreads
PAT as FMA4, which causes "Illegal instruction" errors on AMD CPUs that do
not support FMA4.
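
For illustration only, the difference between the two checks can be sketched as below. This is not the code from the patch: the helper names `has_fma4` and `has_fma4_wrong` and the `REG_*` constants are hypothetical, and the sketch assumes `reg` already holds the four registers returned by CPUID leaf 0x80000001 in the order listed above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Indices into the CPUID result array, following the mapping above.
 * The names are hypothetical and chosen for this sketch. */
enum { REG_EAX = 0, REG_EBX = 1, REG_ECX = 2, REG_EDX = 3 };

/* Assumes reg[] holds the output of CPUID leaf 0x80000001.
 * FMA4 is bit 16 of ECX, i.e. reg[2]. */
static inline bool has_fma4(const uint32_t reg[4]) {
    return ((reg[REG_ECX] >> 16) & 1u) != 0;
}

/* The mistaken variant: bit 16 of EDX (reg[3]) is PAT on AMD CPUs,
 * so this returns true on AMD CPUs whether or not they have FMA4. */
static inline bool has_fma4_wrong(const uint32_t reg[4]) {
    return ((reg[REG_EDX] >> 16) & 1u) != 0;
}
```

With a register set in which only EDX bit 16 (PAT) is set, `has_fma4` returns false while `has_fma4_wrong` returns true, which is the misdetection described above.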

See pytorch/pytorch#12112
See #261

http://developer.amd.com/wordpress/media/2012/10/254811.pdf (Page 20)

Fixes #261

@shibatch shibatch merged commit 939f753 into shibatch:master May 16, 2019