Development:InstCountCI
Latest revision as of 03:43, 30 January 2024
InstCountCI is a continuous integration tool that FEX-Emu uses to ensure that instruction implementations aren't getting worse over time.
Getting Started
Make sure to follow Development:Setting_up_FEX to get an initial build environment set up.
What you need
- An Arm64 Linux device that can build FEX
  - An x86-64 device using the VIXL simulator can be used as a substitute.
Additional cmake options
Some additional cmake options need to be passed to the FEX-Emu cmake options to get the tests building.
- -DBUILD_TESTS=True
- -DENABLE_VIXL_DISASSEMBLER=True
Quality of life improvements
Add these cmake options to make iteration time faster and have debug assertions to catch problems.
- -DENABLE_LTO=False
- -DCMAKE_BUILD_TYPE=RelWithDebInfo
- -DENABLE_ASSERTIONS=True
Running InstCountCI
First, you need to build the tests. This step will parse all the json files inside of unittests/InstructionCountCI/ and set up running CI in the next step.
- ninja instcountci_test_files
Next you need to actually run the tests. This will run all the instructions declared in unittests/InstructionCountCI/*.json. It is okay if this step fails: a failure means either that an instruction translation has gotten worse or, if the test crashed, that something catastrophic happened.
- ninja instcountci_tests
The next step is to take the data generated from the previous step and modify the resulting json that is tracked by git.
- ninja instcountci_update_tests
Now to see how the implementations have changed, you can just run git diff to see how the json files in unittests/InstructionCountCI/ have changed.
Example
A minor improvement from optimizing a move instruction, taken from this pull request: https://github.com/FEX-Emu/FEX/pull/2972

     "movd mm0, eax": {
    -  "ExpectedInstructionCount": 3,
    +  "ExpectedInstructionCount": 2,
       "Comment": "0x0f 0x6e",
       "ExpectedArm64ASM": [
    -    "ubfx x20, x4, #0, #32",
    -    "fmov s4, w20",
    +    "fmov s4, w4",
         "str d4, [x28, #752]"
       ]
     },

- Reset the files with git checkout -- unittests/InstructionCountCI/*.json if the changes weren't desired.
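When a change touches many instructions, it can help to summarize which counts moved instead of reading the whole diff. The following is a minimal sketch in Python, assuming each entry has the shape shown in the example above (an instruction string mapped to an object with "ExpectedInstructionCount"). The count_changes helper is hypothetical and not part of FEX, and how the old and new entries are loaded from the json files is left out.

```python
import json

def count_changes(old, new):
    """Return {instruction: (old_count, new_count)} for entries whose count changed.

    `old` and `new` map an instruction string to an entry dict containing
    "ExpectedInstructionCount" (assumed layout, matching the example above).
    """
    changes = {}
    for name, entry in new.items():
        before = old.get(name, {}).get("ExpectedInstructionCount")
        after = entry.get("ExpectedInstructionCount")
        # Only report entries present on both sides with a different count.
        if before is not None and after is not None and before != after:
            changes[name] = (before, after)
    return changes

# Entries taken from the example diff above, before and after the change.
old_entries = json.loads('{"movd mm0, eax": {"ExpectedInstructionCount": 3}}')
new_entries = json.loads('{"movd mm0, eax": {"ExpectedInstructionCount": 2}}')

for name, (before, after) in count_changes(old_entries, new_entries).items():
    # A negative delta means the translation got shorter.
    print(f"{name}: {before} -> {after} ({after - before:+d})")
```

A negative delta is an improvement; a positive one is the kind of regression InstCountCI is meant to catch.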
What classifies as an optimal translation?
Nothing automatically classifies whether an instruction implementation is optimal. It is left to a human to judge whether the translation is optimal for that particular instruction.
- In general the Optimal tag is just a guideline, since the reviewer could have made a mistake and the instruction could be optimized further.
- ARM could also introduce new instructions that allow a better translation than the one currently considered optimal.
- An instruction could be considered optimal even though the reviewer ignored something like the flag generation around it, since that is a systemic FEX-Emu issue.
- Or, of course, an implementation was simply misjudged as optimal and someone later found a way to do it better.
The tag mainly helps humans doing code audits, so they don't need to pay as much attention to implementations that are classified as optimal. It's still good to periodically go over these implementations and see if things could be done better.
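For an audit pass, one could list the entries that are not tagged as optimal. This is only a sketch: it assumes the tag is stored in each entry as an "Optimal" field with the value "Yes", which this page does not confirm, and the needs_audit helper and the sample values are hypothetical.

```python
def needs_audit(entries):
    """Return the instruction names whose entry is not tagged as optimal.

    Assumption: the tag is an "Optimal" field holding "Yes" when set.
    Entries without the field are treated as untagged.
    """
    return sorted(name for name, entry in entries.items()
                  if entry.get("Optimal") != "Yes")

# Illustrative sample data only; the counts and tags are made up.
sample = {
    "movd mm0, eax": {"ExpectedInstructionCount": 2, "Optimal": "Yes"},
    "addps xmm0, xmm1": {"ExpectedInstructionCount": 1, "Optimal": "No"},
}

print(needs_audit(sample))
```

Tagged entries are still worth revisiting periodically, so a list like this is a starting point for an audit and not a replacement for one.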
Diving deeper into the assembly
Manually run instcountci result
A useful first step might be to run the json tests directly in the code size validation program. This can be done from the build directory.
- e.g.:
    ./Bin/CodeSizeValidation unittests/InstructionCountCI/FEXOpt/libnss.json.instcountci
Run the test through the TestHarnessRunner
While the instruction count CI is good at showing the final result, it isn't the best at showing what FEX did to get to that result. This is where the assembly test harness can come in handy.
- Create a file in unittests/ASM/Test.asm
- Add the following data:
    %ifdef CONFIG
    {
    }
    %endif

    addps xmm0, xmm1
    hlt
- Recompile the asm tests with `ninja asm_files`
- Run the assembly test manually with `FEX_DUMPIR=stderr FEX_DISASSEMBLE=blocks ./Bin/TestHarnessRunner -c irjit -n 1 -g ./unittests/ASM/Test.asm.bin ./unittests/ASM/Test.asm.config.bin`
  - This will dump both FEX's internal IR and the disassembly of the code for each instruction.
  - The second code block for the hlt can be ignored. It is just necessary for this test harness to run.
The resulting output will be:
    IR-post 0x10000:
    (%0) IRHeader %2, #65536, #0, #1
    (%2) CodeBlock %3, %10
    (%3 i0) BeginBlock %2(Invalid)
    %4(FPRFixed1) i128 = LoadRegister #0x0, #0xd0, FPR, FPRFixed, u8:Tmp:Size
    %5(FPRFixed0) i128 = LoadRegister #0x0, #0xc0, FPR, FPRFixed, u8:Tmp:Size
    %6(FPRFixed0) i32v4 = VFAdd u8:Tmp:RegisterSize, u8:Tmp:ElementSize, %5(FPRFixed0) i128, %4(FPRFixed1) i128
    (%7 i128) StoreRegister %6(FPRFixed0) i32v4, #0x0, #0xc0, FPR, FPRFixed, u8:Tmp:Size
    (%8 i64) InlineEntrypointOffset #0x3, u8:Tmp:RegisterSize
    (%9 i64) ExitFunction %8(Invalid)
    (%10 i0) EndBlock %2(Invalid)
    @@@@@
    [INFO] Disassemble Begin
    [INFO] adr x0, #-0x4 (addr 0xffff6fa00018)
    [INFO] str x0, [x28, #184]
    [INFO] fadd v16.4s, v16.4s, v17.4s
    [INFO] ldr x0, pc+8 (addr 0xffff6fa00030)
    [INFO] blr x0
    [INFO] unallocated (Unallocated)
    [INFO] udf #0xffff
    [INFO] unallocated (Unallocated)
    [INFO] udf #0x0
    [INFO] Disassemble End
- The disassembly has some instructions at the start and end which are necessary for the JIT to run.
  - InstCountCI strips this code out automatically.
  - When a single instruction is examined in isolation, the code block header and tail can dominate the code size.
  - It's recommended to become familiar with what the header and tail look like and ignore them in the resulting code generation.
  - Currently the header is the first two instructions, adr+str.
  - Currently the tail starts with the ldr+blr after the fadd and continues with some metadata afterwards.
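When reading TestHarnessRunner output by hand, the header and tail can be filtered out with a small script. The sketch below is in Python and assumes exactly the layout described above: an adr+str header and a tail that begins at the first ldr immediately followed by blr. It is not how InstCountCI itself strips the code, the strip_header_and_tail helper is hypothetical, and it would cut too early if the translated instruction itself emitted an ldr+blr pair.

```python
def strip_header_and_tail(disassembly):
    """Return only the instructions between the block header and tail.

    Assumes a two-instruction adr+str header and a tail that begins at the
    first ldr immediately followed by blr, as described above.
    """
    prefix = "[INFO] "
    insts = []
    for line in disassembly.splitlines():
        text = line.strip()
        if text.startswith(prefix):
            text = text[len(prefix):]
        # Skip blank lines and the "Disassemble Begin/End" markers.
        if text and not text.startswith("Disassemble"):
            insts.append(text)

    # Drop the adr+str header if it is present.
    if len(insts) >= 2 and insts[0].startswith("adr ") and insts[1].startswith("str "):
        insts = insts[2:]

    # Cut everything from the ldr+blr pair that starts the tail.
    for i in range(len(insts) - 1):
        if insts[i].startswith("ldr ") and insts[i + 1].startswith("blr "):
            return insts[:i]
    return insts

# Disassembly copied from the output shown above.
output = """\
[INFO] Disassemble Begin
[INFO] adr x0, #-0x4 (addr 0xffff6fa00018)
[INFO] str x0, [x28, #184]
[INFO] fadd v16.4s, v16.4s, v17.4s
[INFO] ldr x0, pc+8 (addr 0xffff6fa00030)
[INFO] blr x0
[INFO] unallocated (Unallocated)
[INFO] udf #0xffff
[INFO] unallocated (Unallocated)
[INFO] udf #0x0
[INFO] Disassemble End
"""

body = strip_header_and_tail(output)
print(body)
```

For the addps test this leaves only the fadd, which is the part worth comparing against the ExpectedArm64ASM entries.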
Diving Deeper
I would recommend looking at the man page for FEX to see additional options that can be useful.
- Specifically the options for FEX_PASSMANAGERDUMPIR to get more IR dumping options and FEX_HOSTFEATURES to fake CPU feature support.
- Enabling the vixl simulator with the cmake option -DENABLE_VIXL_SIMULATOR=True can be useful to test features your CPU doesn't support!