With PHP8 alpha1 released, the JIT is what everyone cares about most: how does it actually work, what should you pay attention to, and how much performance does it gain in the end?

First, let's look at a diagram:

The left side shows the Opcache flow before PHP8, and the right side shows Opcache in PHP8. You can see several key points:

  • Opcache does opcode-level optimizations, such as merging the two opcodes in the diagram into one

  • PHP8's JIT is currently provided in Opcache

  • The JIT is based on the Opcache optimizations and is optimized again with the Runtime information to generate machine code directly.

  • JIT is not a replacement for the original Opcache optimization, it is an enhancement.

  • At present, PHP8's JIT only supports x86-family CPUs

In fact, the JIT shares many of the basic data structures that Opcache already uses for its optimizations, such as the data flow graph, call graph, SSA form, and so on. That part deserves a separate article, which I may write when I have time; today we focus only on the usage level.

After downloading and installing, in addition to the original opcache configuration, we need to add the following configuration to php.ini for JIT.

opcache.jit=1205
opcache.jit_buffer_size=64M
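If you are testing from the command line, keep in mind that Opcache is disabled for CLI by default, so the JIT never kicks in without opcache.enable_cli. A fuller php.ini sketch might look like this (the extension line depends on your build and is an assumption here):

```ini
; load Opcache (name/path depends on your build; assumption here)
zend_extension=opcache

; enable Opcache for CLI runs as well, otherwise the JIT never runs
opcache.enable=1
opcache.enable_cli=1

; the JIT settings discussed above
opcache.jit=1205
opcache.jit_buffer_size=64M
```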

The opcache.jit configuration looks a bit complicated, so let me explain: it consists of 4 independent digits, read from left to right (note that this is based on the current alpha1 release; some of these options may be fine-tuned in later versions):

  • Whether to use AVX instructions when generating machine code; requires CPU support:

0: Not used
1: Use
  • Register allocation strategy:

0: No register allocation
1: Local (block-level) register allocation
2: Global (function-level) register allocation

  • JIT trigger strategy:

0: JIT when the PHP script is loaded
1: JIT when the function is executed for the first time
2: After one run, JIT the functions that were called the most times (opcache.prof_threshold * 100)
3: JIT a function/method after it has been executed more than N times (N is related to opcache.jit_hot_func)
4: JIT a function/method when it has @jit in its doc comment
5: JIT a Trace after it has been executed more than N times (related to opcache.jit_hot_loop, opcache.jit_hot_return, etc.)

  • JIT optimization strategy, the larger the value the greater the optimization effort:

0: No JIT
1: JIT the jumps between oplines
2: Inline opcode handler calls
3: Function-level JIT based on type inference
4: Function-level JIT based on type inference and the procedure call graph
5: Script-level JIT based on type inference and the procedure call graph
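Putting the four digits together, the value 1205 used above decodes as follows (my reading of the tables above):

```ini
; opcache.jit = 1205
;               ^^^^
;               |||└─ 5: script-level JIT with type inference and call graph
;               ||└── 0: JIT everything when the script is loaded
;               |└─── 2: global (function-level) register allocation
;               └──── 1: use AVX instructions
opcache.jit=1205
```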

Based on this, we can probably draw the following conclusions:

Try to use the 12x5 configuration, which should be the most effective at this point

For x, 0 is recommended for scripting, and 3 or 5 for web services, depending on the test results

The form of @jit may change to <<jit>> when attributes are available
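For example, with trigger strategy 4 the JIT compiles only the functions you explicitly mark. A minimal sketch (the file name and function are made up for illustration):

```php
<?php
/** @jit */
function hot_path(int $n): int {
    // this function carries @jit in its doc comment,
    // so under trigger strategy 4 it gets compiled
    $sum = 0;
    for ($i = 0; $i < $n; $i++) {
        $sum += $i;
    }
    return $sum;
}

var_dump(hot_path(1000000));
```

Run it with something like php -d opcache.enable_cli=1 -d opcache.jit_buffer_size=64M -d opcache.jit=1204 /tmp/jit.php; only functions carrying @jit in their doc comment are compiled.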

Now, let's test the difference between Zend/bench.php with and without JIT enabled, first without it (php -d opcache.jit_buffer_size=0 Zend/bench.php):

simple             0.008
simplecall         0.004
simpleucall        0.004
simpleudcall       0.004
mandel             0.035
mandel2            0.055
ackermann(7)       0.020
ary(50000)         0.004
ary2(50000)        0.003
ary3(2000)         0.048
fibo(30)           0.084
hash1(50000)       0.013
hash2(500)         0.010
heapsort(20000)    0.027
matrix(20)         0.026
nestedloop(12)     0.023
sieve(30)          0.013
strcat(200000)     0.006
------------------------
Total              0.387

Based on the above, we choose opcache.jit=1205, since bench.php is a run-once script (php -d opcache.jit_buffer_size=64M -d opcache.jit=1205 Zend/bench.php):

simple             0.002
simplecall         0.001
simpleucall        0.001
simpleudcall       0.001
mandel             0.010
mandel2            0.011
ackermann(7)       0.010
ary(50000)         0.003
ary2(50000)        0.002
ary3(2000)         0.018
fibo(30)           0.031
hash1(50000)       0.011
hash2(500)         0.008
heapsort(20000)    0.014
matrix(20)         0.015
nestedloop(12)     0.011
sieve(30)          0.005
strcat(200000)     0.004
------------------------
Total              0.157

As you can see, for Zend/bench.php, enabling the JIT cuts the elapsed time by almost 60% (0.387s → 0.157s), making it roughly 2.5 times faster.

For research purposes, you can use opcache.jit_debug to observe the assembly generated by the JIT. For example, for the following function:

function simple() {
  $a = 0;
  for ($i = 0; $i < 1000000; $i++)
    $a++;
}

We can see this with php -d opcache.jit=1205 -d opcache.jit_debug=0x01:

JIT$simple: ; (/tmp/1.php)
     sub $0x10, %rsp
     xor %rdx, %rdx
     jmp .L2
.L1:
     add $0x1, %rdx
.L2:
     cmp $0x0, EG(vm_interrupt)
     jnz .L4
     cmp $0xf4240, %rdx
     jl .L1
     mov 0x10(%r14), %rcx
     test %rcx, %rcx
     jz .L3
     mov $0x1, 0x8(%rcx)
.L3:
     mov 0x30(%r14), %rax
     mov %rax, EG(current_execute_data)
     mov 0x28(%r14), %edi
     test $0x9e0000, %edi
     jnz JIT$$leave_function
     mov %r14, EG(vm_stack_top)
     mov 0x30(%r14), %r14
     cmp $0x0, EG(exception)
     mov (%r14), %r15
     jnz JIT$$leave_throw
     add $0x20, %r15
     add $0x10, %rsp
     jmp (%r15)
.L4:
     mov $0x45543818, %r15
     jmp JIT$$interrupt_handler

Try reading this assembly; take the increment of $i, for example. You can see the optimization is quite aggressive: because $i is a local variable, it is allocated directly in a register, and because the inferred range of $i never exceeds 1000000, there is no need to check for integer overflow, and so on.
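Conversely, when the range of a variable cannot be inferred — say the loop bound comes from a function argument — the generated code has to keep the overflow guards, since PHP semantics require an integer to be promoted to float on overflow. A sketch of such a case (hypothetical variant of the function above):

```php
<?php
function simple_n($n) {
    $a = 0;
    // $n is unknown at compile time, so the inferred range of $i
    // is unbounded and the generated machine code must guard
    // against integer overflow on each increment
    for ($i = 0; $i < $n; $i++)
        $a++;
}
simple_n(1000000);
```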

And if we use opcache.jit=1005, as introduced earlier, that is, without register allocation, we can get the following results:

JIT$simple: ; (/tmp/1.php)
     sub $0x10, %rsp
     mov $0x0, 0x50(%r14)
     mov $0x4, 0x58(%r14)
     jmp .L2
.L1:
     add $0x1, 0x50(%r14)
.L2:
     cmp $0x0, EG(vm_interrupt)
     jnz .L4
     cmp $0xf4240, 0x50(%r14)
     jl .L1
     mov 0x10(%r14), %rcx
     test %rcx, %rcx
     jz .L3
     mov $0x1, 0x8(%rcx)
.L3:
     mov 0x30(%r14), %rax
     mov %rax, EG(current_execute_data)
     mov 0x28(%r14), %edi
     test $0x9e0000, %edi
     jnz JIT$$leave_function
     mov %r14, EG(vm_stack_top)
     mov 0x30(%r14), %r14
     cmp $0x0, EG(exception)
     mov (%r14), %r15
     jnz JIT$$leave_throw
     add $0x20, %r15
     add $0x10, %rsp
     jmp (%r15)
.L4:
     mov $0x44cdb818, %r15
     jmp JIT$$interrupt_handler

You can see that the operations on $i now go through memory instead of a register.

If we use opcache.jit=1201, we can get the following result:

JIT$simple: ; (/tmp/1.php)
     sub $0x10, %rsp
     call ZEND_QM_ASSIGN_NOREF_SPEC_CONST_HANDLER
     add $0x40, %r15
     jmp .L2
.L1:
     call ZEND_PRE_INC_LONG_NO_OVERFLOW_SPEC_CV_RETVAL_UNUSED_HANDLER
     cmp $0x0, EG(exception)
     jnz JIT$$exception_handler
.L2:
     cmp $0x0, EG(vm_interrupt)
     jnz JIT$$interrupt_handler
     call ZEND_IS_SMALLER_LONG_SPEC_TMPVARCV_CONST_JMPNZ_HANDLER
     cmp $0x0, EG(exception)
     jnz JIT$$exception_handler
     cmp $0x452a0858, %r15d
     jnz .L1
     add $0x10, %rsp
     jmp ZEND_RETURN_SPEC_CONST_LABEL

This is simply the mode that inlines calls to some of the opcode handlers.

You can also try various opcache.jit strategies combined with the debug configuration to observe the differences in the results, or try other opcache.jit_debug values, such as 0xff, which output more auxiliary information.

Well, that's a brief introduction to using the JIT. As for the implementation of the JIT itself and other details, I will write about them later when I have time.

You can go to php.net now and download PHP8 to test it :)

thanks