You got the explanation of the speed difference wrong.
If you look at the decompiled code of the synthetic generated classes, you can see that both the lambda and the function reference keep a reference to the ViewModel, passed through their constructor.
The difference is that the function reference class extends FunctionReferenceImpl, and FunctionReferenceImpl's constructor takes many parameters: the arity as an int, the bound receiver instance, the owner class, the function's name and JVM signature as strings, and an int of flags.
So, it takes more time to instantiate the FunctionReferenceImpl object because its constructor has to load all of these metadata values and store them in fields, while the Lambda constructor is simpler: it only takes the arity.
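To make the two cases concrete, here is a minimal sketch. MyViewModel is a hypothetical stand-in for the ViewModel in the article, and the comments describe what the compiler typically emits when it generates a class per lambda or reference; newer compiler versions may use invokedynamic instead, so check the decompiled output for your own build.

```kotlin
// Hypothetical stand-in for the ViewModel discussed above.
class MyViewModel {
    var clicks = 0
    fun onClick() { clicks++ }
}

fun main() {
    val viewModel = MyViewModel()

    // Lambda: the generated class typically extends kotlin.jvm.internal.Lambda
    // and stores viewModel in a field set through its constructor.
    val viaLambda: () -> Unit = { viewModel.onClick() }

    // Bound function reference: the generated class typically extends
    // FunctionReferenceImpl, whose constructor also receives the owner class,
    // the function name, its signature and flags.
    val viaReference: () -> Unit = viewModel::onClick

    // Invoking either one ends up calling the same method on the same instance.
    viaLambda()
    viaReference()
    println(viewModel.clicks) // prints 2
}
```

Both values hold a reference to viewModel; only the amount of work done when each object is created differs.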
Once the reference is created, the time it takes to invoke it (calling screen.onClick()) should be essentially the same in both cases.
One final note: to get reliable microbenchmark results, you need to run a warmup phase before measuring, account for possible CPU frequency scaling, measure with nanosecond resolution, and so on. Benchmark libraries like JMH or the Android Microbenchmark library do this for you.
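As a rough illustration of just the warmup and nanosecond timing points, a hand-rolled measurement could look like the sketch below. averageNanos and ClickTarget are hypothetical names, the run counts are arbitrary, and the code does nothing about CPU scaling, garbage collection pauses, or other JIT effects, which is exactly why a real benchmark library is the better choice.

```kotlin
// Hypothetical class standing in for the ViewModel.
class ClickTarget {
    var clicks = 0
    fun onClick() { clicks++ }
}

// Hypothetical helper: average nanoseconds per call of block, after a warmup phase.
fun averageNanos(warmupRuns: Int, measuredRuns: Int, block: () -> Any?): Double {
    var sink: Any? = null                      // keep results so the work is not trivially discarded
    repeat(warmupRuns) { sink = block() }      // give the JIT a chance to compile the hot path
    val start = System.nanoTime()
    repeat(measuredRuns) { sink = block() }
    val elapsed = System.nanoTime() - start
    if (sink == null) println("unexpected null result")
    return elapsed.toDouble() / measuredRuns
}

fun main() {
    val target = ClickTarget()

    // Each block creates a new callable object and returns it.
    val lambdaCost = averageNanos(100_000, 1_000_000) {
        val f: () -> Unit = { target.onClick() }
        f
    }
    val referenceCost = averageNanos(100_000, 1_000_000) {
        val f: () -> Unit = target::onClick
        f
    }
    println("lambda creation: $lambdaCost ns, reference creation: $referenceCost ns")
}
```

Treat the numbers from a loop like this as indicative at best; JMH or the Android Microbenchmark library handle the remaining sources of noise.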