Facebook Moving To The JVM

The Big Guns get behind mlvm. I mean, BIG like GE, and Facebook!
"Are interpreters immoral?" A question I posed some months ago which might soon become irrelevant. General purpose interpreters are about to go the way of general purpose punch cards!

Facebook are looking to move PHP onto the JVM. Why? Because clock cycles cost money. Their first approach was HipHop, a PHP to C++ cross compiler. Now they are looking into compiling PHP to run on the multi-language VM. The presence of Facebook engineers at the JVM Language Summit in San Francisco, along with their interest in implementing PHP using invoke-dynamic on the JVM, is the foreshock. The main seismic event will be nothing less than the complete removal of interpreters from mainstream general purpose programming.

The mlvm is the latest version of the JVM (which by convention seems to have changed its case). One might think that JSR 292, the community process which led to the mlvm, was quite small. It resulted in a single additional bytecode - invoke dynamic (spelled invokedynamic in the instruction set). How can something so small revolutionise the world?

0 - enough said. Zero is very small, just a line drawn around space. However, 0 revolutionised mathematics and, from that, the world. Oddly, the classic example of 0, in the 1010 code of binary which computers use, is actually a misnomer. Alan Turing designed the intellectual prototype computer without zeros, as a machine does not need to draw a line around space to know there is nothing there. However, I digress.

Javascript running at near native speeds.
How is invoke dynamic like zero? It opens the door to doing many things which go beyond what can practically be achieved without it. The multi-language virtual machine can do for Ruby, PHP, Magik (on the implementation of which I am personally working) etc. what V8 has done for Javascript: take a slow interpreted system and allow its compilation to machine code without losing any of the flexibility and rapid deployment benefits of the original, interpreted design.

Can It Be Done?
Well, GE are doing it. The millions of lines of code written in Magik will very soon all be able to run on the mlvm version of Magik. The technical challenges are non-trivial - but they have been overcome. I am very proud to have been part of that team. If it can be done for Magik, then doing it for PHP, which is a somewhat simpler language, should be easier (really, I do know, as I program in both: scoping, expression evaluation and first order function handling are all more complex in Magik than in PHP).

Will It Be Fast Enough?
Initially, I had my doubts. However, I am starting to believe that it will. We really can get fully dynamic, interpreter-style languages to run close to C performance. By close, I mean less than 3 times slower. The first time I got the feeling this was possible came from working on algorithm visualisation using Javascript. In these tests, Javascript on V8 ran around 3 times slower than pure Java. Now, that is 3 times slower, but not the tens or hundreds of times slower one might expect from old interpreted Javascript. Please do not forget here that Java written well runs as fast as or faster than C.

Javascript on V8 is not PHP on mlvm, but the principle stands. Some of the just in time compilation systems around the mlvm are very raw and need to be improved and/or replaced. Working practices around how to implement on the platform need to evolve. However, even at this early stage I have seen it take code and accelerate it 10:1 over a C-based interpreter implementation.

Java 8 - The Next Speed Step
Two things need to change. First, the implementation of method handle compilation needs to become a more mainstream part of the JIT compilation in the JVM. Second, the paradigms for source to bytecode cross compilation need to evolve. The key to effective call site use is to keep sites monomorphic or low count polymorphic. What this means is that at a point of dynamic dispatch an invoke dynamic instruction is placed in the bytecode created from source code. Now, if the signature of the target that the site dispatches to never changes, the site is called monomorphic and can be compiled down to a direct machine code dispatch. Remember that this is assessed at runtime and can change at any point; it is dynamic as in dynamic language.
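To make the monomorphic case a little more concrete, here is a minimal sketch using the java.lang.invoke API that sits behind invoke dynamic. The class name, the describeString/describeOther methods and the relink counter are all made up for illustration; a real language runtime would resolve real methods and would normally link the site from a bootstrap method rather than by hand. The site is guarded on the receiver's class: while the same class keeps arriving the guard passes and the call goes straight to the target, and when a different class turns up the site relinks itself.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;

// Sketch only: class and method names are hypothetical, not from a real runtime.
final class MonomorphicSiteSketch {
    private static final MethodHandles.Lookup LOOKUP = MethodHandles.lookup();
    private static final MethodType SITE_TYPE =
            MethodType.methodType(String.class, Object.class);

    // Stand-ins for the methods a dynamic language would resolve at runtime.
    static String describeString(Object o) { return "string:" + o; }
    static String describeOther(Object o) { return "other:" + o; }

    // Guard: does the receiver have exactly the class this site was linked for?
    static boolean hasClass(Class<?> expected, Object receiver) {
        return receiver.getClass() == expected;
    }

    private final MutableCallSite site = new MutableCallSite(SITE_TYPE);
    private final MethodHandle relink;
    int relinks = 0; // how many times the slow path has run

    MonomorphicSiteSketch() {
        try {
            relink = LOOKUP.findVirtual(MonomorphicSiteSketch.class, "relink", SITE_TYPE)
                    .bindTo(this);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
        site.setTarget(relink); // unlinked: the first call takes the slow path
    }

    // Slow path: resolve a target for this receiver and relink the site to it.
    String relink(Object receiver) throws Throwable {
        relinks++;
        String name = receiver instanceof String ? "describeString" : "describeOther";
        MethodHandle target = LOOKUP.findStatic(MonomorphicSiteSketch.class, name, SITE_TYPE);
        MethodHandle guard = LOOKUP.findStatic(MonomorphicSiteSketch.class, "hasClass",
                MethodType.methodType(boolean.class, Class.class, Object.class))
                .bindTo(receiver.getClass());
        // Same class next time: go straight to target. Otherwise: relink again.
        site.setTarget(MethodHandles.guardWithTest(guard, target, relink));
        return (String) target.invoke(receiver);
    }

    // What the invoke dynamic instruction would do at the point of dispatch.
    String call(Object receiver) {
        try {
            return (String) site.dynamicInvoker().invoke(receiver);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        MonomorphicSiteSketch s = new MonomorphicSiteSketch();
        System.out.println(s.call("a") + " " + s.call("b") + " relinks=" + s.relinks);
        System.out.println(s.call(42) + " relinks=" + s.relinks);
    }
}
```

Calling the site twice with strings should relink only once; passing an integer afterwards forces a second relink, which is the "can change at any point" part.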

If a few different signatures pass by the point of dispatch, then a simple chained 'if then else' style system can be used. Once the site has more than a handful of signatures to handle, it becomes megamorphic. In this situation dispatch is based around lookup trees and hash tables and ends up no more efficient than more standard interpreter designs. Dispatch under these conditions is quite tricky to get correct, especially under multi-threaded conditions; but that is another story. As we, as a programming community, become more used to invoke dynamic styles we will figure out how to drive call sites into the leaves of the type tree and thus cause more of them to be mono or low count polymorphic.
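The chain and its megamorphic fallback can be sketched with the same java.lang.invoke building blocks. Everything named here is an assumption for illustration: the limit of 3 types, the describe method and the METHODS table are invented, and the synchronized slow path is a simplification that side-steps the multi-threading problems rather than solving them. Each new receiver class adds one guard in front of the existing chain; once the limit is passed the site gives up and does a hash lookup on every call.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch only: names and the depth limit are assumptions, not from a real runtime.
final class InlineCacheSketch {
    static final int MAX_DEPTH = 3; // assumed "handful" of types before giving up
    private static final MethodHandles.Lookup LOOKUP = MethodHandles.lookup();
    private static final MethodType SITE_TYPE =
            MethodType.methodType(String.class, Object.class);
    private static final MethodHandle DESCRIBE = findStatic("describe",
            MethodType.methodType(String.class, String.class, Object.class));
    private static final MethodHandle HAS_CLASS = findStatic("hasClass",
            MethodType.methodType(boolean.class, Class.class, Object.class));
    private static final MethodHandle TABLE_LOOKUP = findStatic("tableLookup", SITE_TYPE);
    // Hypothetical method table: receiver class -> resolved implementation.
    private static final Map<Class<?>, MethodHandle> METHODS = new ConcurrentHashMap<>();

    private static MethodHandle findStatic(String name, MethodType type) {
        try {
            return LOOKUP.findStatic(InlineCacheSketch.class, name, type);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    static String describe(String label, Object o) { return label + ":" + o; }

    static boolean hasClass(Class<?> expected, Object receiver) {
        return receiver.getClass() == expected;
    }

    // Slow method resolution, standing in for a language's real lookup rules.
    static MethodHandle resolve(Class<?> type) {
        return METHODS.computeIfAbsent(type, t -> DESCRIBE.bindTo(t.getSimpleName()));
    }

    // Megamorphic path: a hash lookup on every call, much like an interpreter.
    static String tableLookup(Object receiver) throws Throwable {
        return (String) resolve(receiver.getClass()).invoke(receiver);
    }

    private final MutableCallSite site = new MutableCallSite(SITE_TYPE);
    int depth = 0;               // guards currently chained at this site
    boolean megamorphic = false; // true once the chain has been abandoned

    InlineCacheSketch() {
        try {
            site.setTarget(LOOKUP.findVirtual(InlineCacheSketch.class, "miss", SITE_TYPE)
                    .bindTo(this));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Cache miss: add one 'if class == X then ... else <old chain>' link, or give up.
    synchronized String miss(Object receiver) throws Throwable {
        Class<?> type = receiver.getClass();
        MethodHandle target = resolve(type);
        if (depth < MAX_DEPTH) {
            depth++;
            site.setTarget(MethodHandles.guardWithTest(
                    HAS_CLASS.bindTo(type), target, site.getTarget()));
        } else {
            megamorphic = true;
            site.setTarget(TABLE_LOOKUP);
        }
        return (String) target.invoke(receiver);
    }

    String call(Object receiver) {
        try {
            return (String) site.dynamicInvoker().invoke(receiver);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```

Feeding the site a String, an Integer and a Double builds a three-deep chain; a fourth class, such as a Character, tips it over into the table lookup.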

If that is a bit too technical - I will try and put it a different way. Invoke dynamic allows dynamic languages with runtime dispatch to run as fast as compiled languages with compile time dispatch configuration. This is done by making the assumption that dispatch from point X in the code will usually be to type A. But, in iterators (for example), dispatch from X might be to A, B, C and a whole bunch of other things. This can be overcome by reimplementation inheritance: replacing traditional call reference inheritance with copy inheritance. Consider P, Q and R, which are all sub types of Y. Each has the same iterator loop. Current common practice is for that loop to be inherited via call, and so P, Q and R call a common code block defined on Y which does the iteration. To help make the loop monomorphic we can reimplement the iteration loop on P, Q and R, thus (probably) reducing the number of types seen in each instance of the iterator loop.
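The difference between the two kinds of inheritance can be shown in plain Java. This is only a sketch: the Site class is a hypothetical stand-in for one dynamic call site that simply records which receiver types pass through it, and the copies of the loop are written out by hand here, whereas a source to bytecode compiler would generate them.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch only: Site is a hypothetical stand-in for one invoke dynamic call site.
final class CopyInheritanceSketch {
    static final class Site {
        final Set<Class<?>> seen = new HashSet<>(); // receiver types that passed by
        int dispatch(Y receiver, int i) {
            seen.add(receiver.getClass());
            return receiver.element(i);
        }
    }

    static abstract class Y {
        static final Site SHARED = new Site();
        abstract int size();
        abstract int element(int i);
        abstract int sumCopied();

        // Call inheritance: one loop defined on Y, so its site sees P, Q and R.
        int sumInherited() {
            int total = 0;
            for (int i = 0; i < size(); i++) total += SHARED.dispatch(this, i);
            return total;
        }
    }

    // Copy inheritance: each subtype carries its own copy of the loop and site.
    static final class P extends Y {
        static final Site OWN = new Site();
        int size() { return 3; }
        int element(int i) { return i; }
        int sumCopied() {
            int total = 0;
            for (int i = 0; i < size(); i++) total += OWN.dispatch(this, i);
            return total;
        }
    }

    static final class Q extends Y {
        static final Site OWN = new Site();
        int size() { return 3; }
        int element(int i) { return i * 2; }
        int sumCopied() {
            int total = 0;
            for (int i = 0; i < size(); i++) total += OWN.dispatch(this, i);
            return total;
        }
    }

    static final class R extends Y {
        static final Site OWN = new Site();
        int size() { return 3; }
        int element(int i) { return i * i; }
        int sumCopied() {
            int total = 0;
            for (int i = 0; i < size(); i++) total += OWN.dispatch(this, i);
            return total;
        }
    }

    public static void main(String[] args) {
        Y[] all = { new P(), new Q(), new R() };
        for (Y y : all) { y.sumInherited(); y.sumCopied(); }
        System.out.println("shared site types: " + Y.SHARED.seen.size());
        System.out.println("P's own site types: " + P.OWN.seen.size());
    }
}
```

The cost is more generated code; the benefit is that each copy of the loop has a better chance of staying monomorphic.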

Such tricks as this - and many more to be invented - will help make mlvm code run faster and faster. Lambdas in the JVM are the other secret ingredient.

Lambdas and why they matter so much.
Lambdas are a bit like an automatic gearbox; they make coding easier but are not strictly required. I was rather dubious about their addition to Java (yet all my cars have automatic gearboxes - what gives?). I like using lambdas; they are a great addition to C++, for example. However, C++ is already far too large for anyone to understand it all perfectly. Java is not yet at that stage; it is still a relatively small language, although each major release brings more features and the tipping point looms. What matters here is that lambdas in Java are being implemented using invoke dynamic. This means that the performance of a key Java feature will depend on the invoke dynamic framework. As a direct consequence, that framework within the JVM will be worked on feverishly and hence will operate super fast; not now, but soon.
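For the curious, the link between lambdas and invoke dynamic can be poked at directly. In Java 8 as released, javac compiles a lambda expression to an invokedynamic instruction whose bootstrap method is java.lang.invoke.LambdaMetafactory.metafactory. The sketch below calls that bootstrap by hand to build the equivalent of x -> 2 * x; the class name and the twice method are made up for illustration, and ordinary code would never need to do this.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.LambdaMetafactory;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.function.IntUnaryOperator;

// Sketch only: links a "lambda" by hand, roughly as the JVM does at first use.
final class LambdaLinkSketch {
    // The lambda body; javac normally synthesises a private method like this.
    static int twice(int x) { return 2 * x; }

    static IntUnaryOperator linkByHand() {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodType intToInt = MethodType.methodType(int.class, int.class);
            MethodHandle body = lookup.findStatic(LambdaLinkSketch.class, "twice", intToInt);
            CallSite site = LambdaMetafactory.metafactory(
                    lookup,
                    "applyAsInt",                                  // interface method name
                    MethodType.methodType(IntUnaryOperator.class), // site returns the lambda
                    intToInt,                                      // erased interface signature
                    body,                                          // where the body lives
                    intToInt);                                     // signature at this site
            return (IntUnaryOperator) site.getTarget().invokeExact();
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        IntUnaryOperator byHand = linkByHand();
        IntUnaryOperator byCompiler = x -> 2 * x;
        System.out.println(byHand.applyAsInt(21) == byCompiler.applyAsInt(21));
    }
}
```

Every lambda in a Java program goes through this machinery, which is why its speed matters so much to the JVM engineers.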

So Are Facebook Doing The Right Thing?
Yes! The rise and rise of Javascript over the last 4 years has shown us just how powerful JIT compilation of once interpreted languages can be. The fall and fail of complete re-writes has shown us just how unrealistic it is to completely move a working system from one language to another. Facebook has a stupid amount of PHP and so it is by far the most sensible thing to port that to mlvm. Actually it should be pretty easy.

My experience would indicate that porting all of PHP 5.4 to mlvm should take no more than 6 person years of effort (given the right persons). This is based on a hybrid model. The best route forward is to make a simple JNI call system from Java to native PHP. The pre-existing PHP libraries, which are written in C, can be called via this mechanism from Java. The Java will form the runtime of the new PHP implementation. PHP itself will compile to bytecode, but when it calls down to what would have been a C library it will call through the Java runtime to the pre-existing C runtime.

Over a period of months or years, those 'call through' points which are hit frequently can be ported from C to Java, and so the whole system can be slowly moved from a hybrid one to a pure PHP/mlvm architecture. Such an approach attracts very low risk and provides a progressive improvement in performance for Facebook.
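As a rough sketch of how the hybrid model might hang together, consider the following. All of the names are hypothetical. In particular, NativeBridge stands in for the JNI call system: a real implementation would wrap a native method that calls into the existing C runtime, which cannot be shown in a self-contained example, so here it is just an interface that the caller supplies.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Sketch only: a hypothetical shape for the hybrid runtime, not a real design.
final class HybridRuntimeSketch {
    /** A PHP library function as seen from compiled bytecode (assumed shape). */
    interface Builtin {
        Object call(Object... args);
    }

    /** Stand-in for the JNI layer; a real one would call the existing C runtime. */
    interface NativeBridge {
        Object callNative(String function, Object... args);
    }

    private final NativeBridge bridge;
    private final Map<String, Builtin> ported = new ConcurrentHashMap<>();
    private final Map<String, AtomicLong> callThroughs = new ConcurrentHashMap<>();

    HybridRuntimeSketch(NativeBridge bridge) {
        this.bridge = bridge;
    }

    // Use the Java port if one exists; otherwise count the hit and call through to C.
    Object call(String function, Object... args) {
        Builtin java = ported.get(function);
        if (java != null) {
            return java.call(args);
        }
        callThroughs.computeIfAbsent(function, f -> new AtomicLong()).incrementAndGet();
        return bridge.callNative(function, args);
    }

    // Replace a frequently hit call-through point with a pure Java implementation.
    void port(String function, Builtin impl) {
        ported.put(function, impl);
    }

    // How often a function has gone through the bridge: a guide to what to port next.
    long callThroughCount(String function) {
        AtomicLong n = callThroughs.get(function);
        return n == null ? 0 : n.get();
    }
}
```

The call-through counts give a simple, measurable way to decide which library functions to port first, which is what keeps the risk low.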

The Future:
Facebook have done a good job so far of moving key technologies into the open source world. I fully expect either them or someone else to port PHP to the mlvm, just like Charles Oliver Nutter and team have done with JRuby. Red Hat's sponsorship of JRuby and its recent hire of Nutter esq. indicate how the movers and shakers of the IT world are lining up behind the mlvm. It is also great for those of us who like PHP - as the big easy.

These are exciting times. Soon I expect interpreters and interpreted languages to be confined to DSLs and all general purpose coding to be running in a JIT environment or as up-front compilation. That will be good for companies, good for performance and good for the planet. Well done JSR 292 and all who worked on her!