Monday, June 30, 2008

JIT in Prolog

Hi all,

Some news from the JIT front. Progress on the JIT has been low-profile in the past few months. No big results to announce yet, but we have played with some new ideas, and they are now documented as a draft research paper: Towards Just-In-Time Compilation and Specialisation of Prolog.

Prolog? Yes. To understand this slightly unusual choice of programming language, here is some background about our JIT first.

PyPy contains not a JIT but a JIT generator, which means that we only write an interpreter for a language (say, the complete Python language), and we get a JIT "for free". More precisely, it's not for free: we had to write the JIT generator, of course, as well as some amount of subtle generic support code. The JIT generator preprocesses the (complete Python) interpreter that we wrote and links the result with the generic support code; the result is a (complete Python) JIT.

So far, this gives us a generated JIT that works very much like Psyco. But Psyco has issues (and so the current PyPy JITs have the same issues): it can sometimes produce too much machine code, e.g. by failing to notice that two versions of the machine code are close enough that they should really be one; and it can also sometimes fail in the opposite way, by making a single inefficient version of the machine code instead of several efficient specialized versions.
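To make the two failure modes a bit more concrete, here is a toy model in Python. It is only a sketch: a "version" is just a dictionary entry standing in for a piece of generated machine code, and the names count_versions and key_func are made up for this illustration. Nothing here is how Psyco or the PyPy JIT actually decides what to specialize on.

```python
# Toy model only: a "version" is a dictionary entry standing in for a piece
# of generated machine code. Nothing below is Psyco's or PyPy's API.

def count_versions(key_func, calls):
    """Count how many specialized versions a policy would create.

    key_func maps the arguments of one call to a specialization key;
    every distinct key costs one more compiled version.
    """
    versions = {}
    for args in calls:
        key = key_func(args)
        versions.setdefault(key, "compiled code for %r" % (key,))
    return len(versions)


# 100 calls with integer arguments, followed by 100 calls with floats.
calls = [(i, i + 1) for i in range(100)]
calls += [(i / 2.0, 1.5) for i in range(100)]

# Too fine: one version per distinct pair of argument values.
by_value = count_versions(lambda args: args, calls)

# Too coarse: a single generic version that has to handle every case.
generic = count_versions(lambda args: None, calls)

# In between: one version per combination of argument types.
by_type = count_versions(lambda args: tuple(type(a) for a in args), calls)

print(by_value, generic, by_type)
```

The hard part, of course, is that a real JIT has to pick the key automatically, and the right granularity differs from one piece of code to the next.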

A few months ago we chose to experiment with improving this instead of finishing and polishing what we had so far. The choice was mostly because we were (and still are) busy finishing and polishing everything else in PyPy, so it was more fun to keep at least the JIT on the experimental side. Besides, PyPy is now getting to a rather good and complete state, and it is quite usable without the JIT already.

Anyway, enough excuses. Why is this about Prolog?

In PyPy, both the (complete Python) interpreter and the JIT support code are in RPython. Now RPython is not an extremely complicated language, but still, it is far from the top on a minimalism scale. In general, this is good in practice (or at least I think so): it gives a reasonable balance because it is convenient to write interpreters in RPython, while not being so bloated that it makes our translation toolchain horribly complicated (e.g. writing garbage collectors for RPython - or even JIT generators - is reasonable). Still, it is not the best choice for early research-level experimentation.

So what we did instead recently is hand-write, in Prolog, a JIT that looks similar to what we would like to achieve for RPython with our JIT generator. This gave much quicker turnaround times than we were used to when we played around directly with RPython. We wrote tiny example interpreters in Prolog (of course not a complete Python interpreter). Self-inspection is trivial in Prolog, and generating Prolog code at runtime is very easy too. Moreover, many other issues are also easier in Prolog: for example, all data structures are immutable "terms". Languages other than Prolog would have worked, too, but it happens to be one that we (Carl Friedrich, Michael Leuschel and myself) are familiar with -- not to mention that it's basically a nice small dynamic language.

Of course, all this is closely related to what we want to do in PyPy. The fundamental issues are the same. Indeed, in PyPy, the major goals of the JIT are to remove, first, the overhead of allocating objects all the time (e.g. integers), and second, the overhead of dynamic dispatch (e.g. finding out that it's integers we are adding). The equivalent goals in Prolog are, first, to avoid creating short-lived terms, and second, to remove the overhead of dispatch (typically, the dispatching to multiple clauses). If you are familiar with Prolog you can find more details about this in the paper. So far we have already played with many possible solutions in the Prolog JIT, and the paper describes the most mature one; we have more experimentation in mind. The main point here is that these are mostly language-independent techniques (anything that works both in Prolog and in RPython has to be language-independent, right? :-)
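As a rough illustration of the two overheads on the Python side, here is a small sketch. The class W_Int and the functions generic_add, sum_interpreted and sum_specialized are invented for this example and are not PyPy's actual interpreter code; the "specialized" loop only stands in for what we would like a JIT to produce once it knows that everything is an integer.

```python
class W_Int(object):
    """A boxed integer, as a simple interpreter might represent it."""
    allocations = 0  # counts how many boxes were ever created

    def __init__(self, value):
        W_Int.allocations += 1
        self.value = value


def generic_add(w_a, w_b):
    # Dynamic dispatch: find out at run time what it is we are adding.
    if isinstance(w_a, W_Int) and isinstance(w_b, W_Int):
        return W_Int(w_a.value + w_b.value)  # a fresh box for every result
    raise TypeError("unsupported operand types")


def sum_interpreted(n):
    # Every intermediate value is boxed and every addition is dispatched.
    w_total = W_Int(0)
    for i in range(n):
        w_total = generic_add(w_total, W_Int(i))
    return w_total.value


def sum_specialized(n):
    # Stand-in for specialized code: no boxes of our own, no type checks.
    total = 0
    for i in range(n):
        total += i
    return total


W_Int.allocations = 0
print(sum_interpreted(1000), sum_specialized(1000), W_Int.allocations)
```

Both functions compute the same sum, but the first one creates a short-lived box for every loop index and every intermediate result, which is exactly the kind of work we want the JIT to remove.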

In summary, besides the nice goal of speeding up Prolog, we are trying to focus our Prolog JIT on the issues and goals that have equivalents in the PyPy JIT generator. So in the end we are pretty convinced that it will give us something that we can backport to PyPy -- good ideas about what works and what doesn't, as well as some concrete algorithms.

Friday, June 27, 2008

PyPy code swarm

Following the great success of code_swarm, I recently produced a video that shows the commit history of the PyPy project.

The video shows the commits under the dist/ and branch/ directories, which is where most of the development happens.

In the first part of the video, you can clearly see our sprint-based approach: the video starts in February 2003, when the first PyPy sprint took place in Hildesheim. After a lot of initial activity, few commits happened in the next two months, until the second PyPy sprint, which took place in Gothenburg in late May 2003; around minute 0:15, you can see the high commit rate due to the sprint.

The next two years follow more or less the same pattern: very high activity during sprints, followed by long pauses between them. The most interesting breaking point is located around minute 01:55: it's January 2005, and when the EU project starts, the number of commits just explodes, as does the number of people involved.

I also particularly appreciated minute 03:08 aka March 22, 2006: it's the date of my first commit to dist/, and my nickname magically appears; but of course I'm biased :-).

The soundtrack is NIN - Ghosts IV - 34: thanks to xoraxax for having added the music and uploaded the video.


PyPy Codeswarm from solse@trashymail.com on Vimeo.

Thursday, June 26, 2008

Funding of some recent progress by Google's Open Source Programs

As readers of this blog already know, PyPy development has recently focused on getting the code base to a more usable state. One of the most important parts of this work was creating an implementation of the ctypes module for PyPy, which provides a realistic way to interface with external libraries. The module is now fairly complete (if somewhat slow), and has generated a great deal of community interest. One of the main reasons this work progressed so well was that we received funding from Google's Open Source Programs Office. This is really fantastic for us, and we cannot thank Google and Guido enough for helping PyPy progress more rapidly than we could have with volunteer-only time!

This funding opportunity arose from the PyPy US road trip at the end of last year, which included a visit to Google. You can check out the video of the talk we gave during our visit. We wrapped up our day with discussions about the possibility of Google funding some PyPy work, and soon after we were at work on the improvements described in the proposal we'd submitted.

One nice side-effect of the funding is indeed that we can use some of the money to fund contributors' travel to our sprint meetings. The next scheduled Google funding proposal also aims at making our Python interpreter more usable and compliant with CPython. This will be done by trying to fully run Django on top of PyPy. With more efforts like this one we're hoping that PyPy can start to be used as a CPython replacement before the end of 2008.

Many thanks to the teams at merlinux and Open End for making this development possible, including Carl Friedrich Bolz, Antonio Cuni, Holger Krekel, Maciek Fijalkowski at merlinux, Samuele Pedroni and yours truly at Open End.

We always love to hear feedback from the community, and you can get the latest word on our development and let us know your thoughts here in the comments.

Bea Düring, Open End AB

PS: Thanks Carl Friedrich Bolz for drafting this post.

Sunday, June 22, 2008

Pdb++ and rlcompleter_ng

When hacking on PyPy, I spend a lot of time inside pdb; thus, I tried to create a more comfortable environment where I can pass my nights :-).

As a result, I wrote two modules:

  • pdb.py, which extends the default behaviour of pdb by adding some commands and some fancy features such as syntax highlighting and powerful tab completion; pdb.py is meant to be placed somewhere in your PYTHONPATH, in order to override the default version of pdb.py shipped with the stdlib;
  • rlcompleter_ng.py, whose most important feature is the ability to show coloured completions depending on the type of the objects.

To find more information about those modules and how to install them, have a look at their docstrings.

It's important to underline that these modules are not PyPy specific, and they also work perfectly on top of CPython.

Friday, June 20, 2008

Running Nevow on top of PyPy

Another episode of the "Running Real Applications on Top of PyPy" series:

Today's topic: Divmod's Nevow. Nevow (pronounced as the French "nouveau", or "noo-voh") is a web application construction kit written in Python. In other words, it's just another web framework, but this time built on top of Twisted. While, due to some small problems, we're not yet able to pass the full Twisted test suite on top of pypy-c, Nevow seems to be simple enough to work perfectly (959 out of 960 unit tests passing, with the last one recognized as pointless and about to be deleted). Also, thanks to exarkun, Nevow no longer relies on ugly details like refcounting.

As usual, translate pypy using:
translate.py --gc=hybrid --thread targetpypystandalone --faassen --allworkingmodules --oldstyle

And of course, the screenshot that is obligatory for this series:
This is Nevow's own test suite.

Cheers,
fijal

Monday, June 16, 2008

Next sprint: Vilnius/Post EuroPython, 10-12th of July

As in previous years, there will be a PyPy sprint just after EuroPython. The sprint will take place in the same hotel as the conference, from 10th to 12th of July.

This is a fully public sprint: newcomers are welcome, and on the first day we will have a tutorial session for those new to PyPy development.

Some of the topics we would like to work on:

  • try out Python programs and fix them or fix PyPy or fix performance bottlenecks
  • some JIT improvement work
  • port the stackless transform to ootypesystem

Of course, other topics are also welcome.

For more information, see the full announcement.

Sunday, June 15, 2008

German Introductory Podcast About Python and PyPy

During the Berlin Sprint Holger was interviewed by Tim Pritlove for Tim's Podcast "Chaosradio Express". The whole thing is in German, so it is only interesting to German speakers. The PyPy episode can be found here. The interview touches on a lot of topics, starting with a fairly general intro about what Python is and why it is interesting and then moving to explaining and discussing PyPy. The bit about PyPy starts after about 45 minutes. There is also a comment page about the episode.

Tuesday, June 10, 2008

Running Pylons on top of PyPy

The next episode of the "Running Real Applications on Top of PyPy" series:

Yesterday, we spent some time with Philip Jenvey on tweaking Pylons and PyPy to cooperate with each other. While doing this we found some pretty obscure details, but in general things went well.

After resolving some issues, we can now run all (72) Pylons tests on top of pypy-c compiled with the following command:

translate.py --gc=hybrid --thread targetpypystandalone --faassen --allworkingmodules --oldstyle

and run an example application. Here is the obligatory screenshot (of course it might be fake, as usual with screenshots). Note: I broke the application on purpose to showcase the cool debugger; the default screen is just boring:

Please note that we run the example application without DB access, since we need some more work to get SQLAlchemy to run on top of pypy-c together with pysqlite-ctypes. Here is just one example of an obscure detail that SQLAlchemy relies on in its test suite:

class A(object):
  locals()[42] = 98


This works on CPython but not on PyPy.

Update: This is only about new-style classes.

Cheers,
fijal

List comprehension implementation details

List comprehensions are a nice feature in Python. They are, however, just syntactic sugar for for loops. E.g. the following list comprehension:

def f(l):
    return [i ** 2 for i in l if i % 3 == 0]

is sugar for the following for loop:

def f(l):
    result = []
    for i in l:
        if i % 3 == 0:
            result.append(i ** 2)
    return result

The interesting bit about this is that list comprehensions are actually implemented in almost exactly this way. If one disassembles the two functions above one gets sort of similar bytecode for both (apart from some details, like the fact that the append in the list comprehension is done with a special LIST_APPEND bytecode).
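If you want to check this yourself, here is a small sketch using the standard dis module. Note the assumptions: it is written for a recent CPython 3, where the exact bytecode differs from the Python 2.5 discussed in this post, and the helper opnames is just something made up for this example. It also looks into nested code objects, because some Python 3 versions compile the comprehension into a separate hidden function.

```python
import dis


def f_comprehension(l):
    return [i ** 2 for i in l if i % 3 == 0]


def f_loop(l):
    result = []
    for i in l:
        if i % 3 == 0:
            result.append(i ** 2)
    return result


def opnames(code):
    """Collect opcode names from a code object and any code nested in it."""
    names = set()
    for instr in dis.get_instructions(code):
        names.add(instr.opname)
    for const in code.co_consts:
        if hasattr(const, "co_code"):  # a nested code object
            names |= opnames(const)
    return names


comp_ops = opnames(f_comprehension.__code__)
loop_ops = opnames(f_loop.__code__)

# The comprehension should use LIST_APPEND; the explicit loop calls .append.
print("LIST_APPEND" in comp_ops, "LIST_APPEND" in loop_ops)
```

Calling dis.dis(f_comprehension) and dis.dis(f_loop) directly prints the full listings if you want to compare them side by side.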

Now, when doing this sort of expansion there are some classical problems: what name should the intermediate list get that is being built? (I said classical because this is indeed one of the problems of many macro systems). What CPython does is give the list the name _[1] (and _[2]... with nested list comprehensions). You can observe this behaviour with the following code:

$ python
Python 2.5.2 (r252:60911, Apr 21 2008, 11:12:42)
[GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> [dir() for i in [0]][0]
['_[1]', '__builtins__', '__doc__', '__name__', 'i']
>>> [[dir() for i in [0]][0] for j in [0]][0]
['_[1]', '_[2]', '__builtins__', '__doc__', '__name__', 'i', 'j']

That is a sort of nice decision, since you cannot reach that name by any "normal" means. Of course you can confuse yourself in funny ways if you want:

>>> [locals()['_[1]'].extend([i, i + 1]) for i in range(10)]
[0, 1, None, 1, 2, None, 2, 3, None, 3, 4, None, 4, 5, None, 5, 6, None, 6, 7, None, 7, 8, None, 8, 9, None, 9, 10, None]

Now to the real reason why I am writing this blog post. PyPy's Python interpreter implements list comprehensions in more or less exactly the same way, with one tiny difference: the name of the variable:

$ pypy-c-53594-generation-allworking
Python 2.4.1 (pypy 1.0.0 build 53594) on linux2
Type "help", "copyright", "credits" or "license" for more information.
``the globe is our pony, the cosmos our real horse''
>>>> [dir() for i in [0]][0]
['$list0', '__builtins__', '__doc__', '__name__', 'i']

Now, that shouldn't really matter for anybody, should it? Turns out it does. The following way too clever code is apparently used a lot:

__all__ = [__name for __name in locals().keys() if not __name.startswith('_')
               or __name == '_']

In PyPy this will give you a "$list0" in __all__, which will prevent the import of that module :-(. I guess I need to change the name to match CPython's.
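To see why only the PyPy name slips through, here is the same filter applied to two hand-written dictionaries. This is just a sketch: the function public_names and both dictionaries are invented for the example, and they merely imitate what locals() would contain while the comprehension is running on each implementation.

```python
def public_names(namespace):
    """Apply the same filter as the __all__ idiom to a dict of names."""
    return sorted(name for name in namespace
                  if not name.startswith('_') or name == '_')


# Hypothetical module namespaces, as seen while the comprehension runs.
cpython_like = {'_[1]': [], '__name__': 'mod', 'helper': None, '_': None}
pypy_like = {'$list0': [], '__name__': 'mod', 'helper': None, '_': None}

print(public_names(cpython_like))
print(public_names(pypy_like))
```

CPython's hidden name starts with an underscore, so the filter drops it by accident; "$list0" does not, so it ends up in the list.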

Lesson learned: no detail is obscure enough to not have some code depending on it. Problems at this level of obscurity are mostly what we are fixing in PyPy at the moment.

Monday, June 9, 2008

Better Profiling Support for PyPy

As PyPy is getting more and more usable, we need better tools for working on applications running on top of PyPy. Out of this interest, I spent some time implementing the _lsprof module, which has been part of the standard library since Python 2.5. It is necessary for the cProfile module, which can profile Python programs with high accuracy and a lot less overhead than the older, pure-python profile module. Together with the excellent lsprofcalltree script, you can display this data using kcachegrind, which gives you great visualization possibilities for your profile data.
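For those who have not used cProfile before, here is a minimal sketch of how it is typically driven from Python code. It uses only the standard cProfile, pstats and io modules and is written for Python 3; the function work is a made-up example, and the lsprofcalltree and kcachegrind step is left out on purpose.

```python
import cProfile
import io
import pstats


def work(n):
    """A deliberately naive function, just to have something to profile."""
    return sum(i * i for i in range(n))


profiler = cProfile.Profile()
result = profiler.runcall(work, 100000)  # profile a single call

# Format the collected data: top 5 entries, sorted by cumulative time.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

The timings you get will of course depend on the machine and on the interpreter you run this on.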

Cheers,
fijal