Living on the Edge

GitPython

Posted on May 08, 2008

As you’re probably aware by now, I really like Git. It took some time, but things finally started clicking. One of the things I wanted to do was make it easier to interact with Git from Python / Django projects.

I searched around for a Python Git module. I really didn’t find anything that looked complete to me, although I didn’t look too hard. Not being the creative type, I noticed that Ruby has the grit library, created by Tom Preston-Werner and Chris Wanstrath, which is very nice. I decided to port it because I can use it for some cool stuff, and because I figured it would help me learn a lot about Python. So here it is.

About

GitPython is a Python library used to interact with Git repositories.

GitPython is a port of the grit library in Ruby created by Tom Preston-Werner and Chris Wanstrath.

The method_missing stuff was taken from this blog post.

REQUIREMENTS

INSTALL

You can download the code from CheeseShop or alternatively pull the source.


python setup.py install

SOURCE

GitPython’s git repo is available on Gitorious, which can be browsed at:

http://gitorious.org/projects/git-python/

and cloned from:

git://gitorious.org/git-python/mainline.git

USAGE

GitPython provides object model access to your git repository. Once you have created a repository object, you can traverse it to find parent commit(s), trees, blobs, etc.

Initialize a Repo object

The first step is to create a Repo object to represent your repository.


>>> from git_python import *
>>> repo = Repo("/Users/mtrier/Development/git-python")

In the above example, the directory /Users/mtrier/Development/git-python is my working repository and contains the .git directory. You can also initialize GitPython with a bare repository.


>>> repo = Repo.init_bare("/var/git/git-python.git")
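The difference between the two layouts can be checked from plain Python before creating a Repo object. The following is only a rough sketch: `repo_kind` is a hypothetical helper, not part of GitPython, and it relies on the usual convention that a working repository contains a .git directory while a bare one keeps HEAD, objects/ and refs/ at its top level.

```python
import os
import tempfile


def repo_kind(path):
    """Guess whether `path` is a working repository or a bare one.

    Heuristic only: it looks at the directory layout and never runs git.
    """
    if os.path.isdir(os.path.join(path, ".git")):
        return "working"
    has_head = os.path.isfile(os.path.join(path, "HEAD"))
    has_dirs = all(
        os.path.isdir(os.path.join(path, name)) for name in ("objects", "refs")
    )
    if has_head and has_dirs:
        return "bare"
    return "unknown"


# Build two throwaway directory layouts to exercise the helper.
with tempfile.TemporaryDirectory() as root:
    working = os.path.join(root, "project")
    os.makedirs(os.path.join(working, ".git"))

    bare = os.path.join(root, "project.git")
    os.makedirs(os.path.join(bare, "objects"))
    os.makedirs(os.path.join(bare, "refs"))
    open(os.path.join(bare, "HEAD"), "w").close()

    kinds = (repo_kind(working), repo_kind(bare), repo_kind(root))

print(kinds)
```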

Getting a list of commits

From the Repo object, you can get a list of Commit objects.


>>> repo.commits()
[<GitPython.Commit "207c0c4418115df0d30820ab1a9acd2ea4bf4431">, 
 <GitPython.Commit "a91c45eee0b41bf3cdaad3418ca3850664c4a4b4">, 
 <GitPython.Commit "e17c7e11aed9e94d2159e549a99b966912ce1091">, 
 <GitPython.Commit "bd795df2d0e07d10e0298670005c0e9d9a5ed867">]

Called without arguments, Repo.commits returns a list of up to ten commits reachable from the master branch (starting at the latest commit). You can ask for commits beginning at a different branch, commit, tag, etc.


>>> repo.commits('mybranch')
>>> repo.commits('40d3057d09a7a4d61059bca9dca5ae698de58cbe')
>>> repo.commits('v0.1')

You can specify the maximum number of commits to return.


>>> repo.commits('master', 100)

If you need paging, you can specify a number of commits to skip.


>>> repo.commits('master', 10, 20)

The above will return commits 21-30 from the commit list.
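If you think in page numbers rather than offsets, the two extra arguments are easy to derive. This is a small sketch; `page_window` is a hypothetical helper and is not part of GitPython.

```python
def page_window(page, per_page=10):
    """Translate a 1-based page number into (max_count, skip)."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    return per_page, (page - 1) * per_page


max_count, skip = page_window(3)
print(max_count, skip)
# With a real Repo object these values would be passed along as
# repo.commits('master', max_count, skip), matching the call shown above.
```

Page 3 with ten commits per page gives a maximum of 10 and a skip of 20, which is the same call as the example above.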

The Commit object

Commit objects contain information about a specific commit.


>>> head = repo.commits()[0]

>>> head.id
'207c0c4418115df0d30820ab1a9acd2ea4bf4431'

>>> head.parents
[<GitPython.Commit "a91c45eee0b41bf3cdaad3418ca3850664c4a4b4">]

>>> head.tree
<GitPython.Tree "563413aedbeda425d8d9dcbb744247d0c3e8a0ac">

>>> head.author
<GitPython.Actor "Michael Trier <mtrier@gmail.com>">

>>> head.authored_date
(2008, 5, 7, 5, 0, 56, 2, 128, 0)

>>> head.committer
<GitPython.Actor "Michael Trier <mtrier@gmail.com>">

>>> head.committed_date
(2008, 5, 7, 5, 0, 56, 2, 128, 0)

>>> head.message
'cleaned up a lot of test information. Fixed escaping so it works with subprocess.'
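The dates come back as time tuples, so the standard time module can format them. A minimal sketch, assuming the value is the ordinary nine-field tuple shown in the output above (the tuple is copied here by hand rather than read from a repository):

```python
import time

# Copied from the authored_date output above.
authored_date = (2008, 5, 7, 5, 0, 56, 2, 128, 0)

# time.strftime accepts a nine-field tuple as well as a struct_time.
stamp = time.strftime("%Y-%m-%d %H:%M:%S", authored_date)
print(stamp)
```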

You can traverse a commit’s ancestry by chaining calls to parents.


>>> repo.commits()[0].parents[0].parents[0].parents[0]

The above corresponds to master^^^ or master~3 in git parlance.
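Chaining by hand gets tedious for deeper history, so a small loop helps. The sketch below uses a hypothetical `StubCommit` class in place of a real Commit object so that it runs without a repository; the only attribute it relies on is `parents`, as shown above.

```python
class StubCommit:
    """Hypothetical stand-in exposing only `id` and `parents`."""

    def __init__(self, id, parents=()):
        self.id = id
        self.parents = list(parents)


def first_parent_ancestor(commit, n):
    """Follow the first parent n times, like git's commit~n."""
    for _ in range(n):
        if not commit.parents:
            raise ValueError("history is shorter than requested")
        commit = commit.parents[0]
    return commit


# A linear history of four commits: c3 -> c2 -> c1 -> c0.
root = StubCommit("c0")
c1 = StubCommit("c1", [root])
c2 = StubCommit("c2", [c1])
head = StubCommit("c3", [c2])

print(first_parent_ancestor(head, 3).id)
```

With a real repository, the same function should work on `repo.commits()[0]`, since it only touches `parents`.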

The Tree object

A tree records pointers to the contents of a directory. Let’s say you want the root tree of the latest commit on the master branch.


>>> tree = repo.commits()[0].tree
>>> tree
<GitPython.Tree "a006b5b1a8115185a228b7514cdcd46fed90dc92">

>>> tree.id
'a006b5b1a8115185a228b7514cdcd46fed90dc92'

Once you have a tree, you can get the contents.


>>> contents = tree.contents
>>> contents
[<GitPython.Blob "6a91a439ea968bf2f5ce8bb1cd8ddf5bf2cad6c7">, 
 <GitPython.Blob "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391">, 
 <GitPython.Tree "eaa0090ec96b054e425603480519e7cf587adfc3">, 
 <GitPython.Blob "980e72ae16b5378009ba5dfd6772b59fe7ccd2df">]

This tree contains three Blob objects and one Tree object. The trees are subdirectories and the blobs are files. Trees below the root have additional attributes.


>>> contents = tree.contents[-2]
>>> contents
<GitPython.Tree "e5445b9db4a9f08d5b4de4e29e61dffda2f386ba">

>>> contents.name
'test'

>>> contents.mode
'040000'
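Because trees nest, listing every file means walking the contents recursively. Here is a rough sketch that uses hypothetical `StubTree` and `StubBlob` classes in place of the real objects. It assumes only the `name` and `contents` attributes described above, and the file names are made up.

```python
class StubBlob:
    """Hypothetical stand-in for a Blob: just a name."""

    def __init__(self, name):
        self.name = name


class StubTree:
    """Hypothetical stand-in for a Tree: a name plus its contents."""

    def __init__(self, name, contents=()):
        self.name = name
        self.contents = list(contents)


def walk(tree, prefix=""):
    """Yield the path of every blob under `tree`, depth first."""
    for item in tree.contents:
        path = prefix + item.name
        if hasattr(item, "contents"):
            # A subtree: descend into it and extend the path prefix.
            for sub_path in walk(item, path + "/"):
                yield sub_path
        else:
            yield path


root = StubTree("", [
    StubBlob("README"),
    StubTree("test", [StubBlob("test_repo.py")]),
    StubBlob("setup.py"),
])
paths = list(walk(root))
print(paths)
```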

There is a convenience method that allows you to get a named sub-object from a tree.


>>> tree/"lib" 
<GitPython.Tree "c1c7214dde86f76bc3e18806ac1f47c38b2b7a30">
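The slash syntax is ordinary operator overloading. The sketch below shows the general technique on a hypothetical `StubTree` class. It is not GitPython's actual implementation, and returning None for a missing name is a choice made here for illustration.

```python
class StubTree:
    """Hypothetical tree whose children can be looked up with `/`."""

    def __init__(self, name, contents=()):
        self.name = name
        self.contents = list(contents)

    def __truediv__(self, name):
        # tree / "lib" ends up here on Python 3.
        for item in self.contents:
            if item.name == name:
                return item
        return None

    # Python 2 spells the same hook __div__.
    __div__ = __truediv__


lib = StubTree("lib", [StubTree("git_python")])
root = StubTree("", [lib])

print((root / "lib").name)
print((root / "lib" / "git_python").name)
```

Since each lookup returns another tree, the operator chains naturally and reads like a path.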

You can also get a tree directly from the repository if you know its name.


>>> repo.tree()
<GitPython.Tree "master">

>>> repo.tree("c1c7214dde86f76bc3e18806ac1f47c38b2b7a30")
<GitPython.Tree "c1c7214dde86f76bc3e18806ac1f47c38b2b7a30">

The Blob object

A blob represents a file. Trees often contain blobs.


>>> blob = tree.contents[-1]
>>> blob
<GitPython.Blob "b19574431a073333ea09346eafd64e7b1908ef49">

A blob has certain attributes.


>>> blob.name
'urls.py'

>>> blob.mode
'100644'

>>> blob.mime_type
'text/x-python'

>>> len(blob)
415
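The mode strings are octal, Unix-style file modes, so the standard stat module can decode them. A small sketch follows; `describe_mode` is a hypothetical helper, not something GitPython provides.

```python
import stat


def describe_mode(mode):
    """Classify a git mode string such as '040000' or '100644'."""
    bits = int(mode, 8)
    if stat.S_ISDIR(bits):
        kind = "directory"
    elif stat.S_ISLNK(bits):
        kind = "symlink"
    elif stat.S_ISREG(bits):
        kind = "file"
    else:
        kind = "other"
    # S_IMODE keeps only the permission bits (e.g. 0o644).
    return kind, stat.S_IMODE(bits)


tree_kind, tree_perms = describe_mode("040000")
blob_kind, blob_perms = describe_mode("100644")
print(tree_kind, oct(tree_perms))
print(blob_kind, oct(blob_perms))
```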

You can get the data of a blob as a string.


>>> blob.data
"from django.conf.urls.defaults import *\nfrom django.conf..." 

You can also get a blob directly from the repo if you know its name.


>>> repo.blob("b19574431a073333ea09346eafd64e7b1908ef49")
<GitPython.Blob "b19574431a073333ea09346eafd64e7b1908ef49">

What Else?

There is more stuff in there, like the ability to tar or gzip repos, stats, blame, and probably a few other things. Additionally, calls to the git instance are handled through a method_missing construct, which makes any git command available directly, with a nice conversion of Python dicts to command line parameters.
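In Python, the closest counterpart to Ruby's method_missing is `__getattr__`. The sketch below illustrates the idea only: `CommandBuilder` and its option rules are assumptions made up for this example, not GitPython's real code, and it builds an argument list instead of running git.

```python
class CommandBuilder:
    """Hypothetical sketch: turns attribute access into git argument lists."""

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, which makes it
        # a rough counterpart to Ruby's method_missing.
        if name.startswith("_"):
            raise AttributeError(name)

        def build(*args, **options):
            argv = ["git", name.replace("_", "-")]
            for key, value in options.items():
                argv.extend(self._option(key, value))
            argv.extend(str(arg) for arg in args)
            return argv

        return build

    @staticmethod
    def _option(key, value):
        """Convert one keyword argument into command line tokens."""
        flag = key.replace("_", "-")
        if value is None or value is False:
            return []
        if len(flag) == 1:
            return ["-" + flag] if value is True else ["-" + flag, str(value)]
        return ["--" + flag] if value is True else ["--%s=%s" % (flag, value)]


git = CommandBuilder()
rev_list_argv = git.rev_list("master", max_count=10, skip=20)
log_argv = git.log("HEAD", n=5, no_merges=True)
print(rev_list_argv)
print(log_argv)
```

A real wrapper would hand the resulting list to subprocess. Keeping the sketch to list building means it can be tried without git installed.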

Check the unit tests; they’re pretty exhaustive.

What is Next?

There are a couple of tests that don’t pass due to an inability to mock them properly, so I’m going to get those fixed up.

I also plan to restructure some of the object relationships. A few of them feel a little dirty to me.

LICENSE

New BSD License. See the LICENSE file.

Comments
  1. Rob Hudson, May 08, 2008 @ 02:12 AM

    That’s very awesome, empty! I had an inkling of the same thought when I was looking for a git python library but never bothered trying to create something. I especially like how you used the div method to simulate directory traversal… pretty slick.

    Is there a gitorious written in Django in your future? :)

  2. Eric Florenzano, May 08, 2008 @ 02:37 AM

    Wow, this looks really cool! I was wondering why there were no Python libraries for interfacing with Git. This must have taken you quite a while.

  3. Jannis Leidel, May 08, 2008 @ 02:46 AM

    Michael, this is great stuff, you rock!

    You definitely have been faster than me /me doing rm -rf ~/Code/gitpython

  4. Petar, May 08, 2008 @ 03:32 AM

    Nice work! Thank you very much for this.

  5. kevin, May 08, 2008 @ 03:16 PM

    looks great! congrats on the launch. thanks a bunch :)

  6. Nuno Mariz, May 08, 2008 @ 05:28 PM

    Cool stuff, Michael! Especially now that I’m moving all my projects to Git.

  7. Alan Briolat, May 11, 2008 @ 01:04 PM

    Awesome! I looked for a Python Git module a while ago and turned up nothing, and then today I looked again and found this. Nice work =)

  8. Eddy Mulyono, May 13, 2008 @ 02:59 AM

    Thanx for the release!

    I’ve been thinking about using git from Python. I was aware of Stacked Git (stgit), which is a quilt-like implementation on top of git written in Python.

    Have you looked at Stacked Git?

  9. Empty, May 13, 2008 @ 10:48 PM

    Eddy: I have not looked at it but will.

    Thanks for all the encouraging comments everyone.

  10. James Snyder, May 14, 2008 @ 10:45 PM

    Excellent. I was looking around for something similar to this a few weeks ago after getting turned on to git. I’d found the ruby implementation, and am glad to see a native python version.

    I’ve been using git hooks for deploying a small django app. It’ll be nice to be able to directly work with git through python :-)

  11. Bruce, May 17, 2008 @ 11:22 AM

    Am I right that currently the library is for read-only access to a repository? Would be nice to be able to add files, and commit changes too. Might that be within scope for future enhancement?

    Am really intrigued by the notion of using git as a data store for a simple CMS, a la the Ruby-based git-wiki, hence the question.

  12. Nicola Larosa, May 22, 2008 @ 07:27 AM

    First, sorry for the contrarian point of view.

    I applaud you for one more effort to make a good tool usable in the Python world; however, why use a good tool when there is a great other one? ;-)

    In your posts I cannot find your reasons for choosing Git; anyway, I would like to plug the jewel that Mercurial is. Apart from being written in Python, I find it more pythonic in simplicity of both usage and implementation.

    I summarized its many virtues here:

    http://lwn.net/Articles/274823/

    Again, not intending to rain on your parade, and still curious about your reasons for going with git.

  13. M, May 25, 2008 @ 04:24 PM

    For anyone for whom git hasn’t clicked yet, check out this fine reading: http://www.newartisans.com/blog_files/git.from.bottom.up.php

  14. td, June 06, 2008 @ 04:34 AM

    what a good work!

  15. griff, June 11, 2008 @ 03:36 PM

    You might want to check out git-issues, a bug tracker for git written in python. It’s at github: http://github.com/ktf/git-issues/tree/master

    I’m thinking about making a django frontend…
