Stubbisms – Tony’s Weblog

July 10, 2009

Git Script to Show Largest Pack Objects and Trim Your Waist Line!

Filed under: Java — Tags: , , , , — Antony Stubbs @ 2:07 pm

This is a script I put together after migrating the Spring Modules project from CVS, using git-cvsimport (which I also had to patch, to get to work on OS X / MacPorts). I wrote it because I wanted to get rid of all the large jar files, and documentation etc, that had been put into source control. However, if _large files_ are deleted in the latest revision, then they can be hard to track down.

The script effectively side step this limitation, as it simply goes through a list of all objects in your pack file (so try and run git gc first, so that all your objects are in your pack), and list the top largest files, showing you their information. The, with the file locations, you can then run:

# remove a tree from entire repo history
git filter-branch --index-filter "git rm -rf --cached --ignore-unmatch $files" HEAD

# pull in a repo without the junk
a git pull file://$(pwd)/myGitRepo

Which will remove them from your entire history, trimming your waist line nicely! But be sure to follow the advice from the man page for filter-branch – there’s things you should be aware of, such as old tags (that one got me) etc… Rather than messing around trying to get it exactly right, I actually just retagged the new repo by matching the dates of the tags from the initial cvsimport – there were only 9 after all!

But for reference, here is the command I’m referring to, from the git-filter-branch man page:

You really filtered all refs: use –tag-name-filter cat — –all when calling git-filter-branch.

There’s a few different suggestions as to how to remove the loose objects from your repository, in order to _really_ make it shrink straight away, my favourite being from the man page:

git-filter-branch is often used to get rid of a subset of files, usually with some combination
of –index-filter and –subdirectory-filter. People expect the resulting repository to be
smaller than the original, but you need a few more steps to actually make it smaller, because
git tries hard not to lose your objects until you tell it to. First make sure that:

o You really removed all variants of a filename, if a blob was moved over its lifetime. git
log –name-only –follow –all — filename can help you find renames.

o You really filtered all refs: use –tag-name-filter cat — –all when calling
git-filter-branch.
Then there are two ways to get a smaller repository. A safer way is to clone, that keeps your
original intact.

o Clone it with git clone file:///path/to/repo. The clone will not have the removed objects.
See git-clone(1). (Note that cloning with a plain path just hardlinks everything!)

Apart from the section on “are your objects _really_ loose?”, the most useful bit of information was running the git-pull command, which someone suggested from the discussion on the git mailing list. This was the only thing that actually worked for me, contrary to what it states about git-clone. However, be careful, as git pull by default doesn’t pull over all information…

And without further a due, here is the script:

#!/bin/bash
#set -x 

# Shows you the largest objects in your repo's pack file.
# Written for osx.
#
# @see http://stubbisms.wordpress.com/2009/07/10/git-script-to-show-largest-pack-objects-and-trim-your-waist-line/
# @author Antony Stubbs

# set the internal field spereator to line break, so that we can iterate easily over the verify-pack output
IFS=$'\n';

# list all objects including their size, sort by size, take top 10
objects=`git verify-pack -v .git/objects/pack/pack-*.idx | grep -v chain | sort -k3nr | head`

echo "All sizes are in kB's. The pack column is the size of the object, compressed, inside the pack file."

output="size,pack,SHA,location"
for y in $objects
do
	# extract the size in bytes
	size=$((`echo $y | cut -f 5 -d ' '`/1024))
	# extract the compressed size in bytes
	compressedSize=$((`echo $y | cut -f 6 -d ' '`/1024))
	# extract the SHA
	sha=`echo $y | cut -f 1 -d ' '`
	# find the objects location in the repository tree
	other=`git rev-list --all --objects | grep $sha`
	#lineBreak=`echo -e "\n"`
	output="${output}\n${size},${compressedSize},${other}"
done

echo -e $output | column -t -s ', '

Thanks to David Underhill for the inspiration, and the various posts on the git mailing list!

For other migration tips (svn) – see here: http://fpereda.wordpress.com/2008/06/11/how-i-migrated-paludis-to-git/

P.s. if someone tries running the script on Linux or Cygwin and it needs modifying, let me know and I’ll post the modified versions all next to each other in this article.

July 8, 2009

Spring Modules Fork

Filed under: Java — Tags: , , , — Antony Stubbs @ 10:45 pm
Ok guys – the project is up on git-hub!
Install Git! – http://git-scm.com/
To download an anonymous repository run http://github.com/astubbs/spring-modules or sign up with git-hub and get your personal account!
Let the patching begin!
From the Wiki page: “”"
This is a resurection of the extremely valuable and abandoned Spring-Modules project.
The plan is to fully embrace Maven as the build tool and eventually throw out all the old build code.
At this point, all the old jar libraries and generated documentation have been pruned from the repository history.
This pruning has reduced the size of the repository from 95m to 7m. Nice.
The idea will be to slowly add one by one, as the are compile ready, to the parent module section, so they can be included.
Msg/email me on antony.stubbs@gmail.com to discuss!
“”"
I will try and get a development mailing list setup asap, but for now, email me on antony.stubbs@gmail.com with the [subject spring-module-fork] and I’ll start building a manual list.

One thing I want to discuss is ow we can organise the issues on jira :/ I still have had no response from InspiSpring.

The project website: http://wiki.github.com/astubbs/spring-modules

Inspired by my work with Spring Modules (or difficulties with), and as explained in this post on the Spring Modules forum, I am now announcing the Spring Modules Fork! An effort to bring back to life some very useful software!

I have emailed various people and even emailed suggesting an extensions project as described on the Spring site[1] but haven’t had any responses.

As Spring Modules has been officially declared dead [2] and i’m not getting nay replies to my mails I would like to fork the project into git-hub and get some vital patches applied – particularly on my area of interest – the cache module. I also would like to migrate the whole project to maven.

I’m posting here to get any thoughts or suggestions regarding this – hopefully a response from someone at Spring – particularly Colin Yates.

Let’s get this useful software moving again!

Ok guys – the project is up on git-hub!

Install Git! – http://git-scm.com/

To download an anonymous repository run git clone git://github.com/astubbs/spring-modules.git or sign up with git-hub and get your personal account!

Let the patching begin!

From the Wiki page:

This is a resurrection of the extremely valuable and abandoned Spring-Modules project.

The plan is to fully embrace Maven as the build tool and eventually throw out all the old build code.

At this point, all the old jar libraries and generated documentation have been pruned from the repository history.

This pruning has reduced the size of the repository from 95m to 7m. Nice.

The idea will be to slowly add one by one, as the are compile ready, to the parent module section, so they can be included.

Msg/email me on antony.stubbs@gmail.com to discuss!

I will try and get a development mailing list setup asap, but for now, email me on antony.stubbs@gmail.com with the [subject spring-module-fork] and I’ll start building a manual list. (If anyone knows a free mail list setup, let me know).

One thing I want to discuss is how we can organise the issues on jira :/ I still have had no response from Spring.

P.s.  I’ve also setup a mailing list now:

Mailing List

Here are the essentials:

  • Group name: spring-modules-fork
  • Group home page: http://groups.google.com/group/spring-modules-fork
  • Group email address spring-modules-fork@googlegroups.com

Blog at WordPress.com.