Stubbisms – Tony’s Weblog

July 10, 2009

Git Script to Show Largest Pack Objects and Trim Your Waist Line!

Filed under: Java | Antony Stubbs @ 2:07 pm

This is a script I put together after migrating the Spring Modules project from CVS using git-cvsimport (which I also had to patch to get it working on OS X / MacPorts). I wrote it because I wanted to get rid of all the large jar files, documentation and so on that had been put into source control. However, if _large files_ have already been deleted in the latest revision, they can be hard to track down.

The script effectively sidesteps this limitation: it simply goes through the list of all objects in your pack file (so try to run git gc first, so that all your objects are in your pack) and lists the ten largest, showing you their information. Then, with the file locations, you can run:

# remove a tree from entire repo history
git filter-branch --index-filter "git rm -rf --cached --ignore-unmatch $files" HEAD

# pull in a repo without the junk
git pull file://$(pwd)/myGitRepo

Which will remove them from your entire history, trimming your waist line nicely! But be sure to follow the advice in the man page for filter-branch: there are things you should be aware of, such as old tags (that one got me). Rather than messing around trying to get it exactly right, I actually just retagged the new repo by matching the dates of the tags from the initial cvsimport. There were only 9, after all!

But for reference, here is the command I’m referring to, from the git-filter-branch man page:

You really filtered all refs: use --tag-name-filter cat -- --all when calling git-filter-branch.
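To make that advice concrete, here is a minimal sketch of a filter-branch run that rewrites every branch and tag in one go. It builds a throwaway repository first, so the file name big.jar, the tag v1 and the demo identity are assumptions for illustration only, not details from the Spring Modules migration:

```shell
#!/bin/bash
# Sketch: rewrite all refs (branches and tags) so a large file vanishes from history.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # silences the notice newer git versions print

# Throwaway repository: one commit adds a jar, a tag points at it, a later commit deletes it.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"   # hypothetical identity, demo only
git config user.name "Demo"
echo "source" > Main.java
head -c 200000 /dev/urandom > big.jar
git add .
git commit -q -m "add code and jar"
git tag v1
git rm -q big.jar
git commit -q -m "remove jar from latest revision"

files="big.jar"
# --tag-name-filter cat keeps each tag's name and points it at the rewritten commit;
# -- --all asks filter-branch to rewrite every ref rather than just HEAD.
git filter-branch -f --index-filter "git rm -rf --cached --ignore-unmatch $files" \
    --tag-name-filter cat -- --all > /dev/null 2>&1

# Count how many entries named big.jar are still reachable from the tag.
remaining=$(git ls-tree -r --name-only v1 | grep -cxF 'big.jar' || true)
echo "big.jar entries still reachable from v1: $remaining"
```

Without the tag name filter, the old v1 tag would keep pointing at the original commit, and the jar would stay reachable through it.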

There are a few different suggestions for how to remove the loose objects from your repository in order to _really_ make it shrink straight away. My favourite is from the man page:

git-filter-branch is often used to get rid of a subset of files, usually with some combination
of --index-filter and --subdirectory-filter. People expect the resulting repository to be
smaller than the original, but you need a few more steps to actually make it smaller, because
git tries hard not to lose your objects until you tell it to. First make sure that:

o You really removed all variants of a filename, if a blob was moved over its lifetime. git
log --name-only --follow --all -- filename can help you find renames.

o You really filtered all refs: use --tag-name-filter cat -- --all when calling
git-filter-branch.

Then there are two ways to get a smaller repository. A safer way is to clone, that keeps your
original intact.

o Clone it with git clone file:///path/to/repo. The clone will not have the removed objects.
See git-clone(1). (Note that cloning with a plain path just hardlinks everything!)
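The clone step from that excerpt can be sketched as follows. This is only an illustration under assumptions: the repository is a throwaway one created on the spot, and big.jar and the demo identity are made-up names. It checks whether the filtered-out blob still exists in the original repository and in a file:// clone of it:

```shell
#!/bin/bash
# Sketch: after filter-branch, compare the original repo with a file:// clone.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # silences the notice newer git versions print

work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
git config user.email "demo@example.com"   # hypothetical identity, demo only
git config user.name "Demo"
echo "source" > Main.java
head -c 200000 /dev/urandom > big.jar
git add .
git commit -q -m "add code and jar"

# Remember the jar's object id, then filter the jar out of all refs.
blob=$(git rev-parse HEAD:big.jar)
git filter-branch -f --index-filter "git rm -rf --cached --ignore-unmatch big.jar" \
    -- --all > /dev/null 2>&1

# The original keeps the object around (backup refs and reflog still reference it).
if git cat-file -e "$blob" 2>/dev/null; then in_original=yes; else in_original=no; fi

# A file:// clone goes through the normal transfer and only receives reachable objects.
git clone -q "file://$work/repo" "$work/clone"
if git -C "$work/clone" cat-file -e "$blob" 2>/dev/null; then in_clone=yes; else in_clone=no; fi

echo "blob present in original: $in_original, in clone: $in_clone"
```

The file:// prefix matters here: as the man page notes, a plain path clone just hardlinks the existing object files, removed objects included.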

Apart from the section on “are your objects _really_ loose?”, the most useful bit of information was to run the git-pull command, which someone suggested in a discussion on the git mailing list. This was the only thing that actually worked for me, contrary to what the man page states about git-clone. However, be careful, as git pull by default doesn’t pull over all information…
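As a rough sketch of the pull approach, the following pulls a repository into a freshly initialised one and then copies the tags across explicitly. The directory names slim and myGitRepo, the tag v1 and the demo identity are assumptions for the example, and the explicit tag refspec is just one way to bring over refs that a bare pull may leave behind:

```shell
#!/bin/bash
# Sketch: pull a repo into a fresh one over file://, then fetch its tags explicitly.
set -e
work=$(mktemp -d)

# Stand-in for the already filtered source repository.
git init -q "$work/myGitRepo"
cd "$work/myGitRepo"
git config user.email "demo@example.com"   # hypothetical identity, demo only
git config user.name "Demo"
echo "source" > Main.java
git add Main.java
git commit -q -m "initial import"
git tag v1

# Fresh repository that receives only what is reachable from the pulled branch.
git init -q "$work/slim"
cd "$work/slim"
git pull -q "file://$work/myGitRepo"

# Copy all tags over as well, since the pull above names no tag refs.
git fetch -q "file://$work/myGitRepo" 'refs/tags/*:refs/tags/*'

tags=$(git tag -l)
echo "tags in the new repo: $tags"
```

Other branches can be brought over the same way with a matching refs/heads refspec.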

And without further ado, here is the script:

#!/bin/bash
#set -x

# Shows you the largest objects in your repo's pack file.
# Written for OS X.
#
# @see http://stubbisms.wordpress.com/2009/07/10/git-script-to-show-largest-pack-objects-and-trim-your-waist-line/
# @author Antony Stubbs

# set the internal field separator to line break, so that we can iterate easily over the verify-pack output
IFS=$'\n';

# list all objects including their size, sort by size, take top 10
objects=$(git verify-pack -v .git/objects/pack/pack-*.idx | grep -v chain | sort -k3nr | head)

echo "All sizes are in kB. The pack column is the size of the object, compressed, inside the pack file."

output="size,pack,SHA,location"
for y in $objects
do
    # extract the size in bytes and convert to kB (awk splits on runs of spaces, so the padded type column is harmless)
    size=$(( $(echo "$y" | awk '{print $3}') / 1024 ))
    # extract the compressed size in bytes and convert to kB
    compressedSize=$(( $(echo "$y" | awk '{print $4}') / 1024 ))
    # extract the SHA
    sha=$(echo "$y" | awk '{print $1}')
    # find the object's location in the repository tree (prints "SHA path")
    other=$(git rev-list --all --objects | grep "$sha")
    output="${output}\n${size},${compressedSize},${other}"
done

echo -e "$output" | column -t -s ', '
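Pulling the columns out of git verify-pack -v output is sensitive to how the object type column is padded. The sketch below uses two made-up sample lines (the SHAs and numbers are hypothetical, laid out as SHA, type, size, size in pack, offset) to show how splitting on runs of whitespace with awk compares with splitting on single spaces with cut:

```shell
#!/bin/bash
# Hypothetical sample lines in the layout of `git verify-pack -v`:
# SHA, type (padded to a fixed width), size, size in pack, offset.
blob_line="0123456789abcdef0123456789abcdef01234567 blob   2097152 524288 12"
commit_line="89abcdef0123456789abcdef0123456789abcdef commit 300 200 524300"

# awk splits on runs of whitespace, so the size is column 3 for both lines.
size_kb=$(( $(echo "$blob_line" | awk '{print $3}') / 1024 ))
packed_kb=$(( $(echo "$blob_line" | awk '{print $4}') / 1024 ))
commit_size=$(echo "$commit_line" | awk '{print $3}')

# cut splits on every single space, so padding creates empty fields and the
# position of the size depends on the length of the type name.
blob_field5=$(echo "$blob_line" | cut -f 5 -d ' ')
commit_field5=$(echo "$commit_line" | cut -f 5 -d ' ')

echo "blob: ${size_kb} kB, ${packed_kb} kB packed; commit size: ${commit_size} bytes"
echo "cut field 5: blob=${blob_field5} (size), commit=${commit_field5} (offset)"
```

For blobs and trees both approaches find the size, which is what matters when hunting large files; the difference only shows up for other object types.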

Thanks to David Underhill for the inspiration, and the various posts on the git mailing list!

For other migration tips (svn) – see here: http://fpereda.wordpress.com/2008/06/11/how-i-migrated-paludis-to-git/

P.s. if someone tries running the script on Linux or Cygwin and it needs modifying, let me know and I’ll post the modified versions all next to each other in this article.

July 30, 2008

git rebase, svn and whitespace

Filed under: Java | Antony Stubbs @ 10:35 pm

Working behind an svn repository using the excellent git-svn tool in a Windows environment, you inevitably run into the problem of trailing whitespace.

Depending on your goals, the first step is to disable the pre-commit hook that checks for it, in .git/hooks/pre-commit. Easy.
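One way to disable a hook is simply to rename it, since git only runs hooks that have the exact expected file name. A minimal sketch, assuming a helper called disable_hook (a hypothetical name) and demonstrated on a stand-in directory rather than a real repository:

```shell
#!/bin/bash
# Sketch: disable a git hook by renaming it so git no longer finds it.
disable_hook() {
    local hook="$1/.git/hooks/$2"
    if [ -f "$hook" ]; then
        mv "$hook" "$hook.disabled"
    fi
}

# Demo on a stand-in directory layout (hypothetical, not a real repository).
demo=$(mktemp -d)
mkdir -p "$demo/.git/hooks"
printf '#!/bin/sh\nexit 1\n' > "$demo/.git/hooks/pre-commit"

disable_hook "$demo" pre-commit
ls "$demo/.git/hooks"
```

For a one-off commit, git commit --no-verify skips the pre-commit hook without touching the file.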

I ran into another interesting whitespace problem when running the git-rebase command. Basically, the first stage worked fine, but when git went to apply the generated patch it failed, complaining with various errors such as:

error: NotificationService/launchConfigs/All JUnits.launch: does not match index

and

warning: squelched 104 whitespace errors
warning: 109 lines add whitespace errors.

and

error: Entry 'NotificationService/launchConfigs/All JUnits.launch' not uptodate. Cannot merge.

This is all very strange, because the patch should apply cleanly without any problem as there is no chance to modify any of these files during the process.

The short story is that, after an initial panic, I thought back and remembered the core.whitespace setting.

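By running:

git-config core.whitespace=nowarn

the intent is to tell git to ignore whitespace issues. I’m fairly sure my problem was caused by git trying to fix or translate them during the rebase process. Note that git config normally takes the key and the value as separate arguments, and nowarn is documented as a value for apply.whitespace (the setting behind git apply’s --whitespace option), so git config apply.whitespace nowarn may be the form that actually takes effect.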

If you look at

man git-apply

you can see one of the options is:

fix outputs warnings for a few such errors, and applies the patch after fixing
them (strip is a synonym --- the tool used to consider only trailing whitespaces as
errors, and the fix involved stripping them, but modern gits do more).
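The whitespace action described in that excerpt can also be chosen through configuration. A minimal sketch, assuming the documented apply.whitespace key and git rebase's --whitespace option; neither appears in the original workflow, which used core.whitespace, and the repository here is a throwaway one:

```shell
#!/bin/bash
# Sketch: select the "nowarn" whitespace action for git apply via configuration.
set -e
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Key and value are passed as separate arguments.
git config apply.whitespace nowarn

# Read the value back to confirm it was stored.
setting=$(git config apply.whitespace)
echo "apply.whitespace is now: $setting"

# For a single rebase the same action can be given on the command line instead:
#   git rebase --whitespace=nowarn master
```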

I believe this may have something to do with this bug in msysgit – “Problems patching for git rebase / git am”.

The work flow went like this:

stubbsa@VFNZV95336 /cygdrive/c/stubbs_dev_merge_git/oasis-cr
$ git rebase master
First, rewinding head to replay your work on top of it...
Applying added hibernate dep
.dotest/patch:15: trailing whitespace.

.dotest/patch:27: trailing whitespace.

.dotest/patch:28: trailing whitespace.
                        org.hibernate
.dotest/patch:29: trailing whitespace.
                        hibernate-annotations
.dotest/patch:30: trailing whitespace.
                        3.3.1.GA
warning: squelched 7 whitespace errors
warning: 12 lines add whitespace errors.
Applying Added hibernate dep, rev eng strategy, hibernate config for localhost and generation result.
.dotest/patch:24: trailing whitespace.

.dotest/patch:25: trailing whitespace.
                        org.hibernate.eclipse.console.hibernateBuilder
.dotest/patch:26: trailing whitespace.

.dotest/patch:27: trailing whitespace.

.dotest/patch:28: trailing whitespace.

warning: squelched 268 whitespace errors
warning: 273 lines add whitespace errors.
Applying Moved launch configurations.
.dotest/patch:26: trailing whitespace.

.dotest/patch:27: trailing whitespace.

.dotest/patch:28: trailing whitespace.

.dotest/patch:29: trailing whitespace.

.dotest/patch:30: trailing whitespace.

error: NotificationService/launchConfigs/All JUnits.launch: does not match index
error: NotificationService/launchConfigs/Dependency Tree.launch: does not match index
error: NotificationService/launchConfigs/Get Deps - runtime.launch: does not match index
Using index info to reconstruct a base tree...
:26: trailing whitespace.

:27: trailing whitespace.

:28: trailing whitespace.

:29: trailing whitespace.

:30: trailing whitespace.

warning: squelched 104 whitespace errors
warning: 109 lines add whitespace errors.
Falling back to patching base and 3-way merge...
error: Entry 'NotificationService/launchConfigs/All JUnits.launch' not uptodate. Cannot merge.
fatal: merging of trees 58e7f2e19e35fc8155e85f3238503f905e312053 and 1239e895b39ceab8e19b140cbf51f5ef7082201a failed
Failed to merge in the changes.
Patch failed at 0003.

When you have resolved this problem run "git rebase --continue".
If you would prefer to skip this patch, instead run "git rebase --skip".
To restore the original branch and stop rebasing run "git rebase --abort".

stubbsa@VFNZV95336 /cygdrive/c/stubbs_dev_merge_git/oasis-cr
$ git rebase --abort
HEAD is now at 5083609 Setup JPA using Hibernate

stubbsa@VFNZV95336 /cygdrive/c/stubbs_dev_merge_git/oasis-cr
$ git-config core.whitespace=nowarn

stubbsa@VFNZV95336 /cygdrive/c/stubbs_dev_merge_git/oasis-cr
$ git rebase -v master
Changes from cb70410d47e13640be9431d63373452a5e9d5c6e to 5548a15d3af44b8686c8e66b246128484a9feaef:
 NotificationService/lib/client-es-2.0.3.jar |  Bin 1939504 -> 0 bytes
 NotificationService/lib/client-es-2.0.5.jar |  Bin 0 -> 1998473 bytes
 2 files changed, 0 insertions(+), 0 deletions(-)
 delete mode 100644 NotificationService/lib/client-es-2.0.3.jar
 create mode 100755 NotificationService/lib/client-es-2.0.5.jar
First, rewinding head to replay your work on top of it...
Applying added hibernate dep
.dotest/patch:15: trailing whitespace.

.dotest/patch:27: trailing whitespace.

.dotest/patch:28: trailing whitespace.
                        org.hibernate
.dotest/patch:29: trailing whitespace.
                        hibernate-annotations
.dotest/patch:30: trailing whitespace.
                        3.3.1.GA
warning: squelched 7 whitespace errors
warning: 12 lines add whitespace errors.
Applying Added hibernate dep, rev eng strategy, hibernate config for localhost and generation result.
.dotest/patch:24: trailing whitespace.

.dotest/patch:25: trailing whitespace.
                        org.hibernate.eclipse.console.hibernateBuilder
.dotest/patch:26: trailing whitespace.

.dotest/patch:27: trailing whitespace.

.dotest/patch:28: trailing whitespace.

warning: squelched 268 whitespace errors
warning: 273 lines add whitespace errors.
Applying Moved launch configurations.
.dotest/patch:26: trailing whitespace.

.dotest/patch:27: trailing whitespace.

.dotest/patch:28: trailing whitespace.

.dotest/patch:29: trailing whitespace.

.dotest/patch:30: trailing whitespace.

warning: squelched 104 whitespace errors
warning: 109 lines add whitespace errors.
Applying Setup JPA using Hibernate
.dotest/patch:41: trailing whitespace.

.dotest/patch:44: trailing whitespace.

.dotest/patch:55: trailing whitespace.
                        org.eclipse.wst.common.project.facet.core.builder
.dotest/patch:56: trailing whitespace.

.dotest/patch:57: trailing whitespace.

warning: squelched 173 whitespace errors
warning: 178 lines add whitespace errors.

stubbsa@VFNZV95336 /cygdrive/c/stubbs_dev_merge_git/oasis-cr
$ gitk

As you can see, it still complains, but the rebase seems to have worked nicely.

There may be severe repercussions of using core.whitespace=nowarn (that’s a bit of a joke), but I haven’t thought about it much yet. We’ll see… All that really matters in our case is that the svn dcommits get through looking sane.
