Optimizing your maven2 build time

Or perhaps I should say, "how to not slow your maven2 build process to a crawl".

There has been some discussion on the Hudson mailing list recently about the inordinate amount of time (40+ minutes) it takes to build Hudson from scratch. In maven2 terms, "from scratch" means you start with an empty local repository. Normally, the only time you have an empty local repository is when you've just installed maven and haven't run any build yet. The first time you run a build, maven goes out to its central repository and downloads the libraries the build depends on (called artifacts). These artifacts are then cached in the local repository.
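
If you want to time a from-scratch build yourself without throwing away your everyday cache, one option is to point maven at an empty local repository through the <localRepository> element of your user settings.xml (by default the cache lives in ~/.m2/repository). A minimal sketch, where the directory path is a hypothetical placeholder:

```xml
<!-- ~/.m2/settings.xml -->
<settings>
  <!-- Hypothetical path: an empty directory makes the next build behave like a
       first build, so every artifact is downloaded again.
       Remove this element to go back to the default ~/.m2/repository. -->
  <localRepository>/tmp/empty-m2-repo</localRepository>
</settings>
```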

Anyone can publish their own artifacts to maven's central repository; the process for doing so is documented here.

Maven also lets you specify your 'own' repositories, which are searched during the build in addition to maven's central repository. For example, you could store your company's artifacts on a server that is visible on the intranet but invisible to the outside world. Such a custom repository is declared in your pom.xml, in the <repositories> element. There are a few caveats with this, however, and they are what this post is about.
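
As an illustration, a pom.xml pointing at such an intranet repository might look like the sketch below. The project coordinates, repository id and URL are hypothetical placeholders, not values from any real project:

```xml
<project>
  <modelVersion>4.0.0</modelVersion>
  <!-- Hypothetical project coordinates -->
  <groupId>com.example</groupId>
  <artifactId>my-app</artifactId>
  <version>1.0-SNAPSHOT</version>

  <repositories>
    <!-- Hypothetical company repository, reachable only from the intranet -->
    <repository>
      <id>company-internal</id>
      <url>http://repo.intranet.example.com/maven2</url>
    </repository>
  </repositories>
</project>
```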

Caveat 1
Following good inheritance principles, most people define their <repositories> in the root pom. This makes sense: you define them once and they are visible to all child modules. This way your custom artifacts become available to every module in the build. Neat, and problem solved, right? Well... sort of, but not quite. The catch is that maven2 will now search for artifacts in your custom repositories before it contacts its own central repository.
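
To make the inheritance concrete, here is a sketch of a root pom that declares a repository once, and a child module that picks it up through its <parent> element. The com.example coordinates and the module name core are hypothetical:

```xml
<!-- root pom.xml: repositories are declared once, here -->
<project>
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.example</groupId>
  <artifactId>parent</artifactId>
  <version>1.0-SNAPSHOT</version>
  <!-- An aggregating parent must use pom packaging -->
  <packaging>pom</packaging>
  <modules>
    <module>core</module>
  </modules>
  <repositories>
    <repository>
      <id>java.net2</id>
      <url>http://download.java.net/maven/2/</url>
    </repository>
  </repositories>
</project>

<!-- core/pom.xml: inherits groupId, version and <repositories> from the parent -->
<project>
  <modelVersion>4.0.0</modelVersion>
  <parent>
    <groupId>com.example</groupId>
    <artifactId>parent</artifactId>
    <version>1.0-SNAPSHOT</version>
  </parent>
  <artifactId>core</artifactId>
</project>
```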

So imagine you have this in your pom:


<repositories>
  <repository>
    <id>java.net2</id>
    <url>http://download.java.net/maven/2/</url>
  </repository>
  <repository>
    <id>java.net1</id>
    <url>http://download.java.net/maven/1/</url>
  </repository>
</repositories>

You will most likely get this during your build:

Downloading: http://download.java.net/maven/2/commons-collections/commons-collections/2.0/commons-collections-2.0.pom
Downloading: http://download.java.net/maven/1/commons-collections/commons-collections/2.0/commons-collections-2.0.pom
Downloading: http://repo1.maven.org/maven2/commons-collections/commons-collections/2.0/commons-collections-2.0.pom
171b downloaded

You can clearly see that maven contacts your custom repositories before maven central: for an artifact that actually lives on central, it first makes a failed request to each custom repository in turn. If you're building from scratch, these wasted round trips add up and can easily double your build time, because something like 95% of your dependencies (direct and transitive) are located on central anyway, and perhaps 5% are custom artifacts specific to your project.

The solution is simple: declare maven central as a custom repository, listed before your 'real' custom repositories:

<repositories>
  <repository>
    <id>central</id>
    <url>http://repo1.maven.org/maven2</url>
  </repository>
  <repository>
    <id>java.net2</id>
    <url>http://download.java.net/maven/2/</url>
  </repository>
  <repository>
    <id>java.net1</id>
    <url>http://download.java.net/maven/1/</url>
  </repository>
</repositories>

Now maven central is contacted first, and if the dependency is found there, maven doesn't bother contacting the other repositories. This speeds up the lookup for the roughly 95% of our dependencies that live on central; the extra lookups now only affect the few artifacts we actually pull from our custom repositories.
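
One detail worth knowing: because the entry above uses the id central, the same id Maven 2's built-in super POM uses for its central repository, it replaces that built-in definition rather than adding a second one. The super POM also disables snapshot lookups on central, and a plain redeclaration like the one above does not carry that setting over. If you want to keep that behaviour, a sketch of the entry with the setting spelled out (assuming none of your SNAPSHOT dependencies are expected to come from central):

```xml
<repository>
  <id>central</id>
  <url>http://repo1.maven.org/maven2</url>
  <!-- Same as the super POM: never ask central for SNAPSHOT artifacts -->
  <snapshots>
    <enabled>false</enabled>
  </snapshots>
</repository>
```

This entry would take the place of the first <repository> in the list above.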

I will cover the other caveats, and how to work around them, in a future post.
