
Introduction

In this article, we’re going to explore the tasks involved in moving a larger project to Apache Maven. By “larger project”, we mean a project with several deliverable artifacts, like ‘jar’ files or ‘ear’ files, and sophisticated dependencies, both among the project artifacts and on external artifacts. The article is aimed at people who have development experience but little experience with Maven.

Some Background on Maven

Apache Maven is described on the Maven web site as follows:

Apache Maven is a software project management and comprehension tool. Based on the concept of a project object model (POM), Maven can manage a project’s build, reporting and documentation from a central piece of information.

As the official description hints, Maven is more than a build system. Among other things, it can manage dependencies – the jar files and other artifacts that any given application module needs in order to compile and to operate at runtime. Given a list of dependencies, Maven will take care of retrieving those dependencies from a central repository, relieving the developer of managing a store of libraries. Further, Maven will track down transitive dependencies, where a dependency has dependencies of its own.

Maven has a rich ecosystem of plugins that can be used to process the project – plugins are the way that we encode reusable knowledge in Maven. So, for instance, the knowledge of how to perform test coverage analysis using the ‘Cobertura’ library is encoded into the ‘cobertura-maven-plugin’. If we want to perform this kind of analysis, we have very little configuration to do; we simply declare the ‘cobertura-maven-plugin’ in the ‘pom.xml’. As an additional benefit, Maven treats the plugins as dependencies, and can retrieve the plugins themselves from the central repository. The net result is that we can have highly sophisticated build processing with almost no configuration or installation on the build machine. As you can imagine, this self-configuring behaviour is extremely convenient when we’re setting up new development machines, or using a continuous integration system like Jenkins.
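As an illustration, declaring the Cobertura plugin in a module’s ‘pom.xml’ might look roughly like the fragment below. The group and artifact IDs are the ones published by the Codehaus Mojo project; the version number is an assumption, so check your repository for the release you actually want to standardize on.

```xml
<!-- Fragment of a module's pom.xml; it goes inside the <project> element. -->
<reporting>
  <plugins>
    <!-- Maven fetches this plugin (and the Cobertura library it needs)
         from the repository the first time the site is generated;
         nothing has to be installed on the build machine beforehand. -->
    <plugin>
      <groupId>org.codehaus.mojo</groupId>
      <artifactId>cobertura-maven-plugin</artifactId>
      <!-- Assumed version; pin whichever release you have vetted. -->
      <version>2.7</version>
    </plugin>
  </plugins>
</reporting>
```

With this in place, running ‘mvn site’ should produce a coverage report as part of the project site, assuming the plugin release supports the JDK you build with.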

There are some costs to this convenience – Maven makes heavy use of conventions to eliminate the need to configure plugins. It has very firm views on how a project should be structured, and how modules should be built. These views may not match your own preferred structure. If that’s the case, and you have strong opinions on how to create a build system, then Maven isn’t for you: you can’t simply tell Maven what to do, the way you might with Apache Ant, for instance.

Maven has become incredibly popular over the last few years – many recent open-source projects use Maven as their build system (although Maven is more than a build system, as discussed above).

When a project is set up properly to use Maven, the developers have an easy time running the build. Maven will automatically retrieve all the dependency files and tools used for the build. This convenience is attractive, so there’s often a strong drive to convert existing projects to use Maven.

In the corporate world, things get a little more complicated, as we often have legacy projects that have a pre-existing structure, and pre-existing build files, using more traditional tools like ‘make’ or Apache Ant. Many corporate projects use the builder that’s embedded in an IDE, like Rational Application Developer, or Eclipse. It’s tempting to think of the conversion as a process of re-implementing the build script as a Maven ‘Project Object Model’ (‘pom.xml’ or just ‘pom’), or somehow configuring Maven to mimic the existing build process. You’ll often see questions on the Maven users list that go something like “Maven wants the Java files in ‘src/main/java’, but my existing project has the source in ‘src’. How do I set Maven’s source directory?”. Such an approach is unequivocally doomed to failure. You don’t configure Maven. Maven configures you.

So how do we go about converting a project to use Maven?

Artifacts are the Key!

The process of setting up a Maven-based project revolves around one key observation about the Maven ‘pom.xml’ file: in Maven, we don’t describe how to build an artifact. Instead, we describe what the artifact should be. For example, here’s a fragment from a ‘pom.xml’ file (examples are taken from the ‘river-examples’ package at the Apache River project, http://river.apache.org):

<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <parent>
    <groupId>org.apache.river.examples</groupId>
    <artifactId>river-examples</artifactId>
    <version>1.0-SNAPSHOT</version>
  </parent>

  <groupId>org.apache.river.examples</groupId>
  <artifactId>browser</artifactId>
  <version>1.0-SNAPSHOT</version>
  <packaging>jar</packaging>

  <name>browser</name>
  <url>http://river.apache.org</url>
  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>

  <dependencies>
    <!-- Note that the versions for these are specified in the parent
         pom's dependency management section.
    -->
    <dependency>
      <groupId>net.jini</groupId>
      <artifactId>jsk-platform</artifactId>
    </dependency>
    <!-- ...other dependencies not shown... -->
  </dependencies>

</project>

The ‘packaging’ element specifies that the module defined by this ‘pom.xml’ produces a ‘jar’ file as its artifact. Further, the artifact produced by this module depends on a few other artifacts that will be retrieved from a central repository (or a mirror of it). If you check the schema for Maven’s ‘pom.xml’, you will find that only one ‘packaging’ element is allowed. So what happens if you have a project that produces more than one ‘jar’ file? That’s easy to arrange in an Ant script: you simply have multiple ‘jar’ tasks in the script.

In Maven, you can’t do that! A module can only produce one artifact.

Actually, that isn’t strictly true. There is a somewhat convoluted way to produce multiple artifacts, using multiple executions of the Maven Assembly plugin, but that really doesn’t conform to Maven’s philosophy of describing the artifact. The correct way is to use a multi-module project, with a module for each artifact that the project should produce.

It turns out that this observation is both the most important step towards getting along with Maven and the prime guidance for converting legacy projects. If you want to adopt Maven, you need to fully embrace the following principle:

A Maven module produces a single artifact, as described by its ‘pom.xml’ file. Hence, producing an artifact (‘jar’ file, ‘ear’ file, ‘war’ file or what-have-you) requires a Maven module.

So, when we want to convert an existing project to Maven, we can approach it as follows:

  1. Make sure your infrastructure is in place for source code and dependency management.
  2. Determine the scope of the project that we’re going to convert.
  3. Enumerate the artifacts that we need to produce.
  4. For each artifact, enumerate the artifact’s dependencies.
  5. Group artifacts into sets that should be built together.
  6. For each set of related artifacts, create a Maven parent module.
  7. For each artifact, create a Maven module under the appropriate parent module.
  8. Populate the modules with the appropriate source files (copied from the existing project) and flesh out the ‘pom.xml’ with the requisite build and site plugins.
  9. Ensure that Maven can find each dependency in a repository.

For that matter, we can also approach new projects using the same technique.

Let’s examine each of these in turn…

Make Sure Your Infrastructure is in Place

Source Code Management: Unfortunately, it’s a given that you’ll be restructuring the source code. To get full benefits from Maven, you need to adhere to Maven’s conventions for organizing the source. It’s not impossible to do that by moving folders in an existing SCM repository, but it’s probably easier to create new projects/repositories to hold the mavenized build. Depending on your SCM system, it might be difficult to keep the version history for the source files. In that case, you’ll need to freeze the existing repository and keep it around for reference. This might be a convenient time to switch to a new SCM if that’s part of your roadmap.

Artifact Repository: You really should have a repository manager to act as a publishing point for the products of your development process, and to act as a local cache and control point for dependencies that come from outside your company. Popular choices include Sonatype Nexus, JFrog Artifactory, and Apache Archiva.

As a side note, you’ll need to codify your release engineering process and set up the appropriate staging and final repositories.
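Once those repositories exist, the usual way to tell Maven where ‘mvn deploy’ should publish is a ‘distributionManagement’ section, typically placed in a parent ‘pom.xml’ so every module inherits it. A minimal sketch follows; the repository IDs and URLs are hypothetical placeholders for whatever your repository manager exposes.

```xml
<!-- Fragment of a (typically parent) pom.xml; it goes inside <project>.
     The ids and URLs below are hypothetical placeholders. -->
<distributionManagement>
  <!-- Target for release versions such as 1.0. -->
  <repository>
    <id>corp-releases</id>
    <url>https://repo.example.com/repository/releases/</url>
  </repository>
  <!-- Target for versions ending in -SNAPSHOT. -->
  <snapshotRepository>
    <id>corp-snapshots</id>
    <url>https://repo.example.com/repository/snapshots/</url>
  </snapshotRepository>
</distributionManagement>
```

Deployment credentials do not belong in the ‘pom.xml’; they go in each user’s ‘settings.xml’ under a ‘server’ entry whose ‘id’ matches the repository ‘id’ above.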

Determine the Scope of the Project

Just because you currently have a build system that creates 50 jar files doesn’t mean you have to create a new mavenized build for 50 artifacts. It’s perfectly acceptable to manually load dependencies into a repository; not everything has to be built from scratch. If you have portions of the build that are fairly stable, you might want to leave them in the present build system and concentrate your Maven efforts on the parts of a project that are going to be worked on in the near future. Don’t forget to account for quality assurance systems and/or integration tests in deciding the scope.

Enumerate the Artifacts That We Need To Produce

The easiest place to start is the end goal. Are we producing a web application that gets deployed to a server? Then we should look at the ‘ear’ file or ‘war’ file and see what artifacts it contains. Essentially every “jar”, “war”, or “ear” file in the project is an artifact of compiling and packaging. Since Maven is based on defining artifacts, these are the items that drive our conversion process.

For each artifact, have a look inside it and see if it contains any other artifacts that we’ll need to generate. For instance, a web archive (“war”) artifact will usually contain one or more libraries (as “jar” files) or tag libraries (as “tld” files). Make a list of the artifacts that are contained inside each artifact; these are the artifact’s “dependencies”, and we’ll need that information to create the Maven module that builds the artifact.

If a dependency is produced as part of the current project (i.e. it’s “in scope”), then you need to repeat the process, diving down into the artifacts until the dependency artifacts are “out of scope”.
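The result of this inventory maps directly onto the ‘dependencies’ section of the module that builds each artifact. The sketch below shows what that might look like for a hypothetical ‘war’ file containing an in-house tag library; all ‘com.example.shop’ coordinates are invented for illustration, and the servlet API stands in for an out-of-scope library supplied by the application server.

```xml
<!-- Hypothetical 'shop-webapp' module: the war that is the end goal. -->
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.example.shop</groupId>
  <artifactId>shop-webapp</artifactId>
  <version>1.0-SNAPSHOT</version>
  <packaging>war</packaging>

  <dependencies>
    <!-- In scope: built by another (hypothetical) module of this project,
         and packaged into WEB-INF/lib of the war. -->
    <dependency>
      <groupId>com.example.shop</groupId>
      <artifactId>shop-taglib</artifactId>
      <version>${project.version}</version>
    </dependency>
    <!-- Out of scope: provided by the application server at runtime,
         so it is on the compile classpath but not packaged. -->
    <dependency>
      <groupId>javax.servlet</groupId>
      <artifactId>javax.servlet-api</artifactId>
      <version>3.1.0</version>
      <scope>provided</scope>
    </dependency>
  </dependencies>
</project>
```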

For each artifact, we need to make a few decisions:

  • Assign the Maven coordinates (group ID, artifact ID, version) for the artifact.
  • Decide whether the artifact needs to be released and published on its own. In other words, does it need to go into a repository, or should it be treated as a “temporary” artifact that is only ever used in the one project that we’re building? For example, if you have an application-specific tag library that is packaged into the “war” file, you may not need to publish that “jar” file to the repository. Artifacts that are used by “out-of-scope” projects or actors (e.g. deployment artifacts) definitely need to be published to the artifact repository, as do any artifacts that go through their own “staging” or “QA” lifecycles.
  • If an artifact does need to be published, decide which repository it needs to be published to.
  • Don’t worry at this stage about what the parent project is for any given artifact; we’ll consider the module groupings in the next step.
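For the tag-library case described above, both decisions can be recorded in the module’s ‘pom.xml’. The sketch assumes hypothetical coordinates and uses the deploy plugin’s ‘maven.deploy.skip’ property to keep a “temporary” artifact out of the shared repository.

```xml
<!-- Hypothetical tag-library module that is only ever packaged into the war. -->
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <!-- Maven coordinates: group ID, artifact ID, version (all hypothetical). -->
  <groupId>com.example.shop</groupId>
  <artifactId>shop-taglib</artifactId>
  <version>1.0-SNAPSHOT</version>
  <packaging>jar</packaging>

  <properties>
    <!-- "Temporary" artifact: the maven-deploy-plugin skips it during
         'mvn deploy'. It is still installed to the local repository. -->
    <maven.deploy.skip>true</maven.deploy.skip>
  </properties>
</project>
```

Artifacts that do need publishing simply omit the property and inherit the repository settings from their parent.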

Group Artifacts into Sets

To recap, an artifact is the product of a module, so we’ll have a module for each artifact. We can group related modules together into a “multi-module project”, which is really a module that aggregates a set of other modules. This multi-module project is also known as a “parent module” or a “reactor module”. When we execute Maven on the parent module, it executes the lifecycle on all the child modules, in the order dictated by the dependency declarations in the modules.
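A minimal parent module for a group like this might look as follows. The module names and coordinates are hypothetical, and each ‘module’ entry is assumed to be a subdirectory containing that child’s ‘pom.xml’.

```xml
<!-- Hypothetical parent ("reactor") module aggregating two children. -->
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.example.shop</groupId>
  <artifactId>shop-parent</artifactId>
  <version>1.0-SNAPSHOT</version>
  <!-- An aggregator produces no jar of its own; its only artifact is the pom. -->
  <packaging>pom</packaging>

  <!-- Subdirectories holding the child modules. The build order is
       derived from the dependencies between them, not from this list. -->
  <modules>
    <module>shop-webapp</module>
    <module>shop-taglib</module>
  </modules>
</project>
```

Running ‘mvn install’ in the parent’s directory would build ‘shop-taglib’ before ‘shop-webapp’ if the latter declares a dependency on the former, even though it is listed second.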

When a module declares a dependency on another module that is not part of the current build, Maven tries to find it in the artifact repository. If that other module hasn’t published its snapshots to the artifact repository recently, you end up working with old versions of snapshot dependencies.

Clearly, there’s a tradeoff here – the more modules in a parent module, the longer the build is going to take, but we know that we’re building from the most recent source code (assuming you’re practicing a “commit early, commit often” policy on your source code). The fewer child modules in the parent module, the more we need to pay attention to publishing snapshots, and clearing out the snapshot cache.

Usually, the best idea is to group together modules that a single developer is likely to touch in a session. Then institute a policy of publishing a snapshot of every module every night (this is a good application for a continuous integration system).

The parent module will typically be a project with “pom” packaging. Note also that a “parent” module can itself be the “child” of another “parent” module. You’ll also need to decide how to put the parent modules into your source code management system: are they independent or hierarchical?
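A hierarchical arrangement might look like the sketch below: a project-level parent that inherits from a company-wide parent and pins dependency versions once, so that children can omit them (as the ‘browser’ module above does for ‘jsk-platform’). The ‘corporate-parent’ coordinates are hypothetical.

```xml
<!-- Hypothetical project parent that is itself a child of a corporate parent. -->
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <parent>
    <groupId>com.example</groupId>
    <artifactId>corporate-parent</artifactId>
    <version>3</version>
    <!-- Empty: resolve this parent from the repository rather than ../pom.xml. -->
    <relativePath/>
  </parent>

  <groupId>com.example.shop</groupId>
  <artifactId>shop-parent</artifactId>
  <version>1.0-SNAPSHOT</version>
  <packaging>pom</packaging>

  <!-- Declares versions (and scopes) once; children that depend on
       these artifacts can leave out the <version> element. -->
  <dependencyManagement>
    <dependencies>
      <dependency>
        <groupId>javax.servlet</groupId>
        <artifactId>javax.servlet-api</artifactId>
        <version>3.1.0</version>
        <scope>provided</scope>
      </dependency>
    </dependencies>
  </dependencyManagement>
</project>
```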

For Each Artifact, Create a Maven Module Under the Appropriate Parent Module

You can start populating the “pom.xml” file with the Maven coordinates and dependencies that you determined in earlier steps.

Populate the Modules and Flesh Out the POM

Populate the modules with the appropriate source files (copied from the existing project), following Maven’s standard directory layout (Java sources go in ‘src/main/java’, for example), and flesh out the ‘pom.xml’ with the requisite build and site plugins.
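Most build plugins need little or no configuration, but it is common to pin at least the compiler settings so the new build produces the same bytecode level as the legacy one. The fragment below is a sketch; the plugin version and the Java release are assumptions, and the ‘release’ parameter requires building on JDK 9 or later.

```xml
<!-- Fragment of a module's (or, better, the parent's) pom.xml; goes inside <project>. -->
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-compiler-plugin</artifactId>
      <!-- Assumed version; pin whichever release you have vetted. -->
      <version>3.11.0</version>
      <configuration>
        <!-- Placeholder: match the Java level the legacy build targeted. -->
        <release>8</release>
      </configuration>
    </plugin>
  </plugins>
</build>
```

Placing this in the parent, ideally under ‘pluginManagement’, keeps every child module compiling the same way.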

Ensure That Maven Can Find Each Dependency in a Repository

It is actually possible to have Maven look on a local file system for dependencies. It’s also possible to load artifacts into Maven’s local cache manually. Both these options are a very bad idea, though, because you sacrifice the repeatability of your build. The right thing to do is to make sure that each of your builds can be executed using only your corporate repository. There should be no jar files in your version control, and no reference to any jar files on a given machine (even to server libraries). Every single dependency and transitive dependency in your project should be available from your corporate repository.

Again, artifacts are the key here. You need to enumerate the dependencies for each artifact that your project generates, and make sure each dependency is available to Maven. For commonly available or open-source artifacts, you’ll need to identify the artifact and find out its “Maven coordinates”, or “G-A-V” (group ID, artifact ID, version), in Maven Central.

For any in-house artifacts that are generated by Maven projects, you simply need to make sure that they are loaded to the shared repository, or “deployed” in Maven terminology.

Artifacts that are not generated by Maven projects still need to be deployed to the shared repository. You can do this directly in the repository’s user interface (Nexus, Artifactory and Archiva all support this functionality), or you can run the ‘deploy-file’ goal of Maven’s ‘deploy’ plugin by hand. It’s also a good idea to create a ‘pom.xml’ for the artifact that simply lists the artifact’s dependencies, so that Maven can resolve transitive dependencies.
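Such a hand-written ‘pom.xml’ can be very small. The sketch below is for a hypothetical vendor jar (‘com.vendor:vendor-client’) that needs Commons Logging at runtime; it would be uploaded together with the jar, for example via the ‘deploy-file’ goal’s ‘pomFile’ parameter or the repository manager’s upload screen.

```xml
<!-- Hypothetical pom.xml published alongside a vendor jar that was not built with Maven. -->
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.vendor</groupId>
  <artifactId>vendor-client</artifactId>
  <version>4.2</version>
  <packaging>jar</packaging>

  <!-- No build section: this pom only describes what the jar needs,
       so modules that use it get these dependencies transitively. -->
  <dependencies>
    <dependency>
      <groupId>commons-logging</groupId>
      <artifactId>commons-logging</artifactId>
      <version>1.2</version>
    </dependency>
  </dependencies>
</project>
```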

Either way, these artifacts should go through your release engineering process. Your repository becomes the “system of record” for your deployment, so you need to apply appropriate controls to the artifacts that get loaded into your repository. In general, you don’t want individual developer machines going out to Maven Central. Instead, point them at your corporate repository, and set up the requisite caching or proxy behaviour there.
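Pointing a machine at the corporate repository is normally done with a mirror entry in the user’s ‘settings.xml’ (usually ‘~/.m2/settings.xml’) on each developer machine and CI agent. A minimal sketch, with a hypothetical mirror ID and URL:

```xml
<!-- ~/.m2/settings.xml; the id and URL are hypothetical placeholders. -->
<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0 https://maven.apache.org/xsd/settings-1.0.0.xsd">
  <mirrors>
    <mirror>
      <id>corp-repo</id>
      <!-- '*' sends every repository request, including Maven Central,
           through the corporate repository manager, which proxies and
           caches the external artifacts. -->
      <mirrorOf>*</mirrorOf>
      <url>https://repo.example.com/repository/maven-public/</url>
    </mirror>
  </mirrors>
</settings>
```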

Conclusion

Maven is “opinionated software”. The developers of Maven have strong opinions about how to lay out and structure a complex software build. Indeed, Maven was originally created to manage a complicated Apache project. As a result of these strong conventions, converting a project to use Apache Maven as its build system is non-trivial.

Nonetheless, once you understand the Maven philosophy (“A module produces an artifact”), converting to Maven is an exercise in tedium rather than a conceptual challenge. The benefits of a repeatable build and a consistent project structure far outweigh the (admittedly not insignificant) efforts of converting.
