On a recent client project, we found ourselves battling what became known as JCR installer hell. No, it wasn't quite as bad as whatever hell you might imagine. Yes, it was quite the inconvenience. If you've experienced it yourself, you'll immediately start nodding your head. Let me know if you've heard this one:
- You make some updates to a bundle or three locally.
- Your pull request is merged, or the watched branch is updated via push, and your CI box kicks off the build.
- You get notified of build success (or just keep Jenkins open and auto-refreshing if you're OCD like me) and anxiously hop onto the relevant environment to take a look at your changes.
- After 10 minutes of furiously clicking around, you begin scratching that spot on your head you always scratch when code you know was just built, and that worked locally, doesn't seem to be working anymore.
- You quietly contemplate your own existence...
- As a good developer, you paired your updates with some proper logging, like a wedge of cheese with a fine merlot, so you open up the log to see what's going on.
- You begin scratching that spot on your head you always scratch when the code you know was just built, the code that contained your new logging, isn't, well, logging.
A Quick Word on the JCR Installer
One way of installing, say, a bundle is to do so manually through Apache Felix. However, this is not recommended, and having to manually update a bundle every time you change it is far from ideal. Further, we install most everything else in AEM projects via content package anyway (ideally via the Maven content package plugin). Why should our bundles be any different? Thanks to the JCR Installer Provider of Sling, they don't have to be. The JCR Installer Provider is what allows us to follow best practice and deploy our bundles via an 'install' directory under /apps.
What's actually happening under the hood is that the JCR Installer Provider continually scans the repository using standard Sling resolution rules and uses a regular expression to match directories whose names start with the word 'install'. Any new artifacts it finds during these scans (such as an updated bundle) are passed over to the OSGi Installer, which then actually does the installation. It's worth mentioning that, in Sling, a 'provider' is intended to find and 'provide' items to an installer.
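To make the folder matching a little more concrete, here is a minimal sketch of the idea in plain Java. The pattern, the class name, and the sample paths are all made up for illustration; this is not Sling's actual expression or code. It also assumes a slightly stricter rule than "starts with install": the folder must be named 'install', optionally followed by a dot-separated suffix.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class InstallFolderMatcher {

    // Hypothetical pattern for illustration only; NOT Sling's real default.
    // Matches a folder named "install", optionally followed by ".something".
    private static final Pattern INSTALL_FOLDER = Pattern.compile("^install(\\..+)?$");

    // Returns true if the last segment of the path looks like an install folder.
    static boolean isInstallFolder(String path) {
        String trimmed = path.endsWith("/") ? path.substring(0, path.length() - 1) : path;
        String name = trimmed.substring(trimmed.lastIndexOf('/') + 1);
        return INSTALL_FOLDER.matcher(name).matches();
    }

    // Keeps only the paths a scan would treat as install folders.
    static List<String> installFolders(List<String> paths) {
        return paths.stream()
                .filter(InstallFolderMatcher::isInstallFolder)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Sample paths are invented for this example.
        List<String> candidates = List.of(
                "/apps/myproject/install",
                "/apps/myproject/install.author",
                "/apps/myproject/components",
                "/apps/myproject/installer-notes");
        System.out.println(installFolders(candidates));
    }
}
```

With the sample paths above, only the first two would be picked up; 'components' and 'installer-notes' are ignored.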
Okay so what happened to my code...
Sometime around August 2014, the Sling community updated the JCR Installer Provider to introduce a new mechanism where installations could be put on hold to allow time for another installation to complete. While this makes sense, especially given just how large some package installations can be for sizable teams with reasonably large code bases, it means that the package installer now has a new responsibility: create a node under /system/sling/installer/jcr/pauseInstallation at the start of an installation and remove that node at the conclusion. What ended up happening is that, sometimes, installation remains paused indefinitely, for any of a number of reasons. While a ticket (SLING-5421) exists and seeks to resolve the issue within Sling, it looks like Adobe, as of the newly released AEM 6.2, may have already resolved it (see the fourth bullet under 'Deployment'), potentially as part of an update to CRX.
So until you're on 6.2, or until SLING-5421 gets resolved, if you're finding that your bundles aren't updating appropriately, make sure you check the /system/sling/installer/jcr/pauseInstallation node for any children. If you see any, and you're not actively installing, delete them, save, and the JCR Installer Provider should then pass your updated bundle(s) over to the OSGi Installer.
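If you'd rather not click through CRXDE every time, the check can be scripted. Below is a rough sketch in Java 11+ that asks Sling's default JSON rendering for the pause node at depth 1 and lists any child nodes it finds. Several things here are assumptions on my part: the class name, the base URL and credentials you pass in, the use of basic auth, and the deliberately naive regex used in place of a real JSON parser. The node path is the one given earlier, /system/sling/installer/jcr/pauseInstallation. It only reports what it finds; deleting is left to you.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PauseNodeCheck {

    static final String PAUSE_PATH = "/system/sling/installer/jcr/pauseInstallation";

    // Naive assumption: in the JSON rendering, a child node appears as "name": { ... },
    // while properties are plain values. A real JSON parser would be more robust.
    private static final Pattern CHILD = Pattern.compile("\"([^\"]+)\"\\s*:\\s*\\{");

    // Extracts the names of child nodes from a depth-1 JSON rendering of a node.
    static List<String> childNames(String json) {
        List<String> names = new ArrayList<>();
        Matcher m = CHILD.matcher(json);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    // Fetches the pause node as JSON (depth 1) using basic auth.
    static String fetch(String baseUrl, String user, String password) throws Exception {
        String token = Base64.getEncoder()
                .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        HttpRequest request = HttpRequest.newBuilder(URI.create(baseUrl + PAUSE_PATH + ".1.json"))
                .header("Authorization", "Basic " + token)
                .GET()
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IllegalStateException("Unexpected status " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 3) {
            System.out.println("usage: PauseNodeCheck <baseUrl> <user> <password>");
            return;
        }
        List<String> children = childNames(fetch(args[0], args[1], args[2]));
        if (children.isEmpty()) {
            System.out.println("No pause markers found.");
        } else {
            System.out.println("Pause markers present: " + children);
        }
    }
}
```

You would run it against your own instance, for example with a base URL such as http://localhost:4502 and an account that can read /system. If it reports markers and nobody is in the middle of an installation, those are the children to delete.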