Engineering

How monorepos can help improve your productivity

Xavi Portilla Edo
|
September 12, 2019

What is a Monorepo?

Tired of managing shared dependencies across multiple services/projects and tip-toeing around update sequencing? Monorepos can save you from dependency hell by streamlining the release process of multiple dependencies.

According to the Wikipedia definition of monorepo:

In version control systems, a monorepo ("mono" meaning 'single' and "repo" being short for 'repository') is a software development strategy where code for many projects is stored in the same repository. - Wikipedia

It is true that when we have multiple libraries with dependencies between them, maintenance becomes a complex task: every change to a shared library forces us to update, release, and test each dependent project in the right order.

To resolve this problem, we have adopted the monorepo structure:

A monorepo is a software development strategy where code for many projects is stored in the same repository
Image courtesy of Toptal

For JavaScript/TypeScript there are a lot of libraries that can help us build monorepos. The latest versions of the most common JS/TS package managers like {% c-line %}yarn{% c-line-end %} or {% c-line %}npm{% c-line-end %} support the monorepo structure. For our monorepos we are going to use the {% c-line %}lerna{% c-line-end %} library and {% c-line %}yarn{% c-line-end %} workspaces.
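As a rough sketch, wiring {% c-line %}lerna{% c-line-end %} to yarn workspaces usually comes down to a {% c-line %}lerna.json{% c-line-end %} at the repository root (the exact options depend on your setup and lerna version):

```json
{
  "npmClient": "yarn",
  "useWorkspaces": true,
  "version": "independent",
  "packages": ["packages/*"]
}
```

With {% c-line %}useWorkspaces{% c-line-end %} enabled, lerna delegates package discovery and dependency hoisting to yarn workspaces instead of managing them itself.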

Monorepo Structure

For the monorepo, we are going to look at the following structure:

a) We are going to have a root {% c-line %}/package.json{% c-line-end %} where the common configuration will live. With this change, we will not have duplicated, obsolete, or misconfigured settings in our packages because everything will be managed in the same place:

  1. Lint-staged
  2. Husky
  3. Prettier
  4. Eslint
  5. IDE configurations
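A hedged sketch of what such a root {% c-line %}package.json{% c-line-end %} could look like (package names and version ranges here are illustrative, not necessarily ours):

```json
{
  "private": true,
  "workspaces": ["packages/*"],
  "devDependencies": {
    "eslint": "^6.0.0",
    "husky": "^3.0.0",
    "lerna": "^3.16.0",
    "lint-staged": "^9.0.0",
    "prettier": "^1.18.0"
  },
  "husky": {
    "hooks": {
      "pre-commit": "lint-staged"
    }
  },
  "lint-staged": {
    "*.{js,ts}": ["prettier --write", "eslint --fix", "git add"]
  }
}
```

Because these tools live only at the root, every package inherits the same linting and formatting rules without repeating the configuration.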

b) In the {% c-line %}/packages/{% c-line-end %} folder, we will have all the packages. Each package will contain only the files it needs, which means only package-specific configuration lives there. For example, there we can find the TS build configuration or the Mocha configuration for testing.

This is a cleaner way to manage and maintain a monorepo. Find below the final structure of the monorepo:

A typical monorepo structure
Typical Repo Structure (Image courtesy of Sharath Holla)

Conventional Commits with Monorepo

{% c-line %}lerna{% c-line-end %} supports Conventional Commits. By adding the {% c-line %}conventionalCommits{% c-line-end %} option to the {% c-line %}lerna.json{% c-line-end %} file we can start committing using the Conventional Commits syntax.
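One way to enable it (a sketch; the same behavior can also be requested with the {% c-line %}--conventional-commits{% c-line-end %} CLI flag) is under the command section of {% c-line %}lerna.json{% c-line-end %}:

```json
{
  "command": {
    "version": {
      "conventionalCommits": true
    },
    "publish": {
      "conventionalCommits": true
    }
  }
}
```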

So how does it work if we have multiple packages in one repo?

  1. {% c-line %}Lerna{% c-line-end %} will read all commits since the last release
  2. Those commits will be parsed according to the Conventional Commits syntax
  3. The files involved in each commit will be determined, so each change can be mapped to a package
  4. With that information, the new version of each affected package will be calculated

Let's see an example that will clarify a lot about how it works. Imagine that we have two packages: {% c-line %}B{% c-line-end %}, which has a dependency on {% c-line %}A{% c-line-end %}. Let's see some common scenarios:

  1. If we make a change on package {% c-line %}A{% c-line-end %} and then create a commit with the message {% c-line %}fix: some fixes done{% c-line-end %}, Lerna will detect that there is a change in package {% c-line %}A{% c-line-end %} and bump its patch version. As package {% c-line %}A{% c-line-end %} is a dependency of package {% c-line %}B{% c-line-end %}, Lerna will bump the patch version of that package too.
  2. If we make a change on package {% c-line %}B{% c-line-end %} and then create a commit with the message {% c-line %}fix: some fixes done{% c-line-end %}, {% c-line %}Lerna{% c-line-end %} will bump its patch version, and since no other package depends on {% c-line %}B{% c-line-end %}, no more packages will be updated.
  3. If we make a change on both package {% c-line %}A{% c-line-end %} and package {% c-line %}B{% c-line-end %} and commit the changes together with a message like {% c-line %}fix: some fixes done{% c-line-end %}, then both packages will be updated and package {% c-line %}B{% c-line-end %} will include the new version of package {% c-line %}A{% c-line-end %}.
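The version calculation in the scenarios above follows semantic versioning: {% c-line %}fix:{% c-line-end %} commits bump the patch number, {% c-line %}feat:{% c-line-end %} commits bump the minor, and a breaking change bumps the major. A small JavaScript sketch of that mapping (an illustration, not Lerna's actual implementation):

```javascript
// Sketch of the Conventional Commits -> semver bump mapping (not Lerna's real code)
function bumpType(commitMessage) {
  // "feat!: ..." or a BREAKING CHANGE footer signals a major bump
  if (/BREAKING CHANGE/.test(commitMessage) || /^[a-z]+(\(.+\))?!:/.test(commitMessage)) {
    return 'major';
  }
  if (/^feat(\(.+\))?:/.test(commitMessage)) return 'minor';
  // fix: and other types fall through to a patch bump in this sketch
  return 'patch';
}

function bump(version, type) {
  const [major, minor, patch] = version.split('.').map(Number);
  if (type === 'major') return `${major + 1}.0.0`;
  if (type === 'minor') return `${major}.${minor + 1}.0`;
  return `${major}.${minor}.${patch + 1}`;
}

// Scenario 1: a fix in package A propagates a patch bump
console.log(bump('1.2.3', bumpType('fix: some fixes done'))); // prints 1.2.4
console.log(bump('1.2.3', bumpType('feat: new capability'))); // prints 1.3.0
```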

The next version calculation is performed when the CI process runs the {% c-line %}lerna publish{% c-line-end %} command.

SonarCloud and Monorepo

As we have only one repository with independent packages, we have to configure SonarCloud according to the monorepo structure. There are two ways to do that:

1. The monorepo configuration is supported by SonarCloud natively, not only for JS/TS but also for other languages. You can see the instructions here: https://sonarcloud.io/documentation/analysis/setup-monorepo/. With this configuration, you have to create a new SonarCloud project for each package in the monorepo and then run multiple SonarCloud Scans.

2. The other option is to create a single SonarCloud project with different SonarCloud sub-modules. For this configuration you need a {% c-line %}sonar-project.properties{% c-line-end %} file in the root folder and one per package as well. The important thing in this configuration is that we only have to run one SonarCloud Scan. On the GUI you will find the monorepo project with the information of all its modules in a single place:

SonarCloud and Monorepo
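As a sketch, the root {% c-line %}sonar-project.properties{% c-line-end %} for the second option could declare its modules like this (project key, module names, and paths are illustrative; each package then carries its own properties file with its sources):

```properties
# Root sonar-project.properties (illustrative keys and names)
sonar.projectKey=my-org:my-monorepo
sonar.projectName=My Monorepo
sonar.modules=package-a,package-b
package-a.sonar.projectBaseDir=packages/package-a
package-b.sonar.projectBaseDir=packages/package-b
```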

We recommend the second option because it is more efficient in terms of SonarCloud Scan executions.

The yarn install process

Using a monorepo structure changes the yarn install process, because we are using {% c-line %}yarn workspaces{% c-line-end %} and {% c-line %}lerna{% c-line-end %}.

Starting with {% c-line %}yarn workspaces{% c-line-end %} we will see the following improvements:

  1. The {% c-line %}node_modules{% c-line-end %} will be installed in only one place, and only the needed binaries will be linked into the sub-modules
  2. Only one {% c-line %}yarn.lock{% c-line-end %} file will be generated, and it is the only one we have to track, verify and store

Moving on to {% c-line %}lerna{% c-line-end %}, the {% c-line %}lerna bootstrap{% c-line-end %} command will be executed immediately after the yarn install command. This command is key because it will do the following tasks:

  1. Link all shared dependencies creating symlinks
  2. Run {% c-line %}yarn run prepublish{% c-line-end %} in all bootstrapped packages
  3. Run {% c-line %}yarn run prepare{% c-line-end %} in all bootstrapped packages
The yarn install process
Image courtesy of Brigad Engineering Blog/Thibault Malbranche
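One common way to chain the two steps (a sketch; script names are our assumption, not the only option) is a postinstall script in the root {% c-line %}package.json{% c-line-end %}, so bootstrap runs right after every install:

```json
{
  "scripts": {
    "postinstall": "lerna bootstrap"
  }
}
```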

Local package linking/unlinking flow during development

When a {% c-line %}yarn install{% c-line-end %} command is executed, links between all shared libraries will be created automatically. It is similar to the {% c-line %}yarn link{% c-line-end %} command but much more powerful. We have to specify the exact version of each package in the dependencies; with that, {% c-line %}lerna{% c-line-end %} knows that there is a local dependency and creates a symlink automatically:

Local package linking/unlinking flow during development
Image courtesy of Naresh Bhatia
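For example (package names and versions here are illustrative), if {% c-line %}package-b{% c-line-end %} declares a version range that matches the workspace-local copy of {% c-line %}package-a{% c-line-end %}, a symlink is created instead of downloading the package from the registry:

```json
{
  "name": "@my-org/package-b",
  "version": "1.0.0",
  "dependencies": {
    "@my-org/package-a": "^1.0.0"
  }
}
```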

Continuous integration using CircleCI

Since we introduced our first monorepo, we have had to improve the pipelines in order to reduce build times and do things more intelligently. Some of our monorepos have microservices as {% c-line %}packages{% c-line-end %}, some contain only libraries, and others have a mix. This is why we introduced the following improvements depending on the monorepo composition.

The main observation for monorepos is that not every {% c-line %}package{% c-line-end %} changes in every commit or in every PR. Because of this, we need multi-level change detection that determines which job or workflow we need to execute.

Changes detection from the master branch

Thanks to CircleCI's new dynamic configuration, we can trigger a configuration file with some toggles depending on the changes of the current branch compared with the {% c-line %}master{% c-line-end %} branch. So if we take a look at the monorepo and its CircleCI config, we will see the pattern explained above:

Continuous integration using CircleCI

Here, you can see that depending on the changes, and more specifically, depending on the packages that have changed, we will set some {% c-line %}vars{% c-line-end %} to {% c-line %}true{% c-line-end %}. After specifying the {% c-line %}env{% c-line-end %} vars, we will trigger the {% c-line %}continue-config.yaml{% c-line-end %} configuration file with those values properly set.

CircleCI setup configuration triggers

The new CircleCI configuration will trigger the workflows and the jobs required depending on the changes from {% c-line %}master{% c-line-end %} branch.
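A sketch of such a setup using CircleCI's path-filtering orb (paths, parameter names, and the orb version are illustrative, not our exact config):

```yaml
# .circleci/config.yml — sketch of a dynamic setup workflow
version: 2.1
setup: true
orbs:
  path-filtering: circleci/path-filtering@0.1.1
workflows:
  setup:
    jobs:
      - path-filtering/filter:
          base-revision: master
          config-path: .circleci/continue-config.yaml
          # map changed paths to pipeline parameters consumed by continue-config.yaml
          mapping: |
            packages/package-a/.* package-a-changed true
            packages/package-b/.* package-b-changed true
```

The {% c-line %}continue-config.yaml{% c-line-end %} can then gate each workflow on its corresponding parameter.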

This is a great improvement, but we must keep moving forward. Why is this pattern not enough? Because it only looks at the changes from the {% c-line %}master{% c-line-end %} branch instead of the last commit. Because of this, we can make an addition: determine the packages that have changed in every single commit and execute the jobs and commands only in those specific packages, along with the common ones.

Changes detection from the last commit

To determine if a pipeline has to be executed or not, we have implemented two mechanisms that detect the changes from the last commit.

1. Determine if a job has to be executed:

a) In the previous section, we saw that the dynamic configuration detects changes from {% c-line %}master{% c-line-end %}, which means that when we change a file in a {% c-line %}package{% c-line-end %}, the toggle will always be set to true, so all the jobs and workflows that use that toggle will be executed. We have therefore added a check in all the jobs that determines whether a job has to be executed depending on the changes in the last commit. For this, we have created the command {% c-line %}stop_if_no_changes{% c-line-end %}:
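A sketch of what such a command could look like (the real implementation may differ; paths and parameter names are illustrative). It halts the job when the last commit touched nothing under the given path:

```yaml
# Sketch of a stop_if_no_changes command
commands:
  stop_if_no_changes:
    parameters:
      package_path:
        type: string
    steps:
      - run:
          name: Halt if the last commit did not touch << parameters.package_path >>
          command: |
            if git diff --quiet HEAD~1 HEAD -- << parameters.package_path >>; then
              echo "No changes in << parameters.package_path >>, halting job"
              circleci-agent step halt
            fi
```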

2. Determine if a command has to be executed: some jobs are common to all the packages, like unit tests, integration tests, lint, etc. For those, we have created a command called {% c-line %}exec_command_monorepo{% c-line-end %} that only executes those tasks in the {% c-line %}packages{% c-line-end %} that have changed in the last commit:
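A sketch of such a command (again, folder layout and parameter names are assumptions; the real implementation may differ). It loops over the packages and runs the given yarn script only where the last commit made changes:

```yaml
# Sketch of an exec_command_monorepo command
commands:
  exec_command_monorepo:
    parameters:
      command:
        type: string
    steps:
      - run:
          name: Run << parameters.command >> only in packages changed by the last commit
          command: |
            for pkg in packages/*; do
              if ! git diff --quiet HEAD~1 HEAD -- "$pkg"; then
                (cd "$pkg" && yarn run << parameters.command >>)
              fi
            done
```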

These commands detect the changes against the last commit and determine which packages have to run each command. With that, we have a smarter pipeline!

Code

Here you can find a Proof of Concept with everything explained in this blog post: https://github.com/voiceflow/poc-monorepo-ci

Conclusion

As you can see, monorepos can help us a lot during the development and release processes. With the setup and structure explained above, we can focus on developing new features and fixing bugs instead of juggling upgrades across a number of repositories. With monorepos, everything is automated.