Python Type Checker in a Monorepo

Background and Motivation

Type checkers are important tools for ensuring the correctness of your code. They work extremely well in a repository with a single project, but not so well in a monorepo. For example, the authors of Pyright recommend that you set up an individual instance of Pyright for each project, and run each individually.1 If you want to manage type check rules across the repo, you must do so in each project's configuration file. This can be a lot of overhead, and is antithetical to the monorepo philosophy, where you want to simplify the tooling as much as possible.2 So how do we set up type checking in a way that takes advantage of the monorepo structure?

Goals and Non-Goals

Goals

  1. Minimize the overhead of maintaining a type checker in a monorepo: This is the primary goal. After the initial setup, it should be no harder to maintain than in a single-project repository.
  2. Maximize the speed of the type checker: It should be no slower than a user running the type checker on each of the same projects individually. We do not need to push the limits of the type checker, though.
  3. Global Configuration: We want to keep the configuration global, so that we can manage all the rules in one place.
  4. Be agnostic to the structure of the monorepo: We do not want to make any assumptions about the structure of the monorepo, as this can vary greatly. Regardless of the structure, it should just work. We may have to make assumptions about each Python project (e.g. that it has a pyproject.toml file), but not about how the projects are laid out within the monorepo.

Non-Goals

  1. Allow for overriding of the global configuration: A global configuration will work most of the time, but there may be certain projects where that does not work. If a project needs custom rules, it would be nice to allow for that.
  2. Support for multiple type checkers and package managers: This tool can be opinionated and only support one type checker and package manager. But it should be possible to extend this to support multiple in the future.

Design

Frameworks Used

  1. Pyright: Pyright is a type checker that is designed for performance.5 That makes it a good choice for a monorepo. The only downside is that it does not have as much buy-in from the community as mypy, but that is slowly changing.6 Even so, we should theoretically be able to switch to mypy if we need to (see non-goals).
  2. Poetry: Poetry is a package manager that is relatively popular and robust. I have chosen it due to my familiarity with it, but it should be possible to adapt this guide to other package managers.

Implementation

This will be a shim, where most of the work is done by Pyright and Poetry. It should be possible to implement all the necessary logic in a single Python script. It should also be possible to just use built-in Python libraries, so that it is easy to copy-paste the shim around, and modify to suit your needs. While a more robust solution may be needed in the future, this should be sufficient for now.

The script will be broken down into three main steps:

  1. Find all relevant projects in the monorepo
  2. Install dependencies for each project
  3. Type check each project

Finding Projects

There are a few ways to find the projects in a monorepo. The fastest naive way to exhaustively search seems to be os.walk().34 It may be even faster to exploit the fact that we may know the files to analyze ahead of time, and only look in those directories for projects. So it will go something like this:

flowchart LR

args{{"Input Paths (optional)"}}
exhaust("Iterate with `os.walk()` for all `pyproject.toml` files")

subgraph impute [For each path]
    direction LR
    subgraph walk_up [For each parent directory]
        direction LR
        is_it1{{"Does it contain a `pyproject.toml`?"}}
        append1("append directory to list of projects")
        continue1("continue to next parent directory")

        is_it1 -->|yes| append1
        is_it1 -->|no| continue1
    end
end

subgraph exhaust ["For every directory in `os.walk()`"]
    direction LR
    subgraph files ["For every file in the directory"]
        direction LR
        is_it2{{"Is the file a `pyproject.toml`?"}}
        append2("append directory to list of projects")
        continue2("continue to next file")

        is_it2 -->|yes| append2
        is_it2 -->|no| continue2
    end
end

args -->|paths are specified| impute
args -->|no input paths| exhaust
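
The two branches of the flowchart can be sketched with the standard library alone. This is a minimal sketch, not the actual script: the function names, the `SKIP_DIRS` set, and the choice to stop at the nearest `pyproject.toml` when walking up are all assumptions made for illustration.

```python
import os
from pathlib import Path

MARKER = "pyproject.toml"
# Assumption: directories that never hold first-party projects. Adjust to taste.
SKIP_DIRS = {".git", ".venv", "node_modules", "__pycache__"}


def find_all_projects(root: Path) -> list[Path]:
    """Exhaustive mode: every directory under root that contains a pyproject.toml."""
    projects = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune skipped directories in place so os.walk() never descends into them.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        if MARKER in filenames:
            projects.append(Path(dirpath))
    return sorted(projects)


def find_projects_for_paths(paths: list[Path], root: Path) -> list[Path]:
    """Targeted mode: walk up from each input path to the nearest pyproject.toml."""
    root = root.resolve()
    projects = set()
    for path in paths:
        current = path.resolve()
        if not current.is_dir():
            current = current.parent
        while True:
            if (current / MARKER).is_file():
                projects.add(current)
                break
            # Stop at the repo root (or the filesystem root for paths outside it).
            if current == root or current == current.parent:
                break
            current = current.parent
    return sorted(projects)
```

Note that if the repo root itself holds a `pyproject.toml` (for example, for the global configuration), both functions will report the root as a project; whether that is desirable depends on the monorepo.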

Installing Dependencies

Pyright requires each project's dependencies to be installed in order to resolve imports and type check the code. This step cannot be skipped, but after the first install it is relatively quick.

Poetry Install Speed

poetry install --dry-run does not seem to be any faster than poetry install, so checking ahead of time whether an install is needed does not help. We will just loop over all the projects and call poetry install for each one.
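
A minimal sketch of that loop, assuming `poetry` is on the `PATH`. The function name and the injectable `run` parameter (there only so the loop can be exercised without Poetry) are hypothetical, not part of the original script.

```python
import subprocess
from pathlib import Path


def install_dependencies(projects: list[Path], run=subprocess.run) -> list[Path]:
    """Run `poetry install` inside each project; return the projects where it failed."""
    failed = []
    for project in projects:
        # check=False: keep going so that one broken project does not block the rest.
        result = run(["poetry", "install"], cwd=project, check=False)
        if result.returncode != 0:
            failed.append(project)
    return failed
```

Returning the failed projects, rather than raising on the first one, leaves it to the caller to decide whether to skip type checking for them or abort.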

Type Checking

Pyright does not have any required arguments, but it does have some optional arguments. The ones that this script will use are:

  1. --project: The location of the configuration file.
  2. --venvpath: The location of the virtual environment.

We expect every other setting to be stored in the configuration file. While we could pass other arguments on the command line, we are always specifying the configuration file path, so it makes more sense to keep everything else there.

We will pass in the global configuration file, the virtual environment, and the respective files to the type checker.
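
As a rough sketch, the invocation could be assembled like this. The helper names are hypothetical, and the sketch assumes both `pyright` and `poetry` are on the `PATH`. One caveat worth checking against your setup: Pyright's CLI help describes `--venvpath` as the directory that contains virtual environments, so depending on how Poetry lays out its environments you may need to pass the parent directory together with the `venv` setting, or use `--pythonpath` to point at the interpreter instead.

```python
import subprocess
from pathlib import Path


def poetry_venv_path(project: Path, run=subprocess.run) -> Path:
    """Ask Poetry for the project's virtual environment via `poetry env info --path`."""
    result = run(
        ["poetry", "env", "info", "--path"],
        cwd=project,
        capture_output=True,
        text=True,
        check=True,  # with subprocess.run, a non-zero exit raises CalledProcessError
    )
    return Path(result.stdout.strip())


def build_pyright_command(config: Path, venv: Path, targets: list[Path]) -> list[str]:
    """Build the argv for one project: config file, venv, then the files to check."""
    return [
        "pyright",
        "--project", str(config),
        "--venvpath", str(venv),  # see the caveat about --venvpath above
        *(str(target) for target in targets),
    ]
```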

We will also want to run each project in isolation, so that if one project fails, the others can still run. This way the user can see all the errors at once, and avoid the overhead of running the type checker multiple times.
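
One way to get this behavior is to record each failing exit code and only decide the script's own exit status at the end. The sketch below assumes the commands have already been built; `run_all` and `report` are hypothetical names, and `run` is injectable only so the loop can be tested without Pyright installed.

```python
import subprocess
import sys


def run_all(commands: dict[str, list[str]], run=subprocess.run) -> dict[str, int]:
    """Run every project's command; return {project: exit code} for the failures."""
    failures = {}
    for project, argv in commands.items():
        # check=False: a failing project must not stop the remaining ones.
        result = run(argv, check=False)
        if result.returncode != 0:
            failures[project] = result.returncode
    return failures


def report(failures: dict[str, int]) -> int:
    """Summarize failures on stderr and return the exit status for the whole run."""
    for project, code in failures.items():
        print(f"pyright failed in {project} (exit code {code})", file=sys.stderr)
    return 1 if failures else 0
```

A caller would typically finish with `sys.exit(report(run_all(commands)))`.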

Here is a flowchart of the process:

flowchart LR

subgraph projects ["for project in all projects"]
    direction LR
    subgraph config_path ["Find the Pyright config path"]
        direction LR
        json("Look for project-specific `pyrightconfig.json`")
        toml("Look for `[tool.pyright]` in `pyproject.toml`")
        return_global("Use root `pyproject.toml`")
        return_json("Use `pyrightconfig.json`")
        return_toml("Use project `pyproject.toml`")

        json -->|"If exists"| return_json
        json -->|"If does not exist"| toml

        toml -->|"If exists"| return_toml
        toml -->|"If does not exist"| return_global
    end

    subgraph venv_path ["Find the venv of the project"]
        direction LR
        call_venv("Capture output of `poetry env info --path`")
    end

    subgraph paths ["Find the project-specific paths"]
        direction LR
        are_there_paths{{"Were paths passed as input?"}}
        filter("Filter to the paths specific to the project")
        use_root("Pass the project directory so Pyright can infer paths on its own")

        are_there_paths -->|yes| filter
        are_there_paths -->|no| use_root
    end

    call_pyright("Launch Pyright")

    config_path -->|"Pass with `--project`"| call_pyright
    venv_path -->|"Pass with `--venvpath`"| call_pyright
    paths -->|"Pass as args"| call_pyright
end

projects --> collect_errors("Do not error until all projects are done")
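
The configuration lookup and the path filtering from the flowchart could look roughly like the sketch below. The function names are hypothetical. The `[tool.pyright]` check is a deliberately simple regular-expression heuristic on the table header; parsing the file with `tomllib` (Python 3.11+) would be more robust.

```python
import re
from pathlib import Path

# Heuristic: matches a `[tool.pyright]` or `[tool.pyright.<sub>]` table header.
_TOOL_PYRIGHT = re.compile(r"^\s*\[tool\.pyright[.\]]", re.MULTILINE)


def find_config(project: Path, root: Path) -> Path:
    """Prefer pyrightconfig.json, then the project's pyproject.toml, then the root's."""
    json_config = project / "pyrightconfig.json"
    if json_config.is_file():
        return json_config
    toml_config = project / "pyproject.toml"
    if toml_config.is_file() and _TOOL_PYRIGHT.search(
        toml_config.read_text(encoding="utf-8")
    ):
        return toml_config
    return root / "pyproject.toml"


def paths_for_project(project: Path, input_paths: list[Path]) -> list[Path]:
    """Keep only the input paths inside the project; with no input, use the project."""
    if not input_paths:
        return [project]
    # Path.is_relative_to() requires Python 3.9+.
    return [path for path in input_paths if path.is_relative_to(project)]
```

With nested projects, a path inside the inner project is also inside the outer one, so a real implementation would need to assign each path to its nearest project first.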

Drawbacks

  1. No support for multiple configuration files: Pyright does not support multiple configuration files.7 This means that if you need to override the global configuration file, you will need to manually override all settings, not just a single one.
  2. Not a published package: For users who just want to pip install a solution, this will not work. This will have to be copy-pasted into your monorepo and modified to suit your needs.
  3. Cannot use include: Pyright must be run on a specified project, given that there are multiple projects in the monorepo. This means that we must programmatically pass the files to Pyright, which "overrides the files or directories specified in the pyrightconfig.json or pyproject.toml file."8

Alternatives Considered

  1. BasedPyright instead of Pyright: This is a fork of Pyright that has some improvements. The key one is that it gets around the installation bug of pyright-python.9 However, it is an unproven fork10 that makes some opinionated changes. The author is also anonymous, and is attempting to drive a wedge, rather than contributing back to the main project.11 While some of the author's changes seem good, the upside is not worth the risk at this time.
  2. Mypy instead of Pyright: Mypy is a little slower than Pyright, so it is not as good a choice for a monorepo. However, it is more mature, and has more buy-in from the community. Pyright is catching up,12 and this script is designed so that Pyright can be swapped out easily, or so that it can be extended to support multiple type checkers in the future. You can look into the differences and decide for yourself.13
  3. Ruff instead of Pyright: Today, Ruff does not support type checking.14 However, it is something they have hinted at, and there is unpublicized work being done on it.15 It is likely that Ruff will support type checking in the future, and if it is anything like the rest of the tool, it will be significantly faster and more user-friendly than Pyright. It is worth keeping an eye on this project.
  4. uv instead of Poetry: This is a new package manager that is extremely fast. It could be a good replacement for Poetry, making the installation of dependencies significantly faster. However, it is not as mature as Poetry. It is worth keeping an eye on this project, but it is not worth switching, at least until they have universal lock files.16