Python Type Checker in a Monorepo#

This guide walks you through setting up a type checker within a monorepo. To understand the motivation and design, please refer to the design doc.

Tip

You can jump straight to the full implementation if you do not want to read through the guide.

Step 1: Install Pyright (optional)#

First, install Pyright globally at the root of your monorepo:

poetry add pyright

This is a community-maintained installation method, but it is endorsed by the Pyright project.1 It allows the Pyright version to be controlled through your pyproject.toml file. This step is not strictly necessary, but it is easier than installing via npm.

Step 2: Set up a shim script#

In the root of your monorepo, create a script called type-check.py:

type-check.py
import subprocess
import sys
from pathlib import Path

def main():
    subprocess.run(["poetry", "run", "pyright", "-h"], capture_output=True)
    paths = [Path(p).resolve(strict=True) for p in sys.argv[1:]]

if __name__ == "__main__":
    main()

While the pyright-python package is technically installed at this point, installing it does not actually install the Pyright executable.2 Invoking pyright -h triggers the real installation of Node.js and Pyright under the hood. If you are not using the pyright-python package, you can skip the subprocess.run call on line 6 of the script.

We expect any arguments passed in to be file or directory paths, so we convert them to Path objects, which are easier to manipulate later on. We also resolve them to make them absolute, and pass strict=True so that a path that does not exist fails fast.
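The fail-fast behavior can be seen in isolation with a minimal sketch. The helper name resolve_args and the temporary files are illustrative assumptions, not part of the guide's script:

```python
import tempfile
from pathlib import Path


def resolve_args(args: list[str]) -> list[Path]:
    """Resolve each argument to an absolute path; raise if it does not exist."""
    return [Path(a).resolve(strict=True) for a in args]


missing_raised = False
with tempfile.TemporaryDirectory() as tmp:
    existing = Path(tmp) / "module.py"
    existing.touch()

    # An existing path resolves to an absolute Path.
    resolved = resolve_args([str(existing)])
    print(resolved[0].is_absolute())

    # A missing path raises FileNotFoundError before any other work is done.
    try:
        resolve_args([str(Path(tmp) / "missing.py")])
    except FileNotFoundError:
        missing_raised = True
        print("missing path rejected")
```

Without strict=True, resolve() would return a path for the missing file and the error would only surface later, inside the Pyright call.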

Step 3: Look for projects#

Add a few more pieces of functionality to the script:

type-check.py
def main():
    ...
    projects = infer_projects(paths) if len(paths) > 0 else get_all_projects()

Here we call one of two functions that we will define next: def infer_projects(paths) and def get_all_projects(). These are described in Finding Projects.

You can add the following code above def main() in the script:

type-check.py
import logging
import os
import re

logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.DEBUG, format="%(message)s")

ROOT = Path(os.getcwd())
PYRIGHT_EXCLUDE = [r".*\/node_modules", r".*\/__pycache__", r".*\/\..*"]
IGNORED = [ROOT]


def get_all_projects() -> list[Path]:
    logger.info("\nSearching for projects:")
    projects = []
    for root, _, files in os.walk(ROOT):
        if any(re.match(e, root) for e in PYRIGHT_EXCLUDE):
            continue
        if Path(root) in IGNORED:
            continue
        for filename in files:
            if filename == "pyproject.toml":
                logger.info(f" - {root}")
                projects.append(Path(root))
    return projects


def infer_projects(paths: list[Path]) -> list[Path]:
    logger.info("\nInferring projects:")
    projects = []
    for path in paths:
        if path.joinpath("pyproject.toml").exists():
            if path in IGNORED:
                continue
            if path not in projects:
                logger.info(f" - {path}")
                projects.append(path)
            continue
        for parent in path.parents:
            if parent.joinpath("pyproject.toml").exists():
                if parent in IGNORED:
                    continue
                # Avoid adding the same project twice when several paths share it.
                if parent not in projects:
                    logger.info(f" - {parent}")
                    projects.append(parent)
                break
    return projects

def main():
    ...

def get_all_projects() uses os.walk() to find all the pyproject.toml files in the monorepo. It makes sure to exclude any directories that Pyright excludes by default,3 and any directories that we explicitly do not want to include. While there are some implementations of os.walk() that use multiprocessing, this was simpler and os.walk() is not the bottleneck.
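To make the exclusion logic concrete, here is a small sketch that applies the same PYRIGHT_EXCLUDE patterns to a few made-up directory strings. The helper is_excluded and the example paths are assumptions for illustration only:

```python
import re

PYRIGHT_EXCLUDE = [r".*\/node_modules", r".*\/__pycache__", r".*\/\..*"]


def is_excluded(directory: str) -> bool:
    """Return True if the directory matches any of the exclude patterns."""
    return any(re.match(pattern, directory) for pattern in PYRIGHT_EXCLUDE)


examples = [
    "/repo/services/api",                   # regular project directory
    "/repo/services/api/node_modules/pkg",  # nested under node_modules
    "/repo/libs/core/__pycache__",          # bytecode cache
    "/repo/.venv/lib",                      # hidden directory
    "/home/me/.work/repo/services/api",     # hidden directory ABOVE the repo
]
for directory in examples:
    print(f"{directory}: excluded={is_excluded(directory)}")
```

Because re.match is applied to the full path that os.walk() yields, the last example is excluded as well: if the monorepo itself is checked out somewhere beneath a dot-directory, every project would be skipped. Matching against the path relative to ROOT would be one way to avoid that.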

def infer_projects(paths: list[Path]) uses a more guided approach. Since the script has been passed specific paths, we can assume the user only wants to type check those. For each path, we step up the directory tree until we find the root of the corresponding project.

Step 4: Install dependencies#

Now that we have all the projects we want to type check, we need to make sure they have their dependencies installed. You can read more about this in Installing Dependencies.

Add the following code to the main function:

type-check.py
def main():
    ...
    install_projects(projects)

Here, def install_projects() can be inserted right above def main():

type-check.py
...
def install_projects(projects: list[Path]):
    logger.info("\nInstalling projects:")
    for project in projects:
        logger.info(f" - {project}")
        proc = subprocess.run(
            ["poetry", "--directory", project, "install", "--all-extras"],
            capture_output=True,
        )
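        if proc.returncode != 0:
            # CalledProcessError requires the return code and the command.
            raise RuntimeError(
                f"Failed to install {project}"
            ) from subprocess.CalledProcessError(
                proc.returncode, proc.args, proc.stdout, proc.stderr
            )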

def main():
    ...

This function runs poetry install for each project, which brings the project's environment up to date if it is not already. Some projects might have extra dependencies, so we use --all-extras to make sure we get everything.
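The error-handling pattern can be tried without Poetry. The sketch below uses the current Python interpreter as a stand-in for the poetry command; run_or_raise is a hypothetical helper, not part of the guide's script:

```python
import subprocess
import sys


def run_or_raise(cmd: list[str], description: str) -> subprocess.CompletedProcess:
    """Run a command; on failure raise RuntimeError chained to the process details."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode != 0:
        raise RuntimeError(f"Failed to {description}") from subprocess.CalledProcessError(
            proc.returncode, proc.args, output=proc.stdout, stderr=proc.stderr
        )
    return proc


# Stand-in for a failing `poetry install`: writes to stderr and exits with 3.
failing = [sys.executable, "-c", "import sys; sys.stderr.write('boom'); sys.exit(3)"]

cause = None
try:
    run_or_raise(failing, "run the stand-in command")
except RuntimeError as exc:
    # The chained exception carries the return code and captured stderr.
    cause = exc.__cause__
    print(cause.returncode, cause.stderr)
```

Chaining the CalledProcessError keeps the captured stderr attached to the traceback, so the reason for a failed install is not lost.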

Step 5: Type check the projects#

Now that we have all the projects installed, we are ready to type check them. To understand the design better, you can check out Type Checking.

Add the following code to the main function:

type-check.py
def main():
    ...
    run_pyright(projects, paths)

This invokes def run_pyright(), which can be inserted right above def main(), along with some helper functions:

type-check.py
import tomllib
...
def get_pyright_config_path(project: Path) -> Path:
    if (json_config := project.joinpath("pyrightconfig.json")).exists():
        return json_config
    if (toml_config := project.joinpath("pyproject.toml")).exists():
        # Use a context manager so the file handle is closed after parsing.
        with toml_config.open("rb") as f:
            data = tomllib.load(f)
        if data.get("tool", {}).get("pyright", {}) != {}:
            return toml_config
    return ROOT


def get_venv_path(project: Path) -> Path:
    proc = subprocess.run(
        ["poetry", "--directory", project, "env", "info", "--path"],
        capture_output=True,
    )
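    if proc.returncode != 0:
        # CalledProcessError requires the return code and the command.
        raise RuntimeError(
            f"Failed to get venv path for {project}"
        ) from subprocess.CalledProcessError(
            proc.returncode, proc.args, proc.stdout, proc.stderr
        )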
    return Path(proc.stdout.decode().strip())


def run_pyright(projects: list[Path], paths: list[Path]):
    logger.info("\nLaunching Pyright on:" + "".join(f"\n - {p}" for p in projects))
    code = 0
    for project in projects:
        logger.info(f"\n{'-'*15} {project} {'-'*15}\n")

        config_path = get_pyright_config_path(project).as_posix()
        logger.info(f"  config: {config_path}")

        venv_path = get_venv_path(project).as_posix()
        logger.info(f"  venv: {venv_path}")

        cmd = [
            "poetry",
            "run",
            "pyright",
            "--project",
            config_path,
            "--venvpath",
            venv_path,
        ]
        if len(paths) > 0:
            cmd.extend([p.as_posix() for p in paths if p.is_relative_to(project)])
        else:
            cmd.append(project.as_posix())

        proc = subprocess.run(cmd, capture_output=True)

        if len(out_str := proc.stdout.decode()) > 0:
            logger.info(out_str)
        if len(err_str := proc.stderr.decode()) > 0:
            logger.error(err_str)
        logger.info(f"\n{'-'*15} {project} {'-'*15}\n")
        code = max(code, proc.returncode)

    sys.exit(code)

def main():
    ...

def run_pyright() iterates over each project and launches Pyright with the right configuration file, the project-specific venv, and any project-specific files. It captures and logs the output without raising, so that every call runs even if an earlier one fails. At the end, it exits with the highest return code seen, which is non-zero if any of the calls failed.
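The "run everything, fail at the end" behavior can be shown with a minimal sketch. The commands below are stand-ins that use the Python interpreter instead of Pyright, and run_all is a hypothetical name:

```python
import subprocess
import sys

# Stand-ins for per-project Pyright calls: one succeeds, two fail.
commands = [
    [sys.executable, "-c", "print('ok')"],
    [sys.executable, "-c", "import sys; sys.exit(1)"],
    [sys.executable, "-c", "import sys; sys.exit(2)"],
]


def run_all(commands: list[list[str]]) -> int:
    """Run every command and return the highest return code seen."""
    code = 0
    for cmd in commands:
        proc = subprocess.run(cmd, capture_output=True, text=True)
        # Keep going after a failure; only remember the worst result.
        code = max(code, proc.returncode)
    return code


exit_code = run_all(commands)
print(exit_code)
```

Taking the maximum means a single failing project is enough to make the overall run fail, while the output of every project is still reported.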

def get_pyright_config_path() will look for a project-specific configuration file. If it finds one, it will return it. If it does not, it will use the global configuration file at the root of the monorepo. Pyright prioritizes pyrightconfig.json over anything in pyproject.toml, so we will mimic that behavior.

def get_venv_path() is relatively straightforward. It will get the virtual environment path for the project.4

Step 6: Run the script#

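Since you are using Poetry, you can expose the script as a command. Hyphens are not valid in Python module names, so the file must be named type_check.py (rather than type-check.py) for the type_check:main entry point to be importable. Add the following to the pyproject.toml at the root of the monorepo: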

pyproject.toml
packages = [
    { include = "type_check.py"},
]

[tool.poetry.scripts]
type-check = 'type_check:main'

This will allow you to run the script on the entire monorepo:

poetry run type-check

or on specific projects and files:

poetry run type-check path/to/project_1 path/to/project_2 path/to/project_3/file.py

TLDR#

If you want to skip all the explanations and just see the full script, here it is:

type-check.py
import logging
import os
import re
import subprocess
import sys
import tomllib
from pathlib import Path

logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.DEBUG, format="%(message)s")

ROOT = Path(os.getcwd())
PYRIGHT_EXCLUDE = [r".*\/node_modules", r".*\/__pycache__", r".*\/\..*"]
IGNORED = [ROOT]


def get_all_projects() -> list[Path]:
    logger.info("\nSearching for projects:")
    projects = []
    for root, _, files in os.walk(ROOT):
        if any(re.match(e, root) for e in PYRIGHT_EXCLUDE):
            continue
        if Path(root) in IGNORED:
            continue
        for filename in files:
            if filename == "pyproject.toml":
                logger.info(f" - {root}")
                projects.append(Path(root))
    return projects


def infer_projects(paths: list[Path]) -> list[Path]:
    logger.info("\nInferring projects:")
    projects = []
    for path in paths:
        if path.joinpath("pyproject.toml").exists():
            if path in IGNORED:
                continue
            if path not in projects:
                logger.info(f" - {path}")
                projects.append(path)
            continue
        for parent in path.parents:
            if parent.joinpath("pyproject.toml").exists():
                if parent in IGNORED:
                    continue
                # Avoid adding the same project twice when several paths share it.
                if parent not in projects:
                    logger.info(f" - {parent}")
                    projects.append(parent)
                break
    return projects


def install_projects(projects: list[Path]):
    logger.info("\nInstalling projects:")
    for project in projects:
        logger.info(f" - {project}")
        proc = subprocess.run(
            ["poetry", "--directory", project, "install", "--all-extras"],
            capture_output=True,
        )
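        if proc.returncode != 0:
            # CalledProcessError requires the return code and the command.
            raise RuntimeError(
                f"Failed to install {project}"
            ) from subprocess.CalledProcessError(
                proc.returncode, proc.args, proc.stdout, proc.stderr
            )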


def get_pyright_config_path(project: Path) -> Path:
    if (json_config := project.joinpath("pyrightconfig.json")).exists():
        return json_config
    if (toml_config := project.joinpath("pyproject.toml")).exists():
        # Use a context manager so the file handle is closed after parsing.
        with toml_config.open("rb") as f:
            data = tomllib.load(f)
        if data.get("tool", {}).get("pyright", {}) != {}:
            return toml_config
    return ROOT


def get_venv_path(project: Path) -> Path:
    proc = subprocess.run(
        ["poetry", "--directory", project, "env", "info", "--path"],
        capture_output=True,
    )
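    if proc.returncode != 0:
        # CalledProcessError requires the return code and the command.
        raise RuntimeError(
            f"Failed to get venv path for {project}"
        ) from subprocess.CalledProcessError(
            proc.returncode, proc.args, proc.stdout, proc.stderr
        )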
    return Path(proc.stdout.decode().strip())


def run_pyright(projects: list[Path], paths: list[Path]):
    logger.info("\nLaunching Pyright on:" + "".join(f"\n - {p}" for p in projects))
    code = 0
    for project in projects:
        logger.info(f"\n{'-'*15} {project} {'-'*15}\n")

        config_path = get_pyright_config_path(project).as_posix()
        logger.info(f"  config: {config_path}")

        venv_path = get_venv_path(project).as_posix()
        logger.info(f"  venv: {venv_path}")

        cmd = [
            "poetry",
            "run",
            "pyright",
            "--project",
            config_path,
            "--venvpath",
            venv_path,
        ]
        if len(paths) > 0:
            cmd.extend([p.as_posix() for p in paths if p.is_relative_to(project)])
        else:
            cmd.append(project.as_posix())

        proc = subprocess.run(cmd, capture_output=True)

        if len(out_str := proc.stdout.decode()) > 0:
            logger.info(out_str)
        if len(err_str := proc.stderr.decode()) > 0:
            logger.error(err_str)
        logger.info(f"\n{'-'*15} {project} {'-'*15}\n")
        code = max(code, proc.returncode)

    sys.exit(code)


def main():
    subprocess.run(["poetry", "run", "pyright", "-h"], capture_output=True)
    paths = [Path(p).resolve(strict=True) for p in sys.argv[1:]]

    projects = infer_projects(paths) if len(paths) > 0 else get_all_projects()

    install_projects(projects)

    run_pyright(projects, paths)


if __name__ == "__main__":
    main()