Python Type Checker in a Monorepo#
This guide will walk you through how to set up a type checker within a monorepo. To understand the motivation and design, please refer to the design doc.
Tip
You can jump straight to the full implementation if you do not want to read through the guide.
Step 1: Install Pyright (optional)#
First, install Pyright globally at the root of your monorepo:
This is a community-maintained installation method, but it is endorsed by the Pyright project.1 It allows the Pyright version to be controlled by our pyproject.toml file. This is not strictly necessary, but it is easier than installing via npm.
Step 2: Set up a shim script#
In the root of your monorepo, create a script called type-check.py:
type-check.py | |
---|---|
While the pyright-python package is technically installed, it does not actually install the Pyright executable.2 Invoking pyright -h triggers the real installation of Node.js and Pyright under the hood. If you are not using the pyright-python package, you can skip line 6.
We expect any arguments passed in to be file paths, so we convert them to Path objects. This will allow us to easily manipulate them later on. We also resolve them to make them absolute, and check that they exist so that we fail fast.
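As a rough illustration of the argument handling described above, here is a minimal sketch. It is not the guide's actual script: the helper names ensure_pyright_installed and parse_paths are hypothetical, and the first helper assumes that a pyright command is available on the PATH.

```python
import subprocess
import sys
from pathlib import Path


def ensure_pyright_installed() -> None:
    """Run `pyright -h` once so the pyright-python wrapper can fetch Pyright.

    Hypothetical helper: assumes the `pyright` command is on PATH.
    """
    subprocess.run(["pyright", "-h"], capture_output=True, check=False)


def parse_paths(args: list[str]) -> list[Path]:
    """Convert raw CLI arguments to absolute paths, failing fast if one is missing."""
    paths = []
    for arg in args:
        path = Path(arg).resolve()  # make the path absolute
        if not path.exists():
            raise FileNotFoundError(f"Path does not exist: {path}")
        paths.append(path)
    return paths


def main() -> None:
    ensure_pyright_installed()
    paths = parse_paths(sys.argv[1:])
    print(paths)
```

Resolving before checking for existence means later steps can compare paths against project directories without worrying about relative segments.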
Step 3: Look for projects#
Add a few more pieces of functionality to the script:
def main():
    ...
    projects = infer_projects(paths) if len(paths) > 0 else get_all_projects()
Here we call one of two functions that we will define next: def infer_projects(paths) and def get_all_projects(). These are described in Finding Projects.
You can add the following code above def main() in the script:
import logging
import os
import re
from pathlib import Path

logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.DEBUG, format="%(message)s")

ROOT = Path(os.getcwd())
# Regexes for the directories that Pyright excludes by default.
PYRIGHT_EXCLUDE = [r".*\/node_modules", r".*\/__pycache__", r".*\/\..*"]
# Directories that are never treated as projects (here, the monorepo root).
IGNORED = [ROOT]


def get_all_projects() -> list[Path]:
    logger.info("\nSearching for projects:")
    projects = []
    for root, _, files in os.walk(ROOT):
        if any(re.match(e, root) for e in PYRIGHT_EXCLUDE):
            continue
        if Path(root) in IGNORED:
            continue
        if "pyproject.toml" in files:
            logger.info(f" - {root}")
            projects.append(Path(root))
    return projects


def infer_projects(paths: list[Path]) -> list[Path]:
    logger.info("\nInferring projects:")
    projects = []
    for path in paths:
        # The path itself may already be a project directory.
        if path.joinpath("pyproject.toml").exists():
            if path in IGNORED:
                continue
            if path not in projects:
                logger.info(f" - {path}")
                projects.append(path)
            continue
        # Otherwise, step up the directory tree to the nearest project root.
        for parent in path.parents:
            if parent.joinpath("pyproject.toml").exists():
                if parent in IGNORED:
                    continue
                # Skip projects already found via another path.
                if parent not in projects:
                    logger.info(f" - {parent}")
                    projects.append(parent)
                break
    return projects


def main():
    ...
def get_all_projects() uses os.walk() to find all the pyproject.toml files in the monorepo. It makes sure to exclude any directories that Pyright excludes by default,3 and any directories that we explicitly do not want to include. While there are some implementations of os.walk() that use multiprocessing, this was simpler and os.walk() is not the bottleneck.
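To make the exclusion rule concrete, the following sketch applies the same three patterns to a handful of made-up directory names. The helper is_excluded and the example paths are illustrative assumptions, not part of the script.

```python
import re

# Same patterns as PYRIGHT_EXCLUDE in the script.
PYRIGHT_EXCLUDE = [r".*\/node_modules", r".*\/__pycache__", r".*\/\..*"]


def is_excluded(directory: str) -> bool:
    """Return True if any exclude pattern matches from the start of the path."""
    return any(re.match(pattern, directory) for pattern in PYRIGHT_EXCLUDE)


# Hypothetical directories, as os.walk() might report them on a POSIX system.
examples = [
    "/repo/services/api",
    "/repo/services/api/node_modules",
    "/repo/services/api/node_modules/left-pad",
    "/repo/libs/core/__pycache__",
    "/repo/.git/hooks",
    "/home/me/.work/repo/services/api",
]

for directory in examples:
    print(f"{directory!r:45} excluded={is_excluded(directory)}")
```

Because re.match() only anchors at the start, anything nested below an excluded directory is skipped as well. The patterns are applied to the full path, so a checkout that itself lives under a hidden directory (the last example) would have every directory skipped. The patterns also assume forward slashes as path separators.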
def infer_projects(paths: list[Path]) uses a more guided approach. Since file paths were passed to the script, we can assume the user only wants to type check those files. For each path, we step up the directory tree until we find the root of the corresponding project.
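The upward search can be sketched in isolation as follows. The function name find_project_root and the temporary directory layout are hypothetical, chosen only to show the lookup on a single path.

```python
import tempfile
from pathlib import Path
from typing import Optional


def find_project_root(path: Path, marker: str = "pyproject.toml") -> Optional[Path]:
    """Return the nearest directory at or above `path` that contains `marker`."""
    for candidate in [path, *path.parents]:
        if (candidate / marker).exists():
            return candidate
    return None


# Build a throwaway tree: <tmp>/services/api/{pyproject.toml, src/app.py}
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp).resolve() / "services" / "api"
    source = project / "src" / "app.py"
    source.parent.mkdir(parents=True)
    source.touch()
    (project / "pyproject.toml").touch()

    root = find_project_root(source)
    print(root.relative_to(Path(tmp).resolve()))
```

Path.parents is ordered from the closest ancestor outward, so the first hit is the innermost project, which is what a nested monorepo layout needs.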
Step 4: Install dependencies#
Now that we have all the projects we want to type check, we need to make sure they have their dependencies installed. You can read more about this in Installing Dependencies.
Add the following code to the main function:
where def install_projects() can be inserted right above def main():
...


def install_projects(projects: list[Path]):
    logger.info("\nInstalling projects:")
    for project in projects:
        logger.info(f" - {project}")
        proc = subprocess.run(
            ["poetry", "--directory", project, "install", "--all-extras"],
            capture_output=True,
        )
        if proc.returncode != 0:
            # CalledProcessError requires the return code and the command.
            raise RuntimeError(
                f"Failed to install {project}"
            ) from subprocess.CalledProcessError(
                proc.returncode, proc.args, proc.stdout, proc.stderr
            )


def main():
    ...
This function will check whether each project's installation is up to date. If not, it will install everything properly. Some projects might have extra dependencies, so we use --all-extras to make sure we get everything.
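The error handling used here, raising a RuntimeError chained to a subprocess.CalledProcessError, can be tried without Poetry. In the sketch below, run_checked is a hypothetical helper and the Python interpreter stands in for the poetry command. Note that CalledProcessError takes the return code and the command as its first two arguments.

```python
import subprocess
import sys


def run_checked(cmd: list[str]) -> subprocess.CompletedProcess:
    """Run `cmd`, capturing output, and raise a chained error on failure."""
    proc = subprocess.run(cmd, capture_output=True)
    if proc.returncode != 0:
        raise RuntimeError(f"Command failed: {cmd}") from subprocess.CalledProcessError(
            proc.returncode, proc.args, output=proc.stdout, stderr=proc.stderr
        )
    return proc


# Stand-in for a successful install command.
result = run_checked([sys.executable, "-c", "print('installed')"])
print(result.stdout.decode().strip())

# Stand-in for a failing install command.
try:
    run_checked([sys.executable, "-c", "import sys; sys.exit(3)"])
except RuntimeError as error:
    print(f"{error} (exit code {error.__cause__.returncode})")
```

Chaining keeps the captured stderr available on the __cause__ attribute, which helps when diagnosing why an install failed.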
Step 5: Type check the projects#
Now that we have all the projects installed, we are ready to type check them. To understand the design better, you can check out Type Checking.
Add the following code to the main function:
This invokes def run_pyright(), which can be inserted right above def main(), along with some helper functions:
import tomllib

...


def get_pyright_config_path(project: Path) -> Path:
    # Pyright prefers pyrightconfig.json over pyproject.toml, so check it first.
    if (json_config := project.joinpath("pyrightconfig.json")).exists():
        return json_config
    if (toml_config := project.joinpath("pyproject.toml")).exists():
        with toml_config.open("rb") as f:
            data = tomllib.load(f)
        # Only use pyproject.toml if it has a non-empty [tool.pyright] table.
        if data.get("tool", {}).get("pyright", {}) != {}:
            return toml_config
    # Fall back to the configuration at the root of the monorepo.
    return ROOT


def get_venv_path(project: Path) -> Path:
    proc = subprocess.run(
        ["poetry", "--directory", project, "env", "info", "--path"],
        capture_output=True,
    )
    if proc.returncode != 0:
        # CalledProcessError requires the return code and the command.
        raise RuntimeError(
            f"Failed to get venv path for {project}"
        ) from subprocess.CalledProcessError(
            proc.returncode, proc.args, proc.stdout, proc.stderr
        )
    return Path(proc.stdout.decode().strip())


def run_pyright(projects: list[Path], paths: list[Path]):
    logger.info("\nLaunching Pyright on:" + "".join(f"\n - {p}" for p in projects))
    code = 0
    for project in projects:
        logger.info(f"\n{'-'*15} {project} {'-'*15}\n")
        config_path = get_pyright_config_path(project).as_posix()
        logger.info(f" config: {config_path}")
        venv_path = get_venv_path(project).as_posix()
        logger.info(f" venv: {venv_path}")
        cmd = [
            "poetry",
            "run",
            "pyright",
            "--project",
            config_path,
            "--venvpath",
            venv_path,
        ]
        if len(paths) > 0:
            # Only pass the requested paths that belong to this project.
            cmd.extend([p.as_posix() for p in paths if p.is_relative_to(project)])
        else:
            cmd.append(project.as_posix())
        # Capture the output instead of raising, so every project gets checked.
        proc = subprocess.run(cmd, capture_output=True)
        if len(out_str := proc.stdout.decode()) > 0:
            logger.info(out_str)
        if len(err_str := proc.stderr.decode()) > 0:
            logger.error(err_str)
        logger.info(f"\n{'-'*15} {project} {'-'*15}\n")
        code = max(code, proc.returncode)
    sys.exit(code)


def main():
    ...
def run_pyright() will iterate over each project, and launch Pyright with the right configuration file, the project-specific venv, and any project-specific files. It will then capture and log the output, without erroring out. This will allow each call to run, even if one fails. At the end, it will exit with an error if any of the calls failed.
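Two pieces of that loop can be sketched on their own: picking which paths belong to a project, and running every command while remembering the worst exit code. The helper names targets_for and run_all are hypothetical, and the Python interpreter stands in for the Pyright invocations.

```python
import subprocess
import sys
from pathlib import Path


def targets_for(project: Path, paths: list[Path]) -> list[Path]:
    """Pick the requested paths that live inside `project`, or the project itself."""
    if not paths:
        return [project]
    return [p for p in paths if p.is_relative_to(project)]


def run_all(commands: list[list[str]]) -> int:
    """Run every command, even after a failure, and return the highest exit code."""
    code = 0
    for cmd in commands:
        proc = subprocess.run(cmd, capture_output=True)
        code = max(code, proc.returncode)
    return code


# Stand-ins for three type-check runs: success, failure, success.
commands = [
    [sys.executable, "-c", "import sys; sys.exit(0)"],
    [sys.executable, "-c", "import sys; sys.exit(1)"],
    [sys.executable, "-c", "import sys; sys.exit(0)"],
]
print(run_all(commands))
```

One caveat of taking the maximum: on POSIX systems, a process killed by a signal reports a negative return code, which max() would treat as lower than zero. Whether that matters for this script is a judgment call.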
def get_pyright_config_path() will look for a project-specific configuration file. If it finds one, it will return it. If it does not, it will use the global configuration file at the root of the monorepo. Pyright prioritizes pyrightconfig.json over anything in pyproject.toml, so we will mimic that behavior.
def get_venv_path() is relatively straightforward. It will get the virtual environment path for the project.4
Step 6: Run the script#
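Since you are using Poetry, you can add the following to your pyproject.toml. Note that the entry point refers to the module type_check, so the script file must be named type_check.py (with an underscore) for the import to work: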
packages = [
    { include = "type_check.py" },
]

[tool.poetry.scripts]
type-check = 'type_check:main'
This will allow you to run the script on the entire monorepo:
or on specific projects:
TLDR#
If you want to skip all the explanations and just see the full script, here it is:
Click to see the full script
type-check.py | |
---|---|