Cross compilation
The historical assumption of compilation is that the platform where the code is compiled will be the same as the platform where the final code will be executed (if not literally the same machine, then at least one that is CPU and ABI compatible at the operating system level). This is a reasonable assumption for most desktop platforms; however, for some platforms, this isn't the case.
On mobile platforms, an app is compiled on a desktop platform, and transferred to the mobile device (or a simulator) for testing. The compiler is not executed on device. Therefore, it must be possible to build a binary artefact for a CPU architecture and an ABI that is different from the platform that is running the compiler. The situation is similar for embedded devices.
Cross compilation issues also emerge when dealing with continuous integration/deployment (CI/CD). CI/CD platforms (such as GitHub Actions) generally provide only the "common" architectures, often just x86-64. However, a project may want to produce binaries for other platforms (e.g., ARM support for Raspberry Pi devices; PowerPC or s390x for mainframe/server devices; or mobile platforms). These binaries won't run natively on the host CI/CD system (without some sort of emulation, for example with QEMU), but code can still be compiled for the target platform.
macOS also experiences this as a result of the Apple Silicon transition. Apple has provided the tools to make cross compilation from x86-64 to arm64 as easy as possible, as well as to compile fat binaries (supporting x86-64 and arm64 at the same time) on both architectures. In the latter case, the host platform will still be one of the outputs of the compilation process, and the resulting binary will run on the CI/CD system.
Current state
Native compiler and build toolchains (e.g., autoconf/automake, CMake, Meson) have long supported cross-compilation; however, such cross-compilation capabilities for any given project tend to bitrot and break easily unless they are exercised regularly.
CPython's build system includes some support for cross-compilation. This support is largely based on leveraging autoconf's support for cross compilation. This support wasn't well integrated into distutils and the compilation of the binary portions of stdlib. The removal of distutils in Python 3.12 represents an improvement to the overall situation, but there is still a long way to go before the ecosystem as a whole has fully integrated the consequences of this change.
The way build backend hooks in pyproject.toml are specified (see PEP 517) means cross-platform compilation support has been partially converted into a concern for individual build systems to manage.
In order to cross-compile a Python package, one needs a compiler toolchain as well as two Python installs: one for the build system and one for the host system.[1] This can make it a little challenging to get started. If a compiler toolchain is not already provided on the system of interest, it can be built from source with, e.g., crosstool-ng or obtained from, e.g., dockcross. Alternatively, one can use a packaging system that has built-in support for cross-compilation. The Yocto Project, OpenEmbedded and Buildroot are projects specifically focused on cross-compilation for Linux embedded systems. More general-purpose packaging ecosystems often have toolchains and supporting infrastructure to cross-compile packages for their own needs - see, e.g., info for Void Linux, conda-forge, Debian and Nix.
Tools like crossenv can be used to trick Python into performing cross-platform builds. These tools use path hacks and overrides of known sources of platform-specific details (like sysconfig and distutils) to provide a cross-compilation environment. However, these solutions tend to be somewhat fragile as they aren't first-class citizens of the Python ecosystem.
The BeeWare Project also uses a version of these techniques. For both the platforms it supports, BeeWare provides a custom package index that contains pre-compiled binaries (Android; iOS). These binaries are produced using a set of tooling (Android; iOS) that is analogous to the tools used by conda-forge to build binary artefacts.
Problems
There is currently a gap in communicating target platform details to the build system. While a build system like Meson or CMake may support cross-platform compilation, and a project may be able to cross-compile binary artefacts, invocation of a pyproject.toml build hook typically assumes that the platform running the build will be the platform that ultimately runs the Python code. As a result, sys.platform, or the various attributes of the platform and sysconfig modules, can't be used as part of the build process.
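To illustrate the gap, here is a minimal sketch that collects the values these standard library modules report. The helper name `describe_running_interpreter` is hypothetical. Whatever the intended target is, each value describes the interpreter executing the build hook, which is why none of them can identify the target in a cross build.

```python
import platform
import sys
import sysconfig


def describe_running_interpreter() -> dict:
    """Collect platform details as seen by the interpreter running this code.

    In a cross build these describe the build machine, not the target.
    """
    return {
        "sys.platform": sys.platform,
        "platform.machine()": platform.machine(),
        "sysconfig.get_platform()": sysconfig.get_platform(),
        # Filename suffix used for extension modules by this interpreter.
        "EXT_SUFFIX": sysconfig.get_config_var("EXT_SUFFIX"),
    }


details = describe_running_interpreter()
for name, value in details.items():
    print(f"{name}: {value}")
```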
Running Python code for the host (cross) platform is not possible (modulo using an emulator), but Python packages have not taken this into account by providing ways to avoid the need to run the host interpreter. For example, numpy and pybind11 ship headers and have get_include() functions in their main namespaces to obtain the path to those headers. That is clearly a problem, which packages depending on those headers have to work around (often done by patching those packages with hardcoded paths within a cross-compilation setup).
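One possible workaround, sketched below, is to locate the headers by path inside the host environment instead of importing the package. The helper `find_include_dir` and the table of candidate directories are assumptions made for illustration; the relative paths reflect where these packages are commonly observed to place their headers and are not an interface either project guarantees.

```python
from pathlib import Path
from typing import Optional

# Assumption: header locations relative to each package directory.
# numpy 2.x uses "_core/include"; numpy 1.x used "core/include".
CANDIDATE_HEADER_DIRS = {
    "numpy": ("_core/include", "core/include"),
    "pybind11": ("include",),
}


def find_include_dir(host_site_packages: Path, package: str) -> Optional[Path]:
    """Return the header directory of `package` inside the host site-packages.

    Nothing is imported, so the host interpreter never has to run.
    Returns None when the package or its headers are not found.
    """
    for relative in CANDIDATE_HEADER_DIRS.get(package, ()):
        candidate = host_site_packages / package / relative
        if candidate.is_dir():
            return candidate
    return None
```

A build script could pass the result to the compiler as an include path, and fall back to `get_include()` only when building natively.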
pip provides support for installing wheels for a different platform by specifying the --platform, --implementation and --abi flags. However, these flags only work for packages with wheels, not sdists. Therefore, for cross compilation setups that rely on pip rather than another package manager to install build dependencies, it is cumbersome in practice to prepare the host (non-native) part of the cross build environment - a single missing -none-any wheel for a dependency that is pure Python necessitates hacks to get it installed.[2]
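As a rough sketch of how these flags fit together, the helper below assembles a `pip download` command for a foreign platform. The function name `pip_download_command` and the example tag values are illustrative assumptions. pip requires `--only-binary=:all:` (or `--no-deps`) when these flags are used, which is exactly why a dependency that only ships an sdist causes the command to fail.

```python
import sys
from typing import List, Optional


def pip_download_command(
    requirement: str,
    dest: str,
    platform_tag: str,
    python_version: str,
    implementation: str = "cp",
    abi: Optional[str] = None,
) -> List[str]:
    """Build a `pip download` command that fetches wheels for another platform."""
    cmd = [
        sys.executable, "-m", "pip", "download",
        # Wheels only: pip refuses to build sdists for a foreign platform.
        "--only-binary=:all:",
        "--platform", platform_tag,
        "--python-version", python_version,
        "--implementation", implementation,
        "--dest", dest,
    ]
    if abi is not None:
        cmd += ["--abi", abi]
    cmd.append(requirement)
    return cmd


# Example values are assumptions: an aarch64 Linux target running CPython 3.11.
cmd = pip_download_command(
    "six", "wheels", "manylinux2014_aarch64", "3.11", abi="cp311"
)
print(" ".join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` to perform the download.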
History
TODO
Relevant resources
- "Towards standardizing cross compiling", Ben Fogle (2021)
- "PEP xxxx - Standardized Config Settings for Cross-Compiling", Ben Fogle (2021)
- scipy#14812 - Tracking issue for cross-compilation needs and issues (2021)
Potential solutions or mitigations
At the core, what is required is a recognition that the use case of cross-platform builds is something that the Python ecosystem should support.
In concrete terms, for native modules, this would require at least:
- Making it possible to retrieve relevant metadata from a Python installation without having to run Python code.
- Clear separation of metadata associated with the definition of build and target platforms, rather than assuming that build and target platform will always be the same.
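As one illustration of the first point, POSIX builds of CPython typically ship a `_sysconfigdata_*.py` file that defines a `build_time_vars` dictionary. The sketch below reads that dictionary by parsing the file rather than importing it, so it works even when the file belongs to an interpreter that cannot run on the build machine. The helper name `read_build_time_vars` is hypothetical, and this is not a standardized interface.

```python
import ast
from pathlib import Path
from typing import Any, Dict


def read_build_time_vars(sysconfigdata_path: Path) -> Dict[str, Any]:
    """Extract `build_time_vars` from a _sysconfigdata file without executing it."""
    tree = ast.parse(sysconfigdata_path.read_text(encoding="utf-8"))
    for node in tree.body:
        if not isinstance(node, ast.Assign):
            continue
        for target in node.targets:
            if isinstance(target, ast.Name) and target.id == "build_time_vars":
                # literal_eval only accepts literals, so no code is run.
                return ast.literal_eval(node.value)
    raise ValueError(f"no build_time_vars assignment in {sysconfigdata_path}")
```

Keys such as `EXT_SUFFIX` or `SOABI` read this way describe the target interpreter, independently of the interpreter performing the build.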
In addition, to make cross-compilation easier to use and move from build system specific configuration files - like a "toolchain file" for CMake or a "cross file" for Meson - to a standardized version:
- Extension of the pyproject.toml build interface to allow communicating the desired target platform as part of a binary build; or
- Formalization of the "platform identification" interface that can be used by build backends to identify the target platform, so that tools like crossenv can provide a reliable proxied environment for cross-platform builds.
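To make the first option more concrete, here is a purely hypothetical sketch of how a build backend might receive a target platform through the `config_settings` argument that PEP 517 hooks already accept. The key name `target-platform` is an assumption invented for this example; no standard defines it. For simplicity the sketch also assumes the setting is a single string.

```python
import sysconfig
from typing import Mapping, Optional

# Hypothetical key: no standard defines this name today.
TARGET_PLATFORM_KEY = "target-platform"


def resolve_target_platform(
    config_settings: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the platform a backend should build for."""
    if config_settings and TARGET_PLATFORM_KEY in config_settings:
        return config_settings[TARGET_PLATFORM_KEY]
    # Current behaviour: assume the target equals the build platform.
    return sysconfig.get_platform()


def build_wheel(wheel_directory, config_settings=None, metadata_directory=None):
    """PEP 517 hook signature; the body is a placeholder, not a real build."""
    target = resolve_target_platform(config_settings)
    raise NotImplementedError(f"sketch only: would build a wheel for {target}")
```

The point of the sketch is the separation it makes explicit: the backend asks what the target is, and only falls back to the running interpreter when nothing was communicated.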
[1] The "build", "host" and "target" terminology for identifying which system is which in a cross-compilation setup is not consistent across build systems and packaging tools. Always carefully check whether "build" means the machine on which the compilation is run and "host" the machine on which the produced binaries will run - or vice versa.
[2] The correct solution - filing issues on each project asking them to upload a -none-any wheel next to their sdist - typically has a long lead time. Therefore Briefcase, the packaging tool for BeeWare, patches pip to allow installing projects from sdists when --platform is specified and only error out when the wheel build attempts to invoke a compiler. That way, pure Python packages can be installed directly.