This page collects issues that come up when packaging scientific and data science projects, but that are deemed less high-impact than the key issues.
Lack of support for symlinks in wheels
Shared libraries on Linux and other non-Windows platforms are often provided
and versioned via symlinks (examples:
libarrow in this Discourse thread,
#453). In order to build wheels
containing versioned shared libraries, symlink support is needed. In the
absence of that, the symlinks get materialized into full copies of the
symlinked files, blowing up wheel sizes.
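The size blow-up can be reproduced with nothing but the standard library, since a wheel is a zip archive. The following is a minimal sketch, not how any particular build backend works: the library name libexample.so.1.2.3 and the helper zip_versioned_library are made up for illustration, and the example assumes a platform where os.symlink is available. zipfile follows the symlinks when writing, so each link ends up as a full copy of the target.

```python
import os
import tempfile
import zipfile


def zip_versioned_library(payload_size=1024):
    """Zip one (fake) shared library plus two symlinks pointing at it.

    Returns a mapping of archive member name -> uncompressed size.
    All file names are hypothetical and only serve the illustration.
    """
    with tempfile.TemporaryDirectory() as tmp:
        # The "real" library file, filled with dummy bytes.
        real = os.path.join(tmp, "libexample.so.1.2.3")
        with open(real, "wb") as f:
            f.write(b"\0" * payload_size)

        # Typical versioning symlinks: libexample.so.1 and libexample.so.
        for link_name in ("libexample.so.1", "libexample.so"):
            os.symlink("libexample.so.1.2.3", os.path.join(tmp, link_name))

        archive = os.path.join(tmp, "demo.zip")
        with zipfile.ZipFile(archive, "w", zipfile.ZIP_STORED) as zf:
            for name in sorted(os.listdir(tmp)):
                if name != "demo.zip":
                    # ZipFile.write reads through the symlink, so the link
                    # is stored as a regular file with the target's content.
                    zf.write(os.path.join(tmp, name), arcname=name)

        with zipfile.ZipFile(archive) as zf:
            return {info.filename: info.file_size for info in zf.infolist()}


sizes = zip_versioned_library()
for member, size in sizes.items():
    print(member, size)
```

All three archive members have the size of the real library, so the payload is stored three times instead of once.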
A second use case for symlinks is for editable installs when the build system uses out-of-place builds. Out-of-place builds are the only option in Meson, and also good practice for CMake. For out-of-place builds, you end up with compiled extension modules and generated files in the build directory, and .py files in the source directory. To put those together into a working editable install, the most straightforward solution is putting symlinks to all files in a wheel - see meson-python#47.
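To make the editable-install idea concrete, here is a rough sketch of combining a source tree and a build tree into one importable tree of symlinks. It is not what meson-python or any other backend actually does: the function link_editable_tree, the directory layout, and the choice of file suffixes are assumptions made for this example, and it again assumes os.symlink is available.

```python
import os
import tempfile


def link_editable_tree(source_dir, build_dir, target_dir,
                       build_suffixes=(".so", ".pyd")):
    """Symlink .py files from source_dir and built files from build_dir.

    Hypothetical helper: it mirrors both trees into target_dir and returns
    the sorted list of created links, relative to target_dir.
    """
    linked = []
    trees = ((source_dir, (".py",)), (build_dir, build_suffixes))
    for root_dir, suffixes in trees:
        for dirpath, _dirnames, filenames in os.walk(root_dir):
            rel = os.path.relpath(dirpath, root_dir)
            for filename in filenames:
                if not filename.endswith(suffixes):
                    continue
                dest_dir = os.path.normpath(os.path.join(target_dir, rel))
                os.makedirs(dest_dir, exist_ok=True)
                dest = os.path.join(dest_dir, filename)
                # Point at the original file; edits to the source show up
                # immediately, rebuilt extension modules after a rebuild.
                os.symlink(os.path.join(dirpath, filename), dest)
                linked.append(os.path.relpath(dest, target_dir))
    return sorted(linked)


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "src", "pkg")
    bld = os.path.join(tmp, "build", "pkg")
    os.makedirs(src)
    os.makedirs(bld)
    open(os.path.join(src, "__init__.py"), "w").close()
    open(os.path.join(bld, "_ext.so"), "w").close()

    linked = link_editable_tree(
        os.path.join(tmp, "src"),
        os.path.join(tmp, "build"),
        os.path.join(tmp, "editable"),
    )
    print(linked)
```

Shipping such a tree inside a wheel is exactly what the current wheel format cannot express, which is why this use case depends on symlink support.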
It looks like there is an understanding now that symlink support is needed, and that it requires a new wheel format spec (and hence a PEP) - see Clarifications to the wheel specification.
An experimental setuptools extension, wheel-axle, implements support for producing a wheel containing symlinks.
Dropping support for old manylinux versions is difficult
Due to how wheel tags work, they need to be explicitly recognized by build and
install tools. Old versions of
pip tend to be used for years (especially in
Linux distros), which means that when a project starts distributing wheels in a
newer format (e.g.,
manylinux2014 instead of
manylinux1), those new wheels
will not be recognized for part of the user base for a long time. As a result,
projects are forced to also continue distributing the older format, to avoid
those users getting no wheels and a build from sdist instead. Being forced to
produce duplicate wheels for years is a lot of extra work and CI time. While this is
in principle a problem on all platforms, it tends to show up more for Linux
because of the combination of old
pip versions and more changes to platform
tags (we've had
manylinux2014 and now, with
PEP 600, "perennial manylinux" - but that still requires agreeing on new glibc
versions to start shipping in practice).
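The effect of an installer not knowing a newer platform tag can be illustrated with a deliberately simplified model of wheel selection. This sketch only looks at the platform tag in the wheel filename and ignores Python and ABI tags as well as tag priority, so it is not how pip actually resolves wheels; the package name example and the helper names are invented for the illustration.

```python
def platform_tags(wheel_filename):
    """Return the platform tags of a wheel filename.

    The platform tag is the last dash-separated field before ".whl";
    several tags may be joined with dots (a compressed tag set).
    """
    stem = wheel_filename[: -len(".whl")]
    return stem.split("-")[-1].split(".")


def pick_wheel(wheel_filenames, supported_platforms):
    """Return the first wheel whose platform tag the installer knows.

    Returns None when nothing matches, which in practice means falling
    back to building from the sdist.
    """
    for filename in wheel_filenames:
        if any(tag in supported_platforms for tag in platform_tags(filename)):
            return filename
    return None


# Hypothetical installers: the old one predates manylinux2014.
old_installer = {"manylinux1_x86_64", "linux_x86_64"}
new_installer = old_installer | {"manylinux2010_x86_64", "manylinux2014_x86_64"}

only_new = ["example-1.0-cp39-cp39-manylinux2014_x86_64.whl"]
both = only_new + ["example-1.0-cp39-cp39-manylinux1_x86_64.whl"]

print(pick_wheel(only_new, old_installer))  # no match: sdist build
print(pick_wheel(only_new, new_installer))  # the manylinux2014 wheel
print(pick_wheel(both, old_installer))      # the manylinux1 wheel
```

Only when the project keeps uploading the manylinux1 wheel does the old installer find a binary, which is the duplicate work described above.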
Wheel build tooling is implemented in a scattered fashion
When working with native dependencies, one must use a tool to vendor
dependencies that aren't part of the platform by wheel standards. There are at
least three different tools for this: auditwheel (Linux), delocate (macOS), and
delvewheel (Windows). They have the same job, but are three independent
projects with different capabilities. This is bad from a usability perspective,
and when improvements to this tooling need to be made, the discussion may have
to be had multiple times (example: adding an
--exclude option to not vendor
certain libraries: auditwheel#368).
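One practical consequence is that build scripts need per-platform dispatch just to run the equivalent repair step. The sketch below shows what such glue code might look like, assuming the commonly used invocations auditwheel repair, delocate-wheel and delvewheel repair, each with a -w option for the output directory. The function repair_command is hypothetical and only builds the command line; it does not run anything.

```python
import sys

# Basic repair invocations per platform prefix. Further options differ
# between the three tools and are intentionally left out of this sketch.
REPAIR_COMMANDS = {
    "linux": ["auditwheel", "repair", "-w"],
    "darwin": ["delocate-wheel", "-w"],
    "win32": ["delvewheel", "repair", "-w"],
}


def repair_command(wheel_path, dest_dir, platform=sys.platform):
    """Build the command that vendors native libraries into a wheel.

    Hypothetical helper: returns an argument list suitable for
    subprocess.run, or raises ValueError for an unknown platform.
    """
    for prefix, command in REPAIR_COMMANDS.items():
        if platform.startswith(prefix):
            return [*command, dest_dir, wheel_path]
    raise ValueError(f"no wheel repair tool known for platform {platform!r}")


print(repair_command("dist/example-1.0-cp39-cp39-linux_x86_64.whl",
                     "wheelhouse", platform="linux"))
```

Anything beyond this common subset, such as excluding a library from vendoring, has to be looked up and handled separately for each tool.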
This scattering issue can also be observed in the many support packages to deal
with metadata, wheel tags, and other aspects of producing wheels, e.g.:
pyproject-metadata. And with
build not using the same UX for things like
Bootstrapping and circular dependencies of Python packaging tools
Python packaging tools have a bit of a bootstrapping issue, which is a problem
for other packaging systems when they want to incorporate those packages. If
one wants to build and install
wheel from source, one needs
wheel already installed. Same for
poetry does better here, it uses
build itself). This is getting better -
pip vendors all of its runtime
dependencies so it can produce a wheel to install itself, and
vendors a TOML parser - but there is still a ways to go. See the
"Bootstrapping a specific version of pip"
thread for some discussion on this.
No good way to install headers or non-Python libraries
If a library provides functionality that is meant to be used from C or C++ code
in another package, one needs to install headers and libraries. To make that
work well, those headers and libraries should be installed in a place where
other tools can find them. There are standard places for this on a system, e.g.
for a prefix
/usr the headers may go into
/usr/include/ and the libraries into
/usr/lib. This is technically possible with wheels, but recommended
against because the install process may clobber system files. As a result, what
projects like NumPy, Pybind11 and PyArrow end up doing is installing into their
own tree under
site-packages/pkgname (which certainly won't be on a search
path), and then recommending that consuming packages query the location with a
get_include function. E.g.:
import pyarrow
pyarrow.get_include()
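On the providing side, such a function typically just computes a path relative to the installed package. The following is a minimal sketch of that pattern under the assumption that headers are shipped in an include directory next to the package's __init__.py; it is not the actual implementation of any of the projects mentioned, and the path /site-packages/pkgname is a placeholder. For comparison, it also prints where the standard library reports the interpreter's own headers.

```python
import os
import sysconfig


def get_include(package_file):
    """Return the include directory shipped inside a package's own tree.

    Inside a real package this would be called with __file__; the
    "include" subdirectory name is an assumption for this sketch.
    """
    package_dir = os.path.dirname(os.path.abspath(package_file))
    return os.path.join(package_dir, "include")


# Placeholder path standing in for an installed package.
example = get_include("/site-packages/pkgname/__init__.py")
print(example)

# The interpreter's own headers live in a location known to sysconfig,
# which is why build tools can find Python.h without being told.
print(sysconfig.get_paths()["include"])
```

A consuming package then has to pass the returned directory to its compiler explicitly, because no build tool looks under site-packages by default.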
These still have to be worked out:
- UX for build and install tools is painful and easy to shoot oneself in the foot with (e.g., most users and maintainers don't understand the details of build isolation)
- Tooling will often assume virtualenvs only, and/or deal with environment activation when it really shouldn't.
Created: December 20, 2022