Packaging is an important and time-consuming part of authoring and maintaining Python packages. This is particularly true for projects that are not pure Python but contain code that needs to be compiled, and have to deal with distributing compiled extensions and with build dependencies. Many projects in the PyData ecosystem - which includes scientific computing, data science and ML/AI projects - fall into that category. This site aims to provide an overview of the most important Python packaging issues for such projects, with in-depth explanations and references.
The content on this site is meant to offer insight and serve as good reference material. It will hopefully provide common ground when discussing potential solutions to these problems, or design changes to Python packaging as a whole or to individual packaging tools.
The content is divided into "meta topics" and "key issues". Meta topics mainly describe aspects of Python packaging that are more or less inherent to its overall design, along with the consequences and limitations that follow from that design. Key issues are more specific pain points felt by projects with native code, and they may be more tractable, in the sense that solutions or workarounds are easier to devise for them.
How are these topics chosen and ranked?
The initial list of topics was constructed by soliciting input from ~25 people, who together are a representative subset of stakeholders:
- maintainers of widely used PyData projects like NumPy, scikit-learn, Apache Arrow, CuPy, Matplotlib, SciPy, h5py, JupyterHub and Spyder,
- maintainers of package repositories, package managers and build systems (pip, PyPI, Conda, conda-forge, Spack, Nix, pypa/build, Meson, and …),
- engineers from hardware vendors like Intel and NVIDIA,
- engineers responsible for deploying software for HPC users,
- educators and organisers of user groups (WiMLDS, SciPy Lectures, Data Umbrella),
Adding new topics and making changes to existing content on this site happens through community input on GitHub.
Where do potential solutions for these topics get discussed?
The central place for discussing potential changes to Python packaging is the packaging category of the Python Discourse forum. For smaller changes and ideas that are specific to a single tool, the issue tracker of that tool is most likely a good place to start.
For relevant big-picture discussions about changes to Python packaging, which frequently touched on issues with native code, see these threads:
- Wanting a singular packaging tool/vision (started Nov 2022),
- Python Packaging Strategy Discussion - Part 1 (Jan-Feb 2023),
Meta topics
- Build & package management concepts and terminology
- The multiple purposes of PyPI
- PyPI's author-led social model and its limitations
- Lack of a build farm for PyPI
- Expectations that projects provide ever more wheels
Key issues
- Native dependencies: this is, by some distance, the most important issue. Several types of native dependencies are discussed in detail.
- Packaging projects with GPU code
- Metadata handling on PyPI
- Distributing a package containing SIMD code
- Unsuspecting users getting failing from source builds
- Cross compilation
All contributions are very welcome and appreciated! Ways to contribute include:
- Improving existing content on the website: extending or clarifying descriptions, adding relevant references, diagrams, etc.
- Providing feedback on existing content
- Proposing new topics for inclusion on the website, and writing the content for them
- ... and anything else you consider useful!
The content for this website is maintained on GitHub.
- Initial development of this website was sponsored by Intel,
- Initial development effort was led by Quansight Labs,
Created: December 20, 2022