This describes Python-specific guidance for checking code into //piper/third_party/py.
IMPORTANT: Read go/thirdparty first.
NOTE: Python packages are installed in subdirectories of //piper/third_party/py.
Overview
Adding third party code to google3 is relatively straightforward, but it has a couple of additional steps to ensure that legal and correctness requirements are met.
In brief, the overall process is:
- Follow the steps (see following sections) for importing your code. Be sure to follow go/pristinecopy.
- Create a CL and send for review
Adding from external sources of truth (github, etc)
This is for code that isn't coming from elsewhere in google3. This includes Github, BitBucket, PyPI, and Git-on-Borg.
IMPORTANT: You must follow go/pristinecopy. Your code will not be approved otherwise.
To import external code, follow these steps:
- Start with go/thirdparty/new. It will explain requirements common to all third party code.
- What is the source of truth?
- Github: use Puppy or manually set up Copybara.
- Bitbucket: manually set up Copybara.
- PyPI: manually import code (use Github if you can).
- Otherwise: manually import code.
- Do you need to modify the code to make it work in google3?
- If yes, how are you importing the code?
- Copybara/Puppy: Turn the modifications into Copybara transforms or Copybara patches.
- Otherwise: Your initial CL must not include the modifications. It is OK if this makes builds/tests fail. Any modifications must be done in followup CLs (go/fig makes this easy) using manual patches or direct modifications.
- Remove extraneous files.
- If you're using Puppy: Puppy will mostly take care of this, but there may be some additional patterns to ignore.
- What sort of Python package is the code?
- Regular package: Follow the directory structure for packages steps.
- Namespaced package: Follow the directory structure for namespace packages.
- Not a package: follow the non-package third party libraries steps
- Not sure: See Determining package type
- Follow Writing BUILD rules to create the library's BUILD rules.
- Does the code come with tests?
- If yes, what test framework does it use?
- pytest: Use the go/pytest_test rules.
- unittest/absltest: Use py_test rules.
- Not sure: See figuring what test framework is used
- Otherwise, ask for advice in your code review.
- Otherwise: Follow Creating a basic test to ensure it builds and is importable.
- Does the code depend on other third party code?
- Yes: Those libraries must also be imported into third_party/py in their own CLs. See the Dependencies section for more information.
- If the code creates C extensions, follow Building extension modules
- Follow other Common requirements.
Once your code structure is done, follow the steps for code review.
Adding from internal sources of truth
This is for code that is either developed directly in third_party or elsewhere in google3.
- Check that the package name (the third_party/py directory name) is unused in the global Python ecosystem. Check PyPI, Github, or other popular sites. IMPORTANT: Your Python package name must be unused in the external Python ecosystem. See Unique Package Names.
- Follow other Common requirements.
Once your code structure is done, follow the steps for code review.
Externally unique package names
Because Python packages share a global namespace, packages need to have unique top-level names in the larger Python ecosystem. If two packages try to use the same name, it becomes effectively impossible for a program and its transitive dependencies to use both at the same time.
As an example of how this creates an impossible situation, consider a program, app.py: app.py imports one and two. one and two are unrelated to each other, owned by different people, and otherwise unaware of and unconcerned with each other. one imports conflict, intending to get Alice's conflict library to do Alice stuff. two also imports conflict, but intends to get Bob's conflict library to do Bob stuff.
If app.py depends on Alice's conflict library, then two doesn't work. If it depends on Bob's, one doesn't work. But app.py needs both one and two, otherwise it isn't very useful. So now app.py is stuck, and one and two are mutually exclusive.
Because this problem applies transitively, the only real way to avoid it is for Alice's and Bob's library to have different top-level names.
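The collision can be reproduced outside google3 with plain Python. The following is only an illustrative sketch: the directories alice and bob and the OWNER attribute are made up for the demo, and it uses sys.path directly rather than anything Blaze-specific. It shows that two imports of the same top-level name always resolve to one module, so one of the two callers gets the wrong library.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def demo_name_collision():
    """Show that only one top-level module named `conflict` can be imported."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical layout: two unrelated libraries, both named `conflict`.
        alice_dir = Path(tmp, "alice")
        bob_dir = Path(tmp, "bob")
        for directory, owner in ((alice_dir, "Alice"), (bob_dir, "Bob")):
            directory.mkdir()
            (directory / "conflict.py").write_text(f"OWNER = {owner!r}\n")

        saved_path = list(sys.path)
        sys.modules.pop("conflict", None)
        sys.path[:0] = [str(alice_dir), str(bob_dir)]
        importlib.invalidate_caches()
        try:
            # What `one` does, expecting Alice's library.
            seen_by_one = importlib.import_module("conflict")
            # What `two` does, expecting Bob's library.
            seen_by_two = importlib.import_module("conflict")
            return seen_by_one.OWNER, seen_by_two.OWNER, seen_by_one is seen_by_two
        finally:
            # Leave the interpreter state as we found it.
            sys.path[:] = saved_path
            sys.modules.pop("conflict", None)
```

Both imports return the same module object, taken from whichever directory comes first on the path; Bob's library is never loaded.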
NOTE: Your package name doesn't have to be absolutely, unequivocally globally unique. It just has to have a name that is unused to the best of your knowledge. There is no central authority for Python package name assignment.
TIP: If you're part of a larger project, you can use namespace packages to independently release sub-packages within a top-level package, thus avoiding any potential name collisions.
Common requirements
- Python 3 support: the code must support Python 3.
- As-installed file layout: the layout of the .py files must be the same as when installed. It's not uncommon for the source layout to be different than the installed layout because e.g. all the code is under a src sub-directory or setup.py moves files around.
- Directory name matches import name: The third_party/py directory name must match the top-level import name. i.e. third_party/py/spam must be imported as import spam.
- Any additional go/thirdparty requirements.
Code Review
Once your code is ready, send a code review to third-party-*removed* to have a reviewer assigned automatically. A third_party/py/OWNERS reviewer will then review your CL, verify it is properly imported, and approve your CL. The whole process typically takes a couple of days.
It is OK for building and testing to fail in the initial CL because go/pristinecopy prevents you from making fixes. Adding tests and build rules is still required, though, so that we can verify it is being set up correctly.
NOTE: Code review by third-party-removed is only required for the initial CL adding a new package to third_party/py. Subsequent CLs should follow the regular Google review process (LGTM by anyone, Approval by package owner). The auto-assigner will try to detect whether a package is new, but it may fail. If it thinks your package is new when it isn't, you do not need to wait for review before submitting. If you need or want third-party-removed review and the auto-assigner isn't assigning anyone, send it to emailremoved@ for review.
Using third-party packages that have already been installed
To use a module named PIL, you need to add an import statement to your code, and list it as a BUILD dependency.
Code in my_main_binary.py:
import PIL.Image  # Import the Image submodule explicitly, then use it as normal
...
def MyFunc():
  c = PIL.Image.open(...)
  ...
BUILD rule:
py_binary(
    name = "myprogram",
    ...
    deps = [ ...
        "//third_party/py/PIL:pil",
    ],
    ...
)
If you invoke a Python interpreter interactively, or otherwise run a Python program without going through the google3 build system, do not expect your imports to work. Use Blaze instead.
In the example above, PIL is actually a package, and Image is a module inside that package. If you are attempting to use a plain old single level module, you'd use import lines like this:
import SOAPpy
...
SOAPpy.foo(...)
and the corresponding BUILD rule:
deps = [ ...
    "//third_party/py/SOAPpy",
],
Installing new third-party packages
Preferred method: install with Puppy
The preferred method for importing new Python packages is to use Puppy (go/puppy-python), a command line tool for gLinux. It will transform GitHub projects to third_party/py format and generate a go/copybara config to make future updates easier.
See go/puppy-python for detailed documentation. As a quick example, importing a package hosted at https://github.com/google/example can be done by running (note that this needs the 'quilt' and 'python3-venv' Debian packages installed):
blaze run //devtools/python/janitor/puppy -- \
--new https://github.com/google/example
This generates a CL importing the example package into //third_party/py, as well as a Copybara configuration file which can be used to easily bring in future updates, apply Google-specific patches, and have go/3pp-upgrade-service register CaaS for you. You'll still need to write the BUILD file yourself; see the BUILD section below for details.
Creating a Copybara config
Setting up Copybara is less daunting than the large volume of docs and configuration options might imply. We highly recommend using Puppy to, at the least, generate a base Copybara config. Once generated, you can modify it as you please.
For detailed Copybara documentation, see go/copybara; for git-specific docs, see go/copybara-git-sot.
If you have to manually create a Copybara config, please use the git_to_third_party_py macro unless it doesn't work for your case. There are usually three things you need to do:
- Move and rename files: this is done using core.move().
- Apply source modifications: this is done using transformations (e.g. core.replace() or patches, which are a special type of transformation).
- Ignore extraneous files: this is done using the exclude parameter of glob() when passing in the list of files to include to the git_files parameter of git_to_third_party_py.
Putting this all together, here is a very basic and minimal Copybara config to get you started:
load("//devtools/python/janitor/puppy/puppy", "git_to_third_party_py")
git_to_third_party_py(
    git_origin_url = "https://github.com/project/spam",
    # Your package will be imported to //third_party/py/spam.
    python_package_name = "spam",
    git_files = glob(
        include = ["**"],
        exclude = ["bad_spam.py", "docs/**", "samples/**", "setup.*"],
    ),
    transformations = [
        core.move("src", "")
    ],
    patching_enabled = False,
    version_selector = None,
    git_ref = "main",
)
Modifying source code
It's not uncommon for third party code to need a few minor modifications.
There are several ways to do this:
- Copybara transformations: best for simple, regex based find-replace changes.
- Copybara patching: best for complicated modifications that a Copybara transformation can't do.
- Manually applied patches: best when you're not using Copybara.
- Directly modify the code: an option of last resort, and only if you're not using Copybara.
BEST PRACTICE: Using Copybara is the best way to manage modifications to third party source. The main advantage is that, when you later upgrade the library, the modifications will be re-applied and you don't have to figure out what was done months ago.
BEST PRACTICE: Avoid making changes just to satisfy our internal linter or formatter. Those changes unnecessarily make future upgrades difficult.
Copybara transformations
Copybara transformations are a quick and simple way to make sed-like modifications to the source code. Custom Starlark code can also be used to create more advanced transformations.
See the Copybara API reference for detailed documentation about Copybara's API. We only list the ones you're most likely to need.
- core.replace(): regex-based find-replace transformation. It's ideal for, e.g., replacing problematic imports.
- core.move(): Rename files and directories. Ideal for, e.g., moving code out of a src sub-directory, renaming LICENSE files, and turning non-packages into packages.
Copybara patches
Copybara patches are ideal for complicated modifications that regular transformations can't do.
The basic way to do this is:
- Create a patches directory and add patch files to it (example). See Manual patches for how to generate and maintain patch files.
- Create a patches/series file that lists the patch file names in the order to apply them (example).
- Pass patching_enabled = True to the git_to_third_party_py macro. Also remember to:
- Add the patch file to the CL.
- Add the patch file to the series file.
Copybara will then apply the patches when it imports code, after having applied the other Copybara transformations.
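Forgetting to list a patch in the series file is an easy mistake to make. As a rough aid, a consistency check could look like the sketch below. It is not part of Puppy or Copybara; the function name check_series and the patch file names are hypothetical, and it assumes the series file holds one patch file name per line, with blank lines and # comments ignored.

```python
import tempfile
from pathlib import Path


def check_series(patches_dir):
    """Return (unlisted, missing) patch names for a patches directory.

    unlisted: patch files on disk that the series file does not mention.
    missing: names in the series file that have no file on disk.
    """
    patches_dir = Path(patches_dir)
    listed = [
        line.strip()
        for line in (patches_dir / "series").read_text().splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    ]
    on_disk = {
        p.name for p in patches_dir.iterdir() if p.is_file() and p.name != "series"
    }
    unlisted = sorted(on_disk - set(listed))
    missing = [name for name in listed if name not in on_disk]
    return unlisted, missing


def demo_check_series():
    """Build a small patches directory with one unlisted and one missing patch."""
    with tempfile.TemporaryDirectory() as tmp:
        patches = Path(tmp, "patches")
        patches.mkdir()
        (patches / "0001-fix-imports.patch").write_text("")
        (patches / "0002-drop-setup.patch").write_text("")
        (patches / "series").write_text("0001-fix-imports.patch\n0003-gone.patch\n")
        return check_series(patches)
```

Both returned lists should be empty before the CL is sent for review.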
In the end, you should have a config similar to this:
load("//devtools/python/janitor/puppy/puppy", "git_to_third_party_py")
git_to_third_party_py(
...
patching_enabled = True,
)
Manual patches
Manual patches are ideal for when you can't use Copybara. Their main advantage is, because they record what changed, they can be easily re-applied to future imports of the third party code.
Patches are, by convention, kept in a patches sub-directory of the third party code.
To aid generating and managing patches, you can use go/qu4, which is a tool that will track changes to files and generate diffs for you.
Once you have a patch file, you can apply it to the code, then send a separate CL (after the initial, pristine code import) as a direct modification.
Direct modifications
Direct modifications of the source should be considered a last resort. The main disadvantage of them is they make later upgrades more difficult: the changes will be lost, and someone has to go through the change history to figure out what to reapply and how to reapply it.
In any case, doing this is simple: just modify the code and send a CL. It's strongly suggested to create manual patches for any changes so that later upgrades are easier.
IMPORTANT: Remember go/pristinecopy: source modifications are not allowed in the initial CL (Copybara excepted).
Dependencies
If your third-party package X depends on another third-party package Y, install Y at the top level //third_party/py/Y instead of trying to put it as a subdir inside your //third_party/py/X.
Directory structure
NOTE: When installing a new package, if possible please follow the preferred method section instead of generating the folder structure manually. This will allow you to automatically pull in future package updates, and reduce the amount of time you'll have to spend updating your CL to fix structural issues.
One of the main considerations when introducing new software in //piper/third_party/py is to ensure that the way the new software is imported by other Python code remains the same inside Google as outside. This matters because software in //piper/third_party/py that depends on other software in //piper/third_party/py then needn't be modified to reflect a Google-specific way of importing one of its dependencies.
For example, if the spam software is typically imported with:
import spam
from spam import bacon
it should work the same way inside google3.
Our Blaze build Python runtime (aka "hermetic Python") ensures that the built third_party/py tree is in sys.path. This ensures that statements like import xyz or from xyz import zzx will find software from //third_party/py/xyz so long as it is in your binary or test's transitive BUILD deps. The following sections explain how to make sure you install the software in //piper/third_party/py in a way that will allow everything to work.
You will know that you've installed your software correctly into //piper/third_party/py if, after building a py_binary that depends on it, that binary can import and use your software in the same way the upstream examples do.
Determine Python package type
If it's not apparent how the Python code is packaged, here's some guidance on how to figure it out:
- Is it a single file not named __init__.py? Then it's a non-package module.
- Is all the code in a sub-directory? Then it's a namespace package.
- Is there an __init__.py file? Then it's a regular package.
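The checks above can be written down as a small helper. This is only a sketch of one way to apply them: the function name guess_package_type is made up, it looks at a single directory (the one that would become the third_party/py directory), and it reads "all the code in a sub-directory" as "no .py files at the top level". Real projects may need a closer look.

```python
import tempfile
from pathlib import Path


def guess_package_type(root):
    """Apply the checks above, in order, to one source directory."""
    root = Path(root)
    py_files = [p for p in root.iterdir() if p.is_file() and p.suffix == ".py"]
    sub_dirs = [p for p in root.iterdir() if p.is_dir()]

    # A single file that is not __init__.py: a plain module.
    if len(py_files) == 1 and not sub_dirs and py_files[0].name != "__init__.py":
        return "non-package module"
    # No Python files at this level, only sub-directories: namespace package.
    if not py_files and sub_dirs:
        return "namespace package"
    # An __init__.py at this level: regular package.
    if (root / "__init__.py").is_file():
        return "regular package"
    return "unknown"
```

A result of "unknown" means the heuristics don't apply and the layout needs to be inspected by hand.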
Packages
When a third-party software spam is installed as a Python package (a directory with an __init__.py file), we just duplicate the package structure under third_party/py/spam and everything will automatically work. An example of this structure would be:
google3/
  third_party/
    py/
      spam/
        BUILD
        METADATA
        OWNERS
        __init__.py
        bacon.py
You can recognize your software is being distributed as a package if it has an __init__.py file, accompanied by zero or more other Python files or binary extensions.
With the above structure, the following would work:
import spam
from spam import bacon
Often, packages are distributed as source packages containing a Python package: they will have a setup.py file describing how to install the package, and the actual Python package as a subdirectory (alongside the setup.py file or in a src subdirectory). The Python package is the thing that needs to be duplicated in //third_party/py. The preferred installation method takes care of this automatically.
Namespace packages
Namespace packages are mostly treated as if they weren't namespaced.
- Add the code, as usual, as a sub-directory of its containing package.
- In the parent package's METADATA, set third_party { type: GROUP }.
For example, given a spam.eggs namespaced package, the file layout should resemble:
# Relative to third_party/py
spam/METADATA # type: GROUP
spam/eggs/METADATA
spam/eggs/OWNERS, BUILD, LICENSE, etc
spam/eggs/*.py, etc
Third-party software not distributed as a package
Some third-party libraries are not structured as Python packages: there will be no __init__.py file, and typically just one single Python source file, e.g. eggs.py, that gets imported with import eggs.
In this case, the library must be transformed into a package in order for the Google machinery to work. You do that by creating a file named //piper/third_party/py/eggs/__init__.py, and placing the contents of eggs.py inside it:
google3/
  third_party/
    py/
      eggs/
        BUILD
        METADATA
        OWNERS
        __init__.py  # Has the contents of eggs.py.
The following will work:
import eggs
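To see why this transformation keeps import eggs working, here is a small stand-alone sketch. It is not the google3 tooling: module_to_package is a hypothetical helper, the directories live in a temporary location, and sys.path is adjusted by hand to stand in for the third_party/py root.

```python
import importlib
import shutil
import sys
import tempfile
from pathlib import Path


def module_to_package(module_file, dest_root):
    """Copy eggs.py to <dest_root>/eggs/__init__.py and return the package dir."""
    module_file = Path(module_file)
    package_dir = Path(dest_root) / module_file.stem
    package_dir.mkdir(parents=True)
    shutil.copyfile(module_file, package_dir / "__init__.py")
    return package_dir


def demo_module_to_package():
    """Convert a one-file module and import it under its original name."""
    with tempfile.TemporaryDirectory() as tmp:
        upstream = Path(tmp, "upstream")
        upstream.mkdir()
        (upstream / "eggs.py").write_text("def cook():\n    return 'scrambled'\n")
        site = Path(tmp, "site")  # Stand-in for the third_party/py root.
        module_to_package(upstream / "eggs.py", site)

        sys.path.insert(0, str(site))
        importlib.invalidate_caches()
        try:
            eggs = importlib.import_module("eggs")
            return eggs.cook(), Path(eggs.__file__).name
        finally:
            sys.path.remove(str(site))
            sys.modules.pop("eggs", None)
```

The import name is unchanged; only the file that backs it is now eggs/__init__.py.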
It may also be the case that, in addition to eggs.py, the software includes some private helpers that are not meant to be imported by the user of the software. For example, if in the case above a module _eggs.py was also included, it's fine to ship it in the same directory, thus:
google3/
  third_party/
    py/
      eggs/
        BUILD
        METADATA
        OWNERS
        __init__.py  # Has the contents of eggs.py.
        _eggs.py
In this case, import _eggs will not work except when in __init__.py or other files in that directory; but that shouldn't be a concern since it's a private module.
Finally, if the software consists of several modules, e.g. milk.py and chocolate.py, all of which should be importable by the user as top-level modules (that is, import milk, chocolate should work), please get in touch with third-party-removed to devise a sensible solution for your case. But this would be very atypical.
Extraneous files
Because open source projects and google3 differ in their development tooling, open source code typically has many files that aren't relevant to google3. Since they go unused in google3, their presence is confusing.
In particular, remove files that appear to be part of the build/test process, including, but not limited to:
- setup.py, setup.cfg, MANIFEST, requirements.txt et al
- Makefiles, configure scripts, et al
- Project config files: mypy.ini, pytest.ini, py.typed, tox.ini, pyproject.toml, appveyor.yml, etc
- Dot files
It's recommended to also remove the following:
- Unused code, such as sample or example code.
- Unbuilt/unused binaries.
- Documentation. You may keep it if you wish, but usually it is only readable in source form because, e.g., g3doc won't render third party docs.
See Puppy's git_exclude list for more examples.
- If you're using Copybara: add the patterns to origin's exclude patterns.
- If you're using Puppy: add the patterns to git_exclude.
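Before writing exclude patterns, it can help to list which files in a checkout look extraneous. The sketch below is one way to do that with the standard library. The pattern list is an approximation of the bullets above, not Puppy's actual git_exclude list, and the function name find_extraneous is made up.

```python
import fnmatch
from pathlib import PurePosixPath

# Approximation of the file lists above. The exact spellings ("MANIFEST.in",
# "requirements*.txt", "Makefile", "configure") are assumptions.
EXTRANEOUS_PATTERNS = [
    "setup.py", "setup.cfg", "MANIFEST", "MANIFEST.in", "requirements*.txt",
    "Makefile", "configure",
    "mypy.ini", "pytest.ini", "py.typed", "tox.ini", "pyproject.toml",
    "appveyor.yml",
    ".*",  # Dot files and dot directories.
]


def find_extraneous(paths):
    """Return the paths whose file name, or any directory component, matches."""
    flagged = []
    for path in paths:
        parts = PurePosixPath(path).parts
        if any(
            fnmatch.fnmatchcase(part, pattern)
            for part in parts
            for pattern in EXTRANEOUS_PATTERNS
        ):
            flagged.append(path)
    return flagged
```

For example, given spam/__init__.py, setup.py and .github/workflows/ci.yml, only the last two are flagged. The flagged paths are candidates to turn into exclude patterns, not an authoritative list.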
Writing BUILD rules
Create a BUILD file with a single py_library rule and at least one test.
This will look like:
py_library(
    name = "spam",
    srcs = [
        "__init__.py",
        "bacon.py",
        ...
    ],
    srcs_version = "PY3ONLY",
)
py_test(
    name = "spam_test",
    srcs = ["spam_test.py"],
    srcs_version = "PY3ONLY",
    python_version = "PY3",
    deps = [
        ":spam",
        "//testing/pybase"
    ],
)
The code must support Python 3. If it does not, expect strong pushback from your third-party-removed reviewer. Exceptions will be rare, likely involving you immediately taking on porting the code to PY3 in a child CL.
Determine test framework
If it's not apparent what test framework the third party code uses, here are some ways to figure it out:
- If pytest is imported somewhere, then it likely uses pytest.
- If test files don't have an if __name__ == '__main__': ... block, then it likely uses pytest.
- If unittest.main() is called, then it uses the stdlib's unittest.
- If absltest.main() is called, then it uses absltest.
- If nose is mentioned, then it likely uses nose.
- If it has tests, and those tests have an if __name__ == '__main__': ... block, but it doesn't appear to use unittest or absltest, then it can probably be treated the same as if it was using unittest.
Creating a basic test
If the third party code lacks tests, then you need to create a basic test to ensure your targets build and can be imported. Here is a simple template to copy:
import spam
import unittest
class SpamTest(unittest.TestCase):
  def test_basic(self):
    self.assertTrue(spam.some_attribute)
if __name__ == '__main__':
  unittest.main()
Building extension modules
Always build Python binary extensions using a py_extension rule. This builds the library and its dependencies for proper loading within our Python runtime. This ensures, for example, that binaries in google3 depending on two packages in //piper/third_party/py, that in turn depend themselves on OpenSSL, will only load a single copy of OpenSSL.
Do not use cc_binary or cc_library to build Python binary extensions. If you come across any documentation recommending that you do so, please contact emailremoved@ for investigation.
Here's a sample BUILD file for a Python library with one binary extension:
py_library(
    name = "spam",
    srcs = [
        "__init__.py",
        "bacon.py",
    ],
    deps = [
        ":_eggs",
    ],
)
py_extension(
    name = "_eggs",
    outs = ["_eggs.so"],
    srcs = [
        "eggs.c",
        "util.c",
    ],
    deps = [
        "//third_party/python_runtime:headers",
    ],
)
Some important notes:
- //third_party/python_runtime:headers is always needed as a dependency; this will load the version of Python associated with the Crosstool version in use.
- if the binary extension requires some library to work, add it in deps, e.g. //third_party/openssl:crypto.
- if the C code requires some extra options for the compiler, you can use the copts attribute; however, you will need to add "$(PYTHON_EXTENSION_COPTS)" to it, since that is the default for py_extension.
- if the third-party software ships several binary extensions (several .so files), and they all share some utility code in a common file (util.c, for example), do not include that file in the srcs attribute of each extension. Instead, create a separate cc_library with the utility code, and add it to the extensions as a dependency. (See an example in //piper/third_party/py/OpenSSL/BUILD)
Other gotchas
- Only //third_party/python_runtime:headers is needed as a dependency for Python extensions. In particular, Python extensions must never depend on //third_party/python_runtime:embedded_interpreter, which brings in libpython itself: binary extensions will always be loaded into a process that embeds this library already (be it the Python interpreter, or some other process), and duplication would result in hard-to-diagnose crashes.
Precompiled extension modules
Being able to run Python code without going through the BUILD system is sometimes desired. However, this requires checking in compiled binaries for all extension modules, which brings with it a high maintenance burden on the packages' owners and on other teams (e.g. the compiler and Python teams). Please consider whether you really need this, as it's become exceedingly rare in google3.
If there is a real need to provide precompiled extension modules, the code can be structured to make this possible. Do realize that you are committing yourself (and the other owners of your package) to regular maintenance to rebuild the package with newer compilers and Python versions. For each extension module, provide a precompiled version. Then provide Python code that, at run-time, will first try to locate the shared library in the build system, then fall back to a precompiled version.
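The run-time fallback described above could be structured along these lines. This is a sketch, not an established google3 helper: load_extension is a hypothetical function, and the module names in the usage note (spam._eggs, spam._precompiled._eggs) are placeholders for wherever the built and precompiled copies actually live.

```python
import importlib


def load_extension(candidates):
    """Import and return the first module in `candidates` that loads.

    `candidates` is an ordered list of module names: the extension provided
    by the build system first, the checked-in precompiled copy last.
    """
    errors = []
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError as exc:
            # Remember why this candidate failed, then try the next one.
            errors.append(f"{name}: {exc}")
    raise ImportError("no usable extension module; tried " + "; ".join(errors))
```

A package would then call something like load_extension(["spam._eggs", "spam._precompiled._eggs"]) in its __init__.py. Collecting the individual errors keeps the final failure message useful when neither copy can be loaded.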
Cython sources
When your package contains Cython sources, use go/cython-rules to compile from Cython sources to C/C++ code. Do not check in pre-generated source files; they are only guaranteed to work in the Python version that was originally used to generate them.
Reviewer Checklist
- [ ] Initial checkin is pristine: go/pristinecopy
- [ ] OWNERS file lists at least two owners
- [ ] METADATA's third_party.url.value (type: GIT) and copy.bara.sky's git.origin refer to the same repository.
- [ ] Has at least one usable target (e.g. py_library) and test (e.g. py_test).
- [ ] Tests call unittest.main() OR pytest build rules are used.
- [ ] Directory name matches imported name.
- [ ] Import name is unclaimed in the Python ecosystem (only applies to to-be-open-sourced packages).
- [ ] File layout is the "as installed" layout.
The following can be copy/pasted into the CL description to aid validation and tracking on a per-CL basis:
Startblock:
has LGTM from http://linkremoved/
has tag PRISTINE
has tag METADATA_URL_MATCHES_COPYBARA_URL
has tag HAS_RULES_AND_TESTS
has tag DIRNAME_MATCHES_IMPORT_NAME
has tag IMPORT_NAME_OK
has tag FILE_LAYOUT_OK
WANT_LGTM=all
Add startblock to the reviewers to enforce the above. Startblock will then wait for the above tags to be added (e.g. PRISTINE=yes) before approving.