This describes R-specific guidance for checking code into //piper/third_party/R and for exporting R packages from //piper/third_party/R to GitHub repositories.
IMPORTANT: Read go/thirdparty first.
Before submitting new code see:
- //piper/third_party/OWNERS
- //piper/third_party/R/OWNERS
- //piper/third_party/README.md
as well as the rest of this file.
Introduction
This file describes how to add code to the //piper/third_party/R directory.
Third party code for the R language should go in Piper under
//piper/third_party/R. This makes it easier to keep track of third party code,
and ensure that we are in legal compliance with software licenses. For more
details about third-party code at Google, see go/thirdparty. Installing packages
from CRAN using install.packages is possible but discouraged for reasons of
security, compatibility, and so on; see
go/r-packages#installing-from-cran-is-discouraged.
The directory contains code for packages and for R itself, at:
//third_party/R/packages/PACKAGENAME/...
//third_party/R/bioconductor/PACKAGENAME/...
//third_party/R/R/...
There is an automated list of available packages at go/rdocs and go/rdocs-nonconf.
Licensing
By R community standards, use of one of the standard licenses is indicated by one
of the following short specifications in the 'License' field of a package's
DESCRIPTION file:
- GPL-2
- GPL-3
- LGPL-2
- LGPL-2.1
- LGPL-3
- AGPL-3 (read go/agpl first)
- Artistic-2.0
- BSD_2_clause
- BSD_3_clause
- MIT
These specifications indicate a direct link to the corresponding license file at
http://linkremoved/ and in
//piper/third_party/R/R/R_3_4_1/share/licenses. If an R package indicates one of
these standard licenses in its DESCRIPTION file, that is sufficient to satisfy
third_party license requirements, so long as you add the full text of the
referenced license to the top-level LICENSE file and explain how you got it in the
METADATA file. The section on manually adding packages below has a more in-depth
discussion of these files and the layout of the package directory.
The standard R licensing practice is an exception to the overall //piper/third_party policy, which requires either a license file or link in the upstream code. References to non-standard licenses without accompanying license text do not satisfy //piper/third_party requirements. If there's ever any confusion or dispute over this exception please email emailremoved@ to resolve.
Also, the last three licenses above are usually specified with a "+ file LICENSE"
suffix, for example "MIT + file LICENSE", where the package-provided LICENSE
contains the copyright year and copyright holder necessary to complete the
information in the license template. For an example, see the original license
template in //piper/third_party/R/R/R_3_4_1/share/licenses/MIT. This is acceptable
as long as there are no additional licensing conditions in the package-provided
LICENSE file.
The import_from_cran tool described in the following section checks DESCRIPTION
files for the above licenses. If one is found, it copies the appropriate LICENSE
file to the package's directory.
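The check described above can be sketched in shell. This is a minimal illustration only, not the actual logic of import_from_cran: the function name is_standard_license is hypothetical, and the sketch assumes that the License field fits on one line and that the only suffix to strip is an optional "+ file LICENSE".

```shell
#!/usr/bin/env bash
# Hypothetical helper; NOT the real import_from_cran implementation.
# Prints the license name and returns 0 if the License field of a
# DESCRIPTION file is one of the standard short specifications listed above.
is_standard_license() {
  local description_file="$1"
  local license
  # Read the first "License:" line, then strip an optional
  # "+ file LICENSE" suffix and any trailing whitespace.
  license="$(sed -n 's/^License:[[:space:]]*//p' "$description_file" \
    | head -n 1 \
    | sed -E 's/[[:space:]]*\+[[:space:]]*file[[:space:]]+LICENSE[[:space:]]*$//; s/[[:space:]]+$//')"
  case "$license" in
    GPL-2|GPL-3|LGPL-2|LGPL-2.1|LGPL-3|AGPL-3|Artistic-2.0|BSD_2_clause|BSD_3_clause|MIT)
      echo "$license"
      return 0
      ;;
    *)
      # Anything else (including a missing field) needs manual review.
      return 1
      ;;
  esac
}
```

For example, is_standard_license third_party/R/packages/foo/foo/DESCRIPTION would succeed for a package declaring "MIT + file LICENSE". A non-zero return only means the field did not match the list exactly, so the license still has to be reviewed by hand.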
Automatically adding or updating third-party packages
There is a tool that mostly automates adding packages from the CRAN and Bioconductor repositories. Alias it with (you can add the same line to your .bashrc):
alias import_from_cran='/path/to/.../import_from_cran.Rar'
Next, navigate to a google3 root directory. You might also want to create a clean CITC client at the same time.
mkclient -f <packagename>
Then, to install (or update) a CRAN / Bioconductor package:
import_from_cran --package=<packagename>
If the package depends on other packages that have not already been imported, please import those first. The tool creates a CL (changelist); each package should be in a separate CL.
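Before importing, it can help to see which dependencies are not yet present. The sketch below is a hypothetical helper (missing_packages is not an existing tool). It assumes that a package counts as imported when a directory with its name exists under the packages root, following the layout described in the introduction, and that you supply the dependency names yourself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints each named package that has no directory
# under the given packages root (for example third_party/R/packages).
missing_packages() {
  local packages_root="$1"
  shift
  local pkg
  for pkg in "$@"; do
    # Assumption: a directory named after the package means it was imported.
    [ -d "$packages_root/$pkg" ] || echo "$pkg"
  done
}
```

A call such as missing_packages third_party/R/packages rlang tibble prints the names that still need their own import, one per line. Each of those should then be imported first, in its own CL.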
To import a package from a URL (e.g. from GitHub, GitLab, or CRAN archive):
import_from_cran --package=<packagename> --url=<download_url>
The URL should point to a .zip or .tar.gz archive of the package.
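A quick sanity check on the URL before running the tool might look like the following. This is a sketch under assumptions: is_supported_archive_url is a hypothetical name, and it only inspects the file extension, not whether the URL is reachable or versioned.

```shell
#!/usr/bin/env bash
# Hypothetical helper: returns 0 if the URL ends in .zip or .tar.gz.
is_supported_archive_url() {
  case "$1" in
    *.zip|*.tar.gz) return 0 ;;
    *) return 1 ;;
  esac
}
```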
For GitHub, you'll need a versioned copy of the package; see below.
Because the entire Bioconductor ecosystem is synchronized to release cycles, our internal repository of Bioconductor packages is synchronized to a single release. This ensures compatibility among imported packages, enabling smoother use and imports. Please see //piper/third_party/R/bioconductor/README.md for details.
After running the tool
When the import_from_cran tool successfully finishes, it creates a Critique
changelist containing your package. At this point, you can add the package as a
dependency to an r_interactive_session within the CITC client that you used to
create the import. See go/r-packages#executables.
Otherwise, the tool will describe the remaining manual steps that are needed. To make it possible to use this version of this package in other CITC clients and for other Googlers to use it, send the package for review with:
g4 mail -c <the_new_changelist_number> -m third-party-*removed*
go/gwsq will automatically assign reviewers to your CL, depending on the files changed. Reviewers will be selected from http://linkremoved/ and http://linkremoved/. When multiple reviewers are assigned to the CL, please wait for everyone to LGTM before submitting.
Unit tests
We strongly encourage adding unit tests for all newly imported packages: add a
test target to the BUILD file of the package to make sure that the package's
internal unit tests pass. The import_from_cran tool will start this process for
you by adding an r_test target to the automatically-generated BUILD file, if it
detects that the package has tests.
This is often only the first step, and you will need to add some missing
components to the target before the tests will pass. This includes:
- other dependencies that the test might require
- possible test data that wasn't automatically identified
- test files in non-standard locations, like within the inst directory
Certain test patterns that are common outside of Google will fail when executed
on go/forge. The most common of these are tests that write a file to the current
working test directory. You can skip these tests with the helper function
testthat::skip_on_google(). See examples here and here.
You might also choose to modify the execution of the test to get it to pass. See
below.
If a package needs modification
Some packages will need modification before they can be submitted. When importing a new package that requires changes in imported files, first create a "pristine copy" CL without modifications, and with BUILD rules commented out. It should look something like this.
Importing from GitHub {#github}
Most packages not on CRAN or Bioconductor are available on GitHub. Our automatic import tool can handle these packages too, but special attention needs to be paid to the version of the package imported. To comply with go/oneversion, we cannot directly download from the master archive.
Instead, use an archive matching a release tag or a commit; the former is preferred.
- On the main page of the repository, click Releases.
- Copy the URL corresponding to either the zip or tar archive under the most recent release.
For example, to install version 0.7.4 of dplyr, copy the link for the
appropriate archive from the package's releases page and use the following
command. The link you copied goes in the --url flag below:
import_from_cran --package=dplyr \
--url=https://github.com/tidyverse/dplyr/archive/v0.7.4.zip
When you cannot use a release, type the y key while on the repo main page to get
the most recent commit. Use the link to the archive provided by the green Clone or
download button as the URL for the package. If the most recent commit is
inappropriate, select the correct commit from the repo's Commits page, click
Browse files, and get the archive URL from the same green button as on the repo's
main page.
Here's an example of installing dplyr using a recent commit instead of a release.
import_from_cran --package=dplyr \
--url=https://github.com/tidyverse/dplyr/archive/887c239de0f51ada5dde631532f39d01cb823ab4.zip
To view a repo on GitHub using a known commit hash or tag, e.g. using the value
in the "version" field in a METADATA file, append the tag or hash to
https://github.com/<organization>/<repo>/tree/. For example,
# This is the repo for dplyr v0.7.4
https://github.com/tidyverse/dplyr/tree/v0.7.4
# This is the repo for dplyr at a specific commit
https://github.com/tidyverse/dplyr/tree/95ec2a4179a78f83daedaaf23cdacdde49eaf62f
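The URL patterns used in this section can be captured in two small shell functions. This is a sketch, not part of import_from_cran: the function names github_archive_url and github_tree_url are hypothetical, and the refusal to accept master is just one way to reflect the go/oneversion rule described above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers that build the GitHub URLs described in this section.

# Prints the archive URL for a release tag or commit hash.
# Usage: github_archive_url <organization> <repo> <tag-or-commit> [zip|tar.gz]
github_archive_url() {
  local org="$1" repo="$2" ref="$3" ext="${4:-zip}"
  # Assumption: reject the unversioned master branch outright.
  if [ "$ref" = "master" ]; then
    echo "use a release tag or commit hash, not master" >&2
    return 1
  fi
  echo "https://github.com/${org}/${repo}/archive/${ref}.${ext}"
}

# Prints the URL for browsing a repo at a given tag or commit hash.
# Usage: github_tree_url <organization> <repo> <tag-or-commit>
github_tree_url() {
  local org="$1" repo="$2" ref="$3"
  echo "https://github.com/${org}/${repo}/tree/${ref}"
}
```

The output of github_archive_url can be passed straight to the tool, for example:
import_from_cran --package=dplyr --url="$(github_archive_url tidyverse dplyr v0.7.4)"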
Manually adding a package from CRAN
To add a package foo from CRAN:
Create a Piper client that includes //piper/third_party/R/packages.
g4 client -a //piper/.../... && g4 sync
or
git5 start --import-empty third_party/R/packages/zipcode
Download the package source to //piper/third_party/R/packages/foo/foo.
mkdir third_party/R/packages/foo && cd third_party/R/packages/foo
(use web search to find the package and download foo.tgz)
tar xzvf foo.tgz
Create four additional files under //piper/third_party/R/packages/foo:
BUILD
LICENSE
METADATA
OWNERS
Test that the package installs; from the google3 directory, run:
blaze test third_party/R/packages/foo:foo_load_test
In R, load the package using library(foo) and test that it works.
Create a CL and request approval:
g4 mail -m third-party-*removed*,third-party-*removed*
After receiving approval from both groups, submit the CL.
Rules for the BUILD file are at Building R Packages and Binaries.
You will probably find it simplest to mimic an existing package. See
//piper/third_party/R/packages/bit/BUILD for a simple example for a package that
includes both R and C code.
The LICENSE file should be a copy of the license file from the original package,
if there is one. Otherwise, the DESCRIPTION file in the package should describe
the license, e.g. GPL-2. In that case, include the standard GPL-2 license.
The OWNERS file must list at least two full-time employees; that is typically
you and at least one more person from your team.
The METADATA file should document the package and can be auto-built at
go/thirdparty/metadata. See akima for a simple example for a package with no
local modifications.
For more formal requirements for these files, see go/thirdparty.
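Before mailing the CL, a quick check that the four files exist can catch an incomplete package directory. The sketch below is a hypothetical helper (missing_package_files is not an existing tool) and checks only for presence, not for the content requirements described above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints each required top-level file that is missing
# from a package directory such as third_party/R/packages/foo.
missing_package_files() {
  local package_dir="$1"
  local name
  for name in BUILD LICENSE METADATA OWNERS; do
    [ -f "$package_dir/$name" ] || echo "$name"
  done
}
```

Running missing_package_files third_party/R/packages/foo prints nothing when all four files are present.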
Manually updating a package in //third_party/R
This is for packages with the nested structure, with no real local modifications. If there are local modifications then you may need a versioned subdirectory and a few more steps.
cd third_party/R/packages/foo
rm -rf foo
/path/to/.../updatemd -version $VERSION METADATA
tar xzvf ~/Downloads/new-foo.tgz
g4 edit `g4 diff -se ...`
g4 delete `g4 diff -sd ...`
g4 add `g4 nothave`
Manually update BUILD.
Removing packages
Unused code is considered "dead," and eliminating it is a part of maintaining good code health. See go/deadcode. About every two years, we go through a large exercise of deleting unused packages and updating the packages that we want to keep around.
To tell if a package is unused, we look up the BUILD files for packages that are checked in to google3. We exclude experimental, third_party and ranklab jumbo package importer from this search. We also ask users to identify the packages that they might be using by sharing a sheet with the broader community. Here is the 2018 version of the sheet.
The data in the sheet is generated by running this script: //piper/.../unused_packages.R
Manually removing an unused package
Under most normal circumstances, the biennial cleanup process handles all unused packages. You don't need to delete anything yourself. The primary exception is when the package OWNERS change and no Googlers are available to continue to maintain the package.
If that's the case, then:
Check the package dependencies with go/deps or //piper/.../package_dependencies.R. If there are package dependencies, ask the OWNERS of those dependencies to become OWNERS (and maintainers) of the package you seek to delete.
Check go/toolsearch to see if there are any active users of the package. If so, ask them to be the OWNERS of the package.
Otherwise, delete the package with the following shell commands.
# To delete package foo:
g4 delete third_party/R/packages/foo/...
# Check if foo is in the following two files; if so remove it there:
g4 edit quality/ranklab/ipy_ext/BUILD
g4 edit analysis/common/r/build_defs/BUILD
# Check if that breaks anything using the following command
presubmit -p all --email --detach -c <cl-number>
Exporting an R package to GitHub
To export a Google-authored R package to a GitHub repository, move your code into //piper/third_party/R/packages, following go/releasing. Follow the directory structure for other third-party R packages:
- //piper/third_party/R/packages/package-name should contain intra-Google-specific files, e.g., BUILD, METADATA, OWNERS, and a copy of the LICENSE file.
- //piper/third_party/R/packages/package-name/package-name should contain the R package code suitable for export, including a LICENSE file.
Externally-available packages need to be installable without the support of the Google build system. Some of the differences between third-party R packages and intra-Google R packages include:
- Complete a full DESCRIPTION file. Look at //piper/third_party/R/packages/*/*/DESCRIPTION for examples.
- Add NAMESPACE and man files:
  - Start an R-google runtime: blaze run -c opt //analysis/common/r/release_tools:r_google
  - Type rglib::mkclient("third_party/R/packages/package-name/package-name") to get to the root directory of your package.
  - Run roxygen2::roxygenize() to generate all of these files.
  Make sure NAMESPACE and man/* are editable before running the commands.
- If your package contains compiled code, you will need to add the appropriate configuration files and interface code. See Configure and Cleanup in Writing R Extensions.
For more information
- R Extensions Manual
- R Development In google3: go/r-development
- Creating and Installing Google R Packages: go/r-install and go/r-packages
- Building R Packages in google3: go/rlang/getting-started/old_building