dist: set -eu, fix shellcheck, make reproducible and smaller tarballs #13299

Closed
wants to merge 16 commits into from

vszakats
Member

@vszakats vszakats commented Apr 5, 2024

  • set bash -eu and fix fallouts.
  • fix shellcheck warnings.
  • set and use SOURCE_DATE_EPOCH for reproducibility.
    Authored-by: Daniel J. H.
    Ref: dist: add SOURCE_DATE_EPOCH env var to maketgz script #13280
  • set TZ=UTC and LC_ALL=C for reproducibility.
  • make file timestamps in tarball/zip reproducible.
  • make directory timestamps in zip reproducible.
  • make timestamps of tarballs/zip reproducible.
  • make file order in tarball/zip reproducible.
  • omit extra file metadata from zip for reproducibility.
  • use maximum zip compression.
  • use POSIX ustar tarball format to avoid supply chain vulnerability:
    https://seclists.org/oss-sec/2021/q4/0
  • make uid/gid in tarball reproducible.
  • omit owner user/group names from tarball for reproducibility and privacy.
  • omit current timestamp from .gz header for reproducibility.
  • display SHA-256 hashes of produced tarballs/zip.
  • fix whitespace.

.tar.gz also became smaller in the process: 4,462,311 -> 4,148,249 bytes (8.7.1)

Requires GNU tar, GNU date, sha256sum.

Ref: #13250
Closes #13299
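
Taken together, the tarball-related items in the list above might look roughly like the following. This is a minimal sketch, not the actual maketgz code: the function name `repro_targz`, the temporary list file and the fallback epoch of 0 are assumptions made for illustration, and it presumes GNU tar, GNU touch, gzip and sha256sum are available.

```shell
#!/bin/sh
set -eu

# Hypothetical helper (not the real maketgz code).
# Usage: repro_targz <source-dir> <output.tar.gz>
repro_targz() (
  srcdir="$1"
  out="$2"
  epoch="${SOURCE_DATE_EPOCH:-0}"   # assumed fallback for this sketch

  # Fixed timezone and locale so dates and sort order do not vary by host.
  TZ=UTC
  LC_ALL=C
  export TZ LC_ALL

  # A sorted file list gives a stable member order.
  find "$srcdir" -type f | sort > "$out.list"

  # ustar format, numeric owner 0:0, one fixed mtime for every member;
  # gzip --no-name leaves the name and timestamp out of the .gz header.
  tar --create --file=- --format=ustar \
    --owner=0 --group=0 --numeric-owner \
    --mtime="@${epoch}" \
    --files-from="$out.list" \
    | gzip --no-name --best > "$out"
  rm -f "$out.list"

  # Pin the timestamp of the archive file itself (GNU touch syntax).
  touch -d "@${epoch}" "$out"
  sha256sum "$out"
)
```

Running `repro_targz` twice on the same tree with the same `SOURCE_DATE_EPOCH` should print the same SHA-256 hash, even if file timestamps on disk changed in between.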

@vszakats vszakats added the build label Apr 5, 2024
Member

@bagder bagder left a comment

👍

@vszakats
Member Author

vszakats commented Apr 5, 2024

Haven't found yet where the .tar file is created. Depending on how it's done
(and if customization is possible), there are some options we could add
to improve its reproducibility too. I've been using those in curl-for-win for a while.

@bagder
Member

bagder commented Apr 5, 2024

The tarball is made by the make -sj dist command. The dist target is created by automake.

@vszakats
Member Author

vszakats commented Apr 5, 2024

> The tarball is made by the make -sj dist command. The dist target is created by automake.

That was my fear :) I'll have to look...

After looking:

tardir=curl-- && ${TAR-tar} chof - "$tardir" | [...]
  • the tar format isn't set, so it's not necessarily ustar. On my macOS, it is, but the official tarball isn't (it says 'non-POSIX', probably GNU).
    It should be ustar (or maybe pax) to avoid supply chain vulnerability: https://seclists.org/oss-sec/2021/q4/0.
  • file order is non-reproducible.
  • local uid/gid are spilling into the tarball. Repro issue.
  • local user/group names may be spilling into the tarball. (On my macOS. In the official tarball this is fortunately not the case.) Privacy and repro issue.

As for how to tell autotools to do the right thing, I have not explored yet. I assume it's impossible or at least painful.

edit:

  • it is managed by [...]/automake/1.16.5/share/aclocal-1.16/tar.m4 on my machine.
  • one option is to untar and repack. Costs extra build work and logic to maintain for us, plus may make portability trickier.

Ref: https://mgorny.pl/articles/portability-of-tar-features.html
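
The tar format mentioned in the first bullet can be checked by hand, because POSIX (ustar/pax) and GNU headers carry different magic bytes at offset 257 of the first header block. A minimal sketch, assuming an uncompressed .tar file as input; the function name `tar_flavor` is hypothetical:

```shell
#!/bin/sh
set -eu

# Hypothetical helper: classify an uncompressed tar file by the 8-byte
# magic/version field stored at offset 257 of its first header block.
tar_flavor() {
  # NUL bytes are stripped so the value can be held in a shell variable.
  magic=$(dd if="$1" bs=1 skip=257 count=8 2>/dev/null | tr -d '\000')
  case "$magic" in
    ustar00)   echo "POSIX (ustar or pax)" ;;
    "ustar  ") echo "GNU" ;;
    *)         echo "unknown" ;;
  esac
}
```

For the uid/gid and user/group name bullets, listing the archive with GNU tar's `--numeric-owner -tvf` shows the stored numeric IDs, while a plain `-tvf` shows any stored names.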

@vszakats vszakats changed the title from "maketgz: set -eu, fix shellcheck warnings, smaller & reproducible .zip" to "maketgz: set -eu, shellcheck fixes, reproducibility, smaller .zip" Apr 5, 2024
@bagder
Member

bagder commented Apr 5, 2024

I think we can settle for shipping the content reproducibly as a start, and not the tarball files themselves.

@bagder
Member

bagder commented Apr 5, 2024

> one option is to untar and repack. Costs extra build work and logic to maintain for us

If we want the tarball images themselves reproducible I figure that's a rather easy thing to add.

@vszakats
Member Author

vszakats commented Apr 5, 2024

FWIW curl-for-win uses this:

TZ=UTC tar --create \
  --format=ustar \
  --owner=0 --group=0 --numeric-owner \
  --files-from "${_FLS}"

where ${_FLS} lines are reproducibly ordered.

This also requires installing gtar and remapping it to tar on macOS, for example. (Possibly bsdtar can be set up to create the desired output, though this was simpler/safer.)

I'd vote to make the tar images reproducible, if we're tackling this part anyway. (in a separate PR though)

By easy, which method would you go?
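
How curl-for-win builds the `${_FLS}` list is not shown in this thread. One way to get a reproducibly ordered list, as a sketch under that caveat, is to sort `find` output in the C locale so the order depends only on byte values and not on the host's collation rules. The function name `make_file_list` is hypothetical:

```shell
#!/bin/sh
set -eu

# Hypothetical helper: write a byte-order-sorted list of regular files.
# Usage: make_file_list <dir> <list-file>
make_file_list() {
  # LC_ALL=C makes sort compare raw bytes, independent of the user's locale.
  find "$1" -type f | LC_ALL=C sort > "$2"
}
```

The resulting list file can then be passed to tar with `--files-from`, as in the command above.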

@vszakats vszakats changed the title from "maketgz: set -eu, shellcheck fixes, reproducibility, smaller .zip" to "dist: set -eu, fix shellcheck, reproducibility, shrink .zip in maketgz" Apr 5, 2024
@bagder
Member

bagder commented Apr 5, 2024

> By easy, which method would you go?

I figure untarring and retarring with the proper options should work fine. Then we can let make dist work as it does and get reproducibility by our own means.

@bagder
Member

bagder commented Apr 5, 2024

Maybe like this?

diff --git a/maketgz b/maketgz
index 602f1071b..089d1bfb8 100755
--- a/maketgz
+++ b/maketgz
@@ -174,10 +174,28 @@ res=$?
 if test "$res" != 0; then
     echo "make dist failed"
     exit 2
 fi
 
+retar() {
+  tempdir=$1
+  rm -rf $tempdir
+  mkdir $tempdir
+  cd $tempdir
+  gzip -dc ../$targz | tar -xf -
+  find curl-* | sort > files
+  tar --create --format=ustar --owner=0 --group=0 --numeric-owner --files-from files -zf out.tar.gz
+  rm files
+  mv out.tar.gz ../
+  cd ..
+  rm -rf $tempdir
+}
+
+retar ".tarbuild"
+echo "replace $targz with out.tar.gz"
+mv out.tar.gz "$targz"
+
 ############################################################################
 #
 # Now make a bz2 archive from the tar.gz original
 #
 

@vszakats
Member Author

vszakats commented Apr 5, 2024

Oh well, make dist fails all over the place inside autotools on macOS. Let's stick to Linux and GNU tools.

- set file timestamps in tarball/zip reproducible.
- use POSIX tar format (ustar) to avoid supply chain vulnerability:
  https://seclists.org/oss-sec/2021/q4/0
- make file order in tarball reproducible.
- make uid/gid in tarball reproducible.
- omit owner user/group names from tarball for reproducibility and privacy.
- set file timestamps of tarballs/zip reproducible.
- omit current timestamp from .gz header for reproducibility.

.tar.gz also became smaller in the process: 4462311 -> 4148249 bytes

Requires GNU tar, GNU date.
@vszakats vszakats changed the title from "dist: set -eu, fix shellcheck, reproducibility, shrink .zip in maketgz" to "dist: set -eu, fix shellcheck, make source tarballs reproducible and smaller in maketgz" Apr 6, 2024
@vszakats
Member Author

vszakats commented Apr 6, 2024

Source tarballs and zip should be fully reproducible now, assuming SOURCE_DATE_EPOCH is set to, e.g., the tagged commit's timestamp (for example with git log -1 '--format=%ct' curl-8_7_1).
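
A small sketch of how `SOURCE_DATE_EPOCH` could be resolved before running the release script. The helper names `resolve_epoch` and `epoch_to_date` are hypothetical and not part of maketgz; the date formatting assumes GNU date.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: use SOURCE_DATE_EPOCH if already set, otherwise take
# the committer timestamp of a Git ref (default: HEAD).
resolve_epoch() {
  if [ -n "${SOURCE_DATE_EPOCH:-}" ]; then
    printf '%s\n' "$SOURCE_DATE_EPOCH"
  else
    git log -1 '--format=%ct' "${1:-HEAD}"
  fi
}

# Render an epoch as a UTC calendar date (GNU date syntax).
epoch_to_date() {
  date -u -d "@$1" '+%Y-%m-%d'
}
```

For example, `SOURCE_DATE_EPOCH=$(resolve_epoch curl-8_7_1)` followed by `export SOURCE_DATE_EPOCH` would pin the timestamp to the tagged commit when run inside a curl checkout.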

Couldn't test maketgz as a whole. Let me know of any issues.

Testing on macOS needs this snippet (plus manually nudging autotools with some command, then copying a missing autotools file into the build directory):

case "$(uname)" in
  Darwin*)
    date() { gdate "$@"; }
    tar() { gtar "$@"; }
    ;;
esac

@vszakats vszakats changed the title from "dist: set -eu, fix shellcheck, make source tarballs reproducible and smaller in maketgz" to "dist: set -eu, fix shellcheck, make source tarballs reproducible and smaller" Apr 6, 2024
@vszakats vszakats requested a review from bagder April 6, 2024 20:21
@vszakats vszakats changed the title from "dist: set -eu, fix shellcheck, make source tarballs reproducible and smaller" to "dist: set -eu, fix shellcheck, make tarballs reproducible and smaller" Apr 7, 2024
@vszakats vszakats changed the title from "dist: set -eu, fix shellcheck, make tarballs reproducible and smaller" to "dist: set -eu, fix shellcheck, reproducible and smaller tarballs" Apr 7, 2024
@vszakats vszakats changed the title from "dist: set -eu, fix shellcheck, reproducible and smaller tarballs" to "dist: set -eu, fix shellcheck, make reproducible and smaller tarballs" Apr 7, 2024
@vszakats vszakats closed this in 860cd5f Apr 7, 2024
@vszakats vszakats deleted the maketgz-improve branch April 7, 2024 22:29
vszakats added a commit to vszakats/curl that referenced this pull request Apr 7, 2024
vszakats added a commit that referenced this pull request Apr 8, 2024
Perl remains required for the tarball build process.

Follow-up to 860cd5f #13299

Reviewed-by: Daniel Stenberg
Closes #13310
vszakats added a commit that referenced this pull request Apr 9, 2024
In the initial implementation of reproducible tarballs, they were
missing directory entries, while .zip archives had them. It meant
that on extracting the tarball, on-disk directory entries got the
current timestamp.

This patch fixes this by including directory entries in the tarball,
with reproducible timestamps. It also moves sorting inside tar,
to ensure reproducible directory entry timestamps on extract
(without needing the `--delay-directory-restore` option when
extracting with GNU tar; BSD tar got that right by default).

GNU tar 1.28 (2014-07-28) introduced `--sort=`.

Ref: #13299 (comment)
Follow-up to 860cd5f #13299
Closes #13322
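
The approach described in this follow-up commit can be sketched as below: tar is given the directory itself, so directory entries are archived, and ordering is left to `--sort=name`. This is an illustration only, not the committed maketgz code; the function name `repro_tar_sorted` and the fallback epoch of 0 are assumptions, and GNU tar 1.28 or later is required.

```shell
#!/bin/sh
set -eu

# Hypothetical helper (not the real maketgz code).
# Usage: repro_tar_sorted <source-dir> <output.tar>
repro_tar_sorted() (
  epoch="${SOURCE_DATE_EPOCH:-0}"   # assumed fallback for this sketch
  TZ=UTC
  LC_ALL=C
  export TZ LC_ALL

  # Passing the directory makes tar emit directory entries as well;
  # --sort=name (GNU tar 1.28+) orders the members inside tar itself.
  tar --create --file="$2" --format=ustar --sort=name \
    --owner=0 --group=0 --numeric-owner \
    --mtime="@${epoch}" \
    "$1"
)
```

Because every member, directories included, gets the same fixed mtime, two runs over the same tree should produce byte-identical archives.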
@vszakats vszakats added the dist label Apr 9, 2024