8 minute read

Categories:
Tags:

GitHub Packages is a Maven compatible repository accessible outside of GitHub. It serves as the code repository used in Java project compilation both on workstations and within a CI/CD pipeline, as well as allowing manual file downloads through the GitHub web interface. Because it is meant for only these 2 purposes there is no REST API available making custom integrations more difficult than need be. This article documents the URLs exposed through Maven which can be used to create an API of simple HTTP commands. URLs to browse packages and download files will be covered, as well as steps to more effectively use free tier resources allowed on private repositories.

Private Repository Free-Tier Limits

GitHub has separate pricing tiers (or caps) for private repositories. Public repos are generally free for virtually all actions, while private repos generally have a free limited use, and beyond those limits requests are blocked unless paid for. The private tier limits are currently:

Artifact Storage Data transfer within a GitHub Action Maven traffic to GitHub Packages
500MB Unlimited 1GB per Month

Artifact Caching to Reduce Transfers

Unless paying for managed services is within budget, best practice is to run a local Maven repository proxy such as JFrog Artifactory or Sonotype Nexus to minimize network use. Whether utilizing internal self-managed, or an externally managed service such as GitHub Packages when budget allows, network use can easy to exceed 1GB/month. Frequent releases, deep dependency graphs and large number of developers all contribute to growing this significantly. Another practical reason to choose a self-hosted repository is the increased flexibility in managing allow/deny lists of acceptable packages. Open-source software has a solid track record of security, but untrustworthy authors, abandoned projects, and the very nature of community contributed code make ongoing audit the external code dependencies a necessity.

Bypassing Transfer Limits

GitHub Packages has restrictive transfer limits, but an alternative exists. GitHub will allow unlimited transfers within a GitHub Action, so transferring files during CI/CD is a practical workaround.

Egress alternatives with GitHub CI/CD using GitHub Actions and GitHub Packages
Egress alternatives with GitHub CI/CD using GitHub Actions and GitHub Packages

This is further written about in Data Transfers and Egress within a GitHub Action.

GitHub Packages Maven via URL

Maven is based around an HTTP API, and all network operations are performed through simple HTTP requests. This exposes a foundation to build a capable custom API for uses beyond Maven functionality.

Browsing Available Versions

This article will document useful Maven URLs such that they can be made without Maven installed and directly made using wget or curl.

All URLs in this document require an Authorization: token GITHUB_TOKEN HTTP header

Maven packaging defines 3 fields for every package:


<dependency>
    <groupId>{groupId}</groupId>
    <artifactId>{artifactId}</artifactId>
    <version>{version}</version>
</dependency>

Maven exposes XML files as REST URLs, and the GitHub Packages artifact URLs have the form:

https://maven.pkg.github.com/{githubUser}/{githubRepository}/{groupId}/{artifactId}/maven-metadata.xml

For example, a groupId of ca.stevenskelton and artifactId of build-action-file-receiver-assembly has the URL:

https://maven.pkg.github.com/stevenrskelton/build-action-file-receiver/ca/stevenskelton/build-action-file-receiver-assembly/maven-metadata.xml

These URLs expose XML metadata which can construct all other REST URLs for the repository artifacts. An example of the XML is:


<metadata>
    <groupId>ca.stevenskelton</groupId>
    <artifactId>build-action-file-receiver-assembly</artifactId>
    <versioning>
        <latest>1.0.18</latest>
        <versions>
            <version>0.1.0-SNAPSHOT</version>
            <version>1.0.0</version>
            <version>1.0.1</version>
            <version>1.0.18</version>
        </versions>
        <lastUpdated>20240213012802</lastUpdated>
    </versioning>
</metadata>

The schema defines key fields:


<metadata>
    <groupId>{groupId}</groupId>
    <artifactId>{artifactId}</artifactId>
    <versioning>
        <latest>{version}</latest>
        <versions>
            <version>{version}</version>
        </versions>
        <lastUpdated>{time}</lastUpdated>
    </versioning>
</metadata>

SNAPSHOT releases

There is a special case for SNAPSHOT release versions. Snapshot versions accommodate rapidly evolving releases by generating auto-incremented composite version numbers on each subsequent publish into Maven. The version consists of an initial fixed number plus an iteration, with the form version-SNAPSHOT where the version is fixed and the SNAPSHOT part creates the unique timestamp/iteration for the release. This complexity requires an additional step to resolve all varying SNAPSHOT parts, and for this Maven exposes the URL:

https://maven.pkg.github.com/{githubUser}/{githubRepository}/{groupId}/{artifactId}/{version-SNAPSHOT}/maven-metadata.xml

So for version 0.1.0-SNAPSHOT the URL would be:

https://maven.pkg.github.com/stevenrskelton/build-action-file-receiver/ca/stevenskelton/build-action-file-receiver-assembly/0.1.0-SNAPSHOT/maven-metadata.xml

An example of the XML document for this URL is:


<metadata modelVersion="">
    <groupId>ca.stevenskelton</groupId>
    <artifactId>build-action-file-receiver-assembly</artifactId>
    <version>0.1.0-SNAPSHOT</version>
    <versioning>
        <snapshot>
            <timestamp>20230330.234307</timestamp>
            <buildNumber>29</buildNumber>
        </snapshot>
        <lastUpdated>20230330234315</lastUpdated>
        <snapshotVersions>
            <snapshotVersion>
                <extension>jar.md5</extension>
                <value>0.1.0-20230329.004700-13</value>
                <updated>20230329004708</updated>
            </snapshotVersion>
            <snapshotVersion>
                <extension>pom</extension>
                <value>0.1.0-20230329.004700-13</value>
                <updated>20230329004709</updated>
            </snapshotVersion>
            <snapshotVersion>
                <extension>pom.sha1</extension>
                <value>0.1.0-20230329.004700-13</value>
                <updated>20230329004710</updated>
            </snapshotVersion>
            <snapshotVersion>
                <extension>pom.md5</extension>
                <value>0.1.0-20230329.004700-13</value>
                <updated>20230329004711</updated>
            </snapshotVersion>
            <snapshotVersion>
                <extension>jar</extension>
                <value>0.1.0-20230330.234307-29</value>
                <updated>20230330234311</updated>
            </snapshotVersion>
            <snapshotVersion>
                <extension>jar.sha1</extension>
                <value>0.1.0-20230330.234307-29</value>
                <updated>20230330234312</updated>
            </snapshotVersion>
            <snapshotVersion>
                <extension>jar.md5</extension>
                <value>0.1.0-20230330.234307-29</value>
                <updated>20230330234313</updated>
            </snapshotVersion>
            <snapshotVersion>
                <extension>pom</extension>
                <value>0.1.0-20230330.234307-29</value>
                <updated>20230330234314</updated>
            </snapshotVersion>
            <snapshotVersion>
                <extension>pom.sha1</extension>
                <value>0.1.0-20230330.234307-29</value>
                <updated>20230330234315</updated>
            </snapshotVersion>
            <snapshotVersion>
                <extension>pom.md5</extension>
                <value>0.1.0-20230330.234307-29</value>
                <updated>20230330234315</updated>
            </snapshotVersion>
        </snapshotVersions>
    </versioning>
</metadata>

This example XML has two 0.1.0-SNAPSHOT releases:

  • 0.1.0-20230329.004700-13
  • 0.1.0-20230330.234307-29

The document schema used at this URL exposes the fields:


<metadata>
    <groupId>{groupId}</groupId>
    <artifactId>{artifactId}</artifactId>
    <version>{version}</version>
    <versioning>
        <snapshot>
            <timestamp>{YYYYmmDD.HHMMSS}</timestamp>
            <buildNumber>{iteration}</buildNumber>
        </snapshot>
        <lastUpdated>{YYYYmmDDHHMMSS}</lastUpdated>
        <snapshotVersions>
            <snapshotVersion>
                <extension>{extension}</extension>
                <value>{version-wo-SNAPSHOT}-{YYYYmmDD.HHMMSS}-{iteration}</value>
                <updated>{YYYYmmDDHHMMSS}</updated>
            </snapshotVersion>
        </snapshotVersions>
    </versioning>
</metadata>

This <value> tag provides us with the composite version numbers to we need to reference SNAPSHOT releases in other Maven URLs based around version numbers.

Downloading Artifacts using wget

The direct download URL for all Maven artifacts (jar, sha1, md5, pom) are generated using the version field from maven-metadata.xml, or in the case of SNAPSHOT releases, the value field from version/maven-metadata.xml.

The wget command for a jar is of the form:

wget -d --header="Authorization: token {GITHUB_TOKEN}" \
 https://maven.pkg.github.com/{githubUser}/{githubRepository}/{groupId}/{artifactId}/{version}/{artifactId}-{value}.jar

So for a 1.0.18 release the wget command is:

wget -d --header="Authorization: token {GITHUB_TOKEN}" \
 https://maven.pkg.github.com/stevenrskelton/build-action-file-receiver/ca/stevenskelton/build-action-file-receiver/1.0.18/build-action-file-receiver-assembly-1.0.18.jar

These URLs are visible in GitHub Action logs whenever artifacts are published to GitHub Packages using Maven.

GitHub API Does Not Expose Download URLs

It would be ideal to avoid processing XML and use the GitHub API JSON. This is not possible because the GitHub API only exposes a surrogate identifier (which it also calls version) which is different from the Maven artifact version.

For example, the following GitHub API URL exposes only surrogate version identifiers, which are not helpful in accessing GitHub Packages via Maven URLs:

https://api.github.com/users/{githubUser}/packages/maven/{groupId}/{artifactId}/versions

Returns:

[
  {
    "id": 18648727,
    "name": "{version-different-from-github-packages}",
    "url": "https://api.github.com/users/{githubUser}/packages/maven/{groupId}/{artifactId}/versions/18648727",
    "package_html_url": "https://github.com/{githubUser}/{githubRepository}/packages/1371196",
    "created_at": "2022-04-19T00:16:27Z",
    "updated_at": "2022-04-19T00:16:30Z",
    "html_url": "https://github.com/{githubUser}/{githubRepository}/packages/1371196?version={version}",
    "metadata": {
      "package_type": "maven"
    }
  }
]

Here 18648727 is a GitHub Packages version and 1371196 is a GitHub Packages package identifier for use in GitHub Actions, both unused by Maven.

Downloading Artifacts using Maven (mvn)

There are a few mvn plugins that can be run outside a configured project, directly from any command prompt. One is mvn dependency:copy. This maven plugin command can be used instead of wget or curl to download a jar file:

mvn dependency:copy \
  -Dartifact={groupId}:{artifactId}:{version} \
  -DoutputDirectory=. \
  -DrepositoryId=github '
  --global-settings settings.xml

The benefit here is the input parameter for a separate settings.xml. This file can be used to contain Maven credentials to repository, useful if the wget or curl command won’t have access to the GITHUB_TOKEN or if this command will be executed within a GitHub Action similar to how Scala SBT Publishing to GitHub Packages publishes Maven artifacts using mvn deploy:deploy-file.


<settings>
    <profiles>
        <profile>
            <id>github</id>
            <repositories>
                <repository>
                    <id>github</id>
                    <url>https://maven.pkg.github.com/${GITHUB_REPOSITORY}</url>
                    <snapshots>
                        <enabled>true</enabled>
                    </snapshots>
                </repository>
            </repositories>
        </profile>
    </profiles>
    <servers>
        <server>
            <id>github</id>
            <username>${GITHUB_REPOSITORY_OWNER}</username>
            <password>${GITHUB_TOKEN}</password>
        </server>
    </servers>
</settings>

Alternative Uses of MD5 Files

The primary consumer of network traffic are large binary artifacts due to their size. Maven also is a repository for smaller files: XML metadata and multiple hashcode associated to the artifacts. Downloading these smaller files would have negligible contribution to network use while providing significant value.

The XML metadata forms the basis for search/indexing functionality of the binary artifacts. Its alternative uses have been the topic of the article thus far; but the hashcode also provides interesting functional opportunities.

Network Transfer File Integrity

At their core a hashcode such as md5 and sha1 is a quick ways to verify file contents and integrity using a small number of bytes. In multi-hop or cloud situations, every transfer and intermediate storage poses a risk for file corruption or file confusion. Research has shown TCP checksums will fail to detect errors for roughly 1 in 16 million to 10 billion packets. If MTU is 1500 bytes, a worse case average would be undetected transfer error in every 24GB. File copy utilities typically have hashcode validation built in to address this vulnerability.

CI/CD Pipeline File Integrity

Even when network transfers are successful, the hashcode represents an intrinsic property of the files preserved across file renaming. Any CI/CD pipeline configured to produce artifacts with a specific naming rather than automatically generated creates a possibility for version confusion. This can be seen when QA testing fails during a release; a release candidate for version x.x.x is produced, fails QA, and a second patched version of x.x.x artifacts are created. There are now 2 separate artifacts with identical filenames in existence.

CI/CD integrity validation using GitHub Packages MD5 hashcode
CI/CD integrity validation using GitHub Packages MD5 hashcode

The root cause of version confusion is attempting to preserve branch name as artifact filename. If a release-x.x.x git branch produces x.x.x artifacts there is nothing linking artifacts to a particular commit. In addition to artifact hashcode, another approach is to embed git sha into the artifact as another intrinsic property. In SBT, this can be done at compile-time using SBT sourceManaged key. It represents a Seq[File] of all source-code to be compiled. It is straight-forward to append custom generated Scala sources, to be externally exposed as a help command-line parameter or an HTTP health-check endpoint.

import scala.sys.process._
import sbt.TupleSyntax.t4ToTable4
import java.time.ZonedDateTime
import java.time.format.DateTimeFormatter.ISO_ZONED_DATE_TIME

Compile / sourceGenerators += (Compile / sourceManaged, name, version, scalaVersion).map {
  case (sourceDirectory, name, version, scalaVersion) =>
    val file = sourceDirectory / "SbtBuildInfo.scala"
    val gitSha = "git rev-parse HEAD".!!.trim
    val gitBranch = "git rev-parse --abbrev-ref HEAD".!!.trim
    val buildTime = ZonedDateTime.now.format(ISO_ZONED_DATE_TIME)
    IO.write(file, """package ca.stevenskelton.buildactionfilereceiver
                     |object SbtBuildInfo {
                     |  val name = "%s"
                     |  val version = "%s"
                     |  val scalaVersion = "%s"
                     |  val gitSha = "%s"
                     |  val gitBranch = "%s"
                     |  val buildTime = "%s"
                     |}
                     |""".stripMargin.format(name, version, scalaVersion, gitSha, gitBranch, buildTime))
    Seq(file)
}.taskValue

File Integrity Independent of User Authorization Zones

Another possible use for hashcode verification is as a centralized file authentication. When CI/CD, DEV and PROD are administered under separate user authentication paradigms, it may be easier to unify all user-permissions to a central source. Artifact hashcodes are a secure file integrity mechanism unburdened from maintaining synchronization to user auth systems. Publicly available artifact hashcodes don’t represent confidential information an can allowing for a simplified implementation; though GitHub Packages would need a proxy layer working around a GITHUB_TOKEN requirement.

File verification across different user authentication systems
File verification across different user authentication systems
Categories:
Tags:
Updated: