Why you should use a Go module proxy

After the introduction of Go modules, I thought everything I needed to know had been settled. I quickly realized this wasn't the case. Recently, people started to advocate using a Go module proxy. After researching the pros and cons, I've concluded that this is one of the most important changes in recent years. But why is that the case? What makes a Go module proxy so special?

With Go modules, if you add a new dependency or build your Go module on a machine with a clean cache, the go command (e.g. go get or go build) downloads all the dependencies listed in go.mod and caches them for later operations. You can bypass the cache (and with it, the downloading of dependencies) by using a vendor/ folder and instructing go to use this folder with the -mod=vendor flag.

But neither of these approaches is perfect, and we can do better.

Problems of (not) using the vendor/ folder

Here are some disadvantages if you use the vendor/ folder:

  • The vendor/ folder is no longer used by default by the go command (in module-aware mode). If you don't append the -mod=vendor flag, the folder is ignored. This often leads to frustration and to hacky workarounds for supporting older Go versions (see: Using Go modules with vendor support on Travis CI)
  • The vendor/ folder, especially in big monorepos, takes up a lot of space, which increases the time it takes to clone the repository. Even if you think cloning happens only once, that's mostly not true: CI/CD systems often clone the repository on every trigger (e.g. pull requests). In the long run, this leads to longer build times and affects everyone on the team.
  • Vendoring a new dependency often produces changes that are difficult to review. Most of the time, the dependency changes end up bundled with changes to your actual business logic, which makes it hard to go over them.

What if you skip the vendor/ folder? That doesn't help either, because now you're dealing with these issues:

  • The go command will try to download the dependencies from their source repositories. But there is always a risk that a dependency might disappear in the future (remember the left-pad saga).
  • The VCS host might be down (e.g. github.com). In that case, you won't be able to build your project.
  • Some companies don't allow any outgoing connections outside their internal network. For them, removing the vendor/ folder is a no-go.
  • Suppose a dependency is published as v1.3.0, and go get fetches and caches it locally. In the meantime, the owner of the dependency could compromise the repository by pushing malicious content under the same tag. If your Go module is rebuilt on a machine with a clean cache, it will now use the compromised package. To protect against this, you need to commit the go.sum file alongside the go.mod file, so that the go command can detect that the content no longer matches.
  • Some dependencies use a VCS other than git and therefore depend on other tools such as hg (Mercurial), bzr (Bazaar) or svn (Subversion). Not all of these tools are installed on your host (or in your Docker image), which often leads to frustration.
  • go get needs to fetch the source code of each dependency listed in go.mod to resolve transitive dependencies (it needs their go.mod files). This slows down the whole build significantly, as it has to download (e.g. git clone) each repository just to read a single file.

How can we improve this situation?

Advantages of using a Go module proxy

By default, the go command downloads modules directly from their version control systems (VCSs). The GOPROXY environment variable gives you more control over the download source: it configures the go command to use a Go module proxy instead.

By setting the GOPROXY environment variable to a Go module proxy, you can overcome all of the disadvantages listed above:

  • A Go module proxy by default caches and stores all dependencies forever (in immutable storage). This means you don't need a vendor/ folder anymore.
  • Getting rid of the vendor/ folder means your dependencies no longer take up space in your repository.
  • Because the dependencies are kept in immutable storage, you're protected even if a dependency disappears from the internet.
  • Once a Go module is stored in the Go proxy, it can't be overwritten or deleted. This protects you against actors who might try to inject malicious code under the same version.
  • You no longer need any VCS tools to download dependencies, because the proxy serves them over HTTP.
  • Downloading and building your Go module is significantly faster, because a Go proxy serves the source code (as a .zip archive) and the go.mod file independently over HTTP. These downloads have less overhead than fetching from a VCS, so they finish sooner. Resolving dependencies is also faster, because each go.mod can be fetched on its own (whereas before, the whole repository had to be fetched). The Go team tested this and saw a 3x speedup on fast networks and 6x on slow networks!
  • You can easily run your own Go proxy, which gives you more control over the stability of your build pipeline and protects against the rare cases when the VCS is down.
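Part of the speedup comes from the shape of the proxy protocol: for each module version, a client just issues GET requests for a few fixed paths, so a go.mod file can be fetched without the source archive. Here is a minimal sketch of how those URLs are formed, following the layout documented by go help goproxy ($GOPROXY/<module>/@v/list, <version>.info, <version>.mod and <version>.zip), including the case encoding that turns each uppercase letter into "!" plus its lowercase form. The escapeCase and proxyURLs helpers are hypothetical, the module and version are only examples, and the sketch skips the module path validation the go command performs.

```go
package main

import (
	"fmt"
	"strings"
)

// escapeCase applies the proxy protocol's case encoding: every ASCII
// uppercase letter becomes '!' followed by its lowercase form, so paths
// stay unambiguous on case-insensitive file systems.
func escapeCase(s string) string {
	var b strings.Builder
	for _, r := range s {
		if 'A' <= r && r <= 'Z' {
			b.WriteByte('!')
			b.WriteRune(r + ('a' - 'A'))
		} else {
			b.WriteRune(r)
		}
	}
	return b.String()
}

// proxyURLs returns the endpoints a client would GET from a proxy for one
// module version: the version list, the metadata, the go.mod file on its
// own, and the source archive.
func proxyURLs(proxy, module, version string) map[string]string {
	base := strings.TrimSuffix(proxy, "/") + "/" + escapeCase(module) + "/@v/"
	v := escapeCase(version)
	return map[string]string{
		"list": base + "list",
		"info": base + v + ".info",
		"mod":  base + v + ".mod",
		"zip":  base + v + ".zip",
	}
}

func main() {
	urls := proxyURLs("https://proxy.golang.org", "github.com/Azure/azure-sdk-for-go", "v1.0.0")
	for _, kind := range []string{"list", "info", "mod", "zip"} {
		fmt.Printf("%-4s %s\n", kind, urls[kind])
	}
}
```

Because every endpoint is a plain HTTP GET, any HTTP client can read a dependency's go.mod without cloning its repository.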

As you can see, using a Go module proxy is a win for everyone. But how do we use it? And what if you don't want to maintain your own Go module proxy? Let's look at the available options.

How to use a Go module proxy

To start using a Go module proxy, we need to set the GOPROXY environment variable to a compatible Go module proxy. There are multiple ways:

1.) If GOPROXY is unset, empty, or set to direct, go get uses a direct connection to the VCS (e.g. github.com):

GOPROXY=""
GOPROXY=direct

It can also be set to off, which means no network access is allowed:

GOPROXY=off

2.) You can start using a public Go proxy. One of your options is to use the Go proxy from the Go team (which is run by Google). More information can be found here: https://proxy.golang.org/

To start using it, all you have to do is set the environment variable:

GOPROXY=https://proxy.golang.org

Other public proxies are:

GOPROXY=https://goproxy.io
GOPROXY=https://goproxy.cn # proxy.golang.org is blocked in China; this proxy is not

3.) You can run one of several open source implementations and host it yourself. Some of these are:

You need to maintain these yourself. It's up to you whether to serve them over the public internet or only on your internal network.

4.) You can buy a commercial offering:

5.) You can pass a file:/// URL. Because a Go module proxy is simply a web server that responds to GET requests (with no query parameters), a folder on any filesystem can also serve as a Go module proxy.

Upcoming Go v1.13 changes

Go v1.13 brings some changes regarding the Go proxy that I think are worth highlighting:

  1. The GOPROXY environment variable may now be set to a comma-separated list. The go command tries the first proxy and falls back to the next entry if the module isn't found there.
  2. The default value of GOPROXY will be https://proxy.golang.org,direct. Anything after the direct token is ignored. This also means that go get will use a Go module proxy by default. If you don't want to use a Go proxy at all, set GOPROXY to direct (or to off, which disallows network access entirely).
  3. A new GOPRIVATE environment variable is introduced, which contains a comma-separated list of glob patterns. It can be used to bypass the proxy for certain module paths, especially private modules in your company (e.g. GOPRIVATE=*.internal.company.com).

All of these changes show that the Go module proxy is a central and important part of Go modules.

Verdict

Using GOPROXY via both public and private networks has a lot of advantages. It's a great feature that works seamlessly with the go command. Given that it has so many advantages (security, speed, storage efficiency), it would be wise to embrace it quickly in your projects or your organization. Also, with Go v1.13 it'll be enabled by default, which is another welcome step toward improving the state of dependency management in Go.