You cannot select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
garble/main.go

2389 lines
74 KiB
Go

// Copyright (c) 2019, The Garble Authors.
// See LICENSE for licensing information.
5 years ago
// garble obfuscates Go code by wrapping the Go toolchain.
5 years ago
package main
import (
"bufio"
"bytes"
"cmp"
cryptorand "crypto/rand"
"encoding/base64"
"encoding/binary"
"encoding/gob"
"encoding/json"
fail if we are unexpectedly overwriting files (#418) While investigating a bug report, I noticed that garble was writing to the same temp file twice. At best, writing to the same path on disk twice is wasteful, as the design is careful to be deterministic and use unique paths. At worst, the two writes could cause races at the filesystem level. To prevent either of those situations, we now create files with os.OpenFile and os.O_EXCL, meaning that we will error if the file already exists. That change uncovered a number of such unintended cases. First, transformAsm would write obfuscated Go files twice. This is because the Go toolchain actually runs: [...]/asm -gensymabis [...] foo.s bar.s [...]/asm [...] foo.s bar.s That is, the first run is only meant to generate symbol ABIs, which are then used by the compiler. We need to obfuscate at that first stage, because the symbol ABI descriptions need to use obfuscated names. However, having already obfuscated the assembly on the first stage, there is no need to do so again on the second stage. If we detect gensymabis is missing, we simply reuse the previous files. This first situation doesn't seem racy, but obfuscating the Go assembly files twice is certainly unnecessary. Second, saveKnownReflectAPIs wrote a gob file to the build cache. Since the build cache can be kept between builds, and since the build cache uses reproducible paths for each build, running the same "garble build" twice could overwrite those files. This could actually cause races at the filesystem level; if two concurrent builds write to the same gob file on disk, one of them could end up using a partially-written file. Note that this is the only of the three cases not using temporary files. As such, it is expected that the file may already exist. In such a case, we simply avoid overwriting it rather than failing. Third, when "garble build -a" was used, and when we needed an export file not listed in importcfg, we would end up calling roughly: go list -export -toolexec=garble -a <dependency> This meant we would re-build and re-obfuscate those packages. Which is unfortunate, because the parent process already did via: go build -toolexec=garble -a <main> The repeated dependency builds tripped the new os.O_EXCL check, as we would try to overwrite the same obfuscated Go files. Beyond being wasteful, this could again cause subtle filesystem races. To fix the problem, avoid passing flags like "-a" to nested go commands. Overall, we should likely be using safer ways to write to disk, be it via either atomic writes or locked files. However, for now, catching duplicate writes is a big step. I have left a self-assigned TODO for further improvements. CI on the pull request found a failure on test-gotip. The failure reproduces on master, so it seems to be related to gotip, and not a regression introduced by this change. For now, disable test-gotip until we can investigate.
3 years ago
"errors"
5 years ago
"flag"
"fmt"
"go/ast"
"go/importer"
"go/parser"
"go/token"
"go/types"
"go/version"
"io"
fail if we are unexpectedly overwriting files (#418) While investigating a bug report, I noticed that garble was writing to the same temp file twice. At best, writing to the same path on disk twice is wasteful, as the design is careful to be deterministic and use unique paths. At worst, the two writes could cause races at the filesystem level. To prevent either of those situations, we now create files with os.OpenFile and os.O_EXCL, meaning that we will error if the file already exists. That change uncovered a number of such unintended cases. First, transformAsm would write obfuscated Go files twice. This is because the Go toolchain actually runs: [...]/asm -gensymabis [...] foo.s bar.s [...]/asm [...] foo.s bar.s That is, the first run is only meant to generate symbol ABIs, which are then used by the compiler. We need to obfuscate at that first stage, because the symbol ABI descriptions need to use obfuscated names. However, having already obfuscated the assembly on the first stage, there is no need to do so again on the second stage. If we detect gensymabis is missing, we simply reuse the previous files. This first situation doesn't seem racy, but obfuscating the Go assembly files twice is certainly unnecessary. Second, saveKnownReflectAPIs wrote a gob file to the build cache. Since the build cache can be kept between builds, and since the build cache uses reproducible paths for each build, running the same "garble build" twice could overwrite those files. This could actually cause races at the filesystem level; if two concurrent builds write to the same gob file on disk, one of them could end up using a partially-written file. Note that this is the only of the three cases not using temporary files. As such, it is expected that the file may already exist. In such a case, we simply avoid overwriting it rather than failing. Third, when "garble build -a" was used, and when we needed an export file not listed in importcfg, we would end up calling roughly: go list -export -toolexec=garble -a <dependency> This meant we would re-build and re-obfuscate those packages. Which is unfortunate, because the parent process already did via: go build -toolexec=garble -a <main> The repeated dependency builds tripped the new os.O_EXCL check, as we would try to overwrite the same obfuscated Go files. Beyond being wasteful, this could again cause subtle filesystem races. To fix the problem, avoid passing flags like "-a" to nested go commands. Overall, we should likely be using safer ways to write to disk, be it via either atomic writes or locked files. However, for now, catching duplicate writes is a big step. I have left a self-assigned TODO for further improvements. CI on the pull request found a failure on test-gotip. The failure reproduces on master, so it seems to be related to gotip, and not a regression introduced by this change. For now, disable test-gotip until we can investigate.
3 years ago
"io/fs"
"log"
mathrand "math/rand"
5 years ago
"os"
"os/exec"
"path/filepath"
"regexp"
"runtime"
"runtime/debug"
reimplement import path obfuscation without goobj2 (#242) We used to rely on a parallel implementation of an object file parser and writer to be able to obfuscate import paths. After compiling each package, we would parse the object file, replace the import paths, and write the updated object file in-place. That worked well, in most cases. Unfortunately, it had some flaws: * Complexity. Even when most of the code is maintained in a separate module, the import_obfuscation.go file was still close to a thousand lines of code. * Go compatibility. The object file format changes between Go releases, so we were supporting Go 1.15, but not 1.16. Fixing the object file package to work with 1.16 would probably break 1.15 support. * Bugs. For example, we recently had to add a workaround for #224, since import paths containing dots after the domain would end up escaped. Another example is #190, which seems to be caused by the object file parser or writer corrupting the compiled code and causing segfaults in some rare edge cases. Instead, let's drop that method entirely, and force the compiler and linker to do the work for us. The steps necessary when compiling a package to obfuscate are: 1) Replace its "package foo" lines with the obfuscated package path. No need to separate the package path and name, since the obfuscated path does not contain slashes. 2) Replace the "-p pkg/foo" flag with the obfuscated path. 3) Replace the "import" spec lines with the obfuscated package paths, for those dependencies which were obfuscated. 4) Replace the "-importcfg [...]" file with a version that uses the obfuscated paths instead. The linker also needs that last step, since it also uses an importcfg file to find object files. There are three noteworthy drawbacks to this new method: 1) Since we no longer write object files, we can't use them to store data to be cached. As such, the -debugdir flag goes back to using the "-a" build flag to always rebuild all packages. On the plus side, that caching didn't work very well; see #176. 2) The package name "main" remains in all declarations under it, not just "func main", since we can only rename entire packages. This seems fine, as it gives little information to the end user. 3) The -tiny mode no longer sets all lines to 0, since it did that by modifying object files. As a temporary measure, we instead set all top-level declarations to be on line 1. A TODO is added to hopefully improve this again in the near future. The upside is that we get rid of all the issues mentioned before. Plus, garble now nearly works with Go 1.16, with the exception of two very minor bugs that look fixable. A follow-up PR will take care of that and start testing on 1.16. Fixes #176. Fixes #190.
3 years ago
"strconv"
"strings"
"time"
"unicode"
"unicode/utf8"
replace our caching inside GOCACHE with GARBLE_CACHE For each Go package we obfuscate, we need to store information about how we obfuscated it, which is needed when obfuscating its dependents. For example, if A depends on B to use the type B.Foo, A needs to know whether or not B.Foo was obfuscated; it depends on B's use of reflect. We record this information in a gob file, which is cached on disk. To avoid rolling our own custom cache, and since garble is so closely connected with cmd/go already, we piggybacked off of Go's GOCACHE. In particular, for each build cache entry per `go list`'s Export field, we would store a "garble" sibling file with that gob content. However, this was brittle for two reasons: 1) We were doing this without cmd/go's permission or knowledge. We were careful to use filename suffixes similar to Export files, meaning that `go clean` and other commands would treat them the same. However, this could confuse cmd/go at any point in the future. 2) cmd/go trims cache entries in GOCACHE regularly, to keep the size of the build and test caches under control. Right now, this means that every 24h, any file not accessed in the last five days is deleted. However, that trimming heuristic is done per-file. If the trimming removed Garble's sibling file but not the original Export file, this could cause errors such as "cannot load garble export file" which users already ran into. Instead, start using github.com/rogpeppe/go-internal/cache, an exported copy of cmd/go's own cache implementation for GOCACHE. Since we need an entirely separate directory, we introduce GARBLE_CACHE, defaulting to the "garble" directory inside the user's cache directory. For example, on Linux this would be ~/.cache/garble. Inside GARBLE_CACHE, our gob file cache will be under "build", which helps clarify that this cache is used when obfuscating Go builds, and allows placing other kinds of caches inside GARBLE_CACHE. For example, we already have a need for storing linker binaries, which for now still use their own caching mechanism. This commit does not make our cache properly resistant to removed files. The proof is that our seed.txtar testscript still fails the second case. However, we do rewrite all of our caching logic away from Export files, which in itself is a considerable refactor, and we add a few TODOs. One notable change is how we load gob files from dependencies when building the cache entry for the current package. We used to load the gob files from all packages in the Deps field. However, that is the list of all _transitive_ dependencies. Since these gob files are already flat, meaning they contain information about all of their transitive dependencies as well, we need only load the gob files from the direct dependencies, the Imports field. Performance is largely unchanged, since the behavior is similar. However, the change from Deps to Imports saves us some work, which can be seen in the reduced mallocs per obfuscated build. It's unclear why the binary size isn't stable. When reverting the Deps to Imports change, it then settles at 5.386Mi, which is almost exactly in between the two measurements below. I'm not sure why, but that metric appears to be slightly unstable. goos: linux goarch: amd64 pkg: mvdan.cc/garble cpu: AMD Ryzen 7 PRO 5850U with Radeon Graphics │ old │ new │ │ sec/op │ sec/op vs base │ Build-8 11.09 ± 1% 11.08 ± 1% ~ (p=0.796 n=10) │ old │ new │ │ bin-B │ bin-B vs base │ Build-8 5.390Mi ± 0% 5.382Mi ± 0% -0.14% (p=0.000 n=10) │ old │ new │ │ cached-sec/op │ cached-sec/op vs base │ Build-8 415.5m ± 4% 421.6m ± 1% ~ (p=0.190 n=10) │ old │ new │ │ mallocs/op │ mallocs/op vs base │ Build-8 35.43M ± 0% 34.05M ± 0% -3.89% (p=0.000 n=10) │ old │ new │ │ sys-sec/op │ sys-sec/op vs base │ Build-8 5.662 ± 1% 5.701 ± 2% ~ (p=0.280 n=10)
1 year ago
"github.com/rogpeppe/go-internal/cache"
"golang.org/x/exp/maps"
"golang.org/x/exp/slices"
"golang.org/x/mod/module"
"golang.org/x/tools/go/ast/astutil"
"golang.org/x/tools/go/ssa"
"mvdan.cc/garble/internal/ctrlflow"
"mvdan.cc/garble/internal/linker"
"mvdan.cc/garble/internal/literals"
5 years ago
)
var flagSet = flag.NewFlagSet("garble", flag.ContinueOnError)
5 years ago
var (
flagLiterals bool
flagTiny bool
flagDebug bool
flagDebugDir string
flagSeed seedFlag
// TODO(pagran): in the future, when control flow obfuscation will be stable migrate to flag
flagControlFlow = os.Getenv("GARBLE_EXPERIMENTAL_CONTROLFLOW") == "1"
)
func init() {
flagSet.Usage = usage
flagSet.BoolVar(&flagLiterals, "literals", false, "Obfuscate literals such as strings")
flagSet.BoolVar(&flagTiny, "tiny", false, "Optimize for binary size, losing some ability to reverse the process")
flagSet.BoolVar(&flagDebug, "debug", false, "Print debug logs to stderr")
flagSet.StringVar(&flagDebugDir, "debugdir", "", "Write the obfuscated source to a directory, e.g. -debugdir=out")
flagSet.Var(&flagSeed, "seed", "Provide a base64-encoded seed, e.g. -seed=o9WDTZ4CN4w\nFor a random seed, provide -seed=random")
}
var rxGarbleFlag = regexp.MustCompile(`-(?:literals|tiny|debug|debugdir|seed)(?:$|=)`)
type seedFlag struct {
random bool
bytes []byte
}
func (f seedFlag) present() bool { return len(f.bytes) > 0 }
func (f seedFlag) String() string {
return base64.RawStdEncoding.EncodeToString(f.bytes)
}
func (f *seedFlag) Set(s string) error {
if s == "random" {
f.random = true // to show the random seed we chose
f.bytes = make([]byte, 16) // random 128 bit seed
if _, err := cryptorand.Read(f.bytes); err != nil {
return fmt.Errorf("error generating random seed: %v", err)
}
} else {
// We expect unpadded base64, but to be nice, accept padded
// strings too.
s = strings.TrimRight(s, "=")
seed, err := base64.RawStdEncoding.DecodeString(s)
if err != nil {
return fmt.Errorf("error decoding seed: %v", err)
}
// TODO: Note that we always use 8 bytes; any bytes after that are
// entirely ignored. That may be confusing to the end user.
if len(seed) < 8 {
return fmt.Errorf("-seed needs at least 8 bytes, have %d", len(seed))
}
f.bytes = seed
}
return nil
}
5 years ago
func usage() {
fmt.Fprintf(os.Stderr, `
Garble obfuscates Go code by wrapping the Go toolchain.
5 years ago
garble [garble flags] command [go flags] [go arguments]
For example, to build an obfuscated program:
garble build ./cmd/foo
Similarly, to combine garble flags and Go build flags:
garble -literals build -tags=purego ./cmd/foo
The following commands are supported:
build replace "go build"
test replace "go test"
run replace "go run"
reverse de-obfuscate output such as stack traces
version print the version and build settings of the garble binary
To learn more about a command, run "garble help <command>".
garble accepts the following flags before a command:
5 years ago
`[1:])
flagSet.PrintDefaults()
fmt.Fprintf(os.Stderr, `
For more information, see https://github.com/burrowers/garble.
`[1:])
5 years ago
}
func main() { os.Exit(main1()) }
var (
// Presumably OK to share fset across packages.
fset = token.NewFileSet()
sharedTempDir = os.Getenv("GARBLE_SHARED")
parentWorkDir = os.Getenv("GARBLE_PARENT_WORK")
)
const actionGraphFileName = "action-graph.json"
type importerWithMap struct {
importMap map[string]string
importFrom func(path, dir string, mode types.ImportMode) (*types.Package, error)
}
wrap types.Importer to canonicalize import paths The docs for go/importer.ForCompiler say: The lookup function is called each time the resulting importer needs to resolve an import path. In this mode the importer can only be invoked with canonical import paths (not relative or absolute ones); it is assumed that the translation to canonical import paths is being done by the client of the importer. We use a lookup func for two reasons: first, to support modules, and second, to be able to use our information from "go list -json -export". However, go/types does not canonicalize import paths before calling ImportFrom. This is somewhat understandable; it doesn't know whether an importer was created with a lookup func, and ImportFrom only requires the input path to be canonicalized in that scenario. When the lookup func is nil, the importer canonicalizes by itself via go/build.Import. Before this change, the added crossbuild test would fail: > garble build net/http [stderr] # vendor/golang.org/x/crypto/chacha20 typecheck error: /usr/lib/go/src/vendor/golang.org/x/crypto/chacha20/chacha_generic.go:10:2: could not import crypto/cipher (can't find import: "crypto/cipher") # vendor/golang.org/x/text/secure/bidirule typecheck error: /usr/lib/go/src/vendor/golang.org/x/text/secure/bidirule/bidirule.go:12:2: could not import errors (can't find import: "errors") # vendor/golang.org/x/crypto/cryptobyte typecheck error: /usr/lib/go/src/vendor/golang.org/x/crypto/cryptobyte/asn1.go:8:16: could not import encoding/asn1 (can't find import: "encoding/asn1") # vendor/golang.org/x/text/unicode/norm typecheck error: /usr/lib/go/src/vendor/golang.org/x/text/unicode/norm/composition.go:7:8: could not import unicode/utf8 (can't find import: "unicode/utf8") This is because we'd fall back to importer.Default, which only knows how to find packages in $GOROOT/pkg. Those are missing for cross-builds, unsurprisingly, as those built archives end up in the build cache. After this change, we properly support importing std-vendored packages, so we can get rid of the importer.Default workaround. And, by extension, cross-builds now work as well. Note that, in the added test script, the full build of the binary fails, as there seems to be some sort of linker problem: > garble build [stderr] # test/main d9rqJyxo.uoqIiDs5: relocation target runtime.os9A16A3 not defined We leave that as a TODO for now, as this change is subtle enough as it is.
3 years ago
func (im importerWithMap) Import(path string) (*types.Package, error) {
wrap types.Importer to canonicalize import paths The docs for go/importer.ForCompiler say: The lookup function is called each time the resulting importer needs to resolve an import path. In this mode the importer can only be invoked with canonical import paths (not relative or absolute ones); it is assumed that the translation to canonical import paths is being done by the client of the importer. We use a lookup func for two reasons: first, to support modules, and second, to be able to use our information from "go list -json -export". However, go/types does not canonicalize import paths before calling ImportFrom. This is somewhat understandable; it doesn't know whether an importer was created with a lookup func, and ImportFrom only requires the input path to be canonicalized in that scenario. When the lookup func is nil, the importer canonicalizes by itself via go/build.Import. Before this change, the added crossbuild test would fail: > garble build net/http [stderr] # vendor/golang.org/x/crypto/chacha20 typecheck error: /usr/lib/go/src/vendor/golang.org/x/crypto/chacha20/chacha_generic.go:10:2: could not import crypto/cipher (can't find import: "crypto/cipher") # vendor/golang.org/x/text/secure/bidirule typecheck error: /usr/lib/go/src/vendor/golang.org/x/text/secure/bidirule/bidirule.go:12:2: could not import errors (can't find import: "errors") # vendor/golang.org/x/crypto/cryptobyte typecheck error: /usr/lib/go/src/vendor/golang.org/x/crypto/cryptobyte/asn1.go:8:16: could not import encoding/asn1 (can't find import: "encoding/asn1") # vendor/golang.org/x/text/unicode/norm typecheck error: /usr/lib/go/src/vendor/golang.org/x/text/unicode/norm/composition.go:7:8: could not import unicode/utf8 (can't find import: "unicode/utf8") This is because we'd fall back to importer.Default, which only knows how to find packages in $GOROOT/pkg. Those are missing for cross-builds, unsurprisingly, as those built archives end up in the build cache. After this change, we properly support importing std-vendored packages, so we can get rid of the importer.Default workaround. And, by extension, cross-builds now work as well. Note that, in the added test script, the full build of the binary fails, as there seems to be some sort of linker problem: > garble build [stderr] # test/main d9rqJyxo.uoqIiDs5: relocation target runtime.os9A16A3 not defined We leave that as a TODO for now, as this change is subtle enough as it is.
3 years ago
panic("should never be called")
}
func (im importerWithMap) ImportFrom(path, dir string, mode types.ImportMode) (*types.Package, error) {
if path2 := im.importMap[path]; path2 != "" {
wrap types.Importer to canonicalize import paths The docs for go/importer.ForCompiler say: The lookup function is called each time the resulting importer needs to resolve an import path. In this mode the importer can only be invoked with canonical import paths (not relative or absolute ones); it is assumed that the translation to canonical import paths is being done by the client of the importer. We use a lookup func for two reasons: first, to support modules, and second, to be able to use our information from "go list -json -export". However, go/types does not canonicalize import paths before calling ImportFrom. This is somewhat understandable; it doesn't know whether an importer was created with a lookup func, and ImportFrom only requires the input path to be canonicalized in that scenario. When the lookup func is nil, the importer canonicalizes by itself via go/build.Import. Before this change, the added crossbuild test would fail: > garble build net/http [stderr] # vendor/golang.org/x/crypto/chacha20 typecheck error: /usr/lib/go/src/vendor/golang.org/x/crypto/chacha20/chacha_generic.go:10:2: could not import crypto/cipher (can't find import: "crypto/cipher") # vendor/golang.org/x/text/secure/bidirule typecheck error: /usr/lib/go/src/vendor/golang.org/x/text/secure/bidirule/bidirule.go:12:2: could not import errors (can't find import: "errors") # vendor/golang.org/x/crypto/cryptobyte typecheck error: /usr/lib/go/src/vendor/golang.org/x/crypto/cryptobyte/asn1.go:8:16: could not import encoding/asn1 (can't find import: "encoding/asn1") # vendor/golang.org/x/text/unicode/norm typecheck error: /usr/lib/go/src/vendor/golang.org/x/text/unicode/norm/composition.go:7:8: could not import unicode/utf8 (can't find import: "unicode/utf8") This is because we'd fall back to importer.Default, which only knows how to find packages in $GOROOT/pkg. Those are missing for cross-builds, unsurprisingly, as those built archives end up in the build cache. After this change, we properly support importing std-vendored packages, so we can get rid of the importer.Default workaround. And, by extension, cross-builds now work as well. Note that, in the added test script, the full build of the binary fails, as there seems to be some sort of linker problem: > garble build [stderr] # test/main d9rqJyxo.uoqIiDs5: relocation target runtime.os9A16A3 not defined We leave that as a TODO for now, as this change is subtle enough as it is.
3 years ago
path = path2
}
return im.importFrom(path, dir, mode)
}
func importerForPkg(lpkg *listedPackage) importerWithMap {
return importerWithMap{
importFrom: importer.ForCompiler(fset, "gc", func(path string) (io.ReadCloser, error) {
pkg, err := listPackage(lpkg, path)
if err != nil {
return nil, err
}
return os.Open(pkg.Export)
}).(types.ImporterFrom).ImportFrom,
importMap: lpkg.ImportMap,
}
wrap types.Importer to canonicalize import paths The docs for go/importer.ForCompiler say: The lookup function is called each time the resulting importer needs to resolve an import path. In this mode the importer can only be invoked with canonical import paths (not relative or absolute ones); it is assumed that the translation to canonical import paths is being done by the client of the importer. We use a lookup func for two reasons: first, to support modules, and second, to be able to use our information from "go list -json -export". However, go/types does not canonicalize import paths before calling ImportFrom. This is somewhat understandable; it doesn't know whether an importer was created with a lookup func, and ImportFrom only requires the input path to be canonicalized in that scenario. When the lookup func is nil, the importer canonicalizes by itself via go/build.Import. Before this change, the added crossbuild test would fail: > garble build net/http [stderr] # vendor/golang.org/x/crypto/chacha20 typecheck error: /usr/lib/go/src/vendor/golang.org/x/crypto/chacha20/chacha_generic.go:10:2: could not import crypto/cipher (can't find import: "crypto/cipher") # vendor/golang.org/x/text/secure/bidirule typecheck error: /usr/lib/go/src/vendor/golang.org/x/text/secure/bidirule/bidirule.go:12:2: could not import errors (can't find import: "errors") # vendor/golang.org/x/crypto/cryptobyte typecheck error: /usr/lib/go/src/vendor/golang.org/x/crypto/cryptobyte/asn1.go:8:16: could not import encoding/asn1 (can't find import: "encoding/asn1") # vendor/golang.org/x/text/unicode/norm typecheck error: /usr/lib/go/src/vendor/golang.org/x/text/unicode/norm/composition.go:7:8: could not import unicode/utf8 (can't find import: "unicode/utf8") This is because we'd fall back to importer.Default, which only knows how to find packages in $GOROOT/pkg. Those are missing for cross-builds, unsurprisingly, as those built archives end up in the build cache. After this change, we properly support importing std-vendored packages, so we can get rid of the importer.Default workaround. And, by extension, cross-builds now work as well. Note that, in the added test script, the full build of the binary fails, as there seems to be some sort of linker problem: > garble build [stderr] # test/main d9rqJyxo.uoqIiDs5: relocation target runtime.os9A16A3 not defined We leave that as a TODO for now, as this change is subtle enough as it is.
3 years ago
}
// uniqueLineWriter sits underneath log.SetOutput to deduplicate log lines.
// We log bits of useful information for debugging,
// and logging the same detail twice is not going to help the user.
// Duplicates are relatively normal, given that names tend to repeat.
type uniqueLineWriter struct {
out io.Writer
seen map[string]bool
}
func (w *uniqueLineWriter) Write(p []byte) (n int, err error) {
if !flagDebug {
panic("unexpected use of uniqueLineWriter with -debug unset")
}
if bytes.Count(p, []byte("\n")) != 1 {
return 0, fmt.Errorf("log write wasn't just one line: %q", p)
}
if w.seen[string(p)] {
return len(p), nil
}
if w.seen == nil {
w.seen = make(map[string]bool)
}
w.seen[string(p)] = true
return w.out.Write(p)
}
// debugSince is like time.Since but resulting in shorter output.
// A build process takes at least hundreds of milliseconds,
// so extra decimal points in the order of microseconds aren't meaningful.
func debugSince(start time.Time) time.Duration {
return time.Since(start).Truncate(10 * time.Microsecond)
}
5 years ago
func main1() int {
defer func() {
if os.Getenv("GARBLE_WRITE_ALLOCS") != "true" {
return
}
var memStats runtime.MemStats
runtime.ReadMemStats(&memStats)
fmt.Fprintf(os.Stderr, "garble allocs: %d\n", memStats.Mallocs)
}()
5 years ago
if err := flagSet.Parse(os.Args[1:]); err != nil {
return 2
}
log.SetPrefix("[garble] ")
log.SetFlags(0) // no timestamps, as they aren't very useful
if flagDebug {
// TODO: cover this in the tests.
log.SetOutput(&uniqueLineWriter{out: os.Stderr})
} else {
log.SetOutput(io.Discard)
}
5 years ago
args := flagSet.Args()
if len(args) < 1 {
usage()
return 2
5 years ago
}
// If a random seed was used, the user won't be able to reproduce the
// same output or failure unless we print the random seed we chose.
// If the build failed and a random seed was used,
// the failure might not reproduce with a different seed.
// Print it before we exit.
if flagSeed.random {
fmt.Fprintf(os.Stderr, "-seed chosen at random: %s\n", base64.RawStdEncoding.EncodeToString(flagSeed.bytes))
}
if err := mainErr(args); err != nil {
if code, ok := err.(errJustExit); ok {
return int(code)
}
fmt.Fprintln(os.Stderr, err)
return 1
}
return 0
}
type errJustExit int
func (e errJustExit) Error() string { return fmt.Sprintf("exit: %d", e) }
func goVersionOK() bool {
const (
minGoVersion = "go1.22" // the first major version we support
maxGoVersion = "go1.23" // the first major version we don't support
)
// rxVersion looks for a version like "go1.2" or "go1.2.3" in `go env GOVERSION`.
rxVersion := regexp.MustCompile(`go\d+\.\d+(?:\.\d+)?`)
toolchainVersionFull := sharedCache.GoEnv.GOVERSION
sharedCache.GoVersion = rxVersion.FindString(toolchainVersionFull)
if sharedCache.GoVersion == "" {
// Go 1.15.x and older did not have GOVERSION yet; they are too old anyway.
fmt.Fprintf(os.Stderr, "Go version is too old; please upgrade to %s or newer\n", minGoVersion)
return false
}
if version.Compare(sharedCache.GoVersion, minGoVersion) < 0 {
fmt.Fprintf(os.Stderr, "Go version %q is too old; please upgrade to %s or newer\n", toolchainVersionFull, minGoVersion)
return false
}
if version.Compare(sharedCache.GoVersion, maxGoVersion) >= 0 {
fmt.Fprintf(os.Stderr, "Go version %q is too new; Go linker patches aren't available for %s or later yet\n", toolchainVersionFull, maxGoVersion)
return false
}
// Ensure that the version of Go that built the garble binary is equal or
// newer than cache.GoVersionSemver.
builtVersionFull := cmp.Or(os.Getenv("GARBLE_TEST_GOVERSION"), runtime.Version())
builtVersion := rxVersion.FindString(builtVersionFull)
if builtVersion == "" {
// If garble built itself, we don't know what Go version was used.
// Fall back to not performing the check against the toolchain version.
return true
}
if version.Compare(builtVersion, sharedCache.GoVersion) < 0 {
fmt.Fprintf(os.Stderr, `
garble was built with %q and can't be used with the newer %q; rebuild it with a command like:
go install mvdan.cc/garble@latest
`[1:], builtVersionFull, toolchainVersionFull)
return false
}
return true
}
func mainErr(args []string) error {
command, args := args[0], args[1:]
// Catch users reaching for `go build -toolexec=garble`.
if command != "toolexec" && len(args) == 1 && args[0] == "-V=full" {
return fmt.Errorf(`did you run "go [command] -toolexec=garble" instead of "garble [command]"?`)
}
switch command {
case "help":
if hasHelpFlag(args) || len(args) > 1 {
fmt.Fprintf(os.Stderr, "usage: garble help [command]\n")
return errJustExit(2)
}
if len(args) == 1 {
return mainErr([]string{args[0], "-h"})
}
usage()
return errJustExit(2)
case "version":
if hasHelpFlag(args) || len(args) > 0 {
fmt.Fprintf(os.Stderr, "usage: garble version\n")
return errJustExit(2)
}
info, ok := debug.ReadBuildInfo()
if !ok {
// The build binary was stripped of build info?
// Could be the case if garble built itself.
fmt.Println("unknown")
return nil
}
mod := &info.Main
if mod.Replace != nil {
mod = mod.Replace
}
// For the tests.
if v := os.Getenv("GARBLE_TEST_BUILDSETTINGS"); v != "" {
var extra []debug.BuildSetting
if err := json.Unmarshal([]byte(v), &extra); err != nil {
return err
}
info.Settings = append(info.Settings, extra...)
}
// Until https://github.com/golang/go/issues/50603 is implemented,
// manually construct something like a pseudo-version.
// TODO: remove when this code is dead, hopefully in Go 1.22.
if mod.Version == "(devel)" {
var vcsTime time.Time
var vcsRevision string
for _, setting := range info.Settings {
switch setting.Key {
case "vcs.time":
// If the format is invalid, we'll print a zero timestamp.
vcsTime, _ = time.Parse(time.RFC3339Nano, setting.Value)
case "vcs.revision":
vcsRevision = setting.Value
if len(vcsRevision) > 12 {
vcsRevision = vcsRevision[:12]
}
}
}
if vcsRevision != "" {
mod.Version = module.PseudoVersion("", "", vcsTime, vcsRevision)
}
}
fmt.Printf("%s %s\n\n", mod.Path, mod.Version)
fmt.Printf("Build settings:\n")
for _, setting := range info.Settings {
if setting.Value == "" {
continue // do empty build settings even matter?
}
// The padding helps keep readability by aligning:
//
// veryverylong.key value
// short.key some-other-value
//
// Empirically, 16 is enough; the longest key seen is "vcs.revision".
fmt.Printf("%16s %s\n", setting.Key, setting.Value)
}
return nil
case "reverse":
return commandReverse(args)
case "build", "test", "run":
cmd, err := toolexecCmd(command, args)
replace our caching inside GOCACHE with GARBLE_CACHE For each Go package we obfuscate, we need to store information about how we obfuscated it, which is needed when obfuscating its dependents. For example, if A depends on B to use the type B.Foo, A needs to know whether or not B.Foo was obfuscated; it depends on B's use of reflect. We record this information in a gob file, which is cached on disk. To avoid rolling our own custom cache, and since garble is so closely connected with cmd/go already, we piggybacked off of Go's GOCACHE. In particular, for each build cache entry per `go list`'s Export field, we would store a "garble" sibling file with that gob content. However, this was brittle for two reasons: 1) We were doing this without cmd/go's permission or knowledge. We were careful to use filename suffixes similar to Export files, meaning that `go clean` and other commands would treat them the same. However, this could confuse cmd/go at any point in the future. 2) cmd/go trims cache entries in GOCACHE regularly, to keep the size of the build and test caches under control. Right now, this means that every 24h, any file not accessed in the last five days is deleted. However, that trimming heuristic is done per-file. If the trimming removed Garble's sibling file but not the original Export file, this could cause errors such as "cannot load garble export file" which users already ran into. Instead, start using github.com/rogpeppe/go-internal/cache, an exported copy of cmd/go's own cache implementation for GOCACHE. Since we need an entirely separate directory, we introduce GARBLE_CACHE, defaulting to the "garble" directory inside the user's cache directory. For example, on Linux this would be ~/.cache/garble. Inside GARBLE_CACHE, our gob file cache will be under "build", which helps clarify that this cache is used when obfuscating Go builds, and allows placing other kinds of caches inside GARBLE_CACHE. For example, we already have a need for storing linker binaries, which for now still use their own caching mechanism. This commit does not make our cache properly resistant to removed files. The proof is that our seed.txtar testscript still fails the second case. However, we do rewrite all of our caching logic away from Export files, which in itself is a considerable refactor, and we add a few TODOs. One notable change is how we load gob files from dependencies when building the cache entry for the current package. We used to load the gob files from all packages in the Deps field. However, that is the list of all _transitive_ dependencies. Since these gob files are already flat, meaning they contain information about all of their transitive dependencies as well, we need only load the gob files from the direct dependencies, the Imports field. Performance is largely unchanged, since the behavior is similar. However, the change from Deps to Imports saves us some work, which can be seen in the reduced mallocs per obfuscated build. It's unclear why the binary size isn't stable. When reverting the Deps to Imports change, it then settles at 5.386Mi, which is almost exactly in between the two measurements below. I'm not sure why, but that metric appears to be slightly unstable. goos: linux goarch: amd64 pkg: mvdan.cc/garble cpu: AMD Ryzen 7 PRO 5850U with Radeon Graphics │ old │ new │ │ sec/op │ sec/op vs base │ Build-8 11.09 ± 1% 11.08 ± 1% ~ (p=0.796 n=10) │ old │ new │ │ bin-B │ bin-B vs base │ Build-8 5.390Mi ± 0% 5.382Mi ± 0% -0.14% (p=0.000 n=10) │ old │ new │ │ cached-sec/op │ cached-sec/op vs base │ Build-8 415.5m ± 4% 421.6m ± 1% ~ (p=0.190 n=10) │ old │ new │ │ mallocs/op │ mallocs/op vs base │ Build-8 35.43M ± 0% 34.05M ± 0% -3.89% (p=0.000 n=10) │ old │ new │ │ sys-sec/op │ sys-sec/op vs base │ Build-8 5.662 ± 1% 5.701 ± 2% ~ (p=0.280 n=10)
1 year ago
defer func() {
if err := os.RemoveAll(os.Getenv("GARBLE_SHARED")); err != nil {
fmt.Fprintf(os.Stderr, "could not clean up GARBLE_SHARED: %v\n", err)
}
// skip the trim if we didn't even start a build
if sharedCache != nil {
fsCache, err := openCache()
if err == nil {
err = fsCache.Trim()
}
if err != nil {
fmt.Fprintf(os.Stderr, "could not trim GARBLE_CACHE: %v\n", err)
}
replace our caching inside GOCACHE with GARBLE_CACHE For each Go package we obfuscate, we need to store information about how we obfuscated it, which is needed when obfuscating its dependents. For example, if A depends on B to use the type B.Foo, A needs to know whether or not B.Foo was obfuscated; it depends on B's use of reflect. We record this information in a gob file, which is cached on disk. To avoid rolling our own custom cache, and since garble is so closely connected with cmd/go already, we piggybacked off of Go's GOCACHE. In particular, for each build cache entry per `go list`'s Export field, we would store a "garble" sibling file with that gob content. However, this was brittle for two reasons: 1) We were doing this without cmd/go's permission or knowledge. We were careful to use filename suffixes similar to Export files, meaning that `go clean` and other commands would treat them the same. However, this could confuse cmd/go at any point in the future. 2) cmd/go trims cache entries in GOCACHE regularly, to keep the size of the build and test caches under control. Right now, this means that every 24h, any file not accessed in the last five days is deleted. However, that trimming heuristic is done per-file. If the trimming removed Garble's sibling file but not the original Export file, this could cause errors such as "cannot load garble export file" which users already ran into. Instead, start using github.com/rogpeppe/go-internal/cache, an exported copy of cmd/go's own cache implementation for GOCACHE. Since we need an entirely separate directory, we introduce GARBLE_CACHE, defaulting to the "garble" directory inside the user's cache directory. For example, on Linux this would be ~/.cache/garble. Inside GARBLE_CACHE, our gob file cache will be under "build", which helps clarify that this cache is used when obfuscating Go builds, and allows placing other kinds of caches inside GARBLE_CACHE. For example, we already have a need for storing linker binaries, which for now still use their own caching mechanism. This commit does not make our cache properly resistant to removed files. The proof is that our seed.txtar testscript still fails the second case. However, we do rewrite all of our caching logic away from Export files, which in itself is a considerable refactor, and we add a few TODOs. One notable change is how we load gob files from dependencies when building the cache entry for the current package. We used to load the gob files from all packages in the Deps field. However, that is the list of all _transitive_ dependencies. Since these gob files are already flat, meaning they contain information about all of their transitive dependencies as well, we need only load the gob files from the direct dependencies, the Imports field. Performance is largely unchanged, since the behavior is similar. However, the change from Deps to Imports saves us some work, which can be seen in the reduced mallocs per obfuscated build. It's unclear why the binary size isn't stable. When reverting the Deps to Imports change, it then settles at 5.386Mi, which is almost exactly in between the two measurements below. I'm not sure why, but that metric appears to be slightly unstable. goos: linux goarch: amd64 pkg: mvdan.cc/garble cpu: AMD Ryzen 7 PRO 5850U with Radeon Graphics │ old │ new │ │ sec/op │ sec/op vs base │ Build-8 11.09 ± 1% 11.08 ± 1% ~ (p=0.796 n=10) │ old │ new │ │ bin-B │ bin-B vs base │ Build-8 5.390Mi ± 0% 5.382Mi ± 0% -0.14% (p=0.000 n=10) │ old │ new │ │ cached-sec/op │ cached-sec/op vs base │ Build-8 415.5m ± 4% 421.6m ± 1% ~ (p=0.190 n=10) │ old │ new │ │ mallocs/op │ mallocs/op vs base │ Build-8 35.43M ± 0% 34.05M ± 0% -3.89% (p=0.000 n=10) │ old │ new │ │ sys-sec/op │ sys-sec/op vs base │ Build-8 5.662 ± 1% 5.701 ± 2% ~ (p=0.280 n=10)
1 year ago
}
}()
if err != nil {
return err
}
cmd.Stdout = os.Stdout
cmd.Stderr = os.Stderr
log.Printf("calling via toolexec: %s", cmd)
return cmd.Run()
case "toolexec":
_, tool := filepath.Split(args[0])
if runtime.GOOS == "windows" {
tool = strings.TrimSuffix(tool, ".exe")
}
transform := transformMethods[tool]
transformed := args[1:]
if transform != nil {
startTime := time.Now()
log.Printf("transforming %s with args: %s", tool, strings.Join(transformed, " "))
// We're in a toolexec sub-process, not directly called by the user.
// Load the shared data and wrap the tool, like the compiler or linker.
if err := loadSharedCache(); err != nil {
return err
}
if len(args) == 2 && args[1] == "-V=full" {
return alterToolVersion(tool, args)
}
var tf transformer
toolexecImportPath := os.Getenv("TOOLEXEC_IMPORTPATH")
tf.curPkg = sharedCache.ListedPackages[toolexecImportPath]
if tf.curPkg == nil {
return fmt.Errorf("TOOLEXEC_IMPORTPATH not found in listed packages: %s", toolexecImportPath)
}
tf.origImporter = importerForPkg(tf.curPkg)
var err error
if transformed, err = transform(&tf, transformed); err != nil {
return err
}
log.Printf("transformed args for %s in %s: %s", tool, debugSince(startTime), strings.Join(transformed, " "))
} else {
log.Printf("skipping transform on %s with args: %s", tool, strings.Join(transformed, " "))
}
executablePath := args[0]
if tool == "link" {
modifiedLinkPath, unlock, err := linker.PatchLinker(sharedCache.GoEnv.GOROOT, sharedCache.GoEnv.GOVERSION, sharedCache.CacheDir, sharedTempDir)
if err != nil {
return fmt.Errorf("cannot get modified linker: %v", err)
}
defer unlock()
executablePath = modifiedLinkPath
os.Setenv(linker.MagicValueEnv, strconv.FormatUint(uint64(magicValue()), 10))
os.Setenv(linker.EntryOffKeyEnv, strconv.FormatUint(uint64(entryOffKey()), 10))
if flagTiny {
os.Setenv(linker.TinyEnv, "true")
}
log.Printf("replaced linker with: %s", executablePath)
}
cmd := exec.Command(executablePath, transformed...)
cmd.Stdout = os.Stdout
cmd.Stderr = os.Stderr
if err := cmd.Run(); err != nil {
return err
5 years ago
}
return nil
default:
return fmt.Errorf("unknown command: %q", command)
5 years ago
}
}
func hasHelpFlag(flags []string) bool {
for _, f := range flags {
switch f {
case "-h", "-help", "--help":
return true
}
}
return false
}
// toolexecCmd builds an *exec.Cmd which is set up for running "go <command>"
// with -toolexec=garble and the supplied arguments.
//
// Note that it uses and modifies global state; in general, it should only be
// called once from mainErr in the top-level garble process.
func toolexecCmd(command string, args []string) (*exec.Cmd, error) {
// Split the flags from the package arguments, since we'll need
// to run 'go list' on the same set of packages.
flags, args := splitFlagsFromArgs(args)
if hasHelpFlag(flags) {
out, _ := exec.Command("go", command, "-h").CombinedOutput()
fmt.Fprintf(os.Stderr, `
usage: garble [garble flags] %s [arguments]
This command wraps "go %s". Below is its help:
%s`[1:], command, command, out)
return nil, errJustExit(2)
}
for _, flag := range flags {
if rxGarbleFlag.MatchString(flag) {
return nil, fmt.Errorf("garble flags must precede command, like: garble %s build ./pkg", flag)
}
}
start using original action IDs (#251) When we obfuscate a name, what we do is hash the name with the action ID of the package that contains the name. To ensure that the hash changes if the garble tool changes, we used the action ID of the obfuscated build, which is different than the original action ID, as we include garble's own content ID in "go tool compile -V=full" via -toolexec. Let's call that the "obfuscated action ID". Remember that a content ID is roughly the hash of a binary or object file, and an action ID contains the hash of a package's source code plus the content IDs of its dependencies. This had the advantage that it did what we wanted. However, it had one massive drawback: when we compile a package, we only have the obfuscated action IDs of its dependencies. This is because one can't have the content ID of dependent packages before they are built. Usually, this is not a problem, because hashing a foreign name means it comes from a dependency, where we already have the obfuscated action ID. However, that's not always the case. First, go:linkname directives can point to any symbol that ends up in the binary, even if the package is not a dependency. So garble could only support linkname targets belonging to dependencies. This is at the root of why we could not obfuscate the runtime; it contains linkname directives targeting the net package, for example, which depends on runtime. Second, some other places did not have an easy access to obfuscated action IDs, like transformAsm, which had to recover it from a temporary file stored by transformCompile. Plus, this was all pretty expensive, as each toolexec sub-process had to make repeated calls to buildidOf with the object files of dependencies. We even had to use extra calls to "go list" in the case of indirect dependencies, as their export files do not appear in importcfg files. All in all, the old method was complex and expensive. A better mechanism is to use the original action IDs directly, as listed by "go list" without garble in the picture. This would mean that the hashing does not change if garble changes, meaning weaker obfuscation. To regain that property, we define the "garble action ID", which is just the original action ID hashed together with garble's own content ID. This is practically the same as the obfuscated build ID we used before, but since it doesn't go through "go tool compile -V=full" and the obfuscated build itself, we can work out *all* the garble action IDs upfront, before the obfuscated build even starts. This fixes all of our problems. Now we know all garble build IDs upfront, so a bunch of hacks can be entirely removed. Plus, since we know them upfront, we can also cache them and avoid repeated calls to "go tool buildid". While at it, make use of the new BuildID field in Go 1.16's "list -json -export". This avoids the vast majority of "go tool buildid" calls, as the only ones that remain are 2 on the garble binary itself. The numbers for Go 1.16 look very good: name old time/op new time/op delta Build-8 146ms ± 4% 101ms ± 1% -31.01% (p=0.002 n=6+6) name old bin-B new bin-B delta Build-8 6.61M ± 0% 6.60M ± 0% -0.09% (p=0.002 n=6+6) name old sys-time/op new sys-time/op delta Build-8 321ms ± 7% 202ms ± 6% -37.11% (p=0.002 n=6+6) name old user-time/op new user-time/op delta Build-8 538ms ± 4% 414ms ± 4% -23.12% (p=0.002 n=6+6)
3 years ago
// Here is the only place we initialize the cache.
// The sub-processes will parse it from a shared gob file.
sharedCache = &sharedCacheType{}
start using original action IDs (#251) When we obfuscate a name, what we do is hash the name with the action ID of the package that contains the name. To ensure that the hash changes if the garble tool changes, we used the action ID of the obfuscated build, which is different than the original action ID, as we include garble's own content ID in "go tool compile -V=full" via -toolexec. Let's call that the "obfuscated action ID". Remember that a content ID is roughly the hash of a binary or object file, and an action ID contains the hash of a package's source code plus the content IDs of its dependencies. This had the advantage that it did what we wanted. However, it had one massive drawback: when we compile a package, we only have the obfuscated action IDs of its dependencies. This is because one can't have the content ID of dependent packages before they are built. Usually, this is not a problem, because hashing a foreign name means it comes from a dependency, where we already have the obfuscated action ID. However, that's not always the case. First, go:linkname directives can point to any symbol that ends up in the binary, even if the package is not a dependency. So garble could only support linkname targets belonging to dependencies. This is at the root of why we could not obfuscate the runtime; it contains linkname directives targeting the net package, for example, which depends on runtime. Second, some other places did not have an easy access to obfuscated action IDs, like transformAsm, which had to recover it from a temporary file stored by transformCompile. Plus, this was all pretty expensive, as each toolexec sub-process had to make repeated calls to buildidOf with the object files of dependencies. We even had to use extra calls to "go list" in the case of indirect dependencies, as their export files do not appear in importcfg files. All in all, the old method was complex and expensive. A better mechanism is to use the original action IDs directly, as listed by "go list" without garble in the picture. This would mean that the hashing does not change if garble changes, meaning weaker obfuscation. To regain that property, we define the "garble action ID", which is just the original action ID hashed together with garble's own content ID. This is practically the same as the obfuscated build ID we used before, but since it doesn't go through "go tool compile -V=full" and the obfuscated build itself, we can work out *all* the garble action IDs upfront, before the obfuscated build even starts. This fixes all of our problems. Now we know all garble build IDs upfront, so a bunch of hacks can be entirely removed. Plus, since we know them upfront, we can also cache them and avoid repeated calls to "go tool buildid". While at it, make use of the new BuildID field in Go 1.16's "list -json -export". This avoids the vast majority of "go tool buildid" calls, as the only ones that remain are 2 on the garble binary itself. The numbers for Go 1.16 look very good: name old time/op new time/op delta Build-8 146ms ± 4% 101ms ± 1% -31.01% (p=0.002 n=6+6) name old bin-B new bin-B delta Build-8 6.61M ± 0% 6.60M ± 0% -0.09% (p=0.002 n=6+6) name old sys-time/op new sys-time/op delta Build-8 321ms ± 7% 202ms ± 6% -37.11% (p=0.002 n=6+6) name old user-time/op new user-time/op delta Build-8 538ms ± 4% 414ms ± 4% -23.12% (p=0.002 n=6+6)
3 years ago
// Note that we also need to pass build flags to 'go list', such
// as -tags.
sharedCache.ForwardBuildFlags, _ = filterForwardBuildFlags(flags)
if command == "test" {
sharedCache.ForwardBuildFlags = append(sharedCache.ForwardBuildFlags, "-test")
}
if err := fetchGoEnv(); err != nil {
return nil, err
}
if !goVersionOK() {
return nil, errJustExit(1)
}
var err error
sharedCache.ExecPath, err = os.Executable()
if err != nil {
return nil, err
}
// Always an absolute directory; defaults to e.g. "~/.cache/garble".
if dir := os.Getenv("GARBLE_CACHE"); dir != "" {
sharedCache.CacheDir, err = filepath.Abs(dir)
if err != nil {
return nil, err
}
} else {
parentDir, err := os.UserCacheDir()
if err != nil {
return nil, err
}
sharedCache.CacheDir = filepath.Join(parentDir, "garble")
}
binaryBuildID, err := buildidOf(sharedCache.ExecPath)
ensure the runtime is built in a reproducible way We went to great lengths to ensure garble builds are reproducible. This includes how the tool itself works, as its behavior should be the same given the same inputs. However, we made one crucial mistake with the runtime package. It has go:linkname directives pointing at other packages, and some of those pointed packages aren't its dependencies. Imagine two scenarios where garble builds the runtime package: 1) We run "garble build runtime". The way we handle linkname directives calls listPackage on the target package, to obfuscate the target's import path and object name. However, since we only obtained build info of runtime and its deps, calls for some linknames such as listPackage("sync/atomic") will fail. The linkname directive will leave its target untouched. 2) We run "garble build std". Unlike the first scenario, all listPackage calls issued by runtime's linkname directives will succeed, so its linkname directive targets will be obfuscated. At best, this can result in inconsistent builds, depending on how the runtime package was built. At worst, the mismatching object names can result in errors at link time, if the target packages are actually used. The modified test reproduces the worst case scenario reliably, when the fix is reverted: > env GOCACHE=${WORK}/gocache-empty > garble build -a runtime > garble build -o=out_rebuild ./stdimporter [stderr] # test/main/stdimporter JZzQivnl.NtQJu0H3: relocation target JZzQivnl.iioHinYT not defined JZzQivnl.NtQJu0H3.func9: relocation target JZzQivnl.yz5z0NaH not defined JZzQivnl.(*ypvqhKiQ).String: relocation target JZzQivnl.eVciBQeI not defined JZzQivnl.(*ypvqhKiQ).PkgPath: relocation target JZzQivnl.eVciBQeI not defined [...] The fix consists of two steps. First, if we're building the runtime and listPackage fails on a package, that means we ran into scenario 1 above. To avoid the inconsistency, we fill ListedPackages with "go list [...] std". This means we'll always build runtime as described in scenario 2 above. Second, when building packages other than the runtime, we only allow listPackage to succeed if we're listing a dependency of the current package. This ensures we won't run into similar reproducibility bugs in the future. Finally, re-enable test-gotip on CI since this was the last test flake.
2 years ago
if err != nil {
return nil, err
}
sharedCache.BinaryContentID = decodeBuildIDHash(splitContentID(binaryBuildID))
ensure the runtime is built in a reproducible way We went to great lengths to ensure garble builds are reproducible. This includes how the tool itself works, as its behavior should be the same given the same inputs. However, we made one crucial mistake with the runtime package. It has go:linkname directives pointing at other packages, and some of those pointed packages aren't its dependencies. Imagine two scenarios where garble builds the runtime package: 1) We run "garble build runtime". The way we handle linkname directives calls listPackage on the target package, to obfuscate the target's import path and object name. However, since we only obtained build info of runtime and its deps, calls for some linknames such as listPackage("sync/atomic") will fail. The linkname directive will leave its target untouched. 2) We run "garble build std". Unlike the first scenario, all listPackage calls issued by runtime's linkname directives will succeed, so its linkname directive targets will be obfuscated. At best, this can result in inconsistent builds, depending on how the runtime package was built. At worst, the mismatching object names can result in errors at link time, if the target packages are actually used. The modified test reproduces the worst case scenario reliably, when the fix is reverted: > env GOCACHE=${WORK}/gocache-empty > garble build -a runtime > garble build -o=out_rebuild ./stdimporter [stderr] # test/main/stdimporter JZzQivnl.NtQJu0H3: relocation target JZzQivnl.iioHinYT not defined JZzQivnl.NtQJu0H3.func9: relocation target JZzQivnl.yz5z0NaH not defined JZzQivnl.(*ypvqhKiQ).String: relocation target JZzQivnl.eVciBQeI not defined JZzQivnl.(*ypvqhKiQ).PkgPath: relocation target JZzQivnl.eVciBQeI not defined [...] The fix consists of two steps. First, if we're building the runtime and listPackage fails on a package, that means we ran into scenario 1 above. To avoid the inconsistency, we fill ListedPackages with "go list [...] std". This means we'll always build runtime as described in scenario 2 above. Second, when building packages other than the runtime, we only allow listPackage to succeed if we're listing a dependency of the current package. This ensures we won't run into similar reproducibility bugs in the future. Finally, re-enable test-gotip on CI since this was the last test flake.
2 years ago
if err := appendListedPackages(args, true); err != nil {
return nil, err
}
start using original action IDs (#251) When we obfuscate a name, what we do is hash the name with the action ID of the package that contains the name. To ensure that the hash changes if the garble tool changes, we used the action ID of the obfuscated build, which is different than the original action ID, as we include garble's own content ID in "go tool compile -V=full" via -toolexec. Let's call that the "obfuscated action ID". Remember that a content ID is roughly the hash of a binary or object file, and an action ID contains the hash of a package's source code plus the content IDs of its dependencies. This had the advantage that it did what we wanted. However, it had one massive drawback: when we compile a package, we only have the obfuscated action IDs of its dependencies. This is because one can't have the content ID of dependent packages before they are built. Usually, this is not a problem, because hashing a foreign name means it comes from a dependency, where we already have the obfuscated action ID. However, that's not always the case. First, go:linkname directives can point to any symbol that ends up in the binary, even if the package is not a dependency. So garble could only support linkname targets belonging to dependencies. This is at the root of why we could not obfuscate the runtime; it contains linkname directives targeting the net package, for example, which depends on runtime. Second, some other places did not have an easy access to obfuscated action IDs, like transformAsm, which had to recover it from a temporary file stored by transformCompile. Plus, this was all pretty expensive, as each toolexec sub-process had to make repeated calls to buildidOf with the object files of dependencies. We even had to use extra calls to "go list" in the case of indirect dependencies, as their export files do not appear in importcfg files. All in all, the old method was complex and expensive. A better mechanism is to use the original action IDs directly, as listed by "go list" without garble in the picture. This would mean that the hashing does not change if garble changes, meaning weaker obfuscation. To regain that property, we define the "garble action ID", which is just the original action ID hashed together with garble's own content ID. This is practically the same as the obfuscated build ID we used before, but since it doesn't go through "go tool compile -V=full" and the obfuscated build itself, we can work out *all* the garble action IDs upfront, before the obfuscated build even starts. This fixes all of our problems. Now we know all garble build IDs upfront, so a bunch of hacks can be entirely removed. Plus, since we know them upfront, we can also cache them and avoid repeated calls to "go tool buildid". While at it, make use of the new BuildID field in Go 1.16's "list -json -export". This avoids the vast majority of "go tool buildid" calls, as the only ones that remain are 2 on the garble binary itself. The numbers for Go 1.16 look very good: name old time/op new time/op delta Build-8 146ms ± 4% 101ms ± 1% -31.01% (p=0.002 n=6+6) name old bin-B new bin-B delta Build-8 6.61M ± 0% 6.60M ± 0% -0.09% (p=0.002 n=6+6) name old sys-time/op new sys-time/op delta Build-8 321ms ± 7% 202ms ± 6% -37.11% (p=0.002 n=6+6) name old user-time/op new user-time/op delta Build-8 538ms ± 4% 414ms ± 4% -23.12% (p=0.002 n=6+6)
3 years ago
sharedTempDir, err = saveSharedCache()
if err != nil {
return nil, err
}
os.Setenv("GARBLE_SHARED", sharedTempDir)
wd, err := os.Getwd()
if err != nil {
return nil, err
}
os.Setenv("GARBLE_PARENT_WORK", wd)
if flagDebugDir != "" {
if !filepath.IsAbs(flagDebugDir) {
flagDebugDir = filepath.Join(wd, flagDebugDir)
}
if err := os.RemoveAll(flagDebugDir); err != nil {
return nil, fmt.Errorf("could not empty debugdir: %v", err)
}
if err := os.MkdirAll(flagDebugDir, 0o755); err != nil {
return nil, err
}
}
use fewer build flags when building std or cmd When we use `go list` on the standard library, we need to be careful about what flags are passed from the top-level build command, because some flags are not going to be appropriate. In particular, GOFLAGS=-modfile=... resulted in a failure, reproduced via the GOFLAGS variable added to linker.txtar: go: inconsistent vendoring in /home/mvdan/tip/src: golang.org/x/crypto@v0.5.1-0.20230203195927-310bfa40f1e4: is marked as explicit in vendor/modules.txt, but not explicitly required in go.mod golang.org/x/net@v0.7.0: is marked as explicit in vendor/modules.txt, but not explicitly required in go.mod golang.org/x/sys@v0.5.1-0.20230208141308-4fee21c92339: is marked as explicit in vendor/modules.txt, but not explicitly required in go.mod golang.org/x/text@v0.7.1-0.20230207171107-30dadde3188b: is marked as explicit in vendor/modules.txt, but not explicitly required in go.mod To ignore the vendor directory, use -mod=readonly or -mod=mod. To sync the vendor directory, run: go mod vendor To work around this problem, reset the -mod and -modfile flags when calling "go list" on the standard library, as those are the only two flags which alter how we load the main module in a build. The code which builds a modified cmd/link has a similar problem; it already reset GOOS and GOARCH, but it could similarly run into problems if other env vars like GOFLAGS were set. To be on the safe side, we also disable GOENV and GOEXPERIMENT, which we borrow from Go's bootstrapping commands.
1 year ago
goArgs := append([]string{command}, garbleBuildFlags...)
// Pass the garble flags down to each toolexec invocation.
// This way, all garble processes see the same flag values.
// Note that we can end up with a single argument to `go` in the form of:
//
// -toolexec='/binary dir/garble' -tiny toolexec
//
// We quote the absolute path to garble if it contains spaces.
// We can add extra flags to the end of the same -toolexec argument.
var toolexecFlag strings.Builder
toolexecFlag.WriteString("-toolexec=")
quotedExecPath, err := cmdgoQuotedJoin([]string{sharedCache.ExecPath})
if err != nil {
// Can only happen if the absolute path to the garble binary contains
// both single and double quotes. Seems extremely unlikely.
return nil, err
}
toolexecFlag.WriteString(quotedExecPath)
appendFlags(&toolexecFlag, false)
toolexecFlag.WriteString(" toolexec")
goArgs = append(goArgs, toolexecFlag.String())
if flagControlFlow {
goArgs = append(goArgs, "-debug-actiongraph", filepath.Join(sharedTempDir, actionGraphFileName))
}
reimplement import path obfuscation without goobj2 (#242) We used to rely on a parallel implementation of an object file parser and writer to be able to obfuscate import paths. After compiling each package, we would parse the object file, replace the import paths, and write the updated object file in-place. That worked well, in most cases. Unfortunately, it had some flaws: * Complexity. Even when most of the code is maintained in a separate module, the import_obfuscation.go file was still close to a thousand lines of code. * Go compatibility. The object file format changes between Go releases, so we were supporting Go 1.15, but not 1.16. Fixing the object file package to work with 1.16 would probably break 1.15 support. * Bugs. For example, we recently had to add a workaround for #224, since import paths containing dots after the domain would end up escaped. Another example is #190, which seems to be caused by the object file parser or writer corrupting the compiled code and causing segfaults in some rare edge cases. Instead, let's drop that method entirely, and force the compiler and linker to do the work for us. The steps necessary when compiling a package to obfuscate are: 1) Replace its "package foo" lines with the obfuscated package path. No need to separate the package path and name, since the obfuscated path does not contain slashes. 2) Replace the "-p pkg/foo" flag with the obfuscated path. 3) Replace the "import" spec lines with the obfuscated package paths, for those dependencies which were obfuscated. 4) Replace the "-importcfg [...]" file with a version that uses the obfuscated paths instead. The linker also needs that last step, since it also uses an importcfg file to find object files. There are three noteworthy drawbacks to this new method: 1) Since we no longer write object files, we can't use them to store data to be cached. As such, the -debugdir flag goes back to using the "-a" build flag to always rebuild all packages. On the plus side, that caching didn't work very well; see #176. 2) The package name "main" remains in all declarations under it, not just "func main", since we can only rename entire packages. This seems fine, as it gives little information to the end user. 3) The -tiny mode no longer sets all lines to 0, since it did that by modifying object files. As a temporary measure, we instead set all top-level declarations to be on line 1. A TODO is added to hopefully improve this again in the near future. The upside is that we get rid of all the issues mentioned before. Plus, garble now nearly works with Go 1.16, with the exception of two very minor bugs that look fixable. A follow-up PR will take care of that and start testing on 1.16. Fixes #176. Fixes #190.
3 years ago
if flagDebugDir != "" {
// In case the user deletes the debug directory,
// and a previous build is cached,
// rebuild all packages to re-fill the debug dir.
goArgs = append(goArgs, "-a")
}
if command == "test" {
// vet is generally not useful on obfuscated code; keep it
// disabled by default.
goArgs = append(goArgs, "-vet=off")
}
goArgs = append(goArgs, flags...)
goArgs = append(goArgs, args...)
return exec.Command("go", goArgs...), nil
}
var transformMethods = map[string]func(*transformer, []string) ([]string, error){
"asm": (*transformer).transformAsm,
"compile": (*transformer).transformCompile,
"link": (*transformer).transformLink,
}
func (tf *transformer) transformAsm(args []string) ([]string, error) {
flags, paths := splitFlagsFromFiles(args, ".s")
// When assembling, the import path can make its way into the output object file.
if tf.curPkg.Name != "main" && tf.curPkg.ToObfuscate {
flags = flagSetValue(flags, "-p", tf.curPkg.obfuscatedImportPath())
}
flags = alterTrimpath(flags)
avoid reproducibility issues with full rebuilds We were using temporary filenames for modified Go and assembly files. For example, an obfuscated "encoding/json/encode.go" would end up as: /tmp/garble-shared123/encode.go.456.go where "123" and "456" are random numbers, usually longer. This was usually fine for two reasons: 1) We would add "/tmp/garble-shared123/" to -trimpath, so the temporary directory and its random number would be invisible. 2) We would add "//line" directives to the source files, replacing the filename with obfuscated versions excluding any random number. Unfortunately, this broke in multiple ways. Most notably, assembly files do not have any line directives, and it's not clear that there's any support for them. So the random number in their basename could end up in the binary, breaking reproducibility. Another issue is that the -trimpath addition described above was only done for cmd/compile, not cmd/asm, so assembly filenames included the randomized temporary directory. To fix the issues above, the same "encoding/json/encode.go" would now end up as: /tmp/garble-shared123/encoding/json/encode.go Such a path is still unique even though the "456" random number is gone, as import paths are unique within a single build. This fixes issues with the base name of each file, so we no longer rely on line directives as the only way to remove the second original random number. We still rely on -trimpath to get rid of the temporary directory in filenames. To fix its problem with assembly files, also amend the -trimpath flag when running the assembler tool. Finally, add a test that reproducible builds still work when a full rebuild is done. We choose goprivate.txt for such a test as its stdimporter package imports a number of std packages, including uses of assembly and cgo. For the time being, we don't use such a "full rebuild" reproducibility test in other test scripts, as this step is expensive, rebuilding many packages from scratch. This issue went unnoticed for over a year because such random numbers "123" and "456" were created when a package was obfuscated, and that only happened once per package version as long as the build cache was kept intact. When clearing the build cache, or forcing a rebuild with -a, one gets new random numbers, and thus a different binary resulting from the same build input. That's not something that most users would do regularly, and our tests did not cover that edge case either, until now. Fixes #328.
3 years ago
// The assembler runs twice; the first with -gensymabis,
// where we continue below and we obfuscate all the source.
// The second time, without -gensymabis, we reconstruct the paths to the
// obfuscated source files and reuse them to avoid work.
fail if we are unexpectedly overwriting files (#418) While investigating a bug report, I noticed that garble was writing to the same temp file twice. At best, writing to the same path on disk twice is wasteful, as the design is careful to be deterministic and use unique paths. At worst, the two writes could cause races at the filesystem level. To prevent either of those situations, we now create files with os.OpenFile and os.O_EXCL, meaning that we will error if the file already exists. That change uncovered a number of such unintended cases. First, transformAsm would write obfuscated Go files twice. This is because the Go toolchain actually runs: [...]/asm -gensymabis [...] foo.s bar.s [...]/asm [...] foo.s bar.s That is, the first run is only meant to generate symbol ABIs, which are then used by the compiler. We need to obfuscate at that first stage, because the symbol ABI descriptions need to use obfuscated names. However, having already obfuscated the assembly on the first stage, there is no need to do so again on the second stage. If we detect gensymabis is missing, we simply reuse the previous files. This first situation doesn't seem racy, but obfuscating the Go assembly files twice is certainly unnecessary. Second, saveKnownReflectAPIs wrote a gob file to the build cache. Since the build cache can be kept between builds, and since the build cache uses reproducible paths for each build, running the same "garble build" twice could overwrite those files. This could actually cause races at the filesystem level; if two concurrent builds write to the same gob file on disk, one of them could end up using a partially-written file. Note that this is the only of the three cases not using temporary files. As such, it is expected that the file may already exist. In such a case, we simply avoid overwriting it rather than failing. Third, when "garble build -a" was used, and when we needed an export file not listed in importcfg, we would end up calling roughly: go list -export -toolexec=garble -a <dependency> This meant we would re-build and re-obfuscate those packages. Which is unfortunate, because the parent process already did via: go build -toolexec=garble -a <main> The repeated dependency builds tripped the new os.O_EXCL check, as we would try to overwrite the same obfuscated Go files. Beyond being wasteful, this could again cause subtle filesystem races. To fix the problem, avoid passing flags like "-a" to nested go commands. Overall, we should likely be using safer ways to write to disk, be it via either atomic writes or locked files. However, for now, catching duplicate writes is a big step. I have left a self-assigned TODO for further improvements. CI on the pull request found a failure on test-gotip. The failure reproduces on master, so it seems to be related to gotip, and not a regression introduced by this change. For now, disable test-gotip until we can investigate.
3 years ago
newPaths := make([]string, 0, len(paths))
if !slices.Contains(args, "-gensymabis") {
fail if we are unexpectedly overwriting files (#418) While investigating a bug report, I noticed that garble was writing to the same temp file twice. At best, writing to the same path on disk twice is wasteful, as the design is careful to be deterministic and use unique paths. At worst, the two writes could cause races at the filesystem level. To prevent either of those situations, we now create files with os.OpenFile and os.O_EXCL, meaning that we will error if the file already exists. That change uncovered a number of such unintended cases. First, transformAsm would write obfuscated Go files twice. This is because the Go toolchain actually runs: [...]/asm -gensymabis [...] foo.s bar.s [...]/asm [...] foo.s bar.s That is, the first run is only meant to generate symbol ABIs, which are then used by the compiler. We need to obfuscate at that first stage, because the symbol ABI descriptions need to use obfuscated names. However, having already obfuscated the assembly on the first stage, there is no need to do so again on the second stage. If we detect gensymabis is missing, we simply reuse the previous files. This first situation doesn't seem racy, but obfuscating the Go assembly files twice is certainly unnecessary. Second, saveKnownReflectAPIs wrote a gob file to the build cache. Since the build cache can be kept between builds, and since the build cache uses reproducible paths for each build, running the same "garble build" twice could overwrite those files. This could actually cause races at the filesystem level; if two concurrent builds write to the same gob file on disk, one of them could end up using a partially-written file. Note that this is the only of the three cases not using temporary files. As such, it is expected that the file may already exist. In such a case, we simply avoid overwriting it rather than failing. Third, when "garble build -a" was used, and when we needed an export file not listed in importcfg, we would end up calling roughly: go list -export -toolexec=garble -a <dependency> This meant we would re-build and re-obfuscate those packages. Which is unfortunate, because the parent process already did via: go build -toolexec=garble -a <main> The repeated dependency builds tripped the new os.O_EXCL check, as we would try to overwrite the same obfuscated Go files. Beyond being wasteful, this could again cause subtle filesystem races. To fix the problem, avoid passing flags like "-a" to nested go commands. Overall, we should likely be using safer ways to write to disk, be it via either atomic writes or locked files. However, for now, catching duplicate writes is a big step. I have left a self-assigned TODO for further improvements. CI on the pull request found a failure on test-gotip. The failure reproduces on master, so it seems to be related to gotip, and not a regression introduced by this change. For now, disable test-gotip until we can investigate.
3 years ago
for _, path := range paths {
name := hashWithPackage(tf.curPkg, filepath.Base(path)) + ".s"
pkgDir := filepath.Join(sharedTempDir, tf.curPkg.obfuscatedImportPath())
fail if we are unexpectedly overwriting files (#418) While investigating a bug report, I noticed that garble was writing to the same temp file twice. At best, writing to the same path on disk twice is wasteful, as the design is careful to be deterministic and use unique paths. At worst, the two writes could cause races at the filesystem level. To prevent either of those situations, we now create files with os.OpenFile and os.O_EXCL, meaning that we will error if the file already exists. That change uncovered a number of such unintended cases. First, transformAsm would write obfuscated Go files twice. This is because the Go toolchain actually runs: [...]/asm -gensymabis [...] foo.s bar.s [...]/asm [...] foo.s bar.s That is, the first run is only meant to generate symbol ABIs, which are then used by the compiler. We need to obfuscate at that first stage, because the symbol ABI descriptions need to use obfuscated names. However, having already obfuscated the assembly on the first stage, there is no need to do so again on the second stage. If we detect gensymabis is missing, we simply reuse the previous files. This first situation doesn't seem racy, but obfuscating the Go assembly files twice is certainly unnecessary. Second, saveKnownReflectAPIs wrote a gob file to the build cache. Since the build cache can be kept between builds, and since the build cache uses reproducible paths for each build, running the same "garble build" twice could overwrite those files. This could actually cause races at the filesystem level; if two concurrent builds write to the same gob file on disk, one of them could end up using a partially-written file. Note that this is the only of the three cases not using temporary files. As such, it is expected that the file may already exist. In such a case, we simply avoid overwriting it rather than failing. Third, when "garble build -a" was used, and when we needed an export file not listed in importcfg, we would end up calling roughly: go list -export -toolexec=garble -a <dependency> This meant we would re-build and re-obfuscate those packages. Which is unfortunate, because the parent process already did via: go build -toolexec=garble -a <main> The repeated dependency builds tripped the new os.O_EXCL check, as we would try to overwrite the same obfuscated Go files. Beyond being wasteful, this could again cause subtle filesystem races. To fix the problem, avoid passing flags like "-a" to nested go commands. Overall, we should likely be using safer ways to write to disk, be it via either atomic writes or locked files. However, for now, catching duplicate writes is a big step. I have left a self-assigned TODO for further improvements. CI on the pull request found a failure on test-gotip. The failure reproduces on master, so it seems to be related to gotip, and not a regression introduced by this change. For now, disable test-gotip until we can investigate.
3 years ago
newPath := filepath.Join(pkgDir, name)
newPaths = append(newPaths, newPath)
}
return append(flags, newPaths...), nil
}
const missingHeader = "missing header path"
newHeaderPaths := make(map[string]string)
var buf, includeBuf bytes.Buffer
for _, path := range paths {
buf.Reset()
f, err := os.Open(path)
if err != nil {
return nil, err
}
defer f.Close() // in case of error
scanner := bufio.NewScanner(f)
for scanner.Scan() {
line := scanner.Text()
// Whole-line comments might be directives, leave them in place.
// For example: //go:build race
// Any other comment, including inline ones, can be discarded entirely.
line, comment, hasComment := strings.Cut(line, "//")
if hasComment && line == "" {
buf.WriteString("//")
buf.WriteString(comment)
buf.WriteByte('\n')
continue
}
// Preprocessor lines to include another file.
// For example: #include "foo.h"
if quoted := strings.TrimPrefix(line, "#include"); quoted != line {
quoted = strings.TrimSpace(quoted)
path, err := strconv.Unquote(quoted)
if err != nil { // note that strconv.Unquote errors do not include the input string
return nil, fmt.Errorf("cannot unquote %q: %v", quoted, err)
}
newPath := newHeaderPaths[path]
switch newPath {
case missingHeader: // no need to try again
buf.WriteString(line)
buf.WriteByte('\n')
continue
case "": // first time we see this header
includeBuf.Reset()
content, err := os.ReadFile(path)
if errors.Is(err, fs.ErrNotExist) {
newHeaderPaths[path] = missingHeader
buf.WriteString(line)
buf.WriteByte('\n')
continue // a header file provided by Go or the system
} else if err != nil {
return nil, err
}
tf.replaceAsmNames(&includeBuf, content)
// For now, we replace `foo.h` or `dir/foo.h` with `garbled_foo.h`.
// The different name ensures we don't use the unobfuscated file.
// This is far from perfect, but does the job for the time being.
// In the future, use a randomized name.
basename := filepath.Base(path)
newPath = "garbled_" + basename
if _, err := tf.writeSourceFile(basename, newPath, includeBuf.Bytes()); err != nil {
return nil, err
}
newHeaderPaths[path] = newPath
}
buf.WriteString("#include ")
buf.WriteString(strconv.Quote(newPath))
buf.WriteByte('\n')
continue
}
// Anything else is regular assembly; replace the names.
tf.replaceAsmNames(&buf, []byte(line))
buf.WriteByte('\n')
}
if err := scanner.Err(); err != nil {
return nil, err
}
// With assembly files, we obfuscate the filename in the temporary
// directory, as assembly files do not support `/*line` directives.
// TODO(mvdan): per cmd/asm/internal/lex, they do support `#line`.
basename := filepath.Base(path)
newName := hashWithPackage(tf.curPkg, basename) + ".s"
if path, err := tf.writeSourceFile(basename, newName, buf.Bytes()); err != nil {
return nil, err
} else {
newPaths = append(newPaths, path)
}
f.Close() // do not keep len(paths) files open
}
return append(flags, newPaths...), nil
}
func (tf *transformer) replaceAsmNames(buf *bytes.Buffer, remaining []byte) {
// We need to replace all function references with their obfuscated name
// counterparts.
// Luckily, all func names in Go assembly files are immediately followed
// by the unicode "middle dot", like:
//
// TEXT ·privateAdd(SB),$0-24
// TEXT runtimeinternalsys·Ctz64(SB), NOSPLIT, $0-12
//
// Note that import paths in assembly, like `runtimeinternalsys` above,
// use Unicode periods and slashes rather than the ASCII ones used by `go list`.
// We need to convert to ASCII to find the right package information.
const (
asmPeriod = '·'
goPeriod = '.'
asmSlash = ''
goSlash = '/'
)
asmPeriodLen := utf8.RuneLen(asmPeriod)
for {
periodIdx := bytes.IndexRune(remaining, asmPeriod)
if periodIdx < 0 {
buf.Write(remaining)
remaining = nil
break
}
// The package name ends at the first rune which cannot be part of a Go
// import path, such as a comma or space.
pkgStart := periodIdx
for pkgStart >= 0 {
c, size := utf8.DecodeLastRune(remaining[:pkgStart])
if !unicode.IsLetter(c) && c != '_' && c != asmSlash && !unicode.IsDigit(c) {
break
}
pkgStart -= size
}
// The package name might actually be longer, e.g:
//
// JMP testwith·many·dotsmainimported·PublicAdd(SB)
//
// We have `testwith` so far; grab `·many·dotsmainimported` as well.
pkgEnd := periodIdx
lastAsmPeriod := -1
for i := pkgEnd + asmPeriodLen; i <= len(remaining); {
c, size := utf8.DecodeRune(remaining[i:])
if c == asmPeriod {
lastAsmPeriod = i
} else if !unicode.IsLetter(c) && c != '_' && c != asmSlash && !unicode.IsDigit(c) {
if lastAsmPeriod > 0 {
pkgEnd = lastAsmPeriod
}
break
}
i += size
}
asmPkgPath := string(remaining[pkgStart:pkgEnd])
// Write the bytes before our unqualified `·foo` or qualified `pkg·foo`.
buf.Write(remaining[:pkgStart])
// If the name was qualified, fetch the package, and write the
// obfuscated import path if needed.
// Note that we don't obfuscate the package path "main".
lpkg := tf.curPkg
if asmPkgPath != "" && asmPkgPath != "main" {
if asmPkgPath != tf.curPkg.Name {
goPkgPath := asmPkgPath
goPkgPath = strings.ReplaceAll(goPkgPath, string(asmPeriod), string(goPeriod))
goPkgPath = strings.ReplaceAll(goPkgPath, string(asmSlash), string(goSlash))
var err error
lpkg, err = listPackage(tf.curPkg, goPkgPath)
if err != nil {
panic(err) // shouldn't happen
}
}
if lpkg.ToObfuscate {
// Note that we don't need to worry about asmSlash here,
// because our obfuscated import paths contain no slashes right now.
buf.WriteString(lpkg.obfuscatedImportPath())
} else {
buf.WriteString(asmPkgPath)
}
}
// Write the middle dot and advance the remaining slice.
buf.WriteRune(asmPeriod)
remaining = remaining[pkgEnd+asmPeriodLen:]
// The declared name ends at the first rune which cannot be part of a Go
// identifier, such as a comma or space.
nameEnd := 0
for nameEnd < len(remaining) {
c, size := utf8.DecodeRune(remaining[nameEnd:])
if !unicode.IsLetter(c) && c != '_' && !unicode.IsDigit(c) {
break
}
nameEnd += size
}
name := string(remaining[:nameEnd])
remaining = remaining[nameEnd:]
avoid breaking intrinsics when obfuscating names We obfuscate import paths as well as their declared names. The compiler treats some packages and APIs in special ways, and the way it detects those is by looking at import paths and names. In the past, we have avoided obfuscating some names like embed.FS or reflect.Value.MethodByName for this reason. Otherwise, go:embed or the linker's deadcode elimination might be broken. This matching by path and name also happens with compiler intrinsics. Intrinsics allow the compiler to rewrite some standard library calls with small and efficient assembly, depending on the target GOARCH. For example, math/bits.TrailingZeros32 gets replaced with ssa.OpCtz32, which on amd64 may result in using the TZCNTL instruction. We never noticed that we were breaking many of these intrinsics. The intrinsics for funcs declared in the runtime and its dependencies still worked properly, as we do not obfuscate those packages yet. However, for other packages like math/bits and sync/atomic, the intrinsics were being entirely disabled due to obfuscated names. Skipping intrinsics is particularly bad for performance, and it also leads to slightly larger binaries: │ old │ new │ │ bin-B │ bin-B vs base │ Build-16 5.450Mi ± ∞ ¹ 5.333Mi ± ∞ ¹ -2.15% (p=0.029 n=4) Finally, the main reason we noticed that intrinsics were broken is that apparently GOARCH=mips fails to link without them, as some symbols end up being not defined at all. This patch fixes builds for the MIPS family of architectures. Rather than building and linking all of std for every GOARCH, test that intrinsics work by asking the compiler to print which intrinsics are being applied, and checking that math/bits gets them. This fix is relatively unfortunate, as it means we stop obfuscating about 120 function names and a handful of package paths. However, fixing builds and intrinsics is much more important. We can figure out better ways to deal with intrinsics in the future. Fixes #646.
1 year ago
if lpkg.ToObfuscate && !compilerIntrinsicsFuncs[lpkg.ImportPath+"."+name] {
newName := hashWithPackage(lpkg, name)
if flagDebug { // TODO(mvdan): remove once https://go.dev/issue/53465 if fixed
log.Printf("asm name %q hashed with %x to %q", name, tf.curPkg.GarbleActionID, newName)
}
buf.WriteString(newName)
} else {
buf.WriteString(name)
}
}
}
// writeSourceFile is a mix between os.CreateTemp and os.WriteFile, as it writes a
avoid reproducibility issues with full rebuilds We were using temporary filenames for modified Go and assembly files. For example, an obfuscated "encoding/json/encode.go" would end up as: /tmp/garble-shared123/encode.go.456.go where "123" and "456" are random numbers, usually longer. This was usually fine for two reasons: 1) We would add "/tmp/garble-shared123/" to -trimpath, so the temporary directory and its random number would be invisible. 2) We would add "//line" directives to the source files, replacing the filename with obfuscated versions excluding any random number. Unfortunately, this broke in multiple ways. Most notably, assembly files do not have any line directives, and it's not clear that there's any support for them. So the random number in their basename could end up in the binary, breaking reproducibility. Another issue is that the -trimpath addition described above was only done for cmd/compile, not cmd/asm, so assembly filenames included the randomized temporary directory. To fix the issues above, the same "encoding/json/encode.go" would now end up as: /tmp/garble-shared123/encoding/json/encode.go Such a path is still unique even though the "456" random number is gone, as import paths are unique within a single build. This fixes issues with the base name of each file, so we no longer rely on line directives as the only way to remove the second original random number. We still rely on -trimpath to get rid of the temporary directory in filenames. To fix its problem with assembly files, also amend the -trimpath flag when running the assembler tool. Finally, add a test that reproducible builds still work when a full rebuild is done. We choose goprivate.txt for such a test as its stdimporter package imports a number of std packages, including uses of assembly and cgo. For the time being, we don't use such a "full rebuild" reproducibility test in other test scripts, as this step is expensive, rebuilding many packages from scratch. This issue went unnoticed for over a year because such random numbers "123" and "456" were created when a package was obfuscated, and that only happened once per package version as long as the build cache was kept intact. When clearing the build cache, or forcing a rebuild with -a, one gets new random numbers, and thus a different binary resulting from the same build input. That's not something that most users would do regularly, and our tests did not cover that edge case either, until now. Fixes #328.
3 years ago
// named source file in sharedTempDir given an input buffer.
//
avoid reproducibility issues with full rebuilds We were using temporary filenames for modified Go and assembly files. For example, an obfuscated "encoding/json/encode.go" would end up as: /tmp/garble-shared123/encode.go.456.go where "123" and "456" are random numbers, usually longer. This was usually fine for two reasons: 1) We would add "/tmp/garble-shared123/" to -trimpath, so the temporary directory and its random number would be invisible. 2) We would add "//line" directives to the source files, replacing the filename with obfuscated versions excluding any random number. Unfortunately, this broke in multiple ways. Most notably, assembly files do not have any line directives, and it's not clear that there's any support for them. So the random number in their basename could end up in the binary, breaking reproducibility. Another issue is that the -trimpath addition described above was only done for cmd/compile, not cmd/asm, so assembly filenames included the randomized temporary directory. To fix the issues above, the same "encoding/json/encode.go" would now end up as: /tmp/garble-shared123/encoding/json/encode.go Such a path is still unique even though the "456" random number is gone, as import paths are unique within a single build. This fixes issues with the base name of each file, so we no longer rely on line directives as the only way to remove the second original random number. We still rely on -trimpath to get rid of the temporary directory in filenames. To fix its problem with assembly files, also amend the -trimpath flag when running the assembler tool. Finally, add a test that reproducible builds still work when a full rebuild is done. We choose goprivate.txt for such a test as its stdimporter package imports a number of std packages, including uses of assembly and cgo. For the time being, we don't use such a "full rebuild" reproducibility test in other test scripts, as this step is expensive, rebuilding many packages from scratch. This issue went unnoticed for over a year because such random numbers "123" and "456" were created when a package was obfuscated, and that only happened once per package version as long as the build cache was kept intact. When clearing the build cache, or forcing a rebuild with -a, one gets new random numbers, and thus a different binary resulting from the same build input. That's not something that most users would do regularly, and our tests did not cover that edge case either, until now. Fixes #328.
3 years ago
// Note that the file is created under a directory tree following curPkg's
// import path, mimicking how files are laid out in modules and GOROOT.
func (tf *transformer) writeSourceFile(basename, obfuscated string, content []byte) (string, error) {
// Uncomment for some quick debugging. Do not delete.
// fmt.Fprintf(os.Stderr, "\n-- %s/%s --\n%s", curPkg.ImportPath, basename, content)
if flagDebugDir != "" {
pkgDir := filepath.Join(flagDebugDir, filepath.FromSlash(tf.curPkg.ImportPath))
if err := os.MkdirAll(pkgDir, 0o755); err != nil {
return "", err
}
dstPath := filepath.Join(pkgDir, basename)
if err := os.WriteFile(dstPath, content, 0o666); err != nil {
return "", err
}
}
// We use the obfuscated import path to hold the temporary files.
// Assembly files do not support line directives to set positions,
// so the only way to not leak the import path is to replace it.
pkgDir := filepath.Join(sharedTempDir, tf.curPkg.obfuscatedImportPath())
avoid reproducibility issues with full rebuilds We were using temporary filenames for modified Go and assembly files. For example, an obfuscated "encoding/json/encode.go" would end up as: /tmp/garble-shared123/encode.go.456.go where "123" and "456" are random numbers, usually longer. This was usually fine for two reasons: 1) We would add "/tmp/garble-shared123/" to -trimpath, so the temporary directory and its random number would be invisible. 2) We would add "//line" directives to the source files, replacing the filename with obfuscated versions excluding any random number. Unfortunately, this broke in multiple ways. Most notably, assembly files do not have any line directives, and it's not clear that there's any support for them. So the random number in their basename could end up in the binary, breaking reproducibility. Another issue is that the -trimpath addition described above was only done for cmd/compile, not cmd/asm, so assembly filenames included the randomized temporary directory. To fix the issues above, the same "encoding/json/encode.go" would now end up as: /tmp/garble-shared123/encoding/json/encode.go Such a path is still unique even though the "456" random number is gone, as import paths are unique within a single build. This fixes issues with the base name of each file, so we no longer rely on line directives as the only way to remove the second original random number. We still rely on -trimpath to get rid of the temporary directory in filenames. To fix its problem with assembly files, also amend the -trimpath flag when running the assembler tool. Finally, add a test that reproducible builds still work when a full rebuild is done. We choose goprivate.txt for such a test as its stdimporter package imports a number of std packages, including uses of assembly and cgo. For the time being, we don't use such a "full rebuild" reproducibility test in other test scripts, as this step is expensive, rebuilding many packages from scratch. This issue went unnoticed for over a year because such random numbers "123" and "456" were created when a package was obfuscated, and that only happened once per package version as long as the build cache was kept intact. When clearing the build cache, or forcing a rebuild with -a, one gets new random numbers, and thus a different binary resulting from the same build input. That's not something that most users would do regularly, and our tests did not cover that edge case either, until now. Fixes #328.
3 years ago
if err := os.MkdirAll(pkgDir, 0o777); err != nil {
return "", err
}
dstPath := filepath.Join(pkgDir, obfuscated)
fail if we are unexpectedly overwriting files (#418) While investigating a bug report, I noticed that garble was writing to the same temp file twice. At best, writing to the same path on disk twice is wasteful, as the design is careful to be deterministic and use unique paths. At worst, the two writes could cause races at the filesystem level. To prevent either of those situations, we now create files with os.OpenFile and os.O_EXCL, meaning that we will error if the file already exists. That change uncovered a number of such unintended cases. First, transformAsm would write obfuscated Go files twice. This is because the Go toolchain actually runs: [...]/asm -gensymabis [...] foo.s bar.s [...]/asm [...] foo.s bar.s That is, the first run is only meant to generate symbol ABIs, which are then used by the compiler. We need to obfuscate at that first stage, because the symbol ABI descriptions need to use obfuscated names. However, having already obfuscated the assembly on the first stage, there is no need to do so again on the second stage. If we detect gensymabis is missing, we simply reuse the previous files. This first situation doesn't seem racy, but obfuscating the Go assembly files twice is certainly unnecessary. Second, saveKnownReflectAPIs wrote a gob file to the build cache. Since the build cache can be kept between builds, and since the build cache uses reproducible paths for each build, running the same "garble build" twice could overwrite those files. This could actually cause races at the filesystem level; if two concurrent builds write to the same gob file on disk, one of them could end up using a partially-written file. Note that this is the only of the three cases not using temporary files. As such, it is expected that the file may already exist. In such a case, we simply avoid overwriting it rather than failing. Third, when "garble build -a" was used, and when we needed an export file not listed in importcfg, we would end up calling roughly: go list -export -toolexec=garble -a <dependency> This meant we would re-build and re-obfuscate those packages. Which is unfortunate, because the parent process already did via: go build -toolexec=garble -a <main> The repeated dependency builds tripped the new os.O_EXCL check, as we would try to overwrite the same obfuscated Go files. Beyond being wasteful, this could again cause subtle filesystem races. To fix the problem, avoid passing flags like "-a" to nested go commands. Overall, we should likely be using safer ways to write to disk, be it via either atomic writes or locked files. However, for now, catching duplicate writes is a big step. I have left a self-assigned TODO for further improvements. CI on the pull request found a failure on test-gotip. The failure reproduces on master, so it seems to be related to gotip, and not a regression introduced by this change. For now, disable test-gotip until we can investigate.
3 years ago
if err := writeFileExclusive(dstPath, content); err != nil {
return "", err
}
avoid reproducibility issues with full rebuilds We were using temporary filenames for modified Go and assembly files. For example, an obfuscated "encoding/json/encode.go" would end up as: /tmp/garble-shared123/encode.go.456.go where "123" and "456" are random numbers, usually longer. This was usually fine for two reasons: 1) We would add "/tmp/garble-shared123/" to -trimpath, so the temporary directory and its random number would be invisible. 2) We would add "//line" directives to the source files, replacing the filename with obfuscated versions excluding any random number. Unfortunately, this broke in multiple ways. Most notably, assembly files do not have any line directives, and it's not clear that there's any support for them. So the random number in their basename could end up in the binary, breaking reproducibility. Another issue is that the -trimpath addition described above was only done for cmd/compile, not cmd/asm, so assembly filenames included the randomized temporary directory. To fix the issues above, the same "encoding/json/encode.go" would now end up as: /tmp/garble-shared123/encoding/json/encode.go Such a path is still unique even though the "456" random number is gone, as import paths are unique within a single build. This fixes issues with the base name of each file, so we no longer rely on line directives as the only way to remove the second original random number. We still rely on -trimpath to get rid of the temporary directory in filenames. To fix its problem with assembly files, also amend the -trimpath flag when running the assembler tool. Finally, add a test that reproducible builds still work when a full rebuild is done. We choose goprivate.txt for such a test as its stdimporter package imports a number of std packages, including uses of assembly and cgo. For the time being, we don't use such a "full rebuild" reproducibility test in other test scripts, as this step is expensive, rebuilding many packages from scratch. This issue went unnoticed for over a year because such random numbers "123" and "456" were created when a package was obfuscated, and that only happened once per package version as long as the build cache was kept intact. When clearing the build cache, or forcing a rebuild with -a, one gets new random numbers, and thus a different binary resulting from the same build input. That's not something that most users would do regularly, and our tests did not cover that edge case either, until now. Fixes #328.
3 years ago
return dstPath, nil
}
// parseFiles parses a list of Go files.
// It supports relative file paths, such as those found in listedPackage.CompiledGoFiles,
// as long as dir is set to listedPackage.Dir.
func parseFiles(dir string, paths []string) ([]*ast.File, error) {
var files []*ast.File
for _, path := range paths {
if !filepath.IsAbs(path) {
path = filepath.Join(dir, path)
}
file, err := parser.ParseFile(fset, path, nil, parser.SkipObjectResolution|parser.ParseComments)
if err != nil {
reimplement import path obfuscation without goobj2 (#242) We used to rely on a parallel implementation of an object file parser and writer to be able to obfuscate import paths. After compiling each package, we would parse the object file, replace the import paths, and write the updated object file in-place. That worked well, in most cases. Unfortunately, it had some flaws: * Complexity. Even when most of the code is maintained in a separate module, the import_obfuscation.go file was still close to a thousand lines of code. * Go compatibility. The object file format changes between Go releases, so we were supporting Go 1.15, but not 1.16. Fixing the object file package to work with 1.16 would probably break 1.15 support. * Bugs. For example, we recently had to add a workaround for #224, since import paths containing dots after the domain would end up escaped. Another example is #190, which seems to be caused by the object file parser or writer corrupting the compiled code and causing segfaults in some rare edge cases. Instead, let's drop that method entirely, and force the compiler and linker to do the work for us. The steps necessary when compiling a package to obfuscate are: 1) Replace its "package foo" lines with the obfuscated package path. No need to separate the package path and name, since the obfuscated path does not contain slashes. 2) Replace the "-p pkg/foo" flag with the obfuscated path. 3) Replace the "import" spec lines with the obfuscated package paths, for those dependencies which were obfuscated. 4) Replace the "-importcfg [...]" file with a version that uses the obfuscated paths instead. The linker also needs that last step, since it also uses an importcfg file to find object files. There are three noteworthy drawbacks to this new method: 1) Since we no longer write object files, we can't use them to store data to be cached. As such, the -debugdir flag goes back to using the "-a" build flag to always rebuild all packages. On the plus side, that caching didn't work very well; see #176. 2) The package name "main" remains in all declarations under it, not just "func main", since we can only rename entire packages. This seems fine, as it gives little information to the end user. 3) The -tiny mode no longer sets all lines to 0, since it did that by modifying object files. As a temporary measure, we instead set all top-level declarations to be on line 1. A TODO is added to hopefully improve this again in the near future. The upside is that we get rid of all the issues mentioned before. Plus, garble now nearly works with Go 1.16, with the exception of two very minor bugs that look fixable. A follow-up PR will take care of that and start testing on 1.16. Fixes #176. Fixes #190.
3 years ago
return nil, err
}
files = append(files, file)
}
return files, nil
}
func (tf *transformer) transformCompile(args []string) ([]string, error) {
flags, paths := splitFlagsFromFiles(args, ".go")
// We will force the linker to drop DWARF via -w, so don't spend time
// generating it.
flags = append(flags, "-dwarf=false")
// The Go file paths given to the compiler are always absolute paths.
files, err := parseFiles("", paths)
if err != nil {
return nil, err
}
// Literal and control flow obfuscation uses math/rand, so seed it deterministically.
randSeed := tf.curPkg.GarbleActionID[:]
if flagSeed.present() {
randSeed = flagSeed.bytes
}
// log.Printf("seeding math/rand with %x\n", randSeed)
tf.obfRand = mathrand.New(mathrand.NewSource(int64(binary.BigEndian.Uint64(randSeed))))
// Even if loadPkgCache below finds a direct cache hit,
replace our caching inside GOCACHE with GARBLE_CACHE For each Go package we obfuscate, we need to store information about how we obfuscated it, which is needed when obfuscating its dependents. For example, if A depends on B to use the type B.Foo, A needs to know whether or not B.Foo was obfuscated; it depends on B's use of reflect. We record this information in a gob file, which is cached on disk. To avoid rolling our own custom cache, and since garble is so closely connected with cmd/go already, we piggybacked off of Go's GOCACHE. In particular, for each build cache entry per `go list`'s Export field, we would store a "garble" sibling file with that gob content. However, this was brittle for two reasons: 1) We were doing this without cmd/go's permission or knowledge. We were careful to use filename suffixes similar to Export files, meaning that `go clean` and other commands would treat them the same. However, this could confuse cmd/go at any point in the future. 2) cmd/go trims cache entries in GOCACHE regularly, to keep the size of the build and test caches under control. Right now, this means that every 24h, any file not accessed in the last five days is deleted. However, that trimming heuristic is done per-file. If the trimming removed Garble's sibling file but not the original Export file, this could cause errors such as "cannot load garble export file" which users already ran into. Instead, start using github.com/rogpeppe/go-internal/cache, an exported copy of cmd/go's own cache implementation for GOCACHE. Since we need an entirely separate directory, we introduce GARBLE_CACHE, defaulting to the "garble" directory inside the user's cache directory. For example, on Linux this would be ~/.cache/garble. Inside GARBLE_CACHE, our gob file cache will be under "build", which helps clarify that this cache is used when obfuscating Go builds, and allows placing other kinds of caches inside GARBLE_CACHE. For example, we already have a need for storing linker binaries, which for now still use their own caching mechanism. This commit does not make our cache properly resistant to removed files. The proof is that our seed.txtar testscript still fails the second case. However, we do rewrite all of our caching logic away from Export files, which in itself is a considerable refactor, and we add a few TODOs. One notable change is how we load gob files from dependencies when building the cache entry for the current package. We used to load the gob files from all packages in the Deps field. However, that is the list of all _transitive_ dependencies. Since these gob files are already flat, meaning they contain information about all of their transitive dependencies as well, we need only load the gob files from the direct dependencies, the Imports field. Performance is largely unchanged, since the behavior is similar. However, the change from Deps to Imports saves us some work, which can be seen in the reduced mallocs per obfuscated build. It's unclear why the binary size isn't stable. When reverting the Deps to Imports change, it then settles at 5.386Mi, which is almost exactly in between the two measurements below. I'm not sure why, but that metric appears to be slightly unstable. goos: linux goarch: amd64 pkg: mvdan.cc/garble cpu: AMD Ryzen 7 PRO 5850U with Radeon Graphics │ old │ new │ │ sec/op │ sec/op vs base │ Build-8 11.09 ± 1% 11.08 ± 1% ~ (p=0.796 n=10) │ old │ new │ │ bin-B │ bin-B vs base │ Build-8 5.390Mi ± 0% 5.382Mi ± 0% -0.14% (p=0.000 n=10) │ old │ new │ │ cached-sec/op │ cached-sec/op vs base │ Build-8 415.5m ± 4% 421.6m ± 1% ~ (p=0.190 n=10) │ old │ new │ │ mallocs/op │ mallocs/op vs base │ Build-8 35.43M ± 0% 34.05M ± 0% -3.89% (p=0.000 n=10) │ old │ new │ │ sys-sec/op │ sys-sec/op vs base │ Build-8 5.662 ± 1% 5.701 ± 2% ~ (p=0.280 n=10)
1 year ago
// other parts of garble still need type information to obfuscate.
// We could potentially avoid this by saving the type info we need in the cache,
// although in general that wouldn't help much, since it's rare for Go's cache
// to miss on a package and for our cache to hit.
if tf.pkg, tf.info, err = typecheck(tf.curPkg.ImportPath, files, tf.origImporter); err != nil {
replace our caching inside GOCACHE with GARBLE_CACHE For each Go package we obfuscate, we need to store information about how we obfuscated it, which is needed when obfuscating its dependents. For example, if A depends on B to use the type B.Foo, A needs to know whether or not B.Foo was obfuscated; it depends on B's use of reflect. We record this information in a gob file, which is cached on disk. To avoid rolling our own custom cache, and since garble is so closely connected with cmd/go already, we piggybacked off of Go's GOCACHE. In particular, for each build cache entry per `go list`'s Export field, we would store a "garble" sibling file with that gob content. However, this was brittle for two reasons: 1) We were doing this without cmd/go's permission or knowledge. We were careful to use filename suffixes similar to Export files, meaning that `go clean` and other commands would treat them the same. However, this could confuse cmd/go at any point in the future. 2) cmd/go trims cache entries in GOCACHE regularly, to keep the size of the build and test caches under control. Right now, this means that every 24h, any file not accessed in the last five days is deleted. However, that trimming heuristic is done per-file. If the trimming removed Garble's sibling file but not the original Export file, this could cause errors such as "cannot load garble export file" which users already ran into. Instead, start using github.com/rogpeppe/go-internal/cache, an exported copy of cmd/go's own cache implementation for GOCACHE. Since we need an entirely separate directory, we introduce GARBLE_CACHE, defaulting to the "garble" directory inside the user's cache directory. For example, on Linux this would be ~/.cache/garble. Inside GARBLE_CACHE, our gob file cache will be under "build", which helps clarify that this cache is used when obfuscating Go builds, and allows placing other kinds of caches inside GARBLE_CACHE. For example, we already have a need for storing linker binaries, which for now still use their own caching mechanism. This commit does not make our cache properly resistant to removed files. The proof is that our seed.txtar testscript still fails the second case. However, we do rewrite all of our caching logic away from Export files, which in itself is a considerable refactor, and we add a few TODOs. One notable change is how we load gob files from dependencies when building the cache entry for the current package. We used to load the gob files from all packages in the Deps field. However, that is the list of all _transitive_ dependencies. Since these gob files are already flat, meaning they contain information about all of their transitive dependencies as well, we need only load the gob files from the direct dependencies, the Imports field. Performance is largely unchanged, since the behavior is similar. However, the change from Deps to Imports saves us some work, which can be seen in the reduced mallocs per obfuscated build. It's unclear why the binary size isn't stable. When reverting the Deps to Imports change, it then settles at 5.386Mi, which is almost exactly in between the two measurements below. I'm not sure why, but that metric appears to be slightly unstable. goos: linux goarch: amd64 pkg: mvdan.cc/garble cpu: AMD Ryzen 7 PRO 5850U with Radeon Graphics │ old │ new │ │ sec/op │ sec/op vs base │ Build-8 11.09 ± 1% 11.08 ± 1% ~ (p=0.796 n=10) │ old │ new │ │ bin-B │ bin-B vs base │ Build-8 5.390Mi ± 0% 5.382Mi ± 0% -0.14% (p=0.000 n=10) │ old │ new │ │ cached-sec/op │ cached-sec/op vs base │ Build-8 415.5m ± 4% 421.6m ± 1% ~ (p=0.190 n=10) │ old │ new │ │ mallocs/op │ mallocs/op vs base │ Build-8 35.43M ± 0% 34.05M ± 0% -3.89% (p=0.000 n=10) │ old │ new │ │ sys-sec/op │ sys-sec/op vs base │ Build-8 5.662 ± 1% 5.701 ± 2% ~ (p=0.280 n=10)
1 year ago
return nil, err
}
var (
ssaPkg *ssa.Package
requiredPkgs []string
)
if flagControlFlow {
ssaPkg = ssaBuildPkg(tf.pkg, files, tf.info)
newFileName, newFile, affectedFiles, err := ctrlflow.Obfuscate(fset, ssaPkg, files, tf.obfRand)
if err != nil {
return nil, err
}
if newFile != nil {
files = append(files, newFile)
paths = append(paths, newFileName)
for _, file := range affectedFiles {
tf.useAllImports(file)
}
if tf.pkg, tf.info, err = typecheck(tf.curPkg.ImportPath, files, tf.origImporter); err != nil {
return nil, err
}
for _, imp := range newFile.Imports {
path, err := strconv.Unquote(imp.Path.Value)
if err != nil {
panic(err) // should never happen
}
requiredPkgs = append(requiredPkgs, path)
}
}
}
if tf.curPkgCache, err = loadPkgCache(tf.curPkg, tf.pkg, files, tf.info, ssaPkg); err != nil {
replace our caching inside GOCACHE with GARBLE_CACHE For each Go package we obfuscate, we need to store information about how we obfuscated it, which is needed when obfuscating its dependents. For example, if A depends on B to use the type B.Foo, A needs to know whether or not B.Foo was obfuscated; it depends on B's use of reflect. We record this information in a gob file, which is cached on disk. To avoid rolling our own custom cache, and since garble is so closely connected with cmd/go already, we piggybacked off of Go's GOCACHE. In particular, for each build cache entry per `go list`'s Export field, we would store a "garble" sibling file with that gob content. However, this was brittle for two reasons: 1) We were doing this without cmd/go's permission or knowledge. We were careful to use filename suffixes similar to Export files, meaning that `go clean` and other commands would treat them the same. However, this could confuse cmd/go at any point in the future. 2) cmd/go trims cache entries in GOCACHE regularly, to keep the size of the build and test caches under control. Right now, this means that every 24h, any file not accessed in the last five days is deleted. However, that trimming heuristic is done per-file. If the trimming removed Garble's sibling file but not the original Export file, this could cause errors such as "cannot load garble export file" which users already ran into. Instead, start using github.com/rogpeppe/go-internal/cache, an exported copy of cmd/go's own cache implementation for GOCACHE. Since we need an entirely separate directory, we introduce GARBLE_CACHE, defaulting to the "garble" directory inside the user's cache directory. For example, on Linux this would be ~/.cache/garble. Inside GARBLE_CACHE, our gob file cache will be under "build", which helps clarify that this cache is used when obfuscating Go builds, and allows placing other kinds of caches inside GARBLE_CACHE. For example, we already have a need for storing linker binaries, which for now still use their own caching mechanism. This commit does not make our cache properly resistant to removed files. The proof is that our seed.txtar testscript still fails the second case. However, we do rewrite all of our caching logic away from Export files, which in itself is a considerable refactor, and we add a few TODOs. One notable change is how we load gob files from dependencies when building the cache entry for the current package. We used to load the gob files from all packages in the Deps field. However, that is the list of all _transitive_ dependencies. Since these gob files are already flat, meaning they contain information about all of their transitive dependencies as well, we need only load the gob files from the direct dependencies, the Imports field. Performance is largely unchanged, since the behavior is similar. However, the change from Deps to Imports saves us some work, which can be seen in the reduced mallocs per obfuscated build. It's unclear why the binary size isn't stable. When reverting the Deps to Imports change, it then settles at 5.386Mi, which is almost exactly in between the two measurements below. I'm not sure why, but that metric appears to be slightly unstable. goos: linux goarch: amd64 pkg: mvdan.cc/garble cpu: AMD Ryzen 7 PRO 5850U with Radeon Graphics │ old │ new │ │ sec/op │ sec/op vs base │ Build-8 11.09 ± 1% 11.08 ± 1% ~ (p=0.796 n=10) │ old │ new │ │ bin-B │ bin-B vs base │ Build-8 5.390Mi ± 0% 5.382Mi ± 0% -0.14% (p=0.000 n=10) │ old │ new │ │ cached-sec/op │ cached-sec/op vs base │ Build-8 415.5m ± 4% 421.6m ± 1% ~ (p=0.190 n=10) │ old │ new │ │ mallocs/op │ mallocs/op vs base │ Build-8 35.43M ± 0% 34.05M ± 0% -3.89% (p=0.000 n=10) │ old │ new │ │ sys-sec/op │ sys-sec/op vs base │ Build-8 5.662 ± 1% 5.701 ± 2% ~ (p=0.280 n=10)
1 year ago
return nil, err
}
// These maps are not kept in pkgCache, since they are only needed to obfuscate curPkg.
tf.fieldToStruct = computeFieldToStruct(tf.info)
if flagLiterals {
if tf.linkerVariableStrings, err = computeLinkerVariableStrings(tf.pkg); err != nil {
return nil, err
}
}
flags = alterTrimpath(flags)
newImportCfg, err := tf.processImportCfg(flags, requiredPkgs)
if err != nil {
return nil, err
}
// If this is a package to obfuscate, swap the -p flag with the new package path.
// We don't if it's the main package, as that just uses "-p main".
// We only set newPkgPath if we're obfuscating the import path,
// to replace the original package name in the package clause below.
refactor "current package" with TOOLEXEC_IMPORTPATH (#266) Now that we've dropped support for Go 1.15.x, we can finally rely on this environment variable for toolexec calls, present in Go 1.16. Before, we had hacky ways of trying to figure out the current package's import path, mostly from the -p flag. The biggest rough edge there was that, for main packages, that was simply the package name, and not its full import path. To work around that, we had a restriction on a single main package, so we could work around that issue. That restriction is now gone. The new code is simpler, especially because we can set curPkg in a single place for all toolexec transform funcs. Since we can always rely on curPkg not being nil now, we can also start reusing listedPackage.Private and avoid the majority of repeated calls to isPrivate. The function is cheap, but still not free. isPrivate itself can also get simpler. We no longer have to worry about the "main" edge case. Plus, the sanity check for invalid package paths is now unnecessary; we only got malformed paths from goobj2, and we now require exact matches with the ImportPath field from "go list -json". Another effect of clearing up the "main" edge case is that -debugdir now uses the right directory for main packages. We also start using consistent debugdir paths in the tests, for the sake of being easier to read and maintain. Finally, note that commandReverse did not need the extra call to "go list -toolexec", as the "shared" call stored in the cache is enough. We still call toolexecCmd to get said cache, which should probably be simplified in a future PR. While at it, replace the use of the "-std" compiler flag with the Standard field from "go list -json".
3 years ago
newPkgPath := ""
if tf.curPkg.Name != "main" && tf.curPkg.ToObfuscate {
newPkgPath = tf.curPkg.obfuscatedImportPath()
reimplement import path obfuscation without goobj2 (#242) We used to rely on a parallel implementation of an object file parser and writer to be able to obfuscate import paths. After compiling each package, we would parse the object file, replace the import paths, and write the updated object file in-place. That worked well, in most cases. Unfortunately, it had some flaws: * Complexity. Even when most of the code is maintained in a separate module, the import_obfuscation.go file was still close to a thousand lines of code. * Go compatibility. The object file format changes between Go releases, so we were supporting Go 1.15, but not 1.16. Fixing the object file package to work with 1.16 would probably break 1.15 support. * Bugs. For example, we recently had to add a workaround for #224, since import paths containing dots after the domain would end up escaped. Another example is #190, which seems to be caused by the object file parser or writer corrupting the compiled code and causing segfaults in some rare edge cases. Instead, let's drop that method entirely, and force the compiler and linker to do the work for us. The steps necessary when compiling a package to obfuscate are: 1) Replace its "package foo" lines with the obfuscated package path. No need to separate the package path and name, since the obfuscated path does not contain slashes. 2) Replace the "-p pkg/foo" flag with the obfuscated path. 3) Replace the "import" spec lines with the obfuscated package paths, for those dependencies which were obfuscated. 4) Replace the "-importcfg [...]" file with a version that uses the obfuscated paths instead. The linker also needs that last step, since it also uses an importcfg file to find object files. There are three noteworthy drawbacks to this new method: 1) Since we no longer write object files, we can't use them to store data to be cached. As such, the -debugdir flag goes back to using the "-a" build flag to always rebuild all packages. On the plus side, that caching didn't work very well; see #176. 2) The package name "main" remains in all declarations under it, not just "func main", since we can only rename entire packages. This seems fine, as it gives little information to the end user. 3) The -tiny mode no longer sets all lines to 0, since it did that by modifying object files. As a temporary measure, we instead set all top-level declarations to be on line 1. A TODO is added to hopefully improve this again in the near future. The upside is that we get rid of all the issues mentioned before. Plus, garble now nearly works with Go 1.16, with the exception of two very minor bugs that look fixable. A follow-up PR will take care of that and start testing on 1.16. Fixes #176. Fixes #190.
3 years ago
flags = flagSetValue(flags, "-p", newPkgPath)
}
newPaths := make([]string, 0, len(files))
for i, file := range files {
basename := filepath.Base(paths[i])
log.Printf("obfuscating %s", basename)
if tf.curPkg.ImportPath == "runtime" {
if flagTiny {
// strip unneeded runtime code
stripRuntime(basename, file)
tf.useAllImports(file)
}
if basename == "symtab.go" {
updateMagicValue(file, magicValue())
updateEntryOffset(file, entryOffKey())
}
}
tf.transformDirectives(file.Comments)
file = tf.transformGoFile(file)
avoid breaking intrinsics when obfuscating names We obfuscate import paths as well as their declared names. The compiler treats some packages and APIs in special ways, and the way it detects those is by looking at import paths and names. In the past, we have avoided obfuscating some names like embed.FS or reflect.Value.MethodByName for this reason. Otherwise, go:embed or the linker's deadcode elimination might be broken. This matching by path and name also happens with compiler intrinsics. Intrinsics allow the compiler to rewrite some standard library calls with small and efficient assembly, depending on the target GOARCH. For example, math/bits.TrailingZeros32 gets replaced with ssa.OpCtz32, which on amd64 may result in using the TZCNTL instruction. We never noticed that we were breaking many of these intrinsics. The intrinsics for funcs declared in the runtime and its dependencies still worked properly, as we do not obfuscate those packages yet. However, for other packages like math/bits and sync/atomic, the intrinsics were being entirely disabled due to obfuscated names. Skipping intrinsics is particularly bad for performance, and it also leads to slightly larger binaries: │ old │ new │ │ bin-B │ bin-B vs base │ Build-16 5.450Mi ± ∞ ¹ 5.333Mi ± ∞ ¹ -2.15% (p=0.029 n=4) Finally, the main reason we noticed that intrinsics were broken is that apparently GOARCH=mips fails to link without them, as some symbols end up being not defined at all. This patch fixes builds for the MIPS family of architectures. Rather than building and linking all of std for every GOARCH, test that intrinsics work by asking the compiler to print which intrinsics are being applied, and checking that math/bits gets them. This fix is relatively unfortunate, as it means we stop obfuscating about 120 function names and a handful of package paths. However, fixing builds and intrinsics is much more important. We can figure out better ways to deal with intrinsics in the future. Fixes #646.
1 year ago
// newPkgPath might be the original ImportPath in some edge cases like
// compilerIntrinsics; we don't want to use slashes in package names.
// TODO: when we do away with those edge cases, only check the string is
// non-empty.
if newPkgPath != "" && newPkgPath != tf.curPkg.ImportPath {
reimplement import path obfuscation without goobj2 (#242) We used to rely on a parallel implementation of an object file parser and writer to be able to obfuscate import paths. After compiling each package, we would parse the object file, replace the import paths, and write the updated object file in-place. That worked well, in most cases. Unfortunately, it had some flaws: * Complexity. Even when most of the code is maintained in a separate module, the import_obfuscation.go file was still close to a thousand lines of code. * Go compatibility. The object file format changes between Go releases, so we were supporting Go 1.15, but not 1.16. Fixing the object file package to work with 1.16 would probably break 1.15 support. * Bugs. For example, we recently had to add a workaround for #224, since import paths containing dots after the domain would end up escaped. Another example is #190, which seems to be caused by the object file parser or writer corrupting the compiled code and causing segfaults in some rare edge cases. Instead, let's drop that method entirely, and force the compiler and linker to do the work for us. The steps necessary when compiling a package to obfuscate are: 1) Replace its "package foo" lines with the obfuscated package path. No need to separate the package path and name, since the obfuscated path does not contain slashes. 2) Replace the "-p pkg/foo" flag with the obfuscated path. 3) Replace the "import" spec lines with the obfuscated package paths, for those dependencies which were obfuscated. 4) Replace the "-importcfg [...]" file with a version that uses the obfuscated paths instead. The linker also needs that last step, since it also uses an importcfg file to find object files. There are three noteworthy drawbacks to this new method: 1) Since we no longer write object files, we can't use them to store data to be cached. As such, the -debugdir flag goes back to using the "-a" build flag to always rebuild all packages. On the plus side, that caching didn't work very well; see #176. 2) The package name "main" remains in all declarations under it, not just "func main", since we can only rename entire packages. This seems fine, as it gives little information to the end user. 3) The -tiny mode no longer sets all lines to 0, since it did that by modifying object files. As a temporary measure, we instead set all top-level declarations to be on line 1. A TODO is added to hopefully improve this again in the near future. The upside is that we get rid of all the issues mentioned before. Plus, garble now nearly works with Go 1.16, with the exception of two very minor bugs that look fixable. A follow-up PR will take care of that and start testing on 1.16. Fixes #176. Fixes #190.
3 years ago
file.Name.Name = newPkgPath
}
src, err := printFile(tf.curPkg, file)
if err != nil {
return nil, err
}
// We hide Go source filenames via "//line" directives,
// so there is no need to use obfuscated filenames here.
if path, err := tf.writeSourceFile(basename, basename, src); err != nil {
reimplement import path obfuscation without goobj2 (#242) We used to rely on a parallel implementation of an object file parser and writer to be able to obfuscate import paths. After compiling each package, we would parse the object file, replace the import paths, and write the updated object file in-place. That worked well, in most cases. Unfortunately, it had some flaws: * Complexity. Even when most of the code is maintained in a separate module, the import_obfuscation.go file was still close to a thousand lines of code. * Go compatibility. The object file format changes between Go releases, so we were supporting Go 1.15, but not 1.16. Fixing the object file package to work with 1.16 would probably break 1.15 support. * Bugs. For example, we recently had to add a workaround for #224, since import paths containing dots after the domain would end up escaped. Another example is #190, which seems to be caused by the object file parser or writer corrupting the compiled code and causing segfaults in some rare edge cases. Instead, let's drop that method entirely, and force the compiler and linker to do the work for us. The steps necessary when compiling a package to obfuscate are: 1) Replace its "package foo" lines with the obfuscated package path. No need to separate the package path and name, since the obfuscated path does not contain slashes. 2) Replace the "-p pkg/foo" flag with the obfuscated path. 3) Replace the "import" spec lines with the obfuscated package paths, for those dependencies which were obfuscated. 4) Replace the "-importcfg [...]" file with a version that uses the obfuscated paths instead. The linker also needs that last step, since it also uses an importcfg file to find object files. There are three noteworthy drawbacks to this new method: 1) Since we no longer write object files, we can't use them to store data to be cached. As such, the -debugdir flag goes back to using the "-a" build flag to always rebuild all packages. On the plus side, that caching didn't work very well; see #176. 2) The package name "main" remains in all declarations under it, not just "func main", since we can only rename entire packages. This seems fine, as it gives little information to the end user. 3) The -tiny mode no longer sets all lines to 0, since it did that by modifying object files. As a temporary measure, we instead set all top-level declarations to be on line 1. A TODO is added to hopefully improve this again in the near future. The upside is that we get rid of all the issues mentioned before. Plus, garble now nearly works with Go 1.16, with the exception of two very minor bugs that look fixable. A follow-up PR will take care of that and start testing on 1.16. Fixes #176. Fixes #190.
3 years ago
return nil, err
} else {
newPaths = append(newPaths, path)
}
}
reimplement import path obfuscation without goobj2 (#242) We used to rely on a parallel implementation of an object file parser and writer to be able to obfuscate import paths. After compiling each package, we would parse the object file, replace the import paths, and write the updated object file in-place. That worked well, in most cases. Unfortunately, it had some flaws: * Complexity. Even when most of the code is maintained in a separate module, the import_obfuscation.go file was still close to a thousand lines of code. * Go compatibility. The object file format changes between Go releases, so we were supporting Go 1.15, but not 1.16. Fixing the object file package to work with 1.16 would probably break 1.15 support. * Bugs. For example, we recently had to add a workaround for #224, since import paths containing dots after the domain would end up escaped. Another example is #190, which seems to be caused by the object file parser or writer corrupting the compiled code and causing segfaults in some rare edge cases. Instead, let's drop that method entirely, and force the compiler and linker to do the work for us. The steps necessary when compiling a package to obfuscate are: 1) Replace its "package foo" lines with the obfuscated package path. No need to separate the package path and name, since the obfuscated path does not contain slashes. 2) Replace the "-p pkg/foo" flag with the obfuscated path. 3) Replace the "import" spec lines with the obfuscated package paths, for those dependencies which were obfuscated. 4) Replace the "-importcfg [...]" file with a version that uses the obfuscated paths instead. The linker also needs that last step, since it also uses an importcfg file to find object files. There are three noteworthy drawbacks to this new method: 1) Since we no longer write object files, we can't use them to store data to be cached. As such, the -debugdir flag goes back to using the "-a" build flag to always rebuild all packages. On the plus side, that caching didn't work very well; see #176. 2) The package name "main" remains in all declarations under it, not just "func main", since we can only rename entire packages. This seems fine, as it gives little information to the end user. 3) The -tiny mode no longer sets all lines to 0, since it did that by modifying object files. As a temporary measure, we instead set all top-level declarations to be on line 1. A TODO is added to hopefully improve this again in the near future. The upside is that we get rid of all the issues mentioned before. Plus, garble now nearly works with Go 1.16, with the exception of two very minor bugs that look fixable. A follow-up PR will take care of that and start testing on 1.16. Fixes #176. Fixes #190.
3 years ago
flags = flagSetValue(flags, "-importcfg", newImportCfg)
reimplement import path obfuscation without goobj2 (#242) We used to rely on a parallel implementation of an object file parser and writer to be able to obfuscate import paths. After compiling each package, we would parse the object file, replace the import paths, and write the updated object file in-place. That worked well, in most cases. Unfortunately, it had some flaws: * Complexity. Even when most of the code is maintained in a separate module, the import_obfuscation.go file was still close to a thousand lines of code. * Go compatibility. The object file format changes between Go releases, so we were supporting Go 1.15, but not 1.16. Fixing the object file package to work with 1.16 would probably break 1.15 support. * Bugs. For example, we recently had to add a workaround for #224, since import paths containing dots after the domain would end up escaped. Another example is #190, which seems to be caused by the object file parser or writer corrupting the compiled code and causing segfaults in some rare edge cases. Instead, let's drop that method entirely, and force the compiler and linker to do the work for us. The steps necessary when compiling a package to obfuscate are: 1) Replace its "package foo" lines with the obfuscated package path. No need to separate the package path and name, since the obfuscated path does not contain slashes. 2) Replace the "-p pkg/foo" flag with the obfuscated path. 3) Replace the "import" spec lines with the obfuscated package paths, for those dependencies which were obfuscated. 4) Replace the "-importcfg [...]" file with a version that uses the obfuscated paths instead. The linker also needs that last step, since it also uses an importcfg file to find object files. There are three noteworthy drawbacks to this new method: 1) Since we no longer write object files, we can't use them to store data to be cached. As such, the -debugdir flag goes back to using the "-a" build flag to always rebuild all packages. On the plus side, that caching didn't work very well; see #176. 2) The package name "main" remains in all declarations under it, not just "func main", since we can only rename entire packages. This seems fine, as it gives little information to the end user. 3) The -tiny mode no longer sets all lines to 0, since it did that by modifying object files. As a temporary measure, we instead set all top-level declarations to be on line 1. A TODO is added to hopefully improve this again in the near future. The upside is that we get rid of all the issues mentioned before. Plus, garble now nearly works with Go 1.16, with the exception of two very minor bugs that look fixable. A follow-up PR will take care of that and start testing on 1.16. Fixes #176. Fixes #190.
3 years ago
return append(flags, newPaths...), nil
}
// transformDirectives rewrites //go:linkname toolchain directives in comments
// to replace names with their obfuscated versions.
func (tf *transformer) transformDirectives(comments []*ast.CommentGroup) {
for _, group := range comments {
for _, comment := range group.List {
if !strings.HasPrefix(comment.Text, "//go:linkname ") {
continue
}
// We can have either just one argument:
//
// //go:linkname localName
//
// Or two arguments, where the second may refer to a name in a
// different package:
//
// //go:linkname localName newName
// //go:linkname localName pkg.newName
fields := strings.Fields(comment.Text)
localName := fields[1]
newName := ""
if len(fields) == 3 {
newName = fields[2]
}
localName, newName = tf.transformLinkname(localName, newName)
fields[1] = localName
if len(fields) == 3 {
fields[2] = newName
}
if flagDebug { // TODO(mvdan): remove once https://go.dev/issue/53465 if fixed
log.Printf("linkname %q changed to %q", comment.Text, strings.Join(fields, " "))
ensure the runtime is built in a reproducible way We went to great lengths to ensure garble builds are reproducible. This includes how the tool itself works, as its behavior should be the same given the same inputs. However, we made one crucial mistake with the runtime package. It has go:linkname directives pointing at other packages, and some of those pointed packages aren't its dependencies. Imagine two scenarios where garble builds the runtime package: 1) We run "garble build runtime". The way we handle linkname directives calls listPackage on the target package, to obfuscate the target's import path and object name. However, since we only obtained build info of runtime and its deps, calls for some linknames such as listPackage("sync/atomic") will fail. The linkname directive will leave its target untouched. 2) We run "garble build std". Unlike the first scenario, all listPackage calls issued by runtime's linkname directives will succeed, so its linkname directive targets will be obfuscated. At best, this can result in inconsistent builds, depending on how the runtime package was built. At worst, the mismatching object names can result in errors at link time, if the target packages are actually used. The modified test reproduces the worst case scenario reliably, when the fix is reverted: > env GOCACHE=${WORK}/gocache-empty > garble build -a runtime > garble build -o=out_rebuild ./stdimporter [stderr] # test/main/stdimporter JZzQivnl.NtQJu0H3: relocation target JZzQivnl.iioHinYT not defined JZzQivnl.NtQJu0H3.func9: relocation target JZzQivnl.yz5z0NaH not defined JZzQivnl.(*ypvqhKiQ).String: relocation target JZzQivnl.eVciBQeI not defined JZzQivnl.(*ypvqhKiQ).PkgPath: relocation target JZzQivnl.eVciBQeI not defined [...] The fix consists of two steps. First, if we're building the runtime and listPackage fails on a package, that means we ran into scenario 1 above. To avoid the inconsistency, we fill ListedPackages with "go list [...] std". This means we'll always build runtime as described in scenario 2 above. Second, when building packages other than the runtime, we only allow listPackage to succeed if we're listing a dependency of the current package. This ensures we won't run into similar reproducibility bugs in the future. Finally, re-enable test-gotip on CI since this was the last test flake.
2 years ago
}
comment.Text = strings.Join(fields, " ")
}
}
}
func (tf *transformer) transformLinkname(localName, newName string) (string, string) {
// obfuscate the local name, if the current package is obfuscated
if tf.curPkg.ToObfuscate && !compilerIntrinsicsFuncs[tf.curPkg.ImportPath+"."+localName] {
localName = hashWithPackage(tf.curPkg, localName)
}
if newName == "" {
return localName, ""
}
// If the new name is of the form "pkgpath.Name", and we've obfuscated
// "Name" in that package, rewrite the directive to use the obfuscated name.
dotCnt := strings.Count(newName, ".")
if dotCnt < 1 {
// cgo-generated code uses linknames to made up symbol names,
// which do not have a package path at all.
// Replace the comment in case the local name was obfuscated.
return localName, newName
}
switch newName {
case "main.main", "main..inittask", "runtime..inittask":
// The runtime uses some special symbols with "..".
// We aren't touching those at the moment.
return localName, newName
}
support go:linkname directives pointing at methods This is not common, but it is done by a few projects. Namely, github.com/goccy/go-json reached into reflect's guts, which included a number of methods: internal/runtime/rtype.go 11://go:linkname rtype_Align reflect.(*rtype).Align 19://go:linkname rtype_FieldAlign reflect.(*rtype).FieldAlign 27://go:linkname rtype_Method reflect.(*rtype).Method 35://go:linkname rtype_MethodByName reflect.(*rtype).MethodByName [...] Add tests for such go:linkname directives pointing at methods. Note that there are two possible symbol string variants; "pkg/path.(*Receiver).method" for methods with pointer receivers, and "pkg/path.Receiver.method" for the rest. We can't assume that the presence of two dots means a method either. For example, a package path may be "pkg/path.with.dots", and so "pkg/path.with.dots.SomeFunc" is the function "SomeFunc" rather than the method "SomeFunc" on a type "dots". To account for this ambiguity, rather than splitting on the last dot like we used to, try to find a package path prefix by splitting on an increasing number of first dots. This can in theory still be ambiguous. For example, we could have the package "pkg/path" expose the method "foo.bar", and the package "pkg/path.foo" expose the func "bar". Then, the symbol string "pkg/path.foo.bar" could mean either of them. However, this seems extremely unlikely to happen in practice, and I'm not sure that Go's toolchain would support it either. I also noticed that goccy/go-json still failed to build after the fix. The reason was that the type reflect.rtype wasn't being obfuscated. We could, and likely should, teach our assembly and linkname transformers about which names we chose not to obfuscate due to the use of reflection. However, in this particular case, reflect's own types can be obfuscated safely, so just do that. Fixes #656.
1 year ago
pkgSplit := 0
var lpkg *listedPackage
var foreignName string
for {
i := strings.Index(newName[pkgSplit:], ".")
if i < 0 {
// We couldn't find a prefix that matched a known package.
// Probably a made up name like above, but with a dot.
return localName, newName
}
support go:linkname directives pointing at methods This is not common, but it is done by a few projects. Namely, github.com/goccy/go-json reached into reflect's guts, which included a number of methods: internal/runtime/rtype.go 11://go:linkname rtype_Align reflect.(*rtype).Align 19://go:linkname rtype_FieldAlign reflect.(*rtype).FieldAlign 27://go:linkname rtype_Method reflect.(*rtype).Method 35://go:linkname rtype_MethodByName reflect.(*rtype).MethodByName [...] Add tests for such go:linkname directives pointing at methods. Note that there are two possible symbol string variants; "pkg/path.(*Receiver).method" for methods with pointer receivers, and "pkg/path.Receiver.method" for the rest. We can't assume that the presence of two dots means a method either. For example, a package path may be "pkg/path.with.dots", and so "pkg/path.with.dots.SomeFunc" is the function "SomeFunc" rather than the method "SomeFunc" on a type "dots". To account for this ambiguity, rather than splitting on the last dot like we used to, try to find a package path prefix by splitting on an increasing number of first dots. This can in theory still be ambiguous. For example, we could have the package "pkg/path" expose the method "foo.bar", and the package "pkg/path.foo" expose the func "bar". Then, the symbol string "pkg/path.foo.bar" could mean either of them. However, this seems extremely unlikely to happen in practice, and I'm not sure that Go's toolchain would support it either. I also noticed that goccy/go-json still failed to build after the fix. The reason was that the type reflect.rtype wasn't being obfuscated. We could, and likely should, teach our assembly and linkname transformers about which names we chose not to obfuscate due to the use of reflection. However, in this particular case, reflect's own types can be obfuscated safely, so just do that. Fixes #656.
1 year ago
pkgSplit += i
pkgPath := newName[:pkgSplit]
pkgSplit++ // skip over the dot
if strings.HasSuffix(pkgPath, "_test") {
// runtime uses a go:linkname to metrics_test;
// we don't need this to work for now on regular builds,
// though we might need to rethink this if we want "go test std" to work.
continue
}
support go:linkname directives pointing at methods This is not common, but it is done by a few projects. Namely, github.com/goccy/go-json reached into reflect's guts, which included a number of methods: internal/runtime/rtype.go 11://go:linkname rtype_Align reflect.(*rtype).Align 19://go:linkname rtype_FieldAlign reflect.(*rtype).FieldAlign 27://go:linkname rtype_Method reflect.(*rtype).Method 35://go:linkname rtype_MethodByName reflect.(*rtype).MethodByName [...] Add tests for such go:linkname directives pointing at methods. Note that there are two possible symbol string variants; "pkg/path.(*Receiver).method" for methods with pointer receivers, and "pkg/path.Receiver.method" for the rest. We can't assume that the presence of two dots means a method either. For example, a package path may be "pkg/path.with.dots", and so "pkg/path.with.dots.SomeFunc" is the function "SomeFunc" rather than the method "SomeFunc" on a type "dots". To account for this ambiguity, rather than splitting on the last dot like we used to, try to find a package path prefix by splitting on an increasing number of first dots. This can in theory still be ambiguous. For example, we could have the package "pkg/path" expose the method "foo.bar", and the package "pkg/path.foo" expose the func "bar". Then, the symbol string "pkg/path.foo.bar" could mean either of them. However, this seems extremely unlikely to happen in practice, and I'm not sure that Go's toolchain would support it either. I also noticed that goccy/go-json still failed to build after the fix. The reason was that the type reflect.rtype wasn't being obfuscated. We could, and likely should, teach our assembly and linkname transformers about which names we chose not to obfuscate due to the use of reflection. However, in this particular case, reflect's own types can be obfuscated safely, so just do that. Fixes #656.
1 year ago
var err error
lpkg, err = listPackage(tf.curPkg, pkgPath)
support go:linkname directives pointing at methods This is not common, but it is done by a few projects. Namely, github.com/goccy/go-json reached into reflect's guts, which included a number of methods: internal/runtime/rtype.go 11://go:linkname rtype_Align reflect.(*rtype).Align 19://go:linkname rtype_FieldAlign reflect.(*rtype).FieldAlign 27://go:linkname rtype_Method reflect.(*rtype).Method 35://go:linkname rtype_MethodByName reflect.(*rtype).MethodByName [...] Add tests for such go:linkname directives pointing at methods. Note that there are two possible symbol string variants; "pkg/path.(*Receiver).method" for methods with pointer receivers, and "pkg/path.Receiver.method" for the rest. We can't assume that the presence of two dots means a method either. For example, a package path may be "pkg/path.with.dots", and so "pkg/path.with.dots.SomeFunc" is the function "SomeFunc" rather than the method "SomeFunc" on a type "dots". To account for this ambiguity, rather than splitting on the last dot like we used to, try to find a package path prefix by splitting on an increasing number of first dots. This can in theory still be ambiguous. For example, we could have the package "pkg/path" expose the method "foo.bar", and the package "pkg/path.foo" expose the func "bar". Then, the symbol string "pkg/path.foo.bar" could mean either of them. However, this seems extremely unlikely to happen in practice, and I'm not sure that Go's toolchain would support it either. I also noticed that goccy/go-json still failed to build after the fix. The reason was that the type reflect.rtype wasn't being obfuscated. We could, and likely should, teach our assembly and linkname transformers about which names we chose not to obfuscate due to the use of reflection. However, in this particular case, reflect's own types can be obfuscated safely, so just do that. Fixes #656.
1 year ago
if err == nil {
foreignName = newName[pkgSplit:]
break
}
if errors.Is(err, ErrNotFound) {
// No match; find the next dot.
continue
}
if errors.Is(err, ErrNotDependency) {
fmt.Fprintf(os.Stderr,
support go:linkname directives pointing at methods This is not common, but it is done by a few projects. Namely, github.com/goccy/go-json reached into reflect's guts, which included a number of methods: internal/runtime/rtype.go 11://go:linkname rtype_Align reflect.(*rtype).Align 19://go:linkname rtype_FieldAlign reflect.(*rtype).FieldAlign 27://go:linkname rtype_Method reflect.(*rtype).Method 35://go:linkname rtype_MethodByName reflect.(*rtype).MethodByName [...] Add tests for such go:linkname directives pointing at methods. Note that there are two possible symbol string variants; "pkg/path.(*Receiver).method" for methods with pointer receivers, and "pkg/path.Receiver.method" for the rest. We can't assume that the presence of two dots means a method either. For example, a package path may be "pkg/path.with.dots", and so "pkg/path.with.dots.SomeFunc" is the function "SomeFunc" rather than the method "SomeFunc" on a type "dots". To account for this ambiguity, rather than splitting on the last dot like we used to, try to find a package path prefix by splitting on an increasing number of first dots. This can in theory still be ambiguous. For example, we could have the package "pkg/path" expose the method "foo.bar", and the package "pkg/path.foo" expose the func "bar". Then, the symbol string "pkg/path.foo.bar" could mean either of them. However, this seems extremely unlikely to happen in practice, and I'm not sure that Go's toolchain would support it either. I also noticed that goccy/go-json still failed to build after the fix. The reason was that the type reflect.rtype wasn't being obfuscated. We could, and likely should, teach our assembly and linkname transformers about which names we chose not to obfuscate due to the use of reflection. However, in this particular case, reflect's own types can be obfuscated safely, so just do that. Fixes #656.
1 year ago
"//go:linkname refers to %s - add `import _ %q` for garble to find the package",
newName, pkgPath)
return localName, newName
}
panic(err) // shouldn't happen
}
support go:linkname directives pointing at methods This is not common, but it is done by a few projects. Namely, github.com/goccy/go-json reached into reflect's guts, which included a number of methods: internal/runtime/rtype.go 11://go:linkname rtype_Align reflect.(*rtype).Align 19://go:linkname rtype_FieldAlign reflect.(*rtype).FieldAlign 27://go:linkname rtype_Method reflect.(*rtype).Method 35://go:linkname rtype_MethodByName reflect.(*rtype).MethodByName [...] Add tests for such go:linkname directives pointing at methods. Note that there are two possible symbol string variants; "pkg/path.(*Receiver).method" for methods with pointer receivers, and "pkg/path.Receiver.method" for the rest. We can't assume that the presence of two dots means a method either. For example, a package path may be "pkg/path.with.dots", and so "pkg/path.with.dots.SomeFunc" is the function "SomeFunc" rather than the method "SomeFunc" on a type "dots". To account for this ambiguity, rather than splitting on the last dot like we used to, try to find a package path prefix by splitting on an increasing number of first dots. This can in theory still be ambiguous. For example, we could have the package "pkg/path" expose the method "foo.bar", and the package "pkg/path.foo" expose the func "bar". Then, the symbol string "pkg/path.foo.bar" could mean either of them. However, this seems extremely unlikely to happen in practice, and I'm not sure that Go's toolchain would support it either. I also noticed that goccy/go-json still failed to build after the fix. The reason was that the type reflect.rtype wasn't being obfuscated. We could, and likely should, teach our assembly and linkname transformers about which names we chose not to obfuscate due to the use of reflection. However, in this particular case, reflect's own types can be obfuscated safely, so just do that. Fixes #656.
1 year ago
if !lpkg.ToObfuscate || compilerIntrinsicsFuncs[lpkg.ImportPath+"."+foreignName] {
// We're not obfuscating that package or name.
return localName, newName
}
var newForeignName string
if receiver, name, ok := strings.Cut(foreignName, "."); ok {
if lpkg.ImportPath == "reflect" && (receiver == "(*rtype)" || receiver == "Value") {
// These receivers are not obfuscated.
// See the TODO below.
} else if strings.HasPrefix(receiver, "(*") {
support go:linkname directives pointing at methods This is not common, but it is done by a few projects. Namely, github.com/goccy/go-json reached into reflect's guts, which included a number of methods: internal/runtime/rtype.go 11://go:linkname rtype_Align reflect.(*rtype).Align 19://go:linkname rtype_FieldAlign reflect.(*rtype).FieldAlign 27://go:linkname rtype_Method reflect.(*rtype).Method 35://go:linkname rtype_MethodByName reflect.(*rtype).MethodByName [...] Add tests for such go:linkname directives pointing at methods. Note that there are two possible symbol string variants; "pkg/path.(*Receiver).method" for methods with pointer receivers, and "pkg/path.Receiver.method" for the rest. We can't assume that the presence of two dots means a method either. For example, a package path may be "pkg/path.with.dots", and so "pkg/path.with.dots.SomeFunc" is the function "SomeFunc" rather than the method "SomeFunc" on a type "dots". To account for this ambiguity, rather than splitting on the last dot like we used to, try to find a package path prefix by splitting on an increasing number of first dots. This can in theory still be ambiguous. For example, we could have the package "pkg/path" expose the method "foo.bar", and the package "pkg/path.foo" expose the func "bar". Then, the symbol string "pkg/path.foo.bar" could mean either of them. However, this seems extremely unlikely to happen in practice, and I'm not sure that Go's toolchain would support it either. I also noticed that goccy/go-json still failed to build after the fix. The reason was that the type reflect.rtype wasn't being obfuscated. We could, and likely should, teach our assembly and linkname transformers about which names we chose not to obfuscate due to the use of reflection. However, in this particular case, reflect's own types can be obfuscated safely, so just do that. Fixes #656.
1 year ago
// pkg/path.(*Receiver).method
receiver = strings.TrimPrefix(receiver, "(*")
receiver = strings.TrimSuffix(receiver, ")")
receiver = "(*" + hashWithPackage(lpkg, receiver) + ")"
} else {
// pkg/path.Receiver.method
receiver = hashWithPackage(lpkg, receiver)
reimplement import path obfuscation without goobj2 (#242) We used to rely on a parallel implementation of an object file parser and writer to be able to obfuscate import paths. After compiling each package, we would parse the object file, replace the import paths, and write the updated object file in-place. That worked well, in most cases. Unfortunately, it had some flaws: * Complexity. Even when most of the code is maintained in a separate module, the import_obfuscation.go file was still close to a thousand lines of code. * Go compatibility. The object file format changes between Go releases, so we were supporting Go 1.15, but not 1.16. Fixing the object file package to work with 1.16 would probably break 1.15 support. * Bugs. For example, we recently had to add a workaround for #224, since import paths containing dots after the domain would end up escaped. Another example is #190, which seems to be caused by the object file parser or writer corrupting the compiled code and causing segfaults in some rare edge cases. Instead, let's drop that method entirely, and force the compiler and linker to do the work for us. The steps necessary when compiling a package to obfuscate are: 1) Replace its "package foo" lines with the obfuscated package path. No need to separate the package path and name, since the obfuscated path does not contain slashes. 2) Replace the "-p pkg/foo" flag with the obfuscated path. 3) Replace the "import" spec lines with the obfuscated package paths, for those dependencies which were obfuscated. 4) Replace the "-importcfg [...]" file with a version that uses the obfuscated paths instead. The linker also needs that last step, since it also uses an importcfg file to find object files. There are three noteworthy drawbacks to this new method: 1) Since we no longer write object files, we can't use them to store data to be cached. As such, the -debugdir flag goes back to using the "-a" build flag to always rebuild all packages. On the plus side, that caching didn't work very well; see #176. 2) The package name "main" remains in all declarations under it, not just "func main", since we can only rename entire packages. This seems fine, as it gives little information to the end user. 3) The -tiny mode no longer sets all lines to 0, since it did that by modifying object files. As a temporary measure, we instead set all top-level declarations to be on line 1. A TODO is added to hopefully improve this again in the near future. The upside is that we get rid of all the issues mentioned before. Plus, garble now nearly works with Go 1.16, with the exception of two very minor bugs that look fixable. A follow-up PR will take care of that and start testing on 1.16. Fixes #176. Fixes #190.
3 years ago
}
support go:linkname directives pointing at methods This is not common, but it is done by a few projects. Namely, github.com/goccy/go-json reached into reflect's guts, which included a number of methods: internal/runtime/rtype.go 11://go:linkname rtype_Align reflect.(*rtype).Align 19://go:linkname rtype_FieldAlign reflect.(*rtype).FieldAlign 27://go:linkname rtype_Method reflect.(*rtype).Method 35://go:linkname rtype_MethodByName reflect.(*rtype).MethodByName [...] Add tests for such go:linkname directives pointing at methods. Note that there are two possible symbol string variants; "pkg/path.(*Receiver).method" for methods with pointer receivers, and "pkg/path.Receiver.method" for the rest. We can't assume that the presence of two dots means a method either. For example, a package path may be "pkg/path.with.dots", and so "pkg/path.with.dots.SomeFunc" is the function "SomeFunc" rather than the method "SomeFunc" on a type "dots". To account for this ambiguity, rather than splitting on the last dot like we used to, try to find a package path prefix by splitting on an increasing number of first dots. This can in theory still be ambiguous. For example, we could have the package "pkg/path" expose the method "foo.bar", and the package "pkg/path.foo" expose the func "bar". Then, the symbol string "pkg/path.foo.bar" could mean either of them. However, this seems extremely unlikely to happen in practice, and I'm not sure that Go's toolchain would support it either. I also noticed that goccy/go-json still failed to build after the fix. The reason was that the type reflect.rtype wasn't being obfuscated. We could, and likely should, teach our assembly and linkname transformers about which names we chose not to obfuscate due to the use of reflection. However, in this particular case, reflect's own types can be obfuscated safely, so just do that. Fixes #656.
1 year ago
// Exported methods are never obfuscated.
//
// TODO(mvdan): We're duplicating the logic behind these decisions.
// Reuse the logic with transformCompile.
support go:linkname directives pointing at methods This is not common, but it is done by a few projects. Namely, github.com/goccy/go-json reached into reflect's guts, which included a number of methods: internal/runtime/rtype.go 11://go:linkname rtype_Align reflect.(*rtype).Align 19://go:linkname rtype_FieldAlign reflect.(*rtype).FieldAlign 27://go:linkname rtype_Method reflect.(*rtype).Method 35://go:linkname rtype_MethodByName reflect.(*rtype).MethodByName [...] Add tests for such go:linkname directives pointing at methods. Note that there are two possible symbol string variants; "pkg/path.(*Receiver).method" for methods with pointer receivers, and "pkg/path.Receiver.method" for the rest. We can't assume that the presence of two dots means a method either. For example, a package path may be "pkg/path.with.dots", and so "pkg/path.with.dots.SomeFunc" is the function "SomeFunc" rather than the method "SomeFunc" on a type "dots". To account for this ambiguity, rather than splitting on the last dot like we used to, try to find a package path prefix by splitting on an increasing number of first dots. This can in theory still be ambiguous. For example, we could have the package "pkg/path" expose the method "foo.bar", and the package "pkg/path.foo" expose the func "bar". Then, the symbol string "pkg/path.foo.bar" could mean either of them. However, this seems extremely unlikely to happen in practice, and I'm not sure that Go's toolchain would support it either. I also noticed that goccy/go-json still failed to build after the fix. The reason was that the type reflect.rtype wasn't being obfuscated. We could, and likely should, teach our assembly and linkname transformers about which names we chose not to obfuscate due to the use of reflection. However, in this particular case, reflect's own types can be obfuscated safely, so just do that. Fixes #656.
1 year ago
if !token.IsExported(name) {
name = hashWithPackage(lpkg, name)
}
newForeignName = receiver + "." + name
} else {
// pkg/path.function
newForeignName = hashWithPackage(lpkg, foreignName)
}
newPkgPath := lpkg.ImportPath
if newPkgPath != "main" {
newPkgPath = lpkg.obfuscatedImportPath()
}
support go:linkname directives pointing at methods This is not common, but it is done by a few projects. Namely, github.com/goccy/go-json reached into reflect's guts, which included a number of methods: internal/runtime/rtype.go 11://go:linkname rtype_Align reflect.(*rtype).Align 19://go:linkname rtype_FieldAlign reflect.(*rtype).FieldAlign 27://go:linkname rtype_Method reflect.(*rtype).Method 35://go:linkname rtype_MethodByName reflect.(*rtype).MethodByName [...] Add tests for such go:linkname directives pointing at methods. Note that there are two possible symbol string variants; "pkg/path.(*Receiver).method" for methods with pointer receivers, and "pkg/path.Receiver.method" for the rest. We can't assume that the presence of two dots means a method either. For example, a package path may be "pkg/path.with.dots", and so "pkg/path.with.dots.SomeFunc" is the function "SomeFunc" rather than the method "SomeFunc" on a type "dots". To account for this ambiguity, rather than splitting on the last dot like we used to, try to find a package path prefix by splitting on an increasing number of first dots. This can in theory still be ambiguous. For example, we could have the package "pkg/path" expose the method "foo.bar", and the package "pkg/path.foo" expose the func "bar". Then, the symbol string "pkg/path.foo.bar" could mean either of them. However, this seems extremely unlikely to happen in practice, and I'm not sure that Go's toolchain would support it either. I also noticed that goccy/go-json still failed to build after the fix. The reason was that the type reflect.rtype wasn't being obfuscated. We could, and likely should, teach our assembly and linkname transformers about which names we chose not to obfuscate due to the use of reflection. However, in this particular case, reflect's own types can be obfuscated safely, so just do that. Fixes #656.
1 year ago
newName = newPkgPath + "." + newForeignName
return localName, newName
}
stop loading obfuscated type information from deps If package P1 imports package P2, P1 needs to know which names from P2 weren't obfuscated. For instance, if P2 declares T2 and does "reflect.TypeOf(T2{...})", then P2 won't obfuscate the name T2, and neither should P1. This information should flow from P2 to P1, as P2 builds before P1. We do this via obfuscatedTypesPackage; P1 loads the type information of the obfuscated version of P2, and does a lookup for T2. If T2 exists, then it wasn't obfuscated. This mechanism has served us well, but it has downsides: 1) It wastes CPU; we load the type information for the entire package. 2) It's complex; for instance, we need KnownObjectFiles as an extra. 3) It makes our code harder to understand, as we load both the original and obfuscated type informaiton. Instead, we now have each package record what names were not obfuscated as part of its cachedOuput file. Much like KnownObjectFiles, the map records incrementally through the import graph, to avoid having to load cachedOutput files for indirect dependencies. We shouldn't need to worry about those maps getting large; we only skip obfuscating declared names in a few uncommon scenarios, such as the use of reflection or cgo's "//export". Since go/types is relatively allocation-heavy, and the export files contain a lot of data, we get a nice speed-up: name old time/op new time/op delta Build-16 11.5s ± 2% 11.1s ± 3% -3.77% (p=0.008 n=5+5) name old bin-B new bin-B delta Build-16 5.15M ± 0% 5.15M ± 0% ~ (all equal) name old cached-time/op new cached-time/op delta Build-16 375ms ± 3% 341ms ± 6% -8.96% (p=0.008 n=5+5) name old sys-time/op new sys-time/op delta Build-16 283ms ±17% 289ms ±13% ~ (p=0.841 n=5+5) name old user-time/op new user-time/op delta Build-16 687ms ± 6% 664ms ± 7% ~ (p=0.548 n=5+5) Fixes #456. Updates #475.
2 years ago
// processImportCfg parses the importcfg file passed to a compile or link step.
// It also builds a new importcfg file to account for obfuscated import paths.
func (tf *transformer) processImportCfg(flags []string, requiredPkgs []string) (newImportCfg string, _ error) {
start using original action IDs (#251) When we obfuscate a name, what we do is hash the name with the action ID of the package that contains the name. To ensure that the hash changes if the garble tool changes, we used the action ID of the obfuscated build, which is different than the original action ID, as we include garble's own content ID in "go tool compile -V=full" via -toolexec. Let's call that the "obfuscated action ID". Remember that a content ID is roughly the hash of a binary or object file, and an action ID contains the hash of a package's source code plus the content IDs of its dependencies. This had the advantage that it did what we wanted. However, it had one massive drawback: when we compile a package, we only have the obfuscated action IDs of its dependencies. This is because one can't have the content ID of dependent packages before they are built. Usually, this is not a problem, because hashing a foreign name means it comes from a dependency, where we already have the obfuscated action ID. However, that's not always the case. First, go:linkname directives can point to any symbol that ends up in the binary, even if the package is not a dependency. So garble could only support linkname targets belonging to dependencies. This is at the root of why we could not obfuscate the runtime; it contains linkname directives targeting the net package, for example, which depends on runtime. Second, some other places did not have an easy access to obfuscated action IDs, like transformAsm, which had to recover it from a temporary file stored by transformCompile. Plus, this was all pretty expensive, as each toolexec sub-process had to make repeated calls to buildidOf with the object files of dependencies. We even had to use extra calls to "go list" in the case of indirect dependencies, as their export files do not appear in importcfg files. All in all, the old method was complex and expensive. A better mechanism is to use the original action IDs directly, as listed by "go list" without garble in the picture. This would mean that the hashing does not change if garble changes, meaning weaker obfuscation. To regain that property, we define the "garble action ID", which is just the original action ID hashed together with garble's own content ID. This is practically the same as the obfuscated build ID we used before, but since it doesn't go through "go tool compile -V=full" and the obfuscated build itself, we can work out *all* the garble action IDs upfront, before the obfuscated build even starts. This fixes all of our problems. Now we know all garble build IDs upfront, so a bunch of hacks can be entirely removed. Plus, since we know them upfront, we can also cache them and avoid repeated calls to "go tool buildid". While at it, make use of the new BuildID field in Go 1.16's "list -json -export". This avoids the vast majority of "go tool buildid" calls, as the only ones that remain are 2 on the garble binary itself. The numbers for Go 1.16 look very good: name old time/op new time/op delta Build-8 146ms ± 4% 101ms ± 1% -31.01% (p=0.002 n=6+6) name old bin-B new bin-B delta Build-8 6.61M ± 0% 6.60M ± 0% -0.09% (p=0.002 n=6+6) name old sys-time/op new sys-time/op delta Build-8 321ms ± 7% 202ms ± 6% -37.11% (p=0.002 n=6+6) name old user-time/op new user-time/op delta Build-8 538ms ± 4% 414ms ± 4% -23.12% (p=0.002 n=6+6)
3 years ago
importCfg := flagValue(flags, "-importcfg")
if importCfg == "" {
reimplement import path obfuscation without goobj2 (#242) We used to rely on a parallel implementation of an object file parser and writer to be able to obfuscate import paths. After compiling each package, we would parse the object file, replace the import paths, and write the updated object file in-place. That worked well, in most cases. Unfortunately, it had some flaws: * Complexity. Even when most of the code is maintained in a separate module, the import_obfuscation.go file was still close to a thousand lines of code. * Go compatibility. The object file format changes between Go releases, so we were supporting Go 1.15, but not 1.16. Fixing the object file package to work with 1.16 would probably break 1.15 support. * Bugs. For example, we recently had to add a workaround for #224, since import paths containing dots after the domain would end up escaped. Another example is #190, which seems to be caused by the object file parser or writer corrupting the compiled code and causing segfaults in some rare edge cases. Instead, let's drop that method entirely, and force the compiler and linker to do the work for us. The steps necessary when compiling a package to obfuscate are: 1) Replace its "package foo" lines with the obfuscated package path. No need to separate the package path and name, since the obfuscated path does not contain slashes. 2) Replace the "-p pkg/foo" flag with the obfuscated path. 3) Replace the "import" spec lines with the obfuscated package paths, for those dependencies which were obfuscated. 4) Replace the "-importcfg [...]" file with a version that uses the obfuscated paths instead. The linker also needs that last step, since it also uses an importcfg file to find object files. There are three noteworthy drawbacks to this new method: 1) Since we no longer write object files, we can't use them to store data to be cached. As such, the -debugdir flag goes back to using the "-a" build flag to always rebuild all packages. On the plus side, that caching didn't work very well; see #176. 2) The package name "main" remains in all declarations under it, not just "func main", since we can only rename entire packages. This seems fine, as it gives little information to the end user. 3) The -tiny mode no longer sets all lines to 0, since it did that by modifying object files. As a temporary measure, we instead set all top-level declarations to be on line 1. A TODO is added to hopefully improve this again in the near future. The upside is that we get rid of all the issues mentioned before. Plus, garble now nearly works with Go 1.16, with the exception of two very minor bugs that look fixable. A follow-up PR will take care of that and start testing on 1.16. Fixes #176. Fixes #190.
3 years ago
return "", fmt.Errorf("could not find -importcfg argument")
}
data, err := os.ReadFile(importCfg)
if err != nil {
reimplement import path obfuscation without goobj2 (#242) We used to rely on a parallel implementation of an object file parser and writer to be able to obfuscate import paths. After compiling each package, we would parse the object file, replace the import paths, and write the updated object file in-place. That worked well, in most cases. Unfortunately, it had some flaws: * Complexity. Even when most of the code is maintained in a separate module, the import_obfuscation.go file was still close to a thousand lines of code. * Go compatibility. The object file format changes between Go releases, so we were supporting Go 1.15, but not 1.16. Fixing the object file package to work with 1.16 would probably break 1.15 support. * Bugs. For example, we recently had to add a workaround for #224, since import paths containing dots after the domain would end up escaped. Another example is #190, which seems to be caused by the object file parser or writer corrupting the compiled code and causing segfaults in some rare edge cases. Instead, let's drop that method entirely, and force the compiler and linker to do the work for us. The steps necessary when compiling a package to obfuscate are: 1) Replace its "package foo" lines with the obfuscated package path. No need to separate the package path and name, since the obfuscated path does not contain slashes. 2) Replace the "-p pkg/foo" flag with the obfuscated path. 3) Replace the "import" spec lines with the obfuscated package paths, for those dependencies which were obfuscated. 4) Replace the "-importcfg [...]" file with a version that uses the obfuscated paths instead. The linker also needs that last step, since it also uses an importcfg file to find object files. There are three noteworthy drawbacks to this new method: 1) Since we no longer write object files, we can't use them to store data to be cached. As such, the -debugdir flag goes back to using the "-a" build flag to always rebuild all packages. On the plus side, that caching didn't work very well; see #176. 2) The package name "main" remains in all declarations under it, not just "func main", since we can only rename entire packages. This seems fine, as it gives little information to the end user. 3) The -tiny mode no longer sets all lines to 0, since it did that by modifying object files. As a temporary measure, we instead set all top-level declarations to be on line 1. A TODO is added to hopefully improve this again in the near future. The upside is that we get rid of all the issues mentioned before. Plus, garble now nearly works with Go 1.16, with the exception of two very minor bugs that look fixable. A follow-up PR will take care of that and start testing on 1.16. Fixes #176. Fixes #190.
3 years ago
return "", err
}
var packagefiles, importmaps [][2]string
// using for track required but not imported packages
var newIndirectImports map[string]bool
if requiredPkgs != nil {
newIndirectImports = make(map[string]bool)
for _, pkg := range requiredPkgs {
newIndirectImports[pkg] = true
}
}
for _, line := range strings.Split(string(data), "\n") {
if line == "" || strings.HasPrefix(line, "#") {
continue
}
verb, args, found := strings.Cut(line, " ")
if !found {
continue
}
switch verb {
case "importmap":
beforePath, afterPath, found := strings.Cut(args, "=")
if !found {
continue
}
importmaps = append(importmaps, [2]string{beforePath, afterPath})
case "packagefile":
importPath, objectPath, found := strings.Cut(args, "=")
if !found {
continue
}
packagefiles = append(packagefiles, [2]string{importPath, objectPath})
delete(newIndirectImports, importPath)
}
}
reimplement import path obfuscation without goobj2 (#242) We used to rely on a parallel implementation of an object file parser and writer to be able to obfuscate import paths. After compiling each package, we would parse the object file, replace the import paths, and write the updated object file in-place. That worked well, in most cases. Unfortunately, it had some flaws: * Complexity. Even when most of the code is maintained in a separate module, the import_obfuscation.go file was still close to a thousand lines of code. * Go compatibility. The object file format changes between Go releases, so we were supporting Go 1.15, but not 1.16. Fixing the object file package to work with 1.16 would probably break 1.15 support. * Bugs. For example, we recently had to add a workaround for #224, since import paths containing dots after the domain would end up escaped. Another example is #190, which seems to be caused by the object file parser or writer corrupting the compiled code and causing segfaults in some rare edge cases. Instead, let's drop that method entirely, and force the compiler and linker to do the work for us. The steps necessary when compiling a package to obfuscate are: 1) Replace its "package foo" lines with the obfuscated package path. No need to separate the package path and name, since the obfuscated path does not contain slashes. 2) Replace the "-p pkg/foo" flag with the obfuscated path. 3) Replace the "import" spec lines with the obfuscated package paths, for those dependencies which were obfuscated. 4) Replace the "-importcfg [...]" file with a version that uses the obfuscated paths instead. The linker also needs that last step, since it also uses an importcfg file to find object files. There are three noteworthy drawbacks to this new method: 1) Since we no longer write object files, we can't use them to store data to be cached. As such, the -debugdir flag goes back to using the "-a" build flag to always rebuild all packages. On the plus side, that caching didn't work very well; see #176. 2) The package name "main" remains in all declarations under it, not just "func main", since we can only rename entire packages. This seems fine, as it gives little information to the end user. 3) The -tiny mode no longer sets all lines to 0, since it did that by modifying object files. As a temporary measure, we instead set all top-level declarations to be on line 1. A TODO is added to hopefully improve this again in the near future. The upside is that we get rid of all the issues mentioned before. Plus, garble now nearly works with Go 1.16, with the exception of two very minor bugs that look fixable. A follow-up PR will take care of that and start testing on 1.16. Fixes #176. Fixes #190.
3 years ago
// Produce the modified importcfg file.
// This is mainly replacing the obfuscated paths.
// Note that we range over maps, so this is non-deterministic, but that
// should not matter as the file is treated like a lookup table.
newCfg, err := os.CreateTemp(sharedTempDir, "importcfg")
reimplement import path obfuscation without goobj2 (#242) We used to rely on a parallel implementation of an object file parser and writer to be able to obfuscate import paths. After compiling each package, we would parse the object file, replace the import paths, and write the updated object file in-place. That worked well, in most cases. Unfortunately, it had some flaws: * Complexity. Even when most of the code is maintained in a separate module, the import_obfuscation.go file was still close to a thousand lines of code. * Go compatibility. The object file format changes between Go releases, so we were supporting Go 1.15, but not 1.16. Fixing the object file package to work with 1.16 would probably break 1.15 support. * Bugs. For example, we recently had to add a workaround for #224, since import paths containing dots after the domain would end up escaped. Another example is #190, which seems to be caused by the object file parser or writer corrupting the compiled code and causing segfaults in some rare edge cases. Instead, let's drop that method entirely, and force the compiler and linker to do the work for us. The steps necessary when compiling a package to obfuscate are: 1) Replace its "package foo" lines with the obfuscated package path. No need to separate the package path and name, since the obfuscated path does not contain slashes. 2) Replace the "-p pkg/foo" flag with the obfuscated path. 3) Replace the "import" spec lines with the obfuscated package paths, for those dependencies which were obfuscated. 4) Replace the "-importcfg [...]" file with a version that uses the obfuscated paths instead. The linker also needs that last step, since it also uses an importcfg file to find object files. There are three noteworthy drawbacks to this new method: 1) Since we no longer write object files, we can't use them to store data to be cached. As such, the -debugdir flag goes back to using the "-a" build flag to always rebuild all packages. On the plus side, that caching didn't work very well; see #176. 2) The package name "main" remains in all declarations under it, not just "func main", since we can only rename entire packages. This seems fine, as it gives little information to the end user. 3) The -tiny mode no longer sets all lines to 0, since it did that by modifying object files. As a temporary measure, we instead set all top-level declarations to be on line 1. A TODO is added to hopefully improve this again in the near future. The upside is that we get rid of all the issues mentioned before. Plus, garble now nearly works with Go 1.16, with the exception of two very minor bugs that look fixable. A follow-up PR will take care of that and start testing on 1.16. Fixes #176. Fixes #190.
3 years ago
if err != nil {
return "", err
}
for _, pair := range importmaps {
beforePath, afterPath := pair[0], pair[1]
lpkg, err := listPackage(tf.curPkg, beforePath)
if err != nil {
return "", err
}
if lpkg.ToObfuscate {
// Note that beforePath is not the canonical path.
// For beforePath="vendor/foo", afterPath and
// lpkg.ImportPath can be just "foo".
// Don't use obfuscatedImportPath here.
beforePath = hashWithPackage(lpkg, beforePath)
afterPath = lpkg.obfuscatedImportPath()
reimplement import path obfuscation without goobj2 (#242) We used to rely on a parallel implementation of an object file parser and writer to be able to obfuscate import paths. After compiling each package, we would parse the object file, replace the import paths, and write the updated object file in-place. That worked well, in most cases. Unfortunately, it had some flaws: * Complexity. Even when most of the code is maintained in a separate module, the import_obfuscation.go file was still close to a thousand lines of code. * Go compatibility. The object file format changes between Go releases, so we were supporting Go 1.15, but not 1.16. Fixing the object file package to work with 1.16 would probably break 1.15 support. * Bugs. For example, we recently had to add a workaround for #224, since import paths containing dots after the domain would end up escaped. Another example is #190, which seems to be caused by the object file parser or writer corrupting the compiled code and causing segfaults in some rare edge cases. Instead, let's drop that method entirely, and force the compiler and linker to do the work for us. The steps necessary when compiling a package to obfuscate are: 1) Replace its "package foo" lines with the obfuscated package path. No need to separate the package path and name, since the obfuscated path does not contain slashes. 2) Replace the "-p pkg/foo" flag with the obfuscated path. 3) Replace the "import" spec lines with the obfuscated package paths, for those dependencies which were obfuscated. 4) Replace the "-importcfg [...]" file with a version that uses the obfuscated paths instead. The linker also needs that last step, since it also uses an importcfg file to find object files. There are three noteworthy drawbacks to this new method: 1) Since we no longer write object files, we can't use them to store data to be cached. As such, the -debugdir flag goes back to using the "-a" build flag to always rebuild all packages. On the plus side, that caching didn't work very well; see #176. 2) The package name "main" remains in all declarations under it, not just "func main", since we can only rename entire packages. This seems fine, as it gives little information to the end user. 3) The -tiny mode no longer sets all lines to 0, since it did that by modifying object files. As a temporary measure, we instead set all top-level declarations to be on line 1. A TODO is added to hopefully improve this again in the near future. The upside is that we get rid of all the issues mentioned before. Plus, garble now nearly works with Go 1.16, with the exception of two very minor bugs that look fixable. A follow-up PR will take care of that and start testing on 1.16. Fixes #176. Fixes #190.
3 years ago
}
fmt.Fprintf(newCfg, "importmap %s=%s\n", beforePath, afterPath)
}
if len(newIndirectImports) > 0 {
f, err := os.Open(filepath.Join(sharedTempDir, actionGraphFileName))
if err != nil {
return "", fmt.Errorf("cannot open action graph file: %v", err)
}
defer f.Close()
var actions []struct {
Mode string
Package string
Objdir string
}
if err := json.NewDecoder(f).Decode(&actions); err != nil {
return "", fmt.Errorf("cannot parse action graph file: %v", err)
}
// theoretically action graph can be long, to optimise it process it in one pass
// with an early exit when all the required imports are found
for _, action := range actions {
if action.Mode != "build" {
continue
}
if ok := newIndirectImports[action.Package]; !ok {
continue
}
packagefiles = append(packagefiles, [2]string{action.Package, filepath.Join(action.Objdir, "_pkg_.a")}) // file name hardcoded in compiler
delete(newIndirectImports, action.Package)
if len(newIndirectImports) == 0 {
break
}
}
if len(newIndirectImports) > 0 {
return "", fmt.Errorf("cannot resolve required packages from action graph file: %v", requiredPkgs)
}
}
for _, pair := range packagefiles {
impPath, pkgfile := pair[0], pair[1]
lpkg, err := listPackage(tf.curPkg, impPath)
if err != nil {
// TODO: it's unclear why an importcfg can include an import path
// that's not a dependency in an edge case with "go test ./...".
// See exporttest/*.go in testdata/scripts/test.txt.
// For now, spot the pattern and avoid the unnecessary error;
// the dependency is unused, so the packagefile line is redundant.
// This still triggers as of go1.21.
if strings.HasSuffix(tf.curPkg.ImportPath, ".test]") && strings.HasPrefix(tf.curPkg.ImportPath, impPath) {
continue
}
return "", err
}
if lpkg.Name != "main" {
impPath = lpkg.obfuscatedImportPath()
reimplement import path obfuscation without goobj2 (#242) We used to rely on a parallel implementation of an object file parser and writer to be able to obfuscate import paths. After compiling each package, we would parse the object file, replace the import paths, and write the updated object file in-place. That worked well, in most cases. Unfortunately, it had some flaws: * Complexity. Even when most of the code is maintained in a separate module, the import_obfuscation.go file was still close to a thousand lines of code. * Go compatibility. The object file format changes between Go releases, so we were supporting Go 1.15, but not 1.16. Fixing the object file package to work with 1.16 would probably break 1.15 support. * Bugs. For example, we recently had to add a workaround for #224, since import paths containing dots after the domain would end up escaped. Another example is #190, which seems to be caused by the object file parser or writer corrupting the compiled code and causing segfaults in some rare edge cases. Instead, let's drop that method entirely, and force the compiler and linker to do the work for us. The steps necessary when compiling a package to obfuscate are: 1) Replace its "package foo" lines with the obfuscated package path. No need to separate the package path and name, since the obfuscated path does not contain slashes. 2) Replace the "-p pkg/foo" flag with the obfuscated path. 3) Replace the "import" spec lines with the obfuscated package paths, for those dependencies which were obfuscated. 4) Replace the "-importcfg [...]" file with a version that uses the obfuscated paths instead. The linker also needs that last step, since it also uses an importcfg file to find object files. There are three noteworthy drawbacks to this new method: 1) Since we no longer write object files, we can't use them to store data to be cached. As such, the -debugdir flag goes back to using the "-a" build flag to always rebuild all packages. On the plus side, that caching didn't work very well; see #176. 2) The package name "main" remains in all declarations under it, not just "func main", since we can only rename entire packages. This seems fine, as it gives little information to the end user. 3) The -tiny mode no longer sets all lines to 0, since it did that by modifying object files. As a temporary measure, we instead set all top-level declarations to be on line 1. A TODO is added to hopefully improve this again in the near future. The upside is that we get rid of all the issues mentioned before. Plus, garble now nearly works with Go 1.16, with the exception of two very minor bugs that look fixable. A follow-up PR will take care of that and start testing on 1.16. Fixes #176. Fixes #190.
3 years ago
}
fmt.Fprintf(newCfg, "packagefile %s=%s\n", impPath, pkgfile)
reimplement import path obfuscation without goobj2 (#242) We used to rely on a parallel implementation of an object file parser and writer to be able to obfuscate import paths. After compiling each package, we would parse the object file, replace the import paths, and write the updated object file in-place. That worked well, in most cases. Unfortunately, it had some flaws: * Complexity. Even when most of the code is maintained in a separate module, the import_obfuscation.go file was still close to a thousand lines of code. * Go compatibility. The object file format changes between Go releases, so we were supporting Go 1.15, but not 1.16. Fixing the object file package to work with 1.16 would probably break 1.15 support. * Bugs. For example, we recently had to add a workaround for #224, since import paths containing dots after the domain would end up escaped. Another example is #190, which seems to be caused by the object file parser or writer corrupting the compiled code and causing segfaults in some rare edge cases. Instead, let's drop that method entirely, and force the compiler and linker to do the work for us. The steps necessary when compiling a package to obfuscate are: 1) Replace its "package foo" lines with the obfuscated package path. No need to separate the package path and name, since the obfuscated path does not contain slashes. 2) Replace the "-p pkg/foo" flag with the obfuscated path. 3) Replace the "import" spec lines with the obfuscated package paths, for those dependencies which were obfuscated. 4) Replace the "-importcfg [...]" file with a version that uses the obfuscated paths instead. The linker also needs that last step, since it also uses an importcfg file to find object files. There are three noteworthy drawbacks to this new method: 1) Since we no longer write object files, we can't use them to store data to be cached. As such, the -debugdir flag goes back to using the "-a" build flag to always rebuild all packages. On the plus side, that caching didn't work very well; see #176. 2) The package name "main" remains in all declarations under it, not just "func main", since we can only rename entire packages. This seems fine, as it gives little information to the end user. 3) The -tiny mode no longer sets all lines to 0, since it did that by modifying object files. As a temporary measure, we instead set all top-level declarations to be on line 1. A TODO is added to hopefully improve this again in the near future. The upside is that we get rid of all the issues mentioned before. Plus, garble now nearly works with Go 1.16, with the exception of two very minor bugs that look fixable. A follow-up PR will take care of that and start testing on 1.16. Fixes #176. Fixes #190.
3 years ago
}
// Uncomment to debug the transformed importcfg. Do not delete.
// newCfg.Seek(0, 0)
// io.Copy(os.Stderr, newCfg)
reimplement import path obfuscation without goobj2 (#242) We used to rely on a parallel implementation of an object file parser and writer to be able to obfuscate import paths. After compiling each package, we would parse the object file, replace the import paths, and write the updated object file in-place. That worked well, in most cases. Unfortunately, it had some flaws: * Complexity. Even when most of the code is maintained in a separate module, the import_obfuscation.go file was still close to a thousand lines of code. * Go compatibility. The object file format changes between Go releases, so we were supporting Go 1.15, but not 1.16. Fixing the object file package to work with 1.16 would probably break 1.15 support. * Bugs. For example, we recently had to add a workaround for #224, since import paths containing dots after the domain would end up escaped. Another example is #190, which seems to be caused by the object file parser or writer corrupting the compiled code and causing segfaults in some rare edge cases. Instead, let's drop that method entirely, and force the compiler and linker to do the work for us. The steps necessary when compiling a package to obfuscate are: 1) Replace its "package foo" lines with the obfuscated package path. No need to separate the package path and name, since the obfuscated path does not contain slashes. 2) Replace the "-p pkg/foo" flag with the obfuscated path. 3) Replace the "import" spec lines with the obfuscated package paths, for those dependencies which were obfuscated. 4) Replace the "-importcfg [...]" file with a version that uses the obfuscated paths instead. The linker also needs that last step, since it also uses an importcfg file to find object files. There are three noteworthy drawbacks to this new method: 1) Since we no longer write object files, we can't use them to store data to be cached. As such, the -debugdir flag goes back to using the "-a" build flag to always rebuild all packages. On the plus side, that caching didn't work very well; see #176. 2) The package name "main" remains in all declarations under it, not just "func main", since we can only rename entire packages. This seems fine, as it gives little information to the end user. 3) The -tiny mode no longer sets all lines to 0, since it did that by modifying object files. As a temporary measure, we instead set all top-level declarations to be on line 1. A TODO is added to hopefully improve this again in the near future. The upside is that we get rid of all the issues mentioned before. Plus, garble now nearly works with Go 1.16, with the exception of two very minor bugs that look fixable. A follow-up PR will take care of that and start testing on 1.16. Fixes #176. Fixes #190.
3 years ago
if err := newCfg.Close(); err != nil {
return "", err
}
return newCfg.Name(), nil
}
type (
properly record when type aliases are embedded as fields There are two scenarios when it comes to embedding fields. The first is easy, and we always handled it well: type Named struct { Foo int } type T struct { Named } In this scenario, T ends up with an embedded field named "Named", and a promoted field named "Foo". Then there's the form with a type alias: type Named struct { Foo int } type Alias = Named type T struct { Alias } This case is different: T ends up with an embedded field named "Alias", and a promoted field named "Foo". Note how the field gets its name from the referenced type, even if said type is just an alias to another type. This poses two problems. First, we must obfuscate the field T.Alias as the name "Alias", and not as the name "Named" that the alias points to. Second, we must be careful of cases where Named and Alias are declared in different packages, as they will obfuscate the same name differently. Both of those problems compounded in the reported issue. The actual reason is that quic-go has a type alias in the form of: type ConnectionState = qtls.ConnectionState In other words, the entire problem boils down to a type alias which points to a named type in a different package, where both types share the same name. For example: package parent import "parent/p1" type T struct { p1.SameName } [...] package p1 import "parent/p2" type SameName = p2.SameName [...] package p2 type SameName struct { Foo int } This broke garble because we had a heuristic to detect when an embedded field was a type alias: // Instead, detect such a "foreign alias embed". // If we embed a final named type, // but the field name does not match its name, // then it must have been done via an alias. // We dig out the alias's TypeName via locateForeignAlias. if named.Obj().Name() != node.Name { As the reader can deduce, this heuristic would incorrectly assume that the snippet above does not embed a type alias, when in fact it does. When obfuscating the field T.SameName, which uses a type alias, we would correctly obfuscate the name "SameName", but we would incorrectly obfuscate it with the package p2, not p1. This would then result in build errors. To fix this problem for good, we need to get rid of the heuristic. Instead, we now mimic what was done for KnownCannotObfuscate, but for embedded fields which use type aliases. KnownEmbeddedAliasFields is now filled for each package and stored in the cache as part of cachedOutput. We can then detect the "embedded alias" case reliably, even when the field is declared in an imported package. On the plus side, we get to remove locateForeignAlias. We also add a couple of TODOs to record further improvements. Finally, add a test. Fixes #466.
2 years ago
funcFullName = string // as per go/types.Func.FullName
objectString = string // as per recordedObjectString
typeName struct {
PkgPath string // empty if builtin
Name string
properly record when type aliases are embedded as fields There are two scenarios when it comes to embedding fields. The first is easy, and we always handled it well: type Named struct { Foo int } type T struct { Named } In this scenario, T ends up with an embedded field named "Named", and a promoted field named "Foo". Then there's the form with a type alias: type Named struct { Foo int } type Alias = Named type T struct { Alias } This case is different: T ends up with an embedded field named "Alias", and a promoted field named "Foo". Note how the field gets its name from the referenced type, even if said type is just an alias to another type. This poses two problems. First, we must obfuscate the field T.Alias as the name "Alias", and not as the name "Named" that the alias points to. Second, we must be careful of cases where Named and Alias are declared in different packages, as they will obfuscate the same name differently. Both of those problems compounded in the reported issue. The actual reason is that quic-go has a type alias in the form of: type ConnectionState = qtls.ConnectionState In other words, the entire problem boils down to a type alias which points to a named type in a different package, where both types share the same name. For example: package parent import "parent/p1" type T struct { p1.SameName } [...] package p1 import "parent/p2" type SameName = p2.SameName [...] package p2 type SameName struct { Foo int } This broke garble because we had a heuristic to detect when an embedded field was a type alias: // Instead, detect such a "foreign alias embed". // If we embed a final named type, // but the field name does not match its name, // then it must have been done via an alias. // We dig out the alias's TypeName via locateForeignAlias. if named.Obj().Name() != node.Name { As the reader can deduce, this heuristic would incorrectly assume that the snippet above does not embed a type alias, when in fact it does. When obfuscating the field T.SameName, which uses a type alias, we would correctly obfuscate the name "SameName", but we would incorrectly obfuscate it with the package p2, not p1. This would then result in build errors. To fix this problem for good, we need to get rid of the heuristic. Instead, we now mimic what was done for KnownCannotObfuscate, but for embedded fields which use type aliases. KnownEmbeddedAliasFields is now filled for each package and stored in the cache as part of cachedOutput. We can then detect the "embedded alias" case reliably, even when the field is declared in an imported package. On the plus side, we get to remove locateForeignAlias. We also add a couple of TODOs to record further improvements. Finally, add a test. Fixes #466.
2 years ago
}
)
// pkgCache contains information about a package that will be stored in fsCache.
// Note that pkgCache is "deep", containing information about all packages
// which are transitive dependencies as well.
type pkgCache struct {
// ReflectAPIs is a static record of what std APIs use reflection on their
// parameters, so we can avoid obfuscating types used with them.
//
// TODO: we're not including fmt.Printf, as it would have many false positives,
// unless we were smart enough to detect which arguments get used as %#v or %T.
ReflectAPIs map[funcFullName]map[int]bool
stop loading obfuscated type information from deps If package P1 imports package P2, P1 needs to know which names from P2 weren't obfuscated. For instance, if P2 declares T2 and does "reflect.TypeOf(T2{...})", then P2 won't obfuscate the name T2, and neither should P1. This information should flow from P2 to P1, as P2 builds before P1. We do this via obfuscatedTypesPackage; P1 loads the type information of the obfuscated version of P2, and does a lookup for T2. If T2 exists, then it wasn't obfuscated. This mechanism has served us well, but it has downsides: 1) It wastes CPU; we load the type information for the entire package. 2) It's complex; for instance, we need KnownObjectFiles as an extra. 3) It makes our code harder to understand, as we load both the original and obfuscated type informaiton. Instead, we now have each package record what names were not obfuscated as part of its cachedOuput file. Much like KnownObjectFiles, the map records incrementally through the import graph, to avoid having to load cachedOutput files for indirect dependencies. We shouldn't need to worry about those maps getting large; we only skip obfuscating declared names in a few uncommon scenarios, such as the use of reflection or cgo's "//export". Since go/types is relatively allocation-heavy, and the export files contain a lot of data, we get a nice speed-up: name old time/op new time/op delta Build-16 11.5s ± 2% 11.1s ± 3% -3.77% (p=0.008 n=5+5) name old bin-B new bin-B delta Build-16 5.15M ± 0% 5.15M ± 0% ~ (all equal) name old cached-time/op new cached-time/op delta Build-16 375ms ± 3% 341ms ± 6% -8.96% (p=0.008 n=5+5) name old sys-time/op new sys-time/op delta Build-16 283ms ±17% 289ms ±13% ~ (p=0.841 n=5+5) name old user-time/op new user-time/op delta Build-16 687ms ± 6% 664ms ± 7% ~ (p=0.548 n=5+5) Fixes #456. Updates #475.
2 years ago
// ReflectObjects is filled with the fully qualified names from each
// package that we cannot obfuscate due to reflection.
// The included objects are named types and their fields,
// since it is those names being obfuscated that could break the use of reflect.
//
stop loading obfuscated type information from deps If package P1 imports package P2, P1 needs to know which names from P2 weren't obfuscated. For instance, if P2 declares T2 and does "reflect.TypeOf(T2{...})", then P2 won't obfuscate the name T2, and neither should P1. This information should flow from P2 to P1, as P2 builds before P1. We do this via obfuscatedTypesPackage; P1 loads the type information of the obfuscated version of P2, and does a lookup for T2. If T2 exists, then it wasn't obfuscated. This mechanism has served us well, but it has downsides: 1) It wastes CPU; we load the type information for the entire package. 2) It's complex; for instance, we need KnownObjectFiles as an extra. 3) It makes our code harder to understand, as we load both the original and obfuscated type informaiton. Instead, we now have each package record what names were not obfuscated as part of its cachedOuput file. Much like KnownObjectFiles, the map records incrementally through the import graph, to avoid having to load cachedOutput files for indirect dependencies. We shouldn't need to worry about those maps getting large; we only skip obfuscating declared names in a few uncommon scenarios, such as the use of reflection or cgo's "//export". Since go/types is relatively allocation-heavy, and the export files contain a lot of data, we get a nice speed-up: name old time/op new time/op delta Build-16 11.5s ± 2% 11.1s ± 3% -3.77% (p=0.008 n=5+5) name old bin-B new bin-B delta Build-16 5.15M ± 0% 5.15M ± 0% ~ (all equal) name old cached-time/op new cached-time/op delta Build-16 375ms ± 3% 341ms ± 6% -8.96% (p=0.008 n=5+5) name old sys-time/op new sys-time/op delta Build-16 283ms ±17% 289ms ±13% ~ (p=0.841 n=5+5) name old user-time/op new user-time/op delta Build-16 687ms ± 6% 664ms ± 7% ~ (p=0.548 n=5+5) Fixes #456. Updates #475.
2 years ago
// This record is necessary for knowing what names from imported packages
// weren't obfuscated, so we can obfuscate their local uses accordingly.
ReflectObjects map[objectString]struct{}
properly record when type aliases are embedded as fields There are two scenarios when it comes to embedding fields. The first is easy, and we always handled it well: type Named struct { Foo int } type T struct { Named } In this scenario, T ends up with an embedded field named "Named", and a promoted field named "Foo". Then there's the form with a type alias: type Named struct { Foo int } type Alias = Named type T struct { Alias } This case is different: T ends up with an embedded field named "Alias", and a promoted field named "Foo". Note how the field gets its name from the referenced type, even if said type is just an alias to another type. This poses two problems. First, we must obfuscate the field T.Alias as the name "Alias", and not as the name "Named" that the alias points to. Second, we must be careful of cases where Named and Alias are declared in different packages, as they will obfuscate the same name differently. Both of those problems compounded in the reported issue. The actual reason is that quic-go has a type alias in the form of: type ConnectionState = qtls.ConnectionState In other words, the entire problem boils down to a type alias which points to a named type in a different package, where both types share the same name. For example: package parent import "parent/p1" type T struct { p1.SameName } [...] package p1 import "parent/p2" type SameName = p2.SameName [...] package p2 type SameName struct { Foo int } This broke garble because we had a heuristic to detect when an embedded field was a type alias: // Instead, detect such a "foreign alias embed". // If we embed a final named type, // but the field name does not match its name, // then it must have been done via an alias. // We dig out the alias's TypeName via locateForeignAlias. if named.Obj().Name() != node.Name { As the reader can deduce, this heuristic would incorrectly assume that the snippet above does not embed a type alias, when in fact it does. When obfuscating the field T.SameName, which uses a type alias, we would correctly obfuscate the name "SameName", but we would incorrectly obfuscate it with the package p2, not p1. This would then result in build errors. To fix this problem for good, we need to get rid of the heuristic. Instead, we now mimic what was done for KnownCannotObfuscate, but for embedded fields which use type aliases. KnownEmbeddedAliasFields is now filled for each package and stored in the cache as part of cachedOutput. We can then detect the "embedded alias" case reliably, even when the field is declared in an imported package. On the plus side, we get to remove locateForeignAlias. We also add a couple of TODOs to record further improvements. Finally, add a test. Fixes #466.
2 years ago
// EmbeddedAliasFields records which embedded fields use a type alias.
properly record when type aliases are embedded as fields There are two scenarios when it comes to embedding fields. The first is easy, and we always handled it well: type Named struct { Foo int } type T struct { Named } In this scenario, T ends up with an embedded field named "Named", and a promoted field named "Foo". Then there's the form with a type alias: type Named struct { Foo int } type Alias = Named type T struct { Alias } This case is different: T ends up with an embedded field named "Alias", and a promoted field named "Foo". Note how the field gets its name from the referenced type, even if said type is just an alias to another type. This poses two problems. First, we must obfuscate the field T.Alias as the name "Alias", and not as the name "Named" that the alias points to. Second, we must be careful of cases where Named and Alias are declared in different packages, as they will obfuscate the same name differently. Both of those problems compounded in the reported issue. The actual reason is that quic-go has a type alias in the form of: type ConnectionState = qtls.ConnectionState In other words, the entire problem boils down to a type alias which points to a named type in a different package, where both types share the same name. For example: package parent import "parent/p1" type T struct { p1.SameName } [...] package p1 import "parent/p2" type SameName = p2.SameName [...] package p2 type SameName struct { Foo int } This broke garble because we had a heuristic to detect when an embedded field was a type alias: // Instead, detect such a "foreign alias embed". // If we embed a final named type, // but the field name does not match its name, // then it must have been done via an alias. // We dig out the alias's TypeName via locateForeignAlias. if named.Obj().Name() != node.Name { As the reader can deduce, this heuristic would incorrectly assume that the snippet above does not embed a type alias, when in fact it does. When obfuscating the field T.SameName, which uses a type alias, we would correctly obfuscate the name "SameName", but we would incorrectly obfuscate it with the package p2, not p1. This would then result in build errors. To fix this problem for good, we need to get rid of the heuristic. Instead, we now mimic what was done for KnownCannotObfuscate, but for embedded fields which use type aliases. KnownEmbeddedAliasFields is now filled for each package and stored in the cache as part of cachedOutput. We can then detect the "embedded alias" case reliably, even when the field is declared in an imported package. On the plus side, we get to remove locateForeignAlias. We also add a couple of TODOs to record further improvements. Finally, add a test. Fixes #466.
2 years ago
// They are the only instance where a type alias matters for obfuscation,
// because the embedded field name is derived from the type alias itself,
// and not the type that the alias points to.
// In that way, the type alias is obfuscated as a form of named type,
// bearing in mind that it may be owned by a different package.
EmbeddedAliasFields map[objectString]typeName
}
func (c *pkgCache) CopyFrom(c2 pkgCache) {
maps.Copy(c.ReflectAPIs, c2.ReflectAPIs)
maps.Copy(c.ReflectObjects, c2.ReflectObjects)
maps.Copy(c.EmbeddedAliasFields, c2.EmbeddedAliasFields)
}
func ssaBuildPkg(pkg *types.Package, files []*ast.File, info *types.Info) *ssa.Package {
// Create SSA packages for all imports. Order is not significant.
ssaProg := ssa.NewProgram(fset, 0)
created := make(map[*types.Package]bool)
var createAll func(pkgs []*types.Package)
createAll = func(pkgs []*types.Package) {
for _, p := range pkgs {
if !created[p] {
created[p] = true
ssaProg.CreatePackage(p, nil, nil, true)
createAll(p.Imports())
}
}
}
createAll(pkg.Imports())
ssaPkg := ssaProg.CreatePackage(pkg, files, info, false)
ssaPkg.Build()
return ssaPkg
}
replace our caching inside GOCACHE with GARBLE_CACHE For each Go package we obfuscate, we need to store information about how we obfuscated it, which is needed when obfuscating its dependents. For example, if A depends on B to use the type B.Foo, A needs to know whether or not B.Foo was obfuscated; it depends on B's use of reflect. We record this information in a gob file, which is cached on disk. To avoid rolling our own custom cache, and since garble is so closely connected with cmd/go already, we piggybacked off of Go's GOCACHE. In particular, for each build cache entry per `go list`'s Export field, we would store a "garble" sibling file with that gob content. However, this was brittle for two reasons: 1) We were doing this without cmd/go's permission or knowledge. We were careful to use filename suffixes similar to Export files, meaning that `go clean` and other commands would treat them the same. However, this could confuse cmd/go at any point in the future. 2) cmd/go trims cache entries in GOCACHE regularly, to keep the size of the build and test caches under control. Right now, this means that every 24h, any file not accessed in the last five days is deleted. However, that trimming heuristic is done per-file. If the trimming removed Garble's sibling file but not the original Export file, this could cause errors such as "cannot load garble export file" which users already ran into. Instead, start using github.com/rogpeppe/go-internal/cache, an exported copy of cmd/go's own cache implementation for GOCACHE. Since we need an entirely separate directory, we introduce GARBLE_CACHE, defaulting to the "garble" directory inside the user's cache directory. For example, on Linux this would be ~/.cache/garble. Inside GARBLE_CACHE, our gob file cache will be under "build", which helps clarify that this cache is used when obfuscating Go builds, and allows placing other kinds of caches inside GARBLE_CACHE. For example, we already have a need for storing linker binaries, which for now still use their own caching mechanism. This commit does not make our cache properly resistant to removed files. The proof is that our seed.txtar testscript still fails the second case. However, we do rewrite all of our caching logic away from Export files, which in itself is a considerable refactor, and we add a few TODOs. One notable change is how we load gob files from dependencies when building the cache entry for the current package. We used to load the gob files from all packages in the Deps field. However, that is the list of all _transitive_ dependencies. Since these gob files are already flat, meaning they contain information about all of their transitive dependencies as well, we need only load the gob files from the direct dependencies, the Imports field. Performance is largely unchanged, since the behavior is similar. However, the change from Deps to Imports saves us some work, which can be seen in the reduced mallocs per obfuscated build. It's unclear why the binary size isn't stable. When reverting the Deps to Imports change, it then settles at 5.386Mi, which is almost exactly in between the two measurements below. I'm not sure why, but that metric appears to be slightly unstable. goos: linux goarch: amd64 pkg: mvdan.cc/garble cpu: AMD Ryzen 7 PRO 5850U with Radeon Graphics │ old │ new │ │ sec/op │ sec/op vs base │ Build-8 11.09 ± 1% 11.08 ± 1% ~ (p=0.796 n=10) │ old │ new │ │ bin-B │ bin-B vs base │ Build-8 5.390Mi ± 0% 5.382Mi ± 0% -0.14% (p=0.000 n=10) │ old │ new │ │ cached-sec/op │ cached-sec/op vs base │ Build-8 415.5m ± 4% 421.6m ± 1% ~ (p=0.190 n=10) │ old │ new │ │ mallocs/op │ mallocs/op vs base │ Build-8 35.43M ± 0% 34.05M ± 0% -3.89% (p=0.000 n=10) │ old │ new │ │ sys-sec/op │ sys-sec/op vs base │ Build-8 5.662 ± 1% 5.701 ± 2% ~ (p=0.280 n=10)
1 year ago
func openCache() (*cache.Cache, error) {
// Use a subdirectory for the hashed build cache, to clarify what it is,
// and to allow us to have other directories or files later on without mixing.
dir := filepath.Join(sharedCache.CacheDir, "build")
replace our caching inside GOCACHE with GARBLE_CACHE For each Go package we obfuscate, we need to store information about how we obfuscated it, which is needed when obfuscating its dependents. For example, if A depends on B to use the type B.Foo, A needs to know whether or not B.Foo was obfuscated; it depends on B's use of reflect. We record this information in a gob file, which is cached on disk. To avoid rolling our own custom cache, and since garble is so closely connected with cmd/go already, we piggybacked off of Go's GOCACHE. In particular, for each build cache entry per `go list`'s Export field, we would store a "garble" sibling file with that gob content. However, this was brittle for two reasons: 1) We were doing this without cmd/go's permission or knowledge. We were careful to use filename suffixes similar to Export files, meaning that `go clean` and other commands would treat them the same. However, this could confuse cmd/go at any point in the future. 2) cmd/go trims cache entries in GOCACHE regularly, to keep the size of the build and test caches under control. Right now, this means that every 24h, any file not accessed in the last five days is deleted. However, that trimming heuristic is done per-file. If the trimming removed Garble's sibling file but not the original Export file, this could cause errors such as "cannot load garble export file" which users already ran into. Instead, start using github.com/rogpeppe/go-internal/cache, an exported copy of cmd/go's own cache implementation for GOCACHE. Since we need an entirely separate directory, we introduce GARBLE_CACHE, defaulting to the "garble" directory inside the user's cache directory. For example, on Linux this would be ~/.cache/garble. Inside GARBLE_CACHE, our gob file cache will be under "build", which helps clarify that this cache is used when obfuscating Go builds, and allows placing other kinds of caches inside GARBLE_CACHE. For example, we already have a need for storing linker binaries, which for now still use their own caching mechanism. This commit does not make our cache properly resistant to removed files. The proof is that our seed.txtar testscript still fails the second case. However, we do rewrite all of our caching logic away from Export files, which in itself is a considerable refactor, and we add a few TODOs. One notable change is how we load gob files from dependencies when building the cache entry for the current package. We used to load the gob files from all packages in the Deps field. However, that is the list of all _transitive_ dependencies. Since these gob files are already flat, meaning they contain information about all of their transitive dependencies as well, we need only load the gob files from the direct dependencies, the Imports field. Performance is largely unchanged, since the behavior is similar. However, the change from Deps to Imports saves us some work, which can be seen in the reduced mallocs per obfuscated build. It's unclear why the binary size isn't stable. When reverting the Deps to Imports change, it then settles at 5.386Mi, which is almost exactly in between the two measurements below. I'm not sure why, but that metric appears to be slightly unstable. goos: linux goarch: amd64 pkg: mvdan.cc/garble cpu: AMD Ryzen 7 PRO 5850U with Radeon Graphics │ old │ new │ │ sec/op │ sec/op vs base │ Build-8 11.09 ± 1% 11.08 ± 1% ~ (p=0.796 n=10) │ old │ new │ │ bin-B │ bin-B vs base │ Build-8 5.390Mi ± 0% 5.382Mi ± 0% -0.14% (p=0.000 n=10) │ old │ new │ │ cached-sec/op │ cached-sec/op vs base │ Build-8 415.5m ± 4% 421.6m ± 1% ~ (p=0.190 n=10) │ old │ new │ │ mallocs/op │ mallocs/op vs base │ Build-8 35.43M ± 0% 34.05M ± 0% -3.89% (p=0.000 n=10) │ old │ new │ │ sys-sec/op │ sys-sec/op vs base │ Build-8 5.662 ± 1% 5.701 ± 2% ~ (p=0.280 n=10)
1 year ago
if err := os.MkdirAll(dir, 0o777); err != nil {
return nil, err
}
return cache.Open(dir)
}
func loadPkgCache(lpkg *listedPackage, pkg *types.Package, files []*ast.File, info *types.Info, ssaPkg *ssa.Package) (pkgCache, error) {
replace our caching inside GOCACHE with GARBLE_CACHE For each Go package we obfuscate, we need to store information about how we obfuscated it, which is needed when obfuscating its dependents. For example, if A depends on B to use the type B.Foo, A needs to know whether or not B.Foo was obfuscated; it depends on B's use of reflect. We record this information in a gob file, which is cached on disk. To avoid rolling our own custom cache, and since garble is so closely connected with cmd/go already, we piggybacked off of Go's GOCACHE. In particular, for each build cache entry per `go list`'s Export field, we would store a "garble" sibling file with that gob content. However, this was brittle for two reasons: 1) We were doing this without cmd/go's permission or knowledge. We were careful to use filename suffixes similar to Export files, meaning that `go clean` and other commands would treat them the same. However, this could confuse cmd/go at any point in the future. 2) cmd/go trims cache entries in GOCACHE regularly, to keep the size of the build and test caches under control. Right now, this means that every 24h, any file not accessed in the last five days is deleted. However, that trimming heuristic is done per-file. If the trimming removed Garble's sibling file but not the original Export file, this could cause errors such as "cannot load garble export file" which users already ran into. Instead, start using github.com/rogpeppe/go-internal/cache, an exported copy of cmd/go's own cache implementation for GOCACHE. Since we need an entirely separate directory, we introduce GARBLE_CACHE, defaulting to the "garble" directory inside the user's cache directory. For example, on Linux this would be ~/.cache/garble. Inside GARBLE_CACHE, our gob file cache will be under "build", which helps clarify that this cache is used when obfuscating Go builds, and allows placing other kinds of caches inside GARBLE_CACHE. For example, we already have a need for storing linker binaries, which for now still use their own caching mechanism. This commit does not make our cache properly resistant to removed files. The proof is that our seed.txtar testscript still fails the second case. However, we do rewrite all of our caching logic away from Export files, which in itself is a considerable refactor, and we add a few TODOs. One notable change is how we load gob files from dependencies when building the cache entry for the current package. We used to load the gob files from all packages in the Deps field. However, that is the list of all _transitive_ dependencies. Since these gob files are already flat, meaning they contain information about all of their transitive dependencies as well, we need only load the gob files from the direct dependencies, the Imports field. Performance is largely unchanged, since the behavior is similar. However, the change from Deps to Imports saves us some work, which can be seen in the reduced mallocs per obfuscated build. It's unclear why the binary size isn't stable. When reverting the Deps to Imports change, it then settles at 5.386Mi, which is almost exactly in between the two measurements below. I'm not sure why, but that metric appears to be slightly unstable. goos: linux goarch: amd64 pkg: mvdan.cc/garble cpu: AMD Ryzen 7 PRO 5850U with Radeon Graphics │ old │ new │ │ sec/op │ sec/op vs base │ Build-8 11.09 ± 1% 11.08 ± 1% ~ (p=0.796 n=10) │ old │ new │ │ bin-B │ bin-B vs base │ Build-8 5.390Mi ± 0% 5.382Mi ± 0% -0.14% (p=0.000 n=10) │ old │ new │ │ cached-sec/op │ cached-sec/op vs base │ Build-8 415.5m ± 4% 421.6m ± 1% ~ (p=0.190 n=10) │ old │ new │ │ mallocs/op │ mallocs/op vs base │ Build-8 35.43M ± 0% 34.05M ± 0% -3.89% (p=0.000 n=10) │ old │ new │ │ sys-sec/op │ sys-sec/op vs base │ Build-8 5.662 ± 1% 5.701 ± 2% ~ (p=0.280 n=10)
1 year ago
fsCache, err := openCache()
if err != nil {
return pkgCache{}, err
replace our caching inside GOCACHE with GARBLE_CACHE For each Go package we obfuscate, we need to store information about how we obfuscated it, which is needed when obfuscating its dependents. For example, if A depends on B to use the type B.Foo, A needs to know whether or not B.Foo was obfuscated; it depends on B's use of reflect. We record this information in a gob file, which is cached on disk. To avoid rolling our own custom cache, and since garble is so closely connected with cmd/go already, we piggybacked off of Go's GOCACHE. In particular, for each build cache entry per `go list`'s Export field, we would store a "garble" sibling file with that gob content. However, this was brittle for two reasons: 1) We were doing this without cmd/go's permission or knowledge. We were careful to use filename suffixes similar to Export files, meaning that `go clean` and other commands would treat them the same. However, this could confuse cmd/go at any point in the future. 2) cmd/go trims cache entries in GOCACHE regularly, to keep the size of the build and test caches under control. Right now, this means that every 24h, any file not accessed in the last five days is deleted. However, that trimming heuristic is done per-file. If the trimming removed Garble's sibling file but not the original Export file, this could cause errors such as "cannot load garble export file" which users already ran into. Instead, start using github.com/rogpeppe/go-internal/cache, an exported copy of cmd/go's own cache implementation for GOCACHE. Since we need an entirely separate directory, we introduce GARBLE_CACHE, defaulting to the "garble" directory inside the user's cache directory. For example, on Linux this would be ~/.cache/garble. Inside GARBLE_CACHE, our gob file cache will be under "build", which helps clarify that this cache is used when obfuscating Go builds, and allows placing other kinds of caches inside GARBLE_CACHE. For example, we already have a need for storing linker binaries, which for now still use their own caching mechanism. This commit does not make our cache properly resistant to removed files. The proof is that our seed.txtar testscript still fails the second case. However, we do rewrite all of our caching logic away from Export files, which in itself is a considerable refactor, and we add a few TODOs. One notable change is how we load gob files from dependencies when building the cache entry for the current package. We used to load the gob files from all packages in the Deps field. However, that is the list of all _transitive_ dependencies. Since these gob files are already flat, meaning they contain information about all of their transitive dependencies as well, we need only load the gob files from the direct dependencies, the Imports field. Performance is largely unchanged, since the behavior is similar. However, the change from Deps to Imports saves us some work, which can be seen in the reduced mallocs per obfuscated build. It's unclear why the binary size isn't stable. When reverting the Deps to Imports change, it then settles at 5.386Mi, which is almost exactly in between the two measurements below. I'm not sure why, but that metric appears to be slightly unstable. goos: linux goarch: amd64 pkg: mvdan.cc/garble cpu: AMD Ryzen 7 PRO 5850U with Radeon Graphics │ old │ new │ │ sec/op │ sec/op vs base │ Build-8 11.09 ± 1% 11.08 ± 1% ~ (p=0.796 n=10) │ old │ new │ │ bin-B │ bin-B vs base │ Build-8 5.390Mi ± 0% 5.382Mi ± 0% -0.14% (p=0.000 n=10) │ old │ new │ │ cached-sec/op │ cached-sec/op vs base │ Build-8 415.5m ± 4% 421.6m ± 1% ~ (p=0.190 n=10) │ old │ new │ │ mallocs/op │ mallocs/op vs base │ Build-8 35.43M ± 0% 34.05M ± 0% -3.89% (p=0.000 n=10) │ old │ new │ │ sys-sec/op │ sys-sec/op vs base │ Build-8 5.662 ± 1% 5.701 ± 2% ~ (p=0.280 n=10)
1 year ago
}
filename, _, err := fsCache.GetFile(lpkg.GarbleActionID)
// Already in the cache; load it directly.
if err == nil {
f, err := os.Open(filename)
if err != nil {
return pkgCache{}, err
}
defer f.Close()
var loaded pkgCache
if err := gob.NewDecoder(f).Decode(&loaded); err != nil {
return pkgCache{}, fmt.Errorf("gob decode: %w", err)
replace our caching inside GOCACHE with GARBLE_CACHE For each Go package we obfuscate, we need to store information about how we obfuscated it, which is needed when obfuscating its dependents. For example, if A depends on B to use the type B.Foo, A needs to know whether or not B.Foo was obfuscated; it depends on B's use of reflect. We record this information in a gob file, which is cached on disk. To avoid rolling our own custom cache, and since garble is so closely connected with cmd/go already, we piggybacked off of Go's GOCACHE. In particular, for each build cache entry per `go list`'s Export field, we would store a "garble" sibling file with that gob content. However, this was brittle for two reasons: 1) We were doing this without cmd/go's permission or knowledge. We were careful to use filename suffixes similar to Export files, meaning that `go clean` and other commands would treat them the same. However, this could confuse cmd/go at any point in the future. 2) cmd/go trims cache entries in GOCACHE regularly, to keep the size of the build and test caches under control. Right now, this means that every 24h, any file not accessed in the last five days is deleted. However, that trimming heuristic is done per-file. If the trimming removed Garble's sibling file but not the original Export file, this could cause errors such as "cannot load garble export file" which users already ran into. Instead, start using github.com/rogpeppe/go-internal/cache, an exported copy of cmd/go's own cache implementation for GOCACHE. Since we need an entirely separate directory, we introduce GARBLE_CACHE, defaulting to the "garble" directory inside the user's cache directory. For example, on Linux this would be ~/.cache/garble. Inside GARBLE_CACHE, our gob file cache will be under "build", which helps clarify that this cache is used when obfuscating Go builds, and allows placing other kinds of caches inside GARBLE_CACHE. For example, we already have a need for storing linker binaries, which for now still use their own caching mechanism. This commit does not make our cache properly resistant to removed files. The proof is that our seed.txtar testscript still fails the second case. However, we do rewrite all of our caching logic away from Export files, which in itself is a considerable refactor, and we add a few TODOs. One notable change is how we load gob files from dependencies when building the cache entry for the current package. We used to load the gob files from all packages in the Deps field. However, that is the list of all _transitive_ dependencies. Since these gob files are already flat, meaning they contain information about all of their transitive dependencies as well, we need only load the gob files from the direct dependencies, the Imports field. Performance is largely unchanged, since the behavior is similar. However, the change from Deps to Imports saves us some work, which can be seen in the reduced mallocs per obfuscated build. It's unclear why the binary size isn't stable. When reverting the Deps to Imports change, it then settles at 5.386Mi, which is almost exactly in between the two measurements below. I'm not sure why, but that metric appears to be slightly unstable. goos: linux goarch: amd64 pkg: mvdan.cc/garble cpu: AMD Ryzen 7 PRO 5850U with Radeon Graphics │ old │ new │ │ sec/op │ sec/op vs base │ Build-8 11.09 ± 1% 11.08 ± 1% ~ (p=0.796 n=10) │ old │ new │ │ bin-B │ bin-B vs base │ Build-8 5.390Mi ± 0% 5.382Mi ± 0% -0.14% (p=0.000 n=10) │ old │ new │ │ cached-sec/op │ cached-sec/op vs base │ Build-8 415.5m ± 4% 421.6m ± 1% ~ (p=0.190 n=10) │ old │ new │ │ mallocs/op │ mallocs/op vs base │ Build-8 35.43M ± 0% 34.05M ± 0% -3.89% (p=0.000 n=10) │ old │ new │ │ sys-sec/op │ sys-sec/op vs base │ Build-8 5.662 ± 1% 5.701 ± 2% ~ (p=0.280 n=10)
1 year ago
}
return loaded, nil
replace our caching inside GOCACHE with GARBLE_CACHE For each Go package we obfuscate, we need to store information about how we obfuscated it, which is needed when obfuscating its dependents. For example, if A depends on B to use the type B.Foo, A needs to know whether or not B.Foo was obfuscated; it depends on B's use of reflect. We record this information in a gob file, which is cached on disk. To avoid rolling our own custom cache, and since garble is so closely connected with cmd/go already, we piggybacked off of Go's GOCACHE. In particular, for each build cache entry per `go list`'s Export field, we would store a "garble" sibling file with that gob content. However, this was brittle for two reasons: 1) We were doing this without cmd/go's permission or knowledge. We were careful to use filename suffixes similar to Export files, meaning that `go clean` and other commands would treat them the same. However, this could confuse cmd/go at any point in the future. 2) cmd/go trims cache entries in GOCACHE regularly, to keep the size of the build and test caches under control. Right now, this means that every 24h, any file not accessed in the last five days is deleted. However, that trimming heuristic is done per-file. If the trimming removed Garble's sibling file but not the original Export file, this could cause errors such as "cannot load garble export file" which users already ran into. Instead, start using github.com/rogpeppe/go-internal/cache, an exported copy of cmd/go's own cache implementation for GOCACHE. Since we need an entirely separate directory, we introduce GARBLE_CACHE, defaulting to the "garble" directory inside the user's cache directory. For example, on Linux this would be ~/.cache/garble. Inside GARBLE_CACHE, our gob file cache will be under "build", which helps clarify that this cache is used when obfuscating Go builds, and allows placing other kinds of caches inside GARBLE_CACHE. For example, we already have a need for storing linker binaries, which for now still use their own caching mechanism. This commit does not make our cache properly resistant to removed files. The proof is that our seed.txtar testscript still fails the second case. However, we do rewrite all of our caching logic away from Export files, which in itself is a considerable refactor, and we add a few TODOs. One notable change is how we load gob files from dependencies when building the cache entry for the current package. We used to load the gob files from all packages in the Deps field. However, that is the list of all _transitive_ dependencies. Since these gob files are already flat, meaning they contain information about all of their transitive dependencies as well, we need only load the gob files from the direct dependencies, the Imports field. Performance is largely unchanged, since the behavior is similar. However, the change from Deps to Imports saves us some work, which can be seen in the reduced mallocs per obfuscated build. It's unclear why the binary size isn't stable. When reverting the Deps to Imports change, it then settles at 5.386Mi, which is almost exactly in between the two measurements below. I'm not sure why, but that metric appears to be slightly unstable. goos: linux goarch: amd64 pkg: mvdan.cc/garble cpu: AMD Ryzen 7 PRO 5850U with Radeon Graphics │ old │ new │ │ sec/op │ sec/op vs base │ Build-8 11.09 ± 1% 11.08 ± 1% ~ (p=0.796 n=10) │ old │ new │ │ bin-B │ bin-B vs base │ Build-8 5.390Mi ± 0% 5.382Mi ± 0% -0.14% (p=0.000 n=10) │ old │ new │ │ cached-sec/op │ cached-sec/op vs base │ Build-8 415.5m ± 4% 421.6m ± 1% ~ (p=0.190 n=10) │ old │ new │ │ mallocs/op │ mallocs/op vs base │ Build-8 35.43M ± 0% 34.05M ± 0% -3.89% (p=0.000 n=10) │ old │ new │ │ sys-sec/op │ sys-sec/op vs base │ Build-8 5.662 ± 1% 5.701 ± 2% ~ (p=0.280 n=10)
1 year ago
}
return computePkgCache(fsCache, lpkg, pkg, files, info, ssaPkg)
}
func computePkgCache(fsCache *cache.Cache, lpkg *listedPackage, pkg *types.Package, files []*ast.File, info *types.Info, ssaPkg *ssa.Package) (pkgCache, error) {
replace our caching inside GOCACHE with GARBLE_CACHE For each Go package we obfuscate, we need to store information about how we obfuscated it, which is needed when obfuscating its dependents. For example, if A depends on B to use the type B.Foo, A needs to know whether or not B.Foo was obfuscated; it depends on B's use of reflect. We record this information in a gob file, which is cached on disk. To avoid rolling our own custom cache, and since garble is so closely connected with cmd/go already, we piggybacked off of Go's GOCACHE. In particular, for each build cache entry per `go list`'s Export field, we would store a "garble" sibling file with that gob content. However, this was brittle for two reasons: 1) We were doing this without cmd/go's permission or knowledge. We were careful to use filename suffixes similar to Export files, meaning that `go clean` and other commands would treat them the same. However, this could confuse cmd/go at any point in the future. 2) cmd/go trims cache entries in GOCACHE regularly, to keep the size of the build and test caches under control. Right now, this means that every 24h, any file not accessed in the last five days is deleted. However, that trimming heuristic is done per-file. If the trimming removed Garble's sibling file but not the original Export file, this could cause errors such as "cannot load garble export file" which users already ran into. Instead, start using github.com/rogpeppe/go-internal/cache, an exported copy of cmd/go's own cache implementation for GOCACHE. Since we need an entirely separate directory, we introduce GARBLE_CACHE, defaulting to the "garble" directory inside the user's cache directory. For example, on Linux this would be ~/.cache/garble. Inside GARBLE_CACHE, our gob file cache will be under "build", which helps clarify that this cache is used when obfuscating Go builds, and allows placing other kinds of caches inside GARBLE_CACHE. For example, we already have a need for storing linker binaries, which for now still use their own caching mechanism. This commit does not make our cache properly resistant to removed files. The proof is that our seed.txtar testscript still fails the second case. However, we do rewrite all of our caching logic away from Export files, which in itself is a considerable refactor, and we add a few TODOs. One notable change is how we load gob files from dependencies when building the cache entry for the current package. We used to load the gob files from all packages in the Deps field. However, that is the list of all _transitive_ dependencies. Since these gob files are already flat, meaning they contain information about all of their transitive dependencies as well, we need only load the gob files from the direct dependencies, the Imports field. Performance is largely unchanged, since the behavior is similar. However, the change from Deps to Imports saves us some work, which can be seen in the reduced mallocs per obfuscated build. It's unclear why the binary size isn't stable. When reverting the Deps to Imports change, it then settles at 5.386Mi, which is almost exactly in between the two measurements below. I'm not sure why, but that metric appears to be slightly unstable. goos: linux goarch: amd64 pkg: mvdan.cc/garble cpu: AMD Ryzen 7 PRO 5850U with Radeon Graphics │ old │ new │ │ sec/op │ sec/op vs base │ Build-8 11.09 ± 1% 11.08 ± 1% ~ (p=0.796 n=10) │ old │ new │ │ bin-B │ bin-B vs base │ Build-8 5.390Mi ± 0% 5.382Mi ± 0% -0.14% (p=0.000 n=10) │ old │ new │ │ cached-sec/op │ cached-sec/op vs base │ Build-8 415.5m ± 4% 421.6m ± 1% ~ (p=0.190 n=10) │ old │ new │ │ mallocs/op │ mallocs/op vs base │ Build-8 35.43M ± 0% 34.05M ± 0% -3.89% (p=0.000 n=10) │ old │ new │ │ sys-sec/op │ sys-sec/op vs base │ Build-8 5.662 ± 1% 5.701 ± 2% ~ (p=0.280 n=10)
1 year ago
// Not yet in the cache. Load the cache entries for all direct dependencies,
// build our cache entry, and write it to disk.
// Note that practically all errors from Cache.GetFile are a cache miss;
// for example, a file might exist but be empty if another process
// is filling the same cache entry concurrently.
//
// TODO: if A (curPkg) imports B and C, and B also imports C,
// then loading the gob files from both B and C is unnecessary;
// loading B's gob file would be enough. Is there an easy way to do that?
computed := pkgCache{
ReflectAPIs: map[funcFullName]map[int]bool{
"reflect.TypeOf": {0: true},
"reflect.ValueOf": {0: true},
},
ReflectObjects: map[objectString]struct{}{},
EmbeddedAliasFields: map[objectString]typeName{},
}
for _, imp := range lpkg.Imports {
if imp == "C" {
// `go list -json` shows "C" in Imports but not Deps.
// See https://go.dev/issue/60453.
replace our caching inside GOCACHE with GARBLE_CACHE For each Go package we obfuscate, we need to store information about how we obfuscated it, which is needed when obfuscating its dependents. For example, if A depends on B to use the type B.Foo, A needs to know whether or not B.Foo was obfuscated; it depends on B's use of reflect. We record this information in a gob file, which is cached on disk. To avoid rolling our own custom cache, and since garble is so closely connected with cmd/go already, we piggybacked off of Go's GOCACHE. In particular, for each build cache entry per `go list`'s Export field, we would store a "garble" sibling file with that gob content. However, this was brittle for two reasons: 1) We were doing this without cmd/go's permission or knowledge. We were careful to use filename suffixes similar to Export files, meaning that `go clean` and other commands would treat them the same. However, this could confuse cmd/go at any point in the future. 2) cmd/go trims cache entries in GOCACHE regularly, to keep the size of the build and test caches under control. Right now, this means that every 24h, any file not accessed in the last five days is deleted. However, that trimming heuristic is done per-file. If the trimming removed Garble's sibling file but not the original Export file, this could cause errors such as "cannot load garble export file" which users already ran into. Instead, start using github.com/rogpeppe/go-internal/cache, an exported copy of cmd/go's own cache implementation for GOCACHE. Since we need an entirely separate directory, we introduce GARBLE_CACHE, defaulting to the "garble" directory inside the user's cache directory. For example, on Linux this would be ~/.cache/garble. Inside GARBLE_CACHE, our gob file cache will be under "build", which helps clarify that this cache is used when obfuscating Go builds, and allows placing other kinds of caches inside GARBLE_CACHE. For example, we already have a need for storing linker binaries, which for now still use their own caching mechanism. This commit does not make our cache properly resistant to removed files. The proof is that our seed.txtar testscript still fails the second case. However, we do rewrite all of our caching logic away from Export files, which in itself is a considerable refactor, and we add a few TODOs. One notable change is how we load gob files from dependencies when building the cache entry for the current package. We used to load the gob files from all packages in the Deps field. However, that is the list of all _transitive_ dependencies. Since these gob files are already flat, meaning they contain information about all of their transitive dependencies as well, we need only load the gob files from the direct dependencies, the Imports field. Performance is largely unchanged, since the behavior is similar. However, the change from Deps to Imports saves us some work, which can be seen in the reduced mallocs per obfuscated build. It's unclear why the binary size isn't stable. When reverting the Deps to Imports change, it then settles at 5.386Mi, which is almost exactly in between the two measurements below. I'm not sure why, but that metric appears to be slightly unstable. goos: linux goarch: amd64 pkg: mvdan.cc/garble cpu: AMD Ryzen 7 PRO 5850U with Radeon Graphics │ old │ new │ │ sec/op │ sec/op vs base │ Build-8 11.09 ± 1% 11.08 ± 1% ~ (p=0.796 n=10) │ old │ new │ │ bin-B │ bin-B vs base │ Build-8 5.390Mi ± 0% 5.382Mi ± 0% -0.14% (p=0.000 n=10) │ old │ new │ │ cached-sec/op │ cached-sec/op vs base │ Build-8 415.5m ± 4% 421.6m ± 1% ~ (p=0.190 n=10) │ old │ new │ │ mallocs/op │ mallocs/op vs base │ Build-8 35.43M ± 0% 34.05M ± 0% -3.89% (p=0.000 n=10) │ old │ new │ │ sys-sec/op │ sys-sec/op vs base │ Build-8 5.662 ± 1% 5.701 ± 2% ~ (p=0.280 n=10)
1 year ago
continue
}
// Shadowing lpkg ensures we don't use the wrong listedPackage below.
lpkg, err := listPackage(lpkg, imp)
if err != nil {
return computed, err
}
if lpkg.BuildID == "" {
continue // nothing to load
}
if err := func() error { // function literal for the deferred close
if filename, _, err := fsCache.GetFile(lpkg.GarbleActionID); err == nil {
// Cache hit; append new entries to computed.
f, err := os.Open(filename)
if err != nil {
return err
}
defer f.Close()
if err := gob.NewDecoder(f).Decode(&computed); err != nil {
return fmt.Errorf("gob decode: %w", err)
}
return nil
}
// Missing or corrupted entry in the cache for a dependency.
// Could happen if GARBLE_CACHE was emptied but GOCACHE was not.
// Compute it, which can recurse if many entries are missing.
files, err := parseFiles(lpkg.Dir, lpkg.CompiledGoFiles)
replace our caching inside GOCACHE with GARBLE_CACHE For each Go package we obfuscate, we need to store information about how we obfuscated it, which is needed when obfuscating its dependents. For example, if A depends on B to use the type B.Foo, A needs to know whether or not B.Foo was obfuscated; it depends on B's use of reflect. We record this information in a gob file, which is cached on disk. To avoid rolling our own custom cache, and since garble is so closely connected with cmd/go already, we piggybacked off of Go's GOCACHE. In particular, for each build cache entry per `go list`'s Export field, we would store a "garble" sibling file with that gob content. However, this was brittle for two reasons: 1) We were doing this without cmd/go's permission or knowledge. We were careful to use filename suffixes similar to Export files, meaning that `go clean` and other commands would treat them the same. However, this could confuse cmd/go at any point in the future. 2) cmd/go trims cache entries in GOCACHE regularly, to keep the size of the build and test caches under control. Right now, this means that every 24h, any file not accessed in the last five days is deleted. However, that trimming heuristic is done per-file. If the trimming removed Garble's sibling file but not the original Export file, this could cause errors such as "cannot load garble export file" which users already ran into. Instead, start using github.com/rogpeppe/go-internal/cache, an exported copy of cmd/go's own cache implementation for GOCACHE. Since we need an entirely separate directory, we introduce GARBLE_CACHE, defaulting to the "garble" directory inside the user's cache directory. For example, on Linux this would be ~/.cache/garble. Inside GARBLE_CACHE, our gob file cache will be under "build", which helps clarify that this cache is used when obfuscating Go builds, and allows placing other kinds of caches inside GARBLE_CACHE. For example, we already have a need for storing linker binaries, which for now still use their own caching mechanism. This commit does not make our cache properly resistant to removed files. The proof is that our seed.txtar testscript still fails the second case. However, we do rewrite all of our caching logic away from Export files, which in itself is a considerable refactor, and we add a few TODOs. One notable change is how we load gob files from dependencies when building the cache entry for the current package. We used to load the gob files from all packages in the Deps field. However, that is the list of all _transitive_ dependencies. Since these gob files are already flat, meaning they contain information about all of their transitive dependencies as well, we need only load the gob files from the direct dependencies, the Imports field. Performance is largely unchanged, since the behavior is similar. However, the change from Deps to Imports saves us some work, which can be seen in the reduced mallocs per obfuscated build. It's unclear why the binary size isn't stable. When reverting the Deps to Imports change, it then settles at 5.386Mi, which is almost exactly in between the two measurements below. I'm not sure why, but that metric appears to be slightly unstable. goos: linux goarch: amd64 pkg: mvdan.cc/garble cpu: AMD Ryzen 7 PRO 5850U with Radeon Graphics │ old │ new │ │ sec/op │ sec/op vs base │ Build-8 11.09 ± 1% 11.08 ± 1% ~ (p=0.796 n=10) │ old │ new │ │ bin-B │ bin-B vs base │ Build-8 5.390Mi ± 0% 5.382Mi ± 0% -0.14% (p=0.000 n=10) │ old │ new │ │ cached-sec/op │ cached-sec/op vs base │ Build-8 415.5m ± 4% 421.6m ± 1% ~ (p=0.190 n=10) │ old │ new │ │ mallocs/op │ mallocs/op vs base │ Build-8 35.43M ± 0% 34.05M ± 0% -3.89% (p=0.000 n=10) │ old │ new │ │ sys-sec/op │ sys-sec/op vs base │ Build-8 5.662 ± 1% 5.701 ± 2% ~ (p=0.280 n=10)
1 year ago
if err != nil {
return err
}
origImporter := importerForPkg(lpkg)
pkg, info, err := typecheck(lpkg.ImportPath, files, origImporter)
if err != nil {
return err
}
computedImp, err := computePkgCache(fsCache, lpkg, pkg, files, info, nil)
if err != nil {
return err
}
computed.CopyFrom(computedImp)
return nil
}(); err != nil {
return pkgCache{}, fmt.Errorf("pkgCache load for %s: %w", imp, err)
}
}
replace our caching inside GOCACHE with GARBLE_CACHE For each Go package we obfuscate, we need to store information about how we obfuscated it, which is needed when obfuscating its dependents. For example, if A depends on B to use the type B.Foo, A needs to know whether or not B.Foo was obfuscated; it depends on B's use of reflect. We record this information in a gob file, which is cached on disk. To avoid rolling our own custom cache, and since garble is so closely connected with cmd/go already, we piggybacked off of Go's GOCACHE. In particular, for each build cache entry per `go list`'s Export field, we would store a "garble" sibling file with that gob content. However, this was brittle for two reasons: 1) We were doing this without cmd/go's permission or knowledge. We were careful to use filename suffixes similar to Export files, meaning that `go clean` and other commands would treat them the same. However, this could confuse cmd/go at any point in the future. 2) cmd/go trims cache entries in GOCACHE regularly, to keep the size of the build and test caches under control. Right now, this means that every 24h, any file not accessed in the last five days is deleted. However, that trimming heuristic is done per-file. If the trimming removed Garble's sibling file but not the original Export file, this could cause errors such as "cannot load garble export file" which users already ran into. Instead, start using github.com/rogpeppe/go-internal/cache, an exported copy of cmd/go's own cache implementation for GOCACHE. Since we need an entirely separate directory, we introduce GARBLE_CACHE, defaulting to the "garble" directory inside the user's cache directory. For example, on Linux this would be ~/.cache/garble. Inside GARBLE_CACHE, our gob file cache will be under "build", which helps clarify that this cache is used when obfuscating Go builds, and allows placing other kinds of caches inside GARBLE_CACHE. For example, we already have a need for storing linker binaries, which for now still use their own caching mechanism. This commit does not make our cache properly resistant to removed files. The proof is that our seed.txtar testscript still fails the second case. However, we do rewrite all of our caching logic away from Export files, which in itself is a considerable refactor, and we add a few TODOs. One notable change is how we load gob files from dependencies when building the cache entry for the current package. We used to load the gob files from all packages in the Deps field. However, that is the list of all _transitive_ dependencies. Since these gob files are already flat, meaning they contain information about all of their transitive dependencies as well, we need only load the gob files from the direct dependencies, the Imports field. Performance is largely unchanged, since the behavior is similar. However, the change from Deps to Imports saves us some work, which can be seen in the reduced mallocs per obfuscated build. It's unclear why the binary size isn't stable. When reverting the Deps to Imports change, it then settles at 5.386Mi, which is almost exactly in between the two measurements below. I'm not sure why, but that metric appears to be slightly unstable. goos: linux goarch: amd64 pkg: mvdan.cc/garble cpu: AMD Ryzen 7 PRO 5850U with Radeon Graphics │ old │ new │ │ sec/op │ sec/op vs base │ Build-8 11.09 ± 1% 11.08 ± 1% ~ (p=0.796 n=10) │ old │ new │ │ bin-B │ bin-B vs base │ Build-8 5.390Mi ± 0% 5.382Mi ± 0% -0.14% (p=0.000 n=10) │ old │ new │ │ cached-sec/op │ cached-sec/op vs base │ Build-8 415.5m ± 4% 421.6m ± 1% ~ (p=0.190 n=10) │ old │ new │ │ mallocs/op │ mallocs/op vs base │ Build-8 35.43M ± 0% 34.05M ± 0% -3.89% (p=0.000 n=10) │ old │ new │ │ sys-sec/op │ sys-sec/op vs base │ Build-8 5.662 ± 1% 5.701 ± 2% ~ (p=0.280 n=10)
1 year ago
// Fill EmbeddedAliasFields from the type info.
for name, obj := range info.Uses {
obj, ok := obj.(*types.TypeName)
if !ok || !obj.IsAlias() {
continue
}
vr, _ := info.Defs[name].(*types.Var)
if vr == nil || !vr.Embedded() {
continue
}
vrStr := recordedObjectString(vr)
if vrStr == "" {
continue
}
aliasTypeName := typeName{
Name: obj.Name(),
}
if pkg := obj.Pkg(); pkg != nil {
aliasTypeName.PkgPath = pkg.Path()
}
computed.EmbeddedAliasFields[vrStr] = aliasTypeName
}
replace our caching inside GOCACHE with GARBLE_CACHE For each Go package we obfuscate, we need to store information about how we obfuscated it, which is needed when obfuscating its dependents. For example, if A depends on B to use the type B.Foo, A needs to know whether or not B.Foo was obfuscated; it depends on B's use of reflect. We record this information in a gob file, which is cached on disk. To avoid rolling our own custom cache, and since garble is so closely connected with cmd/go already, we piggybacked off of Go's GOCACHE. In particular, for each build cache entry per `go list`'s Export field, we would store a "garble" sibling file with that gob content. However, this was brittle for two reasons: 1) We were doing this without cmd/go's permission or knowledge. We were careful to use filename suffixes similar to Export files, meaning that `go clean` and other commands would treat them the same. However, this could confuse cmd/go at any point in the future. 2) cmd/go trims cache entries in GOCACHE regularly, to keep the size of the build and test caches under control. Right now, this means that every 24h, any file not accessed in the last five days is deleted. However, that trimming heuristic is done per-file. If the trimming removed Garble's sibling file but not the original Export file, this could cause errors such as "cannot load garble export file" which users already ran into. Instead, start using github.com/rogpeppe/go-internal/cache, an exported copy of cmd/go's own cache implementation for GOCACHE. Since we need an entirely separate directory, we introduce GARBLE_CACHE, defaulting to the "garble" directory inside the user's cache directory. For example, on Linux this would be ~/.cache/garble. Inside GARBLE_CACHE, our gob file cache will be under "build", which helps clarify that this cache is used when obfuscating Go builds, and allows placing other kinds of caches inside GARBLE_CACHE. For example, we already have a need for storing linker binaries, which for now still use their own caching mechanism. This commit does not make our cache properly resistant to removed files. The proof is that our seed.txtar testscript still fails the second case. However, we do rewrite all of our caching logic away from Export files, which in itself is a considerable refactor, and we add a few TODOs. One notable change is how we load gob files from dependencies when building the cache entry for the current package. We used to load the gob files from all packages in the Deps field. However, that is the list of all _transitive_ dependencies. Since these gob files are already flat, meaning they contain information about all of their transitive dependencies as well, we need only load the gob files from the direct dependencies, the Imports field. Performance is largely unchanged, since the behavior is similar. However, the change from Deps to Imports saves us some work, which can be seen in the reduced mallocs per obfuscated build. It's unclear why the binary size isn't stable. When reverting the Deps to Imports change, it then settles at 5.386Mi, which is almost exactly in between the two measurements below. I'm not sure why, but that metric appears to be slightly unstable. goos: linux goarch: amd64 pkg: mvdan.cc/garble cpu: AMD Ryzen 7 PRO 5850U with Radeon Graphics │ old │ new │ │ sec/op │ sec/op vs base │ Build-8 11.09 ± 1% 11.08 ± 1% ~ (p=0.796 n=10) │ old │ new │ │ bin-B │ bin-B vs base │ Build-8 5.390Mi ± 0% 5.382Mi ± 0% -0.14% (p=0.000 n=10) │ old │ new │ │ cached-sec/op │ cached-sec/op vs base │ Build-8 415.5m ± 4% 421.6m ± 1% ~ (p=0.190 n=10) │ old │ new │ │ mallocs/op │ mallocs/op vs base │ Build-8 35.43M ± 0% 34.05M ± 0% -3.89% (p=0.000 n=10) │ old │ new │ │ sys-sec/op │ sys-sec/op vs base │ Build-8 5.662 ± 1% 5.701 ± 2% ~ (p=0.280 n=10)
1 year ago
// Fill the reflect info from SSA, which builds on top of the syntax tree and type info.
inspector := reflectInspector{
pkg: pkg,
checkedAPIs: make(map[string]bool),
propagatedInstr: map[ssa.Instruction]bool{},
result: computed, // append the results
}
if ssaPkg == nil {
ssaPkg = ssaBuildPkg(pkg, files, info)
}
inspector.recordReflection(ssaPkg)
replace our caching inside GOCACHE with GARBLE_CACHE For each Go package we obfuscate, we need to store information about how we obfuscated it, which is needed when obfuscating its dependents. For example, if A depends on B to use the type B.Foo, A needs to know whether or not B.Foo was obfuscated; it depends on B's use of reflect. We record this information in a gob file, which is cached on disk. To avoid rolling our own custom cache, and since garble is so closely connected with cmd/go already, we piggybacked off of Go's GOCACHE. In particular, for each build cache entry per `go list`'s Export field, we would store a "garble" sibling file with that gob content. However, this was brittle for two reasons: 1) We were doing this without cmd/go's permission or knowledge. We were careful to use filename suffixes similar to Export files, meaning that `go clean` and other commands would treat them the same. However, this could confuse cmd/go at any point in the future. 2) cmd/go trims cache entries in GOCACHE regularly, to keep the size of the build and test caches under control. Right now, this means that every 24h, any file not accessed in the last five days is deleted. However, that trimming heuristic is done per-file. If the trimming removed Garble's sibling file but not the original Export file, this could cause errors such as "cannot load garble export file" which users already ran into. Instead, start using github.com/rogpeppe/go-internal/cache, an exported copy of cmd/go's own cache implementation for GOCACHE. Since we need an entirely separate directory, we introduce GARBLE_CACHE, defaulting to the "garble" directory inside the user's cache directory. For example, on Linux this would be ~/.cache/garble. Inside GARBLE_CACHE, our gob file cache will be under "build", which helps clarify that this cache is used when obfuscating Go builds, and allows placing other kinds of caches inside GARBLE_CACHE. For example, we already have a need for storing linker binaries, which for now still use their own caching mechanism. This commit does not make our cache properly resistant to removed files. The proof is that our seed.txtar testscript still fails the second case. However, we do rewrite all of our caching logic away from Export files, which in itself is a considerable refactor, and we add a few TODOs. One notable change is how we load gob files from dependencies when building the cache entry for the current package. We used to load the gob files from all packages in the Deps field. However, that is the list of all _transitive_ dependencies. Since these gob files are already flat, meaning they contain information about all of their transitive dependencies as well, we need only load the gob files from the direct dependencies, the Imports field. Performance is largely unchanged, since the behavior is similar. However, the change from Deps to Imports saves us some work, which can be seen in the reduced mallocs per obfuscated build. It's unclear why the binary size isn't stable. When reverting the Deps to Imports change, it then settles at 5.386Mi, which is almost exactly in between the two measurements below. I'm not sure why, but that metric appears to be slightly unstable. goos: linux goarch: amd64 pkg: mvdan.cc/garble cpu: AMD Ryzen 7 PRO 5850U with Radeon Graphics │ old │ new │ │ sec/op │ sec/op vs base │ Build-8 11.09 ± 1% 11.08 ± 1% ~ (p=0.796 n=10) │ old │ new │ │ bin-B │ bin-B vs base │ Build-8 5.390Mi ± 0% 5.382Mi ± 0% -0.14% (p=0.000 n=10) │ old │ new │ │ cached-sec/op │ cached-sec/op vs base │ Build-8 415.5m ± 4% 421.6m ± 1% ~ (p=0.190 n=10) │ old │ new │ │ mallocs/op │ mallocs/op vs base │ Build-8 35.43M ± 0% 34.05M ± 0% -3.89% (p=0.000 n=10) │ old │ new │ │ sys-sec/op │ sys-sec/op vs base │ Build-8 5.662 ± 1% 5.701 ± 2% ~ (p=0.280 n=10)
1 year ago
// Unlikely that we could stream the gob encode, as cache.Put wants an io.ReadSeeker.
var buf bytes.Buffer
if err := gob.NewEncoder(&buf).Encode(computed); err != nil {
return pkgCache{}, err
replace our caching inside GOCACHE with GARBLE_CACHE For each Go package we obfuscate, we need to store information about how we obfuscated it, which is needed when obfuscating its dependents. For example, if A depends on B to use the type B.Foo, A needs to know whether or not B.Foo was obfuscated; it depends on B's use of reflect. We record this information in a gob file, which is cached on disk. To avoid rolling our own custom cache, and since garble is so closely connected with cmd/go already, we piggybacked off of Go's GOCACHE. In particular, for each build cache entry per `go list`'s Export field, we would store a "garble" sibling file with that gob content. However, this was brittle for two reasons: 1) We were doing this without cmd/go's permission or knowledge. We were careful to use filename suffixes similar to Export files, meaning that `go clean` and other commands would treat them the same. However, this could confuse cmd/go at any point in the future. 2) cmd/go trims cache entries in GOCACHE regularly, to keep the size of the build and test caches under control. Right now, this means that every 24h, any file not accessed in the last five days is deleted. However, that trimming heuristic is done per-file. If the trimming removed Garble's sibling file but not the original Export file, this could cause errors such as "cannot load garble export file" which users already ran into. Instead, start using github.com/rogpeppe/go-internal/cache, an exported copy of cmd/go's own cache implementation for GOCACHE. Since we need an entirely separate directory, we introduce GARBLE_CACHE, defaulting to the "garble" directory inside the user's cache directory. For example, on Linux this would be ~/.cache/garble. Inside GARBLE_CACHE, our gob file cache will be under "build", which helps clarify that this cache is used when obfuscating Go builds, and allows placing other kinds of caches inside GARBLE_CACHE. For example, we already have a need for storing linker binaries, which for now still use their own caching mechanism. This commit does not make our cache properly resistant to removed files. The proof is that our seed.txtar testscript still fails the second case. However, we do rewrite all of our caching logic away from Export files, which in itself is a considerable refactor, and we add a few TODOs. One notable change is how we load gob files from dependencies when building the cache entry for the current package. We used to load the gob files from all packages in the Deps field. However, that is the list of all _transitive_ dependencies. Since these gob files are already flat, meaning they contain information about all of their transitive dependencies as well, we need only load the gob files from the direct dependencies, the Imports field. Performance is largely unchanged, since the behavior is similar. However, the change from Deps to Imports saves us some work, which can be seen in the reduced mallocs per obfuscated build. It's unclear why the binary size isn't stable. When reverting the Deps to Imports change, it then settles at 5.386Mi, which is almost exactly in between the two measurements below. I'm not sure why, but that metric appears to be slightly unstable. goos: linux goarch: amd64 pkg: mvdan.cc/garble cpu: AMD Ryzen 7 PRO 5850U with Radeon Graphics │ old │ new │ │ sec/op │ sec/op vs base │ Build-8 11.09 ± 1% 11.08 ± 1% ~ (p=0.796 n=10) │ old │ new │ │ bin-B │ bin-B vs base │ Build-8 5.390Mi ± 0% 5.382Mi ± 0% -0.14% (p=0.000 n=10) │ old │ new │ │ cached-sec/op │ cached-sec/op vs base │ Build-8 415.5m ± 4% 421.6m ± 1% ~ (p=0.190 n=10) │ old │ new │ │ mallocs/op │ mallocs/op vs base │ Build-8 35.43M ± 0% 34.05M ± 0% -3.89% (p=0.000 n=10) │ old │ new │ │ sys-sec/op │ sys-sec/op vs base │ Build-8 5.662 ± 1% 5.701 ± 2% ~ (p=0.280 n=10)
1 year ago
}
if err := fsCache.PutBytes(lpkg.GarbleActionID, buf.Bytes()); err != nil {
return pkgCache{}, err
replace our caching inside GOCACHE with GARBLE_CACHE For each Go package we obfuscate, we need to store information about how we obfuscated it, which is needed when obfuscating its dependents. For example, if A depends on B to use the type B.Foo, A needs to know whether or not B.Foo was obfuscated; it depends on B's use of reflect. We record this information in a gob file, which is cached on disk. To avoid rolling our own custom cache, and since garble is so closely connected with cmd/go already, we piggybacked off of Go's GOCACHE. In particular, for each build cache entry per `go list`'s Export field, we would store a "garble" sibling file with that gob content. However, this was brittle for two reasons: 1) We were doing this without cmd/go's permission or knowledge. We were careful to use filename suffixes similar to Export files, meaning that `go clean` and other commands would treat them the same. However, this could confuse cmd/go at any point in the future. 2) cmd/go trims cache entries in GOCACHE regularly, to keep the size of the build and test caches under control. Right now, this means that every 24h, any file not accessed in the last five days is deleted. However, that trimming heuristic is done per-file. If the trimming removed Garble's sibling file but not the original Export file, this could cause errors such as "cannot load garble export file" which users already ran into. Instead, start using github.com/rogpeppe/go-internal/cache, an exported copy of cmd/go's own cache implementation for GOCACHE. Since we need an entirely separate directory, we introduce GARBLE_CACHE, defaulting to the "garble" directory inside the user's cache directory. For example, on Linux this would be ~/.cache/garble. Inside GARBLE_CACHE, our gob file cache will be under "build", which helps clarify that this cache is used when obfuscating Go builds, and allows placing other kinds of caches inside GARBLE_CACHE. For example, we already have a need for storing linker binaries, which for now still use their own caching mechanism. This commit does not make our cache properly resistant to removed files. The proof is that our seed.txtar testscript still fails the second case. However, we do rewrite all of our caching logic away from Export files, which in itself is a considerable refactor, and we add a few TODOs. One notable change is how we load gob files from dependencies when building the cache entry for the current package. We used to load the gob files from all packages in the Deps field. However, that is the list of all _transitive_ dependencies. Since these gob files are already flat, meaning they contain information about all of their transitive dependencies as well, we need only load the gob files from the direct dependencies, the Imports field. Performance is largely unchanged, since the behavior is similar. However, the change from Deps to Imports saves us some work, which can be seen in the reduced mallocs per obfuscated build. It's unclear why the binary size isn't stable. When reverting the Deps to Imports change, it then settles at 5.386Mi, which is almost exactly in between the two measurements below. I'm not sure why, but that metric appears to be slightly unstable. goos: linux goarch: amd64 pkg: mvdan.cc/garble cpu: AMD Ryzen 7 PRO 5850U with Radeon Graphics │ old │ new │ │ sec/op │ sec/op vs base │ Build-8 11.09 ± 1% 11.08 ± 1% ~ (p=0.796 n=10) │ old │ new │ │ bin-B │ bin-B vs base │ Build-8 5.390Mi ± 0% 5.382Mi ± 0% -0.14% (p=0.000 n=10) │ old │ new │ │ cached-sec/op │ cached-sec/op vs base │ Build-8 415.5m ± 4% 421.6m ± 1% ~ (p=0.190 n=10) │ old │ new │ │ mallocs/op │ mallocs/op vs base │ Build-8 35.43M ± 0% 34.05M ± 0% -3.89% (p=0.000 n=10) │ old │ new │ │ sys-sec/op │ sys-sec/op vs base │ Build-8 5.662 ± 1% 5.701 ± 2% ~ (p=0.280 n=10)
1 year ago
}
return computed, nil
}
// cmd/bundle will include a go:generate directive in its output by default.
// Ours specifies a version and doesn't assume bundle is in $PATH, so drop it.
//go:generate go run golang.org/x/tools/cmd/bundle -o cmdgo_quoted.go -prefix cmdgoQuoted cmd/internal/quoted
//go:generate sed -i /go:generate/d cmdgo_quoted.go
// computeLinkerVariableStrings iterates over the -ldflags arguments,
// filling a map with all the string values set via the linker's -X flag.
// TODO: can we put this in sharedCache, using objectString as a key?
func computeLinkerVariableStrings(pkg *types.Package) (map[*types.Var]string, error) {
linkerVariableStrings := make(map[*types.Var]string)
// TODO: this is a linker flag that affects how we obfuscate a package at
// compile time. Note that, if the user changes ldflags, then Go may only
// re-link the final binary, without re-compiling any packages at all.
// It's possible that this could result in:
//
// garble -literals build -ldflags=-X=pkg.name=before # name="before"
// garble -literals build -ldflags=-X=pkg.name=after # name="before" as cached
//
// We haven't been able to reproduce this problem for now,
// but it's worth noting it and keeping an eye out for it in the future.
// If we do confirm this theoretical bug,
// the solution will be to either find a different solution for -literals,
// or to force including -ldflags into the build cache key.
ldflags, err := cmdgoQuotedSplit(flagValue(sharedCache.ForwardBuildFlags, "-ldflags"))
if err != nil {
return nil, err
}
flagValueIter(ldflags, "-X", func(val string) {
// val is in the form of "foo.com/bar.name=value".
fullName, stringValue, found := strings.Cut(val, "=")
if !found {
return // invalid
}
// fullName is "foo.com/bar.name"
i := strings.LastIndexByte(fullName, '.')
path, name := fullName[:i], fullName[i+1:]
// -X represents the main package as "main", not its import path.
if path != pkg.Path() && (path != "main" || pkg.Name() != "main") {
return // not the current package
}
obj, _ := pkg.Scope().Lookup(name).(*types.Var)
if obj == nil {
slight simplifications and alloc reductions Reuse a buffer and a map across loop iterations, because we can. Make recordTypeDone only track named types, as that is enough to detect type cycles. Without named types, there can be no cycles. These two reduce allocs by a fraction of a percent: name old time/op new time/op delta Build-16 10.4s ± 2% 10.4s ± 1% ~ (p=0.739 n=10+10) name old bin-B new bin-B delta Build-16 5.51M ± 0% 5.51M ± 0% ~ (all equal) name old cached-time/op new cached-time/op delta Build-16 391ms ± 9% 407ms ± 7% ~ (p=0.095 n=10+9) name old mallocs/op new mallocs/op delta Build-16 34.5M ± 0% 34.4M ± 0% -0.12% (p=0.000 n=10+10) name old sys-time/op new sys-time/op delta Build-16 5.87s ± 5% 5.82s ± 5% ~ (p=0.182 n=10+9) It doesn't seem like much, but remember that these stats are for the entire set of processes, where garble only accounts for about 10% of the total wall time when compared to the compiler or linker. So a ~0.1% decrease globally is still significant. linkerVariableStrings is also indexed by *types.Var rather than types.Object, since -ldflags=-X only supports setting the string value of variables. This shouldn't make a significant difference in terms of allocs, but at least the map is less prone to confusion with other object types. To ensure the new code doesn't trip up on non-variables, we add test cases. Finally, for the sake of clarity, index into the types.Info maps like Defs and Uses rather than calling ObjectOf if we know whether the identifier we have is a definition of a name or the use of a defined name. This isn't better in terms of performance, as ObjectOf is a tiny method, but just like with linkerVariableStrings before, the new code is clearer.
2 years ago
return // no such variable; skip
}
linkerVariableStrings[obj] = stringValue
})
return linkerVariableStrings, nil
}
// transformer holds all the information and state necessary to obfuscate a
// single Go package.
type transformer struct {
// curPkg holds basic information about the package being currently compiled or linked.
curPkg *listedPackage
// curPkgCache is the pkgCache for curPkg.
curPkgCache pkgCache
// The type-checking results; the package itself, and the Info struct.
pkg *types.Package
info *types.Info
// linkerVariableStrings records objects for variables used in -ldflags=-X flags,
// as well as the strings the user wants to inject them with.
// Used when obfuscating literals, so that we obfuscate the injected value.
slight simplifications and alloc reductions Reuse a buffer and a map across loop iterations, because we can. Make recordTypeDone only track named types, as that is enough to detect type cycles. Without named types, there can be no cycles. These two reduce allocs by a fraction of a percent: name old time/op new time/op delta Build-16 10.4s ± 2% 10.4s ± 1% ~ (p=0.739 n=10+10) name old bin-B new bin-B delta Build-16 5.51M ± 0% 5.51M ± 0% ~ (all equal) name old cached-time/op new cached-time/op delta Build-16 391ms ± 9% 407ms ± 7% ~ (p=0.095 n=10+9) name old mallocs/op new mallocs/op delta Build-16 34.5M ± 0% 34.4M ± 0% -0.12% (p=0.000 n=10+10) name old sys-time/op new sys-time/op delta Build-16 5.87s ± 5% 5.82s ± 5% ~ (p=0.182 n=10+9) It doesn't seem like much, but remember that these stats are for the entire set of processes, where garble only accounts for about 10% of the total wall time when compared to the compiler or linker. So a ~0.1% decrease globally is still significant. linkerVariableStrings is also indexed by *types.Var rather than types.Object, since -ldflags=-X only supports setting the string value of variables. This shouldn't make a significant difference in terms of allocs, but at least the map is less prone to confusion with other object types. To ensure the new code doesn't trip up on non-variables, we add test cases. Finally, for the sake of clarity, index into the types.Info maps like Defs and Uses rather than calling ObjectOf if we know whether the identifier we have is a definition of a name or the use of a defined name. This isn't better in terms of performance, as ObjectOf is a tiny method, but just like with linkerVariableStrings before, the new code is clearer.
2 years ago
linkerVariableStrings map[*types.Var]string
// fieldToStruct helps locate struct types from any of their field
// objects. Useful when obfuscating field names.
fieldToStruct map[*types.Var]*types.Struct
// obfRand is initialized by transformCompile and used during obfuscation.
// It is left nil at init time, so that we only use it after it has been
// properly initialized with a deterministic seed.
// It must only be used for deterministic obfuscation;
// if it is used for any other purpose, we may lose determinism.
obfRand *mathrand.Rand
// origImporter is a go/types importer which uses the original versions
// of packages, without any obfuscation. This is helpful to make
// decisions on how to obfuscate our input code.
origImporter importerWithMap
// usedAllImportsFiles is used to prevent multiple calls of tf.useAllImports function on one file
// in case of simultaneously applied control flow and literals obfuscation
usedAllImportsFiles map[*ast.File]bool
}
func typecheck(pkgPath string, files []*ast.File, origImporter importerWithMap) (*types.Package, *types.Info, error) {
info := &types.Info{
replace our caching inside GOCACHE with GARBLE_CACHE For each Go package we obfuscate, we need to store information about how we obfuscated it, which is needed when obfuscating its dependents. For example, if A depends on B to use the type B.Foo, A needs to know whether or not B.Foo was obfuscated; it depends on B's use of reflect. We record this information in a gob file, which is cached on disk. To avoid rolling our own custom cache, and since garble is so closely connected with cmd/go already, we piggybacked off of Go's GOCACHE. In particular, for each build cache entry per `go list`'s Export field, we would store a "garble" sibling file with that gob content. However, this was brittle for two reasons: 1) We were doing this without cmd/go's permission or knowledge. We were careful to use filename suffixes similar to Export files, meaning that `go clean` and other commands would treat them the same. However, this could confuse cmd/go at any point in the future. 2) cmd/go trims cache entries in GOCACHE regularly, to keep the size of the build and test caches under control. Right now, this means that every 24h, any file not accessed in the last five days is deleted. However, that trimming heuristic is done per-file. If the trimming removed Garble's sibling file but not the original Export file, this could cause errors such as "cannot load garble export file" which users already ran into. Instead, start using github.com/rogpeppe/go-internal/cache, an exported copy of cmd/go's own cache implementation for GOCACHE. Since we need an entirely separate directory, we introduce GARBLE_CACHE, defaulting to the "garble" directory inside the user's cache directory. For example, on Linux this would be ~/.cache/garble. Inside GARBLE_CACHE, our gob file cache will be under "build", which helps clarify that this cache is used when obfuscating Go builds, and allows placing other kinds of caches inside GARBLE_CACHE. For example, we already have a need for storing linker binaries, which for now still use their own caching mechanism. This commit does not make our cache properly resistant to removed files. The proof is that our seed.txtar testscript still fails the second case. However, we do rewrite all of our caching logic away from Export files, which in itself is a considerable refactor, and we add a few TODOs. One notable change is how we load gob files from dependencies when building the cache entry for the current package. We used to load the gob files from all packages in the Deps field. However, that is the list of all _transitive_ dependencies. Since these gob files are already flat, meaning they contain information about all of their transitive dependencies as well, we need only load the gob files from the direct dependencies, the Imports field. Performance is largely unchanged, since the behavior is similar. However, the change from Deps to Imports saves us some work, which can be seen in the reduced mallocs per obfuscated build. It's unclear why the binary size isn't stable. When reverting the Deps to Imports change, it then settles at 5.386Mi, which is almost exactly in between the two measurements below. I'm not sure why, but that metric appears to be slightly unstable. goos: linux goarch: amd64 pkg: mvdan.cc/garble cpu: AMD Ryzen 7 PRO 5850U with Radeon Graphics │ old │ new │ │ sec/op │ sec/op vs base │ Build-8 11.09 ± 1% 11.08 ± 1% ~ (p=0.796 n=10) │ old │ new │ │ bin-B │ bin-B vs base │ Build-8 5.390Mi ± 0% 5.382Mi ± 0% -0.14% (p=0.000 n=10) │ old │ new │ │ cached-sec/op │ cached-sec/op vs base │ Build-8 415.5m ± 4% 421.6m ± 1% ~ (p=0.190 n=10) │ old │ new │ │ mallocs/op │ mallocs/op vs base │ Build-8 35.43M ± 0% 34.05M ± 0% -3.89% (p=0.000 n=10) │ old │ new │ │ sys-sec/op │ sys-sec/op vs base │ Build-8 5.662 ± 1% 5.701 ± 2% ~ (p=0.280 n=10)
1 year ago
Types: make(map[ast.Expr]types.TypeAndValue),
Defs: make(map[*ast.Ident]types.Object),
Uses: make(map[*ast.Ident]types.Object),
Implicits: make(map[ast.Node]types.Object),
Scopes: make(map[ast.Node]*types.Scope),
Selections: make(map[*ast.SelectorExpr]*types.Selection),
Instances: make(map[*ast.Ident]types.Instance),
}
// TODO(mvdan): we should probably set types.Config.GoVersion from go.mod
origTypesConfig := types.Config{Importer: origImporter}
pkg, err := origTypesConfig.Check(pkgPath, fset, files, info)
if err != nil {
return nil, nil, fmt.Errorf("typecheck error: %v", err)
}
return pkg, info, err
}
func computeFieldToStruct(info *types.Info) map[*types.Var]*types.Struct {
done := make(map[*types.Named]bool)
fieldToStruct := make(map[*types.Var]*types.Struct)
// Run recordType on all types reachable via types.Info.
// A bit hacky, but I could not find an easier way to do this.
for _, obj := range info.Uses {
if obj != nil {
recordType(obj.Type(), nil, done, fieldToStruct)
}
}
for _, obj := range info.Defs {
if obj != nil {
recordType(obj.Type(), nil, done, fieldToStruct)
properly record when type aliases are embedded as fields There are two scenarios when it comes to embedding fields. The first is easy, and we always handled it well: type Named struct { Foo int } type T struct { Named } In this scenario, T ends up with an embedded field named "Named", and a promoted field named "Foo". Then there's the form with a type alias: type Named struct { Foo int } type Alias = Named type T struct { Alias } This case is different: T ends up with an embedded field named "Alias", and a promoted field named "Foo". Note how the field gets its name from the referenced type, even if said type is just an alias to another type. This poses two problems. First, we must obfuscate the field T.Alias as the name "Alias", and not as the name "Named" that the alias points to. Second, we must be careful of cases where Named and Alias are declared in different packages, as they will obfuscate the same name differently. Both of those problems compounded in the reported issue. The actual reason is that quic-go has a type alias in the form of: type ConnectionState = qtls.ConnectionState In other words, the entire problem boils down to a type alias which points to a named type in a different package, where both types share the same name. For example: package parent import "parent/p1" type T struct { p1.SameName } [...] package p1 import "parent/p2" type SameName = p2.SameName [...] package p2 type SameName struct { Foo int } This broke garble because we had a heuristic to detect when an embedded field was a type alias: // Instead, detect such a "foreign alias embed". // If we embed a final named type, // but the field name does not match its name, // then it must have been done via an alias. // We dig out the alias's TypeName via locateForeignAlias. if named.Obj().Name() != node.Name { As the reader can deduce, this heuristic would incorrectly assume that the snippet above does not embed a type alias, when in fact it does. When obfuscating the field T.SameName, which uses a type alias, we would correctly obfuscate the name "SameName", but we would incorrectly obfuscate it with the package p2, not p1. This would then result in build errors. To fix this problem for good, we need to get rid of the heuristic. Instead, we now mimic what was done for KnownCannotObfuscate, but for embedded fields which use type aliases. KnownEmbeddedAliasFields is now filled for each package and stored in the cache as part of cachedOutput. We can then detect the "embedded alias" case reliably, even when the field is declared in an imported package. On the plus side, we get to remove locateForeignAlias. We also add a couple of TODOs to record further improvements. Finally, add a test. Fixes #466.
2 years ago
}
}
for _, tv := range info.Types {
recordType(tv.Type, nil, done, fieldToStruct)
}
return fieldToStruct
}
// recordType visits every reachable type after typechecking a package.
// Right now, all it does is fill the fieldToStruct map.
// Since types can be recursive, we need a map to avoid cycles.
// We only need to track named types as done, as all cycles must use them.
func recordType(used, origin types.Type, done map[*types.Named]bool, fieldToStruct map[*types.Var]*types.Struct) {
if origin == nil {
origin = used
}
type Container interface{ Elem() types.Type }
switch used := used.(type) {
case Container:
// origin may be a *types.TypeParam, which is not a Container.
// For now, we haven't found a need to recurse in that case.
// We can edit this code in the future if we find an example,
// because we panic if a field is not in fieldToStruct.
if origin, ok := origin.(Container); ok {
recordType(used.Elem(), origin.Elem(), done, fieldToStruct)
}
case *types.Named:
if done[used] {
slight simplifications and alloc reductions Reuse a buffer and a map across loop iterations, because we can. Make recordTypeDone only track named types, as that is enough to detect type cycles. Without named types, there can be no cycles. These two reduce allocs by a fraction of a percent: name old time/op new time/op delta Build-16 10.4s ± 2% 10.4s ± 1% ~ (p=0.739 n=10+10) name old bin-B new bin-B delta Build-16 5.51M ± 0% 5.51M ± 0% ~ (all equal) name old cached-time/op new cached-time/op delta Build-16 391ms ± 9% 407ms ± 7% ~ (p=0.095 n=10+9) name old mallocs/op new mallocs/op delta Build-16 34.5M ± 0% 34.4M ± 0% -0.12% (p=0.000 n=10+10) name old sys-time/op new sys-time/op delta Build-16 5.87s ± 5% 5.82s ± 5% ~ (p=0.182 n=10+9) It doesn't seem like much, but remember that these stats are for the entire set of processes, where garble only accounts for about 10% of the total wall time when compared to the compiler or linker. So a ~0.1% decrease globally is still significant. linkerVariableStrings is also indexed by *types.Var rather than types.Object, since -ldflags=-X only supports setting the string value of variables. This shouldn't make a significant difference in terms of allocs, but at least the map is less prone to confusion with other object types. To ensure the new code doesn't trip up on non-variables, we add test cases. Finally, for the sake of clarity, index into the types.Info maps like Defs and Uses rather than calling ObjectOf if we know whether the identifier we have is a definition of a name or the use of a defined name. This isn't better in terms of performance, as ObjectOf is a tiny method, but just like with linkerVariableStrings before, the new code is clearer.
2 years ago
return
}
done[used] = true
// If we have a generic struct like
//
// type Foo[T any] struct { Bar T }
//
// then we want the hashing to use the original "Bar T",
// because otherwise different instances like "Bar int" and "Bar bool"
// will result in different hashes and the field names will break.
// Ensure we record the original generic struct, if there is one.
recordType(used.Underlying(), used.Origin().Underlying(), done, fieldToStruct)
case *types.Struct:
origin := origin.(*types.Struct)
for i := range used.NumFields() {
field := used.Field(i)
fieldToStruct[field] = origin
if field.Embedded() {
recordType(field.Type(), origin.Field(i).Type(), done, fieldToStruct)
}
}
}
}
// isSafeForInstanceType returns true if the passed type is safe for var declaration.
// Unsafe types: generic types and non-method interfaces.
func isSafeForInstanceType(typ types.Type) bool {
switch t := typ.(type) {
case *types.Named:
if t.TypeParams().Len() > 0 {
return false
}
return isSafeForInstanceType(t.Underlying())
case *types.Signature:
return t.TypeParams().Len() == 0
case *types.Interface:
return t.IsMethodSet()
}
return true
}
func (tf *transformer) useAllImports(file *ast.File) {
if tf.usedAllImportsFiles == nil {
tf.usedAllImportsFiles = make(map[*ast.File]bool)
} else if ok := tf.usedAllImportsFiles[file]; ok {
return
}
tf.usedAllImportsFiles[file] = true
for _, imp := range file.Imports {
if imp.Name != nil && imp.Name.Name == "_" {
continue
}
pkgName := tf.info.PkgNameOf(imp)
pkgScope := pkgName.Imported().Scope()
var nameObj types.Object
for _, name := range pkgScope.Names() {
if obj := pkgScope.Lookup(name); obj.Exported() && isSafeForInstanceType(obj.Type()) {
nameObj = obj
break
}
}
if nameObj == nil {
// A very unlikely situation where there is no suitable declaration for a reference variable
// and almost certainly means that there is another import reference in code.
continue
}
spec := &ast.ValueSpec{Names: []*ast.Ident{ast.NewIdent("_")}}
decl := &ast.GenDecl{Specs: []ast.Spec{spec}}
nameIdent := ast.NewIdent(nameObj.Name())
var nameExpr ast.Expr
switch {
case imp.Name == nil: // import "pkg/path"
nameExpr = &ast.SelectorExpr{
X: ast.NewIdent(pkgName.Name()),
Sel: nameIdent,
}
case imp.Name.Name != ".": // import path2 "pkg/path"
nameExpr = &ast.SelectorExpr{
X: ast.NewIdent(imp.Name.Name),
Sel: nameIdent,
}
default: // import . "pkg/path"
nameExpr = nameIdent
}
switch nameObj.(type) {
case *types.Const:
// const _ = <value>
decl.Tok = token.CONST
spec.Values = []ast.Expr{nameExpr}
case *types.Var, *types.Func:
// var _ = <value>
decl.Tok = token.VAR
spec.Values = []ast.Expr{nameExpr}
case *types.TypeName:
// var _ <type>
decl.Tok = token.VAR
spec.Type = nameExpr
default:
continue // skip *types.Builtin and others
}
// Ensure that types.Info.Uses is up to date.
tf.info.Uses[nameIdent] = nameObj
file.Decls = append(file.Decls, decl)
}
}
// transformGoFile obfuscates the provided Go syntax file.
func (tf *transformer) transformGoFile(file *ast.File) *ast.File {
// Only obfuscate the literals here if the flag is on
// and if the package in question is to be obfuscated.
support GOGARBLE=* with -literals again We recently made an important change when obfuscating the runtime, so that if it's missing any linkname packages in ListedPackages, it does an extra "go list" call to obtain their information. This works very well, but we missed an edge case. In main.go, we disable flagLiterals for the runtime package, but not for other packages like sync/atomic. And, since the runtime's extra "go list" has to compute GarbleActionIDs, it uses the list of garble flags via appendFlags. Unfortunately, it thinks "-literals" isn't set, when it is, and the other packages see it as being set. This discrepancy results in link time errors, as each end of the linkname obfuscates with a different hash: > garble -literals build [stderr] # test/main jccGkbFG.(*yijmzGHo).String: relocation target jccGkbFG.e_77sflf not defined jQg9GEkg.(*NLxfRPAP).pB5p2ZP0: relocation target jQg9GEkg.ce66Fmzl not defined jQg9GEkg.(*NLxfRPAP).pB5p2ZP0: relocation target jQg9GEkg.e5kPa1qY not defined jQg9GEkg.(*NLxfRPAP).pB5p2ZP0: relocation target jQg9GEkg.aQ_3sL3Q not defined jQg9GEkg.(*NLxfRPAP).pB5p2ZP0: relocation target jQg9GEkg.zls3wmws not defined jQg9GEkg.(*NLxfRPAP).pB5p2ZP0: relocation target jQg9GEkg.g69WgKIS not defined To fix the problem, treat flagLiterals as read-only after flag.Parse, just like we already do with the other flags except flagDebugDir. The code that turned flagLiterals to false is no longer needed, as literals.Obfuscate is only called when ToObfuscate is true, and ToObfuscate is false for runtimeAndDeps already.
2 years ago
//
// We can't obfuscate literals in the runtime and its dependencies,
// because obfuscated literals sometimes escape to heap,
// and that's not allowed in the runtime itself.
if flagLiterals && tf.curPkg.ToObfuscate {
file = literals.Obfuscate(tf.obfRand, file, tf.info, tf.linkerVariableStrings)
// some imported constants might not be needed anymore, remove unnecessary imports
tf.useAllImports(file)
}
pre := func(cursor *astutil.Cursor) bool {
node, ok := cursor.Node().(*ast.Ident)
if !ok {
return true
}
name := node.Name
if name == "_" {
return true // unnamed remains unnamed
}
obj := tf.info.ObjectOf(node)
if obj == nil {
_, isImplicit := tf.info.Defs[node]
_, parentIsFile := cursor.Parent().(*ast.File)
if !isImplicit || parentIsFile {
// We only care about nil objects in the switch scenario below.
return true
}
// In a type switch like "switch foo := bar.(type) {",
// "foo" is being declared as a symbolic variable,
// as it is only actually declared in each "case SomeType:".
//
// As such, the symbolic "foo" in the syntax tree has no object,
// but it is still recorded under Defs with a nil value.
// We still want to obfuscate that syntax tree identifier,
// so if we detect the case, create a dummy types.Var for it.
//
// Note that "package mypkg" also denotes a nil object in Defs,
// and we don't want to treat that "mypkg" as a variable,
// so avoid that case by checking the type of cursor.Parent.
obj = types.NewVar(node.Pos(), tf.pkg, name, nil)
}
pkg := obj.Pkg()
if vr, ok := obj.(*types.Var); ok && vr.Embedded() {
// The docs for ObjectOf say:
//
// If id is an embedded struct field, ObjectOf returns the
// field (*Var) it defines, not the type (*TypeName) it uses.
//
// If this embedded field is a type alias, we want to
// handle the alias's TypeName instead of treating it as
// the type the alias points to.
//
properly record when type aliases are embedded as fields There are two scenarios when it comes to embedding fields. The first is easy, and we always handled it well: type Named struct { Foo int } type T struct { Named } In this scenario, T ends up with an embedded field named "Named", and a promoted field named "Foo". Then there's the form with a type alias: type Named struct { Foo int } type Alias = Named type T struct { Alias } This case is different: T ends up with an embedded field named "Alias", and a promoted field named "Foo". Note how the field gets its name from the referenced type, even if said type is just an alias to another type. This poses two problems. First, we must obfuscate the field T.Alias as the name "Alias", and not as the name "Named" that the alias points to. Second, we must be careful of cases where Named and Alias are declared in different packages, as they will obfuscate the same name differently. Both of those problems compounded in the reported issue. The actual reason is that quic-go has a type alias in the form of: type ConnectionState = qtls.ConnectionState In other words, the entire problem boils down to a type alias which points to a named type in a different package, where both types share the same name. For example: package parent import "parent/p1" type T struct { p1.SameName } [...] package p1 import "parent/p2" type SameName = p2.SameName [...] package p2 type SameName struct { Foo int } This broke garble because we had a heuristic to detect when an embedded field was a type alias: // Instead, detect such a "foreign alias embed". // If we embed a final named type, // but the field name does not match its name, // then it must have been done via an alias. // We dig out the alias's TypeName via locateForeignAlias. if named.Obj().Name() != node.Name { As the reader can deduce, this heuristic would incorrectly assume that the snippet above does not embed a type alias, when in fact it does. When obfuscating the field T.SameName, which uses a type alias, we would correctly obfuscate the name "SameName", but we would incorrectly obfuscate it with the package p2, not p1. This would then result in build errors. To fix this problem for good, we need to get rid of the heuristic. Instead, we now mimic what was done for KnownCannotObfuscate, but for embedded fields which use type aliases. KnownEmbeddedAliasFields is now filled for each package and stored in the cache as part of cachedOutput. We can then detect the "embedded alias" case reliably, even when the field is declared in an imported package. On the plus side, we get to remove locateForeignAlias. We also add a couple of TODOs to record further improvements. Finally, add a test. Fixes #466.
2 years ago
// Alternatively, if we don't have an alias, we still want to
// use the embedded type, not the field.
properly record when type aliases are embedded as fields There are two scenarios when it comes to embedding fields. The first is easy, and we always handled it well: type Named struct { Foo int } type T struct { Named } In this scenario, T ends up with an embedded field named "Named", and a promoted field named "Foo". Then there's the form with a type alias: type Named struct { Foo int } type Alias = Named type T struct { Alias } This case is different: T ends up with an embedded field named "Alias", and a promoted field named "Foo". Note how the field gets its name from the referenced type, even if said type is just an alias to another type. This poses two problems. First, we must obfuscate the field T.Alias as the name "Alias", and not as the name "Named" that the alias points to. Second, we must be careful of cases where Named and Alias are declared in different packages, as they will obfuscate the same name differently. Both of those problems compounded in the reported issue. The actual reason is that quic-go has a type alias in the form of: type ConnectionState = qtls.ConnectionState In other words, the entire problem boils down to a type alias which points to a named type in a different package, where both types share the same name. For example: package parent import "parent/p1" type T struct { p1.SameName } [...] package p1 import "parent/p2" type SameName = p2.SameName [...] package p2 type SameName struct { Foo int } This broke garble because we had a heuristic to detect when an embedded field was a type alias: // Instead, detect such a "foreign alias embed". // If we embed a final named type, // but the field name does not match its name, // then it must have been done via an alias. // We dig out the alias's TypeName via locateForeignAlias. if named.Obj().Name() != node.Name { As the reader can deduce, this heuristic would incorrectly assume that the snippet above does not embed a type alias, when in fact it does. When obfuscating the field T.SameName, which uses a type alias, we would correctly obfuscate the name "SameName", but we would incorrectly obfuscate it with the package p2, not p1. This would then result in build errors. To fix this problem for good, we need to get rid of the heuristic. Instead, we now mimic what was done for KnownCannotObfuscate, but for embedded fields which use type aliases. KnownEmbeddedAliasFields is now filled for each package and stored in the cache as part of cachedOutput. We can then detect the "embedded alias" case reliably, even when the field is declared in an imported package. On the plus side, we get to remove locateForeignAlias. We also add a couple of TODOs to record further improvements. Finally, add a test. Fixes #466.
2 years ago
vrStr := recordedObjectString(vr)
aliasTypeName, ok := tf.curPkgCache.EmbeddedAliasFields[vrStr]
properly record when type aliases are embedded as fields There are two scenarios when it comes to embedding fields. The first is easy, and we always handled it well: type Named struct { Foo int } type T struct { Named } In this scenario, T ends up with an embedded field named "Named", and a promoted field named "Foo". Then there's the form with a type alias: type Named struct { Foo int } type Alias = Named type T struct { Alias } This case is different: T ends up with an embedded field named "Alias", and a promoted field named "Foo". Note how the field gets its name from the referenced type, even if said type is just an alias to another type. This poses two problems. First, we must obfuscate the field T.Alias as the name "Alias", and not as the name "Named" that the alias points to. Second, we must be careful of cases where Named and Alias are declared in different packages, as they will obfuscate the same name differently. Both of those problems compounded in the reported issue. The actual reason is that quic-go has a type alias in the form of: type ConnectionState = qtls.ConnectionState In other words, the entire problem boils down to a type alias which points to a named type in a different package, where both types share the same name. For example: package parent import "parent/p1" type T struct { p1.SameName } [...] package p1 import "parent/p2" type SameName = p2.SameName [...] package p2 type SameName struct { Foo int } This broke garble because we had a heuristic to detect when an embedded field was a type alias: // Instead, detect such a "foreign alias embed". // If we embed a final named type, // but the field name does not match its name, // then it must have been done via an alias. // We dig out the alias's TypeName via locateForeignAlias. if named.Obj().Name() != node.Name { As the reader can deduce, this heuristic would incorrectly assume that the snippet above does not embed a type alias, when in fact it does. When obfuscating the field T.SameName, which uses a type alias, we would correctly obfuscate the name "SameName", but we would incorrectly obfuscate it with the package p2, not p1. This would then result in build errors. To fix this problem for good, we need to get rid of the heuristic. Instead, we now mimic what was done for KnownCannotObfuscate, but for embedded fields which use type aliases. KnownEmbeddedAliasFields is now filled for each package and stored in the cache as part of cachedOutput. We can then detect the "embedded alias" case reliably, even when the field is declared in an imported package. On the plus side, we get to remove locateForeignAlias. We also add a couple of TODOs to record further improvements. Finally, add a test. Fixes #466.
2 years ago
if ok {
aliasScope := tf.pkg.Scope()
if path := aliasTypeName.PkgPath; path == "" {
aliasScope = types.Universe
} else if path != tf.pkg.Path() {
// If the package is a dependency, import it.
// We can't grab the package via tf.pkg.Imports,
// because some of the packages under there are incomplete.
// ImportFrom will cache complete imports, anyway.
pkg2, err := tf.origImporter.ImportFrom(path, parentWorkDir, 0)
properly record when type aliases are embedded as fields There are two scenarios when it comes to embedding fields. The first is easy, and we always handled it well: type Named struct { Foo int } type T struct { Named } In this scenario, T ends up with an embedded field named "Named", and a promoted field named "Foo". Then there's the form with a type alias: type Named struct { Foo int } type Alias = Named type T struct { Alias } This case is different: T ends up with an embedded field named "Alias", and a promoted field named "Foo". Note how the field gets its name from the referenced type, even if said type is just an alias to another type. This poses two problems. First, we must obfuscate the field T.Alias as the name "Alias", and not as the name "Named" that the alias points to. Second, we must be careful of cases where Named and Alias are declared in different packages, as they will obfuscate the same name differently. Both of those problems compounded in the reported issue. The actual reason is that quic-go has a type alias in the form of: type ConnectionState = qtls.ConnectionState In other words, the entire problem boils down to a type alias which points to a named type in a different package, where both types share the same name. For example: package parent import "parent/p1" type T struct { p1.SameName } [...] package p1 import "parent/p2" type SameName = p2.SameName [...] package p2 type SameName struct { Foo int } This broke garble because we had a heuristic to detect when an embedded field was a type alias: // Instead, detect such a "foreign alias embed". // If we embed a final named type, // but the field name does not match its name, // then it must have been done via an alias. // We dig out the alias's TypeName via locateForeignAlias. if named.Obj().Name() != node.Name { As the reader can deduce, this heuristic would incorrectly assume that the snippet above does not embed a type alias, when in fact it does. When obfuscating the field T.SameName, which uses a type alias, we would correctly obfuscate the name "SameName", but we would incorrectly obfuscate it with the package p2, not p1. This would then result in build errors. To fix this problem for good, we need to get rid of the heuristic. Instead, we now mimic what was done for KnownCannotObfuscate, but for embedded fields which use type aliases. KnownEmbeddedAliasFields is now filled for each package and stored in the cache as part of cachedOutput. We can then detect the "embedded alias" case reliably, even when the field is declared in an imported package. On the plus side, we get to remove locateForeignAlias. We also add a couple of TODOs to record further improvements. Finally, add a test. Fixes #466.
2 years ago
if err != nil {
panic(err)
}
aliasScope = pkg2.Scope()
properly record when type aliases are embedded as fields There are two scenarios when it comes to embedding fields. The first is easy, and we always handled it well: type Named struct { Foo int } type T struct { Named } In this scenario, T ends up with an embedded field named "Named", and a promoted field named "Foo". Then there's the form with a type alias: type Named struct { Foo int } type Alias = Named type T struct { Alias } This case is different: T ends up with an embedded field named "Alias", and a promoted field named "Foo". Note how the field gets its name from the referenced type, even if said type is just an alias to another type. This poses two problems. First, we must obfuscate the field T.Alias as the name "Alias", and not as the name "Named" that the alias points to. Second, we must be careful of cases where Named and Alias are declared in different packages, as they will obfuscate the same name differently. Both of those problems compounded in the reported issue. The actual reason is that quic-go has a type alias in the form of: type ConnectionState = qtls.ConnectionState In other words, the entire problem boils down to a type alias which points to a named type in a different package, where both types share the same name. For example: package parent import "parent/p1" type T struct { p1.SameName } [...] package p1 import "parent/p2" type SameName = p2.SameName [...] package p2 type SameName struct { Foo int } This broke garble because we had a heuristic to detect when an embedded field was a type alias: // Instead, detect such a "foreign alias embed". // If we embed a final named type, // but the field name does not match its name, // then it must have been done via an alias. // We dig out the alias's TypeName via locateForeignAlias. if named.Obj().Name() != node.Name { As the reader can deduce, this heuristic would incorrectly assume that the snippet above does not embed a type alias, when in fact it does. When obfuscating the field T.SameName, which uses a type alias, we would correctly obfuscate the name "SameName", but we would incorrectly obfuscate it with the package p2, not p1. This would then result in build errors. To fix this problem for good, we need to get rid of the heuristic. Instead, we now mimic what was done for KnownCannotObfuscate, but for embedded fields which use type aliases. KnownEmbeddedAliasFields is now filled for each package and stored in the cache as part of cachedOutput. We can then detect the "embedded alias" case reliably, even when the field is declared in an imported package. On the plus side, we get to remove locateForeignAlias. We also add a couple of TODOs to record further improvements. Finally, add a test. Fixes #466.
2 years ago
}
tname, ok := aliasScope.Lookup(aliasTypeName.Name).(*types.TypeName)
if !ok {
panic(fmt.Sprintf("EmbeddedAliasFields pointed %q to a missing type %q", vrStr, aliasTypeName))
}
if !tname.IsAlias() {
panic(fmt.Sprintf("EmbeddedAliasFields pointed %q to a non-alias type %q", vrStr, aliasTypeName))
}
obj = tname
} else {
named := namedType(obj.Type())
if named == nil {
return true // unnamed type (probably a basic type, e.g. int)
}
properly record when type aliases are embedded as fields There are two scenarios when it comes to embedding fields. The first is easy, and we always handled it well: type Named struct { Foo int } type T struct { Named } In this scenario, T ends up with an embedded field named "Named", and a promoted field named "Foo". Then there's the form with a type alias: type Named struct { Foo int } type Alias = Named type T struct { Alias } This case is different: T ends up with an embedded field named "Alias", and a promoted field named "Foo". Note how the field gets its name from the referenced type, even if said type is just an alias to another type. This poses two problems. First, we must obfuscate the field T.Alias as the name "Alias", and not as the name "Named" that the alias points to. Second, we must be careful of cases where Named and Alias are declared in different packages, as they will obfuscate the same name differently. Both of those problems compounded in the reported issue. The actual reason is that quic-go has a type alias in the form of: type ConnectionState = qtls.ConnectionState In other words, the entire problem boils down to a type alias which points to a named type in a different package, where both types share the same name. For example: package parent import "parent/p1" type T struct { p1.SameName } [...] package p1 import "parent/p2" type SameName = p2.SameName [...] package p2 type SameName struct { Foo int } This broke garble because we had a heuristic to detect when an embedded field was a type alias: // Instead, detect such a "foreign alias embed". // If we embed a final named type, // but the field name does not match its name, // then it must have been done via an alias. // We dig out the alias's TypeName via locateForeignAlias. if named.Obj().Name() != node.Name { As the reader can deduce, this heuristic would incorrectly assume that the snippet above does not embed a type alias, when in fact it does. When obfuscating the field T.SameName, which uses a type alias, we would correctly obfuscate the name "SameName", but we would incorrectly obfuscate it with the package p2, not p1. This would then result in build errors. To fix this problem for good, we need to get rid of the heuristic. Instead, we now mimic what was done for KnownCannotObfuscate, but for embedded fields which use type aliases. KnownEmbeddedAliasFields is now filled for each package and stored in the cache as part of cachedOutput. We can then detect the "embedded alias" case reliably, even when the field is declared in an imported package. On the plus side, we get to remove locateForeignAlias. We also add a couple of TODOs to record further improvements. Finally, add a test. Fixes #466.
2 years ago
obj = named.Obj()
}
pkg = obj.Pkg()
}
if pkg == nil {
return true // universe scope
}
// TODO: We match by object name here, which is actually imprecise.
// For example, in package embed we match the type FS, but we would also
// match any field or method named FS.
// Can we instead use an object map like ReflectObjects?
avoid breaking intrinsics when obfuscating names We obfuscate import paths as well as their declared names. The compiler treats some packages and APIs in special ways, and the way it detects those is by looking at import paths and names. In the past, we have avoided obfuscating some names like embed.FS or reflect.Value.MethodByName for this reason. Otherwise, go:embed or the linker's deadcode elimination might be broken. This matching by path and name also happens with compiler intrinsics. Intrinsics allow the compiler to rewrite some standard library calls with small and efficient assembly, depending on the target GOARCH. For example, math/bits.TrailingZeros32 gets replaced with ssa.OpCtz32, which on amd64 may result in using the TZCNTL instruction. We never noticed that we were breaking many of these intrinsics. The intrinsics for funcs declared in the runtime and its dependencies still worked properly, as we do not obfuscate those packages yet. However, for other packages like math/bits and sync/atomic, the intrinsics were being entirely disabled due to obfuscated names. Skipping intrinsics is particularly bad for performance, and it also leads to slightly larger binaries: │ old │ new │ │ bin-B │ bin-B vs base │ Build-16 5.450Mi ± ∞ ¹ 5.333Mi ± ∞ ¹ -2.15% (p=0.029 n=4) Finally, the main reason we noticed that intrinsics were broken is that apparently GOARCH=mips fails to link without them, as some symbols end up being not defined at all. This patch fixes builds for the MIPS family of architectures. Rather than building and linking all of std for every GOARCH, test that intrinsics work by asking the compiler to print which intrinsics are being applied, and checking that math/bits gets them. This fix is relatively unfortunate, as it means we stop obfuscating about 120 function names and a handful of package paths. However, fixing builds and intrinsics is much more important. We can figure out better ways to deal with intrinsics in the future. Fixes #646.
1 year ago
path := pkg.Path()
switch path {
case "sync/atomic", "runtime/internal/atomic":
if name == "align64" {
return true
}
case "embed":
// FS is detected by the compiler for //go:embed.
if name == "FS" {
return true
}
case "reflect":
switch name {
// Per the linker's deadcode.go docs,
// the Method and MethodByName methods are what drive the logic.
case "Method", "MethodByName":
return true
}
case "crypto/x509/pkix":
// For better or worse, encoding/asn1 detects a "SET" suffix on slice type names
// to tell whether those slices should be treated as sets or sequences.
// Do not obfuscate those names to prevent breaking x509 certificates.
// TODO: we can surely do better; ideally propose a non-string-based solution
// upstream, or as a fallback, obfuscate to a name ending with "SET".
if strings.HasSuffix(name, "SET") {
return true
}
}
// The package that declared this object did not obfuscate it.
if usedForReflect(tf.curPkgCache, obj) {
stop loading obfuscated type information from deps If package P1 imports package P2, P1 needs to know which names from P2 weren't obfuscated. For instance, if P2 declares T2 and does "reflect.TypeOf(T2{...})", then P2 won't obfuscate the name T2, and neither should P1. This information should flow from P2 to P1, as P2 builds before P1. We do this via obfuscatedTypesPackage; P1 loads the type information of the obfuscated version of P2, and does a lookup for T2. If T2 exists, then it wasn't obfuscated. This mechanism has served us well, but it has downsides: 1) It wastes CPU; we load the type information for the entire package. 2) It's complex; for instance, we need KnownObjectFiles as an extra. 3) It makes our code harder to understand, as we load both the original and obfuscated type informaiton. Instead, we now have each package record what names were not obfuscated as part of its cachedOuput file. Much like KnownObjectFiles, the map records incrementally through the import graph, to avoid having to load cachedOutput files for indirect dependencies. We shouldn't need to worry about those maps getting large; we only skip obfuscating declared names in a few uncommon scenarios, such as the use of reflection or cgo's "//export". Since go/types is relatively allocation-heavy, and the export files contain a lot of data, we get a nice speed-up: name old time/op new time/op delta Build-16 11.5s ± 2% 11.1s ± 3% -3.77% (p=0.008 n=5+5) name old bin-B new bin-B delta Build-16 5.15M ± 0% 5.15M ± 0% ~ (all equal) name old cached-time/op new cached-time/op delta Build-16 375ms ± 3% 341ms ± 6% -8.96% (p=0.008 n=5+5) name old sys-time/op new sys-time/op delta Build-16 283ms ±17% 289ms ±13% ~ (p=0.841 n=5+5) name old user-time/op new user-time/op delta Build-16 687ms ± 6% 664ms ± 7% ~ (p=0.548 n=5+5) Fixes #456. Updates #475.
2 years ago
return true
}
lpkg, err := listPackage(tf.curPkg, path)
refactor "current package" with TOOLEXEC_IMPORTPATH (#266) Now that we've dropped support for Go 1.15.x, we can finally rely on this environment variable for toolexec calls, present in Go 1.16. Before, we had hacky ways of trying to figure out the current package's import path, mostly from the -p flag. The biggest rough edge there was that, for main packages, that was simply the package name, and not its full import path. To work around that, we had a restriction on a single main package, so we could work around that issue. That restriction is now gone. The new code is simpler, especially because we can set curPkg in a single place for all toolexec transform funcs. Since we can always rely on curPkg not being nil now, we can also start reusing listedPackage.Private and avoid the majority of repeated calls to isPrivate. The function is cheap, but still not free. isPrivate itself can also get simpler. We no longer have to worry about the "main" edge case. Plus, the sanity check for invalid package paths is now unnecessary; we only got malformed paths from goobj2, and we now require exact matches with the ImportPath field from "go list -json". Another effect of clearing up the "main" edge case is that -debugdir now uses the right directory for main packages. We also start using consistent debugdir paths in the tests, for the sake of being easier to read and maintain. Finally, note that commandReverse did not need the extra call to "go list -toolexec", as the "shared" call stored in the cache is enough. We still call toolexecCmd to get said cache, which should probably be simplified in a future PR. While at it, replace the use of the "-std" compiler flag with the Standard field from "go list -json".
3 years ago
if err != nil {
panic(err) // shouldn't happen
}
if !lpkg.ToObfuscate {
return true // we're not obfuscating this package
}
hashToUse := lpkg.GarbleActionID
debugName := "variable"
// log.Printf("%s: %#v %T", fset.Position(node.Pos()), node, obj)
switch obj := obj.(type) {
case *types.Var:
fix a number of issues involving types from indirect imports obfuscatedTypesPackage is used to figure out if a name in a dependency package was obfuscated or not. For example, if that package used reflection on a named type, it wasn't obfuscated, so we must have the same information to not obfuscate the same name downstream. obfuscatedTypesPackage could return nil if the package was indirectly imported, though. This can happen if a direct import has a function that returns an indirect type, or if a direct import exposes a name that's a type alias to an indirect type. We sort of dealt with this in two pieces of code by checking for obfPkg!=nil, but a third one did not have this check and caused a panic in the added test case: --- FAIL: TestScripts/reflect (0.81s) testscript.go:397: > env GOPRIVATE=test/main > garble build [stderr] # test/main panic: runtime error: invalid memory address or nil pointer dereference [recovered] panic: runtime error: invalid memory address or nil pointer dereference [signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x8a5e39] More importantly though, the nil check only avoids panics. It doesn't fix the root cause of the problem: that importcfg does not contain indirectly imported packages. The added test case would still fail, as we would obfuscate a type in the main package, but not in the indirectly imported package where the type is defined. To fix this, resurrect a bit of code from earlier garble versions, which uses "go list -toolexec=garble" to fetch a package's export file. This lets us fill the indirect import gaps in importcfg, working around the problem entirely. This solution is still not particularly great, so we add a TODO about possibly rethinking this in the future. It does add some overhead and complexity, though thankfully indirect imports should be uncommon. This fixes a few panics while building the protobuf module.
3 years ago
if !obj.IsField() {
// Identifiers denoting variables are always obfuscated.
fix a number of issues involving types from indirect imports obfuscatedTypesPackage is used to figure out if a name in a dependency package was obfuscated or not. For example, if that package used reflection on a named type, it wasn't obfuscated, so we must have the same information to not obfuscate the same name downstream. obfuscatedTypesPackage could return nil if the package was indirectly imported, though. This can happen if a direct import has a function that returns an indirect type, or if a direct import exposes a name that's a type alias to an indirect type. We sort of dealt with this in two pieces of code by checking for obfPkg!=nil, but a third one did not have this check and caused a panic in the added test case: --- FAIL: TestScripts/reflect (0.81s) testscript.go:397: > env GOPRIVATE=test/main > garble build [stderr] # test/main panic: runtime error: invalid memory address or nil pointer dereference [recovered] panic: runtime error: invalid memory address or nil pointer dereference [signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x8a5e39] More importantly though, the nil check only avoids panics. It doesn't fix the root cause of the problem: that importcfg does not contain indirectly imported packages. The added test case would still fail, as we would obfuscate a type in the main package, but not in the indirectly imported package where the type is defined. To fix this, resurrect a bit of code from earlier garble versions, which uses "go list -toolexec=garble" to fetch a package's export file. This lets us fill the indirect import gaps in importcfg, working around the problem entirely. This solution is still not particularly great, so we add a TODO about possibly rethinking this in the future. It does add some overhead and complexity, though thankfully indirect imports should be uncommon. This fixes a few panics while building the protobuf module.
3 years ago
break
}
debugName = "field"
fix a number of issues involving types from indirect imports obfuscatedTypesPackage is used to figure out if a name in a dependency package was obfuscated or not. For example, if that package used reflection on a named type, it wasn't obfuscated, so we must have the same information to not obfuscate the same name downstream. obfuscatedTypesPackage could return nil if the package was indirectly imported, though. This can happen if a direct import has a function that returns an indirect type, or if a direct import exposes a name that's a type alias to an indirect type. We sort of dealt with this in two pieces of code by checking for obfPkg!=nil, but a third one did not have this check and caused a panic in the added test case: --- FAIL: TestScripts/reflect (0.81s) testscript.go:397: > env GOPRIVATE=test/main > garble build [stderr] # test/main panic: runtime error: invalid memory address or nil pointer dereference [recovered] panic: runtime error: invalid memory address or nil pointer dereference [signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x8a5e39] More importantly though, the nil check only avoids panics. It doesn't fix the root cause of the problem: that importcfg does not contain indirectly imported packages. The added test case would still fail, as we would obfuscate a type in the main package, but not in the indirectly imported package where the type is defined. To fix this, resurrect a bit of code from earlier garble versions, which uses "go list -toolexec=garble" to fetch a package's export file. This lets us fill the indirect import gaps in importcfg, working around the problem entirely. This solution is still not particularly great, so we add a TODO about possibly rethinking this in the future. It does add some overhead and complexity, though thankfully indirect imports should be uncommon. This fixes a few panics while building the protobuf module.
3 years ago
// From this point on, we deal with struct fields.
// Fields don't get hashed with the package's action ID.
// They get hashed with the type of their parent struct.
// This is because one struct can be converted to another,
// as long as the underlying types are identical,
// even if the structs are defined in different packages.
//
// TODO: Consider only doing this for structs where all
// fields are exported. We only need this special case
// for cross-package conversions, which can't work if
// any field is unexported. If that is done, add a test
// that ensures unexported fields from different
// packages result in different obfuscated names.
fix a number of issues involving types from indirect imports obfuscatedTypesPackage is used to figure out if a name in a dependency package was obfuscated or not. For example, if that package used reflection on a named type, it wasn't obfuscated, so we must have the same information to not obfuscate the same name downstream. obfuscatedTypesPackage could return nil if the package was indirectly imported, though. This can happen if a direct import has a function that returns an indirect type, or if a direct import exposes a name that's a type alias to an indirect type. We sort of dealt with this in two pieces of code by checking for obfPkg!=nil, but a third one did not have this check and caused a panic in the added test case: --- FAIL: TestScripts/reflect (0.81s) testscript.go:397: > env GOPRIVATE=test/main > garble build [stderr] # test/main panic: runtime error: invalid memory address or nil pointer dereference [recovered] panic: runtime error: invalid memory address or nil pointer dereference [signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x8a5e39] More importantly though, the nil check only avoids panics. It doesn't fix the root cause of the problem: that importcfg does not contain indirectly imported packages. The added test case would still fail, as we would obfuscate a type in the main package, but not in the indirectly imported package where the type is defined. To fix this, resurrect a bit of code from earlier garble versions, which uses "go list -toolexec=garble" to fetch a package's export file. This lets us fill the indirect import gaps in importcfg, working around the problem entirely. This solution is still not particularly great, so we add a TODO about possibly rethinking this in the future. It does add some overhead and complexity, though thankfully indirect imports should be uncommon. This fixes a few panics while building the protobuf module.
3 years ago
strct := tf.fieldToStruct[obj]
if strct == nil {
panic("could not find struct for field " + name)
}
node.Name = hashWithStruct(strct, obj)
if flagDebug { // TODO(mvdan): remove once https://go.dev/issue/53465 if fixed
log.Printf("%s %q hashed with struct fields to %q", debugName, name, node.Name)
}
return true
case *types.TypeName:
debugName = "type"
case *types.Func:
avoid breaking intrinsics when obfuscating names We obfuscate import paths as well as their declared names. The compiler treats some packages and APIs in special ways, and the way it detects those is by looking at import paths and names. In the past, we have avoided obfuscating some names like embed.FS or reflect.Value.MethodByName for this reason. Otherwise, go:embed or the linker's deadcode elimination might be broken. This matching by path and name also happens with compiler intrinsics. Intrinsics allow the compiler to rewrite some standard library calls with small and efficient assembly, depending on the target GOARCH. For example, math/bits.TrailingZeros32 gets replaced with ssa.OpCtz32, which on amd64 may result in using the TZCNTL instruction. We never noticed that we were breaking many of these intrinsics. The intrinsics for funcs declared in the runtime and its dependencies still worked properly, as we do not obfuscate those packages yet. However, for other packages like math/bits and sync/atomic, the intrinsics were being entirely disabled due to obfuscated names. Skipping intrinsics is particularly bad for performance, and it also leads to slightly larger binaries: │ old │ new │ │ bin-B │ bin-B vs base │ Build-16 5.450Mi ± ∞ ¹ 5.333Mi ± ∞ ¹ -2.15% (p=0.029 n=4) Finally, the main reason we noticed that intrinsics were broken is that apparently GOARCH=mips fails to link without them, as some symbols end up being not defined at all. This patch fixes builds for the MIPS family of architectures. Rather than building and linking all of std for every GOARCH, test that intrinsics work by asking the compiler to print which intrinsics are being applied, and checking that math/bits gets them. This fix is relatively unfortunate, as it means we stop obfuscating about 120 function names and a handful of package paths. However, fixing builds and intrinsics is much more important. We can figure out better ways to deal with intrinsics in the future. Fixes #646.
1 year ago
if compilerIntrinsicsFuncs[path+"."+name] {
return true
}
sign := obj.Type().(*types.Signature)
if sign.Recv() == nil {
debugName = "func"
} else {
debugName = "method"
}
if obj.Exported() && sign.Recv() != nil {
return true // might implement an interface
}
switch name {
case "main", "init", "TestMain":
return true // don't break them
}
if strings.HasPrefix(name, "Test") && isTestSignature(sign) {
return true // don't break tests
}
default:
return true // we only want to rename the above
}
node.Name = hashWithPackage(lpkg, name)
// TODO: probably move the debugf lines inside the hash funcs
if flagDebug { // TODO(mvdan): remove once https://go.dev/issue/53465 if fixed
log.Printf("%s %q hashed with %x… to %q", debugName, name, hashToUse[:4], node.Name)
}
return true
}
post := func(cursor *astutil.Cursor) bool {
imp, ok := cursor.Node().(*ast.ImportSpec)
if !ok {
return true
}
path, err := strconv.Unquote(imp.Path.Value)
if err != nil {
panic(err) // should never happen
}
// We're importing an obfuscated package.
// Replace the import path with its obfuscated version.
// If the import was unnamed, give it the name of the
// original package name, to keep references working.
lpkg, err := listPackage(tf.curPkg, path)
if err != nil {
panic(err) // should never happen
}
if !lpkg.ToObfuscate {
return true
}
if lpkg.Name != "main" {
newPath := lpkg.obfuscatedImportPath()
imp.Path.Value = strconv.Quote(newPath)
}
if imp.Name == nil {
imp.Name = &ast.Ident{
NamePos: imp.Path.ValuePos, // ensure it ends up on the same line
Name: lpkg.Name,
}
}
return true
}
return astutil.Apply(file, pre, post).(*ast.File)
}
// named tries to obtain the *types.Named behind a type, if there is one.
// This is useful to obtain "testing.T" from "*testing.T", or to obtain the type
// declaration object from an embedded field.
func namedType(t types.Type) *types.Named {
switch t := t.(type) {
case *types.Named:
return t
case interface{ Elem() types.Type }:
return namedType(t.Elem())
default:
return nil
}
}
// isTestSignature returns true if the signature matches "func _(*testing.T)".
func isTestSignature(sign *types.Signature) bool {
if sign.Recv() != nil {
return false // test funcs don't have receivers
}
params := sign.Params()
if params.Len() != 1 {
return false // too many parameters for a test func
}
named := namedType(params.At(0).Type())
if named == nil {
return false // the only parameter isn't named, like "string"
}
obj := named.Obj()
return obj != nil && obj.Pkg().Path() == "testing" && obj.Name() == "T"
}
func (tf *transformer) transformLink(args []string) ([]string, error) {
initial support for build caching (#142) As per the discussion in https://github.com/golang/go/issues/41145, it turns out that we don't need special support for build caching in -toolexec. We can simply modify the behavior of "[...]/compile -V=full" and "[...]/link -V=full" so that they include garble's own version and options in the printed build ID. The part of the build ID that matters is the last, since it's the "content ID" which is used to work out whether there is a need to redo the action (build) or not. Since cmd/go parses the last word in the output as "buildID=...", we simply add "+garble buildID=_/_/_/${hash}". The slashes let us imitate a full binary build ID, but we assume that the other components such as the action ID are not necessary, since the only reader here is cmd/go and it only consumes the content ID. The reported content ID includes the tool's original content ID, garble's own content ID from the built binary, and the garble options which modify how we obfuscate code. If any of the three changes, we should use a different build cache key. GOPRIVATE also affects caching, since a different GOPRIVATE value means that we might have to garble a different set of packages. Include tests, which mainly check that 'garble build -v' prints package lines when we expect to always need to rebuild packages, and that it prints nothing when we should be reusing the build cache even when the built binary is missing. After this change, 'go test' on Go 1.15.2 stabilizes at about 8s on my machine, whereas it used to be at around 25s before.
4 years ago
// We can't split by the ".a" extension, because cached object files
// lack any extension.
reimplement import path obfuscation without goobj2 (#242) We used to rely on a parallel implementation of an object file parser and writer to be able to obfuscate import paths. After compiling each package, we would parse the object file, replace the import paths, and write the updated object file in-place. That worked well, in most cases. Unfortunately, it had some flaws: * Complexity. Even when most of the code is maintained in a separate module, the import_obfuscation.go file was still close to a thousand lines of code. * Go compatibility. The object file format changes between Go releases, so we were supporting Go 1.15, but not 1.16. Fixing the object file package to work with 1.16 would probably break 1.15 support. * Bugs. For example, we recently had to add a workaround for #224, since import paths containing dots after the domain would end up escaped. Another example is #190, which seems to be caused by the object file parser or writer corrupting the compiled code and causing segfaults in some rare edge cases. Instead, let's drop that method entirely, and force the compiler and linker to do the work for us. The steps necessary when compiling a package to obfuscate are: 1) Replace its "package foo" lines with the obfuscated package path. No need to separate the package path and name, since the obfuscated path does not contain slashes. 2) Replace the "-p pkg/foo" flag with the obfuscated path. 3) Replace the "import" spec lines with the obfuscated package paths, for those dependencies which were obfuscated. 4) Replace the "-importcfg [...]" file with a version that uses the obfuscated paths instead. The linker also needs that last step, since it also uses an importcfg file to find object files. There are three noteworthy drawbacks to this new method: 1) Since we no longer write object files, we can't use them to store data to be cached. As such, the -debugdir flag goes back to using the "-a" build flag to always rebuild all packages. On the plus side, that caching didn't work very well; see #176. 2) The package name "main" remains in all declarations under it, not just "func main", since we can only rename entire packages. This seems fine, as it gives little information to the end user. 3) The -tiny mode no longer sets all lines to 0, since it did that by modifying object files. As a temporary measure, we instead set all top-level declarations to be on line 1. A TODO is added to hopefully improve this again in the near future. The upside is that we get rid of all the issues mentioned before. Plus, garble now nearly works with Go 1.16, with the exception of two very minor bugs that look fixable. A follow-up PR will take care of that and start testing on 1.16. Fixes #176. Fixes #190.
3 years ago
flags, args := splitFlagsFromArgs(args)
newImportCfg, err := tf.processImportCfg(flags, nil)
if err != nil {
reimplement import path obfuscation without goobj2 (#242) We used to rely on a parallel implementation of an object file parser and writer to be able to obfuscate import paths. After compiling each package, we would parse the object file, replace the import paths, and write the updated object file in-place. That worked well, in most cases. Unfortunately, it had some flaws: * Complexity. Even when most of the code is maintained in a separate module, the import_obfuscation.go file was still close to a thousand lines of code. * Go compatibility. The object file format changes between Go releases, so we were supporting Go 1.15, but not 1.16. Fixing the object file package to work with 1.16 would probably break 1.15 support. * Bugs. For example, we recently had to add a workaround for #224, since import paths containing dots after the domain would end up escaped. Another example is #190, which seems to be caused by the object file parser or writer corrupting the compiled code and causing segfaults in some rare edge cases. Instead, let's drop that method entirely, and force the compiler and linker to do the work for us. The steps necessary when compiling a package to obfuscate are: 1) Replace its "package foo" lines with the obfuscated package path. No need to separate the package path and name, since the obfuscated path does not contain slashes. 2) Replace the "-p pkg/foo" flag with the obfuscated path. 3) Replace the "import" spec lines with the obfuscated package paths, for those dependencies which were obfuscated. 4) Replace the "-importcfg [...]" file with a version that uses the obfuscated paths instead. The linker also needs that last step, since it also uses an importcfg file to find object files. There are three noteworthy drawbacks to this new method: 1) Since we no longer write object files, we can't use them to store data to be cached. As such, the -debugdir flag goes back to using the "-a" build flag to always rebuild all packages. On the plus side, that caching didn't work very well; see #176. 2) The package name "main" remains in all declarations under it, not just "func main", since we can only rename entire packages. This seems fine, as it gives little information to the end user. 3) The -tiny mode no longer sets all lines to 0, since it did that by modifying object files. As a temporary measure, we instead set all top-level declarations to be on line 1. A TODO is added to hopefully improve this again in the near future. The upside is that we get rid of all the issues mentioned before. Plus, garble now nearly works with Go 1.16, with the exception of two very minor bugs that look fixable. A follow-up PR will take care of that and start testing on 1.16. Fixes #176. Fixes #190.
3 years ago
return nil, err
}
// TODO: unify this logic with the -X handling when using -literals.
// We should be able to handle both cases via the syntax tree.
//
// Make sure -X works with obfuscated identifiers.
// To cover both obfuscated and non-obfuscated names,
// duplicate each flag with a obfuscated version.
flagValueIter(flags, "-X", func(val string) {
// val is in the form of "foo.com/bar.name=value".
fullName, stringValue, found := strings.Cut(val, "=")
if !found {
return // invalid
}
// fullName is "foo.com/bar.name"
i := strings.LastIndexByte(fullName, '.')
path, name := fullName[:i], fullName[i+1:]
refactor "current package" with TOOLEXEC_IMPORTPATH (#266) Now that we've dropped support for Go 1.15.x, we can finally rely on this environment variable for toolexec calls, present in Go 1.16. Before, we had hacky ways of trying to figure out the current package's import path, mostly from the -p flag. The biggest rough edge there was that, for main packages, that was simply the package name, and not its full import path. To work around that, we had a restriction on a single main package, so we could work around that issue. That restriction is now gone. The new code is simpler, especially because we can set curPkg in a single place for all toolexec transform funcs. Since we can always rely on curPkg not being nil now, we can also start reusing listedPackage.Private and avoid the majority of repeated calls to isPrivate. The function is cheap, but still not free. isPrivate itself can also get simpler. We no longer have to worry about the "main" edge case. Plus, the sanity check for invalid package paths is now unnecessary; we only got malformed paths from goobj2, and we now require exact matches with the ImportPath field from "go list -json". Another effect of clearing up the "main" edge case is that -debugdir now uses the right directory for main packages. We also start using consistent debugdir paths in the tests, for the sake of being easier to read and maintain. Finally, note that commandReverse did not need the extra call to "go list -toolexec", as the "shared" call stored in the cache is enough. We still call toolexecCmd to get said cache, which should probably be simplified in a future PR. While at it, replace the use of the "-std" compiler flag with the Standard field from "go list -json".
3 years ago
// If the package path is "main", it's the current top-level
// package we are linking.
// Otherwise, find it in the cache.
lpkg := tf.curPkg
if path != "main" {
lpkg = sharedCache.ListedPackages[path]
}
if lpkg == nil {
// We couldn't find the package.
// Perhaps a typo, perhaps not part of the build.
// cmd/link ignores those, so we should too.
return
}
refactor "current package" with TOOLEXEC_IMPORTPATH (#266) Now that we've dropped support for Go 1.15.x, we can finally rely on this environment variable for toolexec calls, present in Go 1.16. Before, we had hacky ways of trying to figure out the current package's import path, mostly from the -p flag. The biggest rough edge there was that, for main packages, that was simply the package name, and not its full import path. To work around that, we had a restriction on a single main package, so we could work around that issue. That restriction is now gone. The new code is simpler, especially because we can set curPkg in a single place for all toolexec transform funcs. Since we can always rely on curPkg not being nil now, we can also start reusing listedPackage.Private and avoid the majority of repeated calls to isPrivate. The function is cheap, but still not free. isPrivate itself can also get simpler. We no longer have to worry about the "main" edge case. Plus, the sanity check for invalid package paths is now unnecessary; we only got malformed paths from goobj2, and we now require exact matches with the ImportPath field from "go list -json". Another effect of clearing up the "main" edge case is that -debugdir now uses the right directory for main packages. We also start using consistent debugdir paths in the tests, for the sake of being easier to read and maintain. Finally, note that commandReverse did not need the extra call to "go list -toolexec", as the "shared" call stored in the cache is enough. We still call toolexecCmd to get said cache, which should probably be simplified in a future PR. While at it, replace the use of the "-std" compiler flag with the Standard field from "go list -json".
3 years ago
// As before, the main package must remain as "main".
newPath := path
if path != "main" {
newPath = lpkg.obfuscatedImportPath()
reimplement import path obfuscation without goobj2 (#242) We used to rely on a parallel implementation of an object file parser and writer to be able to obfuscate import paths. After compiling each package, we would parse the object file, replace the import paths, and write the updated object file in-place. That worked well, in most cases. Unfortunately, it had some flaws: * Complexity. Even when most of the code is maintained in a separate module, the import_obfuscation.go file was still close to a thousand lines of code. * Go compatibility. The object file format changes between Go releases, so we were supporting Go 1.15, but not 1.16. Fixing the object file package to work with 1.16 would probably break 1.15 support. * Bugs. For example, we recently had to add a workaround for #224, since import paths containing dots after the domain would end up escaped. Another example is #190, which seems to be caused by the object file parser or writer corrupting the compiled code and causing segfaults in some rare edge cases. Instead, let's drop that method entirely, and force the compiler and linker to do the work for us. The steps necessary when compiling a package to obfuscate are: 1) Replace its "package foo" lines with the obfuscated package path. No need to separate the package path and name, since the obfuscated path does not contain slashes. 2) Replace the "-p pkg/foo" flag with the obfuscated path. 3) Replace the "import" spec lines with the obfuscated package paths, for those dependencies which were obfuscated. 4) Replace the "-importcfg [...]" file with a version that uses the obfuscated paths instead. The linker also needs that last step, since it also uses an importcfg file to find object files. There are three noteworthy drawbacks to this new method: 1) Since we no longer write object files, we can't use them to store data to be cached. As such, the -debugdir flag goes back to using the "-a" build flag to always rebuild all packages. On the plus side, that caching didn't work very well; see #176. 2) The package name "main" remains in all declarations under it, not just "func main", since we can only rename entire packages. This seems fine, as it gives little information to the end user. 3) The -tiny mode no longer sets all lines to 0, since it did that by modifying object files. As a temporary measure, we instead set all top-level declarations to be on line 1. A TODO is added to hopefully improve this again in the near future. The upside is that we get rid of all the issues mentioned before. Plus, garble now nearly works with Go 1.16, with the exception of two very minor bugs that look fixable. A follow-up PR will take care of that and start testing on 1.16. Fixes #176. Fixes #190.
3 years ago
}
newName := hashWithPackage(lpkg, name)
flags = append(flags, fmt.Sprintf("-X=%s.%s=%s", newPath, newName, stringValue))
})
update support for Go 1.17 in time for beta1 Back in early April we added initial support for Go 1.17, working on a commit from master at that time. For that to work, we just needed to add a couple of packages to runtimeRelated and tweak printFile a bit to not break the new "//go:build" directives. A significant amount of changes have landed since, though, and the tests broke in multiple ways. Most notably, the new register ABI is enabled by default for GOOS=amd64. That affected garble indirectly in two ways: there's a new internal package to add to runtimeRelated, and we must make reverse.txt more clever in making its output constant across ABIs. Another noticeable change is that Go 1.17 changes how its own version is injected into the runtime package. It used to be via a constant in runtime/internal/sys, such as: const TheVersion = `devel ...` Since we couldn't override such constants via the linker's -X flag, we had to directly alter the declaration while compiling. Thankfully, Go 1.17 simply uses a "var buildVersion string" in the runtime package, and its value is injected by the linker. This means we can now override it with the linker's -X flag. We make the code to alter TheVersion for Go 1.16 a bit more clever, to not break the package when building with Go 1.17. Finally, our hack to work around ambiguous TOOLEXEC_IMPORTPATH values now only kicks in for non-test packages, since Go 1.17 includes our upstream fix. Otherwise, some tests would end up with the ".test" variant suffix added a second time: test/bar [test/bar.test] [test/bar [test/bar.test].test] All the code to keep compatibility with Go 1.16.x remains in place. We're still leaving TODOs to remind ourselves to remove it or simplify it once we remove support for 1.16.x. The 1.17 development freeze has already been in place for a month, and beta1 is due to come this week, so it's unlikely that Go will change in any considerable way at this point. Hence, we can say that support for 1.17 is done. Fixes #347.
3 years ago
// Starting in Go 1.17, Go's version is implicitly injected by the linker.
// It's the same method as -X, so we can override it with an extra flag.
flags = append(flags, "-X=runtime.buildVersion=unknown")
// Ensure we strip the -buildid flag, to not leak any build IDs for the
// link operation or the main package's compilation.
flags = flagSetValue(flags, "-buildid", "")
// Strip debug information and symbol tables.
flags = append(flags, "-w", "-s")
reimplement import path obfuscation without goobj2 (#242) We used to rely on a parallel implementation of an object file parser and writer to be able to obfuscate import paths. After compiling each package, we would parse the object file, replace the import paths, and write the updated object file in-place. That worked well, in most cases. Unfortunately, it had some flaws: * Complexity. Even when most of the code is maintained in a separate module, the import_obfuscation.go file was still close to a thousand lines of code. * Go compatibility. The object file format changes between Go releases, so we were supporting Go 1.15, but not 1.16. Fixing the object file package to work with 1.16 would probably break 1.15 support. * Bugs. For example, we recently had to add a workaround for #224, since import paths containing dots after the domain would end up escaped. Another example is #190, which seems to be caused by the object file parser or writer corrupting the compiled code and causing segfaults in some rare edge cases. Instead, let's drop that method entirely, and force the compiler and linker to do the work for us. The steps necessary when compiling a package to obfuscate are: 1) Replace its "package foo" lines with the obfuscated package path. No need to separate the package path and name, since the obfuscated path does not contain slashes. 2) Replace the "-p pkg/foo" flag with the obfuscated path. 3) Replace the "import" spec lines with the obfuscated package paths, for those dependencies which were obfuscated. 4) Replace the "-importcfg [...]" file with a version that uses the obfuscated paths instead. The linker also needs that last step, since it also uses an importcfg file to find object files. There are three noteworthy drawbacks to this new method: 1) Since we no longer write object files, we can't use them to store data to be cached. As such, the -debugdir flag goes back to using the "-a" build flag to always rebuild all packages. On the plus side, that caching didn't work very well; see #176. 2) The package name "main" remains in all declarations under it, not just "func main", since we can only rename entire packages. This seems fine, as it gives little information to the end user. 3) The -tiny mode no longer sets all lines to 0, since it did that by modifying object files. As a temporary measure, we instead set all top-level declarations to be on line 1. A TODO is added to hopefully improve this again in the near future. The upside is that we get rid of all the issues mentioned before. Plus, garble now nearly works with Go 1.16, with the exception of two very minor bugs that look fixable. A follow-up PR will take care of that and start testing on 1.16. Fixes #176. Fixes #190.
3 years ago
flags = flagSetValue(flags, "-importcfg", newImportCfg)
return append(flags, args...), nil
}
func splitFlagsFromArgs(all []string) (flags, args []string) {
for i := 0; i < len(all); i++ {
arg := all[i]
if !strings.HasPrefix(arg, "-") {
return all[:i:i], all[i:]
}
if booleanFlags[arg] || strings.Contains(arg, "=") {
// Either "-bool" or "-name=value".
continue
}
// "-name value", so the next arg is part of this flag.
i++
}
return all, nil
}
func alterTrimpath(flags []string) []string {
avoid reproducibility issues with full rebuilds We were using temporary filenames for modified Go and assembly files. For example, an obfuscated "encoding/json/encode.go" would end up as: /tmp/garble-shared123/encode.go.456.go where "123" and "456" are random numbers, usually longer. This was usually fine for two reasons: 1) We would add "/tmp/garble-shared123/" to -trimpath, so the temporary directory and its random number would be invisible. 2) We would add "//line" directives to the source files, replacing the filename with obfuscated versions excluding any random number. Unfortunately, this broke in multiple ways. Most notably, assembly files do not have any line directives, and it's not clear that there's any support for them. So the random number in their basename could end up in the binary, breaking reproducibility. Another issue is that the -trimpath addition described above was only done for cmd/compile, not cmd/asm, so assembly filenames included the randomized temporary directory. To fix the issues above, the same "encoding/json/encode.go" would now end up as: /tmp/garble-shared123/encoding/json/encode.go Such a path is still unique even though the "456" random number is gone, as import paths are unique within a single build. This fixes issues with the base name of each file, so we no longer rely on line directives as the only way to remove the second original random number. We still rely on -trimpath to get rid of the temporary directory in filenames. To fix its problem with assembly files, also amend the -trimpath flag when running the assembler tool. Finally, add a test that reproducible builds still work when a full rebuild is done. We choose goprivate.txt for such a test as its stdimporter package imports a number of std packages, including uses of assembly and cgo. For the time being, we don't use such a "full rebuild" reproducibility test in other test scripts, as this step is expensive, rebuilding many packages from scratch. This issue went unnoticed for over a year because such random numbers "123" and "456" were created when a package was obfuscated, and that only happened once per package version as long as the build cache was kept intact. When clearing the build cache, or forcing a rebuild with -a, one gets new random numbers, and thus a different binary resulting from the same build input. That's not something that most users would do regularly, and our tests did not cover that edge case either, until now. Fixes #328.
3 years ago
trimpath := flagValue(flags, "-trimpath")
// Add our temporary dir to the beginning of -trimpath, so that we don't
// leak temporary dirs. Needs to be at the beginning, since there may be
// shorter prefixes later in the list, such as $PWD if TMPDIR=$PWD/tmp.
return flagSetValue(flags, "-trimpath", sharedTempDir+"=>;"+trimpath)
avoid reproducibility issues with full rebuilds We were using temporary filenames for modified Go and assembly files. For example, an obfuscated "encoding/json/encode.go" would end up as: /tmp/garble-shared123/encode.go.456.go where "123" and "456" are random numbers, usually longer. This was usually fine for two reasons: 1) We would add "/tmp/garble-shared123/" to -trimpath, so the temporary directory and its random number would be invisible. 2) We would add "//line" directives to the source files, replacing the filename with obfuscated versions excluding any random number. Unfortunately, this broke in multiple ways. Most notably, assembly files do not have any line directives, and it's not clear that there's any support for them. So the random number in their basename could end up in the binary, breaking reproducibility. Another issue is that the -trimpath addition described above was only done for cmd/compile, not cmd/asm, so assembly filenames included the randomized temporary directory. To fix the issues above, the same "encoding/json/encode.go" would now end up as: /tmp/garble-shared123/encoding/json/encode.go Such a path is still unique even though the "456" random number is gone, as import paths are unique within a single build. This fixes issues with the base name of each file, so we no longer rely on line directives as the only way to remove the second original random number. We still rely on -trimpath to get rid of the temporary directory in filenames. To fix its problem with assembly files, also amend the -trimpath flag when running the assembler tool. Finally, add a test that reproducible builds still work when a full rebuild is done. We choose goprivate.txt for such a test as its stdimporter package imports a number of std packages, including uses of assembly and cgo. For the time being, we don't use such a "full rebuild" reproducibility test in other test scripts, as this step is expensive, rebuilding many packages from scratch. This issue went unnoticed for over a year because such random numbers "123" and "456" were created when a package was obfuscated, and that only happened once per package version as long as the build cache was kept intact. When clearing the build cache, or forcing a rebuild with -a, one gets new random numbers, and thus a different binary resulting from the same build input. That's not something that most users would do regularly, and our tests did not cover that edge case either, until now. Fixes #328.
3 years ago
}
// forwardBuildFlags is obtained from 'go help build' as of Go 1.21.
fail if we are unexpectedly overwriting files (#418) While investigating a bug report, I noticed that garble was writing to the same temp file twice. At best, writing to the same path on disk twice is wasteful, as the design is careful to be deterministic and use unique paths. At worst, the two writes could cause races at the filesystem level. To prevent either of those situations, we now create files with os.OpenFile and os.O_EXCL, meaning that we will error if the file already exists. That change uncovered a number of such unintended cases. First, transformAsm would write obfuscated Go files twice. This is because the Go toolchain actually runs: [...]/asm -gensymabis [...] foo.s bar.s [...]/asm [...] foo.s bar.s That is, the first run is only meant to generate symbol ABIs, which are then used by the compiler. We need to obfuscate at that first stage, because the symbol ABI descriptions need to use obfuscated names. However, having already obfuscated the assembly on the first stage, there is no need to do so again on the second stage. If we detect gensymabis is missing, we simply reuse the previous files. This first situation doesn't seem racy, but obfuscating the Go assembly files twice is certainly unnecessary. Second, saveKnownReflectAPIs wrote a gob file to the build cache. Since the build cache can be kept between builds, and since the build cache uses reproducible paths for each build, running the same "garble build" twice could overwrite those files. This could actually cause races at the filesystem level; if two concurrent builds write to the same gob file on disk, one of them could end up using a partially-written file. Note that this is the only of the three cases not using temporary files. As such, it is expected that the file may already exist. In such a case, we simply avoid overwriting it rather than failing. Third, when "garble build -a" was used, and when we needed an export file not listed in importcfg, we would end up calling roughly: go list -export -toolexec=garble -a <dependency> This meant we would re-build and re-obfuscate those packages. Which is unfortunate, because the parent process already did via: go build -toolexec=garble -a <main> The repeated dependency builds tripped the new os.O_EXCL check, as we would try to overwrite the same obfuscated Go files. Beyond being wasteful, this could again cause subtle filesystem races. To fix the problem, avoid passing flags like "-a" to nested go commands. Overall, we should likely be using safer ways to write to disk, be it via either atomic writes or locked files. However, for now, catching duplicate writes is a big step. I have left a self-assigned TODO for further improvements. CI on the pull request found a failure on test-gotip. The failure reproduces on master, so it seems to be related to gotip, and not a regression introduced by this change. For now, disable test-gotip until we can investigate.
3 years ago
var forwardBuildFlags = map[string]bool{
// These shouldn't be used in nested cmd/go calls.
"-a": false,
"-n": false,
"-x": false,
"-v": false,
// These are always set by garble.
"-trimpath": false,
"-toolexec": false,
"-buildvcs": false,
fail if we are unexpectedly overwriting files (#418) While investigating a bug report, I noticed that garble was writing to the same temp file twice. At best, writing to the same path on disk twice is wasteful, as the design is careful to be deterministic and use unique paths. At worst, the two writes could cause races at the filesystem level. To prevent either of those situations, we now create files with os.OpenFile and os.O_EXCL, meaning that we will error if the file already exists. That change uncovered a number of such unintended cases. First, transformAsm would write obfuscated Go files twice. This is because the Go toolchain actually runs: [...]/asm -gensymabis [...] foo.s bar.s [...]/asm [...] foo.s bar.s That is, the first run is only meant to generate symbol ABIs, which are then used by the compiler. We need to obfuscate at that first stage, because the symbol ABI descriptions need to use obfuscated names. However, having already obfuscated the assembly on the first stage, there is no need to do so again on the second stage. If we detect gensymabis is missing, we simply reuse the previous files. This first situation doesn't seem racy, but obfuscating the Go assembly files twice is certainly unnecessary. Second, saveKnownReflectAPIs wrote a gob file to the build cache. Since the build cache can be kept between builds, and since the build cache uses reproducible paths for each build, running the same "garble build" twice could overwrite those files. This could actually cause races at the filesystem level; if two concurrent builds write to the same gob file on disk, one of them could end up using a partially-written file. Note that this is the only of the three cases not using temporary files. As such, it is expected that the file may already exist. In such a case, we simply avoid overwriting it rather than failing. Third, when "garble build -a" was used, and when we needed an export file not listed in importcfg, we would end up calling roughly: go list -export -toolexec=garble -a <dependency> This meant we would re-build and re-obfuscate those packages. Which is unfortunate, because the parent process already did via: go build -toolexec=garble -a <main> The repeated dependency builds tripped the new os.O_EXCL check, as we would try to overwrite the same obfuscated Go files. Beyond being wasteful, this could again cause subtle filesystem races. To fix the problem, avoid passing flags like "-a" to nested go commands. Overall, we should likely be using safer ways to write to disk, be it via either atomic writes or locked files. However, for now, catching duplicate writes is a big step. I have left a self-assigned TODO for further improvements. CI on the pull request found a failure on test-gotip. The failure reproduces on master, so it seems to be related to gotip, and not a regression introduced by this change. For now, disable test-gotip until we can investigate.
3 years ago
"-C": true,
"-asan": true,
"-asmflags": true,
"-buildmode": true,
"-compiler": true,
"-cover": true,
"-covermode": true,
"-coverpkg": true,
"-gccgoflags": true,
"-gcflags": true,
"-installsuffix": true,
"-ldflags": true,
"-linkshared": true,
"-mod": true,
"-modcacherw": true,
"-modfile": true,
"-msan": true,
"-overlay": true,
"-p": true,
"-pgo": true,
"-pkgdir": true,
"-race": true,
"-tags": true,
"-work": true,
"-workfile": true,
}
// booleanFlags is obtained from 'go help build' and 'go help testflag' as of Go 1.21.
var booleanFlags = map[string]bool{
// Shared build flags.
"-a": true,
"-asan": true,
"-buildvcs": true,
"-cover": true,
"-i": true,
"-linkshared": true,
"-modcacherw": true,
"-msan": true,
"-n": true,
"-race": true,
"-trimpath": true,
"-v": true,
"-work": true,
"-x": true,
// Test flags (TODO: support its special -args flag)
"-benchmem": true,
"-c": true,
"-failfast": true,
"-fullpath": true,
"-json": true,
"-short": true,
}
fail if we are unexpectedly overwriting files (#418) While investigating a bug report, I noticed that garble was writing to the same temp file twice. At best, writing to the same path on disk twice is wasteful, as the design is careful to be deterministic and use unique paths. At worst, the two writes could cause races at the filesystem level. To prevent either of those situations, we now create files with os.OpenFile and os.O_EXCL, meaning that we will error if the file already exists. That change uncovered a number of such unintended cases. First, transformAsm would write obfuscated Go files twice. This is because the Go toolchain actually runs: [...]/asm -gensymabis [...] foo.s bar.s [...]/asm [...] foo.s bar.s That is, the first run is only meant to generate symbol ABIs, which are then used by the compiler. We need to obfuscate at that first stage, because the symbol ABI descriptions need to use obfuscated names. However, having already obfuscated the assembly on the first stage, there is no need to do so again on the second stage. If we detect gensymabis is missing, we simply reuse the previous files. This first situation doesn't seem racy, but obfuscating the Go assembly files twice is certainly unnecessary. Second, saveKnownReflectAPIs wrote a gob file to the build cache. Since the build cache can be kept between builds, and since the build cache uses reproducible paths for each build, running the same "garble build" twice could overwrite those files. This could actually cause races at the filesystem level; if two concurrent builds write to the same gob file on disk, one of them could end up using a partially-written file. Note that this is the only of the three cases not using temporary files. As such, it is expected that the file may already exist. In such a case, we simply avoid overwriting it rather than failing. Third, when "garble build -a" was used, and when we needed an export file not listed in importcfg, we would end up calling roughly: go list -export -toolexec=garble -a <dependency> This meant we would re-build and re-obfuscate those packages. Which is unfortunate, because the parent process already did via: go build -toolexec=garble -a <main> The repeated dependency builds tripped the new os.O_EXCL check, as we would try to overwrite the same obfuscated Go files. Beyond being wasteful, this could again cause subtle filesystem races. To fix the problem, avoid passing flags like "-a" to nested go commands. Overall, we should likely be using safer ways to write to disk, be it via either atomic writes or locked files. However, for now, catching duplicate writes is a big step. I have left a self-assigned TODO for further improvements. CI on the pull request found a failure on test-gotip. The failure reproduces on master, so it seems to be related to gotip, and not a regression introduced by this change. For now, disable test-gotip until we can investigate.
3 years ago
func filterForwardBuildFlags(flags []string) (filtered []string, firstUnknown string) {
for i := 0; i < len(flags); i++ {
arg := flags[i]
if strings.HasPrefix(arg, "--") {
arg = arg[1:] // "--name" to "-name"; keep the short form
}
name, _, _ := strings.Cut(arg, "=") // "-name=value" to "-name"
fail if we are unexpectedly overwriting files (#418) While investigating a bug report, I noticed that garble was writing to the same temp file twice. At best, writing to the same path on disk twice is wasteful, as the design is careful to be deterministic and use unique paths. At worst, the two writes could cause races at the filesystem level. To prevent either of those situations, we now create files with os.OpenFile and os.O_EXCL, meaning that we will error if the file already exists. That change uncovered a number of such unintended cases. First, transformAsm would write obfuscated Go files twice. This is because the Go toolchain actually runs: [...]/asm -gensymabis [...] foo.s bar.s [...]/asm [...] foo.s bar.s That is, the first run is only meant to generate symbol ABIs, which are then used by the compiler. We need to obfuscate at that first stage, because the symbol ABI descriptions need to use obfuscated names. However, having already obfuscated the assembly on the first stage, there is no need to do so again on the second stage. If we detect gensymabis is missing, we simply reuse the previous files. This first situation doesn't seem racy, but obfuscating the Go assembly files twice is certainly unnecessary. Second, saveKnownReflectAPIs wrote a gob file to the build cache. Since the build cache can be kept between builds, and since the build cache uses reproducible paths for each build, running the same "garble build" twice could overwrite those files. This could actually cause races at the filesystem level; if two concurrent builds write to the same gob file on disk, one of them could end up using a partially-written file. Note that this is the only of the three cases not using temporary files. As such, it is expected that the file may already exist. In such a case, we simply avoid overwriting it rather than failing. Third, when "garble build -a" was used, and when we needed an export file not listed in importcfg, we would end up calling roughly: go list -export -toolexec=garble -a <dependency> This meant we would re-build and re-obfuscate those packages. Which is unfortunate, because the parent process already did via: go build -toolexec=garble -a <main> The repeated dependency builds tripped the new os.O_EXCL check, as we would try to overwrite the same obfuscated Go files. Beyond being wasteful, this could again cause subtle filesystem races. To fix the problem, avoid passing flags like "-a" to nested go commands. Overall, we should likely be using safer ways to write to disk, be it via either atomic writes or locked files. However, for now, catching duplicate writes is a big step. I have left a self-assigned TODO for further improvements. CI on the pull request found a failure on test-gotip. The failure reproduces on master, so it seems to be related to gotip, and not a regression introduced by this change. For now, disable test-gotip until we can investigate.
3 years ago
buildFlag := forwardBuildFlags[name]
if buildFlag {
filtered = append(filtered, arg)
} else {
firstUnknown = name
}
if booleanFlags[arg] || strings.Contains(arg, "=") {
// Either "-bool" or "-name=value".
continue
}
// "-name value", so the next arg is part of this flag.
if i++; buildFlag && i < len(flags) {
filtered = append(filtered, flags[i])
}
}
return filtered, firstUnknown
}
// splitFlagsFromFiles splits args into a list of flag and file arguments. Since
// we can't rely on "--" being present, and we don't parse all flags upfront, we
// rely on finding the first argument that doesn't begin with "-" and that has
// the extension we expect for the list of paths.
//
// This function only makes sense for lower-level tool commands, such as
// "compile" or "link", since their arguments are predictable.
//
// We iterate from the end rather than from the start, to better protect
// oursrelves from flag arguments that may look like paths, such as:
//
// compile [flags...] -p pkg/path.go [more flags...] file1.go file2.go
//
// For now, since those confusing flags are always followed by more flags,
// iterating in reverse order works around them entirely.
func splitFlagsFromFiles(all []string, ext string) (flags, paths []string) {
for i := len(all) - 1; i >= 0; i-- {
arg := all[i]
if strings.HasPrefix(arg, "-") || !strings.HasSuffix(arg, ext) {
cutoff := i + 1 // arg is a flag, not a path
return all[:cutoff:cutoff], all[cutoff:]
}
}
return nil, all
5 years ago
}
// flagValue retrieves the value of a flag such as "-foo", from strings in the
// list of arguments like "-foo=bar" or "-foo" "bar". If the flag is repeated,
// the last value is returned.
func flagValue(flags []string, name string) string {
lastVal := ""
flagValueIter(flags, name, func(val string) {
lastVal = val
})
return lastVal
}
// flagValueIter retrieves all the values for a flag such as "-foo", like
// flagValue. The difference is that it allows handling complex flags, such as
// those whose values compose a list.
func flagValueIter(flags []string, name string, fn func(string)) {
for i, arg := range flags {
if val := strings.TrimPrefix(arg, name+"="); val != arg {
// -name=value
fn(val)
}
if arg == name { // -name ...
if i+1 < len(flags) {
// -name value
fn(flags[i+1])
}
}
}
}
func flagSetValue(flags []string, name, value string) []string {
for i, arg := range flags {
if strings.HasPrefix(arg, name+"=") {
// -name=value
flags[i] = name + "=" + value
return flags
}
if arg == name { // -name ...
if i+1 < len(flags) {
// -name value
flags[i+1] = value
return flags
}
return flags
}
}
return append(flags, name+"="+value)
}
func fetchGoEnv() error {
out, err := exec.Command("go", "env", "-json",
// Keep in sync with sharedCache.GoEnv.
"GOOS", "GOMOD", "GOVERSION", "GOROOT",
).CombinedOutput()
if err != nil {
// TODO: cover this in the tests.
fmt.Fprintf(os.Stderr, `Can't find the Go toolchain: %v
This is likely due to Go not being installed/setup correctly.
To install Go, see: https://go.dev/doc/install
`, err)
return errJustExit(1)
}
if err := json.Unmarshal(out, &sharedCache.GoEnv); err != nil {
return fmt.Errorf(`cannot unmarshal from "go env -json": %w`, err)
}
sharedCache.GOGARBLE = cmp.Or(os.Getenv("GOGARBLE"), "*") // we default to obfuscating everything
return nil
}