# garble

    go install mvdan.cc/garble@latest

Obfuscate Go code by wrapping the Go toolchain. Requires Go 1.20 or later.

    garble build [build flags] [packages]

The tool also supports `garble test` to run tests with obfuscated code,
`garble run` to obfuscate and execute simple programs,
and `garble reverse` to de-obfuscate text such as stack traces.
Run `garble -h` to see all available commands and flags.
You can also use `go install mvdan.cc/garble@master` to install the latest development version.
### Purpose
Produce a binary that works as well as a regular build, but that has as little
information about the original source code as possible.
The tool is designed to be:
* Coupled with `cmd/go`, to support modules and build caching
* Deterministic and reproducible, given the same initial source code
* Reversible given the original source, to de-obfuscate panic stack traces
### Mechanism
The tool wraps calls to the Go compiler and linker to transform the Go build, in
order to:
* Replace as many useful identifiers as possible with short base64 hashes
* Replace package paths with short base64 hashes
* Replace filenames and position information with short base64 hashes
* Remove all [build](https://go.dev/pkg/runtime/#Version) and [module](https://go.dev/pkg/runtime/debug/#ReadBuildInfo) information
* Strip debugging information and symbol tables via `-ldflags="-w -s"`
* [Obfuscate literals](#literal-obfuscation), if the `-literals` flag is given
* Remove [extra information](#tiny-mode), if the `-tiny` flag is given
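To make the "short base64 hashes" idea above concrete, here is a minimal sketch. It is not garble's actual algorithm: the `hashName` helper, the use of SHA-256, the salt format, and the 8-character length are all assumptions chosen for illustration.

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"fmt"
)

// hashName is a hypothetical helper, not garble's real implementation.
// It derives a short, deterministic replacement for name from a salt.
func hashName(salt, name string) string {
	sum := sha256.Sum256([]byte(salt + "\x00" + name))
	// RawURLEncoding only produces letters, digits, '-' and '_'.
	enc := base64.RawURLEncoding.EncodeToString(sum[:])
	// Keep the result short; 8 characters is an arbitrary choice.
	short := []byte(enc[:8])
	for i, c := range short {
		if c == '-' {
			short[i] = '_' // '-' is not allowed in Go identifiers
		}
	}
	if short[0] >= '0' && short[0] <= '9' {
		short[0] = 'z' // identifiers cannot start with a digit
	}
	return string(short)
}

func main() {
	// The same salt and name always give the same replacement.
	fmt.Println(hashName("build-inputs", "LoadConfig"))
	fmt.Println(hashName("build-inputs", "LoadConfig"))
	// A different salt gives an unrelated replacement.
	fmt.Println(hashName("other-inputs", "LoadConfig"))
}
```

The point of the salt is that the replacement name cannot be recomputed from the original name alone.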
By default, the tool obfuscates all the packages being built.
You can manually specify which packages to obfuscate via `GOGARBLE`,
a comma-separated list of glob patterns matching package path prefixes.
This format is borrowed from `GOPRIVATE`; see `go help private`.
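As a rough illustration of how a `GOPRIVATE`-style pattern list selects packages, the sketch below matches each glob against the leading path elements of a package path. The `matchPrefixPatterns` function and the example paths are hypothetical, and garble's own matching may differ in detail.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// matchPrefixPatterns reports whether any glob in the comma-separated
// patterns list matches a leading run of path elements in target.
// This is an illustrative sketch, not garble's implementation.
func matchPrefixPatterns(patterns, target string) bool {
	elems := strings.Split(target, "/")
	for _, glob := range strings.Split(patterns, ",") {
		if glob == "" {
			continue
		}
		// A glob with N slashes is compared against the first N+1 elements.
		n := strings.Count(glob, "/") + 1
		if len(elems) < n {
			continue
		}
		prefix := strings.Join(elems[:n], "/")
		if ok, err := path.Match(glob, prefix); err == nil && ok {
			return true
		}
	}
	return false
}

func main() {
	patterns := "example.com/private,*.corp.example.com"
	for _, pkg := range []string{
		"example.com/private/api",
		"git.corp.example.com/team/tool",
		"example.com/public",
	} {
		fmt.Printf("%-32s obfuscate=%v\n", pkg, matchPrefixPatterns(patterns, pkg))
	}
}
```

Under these assumptions, running with `GOGARBLE=example.com/private,*.corp.example.com` would obfuscate only the first two packages.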
Note that commands like `garble build` will use the `go` version found in your
`$PATH`. To use different versions of Go, you can
[install them](https://go.dev/doc/manage-install#installing-multiple)
and set up `$PATH` with them. For example, for Go 1.20.1:
```sh
$ go install golang.org/dl/go1.20.1@latest
$ go1.20.1 download
$ PATH=$(go1.20.1 env GOROOT)/bin:${PATH} garble build
```
### Use cases
A common question is why a code obfuscator is needed for Go, a compiled language.
Go binaries include a surprising amount of information about the original source;
even with debug information and symbol tables stripped, many names and positions
remain in place for the sake of traces, reflection, and debugging.
Some use cases for Go require sharing a Go binary with the end user.
If the source code for the binary is private or requires a purchase,
its obfuscation can help discourage reverse engineering.
A similar use case is a Go library whose source is private or purchased.
Since Go libraries cannot be imported in binary form, and Go plugins
[have their shortcomings](https://github.com/golang/go/issues/19282),
sharing obfuscated source code becomes an option.
See [#369](https://github.com/burrowers/garble/issues/369).
Obfuscation can also help with aspects entirely unrelated to licensing.
For example, the `-tiny` flag can make binaries about 15% smaller,
similar to the [common practice in Android](https://developer.android.com/build/shrink-code#obfuscate) to reduce app sizes.
Obfuscation has also helped some open source developers work around
anti-virus scans incorrectly treating Go binaries as malware.
### Literal obfuscation
Using the `-literals` flag causes literal expressions such as strings to be
replaced with more complex expressions, resolving to the same value at run-time.
String literals injected via `-ldflags=-X` are also replaced by this flag.
This feature is opt-in, as it can cause slow-downs depending on the input code.
Literals used in constant expressions cannot be obfuscated, since they are
resolved at compile time. This includes any expressions part of a `const`
declaration, for example.
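The sketch below illustrates both points. The XOR scheme is a hand-written stand-in, not what `-literals` actually generates, and `xorDecode`, `plainGreeting`, and `maskedGreeting` are hypothetical names.

```go
package main

import "fmt"

// version is part of a const declaration, so it is resolved at compile
// time and, per the text above, cannot be obfuscated.
const version = "v1.0"

// xorDecode rebuilds a string at run-time from masked bytes and a key.
func xorDecode(masked, key []byte) string {
	out := make([]byte, len(masked))
	for i, b := range masked {
		out[i] = b ^ key[i%len(key)]
	}
	return string(out)
}

// plainGreeting returns a string literal in a non-constant expression.
// Literals like this one are candidates for obfuscation.
func plainGreeting() string {
	return "hello"
}

// maskedGreeting resolves to the same value at run-time, but the bytes
// of "hello" are not stored verbatim in the source.
func maskedGreeting() string {
	key := []byte{0x5a, 0x13}
	return xorDecode([]byte{0x32, 0x76, 0x36, 0x7f, 0x35}, key)
}

func main() {
	fmt.Println(version)
	fmt.Println(plainGreeting() == maskedGreeting())
}
```

The extra decoding work at run-time is the kind of cost that makes the feature opt-in.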
### Tiny mode
With the `-tiny` flag, even more information is stripped from the Go binary.
Position information is removed entirely, rather than being obfuscated.
Runtime code which prints panics, fatal errors, and trace/debug info is removed.
Many symbol names are also omitted from binary sections at link time.
All in all, this can make binaries about 15% smaller.
With this flag, no panics or fatal runtime errors will ever be printed, but they
can still be handled internally with `recover` as normal. In addition, the
`GODEBUG` environment variable will be ignored.
Note that this flag can make debugging crashes harder, as a panic will simply
exit the entire program without printing a stack trace, and source code
positions and many names are removed.
Similarly, `garble reverse` is generally not useful in this mode.
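Since an unhandled panic prints nothing in this mode, code that needs diagnostics has to recover and report errors itself. A minimal sketch of that pattern follows; `safeDivide` is a hypothetical example function.

```go
package main

import (
	"fmt"
	"os"
)

// safeDivide converts a run-time panic, such as an integer division by
// zero, into an ordinary error by using recover.
func safeDivide(a, b int) (result int, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	return a / b, nil
}

func main() {
	if _, err := safeDivide(1, 0); err != nil {
		// Report the failure explicitly rather than relying on the
		// runtime to print the panic.
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
}
```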
### Control flow obfuscation
See: [CONTROLFLOW.md](docs/CONTROLFLOW.md)
### Speed
`garble build` should take about twice as long as `go build`, as it needs to
complete two builds: the original build, which loads and type-checks the
input code, and then the obfuscated build.
Garble obfuscates one package at a time, mirroring how Go compiles one package
at a time. This allows Garble to fully support Go's build cache; incremental
`garble build` calls should only re-build and re-obfuscate modified code.
Note that the first call to `garble build` may be comparatively slow,
as it has to obfuscate each package for the first time. This is akin to clearing
`GOCACHE` with `go clean -cache` and running a `go build` from scratch.
Garble also makes use of its own cache to reuse work, akin to Go's `GOCACHE`.
It defaults to a directory under your user's cache directory,
such as `~/.cache/garble`, and can be placed elsewhere by setting `GARBLE_CACHE`.
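The lookup order described above can be sketched as follows. `resolveCacheDir` is a hypothetical helper written for this example; it only assumes that `GARBLE_CACHE` takes precedence and that the default is a `garble` directory under the user's cache directory.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// resolveCacheDir is a hypothetical helper that mirrors the documented
// behavior: GARBLE_CACHE wins if set, otherwise a "garble" directory
// under the user's cache directory is used. The lookups are passed in
// as functions so the logic can be exercised without touching the
// real environment.
func resolveCacheDir(getenv func(string) string, userCacheDir func() (string, error)) (string, error) {
	if dir := getenv("GARBLE_CACHE"); dir != "" {
		return dir, nil
	}
	base, err := userCacheDir()
	if err != nil {
		return "", err
	}
	return filepath.Join(base, "garble"), nil
}

func main() {
	dir, err := resolveCacheDir(os.Getenv, os.UserCacheDir)
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot locate cache directory:", err)
		os.Exit(1)
	}
	fmt.Println(dir)
}
```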
### Determinism and seeds
Just like Go, garble builds are deterministic and reproducible in nature.
This has significant benefits, such as caching builds and being able to use
`garble reverse` to de-obfuscate stack traces.
By default, garble will obfuscate each package in a unique way,
which will change if its build input changes: the version of garble, the version
of Go, the package's source code, or any build parameter such as `GOOS` or `-tags`.
This is a reasonable default since guessing those inputs is very hard.
You can use the `-seed` flag to provide your own obfuscation randomness seed.
Reusing the same seed can help produce the same code obfuscation,
which can help when debugging or reproducing problems.
Regularly rotating the seed can also help against reverse-engineering in the long run,
as otherwise one can look at changes in how Go's standard library is obfuscated
to guess when the Go or garble versions were changed across a series of builds.
To always use a different seed for each build, use `-seed=random`.
Note that extra care should be taken when using custom seeds:
if a `-seed` value used in a build is lost, `garble reverse` will not work.
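To see why a lost seed defeats reversal, consider this simplified model. The `obfuscate` and `reverse` functions are hypothetical stand-ins, not garble's implementation: names are hashed together with the seed, and reversing means rebuilding the same mapping from the original names.

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"fmt"
	"strings"
)

// obfuscate is a hypothetical stand-in for name hashing; garble's real
// scheme differs. The "x" prefix keeps the result a valid identifier start.
func obfuscate(seed, name string) string {
	sum := sha256.Sum256([]byte(seed + "\x00" + name))
	return "x" + base64.RawURLEncoding.EncodeToString(sum[:6])
}

// reverse rewrites obfuscated names in text back to the originals.
// It needs both the original names and the seed used for the build,
// because the mapping is recomputed rather than stored.
func reverse(seed string, names []string, text string) string {
	pairs := make([]string, 0, 2*len(names))
	for _, n := range names {
		pairs = append(pairs, obfuscate(seed, n), n)
	}
	return strings.NewReplacer(pairs...).Replace(text)
}

func main() {
	names := []string{"LoadConfig", "parseFlags"}
	trace := "panic in " + obfuscate("seed-A", "LoadConfig")

	// With the seed used for the build, the original name is restored.
	fmt.Println(reverse("seed-A", names, trace))
	// With any other seed, no mapping matches and the text is unchanged.
	fmt.Println(reverse("seed-B", names, trace))
}
```

Determinism is what makes this work: nothing about the mapping needs to be shipped with the binary, as long as the same inputs are available later.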
### Caveats
Most of these can improve with time and effort. The purpose of this section is
to document the current shortcomings of this tool.
* Exported methods are never obfuscated at the moment, since they could
  be required by interfaces. This area is a work in progress; see
  [#3](https://github.com/burrowers/garble/issues/3).

* Garble automatically detects which Go types are used with reflection
  to avoid obfuscating them, as that might break your program.
  Note that Garble obfuscates [one package at a time](#speed),
  so if your reflection code inspects a type from an imported package,
  you may need to add a "hint" in the imported package to exclude obfuscating it:
  ```go
  type Message struct {
  	Command string
  	Args    string
  }

  // Never obfuscate the Message type.
  var _ = reflect.TypeOf(Message{})
  ```

* Aside from `GOGARBLE` to select patterns of packages to obfuscate,
  and the hint above with `reflect.TypeOf` to exclude obfuscating particular types,
  there is no supported way to exclude obfuscating a selection of files or packages.
  More often than not, a user would want to do this to work around a bug; please file the bug instead.

* Go programs [are initialized](https://go.dev/ref/spec#Program_initialization) one package at a time,
  where imported packages are always initialized before their importers,
  and otherwise they are initialized in the lexical order of their import paths.
  Since garble obfuscates import paths, this lexical order may change arbitrarily.

* Go plugins are not currently supported; see [#87](https://github.com/burrowers/garble/issues/87).

* Garble requires `git` to patch the linker. That can be avoided once go-gitdiff
  supports [non-strict patches](https://github.com/bluekeyes/go-gitdiff/issues/30).
### Contributing
We welcome new contributors. If you would like to contribute, see
[CONTRIBUTING.md](CONTRIBUTING.md) as a starting point.