Commit Graph

38 Commits (master)

Author SHA1 Message Date
Daniel Martí 9cb4a6f0c8 amend panic message after decodeHash got renamed 3 months ago
Daniel Martí 6f0e46f80b strip struct tags when hashing structs for type identity
This was a long standing TODO, and a user finally ran into it.
The fix isn't terribly straightforward, but I'm pretty happy with it.

Add a test case where the same struct field is identical
with no tag, with the "tagged1" json tag, and again with "tagged2".
While here, we add a test case for a regular named field too.

Fixes #801.
6 months ago
Daniel Martí 978fd6d518 appease Go 1.22's stricter base64 sanity checks
We were using an alphabet with a duplicate character on purpose.
Go 1.21 was perfectly fine with that, but 1.22 started noticing:

    panic: encoding alphabet includes duplicate symbols

I can't fault the new sanity check, because it makes sense in general.
What we are doing here is slightly bizarre, because we don't care
about decoding the name hashes at all.

Appease the sanity check by replacing dashes with duplicate characters
as a follow-up step. While here, use "a" rather than "z",
which is more common and less likely to be noticeable.
7 months ago
pagran 0e2e483472
add control flow obfuscation
Implemented control flow flattening with additional features such as block splitting and junk jumps
11 months ago
Daniel Martí cee53a7868 make GarbleActionID a full sha256 hash
This is in preparation for the switch to Go's cache package,
whose ActionID type is also a full sha256 hash with 32 bytes.
We were using "short" hashes as shown by `go tool buildid`,
since that was consistent and 15 bytes was generally enough.
1 year ago
Daniel Martí 414e3b7f70 tidy our build ID hash code a bit
First, rename "component" to "hash", since it's shorter and more useful.
A full build ID is two or four hashes joined with slashes.

Second, add sanity checks that buildIDHashLength is being followed.
Otherwise the use of []byte could lead to human error.

Third, move all the hash encoding and decoding logic together.
1 year ago
Daniel Martí 0c9a59127a rename cache global to sharedCache
Since we will start importing github.com/rogpeppe/go-internal/cache,
and I don't want to have to rename it or leave confusion around.
1 year ago
pagran 86b7e334ba
implement funcInfo.entryoff encryption
At linker stage, we now encrypt funcInfo.entryoff value with a simple algorithm (1 xor + 1 mul). 
This makes it harder to relate function metadata (e.g. name) to function itself in binary, almost without affecting performance.
1 year ago
pagran 9a8608f061
internal/literals: add benchmark to measure the run-time overhead 1 year ago
pagran 6ace03322f
patch and rebuild cmd/link to modify the magic value in pclntab
This value is hard-coded in the linker and written in a header.
We could rewrite the final binary, like we used to do with import paths,
but that would require once again maintaining libraries to do so.

Instead, we're now modifying the linker to do what we want.
It's not particularly hard, as every Go install has its source code,
and rebuilding a slightly modified linker only takes a few seconds at most.

Thanks to `go build -overlay`, we only need to copy the files we modify,
and right now we're just modifying one file in the toolchain.
We use a git patch, as the change is fairly static and small,
and the patch is easier to understand and maintain.

The other side of this change is in the runtime,
as it also hard-codes the magic value when loading information.
We modify the code via syntax trees in that case, like `-tiny` does,
because the change is tiny (one literal) and the affected lines of code
are modified regularly between major Go releases.

Since rebuilding a slightly modified linker can take a few seconds,
and Go's build cache does not cache linked binaries,
we keep our own cached version of the rebuilt binary in `os.UserCacheDir`.

The feature isn't perfect, and will be improved in the future.
See the TODOs about the added dependency on `git`,
or how we are currently only able to cache one linker binary at once.

Fixes #622.
1 year ago
Daniel Martí 41df1f8725 slightly reduce the range of hashed name lengths
v0.8.0 was improved to obfuscate names so that they didn't all end up
with the same length. We went from a length fixed to 8 to a range
between 8 and 15.

Since the new mechanism chooses a length evenly between 8 and 15,
the average hashed name length went from 8 to 11.5.

While this improved obfuscation, it also increased the size of
obfuscated binaries by quite a bit. We do need to use a reasonably large
length to avoid collisions within packages, but we also don't want it to
be large enough to cause too much bloat.

Reduce the minimum and maximum from 8 and 15 to 6 and 12. This means
that the average goes fro 11.5 to 9, much closer to the old average.
We could consider a range of 4 to 12, but 4 bytes is short enough where
collisions become likely in practice, even if it's just the minimum.
And a range of 6 to 10 shrinks the range a bit too much.

	name      old time/op         new time/op         delta
	Build-16          20.6s ± 0%          20.6s ± 0%    ~     (p=0.421 n=5+5)

	name      old bin-B           new bin-B           delta
	Build-16          5.77M ± 0%          5.66M ± 0%  -1.92%  (p=0.008 n=5+5)

	name      old cached-time/op  new cached-time/op  delta
	Build-16          705ms ± 4%          703ms ± 2%    ~     (p=0.690 n=5+5)

	name      old mallocs/op      new mallocs/op      delta
	Build-16          25.0M ± 0%          25.0M ± 0%  -0.08%  (p=0.008 n=5+5)

	name      old sys-time/op     new sys-time/op     delta
	Build-16          8.11s ± 2%          8.11s ± 2%    ~     (p=0.841 n=5+5)

Updates #618.
1 year ago
Daniel Martí 7ead5998bc drop shadowed and unused global consts
I forgot these in the recent hash length randomness refactor.
1 year ago
Daniel Martí 481e3a1f09 default to GOGARBLE=*, stop using GOPRIVATE
We can drop the code that kicked in when GOGARBLE was empty.
We can also add the value in addGarbleToHash unconditionally,
as we never allow it to be empty.

In the tests, remove all GOGARBLE lines where it just meant "obfuscate
everything" or "obfuscate the entire main module".

cgo.txtar had "obfuscate everything" as a separate step,
so remove it entirely.

linkname.txtar started failing because the imported package did not
import strings, so listPackage errored out. This wasn't a problem when
strings itself wasn't obfuscated, as transformLinkname silently left
strings.IndexByte untouched. It is a problem when IndexByte does get
obfuscated. Make that kind of listPackage error visible, and fix it.

reflect.txtar started failing with "unreachable method" runtime throws.
It's not clear to me why; it appears that GOGARBLE=* makes the linker
think that ExportedMethodName is suddenly unreachable.
Work around the problem by making the method explicitly reachable,
and leave a TODO as a reminder to investigate.

Finally, gogarble.txtar no longer needs to test for GOPRIVATE.
The rest of the test is left the same, as we still want the various
values for GOGARBLE to continue to work just like before.

Fixes #594.
1 year ago
Daniel Martí b6a0284f84 immprove how hashWithCustomSalt comes up with its random lengths
The last change made it so that hashWithCustomSalt does not always end
up with 8 base64 characters, which is a good change for the sake of
avoiding easy patterns in obfuscated code.

However, the docs weren't updated accordingly, and it wasn't
particularly clear that the byte giving us randomness wasn't part of the
resulting base64-encoded name.

First, refactor the code to only feed as many checksum bytes to the
base64 encoder as necessary, which is 12.
This shrinks b64NameBuffer and saves us some base64 encoding work.

Second, use the first checksum byte that we don't use, the 13th,
as the source of the randomness.
Note how before we used a base64-encoded byte for the randomness,
which isn't great as that byte was only one of 63 characters,
whereas a checksum byte is one of 256.

Third, update the docs so that the code is as clear as possible.
This is particularly important given that we have no tests.

With debug prints in the gogarble.txt test, we can see that the
randomness in hash lengths is working as intended:

	# test/main/stdimporter
	hashLength = 13
	hashLength = 8
	hashLength = 12
	hashLength = 15
	hashLength = 10
	hashLength = 15
	hashLength = 9
	hashLength = 8
	hashLength = 15
	hashLength = 15
	hashLength = 12
	hashLength = 10
	hashLength = 13
	hashLength = 13
	hashLength = 8
	hashLength = 15
	hashLength = 11

Finally, add a regression test that will complain if we end up with
hashed names that reuse the same length too often.
Out of eight hashed names, the test will fail if six end up with the
same length, as that is incredibly unlikely given that each should pick
one of eight lengths with a fair distribution.
2 years ago
Azrotronik 73b77ce6be
randomize the length of hashes used for identifiers and filenames
Otherwise all of those names share the same exact length,
which can be a rather easy pattern to spot that garble was used.
2 years ago
Daniel Martí c1c90fee13 make obfuscation fully deterministic with -seed
The default behavior of garble is to seed via the build inputs,
including the build IDs of the entire Go build of each package.
This works well as a default, and does give us determinism,
but it means that building for different platforms
will result in different obfuscation per platform.

Instead, when -seed is provided, don't use any other hash seed or salt.
This means that a particular Go name will be obfuscated the same way
as long as the seed, package path, and name itself remain constant.

In other words, when the user supplies a custom -seed,
we assume they know what they're doing in terms of storage and rotation.

Expand the README docs with more examples and detail.

Fixes #449.
2 years ago
Daniel Martí ad87a0e2bc don't let -debug affect the build cache hashes
Noticed while debugging #492,
as adding -debug to a previously run command would unexpectedly
rebuild all packages in the build, as if -a was given.

While here, remove commented out testscript that were kept in error.
2 years ago
Daniel Martí 70b1cb2fd8 CI: start enforcing vet and staticcheck
Fix a staticcheck warning about unused code,
as well as an unparam warning and a missing copyright header.

We also bump the action versions to their latest releases,
and drop unnecessary "name" fields for self-describing steps.

Note that we drop the "go env" commands, as setup-go does that now.

Finally, I did briefly try to add caching,
but then realised it didn't help us at all. Document why.
2 years ago
Daniel Martí cffb5acd11 remove another allocation per hashed name
We use a buffer to turn a hash into a name via base64.
We can reuse that buffer to save one repeated allocation.
The returned value is still unique, as we convert to a string.

	name      old time/op         new time/op         delta
	Build-16          9.26s ± 3%          9.39s ± 2%    ~     (p=0.151 n=5+5)

	name      old bin-B           new bin-B           delta
	Build-16          5.16M ± 0%          5.16M ± 0%    ~     (all equal)

	name      old cached-time/op  new cached-time/op  delta
	Build-16          314ms ± 5%          324ms ± 6%    ~     (p=0.310 n=5+5)

	name      old mallocs/op      new mallocs/op      delta
	Build-16          29.8M ± 0%          29.4M ± 0%  -1.38%  (p=0.008 n=5+5)

	name      old sys-time/op     new sys-time/op     delta
	Build-16          4.70s ± 4%          4.66s ± 2%    ~     (p=0.548 n=5+5)
2 years ago
Daniel Martí cdc1efd95b avoid allocating twice for every name we hash
name      old time/op         new time/op         delta
	Build-16          9.25s ± 2%          9.26s ± 3%    ~     (p=0.841 n=5+5)

	name      old bin-B           new bin-B           delta
	Build-16          5.16M ± 0%          5.16M ± 0%    ~     (all equal)

	name      old cached-time/op  new cached-time/op  delta
	Build-16          316ms ± 5%          314ms ± 5%    ~     (p=1.000 n=5+5)

	name      old mallocs/op      new mallocs/op      delta
	Build-16          30.7M ± 0%          29.8M ± 0%  -2.90%  (p=0.008 n=5+5)

	name      old sys-time/op     new sys-time/op     delta
	Build-16          4.78s ± 4%          4.70s ± 4%    ~     (p=0.690 n=5+5)
2 years ago
Daniel Martí ff803004d0 avoid build ID mismatches when using -debugdir
A recent change added the -debugdir value to addGarbleToHash,
which is part of the hash seed for all obfuscation taking place.

In principle, it was okay to add, just like any other garble flag.
In practice, it broke the added test case:

	> garble -debugdir ./debug1 build
	[stderr]
	# test/main
	FBi9xa6e.(*ac0bCOhR).String: relocation target FBi9xa6e.rV55e6H9 not defined
	qmECK6zf.init.0: relocation target qmECK6zf.eUU08z98 not defined
	[...]

This is because -debugdir gets turned into an absolute path,
and not all garble processes ended up using it consistently.

The fix is rather simple; since -debugdir doesn't affect obfuscation,
don't include it in the build hash seeding at all.

Fixes #451.
2 years ago
Daniel Martí 06d5972223
add -debug to aid in debugging (#431)
Not really meant for end users,
but they might still debug failures before filing bugs.

We add the -debug flag itself,
as well as machinery to deduplicate output lines.
There are quite a lot of them otherwise,
which aren't helpful and simply add noise.

In the future, if we always want to output a debug log line,
such as "choosing not to obfuscate here because X",
we can simply insert the unique position string.

Finally, turn all commented-out log.Printf calls to debugf.
Improve a few log lines to be more human-friendly,
and also add a few extras like how long it takes to load files.

We can improve the logging further in the future.
This seems like a good starting point.
2 years ago
Daniel Martí 5f74a1c9f0 unify the definition and storage of flag values
The parent garble process parses the original flags,
as provided by the user via the command line.
Previously, those got stored in the shared cache file,
so that child processes spawned by toolexec could see them.

Unfortunately, this made the code relatively easy to misuse.
A child process would always see flagLiterals as zero value,
given that it should never see such a flag argument directly.
Similarly, one would have to be careful with cached options,
as they could only be consumed after the cache file is loaded.

Simplify the situation by deduplicating the storage of flags.
Now, the parent passes all flags onto children via toolexec.

One exception is GarbleDir, which now becomes an env var.
This seems in line with other top-level dirs like GARBLE_SHARED.

Finally, we turn -seed into a flag.Value,
which lets us implement its "set" behavior as part of flag.Parse.

Overall, we barely reduce the amount of code involved,
but we certainly remove a couple of footguns.
As part of the cleanup, we also introduce appendFlags.
2 years ago
Daniel Martí fceb19f6da
deprecate using GOPRIVATE in favor of GOGARBLE (#427)
Piggybacking off of GOPRIVATE is great for a number of reasons:

* People tend to obfuscate private code, whose package paths will
  generally be in GOPRIVATE already

* Its meaning and syntax are well understood

* It allows all the flexibility we need without adding our own env var
  or config option

However, using GOPRIVATE directly has one main drawback.
It's fairly common to also want to obfuscate public dependencies,
to make the code in private packages even harder to follow.
However, using "GOPRIVATE=*" will result in two main downsides:

* GONOPROXY defaults to GOPRIVATE, so the proxy would be entirely disabled.
  Downloading modules, such as when adding or updating dependencies,
  or when the local cache is cold, can be less reliable.

* GONOSUMDB defaults to GOPRIVATE, so the sumdb would be entirely disabled.
  Adding entries to go.sum, such as when adding or updating dependencies,
  can be less secure.

We will continue to consume GOPRIVATE as a fallback,
but we now expect users to set GOGARBLE instead.
The new logic is documented in the README.

While here, rewrite some uses of "private" with "to obfuscate",
to make the code easier to follow and harder to misunderstand.

Fixes #276.
2 years ago
Daniel Martí b5bef981ee
stop relying on nested "go list -toolexec" calls (#422)
We rely on importcfg files to load type info for obfuscated packages.
We use this type information to remember what names we didn't obfuscate.
Unfortunately, indirect dependencies aren't listed in importcfg files,
so we relied on extra "go list -toolexec" calls to locate object files.

This worked fine, but added a significant amount of complexity.
The extra "go list -export -toolexec=garble" invocations weren't slow,
as they avoided rebuilding or re-obfuscating thanks to the build cache.
Still, it was hard to reason about how garble runs during a build
if we might have multiple layers of -toolexec invocations.

Instead, record the export files we encounter in an incremental map,
and persist it in the build cache via the gob file we're already using.
This way, each garble invocation knows where all object files are,
even those for indirect imports.

One wrinkle is that importcfg files can point to temporary object files.
In that case, figure out its final location in the build cache.
This requires hard-coding a bit of knowledge about how GOCACHE works,
but it seems relatively harmless given how it's very little code.
Plus, if GOCACHE ever changes, it will be obvious when our code breaks.

Finally, add a TODO about potentially saving even more work.
3 years ago
Daniel Martí 29356f30f7 update runtimeAndDeps for Go 1.18
In particular, internal/abi now has some actual code,
so obfuscating those literals was breaking as expected.
Document how to update the list in the future as well.

The change above gets "go test" to just one test failure on:

	go version devel go1.18-578ada410d Tue Nov 9 22:58:24 2021 +0000 linux/amd64

We also move the doc about why we disable GarbleLiterals,
so that it's next to where the disabling happens.

While here, we also rename GarbleLiterals to ObfuscateLiterals,
as we have been trying to move away from "to garble" as a verb.

Finally, limit the verbosity of diffoscope.
One test was failing for me, and diffoscope printed thousands of lines.
Not particularly useful when I'm trying to skim test results.
Usually, seeing a few dozen lines of output is enough.

Updates #385.
3 years ago
Daniel Martí c9b0b07853 hash field names equally in all packages
Packages P1 and P2 can define identical struct types T1 and T2, and one
can convert from type T1 to T2 or vice versa.

The spec defines two identical struct types as:

	Two struct types are identical if they have the same sequence of
	fields, and if corresponding fields have the same names, and
	identical types, and identical tags. Non-exported field names
	from different packages are always different.

Unfortunately, garble broke this: since we obfuscated field names
differently depending on the package, cross-package conversions like the
case above would result in typechecking errors.

To fix this, implement Joe Tsai's idea: hash struct field names with the
string representation of the entire struct. This way, identical struct
types will have their field names obfuscated in the same way in all
packages across a build.

Note that we had to refactor "reverse" a bit to start using transformer,
since now it needs to keep track of struct types as well.

This failure was affecting the build of google.golang.org/protobuf,
since it makes regular use of cross-package struct conversions.

Note that the protobuf module still fails to build, but for other
reasons. The package that used to fail now succeeds, so the build gets a
bit further than before. #240 tracks adding relevant third-party Go
modules to CI, so we'll track the other remaining failures there.

Fixes #310.
3 years ago
Daniel Martí 3afc993266 use "go env -json" to collect env info all at once
In the worst case scenario, when GOPRIVATE isn't set at all, we would
run these three commands:

* "go env GOPRIVATE", to fetch GOPRIVATE itself
* "go list -m", for GOPRIVATE's fallback
* "go version", to check the version of Go being used

Now that we support Go 1.16 and later, all these three can be obtained
via "go env -json":

	$ go env -json GOPRIVATE GOMOD GOVERSION
	{
		"GOMOD": "/home/mvdan/src/garble/go.mod",
		"GOPRIVATE": "",
		"GOVERSION": "go1.16.3"
	}

Note that we don't get the module path directly, but we can use the
x/mod/modfile Go API to parse it from the GOMOD file cheaply.

Notably, this also simplifies our Go version checking logic, as now we
get just the version string without the "go version" prefix and
"GOOS/GOARCH" suffix we don't care about.

This makes our code a bit more maintainable and robust. When running a
short incremental build, we can also see a small speed-up, as saving two
"go" invocations can save a few milliseconds:

	name           old time/op       new time/op       delta
	Build/Cache-8        168ms ± 0%        166ms ± 1%  -1.26%  (p=0.009 n=6+6)

	name           old bin-B         new bin-B         delta
	Build/Cache-8        6.36M ± 0%        6.36M ± 0%  +0.12%  (p=0.002 n=6+6)

	name           old sys-time/op   new sys-time/op   delta
	Build/Cache-8        222ms ± 2%        219ms ± 3%    ~     (p=0.589 n=6+6)

	name           old user-time/op  new user-time/op  delta
	Build/Cache-8        857ms ± 1%        846ms ± 1%  -1.31%  (p=0.041 n=6+6)
3 years ago
Daniel Martí 10ec00b37a
make flags like -literals and GOPRIVATE affect hashing (#288)
In 6898d61637, we switched from using action IDs from "go list
-toolexec=garble" to those from the original "go list". We still wanted
the obfuscation and hashing to change if the version of garble changes,
so we hashed that "original action ID" with garble's own content ID, and
called the new hash "garble action ID".

While working on a different patch, I noticed something weird: with the
new mechanism, adding or removing flags like -literals did not alter
those hashes, unlike the old method. This is because the old method used
ownContentID, which includes such bits of information, but the new
method does not.

Change that, and add a test that locks in the behavior we want. In
seed.txt, we check that a single function name gets hashed in particular
ways in different scenarios.

Note that we use a mix of "cmp" and "! bincmp", since the former has no
negated form.

While at it, the seed.txt test is revamped a bit. Now, we only run with
-literals once, as this test is mainly about -seed. We also declare seed
strings once, as environment variables, which makes it easier to track
what each step is doing.
3 years ago
Daniel Martí b03cd08c09
avoid one more call to 'go tool buildid' (#253)
We use it to get the content ID of garble's binary, which is used for
both the garble action IDs, as well as 'go tool compile -V=full'.

Since those two happen in separate processes, both used to call 'go tool
buildid' separately. Store it in the gob cache the first time, and reuse
it the second time.

Since each call to cmd/go costs about 10ms (new process, running its
many init funcs, etc), this results in a nice speed-up for our small
benchmark. Most builds will take many seconds though, so note that a
~15ms speedup there will likely not be noticeable.

While at it, simplify the buildInfo global, as now it just contains a
map representation of the -importcfg contents. It now has better names,
docs, and a simpler representation.

We also stop using the term "garbled import", as it was a bit confusing.
"obfuscated types.Package" is a much better description.

	name     old time/op       new time/op       delta
	Build-8        106ms ± 1%         92ms ± 0%  -14.07%  (p=0.010 n=6+4)

	name     old bin-B         new bin-B         delta
	Build-8        6.60M ± 0%        6.60M ± 0%   -0.01%  (p=0.002 n=6+6)

	name     old sys-time/op   new sys-time/op   delta
	Build-8        208ms ± 5%        149ms ± 3%  -28.27%  (p=0.004 n=6+5)

	name     old user-time/op  new user-time/op  delta
	Build-8        433ms ± 3%        384ms ± 3%  -11.35%  (p=0.002 n=6+6)
3 years ago
Daniel Martí 6898d61637
start using original action IDs (#251)
When we obfuscate a name, what we do is hash the name with the action ID
of the package that contains the name. To ensure that the hash changes
if the garble tool changes, we used the action ID of the obfuscated
build, which is different than the original action ID, as we include
garble's own content ID in "go tool compile -V=full" via -toolexec.

Let's call that the "obfuscated action ID". Remember that a content ID
is roughly the hash of a binary or object file, and an action ID
contains the hash of a package's source code plus the content IDs of its
dependencies.

This had the advantage that it did what we wanted. However, it had one
massive drawback: when we compile a package, we only have the obfuscated
action IDs of its dependencies. This is because one can't have the
content ID of dependent packages before they are built.

Usually, this is not a problem, because hashing a foreign name means it
comes from a dependency, where we already have the obfuscated action ID.
However, that's not always the case.

First, go:linkname directives can point to any symbol that ends up in
the binary, even if the package is not a dependency. So garble could
only support linkname targets belonging to dependencies. This is at the
root of why we could not obfuscate the runtime; it contains linkname
directives targeting the net package, for example, which depends on runtime.

Second, some other places did not have an easy access to obfuscated
action IDs, like transformAsm, which had to recover it from a temporary
file stored by transformCompile.

Plus, this was all pretty expensive, as each toolexec sub-process had to
make repeated calls to buildidOf with the object files of dependencies.
We even had to use extra calls to "go list" in the case of indirect
dependencies, as their export files do not appear in importcfg files.

All in all, the old method was complex and expensive. A better mechanism
is to use the original action IDs directly, as listed by "go list"
without garble in the picture.

This would mean that the hashing does not change if garble changes,
meaning weaker obfuscation. To regain that property, we define the
"garble action ID", which is just the original action ID hashed together
with garble's own content ID.

This is practically the same as the obfuscated build ID we used before,
but since it doesn't go through "go tool compile -V=full" and the
obfuscated build itself, we can work out *all* the garble action IDs
upfront, before the obfuscated build even starts.

This fixes all of our problems. Now we know all garble build IDs
upfront, so a bunch of hacks can be entirely removed. Plus, since we
know them upfront, we can also cache them and avoid repeated calls to
"go tool buildid".

While at it, make use of the new BuildID field in Go 1.16's "list -json
-export". This avoids the vast majority of "go tool buildid" calls, as
the only ones that remain are 2 on the garble binary itself.

The numbers for Go 1.16 look very good:

	name     old time/op       new time/op       delta
	Build-8        146ms ± 4%        101ms ± 1%  -31.01%  (p=0.002 n=6+6)

	name     old bin-B         new bin-B         delta
	Build-8        6.61M ± 0%        6.60M ± 0%   -0.09%  (p=0.002 n=6+6)

	name     old sys-time/op   new sys-time/op   delta
	Build-8        321ms ± 7%        202ms ± 6%  -37.11%  (p=0.002 n=6+6)

	name     old user-time/op  new user-time/op  delta
	Build-8        538ms ± 4%        414ms ± 4%  -23.12%  (p=0.002 n=6+6)
3 years ago
Daniel Martí a223147093
use more bits for the obfuscated name hashes (#248)
We've been using four base64 characters for obfuscated names for a
while. And that has mostly worked, since most packages only have up to a
few hundred exported or unexported names at a time.

However, we have already encountered two collisions in the wild, which
can be reproduced with one seed but not another:

	[...] PsaN.hQyW is a field, not a method

	[...] byte is not a type

In both of those cases, we happened to run into a collision by chance.
And that's not terribly unlikely to begin with; even with just 100
names, the probability of a collision was about 0.03%. It dramatically
goes up if there are more names; with 500, we're already around 0.75%.

It's clear that four base64 chars is not enough to properly avoid
collisions in the vast majority of cases. But how many characters are
enough? The target should be that, even with a very large package and
lots of names, we should still practically never have a collision.

I did some basic estimation with "lots of names" being ten thousand,
with "practically never" being a one in a million chance. We need to go
all the way up to eight characters to reach that probability.

It's entirely possible that 7 or even 6 characters would be enough for
most users. However, collisions result in confusing errors which are
also hard to reproduce for us unless we can use exactly the same seed
and source code for a build.

So, play it safe, and use 8 characters. The constant now also has
documentation explaining how we arrived at that figure.
3 years ago
Daniel Martí 840cf9b68d
make hashWith a bit smarter (#238)
It used to return five-byte strings, and now it returns four bytes with
nearly the same number of bits of entropy.

It also avoids the exported vs unexported dance if the name isn't an
identifier, which is now common with import paths.

See the added docs for more details.
3 years ago
Daniel Martí e64fccd367
better document and position the hash base64 encoding (#234)
We now document why we use a custom base64 charset.

The old "b64" name was also too generic, so it might have been misused
for other purposes.
3 years ago
Daniel Martí 79c775e218
obfuscate unexported names like exported ones (#227)
In 90fa325da7, the obfuscation logic was changed to use hashes for
exported names, but incremental names starting at just one letter for
unexported names. Presumably, this was done for the sake of binary size.

I argue that this is not a good idea for the default mode for a number
of reasons:

1) It makes reversing of stack traces nearly impossible for unexported
   names, since replacing an obfuscated name "c" with "originalName"
   would trigger too many false positives by matching single characters.

2) Exported and unexported names aren't different. We need to know how
   names were obfuscated at a later time in both cases, thanks to use
   cases like -ldflags=-X. Using short names for one but not the other
   doesn't make a lot of sense, and makes the logic inconsistent.

3) Shaving off three bytes for unexported names doesn't seem like a huge
   deal for the default mode, when we already have -tiny to optimize for
   size.

This saves us a bit of work, but most importantly, simplifies the
obfuscation state as we no longer need to carry privateNameMap between
the compile and link stages.

	name     old time/op       new time/op       delta
	Build-8        153ms ± 2%        150ms ± 2%    ~     (p=0.065 n=6+6)

	name     old bin-B         new bin-B         delta
	Build-8        7.09M ± 0%        7.08M ± 0%  -0.24%  (p=0.002 n=6+6)

	name     old sys-time/op   new sys-time/op   delta
	Build-8        296ms ± 5%        277ms ± 6%  -6.50%  (p=0.026 n=6+6)

	name     old user-time/op  new user-time/op  delta
	Build-8        562ms ± 1%        558ms ± 3%    ~     (p=0.329 n=5+6)

Note that I do not oppose using short names for both exported and
unexported names in the future for -tiny, since reversing of stack
traces will by design not work there. The code can be resurrected from
the git history if we want to improve -tiny that way in the future, as
we'd need to store state in header files again.

Another major cleanup we can do here is to no longer use the
garbledImports map. From a look at obfuscateImports, we hash a package's
import path with its action ID, much like exported names, so we can
simply re-do that hashing for the linker's -X flag.

garbledImports does have some logic to handle duplicate package names,
but it's worth noting that should not affect package paths, as they are
always unique. That area of code could probably do with some
simplification in the future, too.

While at it, make hashWith panic if either parameter is empty.
obfuscateImports was hashing the main package path without a salt due to
a bug, so we want to catch those in the future.

Finally, make some tiny spacing and typo tweaks to the README.
3 years ago
Daniel Martí 249501b5e9
fix garbling names belonging to indirect imports (#203)
main.go includes a lengthy comment that documents this edge case, why it
happened, and how we are fixing it. To summarize, we should no longer
error with a build error in those cases. Read the comment for details.

A few other minor changes were done to allow writing this patch.

First, the actionID and contentID funcs were renamed, since they started
to collide with variable names.

Second, the logging has been improved a bit, which allowed me to debug
the issue.

Third, the "cache" global shared by all garble sub-processes now
includes the necessary parameters to run "go list -toolexec", including
the path to garble and the build flags being used.

Thanks to lu4p for writing a test case, which also applied gofmt to that
testdata Go file.

Fixes #180.
Closes #181, since it includes its test case.
3 years ago
lu4p cf290b8e6d
Share data between processes via a shared file. (#192)
Previously garble heavily used env vars to share data between processes.
This also makes it easy to share complex data between processes.

The complexity of main.go is considerably reduced.
4 years ago
Daniel Martí dfa622fe50
simplify globals, split hash.go (#191)
The previous globals worked, but were unnecessarily complex. For
example, we passed the fromPath variable around, but it's really a
static global, since we only compile or link a single package in each Go
process. Use such global variables instead of passing them around, which
currently include the package's import path, its build ID, and its
import config path.

Also split all the hashing and build ID code into hash.go, since that's
a relatively well contained 200 lines of code that doesn't need to make
main.go any bigger. We also split the code to alter Go's own version to
a separate function, so that it can be moved out of main.go as well.
4 years ago