Go’s sync package provides powerful tools for concurrent programming that can significantly improve application performance when used correctly. I’ve worked extensively with these primitives and found them essential for building high-performance systems. Let me share practical techniques for optimizing Go applications using the sync package.
Object Pooling with sync.Pool
One of the most effective ways to improve performance in Go applications is reducing garbage collection pressure. The sync.Pool helps achieve this by recycling temporary objects instead of constantly allocating and deallocating memory.
I’ve found sync.Pool particularly useful for objects that are frequently created and destroyed during request processing, such as buffers, temporary structures, and work objects.
// Create a pool of byte buffers
var bufferPool = sync.Pool{
    New: func() interface{} {
        // Default size for new buffers
        return make([]byte, 4096)
    },
}

func processRequest(data []byte) []byte {
    // Get a buffer from the pool
    buf := bufferPool.Get().([]byte)
    // Important: return the buffer to the pool when done
    defer bufferPool.Put(buf)
    // Reset buffer to ensure clean state
    buf = buf[:0]
    // Use buffer for processing...
    for _, b := range data {
        buf = append(buf, b+1)
    }
    // Return a copy of the result since the buffer goes back to the pool
    result := make([]byte, len(buf))
    copy(result, buf)
    return result
}
When implementing this pattern, remember that an object returned to the pool may be handed to any other goroutine on a later Get, and the pool is free to drop objects during garbage collection. Always reset pooled objects before use and never keep references to them after calling Put.
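One side note: putting a plain []byte into a pool boxes the slice header into an interface{}, which itself allocates on every Put. A common workaround is to pool a pointer type such as *bytes.Buffer instead. Here is a minimal sketch of that variant; the bufPool and process names are just for illustration:

import (
    "bytes"
    "sync"
)

var bufPool = sync.Pool{
    New: func() interface{} {
        return new(bytes.Buffer)
    },
}

func process(data []byte) []byte {
    buf := bufPool.Get().(*bytes.Buffer)
    buf.Reset() // previous contents are unspecified, so always reset first
    defer bufPool.Put(buf)

    for _, b := range data {
        buf.WriteByte(b + 1)
    }
    // Copy the result out; the buffer's memory is reused after Put
    out := make([]byte, buf.Len())
    copy(out, buf.Bytes())
    return out
}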
Fine-Grained Locking with Multiple Mutexes
Using a single lock for an entire data structure can create contention. I’ve improved throughput by splitting resources into smaller sections with separate locks.
type UserCache struct {
    // Separate mutex for each shard to reduce contention
    shards     [256]map[string]User
    shardLocks [256]sync.Mutex
}

func NewUserCache() *UserCache {
    uc := &UserCache{}
    for i := range uc.shards {
        uc.shards[i] = make(map[string]User)
    }
    return uc
}

func (uc *UserCache) getShardIndex(key string) uint8 {
    // Simple hash function to determine shard
    if len(key) == 0 {
        return 0
    }
    return uint8(key[0])
}

func (uc *UserCache) Get(key string) (User, bool) {
    idx := uc.getShardIndex(key)
    uc.shardLocks[idx].Lock()
    defer uc.shardLocks[idx].Unlock()
    user, ok := uc.shards[idx][key]
    return user, ok
}

func (uc *UserCache) Set(key string, user User) {
    idx := uc.getShardIndex(key)
    uc.shardLocks[idx].Lock()
    defer uc.shardLocks[idx].Unlock()
    uc.shards[idx][key] = user
}
This sharded approach allows concurrent access to different parts of the cache. The key benefit is that operations on different shards never block each other.
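The single-byte hash above only distributes well when keys start with varied characters; if your keys share a common prefix, most traffic lands on a few shards. A hedged sketch of an alternative shard function using FNV-1a (shardIndexFNV is a hypothetical replacement for getShardIndex, and the same idea reappears in the cache example later):

import "hash/fnv"

// shardIndexFNV spreads keys across shards even when they share a prefix
func (uc *UserCache) shardIndexFNV(key string) uint8 {
    h := fnv.New32a()
    h.Write([]byte(key))    // Write on a hash.Hash never returns an error
    return uint8(h.Sum32()) // truncating keeps the low 8 bits: one of 256 shards
}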
Read-Write Locks for Read-Heavy Workloads
When working with data that’s read frequently but updated rarely, I’ve achieved substantial performance gains using sync.RWMutex instead of regular mutexes.
type ConfigStore struct {
    mu      sync.RWMutex
    configs map[string]string
}

func NewConfigStore() *ConfigStore {
    return &ConfigStore{
        configs: make(map[string]string),
    }
}

func (cs *ConfigStore) Get(key string) (string, bool) {
    // Multiple readers can acquire the read lock simultaneously
    cs.mu.RLock()
    defer cs.mu.RUnlock()
    val, ok := cs.configs[key]
    return val, ok
}

func (cs *ConfigStore) Set(key, value string) {
    // Writers need exclusive access
    cs.mu.Lock()
    defer cs.mu.Unlock()
    cs.configs[key] = value
}
The RWMutex allows multiple goroutines to read simultaneously while ensuring writes have exclusive access. This pattern shines in scenarios with many readers and few writers.
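One caveat: a read lock cannot be upgraded to a write lock. For read-mostly "check then insert" flows, I release the read lock, take the write lock, and re-check. A sketch with a hypothetical GetOrSet helper:

// GetOrSet returns the existing value for key, or stores and returns def.
// Note the re-check after taking the write lock: another writer may have
// set the key between releasing the read lock and acquiring the write lock.
func (cs *ConfigStore) GetOrSet(key, def string) string {
    cs.mu.RLock()
    if val, ok := cs.configs[key]; ok {
        cs.mu.RUnlock()
        return val
    }
    cs.mu.RUnlock()

    cs.mu.Lock()
    defer cs.mu.Unlock()
    if val, ok := cs.configs[key]; ok {
        return val
    }
    cs.configs[key] = def
    return def
}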
Lock-Free Counters with atomic Package
For simple counters and flags, locks can be overkill. The atomic package provides faster, lock-free alternatives:
type RequestStats struct {
    totalRequests  int64
    activeRequests int64
    errors         int64
}

func (s *RequestStats) IncrementTotal() {
    atomic.AddInt64(&s.totalRequests, 1)
}

func (s *RequestStats) RequestStarted() {
    atomic.AddInt64(&s.activeRequests, 1)
}

func (s *RequestStats) RequestCompleted() {
    atomic.AddInt64(&s.activeRequests, -1)
}

func (s *RequestStats) RecordError() {
    atomic.AddInt64(&s.errors, 1)
}

func (s *RequestStats) GetStats() (total, active, errors int64) {
    // Each load is atomic on its own, but the three values are read
    // separately, so this is not a single consistent snapshot
    total = atomic.LoadInt64(&s.totalRequests)
    active = atomic.LoadInt64(&s.activeRequests)
    errors = atomic.LoadInt64(&s.errors)
    return
}
Atomic operations avoid the overhead of locking and unlocking mutexes, making them ideal for high-frequency counter operations.
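If you are on Go 1.19 or later, the typed atomics in sync/atomic are harder to misuse: no &field plumbing, and 64-bit alignment is handled for you on 32-bit platforms. A sketch of the same stats with atomic.Int64 (the TypedRequestStats name is just for illustration):

import "sync/atomic"

// The same counters using the typed atomics added in Go 1.19
type TypedRequestStats struct {
    totalRequests  atomic.Int64
    activeRequests atomic.Int64
    errors         atomic.Int64
}

func (s *TypedRequestStats) IncrementTotal()   { s.totalRequests.Add(1) }
func (s *TypedRequestStats) RequestStarted()   { s.activeRequests.Add(1) }
func (s *TypedRequestStats) RequestCompleted() { s.activeRequests.Add(-1) }
func (s *TypedRequestStats) RecordError()      { s.errors.Add(1) }

func (s *TypedRequestStats) GetStats() (total, active, errors int64) {
    // Each load is atomic on its own; the three together are still not one snapshot
    return s.totalRequests.Load(), s.activeRequests.Load(), s.errors.Load()
}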
Thread-Safe Lazy Initialization with sync.Once
Initializing resources only when needed can improve startup time, but doing so safely in concurrent environments can be tricky. The sync.Once structure solves this elegantly:
type ExpensiveResource struct {
    connection *Connection
    once       sync.Once
}

func (r *ExpensiveResource) GetConnection() *Connection {
    // Initialize the connection exactly once, regardless of concurrent calls
    r.once.Do(func() {
        r.connection = createExpensiveConnection()
    })
    return r.connection
}

func createExpensiveConnection() *Connection {
    // Simulate expensive work
    time.Sleep(2 * time.Second)
    return &Connection{}
}
This pattern ensures the initialization code runs exactly once, even with multiple goroutines trying to access the resource simultaneously.
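If you can target Go 1.21 or later, sync.OnceValue wraps the same pattern: it returns a function that computes the value on its first call and hands back the cached result afterwards. A minimal sketch, where getConnection and handleRequest are hypothetical names:

// getConnection runs createExpensiveConnection once, on first call,
// and returns the cached *Connection on every later call
var getConnection = sync.OnceValue(func() *Connection {
    return createExpensiveConnection()
})

func handleRequest() {
    conn := getConnection() // safe to call from many goroutines concurrently
    _ = conn
}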
Coordinating Goroutines with WaitGroup
When spawning multiple goroutines for parallel work, I often need to wait for all of them to complete. The sync.WaitGroup provides a clean, efficient way to do this:
func ProcessUserData(userIDs []string) []UserResult {
    var wg sync.WaitGroup
    results := make([]UserResult, len(userIDs))
    // Process each user ID concurrently
    for i, id := range userIDs {
        wg.Add(1)
        go func(index int, userID string) {
            defer wg.Done()
            // Perform work and store the result
            results[index] = fetchUserData(userID)
        }(i, id)
    }
    // Wait for all goroutines to complete
    wg.Wait()
    return results
}

func fetchUserData(id string) UserResult {
    // Simulate an API call or database query
    time.Sleep(time.Duration(rand.Intn(100)) * time.Millisecond)
    return UserResult{ID: id, Name: "User " + id}
}
WaitGroups are more efficient than channels when you only need synchronization without communication between goroutines.
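When the input slice is large, spawning one goroutine per element can overwhelm downstream services, so I usually bound the number of goroutines in flight. A sketch combining a WaitGroup with a buffered channel used as a semaphore; the limit of 8 and the ProcessUserDataBounded name are arbitrary choices for illustration:

func ProcessUserDataBounded(userIDs []string) []UserResult {
    var wg sync.WaitGroup
    results := make([]UserResult, len(userIDs))
    sem := make(chan struct{}, 8) // at most 8 fetches in flight; tune for your workload

    for i, id := range userIDs {
        wg.Add(1)
        sem <- struct{}{} // acquire a slot before spawning
        go func(index int, userID string) {
            defer wg.Done()
            defer func() { <-sem }() // release the slot
            results[index] = fetchUserData(userID)
        }(i, id)
    }
    wg.Wait()
    return results
}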
Concurrent Map with sync.Map
Go’s built-in maps aren’t safe for concurrent use. While you can protect a map with a mutex, the sync.Map type offers better performance for certain access patterns:
type UserSession struct {
    // Built-in thread safety without additional locks
    sessions sync.Map
}

func (us *UserSession) Get(sessionID string) (Session, bool) {
    value, ok := us.sessions.Load(sessionID)
    if !ok {
        return Session{}, false
    }
    return value.(Session), true
}

func (us *UserSession) Set(sessionID string, session Session) {
    us.sessions.Store(sessionID, session)
}

func (us *UserSession) Delete(sessionID string) {
    us.sessions.Delete(sessionID)
}

func (us *UserSession) ForEach(f func(key string, value Session) bool) {
    us.sessions.Range(func(key, value interface{}) bool {
        return f(key.(string), value.(Session))
    })
}
The sync.Map is optimized for two common use cases: (1) when keys are written once but read many times, or (2) when multiple goroutines read, write, and overwrite entries for disjoint sets of keys.
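Note that a separate Load followed by Store can race when two goroutines create a session for the same ID at the same time; LoadOrStore performs the check and the insert as one atomic operation. A sketch with a hypothetical GetOrCreate helper:

// GetOrCreate returns the existing session for sessionID, or stores and
// returns fresh if no session exists yet; the check and insert are atomic
func (us *UserSession) GetOrCreate(sessionID string, fresh Session) Session {
    actual, _ := us.sessions.LoadOrStore(sessionID, fresh)
    return actual.(Session)
}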
Measuring and Benchmarking Synchronization Options
The most important optimization technique is measuring performance in your specific use case. I always benchmark different sync options before choosing one:
func BenchmarkMutexMap(b *testing.B) {
    var mu sync.Mutex
    m := make(map[int]int)
    b.RunParallel(func(pb *testing.PB) {
        counter := 0
        for pb.Next() {
            mu.Lock()
            m[counter%100] = counter
            mu.Unlock()
            counter++
        }
    })
}

func BenchmarkSyncMap(b *testing.B) {
    var m sync.Map
    b.RunParallel(func(pb *testing.PB) {
        counter := 0
        for pb.Next() {
            m.Store(counter%100, counter)
            counter++
        }
    })
}

func BenchmarkShardedMap(b *testing.B) {
    shards := make([]map[int]int, 16)
    locks := make([]sync.Mutex, 16)
    for i := range shards {
        shards[i] = make(map[int]int)
    }
    b.RunParallel(func(pb *testing.PB) {
        counter := 0
        for pb.Next() {
            key := counter % 100
            shardIndex := key % 16
            locks[shardIndex].Lock()
            shards[shardIndex][key] = counter
            locks[shardIndex].Unlock()
            counter++
        }
    })
}
Run these benchmarks with go test -bench=. -benchmem to see which approach performs best for your workload.
Practical Application: Building a Thread-Safe Cache
Let me demonstrate how to combine these techniques in a real-world application - a high-performance, thread-safe cache with expiration:
type Cache struct {
    shards     [256]map[string]cacheEntry
    shardLocks [256]sync.RWMutex
    pool       sync.Pool // For temporary buffers
    janitor    *time.Ticker
    stopChan   chan struct{}
}

type cacheEntry struct {
    value      interface{}
    expiration time.Time
}

func NewCache(cleanupInterval time.Duration) *Cache {
    cache := &Cache{
        janitor:  time.NewTicker(cleanupInterval),
        stopChan: make(chan struct{}),
        pool: sync.Pool{
            New: func() interface{} {
                return make([]string, 0, 10)
            },
        },
    }
    // Initialize shards
    for i := range cache.shards {
        cache.shards[i] = make(map[string]cacheEntry)
    }
    // Start cleanup goroutine
    go cache.cleanup()
    return cache
}

func (c *Cache) shardIndex(key string) uint8 {
    h := fnv.New32a()
    h.Write([]byte(key))
    return uint8(h.Sum32() % 256)
}

func (c *Cache) Set(key string, value interface{}, ttl time.Duration) {
    idx := c.shardIndex(key)
    c.shardLocks[idx].Lock()
    defer c.shardLocks[idx].Unlock()
    expiration := time.Now().Add(ttl)
    c.shards[idx][key] = cacheEntry{
        value:      value,
        expiration: expiration,
    }
}

func (c *Cache) Get(key string) (interface{}, bool) {
    idx := c.shardIndex(key)
    c.shardLocks[idx].RLock()
    defer c.shardLocks[idx].RUnlock()
    entry, found := c.shards[idx][key]
    if !found {
        return nil, false
    }
    // Check if expired
    if time.Now().After(entry.expiration) {
        return nil, false
    }
    return entry.value, true
}

func (c *Cache) cleanup() {
    for {
        select {
        case <-c.janitor.C:
            c.removeExpired()
        case <-c.stopChan:
            c.janitor.Stop()
            return
        }
    }
}
func (c *Cache) removeExpired() {
    now := time.Now()
    for i := range c.shards {
        // Get a buffer from the pool for keys to delete
        keysToDelete := c.pool.Get().([]string)
        keysToDelete = keysToDelete[:0] // Reset slice while keeping capacity
        // Find expired entries with the read lock
        c.shardLocks[i].RLock()
        for k, v := range c.shards[i] {
            if now.After(v.expiration) {
                keysToDelete = append(keysToDelete, k)
            }
        }
        c.shardLocks[i].RUnlock()
        // Delete expired entries with the write lock if any were found
        if len(keysToDelete) > 0 {
            c.shardLocks[i].Lock()
            for _, k := range keysToDelete {
                // Re-check under the write lock: the entry may have been
                // refreshed between releasing the read lock and acquiring this one
                if entry, ok := c.shards[i][k]; ok && now.After(entry.expiration) {
                    delete(c.shards[i], k)
                }
            }
            c.shardLocks[i].Unlock()
        }
        // Return the buffer to the pool
        c.pool.Put(keysToDelete)
    }
}

func (c *Cache) Close() {
    close(c.stopChan)
}
This cache implements several optimization techniques:
- Sharding with fine-grained locks to reduce contention
- Read-write locks to allow concurrent reads
- Object pooling to reduce garbage collection
- Background cleanup to avoid blocking operations
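A quick usage sketch of the cache above, assuming fmt and time are imported; the key and TTL values are arbitrary:

func main() {
    cache := NewCache(1 * time.Minute) // sweep expired entries once a minute
    defer cache.Close()

    cache.Set("user:42", "Alice", 30*time.Second)

    if v, ok := cache.Get("user:42"); ok {
        fmt.Println("cache hit:", v)
    } else {
        fmt.Println("cache miss")
    }
}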
Memory Synchronization and the Go Memory Model
When using synchronization primitives, it’s essential to understand Go’s memory model. Proper synchronization ensures not just mutual exclusion but also memory visibility across goroutines.
var data []string
var initialized int32

func initData() {
    if atomic.LoadInt32(&initialized) == 0 {
        doInit()
    }
}

func doInit() {
    // Not safe: two goroutines can both see initialized == 0 and
    // write data concurrently, which is a data race
    data = []string{"a", "b", "c"}
    atomic.StoreInt32(&initialized, 1)
}
The above code has a race condition: nothing stops two goroutines from both observing initialized == 0 and running doInit at the same time, so the writes to data race with each other. Sprinkling atomic operations over a check-then-act sequence doesn't make it safe. Use sync.Once instead:
var data []string
var initOnce sync.Once

func initData() {
    initOnce.Do(func() {
        data = []string{"a", "b", "c"}
    })
}
This ensures the initialization runs exactly once and that its effects are visible to every goroutine that subsequently calls initData.
In my experience, performance optimization with Go’s sync package is about selecting the right tool for each scenario while understanding the trade-offs. Start with simple, readable code, measure performance, then apply these techniques to address specific bottlenecks.
The best synchronization is often the one you don’t need - design your systems to minimize shared state where possible. When shared state is necessary, choose the least restrictive synchronization that ensures correctness, and always verify with benchmarks in your specific use case.