
The Practices That Separate Good Developers from Great Ones

What I learned from studying engineering cultures at Google, Netflix, Amazon, Stripe, and others—and why most teams never adopt these habits

Sathyan · 20 min read

Most developers know the basics. Write tests. Review code. Use version control. Follow style guides.

But there's a vast gap between knowing these things exist and practicing them with the discipline that separates good engineering teams from exceptional ones.

I've spent years studying how top engineering organizations operate—not their tech stacks, but their habits. The rituals. The standards they refuse to compromise on. The practices that seem excessive until you understand why they exist.

What follows isn't a checklist. It's an attempt to document what world-class engineering actually looks like in practice, and why most teams never get there.

Part One: How You Think About Code

Design Documents Before Code

At Google, significant changes require a design document before any code is written. Not a quick outline—a thorough document that explains the problem, evaluates alternatives, identifies risks, and proposes a solution.

This feels slow. It is slow, intentionally.

The design doc process catches architectural mistakes when they're cheap to fix—before thousands of lines of code exist. It forces clarity of thought. It creates a written record of why decisions were made, invaluable when someone asks "why does this work this way?" two years later.

A design doc isn't a formality. It's a thinking tool. The act of writing forces you to confront ambiguities you'd otherwise discover mid-implementation.

A good design doc answers:

Context: What problem are we solving? Why now? What happens if we don't solve it?

Goals and Non-Goals: What's in scope? Equally important—what's explicitly out of scope?

Options Considered: What alternatives exist? Why were they rejected? This section is where most design docs fail—they present the chosen solution as if it were obvious, without showing the reasoning.

Proposed Solution: The technical approach, with enough detail that another engineer could implement it.

Risks and Mitigations: What could go wrong? How will you know if it's going wrong? What's the rollback plan?

Open Questions: What don't you know yet? What decisions are deferred?

Versions of Google's internal design doc template have been shared publicly and widely adapted. Stripe, Uber, and most mature engineering organizations have similar processes. If your team doesn't write design docs, you're debugging architecture decisions in production.

Architecture Decision Records

Design docs capture point-in-time decisions. Architecture Decision Records (ADRs) capture the ongoing evolution of your system's architecture.

An ADR is a short document—often a single page—that records a significant architectural decision: the context, the decision, the consequences.

# ADR-0042: Use PostgreSQL for Order Service

## Status
Accepted

## Context
The Order Service needs a persistent data store. We evaluated PostgreSQL, 
MySQL, and DynamoDB. Order data has complex relational queries (order 
items, customer history, inventory joins) that favor relational databases.

## Decision
We will use PostgreSQL 15 with read replicas for the Order Service.

## Consequences
- Operational overhead of managing PostgreSQL (mitigated by using managed RDS)
- Team needs PostgreSQL expertise (two engineers have production experience)
- Enables complex queries without denormalization
- Limits horizontal write scaling (acceptable for projected order volume)

The power of ADRs is the historical record. When a new team member asks "why PostgreSQL instead of DynamoDB?", the answer is documented. When someone proposes a migration, they can read the original context and understand what constraints have changed.

Michael Nygard's original ADR proposal remains the best introduction.

Technical Debt as a First-Class Concept

Every engineering organization accumulates technical debt. The difference between good and great teams is whether they manage it deliberately or let it accumulate unconsciously.

Stripe maintains a "technical debt registry"—a documented inventory of known debt, its impact, and remediation plans. This isn't a backlog that gets ignored. It's reviewed regularly. Debt is prioritized alongside features.

Technical debt isn't bad code. It's the gap between the system you have and the system you need. Sometimes that gap is acceptable. Sometimes it's not. The key is knowing which is which.

Categorize debt by type:

Deliberate debt: Shortcuts taken consciously, with documented tradeoffs. "We're using a simple polling mechanism instead of webhooks because we need to ship by Q3. We'll revisit in Q1."

Accidental debt: Debt accumulated through ignorance or changing requirements. The code was fine when written; the world changed.

Bit rot: Systems that degraded through neglect—outdated dependencies, deprecated APIs, abandoned tests.

Each type requires different remediation. Deliberate debt needs scheduled payback. Accidental debt needs discovery and assessment. Bit rot needs continuous maintenance.

Amazon's practice of writing "Working Backwards" documents—starting with the press release for a feature before building it—serves a similar purpose. It forces clarity about what you're building and why, reducing the debt that comes from building the wrong thing.


Part Two: Code Review as a Discipline

Beyond "Looks Good To Me"

Code review at most companies is a speed bump. At Google, it's a teaching institution.

Google's code review process is rigorous to the point that outsiders find it excessive. Every change requires approval from an owner of the affected code. Reviews focus not just on correctness, but on readability, maintainability, and adherence to style guides. Reviewers are expected to be thorough; authors are expected to respond thoughtfully to every comment.

This slows things down. That's the point.

The benefits compound over time: consistent codebases, shared knowledge across teams, fewer bugs reaching production, junior engineers learning from seniors through review feedback.

Code review isn't a gate to pass. It's a conversation about how to make the code better.

What to look for as a reviewer:

Correctness: Does the code do what it claims? Are edge cases handled? Are there race conditions?

Design: Is this the right abstraction? Does it fit the existing architecture? Will it be easy to modify?

Readability: Can someone unfamiliar with this code understand it? Are variable names clear? Is the logic straightforward?

Tests: Are there tests? Do they test the right things? Could the implementation change without the tests catching it?

Security: Are inputs validated? Are there injection risks? Is authentication/authorization handled correctly?

Performance: Are there obvious inefficiencies? Database queries in loops? Unnecessary allocations?

What to do as an author:

Write small changes. Google's data shows that reviews of 200+ lines take disproportionately longer than smaller ones and catch fewer issues per line. Break large changes into reviewable chunks.

Write good descriptions. Explain what the change does, why it's needed, and how to verify it works. Link to design docs, tickets, or related changes.

Respond to every comment. Even if the response is "Done" or "I disagree because X." Silence is ambiguous.

The Readability Process

Google has a concept called "readability"—a certification that an engineer can write idiomatic code in a given language. Until you have readability, your code in that language requires approval from someone who does.

This seems bureaucratic until you've worked in a codebase where everyone writes Python like it's Java, or JavaScript like it's C++. Language idioms exist because they make code easier to understand for people who know that language.

You don't need a formal readability process. But you need someone on every review who knows the language well enough to catch non-idiomatic patterns.

Pair Programming and Mob Programming

Code review is asynchronous feedback. Pair programming is synchronous collaboration.

Pivotal Labs (now VMware Tanzu Labs) built their entire engineering culture around pair programming. Every line of code written by two people at one keyboard. This sounds inefficient. Their data showed it reduced bugs, increased knowledge sharing, and improved code quality enough to offset the apparent cost.

You don't need to pair on everything. But pairing on complex problems, unfamiliar codebases, or critical paths often produces better results than solo work followed by review.

Mob programming extends this further—the entire team working on one thing together. Useful for particularly thorny problems or critical architectural work. Not sustainable daily, but valuable as a tool.


Part Three: Testing Beyond Unit Tests

The Testing Pyramid and Its Discontents

The traditional testing pyramid—lots of unit tests, fewer integration tests, even fewer end-to-end tests—remains useful as a starting point. But it oversimplifies.

Unit tests are fast and focused but don't verify that components work together. Integration tests catch interface mismatches but are slower and more brittle. End-to-end tests verify the entire system but are slowest and flakiest.

The right mix depends on your system. A pure CRUD application might need more integration tests and fewer unit tests. A complex algorithmic system might be the opposite.

Test the behavior, not the implementation. If your tests break every time you refactor, they're testing the wrong thing.
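
A contrived example of the difference, with a hypothetical pricing function defined inline: the first test is coupled to how the discount is computed, the second only to what the caller observes.

import sys
from unittest.mock import patch

def _percentage_off(amount, percent):
    return amount * (100 - percent) / 100

def apply_discount(amount, coupon):
    return _percentage_off(amount, 10) if coupon == "SAVE10" else amount

# Implementation-coupled: breaks if apply_discount stops calling the private
# helper, even though its observable behavior is identical.
def test_discount_implementation():
    with patch.object(sys.modules[__name__], "_percentage_off") as helper:
        apply_discount(100, "SAVE10")
        helper.assert_called_once()

# Behavior-focused: survives any refactoring that preserves the result.
def test_discount_behavior():
    assert apply_discount(100, "SAVE10") == 90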

Contract Testing

When services communicate over APIs, both sides need to agree on the contract. Traditional integration tests verify this by running both services. Contract testing verifies it without the overhead.

Pact is the most widely used contract testing framework. The consumer defines its expectations of the provider's API; the provider verifies that it meets them. If either side changes incompatibly, tests fail before deployment.
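
A minimal consumer-side sketch in the style of pact-python's classic API (service names, endpoint, and fields are hypothetical; the generated pact file is what the provider later verifies against):

import requests
from pact import Consumer, Provider

pact = Consumer("OrderDashboard").has_pact_with(Provider("OrderService"))
pact.start_service()

(pact
 .given("order ord_12345 exists")
 .upon_receiving("a request for order ord_12345")
 .with_request("GET", "/orders/ord_12345")
 .will_respond_with(200, body={"id": "ord_12345", "status": "shipped"}))

with pact:
    # The consumer's client code runs against Pact's mock provider.
    response = requests.get(f"{pact.uri}/orders/ord_12345")
    assert response.json()["status"] == "shipped"

pact.stop_service()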

This matters at scale. When you have dozens of services, running full integration tests for every change becomes impractical. Contract tests give you confidence in interface compatibility without the infrastructure cost.

Property-Based Testing

Traditional tests verify specific examples: "given input X, expect output Y." Property-based testing verifies properties across randomly generated inputs: "for any valid input, the output should satisfy property P."

from hypothesis import given, strategies as st
 
@given(st.lists(st.integers()))
def test_sort_is_idempotent(xs):
    """Sorting a sorted list should produce the same result."""
    assert sorted(sorted(xs)) == sorted(xs)
 
@given(st.lists(st.integers()))
def test_sort_preserves_length(xs):
    """Sorting shouldn't add or remove elements."""
    assert len(sorted(xs)) == len(xs)

Property-based tests find edge cases you wouldn't think to write examples for. They're particularly valuable for parsers, serializers, and any code with complex input spaces.

QuickCheck for Haskell pioneered this approach; Hypothesis for Python and fast-check for JavaScript brought it to mainstream ecosystems.

Mutation Testing

How do you know your tests are good? Code coverage tells you which lines executed, not whether the tests would catch bugs.

Mutation testing introduces small changes (mutations) to your code—flipping conditionals, changing operators, deleting statements—and runs your tests. If the tests still pass, they didn't catch the mutation. The mutation testing score tells you what percentage of mutations your tests caught.

This is computationally expensive, so it's typically run nightly rather than on every commit. But it reveals weak spots in your test suite that coverage metrics miss.
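
A small illustration with a hypothetical function: a mutant that survives your tests is a gap in your suite.

def is_adult(age):
    return age >= 18

# A mutation tool might flip the operator to `age > 18`.
# This test passes against both the original and the mutant, so the mutant survives:
def test_is_adult_typical_case():
    assert is_adult(30)

# This test kills the mutant by pinning down the boundary:
def test_is_adult_boundary():
    assert is_adult(18)
    assert not is_adult(17)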

PIT for Java, mutmut for Python, Stryker for JavaScript.

Chaos Engineering

Netflix famously runs Chaos Monkey, which randomly terminates production instances. This sounds reckless until you realize the alternative: discovering your system can't handle instance failures during an actual outage.

Chaos engineering is the discipline of experimenting on a system to build confidence in its ability to withstand turbulent conditions.

Start small. Before killing production servers, try:

  • Introducing network latency between services
  • Simulating dependency failures
  • Exhausting connection pools
  • Filling disk space
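
Latency injection in particular doesn't require a platform to start with; a toy sketch of a wrapper you might apply to a dependency call in a test or staging environment (function names are hypothetical):

import functools
import random
import time

def inject_latency(probability=0.1, delay_seconds=2.0):
    """Make a fraction of calls artificially slow to exercise timeouts,
    retries, and alerting under degraded conditions."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            if random.random() < probability:
                time.sleep(delay_seconds)
            return fn(*args, **kwargs)
        return wrapper
    return decorator

@inject_latency(probability=0.2, delay_seconds=1.5)
def fetch_inventory(sku):
    ...  # call the real inventory service here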

Gremlin and Litmus provide chaos engineering platforms. Netflix's Chaos Monkey is open source. AWS has Fault Injection Simulator.

Chaos engineering in production requires mature observability and incident response. Don't inject failures you can't detect and recover from.

Load Testing and Performance Verification

"It works on my machine" extends to performance. Code that's fast enough in development may collapse under production load.

Load testing should be continuous, not a pre-launch checkbox. Run performance tests in CI. Establish baselines. Alert on regressions.

k6, Locust, and Gatling are modern load testing tools. Grafana k6 is particularly developer-friendly with test scripts in JavaScript.

import http from 'k6/http';
import { check, sleep } from 'k6';
 
export const options = {
  vus: 100,
  duration: '5m',
  thresholds: {
    http_req_duration: ['p(95)<500'], // 95% of requests under 500ms
  },
};
 
export default function () {
  const res = http.get('https://api.example.com/orders');
  check(res, { 'status is 200': (r) => r.status === 200 });
  sleep(1);
}

Part Four: Observability as a Practice

Beyond Logging

Logs tell you what happened. Metrics tell you how the system is performing. Traces tell you why a specific request was slow. You need all three.

Structured logging: Stop writing log.info("Processing order " + orderId). Write structured logs that your observability platform can parse and query.

{
  "timestamp": "2025-05-21T14:30:00Z",
  "level": "info",
  "message": "Order processed",
  "orderId": "ord_12345",
  "customerId": "cust_67890",
  "processingTimeMs": 234,
  "correlationId": "req_abc123"
}
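
How you produce such logs depends on your stack; with Python's structlog, for instance, fields are passed as key-value arguments rather than interpolated into the message:

import structlog

log = structlog.get_logger()

# Each keyword argument becomes a field on the event; with a JSON renderer
# configured, the output looks like the example above.
log.info("order_processed", order_id="ord_12345",
         customer_id="cust_67890", processing_time_ms=234)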

Correlation IDs: Every request entering your system should get a unique identifier that propagates through all services and appears in all logs. Without this, debugging distributed systems is archaeology.
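
A sketch of what propagation can look like at a service boundary, using Flask-style hooks (the X-Correlation-ID header name is a common convention, not a standard):

import uuid
from flask import Flask, g, request

app = Flask(__name__)

@app.before_request
def assign_correlation_id():
    # Reuse the inbound ID if an upstream service set one; otherwise mint one.
    g.correlation_id = request.headers.get("X-Correlation-ID", str(uuid.uuid4()))

@app.after_request
def return_correlation_id(response):
    # Echo the ID back so clients and downstream logs can be stitched together.
    response.headers["X-Correlation-ID"] = g.correlation_id
    return response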

Metrics with dimensions: Don't just track http_requests_total. Track it with dimensions: http_requests_total{method="POST", path="/orders", status="200"}. This lets you slice data by any dimension when diagnosing issues.
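
With Python's prometheus_client, for example, dimensions are declared as labels once and supplied on every observation (the metric mirrors the example above):

from prometheus_client import Counter

HTTP_REQUESTS = Counter(
    "http_requests_total",
    "Total HTTP requests handled",
    ["method", "path", "status"],
)

# Record one POST /orders that returned 200. Keep label values
# low-cardinality: use route templates, not raw URLs.
HTTP_REQUESTS.labels(method="POST", path="/orders", status="200").inc()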

Distributed Tracing

When a request touches multiple services, how do you know which one is slow? Distributed tracing.

A trace represents a single request's journey through your system. Spans represent individual operations within that trace. The trace ID propagates across service boundaries, stitching together a complete picture.

OpenTelemetry is the vendor-neutral standard. Jaeger and Zipkin are popular open-source backends. Cloud providers have their own: AWS X-Ray, Google Cloud Trace, Azure Monitor.
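
A minimal OpenTelemetry sketch in Python, using the console exporter for illustration; a real deployment would export to Jaeger, Zipkin, or a cloud backend, and instrumentation libraries handle cross-service context propagation:

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

# Configure the SDK once at startup.
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("order-service")

with tracer.start_as_current_span("process_order") as span:
    span.set_attribute("order.id", "ord_12345")
    with tracer.start_as_current_span("charge_payment"):
        ...  # call the payment service; the trace context travels with the request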

Traces are invaluable for debugging but expensive to store at 100% sampling. Sample intelligently—always capture errors and slow requests, randomly sample normal traffic.

SLOs, SLIs, and Error Budgets

Google's Site Reliability Engineering book introduced Service Level Objectives (SLOs) and Error Budgets. These concepts have become foundational for running reliable services.

Service Level Indicator (SLI): A measurement of service behavior. "The proportion of successful HTTP requests" or "The proportion of requests completed in under 200ms."

Service Level Objective (SLO): A target for the SLI. "99.9% of requests should succeed" or "95% of requests should complete in under 200ms."

Error Budget: The inverse of the SLO. If your SLO is 99.9% availability, your error budget is 0.1%—about 43 minutes of downtime per month.

The error budget is what makes this useful. When you have budget remaining, you can take risks—deploy faster, experiment more. When the budget is exhausted, you slow down and focus on reliability.

| SLO    | Error Budget (monthly) | Error Budget (yearly) |
|--------|------------------------|-----------------------|
| 99%    | 7.2 hours              | 3.65 days             |
| 99.9%  | 43.8 minutes           | 8.76 hours            |
| 99.95% | 21.9 minutes           | 4.38 hours            |
| 99.99% | 4.38 minutes           | 52.6 minutes          |
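
The arithmetic is simple enough to sanity-check in a few lines (the table uses an average month of 365/12 ≈ 30.4 days):

def error_budget_minutes(slo: float, days: float) -> float:
    """Allowed downtime, in minutes, over a window of `days` for a given SLO."""
    return (1 - slo) * days * 24 * 60

print(error_budget_minutes(0.999, 365 / 12))   # ~43.8 minutes per month
print(error_budget_minutes(0.9999, 365))       # ~52.6 minutes per year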

The Google SRE Books are free online and essential reading for anyone running production services.

Alerting That Works

Most alerting is noise. Alerts fire constantly. On-call engineers become desensitized. Real problems get lost.

Effective alerting requires discipline:

Alert on symptoms, not causes. Don't alert on high CPU—alert on elevated error rates or latency that affects users. High CPU that doesn't impact users isn't urgent.

Every alert should be actionable. If there's nothing an engineer can do at 3 AM, it shouldn't page them at 3 AM.

Regularly review alerts. Which alerts fire most often? Are they useful? Which incidents weren't caught by alerts? Tune continuously.

Define severity levels and stick to them. Page for customer-impacting issues. Email for things that can wait until morning. Slack for informational notices.


Part Five: Security as a Practice

Shift-Left Security

Finding security vulnerabilities in production is expensive. Finding them in code review is cheaper. Finding them during design is cheapest.

"Shift-left" means moving security earlier in the development process:

Threat modeling during design: Before building a feature, identify what could go wrong. What are the attack vectors? What data is at risk? What would an attacker try?

Security-focused code review: Train developers to spot security issues. SQL injection, XSS, authentication bypasses—these patterns are learnable.

Automated scanning in CI: Run static analysis (SAST), dependency scanning, and secret detection on every commit. Don't wait for a quarterly security audit.

Security champions: Embed security-minded engineers in development teams. They don't replace the security team—they extend it.

Dependency Management and Supply Chain Security

Your code is a small fraction of what runs in production. The rest is dependencies. The average JavaScript project has hundreds of transitive dependencies.

Dependabot, Renovate, and Snyk automate dependency updates and vulnerability scanning. Use them.

But tooling isn't enough. You need practices:

Pin dependencies to specific versions. Don't use ^ or ~ in production. Know exactly what you're deploying.

Audit new dependencies before adding them. Who maintains it? How active is development? What's the security track record?

Regularly update dependencies. Falling behind makes updates harder and leaves vulnerabilities unpatched.

Verify dependency integrity. Lock files (package-lock.json, Pipfile.lock, go.sum) ensure reproducible builds. Verify checksums.

The SLSA framework (Supply-chain Levels for Software Artifacts) provides a maturity model for supply chain security. Sigstore enables signing and verification of software artifacts.

Secrets Management

Secrets in code repositories are among the most common security mistakes. It's embarrassing how often AWS keys end up on GitHub.

Never commit secrets. Use .gitignore patterns. Use pre-commit hooks like detect-secrets or gitleaks.

Use a secrets manager. HashiCorp Vault, AWS Secrets Manager, GCP Secret Manager, Azure Key Vault. Not environment variables in deployment configs.

Rotate secrets regularly. Automated rotation is better than manual rotation is better than never rotating.

Audit secret access. Know who accessed which secrets and when.

If you discover a secret was committed, even briefly, consider it compromised. Rotate it immediately. Git history is forever.
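
Fetching at runtime from a managed store, rather than baking secrets into configs, can be as small as this boto3 sketch (the secret name and region are hypothetical):

import boto3

client = boto3.client("secretsmanager", region_name="us-east-1")

# The application reads the secret at startup; rotation happens in the
# secrets manager without redeploying the service.
response = client.get_secret_value(SecretId="prod/order-service/db-password")
db_password = response["SecretString"]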


Part Six: Incident Response

Before the Incident

The time to prepare for incidents is not during the incident.

Runbooks: Document how to diagnose and resolve common issues. When the pager goes off at 3 AM, you want a checklist, not a research project.

Escalation paths: Who do you contact if you can't resolve the issue? At what point? What information do they need?

Communication templates: Pre-written templates for status page updates and stakeholder communication. You shouldn't be wordsmithing during an outage.

Regular drills: Practice incident response. Run game days. Rotate on-call to spread knowledge.

During the Incident

Assign clear roles. The Incident Commander makes decisions. The Communications Lead handles stakeholder updates. The Technical Lead directs debugging. Don't have everyone doing everything.

Communicate early and often. Silence erodes trust. "We're aware of the issue and investigating" is better than nothing.

Document as you go. Keep a running timeline. What did you try? What did you observe? This becomes the postmortem input.

Know when to escalate. If you're not making progress, bring in more people. Pride causes prolonged outages.

After the Incident: Blameless Postmortems

The postmortem isn't about finding who to blame. It's about finding what to fix.

Google, Etsy, and other mature engineering organizations practice blameless postmortems. The assumption is that people acted rationally given what they knew at the time. The question is: what about the system made the incident possible?

A good postmortem includes:

Timeline: What happened, when, in what order.

Impact: How many users affected? For how long? What was the business impact?

Root cause analysis: Not "Alice deployed bad code" but "Our deployment process allowed a change without adequate testing to reach production."

Action items: Specific, assigned, with due dates. Not "improve monitoring" but "Add alert for payment processing latency above 2s (owner: Bob, due: March 15)."

What went well: Incident response isn't only about failures. What worked? What should you do more of?

Etsy's Debriefing Facilitation Guide is an excellent resource for running effective postmortems.


Part Seven: Developer Experience

Internal Developer Platforms

At scale, developers shouldn't think about infrastructure. They should think about their application.

Internal Developer Platforms (IDPs) abstract infrastructure complexity. Developers push code; the platform handles building, deploying, scaling, and monitoring.

Backstage, open-sourced by Spotify, provides a framework for building developer portals—a single place to discover services, documentation, and tooling.

The best platform teams treat developers as customers. They measure developer satisfaction. They reduce friction continuously.

Fast Feedback Loops

Productivity correlates with feedback speed. How long from code change to knowing if it works?

Local development should be fast. If running tests locally takes 10 minutes, developers won't run them. Invest in test speed.

CI should be fast. Google found that test suites over 10 minutes significantly reduce developer productivity. Parallelize. Cache. Optimize.

Deploys should be fast. If deployment takes an hour, you'll deploy less often. Frequent small deploys are safer than rare large ones.

Code review turnaround should be fast. Google aims for code reviews completed within hours, not days. Slow reviews block developers and encourage larger, harder-to-review changes.

Measure developer cycle time: from commit to production. Then systematically reduce it.

Documentation as a Product

Documentation is a product with users. Treat it like one.

Keep docs close to code. Documentation in a separate wiki drifts from reality. READMEs in the repo are more likely to stay current.

Document the why, not just the how. "Run npm install" is obvious. "We use npm instead of yarn because of compatibility with our CI caching" is useful.

Have owners. Documentation without owners becomes stale. Every doc should have someone responsible for keeping it current.

Test your docs. Can a new team member set up the development environment using only the documentation? If not, the docs are incomplete.


Part Eight: The Practices That Scale

Trunk-Based Development

Long-lived feature branches are a liability. They diverge from main. Merges become painful. Integration bugs appear late.

Trunk-based development keeps branches short—ideally less than a day. Everyone commits to main (or a single trunk branch). Feature flags hide incomplete work.

This requires discipline: small changes, good tests, fast CI. But it eliminates merge hell and enables continuous deployment.

Google, Facebook, and Microsoft all practice some form of trunk-based development at massive scale.

Feature Flags as Infrastructure

Feature flags decouple deployment from release. You can deploy code to production without exposing it to users. You can release to a subset of users and monitor before full rollout. You can instantly disable a feature if problems emerge.

This isn't just a nice-to-have. At scale, it's essential.

LaunchDarkly, Split, and Unleash provide feature flag platforms. You can also build your own for simple cases, but managing flag lifecycle, targeting rules, and analytics gets complex quickly.
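
For the simple case, the core of a home-grown flag with a percentage rollout is just stable bucketing; a sketch with hypothetical flag names and rollout values:

import hashlib

FLAGS = {"new_checkout_flow": {"enabled": True, "rollout_percent": 10}}

def is_enabled(flag_name: str, user_id: str) -> bool:
    flag = FLAGS.get(flag_name)
    if not flag or not flag["enabled"]:
        return False
    # Hash the user ID into a stable bucket from 0-99 so the same user
    # sees the same variant on every request.
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return bucket < flag["rollout_percent"]

if is_enabled("new_checkout_flow", "cust_67890"):
    ...  # serve the new checkout experience
else:
    ...  # fall back to the current flow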

Feature flags are technical debt if not managed. Have a process to remove flags once features are fully rolled out. Dead flags accumulate and confuse.

Monorepo or Polyrepo: A Deliberate Choice

Google famously uses a monorepo—all code in one repository. Many companies use polyrepos—separate repositories per service.

Both work. The choice has tradeoffs:

Monorepo advantages: Atomic changes across multiple services. Easier code sharing. Consistent tooling. Single source of truth.

Monorepo challenges: Requires sophisticated build tooling. Repository size can strain standard git workflows. Access control is more complex.

Polyrepo advantages: Simpler for small teams. Clear ownership boundaries. Standard git tooling works fine.

Polyrepo challenges: Cross-repo changes require coordination. Dependency management is harder. Tooling inconsistency across repos.

The right choice depends on your organization's size, structure, and tooling investment. What matters is making a deliberate choice and committing to the practices that make it work.

For monorepos at scale: Bazel, Nx, Turborepo.


The Meta-Practice: Continuous Improvement

Every practice here shares something: they require continuous investment. They degrade without attention.

The teams that excel treat their engineering practices like a product. They measure. They iterate. They improve.

Run retrospectives on your processes, not just your sprints. Survey developers about friction points. Track metrics—deploy frequency, lead time, change failure rate, mean time to recovery (the DORA metrics).

The gap between good and great engineering teams isn't one big thing. It's hundreds of small practices, each contributing a little, compounding over time.

No team does everything here. But every team can do something better than they do today. Pick one practice. Implement it well. Then pick another.

That's how engineering cultures improve. Not through transformation initiatives, but through accumulated discipline.
