david.dev - AI x Ruby on Rails

Production Rails applications need health check endpoints. Load balancers poll them. Kubernetes uses them for pod readiness. Monitoring services depend on them. Yet many teams either skip them entirely or expose too much information to the public internet.

Rails provides built-in health check functionality through Rails::HealthController, introduced in Rails 7.1 and available in every Rails 8 app. Real-world applications need more: database connectivity checks, Redis availability, queue system health, and disk space monitoring. This guide covers building comprehensive health checks that satisfy ops requirements without becoming security liabilities.

The Built-in Health Check

Rails 8 ships with a minimal health check endpoint. Newly generated applications already include its route. In apps upgraded from older versions, add it manually:

# config/routes.rb
Rails.application.routes.draw do
  get "up" => "rails/health#show", as: :rails_health_check
  
  # ... rest of routes
end

This endpoint returns 200 if the application booted without exceptions and 500 otherwise. Load balancers can hit /up to verify that the application server responds. However, a running Rails process doesn't mean the application actually works: the database could be down, Redis unavailable, or the disk full.

Building a Comprehensive Health Check

Production systems need deeper checks. Create a custom health controller that verifies each critical dependency:

# app/controllers/health_controller.rb
class HealthController < ActionController::API
  # Public endpoint for load balancers - minimal info
  def show
    checks = run_health_checks
    
    if checks.values.all? { |v| v[:healthy] }
      render json: { status: "healthy" }, status: :ok
    else
      render json: { status: "unhealthy" }, status: :service_unavailable
    end
  end
  
  # Detailed endpoint - protect with authentication or IP restriction
  def detailed
    checks = run_health_checks
    all_healthy = checks.values.all? { |v| v[:healthy] }
    
    render json: {
      status: all_healthy ? "healthy" : "unhealthy",
      timestamp: Time.current.iso8601,
      checks: checks
    }, status: all_healthy ? :ok : :service_unavailable
  end
  
  private
  
  def run_health_checks
    {
      database: check_database,
      redis: check_redis,
      queue: check_queue,
      disk: check_disk_space,
      memory: check_memory
    }
  end
  
  def check_database
    ActiveRecord::Base.connection.execute("SELECT 1")
    { healthy: true, message: "Connected" }
  rescue StandardError => e
    { healthy: false, message: e.message.truncate(100) }
  end
  
  def check_redis
    return { healthy: true, message: "Not configured" } unless defined?(Redis)
    
    redis = Redis.new(url: ENV.fetch("REDIS_URL", "redis://localhost:6379"))
    redis.ping
    { healthy: true, message: "Connected" }
  rescue StandardError => e
    { healthy: false, message: e.message.truncate(100) }
  ensure
    # Close the connection opened for this check so each request doesn't leak one
    redis&.close
  end
  
  def check_queue
    # Solid Queue processes update last_heartbeat_at periodically; treat the
    # queue as healthy only if at least one process has checked in recently
    if SolidQueue::Process.where("last_heartbeat_at > ?", 5.minutes.ago).exists?
      { healthy: true, message: "Workers active" }
    else
      { healthy: false, message: "No worker heartbeat in the last 5 minutes" }
    end
  rescue StandardError => e
    { healthy: false, message: e.message.truncate(100) }
  end
  
  def check_disk_space
    stat = Sys::Filesystem.stat("/")
    available_percent = (stat.blocks_available.to_f / stat.blocks * 100).round(1)
    
    if available_percent < 10
      { healthy: false, message: "#{available_percent}% available" }
    else
      { healthy: true, message: "#{available_percent}% available" }
    end
  rescue StandardError
    # Fail open: a missing gem or unsupported platform shouldn't mark the app unhealthy
    { healthy: true, message: "Check unavailable" }
  end
  
  def check_memory
    return { healthy: true, message: "Check unavailable" } unless File.exist?("/proc/meminfo")
    
    meminfo = File.read("/proc/meminfo")
    total = meminfo[/MemTotal:\s+(\d+)/, 1].to_i
    available = meminfo[/MemAvailable:\s+(\d+)/, 1].to_i
    available_percent = (available.to_f / total * 100).round(1)
    
    if available_percent < 10
      { healthy: false, message: "#{available_percent}% available" }
    else
      { healthy: true, message: "#{available_percent}% available" }
    end
  rescue StandardError
    { healthy: true, message: "Check unavailable" }
  end
end

Add the sys-filesystem gem to the Gemfile for disk space checks, or remove that check if running in containers where disk monitoring happens elsewhere.
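Stripped of Rails, the controller reduces to one pattern: run each named check, convert any exception into an unhealthy result with a truncated message, and report healthy only when every check passes. The sketch below expresses that pattern in plain Ruby so it can be run in isolation; HealthAggregator and its methods are hypothetical names, not part of Rails.

```ruby
# Hypothetical class illustrating the pattern HealthController follows:
# each check is isolated so one failure cannot break the whole report.
class HealthAggregator
  MAX_MESSAGE_LENGTH = 100 # mirrors truncate(100) in the controller

  def initialize
    @checks = {}
  end

  # Registers a named check. The block returns a success message or raises.
  def register(name, &check)
    @checks[name] = check
    self
  end

  # Runs every check and returns { name => { healthy:, message: } }.
  def run
    @checks.to_h do |name, check|
      [name, { healthy: true, message: check.call.to_s }]
    rescue StandardError => e
      [name, { healthy: false, message: e.message[0, MAX_MESSAGE_LENGTH] }]
    end
  end

  # The instance is healthy only if every check passed.
  def self.healthy?(results)
    results.values.all? { |result| result[:healthy] }
  end
end

results = HealthAggregator.new
  .register(:database) { "Connected" }
  .register(:redis) { raise "Connection refused" }
  .run
HealthAggregator.healthy?(results) # => false
```

Because the rescue sits inside the per-check block, a failing Redis check still leaves the database result intact in the report.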

Routing and Security

Health check endpoints present a security consideration. The basic endpoint reveals whether your application runs—fine for public access. The detailed endpoint exposes infrastructure information—restrict it to internal networks or authenticated requests:

# app/constraints/internal_request_constraint.rb
require "ipaddr"

# Route constraint: matches only requests from loopback or private networks
class InternalRequestConstraint
  ALLOWED_RANGES = %w[127.0.0.1 ::1 10.0.0.0/8 172.16.0.0/12 192.168.0.0/16]
    .map { |cidr| IPAddr.new(cidr) }
    .freeze

  def matches?(request)
    ALLOWED_RANGES.any? { |range| range.include?(request.remote_ip) }
  rescue IPAddr::InvalidAddressError
    false
  end
end

# config/routes.rb
Rails.application.routes.draw do
  # Public health check for load balancers
  get "health" => "health#show"
  
  # Detailed health check - only reachable from internal networks
  constraints InternalRequestConstraint.new do
    get "health/detailed" => "health#detailed"
  end
  
  # ... rest of routes
end
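The allowlist check can be exercised outside Rails with Ruby's standard ipaddr library. In this sketch, internal_ip? is a hypothetical helper name; it parses the ranges once, treats a single address such as 127.0.0.1 as a one-address range, and classifies anything unparseable as external.

```ruby
require "ipaddr"

# Loopback plus the RFC 1918 private ranges used by the route constraint.
INTERNAL_RANGES = %w[127.0.0.1 ::1 10.0.0.0/8 172.16.0.0/12 192.168.0.0/16]
  .map { |cidr| IPAddr.new(cidr) }
  .freeze

# Hypothetical helper: true when the address falls inside any internal range.
def internal_ip?(address)
  ip = IPAddr.new(address)
  INTERNAL_RANGES.any? { |range| range.include?(ip) }
rescue ArgumentError # IPAddr's parse errors are ArgumentError subclasses
  false
end

internal_ip?("10.42.0.7")  # => true
internal_ip?("::1")        # => true
internal_ip?("172.32.0.1") # => false (outside 172.16.0.0/12)
internal_ip?("8.8.8.8")    # => false
internal_ip?("not-an-ip")  # => false
```

Ranges of the other address family simply don't match, so IPv4 and IPv6 entries can share one list.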

For Kubernetes deployments, expose separate endpoints for liveness and readiness probes, implemented as two more actions on the same HealthController:

# config/routes.rb
Rails.application.routes.draw do
  # Liveness: Is the process running?
  get "health/live" => "health#live"
  
  # Readiness: Can this instance handle traffic?
  get "health/ready" => "health#ready"
end

# app/controllers/health_controller.rb
class HealthController < ActionController::API
  # Liveness check - just verify the process responds
  def live
    head :ok
  end
  
  # Readiness check - verify dependencies are available
  def ready
    ActiveRecord::Base.connection.execute("SELECT 1")
    head :ok
  rescue StandardError
    head :service_unavailable
  end
end
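The split matters because Kubernetes reacts differently to each probe: a failing liveness probe restarts the container, while a failing readiness probe only removes the pod from Service endpoints. If liveness also checked the database, a database outage would restart every pod without fixing anything. A framework-free sketch of that decision, using a hypothetical ProbeStatus module and the same { healthy:, message: } result shape as the controller:

```ruby
# Hypothetical module mapping each probe type to an HTTP status code.
module ProbeStatus
  OK = 200
  SERVICE_UNAVAILABLE = 503

  # Liveness: answering at all proves the process is alive. Dependencies are
  # deliberately ignored so an outage elsewhere never triggers restarts.
  def self.live
    OK
  end

  # Readiness: accept traffic only when every dependency check passes.
  def self.ready(checks)
    checks.values.all? { |check| check[:healthy] } ? OK : SERVICE_UNAVAILABLE
  end
end

db_down = { database: { healthy: false, message: "Connection refused" } }
ProbeStatus.live           # => 200, so the pod is not restarted
ProbeStatus.ready(db_down) # => 503, so the pod stops receiving traffic
```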

Adding Timeouts

Health checks should fail fast. A hanging database connection shouldn't make the health check hang for 30 seconds. Wrap checks in timeouts:

# app/controllers/health_controller.rb
require "timeout"

class HealthController < ActionController::API
  TIMEOUT_SECONDS = 3
  
  private
  
  def check_database
    Timeout.timeout(TIMEOUT_SECONDS) do
      ActiveRecord::Base.connection.execute("SELECT 1")
    end
    { healthy: true, message: "Connected" }
  rescue Timeout::Error
    { healthy: false, message: "Connection timeout" }
  rescue StandardError => e
    { healthy: false, message: e.message.truncate(100) }
  end
end

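Apply the same pattern to Redis and other external service checks. Three seconds suits many deployments: long enough for a slow query under load, short enough to fail promptly when a service is down. The checks run one after another, so the endpoint's worst case is the sum of the individual timeouts. Keep that total below the caller's own limit; Kubernetes probes, for example, time out after one second unless timeoutSeconds is raised. Because Timeout.timeout raises at an arbitrary point inside the block, it can leave a connection in an inconsistent state, so also configure driver-level limits where they exist (for example connect_timeout in database.yml or the timeout: option of Redis.new) and treat the wrapper as a backstop.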
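Rather than repeating the Timeout.timeout and rescue pair in every check method, the pattern can be factored into one wrapper. A plain-Ruby sketch, where timed_check and DEFAULT_CHECK_TIMEOUT are hypothetical names and each block returns a success message or raises:

```ruby
require "timeout"

DEFAULT_CHECK_TIMEOUT = 3 # seconds, matching TIMEOUT_SECONDS above

# Hypothetical helper: runs one check under a deadline and converts every
# outcome into the { healthy:, message: } hash used by HealthController.
def timed_check(seconds = DEFAULT_CHECK_TIMEOUT)
  message = Timeout.timeout(seconds) { yield }
  { healthy: true, message: message.to_s }
rescue Timeout::Error # must come first: Timeout::Error is a StandardError
  { healthy: false, message: "Timed out after #{seconds}s" }
rescue StandardError => e
  { healthy: false, message: e.message[0, 100] }
end

fast = timed_check { "Connected" }
slow = timed_check(0.1) { sleep 1; "never returned" }
broken = timed_check { raise "Connection refused" }
```

Inside the controller, a check then shrinks to a single call such as timed_check { ActiveRecord::Base.connection.execute("SELECT 1"); "Connected" }.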

Caching Health Check Results

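Health check endpoints can become a burden themselves. Every request runs every check, and a single instance is often polled by several callers at once: the load balancer, Kubernetes liveness and readiness probes, and external monitoring. With 20 pods polled every 10 seconds, one probe alone adds two requests per second, each querying the database and opening a Redis connection. Cache results briefly in each process:

# app/controllers/health_controller.rb
class HealthController < ActionController::API
  # Process-local cache: each instance stores and reports its own results
  HEALTH_CACHE = ActiveSupport::Cache::MemoryStore.new
  
  def show
    checks = HEALTH_CACHE.fetch("health_check_result", expires_in: 5.seconds) do
      run_health_checks
    end
    
    if checks.values.all? { |v| v[:healthy] }
      render json: { status: "healthy" }, status: :ok
    else
      render json: { status: "unhealthy" }, status: :service_unavailable
    end
  end
end

Five seconds of caching collapses overlapping polls into one round of checks while still detecting failures within seconds. Adjust the window to your monitoring requirements. Keep the cache in process memory rather than in a shared Rails.cache store such as Redis or Solid Cache: with a shared store, every instance reads the same key, so one pod's disk and memory results would be reported by all of them.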
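The expiring fetch used above is simple to reproduce in plain Ruby, which makes its behavior under repeated polling easy to see. TtlCache is a hypothetical class, roughly what an in-memory fetch with expires_in provides; the injectable clock exists only so expiry can be demonstrated without sleeping.

```ruby
# Hypothetical process-local cache: a value is reused until its TTL expires.
class TtlCache
  MONOTONIC = -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) }

  def initialize(ttl_seconds, clock: MONOTONIC)
    @ttl = ttl_seconds
    @clock = clock     # injectable so expiry can be demonstrated without sleeping
    @mutex = Mutex.new # requests may arrive on several threads
    @entries = {}
  end

  # Returns the cached value for key, or runs the block and caches its result.
  # Holding the lock while the block runs means concurrent callers wait for
  # one round of checks instead of each starting their own.
  def fetch(key)
    @mutex.synchronize do
      entry = @entries[key]
      return entry[:value] if entry && @clock.call < entry[:expires_at]

      value = yield
      @entries[key] = { value: value, expires_at: @clock.call + @ttl }
      value
    end
  end
end

now = 0.0
cache = TtlCache.new(5, clock: -> { now })
runs = 0
3.times { cache.fetch(:health) { runs += 1 } } # only the first call runs the block
now = 6.0
cache.fetch(:health) { runs += 1 }             # expired, so the block runs again
```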

Common Mistakes

Several patterns cause problems in production health checks:

Checking non-critical services: If the email service is down, should the app return unhealthy? Usually not—degrade gracefully instead
No timeouts: A health check that hangs defeats its purpose
Exposing stack traces: Error messages should be truncated and sanitized
Checking too much: Each check adds latency and failure modes
Skipping authentication on detailed endpoints: Attackers use infrastructure information for reconnaissance
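The first and third mistakes can be handled together: decide which checks are critical, let only those change the status code, report other failures as degraded, and surface which checks failed rather than raw exception text. A plain-Ruby sketch; the CRITICAL_CHECKS list, the "degraded" status, and the summarize helper are illustrative assumptions, not Rails conventions.

```ruby
# Illustrative assumption: the dependencies this app cannot work without.
CRITICAL_CHECKS = %i[database].freeze

# Hypothetical summarizer: critical failures make the instance unhealthy,
# non-critical failures only downgrade the status to "degraded".
def summarize(results)
  failed = results.reject { |_name, result| result[:healthy] }.keys
  status =
    if (failed & CRITICAL_CHECKS).any? then "unhealthy"
    elsif failed.any? then "degraded"
    else "healthy"
    end
  http_status = status == "unhealthy" ? 503 : 200
  # Report only the names of failed checks, never the exception messages.
  { status: status, http_status: http_status, failed: failed }
end

summarize({ database: { healthy: true }, email: { healthy: false } })
# => { status: "degraded", http_status: 200, failed: [:email] }
```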

Summary

Production-ready health checks balance thoroughness with simplicity. Start with the built-in Rails health check for basic load balancer integration. Add database connectivity verification for readiness probes. Include detailed checks for infrastructure monitoring, but protect those endpoints from public access. Apply timeouts to every external call, and cache results to prevent health checks from becoming a performance problem themselves.

The goal remains straightforward: quickly answer whether this application instance can handle traffic, without exposing sensitive details or creating new failure modes in the process.