TLDR
Both PostgreSQL and MySQL are excellent options for your web apps. For smaller applications, the practical difference is often minimal. AI coding assistants (Claude Code, Cursor, Windsurf) are proficient in both technologies. The choice to evaluate MySQL was motivated by small nuances that can improve future scalability—and perhaps a bit of nostalgia for the LAMP stacks that shaped the early web.
A Brief History: From MySQL Dominance to PostgreSQL Renaissance
Around 18 years ago, MySQL was the default choice for web development. The LAMP stack (Linux, Apache, MySQL, PHP) powered an estimated 70% of dynamic websites during the mid-2000s. During that era, some developers advocated for PostgreSQL—pushing hosting providers to support PHP with PostgreSQL stacks. The motivation was partly contrarian, but PostgreSQL genuinely offered features that MySQL lacked: better SQL standards compliance, more sophisticated data types, and superior handling of complex queries.
Fast forward to today, and PostgreSQL has achieved remarkable adoption. The 2024 Stack Overflow Developer Survey ranked PostgreSQL as the most-used database among professional developers. Companies like Neon and Supabase have made PostgreSQL accessible and developer-friendly, particularly within the Next.js ecosystem. The PostgreSQL community has grown substantially, and the database powers everything from startups to enterprise applications.
Yet MySQL continues to offer distinct advantages that deserve consideration.
The Case for MySQL
1. Operational Simplicity
MySQL's architecture prioritizes straightforward administration. The configuration is more approachable, with fewer parameters requiring immediate attention for production deployments. Backup and restore operations follow predictable patterns, and the tooling ecosystem—including MySQL Workbench, phpMyAdmin, and numerous CLI utilities—remains mature and well-documented.
PostgreSQL offers more features, but that flexibility introduces complexity. PostgreSQL's configuration includes over 300 parameters, while MySQL's InnoDB engine requires attention to roughly 30-40 for most production deployments. MySQL's opinionated defaults mean fewer decisions during initial setup and ongoing maintenance.
2. Memory Management and Buffer Pool
MySQL's InnoDB storage engine allows direct control over memory allocation through the innodb_buffer_pool_size parameter. On a server with 14GB of RAM, you can allocate 70-80% (approximately 10-11GB) directly to the buffer pool, effectively keeping your entire working dataset in memory.
innodb_buffer_pool_size = 10G
innodb_buffer_pool_instances = 8
PostgreSQL takes a different approach, relying heavily on the operating system's page cache for caching. The PostgreSQL documentation recommends setting shared_buffers to approximately 25% of available RAM on dedicated database servers. On a 14GB system, this means around 3.5GB allocated directly to PostgreSQL, with the expectation that the OS will cache the rest.
While this approach has merits—particularly for systems running multiple applications and benefiting from unified cache management—it provides less predictable performance characteristics. The dual-layer caching can result in the same data being cached twice: once in PostgreSQL's shared buffers and again in the OS page cache. MySQL's buffer pool bypasses the filesystem cache through direct I/O options, giving administrators more explicit control over memory utilization.
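On the PostgreSQL side, the equivalent sizing can be applied with ALTER SYSTEM rather than by editing postgresql.conf directly. The figures below are a sketch for the 14GB machine discussed above, assuming a dedicated database server; effective_cache_size is only a planner hint describing how much data the OS page cache is expected to hold, not an allocation.

```sql
-- PostgreSQL: ~25% of 14GB for shared_buffers, per the documentation's guideline.
ALTER SYSTEM SET shared_buffers = '3584MB';

-- Planner hint, not an allocation: roughly how much of the dataset the
-- OS page cache is expected to keep warm on this machine.
ALTER SYSTEM SET effective_cache_size = '10GB';

-- shared_buffers only takes effect after a server restart;
-- pg_reload_conf() covers reloadable settings only.
```

Note the asymmetry: the MySQL numbers above are an explicit allocation, while the PostgreSQL numbers split responsibility between the database and the kernel.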
3. Replication Architecture
MySQL's replication has matured significantly since its early asynchronous-only days. Binary log replication is conceptually straightforward to configure, and Group Replication (introduced in MySQL 5.7.17) provides built-in high availability and automatic failover without requiring external tools. The traditional primary-replica setup can be established with a handful of configuration changes.
MySQL also offers multiple replication formats: statement-based, row-based, and mixed-mode, allowing administrators to optimize for their specific workloads. Row-based replication, in particular, provides deterministic replay regardless of non-deterministic functions in the original queries.
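As a sketch of how small that handful of changes is, the statements below use the MySQL 8.0.23+ syntax (CHANGE REPLICATION SOURCE TO and START REPLICA, the successors to CHANGE MASTER TO). The hostname and credentials are placeholders, and the setup assumes the primary already has binary logging, a unique server_id, and GTIDs enabled.

```sql
-- On the primary: replicate deterministic row images rather than
-- re-executing the original statements on each replica.
SET PERSIST binlog_format = 'ROW';

-- On the replica: point it at the primary and start applying changes.
CHANGE REPLICATION SOURCE TO
  SOURCE_HOST = 'primary.example.internal',  -- placeholder hostname
  SOURCE_USER = 'repl',                      -- hypothetical replication account
  SOURCE_PASSWORD = '********',
  SOURCE_AUTO_POSITION = 1;                  -- GTID-based positioning

START REPLICA;
```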
PostgreSQL's streaming replication is powerful and has improved substantially in recent versions; PostgreSQL 12 simplified standby configuration and added the pg_promote() function for promoting a standby. The core server still does not ship automatic failover, however, so production deployments rely on external tools like Patroni, repmgr, or pg_auto_failover to manage failover orchestration. These tools are mature and widely used, but they add components to deploy, configure, and monitor.
4. Enterprise Adoption at Scale
MySQL powers critical infrastructure at remarkable scale. Meta (Facebook) runs one of the world's largest MySQL deployments, managing petabytes of data across thousands of database instances. Meta's engineering team has contributed significant improvements back to the community, including work on MyRocks (a storage engine based on RocksDB) and various performance optimizations.
YouTube relies on MySQL for video metadata, handling billions of rows with low-latency requirements. Booking.com operates one of the largest MySQL deployments in Europe, processing millions of transactions daily. GitHub built its infrastructure on MySQL, managing repository metadata and user data at scale. Airbnb, Slack, and Twitter have all used MySQL for significant portions of their data infrastructure.
This isn't to diminish PostgreSQL's enterprise presence—Apple, Instagram, Twitch, and Reddit run substantial PostgreSQL deployments. The point is that MySQL's enterprise credentials remain robust, with proven scalability across diverse workloads.
5. Oracle Stewardship and the MariaDB Alternative
Oracle's acquisition of MySQL through Sun Microsystems in 2010 raised legitimate concerns about the database's future direction. The open-source community worried about reduced development transparency and potential feature restrictions.
However, MySQL development has continued with regular releases and meaningful improvements. MySQL 8.0 (released in 2018) brought window functions, common table expressions (CTEs), and enhanced JSON support. MySQL 8.0.17 added multi-valued indexes for JSON arrays. Recent versions have also introduced invisible indexes, descending indexes, dual passwords for smoother credential rotation, and improved connection compression.
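As a small illustration of those 8.0-era additions, the snippet below uses a hypothetical orders table; the schema, index names, and queries are assumptions for the example, not anything taken from this article.

```sql
-- Hypothetical table with a JSON document column.
CREATE TABLE orders (
  id         BIGINT PRIMARY KEY AUTO_INCREMENT,
  customer   VARCHAR(64)   NOT NULL,
  amount     DECIMAL(10,2) NOT NULL,
  attrs      JSON,
  created_at DATETIME      NOT NULL
);

-- Multi-valued index over a JSON array (MySQL 8.0.17+).
ALTER TABLE orders
  ADD INDEX idx_zip ((CAST(attrs->'$.zipcodes' AS UNSIGNED ARRAY)));

-- The index is used for membership tests against the array.
SELECT id FROM orders WHERE 94107 MEMBER OF (attrs->'$.zipcodes');

-- CTE plus a window function: running revenue per customer.
WITH recent AS (
  SELECT customer, amount, created_at
  FROM orders
  WHERE created_at >= NOW() - INTERVAL 30 DAY
)
SELECT customer,
       amount,
       SUM(amount) OVER (PARTITION BY customer ORDER BY created_at) AS running_total
FROM recent;

-- Invisible index (8.0+): maintained but ignored by the optimizer,
-- handy for checking whether an index can be dropped safely.
ALTER TABLE orders ALTER INDEX idx_zip INVISIBLE;
```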
Oracle releases MySQL under the GPL license, and the source code remains publicly available. For organizations concerned about Oracle's involvement or requiring additional features, MariaDB provides a community-driven fork founded by MySQL's original creator, Michael "Monty" Widenius. MariaDB maintains wire-protocol compatibility for most applications, though the projects have diverged more significantly since MariaDB 10.x.
The Uber Migration: A Case Study
In 2016, Uber's engineering team published a detailed technical blog post explaining their migration from PostgreSQL to MySQL. Their analysis highlighted several architectural differences that affected their specific workload:
- Write amplification: PostgreSQL's MVCC (Multi-Version Concurrency Control) implementation stores multiple row versions in the main table structure, typically requiring every index to be updated when a row is modified. For Uber's update-heavy workload, this led to significant index bloat and increased write I/O. MySQL's InnoDB uses a different approach where secondary indexes reference the primary key rather than physical row locations, reducing update overhead (see the sketch after this list).
- Replication architecture: At the time, PostgreSQL's write-ahead log (WAL) replication transmitted changes at the physical storage level. MySQL's row-based replication operated at a logical level, providing more flexibility for replica configurations and cross-version replication.
- Connection handling: PostgreSQL's process-per-connection model created memory overhead compared to MySQL's thread-based approach, particularly for applications with many concurrent connections.
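A minimal sketch of the access pattern behind the first point: an update-heavy table with several secondary indexes, where changing a single non-indexed column carries different costs in the two engines. The table, indexes, and comments below are illustrative and reflect Uber's 2016 description rather than a benchmark.

```sql
-- Hypothetical update-heavy table with several secondary indexes.
CREATE TABLE trips (
  id         BIGINT PRIMARY KEY,
  rider_id   BIGINT      NOT NULL,
  driver_id  BIGINT      NOT NULL,
  city_id    INT         NOT NULL,
  status     VARCHAR(16) NOT NULL,
  updated_at TIMESTAMP   NOT NULL
);
CREATE INDEX idx_trips_rider  ON trips (rider_id);
CREATE INDEX idx_trips_driver ON trips (driver_id);
CREATE INDEX idx_trips_city   ON trips (city_id);

-- The hot path: frequent updates that touch no indexed column.
UPDATE trips SET status = 'completed', updated_at = NOW() WHERE id = 42;

-- PostgreSQL (as Uber described it): the update writes a new row version, and
-- unless the new version fits on the same page (a HOT update), every secondary
-- index gets a new entry pointing at the new physical location.
-- InnoDB: secondary indexes reference the primary key, so an update that leaves
-- indexed columns untouched is applied in place within the clustered index.
```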
It's worth noting that PostgreSQL has addressed several of these concerns in subsequent versions. PostgreSQL 10 introduced logical replication, and ongoing work on storage improvements continues. Additionally, Uber's workload—high update rates on large tables with many secondary indexes—was highly specific to their use case.
This migration illustrates that database selection depends heavily on access patterns, operational expertise, and workload characteristics rather than abstract feature comparisons.
Framework Abstraction: Does It Matter?
Modern web frameworks abstract most database differences. Ruby on Rails' ActiveRecord, Django's ORM, Laravel's Eloquent, Prisma, and similar tools provide database-agnostic interfaces. For standard CRUD operations, the underlying database becomes largely invisible to application code. Migrations handle schema changes, and query builders translate to appropriate SQL dialects.
However, abstraction doesn't eliminate underlying performance characteristics. Memory management differences persist regardless of your ORM. Replication complexity doesn't disappear because ActiveRecord handles your queries. Connection overhead affects latency whether you're using Prisma or raw SQL.
The database choice also matters when you inevitably step outside ORM coverage—complex reporting queries, database-specific features, bulk data operations, and performance optimization all require understanding the underlying system.
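One concrete example of where the abstraction ends is the humble upsert. An ORM can hide it behind a single save-or-update call, but the raw SQL diverges between the two systems, and hand-written bulk or reporting queries have to commit to a dialect. The page_views table and its columns are hypothetical; assume page_id is the primary key.

```sql
-- MySQL: insert, or bump the counter if the key already exists.
INSERT INTO page_views (page_id, views)
VALUES (101, 1)
ON DUPLICATE KEY UPDATE views = views + 1;

-- PostgreSQL: the same intent, expressed with ON CONFLICT.
INSERT INTO page_views (page_id, views)
VALUES (101, 1)
ON CONFLICT (page_id) DO UPDATE SET views = page_views.views + 1;
```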
When PostgreSQL Excels
A fair assessment requires acknowledging PostgreSQL's genuine strengths:
- Advanced data types: Native support for arrays, JSON/JSONB with GIN indexing, range types, network address types, and user-defined types. JSONB, in particular, offers binary storage with indexing capabilities that surpass MySQL's JSON implementation for complex querying (see the sketch after this list).
- Extensibility: PostgreSQL's extension architecture enables PostGIS for geospatial data (the industry standard for geographic information systems), pg_trgm for fuzzy text search, TimescaleDB for time-series workloads, and pgvector for embedding-based similarity search. Extensions can transform PostgreSQL into specialized systems without forking the codebase.
- Standards compliance: Closer adherence to SQL standards and more predictable behavior in edge cases. PostgreSQL implements more of the SQL specification, which can simplify migrations between database systems.
- Complex queries: PostgreSQL's query planner handles sophisticated analytical queries with impressive optimization, including parallel query execution, advanced join strategies, and materialized views.
- Full-text search: Built-in full-text search capabilities with ranking, stemming, and dictionary support—often sufficient to avoid external search infrastructure for moderate requirements.
- ACID guarantees: True per-statement snapshot semantics under the default Read Committed isolation level and robust data integrity enforcement.
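To make the first two points and the full-text item concrete, here is a short sketch against a hypothetical articles table. pg_trgm ships with PostgreSQL's contrib modules; everything else here (table, columns, queries) is an assumption for illustration.

```sql
-- Hypothetical table with a JSONB document column.
CREATE TABLE articles (
  id       BIGSERIAL PRIMARY KEY,
  title    TEXT  NOT NULL,
  body     TEXT  NOT NULL,
  metadata JSONB NOT NULL DEFAULT '{}'
);

-- JSONB with a GIN index: containment queries stay indexed.
CREATE INDEX idx_articles_metadata ON articles USING GIN (metadata);
SELECT id, title
FROM articles
WHERE metadata @> '{"tags": ["databases"]}';

-- Extension example: trigram index for fuzzy matching on titles.
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX idx_articles_title_trgm ON articles USING GIN (title gin_trgm_ops);
SELECT id, title FROM articles WHERE title % 'postgres replication';

-- Built-in full-text search with stemming and ranking.
SELECT id, title,
       ts_rank(to_tsvector('english', body),
               plainto_tsquery('english', 'replication failover')) AS rank
FROM articles
WHERE to_tsvector('english', body) @@ plainto_tsquery('english', 'replication failover')
ORDER BY rank DESC;
```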
Conclusion
Both MySQL and PostgreSQL are production-ready, battle-tested databases that power significant portions of the internet. PostgreSQL's recent popularity surge is well-deserved—it's an excellent database with an active community, impressive feature set, and strong ecosystem support from companies like Supabase, Neon, and Vercel.
But MySQL shouldn't be dismissed. Its operational simplicity, explicit memory management, straightforward replication, and proven scalability make it a compelling choice for many applications. The companies running MySQL at massive scale—Meta, YouTube, Booking.com, GitHub—demonstrate that it handles growth effectively.
For applications prioritizing operational simplicity, predictable resource usage, and straightforward scaling, MySQL offers genuine advantages. For applications requiring advanced data types, geospatial capabilities, or sophisticated analytical queries, PostgreSQL's feature set may prove more valuable.
Your requirements may lead to a different conclusion—and that's exactly as it should be. The best database is the one that fits your team's expertise, your application's access patterns, and your operational capabilities.
Don't choose based on popularity. Choose based on fit.