~/writings/picking-a-database-properly

data strategy crash course

The One Thing You Should Know Before Vibe Coding

"If you know whence you came, there is really no limit to where you can go."

— James Baldwin

This quote from the legendary James Baldwin echoes an ancient injunction: know thyself. To know oneself is to understand origin, constraint, and trajectory simultaneously. In plain terms, this self-knowledge enables situational awareness: the capacity not only to discern what is occurring but also why, tracing causal chains, anticipating second- and third-order effects, and forecasting what follows from current conditions.

The same epistemic principle governs competent technology design.

Modern tooling has dramatically reduced the barrier to building. Yet this convenience has obscured a discipline that once sat at the center of serious engineering: architectural reasoning. Titles like solution architect emerged to formalize accountability for why a system is structured the way it is, and why particular tools are chosen to serve that structure.

Architecture today is too often "vibed," leaving shaky foundations that eventually collapse.

Speed at the expense of understanding is another term for gambling. Except in technology, instead of an immediate loss, it becomes a mountain of money spent bandaging avoidable problems.

Selecting a database (or, in modern enterprise systems, a hybrid of databases) is among the most consequential foundational decisions in software architecture. This choice propagates upward and outward through every layer of the stack. It constrains or enables entire classes of possible queries, shapes developer ergonomics, and defines the ceiling of performance and scalability. More subtly, it encodes a philosophy: an implicit theory of how your data will relate and evolve.

A database (or hybrid of databases) is not "just" a storage layer. It is your skeleton. What works at ten users must work at ten million. Many self-taught technologists are vulnerable here. Bottlenecks, technical debt, and scalability failures have been normalized because speed has too often been prioritized over foundations.

Going slow to go fast is always the best approach to technology, and that starts here. Let's talk about it.

Relational Databases

I was a kid in an airport trying to win MLB's Beat the Streak, studying a stack of statistics with no concept of a database. Years later, when I learned what a database was, I realized that much of the tedious work I had done manually could have been stored and queried properly. I later spoke with people who were, at the time, database experts. One of them told me he chose to specialize in databases because technology is continuously evolving, but the principles of relational databases would always remain the same, so mastering them would always be relevant.

He did not foresee how AI would make certain forms of static engineering obsolete, but his point carried a deeper truth. The principles of data modeling and relational integrity remain relevant regardless of how technology evolves, because data is the foundation of all software. Before NoSQL, before graph databases, before real-time databases, relational databases were the foundation on which nearly all software was built.

So when people describe relational databases as "old," "boring," or "outdated," it often says more about when they entered the field than about the technology itself. While "boring" may be true, relational databases still serve a critical role, whether through direct implementation or as the conceptual foundation for modeling data before selecting another database paradigm.
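That conceptual foundation (tables, keys, and enforced relationships) can be sketched in a few lines. The following uses Python's built-in sqlite3 module as a stand-in for any relational engine; the table and column names are illustrative, not from any real system.

```python
import sqlite3

# In-memory database; sqlite3 stands in for any relational engine.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default

conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE orders (
        id      INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id),
        total   REAL NOT NULL
    )
""")

conn.execute("INSERT INTO users (id, name) VALUES (1, 'Ada')")
conn.execute("INSERT INTO orders (user_id, total) VALUES (1, 19.99)")

# Referential integrity: an order pointing at a nonexistent user is rejected
# by the database itself, not by application code.
try:
    conn.execute("INSERT INTO orders (user_id, total) VALUES (99, 5.00)")
except sqlite3.IntegrityError:
    print("rejected: no such user")
```

The point is not SQLite specifically; it is that the engine, not the application, guards the invariant. That guarantee is the "boring" part that never goes out of date.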

Among relational databases, in my experience and observation, PostgreSQL stands apart. Its early support for advanced functions, JSONB, and portability across operating systems made it difficult to compete with. In contrast, other tools such as MySQL prioritized speed and simplicity at the expense of functionality. Limitations around temporary tables, self-joins, and expressive querying left lasting scars for many early developers. Microsoft SQL Server positioned itself as the enterprise database for organizations already embedded in the Windows ecosystem, and as developer culture drifted away from that ecosystem, its popularity declined. Oracle followed a similar trajectory, often perceived as software for a previous generation.

While many technologies have been adopted successfully, PostgreSQL's early commitment to correctness, extensibility, and developer ergonomics earned sustained trust. It did not merely win preference. It established itself as a relational database suitable for serious, long-term, enterprise-grade systems.

Document Databases

Document databases emerged as a reaction to the rigidity of relational schemas. They promised flexibility: store JSON, iterate quickly, and worry about structure later.

MongoDB became synonymous with this movement. It offered a seductive pitch: model your data the way your application thinks about it. No migrations, no foreign keys, no constraints. Almost exactly how a vibe coder with no foundation thinks. Worry about it later. Adoption accelerated quickly. MVP velocity increased dramatically. Other non-relational databases followed in its wake: DynamoDB, Cassandra, Cosmos DB.

But document databases, in the hands of the wrong developer, enabled lazy data modeling and a disregard for tried-and-true best practices. Data duplication, inconsistent schemas, and the absence of integrity checks produced systems that were brittle, opaque, and difficult to maintain.

In the hands of the right developer (one who understands the trade-offs), a document database is among the most powerful tools available. Distributed workloads, horizontal scaling, and flexible schemas, particularly in domains where structured data is the exception rather than the rule, are where document databases excel.
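The document model itself is simple enough to sketch in plain Python: a collection is a list of schemaless documents, and a query matches on whatever fields a document happens to have. The field names and the `find` helper below are illustrative inventions, not any database's API.

```python
import json

# A "collection" of schemaless documents. The two products share a
# collection despite having different shapes -- adding a field to one
# requires no migration of the other.
products = [
    {"_id": 1, "name": "keyboard", "price": 49.0,
     "specs": {"layout": "ANSI", "switches": "tactile"}},
    {"_id": 2, "name": "ebook", "price": 9.0,
     "download_url": "https://example.com/ebook"},  # no physical specs at all
]

def find(collection, **criteria):
    """Naive query-by-example: return documents whose top-level fields
    match every criterion. Documents missing a field simply don't match."""
    return [doc for doc in collection
            if all(doc.get(k) == v for k, v in criteria.items())]

print(json.dumps(find(products, name="ebook")[0], indent=2))
```

The flexibility is real, and so is the trade-off: nothing in this model stops two "products" from disagreeing about what a product even is. That discipline moves from the database into the developer's head.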

Graph Databases

Graph databases are my favorite to talk about. They enable the stickiness behind the most defensible applications. Their query syntax can be complex, but the concept is easy to grasp. At their core, graph databases model the world the way it actually works: entities and the relationships between them. They let us express complex patterns of connection, loosely analogous to the way neurons link in the human brain; patterns that relational or document databases cannot easily query or express.

Neo4j pioneered this model: data is traversed instead of joined, and multi-hop relationships are not only cheap to compute but designed to be understood at a human level rather than through rigid structural abstractions.
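The multi-hop idea can be sketched without a graph database at all. In Cypher this would be a single variable-length pattern (something like `(a)-[:FOLLOWS*1..2]->(b)`); below is the same "friends-of-friends" traversal as a plain breadth-first search over an adjacency list. The graph and names are made up for illustration.

```python
from collections import deque

# Toy social graph: who follows whom.
follows = {
    "ana": ["ben", "cal"],
    "ben": ["dee"],
    "cal": ["dee", "eve"],
    "dee": ["eve"],
    "eve": [],
}

def within_hops(graph, start, max_hops):
    """Return every node reachable from `start` in at most `max_hops` edges."""
    seen = {start}
    frontier = deque([(start, 0)])
    reachable = set()
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # don't expand past the hop limit
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                reachable.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return reachable

print(sorted(within_hops(follows, "ana", 2)))  # ['ben', 'cal', 'dee', 'eve']
```

In a relational schema, each extra hop is another self-join whose cost grows with every level; in a graph store, the traversal follows pointers from node to node, which is why multi-hop questions stay cheap.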

The result is the foundation behind the best recommendation systems, social networks, and temporal applications: systems whose value lies not in what is merely reported, but in what is deeply connected and not easily replicated.

Hybrid Architectures

The most common mistake technologists make is believing that picking a database means picking one. Mature systems do not work this way. Hybrid architectures have gained traction because no single database paradigm is optimized for the full suite of requests from an enterprise system.

A well-designed hybrid architecture deliberately assigns each paradigm a clear and bounded responsibility. Relational databases anchor transactional truth and enforce invariants. Document databases absorb flexibility and scale where structure is fluid. Systems fail not because hybridity is complex, but because responsibilities are blurred: when document stores are used as sources of truth, or when relational schemas are contorted to simulate graph-like relationships they were never designed to express.
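One way to keep those responsibilities bounded is to derive the flexible shape from the strict one, never the reverse. The sketch below (illustrative names throughout, sqlite3 standing in for the relational store) keeps transactional truth in tables and projects a denormalized, document-shaped read model out of them.

```python
import json
import sqlite3

# Relational store: owns transactional truth and invariants.
truth = sqlite3.connect(":memory:")
truth.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
truth.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY,"
              " user_id INTEGER NOT NULL, total REAL NOT NULL)")
truth.execute("INSERT INTO users VALUES (1, 'Ada')")
truth.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                  [(1, 1, 19.99), (2, 1, 5.00)])

def user_document(conn, user_id):
    """Project relational rows into one denormalized document for reads.
    The document is derived and rebuildable -- never the source of truth."""
    name, = conn.execute(
        "SELECT name FROM users WHERE id = ?", (user_id,)).fetchone()
    orders = conn.execute(
        "SELECT id, total FROM orders WHERE user_id = ?", (user_id,)).fetchall()
    return {"id": user_id, "name": name,
            "orders": [{"id": oid, "total": t} for oid, t in orders]}

print(json.dumps(user_document(truth, 1)))
```

If the document store is lost or drifts, it can be rebuilt from the relational side; the failure mode described above happens when that arrow of derivation is allowed to point both ways.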

The prerequisite to getting this right is understanding the problem you are solving. Introductory computer science often teaches this through puzzle analogies: to assemble a puzzle efficiently, you first segment the pieces (edges, colors, patterns) before attempting construction. The same principle applies to architecture. Data must be sectioned by behavior and long-term intent before choosing how and where it should live.

Conclusion

Don't vibe code your data foundation :)