Looking Back At My DB Experiences

Intro

My feelings toward ORMs are complicated. It feels like every framework, every project now uses ORMs. ORMs make traversals from one object through relationships (any of one-to-one, one-to-many, many-to-one, many-to-many) feel like a piece of cake. You barely even have to understand SQL! ... Or so you think. How are those objects and their relationships hydrated with data? Do you even know?

My Different Experiences with Relational Databases

My First Dance: Plain, Raw SQL On a College Project

My first project ever was just raw SQL all over the place. My professor in college mandated it be that way. Her husband worked at Draft Kings, and she told us how they all wrote raw SQL even in production. I'm glad I started there. At the time, the act of typing so many column names and table names out (with no autocomplete) was frankly torture. I wasn't all that fast a typer yet, my SQL skills themselves were developing on the fly, and I had never even made an app before. Just figuring out how to get my JSP (Java Server Pages) completely unstyled, raw HTML application running on localhost caused me plenty of pain by itself.

Now while I felt really slowin terms of productivity and output. The entire journey through that project, I was absolutely aware of every single roundtrip to the database because I wrote every single statement from scratch. More on this later, but the SQL I wrote as a stupid newbie on a college project was objectively better than many "production-grade" projects I worked on later. I was a terrible programmer at the time, but I wrote something extremely fast because I wrote every interaction with my data store by hand.

I didn't know that of course at the time. I was haunted by the fact that I was absolutely certain that there were options out there...I'm not so sure I was right.

Solo Web App #1: Raw SQL Again

Between that experience and my first job, I dabbled with a little local postgres instances on my computer (I was obsessed with basketball stats at the time), but it was at my first job where we met "for real" again—with actual customers this time. The company was a small consulting firm for state DOTs (Departments of Transportation). We were not really a software company per se, but occasionally the company dabbled with it Wordpress sites or hire contractors to build different tools. They knew I had some (I was really a Math and Econ major) computer science background, so over time they gave me more and more software projects.

At my first adult job, my first voyage in app-making was to build essentially a survey/test-taking cool with a few graphs at the end. Transportation practitioners—as we called them—would login, create assessment(s), and answer series of questions about the state of their agencies. At the end, they'd get some summary facts and a little graph. Not knowing much about apps, I built the thing in flask and built every little thing from scratch. I wrote all my database interactions in raw sql. This is an aside, but I also pretty much stuck just to functions in python-land. I didn't understand why classes and methods seemed to be everywhere—more on that in another article I'm sure. Classes were literally only used if a 3rd party required their usage. Eventually, I have to say that I made a half-decent product (it did definitely take a while though and it stylistically looked like crap), but it totally worked smooth like butter - and still does today.

Solo Web App #2: Enter the ORMs

At that point, they said can you do it again? And can you make it more reusable for future surveys/assessments. I said I'd try. I thought this time around, a framework might be the move. Maybe it'll teach me things. I read "Two Scoops of Django" (a great book by the way) as I got to work. This whole ORM thing was kinda nice, but why in the world were some of my pages so slow.

I hit the N+1 problem so easily with my little toy app. See, this time around, they wanted to enable group assessments. Admin could make group assessments, send invites to others, let them take assessments, track their progress, fire off a unified final "consensus" assessment, export their results, clone their assessment for next year, and every assessment had two distinct assessment phases ... I was not good at saying no. But in building this data model, there were several places where I needed the parent assessment's children assessments' questions, their answers, their comments, and rollup percents of their progress across both phases. Django tries to hide your tablenames, I think generally the community will tell you: don't write raw sql in your Django apps. Plus I had so much to do. I was trying to write it *fast*. Well, this project didn't go nearly as well - mostly because of the product requirements for a still quite green "software engineer" (at the time I didn't feel equipped to even call myself that), but my lack of ability to make the calls to the DB when-I-wanted, how-I-wanted really destroyed my ability to make this project reliable or performant. I'm not sure I was well-versed enough in SQL at the time anyways to write elegant/performant SQL for some of these things, but I'm convinced now the plain SQL approach would have better end product for me - at least at the time.

During the project, I at least learned about select_related and prefetch_related and. I also wasted a ton of time making my app template generic. There was more more iteration upon this app: another similar assessment-like tool with which I decided to give React a go as well (somebody really should have told me about Ruby on Rails or Elixir at this point). Oh, what a time it was—coding alone with no mentorship, pieced together solely by my own intuition and random stackoverflow forums.

Flash forward to my current and longest job. We use Flask and SQLAlchemy to power a GraphQL API. To make graphql work well, it is essential that you eagerload/dataload every path in your graph. We went with a system where we dynamically inspect our sqlalchemy models and graphene types and autogenerate a ton of stuff including SQL-driven filters, count fields for all our sqlalchemy relationship, aggregates, etc. There are many projects that have now done this kind of thing, but we really did do all of this in-house. This is definitely where I picked up the greatest amount of ORM—and SQL-knowledge. So often graphql produced a codepath through a calculated column or something of that nature that produced wildly inefficient roundtripping code. We have used all sorts of different measures to counter roundtrips, long running queries, recursive ctes, ... with hybrid properties, our handrolled "banquet properties" with either an expression or custom comparators, sql expressions as column properties, views invisible to the ORM, tables defined by sqlalchemy-core select statements (like a view), custom sql functions, postgres triggers, async workers, ... we've tried a lot. And because we've tried a lot, I think I finally have some authority to speak on ORMs now, so stay tuned for the next post all about ORM footguns (there are MANY)!