The Case for PostgreSQL

If you’re in database administration, chances are you’ve faced an age-old question: is it better to become slightly proficient in lots of different database tools, or become very proficient in one tool that can do a bit of everything reasonably well?

In this blog post, we’ll explore the value of PostgreSQL for database administration and explain why you should learn it if you are pursuing a career in this field.

What is Database Administration?

Database administration is the practice of storing and protecting data. For years, this practice labored in obscurity as a sub-function of application development or an also-ran of operations management. In today’s world of electronic everything, enterprises are beginning to understand that data is the lifeblood of the organization.

As awareness of data analysis grows, so does the market’s understanding of the importance of data, and at an astounding rate. Database administrators are now considered a vital role in the service delivery pipeline, and they are being given a much louder voice in operations. To perform this critical function of data storage and protection, database administration has grown to include many tasks:

  • Security of data in flight and at rest
  • Security of data at an application access level (see the sketch after this list)
  • Redundancy of data for multiple consumers
  • Availability of data through service stability
  • Recovery of data in disaster scenarios
  • Interpretation of data through defined storage
  • Maintenance of data for cleanliness
  • Archiving of data after its retention policy has expired
  • Acquisition of data from foreign systems
  • Provision of data to applications
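
To make one of these tasks concrete, here is a minimal sketch of application-level access control using PostgreSQL’s built-in role system; the role and table names (reporting_app, orders) are hypothetical:

```sql
-- Hypothetical example: a least-privilege role for a reporting application.
CREATE ROLE reporting_app LOGIN PASSWORD 'change-me';

-- The role may read the orders table, and nothing else.
GRANT USAGE ON SCHEMA public TO reporting_app;
GRANT SELECT ON public.orders TO reporting_app;

-- Any attempt to write is rejected:
--   INSERT INTO orders ... => ERROR: permission denied for table orders
```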

Where Does PostgreSQL Fit In?

The role of PostgreSQL in this market is as an open source Post-Object Relational Database Management System (PORDBMS). PostgreSQL has been available for more than 25 years, and in that time it has grown into a mature offering of data services, produced by thousands of developers across the world.

For the moment, suffice it to say that you are not working in a vacuum here. So let’s go through that acronym one concept at a time and use it to define what PostgreSQL has grown to become.

P stands for post, as in after object relational in clock time. This refers to the advanced storage and interpretation features of PostgreSQL, such as JSON and XML support, alternative storage engines, replication models, and enterprise management tools. For the last decade or so, the PostgreSQL global development community has been concentrating on the toolset surrounding PostgreSQL to make it a world-class database ecosystem.
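
To make that concrete, here is a minimal sketch of the JSON side of those features, using a hypothetical events table:

```sql
-- Hypothetical example: storing and querying semi-structured data as JSONB.
CREATE TABLE events (
    id      bigserial PRIMARY KEY,
    payload jsonb NOT NULL
);

INSERT INTO events (payload)
VALUES ('{"type": "login", "user": "alice", "ok": true}');

-- The ->> operator extracts a JSON field as text.
SELECT payload ->> 'user' AS username
FROM events
WHERE payload ->> 'type' = 'login';

-- A GIN index accelerates containment queries (@>) on large tables.
CREATE INDEX events_payload_idx ON events USING gin (payload);
```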

OR stands for object relational. This one is quite a bit more complex than you would imagine at first glance. The obvious reference is to the fact that the entities in the database (relations—tables, views, functions, etc.) are associated with one another through data references that are design dependent. In addition to that, the objects themselves may be used as base classes in an object-oriented programming sense. That is, a table may be defined using the definition of another table or a function may take a table definition as input and provide a different table definition as output. This rather unique implementation follows through to using built-in data types as the basis for your own custom data types, as well as constraint definitions.
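
Here is a minimal sketch of both ideas, with hypothetical table and type names: a domain that builds a constrained custom type on top of a built-in one, and a child table that inherits the definition of a parent.

```sql
-- Hypothetical example: a custom constrained type plus table inheritance.
CREATE DOMAIN percentage AS numeric
    CHECK (VALUE >= 0 AND VALUE <= 100);

CREATE TABLE vehicles (
    id       bigserial PRIMARY KEY,
    capacity integer,
    load_pct percentage
);

-- trucks inherits every column of vehicles and adds its own.
CREATE TABLE trucks (
    axle_count integer
) INHERITS (vehicles);

INSERT INTO trucks (capacity, load_pct, axle_count) VALUES (24000, 75, 3);

-- Queries against the parent see rows from the child as well.
SELECT id, capacity, load_pct FROM vehicles;
```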

DB stands for database. This is the core of the system which manages the data storage and retrieval layer. It is written in C, and it’s blindingly fast.

MS stands for management system. Calling PostgreSQL a single service is a bit of a misnomer: it is actually several services working together to perform storage, retrieval, user management, caching, temporary storage, and several other tasks. It can grow to several hundred or even thousands of Linux processes, all coordinated by a central service.
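
You can watch this process architecture from inside the database itself; a minimal sketch:

```sql
-- Each row corresponds to a separate operating-system process:
-- client backends, the checkpointer, background writer, WAL writer,
-- autovacuum launcher, and so on.
SELECT pid, backend_type, state
FROM pg_stat_activity
ORDER BY backend_type;
```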

The Pros and Cons of PostgreSQL

This brings us to a very hot debate in the database community: is it more advantageous to retain all of the data in one system, or to separate the data by function into bespoke systems? And the answer is, well, both. Or maybe not. Or maybe neither. Kinda depends. Sorta.

The question is usually very poorly asked and is therefore pretty poorly answered. To say that PostgreSQL is just for OLTP is to undersell its abilities to the point of libel. But to say that it can do everything is also a bit of an overstatement. A better-formed version of the question would be: “At what architectural point should I begin separating my data into scalable services?”

First, let’s dive into the advantages and disadvantages of keeping all your eggs in one basket. For starters, the data is stored with consistency at the transaction level. That is, you won’t get one version of a row from one source and a different version from another; all changes are reflected across the system in coherence with one another.
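
A minimal sketch of that transaction-level consistency, with a hypothetical accounts table:

```sql
-- Hypothetical example: both updates become visible to other sessions
-- atomically, or not at all.
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;
-- No session ever sees the debit without the matching credit.
```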

The second big advantage is high availability configuration. The more systems you split the data into, the more of them you have to have disaster plans and operational procedures for. One system equals one procedure plan. Much simpler. A single system is also easier to monitor. What happens if one part of a multipart service goes down? Is the system available or not? If there is only one system, there is no ambiguity.

Do you really want to support multiple database types? Really? With one administrator or with several? Are the database administrators going to be separated by service or by skills? How many other pieces of architecture have to be designed to go with that? Load balancing, connection pooling, connection concentrators, DNS, caching, etc.? Creating a data services layer based on many data storage types can become overwhelmingly complex very quickly.

Now for the opposite point of view. The advantages of decentralization are that multiple services can be tuned independently for performance. Growth can be accommodated for the fastest-growing service without having to upgrade everything. External utilities can also be sized appropriately to each service. Performance usually improves, and diagnostics become divided by service, making in-flight issues easier to locate and diagnose.

Whether you go with one architecture or the other also depends a lot on the skill set of the implementation team. A “one-man show” might want to keep the disaster recovery and high availability plans as simple as possible; more skills and more available time make more complex configurations realistic.

PostgreSQL Does Everything. Kinda.

So this brings us to the final form of the question: “At what point is it appropriate to separate the services?” And that answer lies with your specific data usage pattern. PostgreSQL is a vertical scalability model by design: there can only be one system receiving changes to the data at a time. That constraint sets the maximum transaction throughput that PostgreSQL will allow your application.

As you approach 60% of that limit, you should start planning for multiple databases, whether they are also PostgreSQL or not. At about 80% of capacity, you should execute that plan. The idea is to stay with the single-system solution as long as possible, but to have a way out if you outgrow it. The vast majority of systems never will.
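
PostgreSQL does not expose a single “percent of capacity” figure, so in practice you trend transaction throughput over time against whatever ceiling your own benchmarks established; a minimal sketch:

```sql
-- Commits and rollbacks per database since the statistics were last reset.
-- Sampled periodically, this gives a transactions-per-second trend to
-- compare against your benchmarked maximum.
SELECT datname, xact_commit, xact_rollback, stats_reset
FROM pg_stat_database
WHERE datname = current_database();
```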

All of this discussion, then, was to determine which solutions fit in the PostgreSQL box. The answer is all of them, up to a point. And some of them still fit after that point. By the time you get there, it will be apparent which ones are appropriate.

PostgreSQL Use Cases Are Growing

Use cases for PostgreSQL are growing exponentially over time. That is, the number of applications that PostgreSQL can target is becoming greater as time goes by, and (more importantly) the number of applications that it can’t target is going down.

Two decades ago, it would have been reasonable to say that PostgreSQL was appropriate for OLTP solutions with a few megabytes of data. Now that target market includes OLTP, ETL/ELT, OLAP, and ODS solutions, and much more. In fact, it is now easier to say what kinds of applications PostgreSQL does not target, such as seismic data processing or the NYSE. But these corner cases are becoming fewer and fewer, almost to a vanishing point.

As PostgreSQL use cases grow, so do the professional opportunities for those who take the time to learn it. If you’re in development or database administration and haven’t yet learned PostgreSQL, this is an exciting time to dig in and see what it can do for your projects (and your career).

 

NB: This article originally appeared in a slightly longer form and permission has been kindly given by Instaclustr®, one of the founding members of the OpenSI, for its reuse.
