Picking the Right Database Type – Tougher than You Think

Update: 2024-02-05

Description

You asked, we listened! A request from one of our Slack channels was to go over the various types of databases and why you might choose one over another. Join us in another information filled episode where Joe won’t be attending the event he’s been promoting and Allen tries to keep his voice together for the entirety of the episode, and almost succeeded.

News

Reviews

iTunes: ivan.kuchin, MikeW717

Spotify: Darren Pruitt, chutney3000

Upcoming Events

Orlando Code Camp – Conference is February 24th
https://orlandocodecamp.com

Miscellaneous

Kudos to Dell Support on their monitors

The Cat 8 journey will be beginning soon

Home offices – random desires

Database Types

Primary resource we used

https://db-engines.com/en/ranking

Some terminology we’ll be using

Schema on write – the schema for the data is determined before writing the record

Schema on read – the schema of the data is understood by the client using the data

Relational DBMS

Popular – 1. Oracle, 2. mySQL, 3. Microsoft SQL Server, 4. PostgreSQL, 8. IBM DB2, 9. Snowflake, 11. Microsoft Access

Schema on write

Primary language / form of access is SQL

Schema is defined by named tables with named columns and specific data types

Data exists as rows in the table that conform to the columns/types that are defined in the schema

Scalability – typically vertical scaling (increasing available CPU/RAM) is the preferred way
- Horizontal scaling with most RDBMS’s is generally complex and requires a lot of thought and effort
  - https://www.designgurus.io/blog/scaling-sql-databases

Can be very performant but requires knowledge on how to index and store data properly
- Even with excellent design and indexing, performance can suffer as size of data grows

Some fun Instragram posts on scaling their databases
- https://instagram-engineering.com/sharding-ids-at-instagram-1cf5a71e5a5c
- https://earthweb.com/how-many-pictures-are-on-instagram/

Key-value stores

Popular: 6. Redis, 15. Amazon Dynamo DB, 27. Azure Cosmos DB, 35. Memcached, 54. etcd

Schema on read

No real language – usually an API to put and get documents

Depending on the key value store, complex data structures may be stored and ability to query in various ways

Scalability – horizontally scalable – massively

Very performant

Many have built in extended functionality beyond looking up by a single key – for instance, Redis allows search engine type of filtering

Why’s Hadoop not on the list?
https://db-engines.com/en/blog_post/16

Document Stores

Popular: 5. MongoDB, 15. Amazon DynamoDB, 17. Databricks, 27. Azure Cosmos DB, 34. Couchbase

Schema on read

DBMS specific querying – usually offer a SQL capability but often times is not the most powerful way to query the data

Documents do not need to conform to any schema
- Multiple documents in the same collection can have completely different fields/properties, OR they have have the same properties with different data types
- Documents can contain collections in fields or even nest other documents
- Typically stores data in JSON like documents

Can be very performant but may require care to create proper indexes, manage connections, etc

Time Series DBMS

Popular: 28. InfluxDB, 50. Prometheus, 52. Kdb, 79. Graphite, 73. TimescaleDB

Schema on read

Has special features specifically tailored to time series data that isn’t quite as easy / performant in a regular RDMBS or Key/Value store
- Things like querying instants, range vectors, complex joins on ranges, etc
  - https://prometheus.io/docs/prometheus/latest/querying/basics/
- Also have built in functions specific to the needs of time series data – things like rates, deltas, histograms, quantiles, etc
  - https://prometheus.io/docs/prometheus/latest/querying/functions/

Scalability seems to vary – InfluxDB is set up for scaling via clusters with meta and data nodes, whereas Prometheus has a different federated approach
- Scaling Prometheus – https://logz.io/blog/prometheus-architecture-at-scale/
- Scaling InfluxDB – https://www.influxdata.com/blog/influxdb-clustering/

Very performant for querying time series related data
- Obviously there’s always things to consider – such as histograms vs quantiles in Prometheus – client vs server side
  - https://prometheus.io/docs/practices/histograms/

Graph DBMS

Popular: 22. Neo4j, 27. Azure Comsos DB, 59. Aerospike, 75. Virtuoso, 85. ArangoDB

Schema on write (mostly) – not sure if all graph databases force labels and attributes to be consistent
- https://neo4j.com/docs/getting-started/data-modeling/guide-data-modeling/

Different in terms of functionality than other databases – graph databases store data in terms of nodes and edges
- Edges are the relationships between the nodes

Great explanation on the Neo4j website – https://neo4j.com/docs/getting-started/data-modeling/guide-data-modeling/

Use cases – https://neo4j.com/use-cases/
- Fraud and detection analysis
  - Financial Fraud Detection with Graph Data Science
  - Money Laundering Prevention with Neo4j
  - Why Intelligent Applications Need a Graph Database with Granular Security
  - Fraud Detection with Neo4j
- Identity and access management
- Network and IT operations
- Real time recommendations

So why a graph database? Can’t you do this with an RDBMS and joins?
- The friend of a friend scenario – a graph database can easily and performantly return relationships with 20 degrees of separation or more – try that in a SQL query and watch your mind and database engine melt
  - https://neo4j.com/videos/why-neo4j-3/

Neo4j has built in scalability via sharding – https://neo4j.com/product/neo4j-graph-database/scalability/

Search engine

Popular: 7. Elasticsearch, 14. Splunk, 24. Solr, 40. OpenSearch, 58. MarkLogic

Extensions of NoSQL databases

Schema on read

Complex search expressions

Full text search

Stemming – reducing words to their root forms so that searches can be more accurate with similar word searches

Ranking and grouping of search results

Built for scalability

Incredibly performant for the use case

Not great with relationship data

Why choose over something like a relational or document database?

Resources

https://db-engines.com/en/ranking

https://db-engines.com/en/articles

All the DB vendor websites – so much good information

<a href="https://www.codingblocks.net/get/designing-data-intensive-applicati

Comments

In Channel

When to Log Out

2024-10-0701:03:13

Things to Know when Considering Multi-Tenant or Multi-Threaded Applications

2024-09-0201:58:44

Two Water Coolers Walk Into a Bar…

2024-08-1801:33:42

How did We Even Arrive Here?

2024-08-0401:37:13

AI, Blank Pages, and Client Libraries…oh my!

2024-07-0701:47:20

Alternatives to Administering and Running Apache Kafka

2024-06-2301:05:14

Nuts and Bolts of Apache Kafka

2024-06-0901:37:25

Intro to Apache Kafka

2024-05-2602:04:47

StackOverflow AI Disagreements, Kotlin Coroutines and More

2024-05-1301:41:38

Llama 3 is Here, Spending Time on Environmental Setup and More

2024-04-2801:33:36

Ktor, Logging Ideas, and Plugin Safety

2024-04-1401:38:38

Importance of Data Structures, Bad Documentation and Comments and More

2024-04-0101:40:42

Decorating your Home Office

2024-03-1801:21:17

Multi-Value, Spatial, and Event Store Databases

2024-03-0401:07:13

Overview of Object Oriented, Wide Column, and Vector Databases

2024-02-1902:03:37

Picking the Right Database Type – Tougher than You Think

2024-02-0502:10:54

There is still cool stuff on the internet

2024-01-2101:38:40

Reflecting on 2023 and Looking Forward to 2024

2024-01-0801:52:18

Gartner Top Strategic Technology Trends 2024

2023-12-1801:40:14

2023 Holiday Season Developer Shopping List

2023-11-2502:28:50

00:00

1.0x

Picking the Right Database Type – Tougher than You Think

Allen Underwood, Michael Outlaw, Joe Zack

#box-pro-ellipsis-176575512409657{-webkit-line-clamp:2;}Picking the Right Database Type – Tougher than You Think

News

Reviews

Upcoming Events

Miscellaneous

Database Types

Relational DBMS

Key-value stores

Document Stores

Time Series DBMS

Graph DBMS

Search engine

Resources

Picking the Right Database Type – Tougher than You Think

Allen Underwood, Michael Outlaw, Joe Zack

Picking the Right Database Type – Tougher than You Think