Overview of Object Oriented, Wide Column, and Vector Databases

Update: 2024-02-19

Description

We have a different combination of the hosts for this episode where we continue the series on the types of database systems available and why you might choose one over another. Michael continues impressing by recalling everything we’ve ever said on our 500+ hours of podcasts, Allen enjoys learning about a database system he’d never come across, and Joe is loaded up and ready for his trek to Georgia, USA.

Reviews

iTunes: Calum55555

Spotify: Ian Neethling, Ghostmerc, Xuraith

Audible: Wood2prog

News

Orlando Code Camp
https://orlandocodecamp.com/

Object Oriented DBMS

Popular: InterSystems Cache, 92. InterSystems IRIS, 161. DB4o, 154. ObjectStore, 159. Actian NoSQL Database

The idea was to store data in the database the way that it’s modeled in the application
https://stackoverflow.com/questions/9884407/what-is-the-difference-between-object-oriented-and-document-databases#:~:text=The big difference%2C that I,but they’re organized differently.

Relationships and inheritance would also be modeled in the database

Expected to be more performant because the data is stored the way the application expects it, without complex joins
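
As a rough illustration of storing data the way the application models it, here is a minimal sketch. It uses Python's pickle module and an in-memory dictionary as a stand-in for an object database; TinyObjectStore and the order classes are hypothetical names made up for this example, and none of this is how InterSystems IRIS or any other product listed above works internally.

```python
import pickle
from dataclasses import dataclass, field


@dataclass
class LineItem:
    sku: str
    quantity: int


@dataclass
class Order:
    order_id: int
    items: list[LineItem] = field(default_factory=list)

    def total_quantity(self) -> int:
        return sum(item.quantity for item in self.items)


@dataclass
class RushOrder(Order):
    # Inheritance is part of the stored model instead of being flattened into columns.
    courier: str = "unknown"


class TinyObjectStore:
    """Hypothetical stand-in for an object database: key -> serialized object graph."""

    def __init__(self) -> None:
        self._blobs: dict[int, bytes] = {}

    def save(self, order: Order) -> None:
        # The order and its line items are stored together as one object graph.
        self._blobs[order.order_id] = pickle.dumps(order)

    def load(self, order_id: int) -> Order:
        # No joins: the related items come back attached to the order.
        return pickle.loads(self._blobs[order_id])


store = TinyObjectStore()
store.save(RushOrder(order_id=1, items=[LineItem("A-1", 2), LineItem("B-7", 3)], courier="bike"))

loaded = store.load(1)
print(type(loaded).__name__, loaded.total_quantity())
```

The object comes back as a RushOrder with its line items attached, which is the relationship-and-inheritance idea in miniature. A relational design would typically spread the same data over an orders table and a line items table and join them on read.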

Fallen out of popularity with the availability of ORMs for RDBMS
https://www.ionos.com/digitalguide/hosting/technical-matters/object-oriented-databases/

From InterSystems IRIS info
- Based on the ODMG (Object Database Management Group) standard with advanced features like multiple inheritance
- ObjectScript and Python directly manipulate and read from the storage – objects can also be exposed in other languages like .NET, JavaScript, Java and C++
- Can also be queried with SQL syntax

Wide Column Stores

Popular: 12. Cassandra, 26. HBase, 27. Azure Cosmos DB

Also known as extensible record stores
https://static.googleusercontent.com/media/research.google.com/en//archive/bigtable-osdi06.pdf

Can hold extremely large numbers of dynamic columns
- How much is a large number – “a record can have billions of columns” – which is why they’re also described as two-dimensional key/value stores

Schema on read
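
The "two-dimensional key/value store" description and schema on read can be sketched with a dictionary of dictionaries. This is only a toy model under that assumption; the row keys, column names, and the put and get helpers are invented for illustration and are not any product's API.

```python
from collections import defaultdict

# row key -> {column name -> raw value}; rows do not have to share columns.
table = defaultdict(dict)


def put(row_key: str, column: str, value: str) -> None:
    table[row_key][column] = value


def get(row_key: str, column: str, default=None):
    # .get on the outer dict avoids creating an empty row as a side effect of reading.
    return table.get(row_key, {}).get(column, default)


# Each row can grow its own columns on the fly.
put("user:1", "name", "Ada")
put("user:1", "login:2024-02-19", "web")
put("user:1", "login:2024-02-20", "mobile")
put("user:2", "name", "Grace")
put("user:2", "age", "36")

# Schema on read: values are stored untyped and the reader decides how to interpret them.
age = int(get("user:2", "age", "0"))
logins = sorted(column for column in table["user:1"] if column.startswith("login:"))
print(age, logins)
```

The first dimension is the row key and the second is the column name, so a lookup is really a key inside a key. Nothing stops one row from having two columns and another from having millions.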

Wide column stores should not be confused with columnar storage in RDBMS – the latter is an implementation detail inside a relational database system that improves OLAP-type performance by storing data column by column rather than record by record
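
To make the columnar storage side of that distinction concrete, here is a small sketch of the same made-up data laid out record by record and column by column. It only shows the layout idea, with invented sample values, and does not represent any specific database engine.

```python
# Record by record: each row keeps all of its fields together.
rows = [
    {"id": 1, "region": "east", "amount": 120},
    {"id": 2, "region": "west", "amount": 80},
    {"id": 3, "region": "east", "amount": 50},
]

# Column by column: one list per column, with positions lining up across the lists.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# An OLAP-style aggregate over the row layout has to visit every whole record...
total_from_rows = sum(row["amount"] for row in rows)

# ...while the columnar layout only needs to scan the one column it cares about.
total_from_columns = sum(columns["amount"])

print(total_from_rows, total_from_columns)
```

Both totals are the same; the difference is how much unrelated data has to be read to compute them, which is where the OLAP benefit comes from.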

Using Cassandra as the example – https://cassandra.apache.org/_/cassandra-basics.html
- Hyper-horizontally scalable
  - Prevents data loss due to hardware failures (if scaled)
- Ability to tweak throughput of reads or writes in isolation
  https://www.codingblocks.net/podcast/search-driven-apps/
- Its “distributed” nature means it runs on many nodes but looks like a single point of entry
- No real point of running a single node of Cassandra
- “Masterless” architecture – every node in a cluster acts like every other node
  https://www.codingblocks.net/podcast/designing-data-intensive-applications-secondary-indexes-rebalancing-routing/
- In contrast with traditional RDBMS – can be scaled on low-cost, commodity hardware – don’t need super-high-end motherboards that support terabytes of RAM to scale
- Linear scalability – every node you add gives you + n throughput
  https://www.datastax.com/products/datastax-astra
- Replication is handled by tweaking replication factors – i.e. how many times you want the data replicated in order to stay in a good state
- Per query configurable consistency – how many nodes must acknowledge the read/write query before returning a success
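
The replication factor and per-query consistency ideas can be sketched as a toy model. This is not Cassandra's implementation or its driver API: the node names, the hashing scheme, and the replicas_for, required_acks, and write_succeeds helpers are all assumptions made for illustration. The level names ONE, QUORUM, and ALL do match Cassandra's consistency levels, with quorum meaning a majority of the replicas.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c", "node-d", "node-e"]  # hypothetical cluster


def replicas_for(key: str, replication_factor: int) -> list[str]:
    """Pick `replication_factor` consecutive nodes, starting where the key hashes to."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(NODES)
    return [NODES[(start + i) % len(NODES)] for i in range(replication_factor)]


def required_acks(consistency: str, replication_factor: int) -> int:
    """How many replicas must acknowledge before the query counts as a success."""
    levels = {
        "ONE": 1,
        "QUORUM": replication_factor // 2 + 1,
        "ALL": replication_factor,
    }
    return levels[consistency]


def write_succeeds(key: str, replication_factor: int, consistency: str, down: set[str]) -> bool:
    """Any node could coordinate this write; only the count of live replicas matters here."""
    owners = replicas_for(key, replication_factor)
    acks = sum(1 for node in owners if node not in down)
    return acks >= required_acks(consistency, replication_factor)


owners = replicas_for("user:42", replication_factor=3)
one_down = {owners[0]}  # simulate losing one of the three replicas

print(owners)
print(write_succeeds("user:42", 3, "QUORUM", one_down))
print(write_succeeds("user:42", 3, "ALL", one_down))
```

With a replication factor of 3 and one replica down, a QUORUM write still gets the two acknowledgements it needs while an ALL write does not. That is the trade-off the per-query setting lets you make: stricter consistency in exchange for less tolerance of failed nodes.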

Vector DBMS

Popular: 52. Kdb, 103. Pinecone, 139. Chroma

A database system that specializes in storing vector embeddings and being able to retrieve them quickly
- What is a vector embedding?
  - https://www.pinecone.io/learn/vector-embeddings-for-developers/
  - What is a vector? A mathematical structure with a size and a direction
    - Think of it as a point in space (on a graph) with the direction being the arrow from (0,0,0) to the vector point
    - They say for developers, it’s easier to think of vectors as an array of numbers
    - When you look at the vectors in space, some will be floating by themselves while others might be clustered closely to each other
  - Vectors are very useful in Machine Learning algorithms because CPUs and GPUs are very good at doing math
  - Vector embedding is the process of converting virtually any data structure into vectors
  - It’s not as simple as just a straight conversion
    - You don’t want to lose the original data’s “meaning”
      - An example they used was comparing two sentences – you wouldn’t just compare the words, you want to compare whether the two sentences have the same meaning
      - Keeping the meaning and producing vectors with relationships that make sense requires embedding models
    - Nowadays, many embedding models are created by passing large sets of “labeled” data to neural networks
      https://en.wikipedia.org/wiki/Neural_network
      - Neural networks are usually trained using supervised learning, though they can also use self-supervised or unsupervised learning
        - Using a supervised model, you pass in large sets of data as pairs of inputs and labeled outputs
        - The values are transformed in each layer of the neural network
        - With each training iteration of the neural network, the activations at each layer are modified
        - The goal is that eventually the neural network will be able to provide an output for any given input, even if it hasn’t seen that specific input before
      - The embedding model is essentially those layers of the neural network minus the last one that was labeling data – rather than getting labeled data you get a vector embedding
    - They have a great visualization on the Pinecone page showing the output of a word2vec embedding model that shows how words would appear in this 3D vector space
    - This is what an embedding model does – it can take inputs and know where to place them in “vector space”
      - Items placed closer together are more related, and further apart, less related
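
The "network minus its last layer" idea and the "closer together means more related" idea can be sketched in a few lines. Everything here is made up for illustration: the weights are hand-picked rather than trained, the three-number inputs are hypothetical features, and a real embedding model is far larger.

```python
import math

# Made-up weights for illustration only; a real model learns these during training.
HIDDEN_WEIGHTS = [[0.9, -0.4, 0.1], [-0.3, 0.8, 0.5]]  # 3 inputs -> 2 hidden units
OUTPUT_WEIGHTS = [[1.2, -0.7], [-1.0, 0.9]]            # 2 hidden units -> 2 labels


def dense(weights, inputs):
    """One layer: each output is a weighted sum of the inputs."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


def embed(features):
    """Run every layer except the last; the hidden activations are the embedding."""
    return [math.tanh(value) for value in dense(HIDDEN_WEIGHTS, features)]


def classify(features):
    """Full network: the last layer turns the embedding into label scores."""
    scores = dense(OUTPUT_WEIGHTS, embed(features))
    return scores.index(max(scores))


def magnitude(vector):
    """The 'size' of a vector: its distance from the origin."""
    return math.sqrt(sum(value * value for value in vector))


# Hypothetical inputs: a and b are similar to each other, c is different.
a = embed([1.0, 0.0, 0.2])
b = embed([0.9, 0.1, 0.2])
c = embed([0.0, 1.0, 0.8])

print("embedding a:", a, "magnitude:", magnitude(a))
print("distance a-b:", math.dist(a, b))
print("distance a-c:", math.dist(a, c))
```

Each embedding is just an array of numbers, and the two similar inputs land much closer to each other in that space than either does to the third. The classify function shows what the dropped last layer would have done with the same vector.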

Ok, so now we know what vector embeddings are, what can we do with them?
- Semantic search – rather than only matching words similar to what you entered, search engines can now find content whose meaning is similar to what you searched for
- Question answering applications
- Audio search
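
A semantic search over embeddings boils down to "find the stored vectors nearest to the query vector". The sketch below is a brute-force version under stated assumptions: TinyVectorIndex is a hypothetical class rather than Pinecone's or Chroma's API, the three-number embeddings are invented by hand instead of coming from an embedding model, and production vector databases typically rely on approximate nearest-neighbor indexes rather than scanning everything.

```python
import math


def cosine_similarity(a, b):
    """1.0 means the vectors point the same way; values near 0 mean unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


class TinyVectorIndex:
    """Hypothetical brute-force index that compares the query against every vector."""

    def __init__(self):
        self._vectors = {}

    def upsert(self, item_id, vector):
        self._vectors[item_id] = vector

    def query(self, vector, top_k=2):
        scored = [
            (cosine_similarity(vector, stored), item_id)
            for item_id, stored in self._vectors.items()
        ]
        scored.sort(reverse=True)  # highest similarity first
        return [item_id for _, item_id in scored[:top_k]]


index = TinyVectorIndex()
# Made-up embeddings; in practice an embedding model would produce these.
index.upsert("how to reset a password", [0.9, 0.1, 0.0])
index.upsert("recovering a forgotten login", [0.8, 0.2, 0.1])
index.upsert("best pizza dough recipe", [0.0, 0.1, 0.9])

# Pretend embedding for the query "I can't sign in", which shares no words with the results.
results = index.query([0.85, 0.15, 0.05], top_k=2)
print(results)
```

The query matches the two account-related entries even though it shares no keywords with them, because the comparison happens on the vectors rather than the words.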

Check out the page of sample applications – https://docs.pinecone.io/page/examples

Resources

Primary resource we used for these database rankings
https://db-engines.com/en/ranking

Some nice ways to learn about Machine Learning in an approachable way
https://scikit-learn.org/stable/tutorial/machine_learning_map/index.html

Tips of the Week

docker init – let AI help you generate a better Dockerfile
https://medium.com/@akhilesh-mishra/you-should-stop-writing-dockerfiles-today-do-this-instead-3cd8a44cb8b0

epoch converter has code samples!!!
https://www.epochconverter.com/
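
For reference, a minimal Python sketch of the kind of conversion the site covers, using only the standard library. The helper names are made up for this example and the timestamp is simply midnight UTC on this episode's date.

```python
from datetime import datetime, timezone


def epoch_to_utc(seconds: int) -> datetime:
    """Convert Unix epoch seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def utc_to_epoch(moment: datetime) -> int:
    """Convert a timezone-aware datetime back to Unix epoch seconds."""
    return int(moment.timestamp())


episode_date = epoch_to_utc(1708300800)
print(episode_date.isoformat())
print(utc_to_epoch(episode_date))
```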

Add someone you trust as an

Allen Underwood, Michael Outlaw, Joe Zack
