The Color of Language

Update: 2023-02-03
Description

When we start a new software project, we “swing with a big hammer”, standing up structures and frameworks quickly. What we built resembled both the native Oracle APEX environment and the client’s colors, logos, and fonts. Through this initial framework, we, the development team, offered a vision. From there, a shared vision developed through shared work. We also needed to accommodate multiple languages: French, Dutch, and the English that the developers depended on. We needed to comply with European data privacy laws, which are stricter than those in the United States. To build a robust application that accommodates the nuanced complexities our client requires to differentiate and support their business practices, we design speed and resilience into our system. Designing for speed also means avoiding techniques that rob us of speed.
Speed within a database environment involves honoring a series of rules called “Data Normalization”. A discussion about data normalization typically involves slides presenting abstract rules for optimizing complex data within a database environment. Edgar “Ted” Codd, born in 1923, invented the relational model for database management. He worked for IBM. He received the Turing Award in 1981. Dr. Codd developed the relational model for databases in 1969. More than fifty years later, his initial work has expanded. The initial data normalization steps I learned have expanded, and some nomenclature has changed.
The amazing thing to me is that people invented relational databases. And people invented programming languages. Today, we argue about these topics, forgetting the humanity underpinning these technologies.
My favorite college professor, John Jungck, stood before his Bio 101 course each year to give a lecture called: “There are No Facts”. Unlike modern disputes about facts, he forced no political agenda. He challenged each person in the audience to approach science and technology with an open mind. We must eschew assumptions. “Oh, you think two plus two equals four?” Then he reached under the lectern. He poured two liters of clear water-like fluid into a container with two liters of clear water-like fluid. The total was a bit less than the expected four liters. A bit of chemistry happened, given that one of the fluids was not water. Bluntly put, he performed a parlor trick for us. As an illustration, it works. He encouraged students to be curious, skeptical, and to carry a bit of doubt when people get dogmatic about anything.
All of this technology we use to build and support software applications results from inventions and ideas that came from the minds of people. People like you and like me. We invented it all.
By honoring the process of normalizing data, we gain speed and maximize performance within relational databases. I do promise we are talking about how to manage multiple spoken languages within a database.
In Episode 2 “Data Tables”, I mentioned that my colleague Dirk provided us with 132 data table definitions filling over 5,000 lines of text. I discussed the importance of a unique primary key for each row, or record, of data. One row of data contains the data profile for precisely one subject. One example is the customer table. The customer table has a primary key called customer_pk. All of the data within that row must relate to that exact customer. If it doesn’t relate to or describe an element of the customer’s profile, then it does not belong.
That statement of “fact”, which I surrounded with quotes, complies with the First Normal Form of data. Thank you, Doctor Codd. You documented this idea and stamped it with a name in 1970.
When a data row has a singular primary key, as our customer table does with the customer_pk, then it meets the standards for the Second Normal Form. Well done us. This seems obvious to software developers. It seems so obvious to some that we forget the intelligence and humanity behind this concept. 
Database tables relate to each other. We do not store each and every invoice within the customer table. Instead, we create a table that stores the invoice data. In that table, we’ll have a unique primary key called invoice_pk (meeting both the first and second normal forms – well done us). In the second position of that invoice table, we keep the foreign key for the customer table. Our team calls that foreign key customer_fk. I look at that and know instantly that invoices relate to customers and that the customer foreign key connects to the customer primary key. We break the data into logical and non-duplicating elements.
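The customer and invoice relationship described above could be sketched roughly as follows. This is only an illustration using Python's built-in sqlite3 module, not the Oracle database the project runs on. The names customer_pk, invoice_pk, and customer_fk come from the text; the columns customer_name and invoice_total and both helper functions are assumptions made for the example.

```python
import sqlite3


def build_customer_invoice_db():
    """Create an in-memory database with a parent customer table and a child invoice table."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE customer (
            customer_pk   INTEGER PRIMARY KEY,  -- one row per customer
            customer_name TEXT NOT NULL         -- hypothetical column
        );
        CREATE TABLE invoice (
            invoice_pk    INTEGER PRIMARY KEY,  -- first position: the invoice's own key
            customer_fk   INTEGER NOT NULL      -- second position: points at the parent
                          REFERENCES customer (customer_pk),
            invoice_total REAL NOT NULL         -- hypothetical column
        );
        """
    )
    return conn


def invoices_for(conn, customer_pk):
    """Return (invoice_pk, invoice_total) rows belonging to one customer."""
    rows = conn.execute(
        "SELECT invoice_pk, invoice_total FROM invoice "
        "WHERE customer_fk = ? ORDER BY invoice_pk",
        (customer_pk,),
    )
    return rows.fetchall()


if __name__ == "__main__":
    conn = build_customer_invoice_db()
    conn.execute("INSERT INTO customer VALUES (1, 'Acme Company')")
    conn.execute("INSERT INTO invoice VALUES (10, 1, 250.0)")
    print(invoices_for(conn, 1))
    try:
        # Customer 99 does not exist, so the database rejects this invoice.
        conn.execute("INSERT INTO invoice VALUES (11, 99, 75.0)")
    except sqlite3.IntegrityError as err:
        print("rejected:", err)
```

The invoice row carries nothing about the customer except the key, so the customer's details live in exactly one place.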
For example, some may opt to put a customer’s address in the customer table. Here’s the street name, the municipality name, the state or province, and the postal code. Suddenly, we find a customer with two addresses. Or maybe one address is for mailing and the other is for shipping. Maybe one address is the physical address, etc. We have all experienced this complexity. When I order from an online vendor, I have one postal code for our physical address and a separate postal code for our mailing address. When I want items shipped to our home, I use the postal code for a municipality that is 50 kilometers east of us. The credit card statement goes to a different postal code, which carries a different municipality name. Neither of these agrees with the emergency services (or 911/999) address for our farm. That is a third address. Yes, all three of these addresses land on our 40-hectare property clinging to the side of the world’s oldest mountain range.
To manage the complexity of addresses, we ought to have a table for addresses. The first column is reserved for the address primary key (yes, of course it is called address_pk). The second column is then the customer foreign key (customer_fk).
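One way to picture such an address table is the sketch below, again using Python's sqlite3 module purely for illustration. Only address_pk and customer_fk come from the text. The address_type column, the other column names, the helper functions, and every sample value are placeholders invented for the example.

```python
import sqlite3

SCHEMA = """
CREATE TABLE customer (
    customer_pk   INTEGER PRIMARY KEY,
    customer_name TEXT NOT NULL
);
CREATE TABLE address (
    address_pk   INTEGER PRIMARY KEY,                                  -- first column
    customer_fk  INTEGER NOT NULL REFERENCES customer (customer_pk),   -- second column
    address_type TEXT NOT NULL,   -- hypothetical: 'shipping', 'mailing', 'emergency'
    municipality TEXT NOT NULL,
    postal_code  TEXT NOT NULL
);
"""


def build_address_db():
    """Create the two tables and load one customer with three made-up addresses."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO customer VALUES (1, 'Example Farm')")
    conn.executemany(
        "INSERT INTO address (customer_fk, address_type, municipality, postal_code) "
        "VALUES (?, ?, ?, ?)",
        [
            (1, "shipping", "Eastville", "10001"),
            (1, "mailing", "Westville", "10002"),
            (1, "emergency", "Hilltop", "10003"),
        ],
    )
    return conn


def address_for(conn, customer_pk, address_type):
    """Return (municipality, postal_code) for one kind of address, or None if absent."""
    return conn.execute(
        "SELECT municipality, postal_code FROM address "
        "WHERE customer_fk = ? AND address_type = ?",
        (customer_pk, address_type),
    ).fetchone()


if __name__ == "__main__":
    conn = build_address_db()
    print(address_for(conn, 1, "shipping"))
    print(address_for(conn, 1, "emergency"))
```

A third or fourth address becomes one more row, and the customer table never has to change.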
These relationships and efforts to segregate data into a parent/child relationship satisfy the third normal form. Customer is a parent to both address and invoice. In human-speak, one customer may have zero, one, or more addresses. One customer may have zero, one, or more invoices. The parent table may have zero, one, or more related child rows of data in other tables.
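The "zero, one, or more" idea can be checked with a query. The sketch below is a hedged illustration in Python's sqlite3 module: it counts child invoice rows per customer with a LEFT JOIN so that customers without invoices still appear. The customer names, the customer_name column, and the function names are invented for the example.

```python
import sqlite3


def build_parent_child_db():
    """Load three made-up customers owning zero, one, and two invoices."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customer (
            customer_pk   INTEGER PRIMARY KEY,
            customer_name TEXT NOT NULL
        );
        CREATE TABLE invoice (
            invoice_pk  INTEGER PRIMARY KEY,
            customer_fk INTEGER NOT NULL REFERENCES customer (customer_pk)
        );
        INSERT INTO customer VALUES (1, 'No Orders Ltd'),
                                    (2, 'One Order Ltd'),
                                    (3, 'Two Orders Ltd');
        INSERT INTO invoice VALUES (10, 2), (11, 3), (12, 3);
        """
    )
    return conn


def invoice_counts(conn):
    """Return (customer_name, number_of_invoices) for every customer.

    LEFT JOIN keeps parents that have no children; COUNT(i.invoice_pk)
    ignores the NULL produced for them, so they report zero.
    """
    return conn.execute(
        """
        SELECT c.customer_name, COUNT(i.invoice_pk)
        FROM customer c
        LEFT JOIN invoice i ON i.customer_fk = c.customer_pk
        GROUP BY c.customer_pk, c.customer_name
        ORDER BY c.customer_pk
        """
    ).fetchall()


if __name__ == "__main__":
    for name, count in invoice_counts(build_parent_child_db()):
        print(name, count)
```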
I have always had difficulty in telling a story about the 4th Normal Form of relational data. It is a subtle shift in perception of data duplication. In the prior forms, we strove to eliminate the duplication of data within a row. In short, the normalization process discourages us from having fields such as Address 1 and Address 2 and Address 3 and Address 4 in our tables. We should pull that mess out to create an address table, then simplify the customer table. We want to remove these duplicate-like fields of Address 1, Address 2, Address 3. It makes our lives easier. Picture a customer with only one address. Can you picture how to handle a customer with four addresses? How do we handle that?
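Pulling the Address 1 through Address 4 fields out into their own table might look something like this sketch. It uses Python's sqlite3 module for illustration only. The table name customer_wide, the column names address_1 to address_4 and address_text, the helper functions, and the sample rows are all assumptions.

```python
import sqlite3

# Hypothetical names for the repeated address fields in the wide table.
ADDRESS_COLUMNS = ("address_1", "address_2", "address_3", "address_4")


def build_wide_db():
    """Create a customer table that still carries four address fields per row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer_wide ("
        "customer_pk INTEGER PRIMARY KEY, customer_name TEXT NOT NULL, "
        "address_1 TEXT, address_2 TEXT, address_3 TEXT, address_4 TEXT)"
    )
    conn.executemany(
        "INSERT INTO customer_wide VALUES (?, ?, ?, ?, ?, ?)",
        [
            (1, "One Address Ltd", "1 Main St", None, None, None),
            (2, "Four Address Ltd", "2 Oak St", "3 Elm St", "4 Ash St", "5 Fir St"),
        ],
    )
    return conn


def split_out_addresses(conn):
    """Copy every non-empty address field into its own row of a child table."""
    conn.execute(
        "CREATE TABLE address ("
        "address_pk INTEGER PRIMARY KEY, "
        "customer_fk INTEGER NOT NULL REFERENCES customer_wide (customer_pk), "
        "address_text TEXT NOT NULL)"
    )
    for column in ADDRESS_COLUMNS:  # fixed list of names, not user input
        conn.execute(
            f"INSERT INTO address (customer_fk, address_text) "
            f"SELECT customer_pk, {column} FROM customer_wide "
            f"WHERE {column} IS NOT NULL"
        )
    conn.commit()


def addresses_for(conn, customer_pk):
    """Return the address strings for one customer in insertion order."""
    rows = conn.execute(
        "SELECT address_text FROM address WHERE customer_fk = ? ORDER BY address_pk",
        (customer_pk,),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = build_wide_db()
    split_out_addresses(conn)
    print(addresses_for(conn, 1))
    print(addresses_for(conn, 2))
```

After the split, a customer with one address and a customer with four are handled by the same query, and a fifth address needs no new column.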
In the 4th Normal Form, we strive to reduce redundancy between rows of data. In the earlier normal forms, we don’t want to keep adding fields to accommodate more and more addresses for a customer. That’s inefficient and difficult to write. It brings more problems than it solves. Dr. Codd was right.
In the 4th Normal Form, we reduce the number of rows by creating related tables. Imagine that each contact at a customer also included the customer’s address, the customer’s primary phone number, and such. In this example, five rows of contact data would have the same Customer Name. Five rows would have the same address. Five rows would have the same phone numbers. This can result in a table having compound keys. The most important data fields are duplicated. When searching for the contacts for Acme Company, I get five rows with the name Acme Company. I see five rows with the same phone number and five rows with the same address. Imagine that two of my contact people have the surname Gonzales. What if a father-son duo works there? We suddenly have Pablo Go...
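The Acme Company contact example could be sketched as two related tables, so the company name and phone number are stored once and each contact is told apart by its own key. This is a rough illustration in Python's sqlite3 module. The table and column names other than customer_fk, the phone number, and every surname except Gonzales are invented placeholders.

```python
import sqlite3


def build_contact_db():
    """Store the customer's shared details once and the five contacts separately."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customer (
            customer_pk   INTEGER PRIMARY KEY,
            customer_name TEXT NOT NULL,
            primary_phone TEXT NOT NULL          -- placeholder value below
        );
        CREATE TABLE contact (
            contact_pk  INTEGER PRIMARY KEY,     -- tells two Gonzales rows apart
            customer_fk INTEGER NOT NULL REFERENCES customer (customer_pk),
            surname     TEXT NOT NULL
        );
        INSERT INTO customer VALUES (1, 'Acme Company', '555-0100');
        INSERT INTO contact (customer_fk, surname) VALUES
            (1, 'Gonzales'), (1, 'Gonzales'), (1, 'Smith'), (1, 'Lee'), (1, 'Okafor');
        """
    )
    return conn


def contacts_for(conn, customer_name):
    """Return (contact_pk, surname) for every contact at the named customer."""
    return conn.execute(
        """
        SELECT ct.contact_pk, ct.surname
        FROM contact ct
        JOIN customer c ON c.customer_pk = ct.customer_fk
        WHERE c.customer_name = ?
        ORDER BY ct.contact_pk
        """,
        (customer_name,),
    ).fetchall()


def stored_copies_of_name(conn, customer_name):
    """Count how many rows physically hold the customer's name."""
    return conn.execute(
        "SELECT COUNT(*) FROM customer WHERE customer_name = ?", (customer_name,)
    ).fetchone()[0]


if __name__ == "__main__":
    conn = build_contact_db()
    print(contacts_for(conn, "Acme Company"))
    print(stored_copies_of_name(conn, "Acme Company"))
```

The join still returns five contacts for Acme Company, but the name, phone number, and address would each be updated in a single row.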


Christina Mcdonald Moore