The Backend Engineering Show with Hussein Nasser
Author: Hussein Nasser
© Hussein Nasser
Description
Welcome to the Backend Engineering Show podcast with your host Hussein Nasser. If you like software engineering you’ve come to the right place. I discuss all sorts of software engineering technologies and news with specific focus on the backend. All opinions are my own.
Most of my content in the podcast is an audio version of videos I post on my YouTube channel here http://www.youtube.com/c/HusseinNasser-software-engineering
Buy me a coffee
https://www.buymeacoffee.com/hnasr
🧑🏫 Courses I Teach
https://husseinnasser.com/courses
512 Episodes
Get my backend course https://backend.win
Google submitted a patch to Linux kernel 6.8 that improves TCP performance by up to 40% by rearranging the TCP structures for better CPU cache-line usage. I explore it here.
0:00 Intro
0:30 Google improves Linux Kernel TCP by 40%
1:40 How CPU Cache Line Works
6:45 Reviewing the Google Patch
https://www.phoronix.com/news/Linux-6.8-Networking
https://lore.kernel.org/netdev/20231129072756.3684495-1-lixiaoyan@google.com/
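The core idea of the patch is that fields touched together on the hot path should sit in as few cache lines as possible. Here is a minimal sketch of that idea using Python's ctypes to lay out two hypothetical structs and count how many 64-byte cache lines the "hot" fields span. The field names echo real tcp_sock members, but the layouts, the 64-byte line size and the split into hot and cold fields are assumptions for illustration, not the kernel's actual definitions.

```python
import ctypes

CACHE_LINE = 64  # bytes; a common cache-line size on x86-64 (assumed)

# Hypothetical layouts: the names echo tcp_sock fields, the structs are illustrative only.
class Scattered(ctypes.Structure):
    _fields_ = [
        ("snd_nxt", ctypes.c_uint32),     # hot: touched on every send
        ("cold_a", ctypes.c_char * 60),   # rarely used fields in between
        ("rcv_nxt", ctypes.c_uint32),     # hot: touched on every receive
        ("cold_b", ctypes.c_char * 60),
        ("snd_una", ctypes.c_uint32),     # hot: touched on every ACK
        ("cold_c", ctypes.c_char * 60),
    ]

class Grouped(ctypes.Structure):
    _fields_ = [
        ("snd_nxt", ctypes.c_uint32),     # hot fields packed together...
        ("rcv_nxt", ctypes.c_uint32),
        ("snd_una", ctypes.c_uint32),
        ("cold_a", ctypes.c_char * 60),   # ...cold fields pushed to the end
        ("cold_b", ctypes.c_char * 60),
        ("cold_c", ctypes.c_char * 60),
    ]

HOT_FIELDS = ("snd_nxt", "rcv_nxt", "snd_una")

def cache_lines_touched(struct_cls, fields):
    """Return the set of cache-line indexes that the given fields occupy."""
    lines = set()
    for name in fields:
        field = getattr(struct_cls, name)  # ctypes field descriptor with .offset and .size
        first = field.offset // CACHE_LINE
        last = (field.offset + field.size - 1) // CACHE_LINE
        lines.update(range(first, last + 1))
    return lines

for cls in (Scattered, Grouped):
    print(cls.__name__, sorted(cache_lines_touched(cls, HOT_FIELDS)))
```

With the scattered layout every hot-path operation can pull in three different cache lines; grouping the hot fields gets that down to one, which is the kind of saving the patch is after.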
Discovering Backend Bottlenecks: Unlocking Peak Performance
https://performance.husseinnasser.com
0:00 Intro
2:00 File System Block vs Database Pages
4:00 Torn pages or partial page
7:40 How Oracle Solves torn pages
8:40 MySQL InnoDB Doublewrite buffer
10:45 Postgres Full page writes
Get my backend course https://backend.win
Cloudflare has announced they are open sourcing Pingora as a networking framework! Big news, let us discuss.
0:00 Intro
0:30 Why did Cloudflare build Pingora?
3:00 It is a framework!
7:30 What is in Pingora?
11:50 Security in Pingora
13:45 Multi-threading in Pingora
21:00 Customization vs Configuration
25:00 Summary
https://blog.cloudflare.com/pingora-open-source/?utm_campaign=cf_blog&utm_content=20240228&utm_medium=organic_social&utm_source=twitter
https://backend.win
https://databases.win
I’m a big believer that database systems share similar core fundamentals at their storage layer, and understanding them allows one to compare different DBMSs objectively. For example, how documents are stored in MongoDB is no different from how MySQL or PostgreSQL store rows.
Everything goes to pages of fixed size and those pages are flushed to disk.
Each database defines its page size differently based on its workload; for example, MongoDB's default page size is 32KB, MySQL InnoDB's is 16KB and PostgreSQL's is 8KB.
The trick is to fetch what you need from disk efficiently, with as few I/Os as possible; the rest is API.
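To make the "as few I/Os as possible" point concrete, here is a back-of-the-envelope sketch that counts how many pages (and therefore, at worst, how many I/Os) it takes to read a batch of rows packed contiguously, using the default page sizes listed above. The row count and row size are made-up inputs, and the model deliberately ignores page headers, free space and per-row overhead.

```python
import math

# Default page sizes as stated above, in bytes.
PAGE_SIZES = {"PostgreSQL": 8 * 1024, "MySQL InnoDB": 16 * 1024, "MongoDB": 32 * 1024}

def pages_to_read(num_rows: int, row_bytes: int, page_bytes: int) -> int:
    """Pages needed to read rows packed back to back in pages of page_bytes.
    Simplification: ignores page headers, free space and per-row overhead."""
    if row_bytes > page_bytes:
        raise ValueError("row does not fit in one page (overflow storage not modeled)")
    rows_per_page = page_bytes // row_bytes
    return math.ceil(num_rows / rows_per_page)

# Hypothetical workload: 1000 rows of 200 bytes each.
for db, size in PAGE_SIZES.items():
    print(db, pages_to_read(1000, 200, size))
```

The same 1000 rows could cost up to 1000 page reads if each one lived on a different page, which is why how rows and index entries are laid out matters far more than the query API on top.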
In this video I discuss the evolution of MongoDB's internal architecture, how documents are stored and retrieved, focusing on the index storage representation. I assume the reader is well versed in the fundamentals of database engineering such as indexes, B+Trees, data files, WAL etc.; you may pick up my database course to learn the skills.
Let us get started.
In this video I explore the types of languages: compiled, garbage-collected, interpreted, JIT and more.
I talk about default values and how PostgreSQL 14 appeared to get slower when a default parameter changed.
Mark Callaghan's blog
https://smalldatum.blogspot.com/2024/02/it-wasnt-performance-regression-in.html
Background writing is the process of writing dirty pages from the shared buffers to disk (strictly speaking, they go to the OS file cache and are later flushed to disk by the OS). I go into this process in this video.
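Here is a toy model of the idea, assuming a simplified buffer pool: backends only modify pages in memory (marking them dirty), and a background round later hands up to a fixed number of dirty pages to the OS file cache and marks them clean. All names are hypothetical; the per-round cap loosely mirrors PostgreSQL's bgwriter_lru_maxpages setting, and the real background writer chooses pages ahead of the buffer replacement scan rather than simply oldest-first as this sketch does.

```python
class SharedBuffers:
    """Toy buffer pool: page_id -> (data, dirty). Hypothetical, not PostgreSQL's code."""

    def __init__(self):
        self.pages = {}     # insertion order doubles as "least recently modified first"
        self.os_cache = {}  # stands in for the OS file cache that write() lands in

    def modify(self, page_id, data):
        # A backend changes the page in memory only; it is now dirty.
        self.pages.pop(page_id, None)
        self.pages[page_id] = (data, True)

    def dirty_count(self):
        return sum(1 for _, dirty in self.pages.values() if dirty)

    def background_write_round(self, max_pages):
        """Write up to max_pages dirty pages, oldest first, and mark them clean."""
        dirty_ids = [pid for pid, (_, dirty) in self.pages.items() if dirty][:max_pages]
        for pid in dirty_ids:
            data, _ = self.pages[pid]
            self.os_cache[pid] = data       # handed to the OS; the OS flushes to disk later
            self.pages[pid] = (data, False)  # clean again, can be evicted without a write
        return len(dirty_ids)

buffers = SharedBuffers()
for page_id in range(5):
    buffers.modify(page_id, f"row data v{page_id}")
print(buffers.background_write_round(max_pages=2), buffers.dirty_count())
```

Capping each round keeps the background writer from flooding the disk, at the cost of leaving some dirty pages for the next round or for the checkpointer.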
Fragmentation is a very interesting topic to me, especially when it comes to memory.
While virtual memory does solve external fragmentation (you can still allocate logically contiguous memory in non-contiguous physical memory), it introduces performance delays as we jump all over physical memory to read what appears to us, for example, as a contiguous array in virtual memory.
You see, DDR RAM consists of banks, rows and columns. Each row has around 1024 columns and each column holds 64 bits, which makes a row around 8 KiB. The cost of accessing RAM is the cost of “opening” a row with all its columns (around 50-100 ns). Once the row is opened, all its columns are available and the 8 KiB is cached in the row buffer inside the RAM.
The CPU can ask for an address and transfer 64 bytes at a time (called a burst), so if the CPU (or, to be exact, the memory controller) asks for the next 64 bytes, they come at almost no extra cost because the entire row is already in the row buffer. However, if the CPU sends an address in a different row, the old row must be closed and a new row opened, taking an additional ~50 ns hit. So accessing bytes that sit close together (spatial locality) is efficient.
So fragmentation does hurt performance if the data you are accessing is not contiguous in physical memory (it doesn't matter whether it is contiguous in virtual memory). This reminds me of the old days of HDDs, when the disk head physically traveled across the platter to read one file, which prompted the need for “defragmentation”, although RAM access (and SSD NAND, for that matter) isn't nearly as bad.
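Here is a minimal sketch of the row-buffer argument, assuming a single bank with one row buffer: a burst that hits the already-open row is treated as free and any other row costs a fresh open. The row size comes from the 1024 columns × 64 bits figure above and the 50 ns open cost is the low end of the quoted range; real DRAM has many banks and channels, so treat the numbers as illustrative.

```python
ROW_BYTES = 1024 * 64 // 8  # 1024 columns x 64 bits = 8192 bytes (8 KiB) per row
BURST_BYTES = 64            # one burst transfers 64 bytes
ROW_OPEN_NS = 50            # approximate cost of opening a row (low end of 50-100 ns)

def row_opens(addresses):
    """Count row activations for a sequence of physical addresses.
    Simplified model: one bank, one row buffer, hits on the open row are free."""
    open_row = None
    opens = 0
    for addr in addresses:
        row = addr // ROW_BYTES
        if row != open_row:
            opens += 1
            open_row = row
    return opens

# Read 64 KiB as 1024 bursts: contiguous vs. every burst landing in a different row.
contiguous = [i * BURST_BYTES for i in range(1024)]
scattered = [i * ROW_BYTES for i in range(1024)]

for name, addrs in (("contiguous", contiguous), ("scattered", scattered)):
    n = row_opens(addrs)
    print(f"{name}: {n} row opens, ~{n * ROW_OPEN_NS} ns spent opening rows")
```

Same amount of data, but the scattered pattern opens a row for every burst while the contiguous one opens just eight, which is the physical-memory fragmentation cost described above.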
Moreover, virtual memory introduces internal fragmentation because of its use of fixed-size blocks (called pages, often 4 KiB in size), which are mapped to frames in physical memory.
So if you allocate a 32-bit integer (4 bytes) and it gets a page of its own, you get 4 KiB worth of memory, leaving a whopping 4092 bytes allocated to the process but unused, which the OS cannot give to anyone else. These little pockets of memory add up across many processes, which is another reason developers should take care when allocating memory.
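The arithmetic above can be sketched as follows, assuming the 4 KiB page size and an allocation that is rounded up to whole pages. Real allocators such as malloc pack small objects into shared pages, so this is the worst case rather than what every allocation costs; mmap.PAGESIZE reports the actual page size of the machine running it.

```python
import math
import mmap

PAGE_SIZE = 4096  # the common 4 KiB page size described above (assumed)

def page_waste(requested_bytes: int, page_size: int = PAGE_SIZE) -> int:
    """Bytes allocated but unused when a request is rounded up to whole pages."""
    pages = max(1, math.ceil(requested_bytes / page_size))
    return pages * page_size - requested_bytes

print(page_waste(4))     # a lone 32-bit integer on its own page
print(page_waste(5000))  # spills just past one page into a second
print("this machine's page size:", mmap.PAGESIZE)
```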
In this video I explore the hidden costs of sending a request from the frontend to the backend
Here:
https://medium.com/@hnasr/the-journey-of-a-request-to-the-backend-c3de704de223
Fundamentals of Database Engineering udemy course (link redirects to udemy with coupon)
https://database.husseinnasser.com
Why CREATE INDEX blocks writes
In this video I explore how CREATE INDEX works, why it blocks writes, and how CREATE INDEX CONCURRENTLY works while still allowing writes.
0:00 Intro
1:28 How Create Index works
4:45 Create Index blocking Writes
5:00 Create Index Concurrently
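Assuming the database here is PostgreSQL, the blocking behavior falls out of its table-level lock modes: a plain CREATE INDEX takes a SHARE lock, which conflicts with the ROW EXCLUSIVE lock that INSERT, UPDATE and DELETE take, while CREATE INDEX CONCURRENTLY takes SHARE UPDATE EXCLUSIVE, which does not. This sketch encodes only that four-mode subset of the conflict table; it models lock modes alone, not the extra table scans and waits on older transactions that make the concurrent build slower.

```python
# Subset of PostgreSQL's table-level lock conflict table, restricted to four modes.
CONFLICTS = {
    "ACCESS SHARE": set(),
    "ROW EXCLUSIVE": {"SHARE"},
    "SHARE UPDATE EXCLUSIVE": {"SHARE UPDATE EXCLUSIVE", "SHARE"},
    "SHARE": {"ROW EXCLUSIVE", "SHARE UPDATE EXCLUSIVE"},
}

# Table lock each statement acquires on the target table.
LOCK_FOR = {
    "SELECT": "ACCESS SHARE",
    "INSERT": "ROW EXCLUSIVE",
    "UPDATE": "ROW EXCLUSIVE",
    "DELETE": "ROW EXCLUSIVE",
    "CREATE INDEX": "SHARE",
    "CREATE INDEX CONCURRENTLY": "SHARE UPDATE EXCLUSIVE",
}

def blocks(running: str, incoming: str) -> bool:
    """True if `incoming` must wait while `running` holds its table lock."""
    return LOCK_FOR[incoming] in CONFLICTS[LOCK_FOR[running]]

for ddl in ("CREATE INDEX", "CREATE INDEX CONCURRENTLY"):
    for stmt in ("SELECT", "INSERT", "UPDATE", "DELETE"):
        print(f"{ddl} blocks {stmt}: {blocks(ddl, stmt)}")
```

Reads are never blocked by either form; only writes wait behind a plain CREATE INDEX.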
HTTP/3 is getting popular in the cloud scene, but before you migrate to HTTP/3, consider its cost. I explore it here.
0:00 Intro HTTP/3 is getting popular
3:40 HTTP/1.1 Cost
5:18 HTTP/2 Cost
6:30 HTTP/3 Cost
https://blog.apnic.net/2023/09/25/why-http-3-is-eating-the-world/
The Encrypted Client Hello (ECH) is a new RFC that encrypts the TLS Client Hello to hide sensitive information like the SNI. In this video I go through the pros and cons of this new RFC.
0:00 Intro
2:00 SNI
4:00 Client Hello
8:40 Encrypted Client Hello
11:30 Inner Client Hello Encryption
18:00 Client-Facing Outer SNI
21:20 Decrypting Inner Client Hello
23:30 Disadvantages
26:00 Censorship vs Privacy ECH
https://blog.cloudflare.com/announcing-encrypted-client-hello/
https://chromestatus.com/feature/6196703843581952
From the frontend through the kernel to the backend process
When we send a request to a backend, most of us focus on the processing aspect of the request, which is really just the last step.
Much more happens before a request is ready to be processed, and most of it happens in the kernel. I break this into 6 steps, and each step can theoretically be executed by a dedicated thread or process. Pretty much all backends, web servers, proxies, frameworks and even databases have to do all these steps, and each of them chooses to do them differently.
Grab my backend performance course https://performance.husseinnasser.com
0:00 Intro
3:50 What is a Request?
10:14 Step 1 - Accept
21:30 Step 2 - Read
29:30 Step 3 - Decrypt
34:00 Step 4 - Parse
40:36 Step 5 - Decode
43:14 Step 6 - Process
Medium article
https://medium.com/@hnasr/the-journey-of-a-request-to-the-backend-c3de704de223
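Here is a minimal single-threaded sketch of the six steps using Python's standard socket module, assuming plain HTTP/1.1 with a JSON body. Step 3 (decrypt) is skipped because there is no TLS here, and the echo handler and the demo request are hypothetical. A real server would spread these steps across threads or processes in whatever way its design chooses.

```python
import json
import socket
import threading

def serve_one(listener: socket.socket) -> None:
    # Step 1 - Accept: take one established connection off the kernel's accept queue.
    conn, _addr = listener.accept()
    with conn:
        raw = read_request(conn)
        # Step 3 - Decrypt: skipped, this sketch speaks plain HTTP rather than TLS.
        method, path, headers, body = parse_request(raw)
        payload = decode_body(headers, body)
        out = process(method, path, payload)
        conn.sendall(b"HTTP/1.1 200 OK\r\nContent-Type: application/json\r\n"
                     b"Content-Length: " + str(len(out)).encode() + b"\r\n\r\n" + out)

def read_request(conn: socket.socket) -> bytes:
    # Step 2 - Read: copy bytes out of the kernel receive buffer until the
    # headers and Content-Length bytes of body have arrived.
    data = b""
    while b"\r\n\r\n" not in data:
        chunk = conn.recv(4096)
        if not chunk:
            return data
        data += chunk
    head, _, body = data.partition(b"\r\n\r\n")
    length = 0
    for line in head.split(b"\r\n")[1:]:
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip())
    while len(body) < length:
        chunk = conn.recv(4096)
        if not chunk:
            break
        body += chunk
    return head + b"\r\n\r\n" + body

def parse_request(raw: bytes):
    # Step 4 - Parse: split the HTTP/1.1 bytes into method, path, headers and body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    method, path, _version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, path, headers, body

def decode_body(headers: dict, body: bytes):
    # Step 5 - Decode: turn raw bytes into language objects (JSON here).
    if headers.get("content-type", "").startswith("application/json"):
        return json.loads(body.decode("utf-8"))
    return body.decode("utf-8")

def process(method: str, path: str, payload) -> bytes:
    # Step 6 - Process: the application logic, a hypothetical echo handler.
    return json.dumps({"method": method, "path": path, "echo": payload}).encode("utf-8")

def demo() -> bytes:
    """Run one request through the steps over the loopback interface."""
    with socket.socket() as listener:
        listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
        listener.listen()
        server = threading.Thread(target=serve_one, args=(listener,))
        server.start()
        body = b'{"x": 1}'
        with socket.create_connection(listener.getsockname(), timeout=5) as client:
            client.sendall(b"POST /items HTTP/1.1\r\nHost: localhost\r\n"
                           b"Content-Type: application/json\r\n"
                           b"Content-Length: " + str(len(body)).encode() + b"\r\n\r\n" + body)
            response = b""
            while chunk := client.recv(4096):
                response += chunk
        server.join()
    return response
```

Calling demo() shows that most of the code is spent before process() ever runs, which is the point of the episode.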
In a wonderful blog post, Kyle explores the pains he faced managing a Postgres instance for a startup he works for, and how enabling partitioning significantly increased LockManager wait events, causing the backend and subsequently NGINX to throw 500 errors.
We discuss this in this video/podcast
https://www.kylehailey.com/post/postgres-partition-pains-lockmanager-waits
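A rough way to see why partitioning can push queries into the shared lock manager: each partition and each of its indexes is a separate relation to lock. This sketch assumes PostgreSQL's per-backend fast-path slots for weak relation locks (16 in versions before 18), beyond which locks spill into the shared lock manager. The partition and index counts are made up, and the "parent plus every scanned partition and its indexes" count is a simplification of real planner and executor locking.

```python
FAST_PATH_SLOTS = 16  # per-backend fast-path relation lock slots (PostgreSQL before v18, assumed)

def relation_locks(partitions: int, indexes_per_partition: int, pruned: bool) -> int:
    """Relations a query locks on a partitioned table: the parent plus each scanned
    partition and its indexes. With pruning, only one partition is touched.
    Simplified model; real locking has more nuances."""
    scanned = 1 if pruned else partitions
    return 1 + scanned * (1 + indexes_per_partition)

def lock_manager_locks(n_locks: int) -> int:
    """Locks that overflow the fast path and go through the shared lock manager."""
    return max(0, n_locks - FAST_PATH_SLOTS)

# Hypothetical table: 30 partitions with 2 indexes each.
for pruned in (False, True):
    n = relation_locks(partitions=30, indexes_per_partition=2, pruned=pruned)
    print(f"pruned={pruned}: {n} relation locks, {lock_manager_locks(n)} via the lock manager")
```

Multiply the overflow by many concurrent connections running the same query and the shared lock manager becomes a point of contention, which matches the wait events described in the post.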
WebTransport is a cutting-edge protocol framework designed to support multiplexed and secure transport over HTTP/2 and HTTP/3. It brings together the best of web and transport technologies, providing an all-in-one solution for real-time, bidirectional communication on the web.
Watch full episode (subscribers only) https://spotifyanchor-web.app.link/e/cTSGkq5XuAb
fsync is a Linux system call that flushes all pages and metadata for a given file to the disk. It is indeed an expensive operation, but it is required for durability, especially in database systems. Regular writes that make it to the disk controller are often placed in the SSD's local cache to accumulate more writes before getting flushed to the NAND cells.
However, when the disk controller receives this flush command, it is required to immediately persist all of the data to the NAND cells.
Some SSDs, however, don't do that because they don't trust the host, and they turn the fsync into a no-op. In this video I explain this in detail and go through the many options Postgres provides to fine-tune fsync.
0:00 Intro
1:00 A Write doesn’t write
2:00 File System Page Cache
6:00 Fsync
7:30 SSD Cache
9:20 SSD ignores the flush
9:30 15 Year old Firefox fsync bug
12:30 What happens if SSD loses power
15:00 What options does Postgres expose?
15:30 open_sync (O_SYNC)
16:15 open_datasync (O_DSYNC)
17:10 O_DIRECT
19:00 fsync
20:50 fdatasync
21:13 fsync = off
23:30 Don’t make your API simple
26:00 Database on metal?
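Here is a minimal sketch of a durable write using Python's os.fsync and os.fdatasync, which wrap the system calls of the same names. It assumes a POSIX system (fsyncing a directory works on Linux but not on Windows), and the file name is hypothetical. Opening with os.O_SYNC or os.O_DSYNC instead would make every write synchronous, which is roughly what the open_sync and open_datasync options correspond to.

```python
import os
import tempfile

def durable_write(path: str, data: bytes, data_only: bool = False) -> None:
    """Write data and return only after the kernel has asked the disk to persist it."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            written = os.write(fd, view)  # copies into the OS page cache; nothing is durable yet
            view = view[written:]
        if data_only and hasattr(os, "fdatasync"):
            os.fdatasync(fd)  # flush file data plus only the metadata needed to read it back
        else:
            os.fsync(fd)      # flush file data and all metadata (size, mtime, ...)
    finally:
        os.close(fd)
    # A new file's directory entry lives in the directory, so fsync the directory too.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
    # Caveat from above: if the SSD silently ignores the flush, even this is not durable.

with tempfile.TemporaryDirectory() as tmp:
    wal = os.path.join(tmp, "wal.log")  # hypothetical log file name
    durable_write(wal, b"record 1\n")
    durable_write(wal, b"x" * 8192, data_only=True)
    print(os.path.getsize(wal))
```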
Ego is the main cause of defective software products. The ego of the engineer or the tech lead seeps into the quality of the product.
Fundamentals of Backend Engineering Design patterns udemy course (link redirects to udemy with coupon)
https://backend.husseinnasser.com
Fundamentals of Database Engineering udemy course (link redirects to udemy with coupon)
https://database.husseinnasser.com
In version 5.3, MongoDB introduced a feature called clustered collections, which store documents in the _id index as opposed to the hidden WiredTiger index. This eliminates an entire B+tree seek for reads using the _id index and also removes the additional write to the hidden index, speeding up both reads and writes.
However, as we know in software engineering, everything has a cost. This feature comes with a few costs that one must be aware of before using it. In this video I discuss the following:
How Original MongoDB Collections Work
How Clustered Collections Work
Benefits of Clustered Collections
Limitations of Clustered Collections
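Here is a toy model of the difference described above, with each B+tree stood in for by a dict that counts lookups (one lookup standing for one tree seek). In the original layout, the _id index maps _id to an internal record id and the document lives in a separate record-id-keyed tree; in a clustered collection the document is stored in the _id-keyed tree directly. All class names are hypothetical, and the model ignores secondary indexes and the limitations covered in the episode.

```python
class BTree:
    """Stand-in for a B+tree: a dict plus a count of seeks (one per lookup or insert)."""
    def __init__(self):
        self.data = {}
        self.seeks = 0

    def insert(self, key, value):
        self.seeks += 1
        self.data[key] = value

    def get(self, key):
        self.seeks += 1
        return self.data[key]

class RegularCollection:
    """Documents live in a tree keyed by an internal record id; the _id index
    maps _id -> record id, so an _id lookup costs two seeks and an insert two writes."""
    def __init__(self):
        self.records = BTree()   # the hidden record-id-keyed tree
        self.id_index = BTree()  # _id -> record id
        self.next_record_id = 1

    def insert(self, doc):
        rid = self.next_record_id
        self.next_record_id += 1
        self.records.insert(rid, doc)
        self.id_index.insert(doc["_id"], rid)

    def find_by_id(self, _id):
        return self.records.get(self.id_index.get(_id))

    def seeks(self):
        return self.records.seeks + self.id_index.seeks

class ClusteredCollection:
    """Documents are stored directly in the tree keyed by _id: one seek, one write."""
    def __init__(self):
        self.by_id = BTree()

    def insert(self, doc):
        self.by_id.insert(doc["_id"], doc)

    def find_by_id(self, _id):
        return self.by_id.get(_id)

    def seeks(self):
        return self.by_id.seeks

for coll in (RegularCollection(), ClusteredCollection()):
    coll.insert({"_id": "order-1", "total": 42})
    coll.find_by_id("order-1")
    print(type(coll).__name__, coll.seeks())
```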
The Prime Video engineering team has posted a blog detailing how they moved their live-stream monitoring service from microservices to a monolith, reducing their cost by 90%. Let us discuss this.
0:00 Intro
2:00 Overview
10:35 Distributed System Overhead
21:30 From Microservices to Monolith
29:00 Scaling the Monolith
32:30 Takeaways
https://www.primevideotech.com/video-streaming/scaling-up-the-prime-video-audio-video-monitoring-service-and-reducing-costs-by-90
Fundamentals of Backend Engineering Design patterns udemy course (link redirects to udemy with coupon)
https://backend.husseinnasser.com
Fundamentals of Database Engineering udemy course (link redirects to udemy with coupon)
https://database.husseinnasser.com
In a row-store database engine, rows are stored in units called pages. Each page has a fixed header and contains multiple rows, with each row having a record header followed by its respective columns. When the database fetches a page and places it in the shared buffer pool, we gain access to all rows and columns within that page. So the question arises: if we have all the columns readily available in memory, why would SELECT * be slow and costly? Is it really as slow as people claim? And if so, why? In this post, we will explore these questions and more.
0:00 Intro
1:49 Database Page Layout
5:00 How SELECT Works
10:49 No Index-Only Scans
18:00 Deserialization Cost
21:00 Not All Columns are Inline
28:00 Network Cost
36:00 Client Deserialization
https://medium.com/@hnasr/how-slow-is-select-8d4308ca1f0c
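The "No Index-Only Scans" point can be checked with SQLite from Python's standard library (SQLite here is a stand-in; the episode is not specific to it). With an index on b, selecting only b can be answered from the index alone, while SELECT * also has to visit the table to fetch column c. The table and index names are hypothetical, and the exact wording of the plan text varies across SQLite versions, so look for "COVERING INDEX" rather than the full string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER PRIMARY KEY, b INTEGER, c TEXT)")
conn.execute("CREATE INDEX idx_b ON t (b)")
conn.executemany("INSERT INTO t (b, c) VALUES (?, ?)",
                 [(i % 100, "x" * 500) for i in range(1000)])

def plan(sql: str) -> str:
    """Return SQLite's EXPLAIN QUERY PLAN detail text (last column of each row)."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

print(plan("SELECT b FROM t WHERE b = 5"))  # the index alone answers the query
print(plan("SELECT * FROM t WHERE b = 5"))  # must also visit the table for column c
```

The 500-byte c column in this made-up table also has to be deserialized and shipped over the wire for every matching row, which is where the deserialization and network costs in the episode come from.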