Systems Performance Table of Contents


(SysPrfBGrg 2021)

Contents at a Glance

Detailed Contents

1. Introduction

2. Methodologies

3. Operating Systems

4. Observability Tools

5. Applications

6. CPUs

7. Memory

7 Systems Performance Memory

7.1 Terminology

7.2 Concepts

7.2.1 Virtual Memory

7.2.2 Paging

7.2.3 Demand Paging

7.2.4 Overcommit

7.2.5 Process Swapping

7.2.6 File System Cache Usage

7.2.7 Utilization and Saturation

7.2.8 Allocators

7.2.9 Shared Memory

7.2.10 Working Set Size

7.2.11 Word Size

7.3 Architecture

7.3.1 Hardware

7.3.2 Software

7.3.3 Process Virtual Address Space

7.4 Methodology

7.4.1 Tools Method

7.4.2 USE Method

7.4.3 Characterizing Usage

7.4.4 Cycle Analysis

7.4.5 Performance Monitoring

7.4.6 Leak Detection

7.4.7 Static Performance Tuning

7.4.8 Resource Controls

7.4.9 Micro-Benchmarking

7.4.10 Memory Shrinking

7.5 Observability Tools

7.5.1 vmstat

7.5.2 PSI

7.5.3 swapon

7.5.4 sar

7.5.5 slabtop

7.5.6 numastat

7.5.7 ps

7.5.8 top

7.5.9 pmap

7.5.10 perf

7.5.11 drsnoop

7.5.12 wss

7.5.13 bpftrace

7.5.14 Other Tools

7.6 Tuning

7.6.1 Tunable Parameters

7.6.2 Multiple Page Sizes

7.6.3 Allocators

7.6.4 NUMA Binding

7.6.5 Resource Controls

7.7 Exercises

7.8 References

8. File Systems

8 Systems Performance File Systems

8.1 Terminology

8.2 Models

8.2.1 File System Interfaces

8.2.2 File System Cache

8.2.3 Second-Level Cache

8.3 Concepts

8.3.1 File System Latency

8.3.2 Caching

8.3.3 Random vs. Sequential I/O

8.3.4 Prefetch

8.3.5 Read-Ahead

8.3.6 Write-Back Caching

8.3.7 Synchronous Writes

8.3.8 Raw and Direct I/O

8.3.9 Non-Blocking I/O

8.3.10 Memory-Mapped Files

8.3.11 Metadata

8.3.12 Logical vs. Physical I/O

8.3.13 Operations Are Not Equal

8.3.14 Special File Systems

8.3.15 Access Timestamps

8.3.16 Capacity

8.4 Architecture

8.4.1 File System I/O Stack

8.4.2 VFS

8.4.3 File System Caches

8.4.4 File System Features

8.4.5 File System Types

8.4.6 Volumes and Pools

8.5 Methodology

8.5.1 Disk Analysis

8.5.2 Latency Analysis

8.5.3 Workload Characterization

8.5.4 Performance Monitoring

8.5.5 Static Performance Tuning

8.5.6 Cache Tuning

8.5.7 Workload Separation

8.5.8 Micro-Benchmarking

8.6 Observability Tools

8.6.1 mount

8.6.2 free

8.6.3 top

8.6.4 vmstat

8.6.5 sar

8.6.6 slabtop

8.6.7 strace

8.6.8 fatrace

8.6.9 LatencyTOP

8.6.10 opensnoop

8.6.11 filetop

8.6.12 cachestat

8.6.13 ext4dist (xfs, zfs, btrfs, nfs)

8.6.14 ext4slower (xfs, zfs, btrfs, nfs)

8.6.15 bpftrace

8.6.16 Other Tools

8.6.17 Visualizations

8.7 Experimentation

8.7.1 Ad Hoc

8.7.2 Micro-Benchmark Tools

8.7.3 Cache Flushing

8.8 Tuning

8.8.1 Application Calls

8.8.2 ext4

8.8.3 ZFS

8.9 Exercises

8.10 References

9. Disks

9 Systems Performance Disks

9.1 Terminology

9.2 Models

9.2.1 Simple Disk

9.2.2 Caching Disk

9.2.3 Controller

9.3 Concepts

9.3.1 Measuring Time

9.3.2 Time Scales

9.3.3 Caching

9.3.4 Random vs. Sequential I/O

9.3.5 Read/Write Ratio

9.3.6 I/O Size

9.3.7 IOPS Are Not Equal

9.3.8 Non-Data-Transfer Disk Commands

9.3.9 Utilization

9.3.10 Saturation

9.3.11 I/O Wait

9.3.12 Synchronous vs. Asynchronous

9.3.13 Disk vs. Application I/O

9.4 Architecture

9.4.1 Disk Types

9.4.2 Interfaces

9.4.3 Storage Types

9.4.4 Operating System Disk I/O Stack

9.5 Methodology

9.5.1 Tools Method

9.5.2 USE Method

9.5.3 Performance Monitoring

9.5.4 Workload Characterization

9.5.5 Latency Analysis

9.5.6 Static Performance Tuning

9.5.7 Cache Tuning

9.5.8 Resource Controls

9.5.9 Micro-Benchmarking

9.5.10 Scaling

9.6 Observability Tools

9.6.1 iostat

9.6.2 sar

9.6.3 PSI

9.6.4 pidstat

9.6.5 perf

9.6.6 biolatency

9.6.7 biosnoop

9.6.8 iotop, biotop

9.6.9 biostacks

9.6.10 blktrace

9.6.11 bpftrace

9.6.12 MegaCli

9.6.13 smartctl

9.6.14 SCSI Logging

9.6.15 Other Tools

9.7 Visualizations

9.7.1 Line Graphs

9.7.2 Latency Scatter Plots

9.7.3 Latency Heat Maps

9.7.4 Offset Heat Maps

9.7.5 Utilization Heat Maps

9.8 Experimentation

9.8.1 Ad Hoc

9.8.2 Custom Load Generators

9.8.3 Micro-Benchmark Tools

9.8.4 Random Read Example

9.8.5 ioping

9.8.6 fio

9.8.7 blkreplay

9.9 Tuning

9.9.1 Operating System Tunables

9.9.2 Disk Device Tunables

9.9.3 Disk Controller Tunables

9.10 Exercises

9.11 References

10. Network

10 Systems Performance Network

10.1 Terminology

10.2 Models

10.2.1 Network Interface

10.2.2 Controller

10.2.3 Protocol Stack

10.3 Concepts

10.3.1 Networks and Routing

10.3.2 Protocols

10.3.3 Encapsulation

10.3.4 Packet Size

10.3.5 Latency

10.3.6 Buffering

10.3.7 Connection Backlog

10.3.8 Interface Negotiation

10.3.9 Congestion Avoidance

10.3.10 Utilization

10.3.11 Local Connections

10.4 Architecture

10.4.1 Protocols

10.4.2 Hardware

10.4.3 Software

10.5 Methodology

10.5.1 Tools Method

10.5.2 USE Method

10.5.3 Workload Characterization

10.5.4 Latency Analysis

10.5.5 Performance Monitoring

10.5.6 Packet Sniffing

10.5.7 TCP Analysis

10.5.8 Static Performance Tuning

10.5.9 Resource Controls

10.5.10 Micro-Benchmarking

10.6 Observability Tools

10.6.1 ss

10.6.2 ip

10.6.3 ifconfig

10.6.4 nstat

10.6.5 netstat

10.6.6 sar

10.6.7 nicstat

10.6.8 ethtool

10.6.9 tcplife

10.6.10 tcptop

10.6.11 tcpretrans

10.6.12 bpftrace

10.6.13 tcpdump

10.6.14 Wireshark

10.6.15 Other Tools

10.7 Experimentation

10.7.1 ping

10.7.2 traceroute

10.7.3 pathchar

10.7.4 iperf

10.7.5 netperf

10.7.6 tc

10.7.7 Other Tools

10.8 Tuning

10.8.1 System-Wide

10.8.2 Socket Options

10.8.3 Configuration

10.9 Exercises

10.10 References

11. Cloud Computing

11 Systems Performance Cloud Computing

11.1 Background

11.1.1 Instance Types

11.1.2 Scalable Architecture

11.1.3 Capacity Planning

11.1.4 Storage

11.1.5 Multitenancy

11.1.6 Orchestration (Kubernetes)

11.2 Hardware Virtualization

11.2.1 Implementation

11.2.2 Overhead

11.2.3 Resource Controls

11.2.4 Observability

11.3 OS Virtualization

11.3.1 Implementation

11.3.2 Overhead

11.3.3 Resource Controls

11.3.4 Observability

11.4 Lightweight Virtualization

11.4.1 Implementation

11.4.2 Overhead

11.4.3 Resource Controls

11.4.4 Observability

11.5 Other Types

11.6 Comparisons

11.7 Exercises

11.8 References

12. Benchmarking

12 Systems Performance Benchmarking

12.1 Background

12.1.1 Reasons

12.1.2 Effective Benchmarking

12.1.3 Benchmarking Failures

12.2 Benchmarking Types

12.2.1 Micro-Benchmarking

12.2.2 Simulation

12.2.3 Replay

12.2.4 Industry Standards

12.3 Methodology

12.3.1 Passive Benchmarking

12.3.2 Active Benchmarking

12.3.3 CPU Profiling

12.3.4 USE Method

12.3.5 Workload Characterization

12.3.6 Custom Benchmarks

12.3.7 Ramping Load

12.3.8 Sanity Check

12.3.9 Statistical Analysis

12.3.10 Benchmarking Checklist

12.4 Benchmark Questions

12.5 Exercises

12.6 References

13. perf

13 Systems Performance perf

13.1 Subcommands Overview

13.2 One-Liners

13.3 perf Events

13.4 Hardware Events

13.4.1 Frequency Sampling

13.5 Software Events

13.6 Tracepoint Events

13.7 Probe Events

13.7.1 kprobes

13.7.2 uprobes

13.7.3 USDT

13.8 perf stat

13.8.1 Options

13.8.2 Interval Statistics

13.8.3 Per-CPU Balance

13.8.4 Event Filters

13.8.5 Shadow Statistics

13.9 perf record

13.9.1 Options

13.9.2 CPU Profiling

13.9.3 Stack Walking

13.10 perf report

13.10.1 TUI

13.10.2 STDIO

13.11 perf script

13.11.1 Flame Graphs

13.11.2 Trace Scripts

13.12 perf trace

13.12.1 Kernel Versions

13.13 Other Commands

13.14 perf Documentation

13.15 References

14. Ftrace

14 Systems Performance Ftrace

14.1 Capabilities Overview

14.2 tracefs (/sys)

14.2.1 tracefs Contents

14.3 Ftrace Function Profiler

14.4 Ftrace Function Tracing

14.4.1 Using trace

14.4.2 Using trace_pipe

14.4.3 Options

14.5 Tracepoints

14.5.1 Filter

14.5.2 Trigger

14.6 kprobes

14.6.1 Event Tracing

14.6.2 Arguments

14.6.3 Return Values

14.6.4 Filters and Triggers

14.6.5 kprobe Profiling

14.7 uprobes

14.7.1 Event Tracing

14.7.2 Arguments and Return Values

14.7.3 Filters and Triggers

14.7.4 uprobe Profiling

14.8 Ftrace function_graph

14.8.1 Graph Tracing

14.8.2 Options

14.9 Ftrace hwlat

14.10 Ftrace Hist Triggers

14.10.1 Single Keys

14.10.2 Fields

14.10.3 Modifiers

14.10.4 PID Filters

14.10.5 Multiple Keys

14.10.6 Stack Trace Keys

14.10.7 Synthetic Events

14.11 trace-cmd

14.11.1 Subcommands Overview

14.11.2 trace-cmd One-Liners

14.11.3 trace-cmd vs. perf(1)

14.11.4 trace-cmd function_graph

14.11.5 KernelShark

14.11.6 trace-cmd Documentation

14.12 perf ftrace

14.13 perf-tools

14.13.1 Tool Coverage

14.13.2 Single-Purpose Tools

14.13.3 Multi-Purpose Tools

14.13.4 perf-tools One-Liners

14.13.5 Example

14.13.6 perf-tools vs. BCC/BPF

14.13.7 Documentation

14.14 Ftrace Documentation

14.15 References

15. BPF

15 Systems Performance BPF

15.1 BCC

15.1.1 Installation

15.1.2 Tool Coverage

15.1.3 Single-Purpose Tools

15.1.4 Multi-Purpose Tools

15.1.5 One-Liners

15.1.6 Multi-Tool Example

15.1.7 BCC vs. bpftrace

15.1.8 Documentation

15.2 bpftrace

15.2.1 Installation

15.2.2 Tools

15.2.3 One-Liners

15.2.4 Programming

15.2.5 Reference

15.2.6 Documentation

15.3 References

16. Case Study

16 Systems Performance Case Study

16.1 An Unexplained Win

16.1.1 Problem Statement

16.1.2 Analysis Strategy

16.1.3 Statistics

16.1.4 Configuration

16.1.5 PMCs

16.1.6 Software Events

16.1.7 Tracing

16.1.8 Conclusion

16.2 Additional Information

16.3 References

Appendix

A USE Method: Linux

B sar Summary

C bpftrace One-Liners

D Solutions to Selected Exercises

Fair Use Sources:

Gregg, Brendan. Systems Performance: Enterprise and the Cloud, 2nd Edition. Addison-Wesley, 2021. (SysPrfBGrg 2021)


