Systems Performance, 2nd Edition, by Brendan Gregg Index

A

Actions in bpftrace, 769

Active benchmarking, 657–660

Active listening in three-way handshakes, 511

Active pages in page caches, 318

Activities overview, 3–4

Ad hoc checklist method, 43–44

Adaptive mutex locks, 198

Adaptive Replacement Cache (ARC), 381

Address space, 304

guests, 603

kernel, 90

memory, 304, 310

processes, 95, 99–102, 319–322

threads, 227–228

virtual memory, 104, 305

Address space layout randomization (ASLR), 723

Advanced Format for magnetic rotational disks, 437

AF_NETLINK address family, 145–146

Agents

monitoring software, 137–138

product monitoring, 79

AKS (Azure Kubernetes Service), 586

Alerts, 8

Algorithms

caching, 36

congestion control, 115, 118, 513–514

big O notation, 175–176

Allocation groups in XFS, 380

Allocators

memory, 309

multithreaded applications, 353

process virtual address space, 320–321

Amazon EKS (Elastic Kubernetes Service), 586

Amdahl’s Law of Scalability, 64–65

Analysis

benchmarking, 644–646, 665–666

capacity planning, 38, 71–72

drill-down, 55–56

I/O traces, 478–479

latency, 56–57, 384–386, 454–455

off-CPU, 188–192

resource, 38–39

thread state, 193–197

workload, 4–5, 39–40

Analysis step in scientific method, 44–45

Analysis strategy in case study, 784

annotate subcommand for perf, 673

Anonymous memory, 304

Anonymous paging, 305–307

Anti-methods

blame-someone-else, 43

random change, 42–43

streetlight, 42

Apdex (application performance index), 174

Application calls, tuning, 415–416

Application I/O, 369, 435

Application instrumentation in off-CPU analysis, 189

Application internals, 213

Application layer, file system latency in, 384

Application performance index (Apdex), 174

Applications, 171

basics, 172–173

big O notation, 175–176

bpftrace for, 765

common case optimization, 174

common problems, 213–215

exercises, 216–217

internals, 213

latency documentation, 385

methodology. See Applications methodology

missing stacks, 215–216

missing symbols, 214

objectives, 173–174

observability, 174

observability tools. See Applications observability tools

performance techniques. See Applications performance techniques

programming languages. See Applications programming languages

references, 217–218

Applications methodology

CPU profiling, 187–189

distributed tracing, 199

lock analysis, 198

off-CPU analysis, 189–192

overview, 186–187

static performance tuning, 198–199

syscall analysis, 192

thread state analysis, 193–197

USE method, 193

Applications observability tools

bpftrace, 209–213

execsnoop, 207–208

offcputime, 204–205

overview, 199–200

perf, 200–203

profile, 203–204

strace, 205–207

syscount, 208–209

Applications performance techniques

buffers, 177

caching, 176

concurrency and parallelism, 177–181

I/O size selection, 176

non-blocking I/O, 181

Performance Mantras, 182

polling, 177

processor binding, 181–182

Applications programming languages, 182–183

compiled, 183–184

garbage collection, 184–185

interpreted, 184–185

virtual machines, 185

Appropriateness level in methodologies, 28–29

ARC (Adaptive Replacement Cache), 381

Architecture

CPUs. See CPUs architecture

disks. See Disks architecture

file systems. See File systems architecture

vs. loads, 581–582

memory. See Memory architecture

networks. See Networks architecture

scalable, 581–582

archive subcommand for perf, 673

arcstat.pl tool, 410

arg variables for bpftrace, 778

argdist tool, 757–759

Arguments

kprobes, 152

networks, 507

tracepoints, 148–149

uprobes, 154

Arithmetic mean, 74

Arrival process in queueing systems, 67

ASG (auto scaling group)

capacity planning, 72

cloud computing, 583–584

ASLR (address space layout randomization), 723

Associativity in caches, 234

Asynchronous disk I/O, 434–435

Asynchronous interrupts, 96–97

Asynchronous writes, 366

atop tool, 285

Auto scaling group (ASG)

capacity planning, 72

cloud computing, 583–584

available_filter_functions file, 710

Available swap, 309

available_tracers file, 710

Averages, 74–75

avg function, 780

Axes

B

Back-ends in instruction pipeline, 224

Background color in flame graphs, 291

Backlogs in network connections, 507, 519–520, 556–557, 569

Bad paging, 305

Balloon drivers, 597

Bandwidth

disks, 424

interconnects, 237

networks, 500, 508, 532–533

OS virtualization, 614–615

Bare-metal hypervisors, 587

Baseline statistics, 59

BATCH scheduling policy, 243

BBR (Bottleneck Bandwidth and RTT) algorithm, 118, 513

bcache technology, 117

BCC (BPF Compiler Collection), 12

vs. bpftrace, 760

disks, 450

documentation, 760–761

installing, 754

multi-purpose tools, 757

multi-tool example, 759

networks, 526

one-liners, 757–759

overview, 753–754

vs. perf-tools, 747–748

single-purpose tools, 755–757

slow disks case study, 17

system-wide tracing, 136

tool overview, 754–755

bcc-tools tool package, 132

BEGIN probes in bpftrace, 774

bench subcommand for perf, 673

Benchmark paradox, 648–649

Benchmarketing, 642

Benchmarking, 641–642

analysis, 644–646

capacity planning, 70

CPUs, 254

effective, 643–644

exercises, 668

failures, 645–651

industry standards, 654–656

memory, 328

micro-benchmarking. See Micro-benchmarking

questions, 667–668

reasons, 642–643

references, 669–670

replay, 654

simulation, 653–654

specials, 650

SysBench system, 294

types, 13, 651–656

Benchmarking methodology

active, 657–660

checklist, 666–667

CPU profiling, 660–661

custom benchmarks, 662

overview, 656

passive, 656–657

ramping load, 662–664

sanity checks, 664–665

statistical analysis, 665–666

USE method, 661

workload characterization, 662

Berkeley Packet Filter (BPF), 751–752

BCC compiler. See BCC (BPF Compiler Collection)

bpftrace. See bpftrace tool

description, 12–13

extended. See Extended BPF

iterator, 562

JIT compiler, 117

kernels, 92

OS virtualization tracing, 620, 624–625, 629

vs. perf-tools, 747–748

program, 90

Berkeley Software Distribution (BSD), 113

BFQ (Budget Fair Queueing) I/O schedulers, 119, 449

Big kernel lock (BKL) performance bottleneck, 116

Big O notation, 175–176

Billing in cloud computing, 584

Bimodal performance, 76

Binary executable files, 183

Binary translations in hardware virtualization, 588, 590

Binding

CPU, 253, 297–298

NUMA, 353

processor, 181–182

bioerr tool, 487

biolatency tool

BCC, 753–755

disks, 450, 468–470

example, 753–754

biopattern tool, 487

BIOS, tuning, 299

biosnoop tool

BCC, 755

disks, 470–472

event tracing, 58

hardware virtualization, 604–605

outliers, 471–472

queued time, 472

system-wide tracing, 136

biostacks tool, 474–475

biotop tool

BCC, 755

disks, 450, 473–474

Bit width in CPUs, 229

bitesize tool

BCC, 755

perf-tools, 743

blame command, 120

Blame-someone-else anti-method, 43

Blanco, Brenden, 753

Blind faith benchmarking, 645

blk tracer, 708

blkio control group, 610, 617

blkreplay tool, 493

blktrace tool

action filtering, 478

action identifiers, 477

analysis, 478–479

default output, 476–477

description, 116

disks, 475–479

RWBS description, 477

visualizations, 479

Block-based file systems, 375–376

Block caches in disk I/O, 430

Block device interface, 109–110, 447

Block I/O state in delay accounting, 145

Block I/O times for disks, 427–428, 472

Block interleaving, 378

Block size

defined, 360

FFS, 378

Block stores in cloud computing, 584

Blue-green cloud computing deployments, 3–4

Bonnie and Bonnie++ benchmarking tools

active benchmarking, 657–660

file systems, 412–414

Boolean expressions in bpftrace, 775–776

Boot options, security, 298–299

Boot-time tracing, 119

Borkmann, Daniel, 121

Borrowed virtual time (BVT) schedulers, 595

Bottleneck Bandwidth and RTT (BBR) algorithm, 118, 513

Bottlenecks

capacity planning, 70–71

complexity, 6

defined, 22

USE method, 47–50, 245, 324, 450–451

BPF. See Berkeley Packet Filter (BPF)

bpftrace tool, 12–13

application internals, 213

vs. BCC, 752–753, 760

block I/O events, 625, 658–659

description, 282

disk I/O errors, 483

disk I/O latency, 482–483

disk I/O size, 480–481

event sources, 558

examples, 284, 761–762

file system internals, 408

hardware virtualization, 602

I/O profiling, 210–212

installing, 762

lock tracing, 212–213

malloc() bytes flame graph, 346

memory internals, 346–347

one-liners for CPUs, 283, 803–804

one-liners for disks, 479–480, 806–807

one-liners for file systems, 402–403, 805–806

one-liners for memory, 343–344, 804–805

one-liners for networks, 550–552, 807–808

one-liners overview, 763–765

package contents, 132

packet inspection, 526

page fault flame graphs, 346

programming. See bpftrace tool programming

references, 782

scheduling internals, 284–285

signal tracing, 209–210

socket tracing, 552–555

stacks viewing, 450

syscall tracing, 403–405

system-wide tracing, 136

TCP tracing, 555–557

tracepoints, 149

user allocation stacks, 345

VFS tracing, 405–408

bpftrace tool programming

actions, 769

comments, 767

documentation, 781

example, 766

filters, 769

flow control, 775–777

functions, 770–772, 778–781

Hello, World! program, 770

operators, 776–777

probe arguments, 775

probe format, 768

probe types, 774–775

probe wildcards, 768–769

program structure, 767

timing, 772–773

usage, 766–767

variables, 770–771, 777–778

BQL (Byte Queue Limits)

driver queues, 524

tuning, 571

Branch prediction in instruction pipeline, 224

Breakpoints in perf, 680

brk system calls, 95

brkstack tool, 348

Broadcast network messages, 503

BSD (Berkeley Software Distribution), 113

btrace tool, 476, 478

btrfs file system, 381–382, 399

btrfsdist tool, 755

btrfsslower tool, 755

btt tool, 478

Buckets

hash tables, 180

heat maps, 82–83

Buddy allocators, 317

Budget Fair Queueing (BFQ) I/O schedulers, 119, 449

buf function, 778

Buffer caches, 110, 374

Bufferbloat, 507

Buffers

applications, 177

block devices, 110, 374

networks, 507

ring, 522

TCP, 520, 569

bufgrow tool, 409

Bug database systems

applications, 172

case studies, 792–793

buildid-cache subcommand for perf, 673

Built-in bpftrace variables, 770, 777–778

Bursting in cloud computing, 584, 614–615

Buses, memory, 312–313

BVT (borrowed virtual time) schedulers, 595

Bypass, kernel, 94

Byte Queue Limits (BQL)

driver queues, 524

tuning, 571

Bytecode, 185

C

C, C++

compiled languages, 183

symbols, 214

stacks, 215

C-states in CPUs, 231

c2c subcommand for perf, 673, 702

Cache Allocation Technology (CAT), 118, 596

Cache miss rate, 36

Cache warmth, 222

cachegrind tool, 135

Caches and caching

applications, 176

associativity, 234

block devices, 110, 374

cache line size, 234

coherency, 234–235

CPUs, hardware virtualization, 596

CPUs, memory, 221–222, 314

CPUs, OS virtualization, 615–616

CPUs, processors, 230–235

CPUs, vs. GPUs, 240

defined, 23

dentry, 375

disks, I/O, 430

disks, on-disk, 437

disks, tuning, 456

file systems, flushing, 414

file systems, OS virtualization, 613

file systems, overview, 361–363

file systems, tuning, 389, 414–416

file systems, types, 373–375

file systems, usage, 309

inode, 375

methodologies, 35–37

micro-benchmarking test, 390

operating systems, 108–109

page, 315, 374

perf events, 680

RAID, 445

tuning, 60

write-back, 365

cachestat tool

file systems, 399, 658–659

memory, 348

perf-tools, 743

slow disks case study, 17

Caching disk model, 425–426

Canary testing, 3

Capacity-based utilization, 34

Capacity of file systems, 371

Capacity planning

benchmarking for, 642

cloud computing, 582–584

defined, 4

factor analysis, 71–72

micro-benchmarking, 70

overview, 69

resource analysis, 38

resource limits, 70–71

scaling solutions, 72–73

CAPI (Coherent Accelerator Processor Interface), 236

Carrier sense multiple access with collision detection (CSMA/CD) algorithm, 516

CAS (column address strobe) latency, 311

Cascading failures, 5

Case studies

analysis strategy, 784

bug database systems, 792–793

conclusion, 792

configuration, 786–788

PMCs, 788–789

problem statement, 783–784

references, 793

slow disks, 16–18

software change, 18–19

software events, 789–790

statistics, 784–786

tracing, 790–792

Casual benchmarking, 645

CAT (Cache Allocation Technology), 118, 596

cat function, 779

CAT (Intel Cache Allocation Technology), 118, 596

CFQ (completely fair queueing), 115, 449

CFS (completely fair scheduler), 116–117

CPU scheduling, 241

CPU shares, 614–615

description, 243

cgroup file, 141

cgroup variable, 778

cgroupid function, 779

cgroups

block I/O, 494

description, 116, 118

Linux kernel, 116

memory, 317, 353

OS virtualization, 606, 608–611, 613–620, 630

resource management, 111, 298

statistics, 139, 141, 620–622, 627–628

cgtop tool, 621

Character devices, 109–110

Characterizing memory usage, 325–326

Cheating in benchmarking, 650–651

Checklists

ad hoc checklist method, 43–44

benchmarking, 666

CPUs, 247, 527

disks, 453

file systems, 387

Linux 60-second analysis, 15

memory, 325

Chip-level multiprocessing (CMP), 220

chrt command, 295

Cilium, 509, 586, 617

Circular buffers for applications, 177

CISCs (complex instruction set computers), 224

clang compiler, 122

Classes, scheduling

CPUs, 242–243

I/O, 493

kernel, 106, 115

priority, 295

Clean memory, 306

clear function in bpftrace, 780

clear subcommand in trace-cmd, 735

clock routine, 99

Clocks

CPUs, 223, 230

CPUs vs. GPUs, 240

operating systems, 99

clone system calls, 94, 100

Cloud APIs, 580

Cloud computing, 579–580

background, 580–581

capacity planning, 582–584

comparisons, 634–636

vs. enterprise, 62

exercises, 636–637

hardware virtualization. See Hardware virtualization

instance types, 581

lightweight virtualization, 630–633

multitenancy, 585–586

orchestration, 586

OS virtualization. See OS virtualization

overview, 14

PMCs, 158

proof-of-concept testing, 3

references, 637–639

scalable architecture, 581–582

storage, 584–585

types, 634

Cloud-native databases, 582

Clue-based approach in thread state analysis, 196

Clusters in cloud computing, 586

CMP (chip-level multiprocessing), 220

CNI (container network interface) software, 586

Co-routines in applications, 178

Coarse view in profiling, 35

Code changes in cloud computing, 583

Coefficient of variation (CoV), 76

Coherence

caches, 234–235

models, 63

Coherent Accelerator Processor Interface (CAPI), 236

Cold caches, 36

collectd agent, 138

Collisions

hash, 180

networks, 516

Colors in flame graphs, 291

Column address strobe (CAS) latency, 311

Column quantizations, 82–83

comm variable in bpftrace, 778

Comma-separated values (CSV) format for sar, 165

Comments in bpftrace, 767

Common case optimization in applications, 174

Communication in multiprocess vs. multithreading, 228

Community applications, 172–173

Comparing benchmarks, 648

Competition, benchmarking, 649

Compiled programming languages

optimizations, 183–184

overview, 183

Compilers

CPU optimization, 229

options, 295

Completely fair queueing (CFQ), 115, 449

Completely fair scheduler (CFS), 116–117

CPU scheduling, 241

CPU shares, 614–615

description, 243

Completion target in workload analysis, 39

Complex benchmark tools, 646

Complex instruction set computers (CISCs), 224

Complexity, 5

Comprehension in flame graphs, 249

Compression

btrfs, 382

disks, 369

ZFS, 381

Compute kernel, 240

Compute Unified Device Architecture (CUDA), 240

Concurrency

applications, 177–181

micro-benchmarking, 390, 456

CONFIG options, 295–296

CONFIG_TASK_DELAY_ACCT option, 145

Configuration

applications, 172

case study, 786–788

network options, 574

Congestion avoidance and control

Linux kernel, 115

networks, 508

TCP, 510, 513

tuning, 570

connect system calls, 95

Connections for networks, 509

backlogs, 507, 519–520, 556–557, 569

characteristics, 527–528

firewalls, 517

latency, 7, 24–25, 505–506, 528

life span, 507

local, 509

monitoring, 529

NICs, 109

QUIC, 515

TCP queues, 519–520

three-way handshakes, 511–512

UDP, 514

Container network interface (CNI) software, 586

Containers

lightweight virtualization, 631–632

orchestration, 586

observability, 617–630

OS virtualization, 605–630

resource controls, 52, 70, 613–617, 626

Contention

locks, 198

models, 63

Context switches

defined, 90

kernels, 93

Contributors to system performance technologies, 811–814

Control groups (cgroups). See cgroups

Control paths in hardware virtualization, 594

Control units in CPUs, 230

Controllers

caches, 430

disk, 426

mechanical disks, 439

micro-benchmarking, 457

network, 501–502, 516

solid-state drives, 440–441

tunable, 494–495

USE method, 49, 451

Controls, resource. See Resource controls

Cookies, TCP, 511, 520

Copy-on-write (COW) file systems, 376

btrfs, 382

ZFS, 380

Copy-on-write (COW) process strategy, 100

CoreLink Interconnects, 236

Cores

CPUs vs. GPUs, 240

defined, 220

Corrupted file system data, 365

count function in bpftrace, 780

Counters, 8–9

fixed, 133–135

hardware, 156–158

CoV (coefficient of variation), 76

COW (copy-on-write) file systems, 376

btrfs, 382

ZFS, 380

COW (copy-on-write) process strategy, 100

CPCs (CPU performance counters), 156

CPI (cycles per instruction), 225

CPU affinity, 222

CPU-bound applications, 106

cpu control group, 610

CPU mode for applications, 172

CPU performance counters (CPCs), 156

CPU registers, perf-tools for, 746–747

cpu variable in bpftrace, 777

cpuacct control group, 610

cpudist tool

BCC, 755

case study, 790–791

threads, 278–279

cpufreq tool, 285

cpuinfo tool, 142

cpupower tool, 286–287

CPUs, 219–220

architecture. See CPUs architecture

benchmark questions, 667–668

binding, 181–182

bpftrace for, 763, 803–804

clock rate, 223

compiler optimization, 229

cross calls, 110

exercises, 299–300

experiments, 293–294

feedback-directed optimization, 122

flame graphs. See Flame graphs

FlameScope tool, 292–293

garbage collection, 185

hardware virtualization, 589–592, 596–597

I/O wait, 434

instructions, defined, 220

instructions, IPC, 225

instructions, pipeline, 224

instructions, size, 224

instructions, steps, 223

instructions, width, 224

memory caches, 221–222

memory tradeoffs with, 27

methodology. See CPUs methodology

models, 221–222

multiprocess and multithreading, 227–229

observability tools. See CPUs observability tools

OS virtualization, 611, 614, 627, 630

preemption, 227

priority inversion, 227

profiling. See CPUs profiling

references, 300–302

run queues, 222

saturation, 226–227

scaling in networks, 522–523

schedulers, 105–106

scheduling classes, 115

simultaneous multithreading, 225

statistic accuracy, 142–143

subsecond-offset heat maps, 289

terminology, 220

thread pools, 178

tuning. See CPUs tuning

USE method, 49–51, 795–797

user time, 226

utilization, 226

utilization heat maps, 288–289

virtualization support, 588

visualizations, 288–293

volumes and pools, 383

word size, 229

CPUs architecture, 221, 229

accelerators, 240–242

associativity, 234

caches, 230–235

GPUs, 240–241

hardware, 230–241

idle threads, 244

interconnects, 235–237

latency, 233–234

memory management units, 235

NUMA grouping, 244

P-states and C-states, 231

PMCs, 237–239

processors, 230

schedulers, 241–242

scheduling classes, 242–243

software, 241–244

CPUs methodology

CPU binding, 253

cycle analysis, 251

micro-benchmarking, 253–254

overview, 244–245

performance monitoring, 251

priority tuning, 252–253

profiling, 247–250

resource controls, 253

sample processing, 247–248

static performance tuning, 252

tools method, 245

USE, 245–246

workload characterization, 246–247

CPUs observability tools, 254–255

bpftrace, 282–285

cpudist, 278–279

GPUs, 287

hardirqs, 282

miscellaneous, 285–286

mpstat, 259

perf, 267–276

pidstat, 262

pmcarch, 265–266

profile, 277–278

ps, 260–261

ptime, 263–264

runqlat, 279–280

runqlen, 280–281

sar, 260

showboost, 265

softirqs, 281–282

time, 263–264

tlbstat, 266–267

top, 261–262

turbostat, 264–265

uptime, 255–258

vmstat, 258

CPUs profiling

applications, 187–189

benchmarking, 660–661

perf, 200–201

record, 695–696

steps, 247–250

system-wide, 268–270

CPUs tuning

compiler options, 295

CPU binding, 297–298

exclusive CPU sets, 298

overview, 294–295

power states, 297

processor options, 299

resource controls, 298

scaling governors, 297

scheduler options, 295–296

scheduling priority and class, 295

security boot options, 298–299

Cpusets, 116

CPU binding, 253

exclusive, 298

cpusets control group, 610, 614, 627

cpuunclaimed tool, 755

Crash resilience, multiprocess vs. multithreading, 228

Credit-based schedulers, 595

Crisis tools, 131–133

critical-chain command, 120

Critical paths in systemd service manager, 120

criticalstat tool, 756

CSMA/CD (carrier sense multiple access with collision detection) algorithm, 516

CSV (comma-separated values) format for sar, 165

CUBIC algorithm for TCP congestion control, 513

CUDA (Compute Unified Device Architecture), 240

CUMASK values in MSRs, 238–239

current_tracer file, 710

curtask variable for bpftrace, 778

Custom benchmarks, 662

Custom load generators, 491

Cycle analysis

CPUs, 251

memory, 326

Cycles per instruction (CPI), 225

Cylinder groups in FFS, 378

D

Daily patterns, monitoring, 78

Data Center TCP (DCTCP) congestion control, 118, 513

Data deduplication in ZFS, 381

Data integrity in magnetic rotational disks, 438

Data paths in hardware virtualization, 594

Data Plane Development Kit (DPDK), 523

Data rate in throughput, 22

Databases

applications, 172

case studies, 792–793

cloud computing, 582

Datagrams

OSI model, 502

UDP, 514

DAX (Direct Access), 118

dbslower tool, 756

dbstat tool, 756

Dcache (dentry cache), 375

dcsnoop tool, 409

dcstat tool, 409

DCTCP (Data Center TCP) congestion control, 118, 513

dd command

disks, 490–491

file systems, 411–412

DDR SDRAM (double data rate synchronous dynamic random-access memory), 313

Deadline I/O schedulers, 243, 448

DEADLINE scheduling policy, 243

DebugFS interface, 116

Decayed average, 75

Deflated disk I/O, 369

Defragmentation in XFS, 380

Degradation in scalability, 31–32

Delay accounting

kernel, 116

off-CPU analysis, 197

overview, 145

Delayed ACKs algorithm, 513

Delayed allocation

ext4, 379

XFS, 380

delete function in bpftrace, 780

Demand paging

BSD kernel, 113

memory, 307–308

Dentry caches (dcaches), 375

Dependencies in perf-tools, 748

Development, benchmarking for, 642

Development attribute, multiprocess vs. multithreading, 228

Devices

backlog tuning, 569

disk I/O caches, 430

drivers, 109–110, 522

hardware virtualization, 588, 594, 597

devices control group, 610

df tool, 409

Dhrystone benchmark

CPUs, 254

simulations, 653

Diagnosis cycle, 46

diff subcommand for perf, 673

Differentiated Services Code Points (DSCPs), 509–510

Direct Access (DAX), 118

Direct buses, 313

Direct I/O, 366

Direct mapped caches, 234

Direct measurement approach in thread state analysis, 197

Direct-reclaim memory method, 318–319

Directories in file systems, 107

Directory indexes in ext3, 379

Directory name lookup cache (DNLC), 375

Dirty memory, 306

Disk commands, 424

Disk controllers

caches, 430

magnetic rotational disks, 439

tunable, 494–495

USE method, 451

Disk I/O state in thread state analysis, 194–197

Disk request time, 428

Disk response time, 428

Disk service time, 428–429

Disk wait time, 428

Disks, 423–424

architecture. See Disks architecture

exercises, 495–496

experiments, 490–493

I/O. See Disks I/O

IOPS, 432

latency analysis, 384–386

methodology. See Disks methodology

models. See Disks models

non-data-transfer disk commands, 432

observability tools. See Disks observability tools

read/write ratio, 431

references, 496–498

resource controls, 494

saturation, 434

terminology, 424

tunable, 494

tuning, 493–495

USE method, 451

utilization, 433

visualizations, 487–490

Disks architecture

interfaces, 442–443

magnetic rotational disks, 435–439

operating system disk I/O stack, 446–449

persistent memory, 441

solid-state drives, 439–441

storage types, 443–446

Disks I/O

vs. application I/O, 435

bpftrace for, 764, 806–807

caching, 430

errors, 483

heat maps, 488–490

latency, 428–430, 454–455, 467–472, 482–483

operating system stacks, 446–449

OS virtualization, 613, 616

OS virtualization strategy, 630

random vs. sequential, 430–431

scatter plots, 488

simple disk, 425

size, 432, 480–481

synchronous vs. asynchronous, 434–435

time measurements, 427–429

time scales, 429–430

wait, 434

Disks methodology

cache tuning, 456

latency analysis, 454–455

micro-benchmarking, 456–457

overview, 449–450

performance monitoring, 452

resource controls, 456

scaling, 457–458

static performance tuning, 455–456

tools method, 450

USE method, 450–451

workload characterization, 452–454

Disks models

caching disk, 425–426

controllers, 426

simple disk, 425

Disks observability tools, 484–486

biolatency, 468–470

biosnoop, 470–472

biostacks, 474–475

biotop, 473–474

blktrace, 475–479

bpftrace, 479–483

iostat, 459–463

iotop, 472–473

MegaCli, 484

miscellaneous, 487

overview, 458–459

perf, 465–468

pidstat, 464–465

PSI, 464

sar, 463–464

SCSI event logging, 486

diskstats tool, 142, 487

Dispatcher-queue latency, 222

Distributed operating systems, 123–124

Distributed tracing, 199

Distributions

multimodal, 76–77

normal, 75

dmesg tool

CPUs, 245

description, 15

memory, 348

OS virtualization, 619

dmidecode tool, 348–349

DNLC (directory name lookup cache), 375

DNS latency, 24–25

Docker, 607, 620–622

Documentation

application latency, 385

BCC, 760–761

bpftrace, 781

Ftrace, 748–749

kprobes, 153

perf, 276, 703

perf-tools, 748

PMCs, 158

sar, 165–166

trace-cmd, 740

tracepoints, 150–151

uprobes, 155

USDT, 156

Domains

scheduling, 244

Xen, 589

Double data rate synchronous dynamic random-access memory (DDR SDRAM), 313

Double-pumped data transfer for CPUs, 237

DPDK (Data Plane Development Kit), 523

DRAM (dynamic random-access memory), 311

Drill-down analysis

overview, 55–56

slow disks case study, 17

Drivers

balloon, 597

device, 109–110, 522

parameterized, 593–595

drsnoop tool

BCC, 756

memory, 342

DSCPs (Differentiated Services Code Points), 509–510

DTrace tool

description, 12

Solaris kernel, 114

Duplex for networks, 508

Duplicate ACK detection, 512

Duration in RED method, 53

DWARF (debugging with attributed record formats) stack walking, 216, 267, 676, 696

Dynamic instrumentation

kprobes, 151

latency analysis, 385

overview, 12

Dynamic priority in scheduling classes, 242–243

Dynamic random-access memory (DRAM), 311

Dynamic sizing in cloud computing, 583–584

Dynamic tracers, 12

Dynamic tracing

DTrace, 114

perf, 677–678

tools, 12

Dynamic USDT, 156

DynTicks, 116

E

e2fsck tool, 418

Early Departure Time (EDT), 119, 524

eBPF. See Extended BPF

EBS (Elastic Block Store), 585

ECC (error-correcting code) for magnetic rotational disks, 438

ECN (Explicit Congestion Notification) field

IP, 508–510

TCP, 513

tuning, 570

EDT (Early Departure Time), 119, 524

EFS (Elastic File System), 585

EKS (Elastic Kubernetes Service), 586

elapsed variable in bpftrace, 777

Elastic Block Store (EBS), 585

Elastic File System (EFS), 585

Elastic Kubernetes Service (EKS), 586

Elevator seeking in magnetic rotational disks, 437–438

ELF (Executable and Linking Format) binaries

description, 183

missing symbols in, 214

Embedded caches, 232

eMLC (enterprise multi-level cell) flash memory, 440

Encapsulation for networks, 504

END probes in bpftrace, 774

End-to-end network arguments, 507

Enterprise models, 62

Enterprise multi-level cell (eMLC) flash memory, 440

Environment

benchmarking, 647

processes, 101–102

Ephemeral drives, 584

Ephemeral ports, 531

epoll system call, 115, 118

EPTs (extended page tables), 593

Erlang virtual machines, 185

Error-correcting code (ECC) for magnetic rotational disks, 438

Errors

applications, 193

benchmarking, 647

CPUs, 245–246, 796, 798

disk controllers, 451

disk devices, 451

I/O, 483, 798

kernels, 798

memory, 324–325, 796, 798

networks, 526–527, 529, 796–797

RED method, 53

storage, 797

task capacity, 799

USE method overview, 47–48, 51–53

user mutex, 799

Ethernet congestion avoidance, 508

ethtool tool, 132, 546–547

Event-based concurrency, 178

Event-based tools, 133

Event-select MSRs, 238

Event sources for Wireshark, 559

Event tracing

disks, 454

file systems, 388

Ftrace, 707–708

kprobes, 719–720

methodologies, 57–58

perf-tools for, 745–746

trace-cmd for, 737

uprobes, 722–723

Event worker threads, 178

Events

case study, 789–790

CPUs, 273–274

frequency sampling, 682–683

observability source, 159

perf. See perf tool events

SCSI logging, 486

selecting, 274–275

stat filters, 693–694

synthetic, 731–733

trace, 148

events directory in tracefs, 710

Eviction policies for caching, 36

evlist subcommand for perf, 673

Exceptions

synchronous interrupts, 97

user mode, 93

Exclusive CPU sets, 298

exec system calls

kernel, 94

processes, 100

execsnoop tool

BCC, 756

CPUs, 285

perf-tools, 743

process tracing, 207–208

static instrumentation, 11–12

tracing, 136

Executable and Linking Format (ELF) binaries

description, 183

missing symbols in, 214

Executable data in process virtual address space, 319

Executable text in process virtual address space, 319

Execution in kernels, 92–93

execve system call, 11

exit function in bpftrace, 770, 779

Experimentation-based performance gains, 73–74

Experiments

CPUs, 293–294

disks, 490–493

file systems, 411–414

networks, 562–567

observability, 7

overview, 13–14

scientific method, 45–46

Experts for applications, 173

Explicit Congestion Notification (ECN) field

IP, 508–510

TCP, 513

tuning, 570

Explicit logical metadata in file systems, 368

Exporters for monitoring, 55, 79, 137

Express Data Path (XDP) technology

description, 118

event sources, 558

kernel bypass, 523

ext3 file system, 378–379

ext4 file system

features, 379

tuning, 416–418

ext4dist tool, 399–401, 756

ext4slower tool, 401–402, 756

Extended BPF, 12

BCC, 751–761

bpftrace, 752–753, 761–781, 803–808

description, 118

firewalls, 517

histograms, 744

kernel-mode applications, 92

overview, 121–122

tracing tools, 166

Extended page tables (EPTs), 593

Extent-based file systems, 375–376

Extents, 375–376

btrfs, 382

ext4, 380

External caches, 232

F

FaaS (functions as a service), 634

FACK (forward acknowledgments) in TCP, 514

Factor analysis in capacity planning, 71–72

Failures, benchmarking, 645–651

Fair-share schedulers, 595

False sharing for hash tables, 181

Families of instance types, 581

Fast File System (FFS)

description, 113

overview, 377–378

Fast open in TCP, 510

Fast recovery in TCP, 510

Fast retransmits in TCP, 510, 512

Fast user-space mutex (Futex), 115

Fastpath state in Mutex locks, 179

fatrace tool, 395–396

Faults

in synchronous interrupts, 97

page faults. See page faults

faults tool, 348

FC (Fibre Channel) interface, 442–443

fd tool, 141

Feedback-directed optimization (FDO), 122

ffaults tool, 348

FFS (Fast File System)

description, 113

overview, 377–378

Fiber threads, 178

Fibre Channel (FC) interface, 442–443

Field-programmable gate arrays (FPGAs), 240–241

FIFO scheduling policy, 243

File descriptor capacity in USE method, 52

File offset pattern, micro-benchmarking for, 390

File stores in cloud computing, 584

File system internals, bpftrace for, 408

File systems

access timestamps, 371

ad hoc tools, 411–412

architecture. See File systems architecture

bpftrace for, 764, 805–806

caches. See File systems caches

capacity, OS virtualization, 616

capacity, performance issues, 371

exercises, 419–420

experiments, 411–414

hardware virtualization, 597

I/O, logical vs. physical, 368–370

I/O, non-blocking, 366–367

I/O, random vs. sequential, 363–364

I/O, raw and direct, 366

I/O, stack, 107–108

interfaces, 361

latency, 362–363

memory-mapped files, 367

metadata, 367–368

methodology. See File systems methodology

micro-benchmark tools, 412–414

models, 361–362

observability tools. See File systems observability tools

operations, 370–371

OS virtualization, 611–612

overview, 106–107, 359–360

paging, 306

prefetch, 364–365

read-ahead, 365

reads, micro-benchmarking for, 61

record size tradeoffs, 27

references, 420–421

special, 371

synchronous writes, 366

terminology, 360

tuning, 414–419

types. See File systems types

visualizations, 410–411

volumes and pools, 382–383

File systems architecture

caches, 373–375

features, 375–377

I/O stacks, 107–108, 372

VFS, 107, 373

File systems caches, 361–363

defined, 360

flushing, 414

hit ratio, 17

OS virtualization, 616

OS virtualization strategy, 630

tuning, 389

usage, 309

write-back, 365

File systems methodology

cache tuning, 389

disk analysis, 384

latency analysis, 384–386

micro-benchmarking, 390–391

overview, 383–384

performance monitoring, 388

static performance tuning, 389

workload characterization, 386–388

workload separation, 389

File systems observability tools

bpftrace, 402–408

cachestat, 399

ext4dist, 399–401

ext4slower, 401–402

fatrace, 395–396

filetop, 398–399

free, 392–393

LatencyTOP, 396

miscellaneous, 409–410

mount, 392

opensnoop, 397

overview, 391–392

sar, 393–394

slabtop, 394–395

strace, 395

top, 393

vmstat, 393

File systems types

btrfs, 381–382

ext3, 378–379

ext4, 379

FFS, 377–378

XFS, 379–380

ZFS, 380–381

FileBench tool, 414

filelife tool, 409, 756

fileslower tool, 409

filetop tool, 398–399

filetype tool, 409

Filters

bpftrace, 769, 776

event, 693–694

kprobes, 721–722

PID, 729–730

tracepoints, 717–718

uprobes, 723

fio (Flexible IO Tester) tool

disks, 493

file systems, 413–414

Firecracker project, 631

Firewalls, 503

misconfigured, 505

overview, 517

tuning, 574

First-byte latency, 506, 528

Five Whys in drill-down analysis, 56

Fixed counters, 133–135

Flame graphs

automated, 201

characteristics, 290–291

colors, 291

CPU profiling, 10–11, 187–188, 278, 660–661

generating, 249, 270–272

interactivity, 291

interpretation, 291–292

malloc() bytes, 346

missing stacks, 215

off-CPU time, 190–191, 205

overview, 289–290

page faults, 340–342, 346

perf, 119

performance wins, 250

profiles, 278

sample processing, 249–250

scripts, 700

FlameScope tool, 292–293, 700

Flash-memory-based SSDs, 439–440

Flash translation layer (FTL) in solid-state drives, 440–441

Flent (FLExible Network Tester) tool, 567

Flexible IO Tester (fio) tool

disks, 493

file systems, 413–414

FLExible Network Tester (Flent) tool, 567

Floating point events in perf, 680

Floating-point operations per second (FLOPS) in benchmarking, 655

Flow control in bpftrace, 775–777

Flusher threads, 374

Flushing caches, 365, 414

fmapfault tool, 409

Footprints, off-CPU, 188–189

fork system calls, 94, 100

forks.bt tool, 624–625

Format string for tracepoints, 148–149

Forward acknowledgments (FACK) in TCP, 514

4-wide processors, 224

FPGAs (field-programmable gate arrays), 240–241

Fragmentation

FFS, 377

file systems, 364

memory, 321

packets, 505

reducing, 380

Frames

defined, 500

networks, 515

OSI model, 502

Free memory lists, 315–318

free tool

description, 15

file systems, 392–393

memory, 348

OS virtualization, 619

FreeBSD

jails, 606

jemalloc, 322

kernel, 113

TSA analysis, 217

network stack, 514

performance vs. Linux, 124

TCP LRO, 523

Freeing memory, 315–318

Frequency sampling for hardware events, 682–683

Front-ends in instruction pipeline, 224

Front-side buses, 235–237

fsck time in ext4, 379

fsrwstat tool, 409

FTL (flash translation layer) in solid-state drives, 440–441

ftrace subcommand for perf, 673

Ftrace, 13, 705–706

capabilities overview, 706–708

description, 166

documentation, 748–749

function_graph, 724–725

function profiler, 711–712

function tracer, 713–716

hist triggers, 727–733

hwlat, 726

kprobes, 719–722

options, 716

OS virtualization, 629

perf, 741

perf-tools, 741–748

references, 749

trace-cmd, 734–740

trace file, 713–715

trace_pipe file, 715

tracefs, 708–711

tracepoints, 717–718

tracing, 136

uprobes, 722–723

Full I/O distributions disk latency, 454

Full stack in systems performance, 1

Fully associative caches, 234

Fully-preemptible kernels, 110, 114

func variable in bpftrace, 778

funccount tool

BCC, 756–758

example, 747

perf-tools, 744, 748

funcgraph tool

Ftrace, 706–707

perf-tools, 744, 748

funclatency tool, 757

funcslower tool

BCC, 757

perf-tools, 744

function_graph tracer

description, 708

graph tracing, 724–725

options, 725

trace-cmd for, 737, 739

function_profile_enabled file, 710

Function profiling

Ftrace, 707, 711–712

observability source, 159

Function tracer. See Ftrace tool

Function tracing

profiling, 248

trace-cmd for, 736–737

Functional block diagrams in USE method, 49–50

Functional units in CPUs, 223

Functions as a service (FaaS), 634

Functions in bpftrace, 770, 778–781

functrace tool, 744

Futex (fast user-space mutex), 115

futex system calls, 95

G

Garbage collection, 185–186

gcc compiler

optimizations, 183–184

PGO kernels, 122

gdb tool, 136

Generic segmentation offload (GSO) in networks, 520–521

Generic system performance methodologies, 40–41

Geometric mean, 74

getdelays.c tool, 286

gethostlatency tool, 561, 756

github.com tool package, 132

GKE (Google Kubernetes Engine), 586

glibc allocator, 322

Glossary of terms, 815–823

Golang

goroutines, 178

syscalls, 92

Good/fast/cheap trade-offs, 26–27

Google Kubernetes Engine (GKE), 586

Goroutines for applications, 178

gprof tool, 135

Grafana, 8–9, 138

Graph tracing, 724–725

Graphics processing units (GPUs)

vs. CPUs, 240

tools, 287

GRO (Generic Receive Offload), 119

Growth

big O notation, 175

heap, 320

memory, 185, 316, 327

GSO (generic segmentation offload) in networks, 520–521

Guests

hardware virtualization, 590–593, 596–605

lightweight virtualization, 632–633

OS virtualization, 617, 627–629

gVisor project, 631

H

Hard disk drives (HDDs), 435–439

Hard interrupts, 282

hardirqs tool, 282, 756

Hardware

memory, 311–315

networks, 515–517

threads, 220

tracing, 276

Hardware-assisted virtualization, 590

Hardware counters. See Performance monitoring counters (PMCs)

Hardware events

CPUs, 273–274

frequency sampling, 682–683

perf, 680–683

selecting, 274–275

Hardware instances in cloud computing, 580

Hardware interrupts, 91

Hardware latency detector (hwlat), 708, 726

Hardware latency tracer, 118

Hardware probes, 774

Hardware RAID, 444

Hardware resources in capacity planning, 70

Hardware virtualization

comparisons, 634–636

CPU support, 589–592

I/O, 593–595

implementation, 588–589

memory mapping, 592–593

multi-tenant contention, 595

observability, 597–605

overhead, 589–595

overview, 587–588

resource controls, 595–597

Harmonic mean, 74

Hash fields in hist triggers, 728

Hash tables in applications, 180–181

HBAs (host bus adapters), 426

HDDs (hard disk drives), 435–439

hdparm tool, 491–492

Head-based sampling in distributed tracing, 199

Heads in magnetic rotational disks, 436

Heap

anonymous paging, 306

description, 304

growth, 320

process virtual address space, 319

Heat maps

CPU utilization, 288–289

disk offset, 489–490

disk utilization, 490

file systems, 410–411

FlameScope, 292–293

I/O latency, 488–489

overview, 82–83

subsecond-offset, 289

Hello, World! program, 770

hfaults tool, 348

hist function in bpftrace, 780

Hist triggers

fields, 728–729

modifiers, 729

multiple keys, 730

perf-tools, 748

PID filters, 729–730

single keys, 727–728

stack trace keys, 730–731

synthetic events, 731–733

usage, 727

hist triggers profiler, 707

Histogram, 76–77

Hits, cache, 35–36, 361

Hold times for locks, 198

Holistic approach, 6

Horizontal pod autoscalers (HPAs), 73

Horizontal scaling and scalability

capacity planning, 72

cloud computing, 581–582

Host bus adapters (HBAs), 426

Hosts

applications, 172

cloud computing, 580

hardware virtualization, 597–603

lightweight virtualization, 632

OS virtualization, 617, 619–627

Hot caches, 37

Hot/cold flame graphs, 191

Hourly patterns, monitoring, 78

HPAs (horizontal pod autoscalers), 73

HT (HyperTransport) for CPUs, 236

htop tool, 621

HTTP/3 protocol, 515

Hubs in networks, 516

Hue in flame graphs, 291

Huge pages, 115–116, 314, 352–353

hugetlb control group, 610

hwlat (hardware latency detector), 708, 726

Hybrid clouds, 580

Hybrid kernels, 92, 123

Hyper-Threading Technology, 225

Hyper-V, 589

Hypercalls in paravirtualization, 588

Hyperthreading-aware scheduling classes, 243

HyperTransport (HT) for CPUs, 236

Hypervisors

cloud computing, 580

hardware virtualization, 587–588

kernels, 93

Hypothesis step in scientific method, 44–45

I

I/O. See Input/output (I/O)

IaaS (infrastructure as a service), 580

Icicle graphs, 250

icstat tool, 409

IDDs (isolated driver domains), 596

Identification in drill-down analysis, 55

Idle memory, 315

Idle scheduling class, 243

IDLE scheduling policy, 243

Idle state in thread state analysis, 194, 196–197

Idle threads, 99, 244

ieee80211scan tool, 561

If statements, 776

ifconfig tool, 537–538

ifpps tool, 561

iftop tool, 562

Implicit disk I/O, 369

Implicit logical metadata, 368

Inactive pages in page caches, 318

Incast problem in networks, 524

Index nodes (inodes)

caches, 375

defined, 360

VFS, 373

Indirect disk I/O, 369

Individual synchronous writes, 366

Industry benchmarking, 60–61

Industry standards for benchmarking, 654–655

Inflated disk I/O, 369

Infrastructure as a service (IaaS), 580

init process, 100

Initial window in TCP, 514

inject subcommand for perf, 673

Inodes (index nodes)

caches, 375

defined, 360

VFS, 373

inotify framework, 116

inotify tool, 409

Input

event tracing, 58

solid-state drive controllers, 440

Input/output (I/O)

disks. See Disks I/O

file systems, 360

hardware virtualization, 593–595, 597

I/O-bound applications, 106

latency, 424

logical vs. physical, 368–370

merging, 448

multiqueue schedulers, 119

non-blocking, 181, 366–367

OS virtualization, 611–612, 616–617

random vs. sequential, 363–364

raw and direct, 366

request time, 427

schedulers, 448

scheduling, 115–116

service time, 427

size, applications, 176

size, micro-benchmarking, 390

stacks, 107–108, 372

USE method, 798

wait time, 427

Input/output operations per second. See IOPS (input/output operations per second)

Input/output profiling

bpftrace, 210–212

perf, 202–203

syscall analysis, 192

Installing

BCC, 754

bpftrace, 762

instances directory in tracefs, 710

Instances in cloud computing

description, 14

types, 580

Instruction pointer for threads, 100

Instructions, CPU

defined, 220

IPC, 225

pipeline, 224

size, 224

steps, 223

text, 304

width, 224

Instructions per cycle (IPC), 225, 251, 326

Integrated caches, 232

Intel Cache Allocation Technology (CAT), 118, 596

Intel Clear Containers, 631

Intel processor cache sizes, 230–231

Intel VTune Amplifier XE tool, 135

Intelligent Platform Management Interface (IPMI), 98–99

Intelligent prefetch in ZFS, 381

Inter-processor interrupts (IPIs), 110

Inter-stack latency in networks, 529

Interactivity in flame graphs, 291

Interconnects

buses, 313

CPUs, 235–237

USE method, 49–51

Interfaces

defined, 500

device drivers, 109–110

disks, 442–443

file systems, 361

kprobes, 153

network, 109, 501

network hardware, 515–516

network IOPS, 527–529

network negotiation, 508

PMCs, 157–158

scheduling in NAPI, 522

tracepoints, 149–150

uprobes, 154–155

Interleaving in FFS, 378

Internet Protocol (IP)

congestion avoidance, 508

overview, 509–510

sockets, 509

Interpretation of flame graphs, 291–292

Interpreted programming languages, 184–185

Interrupt coalescing mode for networks, 522

Interrupt-disabled mode, 98

Interrupt service requests (IRQs), 96–97

Interrupt service routines (ISRs), 96

Interrupts

asynchronous, 96–97

defined, 91

hardware, 282

masking, 98–99

network latency, 529

overview, 96

soft, 281–282

synchronous, 97

threads, 97–98

interrupts tool, 142

interval probes in bpftrace, 774

Interval statistics, stat for, 693

IO accounting, 116

io_submit command, 181

io_uring_enter command, 181

io_uring interface, 119

ioctl system calls, 95

iolatency tool, 743

ionice tool, 493–494

ioping tool, 492

ioprofile tool, 409

IOPS (input/output operations per second)

defined, 22

description, 7

disks, 429, 431–432

networks, 527–529

performance metric, 32

resource analysis, 38

iosched tool, 487

iosnoop tool, 743

iostat tool

bonnie++ tool, 658

default output, 459–460

description, 15

disks, 450, 459–463

extended output, 460–463

fixed counters, 134

memory, 348

options, 460

OS virtualization, 619, 627

percent busy metric, 33

slow disks case study, 17

iotop tool, 450, 472–473

IP (Internet Protocol)

congestion avoidance, 508

overview, 509–510

sockets, 509

ip tool, 525, 536–537

ipc control group, 608

IPC (instructions per cycle), 225, 251, 326

ipecn tool, 561

iperf tool

example, 13–14

network micro-benchmarking, 10

network throughput, 564–565

IPIs (inter-processor interrupts), 110

IPMI (Intelligent Platform Management Interface), 98–99

iproute2 tool package, 132

IRQs (interrupt service requests), 96–97

irqsoff tracer, 708

lscpu tool, 285

Isolated driver domains (IDDs), 596

Isolation in OS virtualization, 629

ISRs (interrupt service routines), 96

lstopo tool, 286

J

Jails in BSD kernel, 113, 606

Java

analysis, 29

case study, 783–792

flame graphs, 201, 271

dynamic USDT, 156, 213

garbage collection, 185–186

Java Flight Recorder, 135

stack traces, 215

symbols, 214

uprobes, 213

USDT probes, 155, 213

virtual machines, 185

Java Flight Recorder (JFR), 135

JavaScript Object Notation (JSON) format, 163–164

JBOD (just a bunch of disks), 443

jemalloc allocator, 322

JFR (Java Flight Recorder), 135

JIT (just-in-time) compilation

Linux kernel, 117

PGO kernels, 122

runtime missing symbols, 214

Jitter in operating systems, 99

jmaps tool, 214

join function, 778

Journaling

btrfs, 382

ext3, 378–379

file systems, 376

XFS, 380

JSON (JavaScript Object Notation) format, 163–164

Jumbo frames

packets, 505

tuning, 574

Just a bunch of disks (JBOD), 443

Just-in-time (JIT) compilation

Linux kernel, 117

PGO kernels, 122

runtime missing symbols, 214

K

kaddr function, 779

Kata Containers, 631

KCM (Kernel Connection Multiplexor), 118

Keep-alive strategy in networks, 507

Kendall’s notation for queueing systems, 67–68

Kernel-based Virtual Machine (KVM) technology

CPU quotas, 595

description, 589

I/O path, 594

Linux kernel, 116

observability, 600–603

Kernel bypass for networks, 523

Kernel Connection Multiplexor (KCM), 118

Kernel mode, 93

Kernel page table isolation (KPTI) patches, 121

Kernel space, 90

Kernel state in thread state analysis, 194–197

Kernel statistics (Kstat) framework, 159–160

Kernel time

CPUs, 226

syscall analysis, 192

Kernels

bpftrace for, 765

BSD, 113

comparisons, 124

defined, 90

developments, 115–120

execution, 92–93

file systems, 107

filtering in OS virtualization, 629

Linux, 114–122, 124

microkernels, 123

monolithic, 123

overview, 91–92

PGO, 122

PMU events, 680

preemption, 110

schedulers, 105–106

Solaris, 114

stacks, 103

system calls, 94–95

time analysis, 202

unikernels, 123

Unix, 112

USE method, 798

user modes, 93–94

versions, 111–112

KernelShark software, 83–84, 739–740

kfunc probes, 774

killsnoop tool

BCC, 756

perf-tools, 743

klockstat tool, 756

kmem subcommand for perf, 673, 702

Knee points

models, 62–64

scalability, 31

Known-knowns, 37

Known-unknowns, 37

kprobe_events file, 710

kprobe probes, 774

kprobe profiler, 707

kprobe tool, 744

kprobes, 685–686

arguments, 686–687, 720–721

event tracing, 719–720

filters, 721–722

overview, 151–153

profiling, 722

return values, 721

triggers, 721–722

kprobes tracer, 708

KPTI (kernel page table isolation) patches, 121

kretfunc probes, 774

kretprobes, 152–153, 774

kstack function in bpftrace, 779

kstack variable in bpftrace, 778

Kstat (kernel statistics) framework, 159–160

kswapd tool, 318–319, 374

ksym function, 779

kubectl command, 621

Kubernetes

node, 608

orchestration, 586

OS virtualization, 620–621

KVM. See Kernel-based Virtual Machine (KVM) technology

kvm_entry tool, 602

kvm_exit tool, 602

kvm subcommand for perf, 673, 702

kvm_vcpu_halt command, 592

kvmexits.bt tool, 602–603

Kyber multi-queue schedulers, 449

L

L2ARC cache in ZFS, 381

Label selectors in cloud computing, 586

Language virtual machines, 185

Large Receive Offload (LRO), 116

Large segment offload for packet size, 505

Last-level caches (LLCs), 232

Latency

analysis methodologies, 56–57

applications, 173

biolatency, 468–470

CPUs, 233–234

defined, 22

disk I/O, 428–430, 454–455, 467–472, 482–483

distributions, 76–77

file systems, 362–363, 384–386, 388

graph tracing, 724–725

hardware, 118

hardware virtualization, 604

heat maps, 82–83, 488–489

I/O profiling, 210–211

interrupts, 98

line charts, 80–81

memory, 311, 441

methodologies, 24–25

networks, analysis, 528–529

networks, connections, 7, 24–25, 505–506, 528

networks, defined, 500

networks, types, 505–507

outliers, 58, 186, 424, 471–472

overview, 6–7

packets, 532–533

percentiles, 413–414

perf, 467–468

performance metric, 32

run-queue, 222

scatter plots, 81–82, 488

scheduler, 226, 272–273

solid-state drives, 441

ticks, 99

transaction costs analysis, 385–386

VFS, 406–408

workload analysis, 39–40

LatencyTOP tool for file systems, 396

latencytop tool for operating systems, 116

Lazy shootdowns, 367

LBR (last branch record), 216, 676, 696

Leak detection for memory, 326–327

Least frequently used (LFU) caching algorithm, 36

Least recently used (LRU) caching algorithm, 36

Level 1 caches

data, 232

instructions, 232

memory, 314

Level 2 ARC, 381

Level 2 caches

embedded, 232

memory, 314

Level 3 caches

LLC, 232

memory, 314

Level of appropriateness in methodologies, 28–29

LFU (least frequently used) caching algorithm, 36

lhist function, 780

libpcap library as observability source, 159

Life cycle for processes, 100–101

Life span

network connections, 507

solid-state drives, 441

Lightweight threads, 178

Lightweight virtualization

comparisons, 634–636

implementation, 631–632

observability, 632–633

overhead, 632

overview, 630

resource controls, 632

Limit investigations, benchmarking for, 642

Limitations of averages, 75

Limits for OS virtualization resources, 613

limits tool, 141

Line charts

baseline statistics, 59

disks, 487–488

working with, 80–81

Linear scalability

methodologies, 32

models, 63

Link aggregation tuning, 574

Link-time optimization (LTO), 122

Linux 60-second analysis, 15–16

Linux operating system

crisis tools, 131–133

extended BPF, 121–122

kernel developments, 115–120

KPTI patches, 121

network stacks, 518–519

observability sources, 138–146

observability tools, 130

operating system disk I/O stack, 447–448

overview, 114–115

static performance tools, 130–131

systemd service manager, 120

thread state analysis, 195–197

linux-tools-common linux-tools tool package, 132

list subcommand

perf, 673

trace-cmd, 735

Listen backlogs in networks, 519

listen subcommand in trace-cmd, 735

Listing events

perf, 674–675

trace-cmd for, 736

Little’s Law, 66

Live reporting in sar, 165

LLCs (last-level caches), 232

llcstat tool

BCC, 756

CPUs, 285

Load averages for uptime, 255–257

Load balancers

capacity planning, 72

schedulers, 241

Load generation

capacity planning, 70

custom load generators, 491

micro-benchmarking, 61

Load vs. architecture in methodologies, 30–31

loadavg tool, 142

Local memory, 312

Local network connections, 509

Localhost network connections, 509

Lock state in thread state analysis, 194–197

lock subcommand for perf, 673, 702

Locks

analysis, 198

applications, 179–181

tracing, 212–213

Logging

applications, 172

SCSI events, 486

ZFS, 381

Logical CPUs

defined, 220

hardware threads, 221

Logical I/O

defined, 360

vs. physical, 368–370

Logical metadata in file systems, 368

Logical operations in file systems, 361

Longest-latency caches, 232

Loopbacks in networks, 509

Loops in bpftrace, 776–777

LRO (Large Receive Offload), 116

LRU (least recently used) caching algorithm, 36

lsof tool, 561

LTO (link-time optimization), 122

LTTng tool, 166

M

M/D/1 queueing systems, 68–69

M/G/1 queueing systems, 68

M/M/1 queueing systems, 68

M/M/c queueing systems, 68

Macro-benchmarks, 13, 653–654

MADV_COLD option, 119

MADV_PAGEOUT option, 119

madvise system call, 367, 415–416

Magnetic rotational disks, 435–439

Main memory

caching, 37–39

defined, 90, 304

latency, 26

managing, 104–105

overview, 311–312

malloc() bytes flame graphs, 346

Map functions in bpftrace, 771–772, 780–781

Map variables in bpftrace, 771

Mapping memory. See Memory mappings

maps tool, 141

Marketing, benchmarking for, 642

Markov model, 654

Markovian arrivals in queueing systems, 68–69

Masking interrupts, 98–99

max function in bpftrace, 780

Maximum controller operation rate, 457

Maximum controller throughput, 457

Maximum disk operation rate, 457

Maximum disk random reads, 457

Maximum disk throughput

magnetic rotational disks, 436–437

micro-benchmarking, 457

Maximum transmission unit (MTU) size for packets, 504–505

MCS locks, 117

mdflush tool, 487

Mean, 74

“A Measure of Transaction Processing Power,” 655

Measuring disk time, 427–429

Medians, 75

MegaCli tool, 484

Melo, Arnaldo Carvalho de, 671

Meltdown vulnerability, 121

mem subcommand for perf, 673

meminfo tool, 142

memleak tool

BCC, 756

memory, 348

Memory, 303–304

allocators, 309, 353

architecture. See Memory architecture

benchmark questions, 667–668

bpftrace for, 763–764, 804–805

BSD kernel, 113

CPU caches, 221–222

CPU tradeoffs with, 27

demand paging, 307–308

exercises, 354–355

file system cache usage, 309

garbage collection, 185

hardware virtualization, 596–597

internals, 346–347

mappings. See Memory mappings

methodology. See Memory methodology

multiple page sizes, 352–353

multiprocess vs. multithreading, 228

NUMA binding, 353

observability tools. See Memory observability tools

OS virtualization, 611, 613, 615–616

OS virtualization strategy, 630

overcommit, 308

overprovisioning in solid-state drives, 441

paging, 306–307

persistent, 441

process swapping, 308–309

references, 355–357

resource controls, 353–354

shared, 310

shrinking method, 328

terminology, 304

tuning, 350–354

USE method, 49–51, 796–798

utilization and saturation, 309

virtual, 90, 104–105, 304–305

word size, 310

working set size, 310

Memory architecture, 311

buses, 312–313

CPU caches, 314

freeing memory, 315–318

hardware, 311–315

latency, 311

main memory, 311–312

MMU, 314

process virtual address space, 319–322

software, 315–322

TLB, 314

memory control group, 610, 616

Memory locality, 222

Memory management units (MMUs), 235, 314

Memory mappings

displaying, 337–338

files, 367

hardware virtualization, 592–593

heap growth, 320

kernel, 94

micro-benchmarking, 390

OS virtualization, 611

Memory methodology

cycle analysis, 326

leak detection, 326–327

memory shrinking, 328

micro-benchmarking, 328

overview, 323

performance monitoring, 326

resource controls, 328

static performance tuning, 327–328

tools method, 323–324

usage characterization, 325–326

USE method, 324–325

Memory observability tools

bpftrace, 343–347

drsnoop, 342

miscellaneous, 347–350

numastat, 334–335

overview, 328–329

perf, 338–342

pmap, 337–338

ps, 335–336

PSI, 330–331

sar, 331–333

slabtop, 333–334

swapon, 331

top, 336–337

vmstat, 329–330

wss, 342–343

Memory reclaim state in delay accounting, 145

Metadata

ext3, 378

file systems, 367–368

Method R, 57

Methodologies, 21–22

ad hoc checklist method, 43–44

anti-methods, 42–43

applications. See Applications methodology

baseline statistics, 59

benchmarking. See Benchmarking methodology

cache tuning, 60

caching, 35–37

capacity planning, 69–73

CPUs. See CPUs methodology

diagnosis cycle, 46

disks. See Disks methodology

drill-down analysis, 55–56

event tracing, 57–58

exercises, 85–86

file systems. See File systems methodology

general, 40–41

known-unknowns, 37

latency analysis, 56–57

latency overview, 24–25

level of appropriateness, 28–29

Linux 60-second analysis checklist, 15–16

load vs. architecture, 30–31

memory. See Memory methodology

Method R, 57

metrics, 32–33

micro-benchmarking, 60–61

modeling. See Methodologies modeling

models, 23–24

monitoring, 77–79

networks. See Networks methodology

performance, 41–42

performance mantras, 61

perspectives, 37–40

point-in-time recommendations, 29–30

problem statement, 44

profiling, 35

RED method, 53

references, 86–87

resource analysis, 38–39

saturation, 34–35

scalability, 31–32

scientific method, 44–46

static performance tuning, 59–60

statistics, 73–77

stop indicators, 29

terminology, 22–23

time scales, 25–26

tools method, 46

trade-offs, 26–27

tuning efforts, 27–28

USE method, 47–53

utilization, 33–34

visualizations. See Methodologies visualizations

workload analysis, 39–40

workload characterization, 54

Methodologies modeling, 62

Amdahl’s Law of Scalability, 64–65

enterprise vs. cloud, 62

queueing theory, 66–69

Universal Scalability Law, 65–66

visual identification, 62–64

Methodologies visualizations, 79

heat maps, 82–83

line charts, 80–81

scatter plots, 81–82

surface plots, 84–85

timeline charts, 83–84

tools, 85

Metrics, 8–9

applications, 172

fixed counters, 133–135

methodologies, 32–33

observability tools, 167–168

resource analysis, 38

USE method, 48–51

workload analysis, 40

MFU (most frequently used) caching algorithm, 36

Micro-benchmarking

capacity planning, 70

CPUs, 253–254

description, 13

design example, 652–653

disks, 456–457, 491–492

file systems, 390–391, 412–414

memory, 328

methodologies, 60–61

networks, 533

overview, 651–652

Micro-operations (uOps), 224

Microcode ROM in CPUs, 230

Microkernels, 92, 123

Microservices

cloud computing, 583–584

USE method, 53

Midpath state for Mutex locks, 179

Migration types for free lists, 317

min function in bpftrace, 780

MINIX operating system, 114

Minor faults, 307

MIPS (millions of instructions per second) in benchmarking, 655

Misleading benchmarks, 650

Missing stacks, 215–216

Missing symbols, 214

Mixed-mode CPU profiles, 187

Mixed-mode flame graphs, 187

MLC (multi-level cell) flash memory, 440

mmap system call

description, 95

memory mapping, 320, 367

mmapfiles tool, 409

mmapsnoop tool, 348

mmiotrace tracer, 708

MMUs (memory management units), 235, 314

mnt control group, 609

Mode switches

defined, 90

kernels, 93

Model-specific registers (MSRs)

CPUs, 238

observability source, 159

Models

Amdahl’s Law of Scalability, 64–65

CPUs, 221–222

disks, 425–426

enterprise vs. cloud, 62

file systems, 361–362

methodologies, 23–24

networks, 501–502

overview, 62

queueing theory, 66–69

Universal Scalability Law, 65–66

visual identification, 62–64

wireframe, 84–85

Modular I/O scheduling, 116

Monitoring, 77–79

CPUs, 251

disks, 452

drill-down analysis, 55

file systems, 388

memory, 326

networks, 529, 537

observability tools, 137–138

products, 79

sar, 161–162

summary-since-boot values, 79

time-based patterns, 77–78

Monolithic kernels, 91, 123

Most frequently used (MFU) caching algorithm, 36

Most recently used (MRU) caching algorithm, 36

Mount points in file systems, 106

mount tool

file systems, 392

options, 416–417

Mounting file systems, 106, 392

mountsnoop tool, 409

mpstat tool

case study, 785–786

CPUs, 245, 259

description, 15

fixed counters, 134

lightweight virtualization, 633

OS virtualization, 619

mq-deadline multi-queue schedulers, 449

MR-IOV (multiroot I/O virtualization), 593–594

MRU (most recently used) caching algorithm, 36

MSG_ZEROCOPY flag, 119

msr-tools tool package, 132

MSRs (model-specific registers)

CPUs, 238

observability source, 159

mtr tool, 567

Multi-level cell (MLC) flash memory, 440

Multi-queue schedulers

description, 119

operating system disk I/O stack, 449

Multiblock allocators in ext4, 379

Multicalls in paravirtualization, 588

Multicast network transmissions, 503

Multichannel memory buses, 313

Multics (Multiplexed Information and Computer Services) operating system, 112

Multimodal distributions, 76–77

MultiPath TCP, 119

Multiple causes as performance challenge, 6

Multiple page sizes, 352–353

Multiple performance issues, 6

Multiple prefetch streams in ZFS, 381

Multiple-zone disk recording, 437

Multiplexed Information and Computer Services (Multics) operating system, 112

Multiprocess CPUs, 227–229

Multiprocessors

applications, 177–181

overview, 110

Solaris kernel support, 114

Multiqueue block I/O, 117

Multiqueue I/O schedulers, 119

Multiroot I/O virtualization (MR-IOV), 593–594

Multitenancy in cloud computing, 580

contention in hardware virtualization, 595

contention in OS virtualization, 612–613

overview, 585–586

Multithreading

applications, 177–181

CPUs, 227–229

SMT, 225

Mutex (MUTually EXclusive) locks

applications, 179–180

contention, 198

tracing, 212–213

USE method, 52

MySQL database

bpftrace tracing, 212–213

CPU flame graph, 187–188

CPU profiling, 200, 203, 269–270, 277, 283–284, 697–700

disk I/O tracing, 466–467, 470–471, 488

file tracing, 397–398, 401–402

memory allocation, 345

memory mappings, 337–338

network tracing, 552–554

Off-CPU analysis, 204–205, 275–276

Off-CPU time flame graphs, 190–192

page fault sampling, 339–341

query latency analysis, 56

scheduler latency, 272, 279–280

shards, 582

slow query log, 172

stack traces, 215

syscall tracing, 201–202

working set size, 342

mysqld_qslower tool, 756

N

Nagle algorithm for TCP congestion control, 513

Name resolution latency, 505, 528

Namespaces in OS virtualization, 606–609, 620, 623–624

NAPI (New API) framework, 522

NAS (network-attached storage), 446

Native Command Queueing (NCQ), 437

Native hypervisors, 587

Negative caching in Dcache, 375

Nested page tables (NPTs), 593

net control group, 609

net_cls control group, 610

Net I/O state in thread state analysis, 194–197

net_prio control group, 610

net tool

description, 562

socket information, 142

Netfilter conntrack as observability source, 159

Netflix cloud performance team, 2–3

netlink observability tools, 145–146, 536

netperf tool, 565–566

netsize tool, 561

netstat tool, 525, 539–542

nettxlat tool, 561

Network-attached storage (NAS), 446

Network interface cards (NICs)

description, 501–502

network connections, 109

sent and received packets, 522

Networks, 499–500

architecture. See Networks architecture

benchmark questions, 668

bpftrace for, 764–765, 807–808

buffers, 27, 507

congestion avoidance, 508

connection backlogs, 507

controllers, 501–502

encapsulation, 504

exercises, 574–575

experiments, 562–567

hardware virtualization, 597

interface negotiation, 508

interfaces, 501

latency, 505–507

local connections, 509

methodology. See Networks methodology

micro-benchmarking for, 61

models, 501–502

observability tools. See Networks observability tools

on-chip interfaces, 230

operating systems, 109

OS virtualization, 611–613, 617, 630

packet size, 504–505

protocol stacks, 502

protocols, 504

references, 575–578

round-trip time, 507, 528

routing, 503

sniffing, 159

stacks, 518–519

terminology, 500

throughput, 527–529

tuning. See Networks tuning

USE method, 49–51, 796–797

utilization, 508–509

Networks architecture

hardware, 515–517

protocols, 509–515

software, 517–524

Networks methodology

latency analysis, 528–529

micro-benchmarking, 533

overview, 524–525

packet sniffing, 530–531

performance monitoring, 529

resource controls, 532–533

static performance tuning, 531–532

TCP analysis, 531

tools method, 525

USE method, 526–527

workload characterization, 527–528

Networks observability tools

bpftrace, 550–558

ethtool, 546–547

ifconfig, 537–538

ip, 536–537

miscellaneous, 560–562

netstat, 539–542

nicstat, 545–546

nstat, 538–539

overview, 533–534

sar, 543–545

ss, 534–536

tcpdump, 558–559

tcplife, 548

tcpretrans, 549–550

tcptop, 549

Wireshark, 560

Networks tuning, 567

configuration, 574

socket options, 573

system-wide, 567–572

New API (NAPI) framework, 522

New Vegas (NV) congestion control algorithm, 118

nfsdist tool

BCC, 756

file systems, 399

nfsslower tool, 756

nfsstat tool, 561

NFU (not frequently used) caching algorithm, 36

nice command

CPU priorities, 252

resource management, 111

scheduling priorities, 295

NICs (network interface cards)

description, 501–502

network connections, 109

sent and received packets, 522

nicstat tool, 132, 525, 545–546

“A Nine Year Study of File System and Storage Benchmarking,” 643

Nitro hardware virtualization

description, 589

I/O path, 594–595

NMIs (non-maskable interrupts), 98

NO_HZ_FULL option, 117

Node taints in cloud computing, 586

Node.js

dynamic USDT, 156

event-based concurrency, 178

non-blocking I/O, 181

symbols, 214

USDT tracing, 677, 690–691

Nodes

cloud computing, 586

free lists, 317

main memory, 312

Noisy neighbors

multitenancy, 585

OS virtualization, 617

Non-blocking I/O

applications, 181

file systems, 366–367

Non-data-transfer disk commands, 432

Non-idle time, 34

Non-maskable interrupts (NMIs), 98

Non-regression testing

benchmarking for, 642

software change case study, 18

Non-uniform memory access (NUMA)

CPUs, 244

main memory, 312

memory balancing, 117

memory binding, 353

multiprocessors, 110

Non-uniform random distributions, 413

Non-Volatile Memory express (NVMe) interface, 443

Noop I/O schedulers, 448

nop tracer, 708

Normal distribution, 75

NORMAL scheduling policy, 243

Not frequently used (NFU) caching algorithm, 36

NPTs (nested page tables), 593

nsecs variable in bpftrace, 777

nsenter command, 624

nstat tool, 134, 525, 538–539

ntop function, 779

NUMA. See Non-uniform memory access (NUMA)

numactl command, 298, 353

numactl tool package, 132

numastat tool, 334–335

Number of service centers in queueing systems, 67

NV (New Vegas) congestion control algorithm, 118

nvmelatency tool, 487

O

O in Big O notation, 175–176

O(1) scheduling class, 243

Object stores in cloud computing, 584

Observability

allocators, 321

applications, 174

benchmarks, 643

counters, statistics, and metrics, 8–9

hardware virtualization, 597–605

operating systems, 111

OS virtualization. See OS virtualization observability

overview, 7–8

profiling, 10–11

RAID, 445

tracing, 11–12

volumes and pools, 383

Observability tools, 129

applications. See Applications observability tools

coverage, 130

CPUs. See CPUs observability tools

crisis, 131–133

disks. See Disks observability tools

evaluating results, 167–168

exercises, 168

file system. See File systems observability tools

fixed counters, 133–135

memory. See Memory observability tools

monitoring, 137–138

network. See Networks observability tools

profiling, 135

references, 168–169

sar, 160–166

static performance, 130–131

tracing, 136, 166

types, 133

Observability tools sources, 138–140

delay accounting, 145

hardware counters, 156–158

kprobes, 151–153

miscellaneous, 159–160

netlink, 145–146

/proc file system, 140–143

/sys file system, 143–144

tracepoints, 146–151

uprobes, 153–155

USDT, 155–156

Observation-based performance gains, 73

Observational tests in scientific method, 44–45

Observer effect in metrics, 33

Off-CPU

analysis process, 189–192

footprints, 188–189

thread state analysis, 197

time flame graphs, 205

offcputime tool

BCC, 756

description, 285

networks, 561

scheduler tracing, 190

slow disks case study, 17

stack traces, 204–205

time flame graphs, 205

Offset heat maps, 289, 489–490

offwaketime tool, 756

On-chip caches, 231

On-die caches, 231

On-disk caches, 425–426, 430, 437

Online balancing, 382

Online defragmentation, 380

OOM killer (out-of-memory killer), 316–317, 324

OOM (out of memory), defined, 304

oomkill tool

BCC, 756

description, 348

open command

description, 94

non-blocking I/O, 181

Open Container Interface, 586

openat syscalls, 404

opensnoop tool

BCC, 756

file systems, 397

perf-tools, 743

Operating systems, 89

additional reading, 127–128

caching, 108–109

clocks and idle, 99

defined, 90

device drivers, 109–110

disk I/O stack, 446–449

distributed, 123–124

exercises, 124–125

file systems, 106–108

hybrid kernels, 123

interrupts, 96–99

jitter, 99

kernels, 91–95, 111–114, 124

Linux. See Linux operating system

microkernels, 123

multiprocessors, 110

networking, 109

observability, 111

PGO kernels, 122

preemption, 110

processes, 99–102

references, 125–127

resource management, 110–111

schedulers, 105–106

stacks, 102–103

system calls, 94–95

terminology, 90–91

tunables for disks, 493–494

unikernels, 123

virtual memory, 104–105

virtualization. See OS virtualization

Operation rate

defined, 22

file systems, 387–388

Operations

applications, 172

defined, 360

file systems, 370–371

micro-benchmarking, 390

Operators for bpftrace, 776–777

OProfile system profiler, 115

oprofile tool, 285

Optimistic spinning in Mutex locks, 179

Optimizations

applications, 174

compiler, 183–184, 229

feedback-directed, 122

networks, 524

Orchestration in cloud computing, 586

Ordered mode in ext3, 378

Orlov block allocator, 379

OS instances in cloud computing, 580

OS virtualization

comparisons, 634–636

control groups, 609–610

implementation, 607–610

namespaces, 606–609

overhead, 610–613

overview, 605–607

resource controls, 613–617

OS virtualization observability

BPF tracing, 624–625

containers, 620–621

guests, 627–629

hosts, 619–627

namespaces, 623–624

overview, 617–618

resource controls, 626–627

strategy, 629–630

tracing tools, 629

traditional tools, 618–619

OS X syscall tracing, 205

OS wait time for disks, 472

OSI model, 502

Out-of-memory killer (OOM killer), 316–317, 324

Out of memory (OOM), defined, 304

Out-of-order packets, 529

Outliers

heat maps, 82

latency, 186, 424, 471–472

normal distributions, 77

Output formats in sar, 163–165

Output with solid-state drive controllers, 440

Overcommit strategy, 115

Overcommitted main memory, 305, 308

Overflow sampling

hardware events, 683

PMCs, 157–158

Overhead

hardware virtualization, 589–595

kprobes, 153

lightweight virtualization, 632

metrics, 33

multiprocess vs. multithreading, 228

OS virtualization, 610–613

strace, 207

ticks, 99

tracepoints, 150

uprobes, 154–155

volumes and pools, 383

Overlayfs file system, 118

Overprovisioning cloud computing, 583

override function, 779

Oversize arenas, 322

P

P-caches in CPUs, 230

P-states in CPUs, 231

Pacing in networks, 524

Packages, CPUs vs. GPUs, 240

Packets

defined, 500

latency, 532–533

networks, 504

OSI model, 502

out-of-order, 529

size, 504–505

sniffing, 530–531

throttling, 522

Padding locks for hash tables, 181

Page caches

file systems, 374

memory, 315

Page faults

defined, 304

flame graphs, 340–342, 346

sampling, 339–340

Page-outs

daemons, 317

working with, 306

Page scanning, 318–319, 323, 374

Page tables, 235

Paged virtual memory, 113

Pages

defined, 304

kernel, 115

sizes, 352–353

Paging

anonymous, 305–307

demand, 307–308

file system, 306

memory, 104–105

overview, 306

PAPI (performance application programming interface), 158

Parallelism in applications, 177–181

Paravirtualization (PV), 588, 590

Paravirtualized I/O drivers, 593–595

Parity in RAID, 445

Partitions in Hyper-V, 589

Passive benchmarking, 656–657

Passive listening in three-way handshakes, 511

pathchar tool, 564

Pathologies in solid-state drives, 441

Patrol reads in RAID, 445

Pause frames in congestion avoidance, 508

pchar tool, 564

PCI pass-through in hardware virtualization, 593

PCP (Performance Co-Pilot), 138

PE (Portable Executable) format, 183

PEBS (precise event-based sampling), 158

Per-I/O latency values, 454

Per-interval I/O averages latency values, 454

Per-interval statistics with stat, 693

Per-process observability tools, 133

fixed counters, 134–135

/proc file system, 140–141

profiling, 135

tracing, 136

Percent busy metric, 33

Percentiles

description, 75

latency, 413–414

perf c2c command, 118

perf_event control group, 610

perf-stat-hist tool, 744

perf tool, 13

case study, 789–790

CPU flame graphs, 201

CPU one-liners, 267–268

CPU profiling, 200–201, 245, 268–270

description, 116

disk block devices, 465–467

disk I/O, 450, 467–468

documentation, 276

events. See perf tool events

flame graphs, 119, 270–272

hardware tracing, 276

hardware virtualization, 601–602, 604

I/O profiling, 202–203

kernel time analysis, 202

memory, 324

networks, 526, 562

one-liners for counting events, 675

one-liners for CPUs, 267–268

one-liners for disks, 467

one-liners for dynamic tracing, 677–678

one-liners for listing events, 674–675

one-liners for memory, 338–339

one-liners for profiling, 675–676

one-liners for reporting, 678–679

one-liners for static tracing, 676–677

OS virtualization, 619, 629

overview, 671–672

page fault flame graphs, 340–342

page fault sampling, 339–340

PMCs, 157, 273–274

process profiling, 271–272

profiling overview, 135

references, 703–704

scheduler latency, 272–273

software tracing, 275–276

subcommands. See perf tool subcommands

syscall tracing, 201–202

thread state analysis, 196

tools collection. See perf-tools collection

vs. trace-cmd, 738–739

tracepoint events, 684–685

tracepoints, 147, 149

tracing, 136, 166

perf tool events

hardware, 274–275, 680–683

kprobes, 685–687

overview, 679–681

software, 683–684

uprobes, 687–689

USDT probes, 690–691

perf tool subcommands

documentation, 703

ftrace, 741

miscellaneous, 702–703

overview, 672–674

record, 694–696

report, 696–698

script, 698–701

stat, 691–694

trace, 701–702

perf-tools collection

vs. BCC/BPF, 747–748

coverage, 742

documentation, 748

example, 747

multi-purpose tools, 744–745

one-liners, 745–747

overview, 741–742

single-purpose tools, 743–744

perf-tools-unstable tool package, 132

Performance and performance monitoring

applications, 172

challenges, 5–6

cloud computing, 14, 586

CPUs, 251

disks, 452

file systems, 388

memory, 326

networks, 529

OS virtualization, 620

resource analysis investments, 38

Performance application programming interface (PAPI), 158

Performance Co-Pilot (PCP), 138

Performance engineers, 2–3

Performance instrumentation counters (PICs), 156

Performance Mantras

applications, 182

list of, 61

Performance monitoring counters (PMCs), 156

case study, 788–789

challenges, 158

CPUs, 237–239, 273–274

cycle analysis, 251

documentation, 158

example, 156–157

interface, 157–158

memory, 326

Performance monitoring unit (PMU) events, 156, 680

perftrace tool, 136

Periods in OS virtualization, 615

Persistent memory, 441

Personalities in FileBench, 414

Perspectives

overview, 4–5

performance analysis, 37–38

resource analysis, 38–39

workload analysis, 39–40

Perturbations

benchmarks, 648

FlameScope, 292–293

system tests, 23

pfm-events, 681

PGO (profile-guided optimization) kernels, 122

Physical I/O

defined, 360

vs. logical, 368–370

Physical metadata in file systems, 368

Physical operations in file systems, 361

Physical resources in USE method, 795–798

PICs (performance instrumentation counters), 156

pid control group, 609

pid variable in bpftrace, 777

pids control group, 610

PIDs (process IDs)

filters, 729–730

process environment, 101

pidstat tool

CPUs, 245, 262

description, 15

disks, 464–465

OS virtualization, 619

thread state analysis, 196

Ping latency, 505–506, 528

ping tool, 562–563

Pipelines in ZFS, 381

pktgen tool, 567

Platters in magnetic rotational disks, 435–436

Plugins for monitoring software, 137

pmap tool, 135, 337–338

pmcarch tool

CPUs, 265–266

memory, 348

PMCs. See Performance monitoring counters (PMCs)

pmheld tool, 212–213

pmlock tool, 212

PMU (performance monitoring unit) events, 156, 680

Pods in cloud computing, 586

Point-in-time recommendations in methodologies, 29–30

Policies for scheduling classes, 106, 242–243

poll system call, 177

Polling applications, 177

Pooled storage

btrfs, 382

overview, 382–383

ZFS, 380

Portability of benchmarks, 643

Portable Executable (PE) format, 183

Ports

ephemeral, 531

network, 501

posix_fadvise call, 415

Power states in processors, 297

Preallocation in ext4, 379

Precise event-based sampling (PEBS), 158

Prediction step in scientific method, 44–45

Preemption

CPUs, 227

Linux kernel, 116

operating systems, 110

schedulers, 241

Solaris kernel, 114

preemptirqsoff tracer, 708

preemptoff tracer, 708

Prefetch caches, 230

Prefetch for file systems

overview, 364–365

ZFS, 381

Presentability of benchmarks, 643

Pressure stall information (PSI)

CPUs, 257–258

description, 119

disks, 464

memory, 323, 330–331

pressure tool, 142

Price/performance ratio

applications, 173

benchmarking for, 643

print function, 780

printf function, 770, 778

Priority

CPUs, 227, 252–253

OS virtualization resources, 613

schedulers, 105–106

scheduling classes, 242–243, 295

Priority inheritance scheme, 227

Priority inversion, 227

Priority pause frames in congestion avoidance, 508

Private clouds, 580

Privilege rings in kernels, 93

probe subcommand for perf, 673

probe variable in bpftrace, 778

Probes and probe events

bpftrace, 767–768, 774–775

kprobes, 685–687

perf, 685

uprobes, 687–689

USDT, 690–691

wildcards, 768–769

Problem statement

case study, 16, 783–784

determining, 44

/proc file system observability tools, 140–143

Process-context IDs (PCIDs), 119

Process IDs (PIDs)

filters, 729–730

process environment, 101

Processes

accounting, 159

creating, 100

defined, 90

environment, 101–102

life cycle, 100–101

overview, 99–100

profiling, 271–272

schedulers, 105–106

swapping, 104–105, 308–309

syscall analysis, 192

tracing, 207–208

USE method, 52

virtual address space, 319–322

Processors

binding, 181–182

defined, 90, 220

power states, 297

tuning, 299

procps tool package, 131

Products, monitoring, 79

Profile-guided optimization (PGO) kernels, 122

profile probes, 774

profile tool

applications, 203–204

BCC, 756

CPUs, 245, 277–278

profiling, 135

trace-cmd, 735

Profilers

Ftrace, 707

perf-tools for, 745

Profiling

CPUs. See CPUs profiling

I/O, 203–204, 210–212

interpretation, 249–250

kprobes, 722

methodologies, 35

observability tools, 135

overview, 10–11

perf, 675–676

uprobes, 723

Program counter threads, 100

Programming languages

bpftrace. See bpftrace tool programming

compiled, 183–184

garbage collection, 185–186

interpreted, 184–185

overview, 182–183

virtual machines, 185

Prometheus monitoring software, 138

Proofs of concept

benchmarking for, 642

testing, 3

Proportional set size (PSS) in shared memory, 310

Protection rings in kernels, 93

Protocols

HTTP/3, 515

IP, 509–510

networks, 502, 504, 509–515

QUIC, 515

TCP, 510–514

UDP, 514

ps tool

CPUs, 260–261

fixed counters, 134

memory, 335–336

OS virtualization, 619

PSI. See Pressure stall information (PSI)

PSS (proportional set size) in shared memory, 310

Pterodactyl latency heat maps, 488–489

ptime tool, 263–264

ptrace tool, 159

Public clouds, 580

PV (paravirtualization), 588, 590

Q

qdisc-fq tool, 561

QEMU (Quick Emulator)

hardware virtualization, 589

lightweight virtualization, 631

qemu-system-x86 process, 600

QLC (quad-level cell) flash memory, 440

QoS (quality of service) for networks, 532–533

QPI (Quick Path Interconnect), 236–237

Qspinlocks, 117–118

Quad-level cell (QLC) flash memory, 440

Quality of service (QoS) for networks, 532–533

Quantifying issues, 6

Quantifying performance gains, 73–74

Quarterly patterns, monitoring, 79

Question step in scientific method, 44–45

Queued spinlocks, 117–118

Queued time for disks, 472

Queueing disciplines

networks, 521

OS virtualization, 617

tuning, 571

Queues

I/O schedulers, 448–449

interrupts, 98

overview, 23–24

queueing theory, 66–69

run. See Run queues

TCP connections, 519–520

QUIC protocol, 515

Quick Emulator (QEMU)

hardware virtualization, 589

lightweight virtualization, 631

Quick Path Interconnect (QPI), 236–237

Quotas in OS virtualization, 615

R

RACK (recent acknowledgments) in TCP, 514

RAID (redundant array of independent disks) architecture, 444–445

Ramping load benchmarking, 662–664

Random-access pattern in micro-benchmarking, 390

Random change anti-method, 42–43

Random I/O

disk read example, 491–492

disks, 430–431, 436

latency profile, micro-benchmarking, 457

vs. sequential, 363–364

Rate transitions in networks, 517

Raw hardware event descriptors, 680

Raw I/O, 366, 447

Raw tracepoints, 150

RCU (read-copy update), 115

RCU-walk (read-copy-update-walk) algorithm, 375

rdma control group, 610

Re-exec method in heap growth, 320

Read-ahead in file systems, 365

Read-copy update (RCU), 115

Read-copy-update-walk (RCU-walk) algorithm, 375

Read latency profile in micro-benchmarking, 457

Read-modify-write operation in RAID, 445

read syscalls

description, 94

tracing, 404–405

Read/write ratio in disks, 431

readahead tool, 409

Reader/writer (RW) locks, 179

Real-time scheduling classes, 106, 253

Real-time systems, interrupt masking in, 98

Realism in benchmarks, 643

Reaping memory, 316, 318

Rebuilding volumes and pools, 383

Receive Flow Steering (RFS) in networks, 523

Receive Packet Steering (RPS) in networks, 523

Receive packets in NICs, 522

Receive Side Scaling (RSS) in networks, 522–523

Recent acknowledgments (RACK) in TCP, 514

Reclaimed pages, 317

Record size, defined, 360

record subcommand for perf

CPU profiling, 695–696

example, 672

options, 695

overview, 694–695

software events, 683–684

stack walking, 696

record subcommand for trace-cmd, 735

RED method, 53

Reduced instruction set computers (RISCs), 224

Redundant array of independent disks (RAID) architecture, 444–445

reg function, 779

Regression testing, 18

Remote memory, 312

Reno algorithm for TCP congestion control, 513

Repeatability of benchmarks, 643

Replay benchmarking, 654

report subcommand for perf

example, 672

overview, 696–697

STDIO, 697–698

TUI interface, 697

report subcommand for trace-cmd, 735

Reporting

perf, 678–679

sar, 163, 165

trace-cmd, 737

Request latency, 7

Request rate in RED method, 53

Request time in I/O, 427

Requests in workload analysis, 39

Resident memory, defined, 304

Resident set size (RSS), 308

Resilvering volumes and pools, 383

Resource analysis perspectives, 4–5, 38–39

Resource controls

cloud computing, 586

CPUs, 253, 298

disks, 456, 494

hardware virtualization, 595–597

lightweight virtualization, 632

memory, 328, 353–354

networks, 532–533

operating systems, 110–111

OS virtualization, 613–617, 626–627

tuning, 571

USE method, 52

Resource isolation in cloud computing, 586

Resource limits in capacity planning, 70–71

Resource lists in USE method, 49

Resource utilization in applications, 173

Resources in USE method, 47

Response time

defined, 22

disks, 452

latency, 24

restart subcommand in trace-cmd, 735

Results in event tracing, 58

Retention policy for caching, 36

Retransmits

latency, 528

TCP, 510, 512, 529

UDP, 514

Retrospectives, 4

Return values

kprobes, 721

kretprobes, 152

uretprobes, 154

uprobes, 723

retval variable in bpftrace, 778

RFS (Receive Flow Steering) in networks, 523

Ring buffers

applications, 177

networks, 522

RISCs (reduced instruction set computers), 224

Robertson, Alastair, 761

Roles, 2–3

Root level in file systems, 106

Rostedt, Steven, 705, 711, 734, 739–740

Rotation time in magnetic rotational disks, 436

Round-trip time (RTT) in networks, 507, 528

Route tables, 537

Routers, 516–517

Routing networks, 503

RPS (Receive Packet Steering) in networks, 523

RR scheduling policy, 243

RSS (Receive Side Scaling) in networks, 522–523

RSS (resident set size), 308

RT scheduling class, 242–243

RTT (round-trip time) in networks, 507, 528

Run queues

CPUs, 222

defined, 220

latency, 222

schedulers, 105, 241

Runnability of benchmarks, 643

Runnable state in thread state analysis, 194–197

runqlat tool

CPUs, 279–280

description, 756

runqlen tool

CPUs, 280–281

description, 756

runqslower tool

CPUs, 285

description, 756

RW (reader/writer) locks, 179

S

S3 (Simple Storage Service), 585

SaaS (software as a service), 634

SACK (selective acknowledgment) algorithm, 514

SACKs (selective acknowledgments), 510

Sampling

CPU profiling, 35, 135, 187, 200–201, 247–248

distributed tracing, 199

off-CPU analysis, 189–190

page faults, 339–340

PMCs, 157–158

run queues, 242–243

Sanity checks in benchmarking, 664–665

sar (system activity reporter)

configuration, 162

coverage, 161

CPUs, 260

description, 15

disks, 463–464

documentation, 165–166

file systems, 393–394

fixed counters, 134

live reporting, 165

memory, 331–333

monitoring, 137, 161–165

networks, 543–545

options, 801–802

OS virtualization, 619

output formats, 163–165

overview, 160

reporting, 163

thread state analysis, 196

SAS (Serial Attached SCSI) disk interface, 442

SATA (Serial ATA) disk interface, 442

Saturation

applications, 193

CPUs, 226–227, 245–246, 251, 795, 797

defined, 22

disk controllers, 451

disk devices, 434, 451

flame graphs, 291

I/O, 798

kernels, 798

memory, 309, 324–326, 796–797

methodologies, 34–35

networks, 526–527, 796–797

resource analysis, 38

storage, 797

task capacity, 799

USE method, 47–48, 51–53

user mutex, 799

Saturation points in scalability, 31

Scalability and scaling

Amdahl’s Law of Scalability, 64–65

capacity planning, 72–73

cloud computing, 581–584

CPU, 522–523

CPUs vs. GPUs, 240

disks, 457–458

methodologies, 31–32

models, 63–64

multithreading, 227

Universal Scalability Law, 65–66

Scalability ceiling, 64

Scalable Vector Graphics (SVG) files, 164

Scaling governors, 297

Scanning pages, 318–319, 323, 374

Scatter plots

disk I/O, 81–82

I/O latency, 488

sched command, 141

SCHED_DEADLINE policy, 117

sched subcommand for perf, 272–273, 673, 702

schedstat tool, 141–142

Scheduler latency

CPUs, 226, 272–273

delay accounting, 145

run queues, 222

Scheduler tracing off-CPU analysis, 189–190

Schedulers

CPUs, 241–242

defined, 220

hardware virtualization, 596–597

kernel, 105–106

multiqueue I/O, 119

options, 295–296

OS disk I/O stack, 448–449

scheduling internals, 284–285

Scheduling classes

CPUs, 115, 242–243

I/O, 115, 493

kernel, 106

priority, 295

Scheduling in Kubernetes, 586

Scientific method, 44–46

Scratch variables in bpftrace, 770–771

scread tool, 409

script subcommand

flame graphs, 700

overview, 698–700

trace scripts, 700–701

script subcommand for perf, 673

Scrubbing file systems, 376

SCSI (Small Computer System Interface)

disks, 442

event logging, 486

scsilatency tool, 487

scsiresult tool, 487

SDT events, 681

Second-level caches in file systems, 362

Sectors in disks

defined, 424

size, 437

zoning, 437

Security boot options, 298–299

SEDA (staged event-driven architecture), 178

SEDF (simple earliest deadline first) schedulers, 595

Seek time in magnetic rotational disks, 436

seeksize tool, 487

seekwatcher tool, 487

Segments

defined, 304

OSI model, 502

process virtual address space, 319

segmentation offload, 520–521

Selective acknowledgment (SACK) algorithm, 514

Selective acknowledgments (SACKs), 510

Self-Monitoring, Analysis and Reporting Technology (SMART) data, 485

self tool, 142

Semaphores for applications, 179

Send packets in NICs, 522

sendfile command, 181

Sequential I/O

disks, 430–431, 436

vs. random, 363–364

Serial ATA (SATA) disk interface, 442

Serial Attached SCSI (SAS) disk interface, 442

Server instances in cloud computing, 580

Service consoles in hardware virtualization, 589

Service thread pools for applications, 178

Service time

defined, 22

I/O, 427–429

queueing systems, 67–69

Set associative caches, 234

set_ftrace_filter file, 710

Shadow page tables, 593

Shadow statistics, 694

Shards

capacity planning, 73

cloud computing, 582

Shared memory, 310

Shared system buses, 312

Shares in OS virtualization, 614–615, 626

Shell scripting, 184

Shingled Magnetic Recording (SMR) drives, 439

shmsnoop tool, 348

Short-lived processes, 12, 207–208

Short-stroking in magnetic rotational disks, 437

showboost tool, 245, 265

signal function, 779

Signal tracing, 209–210

Simple disk model, 425

Simple earliest deadline first (SEDF) schedulers, 595

Simple Network Management Protocol (SNMP), 55, 137

Simple Storage Service (S3), 585

Simulation benchmarking, 653–654

Simultaneous multithreading (SMT), 220, 225

Single-level cell (SLC) flash memory, 440

Single root I/O virtualization (SR-IOV), 593

Site reliability engineers (SREs), 4

Size

blocks, 27, 360, 375, 378

cloud computing, 583–584

disk I/O, 432, 480–481

disk sectors, 437

free lists, 317

I/O, 176, 390

instruction, 224

multiple page, 352–353

packets, 504–505

virtual memory, 308

word, 229, 310

working set. See Working set size (WSS)

sizeof function, 779

skbdrop tool, 561

skblife tool, 561

Slab

allocator, 114

process virtual address space, 321–322

slabinfo tool, 142

slabtop tool, 333–334, 394–395

SLC (single-level cell) flash memory, 440

Sleeping state in thread state analysis, 194–197

Sliding windows in TCP, 510

SLOG log in ZFS, 381

Sloth disks, 438

Slow disks case study, 16–18

Slow-start in TCP, 510

Slowpath state in Mutex locks, 179

SLUB allocator, 116, 322

Small Computer System Interface (SCSI)

disks, 442

event logging, 486

smaps tool, 141

SMART (Self-Monitoring, Analysis and Reporting Technology) data, 485

smartctl tool, 484–486

SMP (symmetric multiprocessing), 110

smpcalls tool, 285

SMR (Shingled Magnetic Recording) drives, 439

SMs (streaming multiprocessors), 240

SMT (simultaneous multithreading), 220, 225

Snapshots

btrfs, 382

ZFS, 381

Sniffing packets, 530–531

SNMP (Simple Network Management Protocol), 55, 137

SO_BUSY_POLL socket option, 522

SO_REUSEPORT socket option, 117

SO_TIMESTAMP socket option, 529

SO_TIMESTAMPING socket option, 529

so1stbyte tool, 561

soaccept tool, 561

socketio tool, 561

socketio.bt tool, 553–554

Sockets

BSD, 113

defined, 500

description, 109

local connections, 509

options, 573

statistics, 534–536

tracing, 552–555

tuning, 569

socksize tool, 561

sockstat tool, 561

soconnect tool, 561

soconnlat tool, 561

sofamily tool, 561

Soft interrupts, 281–282

softirqs tool, 281–282

Software

memory, 315–322

networks, 517–524

Software as a service (SaaS), 634

Software change case study, 18–19

Software events

case study, 789–790

observability source, 159

perf, 680, 683–684

recording and tracing, 275–276

software probes, 774

Software resources

capacity planning, 70

USE method, 52, 798–799

Solaris

kernel, 114

Kstat, 160

Slab allocator, 322, 652

syscall tracing, 205

top tool Solaris mode, 262

zones, 606, 620

Solid-state disks (SSDs)

cache devices, 117

overview, 439–441

soprotocol tool, 561

sormem tool, 561

Source code for applications, 172

SPEC (Standard Performance Evaluation Corporation) benchmarks, 655–656

Special file systems, 371

Speedup with latency, 7

Spin locks

applications, 179

contention, 198

queued, 118

splice call, 116

SPs (streaming processors), 240

SR-IOV (single root I/O virtualization), 593

SREs (site reliability engineers), 4

ss tool, 145–146, 525, 534–536

SSDs (solid-state disks)

cache devices, 117

overview, 439–441

Stack helpers, 214

Stack traces

description, 102

displaying, 204–205

keys, 730–731

Stack walking, 102, 696

stackcount tool, 757–758

Stacks

I/O, 107–108, 372

JIT symbols, 214

missing, 215–216

network, 109, 518–519

operating system disk I/O, 446–449

overview, 102

process virtual address space, 319

protocol, 502

reading, 102–103

user and kernel, 103

Staged event-driven architecture (SEDA), 178

Stall cycles in CPUs, 223

Standard deviation, 75

Standard Performance Evaluation Corporation (SPEC) benchmarks, 655–656

Starovoitov, Alexei, 121

start subcommand in trace-cmd, 735

Starvation in deadline I/O schedulers, 448

stat subcommand in perf

description, 635

event filters, 693–694

interval statistics, 693

options, 692–693

overview, 691–692

per-CPU balance, 693

shadow statistics, 694

stat subcommand in trace-cmd, 735

stat tool, 95, 141–142

Stateful workload simulation, 654

Stateless workload simulation, 653

Statelessness of UDP, 514

States

TCP, 511–512

thread state analysis, 193–197

Static instrumentation

overview, 11–12

perf events, 681

tracepoints, 146, 717

Static performance tuning

applications methodology, 198–199

CPUs, 252

disks, 455–456

file systems, 389

memory, 327–328

methodologies, 59–60

networks, 531–532

tools, 130–131

Static priority of threads, 242–243

Static probes, 116

Static tracing in perf, 676–677

Statistical analysis in benchmarking, 665–666

Statistics, 8–9

averages, 74–75

baseline, 59

case study, 784–786

coefficient of variation, 76

line charts, 80–81

multimodal distributions, 76–77

outliers, 77

quantifying performance gains, 73–74

standard deviation, percentiles, and median, 75

statm tool, 141

stats function, 780

statsnoop tool, 409

status tool, 141

STDIO report option, 697–698

stop subcommand in trace-cmd, 735

Storage

benchmark questions, 668

cloud computing, 584–585

disks. See Disks

sample processing, 248–249

USE method, 49–51, 796–797

Storage array caches, 430

Storage arrays, 446

str function, 770, 778

strace tool

bonnie++ tool, 660

file system latency, 395

format strings, 149–150

limitations, 202

networks, 561

overhead, 207

system call tracing, 205–207

tracing, 136

stream subcommand in trace-cmd, 735

Streaming multiprocessors (SMs), 240

Streaming processors (SPs), 240

Streaming workloads in disks, 430–431

Streetlight effect, 42

Stress testing in software change case study, 18

Stripe width of volumes and pools, 383

Striped allocation in XFS, 380

Stripes in RAID, 444–445

strncmp function, 778

Stub domains in hardware virtualization, 596

Subjectivity, 5

Subsecond-offset heat maps, 289

sum function in bpftrace, 780

Summary-since-boot values monitoring, 79

Super-serial model, 65–66

Superblocks in VFS, 373

superping tool, 561

Superscalar architectures for CPUs, 224

Surface plots, 84–85

SUT (system under test) models, 23

SVG (Scalable Vector Graphics) files, 164

Swap areas, defined, 304

Swap capacity in OS virtualization, 613, 616

swapin tool, 348

swapon tool

disks, 487

memory, 331

Swapping

defined, 304

memory, 316, 323

overview, 305–307

processes, 104–105, 308–309

Swapping state

delay accounting, 145

thread state analysis, 194–197

Switches in networks, 516–517

Symbol churn, 214

Symbols, missing, 214

Symmetric multiprocessing (SMP), 110

SYN backlogs, 519

SYN cookies, 511, 520

Synchronization primitives for applications, 179

Synchronous disk I/O, 434–435

Synchronous interrupts, 97

Synchronous writes, 366

syncsnoop tool

BCC, 756

file systems, 409

Synthetic events in hist triggers, 731–733

/sys file system, 143–144

/sys/fs options, 417–418

SysBench system benchmark, 294

syscount tool

BCC, 756

CPUs, 285

file systems, 409

perf-tools, 744

system calls count, 208–209

sysctl tool

congestion control, 570

network tuning, 567–568

schedulers, 296

SCSI logging, 486

sysstat tool package, 131

System activity reporter. See sar (system activity reporter)

System calls

analysis, 192

connect latency, 528

counting, 208–209

defined, 90

file system latency, 385

kernel, 92, 94–95

micro-benchmarking for, 61

observability source, 159

send/receive latency, 528

tracing in bpftrace, 403–405

tracing in perf, 201–202

tracing in strace, 205–207

System design, benchmarking for, 642

system function in bpftrace, 770, 779

System statistics, monitoring, 138

System under test (SUT) models, 23

System-wide CPU profiling, 268–270

System-wide observability tools, 133

fixed counters, 134

/proc file system, 141–142

profiling, 135

tracing, 136

System-wide tunable parameters

byte queue limits, 571

device backlog, 569

ECN, 570

networks, 567–572

production example, 568

queueing disciplines, 571

resource controls, 571

sockets and TCP buffers, 569

TCP backlog, 569

TCP congestion control, 570

Tuned Project, 572

systemd-analyze command, 120

systemd service manager, 120

Systems performance overview, 1–2

activities, 3–4

cascading failures, 5

case studies, 16–19

cloud computing, 14

complexity, 5

counters, statistics, and metrics, 8–9

experiments, 13–14

latency, 6–7

methodologies, 15–16

multiple performance issues, 6

observability, 7–13

performance challenges, 5–6

perspectives, 4–5

references, 19–20

roles, 2–3

SystemTap tool, 166

T

Tagged Command Queueing (TCQ), 437

Tahoe algorithm for TCP congestion control, 513

Tail-based sampling in distributed tracing, 199

Tail Loss Probe (TLP), 117, 512

Task capacity in USE method, 799

task tool, 141

Tasklets with interrupts, 98

Tasks

defined, 90

idle, 99

taskset command, 297

tc tool, 566

tcpdump tool, 136

TCMalloc allocator, 322

TCP. See Transmission Control Protocol (TCP)

TCP Fast Open (TFO), 117

TCP/IP stack

BSD, 113

kernels, 109

protocol, 502

stack bypassing, 509

TCP segmentation offload (TSO), 521

TCP Small Queues (TSQ), 524

TCP Tail Loss Probe (TLP), 117

TCP TIME_WAIT latency, 528

tcpaccept tool, 561

tcpconnect tool, 561

tcpdump tool

BPF for, 12

description, 526

event tracing, 57–58

overview, 558–559

packet sniffing, 530–531

tcplife tool

BCC, 756

description, 525

overview, 548

tcpnagle tool, 561

tcpreplay tool, 567

tcpretrans tool

BCC, 756

overview, 549–550

perf-tools, 743

tcpsynbl.bt tool, 556–557

tcptop tool

BCC, 756

description, 526

top processes, 549

tcpwin tool, 561

TCQ (Tagged Command Queueing), 437

Temperature-aware scheduling classes, 243

Temperature sensors for CPUs, 230

Tenancy in cloud computing, 580

contention in hardware virtualization, 595

contention in OS virtualization, 612–613

overview, 585–586

Tensor processing units (TPUs), 241

Test errors in benchmarking, 646–647

Test step in scientific method, 44–45

Text user interface (TUI), 697

TFO (TCP Fast Open), 117

Theoretical maximum disk throughput, 436–437

Thermal pressure in Linux kernel, 119

THP (transparent huge pages)

Linux kernel, 116

memory, 353

Thread blocks in GPUs, 240

Thread pools in USE method, 52

Thread state analysis, 193–194

Linux, 195–197

software change case study, 19

states, 194–195

Threads

applications, 177–181

CPU time, 278–279

CPUs, 227–229

CPUs vs. GPUs, 240

defined, 90

flusher, 374

hardware, 221

idle, 99, 244

interrupts, 97–98

lightweight, 178

micro-benchmarking, 653

processes, 100

schedulers, 105–106

SMT, 225

static priority, 242–243

USE method, 52

3-wide processors, 224

3D NAND flash memory, 440

3D XPoint persistent memory, 441

Three-way handshakes in TCP, 511

Throttling

benchmarks, 661

hardware virtualization, 597

OS virtualization, 626

packets, 522

Throughput

applications, 173

defined, 22

disks, 424

file systems, 360

magnetic rotational disks, 436–437

networks, defined, 500

networks, measuring, 527–529

networks, monitoring, 529

performance metric, 32

resource analysis, 38

solid-state drives, 441

workload analysis, 40

Tickless kernels, 99, 117

Ticks, clock, 99

tid variable in bpftrace, 777

Time

averages over, 74

disk measurements, 427–429

event tracing, 58

kernel analysis, 202

Time-based patterns in monitoring, 77–78

Time-based utilization, 33–34

time control group, 609

time function in bpftrace, 778

Time scales

disks, 429–430

methodologies, 25–26

Time-series metrics, 8

Time sharing for schedulers, 241

Time slices for schedulers, 242

Time to first byte (TTFB) in networks, 506

time tool for CPUs, 263–264

TIME_WAIT latency, 528

TIME_WAIT state, 512

timechart subcommand for perf, 673

Timeline charts, 83–84

Timer-based profile sampling, 247–248

Timer-based retransmits, 512

Timerless multitasking, 117

Timers in TCP, 511–512

Timestamps

CPU counters, 230

file systems, 371

TCP, 511

tiptop tool, 348

tiptop tool package, 132

TLBs. See Translation lookaside buffers (TLBs)

tlbstat tool

CPUs, 266–267

memory, 348

TLC (tri-level cell) flash memory, 440

TLP (Tail Loss Probe), 117, 512

TLS (transport layer security), 113

Tools method

CPUs, 245

disks, 450

memory, 323–324

networks, 525

overview, 46

Top-level directories, 107

Top of file system layer, file system latency in, 385

top subcommand for perf, 673

top tool

CPUs, 245, 261–262

description, 15

file systems, 393

fixed counters, 135

hardware virtualization, 600

lightweight virtualization, 632–633

memory, 324, 336–337

OS virtualization, 619, 624

TPC (Transaction Processing Performance Council) benchmarks, 655

TPC-A benchmark, 650–651

tpoint tool, 744

TPUs (tensor processing units), 241

trace-cmd front end, 132

documentation, 740

function_graph, 739

KernelShark, 739–740

one-liners, 736–737

overview, 734

vs. perf, 738–739

subcommands overview, 734–736

trace file, 710, 713–715

trace_options file, 710

trace_pipe file, 710, 715

Trace scripts, 698, 700–701

trace_stat directory, 710

trace subcommand for perf, 673, 701–702

trace tool, 757–758

tracefs file system, 149–150

contents, 709–711

overview, 708–709

tracepoint probes, 774

Tracepoints

arguments and format string, 148–149

description, 11

documentation, 150–151

events in perf, 681, 684–685

example, 147–148

filters, 717–718

interface, 149–150

Linux kernel, 116

overhead, 150

overview, 146

triggers, 718

tracepoints tracer, 707

traceroute tool, 563–564

Tracing

BPF, 12–13

bpftrace. See bpftrace tool

case study, 790–792

distributed, 199

dynamic instrumentation, 12

events. See Event tracing

Ftrace. See Ftrace tool

locks, 212–213

observability tools, 136

OS virtualization, 620, 624–625, 629

perf, 676–678

perf-tools for, 745

schedulers, 189–190

sockets, 552–555

software, 275–276

static instrumentation, 11–12

strace, 136, 205–207

tools, 166

trace-cmd. See trace-cmd front end

virtual file system, 405–406

tracing_on file, 710

Trade-offs in methodologies, 26–27

Traffic control utility in networks, 566

Transaction costs of latency, 385–386

Transaction groups (TXGs) in ZFS, 381

Transaction Processing Performance Council (TPC) benchmarks, 655

Translation lookaside buffers (TLBs)

cache statistics, 266–267

CPUs, 232

flushing, 121

memory, 314–315

MMU, 235

shootdowns, 367

Translation storage buffers (TSBs), 235

Transmission Control Protocol (TCP)

analysis, 531

anti-bufferbloat, 117

autocorking, 117

backlog, tuning, 569

buffers, 520, 569

congestion algorithms, 115

congestion avoidance, 508

congestion control, 118, 513, 570

connection latency, 24, 506, 528

connection queues, 519–520

connection rate, 527–529

duplicate ACK detection, 512

features, 510–511

first-byte latency, 528

friends, 509

initial window, 514

Large Receive Offload, 116

lockless listener, 118

New Vegas, 118

offload in packet size, 505

out-of-order packets, 529

retransmits, 117, 512, 528–529

SACK, FACK, and RACK, 514

states and timers, 511–512

three-way handshakes, 511

tracing in bpftrace, 555–557

transfer time, 24–25

Transmit Packet Steering (XPS) in networks, 523

Transparent huge pages (THP)

Linux kernel, 116

memory, 353

Transport, defined, 424

Transport layer security (TLS), 113

Traps

defined, 90

synchronous interrupts, 97

Tri-level cell (TLC) flash memory, 440

Triggers

hist. See Hist triggers

kprobes, 721–722

tracepoints, 718

uprobes, 723

Troubleshooting, benchmarking for, 642

TSBs (translation storage buffers), 235

tshark tool, 559

TSO (TCP segmentation offload), 521

TSQ (TCP Small Queues), 524

TTFB (time to first byte) in networks, 506

TUI (text user interface), 697

Tunable parameters

disks, 494

memory, 350–351

micro-benchmarking, 390

networks, 567

operating systems, 493–495

point-in-time recommendations, 29–30

tradeoffs with, 27

tune2fs tool, 416–417

Tuned Project, 572

Tuning

benchmarking for, 642

caches, 60

CPUs. See CPUs tuning

disk caches, 456

disks, 493–495

file system caches, 389

file systems, 414–419

memory, 350–354

methodologies, 27–28

networks, 567–574

static performance. See Static performance tuning

targets, 27–28

turboboost tool, 245

turbostat tool, 264–265

TXGs (transaction groups) in ZFS, 381

Type 1 hypervisors, 587

Type 2 hypervisors, 587

U

uaddr function, 779

Ubuntu Linux distribution

crisis tools, 131–132

memory tunables, 350–351

sar configuration, 162

scheduler options, 295–296

UDP Generic Receive Offload (GRO), 119

UDP (User Datagram Protocol), 514

udpconnect tool, 561

UDS (Unix domain sockets), 509

uid variable in bpftrace, 777

UIDs (user IDs) for processes, 101

UIO (user space I/O) in kernel bypass, 523

ulimit command, 111

Ultra Path Interconnect (UPI), 236–237

UMA (uniform memory access) memory system, 311–312

UMA (universal memory allocator), 322

UMASK values in MSRs, 238–239

Unicast network transmissions, 503

UNICS (UNiplexed Information and Computing Service), 112

Unified buffer caches, 374

Uniform memory access (UMA) memory system, 311–312

Unikernels, 92, 123, 634

UNiplexed Information and Computing Service (UNICS), 112

Units of time for latency, 25

Universal memory allocator (UMA), 322

Universal Scalability Law (USL), 65–66

Unix domain sockets (UDS), 509

Unix kernels, 112

UnixBench benchmarks, 254

Unknown-unknowns, 37

Unrelated disk I/O, 368

unroll function, 776

UPI (Ultra Path Interconnect), 236–237

uprobe_events file, 710

uprobe profiler, 707

uprobe tool, 744

uprobes, 687–688

arguments, 154, 688–689, 723

bpftrace, 774

documentation, 155

event tracing, 722–723

example, 154

filters, 723

Ftrace, 708

interface and overhead, 154–155

Linux kernel, 117

overview, 153

profiling, 723

return values, 723

triggers, 723

uptime tool

case study, 784–785

CPUs, 245

description, 15

load averages, 255–257

OS virtualization, 619

PSI, 257–258

uretprobes, 154

usdt probes, 774

USDT (user-level static instrumentation events)

perf, 681

probes, 690–691

USDT (user-level statically defined tracing), 11, 155–156

USE method. See Utilization, saturation, and errors (USE) method

User address space in processes, 102

User allocation stacks, 345

user control group, 609

User Datagram Protocol (UDP), 514

User IDs (UIDs) for processes, 101

User land, 90

User-level static instrumentation events (USDT)

perf, 681

probes, 690–691

User-level statically defined tracing (USDT), 11, 155–156

User modes in kernels, 93–94

User mutex in USE method, 799

User space, defined, 90

User space I/O (UIO) in kernel bypass, 523

User stacks, 103

User state in thread state analysis, 194–197

User time in CPUs, 226

username variable in bpftrace, 777

USL (Universal Scalability Law), 65–66

ustack function in bpftrace, 779

ustack variable in bpftrace, 778

usym function, 779

util-linux tool package, 131

Utilization

applications, 173, 193

CPUs, 226, 245–246, 251, 795, 797

defined, 22

disk controllers, 451

disk devices, 451

disks, 433, 452

heat maps, 288–289, 490

I/O, 798

kernels, 798

memory, 309, 324–326, 796–797

methodologies, 33–34

networks, 508–509, 526–527, 796–797

performance metric, 32

resource analysis, 38

storage, 796–797

task capacity, 799

USE method, 47–48, 51–53

user mutex, 799

Utilization, saturation, and errors (USE) method

applications, 193

benchmarking, 661

CPUs, 245–246

disks, 450–451

functional block diagrams, 49–50

memory, 324–325

metrics, 48–51

microservices, 53

networks, 526–527

overview, 47

physical resources, 795–798

procedure, 47–48

references, 799

resource controls, 52

resource lists, 49

slow disks case study, 17

software resources, 52, 798–799

uts control group, 609

V

V-NAND (vertical NAND) flash memory, 440

valgrind tool

CPUs, 286

memory, 348

Variable block sizes in file systems, 375

Variables in bpftrace, 770–771, 777–778

Variance

benchmarks, 647

description, 75

FlameScope, 292–293

Variation, coefficient of, 76

vCPUs (virtual CPUs), 595

Verification of observability tool results, 167–168

Versions

applications, 172

kernel, 111–112

Vertical NAND (V-NAND) flash memory, 440

Vertical scaling

capacity planning, 72

cloud computing, 581

VFIO (virtual function I/O) drivers, 523

VFS. See Virtual file system (VFS)

VFS layer, file system latency analysis in, 385

vfs_read function in bpftrace, 772–773

vfs_read tool in Ftrace, 706–707

vfscount tool, 409

vfssize tool, 409

vfsstat tool, 409

Vibration in magnetic rotational disks, 438

Virtual CPUs (vCPUs), 595

Virtual disks

defined, 424

utilization, 433

Virtual file system (VFS)

defined, 360

description, 107

interface, 373

latency, 406–408

Solaris kernel, 114

tracing, 405–406

Virtual function I/O (VFIO) drivers, 523

Virtual machine managers (VMMs)

cloud computing, 580

hardware virtualization, 587–605

Virtual machines (VMs)

cloud computing, 580

hardware virtualization, 587–605

programming languages, 185

Virtual memory

BSD kernel, 113

defined, 90, 304

managing, 104–105

overview, 305

size, 308

Virtual processors, 220

Virtual-to-guest physical translation, 593

Virtualization

hardware. See Hardware virtualization

OS. See OS virtualization

Visual identification of models, 62–64

Visualizations, 79

blktrace, 479

CPUs, 288–293

disks, 487–490

file systems, 410–411

flame graphs. See Flame graphs

heat maps. See Heat maps

line charts, 80–81

scatter plots, 81–82

surface plots, 84–85

timeline charts, 83–84

tools, 85

VMMs (virtual machine managers)

cloud computing, 580

hardware virtualization, 587–588

VMs (virtual machines)

cloud computing, 580

hardware virtualization, 587–588

programming languages, 185

vmscan tool, 348

vmstat tool, 8

CPUs, 245, 258

description, 15

disks, 487

file systems, 393

fixed counters, 134

hardware virtualization, 604

memory, 323, 329–330

OS virtualization, 619

thread state analysis, 196

VMware ESX, 589

Volume managers, 360

Volumes

defined, 360

file systems, 382–383

Voluntary kernel preemption, 110, 116

W

W-caches in CPUs, 230

Wait time

disks, 434

I/O, 427

off-CPU analysis, 191–192

wakeup tracer, 708

wakeup_rt tracer, 708

wakeuptime tool, 756

Warm caches, 37

Warmth of caches, 37

watchpoint probes, 774

Waterfall charts, 83–84

Wear leveling in solid-state drives, 441

Weekly patterns, monitoring, 79

Whetstone benchmark, 254, 653

Whys in drill-down analysis, 56

Width

flame graphs, 290–291

instruction, 224

Wildcards for probes, 768–769

Windows

DiskMon, 493

fibers, 178

hybrid kernel, 92

Hyper-V, 589

LTO and PGO, 122

microkernel, 123

portable executable format, 183

ProcMon, 207

syscall tracing, 205

TIME_WAIT, 512

word size, 310

Wireframe models, 84–85

Wireshark tool, 560

Word size

CPUs, 229

memory, 310

Work queues with interrupts, 98

Working set size (WSS)

benchmarking, 664

memory, 310, 328, 342–343

micro-benchmarking, 390–391, 653

Workload analysis perspectives, 4–5, 39–40

Workload characterization

benchmarking, 662

CPUs, 246–247

disks, 452–454

file systems, 386–388

methodologies, 54

networks, 527–528

workload analysis, 39

Workload separation in file systems, 389

Workloads, defined, 22

Write amplification in solid-state drives, 440

Write-back caches

file systems, 365

on-disk, 425

virtual disks, 433

write system calls, 94

Write-through caches, 425

Write type, micro-benchmarking for, 390

writeback tool, 409

Writes starving reads, 448

writesync tool, 409

wss tool, 342–343

WSS (working set size)

benchmarking, 664

memory, 310, 328, 342–343

micro-benchmarking, 390–391, 653

X

Y

Z

zero function, 780

ZFS file system

features, 380–381

options, 418–419

pool statistics, 410

Solaris kernel, 114

zfsdist tool

BCC, 757

file systems, 399

zfsslower tool, 757

ZIO pipeline in ZFS, 381

zoneinfo tool, 142

Zones

free lists, 317

magnetic rotational disks, 437

OS virtualization, 606, 620

Solaris kernel, 114

zpool tool, 410


Linux/UNIX commands for assessing system performance include:

  • uptime: how long the system has been running, plus load averages
  • top: an overall system view
  • vmstat: reports information about runnable or blocked processes, memory, paging, block I/O, traps, and CPU
  • htop: interactive process viewer
  • dstat, atop: help correlate resource data for processes, memory, paging, block I/O, traps, and CPU activity
  • iftop: interactive network traffic viewer, per interface
  • nethogs: interactive network traffic viewer, per process
  • iotop: interactive I/O viewer
  • iostat: storage I/O statistics
  • netstat: network statistics
  • mpstat: CPU statistics
  • tload: load average graph for the terminal
  • xload: load average graph for X
  • /proc/loadavg: text file containing the load averages
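
The last item in the list above can be read directly, without running a tool. The following is a minimal sketch, assuming the usual Linux layout of /proc/loadavg: three load averages (1, 5, and 15 minutes), a runnable/total count of scheduling entities, and the most recently assigned PID. The names `LoadAvg`, `parse_loadavg`, and `read_loadavg` are hypothetical helpers chosen for illustration; they are not taken from the book or from any library.

```python
from pathlib import Path
from typing import NamedTuple


class LoadAvg(NamedTuple):
    """Fields of one /proc/loadavg line (hypothetical container type)."""
    one_min: float
    five_min: float
    fifteen_min: float
    runnable: int   # currently runnable scheduling entities
    total: int      # total scheduling entities on the system
    last_pid: int   # PID most recently assigned by the kernel


def parse_loadavg(text: str) -> LoadAvg:
    """Parse a line such as '0.20 0.18 0.12 1/80 11206'."""
    one, five, fifteen, entities, last_pid = text.split()
    runnable, total = entities.split("/")
    return LoadAvg(float(one), float(five), float(fifteen),
                   int(runnable), int(total), int(last_pid))


def read_loadavg(path: str = "/proc/loadavg") -> LoadAvg:
    """Read and parse the load averages; assumes a Linux procfs is mounted."""
    return parse_loadavg(Path(path).read_text())
```

On a Linux system, `read_loadavg().one_min` should correspond to the first of the three load averages that uptime prints. Keeping the parser separate from the file read lets it be exercised with a sample string on systems that have no /proc.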



© 1994 - 2024 Cloud Monk Losang Jinpa or Fair Use.



systems_performance_2nd_edition_by_brendan_gregg_index.txt · Last modified: 2024/04/28 03:36 (external edit)