art_of_concurrency_index

The Art of Concurrency - Index

A

abundancy of a number, Applying MapReduce, Applying MapReduce

calculation of, Applying MapReduce

adjacency matrix, Graph Algorithms, Glossary

algorithms with state, Algorithms with State

all-pairs shortest path algorithm, All-Pairs Shortest Path, Alternatives to Floyd's Algorithm, All-Pairs Shortest Path, Scalability, All-Pairs Shortest Path, All-Pairs Shortest Path, What About the Data Race on the kth Row?, Design Factor Scorecard, Alternatives to Floyd's Algorithm

Dijkstra's Algorithm, Alternatives to Floyd's Algorithm

Floyd's Algorithm, All-Pairs Shortest Path, Scalability, All-Pairs Shortest Path, All-Pairs Shortest Path, What About the Data Race on the kth Row?, Design Factor Scorecard

concurrent implementation using TBB, All-Pairs Shortest Path

data race in, What About the Data Race on the kth Row?

design factor scorecard, Design Factor Scorecard

serial implementation, All-Pairs Shortest Path

Amdahl's Law, Amdahl's Law, Amdahl's Law

criticisms of, Amdahl's Law

analysis, identifying possible concurrency, Step 1. Analysis: Identify Possible Concurrency

array packing, The ArrayPack() function, The ArrayPack() function

ArrayPack( ) function (example), The ArrayPack() function

with prefix scan, The ArrayPack() function

arrays, Data Decomposition, How should you divide the data into chunks?, Recurrences, Parallel Sum and Prefix Scan, Handcoded Reduction, Bubblesort

decomposition examples, How should you divide the data into chunks?

decomposition of, Data Decomposition

recurrence relation on array access, Recurrences

sorting, Bubblesort

summing elements, Parallel Sum and Prefix Scan (see parallel sum algorithms)

summing elements using reduction code, Handcoded Reduction

asynchronous, Glossary

atomic, Glossary

atomic construct, Trying to Push the Concurrency Higher

atomic construct (OpenMP), OpenMP

atomic statements, Verification of Parallel Algorithms, Verification of Parallel Algorithms

interleavings of, from two or more threads, Verification of Parallel Algorithms

B

barrier objects, Handcoded Reduction, A Barrier Object Implementation, A Barrier Object Implementation, The Concurrent Straight Radix Sort Solution, Glossary

for concurrent straight radix sort, The Concurrent Straight Radix Sort Solution

implementation for Pthreads, A Barrier Object Implementation, A Barrier Object Implementation

benign data race, A Concurrent Code for Odd-Even Transposition Sort, Glossary

Beowulf Project, Distributed-Memory Programming

binary keys in radix sorts, Radix Sort

binary search, Binary Search, Binary Search, Scalability, Binary Search, But First, a Serial Version, At Last, the Concurrent Solution, Design Factor Scorecard

concurrent N-ary search algorithm, OpenMP implementation, At Last, the Concurrent Solution

design factor scorecard for N-ary concurrent search algorithm, Design Factor Scorecard

iterative version, binary search algorithm, Binary Search

N-ary search example with four threads, Binary Search

serial version of N-ary search algorithm, But First, a Serial Version

binary tree, complete, Glossary

body class (TBB), Intel Threading Building Blocks

boss/worker algorithm, Producer/consumer, How are the tasks assigned to threads?

task scheduling, How are the tasks assigned to threads?

bottom-up threading, Rule 2: Implement Concurrency at the Highest Level Possible

breadth-first search, Breadth-First Search

Breshears's Fundamental Law of Sorting, Iterative Quicksort

Bubblesort algorithm, Bubblesort, Scalability, Bubblesort, Bubblesort, Will It Work?, Design Factor Scorecard

serial version, Bubblesort

threaded version, Bubblesort, Will It Work?, Design Factor Scorecard

design factor scorecard, Design Factor Scorecard

interleaving cases, Will It Work?

bus overload, Glossary

C

C language, Who Is This Book For?

cache, Theoretical Models

levels of, hierarchies added to memory, Theoretical Models

cache prefetching, Glossary

call graph analysis, Profiling

chunks, Glossary

circular wait condition of deadlock, Third Attempt

clusters of PCs, used to create distributed-memory parallel platform, Distributed-Memory Programming

coarse-grained, What are the tasks and how are they defined?

compiler pragma, Glossary

compilers, support of OpenMP, OpenMP

complete binary tree, Glossary

concurrency, Parallelism and Concurrency: What's the Difference?, Step 1. Analysis: Identify Possible Concurrency, Glossary

defined, Parallelism and Concurrency: What's the Difference?, Glossary

identifying code parts for threading, Step 1. Analysis: Identify Possible Concurrency

concurrent programming, Why Should You Read This Book?, Isn't Concurrent Programming Hard?, Aren't Threads Dangerous?, This Book's Approach to Concurrent Programming, Design Models for Concurrent Algorithms, Concurrent Design Models Wrap-Up, What's Not Parallel, Domain-Specific Libraries

approach of this book, This Book's Approach to Concurrent Programming

code that can't be parallel, What's Not Parallel

design factor scorecard, Concurrent Design Models Wrap-Up

domain-specific libraries, Domain-Specific Libraries

level of difficulty, Isn't Concurrent Programming Hard?

primer on, Aren't Threads Dangerous?

task decomposition design model, Design Models for Concurrent Algorithms

concurrent_queue container (TBB), Finding work for threads, Giving threads their pink slips

condition variables (Pthreads), Pthreads

conditional expression evaluation, locking, Locking a conditional expression evaluation

CONDITION_VARIABLE objects, A Barrier Object Implementation

connected components, Alternatives to Floyd's Algorithm, Glossary

connected graphs, Graph Algorithms, Minimum Spanning Tree, Glossary

spanning tree, Minimum Spanning Tree

consumer threads, Producer/consumer

containers (TBB), Intel Threading Building Blocks

CountAndMark class (example), Counting and marking elements for partitions

critical path, Profiling

critical path analysis, Thread Profiling: Standard Profile Tool (Sample Over Time), Thread Profiler

critical regions, Example: The Critical Section Problem, How many locks do we need?, Glossary

defined, Glossary

size of, locks and, How many locks do we need?

critical section problem (example), Example: The Critical Section Problem, First Attempt, Second Attempt, Third Attempt, Fourth Attempt, Dekker's Algorithm

algorithm to enforce mutual exclusion, Dekker's Algorithm, Dekker's Algorithm

algorithm to enforce mutual exclusion, first attempt, First Attempt

algorithm to enforce mutual exclusion, fourth attempt, Fourth Attempt

algorithm to enforce mutual exclusion, second attempt, Second Attempt

algorithm to enforce mutual exclusion, third attempt, Third Attempt

critical sections, Glossary

CRITICAL_SECTION objects, Windows Threads, Bubblesort

D

data decomposition, Design Models for Concurrent Algorithms, Data Decomposition, Data Decomposition, How can you ensure that the tasks for each chunk have access to all data required for updates?, Example: Game of Life on a finite grid, Glossary

defined, Glossary

Game of Life example, Example: Game of Life on a finite grid

key design considerations, Data Decomposition

updates of elements in a chunk, How can you ensure that the tasks for each chunk have access to all data required for updates?

data dependency, What are the dependencies between tasks and how can they be satisfied?

data flow parallelism, Glossary

data races, Aren't Threads Dangerous?, Step 3. Test for Correctness: Detecting and Fixing Threading Errors, A Concurrent Code for Odd-Even Transposition Sort, Glossary

benign, A Concurrent Code for Odd-Even Transposition Sort, Glossary

detecting and fixing, Step 3. Test for Correctness: Detecting and Fixing Threading Errors

dbx debugger, Thread-Aware Debugger

deadlocks, Third Attempt, Bubblesort, Glossary

defined, Glossary

livelock versus, Bubblesort

necessary conditions for, Third Attempt

debuggers, Debuggers, Thread-Aware Debugger, Thread Issue Debugger: Thread Checker

thread issue, Thread Checker debugger, Thread Issue Debugger: Thread Checker

thread-aware, Thread-Aware Debugger

declarations, local, and thread-local storage, Local declarations and thread-local storage

Dekker's Algorithm, Example: The Critical Section Problem, Dekker's Algorithm, What Did You Learn?

using to solve critical section problem, Dekker's Algorithm, What Did You Learn?

dependencies, Task Decomposition, What are the dependencies between tasks and how can they be satisfied?, What are the dependencies between tasks and how can they be satisfied?, What are the dependencies between tasks and how can they be satisfied?, Example: Game of Life on a finite grid, Recurrences, Induction Variables, Reduction, Loop-Carried Dependence, Glossary, Glossary, Glossary

code without, What are the dependencies between tasks and how can they be satisfied?

data decomposition, Game of Life example, Example: Game of Life on a finite grid

data dependency, What are the dependencies between tasks and how can they be satisfied?

defined, Glossary

loop-carried dependence, Loop-Carried Dependence

order dependency, What are the dependencies between tasks and how can they be satisfied?

removable, Recurrences, Induction Variables, Glossary

separable, Reduction, Glossary

depth-first search, Depth-First Search, Scalability, Depth-First Search, A Recursive Solution, An Iterative Solution, Not the Concurrent Solution, Yet, How many locks do we need?, Locking a conditional expression evaluation, Now for the Concurrent Solution, A little interleaving analysis, Spawning the depth-first search threads, Design Factor Scorecard

concurrent implementation, Now for the Concurrent Solution, A little interleaving analysis, Spawning the depth-first search threads

interleaving analysis, A little interleaving analysis

spawning depth-first search threads, Spawning the depth-first search threads

design factor scorecard for concurrent version, Design Factor Scorecard

example, Depth-First Search

iterative version, An Iterative Solution

recursive serial implementation, A Recursive Solution

transforming iterative version into concurrent version, Not the Concurrent Solution, Yet, How many locks do we need?, Locking a conditional expression evaluation

locking conditional expression evaluation, Locking a conditional expression evaluation

number of locks, How many locks do we need?

design factor scorecard, Design Factor Scorecard, Design Factor Scorecard, Design Factor Scorecard, Design Factor Scorecard, Design Factor Scorecard, Design Factor Scorecard, Design Factor Scorecard, Design Factor Scorecard, Design Factor Scorecard, Design Factor Scorecard, Design Factor Scorecard, Design Factor Scorecard, Design Factor Scorecard

Bubblesort algorithm, Design Factor Scorecard

for concurrent N-ary search algorithm, Design Factor Scorecard

concurrent minimum spanning tree algorithms, Design Factor Scorecard

for depth-first search concurrent version, Design Factor Scorecard

for Floyd's Algorithm concurrent version, Design Factor Scorecard

for linear search algorithm, Design Factor Scorecard

odd-even transposition sort, concurrent versions, Design Factor Scorecard

parallel sum algorithm, Design Factor Scorecard

prefix scan, Design Factor Scorecard

Quicksort algorithm, final concurrent version, Design Factor Scorecard

for radix sorts, Design Factor Scorecard

reduce operation code, Design Factor Scorecard

Shellsort algorithms, Design Factor Scorecard

design rules for multithreaded applications, Eight Simple Rules for Designing Multithreaded Applications (see multithreaded applications, design rules)

deterministic, Glossary

Dijkstra's Algorithm, Alternatives to Floyd's Algorithm

directed graphs, Graph Algorithms, Glossary

directives (OpenMP), OpenMP

discrete optimization problems, Depth-First Search

distributed radix sort algorithm, Keeping data movement stable

distributed-memory model, Glossary

distributed-memory programming, Distributed-Memory Programming, Common Features

features in common with shared-memory programming, Common Features

distributed-memory version, odd-even transposition sort, Portability

distributed-memory version, Shellsort algorithm, Portability

double-phase, concurrent implementation, odd-even transposition sort, Keeping threads awake longer without caffeine, Efficiency

design factor scorecard, Efficiency

dynamic allocation, Static/dynamic allocation of work, Glossary

dynamic graphs, Static Graphs Versus Dynamic Graphs

dynamic scheduling of tasks, How are the tasks assigned to threads?

E

edge weight, Glossary

edges, graph, Graph Algorithms, Glossary

efficiency, Efficiency, Efficiency, Glossary

prefix scan algorithm, Efficiency

efficiency (speedup), One Final Note on Speedup and Efficiency, Glossary

enchantingly parallel, What are the dependencies between tasks and how can they be satisfied?, Glossary

environment variables (OpenMP), OpenMP

errors, Isn't Concurrent Programming Hard?, Step 3. Test for Correctness: Detecting and Fixing Threading Errors

in concurrent programming, Isn't Concurrent Programming Hard?

detecting and fixing threading errors, Step 3. Test for Correctness: Detecting and Fixing Threading Errors

events (Windows), Windows Threads

exchange of data, How should you divide the data into chunks?

exclusive prefix scan, Prefix Scan, Glossary

execution, Rule 6: Never Assume a Particular Order of Execution, Glossary

order of threads, Rule 6: Never Assume a Particular Order of Execution

out-of-order, Glossary

execution stream, Glossary

explicit threading libraries, Explicit Threading, Windows Threads, Pthreads, Windows Threads

Pthreads, Pthreads

Windows Threads, Windows Threads

F

fairness property, Verification of Parallel Algorithms

false sharing, Memory effects, Glossary

fetch_and_add( ) operation (TBB), Portability

FindMedians class (example), The Concurrent Algorithm, Finding the medians of subsequences

fine-grained, What are the tasks and how are they defined?

floating-point division, Handcoded Reduction

Floyd's Algorithm, All-Pairs Shortest Path, Scalability, All-Pairs Shortest Path, All-Pairs Shortest Path, What About the Data Race on the kth Row?, Design Factor Scorecard

concurrent implementation using TBB, All-Pairs Shortest Path

data race in, What About the Data Race on the kth Row?

design factor scorecard for concurrent version, Design Factor Scorecard

serial implementation, All-Pairs Shortest Path

fork-join parallelism, Glossary

friendly numbers, Applying MapReduce, Applying MapReduce, Friendly Numbers Example Summary

solution to finding, Applying MapReduce, Friendly Numbers Example Summary

G

game trees, Depth-First Search, Static Graphs Versus Dynamic Graphs, Glossary

search algorithms for, Static Graphs Versus Dynamic Graphs

GCD (greatest common divisor), calculating, Applying MapReduce

gdb debugger, Thread-Aware Debugger

geometric decomposition, Data Decomposition

ghost cells, Glossary

gprof profiling tool, Profiling

granularity, What are the tasks and how are they defined?, How should you divide the data into chunks?, Glossary

considering in data decomposition, How should you divide the data into chunks?

defined, Glossary

graph algorithms, Graph Algorithms, Scalability, Depth-First Search, Scalability, Breadth-First Search, All-Pairs Shortest Path, Alternatives to Floyd's Algorithm, Minimum Spanning Tree, Scalability

all-pairs shortest path, All-Pairs Shortest Path, Alternatives to Floyd's Algorithm

breadth-first search, Breadth-First Search

depth-first search, Depth-First Search, Scalability

minimum spanning tree, Minimum Spanning Tree, Scalability

graphs, Static Graphs Versus Dynamic Graphs, All-Pairs Shortest Path, Glossary

defined, Glossary

map as instance, All-Pairs Shortest Path

static versus dynamic, Static Graphs Versus Dynamic Graphs

gray code, Glossary

greatest common divisor (GCD), calculating, Applying MapReduce

Gustafson-Barsis's Law, Gustafson-Barsis's Law, Glossary

H

handle of a thread, Pthreads, Windows Threads

HANDLE objects in Windows Threads, Windows Threads

Pthreads, Pthreads

hardware support for parallelism, evolution of, Review of the Evolution for Supporting Parallelism in Hardware

Heisenberg Uncertainty Principle, Isn't Concurrent Programming Hard?, Glossary

helper functions, Curtailing the Search, Concurrent Version of Prim's Algorithm, Glossary

concurrent linear search, Curtailing the Search

concurrent version of Prim's Algorithm, Concurrent Version of Prim's Algorithm

defined, Glossary

hold and wait condition of deadlock, Third Attempt

hotspots, Step 1. Analysis: Identify Possible Concurrency, Recurrences, Rule 2: Implement Concurrency at the Highest Level Possible, Profiling

defined, Step 1. Analysis: Identify Possible Concurrency

finding via profiling, Profiling

placing concurrency at highest level around, Rule 2: Implement Concurrency at the Highest Level Possible

recurrences and, Recurrences

Hyper-Threading (HT) technology, One Final Note on Speedup and Efficiency

I

idb debugger, Thread-Aware Debugger

implicit threading libraries, Implicit Threading, Intel Threading Building Blocks, OpenMP, Intel Threading Building Blocks

Intel Threading Building Blocks (TBB), Intel Threading Building Blocks

OpenMP, OpenMP

inclusive prefix scan, Prefix Scan, Glossary

indefinite postponement, What about indefinite postponement?

induction variables, concurrent execution and, Induction Variables

insertion sort algorithm, Quick Review of Insertion Sort

Intel, Four Steps of a Threading Methodology, How are the tasks assigned to threads?, One Final Note on Speedup and Efficiency, Intel Threading Building Blocks, Intel Threading Building Blocks, Domain-Specific Libraries, Domain-Specific Libraries, Thread-Aware Debugger, Thread Issue Debugger: Thread Checker, Profiling, Thread Profiling: Standard Profile Tool (Sample Over Time), Thread Profiler, Anything Else Out There?, Glossary

(see also TBB)

Hyper-Threading (HT) technology, One Final Note on Speedup and Efficiency

idb debugger, Thread-Aware Debugger

Integrated Performance Primitives (IPP), Domain-Specific Libraries

Math Kernel Library (MKL), Domain-Specific Libraries

Parallel Studio components, Anything Else Out There?

Thread Checker debugger, Thread Issue Debugger: Thread Checker

Thread Profiler, Thread Profiling: Standard Profile Tool (Sample Over Time), Thread Profiler

Threading Building Blocks (TBB), How are the tasks assigned to threads?, Intel Threading Building Blocks, Glossary

threading methodology, Four Steps of a Threading Methodology

VTune Performance Analyzer, Profiling

intercepted wait, A Barrier Object Implementation

interleavings of atomic statements from threads, Verification of Parallel Algorithms, Will It Work?, A little interleaving analysis, What About the Data Race on the kth Row?

Bubblesort algorithm, Will It Work?

concurrent depth-first search, A little interleaving analysis

Floyd's Algorithm, What About the Data Race on the kth Row?

InterlockedCompareExchange( ) function, Now for the Concurrent Solution, S[[implicity]]

InterlockedIncrement( ) function, Final Threaded Version

IPP (Integrated Performance Primitives) library, Domain-Specific Libraries

iterative searches, Binary Search, An Iterative Solution

binary search algorithm, Binary Search

depth-first search algorithm, An Iterative Solution

iterative sorts, Concurrency Within an Iterative Version, Giving threads their pink slips, Radix Exchange Sort

Quicksort algorithm, Concurrency Within an Iterative Version, Giving threads their pink slips

radix exchange sort, Radix Exchange Sort

K

Kruskal's Algorithm, Kruskal's Algorithm

L

LAPACK library, Domain-Specific Libraries

LEG class (example), Counting and marking elements for partitions

length of a path, Graph Algorithms

lifecycle of software, Four Steps of a Threading Methodology, Step 1. Analysis: Identify Possible Concurrency

analysis step, Step 1. Analysis: Identify Possible Concurrency

linear search algorithm, Unsorted Sequence, Curtailing the Search, Curtailing the Search, Curtailing the Search, Design Factor Scorecard

concurrent version, code to create threads and examine search results, Curtailing the Search

concurrent version, global declarations, Curtailing the Search

concurrent version, linear search and helper functions, Curtailing the Search

design factor scorecard for concurrent version, Design Factor Scorecard

OpenMP version, Unsorted Sequence

linear search code for unsorted data, Unsorted Sequence

linear speedup, Glossary

Linux, Thread-Aware Debugger, Profiling

profiling tool, gprof, Profiling

thread-aware debuggers, Thread-Aware Debugger

list structures, data decomposition and, Data Decomposition

lists, selection of element from, Selection, Some Design Notes

livelock, Bubblesort, Glossary

defined, Glossary

load balance, Glossary

load balancing, Data Decomposition, Implementing a Concurrent Map

data decomposition and, Data Decomposition

issues in map computations, Implementing a Concurrent Map

local variables, Glossary

locks, Rule 7: Use Thread-Local Storage Whenever Possible or Associate Locks to Specific Data, How many locks do we need?, Locking a conditional expression evaluation, Now for the Concurrent Solution

association to specific data, Rule 7: Use Thread-Local Storage Whenever Possible or Associate Locks to Specific Data

determining number in concurrent depth-first search, How many locks do we need?

locking conditional expression evaluation, Locking a conditional expression evaluation

Windows Threads, concurrent depth-first search, Now for the Concurrent Solution

loop-carried dependence, concurrent execution and, Loop-Carried Dependence

loops, OpenMP, Intel Threading Building Blocks, Handcoded Reduction, Applying MapReduce, Unsorted Sequence, At Last, the Concurrent Solution

parallelization of iterations with TBB, Intel Threading Building Blocks

static scheduling of iterations, Handcoded Reduction

worksharing construct in OpenMP, OpenMP, Applying MapReduce, Unsorted Sequence, At Last, the Concurrent Solution

concurrent N-ary search algorithm, At Last, the Concurrent Solution

exit points and, Unsorted Sequence

using in MapReduce solution, Applying MapReduce

M

map operation, MapReduce

MapReduce framework, MapReduce, MapReduce As Generic Concurrency, MapReduce, Map As a Concurrent Operation, Reduce As a Concurrent Operation, Applying MapReduce, Friendly Numbers Example Summary, Applying MapReduce, Friendly Numbers Example Summary, MapReduce As Generic Concurrency

applying, Applying MapReduce, Friendly Numbers Example Summary, Applying MapReduce, Friendly Numbers Example Summary

solution to finding friendly numbers, Applying MapReduce, Friendly Numbers Example Summary

as generic concurrency, MapReduce As Generic Concurrency

map as concurrent operation, Map As a Concurrent Operation

pseudocode example, MapReduce

reduce as concurrent operation, Reduce As a Concurrent Operation

maps, as instances of a graph, All-Pairs Shortest Path

Math Kernel Library (MKL), Domain-Specific Libraries

matrix multiplication algorithm, Alternatives to Floyd's Algorithm

median of a data set, The Serial Algorithm, The Concurrent Algorithm

finding medians of subsequences, The Concurrent Algorithm

memory, Theoretical Models, Theoretical Models, Distributed-Memory Programming, Distributed-Memory Programming, Memory effects, Communication in memory

access patterns, PRAM variations based on, Theoretical Models

communication in, Communication in memory

distributed-memory configurations, Distributed-Memory Programming

effects of sharing, Memory effects

shared-memory parallel computers, Distributed-Memory Programming

thread access to, Theoretical Models

message-passing, Glossary

minimum element, index of, Concurrent Version of Prim's Algorithm

minimum spanning tree algorithms, Minimum Spanning Tree, Scalability, Kruskal's Algorithm, Prim's Algorithm, Which Serial Algorithm Should We Start With?, Concurrent Version of Prim's Algorithm, Design Factor Scorecard

design factor scorecard for concurrent versions, Design Factor Scorecard

Kruskal's Algorithm, Kruskal's Algorithm

Prim's Algorithm, Prim's Algorithm, Which Serial Algorithm Should We Start With?, Concurrent Version of Prim's Algorithm

concurrent version, Concurrent Version of Prim's Algorithm

serial implementation, Which Serial Algorithm Should We Start With?

MKL (Math Kernel Library), Domain-Specific Libraries

modulo locks, How many locks do we need?, Locking a conditional expression evaluation, Glossary

defined, Glossary

protecting read and write of variable in conditional expression, Locking a conditional expression evaluation

monotonically decreasing amounts of work among threads, Applying MapReduce

multicore processors, Why Should You Read This Book?, Review of the Evolution for Supporting Parallelism in Hardware, Rule 3: Plan Early for Scalability to Take Advantage of Increasing Numbers of Cores

increasing numbers of cores, Rule 3: Plan Early for Scalability to Take Advantage of Increasing Numbers of Cores

processor and platform innovations leading to, Review of the Evolution for Supporting Parallelism in Hardware

multithreaded applications, design rules, Eight Simple Rules for Designing Multithreaded Applications, Rule 1: Identify Truly Independent Computations, Rule 2: Implement Concurrency at the Highest Level Possible, Rule 3: Plan Early for Scalability to Take Advantage of Increasing Numbers of Cores, Rule 4: Make Use of Thread-Safe Libraries Wherever Possible, Rule 5: Use the Right Threading Model, Rule 6: Never Assume a Particular Order of Execution, Rule 7: Use Thread-Local Storage Whenever Possible or Associate Locks to Specific Data, Rule 8: Dare to Change the Algorithm for a Better Chance of Concurrency

changing algorithm for better concurrency, Rule 8: Dare to Change the Algorithm for a Better Chance of Concurrency

identifying independent computations, Rule 1: Identify Truly Independent Computations

implementing concurrency at highest level possible, Rule 2: Implement Concurrency at the Highest Level Possible

not assuming particular order of execution, Rule 6: Never Assume a Particular Order of Execution

planning scalability for increasing numbers of cores, Rule 3: Plan Early for Scalability to Take Advantage of Increasing Numbers of Cores

using right threading model, Rule 5: Use the Right Threading Model

using thread-local storage and locks, Rule 7: Use Thread-Local Storage Whenever Possible or Associate Locks to Specific Data

using thread-safe libraries, Rule 4: Make Use of Thread-Safe Libraries Wherever Possible

multithreaded programming, Want to Go Faster? Raise Your Hands if You Want to Go Faster!

(see also concurrent programming)

mutex objects, Intel Threading Building Blocks, Pthreads, Windows Threads, Glossary

defined, Glossary

Pthread, Pthreads

TBB, Intel Threading Building Blocks

Windows Threads, Windows Threads

mutual exclusion, Mutual exclusion, Glossary

mutual exclusion condition of deadlock, Third Attempt

N

no preemption condition of deadlock, Third Attempt

nodes, graph, Graph Algorithms, Graph Algorithms, Graph Algorithms, Depth-First Search, Minimum Spanning Tree

paths, Graph Algorithms

tree graphs, Minimum Spanning Tree

visiting in depth-first search, Depth-First Search

weight of, Graph Algorithms

nondeterministic, Rule 6: Never Assume a Particular Order of Execution, Glossary

numbers, friendly, Applying MapReduce (see friendly numbers)

numerical integration code example, Example: numerical integration

O

odd-even transposition sort, Odd-Even Transposition Sort, Scalability, Odd-Even Transposition Sort, A Concurrent Code for Odd-Even Transposition Sort, Trying to Push the Concurrency Higher, Trying to Push the Concurrency Higher, Keeping threads awake longer without caffeine, Design Factor Scorecard

concurrent code for, A Concurrent Code for Odd-Even Transposition Sort

concurrent double-phase implementation, Keeping threads awake longer without caffeine

concurrent version, pushing concurrency higher, Trying to Push the Concurrency Higher

concurrent version, while loop changes, Trying to Push the Concurrency Higher

design factor scorecard for concurrent versions, Design Factor Scorecard

serial code for, Odd-Even Transposition Sort

OpenMP, What are the dependencies between tasks and how can they be satisfied?, Rule 5: Use the Right Threading Model, OpenMP, OpenMP, OpenMP, OpenMP, A More Practical Algorithm, Portability, Applying MapReduce, Friendly Numbers Example Summary, A Concurrent Code for Odd-Even Transposition Sort, A Concurrent Code for Odd-Even Transposition Sort, Trying to Push the Concurrency Higher, Concurrent Shellsort, Concurrent iterative version, Portability, Keeping data movement stable, Portability, Unsorted Sequence, At Last, the Concurrent Solution, Concurrent Version of Prim's Algorithm, Concurrent Version of Prim's Algorithm, Glossary

barrier construct, Keeping data movement stable

barriers, Portability

code example, computing pi with numerical integration, OpenMP

concurrent N-ary search algorithm, At Last, the Concurrent Solution

concurrent straight radix sorts and, Portability

defined, Glossary

definition and description of, OpenMP

linear search algorithm, Unsorted Sequence

loop worksharing construct in concurrent minimum spanning tree, Concurrent Version of Prim's Algorithm

MapReduce solution for finding friendly numbers, Applying MapReduce, Friendly Numbers Example Summary

odd-even transposition sort, concurrent version, A Concurrent Code for Odd-Even Transposition Sort, A Concurrent Code for Odd-Even Transposition Sort, Trying to Push the Concurrency Higher

pushing concurrency higher, Trying to Push the Concurrency Higher

Prim's Algorithm concurrent implementation and, Concurrent Version of Prim's Algorithm

private clause, What are the dependencies between tasks and how can they be satisfied?

Quicksort algorithm implementation, Portability

reduction clause, using in parallel sum, A More Practical Algorithm

Shellsort algorithm, concurrent version, Concurrent Shellsort

specification document, OpenMP

task concurrency in version 3.0, OpenMP

teams of threads, Concurrent iterative version

order dependency, What are the dependencies between tasks and how can they be satisfied?

order of execution, not assuming particular order, Rule 6: Never Assume a Particular Order of Execution

out-of-order execution, Glossary

overhead, Aren't Threads Dangerous?, Amdahl's Law, Glossary

of concurrent execution, Aren't Threads Dangerous?

defined, Glossary

inherent in concurrent algorithms, Amdahl's Law

P

PackingMove class (example), The ArrayPack() function

PackingScan class (example), The ArrayPack() function

parallel, Parallelism and Concurrency: What's the Difference?, Glossary

parallel algorithms, Backg[[round of Parallel Algorithms, Verification of Parallel Algorithms, There Are No Evil Threads, Just Threads Programmed for Evil, Verification of Parallel Algorithms, Intel Threading Building Blocks

generic, in TBB library, Intel Threading Building Blocks

verifying, Verification of Parallel Algorithms, There Are No Evil Threads, Just Threads Programmed for Evil, Verification of Parallel Algorithms

steps in, Verification of Parallel Algorithms

parallel construct (OpenMP), OpenMP

Parallel Random Access Machine, Theoretical Models (see PRAM)

parallel region, Glossary

Parallel Studio tool, Anything Else Out There?

parallel sum, Parallel Sum and Prefix Scan, Glossary

parallel sum algorithms, Parallel Sum, Scalability, PRAM Algorithm, A More Practical Algorithm, A More Practical Algorithm, Design Factor Scorecard, Counting and marking elements for partitions

CountAndMark class (example), Counting and marking elements for partitions

design factor scorecard, Design Factor Scorecard

PRAM algorithm, PRAM Algorithm

using OpenMP reduction clause, A More Practical Algorithm

using POSIX threads and global partial sum storage, A More Practical Algorithm

parallelism, Parallelism and Concurrency: What's the Difference?, Shared-Memory Programming Versus Distributed-Memory Programming

concurrency versus, Parallelism and Concurrency: What's the Difference?

shared-memory and distributed-memory, Shared-Memory Programming Versus Distributed-Memory Programming

parallelization, Parallelism and Concurrency: What's the Difference?

ParallelSelect( ) function (example), The Concurrent Algorithm

parallel_for algorithm (TBB), Intel Threading Building Blocks, The Concurrent Algorithm

parallel_reduce algorithm (TBB), Intel Threading Building Blocks, Concurrent Version of Prim's Algorithm

parallel_scan algorithm (TBB), Prefix Scan, The Concurrent Algorithm

using in parallel select algorithm, The Concurrent Algorithm

partitioner class (TBB), Intel Threading Building Blocks

partitioning, in Quicksort algorithm, Quicksort, Quicksort

partitions, Counting and marking elements for partitions, Radix Exchange Sort, Radix Exchange Sort

counting and marking elements for, Counting and marking elements for partitions

radix exchange sort, Radix Exchange Sort, Radix Exchange Sort

path length, Glossary

paths (in graphs), Graph Algorithms, Glossary

Patterns for Parallel Programming, Design Models for Concurrent Algorithms

perfect speedup, Glossary

performance metrics, Performance Metrics (How Am I Doing?), Speedup, Gustafson-Barsis's Law, Efficiency, One Final Note on Speedup and Efficiency

efficiency, Efficiency

speedup, Speedup, Gustafson-Barsis's Law

using multiple threads per core, One Final Note on Speedup and Efficiency

performance tools, Performance Tools, Profiling

profiling, Profiling

performance tuning, Step 4. Tune for Performance: Removing Performance Bottlenecks

PLAPACK library, Domain-Specific Libraries

POSIX threads, Local declarations and thread-local storage, Explicit Threading, Glossary

(see also Pthreads)

defined, Glossary

thread-local storage (TLS) API, Local declarations and thread-local storage

pragmas (OpenMP), OpenMP

PRAM (Parallel Random Access Machine), Theoretical Models, Theoretical Models, Glossary

variations based on memory access patterns, Theoretical Models

PRAM algorithms, PRAM Algorithm, PRAM Algorithm, A Final Thought

adaptation of, A Final Thought

for parallel sum, PRAM Algorithm

for prefix scan, PRAM Algorithm

prefix scan, Prefix Scan, Scalability, Prefix Scan, Prefix Scan, PRAM Algorithm, PRAM Algorithm, A More Practical Algorithm, Design Factor Scorecard, Efficiency, The ArrayPack() function, Using prefix scan to gather keys, Keeping data movement stable, Glossary, Glossary, Glossary

array packing with, The ArrayPack() function

defined, Glossary

design factor scorecard, Design Factor Scorecard

exclusive, Glossary

exclusive prefix scan in serial, Efficiency

gathering keys in straight radix sort, Using prefix scan to gather keys

inclusive, Glossary

inclusive and exclusive, Prefix Scan

PRAM algorithm for (code example), PRAM Algorithm

PRAM computation for, PRAM Algorithm

serial computation of integer array, Prefix Scan

using for count variables in straight radix sort, Keeping data movement stable

using Windows Threads, A More Practical Algorithm

Prim's Algorithm, Prim's Algorithm, Which Serial Algorithm Should We Start With?, Concurrent Version of Prim's Algorithm, Concurrent Version of Prim's Algorithm

concurrent version, Concurrent Version of Prim's Algorithm

concurrent version and helper function, Concurrent Version of Prim's Algorithm

serial implementation, Which Serial Algorithm Should We Start With?

processors, specifying number used by PRAM algorithm, Theoretical Models

producer/consumer algorithm, Producer/consumer, How are the tasks assigned to threads?

profiling tools, Profiling, Thread Profiling: Standard Profile Tool (Sample Over Time), Thread Profiler

thread profiling with Thread Profiler, Thread Profiling: Standard Profile Tool (Sample Over Time), Thread Profiler

Pthreads, Explicit Threading, Pthreads, A More Practical Algorithm, A Barrier Object Implementation, A Barrier Object Implementation, Portability, The Concurrent Straight Radix Sort Solution

barrier object implementation for, A Barrier Object Implementation, A Barrier Object Implementation

code example, computing pi with numerical integration, Pthreads

implementing concurrent version of Quicksort, Portability

straight radix sort, improved version, The Concurrent Straight Radix Sort Solution

using in parallel sum, A More Practical Algorithm

pthread_mutex_t objects, Portability

push( ) and pop( ) functions, An Iterative Solution

iterative depth-first search, An Iterative Solution

Q

queues, Iterative Quicksort, Finding work for threads, Finding work for threads, It's all in the queue

in concurrent breadth-first search, It's all in the queue

defined for iterative Quicksort, Iterative Quicksort

Quicksort algorithm, blocking on empty queue, Finding work for threads

using semaphores to track remaining items, Finding work for threads

Quicksort algorithm, Quicksort, Scalability, Quicksort, Quicksort, Concurrency Within Recursion, Concurrency Within an Iterative Version, Giving threads their pink slips, Final Threaded Version, Final Threaded Version, Design Factor Scorecard, Not the Concurrent Solution, Yet, Glossary

concurrency within iterative version, Concurrency Within an Iterative Version, Giving threads their pink slips

concurrency within recursion, Concurrency Within Recursion

defined, Glossary

depth-first search versus, Not the Concurrent Solution, Yet

design factor scorecard for final concurrent version, Design Factor Scorecard

final threaded version of QuickSort( ) function, Final Threaded Version

partitioning in, Quicksort

serial version, Quicksort

threaded QuickSort( ), code fragment calling, Final Threaded Version

R

race conditions, Glossary

radix sorts, Radix Sort, Scalability, Radix Exchange Sort, Radix Exchange Sort, Straight Radix Sort, Straight Radix Sort, Using prefix scan to gather keys, Keeping data movement stable, Keeping data movement stable, Reducing the number of data touches, Reducing the number of data touches, The Concurrent Straight Radix Sort Solution, Design Factor Scorecard

design factor scorecard, Design Factor Scorecard

radix exchange sort, Radix Exchange Sort, Radix Exchange Sort

serial version, Radix Exchange Sort

straight radix sort, Straight Radix Sort, Straight Radix Sort, Using prefix scan to gather keys, Keeping data movement stable, Keeping data movement stable, Reducing the number of data touches, Reducing the number of data touches, The Concurrent Straight Radix Sort Solution

concurrent algorithm, Reducing the number of data touches

concurrent solution, using Pthreads, The Concurrent Straight Radix Sort Solution

gathering keys with prefix scan, Using prefix scan to gather keys

keeping data movement stable, Keeping data movement stable

reducing number of data touches, Reducing the number of data touches

second serial version, Keeping data movement stable

serial version, Straight Radix Sort

RAM (Random Access Machine) model, Theoretical Models

range class (TBB), Intel Threading Building Blocks

readers/writer locks, Readers/writer locks, Readers/writer locks, How many locks do we need?, Glossary

defined, Glossary

depth-first search, concurrent version, How many locks do we need?

recurrences, Recurrences, Loop-Carried Dependence

concurrent execution and, Recurrences

in loop-carried dependence, Loop-Carried Dependence

recursion, Quicksort, A Recursive Solution

depth-first search algorithm, serial implementation, A Recursive Solution

in Quicksort algorithm, Quicksort

reduce operation, MapReduce, Implementing a Concurrent Map, Reduce As a Concurrent Operation, Handcoded Reduction, Design Factor Scorecard

as concurrent operation, Reduce As a Concurrent Operation

code to sum elements of an array, Handcoded Reduction

considerations during map operation design, Implementing a Concurrent Map

design factor scorecard for code, Design Factor Scorecard

reduction, Reduction, A More Practical Algorithm, MapReduce

concurrent execution and, Reduction

reduction clause (OpenMP), A More Practical Algorithm, Concurrent Version of Prim's Algorithm

redundant work, Redundant work

root node, Minimum Spanning Tree

Rubik's Cube configuration, as search graph, Breadth-First Search

S

scalability, Concurrent Design Models Wrap-Up, Rule 3: Plan Early for Scalability to Take Advantage of Increasing Numbers of Cores, Scalability, Glossary

of concurrent applications, Concurrent Design Models Wrap-Up

defined, Glossary

planning for increasing numbers of cores, Rule 3: Plan Early for Scalability to Take Advantage of Increasing Numbers of Cores

prefix scan algorithm, Scalability

scalable algorithms, Why Do I Need to Know This? What's in It for Me?

ScaLAPACK library, Domain-Specific Libraries

scaled speedup, Gustafson-Barsis's Law

schedule clause (OpenMP), Applying MapReduce, Efficiency

scientific and technical libraries for parallel computation, Domain-Specific Libraries

searching, Searching, Scalability, Unsorted Sequence, Scalability, Curtailing the Search, Curtailing the Search, Binary Search, Scalability

binary search, Binary Search, Scalability

unsorted data, Unsorted Sequence, Scalability, Curtailing the Search, Curtailing the Search

curtailing the search, Curtailing the Search, Curtailing the Search

selection, Selection, The Serial Algorithm, The Serial Algorithm, The Serial Algorithm, The Concurrent Algorithm, The Concurrent Algorithm, Finding the medians of subsequences, Counting and marking elements for partitions, The ArrayPack() function, The ArrayPack() function, The ArrayPack() function, Some Design Notes

concurrent algorithm for, The Concurrent Algorithm, The Concurrent Algorithm, Finding the medians of subsequences, Counting and marking elements for partitions, The ArrayPack() function, The ArrayPack() function, The ArrayPack() function, Some Design Notes

ArrayPack( ) function (example), The ArrayPack() function

CountAndMark and LEG classes (example), Counting and marking elements for partitions

design notes, Some Design Notes

FindMedians class (example), Finding the medians of subsequences

PackingMove class (example), The ArrayPack() function

PackingScan class (example), The ArrayPack() function

ParallelSelect code (example), The Concurrent Algorithm

serial algorithm for, The Serial Algorithm

serial code implementing selection algorithm, The Serial Algorithm

support functions for serial algorithm, The Serial Algorithm

semaphores, Finding work for threads, Giving threads their pink slips, Now for the Concurrent Solution, Glossary

defined, Glossary

using in concurrent depth-first search, Now for the Concurrent Solution

using in thread termination, Giving threads their pink slips

using to track items in queue, Finding work for threads

sequential consistency, Task Decomposition, Glossary

serial code, Isn't Concurrent Programming Hard?, What About Concurrency from Scratch?

transformed to concurrent, implementation step, What About Concurrency from Scratch?

shared memory, Glossary

shared-memory programming, Common Features, Features Unique to Shared Memory

features in common with distributed-memory programming, Common Features

features unique to, Features Unique to Shared Memory

sharing data, Sharing data

Shellsort algorithm, Shellsort, Scalability, Quick Review of Insertion Sort, Serial Shellsort, Concurrent Shellsort, Concurrent Shellsort, Design Factor Scorecard

concurrent version, using OpenMP, Concurrent Shellsort

design factor scorecard, Design Factor Scorecard

modified serial Shellsort to sort entire h-partition, Concurrent Shellsort

review of insertion sort, Quick Review of Insertion Sort

serial version, Serial Shellsort

SIMD (Single Instruction, Multiple Data stream), Glossary

simple paths, Graph Algorithms, Glossary

single construct, Trying to Push the Concurrency Higher

single pragma (OpenMP), Concurrent Version of Prim's Algorithm

SMP (symmetric multiprocessing), Review of the Evolution for Supporting Parallelism in Hardware

software lifecycle, Four Steps of a Threading Methodology (see lifecycle of software)

sorting, The Serial Algorithm, Sorting, Scalability, Bubblesort, Scalability, Odd-Even Transposition Sort, Scalability, Shellsort, Scalability, Quicksort, Scalability, Radix Sort, Scalability

Bubblesort algorithm, Bubblesort, Scalability

odd-even transposition sort, Odd-Even Transposition Sort, Scalability

Quicksort algorithm, Quicksort, Scalability

radix sorts, Radix Sort, Scalability

Shellsort algorithm, Shellsort, Scalability

sorting algorithms, stable, Straight Radix Sort

spanning tree, Minimum Spanning Tree

speedup, Speedup, Speedup, Speedup, Amdahl's Law, Gustafson-Barsis's Law, One Final Note on Speedup and Efficiency, Glossary, Glossary, Glossary, Glossary, Glossary

defined, Glossary

efficiency and, One Final Note on Speedup and Efficiency

estimating using Amdahl's Law, Amdahl's Law

example speedup curves, Speedup

Gustafson-Barsis's Law for, Gustafson-Barsis's Law

linear, Glossary

perfect, Glossary

superlinear, Speedup, Glossary

stable sorting algorithms, Straight Radix Sort

starvation, Fourth Attempt, Glossary

state, algorithms with, Algorithms with State

states in discrete optimization problems, Depth-First Search

static allocation, Static/dynamic allocation of work, Glossary

static graphs, Static Graphs Versus Dynamic Graphs

static scheduling of tasks, How are the tasks assigned to threads?

storage conflicts, Aren't Threads Dangerous?, Thread Issue Debugger: Thread Checker, Glossary

straight radix sort, Straight Radix Sort, Straight Radix Sort, Straight Radix Sort, Straight Radix Sort, Using prefix scan to gather keys, Keeping data movement stable, Keeping data movement stable, Reducing the number of data touches, Reducing the number of data touches, The Concurrent Straight Radix Sort Solution

code execution, Straight Radix Sort

concurrent algorithm, Reducing the number of data touches

concurrent version, using Pthreads, The Concurrent Straight Radix Sort Solution

keeping data movement stable, Keeping data movement stable

reducing number of data touches, Reducing the number of data touches

second serial version, Keeping data movement stable

serial version, Straight Radix Sort

using decimal digits, Straight Radix Sort

using prefix scan to gather keys, Using prefix scan to gather keys

Strassen's Algorithm, Rule 8: Dare to Change the Algorithm for a Better Chance of Concurrency

Streaming SIMD Execution (SSE), Review of the Evolution for Supporting Parallelism in Hardware, Glossary

SumByReduction( ) function (example), Handcoded Reduction

superlinear speedup, Speedup, Glossary

symmetric multiprocessing (SMP), Review of the Evolution for Supporting Parallelism in Hardware

synchronization, OpenMP, Windows Threads, Handcoded Reduction, A Barrier Object Implementation, A Barrier Object Implementation, Glossary, Glossary, Glossary

barrier object implementation for Pthreads, A Barrier Object Implementation, A Barrier Object Implementation

barrier objects, Handcoded Reduction, Glossary

defined, Glossary

in OpenMP, OpenMP

semaphores, Glossary

in Windows Threads, Windows Threads

synchronization objects, Rule 7: Use Thread-Local Storage Whenever Possible or Associate Locks to Specific Data, Glossary

locks, association to specific data, Rule 7: Use Thread-Local Storage Whenever Possible or Associate Locks to Specific Data

T

task decomposition, Task Decomposition, Glossary

taskq extensions to OpenMP, Portability

tasks, How are the tasks assigned to threads?, Intel Threading Building Blocks

assignment to threads, How are the tasks assigned to threads?

definition and scheduling by TBB library, Intel Threading Building Blocks

TBB (Threading Building Blocks), Intel Threading Building Blocks, Intel Threading Building Blocks, The Concurrent Algorithm, Finding work for threads, Portability, Portability, Portability, All-Pairs Shortest Path, Concurrent Version of Prim's Algorithm, Glossary

atomic fetch_and_add( ) operation, Portability

class to find index of minimum element via parallel_reduce, Concurrent Version of Prim's Algorithm

code example, computing pi with numerical integration, Intel Threading Building Blocks

concurrent Floyd's Algorithm implementation, All-Pairs Shortest Path

concurrent N-ary search algorithm implementation, Portability

concurrent straight radix sorts and, Portability

concurrent_queue container, Finding work for threads

defined, Glossary

using in concurrent algorithm for selection, The Concurrent Algorithm

TBB (Threading Building Blocks) (Intel), How are the tasks assigned to threads?

TBB (Threading Building Blocks) library, Prefix Scan

parallel_scan template, Prefix Scan

teams of threads, Concurrent iterative version

testing for correctness, Step 3. Test for Correctness: Detecting and Fixing Threading Errors, The testing and tuning cycle

performance tuning changes, The testing and tuning cycle

Thread Checker debugger, Thread Issue Debugger: Thread Checker

thread monkey, What Is a Thread Monkey?, Glossary

thread pools, Concurrent iterative version, Letting threads know the work is done, Finding work for threads, Giving threads their pink slips, Final Threaded Version, Glossary

defined, Glossary

finding work for threads, Finding work for threads

implementation in concurrent solution, Concurrent iterative version

letting threads know work is done, Letting threads know the work is done

terminating threads running QuickSort( ) function, Giving threads their pink slips

terminating threads with TerminateThread( ), Final Threaded Version

Thread Profiler, Thread Profiling: Standard Profile Tool (Sample Over Time), Thread Profiler

thread-local storage, Local declarations and thread-local storage (see TLS)

thread-safe, Glossary

thread-safe libraries, Rule 4: Make Use of Thread-Safe Libraries Wherever Possible

threaded programming, Who Is This Book For?

threading, Aren't Threads Dangerous?, Verification of Parallel Algorithms, There Are No Evil Threads, Just Threads Programmed for Evil, Rule 1: Identify Truly Independent Computations, Rule 2: Implement Concurrency at the Highest Level Possible, Rule 5: Use the Right Threading Model

(see also multithreaded applications, design rules)

dangers of threads, Aren't Threads Dangerous?

serial code, approaches to, Rule 2: Implement Concurrency at the Highest Level Possible

using right model, Rule 5: Use the Right Threading Model

verifying parallel algorithms, Verification of Parallel Algorithms, There Are No Evil Threads, Just Threads Programmed for Evil

threading libraries, Threading Libraries, Domain-Specific Libraries, Implicit Threading, Intel Threading Building Blocks, Explicit Threading, Windows Threads, What Else Is Out There?, Domain-Specific Libraries

domain-specific, Domain-Specific Libraries

explicit threading, Explicit Threading, Windows Threads

implicit threading, Implicit Threading, Intel Threading Building Blocks

other, What Else Is Out There?

threading methodology, Four Steps of a Threading Methodology, Step 1. Analysis: Identify Possible Concurrency, Step 2. Design and Implementation: Threading the Algorithm, Step 3. Test for Correctness: Detecting and Fixing Threading Errors, Step 4. Tune for Performance: Removing Performance Bottlenecks

analysis, Step 1. Analysis: Identify Possible Concurrency

design and implementation, Step 2. Design and Implementation: Threading the Algorithm

steps within software lifecycle, Four Steps of a Threading Methodology

testing for correctness, Step 3. Test for Correctness: Detecting and Fixing Threading Errors

tuning for performance, Step 4. Tune for Performance: Removing Performance Bottlenecks

threading tools, Threading Tools, Go Forth and Conquer, Debuggers, Performance Tools, Anything Else Out There?

debuggers, Debuggers

Intel Parallel Studio components, Anything Else Out There?

performance tools, Performance Tools

threads, Aren't Threads Dangerous?, Aren't Threads Dangerous?, Theoretical Models, What are the dependencies between tasks and how can they be satisfied?, How are the tasks assigned to threads?, Verification of Parallel Algorithms, OpenMP, OpenMP, Spawning the depth-first search threads, Glossary

assignment of tasks to, How are the tasks assigned to threads?

defined, Glossary

depth-first search, spawning in concurrent version, Spawning the depth-first search threads

interleavings of statements from, Verification of Parallel Algorithms

lockstep execution, Theoretical Models

in OpenMP, OpenMP

setting number in OpenMP, OpenMP

storage conflict or data race, Aren't Threads Dangerous?

synchronization of, Aren't Threads Dangerous?

variable accessible only to given thread, What are the dependencies between tasks and how can they be satisfied?

TLS (thread-local storage), Local declarations and thread-local storage, What are the dependencies between tasks and how can they be satisfied?, Rule 7: Use Thread-Local Storage Whenever Possible or Associate Locks to Specific Data, Glossary

defined, Glossary

local declarations and, Local declarations and thread-local storage

local work variables and, What are the dependencies between tasks and how can they be satisfied?

use in multithreaded applications, Rule 7: Use Thread-Local Storage Whenever Possible or Associate Locks to Specific Data

top-down threading, Rule 2: Implement Concurrency at the Highest Level Possible

Totalview debugger, Thread-Aware Debugger

trees, Minimum Spanning Tree

tuning for performance, Step 4. Tune for Performance: Removing Performance Bottlenecks

U

undirected graphs, Graph Algorithms, Glossary

weighted graph and associated weight matrix, Graph Algorithms

unsorted data, searching, Unsorted Sequence, Scalability, Unsorted Sequence, Unsorted Sequence, Curtailing the Search, Curtailing the Search, Design Factor Scorecard

curtailing the search, Curtailing the Search, Curtailing the Search

design factor scorecard for concurrent linear search, Design Factor Scorecard

linear search code for, Unsorted Sequence

OpenMP version of linear search algorithm, Unsorted Sequence

updates, elements within a data chunk, How can you ensure that the tasks for each chunk have access to all data required for updates?

V

variables, Local declarations and thread-local storage, What are the dependencies between tasks and how can they be satisfied?, Induction Variables, Pthreads, Glossary, Glossary

accessible only to given thread, What are the dependencies between tasks and how can they be satisfied?

condition variables in Pthreads, Pthreads

induction variables and concurrent execution, Induction Variables

local, Glossary

private, Glossary

private or local, in shared-memory programming, Local declarations and thread-local storage

vertices, Graph Algorithms, Glossary

visited array, depth-first search, A Recursive Solution, Not the Concurrent Solution, Yet

volume-to-surface ratio (in data decomposition), How should you divide the data into chunks?

VTune Performance Analyzer, Thread Issue Debugger: Thread Checker, Profiling

Thread Checker plug-in, Thread Issue Debugger: Thread Checker

W

Fair Use Sources

Concurrency: Concurrency Programming Best Practices, Concurrent Programming Fundamentals, Parallel Programming Fundamentals, Asynchronous I/O, Asynchronous programming (Async programming, Asynchronous flow control, Async / await), Asymmetric Transfer, Akka, Atomics, Busy waiting, Channels, Concurrent, Concurrent system design, Concurrency control (Concurrency control algorithms‎, Concurrency control in databases, Atomicity (programming), Distributed concurrency control, Data synchronization), Concurrency pattern, Concurrent computing, Concurrency primitives, Concurrency problems, Concurrent programming, Concurrent algorithms, Concurrent programming languages, Concurrent programming libraries‎, Java Continuations, Coroutines, Critical section, Deadlocks, Decomposition, Dining philosophers problem, Event (synchronization primitive), Exclusive or, Execution model (Parallel execution model), Fibers, Futures, Inter-process communication, Linearizability, Lock (computer science), Message passing, Monitor (synchronization), Computer multitasking (Context switch, Pre-emptive multitasking - Preemption (computing), Cooperative multitasking - Non-preemptive multitasking), Multi-threaded programming, Multi-core programming, Multi-threaded, Mutual exclusion, Mutually exclusive events, Mutex, Non-blocking algorithm (Lock-free), Parallel programming, Parallel computing, Process (computing), Process state, Producer-consumer problem (Bounded-buffer problem), Project Loom, Promises, Race conditions, Read-copy update (RCU), Readers–writer lock, Readers–writers problem, Recursive locks, Reducers, Reentrant mutex, Scheduling (computing)‎, Semaphore (programming), Seqlock (Sequence lock), Serializability, Shared resource, Sleeping barber problem, Spinlock, Synchronization (computer science), System resource, Thread (computing), Tuple space, Volatile (computer programming), Yield (multithreading) , Degree of parallelism, Data-Oriented Programming (DOP), Functional and Concurrent Programming, Concurrency bibliography, Manning Concurrency Async Parallel Programming Series, Concurrency glossary, Awesome Concurrency, Concurrency topics, Functional programming. (navbar_concurrency - see also navbar_async, navbar_python_concurrency, navbar_golang_concurrency, navbar_java_concurrency)


© 1994 - 2024 Cloud Monk Losang Jinpa or Fair Use. Disclaimers

SYI LU SENG E MU CHYWE YE. NAN. WEI LA YE. WEI LA YE. SA WA HE.


art_of_concurrency_index.txt · Last modified: 2024/04/28 03:52 by 127.0.0.1

Donate Powered by PHP Valid HTML5 Valid CSS Driven by DokuWiki