AI Glossary
Appendix F. Glossary
Acronyms
- AGI: Artificial general intelligence - Machine intelligence capable of solving a variety of problems that human brains can solve
- AI: Artificial intelligence: Machine behavior that is impressive enough to be called intelligent by scientists or corporate marketers
- ANN: Approximate nearest neighbors: A family of algorithms that finds the closest vectors to the given vector in a provided set of vectors.
- API: Application programming interface - A user interface for developers, usually a command line tool, source code library, or web interface that they can interact with programmatically
- AWS: Amazon Web Services - Amazon popularized the concept of cloud services when it exposed its internal infrastructure to the world.
- BERT: Bidirectional Encoder Representations from Transformers - A transformer-based language model introduced in 2018 that dramatically changed the NLP landscape and was a precursor to the Large Language Models of today.
- BOW: Bag of words - A data structure (usually a vector) that retains the counts (frequencies) of words but not their order (see the sketch after this list)
- CEC: Constant error carousel - A neuron that outputs its input delayed by one time step. Used within an LSTM or GRU memory unit. This is the memory register for an LSTM unit and can only be reset to a new value by the forgetting gate interrupting this “carousel.”
- CNN: Convolutional neural network - A neural network that is trained to learn filters, also known as kernels, for feature extraction in supervised learning
- CUDA: Compute Unified Device Architecture - An Nvidia software library and platform optimized for running general computations and algorithms on a GPU
- DAG: Directed acyclic graph - A network topology without any cycles, connections that loop back on themselves
- DFA: Deterministic finite automaton - A finite state machine that doesn’t make random choices. The re package in Python compiles regular expressions to create a DFA, but the third-party regex package can compile fuzzy regular expressions into an NDFA (nondeterministic finite automaton); see the sketch after this list.
- FSM: Finite-state machine - Kyle Gorman and Wikipedia can explain this better than I (https://en.wikipedia.org/wiki/Finite-state_machine).
- FST: Finite-state transducer - Like regular expressions, but they can output a new character to replace each character they matched. Kyle Gorman explains them well (https://www.openfst.org).
- GIS: Geographic information system - A database for storing, manipulating, and displaying geographic information, usually involving latitude, longitude, and altitude coordinates and traces.
- GPU: Graphical processing unit - The graphics card in a gaming rig, a cryptocurrency mining server, or a machine learning server
- GRU: Gated recurrent unit - A variation of long short-term memory networks with shared parameters to cut computation time
- HitLRL or HLRL: Human in the Loop Reinforcement Learning - An active learning approach to model training used for conversational LLMs such as InstructGPT and for large game-playing deep learning models such as AlphaGo. These models use reinforcement learning augmented with human curators in order to keep up with the evolution of language and concepts. The human labelers identify whether generated text is within the ethical and quality guidelines for the model. But unlike conventional RL, these labels are used to train a quality-scoring supervisor model that is then used to flag future bot responses for labeling.
- HNSW: Hierarchical Navigable Small World - A graph data structure that enables efficient and robust approximate nearest neighbor search (see “Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs” by Yu. A. Malkov and D. A. Yashunin, https://arxiv.org/vc/arxiv/papers/1603/1603.09320v1.pdf)
- HPC: High performance computing - The study of systems that maximize throughput, usually by parallelizing computation with separate map and reduce computation stages
- IDE: Integrated development environment - A desktop application for software development, such as PyCharm, Eclipse, Atom, or Sublime Text 3
- IR: Information retrieval - The study of document and web search engine algorithms. This is what brought NLP to the forefront of important computer science disciplines in the 90s.
- ITU: India Technical University - A top-ranking technical university. The Georgia Tech of India.
- i18n: Internationalization - Preparing an application for use in more than one country (locale)
- LDA: Linear discriminant analysis - A classification algorithm with linear boundaries between classes (see Chapter 4)
- LLM: Large language model - If you scale up a transformer-based language model to web scale, using millions of dollars in compute resources to train it on a large portion of the natural language text on the Internet, that’s a Large Language Model.
- LSA: Latent semantic analysis - SVD applied to TF-IDF or bag-of-words vectors to create topic vectors in a vector space language model
- LSH: Locality sensitive hash - A hash that works as an efficient but approximate mapping/clustering index on dense, continuous, high-dimensional vectors (see chapter 13). Think of them as ZIP Codes that work for more than just 2D (latitude and longitude).
- LSTM: Long short-term memory - An enhanced form of a recurrent neural network that maintains a memory of state that itself is trained via backpropagation (see chapter 9)
- MIH: Multi-index hashing - A hashing and indexing approach for high-dimensional dense vectors
- ML: Machine learning - Programming a machine with data rather than hand-coded algorithms
- MSE: Mean squared error - The mean of the squared differences between the desired output of a machine learning model and its actual output (see the sketch after this list)
- NELL: Never Ending Language Learning - A Carnegie Mellon knowledge extraction project that has been running continuously for years, scraping web pages and extracting general knowledge about the world (mostly “IS-A” categorical relationships between terms)
- NLG: Natural language generation - Composing text automatically, algorithmically; one of the most challenging tasks of natural language processing (NLP)
- NLP: Natural language processing - You probably know what this is by now. If not, see the introduction in chapter 1.
- NLU: Natural language understanding - Often used in recent papers to refer to natural language processing with neural networks
- NMF: Nonnegative matrix factorization - A matrix factorization similar to SVD, but constrains all elements in the matrix factors to be greater than or equal to zero
- NSF: National Science Foundation - A US government agency tasked with funding scientific research
- NYC: New York City - A US city that never sleeps
- pip: Pip installs packages - The official Python package manager that downloads and installs packages automatically from the “Cheese Shop” (pypi.python.org)
- PR: Pull request - The right way to request that someone merge your code into theirs. GitHub has some buttons and wizards to make this easy. This is how you can build your reputation as a conscientious contributor to open source.
- PCA: Principal component analysis - A technique that is used to decrease the dimensionality of data. Its application in NLP is often called LSA.
- QDA: Quadratic discriminant analysis - Similar to LDA, but allows for quadratic (curved) boundaries between classes
- RAG: Retrieval-Augmented Generation - A way to increase the accuracy and reliability of generative language models by using a retrieval model to fetch relevant data from a database or knowledge graph to serve as a base for the generation step.
- ReLU: Rectified linear unit - A piecewise linear neural net activation function that forces the output of a neuron to be nonnegative. Equivalent to y = np.maximum(x, 0). The most popular and efficient activation function for image processing and NLP, because it allows backpropagation to work efficiently on extremely deep networks without “vanishing the gradients.”
- REPL: Read–evaluate–print loop - The typical workflow of a developer of any scripting language that doesn’t need to be compiled. The ipython, jupyter console, and jupyter notebook REPLs are particularly powerful, with their help, ?, ??, and % magic commands, plus auto-complete and Ctrl-R history search.[1]
- RMSE: Root mean square error - The square root of the mean squared error. A common regression error metric. It can also be used for binary and ordinal classification problems. It provides an intuitive estimate of the 1-sigma uncertainty in a model’s predictions.
- RNN: Recurrent neural network - A neural network architecture that feeds the outputs of one layer into the input of an earlier layer. RNNs are often “unfolded” into equivalent feed forward neural networks for diagramming and analysis.
- SMO: Sequential minimal optimization - A support vector machine training approach and algorithm
- SVD: Singular value decomposition - A matrix factorization that produces a diagonal matrix of singular values and two orthogonal matrices containing singular vectors. It’s the math behind LSA and PCA (see chapter 4 and the sketch after this list).
- SVM: Support vector machine - A machine learning algorithm usually used for classification
- TF-IDF: Term frequency * inverse document frequency - A normalization of word counts that improves information retrieval results (see chapter 3 and the sketch after this list)
- UI: User interface - The “affordances” you offer your user through your software, often the graphical web pages or mobile application screens that your user must interact with to use your product or service
- UX: User experience - The nature of a customer’s interaction with your product or company, from purchase all the way through to their last contact with you. This includes your website or API UI on your website and all the other interactions with your company.
- VSM: Vector space model - A vector representation of the objects in your problem, such as words or documents in an NLP problem (see chapter 4 and chapter 6)
- YMMV: Your mileage may vary - You may not get the same results that we did.
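The bag-of-words entry above points to a sketch after this list. Here is a minimal one using only the Python standard library; the sentence is made up for illustration.

```python
from collections import Counter

# A bag of words keeps token counts (frequencies) but throws away word order.
tokens = "the quick brown fox jumps over the lazy dog the fox".split()
bow = Counter(tokens)

print(bow["the"])  # 3
print(bow["fox"])  # 2
# "fox jumps" and "jumps fox" would produce exactly the same bag of words.
```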
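The DFA entry mentions that Python’s built-in re package handles only exact patterns, while the third-party regex package also supports fuzzy matching. A minimal sketch, assuming regex is installed (pip install regex); the pattern and test string are made up.

```python
import re
import regex  # third-party package with a superset of the re syntax

# The built-in re module matches exactly or not at all.
print(bool(re.fullmatch(r"color", "colour")))                # False

# The regex module can tolerate a bounded number of edits (here, at most 1).
print(bool(regex.fullmatch(r"(?:color){e<=1}", "colour")))   # True
```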
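The MSE and RMSE entries reduce to two lines of numpy. A minimal sketch with made-up targets and predictions:

```python
import numpy as np

y_true = np.array([3.0, 5.0, 2.5, 7.0])  # desired outputs (made up)
y_pred = np.array([2.5, 5.0, 4.0, 8.0])  # model outputs (made up)

mse = np.mean((y_true - y_pred) ** 2)    # mean squared error
rmse = np.sqrt(mse)                      # root mean square error
print(mse, rmse)
```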
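The SVD and LSA entries fit together in a few lines of numpy: factor a term-document matrix and keep the top singular vectors as topic dimensions. The counts below are made up, and real LSA would usually start from TF-IDF vectors (see the next sketch) rather than raw counts.

```python
import numpy as np

# Rows are terms, columns are documents (made-up counts).
term_doc = np.array([
    [2, 0, 1, 0],
    [0, 3, 0, 1],
    [1, 1, 0, 2],
    [0, 0, 2, 2],
], dtype=float)

U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
k = 2                                        # keep the top 2 "topics"
doc_topics = (np.diag(s[:k]) @ Vt[:k, :]).T  # each document as a 2-D topic vector
print(doc_topics.shape)                      # (4, 2)
```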
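The TF-IDF entry also points here. One way to compute the weighting is with scikit-learn’s TfidfVectorizer; this sketch assumes a recent scikit-learn is installed, and the toy documents are made up.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs can be pets",
]
vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(docs)      # sparse matrix: one row per document

print(tfidf.shape)                          # (3, vocabulary size)
print(vectorizer.get_feature_names_out())   # the vocabulary behind the columns
```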
Terms
- Affordance: A way for your user to interact with your product that you intentionally make possible. Ideally that interaction should come naturally to the user, be easily discoverable, and be self-documenting.
- Artificial neural network: A computational graph for machine learning or simulation of a biological neural network (brain)
- Cell: The memory or state part of an LSTM unit that records a single scalar value and outputs it continuously.[2]
- Dark patterns: Software patterns (usually for a user interface) that are intended to increase revenue but often fail due to “blowback” because they manipulate your customers into using your product in ways that they don’t intend
- Feed-forward network: A “one-way” neural network that passes all its inputs through to its outputs in a consistent direction, forming a directed acyclic graph (DAG) or tree
- Grounding: A method for improving the accuracy of large language models and reducing hallucinations by making the model base its answers on data retrieved from a document database.
- Guardrails: Ways of controlling the output of a large language model, such as enforcing a response format or preventing the model from discussing certain topics.
- Hallucinations: A common problem with generative language models, where the model generates text that seems plausible but is actually not true or accurate.
- Intent: A category of user intentions that a conversational system is designed to recognize and respond to.
- Morpheme: A part of a token or word that contains meaning in and of itself. The morphemes that make up a token are collectively called the token’s morphology. The morphology of a token can be found using algorithms in packages like spaCy that process the token with its context (the words around it).[3]
- Net, network, or neural net: Artificial neural network
- Neuron: A unit in a neural net whose function (such as y = tanh(w.dot(x))) takes multiple inputs and outputs a single scalar value. This value is usually the neuron’s weights (w or w_i) multiplied by all the input signals (x or x_i), summed with a bias weight (w_0), and passed through an activation function like tanh (see the sketch after this list). A neuron always outputs a scalar value, which is sent to the inputs of any additional hidden or output neurons in the network. If a neuron implements a much more complicated activation function than that, like the enhancements that were made to recurrent neurons to create an LSTM, it is usually called a unit, for example, an LSTM unit.
- Nessvector: An informal term for topic vectors or semantic vectors that capture concepts or qualities, such as femaleness or blueness, into the dimensions of a vector
- Predicate: In English grammar, the predicate is the main verb of a sentence that’s associated with the subject. Every complete sentence must have a predicate, just like it must also have a subject.
- Skip-grams: Pairs of tokens used as training examples for a word vector embedding, where any number of intervening words are ignored (see chapter 6 and the sketch after this list).
- Softmax: Normalized exponential function used to squash the real-valued vector output by a neural network so that its values range between 0 and 1 and sum to 1, like probabilities (see the sketch after this list).
- Subject: The main noun of a sentence: every complete sentence must have a subject (and a predicate) even if the subject is implied, like in the sentence “Run!” where the implied subject is “you.”
- Transformers: A type of artificial neural network that uses a mechanism called attention. Large transformers trained on internet-sized datasets are often called Large Language Models.
- Unit: Neuron or small collection of neurons that perform some more complicated nonlinear function to compute the output. For example, an LSTM unit has a memory cell that records state, an input gate (neuron) that decides what value to remember, a forget gate (neuron) that decides how long to remember that value, and an output gate neuron that accomplishes the activation function of the unit (usually a sigmoid or tanh()). A unit is a drop-in replacement for a neuron in a neural net that takes a vector input and outputs a scalar value; it just has more complicated behavior.
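The neuron entry above boils down to a dot product plus an activation. A minimal sketch with made-up weights and inputs:

```python
import numpy as np

x = np.array([0.5, -1.0, 2.0])   # input signals
w = np.array([0.1, 0.4, -0.2])   # this neuron's weights
w0 = 0.05                        # bias weight

y = np.tanh(w.dot(x) + w0)       # the neuron's single scalar output
print(y)
```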
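The skip-grams entry points to a sketch after that list. A rough illustration of generating (center, context) training pairs, with a made-up sentence and window size:

```python
def skip_grams(tokens, window=2):
    """Return (center, context) pairs, skipping over any intervening words."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

print(skip_grams("claude monet painted water lilies".split()))
```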
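The softmax entry also points here; a minimal numpy version, with made-up raw scores:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))  # subtract the max for numerical stability
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])  # made-up raw network outputs
probs = softmax(scores)
print(probs, probs.sum())           # values in (0, 1) that sum to 1.0
```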
[1] Python’s REPLs even allow you to execute any shell command (including pip) installed on your OS (such as !git commit -am 'fix 123'). This lets your fingers stay on the keyboard and away from the mouse, minimizing cognitive load from context switches.
[2] See the web page titled “Long short-term memory” (https://en.wikipedia.org/wiki/Long_short-term_memory).
[3] See the web page titled “Linguistic Features : spaCy Usage Documentation” (https://spacy.io/usage/linguistic-features#rule-based-morphology).