Dive into Deep Learning

Three Linear Neural Networks for Regression

Reading the Dataset

Fourteen point six point four Summary

Fourteen point seven Single Shot Multibox Detection

Fourteen point seven point two

Fourteen point seven point three Prediction

Summary

Exercises

Fourteen point eight point one R-CNNs

Fast R-CNN

Summary

Exercises

Fourteen point nine point one Image Segmentation and Instance Segmentation

Fourteen point nine point three

Exercises

Fourteen point ten point one Basic Operation

Fourteen point ten point three

Fourteen point ten point four Summary

Fourteen point eleven Fully Convolutional Networks

Fourteen point eleven point two Initializing Transposed Convolutional Layers

Fourteen point eleven point four Training

Summary

Exercises

Fourteen point twelve point one Method

Fourteen point twelve point three Preprocessing and Postprocessing

Fourteen point twelve point five Defining the Loss Function

Fourteen point twelve point seven

Fourteen point twelve point eight

Preface

About This Book

Learning by Doing

Content and Structure

Code

Target Audience

Notebooks, Website, GitHub, and Forum

Summary

Exercises

Installation

Installing Miniconda

Installing the Deep Learning Framework and the d2l Package

Downloading and Running the Code

Notation

Numerical Objects

Set Theory

Functions and Operators

Calculus

Probability and Information Theory

Introduction

One point one A Motivating Example

One point two Key Components

One point two point two Models

One point two point three Objective Functions

One point two point four Optimization Algorithms

One point three Kinds of Machine Learning Problems

One point three point one Supervised Learning

Classification

Tagging

Search

Recommender Systems

Sequence Learning

One point three point two Unsupervised and Self-Supervised Learning

One point three point three Interacting with an Environment

One point three point four Reinforcement Learning

One point four Roots

One point five The Road to Deep Learning

One point six Success Stories

One point seven The Essence of Deep Learning

Summary

One point nine Exercises

Preliminaries

Two point one Data Manipulation

Two point one point one Getting Started

Two point one point two Indexing and Slicing

Two point one point three Operations

Two point one point four Broadcasting

Two point one point five Saving Memory

Two point one point six Conversion to Other Python Objects

Two point one point seven Summary

Two point one point eight Exercises

Two point two Data Preprocessing

Two point two point one Reading the Dataset

Data Preprocessing

Two point two point three Conversion to the Tensor Format

Two point two point five Exercises

Two point three Linear Algebra

Two point three point one Scalars

Two point three point two Vectors

Two point three point three Matrices

Two point three point four Tensors

Two point three point five Basic Properties of Tensor Arithmetic

Two point three point six Reduction

Two point three point seven Non-Reduction Sum

Two point three point eight Dot Products

Two point three point nine Matrix-Vector Products

Two point three point ten Matrix-Matrix Multiplication

Two point three point eleven Norms

Two point three point twelve Discussion

Two point three point thirteen Exercises

Two point four Calculus

Two point four point one Derivatives and Differentiation

Constant multiple rule

Two point four point two Visualization Utilities

Two point four point three Partial Derivatives and Gradients

Two point four point four Chain Rule

Two point four point five Discussion

Two point four point six Exercises

Two point five Automatic Differentiation

Two point five point one A Simple Function

Two point five point two Backward for Non-Scalar Variables

Two point five point three Detaching Computation

Two point five point four Gradients and Python Control Flow

Two point five point five Discussion

Two point five point six Exercises

Two point six Probability and Statistics

Two point six point one A Simple Example: Tossing Coins

Two point six point two A More Formal Treatment

Two point six point three Random Variables

Two point six point four Multiple Random Variables

Two point six point five An Example

Two point six point six Expectations

Two point six point seven Discussion

Two point six point eight Exercises

Two point seven point one Functions and Classes in a Module

Two point seven point two Specific Functions and Classes

Three point one Linear Regression

Three point one point one Basics

Loss Function

Analytic Solution

Minibatch Stochastic Gradient Descent

Predictions

Three point one point two Vectorization for Speed

Three point one point three The Normal Distribution and Squared Loss

Three point one point four Linear Regression as a Neural Network

Biology

Three point one point five Summary

Three point one point six Exercises

Three point two point one Utilities

Three point two point two Models

Three point two point three Data

Three point two point four Training

Three point two point six Exercises

Three point three Synthetic Regression Data

Three point three point one Generating the Dataset

Three point three point two Reading the Dataset

Three point three point three Concise Implementation of the Data Loader

Three point three point four Summary

Three point three point five Exercises

Three point four Linear Regression Implementation from Scratch

Three point four point one Defining the Model

Three point four point two Defining the Loss Function

Three point four point three Defining the Optimization Algorithm

Three point four point four Training

Three point four point five Summary

Three point four point six Exercises

Three point five point one Defining the Model

Three point five point two Defining the Loss Function

Three point five point three Defining the Optimization Algorithm

Three point five point four Training

Three point five point five Summary

Three point five point six Exercises

Three point six point one Training Error and Generalization Error

Model Complexity

Three point six point two Underfitting or Overfitting?

Polynomial Curve Fitting

Dataset Size

Three point six point three Model Selection

Cross-Validation

Three point six point four Summary

Three point six point five Exercises

Three point seven Weight Decay

Three point seven point one Norms and Weight Decay

Three point seven point two High-Dimensional Linear Regression

Three point seven point three Implementation from Scratch

Defining L two Norm Penalty

Defining the Model

Training without Regularization

Using Weight Decay

Three point seven point four Concise Implementation

Three point seven point five Summary

Three point seven point six Exercises

Four Linear Neural Networks for Classification

Four point one Softmax Regression

Four point one point one Classification

Linear Model

The Softmax

Vectorization

Four point one point two Loss Function

Softmax and Cross-Entropy Loss

Four point one point three Information Theory Basics

Entropy

Surprisal

Cross-Entropy Revisited

Four point one point four Summary and Discussion

Four point one point five Exercises

Four point two point one Loading the Dataset

Four point two point two Reading a Minibatch

Four point two point four Summary

Four point two point five Exercises

Four point three The Base Classification Model

Four point three point one The Classifier Class

Four point three point two Accuracy

Four point three point three Summary

Four point three point four Exercises

Four point four Softmax Regression Implementation from Scratch

Four point four point one The Softmax

Four point four point two The Model

Four point four point three The Cross-Entropy Loss

Four point four point five Prediction

Four point four point six Summary

Four point four point seven Exercises

Four point five Concise Implementation of Softmax Regression

Four point five point one Defining the Model

Four point five point two Softmax Revisited

Four point five point three Training

Four point five point four Summary

Four point five point five Exercises

Four point six Generalization in Classification

Four point six point one The Test Set

Four point six point two Test Set Reuse

Four point six point three Statistical Learning Theory

Four point six point four Summary

Four point six point five Exercises

Four point seven Environment and Distribution Shift

Four point seven point one Types of Distribution Shift

Covariate Shift

Label Shift

Concept Shift

Four point seven point two Examples of Distribution Shift

Medical Diagnostics

Self-Driving Cars

Nonstationary Distributions

More Anecdotes

Four point seven point three Correction of Distribution Shift

Empirical Risk and Risk

Covariate Shift Correction

Label Shift Correction

Concept Shift Correction

Four point seven point four A Taxonomy of Learning Problems

Batch Learning

Online Learning

Bandits

Control

Reinforcement Learning

Four point seven point five Fairness, Accountability, and Transparency in Machine Learning

Four point seven point six Summary

Four point seven point seven Exercises

Five Multilayer Perceptrons

Five point one Multilayer Perceptrons

Five point one point one Hidden Layers

Limitations of Linear Models

Incorporating Hidden Layers

From Linear to Nonlinear

Universal Approximators

Five point one point two Activation Functions

ReLU Function

Sigmoid Function

Tanh Function

Five point one point three Summary and Discussion

Five point one point four Exercises

Five point two Implementation of Multilayer Perceptrons

Five point two point one Implementation from Scratch

Initializing Model Parameters

Model

Five point two point two Concise Implementation

Model

Training

Five point two point three Summary

Five point two point four Exercises

Five point three Forward Propagation, Backward Propagation, and Computational Graphs

Five point three point one Forward Propagation

Five point three point two Computational Graph of Forward Propagation

Five point three point three Backpropagation

Five point three point four Training Neural Networks

Five point three point five Summary

Five point three point six Exercises

Five point four Numerical Stability and Initialization

Five point four point one Vanishing and Exploding Gradients

Vanishing Gradients

Exploding Gradients

Breaking the Symmetry

Five point four point two Parameter Initialization

Default Initialization

Xavier Initialization

Beyond

Five point four point three Summary

Five point four point four Exercises

Five point five Generalization in Deep Learning

Five point five point one Revisiting Overfitting and Regularization

Five point five point two Inspiration from Nonparametrics

Five point five point three Early Stopping

Five point five point four Classical Regularization Methods for Deep Networks

Five point five point five Summary

Five point five point six Exercises

Five point six Dropout

Five point six point one Dropout in Practice

Five point six point two Implementation from Scratch

Defining the Model

Training

Five point six point three Concise Implementation

Five point six point four Summary

Five point six point five Exercises

Five point seven Predicting House Prices on Kaggle

Five point seven point one Downloading Data

Five point seven point two Kaggle

Five point seven point three Accessing and Reading the Dataset

Five point seven point four Data Preprocessing

Five point seven point five Error Measure

Five point seven point six K-Fold Cross-Validation

Five point seven point seven Model Selection

Five point seven point eight Submitting Predictions on Kaggle

Five point seven point ten Exercises

Six Builders' Guide

Six point one Layers and Modules

Six point one point one A Custom Module

Six point one point two The Sequential Module

Six point one point three Executing Code in the Forward Propagation Method

Six point one point four Summary

Six point one point five Exercises

Six point two Parameter Management

Six point two point one Parameter Access

Targeted Parameters

Six point two point two Tied Parameters

Six point two point three Summary

Six point two point four Exercises

Six point three Parameter Initialization

Six point three point one Built-in Initialization

Custom Initialization

Six point three point two Summary

Six point three point three Exercises

Lazy Initialization

Six point four point one Summary

Six point four point two Exercises

Custom Layers

Six point five point one Layers without Parameters

Six point five point three Summary

Six point five point four Exercises

Six point six File I/O

Six point six point one Loading and Saving Tensors

Six point six point three Summary

Six point six point four Exercises

Six point seven GPUs

Six point seven point one Computing Devices

Six point seven point two Tensors and GPUs

Storage on the GPU

Side Notes

Six point seven point three Neural Networks and GPUs

Six point seven point four Summary

Six point seven point five Exercises

Seven Convolutional Neural Networks

Seven point one From Fully Connected Layers to Convolutions

Seven point one point one Invariance

Seven point one point two Constraining the MLP

Translation Invariance

Locality

Seven point one point three Convolutions

Seven point one point four Channels

Seven point one point five Summary and Discussion

Seven point one point six Exercises

Seven point two Convolutions for Images

Seven point two point one The Cross-Correlation Operation

Seven point two point two Convolutional Layers

Seven point two point three Object Edge Detection in Images

Seven point two point four Learning a Kernel

Seven point two point five Cross-Correlation and Convolution

Seven point two point six Feature Map and Receptive Field

Seven point two point seven Summary

Seven point two point eight Exercises

Seven point three Padding and Stride

Seven point three point one Padding

Seven point three point two Stride

Seven point three point three Summary and Discussion

Seven point three point four Exercises

Seven point four Multiple Input and Multiple Output Channels

Seven point four point one Multiple Input Channels

Seven point four point two Multiple Output Channels

Seven point four point three One by One Convolutional Layer

Seven point four point four Discussion

Seven point four point five Exercises

Seven point five Pooling

Seven point five point one Maximum Pooling and Average Pooling

Seven point five point two Padding and Stride

Seven point five point three Multiple Channels

Seven point five point four Summary

Seven point five point five Exercises

Seven point six Convolutional Neural Networks (LeNet)

Seven point six point one LeNet

Seven point six point two Training

Seven point six point three Summary

Seven point six point four Exercises

Eight Modern Convolutional Neural Networks

Eight point one Deep Convolutional Neural Networks (AlexNet)

Eight point one point one Representation Learning

Missing Ingredient: Data

Missing Ingredient: Hardware

Eight point one point two AlexNet

Architecture

Activation Functions

Capacity Control and Preprocessing

Eight point one point three Training

Eight point one point four Discussion

Eight point one point five Exercises

Eight point two Networks Using Blocks (VGG)

Eight point two point one VGG Blocks

Eight point two point two VGG Network

Eight point two point three Training

Eight point two point four Summary

Eight point two point five Exercises

Eight point three Network in Network (NiN)

Eight point three point one NiN Blocks

Eight point three point two NiN Model

Eight point three point three Training

Eight point three point four Summary

Eight point three point five Exercises

Eight point four Multi-Branch Networks (GoogLeNet)

Eight point four point one Inception Blocks

Eight point four point two GoogLeNet Model

Eight point four point three Training

Eight point four point four Discussion

Eight point four point five Exercises

Eight point five Batch Normalization

Eight point five point one Training Deep Networks

Eight point five point two Batch Normalization Layers

Fully Connected Layers

Convolutional Layers

Layer Normalization

Batch Normalization During Prediction

Eight point five point three Implementation from Scratch

Eight point five point four LeNet with Batch Normalization

Eight point five point five Concise Implementation

Eight point five point six Discussion

Eight point five point seven Exercises

Eight point six point one Function Classes

Eight point six point two Residual Blocks

Eight point six point three ResNet Model

Eight point six point four Training

Eight point six point five ResNeXt

Eight point six point six Summary and Discussion

Eight point six point seven Exercises

Eight point seven Densely Connected Networks (DenseNet)

Eight point seven point one From ResNet to DenseNet

Eight point seven point two Dense Blocks

Eight point seven point three Transition Layers

Eight point seven point five Training

Eight point seven point six Summary and Discussion

Eight point seven point seven Exercises

Eight point eight Designing Convolution Network Architectures

Eight point eight point one The AnyNet Design Space

Eight point eight point two Distributions and Parameters of Design Spaces

Eight point eight point three RegNet

Eight point eight point four Training

Eight point eight point five Discussion

Eight point eight point six Exercises

Nine Recurrent Neural Networks

Nine point one Working with Sequences

Nine point one point one Autoregressive Models

Nine point one point two Sequence Models

Markov Models

The Order of Decoding

Nine point one point three Training

Nine point one point four Prediction

Nine point one point five Summary

Nine point one point six Exercises

Nine point two Converting Raw Text into Sequence Data

Nine point two point one Reading the Dataset

Nine point two point two Tokenization

Nine point two point three Vocabulary

Nine point two point four Putting It All Together

Nine point two point five Exploratory Language Statistics

Nine point two point six Summary

Nine point two point seven Exercises

Nine point three Language Models

Nine point three point one Learning Language Models

Markov Models and n-grams

Word Frequency

Laplace Smoothing

Nine point three point two Perplexity

Nine point three point three Partitioning Sequences

Nine point three point four Summary and Discussion

Nine point three point five Exercises

Nine point four point one Neural Networks without Hidden States

Nine point four point two Recurrent Neural Networks with Hidden States

Nine point four point three RNN-Based Character-Level Language Models

Nine point four point four Summary

Nine point four point five Exercises

Nine point five Recurrent Neural Network Implementation from Scratch

Nine point five point one RNN Model

Nine point five point two RNN-Based Language Model

One-Hot Encoding

Transforming RNN Outputs

Nine point five point three Gradient Clipping

Nine point five point four Training

Nine point five point five Decoding

Nine point five point six Summary

Nine point five point seven Exercises

Nine point six Concise Implementation of Recurrent Neural Networks

Nine point six point one Defining the Model

Nine point six point two Training and Predicting

Nine point six point three Summary

Nine point six point four Exercises

Nine point seven Backpropagation Through Time

Nine point seven point one Analysis of Gradients in RNNs

Full Computation

Truncating Time Steps

Randomized Truncation

Comparing Strategies

Nine point seven point two Backpropagation Through Time in Detail

Nine point seven point three Summary

Nine point seven point four Exercises

Ten Modern Recurrent Neural Networks

Ten point one Long Short-Term Memory

Ten point one point one Gated Memory Cell

Gated Hidden State

Input Gate, Forget Gate, and Output Gate

Input Node

Memory Cell Internal State

Hidden State

Ten point one point two Implementation from Scratch

Initializing Model Parameters

Training and Prediction

Ten point one point three Concise Implementation

Ten point one point four Summary

Ten point one point five Exercises

Ten point two point one Reset Gate and Update Gate

Candidate Hidden State

Hidden State

Implementation from Scratch

Initializing Model Parameters

Defining the Model

Training

Concise Implementation

Summary

Exercises

Ten point three point one Implementation from Scratch

Ten point three point two Concise Implementation

Ten point three point four Exercises

Ten point four Bidirectional Recurrent Neural Networks

Ten point four point one Implementation from Scratch

Ten point four point two Concise Implementation

Ten point four point three Summary

Ten point four point four Exercises

Ten point five Machine Translation and the Dataset

Ten point five point one Downloading and Preprocessing the Dataset

Ten point five point two Tokenization

Ten point five point three Loading Sequences of Fixed Length

Ten point five point four Reading the Dataset

Ten point five point five Summary

Ten point five point six Exercises

Ten point six The Encoder-Decoder Architecture

Ten point six point one Encoder

Ten point six point two Decoder

Ten point six point three Putting the Encoder and Decoder Together

Ten point six point four Summary

Ten point six point five Exercises

Ten point seven Sequence-to-Sequence Learning for Machine Translation

Ten point seven point one Teacher Forcing

Ten point seven point two Encoder

Ten point seven point three Decoder

Ten point seven point four Encoder-Decoder for Sequence-to-Sequence Learning

Ten point seven point five Loss Function with Masking

Ten point seven point six Training

Ten point seven point seven Prediction

Ten point seven point eight Evaluation of Predicted Sequences

Ten point seven point nine Summary

Ten point seven point ten Exercises

Ten point eight Beam Search

Ten point eight point one Greedy Search

Ten point eight point two Exhaustive Search

Ten point eight point three Beam Search

Ten point eight point four Summary

Ten point eight point five Exercises

Eleven Attention Mechanisms and Transformers

Eleven point one Queries, Keys, and Values

Eleven point one point one Visualization

Eleven point one point two Summary

Eleven point one point three Exercises

Eleven point two Attention Pooling by Similarity

Eleven point two point one Kernels and Data

Eleven point two point two Attention Pooling via Nadaraya-Watson Regression

Eleven point two point three Adapting Attention Pooling

Eleven point two point four Summary

Eleven point two point five Exercises

Eleven point three point one Dot Product Attention

Eleven point three point two Convenience Functions

Masked Softmax Operation

Batch Matrix Multiplication

Eleven point three point three Scaled Dot Product Attention

Eleven point three point four Additive Attention

Eleven point three point five Summary

Eleven point three point six Exercises

Eleven point four point two Defining the Decoder with Attention

Eleven point four point three Training

Eleven point four point four Summary

Eleven point four point five Exercises

Eleven point five point one Model

Multi-Head Attention

Eleven point five point two Implementation

Eleven point five point three Summary

Eleven point five point four Exercises

Eleven point six point one Self-Attention

Eleven point six point two Comparing CNNs, RNNs, and Self-Attention

Eleven point six point three Positional Encoding

Absolute Positional Information

Relative Positional Information

Eleven point six point four Summary

Eleven point six point five Exercises

Eleven point seven point one Model

Eleven point seven point two Positionwise Feed-Forward Networks

Eleven point seven point three Residual Connection and Layer Normalization

Eleven point seven point four Encoder

Eleven point seven point five Decoder

Eleven point seven point six Training

Eleven point seven point seven Summary

Eleven point seven point eight Exercises

Eleven point eight Transformers for Vision

Eleven point eight point one Model

Eleven point eight point two Patch Embedding

Eleven point eight point three Vision Transformer Encoder

Eleven point eight point four Putting It All Together

Eleven point eight point five Training

Eleven point eight point six Summary and Discussion

Eleven point eight point seven Exercises

Eleven point nine Large-Scale Pretraining with Transformers

Eleven point nine point one Encoder-Only

Pretraining BERT

Fine-Tuning BERT

Eleven point nine point two Encoder-Decoder

Pretraining T Five

Fine-Tuning T Five

Eleven point nine point three Decoder-Only

GPT and GPT Two

GPT Three and Beyond

Eleven point nine point four Scalability

Eleven point nine point five Large Language Models

Eleven point nine point six Summary and Discussion

Eleven point nine point seven Exercises

Twelve Optimization Algorithms

Twelve point one Optimization and Deep Learning

Twelve point one point one Goal of Optimization

Twelve point one point two Optimization Challenges in Deep Learning

Local Minima

Saddle Points

Vanishing Gradients

Twelve point one point three Summary

Twelve point one point four Exercises

Twelve point two Convexity

Twelve point two point one Definitions

Convex Sets

Convex Functions

Jensen's Inequality

Twelve point two point two Properties

Local Minima Are Global Minima

Convexity and Second Derivatives

Twelve point two point three Constraints

Lagrangian

Penalties

Projections

Twelve point two point four Summary

Twelve point two point five Exercises

Twelve point three Gradient Descent

Twelve point three point one One-Dimensional Gradient Descent

Learning Rate

Local Minima

Twelve point three point three Adaptive Methods

Newton's Method

Convergence Analysis

Preconditioning

Gradient Descent with Line Search

Twelve point three point four Summary

Twelve point three point five Exercises

Twelve point four Stochastic Gradient Descent

Twelve point four point one Stochastic Gradient Updates

Twelve point four point two Dynamic Learning Rate

Twelve point four point three Convergence Analysis for Convex Objectives

Twelve point four point four Stochastic Gradients and Finite Samples

Twelve point four point five Summary

Twelve point four point six Exercises

Twelve point five Minibatch Stochastic Gradient Descent

Twelve point five point one Vectorization and Caches

Twelve point five point two Minibatches

Twelve point five point three Reading the Dataset

Twelve point five point four Implementation from Scratch

Twelve point five point five Concise Implementation

Twelve point five point six Summary

Twelve point five point seven Exercises

Twelve point six point one Basics

Leaky Averages

An Ill-conditioned Problem

The Momentum Method

Effective Sample Weight

Twelve point six point two Practical Experiments

Implementation from Scratch

Concise Implementation

Twelve point six point three Theoretical Analysis

Quadratic Convex Functions

Scalar Functions

Twelve point six point four Summary

Twelve point six point five Exercises

Twelve point seven Adagrad

Twelve point seven point one Sparse Features and Learning Rates

Twelve point seven point two Preconditioning

Twelve point seven point three The Algorithm

Twelve point seven point four Implementation from Scratch

Twelve point seven point five Concise Implementation

Twelve point seven point seven Exercises

Twelve point eight RMSProp

Twelve point eight point one The Algorithm

Twelve point eight point two Implementation from Scratch

Twelve point eight point three Concise Implementation

Twelve point eight point four Summary

Twelve point eight point five Exercises

Twelve point nine point one The Algorithm

Twelve point nine point two Implementation

Twelve point nine point three Summary

Twelve point nine point four Exercises

Twelve point ten Adam

Twelve point ten point one The Algorithm

Twelve point ten point two Implementation

Twelve point ten point three Yogi

Twelve point ten point four Summary

Twelve point ten point five Exercises

Twelve point eleven Learning Rate Scheduling

Twelve point eleven point one Toy Problem

Twelve point eleven point two Schedulers

Twelve point eleven point three Policies

Factor Scheduler

Multi Factor Scheduler

Cosine Scheduler

Warmup

Twelve point eleven point four Summary

Twelve point eleven point five Exercises

Thirteen Computational Performance

Thirteen point one Compilers and Interpreters

Thirteen point one point one Symbolic Programming

Thirteen point one point two Hybrid Programming

Thirteen point one point three Hybridizing the Sequential Class

Acceleration by Hybridization

Serialization

Thirteen point one point four Summary

Thirteen point one point five Exercises

Thirteen point two point one Asynchrony via Backend

Thirteen point two point two Barriers and Blockers

Thirteen point two point four Summary

Thirteen point two point five Exercises

Thirteen point three point one Parallel Computation on GPUs

Thirteen point three point two Parallel Computation and Communication

Thirteen point three point three Summary

Thirteen point three point four Exercises

Thirteen point four point one Computers

Thirteen point four point two Memory

Thirteen point four point three Storage

Hard Disk Drives

Solid State Drives

Cloud Storage

Thirteen point four point four CPUs

Microarchitecture

Vectorization

Cache

Thirteen point four point five GPUs and other Accelerators

Thirteen point four point six Networks and Buses

Thirteen point four point seven More Latency Numbers

Thirteen point four point eight Summary

Thirteen point four point nine Exercises

Thirteen point five point one Splitting the Problem

Thirteen point five point two Data Parallelism

Thirteen point five point three A Toy Network

Thirteen point five point four Data Synchronization

Thirteen point five point five Distributing Data

Thirteen point five point six Training

Thirteen point five point seven Summary

Thirteen point five point eight Exercises

Thirteen point six Concise Implementation for Multiple GPUs

Thirteen point six point one A Toy Network

Thirteen point six point three Training

Thirteen point six point four Summary

Thirteen point six point five Exercises

Thirteen point seven Parameter Servers

Thirteen point seven point one Data-Parallel Training

Thirteen point seven point two Ring Synchronization

Thirteen point seven point three Multi-Machine Training

Multi-machine multi-GPU distributed parallel training.

Thirteen point seven point four Key-Value Stores

Thirteen point seven point five Summary

Thirteen point seven point six Exercises

Fourteen Computer Vision

Fourteen point one Image Augmentation

Fourteen point one point one Common Image Augmentation Methods

Flipping and Cropping

Changing Colors

Combining Multiple Image Augmentation Methods

Fourteen point one point two Training with Image Augmentation

Multi-GPU Training

Fourteen point one point three Summary

Fourteen point one point four Exercises

Fourteen point two Fine-Tuning

Fourteen point two point one Steps

Fourteen point two point two Hot Dog Recognition

Reading the Dataset

Defining and Initializing the Model

Fine-Tuning the Model

Fourteen point two point three Summary

Fourteen point two point four Exercises

Fourteen point three Object Detection and Bounding Boxes

Fourteen point three point one Bounding Boxes

Fourteen point three point two Summary

Fourteen point three point three Exercises

Fourteen point four Anchor Boxes

Fourteen point four point one Generating Multiple Anchor Boxes

Fourteen point four point two Intersection over Union (IoU)

Fourteen point four point three Labeling Anchor Boxes in Training Data

Assigning Ground-Truth Bounding Boxes to Anchor Boxes

Labeling Classes and Offsets

An Example

Fourteen point four point four Predicting Bounding Boxes with Non-Maximum Suppression

The following nms function sorts confidence scores in descending order and returns their indices.
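
The function body is not reproduced in this outline. As a minimal sketch of the idea only, assuming boxes are given as (x1, y1, x2, y2) corner tuples, non-maximum suppression could look like the following in plain Python. The helper `box_iou` and the parameter name `iou_threshold` are illustrative choices, not necessarily the book's own code.

```python
def box_iou(a, b):
    """Intersection over union of two boxes in (x1, y1, x2, y2) form."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold):
    """Return indices of the kept boxes, highest confidence first."""
    # Sort indices by confidence score in descending order.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)  # most confident remaining box
        keep.append(best)
        # Drop every remaining box that overlaps the kept box too much.
        order = [i for i in order
                 if box_iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores, iou_threshold=0.5)
```

In this toy input the second box overlaps the first with an IoU of about 0.68, so it is suppressed, while the third box does not overlap and is kept.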

Fourteen point four point five Summary

Fourteen point four point six Exercises

Fourteen point five Multiscale Object Detection

Fourteen point five point one Multiscale Anchor Boxes

Fourteen point five point two Multiscale Detection

Fourteen point five point three Summary

Fourteen point five point four Exercises

Fourteen point six The Object Detection Dataset

Fourteen point six point one Downloading the Dataset

Fourteen point six point two Reading the Dataset

Fourteen point six point three Demonstration

Fourteen point six point four Summary

Fourteen point six point five Exercises

Fourteen point seven Single Shot Multibox Detection

Fourteen point seven point one Model

Class Prediction Layer

Bounding Box Prediction Layer

Concatenating Predictions for Multiple Scales

Downsampling Block

Base Network Block

The Complete Model

Fourteen point seven point two Training

Reading the Dataset and Initializing the Model

Defining Loss and Evaluation Functions

Training the Model

Fourteen point seven point three Prediction

Fourteen point seven point four Summary

Fourteen point seven point five Exercises

Fourteen point eight Region-based CNNs

Fourteen point eight point one R-CNNs

Fourteen point eight point two Fast R-CNN

Fourteen point eight point three Faster R-CNN

Fourteen point eight point four Mask R-CNN

Fourteen point eight point five Summary

Fourteen point eight point six Exercises

Fourteen point nine Semantic Segmentation and the Dataset

Fourteen point nine point one Image Segmentation and Instance Segmentation

Fourteen point nine point two The Pascal VOC two thousand twelve Semantic Segmentation Dataset

Data Preprocessing

Custom Semantic Segmentation Dataset Class

Reading the Dataset

Fourteen point nine point three Summary

Fourteen point nine point four Exercises

Fourteen point ten point one Basic Operation

Fourteen point ten point two Padding, Strides, and Multiple Channels

Fourteen point ten point three Connection to Matrix Transposition

Fourteen point ten point four Summary

Fourteen point ten point five Exercises

Fourteen point eleven point one The Model

Fourteen point eleven point two Initializing Transposed Convolutional Layers

Fourteen point eleven point three Reading the Dataset

Fourteen point eleven point four Training

Fourteen point eleven point five Prediction

Fourteen point eleven point six Summary

Fourteen point eleven point seven Exercises

Fourteen point twelve Neural Style Transfer

Fourteen point twelve point one Method

Fourteen point twelve point two Reading the Content and Style Images

Fourteen point twelve point three Preprocessing and Postprocessing

Fourteen point twelve point four Extracting Features

Fourteen point twelve point five Defining the Loss Function

Content Loss

Style Loss

Total Variation Loss

Reducing total variation loss makes values of neighboring pixels on the synthesized image closer.
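
As a hedged illustration of the total variation loss named in the heading above, the sketch below averages absolute differences between vertically and horizontally adjacent pixels. It assumes a single-channel image stored as a list of rows; the function name `tv_loss` and the 0.5 weighting are illustrative assumptions, not taken from this outline.

```python
def tv_loss(img):
    """Mean absolute difference between neighboring pixels of a 2D image."""
    rows, cols = len(img), len(img[0])
    # Differences between each pixel and the pixel directly below it.
    vert = [abs(img[i + 1][j] - img[i][j])
            for i in range(rows - 1) for j in range(cols)]
    # Differences between each pixel and the pixel directly to its right.
    horiz = [abs(img[i][j + 1] - img[i][j])
             for i in range(rows) for j in range(cols - 1)]

    def mean(xs):
        return sum(xs) / len(xs) if xs else 0.0

    return 0.5 * (mean(vert) + mean(horiz))


flat = tv_loss([[3, 3], [3, 3]])    # constant image, no variation
noisy = tv_loss([[0, 1], [1, 0]])   # checkerboard, maximal variation
```

A constant image scores zero and a checkerboard scores high, so minimizing this term pushes neighboring pixel values toward each other.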

Loss Function

Fourteen point twelve point six Initializing the Synthesized Image

Fourteen point twelve point seven Training

Fourteen point twelve point eight Summary

Fourteen point twelve point nine Exercises

Fourteen point thirteen Image Classification (CIFAR-Ten) on Kaggle

Fourteen point thirteen point one Obtaining and Organizing the Dataset

Downloading the Dataset

Organizing the Dataset

Fourteen point thirteen point two Image Augmentation

Fourteen point thirteen point three Reading the Dataset

Fourteen point thirteen point four Defining the Model

Fourteen point thirteen point five Defining the Training Function

Fourteen point thirteen point six Training and Validating the Model

Fourteen point thirteen point seven Classifying the Testing Set and Submitting Results on Kaggle

Fourteen point thirteen point eight Summary

Fourteen point thirteen point nine Exercises

Fourteen point fourteen Dog Breed Identification (ImageNet Dogs) on Kaggle

Fourteen point fourteen point one Obtaining and Organizing the Dataset

Downloading the Dataset

Organizing the Dataset

Fourteen point fourteen point two Image Augmentation

Fourteen point fourteen point three Reading the Dataset

Fourteen point fourteen point four Fine-Tuning a Pretrained Model

Fourteen point fourteen point five Defining the Training Function

Fourteen point fourteen point six Training and Validating the Model

Fourteen point fourteen point eight Summary

Fourteen point fourteen point nine Exercises

Fifteen Natural Language Processing: Pretraining

Fifteen point one Word Embedding (word2vec)

Fifteen point one point one One-Hot Vectors Are a Bad Choice

Fifteen point one point two Self-Supervised word2vec

Fifteen point one point three The Skip-Gram Model

Training

Fifteen point one point four The Continuous Bag of Words (CBOW) Model

Training

Fifteen point one point five Summary

Fifteen point one point six Exercises

Fifteen point two Approximate Training

Fifteen point two point one Negative Sampling

Fifteen point two point two Hierarchical Softmax

Fifteen point two point three Summary

Fifteen point two point four Exercises

Fifteen point three point one Reading the Dataset

Fifteen point three point two Subsampling

Fifteen point three point three Extracting Center Words and Context Words

Fifteen point three point four Negative Sampling

Fifteen point three point five Loading Training Examples in Minibatches

Fifteen point three point six Putting It All Together

Fifteen point three point seven Summary

Fifteen point three point eight Exercises

Fifteen point four Pretraining word2vec

Fifteen point four point one The Skip-Gram Model

Embedding Layer

Defining the Forward Propagation

Fifteen point four point two Training

Binary Cross-Entropy Loss

Initializing Model Parameters

Defining the Training Loop

Fifteen point four point three Applying Word Embeddings

Fifteen point four point four Summary

Fifteen point four point five Exercises

Fifteen point five Word Embedding with Global Vectors (GloVe)

Fifteen point five point one Skip-Gram with Global Corpus Statistics

Fifteen point five point two The GloVe Model

Fifteen point five point three Interpreting GloVe from the Ratio of Co-occurrence Probabilities

Fifteen point five point four Summary

Fifteen point five point five Exercises

Fifteen point six Subword Embedding

Fifteen point six point one The fastText Model

Fifteen point six point two Byte Pair Encoding

Fifteen point six point three Summary

Fifteen point six point four Exercises

Fifteen point seven Word Similarity and Analogy

Fifteen point seven point one Loading Pretrained Word Vectors

Fifteen point seven point two Applying Pretrained Word Vectors

Word Similarity

Word Analogy

Fifteen point seven point three Summary

Fifteen point seven point four Exercises

Fifteen point eight Bidirectional Encoder Representations from Transformers (BERT)

Fifteen point eight point one From Context-Independent to Context-Sensitive

Fifteen point eight point two From Task-Specific to Task-Agnostic

Fifteen point eight point three BERT: Combining the Best of Both Worlds

Fifteen point eight point four Input Representation

Fifteen point eight point five Pretraining Tasks

Masked Language Modeling

Next Sentence Prediction

Fifteen point eight point six Putting It All Together

Fifteen point eight point seven Summary

Fifteen point eight point eight Exercises

Fifteen point nine The Dataset for Pretraining BERT

Fifteen point nine point one Defining Helper Functions for Pretraining Tasks

Generating the Next Sentence Prediction Task

Generating the Masked Language Modeling Task

Fifteen point nine point two Transforming Text into the Pretraining Dataset

Fifteen point nine point three Summary

Fifteen point nine point four Exercises

Fifteen point ten point one Pretraining BERT

Fifteen point ten point two Representing Text with BERT

Fifteen point ten point three Summary

Fifteen point ten point four Exercises

Sixteen point one Sentiment Analysis and the Dataset

Sixteen point one point one Reading the Dataset

Sixteen point one point two Preprocessing the Dataset

Sixteen point one point three Creating Data Iterators

Sixteen point one point four Putting It All Together

Sixteen point one point five Summary

Sixteen point one point six Exercises

Sixteen point two point one Representing Single Text with RNNs

Sixteen point two point two Loading Pretrained Word Vectors

Sixteen point two point three Training and Evaluating the Model

Sixteen point two point four Summary

Sixteen point two point five Exercises

Sixteen point three point one One-Dimensional Convolutions

Sixteen point three point two Max-Over-Time Pooling

Sixteen point three point three The textCNN Model

Defining the Model

Loading Pretrained Word Vectors

Training and Evaluating the Model

Sixteen point three point five Exercises

Sixteen point four Natural Language Inference and the Dataset

Sixteen point four point one Natural Language Inference

Sixteen point four point two The Stanford Natural Language Inference Dataset

Reading the Dataset

Defining a Class for Loading the Dataset

Putting It All Together

Sixteen point four point three Summary

Sixteen point four point four Exercises

Sixteen point five Natural Language Inference: Using Attention

Sixteen point five point one The Model

Attending

Comparing

Aggregating

Putting It All Together

Sixteen point five point two Training and Evaluating the Model

Reading the Dataset

Creating the Model

Training and Evaluating the Model

Using the Model

Sixteen point five point three Summary

Sixteen point five point four Exercises

Sixteen point six Fine-Tuning BERT for Sequence-Level and Token-Level Applications

Sixteen point six point one Single Text Classification

Sixteen point six point two Text Pair Classification or Regression

Sixteen point six point three Text Tagging

Sixteen point six point four Question Answering

Sixteen point six point five Summary

Sixteen point six point six Exercises

Sixteen point seven Natural Language Inference: Fine-Tuning BERT

Sixteen point seven point one Loading Pretrained BERT

Sixteen point seven point two The Dataset for Fine-Tuning BERT

Sixteen point seven point three Fine-Tuning BERT

Sixteen point seven point four Summary

Sixteen point seven point five Exercises

Seventeen Reinforcement Learning

Seventeen point one Markov Decision Process

Seventeen point one point one Definition of a Markov Decision Process

Seventeen point one point two Return and Discount Factor

Seventeen point one point three Discussion of the Markov Assumption

Seventeen point one point four Summary

Seventeen point one point five Exercises

Seventeen point two Value Iteration

Seventeen point two point one Stochastic Policy

Seventeen point two point two Value Function

Seventeen point two point three Action-Value Function

Seventeen point two point four Optimal Stochastic Policy

Seventeen point two point five Principle of Dynamic Programming

Seventeen point two point six Value Iteration

Seventeen point two point seven Policy Evaluation

Seventeen point two point eight Implementation of Value Iteration

Seventeen point two point nine Summary

Seventeen point two point ten Exercises

Seventeen point three point one The Q-Learning Algorithm

Seventeen point three point two An Optimization Problem Underlying Q-Learning

Seventeen point three point three Exploration in Q-Learning

Seventeen point three point four The "Self-correcting" Property of Q-Learning

Seventeen point three point five Implementation of Q-Learning

Seventeen point three point six Summary

Seventeen point three point seven Exercises

Eighteen Gaussian Processes

Eighteen point one Introduction to Gaussian Processes

Eighteen point one point one Summary

Eighteen point one point two Exercises

Eighteen point two Gaussian Process Priors

Eighteen point two point one Definition

Eighteen point two point two A Simple Gaussian Process

Eighteen point two point three From Weight Space to Function Space

Eighteen point two point four The Radial Basis Function Kernel

Eighteen point two point five The Neural Network Kernel

Eighteen point two point six Summary

Eighteen point two point seven Exercises

Eighteen point three Gaussian Process Inference

Eighteen point three point one Posterior Inference for Regression

Eighteen point three point two Equations for Making Predictions and Learning Kernel Hyperparameters in GP Regression

Eighteen point three point three Interpreting Equations for Learning and Predictions

Eighteen point three point four Worked Example from Scratch

Eighteen point three point five Making Life Easy with GPyTorch

Eighteen point three point six Summary

Eighteen point three point seven Exercises

Nineteen point one What Is Hyperparameter Optimization?

Nineteen point one point one The Optimization Problem

The Objective Function

The Configuration Space

Nineteen point one point two Random Search

Nineteen point one point three Summary

Nineteen point one point four Exercises

Nineteen point two Hyperparameter Optimization API

Nineteen point two point one Searcher

Nineteen point two point two Scheduler

Nineteen point two point three Tuner

Nineteen point two point four Bookkeeping the Performance of HPO Algorithms

Nineteen point two point five Example: Optimizing the Hyperparameters of a Convolutional Neural Network

Nineteen point two point six Comparing HPO Algorithms

Nineteen point two point seven Summary

Nineteen point two point eight Exercises

Nineteen point three Asynchronous Random Search

Nineteen point three point one Objective Function

Nineteen point three point two Asynchronous Scheduler

Nineteen point three point three Visualize the Asynchronous Optimization Process

Nineteen point three point five Exercises

Two. Advanced. The goal of this exercise is to implement a new scheduler in Syne Tune.

Nineteen point four point one Successive Halving

Nineteen point four point two Summary

Nineteen point five point one Objective Function

Nineteen point five point two Asynchronous Scheduler

Nineteen point five point three Visualize the Optimization Process

Nineteen point five point four Summary

Twenty point one Generative Adversarial Networks

Twenty point one point one Generate Some "Real" Data

Twenty point one point two Generator

Twenty point one point three Discriminator

Twenty point one point four Training

Twenty point one point five Summary

Twenty point one point six Exercises

Twenty point two point two The Generator

Twenty point two point three Discriminator

Twenty point two point four Training

Twenty point two point five Summary

Twenty point two point six Exercises

A point one Geometry and Linear Algebraic Operations

A point one point one Geometry of Vectors

A point one point two Dot Products and Angles

Cosine Similarity

A point one point three Hyperplanes

A point one point four Geometry of Linear Transformations

A point one point five Linear Dependence

A point one point six Rank

A point one point seven Invertibility

Numerical Issues

A point one point eight Determinant

A point one point nine Tensors and Common Linear Algebra Operations

Common Examples from Linear Algebra

Expressing in Code

A point one point ten Summary

A point one point eleven Exercises

A point two Eigendecompositions

A point two point one Finding Eigenvalues

An Example

A point two point two Decomposing Matrices

A point two point three Operations on Eigendecompositions

A point two point four Eigendecompositions of Symmetric Matrices

A point two point five Gershgorin Circle Theorem

A point two point six A Useful Application: The Growth of Iterated Maps

Eigenvectors as Long Term Behavior

Behavior on Random Data

Relating Back to Eigenvectors

An Observation

Fixing the Normalization

A point two point seven Discussion

A point two point eight Summary

A point two point nine Exercises

A point three Single Variable Calculus

A point three point one Differential Calculus

A point three point two Rules of Calculus

Common Derivatives

Derivative Rules

Linear Approximation

Higher Order Derivatives

Taylor Series

A point three point three Summary

A point three point four Exercises

A point four point one Higher-Dimensional Differentiation

A point four point two Geometry of Gradients and Gradient Descent

A point four point three A Note on Mathematical Optimization

A point four point four Multivariate Chain Rule

Another more subtle example of the chain rule.

A point four point five The Backpropagation Algorithm

A point four point six Hessians

A point four point seven A Little Matrix Calculus

A point four point eight Summary

A point four point nine Exercises

A point five Integral Calculus

A point five point one Geometric Interpretation

A point five point two The Fundamental Theorem of Calculus

A point five point three Change of Variables

A point five point four A Comment on Sign Conventions

A point five point five Multiple Integrals

A point five point six Change of Variables in Multiple Integrals

A point five point seven Summary

A point five point eight Exercises

A point six Random Variables

A point six point one Continuous Random Variables

From Discrete to Continuous

Probability Density Functions

Cumulative Distribution Functions

Means

Overview

A detailed exploration of deep learning concepts, tools, and methodologies, providing readers with essential knowledge in data preparation, neural networks, and implementation strategies.

Key Points

1. Covers foundational topics in machine learning and deep learning
2. Discusses the importance of data preprocessing and model evaluation
3. Introduces various neural network architectures and their applications
4. Highlights issues like generalization, overfitting, and practical implementations
5. Emphasizes the relevance of advanced techniques like convolutional and recurrent neural networks

Details

Authors: Aston Zhang, Zachary C. Lipton, Mu Li, Alexander J. Smola
Category: Technology and Engineering
