Sitemap - 2023 - The Daily Ink ✍️

📣 The Daily Ink is Going Even Less Daily

Pipeline parallelism as a band-aid on memory limitations

Large models as engines of computation

Distilling models makes them feasible to use

Quadratic Complexity holds back the legendary transformer (Part 2)

Quadratic Complexity holds back the legendary transformer (Part 1)

Composing models together makes them powerful

Multi-modal models are the future

Deep learning solves a 20-year-long unsolved problem in science (Part 2)

Deep learning solves a 20-year-long unsolved problem in science (Part 1)

Models can do calculus better than you

Is a group of expert models better than one very smart model?

Winning the AI Lottery by Buying a Lot of Tickets

Using information retrieval for code generation

Meta's new model is small and mighty

Models can control robots just like humans

Anthropic makes AI that teaches itself ethics

Models can magically learn new skills at scale

Discovering a better optimization algorithm with evolution

Talking to models requires special prompts that help them think sequentially

Teaching LLMs to use tools and not suck at math

English is just math in prettier clothing

The secret to good writing is editing

Solving context length constraints by distillation

A large language model for SCIENCE

Optimal parallelism in ML training is possible, says Alpa

Google makes a language model for music

Google's LaMDA model is too convincing, and a researcher is fired

Teaching computers to think in abstractions

The secret sauce behind ChatGPT

FlashAttention challenges ML researchers to think about systems-level improvements

Make models smarter, not larger, with data pruning

DeepMind attempts to make AI that can do anything

Training Compute-Optimal Large Language Models

Gradient Descent: The Ultimate Optimizer

Cramming: Training a Language Model on a Single GPU in One Day

A Neural Corpus Indexer for Document Retrieval