Sitemap - 2023 - The Daily Ink ✍️
📣 The Daily Ink is Going Even Less Daily
Pipeline parallelism as a band-aid on memory limitations
Large models as engines of computation
Distilling models makes them feasible to use
Quadratic Complexity holds back the legendary transformer (Part 2)
Quadratic Complexity holds back the legendary transformer (Part 1)
Composing models together makes them powerful
Multi-modal models are the future
Deep learning solves a 20-year-long unsolved problem in science (Part 2)
Deep learning solves a 20-year-long unsolved problem in science (Part 1)
Models can do calculus better than you
Is a group of expert models better than one very smart model?
Winning the AI Lottery by Buying a Lot of Tickets
Using information retrieval for code generation
Meta's new model is small and mighty
Models can control robots just like humans
Anthropic makes AI that teaches itself ethics
Models can magically learn new skills at scale
Discovering a better optimization algorithm with evolution
Talking to models requires special prompts that help them think sequentially
Teaching LLMs to use tools and not suck at math
English is just math in prettier clothing
The secret to good writing is editing
Solving context length constraints by distillation
A large language model for SCIENCE
Optimal parallelism in ML training is possible, says ALPA
Google makes a language model for music
Google's LaMDA model is too convincing, and a researcher is fired
Teaching computers to think in abstractions
The secret sauce behind ChatGPT
FlashAttention challenges ML researchers to think about systems-level improvements
Make models smarter not larger, with data pruning
DeepMind attempts to make AI that can do anything
Training Compute-Optimal Large Language Models
Gradient Descent: The Ultimate Optimizer
Cramming: Training a Language Model on a Single GPU in One Day