PyTorch Developer Podcast

Author: Edward Yang, Team PyTorch


Description

The PyTorch Developer Podcast is a place for the PyTorch dev team to do bite-sized (10-20 min) episodes about all sorts of internal development topics in PyTorch.
29 Episodes
CMake

2021-06-14 · 17:49

Why is PyTorch's build so g-dang complicated? How can you avoid having to deal with cmake at all? If you do have to deal with cmake, what are the most important things to know? And if you were going to improve our cmake, how would you go about doing it?

Further reading:
- The official CMake documentation is a great help and well worth reading: https://cmake.org/documentation
- If you work in torch/csrc, chances are you'll need to edit this file: https://github.com/pytorch/pytorch/blob/master/tools/build_variables.bzl

Liner notes:
- Multiple build systems: cmake, buck, xplat buck, ovrsource buck, bazel.
- tools/build_variables.bzl is read from cmake! (append_filelist) But it is not used uniformly for all components (ouch!).
- The ATen and Caffe2 build systems are mashed together (e.g., the main library libtorch_cpu is defined in caffe2/CMakeLists.txt).
- cmake has very little syntax; "everything is a function". This means you can look up constructs relatively easily; even if() is a command.
- The general cmake model: set a bunch of variables, run a bunch of commands. cmake is VERY GREPPABLE.
- But not everything is in CMakeLists.txt; check *.cmake files too. The directory structure makes no sense, so you really need to grep (there is a lot of set(... PARENT_SCOPE) to propagate things upward).
- Renaming a file? Grep for it.
- The primary hazard of refactoring: you need to make sure all the variables are set up at the new location.
- Many directories are not recursive globs, so beware when adding new directories.
- Old-school cmake: literally everything is stuffed into variables (CMAKE_CXX_FLAGS). New-school cmake: attach things to targets, and they propagate when you depend on those targets (public/private dependencies).
- add_library is the most important thing.
- Don't randomly change things and pray: have hypotheses and test them.
torchdeploy

2021-06-11 · 13:42

torchdeploy is a way of running multiple Python interpreters inside the same process. It can be used to deploy Python PyTorch programs in situations where the GIL, rather than the CPython interpreter itself, is the problem. How does it work, and what kind of challenges does it pose for people who want to write code that calls from C++ to Python?

Further reading:
- How the torchdeploy build system works: https://dev-discuss.pytorch.org/t/torch-deploy-the-build/238
- Description of the single-interpreter-per-Tensor invariant: https://github.com/pytorch/pytorch/issues/57756
- Recent work on making it possible to load C extensions into torchdeploy: https://dev-discuss.pytorch.org/t/running-multiple-python-interpreters-via-custom-dynamic-loading/241
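To see why the GIL, and not interpreter overhead, is the bottleneck torchdeploy targets, here is a PyTorch-free sketch (standard library only; timings are machine-dependent) comparing CPU-bound threads in one interpreter against worker processes that each get their own interpreter and their own GIL:

```python
import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def spin(n):
    # CPU-bound pure-Python work; a thread running this holds the GIL.
    total = 0
    for i in range(n):
        total += i
    return total

if __name__ == "__main__":
    work = [10_000_000] * 4
    for pool_cls in (ThreadPoolExecutor, ProcessPoolExecutor):
        start = time.time()
        with pool_cls(max_workers=4) as pool:
            list(pool.map(spin, work))
        print(f"{pool_cls.__name__}: {time.time() - start:.2f}s")
    # The thread pool is no faster than running serially (one GIL, so only one
    # thread executes bytecode at a time); the process pool, with one
    # interpreter and one GIL per worker, scales with the number of workers.
```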
C++ frontend

2021-06-10 · 17:07

What's the C++ frontend? Why is avoiding templates so important? Why is Tensor a reference type? How do we simulate keyword arguments in C++? Where did the nn Module support in the C++ API come from? Why did we reimplement all modules in C++? How are modules implemented in C++? What are some performance challenges of writing Python in C++, and how are we working around them?

Further reading:
- C++ frontend tutorial: https://pytorch.org/tutorials/advanced/cpp_frontend.html
- Writing Python in C++ (a manifesto): https://github.com/pytorch/pytorch/wiki/Writing-Python-in-cpp-(a-manifesto)
- MaybeOwned PR: https://github.com/pytorch/pytorch/pull/53317
PyObject preservation

2021-06-09 · 16:19

Given two separately refcounted objects, how can you arrange for each of them to stay live so long as the other is live? Why doesn't just having a strong-strong or strong-weak reference between the two objects work? What is object resurrection in CPython? What's a finalizer and why does it make things more complicated? How does Python GC work?

Further reading:
- PyObject preservation PR: https://github.com/pytorch/pytorch/pull/56017
- Sam Gross's original PoC, which works fine if the two objects in question are both PyObjects: https://github.com/colesbury/refcount/
- PEP 442, Safe object finalization: https://www.python.org/dev/peps/pep-0442/
- Essential reading about Python GC: https://devguide.python.org/garbage_collector/
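Object resurrection is easy to reproduce in plain CPython (no PyTorch involved). In this sketch a finalizer makes its object reachable again; per PEP 442 the finalizer runs at most once, so the second time the reference count drops to zero the object is simply freed:

```python
import gc

survivors = []

class Zombie:
    def __del__(self):
        # Resurrection: the finalizer creates a brand new strong reference,
        # so the object becomes reachable again instead of being freed.
        survivors.append(self)

z = Zombie()
del z                   # refcount hits zero, __del__ runs, object resurrects
print(len(survivors))   # 1 -- still alive
survivors.clear()       # drop the last reference again
gc.collect()
# The finalizer is not run a second time (PEP 442: at most once per object),
# so this time the object is simply deallocated.
```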
Mobile selective build

2021-06-08 · 16:02

What is mobile selective build? Why are we so obsessed with reducing binary size? How does selective build work? Why doesn't static linking just work? Why can't you just read out the ops used in a TorchScript model to determine what operators you actually need? What are the tradeoffs of statically determining the operator dependency graph versus tracing? What's up with the SELECTIVE_NAME macro? How the heck does selective build work at all when you have multiple mobile apps in a single Buck build system? What takeaways should I have as a regular PyTorch developer?

Further reading:
- Official open source mobile documentation on custom selective builds: https://pytorch.org/mobile/android/#custom-build
- How to rebuild the op dependency yaml: https://github.com/pytorch/pytorch/blob/master/tools/code_analyzer/build.sh

Liner notes:
- Binary size is at a premium; ship only what you actually need.
- Big idea: get the ops your model needs, then apply that to the build of PyTorch.
- Getting the ops your model needs: with TorchScript you can read them out directly from the model itself (see the sketch after these notes). But what if ops use other ops? You need a dependency graph. This is done with LLVM static analysis (Jiakai), with a (possibly inaccurate) yaml checked in for an easy kickstart if you don't want to run the pass (updated by a bot, not operational since Feb; recommend rebuilding from scratch if you run into trouble).
- Other possibility: dynamic tracing. Pro: no need for a dependency graph, just look at what was called; works for dtypes. Con: you need representative inputs, and if there's control flow you might not cover everything.
- Applying this to the build of PyTorch: ordinarily static linking ensures stuff that isn't used gets pruned, but this doesn't work with distributed operator registration based on static initializers. How do we do it? With codegen, just don't generate it; without codegen, SELECTIVE_NAME (C++ doesn't support string in macro).
- Build system integration: Buck constraint is only one library, so we generate multiple copies of the glue library. Alternative: atomize the library into one per operator. Caffe2 used to do this; each library takes a long time to build (~1 min) and it crashes Xcode because there are too many.
- Common hiccups: you modify implementation details and some op is/isn't called anymore ~> error! Usually this just means some yaml needs regenerating. PyTorch Edge developers are very friendly and can help.
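To make the "read the ops out of the model" idea concrete, here is a minimal sketch (not the actual selective build tooling; the Model class is made up) that walks a TorchScript graph and lists the operators it calls:

```python
import torch

class Model(torch.nn.Module):
    def forward(self, x):
        return torch.relu(x).sum()

scripted = torch.jit.script(Model())

# Collect the operator names that appear in the TorchScript graph. This only
# sees ops reachable from the graph itself; it says nothing about which ops
# *those* ops call internally, which is why selective build also needs an
# operator dependency graph (or dynamic tracing).
ops = {node.kind() for node in scripted.graph.nodes()}
print(sorted(op for op in ops if op.startswith("aten::")))
# e.g. ['aten::relu', 'aten::sum']
```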
torch.nn

2021-06-07 · 14:18

What goes into the implementation of torch.nn? Why do NN modules exist in the first place? What's the function of Parameter? How do modules actually track all the parameters in question? What is all of the goop in the top level NN module class? What are some new developments in torch.nn modules? What are some open problems with our modules?

Further reading:
- Implementation of nn.Module: https://github.com/pytorch/pytorch/blob/master/torch/nn/modules/module.py
- nn.Module is complicated, and that means it's sometimes a bit slow. Some analysis at https://dev-discuss.pytorch.org/t/overhead-in-nn-module-causing-massive-slowdowns-compared-to-raw-cublas-or-torchscript/110
- Lazy modules PR https://github.com/pytorch/pytorch/pull/44538 and factory kwargs https://github.com/pytorch/pytorch/pull/54508

Liner notes:
- Python for hackability (the C++ frontend is a reimplementation).
- Parameters: parameter collection (for optimization); buffers are not considered optimizable (see the sketch below).
- Modules: functorial operation (_apply); jit script: staged computation (__init__ is not scripted); __call__ wraps forward (extra instrumentation); serialization / state_dict.
- New stuff: device kwarg (Joel Schlosser); lazy modules (emcastillo).
- Open problems: parameter initialization.
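A small sketch of the tracking behavior described above (the Affine module is made up for illustration): assigning a Parameter registers it with the module, while a buffer travels with the module but is never handed to the optimizer:

```python
import torch
from torch import nn

class Affine(nn.Module):
    def __init__(self, n):
        super().__init__()
        # Assigning a Parameter goes through nn.Module.__setattr__, which
        # records it in self._parameters so parameters() can find it later.
        self.weight = nn.Parameter(torch.randn(n, n))
        # Buffers travel with the module (state_dict, .to(device)) but are
        # not returned by parameters(), so they aren't optimized.
        self.register_buffer("call_count", torch.zeros((), dtype=torch.long))

    def forward(self, x):
        self.call_count += 1
        return x @ self.weight

m = Affine(3)
print([name for name, _ in m.named_parameters()])  # ['weight']
print([name for name, _ in m.named_buffers()])     # ['call_count']
opt = torch.optim.SGD(m.parameters(), lr=0.1)       # collects only the parameters
```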
Code generation

2021-06-04 · 16:51

Why does PyTorch use code generation as part of its build process? Why doesn't it use C++ templates? What things is code generation used for? What are the pros/cons of using code generation? What are some other ways to do the same things we currently do with code generation?

Further reading:
- Top level file for the new code generation pipeline: https://github.com/pytorch/pytorch/blob/master/tools/codegen/gen.py
- Out-of-tree external backend code generation from Brian Hirsh: https://github.com/pytorch/xla/issues/2871
- Documentation for native_functions.yaml: https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/README.md (have you seen this README before? Yes, you've seen this README before. Imma post it again.)

Outline:
- High level: reduce the amount of code in PyTorch, easier to develop. Strongly typed Python.
- Stuff we're using codegen for (meta point: stuff C++ metaprogramming can't do):
  - C++ APIs (functions, methods on classes), especially for forwarding (operator dot doko)
  - Prototypes for C++ to implement
  - YAML files used by external frameworks for binding (accidental)
  - Python arg parsing
  - pyi generation
  - Autograd classes for saving saved data
  - Otherwise complicated constexpr computation (e.g., parsing JIT schema)
- Pros:
  - Better surface syntax (native_functions.yaml, JIT schema, derivatives.yaml)
  - Better error messages (template error messages are famously bad)
  - Easier to organize complicated code, especially with nontrivial input data structures
  - Easier to debug by looking at the generated code
- Cons:
  - Not as portable (a template can be used by anyone)
  - Less good modeling for C++ type based metaprogramming (we've replicated a crappy version of the C++ type system in our codegen)
- Counterpoints in the design space:
  - C++ templates: just as efficient
  - Boxed fallback: simpler, less efficient
  - Open question: can you have the best of both worlds, e.g., with partially evaluated interpreters?
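To make the idea concrete, here is a toy sketch of "codegen in Python" with a made-up spec format and made-up function names (my_add, my_relu); it is not the real pipeline in tools/codegen/gen.py, just an illustration of why generating C++ from Python data structures is pleasant to work with:

```python
# A toy illustration of the code generation idea, NOT PyTorch's real pipeline
# (the real one reads native_functions.yaml via tools/codegen/gen.py). The
# point is that the generator is ordinary, strongly-typeable Python, so
# complicated logic and good error messages are easy.
specs = [
    {"name": "my_add", "args": ["const Tensor& self", "const Tensor& other"]},
    {"name": "my_relu", "args": ["const Tensor& self"]},
]

def emit_decl(spec):
    arg_list = ", ".join(spec["args"])
    return f"TORCH_API Tensor {spec['name']}({arg_list});"

print("\n".join(emit_decl(s) for s in specs))
# TORCH_API Tensor my_add(const Tensor& self, const Tensor& other);
# TORCH_API Tensor my_relu(const Tensor& self);
```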
Why is autograd so complicated? What are the constraints and features that go into making it complicated? What's up with it being written in C++? What's with derivatives.yaml and code generation? What's going on with views and mutation? What's up with hooks and anomaly mode? What's reentrant execution? Why is it relevant to checkpointing? What's the distributed autograd engine?

Further reading:
- Autograd notes in the docs: https://pytorch.org/docs/stable/notes/autograd.html
- derivatives.yaml: https://github.com/pytorch/pytorch/blob/master/tools/autograd/derivatives.yaml
- Paper on the autograd engine in PyTorch: https://openreview.net/pdf/25b8eee6c373d48b84e5e9c6e10e7cbbbce4ac73.pdf
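A couple of the features discussed here are easy to poke at from Python. The sketch below shows a backward hook and gradient checkpointing (the recompute-in-backward trick that makes reentrant execution of the engine relevant); the block function is made up for illustration:

```python
import torch
from torch.utils.checkpoint import checkpoint

x = torch.randn(3, requires_grad=True)
y = x * 2
# A hook lets you observe (or replace) the gradient as the engine reaches y.
y.register_hook(lambda grad: print("grad w.r.t. y:", grad))
y.sum().backward()
print(x.grad)            # tensor([2., 2., 2.])

# Checkpointing trades compute for memory: intermediates inside `block` are
# not saved during forward; they are recomputed when backward reaches the
# checkpoint, which is where reentrant execution of the engine comes in.
def block(t):
    return torch.sin(t) * torch.cos(t)

z = torch.randn(5, requires_grad=True)
out = checkpoint(block, z)   # newer releases also take a use_reentrant= argument
out.sum().backward()
print(z.grad.shape)          # torch.Size([5])
```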
__torch_function__

2021-06-02 · 17:00

What is __torch_function__? Why would I want to use it? What does it have to do with keeping extra metadata on Tensors or torch.fx? How is it implemented? Why is __torch_function__ a really popular way of extending functionality in PyTorch? What makes it different from the dispatcher extensibility mechanism? What are some downsides of it being written this way? What are we doing about it?

Further reading:
- __torch_function__ RFC: https://github.com/pytorch/rfcs/blob/master/RFC-0001-torch-function-for-methods.md
- One of the original GitHub issues tracking the overall design discussion: https://github.com/pytorch/pytorch/issues/24015
- Documentation for using __torch_function__: https://pytorch.org/docs/stable/notes/extending.html#extending-torch
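A minimal example of the mechanism, along the lines of the extending-PyTorch docs (the LoggingTensor class is made up for illustration):

```python
import torch

class LoggingTensor(torch.Tensor):
    # Every torch.* function and Tensor method involving a LoggingTensor is
    # routed here first, before the C++ dispatcher ever sees the call.
    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        print(f"called {func.__name__}")
        return super().__torch_function__(func, types, args, kwargs or {})

x = torch.randn(3).as_subclass(LoggingTensor)
y = x + 1          # prints something like "called add"
z = torch.sum(y)   # prints something like "called sum"
print(type(z))     # the subclass is preserved through the operations
```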
TensorIterator

2021-06-01 · 17:50

You walk into the whiteboard room to do a technical interview. The interviewer looks you straight in the eye and says, "OK, can you show me how to add the elements of two lists together?" Confused, you write down a simple for loop that iterates through each element and adds them together. Your interviewer rubs his hands together evilly and cackles, "OK, let's make it more complicated."

What does TensorIterator do? Why the heck is TensorIterator so complicated? What's going on with broadcasting? Type promotion? Overlap checks? Layout? Dimension coalescing? Parallelization? Vectorization?

Further reading:
- PyTorch TensorIterator internals: https://labs.quansight.org/blog/2020/04/pytorch-tensoriterator-internals/
- Why is TensorIterator so slow: https://dev-discuss.pytorch.org/t/comparing-the-performance-of-0-4-1-and-master/136
- Broadcasting https://pytorch.org/docs/stable/notes/broadcasting.html and type promotion https://pytorch.org/docs/stable/tensor_attributes.html#type-promotion-doc
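A quick illustration of two of TensorIterator's jobs, broadcasting and type promotion, as seen from Python:

```python
import torch

a = torch.ones(3, 1, dtype=torch.float32)
b = torch.arange(4, dtype=torch.int64)

# Broadcasting: shapes (3, 1) and (4,) combine to (3, 4). TensorIterator
# computes the common shape and strides rather than materializing expanded
# copies of the inputs.
out = a + b
print(out.shape)                  # torch.Size([3, 4])

# Type promotion: float32 + int64 -> float32 (floating types win over
# integer types, keeping the float's width).
print(out.dtype)                  # torch.float32
print(torch.result_type(a, b))    # torch.float32
```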
native_functions.yaml

2021-05-28 · 15:32

What does native_functions.yaml have to do with the TorchScript compiler? What multiple use cases is native_functions.yaml trying to serve? What's up with the JIT schema type system? Why isn't it just Python types? What the heck is the (a!) thingy inside the schema? Why is it important that I actually annotate all of my functions accurately with this information? Why is my seemingly BC change to native_functions.yaml actually breaking people's code? Do I have to understand the entire compiler to understand how to work with these systems?

Further reading:
- native_functions.yaml README: https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/README.md
- Tracking issue for serializing default arguments: https://github.com/pytorch/pytorch/issues/54613
- Test for BC-breaking changes in native_functions.yaml: https://github.com/pytorch/pytorch/blob/master/test/backward_compatibility/check_backward_compatibility.py
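A small illustration of what the (a!) alias annotation is telling you, using add_ (whose schema is quoted in the comment). The parse_schema call at the end uses an internal helper, so treat that part as illustrative rather than as a supported API:

```python
import torch

# The (a!) annotation marks an argument that is mutated and aliases the
# output. add_ is the canonical example; its entry in native_functions.yaml
# has the schema:
#   add_.Tensor(Tensor(a!) self, Tensor other, *, Scalar alpha=1) -> Tensor(a!)
x = torch.zeros(3)
y = x.add_(1)
print(y is x)   # True: the return value aliases (in fact, is) the mutated input

# The same schema language can be parsed programmatically (internal helper):
schema = torch._C.parse_schema(
    "aten::add_.Tensor(Tensor(a!) self, Tensor other, *, Scalar alpha=1) -> Tensor(a!)"
)
print(schema)
```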
Serialization

2021-05-27 · 17:06

What is serialization? Why do I care about it? How is serialization done in general in Python? How does pickling work? How does PyTorch implement pickling for its objects? What are some pitfalls of pickling implementation? What do backwards compatibility and forwards compatibility mean in the context of serialization? What's the difference between directly pickling and using torch.save/load? So what the heck is up with JIT/TorchScript serialization? Why did we use zip files? What were some design principles for the serialization format? Why are there two implementations of serialization in PyTorch? Does the fact that PyTorch uses pickling for serialization mean that our serialization format is insecure?

Further reading:
- TorchScript serialization design doc: https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/docs/serialization.md
- Evolution of serialization formats over time: https://github.com/pytorch/pytorch/issues/31877

Code pointers:
- Tensor __reduce_ex__: https://github.com/pytorch/pytorch/blob/de845020a0da39e621db984515bc1cce03f526ea/torch/_tensor.py#L97-L178
- Python side serialization: https://github.com/pytorch/pytorch/blob/de845020a0da39e621db984515bc1cce03f526ea/torch/serialization.py#L384-L499
- C++ side serialization: https://github.com/pytorch/pytorch/tree/master/torch/csrc/jit/serialization
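A quick sketch of the torch.save side of the story: it goes through Python pickling for the objects, but wraps everything in a zip container:

```python
import io
import zipfile
import torch

t = torch.arange(4.0)

# torch.save drives Python's pickling machinery (Tensor.__reduce_ex__ decides
# how a Tensor pickles), but wraps the result in a zip container, which has
# been the default format since PyTorch 1.6.
buf = io.BytesIO()
torch.save(t, buf)

buf.seek(0)
print(zipfile.is_zipfile(buf))   # True: the saved file is a zip archive

buf.seek(0)
print(torch.load(buf))           # tensor([0., 1., 2., 3.])
```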
Continuous integration

2021-05-26 · 16:53

How is our CI put together? What is the history of the CI? What constraints is the CI under? Why does the CI use Docker? Why are build and test split into two phases? Why are some parts of the CI so convoluted? How does the HUD work? What kinds of configurations is PyTorch tested under? How did we decide what configurations to test? What are some of the weird CI configurations? What's up with the XLA CI? What's going on with the Facebook internal builds?

Further reading:
- The CI HUD for viewing the status of master: https://ezyang.github.io/pytorch-ci-hud/build/pytorch-master
- Structure of CI: https://github.com/pytorch/pytorch/blob/master/.circleci/README.md
- How to debug Windows problems on CircleCI: https://github.com/pytorch/pytorch/wiki/Debugging-Windows-with-Remote-Desktop-or-CDB-(CLI-windbg)-on-CircleCI
What's a stacked diff? Why might you want to do it? What does the workflow for stacked diffs with ghstack look like? How do I use interactive rebase to edit earlier diffs in my stack? How can you actually submit a stacked diff to PyTorch? What are some things to be aware of when using ghstack?

Further reading:
- The ghstack repository: https://github.com/ezyang/ghstack/
- A decent explanation of how the stacked diff workflow works on Phabricator, including how to do rebases: https://kurtisnusbaum.medium.com/stacked-diffs-keeping-phabricator-diffs-small-d9964f4dcfa6
Shared memory

2021-05-24 · 10:45

What is shared memory? How is it used in your operating system? How is it used in PyTorch? What's shared memory good for in deep learning? Why use multiple processes rather than one process on a single node? What's the point of PyTorch's shared memory manager? How are allocators for shared memory implemented? How does CUDA shared memory work? What is the difference between CUDA shared memory and CPU shared memory? How did we implement safer CUDA shared memory?

Further reading:
- Implementations of the vanilla shared memory allocator https://github.com/pytorch/pytorch/blob/master/aten/src/TH/THAllocator.cpp and the fancy managed allocator https://github.com/pytorch/pytorch/blob/master/torch/lib/libshm/libshm.h
- Multiprocessing best practices describes some things one should be careful about when working with shared memory: https://pytorch.org/docs/stable/notes/multiprocessing.html
- More details on how CUDA shared memory works: https://pytorch.org/docs/stable/multiprocessing.html#multiprocessing-cuda-sharing-details
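A minimal sketch of CPU shared memory in PyTorch: move a tensor's storage into shared memory, and in-place writes from a worker process become visible to the parent:

```python
import torch
import torch.multiprocessing as mp

def child(t):
    # The child process maps the same shared-memory segment, so this in-place
    # write is visible to the parent without any copying or message passing.
    t.add_(1)

if __name__ == "__main__":
    t = torch.zeros(4)
    t.share_memory_()   # move the tensor's storage into shared memory
    p = mp.Process(target=child, args=(t,))
    p.start()
    p.join()
    print(t)            # tensor([1., 1., 1., 1.])
```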
What is automatic mixed precision? How is it implemented? What does it have to do with mode dispatch keys and fallthrough kernels? What are AMP policies? How is its cast caching implemented? How does torchvision also support AMP? What's up with Intel's CPU autocast implementation?

Further reading:
- The autocast implementation lives at https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/autocast_mode.cpp
- How to add autocast implementations to custom operators that are out of tree: https://pytorch.org/tutorials/advanced/dispatcher.html#autocast
- CPU autocasting PR: https://github.com/pytorch/pytorch/pull/57386
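A short usage sketch using the newer torch.autocast spelling (at the time of this episode the CUDA API was torch.cuda.amp.autocast); which low-precision dtype you get depends on the device:

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(16, 4).to(device)
x = torch.randn(8, 16, device=device)

# Inside the autocast context an autocast dispatch key is enabled; eligible
# ops (matmul-heavy ones like Linear) have their inputs cast to a
# lower-precision dtype, while precision-sensitive ops keep running in float32.
with torch.autocast(device_type=device):
    y = model(x)

print(y.dtype)   # torch.float16 on CUDA; torch.bfloat16 under CPU autocast
```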
Conjugate views

2021-05-20 · 15:27

What are complex numbers? What is conjugation? Why is conjugation so common in linear algebra? Why would we like conjugation to behave similarly to transposition (and why is matrix multiply with a transposed input so fast)? What is a conjugate view? How is it implemented? What's the relationship between views, laziness and call-by-name evaluation?

Further reading:
- Pull request that adds conjugate views: https://github.com/pytorch/pytorch/pull/54987
- The idea of conjugate views originally came up when we were deciding which complex autograd convention to use in https://github.com/pytorch/pytorch/issues/41857 . PyTorch uses the conjugate Wirtinger derivative which, true to its name, involves a lot of conjugations in its formulas.
- Conjugate views are a form of bidirectional lens. This nice presentation explains what the concept is: https://www.cis.upenn.edu/~bcpierce/papers/lenses-etapsslides.pdf
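A small sketch of conjugate-view behavior as it landed in releases after this episode (PyTorch 1.10 and later):

```python
import torch

z = torch.tensor([1 + 2j, 3 - 4j])

# With conjugate views, conj() does no work up front: it returns a view with
# a conjugate bit set, and the conjugation is materialized lazily by whatever
# kernel eventually consumes the tensor.
zc = z.conj()
print(zc.is_conj())                    # True
print(zc.data_ptr() == z.data_ptr())   # True: same storage, nothing was copied

z.mul_(2)    # mutate the base...
print(zc)    # ...and the view sees it: tensor([2.-4.j, 6.+8.j])
```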
What historical constraints and design choices led to the design of Tensor/Storage (and their Impl variants) as they are today? Why do we use intrusive refcounting? Why are we trying to get rid of virtual methods on TensorImpl? Why are there so many frickin' bitfields?

Further reading:
- PyTorch internals blog post: http://blog.ezyang.com/2019/05/pytorch-internals/
- Writing Python in C++, a manifesto: https://github.com/pytorch/pytorch/wiki/Writing-Python-in-cpp-(a-manifesto)
- At time of writing, the breakdown of all fields on TensorImpl: https://github.com/pytorch/pytorch/blob/71f4c5c1f436258adc303b710efb3f41b2d50c4e/c10/core/TensorImpl.h#L2155-L2177
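The Tensor/Storage split is easy to observe from Python: a view is a separate Tensor (a separate TensorImpl with its own sizes, strides, and offset) that shares one refcounted Storage with its base:

```python
import torch

x = torch.arange(6.0)
y = x.view(2, 3)   # a second Tensor, i.e. a second TensorImpl with its own
                   # sizes/strides/storage_offset...

# ...but both TensorImpls hold a reference to the same refcounted Storage,
# so they alias the same underlying bytes.
print(x.data_ptr() == y.data_ptr())   # True
y[0, 0] = 42.0
print(x[0])                            # tensor(42.)
```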
What's the general process by which a new operator is added to PyTorch? Why is this actually something of a rare occurrence? How do you integrate an operator with the rest of PyTorch's system so it can be run end-to-end? What should I expect if I'm writing a CPU and CUDA kernel? What tools are available to me to make the job easier? How can I debug my kernels? How do I test them?

Further reading:
- The README for the native/ directory, where all kernels get put: https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/README.md
- A high level overview of how TensorIterator works: https://labs.quansight.org/blog/2020/04/pytorch-tensoriterator-internals/
- Where OpInfos live: https://github.com/pytorch/pytorch/blob/master/torch/testing/_internal/common_methods_invocations.py
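One of the standard tools for testing a new operator's derivatives is gradcheck, which compares analytical gradients against finite differences; a small usage sketch (using matmul as a stand-in for your new op):

```python
import torch
from torch.autograd import gradcheck

# gradcheck compares the analytical gradients of an operator against
# numerical finite differences; double-precision inputs are required for the
# comparison to be numerically meaningful.
x = torch.randn(4, 3, dtype=torch.double, requires_grad=True)
w = torch.randn(3, 5, dtype=torch.double, requires_grad=True)
print(gradcheck(torch.matmul, (x, w)))   # True if the derivative formula checks out
```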
What is a Variable? Why did it exist as a wrapper in the first place? Why did it get removed? How did we remove it? What are some of the lingering consequences of its removal?

Further reading:
- The release notes of PyTorch 0.4 do a good job explaining the user-visible consequences of the removal at the time, including how we "simulate" concepts on Variable that don't make sense anymore: https://pytorch.org/blog/pytorch-0_4_0-migration-guide/
- Part 1: Removal of the Variable wrapper in C++: https://github.com/pytorch/pytorch/pull/17072
- Part 2: Merge of the Variable and Tensor types in C++: https://github.com/pytorch/pytorch/pull/28620