EA - Open Technical Challenges around Probabilistic Programs and Javascript by Ozzie Gooen
Update: 2023-08-26
Description
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Open Technical Challenges around Probabilistic Programs and Javascript, published by Ozzie Gooen on August 26, 2023 on The Effective Altruism Forum.
While working on Squiggle, we've encountered many technical challenges in writing probabilistic functionality with Javascript. Some of these challenges are solved in Python and must be ported over, and some apply to all languages.
We think the following tasks could be good fits for others to tackle. These are fairly isolated and could be done in contained NPM packages or similar. The solutions would be useful for Squiggle and might be handy for others in the Javascript ecosystem as well. Advice and opinions are also appreciated.
This post was quickly written, as it's for a narrow audience and might get outdated. We're happy to provide more rigor and context if requested. Let us know if you are interested in taking any of them on and could use some guidance!
For those not themselves interested in contributing, this might be useful for giving people a better idea of the sorts of challenges we at QURI work on.
1. Density Estimation
Users often want to convert samples into continuous probability distribution functions (PDFs). This is difficult to do automatically.
The standard approach of basic Kernel Density Estimation can produce poor fits on multimodal or heavily skewed data.
a. Variable kernel density estimation
Simple KDE algorithms use a constant bandwidth. There are multiple methods for estimating this. One common method is Silverman's rule of thumb. In practice, using Silverman's rule of thumb with one single bandwidth performs poorly for multimodal or heavily skewed distributions.
Squiggle performs log KDE for heavily skewed distributions, but this only helps so much, and this strategy comes with various inconsistencies.
There's a set of algorithms for variable kernel density estimation or adaptive bandwidth choice, which seems more promising. Another option is the Sheather-Jones method, which existing python KDE libraries use. We don't know of good Javascript implementations of these algorithms.
b. Performant KDE with non-triangle shapes
Squiggle now uses a triangle kernel for speed. Fast algorithms (FFT) should be possible, with better kernel shapes.
See this thread for some more discussion.
c. Cutoff Heuristics
One frequent edge-case is that many distributions have specific limits, often at 0. There might be useful heuristics like, "If there are no samples below zero, then it's very likely there should be zero probability mass below zero, even if many samples are close and the used bandwidth would imply otherwise."
See this issue for more information.
d. Discrete vs. continuous estimation
Sometimes, users pass in samples from discrete distributions or mixtures of discrete and continuous distributions. In these cases, it's helpful to have heuristics to detect which data might be meant to be discrete and which is meant to be continuous. Right now, in Squiggle, we do this by using simple heuristics of repetition - if multiple samples are precisely the same, we assume they represent discrete information. It's unclear if there are any great/better ways of doing this heuristically.
e. Multidimensional KDE
Eventually, it will be useful to do multidimensional KDE. It might be more effective to do this in WebAssembly, but this would of course, introduce complications.
2. Quantiles to Distributions, Maybe with Metalog
A frequent use case is: "I have a few quantile/CDF points in mind and want to fit this to a distribution. How should I do this?"
One option is to use the Metalog distribution. There's no great existing Javascript implementation of Metalog yet. Sam Nolan made one attempt, but it's not as flexible as we'd like. (It fails to convert many points into metalog distributions).
Jonas Moss thinks we can do better than...
While working on Squiggle, we've encountered many technical challenges in writing probabilistic functionality with Javascript. Some of these challenges are solved in Python and must be ported over, and some apply to all languages.
We think the following tasks could be good fits for others to tackle. These are fairly isolated and could be done in contained NPM packages or similar. The solutions would be useful for Squiggle and might be handy for others in the Javascript ecosystem as well. Advice and opinions are also appreciated.
This post was quickly written, as it's for a narrow audience and might get outdated. We're happy to provide more rigor and context if requested. Let us know if you are interested in taking any of them on and could use some guidance!
For those not themselves interested in contributing, this might be useful for giving people a better idea of the sorts of challenges we at QURI work on.
1. Density Estimation
Users often want to convert samples into continuous probability distribution functions (PDFs). This is difficult to do automatically.
The standard approach of basic Kernel Density Estimation can produce poor fits on multimodal or heavily skewed data.
a. Variable kernel density estimation
Simple KDE algorithms use a constant bandwidth. There are multiple methods for estimating this. One common method is Silverman's rule of thumb. In practice, using Silverman's rule of thumb with one single bandwidth performs poorly for multimodal or heavily skewed distributions.
Squiggle performs log KDE for heavily skewed distributions, but this only helps so much, and this strategy comes with various inconsistencies.
There's a set of algorithms for variable kernel density estimation or adaptive bandwidth choice, which seems more promising. Another option is the Sheather-Jones method, which existing python KDE libraries use. We don't know of good Javascript implementations of these algorithms.
b. Performant KDE with non-triangle shapes
Squiggle now uses a triangle kernel for speed. Fast algorithms (FFT) should be possible, with better kernel shapes.
See this thread for some more discussion.
c. Cutoff Heuristics
One frequent edge-case is that many distributions have specific limits, often at 0. There might be useful heuristics like, "If there are no samples below zero, then it's very likely there should be zero probability mass below zero, even if many samples are close and the used bandwidth would imply otherwise."
See this issue for more information.
d. Discrete vs. continuous estimation
Sometimes, users pass in samples from discrete distributions or mixtures of discrete and continuous distributions. In these cases, it's helpful to have heuristics to detect which data might be meant to be discrete and which is meant to be continuous. Right now, in Squiggle, we do this by using simple heuristics of repetition - if multiple samples are precisely the same, we assume they represent discrete information. It's unclear if there are any great/better ways of doing this heuristically.
e. Multidimensional KDE
Eventually, it will be useful to do multidimensional KDE. It might be more effective to do this in WebAssembly, but this would of course, introduce complications.
2. Quantiles to Distributions, Maybe with Metalog
A frequent use case is: "I have a few quantile/CDF points in mind and want to fit this to a distribution. How should I do this?"
One option is to use the Metalog distribution. There's no great existing Javascript implementation of Metalog yet. Sam Nolan made one attempt, but it's not as flexible as we'd like. (It fails to convert many points into metalog distributions).
Jonas Moss thinks we can do better than...
Comments 
In Channel







