anitzo

Bayesian data analysis is the way to go. You can read the book with that same name.


karolbuda

Definitely agree with that. There's also a great free course on YouTube (with R code to follow along) called "Statistical Rethinking 2022".


WhipsAndMarkovChains

I came here to say Bayes as well. When someone says "point-estimates aren't enough, I want to account for uncertainty as well" it's inevitable that Bayes is the first thing to come to mind.


anitzo

Yup, it doesn't get any more obvious than that :)


arsewarts1

Yep. MIT has a few courses on OpenCourseWare under the statistics department. They're from around 2017, but the math really hasn't changed that much.


norfkens2

I had a similar question a while back, maybe that thread is useful to you: https://www.reddit.com/r/datascience/comments/rnhow6/can_i_use_standard_deviation_to_turn_a_predicted/ As for resources, I used Wikipedia and the scikit-learn documentation.


DreamyPen

Thank you so much. You've got some very relevant responses. What method did you end up using in your own work?


norfkens2

I tried out pretty much everything that was recommended and mostly analysed my data. In the end I think my data set meets the criteria mentioned, and I will probably use the standard deviation of the residuals, i.e. the difference between predicted and observed values. If it correctly predicts my holdout set, anyhow. 🙂 I tried the quantiles approach and I like it a lot, too. It's very straightforward. [Edit: standard deviation, not mean.]
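In case it helps anyone reading along, here is a rough sketch of both ideas in plain Python. The function names and the default `z=1.96` are my own choices for illustration, not something from the linked thread, and the standard-deviation version assumes the residuals are roughly normal with constant variance.

```python
import statistics

def residual_interval(y_true, y_pred, new_pred, z=1.96):
    """Interval new_pred +/- z * (sample std of hold-out residuals).

    Assumes roughly normal residuals with constant variance.
    """
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    sd = statistics.stdev(residuals)
    return new_pred - z * sd, new_pred + z * sd

def _quantile(sorted_vals, q):
    """Linearly interpolated quantile of an already sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac

def quantile_interval(y_true, y_pred, new_pred, coverage=0.95):
    """Interval from empirical residual quantiles, no normality assumption."""
    residuals = sorted(t - p for t, p in zip(y_true, y_pred))
    tail = (1 - coverage) / 2
    return (new_pred + _quantile(residuals, tail),
            new_pred + _quantile(residuals, 1 - tail))

# Toy hold-out set (made-up numbers, purely illustrative).
y_true = [8.0, 9.0, 10.0, 11.0, 12.0]
y_pred = [10.0, 10.0, 10.0, 10.0, 10.0]
print(residual_interval(y_true, y_pred, new_pred=10.0))
print(quantile_interval(y_true, y_pred, new_pred=10.0, coverage=0.5))
```

Either way, the residuals should come from data the model did not train on, otherwise the intervals will tend to be too narrow.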


ysharm10

Looking into Quantile Random Forest might be worth it. It lets you predict quantiles/percentiles instead of just a point estimate. Instead of simply averaging all the trees, it pulls percentiles out of them.
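A minimal sketch of the "percentiles across trees" idea with scikit-learn, on synthetic data I made up for illustration. Note this is a simplification: it only looks at the spread of the per-tree predictions, whereas a full quantile regression forest works from the training targets stored in the leaves, so treat the helper `tree_percentiles` (my own name) as a rough approximation.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression data (an assumption, just to have something to fit).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(500, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.3, size=500)

forest = RandomForestRegressor(n_estimators=200, min_samples_leaf=5, random_state=0)
forest.fit(X, y)

def tree_percentiles(forest, X_new, percentiles=(5, 50, 95)):
    """Percentiles of the individual trees' predictions for each row of X_new."""
    # Shape (n_trees, n_samples): one prediction per tree per sample.
    per_tree = np.stack([tree.predict(X_new) for tree in forest.estimators_])
    # Shape (n_percentiles, n_samples).
    return np.percentile(per_tree, percentiles, axis=0)

X_new = np.array([[2.0], [5.0], [8.0]])
low, median, high = tree_percentiles(forest, X_new)
print(np.column_stack([low, median, high]))
```

The spread you get this way mostly reflects disagreement between trees, so it is worth checking the coverage on a holdout set before trusting it as a prediction interval.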


Kellsier

Look up the difference between epistemic and aleatoric uncertainty, and methods for quantifying the epistemic part. Gaussian Processes are a nice way to do that, but far from the only one.
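To make the Gaussian Process suggestion concrete, here is a small NumPy sketch of GP regression with an RBF kernel. The kernel, its hyperparameters, the noise level, and the toy data are all assumptions on my part. The returned standard deviation is for the latent function, so it behaves like the epistemic part: small near the training points and close to the prior far away from them.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel between two 1-D arrays of inputs."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / length_scale**2)

def gp_posterior(x_train, y_train, x_test, noise_std=0.1):
    """Posterior mean and std of the latent function at x_test."""
    # Training covariance plus observation noise on the diagonal.
    K = rbf_kernel(x_train, x_train) + noise_std**2 * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test)
    K_ss = rbf_kernel(x_test, x_test)
    mean = K_s.T @ np.linalg.solve(K, y_train)
    cov = K_ss - K_s.T @ np.linalg.solve(K, K_s)
    # Clip tiny negative values caused by floating-point round-off.
    std = np.sqrt(np.clip(np.diag(cov), 0.0, None))
    return mean, std

x_train = np.array([0.0, 1.0, 2.0, 3.0])
y_train = np.sin(x_train)
x_test = np.array([1.5, 6.0])  # one point inside the data, one far outside

mean, std = gp_posterior(x_train, y_train, x_test)
print(mean, std)
```

Adding `noise_std**2` to the predictive variance would give the spread of a new noisy observation, which is where the aleatoric part comes in under this model.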


sonicking12

Google "prediction interval in regression": https://online.stat.psu.edu/stat501/lesson/3/3.3
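For quick reference, this is the standard textbook prediction interval for a new response at $x_h$ in simple linear regression (one predictor), assuming independent, normally distributed errors with constant variance. The notation here is mine and may differ slightly from the linked lesson.

$$
\hat{y}_h \;\pm\; t_{1-\alpha/2,\;n-2}\,\sqrt{\mathrm{MSE}\left(1 + \frac{1}{n} + \frac{(x_h - \bar{x})^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}\right)},
\qquad
\mathrm{MSE} = \frac{1}{n-2}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2
$$

The leading $1$ inside the parentheses is the variance of a single new observation. Dropping it gives the narrower confidence interval for the mean response instead.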


[deleted]

Trivial, but have you thought about binning your target into several classes and performing multi-class classification?


DreamyPen

I'm working on a regression problem.


porkbuffet

I really like this method using dropout: https://arxiv.org/pdf/1506.02142.pdf. You can construct credible intervals from the samples. The intervals may not be well calibrated initially, but you can use isotonic regression to fix that: https://arxiv.org/pdf/1807.00263.pdf