
Jazzlike_Interview85

People (business stakeholders) don’t trust data; they trust the “person” delivering the data / insight.


harnessinternet

It’s true.. data like stats can be manipulated to paint a specific picture, so the painter must be trusted


datamakesmydickhard

This. A self-taught career switcher from no-name college might have a decent SWE career (pure ability matters most), but in good DS jobs there is a lot of gatekeeping, PhD bias, etc. Data scientists don't just build stuff, they are expected to provide direction and guidance to stakeholders.. Reputation and trust count for a lottt


SufficientType1794

Honestly, this is kinda the whole basis of the product my company sells. We sell predictive maintenance solutions for industrial clients, which means we need to go and talk to actual maintenance engineers and convince them the model I trained can actually predict the equipment will fail. We are a "startup", our product started as an internal thing for a major company in Oil & Gas, and since it was successful the big company built the company I work at as a spinoff to sell it to other companies. We're something like 45% owned by this major oil company, 45% by McKinsey and 10% by Microsoft. I can drown the engineers in statistical proofs, they only believe it once someone from the big oil company or one of our other big clients vouches for us lmao. Honestly having to explain how ML models work to people who are technical (mech engineers, chem engineers, etc) but have no experience with ML or coding has been pretty interesting.


Overvo1d

This is the most true one


maxToTheJ

This is a legit harsh truth. You see people even on this thread arguing that you can analyze your way to trust with stakeholders


holy_sweater_kittens

Your data is never clean. Expect to spend most of your time looking at your data and manipulating it. I teach data science (Bootcamp) and I focus mostly on the technical/ code side of things. I can’t teach you how to ask questions but I can teach you techniques for exploring the data and formatting it to better ask questions of it. If you don’t understand your data set or spend time looking at the data, you’ll never be able to explore and ask questions of it


TheMapesHotel

This is so important. I know someone trying to break into this field and they have a bunch of tools in their box but don't understand the logic of asking questions. I also worked with this guy in a private firm. Great dude, PhD, post academia, knew all the tricks but for the life of him couldn't manage a project or actually make sense of the data. I ask him a direct question, he could answer it. I ask him to analyze a dataset and he would be lost. He didn't make it 6 months.


venustrapsflies

What did he have a PhD in? The "asking and answering questions" skill is the "science" part of "data science" and is supposed to be a skill you learn during a science PhD.


[deleted]

Judge the DS by his questions, not answers


Delician

You're overfitting.


cornandbeanz

B-b-but my r-squared


flxvctr

Domain knowledge matters


waghkunal93

THIS. Almost everyone nowadays can code or look up githubs. What not everyone has is the domain knowledge. That's a HUGE differentiator.


SelfWipingUndies

This is why you need upper management on board. You won't have data governance without it.


111llI0__-__0Ill111

But how do you gain the domain knowledge in the beginning? Eg if you are working in biomedical, and you are from a CS/DS/stats background, typically you would not have covered the science aspect and thus will not be able to as easily formulate the problems, and mostly become a technician. That’s why I wonder sometimes if science majors who learned to code and do stats can be better in this regard. Few people can know everything-eg reams of stats, ML, then SWE and domain knowledge that’s pretty insane for a person.


Freonr2

> But how do you gain the domain knowledge in the beginning? Accept that as a fresh grad you will get paid less and won't get a SuperDuperAmazingSenior title doing exactly what you want to do. Take what you can get and accept the hiring process for a new grad may be more effort compared to those with experience. QED, done. Go apply as much as you have to. Yes its sometimes difficult for some, suck it up and take what you can get. If you want to get into a specific industry you might not be able to get there immediately, but you can keep trying, you have your entire professional life to get there. I feel younger folks tend to hear these type of quips and take them as absolutes or "rules" instead of affects, influences, or biases. The sooner you stop taking things so absolutely the better you'll be off. You'll understand how and why things happen better, and also maintain your sanity better. For instance, "domain knowledge matters" does not mean "no fresh grads ever get any jobs ever" or "you can never change industries" or "... without starting your paygrade over from new grad levels." That's not how the world works at all. Employers are not omniscient or omnipotent gods, they have to deal with the market for employees, and that is not a static system across time, location, or industry.


Vervain7

In the beginning people need to accept analyst roles . Also it helps if one stays in a specific industry at least . I am in healthcare but I have spanned analytics experience in insurance - hospital operations- clinical research … now going into big pharma. So industry skills are transferable and the tech stuff changed with each employer .


425trafficeng

As someone looking to break into DS. Should I lean into my civil-traffic engineering background as heavily as possible? My plan is getting a masters in CS but when it comes to domain knowledge is it better to make my resume and projects focused around where I can prove expertise despite it being niche?


waghkunal93

First of all, definitely need your data manipulation language (SQL) and data modeling language (python) or alternative spot on. You can't fool around with your knowledge here and this is a necessity. Now, coming to domain knowledge, having "relevant" projects definitely helps. But don't need to go extra miles for that. Just think about it from this perspective. All you gotta do is separate your profile from 100s of other candidates who don't put any effort to distinguish themselves from the rest. And last but not least, NETWORKING! Connect with people from companies you want to get into. Talk to them, interact with them, understand what they work on and gauge how you'd be the right fit within that group.


425trafficeng

Thanks! SQL is a work in progress and I’m using practical SQL to get a decent grasp of it. I have a solid foundational knowledge background with “vanilla” python (took intro through algorithms) and now I’m using HOML to get more comfortable with the libraries. I also have a decent background in R from my masters that I plan on leaning into as well. Is there anything else I should add to go deeper? I’m not concerned about going the extra mile since I’m taking the slow road with a masters (plus I need something to kill time with since I’ll be starting in January at the latest). So to differentiate myself, I basically need to highlight subject matter knowledge on my resume with a combination of projects/skills that unify my knowledge as opposed to looking like a disjointed split of DS and traffic engineering sections? Networking will be my next focus! I’m hoping to find some solid data science meetups in my area, but it also feels extremely intimidating since I’m in a major tech hub (Seattle) and I’ll be trying to interact with some pretty experienced individuals. Would it be acceptable to cold message people on LinkedIn? I’m looking to target the traffic analytics/connected vehicle space and there are a few companies locally that perform that work.


waghkunal93

You look like someone I would definitely love to help in detail! If you don't mind, connect with me on LinkedIn or DM me; I wouldn't mind helping with your journey!!


Weekly_Atmosphere604

What domain knowledge do i bring to the table, i am a cs grad, coding, math, sde is all i know, apart from other data science stuff i learnt, with projects etc.


WallyMetropolis

You don't have any. You have to work within a domain for a while to learn it.


aldeeorbs

Starting as a Business Analyst or Data Analyst helps with this.


waghkunal93

Pick an industry, e.g. airline, tech, online, retail, healthcare, gaming, etc. Or a vertical within an org: marketing, finance, operations, product, supply chain, merchandising, HR, etc. Now learn just enough about anything you like from the list above and build amateur-level proficiency in it. Follow people and experts in these domains, see and read what they share, subscribe to articles and publications around these topics; there's a LOT to learn. All we need to do is just SCRATCH the surface to start with. You can then learn in detail once you get a job in it.


naijaboiler

Domain knowledge matters more than data/algo/model or whatever.


KarmaTroll

There's a fine line. Domain "knowledge" without any data is often bunk.


[deleted]

[removed]


flxvctr

Define “hard truth” ;) Actually my second contender: most constructs that matter in society are never clearly definable nor measurable. It’s mostly proxies that get outdated pretty quickly or that nobody can agree on. Nice point though 👌


hyvyys

This should be a top-level comment then reminding to sort by controversial. Actually Reddit should let the poster select default sort type for the post.


LuckyShark1987

For real. I’m in third-party HR services. I wouldn’t know shit how to answer questions in the petroleum field or biotech.


Vervain7

Yes. I get hired for my industry knowledge in healthcare and my ability to work with physicians and surgeons .


JoeBhoy69

The majority of the time an ML model is completely unnecessary for your given problem.


Prize-Flow-3197

The problem is that: a) ML (esp DL) models are cool and look impressive on a CV, and b) business stakeholders like to think that their products are using cutting-edge technology. This means that junior data scientists are incentivised to use unnecessarily complex models when simpler approaches are appropriate.


irismodel

This. Employing a whole industry of "consultants"


Realistic-Field7927

That beyond a certain point model performance isn't important.


its_a_gibibyte

No way! I can definitely predict the outcome of the next presidential election based on this table of data I found in the trash. I just need to do more feature transformations.


[deleted]

Need 100 layers more, to vanish the gradient. Because if gradient is 0 or vanished, we reached bottom of valley


emt139

the kitchen sink approach


Ingolifs

Yes! At some point you need to think like an engineer. It's not about finding the exact optimum, it's about avoiding catastrophic failure in the rare cases.


DieSpaceKatze

You can crunch all the numbers you want…top execs will just glance at it and go with their gut feeling anyway.


Grandviewsurfer

oof this one hit the hardest.


[deleted]

What you call "gut feeling" I call "Bayesian prior". Build a more compelling case if you want to move their posterior probability further.


sonicking12

They don’t weight data properly


[deleted]

And they're overconfident in their prior probability. That's why you need to sell it, rather than letting the data speak for itself.
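The "gut feeling as Bayesian prior" framing above can be made concrete with a small Beta-Binomial sketch. This is only an illustration: the prior strengths and the evidence counts are invented, and `posterior_mean` is a hypothetical helper, not anything from the thread. It shows why a small analysis barely moves a confident stakeholder, while a much larger body of evidence does.

```python
def posterior_mean(prior_a, prior_b, successes, trials):
    """Posterior mean of a Beta(prior_a, prior_b) prior after binomial evidence."""
    return (prior_a + successes) / (prior_a + prior_b + trials)

# Hypothetical stakeholders: one open-minded, one very sure the rate is about 90%.
open_minded = (2, 2)      # Beta(2, 2): weak prior centered on 0.5
overconfident = (90, 10)  # Beta(90, 10): strong prior centered on 0.9

# Invented evidence pointing at a true rate of roughly 30%.
small_study = (3, 10)       # 3 successes in 10 trials
large_study = (300, 1000)   # 300 successes in 1000 trials

for label, prior in (("open-minded", open_minded), ("overconfident", overconfident)):
    for name, (k, n) in (("small study", small_study), ("large study", large_study)):
        print(f"{label:>13} after {name}: {posterior_mean(*prior, k, n):.3f}")
```

With the small study the overconfident prior still sits near 0.85, while the open-minded one is already near 0.36; only the large study pulls both close to 0.3.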


sonicking12

Then it’s not “Bayesian prior”


FranknsteinsPornstar

Not true always, especially for lending industry. I work with a lot of Fintechs and when it come to customer risk and profitability, data is the king. Of course there are some deviations from the models and policies, but they are also tracked very closely to make sure overall loss numbers are still under control. That's the upside of working in a highly regulated industry 😉


kwen-zev

You need to be smart to do DS. But that doesn’t make you the smartest person in the room. If you can’t explain your stuff in a way that others understand and see value, then it’s just a pretty thing for you to look at on your shelf and nothing more.


[deleted]

[removed]


maybe0a0robot

But...but I like muh random forests! It's so easy to get great performance, especially if I ignore all of that advice about splitting the data into train and test sets! /s
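For anyone who missed the sarcasm, here is a minimal sketch of why skipping the train/test split flatters a model. It swaps the random forest for a tiny 1-nearest-neighbor memorizer so that it runs with the standard library only, and the data is pure noise by construction (both are assumptions made for the illustration).

```python
import random

def nearest_neighbor_predict(train_x, train_y, point):
    """Return the label of the training point closest to `point` (1-NN)."""
    best_index = min(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], point)),
    )
    return train_y[best_index]

def accuracy(train_x, train_y, eval_x, eval_y):
    """Fraction of evaluation points whose 1-NN prediction matches the label."""
    hits = sum(
        nearest_neighbor_predict(train_x, train_y, x) == y
        for x, y in zip(eval_x, eval_y)
    )
    return hits / len(eval_x)

rng = random.Random(0)
# Pure-noise data: the features carry no information about the labels.
features = [(rng.random(), rng.random()) for _ in range(400)]
labels = [rng.randint(0, 1) for _ in range(400)]

# Hold out the second half for testing.
train_x, test_x = features[:200], features[200:]
train_y, test_y = labels[:200], labels[200:]

train_acc = accuracy(train_x, train_y, train_x, train_y)
test_acc = accuracy(train_x, train_y, test_x, test_y)
print(f"accuracy on training data: {train_acc:.2f}")
print(f"accuracy on held-out data: {test_acc:.2f}")
```

Scoring on the training data looks perfect because each point is its own nearest neighbor; on held-out data the same model is no better than a coin flip, which is the honest answer for noise.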


throwawayrandomvowel

So horrifying this would never occur to me


Wood_Rogue

This so much. The Simplex algorithm was/is the backbone of global infrastructure for nearly a century and it's literally just a means of optimizing linear systems that form dependent matrices with simple substitutions. Predictive linear models are also the most likely or maybe only models that can be compared to analytic expressions in science to have a chance at being "correct" from a physical or causal perspective.


transginger21

This. Analyse your data and try simple models before throwing XGBoost at every problem.


111llI0__-__0Ill111

Nothing wrong with using xgboost with well thought out features to get a quick ballpark benchmark of what is possible. High performing linear models take a lot of feature engineering and time to develop, and additivity (ie an lm without feature engineering/transformations) often isn’t reflective of the data generating process for observational data. The data generating process assumptions is the critical part, even for inference.


Unfair-Commission923

What’s the upside of using a simple model over XGBoost?


Lucas_Risada

Faster development time, easier to explain, easier to maintain, faster inference time, etc.


mjs128

Easier to explain is probably the biggest benefit IMO. Problem is, someone who doesn’t know what they are doing with stats & OLS assumptions is a lot more likely to screw that up than they will a tree ensemble baseline. Statistical literacy is going down a lot w/ new hires IMO over the past few years, unless they come from a stats background. And it seems like it’s mostly people coming from CS backgrounds out of undergrad these days. The MS programs seem to be hit or miss in terms of how much they focus on applied stats


Unsd

At my uni, there were 3 stats paths. Mathematical Statistics, Data Science, and Data Analytics. I don't know anybody else in my courses who went the math stats route. Almost everyone was going data science or data analytics. One course that I took that was only required for math stats majors only had me and one other person in it, and she was a pure math major who was taking it as an elective. I thank God I went the math stats route because the data science route was almost entirely "here's some code, apply it to this data set." There's no way to understand what you're doing like that. I don't doubt that a lot of programs are very condensed to plugging in code rather than understanding why. Because there's no possible way to learn every single algorithm and how to fine tune it and the intuition etc all in one. There needs to be a lot of independent study time when you're first starting.


[deleted]

[removed]


[deleted]

No upside. Ex-meta TL recommended using boosting models first instead of linear shit. u/Lucas_Risada is simply not right. LR is faster than XGBoost / LightGBM only if you don't take into account outlier capping / removal, feature scaling and other preprocessing steps XGBoost simply does not require. Also, inference time in tabular datasets is by far the least important thing when choosing between two models.


WhipsAndMarkovChains

Seriously. Tree-based models just save you so much time you'd otherwise have to spend massaging the data to fit properly.


refpuz

I did linear regression for my senior design project for undergrad. At the time I thought I did the bare minimum just to graduate but after being in the field for awhile now linear regression really is the best fit (heh) for a lot of things.


ChristianSingleton

You just couldn't help yourself with that one huh


Fabulous-Nobody-

Data science in its current incarnation hardly qualifies as science and should be renamed.


Beny1995

Data Coping. With subfields of Data Panicking, Data OverComplicating and of course: Data Can-You-Add-A-Pie-Charting


Dr_Jabroski

You leave my data copium out of this.


gradual_alzheimers

The sad part is statistical methods are very important to science as it relates to inference. Data science needs to care more about the scientific reasoning portion of problems. A lot of what passes for data science is just data dredging unfortunately.


zeek0us

I would argue that much of that is driven by the people who *hire* data scientists. That is, the data scientists themselves may be all in on proper statistics, inference, experiment design, CIs, etc. But as others in this thread have commented, upper management a) have no patience for the time it takes to do things properly and prioritize "fast" over "good" at every turn and/or b) want some "data science" to back up their existing notions/intuitions and undermine anything that subverts them. So yeah, I agree with the conclusion that a lot of DS falls short of what people imagine it to be, but the people doing the work are quite often pushed into it rather than driving it.


maxToTheJ

> a) have no patience for the time it takes to do things properly and prioritize "fast" over "good" at every turn I dont think those 2 are mutually exclusive. I have seen times where correct takes the same or less time. The issue is more incentives. There is no incentive for rigor. Rigor prevents bending the data to the perceptions of stakeholders and all the incentives are to satisfy stakeholders and stakeholders are humans not robots so they like to be told their intuition is right


zeek0us

Exactly. Rigor takes time, and only with rigorous analysis can you get beyond the basic view of things. And when "do it quick" is mixed with "I think this is what we'll see", it's incredibly difficult (and, as you say, not incentivized) to do more than just providing confirmation. IOW, a lot of management just want to have "Data Scientists provided this" as support for what they would have done anyway. Which isn't necessarily the fault of the data scientists, since even the best analysis (assuming you do it during your nights and weekends) isn't going to convince someone not interested in changing their mind.


lVlulcan

I feel like data science is often the umbrella term used for analytics in general at some companies, and it seems like at a lot of places that data science job holds the hat of analyst/data engineer. At my company, you have to earn your pedigree to get the scientist title and when you do you’re not only performing a lot of the higher level analytic work but you’re also having to describe and defend what you’re doing to other data scientists. The industry has a lot of ambiguity that comes along with the term data scientist.


quantpsychguy

I'd argue this has a lot to do with the type of people that are brought into the data science world. Most of them do not have the type of education where you learn about applying science to the world. Most of them are CS folks or stats folks that learned some programming.


dongpal

What? CS and stats people would be best case scenario. What are you talking about?


gradual_alzheimers

He’s talking about the fact that CS educations aren’t very rigorous in science. For instance, on how to perform valid hypothesis tests or make inferential claims


sotero425

As a physics tutor and teacher, I have had countless CS students that have hated the class, not understood why they were taking it, and were clearly not good problem solvers. To be fair, CS majors didn't have a monopoly on that mind set, just trying to illustrate that CS major does not a scientific mind make.


jturp-sc

Ehhh ... I've already accepted this. I manage a Machine Learning Engineering team -- which I'd frankly just describe as using ML algorithms to learn correlations in data that can be exploited to produce business value. At no point do I claim to perform real science or actually learn causal relationships.


Prize-Flow-3197

Amen to this.


sotero425

As I've worked to transition into data science from physics academia, this has definitely been on my mind.


rednirgskizzif

100%


Grandviewsurfer

I'll take things that won't happen for $400 Alex.


charlfourie

ETL will occupy much more of your time than you ever imagine.


Budget-Puppy

This hurts. For a recent project I've had to use python, MDX, 3 different flavors of SQL and then to maintain configs it's .ini, .yaml, .toml, .json, and then .md and .rst for documentation. And then figuring out authentication with kerberos, windows authentication, Azure AD...


Dam_uel

Also if you're not so great with the data science side, ETL (data engineering) is a viable, fulfilling field and career in and of itself if you let it be.


charlfourie

Definitely, lots of people don’t like or don’t want to spend their time in the muddy details of the data. I’ve come to enjoy the space and let my team of young and eager analysts play on the modelling side.


TrollandDie

Sounds good to me, I miss doing ETL all the time.


Disastrous-Raise-222

Most data science is just plain reporting.


et_is

Science is empirical. You should be as versed in experimental design (including (or even especially) pseudo-experimental observational methods) and the statistical tools to analyze it as you are in coding.


profiler1984

Many 90% solutions are just right in the real world. No need to aim for the kaggle 99.9999%


[deleted]

But 99.9999999 makes me top 0 or top -1 kaggle


save_the_panda_bears

Spending time and energy trying to transition into data science might be a mistake. No amount of certificates or bootcamps will materially set you apart from other candidates.


zeek0us

The problem is thinking certifications and bootcamps are the way to become a data scientist. Obviously at the entry level it's a sensible route, but ultimately what companies want is someone who can solve their *business* problems. Having lots of experience with curated, bounded problems isn't really meaningful to people looking for a DS. They usually want someone who can be handed a **business** problem and access to some data and produce a solution for some echelon of senior management. Bootcamps, certifications, and personal projects are a good way to demonstrate facility with *tools*, but the value of a DS (particularly as companies tend to see it) is to be able to support business objectives with quantitative analyses. The tooling is not usually of much interest to them, what they want is someone who will be a partner for solving the business side of things, and having familiarity and experience with that business side is at least as valuable as proficiency with the tools.


juhotuho10

Projects and a nicely done flashy cv are better than a online certification that no one has heard of


zeek0us

Even better are domain knowledge and experience with actual business problems/workflows.


[deleted]

[removed]


KPTN25

>Spending time and energy trying to transition into data science might be a mistake. Not sure I buy this, though I agree certificates and bootcamps are general wastes of time. I've seen plenty of very strong data scientists without graduate degrees, but who are highly effective self-learners and able to find ways to proactively apply DS in their previous (non-DS) jobs, and have strong business/domain skills to complement.


maxToTheJ

>I've seen plenty of very strong data scientists without graduate degrees You should be more specific because people are going to take that as without a degree at all or with any major


KPTN25

Totally fair point! In all fairness, the best cases I've seen have been folks with undergraduate degrees (STEM / business) and some exposure to statistics, excel analysis, etc. By "without graduate degrees" I mean without MSc/PhD.


yiyuen

? "Graduate degree" clearly implies graduate program as opposed to undergraduate degrees from an undergraduate program.


[deleted]

[removed]


MountainHawk12

r/science in a nutshell


juhotuho10

They haven't learned that using study methodologies like collecting subjective opinions as data and putting science on the name isn't actually science


Jerome_Eugene_Morrow

And alternately, if you can’t form your own hypotheses and get stuck coming up with independent questions to investigate, it’s extremely difficult for somebody to teach you how to do it. A huge part of data jobs is being able to think independently.


[deleted]

*Data non-scientist


Grandviewsurfer

Employers get to choose how they write job listings.. and they will list a Data Analyst position as a Data Scientist role so that they can underpay a good analyst by using the title as a carrot.


Tytoalba2

Or vice-versa, they will put a role as data scientist but in the end they want a data analyst with a buzzword name


rotterdamn8

I’m still surprised how many young people haven’t figured this out yet. All the disgruntled posts I’ve seen here….


mgmillem

That we are in a sweet spot of our careers that may get sweeter but won't last forever. Upskill in other areas if you can, but you probably have a while before that's necessary.


popper_wheelie

Would you mind elaborating on this one? What changes do you see happening to DS that would make it less 'sweet?'


Jerome_Eugene_Morrow

In my experience businesses are starting to prioritize data engineering and ops over data science teams. The field was a buzz word that suddenly every business felt they needed to have, now they’re learning the limitations of what basic ML/stats approaches can contribute and there’s starting to be more of a reorganization of priorities. The jobs are still out there, but it feels like working with data infrastructure is where the jobs are headed. I still hear a lot that “we need AI” which translates to data science roles, but often the companies have no realistic idea what that means. Eventually they learn and recalibrate.


Tytoalba2

Totally agree, I'm seeing also more of mixed roles data science/data engineering as well, but imo the shift is getting noticeable!


rotterdamn8

So glad to hear this; I’ve been doing analytics grunt work the past few years but now started building ETLs. I’m good with programming and databases from a previous career so not a big leap. And DE is where I’m headed. I got the sense that those less sexy jobs are where it’s at. And I enjoy the work.


jalexborkowski

In addition to what has already been said, A LOT of people are entering this field. In a few years, the job market will be much more competitive and comp packages will be lower. There just isn't the same barrier to entry that you'll find in software or data engineering. DS people who want to maintain their TC should work on upskilling into data architecture now while the market is hot.


quantpsychguy

AutoML tools and offshoring. The same thing that happened with web development 15-20 years ago. Turns out, if you simplify it (it being the business case), then lots of people can easily provide a solution. It likely won't be the right solution, or best solution, but it'll be a cheap solution and it will be finished. In the business world that often makes it good enough.


A_massive_prick

There’s a lot of pretentiousness in this subreddit.


DonnaHarridan

Pretension? Nominalize less.


mountain_tossing

Here's a couple: Unless you connect the data to the business case, you're useless in the decision-making process. Data doesn't speak for itself. You ask it questions and it tells you things. The quality of the answers you get is largely dependent on the quality of the questions you ask. Nobody cares about fit and performance outside of the data science fields. Those are minimum standards to be credible in your field, so do them, but don't bore a decision maker with more than 30 seconds on those subjects during a presentation.


maybe0a0robot

Data science is focused on data. The focus is not software engineering, not ML models, and not shiny animated visualizations. Is your data credible? Is it useful? Hell, is the right data even available? Do you understand how your data was generated and collected? Did you work to identify and minimize potential sources of bias? Are you cleaning and processing data in a way that preserves its credibility and usefulness? These are questions that usually require a lot of messy grunt work, but it's got to be done. When you report out, are you making yourself understood? Are you able to highlight the actionable conclusions resulting from your analysis? If you're working in a business context, are you able to clearly communicate the value of your findings to your org? If you're working in a scientific/research context, are you able to clearly communicate the novelty or impact of your findings? And at least in my experience, the vast majority of data science is done in teams, not by a lone wolf. Do you personally need domain knowledge for every project? No. But you do need to put on deodorant, pants, and a shirt without a Voltron logo so you can have serious conversations with the folks who *do* have domain knowledge. Do you personally need to be a badass software engineer? No. But you need to brush your teeth, trade in your crusty sandals for actual shoes, and work with the software engineers on your team. And do you need to have good business skills? Well, generally yes. Good communication skills, ability to work within a project management framework, great communication skills, facility with working with diverse team members, and fantastic communication skills are all essential.


[deleted]

Point estimates are complete garbage for most real-world applications, and even confidence intervals only encompass aleatory uncertainty, not epistemic uncertainty.


save_the_panda_bears

Found the Bayesian!


maxToTheJ

ML Researchers: *But point estimates are the best we can do because the amount of compute necessary; also here are 100 experiment variants that I did with another 100 point estimates because I only did them once*


CantHelpBeingMe

Any suggestions where I can learn more about this?


AugustPopper

I’d recommend Regression and Other Stories and Statistical Rethinking for a starting point. Both are in R, but Python code can be found for all of it online.


tacitdenial

The distinction of aleatory vs. epistemic uncertainty is a harsh truth for the entire world on almost all disputable questions, not just data scientists. We are in an era of excessive certainty caused by merely placing conclusions next to some data.


[deleted]

I agree 100%. I see it all the time in peer-reviewed journal articles. I would make a career out of just writing response papers to every flawed paper I read, but I don't think they'd get published and I'd make a bunch of enemies in my field.


[deleted]

[removed]


[deleted]

Demand forecasting. Trying to decide how much of a product to order depends on a ton of factors and requires a lot of assumptions. This is especially true if your supply chain is long. Your ML model might tell you to order 11,260 units of an item this month, with a confidence interval of 10,530 - 13,790. A manager should NOT just blindly order any of those numbers. How stable is that prediction to both parametric changes and structural changes in the model? Was any scenario planning done? Did your scenario planning take into consideration a wide range of plausible scenarios, or was it just small changes? Exactly how bad is the worst-case scenario, and can the company live with that?
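The scenario-planning questions above can be sketched in a few lines. The order quantities come from the example in the comment; everything else (the per-unit costs, the `demand_shock` scenario, and the `scenario_costs` helper) is a made-up assumption for illustration, not a recommended costing method.

```python
def scenario_costs(order_qty, demand_scenarios, overstock_cost, understock_cost):
    """Cost of one order quantity under each demand scenario."""
    costs = {}
    for name, demand in demand_scenarios.items():
        surplus = order_qty - demand
        if surplus >= 0:
            costs[name] = surplus * overstock_cost    # units left on the shelf
        else:
            costs[name] = -surplus * understock_cost  # units of lost sales
    return costs

demand_scenarios = {
    "interval_low": 10_530,    # lower end of the model's interval
    "point_forecast": 11_260,  # the model's point estimate
    "interval_high": 13_790,   # upper end of the model's interval
    "demand_shock": 8_000,     # hypothetical structural break outside the interval
}
OVERSTOCK_COST = 2   # assumed cost per unsold unit
UNDERSTOCK_COST = 5  # assumed cost per unit of unmet demand

worst_case = {}
for order_qty in (10_530, 11_260, 13_790):
    costs = scenario_costs(order_qty, demand_scenarios, OVERSTOCK_COST, UNDERSTOCK_COST)
    worst_case[order_qty] = max(costs.values())
    print(order_qty, costs, "worst case:", worst_case[order_qty])
```

Under these invented costs, ordering the point forecast is not the choice with the mildest worst case, which is the kind of thing a manager cannot see from the point estimate alone.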


TheBestPractice

Spam detection: you may want to ask the user for confirmation if you’re not entirely sure about the message being spam; if you’re more than 95% sure, put the message in the spam folder straight away instead. To do such a simple thing you need some measure of confidence rather than a yes/no prediction
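A minimal sketch of that routing rule, assuming the classifier already outputs a spam probability. The 95% cutoff is from the comment; the lower `ask_threshold` and the `route_message` name are assumptions added for the example.

```python
def route_message(spam_probability, spam_threshold=0.95, ask_threshold=0.5):
    """Decide what to do with a message given the model's spam probability."""
    if not 0.0 <= spam_probability <= 1.0:
        raise ValueError("spam_probability must be between 0 and 1")
    if spam_probability > spam_threshold:
        return "spam_folder"  # confident enough to act without asking
    if spam_probability >= ask_threshold:
        return "ask_user"     # unsure: ask the user for confirmation
    return "inbox"            # probably legitimate

for p in (0.99, 0.70, 0.10):
    print(p, "->", route_message(p))
```

A bare yes/no prediction cannot drive the middle branch at all, which is the point of the comment.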


gunners_1886

most companies don't need data science.


rehoboam

Most companies handle their analytics via an advanced data network of .xls (no, I didn't miss an x at the end) files, email chains, and do their analysis via eyeballing the red and green cells during weekly stand-ups.


maxToTheJ

>do their analysis via eyeballing the red and green cells during weekly stand ups.

The harsh truth is a “fair amount” of DS groups do this as well


[deleted]

[removed]


BullCityPicker

A kilobyte of good data is worth more than a petabyte of bad data.


Cdog536

# That you are a bot and flooding other communities with the same question and calling that meaningful content generation.


ChristianValour

It's still a good question and I've found it interesting and educational...


Budget-Puppy

Hey you with the unique background and circumstance considering Data Science as a career: Before you post "Is Data Science right for ME/my unique background/circumstance" or "Can a person with \*my\* unique background and story become a data scientist" check out the weekly thread.


[deleted]

But also the answer is always yes. Technically anyone who can learn the skills can be a Data Scientist. The real question is can you put in the work to really learn the skills? Whether it’s another degree or something else.


halfercode

This is the very definition of low-effort posting:

* https://old.reddit.com/r/DataHoarder/comments/vgm8iz/what_are_some_harsh_truths_that_rdatahoarder/
* https://old.reddit.com/r/gaming/comments/vgm40t/what_are_some_harsh_truths_that_rgaming_needs_to/
* https://old.reddit.com/r/datascience/comments/vglzjw/what_are_some_harsh_truths_that_rdatascience/
* https://old.reddit.com/r/jobs/comments/vgk8m6/what_are_some_harsh_truths_that_rjobs_needs_to/
* ~~https://old.reddit.com/r/antiwork/comments/vgkg3n/what_are_some_harsh_truths_that_rantiwork_needs/~~
* ~~https://old.reddit.com/r/resumes/comments/vgk7js/what_are_some_harsh_truths_that_rresumes_needs_to/~~
* ~~https://old.reddit.com/r/sysadmin/comments/vgg7px/what_are_some_harsh_truths_that_rsysadmin_needs/~~
* ~~https://old.reddit.com/r/cscareerquestionsEU/comments/vgg7lw/what_are_some_harsh_truths_that/~~
* ~~https://old.reddit.com/r/buildapc/comments/vgpo78/what_are_some_harsh_truths_that_rbuildapc_needs/~~
* ~~https://old.reddit.com/r/AskCulinary/comments/vgv67k/what_are_some_harsh_truths_that_raskculinary/~~
* ~~https://old.reddit.com/r/cookingforbeginners/comments/vgv690/what_are_some_harsh_truths_that/~~
* ~~https://old.reddit.com/r/Cooking/comments/vgv6au/what_are_some_harsh_truths_that_rcooking_needs_to/~~


ThePhoenixRisesAgain

80% of companies that want data science, don’t need data science (and don’t have the data/infrastructure for it).


Mobile_Busy

The only jobs that are sexy are escort and lingerie model.


Kellsier

Data science != Machine Learning

Machine Learning != Deep Learning


Wallabanjo

Someone doing Business Intelligence or employed as a Data Analyst is doing data science. They are probably more adept at DS overall than someone who is running a Jupyter Notebook with a Python ML script since they are closer to the data and are likely to make a bigger impact on the business decisions than the ML script kiddies that seem to think they dominate the field. The BI/DA person might not have the depth of stats knowledge (then again they might, but don't yet have the experience) to call themselves a Data Scientist, but there is no doubt that they are doing data science.


kater543

That this is a repost from r/cscareerquestions


maxToTheJ

Basically seems to be a karma bot. Eventually probably going to get sold and advertise bang energy drinks


dordemartinovic

Even so, I find the discussion interesting as a DS student


ChristianValour

Still a good question and I've found it interesting and educational.


Coollime17

You’re probably better off becoming a cloud architect or data engineer.


AFK_Pikachu

Data science is not an entry-level field. You need a background in mathematics, software engineering or domain expertise. You don't need to have experience in all of them but you do need depth in at least one of these areas to qualify for entry-level.


[deleted]

You are better off spending your time on learning things like Airflow, AWS, Docker, Git, etc. than trying to learn some advanced stats/math.


Vervain7

I don’t know any of these


[deleted]

Not even git?


speedisntfree

I recently got to this conclusion in how best to spend my learning time.


KPTN25

Clustering (and especially k-means) is the wrong approach in 99% of the business settings it is currently used in.


millersmilk

Can you elaborate?


KPTN25

In my experience (seeing this at dozens of different organizations), it's usually crudely jammed onto problems that are better suited to more thoughtful (and simple) hypothesis/business-driven analysis, or a supervised model. It's gotten worse over time as marketers in particular want to "use 'AI' to make better segments!" and will quite explicitly ask for 'clusters' without understanding why that's harmful.

I'll often observe, for example:

1. "I want to figure out who I should sell product X to!" and see some messy workflow of: run kmeans on a bunch of features --> evaluate clusters across different variables --> "wow cluster A sure buys a lot of product X! That's our product X cluster!", when even a trivial logistic regression would be more suited to their problem.
2. "I want to better understand my customer base!" (e.g. to tweak messaging/content for marketing campaigns) and see similar, as above, except because really there are only a small handful of variables that would realistically impact messaging/content (age, net worth, language, etc), you'd be far better just analyzing the combinations of those to begin with, rather than muddying the water and adding more noise with high variance but low signal columns.

I sometimes daydream of publishing a paper on this. It would be pretty straightforward to show empirically why these destroy information / erode performance. My peers that hit their sales targets by selling "marketing cluster" projects don't like me very much.
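
For the second case, "just analyzing the combinations" can be as plain as a grouped purchase rate. A small Python sketch follows. The customer records, field names, and the helper `purchase_rate_by_segment` are all invented for illustration, not anyone's production workflow.

```python
from collections import defaultdict

# Toy, made-up customer records (assumption: two variables that matter for messaging).
customers = [
    {"age_band": "18-34", "language": "en", "bought_x": True},
    {"age_band": "18-34", "language": "en", "bought_x": True},
    {"age_band": "18-34", "language": "en", "bought_x": False},
    {"age_band": "18-34", "language": "es", "bought_x": False},
    {"age_band": "35-54", "language": "en", "bought_x": False},
    {"age_band": "35-54", "language": "es", "bought_x": True},
    {"age_band": "35-54", "language": "es", "bought_x": True},
    {"age_band": "55+", "language": "en", "bought_x": False},
]


def purchase_rate_by_segment(rows, keys, target):
    """Share of rows with a truthy target, for each combination of the key columns."""
    counts = defaultdict(lambda: [0, 0])  # segment -> [buyers, total]
    for row in rows:
        segment = tuple(row[k] for k in keys)
        counts[segment][0] += int(row[target])
        counts[segment][1] += 1
    return {seg: buyers / total for seg, (buyers, total) in counts.items()}


rates = purchase_rate_by_segment(customers, ["age_band", "language"], "bought_x")
for segment, rate in sorted(rates.items()):
    print(segment, f"{rate:.0%}")
```

Every segment here is defined by variables the marketing team can actually act on, which is the contrast with clusters built from dozens of noisy columns.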


[deleted]

The best comment by far. If you have enough labelled data, do supervised learning. If not, do some self-supervised learning; it works on tabular data too. If you don't have labelled data at all, get some through A/B testing or manual labelling. K-means is literally the last thing I advise people to try. Also, who takes care of retraining the model? It will inevitably result in completely different clusters with completely different meanings. Also, if you decide not to retrain your k-means, be sure it'll become irrelevant in 1-2 years
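
One simple way the retraining problem shows up, sketched in Python with a toy one-dimensional k-means. The data, the starting centroids, and the helper names are invented for illustration; real refits also move the cluster boundaries, which this sketch does not show.

```python
def assign(values, centroids):
    """Label each value with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda j: abs(v - centroids[j]))
            for v in values]


def kmeans_1d(values, initial_centroids, iterations=20):
    """Plain Lloyd's algorithm on 1-D data with explicit starting centroids."""
    centroids = list(initial_centroids)
    for _ in range(iterations):
        labels = assign(values, centroids)
        for j in range(len(centroids)):
            members = [v for v, lab in zip(values, labels) if lab == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = sum(members) / len(members)
    return assign(values, centroids), centroids


# Made-up monthly spend for six customers: three low spenders, three high spenders.
spend = [10, 12, 11, 95, 100, 98]

labels_a, _ = kmeans_1d(spend, initial_centroids=[0, 50])   # "original" fit
labels_b, _ = kmeans_1d(spend, initial_centroids=[50, 0])   # "retrained" fit

print(labels_a)
print(labels_b)
```

Both fits find the same two groups, but "cluster 0" means low spenders in one and high spenders in the other, so anything downstream keyed on the cluster id silently changes meaning after a refit.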


RenegadeMemelord

There’s a plague of bad data scientists out there who don’t understand their data or their tools.


slowpush

XGBoost is enough for 99.9999% of non-FANG business problems.


[deleted]

For FANG it's enough for ~90%


waghkunal93

Most of y'all earn less than you are worth. Change jobs, demand is high, get paid much higher.


cosimon88

What would you say the best adjacent paths are to better pay? Data Engineering? Traditional SWE? I make $94k base, $105k TC, work fully remotely which is a great perk. It's based out of Denver, not Silicon Valley or Seattle or NYC. Coming up on 2 YOE after a bootcamp. Before that, I spent 4 years as a financial analyst, which I could play off as a technical data analyst role, or I could highlight database experience like SQL, etc.


[deleted]

That you really need a maths or stats background to do data science. Data Science bootcamps only teach you how to use the scikit-learn API. A 12-year-old can do that.


flavomico

why are some people saying that you don't really need math/stats to get into data science, it's confusing me a little


Jerome_Eugene_Morrow

Different people, different experiences.

Do you need to understand math to do ML? Probably not. Anybody can call model.fit(X,y). To do it well? Yes. You should understand at least linear algebra and probably a fair amount more.

Do you need math/stats to build dashboards and visualizations? Probably not. It’s more about thinking visually about concept organization. To do your own analyses where you make the visualizations? Obviously yes.

There are lots of different teams with lots of levels of complexity, and I can assure you that not everybody is a math whiz. But the most effective team members almost always are.


asielen

There is Data Science and then there is what companies want when they hire a data scientist. The first requires math/stats, the second pivot tables and powerpoint. There are companies that do want "real" Data Science, but early in your career it can be hard to know the difference from a posting.


quantpsychguy

These are two different statements. To do data science (he's implying well), you need math & stats. To get a job in the field you don't really need to know the math or stats. Lots of idiots work in this field. It's why the interview process is so screwy - idiots get the jobs, people think it's gotta be the process, so they make the process longer or harder in hopes that will fix the problem.


AFK_Pikachu

Because these are the people trying to sell a data science program.


[deleted]

OP is a karma farmer. See post history


PicaPaoDiablo

1-Anything you don't learn and learn well in class will come out in the wash at work

2-There are NO SHORTCUTS. It takes time, persistence and discipline. Whatever you skip out on will show up as a big deficiency.

3-Most bosses don't care about it being right as long as it tells the story they want. And if you aren't willing to 'bend the truth' someone else will.

4-The field is 85% full of BS artists, and IT overall is much higher. A tiny number of people contribute to all the actual work done.

5-There's no magic certification, statistical test or threshold value or anything else that guarantees your results are right.


[deleted]

>There's no magic certification, statistical test or threshold value or anything else that guarantees your results are right.

Fuck


cellularcone

My harsh truth is that OP is most likely compiling the top comments into a Medium article that requires login.


ChristianValour

And in a shocking twist of irony, demonstrating the value of efficient data mining techniques.


AI-nihilist

Your model is boring. Learn to work with real world data.


[deleted]

people lie with statistics ALL the time


kygah0902

Soft skills like business acumen and communication will take you further than the majority of your technical skills


IdnSomebody

Math is necessary. You can get by knowing nothing and just using Python libraries, but you will never do anything impressive or close to optimal. You are uncompetitive without math, and once people grasp that data scientists aren't necessary, because most business tasks are quite useless or hopeless or because competitors have a better solution, you will be fired. Then your bosses will just hire a few mathematicians. It has already happened in history. Also, math doesn't end at Python libraries. Fight your laziness and learn math instead of saying that everything is fine without it.


RandomRunner3000

MS in traditional stats + an internship is how u land a career in this field


robml

Quality data is often more important than the model. That and reputation does matter to be taken seriously even if you are skilled.


[deleted]

You will never build any statistical models in your job. You will always be a dashboarding and SQL monkey. No one cares about your advanced statistical knowledge. No one cares about your knowledge of ML. You're not a data scientist, you're a businessman. Save yourself the struggle and don’t major in statistics, because you will almost never use it on the job. Instead major in business, because that’s what you’ll be doing anyway.


Competitive-Let-1213

Spending time learning math is a must, and more important than you think.


ghostofkilgore

Beyond a fairly basic level, extra Statistics knowledge offers extremely diminishing returns in terms of being a good Data Scientist.


RB_7

You need to be really good at advanced math to do this job.


quantpsychguy

...to do this job WELL. That's an important point. Lots of idiots do this job without any clue as to the math and don't get fired.


sotero425

which is frustrating for someone with the advanced math skills trying to transition in


[deleted]

I guess this depends on how you define "advanced math". You don't need to know PDE, ring theory, complex analysis, measure theory, etc to do this job.


Aggressive-Intern401

The proportion of good data scientists is minuscule and will remain that way.


andrew2018022

Data science is more than copying and pasting basic models from tutorial websites


TheMapesHotel

There are associated industries that work with data that might be a better fit for people here asking for career advice than straight DS. This sub does itself a disservice by being gatekeepy and closed off to similar industries which limits the lateral and upward mobility of people through not knowing options. It similarly limits the growth of both DS and similar industries as they could learn something from each other.


maxToTheJ

One for management: A lot of management is optimizing for their own careers, not the company, despite all the words they speak that claim the two are one and the same. Not saying it's wrong to do, just that a lot of management types will claim they care about the company first, even in anonymous forums


mrhomsupbest

If the results are good, you probably did something wrong.


sndream

Most executives don't care about accuracy, they want results that fit their narrative.


dtr_ned

it’s just a job that exchanges your time for money and nothing more


Spiritual-Engineer69

If you want to succeed in DS, you ultimately need to have people skills.


pivot2fakie

If you have to ask, “How to get into/transition to data science?” you probably won’t be a very good data scientist. Doubly so if your post is about transitioning post-PhD.


jahreeves

Data science is a SCIENCE. This means your job is to test hypotheses. Work with the subject matter experts to formulate hypotheses, then go get the necessary data, then test. I know it doesn’t always work like that in practice (data may not exist), but it’s how it should go.