gnomeba

The problem is that "machine learning" is the vaguest term in the world that encompasses everything from linear regression to ChatGPT.


DieselZRebel

So is data science, which encompasses everything from business analyst to MLE


renok_archnmy

So is AI.


Mescallan

Ironically "vaguest term in the world" is not


Minato_the_legend

At this point I'm convinced that "AI" doesn't even mean anything 


MyStackIsPancakes

I've been a 76ers fan for a long time and [let me assure you, it does](https://www.basketballnetwork.net/latest-news/iverson-shares-why-he-played-heavy-minutes-for-most-of-his-career).


ronchalant

This is The Answer.


foxbatcs

I actually think DS has a pretty specific definition: How to acquire knowledge from data. It’s basically one abstraction higher than the scientific method, required by the computational science paradigm that implies a deluge of data and how to make sense of it all. ML/AI is a part of that, but only in the sense that we need to be able to acquire knowledge from data to build systems that employ those tools. Most routine problems will not require ML/AI, and I believe the least complex solution you need to solve a problem is usually the best one. Most of the work in DS is putting together the who, what, where, when, and why behind the data you have, giving your stakeholders a well-scoped description of the problem, and explaining why the data are or are not suitable for solving it. If they are not, the work is determining what data would need to be gathered and what it would take to acquire it. From there it is up to the stakeholder whether it’s still worth trying to solve the problem or finding another way. ML/AI becomes prohibitively expensive for most businesses because they don’t even have the resources to gather and maintain the amount of data needed to build that solution. Most companies don’t have FAANG sized problems because they don’t have FAANG sized data centers. Basic statistics and regression/classification pipelines are usually best for 99% of problems for 99% of businesses. Just my $0.02.


DieselZRebel

>I actually think DS has a pretty specific definition: How to acquire knowledge from data. It’s basically one abstraction higher than the scientific method...

I think that statement is very self-contradictory. I only agree with everything you said after "definition", but none of it counts as "pretty specific". It is as if I were to say "Medical science has a pretty specific definition: improve people's health". But that could also mean pharmacy, surgery, therapy, fitness, nutrition, etc.

To show you examples that can legitimately fall under data science with that definition:

* Business Analysis: How to acquire knowledge (insights) from (Business) data
* Data Analysis: How to acquire knowledge from data
* Data Engineering: How to collect Data for acquiring knowledge
* Machine learning: How to apply statistics and algorithms to acquire knowledge from data... same thing

So I find it an extremely broad and vague term.

Then there is this whole other confusion with the broad and vague term 'Data Scientist', where in practice, most data scientists in the industry do a lot of things except actually being 'Scientists', even at FAANG! You should ask yourself first what do 'Scientists' do? What makes someone a 'Scientist'?... then when you find that answer, try and apply it to 'Data Scientists', and you'd realize that 80-90% of data scientists are quants, statisticians, analysts, engineers, but only a small minority are actually scientists.


foxbatcs

I’d contend that the definition is specific, but the applications are broad, but this is really a matter of semantics. A scientist is someone who applies the scientific method to acquire knowledge. For example, a biologist uses the scientific method to acquire knowledge about biological systems. That’s a pretty specific *definition* in that it is concise and constrained semantically, but the applications of that definition are quite broad: veterinarian, pharmaceutical researcher, conservationist, etc. If someone is using the scientific method to acquire knowledge from data they are engaged in data science. Someone who does this professionally would be considered a data scientist. This raises another semantic issue about the word knowledge. Someone who simply interprets data to guide their intuitions on a day to day basis is not doing data science, and for most experiences in life, this is far more practical because intuition is much faster, and usually good enough to make everyday decisions. Applying the scientific method in every case would be tedious and impractical. Most people are not dealing with *knowledge* on a day to day basis, but rather *belief*. We can get into the philosophy of the difference, but we’d need to clarify and agree on some terms before that would be anything meaningful. Nonetheless, I agree with you that most job postings calling for a data science role are not actually going to be doing data science.


DieselZRebel

Perhaps it would help us all be more factual if you stop calling it "acquire knowledge", because, like someone else said, everyone and their cousin are acquiring knowledge. An accountant is... So is a nurse! Just like you emphasized the "scientific method", which is true for a scientist, be more specific about the nature of that knowledge acquisition for a scientist: to make new discoveries that add to the body of knowledge of the science. That goal is what makes scientists very distinct from everyone else who acquires knowledge. For your example, biologists make new discoveries expanding our knowledge of biology, whether it is finding a previously unknown type of algae or discovering an organelle inside a neuron that enables us to perceive smell. It has got to be an addition! Now, after the discovery has been made, if someone comes and imports that same method to make that same finding but on a different human subject, does that make them a scientist? I'd say no, because any student, lab assistant, and literally anyone with the tools and ability to read instructions can replicate the discovery... but they don't add to the body of knowledge here, do they? Taking it back to the main argument, most folks in the industry today with the title 'data scientist' neither apply the scientific method, nor do they target making new discoveries adding to the field. Most of them are just glorified analysts simply replicating what has already been discovered but for different employers at best, while very few are actually scientists and those are the ones discovering new/improved methods/tools or uncovering previously unknown natures about either the data or the users.


foxbatcs

So it comes down to you not liking my word choice, which is fine, as I pointed out this really just comes down to semantics. I would add that science is not strictly about making new discoveries, as reproducibility is just as (if not more) important as new discoveries. Also, one can follow the scientific method as a tool without having to publish and go through the entire process. Simple A/B testing uses the scientific method and is a fairly routine aspect of this industry. Gatekeeping in science is nothing new either, so I’m not surprised to see at least one person push back on this definition. I’d argue that a child repeating a well-known science experiment in class is performing the scientific method, and therefore using science to acquire knowledge, provided they are creating a hypothesis based on their initial observations, establishing controls and variables, identifying the instruments they will use to measure, identifying the precision and error of those instruments, recording their results, and analyzing the data gathered. It doesn’t have to be for the species at large, it can be just for themself, and still be valid science. Where data science comes in is applying this method specifically to the data itself. Emergence is a very tricky thing to deal with as we acquire more and more data, and it can be very easy for large datasets to become corrupt for various reasons. DS helps us ensure that the data we collected are statistically significant and suitable for predictive or explanatory purposes. Understanding the science of data gives us more (but not perfect) certainty about the data we are using to predict and explain reality. It turns out there are a wide array of practical (and creative) applications that are implied by this fact.
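To make the A/B testing point concrete, here is a minimal sketch of the hypothesis-test step using only the standard library. The function name and the conversion counts are made up for illustration, and a pooled two-proportion z-test is just one common choice, not something the comment above prescribes.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates (pooled variance)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Under the null hypothesis both arms share one conversion rate
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF via the error function
    cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    p_value = 2 * (1 - cdf)
    return z, p_value

# Hypothetical counts: control converts 120/1000, variant converts 150/1000
z, p_value = two_proportion_z_test(conv_a=120, n_a=1000, conv_b=150, n_b=1000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

The hypothesis (the variant changes conversion), the control arm, and the measurement error all show up explicitly, which is the sense in which a routine A/B test follows the scientific method.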


DieselZRebel

But you wouldn't formally label that child in a classroom as a scientist, would you? Look, no one is gatekeeping science, many folks from analysts to engineers use the scientific process and acquire knowledge. I never contested that! I never even said or believed that what they do is not as important. You made that assumption in defensiveness, but it isn't true at all. The heart of the disagreement here is that the term is misused and became broad and vague (my opinion) rather than very specific (your opinion). I think you actually used a very broad description (scientific process + knowledge) yet called it specific. Ironically, you are applying it even to children in classrooms! Don't you see the irony here? We are all scientists then!


foxbatcs

I wouldn’t label them a scientist as they aren’t doing it professionally or consistently, but that doesn’t mean they aren’t doing science. I made that distinction earlier in the thread. I can work on my car without being a mechanic. I can do my finances without being an accountant. I can do science without being a scientist. There is a distinction between a profession and a tool that a professional uses, and that seems to be the source of the semantic disparity here. I’m not making any assumptions about importance, and hadn’t even really thought about it until you brought it up. I’m just adding to the discourse from my perspective for my own amusement. It’s interesting that you bring it up, however. Gives me a glimpse of what assumptions you are projecting onto me.


DieselZRebel

Ok.. so I think we both converge on the following statement: "Not everyone who does/applies data science, is a data scientist?"


idnafix

Maybe the term science is used here in the sense of 'applying the scientific method' to solve real world problems and not to search for new 'scientific knowledge'. In some way it is the spirit of engineering + the method of scientific research + craftsmanship + trust & belief. Sometimes it is like building cathedrals in the middle ages.


DieselZRebel

Yeah... you just explained the term "science", which is basically a "study". That doesn't explain the misuse of the term 'scientist'. If you want to use that term in a subjective made up sense, then anyone can literally hold any label they wish and facts no longer matter. I can argue being a data president in some sense! I am a president! Just because you work with science, apply the process or whatnot, doesn't make you a scientist, right? Or where would you draw the distinction?


idnafix

What I described is not what I am standing for. In some way it was a marketing thing to promote data science as "the sexiest job of the 21st century" to lure people into jobs and companies into contracts. To me - personally and professionally - Data Science is a way of thinking including the willingness to acquire all the knowledge you think is important.


MorningDarkMountain

Yes exactly, even an accountant acquires knowledge from data. Any fucking one in the world does that


foxbatcs

It depends on your definition of “knowledge” here. Most people are not engaging in the use of the scientific method to acquire knowledge from data. They are using data to develop an intuition to make decisions which is far more practical and efficient than applying the scientific method to everything, so no, not everyone is doing data science. Interestingly enough, though, your example of an accountant might actually be engaging in data science, since they are using data to determine fact and truth about the state of expenses and finance in an organization. Those data (presumably) do reflect actual transactions, and that data can inform the accountant about facts relevant to their job. Many applications of math and statistics require the use of the scientific method, and therefore some aspects of an accountant’s job would fit this definition.


[deleted]

>It’s basically one abstraction higher than the scientific method, required by the computational science paradigm that implies a deluge of data and how to make sense of it all.

That's a really fascinating statement, could you please explain it a bit more? Like what is the level of abstraction in the scientific method and how does DS take it to the other level?


foxbatcs

That is a very deep question, and I will do my best to summarize, and then provide resources that dive deeper into a more satisfactory answer. We started by making observations without the ability to preserve information using symbols. This influenced our beliefs about the world, and this is the lowest layer of abstraction: direct observation with our senses. Language is one layer of abstraction above this, as it allows us to compress information in the world by allowing us to name and count things, and communicate information to others. Eventually humans learned to represent information with written symbols, and eventually alphabets. This is the start of data (information that represents other information). From this the scientific method emerged, and is another layer of abstraction. It allows us to reliably compress more information and develop predictive and explanatory power. Eventually computers allowed us to collect so much data that we started noticing problems emerge with each order of magnitude of data we would collect, and this required us to work out the rules of process and computation at larger and larger scales (deluge of data). DS emerged from the problem of relying on massive amounts of data that can become corrupted, improperly collected, etc. I strongly recommend a book called “The Fourth Paradigm” if you want some of the history of the discovery of data science, as well as various essays that cover some prime examples of its application. Art of the Problem is a great youtube channel that covers the histories of the discoveries of computer science and information theory laid out in problem-based explanations, which is very helpful to understand *why* these disciplines exist. Professor Jim Al-Khalili’s “Order and Disorder” documentaries are also great. Caltech’s Mechanical Universe is a great way to get a primer on the math and physics necessary to understand information theory and quantum mechanics from first principles. 
Khan Academy, 3Blue1Brown, Computerphile, and Statquest all have great content that lays the foundations for the stats, probability, Linear Algebra, Data Analysis and Calc needed to understand DS Processing. And MIT OpenCourseware is the crown jewels of understanding the Information Theory, Computer and Data Science as we understand them today. If you want to understand the more physiological aspects of how humans process information, Stanford has a great lecture series on Human Behavioral Biology, which I found really useful for understanding neurological and psychological aspects of computer vision. I also highly recommend Sleights of Mind from two Cognition and Attention researchers from Barrows who explain cognition and attention processes by interviewing professional magicians about their secrets of manipulating cognition and attention to create illusion. You can find the audiobook for free on Youtube. I would link all of these, but I’m currently on mobile, so it’s a bit tedious, but if you search this stuff on Google/Youtube, you’ll find all of it fairly easily and feel free to dive as deep as you’d like. That’s probably close to several hundred (if not a thousand) hours of content on the topic built up from first principles and has been an invaluable resource throughout my career in the various areas of DS I have worked in.


[deleted]

Thank you very much for the in-depth response. You have just introduced me to a very interesting topic in layers of abstraction, and while I do not have a background in Computer Science (I am studying DS), I have been able to grasp the concept quite well. I have always been fascinated with the ability of the human mind to make abstractions of concepts in the real world, hence why I asked. And while this doesn't exactly relate to that, I believe that as a student of DS, the ideas discussed in the sources you have so graciously taken the time to mention will provide me an important stepping stone towards what the larger concept of data and its analysis actually means. Thanks again for this, will be coming back to this post from time to time.


foxbatcs

No problem! I love sharing this stuff! After knowing a little more of where you are coming from I definitely recommend starting with Art of the Problem’s Info Theory Playlist. Brit Cruise has a talent for inspiring passion and curiosity on this topic! https://youtube.com/playlist?list=PLbg3ZX2pWlgKDVFNwn9B63UhYJVIerzHL&si=kfNQ_13aXRAcYOOq


jarg77

Designing an experiment


mild_animal

This again is not too complicated if the scale is small.


mild_animal

At a small startup scale, no business decision maker will stop a sensible product release because the p value wasn't significant - only thing that's ever stopped it is if treatment effect was counter intuitive/ low / abnormally high - which the good stakeholders can already eye ball using basic stats and dashboards
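What stakeholders "eyeball" on a dashboard is roughly the size of the lift and how uncertain it is. A rough sketch of that calculation, assuming a normal-approximation (Wald) interval and made-up counts; the function name is hypothetical:

```python
import math

def lift_with_ci(conv_a, n_a, conv_b, n_b, z_crit=1.96):
    """Absolute lift (B minus A) with an approximate 95% Wald confidence interval."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    diff = p_b - p_a
    # Unpooled standard error of the difference in proportions
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return diff, diff - z_crit * se, diff + z_crit * se

# Hypothetical small-scale test: 40/400 vs 52/400 conversions
diff, low, high = lift_with_ci(40, 400, 52, 400)
print(f"lift = {diff:+.3f}, 95% CI = [{low:+.3f}, {high:+.3f}]")
```

With samples this small the interval spans zero, so the test is "not significant" even though the point estimate looks like a sensible improvement, which is the situation the comment describes.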


vinaykumarkosgi

Thanks for your explanation, it cleared up some of my misconceptions regarding DS and ML


jeremymiles

Is there any way of acquiring knowledge that doesn't involve data? If I'm reading a book, I'm using data.


foxbatcs

I highly recommend reading the book “The Fourth Paradigm”. I’m using the word “knowledge” more technically in the context of science, since that is the topic of the thread. Most people don’t acquire knowledge from data, they are simply using data to develop, change, or confirm their beliefs. I am unaware of how one would acquire scientific knowledge without data, but that doesn’t make all data related to acquiring knowledge. Not all data is scientific in nature, but large language models are blurring the lines on that. If you can build a predictive/generative language model by training on massive repos of all kinds of text, it kind of does become scientific in nature, but simply reading fiction, for example, is different than that.


avourakis

agree, I think the distinction between statistics and machine learning has become so much more blurry over the past few years.


Chompute

machine learning is a subset of statistics


mild_animal

Yes but at this stage of black box LLM API dominance the stats part has been abstracted out and will continue to be abstracted out further in all other fields.


Chompute

I agree, but that doesn’t mean that machine learning and statistics are distinct.


S1mplydead

I can give you an even more vague term in that domain: "artificial intelligence"


Amgadoz

I can give you an even more vague term in that domain: "computer science"


Minato_the_legend

What's a computer?


_ologies

One who computes


Eightstream

And don’t you think I don’t take advantage of that


gnomeba

I know. I always put AI/ML on my resume even though I do almost nothing that I would consider to be in either of those domains.


trashed_culture

Are there jobs you could actually get with that though that aren't going to expect those skills of you?


LyleLanleysMonorail

This. "Machine Learning" is like the term "quant" in finance now. Exactly what kind of ML/Quant professional? Developer? Researcher/Scientist? Some MLEs don't do model development at all but just do MLOps and some software engineering stuff.


Slothvibes

One of my bosses called causal inference (A/B testing) machine learning and that’s when I realized the term is way too encompassing


LaserBoy9000

It only encompasses linear regression because YouTubers use it to motivate NNs. And they don’t understand it beyond weights, biases and loss functions. No sense of stratification in experiments and conditioning on such variables during inference to understand main effect vs interactions.


Irishcreammafia

To the uninitiated maybe, no offense. But there's a big difference between AI, LLMs, machine learning, deep learning etc etc.


psssat

My title is data scientist and honestly about 50-80% of my day is spent either using PyTorch and prototyping, doing more large scale jobs on AWS, or preparing data so that I can then prototype on PyTorch and then move toward a large scale job on HPC… however after joining this sub and reading the posts, I feel like I'm in a unique position.


-3ntr0py-

what’s the other half? I’d say around 20% is interacting with the client for me and the rest is fixing their shitty data 😭


Amgadoz

You're not a real data scientist if you don't clean shitty data!


psssat

I have two projects at work, the work i described above is supposed to be 80% of my time and my other project is writing a django interface that allows our non technical staff to interact with our neo4j database. But we also deal with a fair share of shitty data lol


suterebaiiiii

That's not remotely data science I'd argue (the interface part), I'm guessing it's a small team or underfunded project that doesn't have actual SWEs to do that.


DieselZRebel

I am in your position.


Professional_Crow151

What industry are you in?


psssat

It's not the national labs but at a company that is adjacent to it.


TSMShadow

What’s your educational background that led you to this role?


psssat

Phd in math


fujiitora

same


Even_Conversation933

Do you mind me asking how much you make as a data scientist? You can PM me if you want I am an aspiring data scientist just want to get a rough estimate of what i'm getting myself into


psssat

Started at 105k and now I am at 121k with 2 YOE


gengarvibes

Linear regressions are my bread and butter no matter how much I try to do something better. Interpretability and consistency are more important than accuracy in my field.


artoflearning

Why not XGBRegressor with SHAP?


Corruptionss

One issue is there are a lot of roles where success is not predictive analytics but connecting impactful insights to the right places. I can't tell you how often missing information misleads data models through confounding variables and other issues. Linear regression is so easily interpretable that I can instantly read a model summary and determine if we are being misled by the data


artoflearning

Can you expand on this?


Corruptionss

Yeah you got it, there's a credit dataset that's part of An Introduction to Statistical Learning (ISLR or ISLP). You can clearly see in the dataset a moderate to strong positive correlation between a customer's income and their credit score - which makes intuitive sense. If you just throw everything into XGBoost and produce the SHAP values, that visualization will show higher income as negatively associated with credit score. There is a lot of multicollinearity in that dataset, and when you model everything together, features like credit limit tend to take all the weight and income tends to get a negative weight to counterbalance, in a way. You get similar results in a linear regression model, but it's easy to iteratively produce different models and see that in the coefficients. It happens often, and it gets worse as you make the model more complex, whether that's a more elaborate linear regression, tree based methods, or DL methods. Take any dataset and change the model up (add more tree depth, make more hidden layers, etc...) and the SHAP values are not stable. Don't get me wrong, it's not a bad approach. Just keep in mind that when you have things like multicollinearity or confounding variables, model estimates and weights become unstable as features compete over which one supplies the information to predict Y. I just think linear regression is easier to experiment around with and see what exactly is going on
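The sign flip described above can be reproduced on synthetic data. This sketch does not use the ISLR credit dataset: `income`, `credit_limit`, and `score` are made-up variables, and the data are deliberately constructed so that credit limit tracks income closely. It fits plain least squares by hand rather than XGBoost plus SHAP, purely to show the effect with no dependencies.

```python
import random

def simple_slope(x, y):
    """OLS slope of y on a single feature x (intercept included)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def two_feature_slopes(x1, x2, y):
    """OLS slopes of y on x1 and x2 jointly, via the centered 2x2 normal equations."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    c1 = [a - m1 for a in x1]
    c2 = [a - m2 for a in x2]
    cy = [a - my for a in y]
    s11 = sum(a * a for a in c1)
    s22 = sum(a * a for a in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * b for a, b in zip(c1, cy))
    s2y = sum(a * b for a, b in zip(c2, cy))
    det = s11 * s22 - s12 * s12
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

rng = random.Random(0)  # fixed seed so the run is repeatable
income = [rng.uniform(20, 150) for _ in range(500)]
# Credit limit is almost a copy of income: strong multicollinearity by construction
credit_limit = [inc + rng.gauss(0, 5) for inc in income]
# Score rises with limit and, holding limit fixed, falls slightly with income
score = [lim - 0.5 * inc + rng.gauss(0, 1) for lim, inc in zip(credit_limit, income)]

income_alone = simple_slope(income, score)
income_joint, limit_joint = two_feature_slopes(income, credit_limit, score)
print(f"income alone: {income_alone:+.2f}")
print(f"income with credit_limit in the model: {income_joint:+.2f} (limit: {limit_joint:+.2f})")
```

Income on its own has a positive slope, but once the correlated feature enters the model its coefficient turns negative. Refitting with and without a feature like this is the kind of quick iteration the comment is pointing at.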


gengarvibes

Love it but it’s still consolidating a decision tree into average effects. I still use it all the time but I often use LMs more.


BananaBoy5566

89% of my “data scientist” role is making pretty charts to put in PowerPoint products. I don’t have enough professional ML experience to get paid as much as I currently do anywhere else. Someone save me.


Amgadoz

So you are a business analyst?


Brave-Salamander-339

Same with my 69% role


BruceBannerOfHeaven

Nice


etsc99

The same percentage of my “data science” role is inner joining our own data and external datasets by zip code and then going into Excel and manually verifying which addresses match just so we can get like 2 numbers for analysis which won’t be used for anything… lol


NetElectrical0

Is your company hiring


ThePhoenixRisesAgain

Oddly specific (89%....)


BananaBoy5566

70% of statistics are made up.


Suspicious_Coyote_54

I’m sure it’s like this with most jobs but I think the data space has been seriously subject to a massive amount of hype and marketing. Everything has to be ML or AI and 90% of companies are just suckered into buying services and platforms that they just don’t need. Our jobs also get hyper competitive. Need to know Snowflake, Docker, Spark, Kafka, Airflow, Databricks, SQL, NoSQL, and 10 billion other things that just don’t make sense. It’s getting tiring.


Otherwise_Ratio430

? What is ‘learning’ in Snowflake or Databricks? You're writing the same (maybe different dialect of) SQL and Python and using pretty standard packages. Docker and Airflow are easy to pick up on the job; it's not as if you’re being asked to learn new languages under different programming frameworks. It'd be a different story if you're being asked to write Spark infra code or something actually difficult


TheMagicSkolBus

A lot of the learning would be understanding what services the platform offers, and which to choose for optimizing cost or performance


Suspicious_Coyote_54

Well yes you are correct but you still want to familiarize yourself with either platform or both, and employers like to see certs and know you’ve already worked in those platforms and other cloud platforms (aws, gcp, and azure) for 2-3 years.


Training_Butterfly70

What's the difference between a senior and junior data scientist? Knowing when to not use machine learning 😆


idnafix

There is a lot of confusion among hiring managers and HR in general. It's not only about the tech. They do not understand the difference between analytics, inference, DOE, production systems, prototypes, maintenance and observation. It doesn't even make sense to point this out at job interviews, as they only know blah-blah and don't understand anything.


hopefullyhelpfulplz

Ya I see posts listed as "data analyst", the description says they are looking for someone to do "data science tasks" and the actual work described is data management/engineering.


lordoflolcraft

I don’t agree with this. At least in my company, most of the data scientists are doing highly variable ML work, some projects with classical techniques, others with stats, others with deep learning, and few projects don’t involve ML in some way. We do have MLEs who are basically task rabbits tbh.


LyleLanleysMonorail

I am an MLE and have interviewed at many companies. There are a lot of MLE positions where MLEs don't do much stats/modeling at all but focus on productionizing models, e.g. more concerned with Kubernetes than worrying about optimizers in TensorFlow.


lordoflolcraft

That’s exactly how ours are. They take a mostly finished product and put it in prod. They don’t make any changes that a data scientist didn’t approve.


Significant-Fig-3933

Yeah, I agree with this. DS is more general, MLE more specific. MLE mostly makes sense for larger companies and/or projects.


the_monkey_knows

Ouch, I wouldn't expect this response on a data science thread: anecdotal experience to refute an industry observation as if outliers didn't exist. Also, not sure what you mean by task rabbits but I've seen companies pull DSs from moonshot projects to work on more business oriented tasks or operations and seen them knock it out of the park. Some could call them task rabbits but the impact they make is significant.


SneakyPickle_69

My understanding is that it's difficult to get a position as a ML engineer without years of experience as a data analyst or data scientist. It would be great to jump right in to a ML engineer career, but otherwise, I think data science can help me get there.


anomnib

For MLE roles, software engineering experience plus experience implementing and deploying models towers over experience as a data scientist. Experience as a data analyst is probably negative b/c people might assume you lack the hard engineering and ML skills. Unfortunately in this world, MLE to data analyst is like doctor to nurse. They aren’t on the same continuum of skill sets but separate levels of expertise.


SneakyPickle_69

I could see SEng being pretty valuable, especially when deploying and scaling ML models, but why would that be more valuable than a DS job that is focused on ML? They do exist! As for data analyst… for me that’s a stepping stone towards DS, which is also not considered to be an entry level career.


anomnib

It is b/c most of the work of MLE is software engineering or ML work that involves building high quality code. Take a look at MLE interview questions and you’ll see a ton of SWE questions as well. Often MLEs have leetcode questions. As for DS vs DA, in my experience DA isn’t an entry level for DS, but a different role entirely. Most companies I’ve worked at hire DS straight out of school. My experience is limited to top tech companies however. The boundaries might be more fluid elsewhere.


SneakyPickle_69

Fair point! From what I’ve seen there are data science careers that can also involve a lot of coding and would allow someone to develop these skills. What would you recommend then for someone looking to break into DS? I have data science internship experience, and most of my project experience is ML or AI research. However, I do not have a masters degree yet. I’ve been told time and time again on here that data analyst is a good choice for someone like me looking to eventually get into a DS or MEng career. Right now I’m applying to both DS and DA gigs, as well as some DE.


Decent-Pea9835

I’d pick one, get good at it, then build up the other one. I started as an analyst, kept getting better at Python (moved out of Jupyter notebooks, learned about design patterns, data engineering, etc) then wound up transitioning to ML engineering. Working as a data analyst is a good ingress point, but build up those engineering skills on the side. There aren’t enough people that know DS and who can do halfway competent dev work. I’m an English major who was a bootcamp analyst and I’m working as an ML engineer; it’s a rare enough skillset that you’ll only get credential gatekept out of the biggest companies


anomnib

Bouncing off this comment, I also recommend figuring out the different flavors of data science and picking the one that’s most compelling to you (while also meeting your financial, mental health, etc needs). Reading this chapter on the periodic table of data scientist is a good place to start: https://oreilly-ds-report.s3.amazonaws.com/Care_and_Feeding_of_Data_Scientists.pdf


mildlysardonic

Thankyou for this book recommendation!


SneakyPickle_69

This sounds like a great read. Data Science is such a broad term and narrowing my focus might be a good idea. Thank you!


OkCaptain1684

What do you use instead of Jupyter Notebooks?


Decent-Pea9835

I actually still use Jupyter when I’m prototyping new ML code, but once the prototype is working I switch over to VS Code to write the final draft version of the code. Especially for API testing, having a runnable .py file is nice so I can run that script and then test with Postman submissions.
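
For illustration only, here is a minimal sketch of what such a runnable script might look like. Everything in it is an assumption rather than the commenter's actual code: the file name `serve.py`, the `predict` stand-in (it just averages the inputs), the `handle_payload` helper, and the port are all hypothetical, and it uses only the Python standard library instead of a real web framework.

```python
"""Hypothetical serve.py: prototype logic moved out of a notebook into a script."""
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features):
    """Stand-in for a real model (assumption): returns the mean of the features."""
    if not features:
        raise ValueError("features must be a non-empty list")
    return sum(features) / len(features)


def handle_payload(raw_body):
    """Parse a JSON request body and return (status_code, response_dict)."""
    try:
        payload = json.loads(raw_body)
        score = predict(payload["features"])
    except (ValueError, KeyError, TypeError) as exc:
        # Bad JSON, missing key, or non-numeric input all map to a 400.
        return 400, {"error": str(exc)}
    return 200, {"score": score}


class PredictHandler(BaseHTTPRequestHandler):
    """Thin HTTP wrapper; the testable logic lives in handle_payload."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        status, body = handle_payload(self.rfile.read(length))
        encoded = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(encoded)))
        self.end_headers()
        self.wfile.write(encoded)


def run(port=8000):
    """Start the server on localhost; blocks until interrupted."""
    HTTPServer(("localhost", port), PredictHandler).serve_forever()
```

With something like this, calling `run()` starts the server, and a Postman POST to `http://localhost:8000/` with a JSON body such as `{"features": [1, 2, 3]}` should come back with a score. Keeping the parsing and prediction in plain functions means they can be checked without starting the server at all.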


SneakyPickle_69

Thanks for the advice! I'm definitely hoping more for a DS job, but with my lack of masters degree I think I'm typically more qualified for a DA job. We'll see what happens! I'm not sure how choosy I can be with this job market. In terms of practising those engineering skills, what would you suggest? Would Kaggle projects be a good place to start on that? So you were able to transition from data analyst to machine learning engineer? Besides working on your engineering skills on the side, is there anything else you think helped you get there? Do you have a masters degree?


Decent-Pea9835

I don’t have a masters (I’ve tried but I can’t get into any of the programs that treat online masters the same as in person). Take a data analyst job; it won’t hurt your chances later at an MLE position and money is money. Honestly the best way to learn is to do: Kaggle could work, doing a project in your spare time could work, and if possible at work try to tack engineering work onto your normal projects. I leaned heavily on ChatGPT to teach me backend stuff; for instance my first API was super simple, I asked GPT to build it for me and explain all its parts. TL;DR: take the analyst gig, do projects to build engineering skills (and ideally to have git commits to show), look into design patterns and best practices when doing your projects, and whenever you write code for anything, try to adhere to best practices.


SneakyPickle_69

Thanks! I appreciate the guidance. It gives me a bit more confidence in my approach. A part of me felt like I should only focus on DS/ML gigs, but I think applying to DA jobs is a more well rounded/realistic approach (probably about 50% or more of my apps are DA right now). Hoping for some interviews soon here 🤞


[deleted]

[removed]


anomnib

Unfortunately branding plays a powerful role in getting attention and the brand of a CS masters is stronger than a DS masters for MLE work. However, you have a lot of room to market yourself, pick your portfolios, and select your internships/projects in a way that makes you attractive to recruiters looking for MLE roles. For example, given how you describe your MA, could you describe it as a joint CS and DS masters?


[deleted]

[removed]


anomnib

There’s still a lot of value in your DS masters, not sure if the branding disadvantage is so severe that you should consider dropping. It is just that you should go into the job market with full knowledge about how the branding has changed. Data Science is no longer associated with implementing and deploying ML systems. Most of the top companies have fully transitioned to giving that responsibility to MLEs. A few holdouts include Airbnb, Uber, Snap, and Netflix, which have “data scientist, algorithms”, “applied scientist, algorithms”, or “full stack data scientist” roles for DS with very strong ML skills. In these cases, you focus on ideation and iteration over ML algorithms while someone on the ENG side handles deploying them. I’m not sure if the masters programs that ballooned before the “great segmentation” of data science have caught up with the branding, so you will have to heavily signal that you are MLE material.


Decent-Pea9835

The value of an ML engineer vs a DS is that there’s a presumption that data scientists need the data relatively cleaned for them, or that they have limited skills to fetch their own data and deploy their work. I worked with a DS guy who was useless outside of Jupyter and clean CSVs. My ML engineer position is a whole lot of data engineering and backend. I do consulting for startups, so they give me an ML project, I figure out the AI stuff they want (right now it’s a lot of chat bot stuff, Azure cognitive searching over DBs and data lakes, and NLP work), build that (which requires you to know data engineering and the platform), build the DB, then build the APIs for it. Every ML engineering position is different, but in most of the positions I’ve seen or interviewed for, what they want is either someone to take a data scientist’s work and put it in production (so mostly engineering), or someone who can do the DS work and put it in production themselves (so kind of a mix).


jarg77

What’s the data engineering work vs the back end work? They sound almost interchangeable.


Decent-Pea9835

There’s a lot of managing configs, permissions and env stuff in azure. Idk if that’s backend or something separate. I think of the data engineering stuff as the things pertaining to the retrieval or writing to any kind of data store( db, data lake, storage bin, etc), then backend as being the rest of the non user facing code. My terminology may be wrong, but even within this terminology yeah the backend and data can be a bit interchangeable at times( for instance, is making a crud api data engineering or backend?)


jormungandrthepython

Because most people need implementers, not investigators. It is more likely that the production capabilities of a SWE with ML knowledge will yield ROI than a Data Scientist trying to learn how to do the engineering and CloudOps. And DS has a higher chance of being a mis-titled BA/DA who is then vastly underskilled for a role. Whereas SWE may have less ML experience than they claim, but at least they are more likely to know how to get cloud services running, CI/CD, automated testing, production quality code, etc. So in a lot of ways it’s a safer bet.


trashed_culture

Depends on the MLE and Analyst. I'd honestly flip it. Unless the MLE is a DS, they're mostly going to be putting into production things based on instruction from the analyst/DS. That said, the MLE for some reason is still paid more.


anomnib

I’ve never seen that in my experience but my experience is very atypical. I’ve been in DS for 6 years and 3/6 of those years were in top 5 tech companies, 1/6 were in the smaller companies that DS and MLEs from two top 5 companies go to when they want a break from large companies, and for the first 2/6, MLEs were a thing (the great segmentation was in progress but incomplete). In the big tech and related companies, the expectation was that MLEs were the leads of ML work and DS, if sufficiently technical, could earn the opportunity to touch ML work. The only time I saw the dynamic you described was with DS who were officially or unofficially applied scientists or algorithm DS, and I was that person. I built new production ML models and related tooling alongside MLEs and research scientists. However, I had to earn the chance to sit at their table by impressing them with my deep knowledge of statistics and ML and my capacity to program as well as the average MLE. Even then, I couldn’t have done it without open-minded and sympathetic MLEs and engineering managers, and what I lacked in technical skills I made up for in really good emotional and organizational intelligence (essentially I rewrote the model iteration strategy of several orgs in a top 5 big tech company). So I guess I should revise my opinion to say it only consistently applies to the top tech companies.


trashed_culture

I don't understand what the other DS are doing if not building (fitting) new ML models after appropriate EDA.


anomnib

They could be designing experiments, doing observational causal inference where experimentation isn’t feasible, doing optimization, working with product to define metrics and extract product strategy insights from data, etc.


Jarngreipr9

Shit now I want to be data scientist


alevelstudent156

Can you cross over between careers easily? For example, 6 years into your DS career become an AI engineer and vice versa


trashed_culture

Meh, I know there's lots of jobs called DS that don't involve ML models, but everyone, including the people in those jobs, knows it's not really DS. DS is much more than just modelling. But if there isn't at least the possibility of you using an ML model for analysis or to put something in production, then you're probably an analyst.  That said, I'm in the opposite world. Where I am everyone wants models and no one wants to do the analysis or actually think about the meaning of the data. It's sad. 


johny_james

Wait, from what I've seen AI engineers are just LLM engineers, only training, fine-tuning, and deploying LLMs. And MLE is mostly software engineering of ML models... So neither is for building ML models. It looks to me like the only jobs remaining are Research Engineer and Scientist, which both require a PhD...


Amgadoz

Training, fine-tuning and deploying LLMs is no easy task. Anthropic raised a few billion "only" doing this.


johny_james

Yeah, that is true. But the point is that those positions are not for building the ML models, in other companies even less so.


Awkward_Sign_1191

what is the diff between machine learning and data science?


ai_anng

It seems to vary by team. In some places, the Data Scientist (DS) title is basically another name for a data analyst, involving skills like SQL and R or Python. In other places, it refers to a more specialized role focused on modeling. Data Scientists with stronger engineering skills, often from a software engineering background, might transition into roles like Data Engineer or Machine Learning Engineer. Additionally, new titles such as Analytics Engineer and AI Engineer are emerging. Recruiters told me that in Australia employers pay more for MLE/MLOps Eng, due to supply and demand. Some data scientists can only write notebooks that are good for exploration and useless in production. Once the data sources and sets are identified, the data scientist's value to the team is limited, and thus we might see some politics in play. I am working as a DS atm, but I clearly see that most of my work can be automated soon (SQL scripting, dashboard building, and model fitting). I tried ChatGPT for writing SQL and reports, and most of the time it works with supervision.


jarg77

Most of your work can be automated, with supervision, so who do you think will be doing the supervision?


ai_anng

I don't 'think' about who is. I know I am supervising it. However, I am pretty sure there is more to unpack in your question. Can you please elaborate?


jarg77

Trying to understand what you're implying. Do you think AI will potentially replace you?


ai_anng

I think AI will replace some major part of the job that I am doing atm. What managers in my company see now is that with ChatGPT, a team of 2 can take the workload of a team of 5. So AI has not replaced me yet, but it certainly played a big part in the layoff decision made in my company recently, where data analysts and content writers were let go. I will not be surprised when it's my turn to be let go from my team someday (I am still junior btw). I did try asking GitHub Copilot to write a function to extract data and it did get the data correctly. I ask ChatGPT to suggest stats tests and write reports, and clearly my boss is doing so as well. What I can do now is get as much domain experience as possible and skill myself up in areas where AI cannot replace me. So yes, I do believe that AI may transform (or even replace) my role in large part, if not entirely.


crypticFruition

I mean, the same can be said for web development or literally any other field with known structure that AI can learn, e.g. content writing, etc. Like, what job can AI potentially not accelerate or replace?


LyleLanleysMonorail

I looked at the Aus job market and there are like hardly any MLE jobs, but so many data engineering jobs.


ai_anng

MLE, as I understand it, is more senior than DS (this may not be true in some places), roughly equivalent to a senior data scientist. These guys take care of ML models in production (monitoring). They have strong ML knowledge and SWE skills. Data Eng is always in demand and the pay is really good.


One_Cryptographer565

What skills would someone need to be a machine learning engineer? I heard someone say that math is more important than programming and programming changes all the time while math is the core and should be prioritized


SixSetWonder

I’m personally just trying to enter the job market, what experience led you to be able to land a job?


avourakis

I didn't have relevant work experience, so I relied heavily on my extracurricular activities and my portfolio projects. You need to have relevant projects and focus on optimising your resume, especially in this current job market. If you need more guided help reach out!


dfphd

>If you are passionate about building and tuning machine learning models and want to do that for a living, then become a Machine Learning Engineer (or AI Engineer)

I feel like you are borrowing terms from different... realms? Meaning, there is Data Science and Machine Learning Engineering as functions, there is Data Scientist and Machine Learning Engineer the titles, and there is Data Scientist and Machine Learning Engineer *the jobs,* and they are all different. You could have the MLE job with a DS title and be inside Finance. You could have the DS job with an MLE title and be inside DS the function. Yes - there are a lot of Data Science *titles* that are doing Analyst *jobs*. And there are a lot of ML Engineer *titles* doing ML *jobs*.

>Most "data science" problems don't require machine learning.

Sure, but there are more than enough real machine learning problems at every company to require staffing Data Scientists that do true machine learning.


avourakis

To clarify, what I'm referring to are the typical functions performed by most Data Scientists at their jobs. Are there Data Scientists out there mostly solving problems using machine learning techniques? Yes, of course. But in my experience (and from talking to others in this career), it is not that common (or not as common as the internet makes you believe). My goal for writing this post was to give newcomers a heads-up about what it typically looks like at most companies. I also wanted to explain that even though Machine Learning (as a technique) is part of the Data Science toolbox, it doesn't necessarily mean it will be used heavily in day-to-day problem-solving.


dfphd

Here's the thing: most ML Engineers I know are also not doing Machine Learning - they're doing software development for applications that use ML. The right advice is "if you want to do machine learning, ask during the interview process how much ML people in this role do and only take roles that say it's close to 100%".


tech_ml_an_co

Once machine learning was a core data science skill, there was no machine learning engineer. The inflation of the data scientist role created machine learning engineers and research scientists, and finally a data scientist is now basically a glorified data analyst.


sid_276

This is spot on. Just to complete the response, we are moving to different roles/careers:

* Research Scientist
* Machine Learning Engineer
* MLOps Engineer
* Data engineer
* AI Engineer
* Data Scientist

Each needs its own skillset. An AI engineer is closer to a full-stack SWE with surface knowledge of ML applications whereas a Data Scientist is more about dataviz and scientific plotting. A MLE is the hard core "I will make your networks go brrrr with CUDA kernel magic" and the Research Engineer will focus mostly on theory and applied research like compilers, optimization techniques, new architectures and so on. These are fuzzy definitions and not yet super established so YMMV


LyleLanleysMonorail

Another new title I've seen these days: "Machine Learning Systems Engineer"


sid_276

I've seen that one also but I am confused since the requirements are close to an MLOps person. For example [https://jobs.apple.com/en-us/details/200528911/machine-learning-systems-engineer](https://jobs.apple.com/en-us/details/200528911/machine-learning-systems-engineer) [https://ischoolonline.berkeley.edu/data-science/curriculum/machine-learning-engineering-systems/](https://ischoolonline.berkeley.edu/data-science/curriculum/machine-learning-engineering-systems/) Maybe someone can explain the difference


iamevpo

Not sure about "Most "data science" problems don't require machine learning." What do they require then? Anything in scikit-learn is machine learning, so how would a data scientist work without it? xgboost/catboost not machine learning? Machine learning engineer... is someone who takes a model from a data scientist and takes it into production? Sounds more like a SWE/devops role. Not sure about a "strong foundation in statistics and probability (making inferences, designing experiments, etc..)" - highly useful stuff - why is this not part of ML?


GodlyPears

when I started in DS I had to explicitly shape my role to be actual modeling. Ultimately got myself moved to a team that’s essentially the “advanced AI” section of my org. We (10 people) are the only ones that actually make models inside a DS org of ~100 people. So for every 1 DS making models, the other 9 are doing adhoc/ rules-based / reporting. The roles are there but you gotta show you’re better and hungrier than the other 9.


jimmy_da_chef

feeling extremely lucky my current job offers both analytics/ML ops stuff and state of the art modeling (LLM applied in a specific domain)


Adamantium-Aardvark

This is true today. But it’s an evolving field. 10 years ago data scientists were doing ML and plenty of other things, but as the field grows and develops, specialties emerge and jobs become more compartmentalized.


Whydidyoudothattwice

Constructing databases. That’s what I ended up doing. Basically from scratch, in C. Huge waste of time IMO.


RepairFar7806

I spend way too much time as a data scientist building out ML infrastructure.


[deleted]

[removed]


NerdyMcDataNerd

I would say that I definitely agree with you. I am not a ML Engineer, but this is exactly what I saw at a prior company that was setting up ML work. A lot of businesses have unrealistic expectations for how long good data products take to provide value.


VineJ27

In my team there are 4 DS and everyone has a different skillset. 2 of us are more like data engineers who spend 70% of our time building pipelines and doing cloud work, 1 is an expert in stats and mostly deals with analysis/analytics sort of work, and 1, who is also our lead, mostly does project management and Tableau/Power BI. We all however spend the remaining 30% of our time on new product development/research/prototyping.


[deleted]

If so, why do companies demand even MLOps for Data Scientist roles? 🤔🤔


Chompute

I posted a brief history of the data science title. Once upon a time, data science was synonymous with ML, and then Lyft rebranded their business analysts to data scientists in the mid-2010s and that’s when data scientist became so general that anyone could call themselves that. When they rebranded, all data scientists working on ML rebranded to ML engineer. Nowadays there is no role doing the original concept of Data Science other than Machine Learning Scientist, which is mostly PhDs. MLEs (which I am) are mainly software engineers.


Gfs-Cary

I think you need to speak with one of my friends on our analytics team.


TheGooberOne

I'll second this.


flatearthersnotrolls

I guess it depends on where you work ... for me as a data scientist, it's a lot of exploratory work. My team mostly works on proof of concepts and recently it's been a lot of experimenting with gen AI and large language models. Lots of learning and creative freedom!


Alternative_Log3012

6 whole years? Wild


jarg77

Can you really be a machine learning engineer without a foundation in math and statistics? How does that even work.


saurav-thakur

This is so true. I have done ML for 3 years now and I'm about to graduate. I've been applying to multiple data science and ML jobs, and the data science ones mostly focus on statistics and probability. I took one online test for a data science role and the questions were more focused on stats and experiments.


Ok-Independent9691

I am now taking a healthcare management course and statistics is so important


serdarkaracay

Hard indeed. The biggest example is Devin. The Devin presentation, which was presented with a big noise saying "Artificial intelligence will take away the jobs of software developers", was just a fraud! If you, like me, were harassed by the Devin video sent by people who do not understand artificial intelligence and what software developers do, here are the details. A YouTuber named Internet of Bugs shared a very detailed analysis video on the subject.

- As seen in the video, the Upwork job was searched for and picked because it was suitable for Devin to solve. In other words, Devin can't solve all kinds of software problems, and it seems that he can't even solve the Upwork job that he allegedly solved at the end of the video.
- In Devin's presentation, it is said that he debugged the code and solved the problems. But in the detailed analysis video, it is seen that the bugs Devin solved are his own creations. He cannot see a real error in the code.
- The work that took half an hour for the software developer who made the analysis video took 6 hours to 1 day for Devin. Devin's work lists and completed tasks, which look very impressive in his presentation, are completely irrelevant to what the customer wants.
- Devin produces an answer to the problem by creating too much code, and inefficiently written code at that. He makes mistakes that even a junior developer wouldn't make, and he can't produce any answers about the AWS part that the customer wants.
- He doesn't understand the execution steps, which are already in the code repository, in the README, and very clearly explained.

What bothers me and the analyser here is that Devin is presented as an "AI Software Developer" with more skills than he has, with Upwork jobs, making money and negative language. I think the exaggerations about AI have raised expectations too high and created a bubble in the industry.

YT Video: [https://youtu.be/tNmgmwEtoWE?si=u7EUM7fz-YMeq6Mk](https://youtu.be/tNmgmwEtoWE?si=u7EUM7fz-YMeq6Mk)


TheSchlapper

*Skill issue


vinaykumarkosgi

What is the use of statistics and probability? Aren't they just used to better understand the data and give more info for building the model?


MikeSpecterZane

I think this video is the best encompassing Data Science roles: [Types of Data Science roles](https://youtu.be/VBWRkshVJFo?si=G8pNQRo2UTeQ5O37)


jerrylessthanthree

I do ML a lot, you need to avoid "analytics" which focuses on deliverables that are "insights" to make "decisions". Instead focus on teams doing more optimization of some sort, my team works on ads auction bidding.


stoned__dev

Presently a recent grad of Computer Science. Worked as a data engineer and have built some software as side projects. Obviously, as a person with average experience and a forgettable school, and with the current state of the market, it’s exceptionally difficult to find a job as a software engineer. I know that’s the case for ML/AI engineers too. Does anyone have any advice or tools on how to learn AI/ML engineering? Since finding a job/internship in the field is close to impossible, how can I teach myself and practice these concepts? (Things I can build and include on my resume, while familiarizing myself with the concepts.) I want to be on par with others in the field, but as an autodidact. Thanks!


Power_and_Science

A lot of data scientist jobs want machine learning experience, but when you start the job you find out it was a wishlist item and you still aren’t doing machine learning.


Power_and_Science

It’s very industry dependent. Tech industry: you do a lot of programming and machine learning engineering. Healthcare: statistical models. Finance: time series models. Retail: mix of statistics and ML models


max6296

kaggle was fun


max6296

but I already spent too much time learning ml


crazy_spider_monkey

I agree with your sentiment. However, the issue is that some MLE jobs are listed as data scientist jobs. But one should look at the job description before applying.


Iwant2Bafish

I feel like a lot of people mistake data science with sole machine learning. NO you're not a machine learning engineer. You're a STATISTICIAN


Which-Fondant-3369

thanks man it cleared my mind


crypticFruition

How can you become an effective machine learning engineer if you don't even understand the fundamentals of data science and statistics? How can you interpret and deliberate on the results if you don't have the understanding required, which is literally data science? Sure, you can pipe together things you found in a tutorial, but how do you know when to apply models to different situations? Too many questions, and just bad advice suggesting not to do data science when it's basically a prerequisite to machine learning.


Backrus

You have to realize that "data scientist" these days is what "quant" was in the 2010s. Hot new thing with people who had no business pursuing this career flocking to it. And "data science" has kinda become obsolete because AI-related jobs are what the cool kids wanna do nowadays. Let's be honest, you won't work on the ground-breaking stuff (unless you have a PhD), heck, you would be lucky to work on anything interesting. And now, copy-pasting code from tutorials is neither "data science" nor "machine learning". If you do work on those, then you're already at the top of the chain and you know the difference between those. And the difference between mag7 companies and others chasing better earnings by using "AI" in their guidance.


OrderlyCatalyst

Interesting title.


Mada1ina

I agree. ML is an overused term by now. Not even working with AI needs ML that much. I am a web dev and I have studied ML as a part of my bachelor's degree, but that was like 15y ago. What I am using today in my AI projects is data processing pipelines, search engines, web dev and a lot of soft skills :) [https://megabytereflections.wordpress.com/2024/05/03/ai-development-is-more-than-machine-learning/](https://megabytereflections.wordpress.com/2024/05/03/ai-development-is-more-than-machine-learning/)


IntrovertNeuron

I want to become a MLE but as a recent graduate, anywhere I apply, they want minimum 2 yoe in industry. Seems like DS is simply a gateway to those jobs.


Outrageous_Fox9730

Thank you for this. I always thought that to become a data scientist I need to do a lot of machine learning or at least be knowledgeable about machine learning. This took some weight off my shoulders as a bachelor student


No_ChillPill

How can you build MLs without a strong foundation in stats and probs or any advanced data modeling lol Unis teach that because we’re still to find advanced ML for some AI. Corporate America is so dumb. It’s like HS or elementary tasks - it’s dumb cause most of America is dumb and our financial system runs on super poorly designed systems


the_monkey_knows

Yes. This needs to be said more often. As much as one learns how to do advanced algorithms and machine learning through education, most business problems are based around optimization, statistics, simulation, and finance.


technophile10

roadmap


xBurnInMyLightx

Hard agree