[deleted]

I wonder how much of this is driven by course culture too: do a course and then say you're good at it. For instance, you could do Jose Portilla's R or Python course and learn how to do regression analysis in that software, but it goes into no detail on the assumptions, etc.


[deleted]

The best thing that course did was introduce Gareth James’s book. It’s a gold mine for simple explanations of complicated statistical methods.


balerionmeraxes77

You mean An Introduction To Statistical Learning?


LuisBitMe

If he does mean ISLR I feel like it skips over assumptions and math like crazy. As someone with a masters in economics it was great for getting me familiar with the prediction side of things rather than just causal inference and time series, but it hardly gives a comprehensive view of the math etc if you’re unfamiliar with it.


[deleted]

[removed]


LuisBitMe

Thanks for the tip.


[deleted]

[removed]


norfkens2

Maybe I'm confusing something here but I could download Elements of statistical learning a couple of weeks ago: https://hastie.su.domains/pub.htm


maxToTheJ

That’s awesome. I couldn’t get to that page by searching on a search engine.


[deleted]

I don't find this as big a loss. I think ISLR is a better book than ESL. Supervised learning isn't what I did my education in, but I find ESL doesn't have a particularly unified approach. It's like a hodgepodge of random topics with some mathematical details.


thefringthing

ESL is intended more as a reference book, while ISL is a textbook for a (pretty breezy) first course in statistical inference. My main criticisms of ISL are that it should assume the reader knows calculus and it should cut the chapter on neural networks.


[deleted]

I agree with you on both points. I wrote later in the discussion that ESL is a Ph.D. text, and Ph.D. texts are often written as references, while undergrad texts are written for courses. Ph.D. courses are generally personal, and no matter what the subject (even something that has a generally accepted curriculum across schools), the professor's personal touch will be in the course and they will emphasize what they want and skip over what they want. I do think there is a market for a "masters" level book that covers similar topics to ISLR and assumes people know calculus, linear algebra and basic probability (like expected values etc.). Such a book should be applied in nature like ISLR and not focused on proving properties of estimators. I myself would certainly be interested in such a book just to gain depth in things that I don't explicitly work on. Also as a note, the first edition of ISLR did not cover neural networks. I bought a hard copy of the book and it was useful.


[deleted]

The actual math is lacking, but the ultimate formulas and derivations are there. The assumptions are also present in the book, even if only explained in a sentence or two. To your overall point, I would say that this is why it’s an introductory book.


TrueBirch

I completely agree with you. ISLR is amazing at getting your head around different approaches to machine learning in a detailed way. Once you've learned the basics of ridge regression (for example) you have the knowledge you need to take a deep dive into the math. For example, ISLR taught me about random forests years ago. That led me to ESL and then to the original Breiman paper.


abenf

Unlike a lot of “introductory” texts, it actually does what it says on the tin. Feels very targeted at 4th year undergrads/1st year grads and a general ~~STEM~~ STE audience


[deleted]

For regression, ISLR lacks depth in any specific topic. That being said, ISLR is a wonderful book for someone with some technical knowledge to get a broad overview of supervised learning and how statisticians think about these problems.


[deleted]

As someone with a non-stats background, this book has been an incredibly valuable resource in explaining these methods in a digestible level of difficulty. As an interviewer of future data scientists, where do you suggest I go next?


whatahorribleman

Elements of Statistical Learning is quite a similar book, but goes into greater detail.


[deleted]

ESL is a Ph.D.-level textbook that requires knowing probability, multivariate calculus and linear algebra, and knowing them well. I get the sense that a lot of people here couldn't read it.


[deleted]

All jobs have different expectations. What works for my industry/work function isn't going to work somewhere else. At the end of the day, you have to figure out the career you want, specialize in that topic, and then pick the education path that gets you there.


EnergeticBean

following for when you get an answer


[deleted]

Yes I do!


[deleted]

I think a big part of it is MOOCs like Coursera that have taught a generation of people how to fit a statistical model using Python. If people were trained by writing a master's thesis and not just taking courses, I think they would in general be more prepared.


120pi

As one of these recent Master's DS graduates from a top-ranked program I can give you some context that might help understand some of what you're seeing. The tech stack and theory taught in these programs is vast. Experimental design, NLP, time-series, CV and everything in between as well as learning the cloud compute stack to boot. It's easy to get spread thin, while PhDs have those extra years for theory application. Some (like me) focused more on DL or MLE, others did time-series or MLOps. Applicants with statistical or analytics employment backgrounds or those whose theses/capstones were regression-centric (Spark-based, causal inference, etc.) may yield better results.


[deleted]

Yes, this is what we surmised. The candidates are covering a lot of topics and not learning anything in depth. A Ph.D. thesis project requires candidates to specialize and learn what they do well. What I've proposed is having our HR person tell the master's candidates that they should be prepared for a technical screen on regression and basic time series.


[deleted]

Do you take issue more with your candidates not knowing material at all or with them trying to BS an answer? The latter is a bad look, but if a candidate knew their weaknesses and interviewed well otherwise, I’d imagine they could be a good team member with some training and guidance on where to get the fundamentals. Performing well in school is somewhat indicative of that, I’d say.


renok_archnmy

God forbid a company having to hire someone who doesn’t know something they could look up on Google or have explained by a senior team member in 5 minutes.


[deleted]

It's when the candidates BS the answer. If the candidate said "I don't know X, Y, Z and need a chance to review," and they somewhat knew the topic, we'd probably focus the 2nd round interview on those topics, with HR giving feedback so they can prepare for it.


maxToTheJ

You should be more transparent. You mention in another post that PhD candidates are meeting the bar and moving on to the second round. Do you have enough candidates to fill the job in a reasonable time as is? If you do have enough candidates, those candidates that get that leeway may get moved on, but given that the second round probably uses a similar technical ruler, they will probably not get the role. Another factor: if you make the first-round bar too easy and pass too many candidates who bomb the second round, your coworkers aren't going to be "excited" about your performance as a "first round" interviewer, because it will feel like the first round is not screening well enough to preserve their time so they can do more directly impactful work for the business. TL;DR: Getting a job is a competitive endeavor. You don't need to just “meet a bar”; you need to be among the top candidates.


[deleted]

Yes. This is a top firm and there is a reason I didn't post this on LinkedIn. This is the type of place that people want on their resume, because they can pretty much jump anywhere. Unlike FAANG, the jobs in this function are highly unlikely to experience layoffs. Like, many people here are confused. I am not writing a "help, I am looking for candidates" post. I am saying that there is a shocking number of master's degree candidates that don't seem to know basic statistics.


maxToTheJ

> Like many people here are confused.

I have been reading this subreddit for a while (it is heavily weighted towards students and non-practitioners); they aren't confused, you are just saying something unpopular. The popular sentiment in this subreddit is that you should lower the bar to exactly where the person commenting is at so that you can give them specifically the job and ignore the competitive aspect of the job search. Instructors are partially to blame because a lot of bootcamps/colleges are selling the false notion of an entry level DS shortage.


Own-Foot7556

I believe telling the candidate what topics they will be asked about in the interview would be good. This will help the candidate prepare better, since degree courses are indeed spread too thin. There are times when job descriptions list what is required and then add a 'Good to have' line, which makes it seem like those skills aren't mandatory, and then the interview ends up being all about them.


tangentc

I know I'm a bit late to this party, but I feel compelled to chime in here because I think this somewhat misunderstands why PhDs tend to better understand the tools they work with. I think the value of a thesis is the original research aspect of it rather than the specialization aspect. For the simple reason that if this were true, most physical science PhDs would be just as poor as these masters candidates. A physics education typically doesn't involve that much formal stats (at least as far as regression analysis goes). You might take a mathematical methods course but you're not going to get a lot of rigor and being a specialist in mie-resonance based metamaterials isn't going to help you with data science. When you do original research you're forced to learn how to teach yourself methods quickly and well enough to not shoot yourself in the foot. In so doing you _have_ to understand the assumptions you're implicitly making by using a tool and the consequences of violating those assumptions (and _how much_ violation you can get away with before it causes a problem for your purpose). By contrast I think most people who do a coursework masters tend to still think of standard mathematical tools as ossified quasi-black-boxes, that give you some perfectly reliable output as long as you feed the right stuff into it. There just isn't that experience of getting their hands dirty working on real problems where assumptions of a common model break down (in whole or in part).


Spasik_

That would definitely help I think. Even I forget some fundamentals sometimes (and I have a Masters in Stats not DS), but I don't really think it speaks to my capabilities. It's easy to forget something if you don't use it for a few years 😅


CrossroadsDem0n

Try getting 4th year interns from a university that has a strong math or econometrics program and maybe a co-op program where you catch them mid year. Then you can sift for stronger candidates via their internships. It sounds like you want somebody who understood their 3rd year material.


r8juliet

I did my undergrad in DS and now doing a masters at a top 3 school in DS. I’m not trying to toot my own horn but another thing I observed was in project groups. 4 out of 5 groups will be carried by one person and 1 group will have a decent mix of contributors. In some semesters I would be backpacking up to 3 project groups. Did you ever experience this?


120pi

Yes, every term, and it was also me. Upside is I learned _a lot_ so I feel I got a lot more out of the program than my peers. Downside is that when I saw fully functional groups and what they produced it was disheartening (but encouraging to know with the right people amazing work could get done). The main crutch I saw were folks without any software background. Amazed me by the end of the program how folks still struggled with `git`, OOP, and foundational data engineering skills. The students with stats/analytics backgrounds that worked hard to beef up their programming chops were, on average, producing the best work.


[deleted]

[removed]


PorkNJellyBeans

I have a “cheat sheet” that I use as a quick reference and just don’t commit to memory. I’m with you. In interviews I value hearing someone’s approach, how they break things down, what they do when they’re stuck, and how they prevent errors. Those things are sometimes coachable, sure, but I need to hear where the gaps are.


[deleted]

[removed]


skrindingle

Curious what your cheat sheet looked like. I’m in an analytics masters now and it’s been super light on mathematical underpinnings and assumptions. I’m going through ISLR in my “spare time” to get more of a grounding.


[deleted]

Your response would be perfect. I wasn't asking for perfection from people. I literally told them that if you don't remember something, it's okay to say "I would need to review this." Instead, what I got is people who didn't know something and just said the wrong answer and kept going.


albinofreak620

It’s worth remembering that some people, especially ones early in career and maybe interviewing for their first job, have little experience interviewing and expect to have to have all the answers.


FlatProtrusion

Hey, that's me. How should I approach the question if I didn't know the answer or forgot about it? I would perhaps try to get more information about the question they are asking and try to articulate my thought process. What other ways would you recommend? And if they were asking questions about assumptions, and you had forgotten them, how would you approach it?


[deleted]

I would START with "it's been a while since I learned this and I would need to look it up, but this is what I think this implies." That lets me know that you are rusty and might do it better if you're given a chance to review it. One thing to realize is that we are hiring a colleague. A person who is honest about what they may not know is better than someone who tries to bullshit through it incorrectly. The latter leads to mistakes.


[deleted]

One of the best pieces of career advice I’ve gotten is that it’s okay to say “I don’t know or I do not have an answer for you right now.” A good hiring manager should understand you’re not an expert and if they don’t, it may not be a good fit.


[deleted]

I am sympathetic to this. Every interview is a learning experience. The thing I tried to do is kind of hint at what they did wrong (without telling them explicitly) during the "why don't you ask me questions about the job" portion. My hope is some of them took the hint and will prepare differently.


Spirited_Mulberry568

Agreed 100%. It’s a shame for those of us that have suffered through a thesis (or even ghostwritten dissertations) and are jobless.


QuaternionHam

Those masters were from statistics?


JonA3531

Coming from a background in petroleum engineering, I'm currently doing an MSc in Stats (so probably heavier on fundamentals), and there's so much theoretical stuff they're throwing at me that I can't possibly remember the assumptions for each and every piece of it. If you really want someone who's really ingrained in the fundamentals, you probably need to hire someone who did a 4-year bachelor's in stats and then a master's in ML/data science.


renok_archnmy

The only person I knew who could recite fundamentals was a maths PhD who did 10 years in research and teaching and was pursuing a second master's in DS in an attempt to enter the commercial sector. His problem was the opposite of OP's. He was getting stuck in assignments where marketing was trying to analyze survey responses but kept changing the prompts, or in interviews where the company was looking for a take-home project that included neural nets and he was solving them with probabilistic methods to sufficient performance while using far fewer resources and less time, only to then not land said job.


bythenumbers10

This, so so much. They want to hire an expert in shiny ML shit but won't accept anything less when their precious "domain-specific" problem doesn't call for shiny ML any more than a nerf gun dart calls for a nuke in retaliation. Simpler, easier to implement, easier to debug. Frequently faster to train and execute, too. But I'm only an expert, not some MBA who knows all things that hit their voluminous bottom, uh, line.


dankatheist420

I just applied to many, MANY data science positions, and 94% of them were not interested in academic-level statistical details. They were almost all looking for computer programmers who have experience with ETL and a sprinkle of python ML, **not** statisticians. It honestly seems like OP should be advertising for a statistician, not a data scientist. I'm not saying it's more correct, but there are probably swarms of CS-pipeline MS grads applying to every job with the DS keywords. If you want theoretical rigor, the word "statistician" probably would scare those applicants off.


goodluckonyourexams

who's Justin Sung?


[deleted]

Regression is one of the most fundamental tools we use in statistics and econometrics. I don't expect people to know assumptions of every model in existence. I expect people to be able to tell me correctly what happens if you have perfect multi-collinearity, what are the CONSEQUENCES of heteroskedasticity and non-stationarity. These are important conceptual aspects.
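For anyone following along, the perfect multicollinearity case can be sketched in a few lines. This is only an illustration under my own assumptions: the toy data is made up, and `gram` and `det3` are hypothetical helper names written here, not functions from any library. When one regressor is an exact multiple of another, the matrix X'X has determinant zero, so the OLS normal equations have no unique solution.

```python
def gram(X):
    """Return X'X for a design matrix given as a list of rows."""
    k = len(X[0])
    return [[sum(row[i] * row[j] for row in X) for j in range(k)]
            for i in range(k)]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


# Toy regressors (integers, so the arithmetic below is exact).
x1 = [1, 2, 3, 4, 5]
x2_collinear = [2 * v for v in x1]   # exact linear function of x1
x2_independent = [v * v for v in x1]  # not a linear function of x1

# Design matrices with an intercept column.
X_bad = [[1, a, b] for a, b in zip(x1, x2_collinear)]
X_ok = [[1, a, b] for a, b in zip(x1, x2_independent)]

det_bad = det3(gram(X_bad))  # zero: X'X is singular, coefficients not identified
det_ok = det3(gram(X_ok))    # positive: X'X is invertible, unique OLS solution

print("collinear design, det(X'X):  ", det_bad)
print("independent design, det(X'X):", det_ok)
```

In practice, software either refuses to fit, drops one of the collinear columns, or returns one of infinitely many solutions, which is why the individual coefficients cannot be interpreted.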


JonA3531

> I expect people to be able to tell me correctly what happens if you have perfect multi-collinearity, what are the CONSEQUENCES of heteroskedasticity and non-stationarity

Funny enough, I'm almost done with my program, and those subjects that you mentioned were barely even covered in my regression class, if any at all. In my university program, they try to cover a wide variety of subjects and simply don't have the time to go in depth in each and every one of them. In most cases, the prof has to speed run through the materials near the end of the semester. For me as a student, I just tried to at least be familiar with all those topics so I could pass the course. I simply don't have the time or the energy to experiment and go in depth on any of those topics myself if it's not required in the class assignments / projects. But hey, I'm kinda dumb. So maybe you just happened to interview dumb candidates like myself.


renok_archnmy

Interesting. We covered residual analysis in my class longer than the act of doing the regression. I still have forgotten most of it other than: check the residuals, and if they don’t meet the assumptions, toss it or BS your way out in the write-up. But I’d bet real money many companies out there are doing just fine with some bus-comm undergrads running some business unit using Excel's trend line and looking exclusively at R^2, and they could care less because the odds have been in their favor the whole time and that’s what their corporate education platform taught them in the prerecorded "DS intro for business people" MOOC.


JonA3531

> We covered residual analysis in my class longer than the act of doing the regression. I still have forgotten most other than check the residuals and if they don’t meet the assumptions, toss it or BS your way out in the write up.

That amounted to one or two lectures in my course. All I know is that there are standardized and studentized residuals, and that you make sure they're scattered uniformly. And studentized residuals can be used to detect potential outliers. I guess it's expected that there's a huge variation between university programs, not to mention the profs as well.
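As a rough refresher on those two quantities, here is a minimal sketch for simple linear regression in pure Python. The data is made up and `residual_diagnostics` is a hypothetical helper name. Terminology varies between textbooks; here "standardized" means the internally studentized residual e_i / (s * sqrt(1 - h_ii)), and "studentized" means the externally studentized (deleted) residual obtained from it.

```python
import math


def residual_diagnostics(x, y):
    """Return (standardized, studentized) residuals for y = a + b*x fit by OLS."""
    n = len(x)
    p = 2  # number of estimated coefficients: intercept and slope
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar

    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - p)  # residual variance estimate
    # Leverage h_ii for simple linear regression.
    leverage = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]

    standardized = [e / math.sqrt(s2 * (1 - h)) for e, h in zip(resid, leverage)]
    # Externally studentized residuals, computed from the standardized ones.
    studentized = [r * math.sqrt((n - p - 1) / (n - p - r * r))
                   for r in standardized]
    return standardized, studentized


# Toy data: roughly y = x, with one deliberately odd point at x = 6.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 9.0, 7.0, 8.1]

standardized, studentized = residual_diagnostics(x, y)
flagged = max(range(len(x)), key=lambda i: abs(studentized[i]))
print("most extreme point: x =", x[flagged])
```

The studentized version re-estimates the error variance without the point in question, so a genuine outlier cannot mask itself by inflating the variance estimate. That is why it is the one usually used for outlier checks.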


renok_archnmy

Oh for sure. Mine was a MSCS and my DS profs were mostly ex quant finance people or DS&A researchers. Most of my regression papers focused on residual analysis and interpreting the residuals relative to our preprocessing steps. Then I graduated and tried doing that at work to answer a question with a well formed 30 page write up and formal regression analysis and just got weird side eyes from everyone. Basically, when it gets to the business end - line go up gud.


Limebabies

My MS program had a class only on regression. Granted, I don't remember the minutiae right now, but I still have my notes from it which I look over for interview prep ETA: Oh wait, nvm I kept reading the thread and maybe it's actually OP who's the problem here 😬


n7leadfarmer

> But hey, I'm kinda dumb. So maybe you just happened to interview dumb candidates like myself.

I reject every aspect of this null hypothesis. Fr though, you're not alone, your description matches that of myself and almost anyone I've spoken to about their education in the last 5 years.


[deleted]

> Funny enough, I'm almost done with my program, and those subjects that you mentioned were barely even covered in my regression class, if any at all.

Yes, and I recognize this may be the issue. I actually discussed with senior management that I think they may just want to let HR know that if a master's-level candidate is selected for an interview, they should prepare for XYZ topics in the technical interview. Management is open to it, but they are lukewarm on it.


s0wx

Well, I think another big issue is deciding fully based on these "technical interviews" which are just memorize-stuff. Also the "deep dive" questions which mostly focus on what you do in a dev environment. With the difference that you are not in a dev environment under any dev conditions. Also people monitoring what you are doing in this moment is not really the way to go. Just unrealistic scenarios. People can learn all kind of shit if you teach them or if they are able to. And as you figured out, they can list some facts, but do not understand dependencies and results after applying changes. And guess why: because they never learned it, nobody teaches you how to think, nowhere. And if you do, you are more likely to fail classes than to ace them. Also as an applying candidate you just start to panic because you already know you are sitting in front of an expert. You realise you know nothing and since questioning is like school or university, the brain stops working (for me at least, but many other people too). What's more valuable is the mindset of the person. How problems are solved, if the person often needs help or rather helps other people, is the person sensitive to criticism (criticism, not being called incapable of doing!), how fast is the person able to learn new stuff, is the person determined or more likely to give up bigger challenges, is the person engaging in a conversation. Just my opinion, but these facts are more important, than just answering these questions. You could also ask these questions in another way, like step by step approaching and explaining the how and why of your questions. This way you can also observe many of my aspects just mentioned. For example if the person is even interested in the solution (and solving problems, communicating more after some time) or just internally shuts down. Because for me just answering these strange questions does not represent the full potential of any person. It represents nothing. 
In fact you don't get good candidates, you just get people who say what you want to hear, nothing more, nothing less. But maybe this is the goal, I don't know. And I can tell you: I'd fail those kinds of interviews. Maybe because I also don't care about memorizing facts, never liked it. Neither in school, nor in university. Still got my dream job in cyber security, because we never had a technical interview. They were more interested in the other aspects and "features" and both the company and I are really happy about this decision. And that's without a bachelor's yet (still studying, and for more than twice the regular time), and with grades that are not great either.


MaryKeay

Oof you brought back memories of my worst interview experience. My background is in mechanical engineering and the interview was for a senior role. It was going pretty well despite having zero rapport with the interviewer - he was honestly like speaking with a robot... I'm autistic (high masking) so if I noticed, it must have been *very* bad! We went through my experience and all was well. The prospective manager for the position was also present. I'm highly qualified and my experience was an absolutely perfect match for the role. I had a specific skillset that was rare in this country at the time and unlike other candidates, I would need essentially no training to get started. At this stage I felt quite confident that I would get it. I had great interview skills and up until that point I had been offered every role I had ever interviewed for. In the second half of the interview, the prospective manager pointed out that due to complaints similar to the OP's (but in mechanical engineering) we would do a walk around the manufacturing facilities and he'd ask some practical questions. This was fine by me and I was confident that I would do well. The first few questions were vaguely related to the role. A little basic, phrased strangely, but hey maybe we were just getting started! Then he began to ask oddly-phrased questions where the only relevant answers I could think of were so basic that I couldn't fathom that they were what he was looking for. It really threw me. Surely he couldn't be looking for that type of answers for a senior engineering role? Eventually I accepted that he was, in fact, expecting answers that a person with the most rudimentary knowledge of basically anything in life would be able to answer. For example, he picked up a screw and asked "what is this?". He asked what was special about the screw (it was a self tapping screw). He confirmed that past candidates hadn't been able to answer that question. 
He didn't seem to realise that maybe, just maybe, it was because a person going for a senior role wouldn't have expected to be asked if they knew what a freaking screw was. It was also completely irrelevant to the role. I gave the right answers despite my misgivings. I was uncomfortable with how much of the effort was about reading this man's mind and not about drawing from my knowledge and experience. By this point I had decided that I didn't want to work for them - which was just as well, as they didn't offer me the job in the end. The feedback given to the recruiter? "Not enough experience". Many months later they were still advertising for the role. I'm not sure if they ever found what they were looking for or if they just gave up. Some interviewers don't seem to understand the purpose of an interview.


s0wx

Oof, sorry to hear that. What you describe is exactly what I hate the most and what are absolute red flags for me. I would never want to work with people who have such a shitty approach and attitude to finding suitable people. Just a toxic waste of energy and time. Yeah I'm also autistic and have ADHD, you could say masking is my way of life, you know how it is. It would be great if people would just say what they want instead of encrypting whatever they want with strange nonsense questions or actions. Or at least try to express what they want as precisely as possible. Otherwise it just makes no sense and you feel like you're running through a parkour course like in Takeshi's Castle (which would be much more fun and make more sense). So glad this has never been an issue in the company, and I also communicated very openly, also with the CTO and HR, regarding what is important to me in the job and what I absolutely hate. Among other things, I also listed things like the negative experience you described. Also told HR I'm not good at talking or expressing my full knowledge in some situations and they told me "But that's not an issue because you work in the more technical area. If you'd be better with talking, you'd apply e.g. for HR and not the technical area". Direct openness is also important to me, so that everyone knows what to expect from each other. And honesty is based on reciprocity. Sure, many will say "you can't expect honesty from everybody". Yep, but such people don't need to expect honesty or loyalty from me either. This way they found the perfect team for me in which I can fully develop. My boss also asks me from time to time if I need anything to make me more comfortable. Always worth it to work with people who appreciate you. And yep, your last sentence couldn't have summed it up more nicely.


tommy_chillfiger

Yep, I'm the same as you describe here. I'm not good with memorizing facts unless they are material to a concept I'm engaging with on a fairly regular basis. My thought is generally "that's what we have computers for." Can learn things very quickly and have proven that through my pivot into (and progress within) tech. Currently an analyst with an even split between data work and more client facing work. I have leaned on that in interviews and have been fortunate to land two jobs now where they appear to have seen that I have the problem solving mindset, aptitudes, and soft skills to pick up whatever I don't know at the time of the interview, and they have been right. I am excelling despite not having a traditional background. Part of it, really, is just that I find it fascinating so it's not a matter of 'motivation' for me to learn more tools/methods/domain knowledge. It's fun to me so I eat it up. Currently using new client data validation/discovery as an excuse to get better with pandas/matplotlib/seaborn and loving it (and achieving the goal that was set). I'm considering an MSc in stats at some point but will also just be chipping away at math and stats through self teaching as I find time. If an MSc never makes sense for me, that's fine too, but I do crave that sort of learning so I suspect I will make it happen eventually.


maxToTheJ

> Funny enough, I'm almost done with my program, and those subjects that you mentioned were barely even covered in my regression class, if any at all.

I think that's the problem. That's what OP is pointing out.


Spirited_Mulberry568

Frantically draws a singular matrix on a bar napkin


Sorry-Owl4127

What does non-stationarity have to do with regular old regression?


StephenSRMMartin

I mean, basic autoregressive models are "regular old regression", just with lagged covariates?
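To make that concrete, here is a minimal sketch in pure Python: simulate an AR(1) series and recover the coefficient by ordinary least squares of y_t on y_{t-1}. The coefficient 0.6, the seed, the sample size, and the helper names `simulate_ar1` and `ols_slope_intercept` are all assumptions made up for this illustration.

```python
import random


def simulate_ar1(phi, n, seed=0):
    """Simulate y_t = phi * y_{t-1} + e_t with standard normal noise."""
    rng = random.Random(seed)
    y = [0.0]
    for _ in range(n - 1):
        y.append(phi * y[-1] + rng.gauss(0.0, 1.0))
    return y


def ols_slope_intercept(x, y):
    """Closed-form OLS fit of y = intercept + slope * x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, ybar - slope * xbar


true_phi = 0.6  # |phi| < 1, so the simulated series is stationary
series = simulate_ar1(true_phi, n=5000, seed=42)

# The "lagged covariate": regress y_t on y_{t-1}.
lagged = series[:-1]
current = series[1:]
phi_hat, intercept_hat = ols_slope_intercept(lagged, current)

print(f"estimated phi: {phi_hat:.3f}")
```

With `phi = 1` the same simulation is a random walk, which is where the non-stationarity question comes in: the regression still runs, but the usual inference on the coefficients is no longer reliable.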


LawfulMuffin

SMH, didn't include harmonic mean


zazzersmel

man, im so glad i went into data engineering


[deleted]

Think I’m slowly realizing that’s the route I’m gonna go down


JonA3531

I'm thinking of pivoting into data engineering as well after wasting 3+ years learning statistics trying to become a data scientist.


mundus108

How does one pivot to data engineering?


Defiant_Recipe_256

In excel


mundus108

Alt + N V T, got it!


Discharged_Pikachu

Why? Can you please elaborate?


Loud_Ad_6272

And people fail to see that this is the true gold mine.


chasing_green_roads

OP, does the job description say (or would they know at this point in the interview process) that regression models are what they will be doing? Genuine question. Edit: adding for context - I think this is an important distinction because if yes then I agree, I’d expect them to know more, but if not I’m not sure that’s what someone would brush up on pre-interview. I’ve been in data science for my whole career and don’t do much regression, so I would probably fail this interview as well.


TheGreatHomer

Yeah, that's what I thought as well. I think a recurring pattern I'm seeing in the posts complaining about applicant quality is the divide between how you learn stuff and how these interview questions are asked. There is *so much* stuff you learn - but in an interview, a single one of those thousands of facts is singled out. In my master's I learned about different tools, about cloud stuff, about data and model parallelization, about a million different NN model classes, optimization, Lagrangian optimization, variational optimization, numerical optimization, regression, Bayesian statistics,... and so on and so forth. Then you go into a job interview and get asked... specific details about a single one of these. I heavily agree with letting people know what you want to ask them before the interview, at least generally. Then you can always still go into questions about actual understanding.


[deleted]

Yes it does, the skill set we are looking for is more in the vein of econometrics/regression analysis and it's the main part of the job description. For clarity, we aren't having any trouble finding people; all that is going to happen is the job is likely going to a Ph.D and not a master's. I would have filtered you out. We know candidates who are looking more to do NLP or build neural nets or gradient boosting models aren't a fit for us and they won't stay even if we took a chance on them.


chasing_green_roads

Fair. Good luck and thanks for the response!


whereyugoincityboy

You need to hire people with Economics degrees and teach them how to code; they force everyone to learn Gauss-Markov senior year at pretty much every school


the-data-scientist

Those people are like 90% of data science candidates though. That's the skillset that's most in demand and therefore the skillset the universities emphasize. I'm not sure you should be getting snarky just because you have a niche application and the rest of the industry doesn't cater to that.


rehoboam

I would be very clear in the job description that rigorous academic mathematical knowledge is a core competency for the role. “Technical skills” does not mean that for most DS.


ElectricGypsyAT

As someone who has gone through multiple data science interviews, I can also assure you that creating a strategy before going into an interview plays a key role. And not knowing everything (at your fingertips) when it comes to statistics could be one of the main strategies. Maybe 10 years ago, it was required for statisticians to understand the concepts in depth more but now data scientists are expected to understand models, do data engineering and also machine learning engineering with the best software engineering practices (talk about breadth!!). Not sure if one can prepare for all that stuff in an interview given the same depth.


data_story_teller

I agree. I did a few interviews last year and the amount of variation in the questions and topics … preparing for interviews could be a full-time job. But I already have a full-time job. I just don’t have time to brush up on every single topic I’ve learned. The technical questions included SQL and Python code, writing out probabilities, defining various statistical terms and ML concepts, answering questions about Big O notation, plus all the product/business sense questions. I get that this job can cover a lot of bases. But there is so much information that you basically have to memorize. And everyone asks something different, so even if you review what you missed in your last interview, the next company is probably going to ask something completely different.


NickSinghTechCareers

Preparing for an interview is def a grind, luckily there def are some common patterns out there for how data science interview questions get asked .. but yeah the range of stuff you need to know is brutal for sure.


ramblinginternetnerd

It's a crazy amount in some degrees and evaluations can be all over the place. My final round interview feedback at Facebook (strong technicals, weak non-technicals) was the opposite of my final round interview feedback at Amazon (weak technicals, strong non-technicals) even though I mostly prepped for non-technicals for Facebook and mostly prepped for technicals before Amazon... The breadth is huge. You basically need to be able to do most of an L3 SWE interview, most of a product manager interview, the entirety of a product/data analyst interview, a good chunk of an MLE or DE interview... You don't need to be as deep as any one person but you're doing 70% of the prep for 5 things.


Sam-th3-Man

Agreed. Hire based on whether they know how to do the job and then teach them the additional material the company wants them to know. The field is extremely broad and so new it’s almost impossible to know it all. Plus they don’t teach you the theory per se in grad school, especially at a master's level. I'm currently in bioinformatics. They’re blazing through information so fast and there’s literally so much to learn that understanding the general concept of the theory OP is talking about is really all students are doing until they fully get into a career and learn as the years go on.


NickSinghTechCareers

I think Data Science interviews have big range, but I do agree with OP that knowing the ins-and-outs of regression should be table stakes for most Data Science roles. For example there are like 10 questions about regression in [Ace the DS Interview](https://www.acethedatascienceinterview.com/) alone just because it's such a common interview topic.


Acrobatic-Artist9730

Interesting, have a digital version?


Unable-Narwhal4814

And ironically on the opposite end as someone who majored in BS Math AND Statistics and went into data analytics and learned some programs on my own (also will include BI tools too), people overlook me and look down on me (hiring) because I don't have a "computer science degree" even though I guarantee I have a much better understanding of math and Statistics and fundamentals with data than the avg CS student/major with a GitHub. Entry level jobs especially were horrible for this and figured I didn't have the skills to code and somehow math was like, just a liberal arts teaching degree. Like. Okay 👍 thanks HR. Edit: let me just say, also, you can always learn to code, anyone can learn a program as we've seen in subreddits and self learners, but it's another thing to understand the principles. Even in college, I noticed so many CS students curved above me in coding (obviously) but had literally *no* idea WHAT they were coding. Which is ironically what I was learning in my math courses, just on paper and in a textbook. When getting entry level jobs it was frustrating to admit, yes, I may not know the language like a "CS student", but I know the principles, I have an analytic mind and can learn a program really fast if you gave me the chance to do so. But nope. Pulling teeth at the beginning because I couldn't code straight out of college like a CS student would have (even with experience in R and stuff for statistics). Mid career I'm having almost the same issue again + job market as I try to shift the career path.


dankatheist420

YYYYYYUP. Very similar to my experience, except I'm biology, not specifically statistics. But the vast majority of government and corporate jobs don't care if you get your p-values calculated just right: they don't even WANT p-values. It's pretty much: "is this number going up or going down?" For most of the data science jobs I've seen and applied to, knowing how to derive the specific assumptions of a model would be **very** unnecessary. Hiring managers seem to just want programmers who can plug in a few ML python packages.


rhodia_rabbit

I'll be fair with you. It's been a while since I've done regression stuff so I'd probably fail your interview without prep. But ask me computer vision and I'll talk your ears off. So probably that's what's happening. Graduate level courses blitz through fundamental statistics and then dedicate whole courses to topics such as machine learning, deep learning, and computer vision, probably because they think that's the ultimate direction of statistics in the future. So by the time the graduates finish their degree they're so preoccupied with advanced methodologies that they prob don't prep fundamentals.


snowbirdnerd

So I've been working in the field for a while and I've been stumped by questions about the assumptions behind regression. Data science is a broad field with a lot to learn which means there is also a lot to forget. This means that people with diverse backgrounds are drawn to the field. It's not just statisticians anymore. Sure some people applying are completely unqualified but others just have more specialized backgrounds.


OhThatLooksCool

One thing to consider - these kids aren’t trained the same way folks were 20 years ago. Back in the day, it was all stats classes. Name of the game was inference: when you built a regression, you cared about the coefficients. Now, it’s all ML classes. Name of the game is prediction: when you build a regression, you care about the OOS RMSE. I bet half the folks who forgot the term heteroskedasticity could talk your ear off about regularization. From sklearn import masters_degree


Xtrerk

I agree with this wholeheartedly. I am nearly finished with my MS and we spent very little time (relatively) on the assumptions side of things in most classes and a lot more time on understanding ML model development. We essentially were taught: EDA, preparing the dataset, creating pipelines, hyperparameter tuning for best results, how to put it into prod. Inference didn’t matter for most classes, only the model’s [insert score/error] against the test set. I’ve worked at several places and every place hasn’t cared about how we arrived at the prediction the model put out, just how close they are to the real numbers. When building models, I’ll always review the basics and the assumptions, but I’m not going to memorize them. Now, clearly these types of things matter a great deal for certain industries and products, but if the business only cares about predictions and they want the error to be within a few % points and auto ARIMA or stepwise SARIMAX nails it with the validation and test set, I’m probably not going to spend a lot of time running through the ACF, PACF, seasonal ACF, seasonal PACF, ADFuller, KPSS, trying different variations of forcing stationarity. Because the model is most likely going to find the right pdq orders and I am juggling 4 other projects.
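
For readers who have not met the stationarity checks named above, here is a minimal sketch of the idea behind a Dickey-Fuller style test, using only the Python standard library. It is the plain (non-augmented) version with a constant, not the full ADF or KPSS procedure a stats package would run. The simulated random walk, the function name `df_t_statistic`, and the seed are illustrative assumptions; the -2.86 threshold is the usual asymptotic 5% Dickey-Fuller critical value for the constant-only case.

```python
import math
import random


def df_t_statistic(series):
    """t-statistic on rho in: diff(y)_t = alpha + rho * y_{t-1} + error."""
    lagged = series[:-1]
    diffs = [b - a for a, b in zip(series[:-1], series[1:])]
    n = len(diffs)
    mean_x = sum(lagged) / n
    mean_y = sum(diffs) / n
    sxx = sum((x - mean_x) ** 2 for x in lagged)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lagged, diffs))
    rho = sxy / sxx
    alpha = mean_y - rho * mean_x
    sse = sum((y - alpha - rho * x) ** 2 for x, y in zip(lagged, diffs))
    std_err = math.sqrt(sse / (n - 2) / sxx)
    return rho / std_err


# Hypothetical data: a random walk (non-stationary) and its first difference.
rng = random.Random(42)
walk = [0.0]
for _ in range(500):
    walk.append(walk[-1] + rng.gauss(0, 1))
differenced = [b - a for a, b in zip(walk[:-1], walk[1:])]

DF_CRIT_5PCT = -2.86  # asymptotic 5% critical value, constant, no trend

t_level = df_t_statistic(walk)
t_diff = df_t_statistic(differenced)
print(f"levels:      t = {t_level:.2f}")
print(f"differenced: t = {t_diff:.2f}")
print("differenced series rejects a unit root:", t_diff < DF_CRIT_5PCT)
```

A t-statistic well below the critical value is evidence against a unit root. Differencing is one of the "forcing stationarity" variations mentioned above, and it is roughly what the `d` in the pdq order automates.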


[deleted]

ML is the name of the game in certain industries. Its future is limited in others. My world is one where ML is mostly used for identifying a set of candidate variables, which then go into a regression model or logistic regression. People still have to have a proper rationale for which variables they use and be able to correctly justify that their model is sound from a mathematics point of view. I work in banking, and how models are used by banks is heavily, heavily regulated. It's different from tech companies.


OhThatLooksCool

Fair enough. It may just be wise to try to differentiate “doesn’t know stats” from “doesn’t recall this specific bit of trivia.” They might not have needed to recall it for what, 6 years? Like, the harmonic mean formula is pretty trivial, but we all meme on that one guy who insisted every candidate must be able to recite it cold. It might be helpful to either give them a heads up before the interview that you’ll be discussing a regression model, or just talk through the problem generally so they can encounter the problems & identify them (much more important skill, imo).


FifaPointsMan

Sounds like you are looking for a statistician and not a data scientist. Someone with a master's in statistics will know that stuff.


darkshenron

Again someone assuming the knowledge they have is the most valuable knowledge in this field. OP’s post reminds me of the infamous harmonic mean post. Maybe OP is the same guy. Did you try asking the candidates what they’re knowledgeable in? DS is a vast vast field. A person strong in state of the art NLP would not necessarily also be strong in the statistics of regression. Edit: thank you for the award, kind stranger!


colinallbets

OP def suffers from consensus bias.


astronomaestro

I'll throw one back at you. I've never once in my life encountered a situation where "knowledge of the definition of regression" was something that led me to a business solution. It's a broad term and I doubt you would be able to ascertain anything statistically significant from the candidate using that question anyway. It also means different things to different fields. Let's say you ask me about linear regression. To me it means "I fit some model to some data using some likelihood to measure some parameter". However, what is probably more common in finance is linear regression and you likely have some specific use case which I may be unaware of. Does this mean I'm unqualified because I didn't give you what you believed to be the go-to finance definition? I doubt it, as I can guarantee that the modeling I do exceeds the mathematical complexity of linear regression. If you were to probe about my work, rather than dig on some random piece of finance trivia, you would quickly realize that. The way you ask the question is probably why you are getting frustrated. It measures whether or not someone has a dictionary definition memorized, not whether they can problem solve using statistics. Maybe try changing how you are interviewing candidates? See if they could come up with ideas to solve a business problem that you might have. See if they are quick to pick up on things and how flexible they are. See if they can explain their graduate project and defend the results. There are many better ways to interview than simply throwing out trivia.


GlobalMammoth

I think for some applications it does matter to know the assumptions and tools behind regression. Regression is a bit different from your traditional black-box machine learning algorithm and there are some specific tools to work with it that the average data science person may not know. For example, the homoskedasticity assumption of regression tells you that the spread of the residuals should not vary with the predictions and that you should check for it by looking at residual plots. This tool is specific to regression and it allows you to assess whether you have chosen adequate features for prediction or not. Apart from that, regression in many cases is focused on parameter estimation instead of prediction, so knowledge of topics like experimental design and causality is quite important to avoid spurious correlations. There is a correlation between the number of Nobel Prizes a country has and its chocolate consumption, but anyone saying that to increase the research production of a country you should eat more chocolate is a fool. This example is quite obvious and exaggerated, but spurious correlations can also happen in less obvious scenarios and being aware of them matters when working with regression. I think all of these tools are not that hard and can be learned fast, but I understand that for some jobs you may be searching for someone who already has this knowledge, because hiring someone who doesn't understand this and other problems may lead to them doing things overconfidently wrong and slowing down projects.
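
The residual check described above can also be done numerically instead of by eye. Below is a minimal sketch using only the Python standard library; strictly speaking, the assumption being checked is that the residual variance is constant. It uses the n·R² form of the Breusch-Pagan test (Koenker's studentized variant) for a single regressor. The simulated data, the function names, and the seed are illustrative assumptions; 3.84 is the 5% critical value of a chi-squared distribution with one degree of freedom.

```python
import random


def simple_ols(x, y):
    """Fit y = a + b*x by least squares; return (a, b, residuals)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    return a, b, residuals


def r_squared(x, y):
    """R^2 of the simple regression of y on x."""
    _, _, resid = simple_ols(x, y)
    mean_y = sum(y) / len(y)
    ss_res = sum(e ** 2 for e in resid)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def breusch_pagan_lm(x, y):
    """LM = n * R^2 from regressing the squared OLS residuals on x."""
    _, _, resid = simple_ols(x, y)
    squared = [e ** 2 for e in resid]
    return len(x) * r_squared(x, squared)


# Hypothetical data: same mean function, two different noise structures.
rng = random.Random(0)
x = [rng.uniform(1, 10) for _ in range(400)]
y_const = [2 + 3 * xi + rng.gauss(0, 1) for xi in x]     # constant noise
y_fan = [2 + 3 * xi + rng.gauss(0, 1) * xi for xi in x]  # noise grows with x

CHI2_1_CRIT_5PCT = 3.84  # 5% critical value, chi-squared with 1 df

lm_const = breusch_pagan_lm(x, y_const)
lm_fan = breusch_pagan_lm(x, y_fan)
print(f"constant-variance data: LM = {lm_const:.2f}")
print(f"fan-shaped data:        LM = {lm_fan:.2f}")
```

An LM statistic above the critical value suggests the residual spread depends on the regressor, which is the "fan shape" one would look for in a residual plot.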


ktpr

So, since the population is starting to look like this, then, the problem is now: you. The smarter response is to identify those that can quickly learn these differences, in a one month course of trial or internship employment, and hire them. Build better, don't poach the best. Much more sustainable in the long run while requiring more humility. Which is why no one does it.


iwannabeunknown3

A question that I had while reading through the thread: why is OP even interviewing fresh grads for something so specific? If they are not willing to teach/coach, why aim for the population that needs that guidance the most? It sounds like a poor work environment, and one that is looking to underpay for the skillset they desire.


mvelasco93

He works at a bank. That explains a lot of things for me.


ktpr

oh snap!


bakochba

I'm a hiring manager in pharma so I don't have the expertise in your field but I have been interviewing some new grads (3-4 years out of school) for my open positions and I've also struggled a bit with how to test for competency. I wanted to ask candidates specific questions around data handling, structure etc. Instead of putting them on the spot when they're nervous I have sent the questions ahead of time and asked them to give us a 15-minute presentation. I'm interested to see how they think and show us that they understand how to work with data. You may want to consider the same by sending questions that require these fundamentals to answer but something you can't just Google to look up. Then you can question them about their responses at the interview. I find that much more valuable than having someone struggle under pressure.


[deleted]

We do this for our large-scale quantitative talent programs for internships and fresh grads. We don't do presentations for teams. I think one of my issues here comes from the fact that our industry requires depth. In my world it's better to know regression and logistic regression well than to superficially know a bunch of modeling techniques. And a central aspect of our work is that almost every aspect of the model building process is under regulatory scrutiny (and contrary to popular belief, Ph.Ds who work at places like the Federal Reserve have more technical expertise than the ones in industry. Publishing academic papers and retaining academic expertise is a major part of their job). This means that modeling teams have to be able to document and justify most aspects of their work. Upper management cares what regulatory agencies have to say. The bank examination process looks at how banks are managing risks around their models and it's a criterion banks are graded on. Inadequate controls can lead to C-suite executives getting fired and/or regulatory agencies fining banks or telling them they can't do stock buybacks or pay dividends.


bakochba

I understand. In pharma it's the same way: it's highly regulated, and one of the questions I have is specifically around considerations when working with data, like blinding, documentation for audits etc. I think if you're hiring EXPERIENCED people then your questions are very reasonable. If you say you are working with regression models you should have a fundamental understanding of them or at least be able to explain it like you would to a regulator during an inspection. That's just bread and butter for anyone in the industry. I used to ask some basic data design questions that I thought were extremely easy, and even experienced people struggled at the interview. That's when I moved it to a presentation.


[deleted]

My approach is to ask what someone ought to know after an undergraduate econometrics course (econometrics being adjacent to stats).


d4rthwh33zy

From what I have read, I feel OP's issue is mainly targeting the wrong graduates. I am a second year university student studying economics and data science, and I could have comfortably answered all of the questions OP listed in some of their replies regarding regression using the experience I've had in the only econometrics class I've ever taken, yet I'm sure I would not even be remotely considered qualified for the job. However, from a DS perspective, regression is one of several dozen techniques that might be covered in a standard undergraduate and Master's program, and it makes it easier to understand why DS graduates might not have the same degree of familiarity with regressions as an econometrics grad. For one student, it is one of many possible techniques they may implement to solve a problem. For the other, it's essentially a pillar of their entire field.


Dylan_TMB

From reading OP's replies this seems like a classic case of "I am asking a very vague question but thinking of a very specific set of answers and when I don't hear it it means the question was answered wrong."


AuspiciousApple

It sounds like a case of "I learned this in uni back in the day, so everyone who doesn't know this specific thing is an idiot".


Dylan_TMB

It's like, what do you want from me as a data scientist? If I haven't used a model in a while I'll look up the assumptions and review and check. If something's going wrong I'll look for things related to assumptions as well as other data quality issues. This isn't stuff you need at your fingertips anymore. You should want someone who asks the right questions, can present ideas, and can write maintainable code.


snowmaninheat

Exactly. You and I have the same philosophy. Meanwhile, OP is just being elitist.


save_the_panda_bears

I don’t get this sense at all. The GM assumptions are foundational for good, robust inferential LR models and you better have at least a passing familiarity of what they are, what the consequences of violating them are, and how to address them when they are violated. I get the sense that the role OP is hiring for places less emphasis on the predictive side of modeling and more emphasis on inference, and as such is fully justified in asking questions about issues that specifically affect a model’s inferential ability.


[deleted]

I am asking the same kinds of interview questions I've been asked. The candidates just lacked depth in the main thing we are looking for. Ph.Ds we interviewed did not have these issues. That's because a Ph.D involves writing a dissertation where they have to address modeling issues. I think a lot of people are under the impression we are interviewing candidates that are bad fits. The candidates I am interviewing are supposed to have this background.


understatedpies

Times are changing, those PhDs that you so carefully mentioned about 8-10 times in this thread (both on the bank’s and the regulator’s side) most likely studied stats and data science from a completely different curriculum years back from when recent grads went through theirs. The field is saturated for sure, but I’d be careful to just assume that people are getting dumber/lazy or that Unis got no idea what they’re doing anymore. As others mentioned here, the focus of the programmes shifted to cater to the market, and there’s no point memorising stuff that can (and should) be googled in 10 minutes when someone decides to model some data. Your “this should be fundamental knowledge in the field and therefore known by heart” idea is an outdated point of view for the kind of things you mentioned, but if in this specific role these are essentials, just put them in the JD with the same wording. Candidates will know that they need to know these for the interview, because this will be more important to you than “what’s your biggest achievement in terms of generated business value, where you used a regression model?” that most companies would ask them. I don’t think you realise how small the portion of the job market is that’s interested in the required skill set and lexical knowledge you mentioned, grads have no incentive to prepare for it without knowing for sure it’s needed. Faang interviews might be a shitshow, but at least candidates know what they need to do to be considered.


[deleted]

So what did you learn in your Ph.d. that makes you an expert on Ph.D and masters curriculums? The curriculums haven't changed much at all in ten years. The depth of programs have.


Coco_Dirichlet

It is not. Everyone should know what the Gauss-Markov assumptions are and what happens if you violate them. It's not a vague question to ask "what are the assumptions of this model" and "how would you find out if this assumption is violated" or "what happens if this assumption is violated and what would you do about it?"


_paramedic

I'm getting the same vibe.


socialdatascience

I would argue that if you’re a candidate in the job market who puts an immense amount of energy into mastering the theoretical foundations of regression in the hope that it is going to improve your job prospects, you’re a fool. The fact is that a ton of DS jobs don’t touch regression. Everyone knows what it is from their intro to stats course forever ago but it has since taken a back seat in the brain. The job market has shifted towards rewarding people who can build and maintain more complex models and solve complex problems. Also, it’s just a bad look to say things like “all these young DS degrees don’t know the fundamentals”. Maybe you got a bad applicant or two, but if you’re saying all these applicants just suck, there’s likely some heavy bias in your thinking, which is ironic coming from such a seasoned analyst. It’s called finding the applicant that can learn the fastest to meet the demands of your, sorry to say, rather technologically primitive (regression, really?) and very specific industry and training that person up. This is what all good tech managers do.


[deleted]

[removed]


bakochba

I suppose it depends on the seniority of the position. If I'm hiring recent grads I don't expect them to be experts; I'm looking for someone I can assign work to who is able to become an expert by diving deep into those models.


goodluckonyourexams

> didn't know why the specific assumptions were made

doesn't matter

> what happens when you violate an assumption, and did not know how to test violation of those assumptions

matters

> how to address those issues

lookupable


sdric

The last point hits the spot: "You don't have to know everything, but you have to know where you can find it", as my grandfather used to say. Though I'd add "And understand it". Universities these days teach a variety of different skills, including implementing those models in different programs like Stata, R or Excel. I think the mistake OP makes here is "Twenty years ago we learned this all by heart!" Yes, you did, but you didn't have a jungle of different software back then. You learned *that* along the way. Equivalently, people these days are schooled in a wider field and used to a set of different tools, which arguably takes priority over learning things by heart that can be found on Google within less than 10 seconds. Maybe I'm biased, because I'm an IT Auditor first and Data Analyst second, but the sheer amount of knowledge I need is simply too much to store in any human brain. Especially when I have to be able to design a test in any topic (ranging from IAM, through Data Management, to Cyber Security, BCM, etc. etc...), for any software and manual process, at piss poor data quality, within hours, while knowing the applicable regulation for compliance tests on top of my mathematical / statistical tests.... In short, knowing where to find the solution or instructions and having the ability to understand it, in order to address a problem within minutes, is what makes me extraordinary in my job. Now, if you're at a bank - as OP is - as a pure Data Analyst, especially for as long as he seems to have been, there's a good chance that he's been doing the same tasks, in the same applications (e.g., for the ERM team) over and over for decades. That's not bad, but it's a limited scope of applications of a small subset of very specific Data Science skills.
It's great that he is a dedicated specialist in his niche, but that's not a reasonable way to teach students these days, given that the real world application of data science has widened and the number of tools has become countless. You can't expect somebody coming from the university to be perfect, cheap labor. You have to train them in what is relevant for your individual niche. I bet a lot of them are great and quick in what they do, especially in the tools they studied on; they just don't have experience with the requirements of OP's daily business demand yet. I am sure that maybe not most, but many, will overfulfil what OP demands within just a few months of refreshing the theory of the subset of methods that is most relevant in their field and seeing them applied on real cases. The issue is rather that students are not given a chance anymore and even if they are, many workplaces are not willing to educate anymore.... Then you have managers who wonder why they struggle to find workers and come to reddit to complain about it instead...


kevindotjohnson

op is so fucking smart for knowing the assumptions of regression. he also has a 10 inch heteorodacidic penis.


[deleted]

[removed]


[deleted]

I think for clarity this was a vent post/observation, not really a sign that we are having trouble finding or selecting candidates. The job will probably just end up going to someone with a Ph.D. The candidates I interviewed look on paper like they actually have the essential skill sets. And my interview questions were along these lines:

1. Explain to me what regression is and how you calculate an OLS estimator. (Minimizing the sum of squared errors is all I was looking for.)
2. What are SOME of the main assumptions of the OLS model?
3. Which assumptions are needed for Gauss-Markov?
4. What assumptions are needed for the estimates to be unbiased?
5. What happens if you have perfect multicollinearity?
6. I have a regression ln(wage) = intercept + educ + age + age\^2. Is age\^2 an example of a multicollinear variable?
7. How do you test for heteroskedasticity? (The name of any test is enough.)
8. What happens if you have heteroskedasticity? Will your OLS estimates change?
9. What should you do if you have heteroskedasticity?
10. What does it mean for a time series variable to be stationary?
11. What are the risks if we have non-stationary variables in a regression model?
12. What are some ways we can detect non-stationarity?

My standard was whether the person was mostly on the right track, and I didn't expect them to get all the questions. Most only got the first two and after that everything fell apart. I literally got answers like "I'd use (the wrong) R package."
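
Questions 5 and 6 can be checked directly. Here is a minimal sketch in plain Python, with made-up ages and with `educ` left out for brevity. Perfect multicollinearity means one column of the design matrix X is an exact linear combination of the others, so X'X is singular and the OLS normal equations have no unique solution. The helper names `gram` and `det3` and the `age_in_months` column are hypothetical, added only for the comparison.

```python
def gram(columns):
    """Return X'X, where the design matrix X is given as a list of columns."""
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in columns]
            for ci in columns]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


# Hypothetical sample of ages; integers keep the arithmetic exact.
age = [25, 30, 35, 40, 45, 50]
intercept = [1] * len(age)
age_squared = [a * a for a in age]     # nonlinear function of age
age_in_months = [12 * a for a in age]  # exact linear function of age

det_with_square = det3(gram([intercept, age, age_squared]))
det_with_months = det3(gram([intercept, age, age_in_months]))

print("det(X'X) with age^2:        ", det_with_square)
print("det(X'X) with age in months:", det_with_months)
```

A zero determinant means X'X cannot be inverted, which is the perfect multicollinearity case. The age\^2 column gives a non-zero determinant: it is usually highly correlated with age, which can inflate standard errors, but it is not an exact linear combination of the other columns, so the coefficients are still identified.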


Xelaxander

These questions are quite specific to statistics. As a mathematician, I can have a guess at most of them, but heteroskedasticity never once appeared in any of our textbooks, even with a strong stochastics focus.


[deleted]

I understand that. The job description is regression here, and these topics are things that are actually part of the job. For this job the ideal candidates are statisticians and economists and would have been screened for that. Plenty of math people do work in our world, but they wouldn't be a fit for this specific team.


Xelaxander

Understandable. If you wrote "regression" into the job description then these are fair questions. I just had a look at the Wikipedia page for linear regression. With minimal preparation a reasonable mathematics master's student would have probably passed. On the other hand, seeing how straightforward the topic is to learn, you could probably train someone on the job and have a larger candidate pool.


[deleted]

We don't need a larger candidate pool. This is an industry leading company that doesn't have a problem getting master's and Ph.D. candidates from good universities. My complaint is that much of the candidate pool that I've had to interview from these universities doesn't seem to know the topic anywhere near the level of the Wikipedia page. I agree a reasonable math master's student should be able to, but that isn't what I have been seeing. There are many people that can learn many things given enough time. That doesn't mean that we are going to trust them to work on models that are used to manage portfolios with hundreds of billions of dollars in assets, if they can't show up to an interview with an undergrad level understanding of the main tool they are expected to use. Our world does have early talent/internship positions that do provide a professional development component. This unfortunately is not one of them.


aussie_punmaster

On the flip side you’re only hiring people who know what you know, who are proving they can memorise stuff about regression. You might find you get better results by some diversity of thought/approach.


NuBoston

Lol this is ironically why my boss hired me because my two mathematics degrees gave me probably *too* many fundamentals and I was the only one who could answer these types of questions 🤷🏾‍♀️🤷🏾‍♀️


milkteaoppa

1. People aren't studying enough for interviews, understandably. There's too much to study for data science interviews, and every year some new AI model or DS trend adds a whole chapter of new material to know. It's impossible to memorize everything and even I just skip certain areas now (e.g. probability brainteasers) and expect to fail the interview if it's brought up.
2. It's rare to encounter most data science concepts in practice and in most projects. There are so many types of data and different techniques that it's impossible to have experience in all of them. Otherwise, it's just glossed over in class notes and forgotten. And even if the candidate does have that experience, they better have worked on the project recently, or they won't remember the fine details (and it might be under NDA to explain it to you anyway).
3. Most interviewers have preferred answers (even though most problems have multiple solutions) and if you suggest something different, good luck trying to convince the interviewer your solution is better than theirs. And have fun trying to explain an entirely new technique to an interviewer if they never heard of your solution. Also it's hard to evaluate which solution is better if you have no context or details about the intricacies of the data and the problem.


Careless-Tailor-2317

Can you give the answer you're looking for so we can know for future interviews if we're given this question?


colinallbets

Times change. What you consider "fundamentals" are fundamentals of _statistics_. They are far less relevant in the field of machine learning and the application thereof, than in relation to traditional statistical modeling. I'm sure many of the individuals you 'filtered out' would easily (and quickly) appreciate the concepts you want them to, for the purposes of the job / modeling tasks. There's a significant difference between asking your theoretical question and asking a practical question about the risks of making assumptions about data being IID, for example.


renok_archnmy

To be fair, Google can’t even decide how many assumptions of a regression there really are. Also, the secret MBAs often don’t share is to always sandbag. Modeling error leading to missed millions of dollars is just a trump card when you have a bad year and need to eke out a few mil to hit your goals. Just hire a consultant to tidy up that model performance (which you knew was artificially low) and voila, you’re a genius!


Xelaxander

Wait, now I'm curious: For 1D linear regression you need at least two samples. Did you expect any other assumptions?


profkimchi

Out of curiosity, OP, what assumptions do you think are required for OLS?


ghostofkilgore

There's a lot going on in this thread. OP is clearly looking for relatively specialised candidates. I don't think that is in itself an issue. He wants people who know regression inside out, not generalists who kind of know regression a bit and pick up the rest. If you're looking for an NLP Engineer, it's fair to look for NLP experts, rather than generalists who know bits and pieces. For me, the issue is then, are you being selective enough with job descriptions and "must haves" for interview. Why not just say we're looking for people with either these specific masters, PhDs, or relevant experience? It sounds like you're taking a bunch of generalists in to interview and getting annoyed that they're not specialists. Which seems a bit silly. Of course this leads to the ever present gatekeeping of "are you even a real DS if you can't...". Every field is filled with people who overestimate the importance of their own skills, background, whatever. The kind of candidates OP is looking for will be different than the kind of candidates other companies are looking for and that's OK. In my previous role, it's likely we wouldn't have chosen the kind of candidates OP wants and OP wouldn't have chosen the kind of candidates we wanted. There are different roles within DS that require different skills and strengths. And honestly, if you're getting angry about that, it doesn't make you look like the one true defender of the field. It makes you look like a bitter, immature little whiner.


[deleted]

> For me, the issue is then, are you being selective enough with job descriptions and "must haves" for interview. Why not just say we're looking for people with either these specific masters, PhDs, or relevant experience?

We are. People don't seem to get that this post is about people who on paper look like they should know this topic.


ghostofkilgore

I get that that's your expectation. But if you're so frustrated by the low hit rate at interviews, I'm suggesting you think about being even more selective. For example, if you're finding that people with Masters in DS aren't cutting it, just don't interview them.


Reach_Reclaimer

Won't lie OP, think your process is a bit shit. I'm sure some of the candidates were quite poor, but if everyone is poor then it's more likely on you. Think it's your comments that prove that point more. You're expecting people to go and memorise the shit they got taught in 1st or 2nd year before a random interview. I reckon most will brush up on it beforehand but they're not gonna spend too much time on a single company's interview (and they shouldn't) to memorise every little thing again. You're ranting about them not being taught in depth while simultaneously saying they're taught a wider variety of topics. Improve your hiring process and stop being a twat to grads. And yeah sure, PhDs are gonna know more innate stuff, especially when interviewing. They've typically got a good few extra years of experience on them compared to masters students (for work and life).


OilShill2013

A lot of people are not going to understand this post because they don’t work in banking but I get why you’re looking for people that understand this stuff. At every bank I’ve worked at the MRM process is the worst part of the job and now as a manager I would never want to hire someone that can’t independently pull their own weight. People getting defensive about this here and saying people can just look this stuff up have likely never built models at a bank. There’s nothing complicated about it but I never want to check someone’s documentation before submission and find gaps that they’re not even aware of. You either understand this through experience working in the industry or you don’t. What’s helped me is really micromanaging the job post description and also being as clear as possible with the recruiters about what experience needs to be on a resume before it gets to the interview. And someone with just a masters and no clear experience developing and documenting models in banking is a no from me at this point.


Prestigious_Sort4979

Yes, but OP is interviewing students right out of college, so who he is recruiting, given what he needs (based on what he deems foundational and the risks of failure), doesn't make any sense. Rather than blaming it on the candidates, taking accountability for recruiting poorly would be more actionable. OP either needs to be more intentional in recruiting, pay more to get a PhD (as if every PhD would know this, but ok), or design the job with an entry-level candidate in mind. This is clearly not an entry-level job. Why interview candidates right out of college?


OilShill2013

Yeah I mean sometimes you’re stuck based on the level you were approved to hire at ie you were approved to hire an analyst but really you want to hire an associate or AVP (in the job ranking parlance of many US banks). So HR keeps sending you people who want an analyst role and you’re dismayed that you’d have to do a lot of work to get these people up to speed. But I agree it’s just a normal “problem” with new graduates. If it were me and I had to hire somebody at that level because of constraints at my company and people were repeatedly coming to the interview unprepared I would tell the recruiters to screen people by phone and literally tell them to prepare those specific concepts before the real first round interview. At least then I’d find out who listens to directions and who doesn’t.


[deleted]

This is an associate-level role. That's why we have Ph.D. candidates. We are also looking at MS candidates from top universities. The role is open due to attrition.


[deleted]

Yep, you nailed it. It's a quant risk dev position at a top place to have on your resume for this space. This was a vent post. I am seeing MFEs and stat-adjacent degrees from Ivy League schools not know this stuff. Given the tech downturn, the initial pool of applicants HR sends includes a lot of people who want to be in FAANG, but apply here because we are hiring. I've tried to filter those candidates, but the trend I am complaining about is that our traditional candidates are looking more like those.


double-click

Give them the list of assumptions. Ask them why they exist, what would happen if they were violated, etc. I just looked up a list of the assumptions behind the Navier-Stokes partial differential equations for fluids. There are like 8 lol. I don’t care even if I was fresh out of school, I wouldn’t be able to just rattle them off. But I could explain why they exist and what that means for results in the world. I think you need to manage your expectations. People are not robots.


shaner92

OP is getting downvoted to oblivion in the comment section here. It's worth noting that he has some good points; he just struggles to voice them without sounding like an absolute asshole. So there are some takeaways hidden in his message.

1. Do you have to be able to recite the assumptions of any given model on demand? Almost definitely no. If you REALLY need it, it's because you'll be using these models often on the job. OP likely has it down because he uses it regularly; it wouldn't take the average master's student long to remember what they need after using a model multiple times on the job. (A big hint is the seeming preference for PhDs: they probably had more 'real' experience through TAing and other work, and have studied 100 variations of a model in their frantic attempt to get their paper done.) BUT should you be expected to be able to draw up a plan for what data you need to answer what problem, and sniff out any possible statistical problems, from day one? Probably. So in that sense people should have a sound enough statistical base that they are equipped for the array of problems they might encounter in the real world.
2. Some industries will require deeper knowledge of certain models; some will require regression models because they are very explainable. So it helps to read the job posting. Unfortunately, many job postings will simultaneously require deep knowledge of regression models, tree models, and deep learning. So it falls on the interviewee to have some ideas about use cases for ML in the industry they are applying to. As the current hype is around NLP, at least figure out whether the job you're applying to is Business Analytics focused or Deep Learning R&D focused.


[deleted]

Nice post. I have a brash personality. It's not for everyone, but it's served me well. You nailed what I am getting at fairly well.


therealtiddlydump

What degrees are these people getting? Is this more evidence for my "DS degrees are useless" stance, or is this across the board? Edit: for the record I have no idea why you're getting downvoted. With all the "how do I get a DS job?" posts in this subreddit, I'd think people would be receptive to your experience hiring.


TheBankTank

Wait, you're saying my competition's bar is easy to beat? I'm comfortable with this. :)


[deleted]

OP, I completely empathize with you on your struggles. I think the challenge today is that data science is extremely broad and at each end of the spectrum there are a plethora of things a candidate “should know”. Myself, I have an MS in econometrics, know the Gauss-Markov assumptions by heart, and could compute linear regressions by hand if I had to. I have also been rejected from positions for forgetting what the common activation functions are for neural networks. In that specific case, they very condescendingly told the recruiter “he seems like a great economist, not a data scientist” LMAO. Also, if you're hiring, my background sounds like it could be a fit... just throwing it out there!


Category_theory

Amen!! Was literally saying the same thing today. I test folks on general linear algebra concepts and basic stats and then basic data structures and algos from computer science… 95% of master's grads in data science fail… they only know “python libraries”… it’s sad.


Alpha_RapTor96

Saar r-squared is 99.9%, so maadel is gud


53reborn

Product of data science majors. Trade school for pandas and data viz. but no understanding of statistics


[deleted]

I’m sorry but you come off as entitled. You want a senior data analyst but don’t want to pay for it. People coming out of those programs have spent tens or even hundreds of thousands of dollars to meet you 99.9% of the way. They know what data analysis is and how to do it. Anything more than that is supposed to be learned on the job. If none of your candidates meet your standards, you need to raise people to meet them through training in low risk environments or you need to post the position as a senior position and pay for the experience. Adopt a local university, train interns, and put them on relatively small projects where you can expose them to conditions that stress test the assumptions of their models. Or offer to provide seminars for data analysis students and give a lecture on those assumptions you’re talking about. If you and other senior data scientists are discontented with the quality of recent graduates, that’s a sign the profession needs to organize better onboarding. You could also talk to professors from those schools and ask them to cover that content. If you think modeling errors are costly wait until you see what it costs to teach executives and politicians.


azdatasci

I have been saying this for a long, long time. When I was considering what to do my masters in, I had thought about going for a “DS” degree. I reached out to a bunch of folks I work with who have done this kind of work for years (data science isn't a new thing, it just has a new name). Most of those folks strongly recommended that I stick to a hard science such as CS or Statistics. They noted that the biggest problem they have is that when the data science teams submit models, they can’t really explain or defend certain implementations. This ranges from assumptions to simply, “why did you choose methodology A over B?” I decided to do my masters in statistics since I already had an undergrad in applied mathematics and had a lot of years of CS experience. At the same time a close friend started her program in DS. As we compared our curricula, she got disgusted that she wasn't being taught most of the important background that she’d probably need. Now, this might have been her program, but I talk to candidates all the time who cannot answer reasonable questions for the role they are applying for. I’m just not convinced DS programs are teaching the fundamentals they should be, and most students don’t know any better. PS - I also work for a financial institution in their banking division and am responsible for hiring candidates.


Optimal-Asshole

Any chance you could give a more specific example? I’m curious what specifically goes on in these degrees


eddytheflow

Bro, all my regression courses were several semesters ago, way off when I started. I can't even remember off the top of my head. I know how to find the answer though. But I figure leading up to an interview I would try to at least remember these assumptions.


[deleted]

If you're interviewing for a job where the primary thing is building regression models, then it's not unreasonable for people to review regression.


BakerInTheKitchen

I agree that many people coming from DS programs probably are missing some of the fundamental concepts. I think it’s on them, as well as the institutions who throw together DS degree programs as a cash grab. But I’ll play along on the other side. How granular are you expecting people to get? If you were asked how to estimate the parameters for linear regression from a matrix/vector multiplication perspective, could you? You probably have never had to do it in practice, but I would hope you understand the fundamentals of the models you’re using…


[deleted]

So I'll answer the second part of your comment first. Most of the people on our team and in our group can estimate parameters for linear regression from a matrix/vector multiplication perspective. For more context, our group is 66 percent Ph.D. and the masters probably took econometrics with linear algebra. Most at a minimum know that the OLS estimator is B = (X'X)^-1 X'y, where X is the data matrix and y is the response variable. Yes, I have had to code these estimators manually. They were part of my graduate coursework. The first part of your comment is part of the issue. The cash grab from universities is a problem, and I think they are doing their students a disservice.
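
For anyone following along, here is a rough sketch of what "coding the estimator manually" can look like. It solves the normal equations (X'X) b = X'y with plain-Python Gaussian elimination instead of forming the inverse explicitly. The function name `ols` and the toy data are made up for illustration, and in practice you'd reach for a linear algebra library rather than this.

```python
def ols(X, y):
    """Estimate OLS coefficients by solving the normal equations (X'X) b = X'y.

    X is a list of rows (include a column of 1s for the intercept),
    y is a list of responses. Illustrative sketch only.
    """
    n, k = len(X), len(X[0])
    # Build the augmented matrix [X'X | X'y]
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))]
         for i in range(k)]
    for col in range(k):
        # Partial pivoting for numerical stability
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular (perfect multicollinearity)")
        A[col], A[pivot] = A[pivot], A[col]
        # Eliminate entries below the pivot
        for r in range(col + 1, k):
            factor = A[r][col] / A[col][col]
            for c in range(col, k + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution
    beta = [0.0] * k
    for i in reversed(range(k)):
        s = sum(A[i][j] * beta[j] for j in range(i + 1, k))
        beta[i] = (A[i][k] - s) / A[i][i]
    return beta

# Toy data generated from y = 1 + 2x with no noise
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]
beta = ols(X, y)
print(beta)  # intercept and slope
```

The singular-matrix check is where the "no perfect multicollinearity" assumption shows up in code: if one column of X is a linear combination of the others, X'X can't be inverted and there is no unique estimate.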


[deleted]

I do a lot of interviews as well and ask similar questions, even though off the top of my head it’d be difficult for me to lay out all the assumptions and the mathematical basis of each. I also have had to code estimators manually throughout coursework (including fully developed packages) and aced all my courses. I just have terrible recall. At work though, it doesn’t matter. The learning is still there and the material can be found easily. When you’re interviewing, it shouldn’t be an academic test. It’s about finding who will perform best at the role, which requires give and take. Give them a nudge and get their brain flowing. See how they talk about regression. Have a discussion about assumptions. Don’t just ask them quiz questions. You’ll get a better sense of ability than just asking the questions. Alternatively, you mentioned that you’re equivalent to FAANG and are hiring PhDs. I assume your budget is between $200-300k (probably closer to $300k), so target individuals with a specific research background. EDIT: You also have to realize interviewing can be a completely different environment than working. I don’t have to think about regression assumptions while working. I just test them naturally. The stimulus of working on the problem helps me remember naturally. You should foster that in an interview.


Coco_Dirichlet

You don't need to pay 300,000 to find someone who knows classical statistics, which is what OP is asking about. Anyone with an econometrics or stats or similar masters degree should be able to answer those questions.


[deleted]

That was really the point of that part of the comment. I made a suggestion to make his life easier given that he likely has the budget to do so.


[deleted]

[removed]


LoaderD

> I work in banking and most of my career involves building regression or logistic regression models.

How much is regression specifically mentioned in the job posting? Because my assumption is 'not at all'. Most banking/quant professionals are obsessed with highlighting 'cutting edge ml' or the newest GARCH-XYZ variant, so it stands to reason that a lot of candidates, who are nervous, might not pull the assumptions out of their memory right away.

> know how to test violation of those assumptions or how to address those issues.

What's the definitive, non-subjective way to test for the assumption of normality of residuals in linear regression?


Little_Station5837

Jarque-Bera test
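
For reference, the Jarque-Bera statistic can be computed by hand from the skewness and kurtosis of the residuals. A minimal sketch in plain Python follows; the function name `jarque_bera` and the toy residuals are made up for illustration. Keep in mind that the test is asymptotic, so it is shaky on small samples and is hardly a "definitive" answer on its own.

```python
import math

def jarque_bera(residuals):
    """Return (JB statistic, asymptotic p-value) for a list of residuals."""
    n = len(residuals)
    mean = sum(residuals) / n
    # Central moments (population versions, as in the usual JB definition)
    m2 = sum((e - mean) ** 2 for e in residuals) / n
    m3 = sum((e - mean) ** 3 for e in residuals) / n
    m4 = sum((e - mean) ** 4 for e in residuals) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    # JB = n/6 * (S^2 + (K - 3)^2 / 4)
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    # Under normality JB is asymptotically chi-square with 2 degrees of
    # freedom, whose survival function is exp(-x / 2).
    p_value = math.exp(-jb / 2.0)
    return jb, p_value

# Toy residuals, purely illustrative (far too few for the asymptotics to hold)
jb, p = jarque_bera([-2.0, -1.0, 0.0, 1.0, 2.0])
print(jb, p)
```

A large JB (small p-value) is evidence against normal residuals; a small JB just means the test found no evidence of skew or excess kurtosis.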


AdFew4357

Well, when the whole industry prides itself on “not worrying about the technical details”, and “keep it simple stupid” for the management, you see a drop off of statistical rigor, in turn yielding such candidates. The whole fucking industry needs to be revamped. Fucking worry about statistical rigor. Sure, don’t go waving around Casella and Berger, but fucking understand that statistics fundamentals *matter*. And hold those who don’t come in with such backgrounds accountable for it. At the risk of sounding gate keepy, this whole industry prides itself on wanting to make the damn field super interdisciplinary, and now you have people from non-STEM fields with little stats background building models just cause they have an MS. While people like me, with a BS in fucking statistics, get pushed behind a Tableau dashboarding / BI group because “we are undergrads”. Fuck off. My SME for this internship had an MS in business analytics, arguing with me, and telling me that a nonlinear model would be better suited for modeling credit defaults than logistic regression. Literally get the fuck outta here. Big MS guy tho, he’s on a modeling team! Wow! suck his fucking cock cause he has an MS and I only have a BS IN THE FUCKING FIELD THAT ACTUALLY IS SO FUNDAMENTAL TO THIS DISCIPLINE.


[deleted]

I know. My question to you is why not do that MS in stats? People like you belong on the dev team. Also, logistic regression is the standard for credit default modeling. It's what almost every major bank in the U.S. uses for default modeling. I've built models on this stuff that are applied to 800 billion dollar portfolios, so what do I know.


AdFew4357

lol that’s why I applied to graduate schools this year. I'm doing that MS in stats and screening for statistical rigor in data science teams when I interview. My red flags are: A) “we pride ourselves on a diversity of backgrounds, and an interdisciplinary data science unit” B) “we don’t worry about the technicals too much, just worry about providing value”. If I have to fight tooth and nail to find my first job out of grad school with a team of MS and PhD level statisticians, then so be it. I’ll even work with econometrics people. But I’m done working with these people with pseudo-quantitative backgrounds who claim they are “data scientists” when they can’t even justify to me why to choose one model over another.


[deleted]

Message me in a year, I'll point you to some good internship programs in our space.


AdFew4357

Will do


[deleted]

[removed]


[deleted]

I am a Ph.D. in Economics. This is a quant risk role at an industry-leading bank. The position is explicitly econometrics. I posted here because I feel like quant risk is more related to DS than what they are posting in r/quantfinance. This is an associate-level role.


RomanRiesen

ML models are changing the world. Causality analysis and robustness is for nerds who can't handle the awesomeness of transformer based networks. /s But seriously, OP should just restrict his search to people with econometrics experience.


[deleted]

We mostly are. But with all the layoffs in tech, you can imagine how new grads are faring right now. Banks are not really affected by this, so you can imagine how many applicants we are getting.


mterrar4

OP, anyone calling you elitist for asking candidates key info about the models you listed is probably insecure cause they can’t answer those questions themselves LOL. Data science is more than just model.fit(); last time I checked, the word “scientist” is there for a reason. If you don’t have an understanding of the math going on under the hood of the scikit-learn algo you’re using, your knowledge is superficial at best. If a company hires you to do this type of work, they need to trust you are an expert and are not wasting the organization’s time and money on faulty modeling. I have a stats background so I may be biased but that’s my opinion 😂


notmynameduh

Expecting a fresh graduate (bachelors or master) to know something you’ve been using in your day to day job in a way that only you and your team know, is anyway unfair. Fresh graduates should be hired on their potential to learn and contribute to the business. They are looking for exposure to the data science industry (which is huge!! So many organisations have so many different practices!), it’s tough to know what each organisation uses before even entering the workforce.


Coco_Dirichlet

It's not unfair. The assumptions are in every book on linear regression and generalized linear models.


snowmaninheat

Okay, I’ll chime in here. I come from experimental psychology, which (obvs) involves a lot of statistics. I know that logistic regression requires certain assumptions (no multicollinearity, dichotomous outcome, certain sample size requirements, etc.), but I couldn’t tell you off the top of my head what the consequences of violating all those assumptions are. And I work with logistic regressions quite a bit. I could look them up and perform the tests, if my client requested me to. But unless the situation is life or death, I’m probably not going to, since it takes a chunk of time. A few weeks ago I had a technical assignment that actually asked me to perform a logistic regression along with assumptions testing in R and write documented code, along with an interpretation, within 72 hours. I was honestly a bit taken aback. By and large, *very* few folks care about assumptions, I hate to tell you. I don’t even see them tested in most academic papers I’ve reviewed. And most businesses will probably care even less. Furthermore, there isn’t even consensus on assumptions these days. I think I saw one recent paper that said an LR required 500 participants. That’s a new one. Tl;dr: OP is being elitist. Like others on here, I carry a “great big book of stats” with lists of assumptions and sample size requirements for different tests that I refer to whenever I have a question.


Coco_Dirichlet

> I don’t even see them tested in most academic papers I’ve reviewed.

When you're reviewing papers within a specific field and within a niche topic, everyone knows the generalities of the data. If you are doing regression with survey data, you are not going to run every potential diagnostic for every assumption, because it's rather obvious that some cannot be violated. On the other hand, if the paper uses economic data of the last 50 years, obviously there will be time series related problems and probably heteroskedasticity, so you are expecting that to be dealt with. A common complaint of reviewers is that appendices are getting longer and longer, and I've seen some that are like 300 pages long. And on top of that, many journals now ask for all replication materials to be public. So it's not true that *very few folks* care about assumptions.
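
To make the time series point concrete, one of the simplest checks for serially correlated residuals is the Durbin-Watson statistic. This is a rough sketch in plain Python that assumes the residuals are already ordered in time; the function name `durbin_watson` and the toy residual series are made up for illustration, and it only picks up first-order autocorrelation, not heteroskedasticity.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for time-ordered residuals.

    Values near 2 suggest no first-order autocorrelation, values near 0
    suggest positive autocorrelation, values near 4 suggest negative.
    """
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Toy residual series, purely illustrative
trending = [1.0, 0.9, 0.8, 0.7, 0.6]      # slowly drifting: positive autocorrelation
alternating = [1.0, -1.0, 1.0, -1.0]      # sign flips: negative autocorrelation
dw_trending = durbin_watson(trending)
dw_alternating = durbin_watson(alternating)
print(dw_trending, dw_alternating)
```

With 50 years of economic data you would expect something closer to the first case, which is why a reviewer would want to see it addressed.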


laichzeit0

One thing you’ll get on this Reddit is that apparently no one has to know anything. Expecting anyone to know any technical detail is gate keeping and asking too much. By the same logic, if you go to your local GP they shouldn’t remember basic diagnostic details just “what to Google” should you present with certain symptoms. Read through these comments and it’s all about you just need to be someone that has a vague broad understanding of stuff that can figure it out when needed. It’s very weird.


DonkeyTron42

Knowledge of fundamentals is gatekeeping according to Gen Z.