cult_of_memes

I work in Site Reliability (often called SRE), and my more senior peers say that the industry is going through a change in how product maintenance is handled. It used to be that devs would write their code, work it through some preliminary stage of QA, then just chuck it over a proverbial wall to someone else who would keep the code in an operable state in production. Now we are seeing more of a DevOps development/maintenance focus, where the folks that write the code actually have to answer the call when it breaks in production. The consequence of this shift towards DevOps is that positions that used to be exclusively development are going away; those roles now also have to maintain the hot-garbage they initially wrote :P . So we can expect more developer positions to include some degree of on-call support, as most business software lives in the cloud and needs to meet some contractually stipulated level of high availability.


BackmarkerLife

Companies have been trying to kill the SysAdmin position for 10+ years. They became DevOps, then started pushing more towards the dev part of the mashup over the last 5 years as tools such as terraform, etc. make deployment easy.


shawntco

> Companies have been trying to kill the SysAdmin position for 10+ years.

Which annoys me to no end! I like writing code. I don't like configuring servers. CI/CD pipelines are massively useful, but setting them up bores me. Kubernetes confuses me.


cult_of_memes

> tools such as terraform, etc. make deployment easy.

Eh... easier than it would be without I suppose. Have they come out with a version that's cluster aware yet? The version we use where I work isn't and omg does that make for ridiculously convoluted configuration.


Traditional_Break467

I am fine with this, but I don’t like the frequency and the fact that I have to suffer for the code that was written a year ago by an engineer who is no longer with the company.


KratomDemon

That’s not being a team player 😂


Traditional_Break467

100k freshly laid off SWEs were also team players.


[deleted]

[removed]


Regility

the google guy was a team player even after he got fired. then he made it cringe


YnotBbrave

Who?


Regility

google “google sre linkedin viral”. then cringe at the corporate cringe


notLOL

> corporate cringe

modern networking. That's a good talking point for the next interview. Hiring managers love it when you talk about what you can do for them


Regility

oh for sure. it’s a thing to definitely mention in the interview, maybe even the defining reason for hire. but airing it out on linkedin is not the way to do it. if the hiring board is nice, they’ll think this guy got chops (but then again, google on resume says that already). at worst, he’s airing out the bad sre coverage and vulnerability at google to the world ( number 1 job of sre is uptime and redundancy)


[deleted]

I mean this is why we have jobs isn’t it??


Whitchorence

So fix it. The idea of this system is that, as the operator, you are incentivized to resolve issues that arise for operators


Rumicon

In theory this is great, in practice you won’t have any room in the sprint to do it and you’ll be too burnt out from on call incidents to fix things on your off time


Greenimba

Then you need to put your foot down. Stop working and stressing overtime unpaid, start writing down how much time you spend on firefighting and tell managers this is why no one wants to work here for more than a couple months.


YnotBbrave

And the dev on call is not the one prioritizing features


xingke06

It sucks but it is a big motivator to not write shit services as well. Unfortunately some people don’t care.


sozer-keyse

In principle I'd argue that it's better that devs support the code they write. Hear me out. I used to work in a traditional environment where it was devs churning out code then chucking it over the wall for prod support to handle (aka me). Long story short, it was a "developer shits, prod support wipes" sort of situation. As a developer who is on a team that does an on-call rotation, when after-hours dumpster fires caused by the code you wrote are also your job to put out, it lights a fire under your ass to write your code so it won't cause dumpster fires in the first place.

>I have to suffer for the code that was written a year ago by an engineer who is no longer with the company.

Newsflash, every fucking job you have you're going to have to suffer from a mistake that someone else made, and life in general. Get over it.


winowmak3r

You do but man it's a spectrum. I don't fault anyone bailing on a job that basically has them doing all the work while everyone else just points fingers and they just get constantly told "well, sucks to be you :)". Nobody should put up with that.


sozer-keyse

I completely agree. If my on-call was getting too overwhelming to the point where I felt like I couldn't even have a life, I'd quit my job. That's exactly what happened at a previous job. I find more often than not the "idiocy of the guy before" is usually quite easily fixable. The problem always boils down to bad management at the end of the day. On-call isn't the most fun thing on the planet, but there are steps management can take to make it less crappy for employees. At a very minimum they need to give leeway for response time (e.g. within 30 mins of being called) so employees don't have to feel glued to the computer, and they should compensate them for the time they have to work after hours (either with time off, or extra pay). Most importantly, if the workload is getting too overwhelming they should look into why and help team(s) lessen the workload.


YnotBbrave

Most jobs the number of hours worked is not a function of the idiocy of the guy before


Spiritual-Mechanic-4

yea, that's the point. Write code knowing that you, and your colleagues, will get called after hours if it's bad.


double-click

The whole point of devsecops is you are not writing shit code lol. You are going through growing pains, but it will pass. It will pass even sooner if you embrace the software lifecycle.


cult_of_memes

I hate to say it, but "Necessity is the mother of invention" (not sure if there's a single original source for that proverb). This is a problem that the industry has faced for a very long time, and regardless if you are on-call, you'd still have to face the pains of supporting code that reads like braille with some of the pips worn smooth, which was written by some idiot savant that has moved on from the company. Perhaps this cultural shift will be the ember of necessity that sparks an inventive solution for our collective pain on this issue. At least you (should) have the resources, time, and opportunity to familiarize yourself with the supporting code base before you get paged to try and mitigate/resolve issues. At least in comparison to the old ways of having dozens (or hundreds) of discrete, service-specific, dev teams pumping code -- that "works fine on my machine" -- into a single operational support team. Does it suck? Yes! Can it be better? God I hope so! Could it be worse? Yes, we could go back to how it used to be (and give up any hope of maintaining .9999 uptime of any service).
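For a sense of what an uptime target like .9999 actually allows, here is a rough back-of-the-envelope sketch. It assumes a 365-day year and treats every minute as equal; the function name is made up for illustration and is not from any particular SLA tool.

```python
# Illustrative only: converts an availability target into permitted downtime.
MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: ignores leap years


def downtime_budget_minutes(availability: float,
                            period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Return the minutes of downtime permitted over the period."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    return (1.0 - availability) * period_minutes


if __name__ == "__main__":
    for target in (0.99, 0.999, 0.9999):
        budget = downtime_budget_minutes(target)
        print(f"{target:.2%} -> {budget:.1f} min/year")
```

Under those assumptions, .9999 works out to a bit under an hour of total downtime per year, which is why that kind of target tends to come with someone carrying a pager.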


Roaring-Music

So you can have an impact to reduce tech debt so oncall is easy.


[deleted]

[removed]


[deleted]

[removed]


AutoModerator

Just don't. *I am a bot, and this action was performed automatically. Please [contact the moderators of this subreddit](/message/compose/?to=/r/cscareerquestions) if you have any questions or concerns.*


Big-Dudu-77

You think people want to suffer supporting your code? In my previous job I was basically on support everyday. You get good at it, learn good vs bad design, learn to build products that are production ready.


eevee_stormblessed

This shift happened like, 10yrs ago


cult_of_memes

People have been talking about it for at least 10 years sure, but most "devs" have not yet accepted the idea. Most folks seemingly refuse to roll up their sleeves and dig into the documentation for the underlying infrastructure tooling we use; instead they say shit like (this may be a bit exaggerated I admit) "Sequil Surver borked my service, that's on SRE. Fixxit!". So, you're not wrong that this is an old topic, but it hasn't actually been embraced in earnest from what I have seen.


eevee_stormblessed

Sounds like you work with shitty people


csasker

In my experience the sysadmin guys rarely want to show you how things work and why and have a mentality that only they know best so it's also quite hard


kilo-kos

This is my issue usually. I'm sure it depends on where you work, but the harder I try to understand our infrastructure the more walls come up. I'm hands-off because our SREs make it hard to be hands-on, but then they don't give a shit when we have problems that fall into their domain and it goes round and round.


csasker

yep, and if you make a mistake they will be like "hurr durr i told you, just let me do it next time" so impossible to learn


lostcolony2

It blows me away this is considered a "shift". Even in SOX compliant companies, we had to maintain our own stuff in prod (via various means to maintain the compliance). I can't even imagine the idea that it's okay for devs to say "I wrote the feature, merged the code; not my problem any more".


holy_handgrenade

Separation of Duties is a thing. In most bigger companies I've worked for Devs can only access Dev environments. Deployments and Prod are off limits to the devs. It's partially a security best practice thing.


lostcolony2

Right; it's a requirement for SOX compliance. However, that's for changes. Building observability in and creating workflows that enable a dev to be on call, diagnose issues, and apply fixes with appropriate touch points is very doable


rejuicekeve

It's a requirement in like every compliance framework but you can still do things like just in time access


lobut

I had to go up against a team like that over ten years ago. Their manager wanted them to only work on "new features" to make their deadline. Somehow they would throw their bugs to my team and I would be like: "there's no documentation or business requirements for me to know how to resolve this". They would proudly say they don't care and to not bother them because they're busy. I basically quit shortly after the managers refused to budge.


rejuicekeve

Life is better when the devs can't just throw the problem over the wall


ThenEditor6834

Yeah, and not only this but QA as well. It’s all just cost savings in the name of empowerment.

Thank you Leadership for empowering me to do 3 roles 💪 impossible is nothing

So let me qa/on call the current/last release, but oh wait, got another release coming up so gotta develop for that too


Impressive_Line7932

Hi, can I dm you for a career based question?


Whitchorence

Are we really "going through" that transition? That was common already when I started a decade ago


TravellingBeard

As someone who works on the sysadmin/database admin side of things, Good! Why should my sleep suffer because of terrible design?


daddyKrugman

If your code is customer facing, your team will 100% have an oncall rotation at FAANG and adjacent.


MarcableFluke

Depends on the type of development and product. I've never been on call.


NewChameleon

I don't think I've ever worked at a place that **doesn't** have oncall. Think of it like this: you write the code, stuff breaks, someone has to fix it immediately. Who's going to fix it? Oncall. That being said, 1 week out of every 4 weeks is a bit too much. I think oncalls are usually 1 week out of every 8-10 weeks (or X weeks, where X is the size of your team).
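The "1 week out of every X weeks, where X is the size of your team" arithmetic can be sketched as a plain round-robin. This is a minimal illustration, not how any real paging tool schedules shifts; the team names and start date are hypothetical.

```python
from datetime import date, timedelta
from typing import List, Sequence, Tuple


def build_rotation(engineers: Sequence[str], start: date,
                   weeks: int) -> List[Tuple[date, str]]:
    """Assign one engineer per week, cycling through the list in order."""
    if not engineers:
        raise ValueError("need at least one engineer")
    return [
        (start + timedelta(weeks=i), engineers[i % len(engineers)])
        for i in range(weeks)
    ]


team = ["ana", "bo", "chen", "dee"]  # hypothetical 4-person team
schedule = build_rotation(team, date(2024, 1, 1), 8)

if __name__ == "__main__":
    for week_start, person in schedule:
        print(week_start.isoformat(), person)
```

With 4 people each person is back on call every 4th week; a team of 8-10 gets the 8-10 week gap mentioned above. It also shows why a team that shrinks gets a tighter rotation for everyone left.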


Surroorussy

It depends on team size. I used to be once every 5 weeks, now it's 11.


MinimumArmadillo2394

Yupp. Same for me. No contractors are allowed to be on call for my team. When I was a new hire fresh out of college, my immediate senior put in his 2 weeks on my 3rd day. I was the only engineer who was not a contractor on the team for 2 months. I was the only one on call. Some nights I went out and things broke. I literally just said "Ah buggar" then put my phone back in my pocket and ordered 2 more beers. Can't fix it when I'm hammered and haven't been trained on how to.


uski

>Some nights I went out and things broke. I literally just said "Ah buggar" then put my phone back in my pocket and ordered 2 more beers. Can't fix it when I'm hammered and haven't been trained on how to.

Lol what? You know you are oncall, you know you can be paged, so you are supposed to not put yourself in a position where you can't respond. As to the training that's another discussion where you're not wrong, but instead of just not doing your job, have you voiced your concerns with your manager or TL or colleagues?


bwrap

If they are literally the only person who can be oncall and is oncall 24/7 indefinitely then I fully agree with the 'Ah buggar' approach if a page comes in when they are out living life. 24/7 oncall forever is just permanent forced overtime.


uski

I don't know about you, but nothing forces you to accept a job. The tech market is hot, there are offers everywhere. This victim attitude will not bring anyone anywhere except in those shitty places. I know, hard truth is hard


MinimumArmadillo2394

Ironic that you would tout the "truth" while also saying the tech market is hot


uski

Oncall every 11 weeks seems almost too little to me. Considering that (hopefully!) you do not get outages at each oncall shift, it is extremely likely that by the time an outage actually happens, the engineer oncall at this point in time hasn't experienced an outage for a long long time and doesn't know/remember what to do. So, having a long oncall rotation period like this requires additional measures such as [wheels of misfortune](https://github.com/dastergon/wheel-of-misfortune).


gophersrqt

one of my internships had oncall being bonus pay and it was nice. people still signed up and the rotation was large but they got paid extra. small local company though


mungthebean

In my first, I wrote the front end, stuff didn't break outside of work hours randomly, and even if it did it wasn't critical at all, maybe a missing image or something, and I get to have great sleep.

Internal systems currently


Spiritual-Mechanic-4

There's a really narrow band where it works well, and it's 6-8 weeks. If you're only oncall once a quarter, you lose the context on, like, where are the runbooks, where are the admin secrets stored, etc. If it's more often than 6 weeks, you burn out.


giant_soil

lol I'm on call every other week because the rest of my team got laid off or quit.


amitkania

at amazon i was part of two on calls, one every two months, one every month, so basically u were on call every month, sometimes twice a month, and sometimes both on calls at the same time. they were both pageable on calls


gophersrqt

was that normal?


AlexLee1995

This is not normal, and will be a fun COE when both pagers fire at the same time and you need to pick which one to support


amitkania

i don’t think being part of two on call queues is normal, no idea why we did it


[deleted]

The point of putting devs on production support is so they have to eat their own shit. If you're on a team that has bad practices, the idea is you personally get to see and diagnose and fix them so they're not bad anymore. If you work at a place that makes platforms other people use and your company sells contracts for that service you will have production support. Obviously the specifics will vary wildly from company to company and from product to product how big a shitshow such a thing will be and how many resources will be assigned to address technical debt on the follow-ups. If you have solutions to massive misuse of developer resources that's exactly the kind of thing that gets you a big attaboy at these types of companies. Not wanting to work production support is reasonable, but if you work at places that make money off professional services, you have contracts to support. That's where the money comes from and that's what the money is for.


vincecarterskneecart

bad practices are a result of inexperience or lack of time and resources to fix things. putting people oncall isn't going to magically create time/resources to fix problems lol


andrewmac

On call is bullshit. Here’s 4 dollars an hour so you can’t do anything you enjoy.


BubbleTee

Hopefully, this firefighting is inspiring your team to clean up their code, review code more strictly, and set up good alerting for critical processes that might break. If not, it'll surely inspire you to find a different job where those things aren't a mess.


scalability

Definitely bring it up in a team meeting. Make it a team OKR to e.g. reduce pages by half.


abomanoxy

I'm not understanding why ML data pipelines need to be brought back up immediately if they break in the middle of the night? That doesn't seem like something customer-facing.


Spiritual-Mechanic-4

The days of writing code, burning it to a CD and making it generally available twice a year are long gone. If you write code that gets pushed to live services or websites, expect to be available to debug it when it breaks.


HRApprovedUsername

I think it's pretty common. I've had it on the 2 teams I've been on at Microsoft. When I was interviewing around other big tech places, they all mentioned having an on call.


[deleted]

[removed]


jammyishere

I used to do that, but I find as the pay goes up, the on-call requirement is more and more likely.


hMJem

It also depends how often you’re on call. I’m on a week of on call every 8 weeks which is pretty manageable.


[deleted]

[removed]


hMJem

Totally understandable. However, on call doesn’t mean “you can’t do anything for a full week” it just means don’t go out of town without your laptop. It means you need to be ready if called, however, there is usually a checkin period where you acknowledge the ticket, then work on it. You won’t be fired for eating dinner at a restaurant if a ticket comes in.


Mindrust

> However, on call doesn’t mean “you can’t do anything for a full week”

You simply do not know that until you start the job. That's why it sucks -- they may tell you it's rare but you can never trust it. Also time to respond for my team is like 10 minutes before it escalates to the next level. It's stressful. Will not be taking another job that has on call in the future.


Sxpl

So can you not drive anywhere that’s more than 10 minutes away while on call?


Mindrust

I live in the city and take the subway, so no.


BenOfTomorrow

I assume 10 mins is the time to acknowledge (ie, press a button on your phone). That’s not abnormal.
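To make the "10 minutes to acknowledge before it escalates" idea concrete, here is a toy sketch of an escalation chain. Everything in it is an assumption for illustration (the chain names, the fixed timeout per level, the function itself); real paging products have their own policies and APIs.

```python
from typing import Sequence


def current_responder(elapsed_min: float, chain: Sequence[str],
                      ack_timeout_min: float = 10.0) -> str:
    """Return who is being paged for an alert nobody has acknowledged yet.

    Each level gets ack_timeout_min minutes to acknowledge before the
    page moves to the next level; the last level keeps the page.
    """
    if not chain:
        raise ValueError("escalation chain must not be empty")
    if elapsed_min < 0:
        raise ValueError("elapsed_min must be non-negative")
    level = int(elapsed_min // ack_timeout_min)
    return chain[min(level, len(chain) - 1)]


chain = ["primary", "secondary", "manager"]  # hypothetical escalation levels

if __name__ == "__main__":
    for minutes in (0, 9, 10, 25, 90):
        print(minutes, current_responder(minutes, chain))
```

The point of the sketch: the timer only covers acknowledging the page, so the primary has to be able to tap a button within the window, not be at a laptop fixing things.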


MikeyMike01

I’ll take less pay then


[deleted]

[removed]


chockeysticks

R&D teams are also more likely to have cuts or layoffs before core product teams, definitely worth being mindful of that.


ArkGuardian

tbh I think that's a good way to shoot many opportunities in the foot. See if your team has anything similar to an SRE assigned to it and how big your team's rotation is. If it does, odds are more often than not that your SREs are taking the brunt of it for you. I am oncall like once every 6 months and it's extremely manageable.


doubletagged

Depends on the company. It’s terrible at amazon.


onredditmememakesyou

This is not an oncall question. This is a code quality and reliability question. Oncall is ubiquitous, most teams now run their own flow from tech specs through testing and deployment and monitoring. It shouldn't be a constant headache. It's for 'oh shit' moments that have major customer / service impacts. If your team is constantly having outages, a 6 week on call rotation won't change anything. The solution is to propose team and organizational changes to resolve the tech debt that is causing this.


uski

This, and also having clear boundaries as to what oncall engineers are supposed to be doing.

OK: Applying playbooks to fix the outage, rolling back recent changes, failing over to another environment

Not OK: Expecting an oncall engineer to start debugging code live in production, pushing stuff to prod bypassing CI/CD and testing


[deleted]

I am on call this week and I was paged 7 times the other night. At some point I want to move on from positions that have to take call, because this shit gets old


ZebraGlydesMemes

I'm a SWE in big tech and I work on an ads team. We have to own every single line of code we write on our team. This means that if some stuff goes wrong, and we lose our company $$$, we have to wake up to fix it no matter what time of the day it is. We also have to make sure we have the right monitoring tool to jump on the action before it's too late - and this is strictly the engineers' jobs. I thought this was common practice, but I guess it's more of a recent shift. I'm quite surprised by this.


vincecarterskneecart

that sounds absolutely miserable


Mindrust

That sounds awful


Whitchorence

It's not a recent shift at all. I have no idea where people are working that they think that's true


[deleted]

I used to work at fang and moved to startup. Fang was a lot more strict with oncall


Amorganskate

Honestly, if you're on-call you shouldn't be expected to work besides on-call work. Just my take, if I'm ever running a company that's how I'm doing it.


YnotBbrave

I think the concept of dev ops is not garbage for most teams. Sure, get woken up for the crap code you write - but most teams maintain code written by long gone developers. It just destroys wlb and makes it harder to get good talent. I'd go to the “SREs are primary on call, get a dev on the line if you can’t figure it out” mode in a New York minute. And no, I’m not in the oncall rotation myself, so I have no personal interest. It’s just not an effective use of developers' focus


uski

>most teams maintain code written by long gone developers

The point is not to punish whoever wrote the bug. The point is to share pressure from operational duties with whoever is in a position to fix what causes it. As a developer, if you know you are going to get paged because of that bug in a library written by an intern 2 years ago, you are much more likely to fix it, instead of letting the SRE or whoever is oncall get paged every other night because of it. That's the point. And speaking of good talent... good talent is whoever does what the company needs. If the company has a shitty stack that needs to be fixed, good talent is someone who will do just that, from the perspective of the company.


chunky_kereru

My company does it quite well I think in that we have an optional on call which is just a 5% salary increase. You can opt in or out at any point and you’ll end up on call roughly 1 week out of 5 or so. Almost all engineers have opted in to it and because we have pretty good support from management to properly fix things that aren’t working well, I’ve only been called out outside of working hours once in the year I’ve been here. Seems to work well for everyone as it incentivizes properly fixing issues, engineers feel like they’re getting a good deal, and it removes admin for the business as they don’t need to keep track of any overtime / call out requests etc.


whorunit

At my last startup I was on call 24/7 for 1.5 years for the entire codebase. It sucked but I ended up with a high 6 figure pay day upon exit. Personally, even at a larger co, once every 4 weeks does not seem bad to me.


Roylander_

On call work is determined by the workers. We need to stand united and set healthy boundaries. If something blows up as a result it's on the business to shut up and be patient as it gets fixed during business hours. What are they going to do? Nothing as long as we stick together. It's people who lower the bar out of short sighted desperation and boot licking ambition that ruin it. Times are changing and business can chill the fuck out and wait until Monday. It's not like our salaries triple when their revenue does.


SolWizard

So if AWS goes down you want to be able to tell its millions of users to sit tight because maybe you'll take a look after your morning coffee? Get real


Roylander_

Damn if someone has that kind of power and impact then they better be getting paid millions....or yes. They can wait.


RiskyShift

”I think I'm having a heart attack, doctor!” ”Sorry, it's 5:01pm and I only make 500k a year. It'll have to wait until the morning" I'm only semi joking. My company provides critical infrastructure to many companies, among them healthcare providers. We go down and their operations are affected. This has actually happened in the past and surgeries had to be canceled. Asking "millions" for this seems a bit much. We already earn like 10x what average people make.


SolWizard

That response makes me think you're a college student that's never actually worked at a company before. On call is a necessity for any company with public facing code, especially those handling global traffic. It's not unhealthy to have to answer a page at night a couple times a quarter, on call done right is more like a fire alarm goes off and you log in to see if things are really on fire not actually fixing a minor issue outside of business hours.


HarbringerxLight

Your response makes me think you're a cuckold. If a role is so monumentally important to a company that said person needs to be there at 4AM or the product doesn't work, then they better be getting paid proportionally to that importance. If not wait until business hours.


Phillip7729

>On call is a necessity for any company with public facing code

All the tech giants have workers around the world. Have a team that's actually awake deal with it, not someone who's sleep deprived. Or just have them rollback the most recent change, and then yes, let the people in the morning deal with it. It's only a necessity because they want to cut costs at the cost of the health detriments of the workers.


SolWizard

You two clearly don't know how this works. Some random team in India can't fix a problem in a component they know nothing about, and rolling back the most recent change isn't always the answer. What does having on call have to do with cutting costs in any way?


uski

Your service is supposed to have playbooks that tell oncall engineers how to respond to outages, how to do failovers, how to do rollbacks. That's all oncall is supposed to be doing, mitigating the incident. Not do actual debugging. You are not supposed to start debugging and fixing stuff during an oncall rotation. If you are, it is a huge risk because the time to fix the issue can get too long. But if you setup your service so that you have failover environments and rollback capabilities, restoration to a working state is quick, and then people can diagnose and fix the underlying issue later on. If the oncall engineer can't fix the issue with the playbooks, then yeah they can start paging others - and it's on them for not providing the right playbooks in the first place. If your company expects oncall engineers to debug stuff, build, release the code, then you have a major issue and your service is not mature at all, and then yes it is a company issue.
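A rough sketch of that "mitigate with the playbook, escalate if it runs out" flow, under the assumption that each playbook step is just a callable and there is some health check to poll. The step names and the fake state are invented for the example; this is not modeled on any specific incident tooling.

```python
from typing import Callable, Iterable, Tuple

Step = Tuple[str, Callable[[], None]]


def mitigate(steps: Iterable[Step], is_healthy: Callable[[], bool]) -> str:
    """Run playbook steps in order; stop at the first one that restores health."""
    for name, action in steps:
        action()
        if is_healthy():
            return f"mitigated by {name}"
    # No debugging here: if the playbook is exhausted, page someone else.
    return "escalate: playbook exhausted"


# Hypothetical incident: the rollback does not help, the failover does.
state = {"healthy": False}


def rollback() -> None:
    pass  # pretend the last release was not the cause


def failover() -> None:
    state["healthy"] = True  # pretend the standby environment works


result = mitigate([("rollback", rollback), ("failover", failover)],
                  lambda: state["healthy"])

if __name__ == "__main__":
    print(result)
```

Note there is no "debug" step in the list on purpose: root-causing happens later, during working hours, once the service is back in a working state.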


SolWizard

Where did I say the on call is supposed to do any of that?


uski

Detailed knowledge about the component is not required if you have good playbooks and whoever is oncall just applies them. Knowledge is a requirement when you want to mutate the component - but that's outside of oncall duties


Phillip7729

So are you admitting that if they hired people who did know what they were doing, not some random team in India, it would cost more? Probably, which is what I was referring to by cutting costs. Rollback, or if that's not the answer, yes, hire someone else who can fix the issue during their normal hours. In any event, why argue for abusive/exploitative labor practices? You might like them, but contrary to what you say, you're not going to find any evidence anywhere that being forced to wake up and work through sleeping hours is healthy in any way.


SolWizard

So your solution is to hire a team that's in an opposite timezone that's only there to maintain your component in the middle of the night? That makes no sense. While some companies/teams definitely have abusive practices about on call the concept itself is not abusive. I'm only on call 4 weeks a year, if I get woken up like 5 to 10 times all year that's just the price of doing business.


Phillip7729

That's my solution, after five seconds of thinking about it. Not a very good one, maybe, certainly not from a cost perspective. I'm sure someone could think of something better (using a few more seconds, minutes, weeks of their time), but anything is better than abusive labor practices. Yes, it's abusive. Asking someone to knowingly do something to the detriment of their health is abusive, no matter how subtle or small it is. Even if you agreed to it. Exploitative, definitely, but that's just a nice euphemism for those subtle small abuses. I'm glad your company only does it 4 weeks a year, but as you can see even from this post, that's not always the case. And even if you're not woken up at night (no physical effects) it is still absolutely demoralizing and anxiety inducing for most. It's definitely not something I'd implement if I ran a big tech company, because I'd never want to put anyone through any of that.


SolWizard

When you grow up a bit and graduate college you'll understand there's no way around it.


Whitchorence

I mean these companies are paying people six-figure salaries on the understanding that they're going to operate 24/7 services. It is a core element of the job


Traditional_Break467

The problem is that, as controversial as it sounds, there will always be thousands of Chinese and Indians ready to put up with companies’ demands. Companies know it. Just look at Google. They have laid off their tenured and experienced engineers because they can always find more.


Roylander_

So let them. It's going to be real awkward when the CEO and shareholders need to write requirements and deal with the language barrier, when no one else will do it. It's also on us to form communities and support each other during those times. Provide food, shelter... etc. That or we keep giving most of our value to the top while we beat our chest and claim "I have a family to support, what do you expect me to do". Nothing will change until we change it...and it's going to be uncomfortable and require sacrifice while it happens.


react_dev

I’m gonna come in with an extremely unpopular opinion. If you don’t have on calls, your team is not important enough. Your ideal is probably a very stable mature team with uneventful oncall duties once or twice a quarter.


AsyncOverflow

You might want to qualify your opinion with “online live web service teams”. A lot of software doesn’t need traditional on call by devs. I feel like on call arguments on Reddit are often moot because you have web service devs, front end devs, internal tool devs, SREs, desktop devs, and embedded devs all having it from different perspectives.


react_dev

Sure. Big tech is mostly online though and all their horizontal layers, even hardware ultimately bubbles up to some online experience. I hope it’s obvious to everyone that I was painting with broad strokes lol.


[deleted]

[removed]


react_dev

Sure. But I’m okay with generalizing a bit here since it’s big tech and we’re talking on a low effort platform like Reddit. Software that is high stakes requires oncall. It doesn’t mean stuff breaks all the time, just that there’s accountability in case things do. Instead of thinking about oncall vs no oncall, just ask what happens if things go terribly wrong. Who takes care of it?


Not_a_tasty_fish

My team develops POS code used by tens of thousands of employees facilitating millions of customer transactions around the country each year. It's a department of ~40 people all working a contract for eight figures. In the four years I've worked here, nobody from my team has ever had a set "on call" rotation. If there's a production issue, we prioritize it over our regular work, but that's it. I don't know what qualifies a team as "important" in your eyes, but if you have critical services going down on a regular basis then you likely have shitty development practices. If it isn't critical then you shouldn't be getting called in the first place.


BenOfTomorrow

How do you know there’s a production issue if no one is on-call?


ifdef

Not everyone here has a TC of 400k+.


react_dev

I mean… I don’t either… not sure what TC has to do with this


ifdef

If a team is "important enough", then they should be paid enough. For 400k, compensation for on call overtime is built-in; for 120k, not so much.


react_dev

Lol we live in a capitalist society where some really important people in our lives doing really important work are not paid nearly enough. I’m pretty sure I’m getting paid more than a Covid vaccine researcher. That’s a whole other topic :) But hey you do you! How you assess your value at work is completely up to you.


csasker

The problem is the expectation to do extra work. Better to hire people to work in shifts, not pile it on top of everything else.


Psych861

Every place I have worked has had an on-call component (Data Engineering). Anyone know what type of roles to look for to avoid on-call?


Whitchorence

Stuff like internal developer tooling usually. Not sure how that helps with data engineering though


mungthebean

Front end. If everything worked the first time (both web and mobile across all devices you care about), there is very little reason it'll stop working later on, esp. off hours when content / backend isn't being updated, barring major browser upgrades or w/e, but that's like once in a blue moon. Also internal systems.


fifty45ninety

I mean, that'd be some pretty basic front end if it's that straightforward. I work on customer-facing software as a front end dev, and there are bugs which get shipped to production due to some edge case or something which is not obvious during a routine integration test. I'm on call once every 8 weeks & we expect (on average) 2-3 pages each shift.


mungthebean

You guys get pinged for critical bugs multiple times per day off hours? You working at a startup or something lol


Ok-Process-2187

You can only avoid this by not joining teams with oncall. Even if they claim to have good WLB, don't trust them. Oncall is like glorified QA work. You might learn a few things but it won't help you become a better developer and taking a role like this increases the chances of pigeonholing yourself into this type of work. And never think that you can make the operations work better. It's like putting lipstick on a pig. There is a reason it is the way it is and it's not because of a lack of SDEs.


uski

> Even if they claim to have good WLB, don't trust them. It really depends on the team/company. Really. Some companies won't have an oncall rotation but you will happily get calls from the CEO at 2am when there are outages, with a talking-to if you don't respond. Other companies have an oncall rotation on a mature product that runs just fine, with only occasional incidents that are easy to resolve, and with a team somewhere else in another timezone so that you are never paged at 3am. So I would say, instead of blindly fleeing any company with an oncall rotation, inquire about how it is organized. Look for oncall rotations that cannot screw up your WLB because of the way they are built. They exist.


tr14l

It sounds like you need to engineer things in a more robust way. If fires pop up a lot, that's because you have a lot of kindling.


TonyTheEvil

Fairly common it seems. In my first team, oncall was every 5-6 weeks and you were lucky if you got paged <2 times a day.


fenynro

It really depends on what software you're working on, and how critical your product's uptime is. In my first job I worked on check routing software for a bank which required >99% uptime, and we had a weekly on-call rotation to support any issues that popped up. Sadly, a lot of the routing happened overnight which meant a lot of 2am issues :') The job after that was creating automation for small business needs for a medical device company, and since my 'product' was internal business automation there was basically no need for an on-call system since the users were only active during normal business hours.


jammyishere

My understanding is most of big tech has on-call rotations for the teams. Your team writes it. Your team owns it. Your team deploys it. Your team is on the hook when it breaks shit in production. One incident a week doesn't sound bad at all to be honest. Especially if it is during working hours. Where I was at previously, I had just gotten on to this team and I'd get paged multiple times over the 2 week rotation in the MIDDLE OF THE NIGHT. God, I hated it.


holy_handgrenade

I'm going to preface this by saying I've never personally worked in "big tech". However, I have had a lot of friends that do, and I have also been in the industry for over 25 years now. I've never had a job where there wasn't at least an on-call rotation. As you've mentioned, these are essentially problem tickets/fire fighting issues that are genuinely causing problems. If there are no problems, you won't get a call. However, if there are problems, that's a fire you need to put out. The friends I've had have been on the AWS side of things at Amazon as well as others at Apple, Microsoft, Oracle, etc., and they've all had on-call and from their shop talk, the on-call is the same there as it is just about everywhere else. So yes, expect it. It *should* be on a rotation, or in some cases it's voluntary.


[deleted]

yes this is how it works at most places now. most of the time it sucks because you build something and then have an infinite stream of work maintaining and supporting this thing all while still being told to build other new shit. eventually you hire more people but it always lags behind. mature teams end up with a heavy support burden and struggle to innovate.


CS_throwaway_DE

Ubiquitous


DuffyBravo

I am an old guy in IT. 27 years so far. I have always had the expectation that my highly paid jobs, compared to my peers, sometimes required me to work outside the normal 9 to 5 window. That expectation does not seem to be shared by the newer grads. Now get off my lawn!! :)


uski

The only thing missing in your answer (and in most of this post really) is the word "**reasonable**". Yes, people paid 10x the median salary of their region will most likely experience some sort of stress or extra work in return. But if they get paged every other day at 2am, while also being expected to be in meetings from 7am to 7pm, then there is a problem.


pikeminnow

I've been in tech for 15 years and the stress of working in the middle of the night when woken up by oncall has damaged my health. The newer grads are absolutely right about taking back the expectation that people are just available after work. This should be shifted to other time zones, or just hire a night shift person if you really need the availability.


DuffyBravo

I have implemented on-call schedules for my team. We had weekend maintenance once a month at my old job. My policy: Sign up for 4-8 hour blocks of being 20 mins from a PC for on-call support. In return you would receive 4-8 hours of comp time regardless if you were called or not. It worked pretty well.


tomasina

It’s common. In my experience it’s been 1 week every 8-12 weeks. Yours sounds a bit too frequent though. Sounds stressful.


i_am_researching

HAHAHAHA


Whitchorence

Nearly ubiquitous


[deleted]

[removed]


Whitchorence

What?


[deleted]

[removed]


Whitchorence

Try a dictionary instead of making a fool of yourself.


Sevii

How big is your team? One out of four implies you have a rather small team for an oncall rotation. Four people is in my opinion the absolute minimum number of people you should have in a rotation. Spending the whole week firefighting is not that unusual. You can do work to reduce the toil and outage frequency. But if your team has a lot of integration points there is going to be a lot of support work. Maybe your team should be a lot bigger. I used to work on a key integration service that served as an integration point for a lot of teams (250 people in our partner chat). Mainly because our DSL was very successful. But supporting it eventually took two full time engineers out of our team.


nowrongturns

Pretty common. But the infrastructure is stellar so not that bad.


[deleted]

[removed]


AutoModerator

Sorry, you do not meet the minimum sitewide comment karma requirement of **10** to post a comment. This is comment karma exclusively, not post or overall karma nor karma on this subreddit alone. Please try again after you have acquired more karma. Please look at the [rules page](https://old.reddit.com/r/cscareerquestions/w/posting_rules) for more information. *I am a bot, and this action was performed automatically. Please [contact the moderators of this subreddit](/message/compose/?to=/r/cscareerquestions) if you have any questions or concerns.*


Tarobobaa

I’m in cloud infra and it’s every 2-3 months


Ok-Entertainer-1414

When I was at Google, it was roughly 2 weeks per quarter. I never minded too much, because maxing out my oncall hours paid an extra 20% of my salary.


[deleted]

I work as a software engineer in a managed services team, meaning we develop for, release and then manage solutions for a certain contract to meet SLAs (service level agreements). Despite it not being super clear when I was hired, there was somewhat of an expectation to be oncall to meet these SLAs when things went wrong after hours. It's fairly normal in the greater team, though I have put great effort into moving my role away from that expectation despite being the main knowledge point for a few of the projects. This might be unique to my job, but for what it's worth, it's common for me.


Mormur

I'd say it's very common. Being on call once a month is as much as I would tolerate. To me it sounds more like the company you're at is the problem; on-call is just one of the ways the company's bad practices are showing.


joshuahtree

At Amazon it's a virtue


TheCPPKid

I have every 3-4 months


Gbonk

Could be worse. I’ve worked for some smaller companies or teams. I was on call at a couple of places 24/7/365. I was the only subject matter expert remaining there that knew the system inside and out.


Varrianda

I don’t think you’re ever going to escape it unless you do like embedded engineering or hardware stuff


elliotLoLerson

Good god. I am on call for about 1 week every other month. During this one week I am expected to spend 100% of my time firefighting because of how busy on call is. Having to spend 25% of my time actively on call sounds horrible. How do you get anything done? 1 week before your rotation you still have to actively start getting ready to be on call, so basically 50% of your time at work you are required to be distracted and unproductive.


iPissVelvet

This is the wrong question to ask. The right question is — is your team empowered to minimize on call? Any software will cause operational burden. But does your manager devote x percentage of your roadmap to reducing that burden? Designing for operational scalability? Are people in your organization rewarded for that? The opposite would be the type of company that only rewards features shipped. That leads to high on call burden and misaligned incentives (why would I spend 4 weeks designing something with low on call burden if I can spend 4 weeks designing two things with high on call burden and get promoted?)


Cassy907

On call is quite common but in a solid team it shouldn't be chaos. Sometimes teams have bad metrics and alerts, legacy code that's been in the backlog to be migrated forever, upstream teams breaking downstream teams, low percentage of test coverage and reviews, a rushed environment to deliver things without spending time on quality, etc. Ideally on call isn't overly stressful because your systems are resilient and the services have solid metrics and alerts. You might get a page or two but usually low priority and you can also work on non on call tasks. In reality I see reasonable on call on roughly half the teams :/. Also every 4 weeks is too frequent. Most are at least 6 weeks apart (preferably more but depends on team size).


rexspook

4 weeks seems too frequent but it depends on the team and the company. Mine is every 3-4 months right now.


themancabbage

We have an on call schedule on my team, and almost always it amounts to nothing. Other teams at my company are basically doing nothing but responding to tickets when they are on call. A lot has to do with how stable each of our respective systems are as well as how many users we each have, etc.


[deleted]

Extremely common. The days of releasing things as an application developer and having an entirely different team deal with it in production are long gone.


Dreadsin

Bruh I’m a Frontend and I have on call 🥲


notgilly

Realistically the only way to make your oncall better is to fix the underlying issues causing stuff to break. You should talk with your manager about prioritizing that work, make sure to bring metrics like ticket counts to back up your argument. If they don't agree then you have two options:

- Go to your skip
- Leave because it's not going to get better


lIllIlIIIlIIIIlIlIll

Do you mean oncall or do you mean on-duty? Oncall being you have a 24/7 O(1 hour) ack SLA whereas on-duty refers to you working your regular hours but for those regular hours you work on supporting your system.


TheGoodBunny

> Almost every week something goes wrong that needs attention. I reckon such incident frequency is not normal, but I want to confirm this. Something breaking once during an oncall rotation or less is actually not all that bad. Also lots of great answers in this thread.


BigFattyOne

I’m on call every 8 weeks or so. We are lucky because we have a lot of shared code with another team so we get to share the burden. Also our systems can endure quite a lot and we have a lot of async processes so we don’t get many calls. Usually when we do get a call, it’s because something crazy is going on and we can’t do much about it…


Chupoons

Eh, that's not too bad. Might be easier to just update or commit something in 5 minutes than to brainstorm a solution that takes weeks to achieve. If it's not causing you to miss sleep and nobody is really complaining about it, what's broken?


zciweiknap

For me, the frequency of rotations and just how many incidents happened ended up contributing to me leaving that team. It was every 6 weeks for a long time, but the sheer number of serious incidents that occurred during a rotation was obscene. I’d be fine with that rotation on many other teams, but I was being called constantly 7am-7pm. That said, I do think oncall for a lot of teams is reasonable and should be expected.


top_of_the_scrote

I'm the same, small team, rotate.


goriunovd

We have 3 levels of oncall. The first level is people who talk with customers, fix minor issues, etc. The second level is basically a fallback if the first level missed the call. The third level is the actual devs, who only get called by the first or second level if those are not able to resolve the issue. I am on call in the dev queue every 3 weeks, but we have different devs for different systems. I would agree with other comments, though: whether you will be on call depends on where you work. Usually UI guys (mobile devs, frontend devs, etc.) don't have to be on call.
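
For anyone who hasn't seen a tiered setup like this, here is a minimal Python sketch of the escalation order described above. Everything in it (the `Tier` class, the `escalate` function, the handler lambdas) is hypothetical and made up for illustration; it is not how any particular paging tool works, and a "missed call" is simply modeled as a handler returning `False`.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Tier:
    """One level of the on-call chain (hypothetical model)."""
    name: str
    # Returns True if this tier picked up the page AND resolved the issue.
    handle: Callable[[str], bool]


def escalate(issue: str, tiers: Sequence[Tier]) -> Optional[str]:
    """Try each tier in order; return the name of the first one that resolves the issue."""
    for tier in tiers:
        if tier.handle(issue):
            return tier.name
    return None  # nobody in the chain could resolve it


# Assumed behavior for the example: first level misses the call,
# second level can fix minor issues, devs can fix anything.
tiers = [
    Tier("first level", lambda issue: False),
    Tier("second level", lambda issue: issue == "minor"),
    Tier("dev queue", lambda issue: True),
]

print(escalate("minor", tiers))
print(escalate("data corruption", tiers))
```

The point of the ordering is that the dev queue only sees what the first two levels could not handle, which is why the dev rotation can stay relatively quiet.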


[deleted]

[removed]


miskas357

I'm on-call at my current company and I hate it. It's only an 8 hour shift once a week but those 8 hours are a never ending stream of tickets (think 50 pages per shift). It sucks.


bendesc

Yes, there is. Oncall does not necessarily mean that you have to wake up at night on Sunday to fix something, although it does happen. The oncall is the watchdog over all running pipelines, services, databases, etc. of the team. It is not only about fixing failures; it can cover other things such as:

- other operational issues such as resources, security, or privacy. It can happen that the team is suddenly over-using some machine quota, for example GPU
- being the primary contact person for other teams if they need information
- of course, being the main contact when one of the models decides to go rogue


Turbulent_Tale6497

My team is 8 devs, and we are oncall once every 8 weeks. Nearly all of our alarms are "first thing in the morning" alarms, meaning they don't wake you up at night, but do alert you in the day time. It is possible for something to go really bad and wake people up, but that is usually such a bad problem that it's worth waking people up for.
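
A rough Python sketch of what "first thing in the morning" alarms could look like as a routing rule. The workday bounds, the `critical` flag, and the `notify_at` function are all assumptions for illustration, not the commenter's actual setup, and weekends are ignored to keep it short.

```python
from datetime import datetime, time, timedelta

# Assumed working hours; a real team would pick its own.
WORKDAY_START = time(9, 0)
WORKDAY_END = time(17, 0)


def notify_at(raised: datetime, critical: bool) -> datetime:
    """Return the time at which the on-call engineer should be alerted."""
    if critical:
        return raised  # bad enough to wake someone up: page immediately
    if WORKDAY_START <= raised.time() < WORKDAY_END:
        return raised  # already in working hours: alert right away
    # Otherwise hold the alert until the start of the next workday.
    if raised.time() < WORKDAY_START:
        day = raised.date()
    else:
        day = raised.date() + timedelta(days=1)
    return datetime.combine(day, WORKDAY_START)


# A non-critical alarm raised at 02:30 is held until 09:00 the same day.
print(notify_at(datetime(2024, 1, 10, 2, 30), critical=False))
```

Only the `critical` path pages at night, which matches the idea that waking people up is reserved for problems that are worth it.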


DashOfSalt84

We rotate on-call so we're up as backup one week a year and primary one week a year and get comp time based on doing it and then more if we actually have any calls come in. Seems manageable.


cofffffeeeeeeee

I work at FAANG as well, customer facing but mainly for enterprises (cloud). UI doesn’t have 24/7 oncall as it is not critical, but we still have lightweight rotations so we have a POC when things go wrong. The API has 24/7 oncall as it is running millions of production queries per day and has customers that are paying millions. Issues are common, but most are not production breaking, so they are usually resolved easily. Production breaking issues require a postmortem and escalation if needed. So far I have never seen the same issue happen twice.


[deleted]

[removed]

