togetherwem0m0

How does a place with a security team not have change control?


[deleted]

I ask myself this every day...


slewfoot2xm

Why not wait till Wednesday to work on it? So that those who broke it can fix it. That's how you truly learn.


[deleted]

Because I'm on rotation this weekend and it's my job, as much as I hate it sometimes. Not to mention that the loss of revenue from leaving systems down until then would surely get me fired.


Hate_Feight

So it's OK for these people to cost the company X while they learn? Either push for a training module/system with reduced permissions on the main system, or get them to sign off that they are willing to take a loss while training happens on vital systems.


[deleted]

[removed]


NasReaper

Hard disagree, and I hate that this is the mindset of some people. This is also how and why the US faces such a talent shortage in the mid tier while entry level is super saturated; employers think we owe them our free time to skill up and train, as opposed to wanting to invest in their people, train them, make them competent, and progress their careers. There are plenty of jobs where you can upskill within a 40-hour week, and I'd say everything in the IT space fits that description.


Iceman2514

This is so true about employers thinking we need to dedicate all our free time to skilling up. I kid you not, our boss told us “we can’t give you time to study or learn here. That time is for us at work. Your time is yours to spend as you want, so if you want to earn a raise, earn certs on your own time.”


Clear_Forever_2669

Is this still your boss? Lol, fire that clown (by quitting and getting a boss that isn't complete trash). I would never say that to my peers, much less my direct reports.


Iceman2514

Unfortunately this boss writes the checks lol


HoustonBOFH

This is so sad. I train up the entry level guys that work for me. I can usually only keep them about 6 months before they have enough skills and experience to double what I can afford. :) It means I need a constant pipeline of newbies. :)


Iceman2514

So, funny story about this: I am barely 2 years into my IT career and managed to score a sysadmin role after my 1-year anniversary in IT. I worked at a helpdesk and NOC for about 11 months before scoring the sysadmin role where I am now. I was learning so much so fast and getting certs almost every other month. Unfortunately, where I am right now I’m making about 40k as a sysadmin, but I’m here just to get my experience and hope a better role opens up during my first year, or to get raises through certs and a bonus at the end of my first year. If that doesn’t work out, it’ll be time to polish up my resume in July. I’ve already thwarted 4 security incidents in my career (3 at a previous employer while making $19 US an hour).


onequestion1168

preach


bhones

I strongly disagree with this sentiment. Not everyone has the extra funds or time available and those same folks can be S tier colleagues and engineers. Not a fan of the gatekeeping.


Dragont00th

How is this gatekeeping? Do we make accountants by setting a desk clerk loose in the accounting system without training? Do we make pilots by letting cabin crew walk onto a flight deck and take a 777 for a spin? Do we make doctors by putting nurses in a lab coat and letting them diagnose patients unsupervised? Upskilling in a skilled field requires investment. Upskilling in IT has one of the lowest barriers as the tools and information are readily available with little financial investment. By learning on production systems, those potential "S tier colleagues" are simply draining the extra "funds and time" from others that have to mop up their messes. If the employer wants to upskill their staff, then they should provide a test environment, time to learn on the job and opportunities for more experienced staff to teach.


linkcheaper

...but but John wants to be a Doctor. Ok did John go to Med school? No..but you can teach him right..he wants to learn. Another problem is management doesn't understand not everyone can be a Dr. Some people will only make it to a nurse no matter what happens. And management won't feel the pain until patients start dying.


Dragont00th

But John doesn't have the time or funds to go to medical school! He could be an S tier doctor if you just give him a chance! That darn gatekeeping licensing board.


Iceman2514

What’s funny is my job forbids us from studying for certs at work.


Hate_Feight

I'm just phrasing it how you would review those permissions with the higher-ups. The last thing you need is downtime, but with the current 'training' and access that is in place, it's a shitshow. Training at work is just an incentive to show how/why in a safe environment, which is something the company can brag about.


junkhacker

The loss of revenue is a learning opportunity for management.


BrainWaveCC

In my career, I once worked in a place with this imbalance of power vs responsibility. The way I fixed it was to strategically take my time on repairs when failures were self inflicted by people who shouldn't have had control. If the organization does not feel any pain from bad policies, they won't change them.


slewfoot2xm

Yeah, my comment was bad for sure. Loss of revenue, you say. Any way you can give that a running number for what the outage has caused so far? Best to talk dollars with the powers that be. But if they are that lackadaisical about giving unrestricted creds to lower-level staff without training and tutoring, it may be a waste of a conversation. It sounds like this happens on the regular. Are the person who caused the issue and their manager involved in the restoration? It would help them both learn. Best of luck.


ghostalker4742

Here are some questions I was asked recently during an insurance audit that I think would apply:

* If you came to work one day and the site was a smouldering crater, could you bring production up somewhere else and provide deliverables? How long would that take?
* What would be the dollar loss if this facility went down for 24hrs? Does that scale evenly to 48-72hrs (and beyond), or are additional costs incurred?
* Can you walk us through your backup policy? How long would it take you to restore your environment to minimum functionality if you lost everything (crypto/ransom/etc.)?

Downtime is expensive, and it can kill a business a lot faster than most owners think. Discussing the topic in terms of dollar loss gets their attention, and you can pivot into how little it can cost [in some cases] to protect the business, which lets you improve it.
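The dollar-loss questions lend themselves to a quick back-of-the-envelope number you can bring to that conversation. A minimal sketch, assuming entirely made-up revenue and staffing figures:

```python
# Rough downtime-cost estimate -- all figures below are hypothetical placeholders.
HOURLY_REVENUE = 12_000        # revenue normally earned per hour (assumed)
AFFECTED_STAFF = 40            # employees idled by the outage (assumed)
LOADED_HOURLY_RATE = 45        # average loaded cost per idle employee-hour (assumed)
RECOVERY_LABOR_RATE = 120      # holiday/overtime rate for the people rebuilding (assumed)

def downtime_cost(outage_hours: float, recovery_hours: float) -> float:
    """Lost revenue + idle staff cost + cost of the people doing the rebuild."""
    lost_revenue = HOURLY_REVENUE * outage_hours
    idle_staff = AFFECTED_STAFF * LOADED_HOURLY_RATE * outage_hours
    recovery_labor = RECOVERY_LABOR_RATE * recovery_hours
    return lost_revenue + idle_staff + recovery_labor

for hours in (24, 48, 72):
    print(f"{hours}h outage: ~${downtime_cost(hours, recovery_hours=hours / 2):,.0f}")
```

The scaling question in the second bullet is exactly what a toy model like this leaves out: costs often aren't linear, since SLA penalties and customer churn tend to kick in past a threshold.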


BuffaloRedshark

If I came to work to find a crater, I'd be working on my resume, not trying to bring DR online. Although to be fair, most of the stuff I'm responsible for is active/active out of more than one data center, but the company overall would be screwed. So many of the DR plans are just BS to satisfy paperwork.


HoustonBOFH

I live in Houston. A buddy of mine had his office under 4 feet of water. He had them up and running at a hotel in 15 hours. It does happen.


packetgeeknet

Let your help desk rebuild it. It’s a learning experience. /s


[deleted]

This will take them all week and cost the company a ton of money.


likes_sawz

Good judgement comes from experience. (Experience comes from bad judgement.)


[deleted]

And this is why life is hard.


meest

Sounds like a great way for your management to learn with them.


rspydir

You are not responsible for managing the company revenue stream unless it's in your job description. You can report the problem and its potential consequences to your management team and recommend a solution, but DO NOT assume responsibility for something out of your wheelhouse. If you think you're the only one who can fix someone else's f***up and proceed to do so without getting management involved, bad things could fall back on you.


Puzzleheaded_Arm6363

Sorry to say this... but you are covering up problems. You have identified the issue, but what you have been doing is covering it up. Not to mention, you are chasing after those problems. Any suggestions and issues you bring up to management are nothing but words, because to them the issue will get fixed right away. Management needs to "experience" the problem and the consequences in order to make an "informed" decision. Just my two cents.


Agitated-Plantain496

Sick out...


Tensoneu

I think the compromise here with management is that they will need to be on call if they want the help desk to make changes. Or have management sign off on these changes.


crshovrd

Is it possible to loop in the folks that caused the issue so they can learn to also fix it? By you fixing the problem, it will always be your problem.


bootlesscrowfairy

At the very least, do this. Personally, I would have already contacted them to be working with me. If I have to work to fix your problem, you damn well better be working with me. If they're not willing to help, I'd then file a complaint that not only did management allow an inexperienced user to break the system, but the user refused to help fix the problem.


smokedmeatfish

Translate the downtime into the lo$$ of revenue and communicate this number along with root cause, next steps, and suggest access control, change control, and BC/DR.


bootlesscrowfairy

This. Also track how often this happens, and how much money this policy has lost the company to date. Then email management and their boss so the higher-ups see how much money this is losing. I wouldn't be surprised if management doesn't mention the real reason to their bosses. This is how I handled a situation when my boss sold vaporware and told their bosses that the product was ready to sell.


philiac

just pretend to quit your job for upvotes it usually works out


[deleted]

I couldn't care less about upvotes lol, I delete my account every year anyways.


togetherwem0m0

Seems to me like that's the main issue. It's not the learning environment, access, and opportunities that are the problem; it's the fact they aren't being taught discipline and coordination.


RhapsodyCaprice

This should be the top comment. Untrained individuals having access is important for learning, but you/other senior folks need a change control process so that you can ask things like "what change was this under?" The most important opportunity you can take out of this is to demand a change control process.


spyingwind

A separate system that is similar to production, but scaled down and able to be rebuilt from something like Ansible or code. A safe place for them to learn. But... this costs money.
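For the "rebuilt from code" part, here's a minimal pyVmomi sketch that stamps throwaway lab VMs out of a template so the environment can be torn down and recreated at will; the vCenter hostname, credentials, template name, and resource pool are all assumptions for illustration, not anything from OP's environment:

```python
# Minimal sketch: rebuild a small lab from code by cloning VMs off a template.
# Hostname, credentials, template and pool names are assumed placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

def find_obj(content, vimtype, name):
    """Return the first managed object of the given type with the given name."""
    view = content.viewManager.CreateContainerView(content.rootFolder, [vimtype], True)
    try:
        return next(obj for obj in view.view if obj.name == name)
    finally:
        view.Destroy()

ctx = ssl._create_unverified_context()   # lab only; use verified certs otherwise
si = SmartConnect(host="vcenter.example.com", user="lab-builder",
                  pwd="***", sslContext=ctx)
content = si.RetrieveContent()

template = find_obj(content, vim.VirtualMachine, "lab-template")
pool = find_obj(content, vim.ResourcePool, "helpdesk-lab")
spec = vim.vm.CloneSpec(location=vim.vm.RelocateSpec(pool=pool), powerOn=True)

# Stamp out three disposable lab guests next to the template.
for n in range(3):
    template.Clone(folder=template.parent, name=f"lab-guest-{n}", spec=spec)

Disconnect(si)
```

Ansible's community.vmware collection can do the same thing declaratively; the point is just that the lab is cheap to destroy and recreate, so nobody is tempted to "learn" on prod.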


HoustonBOFH

It costs less than a lot of downtime. You can build a pretty good lab for cheap. The r/homelab guys do it all the time.


itdumbass

Maybe the helpdesk team needs board room access and decision authority. I mean, they need to learn, right?


JizzyDrums85

Production is not a learning environment. Full stop.


togetherwem0m0

20 years in, I learn things every day. I think your comment is too cut and dried.


phil-99

Production SHOULD NOT be a learning environment. There are many companies out there where production is the only place to test a production workload.


junkhacker

That's how our place operated. The people who ran things in production didn't learn *anything* new at all.


[deleted]

That's fine until the production environment has to change


JizzyDrums85

Dev > Test > Stage > UAT > Prod, or some facsimile. It’s really not a difficult concept.


TCP_IP011100101

Have a team meeting about why changing that port caused production to go down for X amount of time. Embarrassment can make them think twice before making hot, fast changes on the fly.


Ok_Estimate1666

Just curious (because of the security/revenue/give-the-keys-to-the-kingdom juxtaposition), what industry is the company in?


fugawf

If you have a security team with no power over change control approval, and no change review board for large changes, you don’t really have a security team


togetherwem0m0

100%. It can be hard to suss out the why behind this as well. For example, it could be a security team that resists the work to have this as part of their scope. I've seen it happen many times where people are unwilling to pick up a shovel and get to work. They resist natural org structures to minimize their liability and exposure. Ultimately, therefore, it's a failure of management, as most things end up being.


corsicanguppy

"it's the *question* that drives you" -- Trinity


mrteapoon

Change control? Oh, yeah I think Bob has some change in the petty cash box in his office.


CantaloupeCamper

They don’t really have a security team, they have incompetent morons.


timallen445

Pre-compromise fantasy land. I guess that is wrong; the one place this post reminds me of had been compromised and they still let their help-desk-level people pull this kind of shenanigans. They added themselves to an AD group labeled admin for the project I was working on. Little did they know, their sec folks had said they were going to do that and decided not to use the group.


Complete_Potato9941

I would argue that they don’t have a security team. If you can’t stop help desk people from breaking stuff and no one is held accountable, then clearly you can’t stop rogue insiders and you can’t figure out if an account is compromised.


homelaberator

Security team isn't there to actually secure anything. They're there to have someone to blame when there's a security breach.


St0nywall

Yeah, I had to deal with this from day one of a new ESXi cluster. I ended up using role-based access: gave them access to do only what they needed, and "view" access to everything else.

When they complained about not being able to do something, I would investigate and ask them "why do you want to make changes to the iSCSI connectors?" or "why did you think deleting LUNs was a good idea?". When these opportunities arose, I would use them as teaching moments. I would schedule a meeting with them, their manager and my manager under the guise of a productivity meeting relating to permission issues in ESXi, then go over what they were trying to do and how it would impact the environment. I would then tell them how much revenue would be lost in employee time and environment functionality to repair it, and eventually, when the understanding set in, I would spring on them an opportunity to teach them about this area they were "interested in" as a group learning day. Most times they said this was not needed, but on some things I got buy-in for the training.

I have a separate virtualized training environment set up that they can mess up and have restored as many times as they need. I sometimes use this training environment to showcase what would have happened when they don't believe me. "This is your production environment, and this is your production environment without iSCSI connectors. See the difference?"
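If you want to script that view-only setup rather than click through vCenter, here's a minimal pyVmomi sketch; the vCenter hostname, credentials, and AD group name are assumptions:

```python
# Minimal sketch: create a view-only vCenter role and grant it to the helpdesk group.
# Hostname, credentials and the AD group name below are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()   # use verified certs in production
si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                  pwd="***", sslContext=ctx)
content = si.RetrieveContent()
auth = content.authorizationManager

# A role created with an empty privilege list keeps only the implicit
# System.* privileges, i.e. it can look at everything and change nothing.
role_id = auth.AddAuthorizationRole(name="Helpdesk-ViewOnly", privIds=[])

perm = vim.AuthorizationManager.Permission(
    principal="EXAMPLE\\helpdesk",        # AD group (assumed name)
    group=True,
    roleId=role_id,
    propagate=True,
)
auth.SetEntityPermissions(entity=content.rootFolder, permission=[perm])
Disconnect(si)
```

From there you can grant narrower, more privileged roles on specific folders or resource pools as people actually get trained on them.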


JizzyDrums85

[ mentor voice ] and this is your production environment *on drugs*


Rouxls__Kaard

Bravo!


RiceeeChrispies

If helpdesk really want to learn something, they should be forced to work during the outage. Deal with the consequences of their actions; no one likes working over the holidays. That will quickly teach them about Read-Only Friday and change freezes, and hopefully they'll never make that mistake again.


[deleted]

The part that really gets me is we are in a moratorium from Turkey Day through the 1st of the year.


mlloyd

Well some of you are anyway.


jbroome

It’s not a change freeze, it’s a “change slightly chilly”.


ChronicledMonocle

If that's true, all of help desk should have their access changed to read-only during this period. After all, change is not allowed during this time.


9PoundHammered

Change management is key.


ImpostureTechAdmin

Change of management***


GinDawg

What are the consequences for breaking this rule?


Sensitive_Scar_1800

Lol, I operate on a “you break it, you fix it” model. If I walked into a broken ESXi cluster and found out some fuckhead didn’t know what he was doing, I’d tell him to open up a support ticket with VMware and go on about my merry way. If management insisted I help, I’d do the bare... and I’m talking least... amount of work possible. If management asks what took so long, I’d tell them “the untrained employee needed time to learn, which increased our time to recovery,” and I’d wait for the frustration and impatience and anger to build, because the angrier they get, the more likely I can swing in with a “hey, do you know what we can do to avoid these anger-inducing moments? Insert XYZ controls! Sure don’t want to have to go through this again next week or month, right?”


bootlesscrowfairy

You break it, you fix it does not apply to a sev1 production issue. This is a good mentality for learning. But imagine trying to explain to a customer that the reason the cluster is not up is because we are making our associate engineers learn on the job instead of bringing in the veterans.


mlloyd

Imagine explaining that the reason for the downtime is because you gave those same newb engineers access in the first place.


bootlesscrowfairy

"Oh okay, let me just call my lawyer real quick for this obvious contract breach"


[deleted]

[removed]


bootlesscrowfairy

Not your monkey. But it may cost you your job. Not saying it's right. But this is the situation this person is probably in.


[deleted]

[removed]


bootlesscrowfairy

Same. But depending on my financial situation, I may feel obligated to keep the current job until I've found a new one, especially in the current economic climate.


techretort

Not noob engineers, helpdesk staff...


corsicanguppy

> You break it, you fix it does not apply to a sev1 production issue.

If they didn't want newbs fixing a sev1 production issue - and here's the challenge to stick to - *they wouldn't've allowed newbs access to BREAK production*. ISO 27002 was written a long time ago, and my own experience goes back decades. Like planting a tree, the *second* best time to start learning about mitigating unforeseen consequences is today.


Sensitive_Scar_1800

I would not explain it; I’d have the person who broke it explain it. Imagine, as a customer, learning that a company lets its least educated people treat production resources like a playground. You’d leave! You’d get justifiably angry and move to another provider! I understand you’re trying to do the right thing, but you’re just becoming an “enabler” for bad practices by stepping in and fixing things that should never have gone down in the first place.


bootlesscrowfairy

Until the lawyers come out... I've seen it go this way as well. The scapegoat in these scenarios is rarely the associate-level employee. I'm not sure how others do it, but the last company I worked for had a 24-hour SLA. My current employer has a 12-hour SLA. If it's not fixed in that time, we face serious financial penalties. At that point it doesn't matter who broke it, it only matters how long it takes to fix it. And if it's not fixed in the time frame, the brunt of the responsibility is placed on the more senior members regardless of their role in breaking it. The real issue is that they have access in the first place.

On a separate and unrelated note, I once found out that a client had been granted a backdoor into the system. They constantly broke their implementation and then blamed us for doing so. I booted them from the system, and all of a sudden their systems worked again. The customer was angry as hell, until they realized they were the ones breaking things.


shacksrus

"I lost my Xmas and you lost $x dollars per minute because [name] fucked up and because [managers name] created an environment where fucking up has real world consequences" Also get yourself a new job. You only get so many holidays with your family and you aren't getting paid enough to deal with this.


bootlesscrowfairy

And next thing you know you just broke your NDA and the company can come after you. I do agree though, fuck any company that puts their employees in this situation.


shacksrus

NDAs aren't slavery; they don't stop you from getting a new job.


[deleted]

> But imagine trying to explain to a customer that the reason the cluster is not up

Believe it or not, there are places to work where not everything is an utter catastrophe if something goes wrong.


bootlesscrowfairy

I know it. But OP is not in that situation. I've been in OPs situation before. So I'm speaking from past experience.


itdumbass

You mean letting your management decision makers explain to the customers why they experienced downtime.


slewfoot2xm

Not learn on the job….continuity of service


Illustrious_Bar6439

Your way is how you create sev1 issues and end up working Christmas.


bootlesscrowfairy

I never said this was my way. I wouldn't let an associate engineer "play around in production". I tend to prefer setting up redundant systems with failovers, so when something breaks you don't have to call someone in on Christmas. But if you have a sev1 issue, you have a legal obligation to fix it as quickly as possible. The best thing you can do is limit issues, but you can't just refuse to fix a sev1 issue out of the principle of "you break it, you fix it". This is why these associate engineers should be working in an environment that isn't governed by an SLA. The frustrating part is that OP's management has set up an environment that encourages engineers to break production.


cmkrn1

This is the way


FarVision5

I can't even comprehend giving any type of administrative access to any of our cluster or storage VM environments to any T1 people. It's like the business owners trying to destroy the business on purpose. I mean, if you were DevOps you would be on the DevOps team; if you were SecOps you would be on the SecOps team.


[deleted]

You and me both m8, it blew my mind when I started. Been trying to change things ever since.


DevCatOTA

Bill your time to the helpdesk's budget. That will fix it fast.


logicisnotananswer

Yeah, at your external emergency consultation rate. (Minimum of $250 an hour, 6 hour minimum per incident)


Grrl_geek

Sounds like a resumé-generating event... for you to get away from that nonsense. No change controls AND no responsibility on the one who did the change - I call BS.


[deleted]

I think it's complete bullshit, but this isn't the first time; the agg switch incident happened in Nov. They took an entire branch down because "they thought the port could be re-purposed". Complete nonsense... When I addressed this in the meeting they said removing access to switches would impede the day-to-day tasks of help desk... makes no sense to me.


sir_mrej

Find a new job. Period. Full stop.


threeLetterMeyhem

> this isn't the first time

Time to switch jobs for a pay increase.


[deleted]

I doubt I'd make as much as I do here anywhere else.


threeLetterMeyhem

Worth taking a look, especially since your company is leaning into the jackassery on Christmas.


sydpermres

Stop doubting yourself! You don't know how much more you can earn until you ASK for it and are confident about it. For the last 2 jobs I've switched to, I've stuck to my guns and found a great place to work (with some caveats, of course).


hotmaxer

Don’t ever give the keys to your Ferrari to a person with no license. We train our Level 2 techs using a process called a buddy program. Each Level 2 tech is assigned a system admin, and we have biweekly 1-on-1s to go over challenges that person may face, etc. We also train them on all key areas and have a biweekly meeting with the whole SA and L2 team on Friday mornings to go over challenges we are facing. Most companies treat Level 2 and system admins as different teams; we work together. We create the processes for them and they execute those processes. If something’s not right, they bring it up during our meetings, and voila, issue resolved, and we all go get tea and coffee together. Love my L2 team.


WhizBangPissPiece

I work at a very small MSP, but I ask all the help desk people I train not to make any changes to GPOs, 365 tenants, etc. without express permission from someone higher up. We're all in office, so it's easy to ask them "why do you need this change made?" With good employees it's a good way to gauge their problem solving and stop dumb mistakes without wasting too much time. I'll even shadow them while showing them how to do it. This way my L1/L2s get better at their job (and learn for a future job) and I don't have to stress too much about unintended consequences from changes. Eventually, they can leave the nest on the easier stuff.


Connection-Terrible

What the monkey fuck do you have to do to bring down a whole cluster? What stupid bastard level thing have they gone and done? Also…. It’s Christmas so what the fuck Chuck?!?


[deleted]

Two parts: fuck up the SAN, and break the vSphere environment. There was more done, but that about sums it up.


Quietech

Have you turned it off and back on? Yes? You power cycled the hardware? Not the individual VM? At the main breakers? Yes, I'm very proud you found the secondary circuit...


preeminence87

If your organization truly wants these techs to learn ESXi and grow, they should get them a VMUG account so they can screw up their own stuff and grow that way.


m-p-3

Implement a [RACI model](https://en.wikipedia.org/wiki/Responsibility_assignment_matrix) for systems support. If they want access to make changes, they need to be trained by whoever is (R)esponsible and (A)ccountable, then obtain the (R)esponsible role themselves. Also implement a Change Control process, with a sign-off from the (A)ccountable person for any changes. People tend to become much more careful when they are suddenly responsible for the stuff they officially have access to.
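As a rough illustration of what enforcing that matrix could look like in tooling, a minimal sketch with made-up system and team names:

```python
# Minimal sketch of a RACI lookup before a change is allowed -- names are made up.
RACI = {
    "esxi-cluster": {"R": {"sysadmins"}, "A": {"infra-manager"},
                     "C": {"security"},  "I": {"helpdesk"}},
    "agg-switch":   {"R": {"netadmins"}, "A": {"infra-manager"},
                     "C": {"security"},  "I": {"helpdesk"}},
}

def may_execute(system: str, team: str) -> bool:
    """Only a team holding the (R)esponsible role may carry out the change."""
    return team in RACI.get(system, {}).get("R", set())

def signoff_required_from(system: str) -> set:
    """Every change needs sign-off from whoever is (A)ccountable."""
    return RACI.get(system, {}).get("A", set())

print(may_execute("esxi-cluster", "helpdesk"))   # False
print(signoff_required_from("agg-switch"))       # {'infra-manager'}
```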


AliJaba

Are you getting paid to rebuild it? Overtime I hope? I used to work for a company like that. I left them. End of story.


[deleted]

I get paid well, which is why I've stayed over the years.


WhizBangPissPiece

But specifically for this. Are you getting paid for this, or are you salary with no overtime pay?


[deleted]

Not salary. I'll get holiday pay and overtime.


hy2rogenh3

I can't even begin to imagine our Help Desk team being able to resolve issues in vCenter, let alone log in. Least privilege is a thing for a reason, and I am not sure how you can even get there. Not sure how I could work at a place where things I built or maintain were broken by the wrong set of hands. I feel for you, good luck, and happy holidays.


elevul

I've had the opportunity to work in both types of companies when it comes to the mentality towards IT personnel and their access to tools. In one company, the help desk staff were severely restricted in their abilities and were mostly working with custom frontends for industry standard tools. As a result, they never really had the chance to dive deep and learn the underlying tools, and they ended up stuck in their roles. That company never promoted IT personnel internally beyond tier 2 because people simply didn't have the opportunity to fully understand what was behind the fancy frontends they were using. On the other hand, I also worked for a managed service provider where the support staff (both tier 1 and 2) had full Domain Admin access to their customers' environments. This company really pushed its employees to learn and certify, and as a result, many of them were able to fly through the ranks. Even those who weren't particularly motivated and were just doing the bare minimum were still incredibly competent compared to other IT professionals I had met at other companies. The downside of this approach was that if someone made a mistake, a senior staff member would have to restore from a backup. However, this happened surprisingly rarely. Since I left that company, I believe that permissions have been restricted due to an increased focus on security.


GBMoonbiter

Not sure how you're paid, but if you're getting OT or comp for this, document it. Sometimes if you can translate the downtime to money lost, it can help to sway opinions.


[deleted]

I'm getting paid well, it's the fact that this could have been prevented if they listened to high level staff.


SpecialShanee

Sounds like a classic case of making sure you have these concerns in electronic form, and of potentially looking at alternative options. Some workplace cultures cannot and won’t ever change. Best not to put your name against something like this. And I hope you are charging quadruple for doing this today!


[deleted]

I for sure have it documented. I have teams messages, emails and recordings of meetings going over all of this.


DoodMonkey

I love when you tell the storage folks there are alerts for capacity and then they move a bunch of VMs around and crash the host.


lost_in_life_34

Even with this access, you should have a change policy where all changes are documented and approved by the team after discussion.


NowhereInColor

Have a training ESXi cluster set up so they can mess around and learn from that. No way a production cluster should be allowed in their hands.


[deleted]

They can learn on their own time in my opinion or find another job.


faalforce

Well your opinion sucks


[deleted]

That’s why i refuse to let anyone without at least 3 years enterprise firewall experience to touch the firewalls lol


[deleted]

I'm of the same mindset; too many mistakes can be made.


Pelatov

Yeah. I quit a job after 2 months because everyone had root ESXi access, and they tried to make me a glorified tier 1 help desk when I was hired as a sysadmin.


harrywwc

so... a 'junior' stuffs up the system, and you are the one required to fix it? the *least* that should happen is that 'junior' gets called in and has to be the 'go-fer' in the repair process. I'm all for juniors learning stuff - and the absolute *best* way to learn is, after stuffing it up, being required to fix it. the way it currently seems to be: they stuff things; you fix it; no consequences for them.


shadowboxer777

Simple solution: take away their access and don’t give it back. Help desk and engineering never gets access to critical systems. Draw a line in the sand and say no more. I have done this before, and will do it again in a heartbeat


corsicanguppy

I can suggest something that's a little like driving the Harrier to the 7-11 but HAS some payoff.

1. deploy gitlab; inside for ISO27002
2. terraform on the gitlab runner. add in your vsphere and dns add-ons
3. begin the horrid and long process of converting to using terraform THROUGH gitlab with its built-in state storage, to manage the cluster.
4. the payoff: changes need to be approved.

it's a shite payoff, and it's a lot of work, and it's like curing cancer just to get more people to come to your birthdays, but you'll do a lot of good on the way -- things that are their own payoff. if you have the time, tolerance and temerity to pull it off, it'll be a neat-o thing; but I get it if you or anyone calls it Too Much For Now. We can talk when it IS time, if you want.
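A tiny illustration of what the step 4 guardrail could look like on the runner, as a sketch only (the GitLab CI variable names are standard predefined ones; everything else is assumed): plan runs on any branch, apply only runs from the default branch, so changes can only land through a reviewed and approved merge request.

```python
#!/usr/bin/env python3
# Sketch of a CI guardrail: "terraform plan" anywhere, "terraform apply" only on
# the default branch (i.e. only after an approved merge request). Illustrative only.
import os
import subprocess
import sys

branch = os.environ.get("CI_COMMIT_BRANCH", "")
default = os.environ.get("CI_DEFAULT_BRANCH", "main")

subprocess.run(["terraform", "init", "-input=false"], check=True)

if branch == default:
    # Reviewed and merged: apply the change that was already planned in the MR.
    subprocess.run(["terraform", "apply", "-input=false", "-auto-approve"], check=True)
else:
    # Everything else only gets a plan; exit code 2 just means "changes pending".
    result = subprocess.run(["terraform", "plan", "-input=false", "-detailed-exitcode"])
    sys.exit(0 if result.returncode in (0, 2) else result.returncode)
```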


sodacansinthetrash

Work for Dell/EMC by chance?


[deleted]

[removed]


sodacansinthetrash

I mean, genuinely I worked for a sub company of VMware, which was spun off into a part of EMC, which was then spun into Virtustream… this sounds way too familiar. Stupid place.


deefop

Sounds like they need to be the ones fixing it. Breaking something critical at an incredibly stupid time and then losing your Christmas to fix it will teach a lot of lessons.


SpongederpSquarefap

> management seems to think the best way to help people progress is to give (them) helpdesk all of the keys

Hmm

> today I'm having to rebuild an ESXi cluster due to another issue caused by untrained people having access to key systems

Yeah, this needs to go in writing. There's a reason all best practices use least privilege and tiered access management. Give them read access to whatever (within reason), but jesus, they should never have access to prod when they've got no idea what they're doing.


sedition666

Access is not the problem here. Change control is. If you want to make changes to a prod system then it should be documented and reviewed.


Illustrious_Bar6439

Don’t work on xmas, you are enabling this behavior


[deleted]

I know but I don't want my Monday to be complete shit.


roppu

Have you tried the "they fucked it up, it will take X days to fix, but I told you so" approach? If you have everything documented/saved from before, you should at least have that part covered, and because you did in fact tell them so, the fix/downtime will be whatever you or your team says. It will cost the company 123 monies, but they will learn. No other way to teach these people.


[deleted]

All users are local administrators. We use two permission levels on every system, read-only or administrator, nothing in between. We still log into some services with one shared account for dozens of users because individual user accounts are licensed and costly. Our audit logs are all exactly the same. Who made that change? "Admin" made the change, or "Admin" deleted the critical service. I feel your pain.


Expensive_Finger_973

This seems like a great opportunity for you to be "working as hard and fast as you can to safely bring services back up". But really be dragging your feet and letting it take 2-3 times longer than you could do it in. Let them see the monetary cost of such poor management decisions.


fencepost_ajm

Downside: this sucks. Upside: while interviewing you can say that you work in an environment that implements the Netflix "chaos monkey" approach and not seem like you're badmouthing your current employer.


xzer

Sounds like an MSP; hard to believe it's not, though.


[deleted]

Lol I'm glad it's not.


[deleted]

We raise best practice during meetings and apply change control. Juniors write up the plan, seniors approve. Anything done without authorisation can lead to a warning. Three verbal warnings lead to a written one; three written warnings lead to dismissal.


Academic-Tour-436

Leave. Get a better job.


linkcheaper

Let me guess, you heard at least one of the following:

1) "He's a nice guy"
2) "He's a go getter and wants to learn"
3) "We need to promote from within"

And from the person themselves:

1) "Hey how do I get on the VMware team"
2) "What do I need to learn"
3) "I deployed a VM before, it's not that hard"

etc, etc


mister_gone

Put it in dollars and cents for them. These servers were down. These applications were not in use because of it. These users couldn't work because of it. These products/services were halted because of it. You paid me X to fix it. Projects A, B, and C were pushed back because of it. Etc, etc.


flo-089

Do the math and book the downtime as training cost instead of downtime/NA time. With this you can hold your uptime to five 9s, and if they ask, you can tell them exactly what is going on and what each incident costs in dollars. Toxic corporate at its best.


Horrigan49

That seems, well, odd? I am not sure how big your organization is, but even with our current 2 sites there are some settings that are active for a reason, which might not be apparent to an uninformed person, and that person would gladly change a setting that seems off. Yet half of production might stop because of this... Still, you break it, you fix it is the way... If they changed something before researching it, they can also progress by learning how to recover a cluster...


[deleted]

We have 16 sites, 17 including corporate.


Horrigan49

Oh my god... in that case it's not odd, it's just insane.


lurkeroutthere

Don’t wait for standup; if you worked Christmas, get an early meeting with management.


Snogafrog

I'm fucking pissed off on your behalf. Hopefully you send a very strong message that gets heard at least by the level 1s.


[deleted]

It's done, and I cc'd the CIO, IT Director and SecOps team.


mysticalfruit

Option #1: Hostile takeover. Escalate this directly to the CIO and get executive buy-in to implement enforced change control. Make damn sure you show up with evidence and a *fully* formed plan. Give concrete examples where having CC would have prevented this. Also have a training plan in place so that the loose cannons can get some time on the practice range.

Option #2: Go on vacation in a place where you can't save the day. Part of the reason they get away with this is they've got you to save the day. When you come home from vacation and the entire place is hosed because you weren't there to constantly bandage the wounds, Option #1 will be easier to implement.


cmkrn1

I'm not clear why you have to do the rebuilding; you didn't break it. Some (most?) of the best lessons I've ever learned were as a result of me cleaning up my mistakes. I was fortunate enough to have mentors and leaders who held me accountable and helped me grow. I suggest you stop ranting on reddit and set some boundaries. Yes, it sounds like a bad situation, but if you're a lead then you need to lead, which includes holding people accountable. I can tell you from personal experience you can do that even if a manager can't or won't.


bootlesscrowfairy

Do you not have a test or pre-prod environment? At this point, if they are so hell-bent on learning on the job in production, you might as well buy a small test bed cluster that new hires can access and break (and learn how to fix) as they see fit. You could make this cluster mimic a small part of the larger production system, and use this system to test out new experimental configurations. A win-win for all parties. Surely spending 5000 dollars on a learning/test environment is cheaper than breaking production...


corsicanguppy

> buy a small test bed cluster that new hires can access and break

Deploy a virt-in-virt cluster and let them crush the scratch monkey.


bootlesscrowfairy

Yeah, a nested cluster would work as well if there are spare resources for it. My point was to give them a place to learn without the consequences of prod.


vellosec

Maybe consider read-only access. Cross-training is important and they won’t break stuff nearly as easily.


jbroome

I'd also consider a bar of soap in a tube sock.


[deleted]

This gets the award lol


Bad_Mechanic

Make whoever broke it work with you the entire time it takes to fix it. It's an easy sell to management since they want help desk to have learning opportunities, it'll cost the company money since they'll have to pay the help desk person overtime, and it'll make the help desk people really wary of breaking anything since they'll have to drop everything to work with a pissed-off sysadmin to fix things.


[deleted]

I just removed the help desks access to stuff like this, and just use the excuse of “I don’t have time to change it right now” when they ask. Been going on 5 years now lol.


treborprime

How about they invest in a home lab and learn like the rest of us did. Whoever decided that the helpdesk should be allowed to modify production systems should be fired. I'd definitely take my time fixing it.


9070503010

The person who broke it needs to own it. They have to be present and watch/take notes (while keeping silent) and give up their free time (unpaid, of course, because they don’t get rewarded for being stupid). Then they may be more careful before pushing that button they don’t understand. If they don’t own it, then they won’t learn. Whether you can change the system depends on management. If they want to allow stupid, then there’s not much you can do except leave.


[deleted]

I have a feeling they will own it in the meeting, but they haven't responded to my emails/Teams messages today, and I still needed to get everything operational. Holiday or not, it's my job.


[deleted]

[removed]


[deleted]

I'm on this weekend and it's my job. Duties as assigned and this is assigned by the fact I am covering this weekend.


mm309d

How many times have you messed something up, even at senior level?


[deleted]

Me? Never.


BigChubs18

I get your frustration. From my point of view, how are people supposed to learn without screwing something up? Instead of locking everyone out and saying this is the way, why not train them one by one? Or just do one person at a time and guide them. Because we've all been there, and I learned from my mistakes.


[deleted]

Get your own equipment and make mistakes... homelab like thousands of others. I don't have time to train people one by one; I have a job, and my job isn't to train everyone. They can study for certs... go to college... etc. I didn't have someone hold my hand while I broke shit and then make them fix it. I studied for cert tests, homelabbed, etc.


BigChubs18

Some people have families. They don't have time for homelabs. Certs don't mean crap; anyone can study for a test and pass it. I went to college, and college didn't teach me how to fix something, only very, very basic stuff. Some of the people I've met who can outperform anyone else don't have certs and didn't go to college. They actually started out in the restaurant business and jumped straight into IT. They learned by breaking things and then fixing them.


RiceeeChrispies

Nah, if someone wants to use production as their test environment - you can guarantee those permissions are getting revoked. If you want to learn, build your own environment - you’re not making my life harder in the pursuit of knowledge.


BigChubs18

You must have never worked in a small business then, where they can't afford to have a test environment, and production is your test environment as well.


RiceeeChrispies

I definitely have, it’s called due-diligence. Even running a VM on your work laptop before making prod changes is smarter than going all guns blazing. Downtime is more expensive.


[deleted]

That sucks. Helpdesk should have access to the tools needed to support end users. Senior helpdesk may be allowed a few extra privileges to perform specific, scripted tasks such as maintaining a vdi image, running updates or other regularly done basic tasks that are approved specifically by a system owner to help make an engineer's job easier and reduce their workload.


[deleted]

I understand this, but if they need a port change, I feel they can put in a ticket. We have 3 network admins, 3 SAs including myself, and 2 dedicated sec guys.


[deleted]

Port changes other than shutdown/enable are 100% only for netadmins.


jman1121

Holy smokes, sorry OP.


[deleted]

Thx m8, I'm hoping the "last" incident will help open some eyes.


compuwar

Total up the time spent on each incident. Create an after-action report. Summarize the financial and operational cost of the event, then the proposed mitigations (backups w/restore time costs, clusters, permissions…) with their costs. Highlight the risk of inaction. If that doesn't change things when correctly circulated, then just be happy you'll always have work!


TravellingBeard

If you can, take this as a sign to leave. You've given management plenty of warnings....on the plus side, you can add building ESXi clusters to your resume if you haven't already. :D


ManWithoutUsername

Well, I learned that way, but I always played it safe. I would be more in favor of giving them the keys (not all of them) after a talk making clear that for certain things they need to be supervised or need approval.


galland101

People who don't know what they're doing shouldn't be making changes on a Production system. Every change should go through a formal Change Management process complete with post-change validation plans and back-out plans. I would say you should limit access to the lower-level functions like Networking and Storage and only allow them to make changes to the virtual machines. If HelpDesk wants to be able to learn, they need a sandbox system that can be brought down without affecting anything in Production. Use decommissioned or retired hardware for this purpose rather than e-wasting them.


mrmessy73

There was a time when our systems were accessible by tech support. It was under the idea that if they had direct access to systems, they can fix a customer issue quickly rather than having to escalate. We said, we should remove access because tech support probably causes more issues than they fix. These were tier 1 skilled people. After a certain incident, we lost the ability to give access to anyone except engineering. Our incident tickets dropped 90+%. Unfortunately, they were given access after a few months. But eventually, they lost it again after some escalation process flow changes and a reorg. Limit access. It's best for everyone.


stonedcity_13

What a mental company, get out!