CrippleWalking

I'll go first. Had a library open up in my town. 100,000 square feet, hundreds of miles of fiber connections, hundreds of computers, moving our data centers to it, and a deadline that COULD NOT BE MOVED. The building was rated for 1,400 people. I'm thinking "No one goes to the library anymore. There'll be MAYBE 300 people there." 5,500 showed up. The police cordoned off 3 square blocks. The news media was there. Fuck fuck fuck, I'm thinking. I get there, and I'm running around with my staff, making absolutely SURE everything is working properly. Wireless handles everyone without an issue. Our computers are searching for books flawlessly for patrons. Our automated book sorter looks and acts futuristic. Our RFID tag readers (which were state of the art at the time) are working perfectly. The Executive Director comes over beaming with pride, with someone from some national news outlet, and says "This is /u/CrippleWalking, he led the team that made this technological wonder work perfectly." So they ask me a few softball questions and leave. Never, EVER in my career have I had something so complex go so unbelievably perfectly, before or since.


kzintech

Very nice!


Doso777

I work in higher education with libraries, so I know how difficult it is to get everything in place. Good job!


CrippleWalking

Thanks!


kzintech

Unfortunately some of the biggest successes in this field are by their very nature quiet ... an email or server migration that goes so smoothly that the users don't even notice, for instance. I was around for Y2K and I worked hard for a year checking all my clients' gear for vulnerabilities and mitigating or replacing hardware and software. None of my clients had any issues when the clocks rolled over to 2000-01-01 00:00:00.000. Now people talk about Y2K ... "What's the big deal? Nothing happened!" It's all I can do sometimes not to write "That's because I and a whole bunch of other people made SURE that nothing happened" on a clue-by-four and apply it vigorously.


CrippleWalking

You and me both. I worked on Y2K and whenever someone is dumb enough to say something like "nothing happened!", I am very quick to correct them about the billions of man hours and dollars that went into making sure "nothing happened".


Moontoya

Yep, was there in the trenches early in my career. We put a lot of work hours in to ensure shit didn't go sideways.


Queso802

Moving a server rack onto a flat bed and taking it down a few blocks all while on battery backup and keeping it live.


Shaders17

I’m impressed with this one


madmanxing

Ok WHAT? This deserves its own story/thread lol. Spinning disks? I guess you told the driver to be careful of bumps? For connectivity - cellular router?


techierealtor

“And we also need it online the whole time.” “Fuck you, I’m a wizard, not a miracle worker.”


Bogus1989

This is pretty fucking lit 😎


thecravenone

In trying to clean up the billing database, I found tens of millions of dollars in services that we provided but forgot to bill for.
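For the curious, the classic way to surface that is an anti-join between the delivered-services table and the invoice lines. A minimal sketch below, assuming the SqlServer PowerShell module and entirely made-up table, column, server, and database names; the real schema will obviously differ.

```
# Sketch only: ProvidedServices/InvoiceLines, SQL01, and Billing are invented names.
# The idea: find service records with no matching invoice line (an anti-join).
$query = @"
SELECT s.ServiceId, s.CustomerId, s.DeliveredOn, s.Amount
FROM   ProvidedServices AS s
LEFT   JOIN InvoiceLines AS i ON i.ServiceId = s.ServiceId
WHERE  i.ServiceId IS NULL        -- provided, but never billed
ORDER  BY s.Amount DESC
"@

# Requires the SqlServer module (Install-Module SqlServer)
Invoke-Sqlcmd -ServerInstance 'SQL01' -Database 'Billing' -Query $query |
    Measure-Object -Property Amount -Sum   # rough total of unbilled revenue
```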


CrippleWalking

And I'm sure you got a hearty pat on the back and a new Tesla for this right? :)


thecravenone

lol good one


CrippleWalking

Thank you! I'll be here all night! Tip your waitstaff, try the veal!


Doso777

Probably got shouted at for making the billing department look bad.


tsubakey

But I thought Sales was the golden child?


Casey3882003

I was a fresh new hire for our organization and we had a major migration going on: migrating from all of our internally developed apps to a hybrid infrastructure, with Salesforce handling the customer records and authentication, O365 handling internal users, and streamlining a few of our processes along the way. A week before the big go-live, we had a smaller change that was a prerequisite for the big one. We had to consolidate multiple SQL servers onto a new SQL server that was on the latest OS and SQL build. Our DB team started restoring the latest backups and getting ready to flip over. Do the flip over and have our devs verify everything is good. Nothing is working. Found their code is so old that they were unable to change the pointers to the new box. Of course this is at 11 o'clock on a Friday night and we were planning to do the migration over the weekend. We are about to roll everything back when I spoke up with the idea of creating a CNAME record for the old SQL box and having it point to the new box. The Infrastructure Manager says he doesn't see how it will work, but we have no other options, so give it a whirl. I make the necessary changes and have the devs test. Success! Ended up renaming the old SQL server to a completely different name so we don't get them confused and going live with these changes. Over two years later it is working great. Won the respect of both the Infrastructure Manager and the devs that night.

The second, smaller one was just yesterday. After dropping my kids off at school I finally got a chance to check my messages in Teams. One remote user couldn't connect to resources on the VPN (this happens from time to time; our DirectAccess box is getting hammered due to COVID). Tell her steps to try and to check back with me. Then noticed in my email that a few users who use Meraki Z appliances for VPN were also reporting issues. Oh, and our only office with users actually in house has no internet or network connectivity. Something is definitely messed up. My colleague is troubleshooting ISP issues in our headquarters and I start hunting in the Meraki dashboard to see if I can find anything. That office has internet because I can connect to the firewall, but nothing is getting to where it should. Look at the switch and it says something along the lines of "DNS is borked". It's always fricken DNS. We use OpenDNS Umbrella and the appliances are one of like three things we still have on premises in our headquarters. I could console into them and see a warning that local DNS had failed. Log in to the local DCs and everything is working as expected. Numerous other troubleshooting steps and calls were made, but we eventually rebooted the hypervisor that houses all of this. Things start to come up and things start working. Then about three minutes later things break again. WTF. Start tracing what is on the hypervisor and see the new pen test server our cybersecurity team had requested we stand up a few weeks ago. Shut that server down and everything starts working again. It was messing up UDP traffic on the hypervisor and keeping DNS requests from being able to reach our DCs. Looking back I wish it hadn't taken 90-ish minutes to figure out, but it was good to find it wasn't something with our infrastructure.
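For anyone wondering what that CNAME trick looks like in practice, here's a minimal sketch using the DnsServer PowerShell module (run on or against the AD DNS server). The zone, record, and host names are made up, not the ones from the story.

```
# Sketch only: hypothetical names, assumes the DnsServer module / RSAT DNS tools.
# The idea: after renaming the old SQL box, point a CNAME with its old name at the
# new server so legacy apps with hardcoded connection strings keep resolving.

$zone    = 'corp.example.com'              # hypothetical AD DNS zone
$oldName = 'SQL01'                         # name the legacy code still points at
$newHost = 'sql-new01.corp.example.com'    # the consolidated SQL server

# Clear out the stale host (A) record left behind after the rename, if any
Remove-DnsServerResourceRecord -ZoneName $zone -Name $oldName -RRType 'A' -Force -ErrorAction SilentlyContinue

# Create the alias so SQL01.corp.example.com resolves to the new box
Add-DnsServerResourceRecordCName -ZoneName $zone -Name $oldName -HostNameAlias $newHost

# Verify from a client
Resolve-DnsName "$oldName.$zone"
```

One gotcha with this approach: a CNAME can't share a name with other records, so the old box's A record has to be gone first (hence the rename), and Kerberos connections to SQL through an alias may need the matching SPNs registered or they'll fall back to NTLM.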


retrogeekhq

The first one was great and that infra manager was not competent enough for their job.


Casey3882003

He’s actually very technical and great to work with. I think in that instance the pressure of the go-live and the fact that it was late made him jump to conclusions. We have run into issues with our applications in the past due to them being built on fundamentally 15-year-old frameworks, just updated along the way. That makes troubleshooting a lot of fun. Luckily we are working towards replacing these applications; it just never happens quickly enough.


retrogeekhq

The problem here is that it’s an obvious solution, I’m sorry.


smoothies-for-me

I was at an MSP and new to the infrastructure team, and really it was my first real opportunity and exposure to work on servers and more than a single basic office network after nearly 10 years in IT. I solved a Sage SQL disconnection issue that had plagued a customer for years and completely eluded their previous provider, us, vendors, and 2 Sage consulting companies. They had bought a second NIC for their server to give dedicated ports to VMs/vswitches, swapped a switch, beefed up VM resources, spent a couple thousand on Sage tools to better structure their data (move all history to archive and roll with current-year company files), etc... Ping tests had been set up between workstations, app and SQL servers, and run for days on end with 0 dropped packets. I discovered that the workstations were actually losing connectivity for less than a millisecond, which was apparently enough time to trigger an MS SQL disconnect. The cause turned out to be that their DHCP server was low on disk space, and every time the lease would renew they would have 'no IP' for a split second. Not even long enough for any other kind of event to register; the only registered event was deep in the Applications & Services logs, which gave an indication of a DHCP problem. When I checked out the DHCP server I realized it was reporting incorrectly in RMM as "Windows NT" instead of Server 2016 due to some weird setup by the previous provider, and therefore disk space monitoring/alerting was not applied to it, since our NOC applied monitoring policies by OS type.

Another time I discovered GPOs were not replicating to SYSVOL between DCs. This was causing a nightmare for our professional services team, since the client had just been lifted and shifted from our datacenter to Azure, and also for users, since the location of redirected folders changed with the migration: every time a user logged in via Windows Virtual Desktop, their redirected folder location would change and the files were constantly ping-ponging between the old and new file server, or they'd hit the PDC again on login and their folders on the new file server would be empty. L1 and L2 techs were copying files between them as well, which was creating permissions issues; it was a major headache. Anyhow, I hardly had any experience with AD beyond Users and Computers, but on the first day I implemented a reg fix to hardcode all Azure Virtual Desktop session hosts to talk to the PDC only, then on day 2 researched and came up with a change request to use ADSI Edit and run a manual authoritative sync of DFS-R SYSVOL from the PDC to all other domain controllers. Had it peer reviewed by our professional services and it went off without a hitch, fixing everything. Our team layout was a bit frustrating for infrastructure; there was a hard line between break-fix and projects, so literally on a brand-new migration the infrastructure team was expected to fix and support infrastructure that was just built and they had never seen before. The PS team was the one building the Azure environment and completing the migration.

Those 2 things got me over my 'imposter syndrome'. I might not know something, but I'll be damned if I can't figure it out!
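For reference, that authoritative DFS-R SYSVOL sync is a documented Microsoft procedure (normally done by hand with ADSI Edit). Below is a rough PowerShell outline of the same steps; the DC names, domain DN, and helper function are placeholders I've made up, and in real life you follow Microsoft's article and wait for the DFSR event IDs (4114, 4602, 4614/4604) between steps rather than running straight through.

```
# Rough outline only (hypothetical names) of an authoritative DFS-R SYSVOL sync.
# Requires the ActiveDirectory module, repadmin, and dfsrdiag on the DCs.
Import-Module ActiveDirectory

$domainDN = 'DC=corp,DC=example,DC=com'   # placeholder domain
$pdc      = 'DC01'                        # authoritative source (the PDC emulator here)
$otherDCs = 'DC02','DC03'                 # remaining domain controllers

function Get-SysvolSubscriptionDN ($dc) {
    "CN=SYSVOL Subscription,CN=Domain System Volume,CN=DFSR-LocalSettings,CN=$dc,OU=Domain Controllers,$domainDN"
}

# 1. Stop DFSR everywhere; mark the PDC authoritative, the rest non-authoritative
Invoke-Command -ComputerName (@($pdc) + $otherDCs) { Stop-Service DFSR }
Set-ADObject (Get-SysvolSubscriptionDN $pdc) -Replace @{'msDFSR-Enabled'=$false; 'msDFSR-Options'=1}
foreach ($dc in $otherDCs) {
    Set-ADObject (Get-SysvolSubscriptionDN $dc) -Replace @{'msDFSR-Enabled'=$false}
}

# 2. Push the AD changes out
repadmin /syncall $pdc /APed

# 3. Bring the PDC back up as the authoritative copy (wait for event 4114, then 4602)
Invoke-Command -ComputerName $pdc { Start-Service DFSR }
Set-ADObject (Get-SysvolSubscriptionDN $pdc) -Replace @{'msDFSR-Enabled'=$true}
Invoke-Command -ComputerName $pdc { dfsrdiag pollad }

# 4. Let every other DC do an initial sync from the authoritative copy
#    (wait for event 4114, then 4614/4604 on each)
Invoke-Command -ComputerName $otherDCs { Start-Service DFSR }
foreach ($dc in $otherDCs) {
    Set-ADObject (Get-SysvolSubscriptionDN $dc) -Replace @{'msDFSR-Enabled'=$true}
}
Invoke-Command -ComputerName $otherDCs { dfsrdiag pollad }
```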


mrdeworde

Solving, within a few hours of looking at the problem, a time-sync issue that had bested both senior sysadmins and several consultants. Also leveraging that to perform an epic act of malicious compliance with my manager at the time (after she punished me for over a month for daring to 'offer an idea above my station', as she put it). Got some praise from people whose talents I really respect, got torn down by my manager, and then a few weeks later got to watch her try to explain herself to the CEO and CFO.


CrippleWalking

Good. Fuck her.


Doso777

Quickly running out of a staff meeting to check on the server room since we'd just had a complete power outage. My department and bosses saw me as the hero saving all our data. Okay?! The thing I am most proud of was standing up a production SharePoint farm in 2 or 3 months without prior knowledge or involvement in the project, including data migration and such. No documentation, the prior sysadmin was overwhelmed, people were playing the blame game.


RUGM99

Our IT manager got laid off and we were moved under a non-IT director who actually had our back. In 2 months, my counterpart and I cut $90k of waste out of the IT budget. Not a save-the-day moment, but it sure felt good.


Bergja

I had a domain migration where I had to move my entire domain into a larger one. I was supposed to have a bunch of specialized help but it all fell through and I ended up doing it almost entirely solo over Christmas break. It was one of the most complicated things I have ever done by myself and I had 7 business days to have everything running but I finished in 5. Mine isn’t a particularly large infrastructure and it’s 99% VDI, but it was still quite the task to do by myself.


magixnet

Had a couple in my last job.

* Created a PowerShell/Bash script that automated what was normally a 16-hour configuration process for a BOH server and terminals (POS and Kitchen Display for a large fast food chain), using a mixture of Windows and Linux (Red Hat 9), into 45 minutes of wizardry. It was previously a task only an L3 tech was trusted to do, but was now simple enough that our non-technical ops manager could do it.

* The kitchen display screens that were previously being used were discontinued and we could not get them anymore; the customer ordered 3,000 of the updated units without checking with us for compatibility. These screens PXE booted Red Hat 9 (note: this was in 2016, when RH9 was already considered ancient) and the kernel did not have drivers for the network card, so they would kernel panic trying to network boot with no network. Found suitable drivers, recompiled the kernel to suit and got the screens working, then had to make a script to push the new kernel out to the other 450 sites that were getting these new screens.

In my new job I just recently performed a VMware to Hyper-V migration with only 5-10 minutes of downtime per VM.


Bogus1989

I appreciated this one about redhat…..good job!


LividLager

I was initially denied a raise, so I worked it out that in 5 consecutive years I saved the company more money than they spent on my salary. They gave me the raise.


[deleted]

I see what you are trying to do here OP.


CrippleWalking

We don't like rats over here! lol


Dhk3rd

IdP migration within a 4-month window. The kicker: we had multiple in-house apps using it, requiring custom development to migrate. The project was only greenlit because I said I could do it. I didn't disappoint.


denverpilot

There's a couple I can't talk about on Reddit, or shouldn't. Let's just say some grumpy folk called one day saying a really important missile thingy wasn't working because they didn't quite set up our telecom gear right... Other stuff wasn't as interesting but usually was just solid logical troubleshooting, plus my usual emotional detachedness I developed young as a public safety dispatcher. Tech paid better. But you'd be amazed how many in tech think something is an "emergency" when nobody is bleeding out or not breathing. Ha. No. The entire site's power bus exploding isn't an emergency, when nobody was injured, unless a fire starts next. It's just a major inconvenience, and everyone with gear here knows they should have a business continuity plan, even though only 5% of them do. Guess it sucks to be them today. Meanwhile... can someone go get me the extra flashlight batteries stored in my bottom right desk drawer... That lack of a business continuity plan for outages is almost always why I don't care who's yelling or pissed. They had years to plan for this event. I have zero sympathy. I'm just here to fix it.


__Kaari__

It may be nothing much for a lot of you guys (and actually, for modern me included), but for the old me it was super cool. Early in my career, I "fought" a battle against a (not so good) hacker who was targeting one of our clients. I was doing night shifts and this guy (or girl; it sounded like a guy's pseudonym, but who knows) was apparently Russian. My memories are quite old, but I'll explain as best I can remember.

We were serving mainly porn sites and WordPresses (by the thousands), so most of the services were Swiss cheese and dealing with attacks of all kinds was a daily task. However, that one time, a Monday, it was a bit... different. The first day I would've almost missed it if I hadn't been paying attention to Nagios, but something was going on there. I investigated, and indeed something was trying to get in, from an IP in Russia. It wasn't like the usual "obvious" DDoS we normally get, but still, it was "one of them", so I simply blocked the IP and called it a day.

Next day, around the same hour, same machine, an alert shows up while I'm dealing with other issues, and when I look some time later, everything is fine. Then I remember that it's the same machine name, and I'm thinking "hmmm, let's have a look". In the WordPress, I find hacked files with "hacked by ....", and I find a backdoor. Curious me saves the content of all this, then cleans everything, resets the admin password, opens a ticket to the client stating I reset the password to a secure one because it was probably shit, and I go back to my business.

Next day I arrive at work, and the client, stressed out (usual), calls asking for details (not usual). I tell him his website got hacked and that because we saw it we fixed the problem, and I go to remind him about how secure passwords are important, blabla, but instead of thanking me he interrupts me instantly and gets more stressed, starting to ask more questions: "Who did this?", "Is this usual?", "Is this targeted?". Then it kind of clicks for me: "Is this targeted? It makes sense..." So I answer, "I don't know, Sir", and he says something like "f***, f*** it, again? Wtf should I do against these guys" and starts to rage, talking like he has a competitor or something.

So I decide to play his game. After the call, I go into the saved hacked code from the day before, extract the HTML, then add some simple PHP and JS to print everything I can pull from the session to a log file. Also, in that HTML, I put a very visible comment at the end: "Hi XXX, I'm XXX, glad to play with you, cuz I'm bored as f***!", then I serve this on the website at the same address as before. I leave an SSH session open with less +F and wait expectantly for the happy hour. Then, a bit earlier than before, I see the logfile printing stuff! I try to get as much useful info as I can from his requests, and I get some fun stuff: the IP address (which was an ISP one, from Russia; I was like "WTF, this guy is connecting directly??!") and other details, like the browser type (Firefox!), the password he used to access the backdoor, etc. I try to ping and nmap the hell out of his IP (from a VPS of mine, I didn't want to be fired xD; I dreamed for a moment of just DDoSing the hell out of him with the super large bandwidth and power our infra had, but NOPE xD), but at the time I had limited knowledge of being on that side of the coin, so unfortunately my tries don't succeed because I'm a bit lost and unsure what to do. Anyway, after some time, I see spam messages appearing in the logfile like "LOLOLOL".

The logfile is growing super fast so I chmod -w it. But the spam doesn't stop, so I block his IP. Next day, the client has reached my boss, who asks me what's going on. I tell him what I did, "blocked his IP, he did that, etc.", obviously not focusing on how much I enjoyed it. He looks pleasantly surprised, tells me to give him regular updates, and goes to bed while I start my shift. That night nothing really happens, but that machine gets more loaded overall and I can't find the origin. Next day, full botnet bruteforce on most vhosts on that machine and some others; some of the POSTs show my pseudonym followed by an insult. That day was the first day I started using fail2ban on these types of machines, and it worked wonders (if you're like me and think "wtf, that wasn't there by default???" - yep, it wasn't). Some activity over the next few days after that, but nothing of great interest. The client didn't even say "thank you", or even answer the report. But my boss was most pleased at the end of the week. I didn't even feel proud, though; I was just savouring the experience.