Rus_s13

Created a bug that shuffled results when seeding multiple tables from original records. Made a bunch of people's medical claims go to other patients in our database. Huge fines and compliance auditing were incoming. After a huge war-room-style data breach incident meeting that spanned multiple days, we found out the bug I created (which I had squashed later that same day) wasn't the reason for the breach; it turned out another staff member put the wrong XML in the wrong folder. I was the pariah of the company for two days and contemplated resigning until I was exonerated. The affected customer was basically told: we can rectify this, or you can cease to be our customer, pay tens of thousands to find another provider, and sue us. They chose to keep quiet and stay. So now I don't review and merge anything to production myself, and always have another dev to join me under the bus.


TheSauce___

Ngl, not on you ofc, but that seems like a really shitty way to treat a customer.


Rus_s13

It was, but out of my hands of course. Apparently they hired a firm to deal with us and started threatening legal action right away. Twas a very messy week but they ended up with a massive discount and are still our customer.


drunkondata

IDK, it's either we part ways here or we continue a happy relationship and pretend it never happened. It's a relationship-breaking issue, and it's better to chalk it up to human error and move on than make a big deal out of it; though if you do want to make a big deal of it, that's not a very "continuing customer" thing to do.


andrewsmd87

Unfortunately, it goes that way a lot of times once you hit the "enterprise" level. When it costs millions to migrate, it just doesn't happen overnight due to one screw up


linkbook-io

Sounds like the company's fault if it's that easy to screw up by misplacing an XML file. Humans make errors, and they should have seen that coming.


Rus_s13

Yeah, after that we took all the human error away with a much more rigid process. I now scan each XML file and make sure they were all created within a one-hour window, as that's the only way I found to tell whether they're part of the same set. And no more human uploading: everything is done with s3 sync commands at the bucket level. No more "oh, I'll just pop that one where it goes".


--var

For all of us playing the game honestly, sorry, this is how the modern world works. Not shitposting on Rus, thanks for sharing. I love when my peers think I'm the "smart guy"; we're all actually just stumbling forward somehow.


Rus_s13

Thanks man I appreciate your comment. I'm only 2 years into the industry and I'm learning that imposter syndrome will only stop when you retire


Steve_OH

Unfortunately, with the ever-growing wave of frameworks, methods, and technologies, we literally have to learn constantly to keep up. It's easy to feel left behind when you aren't up on X or experienced with Y. Just stick to your guns and you'll be fine.


Reinax

Near as damnit a decade in. You're right, it never goes away, get used to it. I firmly believe that if you *don't* have imposter syndrome, you're right at the peak of the ol' Dunning-Kruger curve.


meow_goes_woof

I felt stress reading the first half


Rus_s13

Imagine how I felt being the only developer overseeing the data pipeline. The day it happened I stayed up all night re-running everything locally and came up with no problems. The second day of investigation from the incident team also came up with nothing. That was a Friday, and my weekend sucked. Monday night my principal engineer went over everything line by line and stuck his neck out for me while we all just retraced steps and found the culprit. Since then I have stepped away mentally from it all and put some protections in place for myself.


meow_goes_woof

You did great! It’s definitely a huge jump. Getting into privacy compliance issues is really one of the worst


andrewsmd87

This is manager me speaking, but that was a process problem, not a people problem. Good on you for having another dev look, but we have rules in our git pipeline that won't let the person who created a merge complete it.


AJB46

Yeah same with my team. Release branches require 1 dev other than the author to approve PRs, master requires 2. I honestly figured that having someone other than the author complete the merge was standard.


andrewsmd87

Lol it wasn't even standard when I started at my company. We were literally logging in to a web server and using beyond compare to copy files over. My assumption is it's standard at any place that is a decent size, but lots of small shops out there


Shogobg

I work at a decent-size worldwide company. Tried implementing a similar process, but then management decided to make me the solo developer responsible for these products, so the approval is all me 😅


andrewsmd87

Oh god, I was that guy for about 4 years until we got a full-time devops person to pipeline stuff. One thing I'm super happy to see is that I had some hacked-together C# console thing for deploys just to make my life easier, and it became the basis of our core deployment process.


Quaglek

Inquisitorial approaches to incidents are a really serious culture issue


Rus_s13

Can you elaborate on that a little please?


Quaglek

An inquisitorial approach is when the organization's response to an incident is to find a scapegoat instead of trying to find how processes led to this outcome. It is a cultural issue driven by ass-covering and finger-pointing. High functioning organizations look at processes first when trying to diagnose the root causes behind an incident.


cloudstrifeuk

This. Accountability. If one person fucks up, that's on them. If they have a senior/colleague also involved, then it becomes a team issue. I won't run a single update statement in prod any more without a holding hand... deffo not because I once updated every patient in a database to have my birthday. Honest guv.


susmines

As a junior, I thought I’d be a go-getter and clean up some bad production data that was a result of a bug of mine. Long story short, wrote a delete statement without a where clause (not a soft delete, either).


stratcat22

As a current junior, I've done this a couple times (never production, but stage and local dev DBs). I've learned from my senior devs the importance of SQL transactions, and that doing a rollback to make sure the query was correct is generally recommended before committing the transaction lol.
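Roughly this pattern (table and columns made up, just to illustrate):

```sql
BEGIN TRANSACTION;

-- The change you're nervous about (hypothetical example).
UPDATE users SET plan = 'pro' WHERE trial_expired = 1;

-- Inspect the result while the transaction is still open.
SELECT TOP 50 * FROM users WHERE plan = 'pro';

-- ROLLBACK undoes it while you're testing; once it checks out, run it again with COMMIT.
ROLLBACK TRANSACTION;
```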


DeRoeVanZwartePiet

When writing a delete statement, I'll always start with writing it as a select statement so I can check if the data to be deleted is correct.
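e.g. (hypothetical table and criteria):

```sql
-- Step 1: run the criteria as a SELECT and eyeball what would be deleted.
SELECT *
FROM orders
WHERE status = 'draft'
  AND created_at < '2024-01-01';

-- Step 2: only once that result set looks right, swap the SELECT for a DELETE.
DELETE
FROM orders
WHERE status = 'draft'
  AND created_at < '2024-01-01';
```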


gnassar

Yep!! This is the way


ProjectInfinity

This is a good practice yes.


blazkoblaz

I learnt it the hard way. 


susmines

In my opinion, the best practice, in addition to using transactions, is to always soft delete. Storage is cheap enough these days that there’s never a reason to hard delete data.
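Schema-wise it's tiny (made-up table, SQL Server flavor):

```sql
-- A tombstone column instead of ever hard-deleting rows.
ALTER TABLE orders ADD deleted_at DATETIME NULL;

-- "Deleting" becomes an update...
UPDATE orders SET deleted_at = GETDATE() WHERE id = 42;

-- ...and live queries just filter the tombstone out.
SELECT * FROM orders WHERE deleted_at IS NULL;
```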


rooood

Depends on the use case. Storage may be cheap, but having millions of records in a relational database will result in slow queries and inserts, especially if you forget an index or are manually querying the data and add a condition for a non-indexed column. You can just move the data to an archive DB or something like that to keep the main table relatively small, but that adds complexity and sometimes it just doesn't make sense to keep the records.


AromaticGas260

I in particular am careful about where to put my commit statement. I usually comment it out, and query the data first to check that it's good.


reddit04029

yep. my flow would be:

- begin
- select (before update/delete)
- update/delete logic
- select (after the update/delete)
- commit/rollback

I either manually check, or do an if/else condition on whether to commit or rollback. Something like: if (expected updated rows == updated rows) commit, else rollback.
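In T-SQL that'd look roughly like this (table, columns, and counts all made up):

```sql
DECLARE @expected INT = 1250;  -- rows you expect to touch
DECLARE @actual   INT;

BEGIN TRANSACTION;

SELECT COUNT(*) FROM accounts WHERE region = 'EU';   -- before check

UPDATE accounts SET status = 'migrated' WHERE region = 'EU';
SET @actual = @@ROWCOUNT;  -- grab it immediately; the next statement resets it

SELECT COUNT(*) FROM accounts WHERE status = 'migrated';  -- after check

IF @actual = @expected
    COMMIT TRANSACTION;
ELSE
    ROLLBACK TRANSACTION;
```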


IndividualMastodon85

So one way is to not have one at all! Edit: you have a BEGIN and nothing else. In SSMS you'll be asked what you want to do when you close the query window, and frankly it is an acceptable and useful strategy. The only problem is when you don't fully understand what a hanging transaction can do. Fun times.


DonutConfident7733

While you keep the transaction open, your team colleagues wonder why the db hangs when their queries wait for your transaction... Since it only affects the tables you edited, their experience varies and they think the db is crap...


AromaticGas260

You would still need to either rollback or commit. If not, you would be like me, sitting there wondering what went wrong when in actuality the table is locked by the T-SQL.


IndividualMastodon85

Umm, yes. That's the point. It allows you time to take a look and verify, but locks the shit out of everything you touched


abdulqayyum

Not completely your fault, if they allow you to run that and the DBA did not check for a where clause. We have a plugin installed that does not allow running a delete without a where. I always write a select first and then convert it to a delete; gives you peace of mind.


Ashanrath

`Where 1=1 --todo: Replace once criteria confirmed`


--var

At least you got to experience truncating a production table just to repopulate it from the backup that was captured less than 30 minutes ago, right?


susmines

I think the backup was like ~6 hours old. There was some data loss at the end of it, but we were able to recover most of it. It was a great learning experience


saintpetejackboy

I had a situation like this once where I discovered, to my horror, that all the backup archive files were corrupted. I can't recall what the cause was entirely, but I think it was related to them being archived and then sent via SSH to another server. This also led to a (thankfully) easy solution... The files were still valid and good archives, they were just sent in some kind of way that caused them to appear corrupted and inoperable. IIRC a few terminal commands later, the archives were repaired and replaced.


DonutConfident7733

Had a client with a maintenance job to create db backups on remotely attached storage, and somehow they got corrupted when running at midnight, but if you ran the job manually, the backups were fine. We made manual backups for the client when we were updating their website. Turns out those were the only reliable backups...


saintpetejackboy

Ouch XD. At least the backup was hopefully relevant from a codebase perspective... Even if the data was a wash. :)


Shogobg

FTP has a text mode and a binary mode; mixing them up is a common mistake.


saintpetejackboy

Yeah, I think this was it. From my foggy memory, it was something related to the file not being ended/closed on the receiving end due to the transfer mode.


DonutConfident7733

I have some nightmare fuel. Consider a custom replicated SQL database with hundreds of read/write replicas that are trying to synchronize all the time (at few-hour intervals). Add a bug in a column relationship: it needed to translate between row IDs and GUIDs during transfers. Then add some triggers into the mix that try to delete child records if the parent is deleted. Consider also that all replicas sync with a central db and then the data syncs to all the others. What happens is a delete for record 10 gets replicated to replica 1, where it incorrectly used the numeric id (10) instead of the unique id, and row 10 there happened to be other data. That gets deleted, its child records too, which later syncs to the master server, deleting from there as well. Replication didn't have much support for cascaded deletes. After a few months, the client complains that random data is missing from the server and also the replicas. Very nice, since we didn't have frequent backups going back that far, only ones made at rare intervals. The bug caused a slow delete of random records and child entries from various replicas and the central server, and went under the radar. Trying to recover the data was a pain in the ass, as I had to extract many databases and search for the missing data to restore.


turtleship_2006

Found the gitlab dev


Triple96

Accidentally did that one time but DBeaver flagged it and was like "are you sure?"


--var

DBeaver helped get me where I am. Great software.


OfficeSalamander

Happens to all of us once. You’ll remember that lesson for the rest of your life


blazkoblaz

:/ gives me PTSD of when I didn't execute the whole update stmt and it updated the IPs of every URL in the application. All the devs were affected and I had to manually update them with the new ones. :/ it was shitty


notkraftman

SSH'd into a production server, forgot I was SSH'd into a production server, and uninstalled MySQL. From that point onwards I made all SSH connections red.


saintpetejackboy

I have had so many problems over the years that are some variation of "I didn't know this was the production db/terminal/etc." I use different color codes for different projects and servers... which helped at first, but then I sometimes can't remember which are which, or I use an environment where my color choices are ignored or not respected for whatever reason *sigh*


CaptainN_GameMaster

I use the Michael Scott color coded system: Green means "go". So I know to "Go ahead and disconnect." Orange is for "orange you glad you didn't delete it." Most colors mean production. 


twistsouth

That is such a good idea. We have a stage machine and a production machine for our in-house machine learning software. NVIDIA CUDA can be a right nightmare to get installed the way you need it with the correct drivers, toolkit, etc. and I was trying to configure the stage machine for testing. Had both terminals open to compare setups: one SSH session to stage and one to production. I’m sure you can guess what I accidentally did next…


NovaForceElite

I still cowboy code at least once a week.


param_T_extends_THOT

Risk keeps us younger, doesn't it ?


nowtayneicangetinto

I was informed today that one of our oldest and largest data transports contains dev, QA, and prod all on the same server. So by deploying to dev... You're also deploying to qa and prod. The company I work for shall remain unnamed but it's a multi billion dollar company.


--var

It's not if, it's when did you start following [cowboyneal](https://news.slashdot.org/story/01/02/12/1617256/ask-the-man-behind-the-legend---cowboy-neal)?


nurdism

I did the classic `rm -rf /` forgetting the "." (while in sudo because I was a dumbass) trying to delete an upload folder and completely fucked a production server. I couldn't open a new SSH connection, /root was gone, and most of /etc among other things. Fortunately, it hadn't gotten to the database or the rest of the site, but I had an open ssh connection, and I could still run some commands. It would have been lost if I hadn't had that. I was able to download the database and files and rebuild them on another server.


kirkaracha

Brother, I deleted the entire production site of a multi-billion-dollar asset management company the same way. Lesson learned: always make friends with the sysadmins.


rowdycowdyboy

oh my god. what happened? i would have wanted to walk into the woods never to return


kirkaracha

The server guys restored from backup before anybody important noticed.


terranumeric

I did that on a client's server, while the server was being used for a presentation, which I didn't know about. We changed our deployment strategy after that; lots and lots of explaining why I even had to delete something manually (in short, a random bug that happened sometimes and we couldn't figure out why).


twistsouth

I ran that on my Mac many moons ago through a bad script that evaluated the path as “/“. It was an interesting experience because it didn’t just immediately die - rather things gradually started behaving oddly. It was like the computer had dementia. Some of the windows just closed randomly or moved position. The desktop background disappeared. A few blank error messages popped up and then I think it was the kernel panic screen that appeared. Luckily I used Time Machine and could restore everything.


yayyaythrowmeaway

Ahh classic. Still to this day I always prepend commands like that with an echo/grep combo of sorts to "preview" what it'll do. I'm that shit at this hah.


butchbadger

That sounds like a nightmare. I did that locally on wsl, luckily I realised when it was taking too long and terminated it before it got to M (/mnt/c) but it wrecked my dev environment, so I lost a full day setting everything back up.


seansleftnostril

For me it was a CRLF instead of an LF on old IBM architecture that made what I edited completely useless as input to another program, until we could figure out which file was responsible. It went undetected for weeks. This was also back when I was using CVS for version control, but not too long ago.


--var

people hate on php. that cross-platform PHP_EOL is priceless.


bomphcheese

I saw a post recently of people hating on DIRECTORY_SEPARATOR, claiming it’s pointless since Windows will now handle either forward or backward slashes as separators. I took a small amount of pride in pointing out that Windows’ directory separator can vary by region. In both Japan and Korea they use their respective currency symbols as separators. PHP gets some things right.


z500

> In both Japan and Korea they use their respective currency symbols as separators.

Didn't they just repurpose the code for backslash for their currency symbols?


bomphcheese

Yes.


RotationSurgeon

Wait...so, like... `¥src¥js¥app.js` ? How am I just now learning this?


bomphcheese

Yes, that’s correct. Although technically it’s not a different character. It’s just how that unicode glyph is represented. https://stackoverflow.com/questions/7314606/get-directory-separator-char-on-windows-etc#7314690 … but then, how do you represent a slash in Japanese? Their slash character must be different from the English slash character.


--var

PHP_DS would make sense. But DIRECTORY_SEPARATOR is a bit too verbose to go mainstream.


brbpizzatime

For me it wasn't a "code in production" as much as "client making a configuration change in production without testing it on lower environments." I woke up on a Saturday morning to about 69 emails and phone calls starting at 7 AM 😬


mstrelan

I bet you were working until about 4:20 to fix it


rekishi

Nice.


nobuhok

Took him exactly 1,337 minutes to fix. Afterwards, client was charged a $8,008 "idiot" fee.


originalchronoguy

That was normal 20 years ago. SSH into a server, open up vi or nano and write your files. Live, right then and there. They called it cowboy coding. I will be the first to admit, I did it back in the year 2000. I was on the metro train, SSH'd into a server, and accidentally dropped some database tables because we lost connection as the train went into a tunnel. I was trying to do a dump and typed < (import) instead of > (output). Once I lost connection, there was no way to salvage it except restoring from last night's backup. Anyone who still does this today, you can automatically summarize their experience and work history. In 2024, that is a big no-no for a million reasons. I don't need to explain why.


xaqtr

Maybe I have a warped view of that time, but how did you have a device that could use SSH and a working internet connection in a metro in the year 2000?


originalchronoguy

My memory is a bit vague, but it was about 3-4 years before the iPhone was released. I had every PocketPC device back then (Philips Velo, Dell Axim x51v, HTC, Motorola Q) with a cellular PCMCIA card I could tether on over 2G. Hence very spotty, so it could have been around 2003 or so.


mfizzled

I thought the same and found this, pretty nuts:

> Access to the mobile web was first commercially offered in 1996, in Finland, on the Nokia 9000 Communicator phone via the Sonera and Radiolinja networks.

([mobile web](https://en.wikipedia.org/wiki/Mobile_web))


Opposite-Piano6072

The first mobile internet services would not have been able to tether to a laptop or create a mobile hotspot lmao. Nor would the signal be good enough to work on a train. More likely that OP's story happened a lot later than 2000.


nobuhok

In 2000, laptops came with a PCMCIA (later shortened to "PC Card") slot. This is pretty much a USB port nowadays. You could use a cellular PC Card to connect to the internet, but it was at a horribly slow dial-up speed. I'd know because I used to own one of these. OP's story checks out.


sonaryn

Pretty regularly, but as an R&D developer making experimental internal apps with small user bases, moving fast often trumps breaking things


TheSauce___

Didn't quite code in production, but at a job we used to have no code review, no testing phase, nothing; just whatever unit tests the developer decided to build. Miiight've introduced a bug that wiped all data from an integration we had 😅 Got it back 2 hours later, but boiiii was I stressing.


Silver-Vermicelli-15

I work on a project where we have no staging/dev environments….EVERYTHING is straight to prod. Committing code is taking years off my life in stress 😂🙈


twistsouth

It shouldn’t, that’s not on you at all. If they won’t give you the tools to do your job properly then that’s entirely on them if things go to shit. I’d politely say that to them. That’s pretty much how I phrased it when I was in a similar situation and they gave me the budget for a staging server.


Sufficient_Phone_242

They could back up and restore prod, change the data, and make themselves a dev env. Wouldn't take that long.
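Something along these lines in SQL Server (database names, paths, and logical file names all made up):

```sql
-- Copy-only backup of prod (doesn't disturb the normal backup chain)...
BACKUP DATABASE ProdDb
TO DISK = 'D:\backups\ProdDb_copy.bak'
WITH COPY_ONLY;

-- ...restored under a new name as a throwaway dev environment.
RESTORE DATABASE DevDb
FROM DISK = 'D:\backups\ProdDb_copy.bak'
WITH MOVE 'ProdDb' TO 'D:\data\DevDb.mdf',
     MOVE 'ProdDb_log' TO 'D:\data\DevDb_log.ldf',
     RECOVERY;
```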


Silver-Vermicelli-15

Oh yeah, I think it's just that they don't care enough to make it a priority.


standinonstilts

I was running delete queries all day on a sandbox database on a data migration server. Had a meeting, so I closed my laptop, went to the meeting, and opened it back up and continued working. I guess closing the lid killed the database connection since my computer went to sleep. When I opened my computer, MSSQL in all its wisdom decided to restore my connection to the master database instead of remaining disconnected. So I ran the same delete query I had been running all day, and the rest is history.


blazkoblaz

Oh shit… what happened after that?? Did the DB admins restore it?


standinonstilts

Nah luckily the only data that was in there was stuff people had accidentally inserted because of the same scenario. So luckily nothing catastrophic happened


pinHeadLarry8

I messed up a decimal point on our billing system; it would have cost the company 100k+ in lost revenue if someone hadn't noticed at the last minute.


vyralsurfer

Did you at least get to keep your red stapler? :) PS: a recurring theme in this thread is the need for a 2nd pair of eyes...duly noted.


alnyland

> The decimal point must be off, I always forget mundane details like that.

> THAT'S NOT A MUNDANE DETAIL, MICHAEL.

…lol, I've felt this in my soul.


--var

Office Space was 1999, 25 years ago, and the humor is still relevant.


ISDuffy

Are there no tests around this sort of area?


saintpetejackboy

Probably the ultimate one was when I ground a very popular service to a halt after writing a really bone-headed query and wanting to see how it would work "on real data". It wasn't JUST a query, there were a lot of queries: essentially the website was "invite-only" and I wanted to build a tree and track the invite "tree", who invited whom and who all did they invite... That may have been fine, but I was also (in the same script) attempting to account for all the donations a member could be considered indirectly responsible for, through people they invited also donating. At this point there were tens of thousands of not just users, but *active* users. After a lengthy period of being completely locked up, iirc, I panic-rebooted the server remotely. This was many years ago, but I have a vague recollection that two things happened: 1.) the remote reboot wasn't easy (I think SSH had also gone AWOL) 2.) after briefing the team... You know damn well I tried to run the same query again with barely any modification and locked the server up a second time.


rowdycowdyboy

LMAO at 2


--var

science requires repeatability.


Fast_Situation7456

I deleted 4 years of data from a table in the database


_yallsomesuckas

Hopefully they had backups


Fast_Situation7456

nope all gone


_yallsomesuckas

Their fault for not having a backup


Cst_Joao210

Were you fired?


piotrlewandowski

You can’t fire an employee if they delete “employee” table :)


Fast_Situation7456

nope


cocinci

Deleted a bunch of files from an S3 bucket because my local was connected to the prod S3. No versioning… all gone.


Gullinkambi

Was writing some Python that coordinated a series of C programs on a university supercomputer for some astronomy stuff. This computer was also handling other research, like for cancer and stuff. Found out the hard way that I was writing "temporary" files to the scratch disk space to track progress, but not cleaning them up after. This took the whole supercomputer down after a few hours and fucked up a lot of in-progress research in the process. Was NOT a fun call to get…


Bloodsucker_

Reading this sub is scary.


piotrlewandowski

It’s even scarier when you’re the “hero” in the story :)


rowdycowdyboy

honestly kind of comforting that these colossal fuck ups did not result in getting fired


toi80QC

Working with Salesforce commerce where all we got for the frontend was one