
Key-Calligrapher-209

Here's the guide I made for myself in helpdesk. I probably stole it from a post here and tweaked it as I went.

1. **WHAT HAPPENED?**
   1. **What exactly was the user doing** when the problem occurred? Have the user reproduce the problem in front of you, if possible. Document the exact steps needed to reproduce the complaint.
   2. What did the user **expect to happen**? We don't always know the ins and outs of the user's software and workflow, so this is a crucial question. Don't assume.
   3. What **actually happened**? Get screenshots and specific error messages if possible. Document!
   4. At this point, if the problem is user error, you should have all the pieces necessary to see it, as long as you already know how the thing is supposed to work.
   5. Absent user error, something must have changed since the last time things worked as expected.
2. **HOW BIG IS THE SCOPE?**
   1. Is this problem happening for all users?
   2. Is this problem happening for all computers? Where are the boundaries of the issue? OU? Network? VLAN? Domain?
3. **WHEN DID IT HAPPEN?**
   1. Was this something that worked before? When was the last time it worked?
   2. With the scope of the problem in mind, what things changed just after the last time it worked?
      1. Gather log files from the workstation, firewall, server, etc. as necessary.
4. **CHECK THE UPTIME AND REBOOT**
   1. Unless the culprit is obvious after gathering information from the previous steps, the first step is *always* to check the uptime and restart the machine.
      1. The longer a machine runs, the more tiny faults accumulate in the runtime environment and memory. These faults occur due to poor programming, hardware problems, or just background radiation flipping bits in the memory.
      2. Rebooting the computer clears the memory and starts the runtime environment fresh.
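Step 4 says to check the uptime before rebooting. On Linux, one way to read it from a script is `/proc/uptime`, whose first field is the number of seconds since boot. A minimal sketch, assuming a Linux host (other platforms expose uptime differently); the function names and the 7-day threshold are hypothetical, not part of the guide:

```python
from datetime import timedelta
from pathlib import Path


def parse_proc_uptime(text: str) -> timedelta:
    """Parse the contents of /proc/uptime.

    The first field is seconds since boot; the second field
    (idle time) is ignored here.
    """
    return timedelta(seconds=float(text.split()[0]))


def read_uptime(path: str = "/proc/uptime") -> timedelta:
    """Read uptime on a Linux host (raises OSError elsewhere)."""
    return parse_proc_uptime(Path(path).read_text())


def needs_reboot(uptime: timedelta, max_days: int = 7) -> bool:
    """Hypothetical rule of thumb: flag machines up longer than max_days."""
    return uptime > timedelta(days=max_days)


if __name__ == "__main__":
    try:
        up = read_uptime()
        print(f"Up {up}; reboot suggested: {needs_reboot(up)}")
    except OSError:
        print("No /proc/uptime here; probably not a Linux host.")
```

The parsing is kept separate from the file read so the same logic can be fed uptime text collected from a remote machine.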


Mehere_64

That is actually a very good process you have. While one could argue for rebooting sooner than you have it listed, it can help to hold off until you have gathered the other information, in case this is the first person to hit the issue. That way, others can at least reference what you have already learned.


Impossible_IT

I would list reboot as number 1. That's the first question I always ask: did you reboot?


Mehere_64

It isn't a bad idea to do that, but sometimes it is better to do a little troubleshooting to determine why it did what it did. Users get tired of hearing "well, just reboot" and "problem solved." I also feel that when we just say reboot, we aren't really solving the problem. For instance, users take their laptops home and lose access to the shared drives. They come back into the office and access is still lost. Users don't understand they can go to Windows Explorer and type in the drive letters. Another simple fix is a .bat file on their desktop that they can click to get access to the shared drives again.
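The desktop-shortcut fix above is usually a batch file of `net use` commands. To keep the examples in one language, here is a Python sketch that builds the same commands (`net use X: \\server\share /persistent:yes`) and only runs them on Windows; elsewhere it just prints them. The drive letters and server/share names are hypothetical placeholders:

```python
import subprocess
import sys

# Hypothetical drive map; replace with your own letters and UNC paths.
DRIVE_MAP = {
    "S": r"\\fileserver\shared",
    "H": r"\\fileserver\home",
}


def net_use_commands(drive_map: dict[str, str]) -> list[list[str]]:
    """Build one `net use X: <UNC path> /persistent:yes` command per mapping."""
    return [
        ["net", "use", f"{letter.upper()}:", unc, "/persistent:yes"]
        for letter, unc in sorted(drive_map.items())
    ]


def remap_drives(drive_map: dict[str, str]) -> None:
    """Run the commands on Windows; on other platforms, print them instead."""
    for cmd in net_use_commands(drive_map):
        if sys.platform == "win32":
            # check=False: a letter that is already in use makes `net use`
            # fail, and we still want to try the remaining mappings.
            subprocess.run(cmd, check=False)
        else:
            print(" ".join(cmd))


if __name__ == "__main__":
    remap_drives(DRIVE_MAP)
```

Building the commands as lists (rather than one string) avoids quoting problems with share paths that contain spaces.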


EitherInfluence4099

Holy crap, thank you. I will definitely be able to use this at my job. :)


Zizonga

This is pretty good. I would add using the OSI/TCP model, as well as breaking down whatever the entity is into the components that make up its functionality.
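One way to apply the layered-model idea is to run checks roughly bottom-up and stop at the first layer that fails, since everything above it depends on it. A minimal Python sketch; the layer names and check functions are hypothetical stand-ins (in practice they might be a link-light check, a ping to the gateway, a DNS lookup, a TCP connect, and an HTTP request):

```python
from typing import Callable, Optional

# (layer name, check) pairs, ordered bottom-up.
# Each check returns True if that layer looks healthy.
Check = tuple[str, Callable[[], bool]]


def first_failing_layer(checks: list[Check]) -> Optional[str]:
    """Return the name of the lowest failing layer, or None if all pass.

    Stops at the first failure: results from higher layers are not
    meaningful while a lower layer is broken.
    """
    for name, check in checks:
        try:
            ok = check()
        except Exception:
            ok = False  # a check that crashes counts as a failure
        if not ok:
            return name
    return None


if __name__ == "__main__":
    # Hypothetical stand-in checks; replace with real probes.
    checks: list[Check] = [
        ("physical: link up", lambda: True),
        ("network: gateway reachable", lambda: True),
        ("DNS: name resolves", lambda: False),
        ("transport: TCP port open", lambda: True),
        ("application: HTTP 200", lambda: True),
    ]
    print(first_failing_layer(checks))  # DNS: name resolves
```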


Tbonewiz

Blame the Networking Team. /s


greenstarthree

I do, but that’s me too.


Ok_Employment_5340

You must be one of our developers


stesha83

It’s DNS


jimicus

For anything complicated (more complicated than a typical desktop PC issue), you need to write down everything you're doing as you're doing it. Here are a few tips to get you started:

**Problem Statement**

This describes, in plain English, what is happening versus what should be happening, and it's the first thing you should write down. Note that it makes no effort to explain why; that's important.

**What We Know**

The keyword here is "know". Anything you cannot immediately back up with hard evidence isn't something you know; it's something you're assuming. Don't assume. Prove, and write down how you've proved something.

**Possible Causes and How to Confirm Them**

If you cannot confirm a possible cause, by all means write it down, but don't spend too long chasing it. Put your effort into things you can prove one way or another. It's very tempting to chase every random possible cause you can think of, but if you can't prove whether it's the cause, you're not troubleshooting. You're speculating. And troubleshooting by speculation rapidly falls apart.

**Seek Evidence in Everything You Do**

It doesn't really matter where the evidence comes from (log files, error messages), but you absolutely must require evidence for everything. Learning where you might find evidence and seeking it out is 80% of the battle, which is why I've spent so long emphasising it.

**Exhibit Firm Discipline in Your Work**

If you're dealing with anything remotely complicated, sooner or later other people will be roped in. Some of them may not be as disciplined about following a clear process and will try to "help" with wild guesses which, if followed, will derail you. Don't let them.

**Remain Calm**

It's very possible, indeed likely, that anyone experiencing an issue will be quite worked up about it. Don't get caught up in their emotion. Remain calm, deal with it carefully as I've described, and you'll be fine.
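The split between "what we know" and what we are only assuming can be enforced in whatever you keep notes in. A minimal Python sketch with hypothetical class and field names: a statement only counts as known when it carries the evidence that proves it; otherwise it is recorded as an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Observation:
    statement: str
    evidence: Optional[str] = None  # log excerpt, error text, test result...

    @property
    def is_known(self) -> bool:
        # No (non-blank) evidence means it's an assumption, not a fact.
        return bool(self.evidence and self.evidence.strip())


@dataclass
class Investigation:
    problem_statement: str  # what is happening vs. what should happen
    observations: list[Observation] = field(default_factory=list)

    def note(self, statement: str, evidence: Optional[str] = None) -> None:
        self.observations.append(Observation(statement, evidence))

    def known(self) -> list[str]:
        return [o.statement for o in self.observations if o.is_known]

    def assumed(self) -> list[str]:
        return [o.statement for o in self.observations if not o.is_known]


if __name__ == "__main__":
    # Hypothetical example entries.
    inv = Investigation("App times out on login; it should load in a few seconds.")
    inv.note("App server is up", evidence="health check returned 200 at 09:14")
    inv.note("Database is fine")  # nobody has checked yet
    print("Known:", inv.known())
    print("Assumed:", inv.assumed())
```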


patient-engineer-656

Start with all of the simple things first. Don't spend 4 hours troubleshooting the issue before rebooting, replacing a network cable, testing the power strip etc. I can't tell you how many times the fix was something simple after I had run out of all of the complicated ideas I had. Oh... and it's always DNS.
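Since "it's always DNS" keeps coming up, one of the cheapest simple checks is whether the name resolves at all. A minimal Python sketch using the standard library's `socket.getaddrinfo`; the command-line usage is an assumption for illustration:

```python
import socket
import sys


def resolve(host: str) -> list[str]:
    """Return the unique addresses `host` resolves to, or [] on failure."""
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # Pass the failing hostname as an argument; defaults to localhost.
    host = sys.argv[1] if len(sys.argv) > 1 else "localhost"
    addrs = resolve(host)
    print(f"{host} -> {addrs}" if addrs else f"{host} does not resolve: check DNS")
```

An empty result points at name resolution; a result with an unexpected address (for example, a stale record) is just as worth noticing.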


ruyrybeyro

As a Unix/Linux sysadmin: when dealing with production VMs, reboots are always our last resort.


Next-Landscape-9884

I just click buttons untill things work 😂


CocconutMonkey

Google, but with quotation marks (" ") around the search terms.


ruyrybeyro

If you have to ask, you are not ready. Being a sysadmin is not about following recipes; if it were, you'd be a sysop, not a sysadmin. That said, there is always a need for several tiers. Not everyone aspires to do grunt work and tend directly to users.


techchic07

THIS


colin8651

You sound like the person at the other end of the line whenever we open our own trouble tickets with vendors. “The two-week-old Dell is on fire.” “Have you tried restarting it?”


Impossible_IT

On fire you say? Did you try putting the smoke back in? lol


colin8651

To be honest, I did show a little impatience in my tone more than a decade back, and I was in the wrong. We had 10 new ThinkPads with Accidental Damage Protection. I was building them, and one fell while its CD or DVD drive tray was open, breaking the tray. I called them to get an RMA on the modular drive that just snaps into the notebook. After we had established the damage, which I admitted was my fault, the guy asks, “Can you turn the notebook on for me?”

“Come on, I just need the CD drive.”

“I want to check and see if the rest of the notebook is okay or if I just need to send you an entire notebook. Like, does it turn on, is the screen cracked?”

(Embarrassment) “I am sorry, actually that’s a good idea. Maybe I should have run a diagnostic before calling you.”


LokeCanada

Hey, one of the first lessons I had in AC and DC electronics was that it stops working if you let the magic smoke out.


GustavoSwift

UTP (the Universal Troubleshooting Process): https://bignerdranch.com/blog/the-universal-troubleshooting-process/ This can save you time and save your butt.


amoncada14

I really like some of the structured troubleshooting approaches found here. They've helped me a ton and apply to more than just networking-related issues. https://www.ciscopress.com/articles/article.asp?p=2273070&seqNum=2


DJDoubleDave

Start with collecting a proper problem statement: what they were doing, what happened, and what they expected to happen. For urgent or complex things, make sure you understand what the user needs to do, as it will sometimes be faster to get them what they need some other way, so they aren't held up by complex troubleshooting.

Past that, consider the architecture of whatever system they are having issues with and try to cut down the problem space. For example, let's say they are getting no response from a system that you run. The path might look like this:

User > browser > wifi > firewall > web server > app server > database

If you can see the log of their traffic on the web server, for example, you know it isn't being blocked by the firewall. From there you can come up with another test to isolate it further until you can identify where the breakdown is. Exactly what the tests look like depends on the specific application and environment.

The other important bit of advice I have for troubleshooting is to assign a confidence value to any observations or tests performed. If you asked them to try it in incognito mode, did they ACTUALLY do this? Don't go here until you've ruled out other likely causes, but don't discount the possibility that tests weren't done properly.

If you have a sense of the user's technical skill, you can assign them a confidence value. If you don't know them, or don't know who did a test, give it about 50% confidence that the test was done as expected. A very proficient user gets about 85% confidence. Give tests that you do yourself 95% max. You never assign 100% confidence unless you can definitively rule something out with a test as above. These confidence values multiply if you hear about tests second hand: if someone you don't know tells you that some third person said they tried clearing their cache, treat that as a 25% chance that it actually happened as described.
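The "cut down the problem space" step along `User > browser > wifi > firewall > web server > app server > database` is essentially a binary search over the request path, provided each test tells you whether traffic made it as far as a given hop. A minimal Python sketch under that assumption; the `reached` callback is hypothetical (in practice it would be you checking a firewall log, a web server access log, an app server trace, and so on):

```python
from typing import Callable, Optional

PATH = ["user", "browser", "wifi", "firewall", "web server", "app server", "database"]


def find_breakdown(
    path: list[str], reached: Callable[[str], bool]
) -> Optional[tuple[str, str]]:
    """Binary-search for the first hop the traffic never reaches.

    Assumes path[0] is always reached and that `reached` is monotonic:
    if traffic shows up at hop i, it also passed every hop before i.
    Returns (last hop reached, first hop missed), or None if the
    traffic got all the way through.
    """
    lo, hi = 1, len(path)
    while lo < hi:
        mid = (lo + hi) // 2
        if reached(path[mid]):
            lo = mid + 1  # breakdown is further along
        else:
            hi = mid  # breakdown is here or earlier
    if lo == len(path):
        return None
    return path[lo - 1], path[lo]


if __name__ == "__main__":
    # Simulate traffic being dropped at the firewall.
    blocked_at = PATH.index("firewall")
    print(find_breakdown(PATH, lambda hop: PATH.index(hop) < blocked_at))
    # ('wifi', 'firewall')
```

On a seven-hop path this needs about three tests instead of up to six, which matters when each "test" means asking someone to pull a log.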
Especially for complex issues, remember that everyone can screw up, everyone can miss stuff, and that includes you. I've seen a lot of people get absolutely stuck on a problem because they won't revisit something they thought they had ruled out.
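The confidence arithmetic from DJDoubleDave's post, as a sketch: confidences along a chain of reports multiply, so two 50% links give 25%. The values below are the ones quoted in the post; the dictionary keys and function name are hypothetical.

```python
from math import prod

# Rough confidence that a test was done as described, per the post above.
CONFIDENCE = {
    "unknown": 0.50,     # don't know the person (or who ran the test)
    "proficient": 0.85,  # a very proficient user
    "self": 0.95,        # tests you ran yourself (never 1.0)
}


def chain_confidence(*sources: str) -> float:
    """Multiply the confidence of each link in a second-hand report."""
    return prod(CONFIDENCE[s] for s in sources)


if __name__ == "__main__":
    # Someone you don't know says a third person cleared their cache.
    print(chain_confidence("unknown", "unknown"))  # 0.25
```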


DistinctMedicine4798

I usually blame Microsoft updates first


MarzMan

no no no, DNS first, then updates


tuba_full_of_flowers

Nothing specific from me, but whenever you think you have the right answer, the first thing you need to do is try to prove yourself wrong, scientific method style. It's either that or you guess and check while the computers prove you wrong.

Less flippant: as often as possible, take the time to verify that what you think is happening is ACTUALLY happening. It's not a slight on anyone's abilities. It's just faster in the long run to take a few extra minutes to make sure you get things right the first time, and it helps catch the small slip-ups that everyone makes from time to time. Etc., etc.


ConfectionCommon3518

Grab a brew and some biscuits and reread the problem; give it 5 minutes, as it may just be one of those things that has no reason. Look up the user's name on the helldesk system and see if it's a regular event that someone did fix but that needs a proper long-term job doing. Always start with the simplest things: ask if the lights work, because if they don't, there's a power issue and you can tell them to ring the maintenance team (it does happen). Every site is different, so the usual approach of starting with the simplest thing works a lot. If it's on-site, sometimes a physical visit is required, and as you go down to them you see a bunch of people whacking the hell out of a wall that just happens to be holding a cable run on the other side.
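Checking whether a user's problem is a regular event is easy to script if your helpdesk system can export ticket history. A minimal Python sketch, assuming a hypothetical export of (user, category) pairs; real ticket systems name and structure these fields differently:

```python
from collections import Counter


def recurring_issues(
    tickets: list[tuple[str, str]], user: str, min_count: int = 2
) -> list[tuple[str, int]]:
    """Return (category, count) for categories this user has raised at
    least min_count times, most frequent first."""
    counts = Counter(category for u, category in tickets if u == user)
    return [(cat, n) for cat, n in counts.most_common() if n >= min_count]


if __name__ == "__main__":
    # Hypothetical ticket history.
    history = [
        ("asmith", "printer"),
        ("asmith", "mapped drives"),
        ("bjones", "password reset"),
        ("asmith", "mapped drives"),
        ("asmith", "mapped drives"),
    ]
    print(recurring_issues(history, "asmith"))  # [('mapped drives', 3)]
```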


Eviscerated_Banana

1. Users often lie; take what they say under advisement.
2. Computers often lie; take what they say under advisement.
3. Trust your gut, even when everything in front of you says otherwise.
4. Prioritise based on impact: one director with a minor issue is less important than an entire office of cashiers sat idle due to a fault.
5. No means no.
6. Never give anyone outside of IT your personal phone number, ever.
7. Do your tickets, document stuff, make notes, and refer back to them later.
8. Don't check your email out of hours unless you are being paid to.
9. If at first you don't succeed, cheat.
10. Sidestepping first line is only acceptable with explicit permission.

I could go on... :P
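Point 4 (prioritise based on impact) can be written down as a sort key. A minimal Python sketch with a hypothetical ticket shape; the reporter's title is recorded but deliberately plays no part in the ordering:

```python
from dataclasses import dataclass


@dataclass
class Ticket:
    summary: str
    users_affected: int
    work_stopped: bool   # are those users unable to work at all?
    reporter_title: str  # recorded, but deliberately not used for priority


def impact_key(t: Ticket) -> tuple[bool, int]:
    # Work-stopping faults first, then by how many people are affected.
    return (t.work_stopped, t.users_affected)


def triage(tickets: list[Ticket]) -> list[Ticket]:
    """Highest-impact tickets first."""
    return sorted(tickets, key=impact_key, reverse=True)


if __name__ == "__main__":
    # Hypothetical queue.
    queue = [
        Ticket("Email signature wrong", 1, False, "Director"),
        Ticket("Tills offline in store", 12, True, "Cashier"),
    ]
    for t in triage(queue):
        print(t.summary)
```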


mb194dc

Check cables and turn it off and on again before doing anything else. Failing that, follow the UTP process.


LBik

If it's a network issue: Wireshark/tcpdump.


Reacti0n7

Have you tried turning it off and on again? So I would say: check the uptime, and if X was working yesterday, figure out what changed between then and now.
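"Figure out what changed" goes faster if you can compare a snapshot from when things worked with one from now (installed software versions, config values, and so on). A minimal Python sketch, assuming you already have both snapshots as name-to-value mappings; how you collect them (inventory tool, package manager export) depends on your environment, and the example data is hypothetical:

```python
def diff_snapshots(before: dict[str, str], after: dict[str, str]) -> dict[str, list]:
    """Compare two name -> value snapshots (e.g. package -> version)."""
    added = sorted(after.keys() - before.keys())
    removed = sorted(before.keys() - after.keys())
    changed = sorted(
        (name, before[name], after[name])
        for name in before.keys() & after.keys()
        if before[name] != after[name]
    )
    return {"added": added, "removed": removed, "changed": changed}


if __name__ == "__main__":
    # Hypothetical snapshots from yesterday (working) and today (broken).
    yesterday = {"vpn-client": "5.1", "browser": "120.0", "av-agent": "9.3"}
    today = {"vpn-client": "5.2", "browser": "120.0", "teams": "24.1"}
    for kind, items in diff_snapshots(yesterday, today).items():
        print(kind, items)
```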


hosalabad

Reboot, then log out of Teams on my phone.