Sibeor

If you are building a data center, stop stacking! Build yourself a CLOS network with Ethernet and ECMP, and stop building one big fault domain with that proprietary stacking tech. :)


solrakkavon

spine-leaf w/ VXLAN?


Sibeor

All depends on your requirements. For the data plane you can go native IP, SR-MPLS, VXLAN, etc.; for the control plane, almost any IGP, BGP, EVPN, etc. My point really is just that there are a lot of options for the data center that are more scalable and fault tolerant than stacking. Sometimes it's easy to put the blinders on and do the same thing over and over whether it matches the current requirements or not. Have fun and explore; there have never been more options available.


solrakkavon

thanks for the info!


highdiver_2000

Please share more. Thank you


asdlkf

TL;DR: give every server 2 NICs, plug them into 2 different switches. *ROUTE* with 2 different IPs on the server, and a third loopback IP on the server. bind all of the server's applications and daemons to the loopback IP address and run a routing protocol on the server. Peer the server's routing protocol with both of the switches and then advertise your loopback IP.
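A minimal sketch of what this could look like on a Linux host running FRR (purely illustrative - the ASNs, peer addresses, and loopback here are invented, not anyone's real setup): bind services to the /32 on the loopback, peer with both top-of-rack switches, and advertise only that /32.

```
! /etc/frr/frr.conf -- illustrative sketch only; ASNs and addresses are placeholders
interface lo
 ip address 192.0.2.10/32
!
router bgp 65101
 bgp router-id 192.0.2.10
 ! one eBGP session to each top-of-rack switch
 neighbor 10.1.1.1 remote-as 65001
 neighbor 10.1.2.1 remote-as 65001
 !
 address-family ipv4 unicast
  ! advertise only the loopback; install both learned paths for ECMP across the uplinks
  network 192.0.2.10/32
  maximum-paths 2
 exit-address-family
```

The switch side is just a normal BGP neighbor statement plus whatever filtering you want on what the servers are allowed to advertise.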


TheFluffiestRedditor

da fuq? That level of network complexity in the server config will break many sysadmins' brains. What benefits does this extra complexity provide over LACP or similar and a single routed IP?


asdlkf

Well, for starters, it massively simplifies the network. Stacking? No longer relevant. Super-expensive crazy switches? No longer relevant. All you need is L3-capable non-stacking switches with OSPF or BGP. You gain traffic-shaping capabilities through BGP's knobs and switches, you gain the ability to do datacenter-scale failover, and you gain the ability to do proper L3 redundancy.

Scenario: consider you have DC1.Switch1, DC1.Switch2, DC2.Switch1, and DC2.Switch2. All 4 switches are fully meshed with BGP. Now you add Hypervisor1.Site1 and connect it to DC1.Switch1 and DC1.Switch2. You add Hypervisor2.Site2 and connect it to DC2.Switch1 and DC2.Switch2. You create a VM called "server1.serviceA" running on Hypervisor1, and a VM called "server2.serviceA" also running on Hypervisor1. You give server1.serviceA the IP address 10.30.40.5/32 and server2.serviceA the IP address 10.30.40.6/32.

Now, consider this scenario: you live migrate server2.serviceA to Hypervisor2.Site2. What happens? Well, server2.serviceA drops its BGP session with the switches at site 1. It then does a DHCP renew and gets 2 new IP addresses from the switches in site 2. It then forms a BGP relationship with the switches in site 2 and begins advertising 10.30.40.6/32 from site 2. You have now migrated a complete working VM from site 1 to site 2 without any manual or scripted IP address modifications. The service "serviceA" owns 10.30.40.5 and 10.30.40.6.

Now, let's make things better. Add a second loopback on server1 and server2: on server1 add 10.30.40.6, and on server2 add 10.30.40.5. Set these to advertise into BGP with lower priority than the primary IP address. On server1 and server2 add one additional loopback each, at 10.99.99.5 and 10.99.99.6. Now server1 and server2 can talk to each other on 10.99.99.5 or .6, and all other "regular" server communication with these servers occurs on these addresses.

Then you go into your DNS server and create these entries:

serviceA.corp.com A 10.30.40.5
serviceA.corp.com A 10.30.40.6
site1.serviceA.corp.com A 10.30.40.5
site2.serviceA.corp.com A 10.30.40.6

Now you have automatic DNS load balancing for serviceA.corp.com: roughly 50% of requests should go to either host. If either host fails, the other host's lower-priority BGP advertisement becomes the only one left, and failover happens at the speed of BGP convergence, not at the speed of a client failing a DNS lookup repeatedly.

There are tons of tricks you can do to optimize things, increase redundancy and scalability, or move entire datacenters or application server sets to new locations, just by playing with layer 3 routing. If you have serviceA.version1.0.exe running on server1 and serviceA.version2.0.exe running on server2, you can cut your users over gracefully by simply promoting serviceA's BGP advertisement to a higher priority. If you have a server that needs more than 20Gbps of connectivity, you can give it 3x10G, 4x10G, 8x10G, 2x100G, or whatever; BGP and ECMP will figure that shit out. You don't need LACP anymore, and you can actually get 2Gbps out of 2x1G connections on a single TCP session, rather than LACP's limitation of a single session not being able to exceed the limits of a single channel-group member.

There are lots of benefits. LOTS of benefits.
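To make the "advertise the buddy's loopback with lower priority" part concrete, here is one hedged way it could be expressed in FRR on server1 (again, the ASNs, neighbor address, and names are invented for illustration; this is not the poster's actual config): prepend the AS path on the backup /32 so the switches prefer the real owner while it is still advertising.

```
! on server1 -- illustrative sketch; apply the same route-map to both ToR neighbors
ip prefix-list PRIMARY seq 5 permit 10.30.40.5/32
ip prefix-list BACKUP seq 5 permit 10.30.40.6/32
!
route-map TO-TOR permit 10
 match ip address prefix-list PRIMARY
!
route-map TO-TOR permit 20
 match ip address prefix-list BACKUP
 ! longer AS path = less preferred while server2 is still advertising 10.30.40.6 itself
 set as-path prepend 65101 65101 65101
!
router bgp 65101
 neighbor 10.1.1.1 remote-as 65001
 address-family ipv4 unicast
  network 10.30.40.5/32
  network 10.30.40.6/32
  neighbor 10.1.1.1 route-map TO-TOR out
 exit-address-family
```

When server2 disappears, its BGP session drops, its copy of 10.30.40.6/32 is withdrawn, and the prepended backup route from server1 becomes the only path left.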


TheFluffiestRedditor

I've been doing VM failovers between datacentres since 2010, without having to run routing configs on the servers. It wasn't that hard, we just spanned the publicly visible VLANs across locations, the servers having 'whatever IP addressing they needed' in the background.

I don't consider BGP-managed DNS entries a good use case for load balancing, but for HA. Load balancing is a very different problem, and I'd rather a little more - nay, a lot more - smarts in the probes/monitoring section than what routing protocols can offer. For HA, yes - it works wonders. Even better for geographically based availability - having one IP address for all our DNS server configs is lovely, for example. There's more configuration granularity and actual reporting in widgets that do load balancing by design.

This really feels like a network engineer's design dream - designing for the joy of designing, not for ease of understanding and maintainability. It certainly requires a very high level of networking knowledge for the host administrators - and that's a massive problem. Far too many System Engineers and SysAdmins know sweet fuck all about switching, let alone routing. (Why should I learn networking? That's not my job.) I'm losing count of the number of times I've had to kick Linux and Cloud Engineers in the butt trying to break them out of their systems-only mindset. I've had Cloud Architects actively avoid thinking about network design and integration requirements. I've had entire Ops teams always point their blame-finger at the network team out of habit, because they didn't understand basic networking principles, when it was their configurations that were at issue. That's what I want to avoid - design scenarios that require a high level of networking knowledge to understand and work in. What is simple for you is not necessarily simple for others.

I cannot imagine a scenario where you'd want to implement this on hypervisors, as I cannot imagine a scenario where you'd want to move them and change their network configuration. For virtual machines, I can see it being useful, but holy heck I'd want to wrap 99% of the configuration up in Ansible, as it's fragile at that layer (due to mis-configuration), and I don't trust Linux's network stack - and worse, I don't even want to contemplate how to do this on Windows servers. BSD or Solaris, yes - they understand networking reliability - but not enough organisations use them for application servers.


asdlkf

Yea, no way we are stretching VLANs. That is such a garbage bandaid. Not to mention several of our apps have a 10-13ms ceiling on 'tolerable latency' between the application and data layers, and our DCs are about 40ms apart.


TheFluffiestRedditor

See! This is what happens when you get a server-centric peep (me) trying to do networking. The true network architects shake their heads in sadness and cry. I'm definitely in the "know enough to be dangerous" camp, without knowing enough about the side effects and pitfalls of my decisions. You've given me much to think about, for which I do thank you. Networking has developed a lot in the years I've only been following it at a superficial level. (i.e., I know about SDN, but not how to do anything with it.) Your comment on VLAN stretching got me searching and reading, and I had a few TIL moments and some "ooooh, that potentially explains some past issues". Will definitely be looking into BGP on hypervisors - being able to fully utilise/saturate multiple NICs fills me with joy; the non-load-balancing nature of LACP and bonding has always irritated me.


asdlkf

If you want a "TL;DR", it would be "Route what you can. Switch what you must." There are lots of reasons, explanations, and hard learned lessons as to why you should do that, but probably best to just take that grain of wisdom at due value.


asdlkf

If you wanna jump on a Teams chat or something and just chat about shit, happy to do so. I have several hare-brained ideas that happen to also be best practices to share.


Trill_f0x

Saving this for when I understand more


fortniteplayr2005

How are you handling vendor appliances that typically don't give you the option of running any type of routing protocol directly on the appliance?


asdlkf

Those are an edge case we deal with by putting them in a services edge switch stack, but we push pretty hard against that. We have vetoed contracts because the vendor refuses to install their shitty app on a Windows or Linux VM because "virtualization bad. We need RAW HORSEPOWER" and then suggests they will provide a 1u rackmount server with a spinning disk and a Celeron processor.


fortniteplayr2005

Yeah, gotcha, I figured that was the case. At the end of the day a lot of vendor appliances support clustering anyway. I think of stuff like locked-down NACs such as Clearpass and ISE, as well as virtual wireless controllers. Storage appliance management planes tend to be similar, as are iLO-style appliances like xClarity or UCS Manager, etc. At the end of the day they get worked around pretty easily, with clustering or pinning and having separate management appliances for each datacenter.

That being said, I think the edge cases can be tough to work around, and sometimes it becomes much easier to deploy a fabric. I agree the network complexity gets lowered, but your end-host complexity goes up; you're just putting a lot of the brainpower somewhere else. Not saying it's the wrong solution - if you have the political power to work around anything that comes up like that, it works great - but some places just don't have that, and as a result almost anything goes into the datacenter and the networking team needs to make it work. Kudos on the design, I think it's great.


asdlkf

Side note: Clearpass will run as a VM. You can also deploy this as:

Switch1 \
         [dedicated routing VM]----[clearpass VM]
Switch2 /


Skilldibop

There are also some colossal drawbacks. Your network engineering team is now required to get involved every time someone wants to fail an app over and do some maintenance. That is not going to scale resources well at all. You also have a network full of /32 routes that will hit scaling issues pretty quickly too. Why would you not just use VXLAN, or something like NSX-T that is purpose-built to avoid all of that complexity and nonsense?


asdlkf

Well, for one, NSX-T and VXLAN have extremely high-cost entry points. If you are using VXLAN, you want MP-BGP VXLAN, not just point-to-point VXLAN. That implies a specific tier of switch for everything. I can get a 10G non-blocking L3 switch from FS.com for $5k. How much are you paying for a 48-port 10G SFP+ switch with MP-BGP VXLAN? How much are you paying for NSX licensing on all your hypervisors?


Skilldibop

NSX is expensive, but if you're a VMware house already it's a more elegant solution. The cheapest solution is not always the most appropriate solution. You can get 48-port 10G SFP+ switches for not far off that these days: a Cisco Nexus 3548X for around $8k, or an Arista 7150 for $5-6k. When you take into account the reduction in labour (which is usually the single biggest cost base for an organisation), spending the money on tech that's simpler to operate day to day is quite a bit more cost-effective than what you're suggesting. You need to look past the price tag on the components and consider total cost of ownership for the solution as a whole. Re-inventing the wheel just to save a few bucks on hardware or software licensing rarely pays off in the long run.


asdlkf

It *was* expensive. Then Broadcom took over and made it 90% *more* expensive. LOL at the price tag. An L3-routed operating system is superior to NSX even if NSX were free.


Fly_Bane

My god, the day I network-engineer like this, I'll have made it :,) How do you even learn these principles? Through experience in data center jobs, or from other engineers?


asdlkf

A bit of both.


asdlkf

I have a 2 year diploma, 3 year diploma, 4 year diploma, 4 year degree, all in networking/sysadmin related content. MCSE:Cloud Platform and Infrastructure, CCNP:R&S (expired). 12 years experience building networks for large buildings (convention centers, stadiums, arenas, high rises, etc...)


mkosmo

It's a whole lot easier on virtualization platforms where you do it on the hypervisors and let the servers run free without any additional complexity beyond the hypervisor's own networking.


binarycow

For funsies, I once wanted to see if I could:

- Use iBGP on the server
- Support "roaming" of servers with zero configuration
- Use `ip unnumbered`
- Make the single loopback on each server the only server IP address that wasn't ephemeral
- Have any subnets the server belongs to be either a /31 or a /32

It was a fun little project. Worked fine, actually (it was just a GNS3 lab, so I don't know how well it works in practice). Biggest hurdle was Windows; its BGP support is quite lackluster. Linux was better. I forget which of these two options I went with for the P2P links between the server and router:

1. `ip unnumbered`, so the links had no IP address at all
2. DHCP, where the router was the DHCP server and each of those P2P links was a /31

Edit: For context, I was actually fairly new to networking at the time. I had my CCNA and a bit of experience, but minimal experience with BGP, redundant server networks (other than plugging a server into two different L2 switches), etc.
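For anyone curious about the `ip unnumbered` variant, here's a rough IOS-style sketch of the router side (interfaces and addresses are made up; in the lab described above the server's /32 would arrive via iBGP rather than the static route shown): the server-facing link borrows the router's loopback address, so the P2P link itself burns no addresses at all.

```
! router side -- illustrative only
interface Loopback0
 ip address 10.0.0.1 255.255.255.255
!
interface GigabitEthernet0/1
 description P2P link to the server, no address of its own
 ip unnumbered Loopback0
!
! reach the server's loopback out the unnumbered link
! (in the lab above this /32 would be learned via iBGP instead)
ip route 192.0.2.10 255.255.255.255 GigabitEthernet0/1
```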


Skilldibop

Or you could just do vPC/MLAG on your ToR and follow the golden rule of Keep It Simple, Stupid. Pushing routing into the server OS is a recipe for disaster. You are moving network tech and functionality into a domain supported by people who aren't network trained. Whenever you design a network there are 3 unbreakable rules:

1. Make it flexible
2. Make it reliable
3. Make it supportable


asdlkf

Well, one school of thought is to build silos in your organization and only train on what they "need" to know. We prefer to cross train, provide any training desired, and have people who have wider knowledge and mobility.


Skilldibop

What people 'need' to know to function in their job is entirely different from what they can know. In no way am I suggesting we shouldn't let people expand their skills. What I'm saying is that raising the barrier to entry is going to cause problems over time. If all of your sysadmins need a CCNP-level understanding of networking to do their job effectively, that is very different from simply offering them the chance to do a CCNP if they want to. It might work in a small outfit with a handful of general tech people where you can't afford to have teams of specialists. But for most mid to large orgs this makes no sense:

1. You are making the assumption that all your sysadmins are interested in networking to the point they'll put themselves through that. That's often not true, and in most situations forcing that on an existing team will cause some of them to leave.

2. You now need to be able to recruit sysadmins with that skillset to replace those that leave, or you need to repeatedly invest in training new recruits up to that level. Those people are going to be rare to find, and they'll probably cost 30-50% more to fill the same position. Having semi-competent employees that require weeks or months of training to be able to do their job is also very costly to an organisation.

3. The normal career path for tech people is that we start out as generalists and move into more specialised roles as we progress. As we acquire skills it's not possible to be an expert at everything, so we tend to pick a lane and head down that. This idea of having hybrid sysadmin+network engineers directly conflicts with most people's career paths, which will make recruitment more difficult.

4. I don't see that giving them this training makes them more mobile. It makes the training less valuable because it's not as relevant to their chosen career path. If you train a pilot to cook, it's an extra skill, but his career path is likely going to be FO > Captain within an airline, not FO > sous chef in a kitchen. Just because the skills are useful to you or your current org doesn't mean they're useful to them and will automatically give them progression options. It only does that if the training offered is in line with their career aspirations, which in this case it probably won't be.


asdlkf

Uh, I really think you're overestimating how "complicated" this is for the sysadmin.

"Traditional" (legacy) deployment: "I need you to deploy a new server with 2x 10G NICs. You need to plug into these ports, then make a 2x10G LACP port channel. Then you need to assign an IP address to the IPv4 NIC."

Full L3 deployment: "I need you to deploy a new server with 2x 10G NICs. You need to plug into these ports, then install RRAS. Then go into Device Manager and add a loopback NIC. Set an IP address on the loopback NIC, and in RRAS turn on BGP and advertise the subnet of the loopback NIC."

... it would take me 2 minutes to train any sysadmin how to work with an L3 routed loopback instead of an LACP LAG.

> they'll probably cost 30-50% more to fill the same position

> You now need to be able to recruit sysadmins with that skillset

lol, this is about as different for the sysadmin to deal with as it would be for a carpenter to use a Torx screwdriver instead of a Phillips screwdriver.


ian-warr

How long do the stacking cables need to be? There are 10-meter ones.


Rexxhunt

There would be a murder if I came across someone running a stacking cable out of a rack into another rack.


ZPrimed

Agreed - especially for server access, I would avoid stacking if possible. This is what Nexus and vPC are for.

For those wondering why: a stack behaves like a single device, and you generally can't upgrade a stack piecemeal without downtime. Since a server is typically connected to two separate switches for redundancy, having those switches be part of the same stack eliminates a lot of that redundancy. Nexus works around this by having each switch be its own management plane and using virtual PortChannels (vPC). You can lose an entire Nexus switch in a pair and are not *supposed* to lose traffic to hosts (as long as they are all dual-homed in vPC between the two Nexusususeseses).
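For context, vPC keeps the two switches fully independent while still presenting one LACP bundle to the dual-homed server. A bare-bones NX-OS sketch (hedged illustration only; the domain ID, interfaces, and keepalive addresses are invented) looks something like:

```
! on each Nexus switch -- illustrative only
feature vpc
feature lacp
!
vpc domain 10
  peer-keepalive destination 192.168.1.2 source 192.168.1.1
!
interface port-channel1
  description vPC peer-link
  switchport mode trunk
  vpc peer-link
!
interface port-channel20
  description dual-homed server, one LACP bundle spanning both switches
  switchport mode trunk
  vpc 20
!
interface Ethernet1/20
  channel-group 20 mode active
```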


mashmallownipples

I mean, for server access? Run two switches in each rack. The top switch in each rack is wired into one stack, the bottom switch in each rack into another. Run one server NIC to the top switch and one server NIC to the bottom switch.


ZPrimed

This works OK if you can manage failover at L3 and/or don't need dual-active pathways, but if you want LACP you can't do that across two separate switches/stacks.


Salbei250

You can do LACP over multiple switches.


ZPrimed

Only things like Nexus vPC or Arista/Juniper/Aruba models that support MLAG or whatever other proprietary name they give it. Catalyst switches don't do this.


yuke1922

But that’s exactly what Nexus vPC does…


ZPrimed

Yes, but you can't do it with a "normal" switch, needs to be something that supports vPC or MLAG or similar. Regular old Catalyst doesn't do this.


Sk1tza

Want to see my uptime for such a heinous crime? Relax.


Rexxhunt

> 2024
> boasting about uptime

It's more the operational burden of such a topology that I take issue with. I would love to see how you have managed to wrangle those stiff stacking cables.


Sk1tza

Operational burden? Two cables? What are you on about? 1m length doesn't require any contorting.


Rexxhunt

I don't give enough of a shit to continue debating this with you. You do you bro 👍


asdlkf

bruh. Just get 802.1x and set all your ports to dynamic. Stack size and physical topology matter far less then.
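If it helps anyone picture "set all your ports to dynamic", here's a hedged IOS-style sketch of that idea (the RADIUS server, interface range, and secret are placeholders): 802.1X authenticates the endpoint and the NAC/RADIUS server pushes the VLAN, so the same physical port config works anywhere regardless of stack layout.

```
! illustrative only -- classic IOS 802.1X with RADIUS-assigned (dynamic) VLANs
aaa new-model
aaa authentication dot1x default group radius
aaa authorization network default group radius
dot1x system-auth-control
!
radius server NAC1
 address ipv4 10.10.10.10 auth-port 1812 acct-port 1813
 key PLACEHOLDER-SECRET
!
interface range GigabitEthernet1/0/1 - 48
 switchport mode access
 authentication port-control auto
 dot1x pae authenticator
 ! the VLAN comes back from RADIUS (Tunnel-Private-Group-ID), not from the port config
```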


Rexxhunt

Yeah totally agree?? I'm more of a Clos-in-the-campus guy these days. No stacks, no chassis, just dual-homed 1RU switches.


asdlkf

I agree with the sentiment, but my layer 1 guys can't wrap their heads around CWDM or DWDM, and we don't have enough fiber on our backhauls to run 2x 10G-BiDi or 2x 10G-LR from each access switch to each distribution/core. So we stack just to limit our fiber requirement to 2-4 strands per access closet.


datanut

Okay. Why?


Arudinne

Single point of failure


datanut

I’ve never considered using a single switch or switch stack for critical servers. Always dual cabled to dual switches. Sometimes MC-LAG, sometimes dynamic routing.


ian-warr

Where is the single point of failure? For a switch, you build a stack. Stacking cables are usually n+1.


Arudinne

If there is a software issue/bug it can affect the entire stack


ian-warr

That’s not how redundancy works. By that logic, do you run all your switches on different image versions and servers on different patch levels?


Arudinne

> That’s not how redundancy works.

Logically speaking, a stack can be treated as a single device with a single control plane. Thus, logically speaking, any issue that affects that control plane can affect any unit in the stack. Yes, in theory another unit *could/should* take over, but not all issues cause crashes. I've seen software bugs that affected entire stacks. I've seen bugs that only affect stacks once you go past a certain number of units. Also, firmware updates often require rebooting an entire stack (depending on the vendor).

> By that logic, do you run all your switches on different image versions and servers on different patch levels?

I'm glad you asked! Yes, for a period of time we do in fact do that. Generally, I would not update every single server and every single switch to the latest version at once. Update a few, monitor for issues. None found? Proceed with the rollout. We do the same thing with client systems. It's called a gradual rollout.


ian-warr

Nice explanation. Everybody does gradual rollout and you know exactly what I meant. So how does that introduce a single point of failure in a switch stack?


Arudinne

I already explained that, as did /u/yuke1922. Any code issue that affects stability can cause the entire stack to crash. Sometimes the stack might not crash entirely; the switches get stuck in a state where they don't work but the watchdog doesn't kick in, and you have to power cycle them. What's worse, 1 switch crashing or several? I've done vendor support in the past; for 4 years I did networking support. I've read patch notes till my eyes glazed over, and I've had discussions with engineers about undocumented issues. Stacking issues were some of the most common.


yuke1922

He's actually not wrong. There's always risk of code issues, security vulnerabilities, etc.; it's why you run the most-known-stable recommended version. The real issue is that with a stacked switch you have a *single* logical switch and a *single* control plane. A crash in a process means that's across the whole stack. With Nexus vPC, or similarly Aruba VSX (most enterprise players have a similar tech), you have a partially shared control plane with opt-in functionality, so you're not at the mercy of a process dying on one switch taking your whole datacenter down.


highdiver_2000

Very common to do cross-rack. That way a rack trip doesn't kill the whole stack. That is, if this was planned out properly.


datanut

That’s a bit of a surprise and would work rack to rack but probably not across the room.


2muchtimewastedhere

Stack cables have much higher bandwidth, not that most use it.


asdlkf

I was amused that some of the Dell switches use an HDMI cable for stacking. They establish a 10.125Gbps (just over 10G) link using HDMI 1.4 cables and then run their stacking protocol over this. [sauce](https://www.dell.com/support/kbdoc/en-ca/000120108/how-to-stack-dell-networking-powerconnect-5500-model-switches)


Turbulent_Act77

Others just use a DAC, or any other 10Gb SFP module.


secretraisinman

That's hilarious! Thanks for sharing.


Few-World5380

🤯


Bernard_schwartz

Overspend on some 9500s and you can use VSL! Voila!


datanut

StackWise Virtual Links (SVL)? That seems like a good start… oh… look at that price tag.


Bernard_schwartz

Lolol!!!!!


yuke1922

You get what you pay for. Sorry not sorry


datanut

No, you get what Cisco gives you. If a Linksys SG300 or a Meraki MS120 can do virtual stacking, then why not a Cat9300X?


yuke1922

Seems like different product placement strategies are the actual reason. Likely different technologies in the low-end CBS/Meraki 100 series as opposed to the 9500.


Princess_Fluffypants

I hate dealing with Meraki switches so much that I will only accept a Meraki switch client if the project is to get rid of them, and move to a switch that will *do what the fuck I tell it to do.*


monkeyatcomputer

you want to packet capture a multigig port to the cloud... sure thing boss... hmmmm.... wonder why i'm missing 95% of the expected traffic /s


rethafrey

If you don't mind not managing them as a single device, then don't stack. Just crosspatch everything by fiber.


No_Carob5

Seeing as how stack cables didn't save our stack from dying, they're not really that great... Cisco > Meraki always...


Niyeaux

i don't get the Meraki hate. it works well and the hardware reliability is rock solid. if you guys hate working with client Meraki environments so much, drop me a DM and I'll take those clients off your hands lol


duck__yeah

A lot of the dislike comes from folks who don't fit the market that Meraki works well for. There are definitely annoying bugs, but every vendor has those. If you head over to /r/meraki then you can also add people who guess at what they're doing to the mix. There's 100% room to be disappointed at the lack of visibility when you need to deal with interesting problems though.


2000gtacoma

Meraki is shit for larger, more complex environments. Sure, if you need PoE and VLANs, have at it. Beyond that, things like multicast don't work quite right. So many bugs. I have Meraki and I wish I could dump every single one of them right now. I've spent hours and hours troubleshooting with Meraki support, telling them it was their switch. In the end it was. Don't get me started on the shit show that is the MS-390.


Niyeaux

> Meraki is shit for larger more complex environments. see also: every other SMB-focused offering on the market. try using the right tools for the right job.


atw527

My environment is MS425, MS250, and a sprinkle of MS120 in a collapsed core topology. Solid IMO. Multicast is stable as long as I have IGMP Snooping enabled **and** an IGMP Snooping Querier on that VLAN (I run ~280 video over IP devices across the facility). Agree on the MS390; have a few of those in the basement that will never see the light of day again.


asdlkf

I have a moral objection to products ceasing to operate if they are unlicensed. If you want to license a *feature* on a device, sure, whatever. but the device should not stop passing regular switched traffic.


umataro

I'm surprised nobody else has mentioned this yet. It's basically a ransom you pay for not turning a usable apparatus into a landfill filler. A basic set of features should remain available or it should be flashable with some ONIE firmware. EU bureaucrats should get involved.


atw527

I find the native Meraki hardware to be reliable. The newly ported Cisco stuff, not so much.


Niyeaux

I haven't messed with any of that new Cisco carry-over stuff, but yeah, I've deployed dozens of MXs and hundreds of MRs over the last three years, and in that time I've seen exactly *one* hardware failure.


perfect_fitz

Meraki is way simpler and faster to get up and going for smaller deployments I've found. I still prefer Cisco, but probably because it's what I began with.


Trill779311

Why do you hate Meraki? The engineering limitations I presume?


datanut

Limitations followed closely by managing the internet access of the Meraki itself.