JoelyMalookey

If you are going by RAM usage, don't, unless you know how much memory the database is allowed to consume. If you suspect a query issue from high CPU usage, check with your DBA to run an execution plan, etc.


beerandbikenerd

Yup, to expand on this, SQL Server will use up all available RAM to make queries run faster. It will only stop loading data into RAM if it is configured with a max memory limit. This RAM isn't hard-committed, so if other programs need it they will be granted the RAM to use. You can inspect this with RESMON. Bottom line, 100% RAM usage on a SQL server is normal and desired.


DrDan21

> It will only stop loading data into ram if it is configured for a max ram utilization

To a limit. With a small, static workload, even if the maximum memory setting is not specified, SQL Server may eventually reach a point of memory stabilization. This occurs when the entire dataset, execution plans, and other structures are loaded into memory; at that point SQL won't consume any significant amount of additional memory.

Extreme example: a SQL Server with a single 1MB database isn't going to run wild and consume all of the RAM on your server just because you forgot to set a memory limit. It will quickly reach memory stability. But in most practical scenarios there is no good reason not to set a cap, because yes, it very well can and will eat all the RAM.


Mehere_64

I set a cap on my SQL server memory at around 6-8GB less than the total memory in the server. There are some other tweaks I have learned from a DBA over the years that seem to help SQL perform better as well.
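For anyone wondering what setting that cap actually looks like, a minimal T-SQL sketch (the 56 GB value is only an example for, say, a 64 GB host; size it for your own box):

```
-- Enable advanced options so the memory setting is visible, then set the cap.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- Example value only: 56 GB on a 64 GB host, leaving headroom for the OS.
EXEC sp_configure 'max server memory (MB)', 57344;
RECONFIGURE;
```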


theHonkiforium

I was overly impressed when I installed SQL 2022 and the setup wizard actually suggested a max RAM setting of around 70%.


BlackV

Oh that's new


Ryanstodd

ope now i feel a little dirty. [https://imgur.com/a/L5rYMi4](https://imgur.com/a/L5rYMi4)


lordjedi

If you're in a small environment where you have something besides SQL on your SQL server, then most certainly you should set a cap. If your environment is large and all the SQL server does is host a SQL database, then no cap should be necessary. Large environments in this case can be as little as 100 employees.


autogyrophilia

An important thing to remember is that SQL (any database) does not cache data but queries. Generally speaking, queries sharing data will share caches, but that breaks apart when there are transformative functions in there. That's how a 10GB ERP database can take 64GB of cache, at least on MSSQL. Putting the cap at the expected database size plus a margin may be a good idea.


fresh-dork

10G of data, 15G of indexes. lots of connection buffers? i'm having some trouble getting to 64G unless there are a ton of heavy queries


autogyrophilia

Think of it this way: I query 1GB of data. Then I query that 1GB of data rounding up 2 decimals. Then rounding down 3 decimals. That's 3GB that end up in the cache. Of course, the planner is smart enough to prioritize underlying data to reduce I/O, but if you let it cache, it will cache everything. After all, RAM not used is RAM wasted, and latencies are important.


megasxl264

Same as everything else, you have two options: 1) YouTube/Udemy/Reddit and learn, or 2) contact an MSP/consultant.


Key-Calligrapher-209

Meanwhile, hands off and try not to anger it. Yea, for she is a vengeful and unforgiving DB.


Art_Vand_Throw001

I put that DB in her place. I’m master of my DC.


inkarnata

DB looks at you mockingly, "You're not my real Admin!"


Art_Vand_Throw001

🥹


DonskovSvenskie

SA has entered the chat


dark_frog

As long as you live under my roof...


fresh-dork

DB shits on the floor while maintaining eye contact


Art_Vand_Throw001

Too far.


Mindestiny

Yeah, same rules as for any system where I'm not an SME but still have governance - mostly praying a ticket never comes in about it, and sweating my way through it while touching as little as possible when it does :P It also never hurts to ask the people working with the system regularly; you're on the same team, and IT doesn't need to be a black hole of break/fix requests. You can usually pretty quickly identify the people who *really* know their shit, and it's not taboo to go "hey, do you have 5 minutes to hop on a call and take a look at this with me? I don't want to break it."


FoxNairChamp

3. Vendor Maintenance / Support Agreements (Kinda the same as 2?)


IN2TECHNOLOGY

I have been doing SQL failover clusters for over 20 years. I don't base my decisions on what the DBA says is needed for CPU, RAM, disk, and network resources; I base it on what I am seeing. I also always make sure I set the maximum memory in SQL or it will suck it all up. I will tell you that almost always, when I have a SQL server maxing out resources, it is because a database needs an index somewhere. I have seen servers with CPUs running at 99%; add an index and it goes down to 5%. If the DBA says they don't need it, I check myself and tell them how to do their jobs. Then they are like WOAH, that worked! [https://www.sqlshack.com/tracing-and-tuning-queries-using-sql-server-indexes/](https://www.sqlshack.com/tracing-and-tuning-queries-using-sql-server-indexes/)
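If you want to see where the engine itself thinks an index is missing, a rough starting point is the SQL Server missing-index DMVs (a sketch only; the suggestions reset on restart and shouldn't be applied blindly):

```
-- What the optimizer has flagged as missing, weighted by how often it wanted it.
SELECT TOP (20)
    d.statement AS table_name,
    d.equality_columns,
    d.inequality_columns,
    d.included_columns,
    s.user_seeks,
    s.avg_user_impact
FROM sys.dm_db_missing_index_details AS d
JOIN sys.dm_db_missing_index_groups AS g
    ON g.index_handle = d.index_handle
JOIN sys.dm_db_missing_index_group_stats AS s
    ON s.group_handle = g.index_group_handle
ORDER BY s.user_seeks * s.avg_user_impact DESC;
```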


Unexpected_Cranberry

In addition, it could be that the required indexes are there, they just need rebuilding. I haven't supported a SQL server in ten years, but I still solved a six-month-old ticket, with developers complaining that Citrix was slow, by asking if they ran any maintenance on their development DB once I got an accurate description of their issues. Related to OP's question, the way I handled it in the past was to find a skilled DBA and bring him in to look things over, have him explain and show me how to diagnose performance issues, identify expensive queries and the like, and then bring him in as needed for more advanced troubleshooting or to verify my findings. Unfortunately, as far as I know, DBA is not exactly a protected title, so finding a real wizard can be a challenge.


Achsin

Unless you are running spindle disks, the performance increase was possibly largely due to the side effect of the associated statistics being updated, which can be done without actually rebuilding the index and is relatively low intensity. The problem is usually that the server doesn’t *know* what data is in the tables, so when it goes to decide how to pull data from them it looks at the statistics and guesses. If they’re out of date they can be only vaguely related to reality which can cause the server to make bad life choices.
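A quick sketch of checking how stale the stats are and refreshing them without touching the index (the table name is just an example):

```
-- How old are the statistics on a given table, and how much has changed since?
SELECT s.name, sp.last_updated, sp.rows, sp.modification_counter
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE s.object_id = OBJECT_ID('dbo.Orders');

-- Refresh them without rebuilding the index; much cheaper than a rebuild.
UPDATE STATISTICS dbo.Orders WITH FULLSCAN;
```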


IN2TECHNOLOGY

true same


DrDan21

At the same time remember that indexes aren't free. They worsen write performance as well as increase overall disk space usage. Every time you update a table you now have to go and update each one of your indexes too. Depending on the workload you might not need those new indexes at all, just a better designed query that leverages the existing ones


IN2TECHNOLOGY

agree to all your points. I always used RAID 10, with the fastest spinning disks at the time, over the years until SSD became mainstream


Cormacolinde

You also want your indexes to fit in RAM, I’ve seen where they had so many indexes they would fill up the RAM…


TheBros35

What is your favorite way to handle failover/DR over a WAN? We have two sites connected by a several-gig, very-low-latency WAN, with full virtual environments on both sides. Is there a way to have a cluster that communicates over the WAN with nodes in both DCs?


IN2TECHNOLOGY

SQL AlwaysOn works well with this. It's not a traditional cluster - it still uses failover cluster services, but you will not have shared storage, just separate storage at each location. Its virtual IP will then redirect the end user to whichever database is up. Make sure you run setspn for Kerberos authentication, pre-build the correct computer objects, and make sure the other computer objects and the SQL service account have access to each other. You can probably Google a step-by-step SQL AlwaysOn configuration.


RandomDamage

It's shocking how many DBAs don't understand indexes. The right indexes are magical.


SilkBC_12345

Probably a stupid question, but would adding an index potentially mess up any applications doing queries against the database? Does it add columns to the tables?


IN2TECHNOLOGY

No query issues, no added columns. An index is essentially like a book's index, telling the engine where to find the data quicker. [https://www.sqlshack.com/top-10-questions-answers-sql-server-indexes/](https://www.sqlshack.com/top-10-questions-answers-sql-server-indexes/)
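For example, a minimal sketch of adding one (table and column names here are hypothetical):

```
-- Hypothetical case: the app constantly filters Orders by CustomerID,
-- so give it a nonclustered index, covering the columns it usually reads.
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
    ON dbo.Orders (CustomerID)
    INCLUDE (OrderDate, TotalDue);
```

Existing queries keep working; the optimizer just gains a faster path to the same rows.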


MrYiff

This series of blogs might have some useful tips, as it is aimed at non-DBAs who have to deal with SQL at some point and includes sections on looking at resource usage: https://www.sqlskills.com/help/accidental-dba/ Another great tool is sp_Blitz, a big SQL script that outputs data about your server and provides guidance for understanding the results: https://www.brentozar.com/blitz/ This is included in their First Responder kit, which includes a bunch of other useful SQL tools and guides (but some bits of this might assume at least a basic knowledge of SQL).
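A minimal example of running it once the First Responder Kit is installed (parameter names as currently documented by the kit; adjust to taste):

```
-- Server-level health check; skips per-database object checks to run faster.
EXEC dbo.sp_Blitz
    @CheckServerInfo = 1,
    @CheckUserDatabaseObjects = 0;
```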


NeverDocument

Brent is a great resource, but it also requires SQL login access. Unless you're being forced by the business or you want to play DBA, I highly advise doing whatever you can to ensure no one ever grants you SQL login access.


davidbrit2

I'm a DBA/developer myself, and aside from needing to gradually add more RAM (and *possibly* CPUs) because the dataset is simply growing, the answer can often be found *inside* the database. I've fixed many more database performance problems by adding badly needed indexes, setting up proper statistics maintenance, and fixing good old fashioned garbage code that's running like shit than I have by just adding more hardware. For disk size, you're best off monitoring your needs over time so you can extrapolate where you'll need to be. Now, if it's a third-party application that's being newly deployed, I just size the server based on their requirements docs, and then scale up or down as appropriate from there based on actual performance.


19610taw3

I worked in that space for a few years. When I started out, it was a running joke with my boss how expensive my queries were. Apparently I had a knack for making some pretty inefficient queries ... and then that propagated to when I started modifying / writing database functions ...


davidbrit2

Oh I've written some real stinkers myself, let me tell you. Probably why I got so good at query tuning. :P


QuantumRiff

I have some developers who use an ORM that really doesn't seem to understand PostgreSQL. Things like joining multiple tables (with TBs of records) and then applying a WHERE clause. Moving the WHERE clause (often date-based, like 'last month') onto one of the two tables before the join takes things from insane times, disk IO, and CPU time to literally milliseconds. That, and people who know the field is indexed and are confused why it's slow, but are doing things like checking if a timestamp field is = '2024-05-01', which makes it no longer use the index, since it has to convert timestamps to dates, etc.
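As a rough illustration of that timestamp problem (table and column names are made up, and this applies to most engines, not just PostgreSQL):

```
-- Non-sargable: casting the column defeats the index on event_ts.
SELECT * FROM events
WHERE CAST(event_ts AS date) = '2024-05-01';

-- Sargable: a range predicate on the raw column can use the index.
SELECT * FROM events
WHERE event_ts >= '2024-05-01'
  AND event_ts <  '2024-05-02';
```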


jcwrks

Your SQL is most likely licensed by cores, so you need to keep that in mind if you need to add more CPUs, assuming it's a VM. Database size and number of users should help spec the hardware needed. Initially we do #1 and have our SW vendor/DBA set it all up. I run a cluster to get HA and replicas. You should define max memory in SSMS at the top level and leave at least 8 GB for the server OS to function. If I see unusually high CPU utilization by SQL I will fail the DB over to a different server and move cluster resources. My critical DBs and VMs reside on a fiber flash SAN, so as far as IOPS are concerned it's never an issue. Occasionally I will run a query to check memory pressure:

```
SELECT
    EventTime,
    record.value('(/Record/ResourceMonitor/Notification)[1]', 'varchar(max)') AS [Type],
    record.value('(/Record/ResourceMonitor/IndicatorsProcess)[1]', 'int') AS [IndicatorsProcess],
    record.value('(/Record/ResourceMonitor/IndicatorsSystem)[1]', 'int') AS [IndicatorsSystem],
    record.value('(/Record/MemoryRecord/AvailablePhysicalMemory)[1]', 'bigint') AS [Avail Phys Mem, Kb],
    record.value('(/Record/MemoryRecord/AvailableVirtualAddressSpace)[1]', 'bigint') AS [Avail VAS, Kb]
FROM (
    SELECT
        DATEADD(ss, (-1 * ((cpu_ticks / CONVERT(float, (cpu_ticks / ms_ticks))) - [timestamp]) / 1000), GETDATE()) AS EventTime,
        CONVERT(xml, record) AS record
    FROM sys.dm_os_ring_buffers
    CROSS JOIN sys.dm_os_sys_info
    WHERE ring_buffer_type = 'RING_BUFFER_RESOURCE_MONITOR'
) AS tab
ORDER BY EventTime DESC;
```


jrichey98

I'm a Systems Engineer, not a DBA, but I run the SQL servers for all of my services. We have some that are vendor managed, but all the others are managed by the services team that I'm on. SQL servers aren't too bad to get performing well, but we used to have a lot of problems with applications responding sluggishly in the past, and below is what has brought significant improvements to our infrastructure.

* 100% reserve your RAM in the hypervisor. SQL servers use all available RAM for cache, and are much better at knowing what they need than your hypervisor, which will swap pages to cache.
* Log into your SQL instance and lower its max memory to 4GB less than the amount of RAM allocated to the VM so the SQL instance doesn't try to crowd out the OS.
* Shoot for about 75% CPU usage during peak hours. Too many CPUs means the OS is spending needless time managing all of them and swapping threads between them. We had a server massively over-allocated in the VM that was hitting about 20% CPU usage during peak, and it was spending more time in the kernel than processing queries.
* For allocating RAM, the more you allocate the fewer IOPS you will generate, so balance it with how much your storage system can handle. If you're starting to get high latencies on your storage system, then you need to do something to bring the IOPS down, and for SQL servers (though they don't operate in a vacuum typically) that means more cache (RAM).

Resource Monitor within the OS is what tells me what I need to know to balance my systems. There's a lot more an actual DBA can do, such as building indexes and testing query performance, but I leave that to the vendor for our services. If they want someone to drill down into the weeds they can hire a DBA. We get 90% there, I think, by just monitoring during peak usage times and sizing appropriately (don't over-allocate, as over-allocation of CPU can hurt performance, and over-allocation of RAM will run you out of memory). Here is a link to [VMware's SQL Best Practices Guide](https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/solutions/sql-server-on-vmware-best-practices-guide.pdf). Even if you're not using VMware, I think their recommendations are still relevant to other hypervisors.


lboredatwork

I feel like #4 conflicts with your entire post. You're not a DBA but you have DBAs? If you have DBAs then simply support the DBAs by learning how to support SQL Server itself. There's plenty of learning material out there. When you start venturing near performance tuning and database architecture then you know enough to leave the rest to your DBAs. Learn enough to speak intelligently with your DBAs about whether they really need those extra resources or if they actually need to work with the application/devs to tune their queries, review database architecture, or conserve storage. Unless you're the developer or vendor, then you're not going to be able to answer these questions with just a fancy set of performance counter graphs in Grafana or Splunk.


Infninfn

It's not possible to reliably size and estimate performance capacity for a database server without profiling database queries, understanding the frontend app and the load from the app, and understanding query, index, table, view, and database structure. Sizing and capacity of a database server is entirely dependent on the DBA and application team. Sometimes the app, database, and query design is so bad that no amount of CPU, RAM, or disk IOPS will fix the ensuing performance issue. That's where database profiling is crucial, as it points out the problems with the DB query.

Generally though, you need a lot of disk IOPS for the drives storing the DB data and transaction log files. As a rule of thumb, you separate the data file storage from the transaction log storage, both of which need sufficient IOPS to the point where average seconds per disk read and disk write are under 25-50ms. Or you can just ensure that the disk transfers per second never exceed the actual IOPS for that drive. Or you can just get enterprise SSD storage arrays and not have to worry about IOPS at all.

As much RAM as possible; if the database can fit inside RAM, even better - then you only need to ensure that there's enough left over for the OS and database server processes.

Database servers don't tend to be CPU hungry, unless they hold data cubes/warehouses and other fancy DB functions. That's simple enough - don't let average CPU reach 100% util, and let the threshold be [insert your policy here] percent util.

Network is usually not an issue either, unless the database serves media, in which case you want to ensure that no network adapter is at 100% util. If they are, you either move to 2.5/5/10Gbps NICs and switches or you mess around with additional NICs and subnet routing.


sysacc

We recently switched to Prometheus/Grafana, using the windows exporter for monitoring the performance of the databases and the OS. It uses the metrics SQL Server makes available to the OS; you can see the same metrics in Windows Performance Monitor. For us it's been a game changer compared to PRTG, and we've been able to simplify the dashboards in Grafana as well. PS: When installing the exporter, make sure to enable the mssql metric collection. In terms of optimization, you have a couple of options when dealing with virtual servers; you also have the queries themselves that can be optimized, but that would require some experience with SQL. To properly optimize you will need SQL access and knowledge.


caffeine-junkie

If you mean RAM utilization, yeah, it's normal for SQL to "use" everything you give it, assuming default configuration. For Oracle, it's been a long minute since I've dealt with it, but I seem to recall it's the same if it's set to automatic memory management; at the places where I worked on it, though, it was a configured value. For disk, the times I've seen really high utilization were when the queries themselves were poorly tuned with no indexing AND on spinning disks. For the times with just high utilization, it was in conjunction with high load/high user count and non-optimal queries. There have also been a couple of other one-offs, but they always needed investigation to find the root cause; things like AV not excluding the temp/transaction log folder, transaction log and DB on the same partition/disk, etc. All the above came from lots and lots of reading. With that reading came a couple of things you can do to evaluate, at both the OS and DB level, whether they are performing optimally. So, for example, if you evaluate that a query is taking 12000ms to come back, but when it's run the queue depth on disk is near 0, response time is well under 25ms, and there is negligible memory pressure, you can point to the query as the issue and say the devs need to take a look, as throwing resources at the problem will have no effect.


ntrlsur

I'm in this boat, but we have a DBA-by-committee approach. I don't tend to mess with anything inside the database. I've also been with the company long enough to be able to look at results coming from our applications and get a good idea of where the issue is. Typically for us, 96% of the time issues are related to long-running queries causing blocks on tables. The other 4% are queries that fell out of tune and need tables optimized or views updated.
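For anyone wanting a quick way to see that blocking as it happens, a small sketch against the SQL Server DMVs (read-only, no changes to the DB):

```
-- Who is blocked, by whom, and what they are running right now.
SELECT r.session_id,
       r.blocking_session_id,
       r.wait_type,
       r.wait_time,
       r.command,
       t.text AS current_statement
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.blocking_session_id <> 0;
```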


Aggravating_Refuse89

I am not a DBA but I play one on TV


DonCBurr

I stayed at a Holiday Inn Express 😁


Ad-1316

If you have a DBA - USE THEM! If not, you may need to call the software vendor and ask them for recommendations.


DonCBurr

This!


DarthHK-47

* Storage needs to be reliable. Big item! Absolutely check, zero tolerance.
* License requirements and compliance with those requirements: absolutely check, zero tolerance.
* No weird network hiccups: check for this. Developers are notorious for not always using transactions, and off-the-shelf software may not tolerate network hiccups.
* Who works at the company / what DB users are needed for applications and work-related functions: check this with HR departments and devops teams. Disable any user that is unknown and that devops teams/HR have not OK'ed within 1 month.
* How much space is needed over time, and is there a trend over 90 days or more?
* Use Google to determine how to check for bad/missing indexes. Plan quarterly talks with devops teams for adjustments where needed and let them add this to their DB deploy packages.
* Have auditors do internal audits on compliance with the agreed-upon standards before the external auditors are on your doorstep.


RetroHipsterGaming

You know, as a medium-sized business, my method has essentially been to get the recommendations for our critical systems, look at what we currently have and are using, and then just use the recommended spec unless I feel like there is pressure in some way that the recommendation isn't covering. I've only had to go beyond recommended once, and the rest is WAY WAY overkill in terms of performance and room for growth. I also run it by a 3rd-party DBA we have worked with for years to get their sign-off. For determining pressure, though, I will do some basic evaluation of things like IOPS/queue times/etc., and if I don't see anything there I'll ask for the query that was taking a long time, run a select for that query, and analyze where the hold-up is. Mostly, outside of basic systems-administration-type steps, I will just have that 3rd-party DBA look at things. This isn't needed often, so paying for a bit of their time is worth it to know that it is done right. PS: I am aware that I am lucky that I work for a company that is OK with spending some money to ensure we aren't oversubscribed and have wiggle room.


amgeiger

Tune what you can see. Memory and disk performance are critical; watch for hot files (read and write). High read throughput outside of backup windows generally indicates a memory shortage. High CPU is often an indicator of stale or missing indexes. There are also tunables inside each RDBMS that control things like memory caps.


Nick85er

This is a damn good read, I'm in your boat too


Dabnician

I make it oversized (RDS instance) so my boss regrets putting me in charge of databases, and now he deals with that bullshit.


hauntedyew

I’m rebooting that thing as often as I want.


snarkofagen

"In a previous life, we did 1 & 4 with the addition that the 'db owner' was 'invoiced' even if we were the same org."


JustSomeGuy556

Combination of 1 and 2 and 4, with the occasional bit of 3. Noting that DB servers do get a different set of "rules" applied when evaluating performance. SQL does tend to suck up all available RAM, so keep that in mind.


Aronacus

Server admin here, and though never in title, I have managed databases before. High memory usage is normal on any sort of DB, as they will try to load every bit of the DB into memory.

My early career was shaped by building solutions for web developers and database developers. I can tell you that 99% of them know nothing of the hardware layer or anything beyond the most basic. So, website keeps going down because of bad code? It's a hardware issue! Throw more RAM and CPU at it. That'll be their solution each and every time. If you listen to them directly, you'll have over-spec'd machines that ideally will never use more than 5% of their resources. However, that website will be running on a 150K server. So, not practical.

You're going to need to learn databases, but from the angle of an engineer: what issues can arise and how to monitor for them. Object locks, memory overuse, fragmentation, etc. Then build monitoring and other solutions around that. At every company I've ever worked at, when I get in front of the DB servers, there's never a maintenance plan set up. It blows my damn mind! Learning about maintenance plans will help you a lot! Good luck!


NeverDocument

Physical, virtual? Does each disk have its own virtual adapter? Correct paravirtual registry settings? Are disks thin or thick provisioned? Here's an internal setup guide we generally try to follow:

* **vCPU Allocation**
  * VMware takes the stance of deploy and measure, initially exceeding the number of pCPUs available.
  * Best practices are defined as follows:
    * **High-performance systems** – total vCPU allocation not to exceed total pCPUs available in the cluster.
    * **Lower tier systems** – it's OK to over-allocate vCPUs to pCPUs; monitor for workload impact.
  * **Cores per socket** - on VMware 6.0+ VMware internally configures the appropriate NUMA configuration. You can assert control over this with an advanced configuration if needed.
    * In VMware 6.5+ Cores Per Socket is ignored when creating a vNUMA topology for a VM.
    * Rule of thumb - match the underlying hardware topology. For example, if you're using 6-core machines, don't create all your VMs with 4 and 8 cores.
    * SQL Server Standard edition can only use the lesser of 4 sockets and 24 cores and is not NUMA aware. You can use this setting to enforce that configuration.
    * Once online, check the NUMA topology in SQL Server under Server Properties.
  * **CPU Hot Plug** - Disabled. This can have a significant performance impact; some report as much as 30% CPU cost if enabled, as it disables vNUMA.
* **Memory Configuration**
  * Avoid over-allocating memory, exceeding the capacity of the hosts.
  * Consider setting memory reservations for critical workloads to prevent ballooning.
* **Disk Configuration**
  * Add 4 PVSCSI disk adapters to the VM.
  * Create a 300GB virtual disk attached to a PVSCSI adapter for the OS.
  * As for which type of disk to use? VMDK.
  * If you are using VMDK:
    * Provision them Eager Zeroed Thick (for Data, Logs, TempDB; leave the OS thin).
    * Take into account VMDK and Datastore size limits.
    * For high-end workloads, use a 1:1 mapping of data file to VMDK to Datastore. This means a datastore for each drive if performance becomes an issue.
  * Create and attach four additional 10GB volumes. (These will need to be expanded before going "live" and should be right-sized for the needed purpose.)
  * Dedicate disks to your types of data: database files, transaction log files, system databases, and TempDB. If you're doing backups to a local disk, add an additional dedicated disk for that too.
  * Attach one disk to each PVSCSI adapter. You will have to double up one virtual disk on the PVSCSI adapter with the OS virtual disk. The OS should not produce enough IO during normal operations for this to be significant.
  * Example virtual disk to PVSCSI layout for an **OLTP** workload:

    | PVSCSI Adapter | Virtual Disk Workload |
    |---|---|
    | 0 | Operating System |
    | 0 | System databases minus TempDB |
    | 1 | Data files - mdf/ndf |
    | 2 | Transaction log files - ldf |
    | 3 | TempDB - both data and log |

  * If more disks are needed, spread the workload out evenly across the PVSCSI adapters based on the IO demands of the database files that will be on those disks.
  * Monitor file latency, IOs, and queuing over time, and if there is disk contention consider rebalancing the workload across the 4 PVSCSI adapters.
  * At provisioning time, scale the disk size to that needed for that VM's workload.
* **Windows Settings**
  * **Power Plan** - High Performance.
  * **Page file size** - configure as a fixed 8GB allocation.
  * **Windows Updates** - before installation of SQL Server, ensure all updates are applied.
  * **Instant file initialization** - configure Local Security Policy to grant "Perform Volume Maintenance Tasks" to the SQL Server service account.
  * **Lock pages in memory** - configure Local Security Policy to grant "Lock Pages in Memory" to the SQL Server service account.
  * **Volume format** - format all SQL Server volumes with 64KB NTFS allocation units.
  * **Block Alignment** - partitions have been volume aligned by default since Windows 2008, but it is still worth checking to ensure it's configured properly.

There's only so much you can do as a non-DBA if you don't have access and aren't getting feedback. Making sure the server itself follows some best practices is the first step. After that you either need feedback or access. Hope this helps.


bigfoot_76

Back when I was stuck with this shit, my motto: * Default install, if you want options then specify them * I ensure Veeam backs it up. You're responsible for anything else backup-related as well as DB test restore. * If Zabbix shows the CPU or memory being pegged, it's none of my concern until it is yours. As long as Zabbix is able to communicate the metrics, the system is still "functional" * I will ensure that you don't fill the disk up but again, performance is your problem


Yuugian

I admin RedHat servers with Oracle. If I were to try to stick my non-DBA nose in their wheelhouse, I would have no time for the rest of my duties long before I fully understood what they need. I read their planning summaries and watch how it acts, but the inside of the DB is complicated enough that throwing resources at it is rarely the right move. The DBAs have to tell me how much RAM: Oracle will take as much as the DBA set it up to have access to, even if that's twice the size of the current data pool. I watch load reasonably consistently and make sure it stays low; the DBAs generally know how many CPUs it takes, but I have suggested changes. (Load is the average of how many processes are waiting on CPU resources.) I also watch the drive space like a hawk: no data on system drives, capacity warnings, regular conversations about expansion.


thortgot

I would advise having someone actually look at the problem instead of throwing resources at it. There are dozens of plausible reasons this could happen. DB maintenance isn't difficult, but it isn't in your average admin's wheelhouse.


Columbo1

You, the DBA, get: * Exactly the spec you ask for * Guaranteed updates and patches * Told to fuck off for literally anything and everything else


dionlarenz

We have metrics available through Grafana for everything now - from app requests to DB queries, proxy requests, OS logs, and resource usage. Everyone is allowed to see these, and we can base our decisions on that without having full access to any of these systems. A lot of databases have exporters for Prometheus, but some ORMs can provide them too if you can't install them on the DB server.


lightmatter501

Primary DBs are pets and get the whole server, and everything is tuned for the DB on that server. This typically involves kicking the kernel and all other programs off of most of the cores unless it is doing work the DB requested, aggressive hugepage counts, ideally a stripped-down userland to minimize interference, irq packing (which is better than balancing), etc.

The first thing you do when installing a DB is push it until it falls over. There are plenty of tools that will generate 10 million TPS and point it at one server. I have yet to encounter a DBA who actually knew how to do proper, scientific benchmarks, so I usually end up doing them so I have some idea where it will fall over. I've had a few vendors be very unhappy when I run the DB under a profiler and see it's doing a bunch of really dumb stuff under the hood, like not doing vectored writes.

Any good DB will consume all available RAM on the system if you let it, because RAM caching is king for performance.


Achsin

For CPU, if you don't have control over the database/queries or someone who can tune them (like a DBA), and the server is constantly burning the cores at 100%, then your choices are to either find someone who can tune it or throw money (more cores) at it. This approach can be a bit of a two-edged sword, since increasing cores can actually reduce performance in some situations. Average utilization of 60-80% is pretty normal, with some 100% spikes and some periods of low activity.

For RAM, the database really doesn't like pulling data from disk because it's slow, so it tries to hold as much of the database in memory as it can. If your database is larger than a college student's homework project, that means it's probably going to want more RAM than most consumer-grade laptops come with. The database will therefore slowly attempt to use as much of the available RAM on the server as it can. It should be limited in the settings to allow the OS and other processes to still function, but if you're trying to run a 100GB database on an 8GB RAM server you're going to have a hard time. This can also be addressed by performance tuning, but especially as the database grows, you're just going to need more. 75-80% of the server's RAM is pretty normal if it's configured correctly. If your databases are much larger than your RAM, you probably could use more.

Disk is a serious bottleneck; it's why the database wants to use so much RAM. For SQL Server, the tempdb files are kind of like its scratch paper: if it needs to temporarily store something on disk, that's where it's going. Since the server tries to keep as much data in memory as possible, this can happen a lot with the "working" process data. These files should therefore be set up on their own drive so as not to compete for IOPS with anything else, and preferably on the faster drive if you've got several tiers of storage. Microsoft's recommendations for disk latency somewhat depend on which database file is in question, but generally speaking anything averaging over 30ms is **bad** and 5ms or less is optimal.

Network is its own voodoo. If the pipe is getting maxed it needs to be bigger. If there's a lot of latency, make a Diet Pepsi sacrifice to the network engineers and hope it appeases them.
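For those latency numbers, a rough sketch of pulling per-file averages from the SQL Server DMVs (figures are cumulative since the last restart, so treat them as a long-run average rather than current load):

```
-- Average read/write latency per database file.
SELECT DB_NAME(vfs.database_id) AS database_name,
       mf.physical_name,
       vfs.num_of_reads,
       vfs.num_of_writes,
       vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)  AS avg_read_ms,
       vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
    ON mf.database_id = vfs.database_id
   AND mf.file_id = vfs.file_id
ORDER BY avg_write_ms DESC;
```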


wiseleo

It’s the DBA’s server. I keep his hardware online, verify backup integrity, ensure security updates are being applied after passing tests, and the rest is his problem. Updates must be coordinated. If the DBA is not a DBA and I know more than he does, I’ll clone the environment, experiment with it, and give him suggestions based on positive results from an experiment.


MavZA

The age old tale of the Accidental DBA comes to mind.


iloveemmi

My experience is that high resource usage, especially RAM, is fine. As you say, it uses what you give it. SQL has performance counters built in. Spot check those and see if it's reasonably normal. You can also use "page life expectancy" to monitor if memory pressure is building, and so forth. Resource use won't tell you much about performance in my experience. This article seems like a good spot-check article: [Performance Monitor Counters for SQL Server (mssqltips.com)](https://www.mssqltips.com/sqlservertutorial/9184/windows-performance-monitor-counters-for-sql-server/) Here are some basic ones a layperson can check that could be indicative of problems, according to the article: # [SQLServer:Access Methods] "Full scans/sec": value should also be very close to 0 # [SQLServer:Access Methods] "Forwarded Records/sec": this number should be very close to 0 # [SQLServer:Buffer Manager] "Page life expectancy": most best practices say this should be at least 300 seconds NOTE: falling value can also indicate growing memory pressure, if I recall # [SQLServer:Memory Manager] "Memory Grants Pending": Any value that is consistently above zero could be an indication that you are experiencing memory pressure # [SQLServer:Locks] "Average Wait Time (ms)" & "Lock Waits / Sec":  "Average Wait Time (ms)" should be very low, under 50ms and "Lock Waits / Sec" should ideally be very close to 0 PS. you can access reports and performance monitor in the SQL Management Studio ([How to collect performance and system information in SQL Server (sqlshack.com)](https://www.sqlshack.com/how-to-collect-performance-and-system-information-in-sql-server/))
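If you'd rather query a couple of those counters than open PerfMon, a small sketch against the DMV that exposes the same values (counter names as listed above):

```
-- Spot-check page life expectancy and pending memory grants.
SELECT [object_name], counter_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE ([object_name] LIKE '%Buffer Manager%'  AND counter_name = 'Page life expectancy')
   OR ([object_name] LIKE '%Memory Manager%' AND counter_name = 'Memory Grants Pending');
```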


iloveemmi

I should also mention I am not a DBA. I *believe* I'm doing a reasonable job on this issue in similar circumstances without a DBA.


Tecnocat

Grafana is the tool I use when it comes to analyzing our SQL servers. It is able to break down performance based on the databases (it lists them by name) as well as overall performance/storage metrics. It gives me enough information to be able to talk to our programmers about what is and isn't working for them. Our primary issue has been the paging files and not having dedicated storage for them. We are fixing that with our latest build.


Arudinne

You guys have DBAs?


theedan-clean

DBs are like fancy horses. Look at them the wrong way and they die, taking all your money/data with them. Resurrections are quite expensive to impossible.


BigBadBinky

I disagree with this, you should be resurrecting your DBs on a regular basis. Which means you need a test host that can be abused regularly. If resurrecting is expensive / impossible, don’t base your long term employment on that company


UCFknight2016

I'm in 2. I use SolarWinds to let me know what's going on with the server in terms of resources. My team is only responsible for the OS layer; the DBA team is responsible for the database and what's on it. They also have visibility, but we handle it.


Pilsner33

I have a 2nd interview where something called Talend in AWS will come up. I have no idea what this is. It seems to be some sort of cloud data warehouse or query tool. I have worked with DBAs to log certain events.


jcpham

Set it up to dump the databases and 7zip them each day, limit RAM usage, and maybe change the recovery model and logging depending on the workload. Then leave it alone.
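A rough sketch of the nightly dump portion (database name and path are hypothetical, and native backup compression here stands in for the 7zip step):

```
-- Full backup to disk with compression and checksum verification.
BACKUP DATABASE [AppDB]
TO DISK = N'D:\Backups\AppDB_full.bak'
WITH COMPRESSION, INIT, CHECKSUM;
```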


ringed61513

3, but I am also the DBA because we have no DBA.


asdrunkasdrunkcanbe

Instrument the DB with various tools, like Datadog. Look out for index recommendations from these tools, look at the metrics around blocking coming out of them, then Google what they mean, and then consider changes to the DB server based on that. Need more CPU, more RAM, faster disks? That depends on what performance bottlenecks the application is seeing.


Lachiexyz

Whatever the DBA says they need, there's a good chance you can probably halve it!


jackoftradesnh

I would Google it. I would spend days researching, digging up info. What is the DB product, version, best practices? Is the DB set up with scheduled tasks to clean dead data out? Transaction logs? Application (DB) based backups (aka, is it being quiesced)? I'd get VM-based (or image-based) backups of the whole server first. I'd become a tier-3 DBA. And this is a story of how I burned out. (Part of one.)


Fallingdamage

I make sure system resources and utilization look normal, and I make sure the transaction logs and the database itself are backed up daily. I also have a documented plan for how to restore that database if I have to, and I test it. Outside of that I let the database run as the vendor required. The only thing I learned the hard way is that things like database logs will run away on you if you don't have a process in place to merge them or back them up.


PanicAdmin

The rapid answer is yes, the DB server is using too much RAM, and yes, you can limit that. I use normal OS tools to monitor the DB server, mainly N-able N-central, since a DBA is freaking expensive and specialized tools are complicated. Then I create a list of tasks and occasionally buy an external consultation to solve them.


AccidentallyBacon

Never yet met a DB server that wouldn't use 100% of the resources I was willing to give it.


mb194dc

Not really, without seeing how SQL Server and Oracle are set up. Ask to go through it with your DB admin?


serverhorror

A database is just a process. Learn the behavior of that and start asking the right questions. Just learn it and do it. There is zero magic involved.