Blueprinting Your Database Landscape
Crafting a map of your database servers can help you build better systems, from increased security to solid disaster-recovery plans. Take these steps to get your server network in order.
The scene is this: you are an Architect, Technical Manager, or DBA taking over an existing database group at a medium to large company. It would be surprising if someone handed you a 50-page document summarizing all aspects of this core Information Technology group: How many database servers are on the grid? Where are the backups going? What applications are using these database servers? What are the skill sets of the people in the group: You need the answers to all these questions and more, and this article will help you construct such a blueprint.
I have broken this guide into five sections: Inventory, Disaster Recovery, Security, Scalability, and Looking Ahead. There are a few critical items to look at immediately, so first you must ask yourself these questions: Are all the databases getting backed up, sent off site and restorable? Has someone tested each and every backup? Where is the sensitive data stored, and is it secure? (If Social Security numbers or credit card numbers are exposed, close that door as fast as possible.)
Although volumes could be written about each of these topics, I will cover these at a very high level. My goal is to give you a context in which to manage one of your corporation's most valuable assets: data.
Inventory
In a large company determining, how many RDBMS servers you have can be difficult. If you're lucky you have one data center and all the servers are sitting on a raised floor with their names printed on them. More likely you have a mainframe or two, some mid range mini computers, and multiple servers with a variety of operating systems spread throughout the world. Finding the mainframes and mini-computers should not be that difficult, but finding the desktop versions of Oracle, DB2, or SQL Server can be like an Easter egg hunt. The worst-case scenario is you're looking at production databases running from under someone's desk.
There are tools that can help you track down these "renegade" servers. SQL Server comes with a command-line tool named OSQL.exe that, when run with the -L option, will list out all the servers it can find. It will not report on 100 percent of them, but it's a start. With Oracle you are looking for the Listeners; check out http://www.petefinnigan.com/tools.htm for a list of free and commercial tools to help you identify and secure Oracle instances. A commercially available product from Application Security Inc is App Detective, which can find all the database servers on your network.
The goal is to create a list that gives you an overview of every database server. Here is a list of potential items that would be helpful for your roster:
Name: (name of the server)
RDBMS: (Oracle, DB2, SQL Server, MySQL, and so on)
RDBMS Version: (What version of the database engine is running?)
Operating System: (Windows, Linux, OS/390)
OS Version: (Windows 2003, Redhat 7.0)
Storage Fabric: (SAN, DASD, NAS)
Storage Capacity: 100 GB
Storage Free Space: 40GB
Location: Computer room in Cleveland
Production/Test/Development: Production
IT Owner: Jim Smith, Finance, IT
Sensitive Data: Yes
Mission Critical Data: Yes
Hardened: NO
Available From Internet: YES
After you have located the entire RDBMS server population, the next step is tracking down the consumers of the data on these servers. Keep in mind the large variety of sources that could be accessing the databases, from Web applications to desktop ODBC connections and office productivity tools such as Microsoft Office, Louts Notes, and so on.
A good starting point is to examine the logins and connections being used from the RDBMS engine. If it needs to connect, it must have a login, either through an operating system account that has been granted access or a local database specific login contained within the engine itself. The task will be difficult and time consuming, but you will need to attach either an application, process, or person to these database connection points.
Determine the data that is leaving the building. In these days of identity theft this is not only good customer service, but there are legal ramifications as well. California is among a growing number of states that require disclosure of any sensitive data that might have been compromised. The easy part should be finding out where the backup tapes are going and ensuring they are encrypted. If you don't have both of these implemented, drop everything and make that happen first and foremost.
You need to get an understanding of, and assign ownership to, the legitimate, ongoing transfers of data to business partners and constituents. This is a little bit like herding cats. Data can leave or enter the building in so many different ways, whether it through e-mails, flash drives, ftp file transfers, notebooks, mobile deviceseven iPods. If someone can read it, they can take it.
Ask these questions: How many database professionals are on staff? How much experience do they have? Are there any data modelers on staff with good business knowledge? Do you have OLTP, OLAP, Database developers, Business Intelligence, backup and storage expertise? The tricky part here is that many IT professionals claimeven believethey have expertise in many areas. In most cases they are not being deceitful, they simply don't know what they don't know. If the group is small enough. an interview with each staff member will probably give you a good idea of what their strengths are. It's also a good chance to hear what they want to achieve in their careers, and most importantly, in what areas do they want to excel. If they don't like data modeling, chances are they will not be good at it.
Identify the core skills already on the team, and the skills you need to develop in the areas of administration, security, transaction processing, business knowledge, and OLTP and OLAP development. Don't forget to look outside of the immediate database team. You might find valuable database-related talent in the developer community and some of these professionals have a good deal of specific and detailed business knowledge.
Disaster Recovery
Not all databases are created equal. Most will probably not support daily business operations. Some databases, such as data warehouses, might be fairly static, with data loaded periodically. We categorize all of our production databases into four categories based on risk tolerance. The critical line of business databases have their transaction logs backed up every five minutes to a separate NAS drive. In the event of a system failure we can restore to within five minutes of the event. Other databases are placed into one of three categories: two-hour, four-hour, and 24-hour backup schemas.
Next, examine where these backups are going. If the backups are stored on a NAS drive within the same computer room, you are protected from a server failure, but anything that shuts down the computer room has put you out of business. Shipping the tapes off site protects you from a building failure, but only within the timeframe these tapes are away. If you send the tapes sent off site weekly and you have a building failure right before the tapes are set to go, you have lost a week worth of critical data.
If the backup has not been tested, it is not a reliable backup. It is a good idea to test the backups on a consistent schedule. A backup that has not been tested is not a viable backup. Keep track of restore times; it will give you a good idea of how long it could take to restore that 100 GB data warehouse in an emergency.
But what will you do if a critical database server gives out? Do you have spare boards, network cards, and disk drives to plug into the hot swappable chassis? Or will someone be scrambling down to the local electronics store or cannibalizing another server looking for an HBA card? Identify and purchase critical hardware components that the team may need in the event of a failure.
Check your mission-critical data and decide whether it is considered sensitive data. Mission-critical data is information that is used for day-to-day operations, but it is not necessarily sensitive. It can be both transactional data (OLTP) and analytical data (OLAP), as some data warehouse reports are critical to running a business. For critical data, security might not be as big a concern as reliability and recoverability. In today's world, a high-availability solution is almost mandatory to safeguard against a single point of failure.
Finally, review your disaster recovery plan. Unless you are starting from the ground up, there is more likely some type of disaster recovery process taking place. It might not be a formal plan with published contact information, an escalation path, and recovery scenarios, but someone is hopefully doing backups and at least sending them off site. This is a good start, but you need to work toward a formal plan.
You can think of the basic protection scheme for disaster recovery as two circles: One is time and the other is distance. The time circle represents how frequently data is backed up to a separate medium other than its source. If you are backing up a critical database once a week, the circle is a week; if anything bad happens within a week, this week's data is gone. You want this circle to be as small as possible. The distance circle represents how far away those backups travel. If they never leave the building, the danger zone is within the circle around the building; the farther away from the building they get, that's how much safer they are in the event of an earthquake, hurricane, or other capricious act of nature. You want this circle to be as big as possiblein another state, if possible.
Security
The term used for a server that is secure, updated with the latest Service Packs, and has tightly controlled access is "hardened." The goal is to harden every server, but it is absolutely critical that the servers where sensitive data lives are hardened. You must pay special attention to any database that contains Social Security numbers, credit card information, and financial and medical records. The DBA staff must ensure that access to this data is locked down. One of the toughest areas to maintain this security is in the test and development servers. Developers frequently need access to copies of the production databases. The DBAs must have a method for scrubbing some of the more sensitive elements.
One clue to the level of security implemented is the number of individuals that have system administrator's privileges. If this number is high, particularly in production, then security is lacking. Determining who has access to what resources is not always simple. Most RDBMS engines allow access to operating system accounts as well as to database specific logins. My theory is that people are much more likely to share database logins over their network accounts, which are usually tied to their corporate identity (specifically their e-mail). These accounts are less secure, more anonymous, and therefore less desirable. All of the accounts connecting to the RDBMS should be examined and given only the bare minimum access.
One good practice is requiring all applications to access data only through stored procedures. If this is in place, broad table-level permissions (select, insert, update and delete) can be removed for everyone. This can help with uncontrolled access and also aid in the fight against SQL Injection attacks. (For a good explanation of SQL Injection Attacks visit http://www.securitydocs.com/library/3348.) A side benefit of this approach is that all the SQL accessing data can be viewed and tuned by the DBAs. If your developers are writing their SQL embedded within the application, typically it would be out of reach of the DBAs.
You must give special care to anything available directly from outside the corporation, especially from the Internet. These servers should be hardened as soon as possible. It's a good idea to turn on auditing for these servers to see who is attempting not only to connect to the databases, but also to the server itself. I have gone as far as setting up paging in the event of a failed login attempt. There are tools available that will examine the RDBMS engine and tell you of any potential security holes. Microsoft has two free downloads available, Baseline Security Analyzer and Best Practices Analyzer, which examine a SQL server environment and report potential security threats.
Scalability
Before you can understand changes in the demands placed on your servers, you need to gather performance baselines. Each of the key database vendors offers toolsets to examine every facet of the RDBMS engine, from buffer pools and tablespaces to locks and procedure recompiles. Although this all can be extremely helpful, you need an overview of each machine's stress levels of the core resources: memory, processors, and disk drives are the first step in building a baseline. For each machine you need to understand its average load. Is it typically running at 15 percent or 95 percent? Are the disk drives taking a beating or is memory in short supply? All of this presupposes that the database engine is the primary process running on that machine. In my experience most shops have severely under-utilized database servers capable of handling much more than their current workload and would be good candidates for consolidation.
Finally, what are the elements that will prevent your database applications from performing well, now and in the future? Typically, the performance of a database application is governed by the design of the data model and the efficiency of the consuming TSQL. A well designed model and an application with proper indexes and well written SQL can support large databases with a surprisingly small amount of hardware. The bad news is that redesigning a poorly written database application is expensive and time consuming. Sometimes the pragmatic solution is to throw hardware at the problem. I recommend giving each database a letter grade based on overall design. If an "A" application is performing poorly, the next step is to find out what hardware resource is the bottleneck: the processor, memory, or disk drive access.
Looking Ahead
Moving forward, you need to look at the big picture. What areas are lacking? Do you have strong skill sets in vital areas, from development to security and administration? Do you have failover on critical application databases? What does the storage fabric look like: would you benefit from adding an SAN array? How is team morale, and what is frustrating team members? State-of-the-art technology does not build a premier database team; committed, trained, and challenged professionals do. How confident are you that the team is heading in the right direction?
Also, set up a schedule for reviewing all production servers. The environment for database servers can change rapidly. Microsoft publishes security updates monthly, and Windows operating systems have a shelf life of two to three years before the next great OS hits the shelves. Make it someone's job to keep tabs on the server population. Have them keep an eye out for critical security patches and notify the team of any changes to the production environment.
Next, make your disaster recovery plan a priority. If you already have a formal, well tested plan, make sure someone is keeping it current, and ensure the plan is tested on a regular schedule. Depending on the critical nature of your business, test everything from a local hardware failure to simulating a major power outage in your city.
If there is no plan in place, make it somebody's job to make a plan, and let everybody know they need to be a part of it. Comprehensive disaster recovery plans are not a trivial task. Unfortunately you will need a cross-section of IT departments and representatives from the Business community. Many boundaries will be crossed: DBAs, network, application development, security teams. The business people simply expect all the hardware and software to be running nonstop 365 days a year. Keeping the business members educated will set their expectation levels in the event of an emergency.
Finally, what is the business telling you about the future? Be sure to listen to the business. They will not tell you directly what type of database support is required going forward, so it's your job to listen and derive that information. If RFID (Radio Frequency Identification) is on the board, you know you will need to ramp up on capacity. RFID can produce huge amounts of inbound data. If hand held devices are a part of your business, training in small footprint databases will help you prepare for the future. Is the company aggressively pursuing acquisitions? If so, be prepared to marry together disparate database technologies.
Most important, be proactive. What can you give to the business that they might not be aware of? A business person might not be aware of the rich information available through business intelligence and data mining technologies, so show them. These are specialized skills that you might need to acquire, but the results can make you and your team stars.