One of the key advantages of web applications is that they can be deployed as a hosted service and accessed over the Internet rather than needing to be locally installed at a customer's site. Over the years I've helped numerous clients build web applications of this sort. When building such an application there are bound to be questions on the best way to model and store data from the different customers that access the hosted application. Do you co-mingle customer data in a single database with a single schema? Do you use a single database, but separate customer data into different schemas? Or do you create a separate database with separate tables for each customer?
These three data architectures are examined and discussed in detail in the article Multi-Tenant Data Architecture. The authors introduce three different multi-tenant data architectures, which they name:
- Separate Databases - in this architecture each customer's data is stored in a separate database. The databases may all be on the same database server or they could be partitioned across multiple database servers. This approach provides for maximum isolation of customer data. If we were building a hosted application using the Northwind database and had three customers then with this approach we would have three databases - Northwind01, Northwind02, and Northwind03 - and each database would have the same tables, views, stored procedures, and so on. Customer 1's data would all be located in the Northwind01 database, while Customer 2's data would be over in the Northwind02 database.
- Shared Databases, Separate Schemas - SQL Server 2005 introduced the concept of schemas, which offer a way to group a set of tables. With this approach you can have a single database with one schema for each customer. Returning to the Northwind example, with this approach there would be a single database, but there would be three schemas - Customer01Schema, Customer02Schema, and Customer03 schema. Each schema would have the same set of tables, views, stored procedures, and so forth. Customer1's product information would be found in the Customer01Schema.Products table, whereas Customer2's product information would be stored in Customer02Schema.Products.
- Shared Database, Shared Schema - here we have only a single database and a single schema. There would be only one Products table. To differentiate one customer's products from another we'd need to add a NorthwindCustomers table that would have a record for each customer and then add a NorthwindCustomerID foreign key to the Products table (and to the other pertinent tables).
The Multi-Tenant Data Architecture examines these three approaches in much more detail with screen shots, T-SQL snippets, and more in-depth examples, and is definitely worth reading.
I'd like to talk a little bit about my “from the trenches” experience with multi-tenant data architectures. First, a little background.
In 2003 I started work with a client we'll call Acme Medical, which was building a hosted, ASP.NET medical software application. This application had been in existence for a couple of years by this point, but as a Microsoft Access application that was installed and run locally from a couple of hospitals and clinics. I joined the project with the aim of moving the application from Microsoft Access to ASP.NET and SQL Server. We chose to use a Separate Databases architecture. I continue to work on this project today. Originally, there were two customers, each with a few thousand records. The data model contained maybe 20-30 tables. Today there are more than 15 customers, each with hundreds of thousands if not millions of records. There are nearly 350 database tables, hundreds of ASP.NET pages, and several automated backend processes.
Another client, Acme Printers, was a prominent player in the print shop marketplace, selling in-house applications to large-scale print shops. I helped this company build an online, hosted version for order placement and fulfillment. We used a Shared Database, Shared Schema data model. In a nutshell, there is a Customers table with a CustomerID primary key value uniquely identifying each customer. Every other database table has a CustomerID field that identifies what data belongs to what customer. When a user signs on, we look up what customer the user is associated with and then store this CustomerID in session. This CustomerID variable is then used in other pages on the site to pull back the data pertinent to that customer.
I have never used the Shared Database, Separate Schema approach. I recently spoke with Michael Campbell about this topic, and he shared the following advice: “I'd recommend against using shared DBs and separate schemas. In my experience that almost NEVER works out as advertised, adds all sorts of difficulty in terms of disaster recovery, management, and isolation, and really doesn't offer any worthwhile benefits. It also becomes absolutely insane in terms of managing permissions/security as well.”
The factors that should influence whether you go with a Separate Database or Shared Database, Shared Schema architecture are not technical, but rather are regulatory-, security-, or business-related. If you find yourself deciding to use one architecture over another because of some technological reason you're probably asking the wrong questions and evaluating the wrong criteria.
Here's how I tackle this problem. I start by assuming that I'm going to use the Shared Database, Shared Schema architecture, as it's usually the best fit for the types of projects I work on. It easier to setup and implement and test and debug and scale up (to a point) than the Separate Database architecture. For the majority of web applications, the Shared Database, Shared Schema architecture is the best approach when weighing just the technological- and development-related factors.
Sadly, in the real-world there are many factors that outweigh technological ones. While the Separate Databases architecture has more friction associated with it than the Shared Database, Shared Schema, it does have certain advantages in terms of security, privacy, isolation, and so on. When evaluating which architecture to use, I like to ask my client the following questions:
How important is the security and privacy of the data?
For the medical software, security and privacy is paramount. Customers (the hospitals using the software) are entering social security numbers, lab test results, diagnoses from specialists and psychologists, and so on. The data is rife with very personal and sensitive patient information, not to mention information about the health care practitioners, including social security numbers, license information, and disciplinary actions.
Are there any regulatory reasons for choosing one architecture over the other?
Depending on the line of business, there may be governmental or professional regulations that require a clean separation of data. For medical providers in the US there is a regulation known as HIPAA
, which defines guidelines for patient privacy. It's been several years since I've explored HIPAA, but from my recollection HIPAA does not require that a Separate Database
architecture be used, but using one ensures that patient data in one hospital is isolated from the doctors and staff at other hospitals.
Do your customers want/need access to the database?
In some industries, customers using a hosted application demand access to the data. For the medical software, there have been a handful of clients who, before signing up, stipulated that they have access to a weekly backup of their data, which they could store on their computer system. With the Separate Databases approach it has been easy to service these requests - schedule a weekly backup for that customer's database and add an ASP.NET page that allows that customer's administrative users the ability to see past backups and download those backup files to their computer.
How many customers do you expect to be using the system?
Note that the above questions don't touch on technological or development issues. Rather, they are focusing on security, privacy, regulations, and the end user's needs or expectations.
While the Separate Databases architecture provides a higher degree of security and privacy through data isolation, there are a couple of challenges worth noting.
- Rolling out changes to the data model is more difficult. Imagine that you've added a new feature, which entailed adding two database tables, a new view, and six new stored procedures. In a Shared Database, Shared Schema architecture, rolling out that change is a matter of adding those database objects to the production database. With a Separate Databases architecture, you need to make sure to roll out those changes to all of the databases. With a systematic, tested, scripted process this is not much of a challenge, but if you don't have such a system defined - if this work is done manually, for instance - you're asking for trouble as you're undoubtedly going to have a scenario where the database changes get rolled out to some databases, but not all. This can be especially challenging if you have decided to customize the data model on a per-customer basis. If database X has certain tables or fields that are not found in database Y then this makes rolling out changes in an automated fashion more challenging. For this reason I would discourage making table-level customizations per customer if at all possible.
- Viewing data aggregated across the databases is difficult. I've touched upon this topic in an earlier blog post, Running the Same Query Against Multiple Databases. When you find a bug on one database and need to see whether it affects data in other databases there are not many tools at your disposal. One poor man's tool is sp_msForEachDb, but it's less than ideal.
- Adding a new customer requires creating a new database. Signing up a new customer is great news for the company's bottom line, but what changes does it entail in the data model? For a Shared Database, Shared Schema architecture, adding a new customer is as simple as adding a new record to the Customers table. For a Separate Databases architecture, adding a new customer means creating a new database and adding the appropriate database objects and any initial data in these tables. Using the SQL Server model database you can simplify this process, but in order to do so you need to make sure that any changes to the data model - new stored procedures, tables, views, UDFs, etc. - need to also be added to the model database. Similarly, any changes to the initial state - new default records in a lookup table, say - need to also be added to the appropriate tables in model.
Given the pros and cons of these two multi-tenant architectures, you can guess what architecture a company is using based on their price point and their targeted industry. If there is a high cost to sign up to the site then chances are the company has only a “few” clients (maybe just a few, maybe dozens, but probably not hundreds or thousands), and if the application is for medical, financial, or legal purposes then they probably use a Separate Databases architecture. For web applications with a lower price point and geared to industries where data privacy and security is less important and less encumbered by regulations, chances are a Shared Database, Shared Schema approach is being used. As you might expect, the medical software has around 15 customers, each paying thousands of dollars per month to use the service, whereas the print shop software has a couple hundred clients who pay a one time setup fee with the option to buy an annual support contract, which amortizes to hundreds of dollars per month per cusotmer.
Whether you choose the Shared Database, Shared Schema or Separate Databases architecture depends largely on non-technological factors. As I noted earlier, I typically choose the Shared Database, Shared Schema architecture by default, only switching to a Separate Databases architecture if there are particular security, privacy, regulatory, or other business needs that necessitate it.