August 2009 - Posts

Ruminations on Multi-Tenant Data Architectures
19 August 09 04:09 PM | Scott Mitchell | 2 comment(s)

One of the key advantages of web applications is that they can be deployed as a hosted service and accessed over the Internet rather than needing to be locally installed at a customer's site. Over the years I've helped numerous clients build web applications of this sort. When building such an application there are bound to be questions on the best way to model and store data from the different customers that access the hosted application. Do you co-mingle customer data in a single database with a single schema? Do you use a single database, but separate customer data into different schemas? Or do you create a separate database with separate tables for each customer?

These three data architectures are examined and discussed in detail in the article Multi-Tenant Data Architecture. The authors introduce three different multi-tenant data architectures, which they name:

  • Separate Databases - in this architecture each customer's data is stored in a separate database. The databases may all be on the same database server or they could be partitioned across multiple database servers. This approach provides for maximum isolation of customer data. If we were building a hosted application using the Northwind database and had three customers then with this approach we would have three databases - Northwind01, Northwind02, and Northwind03 - and each database would have the same tables, views, stored procedures, and so on. Customer 1's data would all be located in the Northwind01 database, while Customer 2's data would be over in the Northwind02 database.
  • Shared Databases, Separate Schemas - SQL Server 2005 introduced the concept of schemas, which offer a way to group a set of tables. With this approach you can have a single database with one schema for each customer. Returning to the Northwind example, with this approach there would be a single database, but there would be three schemas - Customer01Schema, Customer02Schema, and Customer03 schema. Each schema would have the same set of tables, views, stored procedures, and so forth. Customer1's product information would be found in the Customer01Schema.Products table, whereas Customer2's product information would be stored in Customer02Schema.Products.
  • Shared Database, Shared Schema - here we have only a single database and a single schema. There would be only one Products table. To differentiate one customer's products from another we'd need to add a NorthwindCustomers table that would have a record for each customer and then add a NorthwindCustomerID foreign key to the Products table (and to the other pertinent tables).

The Multi-Tenant Data Architecture examines these three approaches in much more detail with screen shots, T-SQL snippets, and more in-depth examples, and is definitely worth reading.

I'd like to talk a little bit about my “from the trenches” experience with multi-tenant data architectures. First, a little background.

Medical Software
In 2003 I started work with a client we'll call Acme Medical, which was building a hosted, ASP.NET medical software application. This application had been in existence for a couple of years by this point, but as a Microsoft Access application that was installed and run locally from a couple of hospitals and clinics. I joined the project with the aim of moving the application from Microsoft Access to ASP.NET and SQL Server. We chose to use a Separate Databases architecture. I continue to work on this project today. Originally, there were two customers, each with a few thousand records. The data model contained maybe 20-30 tables. Today there are more than 15 customers, each with hundreds of thousands if not millions of records. There are nearly 350 database tables, hundreds of ASP.NET pages, and several automated backend processes.

Print Management
Another client, Acme Printers, was a prominent player in the print shop marketplace, selling in-house applications to large-scale print shops. I helped this company build an online, hosted version for order placement and fulfillment. We used a Shared Database, Shared Schema data model. In a nutshell, there is a Customers table with a CustomerID primary key value uniquely identifying each customer. Every other database table has a CustomerID field that identifies what data belongs to what customer. When a user signs on, we look up what customer the user is associated with and then store this CustomerID in session. This CustomerID variable is then used in other pages on the site to pull back the data pertinent to that customer.

I have never used the Shared Database, Separate Schema approach. I recently spoke with Michael Campbell about this topic, and he shared the following advice: “I'd recommend against using shared DBs and separate schemas. In my experience that almost NEVER works out as advertised, adds all sorts of difficulty in terms of disaster recovery, management, and isolation, and really doesn't offer any worthwhile benefits. It also becomes absolutely insane in terms of managing permissions/security as well.”

The factors that should influence whether you go with a Separate Database or Shared Database, Shared Schema architecture are not technical, but rather are regulatory-, security-, or business-related. If you find yourself deciding to use one architecture over another because of some technological reason you're probably asking the wrong questions and evaluating the wrong criteria.

Here's how I tackle this problem. I start by assuming that I'm going to use the Shared Database, Shared Schema architecture, as it's usually the best fit for the types of projects I work on. It easier to setup and implement and test and debug and scale up (to a point) than the Separate Database architecture. For the majority of web applications, the Shared Database, Shared Schema architecture is the best approach when weighing just the technological- and development-related factors.

Sadly, in the real-world there are many factors that outweigh technological ones. While the Separate Databases architecture has more friction associated with it than the Shared Database, Shared Schema, it does have certain advantages in terms of security, privacy, isolation, and so on. When evaluating which architecture to use, I like to ask my client the following questions:

  • How important is the security and privacy of the data?
    • For the medical software, security and privacy is paramount. Customers (the hospitals using the software) are entering social security numbers, lab test results, diagnoses from specialists and psychologists, and so on. The data is rife with very personal and sensitive patient information, not to mention information about the health care practitioners, including social security numbers, license information, and disciplinary actions.
  • Are there any regulatory reasons for choosing one architecture over the other?
    • Depending on the line of business, there may be governmental or professional regulations that require a clean separation of data. For medical providers in the US there is a regulation known as HIPAA, which defines guidelines for patient privacy. It's been several years since I've explored HIPAA, but from my recollection HIPAA does not require that a Separate Database architecture be used, but using one ensures that patient data in one hospital is isolated from the doctors and staff at other hospitals.
  • Do your customers want/need access to the database?
    • In some industries, customers using a hosted application demand access to the data. For the medical software, there have been a handful of clients who, before signing up, stipulated that they have access to a weekly backup of their data, which they could store on their computer system. With the Separate Databases approach it has been easy to service these requests - schedule a weekly backup for that customer's database and add an ASP.NET page that allows that customer's administrative users the ability to see past backups and download those backup files to their computer.
  • How many customers do you expect to be using the system?
    • There is a maintenance cost associated with each new database. If you expect hundreds or thousands of customers then a Separate Databases architecture is probably going to be prohibitive from a maintenance standpoint.

Note that the above questions don't touch on technological or development issues. Rather, they are focusing on security, privacy, regulations, and the end user's needs or expectations.

While the Separate Databases architecture provides a higher degree of security and privacy through data isolation, there are a couple of challenges worth noting.

  • Rolling out changes to the data model is more difficult. Imagine that you've added a new feature, which entailed adding two database tables, a new view, and six new stored procedures. In a Shared Database, Shared Schema architecture, rolling out that change is a matter of adding those database objects to the production database. With a Separate Databases architecture, you need to make sure to roll out those changes to all of the databases. With a systematic, tested, scripted process this is not much of a challenge, but if you don't have such a system defined - if this work is done manually, for instance - you're asking for trouble as you're undoubtedly going to have a scenario where the database changes get rolled out to some databases, but not all. This can be especially challenging if you have decided to customize the data model on a per-customer basis. If database X has certain tables or fields that are not found in database Y then this makes rolling out changes in an automated fashion more challenging. For this reason I would discourage making table-level customizations per customer if at all possible.
  • Viewing data aggregated across the databases is difficult. I've touched upon this topic in an earlier blog post, Running the Same Query Against Multiple Databases. When you find a bug on one database and need to see whether it affects data in other databases there are not many tools at your disposal. One poor man's tool is sp_msForEachDb, but it's less than ideal.
  • Adding a new customer requires creating a new database. Signing up a new customer is great news for the company's bottom line, but what changes does it entail in the data model? For a Shared Database, Shared Schema architecture, adding a new customer is as simple as adding a new record to the Customers table. For a Separate Databases architecture, adding a new customer means creating a new database and adding the appropriate database objects and any initial data in these tables. Using the SQL Server model database you can simplify this process, but in order to do so you need to make sure that any changes to the data model - new stored procedures, tables, views, UDFs, etc. - need to also be added to the model database. Similarly, any changes to the initial state - new default records in a lookup table, say - need to also be added to the appropriate tables in model.

Given the pros and cons of these two multi-tenant architectures, you can guess what architecture a company is using based on their price point and their targeted industry. If there is a high cost to sign up to the site then chances are the company has only a “few” clients (maybe just a few, maybe dozens, but probably not hundreds or thousands), and if the application is for medical, financial, or legal purposes then they probably use a Separate Databases architecture. For web applications with a lower price point and geared to industries where data privacy and security is less important and less encumbered by regulations, chances are a Shared Database, Shared Schema approach is being used. As you might expect, the medical software has around 15 customers, each paying thousands of dollars per month to use the service, whereas the print shop software has a couple hundred clients who pay a one time setup fee with the option to buy an annual support contract, which amortizes to hundreds of dollars per month per cusotmer.

Whether you choose the Shared Database, Shared Schema or Separate Databases architecture depends largely on non-technological factors. As I noted earlier, I typically choose the Shared Database, Shared Schema architecture by default, only switching to a Separate Databases architecture if there are particular security, privacy, regulatory, or other business needs that necessitate it.

Filed under: ,
Range-Specific Requests in ASP.NET
14 August 09 07:15 AM | Scott Mitchell

The HTTP/1.1 protocol include support for range-specific requests, which allow a client to optionally request a particular range of bytes rather than request the entire file. This functionality is most commonly used by download manager programs, which allow users to pause and resume downloads. In a nutshell, a download manager will start by asking for the entire contents of a file (the default behavior). If the user pauses the download or if the download manager is shut down, the last downloaded byte is remembered. Later, the download manager can resume the download by sending a request to the server for the file starting from the last downloaded byte.

The following diagram illustrates the range-specific request workflow just described when downloading a large file named DancingHampsters.zip. The diagram shows what happens when the download is paused (or interrupted) after having downloaded the first 500,000 bytes, and how using range-specific requests the client can resume the download starting from a specified location in the file (rather than having to re-download the file in its entirety).

While IIS natively handles range-specific requests, ASP.NET does not. If you are serving binary content from an ASP.NET HTTP Handler and if you want (or need) to support range-specific requests then you'll need to add such functionality yourself. I bumped into this requirement when working on a project that serves videos to Apple's handheld devices - the iPhone and iPod Touch. The iPhone and iPod Touch request video files using range-specific requests. Therefore, if you are serving these videos straight from the file system via IIS then everything will work as expected, but if you serve the video from an HTTP Handler in order to implement authorization rules or if the files are dynamically generated then you'll need to write code in your HTTP Handler that will handle the range-specific requests sent by the Apple devices.

In my research I stumbled upon an article by Alexander Schaaf titled Tracking and Resuming Large File Downloads in ASP.NET, which presented Visual Basic code showing how to implement range-specific requests in an ASP.NET 1.x application. I took this code, refactored it, ported it to C#, and utilized a number of language enhancements added since the .NET 1.x days. This code, along with a discussion on how range-specific requests work and a look at how to use the code, is now available on DotNetSlackers.

Read more at: Range-Specific Requests in ASP.NET.

Happy Programming!

Filed under:
Upcoming Talk: Content Syndication With ASP.NET 3.5 (San Diego ASP.NET SIG)
11 August 09 02:12 PM | Scott Mitchell

I'll be speaking this coming Tuesday (August 18th) at the San Diego ASP.NET SIG. My talk will focus on how to create and consume syndicated content (RSS and Atom) using the classes in the System.ServiceModel.Syndication, which were added to the .NET Framework version 3.5. Here is a synopsis of the talk:

A syndication feed is an XML file that summarizes a website's most recently published content. It is commonly used in blogs, news sites, sports sites, social networking sites, and other content producing websites to provide a machine-readable format of the latest content. Microsoft added a number of classes to the .NET Framework version 3.5 to ease creating and reading RSS 2.0 and Atom 1.0 syndication feeds. In this talk we'll explore the history of online syndication and see how to consume and create syndicated content in an ASP.NET web application.

The San Diego ASP.NET SIG meetings are held at the Microsoft offices in UTC and are free to attend. Best of all, there will be free pizza and soft drinks starting at 6:00 PM!!!

Hope to see you there! If you cannot make it, but are interested in learning more, you can download the slides from http://datawebcontrols.com/classes/Syndication.zip. Also be sure to check out my article, How to create a syndication feed for your website.

Happy Programming

Filed under:
More Posts

Archives

My Books

  • Teach Yourself ASP.NET 4 in 24 Hours
  • Teach Yourself ASP.NET 3.5 in 24 Hours
  • Teach Yourself ASP.NET 2.0 in 24 Hours
  • ASP.NET Data Web Controls Kick Start
  • ASP.NET: Tips, Tutorials, and Code
  • Designing Active Server Pages
  • Teach Yourself Active Server Pages 3.0 in 21 Days

I am a Microsoft MVP for ASP.NET.

I am an ASPInsider.