One of my clients has a data-driven Software as a Service (SaaS) web application that is used by a dozen different customers. In particular, this client of mine has a web application that's used by various clinics and hospitals to track, schedule, and bill health care-related activities. As discussed in the Multi-Tenant Data Architecture guide from Microsoft, there are three approaches for modeling data that comes from many distinct customers:
- Separate Databases - each customer has a separate database on the database server.
- Shared Database, Separate Schema - a single databsae is used, but each customer has a unique set of database tables.
- Shared Database, Shared Schema - a single set of database tables are used, with a CustomerId column or some other technique used to identify what rows belong to what customers.
Each approach has its pros and cons, and is good reading if you plan on building a SaaS web app.
For security and privacy reasons (important concerns for health care providers), we used the Separate Databases approach - each customer's data is hosted as a separate database on the server. I would not recommend this approach for the vast majority of multi-tenant applications. The most pronounced drawback of this model centers around the pain points associated with managing and administration of these databases. For example, when rolling out a new version or a bug fix, the database changes must be applied to all customer databases. If there is a business logic-type bug discovered, where there may be logically incorrect data in the database that needs to be addressed, you now have to look through all databases to see where the problem occurs and apply the fix across all databases.
One tool that assists with such debugging challenges is the undocumented Microsoft SQL Server stored procedure sp_MsForEachDb. In short, this stored procedure allows you to run a command on every database on the server. You can use it like so:
EXEC sp_MsForEachDb @command1='command to execute on all databases'
Within the @command1 input you can use a question mark (?) to denote that the name of the database the command is currently executing on be injected.
sp_MsForEachDb is most commonly used for administrative tasks such as updating statistics on all databases, but can also be used for debugging or bug hunting in a multi-tenant SaaS architecture that uses separate databases. Here's how: consider that you unearth a bug in the application code that allows for some illogical value in the database. Perhaps there's some business rule that dictates that a patient cannot have their insurance information entered if the patient's primary address is not provided, yet the application did not correctly enforce this rule and as a result there may be some patients with insurance information on file but without a primary address. Your task is to find out what patients in each database fit this description. You could run the following query on every single database:
SELECT p.PatientId, p.PatientName
FROM Patients p
INNER JOIN PatientInsuranceInformation pi ON p.PatientId = pi.PatientId
WHERE p.Address1 IS NULL AND pi.Active = 1
The issue is that this quickly becomes a pain if you have many different databases. Here's where sp_MsForEachDb comes in - you can use it to run the above on each database with the following statement:
EXEC sp_MsForEachDb @command1 = 'SELECT ''?'' as DatabaseName, p.PatientId, p.PatientName
FROM ?.dbo.Patients p
INNER JOIN ?.dbo.PatientInsuranceInformation pi ON p.PatientId = pi.PatientId
WHERE p.Address1 IS NULL AND pi.Active = 1'
This will output a three column resultset that displays those patients that have insurance information but no primary address along with what database the patient information resides in. The ''?'' part in the SELECT list displays the database name. The question mark character is also used in the FROM and INNER JOIN clauses to indicate the database from which to query against (i.e., you can refer to a table as database.owner.tableName).
The only downside of sp_MsForEachDb is that it runs on all databases, meaning it will run on databases like master, which don't have the application tables. As a result, you'll get SQL errors for those databases which clutter the output a bit. But for the databases specific to the application you will quickly see the results of your query and know, in an instant, what patients in what databases have this particular problem.