A few weeks ago my boss asked me to improve our mail sending system. At that time the system sent emails sequentially, one message after another, with no parallelism. We don't send a lot of emails per day, but the load is irregular: for example, the system can sit idle for two hours and then need to send 100 emails at once. After some consideration I created a simple message-based system built on SQL Server. It works like a queue persisted in a single SQL Server table. The email system isn't very big, so we didn't want to bring in RabbitMQ or MSMQ, and with SQL Server we get reporting and auditing practically out of the box. Today I will present a smaller, incomplete version, but it is good enough to explain the idea. So let's switch to code.
Assumptions
- The system should be based on SQL Server and a single table.
- Many parts of the system can insert rows into the table at the same time (each row represents an email to send).
- The system should process rows from that table in parallel, that is, send many emails at the same time.
- If sending a message fails, the system shouldn't remove the message from the table.
1. Database
The database code is very simple: everything lives in one table.
- CREATE TABLE [dbo].[messages](
- [id] [int] IDENTITY(1,1) NOT NULL,
- [inserted] [datetime2](7) NOT NULL,
- [messageType] [nvarchar](512) NOT NULL,
- [messageBody] [nvarchar](max) NOT NULL,
- CONSTRAINT [PK_messages] PRIMARY KEY CLUSTERED
- (
- [id] ASC
- )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
- ) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
-
- GO
-
- ALTER TABLE [dbo].[messages] ADD CONSTRAINT [DF_messages_inserted] DEFAULT (sysdatetime()) FOR [inserted]
- GO
- id - unique message identifier (auto-increment),
- inserted - date and time when the message was inserted,
- messageType - full C# type name of the message (namespace + class name),
- messageBody - the serialized message.
2. C# Abstractions
I like to start programming by defining my interfaces. So let's do that.
- public interface IMessageProcessor<T>
- {
- void Process(T message);
- }
-
- public interface IMessageProcessorEngine
- {
- void ProcessAllMessages(int maxDegreeOfParallelism);
- }
-
- public interface IMessageRepository : IDisposable
- {
- void BeginTransaction();
- void CommitTransaction();
- void RollbackTransaction();
- DbMessageModel GetOldestMessage();
- void RemoveMessage(DbMessageModel message);
- }
- IMessageProcessor<T> - implementations contain the logic for processing a message; in my work example, it sends emails over SMTP.
- IMessageProcessorEngine - contains the logic for managing tasks.
- IMessageRepository - responsible for managing a single message (row) in the [dbo].[messages] SQL table.
3. C# Implementation
Let's create a class that represents a row of the [dbo].[messages] table.
- public class DbMessageModel
- {
- public int Id { get; set; }
- public DateTime Inserted { get; set; }
- public string MessageType { get; set; }
- public string MessageBody { get; set; }
- }
Next, let's implement IMessageRepository.
- public class MessageRepository : IMessageRepository
- {
- private SqlConnection _connection;
- private SqlTransaction _transaction;
-
- public void BeginTransaction()
- {
- _connection =
- new SqlConnection(ConfigurationManager.ConnectionStrings["ParallelMessageProcessingDb"].ConnectionString);
- _connection.Open();
- _transaction = _connection.BeginTransaction();
- }
-
- public void CommitTransaction()
- {
- _transaction.Commit();
- }
-
- public void RollbackTransaction()
- {
- _transaction.Rollback();
- }
-
- public DbMessageModel GetOldestMessage()
- {
- const string queryText = @"SELECT TOP (1)
- id,
- inserted,
- messageType,
- messageBody
- FROM dbo.messages WITH (ROWLOCK, READPAST, UPDLOCK, INDEX (PK_messages))
- ORDER BY id";
-
- using (var command = new SqlCommand(queryText, _connection, _transaction))
- {
- var dataTable = new DataTable();
- dataTable.Load(command.ExecuteReader());
-
- if (dataTable.Rows.Count == 0)
- return null;
-
- return new DbMessageModel
- {
- Id = (int) dataTable.Rows[0]["id"],
- Inserted = (DateTime) dataTable.Rows[0]["inserted"],
- MessageType = (string) dataTable.Rows[0]["messageType"],
- MessageBody = (string) dataTable.Rows[0]["messageBody"]
- };
- }
- }
-
- public void RemoveMessage(DbMessageModel message)
- {
- const string queryText = @"DELETE top(1) FROM dbo.messages WITH (ROWLOCK) WHERE id = @id";
-
- using (var command = new SqlCommand(queryText, _connection, _transaction))
- {
- var idParameter = new SqlParameter("id", SqlDbType.Int);
- idParameter.Value = message.Id;
-
- command.Parameters.Add(idParameter);
- command.ExecuteNonQuery();
- }
- }
-
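- public void Dispose()
- {
- // BeginTransaction may never have been called, so guard against null.
- if (_transaction != null) _transaction.Dispose();
- if (_connection != null) _connection.Dispose();
- }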
- }
I will not explain the transaction management or the SELECT and DELETE statements; I think they are quite easy. The first thing that may look unfamiliar is the set of SQL table hints.
- ROWLOCK: asks SQL Server to lock single rows in the transaction rather than the whole table.
- READPAST: tells SQL Server to skip rows that are locked by another transaction instead of waiting for them.
- UPDLOCK: takes an update lock on the selected row and holds it until the transaction ends, so if another part of the system executes the same query, its result will not contain that row (READPAST is responsible for skipping it).
- INDEX: in this scenario the INDEX hint is not necessary, because SQL Server will use the clustered primary key index by default. But if you want to order your rows by a different column, you must create an index for it. For example, if I ordered the data by the "inserted" column in the current table structure, without an index on it, SQL Server would need to scan the whole table and could escalate the row locks to a table lock. As I mentioned before, ROWLOCK is only a suggestion to SQL Server; it can choose a different lock granularity when it needs to.
The message processor engine is responsible for managing parallelism and invoking the right message processor.
- public class MessageProcessorEnigine : IMessageProcessorEngine
- {
- private readonly Func<Type, object> _mmessageHandlerFactory;
-
- public MessageProcessorEnigine(Func<Type, object> mmessageHandlerFactory)
- {
- _mmessageHandlerFactory = mmessageHandlerFactory;
- }
-
- public void ProcessAllMessages(int maxDegreeOfParallelism)
- {
- var tasks = new List<Task<bool>>();
-
- while (true)
- {
- while (tasks.Count < maxDegreeOfParallelism)
- {
- Task<bool> task = Task.Factory.StartNew(() => ProcessOldestMessage());
- tasks.Add(task);
- }
-
- int finishedTaskIndex = Task.WaitAny(tasks.ToArray());
- var taskResult = tasks[finishedTaskIndex].Result;
-
- if (!taskResult)
- {
- Task.WaitAll(tasks.ToArray());
- break;
- }
-
- tasks.RemoveAt(finishedTaskIndex);
- }
- }
-
- private bool ProcessOldestMessage()
- {
- using (IMessageRepository messageRepository = new MessageRepository())
- {
- messageRepository.BeginTransaction();
- var oldestMessage = messageRepository.GetOldestMessage();
-
- if (oldestMessage == null)
- {
- messageRepository.RollbackTransaction();
- return false;
- }
-
- try
- {
- var messageType = Type.GetType(oldestMessage.MessageType);
- object message = null;
-
- var xmlSerializer = new XmlSerializer(messageType);
- using (var stringReader = new StringReader(oldestMessage.MessageBody))
- message = xmlSerializer.Deserialize(stringReader);
-
- var messageProcessorType = typeof(IMessageProcessor<>).MakeGenericType(messageType);
- dynamic messageProcessor = _mmessageHandlerFactory(messageProcessorType);
-
- messageProcessor.Process((dynamic)message);
- messageRepository.RemoveMessage(oldestMessage);
- messageRepository.CommitTransaction();
- }
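- catch (Exception ex)
- {
- // Log the failure; the rollback releases the row lock, so the message stays in the table and will be picked up again.
- Console.Error.WriteLine("Processing message {0} failed: {1}", oldestMessage.Id, ex);
- messageRepository.RollbackTransaction();
- }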
-
- return true;
- }
- }
- }
So, here's how it works. The engine starts a few tasks. Each task takes the oldest unlocked row (message) from the database and processes it. If any task gets NULL from the repository, it means there is nothing left to do, so the engine waits for the running tasks to finish and stops. Otherwise, it starts a separate task for the next message.
Processing a single message consists of deserializing the message, creating an instance of the matching message processor, and calling its Process method.
Let's write some classes that show us how it works.
- namespace ParallelMessageProcessing.Messages
- {
- public class SendEmailMessage
- {
- public string Recipient { get; set; }
- public string Subject { get; set; }
- public string Body { get; set; }
- }
- }
-
- namespace ParallelMessageProcessing.Messages
- {
- public class SendEmailMessageProcessor : IMessageProcessor<SendEmailMessage>
- {
- public void Process(SendEmailMessage message)
- {
- Console.WriteLine("Sending message to: {0} with subject: {1}", message.Recipient, message.Subject);
- Task.Delay(TimeSpan.FromSeconds(2)).Wait();
- }
- }
- }
The message processor simulates sending an email that takes two seconds.
Let's fill the database with some data.
- DECLARE @counter int = 0;
-
- WHILE(@counter < 30)
- BEGIN
-
- INSERT INTO dbo.messages (messageType, messageBody)
- VALUES ('ParallelMessageProcessing.Messages.SendEmailMessage', '<SendEmailMessage><Recipient>[email protected]</Recipient><Subject>' + CAST(@counter AS NVARCHAR(5)) + '</Subject></SendEmailMessage>')
-
- SET @counter = @counter + 1
- END
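In a real application the rows would be inserted from code rather than from a script. The following is a minimal sketch of how a producer might build the messageType and messageBody values for one row. MessageRowBuilder and its methods are hypothetical names, not part of the original system, and the message class is repeated only so the snippet compiles on its own.

```csharp
using System.IO;
using System.Xml.Serialization;

namespace ParallelMessageProcessing.Messages
{
    // Repeated from the article so the snippet compiles on its own.
    public class SendEmailMessage
    {
        public string Recipient { get; set; }
        public string Subject { get; set; }
        public string Body { get; set; }
    }
}

// Hypothetical helper, not part of the original code.
public static class MessageRowBuilder
{
    // Value for the messageType column: namespace + class name.
    public static string GetMessageType(object message)
    {
        return message.GetType().FullName;
    }

    // Value for the messageBody column: the message serialized to XML.
    public static string GetMessageBody(object message)
    {
        var serializer = new XmlSerializer(message.GetType());
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, message);
            return writer.ToString();
        }
    }
}
```

The two strings would then be passed as parameters to an INSERT into dbo.messages. Note that Type.GetType with a plain full name only resolves types in the calling assembly or the core library, so if the message classes live in a different assembly, the assembly-qualified name would have to be stored instead.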
The last step is to write Program.cs.
- internal class Program
- {
- private static void Main(string[] args)
- {
- Func<Type, object> ioc = (Type type) => new SendEmailMessageProcessor();
- IMessageProcessorEngine messageEngine = new MessageProcessorEnigine(ioc);
- messageEngine.ProcessAllMessages(3);
-
- Console.ReadKey();
- }
- }
Now run the app.
As you can see, the application processed all inserted messages (from 0 to 29) in parallel.
4. Conclusion
In this simple application I showed the basics of how you can use SQL Server locks to build a parallel processing system that works like a queue. In production code you should focus on error handling: in this example, if processing goes wrong, another task will pick up the same message, and this will repeat over and over. You should probably add something like a processing counter and, if the message still fails after, say, three tries, move it to a separate error table.
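The processing counter idea could be sketched roughly like this. RetryPolicy and FailureAction are hypothetical names, and the counter is kept in memory only to keep the example short. In a real system it would have to be persisted, and it would need to be written outside the transaction that gets rolled back, otherwise the rollback would undo the increment as well.

```csharp
using System.Collections.Concurrent;

// Hypothetical: what the engine should do with a message that just failed.
public enum FailureAction
{
    Retry,
    MoveToErrorTable
}

// Hypothetical in-memory attempt counter, safe to call from several tasks.
public class RetryPolicy
{
    private readonly int _maxAttempts;
    private readonly ConcurrentDictionary<int, int> _attempts =
        new ConcurrentDictionary<int, int>();

    public RetryPolicy(int maxAttempts)
    {
        _maxAttempts = maxAttempts;
    }

    // Records one failed attempt for the message and decides what to do next.
    public FailureAction RegisterFailure(int messageId)
    {
        int attempts = _attempts.AddOrUpdate(messageId, 1, (id, count) => count + 1);
        return attempts >= _maxAttempts
            ? FailureAction.MoveToErrorTable
            : FailureAction.Retry;
    }
}
```

The engine's catch block would call RegisterFailure with the message id and, when it gets MoveToErrorTable back, copy the row to the error table and delete it from dbo.messages instead of leaving it in the queue.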
Also, be aware that while the engine is processing a message, SQL Server keeps a transaction open, so you shouldn't use a huge number of tasks. In our system 10 works well.
Also, when something goes really wrong (for example, your server is powered off), the system may have processed a message without removing it from the messages table (the processor finished its job but the transaction was not committed). You should add extra logic to check for that, but it should be a really rare situation.
This simple engine is a good starting point for extension (you can plug in your own dependency injection container, move message serialization out of the engine, etc.).
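As a first step towards a container, the factory passed to the engine could look up processors by the requested type instead of always returning the same one. Below is a minimal sketch under that assumption. ProcessorRegistry, PingMessage and PingProcessor are hypothetical names used only for illustration, and the interface is repeated so the snippet compiles on its own.

```csharp
using System;
using System.Collections.Generic;

// Repeated from the article so the snippet compiles on its own.
public interface IMessageProcessor<T>
{
    void Process(T message);
}

// Hypothetical message and processor, used only for illustration.
public class PingMessage
{
    public string Text { get; set; }
}

public class PingProcessor : IMessageProcessor<PingMessage>
{
    public void Process(PingMessage message)
    {
        Console.WriteLine("Ping: " + message.Text);
    }
}

// Hypothetical registry: maps IMessageProcessor<TMessage> to a factory.
public class ProcessorRegistry
{
    private readonly Dictionary<Type, Func<object>> _factories =
        new Dictionary<Type, Func<object>>();

    public void Register<TMessage>(Func<IMessageProcessor<TMessage>> factory)
    {
        _factories[typeof(IMessageProcessor<TMessage>)] = () => factory();
    }

    // Matches the Func<Type, object> shape the engine expects.
    public object Resolve(Type processorType)
    {
        Func<object> factory;
        if (!_factories.TryGetValue(processorType, out factory))
            throw new InvalidOperationException(
                "No processor registered for " + processorType);
        return factory();
    }
}
```

After registering a processor with registry.Register<PingMessage>(() => new PingProcessor()), the Resolve method can be passed to the engine constructor in place of the ioc lambda, for example new MessageProcessorEnigine(registry.Resolve). An unregistered message type then fails with a clear exception instead of being handed to the wrong processor.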