Some weeks ago, during Codemotion 2014, a big community event held in Madrid, Spain, a great friend, Sergio Navarro, told me about a big performance problem he had in a Windows Store app. Sergio uses SiaqoDb, an awesome cross-platform NoSQL database. SiaqoDb makes storage of data in multi-platform applications much easier.
The problem
The problem was the poor read performance of SiaqoDb in Windows Store apps. It performed well on Windows Phone 8.0, but on Windows Phone 8.1 and Windows Store, reads were about 40 times slower than on Windows Phone 8.0.
The SiaqoDb people even sent a test project to Sergio, demoing the problem with the WinRT APIs. You can have a look at the original test project here. The most interesting part of the code is that SiaqoDb doesn't read the complete file at once; it reads the file in small chunks. In the test code, they perform 5000 reads of the file, reading 100 bytes each time.
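Conceptually, the test loop looks something like this. This is a minimal sketch of my own, not the exact test project code; `storage` and its `ReadAsync(pos, buf)` helper (the method we'll look at below) are assumptions for illustration:

```csharp
// Sketch of the benchmark: 5000 sequential reads of 100 bytes each,
// timed with a Stopwatch. Names like "storage" are illustrative.
private async Task MeasureChunkedReadsAsync()
{
    var chunk = new byte[100];
    var watch = System.Diagnostics.Stopwatch.StartNew();

    for (int i = 0; i < 5000; i++)
    {
        // Each iteration seeks to the next 100-byte offset and reads one chunk.
        await storage.ReadAsync(i * 100L, chunk);
    }

    watch.Stop();
    System.Diagnostics.Debug.WriteLine("Total read time: {0} ms", watch.ElapsedMilliseconds);
}
```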
It looked very interesting to me, but at first sight I thought the performance difference could be due to differences in hardware: modern smartphones use solid-state memory, while desktop PCs often use regular magnetic disks, which are slower. But 40 times? That's a big difference.
At the same time, another friend, Juan Manuel Montero, was also taking a look at the problem. It seemed that reading the entire file in a single operation eliminated the issue, and the performance became very similar to Windows Phone 8.0. But SiaqoDb needs to read the file in chunks, so that solution wasn't applicable to them. This made me even more interested.
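Just to illustrate, a single-step read over WinRT could look roughly like this. This is my own sketch, not Juan Manuel's actual code; it assumes you have a StorageFile instance and the usual Windows.Storage usings:

```csharp
// Sketch: read the whole file with one WinRT ReadAsync call instead of 5000 small ones.
// using Windows.Storage;
// using Windows.Storage.Streams;
// using System.Runtime.InteropServices.WindowsRuntime; // for IBuffer.ToArray()
public static async Task<byte[]> ReadWholeFileAsync(StorageFile file)
{
    using (IRandomAccessStream stream = await file.OpenAsync(FileAccessMode.Read))
    {
        var buffer = new Windows.Storage.Streams.Buffer((uint)stream.Size);
        IBuffer result = await stream.ReadAsync(buffer, (uint)stream.Size, InputStreamOptions.None);
        // The async overhead is paid only once for the whole file.
        return result.ToArray();
    }
}
```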
So I made some tests with the original project. I tested the Windows Store app both on a desktop PC and on a Surface Pro 3. The results were the same: 6.4 seconds to read the whole file. Then I tried the Windows Phone project on my Lumia 1520: 0.3 seconds total time. Wow! That was a big difference. Also, the results on my desktop PC with a magnetic disk and on my Surface Pro 3 with an SSD were consistent; the faster storage didn't bring any performance increase.
Silverlight
It was very clear that something was wrong with the WinRT reads. Let's take a look at the original Silverlight Read method implemented by SiaqoDb:
```csharp
public virtual int Read(long pos, byte[] buf)
{
    file.Seek(pos, SeekOrigin.Begin);
    return file.Read(buf, 0, buf.Length);
}
```
Plain and simple code. The file object is an IsolatedStorageFileStream instance created in the class constructor. The method simply seeks to the position in the stream where we want to start reading and then reads the bytes into an array.
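For context, that file field is typically opened once in the constructor, roughly like this. This is a sketch under my own assumptions about class and parameter names, not the exact SiaqoDb code:

```csharp
// Sketch: the IsolatedStorageFileStream is opened once and reused by every Read call.
// using System.IO;
// using System.IO.IsolatedStorage;
public class SilverlightFileStorage   // illustrative name
{
    private readonly IsolatedStorageFileStream file;

    public SilverlightFileStorage(string fileName)
    {
        var store = IsolatedStorageFile.GetUserStoreForApplication();
        file = store.OpenFile(fileName, FileMode.OpenOrCreate, FileAccess.ReadWrite);
    }

    // ...the Read method shown above goes here...
}
```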
WinRT
Now let's take a look at the WinRT code:
```csharp
public virtual async Task<int> ReadAsync(long pos, byte[] buf)
{
    fileStream.Seek((ulong)pos);

    if (buf.Length > 0)
    {
        var buffer = Windows.Security.Cryptography.CryptographicBuffer.CreateFromByteArray(buf);
        IBuffer rd = await fileStream.ReadAsync(buffer, (uint)buf.Length, InputStreamOptions.None);
        rd.CopyTo(buf);
    }
    return (int)buf.Length;
}
```
Simple, but maybe we can start to see what the problem is. Can you see it? In this code we use WinRT async methods. We need to create an IBuffer from our byte array with the CryptographicBuffer.CreateFromByteArray method, then we use that buffer to call the ReadAsync method of the FileRandomAccessStream class. Finally, we copy the IBuffer result back into our byte array and return the length.
In fact, we perform many more operations than the Silverlight code does: creating objects, making copies... How much time do those operations cost us? Let's see.
- The CryptographicBuffer's CreateFromByteArray method has a total cost of 600 milliseconds across all the calls made to read the entire file. This alone is twice the time Silverlight needed to read the complete file.
- IBuffer's CopyTo method took another 300 milliseconds to execute.
In summary, the conversion operations alone take 3 times longer than the entire operation took on Silverlight. Another 5.4 seconds are consumed by the FileRandomAccessStream's ReadAsync method.
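For reference, those per-call costs can be measured by simply wrapping the conversion calls in Stopwatches and accumulating the time across the 5000 iterations. Something like this sketch of my own, a timed variant of the ReadAsync method above:

```csharp
// Sketch: accumulate the cost of the two conversion operations across all iterations.
// using System.Diagnostics;
// using Windows.Storage.Streams;
static readonly Stopwatch CreateTime = new Stopwatch();
static readonly Stopwatch CopyTime = new Stopwatch();

public virtual async Task<int> TimedReadAsync(long pos, byte[] buf)
{
    fileStream.Seek((ulong)pos);

    CreateTime.Start();
    var buffer = Windows.Security.Cryptography.CryptographicBuffer.CreateFromByteArray(buf);
    CreateTime.Stop();

    IBuffer rd = await fileStream.ReadAsync(buffer, (uint)buf.Length, InputStreamOptions.None);

    CopyTime.Start();
    rd.CopyTo(buf);
    CopyTime.Stop();

    return buf.Length;
}
// After the benchmark, CreateTime.ElapsedMilliseconds and CopyTime.ElapsedMilliseconds
// hold the accumulated cost of CreateFromByteArray and CopyTo respectively.
```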
The solution
Here we get to the interesting part. It's clear, in my opinion, that async/await operations cost more time than regular sync operations. In addition, the ReadAsync method returns an IAsyncOperationWithProgress, which adds overhead to the entire operation.
But if we read the entire file at once, the performance is better. Why? Let's go step by step. How much time does Silverlight need to perform all 5000 read operations? Around 75 milliseconds. Just my opinion here, but what I thought was: each read operation is so fast that the overhead introduced by async/await and the progress reporting surpasses the time of the operation itself, so every single read adds a lot of extra time. But if we make a one-time read of the entire file, that overhead is paid only once and barely shows in the final timing.
In fact, the Microsoft engineering team reserves async APIs for calls that can potentially take more than 30 milliseconds. But in our case the read operation is so small and fast that it barely takes 1 millisecond to perform. So if we eliminate the async/await machinery and the type conversions, we can significantly improve the operation, returning the code to the basics and avoiding all the overhead we don't need in this case.
How to do that
The FileRandomAccessStream object has an extension method called AsStream, which gives us a plain and simple .NET Stream over the same file to perform reads and writes. In the constructor of our file class we can get that instance and then rewrite our ReadAsync method this way to make it synchronous:
```csharp
public virtual int Read(long pos, byte[] buf)
{
    syncfileStream.Seek(pos, SeekOrigin.Begin);
    return syncfileStream.Read(buf, 0, buf.Length);
}
```
And, ladies and gentlemen, this is exactly the same sync code we use in Silverlight. So we can simplify our implementations even more, sharing the same code among platforms. But how does it perform in the Windows Store? With the async overhead removed, the total read of the file, doing 5000 iterations, takes about 250 milliseconds. A bit faster than the result we get in Silverlight.
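For completeness, the synchronous stream can be obtained once, when the file is opened, roughly like this. This is my own sketch with illustrative names (constructors can't be async, so I use a separate initialization method):

```csharp
// Sketch: wrap the WinRT FileRandomAccessStream in a plain .NET Stream via AsStream().
// using System.IO;      // AsStream() lives in System.IO.WindowsRuntimeStreamExtensions
// using Windows.Storage;
private Stream syncfileStream;

private async Task OpenAsync(StorageFile storageFile)   // illustrative initialization method
{
    var randomAccessStream = await storageFile.OpenAsync(FileAccessMode.ReadWrite);
    // From here on, syncfileStream.Seek/Read work synchronously, as in the Silverlight code.
    syncfileStream = randomAccessStream.AsStream();
}
```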
Finishing
And so we finish this little analysis. After I sent my findings to Sergio and the SiaqoDb engineers, they tested the solution, even publishing an app to the Store to make sure this code doesn't break app certification, and everything went fine. So they will implement this code in their base and get a big performance improvement in Windows Store apps.
There are two things I would like to say as my final thoughts about this:
- Don't take anything for granted in terms of performance; sometimes a decision an engineering team made for the general case might not be the best one for a specific scenario. I think async/await is awesome for most of the cases out there, but there is no such thing as a "silver bullet" you can apply to every situation.
- The community that works around Microsoft technologies. All this analysis, work, and deduction was done by community members, helping each other and sharing knowledge. This is something big, and I am very happy to be part of such a great community.
And that's all folks! Happy coding!