Introduction
Despite advancements in network and processor speeds, performance remains a key
concern among application developers. So whether you are writing an XML Web
service, pushing image bitmaps to a video card, or even engineering that next
great processing chip, you will invariably want to consider utilizing a
universal mechanism for improving performance: a cache.
In this At Your Service column, we will look at how you as a developer and
consumer of XML Web services can utilize caching. We'll cover ways you can do
application-level caching with ASP.NET, then HTTP caching and how it applies to
XML Web services. Finally, we will take the sample MSDN Pencil Company's
Discovery service and implement a caching strategy that makes sense for
providing a pencil catalog that is updated daily.
Questions to Ponder When Considering Caching Options
There are a number of ways you could implement various caching capabilities when
creating an XML Web service or consuming an XML Web service. However, not all
mechanisms for implementing a cache will effectively enhance performance, or
even offer the perception of enhanced performance. You must analyze what makes
sense in your particular usage scenario. Here are some questions you will want
to ask yourself when considering caching functionality for your XML Web service:
How much of my data is dynamic?
It is hardly a foregone conclusion that caching is always a good idea.
For instance, if the data returned from an XML Web service is always different,
then caching may not help much. However, just because data is dynamic, it
doesn't mean that caching is out of the question. If even a portion of the
response is relatively static, caching could improve your Web server's
performance. Consider a scenario where information is changing, but not changing
with each and every request. If you are receiving hundreds of requests a second
for your temperature service, for example, you might want to send back cached
data for most requests, and only update the data every 5 minutes or so.
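As a rough sketch of that temperature example, the class below hands back a
cached reading and only refreshes it once the 5-minute window has passed. The
names TemperatureCache and ReadSensor are made up for illustration, and
ReadSensor simply returns a fixed value in place of the real, expensive lookup.

```vbnet
Imports System

' Hypothetical helper: caches one temperature reading and refreshes it
' at most once per refresh interval.
Public Class TemperatureCache
    Private ReadOnly _lock As New Object()
    Private ReadOnly _refreshInterval As TimeSpan = TimeSpan.FromMinutes(5)
    Private _cachedValue As Double
    Private _expiresAt As DateTime = DateTime.MinValue

    Public Function GetTemperature() As Double
        ' Serialize access so concurrent requests do not all refresh at once.
        SyncLock _lock
            If DateTime.UtcNow >= _expiresAt Then
                _cachedValue = ReadSensor()
                _expiresAt = DateTime.UtcNow.Add(_refreshInterval)
            End If
            Return _cachedValue
        End SyncLock
    End Function

    ' Placeholder for the expensive lookup (assumption, not a real API).
    Private Function ReadSensor() As Double
        Return 21.5
    End Function
End Class
```

However many requests arrive inside the window, the expensive lookup runs
only once per interval; the cost is that a reading may be up to 5 minutes old.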
Is my data private?
In many cases an XML Web service will deal with user-specific data.
This tends to decrease the usefulness of caching, but don't write off caching
just because you are dealing with user-specific data. Say your XML Web service
has a small number of users; it might make sense to cache information for each
user, particularly if the user might request the same information multiple
times. Even if a user does not request the same information every time, there
may be a common instance of a class that could be referenced for each request
from that same user. Be careful when caching private information, however,
because bugs in this kind of code may allow private data to be compromised. To
play it safe, it might be wise for your code to enforce access restrictions.
Does my XML Web service use resources that I can share between requests?
Caching is not limited to simply caching responses. You may be able to
gain significant performance enhancements by caching any sort of application
data or resources. It might make sense to keep around a dataset, for instance,
to handle multiple queries. The response data may vary depending upon the
specific queries on the dataset, but the dataset data itself may remain the same
for many requests.
Can I predict the use of future resources?
Consider the usage scenarios for your XML Web service. Are there
behaviors that you can predict? Say, for instance, that an XML Web service
allows consumers to search for a particular article, and then allows them to
download that article. It may make sense to assume that once a successful search
has been performed for an article, a download request will soon follow. Your XML
Web service can start the potentially lengthy process of loading the article
into memory (perhaps from a file or a database), so that it is all set to
respond once it receives the request to download the article.
Where do I cache my XML Web service's data?
The correct answer to this question may often be, "everywhere." But
what are the different options for caching data? To answer this question let's
take a look at the potential XML Web service scenario shown in Figure 1.
Figure 1. Caching possibilities for one XML Web service scenario.
The figure starts in the upper left with an end user browsing to the Web site located in
the yellow box. Unbeknownst to the user, the Web site sits behind an HTTP proxy
server. The Web server then makes a SOAP request to a Web service in a different
organization (represented by the green box). The SOAP request also goes through
an HTTP proxy. The first Web service must then forward the request to a second,
internal Web service. The internal Web service queries a Microsoft® SQL Server
for the data required, and the response is finally returned. The SQL data is
used to build the internal Web service response, and the internal Web service
response is used to build the response to the initial Web service. The Web site
uses the Web service response to create an HTML page that is returned to the
end-user browser.
All this happens through the various proxies and routers along the way. So where
can the response data be cached? At every point in this scenario: The SQL Server
could cache query results. The internal Web service could cache the SQL query
results. The initial Web service could cache the results from the internal Web
service, and the green organization's HTTP proxy could cache the results as
well. The Web server could cache the Web service's response. The yellow
organization's proxy could cache the Web server's response, and the end-user's
browser could cache the HTML page.
When will my data expire?
One of the key problems when designing caching strategies is
determining when the data in the cache should be removed from the cache.
Sometimes this is fairly simple to determine, since a process that updates the
data may run at regular intervals. However, there may be other situations where
the data is updated at relatively random intervals. In either case, the key is
to figure out the optimal time interval for updating the cache, so that a
balance can be achieved between the harm of returning stale data and the
performance improvements provided by returning cached information. Once you have
figured out this optimal interval, you can include that information with your
data, so that the caching systems can update their data appropriately.
How do I notify consumers of my XML Web service that my data will expire?
The answer to this question depends upon how you are doing your
caching. It may make sense for the client application to cache the data. If this
is the case, then you need to inform the client application when the data
expires. Presumably, applications will need you to include expiration
information in the data being returned. For your Web service, this may mean
adding a field to the XML response that specifically states an expiration time.
If you are depending on other pre-built solutions for performing your cache,
these usually provide mechanisms for indicating an expiration time. In the case
of using HTTP caching, you can set the HTTP headers that indicate to proxies and
client systems when the data expires. ASP.NET includes a cache class that you
can insert data into. When inserting the data, you have the ability to specify
when the data will be removed from the cache.
Can I depend on the data being in the cache?
The short answer is no. Almost every caching mechanism ever designed
has an algorithm for removing old information from the cache. Data can be
removed because it expired, but it can also be removed because it has not been
accessed recently and there is other data to be added to the relatively limited
cache resource. Therefore, most caching mechanisms do not guarantee that data
will remain in the cache. This is particularly true of shared caching
mechanisms, such as HTTP proxy caches, or even ASP.NET caches.
What ramifications will there be if consumers of your XML Web service do not use cached data?
There are a number of reasons why data may not be cached. As mentioned
above, it could simply be that higher priority data replaced your application's
data in the shared cache. It could also be that the developer writing the code
to access your XML Web service is not being responsible about reusing data
previously acquired. When designing your XML Web service, take into account the
possibility for performance improvements based on caching scenarios, but
also allow for cases where your data does not get cached for any number of
reasons. You will need to be able to deal with situations where caching is not
working optimally.
Caching Scenarios
Now that we have seen some of the issues to consider when evaluating caching
possibilities, we will look at what those possibilities are for XML Web service
developers. First we will look at two approaches to caching: one at the
application level and one at the protocol level. Then we will check out the
capabilities ASP.NET provides for implementing caching at both those levels.
Application Caching
"Application" is a word that is overused these days. For the sake of
this discussion, I'm using "application" in the sense of the "application
layer." Therefore "application" encompasses both the XML Web service and the
client accessing the XML Web service, where a developer might write code that is
impacted by caching.
Therefore it makes sense that "application caching" refers to writing code for
the XML Web service, or for the client that would perform some sort of caching.
In the XML Web service, caching logic may take the form of storing reusable
class instances in memory. It could also be response data that doesn't change
over many requests.
Client-side application caching is caching where there is code written on the
client to store the information from an XML Web service response, so that the
client does not need to send another request the next time the response data is
needed.
Providing the support for caching usually means indicating the expiration of the
data as well. When an expiration time is fixed, it is possible that the period
can simply be documented and hardcoded in the client without specifically
indicating the expiration of a specific XML Web service response. There are many
cases, however, where there will be no implied expiration time, in which case
expiration information may need to be included with the data to be cached. In
the case of application caching, this may mean that returned data needs a new
field that includes the expiration. Since the expiration time of the data is
basically meta-information that describes the data, the appropriate place for
this information is where meta-information for a SOAP message is supposed to be
stored: in the SOAP Header element.
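One way this could look in an ASP.NET XML Web service is sketched below. The
ExpirationHeader class, its ValidUntil field, and the TemperatureService class
are hypothetical names chosen for illustration; the SoapHeader base class and
the SoapHeader attribute with its Direction property are the standard ASP.NET
mechanisms for sending a header back with the response. The fixed return value
and the 5-minute lifetime are placeholders.

```vbnet
Imports System
Imports System.Web.Services
Imports System.Web.Services.Protocols

' Hypothetical SOAP header that carries the expiration time of the response.
Public Class ExpirationHeader
    Inherits SoapHeader
    Public ValidUntil As DateTime
End Class

Public Class TemperatureService
    Inherits WebService

    ' ASP.NET serializes this member into the SOAP Header of the response
    ' because the Web method below names it in its SoapHeader attribute.
    Public Expiration As ExpirationHeader

    <WebMethod(), _
     SoapHeader("Expiration", Direction:=SoapHeaderDirection.Out)> _
    Public Function GetTemperature() As Double
        Expiration = New ExpirationHeader()
        ' Assumed lifetime for this sketch: the reading is good for 5 minutes.
        Expiration.ValidUntil = DateTime.Now.AddMinutes(5)
        Return 21.5   ' placeholder reading
    End Function
End Class
```

A client that understands the header can hold on to the response until
ValidUntil passes; a client that does not can simply ignore it.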
HTTP Caching
HTTP provides a rich mechanism for caching information. In the HTTP
specification, there are guidelines for system services to provide caching
capabilities. Basically, HTTP proxies and client computers provide caching
capabilities for free to developers writing applications that use HTTP. But
there are limits to the applicability of the HTTP cache to XML Web services.
XML Web services today are primarily accessed through SOAP messages in the body
of an HTTP POST request. Unlike HTTP GET requests, HTTP POST requests have a
body that is outside the scope of HTTP. Therefore HTTP protocol implementations
on proxies and clients will have no ability to intelligently determine how to
cache responses to HTTP POST requests. ASP.NET does have support for invoking
Web methods through HTTP GET requests, but this mechanism is mostly provided for
debugging purposes and is not supported by most other SOAP toolkits.
ASP.NET Caching Capabilities
One of the nice things about developing XML Web services on ASP.NET is
that you get to take advantage of a lot of functionality that developers of Web
Form applications have been using for some time. ASP.NET has rich cache support
built into it, which we can use to make our job easier when providing caching
capabilities for XML Web services. As of the writing of this article, Rob Howard
has started a multi-part series in his Nothing But ASP.NET column on ASP.NET
caching capabilities. Take a look at this in order to better understand the
specific ASP.NET capabilities with regard to caching. I will focus on which of
those mechanisms might help someone writing an XML Web service.
In ASP.NET, there are basically three different approaches to caching: ASP.NET
output caching, HTTP response caching, and ASP.NET application caching. The
output cache provides a way to inform ASP.NET that the response built for a
particular page can be returned to any further requests for that page. Instead
of executing the ASPX script for future requests, the response from the previous
request is immediately returned. You can specify that a whole page be added to
the output cache or just the output from a specific ASP.NET user control. There
are mechanisms for setting an expiration as well as ways to cache multiple views
of a page based on Web form input.
For XML Web services, you can take advantage of the ASP.NET output cache by
adding the CacheDuration parameter to the WebMethod attribute in your Web method
declaration. The CacheDuration parameter indicates the number of seconds to hold
the response in the ASP.NET output cache. The following code shows how you would
use the CacheDuration parameter to cause the response to be stored in the output
cache for 60 seconds.
<WebMethod(CacheDuration:=60)> _
Public Function HelloWorld() As String
    Return "Hello World"
End Function
In contrast to ASP.NET output caching, HTTP response caching is simply the way
that ASP.NET allows you to set the HTTP headers so that client applications and
HTTP proxies know how to cache the HTTP response you are sending. The
HttpCachePolicy class is used to perform HTTP response caching. It is available
from Context.Response.Cache within your XML Web service code, but as mentioned
previously, its application to SOAP requests in HTTP POST requests is limited.
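For illustration, a Web method might set those headers roughly as follows. The
CatalogInfoService class, the GetCatalogVersion method, and the 10-minute
lifetime are assumptions made up for this sketch; SetCacheability, SetMaxAge,
and SetExpires are members of HttpCachePolicy. Keep in mind that proxies and
clients are only likely to honor these headers when the method is invoked
through HTTP GET.

```vbnet
Imports System
Imports System.Web
Imports System.Web.Services

Public Class CatalogInfoService
    Inherits WebService

    <WebMethod()> _
    Public Function GetCatalogVersion() As String
        ' Mark the response as cacheable by clients and shared proxies.
        Context.Response.Cache.SetCacheability(HttpCacheability.Public)
        ' Allow caches to reuse the response for ten minutes.
        Context.Response.Cache.SetMaxAge(TimeSpan.FromMinutes(10))
        Context.Response.Cache.SetExpires(DateTime.Now.AddMinutes(10))
        Return "2002-04-01"   ' placeholder payload
    End Function
End Class
```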
The third form of ASP.NET caching, application caching, is implemented by an
interesting class, appropriately called the Cache class. Do not confuse the
HttpCachePolicy class with the Cache class, even though the classes that expose
them (HttpResponse and HttpContext, respectively) both name the member Cache.
The Cache class is accessible straight from the HttpContext class for your Web
service. The Cache class provides generic caching capability for an ASP.NET
application. You can use the cache to store any sort of arbitrary data in its
collection. In many ways, the cache is similar to the ability of the
HttpApplicationState class to hold application-scoped data in its collection.
However, unlike the HttpApplicationState class, you can also set expiration
criteria for the data that you store in the Cache class. For instance, you can
indicate that an object you are storing in the cache expires at a specific time.
You can also have rolling expirations, so that an object is removed from the
cache if it has not been accessed for a period of time. You can even set
relationships of cached items to files, so that if the file changes, the item in
the cache will be removed. The next time you look for the item in the cache, it
will not be there, so you will be required to refresh the data, presumably based
on the new information in the specified file. And of course, as with any other
legitimate caching mechanism, the Cache class implements a mechanism for
removing items that have been inactive when resources are scarce. The following
code adds the Foo object to the ASP.NET application cache.
Dim Foo As New MyFooClass
Context.Cache.Insert("foo", _
    Foo, _
    Nothing, _
    DateAdd(DateInterval.Minute, 30, Now()), _
    System.Web.Caching.Cache.NoSlidingExpiration)
The Insert function has a number of different options for adding data to the
cache. The first parameter, "foo", is the key for referring to our object in the
collection. The second parameter is the actual item we are adding to the cache.
The third parameter can be used to indicate a dependency, such as the file
dependency we mentioned earlier. In this case our cache item will have no
dependency, so we set the third parameter to "Nothing." The fourth parameter is
the explicit expiration time for this item in the cache. We have indicated a
time 30 minutes from now, using the DateAdd function.
The last parameter can be used to set a sliding expiration. This can be used to
indicate that our cached item should expire after it has not been accessed for a
given period of time. In our case, we used an explicit expiration time (30
minutes from now, indicated in the fourth parameter), so we set the sliding
expiration to NoSlidingExpiration.
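The other two options, a file dependency and a sliding expiration, could be
combined along the following lines. The PriceListService class, the "prices"
key, and the prices.xml file are hypothetical and exist only for this sketch.
Note that when a sliding expiration is supplied, the absolute expiration has to
be set to NoAbsoluteExpiration.

```vbnet
Imports System
Imports System.Web.Caching
Imports System.Web.Services

Public Class PriceListService
    Inherits WebService

    <WebMethod()> _
    Public Function GetPriceListText() As String
        ' Read the cache once; the entry may have been evicted at any time.
        Dim Prices As String = CType(Context.Cache("prices"), String)
        If Prices Is Nothing Then
            ' Hypothetical data file that sits next to the service.
            Dim FilePath As String = Server.MapPath("prices.xml")
            Prices = System.IO.File.ReadAllText(FilePath)
            ' Drop the entry when the file changes, or when it has not
            ' been read for 10 minutes, whichever comes first.
            Context.Cache.Insert("prices", _
                Prices, _
                New CacheDependency(FilePath), _
                System.Web.Caching.Cache.NoAbsoluteExpiration, _
                TimeSpan.FromMinutes(10))
        End If
        Return Prices
    End Function
End Class
```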
Caching the MSDN Pencil Company's Catalog
Now we will take a look at a specific example, and determine what our caching
strategy might be in this scenario. In the last At Your Service column, Scott
defined some changes to our MSDN Pencil Company's PencilDiscovery interface,
so that an entire catalog of our pencil inventory can be requested, instead of
requiring users to do multiple searches. This design was created so that smart
client applications could cache the entire catalog, and then provide querying
capabilities into the data. This will offload our Web service from the extra
work of handling many specific queries, and will give more information to the
client applications using our service. We decided that for our implementation,
we would potentially update the data once a day to allow for new pencils that
could be added to our catalog, or removed if inventories ran out.
There are a couple of nice things about this particular problem with regard to
caching strategies. The first is that the data is public data, which means that
we do not need to worry about specific users having different views into the
data. The second is that we can estimate an explicit time when our data will be
updated so we can set an expiration on our data with a fair degree of
confidence.
Now let's consider our options for application caching. From the client side,
this is a no-brainer. The client application should request the data once a day
and use that data to handle any discovery activities until the next day, but
there is still the issue of informing the client application when the data
expires.
One option would be to simply document that client applications should refresh
their data every 24 hours, but that creates a window of time, potentially as
large as 24 hours, where the client's data could be different from the data from
the Web service. Further, we are defining an interface that could be implemented
by a number of different businesses, and one business may determine that they
want their data to only be refreshed every week instead of every day.
The solution is to simply indicate the expiration time for the data with the
response. Scott's interface definition included a ValidUntil element
in the type declaration for the pencil catalog. We will use this field to
indicate the expiration time for the catalog data. By including the expiration
time in the SOAP message, we also get the added advantage of providing the
ability for caching our information, even if the SOAP message is transferred
over a different protocol than HTTP. For instance, our catalog may be requested
from our XML Web service over HTTP, but it might then be sent to someone else by
SMTP. Because the expiration data is not kept solely in the HTTP headers, it
will not be lost when the message is sent over SMTP.
The following Microsoft® Visual Basic® .NET client code illustrates how a
client will use the ValidUntil property to determine if its cached catalog needs
to be updated before handling a user's query against the data.
Dim PencilResults() As org.pencilsellers.Pencil
' Refresh the cached catalog only if it has expired.
If PencilCatalog.ValidUntil < Now() Then
    Dim Discovery As New org.pencilsellers.DiscoveryBinding
    PencilCatalog = Discovery.GetCatalog()
End If
' Answer the query from the locally cached catalog.
PencilResults = QueryCachedCatalog(PencilCatalog, QueryCriterion)
On the server side, we not only need to let the clients know when the data
expires by setting the ValidUntil element, we also need to think about ways we
can avoid having to build the catalog from scratch every time we receive a
request. Adding the CacheDuration parameter to the WebMethodAttribute attribute
is one mechanism for doing this. The disadvantage of the CacheDuration parameter
in our case, however, is that the expiration period is fixed at design time. If
we set the CacheDuration to 24 hours, we could run into the following problem:
Suppose we built our catalog from scratch at 6:00 a.m. on April 1, and set the ValidUntil element
for 6:00 a.m. on April 2. The first response would include this data, and that
response would be put in the ASP.NET output cache with an expiration time of
roughly 6:00 a.m. on April 2. Now, suppose we receive a lot of traffic for other
ASP.NET pages around 10:00 p.m. on April 1. Because we will not receive a lot of
requests for the catalog, it is quite likely that it will be removed from the
output cache to free up resources for more immediate output cache needs. Now at
10:30 p.m. on April 1 we receive another request for the pencil catalog. Because
the output cache does not have the response in memory, it will re-run the Web
method and set the expiration time for 10:30 p.m. on April 2. And here lies the
problem: The pencil catalog data will be updated at 6:00 a.m. on April 2.
However, the output cache could continue responding with the old data until
10:30 p.m. on April 2. What we really need is an application caching system where
we can explicitly specify the expiration at runtime.
The ASP.NET application cache provides a nice way to do this. We used the .NET
Framework SDK's WSDL.EXE utility with the /Server command line option to build
the various classes defined by the WSDL definition for the Pencil Discovery
interface. One of the classes it created for us was a Catalog class based on the
type declared in the interface. We simply create the catalog based on a SQL
query and use the Insert method to add it to the ASP.NET application cache. We
set the expiration time for the cache entry to the same value as the ValidUntil
field in the catalog. The code for the GetCatalog Web method is shown below.
Notice that I still use the CacheDuration option
for adding the response to the ASP.NET output cache, but I set the expiration
time to a relatively small 10-minute interval. That way I minimize the time that
stale data might be returned, but still gain the advantage of output caching
performance, which will come in handy for times when we receive a lot of
requests for the catalog. We would expect that most requests for the catalog
would occur within 10 minutes of the expiration time every day.
<System.Web.Services.WebMethodAttribute( _
CacheDuration:=600), _
System.Web.Services.Protocols.SoapDocumentMethodAttribute( _
"http://pencilsellers.org/2002/04/pencil/GetCatalog", _
RequestNamespace:= _
"http://pencilsellers.org/2002/04/pencil/discovery", _
ResponseNamespace:= _
"http://pencilsellers.org/2002/04/pencil/discovery", _
Use:=System.Web.Services.Description.SoapBindingUse.Literal, _
ParameterStyle:= _
System.Web.Services.Protocols.SoapParameterStyle.Wrapped, _
Binding:="DiscoveryBinding")> _
Public Overrides Function GetCatalog() As Catalog
    ' Read the cache once. The entry can be evicted at any moment, so an
    ' existence check followed by a second read could return Nothing.
    Dim PencilCatalog As Catalog = _
        CType(Context.Cache("PencilCatalog"), Catalog)
    ' Rebuild the catalog if it is missing or has passed its expiration.
    If PencilCatalog Is Nothing OrElse _
       PencilCatalog.ValidUntil < Now() Then
        PencilCatalog = CreateCatalog()
        ' Insert replaces any existing entry stored under the same key.
        Context.Cache.Insert("PencilCatalog", _
            PencilCatalog, _
            Nothing, _
            PencilCatalog.ValidUntil, _
            System.Web.Caching.Cache.NoSlidingExpiration)
    End If
    Return PencilCatalog
End Function
Conclusion
When designing your XML Web services there is a good chance that you will want
to implement them with some sort of caching mechanism. This can take several
forms, such as taking advantage of the limited HTTP caching capabilities,
performing application caching on the server, caching responses on the client,
or simply designing options in your XML Web service so that smart clients can
offload some of the fundamental processing work from your server. In the next
column, Scott is going to look at an issue developers of XML Web services and
their consumers may often run into: merging XML. Scott will take the catalog
that our GetCatalog Web method returns, and will merge it with catalogs from
other businesses, so that his Web site can display a global pencil catalog to
his users.