Introduction
A common requirement in XML Web service design is a means of sending binary data
back and forth between the client and the server. An additional burden often
comes from a very high upper limit on the allowed data volume. For example, a
service used to upload and download videos might allow video sizes measured in
gigabytes. With such a requirement at hand, some specific decisions clearly must
be made when designing the service.
Even in much simpler cases such a specific design must be employed. For example,
if images are to be uploaded and downloaded, that also puts the data volume
limit well into the megabyte range. With moderate network bandwidth available,
transferring an image can take several seconds, which certainly cannot be viewed
as an instantaneous operation.
Hence the conclusion that every Web service that transfers binary data must be
designed to expect transfers to take a long time. In addition, the Web service
should be prepared to sustain a high data transfer rate for prolonged periods
without suffering performance loss.
In this article we are going to demonstrate design techniques which can be used
to produce such a Web service. The solution developed in the course of this text
is for demonstration purposes only and cannot be applied to practical,
real-world problems as is. But we hope that the methods used to develop it can
be applied to much more complex cases.
We will start by defining the requirements that the Web service must meet to be
considered a good solution. Then we will design the Web service and the
corresponding client for the simple task of uploading and downloading files.
Requirements
In data transfer operations both the client and the server must be able to
control the transfer throughout, from the first byte to the last. If either side
cannot control the process, it might become unresponsive, and that is the least
favorable property of any software. If the unresponsiveness lasts longer, say
ten seconds or more, it becomes a serious problem and a drawback of the whole
application.
Therefore the main requirement placed on the client and the server, beyond the
trivial fact that they are supposed to actually transfer data between each
other, is that both must be responsive at all times. To be responsive means to
respond to messages sent by other entities or by each other. Now we should ask
what those messages are. The server typically must respond to messages about the
state of shared resources, like free memory or available CPU. It must be able to
control the amount of resources allocated to a specific client during a long
operation, so that overall server stability is not jeopardized. Furthermore, the
server must be able to cancel an ongoing operation and release all resources
allocated for it. On the other side, the client should be able to receive
progress indications. It should also be able to cancel the current operation at
will.
In many practical cases these requirements can be formally defined as the
following list:
- The upload server must be able to define the allowed transfer rate, so as to
  limit the amount of resources allocated to a single remote client. If a client
  attempts to upload at a higher rate, its request may fail.
- The upload server must be able to provide progress indication to the client.
- The upload client must be able to cancel further transfer and to discard all
  the data transferred up to the cancellation point.
- The download client must be able to set the transfer rate at which it can
  accept data from the server.
- The download server should be able to provide progress indication, or at least
  the total amount of data that will be sent to the client, so that the client
  can calculate the progress based on the amount received so far.
- The download client must be able to cancel further download at any point.
- Both upload and download operations must be recoverable. This means that if
  the connection between client and server is broken for any reason, the
  transfer should continue from the point at which it was interrupted once the
  connection is restored.
In the following section we will design a Web service and an appropriate client
for the simple task of uploading and downloading files.
Solution Design
One of the simplest techniques for building a responsive application is to
perform work in time slices - short amounts of time between which short
maintenance tasks can be performed. For example, a long calculation can be
executed in a loop. The application would then test for the presence of a
certain signal before every iteration step. If the signal is present, further
calculation would be cancelled. Otherwise, the operation would continue for one
more iteration.
A similar technique can be applied to data transfer. In this case slicing would
be done over data, rather than time. For example, the client could send data in
chunks no larger than 10 kilobytes. That would give the server an opportunity to
do maintenance work between any two successive chunks. That is the core idea of
the solution presented in this article. We are going to design a Web service
which receives chunks of data; once all chunks have been received, the whole
file has been received.
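
The slicing idea can be sketched independently of any transport. The following is only a minimal illustration, not part of the service designed below: the names ChunkedSender, SendInChunks, sendChunk and shouldCancel are hypothetical, and the cancellation signal is modeled as a simple delegate that is polled between chunks.

```csharp
using System;
using System.IO;

public static class ChunkedSender
{
    // Reads the source stream in slices of at most chunkSize bytes and hands each
    // slice to sendChunk together with its offset in the stream. The shouldCancel
    // delegate is polled before every slice. Returns the number of bytes handed over.
    public static long SendInChunks(Stream source, int chunkSize,
        Action<byte[], long> sendChunk, Func<bool> shouldCancel)
    {
        if (chunkSize <= 0)
            throw new ArgumentOutOfRangeException("chunkSize");

        byte[] buffer = new byte[chunkSize];
        long offset = 0;

        while (!shouldCancel())
        {
            int bytesRead = source.Read(buffer, 0, buffer.Length);
            if (bytesRead == 0)
                break; // End of stream

            // Copy only the bytes actually read, so the last slice is not padded
            byte[] chunk = new byte[bytesRead];
            Array.Copy(buffer, chunk, bytesRead);

            sendChunk(chunk, offset);
            offset += bytesRead;
        }

        return offset;
    }
}
```

Note that cancellation takes effect only between two slices, never in the middle of one, which is exactly the limitation of chunking discussed at the end of this article.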
This design implies that the client will contact the server many times before
the file is fully transferred. So we need a way to ensure that the server knows
that successive calls are coming from the same client and that the data
transferred in each call are part of one consistent data stream. Luckily, Web
applications in general already have a solution to that very problem - it is
called sessions.
The client communicates with the server through multiple requests sent over
time, each followed by a response from the server. The glue which connects those
requests, which are generally executed over separate socket connections opened
and closed at distinct points in time, is a token called the session ID. The
client keeps it until the overall communication is complete. When the client
contacts the server for the first time, it receives a unique session ID. With
every subsequent request the client sends the token to the server, so that the
server knows that the request is a continuation of previous communication.
Upload Server Design
The first method which we will design in the Web service, i.e. on the server
side, is the one used to initiate a file upload. The method will be named
BeginUploadFile and it will return a new file handle. This handle will later be
used to identify the file in the upload operations that follow. In addition, the
method will return an output value indicating the maximum accepted chunk size.
Here is the method definition:
[WebMethod]
public string BeginUploadFile(out int maxChunkSize)
{
    // This server will not allow more than 128 KB to be sent in one call
    maxChunkSize = 128 * 1024;

    // Now we will create a temporary file in the temporary directory.
    // File name will be in form _partNNN.dat, where NNN is a three-digit unique number.
    // This unique number will then be returned as the file handle.
    DirectoryInfo dir = new DirectoryInfo(Path.GetTempPath() + "FileUpload");
    if (!dir.Exists)
        dir.Create();

    Random rnd = new Random();
    int n = 0;
    string name = null;
    do
    {
        n = rnd.Next(1000);
        name = string.Format("_part{0:000}.dat", n);
    }
    while (dir.GetFiles(name).Length > 0);

    // At this point we have obtained a unique name which is identified by number n
    FileStream fs = new FileStream(dir.FullName + "\\" + name, FileMode.CreateNew);
    fs.Close();

    // Now we have created a file of size zero, which is the server-side
    // representative of the file handle.
    // Finally, return the handle to the caller
    return n.ToString("000");
}
This method creates a file in the FileUpload directory within the temporary
files directory. The file name contains a randomly generated part, and that part
is returned as the new file handle. It is very important that the handle carries
no meaningful information, not even trivially available information such as a
timestamp. Handles, like session IDs, should be such that a client cannot
intentionally generate a valid value in order to take over another client's
session and use it to do damage. In our demonstration the handle is a
three-digit random number, which is easy to guess. In more serious applications
a random string of 20 characters or so would probably be more appropriate, and
more secure.
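
As an illustration of a less guessable handle, the sketch below draws 20 hexadecimal characters from the framework's cryptographic random number generator. It is only one possible approach and is not used in the rest of this article; the class name HandleGenerator and the method NewHandle are hypothetical, and the choice of a hexadecimal alphabet is our assumption.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class HandleGenerator
{
    private const string HexDigits = "0123456789abcdef";

    // Returns a random handle of the requested length, consisting of hex digits.
    // Each character carries 4 bits, so 20 characters give 80 random bits.
    public static string NewHandle(int length)
    {
        if (length <= 0)
            throw new ArgumentOutOfRangeException("length");

        // Two hex characters are produced per random byte
        byte[] random = new byte[(length + 1) / 2];
        using (RandomNumberGenerator rng = RandomNumberGenerator.Create())
            rng.GetBytes(random);

        StringBuilder sb = new StringBuilder(random.Length * 2);
        foreach (byte b in random)
        {
            sb.Append(HexDigits[b >> 4]);
            sb.Append(HexDigits[b & 0x0F]);
        }

        // Trim the extra character when an odd length was requested
        return sb.ToString(0, length);
    }
}
```

A handle produced this way contains only letters and digits, so it can still be embedded in a file name as the three-digit handle is in BeginUploadFile.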
The next method exposed by the Web service is named UploadChunk. As its name
implies, this method uploads part of the file by appending data to the partial
file on the server's local file system. The method receives three parameters:
the file handle previously obtained from the BeginUploadFile method, an array of
bytes holding the content to append, and the position at which this block of
data starts in the complete file. Since we are appending data to a partially
completed file, this position has to match the current length of the file. Here
is the source code:
[WebMethod]
[SoapDocumentMethod(OneWay=true)]
public void UploadChunk(string fileHandle, byte[] data, long startAt)
{
    FileInfo fi = new FileInfo(Path.GetTempPath() + "FileUpload\\_part" + fileHandle + ".dat");

    // Perform validation
    if (!fi.Exists || fi.Length != startAt)
    {
        // Do whatever needs to be done when validation fails
    }
    else
    {
        // When validation has passed, append the chunk at the end of the file
        using (FileStream fs = new FileStream(fi.FullName, FileMode.Append))
            fs.Write(data, 0, data.Length);
    }
}
The method first processes the file handle by simply constructing the path to
the file. The next step is to validate the input data - in our case just to
verify that the file exists and that startAt matches the current file length, so
that the buffer content lands at the end of the file, as expected by design.
When verification has passed, the method appends the content of the buffer at
the end of the file.
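
Because the handle arrives from the client and is concatenated into a file path, a real service would probably also check the shape of the handle itself before touching the file system. The sketch below shows one way to do that, under our own assumption that valid handles consist only of letters and digits; HandleValidator, IsWellFormed and GetPartialFilePath are hypothetical names and are not part of the service presented in this article.

```csharp
using System;
using System.IO;

public static class HandleValidator
{
    // Accepts only non-empty handles made of ASCII letters and digits, which rules
    // out path separators, dots and other characters that could alter the path.
    public static bool IsWellFormed(string fileHandle)
    {
        if (string.IsNullOrEmpty(fileHandle) || fileHandle.Length > 64)
            return false;

        foreach (char c in fileHandle)
        {
            bool isDigit = c >= '0' && c <= '9';
            bool isLetter = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
            if (!isDigit && !isLetter)
                return false;
        }

        return true;
    }

    // Builds the path of the partial file for a handle, rejecting malformed handles
    public static string GetPartialFilePath(string uploadDirectory, string fileHandle)
    {
        if (!IsWellFormed(fileHandle))
            throw new ArgumentException("Malformed file handle.", "fileHandle");

        return Path.Combine(uploadDirectory, "_part" + fileHandle + ".dat");
    }
}
```

With such a helper, a handle like "042" maps to _part042.dat in the upload directory, while a handle containing a path separator is rejected before any file is opened.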
Note that this method is declared as one-way, which means that the client will
not wait for a returning SOAP message. It will consider the request completed as
soon as the server acknowledges that the request has been received (over HTTP,
the server answers with status 202 Accepted before the method body is executed).
This measure ensures that the client moves on to sending the next chunk as soon
as possible. On the downside, this feature cuts off any feedback from the server
through this method - there is no way for the client to find out whether storing
the data went well or not. Even if the server threw an exception, it would not
be sent back to the client.
To resolve this issue, the Web service can expose a method which indicates the
current status. The client may invoke this method once in a while and, if the
status is not as expected, jump back through the data and repeat the last couple
of upload operations. Calling the status method rarely enough ensures that
performance does not diminish due to waiting for the server to respond. This
little complication may be affordable in practice because making the UploadChunk
method one-way can speed up the upload considerably, by a rough estimate up to
double, since about half of the round trip is saved with every chunk. Here is
the GetStatus method, which can be used to read the current upload status:
[WebMethod]
public long GetStatus(string fileHandle)
{
    FileInfo fi = new FileInfo(Path.GetTempPath() + "FileUpload\\_part" + fileHandle + ".dat");
    long pos = (fi.Exists ? fi.Length : 0);
    return pos;
}
This implementation simply returns the current length of the partially uploaded
file, which indicates the total amount of data successfully uploaded so far.
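
The check-and-rewind idea described above could look roughly like the following on the client side. This is a sketch under several assumptions of ours: the two service calls are passed in as delegates, so that the example stays independent of the generated proxy; the source stream is seekable; and the server has finished storing all previously sent chunks by the time the status call returns, which a real one-way service does not guarantee. The names VerifiedUploader, checkEvery and maxRewinds are hypothetical.

```csharp
using System;
using System.IO;

public static class VerifiedUploader
{
    // Uploads the stream in chunks. After every checkEvery chunks, and once more at
    // the end, it compares the server-side length with the local position and, on
    // mismatch, rewinds to the position reported by the server.
    // Returns true once the server confirms the full length of the stream.
    public static bool Upload(Stream source, int chunkSize, int checkEvery, int maxRewinds,
        Action<byte[], long> uploadChunk, Func<long> getStatus)
    {
        byte[] buffer = new byte[chunkSize];
        int sinceCheck = 0;
        int rewinds = 0;

        while (true)
        {
            long pos = source.Position;
            int bytesRead = source.Read(buffer, 0, buffer.Length);
            bool atEnd = (bytesRead == 0);

            if (!atEnd)
            {
                byte[] chunk = new byte[bytesRead];
                Array.Copy(buffer, chunk, bytesRead);
                uploadChunk(chunk, pos);
                sinceCheck++;
            }

            if (!atEnd && sinceCheck < checkEvery)
                continue; // Not yet time to ask the server for its status

            sinceCheck = 0;
            long confirmed = getStatus();
            if (confirmed == source.Position)
            {
                if (atEnd)
                    return true; // Server holds everything we have sent
                continue;
            }

            // Server is ahead of us (inconsistent state) or we have retried too often
            if (confirmed > source.Position || ++rewinds > maxRewinds)
                return false;

            source.Position = confirmed; // Resend everything after the last stored byte
        }
    }
}
```

With the proxy from this article, the delegates could be bound as (data, pos) => srv.UploadChunk(fileHandle, data, pos) and () => srv.GetStatus(fileHandle). The same loop also covers recovery after a broken connection, because it always resumes from the length reported by the server.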
The last method exposed by the Web service is named EndUploadFile and it must be
invoked at the end of every file upload. This method receives the file handle,
so that the server knows which file to turn its attention to, and a flag
indicating whether the upload has completed or the client wishes to cancel it.
[WebMethod]
public void EndUploadFile(string fileHandle, bool quitUpload)
{
    FileInfo fi = new FileInfo(Path.GetTempPath() + "FileUpload\\_part" + fileHandle + ".dat");

    // Validate handle
    if (!fi.Exists)
        throw new System.ArgumentException();

    if (quitUpload)
        fi.Delete();
    else
    {
        FileInfo targetFile = new FileInfo(Path.GetTempPath() + "FileUpload\\file" + fileHandle + ".dat");
        if (targetFile.Exists)
            targetFile.Delete();
        fi.MoveTo(targetFile.FullName);
    }
}
Again, the first step in the method is to determine the file location from the
file handle and to verify that the file exists. When verification passes, the
method either deletes the temporary file (if the quitUpload flag is set to true)
or renames the file into its permanent form (overwriting any existing file with
the same name). These last steps mimic the cleanup logic which would be
implemented in a real-world system when a data upload ends.
These methods complete the design of the upload server. The following section
presents the client design.
Upload Client Design
The client can be designed quite simply, for example like this:
FileTransferServer srv = new FileTransferServer();
int maxChunkSize = 0;
string fileHandle = srv.BeginUploadFile(out maxChunkSize);
byte[] buffer = new byte[maxChunkSize];
using (FileStream fs = new FileStream("<file path here>", FileMode.Open, FileAccess.Read))
{
    while (true)
    {
        long pos = fs.Position;
        int bytesRead = fs.Read(buffer, 0, buffer.Length);
        if (bytesRead == 0)
            break; // End of file reached, nothing more to send

        // Send only the bytes actually read; the read buffer keeps its full size
        byte[] chunk = buffer;
        if (bytesRead < buffer.Length)
        {
            chunk = new byte[bytesRead];
            Array.Copy(buffer, chunk, bytesRead);
        }

        srv.UploadChunk(fileHandle, chunk, pos);
        Console.WriteLine("Uploaded {0} of {1} bytes (handle={2})", fs.Position, fs.Length, fileHandle);
    }
}
srv.EndUploadFile(fileHandle, false);
This implementation first instantiates the proxy class and then calls the
server's BeginUploadFile method to initiate the file transfer. The method's
return value is stored in the fileHandle string variable, and that is the handle
which will be provided in all subsequent calls until this particular file upload
completes.
After this introduction, a series of calls to the UploadChunk method is made
until all data contained in the file have been sent to the server. Once done, a
single call to EndUploadFile indicates that the upload is complete. After this
call the value of the fileHandle variable is no longer valid, because the server
has already forgotten it.
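
Since EndUploadFile must be invoked for every upload, including cancelled ones, the client may want to guarantee that call structurally. The following sketch wraps the chunk loop in try/finally and passes quitUpload = true whenever the loop did not run to completion. The names UploadSession, Run, uploadAllChunks and endUpload are hypothetical; the delegates stand in for the chunk loop and for the proxy's EndUploadFile call.

```csharp
using System;

public static class UploadSession
{
    // Runs the chunk loop and then always closes the upload on the server.
    // uploadAllChunks returns true when the whole file was sent and false when the
    // transfer was cancelled; endUpload receives the quitUpload flag.
    public static bool Run(Func<bool> uploadAllChunks, Action<bool> endUpload)
    {
        bool completed = false;
        try
        {
            completed = uploadAllChunks();
        }
        finally
        {
            // Cancelled or failed transfers ask the server to discard the partial file
            endUpload(!completed);
        }

        return completed;
    }
}
```

With the proxy shown above, endUpload could be bound as quit => srv.EndUploadFile(fileHandle, quit). If the chunk loop throws, the partial file is discarded and the exception still reaches the caller.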
Conclusion
In this article we have presented server and client code for chunked file
upload. The techniques used to code these two entities can be employed in many
other solutions. The advantages of chunking are that it can be applied to
virtually any communication protocol (HTTP, XML Web services, sockets, etc.) and
that it is very simple to implement. If a system is capable of sending data in
one segment, then it is certainly capable of sending it sliced into pieces.
Hence only a small change in design is required to introduce chunking into an
existing system. On the negative side, chunking does not resolve the issue of
responsiveness completely, because client and server have an opportunity to
exchange control messages only between chunks. If the sending of one block of
data gets stuck, then all communication gets stuck.
This problem can be overcome by moving the complete communication to a separate
thread. As long as control messages sent between chunks affect only the present
session (e.g. the sending of one file), the outer world can assume whatever it
wants about that session - e.g. that it ended instantaneously when an outer
entity instructed the client on the separate thread to cancel the sending. The
fact that the effective cut-off occurs only after the current chunk is sent,
which might not be instantaneous at all, does not affect the rest of the system.
This particular design is discussed in the next article, titled "How to Improve Responsiveness of Objects that do not Guarantee Responsiveness" (http://www.c-sharpcorner.com/uploadfile/b81385/8914/).