Introduction:
This article describes
a quick and simple approach to programmatically completing a PDF document
through the use of the iTextSharp DLL. The article also discusses how one might
go about using the iTextSharp DLL to discover and map the fields available
within an existing PDF if the programmer has only the PDF but does not have
Adobe Designer or even a list of the names of the fields present in the PDF.
Figure 1: Resulting PDF after Filling in Fields Programmatically.
iTextSharp is a C# port of a Java library written to support the creation and
manipulation of PDF documents; the project is available for download through
SourceForge.net here: http://sourceforge.net/projects/itextsharp/
With the iTextSharp DLL, it is possible to not only populate fields in an
existing PDF document but also to dynamically create PDFs. The examples here are
limited to a description of the procedures associated with completion of a PDF;
the download will contain examples of PDF creation in both Visual Basic and C#.
The examples contained herein are dependent upon the availability of the
iTextSharp DLL; use the link provided previously in order to download the DLL
locally to your development machine.
In order to demonstrate filling out a PDF using the iTextSharp DLL, I downloaded
a copy of the W-4 PDF form from the IRS website. The
form contains controls and may be filled out programmatically so it serves as a
good example.
PDF documents that do not contain controls (those meant to be printed and filled
in with a pencil) cannot be completed using this approach. Of
course, if you have access to the Adobe tools (Adobe Professional, Adobe
Designer), you can always create your own PDFs with controls, or add
controls to existing PDFs. Further,
though not demonstrated here, one can also use iTextSharp to create a PDF
document with embedded controls.
Getting Started:
In order to get started, fire up the Visual Studio 2005 IDE and open the
attached solution. The solution consists of a single Win Forms project with a
single form.
I have also included a PDF that will be used for demonstration purposes; this
form is the IRS W-4 form completed by US taxpayers; however, any PDF with
embedded controls (text boxes, check boxes, etc.) is fair game for this
approach. Note that a reference to the iTextSharp DLL has been included in the
project.
All of the project code is contained within the single Windows form. The form
itself contains only a docked textbox used to display all of the field names
from an existing PDF document. The
completed PDF is generated and stored in the local file system; the PDF is not
opened for display by the application.
The application uses the existing PDF as a template, and from that template it
creates and populates the new PDF. The template PDF itself is never populated;
it is used only to define the format and contents of the completed PDF.
Figure 2: Solution Explorer.
The Code: Main Form
As was previously mentioned, all of the code used in the demonstration
application is contained entirely in the project's single Windows form. The
following section will describe the contents of the code file.
The file begins with the library imports needed to support the code. Note
that the iTextSharp namespaces are imported into the code file. The
class declaration is in the default configuration.
Imports System
Imports System.Collections
Imports System.ComponentModel
Imports System.Data
Imports System.Drawing
Imports System.Text
Imports System.Windows.Forms
Imports iTextSharp
Imports iTextSharp.text
Imports iTextSharp.text.pdf
Imports iTextSharp.text.xml
Imports System.IO
Public Class Form1
The next section of code contains the Form1 Load event handler. During
form load, two procedures are called; they are used to display all of
the fields present in the template PDF and to create a new PDF populated with a
set of field values.
''' <summary>
''' Application main form Load event handler
''' </summary>
''' <param name="sender"></param>
''' <param name="e"></param>
''' <remarks></remarks>
Private Sub Form1_Load(ByVal sender As Object, ByVal e As System.EventArgs) Handles Me.Load
    ' Load all field names from template PDF
    ListFieldNames()
    ' Fill the target PDF form with canned values
    FillForm()
End Sub
The next
section of code contained in the demo application defines a function used to
collect the names of all of the fields from the target PDF. The
field names are displayed in a text box contained in the application's form.
''' <summary>
''' List all of the form fields into a textbox. The
''' form fields identified can be used to map each of the
''' fields in a PDF.
''' </summary>
Private Sub ListFieldNames()
    Dim pdfTemplate As String = "c:\Temp\PDF\fw4.pdf"
    ' title the form
    Me.Text += " - " + pdfTemplate
    ' create a new PDF reader based on the PDF template document
    Dim pdfReader As PdfReader = New PdfReader(pdfTemplate)
    ' create and populate a string builder with each of the
    ' field names available in the subject PDF
    Dim sb As New StringBuilder()
    Dim de As New DictionaryEntry
    For Each de In pdfReader.AcroFields.Fields
        sb.Append(de.Key.ToString() + Environment.NewLine)
    Next
    ' Write the string builder's content to the form's textbox
    textBox1.Text = sb.ToString()
    textBox1.SelectionStart = 0
    ' release the template file
    pdfReader.Close()
End Sub
Figure 3 shows the field names collected from the target PDF using the
ListFieldNames call. In order to map these names to specific fields in the PDF,
one need only copy this list and pass a distinct value to each field to identify
it. For example, if the form contains ten fields, setting each field's value
(shown next) to a sequential number will result in the numbers 1 to 10 being
displayed across the fields. One can then trace each displayed value back to its
field name, using this list as the basis for the map. Once
the fields have been identified, the application can be written to pass the
correct value to each field.
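The probing step can be automated rather than typed by hand. The following is a minimal sketch only: the module name FieldMapSketch and the helper BuildProbeValues are hypothetical and not part of the demo project or of iTextSharp. It pairs each field name with a sequential number so the pairs can then be written into the PDF.

```vbnet
Imports System
Imports System.Collections.Generic

Module FieldMapSketch

    ' Hypothetical helper: sorts the field names and pairs each one with a
    ' sequential number (as text), so the numbers can be written into the
    ' PDF and matched back to the field names by eye.
    Public Function BuildProbeValues(ByVal fieldNames As IEnumerable(Of String)) _
        As List(Of KeyValuePair(Of String, String))

        Dim sorted As New List(Of String)(fieldNames)
        sorted.Sort(StringComparer.Ordinal)

        Dim probes As New List(Of KeyValuePair(Of String, String))
        For i As Integer = 0 To sorted.Count - 1
            probes.Add(New KeyValuePair(Of String, String)(sorted(i), (i + 1).ToString()))
        Next
        Return probes
    End Function

    Sub Main()
        ' Sample names only; in practice use the names listed by ListFieldNames.
        Dim names As String() = {"f2_02(0)", "f2_01(0)", "f2_03(0)"}
        For Each probe As KeyValuePair(Of String, String) In BuildProbeValues(names)
            Console.WriteLine(probe.Key & " -> " & probe.Value)
        Next
    End Sub

End Module
```

With the three sample names, Main prints f2_01(0) -> 1, f2_02(0) -> 2 and f2_03(0) -> 3, one pair per line. Each pair could then be passed to SetField as (probe.Key, probe.Value), which is what the FillForm listing does by hand for the second worksheet.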
Checkbox controls may be a little more challenging to figure out. I tried
passing several values to the checkbox controls before lining up a winner. In
this example, I tried passing zero, one, true, false, etc. to the field before
figuring out that 'Yes' sets the check.
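Instead of guessing, it may be possible to ask the library which values a checkbox accepts. The sketch below assumes that the iTextSharp version in use exposes AcroFields.GetAppearanceStates, and it reuses the template path and the checkbox name "c1_01(0)" from the article's listings. The module name and the PickOnState helper are hypothetical. PickOnState relies on the PDF convention that the unchecked state of a checkbox is named "Off".

```vbnet
Imports System
Imports iTextSharp.text.pdf

Module CheckboxStatesSketch

    ' Hypothetical helper: returns the first state that is not "Off",
    ' or Nothing when no such state is available.
    Public Function PickOnState(ByVal states As String()) As String
        If states Is Nothing Then Return Nothing
        For Each state As String In states
            If Not String.Equals(state, "Off", StringComparison.OrdinalIgnoreCase) Then
                Return state
            End If
        Next
        Return Nothing
    End Function

    Sub Main()
        ' Assumption: same template location as in the article.
        Dim pdfTemplate As String = "c:\Temp\PDF\fw4.pdf"
        Dim reader As New PdfReader(pdfTemplate)
        Try
            ' Assumption: GetAppearanceStates is available in this version.
            Dim states As String() = reader.AcroFields.GetAppearanceStates("c1_01(0)")
            Dim onState As String = PickOnState(states)
            If onState Is Nothing Then
                Console.WriteLine("No checked state found for the field.")
            Else
                Console.WriteLine("Value that checks the box: " & onState)
            End If
        Finally
            ' release the template file
            reader.Close()
        End Try
    End Sub

End Module
```

If the call is available, the value it reports is the one to pass to SetField for that checkbox; otherwise the trial-and-error approach described above still applies.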
Figure 3: The Available PDF Fields.
The next section of code in the demo project is used to fill in the mapped field
values. The process is simple enough. First, the template file and new file
locations are defined and assigned to string variables. Once
the paths are defined, the code creates an instance of the PDF reader, which is
used to read the template file, and a PDF stamper, which is used to fill in the
form fields in the new file. Once the template and target files are set up, the
last thing to do is to obtain the stamper's AcroFields object, which holds
all of the fields contained in the target PDF. After the form fields have
been captured, the rest of the code fills in each field using the
SetField function.
In this example, the first worksheet and the W-4 itself are populated with
meaningful values, whilst the second worksheet is populated with sequential
numbers which are then used to map those fields to their locations on the PDF.
After the PDF has been filled out, the application reads values from the PDF
(the first and last names) in order to generate a message indicating that the
W-4 for this person was completed and stored.
Private Sub FillForm()
    Dim pdfTemplate As String = "c:\Temp\PDF\fw4.pdf"
    Dim newFile As String = "c:\Temp\PDF\Final_fw4.pdf"
    Dim pdfReader As New PdfReader(pdfTemplate)
    Dim pdfStamper As New PdfStamper(pdfReader, _
        New FileStream(newFile, FileMode.Create))
    Dim pdfFormFields As AcroFields = pdfStamper.AcroFields
    ' set form pdfFormFields
    ' The first worksheet and W-4 form
    pdfFormFields.SetField("f1_01(0)", "1")
    pdfFormFields.SetField("f1_02(0)", "1")
    pdfFormFields.SetField("f1_03(0)", "1")
    pdfFormFields.SetField("f1_04(0)", "8")
    pdfFormFields.SetField("f1_05(0)", "0")
    pdfFormFields.SetField("f1_06(0)", "1")
    pdfFormFields.SetField("f1_07(0)", "16")
    pdfFormFields.SetField("f1_08(0)", "28")
    pdfFormFields.SetField("f1_09(0)", "Franklin A.")
    pdfFormFields.SetField("f1_10(0)", "Benefield")
    pdfFormFields.SetField("f1_11(0)", "532")
    pdfFormFields.SetField("f1_12(0)", "12")
    pdfFormFields.SetField("f1_13(0)", "1234")
    ' The form's checkboxes
    pdfFormFields.SetField("c1_01(0)", "0")
    pdfFormFields.SetField("c1_02(0)", "Yes")
    pdfFormFields.SetField("c1_03(0)", "0")
    pdfFormFields.SetField("c1_04(0)", "Yes")
    ' The rest of the form pdfFormFields
    pdfFormFields.SetField("f1_14(0)", "100 North Cujo Street")
    pdfFormFields.SetField("f1_15(0)", "Nome, AK 67201")
    pdfFormFields.SetField("f1_16(0)", "9")
    pdfFormFields.SetField("f1_17(0)", "10")
    pdfFormFields.SetField("f1_18(0)", "11")
    pdfFormFields.SetField("f1_19(0)", "Walmart, Nome, AK")
    pdfFormFields.SetField("f1_20(0)", "WAL666")
    pdfFormFields.SetField("f1_21(0)", "AB")
    pdfFormFields.SetField("f1_22(0)", "4321")
    ' Second Worksheets pdfFormFields
    ' In order to map the fields, I just pass them a sequential
    ' number to mark them; once I know which field is which, I
    ' can pass the appropriate value
    pdfFormFields.SetField("f2_01(0)", "1")
    pdfFormFields.SetField("f2_02(0)", "2")
    pdfFormFields.SetField("f2_03(0)", "3")
    pdfFormFields.SetField("f2_04(0)", "4")
    pdfFormFields.SetField("f2_05(0)", "5")
    pdfFormFields.SetField("f2_06(0)", "6")
    pdfFormFields.SetField("f2_07(0)", "7")
    pdfFormFields.SetField("f2_08(0)", "8")
    pdfFormFields.SetField("f2_09(0)", "9")
    pdfFormFields.SetField("f2_10(0)", "10")
    pdfFormFields.SetField("f2_11(0)", "11")
    pdfFormFields.SetField("f2_12(0)", "12")
    pdfFormFields.SetField("f2_13(0)", "13")
    pdfFormFields.SetField("f2_14(0)", "14")
    pdfFormFields.SetField("f2_15(0)", "15")
    pdfFormFields.SetField("f2_16(0)", "16")
    pdfFormFields.SetField("f2_17(0)", "17")
    pdfFormFields.SetField("f2_18(0)", "18")
    pdfFormFields.SetField("f2_19(0)", "19")
    ' report by reading values from completed PDF
    Dim sTmp As String = "W-4 Completed for " + _
        pdfFormFields.GetField("f1_09(0)") + " " + _
        pdfFormFields.GetField("f1_10(0)")
    MessageBox.Show(sTmp, "Finished")
    ' flatten the form to remove editing options, set it to false
    ' to leave the form open to subsequent manual edits
    pdfStamper.FormFlattening = True
    ' close the pdf
    pdfStamper.Close()
    ' release the template file
    pdfReader.Close()
End Sub
End Class
To finish up the PDF, it is necessary to determine whether or not
additional edits will be permitted to the PDF after it has been programmatically
completed. This task is accomplished by setting the FormFlattening value
to true or false. If the value is set to false, the resulting PDF will be
available for edits; if the value is set to true, the PDF will be locked against
further edits.
Once the form has been completed, the PDF stamper is closed and the function
terminates. That wraps up the discussion of the form-based demo project.
Summary
This article described an approach to populating a PDF document with values
programmatically; this functionality was accomplished using the iTextSharp DLL.
Further, the article described an approach for mapping the fields contained in
a PDF, which may be useful if one is dealing with a PDF authored elsewhere and
the programmer does not have access to Adobe Professional or Adobe Designer. The
iTextSharp library is a powerful DLL that supports authoring PDFs as well as
using it in the manner described in this document; however, when authoring a PDF,
it seems that it would be far easier to produce a nice document using the visual
environment made available through the use of the Adobe tools. Having
said that, if one is dynamically creating PDFs with variable content, the
iTextSharp library does provide the tools necessary to support such an effort;
with the library, one can create and populate a PDF on the fly.