Merging data with a PDF Form

A previous article, Submitting a PDF Form, showed how a PDF Form could be submitted to a web server, without navigating away from the PDF. A related question is, "Can I populate a PDF Form at the server?", or, "How do I merge an FDF with a PDF?". This article will show you how to serve the user a pre-filled PDF Form.

The FDF Format

Consider a PDF document containing two simple textbox fields, named "Text1" and "Text2". If you enter data into those fields, and export the Form Data (in Acrobat 6.0 Professional: 'Advanced | Forms | Export Forms Data'), you will create a file with the .FDF (Forms Data Format) extension. Open this file in a text editor, and you'll see something like this:

Example 1. The FDF format.
%FDF-1.2
%âãÏÓ
1 0 obj<</FDF<</F(tgreer/articles/fdf/form01.pdf)
/ID[<826851cbc19b7f5fba86369c981fe040><159c51c6e0b4814ca2552f89ab9a1ed1>]
/Fields[<</T(Text1)/V(Thomas)>><</T(Text2)/V(Greer)>>]>>>>
endobj
trailer
<</Root 1 0 R>>
%%EOF

The FDF file format is officially documented in the PDF Reference. You can skip the heavy reading; we'll break it down here. The first line is the "FDF Header", and is required. The current specification is version 1.2, so %FDF-1.2 indicates that this is an FDF file conforming to version 1.2 of the specification.

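The next line, the one with the strange characters, is a comment: like any line starting with "%", it is skipped when the file is parsed. The PDF Reference recommends a comment line like this, containing characters with codes above 127, near the top of PDF files so that file transfer mechanisms, such as FTP, treat the file as binary. I assume it serves the same purpose here. The rest of the file consists of "objects". In our simple form here, there is only a single object. If we reformat the file a bit, we can more clearly see the structure.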

Example 2. The FDF format, re-formatted.
01:  %FDF-1.2
02:  %âãÏÓ
03:  1 0 obj
04:  <<
05:    /FDF
06:    <<
07:      /F (tgreer/articles/fdf/form01.pdf)
08:      /ID [ <826851cbc19b7f5fba86369c981fe040> <159c51c6e0b4814ca2552f89ab9a1ed1> ]
09:      /Fields
10:      [
11:        << /T(Text1) /V(Thomas) >>
12:        << /T(Text2) /V(Greer) >>
13:      ]
14:    >>
15:  >>
16:  endobj
17:  trailer
18:  << /Root 1 0 R >>
19:  %%EOF

I added line breaks and spaces to make the file more readable. This doesn't break the file; it will work just fine. That matters because, if you hadn't guessed it already, we'll be writing our own FDF. The line numbers, of course, aren't part of the file. They are there so I can say things like:

Line 03 defines an "indirect object". The data types in an FDF are borrowed from PDF. It's simple: "1" is the object number and "0" is the generation number, so this is the first generation of object 1. When we create a dynamic FDF, we'll always be creating a file with a single, first generation object. Thus "1 0 obj" is the only indirect object we'll need to worry about. The object is closed at line 16, with the keyword "endobj".

What is this object? A dictionary. A dictionary is a construct borrowed from the PostScript programming language, and is delimited by double angle brackets. So line 03 creates an object, and lines 04 and 15 define the object as a dictionary.

A dictionary contains key-value pairs. In PDF and FDF files, the "keys" are always names. A Name is a datatype; it's basically a "variable name", written with a leading "/". The "value" part can be any datatype, including another name or another dictionary.
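To make the Name datatype concrete, here is a minimal PHP sketch that writes a label as a Name. The function name fdf_name is my own invention, not part of any library, and the #XX escaping of unusual characters reflects the PDF name syntax as I understand it. Ordinary names such as Fields pass through untouched.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: write $label as a PDF/FDF Name.
 * Bytes outside printable ASCII, the delimiter characters,
 * and '#' itself are written as #XX (two hex digits).
 */
function fdf_name(string $label): string
{
    $escaped = preg_replace_callback(
        '/[^!-~]|[()<>\[\]{}\/%#]/',
        static fn (array $m): string => sprintf('#%02X', ord($m[0])),
        $label
    );

    return '/' . $escaped;
}

echo fdf_name('Fields'), "\n";   // /Fields
echo fdf_name('Tax Form'), "\n"; // /Tax#20Form
```

In practice you will rarely need the escaping, because the keys we use (/FDF, /F, /Fields, /T, /V) are all plain names.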

So our main dictionary contains a single key, /FDF, whose value is another dictionary. The /FDF dictionary is required, and in our file it contains three entries: /F, /ID and /Fields.

The /F entry, line 07, is called the "file specification". The specification lists it as optional, but our technique depends on it. The value is a string, delimited with parentheses. The string is the path to the "parent" PDF. The data in this FDF will be loaded into the PDF referenced in the file specification. If you open the FDF, in fact, this PDF will open, with the data plugged in. Aha!
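Because strings are delimited with parentheses, any parenthesis or backslash inside the data has to be escaped with a backslash, or it can end the string early. Here is a small PHP sketch of that rule. The function name fdf_string is hypothetical, and the sketch assumes plain ASCII data.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: wrap $text in parentheses as a literal string,
 * escaping the characters that would otherwise end or corrupt it.
 */
function fdf_string(string $text): string
{
    $escaped = strtr($text, [
        '\\' => '\\\\', // the backslash is the escape character itself
        '('  => '\\(',  // parentheses delimit the string
        ')'  => '\\)',
        "\r" => '\\r',  // keep line breaks explicit
        "\n" => '\\n',
    ]);

    return '(' . $escaped . ')';
}

echo fdf_string('tgreer/articles/fdf/form01.pdf'), "\n"; // (tgreer/articles/fdf/form01.pdf)
echo fdf_string('Acme (US)'), "\n";                      // (Acme \(US\))
```

I used strtr() with an array because it replaces everything in a single pass, so the backslashes it adds are never escaped a second time.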

The /ID entry, line 08, is called the "file identifier". It's a way to uniquely identify the parent PDF. It's meant to protect against files "pretending" to be the original, and its value is an array of two hexadecimal strings. It's optional, so we're going to choose the option of ignoring its very existence.

The heart of the beast, our data: line 09 is the /Fields array (arrays are another datatype, delimited by square brackets). The array contains one dictionary per form field. Since our form contained two textboxes, the /Fields array contains two dictionaries.

The rest is self-explanatory. But I'll explain it anyway, of course. Each dictionary in the /Fields array contains two key-value pairs. /T is the name of the field, /V is the value to place in the field. The field names and values are both strings, and so are wrapped in parentheses.

The FDF file ends with the "trailer" keyword, followed by the trailer dictionary. Its only required entry is /Root, an indirect reference ("1 0 R" means object 1, generation 0) to the object containing the /FDF dictionary: << /Root 1 0 R >>. We wrap things up with the end-of-file comment, line 19.
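Putting the pieces together, the whole structure can be sketched as one PHP function that writes a single-object FDF for any set of text fields. The name build_fdf and its parameters are my own, and the sketch handles text fields only. It leaves out the optional /ID entry.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: build a complete single-object FDF.
 *
 * @param string               $pdfPath value for the /F entry
 * @param array<string,string> $fields  field name => field value
 */
function build_fdf(string $pdfPath, array $fields): string
{
    // Escape backslashes and parentheses so they cannot end a string early.
    $str = static fn (string $s): string => '(' . addcslashes($s, '\\()') . ')';

    $lines = ['%FDF-1.2', '1 0 obj', '<<', '/FDF', '<<', '/F ' . $str($pdfPath), '/Fields ['];
    foreach ($fields as $name => $value) {
        // One dictionary per field: /T is the name, /V is the value.
        $lines[] = '<< /T' . $str((string) $name) . ' /V' . $str((string) $value) . ' >>';
    }
    array_push($lines, ']', '>>', '>>', 'endobj', 'trailer', '<< /Root 1 0 R >>', '%%EOF');

    return implode("\n", $lines) . "\n";
}

echo build_fdf('tgreer/articles/fdf/form01.pdf', ['Text1' => 'Thomas', 'Text2' => 'Greer']);
```

Run as is, this prints a file with the same structure as Example 2, minus the /ID entry and the binary comment line.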

Linking the FDF to a PDF

Look back at the file specification (/F) entry. When we open an FDF, Acrobat will find and open the PDF. Acrobat is internet-aware. If we use a fully-qualified URL to the PDF, Acrobat will actually retrieve the PDF from the web.

That's the heart of our solution. If we serve the user an FDF, Acrobat will go get the PDF from our server.
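Since everything hinges on the /F entry being a fully-qualified URL, it may be worth checking that before you serve the FDF. Here is a minimal sketch. The function name is_fetchable_pdf_url is hypothetical, and limiting the check to http and https is my own assumption about how you will host the PDF.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: accept only absolute http(s) URLs for the /F entry.
 */
function is_fetchable_pdf_url(string $url): bool
{
    // Relative paths such as "articles/form01.pdf" fail this validation.
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));

    return $scheme === 'http' || $scheme === 'https';
}

var_dump(is_fetchable_pdf_url('http://www.tgreer.com/tektips/fw9.pdf')); // bool(true)
var_dump(is_fetchable_pdf_url('tgreer/articles/fdf/form01.pdf'));        // bool(false)
```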

Writing the FDF

To generate the FDF dynamically, you'll need to use a server-side language. I'll use PHP in this example, but you can do the same thing with ASP or ASP.NET. The technique is exactly the same as generating dynamic HTML. You'll retrieve data from a database, and output that data as an FDF.

Discussing all the ins and outs of database access and server-side coding is outside the scope of this article. I'm assuming you already know the basics. I've created a very simple database, containing name and address data and a (fake) social security number. I've also created a highly-modified "tax" form. We're going to select a record from the database, and serve back a PDF pre-filled with the data.

If you click on a name, you are taken to a PHP program that reads the name from the query string, queries the database for that person's information, and authors an FDF.

Example 3. PHP code listing.
<?php
// Assumption: db_conn.php creates a mysqli connection in $db
// (the older mysql_* functions were removed in PHP 7).
require "db_conn.php";

// $_GET values arrive already URL-decoded. Binding the name as a
// parameter keeps it from altering the query.
$name = $_GET["name"];
$stmt = $db->prepare("SELECT * FROM `sample_db` WHERE name = ?");
$stmt->bind_param("s", $name);
$stmt->execute();
$row = $stmt->get_result()->fetch_assoc();
$soc_sec = $row["soc_sec"];

// Escape backslashes and parentheses so data cannot end an FDF string early.
function fdf($s) { return addcslashes((string) $s, "\\()"); }
// In this form a checkbox value is a Name: /Yes when checked, /Off otherwise.
function chk($row, $code) { return $row["biz_type"] == $code ? "Yes" : "Off"; }

header("Content-type: application/vnd.fdf");
?>
%FDF-1.2
1 0 obj
<<
/FDF
<<
/F (http://www.tgreer.com/tektips/fw9.pdf)
/Fields
[
<< /T(c1-1) /V/<?php echo chk($row, "I"); ?> >>
<< /T(c1-2) /V/<?php echo chk($row, "C"); ?> >>
<< /T(c1-3) /V/<?php echo chk($row, "P"); ?> >>
<< /T(f1-1) /V(<?php echo fdf($row["name"]); ?>) >>
<< /T(f1-2) /V(<?php echo fdf($row["biz_name"]); ?>) >>
<< /T(f1-4) /V(<?php echo fdf($row["addr1"]); ?>) >>
<< /T(f1-5) /V(<?php echo fdf($row["addr2"]); ?>) >>
<?php for ($i = 0; $i < 9; $i++) { // digit boxes f1-8 through f1-16 ?>
<< /T(f1-<?php echo $i + 8; ?>) /V(<?php echo fdf($soc_sec[$i]); ?>) >>
<?php } ?>
]
>>
>>
endobj
trailer
<< /Root 1 0 R >>
%%EOF

Some things to note. First, you must output the proper content-type header. For an FDF file, this is "application/vnd.fdf". If you don't do this, your FDF will be treated as a text file instead of being handed to Acrobat. Second, the /T entries must match the actual field names in the PDF Form exactly. Also, we've left out the optional /ID key, as well as the line of funny characters. They aren't needed, so why bother with them?
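The field-name matching is easier to get right if the mapping from database columns to form fields lives in one place. Here is a sketch of that idea using the field names from the listing above. The function names row_to_text_fields and checkbox_state are hypothetical, and the sample row is made up for illustration.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: map one database row onto the form's text fields.
 * The nine digit boxes f1-8 .. f1-16 each receive one character of the
 * social security number.
 */
function row_to_text_fields(array $row): array
{
    $fields = [
        'f1-1' => $row['name'],
        'f1-2' => $row['biz_name'],
        'f1-4' => $row['addr1'],
        'f1-5' => $row['addr2'],
    ];
    // str_split() turns "123456789" into ['1', '2', ..., '9'].
    foreach (str_split($row['soc_sec']) as $i => $digit) {
        $fields['f1-' . (8 + $i)] = $digit;
    }

    return $fields;
}

/** Checkbox state for one business-type box: Yes when it matches, else Off. */
function checkbox_state(string $bizType, string $boxCode): string
{
    return $bizType === $boxCode ? 'Yes' : 'Off';
}

// Made-up sample row, for illustration only.
$row = ['name' => 'Jane Doe', 'biz_name' => '', 'addr1' => '1 Main St',
        'addr2' => 'Anytown', 'soc_sec' => '123456789', 'biz_type' => 'I'];
$fields = row_to_text_fields($row);
echo $fields['f1-8'], ' ', $fields['f1-16'], "\n"; // 1 9
echo checkbox_state($row['biz_type'], 'C'), "\n";  // Off
```

If a field name in this map doesn't exist in the PDF, you'll see the mismatch in one spot rather than hunting through a long template.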

Performance Considerations

The infamous "server roundtrip" comes into play again. Your application generates the FDF and serves it to the user. Acrobat opens, parses the FDF, and requests the PDF from your server. Once the PDF loads, Acrobat combines it with the FDF data. That means two requests for every form you serve. It's a performance hit, but there's no such thing as a free lunch.

Conclusion

With a little server-side coding, it's possible to serve your users pre-filled PDF Forms.

About the Author

Thomas D. Greer has more than 12 years of experience in the printing business. He held the position of Director of Development for Consolidated Graphics, where he wrote the COIN eCommerce platform. Prior to that he was Vice-President of Technology at a large printing company acquired by Consolidated Graphics, where he was responsible for the development of a completely custom-written plant management system still in use.

Today Thomas provides consulting, development, implementation, and training services to commercial printers. He can be reached on the web at www.tgreer.com.
