Streaming a PDF From the Web to a Mobile or Desktop App

Wouldn’t it be nice to be able to view a remote PDF the same way as one can view an online video? By this we mean you can see the beginning of the content almost immediately, and if you move to the middle of the content, it is prioritized and loaded very quickly, before other parts.

Unfortunately, with remotely stored PDFs, this is not how things usually work. What typically happens is that the entire file must be downloaded before it can be opened and viewed. This is the case for two reasons:

  1. PDF documents are not typically linearized (or, in Adobe lingo, saved for “Fast Web View”). This means that the contents of page two, for example, can be scattered across many different places within the file, with no quick way to determine where the pieces are. Without this information, the entire document must be downloaded before page two can be displayed.
  2. Even if a document is linearized, most viewers are not equipped to show partial content. They are designed to work on complete documents and will reject partial documents as corrupted.

There is, however, a better way.

If you’re building a website or HTML5 app, see this post for how to display a PDF in HTML. If you’re building a native program for desktops (Windows, Mac, Linux) or mobile (iOS, Android, WinRT, Windows Phone 8), then read on…

Providing Responsive Remote PDF Viewing

What would be ideal is if the user could view the beginning, middle or end of the document as they scroll, before other pages have necessarily been downloaded. This is exactly how viewing a PDF using PDFNet can work.

What is necessary to make this happen?

  1. A linearized PDF document.
  2. A web server that supports byte-range requests (otherwise known as byte serving).
  3. PDFNet’s OpenURLAsync method (on PDFViewCtrl).

How to Make a Linearized PDF Document

A linearized document has two important properties:

  1. Its internal structure is organized so that pages are arranged from beginning to end and each page’s data is stored together in one area of the file, and
  2. near the beginning of the file, the document has a “linearization dictionary” and “hint tables” that list the location and size of the objects each page needs.
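
As a quick sanity check before relying on a file, the second property can be tested from the outside. The PDF specification requires the linearization dictionary to sit entirely within the first 1024 bytes of the file, so a rough heuristic is to look for its /Linearized key there. The sketch below is an illustration only: the function names are our own (not part of PDFNet), and a positive result does not prove the hint tables are still valid, for example after the file has been incrementally updated.

```python
import re

# A linearized file declares itself with a "/Linearized <version>" entry in a
# dictionary that must appear within the first 1024 bytes of the file.
_LINEARIZED_KEY = re.compile(rb"/Linearized\s+\d")


def looks_linearized(pdf_head: bytes) -> bool:
    """Heuristic: True if the first 1024 bytes contain a /Linearized entry."""
    return _LINEARIZED_KEY.search(pdf_head[:1024]) is not None


def file_looks_linearized(path: str) -> bool:
    """Read only the head of the file at `path` and apply the heuristic."""
    with open(path, "rb") as handle:
        return looks_linearized(handle.read(1024))
```

Calling file_looks_linearized("out.pdf") on a freshly linearized document should return True; a False result means the file should be re-saved as linearized before it is served.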

PDFNet can linearize new and existing PDF documents when they are saved. Once you have a PDFDoc object (by either opening an existing PDF or creating a new one), it can be saved with the e_linearized flag, for example:

// Open an existing PDF and save a linearized copy of it.
using (PDFDoc doc = new PDFDoc("in.pdf")) {
    doc.Save("out.pdf", SDFDoc.SaveOptions.e_linearized);
}

Another potential option is to use Acrobat to save a document for “Fast Web View”, which will also linearize a document.

Byte-Range Requests Explained

The second piece of the puzzle is a web server that supports byte-range requests. A byte-range request asks the server to send a specific span of bytes from a file; the span need not start at byte zero or cover the entire file. For example, if the HTTP GET request headers include the following key-value pair

Range: bytes=1495454-1594723

a server that supports byte ranges will send about 97 KB (99,270 bytes, since both end offsets are included) of the requested file, starting at byte 1495454.
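
The size of that response follows directly from the header, because both offsets in a byte range are inclusive. Here is a minimal sketch of the arithmetic; the helper name range_length is our own, and it only handles the simple "bytes=first-last" form shown above (not open-ended or multi-part ranges).

```python
import re

# Matches only the simple closed form "bytes=<first>-<last>".
_SIMPLE_RANGE = re.compile(r"^bytes=(\d+)-(\d+)$")


def range_length(header_value: str) -> int:
    """Return how many bytes a 'bytes=first-last' Range value asks for."""
    match = _SIMPLE_RANGE.match(header_value.strip())
    if match is None:
        raise ValueError("unsupported Range value: %r" % header_value)
    first, last = int(match.group(1)), int(match.group(2))
    if last < first:
        raise ValueError("last byte position precedes first byte position")
    # Both positions are inclusive, hence the + 1.
    return last - first + 1


print(range_length("bytes=1495454-1594723"))  # 99270 bytes, about 97 KB
```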

The good news is that your web server probably already supports this feature. To test whether it does, use cURL on your favourite *nix system (or the native Windows version) as follows:

curl -I -H "Range: bytes=16-" http://pdftron.com/index.html

If the server responds with

HTTP/1.1 206 Partial Content

then it supports byte ranges. (If it responds with “HTTP/1.1 200 OK”, then it does not support byte ranges.)
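
If you would rather run this check from a script, the same probe can be sketched with Python’s standard library. The function names below are our own, and the sketch assumes the server treats a HEAD request the same way it treats the cURL call above; a few servers only honour ranges on GET, so treat a False result as a prompt to look closer rather than a final verdict.

```python
import urllib.error
import urllib.request


def build_range_probe(url):
    """Build a HEAD request asking for everything from byte 16 onward."""
    return urllib.request.Request(url, headers={"Range": "bytes=16-"}, method="HEAD")


def supports_byte_ranges(url, timeout=10.0):
    """Return True only if the server answers the probe with 206 Partial Content."""
    try:
        with urllib.request.urlopen(build_range_probe(url), timeout=timeout) as response:
            return response.status == 206
    except urllib.error.HTTPError:
        # Error statuses such as 404 or 416 are treated as "not usable" here.
        return False
```

For example, supports_byte_ranges("http://pdftron.com/index.html") mirrors the cURL command above.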

A couple of small notes:

  1. If you’re storing documents on SharePoint, caching must be turned on for byte-range requests to work.
  2. If your documents are stored in a database and served through dynamic web pages, that setup may not support byte serving. One solution is to temporarily save the file to a static URL that can then be used for byte serving.

How to Use PDFNet to Open a Remote Linearized PDF Document

Now that you understand how to linearize a document and how to check that your web server supports byte serving, the last step is to actually open the document with PDFNet. This is done with PDFViewCtrl’s OpenURLAsync API, which is available for Windows (C++, .NET), Android, iOS, and WinRT/Windows Phone.

Instead of calling SetDoc:

PDFViewCtrl.SetDoc(PDFDoc doc);

the call is replaced with a call to OpenURLAsync:

PDFViewCtrl.OpenURLAsync(string url);

What Happens After the Call to OpenURLAsync

Once the call to OpenURLAsync is made, there will be a slight pause while the control contacts the server and downloads the preliminary data describing the document, such as the total number of pages and where resources for each page are kept. When this information is obtained (a typical wait would be 0.5-3 seconds), blank pages for the entire document will be loaded, and content for the current page will be downloaded and displayed. If the user does not scroll the document, the control will continue downloading content for the surrounding pages. If the user scrolls to a page that has not been downloaded, the control will then download it before any other pages that still need to be downloaded. This ensures a responsive viewing experience that will (depending on the network connection speed) not differ tremendously from viewing a local PDF.
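
To make the prioritization concrete, here is a small sketch of the ordering described above: fetch the page the user is looking at first, then work outward to its neighbours. This is only an illustration of the idea, not PDFNet’s actual scheduling code, and the function name and parameters are our own.

```python
def next_page_to_fetch(current_page, page_count, downloaded):
    """Return the not-yet-downloaded page nearest to current_page, or None.

    Pages are numbered from 1. Ties between two equally distant pages are
    broken in favour of the lower page number.
    """
    pending = [page for page in range(1, page_count + 1) if page not in downloaded]
    if not pending:
        return None
    return min(pending, key=lambda page: (abs(page - current_page), page))


# The user opens a 5-page document and then jumps to page 4.
downloaded = {1, 2}
print(next_page_to_fetch(4, 5, downloaded))  # 4: the visible page comes first
```

Calling the function again after each page arrives, with the user’s latest position, gives the behaviour described above: a scroll to a distant page immediately moves that page to the front of the queue.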

Conclusion

As more and more data heads to the cloud, it’s important that information stored remotely can be accessed in a fast, responsive manner. Using linearized PDF documents with PDFNet’s OpenURLAsync method is a way to deliver a top-notch remote PDF viewing experience.
