Semantic Content Recognition in PDF

Semantic content recognition is the ability to identify the components of a document by their “class”: whether a given piece of content is a title, subtitle, section, paragraph, word, figure, caption, table, and so on. Despite decades of research, this remains an open problem. Available solutions are unreliable and fall far short of what a human reader can do.
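To make the idea of assigning a “class” concrete, here is a minimal sketch of a naive rule-based classifier. Everything in it is an assumption made for illustration: the `TextBlock` type, the `classify` function, and the font-size thresholds are hypothetical and are not taken from any PDFTron product or from the presentation. It guesses a class from font size and leading words alone, which is roughly the kind of shortcut that makes simple approaches unreliable.

```python
from dataclasses import dataclass


@dataclass
class TextBlock:
    """A hypothetical run of text extracted from a page."""
    text: str
    font_size: float  # in points


def classify(block: TextBlock, body_size: float = 11.0) -> str:
    """Guess a semantic class using only font size and the leading word.

    The thresholds are arbitrary assumptions, not values from a real system.
    """
    text = block.text.strip()
    if block.font_size >= body_size * 1.8:
        return "title"
    if block.font_size >= body_size * 1.2:
        return "section"
    # Captions often start with "Figure" or "Table", but so can ordinary sentences.
    if text.lower().startswith(("figure", "fig.", "table")):
        return "caption"
    return "paragraph"


blocks = [
    TextBlock("Semantic Content Recognition in PDF", 24.0),
    TextBlock("Introduction", 14.0),
    TextBlock("Figure 1: Page layout", 9.0),
    TextBlock("Figure skating is popular.", 11.0),
]
labels = [classify(b) for b in blocks]
```

The last block is an ordinary sentence, yet the rule labels it a caption because it happens to start with the word “Figure”. A person would not make that mistake, which hints at why heuristics like this one fall short.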

At the 2015 PDF Technical Conference, PDFTron’s CTO gave a presentation addressing the problem of semantic content recognition in PDF. The presentation gives an overview of the problem itself, why it has been such a hard problem to solve, and how the industry as a whole might organize itself to finally develop solutions that perform with the same accuracy as a person.


pdf.js: Interesting Project, Incorrect Rendering

pdf.js is a well-known project for rendering PDF documents directly in the browser. In that sense, it is similar to our recently announced PDFNetJS. While pdf.js is an interesting project, and may be a reasonable choice in some very specific situations, it has a number of serious problems that make it unreliable for any situation where PDF rendering is important.


Introducing PDFNetJS: A Complete Browser-Side PDF Viewer and Editor


The WEB is taking over (obviously)

On desktop computers, web apps continue to take over tasks that were previously handled by Windows/Mac/Linux programs. The advantages are many: web apps are immediately available on every connected computer; the user doesn’t need to download and install anything; they update instantly; and they’re cross-platform. That they naturally lend themselves to a subscription model is yet another reason companies are choosing to develop web apps instead of traditional desktop programs.

However, web apps have historically had a number of shortcomings. They couldn’t deal with local files (without long uploads). Multimedia required security-challenged plugins. And they couldn’t display PDF files.

PDFTron at the PDF Technical Conference 2015

PDFTron is pleased to announce that we are a sponsor and presenter at the upcoming PDF Technical Conference 2015, held October 19-20 in San Jose, California. Aimed at software developers and technical product managers encountering PDF technologies in their work, the event will consist of educational and sponsored sessions presented by experts in the PDF field.


Cross-Platform Word to PDF Conversion

View and Convert Microsoft Word Documents Anywhere

We’re very pleased to announce the launch of the newest addition to PDFNet SDK: built-in Word conversion. Now you can go straight from .docx to .pdf, free from the shackles of Microsoft Word or any other third-party software. Conversions are accurate and fast, and they work on any platform supported by PDFNet SDK (and there are a lot of them; see the SDK download page for more details).

It took a long time to get the text to flow around shapes correctly in the docx engine.

Dependency-free Word conversion enables a couple of great use cases: you can perform reliable conversions in a server environment, or pair it with our PDF Viewer for seamless viewing of .docx files on Android, iOS, and Windows Phone/RT.
