Semantic content recognition is the ability to identify the components of a document by their “class”: that is, whether a given piece of content is a title, subtitle, section, paragraph, word, figure, caption, table, etc. Despite decades of research, this problem remains open. Available solutions are unreliable and are far, far behind the ability of a human being.
At the 2015 PDF Technical Conference, PDFTron’s CTO gave a presentation addressing the problem of semantic content recognition in PDF. The presentation gives an overview of the problem itself, why it has been such a hard problem to solve, and how the industry as a whole might organize itself to finally develop solutions that perform with the same accuracy as a person.
pdf.js is a well-known project for rendering PDF documents directly in the browser. In that sense, it is similar to our recently announced PDFNetJS. While pdf.js is an interesting project, and may be a reasonable choice in some very specific situations, it has a number of serious problems that make it unreliable for any situation where PDF rendering is important.
The WEB is taking over (obviously)
On desktop computers, web apps continue to take over tasks that were previously handled by Windows/Mac/Linux programs. The advantages are many: web apps are immediately available on every connected computer; the user doesn’t need to download and install anything; they update instantly; and they’re cross-platform. That they naturally lend themselves to a subscription model is yet another reason that companies are choosing to develop web apps instead of traditional desktop programs.
However, web apps have historically had a number of shortcomings. An inability to deal with local files (without long uploads). Multimedia required security-challenged plugins. And they couldn’t display PDF files.
PDFTron is pleased to announce that we are a sponsor and presenter at the upcoming PDF Technical Conference 2015, held October 19-20 in San Jose, California. Aimed at software developers and technical product managers encountering PDF technologies in their work, the event will consist of educational and sponsored sessions presented by experts in the PDF field.
View and Convert Microsoft Word Documents Anywhere
We’re very pleased to announce the launch of the newest addition to PDFNet SDK: built-in Word conversion. Now you can go straight from .docx to .pdf, free from the shackles of Microsoft Word or any other third-party software. Conversions are accurate and fast; they also work on any platform supported by PDFNet SDK (and there are a lot of them! See the SDK download page for more details).
Dependency-free Word conversion enables a couple of great use cases: you can perform reliable conversions in a server environment, or pair it with our PDF Viewer for seamless viewing of .docx files on Android, iOS, and Windows Phone/RT.
PDFTron was pleased to present at the PDF Association’s recently held PDFDay conference in Washington, DC and New York City. James Borthwick, a member of our development team, presented a talk on Collaborating with PDF: Where we are today, and what’s next. It is now available online.
PDFTron would like to invite you to join us at PDF Day. Hosted in Washington, DC on December 10, 2014 and New York City on December 11, 2014, the event will provide CIOs, IT executives, content strategists and document management vendors the big picture on PDF technology – not sales pitches – from top developers in the space.