ApacheCon North America 2010 will start in four weeks, and I’m happy to be a part of it. One year after its graduation from incubation, I’ll present Apache PDFBox, an open source Java library for working with PDF documents, in the content track.
Starting with a brief description of the project and its history, I’ll continue with an overview of how to use Apache PDFBox for
- text extraction
- creating PDFs from Java
- merging/splitting PDFs
- converting a single page to an image
After explaining some of the examples that ship with Apache PDFBox, I’ll introduce some of the command-line utilities, which might be valuable when using Apache PDFBox within applications.