[Java] PDFBox


Emaborsa
08-04-2010, 10:22
I have been working on a university project for about three weeks: I need to write a program that parses PDFs. I found two libraries for this, one of which is PDFBox, and I chose that one. After importing all the packages and writing various pieces of code, it keeps throwing errors (there are many, but I am sure that fixing the first one will make the others go away as well), and I don't understand why.

Parsing text from PDF file IEEEXplore_01.pdf....
Exception in thread "AWT-EventQueue-0" java.lang.ExceptionInInitializerError
at org.apache.pdfbox.encoding.EncodingManager.<clinit>(EncodingManager.java:40)
at org.apache.pdfbox.pdmodel.font.PDType1CFont.loadEncoding(PDType1CFont.java:457)
at org.apache.pdfbox.pdmodel.font.PDType1CFont.loadOverride(PDType1CFont.java:434)
at org.apache.pdfbox.pdmodel.font.PDType1CFont.load(PDType1CFont.java:329)
at org.apache.pdfbox.pdmodel.font.PDType1CFont.<init>(PDType1CFont.java:108)
at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:124)
at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:76)
at org.apache.pdfbox.pdmodel.PDResources.getFonts(PDResources.java:115)
at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:225)
at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:207)
at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:367)
at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:291)
at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:247)
at org.apache.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:180)
at PDFTextParser.pdftoText(PDFTextParser.java:64)
at MainFrame$jbParseL.actionPerformed(MainFrame.java:123)
at javax.swing.AbstractButton.fireActionPerformed(Unknown Source)
at javax.swing.AbstractButton$Handler.actionPerformed(Unknown Source)
at javax.swing.DefaultButtonModel.fireActionPerformed(Unknown Source)
at javax.swing.DefaultButtonModel.setPressed(Unknown Source)
at javax.swing.plaf.basic.BasicButtonListener.mouseReleased(Unknown Source)
at java.awt.Component.processMouseEvent(Unknown Source)
at javax.swing.JComponent.processMouseEvent(Unknown Source)
at java.awt.Component.processEvent(Unknown Source)
at java.awt.Container.processEvent(Unknown Source)
at java.awt.Component.dispatchEventImpl(Unknown Source)
at java.awt.Container.dispatchEventImpl(Unknown Source)
at java.awt.Component.dispatchEvent(Unknown Source)
at java.awt.LightweightDispatcher.retargetMouseEvent(Unknown Source)
at java.awt.LightweightDispatcher.processMouseEvent(Unknown Source)
at java.awt.LightweightDispatcher.dispatchEvent(Unknown Source)
at java.awt.Container.dispatchEventImpl(Unknown Source)
at java.awt.Window.dispatchEventImpl(Unknown Source)
at java.awt.Component.dispatchEvent(Unknown Source)
at java.awt.EventQueue.dispatchEvent(Unknown Source)
at java.awt.EventDispatchThread.pumpOneEventForFilters(Unknown Source)
at java.awt.EventDispatchThread.pumpEventsForFilter(Unknown Source)
at java.awt.EventDispatchThread.pumpEventsForHierarchy(Unknown Source)
at java.awt.EventDispatchThread.pumpEvents(Unknown Source)
at java.awt.EventDispatchThread.pumpEvents(Unknown Source)
at java.awt.EventDispatchThread.run(Unknown Source)
Caused by: java.lang.NullPointerException
at java.io.Reader.<init>(Unknown Source)
at java.io.InputStreamReader.<init>(Unknown Source)
at org.apache.pdfbox.encoding.Encoding.loadGlyphList(Encoding.java:111)
at org.apache.pdfbox.encoding.Encoding.<clinit>(Encoding.java:64)
... 41 more


Can anyone give me a hand?

PGI-Bis
08-04-2010, 14:58
When a library throws a NullPointerException, there is a good chance it is a bug in the library.

You would need to look at what this line does:

org.apache.pdfbox.encoding.Encoding.loadGlyphList(Encoding.java:111)

Try using iText or PDFRenderer instead of PDFBox.
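
One possible reading of the trace, not confirmed anywhere in this thread: the NullPointerException is raised inside the java.io.Reader constructor, which is what happens when an InputStreamReader is built around a null stream, and a classpath resource lookup returns null when the resource cannot be found. Because the failure occurs in a static initializer (<clinit>), the JVM wraps it in an ExceptionInInitializerError. The sketch below reproduces that pattern in plain Java. It does not use PDFBox, and the class name, the resource name and the helper methods are all hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;

public class MissingResourceDemo {

    // Hypothetical resource name, assumed NOT to exist on the classpath.
    public static final String MISSING = "hypothetical/missing-glyphlist.txt";

    /** Returns true if the named resource can be found on the classpath. */
    public static boolean resourceExists(String name) {
        try (InputStream in =
                 MissingResourceDemo.class.getClassLoader().getResourceAsStream(name)) {
            return in != null; // getResourceAsStream returns null when nothing is found
        } catch (IOException e) {
            return false;
        }
    }

    /** Opens a reader on a classpath resource without checking for null. */
    static Reader openUnchecked(String name) {
        InputStream in =
            MissingResourceDemo.class.getClassLoader().getResourceAsStream(name);
        // With in == null this throws NullPointerException from Reader.<init>.
        return new InputStreamReader(in);
    }

    /** Holder whose static initializer fails, like the <clinit> frames in the trace. */
    private static class Holder {
        static final Reader READER = openUnchecked(MISSING);

        static void touch() {
            // Calling this forces the class to be initialized.
        }
    }

    /** Triggers the failing initializer once and describes what was thrown. */
    public static String describeFailure() {
        try {
            Holder.touch();
            return "no error";
        } catch (ExceptionInInitializerError e) {
            return e.getClass().getSimpleName()
                + " caused by " + e.getCause().getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println("resource found: " + resourceExists(MISSING));
        System.out.println(describeFailure());
    }
}
```

If this reading applies, a check like resourceExists on the resource the library expects would show whether the library's data files actually made it onto the classpath together with its classes. Note that describeFailure is only meaningful on its first call: later uses of a class whose initialization failed throw NoClassDefFoundError instead.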

Emaborsa
09-04-2010, 09:07
So, I ran some tests and found that PDFBox does not work with ALL PDFs, only with those that have the right formatting. With those it works and produces a String, which is what I needed.