This entry was written for my blog by Nancy, my princess. Thank you very much!
Recently I needed to read PDFs from a server and get both their text and their document properties, so I had to investigate which tools can read a PDF into a String whose content I could then manipulate to keep just the text I wanted. I came across a great blog post that explains how to do it with PDFBox. I found it very clear and it served my purpose; all I did was modify it a bit.
This is the blog link I mentioned above: http://noelia-java.blogspot.com/2009/07/leer-pdf-desde-java.html
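As a rough illustration of the "manipulate the content to get just the text I wanted" step, here is a minimal sketch that only uses the standard library. The class FiltroTexto, its method entre and the sample text are hypothetical names invented for this example; they are not part of PDFBox or of the post linked above. It assumes the text you want sits between two known markers in the extracted String.

```java
import java.util.Optional;

// Hypothetical helper (not part of PDFBox): picks a fragment out of the
// String that was extracted from a PDF.
public class FiltroTexto {

    /**
     * Returns the trimmed text between the first occurrence of inicio and
     * the next occurrence of fin, or an empty Optional if the content is
     * null or either marker is missing.
     */
    public static Optional<String> entre(String contenido, String inicio, String fin) {
        if (contenido == null) {
            return Optional.empty();
        }
        int desde = contenido.indexOf(inicio);
        if (desde < 0) {
            return Optional.empty();
        }
        desde += inicio.length();
        int hasta = contenido.indexOf(fin, desde);
        if (hasta < 0) {
            return Optional.empty();
        }
        return Optional.of(contenido.substring(desde, hasta).trim());
    }

    public static void main(String[] args) {
        // Sample text standing in for the content extracted from a PDF
        String contenido = "Invoice\nTotal: 42.50 EUR\nThank you";
        // prints "42.50 EUR"
        System.out.println(entre(contenido, "Total:", "\n").orElse("not found"));
    }
}
```

Returning an Optional makes the "marker not found" case explicit instead of handing back null, which is handy because extracted PDF text rarely has a perfectly predictable layout.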
These are the classes I wrote:
/**
 * Holds the text and the document properties read from a PDF.
 *
 * @author nany
 */
public class EntidadPDF {

    private String titulo;
    private String autor;
    private int numeroDePaginas;
    private String tema;
    private String palabrasClave;
    private String creador;
    private String productor;
    private String contenido; // the full contents of the document

    public String getAutor() { return autor; }

    public void setAutor(String autor) { this.autor = autor; }

    public String getContenido() { return contenido; }

    public void setContenido(String contenido) { this.contenido = contenido; }

    public String getCreador() { return creador; }

    public void setCreador(String creador) { this.creador = creador; }

    public int getNumeroDePaginas() { return numeroDePaginas; }

    public void setNumeroDePaginas(int numeroDePaginas) { this.numeroDePaginas = numeroDePaginas; }

    public String getPalabrasClave() { return palabrasClave; }

    public void setPalabrasClave(String palabrasClave) { this.palabrasClave = palabrasClave; }

    public String getProductor() { return productor; }

    public void setProductor(String productor) { this.productor = productor; }

    public String getTema() { return tema; }

    public void setTema(String tema) { this.tema = tema; }

    public String getTitulo() { return titulo; }

    public void setTitulo(String titulo) { this.titulo = titulo; }
}
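EntidadPDF keeps the keywords as one single String, and a String property may well be null when the PDF does not define it. If you need the keywords one by one, something like the following sketch could help. The class PalabrasClaveUtil and its method separar are hypothetical names, and the assumption that keywords are separated by commas or semicolons is mine; PDFs do not guarantee any particular separator.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of PDFBox or of the classes in this post).
public class PalabrasClaveUtil {

    /**
     * Splits a keywords String on commas or semicolons (an assumed
     * convention), trimming each piece and skipping empty ones.
     * A null input gives an empty list.
     */
    public static List<String> separar(String palabrasClave) {
        List<String> resultado = new ArrayList<>();
        if (palabrasClave == null) {
            return resultado;
        }
        for (String parte : palabrasClave.split("[,;]")) {
            String limpia = parte.trim();
            if (!limpia.isEmpty()) {
                resultado.add(limpia);
            }
        }
        return resultado;
    }

    public static void main(String[] args) {
        // prints [java, pdf, PDFBox]
        System.out.println(separar("java, pdf; PDFBox"));
    }
}
```

With an EntidadPDF instance called entPdf, it would be used as PalabrasClaveUtil.separar(entPdf.getPalabrasClave()).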
The method that reads the PDF from a given URL is in the following class:
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;

import org.pdfbox.pdfparser.PDFParser;
import org.pdfbox.pdmodel.PDDocument;
import org.pdfbox.pdmodel.PDDocumentInformation;
import org.pdfbox.util.PDFTextStripper;

/**
 * @author nany
 */
public class ContenedorPDF {

    private InputStream file = null;
    private PDDocument document = null;

    /**
     * Opens the PDF at the given URL and builds an object of type EntidadPDF.
     *
     * @param file URL of the PDF to read
     * @return the EntidadPDF (its fields stay unset if the PDF could not be read)
     */
    public EntidadPDF convertirAModelo(String file) {
        EntidadPDF entPdf = new EntidadPDF();
        try {
            // Read the PDF from the given URL
            URL url = new URL(file);
            this.file = url.openStream();
            PDFParser parser = new PDFParser(this.file);
            parser.parse();
            document = parser.getPDDocument();

            // Get the entire contents of the PDF
            PDFTextStripper stripper = new PDFTextStripper();
            entPdf.setContenido(stripper.getText(document));

            // Get the PDF properties
            PDDocumentInformation info = document.getDocumentInformation();
            entPdf.setTitulo(info.getTitle());
            entPdf.setAutor(info.getAuthor());
            entPdf.setNumeroDePaginas(document.getNumberOfPages());
            entPdf.setTema(info.getSubject());
            entPdf.setPalabrasClave(info.getKeywords());
            entPdf.setCreador(info.getCreator());
            entPdf.setProductor(info.getProducer());
        } catch (FileNotFoundException e) {
            // the file was not found
            e.printStackTrace();
        } catch (IOException e) {
            // could not open the file
            e.printStackTrace();
        } finally {
            // this.file is the stream; the parameter named file is the URL String
            if (this.file != null) {
                try {
                    this.file.close();
                } catch (IOException e) {
                    // failed to close the stream
                    e.printStackTrace();
                }
            }
            if (document != null) {
                try {
                    document.close();
                } catch (IOException e) {
                    // could not close the document
                    e.printStackTrace();
                }
            }
        }
        return entPdf;
    }
}
Well, I hope this will be of help. Here is another link, to other libraries that allow manipulation of PDFs:
http://www.qoppa.com/