POI HSLF
Java API To Access Microsoft Powerpoint Format Files
1. POI API Document
http://jakarta.apache.org/poi/apidocs/index.html
2. POI Download
http://www.apache.org/dyn/closer.cgi/jakarta/poi/
3. 간단한 텍스트 읽기
4. 테스트버젼
PPT 2000
POI는 MicroSoft 사의 OLE2 기반의 파일 형식을 자바로 개발하기 위한 Apache Software Foundation의 프로젝트 중 하나이다. POI는 MS Format File을 자바를 이용하여 액세스 할 수 있는 API를 제공한다.
package net.study.poi.hssf; import java.io.File; import java.io.FileInputStream; import java.io.IOException; import org.apache.poi.hssf.extractor.ExcelExtractor; import org.apache.poi.hssf.usermodel.HSSFWorkbook; import org.apache.poi.ss.usermodel.Cell; import org.apache.poi.ss.usermodel.DateUtil; import org.apache.poi.ss.usermodel.Row; import org.apache.poi.ss.util.CellReference; import org.apache.poi.xssf.extractor.XSSFExcelExtractor; import org.apache.poi.xssf.usermodel.XSSFWorkbook; public class ExcelReader { public void read(String excel) throws IOException { // check file File file = new File(excel); if (!file.exists() || !file.isFile() || !file.canRead()) { throw new IOException(excel); } // Workbook HSSFWorkbook wb = new HSSFWorkbook(new FileInputStream(file)); // Text Extraction ExcelExtractor extractor = new ExcelExtractor(wb); extractor.setFormulasNotResults(true); extractor.setIncludeSheetNames(false); System.out.println( extractor.getText() ); // Getting cell contents for( int i=0; i<wb.getNumberOfSheets(); i++) { for( Row row : wb.getSheetAt(i) ) { for( Cell cell : row ) { // Cell Reference CellReference cellRef = new CellReference(row.getRowNum(), cell.getColumnIndex()); System.out.print(cellRef.formatAsString()); System.out.print(" - "); switch( cell.getCellType() ) { case Cell.CELL_TYPE_STRING : System.out.println( cell.getRichStringCellValue().getString() ); break; case Cell.CELL_TYPE_NUMERIC : if(DateUtil.isCellDateFormatted(cell)) { System.out.println(cell.getDateCellValue()); } else { System.out.println(cell.getNumericCellValue()); } break; case Cell.CELL_TYPE_BOOLEAN : System.out.println(cell.getBooleanCellValue()); break; case Cell.CELL_TYPE_FORMULA : System.out.println(cell.getCellFormula()); break; default: System.out.println(); } } } } } public void read2007(String excel) throws IOException { // check file File file = new File(excel); if (!file.exists() || !file.isFile() || !file.canRead()) { throw new IOException(excel); } // Workbook XSSFWorkbook wb = new XSSFWorkbook(new FileInputStream(file)); // Text Extraction XSSFExcelExtractor extractor = new XSSFExcelExtractor(wb); extractor.setFormulasNotResults(true); extractor.setIncludeSheetNames(false); System.out.println( extractor.getText() ); // Getting cell contents for( int i=0; i<wb.getNumberOfSheets(); i++) { for( Row row : wb.getSheetAt(i) ) { for( Cell cell : row ) { // Cell Reference CellReference cellRef = new CellReference(row.getRowNum(), cell.getColumnIndex()); System.out.print(cellRef.formatAsString()); System.out.print(" - "); switch( cell.getCellType() ) { case Cell.CELL_TYPE_STRING : System.out.println( cell.getRichStringCellValue().getString() ); break; case Cell.CELL_TYPE_NUMERIC : if(DateUtil.isCellDateFormatted(cell)) { System.out.println(cell.getDateCellValue()); } else { System.out.println(cell.getNumericCellValue()); } break; case Cell.CELL_TYPE_BOOLEAN : System.out.println(cell.getBooleanCellValue()); break; case Cell.CELL_TYPE_FORMULA : System.out.println(cell.getCellFormula()); break; default: System.out.println(); } } } } }
public static void main(String argv[]) { try { // ExcelReader ExcelReader reader = new ExcelReader(); // Test File String excel = reader.getClass().getResource("/point.xls").getPath(); String excel2007 = reader.getClass().getResource("/point.xlsx").getPath(); // reader.read(excel); reader.read2007(excel2007); } catch (Exception e) { System.out.println("Exception occurred! : " + e); } } }
1. 엑셀 읽어 DB에 저장.
Action 부분 : 업로드 파일을 받아서 서버에 저장.
// 파일 이름 중복 방지 SimpleDateFormat sdf = new SimpleDateFormat("yyyyMMddHHmmss"); String rndName = sdf.format(new java.util.Date()) + System.currentTimeMillis(); // 실제 저장될 파일 이름 String realName = rndName + ".xls"; // 실제로 저장될 파일 풀 경로 File file = new File( "경로" +"/", realName); // 저장하기(복사) is = vo.getUploadFile().getInputStream(); os = new FileOutputStream(file); int b = 0; int buf_length = 4096; byte[] buff = new byte[buf_length]; while ((b = is.read(buff, 0, buf_length)) != -1) { os.write(buff, 0, b); os.flush(); } vo.setUpload_file(realName);
// 엑셀파일 로드
File file = new File("경로" +"/", vo.getUpload_file());
POIFSFileSystem fs=new POIFSFileSystem(new FileInputStream(file));
HSSFWorkbook wb=new HSSFWorkbook(fs);
HSSFSheet sheet=wb.getSheetAt(0); // 시트 가져오기
int rows=sheet.getPhysicalNumberOfRows(); // Row 갯수 가져오기
int cells=sheet.getRow(0).getPhysicalNumberOfCells(); // cell 갯수 가져오기
if ( rows >= 2 ) { // 1번째 줄일 제목 줄이기 때문에 2번부터 시작
for(int i=1; i<rows; i++){ // row 루프
HSSFRow row=sheet.getRow(i); // row 가져오기
if(row != null && !"".equals(row.getCell(0).getStringCellValue())){
int idx=1;
for(int c=0; c<cells; c++){ // cell 루프
HSSFCell cell = row.getCell(c); // cell 가져오기
String value = "";
if(cell!=null){
switch(cell.getCellType()){ // cell 타입에 따른 데이타 저장
case HSSFCell.CELL_TYPE_FORMULA:
value=cell.getCellFormula();
break;
case HSSFCell.CELL_TYPE_NUMERIC:
value=""+cell.getNumericCellValue();
break;
case HSSFCell.CELL_TYPE_STRING:
value=""+cell.getStringCellValue();
break;
case HSSFCell.CELL_TYPE_BLANK:
//value=""+cell.getBooleanCellValue();
value="";
break;
case HSSFCell.CELL_TYPE_ERROR:
value=""+cell.getErrorCellValue();
break;
default:
}
}
stmt.setString(idx++, value.trim());
}
result = stmt.executeUpdate(); //쿼리문 insert 실행
}
}
}
logger.info("완료!!");
간단하기때문에 jakarta 홈페이지에 가서 guide를 참조하면서 개발 하면 된다.
2. DB에서 자료 읽어서 서버에 엑셀 저장 후 다운로드
// Excel Write HSSFWorkbook workbook = new HSSFWorkbook(); HSSFSheet sheet = workbook.createSheet("Sheet1"); // Font 설정. HSSFFont font = workbook.createFont(); font.setFontName(HSSFFont.FONT_ARIAL); //제목의 스타일 지정 HSSFCellStyle titlestyle = workbook.createCellStyle(); titlestyle.setFillForegroundColor(HSSFColor.SKY_BLUE.index); titlestyle.setFillPattern(HSSFCellStyle.SOLID_FOREGROUND); titlestyle.setAlignment(HSSFCellStyle.ALIGN_CENTER); titlestyle.setFont(font); // Row 생성 HSSFRow row = sheet.createRow((short)0); // Cell 생성 HSSFCell cell1 = row.createCell((short)0 ); cell1.setCellValue("제목1"); cell1.setCellStyle(titlestyle); HSSFCell cell2 = row.createCell((short)1); cell2.setCellValue("제목2"); cell2.setCellStyle(titlestyle); HSSFCell cell3 = row.createCell((short)2); cell3.setCellValue("제목3"); cell3.setCellStyle(titlestyle); // 내용 스타일 지정 HSSFCellStyle style = workbook.createCellStyle(); style.setFont(font); // 내용 중 가운데 정렬 추가 HSSFCellStyle styleCenter = workbook.createCellStyle(); styleCenter.setAlignment(HSSFCellStyle.ALIGN_CENTER); styleCenter.setFont(font); for (int i=0; i<list.size();i++){ row = sheet.createRow((short)(i+1)); cell1 = row.createCell((short)0 ); cell1.setCellValue("내용1"); cell1.setCellStyle(styleCenter); cell2 = row.createCell((short)1); cell2.setCellValue("내용2"); cell2.setCellStyle(style); cell3 = row.createCell((short)2); cell3.setCellValue("내용3"); cell3.setCellStyle(style); } // 실제 저장될 파일 이름 realName = "TEST.xls"; // 실제로 저장될 파일 풀 경로 File file = new File("경로" +"/", realName); // 엑셀 파일을 만듬 FileOutputStream fileOutput = new FileOutputStream(file); workbook.write(fileOutput); fileOutput.close(); logger.info("Excel File 생성 OK");
POI - HSSF
// workbook HSSFWorkbook wb = new HSSFWorkbook(); // sheet HSSFSheet sheet = wb.createSheet("sheetname"); // title cell sheet.createRow((short) 0).createCell((short) 0).setCellValue("Position"); sheet.createRow((short) 0).createCell((short) 1).setCellValue("DaumID"); sheet.createRow((short) 0).createCell((short) 2).setCellValue("[DaumName]"); sheet.createRow((short) 0).createCell((short) 3).setCellValue("imgUrl"); int startpoint = currentpage * 400 + 1; int index = 0; for (int i = startpoint; i < testList.size() + startpoint; i++) { sheet.createRow((short) i).createCell((short) 0).setCellValue(testList.get(index).getPosition()); sheet.createRow((short) i).createCell((short) 1).setCellValue(testList.get(index).getDaumid()); // 한글 인코딩 HSSFCell nametmp = sheet.createRow((short) i).createCell((short) 2); nametmp.setEncoding(HSSFCell.ENCODING_UTF_16); nametmp.setCellValue(testList.get(index).getDaumname()); sheet.createRow((short) i).createCell((short) 3).setCellValue(tesetList.get(index).getUrl()); index++; } [FileOutputStream] fileOut = new [FileOutputStream]("/hanmail/test.xls"); wb.write(fileOut); fileOut.close();
POIFS : Microsoft의 OLE2 포맷 형식의 문서를 자바로 읽고 쓸 수 있는 컴포넌트
HSSF : Microsoft의 엑셀파일을 읽고 쓸수 있도록 지원하는 컴포넌트
HWPF : Microsoft의 워드파일을 읽고 쓸수 있도록 지원하는 컴포넌트
HPSF : Microsoft의 OLE2 포맷 형식의 문서 속성을 어플리케이션에서 사용 할 수 있도록 지원하는 컴포넌트
http://jakarta.apache.org/site/downloads/downloads_poi.cgi
Last updated Jun 15, 2007.
While Apache POI's support for Microsoft Word documents is simplistic, it does provide the capabilities to read the contents of a Word document. In this section I review how to use POI to extract the contents of a Word document.
Word documents are maintained by a POIFSFileSystem class, just like the other documents in this series. Thus to access a Word document, you start with a POIFSFileSystem instance:
POIFSFileSystem fs = new POIFSFileSystem( new FileInputStream( filename ) );
While the POIFSFileSystem provides access to the underlying document, the specific class that represents a Word document is the org.apache.poi.hwpf.HWPFDocument class. At this point in its development, this class provides access to the underlying components of the Word document, but it does not provide a useful higher level abstraction. To extract the textual contents of the document, POI provides another helper class: org.apache.poi.hwpf.extractor.WordExtractor. This class provide three methods, shown in table 1.
Method |
Description |
String[] getParagraphText() |
Retrieves the text from a Word file as an array of paragraph Strings (each String represents a paragraph in from the document) |
String getText() |
Returns all of the text of the Word document |
String getTextFromPieces() |
Returns the text from the document pieces, which may contain markup text; works when the text piece to paragraph mapping is broken |
A WordExtractor can be created from an HWPFDocument instance as follows:
WordExtractor we = new WordExtractor( doc );
And then we can iterate over the entire Word document, paragraph-by-paragraph by using thegetParagraphText() method. For example:
// Extract all paragraphs in the document as strings
String[] paragraphs = we.getParagraphText();
// Output the document
System.out.println( "Word Document has " + paragraphs.length + " paragraphs" );
for( int i=0; i<paragraphs.length; i++ )
{
System.out.println( paragraphs[ i ] );
}
Listing 1 puts this all together into a working example that accepts as a command line input a Word document and then displays its contents to the standard output.
package com.informit.poi;
// Import POI classes
import org.apache.poi.poifs.filesystem.*;
import org.apache.poi.hwpf.*;
import org.apache.poi.hwpf.extractor.*;
// Import Java classes
import java.io.*;
import java.util.*;
public class WordSampleReader
{
public static void main( String[] args )
{
if( args.length == 0 )
{
System.out.println( "Usage: WordSampleReader <filename>" );
System.exit( 0 );
}
String filename = args[ 0 ];
try
{
// Create a POI File System object; this is the main class for the POIFS file system
// and it manages the entire lifecycle of the file system
POIFSFileSystem fs = new POIFSFileSystem( new FileInputStream( filename ) );
// Create a document for this file
HWPFDocument doc = new HWPFDocument( fs );
// Create a WordExtractor to read the text of the word document
WordExtractor we = new WordExtractor( doc );
// Extract all paragraphs in the document as strings
String[] paragraphs = we.getParagraphText();
// Output the document
System.out.println( "Word Document has " + paragraphs.length + " paragraphs" );
for( int i=0; i<paragraphs.length; i++ )
{
System.out.println( paragraphs[ i ] );
}
}
catch( Exception e )
{
e.printStackTrace();
}
}
}
Listing 1 opens the specified file by creating a FileInputStream to it and building a POIFSFileSystemobject. It then constructs an HWPFDocument object from the POIFSFileSystem and a WordExtractor from the HWPFDocument. It extracts all paragraphs from the WordExtractor and then iteratively displays each paragraph to the standard output.
The output is free of all markup, such as document style, underlining, bolding, and so forth. I tested this program by parsing my Java Reference Guide update (in its raw Word format) from last week (the POI PowerPoint update) and it displayed the following output, abbreviated:
Word Document has 140 paragraphs
(c)Jakarta POI
(d)Microsoft PowerPoint Documents
Having taught dozens of classes and given hundreds of presentations, Microsoft PowerPoint has become one of my trusty companions. (A quick search of my local documents revealed over 300 PowerPoint presentations.) And while I might not feel the need to compose a slide presentation without Microsoft PowerPoint, the oppor
tunity to read my PowerPoint presentations and search for content from within a Java application is very appealing.
...
The only "noise" that it displayed at the end of the document included the template name as well as an internal date stamp:
MACGLOBL.DOT
? DATE \l ¶6/3/2007§? TIME ¶9:35 PM§
MACGLOBL.DOT
? DATE \l ¶6/3/2007§? TIME ¶9:35 PM§
While not perfect, it does provide you with access to read and search the contents of a Word document.
Microsoft maintains a collection of proprietary file formats that make interoperability with Java a difficult task. The Jakarta POI project bridges that gap by providing a pure Java solution to reading from and writing to these proprietary file formats.
Thus far in this series we have explored how POI can be used to read Microsoft Excel, PowerPoint, and Word files. As a project for you to embark on if you want more experience using POI, try the following suggestion: build a document search engine. Accept as input a directory to search and then iteratively open each document. If the document has a ".doc" extension then open it with the Word reader, an ".xls" extension then open it with the Excel reader, or a ".ppt" extension then open it with the PowerPoint reader. For a simple search you can load the entire document into a String and then search for a specific substring using the indexOf() method. And for the truly adventurous, you can incorporate the Jakarta Regular expression library and go to town.
POI HSLF
Java API To Access Microsoft Powerpoint Format Files
1. POI API Document
http://jakarta.apache.org/poi/apidocs/index.html
2. POI Download
http://www.apache.org/dyn/closer.cgi/jakarta/poi/
3. 간단한 텍스트 읽기
4. 테스트버젼
PPT 2000
[펌] SWIG를 이용해 C++코드를 C#에서 사용하기 (0) | 2012.06.08 |
---|---|
[POI와 JXL 비교] JAVA에서 엑셀 파일 읽고 쓰기 (0) | 2011.06.20 |
Akka를 이용한 Concurrent 프로그래밍 시작하기 (0) | 2011.04.03 |
Tlemock을 이용한 Mock 객체 생성 및 사용 (0) | 2011.04.03 |
jMock vs. EasyMock (0) | 2011.04.03 |