Converting PDF to text in Java for full-text document search
To do
https://stackoverflow.com/questions/18098400/how-to-get-raw-text-from-pdf-file-using-java
https://stackoverflow.com/questions/50692771/multiple-pdf-file-to-txt-in-java
https://stackoverflow.com/questions/30570196/how-to-convert-pdf-into-text-file-using-itext-liberary
https://stackoverflow.com/questions/23813727/how-to-extract-text-from-a-pdf-file-with-apache-pdfbox
https://stackoverflow.com/questions/583615/pdf-to-text-tool-or-java-library
https://stackoverflow.com/questions/17986305/how-can-i-convert-pdf-file-to-word-file-using-java
Lucene full-text search
https://www.toptal.com/database/full-text-search-of-dialogues-with-apache-lucene
(https://github.com/dougsparling/lucene-testbed)
https://stackoverflow.com/questions/6807701/lucene-full-text-search
https://medium.com/@wkrzywiec/full-text-search-with-hibernate-search-lucene-part-1-e245b889aa8e
(https://github.com/wkrzywiec/Library-Spring/tree/163fbbac65750b199cc665a2ba61fd4b80fc2ff6)
https://blog.csdn.net/forfuture1978/article/details/4711308
https://blog.csdn.net/yerenyuan_pku/article/details/72582979
https://blog.csdn.net/u014704496/article/details/40408387
https://www.baeldung.com/lucene-file-search
(https://github.com/eugenp/tutorials/tree/master/lucene)
https://github.com/tantivy-search/tantivy
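For orientation before reading the links above, here is a minimal sketch of the idea behind full-text search: an inverted index that maps each term to the documents containing it. This is not Lucene's API. The class name TinyInvertedIndex, its methods, and the sample file names are all hypothetical, and the tokenizer simply splits on non-letter, non-digit characters, so it assumes space-separated text. Chinese text would need a proper analyzer.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy index for illustration only; not part of Lucene or PDFBox.
class TinyInvertedIndex {
    // term -> sorted set of ids of the documents that contain the term
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Lower-case the text and split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Index one document, e.g. the text already extracted from one PDF.
    public void add(String docId, String text) {
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // Return the ids of documents containing every query term (AND semantics).
    public Set<String> search(String query) {
        Set<String> result = null;
        for (String term : tokenize(query)) {
            Set<String> docs = postings.getOrDefault(term, Collections.emptySet());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        index.add("a.txt", "Apache PDFBox extracts text from PDF files");
        index.add("b.txt", "Lucene indexes text for full-text search");
        System.out.println(index.search("text"));
        System.out.println(index.search("pdf text"));
    }
}
```

A real engine such as Lucene adds analyzers, relevance scoring, and on-disk index segments on top of this basic structure, so the sketch is only a mental model for the tutorials listed here.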
Extracting the table of contents (outline) from a PDF:
https://pdfbox.apache.org/docs/2.0.2/javadocs/org/apache/pdfbox/pdmodel/PDDocument.html