Text search and extraction in a PDF file

Ritika

8y

I am working on text search and extraction from a PDF using the third-party iTextSharp DLL.

The search finds the text, but what I get back is the whole text of the page rather than only the matching text.

I thought of using phrases or chunks so that I get only the matched text together with the text immediately before and after it, instead of the whole page. Can anyone suggest code using phrases, or anything else I could use for this? Thanks!

My code is:

string searchText = textBox.Text;
string filename = @"C:\test.pdf";
List<int> pages = new List<int>();

if (File.Exists(filename))
{
    PdfReader pdfReader = new PdfReader(filename);
    // Declared for the phrase-based approach, not used yet.
    List<Phrase> PhraseList = new List<Phrase>();

    for (int page = 1; page <= pdfReader.NumberOfPages; page++)
    {
        // Extract the full text of the current page.
        ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
        string currentPageText = PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy);

        if (currentPageText.Contains(searchText))
        {
            pages.Add(page);
            // This appends the whole page text, not just the text around the match.
            textBox1.AppendText(currentPageText);
        }
    }

    pdfReader.Close();

    // List the page numbers that contained the search text.
    textBox1.Text += string.Join(", ", pages);
}
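
One possible way to get only the text before and after each match, without phrases or chunks, is to work on the plain string that PdfTextExtractor.GetTextFromPage already returns and cut out a window of characters around every occurrence. The sketch below is only an illustration under that assumption: SnippetFinder, FindSnippets and the contextChars parameter are made-up names, not part of iTextSharp, and the comparison is ordinal and case-sensitive to match what Contains does in the code above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not an iTextSharp API): operates on the plain page
// text that has already been extracted from the PDF.
public static class SnippetFinder
{
    // Returns one snippet per occurrence of searchText, each padded with up
    // to contextChars characters on both sides.
    public static List<string> FindSnippets(string pageText, string searchText, int contextChars)
    {
        var snippets = new List<string>();
        if (string.IsNullOrEmpty(pageText) || string.IsNullOrEmpty(searchText))
            return snippets;

        // Treat a negative window size as zero.
        contextChars = Math.Max(0, contextChars);

        int index = pageText.IndexOf(searchText, StringComparison.Ordinal);
        while (index >= 0)
        {
            // Clamp the window so it never runs past either end of the page text.
            int start = Math.Max(0, index - contextChars);
            int end = Math.Min(pageText.Length, index + searchText.Length + contextChars);
            snippets.Add(pageText.Substring(start, end - start));

            // Continue searching after the current match.
            index = pageText.IndexOf(searchText, index + searchText.Length, StringComparison.Ordinal);
        }
        return snippets;
    }
}
```

Inside the page loop, the whole-page AppendText call could then be replaced by a loop over SnippetFinder.FindSnippets(currentPageText, searchText, 50) that appends each snippet. The window size of 50 characters is arbitrary. A match that is split across a line break in the extracted text would not be found by this approach.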

Answers (3)

Mahesh Chand
18y
Here are some good articles showing what is new in SQL Server 2005.
http://www.dotnetheaven.com/Article/frmDisplayArticles.aspx?SectionID=7
Mehul B
18y
New Business Intelligence features are included, such as Reporting Services, and an XML data type has been added. Lots of other changes have been made.
Check out this link. Hope it helps.

http://www.devx.com/dbzone/Article/21539