Background
A few days ago I read this article by Sourav Kayal, and an idea popped into my head: could I use the barcode library it describes to scan barcodes from a PDF document? I downloaded the library and tried it, but it could not scan barcodes directly from a PDF document.
Introduction
iTextSharp is a very popular free .NET PDF library, and I sometimes use it to process PDF documents. So I tried combining iTextSharp with the barcode library mentioned in the Background section, and it worked. I want to share the solution here.
What iTextSharp supports:
- PDF generation
- PDF manipulation (stamping watermarks, merging/splitting PDFs and so on)
- PDF form filling
- XML functionality
- Digital signatures
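Simple code showing how to use iTextSharp:

    // Requires: using System.IO; using iTextSharp.text; using iTextSharp.text.pdf;

    // Create an A4 document with left/right margins of 50 and top/bottom margins of 25.
    Document document = new Document(PageSize.A4, 50, 50, 25, 25);

    using (FileStream output = new FileStream("firstPdf.pdf", FileMode.Create))
    {
        // Attach a writer to the document before opening it.
        PdfWriter.GetInstance(document, output);

        document.Open();

        Paragraph welcomeParagraph = new Paragraph("Hello, World!");
        document.Add(welcomeParagraph);

        // Closing the document writes the finished PDF to firstPdf.pdf.
        document.Close();
    }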
Sample Code to Scan Barcode from PDF
This section presents the complete code for the job. If you don't know how to use the barcode library, please check Sourav Kayal's article. The “Hello, World!” code above shows the basics of using iTextSharp.
The method GetImages uses iTextSharp to extract images from a PDF document.
Source PDF document
The GetImages method:

    // Requires: using System; using System.Drawing.Imaging; using iTextSharp.text.pdf;
    private static void GetImages(string filename)
    {
        int pageNum = 1; // only the first page is examined

        PdfReader pdf = new PdfReader(filename);
        try
        {
            // Walk page -> /Resources -> /XObject, where the page's images are registered.
            PdfDictionary pg = pdf.GetPageN(pageNum);
            PdfDictionary res = (PdfDictionary)PdfReader.GetPdfObject(pg.Get(PdfName.RESOURCES));
            if (res == null) { return; }
            PdfDictionary xobj = (PdfDictionary)PdfReader.GetPdfObject(res.Get(PdfName.XOBJECT));
            if (xobj == null) { return; }

            foreach (PdfName name in xobj.Keys)
            {
                PdfObject obj = xobj.Get(name);
                if (!obj.IsIndirect()) { continue; }

                // Keep only XObjects whose /Subtype is /Image.
                PdfDictionary tg = (PdfDictionary)PdfReader.GetPdfObject(obj);
                PdfName type = (PdfName)PdfReader.GetPdfObject(tg.Get(PdfName.SUBTYPE));
                if (!PdfName.IMAGE.Equals(type)) { continue; }

                // Read the undecoded stream bytes. They form a loadable image file only
                // when the image is stored as JPEG (DCTDecode) in the PDF.
                int xrefIndex = ((PRIndirectReference)obj).Number;
                PRStream pdfStream = (PRStream)pdf.GetPdfObject(xrefIndex);
                byte[] bytes = PdfReader.GetStreamBytesRaw(pdfStream);
                if (bytes == null) { continue; }

                using (System.IO.MemoryStream memStream = new System.IO.MemoryStream(bytes))
                using (System.Drawing.Image img = System.Drawing.Image.FromStream(memStream))
                {
                    // The file name depends only on the page number, so a later image
                    // on the same page overwrites an earlier one.
                    string path = String.Format(@"result-{0}.jpg", pageNum);

                    // Encoder.Quality is the JPEG setting (Encoder.Compression applies to TIFF).
                    System.Drawing.Imaging.EncoderParameters parms = new System.Drawing.Imaging.EncoderParameters(1);
                    parms.Param[0] = new System.Drawing.Imaging.EncoderParameter(System.Drawing.Imaging.Encoder.Quality, 100L);
                    ImageCodecInfo jpegEncoder = Array.Find(ImageCodecInfo.GetImageEncoders(), x => x.FormatID == ImageFormat.Jpeg.Guid);
                    img.Save(path, jpegEncoder, parms);
                }
            }
        }
        finally
        {
            // Release the PDF file even when the method returns early or throws.
            pdf.Close();
        }
    }
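One caveat about GetImages is worth a small sketch. GetStreamBytesRaw returns the stream bytes exactly as they are stored in the PDF, so System.Drawing.Image.FromStream can load them only when the image was embedded as a JPEG. The helper below is a hypothetical addition, not part of iTextSharp or of the original code (the names JpegCheck and LooksLikeJpeg are made up for illustration). It tests for the JPEG start-of-image signature FF D8 FF so that other streams can be skipped rather than causing an exception.

```csharp
public static class JpegCheck
{
    // Hypothetical helper: returns true when the buffer starts with the
    // JPEG start-of-image marker (bytes FF D8 FF).
    public static bool LooksLikeJpeg(byte[] bytes)
    {
        return bytes != null
            && bytes.Length >= 3
            && bytes[0] == 0xFF
            && bytes[1] == 0xD8
            && bytes[2] == 0xFF;
    }
}
```

Inside GetImages, a check such as `if (!JpegCheck.LooksLikeJpeg(bytes)) { continue; }` placed right after the null check on bytes would skip any stream that is not a JPEG.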
The following shows how to use the barcode library to scan the extracted image:

    // Extract the image from the first page of the PDF.
    GetImages("source2.pdf");

    // Scan the extracted image only if GetImages actually produced it.
    if (File.Exists("result-1.jpg"))
    {
        string scanningResult = Spire.Barcode.BarcodeScanner.ScanOne("result-1.jpg");
        Console.WriteLine(scanningResult);
    }

    Console.WriteLine("Done!");
    Console.ReadLine();
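If GetImages were extended to write one file per image, the same scan could be run over every extracted file. The following is only a sketch under that assumption: BarcodeBatch and ScanAll are hypothetical names, and the scanner is passed in as a delegate so the helper does not depend on any particular barcode library.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class BarcodeBatch
{
    // Hypothetical helper: applies scanOne to every image file that exists
    // and returns a map from file path to scan result.
    // scanOne is any function that reads one barcode from an image file,
    // for example Spire.Barcode.BarcodeScanner.ScanOne from the code above.
    public static Dictionary<string, string> ScanAll(
        IEnumerable<string> imagePaths, Func<string, string> scanOne)
    {
        var results = new Dictionary<string, string>();
        foreach (string path in imagePaths)
        {
            // Skip files that were never extracted.
            if (!File.Exists(path)) { continue; }
            results[path] = scanOne(path);
        }
        return results;
    }
}
```

A call might look like `BarcodeBatch.ScanAll(Directory.GetFiles(".", "result-*.jpg"), Spire.Barcode.BarcodeScanner.ScanOne)`, assuming the extracted files follow the result-*.jpg naming used above.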
Result
Conclusion
Feel free to test the code for scanning barcodes from a PDF document. I hope this article helps you in your own programming.