Hi Vulpes,
If I have more than one image, how can I get all the images with their paths?
Thanks,
Darma
Hi,
I'm not sure why it wasn't working for you. HtmlAgilityPack is well tested and fast.
Here is the code:
using System;
using HtmlAgilityPack;
class Program
{
    static void Main(string[] args)
    {
        var doc = new HtmlDocument();
        doc.Load("e:\\temp\\dharma.html");
        string align = string.Empty, src = string.Empty, title = string.Empty;
        // SelectSingleNode returns null when nothing matches, so check before use.
        var titleNode = doc.DocumentNode.SelectSingleNode("//title");
        if (titleNode != null)
        {
            title = titleNode.InnerText;
        }
        // First <p> element that has an align attribute.
        var p = doc.DocumentNode.SelectSingleNode("//p[@align]");
        if (p != null)
        {
            align = p.Attributes["align"].Value;
        }
        // First <img> element that has a src attribute.
        var img = doc.DocumentNode.SelectSingleNode("//img[@src]");
        if (img != null)
        {
            src = img.Attributes["src"].Value;
        }
        Console.WriteLine(title);
        Console.WriteLine(align);
        Console.WriteLine(src);
    }
}
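For the follow-up about more than one image, a minimal sketch along the same lines could use SelectNodes instead of SelectSingleNode. The class name AllImages and the helper GetImageSources are made up for illustration, and the file path is the same assumed one as above. As far as I know, SelectNodes returns null rather than an empty collection when nothing matches, so the sketch checks for that.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

static class AllImages
{
    // Hypothetical helper: returns the src value of every <img> that has one.
    public static List<string> GetImageSources(HtmlDocument doc)
    {
        var sources = new List<string>();
        // SelectNodes yields null (not an empty collection) when there is no match.
        HtmlNodeCollection images = doc.DocumentNode.SelectNodes("//img[@src]");
        if (images == null)
        {
            return sources;
        }
        foreach (HtmlNode img in images)
        {
            sources.Add(img.GetAttributeValue("src", string.Empty));
        }
        return sources;
    }

    static void Main()
    {
        var doc = new HtmlDocument();
        doc.Load("e:\\temp\\dharma.html"); // assumed path, same as in the code above
        foreach (string src in GetImageSources(doc))
        {
            Console.WriteLine(src);
        }
    }
}
```

Each path is printed exactly as it appears in the src attribute, so relative paths stay relative.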
Hi Vulpes and Raj,
Thanks for the replies.
@Vulpes: It is working perfectly.
@Raj: I tried HtmlAgilityPack first, but unfortunately it was not working.
Thanks,
Darma
Hmm,
Sorry, my bad, I misinterpreted the question. I thought this was on the client side in the browser, but you want to read the file in C#.
Use HtmlAgilityPack (http://htmlagilitypack.codeplex.com/) and add a reference to it in your project.
var doc = new HtmlDocument();
doc.Load("yourfile.htm");
Then use XPath queries, something like this:
var p = doc.DocumentNode.SelectNodes("//p[@align]");

Here's a different approach using regular expressions.
As I don't know what type of application you're writing, I've used a console application for illustration:
using System;
using System.IO;
using System.Text.RegularExpressions;
class Test
{
    static void Main()
    {
        string html = File.ReadAllText("darma.html");
        string regExp1 = "<TITLE>(.*?)</TITLE>";
        string title = Regex.Match(html, regExp1).Groups[1].Value;
        Console.WriteLine(title);
        string regExp2 = @"<p align=""(.*?)"">";
        string align = Regex.Match(html, regExp2).Groups[1].Value;
        Console.WriteLine(align);
        string regExp3 = @"<img src=""(.*?)""";
        string img = Regex.Match(html, regExp3).Groups[1].Value;
        Console.WriteLine(img);
        Console.ReadKey();
    }
}
The output, as expected, is:
MY WEBSITE
left
boat.gif
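If the page has more than one image, the same idea can be extended with Regex.Matches, which returns every match rather than only the first. This is only a sketch: the class name ImageFinder and the helper GetImageSources are made up for illustration, and the pattern is a loosened variant of regExp3 that I am assuming is acceptable (it allows other attributes before src, single or double quotes, and any letter case). Regular expressions will still miss unusual markup such as unquoted attribute values.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

static class ImageFinder
{
    // Hypothetical helper: returns the src value of every <img> tag, in document order.
    public static List<string> GetImageSources(string html)
    {
        // Group 1 captures the opening quote, group 2 the path, \1 the matching closing quote.
        const string pattern = @"<img\b[^>]*?\ssrc\s*=\s*([""'])(.*?)\1";
        var sources = new List<string>();
        foreach (Match m in Regex.Matches(html, pattern, RegexOptions.IgnoreCase))
        {
            sources.Add(m.Groups[2].Value);
        }
        return sources;
    }

    static void Main()
    {
        string html = File.ReadAllText("darma.html"); // same assumed file as above
        foreach (string src in GetImageSources(html))
        {
            Console.WriteLine(src);
        }
        Console.ReadKey();
    }
}
```

Each path is printed exactly as written in the src attribute, one per line.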
Hi Raj,
Thanks for the reply.
Which class should I use to load my HTML file from a file path?
Thanks
Darma
// "MY WEBSITE"
var title = document.title;
// "left". Note: without an id, this takes the first <p> element; if the element has an id, use document.getElementById instead.
var align = document.getElementsByTagName("p")[0].getAttribute("align");
// "boat.gif"
var src = document.getElementsByTagName("img")[0].getAttribute("src");
Run the script above once the DOM is ready, i.e. either call it from a function on body load or place it at the end of the page, just before the closing </body> tag, so that all the DOM elements have already been processed.
Hope this helps,
Cheers,
Raj