
Reading HTML in C# lets your .NET applications work with web content. Whether you need simple data extraction or more complex web scraping, you first load an HTML document and then parse or navigate it. This blog post covers how to read HTML in C#, showing different ways to load HTML content and parse it based on your requirements.
Configure HTML Reader API in C#
You can download the API from the New Releases section, or install Conholdate.Total for .NET from the NuGet Gallery by running the following command in the Package Manager Console in Visual Studio:
PM> NuGet\Install-Package Conholdate.Total
Read an HTML File in C#
HTML (HyperText Markup Language) is the backbone of web pages and defines the structure and content of websites. It consists of elements represented by tags, each serving a specific purpose. When you open a web page, your browser interprets the HTML code and renders it into a visual layout that you can interact with. To read an HTML file in C#, follow the steps below:
- Load the source HTML file into an instance of the HTMLDocument class.
- Read the HTML markup through the DocumentElement.OuterHTML property.
The code snippet below demonstrates how to read an HTML file using C#:
using System;
using Aspose.Html;

string documentPath = "document.html";
// Load an HTML file (HTMLDocument is disposable, so wrap it in a using block)
using (var document = new HTMLDocument(documentPath))
{
    // Write the document markup to the console
    Console.WriteLine(document.DocumentElement.OuterHTML);
}
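The introduction also mentions parsing an HTML string that is already in memory rather than stored in a file. A minimal sketch of that approach, assuming the HTMLDocument(string content, string baseUri) constructor overload and a base URI of "." (an assumption; it only affects how relative links and resources resolve), could look like this. The class HtmlStringReader and its methods ReadOuterHtml and ReadBodyText are hypothetical helpers, not part of the API.

```csharp
using Aspose.Html;

public static class HtmlStringReader
{
    // Parses an in-memory HTML string and returns the serialized markup
    // of the root <html> element. The "." base URI is an assumption.
    public static string ReadOuterHtml(string htmlContent)
    {
        // Dispose the document as soon as the markup has been read
        using (var document = new HTMLDocument(htmlContent, "."))
        {
            return document.DocumentElement.OuterHTML;
        }
    }

    // Parses an in-memory HTML string and returns only the text of <body>
    public static string ReadBodyText(string htmlContent)
    {
        using (var document = new HTMLDocument(htmlContent, "."))
        {
            return document.Body.TextContent;
        }
    }
}
```

For example, HtmlStringReader.ReadBodyText("<p>Hello</p>") returns "Hello". Because each helper disposes its document in a using block, you can call them repeatedly without holding on to parsed documents.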
Navigate HTML File to Read HTML Contents in C#
Follow the steps below to navigate an HTML document and read its contents in C#:
- Prepare the HTML code and initialize an HTMLDocument object with it.
- Get a reference to the first child of the BODY element (the first SPAN).
- Navigate through the child nodes and extract the information you need.
The following code sample shows how to navigate HTML nodes to read HTML contents in C#:
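A minimal sketch of these steps, assuming a small inline markup string with two SPAN elements separated by a space, is shown below. It starts at the first child of BODY, walks the siblings with the DOM FirstChild and NextSibling properties, and skips the whitespace text node between the two spans by keeping only element nodes. The class HtmlNavigator and the method ReadChildElementTexts are hypothetical names used for illustration.

```csharp
using System.Collections.Generic;
using Aspose.Html;
using Aspose.Html.Dom;

public static class HtmlNavigator
{
    // Collects the text of each element that is a direct child of <body>
    public static List<string> ReadChildElementTexts(string htmlContent)
    {
        var texts = new List<string>();
        using (var document = new HTMLDocument(htmlContent, "."))
        {
            // Start from the first child of <body> (here, the first <span>)
            Node node = document.Body.FirstChild;
            while (node != null)
            {
                // Keep element nodes only; whitespace text nodes are skipped
                if (node is Element element)
                {
                    texts.Add(element.TextContent);
                }
                // Move to the next node at the same level
                node = node.NextSibling;
            }
        }
        return texts;
    }
}
```

Calling HtmlNavigator.ReadChildElementTexts("<span>Hello,</span> <span>World!</span>") should return the two element texts, "Hello," and "World!", in document order. Checking the node type instead of counting siblings keeps the loop correct even when the markup contains extra whitespace or comments between elements.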
Read HTML File as String in C#
You can read an HTML page from any URL as a string in C# with the following steps:
- Initialize an HTMLDocument object with the URL.
- Read the text content of the document body.
- Write the extracted text to a TXT file.
The code sample below shows how to read an HTML page from a URL as a string in C#:
using System.IO;
using Aspose.Html;

// Initialize an HTMLDocument object with a URL
using (var document = new HTMLDocument("https://products.aspose.com/html/net"))
{
    // Read the text content of the document body
    string text = document.Body.TextContent;
    // Write the extracted text to a TXT file
    File.WriteAllText("Webpage.txt", text);
}
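Note that Body.TextContent returns only the text of the page, with the tags stripped. If you need the markup itself as a string, the DocumentElement.OuterHTML property from the first example provides it. The sketch below saves both from an already loaded document; the class HtmlToFiles, the method SaveMarkupAndText, and the output paths you pass in are hypothetical.

```csharp
using System.IO;
using Aspose.Html;

public static class HtmlToFiles
{
    // Saves the full markup and the plain body text of a loaded document.
    // The caller owns the document and is responsible for disposing it.
    public static void SaveMarkupAndText(HTMLDocument document, string markupPath, string textPath)
    {
        // The whole document serialized back to an HTML string
        string markup = document.DocumentElement.OuterHTML;
        File.WriteAllText(markupPath, markup);

        // Only the human-readable text of <body>, without tags
        string text = document.Body.TextContent;
        File.WriteAllText(textPath, text);
    }
}
```

For example, you could load the page with new HTMLDocument("https://products.aspose.com/html/net") inside a using block and pass it to SaveMarkupAndText together with paths such as Webpage.html and Webpage.txt. Taking an existing document instead of a URL keeps the helper independent of where the HTML comes from.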
Free Evaluation License
You can get a free temporary license to avoid any evaluation limitations.
Summing Up
Reading HTML in C# is a valuable skill for web-related projects and data extraction tasks. In this blog post, we covered three approaches: reading an HTML file, navigating its nodes, and reading a web page from a URL as a string. With these, you can scrape or parse information from HTML pages for further processing. You can also explore the many other features the API offers, and feel free to reach out to us on the forum if you have any questions.