Svideo (21)
#1
I have figured out how to drill down through an XML-converted web page to the tagged hrefs, but I would like to get the class tag value at the end of the tag (AAA).
Any suggestions on how to get that value?

AAA



Dim TableRows = From TR In xdoc...<tr> _
Where TR.@class = TableName

Dim links = From link In TableRows... _
Select link.@class
Svideo (21)
#2
Re: Linq to XML referencing the class value in a href
a href="http://AnySite.com/qtables.asp?sym=AAA" class="BBLink">AAA

Somehow that didn't get into the first message...
jwooley (123)
#3
Re: Linq to XML referencing the class value in a href
In all honesty, I would probably use a regular expression to parse the query. While I love LINQ to XML, many sites are not truly XHTML compliant. There are tools that take such a site and return it as XHTML, but my success with them has varied in the past.

If you do decide to use LINQ to XML, make sure to specify the namespaces that are included in the source. Most issues people have come from not including the namespace. Remember, XML variables are strongly typed just as CLR types are.
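As a minimal sketch of the namespace point, assuming the converted page declares the standard XHTML default namespace (the sample markup and module name here are made up for illustration):

```vb
Imports System
Imports System.Linq
Imports System.Xml.Linq
' Default XML namespace for the axis properties used in this file.
Imports <xmlns="http://www.w3.org/1999/xhtml">

Module NamespaceDemo
    Sub Main()
        ' Hypothetical XHTML fragment that declares the XHTML namespace.
        Dim source As XElement = XElement.Parse( _
            "<html xmlns='http://www.w3.org/1999/xhtml'><body>" & _
            "<a href='http://AnySite.com/qtables.asp?sym=AAA' class='BBLink'>AAA</a>" & _
            "</body></html>")

        ' With the Imports line above, <a> means {http://www.w3.org/1999/xhtml}a,
        ' so the element is found. Without it, the same query matches nothing.
        Console.WriteLine(source...<a>.Count())   ' prints 1
    End Sub
End Module
```

The default namespace applies to element names only; unqualified attributes such as class stay in no namespace, so @class needs no prefix.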

That being said, see if the following query would work (assuming you have imported the namespaces):

Dim query = from node in source...<a> _
where node.@class.Value = MySearchString _
select node.Value

This will return an IEnumerable(Of String) which you would then iterate over, or use the standard First/Single/etc methods as appropriate.

Jim
Svideo (21)
#4
Re: Linq to XML referencing the class value in a href
I decided to use HTMLAgilityPack with the new extended property (HtmlDocumentExtensions) to export XML that can be searched with LINQ to XML. This might make the results a lot more consistent with LINQ. Because some sites return executed JavaScript, I used a WebBrowser control, although I would like to find something a bit lighter weight to build a library on. Maybe someone knows of a library?

My goal was to program a search to find the one table that had 100 rows. So the next step is to convert all the properties in a site to an object where I can find what I want quickly. Regex is fine, but I will be converting tens of thousands of pages into a database, and the names and positions of the tables are inconsistent. There was a product called webzinc.net that did something like this, but they aren't supporting it anymore.


So now I can check every table, find the first one where the table has 100 rows by checking the .count property, and use that table name, instead of hoping regex finds what I want in a changing landscape of names and positions.
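Roughly, the row-count check described above could look like the sketch below. The function name, sample markup and class names are hypothetical, and it assumes the page has already been converted to well-formed XML with no default namespace:

```vb
Imports System
Imports System.Linq
Imports System.Xml.Linq

Module TableFinder
    ' Returns the first <table> with exactly rowCount descendant <tr>
    ' elements, or Nothing when no table matches.
    Function FindTableByRowCount(ByVal xdoc As XDocument, ByVal rowCount As Integer) As XElement
        Return (From t In xdoc...<table> _
                Where t...<tr>.Count() = rowCount _
                Select t).FirstOrDefault()
    End Function

    Sub Main()
        ' Hypothetical page: a one-row navigation table and a two-row data table.
        Dim xdoc As XDocument = XDocument.Parse( _
            "<html><body>" & _
            "<table class='nav'><tr><td>menu</td></tr></table>" & _
            "<table class='data'><tr><td>1</td></tr><tr><td>2</td></tr></table>" & _
            "</body></html>")

        Dim match As XElement = FindTableByRowCount(xdoc, 2)
        If match IsNot Nothing Then
            Console.WriteLine(match.@class)   ' prints data
        End If
    End Sub
End Module
```

Note that t...<tr> counts rows of nested tables as well, so a layout table wrapping the data table would need a stricter test.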

I did try the .value property but I was doing something else wrong and was stuck on stupid.

Regards


http://www.codeplex.com/htmlagilitypack/Thread/View.aspx?ThreadId=27908
Svideo (21)
#5
Re: Linq to XML referencing the class value in a href
Using your example code, I'm stuck in the same place.

Where node.@class.Value = SearchName

Error 1 'Value' is not a member of 'String'
jwooley (123)
#6
Re: Linq to XML referencing the class value in a href
You're right. I coded that from memory. It should be:

where node.@class = MySearchString _
select node.Value

The attribute evaluates as a string.
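Put together, a self-contained sketch of the corrected query might look like this. The sample markup is made up, modeled on the anchor from post #2, and assumes no XML namespace on the elements:

```vb
Imports System
Imports System.Linq
Imports System.Xml.Linq

Module ClassQueryDemo
    Sub Main()
        ' Hypothetical sample markup built around the anchor from post #2.
        Dim source As XElement = _
            <table>
                <tr class="Quotes">
                    <td><a href="http://AnySite.com/qtables.asp?sym=AAA" class="BBLink">AAA</a></td>
                </tr>
            </table>

        Dim MySearchString As String = "BBLink"

        ' node.@class is already a String, so it is compared directly;
        ' node.Value is the text content of the <a> element.
        Dim query = From node In source...<a> _
                    Where node.@class = MySearchString _
                    Select node.Value

        For Each symbol As String In query
            Console.WriteLine(symbol)   ' prints AAA
        Next
    End Sub
End Module
```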

The HTMLAgilityPack is the best one that I've heard of as well.

Jim
Svideo (21)
#7
Re: Hexadecimal value is invalid character
As a follow-up, it is working well, but one issue is that the conversion to XML can fail with the error "Hexadecimal value 0x is an invalid character".

http://seattlesoftware.wordpress.com/2008/09/11/hexadecimal-value-0-is-an-invalid-character/

The link above is a pretty good article about stripping control characters out of the HTML before conversion to XML.
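For reference, a minimal sketch of that kind of filter, not necessarily the article's exact code. The function and module names are hypothetical; the allowed ranges are the character ranges of XML 1.0, and for simplicity this version also drops surrogate pairs (characters outside the Basic Multilingual Plane):

```vb
Imports System
Imports System.Text

Module XmlSanitizer
    ' Returns a copy of html without characters that XML 1.0 does not allow.
    ' Surrogate code units are removed too, to keep the check simple.
    Function StripInvalidXmlChars(ByVal html As String) As String
        If String.IsNullOrEmpty(html) Then Return html

        Dim sb As New StringBuilder(html.Length)
        For Each ch As Char In html
            Dim code As Integer = Convert.ToInt32(ch)
            If code = &H9 OrElse code = &HA OrElse code = &HD OrElse _
               (code >= &H20 AndAlso code <= &HD7FF) OrElse _
               (code >= &HE000 AndAlso code <= &HFFFD) Then
                sb.Append(ch)
            End If
        Next
        Return sb.ToString()
    End Function

    Sub Main()
        ' A NUL character embedded in otherwise clean text.
        Dim dirty As String = "AAA" & ChrW(0) & "BBB"
        Console.WriteLine(StripInvalidXmlChars(dirty))   ' prints AAABBB
    End Sub
End Module
```

Running the page text through a filter like this before handing it to the XML conversion should avoid the invalid character exception.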