Steev's HTML Parser
Steev's HTML Parser is an HTML parsing library that builds a complete hierarchy for each element and attribute in the supplied HTML file. Each element is its own C++ class, replete with child nodes, allowing for full control and processing. An 'HTML beautifier' example is included.
URL: http://freshmeat.net/projects/steevshtmlparser/
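The description above says each element is its own object that owns its child nodes. To show what that buys you, here is a minimal C++ sketch of such an element tree with a tiny beautifier that re-indents it by nesting depth. The `Element` struct and the `beautify` function are hypothetical names chosen for illustration. They are not Steev's HTML Parser's actual API, and the sketch ignores attributes.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical node type (not the library's API): one object per element,
// owning its children in document order.
struct Element {
    std::string tag;                                 // e.g. "body"; empty means a text node
    std::string text;                                // content of a text node
    std::vector<std::unique_ptr<Element>> children;  // owned child nodes

    // Append a child node and return a pointer to it for further nesting.
    Element* add(std::string childTag, std::string childText = "") {
        children.push_back(std::make_unique<Element>());
        children.back()->tag = std::move(childTag);
        children.back()->text = std::move(childText);
        return children.back().get();
    }
};

// "Beautify" by walking the tree recursively and indenting
// each nesting level by two spaces.
void beautify(const Element& e, int depth, std::string& out) {
    std::string indent(depth * 2, ' ');
    if (e.tag.empty()) {  // text node: print its content on its own line
        out += indent + e.text + "\n";
        return;
    }
    out += indent + "<" + e.tag + ">\n";
    for (const auto& child : e.children)
        beautify(*child, depth + 1, out);
    out += indent + "</" + e.tag + ">\n";
}
```

For example, an `html` node with a `body` child holding the text `Hello` prints as five lines, with each nesting level indented two more spaces than its parent.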
htmlcxx
htmlcxx is a simple, non-validating CSS1 and HTML parser for C++. Its parsing policy attempts to mimic the behavior of Mozilla Firefox, so you should expect parse trees similar to those Firefox creates. However, it does not insert elements that are absent from your HTML, so serializing the DOM tree gives exactly the same output as the original HTML document. Another key feature is an STL-like tree navigation API provided by the tree.hh template library.
URL: http://freshmeat.net/projects/htmlcxx/
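The "serializing gives exactly the same output" property comes from never normalizing or inserting anything. It can be sketched as a tokenizer that stores every tag and every text run as an exact slice of the input, so that serialization is plain concatenation. This is a simplified illustration with hypothetical `Token`, `tokenize` and `serialize` names. It is not htmlcxx's API, it builds no tree, and it does not handle comments, scripts, or `>` inside attribute values.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token: a slice of the original input, marked as tag or text.
struct Token {
    bool isTag;       // true for "<...>", false for text between tags
    std::string raw;  // exact original characters, nothing normalized or added
};

// Split HTML into tag and text tokens without altering any characters.
// Simplification: a tag runs from '<' to the next '>'.
std::vector<Token> tokenize(const std::string& html) {
    std::vector<Token> tokens;
    std::size_t pos = 0;
    while (pos < html.size()) {
        if (html[pos] == '<') {
            std::size_t end = html.find('>', pos);
            if (end == std::string::npos) end = html.size() - 1;  // unterminated: keep the rest
            tokens.push_back({true, html.substr(pos, end - pos + 1)});
            pos = end + 1;
        } else {
            std::size_t next = html.find('<', pos);
            if (next == std::string::npos) next = html.size();
            tokens.push_back({false, html.substr(pos, next - pos)});
            pos = next;
        }
    }
    return tokens;
}

// Serialization is concatenation, so the output equals the input byte for byte.
std::string serialize(const std::vector<Token>& tokens) {
    std::string out;
    for (const auto& t : tokens) out += t.raw;
    return out;
}
```

Because the tokens partition the input exactly, `serialize(tokenize(doc)) == doc` holds for any string, including unbalanced markup such as a lone `<br>` that a validating parser might "repair".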
Xport toolkit
Xport is a C++ template class library that can be included in any C++ project to generate XHTML documents. Although it was developed with reporting in mind, Xport can be used to create XHTML documents for many other purposes as well. It can generate and parse (X)HTML documents and stylesheets, is intuitive to use, and offers many options for parsing and generating documents.
URL: http://freshmeat.net/projects/xporttoolkit/
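When generating XHTML, two details are easy to get wrong: escaping characters that are special in markup, and writing empty elements in self-closing form. The helpers below (`escapeXml` and `element`) are a hypothetical sketch of both. They are not Xport's template API.

```cpp
#include <cassert>
#include <string>

// Escape characters that are special in XHTML text and attribute values.
std::string escapeXml(const std::string& s) {
    std::string out;
    for (char c : s) {
        switch (c) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;"; break;
            case '>': out += "&gt;"; break;
            case '"': out += "&quot;"; break;
            default:  out += c;
        }
    }
    return out;
}

// Hypothetical helper: build one element with an optional class attribute.
// An element with no text is written in XHTML's self-closing form, e.g. <br />.
std::string element(const std::string& tag, const std::string& text,
                    const std::string& cls = "") {
    std::string open = "<" + tag;
    if (!cls.empty()) open += " class=\"" + escapeXml(cls) + "\"";
    if (text.empty()) return open + " />";
    return open + ">" + escapeXml(text) + "</" + tag + ">";
}
```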
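I found these by searching freshmeat for the keywords "html parse". All three are open source, and the last one looks really good and powerful. Hope this is useful to you, boss.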