代码: 全选
<tr class="build"><th colspan="0">Build 110</th></tr> <tr class="arccase project flagday"><td>Feb-25</td><td></td><td></td><td></td><td><a href="../pages/2009022501/">Flag Day and Heads Up: Power Aware Dispatcher and Deep C-States</a><br />cpupm keyword mode extensions - <a href="/os/community/arc/caselog/2008/777/">PSARC/2008/777</a><br /> CPU Deep Idle Keyword - <a href="/os/community/arc/caselog/2008/663/">PSARC/2008/663</a><br /></td></tr>
代码: 全选
<tr class="build"><th colspan="0">Build 110</th></tr> <tr class="arccase project flagday"><td>Feb-25</td><td></td><td></td><td></td><td><a href="http://www.opensolaris.org/os/community/on/flag-days/all//pages/2009022501/">Flag Day and Heads Up: Power Aware Dispatcher and Deep C-States</a><br />cpupm keyword mode extensions - <a href="http://www.opensolaris.org/os/community/arc/caselog/2008/777/">PSARC/2008/777</a><br /> CPU Deep Idle Keyword - <a href="http://www.opensolaris.org/os/community/arc/caselog/2008/663/">PSARC/2008/663</a><br /></td></tr>
推荐Beautiful Soup 几行代码就搞定了 哈哈~
代码: 全选
base_url = "http://www.opensolaris.org/os/community/on/flag-days/"
soup = BeautifulSoup(html_text)
link_set = soup.findAll('a')
links = [ e['href'] for e in link_set ]
#get html_text
for e in links:
html_text = string.replace(html_text,e, urljoin(base_url,e),1)