Extracting specific content from a web page

noble_out · #1

Code:

……
<div id="statuspzgz" class="fundpz">
<span class="red bold">1.7791</span>
<div id="statuszdf" class="fundzf up">
<p class="red">0.0261</p>
<p class="red">1.49%</p>
</div>
<p class="time">2014-12-25 15:00</p>
</div>
……

curl http://xxoo.com -o page.html
How can I extract the 1.49% with a command?
Any help from the experts would be appreciated.

eexpress · #2

grep -o '[0-9\.]*%'

noble_out · #3

eexpress wrote: grep -o '[0-9\.]*%'

Thanks, eexpress!

noble_out · #4

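The page has a lot of content, so the match is not precise enough, and sometimes it extracts something other than what I want. How can I first cut out just the snippet shown in my first post, and then apply eexpress's method to that?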

noble_out · #5

grep -o 'statuspzgz.*time'
Er, like this.
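
Since grep works line by line, the pattern above only matches when `statuspzgz` and `time` sit on the same line. As a minimal sketch of the same two-step idea for the multi-line layout shown in the first post, a sed line range can cut out the block first. The `html`, `block` and `result` variable names are made up for illustration, and the sample text is copied from post #1 rather than fetched from a real page:

```shell
#!/bin/bash
# Sample input copied from the first post; a real page would come from curl.
html='<div id="statuspzgz" class="fundpz">
<span class="red bold">1.7791</span>
<div id="statuszdf" class="fundzf up">
<p class="red">0.0261</p>
<p class="red">1.49%</p>
</div>
<p class="time">2014-12-25 15:00</p>
</div>'

# Step 1: keep only the lines from the one containing id="statuspzgz"
# through the next one containing class="time".
block=$(printf '%s\n' "$html" | sed -n '/id="statuspzgz"/,/class="time"/p')

# Step 2: pull the percentage out of that block only.
result=$(printf '%s\n' "$block" | grep -oE '[0-9.]+%')
echo "$result"
```

With a downloaded file, step 1 would read from page.html instead of the variable.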

Lavande · #6

In Python, use BeautifulSoup.
Er... okay, this is about shell, so I went off topic...

eexpress · #7

tr -d '\n'|grep -o 'statuspzgz[^%]*%'|grep -o '[0-9\.]*%'
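
To see why the extra stage helps, here is a small sketch that runs both the plain grep from #2 and the pipeline above on the same input. The first line of the sample (an unrelated "5%") is invented purely to simulate noise elsewhere on the page, and the variable names `page`, `all` and `narrowed` are illustrative:

```shell
#!/bin/bash
# Sample page: one made-up noise line, then the snippet from the first post.
page='<p class="ad">save 5% today</p>
<div id="statuspzgz" class="fundpz">
<span class="red bold">1.7791</span>
<div id="statuszdf" class="fundzf up">
<p class="red">0.0261</p>
<p class="red">1.49%</p>
</div>
<p class="time">2014-12-25 15:00</p>
</div>'

# Plain grep: every percentage on the page, including the noise.
all=$(printf '%s\n' "$page" | grep -o '[0-9.]*%')

# Pipeline: join the lines, keep the text from statuspzgz up to the
# first % after it, then extract the percentage from that piece.
narrowed=$(printf '%s\n' "$page" | tr -d '\n' \
	| grep -o 'statuspzgz[^%]*%' | grep -o '[0-9.]*%')

echo "plain grep:"
echo "$all"
echo "pipeline: $narrowed"
```

The pipeline relies on the wanted value being the first `%` after `statuspzgz`, which holds for the snippet in this thread but is an assumption about any other page.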

astolia · #8

For parsing HTML from the shell I recommend xidel; it lets you pinpoint elements precisely with XPath or CSS selectors.
http://videlibri.sourceforge.net/xidel.html

Code:

xidel -q --css "#statuszdf>p:nth-child(2)" a.html

Or use a pipe:

Code:

echo '<div id="statuspzgz" class="fundpz">
<span class="red bold">1.7791</span>
<div id="statuszdf" class="fundzf up">
<p class="red">0.0261</p>
<p class="red">1.49%</p>
</div>
<p class="time">2014-12-25 15:00</p>
</div>' | xidel -q --css "#statuszdf>p:nth-child(2)" -

noble_out · #9

I wrote a small script for checking fund returns in real time.

Code:

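#!/bin/bash
# Print NAV, daily change and holdings for a few funds scraped from fund.eastmoney.com.
tmp=~/desktop/TMP
mkdir -p "$tmp"
# printf instead of echo: bash's builtin echo does not expand \t unless given -e.
printf 'Code\tName\tNAV\tChange\tShares\tGain\tValue\n'
jrsy=0	# running total of today's gain
ljsz=0	# running total of market value
# Each entry is code:name:shares
for i in 531020:建信转债增强债券:2223.98 530008:建信稳定增利债券:1076.81 217008:招商安本增利债券:1091.46
do
	IFS=: read -r id name num <<< "$i"
	curl -s "http://fund.eastmoney.com/$id.html" -o "$tmp/$id.html"
	# Keep only line 410 of the page, where the script expects the quote block.
	sed -i '410!d' "$tmp/$id.html"
	# Numbers in the block, in order: NAV, absolute change, percent change.
	data=$(grep -o 'statuspzgz.*time' "$tmp/$id.html" | grep -o -e '-\?[0-9.]\+')
	jz=$(echo $data | awk '{print $1}')
	zd=$(echo $data | awk '{print $3}')
	# Gain as computed here: shares * percent change / 100.
	sy=$(echo "scale=2;$num*$zd/100" | bc)
	sz=$(echo "scale=2;$num*$jz" | bc)
	jrsy=$(echo "scale=2;$sy+$jrsy" | bc)
	ljsz=$(echo "scale=2;$sz+$ljsz" | bc)
	printf '%s\t%s\t%s\t%s%%\t%s\t%s\t%s\n' "$id" "$name" "$jz" "$zd" "$num" "$sy" "$sz"
done
# 7000 is hard-coded (presumably the amount invested).
ljsy=$(echo "scale=2;$ljsz-7000" | bc)
echo "Today's gain: $jrsy"
echo "Cumulative gain: $ljsy"
rm "$tmp"/*.html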
