All times are UTC + 8 hours



Post 1
Post subject: Help needed: string extraction problem
Posted: 2010-12-12 14:40

Joined: 2009-04-09 15:06
Posts: 673
Thanks given: 0
Thanks received: 13
How can I take a string in this format
Code:
var vrsvideolist = {"videolist": [{"videoImage":"http://photocdn.sohu.com/20100118/vrsb30507.jpg","relativeVideoId":51297,"videoUrl":"http://tv.sohu.com/20100119/n269698584.shtml","videoId":51296,"videoOrder":"1","videoName":"安与安寻1第1集","playLength":412},{"videoImage":"http://photocdn.sohu.com/20100118/vrsb30509.jpg","relativeVideoId":51301,"videoUrl":"http://tv.sohu.com/20100119/n269698593.shtml","videoId":51300,"videoOrder":"2","videoName":"安与安寻1第2集","playLength":485},{"videoImage":"http://photocdn.sohu.com/20100118/vrsb30511.jpg","relativeVideoId":51304,"videoUrl":"http://tv.sohu.com/20100119/n269698604.shtml","videoId":51303,"videoOrder":"3","videoName":"安与安寻1第3集","playLength":469},{"videoImage":"http://photocdn.sohu.com/20100118/vrsb30512.jpg","relativeVideoId":51306,"videoUrl":"http://tv.sohu.com/20100119/n269698616.shtml","videoId":51305,"videoOrder":"4","videoName":"安与安寻1第4集","playLength":428},{"videoImage":"http://photocdn.sohu.com/20100118/vrsb30513.jpg","relativeVideoId":51308,"videoUrl":"http://tv.sohu.com/20100119/n269698620.shtml","videoId":51307,"videoOrder":"5","videoName":"安与安寻1第5集","playLength":428},{"videoImage":"http://photocdn.sohu.com/20100118/vrsb30517.jpg","relativeVideoId":51313,"videoUrl":"http://tv.sohu.com/20100119/n269698637.shtml","videoId":51312,"videoOrder":"6","videoName":"安与安寻1第6集","playLength":433},{"videoImage":"http://photocdn.sohu.com/20100118/vrsb30523.jpg","relativeVideoId":51320,"videoUrl":"http://tv.sohu.com/20100119/n269698643.shtml","videoId":51319,"videoOrder":"7","videoName":"安与安寻1第7集","playLength":472},{"videoImage":"http://photocdn.sohu.com/20100118/vrsb30534.jpg","relativeVideoId":51334,"videoUrl":"http://tv.sohu.com/20100119/n269698653.shtml","videoId":51333,"videoOrder":"8","videoName":"安与安寻1第8集","playLength":405},{"videoImage":"http://photocdn.sohu.com/20100118/vrsb30531.jpg","relativeVideoId":51330,"videoUrl":"http://tv.sohu.com/20100119/n269698658.shtml","videoId":51329,"videoOrder":"9","videoName":"安与安寻1第9集","pla
yLength":582},{"videoImage":"http://photocdn.sohu.com/20100118/vrsb30541.jpg","relativeVideoId":51343,"videoUrl":"http://tv.sohu.com/20100119/n269698672.shtml","videoId":51342,"videoOrder":"10","videoName":"安与安寻1第10集","playLength":568},{"videoImage":"http://photocdn.sohu.com/20100118/vrsb30527.jpg","relativeVideoId":51325,"videoUrl":"http://tv.sohu.com/20100119/n269698695.shtml","videoId":51324,"videoOrder":"11","videoName":"安与安寻1第11集","playLength":440},{"videoImage":"http://photocdn.sohu.com/20100118/vrsb30538.jpg","relativeVideoId":51339,"videoUrl":"http://tv.sohu.com/20100119/n269698697.shtml","videoId":51338,"videoOrder":"12","videoName":"安与安寻1第12集","playLength":537}]}

and extract the following URLs from it
Code:
http://tv.sohu.com/20100119/n269698584.shtml
http://tv.sohu.com/20100119/n269698593.shtml
http://tv.sohu.com/20100119/n269698604.shtml
http://tv.sohu.com/20100119/n269698616.shtml
http://tv.sohu.com/20100119/n269698620.shtml
http://tv.sohu.com/20100119/n269698637.shtml
http://tv.sohu.com/20100119/n269698643.shtml
http://tv.sohu.com/20100119/n269698653.shtml
http://tv.sohu.com/20100119/n269698658.shtml
http://tv.sohu.com/20100119/n269698672.shtml
http://tv.sohu.com/20100119/n269698695.shtml
http://tv.sohu.com/20100119/n269698697.shtml
and save them to a file?


Post 2
Post subject: Re: Help needed: string extraction problem
Posted: 2010-12-12 15:17

Joined: 2006-11-27 14:21
Posts: 49
Thanks given: 0
Thanks received: 0
A regular expression should be enough for this. sed + awk, or perl?


Post 3
Post subject: Re: Help needed: string extraction problem
Posted: 2010-12-12 15:17

Joined: 2009-04-09 15:06
Posts: 673
Thanks given: 0
Thanks received: 13
Solved it, though with a clumsy method:
Code:
sed 's/var vrsvideolist = {"videolist"\: \[//g' ./f | sed 's/\}\]//g' | sed 's/\},/\}\n/g' | sed 's/"//g' | awk -F [,] '{print $3}' | sed 's/videoUrl\://g' > ./ff


Post 4
Post subject: Re: Help needed: string extraction problem
Posted: 2010-12-12 15:46

Joined: 2008-11-08 18:34
Posts: 627
Thanks given: 0
Thanks received: 1
Code:
cat bbb|grep -Po 'http://tv.sohu.com.*?.shtml'


Post 5
Post subject: Re: Help needed: string extraction problem
Posted: 2010-12-12 16:03

Joined: 2006-10-25 18:08
Posts: 1582
Thanks given: 0
Thanks received: 0
Why are there spaces in the data?
sed 's/shtml/&\n/g' lines|sed 's/http/\n&/g'|sed 's/ //g'|grep shtml
http://tv.sohu.com/20100119/n269698584.shtml
http://tv.sohu.com/20100119/n269698593.shtml
http://tv.sohu.com/20100119/n269698604.shtml
http://tv.sohu.com/20100119/n269698616.shtml
http://tv.sohu.com/20100119/n269698620.shtml
http://tv.sohu.com/20100119/n269698637.shtml
http://tv.sohu.com/20100119/n269698643.shtml
http://tv.sohu.com/20100119/n269698653.shtml
http://tv.sohu.com/20100119/n269698658.shtml
http://tv.sohu.com/20100119/n269698672.shtml
http://tv.sohu.com/20100119/n269698695.shtml
http://tv.sohu.com/20100119/n269698697.shtml


Post 6
Post subject: Re: Help needed: string extraction problem
Posted: 2010-12-12 18:15

Joined: 2009-04-09 15:06
Posts: 673
Thanks given: 0
Thanks received: 13
gzbao9999 wrote:
Code:
cat bbb|grep -Po 'http://tv.sohu.com.*?.shtml'

This has no effect.
trigger wrote:

Thanks to you both.


Post 7
Post subject: Re: Help needed: string extraction problem
Posted: 2010-12-13 9:51

Joined: 2008-11-08 18:34
Posts: 627
Thanks given: 0
Thanks received: 1
mjp123 wrote:
gzbao9999 wrote:
Code:
cat bbb|grep -Po 'http://tv.sohu.com.*?.shtml'

This has no effect.

Attachment:
File comment: You dare say it has no effect?
2010-12-13-094911_589x243_scrot.png [ 14.37 KiB | Viewed 444 times ]



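Post 8
Post subject: Re: Help needed: string extraction problem
Posted: 2010-12-13 10:04

Joined: 2005-08-14 21:55
Posts: 58428
Location: 长沙
Thanks given: 4
Thanks received: 274
.*?.shtml
This assumes that "shtml" cannot appear anywhere inside the URL itself.

The definitive approach is to use the lookaround assertions described in perlre.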
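
To make the lookaround suggestion concrete, here is a minimal Python sketch. It assumes every wanted URL sits in a "videoUrl" field as in post 1; the second URL in the sample is made up to contain "shtml" mid-path and is not from the original data.

```python
import re

# Hypothetical sample shaped like the data in post 1. The second URL is
# invented so that "shtml" appears in the middle of its path.
sample = (
    '{"videoUrl":"http://tv.sohu.com/20100119/n269698584.shtml","videoId":51296},'
    '{"videoUrl":"http://tv.sohu.com/a.shtml.bak/n1.shtml","videoId":1}'
)

# Lazy match from post 4: stops at the first "<any char>shtml" it meets,
# so the invented URL is cut short.
lazy = re.findall(r'http://tv.sohu.com.*?.shtml', sample)

# Lookaround: take whatever lies between "videoUrl":" and the closing quote,
# without including either delimiter in the match.
lookaround = re.findall(r'(?<="videoUrl":")[^"]+(?=")', sample)

print(lazy)
print(lookaround)
```

The lookbehind here is fixed-width, which Python's re module requires; Perl and PCRE have similar restrictions.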


Post 9
Post subject: Re: Help needed: string extraction problem
Posted: 2010-12-13 12:49

Joined: 2009-04-09 15:06
Posts: 673
Thanks given: 0
Thanks received: 13
gzbao9999 wrote:
mjp123 wrote:
gzbao9999 wrote:
Code:
cat bbb|grep -Po 'http://tv.sohu.com.*?.shtml'

This has no effect.

Attachment:
2010-12-13-094911_589x243_scrot.png

Sorry, it may be a problem with my system.
Running it prints: grep: Support for the -P option is not compiled into this --disable-perl-regexp binary
Is this a problem with my perl installation?


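Post 10
Post subject: Re: Help needed: string extraction problem
Posted: 2010-12-13 13:09

Joined: 2008-11-08 18:34
Posts: 627
Thanks given: 0
Thanks received: 1
eexpress wrote:
.*?.shtml
This assumes that "shtml" cannot appear anywhere inside the URL itself.
The definitive approach is to use the lookaround assertions described in perlre.

That is reluctant (non-greedy) mode: the quantifier starts from zero characters and takes in one more character at a time until the rest of the pattern matches.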
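
A small Python sketch of the difference between greedy and reluctant matching. The two-URL string is a shortened, made-up stand-in for the data in post 1, and the dots are escaped here so that they match only a literal ".".

```python
import re

# Shortened stand-in for the data in post 1 (paths /a and /b are made up).
text = ('"videoUrl":"http://tv.sohu.com/a.shtml",'
        '"videoUrl":"http://tv.sohu.com/b.shtml"')

# Greedy: .* grabs as much as possible, so one match runs from the first
# "http" all the way to the last ".shtml".
greedy = re.findall(r'http://tv\.sohu\.com.*\.shtml', text)

# Reluctant: .*? grows one character at a time and stops at the first
# ".shtml", giving one match per URL.
reluctant = re.findall(r'http://tv\.sohu\.com.*?\.shtml', text)

print(len(greedy))
print(reluctant)
```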

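mjp123 wrote:
Running it prints: grep: Support for the -P option is not compiled into this --disable-perl-regexp binary
Is this a problem with my perl installation?

Your grep binary was compiled without Perl-regex support (--disable-perl-regexp), so the -P option is unavailable. It is not related to your perl installation.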


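Post 11
Post subject: Re: Help needed: string extraction problem
Posted: 2010-12-13 13:48

Joined: 2009-04-09 15:06
Posts: 673
Thanks given: 0
Thanks received: 13
So my grep does not support -P? It is the one installed by default on the system (Debian). Do I need to recompile grep?
After searching a bit, I found that pcregrep is equivalent to grep -P.
Is that right?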


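Post 12
Post subject: Re: Help needed: string extraction problem
Posted: 2010-12-13 14:19

Joined: 2005-08-14 21:55
Posts: 58428
Location: 长沙
Thanks given: 4
Thanks received: 274
Quote:
takes in one more character at a time
That is just minimal matching.
For something this simple, grep -o is enough. Why bother with -P? At that point you might as well use perl directly.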
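
One way to do without -P is to replace the reluctant quantifier with a negated character class, which basic and extended regular expressions also support. The sketch below shows the idea in Python; the sample string is shortened and made up, and it assumes the URLs never contain a double quote.

```python
import re

# Shortened, made-up stand-in for the data in post 1.
text = ('{"videoUrl":"http://tv.sohu.com/a.shtml","videoId":1},'
        '{"videoUrl":"http://tv.sohu.com/b.shtml","videoId":2}')

# [^"]* cannot cross the closing quote of the JSON string, so each match
# stays inside one URL even though the quantifier is greedy.
urls = re.findall(r'http://tv\.sohu\.com[^"]*\.shtml', text)

print('\n'.join(urls))
```

The same pattern should work unchanged with plain grep -o, since it uses no Perl-only syntax.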


Post 13
Post subject: Re: Help needed: string extraction problem
Posted: 2010-12-22 14:24

Joined: 2009-05-03 21:09
Posts: 39
Thanks given: 0
Thanks received: 0
Code:
nawk  'BEGIN{RS="\""}/shtml$/{print }' check| sed 's/ //g'


Post 14
Post subject: Re: Help needed: string extraction problem
Posted: 2010-12-22 14:51
Forum administrator

Joined: 2005-03-27 0:06
Posts: 10149
System: Ubuntu 12.04
Thanks given: 7
Thanks received: 130
Joining the fun with Python:

Code:
cat bbb|python -c "import sys,re;print('\n'.join(re.findall(r'(http://tv.sohu.com.*?.shtml)',sys.stdin.read().replace(' ',''))))"


Post 15
Post subject: Re: Help needed: string extraction problem
Posted: 2010-12-22 18:47

Joined: 2008-10-31 22:12
Posts: 6546
System: 践兔
Thanks given: 18
Thanks received: 25
[python]
#!/usr/bin/env python
# -*- coding: utf-8 -*-

# The data from post 1, pasted as a Python dict literal (valid as-is because it is plain JSON).
vrsvideolist = {"videolist": [{"videoImage":"http://photocdn.sohu.com/20100118/vrsb30507.jpg","relativeVideoId":51297,"videoUrl":"http://tv.sohu.com/20100119/n269698584.shtml","videoId":51296,"videoOrder":"1","videoName":"安与安寻1第1集","playLength":412},{"videoImage":"http://photocdn.sohu.com/20100118/vrsb30509.jpg","relativeVideoId":51301,"videoUrl":"http://tv.sohu.com/20100119/n269698593.shtml","videoId":51300,"videoOrder":"2","videoName":"安与安寻1第2集","playLength":485},{"videoImage":"http://photocdn.sohu.com/20100118/vrsb30511.jpg","relativeVideoId":51304,"videoUrl":"http://tv.sohu.com/20100119/n269698604.shtml","videoId":51303,"videoOrder":"3","videoName":"安与安寻1第3集","playLength":469},{"videoImage":"http://photocdn.sohu.com/20100118/vrsb30512.jpg","relativeVideoId":51306,"videoUrl":"http://tv.sohu.com/20100119/n269698616.shtml","videoId":51305,"videoOrder":"4","videoName":"安与安寻1第4集","playLength":428},{"videoImage":"http://photocdn.sohu.com/20100118/vrsb30513.jpg","relativeVideoId":51308,"videoUrl":"http://tv.sohu.com/20100119/n269698620.shtml","videoId":51307,"videoOrder":"5","videoName":"安与安寻1第5集","playLength":428},{"videoImage":"http://photocdn.sohu.com/20100118/vrsb30517.jpg","relativeVideoId":51313,"videoUrl":"http://tv.sohu.com/20100119/n269698637.shtml","videoId":51312,"videoOrder":"6","videoName":"安与安寻1第6集","playLength":433},{"videoImage":"http://photocdn.sohu.com/20100118/vrsb30523.jpg","relativeVideoId":51320,"videoUrl":"http://tv.sohu.com/20100119/n269698643.shtml","videoId":51319,"videoOrder":"7","videoName":"安与安寻1第7集","playLength":472},{"videoImage":"http://photocdn.sohu.com/20100118/vrsb30534.jpg","relativeVideoId":51334,"videoUrl":"http://tv.sohu.com/20100119/n269698653.shtml","videoId":51333,"videoOrder":"8","videoName":"安与安寻1第8集","playLength":405},{"videoImage":"http://photocdn.sohu.com/20100118/vrsb30531.jpg","relativeVideoId":51330,"videoUrl":"http://tv.sohu.com/20100119/n269698658.shtml","videoId":51329,"videoOrder":"9","videoName":"安与安寻1第9集","playLength":582},{"videoImage":"http://photocdn.sohu.com/20100118/vrsb30541.jpg","relativeVideoId":51343,"videoUrl":"http://tv.sohu.com/20100119/n269698672.shtml","videoId":51342,"videoOrder":"10","videoName":"安与安寻1第10集","playLength":568},{"videoImage":"http://photocdn.sohu.com/20100118/vrsb30527.jpg","relativeVideoId":51325,"videoUrl":"http://tv.sohu.com/20100119/n269698695.shtml","videoId":51324,"videoOrder":"11","videoName":"安与安寻1第11集","playLength":440},{"videoImage":"http://photocdn.sohu.com/20100118/vrsb30538.jpg","relativeVideoId":51339,"videoUrl":"http://tv.sohu.com/20100119/n269698697.shtml","videoId":51338,"videoOrder":"12","videoName":"安与安寻1第12集","playLength":537}]}

# Print the videoUrl field of every entry, one per line.
for i in vrsvideolist['videolist']:
    print(i["videoUrl"])

[/python]
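
Building on the dictionary approach in this post: a rough sketch that parses the pasted text with Python's json module instead of hard-coding it, then writes one URL per line. The function names and the file name urls.txt are hypothetical, and it assumes that any line breaks in the pasted text are wrapping artifacts that can be dropped.

```python
import io
import json


def extract_video_urls(js_text):
    """Return every videoUrl from a 'var vrsvideolist = {...}' assignment."""
    # Keep only the JSON object: from the first '{' to the last '}'.
    start = js_text.index('{')
    end = js_text.rindex('}') + 1
    # Assumption: line breaks inside the pasted text are wrapping artifacts,
    # so they are removed before parsing.
    payload = js_text[start:end].replace('\n', '')
    data = json.loads(payload)
    return [item['videoUrl'] for item in data['videolist']]


def write_urls(urls, out):
    """Write one URL per line to an open text file or file-like object."""
    for url in urls:
        out.write(url + '\n')


# Shortened sample in the same shape as the data in post 1, including a
# line break in the middle of a key.
sample = (
    'var vrsvideolist = {"videolist": ['
    '{"videoUrl":"http://tv.sohu.com/20100119/n269698584.shtml","videoOrder":"1","pla\nyLength":412},'
    '{"videoUrl":"http://tv.sohu.com/20100119/n269698593.shtml","videoOrder":"2","playLength":485}]}'
)

urls = extract_video_urls(sample)
buffer = io.StringIO()  # stands in for open('urls.txt', 'w'); the file name is hypothetical
write_urls(urls, buffer)
print(buffer.getvalue(), end='')
```

Parsing the structure rather than pattern-matching on it avoids any assumption about what the URLs look like, at the cost of requiring the text after the "=" to be valid JSON.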

