Help: extracting text from a string

mjp123
Posts: 702
Joined: 2009-04-09 15:06

Help: extracting text from a string

#1

Post by mjp123 » 2010-12-12 14:40

How do I take a string in this format:

Code:

var vrsvideolist = {"videolist": [{"videoImage":"http://photocdn.sohu.com/20100118/vrsb30507.jpg","relativeVideoId":51297,"videoUrl":"http://tv.sohu.com/20100119/n269698584.shtml","videoId":51296,"videoOrder":"1","videoName":"安与安寻1第1集","playLength":412},{"videoImage":"http://photocdn.sohu.com/20100118/vrsb30509.jpg","relativeVideoId":51301,"videoUrl":"http://tv.sohu.com/20100119/n269698593.shtml","videoId":51300,"videoOrder":"2","videoName":"安与安寻1第2集","playLength":485},{"videoImage":"http://photocdn.sohu.com/20100118/vrsb30511.jpg","relativeVideoId":51304,"videoUrl":"http://tv.sohu.com/20100119/n269698604.shtml","videoId":51303,"videoOrder":"3","videoName":"安与安寻1第3集","playLength":469},{"videoImage":"http://photocdn.sohu.com/20100118/vrsb30512.jpg","relativeVideoId":51306,"videoUrl":"http://tv.sohu.com/20100119/n269698616.shtml","videoId":51305,"videoOrder":"4","videoName":"安与安寻1第4集","playLength":428},{"videoImage":"http://photocdn.sohu.com/20100118/vrsb30513.jpg","relativeVideoId":51308,"videoUrl":"http://tv.sohu.com/20100119/n269698620.shtml","videoId":51307,"videoOrder":"5","videoName":"安与安寻1第5集","playLength":428},{"videoImage":"http://photocdn.sohu.com/20100118/vrsb30517.jpg","relativeVideoId":51313,"videoUrl":"http://tv.sohu.com/20100119/n269698637.shtml","videoId":51312,"videoOrder":"6","videoName":"安与安寻1第6集","playLength":433},{"videoImage":"http://photocdn.sohu.com/20100118/vrsb30523.jpg","relativeVideoId":51320,"videoUrl":"http://tv.sohu.com/20100119/n269698643.shtml","videoId":51319,"videoOrder":"7","videoName":"安与安寻1第7集","playLength":472},{"videoImage":"http://photocdn.sohu.com/20100118/vrsb30534.jpg","relativeVideoId":51334,"videoUrl":"http://tv.sohu.com/20100119/n269698653.shtml","videoId":51333,"videoOrder":"8","videoName":"安与安寻1第8集","playLength":405},{"videoImage":"http://photocdn.sohu.com/20100118/vrsb30531.jpg","relativeVideoId":51330,"videoUrl":"http://tv.sohu.com/20100119/n269698658.shtml","videoId":51329,"videoOrder":"9","videoName":"安与安寻1第9集","playLength":582},{"videoImage":"http://photocdn.sohu.com/20100118/vrsb30541.jpg","relativeVideoId":51343,"videoUrl":"http://tv.sohu.com/20100119/n269698672.shtml","videoId":51342,"videoOrder":"10","videoName":"安与安寻1第10集","playLength":568},{"videoImage":"http://photocdn.sohu.com/20100118/vrsb30527.jpg","relativeVideoId":51325,"videoUrl":"http://tv.sohu.com/20100119/n269698695.shtml","videoId":51324,"videoOrder":"11","videoName":"安与安寻1第11集","playLength":440},{"videoImage":"http://photocdn.sohu.com/20100118/vrsb30538.jpg","relativeVideoId":51339,"videoUrl":"http://tv.sohu.com/20100119/n269698697.shtml","videoId":51338,"videoOrder":"12","videoName":"安与安寻1第12集","playLength":537}]}
and extract these URLs from it:

Code:

http://tv.sohu.com/20100119/n269698584.shtml
http://tv.sohu.com/20100119/n269698593.shtml
http://tv.sohu.com/20100119/n269698604.shtml
http://tv.sohu.com/20100119/n269698616.shtml
http://tv.sohu.com/20100119/n269698620.shtml
http://tv.sohu.com/20100119/n269698637.shtml
http://tv.sohu.com/20100119/n269698643.shtml
http://tv.sohu.com/20100119/n269698653.shtml
http://tv.sohu.com/20100119/n269698658.shtml
http://tv.sohu.com/20100119/n269698672.shtml
http://tv.sohu.com/20100119/n269698695.shtml
http://tv.sohu.com/20100119/n269698697.shtml
then save them to a file?
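Since everything after `var vrsvideolist =` is plain JSON, a JSON parser avoids depending on field order or on the exact shape of the URLs. A minimal Python sketch, assuming the whole assignment sits in one string and every entry has a `videoUrl` key; `extract_video_urls` and `save_urls` are hypothetical names:

```python
import json

def extract_video_urls(text):
    """Return the videoUrl of every entry in a 'var vrsvideolist = {...}' string."""
    # Drop the JavaScript "var ... =" prefix (and any trailing ";"):
    # everything from the first "{" to the last "}" is plain JSON.
    start = text.index("{")
    end = text.rindex("}") + 1
    data = json.loads(text[start:end])
    return [item["videoUrl"] for item in data["videolist"]]

def save_urls(urls, path):
    """Write one URL per line, like '> ./ff' in the shell versions."""
    with open(path, "w", encoding="utf-8") as out:
        for url in urls:
            out.write(url + "\n")
```

For example, `save_urls(extract_video_urls(open("f", encoding="utf-8").read()), "ff")` would read the input file `f` and write `ff`, the file names used in the sed solution in this thread.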
huangyun
Posts: 49
Joined: 2006-11-27 14:21

Re: Help: extracting text from a string

#2

Post by huangyun » 2010-12-12 15:17

A regular expression should do it: sed + awk, or perl?
mjp123
Posts: 702
Joined: 2009-04-09 15:06

Re: Help: extracting text from a string

#3

Post by mjp123 » 2010-12-12 15:17

Solved it, the clumsy way:

Code:

sed 's/var vrsvideolist = {"videolist"\: \[//g' ./f | sed 's/\}\]//g' | sed 's/\},/\}\n/g' | sed 's/"//g' | awk -F ',' '{print $3}' | sed 's/videoUrl\://g' > ./ff
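The pipeline above works because `videoUrl` happens to be the third comma-separated field of every record. A Python sketch that mirrors it step by step (the function name is hypothetical) makes that assumption explicit:

```python
def urls_by_field_position(text):
    """Mirror the sed/awk pipeline: one record per entry, keep field 3."""
    # sed 's/var vrsvideolist = {"videolist"\: \[//g' and sed 's/\}\]//g'
    body = text.replace('var vrsvideolist = {"videolist": [', "")
    body = body.replace("}]", "")
    urls = []
    # sed 's/\},/\}\n/g': split into one record per line
    for record in body.replace("},", "}\n").splitlines():
        # sed 's/"//g', then awk -F ',' '{print $3}' (awk fields are 1-based)
        fields = record.replace('"', "").split(",")
        # Skip short records (awk would print an empty line for them)
        if len(fields) >= 3:
            # sed 's/videoUrl\://g'
            urls.append(fields[2].replace("videoUrl:", ""))
    return urls
```

If the keys are ever reordered, or a value such as `videoName` contains a comma, field 3 is no longer the URL, which is why matching on the key name or parsing the JSON is sturdier.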
gzbao9999
Posts: 627
Joined: 2008-11-08 18:34

Re: Help: extracting text from a string

#4

Post by gzbao9999 » 2010-12-12 15:46

Code:

cat bbb|grep -Po 'http://tv.sohu.com.*?.shtml'
Qi and blood surging, the body swelling; now and then it floats up to the head, blissful at all three times.
trigger
Posts: 1604
Joined: 2006-10-25 18:08

Re: Help: extracting text from a string

#5

Post by trigger » 2010-12-12 16:03

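The OP is talking utter nonsense. As the saying goes: "Two orioles sing in the green willows, and no one knows what they mean; a line of egrets climbs the blue sky, and no one knows where they stop." I didn't want to argue with you, but today I'm too angry to hold back, so let's have it out. Our constitution says it plainly: "One night as husband and wife, a hundred days of devotion; only buildings above the seventh floor get elevators." I'm sure you know this, and since you do, you can't go quoting it out of context. Even the weather forecast gets it wrong sometimes!!! Besides, the Bank of China isn't your family's private business. Even Maradona has gotten married, so what good are the ration coupons you're still holding? Truly the greatest farce under heaven. A few days ago the National People's Congress had just finished its session and solemnly declared: "China will not adopt a multi-party system; subsidies for breeding sows in stock." Such a good thing, and people like you have gone and muddled it.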
gzbao9999
Posts: 627
Joined: 2008-11-08 18:34

Re: Help: extracting text from a string

#7

Post by gzbao9999 » 2010-12-13 9:51

mjp123 wrote:
gzbao9999 wrote:

Code:

cat bbb|grep -Po 'http://tv.sohu.com.*?.shtml'
It doesn't work.
Don't tell me it doesn't work.
eexpress
Posts: 58428
Joined: 2005-08-14 21:55
Location: Changsha

Re: Help: extracting text from a string

#8

Post by eexpress » 2010-12-13 10:04

.*?.shtml
This assumes "shtml" never appears inside the URL before its end.

The definitive way is a lookaround regex from perlre.
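One way to read the lookaround suggestion, sketched in Python (whose `re` supports the same fixed-width lookbehind as perlre): anchor on the `"videoUrl":"` key with a lookbehind and stop at the closing quote. The pattern assumes no whitespace around the colon and no escaped quotes inside the URL; `video_urls` is a hypothetical name.

```python
import re

# (?<="videoUrl":") : the match must be preceded by this key, which is
#                     checked but not included in the result.
# [^"]+             : take everything up to the closing quote, so "shtml"
#                     appearing inside the URL cannot end the match early.
VIDEO_URL = re.compile(r'(?<="videoUrl":")[^"]+')

def video_urls(text):
    """Return every videoUrl value in order of appearance."""
    return VIDEO_URL.findall(text)

sample = ('{"videoImage":"http://photocdn.sohu.com/a.jpg",'
          '"videoUrl":"http://tv.sohu.com/shtml/n2.shtml"}')
print(video_urls(sample))  # ['http://tv.sohu.com/shtml/n2.shtml']
# The lazy pattern stops at the first "shtml" it meets:
print(re.findall(r'http://tv.sohu.com.*?.shtml', sample))  # ['http://tv.sohu.com/shtml']
```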
● 鸣学
mjp123
Posts: 702
Joined: 2009-04-09 15:06

Re: Help: extracting text from a string

#9

Post by mjp123 » 2010-12-13 12:49

gzbao9999 wrote:
mjp123 wrote:
gzbao9999 wrote:

Code:

cat bbb|grep -Po 'http://tv.sohu.com.*?.shtml'
It doesn't work.
[Screenshot: 2010-12-13-094911_589x243_scrot.png]
Sorry, it may be a problem with my system.
When I run it, I get: grep: Support for the -P option is not compiled into this --disable-perl-regexp binary
Is this a problem with how perl is installed?
gzbao9999
Posts: 627
Joined: 2008-11-08 18:34

Re: Help: extracting text from a string

#10

Post by gzbao9999 » 2010-12-13 13:09

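eexpress wrote: .*?.shtml
This assumes "shtml" never appears inside the URL before its end.
The definitive way is a lookaround regex from perlre.
That's reluctant (lazy) mode: it starts by matching zero characters and swallows them one at a time, moving forward.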
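The difference between reluctant (lazy) and greedy matching shows up as soon as one line holds two URLs. A small Python illustration with made-up URLs (Python's `.*?` behaves like the Perl one here):

```python
import re

# Two made-up URLs on one line, as in the videolist data
line = ('"videoUrl":"http://tv.sohu.com/a.shtml",'
        '"videoUrl":"http://tv.sohu.com/b.shtml"')

# Greedy .* first runs to the end of the line, then backtracks only as far
# as needed, so the single match ends at the LAST ".shtml"
greedy = re.findall(r'http://tv\.sohu\.com.*\.shtml', line)

# Reluctant .*? starts from zero characters and grows one at a time,
# so each match ends at the FIRST ".shtml" after its start
lazy = re.findall(r'http://tv\.sohu\.com.*?\.shtml', line)

print(greedy)  # one match spanning both URLs
print(lazy)    # ['http://tv.sohu.com/a.shtml', 'http://tv.sohu.com/b.shtml']
```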
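mjp123 wrote: When I run it, I get: grep: Support for the -P option is not compiled into this --disable-perl-regexp binary
Is this a problem with how perl is installed?
Your grep was built without Perl-compatible regex (PCRE) support, so the -P option isn't available.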
mjp123
Posts: 702
Joined: 2009-04-09 15:06

Re: Help: extracting text from a string

#11

Post by mjp123 » 2010-12-13 13:48

My grep doesn't support -P? It's the one Debian installs by default. Do I need to recompile grep?
I searched around and found that pcregrep = grep -P.
Is that right?
eexpress
Posts: 58428
Joined: 2005-08-14 21:55
Location: Changsha

Re: Help: extracting text from a string

#12

Post by eexpress » 2010-12-13 14:19

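"swallows them one at a time, moving forward"
That's just minimal (non-greedy) matching.
For something this simple, grep -o is enough. Why reach for -P? At that point you might as well use perl directly.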
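Following the point that plain `grep -o` suffices: a negated character class that cannot cross the closing quote does the job of the lazy quantifier, using only POSIX ERE features. With GNU grep that would be something like `grep -oE 'http://tv\.sohu\.com/[^"]*\.shtml' bbb`, with `bbb` being the input file from the earlier post. The same pattern, sketched in Python with a hypothetical `grep_o` helper:

```python
import re

# POSIX-ERE-compatible pattern: escaped dots match only literal dots, and
# [^"]* cannot run past the closing quote of the JSON string, so each match
# stays inside a single value even if "shtml" appears in the path.
URL_RE = re.compile(r'http://tv\.sohu\.com/[^"]*\.shtml')

def grep_o(text):
    """Like grep -o: every non-overlapping match, in order."""
    return URL_RE.findall(text)

data = ('"videoUrl":"http://tv.sohu.com/20100119/n269698584.shtml",'
        '"videoImage":"http://photocdn.sohu.com/20100118/vrsb30507.jpg",'
        '"videoUrl":"http://tv.sohu.com/shtml/n2.shtml"')
for url in grep_o(data):
    print(url)
```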
linxiaoyu
Posts: 39
Joined: 2009-05-03 21:09

Re: Help: extracting text from a string

#13

Post by linxiaoyu » 2010-12-22 14:24

Code:

nawk 'BEGIN{RS="\""}/shtml$/{print }' check | sed 's/ //g'
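The awk trick treats every double quote as a record separator, so each JSON key and value becomes its own record and only values ending in `shtml` survive. A Python sketch of the same idea (the function name is hypothetical; it assumes no value contains an escaped quote):

```python
def split_on_quotes(text):
    # Mirrors awk 'BEGIN{RS="\""} /shtml$/{print}': every double quote
    # ends a record, so keys and values become separate records.
    records = text.split('"')
    # Keep records ending in "shtml" and drop stray spaces inside them
    return [r.replace(" ", "") for r in records if r.endswith("shtml")]

sample = ('{"videoImage":"http://photocdn.sohu.com/20100118/vrsb30507.jpg",'
          '"videoUrl":"http://tv.sohu.com/20100119/n269698584.shtml",'
          '"videoId":51296}')
print("\n".join(split_on_quotes(sample)))
```

This keeps any value ending in `shtml`, not only `videoUrl` values; that is fine for this data but is a looser filter than keying on the field name.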
GONE WITH THE WIND ~~~
oneleaf
Forum administrator
Posts: 10441
Joined: 2005-03-27 0:06
System: Ubuntu 12.04

Re: Help: extracting text from a string

#14

Post by oneleaf » 2010-12-22 14:51

Python, just to join in the fun:

Code:

cat bbb|python3 -c "import sys,re;print('\n'.join(re.findall(r'(http://tv.sohu.com.*?.shtml)',sys.stdin.read().replace(' ',''))))"
tusooa
Posts: 6548
Joined: 2008-10-31 22:12
System: 践兔

Re: Help: extracting text from a string

#15

Post by tusooa » 2010-12-22 18:47

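Code:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

# The JSON literal doubles as a Python dict literal.
# (The middle of the data was elided in the original post.)
vrsvideolist = {"videolist": [{"videoImage":"http://photocdn.sohu.com/20100118/vrsb3 ... ength":537}]}

# Print the videoUrl of each entry in the list
for i in vrsvideolist['videolist']:
    print(i["videoUrl"])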

Code:

] ls -ld //