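The original file is a Perl script from Microsoft's LETOR that computes ranking performance (eval-score-mslr.txt in the attachment; please rename it to .pl). The line that does the matching (line 209) is: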
Code:
$lnFea =~ /^(\d+) qid\:([^\s]+).*?\#docid = ([^\s]+) inc = ([^\s]+) prob = ([^\s]+)$/
Please note that the following is just one line; the dataset contains a great many lines like it:
0 qid:15903 1:0.011571 2:0.076923 3:0.000000 4:1.000000 5:0.012383 6:0.000000 7:0.000000 8:0.000000 9:0.000000 10:0.000000 11:0.994191 12:0.081240 13:0.000000 14:0.774174 15:0.994196 16:0.003958 17:0.088235 18:0.142857 19:0.250000 20:0.004048 21:0.813038 22:0.954210 23:0.868413 24:0.914573 25:0.622796 26:0.426148 27:0.204424 28:0.397757 29:0.000000 30:0.000000 31:0.000000 32:0.000000 33:0.978268 34:1.000000 35:0.686586 36:1.000000 37:0.791811 38:1.000000 39:0.887732 40:1.000000 41:0.111111 42:0.336449 43:0.000000 44:0.000545 45:0.000789 46:0.002564 #docid = GX008-66-8698208 inc = 1 prob = 0.176849
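When I run this script, it keeps reporting:
Error to parse test.txt at line 1:
Yet when I put the very same matching expression into a simple program of my own, it matches without any problem. Here is the script I wrote: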
Code:
#!/usr/bin/perl
use strict;
use warnings;
# One line copied from the dataset, including its trailing newline.
# A named variable is used because "my $_" is rejected by Perl 5.24 and later.
my $line = "0 qid:18219 1:0.052893 2:1.000000 3:0.750000 4:1.000000 5:0.066225 6:0.000000 7:0.000000 8:0.000000 9:0.000000 10:0.000000 11:0.047634 12:1.000000 13:0.740506 14:1.000000 15:0.058539 16:0.003995 17:0.500000 18:0.400000 19:0.400000 20:0.004121 21:1.000000 22:1.000000 23:0.974510 24:1.000000 25:0.929240 26:1.000000 27:1.000000 28:0.829951 29:1.000000 30:1.000000 31:0.768123 32:1.000000 33:1.000000 34:1.000000 35:1.000000 36:1.000000 37:1.000000 38:1.000000 39:0.998377 40:1.000000 41:0.333333 42:0.434783 43:0.000000 44:0.396910 45:0.447368 46:0.966667 #docid = GX004-93-7097963 inc = 0.0428115405134536 prob = 0.860366\n";
chomp($line);
# Same pattern as line 209 of the original script.
if ($line =~ /^(\d+) qid\:([^\s]+).*?\#docid = ([^\s]+) inc = ([^\s]+) prob = ([^\s]+)$/)
{
    my $label = $1;
    my $qid = $2;
    my $did = $3;
    my $inc = $4;
    my $prob = $5;
    print "label is $label\nqid is $qid\ndid is $did\ninc is $inc\nprob is $prob\n";
}
else
{
    print "Error to parse the line:\n$line\n";
    exit -2;
}
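Running it prints:
label is 0
qid is 18219
did is GX004-93-7097963
inc is 0.0428115405134536
prob is 0.860366
I have racked my brains over this and read a lot of material on regular-expression matching, but found nothing out of the ordinary.
So I am posting here to ask the Perl experts on this forum. Any advice would be greatly appreciated...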
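One difference between the two cases that may be worth ruling out: my test program matches a string literal, while the original script matches lines read from test.txt. The sketch below is only a guess at a possible cause, not something confirmed for this file. It assumes test.txt might have Windows-style CRLF line endings, in which case chomp would leave a trailing "\r" that the final `([^\s]+)$` cannot match. The sample line is shortened to two features for readability, and the helper `matches` is a hypothetical name introduced only for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The pattern from line 209 of the evaluation script, unchanged.
my $pattern = qr/^(\d+) qid\:([^\s]+).*?\#docid = ([^\s]+) inc = ([^\s]+) prob = ([^\s]+)$/;

# Shortened sample line (only two features), for illustration only.
my $clean = "0 qid:15903 1:0.011571 2:0.076923 #docid = GX008-66-8698208 inc = 1 prob = 0.176849";

# Hypothetical helper: returns 1 if the line matches the pattern, 0 otherwise.
sub matches {
    my ($line) = @_;
    return $line =~ $pattern ? 1 : 0;
}

# Simulate a line as it would arrive from a file with CRLF line endings.
my $crlf = "$clean\r\n";
chomp($crlf);    # with the default $/ this removes "\n" but leaves "\r"

print "clean line matches: ", matches($clean), "\n";
print "CRLF line after chomp matches: ", matches($crlf), "\n";

# Strip all trailing whitespace instead of relying on chomp alone.
(my $fixed = $crlf) =~ s/\s+\z//;
print "CRLF line after stripping trailing whitespace matches: ", matches($fixed), "\n";
```

If the real file behaves like the second case, applying `s/\s+\z//` to `$lnFea` before the match in the original script would be one way to test the guess. If it does not, the cause lies elsewhere.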