Generating a fast-matching browser PAC file with Python

科学之子 · #1

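Generating a "fast"-matching browser PAC file with Python (fast in theory, but in actual tests it turned out slower)
First of all, thanks to astolia for the idea.
As an aside: how can the real-world performance of a browser PAC file be measured?
Test method: see #2 and #5.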

Code:

import json

# Nested dictionary keyed by domain labels, from the TLD downwards.
# A value of 0 marks the end of a listed domain.
domains_dict = {}

def add_domain(domain):
    domain = domain.strip()
    if not domain:
        # skip blank lines
        return
    sub_domains = domain.split('.')
    sub_domains.reverse()
    # current sub domain dictionary
    cur_domain_dict = domains_dict
    for sub_domain in sub_domains[:-1]:
        if sub_domain not in cur_domain_dict:
            cur_domain_dict[sub_domain] = {}
        elif cur_domain_dict[sub_domain] == 0:
            # a shorter listed domain already covers this one
            return
        cur_domain_dict = cur_domain_dict[sub_domain]
    # mark the end of the listed domain
    cur_domain_dict[sub_domains[-1]] = 0

with open('./domains_list') as domains_list_file:
    for domain in domains_list_file:
        add_domain(domain)

with open('./pac_template.pac') as pac_template_file:
    pac_out = pac_template_file.read()

pac_out = pac_out.replace('__DOMAINS__', json.dumps(domains_dict))
with open('./my_pac.pac', 'w') as pac_out_file:
    pac_out_file.write(pac_out)
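
To make the generated structure easier to picture, here is a minimal, self-contained sketch of the same idea: each domain is split into labels and inserted from the TLD downwards, with 0 marking the end of a listed domain. The helper name `build_domain_trie` and the sample domains are made up for illustration. The sketch also assumes that blank lines should be skipped and that a shorter listed domain should not be overwritten by a longer one.

```python
import json


def build_domain_trie(domains):
    """Build a nested dict keyed by domain labels, from the TLD downwards.

    A value of 0 marks the end of a listed domain (hypothetical helper).
    """
    trie = {}
    for domain in domains:
        labels = domain.strip().split('.')
        if labels == ['']:
            continue  # skip blank lines
        node = trie
        covered = False
        # Walk every label except the leftmost one, creating levels as needed.
        for label in reversed(labels[1:]):
            child = node.setdefault(label, {})
            if child == 0:
                covered = True  # a shorter listed domain already matches
                break
            node = child
        if not covered:
            node[labels[0]] = 0  # end marker; replaces any longer entries
    return trie


if __name__ == '__main__':
    sample = ['example.com', 'img.example.org', 'example.net', 'a.example.net']
    print(json.dumps(build_domain_trie(sample), sort_keys=True))
```

For the sample list this prints `{"com": {"example": 0}, "net": {"example": 0}, "org": {"example": {"img": 0}}}`. Note that `a.example.net` adds nothing, because `example.net` already covers it.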

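An improved, more efficient template (see #4) is faster than the gfwlist2pac one; thanks again to astolia.
Template used for generation (adapted from gfwlist2pac):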

Code:

var proxy = "PROXY 127.0.0.1:80";

var domains = __DOMAINS__;

var direct = 'DIRECT;';

function FindProxyForURL(url, host) {
    // Split the host into labels and reverse them so the TLD comes first.
    var sub_domains_array = host.split('.');
    sub_domains_array.reverse();
    var sub_domains_array_length = sub_domains_array.length;
    // Current position in the nested domains dictionary.
    var cur_sub_domain = domains;
    for (var i = 0; i < sub_domains_array_length; i++) {
        cur_sub_domain = cur_sub_domain[sub_domains_array[i]];
        if (cur_sub_domain == 0)
            return proxy;   // 0 marks the end of a listed domain
        else if (cur_sub_domain == null)
            return direct;  // no listed domain has this suffix
    }
    return direct;
}


科学之子 · #2

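@astolia
Using this web tool:
http://www.w3school.com.cn/tiy/t.asp?f= ... te_gettime
my tests show that this form of matching is actually slower?
The test method is simply to call the function in a loop and measure the elapsed time.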

astolia · #3

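Clearly the code in your PAC is just badly written. The reverse operation is completely unnecessary, and comparing twice inside the loop is redundant as well.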

科学之子 · #4

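astolia wrote: Clearly the code in your PAC is just badly written. The reverse operation is completely unnecessary, and comparing twice inside the loop is redundant as well.

I have rewritten the PAC template, and it is indeed a lot faster: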

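Code:

var proxy = "PROXY 127.0.0.1:8580";

var domains = __DOMAINS__;

var direct = 'DIRECT';

function FindProxyForURL(url, host) {
    var sub_domains_array = host.split('.');
    var sub_domains_array_length = sub_domains_array.length;
    // Current position in the nested domains dictionary.
    var cur_sub_domain = domains;
    // Walk the labels from right to left (TLD first) without reversing.
    // When the labels run out, the lookup normally yields undefined,
    // which ends the loop.
    while (1) {
        sub_domains_array_length--;
        cur_sub_domain = cur_sub_domain[sub_domains_array[sub_domains_array_length]];
        if (cur_sub_domain === 0)
            return proxy;   // 0 marks the end of a listed domain
        else if (cur_sub_domain === undefined)
            return direct;  // no listed domain has this suffix
    }
}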

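On my machine, 10,000,000 iterations take about 8XX ms with the improved version, versus 27XX ms for the gfwlist2pac one.
Any suggestions are very welcome.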
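
For checking a generated dictionary offline before loading the PAC file into a browser, the same right-to-left walk can be modelled in Python. This is only a sketch: the function name `find_proxy` is made up, and unlike the JavaScript version it stops explicitly when the labels run out instead of relying on an `undefined` lookup. It says nothing about how fast the JavaScript runs in a browser.

```python
def find_proxy(domains, host, proxy='PROXY 127.0.0.1:8580', direct='DIRECT'):
    """Python model of the PAC lookup: walk the host labels from right to left.

    `domains` is the nested dict produced by the generator, where 0 marks
    the end of a listed domain (hypothetical helper for offline checks).
    """
    labels = host.split('.')
    node = domains
    index = len(labels)
    while index > 0:
        index -= 1
        node = node.get(labels[index])
        if node == 0:
            return proxy   # reached the end marker of a listed domain
        if node is None:
            return direct  # no listed domain has this suffix
    return direct          # host is shorter than the listed domains on this branch


if __name__ == '__main__':
    sample = {'com': {'example': 0}, 'org': {'example': {'img': 0}}}
    for name in ('www.example.com', 'example.org', 'img.example.org'):
        print(name, '->', find_proxy(sample, name))
```

With the sample dictionary, `www.example.com` and `img.example.org` go through the proxy, while `example.org` is direct because only `img.example.org` is listed.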

科学之子 · #5

I have noticed something odd.

Code:

<html>
<body>

<script type="text/javascript">

//...the generated PAC code goes here

my_url=''
my_domain='1.2'

document.write(FindProxyForURL(my_url,my_domain));
document.write("<br>");

start_t=new Date();
for(var i=0;i<10000000;i++)
FindProxyForURL(my_url,my_domain);
end_t=new Date();
document.write(end_t-start_t);
</script>

</body>
</html>

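With the gfwlist2pac approach, it makes no difference whether the host consists of digits or letters.
With the improved PAC code, however, a purely numeric domain has a large impact.
With my_domain='1.2', the improved code is even slower than the gfwlist2pac one.
The gfwlist2pac PAC code:
viewtopic.php?p=3170246#p3170246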
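
To compare several hosts without pasting the generated code into the page by hand each time, the test page above could itself be produced from Python. The following is a rough sketch under stated assumptions: `BENCH_TEMPLATE`, `build_bench_page` and the `__PAC__`, `__HOST__` and `__ITERATIONS__` placeholders are names invented here, and the page simply repeats the loop-and-subtract timing from the test page above.

```python
import json

# Benchmark page modelled on the hand-written test page; the placeholder
# names are hypothetical and only used by build_bench_page below.
BENCH_TEMPLATE = """<html>
<body>
<script type="text/javascript">
__PAC__

var my_url = '';
var my_domain = __HOST__;

document.write(FindProxyForURL(my_url, my_domain));
document.write("<br>");

var start_t = new Date();
for (var i = 0; i < __ITERATIONS__; i++)
    FindProxyForURL(my_url, my_domain);
var end_t = new Date();
document.write(end_t - start_t);
</script>
</body>
</html>
"""


def build_bench_page(pac_source, host, iterations=10000000):
    """Return an HTML page that times FindProxyForURL for one host."""
    page = BENCH_TEMPLATE.replace('__ITERATIONS__', str(int(iterations)))
    # json.dumps turns the host into a quoted JavaScript string literal.
    page = page.replace('__HOST__', json.dumps(host))
    # Insert the PAC source last so its contents are never re-substituted.
    return page.replace('__PAC__', pac_source)
```

One possible use is to read `./my_pac.pac`, call `build_bench_page` once for `'1.2'` and once for a host made of letters, write each result to its own `.html` file, and open both files in Chromium and Firefox.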

科学之子 · #6

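For the case in #5, Chromium and Firefox turn out to give completely opposite results.
In Chromium, a purely numeric domain makes no real difference.
So it looks like this comes down to the specific JS engine implementation.
In Firefox, the improved code is somewhat slower for purely numeric domains.
That slowdown does not grow noticeably with the number of domain levels, though.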
