
How to speed up Perl processing of two very big text files

There are two very big text files, fileA and fileB, a few million lines each, and a Perl task to pick out the lines whose content appears in both files.
For instance, line 3 of fileA reads
abdce fghijklmnop\n
and the same line
abdce fghijklmnop\n
happens to be line 30,000 of fileB.
The Perl script should pick out those lines and print them.
That should be easy: exhaustively search fileB for each line of fileA.
But the processing time would be very long for very big fileA and fileB.
Is there a way in Perl to speed up the processing?
Splitting up one file and processing the pieces in parallel on multiple CPUs should be one way.
Are there other ways, in Perl?
Please help.

The first thing is to read the whole file into memory, assuming you have enough memory.
Reading line by line with methods like readline would be very slow on non-SSD storage.
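
A minimal sketch of that idea, assuming exact whole-line matching and enough memory to hold all of fileA: load fileA's lines into a hash, then stream fileB and test each line with a constant-time lookup. The file names and the printed format (fileB line number plus the line) are placeholders, not from the original post.

use strict;
use warnings;

# Load every line of fileA as a hash key; each later lookup is O(1),
# so the whole job is one pass over each file.
open my $fa, '<', 'fileA' or die "fileA: $!";
my %seen;
while (my $line = <$fa>) {
    chomp $line;
    $seen{$line} = 1;
}
close $fa;

# Stream fileB and print the lines that also occurred in fileA,
# prefixed with their fileB line number ($. is Perl's input line counter).
open my $fb, '<', 'fileB' or die "fileB: $!";
while (my $line = <$fb>) {
    chomp $line;
    print "$.\t$line\n" if $seen{$line};
}
close $fb;

This replaces the O(n·m) exhaustive search with O(n+m) work, which matters far more than parallelism for files of this size.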

This is very simple, just basic Perl programming skills.

Wouldn't diff just do this? Why use Perl?
The Perl algorithm should be:
store each file's lines into two arrays:
use strict;
use warnings;

open my $fh_a, '<', 'fileA' or die $!;
my @lines_a;
while (<$fh_a>) {
    chomp;
    push @lines_a, quotemeta;    # escape any regex metacharacters
}

open my $fh_b, '<', 'fileB' or die $!;
my @lines_b = <$fh_b>;
chomp @lines_b;

# then compare @lines_a and @lines_b:
# join with | into one alternation, anchored so only whole lines match
my $alt = join "|", @lines_a;
my $regex_a = qr/^(?:$alt)$/;

for my $var (@lines_b) {
    if ($var =~ $regex_a) {
        print "$var\n";
    }
}
Looping over @lines_a inside @lines_b definitely won't work; it is too slow,
equivalent to a double loop.
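
One caveat on the alternation approach: join "|" over a few million lines builds a single enormous regex that Perl may take a very long time to compile and match, so the hash lookup sketched earlier is usually the safer choice. On the multi-CPU idea from the original question, below is a minimal fork-based sketch, assuming plain fork is acceptable and fileA's hash fits in memory; the worker count and file names are placeholders. Note that each child still reads all of fileB, so this only spreads the CPU work, and with a hash lookup the job is often I/O-bound anyway.

use strict;
use warnings;

my $workers = 4;                 # placeholder worker count

# Build the lookup hash from fileA before forking, so every child
# inherits its own copy-on-write snapshot of it.
open my $fa, '<', 'fileA' or die "fileA: $!";
my %seen;
while (my $line = <$fa>) {
    chomp $line;
    $seen{$line} = 1;
}
close $fa;

for my $id (0 .. $workers - 1) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    next if $pid;                # parent: keep spawning children

    # Child $id scans fileB but only tests its share of the lines.
    open my $fb, '<', 'fileB' or die "fileB: $!";
    while (my $line = <$fb>) {
        next unless $. % $workers == $id;
        chomp $line;
        print "$.\t$line\n" if $seen{$line};
    }
    exit 0;
}

wait() for 1 .. $workers;        # reap all the children

Output lines from different workers may interleave; collecting per-worker results into temporary files and merging them afterwards would keep the output deterministic.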

Thank you very much for the response
I am going to test it out.
