
How to speed up Perl processing of two very big text files

There are two very big text files, fileA and fileB, a few million lines each, and a Perl task to pick out the lines whose content appears in both files.
For instance, line 3 of fileA reads
abdce fghijklmnop\n
and the same line
abdce fghijklmnop\n
happens to be line 30,000 of fileB.
The Perl script should pick out those lines and print them.
That should be easy: exhaustively search fileB for each line of fileA.
But the processing time would be very long for very big fileA and fileB.
Is there a way in Perl to speed up the processing?
Splitting up one file and processing the pieces in parallel on multiple CPUs should be one way.
Are there other ways, in Perl?
Please help.

The first thing is to read the whole file into memory, assuming you have enough memory.
Reading line by line with methods like readline would be very slow on non-SSD storage.
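
A minimal sketch of that idea, assuming exact whole-line matching and enough memory to hold all of fileA: load fileA's lines into a hash, then stream fileB and test each line with a constant-time lookup. The file names and the printed format (fileB line number plus the line) are placeholders, not from the original post.

use strict;
use warnings;

# Load every line of fileA as a hash key; each later lookup is O(1),
# so the whole job is one pass over each file.
open my $fa, '<', 'fileA' or die "fileA: $!";
my %seen;
while (my $line = <$fa>) {
    chomp $line;
    $seen{$line} = 1;
}
close $fa;

# Stream fileB and print the lines that also occurred in fileA,
# prefixed with their fileB line number ($. is Perl's input line counter).
open my $fb, '<', 'fileB' or die "fileB: $!";
while (my $line = <$fb>) {
    chomp $line;
    print "$.\t$line\n" if $seen{$line};
}
close $fb;

This replaces the O(n·m) exhaustive search with O(n+m) work, which matters far more than parallelism for files of this size.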

This is very simple, just basic Perl programming skills.

Wouldn't diff just do this? Why use Perl?
The Perl algorithm should be:
store each file's lines into two arrays:
use strict;
use warnings;

open my $fh_a, '<', 'fileA' or die $!;
my @lines_a;
while (<$fh_a>) {
    chomp;
    push @lines_a, quotemeta;    # escape any regex metacharacters
}

open my $fh_b, '<', 'fileB' or die $!;
my @lines_b = <$fh_b>;
chomp @lines_b;

# then compare @lines_a and @lines_b:
# join with | into one alternation, anchored so only whole lines match
my $alt = join "|", @lines_a;
my $regex_a = qr/^(?:$alt)$/;

for my $var (@lines_b) {
    if ($var =~ $regex_a) {
        print "$var\n";
    }
}
Looping over @lines_a inside @lines_b definitely won't work; it is too slow,
equivalent to a double loop.
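
One caveat on the alternation approach: join "|" over a few million lines builds a single enormous regex that Perl may take a very long time to compile and match, so the hash lookup sketched earlier is usually the safer choice. On the multi-CPU idea from the original question, below is a minimal fork-based sketch, assuming plain fork is acceptable and fileA's hash fits in memory; the worker count and file names are placeholders. Note that each child still reads all of fileB, so this only spreads the CPU work, and with a hash lookup the job is often I/O-bound anyway.

use strict;
use warnings;

my $workers = 4;                 # placeholder worker count

# Build the lookup hash from fileA before forking, so every child
# inherits its own copy-on-write snapshot of it.
open my $fa, '<', 'fileA' or die "fileA: $!";
my %seen;
while (my $line = <$fa>) {
    chomp $line;
    $seen{$line} = 1;
}
close $fa;

for my $id (0 .. $workers - 1) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    next if $pid;                # parent: keep spawning children

    # Child $id scans fileB but only tests its share of the lines.
    open my $fb, '<', 'fileB' or die "fileB: $!";
    while (my $line = <$fb>) {
        next unless $. % $workers == $id;
        chomp $line;
        print "$.\t$line\n" if $seen{$line};
    }
    exit 0;
}

wait() for 1 .. $workers;        # reap all the children

Output lines from different workers may interleave; collecting per-worker results into temporary files and merging them afterwards would keep the output deterministic.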

Thank you very much for the response
I am going to test it out.
