Regular competitive programmers and users of problem providing sites are familiar with this issue. Often, the input is very large and the most trivial (ah, not really!) task of reading input from stdin might prove to be a bottleneck. Such problem might be accompanied with 'Warning: large Input/Output data, be careful with certain languages'.
Is it really true? Is there such a big difference in the performance of scanf() versus the performance of cin in C++? Lets put this observation to test...
Let me create a dummy input file, similar to a large input a programming problem may have. A file containing a line with 16 bytes followed by a newline and having 1000000 such lines, making a file of 17 MB should be good enough.
Now that I have a large enough input file, let's quickly compare the time taken to read the file from stdin (get the file from disk to stdin using redirection) by using scanf() versus cin.
cin_test.cc to test the performance of cin.
Is it really true? Is there such a big difference in the performance of scanf() versus the performance of cin in C++? Lets put this observation to test...
Let me create a dummy input file, similar to a large input a programming problem may have. A file containing a line with 16 bytes followed by a newline and having 1000000 such lines, making a file of 17 MB should be good enough.
$ yes 1111111111111111 | head -1000000 > /tmp/dummy
Now that I have a large enough input file, let's quickly compare the time taken to read the file from stdin (get the file from disk to stdin using redirection) by using scanf() versus cin.
cin_test.cc to test the performance of cin.
#include <iostream>
int main() {
char buffer[256];
while (std::cin >> buffer) {}
return 0;
}
scanf_test.cc to test the performance of scanf().
#include <cstdlib>
#include <cstdio>
int main() {
char buffer[256];
while (scanf("%s", buffer) != EOF) {}
return 0;
}
Here is what I get on running the two:
$ g++ cin_test.cc -o cin_test
$ time ./cin_test < /tmp/dummy
real 0m2.162s
user 0m1.696s
sys 0m0.332s
$ g++ scanf_test.cc -o scanf_test
$ time ./scanf_test < /tmp/dummy
real 0m0.426s
user 0m0.248s
sys 0m0.084s
Well, the above results are consistent with our observations. But, when I think about it, I find it a little weird. Think about it! Our comparison shows that scanf() is about 5 times faster than cin. Five times!!
What went wrong with cin? On a high level both of them are wrappers over the read() system call, just syntactic sugar. The only visible difference is that scanf() has to explicitly declare the input type, whereas cin has the redirection operation overloaded using templates. This does not seem like a good enough reason for a performance hit of 5x.
Why is scanf faster than cin?
It turns out that iostream makes use of stdio's buffering system So, cin wastes time synchronizing itself with the underlying C-library's stdio buffer, so that calls to both scanf()and cin can be interleaved.[1]
The good thing is that libstdc++ provides an option to turn off synchronization of all the iostream standard streams with their corresponding standard C streams using[2]
std::ios::sync_with_stdio(false);
and cin becomes faster than scanf() as it should have been.
Here is cin_test_2.c to show the same:
#include <iostream>
int main() {
char buffer[256];
std::ios_base::sync_with_stdio(false);
while (std::cin >> buffer) {}
return 0;
}
$ g++ cin_test_2.cc -o cin_test_2
$ time ./cin_test_2 < /tmp/dummy
real 0m0.380s
user 0m0.240s
sys 0m0.028s
And the world is as it should have been!!
Like with all things, there is a caveat here. With synchronization turned off, using cin and scanf() together will result in an undefined mess.
With synchronization turned off, my benchmark indicates that cin is 8-10% faster than scanf(). This is probably because scanf() interprets the format arguments at runtime and makes use of variable number of arguments, whereas cin does this at compile time.
Now I am wondering, how fast can it be done?
$ time cat /tmp/dummy > /dev/null
real 0m0.185s
user 0m0.000s
sys 0m0.092s
Wow! This is fastt!!!
[1] http://gcc.gnu.org/onlinedocs/libstdc++/manual/io_and_c.html
[2] http://www.cplusplus.com/reference/ios/ios_base/sync_with_stdio/
[1] http://gcc.gnu.org/onlinedocs/libstdc++/manual/io_and_c.html
[2] http://www.cplusplus.com/reference/ios/ios_base/sync_with_stdio/
fantastic tutorial!!!
ReplyDeleteYou wrote:
ReplyDelete"Well, the above results are consistent with our observations. But, when I think about it, I find it a little weird. Think about it! Our comparison shows that scanf() is about 5 times slower than cin. Five times!!"
I'm afraid you mean "scanf() is about 5 times *FASTER* than cin."
Great catch! Thank you, I have corrected the error.
DeleteThanks for clearing my doubt about cin and scanf :)
ReplyDeleteMy result for time cat /tmp/dummy>/dev/null
ReplyDeletereal 0m0.017s
user 0m0.000s
sys 0m0.017s
nice post thanks
ReplyDeleteivanka trump hot pics