io_splice RubyGem user+dev discussion/patches/pulls/bugs/help
From: Eric Wong <normalperson@yhbt.net>
To: ruby.io.splice@librelist.com
Subject: Re: Some benchmarks
Date: Wed, 22 Dec 2010 11:56:46 -0800
Message-ID: <20101222195646.GB20567@dcvr.yhbt.net>
In-Reply-To: AANLkTinXs4NK5D-=hn9shO4r8LMRmLLP8ssvb12JhSBh@mail.gmail.com

Iñaki Baz Castillo <ibc@aliax.net> wrote:
> Hi, I've done some benchmarks comparing FileUtils vs io_splice when
> copying files in my computer (AMD 64 Phenom II X4 965):

<snip>
> Test data:
> - Source file size:  1156222 bytes
> - Number of iterations:  1000
> Results:
> - FileUtils.cp:            12.707785606384277
> - FileUtils.copy_stream:   13.745135068893433
> - IO.splice:               9.723489761352539
> 
> 
> The script is below.
> 
> It's clear that using io_splice is good for big files (or lot of
> copies from same small source file).
> 
> I don't understand why in the last test (big file, 1000 copies)
> io_splice takes so long, maybe it takes more time initializing each
> object within the benchmark block?

It's likely the test wrote enough to force blocking writes to disk,
and your disk is now the bottleneck.  In that case, all the memory
tricks in the world won't help :)

Try it on a tmpfs mount.
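If you are not sure which directories are tmpfs-backed, here is a minimal Ruby sketch that reads the Linux mounts table. It assumes a Linux box where /proc/mounts exists; tmpfs_mounts is a hypothetical helper, not part of io_splice:

```ruby
# Hypothetical helper (not part of io_splice): list the mount points that a
# Linux mounts table (/proc/mounts by default) reports as tmpfs.
# Each line looks like "device mount_point fstype options dump pass".
# Mount points containing spaces appear octal-escaped (e.g. "\040") and are
# returned unmodified.
def tmpfs_mounts(mounts_file = "/proc/mounts")
  return [] unless File.readable?(mounts_file)

  File.readlines(mounts_file).map { |line|
    _device, mount_point, fstype = line.split
    mount_point if fstype == "tmpfs"
  }.compact
end
```

Picking the first writable entry, e.g. tmpfs_mounts.find { |dir| File.writable?(dir) }, and passing it as the second argument of Dir.mktmpdir gives a RAM-backed place for SRC_FILE and DST_FILE. /dev/shm is commonly tmpfs on Linux, but checking is safer than assuming.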

Also, relying on GC to close file descriptors could be a small
performance problem given the number of iterations.
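One way to stop depending on GC is to open both files with the block form of File.open, so they are closed as soon as each iteration ends. A sketch, assuming a hypothetical helper named each_copy_pair; the actual copy (the IO.splice loop, or IO.copy_stream for comparison) goes in the block:

```ruby
# Hypothetical helper: yield a freshly opened (source, destination) pair for
# each iteration. The block form of File.open closes both files when the
# block exits, even on an exception, so descriptors are released every
# iteration instead of whenever GC gets around to it.
def each_copy_pair(src_path, dst_prefix, times)
  times.times do |n|
    File.open(src_path, "rb") do |input|
      File.open("#{dst_prefix}_#{n}", "wb") do |output|
        yield input, output
      end # output is closed here
    end   # input is closed here
  end
end
```

In the benchmark that would be each_copy_pair(SRC_FILE, DST_FILE, TIMES) { |src, dst| IO.copy_stream(src, dst) }, with the splice loop (using src.fileno and dst.fileno) in place of IO.copy_stream for the io_splice case. The destination naming mirrors the DST_FILE + "_#{n}" used in the splice part of the script.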

> Script:
> ---------------------------------------------------------
> #!/usr/bin/ruby
> 
> require "fileutils"
> require "benchmark"
> require "io/splice"
> 
> 
> SRC_FILE = ARGV[0]
> DST_FILE = ARGV[1]
> TIMES = ( ARGV[2] ? ARGV[2].to_i : 1 )
> 
> 
> puts "Test data:"
> puts "- Source file size:  #{File.size(SRC_FILE)} bytes"
> puts "- Number of iterations:  #{TIMES}"
> 
> 
> puts "Results:"
> 
> print "- FileUtils.cp:            "
> puts Benchmark.realtime {
>  TIMES.times do
>    FileUtils.cp SRC_FILE, DST_FILE
>  end
> }
> 
> 
> print "- FileUtils.copy_stream:   "
> puts Benchmark.realtime {
>  TIMES.times do
>    FileUtils.copy_stream SRC_FILE, DST_FILE
>  end
> }
> 
> 
> print "- IO.splice:               "
> puts Benchmark.realtime {
>  TIMES.times do |n|
>    source = File.open(SRC_FILE, 'rb')
>    dest = File.open(DST_FILE + "_#{n}", 'wb')
>    source_fd = source.fileno
>    dest_fd = dest.fileno
> 
>    # We use a pipe as a ring buffer in kernel space.
>    # pipes may store up to IO::Splice::PIPE_CAPA bytes
>    pipe = IO.pipe

You can reuse the pipe object through multiple runs assuming you drain
it properly.

>    rfd, wfd = pipe.map { |io| io.fileno }
> 
>    begin
>      nread = begin
>        # first pull as many bytes as possible into the pipe
>        IO.splice(source_fd, nil, wfd, nil, IO::Splice::PIPE_CAPA, 0)
>      rescue EOFError
>        break
>      end
> 
>      # now move the contents of the pipe buffer into the destination file
>      # the copied data never enters userspace
>      nwritten = IO.splice(rfd, nil, dest_fd, nil, nread, 0)
> 
>      nwritten == nread or
>        abort "short splice to destination file: #{nwritten} != #{nread}"
>    end while true
>  end
> }
> ---------------------------------------------------------
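Reusing a single pipe across iterations, as suggested next to the IO.pipe call above, only works if every byte pushed into the pipe is pulled back out before the next copy starts. A sketch of that pattern, with the stdlib IO.copy_stream standing in for IO.splice so it runs without the gem. copy_via_pipe and CHUNK are hypothetical names, and the 16 KiB chunk assumes the pipe holds at least that much (the Linux default is 64 KiB):

```ruby
# Hypothetical chunk size. It must not exceed the pipe's capacity, or the
# single-threaded write into the pipe would block with nobody reading.
CHUNK = 16 * 1024

# Copy src_path to dst_path through an already-created pipe (rd, wr).
# IO.copy_stream stands in for IO.splice here. Each chunk written into the
# pipe is read back out in full before the next one, so the pipe is empty
# again on return and can be reused for the next iteration.
def copy_via_pipe(src_path, dst_path, rd, wr)
  total = 0
  File.open(src_path, "rb") do |src|
    File.open(dst_path, "wb") do |dst|
      loop do
        nread = IO.copy_stream(src, wr, CHUNK)    # source file -> pipe
        break if nread.zero?                      # source exhausted

        nwritten = IO.copy_stream(rd, dst, nread) # pipe -> destination (drains it)
        raise "short copy: #{nwritten} != #{nread}" unless nwritten == nread

        total += nwritten
      end
    end
  end
  total
end
```

The pipe would be created once with rd, wr = IO.pipe before the timed loop, and each iteration would call copy_via_pipe(SRC_FILE, "#{DST_FILE}_#{n}", rd, wr). With io_splice, the two copy_stream calls become the two IO.splice calls from the script, with the EOFError rescue as the loop exit.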
> 
> I've tried to declare source, source_fd and pipe = IO.pipe before the
> benchmark block but then I get empty copied files (I need to declare
> all of them within the benchmark block). I assume the test script can
> be improved.

You need to rewind source if you don't specify an input offset for splice.
Likewise for dest and the destination offset.  The open+close is part of
the test for the FileUtils things, though, so I would just open and close
the files explicitly in each iteration (you were leaving the close up to
GC, which is usually fine for MRI but not for benchmark purposes).
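The empty files are easy to reproduce without splice. A sketch using the stdlib IO.copy_stream, whose fourth argument is a source offset (when it is given and the source is an IO, the source's file position is not moved); copy_reusing_files and its reset modes are hypothetical names for illustration:

```ruby
# Hypothetical helper: copy src_path to dst_path +times+ times while reusing
# ONE pair of open File objects, and return the bytes copied per iteration.
#   reset: :none   - leave positions alone (reproduces the empty-copy symptom)
#          :rewind - move the source position back to 0 before each copy
#          :offset - pass source offset 0; IO.copy_stream then leaves the
#                    source position where it is
def copy_reusing_files(src_path, dst_path, times, reset: :none)
  File.open(src_path, "rb") do |src|
    File.open(dst_path, "wb") do |dst|
      Array.new(times) do
        # Destination side: drop the previous output and write from offset 0.
        dst.truncate(0)
        dst.rewind

        case reset
        when :none
          IO.copy_stream(src, dst)         # starts at src's current position
        when :rewind
          src.rewind
          IO.copy_stream(src, dst)
        when :offset
          IO.copy_stream(src, dst, nil, 0) # copy_length nil means up to EOF
        else
          raise ArgumentError, "unknown reset mode: #{reset.inspect}"
        end
      end
    end
  end
end
```

With reset: :none the first iteration copies the whole file, every later one copies 0 bytes, and the destination ends up empty; :rewind and :offset copy the full file each time. In splice(2) terms, passing an offset instead of nil reads or writes at that offset without moving the file position, assuming io_splice passes its second and fourth arguments straight through.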

Thread overview: 7+ messages
     [not found] <AANLkTi=M+qGa7G1PvYyB+fbJUzrersJQwtDkct3hZEiy@mail.gmail.com>
2010-12-22 14:01 ` Some benchmarks Iñaki Baz Castillo
2010-12-22 19:56   ` Eric Wong [this message]
2010-12-23 15:41     ` Iñaki Baz Castillo
2010-12-23 18:06       ` Eric Wong
2010-12-27 10:01         ` Iñaki Baz Castillo
2010-12-27 17:33       ` Iñaki Baz Castillo
2010-12-27 21:38         ` Eric Wong

Code repositories for project(s) associated with this public inbox

	https://yhbt.net/ruby_io_splice.git/
