From: Yuriy Sokolov (funny_falcon)
Newsgroups: gmane.comp.lang.ruby.kgio.general
Subject: Re: [PATCH 3/3] add `#kgio_writev` and `#kgio_trywritev`
Date: Fri, 1 Jun 2012 10:14:23 +0400
To: kgio@librelist.com

2012/6/1 Eric Wong:
> Sokolov Yura 'funny-falcon wrote:
> > +/* it seems that socket buffer is rarely exceed 1MB */
>
> See other email about LFN.  On the other hand, 1M may be a good size
> anyways to fit into the CPU cache.
> Buffer size tuning is hard so I usually avoid it :)
> ....
> I've checked some setups I use (75-120ms over 1 gigabit) and those have
> the max tcp_wmem set to 16M(!).  I've also known (and pity) folks that
> have to deal with ~500ms latency on their LFNs.
>
> > +#define WRITEV_MEMLIMIT (1024*1024)

Firstly, I agree with the statement about the CPU cache.

Secondly, if the network has such big latency, then we still fill the
kernel buffer in several syscalls.  Yes, that would be 16 calls instead
of 1, but 16 fast calls are not much worse than 1 slow call (and copying
16MB of strings could be slow).  Even if we fill the buffer in the first
"giant" call, it is very likely that we could not pass the same amount
of data in the second call, because not much data has been sent over the
network yet, and the buffer is not as free as it was at the first call.
Maybe I'm mistaken on this point, but trying to send an unlimited data
chunk seems meaningless to me.

Could you test on your setup with 16M tcp_wmem:
how much can `write/send` send in a single call on a nonblocking socket?
how much will a subsequent call to `write/send` send (when the socket
becomes writable)?
how much slower is it to send 16 times by 1M than 1 time by 16M?

On another note: some rare systems have no real kernel `writev`, but
emulate it the way `custom_writev` does.  So if we pass an unlimited
total size to it, it will try to allocate an unlimited memory chunk for
the buffer.  (Happily, Linux on x86/x86_64 seems to have a real `writev`
:), and I think the *BSD systems also have it.  But even glibc has such
a substitution for some broken systems.)

Maybe WRITEV_MEMLIMIT could be increased, but 4M seems to me the highest
meaningful value.

> > +#define WRITEV_IMPL_THRESHOLD 512
>
> I would like to see data/examples/benchmarks for how you arrived at the
> WRITEV_IMPL_THRESHOLD.

I tried to send a lot of strings sized 100, 200, 300, ..., 1000 bytes
(total size was about 3001000000 bytes) with only the library writev and
with only custom_writev.
100-400 byte strings were sent faster with custom_writev (22 sec vs 24
sec for 400 bytes), 500 byte strings were on par (~21-22 sec), and 600
byte strings were sent faster with the library writev (19 sec vs 20 sec
with custom_writev).  As the string size increases, the library writev
performs better and better (1000 byte strings were sent in 17-18 sec),
while custom_writev stagnates at 21-22 sec.

I chose 512 bytes simply because it looks pretty, and because GCC can
replace `x / 512` with `x >> 9`.  I'll post a gist with my benchmark
script when I get to work.

> This patch looks pretty good.  More comments below.
>
> > +static unsigned int iov_max = 1024; /* this could be overriden in init */
> > +
> > +struct io_args_v {
> > +	VALUE io;
> > +	VALUE buf;
> > +	VALUE vec_buf;
> > +	struct iovec *vec;
> > +	unsigned long iov_cnt;
> > +	size_t iov_len;
>
> The "iov_len" name confused me, I think something with "total" is better.
> Perhaps "total_len"?

Then "batch_len", since it's the length of one batch.

> > +static void prepare_writev(struct io_args_v *a, VALUE io, VALUE ary)
> > +{
> > +	long vec_cnt;
> > +	static struct iovec stub = {NULL, 0};
> > +	a->io = io;
> > +	a->fd = my_fileno(io);
> > +	a->something_written = 0;
> > +
> > +	/* rb_ary_subseq will not copy array unless it modified */
> > +	a->buf = rb_ary_subseq(ary, 0, RARRAY_LEN(ary));
> > +
> > +	a->vec_buf = rb_str_new(0, 0);
> > +	a->vec = &stub;
>
> Can we just leave a->vec unset?  It's always set in fill_iovec anyways.

Well, I caught a segmentation fault in a test with an empty input array
in custom_writev :)  I could fix it by checking a->iov_cnt against zero,
but then I thought: passing an empty array to `#kgio_*writev` will be a
very rare case, so introducing another branch to handle it would be
tiresome, and it is not unwise to simply assign something meaningful to
a->vec.  But since this static stub is not used anywhere else, I made it
local to the function.

> Static variables inside library functions smell funny.
I'm funny_falcon, let it smell by me :)

> > +static void fill_iovec(struct io_args_v *a)
> > +{
> > +	unsigned long i;
> > +	struct iovec *curvec;
> > +
> > +	a->iov_cnt = RARRAY_LEN(a->buf);
> > +	a->iov_len = 0;
> > +	if (a->iov_cnt == 0) return;
> > +	if (a->iov_cnt > iov_max) a->iov_cnt = iov_max;
> > +	rb_str_resize(a->vec_buf, sizeof(struct iovec) * a->iov_cnt);
> > +	curvec = a->vec = (struct iovec*)RSTRING_PTR(a->vec_buf);
> >
> > +	/* lets limit total memory to write,
> > +	 * but always take first string */
> > +	next_len = a->iov_len + str_len;
> > +	if (i && next_len > WRITEV_MEMLIMIT) {
> > +		a->iov_cnt = i;
> > +		break;
>
> Use hard tabs like the rest of the file

Oh, sorry, it seems I typed it before remembering to fix my vim settings.
Could vim modelines be included in the C sources' headers for automatic
vim configuration?

> > +static VALUE my_writev(VALUE io, VALUE str, int io_wait)
> > +{
> > +	struct io_args_v a;
> > +	long n;
> > +
> > +	prepare_writev(&a, io, str);
> > +	set_nonblocking(a.fd);
> > +
> > +	do {
> > +		fill_iovec(&a);
> > +#ifdef HAVE_WRITEV
> > +		/* for big strings use library function */
> > +		if (a.iov_len / WRITEV_IMPL_THRESHOLD > a.iov_cnt)
> > +			n = (long)writev(a.fd, a.vec, a.iov_cnt);
> > +		else
> > +#endif
> > +			n = (long)custom_writev(a.fd, a.vec, a.iov_cnt, a.iov_len);
>
> Please no.  CPP conditionals inside functions is confusing, but CPP
> conditionals mixing with C conditionals makes my eyes bleed.
> How about something like this?
>
> #ifdef HAVE_WRITEV
> static ssize_t
> real_writev(int fd, const struct iovec *vec, int cnt, size_t total)
> {
> 	return writev(fd, vec, cnt);
> }
> #else
> #  define real_writev(fd,vec,cnt,total) \
> 	custom_writev((fd),(vec),(cnt),(total))
> #endif

Well, you missed the C condition: the threshold check still has to stay
at the call site.  My thought: this increases the line count and tangles
the logic more than my version does.  But since it is your library, and
you have much more experience than I do, your opinion matters more.
With regards, Sokolov Yura aka funny_falcon