From: Yuriy Sokolov (funny_falcon)
Newsgroups: gmane.comp.lang.ruby.kgio.general
Subject: Re: [PATCH 3/3] add `#kgio_writev` and `#kgio_trywritev`
Date: Fri, 1 Jun 2012 10:14:23 +0400
To: kgio@librelist.com

2012/6/1 Eric Wong:
> Sokolov Yura 'funny-falcon wrote:
> > +/* it seems that socket buffer is rarely exceed 1MB */
>
> See other email about LFN.  On the other hand, 1M may be a good size
> anyways to fit into the CPU cache.
> Buffer size tuning is hard so I usually avoid it :)
> ....
> I've checked some setups I use (75-120ms over 1 gigabit) and those have
> the max tcp_wmem set to 16M(!).  I've also known (and pity) folks that
> have to deal with ~500ms latency on their LFNs.
>
> > +#define WRITEV_MEMLIMIT (1024*1024)

Firstly, I agree with the statement about the CPU cache.

Secondly, if the network has such big latency, then we still fill the
kernel buffer in several syscalls.  Yes, that would be 16 calls instead
of 1, but 16 fast calls are not much worse than 1 slow call (and copying
16MB of strings could be slow).  Even if we fill the buffer in the first
"giant" call, it is very likely that we could not pass the same amount
of data in the second call, because not much data has been sent over the
network yet, and the buffer is not as free as it was at the first call.
Maybe I'm mistaken on this point, but trying to send an unlimited data
chunk seems meaningless to me.

Could you test on your setup with 16M tcp_wmem:
how much can `write/send` send in a single call on a nonblocking socket?
how much will a subsequent call to `write/send` send (when the socket
becomes writable)?
how much slower is it to send 16 times by 1M than 1 time by 16M?

On another note: some rare systems have no real kernel `writev`, but
emulate it the way `custom_writev` does.  So if we pass an unlimited
total size to it, it will try to allocate an unlimited memory chunk for
the buffer.  (Happily, Linux on x86/x86_64 seems to have a real `writev`
:), and I think the *BSD systems also have it.  But even glibc has such
a substitution for some broken systems.)

Maybe WRITEV_MEMLIMIT could be increased, but 4M seems to me the highest
meaningful value.

> > +#define WRITEV_IMPL_THRESHOLD 512
>
> I would like to see data/examples/benchmarks for how you arrived at the
> WRITEV_IMPL_THRESHOLD.

I tried to send a lot of strings sized 100, 200, 300, ..., 1000 bytes
(total size was about 3001000000 bytes) with only the library writev and
with only custom_writev.
100-400 byte strings were sent faster with custom_writev (22 sec vs 24
sec for 400 bytes), 500 byte strings were on par (~21-22 sec), and 600
byte strings were sent faster with the library writev (19 sec vs 20 sec
with custom_writev).  As the string size increases, the library writev
performs better and better (1000 byte strings were sent in 17-18 sec),
while custom_writev stagnates at 21-22 sec.

I chose 512 bytes simply because it looks pretty, and because GCC can
replace `x / 512` with `x >> 9`.  I'll post a gist with my benchmark
script when I get to work.

> This patch looks pretty good.  More comments below.
>
> > +static unsigned int iov_max = 1024; /* this could be overriden in init */
> > +
> > +struct io_args_v {
> > +	VALUE io;
> > +	VALUE buf;
> > +	VALUE vec_buf;
> > +	struct iovec *vec;
> > +	unsigned long iov_cnt;
> > +	size_t iov_len;
>
> The "iov_len" name confused me, I think something with "total" is better.
> Perhaps "total_len"?

Then "batch_len", since it's the length of one batch.

> > +static void prepare_writev(struct io_args_v *a, VALUE io, VALUE ary)
> > +{
> > +	long vec_cnt;
> > +	static struct iovec stub = {NULL, 0};
> > +	a->io = io;
> > +	a->fd = my_fileno(io);
> > +	a->something_written = 0;
> > +
> > +	/* rb_ary_subseq will not copy array unless it modified */
> > +	a->buf = rb_ary_subseq(ary, 0, RARRAY_LEN(ary));
> > +
> > +	a->vec_buf = rb_str_new(0, 0);
> > +	a->vec = &stub;
>
> Can we just leave a->vec unset?  It's always set in fill_iovec anyways.

Well, I caught a segmentation fault in a test with an empty input array
in custom_writev :)  I could fix it by checking a->iov_cnt against zero,
but then I thought: passing an empty array to `#kgio_*writev` will be a
very rare case, so introducing another branch to handle it would be
tiresome, and it is not unwise to simply assign something meaningful to
a->vec.  But since this static stub is not used anywhere else, I made it
local to the function.

> Static variables inside library functions smell funny.
I'm funny_falcon, let it smell by me :)

> > +static void fill_iovec(struct io_args_v *a)
> > +{
> > +	unsigned long i;
> > +	struct iovec *curvec;
> > +
> > +	a->iov_cnt = RARRAY_LEN(a->buf);
> > +	a->iov_len = 0;
> > +	if (a->iov_cnt == 0) return;
> > +	if (a->iov_cnt > iov_max) a->iov_cnt = iov_max;
> > +	rb_str_resize(a->vec_buf, sizeof(struct iovec) * a->iov_cnt);
> > +	curvec = a->vec = (struct iovec*)RSTRING_PTR(a->vec_buf);
> >
> > +	/* lets limit total memory to write,
> > +	 * but always take first string */
> > +	next_len = a->iov_len + str_len;
> > +	if (i && next_len > WRITEV_MEMLIMIT) {
> > +		a->iov_cnt = i;
> > +		break;
>
> Use hard tabs like the rest of the file

Oh, sorry, it seems I typed it before remembering to fix my vim settings.
Could vim modelines be included in the C sources' headers for automatic
vim configuration?

> > +static VALUE my_writev(VALUE io, VALUE str, int io_wait)
> > +{
> > +	struct io_args_v a;
> > +	long n;
> > +
> > +	prepare_writev(&a, io, str);
> > +	set_nonblocking(a.fd);
> > +
> > +	do {
> > +		fill_iovec(&a);
> > +#ifdef HAVE_WRITEV
> > +		/* for big strings use library function */
> > +		if (a.iov_len / WRITEV_IMPL_THRESHOLD > a.iov_cnt)
> > +			n = (long)writev(a.fd, a.vec, a.iov_cnt);
> > +		else
> > +#endif
> > +			n = (long)custom_writev(a.fd, a.vec, a.iov_cnt, a.iov_len);
>
> Please no.  CPP conditionals inside functions is confusing, but CPP
> conditionals mixing with C conditionals makes my eyes bleed.
> How about something like this?
>
> #ifdef HAVE_WRITEV
> static ssize_t
> real_writev(int fd, const struct iovec *vec, int cnt, size_t total)
> {
> 	return writev(fd, vec, cnt);
> }
> #else
> #  define real_writev(fd,vec,cnt,total) \
> 	custom_writev((fd),(vec),(cnt),(total))
> #endif

Well, you missed the C condition: the threshold check still has to stay
at the call site.  My thought: this increases the line count and tangles
the logic more than my version does.  But since it is your library, and
you have much more experience than I do, your opinion matters more.
With regards, Sokolov Yura aka funny_falcon