cmogstored dev/user discussion/issues/patches/etc
From: Eric Wong <e@yhbt.net>
To: Arkadi Colson <arkadi@smartbit.be>
Cc: cmogstored-public@yhbt.net, cmogstored-public@bogomips.org
Subject: Re: Heavy load
Date: Wed, 8 Jan 2020 03:35:06 +0000	[thread overview]
Message-ID: <20200108033506.GA1337@dcvr> (raw)
In-Reply-To: <e1907a1e-e60e-93e0-544e-f57d6741db20@smartbit.be>

Arkadi Colson <arkadi@smartbit.be> wrote:
> On 18/12/19 18:58, Eric Wong wrote:
> > Arkadi Colson <arkadi@smartbit.be> wrote:
> >> On 17/12/19 20:42, Eric Wong wrote:
> >>> Arkadi Colson <arkadi@smartbit.be> wrote:
> >>>>> Any idea? If you need more info, please just ask!
> >>>>> How many "/devXYZ" devices do you have? Are they all
> >>>>> on different partitions?
> >>>> We have about 192 devices spread over about 23 cmogstored hosts.
> >>>> Each device is one disk with one partition...
> >>> OK, thanks. I've only got a single host nowadays with 3
> >>> rotational HDDs. The most I ever had was 20 rotational HDDs on a
> >>> host, but that place is out of business.
> >>>
> >>> Since your build did not include -ggdb3 with debug_info by default,
> >>> I wonder if there's something broken in your build system or
> >>> build scripts... Which compiler are you using?
> >>>
> >>> Can you share the output of "ldd /path/to/cmogstored" ?
> >> root@mogstore:~# ldd /usr/bin/cmogstored
> >>     linux-vdso.so.1 (0x00007fff6584b000)
> >>     libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0
> >> (0x00007f634a7f2000)
> >>     libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f634a453000)
> >>     /lib64/ld-linux-x86-64.so.2 (0x00007f634ac50000)
> >>
> >>> Since this is Linux, you're not using libkqueue, are you?
> > I was hoping for a simple explanation with libkqueue being
> > the culprit, but that's not it.
> >
> > Have you gotten a better backtrace with debug_info? (-ggdb3)
> No, no other crashes so far...

OK, so still no crash after a few weeks?

Btw, the new address will be:  cmogstored-public@yhbt.net
bogomips.org is going away since I can't afford it
(and I hate ICANN for what they're doing to the .org TLD).

> >>> Also, which Linux kernel is it?
> >> In fact it's a clean Debian stretch installation with this kernel:
> >>
> >> root@mogstore:~# uname -a
> >> Linux mogstore 4.9.0-11-amd64 #1 SMP Debian 4.9.189-3+deb9u2
> >> (2019-11-11) x86_64 GNU/Linux
> > OK, I don't think there are any known problems with that kernel;
> > cmogstored is pretty sensitive to OS bugs and to bugs in
> > emulation layers like libkqueue.
> >
> >>> Are you using "server aio_threads =" via mgmt interface?
> >> I don't think so. How can I verify this?
> > You'd have something connecting to the mgmt port (7501 in your
> > case) and setting "server aio_threads = $NUMBER".
> I'm not sure I understand this setting. How can I set or get it?
> We did not configure anything like this anywhere, so I presume it
> uses the default?

Yes, it's probably the default.  There's no way of reading the
actual value in either cmogstored or Perl mogstored, only
setting it.  You can, however, see the actual number of threads
the kernel is running by listing /proc/$PID/task/ or using tools
like ps(1)/top(1).

In either case, you can set the thread count by typing that
parameter in via telnet|nc|socat|whatever.
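
Off the top of my head, something like this should work (the thread
count below is just an example, and nc behavior varies a bit between
implementations, so adjust as needed):

	# how many threads is the kernel running for cmogstored?
	# (pgrep -o picks the oldest matching process)
	ls /proc/$(pgrep -o cmogstored)/task | wc -l

	# set the pool size via the mgmt port (7501 in your case):
	printf 'server aio_threads = 20\r\n' | nc 127.0.0.1 7501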

> >>> Are you using the undocumented -W/--worker-processes or
> >>> -M (multi-config) option(s)?
> >> I don't think so: Config looks like this:
> >>
> >> httplisten  = 0.0.0.0:7500
> >> mgmtlisten  = 0.0.0.0:7501
> >> maxconns    = 10000
> >> docroot     = /var/mogdata
> >> daemonize   = 1
> >> server      = none
> > OK
> >
> >>> Is your traffic read-heavy or write-heavy?
> >> We saw peaks of 3Gb traffic on the newest cmogstore when marking one
> >> host dead...
> > Are you able to reproduce the problem on a test instance with
> > just cmogstored?
> > (no need for full MogileFS instance, just PUT/GET over HTTP).
> I haven't had time yet to try to reproduce the problem in another
> environment. We'll try to do it as soon as possible...

OK.  Since it was near the holidays, was your traffic higher or lower?
(I know some shopping sites hit traffic spikes, but I'm not sure
about your case.)
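
If/when you get around to reproducing it, something along these lines
ought to be enough to generate PUT/GET traffic against a standalone
cmogstored (the file size and path below are made up; adjust to your
docroot, and loop/parallelize it if you want to generate real load):

	# assuming $docroot/dev1 exists (e.g. /var/mogdata/dev1):
	dd if=/dev/urandom of=/tmp/blob bs=1M count=10
	curl -T /tmp/blob http://127.0.0.1:7500/dev1/blob.fid
	curl -o /dev/null http://127.0.0.1:7500/dev1/blob.fid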

> > Also, are you on SSD or HDD? The lower latency of SSD could trigger
> > some bugs. The design is for high-latency HDD, but it ought to
> > work well with SSD, too. I haven't tested with SSD much,
> > unfortunately.
> 
> We only use HDD, no SSD
> 
> I will come back to you with more debug info after the next crash.
> For me it's OK to put the case on hold for now... By the way, thanks
> a lot already for your help!

No problem.  I'm sorry it's crashing.  Btw, was that "ldd"
output from the binary after recompiling with -ggdb3?  If you
could get the ldd output from a binary which actually caused a
crash, it would be good to compare the two in case the original
build was broken.
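
In case it helps, this is roughly how I'd rebuild with full debug
info (the configure invocation is from memory, so double-check it
against whatever your packaging scripts do):

	# from the cmogstored source tree:
	./configure CFLAGS='-ggdb3 -O2'
	make
	make install              # or build your .deb the usual way
	ldd /usr/bin/cmogstored   # compare with the ldd output above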

Fwiw, the only Linux segfault I ever saw in production was fixed in:
https://bogomips.org/cmogstored-public/e8217a1fe0cf341b/s/
And that was because it was using the -W/--worker-processes feature.
That instance handled many TB of traffic every day for years and
never hit any other problem.



Thread overview: 20+ messages
2019-12-11 13:54 Heavy load Arkadi Colson
2019-12-11 17:06 ` Eric Wong
2019-12-12  7:30   ` Arkadi Colson
2019-12-12  7:59     ` Eric Wong
2019-12-12 19:16     ` Eric Wong
2019-12-17  7:40       ` Arkadi Colson
2019-12-17  8:43         ` Eric Wong
2019-12-17  8:57           ` Arkadi Colson
2019-12-17 19:42             ` Eric Wong
2019-12-18  7:56               ` Arkadi Colson
2019-12-18 17:58                 ` Eric Wong
2020-01-06  9:46                   ` Arkadi Colson
2020-01-08  3:35                     ` Eric Wong [this message]
2020-01-08  9:40                       ` Arkadi Colson
2020-01-30  0:35                         ` Eric Wong
2020-03-03 15:46                           ` Arkadi Colson
2019-12-17  7:41       ` Arkadi Colson
2019-12-17  8:31         ` Eric Wong
2019-12-17  8:43           ` Arkadi Colson
2019-12-17  8:50             ` Eric Wong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://yhbt.net/cmogstored/README

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200108033506.GA1337@dcvr \
    --to=e@yhbt.net \
    --cc=arkadi@smartbit.be \
    --cc=cmogstored-public@bogomips.org \
    --cc=cmogstored-public@yhbt.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox

	https://yhbt.net/cmogstored.git/
