cmogstored dev/user discussion/issues/patches/etc
From: Arkadi Colson <arkadi@smartbit.be>
To: Eric Wong <e@yhbt.net>
Cc: "cmogstored-public@yhbt.net" <cmogstored-public@yhbt.net>,
	"cmogstored-public@bogomips.org" <cmogstored-public@bogomips.org>
Subject: Re: Heavy load
Date: Wed, 8 Jan 2020 09:40:18 +0000
Message-ID: <33583fd3-9c2c-5443-3642-e6243f5f3b0e@smartbit.be>
In-Reply-To: <20200108033506.GA1337@dcvr>



Kind regards
Arkadi Colson

Smartschool • Digital School Platform
Smartbit bvba • Hoogstraat 13 • 3670 Meeuwen
www.smartschool.be • info@smartschool.be
T +32 11 64 08 80 • F +32 11 64 08 81


On 8/01/20 04:35, Eric Wong wrote:
> Arkadi Colson <arkadi@smartbit.be> wrote:
>> On 18/12/19 18:58, Eric Wong wrote:
>>> Arkadi Colson <arkadi@smartbit.be> wrote:
>>>> On 17/12/19 20:42, Eric Wong wrote:
>>>>> Arkadi Colson <arkadi@smartbit.be> wrote:
>>>>>>> Any idea? If you need more info, please just ask!
>>>>>>> How many "/devXYZ" devices do you have? Are they all
>>>>>>> on different partitions?
>>>>>> We have about 192 devices spread over about 23 cmogstored hosts.
>>>>>> Each device is one disk with one partition...
>>>>> OK, thanks. I've only got a single host nowadays with 3
>>>>> rotational HDDs. The most I ever had was 20 rotational HDDs on a
>>>>> host, but that place is out of business.
>>>>>
>>>>> Since your build did not include -ggdb3 with debug_info by default,
>>>>> I wonder if there's something broken in your build system or
>>>>> build scripts... Which compiler are you using?
>>>>>
>>>>> Can you share the output of "ldd /path/to/cmogstored" ?
>>>> root@mogstore:~# ldd /usr/bin/cmogstored
>>>>      linux-vdso.so.1 (0x00007fff6584b000)
>>>>      libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0
>>>> (0x00007f634a7f2000)
>>>>      libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f634a453000)
>>>>      /lib64/ld-linux-x86-64.so.2 (0x00007f634ac50000)
>>>>
>>>>> Since this is Linux, you're not using libkqueue, are you?
>>> I was hoping for a simple explanation with libkqueue being
>>> the culprit, but that's not it.
>>>
>>> Have you gotten a better backtrace with debug_info? (-ggdb3)
>> No, no other crashes so far...
> OK, so still no crash after a few weeks?
Nope, but during the holidays there is less activity on the application,
so I'm afraid we have to wait until it happens again...
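
For when it does happen, a rough sketch of capturing a more useful
backtrace (assuming an autoconf-based source build and that core dumps
are enabled for the daemon; the paths and flags below are only examples):

  # rebuild with full debug info so the next core dump has symbols
  ./configure CFLAGS='-ggdb3 -O2' && make && make install

  # once a core file exists, dump backtraces for every thread
  gdb /usr/bin/cmogstored /path/to/core
  (gdb) thread apply all bt full
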
> Btw, new address will be:  cmogstored-public@yhbt.net
> bogomips.org is going away since I can't afford it
> (and I hate ICANN for what they're doing to .org TLD)
>
>>>>> Also, which Linux kernel is it?
>>>> In fact it's a clean Debian stretch installation with this kernel:
>>>>
>>>> root@mogstore:~# uname -a
>>>> Linux mogstore 4.9.0-11-amd64 #1 SMP Debian 4.9.189-3+deb9u2
>>>> (2019-11-11) x86_64 GNU/Linux
>>> OK, I don't think there are known problems with that kernel;
>>> cmogstored is pretty sensitive to OS bugs and bugs in
>>> emulation layers like libkqueue.
>>>
>>>>> Are you using "server aio_threads =" via mgmt interface?
>>>> I don't think so. How can I verify this?
>>> You'd have something connecting to the mgmt port (7501 in your
>>> case) and setting "server aio_threads = $NUMBER".
>> I'm not sure I understand this setting. How can I set or get it?
>> We did not configure anything like this anywhere, so I presume
>> it uses the default setting?
> Yes, it's probably the default.  There's no way of reading the
> actual value in either cmogstored or Perl mogstored, only
> setting it.  You can, however, display the actual number of threads
> the kernel is using by listing /proc/$PID/task/ or using tools like
> ps(1)/top(1).
root@mogstore:~# ls /proc/787/task/ | wc -l
114
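
Equivalently, ps(1) can report the thread count directly (using PID 787
from above):

  # nlwp = number of lightweight processes (threads) belonging to the PID
  ps -o nlwp= -p 787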

> In either case, you can set the thread count by typing that
> parameter in over the mgmt port via telnet|nc|socat|whatever.
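
A minimal sketch of doing that with nc (assuming the mgmt port 7501 from
the config above; the thread count of 20 is only an example value):

  # ask cmogstored to resize its AIO thread pool to 20 threads;
  # the change can be confirmed by re-checking /proc/$PID/task/ afterwards
  printf 'server aio_threads = 20\r\n' | nc 127.0.0.1 7501
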
>
>>>>> Are you using the undocumented -W/--worker-processes or
>>>>> -M (multi-config) option(s)?
>>>> I don't think so; the config looks like this:
>>>>
>>>> httplisten  = 0.0.0.0:7500
>>>> mgmtlisten  = 0.0.0.0:7501
>>>> maxconns    = 10000
>>>> docroot     = /var/mogdata
>>>> daemonize   = 1
>>>> server      = none
>>> OK
>>>
>>>>> Is your traffic read-heavy or write-heavy?
>>>> We saw peaks of 3Gb of traffic on the newest cmogstored host when
>>>> marking one host dead...
>>> Are you able to reproduce the problem on a test instance with
>>> just cmogstored?
>>> (no need for full MogileFS instance, just PUT/GET over HTTP).
>> I haven't had time yet to try to reproduce the problem in another
>> environment. We'll try to do it as soon as possible...
> OK.  Since it was near the holidays, was your traffic higher or lower?
> (I know some shopping sites hit traffic spikes, but not sure
> about your case).
Lower in our case, so let's wait until it happens again and I will get
back to you. I tried to reproduce it in a test environment but could
not...
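
For a standalone reproduction attempt, a rough sketch of exercising
cmogstored directly over HTTP with curl (assuming the config above with
port 7500, an existing /var/mogdata/dev1 device directory, and purely
hypothetical file names):

  # upload a test file with HTTP PUT, then fetch it back and print the status
  curl -T /tmp/testfile http://127.0.0.1:7500/dev1/testfile
  curl -sS -o /dev/null -w '%{http_code}\n' http://127.0.0.1:7500/dev1/testfile
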
>
>>> Also, are you on SSD or HDD? Lower latency of SSD could trigger
>>> some bugs. The design is for high-latency HDD, but it ought to
>>> work well with SSD, too. I haven't tested with SSD much,
>>> unfortunately.
>> We only use HDD, no SSD
>>
>> I will come back to you with more information after the next crash with
>> more debug info. For me it's OK to put the case on hold for now... By
>> the way, thanks a lot already for your help!
> No problem.  I'm sorry for it crashing.  Btw, was that "ldd"
> output from the binary after recompiling with -ggdb3?  If you
> could get the ldd output from a binary which caused a crash,
> it would be good to compare them in case the original build
> was broken.
The ldd output looks the same before and after recompiling.
>
> Fwiw, the only Linux segfault I ever saw in production was fixed in:
> https://bogomips.org/cmogstored-public/e8217a1fe0cf341b/s/
> And that was because it was using the -W/--worker-processes feature.
> That instance saw many TB of traffic every day for years and never
> saw any other problem.
>

Thread overview: 20+ messages
2019-12-11 13:54 Heavy load Arkadi Colson
2019-12-11 17:06 ` Eric Wong
2019-12-12  7:30   ` Arkadi Colson
2019-12-12  7:59     ` Eric Wong
2019-12-12 19:16     ` Eric Wong
2019-12-17  7:40       ` Arkadi Colson
2019-12-17  8:43         ` Eric Wong
2019-12-17  8:57           ` Arkadi Colson
2019-12-17 19:42             ` Eric Wong
2019-12-18  7:56               ` Arkadi Colson
2019-12-18 17:58                 ` Eric Wong
2020-01-06  9:46                   ` Arkadi Colson
2020-01-08  3:35                     ` Eric Wong
2020-01-08  9:40                       ` Arkadi Colson [this message]
2020-01-30  0:35                         ` Eric Wong
2020-03-03 15:46                           ` Arkadi Colson
2019-12-17  7:41       ` Arkadi Colson
2019-12-17  8:31         ` Eric Wong
2019-12-17  8:43           ` Arkadi Colson
2019-12-17  8:50             ` Eric Wong
