From: Nick Feamster
Date: Mon, 18 Apr 2011 10:11:10 -0400
To: Dave Taht
Cc: hndr@lists.pixilab.org, bismark-devel@lists.bufferbloat.net, abhishekjain95@gmail.com
Subject: Re: [Bismark-devel] [Hndr] Dates are really tricky!!

Dave,

Django does allow native SQL, but the cool thing about it is that most queries can be done with Python methods that essentially do introspection on the database tables. It's pretty slick. Switching from mysql to postgres (when we are ready) is a one-line change.

I like the idea of generating summary tables per day. That would _really_ speed things up, I think, and most people are probably just looking at the most recent queries by default. Is this in the bug database yet? I'll be happy to file it. :-) Who is working on this part?
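For anyone who has not played with the ORM yet, here is a minimal sketch of both points; the model, app, and credential names are hypothetical, while the table and column names are the real ones from Dave's queries below:

    # settings.py -- the mysql -> postgres switch is just the ENGINE line
    DATABASES = {
        'default': {
            'ENGINE': 'django.db.backends.mysql',  # or 'django.db.backends.postgresql_psycopg2'
            'NAME': 'bismark',
            'USER': 'bismark',
            'PASSWORD': 'secret',  # hypothetical credentials
            'HOST': 'localhost',
        }
    }

    # flows/models.py -- a hypothetical model wrapped around the existing table
    from django.db import models

    class Flow(models.Model):
        deviceid = models.CharField(max_length=32)
        dstport = models.IntegerField()
        tsstart = models.IntegerField()  # unix timestamp, per the queries below
        uppkts = models.IntegerField()
        dwpkts = models.IntegerField()
        upbytes = models.BigIntegerField()
        dwbytes = models.BigIntegerField()

        class Meta:
            db_table = 'FLOWS_newformat'

    # Dave's top-20-ports query below, expressed as ORM methods
    import calendar, datetime
    from django.db.models import Sum

    start_ts = calendar.timegm(datetime.date(2011, 2, 6).timetuple())
    end_ts = calendar.timegm(datetime.date(2011, 2, 20).timetuple())

    top_ports = (Flow.objects
                 .filter(deviceid='NB105', tsstart__range=(start_ts, end_ts))
                 .values('dstport')  # becomes the group by
                 .annotate(up_p=Sum('uppkts'), down_p=Sum('dwpkts'),
                           up_bytes=Sum('upbytes'), down_bytes=Sum('dwbytes'))
                 .order_by('-down_bytes')[:20])

The values()/annotate() pair is what generates the group by, and tsstart__range maps to a between clause, which is the form Dave suggests below.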
-Nick

On Apr 18, 2011, at 8:46 AM, Dave Taht wrote:

> On 04/18/2011 06:18 AM, Dave Taht wrote:
>> On 04/17/2011 11:57 PM, Nick Feamster wrote:
>>
>>> I think we can whip up some quick visualizations of what's in the DB with a nice Web page and some concerted python hacking. For starters, it would be nice to have a version of the data whereby queries can actually complete in a reasonable time. :-) What's the best way to do this? Split the table by months?
>>>
>> I did look over one of the slower (30+ seconds!) queries and suggested using a between clause for the date range instead of a pair of and-ed comparisons, but we never got around to testing that, and mysql historically has not been able to do anything sane with a between range (postgres can).
>>
>> e.g.:
>>
>> mysql> select dstport, sum(uppkts) as up_p, sum(dwpkts) as down_p, sum(upbytes) as up_bytes, sum(dwbytes) as down_bytes from FLOWS_newformat where tsstart > unix_timestamp('2011-2-6') and tsstart < unix_timestamp('2011-2-20') and deviceid='NB105' group by dstport order by down_bytes desc limit 20;
>>
>> MIGHT do better with:
>>
>> select dstport, sum(uppkts) as up_p, sum(dwpkts) as down_p, sum(upbytes) as up_bytes, sum(dwbytes) as down_bytes from FLOWS_newformat where tsstart between unix_timestamp('2011-2-6') and unix_timestamp('2011-2-20') and deviceid='NB105' group by dstport order by down_bytes desc limit 20;
>>
>> I'd not be huge on divvying up tables by month, but creating summary tables by day and by hour would be a good idea. Usage patterns tend to be self-similar across day of week and by hour of day.
>>
>> I note that it is important to be able to capture hour of day AT the location (EDT/MDT/PDT), as that is more self-similar than hour of day (UTC).
>>
> To be clearer, what I suggest above is not solved by what I wrote below, and the below contained an error.
>
> Coping with dates can be really tricky and should not be undertaken before my first cup of coffee.
>
> http://www.postgresql.org/docs/8.2/static/datatype-datetime.html
>
>> So the above becomes a nightly cron job that populates a specialized summary table, something like (untested):
>>
>> drop table device_port_daily_summary;
>> create table device_port_daily_summary as
>> select deviceid, dstport, dayofyear(from_unixtime(tsstart)) as test_day, extract(year from from_unixtime(tsstart)) as test_year, sum(uppkts) as up_p, sum(dwpkts) as down_p, sum(upbytes) as up_bytes, sum(dwbytes) as down_bytes from FLOWS_newformat group by deviceid, dstport, test_day, test_year;
>
> The way described below has a day-range of 0-365, which wasn't what you wanted. Days since the epoch or the start of database time would probably be better than explicitly storing the year... but I don't remember the good way of doing that.
>
> Dates are tricky.
>
>> select deviceid, dstport, dayofyear(from_unixtime(tsstart)) as test_day, sum(uppkts) as up_p, sum(dwpkts) as down_p, sum(upbytes) as up_bytes, sum(dwbytes) as down_bytes from FLOWS_newformat group by deviceid, dstport, test_day;
>
> would be wrong.
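To make Dave's "days since the epoch" idea concrete, here is a minimal sketch of the nightly job, with hypothetical connection details; it assumes tsstart is a unix timestamp in seconds, which the unix_timestamp() comparisons above imply. Bucketing on floor(tsstart / 86400) counts days since 1970-01-01 and sidesteps storing a separate year column entirely:

    #!/usr/bin/env python
    # nightly_summary.py -- rebuild the per-device, per-port daily summary.
    # Sketch only: credentials are hypothetical; the table and column
    # names are the ones from the thread above.
    import MySQLdb

    SUMMARY_SQL = """
        create table device_port_daily_summary as
        select deviceid, dstport,
               floor(tsstart / 86400) as epoch_day,  -- days since 1970-01-01 UTC
               sum(uppkts)  as up_p,     sum(dwpkts)  as down_p,
               sum(upbytes) as up_bytes, sum(dwbytes) as down_bytes
          from FLOWS_newformat
         group by deviceid, dstport, epoch_day
    """

    conn = MySQLdb.connect(host='localhost', user='bismark',
                           passwd='secret', db='bismark')  # hypothetical
    cur = conn.cursor()
    cur.execute("drop table if exists device_port_daily_summary")
    cur.execute(SUMMARY_SQL)
    conn.commit()
    conn.close()

Run nightly from cron, e.g. "0 3 * * * python /home/bismark/nightly_summary.py" (path hypothetical). One caveat: floor(tsstart / 86400) buckets on UTC day boundaries, so capturing day or hour at the location, per the EDT/MDT/PDT point above, would still need a per-router timezone offset applied to tsstart before the division.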