From: Nick Feamster
Date: Mon, 18 Apr 2011 10:11:10 -0400
To: Dave Taht
Cc: hndr@lists.pixilab.org, bismark-devel@lists.bufferbloat.net, abhishekjain95@gmail.com
Subject: Re: [Bismark-devel] [Hndr] Dates are really tricky!!

Dave,

Django does allow native SQL, but the cool thing about it is that most queries can be done with Python methods that essentially do introspection on the database tables. It's pretty slick. Switching from mysql to postgres (when we are ready) is a one-line change.

I like the idea of generating summary tables per day. That would _really_ speed things up, I think, and most people are probably just looking at the most recent queries by default. Is this in the bug database yet? I'll be happy to file it. :-) Who is working on this part?
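For anyone who has not played with the ORM yet, here is a minimal sketch of both points; the model, app, and credential names are hypothetical, while the table and column names are the real ones from Dave's queries below:

    # settings.py -- the mysql -> postgres switch is just the ENGINE line
    DATABASES = {
        'default': {
            'ENGINE': 'django.db.backends.mysql',  # or 'django.db.backends.postgresql_psycopg2'
            'NAME': 'bismark',
            'USER': 'bismark',
            'PASSWORD': 'secret',  # hypothetical credentials
            'HOST': 'localhost',
        }
    }

    # flows/models.py -- a hypothetical model wrapped around the existing table
    from django.db import models

    class Flow(models.Model):
        deviceid = models.CharField(max_length=32)
        dstport = models.IntegerField()
        tsstart = models.IntegerField()  # unix timestamp, per the queries below
        uppkts = models.IntegerField()
        dwpkts = models.IntegerField()
        upbytes = models.BigIntegerField()
        dwbytes = models.BigIntegerField()

        class Meta:
            db_table = 'FLOWS_newformat'

    # Dave's top-20-ports query below, expressed as ORM methods
    import calendar, datetime
    from django.db.models import Sum

    start_ts = calendar.timegm(datetime.date(2011, 2, 6).timetuple())
    end_ts = calendar.timegm(datetime.date(2011, 2, 20).timetuple())

    top_ports = (Flow.objects
                 .filter(deviceid='NB105', tsstart__range=(start_ts, end_ts))
                 .values('dstport')  # becomes the group by
                 .annotate(up_p=Sum('uppkts'), down_p=Sum('dwpkts'),
                           up_bytes=Sum('upbytes'), down_bytes=Sum('dwbytes'))
                 .order_by('-down_bytes')[:20])

The values()/annotate() pair is what generates the group by, and tsstart__range maps to a between clause, which is the form Dave suggests below.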
-Nick

On Apr 18, 2011, at 8:46 AM, Dave Taht wrote:

> On 04/18/2011 06:18 AM, Dave Taht wrote:
>> On 04/17/2011 11:57 PM, Nick Feamster wrote:
>>
>>> I think we can whip up some quick visualizations of what's in the DB with a nice Web page and some concerted python hacking. For starters, it would be nice to have a version of the data whereby queries can actually complete in a reasonable time. :-) What's the best way to do this? Split the table by months?
>>>
>> I did look over one of the slower (30+ seconds!) queries and suggested using a between clause for the date range instead of a pair of and-ed comparisons, but we never got around to testing that, and mysql historically has not been able to do anything sane with a between range (postgres can).
>>
>> e.g.:
>>
>> mysql> select dstport, sum(uppkts) as up_p, sum(dwpkts) as down_p, sum(upbytes) as up_bytes, sum(dwbytes) as down_bytes from FLOWS_newformat where tsstart > unix_timestamp('2011-2-6') and tsstart < unix_timestamp('2011-2-20') and deviceid='NB105' group by dstport order by down_bytes desc limit 20;
>>
>> MIGHT do better with:
>>
>> select dstport, sum(uppkts) as up_p, sum(dwpkts) as down_p, sum(upbytes) as up_bytes, sum(dwbytes) as down_bytes from FLOWS_newformat where tsstart between unix_timestamp('2011-2-6') and unix_timestamp('2011-2-20') and deviceid='NB105' group by dstport order by down_bytes desc limit 20;
>>
>> I'd not be huge on divvying up tables by month, but creating summary tables by day and by hour would be a good idea. Usage patterns tend to be self-similar across day of week and by hour of day.
>>
>> I note that it is important to be able to capture hour of day AT the location (EDT/MDT/PDT), as that is more self-similar than hour of day (UTC).
>>
> To be clearer, what I suggest above is not solved by what I wrote below, and the below contained an error.
>
> Coping with dates can be really tricky and should not be undertaken before my first cup of coffee.
>
> http://www.postgresql.org/docs/8.2/static/datatype-datetime.html
>
>> So the above becomes a nightly cron job that populates a specialized summary table, something like (untested):
>>
>> drop table device_port_daily_summary;
>> create table device_port_daily_summary as
>> select deviceid, dstport, dayofyear(from_unixtime(tsstart)) as test_day, extract(year from from_unixtime(tsstart)) as test_year, sum(uppkts) as up_p, sum(dwpkts) as down_p, sum(upbytes) as up_bytes, sum(dwbytes) as down_bytes from FLOWS_newformat group by deviceid, dstport, test_day, test_year;
>
> The way described below has a day-range of 0-365, which wasn't what you wanted. Days since the epoch or the start of database time would probably be better than explicitly storing the year... but I don't remember the good way of doing that.
>
> Dates are tricky.
>
>> select deviceid, dstport, dayofyear(from_unixtime(tsstart)) as test_day, sum(uppkts) as up_p, sum(dwpkts) as down_p, sum(upbytes) as up_bytes, sum(dwbytes) as down_bytes from FLOWS_newformat group by deviceid, dstport, test_day;
>
> would be wrong.
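To make Dave's "days since the epoch" idea concrete, here is a minimal sketch of the nightly job, with hypothetical connection details; it assumes tsstart is a unix timestamp in seconds, which the unix_timestamp() comparisons above imply. Bucketing on floor(tsstart / 86400) counts days since 1970-01-01 and sidesteps storing a separate year column entirely:

    #!/usr/bin/env python
    # nightly_summary.py -- rebuild the per-device, per-port daily summary.
    # Sketch only: credentials are hypothetical; the table and column
    # names are the ones from the thread above.
    import MySQLdb

    SUMMARY_SQL = """
        create table device_port_daily_summary as
        select deviceid, dstport,
               floor(tsstart / 86400) as epoch_day,  -- days since 1970-01-01 UTC
               sum(uppkts)  as up_p,     sum(dwpkts)  as down_p,
               sum(upbytes) as up_bytes, sum(dwbytes) as down_bytes
          from FLOWS_newformat
         group by deviceid, dstport, epoch_day
    """

    conn = MySQLdb.connect(host='localhost', user='bismark',
                           passwd='secret', db='bismark')  # hypothetical
    cur = conn.cursor()
    cur.execute("drop table if exists device_port_daily_summary")
    cur.execute(SUMMARY_SQL)
    conn.commit()
    conn.close()

Run nightly from cron, e.g. "0 3 * * * python /home/bismark/nightly_summary.py" (path hypothetical). One caveat: floor(tsstart / 86400) buckets on UTC day boundaries, so capturing day or hour at the location, per the EDT/MDT/PDT point above, would still need a per-router timezone offset applied to tsstart before the division.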