Historic archive of defunct list bismark-devel@lists.bufferbloat.net
From: Nick Feamster <feamster@cc.gatech.edu>
To: Dave Taht <d@taht.net>
Cc: hndr@lists.pixilab.org, bismark-devel@lists.bufferbloat.net,
	abhishekjain95@gmail.com
Subject: Re: [Bismark-devel] [Hndr] Dates are really tricky!!
Date: Mon, 18 Apr 2011 10:11:10 -0400	[thread overview]
Message-ID: <34F36110-1468-4ADC-816E-48D869E76DA2@cc.gatech.edu> (raw)
In-Reply-To: <4DAC329E.3040603@taht.net>

Dave,

Django does allow native SQL, but the cool thing about it is that most queries can be done with Python methods that essentially do introspection on the database tables.  It's pretty slick.  Switching from mysql to postgres (when we are ready) is a one-line change.
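[To make the ORM-vs-raw-SQL point concrete, here is a sketch. The commented Django queryset assumes a hypothetical "Flow" model; those names are illustrative, not the real Bismark schema. The runnable part uses sqlite3 so it stands alone.]

```python
import sqlite3

# Raw-SQL version of a per-port aggregate, runnable here against sqlite.
# In Django the same query might be expressed as a queryset (hypothetical
# "Flow" model -- illustrative names only):
#
#   from django.db.models import Sum
#   Flow.objects.filter(deviceid="NB105").values("dstport") \
#       .annotate(down_bytes=Sum("dwbytes")).order_by("-down_bytes")
#
# and moving from MySQL to Postgres is just the ENGINE line in
# settings.DATABASES.

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flows (deviceid TEXT, dstport INT, dwbytes INT)")
conn.executemany("INSERT INTO flows VALUES (?, ?, ?)",
                 [("NB105", 80, 500), ("NB105", 443, 900),
                  ("NB105", 80, 250), ("other", 80, 9999)])
rows = conn.execute(
    "SELECT dstport, SUM(dwbytes) AS down_bytes FROM flows "
    "WHERE deviceid = 'NB105' GROUP BY dstport ORDER BY down_bytes DESC"
).fetchall()
print(rows)  # → [(443, 900), (80, 750)]
```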

I like the idea of generating summary tables per day.  That would _really_ speed things up, I think, and most people probably are just looking at the most recent queries by default.  Is this in the bug database yet?  I'll be happy to file it. :-)  Who is working on this part?
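[A sketch of what the nightly per-day rollup could look like, using sqlite3 and made-up column names mirroring the thread's FLOWS_newformat; the real schema may differ. Keying on days-since-epoch sidesteps the day-of-year/year split discussed below.]

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE FLOWS_newformat "
             "(deviceid TEXT, dstport INT, tsstart INT, dwbytes INT)")
day = 86400  # seconds per day
conn.executemany(
    "INSERT INTO FLOWS_newformat VALUES (?, ?, ?, ?)",
    [("NB105", 80, 10 * day + 100, 500),
     ("NB105", 80, 10 * day + 200, 300),   # same day, same port -> rolled up
     ("NB105", 80, 11 * day + 50, 700)])   # next day -> separate row

# Nightly rollup: one row per (device, port, day-since-epoch).
conn.execute("DROP TABLE IF EXISTS device_port_daily_summary")
conn.execute(
    "CREATE TABLE device_port_daily_summary AS "
    "SELECT deviceid, dstport, tsstart / 86400 AS epoch_day, "
    "       SUM(dwbytes) AS down_bytes "
    "FROM FLOWS_newformat GROUP BY deviceid, dstport, epoch_day")

rows = conn.execute(
    "SELECT epoch_day, down_bytes FROM device_port_daily_summary "
    "ORDER BY epoch_day").fetchall()
print(rows)  # → [(10, 800), (11, 700)]
```

Dashboards showing "most recent activity" would then read the small summary table instead of scanning the full flow table.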

-Nick


On Apr 18, 2011, at 8:46 AM, Dave Taht wrote:

> On 04/18/2011 06:18 AM, Dave Taht wrote:
>> On 04/17/2011 11:57 PM, Nick Feamster wrote:
>> 
>>> I think we can whip up some quick visualizations of what's in the DB with a nice Web page with some concerted python hacking.  For starters, it would be nice to have a version of the data whereby queries can actually complete in a reasonable time.  :-)  What's the best way to do this?  Split the table by months?
>>> 
>> I did look over one of the slower (30+ seconds!) queries and suggested that a BETWEEN clause be used instead of an AND pair for the date range, but we never got around to testing that, and mysql historically couldn't do anything sane with a BETWEEN range (postgres can).
>> 
>> e.g:
>> 
>> mysql>  select dstport, sum(uppkts) as up_p, sum(dwpkts) as down_p, sum(upbytes) as up_bytes, sum(dwbytes) as down_bytes from FLOWS_newformat where tsstart>unix_timestamp('2011-2-6') and tsstart<unix_timestamp('2011-2-20') and deviceid='NB105' group by dstport order by down_bytes desc limit 20;
>> 
>> MIGHT do better with:
>> 
>> select dstport, sum(uppkts) as up_p, sum(dwpkts) as down_p, sum(upbytes) as up_bytes, sum(dwbytes) as down_bytes from FLOWS_newformat where tsstart between unix_timestamp('2011-2-6') and unix_timestamp('2011-2-20') and deviceid='NB105' group by dstport order by down_bytes desc limit 20;
>> 
>> 
>> I'd not be huge on divvying up tables by month, but creating summary tables by day and by hour would be a good idea. Usage patterns tend to be self-similar across day of week and by hour of day.
>> 
>> I note that it is important to be able to capture hour of day AT the location (EDT/MDT/PDT), as that is more self-similar than hour of day (UTC).
>> 
> To be clearer, what I suggest above is not solved by what I wrote below, and the below contained an error.
> 
> Coping with dates can be really tricky and should not be undertaken before my first cup of coffee.
> 
> http://www.postgresql.org/docs/8.2/static/datatype-datetime.html
> 
>> So the above becomes a nightly cron job in a specialized summary table, something like (untested)
>> 
>> drop table device_port_daily_summary;
>> create table device_port_daily_summary as
>> 
> select deviceid, dstport, DAYOFYEAR(FROM_UNIXTIME(tsstart)) as test_day, YEAR(FROM_UNIXTIME(tsstart)) as test_year, sum(uppkts) as up_p, sum(dwpkts) as down_p, sum(upbytes) as up_bytes, sum(dwbytes) as down_bytes from FLOWS_newformat group by deviceid, dstport, test_day, test_year;
> 
> The way described below keys only on day-of-year (a 0-365 range) without the year, which wasn't what you wanted: the same day in different years collides. Days since the epoch or the start of database time would probably be better than explicitly storing the year... but I don't remember the good way of doing that.
> 
> Dates are tricky.
> 
>> select deviceid, dstport, day(tsstart) as test_day, sum(uppkts) as up_p, sum(dwpkts) as down_p, sum(upbytes) as up_bytes, sum(dwbytes) as down_bytes from FLOWS_newformat group by deviceid, dstport, test_day
> 
> would be wrong.
> 
> _______________________________________________
> HNDR mailing list
> HNDR@lists.pixilab.org
> http://lists.pixilab.org/listinfo.cgi/hndr-pixilab.org
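[One subtlety in the BETWEEN rewrite suggested above: BETWEEN is inclusive on both endpoints, while the original `tsstart > x AND tsstart < y` excludes them, so rows landing exactly on a boundary timestamp differ between the two forms. A small sqlite sketch with made-up data:]

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flows (tsstart INT)")
conn.executemany("INSERT INTO flows VALUES (?)", [(100,), (150,), (200,)])

# Strict inequalities exclude both boundary rows...
strict = conn.execute(
    "SELECT COUNT(*) FROM flows WHERE tsstart > 100 AND tsstart < 200"
).fetchone()[0]
# ...while BETWEEN includes them.
between = conn.execute(
    "SELECT COUNT(*) FROM flows WHERE tsstart BETWEEN 100 AND 200"
).fetchone()[0]
print(strict, between)  # → 1 3
```

For midnight-aligned day boundaries the discrepancy is at most the flows stamped exactly on the boundary second, but it is worth knowing the semantics differ before swapping one form for the other.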

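[On the two date points in the thread -- days since the epoch as a year-proof day key, and hour-of-day at the deployment's location rather than UTC -- a Python sketch. zoneinfo needs Python 3.9+, and the timezone name is just an example:]

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

ts = 1297036800  # 2011-02-07 00:00 UTC, from the period under discussion

# Day key: days since the unix epoch -- unambiguous across years,
# unlike day-of-year (0-365), which collides from one year to the next.
epoch_day = ts // 86400
print(epoch_day)  # → 15012

# Hour of day at the router's location (tz name is illustrative):
utc_hour = datetime.fromtimestamp(ts, tz=timezone.utc).hour
local_hour = datetime.fromtimestamp(ts, tz=ZoneInfo("America/New_York")).hour
print(utc_hour, local_hour)  # UTC midnight is 7pm the previous evening in EST
```

Binning on the local hour keeps the daily usage rhythm (evening peaks, overnight troughs) aligned across deployments in different timezones, which is what makes the hour-of-day buckets self-similar.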


Thread overview: 13+ messages
2011-04-18  5:57 [Bismark-devel] FCC app progress: django Nick Feamster
2011-04-18  7:57 ` [Bismark-devel] [Hndr] " Nick Feamster
2011-04-18 12:18 ` Dave Taht
2011-04-18 12:46   ` [Bismark-devel] Dates are really tricky!! Dave Taht
2011-04-18 13:22     ` [Bismark-devel] [Hndr] " Marshini Chetty
2011-04-18 14:11     ` Nick Feamster [this message]
2011-04-18 14:37       ` Srikanth Sundaresan
2011-04-18 14:39         ` Nick Feamster
2011-04-18 14:42           ` Dave Taht
2011-04-18 14:58           ` Beki Grinter
2011-04-18 15:03             ` Nick Feamster
2011-04-18 15:35               ` Marshini Chetty
2011-04-18 16:02                 ` Keith Edwards
