Historic archive of defunct list bismark-devel@lists.bufferbloat.net
 help / color / mirror / Atom feed
From: Dave Taht <d@taht.net>
To: hndr@lists.pixilab.org, bismark-devel@lists.bufferbloat.net,
	 abhishekjain95@gmail.com
Subject: Re: [Bismark-devel] [Hndr] FCC app progress: django
Date: Mon, 18 Apr 2011 06:18:11 -0600	[thread overview]
Message-ID: <4DAC2C03.2060502@taht.net> (raw)
In-Reply-To: <27196F99-CB9E-4963-8AA9-71A66B567B69@cc.gatech.edu>

On 04/17/2011 11:57 PM, Nick Feamster wrote:
> (sorry for list cross-posting)
>
> Srikanth, Abhishek, Walter,
Taking walter and sri off the explicit cc list as both are already on 
bismark-devel.

> Small, slow progress on the FCC app---
>
> I have a written a very simple server with django.  I've got the django server hooked up to our bismark mysql database.  It's basically a slick python wrapper for doing Web development with a database back-end.   The setup is ugly right now, but it's a proof of concept, and it works (django is talking to our back-end).  For example:
> http://networkdashboard.org:8000/summary/
>
How we have configured the web servers here has had us survive several 
slashdottings.

http://www.bufferbloat.net/projects/bloat/wiki/Dogfood_Principle

Using fast cgi vs wsgi would probably lead to some security benefits and 
more predictable behavior, however it's a tossup.

http://docs.djangoproject.com/en/dev/howto/deployment/fastcgi/

> Django has templates that will allow us to make this slick and pretty, but it's getting a little late for that tonight.
>
> I think we can whip up some quick visualizations of what's in the DB with a nice Web page with some concerted python hacking.  For starters, it would be nice to have a version of the data whereby queries can actually complete in a reasonable time.  :-)  What's the best way to do this?  Split the table by months?
>

Your mysql configuration was an out of the box default one designed for 
very small databases. Yours is quite large. Increasing several buffer 
sizes by several orders of magnitude would help a lot.

Sri had made significant inroads into porting the database into postgres 
when I was there. Postgres is not only more highly performant and 
flexible than mysql, with a more capable parser, but has good analysis 
tools. It was my hope to benchmark some of your slowest existing queries 
using postgres's ANALYZE <query> tool to see how the design could be 
improved overall.

I did look over one of the slower (30+ seconds!) queries and suggested 
that a between statement be used instead of an and for the date range, 
but we never got around to testing that and mysql used to not be able to 
do anything sane with a between range (postgres can).

e.g:

mysql>  select dstport, sum(uppkts) as up_p, sum(dwpkts) as down_p, sum(upbytes) as up_bytes, sum(dwbytes) as down_bytes from FLOWS_newformat where tsstart>unix_timestamp('2011-2-6') and tsstart<unix_timestamp('2011-2-20') and deviceid='NB105' group by dstport order by down_bytes desc limit 20;

MIGHT do better with:

select dstport, sum(uppkts) as up_p, sum(dwpkts) as down_p, sum(upbytes) as up_bytes, sum(dwbytes) as down_bytes from FLOWS_newformat where tsstart between unix_timestamp('2011-2-6') and unix_timestamp('2011-2-20') and deviceid='NB105' group by dstport order by down_bytes desc limit 20;


I'd not be huge on divvying up tables by month, but creating summary 
tables by day and by hour would be a good idea. Usage patterns tend to 
be self-similar across day of week and by hour of day.

I note that it is important to be able to capture hour of day AT the 
location (EDT/MDT/PDT), as that is more self-similar that hour of day (UTC)

So the above becomes a nightly cron job in a specialized summary table, 
something like (untested)

drop table device_port_daily_summary;
create table device_port_daily_summary as

select deviceid, dstport, day(tsstart) as test_day, sum(uppkts) as up_p, sum(dwpkts) as down_p, sum(upbytes) as up_bytes, sum(dwbytes) as down_bytes from FLOWS_newformat group by deviceid, dstport, test_day


(although it would be mildly better to do this incrementally, creating a 
primary key on the first 3 fields, and perhaps reversing the order of 
primary fields above. Depends on the statistics)

> I will check what I have into the bismark source tree so that others can work on this.  (I'm looking at Abhishek :-)
>
While not exactly allergic to svn, I am a git fan-boy. Even when working 
with svn these days I use git-svn to cope with it (offline git logs 
rule). I'm not asking you to change your scm backend, just sayin...



  parent reply	other threads:[~2011-04-18 12:18 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-04-18  5:57 [Bismark-devel] " Nick Feamster
2011-04-18  7:57 ` [Bismark-devel] [Hndr] " Nick Feamster
2011-04-18 12:18 ` Dave Taht [this message]
2011-04-18 12:46   ` [Bismark-devel] Dates are really tricky!! Dave Taht
2011-04-18 13:22     ` [Bismark-devel] [Hndr] " Marshini Chetty
2011-04-18 14:11     ` Nick Feamster
2011-04-18 14:37       ` Srikanth Sundaresan
2011-04-18 14:39         ` Nick Feamster
2011-04-18 14:42           ` Dave Taht
2011-04-18 14:58           ` Beki Grinter
2011-04-18 15:03             ` Nick Feamster
2011-04-18 15:35               ` Marshini Chetty
2011-04-18 16:02                 ` Keith Edwards

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4DAC2C03.2060502@taht.net \
    --to=d@taht.net \
    --cc=abhishekjain95@gmail.com \
    --cc=bismark-devel@lists.bufferbloat.net \
    --cc=hndr@lists.pixilab.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox