From: Dave Taht <d@taht.net>
To: hndr@lists.pixilab.org, bismark-devel@lists.bufferbloat.net,
abhishekjain95@gmail.com
Subject: Re: [Bismark-devel] [Hndr] FCC app progress: django
Date: Mon, 18 Apr 2011 06:18:11 -0600
Message-ID: <4DAC2C03.2060502@taht.net>
In-Reply-To: <27196F99-CB9E-4963-8AA9-71A66B567B69@cc.gatech.edu>
On 04/17/2011 11:57 PM, Nick Feamster wrote:
> (sorry for list cross-posting)
>
> Srikanth, Abhishek, Walter,
Taking Walter and Sri off the explicit CC list, as both are already on
bismark-devel.
> Small, slow progress on the FCC app---
>
> I have written a very simple server with django. I've got the django server hooked up to our bismark mysql database. It's basically a slick python wrapper for doing Web development with a database back-end. The setup is ugly right now, but it's a proof of concept, and it works (django is talking to our back-end). For example:
> http://networkdashboard.org:8000/summary/
>
The way we have configured the web servers here has let us survive
several slashdottings:

http://www.bufferbloat.net/projects/bloat/wiki/Dogfood_Principle

Using FastCGI rather than WSGI would probably buy some security
benefits and more predictable behavior, though it's arguably a tossup:

http://docs.djangoproject.com/en/dev/howto/deployment/fastcgi/
> Django has templates that will allow us to make this slick and pretty, but it's getting a little late for that tonight.
>
> I think we can whip up some quick visualizations of what's in the DB with a nice Web page with some concerted python hacking. For starters, it would be nice to have a version of the data whereby queries can actually complete in a reasonable time. :-) What's the best way to do this? Split the table by months?
>
Your mysql configuration was the out-of-the-box default, which is
tuned for very small databases; yours is quite large. Increasing
several of the buffer sizes by orders of magnitude would help a lot.
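As a first check, something like this shows what you are working with
now; the values in the comments are illustrative guesses for a
multi-GB dataset, not measured recommendations:

mysql> show variables like '%buffer%';
-- For InnoDB tables the main knob is innodb_buffer_pool_size; for
-- MyISAM it is key_buffer_size. Something on the order of
--   innodb_buffer_pool_size = 2G
--   key_buffer_size         = 512M
-- in my.cnf (followed by a restart) would be a starting point.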
Sri had made significant inroads into porting the database to postgres
when I was there. Postgres is not only more performant and flexible
than mysql, with a more capable query planner, but it also has good
analysis tools. My hope was to benchmark some of your slowest existing
queries with postgres's EXPLAIN ANALYZE to see how the design could be
improved overall.
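For example, once the flows table is over there (assuming the table
and column names survive the port unchanged), something along these
lines would show where the time actually goes:

EXPLAIN ANALYZE
SELECT dstport, sum(dwbytes) AS down_bytes
FROM flows_newformat
WHERE tsstart BETWEEN extract(epoch from date '2011-02-06')
                  AND extract(epoch from date '2011-02-20')
  AND deviceid = 'NB105'
GROUP BY dstport
ORDER BY down_bytes DESC
LIMIT 20;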
I did look over one of the slower (30+ seconds!) queries and suggested
using a BETWEEN clause instead of a pair of ANDed comparisons for the
date range, but we never got around to testing that, and mysql
historically could not do anything sane with a BETWEEN range
(postgres can).
e.g.:

mysql> select dstport, sum(uppkts) as up_p, sum(dwpkts) as down_p,
              sum(upbytes) as up_bytes, sum(dwbytes) as down_bytes
       from FLOWS_newformat
       where tsstart > unix_timestamp('2011-2-6')
         and tsstart < unix_timestamp('2011-2-20')
         and deviceid = 'NB105'
       group by dstport order by down_bytes desc limit 20;

MIGHT do better with:

select dstport, sum(uppkts) as up_p, sum(dwpkts) as down_p,
       sum(upbytes) as up_bytes, sum(dwbytes) as down_bytes
from FLOWS_newformat
where tsstart between unix_timestamp('2011-2-6')
                  and unix_timestamp('2011-2-20')
  and deviceid = 'NB105'
group by dstport order by down_bytes desc limit 20;
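Either way, the biggest win is probably an index that matches the
where clause. Assuming one doesn't already exist (I didn't check the
live schema, and the index name below is made up):

create index flows_device_ts on FLOWS_newformat (deviceid, tsstart);

With that, either form of the query can walk just the NB105 rows in
the date range instead of scanning the whole table.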
I'd not be huge on divvying up tables by month, but creating summary
tables by day and by hour would be a good idea. Usage patterns tend to
be self-similar across day of week and by hour of day.
I note that it is important to capture hour of day AT the location
(EDT/MDT/PDT), as that is more self-similar than hour of day in UTC.
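In mysql that means convert_tz(), which needs the timezone tables
loaded (mysql_tzinfo_to_sql) and, as sketched here, assumes the server
stores timestamps as UTC; the zone name would really come from
per-device location data we don't have a column for yet. A
hypothetical per-hour grouping key:

select hour(convert_tz(from_unixtime(tsstart), 'UTC', 'US/Eastern'))
       as local_hour, ...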
So the port query above becomes a nightly cron job that populates a
specialized summary table, something like (untested):
drop table if exists device_port_daily_summary;
create table device_port_daily_summary as
select deviceid, dstport, date(from_unixtime(tsstart)) as test_day,
       sum(uppkts) as up_p, sum(dwpkts) as down_p,
       sum(upbytes) as up_bytes, sum(dwbytes) as down_bytes
from FLOWS_newformat
group by deviceid, dstport, test_day;
(Although it would be mildly better to do this incrementally, creating
a primary key on the first three fields, and perhaps reversing the
order of those fields; it depends on the statistics. A sketch follows.)
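Untested, and guessing at the column types, the incremental version
would look something like:

create table device_port_daily_summary (
    deviceid   varchar(32) not null,
    dstport    int not null,
    test_day   date not null,
    up_p       bigint,
    down_p     bigint,
    up_bytes   bigint,
    down_bytes bigint,
    primary key (deviceid, dstport, test_day)
);

-- nightly, fold in only yesterday's flows:
insert into device_port_daily_summary
select deviceid, dstport, date(from_unixtime(tsstart)) as test_day,
       sum(uppkts), sum(dwpkts), sum(upbytes), sum(dwbytes)
from FLOWS_newformat
where tsstart >= unix_timestamp(curdate() - interval 1 day)
  and tsstart <  unix_timestamp(curdate())
group by deviceid, dstport, test_day;

That keeps the nightly pass proportional to one day's traffic rather
than to the whole table (a rerun on the same day would need
on-duplicate-key handling, omitted here).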
> I will check what I have into the bismark source tree so that others can work on this. (I'm looking at Abhishek :-)
>
While not exactly allergic to svn, I am a git fan-boy. Even when working
with svn these days I use git-svn to cope with it (offline git logs
rule). I'm not asking you to change your scm backend, just sayin...