New servers and a new IT team...

lamont:
My only counter-point is that Oracle-done-right is generally expensive, while MySQL+Linux/FreeBSD is the cost of the hardware and the expertise... You don't take a $20B company and put its most important financial data onto MySQL, but it should work fine as an architecture for scubaboard...

Oh, actually another thing is that one cheap way to beat Oracle is to use statically chewed data and serve it out of Apache.... you can hit 10,000+ transactions per second on $3k of hardware... Of course you can get the best of both worlds by using this as a read-only cache sitting in front of Oracle... The best solution is generally not going to be one or the other but a mixture of both... So when you say this:

We are talking about data volumes for the apps I work on that are staggering, and the amount of hardware we run on is in the 10's of millions of dollars. Imagine pulling EVERY switch in a global network, and the samples are taken every five minutes.

In fact we do front end our Oracle DBs with hashing... WAY too much hashing. We just don't put enough time into our design to make it easy to maintain or grow gracefully, which is a HUGE issue. Imagine trying to add 1000 square feet of space to a store that resides on the ground floor of a New York City building where the adjacent property is NOT available on any side. This is kinda what it's like attempting to add to our software in many cases, and believe you me, after we are done cramming the 1000 square feet of STUFF into an already crowded showroom, things suffer.

Cache and hashing solutions work. What do you think is in the guts of Oracle? But overuse of this amounts to so much complex in-house code that it becomes a house of cards, and visibility into what is going on at a low level is next to impossible. Visibility and user access to provide for the changing reporting requirements is also a real issue. Restart and recovery suffer, and as we implement more and more of this type of solution, we start to look more and more like a DB, with all the complexity, and none of the tools or expertise.
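
As a rough illustration of the read-only cache in front of a database discussed above, here is a minimal Python sketch. Everything in it is hypothetical (the `ReadThroughCache` class, the `slow_database_lookup` stand-in, the 300-second expiry); it is not what ScubaBoard, vBulletin, or any Oracle front end actually runs, just the general shape of the idea.

```python
import time
from typing import Callable, Dict, Tuple


class ReadThroughCache:
    """Read-only cache that sits in front of a slower backing store."""

    def __init__(self, load: Callable[[str], str], ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._load = load            # function that queries the real database
        self._ttl = ttl_seconds      # how long a cached value stays valid
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> str:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            self.hits += 1           # served from memory, database untouched
            return entry[1]
        self.misses += 1             # missing or expired: go to the database
        value = self._load(key)
        self._entries[key] = (now, value)
        return value


def slow_database_lookup(key: str) -> str:
    # Hypothetical stand-in for an expensive query against the real database.
    return f"row for {key}"


cache = ReadThroughCache(slow_database_lookup, ttl_seconds=300.0)
first = cache.get("thread:42")   # miss: loads from the backing store
second = cache.get("thread:42")  # hit: served from the cache
```

Even this toy version needs expiry and hit/miss counters, which hints at the point above: every feature added to in-house caching code (invalidation, recovery after restart, reporting) moves it a step closer to being a database without the tooling.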

For Scubaboard I agree there are a LOT of viable solutions. Unfortunately I'm not sure they have the manpower to retool. I'm also no expert on what is available in canned solutions that may hook directly into the existing DB. Unfortunately, like so many software solutions, once you realize that the direction you have chosen is not working, it's kinda like trying to build a plane while flying it! :banghead:

It would appear the SB will become more stable as they fine tune the new hardware. However, throwing only hardware and patches at a problem is generally NOT going to buy long term stability or scalability.
 
RonFrank:
We are talking about data volumes for the apps I work on that are staggering, and the amount of hardware we run on is in the 10's of millions of dollars. Imagine pulling EVERY switch in a global network, and the samples are taken every five minutes.

Well, we pull timeseries data from 30,000 servers with a 1 minute polling interval and take ~100 datapoints/machine. With a retention time of 2 weeks we just keep it in a memory cache on a fleet of approx. 100 machines. Working set is around 400GB of data (all in RAM). We don't need SQL access for timeseries data (there's nothing much 'relational' about the dataset) and we can do that on ~$300k of hardware (technically I think it's closer to $150k/year of hardware including datacenter costs). Oracle would be expensive and probably slow (or else exorbitantly expensive), and corruption or dataloss is not catastrophic. This system is actually far more available than the Oracle DBs and takes far less exotic knowledge of O/S internals to debug (using a big cache pinned in memory removes all the VM pathology that you need to worry about if your app talks to the disk).
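
For anyone who wants to sanity-check those figures, here is a back-of-the-envelope calculation in Python. It assumes decimal gigabytes, that every sample is kept at full resolution for the whole two weeks, and that the data is spread evenly across the fleet; none of that is stated above, so treat the per-sample and per-machine numbers as rough estimates only.

```python
# Figures quoted in the post above.
servers = 30_000
datapoints_per_server = 100
polls_per_minute = 1
retention_minutes = 14 * 24 * 60          # two weeks = 20,160 minutes
machines_in_cache_fleet = 100
working_set_bytes = 400 * 10**9           # assumption: 400 GB means decimal GB

# Ingest rate and total number of samples held in memory.
samples_per_minute = servers * datapoints_per_server * polls_per_minute  # 3,000,000
total_samples = samples_per_minute * retention_minutes                   # 60,480,000,000

# Implied storage cost per sample and per cache machine.
bytes_per_sample = working_set_bytes / total_samples                     # about 6.6 bytes
gb_per_machine = working_set_bytes / machines_in_cache_fleet / 10**9     # 4.0 GB

print(f"{samples_per_minute:,} samples/minute")
print(f"{total_samples:,} samples retained")
print(f"{bytes_per_sample:.2f} bytes per sample")
print(f"{gb_per_machine:.1f} GB per machine")
```

Roughly 6 to 7 bytes per sample and about 4 GB per box is consistent with the claim that the whole working set fits in RAM on commodity machines.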

Oracle is also not a silver bullet. If you run enough Oracle DBs with enough different workloads you'll wind up finding pathological interactions between Oracle and the O/S (particularly the memory management layer) sooner or later -- even if you're not trying to do anything particularly obscene. At the highest levels, tuning both Oracle and the underlying O/S is something that requires a lot of expertise (and commands an extremely large salary), which gets folded into the enterprise cost of the whole platform.

It would appear the SB will become more stable as they fine tune the new hardware. However, throwing only hardware at a problem is generally NOT going to buy long term stability.

Yup, crashing every day under load is not an indication of a hardware issue unless you're just swapping or running at 100% CPU all the time. Something else has to be going on. I expect the vBulletin c0d3rz are not all that good...
 
lamont:
Well, we pull timeseries data from 30,000 servers with a 1 minute polling interval and take ~100 datapoints/machine. With a retention time of 2 weeks we just keep it in a memory cache on a fleet of approx. 100 machines. Working set is around 400GB of data (all in RAM).

Oracle is also not a silver bullet. If you run enough Oracle DBs with enough different workloads you'll wind up finding pathological interactions between Oracle and the O/S (particularly the memory management layer) sooner or later -- even if you're not trying to do anything particularly obscene. At the highest levels, tuning both Oracle and the underlying O/S is something that requires a lot of expertise (and commands an extremely large salary), which gets folded into the enterprise cost of the whole platform.

Last time I looked, our VPN network had around 200,000 nodes. However, we must aggregate and store at the tunnel level, which involves node relationships and is in the area of 1,000,000 tunnels, hence the relational model.

I agree that tuning and expertise are required. The UNIX world I work in has become more mainframe-like at every turn. It used to be that the DBA, Developer, and SA were part of one team under one manager. Now, with the political nature and firing of staff in lieu of management, our environment is one where DBAs, Prod support, Security, Applications Developers, Testers, etc., all report to different managers.

Put six managers on a call (we don't even occupy the same office space) and all you get is more procedures, finger pointing, and an endless chain of authorizations and emails, so that it takes an act of God to get a five minute task completed, like opening a filter to a dedicated IP address. This creates an environment that does not lend itself to thoughtful system and application design.


lamont:
Yup, crashing every day under load is not an indication of a hardware issue unless you're just swapping or running at 100% CPU all the time. Something else has to be going on. I expect the vBulletin c0d3rz are not all that good...

I agree with this.
 
lamont:
The issue is going to be how vBulletin uses MySQL...
Lamont,

This is the crux of what we have been dealing with. We actually ran into this wall this past February. Heads were scratched, and it was determined that the best track was to develop two new mondo servers, get away from FreeBSD and hire a DB engineer. Unfortunately, the web server had serious hardware issues, some of which were never resolved. It's impossible to make headway with software development when the hardware crashes every other hour.

Now we are on a single stable mondo server. FloridaCaveDiver, who has a PhD in CIS, has resolved most of the vBulletin/MySQL issues on this single box. This weekend, we will be doubling the memory on this server and adding in the data server as well. So we will double the processors and quintuple the memory running ScubaBoard. Hopefully, this will resolve the remaining issues, which seem to mostly relate to memory overload and swapping.

BTW, we were never able to ssh into the planet servers when the board froze. We had to continually reboot the whole shooting match. Now, sshing in to resolve a stuck Apache or MySQL process is the norm. The server never seems to crash. It's great to have only ONE SET of problems to deal with (software) instead of both software and hardware issues at the same time. Life is good.
 
"Listening in" on your conversation brings tears to my eyes as I remember the days when I actually did something tangible as a software engineer. Now all we do where I work is implement software others build, produce ridiculous amounts of documentation to make the FDA happy ... oh, and attend meetings and read (and occasionally respond to) email.
 
I agree Oracle is not the best solution for every problem. MySQL is fine for this type of application. Now if the picture gallery would just come back up....


Mike
 
Divin'Hoosier:
"Listening in" on your conversation brings tears to my eyes as I remember the days when I actually did something tangible as a software engineer. Now all we do where I work is implement software others build, produce ridiculous amounts of documentation to make the FDA happy ... oh, and attend meetings and read (and occasionally respond to) email.

You should try consulting. I've never been happier.

When you're dealing directly with the customer, and they're paying actual money for everything you produce, the busy-work tends to be quite limited, and the actual work is really interesting.

Terry
 
