Requests per second. A server load reference


Just as there are a lot of misconceptions about what Big Data is, there also isn't really a good baseline for knowing "how much is high load", especially from the point of view of people without much experience with servers. If you have some experience dealing with servers, you will probably know all this already. So, just for the sake of convenience, I am going to do some back-of-the-envelope calculations to set a few numbers and explain how to estimate how many requests per second a server can handle.

We are going to use RPS (requests per second) as the metric. This measures throughput, which is typically the most important measure. Other parameters, like latency, can be interesting depending on the application, but in a typical application throughput is the main metric.

Those requests can be pure HTTP requests (getting a URL from a web server), or they can be other kinds of server requests: database queries, fetching mail, bank transactions, etc. The principles are the same.

I/O bound or CPU bound

There are two types of requests: I/O bound and CPU bound.

Everything you do or learn will be imprinted on this disc. This will limit the number of requests you can keep open

Typically, requests are limited by I/O. That means the request fetches information from a database, reads a file, or gets data from the network, and the CPU is doing nothing most of the time. Thanks to the wonders of the operating system, you can create multiple workers that keep serving requests while other workers wait. In this case, the server is limited by the number of workers it can run, which means RAM. More memory, more workers.[1]

In memory-bound systems, the number of RPS can be obtained with the following calculation:

RPS = (memory / worker memory) * (1 / task time)

For example:

Total RAM    Worker memory    Task time    RPS
16 GB        40 MB            100 ms       4,000
16 GB        40 MB            50 ms        8,000
16 GB        400 MB           100 ms       400
16 GB        400 MB           50 ms        800
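To make the formula concrete, here is a minimal sketch in Python that reproduces the table above (the function name is made up, and 16 GB is treated as 16,000 MB to match the rounded figures):

```python
def memory_bound_rps(total_ram_mb, worker_memory_mb, task_time_s):
    # RPS = (memory / worker memory) * (1 / task time)
    workers = total_ram_mb / worker_memory_mb
    return workers / task_time_s

# 16 GB of RAM, 40 MB per worker, 100 ms per task -> 4,000 RPS
print(memory_bound_rps(16000, 40, 0.1))
```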
Crunch those requests!

Some other requests, like image processing or heavy calculations, are CPU bound. That means that the limiting factor is the amount of CPU power the machine has. Having a lot of workers does not help, as only one can run at a time per core. Two cores mean two workers can run at the same time. The limit here is CPU power and number of cores. More cores, more workers.

In CPU-bound systems, the number of RPS can be obtained with the following calculation:

RPS = Num. cores * (1 / task time)

For example:

Num. cores    Task time    RPS
4             10 ms        400
4             100 ms       40
16            10 ms        1,600
16            100 ms       160
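And the same kind of sketch for the CPU-bound case (again, the function name is just illustrative):

```python
def cpu_bound_rps(num_cores, task_time_s):
    # RPS = number of cores * (1 / task time)
    return num_cores / task_time_s

# 4 cores, 10 ms per task -> 400 RPS
print(cpu_bound_rps(4, 0.01))
```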

Of course, those are ideal numbers. Servers need time and memory to run other processes, not only workers. And, of course, there can be errors. But they are good numbers to check and keep in mind.

Calculating the load of a system

If we don't know the load a system is going to face, we'll have to make an educated guess. The most important number is the sustained peak. That means the maximum number of requests that are going to arrive in any given second during a sustained period of time. That's the breaking point of the server.

That can depend a lot on the service, but typically services follow a pattern with ups and downs. During the night the load decreases, and during the day it increases up to a certain point, stays there, and then goes down again. Assuming that we don't have any idea what the load is going to look like, just assume that all the expected requests in a day are going to arrive within 4 hours. Unless the load is very, very spiky, it'll probably be a safe bet.

For example, 1 million requests per day means around 70 RPS at peak. 100 million requests per day mean around 7,000 RPS. A regular server can process a lot of requests during a whole day.

That's assuming that the load can be estimated as a number of requests. Other times it is better to estimate the number of requests a user will generate, and then work from the number of users. E.g. a user will make 5 requests in a session. With 1 million users in 4 hours, that means around 350 RPS at peak. If the same users make 50 requests per session, that's 3,500 RPS at peak.
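As a quick sketch of that estimate (the 4-hour peak window is just the assumption described above, and the function name is made up):

```python
def peak_rps(daily_requests, peak_window_hours=4):
    # Assume all the expected daily requests arrive within the peak window
    return daily_requests / (peak_window_hours * 3600.0)

print(peak_rps(1000000))        # ~70 RPS
print(peak_rps(100000000))      # ~7,000 RPS
print(peak_rps(1000000 * 5))    # 1M users x 5 requests each  -> ~350 RPS
print(peak_rps(1000000 * 50))   # 1M users x 50 requests each -> ~3,500 RPS
```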

HULK CAN HOLD ANY LOAD!

A typical load for a server

These two numbers should only be used as a reference but, in my experience, I have found they are good numbers to have in my head. This is just to get an idea, and everything should be measured. But, just as a rule of thumb:

1,000 RPS is not difficult to achieve on a normal server for a regular service.

2,000 RPS is a decent amount of load for a normal server for a regular service.

More than 2K either needs big servers, lightweight services, non-obvious optimisations, etc. (or it means you're awesome!). Less than 1K seems low for a server doing typical work these days.

Again, these are just my personal "measures", and they depend on a lot of factors, but they are useful to keep in mind when checking whether there's a problem or the servers can be pushed a little more.


1 – Small detail: async systems work a little differently from this, so they can be faster in a purely I/O bound system. That's one of the reasons why new async frameworks seem to be getting traction. They are really good for I/O bound operations. And most operations these days are I/O bound.


How to Make Technology Choices

Truly awesome post by Steven Lott.

The expectation of finality is the most disturbing: the expectation that someone can make OnePerfectFinalDecision.
No technology choice is ever final. Today's greatest ever state-of-the-art, kick-ass-and-take-names SDK may evaporate in a cloud of lawsuits tomorrow. Today's tech giant may collapse. Today's platform of choice may be tomorrow's weird anachronism.
If you are really, really lucky, you may get big enough to have scalability issues. Having a scalability issue is something we all dream about. Until you actually have a specific scalability issue, don't try to "pre-solve" a potential problem you don't yet have. If your software is even moderately well designed, adding architectural layers to increase parallelism is not as painful as supporting obscure edge cases in user stories.
When you’re still circulating your ideas prior to writing a demo, all technology choices are equally good. And equally bad. It’s more important to get started than it is to make some impossibly PerfectFinalDecision. Hence the advice to build early and build often.

Making a tech choice and migrating after you know more is not necessarily a problem. It is at least unavoidable, and probably even good for design.

Vim as IDE. Are you getting the wrong parts?


There is a lot of discussion about how to make Vim "an IDE". Vim is a great text editor, but when we are developing, there are lots of extra tools that are very useful. Code completion. Easy navigation through files and classes. Source control integration. Syntax checking. Navigation that understands the semantics. An integrated debugger.

My problem with IDEs (and I have used a few over the years) is that they give you a lot of support, but at the cost of increased complexity. They are slow for a lot of common operations and they use a lot of resources. Code completion, the good kind that will give you the types of a method, not just finish your words (which most of the time is trivial), is useful when developing in a static language, but since I program in Python it is something I can't find a good use for. I just don't use all that stuff, so I don't feel I can justify paying the price.

Another problem with IDEs is that they tend to be designed, by default, for the newbie. That's not necessarily a bad thing; Vim is a pretty intimidating tool because it is totally crazy for the rookie. But it generates a bloated environment to start with. If you open Eclipse, a popular IDE (and one I've used for some years), you'll get a relatively small frame for the code, and a lot of extra stuff surrounding it. Logs, file navigation, class definition, a huge toolbar, maybe even a TO DO list…

This is a lot of stuff!

For example, think about file navigation. Sure, it's useful to move around. But it is used only at certain points in time. Most of the time, it's just used as an entry point to the code, and then the navigation can be achieved either by just moving around in the same file, by a referral from the code (like going to a method definition), or by searching the whole project. In case you need to go to a specific file, you can then show the navigation window, or even better, search by filename. That only happens during small periods of time, so the rest of the time the window is just wasted space on screen. The same thing happens with a task list. It is useful to know the next step. But not while you're working on one task.

Hey, it is possible to hide the navigation window and display it only at the proper moment, to save space. I have done that. But it's not there by default, so most of the people I know just keep it open "just in case, giving context". They just get used to it, and don't perceive it as a problem. But having half of your screen full of information that is irrelevant 95% of the time is a huge price to pay. And certainly not a good use of an IDE. The good parts of an IDE are things like automatic compilation and deployment, refactoring tools (not just renaming), debugging, help with the types in static languages, automatic generation of code, etc. Not showing everything, all the time.

Mimicking the wrong parts of an IDE

You can do better.

Vim is a text editor, but it is also sort of a philosophy. It is not about showing stuff, but about having stuff available at the right moment, with a quick command. It is not just about using hjkl to move around and having an insert mode. It's about being precise. It is difficult at first, because you can't simply discover what the available options are, but it pays off in terms of focus and a clean environment. When programming, the most important thing is to keep as few things in mind as possible: all the relevant information, which is already enough, and nothing more that can distract you from the task. It is also about doing one thing, and not a hundred. I use other tools for things like source control and debugging. After all, we have the whole OS to work as our IDE.

I use a small number of plugins in Vim. When you learn a little about it, you find out that an amazing number of features can be achieved "out of the box", and how little extra is actually needed to have a really productive environment. It's just that we need to move away from the usual "browse through menus" world that most software uses, and devote some time to, well, study and learn. It's really worth it.

Notifications and emails



Yet another vintage representation of Email

We all know that email, being a technology created a long time ago and developed organically into some sort of lingua franca of Internet identity and communications, has a series of problems. No easy ones. Managing email is a problem of its own, and there are lots of articles about it on the Internet.

One of the most annoying problems is notifications. We all receive too many emails that are only reminders of something relatively interesting in a different app. That could be a new comment on a blog post, an update on LinkedIn, or even a new post on a forum (yep, that used to be a huge thing). GMail's recent move to group together all notification emails is a great example of how inefficient this system is. It is difficult to find the balance between keeping a user informed and not sending spam.

To increase the annoyance, notifications are typically produced in bursts. There is some discussion on a blog, with 4 or 5 messages in an hour; then it stops for several hours, and then someone else posts another comment, producing another couple of comments.

My impression is that any serious app that produces a significant number of notifications (not even very high, something like twice a week or more) and wants to show some respect for its users should move to a notification system. Hey, Facebook has done it. Remember when Facebook used to send tons of mail every day with new likes, friends and posts? They changed that and made a notification system in their page. That means you can always close Facebook, and when coming back, you can easily go through everything since last time.

But, of course, Facebook is a special case, because most people keep it open or at least check it regularly. Most other apps that are not used that frequently need to use email, or no one will check them.

So that's the deal. Send only one email. One saying "You have new stuff on app X. Go to this link to check your new notifications. No new email will be sent until you visit our page." And maybe send another reminder after a week (which can be disabled). This way, if I don't want to go immediately to the page, no more spammy notifications are received. If I'm interested in the app, I'll check every time I get that email, but the email is not spam. It allows a very natural flow. And it also shows respect for your users.
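As a minimal sketch of that policy (all names here are made up for illustration, not taken from any real app):

```python
class NotificationEmailer:
    """One email per user until they visit the page again."""

    def __init__(self, send_email):
        self.send_email = send_email   # callable(user, message)
        self.already_emailed = set()   # users emailed since their last visit

    def on_new_notification(self, user):
        # Only the first notification since the user's last visit triggers an email
        if user not in self.already_emailed:
            self.send_email(user, "You have new stuff on app X. "
                                  "No new email will be sent until you visit the page.")
            self.already_emailed.add(user)

    def on_user_visit(self, user):
        # The user checked the page, so the next notification can email them again
        self.already_emailed.discard(user)
```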

PS: Yes, I know that this is inspired by the way phpBB works, but with a higher-level approach. Not sure why that way of doing things is not more common.

My favourite scientists


There are many figures in science that are fascinating. Not only from the point of view of the importance of their discoveries, but also because of the person behind the figure, as well as their times and common beliefs. For example, Isaac Newton is a complex and fascinating figure. He is without a doubt one of the greatest physicists of all time, but he also had a difficult personality and did things like making an effort to obscure Robert Hooke's research on gravitation.

But among all the scientists through the ages, there are two for whom I have a special kind of appreciation: Johannes Kepler and Richard Feynman. This appreciation is slightly related in both cases.

Beautiful models

Johannes Kepler

Up to the 18th century, only six planets were known, and they had been known since ancient times: Mercury, Venus, Earth, Mars, Jupiter and Saturn. Of course, until heliocentrism, Earth wasn't considered a planet, but in Kepler's time the fact that Earth was a planet just like the others, orbiting around the Sun, was settling in. So Kepler, who was a very religious man and had previously studied theology, convinced himself that there should be a relationship between regular polygons and the different planets. And he worked hard, adjusting his model to work with three-dimensional polyhedra, most significantly the Platonic solids. When ordered properly, the orbits of the planets, represented by spheres, would be inscribed one after the other!

Just imagine what that model would mean for Kepler. You have a relatively simple model that shows that the Solar System has a direct relationship with the five Platonic solids (tetrahedron, cube, octahedron, dodecahedron and icosahedron), which carried, for a long time, a deep philosophical meaning of perfection. And all of it is consistent with the astronomical data that was available at the time! Amazing! The model is beautiful to our eyes, but for Kepler I can only imagine that it had to be also wonderful and spiritual.

But, of course, reality is not what we want it to be. Later, Kepler worked with the astronomer Tycho Brahe, who had the best observatory in the world, and refined all the measurements and calculations. And, guess what? The numbers did not add up. There was a small, but consistent, deviation from the model. And the precious, perfect circles weren't true.

Roll 2D8 to advance to the next planet

Can you imagine how devastating that discovery had to be for Johannes? The model MADE SENSE. The model was PERFECT. And yet the real-world measurements showed that it couldn't be. There have been others, throughout history, who faced this kind of challenge and hid the data, manipulated the numbers, or closed their eyes. Kepler did what he had to do. Be honest and modify his model. He didn't do this with a smile. It was awful and disconcerting, and felt like a failure. Why ellipses? Why would God not design the Universe using circles? It took years to get to that conclusion. But he corrected his model using ellipses.

And that is now known as Kepler’s First Law of planetary movement: “The orbit of every planet is an ellipse with the Sun at one of the two foci.”

And I think that has to be extremely hard, at a very personal level. And I admire him for being honest, not hiding the data, and being able to move on, even if he felt awful about it. I always keep in mind that properly measured real-world data is paramount. And that sweeping your failures under the carpet gets you nowhere.

The model itself, of course, is completely blown away the moment you add an extra planet, but it is still beautiful. I'd love to be able to buy a wooden Kepler model and have it on my desk.

Atypical stereotype

Careful now, I'm making science

I think that Richard Feynman is probably one of the most loved scientists among scientists and technical people, but, strangely, not so much among the general population. And it's a shame, because the stereotype of the scientist is mostly based on Albert Einstein. Well, probably on an already stereotyped version of Einstein. You know, oblivious to mundane things; interested in very complex, indecipherable things; talking almost in riddles; locked up in his lab; long curly grey hair, etc… Don't you laugh when a "scientist" in a movie says something totally cryptic to describe a simple phenomenon?

But Feynman was sort of the opposite of all those common misconceptions. He was an extremely vital, passionate man. He was funny. He found everything surrounding him fascinating. He travelled extensively around the world. He was eloquent, and an excellent teacher and communicator. He played bongos. He went to samba school in Brazil. He used to write physics equations on paper placemats in a topless bar. All of that while winning a Nobel Prize and being one of the best physicists of the 20th century.

He was a truly fascinating man. We are fortunate to have a quite extensive record of interviews and books with his thoughts. I especially like his brief description of the key to science in this video, which is related to the story about Kepler.

He also said (about the Challenger disaster, as he played an important part in the commission that analysed it):

For a successful technology, reality must take precedence over public relations, for nature cannot be fooled.

There are lots of videos on YouTube where Feynman talks about his views on the world. I would recommend watching them; they are very interesting.

Curiously, Einstein is a good example of a scientist who wasn't able to accept reality. He never totally accepted quantum physics. Don't be too hard on him, though; Einstein was still awesome.

Bring in the code

But I also try to keep these premises in mind. As I said before, the world is not the way we like it to be, it’s the way it is.

And I find that it is a good reminder to have when I work. Software development is not usually about science, but about engineering. Related, but not exactly the same thing.

We developers tend to develop love for our precious. Our favourite set of tools, frameworks, programming languages, operating systems, editors, etc… But especially our code, our modules, the ones that we have developed ourselves, as we have put ourselves into them. Most of the time, they are fine. There are a lot of choices that are really a matter of taste.

But we should keep an eye on reality. Measure carefully to be sure when things stop bending and start breaking. Do not lie to yourself with meaningless benchmarks; use the best, most stressful test that you can perform on your system. And when there are problems, and our failure is irrefutable, accept it and be able to move on.

It is not easy. I don’t like being wrong. It hurts. But it is the only honest solution. And I find wisdom in Feynman to know how to proceed and courage in Kepler to accept the results.

Vim speed is not really the point



I am a Vim user. And a Vim fan.

I had been fiddling around with it for some time, you know, just knowing the basics, because it is convenient for making small changes on servers without having to worry about installing anything. Just the basics: insert mode, some search, save, and a couple more things. Then, around two years ago, I decided to give it a try as my main editor, after watching a presentation by a co-worker (my boss, actually) about how to use Vim (he has been using Vim for around 20 years).

At the beginning, it is quite difficult, to be honest. You have to relearn everything about editors. But I was doing OK after one or two weeks, so I kept using it. I was also forcing myself to improve my usage, reading about it and learning tricks…

Then, after a while of using it and reading a lot of instructional material (I cannot recommend "Practical Vim" by Drew Neil strongly enough. It's a FANTASTIC book), everything started to make sense. Let's be serious, the problem with Vim is not exactly that it is "difficult" per se; it's that it is so alien to any other text editing experience that you have to forget everything you know about how to edit text. That's why I don't agree that the Vim learning curve is a myth. Because, by the time we first hear of Vim, we already have 10+ years of using things like Word, NotePad or Gmail to write text. We need time to assimilate how to edit text "the Vim way", and that takes time, as your muscle memory works against you.

And, yes, the Vim way is awesome. But it is awesome not for the reason someone will usually think at the start. There is the perception that Vim is awesome because it is fast. It is not.


GitHub for reviewing code


A couple of weeks ago we started (in my current job) to use GitHub internally for our projects. We were already using git, so it sort of made sense to use GitHub, as it is very widespread and used in the community. I had used GitHub before, but only as a remote repository and to get code, without much interaction with the "GitHub extra features". I must say, I was excited about using it, as I thought that it would be a good step forward in making the code more visible and adding some cool features.

One of the main uses we have for GitHub is code reviews. In DemonWare we peer-review all the code, which really improves its quality. Of course, peer review is different from reviewing the code in an open source situation, as it is done more often, and I suspect that the average number of reviewers is lower in most open source projects. We were using Rietveld, which is a good tool, but probably not stellar when you start using it. The main process was: write code, submit it to Rietveld, ask for reviewers, check and discuss comments, update the code and repeat the process; and wait for approval from the reviewers to submit the code to the remote repo.

What can I say? I am quite disappointed with the result. It seems that the tools for code review in GitHub are not great, to put it lightly. Not great at all.

I guess that my disappointment is big because I thought that, as GitHub is so widely used in the open source community, the review tools would be very good, as open source is all about checking and reviewing code that anyone can send. But the fact is that the review tool itself is pretty limited.


Password Extravaganza: Open discussion about security


In recent times, I've been thinking quite a lot about security on the Internet. And I mean my personal security on the Internet. There have been some recent examples of leaked passwords on some common websites (LinkedIn, I am talking about you!), and I get the impression that the way I was handling passwords in the past is no longer good enough. Luckily, I never had problems, but I thought that I needed to review my habits and take them more seriously.

As with everything that is new, when I opened my first email account (about 15 years ago) and registered on the very first web pages, my security concerns weren't really that important. I started with a relatively strong password (for the time), with more than 6 characters, upper and lower case plus numbers, that I could remember easily. Back in the day, that was strong enough. I then started to use it everywhere. I'll call it "password A" from now on.

After some time, I realised that it wasn't really that good a strategy, so I got another couple of stronger passwords and used them in "sensitive" places, like my email, which is the most important point in the chain, or later Facebook.

So, some time ago, I started to think more and more about this, and started being more conscious of password security and the challenges it presents. I am going to describe my views about passwords and my strategy around them. I am not a security expert, and I think there are a lot of wrong assumptions and myths around passwords. That's why I want to be open about it, and try to make a "call for review" to share tips and see if I am doing something wrong, and to see other ways. So, please, add whatever you feel is interesting.


ffind: a sane replacement for command line file search


I tend to use the UNIX command line A LOT. I find it very comfortable to work with when I am developing, and I follow the "Unix as IDE" way. The command line is really rich, and you could probably learn a new command or parameter each day and still be surprised every day for the rest of your life. But there are some things that stick and get done, probably not in the most efficient way.

In my case, it is using the command `find` to search for files. 95% of the time I use it, it is in this form:

find . -name '*some_text*'

Which means 'find, in this directory and all the subdirectories, a file that contains some_text in its filename'.

It's not that bad, but I also use ack a lot, which I think is absolutely awesome. I think it is a must-know for anyone using the Unix command line. It is a replacement for grep as a tool for searching code, and works the following way (again, in my 90% usage):

ack some_text

Which means 'search all the files that look like code under this directory and its subdirectories for the text some_text' (some_text can be a regex, but usually you can ignore that part).

So, after a couple of tests, I decided to make my own ack-inspired find replacement, and called it ffind. I've been using it for the last couple of days, and it integrates quite well into my workflow (maybe not surprisingly, as I've made it with that in mind).

Basically, it does this:

ffind some_text

Which means 'find, in this directory and all the subdirectories, a file that contains some_text in its filename' (some_text can be a regex). It also has a couple of interesting characteristics: it will ignore hidden directories (starting with a dot), but not hidden files; it will skip directories that the user is not allowed to read due to permissions; and, by default, the output will show the matching text in color.

The other use case is

ffind /dir some_text

Which means 'find, in the directory /dir and all its subdirectories, a file that contains some_text in its filename'.

There are a couple more params, but they are there to deal with special cases.
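For illustration, here is a rough sketch of the core behaviour described above. This is not the actual ffind code, just a minimal approximation: walk the tree, skip hidden directories, and match the filename against a regex.

```python
import os
import re
import sys

def ffind_sketch(pattern, root='.'):
    regex = re.compile(pattern)
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip hidden directories (starting with a dot), but not hidden files.
        # Directories the user cannot read are silently skipped by os.walk.
        dirnames[:] = [d for d in dirnames if not d.startswith('.')]
        for name in filenames:
            if regex.search(name):
                print(os.path.join(dirpath, name))

if __name__ == '__main__':
    # Usage: ffind_sketch.py [dir] some_text
    args = sys.argv[1:]
    ffind_sketch(args[-1], args[0] if len(args) > 1 else '.')
```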

It is written in Python, and it is available on GitHub. So, if any of this sounds interesting, go there and feel free to use it! Or change it! Or make suggestions!

ffind in GitHub

UPDATE: ffind is now available on PyPI.

Magical thinking in Software Development


I guess all of us Python developers have heard this kind of argument from time to time:

Python is slower than C++/Java/C# because it is not compiled.

Other than the usual "blame the others" when working with other companies (usually big corporations that think that using anything except C# or Java is laughable), you can also see a lot of comments in technical blogs or places like Hacker News or Reddit with similar, simplistic arguments. You can recognise them in the usual rants about how technology X is The Worst Thing That Ever Happened™ and Should Never Be Used™.

That's a form of Software Development Magical Thinking. This can be really harmful for software development, especially when the opposite, positive form is used. Let me define Software Development Magical Thinking in this context:

Software Development Magical Thinking noun Assuming that a technology will magically avoid a complex problem just by itself.

Probably that will become clearer after a couple of examples:

Java is a static type language and it is safer than dynamic type languages like Ruby.

We program in C++ so our code is very fast.

MongoDB / NodeJS / Riak is web-scale.

Please note that those are not completely, utterly wrong statements. C++ can be very fast. Statically typed languages can avoid some bugs related to input parameter types. But there is no guarantee that creating a system in C++ is going to act like a magic wand against slow code. Or that Erlang will avoid having a single point of failure. And you'll get just as sick of bugs and security issues in statically typed languages as in dynamically typed ones.*

Those are all complex problems that need careful design, and possibly measurements, to deal with them. Deep analysis of the problem, which is usually more complicated than it looks at first. Or, even worse, the problem is not as bad as it looked and the designed system ends up more complex than it should be, trying to catch a problem that never arises. Not to mention needing previous experience to avoid subtle errors.

Let me say it again. There are problems that are HARD. In software systems they are confronted almost daily. And no single thing will make you forget them. Even using a very good tool for what you're doing (like Erlang for concurrency), which usually implies paying a price (in development time, etc.), doesn't replace vigilance, and issues can eventually appear. Unfortunately, making software is tough.

The problem with Software Development Magical Thinking is that it is very easy and very natural. Seductive. We know that "general Magical Thinking", simple solutions to very complex problems, is quite common. Hey, a lot of the time it even seems to work, because the Feared Problem will only present itself after a certain size that is never attained, or after the designer leaves the company, leaving a latent problem behind. Most of the time, making a totally informed decision is unrealistic, or simply not possible, and some risks must be taken.

But as software developers we should know that things are not that easy, even if we have to compromise. Each bug that takes time spent methodically eliminating causes. Every measurement that makes you wonder what the best metric to reflect a value is. Every time you realise that a back-of-the-envelope calculation shows something that will have an impact on some design aspect. Those are all reminders that there are no silver bullets, and that we shouldn't take all those difficult problems lightly.

Make decisions. Design systems. Choose a tool over others. Take risks. But don't be delusional and careless. Be conscious that software can bite you back. Be vigilant. Be skeptical. Avoid Magical Thinking.

PS: And please, don't say "Python is slow". Just don't. It is not, for most jobs. It is not going to make you win a discussion unless you carefully measure and prove it. And, perhaps most importantly, it raises my urge to kill.

* No, I am not going to comment on the Mythical Web Scale property.

EDIT: Wow, it has been submitted to Hacker News here. Just in case anyone wants to add to the discussion there.