Posted on October 19, 2010 by Jaime Buelta

Planes de desarrollo profesional

Llevo un tiempo pensando en escribir sobre algunas de las posiciones que tengo al respecto de mi empleo y mi manera de abordar mi carrera profesional.

Algo que se destaca constantemente en webs relacionadas con el empleo y la carrera profesional es la necesidad de tener un plan. Y un plan profesional es algo que casi siempre se pone en las oferta de trabajo (posibilidades de carrera), pero que, lamentablemente, tenemos que crear nosotros mismos. Es lo que Andrés Pérez lleva mucho tiempo sintetizando en “o eliges tú, o alguien va a elegir por tí”

No es nada sencillo, porque lo primero que viene a la cabeza son los típicos planes detallados al milimetro, como si la vida de uno fuese “Ocean’s 11” y todo funciona como un reloj. Pero eso es imposible, porque la vida no es una película de Hollywood y hay que improvisar constantemente. Y proponerte objetivos imposibles sólo va a frustrarte.

Lo más importante es tener claros tus objetivos estratégicos. No necesariamente tienen que ser objetivos muy concretos, sino sensaciones generales sobre los que un conocido llamaba “qué quiero ser de mayor”. Añadir algún tema concreto está bien, de todos modos, pero esas preguntas de “¿dónde te ves dentro de 5 años?“, al menos a mí, me resultan terriblemente complicadas de contestar. Cualquier cosa que hubiese podido decir hace 5 años no hubiese tenido nada que ver con lo que hago ahora, posiblemente…

Para poder hacerlo, aunque parezca una perogrullada, necesitamos conocernos bien a nosotros mismos. Pero no lo es. Cuesta mucho realmente darse cuenta de lo que es importante en nuestra vida y no seguir lo que nos imponen los demás, bien sean nuestros padres, nuestros amigos, la sociedad en general, etc… Tenemos que tener claro qué deseos son nuestros y cuales son “lo que se supone que tiene que ser”. Eso supone pensar y meditar sobre uno mismo.

Y, por supuesto, cada cierto tiempo hay que revisarlas y, ¿por qué no?, cambiarlas.

Por ejemplo, algunas de mis ideas actuales:

En primer lugar, creo que el trabajo de cada uno debe ser algo que llene, que vaya más allá que el mero hecho de cobrar una nómina. Más que lo creo, es que QUIERO que mi trabajo sea algo gratificante y que me importe. Bastantes horas al día se tira uno trabajando como para que luego no merezca la pena más que en la cuenta corriente.

A pesar de que muchas veces se plantea la dicotomía trabajar en lo que te gusta-ganar dinero, a partir de mi experiencia personal, creo que trabajar en lo que a uno le gusta funciona mucho mejor como medio de ganar dinero mas que “sacrificarse estoicamente”.

Quiero también que mi carrera profesional sea interesante y variada. Me gusta mucho aprender cosas nuevas, aunque eso inevitablemente sea algo complicado y difícil. Require esfuerzo, y todos tendemos a ser vaguetes y a apalancarnos.

A pesar de ser desarrollador de software, que es algo que implica reinventare casi casi cada día, me gusta también probar otras cosas, y trabajar en equipos interdisciplinares. También meto aquí lo interesante que creo que está siendo para mí la experiencia de estar trabajando en Irlanda y en una industria como la de los videojuegos, en las que cada semana parece un año de lo rápido que va.

Creo que uno debe siempre aspirar a más. Ser ambicioso. Eso no quiere decir ambición desmedida, pero sí tener un cierto sentido de progreso, entendido de manera personal, y, aunque sea secundaria, por qué no en lo económico. También incluye el reconocimiento.

Valoro mi independencia y no soporto el micromanagement. Quiero ser capaz de proponer objetivos y no hacer las cosas porque las diga el jefe, sino porque es necesario/mejora el proyecto. Quiero estar en un entorno cooperativo y no ordeno-mando. Es muy frustante tener que luchar contra imposiciones.

Cada vez creo más en ser transparente, en comunicar lo que haces y menos en guardarse secretos. Eso incluye hablar de lo que hago de manera pública (como en este blog), pero también aceptar errores, admitir inseguidades y ser sincero. Y mira que cuesta horrores.

Con estas coas, ya puedes ir sacando diferentes “derivadas”, Por ejemplo, muchas cosas que son importantes para mí (y, destaco, lo mismo no lo son para nadie más) son mucho mas factibles en pequeñas empresas o startups que en grandes empresas (y esa es una buena manera de clasificar qué buscar), o, incluso, llegado el caso, con autoempleo.

Eso implica que, probablemente, tenga que vivir con más riesgo implícito que estando 10 años seguidos en la misma compañía. Bueno, por el momento lo asumo (aunque cada día piense más que trabajar en una gran empresa es muy arriesgado, en cualquier momento prescinden de tí en las “altas esferas” y no hay manera de librarse). Pero creo que es importante entender que no hay nada gratis, para todo se paga un precio y es cuestión de prioridades. Es parte del plan, decidir si es más importante esto o lo otro e ir a por ello.

Así que, de tanto en tanto, me planteo, ¿lo que estoy haciendo sirve para mis objetivos estratégicos? ¿Hay otro camino para cumplirlos mejor? Y, si lo hay, pues uno se puede poner manos a la obra con un objetivo en la cabeza…

4 Comments

Posted on October 14, 2010 by Jaime Buelta

Database madness with mongoengine and SQLAchemy

Yesterday I gave a presentation in the Python Ireland October meeting about some work we are doing with mongoengine and SQLAchemy and how we are managing three databases (MS SQL server, MySQL and MongoDB) on an online football management game we are working on.

So, here are the slides, so feel free to make comments, ask questions and even criticize them!

You can also download the presentation on PDF here.

PD: When I talk about football game, I’m referring to soccer.

16 Comments

Posted on September 15, 2010 by Jaime Buelta

Commenting the code

I always find surprising to find out comments like that regarding code comment. I can understand that someone argues about that writing comments on the code is boring, or that you forget about it or whatever. But to say that the code shouldn’t be commented at all looks a little dangerous to me.

That doesn’t mean that you’ll have to comment everything. Or that adding a comment it’s an excuse to not be clear directly on the code, or the comment should be repeat what is on the code. You’ll have to keep a balance, and I agree that it’s something difficult and everyone can have their opinion about when to comment and when not.

Also, each language has it’s own “comment flow”, and definitively you’ll make more comments on low level languages like C than in a higher level language like Python, as the language it’s more descriptive and readable. Ohhh, you have to comment so many things in C if you want to be able to understand what a function does it in less that a couple of days… (the declaration of variables, for example) #

As everyone has their own style when it comes to commenting, I’m going to describe some of my personal habits commenting the code to open the discussion and compare with your opinions (and some example Python code):

I put comments summarizing code blocks. That way, when I have to localize a specific section of the code, I can go faster reading the comments and ignoring the code until getting to the relevant part. I also tend to mark those blocks with newlines.

# Obtain the list of elements from the DB
.... [some lines of code]

# Filter and aggregate the list to obtain the statistics
...  [some lines of code]

UPDATED: Some clarification here, as I think that probably I have choose the wrong example. Of course, if blocks of code gets more than a few lines and/or are used in more than one place, will need a function (and a function should ALWAYS get a docstring/comment/whatever) . But some times, I think that a function is not needed, but a clarification is good to know quickly what that code is about. The original example will remain to show my disgrace, but maybe this other example (I have copy-paste some code I am working right now and change a couple of things)
It’s probably not the most clean code in the world, and that’s why I have to comment it. Latter on, maybe I will refactor it (or not, depending on the time).

               # Some code obtaining elements from a web request ....

                # Delete existing layers and requisites
                update = Update.all().filter(Update.item == update).one()
                UpdateLayer.all().filter(UpdateLayer.update_id == update.item_id).delete()
                ItemRequisite.all().filter(ItemRequisite.item == update).delete()

                # Create the new ones
                for key, value in request.params.items():
                    if key == 'layers':
                        slayer = Layer.all().filter(Layer.layer_number == int(value)).one()
                        new_up_lay = UpdateLayer(update=update, layer=slayer)
                        new_up_lay.save()
                    if key == 'requisites':
                        req = ShopItem.all().filter(ShopItem.internal_name == value).one()
                        new_req = ShopItemRequisite(item=update, requisite=req)
                        new_req.save()

I describe briefly every non-trivial operation, specially mathematical properties or “clever tricks”. Optimization features usually needs some extra description telling why a particular technique is used (and how it’s used).

# Store found primes to increase performance through memoization
# Also, store first primes
found_primes = [2,3]

def prime(number):
    ''' Find recursively if the number is a prime. Returns True or False'''

    # Check on memoized results
    if number in found_primes:
        return True

    # By definition, 1 is not prime
    if number == 1:
        return False

    # Any even number is not prime (except 2, checked before)
    if number % 2 == 0:
        return False

    # Divide the number between all their lower prime numbers (excluding 2)
    # Use this function recursively
    lower_primes = (i for i in xrange(3,number,2) if prime(i))
    if any(p for p in lower_primes if number % p == 0) :
        return False

    # The number is not divisible, it's a prime number
    # Store to memoize
    found_primes.append(number)
    return True

(Dealing with prime numbers is something that deserves lots of comments!) EDIT: As stated by Álvaro, 1 is not prime. Code updated.

I put TODOs, caveats and any indication of further work, planned or possible.

# TODO: Change the hardcoded IP with a dynamic import from the config file on production.
...
# TODO: The decision about which one to use is based only on getting the shorter one. Maybe a more complex algorithm has to be implemented?
...
# Careful here! We are assuming that the DB is MySQL. If not, this code will probably not work.
...

UPDATE: That is probably also related to the tools I use. S.Lott talks about Sphinx notations, which is even better. I use Eclipse to evelop, which takes automatically any “TODO” on the code and make a list with them. I find myself more and more using “ack-grep” for that, curiously…

I try to comment structures as soon as they have more than a couple of elements. For example, in Python I make extensive use of lists/dictionaries to initialize static parameters in table-like format, so use a comment as header to describe the elements.

# Init params in format: param_name, value
init_params = (('origin_ip','123.123.123.123'),
               ('destiny_ip','456.456.456.456'),
               ('timeout',5000),
              )
for param_name, value in init_params:
    store_param(param_name, value)

Size of the comment is important, it should be short, but clearness goes first. So, I try to avoid shorting words or using acronyms (unless widely used). Multiline comments are welcome, but I try to avoid them as much as possible.
Finally, when in doubt, comment. If at any point I have the slightest suspicious that I’m going to spend more than 30 seconds understanding a piece of code, I put a comment. I can always remove it later the next time I read that code and see that is clear enough (which I do a lot of times). Being both bad, I prefer one non-necessary comment than lacking one necessary one.
I think I tend to comment sightly more than other fellow programmers. That’s just a particular, completely unmeasured impression.

What are your ideas about the use of comments?

UPDATE: Wow, I have a reference on S.Lott blog, a REALLY good blog that every developer should follow. That’s an honor, even if he disagrees with me on half the post 😉

On one of my first projects on C, we follow a quality standard that requires us that 30% of the code lines (not blank ones) should be comments.

1 Comment

Posted on September 7, 2010 by Jaime Buelta

ORMs and threads

Do you remember the post from Joel Spolsky about leaking abstractions? It’s the kind of idea that, the first time I read, about it, was intrigued, but after some time, I began to see it on every place. There are from time to time some problems on my Python code (as well as in other high-level languages) that I am really glad to be able to have an idea of the underlaying low level C, or I will be struggling with some very weird, confusing problems. I have enough confusing and weird problems of my own to add more…

One of my recent leaking abstractions has come using a ORM, in particular mongoengine, but I think it will happen probably on every ORM. On a web application I am developing at the moment, we need to launch a thread to perform some operations, in a timed manner. A request comes to the server, launches a thread, and then that thread stores its status on the database. Then the user can check the status from the database (and do more operations, like pause, etc, but that I will leave that). While performing some tests on the application, I made the following code:

def testing():
    user = User.objects.get(TEST_USER)
    user.launch_thread()
    time.sleep(TIME)
    assert user.status == END

Inside the thread, the code looks similar to this

def thread(user_id):
    thread_user = User.objects.get(user_id)
    # Do things that take a while, but less than TIME
    thread_user.status = END
    thread_user.save()

Ok, so we’re getting an object from the database, the object launches a thread that changes its state to END and saves it after a while. Well, not really… Obviously it’s not working (or I wouldn’t be writing this). But we all already know that threads are the root of all evil, and always have nasty hidden surprises.

The error I was making was assuming (and that’s the abstraction in my mind) that the ORM maps the database into memory, and that the copy is unique. After all, that’s why you have a database. But it’s not true. What it’s happening here it’s that we are creating two different objects in memory. I have (now) used two different names, user and thread_user. In my code I used the same name (user) which probably adds to the confusion. Each one reflects the status of the database when you read the database, but after that, you are not updating the object with the real information on the DB. So, the user object has still the starting status, the first one, as we haven’t refreshed it with the new and changed information that another, rogue thread, has changed while we naively thought that was under our control.

Usually, on a web application (at least the ones developed with high-level tools) the usual situation is having a request, read the data from the DB using a ORM, change something, and then save. We don’t have rogue threads interrupting that operations and requests can be processed fast enough. And even user data is different so two users probably don’t need to write any related information. BUT definitively another request (faster one) could interrupt the process and make the data to not be coherent. It’s going to be (extremely) rare in most applications, but in case of long, threaded operations, could be important to be aware of this and try not to relay on the ORM as a virtual copy of the DB, but to read and write in short operations. Or lock the database.

Just one more thing. It’s possible to use only one object in memory, and pass it to the thread, and avoiding this problems. But that could generate others, like not storing (and loading) any intermediate steps of the process. So, in case the thread is stopped (for example, a server restart), the process is totally lost. Any operation that takes time to execute will ideally have some “resume” process, so that will include storing the partial state, as well as a resume, which will need to read from the DB. Also, in this particular case, there are more than one thread working the same process, communicating through the DB.

But wait! There is still a little more unexpected and funny behavior!

To reload the user object, my first idea was to generate a refresh method, this way:

class User(mongoengine.Document):
     ...

     def refresh(self):
          ''' Refresh the object '''
          self = User.objects.get(self.id)

And again, it’s not working… 😦
Again, the problem is an abstraction. self it’s not always the object, not outside the method. It’s just a label (or pointer, if you know C) to the object. Yes, we have created a new object called selfwhich has the new (and correct) object. BUT the label user is still pointing to the not-updated object we have since the beginning.

So… no shortcuts, we will have to reload the object after the sleep to check that the object on the DB it’s behaving properly

def testing():
    user = User.objects.get(TEST_USER)
    user.launch_thread()
    time.sleep(TIME)
    user = User.objects.get(TEST_USER)
    assert user.status == END

Cambio de aspecto en el blog

El anterior no me terminaba de convencer, sobre todo porque en WordPress.com no tienes muchas opciones de configurar el aspecto, tienes que atenerte a uno de los temas que te proponen. Eso sí, los hay estupendos.

A pesar de que me gustaba mucho la idea de ese mapa entre Memphis y Nashville, que creo que daba mucho juego, este estilo es mucho más limpito, claro, y creo que “pega” también muy bien con la idea de tener un blog más orientado a temas técnicos. Además, he cambiado el idioma por defecto para que los enlaces, etc aparezcan en inglés…

De todas formas, estoy abierto a sugerencias si alguien quiere proponerme otros temas, o si este le parece horrible y propone volver al tema anterior…

3 Comments

Posted on May 21, 2010 by Jaime Buelta

Migrating data to a new db schema with Django 1.2

This week I had to make a migration of data on a database from an old schema to a new one. The database is part of a Ruby on Rails application, so the changes are part of new features, and we have also taken the opportunity to clean up a little the database and make some changes to be more flexible in the future. As we want to be consistent, we need to migrate all the old data to the new database.

After talking with people with more experience than me on Rails (I have only use it for this project) about how to perform a migration, and as this week the brand new Django version, supporting multiple DBs was release, I decided to use the Django ORM to perform it.

Research

My initial idea about the multiple database support on Django was that each of the models will have some kind of meta information that will determine which database if going to use. So, the idea will be to create models for the old database and models for the new database, each one with its own meta information.

Well, Django doesn’t work exactly this way… You can use that approach if each of the tables on the databases are named differently, because is smart enough to know that a table is only on one database, but the problem was that some of the tables we are using keep the same name, but change the way the information is stored.

In fact, the Django approach it’s more powerful than that, and allow a lot of different techniques, but you have to make some code. The key point is using a ‘router’. A router is a class that, using standardized methods, will return the appropriate database to look when you’re going to read, write, make a relationship or sync the db, according to the model performing the operation. As you can write those methods, you can basically do whatever you can imagine on the databases. For example, write always to the master database and read from a random (or consecutive) slave database. Or write the models of one application on one database and of the other three applications on another.

The router class then is added to the settings.py file. You can even generate several routers and apply them in order.

Getting the models

As the database model hasn’t been designed using Django, but Ruby on Rails, I had to connect to the databases (old and new) and let Django discover the models for me. The easy part is to generate the first models, just using

python manage.py inspectdb --database=DATABASE

Specifying the correct database, and storing each results in two different files, one for the old models and another for the new models (I have called them, in a moment of original inspiration as new_models.py and old_models.py). Then, rename each model to begin with Old or New, so each model name is unique. Then I created a models.py file that will import both, to follow Django conventions. I could also combine both, but having both models as different files seems to make more sense to me.

Then, as you can imagine, the problems began.

First, there is one table that has a composed primary key. Django doesn’t like that, but, as that table is new and doesn’t need the old data, I have just ignored and delete the model.

Another problem is that Rails doesn’t declare relationships as, well, relationships. It doesn’t create the fields as foreign keys on the database, but just as plain integers, and then the code will determine that there are relationships. So, as Django analyze the database, it will determine that the codes are not foreign keys, but plain integers. You have to manually change all those integers to foreign keys to the correct table, if you want to use the ORM properly. Apparently there are some plugins for Rails to define the relationships as foreign keys on the database.

To add a little confusion, Rails declare the names of the tables as plurals (for example, for an model called ‘Country’, the table will be called ‘Countries’), so the name of the models will be plural. I’m used to deal with singular model names in Django, so I tend to use the singular name instead of the plural when using the models, which will raise an error, of course. Anyway, you can avoid it changing the name in the models.

Routing the models

The router is easy, it will just route a model depending on the first letters on the model name. Models beginning with ‘New’ will go to the new database and every other model will go to the old (default) database, both to write and to read. I have started old models with ‘Old’. So the code is like this:

class Router(object):
    """A router to control all database operations on models in
    the migration. It will direct any Model beggining with 'New' to the new database and
    any with 'Old' to the default database"""

    def db_for_read(self, model, **hints):
        if model.__name__.startswith('New'):
            return 'new'
        return 'default'
    def db_for_write(self, model, **hints):
        if model.__name__.startswith('New'):
            return 'new'
        return 'default'

To avoid problems, the access to the old database is made with a read-only database user. That will avoid accidentally deleting any data.

Migration script

The migration script imports the Django settings and then mix all the data from the old database and then generate the new data for the new database. Using the Django ORM is easy, but there are some problems.

Django ORM is slooooooooow. Really really slow, and that’s a bad thing for migrations™ as they usually have lots of stored data. So there are some ideas to keep in mind:

Raw copy of tables can be performed using raw SQL, so try to avoid just copying from one table in the old database to the same table in the new database using Django, as it can take lots of time, and I mean LOTS. I began copying a table with about 250 thousands records. Time with Django, over 2 hours, time dumping in SQL, about 20 seconds.
Use manual commits, if the database allows it. It’s not a “magical option”, it’s still slow, but can help.
As usually the migration will be performed only once, try to work on development with a small subset of the information, or at least try to import one table at a time, and don’t recreate it once it’s on the new database. When you’re happy with your code, you can run it again from the beginning, but it’s awful to wait 5 minutes to realize that you have a typo error on one line, and another 5 minutes to discover the next typo error two lines below that.

Another thing to keep in mind is that the relationships are not shared over databases, so you need to recreate them. For example, imagine that we have this two models, when we store comic book characters. The Publisher table is going to keep the same shape, but the character table will now include the secret identity name.

class NewPublisher(models.Model):
    id = models.IntegerField(primary_key=True)
    name = models.CharField(max_length=50)
    class Meta:
         db_table = u'publisher'

class OldPublisher(models.Model):
    id = models.IntegerField(primary_key=True)
    name = models.CharField(max_length=50)
    class Meta:
         db_table = u'publisher'

class NewCharacter(models.Model):
    id = models.IntegerField(primary_key=True)
    publisher = models.ForeignKey('NewPublisher')
    nickname = models.CharField(max_length=50)
    secret_identity = models.CharField(max_length=50)
    class Meta:
         db_table = u'character'

class OldCharacter(models.Model):
    id = models.IntegerField(primary_key=True)
    publisher = models.ForeignKey('OldPublisher')
    name = models.CharField(max_length=50)
    class Meta:
         db_table = u'character'

The publisher table is identical, so the private keys are the same. Let’s say that all the secret identities are going to be “Clark Kent”, so the code to migrate the data will be like:

for old_character in OldCharacter.objects.all():
    new_character = NewCharacter(nickname=old_character.name,
          secret_identity='Clark Kent',
          publisher=NewPublisher.objects.get(pk=old_character.publisher.id))
    new_character.save()

You cannot use the relationship directly, and say that publisher = old_character.publisher, because that will try to assign an OldCharacter to a field that should be a NewCharacter. Django will raise an exception, but it’s good to keep that in mind. All those checks will help in the end to have a better control over the data in the new database and will ensure that all the data is consistent.

Conclusions

Migrate data from one database to another is always painful. One can argue that it SHOULD be painful, as you’re dealing with lots of information that should remain in good shape and that should always been taken with respect and caution.

Having that into mind, I must say that Django has made it a little less painful. I think also that the functionality for multi-db support it’s quite powerful and can be adapted to several uses. One thing that has always been more complicated that it should and now can be quite easy is migrating from one kind of database to another (from MySQL to Postgres, for example), keeping the data intact.

Anyway, I still think that including some kind of meta information to specify the database (or database behavior) per model could be a good idea.

But, by far, the worst problem is the way that Django is slow working with large sets of information. Adding some kind of bulk insert will be a huge improvement of the framework. Yes, you can always read the data from the database using the Django ORM and compose the INSERT statements by hand on a file to then load them, which is several orders of magnitude faster, but the key point of using the ORM should be not having to use SQL.

2 Comments

Posted on April 15, 2010 by Jaime Buelta

Presentation “Use of Django at Jolt Online” at Python Ireland

Yesterday I give a talk about the use of Django in my actual position at Jolt Online Gaming. on the Python Ireland group I talked a little, using one recent system as example, about our production configuration, use of Django on non-typical ways, work with the database and other tools we use related. At least there were a lot of questions and comments and some great conversation afterwards, so I think it was interesting for the people attending…

So there are the slides (in PDF), in case anyone want to take a look at them. I have put some notes.

Use of Django at Jolt Online 14 Apr 2010 low res

Let me know what do you think!

PD: I had to edit a little the presentation…

EDIT: I’ve added the presentation to SlideShare, it looks like this:

Remember to read the notes!

6 Comments

Posted on April 11, 2010 by Jaime Buelta

The intelligent solution that turns to be unadequate

Recenlty I asked for advice on StackOverflow with a question related to sorting information on a Django application.

I copy the question here:

I’m trying to do something similar to this, in Django. This is part of the page of Anna:
Pos NickName      Points
--- ---------     ------
1   The best      1000
...
35  Roger         550
36  Anna          545
37  Paul          540
It’s a chart showing the scoring system, and it intends to show the first position, as well as the relative position of the presented player.

Showing the first one it’s easy, as it’s only making a query to the database and extracting the first one:
 best = Score.objects.all().order_by('-points')[0]
But I’m having problems getting the ones close to the presented player (Anna, in this case). I don’t want to go searching through the complete list, as the complete list of players can be quite long.

Maybe there’s a way to know the position a register occupies in an ordered list…

Any ideas on how to achieve it?

Well, I get with a clever solution that was, well, count the number of players with higher number of points than the actual player, so you’ll get the position. It’s definitively more elegant than rendering the complete list and then searching for the position of a particular player.

The problem is that it won’t work when you have several players with exactly the same number of points. In that case, you’ll end getting the position of the first one with the same number of points, but not necessarily the one you’re looking, so you end with this kinds of tables:

Pos NickName      Points
--- ---------     ------
1   The best      1000
...
35  Roger         545
36  Paul          550
37  Anna          550

Not showing the actual player on the correct place, or even worse…

Pos NickName      Points
--- ---------     ------
1   The best      1000
...
35  Roger         545
36  Paul          550
37  Lucy          550

Not showing the player at all!

To make things worse the order is not always the same, as you have some players with the same number of points and the DB just seems to not return always the same order, so you need to introduce another element for sorting (the name, for example) to ensure that the ranking is not dancing just because the players have the exact number of points.

So.. at the end I needed to make a sort using two parameters (points, then name), get the complete list, get the index of the player and do the things the boring way.. Actually it’s working fine and nice, but I wonder if anyone knows a way of getting that avoiding the need of getting the complete list…

3 Comments

Posted on April 6, 2010 by Jaime Buelta

Recuperando archivos antiguos

Hace unos dias descubrí que, rebuscando entre los archivos de Internet, podía acceder a mi antiguo blog (llamado como éste), así que he copiado los artículos que he encontrado para tenerlos disponibles…
Son bastante antiguos, y la orientación entonces del blog era bastante distinta, pero igual a alguien le puede resultar interesante… ¡Quien sabe! Es posible que se me haya pasado error al hacer la copia, si es así, por favor decídmelo para intentar arreglarlo.

Los he agrupado en la categoría “Blog antiguo”, en un arranque de originalidad…

19 Comments

Posted on April 5, 2010 by Jaime Buelta

People really love RDBMSs

I had this discussion with a friend, helping him with a personal project. It was a software to help with a weekly schedule, so it has some teachers, each ones with a profile, some alumns, each one with a profile, and classes, relating the teachers with the alumns with time and physical place.

My friend has deal work a lot with RBMS and as a DBA, mostly with Oracle and MySQL, and he his quite good at that. He began designing a relational database model, with tables for alumns, teachers, places, etc… He also wanted to learn Python, so he wanted to access the database with SQLAlchemy and installed a MySQL database for development.

This development was intended for a small academy, so it will run on only one computer. There is no need of making some client-server architecture or any concurrency. Just one application you can start, load the data, change it, and then close and store the data for the next time you need to open it again.

So, basically, what you got here it is making some classes to define objects. Those objects with the information, using SQLAlchemy, will be stored on a MySQL database, and accessed just using the SQLAlchemy interface.

To me it’s clearly overdesign.

What it’s the point in store the data on a RDBMS? You don’t need all the good stuff a relational database bring at all. You don’t need transactions as your data is only accesed by one single application. You don’t need to share the data over a network. You don’t have to standarize the data to be accessed from different languages or clients. Or make a clustered database… Sure, relational databases are great and are used in lots of applications, but in this particular one you’r not getting any real advantage in using one. Instead, you’re creating a data model using classes and objects in Python. We only want that information to be persistent, so we can close the application and open it again and have the same data.

The most appropiate tool in this case, for me, it’s just plain serialization. Just generate an object with all the data and pickle it on one file before closing the application. If we need extra care in case of error, just do it each time something changes. If the data is expected to be huge (not on this case), you can do it on different files. The same if you expect a lot of search.

You can argue that , well, the application can grow, and be distributed, and THEN you can use all the fancy MySQL features. And that’s true. So you need to define a data model than THEN can be adapted using SQLAlchemy or other ORM (I really like the Django one). It’s really not so difficult. But complicate your system in advance is something I like to avoid.

I like to think that we should design a data model, and then (and not before) think how to implement this data model on a particular way (memory, files, database, etc… ). Of course a relational database it’s a lot of time the good solution, but it’s not the only one available.

EDITED: After some comments (which I really appreciate), I have to say that I can be really easily convinced to use SQLite. My position was more “I think that pickling the data is enough, but if you feel more confortably, use SQLite. I really think that MySQL it’s too much). My key idea on the post is to discuss that I think that we should think, even for a few seconds, if the use of a RDBMS is appropiate on every design, and consider alternatives… I think that a lot of designs begin with with MySQL ( or worse, with Oracle or MS SQL server) before even think about the data model… The DB should support the data model, not the data model be made to fit the DB…

« 1 … 10 11 12 13 14 … 16 »

Wrong Side of Memphis

Planes de desarrollo profesional

Database madness with mongoengine and SQLAchemy

Commenting the code

ORMs and threads

Cambio de aspecto en el blog

Migrating data to a new db schema with Django 1.2

Research

Getting the models

Routing the models

Migration script

Conclusions

Presentation “Use of Django at Jolt Online” at Python Ireland

The intelligent solution that turns to be unadequate

Recuperando archivos antiguos

People really love RDBMSs

Top Posts

Archives

Share this:

Share this:

Share this:

Share this:

Share this:

Research

Getting the models

Routing the models

Migration script

Conclusions

Share this:

Share this:

Share this:

Share this:

Share this:

Top Posts

Category Cloud

Archives