Alois Reitbauer About the Author

Beyond Persistence – How to Exploit Your Database for Better Performance

This post is contributed by Asad Ali – Enablement Services Consultant at dynaTrace. Based on his experience with numerous customers he describes how to optimize the data access layer of applications by “violating” a paradigm that is often referred to as a golden rule for designing applications.

Before the advent of the 3-tier architecture, client server technology was prevalent. In the client-server architecture, the databases played an important role for data persistence. During this time, major database vendors optimized the technology for fast read/write, almost real-time replication and clustering for maximum uptime.

The arrival of J2EE servers and .Net Framework saw a paradigm shift in application computing. The ability to deploy applications across multiple tiers and the use of multiple communication protocols including SOAP, Web Services, RMI have made it possible to scale up and scale down the application based upon the transaction load. Along with this shift in application deployment came the need to not only write code once and run anywhere but also to write application server independent code.

Over the past year and a half while consulting with customers, I have observed that while the business logic is tightly embedded in the application server, the database is treated as just a data store in many customers’ applications. To get data to process the business logic, the applications make lot of database queries per web requests. Shown below is an example of database requests made in one web- request.

SQL statements for one single user transaction

SQL statements for one single user transaction

As shown above, to process one web request this application is making over 3000 database calls. Such a data access strategy has multiple disadvantages including:

  • High network overhead – The more the number of JDBC statements, the more the number of packets in the network – and hence more network latency.
  • Unnecessary object creations – Once the data is received by the JDBC ResultSet, the application code has to create collection objects to process the results.
  • Response time overhead in managing data result sets – Excessive object creation to handle the JDBC results means more memory consumption in the application server – which in turn can cause high GC cycles.
  • High CPU consumption in the database server due to high database queries.

One of the common reasons used by architects in support of such architecture is to keep the business logic out of the databases. Their argument supports the concept that minimal changes would be required in the application if the database vendor were changed. This argument begs the question about how many times database vendors are changed during an application lifecycle. The answer is hardly anytime.

The (mis)use of Object-Relational (OR) mapping tools to load data into the application server also contributes to the problem of excessive calls to the database. OR mapping tools can not only cause the (n+1) query pattern problem but also can cause unnecessary trips to the database to retrieve one row at a time from the tables. It has also been argued by some Java developers that OR mapping tools significantly reduce application code. However, OR mapping tools do not perform well when massive amount of data is updated or deleted.

As I said earlier, the database vendors have made significant strides in optimizing data storage in the database. Databases are not just persistent stores. They are a sophisticated piece of software that handles very fast data reads and writes even when large data sets are manipulated. To harness this power of the database engine, it is important to write a query in such a way that you get the desired resultset by joining multiple tables instead of retrieving data from one table first and then using the foreign key to retrieve data from the other tables.

For complex reports in your application use of stored procedures is optimal. Replacing multiple queries used in a report by a single stored procedure not only reduces the network traffic but also improves query response times. Unlike individual SQL queries, stored procedures are precompiled in the database and hence their execution times are much lower. Additionally, for queries that join multiple tables with complex join conditions, stored procedures are better alternatives as you can split the queries in one stored procedure and use an optimized query plan.

Conclusion

As you look for ways to optimize your application, do pay attention to the data access layer. Patterns like (n+1) queries can be easily optimized by replacing multiple queries that get data from one table at a time with queries that can join multiple tables and only get the desired result sets. Use of stored procedures is definitely another way to optimize the data access layer. However, ensure that you don’t replace your queries that get data from a single table with a stored procedure because you won’t get any optimization results with this approach.

 

Comments

  1. Duh. Also, looks like your example doesn’t use
    prepared statements, and no bind variables.

  2. I would say that Databases are sophisticated pieces of software that handles very fast data reads and writes and are responsible for simple persistence (simple ain’t easy). Large reads and writes are not the most common use case for most applications. Also I take issue with your observation that so many queries is proof that the code wasn’t designed with a databases capability in mind. A common design pattern in Hibernate is to let the ORM create discrete SELECTs and rely on caching layers to return the results. A discrete (and general) select only hits the database the first time its called. Modern Databases (ie everyone you are likely to work with in the open source and commercial space) all have caching built into to their parsing and retrieval code so even sending a “raw” SQL command isn’t actually going cause the sql server to stop in its track, recompile the command execute all the needed reads and compose a result set. Furthermore, if you make used of prepared statements, you are actually sending the command and it’s qualifying values as separate pieces. In MySQL’s case (just to pick on them for a moment) it will compile the prepared statement (think the command with the ? marks) the first time it sees it, then call that compiled code every time the same prepared statement comes back even if the “WHERE” values have changed). This cycle of compile, cache and execute is the exact same process that occurs when you write a stored procedure (ie they have the same execution characteristics). The only difference is one is based on a vendor specific language and harder to intercept and track with continuous integration tools (the stored proc) while the other is written in the same code as the rest of your app and presumably can be unit tested, reviewed, locked down or whatever else you do to make sure your projects are on track (the ORM). Now the point about round trips is certainly valid but here an ORM like Hibernate is really helping you because when properly configured (and the default configuration is to do this) Hibernate will maintain a local cache of any part of the database you work with. So in a server farm you have the ORM layer actively distributing the load from the DB server out to all the web servers (though a process called locality) while still providing a single interface back to a central database (or bank of databases). And all of this sophistication is coming to you for free.

    By the way I am with you on the point that switching databases is not a use case worth optimizing for.

  3. Bimal Shah says:

    Couple of points for your consideration:

    1. In many enterprises, with hundreds of applications in use, related applications share the infrastructure. A database server would be a shared resource. DB Administration too is easier this way. In such a shared infrastructure, it is often seen that a heavy report that does all its processing in the DB, takes up all DB resources and impacts other applications. Code reviews could have caught such poorly written Stored Procedures. But it is not uncommon to find code that has not been reviewed and rushed into production by people who believe they are saving time by bypassing such reviews. So to protect the DB, would it not be better to have architectural guidelines that require all business processing to be done in the app tier? It is much easier to horizontally scale the app tier; if cache replication is in place, many a times it is a matter of firing a new VM and including it in the bank of app servers. DB Servers can scale vertically, but you do reach a physical limit here. Horizontal scaling of DB servers is a difficult task that may involve changes in application code. We also need to remember that there are background tasks such as replication and backup on the DB server.

    2. Consider the case of Operations Staff. They are supporting many applications. They may not be aware how an application is coded. With Stored Procedures and and Triggers, processing is kind of hidden. The DB Admin sees higher activity on the DB: but what has caused it? Difficult to debug.

  4. Nice read, thanks.

    Also, sometimes “views” can be a solution to reduce the amount of queries.

    Do you know a way to monitor a database (for example, logs request & execution time), or a tool to analyse the “performance” of an SQL statement (for example, to determine if a specific request performs better with a JOIN statement or with a query like “SELET … FROM a, b WHERE a.id=b.id …”) ?

  5. You are right, another important point is, DB implementation will reduce development time :) because developer need not restart the application or web server every time when ever code changes. It is taking 20%(avg) of the time during the development.

    When it is coming to DB, no need, change the code and execute directly, it will save lot of time. But when SQL code is badly written then it will take 30 to 40 % time to revisit and need good DB person to resolve.

  6. The particulars and exact recommendation are insurance specifically what I was wanting. I’ve book marked and will definitely be returning.business insurance Thanks for the information in this blog. Online Business Directory

  7. It is much easier to horizontally scale the app tier; if cache replication is in place, many a times it is a matter of firing a new VM and including it in the bank of app servers.We also need to remember that there are background tasks such as replication and backup on the DB server.
    Side effects of Vicodin

  8. Consider the case of Operations Staff. They are supporting many applications. They may not be aware how an application is coded. With Stored Procedures and and Triggers, processing is kind of hidden. The DB Admin sees higher activity on the DB: but what has caused it? Difficult to debug………….

  9. It is very encouraging to go through the post for it contains information about these interesting feature. It is a useful tutorial. Hot Listings

Comments

*


− 1 = two