Beyond Persistence – How to Exploit Your Database for Better Performance
This post is contributed by Asad Ali – Enablement Services Consultant at dynaTrace. Based on his experience with numerous customers he describes how to optimize the data access layer of applications by “violating” a paradigm that is often referred to as a golden rule for designing applications.
Before the advent of the 3-tier architecture, client server technology was prevalent. In the client-server architecture, the databases played an important role for data persistence. During this time, major database vendors optimized the technology for fast read/write, almost real-time replication and clustering for maximum uptime.
The arrival of J2EE servers and .Net Framework saw a paradigm shift in application computing. The ability to deploy applications across multiple tiers and the use of multiple communication protocols including SOAP, Web Services, RMI have made it possible to scale up and scale down the application based upon the transaction load. Along with this shift in application deployment came the need to not only write code once and run anywhere but also to write application server independent code.
Over the past year and a half while consulting with customers, I have observed that while the business logic is tightly embedded in the application server, the database is treated as just a data store in many customers’ applications. To get data to process the business logic, the applications make lot of database queries per web requests. Shown below is an example of database requests made in one web- request.
As shown above, to process one web request this application is making over 3000 database calls. Such a data access strategy has multiple disadvantages including:
- High network overhead – The more the number of JDBC statements, the more the number of packets in the network – and hence more network latency.
- Unnecessary object creations – Once the data is received by the JDBC ResultSet, the application code has to create collection objects to process the results.
- Response time overhead in managing data result sets – Excessive object creation to handle the JDBC results means more memory consumption in the application server – which in turn can cause high GC cycles.
- High CPU consumption in the database server due to high database queries.
One of the common reasons used by architects in support of such architecture is to keep the business logic out of the databases. Their argument supports the concept that minimal changes would be required in the application if the database vendor were changed. This argument begs the question about how many times database vendors are changed during an application lifecycle. The answer is hardly anytime.
The (mis)use of Object-Relational (OR) mapping tools to load data into the application server also contributes to the problem of excessive calls to the database. OR mapping tools can not only cause the (n+1) query pattern problem but also can cause unnecessary trips to the database to retrieve one row at a time from the tables. It has also been argued by some Java developers that OR mapping tools significantly reduce application code. However, OR mapping tools do not perform well when massive amount of data is updated or deleted.
As I said earlier, the database vendors have made significant strides in optimizing data storage in the database. Databases are not just persistent stores. They are a sophisticated piece of software that handles very fast data reads and writes even when large data sets are manipulated. To harness this power of the database engine, it is important to write a query in such a way that you get the desired resultset by joining multiple tables instead of retrieving data from one table first and then using the foreign key to retrieve data from the other tables.
For complex reports in your application use of stored procedures is optimal. Replacing multiple queries used in a report by a single stored procedure not only reduces the network traffic but also improves query response times. Unlike individual SQL queries, stored procedures are precompiled in the database and hence their execution times are much lower. Additionally, for queries that join multiple tables with complex join conditions, stored procedures are better alternatives as you can split the queries in one stored procedure and use an optimized query plan.
As you look for ways to optimize your application, do pay attention to the data access layer. Patterns like (n+1) queries can be easily optimized by replacing multiple queries that get data from one table at a time with queries that can join multiple tables and only get the desired result sets. Use of stored procedures is definitely another way to optimize the data access layer. However, ensure that you don’t replace your queries that get data from a single table with a stored procedure because you won’t get any optimization results with this approach.