Easysoft ODBC-ODBC Bridge Performance White Paper

Contents

1.0 General

This paper aims to provide all the information to application developers and users to get the best possible throughput from their use of the Easysoft ODBC-ODBC Bridge (OOB) Most of this information is OOB specific but in some cases it is driver or application/interface specific.

1.1 Document Conventions

Some sections in this document provide pseudo code. The pseudo code uses constructs that should be familiar to anyone with some programming experience.

Languages differ in their use of string concatenation. Some use '+', some use '.', both or don't need a string concatenation operator. We have used a '+' character to denote string concatenation.

1.2 Applications/Interfaces

The applications and interfaces specifically addressed in this paper are MS Access, Perl DBI/DBD::ODBC, PHP. You should realise that all of these applications and interfaces are still being developed so in some cases the best source of information will be the mailing lists and news groups associated with those products.

2.0 General Advice

2.1 ODBC/Logging

ODBC logging is your friend. If you want to see exactly what ODBC calls are being made from your application or interface turn ODBC tracing on in the ODBC driver manager.

For Windows this may be done via the ODBC administrator by selecting the Tracing tab, entering a valid trace log filename and selecting "Start Tracing Now".

For unix and the unixODBC driver manager, edit the odbcinst.ini file and add a section like this:

[ODBC]
Trace      = No 
TraceFile  = /tmp/sql.log

Turning on tracing is extremely useful if you did not write the application or interface and don't have the source code. The driver manager will log all the ODBC calls and you will get a much better idea of what ODBC calls are being made and the access patterns of the application.

However, bare in mind that enabling tracing in the driver manager is the last thing you want to do for high throughput as tracing is very time consuming and can generate huge log files.

Unfortunately there are a number of places and ways in which tracing may be enabled and in many cases we see, some form of tracing is slowing the database connection down. Check the following list to make sure tracing is not being turned on in a production environment:

  1. Check all odbc.ini files for lines containing "Logging=n" where n > 0. Comment them out or set the value to 0.
  2. If you are using the unixODBC driver manager, check odbcinst.ini files and make sure the [ODBC] section does not have "Trace = yes". If it does set Trace to "No" or remove the line entirely.
  3. Make sure you have not turned on tracing at the server end. You can check this through a web browser to http://server_machine:8890 and then Configuration. Logging should be set to 0. Alternatively on UNIX examine the server odbc.ini/esoobserver.ini file or on Windows examine the registry for:

    HKEY_LOCAL_MACHINE\
    SOFTWARE\
    Easysoft ODBC-ODBC Bridge\
    Configuration\
    System\
    Settings\
    Logging
    
  4. If using Windows make sure tracing in the driver manager is not turned on. You can check this by going in to the ODBC Administrator and clicking on the tracing tab.

  5. Check your application is not turning ODBC tracing on by calling SQLSetConnectAttr(SQL_ATTR_TRACE, SQL_OPT_TRACE_ON).

2.2 OOB Authentication

By default the OOB Server authenticates each user connecting to it.

The client passes an operating system username and password to the server which the OOB Server then passes to the operating system to validate. In both Windows and UNIX the OOB Server then becomes that user for the duration of the client connection.

In Windows, in particular this can be quite a lengthy process depending on where your primary domain controller is located and what it is doing. If you are happy accessing the OOB Server without operating system username/password authentication is safe for your network then disabling OOB Server authentication can make a big difference to connection times.

Here are some sample times from Linux to Windows where the Windows machine is the PDC and is running MS SQL Server (which we use as the database). The results are the average of 50 consecutive connections.

time in seconds with authentication enabled (default) : 0.486s
time in seconds with authentication disabled          : 0.174s

If your PDC is busy or it takes longer to contact it, the difference between authenticated or not might be much larger.

Neither of these times are likely to bother a normal user logging in to a workstation on your network but if say the OOB is running under your web server to connect to your remote database thousands of times a day then this may prove a significant difference.

To disable OOB Server authentication change the configuration in the OOB Server HTTP administrator (see Configuring the OOB Server in the manual).

2.3 Connection Pooling

Connection pooling is where the driver manager keeps hold of open connections when SQLDisconnect is called so they may be used again. The driver manager stores the connection string used to SQLDriverConnect (or SQLConnect) with each connection it opens so it may be reused if SQLDriverConnect/SQLConnect is called with the same connection string. Usually there is a timeout associated with a pooled connection such that if the pooled connection is not used again within the timeout period the driver manager will drop it.

Connection pooling can significantly speed up most applications which repeatedly open and close ODBC connections.

With OOB there are two sides of the operation to consider, the client end and the server end; as both connect to an ODBC driver manager. Connection pooling at the OOB Server end will speed up access to the final ODBC driver for you database. Connection pooling at the OOB client end will speed up access to the OOB Server.

To enable connection pooling in Windows you need to edit the ODBC driver manager configuration using the ODBC configuration utility. Select the "Connection Pooling" tab, double click the driver you are using, click "Pool Connections to this driver" and enter a timeout in the field labelled "Time that unused connections remain in the pool in seconds".

An example of this dialogue can be found here

For the unixODBC driver manager, edit the odbcinst.ini file, add "Pooling = Yes" to the [ODBC] section and then add a "CPTimeout" value to the driver section e.g.

[ODBC]
Trace       = No
Trace File  = /tmp/sql.log
Pooling     = Yes

[OOB]
Description = Easysoft ODBC-ODBC Bridge
Driver      = /usr/local/easysoft/oob/client/libesoobclient.so
Setup       = /usr/local/easysoft/oob/client/libesoobsetup.so
FileUsage   = 1
CPTimeout   = 120

As an example of the difference connection pooling can make we run the same test as in "OOB Authentication" above but with connection pooling enabled. The average time per connection after 50 consecutive connections was 0.007s.

There are a few additional considerations you should note:

  1. Pooled connections which have timed out are only closed when another connection to the same driver manager is made.
  2. Pooled connections are generally only process wide.

    This is a very important point especially with respect to the OOB Server which normally runs in a multi-threaded mode but can run in a multi-process mode. If the OOB Server is running in Multi-Process mode then connection pooling at the server is not viable as the OOB Server starts a new process for each incoming connection.

    In addition, a popular initial candidate for using pooled connections at the client end is a web server like Apache. Here again (like above) you need to remember that most web servers fork child processes to handle the connections. The pooled connections are only within the individual web server processes and not web server wide unless the web server is multi-threaded.

  3. Pooling connections may cause you to have more open database connections at any one time than you did prior to pooling connections (this may have licensing implications within your database).

    e.g. Suppose you enable connection pooling for driver X and run applications from one or more clients which make connections to multiple data sources all in the same database. If the connection pooling timeout is set to 120 seconds but 120s elapses before that connection is reused you effectively had a connection open for an additional 120s (over non-pooled scenario). If you have limited licenses for your database (or for OOB) then this could cause you to hit the license limit. The worst scenario would involve many clients which connect infrequently as each of their connections would be held open for at least 120s after they disconnected and possibly a lot longer if no other connections were made through the same driver manager.

    A classic case of this is a web server (see above). Most web servers create multiple processes to handle HTTP requests. If you enable connection pooling then you will potentially have N open database connections (where N is the number of HTTP Server processes). Since you generally have little control over how the main web server process hands off pages to its child processes with sufficient traffic you eventually end up with a lot of pooled connections.

    e.g. Assume you configure your web server to only allow it to start 5 child processes to handle connections. Assume you have enabled connection pooling, a timeout of 120 seconds and you have one page on your web site that requires ODBC access. Assume the page that requires ODBC access is hit once every 20 seconds. If the web server hands out the HTTP request for the ODBC enabled page to a different child each of the first 5 times the page is hit then you have 5 concurrent connections to the database after a 100 seconds. Without connection pooling you would only have had 1 concurrent connection at any one time.

Please refer to the OOB knowledge base article Why do my connections to the OOB server seem to be dropped after a while?

2.4 Persistent Connections

Persistent connections are really just the same mechanism as Connection Pooling (above) except that in general the application/interface does the pooling instead of the ODBC driver manager.

Where possible we would recommend using ODBC driver manager connection pooling over any application specific persistent connection method.

Please see "Connection Pooling" as most of the notes there are applicable to persistent connections.

Please refer to the OOB knowledge base article Why do my connections to the OOB server seem to be dropped after a while?

2.5 Windows:OOB Server threads versus processes

The OOB Server for windows may be configured to run in Multi-Process mode or Multi-threaded mode. In Multi-Threaded mode (the default) the OOB Server starts a new thread to handle each incoming client connection. In Multi-Process mode the OOB Server starts a new process for each incoming connection.

Multi-Process mode in the OOB Server was originally written to allow connection to thread-unsafe ODBC drivers (e.g. JET implementations before JET 4.0).

There are a few differences between running the server Multi-Threaded and Multi-Process which can impact throughput and resources:

  1. Windows tends to create threads quicker than processes. As a result connection times for a Multi-Threaded OOB Server are slightly less.

    For our connection test mentioned above, the multi-threaded server handled 150 consecutive connections from the same application in 20.3 seconds compared with 22.7 seconds in the multi-process server.

  2. Windows seems to allocate CPU in a fairer way between processes than threads in a process. As a result, the Multi-Process OOB Server tends to be quicker after connection when there are multiple concurrent connections.
  3. The Multi-Threaded OOB Server uses less resources (machine wide handles, memory specifically).

2.6 UNIX: inetd versus standalone server

The OOB Server for UNIX is installed by default under {x}inetd. This means inetd listens on a port on behalf of the OOB Server and when a connection is made to that port it runs a shell script which starts an OOB Server process which inherits the socket connection. Some sites also use tcpwrappers in their inetd configuration file to provide access control to the OOB Server.

The OOB Server may be run in standalone mode without inetd. In this mode the OOB Server listens on the port and forks a child process to handle each connection.

Running the OOB Server in standalone mode is quite a bit quicker because there is a lot less to do (one less fork, server configuration file read on startup instead of once per connection etc). It also has the advantage that you can run the OOB HTTP administration server to see what is going on in the OOB Server.

See "OOB the Server configuration" chapter in the manual or knowledge base article Can I run the OOB Server on UNIX without inetd SuperServer

The inetd server only has one advantage; if the main standalone OOB Server terminates no connections are possible until it is restarted This is not true of an inetd started server.

If you are using {x}inetd and making alot of connections in short periods of time please see the OOB knowledge base article Why is inetd terminating the OOB Server service?.

2.7 Reverse Lookups on client IP addresses

When a client connects to the OOB Server it can do a reverse lookup on the client's IP address to find out who they are. The resulting string may be used in the OOB's audit trail file and various notices the OOB Server issues.

To do this successfully, you need a working DNS server or a hosts lookup table with the clients IP address. If you DNS server is not contactable most DNS clients will make multiple attempts and this will slow connections down alot.

For fastest possible connections disable the OOB Server's ReverseLookup in its configuration (see the OOB Server configuration chapter of the manual for how to do this).

Please see the question Do you have to have DNS working for correct operation of the OOB? in the OOB knowledge base for further relevant information.

2.8 Access Control and OOB Server limits

The standard OOB Server contains access control rules defining which clients can access the server. The Enterprise server contains an additional access control list for access to specific data sources.

These access control lists are a vital part of the security built in to OOB but then they can also consume time during initial connection. For this reason you should try and keep your access control rules short and concise.

There are two connection-based limits in the OOB Server; MaxThreadCount and MaxClientConnect. MaxThreadCount limits the total number of threads or processes the OOB Server creates to handle client connections. MaxClientConnect limits the total number of concurrent connections from a particular client.

Both of these limits are useful in throttling the OOB Server but they have one draw back. At any one time, the OOB Server does not know accurately how many active connections there are (see the OOB manual for a description of why). As a result, when you set either of these limits the OOB Server has to cycle through the created threads/processes and find out if any have terminated. If you have many concurrent connections this can lengthen the connection time quite considerably.

2.9 ODBC data sources

A data source name is a logical name for a data repository or database. Any attributes that define a data source are stored under the DSN for retrieval by the driver.

In the case of the OOB, the data source name is the key to a set of attributes in the odbc.ini or the system information (in Windows).

When the application calls SQLDriverConnect or SQLConnect with a DSN, the driver manager uses the DSN to locate the relevant driver, loads the driver and calls SQLDriverConnect/SQLConnect in the driver with the same connection strings.

The OOB client can look up other attributes it needs for the connection in the odbc.ini or system information.

When the OOB Server calls SQLDriverConnect or SQLConnect it is acting like an ordinary ODBC application.

You should consider exactly what happens in the driver manager when you call SQLDriverConnect/SQLConnect with a data source name. In Windows the driver manager needs to locate the data source name in the registry, pull out the name of the driver, load the driver and then pass the ODBC calls on to the driver. The same is true for UNIX except the registry is replaced with the odbcinst.ini and odbc.ini files. Parsing these files when there are a large number of DSNs can make a large difference to connection times.

This is a slightly exaggerated example but here are some timings for an odbc.ini file containing one DSN and an odbc.ini file containing 60 DSNs with the DSN we want last in the file:

timings for 50 connects/disconnects
one DSN in odbc.ini file                               : 28s
60 DSNs in odbc.ini file and required DSN last in file : 42s

NOTE: In later unixODBC versions the difference is significantly less if all the connects/disconnects are done in the same process because the unixODBC driver manager caches values parsed from the ini files.

There is another aspect with respect to connection attributes. If your application calls SQLConnect/SQLDriverConnect with a DSN name and nothing else then the OOB client needs to examine the registry/odbc.ini to obtain the other attributes it needs to complete the connection to the server (e.g. Server, TargetDSN etc). Any connection attribute that may be specified in the registry/odbc.ini may be specified in the connection string. For optimum speed or convenience (DSN-less connections) you may want to pass all the required connection attributes in the connection string.

On UNIX there is one last potential possibility for slowing down the connection which is related to how the OOB client ODBC driver and the unixODBC ODBC driver manager locate DSNs. The OOB client looks for DSNs in the following places (in order):

ODBCINI (environment variable)
<cwd>odbc.ini
<cwd>.odbc.ini (UNIX only, not VMS)
<home>odbc.ini
<home>.odbc.ini (UNIX only, not VMS)
/etc/odbc.ini (SYS$ROOT:odbc.ini for VMS) - system DSNs only
<wherever unixODBC stores DSNs>

Obviously it takes time to examine the file system for all these files, read them and look for the DSN. The sooner your DSN is found the quicker the initial connect will be. In particular, if you have built the unixODBC driver manager yourself the configure script defaults sysconfdir (where unixODBC looks for system DSNs) to /usr/local/etc/odbc.ini. The OOB client does not look in /usr/local/etc/odbc.ini itself, partly because unixODBC can be configured to use any directory. Instead, if the OOB client has not found your DSN it attempts to dynamically load unixODBC's libodbcinst shared object and then use the SQLPrivateProfileString API to retrieve DSN attributes. This process takes alot longer than if OOB finds the DSN itself. One way to speed this up to locate the odbc.ini unixODBC uses for system DSNs and then symbolically link /etc/odbc/ini to that file. In this way OOB locates the DSN alot quicker and unixODBC continues to work.

2.10 Client-side threads

One way you might think will automatically increase the throughput from your application is to make it multi-threaded. This can be true but there are a few simple situations to think about first:

  1. Can you divide your database access up in a meaningful way to distribute the work between threads?
  2. Are you using a thread-safe OOB client? The Windows OOB client ODBC driver is thread-safe but for UNIX there are separate distributions for thread-safe and thread-unsafe drivers. The UNIX distributions containing thread-safe OOB clients and unixODBC driver manager have "-mt" in the distribution name and contain OOB client shared objects with "_r" in the name.
  3. Making multiple threads use the same connection is not going to produce any performance increase. The reason for this is that the OOB client's network library serialises access to the socket connected to the server (there is one socket per ODBC connection). To get the maximum benefit from multiple threads using ODBC each thread should run with its own ODBC connection handle.

2.11 Connection Attempts and Licensing

There are a few remaining configurable parameters for the OOB client and server which can affect connection times.

2.11.1 ConnectAttempts

For the OOB client ConnectAttempts specifies how many attempts the client will make to connect to a server.

The OOB client will make ConnectAttempts connection attempts if the server operating system returns ECONREFUSED (connection refused). The reason for this is that on some platforms (notably NT workstation) the listen backlog can only be set to a low level so if there is a sudden rush of connections the operating system may turn some down before the OOB Server sees them. When ECONREFUSED is received by the client it waits connection_attempt * 0.1 seconds before trying again up to a maximum of ConnectAttempts times.

Here is an example where ConnectAttempts is 5 (the default) and the server always refuses the connection:

connect attempt 1 - refused by server
wait 0.1 seconds
connect attempt 2 - refused by server
wait 0.1*2 seconds
.
.
.
connect attempt 5 - refused by server
client gets error diagnostic:

08001:1:4:[Easysoft ODBC (Client)]
  Client unable to establish connection
HY000:2:4:[Easysoft ODBC (Client)]
  Connection refused, connect(), after 5 attempts

and here is an example where the first few attempts are refused due to the listen backlog queue being full but by the third attempt the backlog is cleared and the connection accepted:

connect attempt 1 - refused by server
wait 0.1 seconds
connect attempt 2 - refused by server
wait 0.1 * 2 seconds
connect attempt 3
 connection accepted by server operating system

ConnectAttempts is 5 by default but it may be modified in the OOB Settings off the DSN dialogue (Windows) or in the DSN section of your odbc.ini file. e.g.

[mydsn]
Driver = OOB
.
.
ConnectAttempts = 3

Please note: OOB used to have a global setting (NetConnectRetryCount) for all DSNs but this is deprecated now.

The value of ConnectAttempts takes on extra importance if you are using OOB Enterprise Server or OOB2 Server and you have specified multiple servers and ports in a client DSN entry. In this scenario you specifically want an OOB client which cannot contact the first OOB Server to fallover to the second and subsequent servers as quickly as possible. For this situation you want ConnectAttempts to be as low as possible e.g. 1.

2.11.2 RetryCount and RetryPause

The RetryCount and RetryPause configurable parameters affect what the OOB Server does under two situations:

  1. resource use on your server is such that the OOB Server is unable to create a new thread or process to handle the current incoming connection.
  2. you have used all your available license slots.

In each of these cases the RetryCount value specifies the number of additional attempts the server will make to obtain this resource and the RetryPause is the time in seconds it will wait between attempts.

The default values are RetryCount=5 and RetryPause=3.

You can modify these values in the OOB HTTP Administrator.

The default values are set in an attempt to let you maximise the use of your license slots. If you have many clients connecting and disconnecting then there may be times when all your license slots are in use for short periods of times. Rather than just return to the client, the server is assuming (with the default values) that a license slot will become available soon. However, the net affect of the default values when no license slots become available is to delay informing the client that there are no license slots available for 15 seconds.

Only you can determine what your desired behavior here should be.

2.12 Disabling auditing, access control and failing SQL log

The OOB Server contains functionality to:

Using any or all of these options has a small effect on resources and processing in the OOB Server. If you are looking for the optimum performance and are prepared to lose these features then disable them in the OOB HTTP Administrator.

2.13 Trouble shooting slow connections

If you believe connections to the OOB Server from the client are taking too long then there are a number of things you can do to check this out. However, first you need to determine if the connection time to expect is realistic. As a rough guide we have found:

In our experiments it takes less than 1/10 second to connect an application linked directly with the OOB client on Linux to a data source on NT over a 100Mb Ethernet link where both machines are on the same LAN and are mostly idle and OOB Server authentication is disabled. This is a small fraction of a second slower for interpreted languages like Perl or if there is a driver manager (like unixODBC) between the application and the OOB client. It can be considerably slower on Windows if OOB Server authentication is enabled as a lot depends on what you PDC is doing and where it is located on your network.

If you are still convinced something might not be configured correctly then check the following:

  1. Are the client and server machines on the same Local Area Network (LAN)? Connection times can increase dramatically if they are not. Try pinging the server from the client and see what return times you get.

    e.g.

    ping www.easysoft.com
    PING www.easysoft.com (194.131.236.4): 56 data bytes
    64 bytes from 194.131.236.4: icmp_seq=0 ttl=255 time=0.6 ms
    64 bytes from 194.131.236.4: icmp_seq=1 ttl=255 time=0.1 ms
    

    In this case less than 0.6ms - quick. Machines not on the same LAN can show a much greater time and this will be reflected in your ODBC connect times.

    e.g.

    ping www.whitehouse.gov
    PING www.whitehouse.gov (193.50.203.51) : 56 data bytes.
    64 bytes from 193.50.203.51: icmp_seq=0 ttl=240 time=34.5 ms
    64 bytes from 193.50.203.51: icmp_seq=1 ttl=240 time=35.2 ms
    

    This is relatively slow.

  2. Check that ODBC tracing is not turned on. See 2.1 ODBC/Logging.
  3. If the ReverseLookup configurable parameter in the OOB Server is enabled disable it and test again. See 2.7 Reverse Lookups on client IP addresses.
  4. Check you are not running out of license slots temporarily. See 2.11.2 RetryCount and RetryPause.
  5. Check the size of your odbc.ini files/registry and the position of the DSN you are using within it. See 2.9 ODBC data sources.
  6. If you are using MS SQL Server see the advice under 4.1 MS SQL Server.
  7. If you are using Windows 2000 see the OOB knowledge base and the article Why have connection times to MS SQL Server slowed down since upgrading to Windows 2000 Server?.
  8. Work through the examples of using oobping in 2.13.1 Using oobping to help diagnose slow connections.

2.13.1 Using oobping to help diagnose slow connections

oobping is a small program distributed with the OOB since version 1.0.0.35. oobping is a valuable tool for checking OOB connectivity and diagnosing connection problems or connection timing issues.

A full description of oobping may be found in the OOB manual and the OOB knowledge base articles

For Windows distributions you will find oobping.exe in installdir\Easysoft\Easysoft ODBC-ODBC Bridge.

For UNIX OOB distributions there are two versions of oobping called oobpings (a statically linked version) and oobpingd (a dynamically linked version linked against the libesoobclient shared object) which may be found in /usr/local/easysoft/bin. To use oobpingd you may need to set and export your LD_LIBRARY_PATH/LD_RUN_PATH/LIBPATH to include /usr/local/easysoft/oob/client and /usr/local/easysoft/lib.

oobping has the following command line:

oobping [-h host | -d ODBC_connection_string] {-t port}
        {-u osuser} {-p ospassword} {-e}

where:

-h host The name or IP address of the machine where the OOB Server is.
-d ODBC_connection_string The ODBC connection string is that defined by ODBC. i.e. a list of semi-colon separated attribute=value pairs. e.g. DSN=test;UID=fred;PWD=bloggs;
If you specify -u and/or -p as well as -d "LogonUser=xxx;LogonAuth=yyy;" will be added to the end of the connection string where xxx and yyy are the values specified for -u and -p.
-t port The port on which the OOB Server is listening
-u osuser A valid username on the "host" operating system
-p ospassword A password for the user specified with -u
-e Show time to complete the requested operation (post OOB 1.1.0.0)

From OOB after version 1.1.0.0, oobping has the -e option which times the requested operation. This can be invaluable in diagnosing a slow connection and determining which phase the problem is occurring in.

If you don't have a version supporting the -e switch then you can use the common UNIX "time" function in front of the oobping command instead.

A simple example is:

==========
oobping -e -h myserver -t 8888
Host: myserver, Port: 8888
Attempting connection...OK
Examining Server...
    OOB Server Version: 1.1.0.00
    OOB Server Name: OOB
Time for execution: 0.16s
==========

If this is repeated with -u and -p you can work out the extra time required to perform the operating system logon:

==========
oobping -e -h myserver -u myosuser -p mypassword
Host: myserver, Port: 8888
Attempting connection...OK
Examining Server...
    OOB Server Version: 1.1.0.00
    OOB Server Name: OOB
Trying to authenticate...OK
Time for execution: 0.52s
==========

This example clearly demonstrates the extra time required to authenticate a user.

As a final example the next example shows a full connection to the OOB Server, OOB server authentication and remote ODBC connection:

==========
oobping -e -d "DSN=test;"
Using Connection string :
DSN=test;
Connected OK

01000:1:5701:[NetConn: 0172c620][Microsoft][ODBC SQL Server Driver]
  [SQL Server]Changed database context to 'test'.
01000:2:5703:[NetConn: 0172c620][Microsoft][ODBC SQL Server Driver]
  [SQL Server]Changed language setting to us_english.
OutConnectionString:
DSN=test;SERVER=myserver;TRANSPORT=tcpip;PORT=8888;TARGETDSN=test;
  UID=dbuser;PWD=dbauth;LOGONUSER=me;LOGONAUTH=mypassword;

Connected to database:  test
DBMS Name:  Microsoft SQL Server
Driver Name:  esoobclient
Driver Version:  01.00.0043
Disconnecting
Time for execution: 0.78s
==========

Here you can see the extra time required to connect to the remote ODBC data source.

2.14 Simple SQL advice

This is some simple advice on SQL use which may help you prevent wasting valuable cpu cycles in your code:

  1. When issuing queries only select the columns you need. Generating result-sets containing columns you don't need just wastes time in the driver. Worse still, some drivers will have no choice but to read the column data from the database even though you never access it. Doing this also helps protect your application from table changes (e.g. the addition of a new column).
  2. Try batching your SQL. An Application that builds several SQL statements to execute often realizes better performance if it batches the statements together and sends them to the server all at once. e.g.

    select * from table1; select * from table2

    You can use SQLMoreResults() to get the second result-set.

  3. If your database supports stored procedures you can often realise performance gains by coding your SQL statements into them. In most cases:

    • The SQL statements are parsed and compiled once when the procedure is created and not when the procedure is used by your application.
    • Often a call to a stored procedure requires sending less data over the network between the ODBC driver and the database than the equivalent SQL.

3.0 Applications/Interfaces

3.1 Using ODBC directly

In ODBC there are multiple ways of retrieving query result-sets and inserting data. Picking the right one for your application is crucial in determining throughput. Here we examine the simplest ways of retrieving result-sets and inserting data and then gradually improve the methods to get more speed. The code examples are pseudo code.

To provide meaningful results we have used a table containing 5 columns, a integer, a char(30), a varchar(255), a timestamp and a float. The table is created with 50000 rows.

3.1.1 Retrieving Result-sets

The simplest mechanism for issuing a query and retrieving a result-set is:

SQLExecDirect("query producing a result-set")
columns = SQLNumResultCols()
while (SQLFetch() succeeds)
{
   for (col = 1; col <= columns; col++)
   {
     column_data = SQLGetData(col, SQL_C_CHAR)
   }
}

Here all the columns are retrieved as SQL_C_CHARs so the database has to convert the integer column and the timestamp column to a string.

The C code for the above algorithm took 47 seconds to run.

One possible way it could be easily modified that might speed it up is to fetch the columns as their native types, e.g. the integer as a SQL_C_LONG, the timestamp as a SQL_C_TIMESTAMP and the float as a SQL_C_FLOAT. This would avoid the ODBC driver converting the native type to a SQL_C_CHAR string. The pseudo code for this looks like:

SQLINTEGER idata
SQLFLOAT fdata
SQL_TIMESTAMP_STRUCT ts
SQLCHAR cdata1[256]
SQLCHAR cdata2[256]

idata = SQLGetData(1, SQL_C_LONG);
cdata = SQLGetData(2, SQL_C_CHAR);
cdata2 = SQLGetData(3, SQL_C_CHAR);
ts = SQLGetData(4, SQL_C_TIMESTAMP);
fdata = SQLGetData(5, SQL_C_FLOAT);

However, when we run this version of the program we see it takes 62 seconds to run (longer). The main reasons for this are:

  1. the floats representation on the server platform are not necessarily the same as the client platform so OOB has to convert them from the server's representation to the client's representation.
  2. the timestamp is a structure which the OOB splits up into the fields of the structure to transfer over the network and then has to put the timestamp back together at the client end.

Doubles are the same as floats in this respect and SQL_DATEs, SQL_TIMEs and SQL_NUMERICs are the same as timestamps.

One way we might be able to speed it up is to use ODBC bound columns. With bound columns you bind memory addresses to each column and when SQLFetch is called the entire row is returned in one go without having to call SQLGetData individually for each column. The pseudo code for this is:

SQLINTEGER idata
SQLFLOAT fdata
SQL_TIMESTAMP_STRUCT ts
SQLCHAR cdata1[256]
SQLCHAR cdata2[256]

SQLBindCol(1, &idata)
SQLBindCol(2, cdata1)
SQLBindCol(3, cdata2)
SQLBindCol(4, &ts)
SQLBindCol(5, &fdata)

SQLExecDirect("query producing a result-set")
while (SQLFetch() succeeds)
{
  ; SQLFetch has placed the column data into our variables
}

The time for this implementation is 22 seconds, half of what we started with. The main reason for this being a lot faster is that the OOB Server can send an entire row back to the client in one network call. A good rule of thumb for OOB is reducing the number of remote ODBC calls speeds the application up.

The last example used column-wise binding but we only retrieved one row at a time. ODBC has a facility for returning more than one row of column-bound columns at a time. To enable this you call SQLSetStmtAttr() with an argument of SQL_ATTR_ROW_ARRAY_SIZE and pass in the number of rows you want in one go.

The pseudo code looks like this:

define ROWS 10
SQLINTEGER idata[ROWS]
SQLFLOAT fdata[ROWS]
SQL_TIMESTAMP_STRUCT ts[ROWS]
SQLCHAR cdata1[ROWS][256]
SQLCHAR cdata2[ROWS][256]

SQLSetStmtAttr(SQL_ATTR_ROW_ARRAY_SIZE, ROWS)

SQLBindCol(1, &idata)
SQLBindCol(2, cdata1)
SQLBindCol(3, cdata2)
SQLBindCol(4, &ts)
SQLBindCol(5, &fdata)


SQLExecDirect("query producing a result-set")
while (SQLFetch() succeeds)
{
  ; SQLFetch has placed the column data into our variables
}

When we run this implementation the time to retrieve the 50000 rows is down to 5 seconds, 1/9 of our first implementation.

The speed increase of using bound columns is usually greater the more columns there are in the result-set. In addition, setting SQL_ATTR_ROW_ARRAY_SIZE > 1 tends to provide larger amounts of consecutive bytes for column data which OOB will favor for data compression when transferring the data over the network.

There is one other alternative, which is row-wise binding of columns. In row-size binding, you create a structure with one field for each column, ask for row-wise binding and then pass the address of each field in the structure to SQLBindCol and the size of the structure. It is basically the same as column-wise binding except when you set SQL_ATTR_ROW_ARRAY_SIZE > 1. With column-wise binding each address passed to SQLBindCol is supposed to be the base address of an array of that column type. With row-wise binding you create an array of your row structures.

For OOB, row-wise binding is a little slower. The OOB client does not actually pass the call to set SQL_BIND_TYPE through to the driver when the type is anything other than xxx_BIND_BY_COLUMN. OOB leaves the bind type set to column-wise binding, notes the application wants row-wise binding and converts the returned data to row-wise. This makes no difference to any application which uses row-wise binding as it works as ODBC dictates but it is one of the few instances in OOB where what the application asks for (in ODBC terms) is not mirrored at the server side.

3.1.2 Inserting data

Unsurprisingly, much of the examples in the previous section apply to inserting data.

These examples use the same table structure as in the previous section but this time we want to populate the table with 50000 rows.

The simplest pseudo code to populate our table is:

SQLCHAR sql[1024]
for (i = 1; i <= 50000; i++)
{
    sql =  "insert into perf values " +
          "(" + i + ", 'this is a char thirty', " +
          "'this is a varchar 255', " +
          "{ts '2002-08-01 11:12:13.123'}, " + i * 1.1 + ")"
    SQLExecDirect(sql);
}

Here we construct the SQL insert statement once for each row and call SQLExecDirect once for each row. When we run this code it takes 5 minutes and 10 seconds. Why? The first problem with this code is the database has to parse and prepare the insert statement once per row; this can be time consuming.

A much more efficient method is to used a parameterized insert. Here parameters are using in place of the column data. Instead of the above SQL we would use:

insert into perf values (?,?,?,?,?)

We prepare this SQL once (by calling SQLPrepare), bind the parameters with SQLBindParameter and then just keep calling SQLExecute, changing the data we bound each time. e.g.

SQLINTEGER idata
SQLCHAR cdata1[256]
SQLCHAR cdata2[256]
SQL_TIMESTAMP_STRUCT ts
SQLREAL fdata

SQLPrepare(insert into perf values (?,?,?,?,?)

SQLBindParameter(1, idata)
SQLBindParameter(2, cdata1)
SQLBindParameter(3, cdata2)
SQLBindParameter(4, ts)
SQLBindParameter(5, fdata)

for (i = 0; i <= 50000; i++)
{
    set idata, cdata1, cdata2, ts, fdata to whatever values you
    want to insert
    SQLExecute()
}

However, if you run this through OOB you'll find it does not make any real difference. The reason for this is that in the first case with straight forward SQL insert you are making one network call per insert whereas with the parameterised inserts there are two network calls; one for SQLExecute and one to send the parameters (OOB also has some extra work to pack up the parameters).

With many databases a very easy speed up if you are doing bulk inserts, say copying large quantities of data from one place to another is to turn autocommit off and commit the inserts at the end. This can half the insert time.

e.g.

Add the above pseudo code:

SQLSetConnectAttr(SQL_ATTR_AUTOCOMMIT, SQL_AUTOCOMMIT_OFF)
<pseduo code above>
SQLEndTran(SQL_HANDLE_DBC, connection_handle, SQL_COMMIT)

When we apply this change and run to MS SQL Server, DB2 or Oracle (others may be the same) the time comes down to around 2 minutes 30 seconds; this is nearly half of what we started with. There are two things to remember about this:

  1. If you only commit at the end of all your inserts then if you abort when any SQLExecute fails none of your inserts are committed.
  2. Individual inserts will not be visible to other users until you have finished all your inserts and committed them. If this bothers you then perhaps you can still get improved speed by committing less often.

One final change we can make is to use arrays of parameters (as we did in row fetching). To do this you call SQLSetStmtAttr() and set the SQL_ATTR_PARAMSET_SIZE to a number > 1. e.g.

#define ROWS 10
SQLINTEGER idata[ROWS]
SQLCHAR cdata1[ROWS][256]
SQLCHAR cdata2{ROWS][256]
SQL_TIMESTAMP_STRUCT ts[ROWS]
SQLREAL fdata[ROWS]

SQLSetStmtAttr(SQL_ATTR_PARAMSET_SIZE, ROWS)

SQLPrepare(insert into perf values (?,?,?,?,?)

SQLBindParameter(1, idata)
SQLBindParameter(2, cdata1)
SQLBindParameter(3, cdata2)
SQLBindParameter(4, ts)
SQLBindParameter(5, fdata)

for (i = 0; i <= 5000; i++)
{
    set idata, cdata1, cdata2, ts, fdata to whatever values you
    want to insert (* 10)
    SQLExecute()
}

Note this code only does 5000 iterations as you are inserting 10 rows at a time. The time for this version was 34 seconds. If you take this further inserting 50 rows at a time it falls again to 18 seconds but there is usually a limit at which the speed increase bottoms out. This final figure works out as 2777 inserts a second.

3.1.3 Cursors

3.1.3.1 In General

In ODBC a cursor is automatically opened on each result-set and an application retrieves rows through the cursor using SQLFetch, SQLFetchScroll or SQLExtendedFetch. An application can change the cursor using SQLSetStmtAttr before executing a SQL statement but it should be noted that the driver may change these settings as it sees fit in which case it returns an "option value changed" diagnostic (see later).

The default settings for cursor options are:

SQL_ATTR_CURSOR_TYPE - SQL_CURSOR_FORWARD_ONLY
SQL_ATTR_CONCURRENCY - SQL_CONCUR_READ_ONLY
SQL_ROWSET_SIZE/SQL_ATTR_ROW_ARRAY_SIZE - 1

An application can change the SQL_ATTR_CURSOR_TYPE to one of the following types:

  1. Static cursors (SQL_CURSOR_STATIC)

    In a static cursor the result-set is unchanging once a particular row has been fetched (for some drivers the result-set is built when the cursor is opened). The cursor does not reflect any changes made in the database with respect to fetched rows in the result-set or values in the columns i.e. a static cursor always returns the result-set as it was at the time the cursor was opened or a row was fetched. Any rows added/deleted or column data changed since the cursor was created that satisfy the conditions of the query are not seen (although static cursors may detect the same applications own updates, deletes or inserts) until the cursor is reopened. Static cursors are read-only.

  2. Dynamic cursors (SQL_CURSOR_DYNAMIC)

    Dynamic cursors are the opposite of static cursors; they reflect all changes made to the rows in the result-set as the user scrolls around the cursor i.e. the data values and rows in the cursor can change dynamically on each fetch.

    The cursor shows all deleted, inserted and updated statements whether they were made by the application with the dynamic cursor or something else.

    You cannot use fetch absolute to position the cursor somewhere in the result-set as the size of the result-set and the position of the rows within the result-set are not constant.

  3. Forward-only cursors (SQL_CURSOR_FORWARD_ONLY)

    This is the same as a dynamic cursor except you can only fetch the rows in the result-set in sequence from the start to the end of the cursor.

  4. Keyset-driven (SQL_CURSOR_KEYSET_DRIVEN)

    With keyset-driven cursors the rows in the result-set and their order is fixed when the cursor is opened. Keyset-driven cursors are controlled using a set of unique identifiers (keys) known as the keyset. The keys are built from the set of columns that uniquely identify the rows. The keyset is the set of all the key values that made up the rows in the result-set when the cursor was opened.

    Changes to the values of columns which are not in the keyset (whether by the current application or something else) are reflected in the rows as the cursor is scrolled through. Inserts are not reflected (the cursor needs to be closed and reopened to see inserts). Rows which are deleted generate an invalid cursor position error if they are fetched. Likewise, updates to key columns operate like a delete of the old key followed by an insert on a new key and so the cursor does not see the new row.

  5. Mixed cursors

    e.g. SQL_CURSOR_KEYSET_DRIVEN and SQL_KEYSET_SIZE=n

    ODBC cursors support the concept of a rowset, which is the number of rows returned on an individual call to a fetch function. For example, an application which is capable of displaying only 20 rows at a time in the window might set the rowset size to 20 so that each fetch returned one window's depth of rows.

Cursor operations are also affected by the concurrency options set by the application. These are SQL_CONCUR_READONLY, SQL_CONCUR_VALUES, SQL_CONCUR_ROWVER and SQL_CONCUR_LOCK. SQL_CONCURRENCY affects how the database engine locks rows in the result-set.

The locking behavior of cursors is based on an interaction between the concurrency options and the transaction isolation level set by the application. You set the transaction isolation level with SQL_ATTR_TXN_ISOLATION which made be set to:

In addition, an application can specify the characteristics or a cursor rather than specifying the cursor type e.g. scrollability by setting SQL_ATTR_CURSOR_SCROLLABLE or sensitivity by setting SQL_ATTR_CURSOR_SENSITIVITY. The driver then has to choose the cursor type that most efficiently provides that characteristic. Whenever an application sets any of the SQL_ATTR_CONCURRENCY, SQL_ATTR_CURSOR_SCROLLABLE, SQL_ATTR_CURSOR_SENSITIVITY or SQL_ATTR_CURSOR_TYPE statement attributes the driver makes any required change to the other statement attributes in this set of four so their values remain consistent.

An application that sets statement attributes to specify both a cursor type and a cursor characteristic is running the risk of obtaining a cursor that is not the most efficient method available in that driver of meeting the applications requirements.

Choice of a cursor depends largely on greater knowledge of how the database engine implements it and the correct choice can have a dramatic affect on performance/locking. As a basic rule, use the cursor and concurrency with sufficient functionality for your application and no more. However, there are other considerations depending on your database engine such as:

3.1.3.2 Cursor Library

The cursor library built into the ODBC Driver Manager attempts to imitate cursors when the ODBC driver does not have support for them. The principle settings are i) always use the cursor library (no matter what the ODBC driver supports) and ii) use the cursor library when required (i.e. if the ODBC driver does not support the required cursor). The cursor library only comes into play when an application binds columns. If the application binds columns and wants to move about the result-set with SQLFetchScroll (for instance) then the ODBC driver manager will keep a copy of the rows retrieved in bound columns so the application can move around the result-set even though the ODBC driver does not support this functionality.

Try and avoid the cursor library.

If your driver has support for the cursor you require then asking the driver manager to use the cursor library with SQLSetConnectAttr(SQL_USE_CURSORS, SQL_CUR_USE_ODBC) is a time waster. When you use SQL_CUR_USE_ODBC, the cursor library will generally convert all calls to SQLFetchScroll and SQLExtendedFetch into SQLFetch calls and will often ignore SQL_ROWSET_SIZE and SQL_ATTR_ROW_ARRAY_SIZS when calling the ODBC driver. i.e. if you set SQL_ROWSET_SIZE or SQL_ATTR_ROW_ARRAY_SIZE > 1 (to get extra performance) but then stipulate you want to use the driver manager cursor library then each call your app makes to SQLFetchxxx to get n rows at a time will be converted into n calls to SQLFetch.

3.1.4 Miscellaneous

Here are a few miscellaneous tips:

  1. Avoid repeatedly calling SQLDescribeParam and SQLDescribeCol on the same SQL. It is much better to cache these results yourself as these metadata calls can be expensive.
  2. If possible leave calling SQLDescribeParam/SQLDescribeCol until after calling SQLExecute as in some drivers this causes extra work.
  3. Try to avoid data conversions in SQLGetData, SQLBindCol etc. e.g. calling SQLGetData asking for a SQL_C_CHAR for a column which is a char.

3.2 Perl

3.2.1 DBD::ODBC

This section discusses use of Perl, perl modules DBI and DBD::ODBC, the unixODBC driver manager and OOB providing the ODBC driver. Perl DBI and DBD::ODBC are continually being improved so you should check you have latest versions.

The first thing we should point out is that you need to be realistic about your expectations with Perl. You are never going to get the kind of insert or select speeds from a Perl script that we have demonstrated with compiled C code in the previous sections.

DBI provides a generic interface to databases and DBD::ODBC the specific driver under DBI to access ODBC drivers. DBD::ODBC is written to work with any ODBC driver and as such it is difficult to provide specific optimisations like the ones in the previous ODBC section.

Some of the methods used by DBD::ODBC are not the most efficient use of ODBC for specific situations but are required for the generic nature of DBD::ODBC e.g. when some SQL is executed DBD::ODBC does not know if it was a select statement or an insert so a typical insert is followed by SQLNumResultCols(), SQLRowCount() and SQLMoreResults() calls all of which prove useless on a simple insert. This is not a criticism of DBD::ODBC it is simply recognising the difficulty of writing a super efficient DBD without knowing what the SQL is. These additional calls to SQL functions cause remote procedure calls to the server in OOB which slow perl scripts down.

As an example and using the same table described in the above ODBC section the following code took 15 minutes (55 inserts per second):

use DBI;
my $dbh = DBI->connect() || die "connect";
my $i;
my $sql;

for ($i = 1; $i <= 50000; $i++) {
    $sql = "insert into perf values (" . $i .
	", 'this is a char thirty', " .
	"'this is a varchar 255'," .
	"{ts '2002-08-01 11:12:13.123'}, " .
	$i*1.1 . ")";
    $dbh->do($sql);
}
$dbh->disconnect()

Unlike the C examples in the previous section, using parameterised inserts in Perl is quite a bit faster so the above code can be written as:

use DBI;

my $dbh = DBI->connect() || die "connect";

my $sql = "insert into perf values (?,?,?,?,?)";
my $stmt = $dbh->prepare($sql);

$stmt->bind_param(2, 'this is a char thirty');
$stmt->bind_param(3, 'this is a varchar 255');
$stmt->bind_param(4, '2008-08-01 11:12:13.123');

for (my $i = 1; $i <= 50000; $i++) {
    $stmt->bind_param(1, $i);
    $stmt->bind_param(5, $i*1.1);
    $stmt->execute();
}
$dbh->disconnect();

and the time for the 50000 inserts drops to 10 minutes (83 inserts a second). Using arrays of bound parameters like this:

use DBI;

my $dbh = DBI->connect()|| die "connect";

my $i;
my $sql;
$sql = "insert into perf values (?,?,?,?,?)";
my $stmt = $dbh->prepare($sql);
my (@data1, @data2, @data3, @data4, @data5);

for ($i = 0; $i < 10; $i++)
{
    $data1[$i] = $i;
    $data2[$i] = "this is a char thirty";
    $data3[$i] = "this is a varchar 255";
    $data4[$i] = "2002-08-01 11:12:13.123";
    $data5[$i] = $i * 1.1;
}
$stmt->bind_param_array(1, \@data1);
$stmt->bind_param_array(2, \@data2);
$stmt->bind_param_array(3, \@data3);
$stmt->bind_param_array(4, \@data4);
$stmt->bind_param_array(5, \@data5);

$data2 = "this is a char thirty";
$data3 = "this is a varchar 255";
$data4 = "2002-08-01 11:12:13.123";

my %status;

for ($i = 1; $i <= 5000; $i++) {
    $data5 = $i * 1.1;
    $stmt->execute_array(\%status);
}
$dbh->disconnect();

does not seem to make any real difference. This is probably because at this time array bound parameters are actually executed in ODBC terms with SQL_PARAMSET_SIZE = 1 so it is unsurprising there is no speed up over the previous example.

Using the same table of 50000 rows and writing simple perl to fetch the data we might use:

#!/usr/bin/perl
use DBI;
my $dbh = DBI->connect()|| die "connect";

my $sql;
$sql = "select * from perf";
$stmt = $dbh->prepare($sql);
$stmt->execute();
my @row;
while(@row = $stmt->fetchrow_array()){};
$dbh->disconnect();

This takes around 93 seconds (537 rows a second).

Using bound columns we would write:

use DBI;
my $dbh = DBI->connect()|| die "connect";

my $sql;
$sql = "select * from perf";
$stmt = $dbh->prepare($sql);
my $nf = $sth->{NUM_OF_FIELDS};
my @cols;
for (my $i = 1; $i <= $nf; $i++) {
    $sth->bind_col($i, \$cols[$i]);
}
$stmt->execute();
my @row;
while($stmt->fetch()){};
$dbh->disconnect();

but this appears to take a similar amount of time. This is no surprise for the same reasons given under the above ODBC section.

So the conclusion here is that if you want the best insert/fetch speed use C but if you want the flexibility/portability of Perl you need to accept it is going to be slower.

3.2.2 DBD::Proxy

For anyone wanting a Perl only solution DBD::Proxy offers similar functionality that OOB provides. DBD::Proxy transfers the DBI calls and data between the client and server. Since it is only doing the DBI calls and OOB is doing all the ODBC calls beneath DBI you would expect OOB to be slower than DBD::Proxy, however this is not usually the case.

Using the same tables listed in the ODBC section in this document the results below are for 50000 inserts and a select on all columns for the 5000 rows to an ISAM database using the Easysoft ODBC driver. All values in seconds and the average of 3 runs.

CPUu and CPUs are user and system CPU times.

testEasysoft ODBC-ODBC BridgeDBD::Proxy
 CPUuCPUsElapsedCPUuCPUsElapsed
insert19.7322.7284.1858.175.31151.54
select7.017.0931.942.520.6148.33

When these numbers are used to calculate the inserts per second and rows retrieved per second:

testEasysoft ODBC-ODBC BridgeDBD::Proxy
inserts per second594330
rows retrieved per second15651035

Of course, as we have shown turning auto commit off and committing all the 50000 inserts at the end speeds up the inserts per second with most databases but this works equally for both Easysoft ODBC-ODBC Bridge and DBD::Proxy.

DBD::proxy does have some optimisation settings like proxy_no_finish, proxy_quote and RowCacheSize. OOB has Block Fetch Mode but it does not apply to ODBC 2.0 applications (which DBD::ODBC is before version 0.45_2) and after 0.45_2 DBD::ODBC binds the columns itself anyway.

With the following settings the select test was run again:

proxy_quote = "local"
proxy_no_finish = 1
RowCacheSize = 100
testEasysoft ODBC-ODBC BridgeDBD::Proxy
 CPUuCPUsElapsedCPUuCPUsElapsed
select7.317.1732.193.60.1416.12

which puts the final rows retrieved per second at:

testEasysoft ODBC-ODBC BridgeDBD::Proxy
rows retrieved per second15533102

So, in their default and best configurations we could find, OOB was quicker than DBD::Proxy when inserting. OOB is also quicker than DBD::Proxy under default configuration when retrieving rows. However, if memory constraints permit you to use RowCacheSize with higher values then DBD::Proxy is quicker than OOB on row retrieval.

We started by saying OOB RPC's every ODBC call but DBD::Proxy acts at a higher level and only has to RPC DBI calls. One area DBD::Proxy might possibly do alot better is over encrypted connections. Our experience is that encryption of these sorts of data streams can be time consuming. The more network calls made the worse the effect of encryption becomes. There are so many different encrpytion tunnels, (e,g, SSH, zebedee) we have not verified this.

3.3 PHP

Most of the comments made under the Perl section above apply equally to PHP. PHP is however different from Perl's DBI/DBD::ODBC in that it provides a PHP interface to ODBC i.e. ODBC functions are wrapped in C code written for PHP. However, a lot of the flexibility of ODBC is hidden from the PHP user and this reduces the opportunity for tuning PHP code for the fastest ODBC access. PHP always uses bound columns when retrieving a result-set and all our experiments showed very little difference in speed between different insertion and retrieval methods.

Having said that PHP is quite fast. The standard table we have used throughout this document of 50000 rows containing 5 mixed type columns is retrieved in 90 seconds (555 rows a second).

If you are using PHP with MS SQL Server there are two important points you should note:

  1. PHP attempts to set the cursor type in any ODBC driver that successfully returns SQL_FD_FETCH_ABSOLUTE in the SQLGetInfo call for SQL_FETCH_DIRECTION. The cursor which is requested depends on the version of PHP. Older versions (around 4.0.n/4.1.n) set the cursor to SQL_CURSOR_DYNAMIC but later versions (4.2.n) seem to set the cursor to SQL_CURSOR_FORWARD_ONLY.

    We believe the original reason for setting the cursor to SQL_CURSOR_DYNAMIC was because for MS SQL Server this is a server-side cursor which allows multiple active statements. Server side cursors in MS SQL Server are more expensive (see the section on cursors in this document). SQL_CURSOR_FORWARD_ONLY is the default cursor and this is not a server-side cursor but does not allow multiple active statements.

    The other drawback of MS SQL Server server-size cursors is that they do not support all SQL statements. Server cursors do not support any SQL statements that generate multiple result-sets, therefore they cannot be used to execute a stored procedure or a batch containing more than one select.

  2. Please see the notes in the Enabling ODBC support in PHP under Apache about PHP and PHP code hanging when using multiple active statements.

3.4 MS Access

There appear to be 4 main methods of ODBC access from MS Access:

  1. ADO

    ActiveX Data Objects supports different cursor types, is a wrapper over OLE DB and is often used under IIS.

    A typical ODBC access pattern for ADO is:

    SQLSetConnectAttr(SQL_MAX_ROWS=0)
    SQLSetStmtAttr(PARAM_BIND_TYPE = 0)
    SQLSetStmtAttr(PARAM_BIND_OFFSET_PTR = 0)
    SQLSetDescField(OCTET_LENGTH_PTR = 0)
    SQLParamOptions(1,0)
    SQLExecDirect(select * from table)
    SQLRowCount
    SQLNumResultCols
    Loop through columns
    {
      SQLDescribeCol(col)
      SQLColAttribute(to get SQL_COLUMN_UNSIGNED, 
                             SQL_COLUMN_UPDATABLE and
        SQL_COLUMN_LABEL)
    }
    SQLSetStmtAttr(ROW_BIND_TYPE=n where n>0)
    SQLSetStmtAttr(ROW_BIND_OFFSET_PTR=ptr where ptr != 0)
    SQLSetStmtAttr(RETRIEVE_DATA=0)
    Loop through columns:
    {
      SQLBindCol
    }
    SQLSetStmtAttr(ROWSET_SIZE=1)
    Loop though rows:
    {
      SQLExtendedFetch
    }
    

    Unfortunately, this is one of the slowest ways of getting column data with OOB. OOB deliberately does not support rowise binding ODBC calls; instead it binds the columns using column-wise binding (to avoid the structure alignment problems between different machines). An entire row is retrieved in one call to SQLExtendedFetch (which is good) but then OOB needs to translate that from column-wise binding to row-wise binding for the application. This method also uses more memory because the OOB client needs to allocate memory for one row of column-bound columns.

    Setting BlockFetchSize in OOB is irrelevant here as the application does its own binding.

  2. Dynaset

    Dynaset can be used where there are one of more unique columns (if there aren't any unique columns JET drops back to Snapshot mode - see below).

    What JET does depends on whether the DB engine can support multiple active statements. If it can then recent Jets (JET 4) appear to make the following ODBC calls:

    SQLSpecialColumns
    SQLGetTypeInfo(various)
    SLColumns
    SQLStatistics (to locate unique columns)
    stmt1 = SQLExecDirect(select uniqcol1,uniqcoln from table)
    stmt2 = SQLPrepare(select col1,col2,coln from table where uniqcol1=? and uniqcoln=?)
     From here on it enters a loop reading one row from stmt1 to use
    as parameters in stmt2:
     Loop
    {
      SQLFetch(stmt1)
      SQLGetData(stmt1, uniqcol1)
      .
      .
      SQLGetData(stmt1, uniqcoln)
      SQLBindParameter(stmt2, p1)
      .
      .
      SQLBindParameter(stmt2, pn)
      SQLExecute(stmt2)
      SQLFetch(stmt2)
      SQLGetData(stmt2, col1)
      .
      .
      SQLGetData(stmt2, coln)
      SQLFreeStmt(SQL_CLOSE)
    }
    

    As you can see, this method means that on stmt1 the one call to SQLExecDirect returns a result-set containing the unique columns for every row in the table and on stmt2, each SQLExecute returns a result-set containing only one row (the row that contains the current values for uniqcol1..n).

    OOB contains a Block Fetch Mode (see separate entry in this document) which may be used to speed up this ODBC access pattern. In this scenario setting the OOB's connection attribute, BlockFetchSize to 1 is the quickest. In this mode OOB, binds the columns in the result-sets and asks for one row at a time; the row data is cached by OOB when SQLFetch is called and returned to Jet when it calls SQLGetData for each column.

    With BlockFetchSize=1 the time taken to retrieve an entire table is often as little as half that when BlockFetchSize=0 although this depends on the number of columns and unique columns in your table.

  3. ODBC direct

    ODBC direct uses a forward-only cursor and the SQL is passed through to the DB engine.

    A typical ODBC access pattern looks like this:

    SQLGetTypeInfo(for all types)
    SQLSetStmtAttr(SQL_ROWSET_SIZE, 100)
    SQLExecDirect(select * from table)
    SQLNumResultCols
    Loop through columns
    {
      SQLColAttributes(coln)
      SQLBindCol(coln)
    }
    Loop until no more data
    {
      SQLExtendedFetch(SQL_FETCH_NEXT, 1)
    }
    

    This is a particularly fast way of reading a result-set as it

    • reads from the start to the end of the result - forward only and
    • sets the ROWSET_SIZE to 100 and retrieves 100 rows at a time.

    For OOB this is one of the fastest ways of retrieving a result-set (as was demonstrated in "Using ODBC directly"). This is always faster than Jet's dynaset mode. This is always slightly faster than OOB's Block Fetch Mode (which is irrelevant here as the application does its own binding) since the OOB client knows what datatypes the application wants to bind for each column before the column data is returned to the application.

  4. Snapshot

    Snapshot produces a read-only result-set where you can use first/next/last etc and is the fallback for dynaset if the table does not contain unique columns.

    A typical ODBC access pattern looks like this:

    SQLSpecialColumns
    SQLGetTypeInfo(various)
    SQLColumns
    SQLGetTypeInfo(various)
    SQLExecDirect(select col1, col2, coln from table)
    Loop through rows:
    {
      SQLFetch
      SQLGetData(col1)
      .
      .
      SQLGetData(coln)
    }
    

    This scenario is greatly speeded up by using OOB's Block Fetch Mode. Setting BlockFetchSize=100 will reduce the time to retrieve the table massively and since the result-set is read-only this is safe to do.

3.5 Miscellaneous Applications

For applications where you don't have access to the source code or cannot build it yourself then there is still one possible optimisation. OOB contains a Block Fetch Mode which can be used to retrieve multiple rows of data at once and then hand them off to the application one column at a time.

Many applications are written making ODBC calls like this:

SQLExecDirect(select * from table)
while(SQLFetch() succeeds)
{
    SQLGetData(column1)
    .
    .
    SQLGetData(columnn)
}

There is nothing wrong with the code but it is certainly not the quickest method to retrieve results-sets via OOB. As discussed in 3.1 Using ODBC directly binding columns to the result-set and asking for multiple rows per fetch is alot faster. If your application uses the above method but you are not able to change it then you may be able to use OOB's Block Fetch Mode. OOB's Block Fetch Mode comes in to play if you set the BlockFetchSize connection attribute and call a fetch API when certain other conditions do not apply. The following conditions must be true if OOB is to enter Block Fetch Mode:

  1. Cursor type must be FORWARD_ONLY (the usual default)
  2. RowArraySize must be 1
  3. there mustn't be any bound columns
  4. if the driver is MS SQL Server then the application must be ODBC 3.

    The MS SQL Server driver will not return RowsProcessedPtr to an ODBC 2.0 app.

  5. the row bind offset must be 0 (the ODBC default)
  6. you must not be using SQLExtendedFetch

Block Fetch Size may be set in the DSN dialogue

If BlockFetchSize is set and the above conditions apply then the OOB client allocates space for column/row data and binds the columns. SQLFetch is called and now the OOB client has N rows of data. As the application calls SQLFetch the OOB client moves through the bound result-set one row at a time. When the application calls SQLGetData there is not round trip to the server, the column data is supplied from memory. The only major difference here is that the OOB client does not know what data type the application will ask for the columns as so all columns are bound as SQL_C_DEFAULT. When the application calls SQLGetData and specifies the data type required, the OOB client will perform the conversion. The application should behave as it always has but result-sets will be retrieved more quickly.

Do not use Block Fetch Mode (where BlockFetchSize > 1) if you application performs positioned updates or deletes. BlockFetchSize = 1 should be safe for any application and is often still faster then not enabling Block Fetch Mode.

To enable Block Fetch Mode add the attribute BlockFetchSize=n to your connection string or enter n into the OOB ODBC driver DSN dialogue. A value of 0 means Block Fetch Mode is disabled (the default). A value for n (where n>0) is the number of rows to fetch in one go. BlockFetchSize may be set up to 100 but there is a balance between memory use and speed. As BlockFetchSize is increased more memory at the client end is required.

4.0 Databases

4.1 MS SQL Server

There are a number of entries in the OOB knowledge base relevant to connecting to MS SQL Server.

4.1.1 Positioning of OOB Server

With OOB there are basically two possible configuration scenarios.

  1. The OOB Server is installed on the same machine as MS SQL Server.
  2. The OOB Server is installed on a different machine to MS SQL Server.

Normally it is OK to do [1] but in some situations responsiveness of the OOB Server is poor when installed on the same machine as MS SQL Server.

Microsoft introduced changes to MS SQL Server 7.0 which may cause connections to an OOB Server and MS SQLServer and subsequent operations to be very slow. These settings elevate the priority of SQL Server to such an extent that other processes on the machine do not get a look in. (e.g. when someone is issuing a query, TaskManager even stops displaying). If you are running MS SQL Server 7 and the OOB Server is running on the same machine as SQL Server you should try the following:

  1. The settings are available from the properties menu of the Enterprise Manager. The settings are "Boost SQL Server Priority on Windows NT" and "Use NT fibres". When these are set NT is so busy servicing SQLServer, the accept() in the OOB Server can take some time due to NT not scheduling any other process.
  2. Move the OOB Server onto another Windows machine.

In general we would recommend running the OOB Server on a different machine to SQL Server for best responsiveness.

Further evidence for this comes from a typical scenario we tried here at Easysoft running an in house ODBC benchmark through OOB to an OOB Server on the same machine as MS SQL Server and to an OOB Server on a different machine to MS SQL Server. The tests cover the creation of tables containing different column types, insertion via SQLExecDirect, parameterised inserts and the selection of rows via various methods including parameterised selects.

graph

Here we used MS SQL Server Enterprise Edition v8.00.760(SP3) on a 2 cpu machine with 1Gb with SQL Server configured to use only 1 cpu, NT fibers disabled and boost priority disabled. The SQL Server machine was doing nothing else. In the case where the OOB Server was on a different machine it was running NT 4.00.1381 and was on the same LAN as the MS SQL Server machine. Clearly, in this case it is a lot faster to run the OOB Server on a separate machine to MS SQL Server.

Also, it is worth avoiding putting the OOB Server on your PDC (Primary Domain Controller) or SDC (Secondary Domain Controller) if you have OS authentication enabled (the default) as experiments at Easysoft have shown this to be faster.

4.1.2 Miscellaneous

Here are a few tips for speeding up access to MS SQL Server through OOB:

  1. If you are inserting a lot of data and don't need to know how many rows were inserted prefix your SQL with "SET NOCOUNT ON". This tells MS SQL Server you do not need to know the number of rows affected i.e. you are not going to need a useful result from SQLRowCount().
  2. SQLPrepare/SQLExecute is often more expensive than SQLExecDirect. the main reason for this is that SQL Server does not directly support SQLPrepare/SQLExecute. The MS SQL Server ODBC driver provides SQLPrepare/SQLExecute support by creating a temporary procedure; this is one extra round trip to the database over SQLExecDirect. There is also an impact on your tempdb as this is where temporary procedures are created (see SQLSetStmtAttr attributes SQL_USE_PROCEDURES_FOR_PREPARE).

    If you are writing an application which issues specific SQL statements and intend distributing that to multiple users in your organisation a procedure would be more appropriate. Having many users issuing the same SQL in SQLPrepare/SQLExecute is going to start filling your tempdb up with multiple occurrences of same temporary procedure.