Monday, April 7, 2014

AX 2012, AifMessageInspector::AfterReceiveRequest - Failed to Logon to Microsoft Dynamics AX

From what I can see, many people have stumbled upon (aka experienced) this beeing logged in the Event Log (Application) on servers hosting AOS instances. I also see this beeing logged especially on AOS servers hosting AOS instances handling Enterprise Portal and Enterprise Search load, but also on other AOS servers (more sporadic).

Some relevant examples:
In the last link above, Martin DrĂ¡b suggests to look at the Service Operations granted and this is the best suggestion I have seen so far. I have not verified this yet, but I would recommend to have a look at the Security Roles and especially the Service Operations granted to a specific Security Role. A good starting point is the articles Services, service operations, and service groups [AX 2012] and About role-based security in services and AIF [AX 2012] on TechNet.

My hypotese is that AX simply reports back Failed to Logon instead of a more informative description pointing to missing Privilegies or Duties.

All comments are welcomed to share some experience and reach a final solution for this rather hard to diagnose issue (no information about which Service except it's an AIF Document Service and what user is Failing to Logon to AX).

WCF Tracing could perhaps reveal the AIF Document Service and User (I have not looked at this yet), but Exception Logging will not catch this (the issue happens before the actual Message reaches the Queue in AX).

The only solution I would not recommend of the ones I have seen suggested, is to add the Business Connector Proxy User (System Account) as a User in AX. Why? This is a privileged user account in AX and the fact that it's covered by a specific form in AX, should tell you that this account should not be treated as a normal user in AX...

AX 2012 R2 CU6, Feature to reduce the effect of Parameter Sniffing

Working with many AX solutions over the years, I have seen Parameter Sniffing in many solutions and the effect of "bad Execution Plans" in SQL Server, can be rather "not so nice" to all end users.

Most of the solutions I see have many companies or Legal Enteties (LE) as they are refered to in AX 2012, but the distribution of the data in shared tables, are very commonly skewed where one company or LE typically allocates 80 to 9x percent of the rows while 8 - 10 companies/LE contribute to the rest of the rows in individual shared tables. Microsoft introduced PARTITION as a new way of isolating companies or LE in AX 2012 R2 and this led to another column in most tables contributing to the parameter sniffing issue. AX 2012 R2 is designed to have several partitions and many LE each representing a balanced data distribution (each partition and Company/LE having roughly the same number of rows), but in real life, this is very seldom what customers end up with.

I would say that Parameter Sniffing is not a phenomen (it's in fact a SQL Server by design behaviour), but it tend to hit AX solutions very hard when the design criteria is not met. The x-factor lies in the fact that Microsoft utilizes Parameterized Queries heavily to make sure every Query is executed "the same way" (also the main reason for recommending MAXDOP = 1) each time and to reduce the number of SQL Compiles and Recompiles (has a small CPU cost). But the downside is that the actual value for the parameter when the statement reaches SQL Server for the first time (or the SQL Server is experiencing mamory pressure with flushing of execution plans), must represent "an average" partions and/or company/LE. The x-factor can simply be caused by the the first user logging on to AX the morning (early bird) after the AOS instances and/or SQL Server has been restarted and representing a small company with regards to the number of rows in shared tables, can "trick" SQL Server in generating a execution plan where SQL Server find it most efficent (based on the cost) to perform a table or index scan. When an user representing the largest company (number of rows in the underlying tables) implicitly reuses the same execution plan, all sorts of bad performance reports starts popping up. Looking at the SQL Server you will probably notice constantly high CPU utilization and digging a little further, You will probably identify one or several statements consuming much CPU time (some of then probably also beeing utilized by API Cursors).

Microsoft introduced a new feature in AX 2012 R2 CU6 to overcome this vulnerability and it's a kernel feature that's turned off by default. The feature is well described in the article Overcoming parameter sniffing issue in Microsoft Dynamics AX 2012-R2 – CU6 written by the AX Performance Team.

I have been aware of this rather important change (results in values for the columns PARTITION and DATAAREAID beeing always sent to SQL Server as literals or actual values instead of parameterized values), but I tend to await activation until I actually see (have proofe of) that a solution is impacted by Parameter Sniffing. Last week I saw proof of this happining in a busy solution focused around Sales Order processing and after activating the changes in the table SYSGLOBALCONFIGURATION followed by a stop - start sequence of the AOS instances, we imediately saw SQL Server generating execution plans using literals for the columns described which again "calmed" the solution. At the same time the number of SQL Compiles and Recompiles increased somewhat, but this was expected as the number of Execution Plans for each Parameterized Statement will increase by the product of the number of Partitions and the number of individual companies/LE utilizing a spesific module in AX.

This is perhaps one of the most important changes Microsoft has introduced and I have not seen any other solution improving (trace flags, force literals keyword etc.) or near eliminating this issue until this feature was introduced. Everyone involved in implementing AX 2012 solutions should be aware of this and be able to utilize this feature when needed. I say AX 2012 since MS stated that this change was expected to be back ported to AX 2012 (I don't know the status), but it's confirmed in AX 2012 R2 CU6 and later binary builds. Customers still utilizing AX 4.0 or AX 2009, will probably never benefit from this feature.

Many factors play a part in this story such as SQL Server Statistics, Max Degree of Parallism and API Cursors, but this is the single most efficent method of helping SQL Server generating the most representative (balanced) Execution Plans for AX based on my experience.