Testing and troubleshooting various IIS/AppPool settings related to API timeouts

Is there a documented default timeout value for a runspace hosting the APIs defined in PSU? I created a simple endpoint that returns details about the current ProcessId and start time. The first request to a specific endpoint takes about 16 seconds to respond, since the pwsh.exe process has just started. A subsequent request is obviously much faster, i.e., 341 milliseconds. (See the comparison below; command execution timing is in my prompt function.) At some point in the future (I'm not sure whether it is based on time, or on additional requests acting as keep-alives), I will get a new ProcessId and the response will again be closer to 16 seconds. Is there any way to adjust this? I also noticed that API endpoints do not share the same runspace or process. I'm not sure whether this is beneficial or adjustable, since this behavior is per endpoint.
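For context, here is a minimal sketch of the kind of endpoint described above, as it might appear in a PSU `endpoints.ps1` configuration file. The `/pid` URL and the exact property names are my own choices for illustration, not necessarily the original poster's code:

```powershell
# endpoints.ps1 -- hedged sketch of an endpoint that reports on the
# pwsh.exe process currently servicing API requests.
New-PSUEndpoint -Url "/pid" -Method GET -Endpoint {
    $proc = Get-Process -Id $PID   # the host process running this runspace
    @{
        Name         = $proc.ProcessName
        ProcessId    = $proc.Id
        WorkingSetMB = $proc.WorkingSet64 / 1MB
        StartTime    = $proc.StartTime.ToString('o')
    }
}
```

Calling it twice in a row and comparing `ProcessId` and `StartTime` shows whether the backing process survived between requests.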

First request:

PS > irm https://<servername>/pid

Name: pwsh
ProcessId: 13284
WorkingSetMB: 83.26953125
StartTime: 2020-06-29T08:34:53.9376477-04:00

[16.129s] PS >

Second request:

PS > irm https://<servername>/pid

Name: pwsh
ProcessId: 13284
WorkingSetMB: 88.74609375
StartTime: 2020-06-29T08:34:53.9376477-04:00

[341.767ms] PS >

The API shouldn’t be restarting like that. The API won’t share the same runspace but it should share the same process. Runspaces are reset per API request.

If your API is restarting, it might be that the API process is terminating for some reason, and we should figure out why. The PSU logs or event logs might be good places to spot why that's happening.


Thanks for the response, Adam. I am not aware of a place to upload a log file, but I noticed a few things in the log, so I will share and highlight certain sections.

The first thing I notice is that when the new process starts, there is a warning about a Dashboard Proxy.

info: UniversalAutomation.JobProcessManager[0]

  Starting C:\Program Files\PowerShell\7\pwsh.exe with command line -NoProfile -NoExit -Command "& { [System.Reflection.Assembly]::LoadFrom('C:\Universal\Host.dll') | Out-Null; [UniversalHost.AgentService]::StartSecurity(65089, 14876) }"

info: UniversalAutomation.JobProcessManager[0]

  Starting job using Process.

info: Universal.Server.Services.SecurityProxy[0]

  Port set to 65090

info: UniversalAutomation.JobProcessManager[0]

  Starting C:\Program Files\PowerShell\7\pwsh.exe with command line -NoProfile -NoExit -Command "& { [System.Reflection.Assembly]::LoadFrom('C:\Universal\Host.dll') | Out-Null; [UniversalHost.AgentService]::StartApi(65092, 14876) }"

info: UniversalAutomation.JobProcessManager[0]

  Starting job using Process.

warn: Universal.Server.Services.DashboardManager[0]

  Dashboard proxy for dashboard 1 not found.

info: UniversalAutomation.JobProcessManager[0]

  Starting job using Process.

info: UniversalAutomation.JobProcessManager[0]

  Starting job using Process.

The next thing is that the GroomService log entries show a date exactly 30 days in the past (GMT). I'm not sure whether this is cause for concern.

info: Microsoft.Hosting.Lifetime[0]

  Application started. Press Ctrl+C to shut down.

info: Microsoft.Hosting.Lifetime[0]

  Hosting environment: Production

info: Microsoft.Hosting.Lifetime[0]

  Content root path: C:\Universal

info: UniversalAutomation.GroomService[0]

  Starting groom job.

info: UniversalAutomation.GroomService[0]

  Groom date is: 5/30/2020 1:01:08 PM

info: UniversalAutomation.GroomService[0]

  Old jobs: 0

info: UniversalAutomation.GroomService[0]

  Finished groom job.

After exactly 20 minutes, I see an error related to the LogFlushThread() method for the DashboardProxy:

info: UniversalAutomation.JobProcessManager[0]

  Starting job using Process.

fail: Universal.Server.Services.DashboardManager[0]

  Failed to flush logs.

LiteDB.LiteException: Cannot insert duplicate key in unique index ‘_id’. The duplicate value is ‘"$/dashboardlog/1\00000"’.

at LiteDB.IndexService.AddNode(CollectionIndex index, BsonValue key, Byte level, IndexNode last)

at LiteDB.IndexService.AddNode(CollectionIndex index, BsonValue key, IndexNode last)

at LiteDB.LiteEngine.InsertDocument(CollectionPage col, BsonDocument doc, BsonType autoId)

at LiteDB.LiteEngine.<>c__DisplayClass22_0.b__0(CollectionPage col)

at LiteDB.LiteEngine.Transaction[T](String collection, Boolean addIfNotExists, Func`2 action)

at LiteDB.LiteEngine.Insert(String collection, BsonDocument doc, BsonType autoId)

at LiteDB.LiteFileStream.WriteChunks()

at LiteDB.LiteFileStream.Flush()

at LiteDB.LiteFileStream.Dispose(Boolean disposing)

at System.IO.Stream.Close()

at LiteDB.LiteStorage.Upload(String id, String filename, Stream stream)

at UniversalAutomation.FileStorage`1.Insert(T item) in E:\src\universal\src\UniversalAutomation.LiteDB\LiteDBDatabase.cs:line 374

at Universal.Server.Services.DashboardProxy.LogFlushThread() in E:\src\universal\src\Universal.Server\Services\DashboardProxy.cs:line 417

info: UniversalAutomation.GroomService[0]

  Starting groom job.

Finally, the process terminates about 33 minutes later in this case. The stack trace mentions the path to the E:\src\universal\src\Universal.Server\Services\DashboardProxy.cs file, but I do not see that path on my E: drive. (The actual server name was replaced with <servername> before sharing.)

2020-06-29 09:33:51 [INFO] (Hangfire.Server.BackgroundServerProcess) Server <servername>:14876:b7068279 caught stopping signal…

info: Microsoft.Hosting.Lifetime[0]

  Application is shutting down...

2020-06-29 09:33:51 [INFO] (Hangfire.Server.BackgroundServerProcess) Server <servername>:14876:b7068279 caught stopped signal…

2020-06-29 09:33:51 [INFO] (Hangfire.Server.BackgroundServerProcess) Server <servername>:14876:b7068279 All dispatchers stopped

2020-06-29 09:33:51 [INFO] (Hangfire.Server.BackgroundServerProcess) Server <servername>:14876:b7068279 successfully reported itself as stopped in 0.5085 ms

2020-06-29 09:33:51 [INFO] (Hangfire.Server.BackgroundServerProcess) Server <servername>:14876:b7068279 has been stopped in total 9.6033 ms

2020-06-29 09:33:51 [INFO] (Hangfire.Server.BackgroundServerProcess) Server <servername>:14876:c7e4328d caught stopping signal…

2020-06-29 09:33:51 [INFO] (Hangfire.Server.BackgroundServerProcess) Server <servername>:14876:c7e4328d All dispatchers stopped

2020-06-29 09:33:51 [INFO] (Hangfire.Server.BackgroundServerProcess) Server <servername>:14876:c7e4328d successfully reported itself as stopped in 0.0927 ms

2020-06-29 09:33:51 [INFO] (Hangfire.Server.BackgroundServerProcess) Server <servername>:14876:c7e4328d has been stopped in total 3.7637 ms

2020-06-29 09:33:51 [INFO] (Hangfire.Server.BackgroundServerProcess) Server <servername>:14876:c7e4328d caught stopped signal…

fail: Universal.Server.Services.DashboardManager[0]

  Failed to flush logs.

System.NullReferenceException: Object reference not set to an instance of an object.

at UniversalAutomation.FileStorage`1.Delete(T item) in E:\src\universal\src\UniversalAutomation.LiteDB\LiteDBDatabase.cs:line 315

at Universal.Server.Services.DashboardProxy.LogFlushThread() in E:\src\universal\src\Universal.Server\Services\DashboardProxy.cs:line 410

Unhandled exception. System.AggregateException: An error occurred while writing to logger(s). (Cannot access a disposed object.

Object name: ‘EventLogInternal’.)

—> System.ObjectDisposedException: Cannot access a disposed object.

Object name: ‘EventLogInternal’.

at System.Diagnostics.EventLogInternal.OpenForWrite(String currentMachineName)

at System.Diagnostics.EventLogInternal.InternalWriteEvent(UInt32 eventID, UInt16 category, EventLogEntryType type, String[] strings, Byte[] rawData, String currentMachineName)

at System.Diagnostics.EventLogInternal.WriteEvent(EventInstance instance, Byte[] data, Object[] values)

at System.Diagnostics.EventLog.WriteEvent(EventInstance instance, Object[] values)

at Microsoft.Extensions.Logging.EventLog.WindowsEventLog.WriteEntry(String message, EventLogEntryType type, Int32 eventID, Int16 category)

at Microsoft.Extensions.Logging.EventLog.EventLogLogger.WriteMessage(String message, EventLogEntryType eventLogEntryType, Int32 eventId)

at Microsoft.Extensions.Logging.EventLog.EventLogLogger.Log[TState](LogLevel logLevel, EventId eventId, TState state, Exception exception, Func`3 formatter)

at Microsoft.Extensions.Logging.Logger.<Log>g__LoggerLog|12_0[TState](LogLevel logLevel, EventId eventId, ILogger logger, Exception exception, Func`3 formatter, List`1& exceptions, TState& state)

— End of inner exception stack trace —

at Microsoft.Extensions.Logging.Logger.ThrowLoggingError(List`1 exceptions)

at Microsoft.Extensions.Logging.Logger.Log[TState](LogLevel logLevel, EventId eventId, TState state, Exception exception, Func`3 formatter)

at Microsoft.Extensions.Logging.Logger`1.Microsoft.Extensions.Logging.ILogger.Log[TState](LogLevel logLevel, EventId eventId, TState state, Exception exception, Func`3 formatter)

at Microsoft.Extensions.Logging.LoggerExtensions.Log(IL

The application event log also corroborated the PSU logs. Sorry for the long and verbose post. Any help is appreciated!

This warning is fine. It’s attempting to stop the dashboard before starting it but is not finding it.

This is the start date for grooming logs. That is, it grooms logs older than 30 days.

This one is a bit concerning. Not sure why it’s exactly 20 minutes and why it’s throwing that error. I’ll take a look at that code.

That path is from our build machine; it's just informational context in the stack trace.

How are you hosting this? It appears to be gracefully shutting down. It looks like it received a stop signal from something.

Thanks again Adam. I am hosting this on IIS 10 (Windows 2016). I am wondering now that you said that, maybe it’s a setting on my site or AppPool? I did follow the recommended and documented settings, but who knows? :man_shrugging:

It could be the default app pool recycling setting. We might need to add that one to the docs.

Lol…it's also kinda funny that the default setting is 29 minutes

I thought that was in minutes. So 29 hours, no? However, looking under the General section in Advanced Settings, I just tried changing "Start Mode" to "AlwaysRunning" from "OnDemand", which is the default.

Oh, man. I’m blind.

Ok. Let me know if that helps.

So, I think I took a step in the right direction. It turns out that you may want to not only set "Start Mode" to "AlwaysRunning", but also set "Idle Time-out (minutes)" under the Process Model section from the default of 20 to 0. Further explanation can be seen here: https://github.com/aspnet/Hosting/issues/522#issuecomment-165036633

I tested both, and confirmed that with Start Mode = AlwaysRunning AND Idle Time-out = 20, the AppPool still times out after 20 minutes. However, with the Time-out set to 0, I got much further.
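For anyone scripting these two AppPool changes rather than clicking through IIS Manager, a sketch using the WebAdministration module follows. "MyAppPool" is a placeholder for your actual application pool name:

```powershell
# Hedged sketch: apply the two AppPool settings discussed above.
# Run in an elevated PowerShell session on the IIS host.
Import-Module WebAdministration

# General > Start Mode: keep the pool's worker process alive at all times
Set-ItemProperty "IIS:\AppPools\MyAppPool" -Name startMode -Value "AlwaysRunning"

# Process Model > Idle Time-out (minutes): 0 disables idle shutdown
Set-ItemProperty "IIS:\AppPools\MyAppPool" -Name processModel.idleTimeout -Value ([TimeSpan]::Zero)
```

Both values can be verified afterward with `Get-ItemProperty "IIS:\AppPools\MyAppPool" -Name startMode` and the corresponding `processModel` query.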

Unfortunately, after roughly 3 hours, the w3wp.exe process terminated on my server. Here is the event log entry:

Log Name: Application
Source: Application Error
Date: 6/29/2020 3:32:33 PM
Event ID: 1000
Task Category: (100)
Level: Error
Keywords: Classic
User: N/A
Computer: <servername>
Description:
Faulting application name: w3wp.exe, version: 10.0.14393.0, time stamp: 0x57899b8a
Faulting module name: KERNELBASE.dll, version: 10.0.14393.3659, time stamp: 0x5e9140ed
Exception code: 0xe0434352
Fault offset: 0x0000000000034f38
Faulting process id: 0x2ae0
Faulting application start time: 0x01d64e3b227717db
Faulting application path: c:\windows\system32\inetsrv\w3wp.exe
Faulting module path: C:\Windows\System32\KERNELBASE.dll
Report Id: 11c6a501-1ae4-4bd1-9790-66ee216df3d5
Faulting package full name:
Faulting package-relative application ID:

I’m just wondering now if any others are seeing the same w3wp.exe process crashes, and/or they have a similar version of KERNELBASE.dll loaded?

In your web.config, do you have it set to the InProcess hosting model? If you are, you might be able to switch it to OutOfProcess to see if a better error message comes out. I’ve noticed that the InProcess hosting model, while more performant, really hides the output from PSU.
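The hosting model is controlled by the `hostingModel` attribute on the `<aspNetCore>` element. A hedged fragment follows; the `processPath` shown is a placeholder, so keep whatever your existing web.config already specifies:

```xml
<!-- web.config fragment (sketch): only hostingModel is the change under discussion -->
<configuration>
  <system.webServer>
    <aspNetCore processPath=".\Universal.Server.exe"
                stdoutLogEnabled="false"
                hostingModel="OutOfProcess" />
  </system.webServer>
</configuration>
```

With OutOfProcess, IIS proxies requests to a separately running Kestrel process instead of loading the app inside w3wp.exe, which tends to surface startup and crash output that InProcess hosting hides.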

Just changed it over to “OutOfProcess” and going through another cycle of testing. I will report back.

Update: I am no longer experiencing a w3wp.exe termination, or a pwsh.exe time-out. Here are the things that I changed:

  1. AppPool Start Mode to ‘AlwaysRunning’
  2. AppPool Idle Time-out to 0
  3. Using the ‘OutOfProcess’ hosting model in web.config
  4. Removing the single UD Dashboard hosted in PSU

I'd like to reintroduce #3 and #4 one by one to see whether either can be identified as the culprit. I wanted to share this since these AppPool settings may be worth considering when hosting in IIS.