# $\Delta U T O S \Delta R^{M}$

## Why AUTOSAR Fails so Often

A report from a firefighter for Embedded Software Timing

Version 3

Peter Gliwa

12<sup>th</sup> June 2024

BOSCH

15<sup>th</sup> AUTOSAR Open Conference

STELLANTIS TOYOTA VOLKSWAGEN GROUP

Tokyo







- Wait a minute, we are at an AUTOSAR Open Conference.
- Is this an offense?
- A revolution?
- No, with this talk, I would like to
  - 1. Explain common reasons for timing problems and memory problems
  - 2. Point out solutions





This talk is a report from someone who saw many projects struggle with AUTOSAR.



#### Let's talk about German culture

With 'German culture', I do *not* mean

- Johann Sebastian Bach
- Ludwig van Beethoven
- Friedrich Schiller
- Johann Wolfgang von Goethe

Let's talk about the German culture of

- working together
- communicating

The following slide describes a certain tendency, please do not take it too seriously!













#### Where is the opponent?



#### Scenario seen around the globe

Customer is superior

- Has the power
- Must not be criticized

Sometimes Supplier and Customer act like opponents



#### Scenario with German engineers

Customer and Supplier work *together* to solve the problem.

If honest and very direct communication helps solving the problem, so be it.



#### Contents



- Introduction
- Common AUTOSAR problem I RTE overhead

non-terminating ECC tasks

Common AUTOSAR problem II

1

2







Summary





Timing/stack/memory analysis

German company, ~60 empl., 20% annual growth





Peter Gliwa (owner & CEO)

- Actively coaching international automotive OEMs and Tier-1s
- AUTOSAR work-package leader of work-package "ARTI" and document owner of AUTOSAR TR "Timing Analysis"
- Previously with ETAS / BOSCH
- Author of book "Embedded Software Timing" ( DE, EN, CN, KR, JP)





#### So what exactly is the problem?

#### Many projects run into timing issues

- System overloaded, yet functionality to add
- Sporadic crashes
- Weird functional behavior
- Communication issues
- ...

It is often not clear that a timing problem is causal!



#### Root-cause No.1 is an inefficient AUTOSAR configuration.

The trend is: it is getting worse (despite all the education and many task forces with successful outcome)





Data consistency in preemptive systems

Example: sharing data between preemptive tasks

**Step 1:** Task A starts reading data from a struct.

**Step 2:** Task B preempts Task A and updates the data structure.

**Step 3:** When Task A resumes execution, it uses inconsistent data (partly old, partly new)









Data consistency through copies (RTE implicit com.)

- At the beginning of Task A, copies of all critical data are created.

   *'critical' means here: data gets accessed from code with higher priority, such as Task B.*
- The actual copy process is protected typically through interrupt suspension or even spinlocks.
- Task A uses the copy of the data only.
- If Task A also *writes* shared data, it is copied back (not shown below) with protection.







AUTOSAR RTE: in many projects a high-consumer!



Example I Customer project (platform) 19 vehicle models affected







AUTOSAR RTE: in many projects a high-consumer!

|         | А          | В                      | С                                     | D      | E                            |
|---------|------------|------------------------|---------------------------------------|--------|------------------------------|
| 1       | Name       | , Туре                 | Address 🕞                             | Size 🗸 | CPUloadCore0                 |
| 48      | MK_Syscall | FUNC                   | 0x801d8e98                            | 0x46   | 29.273                       |
| 10      | ···· · · · |                        | · · · · · · · · · · · · · · · · · · · | ~ ~    |                              |
|         |            |                        |                                       |        |                              |
|         | ٨          | Р                      | 6                                     |        |                              |
|         | A          | В                      | C                                     | D      | E                            |
| 1       | Name       | ь<br><sub>⊸</sub> Туре | ر<br>Address                          | ⊃ Size | E<br>CPUloadCore1            |
| 1<br>48 |            | _                      | Address<br>0x801d8e                   |        | E<br>CPUloadCore1<br>27.5478 |

Example II Customer project (platform) 6 vehicle models affected

 Nearly 30% CPU-load
 on each core is spent for disabling/enabling
 / interrupts!

223,312 **syscalls** per second (!) on core 0 and 343,284 on core 1.







AUTOSAR RTE: in many projects a high-consumer!

Example III Customer project (platform) 29 vehicle models affected

Multicore AURIX overloaded. Option: use more powerful AURIX device.

#### Estimated cost: 160m €



¥ 26,001,600,000.00









Thoughts leading to a solution

Preemptions can cause inconsistent data (cf. earlier slide)  $\rightarrow$  no preemption = no problem

AUTOSAR RTE specification R20-11:

Copy semantic Copy semantic means, that the accessing entities **are able to read or write the 'copied' data from their** execution context in a non concurrent and non preempting manner. If all accessing entities are in the same preemption area this might not require a real physical data copy.

Example: Task A and Task B use implicit communication. When assigned the same priority, no copy is required for the data only these two tasks access.





What is the solution?

Use as few priorities as possible.

Even better: use internal resources (cf. Priority Ceiling Protocol) to avoid preemption. This allows to control the order in which several tasks in 'ready' state tasks are started through their priority.

**Hint:** assign all tasks same priority to evaluate the optimization potential (do not flash/use this software).



Starting point of the optimization: thousands of copies! After the optimization: lots of code/data vanishes!





Yet another customer example

```
Customer: "We face sporadic communication issues."
                                           GLIWA: "Is the system overloaded?"
void ErrorHook(StatusType status)
                                           Customer: "No. If it was, the ErrorHook would
                                                      trigger a reset."
  switch(status) {
                                           GLIWA: "Are you using the standard RTE set-up?"
    case E OS LIMIT:
                                           Customer: "Yes."
       /* failed task activation
                                           GLIWA: "Ouch!"
        * as a result of an overload
        * situation */
                                            User's intention: Reset
      SystemReset(); <
                                            when system is
      break;
                                            overloaded
    default:
                                            BUT: ErrorHook does
                  ErrorHook: called by
      break;
                  the OS, implemented
                                            not get called when an
                  by the user (of the OS)
                                            event is re-triggered
```











#### Back to customer example



Let's assume this runnable with a desired period of 10ms runs for 26ms (which it sometimes did in the customer's project).

#### **Event**

Rte\_Ev\_Cyclic2\_Task\_B\_0\_10ms gets re-triggered, ErrorHook does *not* get called.





And here the impact...



The customer was completely unaware of the overload scenario shown in this T1 trace.

> Red lines: here the ErrorHook was expected to fire.







Simple solution: use BCC1

```
TASK(Task_B_10ms)
```

```
CanNm_MainFunction();
CanSM_MainFunction();
TerminateTask();
```

```
TASK(Task_B_5ms)
```

}

```
CanTp_MainFunction();
CanXcp_MainFunction();
TerminateTask();
```

BCC1: Straight forward and simple

ErrorHook works as expected.

Perfect basis for cooperative multitasking

By the way: never use multiple task activations (BCC2). It tends to be a dirty workaround for an overloaded system.





Far too many projects using AUTOSAR are inefficient.

My recommendation: do not unlink from the real world!

*More* (AUTOSAR features) is not naturally *better*!

Introduce safety/efficiency subset/conformance class? Remove some of the problematic features?







## $\Delta UTOSAR^{M}$

# ありがとう Thank you!

Peter Gliwa Dipl.-Ing. (BA) Geschäftsführer (CEO)



GLIWA GmbH embedded systems Pollinger Str. 1 82362 Weilheim i.OB. Germany fon +49 - 881 - 13 85 22 - 10 fax +49 - 881 - 13 85 22 - 99 mobile +49 - 177 - 2 57 86 72

peter.gliwa@gliwa.com www.gliwa.com

STELLANTIS TOYOTA VOLKSWAGEN GROUP





