Kobus’ musings

The ramblings of an embedded fool.

Entelect Challenge 2016

This year I participated in the Entelect AI challenge, a yearly South African programming competition that has been running for five years now, in which programmers match their skills at writing an autonomous agent that competes in a classic computer game. This year's game was Bomberman; previous years' competitions were based on Tron light cycles, tanks, Space Invaders and so on.

My bot unfortunately did not reach the final, but I am nevertheless not too disappointed in my bot's performance, and in this post I will give a quick rundown of the approach I took in developing my entry.

The game

The full list of rules can be found here, but what follows is a short description: the game is turn based and loosely based on Bomberman, where two to four contestants each control an avatar.

Each avatar can navigate a 21x21 map containing destructible and indestructible walls by moving up, down, left or right. Each avatar can also plant a bomb, thereby attempting to blow up either an opponent or a destructible wall, possibly uncovering a power-up (granting the avatar more bombs or longer fuses on his bombs). An avatar can also choose to detonate his bomb with the shortest remaining fuse early.

The complete map might look something like this:

Points are scored for blowing your opponent up, blowing up destructible walls and exploring the map.

The competition also had a GUI component, and the images you see here are from the winner of the GUI competition (Ruan Moolman), reproduced with his kind permission.

My approach

Markov decision process

For a great introduction to Markov decision processes (MDPs), have a look at Introduction to Artificial Intelligence, presented by Peter Norvig and Sebastian Thrun.

MDPs as a concept are quite simple. You assign a value to a desirable objective, say a power-up. Each square next to this power-up then gets a value equal to the value you assigned to the power-up minus a penalty. Squares two steps away from the power-up get the assigned value minus two times the penalty, and so forth.

You have to run the algorithm a couple of times over your map until the values settle, because with multiple power-ups you want to give each square its highest possible value (i.e. the value it derives from the objective closest to it). Not only power-ups can be assigned values, but any other objective (like destructible walls) as well, and you can give the objectives different weights based on their perceived importance. Given that I did very little rigorous analysis of what weight each objective should have, or what the penalty should be, the final weighting was admittedly a bit of a thumb suck.
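To make the idea concrete, here is a minimal sketch of the value-spreading step and the steepest-ascent move selection. The map encoding, objective values and step penalty are invented for illustration; they are not the weights or data structures my bot actually used.

```python
# Minimal sketch of the value-spreading idea.  '#' marks walls; the
# objective values and step penalty are illustrative only.

STEP_PENALTY = 1.0

def build_value_map(grid, objectives):
    """grid: list of equal-length strings; objectives: {(row, col): value}."""
    rows, cols = len(grid), len(grid[0])
    values = [[0.0] * cols for _ in range(rows)]
    for (r, c), v in objectives.items():
        values[r][c] = v

    changed = True
    while changed:                      # repeat until the values settle
        changed = False
        for r in range(rows):
            for c in range(cols):
                if grid[r][c] == '#':   # walls never carry value
                    continue
                best = values[r][c]
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < rows and 0 <= nc < cols:
                        # each square is worth its best neighbour minus a penalty
                        best = max(best, values[nr][nc] - STEP_PENALTY)
                if best > values[r][c]:
                    values[r][c] = best
                    changed = True
    return values

def steepest_ascent_move(values, grid, pos):
    """Pick the neighbouring square with the highest value, or stay put."""
    r, c = pos
    moves = {'up': (-1, 0), 'down': (1, 0), 'left': (0, -1), 'right': (0, 1)}
    best_name, best_val = None, values[r][c]
    for name, (dr, dc) in moves.items():
        nr, nc = r + dr, c + dc
        if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]) and grid[nr][nc] != '#':
            if values[nr][nc] > best_val:
                best_name, best_val = name, values[nr][nc]
    return best_name
```

A power-up could be seeded with a value of, say, 10 and a destructible wall with 3; the second function then implements the steepest-ascent step described below.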

What you end up with is a map that looks as follows (based on the game map above):

This is basically a map with hills and valleys, with the hills being desirable locations and the valleys less desirable. All that is left for the AI to do is move in the direction of steepest ascent to reach the nearest desirable location. This is a relatively simple and robust mechanism for deciding what my bot's next action should be. As a side bonus, it is independent of my bot's location, so it could also be used to guess what my opponents' next moves might be…

Decision tree overlay

On top of the Markov decision process, I implemented a basic expert system (if this, then do that). I had only five general rules:

  • Should we blow a bomb?
  • Can we steal a wall?
  • Can we blow up an enemy player?
  • Should we plant a bomb?
  • Don't walk into explosions.

The problem with this kind of ruleset is that it becomes very complex very quickly and cannot accommodate all the edge cases you might face. A rule that works in one situation might be less than ideal in another. For instance, to determine whether my bot should plant a bomb, I initially only checked whether the square was in range of a destructible wall. But what if, one square over, you could blow up two (or more) walls simultaneously? Or if there is a power-up nearby, should the bot pick up the power-up first? Or if there is another player close by? And on and on. Hence the success of neural nets for these kinds of problems: having a human write a ruleset for something like this is a never-ending job. Nevertheless, my basic ruleset had to suffice for the competition.
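As a sketch of what such an overlay boils down to, consider the snippet below. The flag names, priority order and action strings are invented for illustration; the real bot's checks were more involved.

```python
# Sketch of the rule overlay: the first rule that fires decides the action,
# otherwise fall back to the MDP's steepest-ascent move.

def choose_action(flags, mdp_move):
    """flags: booleans derived from the current game state (illustrative names)."""
    if flags.get('own_bomb_will_hit_something'):
        return 'TRIGGER_BOMB'        # should we blow a bomb?
    if flags.get('can_steal_wall'):
        return 'PLANT_BOMB'          # can we steal a wall from an opponent?
    if flags.get('enemy_in_blast_range'):
        return 'PLANT_BOMB'          # can we blow up an enemy player?
    if flags.get('next_to_destructible_wall'):
        return 'PLANT_BOMB'          # should we plant a bomb?
    if flags.get('mdp_move_walks_into_explosion'):
        return 'DO_NOTHING'          # don't walk into explosions
    return mdp_move

# Example: nothing urgent, so the bot just follows the value map.
print(choose_action({'next_to_destructible_wall': True}, 'left'))  # PLANT_BOMB
```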

Multiround look ahead

Lastly, my bot could look up to nine rounds ahead from the current game state. The competition rules stated that each bot had up to two seconds each round to determine its next move. Remember that a bot could move up, down, left or right, do nothing, plant a bomb or trigger a bomb, thus seven actions; there were up to four bots in each game; and bombs had a maximum fuse of nine rounds. Thus if you wanted to brute force your solution by working out the best possible move out of all possible move sequences (min-maxing), you had to evaluate on the order of (7^4)^9 ≈ 3 × 10^30 positions every two seconds. Now modern computers are fast, but not that fast.

Modern computers are clocked at around 1 GHz and up. Even if you have an octa-core processor and you are an uber programmer (so you can evaluate a position every clock cycle and write perfect multithreaded code with no overhead), at 1 GHz that still only gives you about 2 × 10^10 clock cycles every two seconds to work with.
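A quick back-of-the-envelope check of those two numbers:

```python
# Worst-case brute-force search space vs. an optimistic cycle budget.
positions = (7 ** 4) ** 9           # 7 actions, 4 bots, 9 rounds ahead
cycles = 10 ** 9 * 8 * 2            # 1 GHz x 8 cores x 2 seconds

print(f"positions to evaluate : {positions:.1e}")   # ~2.7e+30
print(f"cycles available      : {cycles:.1e}")      # 1.6e+10
```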

So clearly brute forcing every possible game state wasn't going to cut it. I had to reduce the search space: by not computing impossible moves (bots can't walk through walls), by only guessing each opponent's single most probable move, and by only expanding three candidate moves for my own bot, then deciding which of those three was my best next move.

That cut the worst-case problem space down to roughly 2 × 10^23 game states. Still not doable in the worst case, but in most cases I could reduce the search space further by not calculating future states in which I had already died. Of the 51 test cases I wrote for my bot, most ran in under 200 ms, some took up to 700 ms, and one or two went over two seconds, in which case code detected that it was taking too long and simply went with the best result found so far.
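The shape of that search might look something like the sketch below. The game-engine hooks are passed in as an object because the real engine is not shown here; the names and the deadline value are illustrative only.

```python
import time

DEADLINE_S = 1.8   # leave some margin inside the 2 s per-move budget

def lookahead(state, depth, started, engine):
    """Depth-limited search over a few candidate moves for my own bot.

    `engine` supplies the game-specific pieces: simulate(state, my_move),
    candidate_moves(state), i_am_dead(state) and evaluate(state).
    """
    if depth == 0 or engine.i_am_dead(state) or time.time() - started > DEADLINE_S:
        return engine.evaluate(state)          # stop: dead end or out of time
    best = float('-inf')
    for move in engine.candidate_moves(state): # e.g. the 3 best moves off the value map
        # Opponents are not branched on: simulate() plugs in their most
        # probable move, which keeps the branching factor small.
        best = max(best, lookahead(engine.simulate(state, move),
                                   depth - 1, started, engine))
    return best
```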

Results

Like I said, I didn't get very far in the competition. (Be sure to view these videos using the desktop client; the mobile client is disabled for some reason.) My bot competed in four rounds, with the top two going through. In rounds 1 and 2 it did quite well:

Round 1

Round 2

Rounds 3 and 4 did not go so well, though as a consolation my bot was leading in points just before the end.

Round 3

Round 4

Limitations of my approach

  • Writing rule sets like I have done is error prone and cannot catch all edge cases even in a relatively simple game environment such as this one.
  • Even though brute forcing every game state is not possible, my approach of aggressively cutting the solution space comes with its own set of problems. You want to be able to look nine moves ahead in order not to get trapped in a corridor by your own bomb, but you also want to calculate each possible move your opponent might make so as not to be caught by surprise. A better approach would have been more brute-force calculation for one or two rounds ahead, and then minimal calculation for longer-horizon predictions.
  • Calculating the best possible move (brute-force min-maxing) still seems to be a better solution than using a Markov decision process to pick the bot's next move. MDPs are simple and provide a generally good solution, but in a competitive setting they do not necessarily provide the best move, which is what you need to win.

Lessons learned in general

  • Use unit tests to test your game logic. Not fine-grained unit testing, but basic functional testing; it will speed up your ability to try multiple strategies for your bot while knowing the basic soundness of your solution is still intact.
  • In addition to writing the bot, I had to write a game viewer, write unit tests, get the game harness working, periodically update to the latest game engine, test multiple strategies, fix bugs and play test (a lot). The point being that it will take more time than you think to produce a competitive solution.
  • Compete against other competitors as early as you can before the actual competition, even if your bot is not quite ready yet. The lessons you will learn are invaluable and the earlier you can learn them the better.
  • Do not rest on your laurels; if your bot performs well early on, before the competition, keep working.
  • Plan to have your bot done a week before the competition closes. There are always issues uploading your bot at the very end, and you make mistakes if you still try to implement major functionality right before the end.

Why not a neural network

Neural networks, especially deep neural nets, are actually a very good fit for a competition such as this one. Unfortunately, by the time I learned of OpenAI, TensorFlow and the Lee Sedol Go match in 2016, I was already quite far along with my bot and did not want to trash it all and start again.

I think if I have time to enter such a competition again, I would like to see whether I can train a neural net and how competitive it could be. I have little doubt that entries to these kinds of competitions will soon be mostly deep neural nets; the only reason this competition didn't have many neural net entries yet (to my knowledge) is that South Africa is a small market with a fairly small developer community lacking the necessary skill set and experience (please correct me if I am wrong).

Code is available on Github

A More Agile DO-178

When looking at software development methodologies, there is in my mind a spectrum that looks something like the following:

So what's wrong with the extremes of waterfall or agile development, especially for safety-critical products? Well, I think Dilbert may have some advice to share here. Waterfall development typically suffers from the following:

DILBERT © 2001 Scott Adams. Used By permission of UNIVERSAL UCLICK. All rights reserved.


With waterfall development, a lot of unnecessary requirements get specified, because the client is scared that if he doesn't name every possible use case his product could be used for, he won't get another chance. The result is bloat, increased costs and increased product complexity, none of which is great for safety-critical products.

Ok, so agile should be perfect then?

DILBERT © 1997 Scott Adams. Used By permission of UNIVERSAL UCLICK. All rights reserved.


Nope. Agile struggles to guarantee that the product is safe. Agile tries to shorten the feedback cycle: "fail fast" or "move fast and break things". Well, when you are developing software for an aircraft, you can't exactly crash an airplane every time you release and then quickly fix the bug, even if you do it fast…

Ok, so up to now most DO-178 development has been done with waterfall methodologies, but what might a more agile DO-178 development process look like?

To answer this question, we have to look at the deliverables required for DO-178 certification, and at what stages of the waterfall development model they are typically produced:

Organisational:
(These can be re-used across multiple projects, if the projects are similar enough of course)

  • Software configuration management plan (SCM)
  • Software quality assurance plan (SQA)
  • Software requirements standards (SRS)
  • Software design standards (SDS)
  • Software code standards (SCS)
  • Software verification plan (SVP)

At the start of a project - Specification phase:

  • Plan for software aspects of certification (PSAC)
  • Software development plan (SDP)
  • Software requirements data (SRD)
  • Design description (DD)

During development - Implementation phase:

  • Source code
  • Object code
  • Software verification Cases and Procedures (SVCP)
  • Problem reports
  • Software Quality Assurance Records (SQA)
  • Software Configuration Management Records (SCMR)

At the end of a project - Testing and verification phase:

  • Software Verification Results (SVR)
  • Software Life Cycle Environment Configuration Index (SECI)
  • Software Configuration Index (SCI)
  • Software Accomplishment Summary (SAS)

The problem here is that many of the deliverables are generated at the start of a project, before the lessons have been learned, and many others are generated at the end of a project, without keeping pace with the development of the software. As such they represent a significant source of cost, as they have to be produced and verified manually.

Most of these deliverables also usually take the form of documentation, with the exception of the source code and object code. DO-178 does not specifically state that the outputs have to be in the form of documents, so where possible we will try to replace traditionally labour-intensive documentation with other artefacts, saving effort and cost. Of course, we must show that there is no reduction in the reliability of the final product when making these changes.

The organisational deliverables are not my concern here, as once they have been generated they can be re-used across multiple projects. But let's see if we can get some of the project-specific deliverables generated and verified continuously and automatically during development, using the following:

Scrum

During sprint planning and review sessions, we review and update the Design Description (DD) document detailing the software architecture. At the end of a sprint, the implemented user stories form the Software Requirements Data (SRD). It will look something like this…

During sprint planning we ensure that the user stories, i.e. the high-level requirements we are planning to implement, are consistent with previously implemented requirements. During sprint review we update the requirements with the implemented user stories, and ensure the created functional tests and unit tests, i.e. the low-level requirements, are consistent with the high-level requirements (user stories).

This turns the traditional waterfall model on its head. How can you write code and only generate the requirements afterwards? Well, a lot of the time you only realise what the true requirements are while you are writing the code, so we only set the requirements in stone once we are sure. We still generate user stories as candidate requirements before writing code.

Continuous integration (CI) and Continuous deployment (CD)

The CI and CD servers themselves effectively become the Software Life Cycle Environment Configuration Index (SECI), the Software Configuration Index (SCI) and parts of the Software Configuration Management Records (SCMR) deliverables. For this to be possible, the CI server must include a copy of the version control database when it is duplicated for certification.

This means an agile setup might look as follows:

If all the tests pass, the CI/CD will autogenerate a snapshot of itself (VMs or some other duplication means) and of the version control database to serve as the SECI and SCI. It will also generate reports of the tests run and their results to serve as the SVCP and SVR, and generate reports that can serve as SCMR items (baselines, commit histories, etc.). This is of course highly idealistic and represents no small amount of additional work, but the purpose here is to show that these required DO-178 deliverables are fairly repetitive and thus highly automatable.
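To make that a little more concrete, here is a minimal sketch of what such a post-build step on the CI server could look like. The file name, the JSON layout and the choice of fields are purely illustrative, not something prescribed by DO-178.

```python
# Sketch of a post-build step that captures the raw material for the
# SCI, SECI and SVR deliverables.

import json
import platform
import subprocess
import sys
from datetime import datetime, timezone

def snapshot(test_results, out_file="certification_snapshot.json"):
    """test_results: whatever the test runner produced (e.g. parsed JUnit XML)."""
    artefacts = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        # Software Configuration Index: exactly which sources were built
        "sci": {
            "commit": subprocess.check_output(
                ["git", "rev-parse", "HEAD"], text=True).strip(),
        },
        # Software Life Cycle Environment Configuration Index: what built it
        "seci": {
            "python": sys.version,
            "platform": platform.platform(),
        },
        # Software Verification Results: the tests that ran and their outcomes
        "svr": test_results,
    }
    with open(out_file, "w") as f:
        json.dump(artefacts, f, indent=2)
```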

Test driven development

Test-driven development can be used to generate large parts of the Software Verification Cases and Procedures (SVCP) and Software Verification Results (SVR). High-level requirements will be developed and tested with feature-driven development, and unit tests will be used to develop and test low-level requirements. But not all unit tests are really low-level requirements; testing whether a function can handle null pointer parameters, for instance. As such we will mark which unit tests are indeed low-level requirements in the unit testing code itself.
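As a sketch of what that marking might look like in a Python code base tested with pytest: the marker name and the requirement IDs below are invented, and a custom marker like this would need to be registered in the pytest configuration. A small script could later harvest the marked tests into the low-level requirements section of the DD.

```python
import zlib

import pytest

@pytest.mark.llr(id="LLR-042", traces_to="US-007")
def test_crc_of_empty_buffer_is_zero():
    # This test *is* a low-level requirement, traced to a user story.
    assert zlib.crc32(b"") == 0

def test_crc_rejects_non_bytes_input():
    # Robustness-only test: deliberately not marked, so it is not
    # exported as a low-level requirement.
    with pytest.raises(TypeError):
        zlib.crc32(None)
```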

The relationship between the continuous integration (CI) server and the continuous deployment (CD) server follows the popular test pyramid described by Mike Cohn: the CI server is responsible for making sure the source code compiles at all times and passes all unit tests, while the CD server is responsible for making sure the automated functional tests pass at all times. One would expect to develop far more unit tests than functional tests, thereby limiting (but not eliminating) the need for expensive manual testing.

And what will the workflow differences look like between a waterfall and agile DO-178 project? The following represents a very simplified project workflow, but will hopefully give you an idea.

Every blue item represents a stage gate that has to be satisfied before the team can continue on to the next set of items.

*Sprint review includes a retrospective, code review, functional and unit tests review.

Going back to the DO-178 specification, it lists in an appendix the objectives behind the deliverables. The purpose of an agile process would then be to automate the verification of as many of these objectives as possible, in order to speed up the certification and re-certification of the product. The difference between how a waterfall and an agile workflow satisfy these objectives then looks as follows (DO-178C wording used):

Table A-1: Software planning process

1. The activities of the software life cycle processes are defined.
   Output: Plan for Software Aspects of Certification; Software Development Plan; Software Verification Plan; Software Configuration Management Plan; Software Quality Assurance Plan.
   Agile strategy: An agile process is defined.
2. The software life cycle(s), including the inter-relationships between the processes, their sequencing, feedback mechanisms, and transition criteria, is defined.
   Agile strategy: An agile process is defined.
3. Software life cycle environment is selected and defined.
   Agile strategy: The CI server is defined as the software life cycle environment and forms part of the final deliverables.
4. Additional considerations are addressed.
   Agile strategy: No difference.
5. Software development standards are defined.
   Output: SW Requirements Standards; SW Design Standards; SW Code Standards.
   Agile strategy: No difference.
6. Software plans comply with this document.
   Output: Software Quality Assurance Records; Software Verification Results.
   Agile strategy: The SVRs will now be generated automatically by the unit tests and the CD server, with a small section still produced manually through manual testing.
7. Development and revision of software plans are coordinated.
   Output: Software Quality Assurance Records; Software Verification Results.
   Agile strategy: No difference.


Table A-2: Software Development Processes

1. High-level requirements are developed.
   Output: Software Requirements Data.
   Agile strategy: At the end of each sprint, the implemented user stories generate the high-level requirements section of the SRD.
2. Derived high-level requirements are defined and provided to the system processes, including the system safety assessment process.
   Output: Software Requirements Data.
   Agile strategy: At the end of each sprint, the implemented derived user stories generate the high-level requirements section of the SRD. There is a problem here, in that the system safety assessment process requires the high-level requirements as input and determines the DO-178 level required, but in an agile process the high-level requirements are not defined at the beginning of the project.
3. Software architecture is developed.
   Output: Design Description.
   Agile strategy: At the beginning of each sprint the software architecture is reviewed; at the end of each sprint the software architecture document (DD) is updated.
4. Low-level requirements are developed.
   Output: Design Description.
   Agile strategy: At the end of each sprint, the implemented unit tests marked as low-level requirements generate the low-level requirements section of the DD.
5. Derived low-level requirements are defined and provided to the system processes, including the system safety assessment process.
   Output: Design Description.
   Agile strategy: At the end of each sprint, the implemented unit tests marked as low-level requirements generate the low-level requirements section of the DD.
6. Source code is developed.
   Output: Source Code.
   Agile strategy: No difference.
7. Executable Object Code and Parameter Data Item Files, if any, are produced and loaded in the target computer.
   Output: Executable Object Code.
   Agile strategy: No difference.


Table A-3: Verification of Outputs of Software Requirements Process

1. High-level requirements comply with system requirements.
   Output: Software Verification Results.
   Agile strategy: At the beginning of each sprint, the suitability of the user stories to be implemented is evaluated against the system requirements.
2. High-level requirements are accurate and consistent.
   Output: Software Verification Results.
   Agile strategy: User stories to be accurate and consistent.
3. High-level requirements are compatible with target computer.
   Output: Software Verification Results.
   Agile strategy: User stories verified with continuous deployment and functional tests.
4. High-level requirements are verifiable.
   Output: Software Verification Results.
   Agile strategy: User stories verified with continuous deployment and functional tests.
5. High-level requirements conform to standards.
   Output: Software Verification Results.
   Agile strategy: User stories to conform to standards.
6. High-level requirements are traceable to system requirements.
   Output: Software Verification Results.
   Agile strategy: No difference.
7. Algorithms are accurate.
   Output: Software Verification Results.
   Agile strategy: No difference.


Table A-4: Verification of Outputs of Software Design Process

1. Low-level requirements comply with high-level requirements.
   Output: Software Verification Results.
   Agile strategy: Newly written unit tests marked as low-level requirements are annotated with the high-level requirement they trace to, and reviewed at every sprint.
2. Low-level requirements are accurate and consistent.
   Output: Software Verification Results.
   Agile strategy: Unit tests to be accurate and consistent.
3. Low-level requirements are compatible with target computer.
   Output: Software Verification Results.
   Agile strategy: Unit tests verified with continuous integration.
4. Low-level requirements are verifiable.
   Output: Software Verification Results.
   Agile strategy: Unit tests verified with continuous integration.
5. Low-level requirements conform to standards.
   Output: Software Verification Results.
   Agile strategy: Unit tests to conform to standards.
6. Low-level requirements are traceable to high-level requirements.
   Output: Software Verification Results.
   Agile strategy: Unit tests marked as low-level requirements are annotated with the high-level requirement they trace to.
7. Algorithms are accurate.
   Output: Software Verification Results.
   Agile strategy: Accuracy can be verified with unit tests.
8. Software architecture is compatible with high-level requirements.
   Output: Software Verification Results.
   Agile strategy: Software architecture to be reviewed and updated with every sprint.
9. Software architecture is consistent.
   Output: Software Verification Results.
   Agile strategy: Software architecture to be reviewed and updated with every sprint.
10. Software architecture is compatible with target computer.
    Output: Software Verification Results.
    Agile strategy: Software architecture verified with continuous deployment and functional tests.
11. Software architecture is verifiable.
    Output: Software Verification Results.
    Agile strategy: Software architecture verified with continuous deployment and functional tests.
12. Software architecture conforms to standards.
    Output: Software Verification Results.
    Agile strategy: Software architecture to be reviewed and updated with every sprint.
13. Software partitioning integrity is confirmed.
    Output: Software Verification Results.
    Agile strategy: Software partitioning integrity verified with continuous deployment and functional tests.


Table A-5: Verification of Outputs of Software Coding and Integration Process

1. Source code complies with low-level requirements.
   Output: Software Verification Results.
   Agile strategy: Verified with unit tests.
2. Source code complies with software architecture.
   Output: Software Verification Results.
   Agile strategy: Can be confirmed with sprint code reviews or peer programming.
3. Source code is verifiable.
   Output: Software Verification Results.
   Agile strategy: Verified with unit tests and functional tests.
4. Source code conforms to standards.
   Output: Software Verification Results.
   Agile strategy: Can be confirmed with sprint code reviews or peer programming.
5. Source code is traceable to low-level requirements.
   Output: Software Verification Results.
   Agile strategy: Verified with continuous integration and unit tests.
6. Source code is accurate and consistent.
   Output: Software Verification Results.
   Agile strategy: Can be confirmed with sprint code reviews or peer programming.
7. Output of software integration process is complete and correct.
   Output: Software Verification Results.
   Agile strategy: Software integration process verified with continuous deployment and functional tests.
8. Parameter Data Item File is correct and complete.
   Output: Software Verification Cases and Procedures; Software Verification Results.
   Agile strategy: Parameter Data Item File verified with continuous deployment and functional tests.
9. Verification of Parameter Data Item File is achieved.
   Output: Software Verification Results.
   Agile strategy: Parameter Data Item File verified with continuous deployment and functional tests.


Table A-6: Testing of Outputs of Integration Process

1. Executable Object Code complies with high-level requirements.
   Output: Software Verification Cases and Procedures; Software Verification Results.
   Agile strategy: User stories verified with continuous deployment and functional tests.
2. Executable Object Code is robust with high-level requirements.
   Output: Software Verification Cases and Procedures; Software Verification Results.
   Agile strategy: User stories verified with continuous deployment and functional tests.
3. Executable Object Code complies with low-level requirements.
   Output: Software Verification Cases and Procedures; Software Verification Results.
   Agile strategy: Verified with continuous integration and unit tests.
4. Executable Object Code is robust with low-level requirements.
   Output: Software Verification Cases and Procedures; Software Verification Results.
   Agile strategy: Verified with continuous integration and unit tests.
5. Executable Object Code is compatible with target computer.
   Output: Software Verification Cases and Procedures; Software Verification Results.
   Agile strategy: User stories verified with continuous deployment and functional tests.


Table A-7: Verification of Verification Process Results

1. Test procedures are correct.
   Output: Software Verification Cases and Procedures.
   Agile strategy: Sprint review of unit tests and functional tests.
2. Test results are correct and discrepancies explained.
   Output: Software Verification Results.
   Agile strategy: Sprint review of unit tests and functional tests.
3. Test coverage of high-level requirements is achieved.
   Output: Software Verification Results.
   Agile strategy: Sprint review of unit tests and functional tests.
4. Test coverage of low-level requirements is achieved.
   Output: Software Verification Results.
   Agile strategy: Sprint review of unit tests and functional tests.
5. Test coverage of software structure (modified condition/decision coverage) is achieved.
   Output: Software Verification Results.
   Agile strategy: No difference.
6. Test coverage of software structure (decision coverage) is achieved.
   Output: Software Verification Results.
   Agile strategy: No difference.
7. Test coverage of software structure (statement coverage) is achieved.
   Output: Software Verification Results.
   Agile strategy: No difference.
8. Test coverage of software structure (data coupling and control coupling) is achieved.
   Output: Software Verification Results.
   Agile strategy: No difference.
9. Verification of additional code, that cannot be traced to Source Code, is achieved.
   Output: Software Verification Results.
   Agile strategy: No difference.


Table A-8: Software Configuration Management Process

1. Configuration items are identified.
   Output: SCM Records.
   Agile strategy: No difference.
2. Baselines and traceability are established.
   Output: Software Configuration Index; SCM Records.
   Agile strategy: Baselines generated by cloning the CI/CD; traceability checked not with documented traceability matrices as usual, but by verifying traceability between annotated unit tests, functional tests and user stories.
3. Problem reporting, change control, change review, and configuration status accounting are established.
   Output: Problem Reports; SCM Records.
   Agile strategy: No difference.
4. Archive, retrieval, and release are established.
   Output: SCM Records.
   Agile strategy: No difference.
5. Software load control is established.
   Output: SCM Records.
   Agile strategy: No difference.
6. Software life cycle environment control is established.
   Output: Software Life Cycle Environment Configuration Index; SCM Records.
   Agile strategy: No difference.


Table A-9: Software Quality Assurance Process

1. Assurance is obtained that software plans and standards are developed and reviewed for compliance with this document and for consistency.
   Output: Software Quality Assurance Records.
   Agile strategy: No difference.
2. Assurance is obtained that software life cycle processes comply with approved software plans.
   Output: Software Quality Assurance Records.
   Agile strategy: No difference.
3. Assurance is obtained that software life cycle processes comply with approved software standards.
   Output: Software Quality Assurance Records.
   Agile strategy: No difference.
4. Assurance is obtained that transition criteria for the software life cycle processes are satisfied.
   Output: Software Quality Assurance Records.
   Agile strategy: No difference.
5. Assurance is obtained that software conformity review is conducted.
   Output: Software Quality Assurance Records.
   Agile strategy: No difference.


Table A-10: Certification Liaison Process

1. Communication and understanding between the applicant and the certification authority is established.
   Output: Plan for Software Aspects of Certification.
   Agile strategy: No difference.
2. The means of compliance is proposed and agreement with the Plan for Software Aspects of Certification is obtained.
   Output: Plan for Software Aspects of Certification.
   Agile strategy: No difference.
3. Compliance substantiation is provided.
   Output: Software Accomplishment Summary; Software Configuration Index.
   Agile strategy: No difference.


So there you have it: a proposal for what a more agile DO-178 development process might look like. I want to make it clear that none of this was developed in a vacuum, nor is it solely my own work; it is cherry-picked from various sources, which I will attribute as part of the literature study of my thesis.

The question now is whether this will pass certification, and whether this agile process can deliver software of at least the same robustness and quality as a waterfall process. To answer it, I will be guiding two three-person student groups through the same software project, one group following the waterfall model and the other following the agile model. More on that in the next post (experimental design).

A lot of this post has been quite abstract, not mentioning any specific software solutions to be used during development. The next post will detail the exact solutions the students will use, in the form of PSAC and SDP documents, giving a clearer picture of an agile DO-178 development process.

If I have missed anything or you would like to make a suggestion, kindly do so at the discussion on HN and reddit. Comments and suggestions are very welcome.

If you are currently working, or have in the past worked, on DO-178 projects, it would be appreciated if you would be so kind as to take part in a quick survey about the state of DO-178 development. I will release the results of this survey shortly. Thank you to everyone who has completed the survey already.

When Is a Team Agile?

Background

This post serves as part of a study on the effectiveness of DO-178B certification in achieving correctness of implementation and safety guarantees in the presence of incomplete requirements, feature creep and complex technology stacks, also known as your typical software project.

If you are currently working, or have in the past worked, on DO-178 projects, it would be appreciated if you would be so kind as to take part in a survey about the state of DO-178 development.

One of the challenges in defining a more agile DO-178 process is proving that the process is actually agile. Proving it conforms to DO-178 is easy; there is an entire specification written for that. But agile… hmmm.

So what is agile?

What a question… the software development world is abuzz with agile this and agile that, but ask any practitioner of agile software development this question and you'll invariably get a different answer from each and every one of them.

So before asking the question, let's first frame what I mean by agile. In my mind, agile can be seen to exist on three tiers; call it the agile pyramid if you will.

Tier 1- Philosophy

The first tier is the agile manifesto, where it all began of course. The manifesto states:

  • Individuals and interactions over processes and tools
  • Working software over comprehensive documentation
  • Customer collaboration over contract negotiation
  • Responding to change over following a plan

The agile manifesto also lists twelve principles of agile, but I think the above four statements capture the intention well enough.

Now the manifesto sounds great and all, but it doesn't give you much of an example of how to run an agile project. Fortunately, shortly after the manifesto, Kent Beck led a software project that has been studied quite a bit and gave rise to Extreme Programming (XP). But Extreme Programming was a little too extreme for some, and besides, project managers still didn't have much of a clue how this agile thing works exactly, which leads us to the next tier in the pyramid.

Tier 2- Project management

In order to formalize what agile software development is exactly, legions of consultants sprang up to teach these project managers and their programmers. What came forth was the leading agile development methodology, Scrum, but also Kanban.

Ok, so Scrum is a daily ritual of scrum masters, stand-up meetings, user stories, time limits and velocity tracking. Oh, and Kanban boards.

So what is the difference between Scrum and Kanban? Well, not much, except that Scrum could be described as the anal-retentive cousin of Kanban. For where Scrum has roles, time limits, velocity tracking, daily meetings, scrum masters and so on, Kanban has none of that: just a board and a team. (Ok, I might just have started a religious war; don't take this stuff too seriously.)

Tier 3- Best practices

Scrum and Kanban rely on certain agile best practices to really succeed, and if you really get down to it, this can become a very long list. I've listed the most important ones for my purposes (agile DO-178), but there are many more:

Ok, so back to my initial question: at what point can a team be said to be agile? Is it enough that the spirit of agile is followed (the agile manifesto), or only when the entire pyramid is in effect?

Please join the discussion at HN and reddit. Comments are very welcome.

DO-178B Crash Course

Background

This post serves as part of a three-part introductory primer for 3rd-year computer science students on the typical workings of a software project seeking DO-178B certification. The other parts can be found here:
Agile crash course (TBC)
A more agile DO-178 (TBC)

The students will form part of a study on the effectiveness of the DO-178B certification in achieving correctness of implementation and safety guarantees in the presence of incomplete requirements, feature creep and complex technology stacks, also known as your typical software project.

If you are currently working, or have in the past worked, on DO-178 projects, it would be appreciated if you would be so kind as to take part in a survey about the state of DO-178 development.

What is DO-178?

First, let's start with: what is DO-178? DO-178 is an international standard for the assurance of the safety of avionics software. It is published by RTCA, Incorporated, and the latest revision of the standard is known as DO-178C, although DO-178B is still widely used and is the subject of this post.

Although DO-178 is concerned with the software of airborne systems and equipment, various other industries concerned with safety-critical software have adopted the standard to certify their software. DO-178 ties in closely with DO-254, which is concerned with the development of airborne electronic hardware, and SAE ARP4754, which is concerned with system-level considerations of airborne equipment. There also exist other independent standards with much the same goals as DO-178, namely the IEC 61508 family: IEC 60601-1 for medical devices, ISO 26262 for automotive electronics and IEC 60880-2 for the nuclear energy industry.

This post is not concerned with the actual certification aspects of DO-178B, but with the process DO-178B enforces on software development to ensure the safety and correctness guarantees it attempts to achieve. For a better overview of the actual certification process, especially as it relates to FAA certification, look here. Also another excellent overview of DO-178B can be found in The Avionics Handbook chapter 27.

Criticality level

DO-178B specifies five levels of criticality (A through E) to which a system can be developed. The amount of effort involved in satisfying DO-178B certification depends on the criticality level of your software, and as such it is the first consideration when starting your product development cycle. The criticality level is determined by the possible consequences that anomalous software behaviour would have on the aircraft.

There is very little data on how much additional effort each level requires, with some sources claiming an increase of only 75% to 150%, and others claiming a 1000% increase in costs. It depends on various factors of course, such as the experience of the team, the complexity of the software, the software development life cycle, etc. But a relative measure of the increase in workload can be gauged from the increasing number of objectives to be met for each criticality level.

List of deliverables to be completed

Since DO-178B is a software quality assurance standard, not a software development standard, it does not impose any restrictions or considerations on how software is to be developed.

It does however require the following list of deliverables, with the requirements for each depending on the criticality level chosen. A short description of each deliverable follows:

  • Plan for Software Aspects of Certification (PSAC): The Plan for Software Aspects of Certification is the primary means used by the certification authority for determining whether an applicant is proposing a software life cycle that is commensurate with the rigor required for the level of software being developed.
  • Software Development Plan (SDP): The Software Development Plan includes the objectives, standards and software life cycle(s) to be used in the software development processes.
  • Software Verification Plan (SVP): The Software Verification Plan is a description of the verification procedures to satisfy the software verification process objectives.
  • Software Configuration Management Plan (SCMP): The Software Configuration Management Plan establishes the methods to be used to achieve the objectives of the software configuration management (SCM) process throughout the software life cycle.
  • Software Quality Assurance Plan (SQAP): The Software Quality Assurance Plan establishes the methods to be used to achieve the objectives of the software quality assurance (SQA) process. The SQA Plan may include descriptions of process improvement, metrics, and progressive management methods.
  • Software Requirements Standards (SRS): The purpose of Software Requirements Standards is to define the methods, rules and tools to be used to develop the high-level requirements.
  • Software Design Standards (SDS): The purpose of Software Design Standards is to define the methods, rules and tools to be used to develop the software architecture and low-level requirements.
  • Software Code Standards (SCS): The purpose of the Software Code Standards is to define the programming languages, methods, rules and tools to be used to code the software.
  • Software Requirements Data (SRD): Software Requirements Data is a definition of the high-level requirements, including the derived requirements.
  • Software Design Description (SDD): The Design Description is a definition of the software architecture and the low-level requirements that will satisfy the software high-level requirements.
  • Source Code: This data consists of code written in source language(s) and the compiler instructions for generating the object code from the Source Code, and linking and loading data. This data should include the software identification, including the name and date of revision and/or version, as applicable.
  • Executable Object Code: The Executable Object Code consists of a form of Source Code that is directly usable by the central processing unit of the target computer and is, therefore, the software that is loaded into the hardware or system.
  • Software Verification Cases and Procedures (SVCP): Software Verification Cases and Procedures detail how the software verification process activities are implemented.
  • Software Verification Results (SVR): The Software Verification Results are produced by the software verification process activities.
  • Software Life Cycle Environment Configuration Index (SECI): The Software Life Cycle Environment Configuration Index identifies the configuration of the software life cycle environment. This index is written to aid reproduction of the hardware and software life cycle environment, for software regeneration, reverification, or software modification.
  • Software Configuration Index (SCI): The Software Configuration Index identifies the configuration of the software product.
  • Problem Reports: Problem reports are a means to identify and record the resolution to software product anomalous behavior, process non-compliance with software plans and standards, and deficiencies in software life cycle data.
  • Software Configuration Management Records: The results of the SCM process activities are recorded in SCM Records. Examples include configuration identification lists, baseline or software library records, change history reports, archive records, and release records.
  • Software Quality Assurance Records: The results of the SQA process activities are recorded in SQA Records. These may include SQA review or audit reports, meeting minutes, records of authorized process deviations, or software conformity review records.
  • Software Accomplishment Summary (SAS): The Software Accomplishment Summary is the primary data item for showing compliance with the Plan for Software Aspects of Certification.


That’s a lot of dead trees…

An objective is typically something like "Software development standards are defined" or "High-level requirements are verifiable", so there is still a fair amount of room for interpretation by the developers and the certification body. I'll go into more detail on each objective when considering how DO-178 can be made more agile. (Observant readers will notice that the total number of objectives does not equal that reported in the criticality table; that is because some objectives have to be covered in multiple documents, so I counted them twice.)

Where objectives are marked as requiring independence, it means that someone independent of the developer of the item has to verify conformance. For this purpose quite a few consultants in the business earn their keep by evaluating compliance independently.

Software development process

DO-178B defines the following software development processes:

  • Software requirements process
  • Software design process
  • Software coding process
  • Integration process

Typically for DO-178B this is implemented through the V-model in systems engineering, also somewhat equivalent to the waterfall method in software development.

(Note that this is nowhere specified in the DO-178 specification, but it is what I have typically observed happening on DO-178 projects.)

Traceability

Traceability's purpose is two-fold. The first is to make sure the design takes into account all the requirements set for the project. Requirements which have not been taken into account by the design are called childless requirements.

Traceability analysis must also make sure there are no additional, unneeded requirements introduced during design, as these would unnecessarily escalate the development costs. These are called orphan requirements. It is understood, however, that some requirements may be derived from the design decisions made and are thus not traceable to the user requirements. These derived requirements must be taken into consideration when analyzing their safety effects on the system.

For these reasons most of the documentation listed above will contain a traceability matrix towards the end of the document, indicating the parent of each requirement.
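As an illustration of what such a traceability check boils down to, here is a small sketch. The requirement IDs and the dictionary layout are invented for the example.

```python
# Small sketch of a traceability check over a requirements set.

def check_traceability(high_level_ids, low_level_trace):
    """high_level_ids: set of HLR ids.
    low_level_trace: {LLR id: parent HLR id, or None if it traces to nothing}."""
    traced_parents = {p for p in low_level_trace.values() if p is not None}
    # childless: high-level requirements that no low-level requirement traces to
    childless = high_level_ids - traced_parents
    # orphans: low-level requirements with no (known) parent; genuinely derived
    # requirements would be reviewed for their safety effects rather than rejected
    orphans = {llr for llr, parent in low_level_trace.items()
               if parent is None or parent not in high_level_ids}
    return childless, orphans

hlrs = {"HLR-001", "HLR-002"}
llrs = {"LLR-010": "HLR-001", "LLR-011": None, "LLR-012": "HLR-099"}
childless, orphans = check_traceability(hlrs, llrs)
print(sorted(childless), sorted(orphans))   # ['HLR-002'] ['LLR-011', 'LLR-012']
```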

Verification

Verification is concerned with whether the development is implemented correctly according to the design, and whether the integration is done correctly as designed and developed, i.e. "Are we building this correctly?"

Verification for DO-178B consists of two parts: requirements-based coverage analysis, where it is checked that all requirements are satisfied and tested, and structural coverage analysis, where it is checked that all code paths are executed during testing, so that there is no untested code in the final product.

Lastly as part of the verification process DO-178B requires that no dead code be present in the final binary and that de-activated code (perhaps code used in another configuration of the product) cannot be accidentally executed.

For these reasons structural coverage testing is required, with the criteria depending on the level: statement coverage for Level C, decision coverage for Level B, and modified condition/decision coverage (MC/DC) for Level A.

Validation

Validation is concerned with whether the final product satisfies the intended use of the product, i.e. "Did we build the right thing?" or "Does this product actually work?" Sometimes this is not the case.

Odds and ends

DO-178B does not mandate the development process to be followed, but does focus quite a bit on the supporting functions to the development process. These include configuration management, quality assurance, certification liaison and software verification.

DO-178B lists two control categories according to which every deliverable must be configured. Control Category 1 has requirements such as "protection against unauthorized changes", "change review", etc. Control Category 2 is a relaxed version of Control Category 1. Level A certification mandates that more items be configured according to the requirements of Control Category 1, whereas the lower levels allow more items under Control Category 2. Control Category 1 can be a real pain in the …

DO-178B also focuses quite a bit on the reproducibility of the executable from the source code and on ensuring its correctness. As such, any tools used to produce the executable should be under configuration management, and if possible the tools (such as the compiler) should also be DO-178 certified. This also applies to off-the-shelf software components used with the developed software, and is the reason you can get DO-178 certified RTOSes (real-time operating systems) these days. Good luck getting a DO-178 certified compiler though…

Where these tools and off-the-shelf components do not conform to DO-178 requirements, a gap analysis should be done to determine the effort that would be required to certify the tool or off-the-shelf software to DO-178. Very often it turns out to be cheaper to develop the functionality of the tool or software component in-house than to attempt to certify the already existing item.

A word about documentation conventions

DO-178B does not specify the documentation standard to be followed, but most projects do follow some or other documentation standard. The following figure is loosely based on MIL-STD-490A, although sometimes “Detail design” and “Notes” are changed into some other topic of discussion.

In the documentation, especially when talking about requirements and specifications, certain words convey additional meaning apart from their linguistic use. These words are usually capitalized.

SHALL and SHALL NOT - Indicates a mandatory requirement.

WILL and WILL NOT - Indicates a declaration of purpose or an expression of simple futurity.

SHOULD and SHOULD NOT - Indicates a non-mandatory desire, preference or recommendation.

MAY and MAY NOT - Indicates a non-mandatory suggestion or permission.

MUST and MUST NOT should be avoided, as they cause confusion with the above terms.

That concludes this overview of DO-178B. It is certainly not an exhaustive analysis of DO-178B (for that you might as well just read the specification), but it should prove sufficient to get the students started on implementing a DO-178 certified project.

If I have missed anything or you would like to make a suggestion, kindly do so at the discussion on HN and reddit. Comments and suggestions are very welcome.

I will be drilling deeper into DO-178, especially the 66 objectives mentioned earlier, in my post A more agile DO-178 (TBC). Stay tuned.

Slow Bugs

A slow bug is one which requires substantial testing to show itself. Recently we had such a bug, which required roughly 8 hours of testing before, if you were lucky, you would see it once. Horrible stuff…

This makes for extremely slow debug cycles, almost going back to the batch-programming mainframe days where you run the code in your head and hypothesise about what it is actually doing. It turned out our bug had nothing to do with a fault in the code, but with noise being picked up on one of the JTAG select lines. Unfortunately we didn't start by looking at the JTAG lines…

Once we had settled on this hypothesis and implemented a fix, the question became: how long should one test before being satisfied that the bug is fixed? How long before you are 99% sure, 99.9% sure, 99.999% sure?

Thinking about this problem got me thinking about bug probabilities and how they might manifest. I theorised three types of bugs. The first is bugs with a uniform probability of occurrence, that is, equally likely to occur at any time during the testing procedure. This is applicable to a huge class of bugs, most importantly bugs which don't have memory: timing bugs, race conditions, noise-induced bugs and so on all fit the bill.



This is in contrast to bugs with memory, where the longer you test, the more likely you are to encounter the bug. These could be memory leaks, heap fragmentation or, in some cases, state machine bugs. These bugs represent an interesting dilemma for us, which I'll get back to later.





Looking at the previous two plots, I wondered if there were bugs with the opposite behaviour: bugs which manifest early, but become less likely to manifest the longer the system runs. And it dawned on me that you do get bugs like this, namely system start-up and initialization bugs. Once the system is running it is stable, but if you reboot it many times over, every now and then the reboot will fail.



Focussing on the uniform-probability bugs, we can model the number of times the bug manifests as a Poisson distribution. More specifically, if we encountered on average $ \frac{1}{8} $ bugs every hour, the number of occurrences we see in $ t $ hours of testing follows a Poisson distribution.
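In symbols, with rate $ \lambda = \frac{1}{8} $ bugs per hour, the probability of seeing exactly $ k $ occurrences of the bug in $ t $ hours of testing is:

$$ P(X = k) = \frac{(\lambda t)^k}{k!} e^{-\lambda t} $$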

That's great and all if we want to know how many times we'll see the bug while testing, but what we actually want to know is how long to test to be sure the bug is gone (there's a subtle difference). To calculate this we need the probability of having seen the bug at least once after $ t $ hours of testing, i.e. the cumulative distribution function.

This works out to the cumulative distribution function of the exponential distribution.

Then, to calculate the amount of testing required for a confidence of, say, 99%, we solve for the test time at which that cumulative probability reaches 0.99:
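With rate $ \lambda $ and desired confidence $ c $:

$$ P(T \le t) = 1 - e^{-\lambda t} = c \quad\Longrightarrow\quad t = \frac{-\ln(1 - c)}{\lambda} $$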

This gives us the following answers (I calculated for 99.9% and 99.999% as well).
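A few lines of Python show the calculation, using the 8-hour mean time between manifestations of our particular bug:

```python
from math import log

mean_hours = 8.0          # the bug showed itself on average once per 8 hours
rate = 1.0 / mean_hours   # lambda, in bugs per hour

for confidence in (0.99, 0.999, 0.99999):
    hours = -log(1.0 - confidence) / rate
    print(f"{confidence:.3%} confidence: test for {hours:5.1f} h "
          f"({hours / mean_hours:.1f} x the mean time to manifest)")

# 99.000% confidence: test for  36.8 h (4.6 x the mean time to manifest)
# 99.900% confidence: test for  55.3 h (6.9 x the mean time to manifest)
# 99.999% confidence: test for  92.1 h (11.5 x the mean time to manifest)
```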

So this is quite interesting: to be 99% certain you have solved your bug, you need to test about five times longer than it would normally take for the bug to manifest. I know of many occasions where I only tested about two times longer and called the bug fixed with great confidence. For interest's sake, if you only test two times longer, you can only be about 86% confident your bug is fixed; in other words, there is still roughly a one-in-seven chance it is lurking!

Ok, great, so that answers our initial question, but what about the other types of bugs? Well, for the third class of bugs, i.e. start-up bugs, it is sufficient to think not in terms of how many hours you need to run the test, but rather how many times you should run the test (reboot the system). Then this class of bug looks exactly the same as the uniform-probability bug and can be solved in much the same way, except that instead of hours you get the number of test iterations you should run.

Now, I said the increasing-probability bugs represent an interesting dilemma, and that is because they do not follow a Poisson distribution; no fixed amount of testing will prove to a given confidence level that you have solved the bug. These kinds of bugs are, in essence, reminiscent of the halting problem in computer science. Unfortunately that one has already been proven unsolvable…

Ok, so my stats were never that great; I had some help from my old statistics handbook: Engineering Statistics, Montgomery et al.

*Also please note that this is not in any way supposed to be a rigorous statistical analysis.