Software Engineering — TiTilda Notes

Requirements Engineering (RE)

Requirements Engineering (RE) is the process responsible for discovering and documenting the purpose of a software system. This is done to avoid misunderstandings and to build the right product.

RE is an iterative and collaborative process that requires continuous reviews.

The core activities of RE are:

Requirements

Requirements can be broadly classified into three main categories.

Functional Requirements (FR)

The functional requirements describe the services that the system should provide and the interactions between the system and the environment.

These requirements must be implementation-independent, meaning that they should not specify how the functionality will be implemented, but rather what the system should do.

Non-Functional Requirements (NFR)

The non-functional requirements describe the quality of the system and how well it performs its functions. They do not describe specific behaviors or functions, but may model constraints on the system.

NFRs can be measured using specific metrics (e.g., serving 95% of requests in under 200 ms).

Constraints

The constraints are specific technical or business requirements that limit or restrict the solution.

Examples of constraints include regulatory compliance, budget limitations, or specific technology choices.

How to write requirements

Well-written requirements are essential for a successful project. They must be clear, precise, and unambiguous.

Context

RE is responsible for defining the phenomena (the observable events) that are relevant to the project.

A requirements specification R is complete iff, together with the domain assumptions D, it satisfies (logically entails) the goals G:

R \wedge D \models G
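
A small illustrative instance (hypothetical, not tied to a specific project):

\begin{aligned}
G &: \text{the garage door opens when the remote button is pressed} \\
D &: \text{the motor raises the door whenever it receives the open signal} \\
R &: \text{when the button is pressed, the controller sends the open signal to the motor}
\end{aligned}

Here R together with the domain assumption D entails G, while R alone does not.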

Alloy

Alloy is a logic-based formal notation used to specify models of a system. It allows developers to describe the structure and behavior of a system and perform automated analysis to check for consistency and correctness.

Alloy uses a declarative notation, meaning it describes what the system should do (constraints and relationships) rather than how to do it.

The analysis is performed by the Alloy Analyzer, which translates the model into a boolean formula and uses a SAT solver to find instances (examples) or counterexamples.

Core Concepts

Signatures (sig)

Signatures define the types of objects (atoms) in the system. They are similar to classes in OOP but represent sets of atoms.

sig Name, Addr {}
sig Book {
    addr: set Addr  // each Book holds a set of addresses
}

Multiplicity

Fields in signatures define relations. Multiplicity keywords constrain the size of these relations: one (exactly one, the default for unary fields), lone (zero or one), some (at least one), and set (any number).

Relations

Relations connect atoms from one signature to atoms of another (or the same) signature. They can be unary, binary, or n-ary.

sig Object {}

sig Chart {
    object: set Object -> set Int
}

Facts (fact)

Facts are constraints that are assumed to be always true in the model. They restrict the set of possible instances.

fact NoSelfReference {
    all n: Node | n !in n.next
}

They can also be declared without a name, or appended directly to a signature definition (a signature fact):

sig Node {
    next: lone Node
} {
    next != this
}

Predicates (pred)

Predicates are parameterized constraints that can be reused. They are often used to describe operations or state transitions. They are not automatically enforced but can be invoked.

pred add [b, b2: Book, a: Addr] {
    // b is the book before the operation, b2 the book after
    b2.addr = b.addr + a
}

Functions (fun)

Functions are expressions that return a value (a set or relation) rather than a boolean.

fun addrCount [b: Book]: Int {
    #b.addr
}

Assertions (assert)

Assertions are properties that the system is expected to satisfy. The analyzer checks if these hold given the facts.

assert AddIdempotent {
    all b, b2, b3: Book, a: Addr |
        (add[b, b2, a] and add[b2, b3, a]) implies b2.addr = b3.addr
}

Analysis Commands

The two main commands are run, which asks the Alloy Analyzer to find an instance satisfying a predicate within a bounded scope, and check, which searches for a counterexample to an assertion within that scope.

Operators

Temporal Logic (Alloy 6)

Alloy 6 introduces support for Linear Temporal Logic (LTL), allowing the modeling of dynamic systems where state changes over time.

Mutable Signatures and Fields (var)

To model changing state, signatures and fields can be marked as var.

var sig State {}
sig System {
    var status: one State
}

Temporal Operators

These operators are used to express properties over execution traces.

Operators about the future: always (in every current and future state), eventually (now or in some future state), after (in the next state), until, and releases.

Operators about the past: historically (in every past state), once (in some past state), before (in the previous state), since, and triggered.

Prime Operator (')

The prime symbol (') is used to refer to the value of a variable in the next state.

fact Transition {
    always ( some s: System | s.status' != s.status )
}

Software Design

Software Design is the phase where we decide how the system will be implemented. It bridges the gap between requirements and code by making high-level decisions about the system’s structure.

Design is not about “perfection”: it is a negotiation among multiple trade-offs (performance, maintainability, scalability, etc.).

The workflow is:

stateDiagram
  state HighPhases {
    FeasibilityStudy: Feasibility Study
    RequirementsAnalysis: Requirements Analysis
    ArchitecturalDesign: Architectural Design
  }
  state LowPhases {
    CodingAndUnitTesting: Coding and Unit Testing
    IntegrationAndTesting: Integration and Testing
    Deployment
    Maintenance
  }

  [*] --> HighPhases
  FeasibilityStudy --> RequirementsAnalysis
  RequirementsAnalysis --> ArchitecturalDesign
  HighPhases --> LowPhases
  CodingAndUnitTesting --> IntegrationAndTesting
  IntegrationAndTesting --> Deployment
  Deployment --> Maintenance

To reduce complexity, the system is examined through different views:

Module Structure (Static View)

The module structure describes how the system is decomposed into implementation units (modules, files, packages, libraries, etc.) and how they relate to each other.

This view is used to evaluate:

The module structure can be represented with:

Component-and-Connector (C&C) Structure (Runtime View)

The C&C structure describes how the system behaves at runtime.

The view is separated between:

This view is used to evaluate:

The C&C structure can be represented with:

Deployment Structure (Physical View)

The deployment structure describes how the system is physically deployed on hardware and network infrastructure.

The components mapped are:

This is crucial for non-functional requirements like performance, availability, and security.

The deployment structure can be represented with:

Architecture Style

Client-Server

The Client-Server architecture is a distributed structure that divides the system into two main components: the server (provider of services) and the client (consumer of services).

There are three main layers:

The client-server model can be organized based on the distribution of workload:

Concurrency Models (Handling Multiple Requests)

Servers must handle multiple requests simultaneously. Common approaches include:

Request per Process

Traditional servers (like older versions of Apache) handle concurrency by forking a new process (or thread) for each incoming request. This isolates requests but is resource-intensive and inefficient under high load due to context switching overhead.

Worker Pool

Modern servers (like Nginx) use a worker pool.

A fixed number of workers handle requests from a shared queue. This prevents resource exhaustion and handles high concurrency more efficiently, though it may introduce availability issues if the queue becomes full.
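
A minimal worker-pool sketch using a bounded shared queue; handle_request is a hypothetical placeholder for the real request processing.

import queue
import threading

NUM_WORKERS = 4
requests = queue.Queue(maxsize=100)  # bounded queue: protects the server from resource exhaustion

def handle_request(req):
    print(f"handled {req}")  # placeholder for the real work

def worker():
    while True:
        req = requests.get()  # blocks until a request is available
        try:
            handle_request(req)
        finally:
            requests.task_done()

for _ in range(NUM_WORKERS):
    threading.Thread(target=worker, daemon=True).start()

for i in range(10):
    requests.put(i)  # put() blocks when the queue is full: this is where availability issues appear
requests.join()      # wait until all queued requests have been processed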

REST (Representational State Transfer)

REST is an architectural style for distributed systems, commonly used over HTTP.

Key constraints include:

Data is serialized into formats like JSON, XML, or Protocol Buffers. These formats vary in:

Error Handling

Error handling is decoupled. The server returns standard HTTP status codes (e.g., 400 Bad Request, 500 Internal Server Error) with an optional error body, and the client is responsible for handling them appropriately.
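
A client-side sketch of this decoupled handling, assuming a hypothetical JSON endpoint /api/v1/orders and using the requests library.

import requests

resp = requests.get("https://example.com/api/v1/orders/42", timeout=5)

if resp.status_code == 200:
    order = resp.json()  # success: parse the response body
elif 400 <= resp.status_code < 500:
    # client error (e.g., 400 Bad Request, 404 Not Found); the error body is assumed to be JSON
    print("client error:", resp.status_code, resp.json().get("error", ""))
else:
    # server error (e.g., 500 Internal Server Error): retrying or failing gracefully is up to the client
    print("server error:", resp.status_code)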

Versioning

To maintain backward compatibility, APIs should be versioned (e.g., /api/v1/resource). This allows introducing new features without breaking existing clients.

Interface Documentation

Documentation is crucial for developers consuming the API. Standard specifications like OpenAPI (formerly Swagger) allow describing:

Tools can generate documentation, client SDKs, and server stubs from these specifications.

Event-Driven Architecture

Event-Driven Architecture is based on the producer-consumer pattern. Components communicate by emitting and reacting to events.

This decouples producers from consumers; they do not need to know about each other, only about the event format.

Delivery Models

Delivery Semantics

Kafka

Apache Kafka is a popular distributed event streaming platform. It uses a log-based approach:

Kafka uses a pull mechanism, allowing consumers to process events at their own speed. It ensures fault tolerance through replication.
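
A minimal sketch with the kafka-python client, assuming a broker on localhost:9092 and a hypothetical orders topic.

from kafka import KafkaProducer, KafkaConsumer

producer = KafkaProducer(bootstrap_servers="localhost:9092")
producer.send("orders", b'{"id": 42}')  # the event is appended to the topic's log
producer.flush()

consumer = KafkaConsumer(
    "orders",
    bootstrap_servers="localhost:9092",
    group_id="billing",            # consumers in the same group share the topic's partitions
    auto_offset_reset="earliest",  # start from the beginning of the log if no committed offset exists
)
for message in consumer:           # pull model: the consumer reads at its own pace
    print(message.offset, message.value)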

Microservices

Microservices architecture structures an application as a collection of loosely coupled, independently deployable services. This contrasts with a Monolithic architecture, where all components are bundled into a single unit.

The main advantages of microservices include:

The main components of a microservices architecture are:

Service Discovery (Location Transparency)

In a dynamic environment, service instances scale up/down and change IP addresses. Service Discovery allows clients to locate services without hardcoding addresses.

  1. Registration: Services register themselves with the Service Discovery upon startup.
  2. Discovery: Clients query the Service Discovery to get the location of a service.
  3. Health Checks: The Service Discovery monitors service health via heartbeats. If a service stops sending heartbeats, it is removed from the registry.
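
An illustrative in-memory registry (a toy sketch, not a real tool such as Consul or Eureka) showing registration, heartbeats, and discovery.

import random
import time

HEARTBEAT_TIMEOUT = 10  # seconds without a heartbeat before an instance is considered dead

registry = {}  # service name -> {instance address: timestamp of last heartbeat}

def register(service, address):
    registry.setdefault(service, {})[address] = time.time()

def heartbeat(service, address):
    registry[service][address] = time.time()

def discover(service):
    now = time.time()
    alive = [addr for addr, seen in registry.get(service, {}).items()
             if now - seen < HEARTBEAT_TIMEOUT]      # health check: ignore stale instances
    return random.choice(alive) if alive else None   # naive client-side load balancing

register("payments", "10.0.0.7:8080")
print(discover("payments"))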

Key considerations:

Resilience Patterns

To detect failures, each service is called through a Circuit Breaker: a proxy that monitors calls to the service and prevents the application from repeatedly attempting an operation that is likely to fail.

stateDiagram
    [*] --> Closed
    Closed --> Open: failures > threshold
    Open --> HalfOpen: timeout
    HalfOpen --> Closed: success
    HalfOpen --> Open: failure
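
An illustrative circuit breaker following the state machine above; the threshold and timeout values are hypothetical.

import time

class CircuitBreaker:
    def __init__(self, threshold=5, timeout=30):
        self.threshold = threshold  # consecutive failures tolerated before opening
        self.timeout = timeout      # seconds to wait in the open state before a trial call
        self.failures = 0
        self.state = "closed"
        self.opened_at = 0.0

    def call(self, operation):
        if self.state == "open":
            if time.time() - self.opened_at < self.timeout:
                raise RuntimeError("circuit open: failing fast")
            self.state = "half-open"  # timeout elapsed: allow one trial call
        try:
            result = operation()
        except Exception:
            self.failures += 1
            if self.state == "half-open" or self.failures > self.threshold:
                self.state = "open"   # trip (or re-trip) the breaker
                self.opened_at = time.time()
            raise
        self.failures = 0
        self.state = "closed"         # a success closes the circuit again
        return result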

Security Patterns (API Gateway)

Directly exposing microservices to clients creates security and complexity issues. An API Gateway acts as a single entry point for all clients.

Note: The Gateway can be a single point of failure, so it is usually replicated.

Communication & Coupling

Communication between services can be:

Using queues (like RabbitMQ or Kafka) buffers requests and decouples the sender from the receiver.

Availability

Availability is the probability that a system is operational and able to perform its required function at a given instant of time. It is a measure of the system’s readiness.

Failure Lifecycle

When a failure occurs, the recovery process involves several phases:

  1. Detection Time: Time between the occurrence of the failure and its detection.
  2. Response Time: Time to diagnose the issue and decide on a recovery strategy.
  3. Repair Time: Time to fix the issue (replace component, restart service, etc.).
  4. Recovery Time: Time to restore the system to normal operation (sync state, warm up caches).

Metrics

The availability A is calculated as:

A = \frac{MTTF}{MTBF} = \frac{MTTF}{MTTF + MTTR}
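
As a worked example with hypothetical values, a component with MTTF = 999 hours and MTTR = 1 hour gives:

A = \frac{999}{999 + 1} = 0.999

that is, “three nines”: roughly 8.76 hours of downtime per year, as shown in the table below.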

The “Nines” Notation

Availability is often expressed in “nines”:

Availability | Downtime per year
--- | ---
90% (1 nine) | 36.5 days
99% (2 nines) | 3.65 days
99.9% (3 nines) | 8.76 hours
99.99% (4 nines) | 52.56 minutes
99.999% (5 nines) | 5.26 minutes
99.9999% (6 nines) | 31.5 seconds

The availability of a composite system depends on how its components are connected:
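
Under the standard assumption of independent failures, components in series (all must work) and in parallel (at least one must work) give:

A_{series} = \prod_{i} A_i \qquad A_{parallel} = 1 - \prod_{i} (1 - A_i)

For example, two components with A = 0.99 yield 0.9801 in series and 0.9999 in parallel.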

Availability Tactics

Replication

Replication involves using multiple instances of a component to ensure continuity.

If the system is stateless, switching is immediate. If stateful, state synchronization is needed.

Forward Error Recovery

In forward error recovery, the system is designed to continue operating correctly even in the presence of faults.

From the normal state, the system goes to the failure state when a fault occurs. The system then detects the failure and transitions to a degraded state, where it takes corrective actions to return to the normal state.

Design Document (DD) Structure

The Design Document (DD) describes the high-level design decisions and how the system will be implemented to satisfy the requirements specified in the RASD.

  1. Introduction
    1. Scope: Defines the boundaries of the system and what is included/excluded.
    2. Definitions: Glossary of terms used in the document.
    3. Reference Documents: Lists related documents (e.g., RASD, project plan).
    4. Overview: High-level summary of the system’s design and structure.
  2. Architectural Design
    1. Overview: Informal description of high-level components and their interactions.
    2. Component View: Static component diagrams showing the system’s modules and their relationships.
    3. Deployment View: Deployment diagrams illustrating physical nodes, hardware, and software environments.
    4. Component Interfaces: Signatures and descriptions of the interfaces between components.
    5. Runtime View: Dynamic interactions described via sequence diagrams.
    6. Selected Architectural Styles and Patterns: Justification and description of chosen styles (e.g., client-server, microservices).
    7. Other Design Decisions: Additional decisions impacting the design (e.g., trade-offs, constraints).
  3. User Interface Design: Mockups or wireframes of the UI, refining the RASD from low to mid-fidelity prototypes.
  4. Requirements Traceability: Mapping between requirements (from RASD) and design components, often using a traceability matrix.
  5. Implementation, Integration, and Test Plan: Defines the order of component implementation (sequential/parallel), integration strategies, and testing approaches.
  6. Effort Spent: Summary of time and resources expended during design activities.
  7. References: Citations for external sources, standards, or tools used.

Verification and Validation

Verification and Validation (V&V) are independent procedures that are used together for checking that a product meets requirements and specifications and that it fulfills its intended purpose.

The chain of causality for software problems is:

  1. Error (Mistake): A human action that produces an incorrect result.
  2. Defect (Fault/Bug): An imperfection or deficiency in a product where it does not meet its requirements or specifications.
  3. Failure: An event in which a system or system component does not perform a required function within specified limits.

Static Analysis

Static Analysis is the process of evaluating a system or component based on its form, structure, content, or documentation, without executing the code.

This can be achieved using linters, type checkers, and formal verification tools.

Some common defects detected by static analysis tools include:

Since checking non-trivial properties of programs is undecidable (Rice’s Theorem), static analysis tools must approximate. They need to balance precision (minimizing false positives) with performance.

Data Flow Analysis

Data flow analysis gathers information about the possible set of values calculated at various points in a computer program. It operates on the Control Flow Graph (CFG).

The CFG is a directed graph that represents all paths that might be traversed through a program during its execution. It is composed of:

Reaching Definitions Analysis

The CFG can be used for Reaching Definitions Analysis, which determines which definitions of a variable v may reach a point p in the code without being overwritten (killed).

For a basic block n:

The data flow equations are:

In[n] = \bigcup_{p \in pred(n)} Out[p]

Out[n] = Gen[n] \cup (In[n] - Kill[n])
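
A minimal worklist-style sketch of these equations; the CFG and the Gen/Kill sets below are illustrative, not extracted from real code.

def reaching_definitions(nodes, preds, gen, kill):
    """nodes: block ids; preds[n]: predecessors of n; gen[n]/kill[n]: sets of definition ids."""
    In = {n: set() for n in nodes}
    Out = {n: set(gen[n]) for n in nodes}
    changed = True
    while changed:  # iterate the equations until a fixed point is reached
        changed = False
        for n in nodes:
            In[n] = set().union(*(Out[p] for p in preds[n])) if preds[n] else set()
            new_out = gen[n] | (In[n] - kill[n])
            if new_out != Out[n]:
                Out[n], changed = new_out, True
    return In, Out

# d1: "x = 1" in block A; d2: "x = 2" in block B; block C joins A and B.
nodes = ["A", "B", "C"]
preds = {"A": [], "B": [], "C": ["A", "B"]}
gen = {"A": {"d1"}, "B": {"d2"}, "C": set()}
kill = {"A": {"d2"}, "B": {"d1"}, "C": set()}
print(reaching_definitions(nodes, preds, gen, kill)[0]["C"])  # both d1 and d2 may reach C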

From the Reaching Definitions we can also reason about the liveness of variables: a variable v is “live” at a point p if the value it holds at p may still be needed later in the execution.

It is also possible to build def-use chains and use-def chains:

These chains are essential for optimizations (like dead code elimination) and bug finding (like use-before-define).

Symbolic Execution

Symbolic Execution is a program analysis technique that executes programs with symbolic inputs instead of concrete values. This makes it possible to analyze reachability (which parts of the code can be executed), check path feasibility (which paths can actually be taken), and generate test cases.

As the program is executed symbolically, a path condition is built: a logical formula representing the constraints on the inputs that must hold for the execution to follow a particular path.

When a branch is encountered (e.g., an if statement), the symbolic execution forks into two paths:

  1. True branch: The path condition is updated to include the condition of the branch.
  2. False branch: The path condition is updated to include the negation of the condition.

At the end of each path, it is possible to analyze the path condition to determine whether the path is feasible (i.e., whether there exists an input that satisfies it) or infeasible.
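
As a sketch, path conditions can be checked with an SMT solver such as Z3 (here via the z3-solver Python bindings); the conditions below come from a hypothetical program with nested branches if x > 10 and if x < 5.

from z3 import Int, Solver

x = Int("x")               # symbolic input

path1 = Solver()
path1.add(x > 10, x < 5)   # path condition: both true branches taken
print(path1.check())       # unsat -> this path is infeasible

path2 = Solver()
path2.add(x > 10, x >= 5)  # outer true branch, inner condition negated
print(path2.check())       # sat -> feasible
print(path2.model()[x])    # a concrete input exercising this path (usable as a test case)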

Dynamic Analysis

Dynamic Analysis is the process of evaluating a system or component based on its execution behavior. It involves running the program with specific inputs and observing its behavior to identify defects.

The main goals are:

A test is composed of:

Testing

Test cases can be generated with two main approaches:

The generation can be manual or automated.

Concolic Execution

Concolic Execution (CONCrete + symbOLIC) is a hybrid testing technique that combines concrete execution with symbolic execution. It aims to leverage the strengths of both approaches to improve test coverage and defect detection.

The process involves executing the program with concrete inputs while simultaneously tracking symbolic expressions for the program’s variables. This allows the generation of new test inputs that explore different execution paths.

  1. Concrete Execution: The program is executed with specific concrete inputs, and the actual values of variables are recorded.
  2. Symbolic Tracking: Alongside the concrete execution, symbolic expressions are maintained for the program’s variables.
  3. Path Exploration: When a branch is encountered, the path condition is updated symbolically. New test inputs are generated by negating the path condition of the taken branch, allowing exploration of alternative paths.

Fuzzing

Fuzzing is an automated testing technique that involves providing random or semi-random inputs to a program to discover vulnerabilities, crashes, or unexpected behavior. The main idea is to stress-test the system by feeding it a large volume of inputs, some of which may be malformed or unexpected.

Fuzzing can be classified into:

Fuzzing is good at finding buffer overflows, missing input validation, unhandled edge cases, and other security vulnerabilities.

The best practice is to use fuzzing along with runtime memory checks (like AddressSanitizer) to detect memory corruption issues.
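
A naive random fuzzer sketch; parse is a hypothetical function under test, and real fuzzers (AFL, libFuzzer, Atheris) add coverage feedback and are paired with sanitizers as noted above.

import random

def parse(data: bytes) -> None:
    ...  # the (hypothetical) function under test

for i in range(100_000):
    size = random.randint(0, 1024)
    data = bytes(random.getrandbits(8) for _ in range(size))  # random, possibly malformed input
    try:
        parse(data)
    except Exception as exc:  # a crash or unexpected exception is a finding
        print(f"input #{i} ({size} bytes) triggered {exc!r}")
        break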

Search-Based Testing

Search-Based Testing is an automated testing technique that generates tests based on an objective (coverage, reachability, etc.).

The distance to the objective is calculated using a fitness function. The fitness function assigns a score to each test case based on how close it is to achieving the testing objective.
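
As an illustration, to cover the hypothetical branch if x == 4242 the fitness function can be the branch distance |x - 4242|, which a simple local search then minimizes.

import random

def fitness(x: int) -> int:
    return abs(x - 4242)  # 0 means the target branch is taken

best = random.randint(0, 10_000)
while fitness(best) > 0:  # simple hill climbing over neighbouring inputs
    step = random.randint(1, max(1, min(100, fitness(best))))
    neighbour = best + random.choice([-1, 1]) * step
    if fitness(neighbour) < fitness(best):
        best = neighbour
print("input reaching the branch:", best)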

Unit Testing

Unit Testing is a process, typically based on manually written test cases, aimed at verifying the correctness of individual units or components of a software system in isolation.

This can be achieved using strategies such as test doubles (stubs and mocks) that replace the unit’s dependencies, and drivers that exercise the unit:
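
A minimal sketch of a unit test that isolates the unit from its dependency with Python’s unittest.mock; checkout and the payment gateway are hypothetical names.

import unittest
from unittest.mock import Mock

def checkout(cart_total, gateway):
    """Unit under test: charges the total and reports success."""
    return gateway.charge(cart_total) == "ok"

class CheckoutTest(unittest.TestCase):
    def test_checkout_charges_the_gateway(self):
        gateway = Mock()                  # test double standing in for the real payment service
        gateway.charge.return_value = "ok"
        self.assertTrue(checkout(42, gateway))
        gateway.charge.assert_called_once_with(42)

if __name__ == "__main__":
    unittest.main()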

E2E Testing

End-to-End (E2E) Testing is a testing procedure that validates the complete and integrated system to ensure that it meets the specified requirements and behaves as expected from start to finish.

This makes it possible to test the functionality, performance, load handling, and security of the entire system.

Project Management

Project Management is the discipline of planning, organizing, and managing resources to achieve specific goals within a defined timeline and budget.

It is composed of different phases:

Initiation

Project Initiation is the first phase of the project management lifecycle. It involves defining the project at a broad level and obtaining authorization to start the project.

Planning

Project Planning is the second phase of the project management lifecycle. It involves developing a detailed project plan that outlines how the project will be executed, monitored, and controlled.

Scheduling

This includes defining tasks, milestones, timelines, and deliverables.

The work is divided using a Work Breakdown Structure (WBS) that decomposes the project into smaller, manageable components.

The tasks are organized using the Precedence Diagramming Method (PDM), which represents tasks as nodes and dependencies as directed edges. This makes it possible to identify the Critical Path (the longest sequence of dependent tasks, which determines the minimum project duration).
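
A small sketch of the critical-path computation on a PDM-style graph; tasks, durations, and dependencies are hypothetical.

durations = {"A": 3, "B": 2, "C": 4, "D": 1}
depends_on = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}

finish = {}  # earliest finish time of each task

def earliest_finish(task):
    if task not in finish:
        start = max((earliest_finish(d) for d in depends_on[task]), default=0)
        finish[task] = start + durations[task]
    return finish[task]

print("minimum project duration:", max(earliest_finish(t) for t in durations))  # 8, via A -> C -> D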

Tasks can be tracked using Gantt charts, which represent them over time, showing dependencies and progress.

Risk Management

Risk Management is the process of identifying, assessing, and mitigating risks that could impact the success of a project.

Each risk is formulated using this format: “If <cause> happens, then <consequence> will occur, for <stakeholder>.”

Each risk has its own:

Each risk must have a Mitigation Strategy to reduce its likelihood or impact.

Effort Estimation

Effort estimation is the process of predicting the amount of effort (time, resources) required to complete a project or a task.

This can be done using Function Point Analysis (FPA) that estimates the size of the software based on its functionality.

The application is decomposed into its functional components:

Each component is assigned a complexity level (Low, Average, High).

Then a standard table converts the components into Unadjusted Function Points (UFP) based on their type and complexity.

The total UFP is calculated by summing the function points of all components.
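
A sketch of the UFP computation using the standard IFPUG weight table; the counted components below are hypothetical.

WEIGHTS = {                # (low, average, high)
    "EI":  (3, 4, 6),      # External Inputs
    "EO":  (4, 5, 7),      # External Outputs
    "EQ":  (3, 4, 6),      # External Inquiries
    "ILF": (7, 10, 15),    # Internal Logical Files
    "EIF": (5, 7, 10),     # External Interface Files
}
LEVEL = {"low": 0, "average": 1, "high": 2}

counted = [("EI", "average", 4), ("EO", "low", 2), ("ILF", "high", 1)]

ufp = sum(count * WEIGHTS[kind][LEVEL[level]] for kind, level, count in counted)
print("Unadjusted Function Points:", ufp)  # 4*4 + 2*4 + 1*15 = 39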

Execution, Monitoring, and Control

This phase involves carrying out the project plan, tracking progress, and making adjustments as necessary to ensure the project stays on track.

At the beginning of the project, a time and cost baseline is established to measure performance. All the variables are converted into Earned Value (EV), which represents the value of work actually performed up to a specific point in time.

The main metrics used are:

From these metrics, we can derive, from the schedule point of view:

From the cost point of view:

These metrics help identify whether the project is ahead of or behind schedule, under or over budget, and how it is expected to perform in the future (Estimate At Completion, EAC).

The EAC can be calculated using different formulas based on the situation:
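
For reference, a compact summary of the standard EVM relations, assuming the usual definitions of PV (Planned Value), AC (Actual Cost), and BAC (Budget At Completion):

SV = EV - PV \qquad SPI = \frac{EV}{PV} \qquad CV = EV - AC \qquad CPI = \frac{EV}{AC}

EAC = AC + (BAC - EV) \quad \text{(remaining work proceeds at the planned rate)}

EAC = \frac{BAC}{CPI} \quad \text{(current cost performance continues)}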

Closing

The project closing phase involves finalizing all project activities, delivering the completed product to the client, and formally closing the project.

This phase is important to ensure that all project objectives have been met, lessons learned are documented, and resources are released for future projects.

Last modified:
Written by: Andrea Lunghi