Software Engineering — TiTilda Notes

Requirements Engineering (RE)

Requirements Engineering (RE) is the process responsible for discovering and documenting the purpose of a software system. This is done to avoid misunderstandings and to build the right product.

RE is an iterative and collaborative process that requires continuous reviews.

The core activities of RE are:

Requirements

Requirements can be broadly classified into three main categories.

Functional Requirements (FR)

The functional requirements describe the services that the system should provide and the interactions between the system and the environment.

These requirements must be implementation-independent, meaning that they should not specify how the functionality will be implemented, but rather what the system should do.

Non-Functional Requirements (NFR)

The non-functional requirements describe the quality of the system and how well it performs its functions. They do not describe specific behaviors or functions, but may model constraints on the system.

NFRs can be measured using specific metrics (e.g., serving 95% of requests in under 200 ms).

Constraints

The constraints are specific technical or business requirements that limit or restrict the solution.

Examples of constraints include regulatory compliance, budget limitations, or specific technology choices.

How to write requirements

Well-written requirements are essential for a successful project. They must be clear, precise, and unambiguous.

Context

RE is responsible for defining the phenomena (the observable events) that are relevant to the project.

A requirements specification R is complete iff, together with the domain assumptions D, it satisfies (logically entails) the goals G:

R \wedge D \models G
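
A small illustrative instance (hypothetical, not tied to a specific project):

\begin{aligned}
G &: \text{the garage door opens when the remote button is pressed} \\
D &: \text{the motor raises the door whenever it receives the open signal} \\
R &: \text{when the button is pressed, the controller sends the open signal to the motor}
\end{aligned}

Here R together with the domain assumption D entails G, while R alone does not.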

Alloy

Alloy is a logic-based formal notation used to specify models of a system. It allows developers to describe the structure and behavior of a system and perform automated analysis to check for consistency and correctness.

Alloy uses a declarative notation, meaning it describes what the system should do (constraints and relationships) rather than how to do it.

The analysis is performed by the Alloy Analyzer, which translates the model into a boolean formula and uses a SAT solver to find instances (examples) or counterexamples.

Core Concepts

Signatures (sig)

Signatures define the types of objects (atoms) in the system. They are similar to classes in OOP but represent sets of atoms.

sig Name, Addr {}
sig Book {
    addr: set Addr  // each Book holds a set of addresses
}

Multiplicity

Fields in signatures define relations. Multiplicity keywords constrain the size of these relations: one (exactly one, the default for unary fields), lone (zero or one), some (at least one), and set (any number).

Relations

Relations connect atoms from one signature to atoms of another (or the same) signature. They can be unary, binary, or n-ary.

sig Object {}

sig Chart {
    object: set Object -> set Int
}

Facts (fact)

Facts are constraints that are assumed to be always true in the model. They restrict the set of possible instances.

fact NoSelfReference {
    all n: Node | n !in n.next
}

They can also be declared without a name, or appended directly to a signature definition (a signature fact):

sig Node {
    next: lone Node
} {
    next != this
}

Predicates (pred)

Predicates are parameterized constraints that can be reused. They are often used to describe operations or state transitions. They are not automatically enforced but can be invoked.

pred add [b, b2: Book, a: Addr] {
    // b is the book before the operation, b2 the book after
    b2.addr = b.addr + a
}

Functions (fun)

Functions are expressions that return a value (a set or relation) rather than a boolean.

fun addrCount [b: Book]: Int {
    #b.addr
}

Assertions (assert)

Assertions are properties that the system is expected to satisfy. The analyzer checks if these hold given the facts.

assert AddIdempotent {
    all b, b2, b3: Book, a: Addr |
        (add[b, b2, a] and add[b2, b3, a]) implies b2.addr = b3.addr
}

Analysis Commands

The two main commands are run, which asks the Alloy Analyzer to find an instance satisfying a predicate within a bounded scope, and check, which searches for a counterexample to an assertion within that scope.

Operators

Temporal Logic (Alloy 6)

Alloy 6 introduces support for Linear Temporal Logic (LTL), allowing the modeling of dynamic systems where state changes over time.

Mutable Signatures and Fields (var)

To model changing state, signatures and fields can be marked as var.

var sig State {}
sig System {
    var status: one State
}

Temporal Operators

These operators are used to express properties over execution traces.

Operators about the future: always (in every current and future state), eventually (now or in some future state), after (in the next state), until, and releases.

Operators about the past: historically (in every past state), once (in some past state), before (in the previous state), since, and triggered.

Prime Operator (')

The prime symbol (') is used to refer to the value of a variable in the next state.

fact Transition {
    always ( some s: System | s.status' != s.status )
}

Software Design

Software Design is the phase where we decide how the system will be implemented. It bridges the gap between requirements and code by making high-level decisions about the system’s structure.

Design is not about “perfection”: it is a negotiation among multiple trade-offs (performance, maintainability, scalability, etc.).

The workflow is:

stateDiagram
  state HighPhases {
    FeasibilityStudy: Feasibility Study
    RequirementsAnalysis: Requirements Analysis
    ArchitecturalDesign: Architectural Design
  }
  state LowPhases {
    CodingAndUnitTesting: Coding and Unit Testing
    IntegrationAndTesting: Integration and Testing
    Deployment
    Maintenance
  }

  [*] --> HighPhases
  FeasibilityStudy --> RequirementsAnalysis
  RequirementsAnalysis --> ArchitecturalDesign
  HighPhases --> LowPhases
  CodingAndUnitTesting --> IntegrationAndTesting
  IntegrationAndTesting --> Deployment
  Deployment --> Maintenance

To reduce complexity, the system is examined through different views:

Module Structure (Static View)

The module structure describes how the system is decomposed into implementation units (modules, files, packages, libraries, etc.) and how they relate to each other.

This view is used to evaluate:

The module structure can be represented with:

Component-and-Connector (C&C) Structure (Runtime View)

The C&C structure describes how the system behaves at runtime.

The view is separated between:

This view is used to evaluate:

The C&C structure can be represented with:

Deployment Structure (Physical View)

The deployment structure describes how the system is physically deployed on hardware and network infrastructure.

The components mapped are:

This is crucial for non-functional requirements like performance, availability, and security.

The deployment structure can be represented with:

Architecture Style

Client-Server

The Client-Server architecture is a distributed structure that divides the system into two main components: the server (provider of services) and the client (consumer of services).

There are three main layers:

The client-server model can be organized based on the distribution of workload:

Concurrency Models (Handling Multiple Requests)

Servers must handle multiple requests simultaneously. Common approaches include:

Request per Process

Traditional servers (like older versions of Apache) handle concurrency by forking a new process (or thread) for each incoming request. This isolates requests but is resource-intensive and inefficient under high load due to context switching overhead.

Worker Pool

Modern servers (like Nginx) use a worker pool.

A fixed number of workers handle requests from a shared queue. This prevents resource exhaustion and handles high concurrency more efficiently, though it may introduce availability issues if the queue becomes full.
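
A minimal worker-pool sketch using a bounded shared queue; handle_request is a hypothetical placeholder for the real request processing.

import queue
import threading

NUM_WORKERS = 4
requests = queue.Queue(maxsize=100)  # bounded queue: protects the server from resource exhaustion

def handle_request(req):
    print(f"handled {req}")  # placeholder for the real work

def worker():
    while True:
        req = requests.get()  # blocks until a request is available
        try:
            handle_request(req)
        finally:
            requests.task_done()

for _ in range(NUM_WORKERS):
    threading.Thread(target=worker, daemon=True).start()

for i in range(10):
    requests.put(i)  # put() blocks when the queue is full: this is where availability issues appear
requests.join()      # wait until all queued requests have been processed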

REST (Representational State Transfer)

REST is an architectural style for distributed systems, commonly used over HTTP.

Key constraints include:

Data is serialized into formats like JSON, XML, or Protocol Buffers. These formats vary in:

Error Handling

Error handling is decoupled. The server returns standard HTTP status codes (e.g., 400 Bad Request, 500 Internal Server Error) with an optional error body, and the client is responsible for handling them appropriately.
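
A client-side sketch of this decoupled handling, assuming a hypothetical JSON endpoint /api/v1/orders and using the requests library.

import requests

resp = requests.get("https://example.com/api/v1/orders/42", timeout=5)

if resp.status_code == 200:
    order = resp.json()  # success: parse the response body
elif 400 <= resp.status_code < 500:
    # client error (e.g., 400 Bad Request, 404 Not Found); the error body is assumed to be JSON
    print("client error:", resp.status_code, resp.json().get("error", ""))
else:
    # server error (e.g., 500 Internal Server Error): retrying or failing gracefully is up to the client
    print("server error:", resp.status_code)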

Versioning

To maintain backward compatibility, APIs should be versioned (e.g., /api/v1/resource). This allows introducing new features without breaking existing clients.

Interface Documentation

Documentation is crucial for developers consuming the API. Standard specifications like OpenAPI (formerly Swagger) allow describing:

Tools can generate documentation, client SDKs, and server stubs from these specifications.

Event-Driven Architecture

Event-Driven Architecture is based on the producer-consumer pattern. Components communicate by emitting and reacting to events.

This decouples producers from consumers; they do not need to know about each other, only about the event format.

Delivery Models

Delivery Semantics

Kafka

Apache Kafka is a popular distributed event streaming platform. It uses a log-based approach:

Kafka uses a pull mechanism, allowing consumers to process events at their own speed. It ensures fault tolerance through replication.
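
A minimal sketch with the kafka-python client, assuming a broker on localhost:9092 and a hypothetical orders topic.

from kafka import KafkaProducer, KafkaConsumer

producer = KafkaProducer(bootstrap_servers="localhost:9092")
producer.send("orders", b'{"id": 42}')  # the event is appended to the topic's log
producer.flush()

consumer = KafkaConsumer(
    "orders",
    bootstrap_servers="localhost:9092",
    group_id="billing",            # consumers in the same group share the topic's partitions
    auto_offset_reset="earliest",  # start from the beginning of the log if no committed offset exists
)
for message in consumer:           # pull model: the consumer reads at its own pace
    print(message.offset, message.value)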

Microservices

Microservices architecture structures an application as a collection of loosely coupled, independently deployable services. This contrasts with a Monolithic architecture, where all components are bundled into a single unit.

The main advantages of microservices include:

The main components of a microservices architecture are:

Service Discovery (Location Transparency)

In a dynamic environment, service instances scale up/down and change IP addresses. Service Discovery allows clients to locate services without hardcoding addresses.

  1. Registration: Services register themselves with the Service Discovery upon startup.
  2. Discovery: Clients query the Service Discovery to get the location of a service.
  3. Health Checks: The Service Discovery monitors service health via heartbeats. If a service stops sending heartbeats, it is removed from the registry.
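
An illustrative in-memory registry (a toy sketch, not a real tool such as Consul or Eureka) showing registration, heartbeats, and discovery.

import random
import time

HEARTBEAT_TIMEOUT = 10  # seconds without a heartbeat before an instance is considered dead

registry = {}  # service name -> {instance address: timestamp of last heartbeat}

def register(service, address):
    registry.setdefault(service, {})[address] = time.time()

def heartbeat(service, address):
    registry[service][address] = time.time()

def discover(service):
    now = time.time()
    alive = [addr for addr, seen in registry.get(service, {}).items()
             if now - seen < HEARTBEAT_TIMEOUT]      # health check: ignore stale instances
    return random.choice(alive) if alive else None   # naive client-side load balancing

register("payments", "10.0.0.7:8080")
print(discover("payments"))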

Key considerations:

Resilience Patterns

To detect failures, each service is called through a Circuit Breaker: a proxy that monitors calls to the service and prevents the application from repeatedly attempting an operation that is likely to fail.

stateDiagram
    [*] --> Closed
    Closed --> Open: failures > threshold
    Open --> HalfOpen: timeout
    HalfOpen --> Closed: success
    HalfOpen --> Open: failure
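
An illustrative circuit breaker following the state machine above; the threshold and timeout values are hypothetical.

import time

class CircuitBreaker:
    def __init__(self, threshold=5, timeout=30):
        self.threshold = threshold  # consecutive failures tolerated before opening
        self.timeout = timeout      # seconds to wait in the open state before a trial call
        self.failures = 0
        self.state = "closed"
        self.opened_at = 0.0

    def call(self, operation):
        if self.state == "open":
            if time.time() - self.opened_at < self.timeout:
                raise RuntimeError("circuit open: failing fast")
            self.state = "half-open"  # timeout elapsed: allow one trial call
        try:
            result = operation()
        except Exception:
            self.failures += 1
            if self.state == "half-open" or self.failures > self.threshold:
                self.state = "open"   # trip (or re-trip) the breaker
                self.opened_at = time.time()
            raise
        self.failures = 0
        self.state = "closed"         # a success closes the circuit again
        return result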

Security Patterns (API Gateway)

Directly exposing microservices to clients creates security and complexity issues. An API Gateway acts as a single entry point for all clients.

Note: The Gateway can be a single point of failure, so it is usually replicated.

Communication & Coupling

Communication between services can be:

Using queues (like RabbitMQ or Kafka) buffers requests and decouples the sender from the receiver.

Availability

Availability is the probability that a system is operational and able to perform its required function at a given instant of time. It is a measure of the system’s readiness.

Failure Lifecycle

When a failure occurs, the recovery process involves several phases:

  1. Detection Time: Time between the occurrence of the failure and its detection.
  2. Response Time: Time to diagnose the issue and decide on a recovery strategy.
  3. Repair Time: Time to fix the issue (replace component, restart service, etc.).
  4. Recovery Time: Time to restore the system to normal operation (sync state, warm up caches).

Metrics

The availability A is calculated as:

A = \frac{MTTF}{MTBF} = \frac{MTTF}{MTTF + MTTR}
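
As a worked example with hypothetical values, a component with MTTF = 999 hours and MTTR = 1 hour gives:

A = \frac{999}{999 + 1} = 0.999

that is, “three nines”: roughly 8.76 hours of downtime per year, as shown in the table below.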

The “Nines” Notation

Availability is often expressed in “nines”:

Availability | Downtime per year
--- | ---
90% (1 nine) | 36.5 days
99% (2 nines) | 3.65 days
99.9% (3 nines) | 8.76 hours
99.99% (4 nines) | 52.56 minutes
99.999% (5 nines) | 5.26 minutes
99.9999% (6 nines) | 31.5 seconds

The availability of a composite system depends on how its components are connected:
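
Under the standard assumption of independent failures, components in series (all must work) and in parallel (at least one must work) give:

A_{series} = \prod_{i} A_i \qquad A_{parallel} = 1 - \prod_{i} (1 - A_i)

For example, two components with A = 0.99 yield 0.9801 in series and 0.9999 in parallel.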

Availability Tactics

Replication

Replication involves using multiple instances of a component to ensure continuity.

If the system is stateless, switching is immediate. If stateful, state synchronization is needed.

Forward Error Recovery

In forward error recovery, the system is designed to continue operating correctly even in the presence of faults.

From the normal state, the system goes to the failure state when a fault occurs. The system then detects the failure and transitions to a degraded state, where it takes corrective actions to return to the normal state.

Design Document (DD) Structure

The Design Document (DD) describes the high-level design decisions and how the system will be implemented to satisfy the requirements specified in the RASD.

  1. Introduction
    1. Scope: Defines the boundaries of the system and what is included/excluded.
    2. Definitions: Glossary of terms used in the document.
    3. Reference Documents: Lists related documents (e.g., RASD, project plan).
    4. Overview: High-level summary of the system’s design and structure.
  2. Architectural Design
    1. Overview: Informal description of high-level components and their interactions.
    2. Component View: Static component diagrams showing the system’s modules and their relationships.
    3. Deployment View: Deployment diagrams illustrating physical nodes, hardware, and software environments.
    4. Component Interfaces: Signatures and descriptions of the interfaces between components.
    5. Runtime View: Dynamic interactions described via sequence diagrams.
    6. Selected Architectural Styles and Patterns: Justification and description of chosen styles (e.g., client-server, microservices).
    7. Other Design Decisions: Additional decisions impacting the design (e.g., trade-offs, constraints).
  3. User Interface Design: Mockups or wireframes of the UI, refining the RASD from low to mid-fidelity prototypes.
  4. Requirements Traceability: Mapping between requirements (from RASD) and design components, often using a traceability matrix.
  5. Implementation, Integration, and Test Plan: Defines the order of component implementation (sequential/parallel), integration strategies, and testing approaches.
  6. Effort Spent: Summary of time and resources expended during design activities.
  7. References: Citations for external sources, standards, or tools used.

Verification and Validation

Verification and Validation (V&V) are independent procedures that are used together for checking that a product meets requirements and specifications and that it fulfills its intended purpose.

The chain of causality for software problems is:

  1. Error (Mistake): A human action that produces an incorrect result.
  2. Defect (Fault/Bug): An imperfection or deficiency in a product where it does not meet its requirements or specifications.
  3. Failure: An event in which a system or system component does not perform a required function within specified limits.

Static Analysis

Static Analysis is the process of evaluating a system or component based on its form, structure, content, or documentation, without executing the code.

This can be achieved using linters, type checkers, and formal verification tools.

Some common defects detected by static analysis tools include:

Since checking non-trivial properties of programs is undecidable (Rice’s Theorem), static analysis tools must approximate. They need to balance precision (minimizing false positives) with performance.

Data Flow Analysis

Data flow analysis gathers information about the possible set of values calculated at various points in a computer program. It operates on the Control Flow Graph (CFG).

The CFG is a directed graph that represents all paths that might be traversed through a program during its execution. It is composed of:

Reaching Definitions Analysis

The CFG can be used for Reaching Definitions Analysis, which determines which definitions of a variable v may reach a point p in the code without being overwritten (killed).

For a basic block n:

The data flow equations are:

In[n] = \bigcup_{p \in pred(n)} Out[p]

Out[n] = Gen[n] \cup (In[n] - Kill[n])
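
A minimal worklist-style sketch of these equations; the CFG and the Gen/Kill sets below are illustrative, not extracted from real code.

def reaching_definitions(nodes, preds, gen, kill):
    """nodes: block ids; preds[n]: predecessors of n; gen[n]/kill[n]: sets of definition ids."""
    In = {n: set() for n in nodes}
    Out = {n: set(gen[n]) for n in nodes}
    changed = True
    while changed:  # iterate the equations until a fixed point is reached
        changed = False
        for n in nodes:
            In[n] = set().union(*(Out[p] for p in preds[n])) if preds[n] else set()
            new_out = gen[n] | (In[n] - kill[n])
            if new_out != Out[n]:
                Out[n], changed = new_out, True
    return In, Out

# d1: "x = 1" in block A; d2: "x = 2" in block B; block C joins A and B.
nodes = ["A", "B", "C"]
preds = {"A": [], "B": [], "C": ["A", "B"]}
gen = {"A": {"d1"}, "B": {"d2"}, "C": set()}
kill = {"A": {"d2"}, "B": {"d1"}, "C": set()}
print(reaching_definitions(nodes, preds, gen, kill)[0]["C"])  # both d1 and d2 may reach C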

From the Reaching Definitions we can also reason about the liveness of variables: a variable v is “live” at a point p if the value it holds at p may still be needed later in the execution.

It is also possible to build def-use chains and use-def chains:

These chains are essential for optimizations (like dead code elimination) and bug finding (like use-before-define).

Symbolic Execution

Symbolic Execution is a program analysis technique that executes programs with symbolic inputs instead of concrete values. This makes it possible to analyze reachability (which parts of the code can be executed), check path feasibility (which paths can actually be taken), and generate test cases.

As the program is executed symbolically, a path condition is built: a logical formula representing the constraints on the inputs that must hold for the execution to follow a particular path.

When a branch is encountered (e.g., an if statement), the symbolic execution forks into two paths:

  1. True branch: The path condition is updated to include the condition of the branch.
  2. False branch: The path condition is updated to include the negation of the condition.

At the end of each path, it is possible to analyze the path condition to determine whether the path is feasible (i.e., whether there exists an input that satisfies it) or infeasible.
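
As a sketch, path conditions can be checked with an SMT solver such as Z3 (here via the z3-solver Python bindings); the conditions below come from a hypothetical program with nested branches if x > 10 and if x < 5.

from z3 import Int, Solver

x = Int("x")               # symbolic input

path1 = Solver()
path1.add(x > 10, x < 5)   # path condition: both true branches taken
print(path1.check())       # unsat -> this path is infeasible

path2 = Solver()
path2.add(x > 10, x >= 5)  # outer true branch, inner condition negated
print(path2.check())       # sat -> feasible
print(path2.model()[x])    # a concrete input exercising this path (usable as a test case)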

Dynamic Analysis

Dynamic Analysis is the process of evaluating a system or component based on its execution behavior. It involves running the program with specific inputs and observing its behavior to identify defects.

The main goals are:

A test is composed of:

Testing

Test cases can be generated with two main approaches:

The generation can be manual or automated.

Concolic Execution

Concolic Execution (CONCrete + symbOLIC) is a hybrid testing technique that combines concrete execution with symbolic execution. It aims to leverage the strengths of both approaches to improve test coverage and defect detection.

The process involves executing the program with concrete inputs while simultaneously tracking symbolic expressions for the program’s variables. This allows the generation of new test inputs that explore different execution paths.

  1. Concrete Execution: The program is executed with specific concrete inputs, and the actual values of variables are recorded.
  2. Symbolic Tracking: Alongside the concrete execution, symbolic expressions are maintained for the program’s variables.
  3. Path Exploration: When a branch is encountered, the path condition is updated symbolically. New test inputs are generated by negating the path condition of the taken branch, allowing exploration of alternative paths.

Fuzzing

Fuzzing is an automated testing technique that involves providing random or semi-random inputs to a program to discover vulnerabilities, crashes, or unexpected behavior. The main idea is to stress-test the system by feeding it a large volume of inputs, some of which may be malformed or unexpected.

Fuzzing can be classified into:

Fuzzing is good at finding buffer overflows, missing input validation, unhandled edge cases, and other security vulnerabilities.

The best practice is to use fuzzing along with runtime memory checks (like AddressSanitizer) to detect memory corruption issues.
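
A naive random fuzzer sketch; parse is a hypothetical function under test, and real fuzzers (AFL, libFuzzer, Atheris) add coverage feedback and are paired with sanitizers as noted above.

import random

def parse(data: bytes) -> None:
    ...  # the (hypothetical) function under test

for i in range(100_000):
    size = random.randint(0, 1024)
    data = bytes(random.getrandbits(8) for _ in range(size))  # random, possibly malformed input
    try:
        parse(data)
    except Exception as exc:  # a crash or unexpected exception is a finding
        print(f"input #{i} ({size} bytes) triggered {exc!r}")
        break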

Search-Based Testing

Search-Based Testing is an automated testing technique that generates tests based on an objective (coverage, reachability, etc.).

The distance to the objective is calculated using a fitness function. The fitness function assigns a score to each test case based on how close it is to achieving the testing objective.
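
As an illustration, to cover the hypothetical branch if x == 4242 the fitness function can be the branch distance |x - 4242|, which a simple local search then minimizes.

import random

def fitness(x: int) -> int:
    return abs(x - 4242)  # 0 means the target branch is taken

best = random.randint(0, 10_000)
while fitness(best) > 0:  # simple hill climbing over neighbouring inputs
    step = random.randint(1, max(1, min(100, fitness(best))))
    neighbour = best + random.choice([-1, 1]) * step
    if fitness(neighbour) < fitness(best):
        best = neighbour
print("input reaching the branch:", best)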

Unit Testing

Unit Testing is a process, typically based on manually written test cases, aimed at verifying the correctness of individual units or components of a software system in isolation.

This can be achieved using strategies such as test doubles (stubs and mocks) that replace the unit’s dependencies, and drivers that exercise the unit:
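
A minimal sketch of a unit test that isolates the unit from its dependency with Python’s unittest.mock; checkout and the payment gateway are hypothetical names.

import unittest
from unittest.mock import Mock

def checkout(cart_total, gateway):
    """Unit under test: charges the total and reports success."""
    return gateway.charge(cart_total) == "ok"

class CheckoutTest(unittest.TestCase):
    def test_checkout_charges_the_gateway(self):
        gateway = Mock()                  # test double standing in for the real payment service
        gateway.charge.return_value = "ok"
        self.assertTrue(checkout(42, gateway))
        gateway.charge.assert_called_once_with(42)

if __name__ == "__main__":
    unittest.main()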

E2E Testing

End-to-End (E2E) Testing is a testing procedure that validates the complete and integrated system to ensure that it meets the specified requirements and behaves as expected from start to finish.

This makes it possible to test the functionality, performance, load handling, and security of the entire system.

Project Management

Project Management is the discipline of planning, organizing, and managing resources to achieve specific goals within a defined timeline and budget.

It is composed of different phases:

Initiation

Project Initiation is the first phase of the project management lifecycle. It involves defining the project at a broad level and obtaining authorization to start the project.

Planning

Project Planning is the second phase of the project management lifecycle. It involves developing a detailed project plan that outlines how the project will be executed, monitored, and controlled.

Scheduling

This includes defining tasks, milestones, timelines, and deliverables.

The work is divided using a Work Breakdown Structure (WBS) that decomposes the project into smaller, manageable components.

The tasks are organized using the Precedence Diagramming Method (PDM), which represents tasks as nodes and dependencies as directed edges. This makes it possible to identify the Critical Path (the longest sequence of dependent tasks, which determines the minimum project duration).
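
A small sketch of the critical-path computation on a PDM-style graph; tasks, durations, and dependencies are hypothetical.

durations = {"A": 3, "B": 2, "C": 4, "D": 1}
depends_on = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}

finish = {}  # earliest finish time of each task

def earliest_finish(task):
    if task not in finish:
        start = max((earliest_finish(d) for d in depends_on[task]), default=0)
        finish[task] = start + durations[task]
    return finish[task]

print("minimum project duration:", max(earliest_finish(t) for t in durations))  # 8, via A -> C -> D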

Tasks can be tracked using Gantt charts, which represent them over time, showing dependencies and progress.

Risk Management

Risk Management is the process of identifying, assessing, and mitigating risks that could impact the success of a project.

Each risk is formulated using this format: “If <cause> happens, then <consequence> will occur, for <stakeholder>.”

Each risk has its own:

Each risk must have a Mitigation Strategy to reduce its likelihood or impact.

Effort Estimation

Effort estimation is the process of predicting the amount of effort (time, resources) required to complete a project or a task.

This can be done using Function Point Analysis (FPA) that estimates the size of the software based on its functionality.

The application is decomposed into its functional components:

Each component is assigned a complexity level (Low, Average, High).

Then a standard table converts the components into Unadjusted Function Points (UFP) based on their type and complexity.

The total UFP is calculated by summing the function points of all components.
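
A sketch of the UFP computation using the standard IFPUG weight table; the counted components below are hypothetical.

WEIGHTS = {                # (low, average, high)
    "EI":  (3, 4, 6),      # External Inputs
    "EO":  (4, 5, 7),      # External Outputs
    "EQ":  (3, 4, 6),      # External Inquiries
    "ILF": (7, 10, 15),    # Internal Logical Files
    "EIF": (5, 7, 10),     # External Interface Files
}
LEVEL = {"low": 0, "average": 1, "high": 2}

counted = [("EI", "average", 4), ("EO", "low", 2), ("ILF", "high", 1)]

ufp = sum(count * WEIGHTS[kind][LEVEL[level]] for kind, level, count in counted)
print("Unadjusted Function Points:", ufp)  # 4*4 + 2*4 + 1*15 = 39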

Execution, Monitoring, and Control

This phase involves carrying out the project plan, tracking progress, and making adjustments as necessary to ensure the project stays on track.

At the beginning of the project, a time and cost baseline is established to measure performance. All the variables are converted into Earned Value (EV), which represents the value of work actually performed up to a specific point in time.

The main metrics used are:

From these metrics, we can derive, from the schedule point of view:

From the cost point of view:

These metrics help identify whether the project is ahead of or behind schedule, under or over budget, and how it is expected to perform in the future (Estimate At Completion, EAC).

The EAC can be calculated using different formulas based on the situation:
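
For reference, a compact summary of the standard EVM relations, assuming the usual definitions of PV (Planned Value), AC (Actual Cost), and BAC (Budget At Completion):

SV = EV - PV \qquad SPI = \frac{EV}{PV} \qquad CV = EV - AC \qquad CPI = \frac{EV}{AC}

EAC = AC + (BAC - EV) \quad \text{(remaining work proceeds at the planned rate)}

EAC = \frac{BAC}{CPI} \quad \text{(current cost performance continues)}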

Closing

The project closing phase involves finalizing all project activities, delivering the completed product to the client, and formally closing the project.

This phase is important to ensure that all project objectives have been met, lessons learned are documented, and resources are released for future projects.

Last modified:
Written by: Andrea Lunghi