OS knowledge syllabus 1

. . .

Syllabus

Session Topic Detailed Topics
1 JVM STRING FINAL 1. Warm Up
2. JVM Memory Management
3. JVM, JDK, JRE
4. Garbage Collection
5. String & StringBuilder & StringBuffer
6. Final, Finally, Finalize
7. Immutable class (optional: basic syntax of java)
2 STATIC OOP 1. Static
2. Marker Interface - Serializable, Cloneable
3. OOP
4. SOLID Principle
5. Reflection
6. Generics
3 COLLECTION 1. Array vs ArrayList vs LinkedList
2. Set, TreeSet, LinkedHashSet
3. Map, LinkedHashMap, ConcurrentHashMap(how it works)
4. SynchronizedMap
5. Iterator vs Enumeration
4 EXCEPTION DESIGN PATTERN 1. Design Pattern - Singleton, Factory, Observer, Proxy
2. Exception Type - compile, runtime, customized
5 THREADS 1. MultiThreads Interaction (Synchronized, Atomic, ThreadLocal, Volatile)
2. Reentrant Lock
3. Executor and ThreadPool, ForkJoinPool
4. Future & CompletableFuture
5. Runnable vs Callable
6. Semaphore vs Mutex
6 JAVA8,17 1. Java 8: Functional Interface, Lambda, Stream API (map, filter, sorted, groupingBy etc), Optional, Default
2. Java 17: Sealed Class, advantage vs limitation, across package
7 SQL 1. Primary Key, Normalization
2. Different type of Joins
3. Top asked SQLs - nth highest salary; highest salary each department; employee salary greater than manager
4. Introduce of Stored Procedure and Function
5. Cluster index vs Non - Cluster - Index
6. Explain Plan - what does it do, what can it tell
8 NOSQL 1. SQL vs NoSQL
2. MongoDB vs Cassandra introduction
3. ACID vs CAP rules explanation
9 REST API 1. DispatcherServlet
2. Rest API
3. How to create a good rest api
4. Http Error Code: 200, 201, 400, 401, 403, 404, 500, 502, 503, 504
5. Introduction of GraphQL, WebSocket, gRPC
6. ReactiveJava
10 SPRING CORE 1. IOC/DI
2. Bean Scope
3. Constructor vs Setter vs Field based injection
11 SPRING ANNOTATIONS 1. Different spring annotations
2. @Controller vs @RestController
3. @Qualifier, @Primary
4. Spring Cache and Retry
12 SPRING BOOT 1. How to create spring boot from Scratch
2. Benefit of Spring boot
3. Annotation @SpringBootApplication
4. AutoConfiguration, how to disable
5. Actuator
13 SPRING BOOT2 1. Spring ActiveProfile
2. AOP
3. @ExceptionHandler, @ControllerAdvice
14 DATA ACCESS 1. JDBC, statement vs PreparedStatement, Datasource
2. Hibernate ORM, Session, Cache
3. Optimistic Locking - add version column
4. Association: many - to - many
15 TRANSACTION JPA 1. @Transactional - atomic operation
2. Propagation, Isolation
3. JPA naming convention
4. Paging and Sorting Using JPA
5. Hibernate Persistence Context
16 SECURITY 1. How to implement Security by overriding Spring class
2. Basic Authentication and password encryption
3. JWT Token and workflow
4. Oauth2 workflow
5. Authorization based on User role
17 UNIT TEST 1. Different Type of Tests in whole project lifecycle
2. Unit Test, Mock
3. Testing Rest Api with Rest Assured
18 AUTOMATION TEST 1. BDD - Cucumber - annotations
2. Load Test with JMeter
3. Performance tool JProfiler
4. AB Test
19 MICROSERVICE 1. Benefits/Disadvantage of MicroService
2. How to split monolithic to microservice
3. Circuit Breaker - concept, retry, fallback method
4. Load Balancer - concept and algorithms
5. API Gateway
6. Config Server
20 KAFKA 1. Kafka - concepts, how it works and how message is sent to partition
2. Consumer Group, assignment strategy
3. Message in Order
21 KAFKA2 1. Kafka Duplicate Message
2. Kafka Message Loss
3. Poison Failure, DLQ
4. Kafka Security (SASL, ACLs, Encrypt etc)
22 DISTRIBUTED SYSTEM 1. MicroService: how to communicate between services
2. Saga Pattern
3. Monitoring: Splunk, Grafana, Kabana, CloudWatch etc
4. System Design: distributed system
23 DEVOPS 1. CICD
2. Jenkins pipeline with example
3. Git Commands: squash, cherry - pick etc
4. On - Call: PageDuty etc
5. How do you solve a production issue with or without log
24 KUBERNETES 1. Kubernetes, EKS, WCNP, KubeCtl
25 CLOUD AWS Modules with examples

Kafka

同一个消费组对于一个分区是共享同一个offset的是吧

是的,在 Kafka 里,同一个消费组对于一个分区共享同一个 offset。下面从原理、目的和影响等方面详细解释:

原理

  • 分区分配机制:Kafka 为了保证消息在分区内的顺序消费和避免重复消费,采用了一个分区只能被一个消费组内的一个消费者消费的策略。这意味着在同一个消费组中,针对特定的一个分区,只会有一个消费者负责处理该分区的消息。既然只有一个消费者处理这个分区,那么该分区对应的消费偏移量(offset)在这个消费组内就是唯一的,也就是共享同一个 offset。
  • offset 提交:消费者在消费完消息后,需要向 Kafka 提交 offset,以此告知 Kafka 该消费组已经消费到了分区的哪个位置。在一个消费组里,无论哪个消费者实际消费了分区的消息,最终提交的都是该分区统一的 offset。例如,消费者 A 消费了分区 P0 的部分消息后提交 offset,这个 offset 就代表了整个消费组在分区 P0 的消费进度。

目的

  • 保证消息顺序性:确保同一个消费组对分区内的消息按顺序处理。如果每个消费者都有自己独立的 offset,就可能出现多个消费者乱序消费分区消息的情况,破坏了消息在分区内的顺序性。
  • 避免重复消费:统一的 offset 能让消费组清楚知道哪些消息已经被消费过。当消费者出现故障重启或者进行分区重平衡后,能从正确的 offset 位置继续消费,避免重复处理已经消费过的消息。

影响

  • 故障恢复:若消费组内负责某分区的消费者出现故障,Kafka 会将该分区分配给其他消费者,新的消费者会从共享的 offset 位置继续消费,保证消费的连续性。
  • 分区重平衡:当消费组发生分区重平衡时,所有消费者会重新分配分区,每个分区对应的共享 offset 能帮助新接手的消费者准确找到消费起点。

Spring Annotations

Spring has numerous annotations. For example, @Component is used to mark ordinary POJO classes as components, allowing Spring to automatically scan and manage them. @Service is typically used to mark classes in the service layer, and @Repository is used for classes in the data access layer. Essentially, they are specialized forms of @Component with more specific semantics. @Autowired is used for automatic bean wiring, and @RequestMapping is used to handle request mappings and so on.

@Controller vs @RestController

@Controller is an annotation in Spring MVC used to mark controller classes. It usually works with view technologies such as JSP and is used to return views. @RestController, on the other hand, is a combination of @Controller and @ResponseBody. It means that the methods in this controller class will by default return the return value directly to the client as the response body. It is suitable for building RESTful APIs, and the returned data is usually in formats such as JSON or XML.

In Spring MVC, the difference between @Controller and @RestController mainly lies in how they handle responses:

  • • @Controller is used to return views (HTML, JSP, etc.), making it suitable for traditional web applications.
  • • @RestController is used to return JSON or XML data, making it ideal for RESTful APIs.

1. @Controller Example (Returning a View)

By default, @Controller returns a view name. To return JSON, you must use @ResponseBody.

@Controller
public class MyController {

@GetMapping("/hello")
public String hello(Model model) {
model.addAttribute("message", "Hello, Spring!");
return "helloPage"; // Returns a view name, not JSON
}
}

📌 Explanation

  • • The return “helloPage”; statement does not return JSON. Instead, it looks for a helloPage.html or helloPage.jsp view.
  • • To return JSON from @Controller, you must explicitly add @ResponseBody:
  • @Controller
    public class MyController {

    @GetMapping("/json")
    @ResponseBody // Ensures the response is JSON instead of a view
    public String jsonResponse() {
    return "{\"message\": \"Hello, JSON!\"}";
    }
    }

2. @RestController Example (Returning JSON)

@RestController returns JSON by default, without requiring @ResponseBody.

@RestController
public class MyRestController {

@GetMapping("/api/hello")
public Map<String, String> helloJson() {
Map<String, String> response = new HashMap<>();
response.put("message", "Hello, JSON!");
return response; // Automatically converted to JSON
}
}

📌 Explanation

  • • @RestController is a combination of @Controller and @ResponseBody, so there’s no need to add @ResponseBody manually.
  • • Spring Boot automatically converts the Map response into JSON:

{
“message”: “Hello, JSON!”
}

3. When to Use @Controller vs @RestController?

✅ Use @Controller 👉 If your application needs to return HTML pages (traditional MVC web apps).
✅ Use @RestController 👉 If you’re building a REST API that returns JSON data.

In short:

  • • If your app is frontend-backend separated (React, Vue, Angular consuming JSON), use @RestController.
  • • If your app renders views on the server and serves HTML pages, use @Controller.

🚀 If you’re unsure, prefer @RestController—it aligns with modern web development practices!

  • @Qualifier, @Primary
    @Qualifier is used to specify the name or qualification conditions of the specific bean to be wired during autowiring. When there are multiple beans of the same type, @Qualifier can be used to clearly specify which one to use. @Primary is used to mark a bean as the preferred candidate for autowiring. When there are multiple beans of the same type, Spring will preferentially select the bean marked with @Primary for autowiring.
  • Spring Cache and Retry
    Spring Cache provides declarative cache support. Annotations like @Cacheable, @CachePut, and @CacheEvict can be easily used to cache the results of methods, improving system performance and reducing database access. Spring Retry, on the other hand, offers a mechanism to automatically retry operations that fail due to certain conditions, such as network failures or temporary database unavailability. It can be configured through annotations and configuration classes to specify retry policies, including the number of retries, retry intervals, and conditions for triggering retries.

Spring Boot

  • How to create spring boot from Scratch
    To create a Spring Boot project from scratch, you can start by choosing a build tool like Maven or Gradle. Then, create a basic project structure with directories for source code, resources, etc. Add the necessary Spring Boot dependencies to the build configuration file. Define the main application class, which is usually annotated with @SpringBootApplication. You can then start adding controllers, services, and other components as needed to build your application.
  • Benefit of Spring boot
    Spring Boot simplifies the development of Spring applications. It provides auto-configuration, which reduces the need for extensive XML or Java configuration. It also comes with a built-in embedded server, making it easy to run and deploy applications. Spring Boot starters allow for quick addition of common dependencies and functionality. Additionally, it offers features like actuator endpoints for monitoring and managing the application, and it makes it easier to handle configuration properties and profiles.
  • Annotation @SpringBootApplication
    @SpringBootApplication is a composite annotation that combines @SpringBootConfiguration, @EnableAutoConfiguration, and @ComponentScan. It is used to mark the main application class and tells Spring Boot to start the application, perform auto-configuration, and scan for components in the specified packages and their sub-packages.
  • AutoConfiguration, how to disable
    Spring Boot’s auto-configuration automatically configures beans based on the dependencies present in the project. To disable specific auto-configurations, you can use the @SpringBootApplication(exclude = {SomeAutoConfiguration.class}) annotation at the application level. You can also set properties in the application.properties or application.yml file to disable certain auto-configurations, for example, spring.autoconfigure.exclude=com.example.SomeAutoConfiguration.
  • Actuator
    Spring Boot Actuator provides endpoints that allow you to monitor and manage your application. It offers endpoints such as /health to check the health of the application, /metrics to view various metrics about the application’s performance, /info to get information about the application, and many others. These endpoints can be used for debugging, monitoring, and optimizing the application.

Spring Boot2

  • Spring ActiveProfile
    Spring ActiveProfile allows you to define different profiles for your application, such as dev, test, prod, etc. You can configure different beans, properties, and behaviors for each profile. By setting the spring.profiles.active property, you can switch between different profiles at runtime, enabling different configurations for different environments.
  • AOP
    Aspect-Oriented Programming (AOP) in Spring Boot 2 allows you to modularize cross-cutting concerns such as logging, security, and transaction management. You can define aspects using annotations like @Aspect, @Before, @After, @Around to intercept method calls and perform additional actions before, after, or around the execution of the target methods.
  • @ExceptionHandler, @ControllerAdvice
    @ExceptionHandler is used in a controller class to handle specific exceptions that occur within the methods of that controller. @ControllerAdvice is a global exception handling mechanism. It allows you to define a class that can handle exceptions across multiple controllers. You can use @ExceptionHandler within a @ControllerAdvice class to handle different types of exceptions globally and return appropriate error responses to the client.

Data Access

  • JDBC, statement vs PreparedStatement, Datasource
    JDBC (Java Database Connectivity) is an API for interacting with databases in Java. Statement is used to execute SQL statements directly, but it is vulnerable to SQL injection attacks. PreparedStatement is a more secure and efficient alternative. It allows you to precompile SQL statements and set parameters, preventing SQL injection. A DataSource is a factory for connections to a database. It manages the connection pool and provides connections to the application.
  • Hibernate ORM, Session, Cache
    Hibernate ORM is an Object Relational Mapping framework that allows you to map Java objects to database tables. A Session in Hibernate is a lightweight, short-lived object that provides an interface to interact with the database. It is used to perform operations like saving, loading, and deleting objects. Hibernate also has a caching mechanism to improve performance. It can cache objects in memory to reduce database access. There are different levels of caches, such as the first-level cache (session-level cache) and the second-level cache (shared cache across sessions).
  • Optimistic Locking - add version column
    Optimistic locking is a concurrency control mechanism used in databases. In the context of Hibernate, it can be implemented by adding a version column to the database table. When an object is loaded, the version number is also loaded. When the object is updated, Hibernate checks if the version number has changed. If it has, it means the object has been modified by another transaction, and the update will fail, preventing data conflicts.
  • Association: many - to - many
    In object-relational mapping, a many-to-many association is used when multiple objects of one entity can be related to multiple objects of another entity. For example, in a system with users and roles, a user can have multiple roles, and a role can be assigned to multiple users. In Hibernate, this is usually mapped using a join table and appropriate annotations like @ManyToMany and @JoinTable.

Transaction JPA

  • @Transactional - atomic operation
    The @Transactional annotation in Spring JPA is used to mark a method or a class as a transactional operation. It ensures that the operations within the method are executed atomically. That is, either all the operations succeed and are committed to the database, or if an error occurs, all the operations are rolled back, maintaining data consistency.
  • Propagation, Isolation
    Transaction propagation defines how a transaction should behave when a transactional method calls another transactional method. There are several propagation types like REQUIRED, REQUIRES_NEW, SUPPORTS, etc. Isolation levels define the degree to which one transaction is isolated from other transactions. Common isolation levels are READ_UNCOMMITTED, READ_COMMITTED, REPEATABLE_READ, and SERIALIZABLE. Each level has different trade-offs in terms of data consistency and concurrency.
  • JPA naming convention
    JPA has certain naming conventions for mapping entity classes to database tables and columns. By default, it uses a naming strategy where the entity class name is mapped to the table name, and the property names are mapped to column names. However, you can also customize the naming using annotations like @Table and @Column to specify different names if needed.
  • Paging and Sorting Using JPA
    JPA provides support for paging and sorting data. You can use the Pageable interface and related classes to specify the page number, page size, and sorting criteria. For example, you can use methods like findAll(Pageable pageable) in a JPA repository to retrieve a paginated and sorted list of entities.
  • Hibernate Persistence Context
    The Hibernate persistence context is a set of managed entities that are associated with a particular session. It tracks the state of the entities and is responsible for synchronizing the changes between the entities and the database. It manages the lifecycle of the entities, including loading, saving, and deleting them.

Security

  • How to implement Security by overriding Spring class
    You can implement security in a Spring application by overriding certain Spring security classes. For example, you can extend WebSecurityConfigurerAdapter and override methods like configure(HttpSecurity http) to define custom security configurations such as access rules, authentication mechanisms, etc. You can also override other classes like UserDetailsService to provide custom user authentication and authorization logic.
  • Basic Authentication and password encryption
    Basic authentication is a simple authentication mechanism where the client sends the username and password in the request headers. In Spring, it can be configured easily. Password encryption is crucial for security. Spring provides various password encoding mechanisms like BCryptPasswordEncoder to securely hash and store passwords. When a user registers or changes their password, the password is encrypted and stored in the database, and during authentication, the provided password is encrypted and compared with the stored hash.
  • JWT Token and workflow
    JSON Web Token (JWT) is a widely used token-based authentication and authorization mechanism. The workflow typically involves the client sending username and password to the server for authentication. If the authentication is successful, the server generates a JWT token containing user information and a signature. The client then stores the token and sends it in the headers of subsequent requests. The server validates the token on each request and authorizes the user based on the information in the token.
  • Oauth2 workflow
    OAuth2 is an authorization framework that allows users to grant limited access to their resources on one server to another server without sharing their credentials. The typical OAuth2 workflow involves steps like the client redirecting the user to the authorization server for authentication and authorization, the user granting permission, the authorization server issuing an access token, and the client using the access token to access protected resources on the resource server.
  • Authorization based on User role
    In a Spring security application, authorization based on user roles can be implemented by assigning different roles to users and configuring access rules based on those roles. You can use annotations like @PreAuthorize or configure access rules in the security configuration to specify which roles are allowed to access which resources or perform which operations. For example, you can define that only users with the ROLE_ADMIN role can access certain administrative endpoints.

REST API

1. DispatcherServlet

Definition

DispatcherServlet is a key component in the Spring Web MVC framework. It serves as the front - controller in a Spring - based web application. A front - controller is a single servlet that receives all HTTP requests and then dispatches them to the appropriate handlers (controllers) based on the request’s URL, HTTP method, and other criteria.

Function

  • Request Routing: It maps incoming requests to the appropriate @Controller classes and their methods using the configured handler mappings. For example, it can match a request to a specific controller method based on the URL pattern defined in the @RequestMapping annotation.
  • View Resolution: After a controller method processes the request and returns a logical view name, the DispatcherServlet uses a view resolver to map this logical name to an actual view (such as a JSP page or a Thymeleaf template) and renders the response.
  • Intercepting and Pre - processing: It can also use interceptors to perform pre - processing and post - processing tasks on requests and responses, like logging, authentication checks, etc.

2. Rest API

Definition

REST (Representational State Transfer) is an architectural style for building web services. A REST API (Application Programming Interface) is a set of rules and conventions for creating and consuming web services based on the REST principles.

Characteristics

  • Stateless: Each request from a client to a server must contain all the information necessary to understand and process the request. The server does not store any client - specific state between requests.
  • Resource - Oriented: Resources are the key abstractions in a REST API. Resources can be things like users, products, or orders, and are identified by unique URIs (Uniform Resource Identifiers).
  • HTTP Verbs: REST APIs use standard HTTP methods (verbs) to perform operations on resources. For example, GET is used to retrieve a resource, POST to create a new resource, PUT to update an existing resource, and DELETE to remove a resource.

3. How to create a good REST API

Design Principles

  • Use Clear and Descriptive URIs: URIs should clearly represent the resources. For example, use /users to represent a collection of users and /users/{userId} to represent a specific user.
  • Follow HTTP Verbs Correctly: Use GET for retrieval, POST for creation, PUT for full - update, PATCH for partial - update, and DELETE for deletion.
  • Return Appropriate HTTP Status Codes: Indicate the result of the request clearly. For example, return 200 for successful retrievals, 201 for successful creations, and 4xx or 5xx for errors.
  • Provide Good Documentation: Use tools like Swagger to generate documentation that explains the API endpoints, their input parameters, and expected output.

Security and Performance

  • Authentication and Authorization: Implement proper authentication mechanisms (e.g., OAuth, JWT) to ensure that only authorized users can access the API.
  • Caching: Implement caching strategies to reduce the load on the server and improve response times.

4. HTTP Error Codes

  • 200 OK: Indicates that the request has succeeded. It is commonly used for successful GET requests to retrieve a resource or successful PUT/PATCH requests to update a resource.
  • 201 Created: Used when a new resource has been successfully created. For example, when a client sends a POST request to create a new user, and the server successfully creates the user, it returns a 201 status code.
  • 400 Bad Request: Signifies that the server cannot process the request due to a client - side error, such as malformed request syntax, invalid request message framing, or deceptive request routing.
  • 401 Unauthorized: Indicates that the request requires user authentication. The client needs to provide valid credentials to access the requested resource.
  • 403 Forbidden: The client is authenticated, but it does not have permission to access the requested resource. For example, a regular user trying to access an administrative - only endpoint.
  • 404 Not Found: The requested resource could not be found on the server. This might be because the URL is incorrect or the resource has been deleted.
  • 500 Internal Server Error: A generic error message indicating that the server encountered an unexpected condition that prevented it from fulfilling the request. It could be due to a programming error, database issues, etc.
  • 502 Bad Gateway: The server, while acting as a gateway or proxy, received an invalid response from an upstream server.
  • 503 Service Unavailable: The server is currently unable to handle the request due to temporary overloading or maintenance. The client may try again later.
  • 504 Gateway Timeout: The server, while acting as a gateway or proxy, did not receive a timely response from an upstream server.

5. Introduction of GraphQL, WebSocket, gRPC

GraphQL

  • Definition: GraphQL is a query language for APIs and a runtime for fulfilling those queries with your existing data. It allows clients to specify exactly what data they need from an API, reducing over - fetching and under - fetching of data.
  • Advantages: It provides a more efficient way of data retrieval compared to traditional REST APIs, especially in complex applications where clients may need different subsets of data. It also has a strong type system and can be introspected by clients.

WebSocket

  • Definition: WebSocket is a communication protocol that provides full - duplex communication channels over a single TCP connection. It enables real - time communication between a client and a server.
  • Advantages: It reduces the overhead of traditional HTTP requests by maintaining a persistent connection, which is suitable for applications that require real - time updates, such as chat applications, online gaming, and live dashboards.

gRPC

  • Definition: gRPC is a high - performance, open - source universal RPC (Remote Procedure Call) framework. It uses Protocol Buffers as the interface definition language and serialization format.
  • Advantages: It offers high performance, low latency, and strong typing. It is suitable for microservices architectures where efficient communication between services is crucial.

6. ReactiveJava

Definition

ReactiveJava is a Java implementation of the Reactive Extensions (Rx) library. It is used for reactive programming, which is a programming paradigm that deals with asynchronous data streams and the propagation of change.

Key Concepts

  • Observable: Represents a source of data that can emit zero or more items over time. An Observable can emit data synchronously or asynchronously.
  • Subscriber: A Subscriber subscribes to an Observable to receive the emitted items. It can react to the data, errors, or the completion of the data stream.
  • Operators: ReactiveJava provides a rich set of operators that can be used to transform, filter, combine, and manipulate the data streams. For example, the map operator can be used to transform each item in the stream, and the filter operator can be used to filter out unwanted items.

Use Cases

  • Asynchronous Programming: It simplifies asynchronous programming by providing a declarative way to handle asynchronous operations. For example, in a web application, it can be used to handle multiple asynchronous API calls and combine their results.
  • Event - Driven Programming: It is well - suited for event - driven applications where events need to be processed in a reactive and efficient manner. For example, in a GUI application, it can be used to handle user input events and update the UI accordingly.

MongoDB vs Cassandra introduction

MongoDB and Cassandra are both popular NoSQL databases, but they have different characteristics and use cases. Here is an introduction to their differences:

Data Model

  • MongoDB: It uses a document-oriented data model. Data is stored in BSON (Binary JSON) format, which allows for flexible and nested data structures. Documents can have different fields and structures within the same collection, making it suitable for applications where the data schema may change frequently or is not well-defined upfront. For example, in a content management system, different types of content like blog posts, images, and videos can be stored in the same collection with each document having its own set of relevant fields.
  • Cassandra: It employs a column-oriented data model. Data is organized into tables, rows, and columns, similar to a traditional relational database, but with more flexibility. However, the columns are dynamic, and a row can have a different set of columns than other rows. Cassandra is optimized for handling large amounts of data with a high write throughput and is often used in applications that require efficient storage and retrieval of time-series data or data that needs to be partitioned and distributed across multiple nodes. For instance, in an IoT (Internet of Things) application, sensor data can be stored in Cassandra with each sensor’s data as a row and the timestamp and sensor readings as columns.

Querying Capabilities

  • MongoDB: Supports rich querying capabilities. It allows for complex queries using a JSON-like query language. You can query based on field values, use operators like $gt (greater than), $lt (less than), $in, etc., and perform queries on nested fields and arrays. It also supports indexing to improve query performance. For example, you can easily query for all blog posts authored by a specific user or all documents with a certain tag.
  • Cassandra: Its querying capabilities are more limited compared to MongoDB. Queries in Cassandra are mainly based on the primary key. You can query by the partition key and optionally the clustering key. It does not support ad-hoc queries as easily as MongoDB. However, it can perform very efficiently for the queries it is designed to handle, such as retrieving a range of data based on the primary key or querying for data within a specific partition.

Scalability and Performance

  • MongoDB: Scales horizontally using sharding. It can distribute data across multiple servers, called shards, to handle large amounts of data and high traffic. MongoDB is known for its good read performance and can handle a moderate to high number of read operations. In a sharded environment, it can route queries to the appropriate shards efficiently.
  • Cassandra: Is highly scalable and designed to handle massive amounts of data and high write loads. It uses a distributed architecture where data is replicated across multiple nodes for fault tolerance and scalability. Cassandra can handle a very high volume of write operations and can scale out easily by adding more nodes to the cluster. It is often used in applications that require continuous data ingestion, such as social media platforms or financial systems that need to handle a large number of transactions.

Use Cases

  • MongoDB: Suited for applications where the data structure is flexible and dynamic, such as content management systems, mobile applications, and web applications with evolving data requirements. It is also a good choice for applications that require complex querying and indexing capabilities, like e-commerce platforms where you need to query products based on various attributes.
  • Cassandra: Ideal for applications that deal with large volumes of data, high write throughput, and where data needs to be distributed and replicated across multiple nodes. It is commonly used in real-time analytics, IoT applications, social media platforms for storing user-generated content and activity streams, and in financial systems for handling high-frequency trading data and transaction records.

Monitoring: Splunk, Grafana, Kabana, CloudWatch

The following is an introduction to these monitoring technologies:

Splunk

  • Overview: Splunk is a powerful data analytics platform that is widely used for monitoring and analyzing machine data. It can ingest, index, and correlate data from various sources such as logs, metrics, and events.
  • Features:
    • Data Collection: It can collect data from a large number of sources including servers, applications, network devices, etc.
    • Search and Analytics: Provides a powerful search language that allows users to quickly query and analyze data to find patterns, troubleshoot issues, and gain insights.
    • Visualization: Enables users to create various visualizations like dashboards, charts, and graphs to present data in an intuitive way.
    • Alerting: Can set up alerts based on specific conditions or thresholds, notifying users when important events occur.
  • Use Cases: Commonly used in IT operations for monitoring infrastructure health, in security for detecting threats and analyzing security incidents, and in business for analyzing customer behavior and operational data.

Grafana

  • Overview: Grafana is an open-source data visualization and monitoring tool. It focuses mainly on presenting data in a visually appealing and understandable way, making it easy for users to monitor and analyze metrics.
  • Features:
    • Data Sources: Supports a wide range of data sources such as Prometheus, InfluxDB, MySQL, etc.
    • Visualization Options: Offers a variety of visualization types including line charts, bar charts, pie charts, heatmaps, and more. Users can customize dashboards to display the data they need.
    • Alerting System: Allows setting up alerts based on metric values and conditions. It can send notifications through various channels like email, Slack, etc.
    • Plugin System: Has a rich ecosystem of plugins that can extend its functionality, enabling integration with other tools and adding new features.
  • Use Cases: It is popular in DevOps and IT teams for monitoring application performance, infrastructure metrics, and for visualizing time-series data. It helps in quickly identifying trends and anomalies in the data.

Kibana

  • Overview: Kibana is an open-source data visualization and exploration tool that is closely integrated with Elasticsearch. It is used to visualize and analyze data stored in Elasticsearch.
  • Features:
    • Data Visualization: Allows users to create a variety of visualizations such as bar charts, line charts, maps, and histograms. It provides an intuitive interface for exploring and filtering data.
    • Dashboard Creation: Users can easily create and customize dashboards to display multiple visualizations in one place, providing a comprehensive view of the data.
    • Search and Filtering: Provides a powerful search and filtering functionality to quickly find and analyze specific data subsets.
    • Time-series Analysis: Specializes in analyzing time-series data, which is useful for monitoring and understanding how data changes over time.
  • Use Cases: Commonly used in log analysis, security information and event management (SIEM), and for monitoring the performance of applications and infrastructure. It is widely used in combination with Elasticsearch for large-scale data analysis and monitoring.

CloudWatch

  • Overview: CloudWatch is a monitoring and observability service provided by Amazon Web Services (AWS). It allows users to monitor AWS resources and the applications running on them.
  • Features:
    • Resource Monitoring: Automatically collects metrics from various AWS resources such as EC2 instances, RDS databases, S3 buckets, etc.
    • Custom Metrics: Allows users to define and send their own custom metrics to CloudWatch for monitoring application-specific performance indicators.
    • Alarms: Can set up alarms based on metric thresholds and events. It can trigger actions such as sending notifications, auto-scaling resources, or invoking Lambda functions.
    • Logs Management: Integrates with AWS CloudTrail and other services to collect and store logs. Users can analyze logs to gain insights into the behavior of their applications and resources.
  • Use Cases: In the AWS ecosystem, it is essential for monitoring the health and performance of cloud-based applications and infrastructure. It helps in optimizing resource utilization, detecting and resolving issues quickly, and ensuring the reliability of applications running on AWS.

Saga pattern

Saga 模式是一种设计模式,用于分布式系统和微服务架构中,以管理数据一致性,并处理涉及多个服务的事务。以下是详细介绍:

The Saga pattern is a design pattern used in distributed systems and microservices architecture to manage data consistency and handle transactions that involve multiple services. Here is a detailed introduction:

Definition and Concept

A saga is a sequence of transactions that are executed in a coordinated manner across multiple microservices or distributed components. Each transaction in the saga is a local operation within a single service, and the saga orchestrates these transactions to achieve a consistent outcome across the entire system. If any of the transactions in the saga fails, the saga must take appropriate action to roll back the changes made by the previous transactions in order to maintain data consistency.

Working Mechanism

  • Orchestration: There are two main ways to orchestrate sagas - choreography and orchestration. In choreography, the participating services communicate with each other directly to coordinate the execution of the saga steps. They follow a predefined protocol or set of messages to determine the order of execution and handle failures. In orchestration, there is a central orchestrator that controls the execution of the saga. It sends commands to the individual services to perform specific steps and monitors the progress of the saga.
  • Compensating Transactions: Each step in a saga has a corresponding compensating transaction. When a failure occurs, the compensating transactions are used to undo the effects of the previously executed steps. For example, if a saga involves creating an order in one service and reserving inventory in another, and the inventory reservation fails, the compensating transaction for the order creation service would delete the created order.

Use Cases

  • E-commerce Systems: In an e-commerce application, a saga can be used to handle complex operations such as placing an order. The saga might involve steps like creating an order record, reserving inventory, charging the customer’s credit card, and scheduling delivery. If any of these steps fail, the saga can roll back the previous steps to ensure data consistency.
  • Financial Transactions: For example, in a cross-border money transfer system, a saga can be used to manage the series of operations involved, such as deducting the amount from the sender’s account, converting the currency, and crediting the receiver’s account. If there is an issue during the currency conversion, the saga can roll back the deduction from the sender’s account.
  • Travel Booking Systems: When booking a trip that involves multiple services like flight booking, hotel reservation, and car rental, a saga can be used to ensure that all these operations are either completed successfully or rolled back in case of failures.

Advantages

  • Scalability: It allows microservices to operate independently and scale individually, as each service only needs to manage its own local transactions.
  • Flexibility: Sagas can be designed to handle complex business processes that involve multiple services and have different requirements and constraints.
  • Resilience: By using compensating transactions, sagas can recover from failures and maintain data consistency even in the face of partial successes or errors.

Disadvantages

  • Complexity: Designing and implementing sagas can be complex, especially when dealing with multiple services and complex business logic. The coordination and management of compensating transactions require careful planning and implementation.
  • Error Handling: Handling all possible error scenarios and ensuring that the compensating transactions work correctly in all cases can be challenging.
  • Performance Overhead: The need to coordinate multiple transactions and potentially execute compensating transactions can introduce some performance overhead compared to traditional single-transaction approaches.

定义与概念

一个 Saga 是一系列事务的序列,这些事务在多个微服务或分布式组件之间以协调的方式执行。Saga 中的每个事务都是单个服务内的本地操作,而 Saga 则对这些事务进行编排,以在整个系统中实现一致的结果。如果 Saga 中的任何一个事务失败,Saga 必须采取适当的措施来回滚之前事务所做的更改,以维护数据一致性。

工作机制

  • 编排:编排 Saga 主要有两种方式——编舞式和协调式。在编舞式中,参与的服务直接相互通信,以协调 Saga 步骤的执行。它们遵循预定义的协议或一组消息来确定执行顺序并处理失败情况。在协调式中,有一个中央协调器来控制 Saga 的执行。它向各个服务发送命令以执行特定步骤,并监控 Saga 的进展。
  • 补偿事务:Saga 中的每个步骤都有一个对应的补偿事务。当发生失败时,补偿事务用于撤销先前已执行步骤的影响。例如,如果一个 Saga 涉及在一个服务中创建订单,并在另一个服务中预留库存,而库存预留失败,那么订单创建服务的补偿事务将删除已创建的订单。

用例

  • 电子商务系统:在电子商务应用程序中,Saga 可用于处理复杂的操作,如下单。Saga 可能涉及的步骤包括创建订单记录、预留库存、从客户的信用卡中扣款以及安排配送。如果这些步骤中的任何一个失败,Saga 可以回滚先前的步骤,以确保数据一致性。
  • 金融交易:例如,在跨境汇款系统中,Saga 可用于管理所涉及的一系列操作,如从汇款人的账户中扣除金额、进行货币兑换以及向收款人的账户中存入款项。如果在货币兑换过程中出现问题,Saga 可以回滚从汇款人账户中扣除金额的操作。
  • 旅行预订系统:当预订一次旅行,涉及多个服务,如航班预订、酒店预订和租车时,Saga 可用于确保所有这些操作要么成功完成,要么在出现失败时回滚。

优点

  • 可扩展性:它允许微服务独立运行并单独扩展,因为每个服务只需管理其自身的本地事务。
  • 灵活性:Saga 可以被设计用来处理涉及多个服务、具有不同需求和约束条件的复杂业务流程。
  • 弹性:通过使用补偿事务,即使在部分成功或出现错误的情况下,Saga 也能从失败中恢复并维护数据一致性。

缺点

  • 复杂性:设计和实现 Saga 可能很复杂,尤其是在处理多个服务和复杂业务逻辑时。补偿事务的协调和管理需要仔细规划和实现。
  • 错误处理:处理所有可能的错误场景,并确保补偿事务在所有情况下都能正确工作,可能具有挑战性。
  • 性能开销:与传统的单事务方法相比,协调多个事务并可能执行补偿事务的需求会引入一些性能开销。

Kubernetes、EKS、WCNP、KubeCtl

Kubernetes相关

  1. What is Kubernetes?
    • Answer: Kubernetes is an open-source container orchestration platform. It automates the deployment, scaling, and management of containerized applications. It allows you to group containers into logical units for easy management and discovery, and provides features like automatic pod scheduling, load balancing, and self-healing.
  2. What are the main components of Kubernetes?
    • Answer: The main components include the control plane (containing API Server, etcd, Scheduler, Controller Manager) and the nodes (with kubelet, kube-proxy, and container runtime). The control plane manages the cluster, while nodes run the pods and containers.
  3. Explain the concept of a pod in Kubernetes.
    • Answer: A pod is the smallest and simplest unit in Kubernetes that you can create and manage. It can contain one or more closely related containers that share the same network namespace and storage volumes. Pods are used to group containers that need to be co-located and co-scheduled on the same node.

EKS相关

  1. What is EKS?
    • Answer: EKS stands for Amazon Elastic Kubernetes Service. It is a managed Kubernetes service provided by Amazon Web Services. It allows you to easily run Kubernetes on AWS without having to manage the underlying infrastructure, taking care of tasks like provisioning servers, installing Kubernetes, and maintaining the cluster.
  2. What are the advantages of using EKS?
    • Answer: Some advantages include easy integration with other AWS services, high availability as it is managed by AWS, auto-scaling capabilities to handle varying workloads, and reduced operational overhead as AWS takes care of many maintenance tasks.
  3. How does EKS handle node management?
    • Answer: EKS allows you to use AWS EC2 instances as nodes. You can define node groups and configure auto-scaling for these node groups. EKS also provides tools and APIs to manage the lifecycle of nodes, such as adding or removing nodes based on the cluster’s needs.

WCNP相关

  1. What is WCNP?
    • Answer: It’s a collection of Walmart’s internal cloud services.

KubeCtl相关

  1. What is KubeCtl?
    • Answer: KubeCtl is a command-line tool used to interact with a Kubernetes cluster. It allows you to perform various operations such as deploying applications, managing pods, services, and other Kubernetes resources, as well as viewing the status and logs of the cluster and its components.
  2. List some common KubeCtl commands.
    • Answer: Common commands include kubectl create to create resources, kubectl get to view resources, kubectl describe to get detailed information about a resource, kubectl delete to delete resources, and kubectl apply to apply configuration changes.
  3. How do you use KubeCtl to deploy an application?
    • Answer: First, you create a deployment configuration file (usually in YAML or JSON format) that defines the application’s pods, replicas, and other details. Then you use the kubectl apply -f <file> command, where <file> is the name of your configuration file. This will deploy the application to the Kubernetes cluster according to the specified configuration.

AWS Modules with examples

AWS (Amazon Web Services) offers a wide range of modules and services to build and manage various types of applications and infrastructure. Here are some of the key AWS modules with examples:

Compute Modules

  • Amazon Elastic Compute Cloud (EC2)
    • Description: A web service that provides resizable compute capacity in the cloud. It allows users to launch virtual servers, known as instances, with various operating systems and configurations.
    • Example: A startup might use EC2 instances to host their web application. They can choose an appropriate instance type based on their CPU, memory, and storage requirements. For instance, they could select a t2.micro instance for a small-scale development environment or an m5.xlarge instance for a more resource-intensive production application.
  • AWS Lambda
    • Description: A serverless compute service that lets you run code without provisioning or managing servers. It automatically scales based on the incoming request volume.
    • Example: A mobile application might use AWS Lambda to process user sign-up events. When a user signs up, the app triggers a Lambda function that validates the input, stores the user data in a database, and sends a welcome email.

Storage Modules

  • Amazon Simple Storage Service (S3)
    • Description: An object storage service that offers high scalability, data durability, and security. It is used to store and retrieve any amount of data from anywhere on the web.
    • Example: A media company could use S3 to store and distribute large video files. They can create an S3 bucket, upload the video files, and then use S3’s content delivery network (CDN) integration to serve the videos to users with low latency.
  • Amazon Elastic Block Store (EBS)
    • Description: A block-level storage service that provides persistent storage volumes for EC2 instances. It offers high-performance storage that can be attached to instances and used like a local hard drive.
    • Example: A database server running on an EC2 instance might use an EBS volume to store its data. The EBS volume can be sized according to the database’s storage needs and can be easily detached and attached to another instance for maintenance or scaling purposes.

Database Modules

  • Amazon Relational Database Service (RDS)
    • Description: A managed relational database service that makes it easy to set up, operate, and scale a relational database in the cloud. It supports popular database engines like MySQL, PostgreSQL, and Oracle.
    • Example: An e-commerce website could use RDS to manage its customer and order data. They can create an RDS instance with the appropriate database engine and configure it with the necessary storage and compute resources. The website’s application can then connect to the RDS instance to perform database operations such as inserting, updating, and querying data.
  • Amazon DynamoDB
    • Description: A fully managed NoSQL database service that offers fast and predictable performance with seamless scalability. It is designed for applications that require low-latency access to data.
    • Example: A mobile gaming company might use DynamoDB to store user game progress, leaderboard data, and in-game purchases. The database can handle the high write and read throughput required by the game, and it can scale automatically as the number of users grows.

Networking Modules

  • Amazon Virtual Private Cloud (VPC)
    • Description: Allows you to provision a logically isolated section of the AWS cloud where you can launch AWS resources in a virtual network that you define.
    • Example: A financial institution could create a VPC to host its critical applications and services. They can define subnets, route tables, and security groups within the VPC to ensure secure and isolated networking. For example, they might have a public subnet for web servers that need to be accessible from the internet and a private subnet for database servers that should only be accessible from within the VPC.
  • Amazon Route 53
    • Description: A highly available and scalable Domain Name System (DNS) web service. It translates domain names into IP addresses and routes internet traffic to the appropriate AWS resources.
    • Example: A company with multiple websites and applications can use Route 53 to manage their domain names and DNS records. They can create DNS records to point their domain names to the corresponding EC2 instances, load balancers, or other AWS services. For instance, they can set up an A record to map a domain name to the IP address of a web server hosted on EC2.

Here is an English introduction to these testing-related knowledge points:

Test

1. Different Type of Tests in whole project lifecycle

  • Unit Tests: These are the most granular level of tests. They focus on testing individual units of code, such as a single function, method, or class. Unit tests are usually written by developers and are aimed at verifying that a particular piece of code behaves as expected in isolation. They help in catching bugs early in the development process and make the code easier to maintain.
  • Integration Tests: These tests check how different components or modules of the system work together. They ensure that the interfaces between various parts of the application are functioning correctly. For example, in a software system with a database layer, a business logic layer, and a presentation layer, integration tests would verify that data can flow properly between these layers.
  • System Tests: System tests evaluate the entire system as a whole to ensure that it meets the specified requirements. They simulate real-world scenarios and user interactions to test the system’s functionality, performance, and usability. This includes testing all the components together in the production-like environment.
  • Acceptance Tests: These tests are performed to determine whether the system meets the business requirements and is acceptable to the end-users or stakeholders. Acceptance tests can be user acceptance tests (UAT), where end-users test the system to see if it meets their needs, or contract acceptance tests, which are based on the requirements specified in a contract.
  • Regression Tests: After making changes to the system, such as bug fixes or new feature implementations, regression tests are run to ensure that the existing functionality has not been broken. They are a subset of the overall test suite that focuses on the areas of the system that are likely to be affected by the changes.

2. Unit Test, Mock

  • Unit Test: A unit test is a piece of code that exercises a specific unit of functionality in an isolated way. It provides a set of inputs to the unit under test and verifies that the output is as expected. Unit tests should be fast, independent, and repeatable. For example, in a Java application, a unit test for a method that calculates the sum of two numbers would provide different pairs of numbers as inputs and check if the calculated sum is correct.
  • Mock: In unit testing, a mock is an object that mimics the behavior of a real object, such as a database, a web service, or another component. Mocks are used when the real object is difficult to create, expensive to set up, or not available during testing. For instance, if a unit of code depends on a database call, instead of actually connecting to the database, a mock object can be used to return predefined data. This allows the unit test to focus on testing the logic of the unit under test without being affected by the external dependencies.

3. Testing Rest Api with Rest Assured

Rest Assured is a Java library used for testing RESTful APIs. It simplifies the process of sending HTTP requests to an API and validating the responses.

  • Sending Requests: With Rest Assured, you can easily send different types of HTTP requests like GET, POST, PUT, DELETE, etc. For example, to send a GET request to an API endpoint, you can use code like given().when().get("https://example.com/api/endpoint").then();
  • Validating Responses: You can validate various aspects of the response, such as the status code (e.g., then().statusCode(200); to check if the response has a 200 status code), the headers, and the body. You can use methods to extract data from the response body and perform assertions on it. For instance, if the API returns JSON data, you can use JsonPath expressions in Rest Assured to extract and validate specific fields in the JSON.

4. AUTOMATION TEST

  • BDD - Cucumber - annotations: Behavior-Driven Development (BDD) is an approach that focuses on defining the behavior of the system from the perspective of the stakeholders. Cucumber is a popular tool for implementing BDD in Java (and other languages). Annotations in Cucumber are used to mark different parts of the feature files and step definitions. For example, @Given, @When, @Then are commonly used annotations in step definitions. @Given is used to set up the preconditions, @When describes the action being performed, and @Then is used to define the expected outcome. Feature files written in Gherkin language (a simple syntax used by Cucumber) use these annotations to describe the behavior of the system in a human-readable format.
  • Load Test with JMeter: Apache JMeter is a tool used for load testing web applications, web services, and other types of applications. It can simulate a large number of concurrent users sending requests to the application to measure its performance under load. You can configure JMeter to define the number of threads (simulating users), the ramp-up period (how quickly the users are added), and the duration of the test. It can generate detailed reports on metrics such as response times, throughput, and error rates, helping you identify bottlenecks in the application.
  • Performance tool JProfiler: JProfiler is a powerful Java profiling tool used for performance analysis. It can help you identify performance issues in your Java applications by analyzing memory usage, CPU utilization, and thread behavior. It allows you to take snapshots of the application’s state at different times, trace method calls, and find memory leaks. You can use JProfiler to optimize your code by identifying methods that consume a lot of resources and improving their performance.
  • AB Test: AB testing is a method of comparing two versions (A and B) of a web page, application feature, or marketing campaign to determine which one performs better. In AB testing, a random subset of users is shown version A, and another random subset is shown version B. Metrics such as click-through rates, conversion rates, or user engagement are then measured for each version. Based on the results, you can decide which version to implement permanently. AB testing is often used in web development and digital marketing to make data-driven decisions about changes to the product or service.