Java backend syllabus


Contractor interview real Questions

Java

Java Top Q notes

Kafka

MQ notes

Spring

Spring Boot Notes

Syllabus

Session 1: JVM STRING FINAL
  1. Warm Up
  2. JVM Memory Management
  3. JVM, JDK, JRE
  4. Garbage Collection
  5. String & StringBuilder & StringBuffer
  6. Final, Finally, Finalize
  7. Immutable classes (optional: basic Java syntax)
Session 2: STATIC OOP
  1. Static
  2. Marker Interfaces - Serializable, Cloneable
  3. OOP
  4. SOLID Principles
  5. Reflection
  6. Generics
Session 3: COLLECTION
  1. Array vs ArrayList vs LinkedList
  2. Set, TreeSet, LinkedHashSet
  3. Map, LinkedHashMap, ConcurrentHashMap (how it works)
  4. SynchronizedMap
  5. Iterator vs Enumeration
Session 4: EXCEPTION, DESIGN PATTERN
  1. Design Patterns - Singleton, Factory, Observer, Proxy
  2. Exception Types - compile-time, runtime, custom
Session 5: THREADS
  1. Multithread Interaction (Synchronized, Atomic, ThreadLocal, Volatile)
  2. ReentrantLock
  3. Executor and ThreadPool, ForkJoinPool
  4. Future & CompletableFuture
  5. Runnable vs Callable
  6. Semaphore vs Mutex
Session 6: JAVA 8, 17
  1. Java 8: Functional Interfaces, Lambda, Stream API (map, filter, sorted, groupingBy, etc.), Optional, Default methods
  2. Java 17: Sealed Classes - advantages vs limitations, across packages
Session 7: SQL
  1. Primary Key, Normalization
  2. Different types of Joins
  3. Top asked SQLs - nth highest salary; highest salary in each department; employees whose salary is greater than their manager's
  4. Introduction to Stored Procedures and Functions
  5. Clustered index vs non-clustered index
  6. Explain Plan - what it does, what it can tell you
Session 8: NOSQL
  1. SQL vs NoSQL
  2. MongoDB vs Cassandra introduction
  3. ACID vs CAP explained
Session 9: REST API
  1. DispatcherServlet
  2. REST API
  3. How to create a good REST API
  4. HTTP status codes: 200, 201, 400, 401, 403, 404, 500, 502, 503, 504
  5. Introduction to GraphQL, WebSocket, gRPC
  6. ReactiveJava
Session 10: SPRING CORE
  1. IoC/DI
  2. Bean Scopes
  3. Constructor vs setter vs field injection
Session 11: SPRING ANNOTATIONS
  1. Different Spring annotations
  2. @Controller vs @RestController
  3. @Qualifier, @Primary
  4. Spring Cache and Retry
Session 12: SPRING BOOT
  1. How to create a Spring Boot app from scratch
  2. Benefits of Spring Boot
  3. The @SpringBootApplication annotation
  4. AutoConfiguration, and how to disable it
  5. Actuator
Session 13: SPRING BOOT 2
  1. Spring ActiveProfile
  2. AOP
  3. @ExceptionHandler, @ControllerAdvice
Session 14: DATA ACCESS
  1. JDBC, Statement vs PreparedStatement, DataSource
  2. Hibernate ORM, Session, Cache
  3. Optimistic locking - add a version column
  4. Associations: many-to-many
Session 15: TRANSACTION, JPA
  1. @Transactional - atomic operations
  2. Propagation, Isolation
  3. JPA naming conventions
  4. Paging and sorting using JPA
  5. Hibernate Persistence Context
Session 16: SECURITY
  1. How to implement security by overriding Spring classes
  2. Basic Authentication and password encryption
  3. JWT tokens and workflow
  4. OAuth2 workflow
  5. Authorization based on user roles
Session 17: UNIT TEST
  1. Different types of tests in the whole project lifecycle
  2. Unit tests, mocks
  3. Testing REST APIs with Rest Assured
Session 18: AUTOMATION TEST
  1. BDD - Cucumber - annotations
  2. Load testing with JMeter
  3. Performance tool JProfiler
  4. A/B testing
Session 19: MICROSERVICE
  1. Benefits/disadvantages of microservices
  2. How to split a monolith into microservices
  3. Circuit Breaker - concept, retry, fallback methods
  4. Load Balancer - concept and algorithms
  5. API Gateway
  6. Config Server
Session 20: KAFKA
  1. Kafka - concepts, how it works, and how a message is routed to a partition
  2. Consumer groups, assignment strategies
  3. Message ordering
Session 21: KAFKA 2
  1. Kafka duplicate messages
  2. Kafka message loss
  3. Poison messages, DLQ
  4. Kafka security (SASL, ACLs, encryption, etc.)
Session 22: DISTRIBUTED SYSTEM
  1. Microservices: how services communicate with each other
  2. Saga Pattern
  3. Monitoring: Splunk, Grafana, Kibana, CloudWatch, etc.
  4. System design: distributed systems
Session 23: DEVOPS
  1. CI/CD
  2. Jenkins pipeline with an example
  3. Git commands: squash, cherry-pick, etc.
  4. On-call: PagerDuty, etc.
  5. How do you solve a production issue with or without logs
Session 24: KUBERNETES
  1. Kubernetes, EKS, WCNP, kubectl
Session 25: CLOUD
  1. AWS modules with examples

Kubernetes

pod vs node in Kubernetes

Source: www.cloudzero.com (converted via SimpRead)


July 19, 2024

Kubernetes pods, nodes, and clusters get mixed up. Here’s a simple guide for beginners or if you just need to reaffirm your knowledge of Kubernetes components.


Kubernetes is increasingly becoming the standard way to deploy, run, and maintain cloud-native applications that run inside containers. Kubernetes (K8s) automates most container management tasks, empowering engineers to manage high-performing, modern applications at scale.

Meanwhile, several surveys, including those from VMware and Gartner, suggest that inadequate expertise with Kubernetes has held back organizations from fully adopting containerization. So, maybe you’re wondering how Kubernetes components work.

In that case, we’ve put together a bookmarkable guide on pods, nodes, clusters, and more. Let’s dive right in, starting with the very reason Kubernetes exists: containers.

Quick Summary

|  | Pod | Node | Cluster |
| --- | --- | --- | --- |
| Description | The smallest deployable unit in a Kubernetes cluster | A physical or virtual machine | A grouping of multiple nodes in a Kubernetes environment |
| Role | Isolates containers from underlying servers to boost portability | Provides the resources and instructions for how to run containers optimally | Provides the compute resources (CPU, volumes, etc.) to run containerized apps; has the control plane to orchestrate containerized apps through nodes and pods |
| What it hosts | Application containers, supporting volumes, and similar IP addresses for logically similar containers | Pods with application containers inside them, kubelet | Nodes containing the pods that host the application containers, control plane, kube-proxy, etc. |

What Is A Container?

In software engineering, a container is an executable unit of software that packages and runs an entire application, or portions of it, within itself.

Containers comprise not only the application’s binary files, but also libraries, runtimes, configuration files, and any other dependencies that the application requires to run optimally. Talk about self-sufficiency.


Credit: Containers vs virtual machine architectures

This design enables a container to be an entire application runtime environment unto itself.

As a result, a container isolates the application it hosts from the external environment it runs on. This enables applications running in containers to be built in one environment and deployed in different environments without compatibility problems.

Also, because containers share resources and do not host their own operating system, they are leaner than virtual machines (VMs). This makes deploying containerized applications much quicker and more efficient than deploying virtual machines.

What Is A Containerized Application?

In cloud computing, a containerized application refers to an app that has been specially built using cloud-native architecture for running within containers. A container can either host an entire application or small, distributed portions of it (which are known as microservices).

Developing, packaging, and deploying applications in containers is referred to as containerization. Apps that are containerized can run in a variety of environments and devices without causing compatibility problems.

One more thing. Developers can isolate faulty containers and fix them independently before they affect the rest of the application or cause downtime. This is something that is extremely tricky to do with traditional monolithic applications.

What Is A Kubernetes Pod?

A Kubernetes pod is a collection of one or more application containers.

The pod is an additional level of abstraction that provides shared storage (volumes), IP address, communication between containers, and hosts other information about how to run application containers. Check this out:

Credit: Kubernetes Pods architecture by Kubernetes.io

So containers do not run directly on the underlying machines; pods are the layer Kubernetes uses to turn containers on and off.

Containers that must communicate directly to function are housed in the same pod. These containers are also co-scheduled because they work within a similar context. Also, the shared storage volumes enable pods to last through container restarts because they provide persistent data.

Kubernetes also scales or replicates the number of pods up and down to meet changing load/traffic/demand/performance requirements. Similar pods scale together.

Another unique feature of Kubernetes is that rather than creating containers directly, it generates pods that already have containers.

Also, whenever you create a K8s pod, the platform automatically schedules it to run on a Node. This pod will remain active until the specific process completes, resources to support the pod run out, the pod object is removed, or the host node terminates or fails.

Each pod runs inside a Kubernetes node, and each pod can fail over to another, logically similar pod running on a different node in case of failure. And speaking of Kubernetes nodes.

What Is A Kubernetes Node?

A Kubernetes node is either a virtual or physical machine that one or more Kubernetes pods run on. It is a worker machine that contains the necessary services to run pods, including the CPU and memory resources they need to run.

Now, picture this:


Credit: How Kubernetes Nodes work by Kubernetes.io

Each node also comprises three crucial components:

  • Kubelet – This is an agent that runs inside each node to ensure pods are running properly, including communications between the Master and nodes.
  • Container runtime – This is the software that runs containers. It manages individual containers, including retrieving container images from repositories or registries, unpacking them, and running the application.
  • Kube-proxy – This is a network proxy that runs inside each node, managing the networking rules within the node (between its pods) and across the entire Kubernetes cluster.

Here’s what a Cluster is in Kubernetes.

What Is A Kubernetes Cluster?


Nodes usually work together in groups. A Kubernetes cluster contains a set of worker machines (nodes). The cluster automatically distributes workload among its nodes, enabling seamless scaling.

Here’s that symbiotic relationship again.

A cluster consists of several nodes. The node provides the compute power to run the setup. It can be a virtual machine or a physical machine. A single node can run one or more pods.

Each pod contains one or more containers. A container hosts the application code and all the dependencies the app requires to run properly.

Something else. The cluster also comprises the Kubernetes Control Plane (or Master), which manages each node within it. The control plane is a container orchestration layer where K8s exposes the API and interfaces for defining, deploying, and managing containers’ lifecycles.

The master assesses each node and distributes workloads according to available nodes. This load balancing is automatic, ensures efficiency in performance, and is one of the most popular features of Kubernetes as a container management platform.

You can also run the Kubernetes cluster on different providers’ platforms, such as Amazon’s Elastic Kubernetes Service (EKS), Microsoft’s Azure Kubernetes Service (AKS), or the Google Kubernetes Engine (GKE).

Take The Next Step: View, Track, And Control Your Kubernetes Costs With Confidence

Open-source, highly scalable, and self-healing, Kubernetes is a powerful platform for managing containerized applications. But as Kubernetes components scale to support business growth, Kubernetes cost management tends to get overlooked.

Most cost tools only display your total cloud costs, not how Kubernetes containers contributed. With CloudZero, you can view Kubernetes costs down to the hour as well as by K8s concepts such as cost per pod, container, microservice, namespace, and cluster.


By drilling down to this level of granularity, you are able to find out what people, products, and processes are driving your Kubernetes spending.

You can also combine your containerized and non-containerized costs to simplify your analysis. CloudZero enables you to understand your Kubernetes costs alongside your AWS, Azure, Google Cloud, Snowflake, Databricks, MongoDB, and New Relic spend. Getting the full picture.

You can then decide what to do next to optimize the cost of your containerized applications without compromising performance. CloudZero will even alert you when cost anomalies occur before you overspend.


Kubernetes FAQ

Is a Kubernetes Pod a Container?

Not exactly: a Kubernetes pod is a group of one or more containers that share storage and networking resources. Pods are the smallest deployable units in Kubernetes and manage containers collectively, allowing them to run in a shared context with shared namespaces.

What is the difference between container node and pod?

A node is a worker machine in Kubernetes, part of a cluster, that runs containers and other Kubernetes components. A pod, on the other hand, is a higher-level abstraction that encapsulates one or more containers and their shared resources, managed collectively within a node.

Can a pod have multiple containers?

Yes, a Kubernetes pod can have multiple containers. Pods are designed to encapsulate closely coupled containers that need to share resources and communicate with each other over localhost. This approach facilitates running multiple containers within the same pod while treating them as a cohesive unit for scheduling, scaling, and management within the Kubernetes cluster.

How many pods run on a node?

The number of Kubernetes pods that can run on a node depends on various factors such as the node’s resources (CPU, memory, etc.), the resource requests and limits set by the pods, and any other applications or system processes running on the node.

Generally, a node can run multiple pods, and the Kubernetes scheduler determines pod placement based on available resources and scheduling policies defined in the cluster configuration.

Security

  • How to implement security by overriding Spring classes
    • You can implement security in a Spring application by overriding or customizing Spring Security classes. For example, you can extend WebSecurityConfigurerAdapter (deprecated since Spring Security 5.7 in favor of declaring a SecurityFilterChain bean)
    • and override methods like configure(HttpSecurity http) to define custom security configuration such as access rules, authentication mechanisms, etc.
    • You can also implement interfaces like UserDetailsService to provide custom user authentication and authorization logic.
  • Basic Authentication and password encryption
    • Basic authentication is a simple authentication mechanism where the client sends the username and password in the request headers.
    • In Spring, it can be configured easily. Password encryption is crucial for security.
    • Spring provides various password encoding mechanisms like BCryptPasswordEncoder to securely hash and store passwords.
    • When a user registers or changes their password, the new password is hashed and stored in the database; during authentication, the provided password is checked against the stored hash (see the sketch after this list, which also shows a JWT round trip).
  • JWT Token and workflow
    • JSON Web Token (JWT) is a widely used token-based authentication and authorization mechanism.
    • The workflow typically involves
      • the client sending username and password to the server for authentication.
      • If the authentication is successful, the server generates a JWT token containing user information as payload and a signature.
        • The client then stores the token and sends it in the headers of subsequent requests.
        • The server validates the token on each request and authorizes the user based on the information in the token.
  • Oauth2 workflow
    • OAuth2 is an authorization framework that allows users to grant limited access to their resources on one server to another server without sharing their credentials.
    • The typical OAuth2 workflow involves steps like
      • the client redirecting the user to the authorization server for authentication and authorization,
      • the user granting permission, the authorization server issuing an access token,
      • and the client using the access token to access protected resources on the resource server.
  • Authorization based on User role
    • In a Spring security application, authorization based on user roles can be implemented by assigning different roles to users and configuring access rules based on those roles.
    • You can use annotations like @PreAuthorize or configure access rules in the security configuration to specify which roles are allowed to access which resources or perform which operations.
    • For example, you can define that only users with the ROLE_ADMIN role can access certain administrative endpoints.
  • What is XSS attack and how to prevent it?
    • XSS (Cross-Site Scripting) is a vulnerability where an attacker injects malicious scripts into a website.
      These scripts run on users’ browsers, allowing attackers to steal data or perform actions on behalf of the user.
    • To prevent XSS, you should:
      • Sanitize and escape user input.
      • Use Content Security Policy (CSP).
      • Set HttpOnly and Secure flags for cookies.
      • Avoid inserting user input directly into the HTML without validation.
  • What is CSRF attack and how to prevent it?
    • CSRF (Cross-Site Request Forgery) is an attack where an attacker tricks a user into making unwanted requests to a website on which they are authenticated.
      This can result in unauthorized actions, such as changing account settings or making purchases.
    • To prevent CSRF, you should:
      • Use anti-CSRF tokens in forms.
      • Implement SameSite cookie attributes.
      • Ensure that sensitive actions require additional authentication (like a CAPTCHA).
      • Check the Referer header to validate requests.
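
A minimal sketch of password hashing and a JWT round trip, assuming Spring Security's crypto module (BCryptPasswordEncoder) and the jjwt library (io.jsonwebtoken, 0.11.x); the class and user names here are illustrative:

import io.jsonwebtoken.Jwts;
import io.jsonwebtoken.SignatureAlgorithm;
import io.jsonwebtoken.security.Keys;
import org.springframework.security.crypto.bcrypt.BCryptPasswordEncoder;

import java.security.Key;
import java.util.Date;

public class AuthSketch {
    public static void main(String[] args) {
        // 1. Hash the password at registration time; BCrypt salts each hash.
        BCryptPasswordEncoder encoder = new BCryptPasswordEncoder();
        String storedHash = encoder.encode("secret");
        // At login, compare the raw password against the stored hash.
        System.out.println("password matches: " + encoder.matches("secret", storedHash));

        // 2. Issue a JWT after successful authentication.
        Key key = Keys.secretKeyFor(SignatureAlgorithm.HS256); // server-side secret
        String token = Jwts.builder()
                .setSubject("john.doe")
                .setExpiration(new Date(System.currentTimeMillis() + 3_600_000)) // 1 hour
                .signWith(key)
                .compact();

        // 3. On each subsequent request, validate the token and read the user from its claims.
        String subject = Jwts.parserBuilder()
                .setSigningKey(key)
                .build()
                .parseClaimsJws(token) // throws JwtException if tampered with or expired
                .getBody()
                .getSubject();
        System.out.println("authenticated user: " + subject);
    }
}

Note that BCrypt verification uses matches(raw, hash) rather than re-encoding and comparing strings, because each encode() call produces a different salt.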

Authorization based on User role using Spring Security

In Spring Security, you can implement role-based authorization by assigning roles to users and restricting access to certain endpoints or methods based on those roles.

Step-by-Step: Role-Based Authorization in Spring

1. Assign Roles to Users

Typically in your UserDetailsService implementation or user entity:

@Override
public UserDetails loadUserByUsername(String username) {
    // Hard-coded user for illustration; {noop} marks the password as stored un-encoded
    return new User(
            "admin",
            "{noop}password",
            List.of(new SimpleGrantedAuthority("ROLE_ADMIN"))
    );
}

⚠️ Store authorities with the ROLE_ prefix (e.g., ROLE_ADMIN); Spring Security's hasRole("ADMIN") matches the authority ROLE_ADMIN.

2. Secure Endpoints Based on Role

Using HttpSecurity in a SecurityConfig class (Spring Security 6 style: declare a SecurityFilterChain bean instead of overriding WebSecurityConfigurerAdapter):

@Bean
public SecurityFilterChain filterChain(HttpSecurity http) throws Exception {
    http
            .authorizeHttpRequests(auth -> auth
                    .requestMatchers("/admin/**").hasRole("ADMIN")
                    .requestMatchers("/user/**").hasAnyRole("USER", "ADMIN")
                    .anyRequest().authenticated())
            .formLogin(Customizer.withDefaults());
    return http.build();
}

3. Method-Level Authorization (Optional)

Use annotations with @EnableMethodSecurity:

@PreAuthorize("hasRole('ADMIN')")
public void adminOnlyOperation() {
    // only admins can call this
}

Summary

| Component | Purpose |
| --- | --- |
| SimpleGrantedAuthority | Assigns roles/authorities to a user |
| hasRole("ADMIN") | Protects endpoints/methods by role (the ROLE_ prefix is added automatically) |
| @PreAuthorize | Method-level security (optional) |
| SecurityFilterChain | Defines path-based access control |

XSS

XSS (Cross-Site Scripting) is a type of security vulnerability where an attacker injects malicious scripts into trusted websites. When a user visits the site, the script runs in their browser, potentially stealing cookies, session tokens, or sensitive data.

Types of XSS

| Type | Description |
| --- | --- |
| Stored XSS | Malicious script is permanently stored on the server (e.g., in a database or comment). |
| Reflected XSS | Script is immediately returned by the server (e.g., in URL parameters). |
| DOM-based XSS | Script is injected via client-side JavaScript manipulation, without server interaction. |

Example of XSS

function escape(s) {
    return '<script>console.log("' + s + '");</script>';
}

If the input s is ");alert(1);// , the function returns <script>console.log("");alert(1);//");</script> , so the injected alert(1) executes in the browser. This is malicious script injection.

How to Prevent XSS

1. Escape Output

  • Always escape user input before injecting it into HTML, JS, or attributes.
  • Use libraries like DOMPurify (for HTML) or encoding functions (encodeURIComponent, etc).

2. Use Safe APIs

  • Prefer textContent, createTextNode over innerHTML.

3. Validate Input

  • Use server-side and client-side validation to restrict allowed content.

4. Use Content Security Policy (CSP)

  • Prevents execution of inline scripts or loading from untrusted sources.

5. Sanitize User Input

  • Strip or neutralize dangerous code via input sanitization libraries (e.g., DOMPurify); a server-side Java escaping sketch follows this list.
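
As a server-side counterpart to point 1 (escape output), here is a minimal Java sketch assuming the OWASP Java Encoder library (org.owasp.encoder) is on the classpath:

import org.owasp.encoder.Encode;

public class XssEscapeSketch {
    public static void main(String[] args) {
        // Attacker-controlled input
        String userInput = "<script>alert(1)</script>";

        // Encode for an HTML body context before rendering;
        // the markup is neutralized into inert text.
        String safe = Encode.forHtml(userInput);
        System.out.println(safe); // prints &lt;script&gt;alert(1)&lt;/script&gt;
    }
}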

Summary

XSS exploits trust between users and websites.
Defense = sanitize, escape, validate, and use secure APIs.

CSRF

CSRF (Cross-Site Request Forgery) is a common type of web attack. Here is a detailed explanation (independent of any particular project's code):

Relationship to JWT

In some cases, such as stateless RESTful APIs (e.g., ones using JWT for authentication, which are stateless), CSRF protection can be disabled: the token is not attached automatically by the browser, and the API typically verifies the legitimacy of each request through the token itself.

In stateful applications, however, such as traditional session-based web apps, CSRF protection should normally stay enabled.

How the Attack Works

  1. User authentication: The user logs in to a trusted website A, and site A saves authentication state in the user's browser, for example a session cookie.
  2. Malicious site lures the user: The attacker builds a malicious website B. When the user visits site B, it uses tricks (such as auto-submitting forms) to send a request to trusted site A. Because the user's browser holds site A's credentials, the request carries them (e.g., the cookie) to site A.
  3. Site A processes the request: Since the request contains the user's valid credentials, site A mistakes it for a request the user initiated and performs the action, such as changing the user's password or transferring money.

Common Attack Scenarios

  • Auto-submitting forms: The malicious site contains a hidden form whose action attribute points to a sensitive endpoint of the trusted site; when the user visits, the form submits automatically and triggers the attack request.
  • Image tag attacks: The attacker places an <img> tag on the malicious site with its src attribute set to a sensitive endpoint of the trusted site; the browser automatically requests the "image", triggering the attack request.

Defenses

  1. Use a CSRF Token

    • Principle: When generating a page, the server creates a unique CSRF token for the user's request and embeds it in the page (e.g., as a hidden form field or a request header). Every form submission or request must carry this token; the server verifies that the submitted token matches the one it issued and rejects the request otherwise.
    • Example: adding a CSRF token to a form:
      <form action="/transfer" method="post">
      <input type="hidden" name="csrf_token" value="{{ csrf_token }}">
      <!-- other form fields -->
      <input type="submit" value="Transfer">
      </form>
  2. Check the request's Referer header

    • Principle: When the server receives a request, it checks the Referer header to confirm the request originated from its own pages; if the header is empty or points to another domain, the request is rejected.
    • Drawback: The Referer header can be forged, and some users disable it, so this method is not very reliable and is usually used only as a supplementary measure.
  3. Use the SameSite cookie attribute
    • Principle: SameSite is a cookie attribute that controls whether the cookie is sent on cross-site requests. It can be set to Strict or Lax: Strict means the cookie is sent only on same-site requests, while Lax allows it on certain safe cross-site requests (such as top-level GET navigations).
    • Example: setting the attribute when creating a cookie. The servlet Cookie class has no setSameSite method, so in Spring you can emit the Set-Cookie header via ResponseCookie:
      // Java example (Spring's ResponseCookie)
      ResponseCookie cookie = ResponseCookie.from("session_id", "123456")
              .httpOnly(true)
              .sameSite("Strict")
              .build();
      response.addHeader(HttpHeaders.SET_COOKIE, cookie.toString());

In short, CSRF is a serious security threat, and developers need to apply effective defenses to protect users' information when building web applications.
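
For completeness, a hedged sketch of enabling Spring Security's built-in CSRF token support (Spring Security 6 style; CookieCsrfTokenRepository stores the expected token in a cookie so JavaScript clients can echo it back):

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.security.config.annotation.web.builders.HttpSecurity;
import org.springframework.security.web.SecurityFilterChain;
import org.springframework.security.web.csrf.CookieCsrfTokenRepository;

@Configuration
public class CsrfConfig {

    @Bean
    public SecurityFilterChain filterChain(HttpSecurity http) throws Exception {
        http.csrf(csrf -> csrf
                // The server issues the token in a cookie; clients send it back in a header/field
                .csrfTokenRepository(CookieCsrfTokenRepository.withHttpOnlyFalse()));
        // As discussed above, a stateless JWT-only API may instead disable it:
        // http.csrf(csrf -> csrf.disable());
        return http.build();
    }
}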

SSL/TLS handshake (not to be confused with TCP's three-way handshake)


It goes roughly as follows (a small Java sketch follows this list):

  1. The ‘client hello’ message: The client initiates the handshake by sending a “hello” message to the server. The message will include which TLS version the client supports, the cipher suites supported, and a string of random bytes known as the “client random.”
  2. The ‘server hello’ message: In reply to the client hello message, the server sends a message containing the server’s SSL certificate, the server’s chosen cipher suite, and the “server random,” another random string of bytes that’s generated by the server.
  3. Authentication: The client verifies the server’s SSL certificate with the certificate authority that issued it. This confirms that the server is who it says it is, and that the client is interacting with the actual owner of the domain.
  4. The premaster secret: The client sends one more random string of bytes, the “premaster secret.” The premaster secret is encrypted with the public key and can only be decrypted with the private key by the server. (The client gets the public key from the server’s SSL certificate.)
  5. Private key used: The server decrypts the premaster secret.
  6. Session keys created: Both client and server generate session keys from the client random, the server random, and the premaster secret. They should arrive at the same results.
  7. Client is ready: The client sends a “finished” message that is encrypted with a session key.
  8. Server is ready: The server sends a “finished” message encrypted with a session key.
  9. Secure symmetric encryption achieved: The handshake is completed, and communication continues using the session keys.
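
To observe the outcome of this handshake from Java, here is a minimal sketch using the standard javax.net.ssl API (the host name is only an example):

import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

public class TlsHandshakeSketch {
    public static void main(String[] args) throws Exception {
        SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();
        try (SSLSocket socket = (SSLSocket) factory.createSocket("example.com", 443)) {
            // Runs the steps above: hellos, certificate verification,
            // premaster secret exchange, and session key agreement.
            socket.startHandshake();
            System.out.println("Protocol:     " + socket.getSession().getProtocol());
            System.out.println("Cipher suite: " + socket.getSession().getCipherSuite());
        }
    }
}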

REST API

Frequently asked: write a Controller

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;

@RestController
@RequestMapping("/user")
public class HelloController {

    @Autowired
    private HelloService service;

    // The method parameter is named "name", so @PathVariable must name the
    // path variable ("employee_name") explicitly.
    @GetMapping("/hello/{employee_name}")
    public String greetings(@PathVariable("employee_name") String name) {
        return "How are you, " + name;
    }

    // Here the parameter name matches the path variable, so @PathVariable can
    // resolve it without an explicit name (as long as parameter names are kept
    // at compile time); otherwise Spring would not know how to map it.
    @GetMapping("/greeting/{employee_name}")
    public ResponseEntity<String> greetingsEmployee(@PathVariable String employee_name) {
        return ResponseEntity.ok("How are you, " + employee_name);
    }

    @GetMapping("/testRequestParam")
    public ResponseEntity<String> testRequestParam(@RequestParam("num") int number) {
        return ResponseEntity.ok(String.valueOf(number));
    }

    // Paging could be added with an extra parameter, e.g.
    // @PageableDefault(size = 10) Pageable pageable
    @PostMapping("/create")
    public ResponseEntity<String> testPostBody(@RequestBody UserInfo userInfo,
            @RequestParam(value = "userRole", required = false, defaultValue = "guest") String userRole) {
        return ResponseEntity.ok(userInfo.getUserName() + ": " + userRole);
    }
}

1. DispatcherServlet

Definition

DispatcherServlet is a key component in the Spring Web MVC framework. It serves as the front controller in a Spring-based web application. A front controller is a single servlet that receives all HTTP requests and then dispatches them to the appropriate handlers (controllers) based on the request’s URL, HTTP method, and other criteria.

Function

  • Request Routing: It maps incoming requests to the appropriate @Controller classes and their methods using the configured handler mappings. For example, it can match a request to a specific controller method based on the URL pattern defined in the @RequestMapping annotation.
  • View Resolution: After a controller method processes the request and returns a logical view name, the DispatcherServlet uses a view resolver to map this logical name to an actual view (such as a JSP page or a Thymeleaf template) and renders the response.
  • Intercepting and Pre-processing: It can also use interceptors to perform pre-processing and post-processing tasks on requests and responses, like logging, authentication checks, etc.

2. Rest API

Definition

REST (Representational State Transfer) is an architectural style for building web services. A REST API (Application Programming Interface) is a set of rules and conventions for creating and consuming web services based on the REST principles.

Characteristics

  • Stateless: Each request from a client to a server must contain all the information necessary to understand and process the request. The server does not store any client-specific state between requests.
  • Resource-Oriented: Resources are the key abstractions in a REST API. Resources can be things like users, products, or orders, and are identified by unique URIs (Uniform Resource Identifiers).
  • HTTP Verbs: REST APIs use standard HTTP methods (verbs) to perform operations on resources. For example, GET is used to retrieve a resource, POST to create a new resource, PUT to update an existing resource, and DELETE to remove a resource.

3. How to create a good REST API

Design Principles

  • Use Clear and Descriptive URIs: URIs should clearly represent the resources. For example, use /users to represent a collection of users and /users/{userId} to represent a specific user.
  • Follow HTTP Verbs Correctly: Use GET for retrieval, POST for creation, PUT for full updates, PATCH for partial updates, and DELETE for deletion.
  • Return Appropriate HTTP Status Codes: Indicate the result of the request clearly. For example, return 200 for successful retrievals, 201 for successful creations, and 4xx or 5xx for errors.
  • Provide Good Documentation: Use tools like Swagger to generate documentation that explains the API endpoints, their input parameters, and expected output.

Security and Performance

  • Authentication and Authorization: Implement proper authentication mechanisms (e.g., OAuth, JWT) to ensure that only authorized users can access the API.
  • Caching: Implement caching strategies to reduce the load on the server and improve response times.

4. HTTP Error Codes

  • 200 OK: Indicates that the request has succeeded. It is commonly used for successful GET requests to retrieve a resource or successful PUT/PATCH requests to update a resource.
  • 201 Created: Used when a new resource has been successfully created. For example, when a client sends a POST request to create a new user, and the server successfully creates the user, it returns a 201 status code.
  • 400 Bad Request: Signifies that the server cannot process the request due to a client-side error, such as malformed request syntax, invalid request message framing, or deceptive request routing.
  • 401 Unauthorized: Indicates that the request requires user authentication. The client needs to provide valid credentials to access the requested resource.
  • 403 Forbidden: The client is authenticated, but it does not have permission to access the requested resource. For example, a regular user trying to access an administrative-only endpoint.
  • 404 Not Found: The requested resource could not be found on the server. This might be because the URL is incorrect or the resource has been deleted.
  • 500 Internal Server Error: A generic error message indicating that the server encountered an unexpected condition that prevented it from fulfilling the request. It could be due to a programming error, database issues, etc.
  • 502 Bad Gateway: The server, while acting as a gateway or proxy, received an invalid response from an upstream server.
  • 503 Service Unavailable: The server is currently unable to handle the request due to temporary overloading or maintenance. The client may try again later.
  • 504 Gateway Timeout: The server, while acting as a gateway or proxy, did not receive a timely response from an upstream server.
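
A hedged sketch of how a Spring application commonly maps exceptions to several of these status codes via @RestControllerAdvice (the exception-to-code mapping here is illustrative):

import java.util.NoSuchElementException;

import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.ExceptionHandler;
import org.springframework.web.bind.annotation.RestControllerAdvice;

@RestControllerAdvice
public class ApiExceptionHandler {

    // 404 Not Found: the requested resource does not exist
    @ExceptionHandler(NoSuchElementException.class)
    public ResponseEntity<String> notFound(NoSuchElementException ex) {
        return ResponseEntity.status(HttpStatus.NOT_FOUND).body(ex.getMessage());
    }

    // 400 Bad Request: the client sent an invalid request
    @ExceptionHandler(IllegalArgumentException.class)
    public ResponseEntity<String> badRequest(IllegalArgumentException ex) {
        return ResponseEntity.status(HttpStatus.BAD_REQUEST).body(ex.getMessage());
    }

    // 500 Internal Server Error: anything unexpected
    @ExceptionHandler(Exception.class)
    public ResponseEntity<String> serverError(Exception ex) {
        return ResponseEntity.status(HttpStatus.INTERNAL_SERVER_ERROR).body("Unexpected error");
    }
}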

5. Introduction of GraphQL, WebSocket, gRPC

GraphQL

  • Definition: GraphQL is a query language for APIs and a runtime for fulfilling those queries with your existing data. It allows clients to specify exactly what data they need from an API, reducing over-fetching and under-fetching of data.
  • Advantages: It provides a more efficient way of data retrieval compared to traditional REST APIs, especially in complex applications where clients may need different subsets of data. It also has a strong type system and can be introspected by clients.

The following table compares GraphQL and RESTful APIs along several dimensions:

| Dimension | RESTful | GraphQL |
| --- | --- | --- |
| Data fetching | Fixed endpoints, each returning a fixed data structure (e.g., /users returns the full user list, /users/{id} one user's details); fetching several resources may require several requests. | The client specifies exactly which fields it needs, and the server returns only those; one query can fetch several related resources, avoiding over- and under-fetching. |
| Data transfer volume | May return unneeded data (over-fetching) or require extra round trips (under-fetching), increasing payload size or request count; e.g., /users/{id} returns address and phone when only name and email were needed. | Returns only the requested fields, cutting unnecessary transfer; especially valuable on bandwidth-limited clients such as mobile devices. |
| Versioning | Usually via version numbers in the URL (/v1/users, /v2/users); new versions may change endpoint structure, and clients must track the difference. | Because clients define their own queries, adding or changing server-side fields rarely breaks existing queries; the schema can evolve without strict version splits. |
| Caching | Can use HTTP caching (Cache-Control, ETag); since each endpoint's response shape is fixed, caching is straightforward but coarse-grained. | Harder to cache because every client query can differ; usually requires custom server-side caching per query, which takes more work to build and maintain. |
| Error handling | HTTP status codes signal the outcome (200 success, 404 not found, 500 server error); richer details go in the response body. | Responses can carry detailed error information, including the position of the error within the query, helping clients locate and handle problems precisely. |
| Development efficiency | Developers define an endpoint per resource and operation; changing requirements often means modifying or adding several endpoints, raising maintenance cost. | Developers define the data model and the GraphQL schema; clients compose their own queries, so server-side changes concentrate on schema updates. |
| Learning curve | Built on HTTP and REST principles; concepts are simple and intuitive, especially for developers with web experience. | Requires learning GraphQL syntax, schema definitions, queries, and mutations; a steeper start, but more flexible data access once mastered. |
| Ecosystem and tooling | Rich, mature tool and framework support (e.g., Express for Node.js, Django REST framework for Python) and good compatibility with existing web infrastructure. | A fast-growing ecosystem with excellent client and server libraries (e.g., Apollo Server for Node.js, Relay for React), though somewhat less mature and widespread than REST's. |

A simple illustration of the difference in data fetching:

RESTful data fetching example

Client ----> GET /users/{id} ----> Server
(fetches the user's details, possibly with many unneeded fields)
Client ----> GET /posts?userId={id} ----> Server
(fetches that user's posts; an extra request is needed)

GraphQL data fetching example

Client ----> POST /graphql {
  user(id: "{id}") {
    name
    email
    posts {
      title
      content
    }
  }
} ----> Server
(one request fetches the user and related posts, with exactly the fields needed)

WebSocket

  • Definition: WebSocket is a communication protocol that provides full-duplex communication channels over a single TCP connection. It enables real-time communication between a client and a server.
  • Advantages: It reduces the overhead of traditional HTTP requests by maintaining a persistent connection, which is suitable for applications that require real-time updates, such as chat applications, online gaming, and live dashboards.
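
A minimal server-side sketch using Spring's WebSocket support (the /ws path and the echo behavior are illustrative):

import org.springframework.context.annotation.Configuration;
import org.springframework.web.socket.TextMessage;
import org.springframework.web.socket.WebSocketSession;
import org.springframework.web.socket.config.annotation.EnableWebSocket;
import org.springframework.web.socket.config.annotation.WebSocketConfigurer;
import org.springframework.web.socket.config.annotation.WebSocketHandlerRegistry;
import org.springframework.web.socket.handler.TextWebSocketHandler;

@Configuration
@EnableWebSocket
public class WebSocketConfig implements WebSocketConfigurer {

    @Override
    public void registerWebSocketHandlers(WebSocketHandlerRegistry registry) {
        // Expose a persistent, full-duplex endpoint at ws://host/ws
        registry.addHandler(new EchoHandler(), "/ws");
    }

    static class EchoHandler extends TextWebSocketHandler {
        @Override
        protected void handleTextMessage(WebSocketSession session, TextMessage message) throws Exception {
            // Push a message back over the same connection; no new HTTP request is needed
            session.sendMessage(new TextMessage("echo: " + message.getPayload()));
        }
    }
}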

gRPC

  • Definition: gRPC is a high-performance, open-source universal RPC (Remote Procedure Call) framework. It uses Protocol Buffers as the interface definition language and serialization format.
  • Advantages: It offers high performance, low latency, and strong typing. It is suitable for microservices architectures where efficient communication between services is crucial.

6. ReactiveJava

Definition

ReactiveJava (better known as RxJava) is a Java implementation of the Reactive Extensions (Rx) library. It is used for reactive programming, a paradigm that deals with asynchronous data streams and the propagation of change.

Key Concepts

  • Observable: Represents a source of data that can emit zero or more items over time. An Observable can emit data synchronously or asynchronously.
  • Subscriber: A Subscriber subscribes to an Observable to receive the emitted items. It can react to the data, errors, or the completion of the data stream.
  • Operators: ReactiveJava provides a rich set of operators that can be used to transform, filter, combine, and manipulate the data streams. For example, the map operator can be used to transform each item in the stream, and the filter operator can be used to filter out unwanted items.
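
A minimal sketch of these three concepts, assuming RxJava 3 (io.reactivex.rxjava3) is on the classpath:

import io.reactivex.rxjava3.core.Observable;

public class RxSketch {
    public static void main(String[] args) {
        // Observable: a source that emits items over time (here, synchronously)
        Observable.just(1, 2, 3, 4, 5)
                // Operators: transform and filter the stream declaratively
                .map(i -> i * 10)
                .filter(i -> i > 20)
                // Subscriber: reacts to items, errors, and completion
                .subscribe(
                        item -> System.out.println("received: " + item),
                        Throwable::printStackTrace,
                        () -> System.out.println("stream completed"));
    }
}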

Use Cases

  • Asynchronous Programming: It simplifies asynchronous programming by providing a declarative way to handle asynchronous operations. For example, in a web application, it can be used to handle multiple asynchronous API calls and combine their results.
  • Event-Driven Programming: It is well-suited for event-driven applications where events need to be processed in a reactive and efficient manner. For example, in a GUI application, it can be used to handle user input events and update the UI accordingly.

Test

1. Different Type of Tests in whole project lifecycle

  • Unit Tests: These are the most granular level of tests. They focus on testing individual units of code, such as a single function, method, or class. Unit tests are usually written by developers and are aimed at verifying that a particular piece of code behaves as expected in isolation. They help in catching bugs early in the development process and make the code easier to maintain.
  • Integration Tests: These tests check how different components or modules of the system work together. They ensure that the interfaces between various parts of the application are functioning correctly. For example, in a software system with a database layer, a business logic layer, and a presentation layer, integration tests would verify that data can flow properly between these layers.
  • System Tests: System tests evaluate the entire system as a whole to ensure that it meets the specified requirements. They simulate real-world scenarios and user interactions to test the system’s functionality, performance, and usability. This includes testing all the components together in the production-like environment.
  • Acceptance Tests: These tests are performed to determine whether the system meets the business requirements and is acceptable to the end-users or stakeholders. Acceptance tests can be user acceptance tests (UAT), where end-users test the system to see if it meets their needs, or contract acceptance tests, which are based on the requirements specified in a contract.
  • Regression Tests: After making changes to the system, such as bug fixes or new feature implementations, regression tests are run to ensure that the existing functionality has not been broken. They are a subset of the overall test suite that focuses on the areas of the system that are likely to be affected by the changes.

2. Unit Test, Mock

  • Unit Test: A unit test is a piece of code that exercises a specific unit of functionality in an isolated way. It provides a set of inputs to the unit under test and verifies that the output is as expected. Unit tests should be fast, independent, and repeatable. For example, in a Java application, a unit test for a method that calculates the sum of two numbers would provide different pairs of numbers as inputs and check if the calculated sum is correct.
  • Mock: In unit testing, a mock is an object that mimics the behavior of a real object, such as a database, a web service, or another component. Mocks are used when the real object is difficult to create, expensive to set up, or not available during testing. For instance, if a unit of code depends on a database call, instead of actually connecting to the database, a mock object can be used to return predefined data. This allows the unit test to focus on testing the logic of the unit under test without being affected by the external dependencies.
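
A minimal sketch tying both ideas together, assuming JUnit 5 and Mockito; OrderService and PriceClient are hypothetical types defined inline for the example:

import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.verify;
import static org.mockito.Mockito.when;

import org.junit.jupiter.api.Test;

class OrderServiceTest {

    // Hypothetical collaborator that would normally call a remote service
    interface PriceClient {
        double priceOf(String sku);
    }

    // Hypothetical unit under test
    static class OrderService {
        private final PriceClient prices;
        OrderService(PriceClient prices) { this.prices = prices; }
        double total(String sku, int qty) { return prices.priceOf(sku) * qty; }
    }

    @Test
    void totalMultipliesPriceByQuantity() {
        // The mock replaces the real dependency with predefined behavior
        PriceClient prices = mock(PriceClient.class);
        when(prices.priceOf("ABC")).thenReturn(2.5);

        OrderService service = new OrderService(prices);

        assertEquals(7.5, service.total("ABC", 3), 1e-9);
        verify(prices).priceOf("ABC"); // the dependency was actually consulted
    }
}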

3. Testing Rest Api with Rest Assured

Rest Assured is a Java library used for testing RESTful APIs. It simplifies the process of sending HTTP requests to an API and validating the responses.

  • Sending Requests: With Rest Assured, you can easily send different types of HTTP requests like GET, POST, PUT, DELETE, etc. For example, to send a GET request to an API endpoint, you can use code like given().when().get("https://example.com/api/endpoint").then();
  • Validating Responses: You can validate various aspects of the response, such as the status code (e.g., then().statusCode(200); to check if the response has a 200 status code), the headers, and the body. You can use methods to extract data from the response body and perform assertions on it. For instance, if the API returns JSON data, you can use JsonPath expressions in Rest Assured to extract and validate specific fields in the JSON.
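
Putting those two pieces together, a hedged end-to-end sketch with Rest Assured, Hamcrest, and JUnit 5 (the URL and the JSON field name are placeholders):

import static io.restassured.RestAssured.given;
import static org.hamcrest.Matchers.equalTo;

import org.junit.jupiter.api.Test;

class EndpointTest {

    @Test
    void getEndpointReturnsExpectedPayload() {
        given()
            .header("Accept", "application/json")
        .when()
            .get("https://example.com/api/endpoint")
        .then()
            .statusCode(200)                 // validate the status code
            .body("status", equalTo("ok"));  // JsonPath assertion on the response body
    }
}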

4. AUTOMATION TEST

  • BDD - Cucumber - annotations: Behavior-Driven Development (BDD) is an approach that focuses on defining the behavior of the system from the perspective of the stakeholders. Cucumber is a popular tool for implementing BDD in Java (and other languages). Annotations in Cucumber are used to mark different parts of the feature files and step definitions. For example, @Given, @When, @Then are commonly used annotations in step definitions. @Given is used to set up the preconditions, @When describes the action being performed, and @Then is used to define the expected outcome. Feature files written in Gherkin language (a simple syntax used by Cucumber) use these annotations to describe the behavior of the system in a human-readable format.
  • Load Test with JMeter: Apache JMeter is a tool used for load testing web applications, web services, and other types of applications. It can simulate a large number of concurrent users sending requests to the application to measure its performance under load. You can configure JMeter to define the number of threads (simulating users), the ramp-up period (how quickly the users are added), and the duration of the test. It can generate detailed reports on metrics such as response times, throughput, and error rates, helping you identify bottlenecks in the application.
  • Performance tool JProfiler: JProfiler is a powerful Java profiling tool used for performance analysis. It can help you identify performance issues in your Java applications by analyzing memory usage, CPU utilization, and thread behavior. It allows you to take snapshots of the application’s state at different times, trace method calls, and find memory leaks. You can use JProfiler to optimize your code by identifying methods that consume a lot of resources and improving their performance.
  • AB Test: AB testing is a method of comparing two versions (A and B) of a web page, application feature, or marketing campaign to determine which one performs better. In AB testing, a random subset of users is shown version A, and another random subset is shown version B. Metrics such as click-through rates, conversion rates, or user engagement are then measured for each version. Based on the results, you can decide which version to implement permanently. AB testing is often used in web development and digital marketing to make data-driven decisions about changes to the product or service.

Database

  • What is data modeling? Why do we need it? When would you need it?
  • What is primary key? How is it different from unique key?
  • What is normalization? Why do you need to normalize?
  • What does data redundancy mean? Can you give an example of each?
  • What is database integrity? Why do you need it?
  • What are joins and explain different types of joins in detail.
  • Explain indexes and why are they needed?
  • If we have 1B rows in our relational database and we do not want to fetch them all at once, what are the ways we can partition the data rows?

Explain clustered and non-clustered index and their differences.

1. Clustered Index

Definition

A clustered index determines the physical order of data storage in a table. In other words, the rows of the table are physically arranged on disk in the order of the clustered index key. A table can have only one clustered index because there can be only one physical ordering of the data rows.

How it Works

  • Index Structure: The clustered index is often implemented as a B-tree data structure. The leaf nodes of the B-tree contain the actual data rows of the table, sorted according to the index key.
  • Data Retrieval: When you query data using the columns in the clustered index, the database can quickly locate the relevant rows because they are physically stored in the order of the index. For example, if you have a Customers table with a clustered index on the CustomerID column, and you query for a specific CustomerID, the database can efficiently navigate through the B-tree to find the corresponding row.

Example

-- Create a table with a clustered index on the ProductID column
CREATE TABLE Products (
    ProductID INT PRIMARY KEY CLUSTERED,
    ProductName VARCHAR(100),
    Price DECIMAL(10, 2)
);

In this example, the ProductID column is the clustered index. The rows in the Products table will be physically sorted by the ProductID value.

2. Non-Clustered Index

Definition

A non-clustered index is a separate structure from the actual data rows. It contains a copy of the indexed columns and a pointer to the location of the corresponding data row in the table. A table can have multiple non-clustered indexes.

How it Works

  • Index Structure: Similar to a clustered index, a non-clustered index is also typically implemented as a B-tree. However, the leaf nodes of the non-clustered index do not contain the actual data rows but rather pointers to the data rows in the table.
  • Data Retrieval: When you query data using the columns in a non-clustered index, the database first searches the non-clustered index to find the pointers to the relevant data rows. Then it uses these pointers to access the actual data rows in the table. This additional step of accessing the data rows can make non-clustered index lookups slightly slower than clustered index lookups for large datasets.

Example

-- Create a table
CREATE TABLE Orders (
    OrderID INT PRIMARY KEY,
    CustomerID INT,
    OrderDate DATE
);

-- Create a non-clustered index on the CustomerID column
CREATE NONCLUSTERED INDEX idx_CustomerID ON Orders (CustomerID);

In this example, idx_CustomerID is a non-clustered index on the CustomerID column. The index stores the CustomerID values and pointers to the corresponding rows in the Orders table.

3. Differences between Clustered and Non-Clustered Indexes

Physical Order of Data

  • Clustered Index: Determines the physical order of data storage in the table. The data rows are physically sorted according to the clustered index key.
  • Non-Clustered Index: Does not affect the physical order of data in the table. It is a separate structure that points to the data rows.

Number of Indexes per Table

  • Clustered Index: A table can have only one clustered index because there can be only one physical ordering of the data.
  • Non-Clustered Index: A table can have multiple non-clustered indexes. You can create non-clustered indexes on different columns or combinations of columns to improve query performance for various types of queries.

Storage Space

  • Clustered Index: Since it stores the actual data rows, it generally requires more storage space compared to a non-clustered index.
  • Non-Clustered Index: Stores only the indexed columns and pointers to the data rows, so it usually requires less storage space.

Query Performance

  • Clustered Index: Very efficient for range queries (e.g., retrieving all rows where the index value falls within a certain range) because the data is physically sorted. It also has an advantage for queries that return a large number of rows.
  • Non-Clustered Index: Useful for queries that filter on a small subset of data using the indexed columns. However, for queries that need to access a large number of rows, the additional step of following the pointers to the data rows can make it slower than using a clustered index.

Insert, Update, and Delete Operations

  • Clustered Index: Inserting, updating, or deleting rows can be more expensive because it may require rearranging the physical order of the data on disk.
  • Non-Clustered Index: These operations are generally less expensive because they only involve updating the non-clustered index structure and the pointers, without affecting the physical order of the data.

What are normal forms

In the context of databases, “NF” usually stands for “Normal Form”. Normal forms are used in database design to organize data in a way that reduces data redundancy, improves data integrity, and makes the database more efficient and easier to manage. Some of the commonly known normal forms are:

  • First Normal Form (1NF): A relation is in 1NF if it has atomic values, meaning that each cell in the table contains only a single value and not a set of values. For example, a table where a column stores multiple phone numbers separated by commas would not be in 1NF.
  • Second Normal Form (2NF): A relation is in 2NF if it is in 1NF and all non-key attributes are fully functionally dependent on the primary key. This means that no non-key attribute should depend only on a part of the primary key in case of a composite primary key.
  • Third Normal Form (3NF): A relation is in 3NF if it is in 2NF and there is no transitive dependency of non-key attributes on the primary key. That is, a non-key attribute should not depend on another non-key attribute.
  • Boyce-Codd Normal Form (BCNF): BCNF is a stronger version of 3NF. A relation is in BCNF if for every functional dependency X → Y, X is a superkey. In other words, every determinant must be a candidate key.
  • Fourth Normal Form (4NF): A relation is in 4NF if it is in BCNF and there are no non-trivial multivalued dependencies.

1. Examples of Normalization

First Normal Form (1NF)

Original Table (Not in 1NF):
Suppose we have a Students table that stores information about students and their hobbies.

| Student ID | Student Name | Hobbies |
| --- | --- | --- |
| 1 | John | Reading, Painting |
| 2 | Jane | Singing, Dancing |

The Hobbies column contains multiple values separated by commas, which violates 1NF.

Converted to 1NF:
We create a new table structure.
Students Table:

| Student ID | Student Name |
| --- | --- |
| 1 | John |
| 2 | Jane |

StudentHobbies Table:

| Student ID | Hobby |
| --- | --- |
| 1 | Reading |
| 1 | Painting |
| 2 | Singing |
| 2 | Dancing |

Second Normal Form (2NF)

Original Table (Violating 2NF):
Consider an Orders table with a composite primary key (Order ID, Product ID).

| Order ID | Product ID | Product Name | Order Quantity |
| --- | --- | --- | --- |
| 1 | 101 | Laptop | 2 |
| 1 | 102 | Mouse | 3 |
| 2 | 101 | Laptop | 1 |

The Product Name depends only on the Product ID (part of the composite primary key), violating 2NF.

Converted to 2NF:
Products Table:

| Product ID | Product Name |
| --- | --- |
| 101 | Laptop |
| 102 | Mouse |

OrderDetails Table:

| Order ID | Product ID | Order Quantity |
| --- | --- | --- |
| 1 | 101 | 2 |
| 1 | 102 | 3 |
| 2 | 101 | 1 |

Third Normal Form (3NF)

Original Table (Violating 3NF):
Let’s have an Employees table.

| Employee ID | Department ID | Department Name | Employee Salary |
| --- | --- | --- | --- |
| 1 | 1 | IT | 5000 |
| 2 | 1 | IT | 6000 |
| 3 | 2 | HR | 4500 |

The Department Name is transitively dependent on the Employee ID through the Department ID, violating 3NF.

Converted to 3NF:
Departments Table:

| Department ID | Department Name |
| --- | --- |
| 1 | IT |
| 2 | HR |

Employees Table:

| Employee ID | Department ID | Employee Salary |
| --- | --- | --- |
| 1 | 1 | 5000 |
| 2 | 1 | 6000 |
| 3 | 2 | 4500 |

2. Examples of Database Integrity

Entity Integrity

  • Explanation: Ensures that each row in a table is uniquely identifiable, usually through a primary key.
  • Example: In a Customers table, CustomerID is set as the primary key.
    CREATE TABLE Customers (
        CustomerID INT PRIMARY KEY,
        CustomerName VARCHAR(100),
        Email VARCHAR(100)
    );

If you try to insert a new row with an existing CustomerID, the database will reject the insert operation because it violates entity integrity.

Referential Integrity

  • Explanation: Maintains the consistency between related tables. A foreign key in one table must match a primary key value in another table.
  • Example: Consider an Orders table and a Customers table. The Orders table has a foreign key CustomerID that references CustomerID in the Customers table.
    CREATE TABLE Customers (
        CustomerID INT PRIMARY KEY,
        CustomerName VARCHAR(100)
    );

    CREATE TABLE Orders (
        OrderID INT PRIMARY KEY,
        CustomerID INT,
        OrderDate DATE,
        FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID)
    );

If you try to insert an order with a CustomerID that does not exist in the Customers table, the database will not allow it due to referential integrity.

Domain Integrity

  • Explanation: Ensures that the data entered into a column falls within an acceptable range of values.
  • Example: In a Products table, the Price column should only accept positive values.
    CREATE TABLE Products (
        ProductID INT PRIMARY KEY,
        ProductName VARCHAR(100),
        Price DECIMAL(10, 2) CHECK (Price > 0)
    );

If you try to insert a product with a negative price, the database will reject the insert because it violates domain integrity.

How do you represent a multi-valued attribute in a database?

A multi-valued attribute is an attribute that can have multiple values for a single entity. Here are the common ways to represent multi-valued attributes in different types of databases:

Relational Databases

1. Using a Separate Table (Normalization Approach)

This is the most common and recommended method in relational databases as it adheres to the principles of database normalization.

Steps:

  • Identify the Entities and Attributes: Suppose you have an Employees entity with a multi-valued attribute Skills. An employee can have multiple skills, so the Skills attribute is multi-valued.
  • Create a New Table: Create a new table to store the multi-valued data. This table will have a foreign key that references the primary key of the main entity table.
  • Define the Schema:

    -- Create the Employees table
    CREATE TABLE Employees (
    employee_id INT PRIMARY KEY AUTO_INCREMENT,
    employee_name VARCHAR(100)
    );

    -- Create the Skills table
    CREATE TABLE Skills (
    skill_id INT PRIMARY KEY AUTO_INCREMENT,
    employee_id INT,
    skill_name VARCHAR(50),
    FOREIGN KEY (employee_id) REFERENCES Employees(employee_id)
    );
  • Insert and Query Data:

    -- Insert an employee
    INSERT INTO Employees (employee_name) VALUES ('John Doe');

    -- Insert skills for the employee
    INSERT INTO Skills (employee_id, skill_name) VALUES (1, 'Java');
    INSERT INTO Skills (employee_id, skill_name) VALUES (1, 'Python');

    -- Query all skills of an employee
    SELECT skill_name
    FROM Skills
    WHERE employee_id = 1;

2. Using Delimited Lists (Denormalization Approach)

In some cases, for simplicity or performance reasons, you may choose to use delimited lists to represent multi-valued attributes.

Steps:

  • Modify the Main Table: Instead of creating a separate table, you add a single column to the main table and store multiple values separated by a delimiter (e.g., comma).

    -- Create the Employees table with a multi-valued attribute as a delimited list
    CREATE TABLE Employees (
    employee_id INT PRIMARY KEY AUTO_INCREMENT,
    employee_name VARCHAR(100),
    skills VARCHAR(200)
    );
  • Insert and Query Data:

    -- Insert an employee with skills
    INSERT INTO Employees (employee_name, skills) VALUES ('John Doe', 'Java,Python');

    -- Query employees with a specific skill
    SELECT *
    FROM Employees
    WHERE skills LIKE '%Java%';

However, this approach has several drawbacks. It violates the first normal form of database normalization, making it difficult to perform data manipulation and queries, and it can lead to data integrity issues.

Non-Relational Databases

1. Document Databases (e.g., MongoDB)

In document databases, multi-valued attributes can be easily represented as arrays within a document.

Steps:

  • Define the Document Structure: Create a collection and define the document structure to include an array for the multi-valued attribute.

    // Insert a document in the Employees collection
    db.employees.insertOne({
    employee_name: 'John Doe',
    skills: ['Java', 'Python']
    });
  • Query Data:

    // Query employees with a specific skill
    db.employees.find({ skills: 'Java' });

2. Graph Databases (e.g., Neo4j)

In graph databases, multi-valued attributes can be represented as relationships between nodes.

Steps:

  • Create Nodes and Relationships: Create nodes for the main entity and the values of the multi-valued attribute, and then create relationships between them.

    // Create an employee node
    CREATE (:Employee {name: 'John Doe'})
    // Create skill nodes
    CREATE (:Skill {name: 'Java'})
    CREATE (:Skill {name: 'Python'})
    // Create relationships between the employee and skills
    MATCH (e:Employee {name: 'John Doe'}), (s1:Skill {name: 'Java'}), (s2:Skill {name: 'Python'})
    CREATE (e)-[:HAS_SKILL]->(s1)
    CREATE (e)-[:HAS_SKILL]->(s2);
  • Query Data:

    // Query all skills of an employee
    MATCH (e:Employee {name: 'John Doe'})-[:HAS_SKILL]->(s:Skill)
    RETURN s.name;

How do you represent a many-to-many relationship in database?

Here are the common ways to represent a many-to-many relationship in a database:

1. Using a Junction Table (Associative Table)

This is the most prevalent method in relational databases.

Step 1: Identify the related entities

Suppose you have two entities with a many-to-many relationship. For example, in a school database, "Students" and "Courses": a student can enroll in multiple courses, and a course can have multiple students.

Step 2: Create the junction table

The junction table contains at least two foreign keys, each referencing the primary key of one of the related tables.

  • Table creation in SQL (for MySQL):
    -- Create the Students table
    CREATE TABLE Students (
        student_id INT PRIMARY KEY AUTO_INCREMENT,
        student_name VARCHAR(100)
    );

    -- Create the Courses table
    CREATE TABLE Courses (
        course_id INT PRIMARY KEY AUTO_INCREMENT,
        course_name VARCHAR(100)
    );

    -- Create the junction table (Enrollments)
    CREATE TABLE Enrollments (
        student_id INT,
        course_id INT,
        PRIMARY KEY (student_id, course_id),
        FOREIGN KEY (student_id) REFERENCES Students(student_id),
        FOREIGN KEY (course_id) REFERENCES Courses(course_id)
    );

In this example, the Enrollments table is the junction table. The combination of student_id and course_id forms a composite primary key, which ensures that each enrollment (a relationship between a student and a course) is unique.

Step 3: Insert and query data

  • Inserting data:

    -- Insert a student
    INSERT INTO Students (student_name) VALUES ('John Doe');
    -- Insert a course
    INSERT INTO Courses (course_name) VALUES ('Mathematics');
    -- Record the enrollment
    INSERT INTO Enrollments (student_id, course_id) VALUES (1, 1);
  • Querying data: To find all courses a student is enrolled in, or all students enrolled in a course, you can use JOIN operations.

    -- Find all courses John Doe is enrolled in
    SELECT Courses.course_name
    FROM Students
    JOIN Enrollments ON Students.student_id = Enrollments.student_id
    JOIN Courses ON Enrollments.course_id = Courses.course_id
    WHERE Students.student_name = 'John Doe';

2. In Non-Relational Databases

Graph Databases

  • In graph databases like Neo4j, a many-to-many relationship is represented by nodes and relationships. Each entity is a node, and the relationship between them is an edge.
  • For example, you can create Student nodes and Course nodes. Then, you can create an ENROLLED_IN relationship between the Student and Course nodes.
    // Create a student node
    CREATE (:Student {name: 'John Doe'})
    // Create a course node
    CREATE (:Course {name: 'Mathematics'})
    // Create the enrollment relationship
    MATCH (s:Student {name: 'John Doe'}), (c:Course {name: 'Mathematics'})
    CREATE (s)-[:ENROLLED_IN]->(c);

Document Databases

  • In document databases such as MongoDB, you can use arrays to represent many-to-many relationships in a denormalized way. For example, in the students collection, each student document can have an array of course IDs, and in the courses collection, each course document can have an array of student IDs. However, this approach can lead to data duplication and potential consistency issues.
    // Insert a student document (ObjectIds must be 24 hex characters)
    db.students.insertOne({
        name: 'John Doe',
        courses: [ObjectId("1234567890abcdef12345678"), ObjectId("234567890abcdef123456789")]
    });
    // Insert a course document
    db.courses.insertOne({
        name: 'Mathematics',
        students: [ObjectId("abcdef1234567890abcdef12"), ObjectId("bcdef1234567890abcdef123")]
    });

TRANSACTION JPA

  1. What is “Offline Transaction”?
  2. How do we usually perform Transaction Management in JDBC?
  3. What is Database Transaction?
  4. What are entity states defined in Hibernate / JPA?
  5. How can we transfer the entity between different states?
  6. What are the differences between save and persist?
  7. What are the differences between update, merge and saveOrUpdate?
  8. How do you use Elasticsearch in your Java application?
  • @Transactional - atomic operation
    The @Transactional annotation in Spring JPA is used to mark a method or a class as a transactional operation. It ensures that the operations within the method are executed atomically. That is, either all the operations succeed and are committed to the database, or if an error occurs, all the operations are rolled back, maintaining data consistency.
  • Propagation, Isolation
    Transaction propagation defines how a transaction should behave when a transactional method calls another transactional method. There are several propagation types like REQUIRED, REQUIRES_NEW, SUPPORTS, etc. Isolation levels define the degree to which one transaction is isolated from other transactions. Common isolation levels are READ_UNCOMMITTED, READ_COMMITTED, REPEATABLE_READ, and SERIALIZABLE. Each level has different trade-offs in terms of data consistency and concurrency (see the combined sketch after this list).
  • JPA naming convention
    JPA has certain naming conventions for mapping entity classes to database tables and columns. By default, it uses a naming strategy where the entity class name is mapped to the table name, and the property names are mapped to column names. However, you can also customize the naming using annotations like @Table and @Column to specify different names if needed.
  • Paging and Sorting Using JPA
    JPA provides support for paging and sorting data. You can use the Pageable interface and related classes to specify the page number, page size, and sorting criteria. For example, you can use methods like findAll(Pageable pageable) in a JPA repository to retrieve a paginated and sorted list of entities (see the combined sketch after this list).
  • Hibernate Persistence Context
    The Hibernate persistence context is a set of managed entities that are associated with a particular session. It tracks the state of the entities and is responsible for synchronizing the changes between the entities and the database. It manages the lifecycle of the entities, including loading, saving, and deleting them.
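
As a combined illustration of the propagation/isolation and the paging points above, here is a minimal sketch assuming a Spring Data JPA setup; the Employee entity, EmployeeRepository, and the chosen attribute values are illustrative assumptions, not prescribed by the syllabus:

import org.springframework.data.domain.Page;
import org.springframework.data.domain.PageRequest;
import org.springframework.data.domain.Pageable;
import org.springframework.data.domain.Sort;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Isolation;
import org.springframework.transaction.annotation.Propagation;
import org.springframework.transaction.annotation.Transactional;

import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;

// Minimal entity for the sketch
@Entity
class Employee {
    @Id
    @GeneratedValue
    Long id;
    String name;
}

// JpaRepository already exposes findAll(Pageable) for paging and sorting
interface EmployeeRepository extends JpaRepository<Employee, Long> {
}

@Service
public class EmployeeQueryService {

    private final EmployeeRepository employeeRepository;

    public EmployeeQueryService(EmployeeRepository employeeRepository) {
        this.employeeRepository = employeeRepository;
    }

    // REQUIRES_NEW suspends any caller's transaction and opens a fresh one;
    // READ_COMMITTED prevents dirty reads but allows non-repeatable reads
    @Transactional(propagation = Propagation.REQUIRES_NEW,
                   isolation = Isolation.READ_COMMITTED)
    public Page<Employee> firstPageByName() {
        // Page 0, 20 rows per page, sorted by name ascending
        Pageable pageable = PageRequest.of(0, 20, Sort.by("name").ascending());
        return employeeRepository.findAll(pageable);
    }
}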

How does JDBC handle database connections?

JDBC (Java Database Connectivity) is a Java API that lets Java programs connect to and interact with databases. It provides a standard way to send SQL queries, retrieve data, update records, and manage database connections.

JDBC hides the details of how different databases work, so your Java code doesn’t need to change much if you switch databases. Under the hood, JDBC uses drivers (small libraries) provided by database vendors to handle the communication.

Typical steps include loading the driver, opening a connection, running SQL commands, handling results, and closing the connection. In real-world apps, JDBC is the foundation for higher-level tools like Hibernate, MyBatis, and Spring Data.

  • JDBC, statement vs PreparedStatement, Datasource
    • JDBC (Java Database Connectivity) is an API for interacting with databases in Java.
    • Statement is used to execute SQL statements directly, but it is vulnerable to SQL injection attacks.
    • PreparedStatement is a more secure and efficient alternative. It allows you to precompile SQL statements and set parameters, preventing SQL injection.
    • A DataSource is a factory for connections to a database. It manages the connection pool and provides connections to the application (see the sketch after this list).
  • Hibernate ORM, Session, Cache
    Hibernate ORM is an Object Relational Mapping framework that allows you to map Java objects to database tables. A Session in Hibernate is a lightweight, short-lived object that provides an interface to interact with the database. It is used to perform operations like saving, loading, and deleting objects. Hibernate also has a caching mechanism to improve performance. It can cache objects in memory to reduce database access. There are different levels of caches, such as the first-level cache (session-level cache) and the second-level cache (shared cache across sessions).
  • Optimistic Locking - add version column
    Optimistic locking is a concurrency control mechanism used in databases. In the context of Hibernate, it can be implemented by adding a version column to the database table. When an object is loaded, the version number is also loaded. When the object is updated, Hibernate checks if the version number has changed. If it has, it means the object has been modified by another transaction, and the update will fail, preventing data conflicts.
  • Association: many-to-many
    In object-relational mapping, a many-to-many association is used when multiple objects of one entity can be related to multiple objects of another entity. For example, in a system with users and roles, a user can have multiple roles, and a role can be assigned to multiple users. In Hibernate, this is usually mapped using a join table and appropriate annotations like @ManyToMany and @JoinTable.
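
Below is a minimal sketch combining PreparedStatement and a DataSource, using HikariCP as the pool implementation; the connection settings and the employees table are illustrative assumptions:

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class PreparedStatementExample {
    public static void main(String[] args) throws SQLException {
        // A DataSource hands out pooled connections instead of opening one per request
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl("jdbc:mysql://localhost:3306/mydb");
        config.setUsername("user");
        config.setPassword("password");

        try (HikariDataSource dataSource = new HikariDataSource(config);
             Connection connection = dataSource.getConnection();
             // The ? placeholder is bound as data, so user input cannot inject SQL
             PreparedStatement ps = connection.prepareStatement(
                     "SELECT name, salary FROM employees WHERE name = ?")) {
            ps.setString(1, "John");
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getString("name") + ": " + rs.getInt("salary"));
                }
            }
        }
    }
}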

Frequently asked: write a @Transactional service

@Service
public class HelloService {

    @Autowired
    private UserRepository userRepo;

    @Transactional(rollbackFor = Exception.class)
    public void processOrder(User user) throws Exception {
        userRepo.save(user);

        if (!user.isValidUser()) {
            throw new Exception("Invalid user");
        }
    }
}

1. What is “Offline Transaction”?

An offline transaction in the context of databases is a set of operations on data that occur without an immediate, real-time connection to the database server. The operations are carried out on a local copy of the data, and the changes are later synchronized with the main database.

Example:

  • Mobile Banking App: A user opens a mobile banking app on their smartphone while on an airplane (no internet connection). They can view their account balance, transaction history which is stored locally. They can also initiate a new fund transfer. The app records this transfer request in a local database on the phone. Once the plane lands and the phone connects to the internet, the app synchronizes with the bank’s central database, uploading the new transfer request and downloading any new account updates.
  • Field Salesperson: A salesperson visits clients in an area with poor network coverage. Using a tablet, they access a local copy of the customer database. They add new customer details and record sales orders. Later, when they get back to an area with a network, the tablet syncs the new data with the company’s central database.

2. How do we usually perform Transaction Management in JDBC?

In JDBC (Java Database Connectivity), transaction management involves the following steps:

Step 1: Disable Auto-Commit Mode
By default, JDBC operates in auto-commit mode where each SQL statement is treated as a separate transaction. To group multiple statements into a single transaction, we need to disable auto-commit.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

public class JDBCTransactionExample {
    public static void main(String[] args) {
        Connection connection = null;
        try {
            // Establish a connection
            connection = DriverManager.getConnection("jdbc:mysql://localhost:3306/mydb", "user", "password");
            // Disable auto-commit
            connection.setAutoCommit(false);

            Statement statement = connection.createStatement();
            // Execute SQL statements
            statement.executeUpdate("INSERT INTO employees (name, salary) VALUES ('John', 5000)");
            statement.executeUpdate("UPDATE departments SET budget = budget - 5000 WHERE dept_name = 'IT'");

            // Commit the transaction
            connection.commit();
        } catch (SQLException e) {
            try {
                if (connection != null) {
                    // Roll back the transaction in case of an error
                    connection.rollback();
                }
            } catch (SQLException ex) {
                ex.printStackTrace();
            }
            e.printStackTrace();
        } finally {
            try {
                if (connection != null) {
                    connection.close();
                }
            } catch (SQLException e) {
                e.printStackTrace();
            }
        }
    }
}

Explanation:

  • connection.setAutoCommit(false): Disables auto-commit mode so that statements are grouped into a single transaction.
  • connection.commit(): Commits all the statements in the transaction if everything goes well.
  • connection.rollback(): Rolls back all the statements in the transaction if an error occurs.

3. What is Database Transaction?

A database transaction is a sequence of one or more SQL statements that are treated as a single unit of work. It must satisfy the ACID properties:

  • Atomicity: Either all the statements in the transaction are executed successfully, or none of them are. For example, in a bank transfer, if you transfer money from one account to another, either both the debit from the source account and the credit to the destination account happen, or neither does.
  • Consistency: The transaction takes the database from one consistent state to another. For instance, if a rule in the database states that the total balance of all accounts should always be the same, a transaction should maintain this consistency.
  • Isolation: Transactions are isolated from each other. One transaction should not be affected by the intermediate states of other concurrent transactions. For example, if two users are trying to transfer money at the same time, their transactions should not interfere with each other.
  • Durability: Once a transaction is committed, its changes are permanent and will survive any subsequent system failures.

4. What are entity states defined in Hibernate / JPA?

In Hibernate and JPA (Java Persistence API), entities can be in one of the following states:

  • Transient: An entity is transient when it is created using the new keyword and has not been associated with a persistence context. It has no corresponding row in the database.

    // Transient entity
    Employee employee = new Employee();
    employee.setName("Jane");
  • Persistent: A persistent entity is associated with a persistence context and has a corresponding row in the database. Any changes made to a persistent entity will be automatically synchronized with the database when the transaction is committed.

    EntityManager entityManager = entityManagerFactory.createEntityManager();
    entityManager.getTransaction().begin();
    Employee employee = entityManager.find(Employee.class, 1L);
    // Now the employee is in persistent state
  • Detached: A detached entity was once persistent but is no longer associated with a persistence context. It still has a corresponding row in the database, but changes made to it will not be automatically synchronized.

    entityManager.getTransaction().commit();
    entityManager.close();
    // Now the employee is in detached state
  • Removed: An entity is in the removed state when it has been marked for deletion from the database. Once the transaction is committed, the corresponding row in the database will be deleted.

    entityManager.getTransaction().begin();
    Employee employee = entityManager.find(Employee.class, 1L);
    entityManager.remove(employee);
    // Now the employee is in removed state

5. How can we transfer the entity between different states?

  • Transient to Persistent: Use methods like persist() or save() in Hibernate. In JPA, you can use EntityManager.persist().

    EntityManager entityManager = entityManagerFactory.createEntityManager();
    entityManager.getTransaction().begin();
    Employee employee = new Employee();
    employee.setName("Tom");
    entityManager.persist(employee);
    // Now the employee is in persistent state
  • Persistent to Detached: Closing the EntityManager or clearing the persistence context will make a persistent entity detached.

    entityManager.getTransaction().commit();
    entityManager.close();
    // The previously persistent entity is now detached
  • Detached to Persistent: Use the merge() method in JPA.

    EntityManager newEntityManager = entityManagerFactory.createEntityManager();
    newEntityManager.getTransaction().begin();
    Employee detachedEmployee = getDetachedEmployee();
    Employee persistentEmployee = newEntityManager.merge(detachedEmployee);
    // Now the entity is back in persistent state
  • Persistent/Detached to Removed: Use the remove() method in JPA.

    entityManager.getTransaction().begin();
    Employee employee = entityManager.find(Employee.class, 1L);
    entityManager.remove(employee);
    // Now the employee is in removed state

6. What are the differences between save and persist?

  • save() (Hibernate-specific):

    • Returns the generated identifier immediately. It can be used to insert a new entity into the database. If the entity is already persistent, it may throw an exception.
      Session session = sessionFactory.openSession();
      Transaction transaction = session.beginTransaction();
      Employee employee = new Employee();
      employee.setName("Alice");
      Serializable id = session.save(employee);
      transaction.commit();
      session.close();
  • persist() (JPA-standard):

    • Does not guarantee that the identifier will be assigned immediately. It is used to make a transient entity persistent. If the entity is already persistent, it will have no effect.
      EntityManager entityManager = entityManagerFactory.createEntityManager();
      entityManager.getTransaction().begin();
      Employee employee = new Employee();
      employee.setName("Bob");
      entityManager.persist(employee);
      entityManager.getTransaction().commit();
      entityManager.close();

7. What are the differences between update, merge and saveOrUpdate?

  • update() (Hibernate-specific):

    • Used to make a detached entity persistent. If the entity is already persistent, it may throw an exception. It directly updates the database row corresponding to the entity.
      Session session = sessionFactory.openSession();
      Transaction transaction = session.beginTransaction();
      Employee detachedEmployee = getDetachedEmployee();
      session.update(detachedEmployee);
      transaction.commit();
      session.close();
  • merge() (JPA-standard):

    • Creates a copy of the detached entity, makes the copy persistent, and returns the persistent copy. The original detached entity remains detached. It can handle both transient and detached entities.
      EntityManager entityManager = entityManagerFactory.createEntityManager();
      entityManager.getTransaction().begin();
      Employee detachedEmployee = getDetachedEmployee();
      Employee mergedEmployee = entityManager.merge(detachedEmployee);
      entityManager.getTransaction().commit();
      entityManager.close();
  • saveOrUpdate() (Hibernate-specific):

    • Checks if the entity has an identifier. If it does not have an identifier, it acts like save(). If it has an identifier, it acts like update().
      Session session = sessionFactory.openSession();
      Transaction transaction = session.beginTransaction();
      Employee employee = new Employee();
      session.saveOrUpdate(employee);
      transaction.commit();
      session.close();

8. How do you use Elasticsearch in your Java application?

To use Elasticsearch in a Java application, you can follow these steps:

Step 1: Add Dependencies
If you are using Maven, add the following dependencies to your pom.xml:

<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>elasticsearch-rest-high-level-client</artifactId>
    <version>7.17.3</version>
</dependency>

Step 2: Create a Client

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;

public class ElasticsearchClientExample {
    public static void main(String[] args) {
        RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(new HttpHost("localhost", 9200, "http")));
        // Use the client for operations
        try {
            // Perform operations like indexing, searching, etc.
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try {
                client.close();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
}

Step 3: Index a Document

import org.apache.http.HttpHost;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.xcontent.XContentType;

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class ElasticsearchIndexExample {
    public static void main(String[] args) {
        RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(new HttpHost("localhost", 9200, "http")));

        Map<String, Object> jsonMap = new HashMap<>();
        jsonMap.put("title", "Elasticsearch Tutorial");
        jsonMap.put("content", "Learn how to use Elasticsearch in Java");

        IndexRequest request = new IndexRequest("my_index")
                .id("1")
                .source(jsonMap, XContentType.JSON);

        try {
            IndexResponse indexResponse = client.index(request, RequestOptions.DEFAULT);
            System.out.println(indexResponse);
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            try {
                client.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}

Step 4: Search for Documents

import org.apache.http.HttpHost;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.builder.SearchSourceBuilder;

import java.io.IOException;

public class ElasticsearchSearchExample {
    public static void main(String[] args) {
        RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(new HttpHost("localhost", 9200, "http")));

        SearchRequest searchRequest = new SearchRequest("my_index");
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
        searchSourceBuilder.query(QueryBuilders.matchQuery("title", "Elasticsearch"));
        searchRequest.source(searchSourceBuilder);

        try {
            SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
            System.out.println(searchResponse);
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            try {
                client.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}

UNIT TEST

  1. Explain and name some methods that you used in JUnit.
  2. Explain and name some annotations that you used in JUnit.
  3. What is Mockito and the usage of it?

1. Commonly Used Methods in JUnit

assertEquals()

  • Explanation: This method is used to verify if two values are equal. It is very useful when you want to check if the result of a method call in your code under test matches the expected result.
  • Example:
    import org.junit.jupiter.api.Test;
    import static org.junit.jupiter.api.Assertions.assertEquals;

    public class CalculatorTest {
        @Test
        public void testAddition() {
            Calculator calculator = new Calculator();
            int result = calculator.add(2, 3);
            assertEquals(5, result);
        }
    }

    class Calculator {
        public int add(int a, int b) {
            return a + b;
        }
    }

assertTrue() and assertFalse()

  • Explanation: assertTrue() is used to verify if a given condition is true, and assertFalse() is used to verify if a condition is false. These are handy when you want to check the truth value of a boolean expression returned by a method.
  • Example:
    import org.junit.jupiter.api.Test;
    import static org.junit.jupiter.api.Assertions.assertTrue;
    import static org.junit.jupiter.api.Assertions.assertFalse;

    public class StringUtilTest {
        @Test
        public void testIsEmpty() {
            StringUtil stringUtil = new StringUtil();
            assertTrue(stringUtil.isEmpty(""));
            assertFalse(stringUtil.isEmpty("Hello"));
        }
    }

    class StringUtil {
        public boolean isEmpty(String str) {
            return str == null || str.length() == 0;
        }
    }

assertNull() and assertNotNull()

  • Explanation: assertNull() checks if an object reference is null, while assertNotNull() checks if an object reference is not null. They are useful when you need to ensure that a method returns or does not return a null value.
  • Example:
    import org.junit.jupiter.api.Test;
    import static org.junit.jupiter.api.Assertions.assertNull;
    import static org.junit.jupiter.api.Assertions.assertNotNull;

    public class ObjectFactoryTest {
        @Test
        public void testCreateObject() {
            ObjectFactory objectFactory = new ObjectFactory();
            Object obj = objectFactory.createObject();
            assertNotNull(obj);
            Object nullObj = objectFactory.createNullObject();
            assertNull(nullObj);
        }
    }

    class ObjectFactory {
        public Object createObject() {
            return new Object();
        }

        public Object createNullObject() {
            return null;
        }
    }

2. Commonly Used Annotations in JUnit

@Test

  • Explanation: This annotation is used to mark a method as a test method. JUnit will execute all methods annotated with @Test when running the test class.
  • Example:
    import org.junit.jupiter.api.Test;

    public class SimpleTest {
        @Test
        public void testSomething() {
            // Test logic here
        }
    }

@BeforeEach

  • Explanation: Methods annotated with @BeforeEach are executed before each test method. This is useful for setting up the test environment, such as initializing objects or variables that are needed for each test.
  • Example:
    import org.junit.jupiter.api.BeforeEach;
    import org.junit.jupiter.api.Test;

    public class UserServiceTest {
        private UserService userService;

        @BeforeEach
        public void setUp() {
            userService = new UserService();
        }

        @Test
        public void testCreateUser() {
            // Use userService for testing
        }
    }

    class UserService {
        // Class implementation
    }

@AfterEach

  • Explanation: Methods annotated with @AfterEach are executed after each test method. This is used for cleaning up resources, such as closing database connections or releasing memory.
  • Example:
    import org.junit.jupiter.api.AfterEach;
    import org.junit.jupiter.api.Test;
    import java.io.File;
    import java.io.FileWriter;
    import java.io.IOException;

    public class FileServiceTest {
        private File tempFile;

        @Test
        public void testWriteToFile() throws IOException {
            tempFile = new File("temp.txt");
            FileWriter writer = new FileWriter(tempFile);
            writer.write("Test data");
            writer.close();
        }

        @AfterEach
        public void tearDown() {
            if (tempFile != null && tempFile.exists()) {
                tempFile.delete();
            }
        }
    }

@BeforeAll and @AfterAll

  • Explanation: @BeforeAll is used to annotate a static method that will be executed once before all the test methods in the class. @AfterAll is used to annotate a static method that will be executed once after all the test methods in the class. These are useful for performing expensive setup and cleanup operations, like starting and stopping a database server.
  • Example:
    import org.junit.jupiter.api.BeforeAll;
    import org.junit.jupiter.api.AfterAll;
    import org.junit.jupiter.api.Test;

    public class DatabaseServiceTest {
        private static DatabaseService databaseService;

        @BeforeAll
        public static void setUpAll() {
            databaseService = new DatabaseService();
            databaseService.startDatabase();
        }

        @Test
        public void testQueryDatabase() {
            // Test database query
        }

        @AfterAll
        public static void tearDownAll() {
            databaseService.stopDatabase();
        }
    }

    class DatabaseService {
        public void startDatabase() {
            // Start database logic
        }

        public void stopDatabase() {
            // Stop database logic
        }
    }

3. What is Mockito and its Usage

Definition

Mockito is a popular open-source testing framework for Java that allows you to create mock objects. Mock objects are simulated objects that mimic the behavior of real objects in a controlled way. They are used to isolate the code under test from its dependencies, making unit tests more reliable and faster.

Common Usages

Creating Mock Objects

  • You can use Mockito.mock() to create a mock object of a class or an interface.
    import org.junit.jupiter.api.Test;
    import static org.mockito.Mockito.mock;

    public class MockitoExample {
        @Test
        public void testMockCreation() {
            MyInterface myMock = mock(MyInterface.class);
            // Now myMock is a mock object of MyInterface
        }
    }

    interface MyInterface {
        void doSomething();
    }

Stubbing Methods

  • Stubbing means defining the behavior of a method on a mock object. You can use methods like when() and thenReturn() to stub methods.
    import org.junit.jupiter.api.Test;
    import static org.mockito.Mockito.mock;
    import static org.mockito.Mockito.when;

    public class StubbingExample {
        @Test
        public void testStubbing() {
            MyService myService = mock(MyService.class);
            when(myService.getResult()).thenReturn(10);
            int result = myService.getResult();
            // result will be 10
        }
    }

    class MyService {
        public int getResult() {
            return 0;
        }
    }

Verifying Method Calls

  • You can use Mockito.verify() to check if a method on a mock object has been called with specific arguments.
    import org.junit.jupiter.api.Test;
    import static org.mockito.Mockito.mock;
    import static org.mockito.Mockito.verify;

    public class VerificationExample {
        @Test
        public void testVerification() {
            MyInterface myMock = mock(MyInterface.class);
            myMock.doSomething();
            verify(myMock).doSomething();
        }
    }

    interface MyInterface {
        void doSomething();
    }

@InjectMocks vs @Mock

@InjectMocks and @Mock are annotations commonly used when unit testing with the Mockito framework. They differ clearly in purpose and typical usage, as described below:

Purpose
  • @Mock: Creates a mock object. A mock simulates the behavior of a real object: it takes over all method calls on that object, letting you freely define return values, thrown exceptions, and so on. In a unit test, when you want to isolate external dependencies, use @Mock to create mock versions of those dependencies so you can focus on the logic of the object under test without being affected by its dependencies.
  • @InjectMocks: Creates a real object and attempts to inject the mock objects created with @Mock into it. When the object under test depends on other objects, create the target with @InjectMocks and its dependencies with @Mock; Mockito then injects those mocks into the target automatically.
Usage Scenarios
  • @Mock: Suitable when an external dependency needs to be simulated. For example, when the object under test depends on a database access object or a network service and you do not want to actually hit the database or network during tests, use @Mock to simulate the behavior of those dependencies.
  • @InjectMocks: Suitable for testing the overall logic of an object that depends on several collaborators. Creating the target with @InjectMocks and its dependencies with @Mock reproduces the target’s runtime environment, letting you test its logic under different dependency behaviors.
Example Code
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.extension.ExtendWith;
import org.mockito.InjectMocks;
import org.mockito.Mock;
import org.mockito.junit.jupiter.MockitoExtension;

import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.mockito.Mockito.when;

// A dependency class
class Dependency {
    public String getData() {
        return "real data";
    }
}

// The class under test, which depends on Dependency
class MyService {
    private Dependency dependency;

    public MyService(Dependency dependency) {
        this.dependency = dependency;
    }

    public String processData() {
        return "Processed: " + dependency.getData();
    }
}

@ExtendWith(MockitoExtension.class)
public class MockitoAnnotationsExample {
    // Create a mock of the Dependency class
    @Mock
    private Dependency dependency;

    // Create a real MyService instance and inject the mocked Dependency into it
    @InjectMocks
    private MyService myService;

    @Test
    public void testProcessData() {
        // Stub the getData method of the mocked dependency
        when(dependency.getData()).thenReturn("mocked data");

        // Call MyService's processData method
        String result = myService.processData();

        // Verify the result
        assertEquals("Processed: mocked data", result);
    }
}
Summary
  • @Mock is mainly used to create mock objects that simulate the behavior of external dependencies.
  • @InjectMocks creates the real object and injects the mocks into it, so that its overall logic can be tested.

MICROSERVICE

  1. In your own words, please describe some of the advantages and disadvantages of a Monolithic Application.
  2. In your own words, please describe some of the advantages and disadvantages of a Microservice Application.
  3. What is the purpose of using Netflix Eureka?
  4. How can microservices communicate with each other?
  5. What is the purpose of using Spring API Gateway?
  6. Explain cascading failure in microservices and how to prevent it.
  7. Explain Circuit Breaker and how it works in detail.

The following is an explanation of each question along with relevant examples:

Monolithic Application

  • Advantages
    • Simplicity: It’s a single unit, easy to develop, test and deploy. For example, a small blog website built with a monolithic architecture can be developed quickly as all the components are in one place.
    • Ease of Data Management: All components can access the same database easily, simplifying data consistency. In a monolithic e-commerce app, the product, order and user data can be managed centrally.
    • Good for Small Projects: Ideal for small-scale applications with low complexity and clear requirements. A simple internal management system for a small company may not need the complexity of a distributed architecture.
  • Disadvantages
    • Scalability Issues: As the application grows, it becomes hard to scale. If a monolithic social media app experiences a sudden traffic spike, scaling the entire application is more difficult and expensive than scaling individual components.
    • Slow Deployment: Any change requires redeploying the entire application. If you want to update a single feature in a monolithic banking app, the whole app needs to be deployed, causing potential downtime.
    • Technology Limitations: It’s hard to adopt new technologies or frameworks in a monolithic structure. For example, if you want to use a new data processing framework in a monolithic app that’s already using an old tech stack, it may require a major rewrite.

Microservice Application

  • Advantages
    • High Scalability: Each microservice can be scaled independently. In a large e-commerce platform like Amazon, the order processing, inventory management and user profile services can be scaled based on their specific load.
    • Technology Diversity: Different microservices can use different technologies based on their requirements. For example, the image processing microservice can use a different technology stack than the user authentication microservice.
    • Faster Deployment: Only the updated microservice needs to be deployed. If a new feature is added to the payment microservice of a fintech app, only that microservice is deployed, minimizing downtime.
  • Disadvantages
    • Complexity in Management: Managing multiple microservices, their communication and dependencies is complex. For example, coordinating data updates across multiple microservices in a healthcare application can be challenging.
    • Data Consistency: Ensuring data consistency across multiple microservices is difficult. In a microservices-based ride-hailing app, maintaining the consistency of driver and rider data across different services can be a problem.
    • Testing Complexity: Testing the entire system becomes more complex as it involves testing multiple microservices and their interactions. Testing a microservices-based logistics app requires testing each service and how they work together.

Netflix Eureka

  • Purpose: It’s a service discovery tool. It allows microservices in a distributed system to register and discover each other. For example, in a microservices architecture where there are multiple user service instances and order service instances, Eureka helps the order service find the available user service instances to communicate with.

Microservices Communication

  • Methods:
    • RESTful API: Microservices can communicate via HTTP requests using RESTful APIs. For example, a product service can expose a REST API that a shopping cart service can call to get product details (see the sketch after this list).
    • Message Queues: They can use message queues like RabbitMQ or Kafka. For instance, in an e-commerce system, when an order is placed, the order service can send a message to a message queue, which the inventory service listens to and updates the inventory accordingly.
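
Below is a minimal sketch of the RESTful option, where a hypothetical cart service fetches product details over HTTP using Spring's RestTemplate; the URL, port, and Product shape are illustrative assumptions (in a real deployment the host would come from service discovery, e.g. Eureka):

import org.springframework.web.client.RestTemplate;

public class ProductClientExample {

    // Plain DTO matching the product service's assumed JSON response
    public static class Product {
        public long id;
        public String name;
        public double price;
    }

    public static void main(String[] args) {
        RestTemplate restTemplate = new RestTemplate();
        // {id} is expanded from the last argument; Jackson maps the JSON body to Product
        Product product = restTemplate.getForObject(
                "http://localhost:8081/products/{id}", Product.class, 42L);
        System.out.println(product != null ? product.name : "not found");
    }
}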

Spring API Gateway

  • Purpose: It acts as a single entry point for all microservices. It provides features like request routing, authentication, rate limiting, etc. For example, in a microservices-based application, all external requests first come to the API gateway, which then routes the requests to the appropriate microservices. It can also apply authentication and authorization rules before allowing the request to reach the microservices.

Cascading Failure in Microservice and Prevention

  • Explanation: In a microservices environment, if one microservice fails, it can cause other dependent microservices to fail, leading to a cascading effect. For example, if the user service in a social media app fails, the services that depend on it like the post service (which needs to get user information) and the comment service may also fail.
  • Prevention:
    • Circuit Breaker: Implementing circuit breakers can prevent cascading failures. If a microservice fails to respond after a certain number of attempts, the circuit breaker trips and stops sending requests to that service, preventing other services from waiting indefinitely and potentially failing.
    • Isolation: Using techniques like thread pools and resource isolation to ensure that the failure of one microservice doesn’t exhaust the resources of other services.

Circuit Breaker

  • Explanation: A circuit breaker is a design pattern used to prevent cascading failures in a microservices architecture. It monitors the health of a service and if the service fails to respond or returns errors frequently, the circuit breaker trips and stops sending requests to that service for a certain period.
  • How it Works:
    • Closed State: Initially, the circuit breaker is in the closed state and all requests are sent to the service as normal.
    • Open State: If the service fails a certain number of times within a given time period, the circuit breaker opens. In this state, all requests to the service are immediately failed without being sent to the actual service.
    • Half-Open State: After a certain period in the open state, the circuit breaker enters the half-open state. It allows a small number of requests to be sent to the service to check if it has recovered. If the requests succeed, the circuit breaker closes and normal operation resumes. If the requests fail, the circuit breaker returns to the open state.
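
The three states above can be sketched in code with the Resilience4j library; the service name, thresholds, and fallback below are illustrative assumptions:

import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig;
import io.github.resilience4j.circuitbreaker.CircuitBreakerRegistry;

import java.time.Duration;

public class CircuitBreakerExample {
    public static void main(String[] args) {
        // Open the breaker when 50% of the last 10 calls fail; stay open for 30 seconds,
        // then move to half-open and allow trial calls
        CircuitBreakerConfig config = CircuitBreakerConfig.custom()
                .failureRateThreshold(50)
                .slidingWindowSize(10)
                .waitDurationInOpenState(Duration.ofSeconds(30))
                .build();
        CircuitBreaker breaker = CircuitBreakerRegistry.of(config)
                .circuitBreaker("userService");

        try {
            // Calls are routed through the breaker; while it is open,
            // executeSupplier fails fast instead of calling the service
            String result = breaker.executeSupplier(CircuitBreakerExample::callUserService);
            System.out.println(result);
        } catch (Exception e) {
            // Fallback logic, e.g. return cached data
            System.out.println("Fallback: user service unavailable");
        }
    }

    private static String callUserService() {
        // Placeholder for a real remote call
        return "user data";
    }
}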

DEVOPS

  1. Use your own words to explain Jenkins.
  2. Can you talk about CI/CD?
  3. Git commands you used in the project
  4. How do you release from the Git repository?
  5. How do you combine several commits together?
  6. What is git cherry-pick?
  7. Difference between Git and SVN
  8. Difference between git merge and rebase

1. Jenkins

Jenkins is an open-source automation server widely used in software development. Its main purpose is to automate various stages of the software development lifecycle, such as building, testing, and deploying applications.

How it works:
Jenkins has a web-based interface where you can create and configure jobs. A job in Jenkins can represent a specific task, like building a Java project or running a set of unit tests. You can define the steps of the job, including the commands to execute, the source code repositories to pull from, and the environment variables to use.

Example:
Suppose you are developing a Python web application. You can set up a Jenkins job to automatically pull the latest code from a Git repository, install the necessary Python dependencies, run unit tests, and then deploy the application to a staging server if the tests pass.

2. CI/CD

  • CI (Continuous Integration):
    • CI is a development practice where developers frequently integrate their code changes into a shared repository. Every time code is pushed to the repository, an automated build and test process is triggered. This helps to catch integration issues early in the development cycle.
    • Example: In a team of developers working on a mobile app, each developer may push their code changes to the main Git repository several times a day. A CI server (like Jenkins) then automatically builds the app from the latest code and runs unit and integration tests. If any tests fail, the developers are notified immediately.
  • CD (Continuous Delivery/Deployment):
    • Continuous Delivery is an extension of CI. It ensures that the software can be reliably released to production at any time. After the code passes the CI tests, it is automatically prepared for deployment, but the actual deployment to production may be a manual step.
    • Continuous Deployment takes it a step further and automatically deploys the software to production if it passes all the tests.
    • Example: For a web-based e-commerce application, with continuous delivery, once the code passes the CI tests, it is packaged and stored in a deployment artifact repository. A release manager can then decide when to deploy it to the production servers. In continuous deployment, the application is automatically deployed to production as soon as the tests pass.

3. Git commands used in a project

  • git clone: Used to create a local copy of a remote Git repository.
    • Example: git clone https://github.com/user/repo.git creates a local copy of the repo repository hosted on GitHub.
  • git add: Adds changes in the working directory to the staging area.
    • Example: git add src/main.py adds the changes made to the main.py file to the staging area.
  • git commit: Commits the changes from the staging area to the local repository with a descriptive message.
    • Example: git commit -m "Fixed a bug in the login function"
  • git push: Pushes the committed changes from the local repository to a remote repository.
    • Example: git push origin main pushes the changes from the local main branch to the main branch of the remote repository named origin.
  • git pull: Fetches and merges changes from a remote repository into the local repository.
    • Example: git pull origin main fetches the latest changes from the main branch of the origin remote repository and merges them into the local main branch.

4. Releasing from the Git repository

  • Create a Release Branch (Optional):
    • You can create a dedicated release branch from the main development branch (e.g., main or master). For example, git checkout -b release/v1.0 main creates a new release branch named release/v1.0 from the main branch.
  • Tag the Release:
    • Use the git tag command to mark a specific commit as a release. For example, git tag v1.0 tags the current commit as version v1.0. You can then push the tags to the remote repository using git push origin --tags.
  • Build and Deploy:
    • Use the tagged commit to build the application and deploy it to the appropriate environments (staging, production, etc.).

5. Combining several commits together

You can use the git rebase -i (interactive rebase) command to combine multiple commits.

  • Example: Suppose you have made 3 consecutive commits and want to combine them into one.
    • First, find the commit hash of the commit before the first commit you want to combine. Let’s say the commit hash is abc123.
    • Then run git rebase -i abc123. This will open an editor where you can see a list of commits.
    • Change the pick keyword to squash (or s) for the commits you want to combine with the previous one.
    • Save and close the editor. Git will then combine the commits, and you can provide a new commit message for the combined commit.

6. Git cherry-pick

git cherry-pick is used to apply a specific commit from one branch to another.

  • Example: Suppose you have a feature branch with a commit that you want to apply to the main branch.
    • First, switch to the main branch: git checkout main.
    • Then use git cherry-pick <commit-hash> where <commit-hash> is the hash of the commit on the feature branch that you want to apply. Git will then try to apply that commit to the main branch.

7. Difference between Git and SVN

  • Architecture:
    • Git: It is a distributed version control system. Every developer has a complete copy of the repository, including the entire commit history. This allows developers to work offline and perform most operations locally.
    • SVN: It is a centralized version control system. There is a single central repository, and developers need to connect to it to perform operations like committing changes or getting the latest code.
  • Branching and Merging:
    • Git: Branching and merging are very fast and easy. Creating a new branch is just a matter of creating a new pointer to a commit. Merging between branches is also efficient.
    • SVN: Branching and merging can be more complex and slower. It involves copying the entire directory structure in the repository to create a branch.
  • Data Integrity:
    • Git: It uses a hash-based system to ensure data integrity. Every commit, file, and directory has a unique hash, and any change to the data will result in a different hash.
    • SVN: While it also has some mechanisms for data integrity, it is not as robust as Git’s hash-based system.

8. Difference between Git merge and rebase

  • Merge:
    • A git merge combines the changes from two or more branches into one. It creates a new “merge commit” that has two parents, one from each branch being merged.
    • Example: If you have a feature branch and a main branch, and you want to integrate the changes from the feature branch into the main branch, you can run git checkout main followed by git merge feature. This will create a merge commit on the main branch.
    • The commit history after a merge shows a more complex, branching structure.
  • Rebase:
    • A git rebase moves or combines a sequence of commits to a new base commit. It takes the commits from one branch and replays them on top of another branch.
    • Example: If you have a feature branch and a main branch, and you want to update the feature branch with the latest changes from the main branch, you can run git checkout feature followed by git rebase main. This will take the commits from the feature branch and replay them on top of the latest commit on the main branch.
    • The commit history after a rebase is linear, which can make it easier to understand and follow. However, it can also be more complex to resolve conflicts during a rebase compared to a merge.

Splunk

  • Overview: Splunk is a powerful data analytics platform that is widely used for monitoring and analyzing machine data. It can ingest, index, and correlate data from various sources such as logs, metrics, and events.
  • Features:
    • Data Collection: It can collect data from a large number of sources including servers, applications, network devices, etc.
    • Search and Analytics: Provides a powerful search language that allows users to quickly query and analyze data to find patterns, troubleshoot issues, and gain insights.
    • Visualization: Enables users to create various visualizations like dashboards, charts, and graphs to present data in an intuitive way.
    • Alerting: Can set up alerts based on specific conditions or thresholds, notifying users when important events occur.
  • Use Cases: Commonly used in IT operations for monitoring infrastructure health, in security for detecting threats and analyzing security incidents, and in business for analyzing customer behavior and operational data.

Grafana

  • Overview: Grafana is an open-source data visualization and monitoring tool. It focuses mainly on presenting data in a visually appealing and understandable way, making it easy for users to monitor and analyze metrics.
  • Features:
    • Data Sources: Supports a wide range of data sources such as Prometheus, InfluxDB, MySQL, etc.
    • Visualization Options: Offers a variety of visualization types including line charts, bar charts, pie charts, heatmaps, and more. Users can customize dashboards to display the data they need.
    • Alerting System: Allows setting up alerts based on metric values and conditions. It can send notifications through various channels like email, Slack, etc.
    • Plugin System: Has a rich ecosystem of plugins that can extend its functionality, enabling integration with other tools and adding new features.
  • Use Cases: It is popular in DevOps and IT teams for monitoring application performance, infrastructure metrics, and for visualizing time-series data. It helps in quickly identifying trends and anomalies in the data.

Kibana

  • Overview: Kibana is an open-source data visualization and exploration tool that is closely integrated with Elasticsearch. It is used to visualize and analyze data stored in Elasticsearch.
  • Features:
    • Data Visualization: Allows users to create a variety of visualizations such as bar charts, line charts, maps, and histograms. It provides an intuitive interface for exploring and filtering data.
    • Dashboard Creation: Users can easily create and customize dashboards to display multiple visualizations in one place, providing a comprehensive view of the data.
    • Search and Filtering: Provides a powerful search and filtering functionality to quickly find and analyze specific data subsets.
    • Time-series Analysis: Specializes in analyzing time-series data, which is useful for monitoring and understanding how data changes over time.
  • Use Cases: Commonly used in log analysis, security information and event management (SIEM), and for monitoring the performance of applications and infrastructure. It is widely used in combination with Elasticsearch for large-scale data analysis and monitoring.

CloudWatch

  • Overview: CloudWatch is a monitoring and observability service provided by Amazon Web Services (AWS). It allows users to monitor AWS resources and the applications running on them.
  • Features:
    • Resource Monitoring: Automatically collects metrics from various AWS resources such as EC2 instances, RDS databases, S3 buckets, etc.
    • Custom Metrics: Allows users to define and send their own custom metrics to CloudWatch for monitoring application-specific performance indicators.
    • Alarms: Can set up alarms based on metric thresholds and events. It can trigger actions such as sending notifications, auto-scaling resources, or invoking Lambda functions.
    • Logs Management: Integrates with AWS CloudTrail and other services to collect and store logs. Users can analyze logs to gain insights into the behavior of their applications and resources.
  • Use Cases: In the AWS ecosystem, it is essential for monitoring the health and performance of cloud-based applications and infrastructure. It helps in optimizing resource utilization, detecting and resolving issues quickly, and ensuring the reliability of applications running on AWS.

CLOUD

  1. AWS: difference between Parameter Store and Secrets Manager
  2. AWS: where to store certificate files
  3. Extra (topics we are not sure which session to put in):
  4. Use your own words to explain TDD and why use TDD.
  5. Please do some research on Redis and use your own words to explain what Redis is.
  6. Use your own words to explain what Swagger is.
  7. Please do some research on ELK and use your own words to explain what they are.
  8. Use your own words to explain Jira.
  9. What is RabbitMQ and what can it help us to achieve in a web application? What are the components of RabbitMQ?
  10. What are the different types of Exchange that exist in RabbitMQ?
  11. What is a Scheduler and what can it help us to achieve in a web application?

AWS Modules with examples

AWS (Amazon Web Services) offers a wide range of modules and services to build and manage various types of applications and infrastructure. Here are some of the key AWS modules with examples:

Compute Modules

  • Amazon Elastic Compute Cloud (EC2)
    • Description: A web service that provides resizable compute capacity in the cloud. It allows users to launch virtual servers, known as instances, with various operating systems and configurations.
    • Example: A startup might use EC2 instances to host their web application. They can choose an appropriate instance type based on their CPU, memory, and storage requirements. For instance, they could select a t2.micro instance for a small-scale development environment or an m5.xlarge instance for a more resource-intensive production application.
  • AWS Lambda
    • Description: A serverless compute service that lets you run code without provisioning or managing servers. It automatically scales based on the incoming request volume.
    • Example: A mobile application might use AWS Lambda to process user sign-up events. When a user signs up, the app triggers a Lambda function that validates the input, stores the user data in a database, and sends a welcome email.
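
To make the Lambda example concrete, here is a minimal sketch of a Java handler based on the aws-lambda-java-core library; the event shape and the sign-up actions are illustrative assumptions:

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;

import java.util.Map;

// AWS invokes handleRequest with the parsed JSON payload of the sign-up event
public class SignUpHandler implements RequestHandler<Map<String, Object>, String> {

    @Override
    public String handleRequest(Map<String, Object> event, Context context) {
        String email = (String) event.get("email");
        if (email == null || !email.contains("@")) {
            return "INVALID";
        }
        // A real handler would store the user in a database and send a welcome email here
        context.getLogger().log("Registered " + email);
        return "OK";
    }
}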

Storage Modules

  • Amazon Simple Storage Service (S3)
    • Description: An object storage service that offers high scalability, data durability, and security. It is used to store and retrieve any amount of data from anywhere on the web.
    • Example: A media company could use S3 to store and distribute large video files. They can create an S3 bucket, upload the video files, and then put Amazon CloudFront (AWS’s content delivery network) in front of the bucket to serve the videos to users with low latency.
  • Amazon Elastic Block Store (EBS)
    • Description: A block-level storage service that provides persistent storage volumes for EC2 instances. It offers high-performance storage that can be attached to instances and used like a local hard drive.
    • Example: A database server running on an EC2 instance might use an EBS volume to store its data. The EBS volume can be sized according to the database’s storage needs and can be easily detached and attached to another instance for maintenance or scaling purposes.

Database Modules

  • Amazon Relational Database Service (RDS)
    • Description: A managed relational database service that makes it easy to set up, operate, and scale a relational database in the cloud. It supports popular database engines like MySQL, PostgreSQL, and Oracle.
    • Example: An e-commerce website could use RDS to manage its customer and order data. They can create an RDS instance with the appropriate database engine and configure it with the necessary storage and compute resources. The website’s application can then connect to the RDS instance to perform database operations such as inserting, updating, and querying data.
  • Amazon DynamoDB
    • Description: A fully managed NoSQL database service that offers fast and predictable performance with seamless scalability. It is designed for applications that require low-latency access to data.
    • Example: A mobile gaming company might use DynamoDB to store user game progress, leaderboard data, and in-game purchases. The database can handle the high write and read throughput required by the game, and it can scale automatically as the number of users grows.

Networking Modules

  • Amazon Virtual Private Cloud (VPC)
    • Description: Allows you to provision a logically isolated section of the AWS cloud where you can launch AWS resources in a virtual network that you define.
    • Example: A financial institution could create a VPC to host its critical applications and services. They can define subnets, route tables, and security groups within the VPC to ensure secure and isolated networking. For example, they might have a public subnet for web servers that need to be accessible from the internet and a private subnet for database servers that should only be accessible from within the VPC.
  • Amazon Route 53
    • Description: A highly available and scalable Domain Name System (DNS) web service. It translates domain names into IP addresses and routes internet traffic to the appropriate AWS resources.
    • Example: A company with multiple websites and applications can use Route 53 to manage their domain names and DNS records. They can create DNS records to point their domain names to the corresponding EC2 instances, load balancers, or other AWS services. For instance, they can set up an A record to map a domain name to the IP address of a web server hosted on EC2.

1. AWS: Difference between Parameter Store and Secrets Manager

Parameter Store

  • Explanation: AWS Systems Manager Parameter Store is a service that allows you to store configuration data such as database connection strings, API keys, and other parameters in a hierarchical structure. It’s designed for storing both plain-text and encrypted data. It helps in centralizing configuration management, making it easier to manage and update application settings across multiple environments.
  • Example: Suppose you have a microservices-based application deployed in multiple AWS regions. You can store the database connection strings for each region in the Parameter Store. For instance, a key-value pair like /myapp/production/db-connection-string with the actual connection string as the value. When your application starts, it can retrieve the appropriate connection string from the Parameter Store based on the environment.
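
As an illustration, here is a minimal sketch of reading that parameter with the AWS SDK for Java v2; the parameter name is the example value above, and region/credentials are assumed to come from the environment:

import software.amazon.awssdk.services.ssm.SsmClient;
import software.amazon.awssdk.services.ssm.model.GetParameterRequest;

public class ParameterStoreExample {
    public static void main(String[] args) {
        try (SsmClient ssm = SsmClient.create()) {
            GetParameterRequest request = GetParameterRequest.builder()
                    .name("/myapp/production/db-connection-string")
                    .withDecryption(true)   // decrypts SecureString parameters
                    .build();
            String value = ssm.getParameter(request).parameter().value();
            System.out.println("Connection string: " + value);
        }
    }
}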

Secrets Manager

  • Explanation: AWS Secrets Manager is focused on securely managing secrets such as passwords, access keys, and other sensitive information. It provides features like automatic rotation of secrets, auditing, and fine-grained access control. It’s designed to reduce the risk of exposing sensitive data and simplify the process of keeping secrets up-to-date.
  • Example: Consider an application that uses an Amazon RDS database. You can store the database password in Secrets Manager. The application can then retrieve the password securely when it needs to connect to the database. Additionally, you can set up automatic rotation of the password every 30 days, which helps in maintaining security.
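
Similarly, a minimal sketch of fetching a secret with the AWS SDK for Java v2; the secret name is a hypothetical example:

import software.amazon.awssdk.services.secretsmanager.SecretsManagerClient;
import software.amazon.awssdk.services.secretsmanager.model.GetSecretValueRequest;

public class SecretsManagerExample {
    public static void main(String[] args) {
        try (SecretsManagerClient client = SecretsManagerClient.create()) {
            GetSecretValueRequest request = GetSecretValueRequest.builder()
                    .secretId("prod/myapp/rds-password")   // hypothetical secret name
                    .build();
            String password = client.getSecretValue(request).secretString();
            // Use the password to build the database connection; never log it
        }
    }
}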

2. AWS: Where to store certificate files

  • AWS Certificate Manager (ACM):
    • ACM is the recommended service for managing SSL/TLS certificates in AWS. It provides free SSL/TLS certificates for use with AWS services such as Elastic Load Balancing, Amazon CloudFront, and API Gateway. You can easily request, renew, and manage certificates through the ACM console or API.
    • Example: If you have a web application running behind an Elastic Load Balancer, you can request an SSL/TLS certificate from ACM and associate it with the load balancer. This enables secure communication between clients and your application.
  • AWS S3:
    • You can also store certificate files in an Amazon S3 bucket. However, you need to ensure proper security measures such as encryption and access control. This option is useful if you need to use the certificates with non-AWS services or if you want more control over the storage and management of the certificates.
    • Example: If you have an on-premises server that needs to use an SSL/TLS certificate stored in AWS, you can store the certificate in an S3 bucket and download it to the server when needed.
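A sketch of downloading such a certificate with the AWS SDK for Java v2 (the bucket name, key, and destination path are all hypothetical):

    import java.nio.file.Paths;
    import software.amazon.awssdk.services.s3.S3Client;
    import software.amazon.awssdk.services.s3.model.GetObjectRequest;

    public class CertDownloader {
        public static void main(String[] args) {
            try (S3Client s3 = S3Client.create()) {
                // Hypothetical bucket/key; writes the certificate to a local file
                s3.getObject(GetObjectRequest.builder()
                                .bucket("my-cert-bucket")
                                .key("certs/server.pem")
                                .build(),
                        Paths.get("/etc/ssl/server.pem"));
            }
        }
    }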

3. TDD (Test-Driven Development)

  • Explanation: TDD is a software development process where you write tests before writing the actual production code. The process typically follows a cycle of “Red-Green-Refactor”. First, you write a test that initially fails (Red). Then, you write the minimum amount of code to make the test pass (Green). Finally, you refactor the code to improve its design, readability, and maintainability without changing its behavior.
  • Why use TDD:
    • Early Bug Detection: By writing tests first, you can catch bugs early in the development process, reducing the cost of fixing them later.
    • Improved Design: TDD encourages writing modular and testable code, which leads to better software design.
    • Documentation: Tests serve as living documentation for the code, making it easier for other developers to understand how the code works.
  • Example: Suppose you are developing a simple calculator class with an add method. First, you write a test like this in Java using JUnit:
    import org.junit.jupiter.api.Test;
    import static org.junit.jupiter.api.Assertions.assertEquals;

    public class CalculatorTest {
        @Test
        public void testAdd() {
            Calculator calculator = new Calculator();
            int result = calculator.add(2, 3);
            assertEquals(5, result);
        }
    }

This test will initially fail because the Calculator class and the add method don’t exist yet. Then you write the minimum code to make the test pass:

    public class Calculator {
        public int add(int a, int b) {
            return a + b;
        }
    }

Finally, you can refactor the code if needed, for example, by adding more error handling or improving the code style.

4. Redis

  • Explanation: Redis is an open-source, in-memory data structure store that can be used as a database, cache, and message broker. It supports various data structures such as strings, hashes, lists, sets, and sorted sets. Redis is known for its high performance because it stores data in memory, which allows for very fast read and write operations. It also provides features like data persistence, replication, and clustering.
  • Example: In a web application, Redis can be used as a cache to store frequently accessed data. For example, if you have a news website, you can store the top-viewed articles in Redis. When a user requests the top-viewed articles page, the application first checks Redis. If the data is available in Redis, it can be returned immediately, reducing the load on the database.
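A minimal cache-aside sketch of the news-site example using the Jedis client (the key name, the 5-minute TTL, and the loadTopArticlesFromDb helper are assumptions):

    import redis.clients.jedis.Jedis;

    public class TopArticlesCache {
        public String getTopArticles() {
            try (Jedis jedis = new Jedis("localhost", 6379)) {
                String cached = jedis.get("top-articles");
                if (cached != null) {
                    return cached; // cache hit: skip the database entirely
                }
                String articles = loadTopArticlesFromDb(); // hypothetical DB query
                jedis.setex("top-articles", 300, articles); // cache for 5 minutes
                return articles;
            }
        }

        private String loadTopArticlesFromDb() {
            return "[...]"; // placeholder for a real database call
        }
    }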

5. Swagger

  • Explanation: Swagger is a set of open-source tools and frameworks for designing, building, documenting, and consuming RESTful APIs. It provides a way to describe the structure of an API using a JSON- or YAML-based specification. Swagger tools can then generate interactive documentation, client libraries, and server stubs based on the API specification.
  • Example: Suppose you have a RESTful API for a book management system. You can use Swagger to define the API endpoints, request and response formats, and available operations. Swagger UI can then generate an interactive documentation page where developers can explore the API, test the endpoints, and see the expected input and output formats.
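As a sketch of how this looks in a Spring Boot project using the springdoc-openapi integration (an assumed dependency; the book endpoint is hypothetical), annotating the controller is enough for Swagger UI to render interactive documentation:

    import io.swagger.v3.oas.annotations.Operation;
    import io.swagger.v3.oas.annotations.tags.Tag;
    import org.springframework.web.bind.annotation.GetMapping;
    import org.springframework.web.bind.annotation.PathVariable;
    import org.springframework.web.bind.annotation.RequestMapping;
    import org.springframework.web.bind.annotation.RestController;

    @RestController
    @RequestMapping("/api/books")
    @Tag(name = "Books", description = "Book management operations")
    public class BookController {

        // Swagger UI picks this up and documents the endpoint interactively
        @Operation(summary = "Get a book by its id")
        @GetMapping("/{id}")
        public Book getBook(@PathVariable Long id) {
            return new Book(id, "Effective Java"); // placeholder lookup
        }

        record Book(Long id, String title) {}
    }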

6. ELK (Elasticsearch, Logstash, Kibana)

  • Explanation:
    • Elasticsearch: It is a distributed, open-source search and analytics engine. It stores data in a JSON-like format and allows for fast and flexible searching, filtering, and aggregation of data. It can handle large volumes of data and is often used for log analysis, full-text search, and real-time analytics.
    • Logstash: Logstash is a data processing pipeline that collects, filters, and transforms data from various sources (such as log files, system metrics, and application events) and sends it to a destination (such as Elasticsearch). It can perform tasks like parsing log data, enriching it with additional information, and cleaning it up.
    • Kibana: Kibana is a web-based visualization tool that works with Elasticsearch. It allows users to create visualizations, dashboards, and reports based on the data stored in Elasticsearch. It provides an intuitive interface for exploring and analyzing data.
  • Example: In a large-scale web application, Logstash can collect all the application logs from different servers. It can then parse the logs and extract relevant information such as request URLs, response times, and error messages. The processed data is sent to Elasticsearch for storage. Developers and administrators can then use Kibana to create dashboards showing the application’s performance metrics, error rates, and other important information.

7. Jira

  • Explanation: Jira is a popular project management and issue-tracking tool developed by Atlassian. It allows teams to plan, track, and manage projects, tasks, and bugs. Jira provides features such as customizable workflows, issue tracking, reporting, and integration with other tools. It can be used for software development projects, but also for other types of projects in different industries.
  • Example: In a software development team, Jira can be used to manage the development lifecycle of a project. Developers can create issues for new features, bugs, and tasks. The project manager can assign these issues to team members, set deadlines, and track the progress of each issue. Jira also provides reports on the project’s status, such as the number of open and closed issues, the time taken to resolve issues, and the overall project progress.

8. RabbitMQ

  • Explanation: RabbitMQ is an open-source message broker software that implements the Advanced Message Queuing Protocol (AMQP). It enables applications to communicate with each other by sending and receiving messages. It acts as an intermediary between producers (applications that send messages) and consumers (applications that receive messages).
  • What it can help achieve in a web application:
    • Decoupling: It allows different components of a web application to be decoupled. For example, in an e-commerce application, the order processing component can send messages to the inventory management component through RabbitMQ without having direct knowledge of the inventory system.
    • Asynchronous Processing: It enables asynchronous processing, which can improve the performance and scalability of the application. For instance, when a user submits a form, the application can send a message to RabbitMQ and continue processing other tasks without waiting for the form data to be fully processed.
  • Components of RabbitMQ:
    • Producer: An application that sends messages to a RabbitMQ broker.
    • Consumer: An application that receives messages from a RabbitMQ broker.
    • Queue: A buffer that stores messages until they are consumed.
    • Exchange: Routes messages to one or more queues based on rules.
    • Broker: The RabbitMQ server that manages the queues, exchanges, and message routing.
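To make these components concrete, here is a minimal producer/consumer sketch using the official RabbitMQ Java client (the queue name, host, and payload are assumptions):

    import com.rabbitmq.client.Channel;
    import com.rabbitmq.client.Connection;
    import com.rabbitmq.client.ConnectionFactory;
    import com.rabbitmq.client.DeliverCallback;
    import java.nio.charset.StandardCharsets;

    public class OrderMessagingDemo {
        public static void main(String[] args) throws Exception {
            ConnectionFactory factory = new ConnectionFactory();
            factory.setHost("localhost"); // assumed broker address

            try (Connection connection = factory.newConnection();
                 Channel channel = connection.createChannel()) {
                // Producer side: declare a durable queue and publish to the
                // default exchange, which routes by queue name
                channel.queueDeclare("orders", true, false, false, null);
                channel.basicPublish("", "orders", null,
                        "order-123".getBytes(StandardCharsets.UTF_8));

                // Consumer side: receive messages asynchronously
                DeliverCallback onMessage = (tag, delivery) ->
                        System.out.println("Received: "
                                + new String(delivery.getBody(), StandardCharsets.UTF_8));
                channel.basicConsume("orders", true, onMessage, tag -> {});

                Thread.sleep(1000); // give the consumer a moment before closing
            }
        }
    }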

9. Different types of Exchanges in RabbitMQ

  • Direct Exchange: Routes messages to queues based on the message’s routing key. Each queue is bound to the direct exchange with a specific routing key. When a message is sent to the direct exchange with a certain routing key, it is delivered to the queues that are bound with the same routing key.
    • Example: In a logging application, different types of logs (e.g., error logs, warning logs) can be sent to different queues using a direct exchange.
  • Fanout Exchange: Routes messages to all the queues that are bound to it, regardless of the routing key. It is useful when you want to broadcast messages to multiple consumers.
    • Example: In a news application, when a new news article is published, a message can be sent to a fanout exchange, and all the queues (e.g., queues for different user groups) bound to the exchange will receive the message.
  • Topic Exchange: Routes messages to queues based on a pattern matching of the routing key. Queues are bound to the topic exchange with a binding key that can contain wildcards (* for single-word matching and # for multi-word matching).
    • Example: In a financial application, messages about different stocks can be sent to a topic exchange. A queue can be bound to the exchange with a binding key like stocks.# to receive all messages related to stocks.
  • Headers Exchange: Routes messages based on the message headers rather than the routing key. Queues are bound to the headers exchange with a set of header values. When a message is sent with specific headers, it is delivered to the queues that match the header values.
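A sketch of declaring and binding the first three exchange types with the Java client, mirroring the examples above (all exchange, queue, and binding-key names are hypothetical):

    import com.rabbitmq.client.BuiltinExchangeType;
    import com.rabbitmq.client.Channel;
    import com.rabbitmq.client.Connection;
    import com.rabbitmq.client.ConnectionFactory;

    public class ExchangeSetup {
        public static void main(String[] args) throws Exception {
            ConnectionFactory factory = new ConnectionFactory();
            try (Connection conn = factory.newConnection();
                 Channel channel = conn.createChannel()) {
                // Direct: error logs go only to queues bound with routing key "error"
                channel.queueDeclare("error-logs", true, false, false, null);
                channel.exchangeDeclare("logs.direct", BuiltinExchangeType.DIRECT, true);
                channel.queueBind("error-logs", "logs.direct", "error");

                // Fanout: every bound queue gets a copy; the routing key is ignored
                channel.queueDeclare("group-a-news", true, false, false, null);
                channel.exchangeDeclare("news.fanout", BuiltinExchangeType.FANOUT, true);
                channel.queueBind("group-a-news", "news.fanout", "");

                // Topic: stocks.# matches stocks.us.AAPL, stocks.eu.SAP, etc.
                channel.queueDeclare("all-stocks", true, false, false, null);
                channel.exchangeDeclare("stocks.topic", BuiltinExchangeType.TOPIC, true);
                channel.queueBind("all-stocks", "stocks.topic", "stocks.#");
            }
        }
    }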

10. Scheduler

  • Explanation: A scheduler in a web application is a component that allows you to schedule tasks to run at specific times or intervals. It can be used to perform various tasks such as running batch jobs, sending periodic notifications, and refreshing caches.
  • What it can help achieve in a web application:
    • Automation: It automates repetitive tasks, reducing the need for manual intervention. For example, a scheduler can be used to automatically generate daily reports in a business application.
    • Resource Optimization: It can be used to schedule resource-intensive tasks during off-peak hours to optimize the use of system resources. For instance, a scheduler can be used to perform database backups at night when the application has low traffic.
  • Example: In a content management system, a scheduler can be used to publish new articles at a specific time. The administrator can set a publication time for an article, and the scheduler will ensure that the article is made available to the public at the specified time.
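For the CMS example, a minimal Spring sketch (the cron expression and the publishing logic are assumptions):

    import org.springframework.context.annotation.Configuration;
    import org.springframework.scheduling.annotation.EnableScheduling;
    import org.springframework.scheduling.annotation.Scheduled;
    import org.springframework.stereotype.Component;

    @Configuration
    @EnableScheduling
    class SchedulingConfig {} // turns on Spring's scheduled-task support

    @Component
    class ArticlePublisher {

        // Spring cron: second minute hour day month weekday.
        // Runs at the start of every minute and publishes any articles
        // whose scheduled publication time has passed (hypothetical logic).
        @Scheduled(cron = "0 * * * * *")
        public void publishDueArticles() {
            System.out.println("Publishing articles that are due...");
        }
    }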

System Design

First, review some basic conceptual topics, such as handling high concurrency, API idempotency, logging and monitoring, load balancing, and database optimization. For generalist roles at large companies, these questions come up very frequently in interview write-ups; smaller companies or team-specific hiring are more likely to pose practical system design questions based on their existing projects. Below are some questions I have personally encountered:

  • 1. URL shortening system
  • 2. Design YouTube watch history
  • 3. Design a Message Queue System
  • 4. Design an Auto-Complete System
  • 5. Design a Trie Tree FileSystem
  • 6. Design a Review and Reward system
  • 7. Design Twitter
  • 8. Design a traffic router
  • 9. Design a report system
  • 10. Design a scanner system
  • 11. Design a Rate Limiter
  • 12. Design a Real-Time Chat Application
  • 13. Design a News Feed System
  • 14. Design a File Storage Service
  • 15. Design a Distributed Cache System
  • In a system design interview, consider following these steps:
    • Clarify requirements (ask questions!)
    • Estimate scale (data volume, QPS, storage size)
    • Break the system into components (major modules)
    • Draw the system flow (API flow + data flow)
    • Deep-dive into key modules (DB design, caching strategy, queue mechanics, etc.)
    • Propose scalability measures (scaling, replication)
    • Propose high-availability and failure-recovery measures
    • Discuss optimizations (e.g., cache optimization, DB query optimization)
    • Summarize (trade-offs)

Computer Network

Misc Questions

18. What is the Same-Origin Policy?

  • The Same-Origin Policy is a browser security measure that, by default, restricts web pages from making requests to an origin different from the one that served the page.
  • This prevents malicious sites from accessing sensitive data on other sites.

20. What is the preflight request in Google Chrome for CORS?

  • Preflight requests are sent by the browser before the actual request to check if the server accepts the cross-origin request.
  • The browser sends an OPTIONS request to the server to confirm it supports the necessary HTTP methods.

21. When will a non-simple request be sent?

  • Non-simple requests are sent when the request method is anything other than GET, POST, or HEAD, when custom headers are used, or when the Content-Type is something other than the simple form types (application/x-www-form-urlencoded, multipart/form-data, text/plain).
  • For example, using a custom Authorization header will trigger a preflight request.

23. Can you explain CORS in detail?

  • CORS (Cross-Origin Resource Sharing) allows servers to specify who can access their resources.
  • It involves adding specific headers like Access-Control-Allow-Origin to responses.
  • For example, a server might allow requests from https://example.com.

30. What is CDN, and how does it optimize content delivery?

  • CDN stands for Content Delivery Network. It is a distributed network of servers that delivers content like images, videos, and files quickly to users.
  • It reduces latency and improves load times by serving content from the nearest server.

35. What is the difference between HTTP and HTTPS?

  • HTTP is an unsecured protocol, while HTTPS adds SSL/TLS encryption for secure communication.
  • HTTPS ensures data confidentiality, integrity, and authenticity, which prevents man-in-the-middle attacks.

43. Can you explain the three-way handshake in TCP? What happens if it’s changed to a two-way handshake?

  • The three-way handshake in TCP involves SYN, SYN-ACK, and ACK messages to establish a connection.
  • With only two messages, the server could never confirm that the client received its SYN-ACK, so a delayed or duplicate old SYN could open a stale half-open connection and waste server resources; the third ACK is what makes the connection setup reliable.

44. What is the process of TCP’s four-way handshake for connection termination?

  • The four-way handshake involves FIN, ACK, FIN, and ACK messages: each side closes its sending direction independently (half-close), which is why four segments are needed.
  • It ensures that both sides of the connection are properly closed.

45. How is an HTTPS connection established, and what interactions occur during this process?

  • HTTPS uses SSL/TLS for secure communication.
  • The process involves key exchange, server authentication, and encryption to ensure data confidentiality and integrity.

55. What are some common values of the cache-control header?

  • Common values include no-cache, no-store, max-age, and public.
  • These control how browsers and caches handle the content, like how long to store it or whether to revalidate it.
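On the Java side, a sketch of emitting such a header from a Spring MVC controller (the endpoint and the one-day max-age are assumptions):

    import java.util.concurrent.TimeUnit;
    import org.springframework.http.CacheControl;
    import org.springframework.http.ResponseEntity;
    import org.springframework.web.bind.annotation.GetMapping;
    import org.springframework.web.bind.annotation.RestController;

    @RestController
    public class LogoController {

        // Sends "Cache-Control: max-age=86400, public" with the response
        @GetMapping("/logo")
        public ResponseEntity<byte[]> logo() {
            return ResponseEntity.ok()
                    .cacheControl(CacheControl.maxAge(1, TimeUnit.DAYS).cachePublic())
                    .body(new byte[0]); // placeholder for real image bytes
        }
    }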

56. What is the difference between Last-Modified/ETag and Cache-Control/Expires?

  • Last-Modified and ETag are used to check if a resource has changed since the last request.
  • Cache-Control and Expires define caching rules, such as how long content can be stored before it needs to be revalidated.

TCP Head-of-Line Blocking

TCP head-of-line blocking (HOL blocking) means: because TCP guarantees in-order delivery, once a packet is lost, all subsequent data must wait for the lost packet to be retransmitted and received before processing can continue.

An example:

Suppose you are browsing a web page and TCP sends the data as 5 packets:

packet 1 -> packet 2 -> packet 3 -> packet 4 -> packet 5

If packet 2 is lost, then even though packets 3, 4, and 5 have already arrived, the browser must wait for packet 2 to be retransmitted before the whole data stream can be handed to the upper layer.

Why is this bad?

  • You request several images or CSS files
  • They share one TCP connection (as in HTTP/2)
  • If a single packet in the middle is lost, every request stalls

This is especially severe on high-packet-loss networks (such as mobile networks).

How does HTTP/3 solve it?

HTTP/3 is built on QUIC (over UDP), where each request has its own independent logical stream; even if a packet in one stream is lost, the other streams are unaffected. This eliminates TCP’s head-of-line blocking at the root.

In one sentence:

TCP’s HOL blocking exists because TCP must guarantee data ordering; QUIC (HTTP/3) solves it completely with independent streams.

HTTP/1.1 vs HTTP/2 vs HTTP/3

Protocol evolution overview

Feature | HTTP/1.1 | HTTP/2 | HTTP/3
Release year | 1999 | 2015 | 2022
Transport protocol | TCP | TCP | QUIC (over UDP)
Connection multiplexing | ❌ Not supported | ✅ Multiplexing supported | ✅ True multiplexing, no HOL blocking
HOL blocking | Present | Gone at the application layer, still present at the TCP layer | Fully eliminated
Header compression | ❌ None | ✅ HPACK | ✅ QPACK
Server push | ❌ Not supported | ✅ Supported (since deprecated) | ⚠️ Experimental/weak support
Security | Plaintext or HTTPS | TLS recommended | TLS 1.3 mandatory
Congestion control | TCP layer | TCP layer | Built into QUIC (faster)
Connection migration | ❌ Not supported | ❌ Not supported | ✅ Supported (mobile-network friendly)
Adoption | Legacy, still retained | Mainstream workhorse | Gradually spreading

Performance comparison

Scenario | HTTP/1.1 | HTTP/2 | HTTP/3
Many concurrent requests | ❌ Severe blocking | ⚠️ TCP-layer HOL blocking remains | ✅ Best concurrency
First-connection latency | Slow (handshake + TLS) | Equally slow | ✅ Fast (0-RTT support)
Mobile network switching | ❌ Reconnect required | ❌ Reconnect required | ✅ No reconnect needed
Browser compatibility | ✅ Widely supported | ✅ Widely supported | ✅ Supported by modern browsers

Recommended usage

Scenario | Recommended protocol
Ordinary web pages, APIs | HTTP/2 (mainstream and stable)
High-concurrency front-end services (e.g., CDN) | HTTP/3 (better user experience)
Internal enterprise services / microservices | HTTP/1.1 / HTTP/2
Sites with many mobile users | HTTP/3 (better connection migration)

Summary

Why hasn’t HTTP/3 been universally adopted yet?

  • QUIC runs over UDP, which some older firewalls/proxies do not handle
  • Servers need extra deployment work (e.g., nginx needs quiche; Cloudflare supported it early)
  • Some middleware/gateways do not yet fully support it (e.g., certain enterprise service gateways)

So the conclusion is:

  • At the user-facing access layer, HTTP/3 is gradually becoming the default (especially on CDNs such as Cloudflare/Akamai).
  • At the backend and general-service layer, HTTP/2 remains the workhorse, and HTTP/1.1 is still retained.

HTTP/1.1 is the old foundation, HTTP/2 is the mainstream present, HTTP/3 is the future direction.

Frequently asked in interviews

36. What is the head-of-line blocking in HTTP/1.0, and how was it improved in HTTP/1.1?

  • In HTTP/1.0, head-of-line blocking happens because requests are processed sequentially.
  • HTTP/1.1 improved this by allowing pipelining, so multiple requests could be sent without waiting for the previous one to complete; in practice, though, responses still had to arrive in order, so pipelining itself suffered from head-of-line blocking and was rarely enabled.

37. How did HTTP/1.1 improve performance over HTTP/1.0?

  • HTTP/1.1 introduced persistent connections, allowing multiple requests to be sent over the same TCP connection, reducing the overhead of opening new connections.

38. What are the shortcomings of HTTP/1.1 and its performance?

  • HTTP/1.1 still suffers from head-of-line blocking and lacks multiplexing, meaning one slow request can delay others.
    This reduces performance, especially for modern websites with many resources.

39. Can you describe the concept of HTTP/2 and how it works?

  • HTTP/2 improves performance by allowing multiple requests and responses to be multiplexed over a single connection.
    It uses binary framing, which makes communication more efficient compared to HTTP/1.x.

40. What optimizations were made in HTTP/2?

  • HTTP/2 introduced multiplexing, header compression, and prioritization.
    These improvements reduce latency and increase the speed of data transmission.

41. What are the advantages of HTTP/2 over HTTP/1.1?

  • HTTP/2 improves performance by allowing multiple streams of data over a single connection, eliminating head-of-line blocking at the application layer.
    It also uses header compression and prioritizes requests to deliver content faster.

42. What are the drawbacks of HTTP/2, and how does HTTP/3 improve upon it?

  • HTTP/2 suffers from head-of-line blocking at the TCP layer.
    HTTP/3 uses QUIC, a new protocol based on UDP, to eliminate this issue and improve connection setup time.

HTTP Status Codes

1xx (Informational)

  • 100 Continue: The client can continue with the request.
  • 101 Switching Protocols: The server is switching protocols as requested.

2xx (Success)

  • 200 OK: The request was successful.
  • 201 Created: The resource was successfully created.
  • 204 No Content: The request was successful, but there’s no content to return.

3xx (Redirection)

  • 301 Moved Permanently: The resource has been permanently moved to a new URL.
  • 302 Found: The resource is temporarily at a different URL.
  • 303 See Other: Tells the client to fetch the resource with a GET request, often after a POST.
  • 304 Not Modified: The resource has not been modified since the last request; the browser should use its cached version.

4xx (Client Errors)

  • 400 Bad Request: The request is malformed or invalid.
  • 401 Unauthorized: Authentication is required.
  • 403 Forbidden: You’re authenticated but not authorized.
  • 404 Not Found: The resource doesn’t exist.
  • 405 Method Not Allowed: The HTTP method is not allowed for this resource.

5xx (Server Errors)

  • 500 Internal Server Error: A generic server error.
  • 502 Bad Gateway: Invalid response from an upstream server.
  • 503 Service Unavailable: The server is temporarily unavailable.
  • 504 Gateway Timeout: The upstream server didn’t respond in time.

CORS

CORS (Cross-Origin Resource Sharing) is a browser security mechanism that controls whether a web page from a different origin (domain, protocol, or port) may request resources from your server.

Why CORS is needed

For security reasons, browsers by default forbid web pages from sending requests to a different origin (the Same-Origin Policy).

In real-world development, however, we often need to fetch data across domains, for example a front end served from one domain calling an API hosted on another.

How CORS allows cross-origin requests

  • When the front end sends a cross-origin request, the server must include CORS fields in the response headers, for example Access-Control-Allow-Origin: *, which allows any origin to access the resource.
  • If the server only allows a specific origin, such as http://example.com, it sets Access-Control-Allow-Origin: http://example.com

Main CORS HTTP headers

Header | Purpose
Access-Control-Request-Method | The HTTP method the actual request will use (sent by the browser in the preflight request)
Access-Control-Allow-Origin | Allowed origins (* means all)
Access-Control-Allow-Methods | Allowed HTTP methods (e.g., GET, POST, PUT)
Access-Control-Allow-Headers | Allowed request headers
Access-Control-Allow-Credentials | Whether cookies may be sent with the request

Preflight Request

For certain cross-origin requests (such as PUT, DELETE, or a POST with custom headers), the browser first sends an OPTIONS request; only if the server returns the correct CORS headers will the browser send the actual request.

Example: suppose my browser is shopping on jd.com and I want to access resources on taobao.com via JavaScript; that is a cross-origin request:

  • Restriction: the browser blocks JavaScript on a jd.com page from requesting sensitive data (such as APIs) from taobao.com.
  • Unless taobao.com explicitly sets Access-Control-Allow-Origin: jd.com in its response, the request is blocked by CORS.

Only after the following exchange can the client access taobao.com resources via JavaScript:

Client request:

OPTIONS /api/data HTTP/1.1
Origin: http://jd.com
Access-Control-Request-Method: POST

Server response:

HTTP/1.1 204 No Content
Access-Control-Allow-Origin: http://jd.com
Access-Control-Allow-Methods: POST, GET, OPTIONS

How to fix CORS problems

If you run into a CORS error, you can:

  • Modify the backend to add CORS headers (recommended; a minimal Spring sketch follows)
  • Use a reverse proxy (Nginx, Webpack devServer proxy)
  • Use a browser plugin during local development (debugging only)
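For the recommended backend fix, a minimal sketch of a global Spring MVC CORS configuration (the origin, path pattern, and methods are assumptions):

    import org.springframework.context.annotation.Configuration;
    import org.springframework.web.servlet.config.annotation.CorsRegistry;
    import org.springframework.web.servlet.config.annotation.WebMvcConfigurer;

    @Configuration
    public class CorsConfig implements WebMvcConfigurer {

        // Adds the Access-Control-Allow-* headers (and answers preflight
        // OPTIONS requests) for /api/** endpoints from the assumed origin
        @Override
        public void addCorsMappings(CorsRegistry registry) {
            registry.addMapping("/api/**")
                    .allowedOrigins("http://example.com")
                    .allowedMethods("GET", "POST", "PUT", "DELETE")
                    .allowCredentials(true);
        }
    }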