# Markr - Marking as a Service
**Szymon Szukalski's Submission to the Stile Coding Challenge - July 2024**
## Assumptions
**Deployment**
- Encryption is not required
- Authentication is not required
- This is a standalone service used by a single institution
  - To support multiple institutions, the test results would need to be tied to an institution identifier
- Imported data should persist between restarts of the service
**Import endpoint**
- All fields in the supplied XML document are mandatory and cannot be blank
- The `student-id` is unique
- Marks obtained will never exceed marks available
- The `<answer/>` elements can be ignored during import, as all the necessary information is contained in the `<summary-marks/>` elements
- The endpoint may receive other XML documents that don't conform to the sample XML format; these should be rejected
- Duplicate entries should be handled gracefully
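As a sketch of the checks these assumptions imply, the snippet below validates a payload using the JDK's built-in DOM parser. The element and attribute names (`mcq-test-result`, `student-id`, `summary-marks` with `available`/`obtained`) are assumptions based on the format described above, and the hand-rolled DOM code is purely illustrative; the actual service maps XML onto Java classes as described in the Approach section.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

public class ImportValidationSketch {
    /** Returns true only if the document looks like a conforming results payload. */
    static boolean isValidPayload(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList results = doc.getElementsByTagName("mcq-test-result");
            if (results.getLength() == 0) return false; // not the expected format
            for (int i = 0; i < results.getLength(); i++) {
                Element result = (Element) results.item(i);
                // student-id is mandatory and must not be blank
                NodeList ids = result.getElementsByTagName("student-id");
                if (ids.getLength() == 0 || ids.item(0).getTextContent().isBlank()) return false;
                // summary-marks must be present; obtained must not exceed available
                NodeList marks = result.getElementsByTagName("summary-marks");
                if (marks.getLength() == 0) return false;
                Element m = (Element) marks.item(0);
                int available = Integer.parseInt(m.getAttribute("available"));
                int obtained = Integer.parseInt(m.getAttribute("obtained"));
                if (obtained > available) return false;
            }
            return true;
        } catch (Exception e) {
            return false; // malformed XML or missing attributes → reject
        }
    }

    public static void main(String[] args) {
        String ok = "<mcq-test-results><mcq-test-result>"
                + "<student-id>521585128</student-id>"
                + "<summary-marks available=\"20\" obtained=\"13\"/>"
                + "</mcq-test-result></mcq-test-results>";
        System.out.println(isValidPayload(ok));                  // true
        System.out.println(isValidPayload("<unrelated-doc/>")); // false
    }
}
```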
**Aggregate endpoint**
- Standard deviation is a requirement, as it is included in the expected output:
```shell
curl http://localhost:4567/results/1234/aggregate
{"mean":65.0,"stddev":0.0,"min":65.0,"max":65.0,"p25":65.0,"p50":65.0,"p75":65.0,"count":1}
```
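The figures in that response can be reproduced with plain Java. This is a minimal sketch rather than the service implementation: it assumes population standard deviation (so a single result yields `0.0`, matching the sample) and linear-interpolation percentiles, both of which are interpretation choices rather than anything the challenge specifies.

```java
import java.util.Arrays;

public class AggregateStatsSketch {
    /** Population standard deviation (divides by n, so one result gives 0.0). */
    static double stddev(double[] xs) {
        double mean = Arrays.stream(xs).average().orElse(0.0);
        double var = Arrays.stream(xs).map(x -> (x - mean) * (x - mean)).average().orElse(0.0);
        return Math.sqrt(var);
    }

    /** Linear-interpolation percentile over a sorted copy; p is in [0, 100]. */
    static double percentile(double[] xs, double p) {
        double[] sorted = xs.clone();
        Arrays.sort(sorted);
        double rank = p / 100.0 * (sorted.length - 1);
        int lo = (int) Math.floor(rank), hi = (int) Math.ceil(rank);
        return sorted[lo] + (rank - lo) * (sorted[hi] - sorted[lo]);
    }

    public static void main(String[] args) {
        double[] marks = {65.0}; // a single result, as in the sample response
        double mean = Arrays.stream(marks).average().orElse(0.0);
        System.out.printf("mean=%.1f stddev=%.1f p25=%.1f p50=%.1f p75=%.1f count=%d%n",
                mean, stddev(marks), percentile(marks, 25),
                percentile(marks, 50), percentile(marks, 75), marks.length);
        // → mean=65.0 stddev=0.0 p25=65.0 p50=65.0 p75=65.0 count=1
    }
}
```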
## Approach
- Implemented the application using Spring Boot, leveraging my Java background and the framework's ease of setup for microservices.
- Created Java classes to represent the XML payloads
- Used PostgreSQL for data storage
- Used in-memory H2 database for integration testing
- Designed entity models for Student, Test, and TestResult to map database tables and manage relationships.
- Applied common Spring patterns:
- Used JPA for ORM to interact with the PostgreSQL database.
- Defined RESTful endpoints with RestController to handle HTTP requests.
- Created repository interfaces for managing CRUD operations.
- Encapsulated business logic in service classes to maintain separation of concerns.
- Developed unit and integration tests to validate REST endpoints and service functionality.
- Set up a multi-stage Dockerfile and a Docker Compose file to build, test, and run the application independently, supporting CI/CD pipelines.
**Data Model**

## Next Steps / Recommendations
**Codebase Maturity**
- Document the API with endpoint details, request/response examples, error handling, and authentication.
- Expand the test suite with more test cases
- Integrate into CI/CD
**Security and Monitoring**
- Implement SSL, encryption, and authentication to secure communication, encrypt data at rest, and control access.
- Create health check endpoints to monitor service status and integrate with an observability platform.
- Implement real-time performance monitoring to track system responsiveness and resource usage.
**Performance**
- Use profiling tools to identify and resolve performance bottlenecks in data processing and display.
- Index frequently queried fields to speed up query execution.
- Review and optimize SQL queries to reduce execution time.
**Supporting Real-time Dashboards**
- Implement real-time data streams with technologies like Apache Kafka or WebSockets to push live updates.
- Set up an event-driven architecture to automatically refresh the dashboard when new data becomes available.
- Calculate and store aggregate data in advance to speed up updates and reduce processing during real-time refreshes.
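The precomputation bullet above can be sketched with Welford's online algorithm: a per-test running aggregate updated in O(1) as each result is ingested, so dashboard reads never rescan raw rows. The class and method names are illustrative, not part of the current codebase, and percentiles would additionally need a sorted structure or a quantile sketch (e.g. t-digest), which this omits.

```java
public class RunningAggregate {
    // Welford's online algorithm: count, mean, and variance are maintained
    // incrementally, so each new result costs O(1) and reads are instant.
    private long count = 0;
    private double mean = 0.0;
    private double m2 = 0.0; // running sum of squared deviations from the mean
    private double min = Double.POSITIVE_INFINITY;
    private double max = Double.NEGATIVE_INFINITY;

    public void add(double mark) {
        count++;
        double delta = mark - mean;
        mean += delta / count;
        m2 += delta * (mark - mean); // uses the updated mean, per Welford
        min = Math.min(min, mark);
        max = Math.max(max, mark);
    }

    public long count() { return count; }
    public double mean() { return mean; }
    public double min() { return min; }
    public double max() { return max; }

    /** Population standard deviation, consistent with the aggregate endpoint. */
    public double stddev() {
        return count > 0 ? Math.sqrt(m2 / count) : 0.0;
    }
}
```

On each import, the service would update the stored aggregate row and publish it to the dashboard, instead of recomputing statistics from every result on every refresh.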
**Alternate Pipeline Approach**
- Use an ELT approach by storing the original XML payload in object storage before loading and transforming it.
- Storing the original XML payload makes it possible to update the transformation logic and reprocess the original data with the new logic if needed.
- Leverage the event-driven architecture to replay and reprocess results when bugs in the transformation logic are fixed, allowing the corrected output to be validated against the original data.
## Running the Project
This project uses Docker for containerization.
### Docker CLI Version
The project uses Docker CLI version 27.1, which includes the integrated `docker compose` command for managing
multi-container Docker applications.
### Build
To build the Docker image for the application, run the following command:
```shell
docker compose build
```
This command will build the application image using the Dockerfile defined in the project.
### Test
To run the tests using Docker, use the following command:
```shell
docker compose run --rm tests
```
This command will build the Docker image (if not already built), start a container for testing, and run the tests. The
`--rm` flag ensures that the test container is removed after the tests complete.
### Run
To run the application, use the following command:
```shell
docker compose up postgres service
```
This command will start the application and its dependencies (like PostgreSQL) as defined in the `docker-compose.yml`
file.
### Cleanup
To stop the running containers and remove them along with associated volumes, use:
```shell
docker compose down -v
```
This command will stop and remove the containers and any associated volumes.