Journal Description
Journal of Imaging is an international, multi/interdisciplinary, peer-reviewed, open access journal of imaging techniques published online monthly by MDPI.
- Open Access: free for readers, with article processing charges (APC) paid by authors or their institutions.
- High Visibility: indexed within Scopus, ESCI (Web of Science), PubMed, PMC, dblp, Inspec, Ei Compendex, and other databases.
- Journal Rank: CiteScore - Q2 (Computer Graphics and Computer-Aided Design)
- Rapid Publication: manuscripts are peer-reviewed and a first decision is provided to authors approximately 21.7 days after submission; acceptance to publication takes 3.8 days (median values for papers published in this journal in the second half of 2023).
- Recognition of Reviewers: reviewers who provide timely, thorough peer-review reports receive vouchers entitling them to a discount on the APC of their next publication in any MDPI journal, in appreciation of the work done.
Impact Factor: 3.2 (2022); 5-Year Impact Factor: 3.2 (2022)
Latest Articles
Implicit 3D Human Reconstruction Guided by Parametric Models and Normal Maps
J. Imaging 2024, 10(6), 133; https://doi.org/10.3390/jimaging10060133 - 29 May 2024
Abstract
Accurate and robust 3D human modeling from a single image presents significant challenges. Existing methods have shown potential, but they often fail to generate reconstructions that match the level of detail in the input image, and they particularly struggle with loose clothing. They typically employ parameterized human models to constrain the reconstruction process, ensuring the results do not deviate too far from the model and produce anomalies; however, this also limits the recovery of loose clothing. To address this issue, we propose an end-to-end method called IHRPN for reconstructing clothed humans from a single 2D human image. The method includes an image semantic feature extraction module aimed at achieving pixel-to-model-space consistency and enhancing robustness to loose clothing. We extract features from the input image to infer and recover the SMPL-X mesh, and then combine it with a normal map to guide the implicit function to reconstruct the complete clothed human. Unlike traditional methods, we use local features for implicit surface regression. Our experimental results show that IHRPN performs excellently on the CAPE and AGORA datasets, and the reconstruction of loose clothing is noticeably more accurate and robust.
(This article belongs to the Special Issue Self-Supervised Learning for Image Processing and Analysis)
Open Access Article
Hybridizing Deep Neural Networks and Machine Learning Models for Aerial Satellite Forest Image Segmentation
by Clopas Kwenda, Mandlenkosi Gwetu and Jean Vincent Fonou-Dombeu
J. Imaging 2024, 10(6), 132; https://doi.org/10.3390/jimaging10060132 - 29 May 2024
Abstract
Forests play a pivotal role in mitigating climate change as well as contributing to the socio-economic activities of many countries. Therefore, it is of paramount importance to monitor forest cover. Traditional machine learning classifiers for segmenting images lack the ability to extract features such as the spatial relationship between pixels and texture, resulting in subpar segmentation results when used alone. To address this limitation, this study proposed a novel hybrid approach that combines deep neural networks and machine learning algorithms to segment an aerial satellite image into forest and non-forest regions. Aerial satellite forest image features were first extracted by two deep neural network models, namely VGG16 and ResNet50. The resulting features were subsequently used by five machine learning classifiers, namely Random Forest (RF), Linear Support Vector Machines (LSVM), k-nearest neighbor (kNN), Linear Discriminant Analysis (LDA), and Gaussian Naive Bayes (GNB), to perform the final segmentation. The aerial satellite forest images were obtained from the DeepGlobe challenge dataset. The performance of the proposed model was evaluated using metrics such as Accuracy, Jaccard score index, and Root Mean Square Error (RMSE). The experimental results revealed that the RF model achieved the best segmentation results, with an accuracy, Jaccard score, and RMSE of 94%, 0.913, and 0.245, respectively; followed by LSVM with 89%, 0.876, and 0.332; LDA with 88%, 0.834, and 0.351; and GNB with 88%, 0.837, and 0.353. The kNN occupied the last position, with an accuracy, Jaccard score, and RMSE of 83%, 0.790, and 0.408, respectively. The experimental results also revealed that the proposed model significantly improved the performance of the RF, LSVM, LDA, GNB and kNN models compared to their performance when used to segment the images alone. Furthermore, the results showed that the proposed model outperformed other models from related studies, thereby attesting to its superior segmentation capability.
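A minimal sketch of the hybrid idea described above: a pretrained CNN supplies deep features and a classical classifier performs the final decision. The patch size, pooling choice, and variable names are illustrative assumptions, not the authors' released code.

```python
# Hypothetical sketch of a deep-feature + classical-classifier pipeline.
import numpy as np
from tensorflow.keras.applications import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input
from sklearn.ensemble import RandomForestClassifier

# Frozen VGG16 backbone used purely as a feature extractor.
backbone = VGG16(weights="imagenet", include_top=False, pooling="avg",
                 input_shape=(224, 224, 3))

def extract_features(patches: np.ndarray) -> np.ndarray:
    """Map image patches (N, 224, 224, 3) to deep feature vectors (N, 512)."""
    return backbone.predict(preprocess_input(patches.astype("float32")))

# X_train: image patches; y_train: 0 = non-forest, 1 = forest (assumed labels).
# features = extract_features(X_train)
# clf = RandomForestClassifier(n_estimators=200).fit(features, y_train)
# predictions = clf.predict(extract_features(X_test))
```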
Open Access Article
Greedy Ensemble Hyperspectral Anomaly Detection
by Mazharul Hossain, Mohammed Younis, Aaron Robinson, Lan Wang and Chrysanthe Preza
J. Imaging 2024, 10(6), 131; https://doi.org/10.3390/jimaging10060131 - 28 May 2024
Abstract
Hyperspectral images include information from a wide range of spectral bands deemed valuable for computer vision applications in various domains such as agriculture, surveillance, and reconnaissance. Anomaly detection in hyperspectral images has proven to be a crucial component of change and abnormality identification, enabling improved decision-making across various applications. These abnormalities/anomalies can be detected using background estimation techniques that do not require prior knowledge of outliers. However, each hyperspectral anomaly detection (HS-AD) algorithm models the background differently, and these different assumptions may fail to consider all the background constraints in various scenarios. We have developed a new approach called Greedy Ensemble Anomaly Detection (GE-AD) to address this shortcoming. It includes a greedy search algorithm to systematically determine the suitable base models from HS-AD algorithms and hyperspectral unmixing for the first stage of a stacking ensemble and employs a supervised classifier in the second stage of the stacking ensemble. This helps researchers with limited knowledge of HS-AD algorithm suitability for a given application scenario to select the best methods automatically. Our evaluation shows that the proposed method achieves a higher average F1-macro score with statistical significance compared to the other individual methods used in the ensemble. This is validated on multiple datasets, including the Airport–Beach–Urban (ABU) dataset, the San Diego dataset, the Salinas dataset, the Hydice Urban dataset, and the Arizona dataset. The evaluation using the airport scenes from the ABU dataset shows that GE-AD achieves a 14.97% higher average F1-macro score than our previous method (HUE-AD), at least 17.19% higher than the individual methods used in the ensemble, and at least 28.53% higher than the other state-of-the-art ensemble anomaly detection algorithms. As the combination of a greedy algorithm and a stacking ensemble to automatically select suitable base models and associated weights has not been widely explored in hyperspectral anomaly detection, we believe that our work will expand the knowledge in this research area and contribute to the wider application of this approach.
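The greedy search at the heart of such an ensemble can be pictured as forward selection over base detectors. The sketch below is a simplified stand-in: it scores candidate ensembles with a thresholded mean of score maps rather than the paper's second-stage supervised classifier, and all names and the threshold are assumptions.

```python
# Simplified greedy forward selection over base anomaly detectors.
import numpy as np
from sklearn.metrics import f1_score

def greedy_select(score_maps, labels, max_members=5):
    """Greedily add base-model score maps while the combined F1-macro improves.

    score_maps: list of (H, W) anomaly score maps in [0, 1], one per base model.
    labels: (H, W) ground-truth anomaly mask.
    """
    chosen, best_f1 = [], -1.0
    while len(chosen) < max_members:
        best_candidate = None
        for i in range(len(score_maps)):
            if i in chosen:
                continue
            stacked = np.mean([score_maps[j] for j in chosen + [i]], axis=0)
            preds = (stacked > 0.5).astype(int)   # threshold stand-in for stage two
            f1 = f1_score(labels.ravel(), preds.ravel(), average="macro")
            if f1 > best_f1:
                best_f1, best_candidate = f1, i
        if best_candidate is None:                # no remaining candidate improves
            break
        chosen.append(best_candidate)
    return chosen, best_f1
```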
(This article belongs to the Section Computer Vision and Pattern Recognition)
Open Access Article
Modeling of Ethiopian Beef Meat Marbling Score Using Image Processing for Rapid Meat Grading
by Tariku Erena, Abera Belay, Demelash Hailu, Bezuayehu Gutema Asefa, Mulatu Geleta and Tesfaye Deme
J. Imaging 2024, 10(6), 130; https://doi.org/10.3390/jimaging10060130 - 28 May 2024
Abstract
Meat characterized by a high marbling value is typically anticipated to display enhanced sensory attributes. This study aimed to predict the marbling scores of rib-eye steaks sourced from the Longissimus dorsi muscle of different cattle types, namely Boran, Senga, and Sheko, by employing digital image processing and machine-learning algorithms. Marbling was analyzed using digital image processing coupled with an extreme gradient boosting (GBoost) machine learning algorithm. Meat texture was assessed using a universal texture analyzer. Sensory characteristics of beef were evaluated through quantitative descriptive analysis with a trained panel of twenty. Using selected image features from digital image processing, the marbling score was predicted with R2 (prediction) = 0.83. Boran cattle had the highest fat content in sirloin and chuck cuts (12.68% and 12.40%, respectively), followed by Senga (11.59% and 11.56%) and Sheko (11.40% and 11.17%). Tenderness scores for sirloin and chuck cuts differed among the three breeds: Boran (7.06 ± 2.75 and 3.81 ± 2.24, respectively), Senga (5.54 ± 1.90 and 5.25 ± 2.47), and Sheko (5.43 ± 2.76 and 6.33 ± 2.28 Nmm). Sheko and Senga had similar sensory attributes. Marbling scores were higher in Boran (4.28 ± 1.43 and 3.68 ± 1.21) and Senga (2.88 ± 0.69 and 2.83 ± 0.98) compared to Sheko (2.73 ± 1.28 and 2.90 ± 1.52). The study achieved a remarkable milestone in developing a digital tool for predicting marbling scores of Ethiopian beef breeds. Furthermore, the relationship between quality attributes and beef marbling score has been verified. After further validation, the output of this research can be utilized in the meat industry and by quality control authorities.
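As a rough illustration of the regression step, the sketch below fits a gradient-boosting regressor (scikit-learn's implementation, standing in for the paper's extreme gradient boosting) on hand-crafted image features; the feature set and data are placeholders, not the study's descriptors.

```python
# Illustrative marbling-score regression from image features.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# X: per-steak image features (e.g., fat-pixel ratio, blob counts, texture stats)
# y: panel-assigned marbling scores. Stand-in random data for demonstration.
X, y = np.random.rand(120, 8), np.random.rand(120) * 5

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
model = GradientBoostingRegressor(n_estimators=300, learning_rate=0.05).fit(X_tr, y_tr)
print("R2:", r2_score(y_te, model.predict(X_te)))   # the paper reports R2 = 0.83
```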
(This article belongs to the Section Image and Video Processing)
Open Access Article
Point Cloud Quality Assessment Using a One-Dimensional Model Based on the Convolutional Neural Network
by Abdelouahed Laazoufi, Mohammed El Hassouni and Hocine Cherifi
J. Imaging 2024, 10(6), 129; https://doi.org/10.3390/jimaging10060129 - 27 May 2024
Abstract
Recent advancements in 3D modeling have revolutionized various fields, including virtual reality, computer-aided diagnosis, and architectural design, emphasizing the importance of accurate quality assessment for 3D point clouds. As these models undergo operations such as simplification and compression, the distortions introduced can significantly impact their visual quality. There is a growing need for reliable and efficient objective quality evaluation methods to address this challenge. In this context, this paper introduces a novel methodology to assess the quality of 3D point clouds using a deep learning-based no-reference (NR) method. First, it extracts geometric and perceptual attributes from distorted point clouds and represents them as a set of 1D vectors. Then, transfer learning is applied to obtain high-level features using a 1D convolutional neural network (1D CNN) adapted from 2D CNN models through weight conversion from ImageNet. Finally, quality scores are predicted through regression utilizing fully connected layers. The effectiveness of the proposed approach is evaluated across diverse datasets, including the Colored Point Cloud Quality Assessment Database (SJTU_PCQA), the Waterloo Point Cloud Assessment Database (WPC), and the Colored Point Cloud Quality Assessment Database featured at ICIP2020. The outcomes reveal superior performance compared to several competing methodologies, as evidenced by enhanced correlation with average opinion scores.
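A hedged sketch of a 1D CNN regression head of the kind described above: 1D attribute vectors in, one quality score out. Layer sizes and the input length are assumptions, and the paper's weight conversion from 2D ImageNet models is omitted here.

```python
# Assumed 1D CNN architecture for no-reference quality regression.
import tensorflow as tf

def build_1d_cnn(input_len: int) -> tf.keras.Model:
    """1D CNN mapping a point-cloud attribute vector to a quality score."""
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(input_len, 1)),
        tf.keras.layers.Conv1D(64, 7, activation="relu"),
        tf.keras.layers.MaxPooling1D(2),
        tf.keras.layers.Conv1D(128, 5, activation="relu"),
        tf.keras.layers.GlobalAveragePooling1D(),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1),          # regressed quality score
    ])

# model = build_1d_cnn(512)
# model.compile(optimizer="adam", loss="mse")
```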
Open Access Article
Integrated Building Modelling Using Geomatics and GPR Techniques for Cultural Heritage Preservation: A Case Study of the Charles V Pavilion in Seville (Spain)
by María Zaragoza, Vicente Bayarri and Francisco García
J. Imaging 2024, 10(6), 128; https://doi.org/10.3390/jimaging10060128 - 27 May 2024
Abstract
This paper highlights the fundamental role of integrating different geomatics and geophysical imaging technologies in understanding and preserving cultural heritage, with a focus on the Pavilion of Charles V in Seville (Spain). Using a terrestrial laser scanner, global navigation satellite system, and ground-penetrating radar, we constructed a building information modelling (BIM) system to derive comprehensive decision-making models to preserve this historical asset. These models enable the generation of virtual reconstructions, encompassing not only the building but also its subsurface, distributable as augmented reality or virtual reality online. By leveraging these technologies, the research investigates complex details of the pavilion, capturing its current structure and revealing insights into past soil compositions and potential subsurface structures. This detailed analysis empowers stakeholders to make informed decisions about conservation and management. Furthermore, transparent data sharing fosters collaboration, advancing collective understanding and practices in heritage preservation.
Open Access Article
Enabling Low-Dose In Vivo Benchtop X-ray Fluorescence Computed Tomography through Deep-Learning-Based Denoising
by Naghmeh Mahmoodian, Mohammad Rezapourian, Asim Abdulsamad Inamdar, Kunal Kumar, Melanie Fachet and Christoph Hoeschen
J. Imaging 2024, 10(6), 127; https://doi.org/10.3390/jimaging10060127 - 22 May 2024
Abstract
X-ray Fluorescence Computed Tomography (XFCT) is an emerging non-invasive imaging technique providing high-resolution molecular-level data. However, increased sensitivity with current benchtop X-ray sources comes at the cost of high radiation exposure. Artificial Intelligence (AI), particularly deep learning (DL), has revolutionized medical imaging by delivering high-quality images in the presence of noise. In XFCT, traditional methods rely on complex algorithms for background noise reduction, but AI holds promise in addressing high-dose concerns. We present an optimized Swin-Conv-UNet (SCUNet) model for background noise reduction in X-ray fluorescence (XRF) images at low tracer concentrations. Our method's effectiveness is evaluated against higher-dose images; while various denoising techniques exist for X-ray and computed tomography (CT) imaging, only a few address XFCT. The DL model is trained and assessed using augmented data, focusing on background noise reduction. Image quality is measured using peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM), comparing outcomes with 100% X-ray-dose images. Results demonstrate that the proposed algorithm yields high-quality images from low-dose inputs, with a maximum PSNR of 39.05 and SSIM of 0.86. The model outperforms block-matching and 3D filtering (BM3D), block-matching and 4D filtering (BM4D), non-local means (NLM), the denoising convolutional neural network (DnCNN), and the unoptimized SCUNet in both visual inspection and quantitative analysis, particularly in high-noise scenarios. This indicates the potential of AI, specifically the SCUNet model, to significantly improve XFCT imaging by mitigating the trade-off between sensitivity and radiation exposure.
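The quantitative comparison described above can be reproduced in outline with scikit-image's metrics, scoring a denoised low-dose image against the 100%-dose reference; the file paths are placeholders.

```python
# Sketch of PSNR/SSIM evaluation against a full-dose reference image.
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

reference = np.load("full_dose.npy")    # 100% X-ray-dose image (placeholder path)
denoised = np.load("denoised.npy")      # model output from a low-dose input

data_range = reference.max() - reference.min()
print("PSNR:", peak_signal_noise_ratio(reference, denoised, data_range=data_range))
print("SSIM:", structural_similarity(reference, denoised, data_range=data_range))
```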
(This article belongs to the Special Issue Recent Advances in X-ray Imaging)
Open Access Article
Fine-Grained Food Image Recognition: A Study on Optimising Convolutional Neural Networks for Improved Performance
by Liam Boyd, Nonso Nnamoko and Ricardo Lopes
J. Imaging 2024, 10(6), 126; https://doi.org/10.3390/jimaging10060126 - 22 May 2024
Abstract
Addressing the pressing issue of food waste is vital for environmental sustainability and resource conservation. While computer vision has been widely used in food waste reduction research, existing food image datasets are typically aggregated into broad categories (e.g., fruits, meat, dairy, etc.) rather than the fine-grained singular food items required for this research. The aim of this study is to develop a model capable of identifying individual food items to be integrated into a mobile application that allows users to photograph their food items, identify them, and offer suggestions for recipes. This research bridges the gap in available datasets and contributes to a more fine-grained approach to utilising existing technology for food waste reduction, emphasising both environmental and research significance. This study evaluates various (n = 7) convolutional neural network architectures for multi-class food image classification, emphasising the nuanced impact of parameter tuning to identify the most effective configurations. The experiments were conducted with a custom dataset comprising 41,949 food images categorised into 20 food item classes. Performance evaluation was based on accuracy and loss. The DenseNet architecture emerged as the top performer of the seven examined, establishing a baseline performance (training accuracy = 0.74, training loss = 1.25, validation accuracy = 0.68, and validation loss = 2.89) on a predetermined set of parameters, including the RMSProp optimiser, ReLU activation function, a 0.5 dropout rate, and a fixed image size. Subsequent parameter tuning involved a comprehensive exploration, considering six optimisers, four image sizes, two dropout rates, and five activation functions. The results show the superior generalisation capabilities of the optimised DenseNet, showcasing performance improvements over the established baseline across key metrics. Specifically, the optimised model demonstrated a training accuracy of 0.99, a training loss of 0.01, a validation accuracy of 0.79, and a validation loss of 0.92, highlighting its improved performance compared to the baseline configuration. The optimal DenseNet has been integrated into a mobile application called FridgeSnap, designed to recognise food items and suggest possible recipes to users, thus contributing to the broader mission of minimising food waste.
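For concreteness, here is a minimal sketch of the stated baseline configuration (DenseNet backbone, RMSprop optimiser, ReLU activation, 0.5 dropout, 20 classes); the specific DenseNet variant, head layers, and input pipeline are assumptions.

```python
# Assumed baseline: DenseNet backbone with the configuration named above.
import tensorflow as tf

base = tf.keras.applications.DenseNet121(include_top=False, weights="imagenet",
                                         pooling="avg")
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dropout(0.5),                     # baseline dropout rate
    tf.keras.layers.Dense(256, activation="relu"),    # assumed head width
    tf.keras.layers.Dense(20, activation="softmax"),  # 20 food item classes
])
model.compile(optimizer=tf.keras.optimizers.RMSprop(),
              loss="categorical_crossentropy", metrics=["accuracy"])
```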
(This article belongs to the Section AI in Imaging)
Open Access Article
MResTNet: A Multi-Resolution Transformer Framework with CNN Extensions for Semantic Segmentation
by Nikolaos Detsikas, Nikolaos Mitianoudis and Ioannis Pratikakis
J. Imaging 2024, 10(6), 125; https://doi.org/10.3390/jimaging10060125 - 21 May 2024
Abstract
A fundamental task in computer vision is the differentiation and identification of different objects or entities in a visual scene using semantic segmentation methods. The advancement of transformer networks has surpassed traditional convolutional neural network (CNN) architectures in terms of segmentation performance. The continuous pursuit of optimal performance, with respect to the popular evaluation metric results, has led to very large architectures that require a significant amount of computational power to operate, making them prohibitive for real-time applications, including autonomous driving. In this paper, we propose a model that leverages a visual transformer encoder with a parallel twin decoder, consisting of a visual transformer decoder and a CNN decoder with multi-resolution connections working in parallel. The two decoders are merged with the aid of two trainable CNN blocks: the fuser, which combines the information from the two decoders, and the scaler, which scales the contribution of each decoder. The proposed model achieves state-of-the-art performance on the Cityscapes and ADE20K datasets, maintaining a low-complexity network that can be used in real-time applications.
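A speculative sketch of the merge step, under one reading of the abstract: a trainable scaler weights each decoder branch and a fuser block combines them. The layer choices below are assumptions, not the authors' architecture.

```python
# Assumed scaler/fuser merge of twin decoder outputs (Keras functional style).
import tensorflow as tf

def merge_decoders(vit_out, cnn_out, num_classes):
    """vit_out, cnn_out: (B, H, W, C) feature maps from the two decoders."""
    # Scaler: 1x1 convolutions learn how much each branch contributes.
    vit_scaled = tf.keras.layers.Conv2D(vit_out.shape[-1], 1)(vit_out)
    cnn_scaled = tf.keras.layers.Conv2D(cnn_out.shape[-1], 1)(cnn_out)
    # Fuser: combine the two scaled streams into one prediction map.
    fused = tf.keras.layers.Concatenate()([vit_scaled, cnn_scaled])
    return tf.keras.layers.Conv2D(num_classes, 3, padding="same")(fused)
```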
(This article belongs to the Special Issue Deep Learning in Computer Vision)
Open Access Article
Fast Linde–Buzo–Gray (FLBG) Algorithm for Image Compression through Rescaling Using Bilinear Interpolation
by Muhammad Bilal, Zahid Ullah, Omer Mujahid and Tama Fouzder
J. Imaging 2024, 10(5), 124; https://doi.org/10.3390/jimaging10050124 - 20 May 2024
Abstract
Vector quantization (VQ) is a block coding method that is famous for its high compression ratio and simple encoder and decoder implementation. Linde–Buzo–Gray (LBG) is a renowned technique for VQ that uses a clustering-based approach for finding the optimum codebook. Numerous algorithms, such as Particle Swarm Optimization (PSO), the Cuckoo search algorithm (CS), the bat algorithm, and the firefly algorithm (FA), are used for codebook design. These algorithms are primarily focused on improving image quality in terms of the PSNR and SSIM but use exhaustive searching to find the optimum codebook, which makes the computational time very high. In our study, we enhance LBG by minimizing the computational complexity, reducing the total number of comparisons between the codebook and training vectors using a match function. The input image is taken as a training vector at the encoder side, which is initialized with a random selection of vectors from the input image. Rescaling using bilinear interpolation through the nearest neighborhood method is performed to reduce the number of comparisons of the codebook with the training vectors. The compressed image is first downsized by the encoder and then upscaled at the decoder side during decompression. Based on the results, it is demonstrated that the proposed method reduces the computational complexity by 50.2% compared to LBG and above 97% compared to the other LBG-based algorithms. Moreover, a 20% reduction in the memory size is also obtained, with no significant loss in image quality compared to the LBG algorithm.
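The rescaling idea can be sketched with OpenCV: downscale before the codebook search and upscale during decompression. The scale factor and file path are assumptions; the match-function optimization itself is described in the full paper.

```python
# Sketch of rescaling around vector quantization (assumed 2x downscale).
import cv2

img = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)   # placeholder path
h, w = img.shape

# Downscale at the encoder to cut codebook/training-vector comparisons.
small = cv2.resize(img, (w // 2, h // 2), interpolation=cv2.INTER_LINEAR)

# ... LBG codebook search and VQ encoding run on `small` here ...

# Upscale at the decoder side during decompression.
restored = cv2.resize(small, (w, h), interpolation=cv2.INTER_LINEAR)
```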
(This article belongs to the Special Issue Image Processing and Computer Vision: Algorithms and Applications)
Open Access Communication
Head Gesture Recognition Combining Activity Detection and Dynamic Time Warping
by Huaizhou Li and Haiyan Hu
J. Imaging 2024, 10(5), 123; https://doi.org/10.3390/jimaging10050123 - 19 May 2024
Abstract
The recognition of head movements plays an important role in human–computer interface domains. The data collected with image sensors or inertial measurement unit (IMU) sensors are often used for identifying these types of actions. Compared with image processing methods, a recognition system using an IMU sensor has obvious advantages in terms of complexity, processing speed, and cost. In this paper, an IMU sensor mounted on the temples of a pair of glasses is used to collect head movement data, and a new approach for recognizing head movements is proposed by combining activity detection and dynamic time warping (DTW). The activity detection of the time series of head movements is essentially based on the different characteristics exhibited by actions and noises. The DTW method estimates the warp path distances between the time series of the actions and the templates by warping along the time axis. The types of head movements are then determined by the minimum of these distances. The results show that 100% accuracy was achieved in the task of classifying six types of head movements. This method provides a new option for head gesture recognition in current human–computer interfaces.
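A compact implementation of the DTW matching described above, assuming 1D motion signals (e.g., a single gyroscope axis) and a hypothetical per-gesture template dictionary:

```python
# Classic dynamic time warping distance plus nearest-template classification.
import numpy as np

def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """O(len(a)*len(b)) DTW distance between two 1D sequences."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def classify(segment, templates):
    """Pick the gesture whose template has the smallest warp-path distance."""
    return min(templates, key=lambda name: dtw_distance(segment, templates[name]))

# templates = {"nod": np.array([...]), "shake": np.array([...]), ...}
# gesture = classify(detected_segment, templates)
```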
(This article belongs to the Section Computer Vision and Pattern Recognition)
Open Access Article
Imaging-Based Deep Learning for Predicting Desmoid Tumor Progression
by Rabih Fares, Lilian D. Atlan, Ido Druckmann, Shai Factor, Yair Gortzak, Ortal Segal, Moran Artzi and Amir Sternheim
J. Imaging 2024, 10(5), 122; https://doi.org/10.3390/jimaging10050122 - 17 May 2024
Abstract
Desmoid tumors (DTs) are non-metastasizing and locally aggressive soft-tissue mesenchymal neoplasms. Those that become enlarged often become locally invasive and cause significant morbidity. DTs have a varied pattern of clinical presentation, with up to 50–60% not growing after diagnosis and 20–30% shrinking or even disappearing after initial progression. Enlarging tumors are considered unstable and progressive. The management of symptomatic and enlarging DTs is challenging, and primarily consists of chemotherapy. Despite wide surgical resection, DTs carry a rate of local recurrence as high as 50%. There is a consensus that contrast-enhanced magnetic resonance imaging (MRI) or, alternatively, computerized tomography (CT) is the preferred modality for monitoring DTs. Both use Response Evaluation Criteria in Solid Tumors version 1.1 (RECIST 1.1), which measures the largest diameter on axial, sagittal, or coronal series. This approach, however, reportedly lacks accuracy in detecting response to therapy and fails to detect tumor progression, thus calling for more sophisticated methods. The objective of this study was to detect unique features identified by deep learning that correlate with the future clinical course of the disease. Between 2006 and 2019, 51 patients (mean age 41.22 ± 15.5 years) who had a tissue diagnosis of DT were included in this retrospective single-center study. Each had undergone at least three MRI examinations (including a pretreatment baseline study), and each was followed by orthopedic oncology specialists for a median of 38.83 months (IQR 44.38). Tumor segmentations were performed on a T2 fat-suppressed treatment-naive MRI sequence, after which the segmented lesion was extracted to a three-dimensional file together with its DICOM file and run through deep learning software. The results of the algorithm were then compared to clinical data collected from the patients' medical files. There were 28 males (13 stable) and 23 females (15 stable) whose ages ranged from 19.07 to 83.33 years. The model was able to independently predict clinical progression as measured from the baseline MRI with an overall accuracy of 93% (0.93 ± 0.04) and an area under the ROC curve of 0.89 ± 0.08. Artificial intelligence may contribute to risk stratification and clinical decision-making in patients with DT by predicting which patients are likely to progress.
(This article belongs to the Special Issue Clinical and Pathological Imaging in the Era of Artificial Intelligence: New Insights and Perspectives)
Open Access Article
Overcoming Dimensionality Constraints: A Gershgorin Circle Theorem-Based Feature Extraction for Weighted Laplacian Matrices in Computer Vision Applications
by Sahaj Anilbhai Patel and Abidin Yildirim
J. Imaging 2024, 10(5), 121; https://doi.org/10.3390/jimaging10050121 - 15 May 2024
Abstract
In graph theory, the weighted Laplacian matrix is the most utilized technique to interpret the local and global properties of a complex graph structure within computer vision applications. However, with increasing graph nodes, the Laplacian matrix's dimensionality also increases accordingly, so there is always a "curse of dimensionality". In response to this challenge, this paper introduces a new approach to reducing the dimensionality of the weighted Laplacian matrix by utilizing the Gershgorin circle theorem: the weighted Laplacian matrix is transformed into a strictly diagonally dominant form, and rough eigenvalue inclusions of the matrix are then estimated. The estimated inclusions are represented as reduced features, termed GC features. The proposed Gershgorin circle feature extraction (GCFE) method was evaluated using three publicly accessible computer vision datasets, varying image patch sizes, and three different graph types, and was compared with eight distinct studies. The GCFE demonstrated a notable positive Z-score compared to other feature extraction methods such as I-PCA, kernel PCA, and spectral embedding. Specifically, it achieved an average Z-score of 6.953 with the 2D grid graph type and 4.473 with the pairwise graph type, particularly on the E_Balanced dataset. Furthermore, it was observed that while the accuracy of most major feature extraction methods declined with smaller image patch sizes, the GCFE maintained consistent accuracy across all tested image patch sizes. When the GCFE method was applied to the E_MNIST dataset using the K-NN graph type, it confirmed its consistent accuracy performance, evidenced by a low standard deviation (SD) of 0.305; this was notably lower than that of other methods such as Isomap (SD of 1.665) and LLE (SD of 1.325). The GCFE outperformed most feature extraction methods in terms of classification accuracy and computational efficiency. The GCFE method also requires fewer training parameters for deep-learning models than the traditional weighted Laplacian method, establishing its potential for more effective and efficient feature extraction in computer vision tasks.
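The disc computation behind GC features is short enough to sketch directly; this version omits the paper's diagonal-dominance transform and simply reads the Gershgorin eigenvalue-inclusion bounds off a weighted Laplacian.

```python
# Gershgorin disc bounds of a weighted graph Laplacian as reduced features.
import numpy as np

def gershgorin_features(W: np.ndarray) -> np.ndarray:
    """W: symmetric weighted adjacency matrix. Returns per-node [lo, hi] bounds."""
    L = np.diag(W.sum(axis=1)) - W                    # weighted Laplacian L = D - W
    centers = np.diag(L)                              # disc centers: diagonal entries
    radii = np.abs(L).sum(axis=1) - np.abs(centers)   # off-diagonal absolute row sums
    return np.stack([centers - radii, centers + radii], axis=1)
```

By the Gershgorin circle theorem, every eigenvalue of L lies in the union of these discs, so the per-node intervals act as a cheap, fixed-size surrogate for the full spectrum.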
(This article belongs to the Section Computer Vision and Pattern Recognition)
Open Access Review
Advances in Real-Time 3D Reconstruction for Medical Endoscopy
by Alexander Richter, Till Steinmann, Jean-Claude Rosenthal and Stefan J. Rupitsch
J. Imaging 2024, 10(5), 120; https://doi.org/10.3390/jimaging10050120 - 14 May 2024
Abstract
This contribution is intended to provide researchers with a comprehensive overview of the current state-of-the-art concerning real-time 3D reconstruction methods suitable for medical endoscopy. Over the past decade, there have been various technological advancements in computational power and an increased research effort in many computer vision fields such as autonomous driving, robotics, and unmanned aerial vehicles. Some of these advancements can also be adapted to the field of medical endoscopy while coping with challenges such as featureless surfaces, varying lighting conditions, and deformable structures. To provide a comprehensive overview, a logical division into monocular, binocular, trinocular, and multiocular methods is performed, and active and passive methods are also distinguished. Within these categories, we consider both flexible and non-flexible endoscopes to cover the state-of-the-art as fully as possible. The relevant error metrics used to compare the publications presented here are discussed, and the question of when to choose a GPU rather than an FPGA for camera-based 3D reconstruction is debated. We elaborate on the good practice of using datasets and provide a direct comparison of the presented work. It is important to note that in addition to medical publications, publications evaluated on the KITTI and Middlebury datasets are also considered to include related methods that may be suited for medical 3D reconstruction.
(This article belongs to the Special Issue Advances in Biomedical Image Processing and Artificial Intelligence for Computer-Aided Diagnosis in Medicine)
Open Access Systematic Review
The Accuracy of Three-Dimensional Soft Tissue Simulation in Orthognathic Surgery—A Systematic Review
by Anna Olejnik, Laurence Verstraete, Tomas-Marijn Croonenborghs, Constantinus Politis and Gwen R. J. Swennen
J. Imaging 2024, 10(5), 119; https://doi.org/10.3390/jimaging10050119 - 14 May 2024
Abstract
Three-dimensional soft tissue simulation has become a popular tool in the process of virtual orthognathic surgery planning and patient–surgeon communication. To apply 3D soft tissue simulation software in routine clinical practice, both qualitative and quantitative validation of its accuracy are required. The objective of this study was to systematically review the literature on the accuracy of 3D soft tissue simulation in orthognathic surgery. The Web of Science, PubMed, Cochrane, and Embase databases were consulted for the literature search. The systematic review (SR) was conducted according to the PRISMA statement, and 40 articles fulfilled the inclusion and exclusion criteria. The QUADAS-2 tool was used for the risk of bias assessment of the selected studies. A mean error varying from 0.27 mm to 2.9 mm for 3D soft tissue simulations of the whole face was reported. In the studies evaluating 3D soft tissue simulation accuracy after a Le Fort I osteotomy only, the upper lip and paranasal regions were reported to have the largest error, while after an isolated bilateral sagittal split osteotomy, the largest error was reported for the lower lip and chin regions. In the studies evaluating simulation after bimaxillary osteotomy with or without genioplasty, the highest inaccuracy was reported at the level of the lips, predominantly the lower lip, chin, and, sometimes, the paranasal regions. Due to the variability in the study designs and analysis methods, a direct comparison was not possible. Therefore, based on the results of this SR, guidelines to systematize the workflow for evaluating the accuracy of 3D soft tissue simulations in orthognathic surgery in future studies are proposed.
(This article belongs to the Special Issue Advances in Biomedical Image Processing and Artificial Intelligence for Computer-Aided Diagnosis in Medicine)
Open Access Article
Semi-Supervised Medical Image Segmentation Based on Deep Consistent Collaborative Learning
by Xin Zhao and Wenqi Wang
J. Imaging 2024, 10(5), 118; https://doi.org/10.3390/jimaging10050118 - 14 May 2024
Abstract
In the realm of medical image analysis, the cost associated with acquiring accurately labeled data is prohibitively high. To address the issue of label scarcity, semi-supervised learning methods are employed, utilizing unlabeled data alongside a limited set of labeled data. This paper presents a novel semi-supervised medical segmentation framework, DCCLNet (deep consistency collaborative learning UNet), grounded in deep consistent co-learning. The framework synergistically integrates consistency learning from feature and input perturbations, coupled with collaborative training between CNNs (convolutional neural networks) and ViT (vision transformer), to capitalize on the learning advantages offered by these two distinct paradigms. Feature perturbation involves the application of auxiliary decoders with varied feature disturbances to the main CNN backbone, enhancing the robustness of the CNN backbone through consistency constraints generated by the auxiliary and main decoders. Input perturbation employs an MT (mean teacher) architecture wherein the main network serves as the student model guided by a teacher model subjected to input perturbations. Collaborative training aims to improve the accuracy of the main networks by encouraging mutual learning between the CNN and ViT. Experiments conducted on the publicly available ACDC (automated cardiac diagnosis challenge) and Prostate datasets yielded Dice coefficients of 0.890 and 0.812, respectively. Additionally, comprehensive ablation studies were performed to demonstrate the effectiveness of each methodological contribution in this study.
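Two ingredients of the input-perturbation (mean teacher) branch can be sketched briefly, assuming TensorFlow models; the decay value and the mean-squared consistency loss are conventional choices, not necessarily the paper's exact ones.

```python
# Sketch of mean-teacher updates and a consistency loss between predictions.
import tensorflow as tf

def ema_update(teacher: tf.keras.Model, student: tf.keras.Model, decay=0.99):
    """Teacher weights track an exponential moving average of the student's."""
    for tw, sw in zip(teacher.weights, student.weights):
        tw.assign(decay * tw + (1.0 - decay) * sw)

def consistency_loss(student_logits, teacher_logits):
    """Mean-squared consistency between the two models' softened predictions."""
    return tf.reduce_mean(tf.square(tf.nn.softmax(student_logits)
                                    - tf.nn.softmax(teacher_logits)))
```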
(This article belongs to the Special Issue Deep Learning in Biomedical Image Segmentation and Classification: Advancements, Challenges and Applications)
Open Access Article
Bayesian Networks in the Management of Hospital Admissions: A Comparison between Explainable AI and Black Box AI during the Pandemic
by Giovanna Nicora, Michele Catalano, Chandra Bortolotto, Marina Francesca Achilli, Gaia Messana, Antonio Lo Tito, Alessio Consonni, Sara Cutti, Federico Comotto, Giulia Maria Stella, Angelo Corsico, Stefano Perlini, Riccardo Bellazzi, Raffaele Bruno and Lorenzo Preda
J. Imaging 2024, 10(5), 117; https://doi.org/10.3390/jimaging10050117 - 10 May 2024
Abstract
Artificial Intelligence (AI) and Machine Learning (ML) approaches that can learn from large data sources have been identified as useful tools to support clinicians in their decisional process, and AI and ML implementations underwent a rapid acceleration during the recent COVID-19 pandemic. However, many ML classifiers are "black box" to the final user, since their underlying reasoning process is often obscure. Additionally, the performance of such models suffers from poor generalization ability in the presence of dataset shifts. Here, we present a comparison between an explainable-by-design ("white box") model, a Bayesian Network (BN), and a black box model, Random Forest, both studied with the aim of supporting clinicians of Policlinico San Matteo University Hospital in Pavia (Italy) during the triage of COVID-19 patients. Our aim is to evaluate whether the BN's predictive performance is comparable with that of a widely used but less explainable ML model such as Random Forest, and to test the generalization ability of the ML models across different waves of the pandemic.
(This article belongs to the Special Issue Advances and Challenges in Multimodal Machine Learning)
Open Access Article
When Two Eyes Don’t Suffice—Learning Difficult Hyperfluorescence Segmentations in Retinal Fundus Autofluorescence Images via Ensemble Learning
by Monty Santarossa, Tebbo Tassilo Beyer, Amelie Bernadette Antonia Scharf, Ayse Tatli, Claus von der Burchard, Jakob Nazarenus, Johann Baptist Roider and Reinhard Koch
J. Imaging 2024, 10(5), 116; https://doi.org/10.3390/jimaging10050116 - 9 May 2024
Abstract
Hyperfluorescence (HF) and reduced autofluorescence (RA) are important biomarkers in fundus autofluorescence (FAF) images for the assessment of the health of the retinal pigment epithelium (RPE), an important indicator of disease progression in geographic atrophy (GA) or central serous chorioretinopathy (CSCR). Autofluorescence images have been annotated by human raters, but distinguishing biomarkers (whether signals are increased or decreased) from the normal background proves challenging, with borders being particularly open to interpretation. Consequently, significant variations emerge among different graders, and even within the same grader during repeated annotations. Tests on in-house FAF data show that even highly skilled medical experts, despite previously discussing and settling on precise annotation guidelines, reach a pair-wise agreement measured in a Dice score of no more than 63–80% for HF segmentations and only 14–52% for RA. The data further show that the agreement of our primary annotation expert with herself is a 72% Dice score for HF and 51% for RA. Given these numbers, the task of automated HF and RA segmentation cannot simply be reduced to improving a segmentation score. Instead, we propose the use of a segmentation ensemble. Learning from images with a single annotation, the ensemble reaches expert-like performance with an agreement of a 64–81% Dice score for HF and 21–41% for RA with all our experts. In addition, utilizing the mean predictions of the ensemble networks and their variance, we devise ternary segmentations where FAF image areas are labeled either as confident background, confident HF, or potential HF, ensuring that predictions are reliable where they are confident (97% Precision), while detecting all instances of HF (99% Recall) annotated by all experts.
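The ternary labeling can be sketched from ensemble statistics alone; the two probability thresholds below are illustrative assumptions chosen to mimic confident and uncertain regions.

```python
# Sketch of ternary labeling from per-pixel ensemble predictions.
import numpy as np

def ternary_map(member_probs: np.ndarray, lo=0.2, hi=0.8) -> np.ndarray:
    """member_probs: (n_models, H, W) HF probabilities from the ensemble."""
    mean = member_probs.mean(axis=0)
    out = np.full(mean.shape, 1, dtype=np.uint8)   # 1 = potential HF (uncertain)
    out[mean < lo] = 0                             # 0 = confident background
    out[mean > hi] = 2                             # 2 = confident HF
    return out
```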
(This article belongs to the Special Issue Medical Image Classification and Segmentation: Progress and Challenges)
Open Access Technical Note
Image Quality Assessment Tool for Conventional and Dynamic Magnetic Resonance Imaging Acquisitions
by Katerina Nikiforaki, Ioannis Karatzanis, Aikaterini Dovrou, Maciej Bobowicz, Katarzyna Gwozdziewicz, Oliver Díaz, Manolis Tsiknakis, Dimitrios I. Fotiadis, Karim Lekadir and Kostas Marias
J. Imaging 2024, 10(5), 115; https://doi.org/10.3390/jimaging10050115 - 9 May 2024
Abstract
Image quality assessment of magnetic resonance imaging (MRI) data is an important factor not only for conventional diagnosis and protocol optimization but also for fairness, trustworthiness, and robustness of artificial intelligence (AI) applications, especially on large heterogeneous datasets. Information on image quality in multi-centric studies is important to complement the contribution profile from each data node along with quantity information, especially when large variability is expected and certain acceptance criteria apply. The main goal of this work was to present a tool enabling users to assess image quality based on both subjective criteria and objective image quality metrics, supporting the decision on image quality with evidence. The evaluation can be performed on both conventional and dynamic MRI acquisition protocols, while the latter is also checked longitudinally across dynamic series. The assessment provides an overall image quality score and information on the types of artifacts and degrading factors, as well as a number of objective metrics for automated evaluation across series (BRISQUE score, Total Variation, PSNR, SSIM, FSIM, MS-SSIM). Moreover, the user can define specific regions of interest (ROIs) to calculate the regional signal-to-noise ratio (SNR) and contrast-to-noise ratio (CNR), thus individualizing the quality output to specific use cases, such as tissue-specific contrast or regional noise quantification.
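The regional SNR and CNR mentioned above follow standard ROI definitions, sketched here with boolean masks standing in for the user-defined ROIs:

```python
# Standard ROI-based SNR and CNR definitions (masks are boolean arrays).
import numpy as np

def roi_snr(image: np.ndarray, signal_mask: np.ndarray, noise_mask: np.ndarray) -> float:
    """SNR = mean signal in the ROI / standard deviation in a background ROI."""
    return image[signal_mask].mean() / image[noise_mask].std()

def roi_cnr(image, roi_a, roi_b, noise_mask) -> float:
    """CNR = |mean(A) - mean(B)| / background noise standard deviation."""
    return abs(image[roi_a].mean() - image[roi_b].mean()) / image[noise_mask].std()
```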
(This article belongs to the Special Issue Clinical and Pathological Imaging in the Era of Artificial Intelligence: New Insights and Perspectives)
Open Access Article
A New Dataset and Comparative Study for Aphid Cluster Detection and Segmentation in Sorghum Fields
by Raiyan Rahman, Christopher Indris, Goetz Bramesfeld, Tianxiao Zhang, Kaidong Li, Xiangyu Chen, Ivan Grijalva, Brian McCornack, Daniel Flippo, Ajay Sharda and Guanghui Wang
J. Imaging 2024, 10(5), 114; https://doi.org/10.3390/jimaging10050114 - 8 May 2024
Abstract
Aphid infestations are one of the primary causes of extensive damage to wheat and sorghum fields and are one of the most common vectors for plant viruses, resulting in significant agricultural yield losses. To address this problem, farmers often resort to the inefficient use of harmful chemical pesticides, which have negative health and environmental impacts. As a result, a large amount of pesticide is wasted on areas without significant pest infestation. This highlights the urgent need for an intelligent autonomous system that can locate and spray sufficiently large infestations selectively within the complex crop canopies. We have developed a large multi-scale dataset for aphid cluster detection and segmentation, collected from actual sorghum fields and meticulously annotated to include clusters of aphids. Our dataset comprises a total of 54,742 image patches, showcasing a variety of viewpoints, diverse lighting conditions, and multiple scales, highlighting its effectiveness for real-world applications. In this study, we trained and evaluated four real-time semantic segmentation models and three object detection models specifically for aphid cluster segmentation and detection. Considering the balance between accuracy and efficiency, Fast-SCNN delivered the most effective segmentation results, achieving 80.46% mean precision, 81.21% mean recall, and 91.66 frames per second (FPS). For object detection, RT-DETR exhibited the best overall performance with a 61.63% mean average precision (mAP), 92.6% mean recall, and 72.55 FPS on an NVIDIA V100 GPU. Our experiments further indicate that aphid cluster segmentation is more suitable for assessing aphid infestations than using detection models.
(This article belongs to the Section Computer Vision and Pattern Recognition)
Topics
Topic in Algorithms, Diagnostics, Entropy, Information, J. Imaging
Application of Machine Learning in Molecular Imaging
Topic Editors: Allegra Conti, Nicola Toschi, Marianna Inglese, Andrea Duggento, Matthew Grech-Sollars, Serena Monti, Giancarlo Sportelli, Pietro Carra
Deadline: 31 May 2024
Topic in Applied Sciences, Computation, Entropy, J. Imaging
Color Image Processing: Models and Methods (CIP: MM)
Topic Editors: Giuliana Ramella, Isabella Torcicollo
Deadline: 30 July 2024
Topic in Applied Sciences, Sensors, J. Imaging, MAKE, Optics
Applications in Image Analysis and Pattern Recognition
Topic Editors: Bin Fan, Wenqi Ren
Deadline: 31 August 2024
Topic in Applied Sciences, Electronics, J. Imaging, MAKE, Remote Sensing
Computational Intelligence in Remote Sensing: 2nd Edition
Topic Editors: Yue Wu, Kai Qin, Maoguo Gong, Qiguang Miao
Deadline: 31 December 2024
Special Issues
Special Issue in J. Imaging
The Mixed Reality Revolution: Challenges and Prospects 2nd Edition
Guest Editors: Sébastien Mavromatis, Jean Sequeira
Deadline: 31 May 2024
Special Issue in J. Imaging
Advances and Challenges in Multimodal Machine Learning
Guest Editor: Georgina Cosma
Deadline: 30 June 2024
Special Issue in J. Imaging
Advances in Image Analysis: Shapes, Textures and Multifractals
Guest Editor: Ramakrishnan Mukundan
Deadline: 31 July 2024
Special Issue in J. Imaging
New Insights into Photoacoustic Imaging
Guest Editors: Yan Li, Min Wu
Deadline: 31 August 2024