Deep Visual-Semantic Alignments for Generating Image Descriptions

Because of the Nov. 14th submission deadline for this year's IEEE Conference on Computer Vision and Pattern Recognition (CVPR), several big image-recognition papers are coming out this week:

From Andrej Karpathy and Li Fei-Fei of Stanford: We present a model that generates free-form natural language descriptions of image regions. Our model leverages datasets of images and their sentence descriptions to learn about the inter-modal correspondences between text and visual data. Our approach is based on a novel combination of Convolutional Neural Networks over image regions, bidirectional Recurrent Neural Networks over sentences, and a structured objective that aligns the two modalities through a multimodal embedding. We then describe a Recurrent Neural Network architecture that uses the inferred alignments to learn to generate novel descriptions of image regions. We demonstrate the effectiveness of our alignment model with ranking experiments on Flickr8K, Flickr30K and COCO datasets, where we substantially improve on the state of the art. We then show that the sentences created by our generative model outperform retrieval baselines on the three aforementioned datasets and a new dataset of region-level annotations... ( website with examples ) ( full paper )

From Oriol Vinyals, Alexander Toshev, Samy Bengio, and Dumitru Erhan at Google: Show and Tell: A Neural Image Caption Generator ( announcement post ) ( full paper )

From Ryan Kiros, Ruslan Salakhutdinov, Richard S. Zemel at University of Toronto: Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models ( full paper )

From Junhua Mao, Wei Xu, Yi Yang, Jiang Wang and Alan L. Yuille at Baidu Research/UCLA: Explain Images with Multimodal Recurrent Neural Networks ( full paper )

From Jeff Donahue, Lisa Anne Hendricks, Sergio Guadarrama, Marcus Rohrbach, Subhashini Venugopalan, Kate Saenko, and Trevor Darrell at UT Austin, UMass Lowell and UC Berkeley: Long-term Recurrent Convolutional Networks for Visual Recognition and Description ( full paper )

All these came from this Hacker News discussion.

DJI Inspire 1

DJI's new Inspire 1 with 4K camera ( $2899 USD from DJI ):

Aircraft specs:
Hovering Accuracy (GPS Mode): Vertical: 1.6' / 0.5 m, Horizontal: 8.2' / 2.5 m
Maximum Angular Velocity: Pitch: 300°/s, Yaw: 150°/s
Maximum Tilt Angle: 35°
Maximum Ascent/Descent Speed: Ascent: 16.4 ft/s / 5 m/s, Descent: 13.1 ft/s / 4 m/s
Maximum Speed: 72.2 ft/s / 22 m/s (Attitude mode; no wind)
Maximum Flight Altitude: 14,764' / 4,500 m
Maximum Wind Speed Resistance: 32.8 ft/s / 10 m/s
Maximum Flight Time: Up to 18 minutes

Camera:
Model Name: X3
Designation: FC350
Sensor: Sony EXMOR 1/2.3" CMOS
Resolution: 12.0 MP
Lens:
  Field of View: 94°
  Focal Length (35 mm Equivalent): 20 mm
  Aperture: f/2.8
  Design: 9 elements in 9 groups; aspherical lens element
  Filters: Anti-distortion filter; UV filter
Video Recording:
  UHD (4K): 4096 x 2160: 24p, 25p; 3840 x 2160: 24p, 25p, 30p
  FHD (1080p): 1920 x 1080: 24p, 25p, 30p, 48p, 50p, 60p
  HD (720p): 1280 x 720: 24p, 25p, 30p, 48p, 50p, 60p
  Maximum Bitrate: 60 Mbps
File Format:
  Photo: JPEG, DNG
  Video: MP4 in a .MOV wrapper (MPEG-4 AVC/H.264)
Recording Media:
  Type: microSD/SDHC/SDXC up to 64 GB
  Speed: Class 10 or faster
  Format: FAT32/exFAT
Photography Modes:
  Single shot
  Burst: 3, 5, 7 frames per second (AEB: 3/5 frames per second; 0.7 EV bias)
  Time-lapse
Operating Temperature: 32 to 104°F / 0 to 40°C

Gimbal:
Model: Zenmuse X3
Number of Axes: 3-axis
Control Accuracy: ±0.03°
Maximum Controlled Rotation Speed: Pitch: 120°/s, Pan: 180°/s
Controlled Rotation Range: Pitch: -90° to +30°, Pan: ±330°
Angular Vibration Range: ±0.03°
Output Power: Static: 9 W, In Motion: 11 W
Operational Current: Static: 750 mA, In Motion: 900 mA
Mounting: Detachable

Video with required "Jony Ive-alike" introductory speech:

Perfecting Dental Treatments Via 3D Printed Models & Removable Dies

The case study presented herein illustrates the ease of utilizing consistent and reproducible 3D printed verification protocols as a means of ensuring the success of the restorative treatment plan.

Project Beyond: 360° 3D Camera

From Project Beyond/Samsung: Today we offer a sneak preview of Project Beyond, the world’s first true 3D 360˚ omniview camera. Beyond captures and streams immersive videos in stunning high-resolution 3D, and allows every user to enjoy their viewing experience in the way they see fit. It offers full 3D reconstruction in all directions, using stereo camera pairs combined with a top-view camera to capture independent left and right eye stereo pairs. Project Beyond uses patent-pending stereoscopic interleaved capture and 3D-aware stitching technology to capture the scene just like the human eye, but in a form factor that is extremely compact. The innovative reconstruction system recreates the view geometry in the same way that the human eyes see, producing unparalleled 3D perception. Project Beyond is not a product, but one of the many exciting projects currently being developed by the Think Tank Team, an advanced research team within Samsung Research America. This is the first operational version of the device, and just a taste of what the final system we are working on will be capable of. Once complete, we hope to deploy Project Beyond around the world to beautiful and noteworthy locations and events, and allow users to experience those locations as if they were really there. The camera system can stream real time events, as well as store the data for future viewing... ( website )

Are Ag Robots Ready? 27 Companies Profiled

This article profiles 27 of the many companies (from conglomerates to start-ups) attempting to provide robotic solutions for farming problems and explores what they are doing, when their products will be available, and at what cost.

Atlas Karate Kid

The Atlas robot at IHMC balances on a stack of cinder blocks while striking various poses. The robot is built by Boston Dynamics.

Micro Stepper Motors Keep Surgical Viewing System Small

Compact, automated lens positioner allows surgeons to view retina and cornea without effort or neck strain.

ROBO-STOX® introduces European Robotics and Automation Index

ROBO-STOX® licenses their proprietary index to ETF Securities to provide European investors with highly diversified access to a new age of growth in robotics and automation.

FPV Racing League

International FPV Multicopter Racing League:

EQUIPMENT: In the spec class, competitors must use a quadcopter with 2300Kv motors, 3S LiPo and 5" props. This is to ensure a level playing field amongst competitors with different budgetary constraints.

SCORING: 10 points will be awarded for 1st place, 8 points for 2nd place, 5 points for 3rd place and 1 point for 4th place. These results will be recorded on the regional leaderboard, with the champions at the end of each season being invited to a national competition.

OBSTACLES: Throughout the course there will be obstacles such as hoops. Missing an obstacle will incur a time penalty. These obstacles should be made clearly visible with brightly-colored material or flashing lights... ( Official page ) ( Subreddit ) ( Next event Brisbane, Dec 7 )
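The scoring rule quoted above is simple enough to sketch in a few lines of Python. This is only an illustration, not the league's software: the function names, the pilot names, the zero points for places below 4th, and the alphabetical tie-break are all assumptions that the rules excerpt does not specify.

```python
from collections import defaultdict

# Points per finishing place, as listed in the league's scoring rules.
POINTS_BY_PLACE = {1: 10, 2: 8, 3: 5, 4: 1}


def race_points(finish_order):
    """Map each pilot to the points for their place in one race.

    Assumption: places below 4th score 0 (the rules excerpt does not say).
    """
    return {
        pilot: POINTS_BY_PLACE.get(place, 0)
        for place, pilot in enumerate(finish_order, start=1)
    }


def leaderboard(races):
    """Sum points over several races and rank pilots by total.

    Assumption: ties are broken alphabetically, only to keep the order stable.
    """
    totals = defaultdict(int)
    for finish_order in races:
        for pilot, points in race_points(finish_order).items():
            totals[pilot] += points
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical pilots and results, purely for illustration.
    season = [
        ["ana", "ben", "cy", "dee", "eli"],
        ["ben", "ana", "dee", "cy", "eli"],
    ]
    for pilot, total in leaderboard(season):
        print(pilot, total)
```

Each race is given as a list of pilots in finishing order, so a regional leaderboard is just the sum over all races in the season.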

Boston Magazine Profiles Rodney Brooks of Rethink

Long article about Rodney Brooks, co-founder of Rethink and former CTO at iRobot: ...Brooks cofounded the Bedford-based iRobot in 1990, and his motivation, he explains, had something to do with vanity: “My thoughts on my self-image at the time was that I didn’t really want to be remembered for building insects.” Then he pauses for a moment and laughs. “But after that I started building vacuum-cleaning robots. And now there is a research group using Baxter to open stool samples. So now it’s shit-handling robots. I think maybe I should have quit while I was ahead. You know, that’s something no one ever says: ‘I hope my kid grows up to open stool samples... ( full article )

Rethink Robotics Introduces Industry-First Robot Positioning System

This disruptive technology enables Baxter to switch between tasks without retraining by using environmental markers, called Landmarks™, in conjunction with its existing, embedded vision system.

Reverse OCR

From Reverse OCR's tumblr: I am a bot that grabs a random word and draws semi-random lines until the OCRad.js library recognizes it as the word. By Darius Kazemi, creator of Alternate Universe Prompts, Museum Bot, and Scenes from The Wire... ( see the latest results )
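The bot's description amounts to a generate-and-test loop: keep adding lines until a recognizer reads the target word. The Python sketch below shows that loop only. It does not use OCRad.js or reproduce the bot's code; `recognize` is a hypothetical callback standing in for whatever OCR engine is used, the lines are uniformly random rather than "semi-random", and the canvas size and `max_lines` cap are assumptions.

```python
import random


def random_line(width, height, rng):
    """Return one random line segment as (x1, y1, x2, y2) inside the canvas."""
    return (
        rng.randrange(width), rng.randrange(height),
        rng.randrange(width), rng.randrange(height),
    )


def draw_until_recognized(word, recognize, width=200, height=60,
                          max_lines=10000, seed=None):
    """Add random lines until recognize(lines) returns the target word.

    `recognize` is a hypothetical stand-in for an OCR call: it takes the
    list of line segments drawn so far and returns the text it reads.
    Returns the list of lines on success, or None if max_lines is reached.
    """
    rng = random.Random(seed)
    lines = []
    for _ in range(max_lines):
        lines.append(random_line(width, height, rng))
        if recognize(lines) == word:
            return lines
    return None


if __name__ == "__main__":
    # Toy recognizer for demonstration: "reads" the word once 5 lines exist.
    def toy_recognizer(lines):
        return "cat" if len(lines) >= 5 else ""

    result = draw_until_recognized("cat", toy_recognizer, seed=1)
    print(len(result), "lines drawn")
```

With a real OCR engine behind `recognize`, the loop could run for a long time, which is why the sketch caps the number of lines.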

Small DC Motors Power Mini Reconnaissance Robot

Ruggedized motors, coupled with a titanium housing and a clever clutch design, deliver a dumbbell-sized unmanned ground vehicle that can survive a 30-foot drop onto concrete.

3D Reconstruction Firm Paracosm Has Closed $3.3 Million In Seed Funding

From Paracosm: Paracosm, a cloud-based software company, raised $3.3 million in seed round funding to further its mission to 3D-ify the world. The round, led by Atlas Venture, includes contributions from iRobot, Osage University Partners, BOLDstart Ventures, New World Angels, Deep Fork Capital and a number of angel investors. Paracosm's advanced three-dimensional reconstruction technologies create digital models of physical spaces. When shared with machines, these models serve as blueprints which provide robots and applications a greater sense of awareness and understanding of the physical world. Such technologies are valuable for robotics, video game development, special effects, indoor navigation applications, and for the improvement of both virtual and augmented reality experiences... ( full press release )

Telemedicine Robots Bring Expertise to Remote Areas

Sophisticated controls and positioning deliver a lifelike experience to doctor and patient alike.

Featured Product

Discover how human-robot collaboration can take flexibility to new heights!

Humans and robots can now share tasks, and this new partnership is on the verge of revolutionizing the production line. Today's drivers, such as data-driven services, decreasing product lifetimes and the need for product differentiation, are making flexibility paramount, and no technology is better suited to meet these needs than the Omron TM Series Collaborative Robot. With force feedback, collision detection technology and an intuitive, hand-guided teaching mechanism, the TM Series cobot is designed to work in immediate proximity to a human worker and is easier than ever to train on new tasks.